The BIG Goal - PowerPoint PPT Presentation

Slides: 52
Provided by: Ben5152
Transcript and Presenter's Notes

1
The BIG Goal
  • "The greatest challenge, however, is analytical. Deeper biological insight is likely to emerge from examining datasets with scores of samples."
    Eric Lander, Array of hope, Nat. Gen. volume 21 supplement, pp. 3-4, 1999.

Bio-informatics: Provide methodologies for elucidating biological knowledge from biological data.
2
Central Paradigm of Bio-informatics
Genetic Information
3
Central Paradigm of Bio-informatics
Molecular Structure
Genetic Information
4
Central Paradigm of Bio-informatics
Biochemical Function
Molecular Structure
Genetic Information
5
Central Paradigm of Bio-informatics
Biochemical Function
Molecular Structure
Genetic Information
Symptoms
7
Computer Science Tools are Crucial
http://www.sanger.ac.uk/PostGenomics/S_pombe/presentations/EMBOCopenhagenWebsite.pdf
8
Computer Science Tools are Crucial
  • New bio-technologies create huge amounts of data.
  • It is impossible to analyze data by manual inspection.
  • Novel mathematical, statistical, algorithmic and computational tools are necessary!

9
Automated Sequencing
http://cbms.st-and.ac.uk/academics/ryan/Teaching/SBBioinf/lecture1.htm
10
What is Bio-Informatics?
  • A field of science in which Biology, Computer Science and Information Technology merge into a single discipline.
  • Computers (and software tools) are used to collect, analyze and interpret biological information at the molecular level.
  • Goal: To enable the discovery of new biological insights and create a global perspective for biologists.

11
Disciplines
  • Development of new algorithms and statistical methods to assess relationships among members of large data sets.
  • Analysis and interpretation of various types of data.
  • Development and implementation of tools to efficiently access and manage different types of information.

12
Why Use Bio-Informatics?
  • An explosive growth in the amount of biological information necessitates the use of computers for cataloging and retrieval of data (> 3 billion bps, > 30,000 genes).
  • The human genome project.
  • Automated sequencing.
  • GenBank has over 16 billion bases and is doubling every year!

13
New Types of Biological Data
  • Micro arrays - gene expression.
  • Multi-level maps: genetic, physical, sequence, annotation.
  • Networks of protein-protein interactions.
  • Cross-species relationships.
  • Homologous genes.
  • Chromosome organization.

http://www.the-scientist.com/yr2002/apr/research?020415.html
14
Why Bio-Informatics? (cont.)
  • A more global view of experimental design (from the "one scientist, one gene/protein/disease" paradigm to whole-organism consideration).
  • Data mining - functional/structural information is important for studying the molecular basis of diseases, diagnostics, developing drugs (personal medicine), evolutionary patterns, etc.

15
Why Bio-Informatics? (cont.)
http://www.library.csi.cuny.edu/davis/Bioinfo_326/lectures/lect14/lect_14.html
16
Future of Genomic Research
  • Principal milestones in data mining and genome analysis:
  • Sanger method for sequencing, invented in 1977 (awarded the Nobel Prize in 1980).
  • Polymerase chain reaction (PCR), invented in 1983 (awarded the Nobel Prize in 1993).

http://www.usgenomics.com/technology/index.shtml
17
The next step: Locate all the genes and understand their function.
This will probably take another 15-20 years!
18
Disease Genes Discovered
19
(No Transcript)
20
The job of biologists is changing
One can efficiently find information using databases and software on the web.
Question: How likely are you to use a free bio-informatics library of accessible software?
http://www.cryst.bbk.ac.uk/classlib/BBSRC_poster/potential.html
21
Molecular Biology Analysis Software Tools
- Freely Available on the Web. - Highlights
22
Broad Classification of Biological Databases
http://www.mrc-lmb.cam.ac.uk/genomes/madanm/pres/biodb.htm
23
NCBI
ENTREZ - PubMed
24
http://www3.ncbi.nlm.nih.gov/Entrez/index.html
25
Post-genomic terms (Oct. 2002)
Term             Google search   PubMed
Genome           2.1x10^6        76,566
Proteome         89,300          1,701
Transcriptome    9,960           229
Gene function    1.2x10^6        6.5x10^5
Metabolome       1,170           29
Glycome          138             6
From Computational Proteomics, Mark B Gerstein, Yale U.
26
http://cbms.st-and.ac.uk/academics/ryan/Teaching/SBBioinf/lecture1.htm
27
http://cbms.st-and.ac.uk/academics/ryan/Teaching/SBBioinf/lecture1.htm
28
http://cbms.st-and.ac.uk/academics/ryan/Teaching/SBBioinf/lecture1.htm
29
http://cbms.st-and.ac.uk/academics/ryan/Teaching/SBBioinf/lecture1.htm
30
Similarity / Analogy
Examples: If it looks like an elephant and smells like an elephant, it's an elephant. If it walks like a duck and quacks like a duck, it's a duck.
http://cbms.st-and.ac.uk/academics/ryan/Teaching/molbiol/Bioinf_files/v3_document.htm
31
Similarity Search in Databanks
Find similar sequences to a working draft. As databanks grow, homologies get harder, and quality is reduced.
Alignment tools: BLAST, FASTA (time-saving heuristics, approximations).

>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
Length = 369
Score = 272 bits (137), Expect = 4e-71
Identities = 258/297 (86%), Gaps = 1/297 (0%)
Strand = Plus / Plus

Query 17  aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
Sbjct 1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query 77  agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
Sbjct 60  agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
Sbjct 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
Sbjct 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
Sbjct 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296
Pairwise alignment
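
The summary figures in a pairwise alignment report (identities, gaps, alignment length) can be recomputed from the aligned strings themselves. The following is a minimal sketch, not BLAST's own code: the function name alignment_stats is hypothetical, and it assumes the two strings are already aligned, have equal length, and use "-" for gaps.

```python
def alignment_stats(query: str, subject: str, gap: str = "-") -> dict:
    """Count identities and gaps over two equal-length aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # A column is an identity when both characters match and neither is a gap.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    # A column counts as a gap when either sequence has the gap symbol.
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    length = len(query)
    return {
        "identities": identities,
        "gaps": gaps,
        "length": length,
        "percent_identity": 100.0 * identities / length if length else 0.0,
    }


if __name__ == "__main__":
    # First 60-column block of the alignment on this slide (Query 17-76, Sbjct 1-59).
    query = "aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca"
    sbjct = "aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc"
    print(alignment_stats(query, sbjct))
```

Running the function over all five blocks and summing the counts would give totals comparable to the Identities and Gaps fields of the report header.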
32
Multiple Sequence Alignment
Multiple alignment: find protein families and functional domains.
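
One simple way to spot candidate functional domains in a multiple alignment is to look for strongly conserved columns. The sketch below is illustrative only: the function name column_conservation and the toy sequences in the usage note are assumptions, and real tools use substitution-aware scores rather than a plain majority count.

```python
from collections import Counter


def column_conservation(alignment: list[str], gap: str = "-") -> list[tuple[str, float]]:
    """For each column, return the most common residue and its frequency.

    The frequency is taken over all rows, so gapped columns score lower.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            # Column made only of gaps: nothing is conserved.
            result.append((gap, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result
```

For example, column_conservation(["MKV", "MRV", "M-V"]) reports the first and third columns as fully conserved, which is the kind of signal that marks a shared family motif.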
33
Structure - Function Relationships
structure
sequence
function
34
Protein Structure (domains)
35
Phylogeny
Evolution - a process in which small changes occur within species over time. These changes can be monitored today using molecular techniques.
  • The Tree of Life
  • A classical, basic science problem, since Darwin's 1859 Origin of Species.
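
Molecular changes between species are often summarized as pairwise distances, which tree-building methods then take as input. The following is a minimal sketch of the simplest such measure, the proportion of differing sites (p-distance). The function names are hypothetical, and it assumes the sequences are already aligned to equal length with "-" marking gaps.

```python
def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def distance_matrix(sequences: dict[str, str]) -> dict[tuple[str, str], float]:
    """All-against-all p-distances, keyed by (name, name)."""
    names = list(sequences)
    return {
        (m, n): p_distance(sequences[m], sequences[n])
        for m in names
        for n in names
    }
```

A matrix like this is the usual starting point for distance-based tree methods; the tree construction itself is not shown here.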

36
Searching Protein Sequence Databases - How far back can we see?
Tree of Life
Mammalian radiation
Invertebrates/ vertebrates
Plant/ animals
Prokaryotes/ eukaryotes
First self replicating systems
Formation of the solar system
Origin of the universe ?
37
The Human Genome Project (HGP)
  • Write down all of human DNA on a single CD (working draft published 2001).
  • Identify all genes, their location and function (far from completion).

38
Example for Gene Localization Bio-Tool (FISH).
39
FISH - Fluorescence In-Situ Hybridization.
  • Fluorescently labeled probes hybridize to specific chromosomal locations.
  • Example application: low-resolution localization of a gene.

40
Sequencing Genes and Gene Assembly
Automated sequencing
41
Gene Finding
  • Only 2-3% of the human genome encodes functional genes.
  • Genes are found along large non-coding DNA regions.
  • Repeats, pseudo-genes, introns and vector contamination make gene finding very confusing.

42
Gene Finding - cont.
  • Find special gene patterns:
  • Translation start and stop sites (open reading frames - ORF).
  • Transcription factors, promoters.
  • Intron splice sites.
  • Etc.
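
The first pattern in the list above, translation start and stop sites, can be illustrated with a small open reading frame scan. This is a minimal sketch under stated assumptions: the function name find_orfs and the min_codons parameter are hypothetical, only the three forward frames are scanned, the standard stop codons TAA, TAG and TGA are used, and introns, promoters and the reverse strand are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons counts codons including the start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame))
                start = None
    return orfs
```

Real gene finders combine this signal with the other patterns on the slide, since in a genome that is mostly non-coding an ORF alone is weak evidence of a gene.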

43
(No Transcript)
44
Micro Arrays (DNA Chips)
New biotechnology breakthrough: measure RNA expression levels of thousands of genes (in one experiment).
45
The Idea Behind Micro Arrays
46
Clustering Analysis of Gene Expression Data
DNA chips and personalized medicine (leading
edge, future technologies).
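
Clustering of expression data usually starts from a similarity measure between gene expression profiles. The sketch below is one simple possibility, not the method used in the presentation: it uses Pearson correlation and groups genes that are connected by a chain of correlations at or above a threshold (equivalent to single-linkage clustering cut at that threshold). The function names, the threshold value and the gene names in the usage note are assumptions.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def cluster_by_correlation(profiles: dict[str, list[float]],
                           threshold: float = 0.9) -> list[list[str]]:
    """Group genes linked by correlation >= threshold (single linkage)."""
    names = list(profiles)
    unassigned = set(names)
    clusters = []
    for name in names:
        if name not in unassigned:
            continue
        unassigned.remove(name)
        cluster, stack = [name], [name]
        # Grow the cluster by following links from each newly added gene.
        while stack:
            current = stack.pop()
            linked = [
                other for other in names
                if other in unassigned
                and pearson(profiles[current], profiles[other]) >= threshold
            ]
            for other in linked:
                unassigned.remove(other)
                cluster.append(other)
                stack.append(other)
        clusters.append(sorted(cluster))
    return clusters
```

With three toy profiles such as geneA = [1, 2, 3, 4], geneB = [2, 4, 6, 8] and geneC = [4, 3, 2, 1], the first two rise together and fall into one cluster while geneC, which moves in the opposite direction, stays on its own.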
47
Pharmaco-genomics
Use DNA information to measure and predict the reaction to drugs. Personalized medicine. Faster clinical trials with selected populations. Fewer drug side effects.
48
Protein and Other Arrays
Sequencing the human genome => finite problem.
Studying the proteome => endless possible variations, dynamic.
Protein array
Future fields of study:
Proteins: Genomics -> Proteomics
Lipids: Genomics -> Lipomics
Sugars: Genomics -> Glycomics
49
Understanding Mechanisms of Disease
EC number
compound
50
Putting it all together: Bio-Informatics
ORTHOLOG GENES (Taxonomy)
SEQUENCE ALIGNMENT
CODING REGIONS
CONSERVED DOMAINS
SEQUENCES LITERATURE
3-D STRUCTURE
GENE FAMILIES
SIGNAL PEPTIDE
MUTATIONS POLYMORPHISM
GENOME MAPS
CELLULAR LOCATION
51
Putting it all together: Bio-Informatics
ORTHOLOG GENES (Taxonomy)
SEQUENCE ALIGNMENT
CODING REGIONS
CONSERVED DOMAINS
GENE EXPRESSION, GENES FUNCTION, DRUG PERSONAL
THERAPY
3-D STRUCTURE
GENE FAMILIES
SIGNAL PEPTIDE
MUTATIONS POLYMORPHISM
GENOME MAPS
CELLULAR LOCATION