Lecture 2 Tools presentation

1
Lecture 2 - Tools
Objective - To familiarize you with the available
WWW resources, so that you can weave your way
through the analysis of your data, and to help you
interpret the analysis results.
2
Computational Biologists vs. Biologists Using
Computers - Two Different Things
A biologist or medical researcher typically
supplies data to or retrieves data from a
database and analyzes their data using available
tools created by others.
A computational biologist develops the original
tools, applies tools in new ways to make
discoveries in the data, develops and maintains
databases, and attempts to form a bigger picture
from large amounts of complex data.
An applied computational biologist uses
computational tools and laboratory skills
together.
3
Most Important - Use every database and tool
available, public and private. Check them
regularly, for there is new data every
day. Remember: a computer search can save you
years!
4
Tools

BLAST - preparing the input, interpreting the
output.
Multiple alignment - assembly.
Protein structure visualization.
Coding region determination.
Feature extraction - CpG islands, polymorphism,
visualization.

5
Entrez is a search and retrieval system that
integrates information from databases at NCBI.
These databases include nucleotide sequences,
protein sequences, macromolecular structures,
whole genomes, and MEDLINE, through PubMed.
6
What is BLAST?
BLAST (Basic Local Alignment Search Tool) is an
algorithm and a computer program that compares a
query DNA or protein sequence to a database of
other DNA or protein sequences. The results of
that comparison are ranked by score, and each
high-scoring hit is shown with the bases (or
residues) of the query and the hit aligned to
show the regions of similarity. Search engines
like BLAST can find distant relationships
between a query and a database entry, i.e.
similarities that are far from identity. These
programs use an adjustable scoring matrix to
assign a value for a match and a penalty for a
mismatch. The matrix reflects biological and
evolutionary information, such as which
substitutions are commonly observed between
related sequences.
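To make the "value for a match, penalty for a mismatch" idea concrete, here is a minimal sketch of local alignment scoring. It uses the Smith-Waterman dynamic programming recurrence, not the faster heuristic that BLAST itself applies, and the default scores (+1 match, -3 mismatch, -5 per gap position) are illustrative assumptions rather than values taken from this lecture.

```python
def local_alignment_score(query, subject, match=1, mismatch=-3, gap=-5):
    """Return the best local alignment score between two sequences.

    Smith-Waterman with a linear gap penalty. Only the score is
    computed, not the alignment itself. Score values are assumptions.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# A 15-base query embedded in a longer subject scores 15 (15 matches).
print(local_alignment_score("GCGAGCGTGTGGAAT", "TTTGCGAGCGTGTGGAATTCC"))
```

Raising the mismatch penalty makes the search stricter; lowering it lets more distant similarities through, at the cost of more spurious hits.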
7
Select your database. Be careful!
Searching - BLAST
blastp - compares an amino acid query sequence against a protein sequence database.
blastn - compares a nucleotide query sequence against a nucleotide sequence database.
blastx - compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
tblastn - compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
tblastx - compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html
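The "six-frame translation" used by blastx, tblastn and tblastx can be sketched in a few lines. This is only an illustration, assuming the standard genetic code; the function names are made up for this example and are not part of any BLAST program.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with n) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict of frame name (+1..+3, -1..-3) to protein string."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# The first 23 bases of the HVEC fragment retrieved later in this lecture.
for name, protein in six_frame_translation("GCGAGCGTGTGGAATTCCTGCGG").items():
    print(name, protein)
```

Frame +3 of this fragment reads ERVEFLR, which appears in the HVEC protein translation (…YRERVEFLRPSF…), so only one of the six frames is the biologically meaningful one.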
8
Pre-Filtering before a BLAST search
DNA sequences, especially those of mammals and
plants, contain a large number of repeated
sequences, like CACACACACACACA... The purpose
of these sequences is unknown at this time.
Because database entries and queries often
contain many repeat sequences, spurious
similarities due only to the presence of these
repeats often occur. These detract from
identifying important real similarities. To
eliminate spurious hits to repeat sequences,
query sequences are usually filtered and masked
so that the repeats make no contribution to the
overall similarity score.
Careful: the simple sequence databases used for
filtering are incomplete, and vary from species
to species!
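As a rough picture of what masking does, the sketch below replaces short tandem repeats with N so they cannot contribute to a match. It is a deliberately simple regular-expression stand-in, not the filter any BLAST server actually runs, and the thresholds (units of up to 6 bases, at least 4 copies) are assumptions chosen for the example.

```python
import re


def mask_simple_repeats(seq, max_unit=6, min_copies=4):
    """Replace tandem repeats of short units with runs of N.

    A unit of 1..max_unit bases repeated at least min_copies times
    in a row is masked. Thresholds are illustrative assumptions.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1), re.IGNORECASE
    )
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)


query = "GATT" + "AC" * 6 + "AGGCT"
print(mask_simple_repeats(query))
```

Here the twelve-base AC run is masked and the flanking unique sequence is left alone, so a database search would be scored on the flanks only.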
9
BLAST inputs.

Query, usually in FASTA format:
>Skipgene
CAGTATAGTATATCAT
Search Parameters
Number of hits to save.
Search as DNA (4 nucleotides) or translated
protein (amino acid) sequence.
Similarity (PAM) matrix, a matrix of scores and
penalties used to compute the similarity score
given the types of discrepancies between the
query and the database entries.
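A FASTA record is just a header line starting with ">" followed by one or more sequence lines. The minimal parser below is a sketch for checking your input before you paste it into a search form; the function name is made up for this example.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples.

    The name is everything after '>' on the header line; sequence
    lines are joined with whitespace removed at the line ends.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


print(parse_fasta(">Skipgene\nCAGTATAGTATATCAT\n"))
```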

10
BLAST output components.

Execution statistics - database, its size, size
of the query.
Hits - entries in the databases that have the
highest similarity to the query.
Alignments - a base-by-base or residue-by-residue
comparison that can be inspected by eye to
confirm regions of similarity.

11
How the researcher uses similarity results.

Each user typically inspects the score or the
P(N) value and has a particular threshold beyond
which he or she considers a hit significant;
others use alignments, voodoo, etc.
The user then inspects the short description for
keywords or information of biological interest.
The user's biological background and specific
research objective greatly affect what is of
interest.
For hits of interest, the user typically will
inspect the alignment to confirm a real
similarity.
For each hit of interest, the user might retrieve
the full database entry and inspect the complete
annotation.

12
Let's start somewhere. How about a short pair of
sequences you saw as markers in some paper?
Where can we go from there?
GCGAGCGTGTGGAAT GACGACCACAACTA
How about complementing one of them to put both
on the same strand, concatenating them with an n
so that you know where you joined them, and
submitting the result to BLASTn, WITH GAPs?
GCGAGCGTGTGGAATnCTGCTGGTGTTGAT
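The joining step is easy to script. The sketch below reproduces the query string above by taking the base-by-base complement of the second marker (no reversal, which is what matches the string shown here) and joining with an n; the function names are made up for this example. If both markers had been written 5' to 3' on opposite strands, you would need the reverse complement instead.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Base-by-base complement, without reversing the sequence."""
    return dna.upper().translate(COMPLEMENT)


def join_markers(first, second, spacer="n"):
    """Join first marker and the complement of the second with a spacer."""
    return first + spacer + complement(second)


print(join_markers("GCGAGCGTGTGGAAT", "GACGACCACAACTA"))
# GCGAGCGTGTGGAATnCTGCTGGTGTTGAT
```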
13
You get this back:

Sequences producing significant alignments                      Score (bits)  E Value
gb|AF110314|AF110314 Homo sapiens herpesvirus immunoglobuli...       36        0.27
gb|AF060231|AF060231 Homo sapiens herpesvirus entry protein...       36        0.27
ref|NM_002855.1|HVEC Homo sapiens herpesvirus entry mediat...        36        0.27
emb|Z34275|UUTUFG U.urealyticum tuf gene for elongation fac...       32        4.2
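Applying a personal significance threshold to a hit list like this one is a one-liner. The sketch below uses the four hits above; the cutoff of 1.0 is an arbitrary assumption for illustration, not a recommended value.

```python
# Hits copied from the list above: (accession, score in bits, E value).
HITS = [
    ("AF110314", 36, 0.27),
    ("AF060231", 36, 0.27),
    ("NM_002855.1", 36, 0.27),
    ("Z34275", 32, 4.2),
]


def filter_hits(hits, max_evalue=1.0):
    """Keep hits with E value at or below the cutoff, best first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: (hit[2], -hit[1]))


for accession, bits, evalue in filter_hits(HITS):
    print(f"{accession}\t{bits}\t{evalue}")
```

With this cutoff the three human herpesvirus entry sequences survive and the U. urealyticum hit is dropped.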
14
The alignment on the second hit looks good, so
click on it and let's see what is up.

>gb|AF060231|AF060231 Homo sapiens herpesvirus entry protein C (HVEC) mRNA, complete cds
Length = 1710

Score = 36.2 bits (18), Expect = 0.27
Identities = 20/21 (95%)
Strand = Plus / Plus

Query: 1    gcgagcgtgtggaatncctgc 21
Sbjct: 437  gcgagcgtgtggaattcctgc 457

Score = 32.2 bits (16), Expect = 4.2
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 17   cctgctggtgttgatt 32
Sbjct: 1248 cctgctggtgttgatt 1263
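The Identities figure can be checked by hand or with a few lines of code. This sketch counts matching columns in a pair of aligned strings; the function name is made up for this example. The n spacer in the query counts as a mismatch, which is why the first segment is 20/21 rather than 21/21.

```python
def identity(query_aln, sbjct_aln):
    """Return (matching columns, alignment length) for two aligned strings."""
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned strings must have the same length")
    matches = sum(
        q == s and q != "-"
        for q, s in zip(query_aln.lower(), sbjct_aln.lower())
    )
    return matches, len(query_aln)


matches, length = identity("gcgagcgtgtggaatncctgc", "gcgagcgtgtggaattcctgc")
print(f"Identities {matches}/{length} ({round(100 * matches / length)}%)")
```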
15
The HVEC DNA sequence can be retrieved.
GCGAGCGTGTGGAATTCCTGCGGCCCTCCTTCACCGATGGCACTATCCGC
CTCTCCCGCCTGGAGCTGGA GGATGAGGGTGTCTACATCTGCGAGTTTG
CTACCTTCCCTACGGGCAATCGAGAAAGCCAGCTCAATCTC ACGGTGAT
GGCCAAACCCACCAATTGGATAGAGGGTACCCAGGCAGTGCTTCGAGCCA
AGAAGGGGCAGG ATGACAAGGTCCTGGTGGCCACCTGCACCTCAGCCAA
TGGGAAGCCTCCCAGTGTGGTATCCTGGGAAAC TCGGTTAAAAGGTGAG
GCCAGAGTACCAGGAGACTCCGGAACCCCAATGGCACCAGTGACGGTCAT
CAGC CGCTACCGCCTGGTGCCCAGCAGGGAAGCCCACCAGCAGTCCTTG
GCCTGCATCGTCAACTACCACATGG ACCGCTTCAAGGAAAGCCTCACTC
TCAACGTGCAGTATGAGCCTGAGGTAACCATTGAGGGGTTTGATGG CAA
CTGGTACCTGCAGCGGATGGACGTGAAGCTCACCTGCAAAGCTGATGCTA
ACCCCCCAGCCACTGAG TACCACTGGACCACGCTAAATGGCTCTCTCCC
CAAGGGTGTGGAGGCCCAGAACAGAACCCTCTTCTTCA AGGGACCCATC
AACTACAGCCTGGCAGGGACCTACATCTGTGAGGCCACCAACCCCATCGG
TACACGCTC AGGCCAGGTGGAGGTCAATATCACAGAATTCCCCTACACC
CCGTCTCCTCCCGAACATGGGCGGCGCGCC GGGCCGGTGCCCACGGCCA
TCATTGGGGGCGTGGCGGGGAGCATCCTGCTGGTGTTGATTGTGGTCGGC
G
There are a lot of directions one can go from
here.
16
NM_002855. Homo sapiens herpe... gi:4506336

LOCUS       HVEC         1557 bp    mRNA            PRI       10-NOV-1999
DEFINITION  Homo sapiens herpesvirus entry mediator C (poliovirus
            receptor-related 1 nectin) (HVEC), mRNA.
ACCESSION   NM_002855
NID         g4506336
VERSION     NM_002855.1  GI:4506336
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
            Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1557)
  AUTHORS   Lopez,M., Eberle,F., Mattei,M.G., Gabert,J., Birg,F., Bardin,F.,
            Maroc,C. and Dubreuil,P.
  TITLE     Complementary DNA characterization and chromosomal localization
            of a human gene related to the poliovirus receptor-encoding gene
  JOURNAL   Gene 155 (2), 261-265 (1995)
  MEDLINE   95237621
REFERENCE   2  (bases 1 to 1557)
  AUTHORS   Geraghty RJ, Krummenacher C, Cohen GH, Eisenberg RJ and Spear PG.
  TITLE     Entry of alphaherpesviruses mediated by poliovirus
            receptor-related protein 1 and poliovirus receptor
  JOURNAL   Science 280 (5369), 1618-1620 (1998)
  MEDLINE   98279152
REFERENCE   3  (bases 1 to 1557)
  AUTHORS   Cocchi F, Menotti L, Mirandola P, Lopez M and Campadelli-Fiume G.
  TITLE     The ectodomain of a novel member of the immunoglobulin subfamily
            related to the poliovirus receptor has the attributes of a bona
            fide receptor for herpes simplex virus types 1 and 2 in human
            cells
  JOURNAL   J. Virol. 72 (12), 9992-10002 (1998)
  MEDLINE   99030909
COMMENT     REFSEQ: This reference sequence was derived from X76400.1.
            PROVISIONAL RefSeq: This is a provisional reference sequence
            record that has not yet been subject to human review. The final
            curated reference sequence record may be somewhat different from
            this one.
Inspect the annotation in the GenBank entry.
17
FEATURES             Location/Qualifiers
     source          1..1557
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="11q23-q24"
                     /clone_lib="cDNA in pSPORT"
     gene            1..1557
                     /gene="HVEC"
                     /note="PVRL1 HIGR PRR1 PVRR1 SK-12"
                     /db_xref="LocusID:5818"
                     /db_xref="MIM:600644"
     CDS             1..1557
                     /gene="HVEC"
                     /codon_start=1
                     /db_xref="LocusID:5818"
                     /db_xref="MIM:600644"
                     /product="herpesvirus entry mediator C (poliovirus
                     receptor-related 1 nectin)"
                     /protein_id="NP_002846.1"
                     /db_xref="PID:g4506337"
                     /db_xref="GI:4506337"
                     /db_xref="SPTREMBL:Q15223"
                     /translation="MARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVN
DSMYGFIG TDVVLHCSFANPLPSVKITQ
VTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLRP
SFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTV
MAKPTNWIEGTQAVLRAK
KGQDDKVLVATCTSANGKPPSVVSWETRLKGEARVPGDSGTPMAPVTVIS
RYRLVPSR EAHQQSLACIVNYHMDRFKE
SLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANP
PATEYHWTTLNGSLPKGVEAQNRTLFFKGPINYSLAGTYI
CEATNPIGTRSGQVEVNI
TEFPYTPSPPEHGRRAGPVPTAIIGGVAGSILLVLIVVGGIVVALRRRRH
TFKGDYST KKHVYGNGYSKAGIPQHHPP
MAQNLQYPDDSDDEKKAGPLGGSSYEEEEEEEEGGGGG
ERKVGGPHPKYDEDAKRPYFTVDEAEARQDGYGDRTLGYQ
YDPEQLDLAENMVSQNDG
18
Let's see if there is any information we can dig
up on the protein.
19
>gi|4506337|ref|NP_002846.1|pHVEC herpesvirus entry mediator C (poliovirus receptor-related 1 nectin)
MARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVNDSMYG
FIGTDVVLHCSFANPLPSVKITQVTWQKS TNGSKQNVAIYNPSMGVSVL
APYRERVEFLRPSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNL
TVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSANGKPPSVVSWETRL
KGEARVPGDSGTPMAPVTVIS RYRLVPSREAHQQSLACIVNYHMDRFKE
SLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANPPATE YHWTTLN
GSLPKGVEAQNRTLFFKGPINYSLAGTYICEATNPIGTRSGQVEVNITEF
PYTPSPPEHGRRA GPVPTAIIGGVAGSILLVLIVVGGIVVALRRRRHTF
KGDYSTKKHVYGNGYSKAGIPQHHPPMAQNLQYP DDSDDEKKAGPLGGS
SYEEEEEEEEGGGGGERKVGGPHPKYDEDAKRPYFTVDEAEARQDGYGDR
TLGYQ YDPEQLDLAENMVSQNDGSFISKKEWYV
Let's work from the FASTA format of the protein
sequence.

>ref|NP_002846.1|PHVEC herpesvirus entry mediator C (poliovirus receptor-related 1 nectin)
>gi|1082702|pir||JC4024 poliovirus receptor-related protein - human
>gi|732796|emb|CAA53980 (X76400) PRR1 Homo sapiens
Length = 518

Score = 57.0 bits (135), Expect = 6e-08
Identities = 33/119 (27%), Positives = 61/119 (50%), Gaps = 5/119 (4%)

Query: 2   VVYTDREVYGAVGSQVTLHCSFWSSEWVSDDISFTWRYQPEGGRDAISIFHYAKGQPYID 61
           VV      YG  G  V LHCSF            TW     G      I     G
Sbjct: 32  VVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMG---VS 88

Query: 62  EVGTFKERIQWVGDPSWKDGSIVIHNLDYSDNGTFTCDVKNPPDIVGKTSQVTLYVFEK 120
                 ER      PS  DG I    L   D G   C     P      SQ  L V  K
Sbjct: 89  VLAPYRERVEFL-RPSFTDGTIRLSRLELEDEGVYICEFATFP-TGNRESQLNLTVMAK 145

Myelin Membrane Adhesion Molecule is one thing we
get back.
20
And the sequence of the other hit, for HIgR:
>gi|4154346|gb|AAD04944.1 herpesvirus immunoglobulin-like receptor HIgR
MARMGLAGAAGRWWGL
ALGLTAFFLPGVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVT
WQKS TNGSKQNVAIYNPSMGVSVLAPYRERVEFLRPSFTDGTIRLSRLE
LEDEGVYICEFATFPTGNRESQLNL TVMAKPTNWIEGTQAVLRAKKGQD
DKVLVATCTSANGKPPSVVSWETRLKGEAEYQEIRNPNGTVTVISR YRL
VPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVTIEGFDGNWYLQRM
DVKLTCKADANPPATEY HWTTLNGSLPKGVEAQNRTLFFKGPINYSLAG
TYICEATNPIGTRSGQVEVNITEKPRPQRGLGSAARLL AGTVAVFLILV
AVLTVFFLYNRQQKSPPETDGAGTDQPLSQKPEPSPSRQSSLVPEDIQVV
HLDPGRQQQ QEEEDLQKLSLQPPYYDLGVSPSYHPSVRTTEPRGECP
Can you identify motifs, or highly conserved
regions, in these sequences? Try
http://www.sdsc.edu/MEME/meme/website/
What about conserved regions for Myelin and HVEC,
for which sequence homology was found?
21
Let's put HVEC and Myelin into MEME.
The following motifs are found.

DATABASE meme.30154.data (peptide)
Last updated on Tue Nov 23 05:59:31 1999
Database contains 1 sequences, 642 residues

MOTIFS meme.30154.results (peptide)
MOTIF  WIDTH  BEST POSSIBLE MATCH
-----  -----  -------------------
1      8      VYTCEFAN
2      12     ERHEQSLTCNVD
3      12     RSSQVNLNVFEK
4      8      PSWNDGSI
5      12     VSWQKRLKGEKR
22
Myelin membrane adhesion molecule, which has a
solved structure, shares Motifs 1, 3, and 4
with HVEC.
Myelin - HVEC homology

Position 76
Motifs: 4, 1, 3
p-values: 7.7e-11, 1.0e-05, 1.1e-11
Best matches: PSWNDGSI, VYTCEFAN, RSSQVNLNVFEK
Sequence: PSWKDGSIVIHNLDYSDNGTFTCDVKNPPDIVGKTSQVTLYVFEKVPTRMARMGLAGAAGRWWGLALGLTAFFLP

Position 151
Motifs: 1, 5
p-values: 1.3e-07, 3.4e-12
Best matches: VYTCEFAN, VSWQKRLKGEKR
Sequence: GVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLR

Position 226
Motifs: 4, 1, 3, 1
p-values: 1.2e-08, 5.0e-08, 8.5e-09, 5.2e-06
Best matches: PSWNDGSI, VYTCEFAN, RSSQVNLNVFEK, VYTCEFAN
Sequence: PSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSAN

Position 301
Motifs: 5, 2, 2
p-values: 4.3e-13, 4.3e-09, 2.6e-08
Best matches: VSWQKRLKGEKR, ERHEQSLTCNVD, ERHEQSLTCNVD
Sequence: GKPPSVVSWETRLKGEARVPGDSGTPMAPVTVISRYRLVPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVT
23
There is no solved structure for our sequence, so
we take the protein sequence BLAST results and
see if we turn up any hits that are solved, and
then look at those.

Sequences producing significant alignments                      Score (bits)  E Value
pdb|1NEU|  Structure Of Myelin Membrane Adhesion Molecule P0         57        4e-09
pdb|1BIH|A Chain A, Crystal Structure Of The Insect Immune ...       34        0.025
pdb|2H1P|L Chain L, The Three-Dimensional Structures Of A P...       34        0.033
pdb|1A3L|L Chain L, Catalysis Of A Disfavored Reaction An ...        33        0.056
pdb|1A4J|L Chain L, Diels Alder Catalytic Antibody Germline...       33        0.056

Lucky, Myelin Membrane Adhesion Molecule is
solved!
The region of high homology is on the outside of
the protein, perhaps a hint as to a domain
involved in some kind of interaction, maybe not.
24
Can we find a genomic clone for this sequence?
Why would we want to?

>gb|AC015907.1|AC015907 Homo sapiens clone RP11-48A13, LOW-PASS SEQUENCE SAMPLING
Length = 55317

Score = 48.1 bits (24), Expect = 0.005
Identities = 24/24 (100%)
Strand = Plus / Plus

Query: 878   gtgtggaggcccagaacagaaccc 901
Sbjct: 44103 gtgtggaggcccagaacagaaccc 44126

Maybe, only maybe, because this is probably a
repeat sequence that passed the filters (WHY?).
It might be worth trying to see what the rest of
the sequence from this Roswell Park clone looks
like, but the link is not good. Dead end?
25
Maybe we can find it by electronic PCR.
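The idea behind electronic PCR is to look in a sequence for a forward primer and, downstream of it, the site where a reverse primer would anneal, at a plausible product size. The sketch below is an exact-match toy version under that assumption; real e-PCR tools tolerate mismatches and search databases of known markers, and the function names and size limits here are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def electronic_pcr(template, forward, reverse, min_size=50, max_size=2000):
    """Return (1-based start, product size) for each exact primer-pair hit.

    The forward primer must match the template strand; the reverse
    primer must match the opposite strand downstream of it.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse)
    products = []
    start = template.find(forward)
    while start != -1:
        end = template.find(reverse_site, start + len(forward))
        while end != -1:
            size = end + len(reverse_site) - start
            if min_size <= size <= max_size:
                products.append((start + 1, size))
            end = template.find(reverse_site, end + 1)
        start = template.find(forward, start + 1)
    return products


# Toy template: two primer sites separated by 40 filler bases.
template = "TTT" + "GCGAGCGTGTGG" + "A" * 40 + "CCTGCTGGTGTTG" + "CCC"
reverse_primer = reverse_complement("CCTGCTGGTGTTG")
print(electronic_pcr(template, "GCGAGCGTGTGG", reverse_primer))
```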
26
What is missing?

Unification and integration of the analysis.
In-depth analysis.
The big picture?
Tools that work on many pieces of data at once.
Data mining.
Expression database - mRNA, proteins.
Other?

Now, a quick look at a couple of stabs at this
list!
27
Local Software Projects
BANAL - NLP/Bayesian Network analysis of Expression Arrays
ARROGANT - Optimized Expression Array Design and Analysis
X-Hyb - Looking for cross-hybridization in Expression Arrays
MAD PAD - Expression Array database and layout
Protein Molecular Dynamics - Sequence polymorphism effects on solved protein structures
SNIDE - SNP prediction
Rep-X (aka UniPOMPOUS) - simple sequence repeat polymorphism prediction
28
PANORAMA - a new server for Integrated Genomic
Sequence Analysis

Genomic sequence features visualization
Preparation for Expert System Based Analysis
GenBank (EST and non-EST) homologies
Gene prediction (GenScan)
POMPOUS
New - control / recognition sequences,
transcription factors, CpG islands, enhancers,
termination sequences; more on the way!

29

30

PANORAMA - Integrated Analysis on the WWW
BLAST, CpG islands, GenScan, Repeats, POMPOUS ...
Java soon.
31
Polymorphism prediction software

SNIDE (SNp IDEntification) - Predict high-impact,
high-probability SNPs.
POMPOUS - Prediction of polymorphic markers for
allelotyping (PNAS, June 98, Vol. 95, p. 7514-19).
Rep-X (UniPOMPOUS) - Improvement of POMPOUS code
and application to expressed gene sequences via
UniGene.

32
Rep-X (Repeat eXpansions within mRNAs)
Background on Nucleic acid repetitive elements

Repeating sequence units (microsatellites) have
been known for a long time to undergo expansion
and contraction of the base repeat unit.
Slipped-strand mispairing and unequal
recombination are thought to be responsible.
Well-known polymorphic sequence units: CA
(intervening sequence) and CAG (linked to several
neurological disorders).
Polymorphic repeat units mentioned in the
literature range from 1 to over 250 bp.
Impact of polymorphisms found in all regions:

5' UTR - Hyperandrogenaemia
CDS - Haw River Syndrome
Intron - Friedreich's Ataxia
3' UTR - Myotonic Dystrophy
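Finding candidate repeats in an mRNA is the first step of this kind of analysis. The sketch below is not the Rep-X code; it is a simple regular-expression scan, and the limits (units of 1 to 6 bases, at least 5 copies) are assumptions chosen for the example.

```python
import re


def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=5):
    """Return (1-based start, repeat unit, copy number) for each repeat.

    Only perfect tandem repeats are reported; the shortest unit that
    explains a run is preferred (CA rather than CACA).
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    repeats = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        repeats.append((m.start() + 1, unit, copies))
    return repeats


mrna = "TTGA" + "CAG" * 8 + "TTC"
print(find_tandem_repeats(mrna))
```

Each reported repeat can then be placed in the 5' UTR, CDS or 3' UTR using the annotated coding region of the entry.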
33
Reasons to study

Candidates for genetic diseases
Candidates for phenotypic variations
Polymorphism profile indicative of functional
role for protein region
Nature may use non-degenerate codon repeats for
more rapid evolutionary response to selection
pressure
Learn more about roles of peptide repeats

34
Computational Process

Download the UniGene (Unique Gene) dataset of
assembled EST sequences.
Obtain the longest, cleanest sequence for each
UniGene cluster.
Run the program on the entire UniGene database
(10/99 build, 85,639 entries).
Pick candidates for follow-up experimental study
by repeat type, location, and interest in the
gene.
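The "longest, cleanest sequence per cluster" step might look like the sketch below. The lecture does not define "cleanest", so the rule used here (discard sequences with more than 2% ambiguous bases, then keep the longest) is an assumption, as are the function name and the input layout of (cluster ID, sequence) pairs.

```python
def pick_representatives(entries, max_n_fraction=0.02):
    """Pick one sequence per cluster: the longest among the clean ones.

    entries: iterable of (cluster_id, sequence) pairs.
    A sequence is treated as clean when at most max_n_fraction of its
    bases are not A, C, G or T (an assumed definition).
    """
    best = {}
    for cluster_id, seq in entries:
        seq = seq.upper()
        if not seq:
            continue
        n_fraction = sum(base not in "ACGT" for base in seq) / len(seq)
        if n_fraction > max_n_fraction:
            continue
        if cluster_id not in best or len(seq) > len(best[cluster_id]):
            best[cluster_id] = seq
    return best


entries = [
    ("Hs.1", "ACGT" * 10),
    ("Hs.1", "ACGT" * 20),
    ("Hs.1", "N" * 50 + "ACGT" * 20),  # longest, but too many Ns
    ("Hs.2", "ACGTNACGTA"),            # 10% ambiguous, discarded
]
print({k: len(v) for k, v in pick_representatives(entries).items()})
```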

35
Example: Follow-up on 30 patient DNAs

Herpes Virus Entry Protein C AGG(8)

5'-----------start--------------------------X---------stop----------------------->3'

Variable resistance to HSV infection in
population
HSV unable to penetrate cell in C-terminal
deletion experiment (including Glu repeat)
Glu region bears homology to calcium-based
transporters
HSV unable to enter without calcium present

36
Experimental Verification Results

Out of 146 genes chosen for testing, 102
amplified and 54 of those were polymorphic (53%).
Tested on 30 patient DNA samples.

37
We can predict repeat polymorphisms, and there
are a lot of them.
We have found defective entries in UniGene that
result in overprediction of the number of genes
by 20%.
38
Data Mining - What is it? Certainly a
fashionable term.
39
and public servers are available for SQL queries
to linked data
[Diagram: linked databases - MEDLINE, GenBank, Protein Sequences, Genomes, Structures, Links]
40
Some are using simple mining methods on titles

Pieces of evidence extracted from titles of
articles in the biomedical literature (Swanson
1988, Swanson and Smalheiser 1997):

stress can lead to loss of magnesium
magnesium is a natural calcium channel blocker
high levels of magnesium inhibit SCD
magnesium can suppress platelet aggregability

stress is associated with migraines
calcium channel blockers prevent some migraines
spreading cortical depression (SCD) is implicated in some migraines
migraine patients have high platelet aggregability

led to the discovery:
magnesium deficiency may play a role in migraine
headaches
confirmed by subsequent study (Ramadan et al.
1989)
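The logic of this example is that two literatures that never cite each other share intermediate terms. The sketch below shows that linking step on the eight statements above; it is a toy keyword match, not Swanson's actual procedure, and the list of link terms is supplied by hand as an assumption.

```python
def terms_in_titles(titles, vocabulary):
    """Map each vocabulary term to the titles that mention it."""
    found = {}
    for title in titles:
        lowered = title.lower()
        for term in vocabulary:
            if term.lower() in lowered:
                found.setdefault(term, []).append(title)
    return found


def shared_links(titles_a, titles_c, vocabulary):
    """Return the terms mentioned in both sets of titles."""
    in_a = terms_in_titles(titles_a, vocabulary)
    in_c = terms_in_titles(titles_c, vocabulary)
    return sorted(set(in_a) & set(in_c))


MAGNESIUM = [
    "stress can lead to loss of magnesium",
    "magnesium is a natural calcium channel blocker",
    "high levels of magnesium inhibit SCD",
    "magnesium can suppress platelet aggregability",
]
MIGRAINE = [
    "stress is associated with migraines",
    "calcium channel blockers prevent some migraines",
    "spreading cortical depression (SCD) is implicated in some migraines",
    "migraine patients have high platelet aggregability",
]
# Hand-picked intermediate terms (an assumption for this example).
LINK_TERMS = ["stress", "calcium channel blocker", "SCD", "platelet aggregability"]

for term in shared_links(MAGNESIUM, MIGRAINE, LINK_TERMS):
    print("magnesium <->", term, "<-> migraine")
```

All four terms link the two literatures, which is the pattern that suggested the magnesium and migraine hypothesis.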

41
and there is still enormous opportunity!!

Traditional approach
one thread is followed through several databases
Result
finding A is related to sequence B and structure C

[Diagram: finding A; text databases, sequence databases, structure databases]
42
and there is still enormous opportunity!!

Directed approaches
keyword/grammar based SQL
simple data mining on titles and abstracts
Result
finding A is related to other findings of the
same data type

[Diagram: finding A; text databases, sequence databases, structure databases]
43
and there is still enormous opportunity!!

Machine learning
data mining on full texts and other biomedical
data
Result
finding A is related to other findings of the
same data type through connections found among
other datatypes

[Diagram: finding A; text databases, sequence databases, structure databases]
44
because tree overlaps may be new associations?
45
and EMILE (Perot Systems) for knowledge discovery

Entity Modeling Intelligent Learning Engine
language-independent grammar induction
cluster analysis on text to identify semantic and
syntactic clusters
clustering of biomedical data based on concepts
will use for biological knowledge base
construction
concept clustering may reveal previously
undiscovered knowledge
molecular interaction networks (MINe) to serve as
basis for static cell and dynamic cell modeling

46
EMILE results: dataset too small to learn the
language
91 PubMed abstracts; keywords: cancer, polymorphism
47
EMILE results: interesting associations are
discovered when clusters are inspected manually

EMILE realized that these are related:
94 --> Chinese
94 --> Japanese
94 --> Polish

And found a biological connection among diverse
verb use:
11 --> LOH was 105
105 --> identified in 13 cases (72
105 --> detected in 9 of 87 informative cases
(10
105 --> observed in 5 (55

48
Some vision of what is to come.

Data mining will become important given the
amount of data becoming available.
Patient/phenotypes become increasingly important
for identifying genes and their function within
existing genomic sequence.
Genome-Transcriptome-Proteome are unified.
Understanding of complex systems (humans) becomes
possible through network analysis and computers.
Novel genes will continue to be discovered as we
sequence more organisms.

49
Closing message
The intent of this set of lectures was to
introduce you to the wide variety of data and
tools that are available, mainly on the WWW, and
to encourage you to use these tools. For an
in-depth understanding of the organization of the
data and the algorithms that are the basis of the
tools, there may be a new course next year?
