Title: Bioinformatics: Data to biological knowledge in a mouse click
1. Bioinformatics: Data to biological knowledge in a mouse click
Rinku Saha
Biomedical Informatics Team
UAMS
2. Biology vs bioinformatics
Biology is now a science in transition, changing rapidly from a data-poor to a data-rich science.
Traditional biology: relies on time-consuming wet-lab data, and analysis is a repetitive process.
Bioinformatics: is fast, because it draws on resources from several datasets at once.
Fig.: Referenced from "Bioinformatics from data to biological knowledge" by Dena Leshkowitz, Ph.D., Bioinformatics Unit, Hebrew University
3. Life is a library of sequences
Crick and Watson, a pair of young researchers in 1950s Cambridge, discovered the structure of DNA. In the mid-1950s Frederick Sanger was the first to establish the order of amino acids of a protein, the hormone insulin. Progress in determining protein sequences was slow until the mid-1970s, when the same Frederick Sanger (amongst others) developed methods for the rapid sequencing of DNA. The ability to sequence DNA led rapidly to an increase in the number of protein sequences resolved.
Central databases were established in Europe, the USA and Japan to collect this sequence information from individual scientists and make it available to other researchers.
The Human Genome Project is a decade-long endeavor whose benefits come in the form of the gene sequences that emerge from it. It has been followed by a series of projects, some still continuing, that have succeeded in sequencing the genomes of other organisms, including some bacteria, yeast, invertebrates, rat and mouse. The increased quantity of data will lead to a better understanding of the way genes and their protein products work, and thus help us develop better methods for dealing with the diseases that arise when the processes that control life go wrong.
Then came microarray technology, which allows simultaneous measurement of expression levels for up to tens of thousands of genes and so helps us examine complex biological interactions simultaneously.
Now protein arrays are rapidly becoming established as a powerful means to detect proteins, monitor their expression levels, and investigate protein interactions and functions. Protein arrays make possible the parallel multiplex screening of thousands of interactions, encompassing protein-antibody, protein-protein, protein-ligand or protein-drug and enzyme-substrate screening, as well as multianalyte diagnostic assays, in the chip format.
4. Data Explosion
- Human Genome Project
- Microarray and Protein Array
- Human Genome Project: working-draft assembly completed in 2000
- The challenge: terabytes of data, and their annotation and representation
Fig. Ref: "Bioinformatics from data to biological knowledge", Dena Leshkowitz, Ph.D., Bioinformatics Unit, Hebrew University
5. What do we do with such a huge volume of data?
Bioinformatics solutions:
- Develop fast applications to analyze the data.
- Develop databases and software to store the data, enter new data and query the data (NCBI, EMBL, etc.).
- Design data structures to represent this information.
Ref: "Bioinformatics from data to biological knowledge", Dena Leshkowitz, Ph.D., Bioinformatics Unit, Hebrew University
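The three solutions above (store, enter, query) can be sketched in a few lines of Python. This is only an illustration: the class names SequenceRecord and SequenceStore, the accession numbers and the sequences are all made up for this example and do not correspond to NCBI or EMBL formats.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """One entry: an identifier, a free-text description and the sequence."""
    accession: str          # hypothetical identifier, not a real accession
    description: str
    sequence: str
    annotations: dict = field(default_factory=dict)

    def __len__(self):
        return len(self.sequence)


class SequenceStore:
    """Tiny in-memory 'database': store records, add new ones, query them."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        # Enter new data, keyed by accession
        self._records[record.accession] = record

    def get(self, accession):
        # Exact lookup; returns None when the accession is unknown
        return self._records.get(accession)

    def search(self, keyword):
        # Case-insensitive keyword query over the descriptions
        keyword = keyword.lower()
        return [r for r in self._records.values()
                if keyword in r.description.lower()]


store = SequenceStore()
store.add(SequenceRecord("DEMO0001", "hypothetical insulin-like peptide", "MALWMRLLPL"))
store.add(SequenceRecord("DEMO0002", "hypothetical kinase fragment", "MGSNKSKPKD"))
print([r.accession for r in store.search("insulin")])
```

Real sequence databases add indexing, cross-references and curated annotation on top of this basic idea.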
6. Nucleotides and Bioinformatics
Question: What would you do when you discover an unknown DNA fragment in a gel, which you have had sequenced?
Answer: Bioinformatics tools for DNA sequence and genomic analysis
- Vector sequence check: using Blast2Evec or NCBI VecScreen, etc.
- Restriction mapping: using REBASE, ResMap, Restriction Analysis, etc.
- Design PCR primers: using Oligo, Primer3, BioMath, PrimerStation, etc.
- Analyze DNA composition: using RepeatMasker, EMBOSS tools such as chips and compseq, etc.
- Identify coding regions and translation: ORF Finder, Genie, Translate tool, Transeq, etc.
- Motif identification: SMART, ProfileScan
- Identification of signals associated with gene regulation: GrailEXP
- Similarity searching for identifying a probable functional role: tblastn, megablast, psi-blast (NCBI)
- Genome search to identify similar regions in a wider range of organisms: GenomeScan, SNP (NCBI)
Important Links
- http://restools.sdsc.edu/biotools/biotools16.html
- http://www.humgen.nl/primer_design.html
- http://ccb.ucmerced.edu/app/?idemboss
- http://www.123genomics.com/files/analysis.html
- http://www.dnalc.org/bioinformatics/dnalc_nucleotide_analyzer.htm
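To give a feel for what several of the tools above do under the hood, here is a minimal Python sketch of three of the steps: base composition, restriction-site search and ORF translation. It is not how REBASE, EMBOSS or ORF Finder are implemented. The fragment is invented, the function names are illustrative, only the forward strand is scanned, and EcoRI (GAATTC) is used simply as a well-known recognition site.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(seq, site="GAATTC"):
    """0-based start positions of a recognition site (default: EcoRI)."""
    seq = seq.upper()
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def longest_orf(seq):
    """Longest Met-to-stop reading frame on the forward strand, as protein."""
    best = ""
    for frame in range(3):
        protein = translate(seq[frame:])
        # Drop the last piece: it is not terminated by a stop codon
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


fragment = "CCGAATTCATGGCTAAAGGTTGACC"   # invented example fragment
print(gc_content(fragment), find_sites(fragment), longest_orf(fragment))
```

A real analysis would also scan the reverse complement and use the curated enzyme list from REBASE rather than a single hard-coded site.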
7. RNA and Bioinformatics
Question: DNA to RNA
Analysis you can perform using bioinformatics tools:
- Detect tRNA and tmRNA in a nucleotide sequence: ARAGORN
- RNA secondary structure prediction: RNAView Secondary Structure Viewer, RNAmine, RNA Fold Server
Important Links
- http://www.bioinfo.rpi.edu/applications/mfold/
- http://rnamine.ncrna.org/rnamine/
- http://phmmts.ncrna.org/phmmts/jsp/mainIndex.jsp?pageRefphmmts
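As a rough illustration of the DNA-to-RNA step and of what "secondary structure prediction" means, the sketch below transcribes a coding strand and then counts the maximum number of nested base pairs with a Nussinov-style dynamic program. This is a deliberately simplified stand-in: servers such as mfold use energy minimization rather than pair counting. The function names, the example sequence and the minimum hairpin loop of 3 are assumptions made for this example.

```python
def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


# Watson-Crick pairs plus the G-U wobble pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recursion).

    A pair (k, j) is allowed only if at least min_loop bases lie between
    k and j, so hairpin loops cannot be unrealistically tight.
    """
    n = len(rna)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case: base j is unpaired
            for k in range(i, j - min_loop):       # case: base j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


rna = transcribe("GGGAAATCCC")   # invented hairpin-like example
print(rna, max_base_pairs(rna))
```

The pair count is only a crude proxy for stability; it is shown here because the recursion is short enough to read on a slide.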
8. Gene Expression
Facts
- Every cell of the body contains a full set of chromosomes and identical genes.
- Only a fraction of these genes are turned on, however, and it is the subset that is "expressed" that confers unique properties to each cell type.
- During transcription, the information contained within the DNA, the repository of genetic information, is copied into messenger RNA (mRNA) molecules.
- mRNA molecules are then translated into the proteins that perform most of the critical functions of cells.
- Scientists now study the kinds and amounts of mRNA produced by a cell to learn which genes are expressed (using microarrays), which in turn provides insights into how the cell responds to its changing needs.
Question: What would you do to manage huge datasets and analyze microarray images and data?
The bioinformatics way: databases to store the data and applications to analyze it
- Management of microarray data: AMAD, BASE
- Image and data analysis: R, Bioconductor, dChip, Affymetrix, ScanAlyze, clustering, etc.
- Data annotation: NetAffx, DAVID, Onto-Express, GenMAPP
Important Links
- http://genome-www5.stanford.edu/
- http://staffa.wi.mit.edu/chipdb/public/
- http://info.med.yale.edu/microarray/data_analysis.htmkeckwks
- http://david.abcc.ncifcrf.gov/
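One of the simplest calculations behind microarray data analysis is the log2 ratio of a gene's signal between two conditions. The sketch below assumes already-normalized intensities. The gene names, the numbers, the pseudocount of 1 and the two-fold cutoff are all invented for illustration and are not taken from any of the packages listed above.

```python
import math


def log2_ratios(treated, control, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount avoids division by zero."""
    return {gene: math.log2((treated[gene] + pseudocount) /
                            (control[gene] + pseudocount))
            for gene in treated if gene in control}


def differentially_expressed(ratios, threshold=1.0):
    """Genes whose expression changed at least 2**threshold-fold, up or down."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)


# Hypothetical normalized spot intensities for three genes
control = {"geneA": 99.0, "geneB": 199.0, "geneC": 49.0}
treated = {"geneA": 399.0, "geneB": 199.0, "geneC": 11.5}

ratios = log2_ratios(treated, control)
print(ratios)
print(differentially_expressed(ratios))
```

In practice a fold-change cutoff is combined with replicate measurements and a statistical test, which is where tools such as R and Bioconductor come in.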
9. Protein and Bioinformatics
Facts
- Amino acids are strung together in particular sequences that fold up into a specific structure.
- Each protein is a nanomachine that can perform a particular task.
- By understanding how different proteins fold up and how they work, we can begin to understand how they work together to make up a cell or play a role in disease.
> mdlsavriqe vqnvlhamqk ilecpiclel ikepvstqcd
hifckfcmlk llnqkkgpsq cplckneitk rslqgsarfs
qlveellkii dafeldtgmq cangfsfskk knsssellne
dasiiqsvgy rnrvkklqqi esgsatlkds lsvqlsnlgi
vrsmkknrqt qpqnksvyia lesdsseerv napdgcsvrd
qelfqiapgg agdegklnsa kkaacdfseg
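A sequence printed in blocks of ten letters, like the one above, has to be cleaned before a program can use it. The sketch below strips the spacing and reports amino acid composition, using only the first two blocks of the sequence as input. The function names are illustrative.

```python
from collections import Counter


def clean_sequence(text):
    """Keep only letters and upper-case them (drops spaces, digits, line breaks)."""
    return "".join(ch for ch in text if ch.isalpha()).upper()


def composition(seq):
    """Fraction of each amino acid present in the sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


raw = "mdlsavriqe vqnvlhamqk"      # first two blocks of the sequence above
seq = clean_sequence(raw)
print(seq, len(seq))
print(composition(seq))
```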
Question: What would you do when you discover an unknown protein band in a 2-D gel?
Answer: Bioinformatics tools for protein and proteomic analysis
- Determine the mass and molecular weight: Compute pI/Mw, ProFound, SearchXlinks, Mascot, X-proteo, etc.
- Primary sequence analysis: blastp (NCBI), etc.
- Multiple alignment to identify the conserved regions of identity between a set of sequences selected from the BLAST results: Clustalw, multialign, etc.
- Pattern search using conserved regions for a probable functional role: Prosite, Pfam, PRINTS, InterPro (Expasy), etc.
- Post-translational modification prediction: ChloroP, LipoP, PATS, SignalP, etc.
- Secondary structure prediction/threading: PredictProtein, GOR, Jnet, Threader, GenTHREADER, etc.
- Homology modelling: DALI, Modeller, SCRAWL, etc.
- Visualization: RasMol, Swiss-PDB Viewer, etc.
- Molecular dynamics and structure, quantum mechanics for real-life prediction: Amber, Gaussian
- Phylogenetic tree construction: Phylip
Important links
- http://expasy.org/
- http://www.compbio.dundee.ac.uk/
- http://www.ebi.ac.uk/
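Two of the steps above are simple enough to sketch directly: estimating molecular weight from a sequence, and spotting fully conserved columns in an alignment. The residue masses below are the commonly tabulated average values, rounded, so treat the result as approximate. The function names and the three aligned sequences are invented for illustration, and tools such as Compute pI/Mw or Clustalw do considerably more than this.

```python
# Average residue masses in daltons (amino acid minus one water), rounded
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(seq):
    """Approximate average mass of an unmodified peptide, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def conserved_columns(aligned):
    """0-based alignment columns where every sequence has the same residue."""
    return [i for i, column in enumerate(zip(*aligned))
            if len(set(column)) == 1 and column[0] != "-"]


# Hypothetical, already-aligned sequences ('-' marks a gap)
alignment = ["MKV-LA",
             "MRVQLA",
             "MKVQLG"]

print(round(molecular_weight("GG"), 2))   # glycylglycine, for a sanity check
print(conserved_columns(alignment))
```

Post-translational modifications, disulfide bonds and isotopic composition all shift the real mass, which is why the result is matched against a gel band or spectrum only within a tolerance.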
10. The underlying assumption used ...
- Mapping protein formation of a novel sequence to:
  - infer cellular metabolism
  - infer probable evolutionary trends, both past and future
- Develop near-perfect disease-controlling and preventive therapeutics
11. A Short List of Bioinformatics Databases
Ref: http://www.biw.kuleuven.be/vakken/i287/bioinformatica.htm
12. The most amazing question
What is Bioinformatics?
13. Answer
Bioinformatics now occupies a fundamental role in modern biology, chemistry, genetics and systems biology, enabling and accelerating the path to biological discoveries and the understanding of systems. (Ref: http://bioinformatics.ubc.ca)
or
Using computational systems, software applications and database solutions.
Ref: http://bioinformatics.ubc.ca/research/talks/archive/LIBR534_061003_jfox.pdf
14. Important Links
- http://bioinformatics.ubc.ca/about/what_is_bioinformatics
- http://binf.gmu.edu/websites.html
- http://bioinformatics.unr.edu/seqbx/tutorials.htm
- http://www.sacs.ucsf.edu/Resources/biolinks.html
- http://www.biw.kuleuven.be/vakken/i287/bioinformatica.htmPrimary20DB