Title: http://creativecommons.org/licenses/by-sa/2.0/
2 Welcome to the Canadian Bioinformatics Workshops
- Bioinformatics, 8th Edition, Vancouver BC, Feb 14-26, 2005
- Francis Ouellette
- Director, CGDN Bioinformatics Core Facility
- Director, UBC Bioinformatics Centre
3 Outline
- Instructors, schedule, other things ...
- Why does bioinformatics exist? (data, data, data)
- Will it exist forever? (some experts say no!)
- What is bioinformatics? (get 50 scientists in a room, get 50 answers)
- What are the big challenges in bioinformatics? (research; discipline differences between life sciences and computer sciences)
- Resources available to bioinformaticians
- The importance of Open Access and Open Source
5 CBW Bioinformatics Vancouver 2005
6 CBW Bioinformatics Vancouver 2005
Stefanie
7 Acknowledgement
- A great part of this talk is adapted from what Fiona Brinkman developed last year.
- Whenever you see [symbol], that slide is from Fiona's presentation.
- Other slides are simply acknowledged with names.
8 Today
- 09.00 Introduction
- 10.00 Break
- 10.15 Biology
- 11.15 UNIX
- 12.30 Lunch (on your own)
- 13.45 Biological Databases
- 15.30 Break
- 15.45 Entrez lab
- 17.15 Break
- 17.30 Public Lecture: Nat Goodman
- followed by a CBW reception
9 Administrative stuff
- Accounts on Linux machines
  - login: guest
  - password: cbw2005
- Security, badges, fire exits, food
10 Assignments and Marking Scheme
11 Canadian Bioinformatics Workshops
Bioinformatics (you are here)
www.bioinformatics.ca
12 CBW Sponsors: http://bioinformatics.ca/sponsors.php
13 Questions?
14 2004, 2001, 1998
21 Introduction: Objectives
- Why does bioinformatics exist?
- What is bioinformatics?
- What are the big challenges in bioinformatics?
  - Research
  - Discipline differences between Bio and CS
22 Why is there Bioinformatics?
Sequencing technology!
- Lots of new sequences being added
- Automated sequencers
- Genome projects
- EST sequencing
- Microarray studies
- Proteomics
- Metagenomics (the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample)
- Patterns in datasets that can only be analyzed using computers
23 Need for informatics in biology: origins
- Gramicidin S (Consden et al., 1947), partial insulin sequence (Sanger and Tuppy, 1951)
- 1961: tRNA fragments
- Francis Crick, Sydney Brenner, and colleagues propose the existence of transfer RNA that uses a three-base code and mediates the synthesis of proteins (Crick et al., 1961. General nature of the genetic code for proteins. Nature 192: 1227-1232). In Microbiology: A Centenary Perspective, edited by Wolfgang K. Joklik, ASM Press, 1999, p. 384.
- First codon assignment: UUU/Phe (Nirenberg and Matthaei, 1961)
24 Need for informatics in biology: origins
- The key to the whole field of nucleic acid-based identification of microorganisms: the introduction of molecular systematics using proteins and nucleic acids by the American Nobel laureate Linus Pauling. Zuckerkandl, E., and L. Pauling. 1965. "Molecules as Documents of Evolutionary History." Journal of Theoretical Biology 8: 357-366.
- Another landmark: nucleic acid sequencing (Sanger and Coulson, 1975)
25 Need for informatics in biology: origins
- Early databases: Dayhoff, 1972; Erdmann, 1978. The Atlas of Protein Sequences was available on digital tape and, in 1980, by modem (300 baud).
- Early programs: restriction enzyme sites, pattern finding, promoters, etc. (circa 1978)
- 1978-1993: Nucleic Acids Research published supplemental information
- 1982: DDBJ/EMBL/GenBank are created as public repositories of genetic sequence information.
- 1983: NIH funds the PIR (Protein Information Resource) database.
- 1988: Pearson and Lipman create FASTA
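The early programs above searched sequences for short patterns such as restriction enzyme sites. A minimal Python sketch of that kind of search, assuming a tiny illustrative enzyme table (EcoRI, BamHI and HindIII with their standard recognition sequences) and a scan of the given strand only; `find_sites` is a hypothetical helper, not one of the historical programs:

```python
import re

# Recognition sequences of three well-known restriction enzymes.
# Illustrative only: real tools draw on full enzyme databases.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(dna: str, enzymes: dict[str, str] = ENZYMES) -> dict[str, list[int]]:
    """Return the 0-based start position of every recognition site in `dna`.

    Only the given strand is scanned; that is enough for the palindromic
    sites above, which read the same on both strands.
    """
    dna = dna.upper()
    hits: dict[str, list[int]] = {}
    for name, site in enzymes.items():
        # A zero-width lookahead also reports overlapping matches.
        pattern = re.compile(f"(?={re.escape(site)})")
        hits[name] = [m.start() for m in pattern.finditer(dna)]
    return hits


if __name__ == "__main__":
    for enzyme, positions in find_sites("ttGAATTCaaGGATCCggGAATTC").items():
        print(enzyme, positions)
```

For the toy sequence this reports EcoRI at positions 2 and 18, BamHI at 10, and no HindIII site. A non-palindromic site would also need a scan of the reverse complement.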
26 Genomes
Year   Sequence                       Number of base pairs
1971   First published DNA sequence   12
1977   PhiX174                        5,375
1982   Lambda                         48,502
1992   Yeast Chromosome III           316,613
1995   Haemophilus influenzae         1,830,138
1996   Saccharomyces                  12,068,000
1998   C. elegans                     97,000,000
2000   D. melanogaster                120,000,000
2001   H. sapiens (draft)             2,600,000,000
2003   H. sapiens                     2,850,000,000
27 Fr
History of DNA Sequencing
From Eric Green
Adapted from Messing & Llaca, PNAS (1998)
28
- GenBank doubles every 14 months (from the National Center for Biotechnology Information)
- That is a shorter doubling time than Moore's law (computer power doubling every 20 months!)
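To see what the two doubling times imply, a quick sketch of the arithmetic (fold increase = 2^(months / doubling time)), using the 14- and 20-month figures quoted on the slide:

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Fold increase after `months` for something that doubles every `doubling_time_months`."""
    return 2 ** (months / doubling_time_months)


GENBANK_DOUBLING_MONTHS = 14  # figure quoted on the slide
MOORE_DOUBLING_MONTHS = 20    # figure quoted on the slide

if __name__ == "__main__":
    for years in (1, 5, 10):
        months = 12 * years
        data = growth_factor(months, GENBANK_DOUBLING_MONTHS)
        compute = growth_factor(months, MOORE_DOUBLING_MONTHS)
        print(f"{years:2d} years: GenBank x{data:.1f}, computer power x{compute:.1f}")
```

Over five years that is roughly a 19.5-fold increase in GenBank against an 8-fold increase in computer power, so the data grow faster than the hardware available to analyze them.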
29
- The next step is to locate all of the genes and regulatory regions, describe their functions, and identify how they differ between groups (e.g. disease vs. healthy): bioinformatics plays a critical role
32 Bioinformatics will help with: Similarity Searching in Sequence Databases
- What is similar to my sequence?
- Searching gets harder as the databases get bigger, and as their quality changes
- Tools: BLAST and FASTA, time-saving heuristics (approximate methods)
- Statistics + informed judgement of the biologist
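BLAST and FASTA save time by first looking for short exact word matches ("seeds") shared by the query and each database sequence, and only then doing more expensive alignment around them. A minimal Python sketch of just that seeding step, assuming exact k-letter words (k = 3 here, an arbitrary choice) with no scoring matrix, extension or statistics, so it illustrates the idea and is not BLAST or FASTA:

```python
from collections import Counter, defaultdict
from typing import Optional


def index_words(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in `seq` to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every exact k-word shared by both sequences."""
    subject_index = index_words(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        # .get avoids adding empty entries to the defaultdict
        for s in subject_index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def best_diagonal(hits: list[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Return (diagonal, number_of_hits) for the diagonal holding the most seeds."""
    counts = Counter(s - q for q, s in hits)
    return counts.most_common(1)[0] if counts else None


if __name__ == "__main__":
    hits = word_hits("MKTAYIAK", "GGMKTAYIAKGG")
    print(hits)
    print(best_diagonal(hits))
```

Hits that fall on the same diagonal (subject position minus query position) point to an ungapped region worth extending; FASTA, for example, looks for diagonals with a high density of such matches.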
33 Bioinformatics will help with: Structure-Function Relationships
- Can we predict the function of protein molecules from their sequence?
- sequence → structure → function
- Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.)
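A classic example of predicting a simple structural feature from sequence alone is flagging possible membrane-spanning segments with a hydropathy scale. A minimal sketch using the Kyte-Doolittle scale with a 19-residue window and a mean of 1.6 as the cutoff (values commonly quoted for this scale, treated here as assumptions); it is a rough screen, not a modern transmembrane predictor, and the toy sequence is hypothetical:

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982); positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers residues i .. i + window - 1.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """Start positions of windows hydrophobic enough to be possible membrane-spanning segments."""
    return [i for i, h in enumerate(hydropathy_profile(protein, window)) if h > cutoff]


if __name__ == "__main__":
    toy = "KKKKK" + "L" * 20 + "KKKKK"  # hypothetical: charged ends, hydrophobic core
    print(candidate_tm_windows(toy))
```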
34 Bioinformatics will help with: Phylogenetics
- Can we define evolutionary relationships between organisms by comparing DNA sequences?
- What is the molecular clock?
- Lots of methods and software: what is the "correct" analysis?
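The slide leaves the choice of method open. As one simple, standard example (not necessarily a method used in the course), here is a sketch that turns two aligned DNA sequences into an evolutionary distance with the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. It assumes equal-length, gap-free, pre-aligned sequences:

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return mismatches / len(seq1)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance: estimated substitutions per site, corrected for multiple hits."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("too divergent: Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTACGTAA"  # one difference in ten sites
    print(p_distance(a, b), jukes_cantor(a, b))
```

The correction matters because a site can change more than once, so the observed p underestimates the number of substitutions; under this model p approaches 0.75 as sequences become effectively unrelated.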
35 Top 10 Future Challenges for Bioinformatics
- Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome
- Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue
- Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli
- Determining effective protein-DNA, protein-RNA and protein-protein recognition codes
- Accurate ab initio protein structure prediction
- Rational design of small molecule inhibitors of proteins
- Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
- Mechanistic understanding of speciation: molecular details of how speciation occurs
- Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein
- Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education
Chris Burge, Ewan Birney, Jim Fickett. Genome Technology, issue No. 17, January 2002
36 What is Bioinformatics?
37 Bioinformatics is about understanding how life works. It is a hypothesis-driven science.
38 Bioinformatics is about integrating biological themes together with the help of computer tools and biological databases, and gaining new knowledge from this.
39 BLAST Result
- Basic
- Local
- Alignment
- Search
- Tool
40 PubMed Text Neighboring
- Common terms could indicate similar subject matter
- Statistical method
- Weights based on term frequencies within a document and within the database as a whole
- Some terms are better than others
From Mark Boguski
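The idea of weighting terms by how often they occur in a document and how rare they are across the whole database can be illustrated with tf-idf weights and cosine similarity. This is a generic textbook scheme swapped in for illustration, assuming simple whitespace tokenization and hypothetical toy abstracts; it is not the algorithm PubMed uses for its related articles:

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Weight = count of the term in the document x log(N / number of documents containing it)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbors(docs: list[str], i: int) -> list[tuple[float, int]]:
    """Rank every other document by similarity to document i, most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[i], vectors[j]), j) for j in range(len(docs)) if j != i]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    abstracts = [  # hypothetical toy "abstracts"
        "insulin receptor signaling in liver",
        "insulin receptor mutations in diabetes",
        "ribosome structure in bacteria",
    ]
    print(neighbors(abstracts, 0))
```

A word such as "in" that occurs in every abstract gets an idf of log(1) = 0 and contributes nothing, which is one concrete sense in which some terms are better than others.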
41 Micro-array analysis
Science, Jan 1 1999: 83-87. The Transcriptional Program in the Response of Human Fibroblasts to Serum. Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross, Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent, Louis M. Staudt, James Hudson Jr., Mark S. Boguski, Deval Lashkari, Dari Shalon, David Botstein, Patrick O. Brown
(Figures 1 and 4 from the paper)
42 VAST Result
- Vector
- Alignment
- Search
- Tool
Ferredoxin: Halobacterium marismortui, Chlorella fusca
43 Computational Biology Analysis
Q  Gln  side chain: -CH2-CH2-C(=O)-NH2
R  Arg  side chain: -CH2-CH2-CH2-NH-C(=NH)-NH2
45 Structural Interactions
Other interactions occurring within this structure (blue). In this case, glutaminyl-tRNA synthetase interacting with AMP.
From Chris Hogue
46 What does it mean to do CB?
- Like to work with sequences, structures, expression arrays, interactions of molecules and genetic maps
- Like the whole systems approach
- Like the IT component, and the power it provides for crunching through lots of data
- Like clear answers
- Like to do science
47 Doing CB means to be
- Database user
- Tool user
- Database developer
- Tool developer
- Training, practicing or developing
- Doing bioinformatics experiments
48 Bioinformatics experiments
(Figure: Sequence, BLAST search, Alignment)
- Reagents
  - Sequence
  - Databases
- Method
  - P-P: BLASTP
  - N-P: BLASTX
  - P-N: TBLASTN
  - N-N: BLASTN
  - N (P)-N (P): TBLASTX
- Interpretation
  - Similarity
  - Hypothesis testing
Know your reagents
Know your methods
Do your controls
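The Method list above can be read as a lookup from query type and database type to a BLAST program. A small sketch of that lookup; the lowercase names match the NCBI BLAST+ executables, but `choose_blast_program` itself is a hypothetical helper:

```python
# Query/database combinations from the slide's Method list
# (P = protein, N = nucleotide). Lowercase names are the BLAST+ executables.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",       # query translated in six frames
    ("protein", "nucleotide"): "tblastn",      # database translated in six frames
    ("nucleotide", "nucleotide"): "blastn",
}


def choose_blast_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Pick a BLAST program for the given query and database molecule types.

    `translate_both` requests a translated-vs-translated search (tblastx),
    which only makes sense for a nucleotide query against a nucleotide database.
    """
    key = (query_type.lower(), db_type.lower())
    if translate_both:
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[key]
    except KeyError:
        raise ValueError(f"unknown sequence types: {query_type!r}, {db_type!r}") from None


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
```

tblastx is usually the most computationally expensive choice, since both the query and the database are translated in six frames.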
49 Bioinformatics Citizenship: What it means!
Nature 409: 452
50 The job of the biologist is changing
- As more biological information becomes available:
- The biologist will spend more time using computers
- The biologist will spend more time on data analysis
- Biology will become a more quantitative science (think how the periodic table and atomic theory affected chemistry)
- Bioinformatics will disappear as a field of research (Lincoln Stein)
- Bioinformatics will be part of all life sciences and many computer science courses (Francis Ouellette)
51 The challenge: Putting it all together
- The current state of the art requires the biologist to jump around from Web to mainframe to personal computer
- The trend is toward integration
- Real power: being able to use and customize all resources
52 Data to integrate
- Genomic sequences
- mRNA sequences
- Protein sequences
- Protein structures
- Protein function
- Biomolecular interactions
- Gene expression
- DNA Polymorphisms
- Taxonomic data
- Molecular pathways
- Genetic networks
- Bibliographic data
- Populations
- Evolution
53 The Computer Scientist in the Age of Genomics
54 How much biology to understand?
- Increasing sophistication required of computational biologists in terms of biological knowledge
- What knowledge is important? What about all those exceptions?
- What problems are important?
55 What computational tools to understand?
- Perl is used extensively in bioinformatics, but so are many other languages.
- Open source is prevalent in bioinformatics (Bioconductor, BLAST, Linux, MySQL, BioPerl).
- Need to be knowledgeable about both the standard bioinformatics algorithms and the common tools that are based on them.
- Appreciate the different databases and programs out there and what their benefits and shortcomings are: databases have widely varying quality and are not all alike.
- There is some database geopolitics out there!
56 High-quality bioinformatics research: excellent communication between biologists and computer scientists is key
57 The computer scientist and biologist compared
- Computer scientist
- Logic
- Problem-solving
- Process-oriented
- Algorithmic
- Optimizing
- Biologist
- Knowledge gathering
- Experimentally-focused
- Exceptions are as common as rules
- Describe work as a story
- Develop conclusions and models
58 Comp Sci vs Bio
- The result:
  - see the world differently
  - ask different questions
  - come to problems with different assumptions
  - pick up on different details
  - use different metaphors to organize knowledge
  - have different sets of analytical tools at their disposal
  - can even interact with people differently
- Coming together
  - Communicate constantly!
  - Gain a better understanding of different ways of thinking
  - Try communicating in different ways
  - Remember there are others: statisticians, mathematicians, engineers, physicists, chemists, physiologists.
59 http://bioinformatics.ubc.ca/
60 UBiC Links Directory
- Curated list of links to bioinformatics tools
available worldwide
SCIENCE 16 April 2004
61 bioinformatics.ubc.ca/resources/links_directory
63 Open Source and Open Access
- Making It Work with Open Source and Open Access
- What does it mean?
- Will it cause the economy to go bust?
- Is it too good to be true?
- What if I get scooped?
64 Open Source: what does it mean?
- Open source: any software whose code is available for users to look at and modify freely. Linux is the best-known example; others include Apache, the dominant software for servers that provide web pages worldwide.
65 Open Source in the life sciences
- Present in all areas of bioinformatics
- Some very well-known examples of tools used in industry and academic circles include:
- BLAST
- EMBOSS
- EnsEMBL
- MLagan
- GenScan
- Bioconductor
66 http://bioinformatics.ubc.ca/resources/tools/
67 Open Access
- Unrestricted access to data
- Allows all to work and make discoveries
- Discoveries are not necessarily open access
- Open access is applicable to any kind of data you want to apply it to:
- Sequence data (DNA, RNA or protein)
- Gene expression data
- Protein-protein interaction data
- Publications
68 Open access is critical to progress in science
- Without GenBank and other public sequence databases:
- There would be no BLAST
- There would be no diagnostic DNA testing
- There would be no understanding of the human genome (there probably would not have been a human genome to work on in the first place).
70 Open Access of Publications
- We are way overdue to break down the ivory towers that surround a few journals that are allowed to hide data from everybody who does not pay them! Who did the work? Taxpayers' dollars!
- There are enough good (read: well-reviewed) journals out there now that we need not publish in closed journals.
- We will need to get rid of the old guard that only wants to publish in Science, Nature and Cell.
- I think these journals will change: physicists have been doing this for decades, and biologists will figure it out soon.
- We need the reagents to make discoveries on the text data: we have diseases to understand and cures to find!
74 http://bioinformatics.ubc.ca/ouellette/
78 Questions?
- Open access?
- Open source?
- Where the washrooms are?
- CBW Bioinformatics?
- CBW Proteomics? Genomics? Tools?
- Graduate program in Bioinformatics at UBC?
- When do we start?