http://creativecommons.org/licenses/by-sa/2.0/ - PowerPoint PPT Presentation

Slides: 79
Provided by: FrancisO6
Transcript and Presenter's Notes

1
http://creativecommons.org/licenses/by-sa/2.0/
2
Welcome to the Canadian Bioinformatics Workshops
  • Bioinformatics, 8th Edition, Vancouver BC, Feb 14-26, 2005
  • Francis Ouellette,
  • Director, CGDN Bioinformatics Core Facility
  • Director, UBC Bioinformatics Centre

3
Outline
  • Instructors, schedule, other things ...
  • Why does bioinformatics exist? (data, data, data)
  • Will it exist forever? (some experts say no!)
  • What is bioinformatics? (get 50 scientists in a
    room, get 50 answers)
  • What are the big challenges in bioinformatics?
    (Research discipline differences between life
    sciences and computer sciences)
  • Resources available to Bioinformaticians
  • The importance of Open Access and Open Source

5
CBW Bioinformatics Vancouver 2005
6
CBW Bioinformatics Vancouver 2005
Stefanie
7
Acknowledgement
  • A great part of this talk is adapted from what
    Fiona Brinkman developed last year.
  • Whenever you see [symbol], this means that the
    slide is from Fiona's presentation.
  • Other slides are simply acknowledged with names.

8
Today
  • 09.00 Introduction
  • 10.00 Break
  • 10.15 Biology
  • 11.15 UNIX
  • 12.30 Lunch (on your own)
  • 13.45 Biological Databases
  • 15.30 Break
  • 15.45 Entrez lab
  • 17.15 Break
  • 17.30 Public Lecture: Nat Goodman
  • followed by a CBW reception

9
Administrative stuff
  • Accounts on Linux machines
  • login guest
  • password cbw2005
  • Security, badges, fire exits, food

10
Assignments and Marking Scheme
11
Canadian Bioinformatics Workshops
Bioinformatics
You are here
www.bioinformatics.ca
12
CBW Sponsors
http://bioinformatics.ca/sponsors.php

13
Questions?
14
2004
2001
1998
21
Introduction - Objectives
  • Why does bioinformatics exist?
  • What is bioinformatics?
  • What are the big challenges in bioinformatics?
  • Research
  • Discipline differences between Bio and CS

22
Why is there Bioinformatics?
Sequencing technology!
  • Lots of new sequences being added
  • Automated sequencers
  • Genome Projects
  • EST sequencing
  • Microarray studies
  • Proteomics
  • Metagenomics (the functional and sequence-based
    analysis of the collective microbial genomes
    contained in an environmental sample)
  • Patterns in datasets that can only be analyzed
    using computers

23
Need for informatics in biology: origins
  • Gramicidin S (Consden et al., 1947), partial
    insulin sequence (Sanger and Tuppy, 1951)
  • 1961: tRNA fragments
  • Francis Crick, Sydney Brenner, and colleagues
    propose the existence of transfer RNA that uses a
    three base code and mediates in the synthesis of
    proteins (Crick et al., 1961. General nature of
    the genetic code for proteins. Nature 192: 1227-1232).
    In Microbiology: A Centenary Perspective, edited
    by Wolfgang K. Joklik, ASM Press, 1999, p. 384
  • First codon assignment: UUU/Phe (Nirenberg and
    Matthaei, 1961)

24
Need for informatics in biology: origins
  • The key to the whole field of nucleic acid-based
    identification of microorganisms: the
    introduction of molecular systematics using
    proteins and nucleic acids by the American Nobel
    laureate Linus Pauling. Zuckerkandl, E., and
    L. Pauling. "Molecules as Documents of
    Evolutionary History." 1965. Journal of
    Theoretical Biology 8: 357-366
  • Another landmark: nucleic acid sequencing (Sanger
    and Coulson, 1975)

25
Need for informatics in biology: origins
  • Early databases: Dayhoff, 1972; Erdmann, 1978.
    The Atlas of Protein Sequences was available on
    digital tape, and in 1980, by modem (300 baud).
  • Early programs: restriction enzyme sites, pattern
    finding, promoters, etc., circa 1978.
  • 1978-1993: Nucleic Acids Research published
    supplemental information
  • 1982: DDBJ/EMBL/GenBank are created as a public
    repository of genetic sequence information.
  • 1983: NIH funds the PIR (Protein Information
    Resource) database.
  • 1988: Pearson and Lipman create FASTA

26
Genomes

  Year  Genome                          Number of base pairs
  ----  ------------------------------  --------------------
  1971  First published DNA sequence                      12
  1977  PhiX174                                        5,375
  1982  Lambda                                        48,502
  1992  Yeast Chromosome III                         316,613
  1995  Haemophilus influenzae                     1,830,138
  1996  Saccharomyces                             12,068,000
  1998  C. elegans                                97,000,000
  2000  D. melanogaster                          120,000,000
  2001  H. sapiens (draft)                     2,600,000,000
  2003  H. sapiens                             2,850,000,000

27
History of DNA Sequencing
From Eric Green
Adapted from Messing & Llaca, PNAS (1998)
28
  • GenBank doubles every 14 months

(from the National Center for Biotechnology
Information)
Shorter than Moore's law (computer power doubling
every 20 months!)
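
To put the two doubling times side by side, here is a small worked calculation. It is only a sketch: it assumes steady exponential growth at the 14-month and 20-month rates quoted on this slide, and the name growth_factor is ours, not something from the talk.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Return how many times larger a quantity becomes after `months`,
    assuming it doubles every `doubling_time_months` (steady growth)."""
    return 2.0 ** (months / doubling_time_months)


# Doubling times as quoted on the slide, in months.
GENBANK_DOUBLING = 14.0
COMPUTE_DOUBLING = 20.0

for years in (1, 5, 10):
    months = 12 * years
    data = growth_factor(months, GENBANK_DOUBLING)
    compute = growth_factor(months, COMPUTE_DOUBLING)
    # The "gap" is how much faster the data grew than the computing power.
    print(f"{years:2d} years: data x{data:8.1f}, compute x{compute:6.1f}, "
          f"gap x{data / compute:4.1f}")
```

Under these assumptions the gap keeps widening: after ten years the data would have outgrown computing power roughly sixfold.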
29
  • The next step is obviously to locate all of the
    genes and regulatory regions, describe their
    functions, and identify how they differ between
    different groups (e.g. disease vs. healthy).
    Bioinformatics plays a critical role.

32
Bioinformatics will help with: Similarity
Searching Sequence Databases
  • What is similar to my sequence?
  • Searching gets harder as the databases get bigger
    - and quality changes
  • Tools: BLAST and FASTA, time-saving heuristics
    (approximate methods)
  • Statistics and the informed judgement of the
    biologist
33
Bioinformatics will help with: Structure-Function
Relationships
  • Can we predict the function of protein molecules
    from their sequence?
  • sequence > structure > function
  • Prediction of some simple 3-D structures
    (α-helix, β-sheet, membrane spanning, etc.)

34
Bioinformatics will help with: Phylogenetics
  • Can we define evolutionary relationships between
    organisms by comparing DNA sequences?
  • What is the molecular clock?
  • Lots of methods and software, what is the
    "correct" analysis?

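As a very small illustration of "comparing DNA sequences", the sketch below computes a p-distance (the proportion of differing sites) between sequences that are assumed to be already aligned and of equal length. It is one of the simplest possible measures, not the "correct" analysis the slide asks about, and the function name and toy sequences are made up for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Assumes the sequences are already aligned (same length); columns
    containing a gap character '-' in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "organism_1": "ACGTACGTAC",
    "organism_2": "ACGTACGTTC",
    "organism_3": "ACGAACCTTC",
}
names = sorted(toy_alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(toy_alignment[first], toy_alignment[second])
        print(f"{first} vs {second}: p-distance = {d:.2f}")
```

Real phylogenetic methods go much further (substitution models, tree building, statistical support), which is exactly why the choice of analysis matters.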
35
Top 10 Future Challenges for Bioinformatics
  • Precise, predictive model of transcription
    initiation and termination: ability to predict
    where and when transcription will occur in a
    genome
  • Precise, predictive model of RNA
    splicing/alternative splicing: ability to predict
    the splicing pattern of any primary transcript in
    any tissue
  • Precise, quantitative models of signal
    transduction pathways: ability to predict
    cellular responses to external stimuli
  • Determining effective protein-DNA, protein-RNA
    and protein-protein recognition codes
  • Accurate ab initio protein structure prediction
  • Rational design of small molecule inhibitors
    of proteins
  • Mechanistic understanding of protein evolution:
    understanding exactly how new protein functions
    evolve
  • Mechanistic understanding of speciation:
    molecular details of how speciation occurs
  • Continued development of effective gene
    ontologies - systematic ways to describe the
    functions of any gene or protein
  • Education: development of appropriate
    bioinformatics curricula for secondary,
    undergraduate and graduate education

Chris Burge, Ewan Birney, Jim Fickett. Genome
Technology, issue No. 17, January, 2002
36
What is Bioinformatics?
  • Think Pair Share!

37
Bioinformatics is about understanding how life
works. It is a hypothesis-driven science.
38
Bioinformatics is about integrating biological
themes together with the help of computer tools
and biological databases, and gaining new
knowledge from this.
39
BLAST Result
  • Basic
  • Local
  • Alignment
  • Search
  • Tool

40
PubMed Text Neighboring
  • Common terms could indicate similar subject
    matter
  • Statistical method
  • Weights based on term frequencies within document
    and within the database as a whole
  • Some terms are better than others

From Mark Boguski
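
The weighting idea on this slide (term frequency within a document, balanced against frequency in the database as a whole) can be sketched with a generic TF-IDF score and cosine similarity. This is a textbook scheme swapped in for illustration; it is not claimed to be the algorithm PubMed actually uses, and the function names and toy "documents" are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency),
    so terms that occur in every document get weight 0.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "abstracts", for illustration only.
docs = [
    "serum response of human fibroblasts",
    "transcriptional program of human fibroblasts in serum",
    "protein structure alignment search tool",
]
vectors = tfidf_vectors(docs)
print("doc 0 vs doc 1:", round(cosine(vectors[0], vectors[1]), 3))
print("doc 0 vs doc 2:", round(cosine(vectors[0], vectors[2]), 3))
```

The first two toy documents share several weighted terms and so score as neighbours; the third shares none with the first and scores zero.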
41
Micro-array analysis
Science, Jan 1 1999: 83-87. The Transcriptional
Program in the Response of Human Fibroblasts to
Serum. Vishwanath R. Iyer, Michael B. Eisen,
Douglas T. Ross, Greg Schuler, Troy Moore,
Jeffrey C. F. Lee, Jeffrey M. Trent, Louis M.
Staudt, James Hudson Jr., Mark S. Boguski, Deval
Lashkari, Dari Shalon, David Botstein, Patrick
O. Brown
Figure 1
Figure 4
42
VAST Result
  • Vector
  • Alignment
  • Search
  • Tool
  • Ferredoxin
  • Halobacterium marismortui
  • Chlorella fusca

43
Computational Biology Analysis
Q  Gln  NH2-C(=O)-CH2-CH2-
R  Arg  NH2-C(=NH2+)-NH-CH2-CH2-CH2-
45
Structural Interactions
Other interactions occurring within this
structure (blue). In this case Glutaminyl-tRNA
Synthetase interacting with AMP.
From Chris Hogue
46
What does it mean to do CB?
  • Like to work with sequences, structures,
    expression arrays, interaction of molecules and
    genetic maps.
  • Like the whole systems approach
  • Like the IT component, and the power it provides
    for crunching through lots of data
  • Like clear answers
  • Like to do Science

47
Doing CB means to be
  • Database user
  • Tool user
  • Database developer
  • Tool developer
  • Training, practicing or developing
  • Doing bioinformatics experiments

48
Bioinformatics experiments
Alignment
BLAST search
Sequence
  • Reagents
      • Sequence
      • Databases
  • Method
      • P-P: BLASTP
      • N-P: BLASTX
      • P-N: TBLASTN
      • N-N: BLASTN
      • N (P) - N (P): TBLASTX
  • Interpretation
      • Similarity
      • Hypothesis testing

Know your reagents
Know your methods
Do your controls
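
The "Method" choices on this slide amount to a lookup from query type and database type to a BLAST program. A minimal sketch of that lookup follows; the function and dictionary names are hypothetical helpers written for illustration, and only the program names themselves come from the slide.

```python
# (query type, database type) -> BLAST program, as listed on the slide.
BLAST_PROGRAMS = {
    ("protein", "protein"): "BLASTP",
    ("nucleotide", "protein"): "BLASTX",      # query is translated
    ("protein", "nucleotide"): "TBLASTN",     # database is translated
    ("nucleotide", "nucleotide"): "BLASTN",
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program for the given query and database types.

    With translate_both=True, a nucleotide query is compared against a
    nucleotide database at the protein level (TBLASTX).
    """
    key = (query_type, database_type)
    if translate_both:
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("TBLASTX needs a nucleotide query and database")
        return "TBLASTX"
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown combination: {key}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("nucleotide", "protein"))
print(choose_blast_program("nucleotide", "nucleotide", translate_both=True))
```

Writing the choice down explicitly is one way to "know your methods": the program follows from what the reagents (sequence and database) actually are.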
49
Bioinformatics Citizenship: What it means!
Nature 409: 452
50
The job of the biologist is changing
  • As more biological information becomes available
  • The biologist will spend more time using
    computers
  • The biologist will spend more time on data
    analysis
  • Biology will become a more quantitative science
    (think how the periodic table and atomic theory
    affected chemistry)
  • Bioinformatics will disappear as a field of
    research (Lincoln Stein)
  • Bioinformatics will be part of all life sciences
    and many computer science courses (Francis
    Ouellette)

51
The challenge: Putting it all together
  • The current state of the art requires the
    biologist to jump around from Web to mainframe to
    personal computer
  • The trend is for integration
  • Real power: being able to use and customize all
    resources

52
Data to integrate
  • Genomic sequences
  • mRNA sequences
  • Protein sequences
  • Protein structures
  • Protein function
  • Biomolecular interactions
  • Gene expression
  • DNA Polymorphisms
  • Taxonomic data
  • Molecular pathways
  • Genetic networks
  • Bibliographic data
  • Populations
  • Evolution

53
The Computer Scientist in the Age of Genomics
54
How much biology to understand?
  • Increasing sophistication required for
    computational biologists in terms of biological
    knowledge
  • What knowledge is important? What about all those
    exceptions?
  • What problems are important?

55
What computational tools to understand?
  • Perl is used extensively in bioinformatics, but
    so are many other languages.
  • Open source is prevalent in bioinformatics
    (Bioconductor, BLAST, Linux, MySQL, bioperl).
  • Need to be knowledgeable about both the standard
    bioinformatics algorithms and common tools that
    are based on them.
  • Appreciate the different databases and programs
    out there and what their benefits and fallacies
    are; databases have widely varying quality and
    are not all alike.
  • There is some database geopolitics out there!

56
High quality bioinformatics research: Excellent
communication between biologists and computer
scientists is key
57
The computer scientist and biologist compared
  • Computer scientist
      • Logic
      • Problem-solving
      • Process-oriented
      • Algorithmic
      • Optimizing
  • Biologist
      • Knowledge gathering
      • Experimentally-focused
      • Exceptions are as common as rules
      • Describe work as a story
      • Develop conclusions and models

58
Comp Sci vs Bio
  • The result: they
      • see the world differently
      • ask different questions
      • come to problems with different assumptions
      • pick up on different details
      • use different metaphors to organize knowledge
      • have different sets of analytical tools at
        their disposal
      • can even interact with people differently
  • Coming together
      • Communicate constantly!
      • Gain a better understanding of different ways
        of thinking
      • Try communicating in different ways
      • Remember there are others: statisticians,
        mathematicians, engineers, physicists,
        chemists, physiologists.

59
http://bioinformatics.ubc.ca/
60
UBiC Links Directory
  • Curated list of links to bioinformatics tools
    available worldwide

SCIENCE 16 April 2004
61
bioinformatics.ubc.ca/resources/links_directory
63
Open Source and Open Access
  • Making It Work with Open Source and Open Access
  • What does it mean?
  • Will it cause the economy to go bust?
  • Is it too good to be true?
  • What if I get scooped?

64
Open Source: what does it mean?
  • Open source - Any software whose code is
    available for users to look at and modify freely.
    Linux is the best-known example; others include
    Apache, the dominant software for servers that
    provide web pages worldwide.

65
Open Source in the life sciences
  • Present in all areas of bioinformatics
  • Some very well known examples of tools used in
    industry and academic circles include
  • BLAST
  • EMBOSS
  • EnsEMBL
  • MLagan
  • GenScan
  • Bioconductor

66
http://bioinformatics.ubc.ca/resources/tools/
67
Open Access
  • Unrestricted access to data
  • Allows all to work and make discoveries
  • Discoveries are not necessarily open access
  • Open access is applicable to any kind of data you
    want to apply it to
  • Sequence data (DNA, RNA or protein)
  • Gene expression data
  • Protein-protein interaction data
  • Publication

68
Open access critical to progress in Science
  • Without GenBank and other public sequence
    databases:
  • There would be no BLAST
  • There would be no diagnostic DNA testing
  • There would be no understanding of the human
    genome (there probably would not have been a
    human genome to work on in the first place).

70
Open Access of Publications
  • We are way overdue to break down the ivory towers
    that surround a few journals that are allowed to
    hide data from everybody that does not pay them!
    Who did the work? Taxpayers' dollars!
  • There are enough good (read: well reviewed)
    journals out there now so that we need not
    publish in closed journals.
  • We will need to get rid of the old guard that
    only wants to publish in Science, Nature and
    Cell.
  • I think these journals will change. Physicists
    have been doing this for decades; biologists
    will figure it out soon.
  • Need the reagents to make discoveries on the text
    data: we have diseases to understand, cures to
    find!

71
http://creativecommons.org/licenses/by-sa/2.0/
74
http://bioinformatics.ubc.ca/ouellette/
78
Questions?
  • Open access?
  • Open source?
  • Where the washrooms are?
  • CBW Bioinformatics?
  • CBW Proteomics? Genomics? Tools?
  • Graduate program in Bioinformatics at UBC?
  • When do we start?