Introduction to bioinformatics (I617) - PowerPoint PPT Presentation

1
Introduction to bioinformatics (I617)
  • Haixu Tang
  • School of Informatics
  • Email: hatang_at_indiana.edu
  • Office: EIG 1008
  • Tel: 812-856-1859

2
Textbook
  • A Primer of Genome Science (2nd Edition) by Greg
    Gibson and Spencer V. Muse, Sinauer Associates, 2004
  • Suggested reading materials will be posted on the
    class wiki page: http://cheminfo.informatics.indiana.edu/djwild/I617_2006_wiki/index.php/Main_Page
  • Office hours: MW 11:00-12:00, EIG 1008, or by
    appointment

3
Grading
  • Class project (25%), selected from one of the four
    covered areas (Bioinformatics, Chemical informatics,
    Laboratory informatics and Health informatics)
  • Suggested Bioinformatics topics will be posted on
    the class wiki page
  • Homework in Bioinformatics: 25%
    • 4 assignments, 6.25% each

4
Bioinformatics: BIOlogy + informatics?
  • Not really: it is a (somewhat arbitrarily
    chosen) term for a multidisciplinary area that
    combines the life sciences, physical sciences and
    computer science / informatics
  • It addresses biological problems using
    theoretical informatics approaches, not vice
    versa
  • It is transforming classical biology into an
    information science

5
The birth of bioinformatics
  • A revolution in biological research: the
    emergence of genome science
  • Technological advances in both biology and
    information science

6
Genome science: a revolution in biology
  • Classical biology: hypothesis-driven approach
  • Genome science: data-driven approach

7
Bioinformatics: from data analysis to data mining
  • Classical biology: low-throughput data, used for
    hypothesis confirmation / rejection
  • Genome science: high-throughput data, used for
    hypothesis generation

8
Bioinformatics in the driver's seat
  • Classical biology: data analysis
  • Genome science: data mining

9
Key technological advances
  • High-throughput biotechnologies
    • Genome sequencing techniques
    • DNA microarrays
    • Mass spectrometry
  • Large-scale experiments
    • HGP, HapMap
    • Omics / Systems Biology
  • Massive data generation, storage, exchange and
    analysis
    • CPU, storage, etc.
    • High-speed network (Internet)
    • Bioinformatics

10
Bioinformatics: mutually beneficial
  • For biologists
    • Fragment assembly in genome sequencing
    • Genome comparison
    • Gene clustering in DNA microarray analysis
    • Protein identification in proteomics
  • For computer scientists
    • String algorithms / Tree algorithms
    • Alternative Eulerian paths (BEST theorem)
    • Reversal distances
    • Probabilistic graphical models (HMMs, BNs, etc.)
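The slide only lists fragment assembly and alternative Eulerian paths as topics. The Python sketch below shows one way they connect, through de Bruijn graph assembly. This technique is our choice for illustration and may not be the one the course covers. Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix, and an Eulerian path spells out the sequence. The path is found with Hierholzer's algorithm. The sketch assumes error-free k-mers and a graph that has an Eulerian path. The function names and the example reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, targets in graph.items():
        out_deg[u] += len(targets)
        for v in targets:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start where out-degree exceeds in-degree by one; otherwise (a circuit) any node works.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), next(iter(graph)))
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: vertex is finished
    return path[::-1]


def assemble(kmers):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical error-free 3-mers from a short sequence, given in sorted (shuffled) order.
reads = sorted(["ACG", "CGT", "GTT", "TTG", "TGC", "GCA"])
print(assemble(reads))  # ACGTTGCA
```

When a (k-1)-mer occurs more than once, the graph can have several Eulerian paths, and each one is a different candidate assembly. The BEST theorem addresses the related problem of counting Eulerian circuits in a directed graph.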

11
Two origins of bioinformatics
  • Combinatorial pattern matching in theoretical
    computer science
    • DNA and protein sequence analysis
  • Physical and analytical chemistry of biomolecules
    • Protein structure analysis → Structural
      bioinformatics
    • Bio-analytical chemistry → Proteomics

13
Bioinformatics addresses computational challenges
in life and medical sciences
  • New computational problems for automatic data
    analysis
    • Genome sequencing
    • Proteomics
    • Transcriptomics
    • Data representation and visualization
    • Genome Browser
    • Solving biological problems by in silico
      approaches
  • Reformulation of old problems using new
    high-throughput data
    • Gene finding
    • Protein structure and function
  • Formulating new problems using high-throughput
    data
    • Comparative genomics
    • Polymorphisms / Population genetics
    • Systems Biology

14
Bioinformatics resources
  • Databases
    • Nucleic Acids Research (NAR) annual database
      issue
  • Organization
    • ISCB (International Society for Computational
      Biology)
  • Conferences
    • ISMB
    • RECOMB
    • Many other smaller or regional conferences,
      e.g., ECCB, CSB and PSB, including the local
      Indiana Bioinformatics conference

15
A case study
  • How does bioinformatics help transform classical
    biological topics?
  • Molecular evolutionary studies: from anatomical
    features to molecular evidence
  • Genome evolution: comparison of gene orders

17
Early Evolutionary Studies
  • Anatomical features were the dominant criteria
    used to derive evolutionary relationships between
    species from Darwin's time until the early 1960s
  • The evolutionary relationships derived from these
    relatively subjective observations were often
    inconclusive; some were later proved incorrect

18
Evolution and DNA Analysis: the Giant Panda
Riddle
  • For roughly 100 years, scientists were unable to
    determine which family the giant panda belongs
    to
  • Giant pandas look like bears but have features
    that are unusual for bears and typical of
    raccoons, e.g., they do not hibernate

19
Evolution and DNA Analysis: the Giant Panda
Riddle
  • In 1985, Stephen O'Brien and colleagues solved the
    giant panda classification problem using DNA
    sequences and bioinformatics algorithms

20
Evolutionary Tree of Bears and Raccoons
21
Evolutionary Trees: DNA-based Approach
  • About 40 years ago, Emile Zuckerkandl and Linus
    Pauling brought the reconstruction of evolutionary
    relationships from DNA into the spotlight
  • In the first few years after Zuckerkandl and
    Pauling proposed using DNA for evolutionary
    studies, whether evolutionary trees could be
    reconstructed by DNA analysis was hotly debated
  • Today it is a dominant approach to studying
    evolution

23
Evolutionary Trees
  • How are these trees built from DNA sequences?
  • Leaves represent existing species
  • Internal vertices represent ancestors
  • The root represents the common evolutionary
    ancestor
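Here is a minimal Python sketch of this vocabulary. It assumes a rooted tree stored as a parent-to-children dictionary, and all vertex names are hypothetical. Leaves are vertices with no children. The root is the only vertex with no parent. The remaining vertices are the internal ancestors (the root is itself internal, but it is reported separately here).

```python
# Hypothetical rooted tree (not from the slides): each internal vertex maps to its children.
CHILDREN = {
    "root": ["anc1", "anc2"],
    "anc1": ["species_A", "species_B"],
    "anc2": ["species_C", "species_D"],
}


def classify(children):
    """Split the vertices into the root, the other internal vertices and the leaves."""
    has_parent = {c for kids in children.values() for c in kids}
    vertices = set(children) | has_parent
    roots = [v for v in vertices if v not in has_parent]
    if len(roots) != 1:
        raise ValueError("a rooted tree has exactly one vertex without a parent")
    root = roots[0]
    ancestors = sorted(v for v in children if v != root)        # internal vertices
    species = sorted(v for v in vertices if v not in children)  # leaves
    return root, ancestors, species


root, ancestors, species = classify(CHILDREN)
print(root)       # root
print(ancestors)  # ['anc1', 'anc2']
print(species)    # ['species_A', 'species_B', 'species_C', 'species_D']
```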

24
Rooted and Unrooted Trees
In an unrooted tree, the position of the root
(the common ancestor) is unknown. Otherwise,
unrooted trees are like rooted trees.

25
Distances in Trees
  • Edges may have weights reflecting:
    • the number of mutations on the evolutionary
      path from one species to another
    • a time estimate for the evolution of one
      species into another
  • In a tree T, we often compute $d_{ij}(T)$, the
    length of the path between leaves i and j, i.e.,
    the tree distance between i and j

26
Distance in Trees: an Example
$$d_{1,4} = 12 + 13 + 14 + 17 + 12 = 68$$

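The slide's figure did not survive, so the Python sketch below computes $d_{ij}(T)$ on a hypothetical tree. The tree only reproduces the five edge weights on the path from leaf 1 to leaf 4 (12, 13, 14, 17, 12). The side branches (leaves 2, 3, 5, 6 and their weights) are made up. Because a tree has exactly one path between any two vertices, a depth-first search that never steps back to the vertex it came from is enough to find it.

```python
from collections import defaultdict


def build_tree(edges):
    """Undirected weighted tree as an adjacency list."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def tree_distance(adj, i, j):
    """d_ij(T): sum of edge weights on the unique path between vertices i and j."""
    stack = [(i, None, 0)]  # (vertex, vertex we came from, distance so far)
    while stack:
        node, came_from, dist = stack.pop()
        if node == j:
            return dist
        for nbr, w in adj[node]:
            if nbr != came_from:  # in a tree, not going back is enough to avoid cycles
                stack.append((nbr, node, dist + w))
    raise ValueError(f"no path between {i!r} and {j!r}")


# Hypothetical tree: the path 1-a-b-c-d-4 carries the weights 12, 13, 14, 17, 12;
# leaves 2, 3, 5, 6 and their edge weights are invented for illustration.
EDGES = [
    ("1", "a", 12), ("a", "b", 13), ("b", "c", 14), ("c", "d", 17), ("d", "4", 12),
    ("a", "2", 5), ("b", "3", 8), ("c", "5", 9), ("d", "6", 3),
]
tree = build_tree(EDGES)
print(tree_distance(tree, "1", "4"))  # 68
```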
27
Distance Matrix
  • Given n species, we can compute the n × n
    distance matrix $D_{ij}$
  • $D_{ij}$ may be defined as the edit distance
    between a gene in species i and the same gene in
    species j, where the gene of interest is
    sequenced for all n species
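Here is a minimal Python sketch of building $D_{ij}$ from edit distances. It uses the standard Levenshtein dynamic program with unit costs for insertion, deletion and substitution. The slides do not specify a cost model, so unit costs are an assumption here, and the three gene sequences are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance: fewest insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # match (0) or substitute (1)
            ))
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """n x n matrix D with D[i][j] = edit distance between sequences i and j."""
    return [[edit_distance(a, b) for b in seqs] for a in seqs]


# Hypothetical sequences of the same gene in three species.
genes = ["ACGT", "AGGT", "ACG"]
for row in distance_matrix(genes):
    print(row)
# [0, 1, 1]
# [1, 0, 2]
# [1, 2, 0]
```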

28
Fitting a Distance Matrix
  • Given n species, we can compute the n × n
    distance matrix $D_{ij}$
  • Evolution of these genes is described by a tree
    that we don't know
  • We need an algorithm to construct a tree that
    best fits the distance matrix $D_{ij}$
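The slides do not say which notion of "best fit" is meant. One common choice, assumed here, is the least-squares criterion. It picks the tree topology and edge weights that minimize the squared discrepancy between tree distances and matrix entries:

```latex
T^{*} = \arg\min_{T} \sum_{1 \le i < j \le n} \bigl( d_{ij}(T) - D_{ij} \bigr)^{2}
```

If some tree makes every term zero, the matrix is additive and that tree fits it exactly.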

29
Reconstructing a 3-Leaf Tree
  • Tree reconstruction for any 3 × 3 matrix is
    straightforward
  • We have 3 leaves i, j, k and a center vertex c

Observe:
$$d_{ic} + d_{jc} = D_{ij}, \qquad d_{ic} + d_{kc} = D_{ik}, \qquad d_{jc} + d_{kc} = D_{jk}$$

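Adding the first two equations and subtracting the third gives $2d_{ic} = D_{ij} + D_{ik} - D_{jk}$. The other two edge lengths follow symmetrically. Here is a minimal Python sketch of that closed form; the distances in the example are hypothetical.

```python
def three_leaf_tree(D_ij, D_ik, D_jk):
    """Return (d_ic, d_jc, d_kc), the edge lengths from leaves i, j, k to the center c."""
    d_ic = (D_ij + D_ik - D_jk) / 2
    d_jc = (D_ij + D_jk - D_ik) / 2
    d_kc = (D_ik + D_jk - D_ij) / 2
    return d_ic, d_jc, d_kc


# Hypothetical distances D_ij = 5, D_ik = 9, D_jk = 10.
print(three_leaf_tree(5, 9, 10))  # (2.0, 3.0, 7.0)
```

Suppose one distance exceeds the sum of the other two, so the triangle inequality fails. Then one edge length comes out negative, and no tree with nonnegative edge weights fits the matrix exactly.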
30
Turnip vs Cabbage: Look and Taste Different
  • Although cabbages and turnips share a recent
    common ancestor, they look and taste different

31
Turnip vs Cabbage: Comparing Gene Sequences
Yields No Evolutionary Information

32
Turnip vs Cabbage: Almost Identical mtDNA Gene
Sequences
  • In the 1980s, Jeffrey Palmer studied the evolution
    of plant organelles by comparing the mitochondrial
    genomes of cabbage and turnip
  • 99% similarity between genes
  • These nearly identical gene sequences,
    surprisingly, differed in gene order
  • This study helped pave the way to analyzing
    genome rearrangements in molecular evolution

33
Turnip vs Cabbage: Different mtDNA Gene Order
  • Gene order comparison

(Figure: mtDNA gene order before and after)
Evolution is manifested as divergence in gene
order.

38
Transforming Cabbage into Turnip
Reversal distance
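The slide's figure, with the actual cabbage and turnip gene orders, did not survive extraction. The Python sketch below computes the reversal distance of a small signed permutation by brute-force breadth-first search over signed reversals. Genes are numbered 1..n and a sign gives orientation. The example permutation is hypothetical, not the cabbage data. The search space grows as 2^n · n!, so this only suits tiny examples. Signed reversal distance can be computed in polynomial time (Hannenhalli and Pevzner), which this sketch does not implement.

```python
from collections import deque


def reverse_segment(perm, i, j):
    """Signed reversal: reverse perm[i..j] and flip the sign of each element."""
    middle = tuple(-x for x in reversed(perm[i:j + 1]))
    return perm[:i] + middle + perm[j + 1:]


def reversal_distance(perm):
    """Fewest signed reversals turning perm into (1, 2, ..., n), by breadth-first search."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        n = len(current)
        for i in range(n):
            for j in range(i, n):
                nxt = reverse_segment(current, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("identity permutation not reachable")


# Hypothetical gene order (not the real cabbage/turnip data).
print(reversal_distance((1, -5, 4, -3, 2)))  # 3
```

For this example, three reversals suffice: (1, -5, 4, -3, 2) → (1, -2, 3, -4, 5) → (1, 2, 3, -4, 5) → (1, 2, 3, 4, 5). The permutation has five breakpoints, and each reversal removes at most two, so no shorter sequence exists.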
39
History of Chromosome X
Rat Consortium, Nature, 2004