Bioinformatics - PowerPoint PPT Presentation

Slides: 57
Provided by: NBIF9
Transcript and Presenter's Notes

1
Bioinformatics
  • Genomic Biology as a Quantitative Science

Stuart M. Brown, Ph.D. Director, Research
Computing, NYU School of Medicine
2
A Genome Revolution is underway in Biology and
Medicine
  • We are in the midst of a "Golden Era" of biology
  • The Human Genome Project has produced a huge
    storehouse of data that will be used to change
    every aspect of biological research and medicine
  • The revolution is about treating biology as an
    information science, not about specific
    technologies.

3
The Human Genome Project
4
The job of the biologist is changing
As more biological information becomes available
and laboratory equipment becomes more automated
...
  • The biologist will spend more time using
    computers
  • on experimental design and data analysis (and
    less time doing tedious lab biochemistry)
  • Biology will become a more quantitative science
    (think how the periodic table affected chemistry)

5
Biological Information
Protein 2-D gel
mRNA Expression
Protein 3-D Structure
Mass Spec.
Genome sequence
The Cell
6
A review of some basic genetics
8
DNA
  • 4 bases (G, C, T, A)
  • base pairs
  • G--C
  • T--A
  • genes
  • non-coding regions
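
The pairing rules above (G with C, T with A) are enough to compute the opposite strand of any DNA sequence. A minimal Python sketch, assuming plain upper- or lower-case strings with no ambiguity codes; the function names are illustrative, not from the slides:

```python
# Watson-Crick pairing as listed on the slide: G pairs with C, T pairs with A.
PAIR = str.maketrans("GCTA", "CGAT")

def complement(seq):
    """Return the base-by-base complement of a DNA string."""
    return seq.upper().translate(PAIR)

def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]

print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```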

9
Decoding Genes
10
Classic Molecular Biology
  • A gene is a DNA sequence at a particular locus on
    a chromosome that encodes a protein.
  • The Central Dogma of Molecular Biology
  • DNA -> RNA -> Protein
  • A mutation changes the DNA sequence - leads to a
    change in protein sequence - or no protein.
  • Alleles are slightly different DNA sequences of
    the same gene.
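
The Central Dogma and the effect of a mutation can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: the input is the coding strand, starts in frame, and uses the standard genetic code; the example sequences are made up for the demonstration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(dna):
    """Coding-strand DNA to mRNA: same sequence with U in place of T."""
    return dna.upper().replace("T", "U")

def translate(dna):
    """Translate coding DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

normal   = "ATGGAGCAGTAA"   # hypothetical allele
missense = "ATGGTGCAGTAA"   # one base changed: a different amino acid
nonsense = "ATGGAGTAGTAA"   # one base changed: a premature stop

print(transcribe(normal))   # AUGGAGCAGUAA
print(translate(normal))    # MEQ
print(translate(missense))  # MVQ
print(translate(nonsense))  # ME
```

The three sequences differ by a single base each, which is also what "alleles" means on this slide: slightly different DNA sequences of the same gene.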

11
  • The human genome is the complete DNA content
    of the 23 pairs of human chromosomes - 44
    autosomes plus two sex chromosomes
  • - approximately 3.2 billion base pairs.

12
Bold Words from Francis Collins
  • The history of biology was forever altered a
    decade ago by the bold decision to launch a
    research program that would characterize in
    ultimate detail the complete set of genetic
    instructions of the human being.

Francis S. Collins Director of the National
Human Genome Research Institute N Engl J Med 1999
88242-65
13
Genome Projects
  • Complete genomic sequences
  • Dozens of microorganisms
  • Yeast, C. elegans, Drosophila
  • Mouse
  • Human
  • Comparative genomics
  • All this data is enabling new kinds of research -
    for those with the computational skills to take
    advantage of it.

14
How does genome sequencing technology work?
  • Molecular biology of the Sanger method
  • Sub-cloning of fragments - BAC, PAC, cosmid,
    plasmid, phage
  • Automated sequencers
  • The need for computers to assemble the "reads"
    and manage the workflow
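
To show why assembly needs a computer, here is a toy greedy assembler in Python: it repeatedly merges the two reads with the longest suffix-to-prefix overlap. This is a sketch only. It assumes error-free reads from one strand and no repeats; production assemblers handle sequencing errors, both strands, and repeats, and the function names and reads below are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining reads do not overlap: separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads)
                 if k not in (best_i, best_j)] + [merged]
    return reads

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
# ['ATGGCGTACCTGA']
```

The pairwise comparison is quadratic in the number of reads, which hints at why real genome projects need serious computing resources.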

15
  • Automated sequencing machines, particularly those
    made by PE Applied Biosystems, use 4 colors, so
    they can read all 4 bases at once.

17
Raw Genome Data
18
Lots of Sequence Data
  • How to extract useful knowledge from all of this
    data?
  • Need sophisticated computer tools
  • Find the genes
  • Figure out what they do (function)
  • Diagnostic tests
  • Medical treatments

19
Finding genes in genome sequence is not easy
  • About 1% of human DNA encodes functional genes.
  • Genes are interspersed among long stretches of
    non-coding DNA.
  • Repeats, pseudo-genes, and introns confound
    matters

20
  • Gene prediction tools - look for Start and Stop
    codons, intron splice sites, similarity to known
    genes and cDNAs, etc.
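
The simplest of these signals, a Start codon followed in frame by a Stop codon, can be scanned for directly. The Python sketch below finds such open reading frames on the forward strand only. It ignores introns, splice sites, and similarity to known genes, so it is an illustration of the idea rather than a usable gene predictor for human DNA; the function name and test sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Find ATG...stop open reading frames on the forward strand.

    Returns (start, end, frame) tuples; end is exclusive and includes
    the stop codon. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first Start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs

dna = "CCATGGCCAAATTTTAGGG"
for start, end, frame in find_orfs(dna):
    print(frame, dna[start:end])   # 2 ATGGCCAAATTTTAG
```

Scanning the reverse complement of the sequence with the same function would cover the other strand.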

22
Data Mining Tools
  • Scientists need to work with a lot of layers of
    information about the genome
  • coding sequence of known genes and cDNAs
  • genetic maps (known mutations and markers)
  • gene expression
  • Protein sequence (from Mass Spectroscopy)
  • cross species homology
  • Most of the best tools are free on the Web

24
UCSC
25
Ensembl at EBI/EMBL
26
What comes after Genome Sequencing?
  • We are now in the "Post-Genomic" era.
  • It is possible to use the genome sequence plus a
    variety of automated laboratory equipment to do
    entirely new kinds of biology.
  • Not just scaled-up, but comprehensive

27
Relate genes to Organisms
  • Diseases
  • OMIM Human Genetic Disease
  • Metabolic and regulatory pathways
  • KEGG
  • Cancer Genome Project

28
Human Alleles
  • The OMIM (Online Mendelian Inheritance in Man)
    database at the NCBI tracks all human mutations
    with known phenotypes.
  • It contains a total of about 2,000 genetic
    diseases and another 11,000 genetic loci with
    known phenotypes - but not necessarily known gene
    sequences
  • It is designed for use by physicians
  • can search by disease name
  • contains summaries from clinical studies

30
KEGG: Kyoto Encyclopedia of Genes and Genomes
  • Enzymatic and regulatory pathways
  • Mapped out by EC number and cross-referenced to
    genes in all known organisms
  • (wherever sequence information exists)
  • Parallel maps of regulatory pathways

34
Genomics
  • What is Genomics?
  • An operational definition
  • The application of high throughput automated
    technologies to molecular biology.
  • A philosophical definition
  • A holistic or systems approach to the study of
    information flow within a cell.

35
Genomics Technologies
  • Automated DNA sequencing
  • Automated annotation of sequences
  • DNA microarrays
  • gene expression (measure RNA levels)
  • SNP Genotyping
  • Genome diagnostics (genetic testing)
  • Proteomics
  • Protein identification
  • Protein-protein interactions

36
DNA chip microarrays
  • Put a large number (100K) of cDNA sequences or
    synthetic DNA oligomers onto a glass slide (or
    other substrate) in known locations on a grid.
  • Label an RNA sample and hybridize
  • Measure amounts of RNA bound to each square in
    the grid
  • Make comparisons
  • Cancerous vs. normal tissue
  • Treated vs. untreated
  • Time course
  • Many applications in both basic and clinical
    research
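
The "make comparisons" step usually comes down to a ratio per gene between two samples. A minimal Python sketch, assuming each sample is a dictionary of spot intensities keyed by gene; the gene names, intensity values, pseudocount, and two-fold threshold are all hypothetical, and real analyses add normalization and statistics on replicates.

```python
import math

def log2_ratios(sample, reference, pseudocount=1.0):
    """Log2 ratio of sample to reference intensity for each shared gene.

    The pseudocount avoids division by zero for spots with no signal.
    """
    return {gene: math.log2((sample[gene] + pseudocount) /
                            (reference[gene] + pseudocount))
            for gene in sample if gene in reference}

def changed_genes(ratios, threshold=1.0):
    """Genes whose expression differs by at least 2**threshold fold."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)

# Hypothetical intensities for three spots on the grid.
tumor  = {"geneA": 799, "geneB": 99, "geneC": 49}
normal = {"geneA": 99,  "geneB": 99, "geneC": 199}

ratios = log2_ratios(tumor, normal)
print(ratios)                 # {'geneA': 3.0, 'geneB': 0.0, 'geneC': -2.0}
print(changed_genes(ratios))  # ['geneA', 'geneC']
```

The same function serves the treated vs. untreated and time-course comparisons by swapping in the appropriate pair of samples.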

37
Spot your own Chip (plans available for free
from Pat Brown's website)
Robot spotter
Ordinary glass microscope slide
38
cDNA spotted microarrays
40
Goal of Microarray experiments
  • Microarrays are a very good way of identifying a
    bunch of genes involved in a disease process
  • Differences between cancer and normal tissue
  • Tuberculosis infected vs resistant lung cells
  • Mapping out a pathway
  • Co-regulated genes
  • Finding function for unknown genes
  • Involved in these processes

41
Direct Medical Applications
  • Diagnosis
  • Type of cancer
  • Aggressive or benign?
  • Monitor treatment outcome
  • Is a treatment having the desired effect on the
    target tissue?

42
When you go looking
43
you will certainly find something!
44
Human Genetic Variation
  • Every human has essentially the same set of genes
  • But there are different forms of each gene --
    known as alleles
  • blue vs. brown eyes
  • genetic diseases such as cystic fibrosis or
    Huntington's disease are caused by dysfunctional
    alleles

45
  • Alleles are created by mutations in the DNA
    sequence of one person - which are passed on to
    their descendants

46
Clinical Manifestations of Genetic Variation
  • (All disease has a genetic component)
  • Susceptibility vs. resistance
  • Variations in disease severity or symptoms
  • Reaction to drugs (pharmacogenetics)
  • All of these traits can be traced back to
    particular genes (or sets of genes)

47
Pharmacogenomics
  • People react differently to drugs
  • Side effects
  • Variable effectiveness
  • There are genes that control these reactions
  • SNP markers can be used to identify these genes
    (profiles)

48
Use the Profiles
  • Genetic profiles of new patients can then be used
    to prescribe drugs more effectively and avoid
    adverse reactions.
  • Sell a drug with a gene test
  • Can also speed clinical trials by testing on
    those who are likely to respond well.

49
Toxicogenomics
  • There are a number of common pathways for drug
    toxicity (or environmental tox.)
  • It is possible to compile genomic signatures
    (gene expression data) for these pathways.
  • Candidate drug molecules can be screened in cell
    culture or in animals for induction of these
    toxicity pathways.

50
Planning for a Genomics Revolution
  • Bioinformatics support must be integral in the
    planning process for the development of new
    genomics research facilities.
  • Genome Project sequencing centers have more staff
    and more money spent on data analysis than on the
    sequencing itself.
  • Microarray facilities will be even more skewed
    toward data analysis
  • It is an information-intensive business!

51
Implications for Biomedicine
  • Physicians will use genetic information to
    diagnose and treat disease.
  • Virtually all medical conditions have a genetic
    component.
  • Faster drug development research
  • Individualized drugs
  • Gene therapy
  • All Biologists will use gene sequence information
    in their daily work

52
Training "computer savvy" scientists
  • Know the right tool for the job
  • Get the job done with tools available
  • Network connection is the lifeline of the
    scientist
  • Jobs change, computers change, projects change,
    scientists need to be adaptable

53
Long Term Implications
  • A "periodic table for biology" will lead to an
    explosion of research and discoveries - we will
    finally have the tools to start making systematic
    analyses of biological processes (quantitative
    biology).
  • Understanding the genome will lead to the
    ability to change it - to modify the
    characteristics of organisms and people in a wide
    variety of ways

54
Genomics Education
  • Genomics scientists need basic training in both
    Molecular Biology and Computing
  • Specific training in the use of automated
    laboratory equipment, the analysis of large
    datasets, and bioinformatics algorithms
  • Particularly important for the training of
    medical doctors - at least a familiarity with the
    technology

55
Genomics in Medical Education
  • "The explosion of information about the new
    genetics will create a huge problem in health
    education. Most physicians in practice have had
    not a single hour of education in genetics and
    are going to be severely challenged to pick up
    this new technology and run with it."
  • Francis Collins

56
Stuart M. Brown, Ph.D.
stuart.brown_at_med.nyu.edu
www.med.nyu/rcr
Bioinformatics: A Biologist's Guide to
Biocomputing and the Internet