The Genome Access Course

Provided by: dnalearni

1
The Genome Access Course
Gene Prediction
5
Raw Genome Data
6
  • To locate all of the genes in the human genome
    and describe their functions may take 15-20
    years!

7
What's the problem?
  • 1-3% coding sequences
  • large number and long stretches of repetitions
  • pseudogenes
  • highly specific, rarely expressed genes
  • paralogous genes
  • regulatory regions
  • short RNAs
  • first and last exons
  • splice variations

8
Eukaryotic Genomes
9
Highly contentious question
  • How many genes?

10
Celera says that there are only 30,000 genes
  • Affymetrix: 60,000 human genes on GeneChips?
  • Incyte: over 120,000 genes?
  • GenBank: 49,000 gene coding sequences?
  • UniGene: > 89,000 clusters of unique ESTs?

11
How about other organisms?
  • Human (H. sapiens): 35,000
  • Mustard weed (A. thaliana): 25,800
  • Nematode (C. elegans): 18,266
  • Fruit fly (D. melanogaster): 13,338
  • Baker's yeast (S. cerevisiae): 6,144
  • Bacterium (E. coli): 4,300

12
Definition of Gene: What are genes?
13
The "one gene, one protein" hypothesis predates
the description of the chemical structure of DNA
by Watson and Crick (1953) and even the
identification of DNA as the molecule of
inheritance.
14
Patched up
  • The One Gene - One Protein dogma has been
    patched up a lot: exons/introns, alternative
    splicing, etc.

15
Eukaryotic Genes
16
A Dynamic Concept
  • Introns/exons
  • Post-transcriptional modifications
  • Alternative splicing
  • Differential expression
  • Genes-in-genes
  • Genes-ad-genes
  • Post-translational modifications
  • Multi-subunit proteins

17
Maybe
  • A One Gene - One Transcript dogma

19
Current consensus
  • 15,000 known genes (similarity to previously
    isolated genes and expressed sequences from a
    large variety of different organisms)
  • 17,000 predicted (GenScan, GeneFinder, GRAIL)
  • Based on and limited to previous knowledge

20
Sources of Complexity
  • Gene number (2-3 fold over worm and fly)
  • Alternative splicing (ca. 3 transcripts per gene
    vs. 1.3 for worm)
  • Different organization (domains, subunits)
  • New architecture (CNS, brain complexity)
  • New abilities (Cognitive abilities)
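As a rough check on the first two bullets, the gene counts quoted on the "How about other organisms?" slide can be combined with the transcripts-per-gene figures above. This is only back-of-the-envelope arithmetic under the slide's own numbers; the fly is left out of the transcript estimate because no splicing figure is given for it, and the function names are made up for this sketch.

```python
# Gene counts as quoted on the "How about other organisms?" slide.
GENES = {"human": 35_000, "worm": 18_266, "fly": 13_338}

# Approximate transcripts per gene as quoted on this slide.
TRANSCRIPTS_PER_GENE = {"human": 3.0, "worm": 1.3}


def fold_over(reference: str, other: str) -> float:
    """Ratio of gene counts between two organisms."""
    return GENES[reference] / GENES[other]


def estimated_transcripts(organism: str) -> float:
    """Gene count scaled by the average number of transcripts per gene."""
    return GENES[organism] * TRANSCRIPTS_PER_GENE[organism]


if __name__ == "__main__":
    for other in ("worm", "fly"):
        print(f"human/{other} gene ratio: {fold_over('human', other):.1f}")
    for organism in ("human", "worm"):
        print(f"{organism}: ~{estimated_transcripts(organism):,.0f} transcripts")
```

With these inputs the gene-count ratio comes out near 1.9 for worm and 2.6 for fly, while the estimated transcript counts differ by more than four-fold (about 105,000 versus about 24,000), so splicing appears to widen the gap more than gene number alone.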

21
What does it mean?
  • 30,000 - 35,000 genes
  • Average coding length: 1.4 kb
  • Average gene extent: 30 kb
  • Average gene density: 11.5 genes per Mb
  • Y chromosome: 6.4 genes per Mb
  • Chromosome 19: 26.8 genes per Mb
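The figures on this slide can be cross-checked against each other with a little arithmetic. The sketch below assumes a haploid genome size of about 3,000 Mb, which is not stated on the slide; the constant and function names are likewise invented for illustration.

```python
# Figures taken from the slide.
GENE_COUNT_RANGE = (30_000, 35_000)
AVG_CODING_LENGTH_KB = 1.4
AVG_GENE_EXTENT_KB = 30.0

# Assumption (not on the slide): a haploid human genome of roughly 3,000 Mb.
ASSUMED_GENOME_SIZE_MB = 3_000.0


def genes_per_mb(gene_count: int,
                 genome_mb: float = ASSUMED_GENOME_SIZE_MB) -> float:
    """Average number of genes per megabase."""
    return gene_count / genome_mb


def fraction_of_genome(gene_count: int, length_kb: float,
                       genome_mb: float = ASSUMED_GENOME_SIZE_MB) -> float:
    """Share of the genome covered by gene_count segments of length_kb each."""
    return gene_count * length_kb / (genome_mb * 1_000.0)


if __name__ == "__main__":
    for count in GENE_COUNT_RANGE:
        density = genes_per_mb(count)
        coding = fraction_of_genome(count, AVG_CODING_LENGTH_KB)
        extent = fraction_of_genome(count, AVG_GENE_EXTENT_KB)
        print(f"{count:,} genes: {density:.1f} per Mb, "
              f"{coding:.1%} coding, {extent:.0%} within gene extents")
```

Under that assumption, 30,000 to 35,000 genes correspond to about 10 to 11.7 genes per Mb, which brackets the quoted average of 11.5. Coding sequence would make up roughly 1.4% to 1.6% of the genome, in line with the small coding fraction listed among the problems earlier, while gene extents including introns would cover roughly 30% to 35%.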

22
Complete
26
Finding Genes in Human Genome Sequence is Not Easy
  • Functional genes must be detected within vast
    amounts of non-coding human DNA.
  • Repeats, pseudogenes, and introns confound
    matters.

28
It's all about patterns
  • Promoters
  • Open reading frames (ORF)
  • Translational start and stop codons
  • Intron splice sites
  • Go on to http://www.dnalc.org/bioinformatics/
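
Two of the patterns listed above, open reading frames and translational start/stop codons, are simple enough to sketch in a few lines. The following is a minimal, illustrative scanner and not the method used by GenScan, GeneFinder, or GRAIL: it reads only the three forward frames of a DNA string, reports stretches from an ATG to the first in-frame stop codon, and ignores introns, the reverse strand, and promoters. The function name `find_orfs`, the `min_codons` parameter, and the toy sequence are invented for this example.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs in the three forward frames.

    start is 0-based; end is exclusive and includes the stop codon.
    min_codons counts the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # hypothetical sequence, not real data
    for start, end, seq in find_orfs(toy):
        print(start, end, seq)
```

On the toy sequence this reports a single ORF, `ATGAAATTTGGGTAA`, at positions 2 to 17. In real eukaryotic DNA such a scan would miss most genes, since coding sequence is interrupted by introns, which is why the slide lists splice sites and promoters alongside ORFs.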