Title: 7. Understanding a Genome Sequence
1 7. Understanding a Genome Sequence
2Outline
- 7.1. Locating the Genes in a Genome Sequence
- 7.2. Determining the Functions of Individual Genes
- 7.3. Global Studies of Genome Activity
- 7.4. Comparative Genomics
3 7.1. Locating the Genes in a Genome Sequence
4Methods of locating the genes
- Simply inspecting the sequence by eye.
- Inspecting by computer
- Bioinformatics
5Gene location by sequence inspection
- ORF scanning
- Initiation codon: ATG (usually)
- Termination codons: TAA, TAG, TGA
6ORF scanning (1/3)
- A double-stranded DNA molecule has six reading
frames
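
The six reading frames can be scanned with a few lines of code. The following is a minimal sketch, assuming an uppercase or lowercase DNA string containing only A, C, G and T; the function names and the `min_codons` cutoff are illustrative choices, not part of the slides.

```python
# Minimal six-frame ORF scan (illustrative sketch, not a full gene finder).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for ATG...stop ORFs in one frame (0, 1 or 2)."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first ATG after the previous stop opens an ORF
        elif start is not None and codon in STOP_CODONS:
            # ORF length in codons, counting ATG but not the stop codon
            if (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def six_frame_orfs(seq, min_codons=100):
    """Scan three frames on each strand; '-' coordinates refer to the
    reverse-complemented sequence."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                results.append((strand, frame, start, end))
    return results
```

A cutoff such as `min_codons=100` discards most short, spurious ORFs, which is the main reason simple scanning works well on bacterial DNA.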
7ORF scanning (2/3)
- Average gene length
- E. coli: 317 codons
- S. cerevisiae: 483 codons
- Human: approximately 450 codons
- Simple ORF scanning is effective with bacterial genomes
8ORF scanning (3/3)
- Real genes: red lines; spurious ORFs: blue lines
9Simple ORF scanning
- Effective with bacterial genomes (in most cases)
- Real genes do not overlap
- No genes-within-genes
- LESS effective with higher eukaryotic DNA
- More space between the real genes
- Genes are often split by introns
- Many exons are shorter than 100 codons
10Higher eukaryotic DNA
11Three modifications to the basic procedure for
ORF scanning
- Codon bias
- Exon-intron boundaries
- Upstream regulatory sequences
12Codon bias
- All organisms have a bias
- Bias is different in different species
- In human genes, leucine is most frequently coded
by CTG
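
Codon bias can be measured by counting how often each synonymous codon is used. The sketch below does this for the six leucine codons; it assumes the input is an in-frame coding sequence, and the function names are hypothetical.

```python
from collections import Counter

# The six codons that specify leucine in the standard genetic code.
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def codon_counts(cds):
    """Count codons in an in-frame coding sequence (trailing bases ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def leucine_usage(cds):
    """Return the fraction of leucine codons contributed by each synonym."""
    counts = codon_counts(cds)
    total = sum(counts[c] for c in LEUCINE_CODONS)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in LEUCINE_CODONS}
```

An ORF whose codon usage matches the bias of the species is more likely to be a real gene than one whose usage does not.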
13The genetic code
14Exon-intron boundaries
- Sequence of the upstream exon-intron boundary
- Sequence of the downstream intron-exon boundary
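
A boundary search can be sketched as a pattern match. The consensus sequences used here are not given on the slide: they are the commonly quoted forms 5′-AG|GTAAGT-3′ (upstream boundary) and 5′-PyPyPyPyPyPyNCAG|-3′ (downstream boundary), and real boundaries vary around them, so exact matching is a deliberate simplification.

```python
import re

# Assumed consensus sequences (not from the slides); "|" marks the boundary.
# Upstream (donor):      AG|GTAAGT
# Downstream (acceptor): PyPyPyPyPyPyNCAG|   (Py = C or T, N = any base)
DONOR = re.compile(r"(?=AGGTAAGT)")               # lookahead allows overlaps
ACCEPTOR = re.compile(r"(?=[CT]{6}[ACGT]CAG)")


def candidate_donor_sites(seq):
    """Return indices of the first intron base (the G of GT)."""
    return [m.start() + 2 for m in DONOR.finditer(seq.upper())]


def candidate_acceptor_sites(seq):
    """Return indices of the first exon base after the CAG."""
    return [m.start() + 10 for m in ACCEPTOR.finditer(seq.upper())]
```

Because the consensus is short and degenerate, hits found this way are only candidates and need support from other evidence such as codon bias.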
15Upstream regulatory sequences
- Locate the regions where genes begin
- Have distinctive sequence features
- Variable
- Not all genes have the same collection of
regulatory sequences
16Other strategy
- CpG islands
- Vertebrate genomes contain CpG islands upstream of
many genes
- Some 40-50% of human genes are associated with
an upstream CpG island
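
A CpG island can be recognized from its base composition. The sketch below computes the GC content and the observed/expected CpG ratio of a window. The thresholds (at least 200 bp, GC content above 0.5, ratio above 0.6) are an assumption taken from a widely used definition and are not stated on the slide.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")      # number of CpG dinucleotides
    expected = c * g / n            # expectation if C and G were independent
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds to a single window."""
    gc, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and ratio > min_ratio
```

In practice the test would be applied to a window sliding along the sequence upstream of each candidate ORF.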
17Homology search
- Search the DNA database
- Test whether the sequence is similar to any known gene
- Assign functions to newly discovered genes
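
The idea of a database search can be illustrated with a toy similarity score. This sketch ranks database entries by the fraction of the query's k-mers they share; it is not how real search tools such as BLAST score alignments, and the function names and the dictionary-based "database" are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hits(query, database, k=4):
    """Rank database entries by the fraction of query k-mers they contain.

    database is a dict mapping a gene name to its sequence.
    """
    q = kmers(query.upper(), k)
    scores = []
    for name, seq in database.items():
        shared = len(q & kmers(seq.upper(), k))
        scores.append((shared / len(q) if q else 0.0, name))
    return sorted(scores, reverse=True)
```

A high-scoring hit to a gene of known function suggests, but does not prove, a similar function for the query.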
18Experimental techniques of gene location
- Hybridization tests
- cDNA sequencing
- Exon-intron boundaries
19Northern hybridization
20Zoo-blotting
21cDNA capture
22RACE: rapid amplification of cDNA ends
23Exon-intron boundaries
24 7.2. Determining the Functions of Individual Genes
25We know rather less than we thought
- E. coli
- 4288 protein-coding genes
- 1853 (43%) previously identified
- S. cerevisiae
- 30% previously identified
26Homology reflects evolutionary relationships
- Orthologous
- Homologous genes located in the genomes of
different organisms
- Paralogous
- Two or more homologous genes located in the same
genome
27Amino acids or nucleotides
28Homologous domain
- Tudor domain (120-amino-acid motif)
29Homology analysis in the yeast genome project
- Yeast genome: about 6000 genes
- Identified by conventional genetic analysis: 30%
- Studied by homology analysis: 70%
30Assign gene function by experimental analysis
- Gene inactivation
- Ultraviolet radiation
- Mutagenic chemical
- Mutants are present in a natural population
31Homologous recombination
- Inactivate individual gene by homologous
recombination
32Example: yeast deletion cassette
- Cells in which the disruption has occurred can be identified
- The antibiotic-resistance gene in the cassette is expressed in those cells
33Example: gene inactivation in mice
- Identifying the function of unknown human genes
- Use embryonic stem (ES) cells to make knockout mice
- Some gene inactivations are lethal
34Gene inactivation without homologous
recombination (1/2)
- Transposon tagging
- Most transposable elements in a genome are
inactive, but a few retain their ability
to transpose
- Difficult to target individual genes
35Gene inactivation without homologous
recombination (2/2)
- RNA interference
- Does not disrupt the gene itself, but destroys its mRNA
- Works effectively in the worm Caenorhabditis elegans
- Difficult to apply to mammalian cells
36RNA interference (cont.)
- Fusion with liposomes can be used to deliver
double-stranded RNA into a human cell
37Using gene overexpression to assess function
- Test gene is much more active than normal (gain
of function)
- Multicopy vector (40-200 copies per cell)
- The vector must contain a highly active promoter
38Function analysis by gene overexpression
39Directed mutagenesis
- Probe gene function in detail
- Delete or alter the relevant part of the gene
sequence
- Applications lie in the area of protein
engineering
40Directed mutagenesis
- Knowing which cells have undergone homologous
recombination
- Placing a marker gene
41Reporter genes
- Clues to the function of a gene can often be obtained from where
and when the gene is active
- What can reporter genes do?
42Immunocytochemistry
- Searching for where the protein is located
- Labeled antibody
- Fluorescent labeling and colloidal gold
43 7.3. Global Studies of Genome Activity
44Global Studies of Genome Activity
- Understanding how the genome as a whole operates
within the cell
- Draws on the genome itself, the transcriptome and the proteome
45Studying the transcriptome
- The transcriptome is the set of mRNAs present in a cell at a particular
time
- Identify the mRNAs that it contains and determine
their relative abundances
46Assay the composition of a transcriptome
- Convert its mRNA into cDNA, and then sequence
every clone in the resulting cDNA library
- Feasible but laborious
- SAGE (serial analysis of gene expression)
- Study short sequence tags (12 bp in length)
- Short but sufficient to enable the gene to be
identified
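
Once tags have been sequenced, relative abundance is a matter of counting. The sketch below assumes the tags have already been extracted as a list of 12-bp strings; the function name is hypothetical.

```python
from collections import Counter


def relative_abundance(tags):
    """Map each distinct tag to its share of all sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}
```

A tag seen three times as often as another indicates an mRNA roughly three times as abundant, provided both tags identify their genes uniquely.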
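47SAGE: why are 12-bp tags enough?
- 4^12 = 16,777,216, so a given 12-bp sequence is expected by chance once in every 16,777,216 bp
- Average size of a eukaryotic mRNA is about 1500 bp
- 1500 × 11,000 = 16,500,000 bp, so a given tag is expected by chance only about once in every 11,000 transcripts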
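
The arithmetic can be checked directly. The variable names below are illustrative; the tag length and average mRNA size are the values given on the slide.

```python
TAG_LENGTH = 12            # bp per SAGE tag
MEAN_MRNA_LENGTH = 1500    # bp, approximate average for a eukaryotic mRNA

# Number of distinct sequences of length 12 over the alphabet A, C, G, T.
possible_tags = 4 ** TAG_LENGTH

# How many average-sized transcripts add up to that many base pairs.
transcripts_equivalent = possible_tags / MEAN_MRNA_LENGTH

print(possible_tags)                  # 16777216
print(round(transcripts_equivalent))  # 11185
```

The exact quotient is slightly above the rounded figure of 11,000 used on the slide.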
48Using chip and microarray technology
- Convert the mRNA of the target transcriptome into cDNA
- Chip: immobilized oligonucleotides
- Microarray: immobilized cDNAs
49Studying the proteome
- The proteome is the link between the genome and
the biochemical capability of the cell
- Differences between transcriptome and proteome
- Not all mRNAs are actively translated at any
particular time
- The protein content is variable
50Proteomics - methodology
- Protein electrophoresis
- Mass spectrometry
51Identifying proteins that interact with one
another
- An interaction with a second, well-characterized
protein can indicate the function of an unknown protein
- Two most useful methods
- Phage display
- Yeast two-hybrid system
52Phage display
- Insert the DNA for the test protein into a phage coat protein gene
- More powerful strategy
- Prepare a phage display library
53Phage display
- M13 is a filamentous phage used for phage display
- Coat proteins pIII and pVIII are used for display, in full or hybrid formats
54Phage display (cont.)
1. [text not recoverable]
2. [text not recoverable]
3. Phage bound to the ligand are eluted at low pH
55Yeast two-hybrid system
- An activator protein has two domains
- DNA-binding domain: binds to a DNA sequence upstream of the gene
- Activation domain: activates RNA polymerase
57Using homology analysis to deduce protein-protein
interactions
- The 5′ region of the yeast HIS2 gene is homologous to E. coli his2
- The 3′ region of the yeast HIS2 gene is homologous to E. coli his10
58 7.4. Comparative Genomics
59Comparative genomics as an aid to gene
mapping (1/3)
- Genomes of related organisms are similar.
- The closer two organisms are on the evolutionary
scale, the more related their genomes will be.
60Comparative genomics as an aid to gene
mapping (2/3)
- The pufferfish genome is just 400 Mb, but
contains approximately the same number of genes
as the human genome.
- It should be possible to use the pufferfish map
to find human homologs of pufferfish genes.
61Comparative genomics as an aid to gene
mapping (3/3)
- For example
- Wheat genome: 16,000 Mb
- Rice genome: 430 Mb
62Comparative genomics in the study of human disease
genes
- Gain access to the sequences of genes involved in
human disease.
- Discover a homolog of a human disease gene in a
second organism.
- Infer the biochemical role of the human gene from
a homolog that has already been characterized.
63Examples of human disease genes that have homologs
in Saccharomyces cerevisiae