Title: Genetics and Molecular Biology Tutorial II -- Computational Perspective
- The goal is to introduce selected topics to
individuals with minimal background in
genetics/biology, while also providing examples
that keep the interest of individuals with
extensive biology/genetics backgrounds.
Outline
- Gene structure
- genomic structure vs. mRNA structure
- coding and noncoding exons
- introns
- primary transcript processing
- aside -- nonsense-mediated mRNA degradation
- alternative splicing and differential polyadenylation
- evolutionary conservation of coding and noncoding sequences
Outline
- Genomic structure
- repetitive sequences
- LINEs and SINEs
- example -- Y chromosome palindromes
- C-value paradox
- genomes of model organisms
- example
- yeast genome and gene chip
- single/double knockouts
- cross-species sequence similarities for putative function identification
- example -- chaperonin
Fundamental Genetics and Probability Concepts
- meiosis and sampling
- patterns of inheritance
- monogenic and complex inheritance
- phenocopy
- reduced penetrance
- DNA variation
- polymorphisms, SNPs, and mutations
- positional cloning
Gene Structure
Transcript Processing
- DNA -> pre-mRNA -> mRNA -> protein
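The DNA -> pre-mRNA -> mRNA -> protein path can be sketched in a few lines of Python. This is a toy model under stated assumptions: the input is the coding-strand DNA of a single gene, exon coordinates are given (0-based, half-open) rather than predicted, splicing simply joins the exons, and translation starts at the first AUG. The names `transcribe`, `splice`, and `translate` are illustrative, not from any library.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA -> pre-mRNA (same sequence with T replaced by U)."""
    return dna.upper().replace("T", "U")


def splice(pre_mrna, exons):
    """Keep the exon intervals (0-based, half-open) and drop the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For a made-up two-exon gene, `splice(transcribe("ATGGCC" + "GTAAGTCCCCAG" + "TTTGGATAA"), [(0, 6), (18, 27)])` gives `"AUGGCCUUUGGAUAA"`, which translates to `"MAFG"`; the 12-nt intron (GU...AG) never reaches the mRNA.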
Nonsense-Mediated mRNA Degradation
- mechanism unknown
- more rapidly degrades mRNA containing premature termination (nonsense) codons
- Lykke-Andersen, mRNA quality control: Marking the message for life or death. Current Biology, 11, 2001.
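One widely cited rule for mammalian cells (not stated on the slide, added here as background) marks a stop codon as premature when it lies more than about 50-55 nt upstream of the last exon-exon junction. A minimal sketch of that check, assuming the spliced mRNA is described only by its exon lengths and the position of the stop codon; `predicts_nmd` and its 50-nt default are illustrative choices.

```python
def predicts_nmd(exon_lengths, stop_pos, threshold=50):
    """Return True if the stop codon lies more than `threshold` nt upstream of
    the last exon-exon junction of the spliced mRNA.

    exon_lengths: exon lengths in the spliced mRNA, listed 5' to 3'.
    stop_pos: 0-based position of the first nucleotide of the stop codon.
    threshold: 50 nt by default; the rule is usually quoted as 50-55 nt.
    """
    if len(exon_lengths) < 2:
        return False  # single-exon mRNA: no junction, so this rule cannot apply
    last_junction = sum(exon_lengths[:-1])  # position where the last exon begins
    return last_junction - stop_pos > threshold
```

With exons of 200, 150 and 300 nt the last junction sits at position 350, so a stop codon at position 100 is predicted to trigger NMD, while one at 320 or inside the last exon is not.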
Nonsense-Mediated mRNA Degradation
Genome Structure -- Repeat Classes
C-Value Paradox (Hartl, Molecular melodies in high and low C, Nat. Rev. Genetics, Nov. 2000)
- refers to the massive, counterintuitive and seemingly arbitrary differences in genome size observed in eukaryotic organisms
- Drosophila melanogaster (fruit fly): 180 Mb
- Podisma pedestris (grasshopper): 18,000 Mb, a 100-fold difference
- the difference is difficult to explain in view of apparently similar levels of evolutionary, developmental, and behavioral complexity
Alternative Splicing
- Every conceivable pattern of alternative splicing is found in nature. Exons have multiple 5' or 3' splice sites that are used alternatively (a, b). Single cassette exons can reside between two constitutive exons such that the alternative exon is either included or skipped (c). Multiple cassette exons can reside between two constitutive exons such that the splicing machinery must choose between them (d). Finally, introns can be retained in the mRNA and become translated.
- Graveley, Alternative splicing: increasing diversity in the proteomic world. Trends in Genetics, Feb. 2001.
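The combinatorics of patterns (c) and (d) can be illustrated with a small Python sketch. It assumes, for simplicity, that each cassette exon is included or skipped independently (real splicing decisions are often coupled), and the exon labels are hypothetical.

```python
from itertools import product


def cassette_isoforms(first_exon, cassettes, last_exon):
    """Pattern (c) applied to every cassette exon: each one is either included
    or skipped (assumed independent), giving 2 ** len(cassettes) isoforms."""
    isoforms = []
    for choice in product((True, False), repeat=len(cassettes)):
        middle = [exon for exon, keep in zip(cassettes, choice) if keep]
        isoforms.append([first_exon, *middle, last_exon])
    return isoforms


def mutually_exclusive_isoforms(first_exon, alternatives, last_exon):
    """Pattern (d): the splicing machinery picks exactly one alternative exon."""
    return [[first_exon, exon, last_exon] for exon in alternatives]
```

Three independent cassette exons between two constitutive exons already give 2^3 = 8 isoforms, while three mutually exclusive exons give only 3.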
Classic View of the Gene No Longer Valid -- Strachan, p. 185
Alternative Splicing Example -- Graveley 2001
Alternative Polyadenylation
- common in human RNA (Edwards-Gilbert 1997)
- in many genes, 2 or more poly-A signals in the 3' UTR
- alternative transcripts can show tissue specificity
- alternative poly-A signals may be brought into play following alternative splicing
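Candidate poly-A signals in a 3' UTR can be located with a simple scan. The sketch assumes the canonical hexamer AAUAAA plus its most common variant AUUAAA and reports every occurrence; each signal actually used would give a transcript with a different 3' end. `find_polya_signals` is an illustrative name, and real site usage also depends on downstream elements not modeled here.

```python
import re

# Assumed signals: canonical AAUAAA and its most common variant AUUAAA (RNA alphabet).
POLYA_SIGNALS = ("AAUAAA", "AUUAAA")


def find_polya_signals(utr3):
    """Return sorted (position, hexamer) pairs for every candidate poly-A
    signal in a 3' UTR sequence, including overlapping occurrences."""
    utr3 = utr3.upper()
    hits = []
    for signal in POLYA_SIGNALS:
        # A zero-width lookahead lets overlapping matches be reported.
        for match in re.finditer(f"(?={signal})", utr3):
            hits.append((match.start(), signal))
    return sorted(hits)
```

For example, `find_polya_signals("GCAAUAAAGCGCGCAUUAAAGC")` reports one canonical signal at position 2 and one variant at position 14, i.e. two possible 3' ends.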
Edwards-Gilbert. Nucleic Acids Res, 13, 1997
- Evolution of the mitochondrial genome and origin of eukaryotic cells
Evolutionary Conservation of Coding and Noncoding Sequences
- Sequencing of H. sapiens and model organisms is the basis for comparative genomics
- Functional solutions (encoded as genes) are generally shared across organisms, which allows us to compare gene sequences and infer function
- protein functional/structural domains
- Intergenic regions are generally not conserved (there are always exceptions)
Example - MKKS (UniGene Clusters)
- human vs. rat: 87.4%
- human vs. mouse: 84.9%
- human vs. cow: 87.1%
- mouse vs. rat: 97.8%
- rat vs. cow: 91.0%
- mouse vs. cow: 85.1%
- frog vs. rat: 62.5%
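The slide does not say how the MKKS values were computed. As one plausible reading, a percent identity over a pre-computed pairwise alignment can be obtained as follows; the gap convention (ignore gapped columns) is an assumption, and other tools divide by the full alignment length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.
    Columns where either sequence has a gap are skipped (one common convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b  # True counts as 1
    return 100.0 * matches / compared if compared else 0.0
```

`percent_identity("ACGT-A", "ACGAGA")` compares five ungapped columns, four of which match, giving 80.0.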
Example - MKKS
Computational Approach to Using Conserved Regions
- Problem -- want to screen genes for mutations
- Conventional approach -- screen all exons of a single gene
- Alternative -- identify conserved domains within multiple genes, and screen those domains first to optimize screening time and resources
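The domain-first strategy can be expressed as a greedy ordering. This sketch assumes each domain already has a conservation score (for example, mean cross-species identity) and a length in bp, and that screening effort is limited by a total bp budget; the `Domain` record and `screening_order` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    gene: str
    name: str
    length_bp: int
    conservation: float  # hypothetical score, e.g. mean cross-species % identity


def screening_order(domains, budget_bp):
    """Pick domains in order of decreasing conservation, skipping any that
    would exceed the remaining screening budget (in bp)."""
    plan, used = [], 0
    for domain in sorted(domains, key=lambda d: d.conservation, reverse=True):
        if used + domain.length_bp > budget_bp:
            continue
        plan.append(domain)
        used += domain.length_bp
    return plan
```

Greedy selection by conservation is only one reasonable policy; weighting by length or by prior mutation data would change the order.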
Cross-Species Similarities
- yeast
- gene chip for hybridization/expression
- complete genome (first eukaryote sequenced)
- single knockouts and double knockouts
Fundamental Genetics
- meiosis
- H. sapiens (humans) are diploid
- meiosis produces haploid gametes
- mechanism for transmission of genetic material to offspring
- recombination by crossing-over (Holliday structure) or by independent segregation of homologous pairs
Fundamental Genetics (Background for Linkage Analysis)
- Rule of Segregation
- offspring receive ONE allele (genetic material) from the pair of alleles possessed by EACH parent
- Rule of Independent Assortment
- alleles of one gene can segregate independently of alleles of other genes
- (Linkage analysis relies on violations of the Independent Assortment rule)
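The segregation and assortment rules can be simulated for a parent who carries haplotypes AB and ab. In this sketch (illustrative names, Python's `random` module), the recombination fraction `theta` controls how often the gamete mixes the two chromosomes: `theta = 0.5` reproduces independent assortment, while smaller values model linkage.

```python
import random


def make_gamete(haplotype_1, haplotype_2, theta, rng=random):
    """Produce one gamete from a parent carrying two chromosomes, each given as
    (locus-1 allele, locus-2 allele).

    Segregation: the locus-1 allele comes from either chromosome with prob. 1/2.
    theta: probability of a recombination between the two loci.
    """
    first, second = (haplotype_1, haplotype_2) if rng.random() < 0.5 else (haplotype_2, haplotype_1)
    if rng.random() < theta:
        return (first[0], second[1])  # recombinant gamete
    return first  # parental (non-recombinant) gamete


def recombinant_fraction(theta, n=100_000, seed=1):
    """Simulate n gametes from an AB/ab parent; return the fraction that are Ab or aB."""
    rng = random.Random(seed)
    parental = {("A", "B"), ("a", "b")}
    recombinants = sum(
        make_gamete(("A", "B"), ("a", "b"), theta, rng) not in parental
        for _ in range(n)
    )
    return recombinants / n
```

With 100,000 simulated gametes the recombinant fraction comes out close to the `theta` used, which is the quantity linkage analysis tries to estimate.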
Genetic Markers -- Prelude to Linkage Analysis
- A genetic marker allows for the observation of the genetic state at a particular genomic location (locus).
- A genotype is the measured state of a genetic marker.
- It may never be feasible to sequence cases directly.
- An informative marker is polymorphic, so individuals are often heterozygous at it, and it enables the observation of the inheritance of genetic material.
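A common way to quantify how informative a marker is (not given on the slide) is its expected heterozygosity, H = 1 - sum of p_i^2, the chance that a random person is heterozygous when genotypes follow Hardy-Weinberg proportions. A minimal sketch:

```python
def expected_heterozygosity(allele_freqs):
    """H = 1 - sum(p_i^2): probability that a random individual is
    heterozygous at the marker, assuming Hardy-Weinberg proportions."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)
```

A two-allele marker can reach at most H = 0.5, while a marker with four equally common alleles reaches 0.75, which is why highly polymorphic markers are preferred.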
Monogenic and Polygenic Diseases
- monogenic (Mendelian) -- one gene
- simple (dominant and recessive) Mendelian inheritance
- direct correspondence between one gene mutation and one disorder
- the majority of disease genes found so far are monogenic
- polygenic (complex) -- multiple genes
- heterogeneity and epistasis
- combinatorics
- no longer a direct correspondence between one gene and a disorder
- the majority of disorders are probably polygenic
- complexity of organisms and observed pathways
Monogenic and Polygenic Diseases (continued)
- phenocopy
- reduced penetrance
- Example -- sickle cell anemia
- classic recessive disorder
- defect in red blood cell hemoglobin
- but the fetal (infant) hemoglobin gene can remain expressed ("leak")
- wide range of phenotypes
Examples
BBS4 Pedigree
Hardy-Weinberg Equilibrium
- Rule that relates allelic and genotypic frequencies in a population of diploid, sexually reproducing individuals if that population has random mating, large size, no mutation or migration, and no selection
- Consequences (under these assumptions)
- allelic frequencies will not change in a population from one generation to the next
- genotypic frequencies are determined in a predictable way by allelic frequencies
- the equilibrium is neutral -- if perturbed, it will be reestablished within one generation of random mating at the new allelic frequency
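H-W
- two alleles A and a with frequencies p and q (p + q = 1):
$$f(AA) = p^2, \qquad f(Aa) = 2pq, \qquad f(aa) = q^2$$
$$(p + q)^2 = p^2 + 2pq + q^2 = 1$$
- three alleles with frequencies p, q, r:
$$p^2 + q^2 + r^2 + 2pq + 2pr + 2qr = (p + q + r)^2$$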
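The expansion of (p + q + ...)^2 generalizes to any number of alleles: homozygotes get p_i^2 and heterozygotes 2 p_i p_j. A small Python sketch, assuming single-character allele labels so that genotype keys such as "Aa" stay unambiguous:

```python
from itertools import combinations_with_replacement


def hw_genotype_freqs(allele_freqs):
    """Expand (p1 + ... + pk)^2 into Hardy-Weinberg genotype frequencies:
    homozygote i/i -> p_i^2, heterozygote i/j -> 2 p_i p_j."""
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        freqs[a + b] = pa * pa if a == b else 2 * pa * pb
    return freqs
```

For `{"A": 0.7, "a": 0.3}` this gives AA = 0.49, Aa = 0.42, aa = 0.09; for three alleles it returns the six terms of (p + q + r)^2, which sum to 1.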
Dominant and Recessive Penetrance Models
penetrance = P(phenotype | genotype)
Model                  DD     Dd     dd
dominant               1      1      0
dominant, reduced      0.9    0.9    0.0
recessive              0      0      1
recessive, reduced     0      0      0.8
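Penetrance values plug directly into likelihood and prevalence calculations. The sketch below stores the four penetrance vectors listed above (the model labels are this sketch's own) and combines them with Hardy-Weinberg genotype frequencies; `p_D` is the assumed frequency of allele D.

```python
# P(affected | genotype) for the four models; labels are illustrative.
PENETRANCE = {
    "dominant": {"DD": 1.0, "Dd": 1.0, "dd": 0.0},
    "dominant, reduced": {"DD": 0.9, "Dd": 0.9, "dd": 0.0},
    "recessive": {"DD": 0.0, "Dd": 0.0, "dd": 1.0},
    "recessive, reduced": {"DD": 0.0, "Dd": 0.0, "dd": 0.8},
}


def phenotype_prob(model, genotype, affected):
    """P(phenotype | genotype): penetrance if affected, 1 - penetrance otherwise."""
    f = PENETRANCE[model][genotype]
    return f if affected else 1.0 - f


def prevalence(model, p_D):
    """Expected fraction affected, using Hardy-Weinberg genotype frequencies."""
    q = 1.0 - p_D
    genotype_freq = {"DD": p_D * p_D, "Dd": 2 * p_D * q, "dd": q * q}
    return sum(genotype_freq[g] * PENETRANCE[model][g] for g in genotype_freq)
```

Under the fully penetrant recessive model with D at frequency 0.9 (so d at 0.1), about 1% of the population is expected to be affected.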
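D-R Heterogeneous, DD Epistatic
Dominant-recessive heterogeneity (affected if at least one A allele, or genotype bb):
        AA    Aa    aa
  BB    1     1     0
  Bb    1     1     0
  bb    1     1     1
- reduced penetrance
- number of multilocus genotypes for n biallelic loci: 3, 9, 27, 81, 243 (= 3^n)
Dominant-dominant epistasis (affected only if at least one A allele and at least one B allele):
        AA    Aa    aa
  BB    1     1     0
  Bb    1     1     0
  bb    0     0     0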
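The two-locus tables can be generated from simple rules, which also shows why the number of multilocus genotypes grows as 3^n. The rule functions here are hypothetical encodings of the two tables above: heterogeneity = at least one A allele or genotype bb; epistasis = at least one A allele and at least one B allele.

```python
from itertools import product

A_GENOTYPES = ("AA", "Aa", "aa")
B_GENOTYPES = ("BB", "Bb", "bb")


def has_dominant(genotype):
    """True if the genotype carries at least one uppercase (dominant) allele."""
    return any(ch.isupper() for ch in genotype)


def heterogeneity(ga, gb):
    """Affected if dominant at locus A OR homozygous recessive (bb) at locus B."""
    return 1 if has_dominant(ga) or gb.islower() else 0


def epistasis(ga, gb):
    """Affected only if dominant at locus A AND dominant at locus B."""
    return 1 if has_dominant(ga) and has_dominant(gb) else 0


def penetrance_table(rule):
    """Map every (A genotype, B genotype) pair to its 0/1 penetrance."""
    return {(ga, gb): rule(ga, gb) for ga, gb in product(A_GENOTYPES, B_GENOTYPES)}


def n_genotypes(n_loci):
    """Count multilocus genotypes for n biallelic loci (equals 3 ** n_loci)."""
    return sum(1 for _ in product(A_GENOTYPES, repeat=n_loci))
```

Each table has 9 cells, and `[n_genotypes(n) for n in range(1, 6)]` reproduces the 3, 9, 27, 81, 243 sequence.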
Dom-Rec Heterozygous
Screen genes A, B?, b
Uninformative Marker
Informative Marker
- Given the following observations: family structure, affection status, genotypes, and disease allele frequencies. Assuming a model for the disease, can we calculate the probability of these observations under that model?
Linkage
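Linkage Analysis
- Goal: find a marker linked to a disease gene.
- LOD score: log (base 10) of the likelihood ratio
$$\mathrm{LR} = \frac{P(\text{data} \mid \theta)}{P(\text{data} \mid \theta = 1/2)}, \qquad \mathrm{LOD}(\theta) = \log_{10} \mathrm{LR}$$
- theta: estimate of the genetic distance (recombination fraction) between marker and disease locus
- proportion of recombinant gametes / total gametes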
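For the simplest case, phase-known meioses where recombinants (R) and non-recombinants (NR) can be counted directly, the likelihood is theta^R (1 - theta)^NR, and maximizing the LOD over a grid of theta values estimates the recombination fraction. This is a sketch under that assumption; programs such as FASTLINK or GeneHunter compute the likelihood over full pedigrees instead.

```python
import math


def lod(r, nr, theta):
    """Two-point LOD score for r recombinant and nr non-recombinant
    phase-known meioses: log10[theta^r (1 - theta)^nr / 0.5^(r + nr)]."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and r > 0:
        return -math.inf  # any recombinant rules out complete linkage
    log_linked = (r * math.log10(theta) if r else 0.0) + nr * math.log10(1.0 - theta)
    log_unlinked = (r + nr) * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(r, nr, n_steps=50):
    """Maximize the LOD over an even grid of theta values in [0, 0.5];
    returns (best LOD, theta at which it occurs)."""
    thetas = [0.5 * i / n_steps for i in range(n_steps + 1)]
    return max((lod(r, nr, theta), theta) for theta in thetas)
```

For 2 recombinants among 20 meioses the maximum is at theta = 0.1 with a LOD of about 3.2, above the conventional threshold of 3 for declaring linkage.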
Linkage Analysis
- Linkage analysis calculates the likelihood that the inheritance pattern of the phenotype (disease) is supported by the observed inheritance patterns (genotypes) in a pedigree.
- monogenic: few models, easy to test
- polygenic: more difficult to find models explaining inheritance
- parameter maximization
Linkage Analysis Programs
- FASTLINK -- two-point
- run time grows exponentially with n, the number of markers
- GeneHunter -- multipoint and two-point
- run time grows exponentially with n, the number of people in the pedigree
Allele Sharing
- tries to show that affected family members inherit the same chromosomal regions more often than expected by chance
Allele Sharing Example
Requires at least a sibling pair.
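A simplified allele-sharing sketch counts the alleles two siblings share identical by state (IBS) at one marker and compares the mean to the 1 allele expected for full sibs with no linkage (0, 1, 2 shared alleles with probabilities 1/4, 1/2, 1/4). Proper methods estimate identity by descent (IBD) using parental genotypes; IBS is used here only to keep the example short, and the function names are illustrative.

```python
def ibs(genotype_1, genotype_2):
    """Number of alleles (0, 1 or 2) two individuals share identical by state
    at one marker; genotypes are 2-tuples of allele labels."""
    remaining = list(genotype_2)
    shared = 0
    for allele in genotype_1:
        if allele in remaining:
            remaining.remove(allele)  # each allele can be matched only once
            shared += 1
    return shared


# Full sibs, no linkage: share 0, 1, 2 alleles IBD with prob. 1/4, 1/2, 1/4.
EXPECTED_MEAN_SIB_SHARING = 0.25 * 0 + 0.5 * 1 + 0.25 * 2


def mean_sharing(sib_pairs):
    """Average IBS sharing over a list of (genotype, genotype) sib pairs."""
    return sum(ibs(a, b) for a, b in sib_pairs) / len(sib_pairs)
```

Affected sib pairs whose mean sharing near a marker is well above 1 point toward a linked region.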
Association Studies
- "Allelic association studies provide the most powerful method for locating genes of small effect contributing to complex diseases and traits." Daniels, Am J Hum Genet 62:1189-1197, 1998.
- Linkage analysis
- genome-wide screen: 400 markers at 10 cM (10 Mb) spacing, whereas association needs 4000 polymorphic markers
- generally needs a nuclear family or larger
- Association finds linkage disequilibrium
Association Studies
- "Association is simply a statistical statement about the co-occurrence of alleles or phenotypes. Allele A is associated with disease D if people who have D also have A more (or maybe less) often than would be predicted from the individual frequencies of D and A in the population." Tom Strachan, Human Molecular Genetics 2, p. 286.
Examples
- HLA-DR4 (antigen marker)
- 36% in the UK population
- 78% of people with rheumatoid arthritis
- CF (RFLP markers XV2.c with alleles X1, X2 and KM19 with alleles K1, K2)
  Marker alleles    CF (case)    Normal (control)
  X1, K1            3            49
  X1, K2            147          19
  X2, K1            8            70
  X2, K2            8            25
- CF is associated with X1, K2: 147/166 (89%) in CF vs. 19/163 (12%) in normal (Strachan)
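The CF figures can be recomputed from the table: haplotype frequency in cases and controls, plus an odds ratio for carrying X1, K2 versus any other haplotype. The odds ratio is this sketch's addition, not a number from the slide, and the function assumes no zero cells.

```python
# Counts copied from the CF table: haplotype -> (CF cases, normal controls).
CF_COUNTS = {
    ("X1", "K1"): (3, 49),
    ("X1", "K2"): (147, 19),
    ("X2", "K1"): (8, 70),
    ("X2", "K2"): (8, 25),
}


def association(counts, haplotype):
    """Return (frequency in cases, frequency in controls, odds ratio) for
    `haplotype` versus all other haplotypes combined."""
    cases_total = sum(cases for cases, _ in counts.values())
    controls_total = sum(controls for _, controls in counts.values())
    a, b = counts[haplotype]                     # carrying the haplotype
    c, d = cases_total - a, controls_total - b   # carrying any other haplotype
    return a / cases_total, b / controls_total, (a * d) / (b * c)
```

For `("X1", "K2")` this gives 89% versus 12% and an odds ratio of about 58.6.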
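Linkage Disequilibrium
- linkage equilibrium (the two-locus analogue of Hardy-Weinberg equilibrium) holds if, for alleles A_i and B_j at two loci,
$$P(A_i B_j) = P(A_i)\,P(B_j)$$
where P(A_i B_j) is the haplotype frequency
- case vs. controls
- TDT (transmission/disequilibrium test: transmission of marker alleles from heterozygous parents), HRR (haplotype relative risk: untransmitted parental alleles used as controls)
- in outbred populations, allelic associations are maintained only over < 1 cM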
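Two quantities from this slide can be sketched directly: the disequilibrium coefficient D = P(AB) - P(A)P(B), which is zero under linkage equilibrium, and the TDT statistic (b - c)^2 / (b + c), where b and c count how often heterozygous parents did and did not transmit a given allele (compared against chi-square with 1 df). Function names are illustrative, and D' (D scaled by its maximum possible value) is an addition not on the slide.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for haplotype frequency p_ab and allele frequencies
    p_a, p_b. D = 0 means linkage equilibrium; D' keeps the sign of D."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


def tdt_chi2(transmitted, untransmitted):
    """TDT statistic (b - c)^2 / (b + c) with 1 degree of freedom, where b and
    c count transmissions and non-transmissions from heterozygous parents."""
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
```

For example, 60 transmissions against 40 non-transmissions give a TDT statistic of 4.0.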
Equilibrium
SNPs
- Single-Nucleotide Polymorphisms
- 1 every 1000 bp (estimated)
- 2,972,052 SNPs submitted to dbSNP
- 50% of all SNPs are in question
- 10% of UTRs have SNPs
- 100,000 - 500,000 SNPs needed
- Why don't we do this?
Homozygosity Mapping
Positional Cloning
Disease Gene Identification
- SSCP -- single strand conformational polymorphism
- PCR -- polymerase chain reaction
- primers amplify template sequence
- direct sequencing
- BBS2 (Bardet-Biedl Syndrome)
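PCR amplification can be mimicked in silico: find the forward primer on the template, find the reverse primer as its reverse complement downstream, and return the sequence between them. This sketch assumes exact primer matches on one strand (real tools allow mismatches); it also shows how a deletion that removes a primer site yields no product, which is one way a PCR-based screen reveals missing sequence. Function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon bounded by the primers, or None if either primer
    site is missing. `reverse` is written 5'->3' on the opposite strand, so it
    is located on the template as its reverse complement."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]
```

On a toy template, primers `"ACGTACG"` and `"CCCGGGAAA"` return the 26-bp stretch between and including the two binding sites; deleting the reverse-primer site makes the function return `None`.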
BBS2 Genetic Mapping (C16, 1-12)
BBS2 Genetic Mapping (C16, 1-12; unaffected vs. affected)
BBS4 Gene (Direct Sequencing) (Hs.26471)
BBS4 Deletion (by PCR): exons 3 and 4
BBS4 Mutations (Direct Sequencing) (R295P)
Summary
- Disease Gene Identification
- challenges
- interval localization
- genotyping and genetic markers, linkage analysis, allele sharing, association studies (SNPs), homozygosity mapping
- disease gene identification techniques
- Take home
- A complex disorder (with interacting genes) has yet to be characterized
Demo -- installing a database
- A database organizes data
- Most common
- relational database (Oracle, Sybase)
- perceived as a collection of tables, where a table is an unordered collection of rows
- each row has a fixed number of fields, and each field can store a predefined type of data value (date, integer, string, etc.)
- simplest
- flat file
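The relational model described above can be tried without installing Oracle or Sybase by using SQLite through Python's built-in `sqlite3` module (a substitution for this sketch). The table, column names, and rows are hypothetical, loosely modeled on a genotype database.

```python
import sqlite3


def build_genotype_db():
    """Create an in-memory SQLite database with one hypothetical genotype table.
    Every row has the same fixed, typed fields; row order is not significant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE genotypes (
               individual TEXT    NOT NULL,
               marker     TEXT    NOT NULL,
               allele1    INTEGER NOT NULL,
               allele2    INTEGER NOT NULL,
               typed_on   TEXT,
               PRIMARY KEY (individual, marker)
           )"""
    )
    conn.executemany(
        "INSERT INTO genotypes VALUES (?, ?, ?, ?, ?)",
        [
            ("ind1", "m1", 1, 2, "2001-05-01"),
            ("ind2", "m1", 2, 2, "2001-05-01"),
            ("ind1", "m2", 3, 3, "2001-05-02"),
        ],
    )
    conn.commit()
    return conn


def heterozygous_at(conn, marker):
    """Individuals whose two alleles differ at `marker` (informative genotypes)."""
    rows = conn.execute(
        "SELECT individual FROM genotypes "
        "WHERE marker = ? AND allele1 <> allele2 ORDER BY individual",
        (marker,),
    ).fetchall()
    return [individual for (individual,) in rows]
```

The same rows could live in a flat file (one tab-separated line per genotype), but the typed columns, primary key, and SQL queries are what the relational approach adds.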
Databases
- NCBI
- BLAST
- Amazon
- Yahoo
- Several of our own
- genotypes
- rat ESTs
- eye clones from differential display
- microarray data