Title: Biostat 830


1
Introduction
  • Biostat 830
  • Winter 2006
  • Lecture 1

2
Title
  • SNPs, haplotypes and association studies
  • Special topics on human genetics and population
    genetics

3
Who should take this class?
  • For those who are or will be involved in association
    studies.
  • Design.
  • Conduct the study.
  • Data analysis.
  • For those who are looking for interesting
    research topics in statistics and genetics.

4
Overall objective
  • Connecting phenotype with genotype.
  • Identify DNA sequence variations that cause
    specific traits.

5
Brief history
  • Genome-wide linkage analysis using DNA
    polymorphisms was first proposed by Botstein et
    al. (1980).
  • Many disease-causing genes (mostly Mendelian) have
    been identified by positional cloning.
  • Cystic fibrosis: Kerem et al. 1989, Riordan et
    al. 1989.
  • Huntington's disease: Gusella et al. 1983.

6
Limitation of linkage study
  • Successful linkage studies need a near-perfect
    phenotype-genotype match.
  • In complex diseases, misdiagnosis, heterogeneity,
    complex inheritance, and phenocopies are
    common.
  • Linkage mapping limits resolution to the order of
    1-10 cM. The dominant limit of resolution is the
    number of meioses in which crossovers might have
    occurred.

7
Motivation
  • Identify genetic variants that cause complex
    diseases.
  • Association studies have emerged as a promising
    approach for this endeavor.
  • Problems that need to be addressed:
  • population stratification, tagSNP selection,
    multiple testing, study design, ...
8
Contents of the class I
  • Lectures
  • LD, haplotype block.
  • TagSNP selection.
  • HapMap project.
  • Haplotype inference.
  • Population structure.
  • Coalescent theory.
  • Multiple testing.

9
Contents of the class II
  • Lectures
  • Association studies
  • Genotyping assays: Affy 500K, Illumina 250K, ...
  • Design strategy: power, case-control, two-stage, ...
  • Quality control: HWE, genotyping errors, ...
  • Data analysis: haplotype-based, multiple testing,
    gene-gene interaction, gene-environment
    interaction, adjustment for population
    stratification, ...

10
Contents of the class III
  • Paper discussion
  • Two papers each time,
  • Two groups,
  • Each group makes up ten questions about the paper
    and challenges the other group to answer these
    questions.

11
Contents of the class IV
  • Student presentations
  • Each student presents once, on an analysis performed
    on a real dataset, using HapMap resources.

12
Evaluation
  • Class participation,
  • Data analysis exercise/homework,
  • Analysis project and presentation.

13
Human Genome Variation
14
Terminology
  • Locus: the physical location of a gene.
  • e.g., D1S80, D4S43, D16S126
  • Allele: an alternative form of a gene.
  • e.g., A, a; A, B, O
  • Genotype:
  • The observed alleles at a genetic locus for an
    individual.
  • e.g., AA, Aa, aa; AA, AB, AO, BB, BO, OO
  • Homozygous: AA, aa
  • Heterozygous: Aa
  • Phenotype:
  • The expression of a particular genotype
  • Continuous: plasma LDL level, blood pressure
  • Dichotomous: hyperlipidemia, hypertension

15
DNA Polymorphism
  • Restriction Fragment Length Polymorphism
  • Tandem Repeats
  • Satellites
  • Minisatellites
  • Microsatellites
  • Single Nucleotide Polymorphism
  • Single-base substitutions
  • Single-base insertion/deletions

16
RFLP
  • Discovered in 1975
  • Only two alleles: restriction site present or not present
  • 10,000 in genome

17
RFLP
18
Tandem repeats
  • They include three subclasses
  • Satellites.
  • Minisatellites.
  • Microsatellites. 
  • The name "satellites" comes from their optical
    spectra: buoyant density gradient centrifugation
    can separate DNA fragments with significantly
    different base compositions. The main band
    represents the bulk DNA, and the "satellite"
    bands originate from tandem repeats.

19
Satellites
  • Satellites
  • The size of a satellite DNA ranges from 100 kb to
    over 1 Mb.
  • Example: the alphoid DNA located at the centromere
    of all chromosomes. Its repeat unit is 171 bp.

20
Minisatellite
  • Large family of repetitive sequences (Jeffreys et
    al. 1985, Armour et al. 1992).
  • Many of the original tandem repeat families
    could be purified from the rest of the genome
    as the satellite fraction of DNA.
  • 9-24 bp monomer, total length 0.5-30 kb.
  • Located in non-coding regions.
  • Used for
  • Paternity test
  • Forensic identification

21
VNTR
22
  • D1S80
  • 16 base pair repeats
  • Non-coding region on chromosome 1
  • Repeat number: 14-40

BS 50 Genetics and Genomics Spring 2001, Prof.
Dan Hartl
23
Microsatellite
  • Nucleotide repeat markers (Weber and May 1989,
    Litt and Luty 1989)
  • Short Tandem Repeat
  • Mostly located in introns and UTRs, some in coding
    regions.
  • Used for
  • DNA fingerprinting and DNA testing.
  • Linkage analysis.
  • Genetic and physical mapping of genes.

24
Huntington's Disease
  • On the short arm of chromosome 4
  • CAG repeats
  • normal: 11-29 times
  • disease: 40-80 times
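
As a rough illustration of the repeat counts above, the sketch below counts the longest uninterrupted run of CAG in a DNA string and compares it with the two ranges given on the slide. The function names and the toy sequence are hypothetical, and counts between 30 and 39 are left unclassified because the slide does not cover them.

```python
import re

def longest_cag_run(sequence):
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_repeat_count(n_repeats):
    """Map a repeat count onto the ranges quoted on the slide."""
    if 11 <= n_repeats <= 29:
        return "normal range"
    if 40 <= n_repeats <= 80:
        return "disease range"
    return "outside the ranges given on the slide"

# Toy sequence (made up for illustration): 42 CAG repeats with short flanks.
toy_sequence = "ATG" + "CAG" * 42 + "CCG"
toy_count = longest_cag_run(toy_sequence)
toy_label = classify_repeat_count(toy_count)
print(toy_count, toy_label)
```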

THIS LAND IS YOUR LAND words and music by Woody
Guthrie This land is your land, this land is
my land From California, to the New York Island
From the redwood forest, to the gulf stream
waters This land was made for you and me As I
was walking a ribbon of highway I saw above me
an endless skyway I saw below me a golden valley
This land was made for you and me
Woody Guthrie
25
SNP
26
What is a SNP?
27
SNP Key Concepts
  • Definition: more than one alternative base occurs
    at an appreciable frequency
  • Availability: over 10 million SNPs have been
    identified in the human genome (dbSNP Build 125)
  • Function: most SNPs are neutral, and less than 1%
    are present in protein-coding regions

28
SNP
  • The most common genetic polymorphism
  • Distributed throughout the genome with high density
  • More stable and easier to assay
  • Major cause of genetic diversity among different
    (normal) individuals, e.g. drug response, disease
    susceptibility.
  • Facilitates large scale genetic association
    studies as genetic markers.

29
Total Number of SNPs in PHASE II HapMap
Total: 5,894,684.
30
SNP
  • Most SNPs neither change protein synthesis nor
    cause disease directly. Rather, they serve as
    landmarks, since they may be physically close to
    the mutation site on the chromosome. Because of
    this proximity, SNPs may be shared among groups
    of people with common characteristics.
  • Analyzing SNP patterns among different groups of
    people may shed light on human evolution and help
    us understand ethnic groups and races.

31
SNP Types
32
SNP Locations
[Figure: SNP locations within a gene. The DNA (5' to 3') contains Exons 1-4 separated by Introns 1-3. Transcription gives pre-mRNA; splicing gives mature mRNA (ORF, poly-A tail AAAAAAAA); translation (AUG - B1Bn - STOP) gives the protein sequence and protein 3D structure, which may lead to a phenotype change (e.g. asthma).]
33
Haplotype
  • Definition: an ordered list of alleles at
    multiple linked loci on a single chromosome

34
Genotype vs Haplotype
Single locus:
  Homozygous wild type: AA
  Homozygous mutant: aa
  Heterozygous: Aa
Multiple loci (1, 2, 3):
  AA BB CC
  aa bb cc
  Aa Bb Cc
Haplotypes: ABC, ABc, AbC, Abc, aBC, aBc, abC, abc
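
The eight haplotypes listed on this slide are every combination of one allele per locus. A minimal sketch of that enumeration follows; the function name and the data layout are illustrative assumptions, not part of the lecture.

```python
from itertools import product

def all_haplotypes(loci):
    """Return every haplotype for a list of loci, each given as a tuple of its alleles."""
    return ["".join(alleles) for alleles in product(*loci)]

# Three biallelic loci, as on the slide.
three_loci = [("A", "a"), ("B", "b"), ("C", "c")]
haplotypes = all_haplotypes(three_loci)
print(haplotypes)
```

With k biallelic loci the list has 2^k entries, which is one reason haplotype-based analysis becomes harder as more SNPs are included.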
35
Haplotypes vs. SNPs
  • ADVANTAGES
  • Haplotypes are more informative
  • Haplotypes may enhance the power for LD analysis
  • Haplotypes can be used to study the evolutionary
    relationship of SNPs
  • DISADVANTAGE
  • May not be completely resolved in the absence of
    family data or experimentation

36
Haplotype phasing
  • To determine the haplotypes from genotypes
    containing tightly linked SNPs from a set of n
    individuals

Subject 1: AA BB cc
Subject 2: Aa BB cc
Subject 3: AA Bb Cc
Subject 4: aa BB Cc
Subject 5: Aa Bb CC
. . .
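
To see why phasing is a real inference problem, the sketch below lists every haplotype pair consistent with one unphased genotype. It is a brute-force enumeration for illustration only, not one of the statistical phasing methods covered in the course. The function name and input format are assumptions.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of two-character strings, one per locus, e.g. ["Aa", "Bb", "CC"].
    """
    het_loci = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    pairs = set()
    # At each heterozygous locus, choose which allele sits on the first chromosome.
    for choice in product((0, 1), repeat=len(het_loci)):
        first = [g[0] for g in genotype]
        second = [g[1] for g in genotype]
        for locus, flip in zip(het_loci, choice):
            if flip:
                first[locus], second[locus] = second[locus], first[locus]
        # Sort within the pair so that (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted(("".join(first), "".join(second)))))
    return sorted(pairs)

subject_1 = compatible_haplotype_pairs(["AA", "BB", "cc"])
subject_2 = compatible_haplotype_pairs(["Aa", "BB", "cc"])
subject_5 = compatible_haplotype_pairs(["Aa", "Bb", "CC"])
print(subject_1, subject_2, subject_5)
```

Subjects 1 and 2 have at most one heterozygous locus, so their haplotypes are unambiguous. Subject 5 is heterozygous at two loci and has two possible haplotype pairs; in general a genotype with k heterozygous loci (k at least 1) has 2^(k-1) compatible pairs.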
37
Thank you
38
  • Recurrent Risk Ratio ($\lambda_R$)
  • Ratio of the risk of disease in a particular
    class of relative to the risk of disease in the
    general population

$\lambda_R = K_R / K_0$, where $K_R = \Pr(X_2 = 1 \mid X_1 = 1)$
$K_R$: recurrent risk for a type R relative of an affected individual
$K_0$: prevalence of the disease in the general population
$X_1$ and $X_2$ represent relative 1 and relative 2; 1 means affected, 0 means unaffected
Reference: ???
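
A small worked example of the ratio defined on this slide: the recurrent risk in relatives divided by the population prevalence. The counts and the prevalence below are made-up numbers for illustration, and the function name is hypothetical.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, prevalence):
    """Estimate K_R from counts of relatives of affected individuals, then divide by K_0."""
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    k_r = affected_relatives / total_relatives  # recurrent risk for this class of relative
    return k_r / prevalence

# Hypothetical data: 30 of 600 siblings of affected individuals are affected,
# and the population prevalence is 1%.
lambda_sib = recurrence_risk_ratio(30, 600, 0.01)
print(round(lambda_sib, 2))
```

A ratio well above 1 indicates familial clustering, although on its own it cannot separate genetic causes from shared environment.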
39
Evolution
  • Mutation.
  • Spontaneous heritable changes in genes.
  • Migration.
  • Movement of subpopulation within a larger
    population.
  • Natural selection.
  • Difference in the ability to survive and
    reproduce.
  • Random genetic drift.
  • Chance.

40
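  • Recombination Fraction ($\theta$)
  • The frequency of crossing over between two loci.

[Figure: loci A, B, and C in order along a chromosome, with recombination fractions $\theta_{AB}$, $\theta_{BC}$, and $\theta_{AC}$.]
$\theta_{AB} + \theta_{BC} - 2\theta_{AB}\theta_{BC} \le \theta_{AC} \le \min(\theta_{AB} + \theta_{BC}, 1/2)$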
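
The inequality on this slide bounds the recombination fraction between the outer loci A and C, given the fractions for the two adjacent intervals. The sketch below evaluates both bounds; the function name is an assumption. The lower bound is the value obtained when crossovers in the two intervals occur independently (no interference), and the upper bound is the simple sum capped at 1/2.

```python
def theta_ac_bounds(theta_ab, theta_bc):
    """Return (lower, upper) bounds on theta_AC for loci ordered A, B, C."""
    for theta in (theta_ab, theta_bc):
        if not 0.0 <= theta <= 0.5:
            raise ValueError("recombination fractions must lie in [0, 0.5]")
    lower = theta_ab + theta_bc - 2 * theta_ab * theta_bc  # independent crossovers
    upper = min(theta_ab + theta_bc, 0.5)                  # additive, capped at 1/2
    return lower, upper

bounds = theta_ac_bounds(0.1, 0.2)
print(bounds)
```

For small recombination fractions the product term is negligible and the two bounds nearly coincide, which is why short map distances are approximately additive.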
41
Goals of Linkage Studies
  • To obtain a crude chromosomal location of the
    gene or genes associated with a phenotype of
    interest, e.g. a genetic disease or an important
    quantitative trait.
  • Examples: cystic fibrosis (found), diabetes,
    multiple sclerosis, and blood pressure

42
Linkage and Association
  • Linkage studies use individual families where
    members are affected and attempt to demonstrate
    linkage between the occurrence of the disease and
    genetic markers (creates associations within
    families, but not among unrelated people)
  • Association studies are based on populations and
    attempt to show an association between a
    particular allele and susceptibility to disease
    (a statistical statement about the co-occurrence
    of alleles or phenotypes)
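
One common way to make the "statistical statement" in an association study concrete is a 2x2 table of allele counts in cases and controls, summarized by an odds ratio and a Pearson chi-square test. The sketch below is one such test under assumed data, not necessarily the analysis used in this course. The counts are invented and the function name is hypothetical.

```python
import math

def allelic_association(case_a, case_b, control_a, control_b):
    """Odds ratio, Pearson chi-square (1 df), and p-value for a 2x2 allele-count table."""
    n = case_a + case_b + control_a + control_b
    margins = (case_a + case_b, control_a + control_b,
               case_a + control_a, case_b + control_b)
    if min(margins) == 0 or case_b * control_a == 0:
        raise ValueError("table has an empty margin or an undefined odds ratio")
    cross_diff = case_a * control_b - case_b * control_a
    chi_square = n * cross_diff ** 2 / math.prod(margins)
    odds_ratio = (case_a * control_b) / (case_b * control_a)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value

# Hypothetical allele counts: allele "a" seen 60 times in cases and 40 in controls.
odds_ratio, chi_square, p_value = allelic_association(60, 40, 40, 60)
print(odds_ratio, chi_square, p_value)
```

A small p-value here only indicates co-occurrence in the sampled population; issues raised earlier in the lecture, such as population stratification and multiple testing, still have to be addressed before interpreting it.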

43
Linkage analysis

44
Association Studies
Cases
Controls