Title: BIODATA ANALYSIS
1. BIODATA ANALYSIS
2. KNOWLEDGE BASE
- Descriptive vs Inferential - assumed
- Parametric/Non-Parametric - focus on choice
- Data Types/Levels of Measurement - types of statistic/basis for testing
- Importance of Normal Distribution (Non-Normality, Transformation - e.g. lab-based)
- Precision/Accuracy - Differences
- Confidence, Significance - probability basis
3. STRUCTURE
1, 2, many samples; E.D., Regn., C.T.
Replication, Assays, Counts
Estimation/H.T.
H.T.
Study techniques
Lab. techniques
Non-Parametric
Parametric
Distributional Assumptions, Probability Basis
Size/Type of Data Set
DESCRIPTIVE ORDERED
4. CONTEXT
- GENETICS: 5 branches - Laws of Chemistry, Physics, Maths. in Biology
- GENOMICS: Study of Genomes (complete set of DNA carried by a Gamete) by integration of the 5 branches of Genetics with Informatics and Automated systems
- PURPOSE of GENOME RESEARCH: Info. on Structure, Function, Evolution of all Genomes, past and present
- Techniques of Genomics come from molecular, quantitative and population genetics; Concepts and Terminology from Mendelian genetics and cytogenetics
5. CONTEXT: GENOMICS - LINKAGES
(Diagram linking GENOMICS to the branches of genetics)
Branches: Mendelian, Cytogenetics, Molecular, Population, Quantitative
GENOMICS: Genetic markers, DNA Sequences, Linkage/Physical Maps, Gene Location, QTL Mapping
6. CONTEXT: GENETICS - BRANCHES
- Classical Mendelian: Gene and Locus, Allele, Segregation, Gamete, Dominance, Mutation
- Cytogenetics: Cell, Chromosome, Meiosis and Mitosis, Crossover and Linkage
- Molecular: DNA sequencing, Gene Regulation and Transcription, Translation and Genetic Code, Mutations
- Population: Allelic/Genotypic Frequencies, Equilibrium, Selection, Drift, Migration, Mutation
- Quantitative: Heritability/Additive, Non-additive Genetic Effects, Genetic by Environment Interaction, Plant and Animal Breeding
7. GENOMICS - FOCUS
- CLASSICAL: Genetic Markers, Linkage Analysis, Gene Ordering, Multipoint Analysis, Genetic and QTL mapping
- INFORMATICS: Databases, Sequence Comparison, Data Communications, Automation
- DNA SEQUENCE ANALYSIS: Sequence Assembly, Placement, Comparison
8. GENOMICS: KEY QUESTIONS
- HOW do Genes determine total phenotype?
- HOW MANY functional genes are necessary and sufficient in Biosystems?
- WHAT are the necessary Physical/Chemical aspects of gene structure?
- IS gene location in Genome specific?
- WHAT DNA sequences/structures are needed for gene-specific functions?
- HOW MANY different functional genes in whole biosphere?
- WHAT MEASURES of essential DNA sameness in different species?
9. STATISTICAL GENOMICS
- UNUSUAL FEATURES
  - Mixtures of discrete/continuous variables, e.g. combination of genotypes of genetic markers (D) and values of quantitative traits (C)
  - Empirical Distributions needed for some Test Statistics, e.g. QTL analysis, H.T. of locus order
  - Size of databases very large, e.g. molecular marker and DNA/protein sequence data
  - Intensive Computation, e.g. Linkage Analysis, QTL, computationally greedy algorithms in locus ordering, derivation of empirical distributions etc.
  - Likelihood Analysis - Linear Models typically insufficient alone
10. EXAMPLE: Mendelian Genetics - Cytogenetics
- GENE: unit of heredity. Single gene passed between generations by Mendelian Inheritance
- DIPLOID Individual: two copies (alleles) of each gene (A)
- HOMOZYGOSITY: AA, aa (genotypes)
- HETEROZYGOSITY: Aa (genotype)
- (multiple alleles possible for a gene)
- PHENOTYPE: appearance/measurement of gene characteristic
  - AA, Aa, aa distinct (codominant)
  - AA, Aa same (A dominant allele)
11. Example contd.
- Common Mating schemes
  - Single gene: haploid cell from a parent (AA, aa) = gamete
  - AA gives gamete A, aa gives gamete a; F1 hybrid has diploid genotype Aa
  - then F1 x 1 parent (AA or aa) (Backcross)
  - or F1 x F1 (self-pollination, or sibs if two sexes) (F2)
- Mendelian Laws
  - 1. Segregation: single gene trait, simple heredity
    - Genotypic segregation ratio (G.S.R.) 1:1, codominant (Backcross)
    - G.S.R. 1:2:1 (F2)
    - Phenotypic segregation ratio (P.S.R.) 3:1 (dominant alleles in F2)
12. Example contd.
- 2. Independent Assortment (inheritance of unlinked multiple genes): each pair of alleles of a gene segregates independently of the segregation of the alleles of another gene
- e.g. genes A and B, with 2 alleles for both
  - 9 genotypes (AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb and aabb) in F2 progeny
  - Expect G.S.R. 1:2:1:2:4:2:1:2:1
  - For dominant genes, 4 phenotypes
  - P.S.R. 9:3:3:1 (A_B_, A_bb, aaB_, aabb), where _ = either dominant or recessive allele
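The two-gene segregation ratios above can be checked by brute-force enumeration. The following is a minimal sketch, assuming an F1 that is heterozygous at every locus (AaBb) and is selfed; the function names `f2_genotype_counts` and `phenotype` are illustrative and not part of the original slides.

```python
from collections import Counter
from itertools import product


def f2_genotype_counts(loci=("A", "B")):
    """Count F2 genotypes from selfing an F1 that is heterozygous at every locus."""
    # Each F1 gamete carries one allele per locus; with independent assortment
    # all allele combinations are equally likely.
    gametes = list(product(*[(locus.upper(), locus.lower()) for locus in loci]))
    counts = Counter()
    for g1, g2 in product(gametes, repeat=2):
        # Sort each allele pair so that "aA" and "Aa" count as the same genotype.
        genotype = "".join("".join(sorted(pair)) for pair in zip(g1, g2))
        counts[genotype] += 1
    return counts


def phenotype(genotype):
    """Phenotype class under complete dominance, e.g. 'AaBb' -> 'A_B_'."""
    pairs = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    # After sorting, a dominant (upper-case) allele always comes first in a pair.
    return "".join(p[0] + "_" if p[0].isupper() else p for p in pairs)


counts = f2_genotype_counts()
phenotypes = Counter()
for genotype, n in counts.items():
    phenotypes[phenotype(genotype)] += n

for genotype in sorted(counts):
    print(genotype, counts[genotype])
print(dict(phenotypes))
```

Sorting the genotype labels gives the same order as the list on the slide, so the printed counts can be read directly against the 1:2:1:2:4:2:1:2:1 and 9:3:3:1 ratios (out of 16 gamete pairings).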
13. EXPERIMENTAL OUTCOME
- Estimation of expected frequency of a specific genotype/phenotype in the population
- e.g. Freq. of A_B_ in F2 = 9/16 in previous example
- e.g. 4 independent loci
  - AaBBccDd in F2 (with parental groups of cross AAbbCCdd x aaBBccDD)
  - Clearly P(AaBBccDd) = (1/2)(1/4)(1/4)(1/2) = 1/64
- N.B. Basis for estimation/testing of how closely observed segregation fits expected segregation: chi-squared
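The product rule and the chi-squared comparison mentioned above can be sketched as follows. This is an illustration only: it assumes independent loci in an F2 from a fully heterozygous F1, the helper names are hypothetical, and the observed counts are made-up numbers used purely to exercise the function.

```python
from fractions import Fraction


def genotype_probability(genotype):
    """P(multi-locus genotype) in F2 for independent loci, e.g. 'AaBBccDd'."""
    prob = Fraction(1)
    for i in range(0, len(genotype), 2):
        first, second = genotype[i], genotype[i + 1]
        # Heterozygote has probability 1/2, either homozygote 1/4 (1:2:1 ratio).
        prob *= Fraction(1, 2) if first != second else Fraction(1, 4)
    return prob


def chi_squared(observed, ratio):
    """Pearson goodness-of-fit statistic against an expected segregation ratio."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


p_example = genotype_probability("AaBBccDd")

# Hypothetical F2 phenotype counts for A_B_, A_bb, aaB_, aabb (not real data).
observed_counts = [95, 28, 31, 6]
statistic = chi_squared(observed_counts, [9, 3, 3, 1])
print(p_example, round(statistic, 3))
```

With four phenotype classes there are 3 degrees of freedom, so the statistic would be compared with the 5% critical value of 7.815; a smaller value gives no evidence against the 9:3:3:1 expectation.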
14. No. of genotypes/phenotypes for m genes in F2 progeny under Hardy-Weinberg
15. Mechanisms of MENDELIAN HEREDITY: Gene Linkage - genomic mapping
- Cell division and chromosomes - mitosis, meiosis
- Genetic Linkage: association of genes located on the same chromosome. (Segregation ratios depart from Mendelian. Parental (non-recombinant) types are more frequent when the recombination frequency is low. The result of recombination (meiosis) is the existence of non-parental chromosomes in cellular meiotic products.) Each crossover (or exchange of chromosomal segments between homologs) creates two reciprocal recombinant (non-parental) gametes
- Recombination in general is random on chromosomes, and recombination between loci is associated with the distance between them. (Basic premises of genetic mapping.)
- (Note: relationship varies between and within organisms)
- Models, Linkage phase (codominance, experimental data), factors affecting recombination (genetics, environment and methods), manipulation (genotypes with new gene - so manipulation potential)
- Measurement: recombinant fraction
16. Example
- For 2 loci, A and B, on the same chromosome
- A, a and B, b segregating - two alleles at each locus
- AB, Ab, aB, ab gametes by meiosis
- if AB, ab are the possible Parental types, then Ab, aB are recombinants
- Sampling from the population, observe nr recombinant gametes (Ab and aB) out of a total of n sampled
- Recombinant Fraction: r = nr/n
- Notes: Usually observe phenotypic rather than gamete frequencies. Estimation of R.F. using phenotypic data involves constructing the likelihood and estimating by M.L.E.
- More than 2 or 3 loci - several R.F.s, crossover interference and a complex relationship - use a Mapping Function, based e.g. on Poisson.
- General theory of genetic mapping - dependence on graining
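A minimal sketch of the recombinant fraction estimate and of one Poisson-based mapping function follows. The slides do not name a specific mapping function; Haldane's function is used here as an assumption, since it follows from Poisson-distributed crossovers with no interference. The gamete counts are hypothetical.

```python
import math


def recombinant_fraction(n_recombinant, n_total):
    """Estimate r = nr/n with its binomial standard error (gametes observed directly)."""
    r = n_recombinant / n_total
    standard_error = math.sqrt(r * (1 - r) / n_total)
    return r, standard_error


def haldane_distance(r):
    """Map distance in Morgans from recombinant fraction r, for 0 <= r < 0.5."""
    return -0.5 * math.log(1 - 2 * r)


def haldane_recombination(distance):
    """Inverse of haldane_distance: recombinant fraction from distance in Morgans."""
    return 0.5 * (1 - math.exp(-2 * distance))


# Hypothetical sample: 20 recombinant gametes (Ab or aB) among 200 observed.
r_hat, se = recombinant_fraction(20, 200)
distance = haldane_distance(r_hat)
print(r_hat, round(se, 4), round(100 * distance, 2), "cM")
```

For small r the map distance is close to r itself, and as the distance grows the recombinant fraction approaches its ceiling of 0.5, which is why distances rather than recombinant fractions are added along a chromosome.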
17. Example: Population genetics
- Focus - frequencies, distributions, origins of genes in populations, and changes due to mutation, migration, selection
- Allelic frequency (Prob.) for a cross between two parents

Cross     Comment          Allelic freq. (Prob.)
                           A1     A2     A3     A4
ab x cd   4 alleles        0.25   0.25   0.25   0.25
ab x cc   3 alleles        0.25   0.25   0.5    0
ab x ab   2 alleles (F2)   0.5    0.5    0      0
ab x aa   Backcross        0.75   0.25   0      0
aa x aa   Fixed            1.0    0      0      0
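The allelic frequencies for each cross can be reproduced by pooling the alleles carried by the two parents. This is a small sketch assuming diploid parents written as two-letter genotypes separated by "x"; the function name is illustrative only.

```python
from collections import Counter
from fractions import Fraction


def allelic_frequencies(cross):
    """Allele frequencies among the alleles carried by both parents, e.g. 'ab x cd'."""
    parent1, parent2 = [parent.strip() for parent in cross.split("x")]
    alleles = parent1 + parent2
    counts = Counter(alleles)
    # Each allele's frequency is its share of all parental allele copies.
    return {allele: Fraction(n, len(alleles)) for allele, n in sorted(counts.items())}


for cross in ["ab x cd", "ab x cc", "ab x ab", "ab x aa", "aa x aa"]:
    print(cross, allelic_frequencies(cross))
```

Alleles that do not appear in a cross are simply absent from the result, which corresponds to the zero entries in the table.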
18. QUANTITATIVE GENETICS
- Focus - inheritance of quantitative traits. As the number of genes controlling a trait increases, and as effects on the phenotype increase, the ability to model through Mendelian inheritance diminishes.
- Single Gene Model
- Single locus with two alleles, A and a, and three possible genotypes AA, Aa and aa. Three values a, d and -a are assigned arbitrarily to the genotypes AA, Aa and aa respectively. Population assumed to be in Hardy-Weinberg equilibrium (gene and genotypic frequencies constant from generation to generation), and the two alleles have frequencies p and (1-p) = q
- Population mean in terms of allelic frequencies and genotypic values: μ = a(p - q) + 2dpq
- Genotypic value scale: aa = -a, Aa = d, AA = a
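The population mean for the single-gene model can be obtained by weighting the genotypic values a, d and -a by the Hardy-Weinberg genotype frequencies p², 2pq and q². The sketch below assumes that setup; the numerical values of p, a and d are arbitrary illustrations.

```python
def population_mean(p, a, d):
    """Mean genotypic value under Hardy-Weinberg equilibrium for one biallelic locus."""
    q = 1 - p
    frequencies = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    values = {"AA": a, "Aa": d, "aa": -a}
    # Expected value: sum over genotypes of frequency times genotypic value.
    return sum(frequencies[g] * values[g] for g in frequencies)


# Illustrative values only: allele frequency p = 0.6, a = 10, d = 4.
p, a, d = 0.6, 10.0, 4.0
q = 1 - p
mean_by_sum = population_mean(p, a, d)
mean_closed_form = a * (p - q) + 2 * d * p * q
print(mean_by_sum, mean_closed_form)
```

Both routes give the same number, since p²a + 2pqd - q²a simplifies to a(p - q) + 2dpq when p + q = 1.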
19. Measures of Interest
- Deviation of genotypic value from population mean (e.g. a - μ for AA in example)
- Average effect of gene substitution (α): average effect on the trait of one allele being replaced by another. E.g. for the two-allele system described, a gamete containing allele A gives progeny with genotypes AA and Aa with frequencies p and q respectively. Similarly, a gamete containing allele a gives progeny with genotypes Aa and aa with frequencies p and q respectively. Mean values of each follow from probability rules, and the difference between the two gives α. Note: can also use regression of genotypic value (as deviation from mean) on number of copies of the target allele.
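Both routes to the average effect of gene substitution can be sketched in a few lines: the difference between the progeny means of the two gamete types, and the frequency-weighted regression of genotypic value on the number of copies of allele A. The sketch assumes genotypic values a, d, -a for AA, Aa, aa and random mating; function names and numerical values are illustrative.

```python
def average_effect(p, a, d):
    """Average effect of substitution as a difference of progeny means."""
    q = 1 - p
    mean_from_A = p * a + q * d      # A gamete: progeny AA (freq p), Aa (freq q)
    mean_from_a = p * d + q * (-a)   # a gamete: progeny Aa (freq p), aa (freq q)
    return mean_from_A - mean_from_a


def average_effect_by_regression(p, a, d):
    """Slope of genotypic value on copies of A, weighted by genotype frequency."""
    q = 1 - p
    # (copies of A, genotypic value, Hardy-Weinberg frequency)
    data = [(2, a, p * p), (1, d, 2 * p * q), (0, -a, q * q)]
    mean_x = sum(x * f for x, _, f in data)
    mean_g = sum(g * f for _, g, f in data)
    covariance = sum(f * (x - mean_x) * (g - mean_g) for x, g, f in data)
    variance = sum(f * (x - mean_x) ** 2 for x, _, f in data)
    return covariance / variance


alpha_diff = average_effect(0.6, 10.0, 4.0)
alpha_reg = average_effect_by_regression(0.6, 10.0, 4.0)
print(alpha_diff, alpha_reg)
```

The two values agree, and both equal a + d(q - p), which is what the difference of progeny means reduces to when p + q = 1.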
20. Quantitative Genetics measures contd.
- Breeding Value: average genotypic value of progeny (e.g. progeny having genotype AA receive 2 copies of allele A etc.).
- Dominance deviation (D.D.): part of genotypic value not explained by breeding value.
- Variances: Total Genetic Variance in a Population = Variance of genotypic values (usual probability rules) = Sum of variances for B.V. and D.D.
- Heritability: ratio of genotypic/phenotypic variances
- Trait Models: Linear Model, e.g. of a continuous trait, yij = μ + Gi + εij, where yij is the trait value for genotype i in replication j, μ the population mean, Gi the genetic effect for genotype i and εij the error term associated with genotype i in replication j
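The variance decomposition can be illustrated for the single-locus model. The sketch below assumes Hardy-Weinberg equilibrium and the standard single-locus breeding values 2qα, (q - p)α and -2pα for AA, Aa and aa, which are not spelled out on the slides; the environmental variance used for the heritability ratio is a hypothetical number.

```python
def variance_components(p, a, d):
    """Genetic, breeding-value and dominance variances for one biallelic locus."""
    q = 1 - p
    alpha = a + d * (q - p)            # average effect of gene substitution
    mu = a * (p - q) + 2 * d * p * q   # population mean
    frequencies = [p * p, 2 * p * q, q * q]            # AA, Aa, aa
    values = [a, d, -a]
    breeding = [2 * q * alpha, (q - p) * alpha, -2 * p * alpha]
    # Dominance deviation: what the breeding value leaves unexplained.
    dominance = [g - mu - b for g, b in zip(values, breeding)]
    return {
        "genetic": sum(f * (g - mu) ** 2 for f, g in zip(frequencies, values)),
        "breeding": sum(f * b ** 2 for f, b in zip(frequencies, breeding)),
        "dominance": sum(f * x ** 2 for f, x in zip(frequencies, dominance)),
    }


components = variance_components(0.6, 10.0, 4.0)
environmental_variance = 50.0  # hypothetical value, for illustration only
heritability = components["genetic"] / (components["genetic"] + environmental_variance)
print(components, round(heritability, 3))
```

The genetic variance equals the sum of the other two components because breeding values and dominance deviations are uncorrelated under random mating; the heritability computed here is the ratio of genotypic to phenotypic variance, with the phenotypic variance taken as genetic plus environmental.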