Confounding from Cryptic Relatedness in Association Studies - PowerPoint PPT Presentation

Slides: 24

Provided by: benjami120

Transcript and Presenter's Notes

1
Confounding from Cryptic Relatedness in
Association Studies

Benjamin F. Voight
(joint work with JK Pritchard)

2
Importance

Case/control association tests are becoming
increasingly popular to identify genes
contributing to human disease.
These tests can be susceptible to false positives
if the underlying statistical assumptions are
violated, i.e., independence among all sampled
alleles used in the test for association.
It is well appreciated that population structure
results in false positives (Knowler et al., 1988;
Lander and Schork, 1994).
Methods exist that correct for this effect
(Devlin and Roeder, 1999; Pritchard and
Rosenberg, 1999; Pritchard et al., 2000).

3
Your (favorite) Population
Obtain a sample of affected cases from
the population.
Cases are not independent draws from the
population allele frequencies.
Problem: the relatedness is cryptic, so the
investigator does not know about
the relationships in advance.
4
Importance

Devlin and Roeder (1999) have argued that if one
is doing a genetic association study, then surely
one must believe that the trait of interest has a
genetic basis that is at least partially shared
among affected individuals.
Given that cases share a set of risk factors by
descent, they are presumably more related to
one another than to random controls.
These authors presented numerical examples
suggesting that this effect may be an important
factor in practice.
However, these examples were artificially
constructed and not modeled on any
population-based process.
There are few empirical data on whether cryptic
relatedness negatively impacts association
studies. In a founder population,
non-independence resulting from relatedness does
matter (Newman et al., 2001).

5
Goals

Determine whether, or when, cryptic relatedness
is likely to be a problem for general
applications.
Develop a formal model for cryptic relatedness in
a population genetics framework.
In a founder population, estimate the inflation
factor due to (cryptic) relatedness, and compare
to analytical results.
Avoid staring at x in front of a chalkboard.

6
Modeling Definitions

m affected individuals and m random controls,
sampled in the current generation.
Pairs of chromosomes coalesce in a previous
generation t = 1, 2, … with the usual
probabilities.
All samples are typed at a single biallelic
locus, unlinked to disease, with alleles B and b
at frequencies p and 1 − p in the population.
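The transcript does not spell out "the usual probabilities." Assuming a constant-size diploid Wright-Fisher population of N individuals (2N chromosomes), the standard choice is the geometric coalescence-time distribution below; it is given for reference, not copied from the slides:

```latex
P(t) = \Pr(\text{two chromosomes coalesce in generation } t)
     = \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^{t-1},
\qquad t = 1, 2, \ldots
```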

7
Definitions

Define
K_p: population prevalence of disease.
K_t: probability that a relative of type t of an
affected proband is also affected.
λ_t: recurrence risk ratio, K_t/K_p (Risch, 1990).
G_i(a): indicator (0 or 1) for the B allele on
homologous chromosome a of the i-th case (with
a ∈ {0, 1} for diploid individuals).
H_j(a): as above, but for the j-th random control.

Define a test statistic that measures the
difference in allele counts between cases and
controls (slightly modified from Devlin and
Roeder, 1999):
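The formula itself is missing from the transcript. One form consistent with the definitions of G_i(a) and H_j(a) (an assumed reconstruction, not necessarily the authors' exact statistic) is the case-minus-control count of B alleles:

```latex
T = \sum_{i=1}^{m} \sum_{a \in \{0,1\}} G_i(a) \;-\; \sum_{j=1}^{m} \sum_{a \in \{0,1\}} H_j(a)
```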

Under the null hypothesis of no association
between the marker and phenotype, each allele is
B with probability p, independently for all
alleles in the sample. If so, T has mean zero and
a null variance that depends only on m and p.
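The null moments are not in the transcript. Assuming T is the case-minus-control count of B alleles over 2m case and 2m control chromosomes, each an independent Bernoulli(p) draw, they follow directly:

```latex
E_0[T] = 2mp - 2mp = 0,
\qquad
\mathrm{Var}_0(T) = (2m + 2m)\, p(1-p) = 4mp(1-p)
```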

If cryptic relatedness exists in the sample, then
the variance of the test (call this Var(T))
may exceed the variance under the null. We
measure the deviation from the null variance
using the inflation factor δ.
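A minimal Python sketch of how δ might be estimated, assuming δ is the ratio Var(T)/Var_0(T) = Var(T)/(4mp(1−p)) and T is the case-minus-control count of B alleles. Over many unlinked null loci, the mean of T²/(4m p̂(1−p̂)) estimates δ, with p̂ the pooled B-allele frequency at each locus. With equal numbers of case and control alleles, this ratio works out algebraically to the standard allelic 2×2 chi-square, so the estimate is simply the mean of the per-locus tests. The function names and the simulated null data are illustrative only.

```python
import random


def allele_count_statistic(case_alleles, control_alleles):
    """T = (B alleles in cases) - (B alleles in controls); inputs are 0/1 lists."""
    return sum(case_alleles) - sum(control_alleles)


def estimate_inflation(loci, m):
    """Mean of T^2 / (4 m p_hat (1 - p_hat)) over loci (hypothetical helper).

    `loci` holds (case_alleles, control_alleles) pairs, 2m alleles each.
    p_hat is the pooled B-allele frequency; monomorphic loci are skipped.
    """
    ratios = []
    for cases, controls in loci:
        p_hat = (sum(cases) + sum(controls)) / (len(cases) + len(controls))
        if p_hat in (0.0, 1.0):
            continue
        t = allele_count_statistic(cases, controls)
        ratios.append(t * t / (4 * m * p_hat * (1 - p_hat)))
    return sum(ratios) / len(ratios)


def simulate_null_loci(n_loci, m, seed=0):
    """Illustrative null data: every allele is an independent Bernoulli(p) draw."""
    rng = random.Random(seed)
    loci = []
    for _ in range(n_loci):
        p = rng.uniform(0.1, 0.9)
        cases = [int(rng.random() < p) for _ in range(2 * m)]
        controls = [int(rng.random() < p) for _ in range(2 * m)]
        loci.append((cases, controls))
    return loci


if __name__ == "__main__":
    m = 100
    delta_hat = estimate_inflation(simulate_null_loci(2000, m), m)
    print(f"estimated delta: {delta_hat:.3f}")  # close to 1 for independent alleles
```

On independent null data the estimate should sit near 1. Values clearly above 1 at many unlinked loci would indicate inflation of the kind the slides describe.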

9
(No Transcript)
10

Recall that we want the variance of our test, T,
under a model of cryptic relatedness.

We use the following non-dodgy assumptions:
1. Draws of alleles from the population are
simple Bernoulli trials. (Variance terms.)
2. Controls are a random sample from the
population. (Covariance terms involving the H_j
are 0.)
3. Allow the possibility that cases and controls
depart from Hardy-Weinberg proportions by some
factor; call this F. (Covariance terms for
alleles in the same individual.)
4. For the mutational model:
a. Suppose the mutation process is the same for
cases and random controls.
b. Conditional on a case chromosome and a random
chromosome having a very recent coalescent time
(on the order of 1-10 generations), assume that
the chance that the alleles are in different
states is 0.

11
Then after
JKP attempts desperately to keep me honest.
Smoke from my brain
Me, after many hours of intensive
thought processing
12

Var(T) can be simplified to

where i ≠ i′.
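The simplified expression is missing from the transcript. Working directly from assumptions 1 to 3 (Bernoulli variances p(1−p), no covariance involving controls across individuals, and within-individual covariance Fp(1−p)), and additionally assuming that every pair of distinct cases shares the same allele covariance, a reconstruction is given below. The second line further assumes δ is defined as Var(T) divided by the null variance 4mp(1−p).

```latex
\begin{aligned}
\mathrm{Var}(T) &= 4mp(1-p)(1+F) + 4m(m-1)\,\mathrm{Cov}\big(G_i(a), G_{i'}(a')\big), \qquad i \neq i', \\
\delta &= \frac{\mathrm{Var}(T)}{4mp(1-p)} = 1 + F + (m-1)\,\frac{\mathrm{Cov}\big(G_i(a), G_{i'}(a')\big)}{p(1-p)}.
\end{aligned}
```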

And now, we evaluate the covariance term under a
model of cryptic relatedness. This covariance
term is fairly complicated, but it is related to
the following probability

Apply some Bayesian Trickery
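The "Bayesian trickery" step is not written out in the transcript. One plausible version is Bayes' rule applied to a pair of cases, with A denoting the event that both individuals are affected. It assumes that two chromosomes coalescing in generation t belong to relatives of type t, that such a pair is doubly affected with probability K_p K_t, and that a random pair is doubly affected with probability about K_p²:

```latex
\Pr(t \mid A) = \frac{\Pr(A \mid t)\,\Pr(t)}{\Pr(A)}
\approx \frac{K_p K_t}{K_p^{2}}\,\Pr(t) = \lambda_t \Pr(t)
```

Under these assumptions, pairs of cases are enriched for a recent common ancestor by the recurrence risk ratio λ_t.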

and after some plug and play we finally get

14
(No Transcript)
15
Under an additive model

A handy relationship links any λ_r to the
sibling recurrence risk ratio λ_s, a single parameter
under an additive model (Risch, 1990),

where φ_r is the kinship coefficient for type-r
relatives, which is ¼ for r = 1 and decays by a
factor of ½ with each increment in r. Using this
relationship, we can simplify
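The relationship itself is missing from the transcript. Risch (1990) writes λ_r − 1 in terms of additive and dominance variance components. Assuming no dominance variance, this reduces to λ_r − 1 = 4φ_r(λ_s − 1), which matches the kinship values quoted above (φ_1 = ¼, halving with each step). A small Python sketch of that assumed relationship, with hypothetical function names:

```python
def kinship(r):
    """Kinship coefficient phi_r: 1/4 for type-1 (first-degree) relatives, halving per step."""
    if r < 1:
        raise ValueError("relative type r must be >= 1")
    return 0.25 * 0.5 ** (r - 1)


def recurrence_risk_ratio(lambda_s, r):
    """Assumed additive (no-dominance) relation: lambda_r - 1 = 4 * phi_r * (lambda_s - 1)."""
    return 1.0 + 4.0 * kinship(r) * (lambda_s - 1.0)


if __name__ == "__main__":
    for r in range(1, 5):
        print(r, kinship(r), recurrence_risk_ratio(2.0, r))
```

With λ_s = 2, this prints λ_r = 2.0, 1.5, 1.25, 1.125 for r = 1 to 4. The excess risk λ_r − 1 halves with each step, so only close relatives contribute much.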
16
Simulations

Use Wright-Fisher forward simulation to assess the
analytical results.
Simulate 1,000 biallelic unlinked loci forward
in time for 4N generations, with mutation parameter
θ = 4Nμ = 1 (*).
Choose a single locus with the desired disease
allele frequency, and assign phenotypes to all
members of the population under an additive
genetic model.
Select m cases and m random controls, and use all
non-disease loci to estimate the inflation factor
from the mean of all test statistics.

(*) Because WF simulations are notoriously slow,
we speed them up by simulating a smaller
population with a proportionally higher mutation
rate, and then rescale the population size and
mutation rate to the desired levels.
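A minimal Python sketch of an individual-based Wright-Fisher forward simulation with the rescaling trick. It assumes random mating with selfing allowed (as in the idealized model), unlinked biallelic loci coded 0/1, and symmetric mutation at rate μ per transmitted allele. Individuals are tracked rather than just allele frequencies, because cryptic relatedness comes from shared recent parents. Phenotype assignment and case/control sampling are not shown, and all names and parameter values are illustrative.

```python
import random


def rescaled_parameters(theta, n_small):
    """Mutation rate and run length (4N generations) keeping theta = 4*N*mu fixed."""
    return theta / (4 * n_small), 4 * n_small


def wright_fisher_population(n_individuals, n_loci, n_generations, mu, seed=None):
    """Forward-simulate a diploid Wright-Fisher population at unlinked 0/1 loci.

    Each individual is a list of (allele, allele) tuples, one per locus.
    Each offspring draws two parents uniformly at random with replacement
    (selfing is allowed) and inherits one random allele per locus from each.
    """
    rng = random.Random(seed)
    # Illustrative start: every allele is a fair coin flip (frequency ~0.5).
    pop = [[(rng.getrandbits(1), rng.getrandbits(1)) for _ in range(n_loci)]
           for _ in range(n_individuals)]
    for _ in range(n_generations):
        next_pop = []
        for _ in range(n_individuals):
            mother, father = rng.choice(pop), rng.choice(pop)
            child = []
            for locus in range(n_loci):
                # Unlinked loci: pick a parental chromosome independently per locus.
                a = mother[locus][rng.getrandbits(1)]
                b = father[locus][rng.getrandbits(1)]
                # Symmetric mutation: flip each transmitted allele with prob. mu.
                if rng.random() < mu:
                    a = 1 - a
                if rng.random() < mu:
                    b = 1 - b
                child.append((a, b))
            next_pop.append(child)
        pop = next_pop
    return pop


def allele_frequencies(pop):
    """Frequency of allele 1 at each locus."""
    n_chrom = 2 * len(pop)
    return [sum(ind[locus][0] + ind[locus][1] for ind in pop) / n_chrom
            for locus in range(len(pop[0]))]


if __name__ == "__main__":
    mu, generations = rescaled_parameters(theta=1.0, n_small=50)
    pop = wright_fisher_population(50, 20, generations, mu, seed=1)
    print(mu, generations)
    print([round(f, 2) for f in allele_frequencies(pop)])
```

With θ = 1 and a 50-individual population, the sketch uses μ = 0.005 and runs 200 generations. A study like the one described would use far more loci and a larger effective population.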
17
Simulation Results
The 95% central interval about the mean was at least
0.001 in each case.
18
Tautological Hutterite Analysis

Quick note on the Hutterites:
a 13,000-member pedigree with known genealogy,
with 800 members phenotyped and genotyped at
many markers across the genome.
Targets (for each phenotype):
a. Estimate coalescent probabilities for cases
and random controls using allele-walking
simulations on the genealogy.
b. Calculate the inflation factor (δ) for each
phenotype, and compare it to the analytic
prediction.
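The transcript does not describe the allele-walking procedure. As a swapped-in stand-in, here is a gene-dropping sketch in Python. Founder alleles get unique labels and are transmitted down a known pedigree by Mendelian sampling. The fraction of replicates in which one allele drawn from each of two individuals shares a label then estimates their kinship coefficient, i.e. their probability of coalescing within the pedigree. The toy pedigree and function name are hypothetical. Averaging over pairs of cases versus pairs of random controls would give the case-versus-control comparison described above.

```python
import random


def gene_drop_kinship(pedigree, founders, x, y, n_reps=20000, seed=0):
    """Monte Carlo estimate of the kinship coefficient of x and y by gene dropping.

    pedigree: dict child -> (parent1, parent2), ordered so parents come first.
    founders: individuals without parents; each founder allele gets a unique
    label, so two alleles with the same label are identical by descent.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        alleles = {}
        for k, founder in enumerate(founders):
            alleles[founder] = (2 * k, 2 * k + 1)  # unique founder labels
        for child, (p1, p2) in pedigree.items():
            # Mendelian transmission: one random allele from each parent.
            alleles[child] = (rng.choice(alleles[p1]), rng.choice(alleles[p2]))
        # Draw one allele at random from each individual; same label means IBD.
        if rng.choice(alleles[x]) == rng.choice(alleles[y]):
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    # Hypothetical toy pedigree: two sibs A and B, each with an unrelated spouse.
    founders = ["F", "M", "X", "Y"]
    pedigree = {"A": ("F", "M"), "B": ("F", "M"),
                "C1": ("A", "X"), "C2": ("B", "Y")}
    print("sibs:", gene_drop_kinship(pedigree, founders, "A", "B"))       # near 1/4
    print("cousins:", gene_drop_kinship(pedigree, founders, "C1", "C2"))  # near 1/16
```

Gene dropping runs forward from the founders. A backward "walk" from sampled alleles up the genealogy targets the same IBD probabilities, and it can also record the generation in which two lineages meet.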

19
Note the increased probabilities in cases relative to
random controls for recent coalescent times.
20
Hutterite Analysis

21
Empirical δs in a Founder Population
The inbreeding coefficient (F) was estimated at
0.048 and was included in the calculation.
22
Summary

We modeled cryptic relatedness using
population-based processes. Surprisingly, the
resulting expressions are functions of directly
observable parameters (population size, sample
size, and the genetic model parameterized by λ_r).
Our analytical results indicate that increased
false positives due to cryptic relatedness will
usually be negligible for outbred populations.
We applied our technique to a founder population
as an example. For six different phenotypes we
found evidence for inflation, which matched the
analytic predictions.

23
Acknowledgements

JK Pritchard and NJ Cox (thesis advisors)
Carole Ober (access to the empirical data)
NIH, NIH/NIGMS Genetics Training Grant

Fine, name that tune from memory, recite the
first 1677 words of Kingman's 1982 paper, and I'll
get the next round.
In the bar at the conference during the week
