Title: Random Match Probability Statistics
1. Random Match Probability Statistics
- From single source to three-person mixtures with allelic drop out
2. Statistics
- "There are three kinds of lies: lies, damned lies, and statistics." Attributed to Benjamin Disraeli, British Prime Minister, as popularized by Mark Twain
- 18.7% of all statistics are made up
- My introduction to forensic statistics: it had been a loooooong time since sophomore genetics
3. Heterozygote
- Alleles P and Q
- Could be PQ or it could be QP
- So 2pq
- Where p is the frequency of P
- And q is the frequency of Q
- If p = 0.2 and q = 0.15, then 2(0.2)(0.15) = 0.06
Most of us understood this pretty quickly
4. Homozygote
- Allele P
- Above stochastic threshold
- So p x p or p^2
- But there's that T business
Most of us understood this pretty quickly too
5. Homozygote
- You don't use p^2
- But I understood that
- Use p^2 + p(1-p)T
- I didn't understand this
- Where did T come from?
- It's the inbreeding coefficient (theta).
6. Homozygote
- OK, but where did p^2 + p(1-p)T come from?
- It's the correction factor for inbreeding.
- Not so helpful
- Why isn't it just p^2 + T?
7. Homozygote
- We start with what we thought: p^2
- But some percentage is from inbreeding: Tp
- Correct for that amount of inbreeding: (1-T)p^2
- Combine them: Tp + (1-T)p^2
8. Homozygote
- Now it's algebra
- Tp + (1-T)p^2 (inbred p + non-inbred p^2)
- Tp + p^2 - Tp^2 (expand the terms)
- p^2 + Tp - Tp^2 (we like to see the p^2 term first)
- p^2 + p(T - Tp) (pull out p)
- p^2 + p(1-p)T (pull out T to get final form)
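A quick numerical check of that algebra, as a sketch rather than anything from the original slides. The function names and the sample values of p and T are made up for illustration; T is the inbreeding coefficient.

```python
# Check that the inbred/non-inbred mix equals the final homozygote form.
# Function names and test values are illustrative assumptions.

def mixed_form(p, T):
    # A fraction T behaves as inbred (frequency p); the rest as non-inbred (p^2).
    return T * p + (1 - T) * p ** 2

def final_form(p, T):
    # The form used for homozygous loci: p^2 + p(1-p)T
    return p ** 2 + p * (1 - p) * T

for p in (0.05, 0.2, 0.3054):
    for T in (0.0, 0.01, 0.03):
        assert abs(mixed_form(p, T) - final_form(p, T)) < 1e-12
```

With T = 0 both forms collapse to plain p^2, which is the "what we thought" starting point.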
9. Single source stat
- Do the 2pq calculation at each heterozygous locus
- Do p^2 + p(1-p)T at each homozygous locus
- Then multiply the results for all loci
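The three steps can be sketched in a few lines of Python. This is only an illustration: the data layout (a list of frequency pairs, with None marking a homozygote), the function names, and the allele frequencies are all assumptions, not a lab calculator.

```python
def heterozygote(p, q):
    # 2pq for a heterozygous locus
    return 2 * p * q

def homozygote(p, T):
    # p^2 + p(1-p)T for a homozygous locus
    return p ** 2 + p * (1 - p) * T

def single_source_rmp(loci, T):
    """loci: list of (freq_a, freq_b); freq_b is None for a homozygote."""
    rmp = 1.0
    for freq_a, freq_b in loci:
        if freq_b is None:
            rmp *= homozygote(freq_a, T)
        else:
            rmp *= heterozygote(freq_a, freq_b)
    return rmp

# Hypothetical two-locus profile: one heterozygote, one homozygote
profile = [(0.2, 0.15), (0.1, None)]
print(single_source_rmp(profile, T=0.01))
```

The first locus reuses the p = 0.2, q = 0.15 example from the heterozygote slide.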
10. Partial single source stat
- What if you don't detect everything from a single contributor?
- Consistent with one contributor, but obvious there is a lot of drop out
11. Partial single source stat
[Figure: partial single-source profile with loci labeled "No result", "Drop out", and "??"]
12. With a sample like this, would you:
- Inconclusive data
- Exclude only
- Exclude or inc a person
- Exclude/include, no stat
- Exclude/include, stat for 2 allele loci
- Exclude/include for all loci with something detected
- Other
13. Partial single source stat
- Heterozygous loci: still 2pq
14. Partial single source stat
- What about loci that you don't know about?
15. Partial single source stat
- Any person that is a 9.3 could be the source
- How to calculate 9.3, Any?
16. Partial single source stat
- The 9.3 could be a homozygote
- So p^2 + p(1-p)T covers that
- But the 9.3 could be a heterozygote with any other allele
- So 2pq, but what is q?
17. Partial single source stat
- You could go to the ladder
- 2(p)(q)
- p = 9.3
- q = 4, so 2(f9.3)(f4)
- q = 5, so 2(f9.3)(f5)
- q = 6, so 2(f9.3)(f6)
- ...
- q = 13.3, so 2(f9.3)(f13.3)
- Then add them up
But what about off-ladder alleles, microvariants, etc.? How do you do 2pq for those?
18. Partial single source stat
- Instead, if p is what you see (or detect)
- Then q must be what you don't see (or detect)
- Since this is a binary system
- (What you see/detect) + (what you don't) = 1.0
- (What you don't see) = 1 - (what you see/detect)
- So q = (1-p)
- Therefore 2pq becomes 2p(1-p)
19. Partial single source stat
- Now just combine the homozygote and heterozygote options (p = f9.3)
- p^2 + p(1-p)T + 2p(1-p) for anyone with 9.3
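A minimal sketch of the Allele, Any calculation, plus a check that walking the ladder gives the same heterozygote total as 2p(1-p) when the frequencies at the locus sum to 1. The function name and the frequency table are hypothetical; only f9.3 = 0.3054 appears elsewhere in these slides.

```python
def allele_any(p, T):
    # Homozygote option: p^2 + p(1-p)T
    homozygote = p ** 2 + p * (1 - p) * T
    # Heterozygote with anything else, seen or not: 2p(1-p)
    heterozygote_with_any = 2 * p * (1 - p)
    return homozygote + heterozygote_with_any

# Hypothetical locus; "other" lumps together every allele not listed,
# including off-ladder alleles and microvariants, so the total is 1.
freqs = {"9.3": 0.3054, "6": 0.25, "7": 0.2, "other": 0.2446}
p = freqs["9.3"]

# Ladder approach: add up 2pq over every other allele
ladder_sum = sum(2 * p * q for allele, q in freqs.items() if allele != "9.3")

# Same answer as 2p(1-p), without needing each q individually
assert abs(ladder_sum - 2 * p * (1 - p)) < 1e-12
```

The point of q = (1-p) is that the "other" bucket never has to be itemized.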
20. Partial single source stat
- What about loci that look like homozygotes?
- Use your PHR and stochastic threshold studies
- If you treat a locus as a homozygote, you better be above your stochastic threshold
- When in doubt, use Allele, Any: you're covered
- At USACIL, Allele, Any = modified RMP
21. Partial single source stat
- The 2p rule
- Section 5.2.1.3 SWGDAM
5.2.1.3. For single-allele profiles where the zygosity is in question (e.g., it falls below the stochastic threshold):
5.2.1.3.1. The formula 2p, as described in recommendation 4.1 of NRCII, may be applied to this result.
5.2.1.3.2. Instead of using 2p, the algebraically identical formulae 2p - p^2 and p^2 + 2p(1-p) may be used to address this situation without double-counting the proportion of homozygotes in the population.
22. Partial single source stat
- 2p is an extremely conservative approximation
- There is a better way
- 2p - p^2
- p^2 + 2p(1-p)
- But this is even better
- p^2 + p(1-p)T + 2p(1-p)
- (computers can calculate anything)
23. Partial single source stat
- Algebraically identical formulae
- f9.3 = 0.3054
2p - p^2
  = 2(0.3054) - (0.3054)^2
  = 0.6108 - 0.09327
  = 0.5175
p^2 + 2p(1-p)
  = (0.3054)^2 + 2(0.3054)(1 - 0.3054)
  = 0.09327 + 0.6108(0.6946)
  = 0.09327 + 0.42426
  = 0.5175
24. Partial single source stat
- So for 9.3, Any
- 2p = 0.6108
- 2p - p^2 = 0.5175
- p^2 + 2p(1-p) = 0.5175
- p^2 + p(1-p)T + 2p(1-p) = 0.5197
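The four values can be recomputed with a short script. This is a sketch: the slides give f9.3 = 0.3054 but do not state the value of T, so T = 0.01 below is an assumption (it is the value that reproduces 0.5197).

```python
p = 0.3054  # f9.3, from the slides
T = 0.01    # assumed; not stated on the slides

results = {
    "2p": 2 * p,
    "2p - p^2": 2 * p - p ** 2,
    "p^2 + 2p(1-p)": p ** 2 + 2 * p * (1 - p),
    "p^2 + p(1-p)T + 2p(1-p)": p ** 2 + p * (1 - p) * T + 2 * p * (1 - p),
}

for label, value in results.items():
    print(f"{label} = {value:.4f}")
```

The middle two lines print the same number, which is the "algebraically identical" point; the T term adds only a small amount on top.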
25. Minor contributor stat
26. When the minor is probative, would you:
- Inconclusive data
- Exclude only
- Exclude or inc a person
- Exclude/include no stat
- Exclude/include stat for some allele loci
- Exclude/include for all loci
- Other
27. Minor contributor stat
- For our purposes, it is an intimate sample from a known female contributor
- Female is major
- Major would have a single source stat
- But isn't probative
- Focus on the minor (or foreign) contributor
28. Minor contributor stat
- Situations you need to be able to calculate
- When you know the minor type
- When you are concerned about drop out
- When you are not concerned about drop out, but you don't know the minor type (masking/sharing)
- When you do not see any minor alleles, but still think the minor contributor is represented
We haven't discussed the last two yet
29. Minor contributor stat
- When you know the minor type
- 10, 11
- 2pq
- 2(f10)(f11)
- 6, 9.3
- 2pq
- 2(f6)(f9.3)
30. Minor contributor stat
- When you are concerned about drop out
- 24, Any
- p^2 + p(1-p)T + 2p(1-p)
- (f24)^2 + (f24)(1-(f24))T + 2(f24)(1-(f24))
31. Minor contributor stat
- When you are not concerned about drop out, but don't know the minor type
- What types are possible?
- 9, 9
- 8, 9
- 9, 11
- Combo stat
32. Minor contributor stat
- Combo stat
- 9 is above stochastic threshold
- 9, 9: p^2 + p(1-p)T
- 8, 9: 2pq
- 9, 11: 2pr
- Add them up
(f9)^2 + (f9)(1-(f9))T + 2(f8)(f9) + 2(f9)(f11)
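As a sketch, the combo stat for this locus is just the three genotype frequencies added together. The function name and the frequencies in the usage line are hypothetical.

```python
def combo_stat(f9, f8, f11, T):
    # 9, 9 homozygote: p^2 + p(1-p)T
    hom_9_9 = f9 ** 2 + f9 * (1 - f9) * T
    # 8, 9 heterozygote: 2pq
    het_8_9 = 2 * f8 * f9
    # 9, 11 heterozygote: 2pr
    het_9_11 = 2 * f9 * f11
    return hom_9_9 + het_8_9 + het_9_11

# Hypothetical allele frequencies, for illustration only
print(combo_stat(f9=0.1, f8=0.2, f11=0.3, T=0.01))
```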
33. Minor contributor stat
5.2.2. When the interpretation is conditioned upon the assumption of a particular number of contributors greater than one, the RMP is the sum of the individual frequencies for the genotypes included following a mixture deconvolution. Examples are provided below.
5.2.2.1. In a sperm fraction mixture (at a locus having alleles P, Q, and R) assumed to be from two contributors, one of whom is the victim (having genotype QR), the sperm contributor genotypes included post-deconvolution might be PP, PQ, and PR. In this case, the RMP for the sperm DNA contributor could be calculated as p^2 + p(1-p)T + 2pq + 2pr.
34. Minor contributor stat
35. Minor contributor stat
- No minor alleles present, but you know the minor is contributing
- Every other locus has minor alleles
- Did the enzyme just get lazy?
- Just inc the locus for stats?
- That doesn't make any more sense than throwing out any other locus
- You just need the right calculator
36. Minor contributor stat
- Two scenarios to consider
- No stochastic concerns
- Stochastic concerns
- Two slightly different stats, but can deal with both
37. Minor contributor stat
- No stochastic concerns
- In some cases, PHR and P may help
- 17, 17 or possibly 16, 17
- Maybe not 16, 16
- But, you know minor must be
- 16, 16: p^2 + p(1-p)T
- 16, 17: 2pq
- 17, 17: q^2 + q(1-q)T
This is the combo stat
38. Minor contributor stat
- Couple more definitions
- Unrestricted RMP
- The combo stat where we used all possibilities
- 16,16 and 16,17 and 17,17 from previous slide
- Restricted RMP
- The combo stat where we chose not to use one (or more) possible types based on what fits peak heights, peak height ratios, or proportions of contributors
- 17,17 or 16,17 but not 16,16 from previous slide
39. Minor contributor stat
- What if stochastic concerns?
- You would take anyone with
- 16, Any: (p^2 + p(1-p)T) + 2p(1-p)
- 17, Any: (q^2 + q(1-q)T) + 2q(1-q)
- But that has the 16, 17 counted twice
- Subtract 16, 17: 2pq
- But only once!
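A sketch of that double Any calculation: two Allele, Any terms added together, with the shared 16, 17 heterozygote removed once. The function names and the frequencies in the usage line are hypothetical.

```python
def allele_any(p, T):
    # (p^2 + p(1-p)T) + 2p(1-p)
    return p ** 2 + p * (1 - p) * T + 2 * p * (1 - p)

def double_any(p, q, T):
    # 16, Any plus 17, Any; each includes the 16, 17 heterozygote (2pq),
    # so subtract it once to avoid counting it twice.
    return allele_any(p, T) + allele_any(q, T) - 2 * p * q

# Hypothetical frequencies for the 16 and 17 alleles
print(double_any(p=0.1, q=0.2, T=0.01))
```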
40. Modified random match probability
- Let's look at this double any calculation
- Simplify by removing T
- This is the basis for dealing with any number of Allele, Any contributors
- USACIL calls this a modified RMP because Anys are involved
With T: (p^2 + p(1-p)T) + 2p(1-p) + (q^2 + q(1-q)T) + 2q(1-q) - 2pq
Without T: p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq
[Figure: the terms p^2, 2p(1-p), q^2, 2q(1-q), and 2pq laid out for the two Allele, Any calculations]
41. Modified random match probability
- Let's say we've got a two contributor mixture with signs that both contributors are having stochastic issues.
- But what you see is consistent with two contributors
- Remember: Take a stand on the stand.
- Validation studies, interpretation guidelines, your experience, Tech Review agrees
42. Modified random match probability
- We'll start with this same pattern
- But stochastic concerns
- Homozygote threshold
- Mixture interpretation threshold
- Stochastic threshold
- Drop out threshold
- Let's just call it the Danger Zone
- Why do I always think of Top Gun when I have low peak heights?
[Figure: allele 16 at peak height 230, allele 17 at peak height 260]
(We're not suggesting that you MUST do this, only that you can calculate it.)
43. Modified random match probability
- Remember the Allele, Any
- 2pq becomes 2p(1-p)
- 2 x (what you do see) x (what you don't see)
- (We used it for a single allele below stochastic threshold for partial or minor contributor)
- Because we have two contributors
- 16, Any
- 17, Any
- Or both
[Figure: allele 16 at peak height 230, allele 17 at peak height 260]
44. Modified random match probability
- Also, remember the combo stat for the combinations you can see
- p^2 + 2pq + q^2
- We'll rearrange this in a minute
[Figure: allele 16 at peak height 230, allele 17 at peak height 260]
45. Modified random match probability
- Allele, Any for p (16)
- 2(what you see)(what you don't)
- 2p(1-?)
- You see two alleles now
- Both p and q (16 and 17)
- Stick with 1 - (what you see) for what you don't see
- 2p(1-(p+q)) for p (16)
- Same thing for q (17)
- 2q(1-(p+q))
[Figure: allele 16 at peak height 230, allele 17 at peak height 260]
46. Modified random match probability
- So, the obvious combinations
- Combo for visible: p^2 + 2pq + q^2
- The Allele, Any combinations
- Allele, Any for the 16: 2p(1-(p+q))
- Allele, Any for the 17: 2q(1-(p+q))
- Add them up
[Figure: allele 16 at peak height 230, allele 17 at peak height 260]
47. Modified random match probability
- Here is the formula for multiple Allele, Any: p^2 + 2pq + q^2 + 2p(1-(p+q)) + 2q(1-(p+q))
- Now we rearrange that first part
- p^2 + 2pq + q^2
- (p + q) x (p + q)
- (p + q)^2
- That last line should look familiar
48. Modified random match probability
- Remember back in the good old days?
- CPI stat
- For two alleles: (p + q)^2
- For three alleles: (p + q + r)^2
- ...
- For nine alleles: (p + q + r + s + t + u + v + w + x)^2
49. Modified random match probability
- Two ways to think about Allele, Any
- The way we derived it for that minor contributor: p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq
- The way that works for as many contributors as we may need: (p + q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))
- They are equivalent
- (Remember we dropped T for the first one)
- (CPI math is the foundation for the second one, and doesn't use T)
50. Modified random match probability
- Expand this one (Double Allele, Any minus the duplicate): p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq
- To get: p^2 + 2p - 2p^2 + q^2 + 2q - 2q^2 - 2pq
- Rearrange the terms: p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2
51. Modified random match probability
- Now expand the other one (Multiple Allele, Any): (p + q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))
- To get: p^2 + 2pq + q^2 + 2p - 2p^2 - 2pq + 2q - 2q^2 - 2pq
- Rearrange the terms: p^2 + q^2 + 2p + 2q + 2pq - 2pq - 2pq - 2p^2 - 2q^2
- Condense the 2pq terms: p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2
52. Modified random match probability
- Now compare them
- This was the single source one (2 slides ago): p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2
- This is the generic form for multiple contributors (previous slide): p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2
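If the term-by-term expansion is hard to follow, the equivalence can also be spot-checked numerically. A small sketch, with function names and test frequencies chosen only for illustration (T is left out of both forms, as on the slides):

```python
def double_any_minus_duplicate(p, q):
    # p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq
    return p ** 2 + 2 * p * (1 - p) + q ** 2 + 2 * q * (1 - q) - 2 * p * q

def multiple_allele_any(p, q):
    # (p + q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))
    seen = p + q
    return seen ** 2 + 2 * p * (1 - seen) + 2 * q * (1 - seen)

for p, q in [(0.1, 0.2), (0.05, 0.3), (0.3054, 0.15)]:
    assert abs(double_any_minus_duplicate(p, q) - multiple_allele_any(p, q)) < 1e-12
```

The second form is the one that extends to more alleles, because "what you see" is just the sum of the detected allele frequencies.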
53. Modified random match probability
5.2.2.3. In a mixture having at a locus alleles P, Q, and R, assumed to be from two contributors, where all three alleles are below the stochastic threshold, the interpretation may be that the two contributors could be a heterozygote-homozygote pairing where all alleles were detected, a heterozygote-heterozygote pairing where all alleles were detected, or a heterozygote-heterozygote pairing where a fourth allele might have dropped out. In this case, the RMP must account for all heterozygotes and homozygotes represented by these three alleles, but also all heterozygotes that include one of the detected alleles. The RMP for this interpretation could be calculated as (2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr.
5.2.2.3.1. Since 2p includes 2pq and 2pr, 2q includes 2pq and 2qr, and 2r includes 2pr and 2rq, the formula in 5.2.2.3 subtracts 2pq, 2pr, and 2qr to avoid double-counting these genotype frequencies.
54. Modified random match probability
- To use RMP you must state the number of contributors
- Validation studies
- Experience
- Yadda, yadda
- Now that we know how to deal with drop out via Allele, Any, we can use RMP more often
- Modified RMP (modified denotes Anys)
- This is the language we use at our lab
55. CPI compared to RMP
- But CPI is NOT the same as RMP
- CPI is used when you are unsure about the number of contributors
- Consequently, you have problems when you have alleles in the stochastic range (Danger Zone)
- If you don't know how many contributors you have, you don't know how many alleles are missing
56. CPI compared to RMP
- But we can use the CPI math in our RMP stat
- We must make two changes to the base CPI formula that we use in the RMP
- 1. We must correct for situations that change the number of contributors
- 2. We must account for allelic drop out
- We've been through that second, so let's deal with the first
57. CPI compared to RMP
- Consider a four allele pattern
- We interpret the overall profile as having two contributors.
- CPI considers all possible visible combinations of contributors
- (p + q + r + s)^2
- This includes P, P and Q, Q and R, R and S, S types
58. CPI compared to RMP
- But if you think you could have a P, P contributor, that leaves three alleles left
- We stated that there were only 2 contributors
- If Contributor 1 is P, P
- Contributor 2 cannot account for Q, R and S alleles
- Having a homozygote changes the assumption of the number of contributors
59. CPI compared to RMP
- So all we need to do is subtract the homozygotes, but only when the presence of a homozygote changes the number of contributors
- 2 contributors and 4 alleles detected
- 3 contributors and 6 alleles detected
60. CPI compared to RMP
- Easy to do with a friendly computer
- (p + q + r + s)^2 - p^2 - q^2 - r^2 - s^2
- (p + q + r + s + t + u)^2 - p^2 - q^2 - r^2 - s^2 - t^2 - u^2
- USACIL defines this as an Unrestricted RMP
- We kind of think of it as a CPI stat corrected for a defined number of contributors
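A sketch of what the "friendly computer" might do here. It encodes only the rule stated on these slides (subtract the homozygotes when the number of detected alleles is twice the stated number of contributors); the function names and frequencies are hypothetical and this is not USACIL's actual software.

```python
def cpi(freqs):
    # CPI math: (sum of detected allele frequencies)^2
    return sum(freqs) ** 2

def unrestricted_rmp(freqs, n_contributors):
    total = cpi(freqs)
    # With 2 contributors and 4 alleles (or 3 and 6), a homozygote would
    # require an extra contributor, so the homozygotes are subtracted.
    if len(freqs) == 2 * n_contributors:
        total -= sum(f ** 2 for f in freqs)
    return total

# Hypothetical frequencies for a four allele, two contributor locus
print(unrestricted_rmp([0.1, 0.2, 0.1, 0.2], n_contributors=2))

# Three alleles and two contributors: same as the plain CPI calculation
print(unrestricted_rmp([0.1, 0.2, 0.3], n_contributors=2))
```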
61. Unrestricted RMP
5.2.2.6. The unrestricted RMP might be calculated for mixtures that display no indications of allelic dropout. The formulae include an assumption of the number of contributors, but relative peak height information is not utilized. For two-person mixtures, the formulae for loci displaying one, two, or three alleles are identical to the CPI calculation discussed in section 5.3. For loci displaying four alleles (P, Q, R, and S), homozygous genotypes would not typically be included. The unrestricted RMP in this case would require the subtraction for homozygote genotype frequencies, e.g., (p + q + r + s)^2 - p^2 - q^2 - r^2 - s^2.
62. Modified random match probability
- Same thing for our Allele, Any situation
- No need to consider an Allele, Any if it changes the number of contributors
- It doesn't matter how many alleles are below your stochastic threshold
- If you say there are 2 contributors and you detect 4 alleles, by definition there are no alleles missing
- Similar for 3 contributors and 6 alleles detected
63. Modified random match probability
- About as bad as it can get
- 3 contributors
- All alleles are in the Danger Zone
- Each allele could be missing its sister allele
(p + q + r + s + t)^2 + 2p(1-(p+q+r+s+t)) + 2q(1-(p+q+r+s+t)) + 2r(1-(p+q+r+s+t)) + 2s(1-(p+q+r+s+t)) + 2t(1-(p+q+r+s+t))
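The same pattern can be written once for any number of detected alleles. This is a sketch under stated assumptions: it combines the CPI-style term with one Allele, Any term per detected allele, and it follows the rule from the earlier slides that no Any terms apply (and homozygotes are subtracted) when the allele count is twice the stated number of contributors. The function name and frequencies are hypothetical, and T is omitted as in the CPI-based form.

```python
def modified_rmp(freqs, n_contributors):
    seen = sum(freqs)          # what you see
    total = seen ** 2          # CPI-style term for the visible combinations
    if len(freqs) == 2 * n_contributors:
        # By definition no alleles are missing: no Any terms,
        # and homozygotes would change the number of contributors.
        return total - sum(f ** 2 for f in freqs)
    # One Allele, Any term per detected allele: 2 x (allele) x (what you don't see)
    return total + sum(2 * f * (1 - seen) for f in freqs)

# Hypothetical frequencies: five alleles, three contributors, all in the Danger Zone
print(modified_rmp([0.05, 0.1, 0.1, 0.15, 0.2], n_contributors=3))
```

For two alleles and two contributors this reduces to the (p + q)^2 + 2p(1-(p+q)) + 2q(1-(p+q)) form derived earlier.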
64. Modified random match probability
- GIANT DISCLAIMER!!
- We are not saying that you can charge ahead and now use any profile of any number of people with any number of alleles dropping out if you just use a modified RMP calculation
- Bad data is bad data
- It's science, not Voodoo