CIS205 Forensic Statistics - PowerPoint PPT Presentation



Provided by: osirisSun


1
CIS205 Forensic Statistics

Module Leader
Michael.Oakes_at_sunderland.ac.uk

2
Data Types, Location and Dispersion

Chapter 2 of Introduction to Statistics for
Forensic Scientists by David Lucy (Wiley, 2005)

3
Types of Data

Nominal: data simply classified into different
categories, the ordering having no significance,
e.g. people classified by sex (male/female), or
drugs classified by location (South America /
Afghanistan / Indian / Oriental).
Ordinal: data again classified into discrete
categories, but this time the ordering does
matter, e.g. the development of the third molar
classified into ten categories related to age
(Solari and Abramovitch, 2002).
Continuous: data that can take any value, e.g. the
concentration of magnesium in glass can be any
value between 0 and 5, such as 1.225.

4
Types of Data (2)

Nominal and ordinal data types are known
collectively as discrete, because they place
entities into discrete, mutually exclusive categories.
All three data types are called variables.
Nominal and ordinal variables that are used to
classify other variables are called factors.
For example, Table 2.1 gives Δ9-THC concentrations
in marijuana seizures from various years in the 1980s.
Here Δ9-THC is a continuous variable, and year
is an ordinal variable used as a factor to
classify Δ9-THC.

5
Table 2.1. Year and Δ9-THC for marijuana seizures
(ElSohly et al., 2002)
6
Table 2.2. Data of Table 2.1 classified by year
as a factor.
7
Marijuana

Marijuana
Derived from the plant Cannabis
Hashish: concentrated form
Sinsemilla: unfertilized flowering tops of the
female Cannabis plant
Active ingredient is THC
Potency is normally 4-5%
Sinsemilla averages 6-12%
Liquid hashish averages 8-22%
Potential medical uses

8
(No Transcript)
9
Identification of Marijuana

Green plant material
Dry; package in paper
Microscopic examination
Look for bear-claw cystolithic hairs on the top
surface of the leaf
Duquenois-Levine color test (screening)
2% vanillin, 1% acetaldehyde in ethanol
Hydrochloric acid: purple color
Chloroform, heavier than water, forms the lower layer
and pulls the purple color into that layer
Thin layer chromatography (TLC)
Result: THC shows as a red color on the plate
Marijuana is a mixture of compounds

10
Powders / Color Tests

Marquis test: 2% formaldehyde in H2SO4
Purple: opiates
Orange to brown: amphetamine, methamphetamine
Blue: ecstasy
Red: aspirin
Pink: cocaine

11
Populations and Samples

Generally, in chemistry and biology, a sample is
something taken for the purposes of examination:
a fibre or a piece of glass found at the scene
of a crime, for instance, would be termed a
sample.
In statistics, a sample has a different meaning:
it is a subset of a larger set, known as a
population.
In Table 2.1, the Δ9-THC column gives
measurements of Δ9-THC in a sample of
marijuana seizures at the corresponding date. In
this case the population is all marijuana seizures.

12
Distributions

A distribution is an arrangement of the frequencies
of some observation in a meaningful order.
Suppose all 20 values for the THC content of the 1986
marijuana seizures on the next slide are grouped
into broad categories. In other words, the continuous
variable THC is turned into an ordinal variable
with many values. The frequencies of THC
content in each category can then be tabulated.
This table can be represented graphically as a
histogram.

13
Δ9-THC concentrations in a sample of 20 marijuana
seizures taken in 1986, arranged in ascending
order

6.29
7.05 7.21
7.72 7.91
8.16 8.29 8.32 8.40 8.41 8.41
8.82 8.84 8.93
9.02 9.26
9.74 9.95
10.30
10.70
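The grouping step described on the Distributions slide can be sketched in Python using the 20 values above. The sketch makes two assumptions. First, the bins are 0.5 wide and start at 6.0. This follows the way the rows of the data slide appear to be laid out, and it places the modal group's mid-point at 8.25. Second, the name `frequency_table` is hypothetical and does not come from the course material.

```python
# Delta-9-THC values for the 1986 sample of 20 seizures (from the slide)
THC_1986 = [
    6.29, 7.05, 7.21, 7.72, 7.91,
    8.16, 8.29, 8.32, 8.40, 8.41, 8.41,
    8.82, 8.84, 8.93, 9.02, 9.26,
    9.74, 9.95, 10.30, 10.70,
]


def frequency_table(values, start=6.0, width=0.5, n_bins=10):
    """Count the values falling in each bin [lower, lower + width).

    Values are converted to integer hundredths first, so that
    floating-point rounding cannot push a value into the wrong bin.
    The default start/width/n_bins are assumptions for this data.
    """
    start_h = round(start * 100)
    width_h = round(width * 100)
    counts = [0] * n_bins
    for v in values:
        index = (round(v * 100) - start_h) // width_h
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} lies outside the binned range")
        counts[index] += 1
    lowers = [start + i * width for i in range(n_bins)]
    return list(zip(lowers, counts))


if __name__ == "__main__":
    width = 0.5
    table = frequency_table(THC_1986, width=width)
    # A text histogram: one '#' per seizure in each group
    for lower, count in table:
        print(f"{lower:5.2f}-{lower + width:5.2f} | {'#' * count}")
    lower, count = max(table, key=lambda row: row[1])
    print("modal group mid-point:", lower + width / 2)  # 8.25
```

If a different bin width were chosen, the shape of the histogram would change. For that reason the width is exposed as a parameter and not fixed.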

14
(No Transcript)
15
The histogram

The histogram, which gives the sample frequency
distribution for Δ9-THC in marijuana from 1986,
has three important properties:
It has a single highest point at about 8.25
Δ9-THC, the two ends of the distribution having
progressively lower frequencies as they get
further from the highest point. The distribution is
unimodal, and shows that Δ9-THC tends towards a
value of about 8.25.
The distribution is more or less symmetric about
the 8.25 value, i.e. not skewed.
The distribution is dispersed about the 8.25
point in some measurable way.

16
Location

How do we measure the typical value (location)
and the dispersion?
First, some mathematical notation and terminology
is required.

17
Arrays and Scalars

Let x be an array such that $x = \{2, 4, 3, 5, 4\}$.
This means that x is a series of quantities,
called an array, which are indexed by the suffix
i, so that $x_1 = 2$, $x_2 = 4$, $x_3 = 3$, $x_4 = 5$, $x_5 = 4$.
n is the number of elements in array x. In this
case there are five elements in x, so that $n = 5$.
n is a single number on its own, and is sometimes
referred to as a scalar.

18
Summation Σ
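The array notation and the summation sign can be illustrated with a small Python sketch. Here the summation sign has its standard meaning, $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$. A Python list stands in for the array x, and `len` gives the scalar n. Python counts list positions from 0, so the hypothetical helper `element` shifts the index by one to match the 1-based suffix i used on the slides. The helper `summation` is likewise hypothetical.

```python
x = [2, 4, 3, 5, 4]   # the array x from the slides
n = len(x)            # scalar: the number of elements, here 5


def element(arr, i):
    """Return x_i using the 1-based suffix i of the slide notation."""
    return arr[i - 1]


def summation(arr):
    """Sigma notation spelled out: x_1 + x_2 + ... + x_n."""
    total = 0
    for i in range(1, len(arr) + 1):
        total += element(arr, i)
    return total


print(n, element(x, 1), element(x, 4), summation(x))  # 5 2 5 18
```

In everyday code the built-in `sum(x)` gives the same result. The explicit loop is shown only so that each term of the sum is visible.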
19
Multiplication

Mathematicians often leave out multiplication
signs, so rather than writing out $3 \times a = 6$, they
write $3a = 6$.
But $3 \times 4 = 12$ would never be written as $34 = 12$.

20
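There are three basic measures of location: the mean,
the median and the mode.
The mean is the arithmetic mean, what we usually
think of as the average, denoted by $\bar{x}$, where
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
In the previous example, $\bar{x} = \frac{2 + 4 + 3 + 5 + 4}{5} = \frac{18}{5} = 3.6$.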
21
Median

The median is simply the value of the middle one of a
number of values ordered in increasing magnitude.
If $x = \{2, 4, 3, 5, 4\}$, let $x'$ be the ordered vector
of x, so that $x' = \{2, 3, 4, 4, 5\}$. Of the positions 1 to 5,
the central one is the third, so the median is
$x'_3 = 4$.
For even n, take the average of the two middle
values.
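Both measures of location can be sketched in a few lines of Python. The mean follows $\bar{x} = \frac{1}{n}\sum x_i$. The median sorts the data and then takes either the middle value or the average of the two middle values. The function names are hypothetical. Python's standard `statistics` module provides `mean` and `median` functions that should agree with these. The 1986 THC sample has an even n = 20, so it exercises the even case.

```python
def arithmetic_mean(values):
    """x-bar = (1/n) * (x_1 + x_2 + ... + x_n)."""
    if not values:
        raise ValueError("the mean of no values is undefined")
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    if not values:
        raise ValueError("the median of no values is undefined")
    ordered = sorted(values)      # x' in the slide notation
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # e.g. the 3rd of 5 values (list index 2)
    return (ordered[mid - 1] + ordered[mid]) / 2


x = [2, 4, 3, 5, 4]
thc_1986 = [6.29, 7.05, 7.21, 7.72, 7.91, 8.16, 8.29, 8.32, 8.40, 8.41,
            8.41, 8.82, 8.84, 8.93, 9.02, 9.26, 9.74, 9.95, 10.30, 10.70]

print(arithmetic_mean(x))   # 3.6
print(median(x))            # 4
print(median(thc_1986))     # 8.41 (the 10th and 11th values are both 8.41)
```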

22
Mode

The mode is the value with the most instances. In
$x = \{2, 4, 3, 5, 4\}$ there are two occurrences of 4, so 4
is the modal value.
Technically, the THC concentration data all lie
on a continuous scale, so exact repeats should not
occur. (8.41 appears twice in the 1986 sample only
because values are recorded to two decimal places.)
However, if the data are grouped, as
with the histogram, the modal group for the
sample from 1986 is the one with the tallest
column, corresponding to a value of 8.25
(mid-point of modal group).
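Counting instances is all the mode requires, so a short Python sketch can rely on `collections.Counter`. The function name `modes` is hypothetical. It returns every value tied for the highest count, because a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the maximum number of times."""
    if not values:
        raise ValueError("the mode of no values is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


x = [2, 4, 3, 5, 4]
print(modes(x))  # [4]
```

On Python 3.8 or later, the standard library's `statistics.multimode` does the same job. For continuous data such as the THC values, the counting would be applied to the grouped categories rather than to the raw values.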

23
Skewed distributions

Using the correct measure of location is
important.
Usually this will be the mean, but in the case of
incomes the median and mode give a truer picture.
If $x = \{12000, 20000, 21000, 11000, 9000, 7000, 13000, 85000, 120000\}$,
then the mean is 33111 (to the nearest whole
number) and the median is 13000.
This is an example of a skewed distribution, in
this case highly skewed, with a long tail towards
the higher incomes (positively skewed).
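The income figures above can be checked with Python's standard `statistics` module. The sketch shows how the two largest incomes pull the mean far above the median.

```python
import statistics

incomes = [12000, 20000, 21000, 11000, 9000, 7000, 13000, 85000, 120000]

mean_income = statistics.mean(incomes)      # 298000 / 9
median_income = statistics.median(incomes)  # 5th of the 9 ordered values

print(round(mean_income))   # 33111
print(median_income)        # 13000
# Seven of the nine incomes lie below the mean, which is why the
# median is the more representative "typical" income here.
print(sum(v < mean_income for v in incomes))  # 7
```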

24
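The standard measure of dispersion is called the
variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.
We divide by n-1 rather than n because the
deviations are measured from the sample mean $\bar{x}$
rather than from the unknown population mean, which
makes them slightly too small on average. Dividing
by n-1 corrects this bias, which matters most for
small samples.
There are other measures of dispersion, including
the inter-quartile range.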
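Python can compute both the variance and the inter-quartile range. The variance is written out directly from the n-1 formula, and the population version (divisor n) is printed alongside for comparison. The function name `sample_variance` is hypothetical.

Quartile conventions differ between textbooks and software packages. The standard library's `statistics.quantiles` uses its "exclusive" method by default, which may not match the method used in the course text.

```python
import statistics


def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((v - x_bar) ** 2 for v in values) / (n - 1)


x = [2, 4, 3, 5, 4]
print(round(sample_variance(x), 6))       # 1.3  (divisor n - 1 = 4)
print(round(statistics.pvariance(x), 6))  # 1.04 (divisor n = 5, for comparison)

# Inter-quartile range of the 1986 THC sample
thc_1986 = [6.29, 7.05, 7.21, 7.72, 7.91, 8.16, 8.29, 8.32, 8.40, 8.41,
            8.41, 8.82, 8.84, 8.93, 9.02, 9.26, 9.74, 9.95, 10.30, 10.70]
q1, q2, q3 = statistics.quantiles(thc_1986, n=4)  # method="exclusive" by default
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {q3 - q1:.4f}")
```

The library's own `statistics.variance(x)` uses the same n-1 divisor as `sample_variance`.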
25
Hierarchies of variation

Measurements from empirical sources are nearly
always subject to some form of variability.
The lowest level in the hierarchy is
observational variability: an observation is made
on the same entity several times in exactly the
same way, and those observations are seen to
vary.
The magnitude of observational variability may be
zero for discrete variable types, but may be
considerable for continuous variables.
The next level up is within-entity variability:
the same entity is repeatedly measured, but we
vary the way in which it is measured.
Within-sample variability arises between different
entities from the same sample (such as the
composition of different fragments from the same
pane of glass). Again, this may be zero for
discrete variable types.
Between-sample variability arises between different
samples, e.g. THC levels in marijuana seizures in
1986 and 1987.
These stages in the hierarchy of variation tend
to be additive.
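A small simulation can illustrate one common reading of "additive": when the sources of variation are independent, their variances add. The sketch rests on assumptions that are not stated on the slide:

- each level of the hierarchy contributes an independent, normally distributed deviation;
- the four standard deviations are hypothetical numbers chosen only for illustration;
- the grand mean of 8.5 is also hypothetical, and none of these values are measurements.

```python
import random
import statistics

rng = random.Random(2005)  # fixed seed so the run is repeatable

# Hypothetical standard deviations, one per level of the hierarchy
SD_BETWEEN_SAMPLE = 1.0
SD_WITHIN_SAMPLE = 0.5
SD_WITHIN_ENTITY = 0.2
SD_OBSERVATION = 0.1
GRAND_MEAN = 8.5           # hypothetical overall level


def simulated_measurement():
    """One measurement: grand mean plus an independent deviation from each level."""
    return (GRAND_MEAN
            + rng.gauss(0, SD_BETWEEN_SAMPLE)
            + rng.gauss(0, SD_WITHIN_SAMPLE)
            + rng.gauss(0, SD_WITHIN_ENTITY)
            + rng.gauss(0, SD_OBSERVATION))


measurements = [simulated_measurement() for _ in range(100_000)]

# Under independence the variances (not the standard deviations) add
expected = (SD_BETWEEN_SAMPLE ** 2 + SD_WITHIN_SAMPLE ** 2
            + SD_WITHIN_ENTITY ** 2 + SD_OBSERVATION ** 2)
observed = statistics.variance(measurements)

print(f"sum of component variances: {expected:.2f}")  # 1.30 with these hypothetical values
print(f"variance of simulated measurements: {observed:.2f}")
```

The simulated variance should land close to the sum of the component variances. Adding the standard deviations instead would give 1.8, which overstates the spread.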