CIS205 Forensic Statistics - PowerPoint PPT Presentation

Provided by: osirisSun

Transcript and Presenter's Notes



1
CIS205 Forensic Statistics
  • Module Leader
  • Michael.Oakes_at_sunderland.ac.uk

2
Data Types, Location and Dispersion
  • Chapter 2 of Introduction to Statistics for
    Forensic Scientists by David Lucy (Wiley, 2005)

3
Types of Data
  • Nominal, simply classified into different
    categories, the ordering having no significance
    e.g. people classified by sex (male/female),
    drugs classified by location (South America /
    Afghanistan / Indian / Oriental)
  • Ordinal, data again classified into discrete
    categories, but this time the ordering does
    matter, e.g. the development of the third molar
    classified into ten categories related to age
    (Solari and Abramovitch, 2002).
  • Continuous data can take any value, e.g. the
    concentration of magnesium in glass can be any
    value between 0 and 5, such as 1.225.

4
Types of Data (2)
  • Nominal and Ordinal data types are known
    collectively as discrete, because they place
    entities into discrete exclusive categories.
  • All three data types are called variables.
  • Nominal and ordinal variables that are used to
    classify other variables are called factors.
    E.g. Δ9-THC concentrations in marijuana seizures
    from various years in the 1980s in Table 2.1.
    Here Δ9-THC is a continuous variable, and year
    is an ordinal variable used as a factor to
    classify Δ9-THC.

5
Table 2.1. Year and Δ9-THC for marijuana seizures
(ElSohly et al., 2002)
6
Table 2.2. Data of Table 2.1 classified by year
as a factor.
7
Marijuana
  • Marijuana
  • Derived from the plant Cannabis
  • Hashish: concentrated form
  • Sinsemilla: unfertilized flowering tops of the
    female Cannabis plant
  • Active ingredient is THC
  • Potency is normally 4-5%
  • Sinsemilla averages 6-12%
  • Liquid hashish averages 8-22%
  • Potential medical uses

8
(No Transcript)
9
Identification of Marijuana
  • Green Plant Material
  • Dry Package in Paper
  • Microscopic Examination
  • Look for "bear claw" cystolithic hairs on the top
    surface of the leaf
  • Duquenois-Levine color test (screening)
  • 2% vanillin, 1% acetaldehyde in ethanol
  • Hydrochloric acid: purple color
  • Chloroform: heavier than water, forms the lower
    layer and pulls the purple color into it
  • Thin Layer Chromatography (TLC)
  • Result: THC gives a red color on the plate
  • Marijuana is a mixture of compounds

10
Powders / Color Tests
  • Marquis Test: 2% formaldehyde in H2SO4
  • Purple: opiates
  • Orange to brown: amphetamine, methamphetamine
  • Blue: Ecstasy
  • Red: aspirin
  • Pink: cocaine

11
Populations and Samples
  • Generally, in chemistry and biology, a sample is
    something taken for the purposes of examination.
    A fibre or piece of glass found at the scene of
    the crime would be termed a sample.
  • In statistics, sample has a different meaning. It
    is a subset of a larger set, known as a
    population.
  • In Table 2.1, the Δ9-THC column gives
    measurements of the Δ9-THC in a sample of
    marijuana seizures at the corresponding date. In
    this case the population is all marijuana
    seizures.

12
Distributions
  • A distribution is an arrangement of frequencies
    of some observation in a meaningful order.
  • If all 20 values for the THC content of 1986
    marijuana seizures on the next slide are grouped
    into broad categories, i.e. the continuous
    variable THC is made into an ordinal variable
    with many values, then the frequencies of THC
    content in each category can be tabulated.
  • This table can be represented graphically as a
    histogram.

13
Δ9-THC concentrations in a sample of 20 marijuana
seizures taken in 1986, arranged in ascending
order
  • 6.29
  • 7.05 7.21
  • 7.72 7.91
  • 8.16 8.29 8.32 8.40 8.41 8.41
  • 8.82 8.84 8.93
  • 9.02 9.26
  • 9.74 9.95
  • 10.30
  • 10.70
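
The grouping described on the Distributions slide can be sketched in Python. This is only an illustration: the bin width of 0.5 and the starting point of 6.0 are assumptions (chosen because they reproduce the row grouping of the values listed above), and the names `thc_1986`, `BIN_WIDTH`, `BIN_START` and `N_BINS` are hypothetical.

```python
# The 20 values listed on this slide, in ascending order.
thc_1986 = [
    6.29, 7.05, 7.21, 7.72, 7.91,
    8.16, 8.29, 8.32, 8.40, 8.41, 8.41,
    8.82, 8.84, 8.93, 9.02, 9.26,
    9.74, 9.95, 10.30, 10.70,
]

BIN_WIDTH = 0.5   # assumed width of each category
BIN_START = 6.0   # assumed lower edge of the first category
N_BINS = 10       # enough categories to cover 6.0 up to 11.0

# Count how many values fall into each category.
counts = [0] * N_BINS
for value in thc_1986:
    counts[int((value - BIN_START) / BIN_WIDTH)] += 1

# Print the frequency table as a crude text histogram.
for k, count in enumerate(counts):
    lower = BIN_START + k * BIN_WIDTH
    print(f"{lower:5.2f} to {lower + BIN_WIDTH:5.2f} | {'*' * count} ({count})")
```

Each row of the printout is one category of the ordinal variable, and the number of stars is the frequency that a histogram would draw as the height of a column.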

14
(No Transcript)
15
The histogram
  • The histogram, which gives the sample frequency
    distribution for Δ9-THC in marijuana from 1986,
    has 3 important properties:
  • It has a single highest point at about 8.25
    Δ9-THC, the two ends of the distribution having
    progressively lower frequencies as they get
    further from the highest point. The curve is
    unimodal, and shows that Δ9-THC tends towards a
    value of about 8.25.
  • The distribution is more or less symmetric about
    the 8.25 value, i.e. not skewed.
  • The distribution is dispersed about the 8.25
    point in some measurable way.

16
Location
  • How do we measure the typical value (location)
    and the dispersion?
  • First some mathematical notation and terminology
    is required.

17
Arrays and Scalars
  • Let x be an array such that $x = \{2, 4, 3, 5, 4\}$.
    This means that x is a series of quantities,
    called an array, which are indexed by the suffix
    i, so that $x_1 = 2$, $x_2 = 4$, $x_3 = 3$,
    $x_4 = 5$, $x_5 = 4$.
  • n is the number of elements in array x. In this
    case there are five elements in x, so that
    $n = 5$.
  • n is a single number on its own, and is sometimes
    referred to as a scalar.

18
Summation Σ
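As a worked illustration, assuming the five-element array x = {2, 4, 3, 5, 4} from the Arrays and Scalars slide, the summation sign adds the indexed elements in turn:

```latex
\[
\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 2 + 4 + 3 + 5 + 4 = 18
\]
```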
19
Multiplication
  • Mathematicians often leave out multiplication
    signs, so rather than writing out 3 × a = 6, they
    write 3a = 6.
  • But 3 × 4 = 12 would never be written as 34 = 12.

20
There are 3 basic measures of location: mean,
median and mode.
Mean is the arithmetic mean, what we usually
think of as average, denoted by
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
In the previous example,
$\bar{x} = (2 + 4 + 3 + 5 + 4) / 5 = 18 / 5 = 3.6$
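
A minimal Python sketch of the same calculation, assuming the array x = {2, 4, 3, 5, 4} from the Arrays and Scalars slide (the variable names are illustrative only):

```python
# Array from the "Arrays and Scalars" slide.
x = [2, 4, 3, 5, 4]

n = len(x)        # number of elements, a scalar
total = sum(x)    # summation over all elements of x
mean = total / n  # arithmetic mean

print(n, total, mean)  # 5 18 3.6
```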
21
Median
  • Median is simply the value of the middle one of a
    number of values ordered in increasing magnitude.
  • If $x = \{2, 4, 3, 5, 4\}$, sorting x into
    increasing order gives $\{2, 3, 4, 4, 5\}$. Of
    the positions 1 to 5 the central one is the
    third, so the median is 4.
  • For even n, take the value halfway between the
    two middle values.
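
The two cases (odd and even n) can be sketched as a small Python function. The function name `median` and the four-element example are illustrative assumptions, not part of the slides, and the sketch assumes a non-empty list.

```python
def median(values):
    """Middle value of the sorted data; for even n, halfway between the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single central value exists.
        return ordered[mid]
    # Even n: split the difference of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


x = [2, 4, 3, 5, 4]
print(median(x))             # 4
print(median([2, 4, 3, 5]))  # 3.5
```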

22
Mode
  • Mode is the value with most instances. In
    $x = \{2, 4, 3, 5, 4\}$ there are two occurrences
    of 4, so 4 is the modal value.
  • Technically, for the THC concentration data all
    values are on a continuous scale, so exact
    repeats are not expected (apart from values that
    coincide after rounding). However, if the data
    are grouped, as with the histogram, the modal
    group for the sample from 1986 is the one with
    the tallest column, corresponding to a value of
    8.25 (mid-point of modal group).
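
Both cases can be sketched in Python: the mode of the small array x, and the modal group of the 1986 sample. The grouping uses an assumed bin width of 0.5 starting at 6.0, which is consistent with the mid-point of 8.25 quoted above. All names are hypothetical.

```python
from collections import Counter

# Mode of a discrete array: the most frequent value.
x = [2, 4, 3, 5, 4]
mode_value, mode_count = Counter(x).most_common(1)[0]
print(mode_value, mode_count)  # 4 2

# Modal group of the 1986 THC sample after grouping.
thc_1986 = [
    6.29, 7.05, 7.21, 7.72, 7.91,
    8.16, 8.29, 8.32, 8.40, 8.41, 8.41,
    8.82, 8.84, 8.93, 9.02, 9.26,
    9.74, 9.95, 10.30, 10.70,
]
BIN_WIDTH = 0.5  # assumed group width
BIN_START = 6.0  # assumed lower edge of the first group

# Map each value to the index of its group, then count the groups.
group_counts = Counter(int((v - BIN_START) / BIN_WIDTH) for v in thc_1986)
modal_group, modal_count = group_counts.most_common(1)[0]
modal_midpoint = BIN_START + (modal_group + 0.5) * BIN_WIDTH
print(modal_midpoint, modal_count)  # 8.25 6
```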

23
Skewed distributions
  • Using the correct measure of location is
    important.
  • Usually this will be the mean, but in the case of
    incomes the median and mode give a truer picture.
  • If $x = \{12000, 20000, 21000, 11000, 9000, 7000,
    13000, 85000, 120000\}$, then mean = 33111,
    median = 13000.
  • This is an example of a skewed distribution, in
    this case highly skewed towards the higher values
    of income (positively skewed).
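
The income figures on this slide can be checked with Python's standard `statistics` module. This is a quick sketch; the variable names are illustrative.

```python
import statistics

# Incomes from the slide.
incomes = [12000, 20000, 21000, 11000, 9000, 7000, 13000, 85000, 120000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(round(mean_income), median_income)  # 33111 13000

# A mean well above the median is a sign of positive skew:
# the two largest incomes pull the mean upwards.
print(mean_income > median_income)  # True
```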

24
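The standard measure of dispersion is called
variance:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
where $\bar{x}$ is the sample mean.
The reason we use n-1 rather than n is that the
mean is estimated from the same sample, so
dividing by n would tend to underestimate the
variance.
There are other measures of dispersion, including
the inter-quartile range.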
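
A short Python sketch of the sample variance with the n-1 divisor, assuming the array x = {2, 4, 3, 5, 4} used on the earlier slides. The hand calculation is compared with `statistics.variance`, which also divides by n-1.

```python
import statistics

x = [2, 4, 3, 5, 4]
n = len(x)
mean = sum(x) / n

# Squared deviation of each element from the sample mean.
squared_devs = [(xi - mean) ** 2 for xi in x]

# Sample variance: divide by n - 1 rather than n.
sample_variance = sum(squared_devs) / (n - 1)

print(round(sample_variance, 6))          # 1.3
print(round(statistics.variance(x), 6))   # 1.3
```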
25
Hierarchies of variation
  • Measurements from empirical sources are nearly
    always subject to some form of variability.
  • The lowest level in the hierarchy is
    observational variability: an observation is made
    on the same entity several times in exactly the
    same way, and those observations are seen to
    vary.
  • The magnitude of observational variability may be
    zero for discrete variable types, but may be
    considerable for continuous variables.
  • The next level up is within-entity variability:
    the same entity is repeatedly measured, but we
    vary the way in which it is measured.
  • Within-sample variability is where different
    entities from the same sample are measured (such
    as the composition of different fragments from
    the same pane of glass). Again this may be zero
    for discrete variable types.
  • Between-sample variability, e.g. THC levels in
    marijuana seizures in 1986 and 1987.
  • These stages in the hierarchy of variation tend
    to be additive.

26
Matlab Practicals
  • Cell M9
  • F Drive
  • MATLAB71