Statistics for Linguistics Students - PowerPoint PPT Presentation

1
Statistics for Linguistics Students
  • Michaelmas 2004
  • Week 1
  • Bettina Braun

2
Why calculate statistics?
  • Describe and summarise the data
  • E.g. examination results (out of 100)
  • 22 98 40 45 16 31 77 78
  • 55 45 61 91 87 45 54 66
  • 75 87 88 49 64 76 58 61
  • Questions: What is the average mark? How spread
    out are the scores? What are the lowest and
    highest marks? How do the results compare with
    other results (e.g. from previous years)?
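
The questions above can be explored with a few lines of code. The following is a minimal Python sketch (standard library only) that takes the 24 marks from this slide and reports the number of scores, the lowest and highest marks and the average; the variable names are illustrative.

```python
# The 24 examination marks (out of 100) listed on the slide.
marks = [22, 98, 40, 45, 16, 31, 77, 78,
         55, 45, 61, 91, 87, 45, 54, 66,
         75, 87, 88, 49, 64, 76, 58, 61]

lowest, highest = min(marks), max(marks)
average = sum(marks) / len(marks)  # arithmetic mean of all marks

print(f"n = {len(marks)}")
print(f"lowest = {lowest}, highest = {highest}")
print(f"average mark = {average:.2f}")
```

For these marks it reports 24 scores, a lowest mark of 16, a highest mark of 98 and an average of about 61.21 (1469 / 24).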

3
Population vs. Sample
  • Population: the total universe of all possible
    observations. Populations can be finite or
    infinite, real or theoretical, e.g.
  • The IQ of all adult men in Britain
  • The outcome of an infinite number of flips of a
    coin
  • Descriptive statistics of a population are
    called parameters

4
Population vs. Sample (contd)
  • Sample: a subset of observations drawn from a
    given population, e.g.
  • The IQ scores of 100 adult men in Britain
  • The outcome of 50 flips of a coin
  • Descriptive statistics of a sample are called
    statistics
  • Note: in experimental research it is important
    to draw a representative random sample that is
    not biased
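
To make the parameter/statistic distinction concrete, here is a small Python simulation. It is a sketch under stated assumptions: the "population" is a hypothetical list of 10,000 simulated coin flips, and `random.Random.sample` draws a simple random sample of 50 flips without replacement, so every flip has the same chance of being chosen.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated coin flips (1 = heads, 0 = tails).
# A fixed seed makes the simulation reproducible.
rng = random.Random(2004)
population = [rng.randint(0, 1) for _ in range(10_000)]

# Parameter: a descriptive value of the whole population (proportion of heads).
mu = statistics.mean(population)

# Simple random sample of 50 flips, drawn without replacement.
sample = rng.sample(population, k=50)

# Statistic: the same descriptive value computed on the sample only.
x_bar = statistics.mean(sample)

print(f"population proportion of heads (parameter): {mu:.3f}")
print(f"sample proportion of heads (statistic):     {x_bar:.3f}")
```

The sample proportion will usually differ a little from the population proportion; that difference is sampling error, and on average it shrinks as the sample grows.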

5
Histograms: frequency distribution of each event
Data: Tutorial1.sav
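
A frequency distribution simply counts how many scores fall into each category or interval. Since the Tutorial1.sav data are not part of this transcript, the sketch below uses the exam marks from slide 2 as stand-in data and groups them into bins of width 10 (the bin width is an arbitrary choice) to print a text histogram.

```python
from collections import Counter

# Marks from slide 2, used as stand-in data: Tutorial1.sav itself
# is not reproduced in this transcript.
marks = [22, 98, 40, 45, 16, 31, 77, 78,
         55, 45, 61, 91, 87, 45, 54, 66,
         75, 87, 88, 49, 64, 76, 58, 61]

BIN_WIDTH = 10

def bin_start(mark):
    """Lower edge of the mark's bin; a perfect 100 is counted in the 90-100 bin."""
    return min(mark, 99) // BIN_WIDTH * BIN_WIDTH

freq = Counter(bin_start(m) for m in marks)

# Text histogram: one '*' per score in each bin, empty bins included.
for start in range(0, 100, BIN_WIDTH):
    end = 100 if start == 90 else start + BIN_WIDTH - 1
    count = freq.get(start, 0)
    print(f"{start:3d}-{end:3d} | {'*' * count} ({count})")
```
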
6
Central tendency: mode and median
  • Mode: the most frequent mark (note: there may be
    multiple modes)
  • Median: the score in the middle of the list when
    the scores are ordered from lowest to highest
    (with an even number of scores, the mean of the
    two middle scores). It cuts the data into halves
    and does not take the values of all scores into
    account, only those of the scores in the middle
    position.
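
Both measures are available in Python's `statistics` module. In this sketch `statistics.multimode` (Python 3.8+) returns every tied mode, and `statistics.median` averages the two middle scores when the number of scores is even, as it is for the 24 exam marks from slide 2.

```python
import statistics

marks = [22, 98, 40, 45, 16, 31, 77, 78,
         55, 45, 61, 91, 87, 45, 54, 66,
         75, 87, 88, 49, 64, 76, 58, 61]

# Mode(s): multimode lists every value that ties for the highest frequency.
modes = statistics.multimode(marks)

# Median: with 24 scores, the mean of the 12th and 13th ordered scores.
ordered = sorted(marks)
median = statistics.median(marks)

print("modes:", modes)
print("two middle scores:", ordered[11], ordered[12])
print("median:", median)

# A small hypothetical data set with two modes.
print("bimodal example:", statistics.multimode([3, 3, 5, 5, 7]))
```

Here the mode is 45 (three occurrences) and the median is 61, since both middle scores in the ordered list are 61.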

7
Central tendency: mean
  • Mean: the sum of the scores divided by the
    number of scores
  • Note on notation: Greek letters are often used
    for population parameters, Roman letters for
    statistics (properties of a sample)
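
Written out, the mean follows the notation convention on this slide: a Greek letter for the population parameter and a Roman letter for the sample statistic. This is the standard textbook form, added for reference since the slide's own formula is not preserved in the transcript; the last line applies it to the 24 exam marks from slide 2.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \text{(population mean, $N$ scores)}

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(sample mean, $n$ scores)}

\bar{x}_{\text{marks}} = \frac{1469}{24} \approx 61.21
```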

8
Comparing measures of central tendency
  • Mode
  • Quick to find if we have a frequency
    distribution
  • Can be used with categorical data
  • Median
  • A good estimate if there are abnormally large or
    small values (e.g. maximum aircraft speeds of
    450 km/h, 480 km/h, 500 km/h, 530 km/h, 600 km/h
    and 1100 km/h)
  • Only influenced by the values in the middle of
    the ordered data
  • Mean
  • Every score is taken into account
  • Has some interesting properties, which makes it
    the most widely used measure
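
The aircraft example can be checked directly. This Python sketch compares the mean and median of the six speeds with and without the 1100 km/h value.

```python
import statistics

# Maximum aircraft speeds (km/h) from the slide, in ascending order,
# so the last value is the unusually large one.
speeds = [450, 480, 500, 530, 600, 1100]

mean_all = statistics.mean(speeds)
median_all = statistics.median(speeds)

# The same measures without the extreme value.
without_extreme = speeds[:-1]
mean_trim = statistics.mean(without_extreme)
median_trim = statistics.median(without_extreme)

print(f"mean:   {mean_all} with 1100 km/h, {mean_trim} without")
print(f"median: {median_all} with 1100 km/h, {median_trim} without")
```

Removing the single extreme value lowers the mean from 610 to 512 km/h but the median only from 515 to 500 km/h; with all six values included, the mean is higher than five of them.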

9
Types of variables
  • Interval (scale): differences between
    consecutive numbers are equal intervals (e.g.
    time, speed, distance). Precise measurements
  • Ordinal: assignment of ranks that represent
    positions along some ordered dimension (e.g.
    ranking people by their speed, 1 = fastest,
    4 = slowest). No equal intervals
  • Categorical (nominal): numbers used as category
    labels (e.g. brown = 1, blue = 2, green = 3)
  • Question: on which type of data can we calculate
    a meaningful central tendency?
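
One way to think about the question is to recode nominal data. In this Python sketch the eye-colour observations and the second coding are hypothetical; the point is that the "mean" of category codes changes when the arbitrary codes change, whereas the mode (a category) does not.

```python
import statistics

# Hypothetical eye-colour observations (nominal data).
eyes = ["brown", "brown", "blue", "green", "brown", "blue"]

# Two equally arbitrary numeric codings of the same categories.
coding_a = {"brown": 1, "blue": 2, "green": 3}  # the coding used on the slide
coding_b = {"brown": 3, "blue": 1, "green": 2}  # hypothetical alternative

mean_a = statistics.mean(coding_a[e] for e in eyes)
mean_b = statistics.mean(coding_b[e] for e in eyes)

# The mode is a category label, so it does not depend on the coding.
mode_label = statistics.mode(eyes)

print(f"'mean' under coding A: {mean_a:.2f}")
print(f"'mean' under coding B: {mean_b:.2f}")
print(f"mode: {mode_label}")
```
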

10
Spread of distributions: why?

11
Spread of distributions: range and quartiles
  • A small spread is often desirable, as it
    indicates a high proportion of similar scores
  • A large spread indicates large differences
    between individual scores
  • Range: the difference between the highest and
    lowest score; a rather crude measure, since it
    depends only on the two most extreme scores
  • Quartiles: cut the ordered data into quarters
    (second quartile = median)
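
The range and quartiles of the slide 2 exam marks can be computed with the standard library. Note that `statistics.quantiles` implements one of several interpolation conventions (its default "exclusive" method); packages such as SPSS may use other conventions, so quartiles can differ slightly between programs.

```python
import statistics

marks = [22, 98, 40, 45, 16, 31, 77, 78,
         55, 45, 61, 91, 87, 45, 54, 66,
         75, 87, 88, 49, 64, 76, 58, 61]

# Range: highest minus lowest score.
score_range = max(marks) - min(marks)

# Quartiles: three cut points that divide the ordered data into quarters.
q1, q2, q3 = statistics.quantiles(marks, n=4)

print(f"range = {score_range}")
print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
print(f"interquartile range = {q3 - q1}")
```

For these marks the range is 82 and the quartiles come out as 45, 61 and 77.75, giving an interquartile range of 32.75.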

12
Median, quartiles, and outliers
  • Outlier (more than 1.5 box lengths above or below
    the box)
  • Interquartile range (the length of the box)
  • Extreme value (more than 3 box lengths below or
    above the box)

[Figure: boxplot labelled, from top to bottom: largest value that is not an outlier, upper quartile, median, lower quartile, smallest value that is not an outlier]
Data: tutorial1.sav (simple boxplot, separate variables)
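
The box-length rules above translate into a short classification function. This Python sketch treats one box length as the interquartile range; the choice of quartile method (`method="inclusive"`) is an assumption, and with small samples SPSS may draw the box edges somewhat differently. The example reuses the aircraft speeds from the slide comparing measures of central tendency.

```python
import statistics

def classify(values):
    """Return (value, label) pairs, labelling each value 'normal', 'outlier' or 'extreme'.

    One box length is the interquartile range (Q3 - Q1). The quartile
    method is an assumption; other software may place the box edges differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box = q3 - q1
    labelled = []
    for v in values:
        if v < q1 - 3 * box or v > q3 + 3 * box:
            label = "extreme"   # more than 3 box lengths beyond the box
        elif v < q1 - 1.5 * box or v > q3 + 1.5 * box:
            label = "outlier"   # more than 1.5 box lengths beyond the box
        else:
            label = "normal"
        labelled.append((v, label))
    return labelled

# Maximum aircraft speeds (km/h) from the earlier slide.
speeds = [450, 480, 500, 530, 600, 1100]
for speed, label in classify(speeds):
    print(f"{speed:5d} km/h: {label}")
```

With these quartiles (485 and 582.5 km/h) the 1100 km/h value lies more than 3 box lengths above the box and is labelled extreme.
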
13
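Spread of the population: variance measures
  • Variance: the average squared deviation from the
    mean (the sum of squared deviations divided by N
    for a population, or by n - 1 for a sample)
  • Variance
  • Standard deviation: the square root of the
    variance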
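
The variance formula from the original slide is not preserved in this transcript, so this Python sketch uses the standard definitions: divide the sum of squared deviations by N for a population, or by n - 1 when estimating from a sample. The scores are hypothetical, and the results are cross-checked against `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with mean 5

var_pop = population_variance(scores)
sd_pop = math.sqrt(var_pop)          # standard deviation = square root of variance
var_smp = sample_variance(scores)

print(var_pop, sd_pop)  # 4.0 2.0
print(var_smp)

# Cross-check against the standard library.
assert math.isclose(var_pop, statistics.pvariance(scores))
assert math.isclose(var_smp, statistics.variance(scores))
```
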

14
Normal distribution (Gaussian distribution)
  • Example: IQ scores, mean = 100, sd = 16

[Figure: normal curve; mean, median and mode coincide at the peak]
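
Python's `statistics.NormalDist` (3.8+) can model the IQ example. This sketch assumes IQ is exactly normal with the slide's mean of 100 and sd of 16, and prints the share of scores within 1, 2 and 3 standard deviations of the mean.

```python
from statistics import NormalDist

# Assumption: IQ scores follow a normal distribution with mean 100 and sd 16.
iq = NormalDist(mu=100, sigma=16)

# Share of scores within k standard deviations of the mean.
for k in (1, 2, 3):
    low, high = iq.mean - k * iq.stdev, iq.mean + k * iq.stdev
    share = iq.cdf(high) - iq.cdf(low)
    print(f"within {k} sd ({low:.0f}-{high:.0f}): {share:.1%}")

# In a normal distribution, mean, median and mode coincide.
print(iq.mean, iq.median, iq.mode)
```

The shares come out at roughly 68%, 95% and 99.7%.
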
15
Skewed distributions and measures of central tendency
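
A quick numerical illustration of how skew affects the three measures, using hypothetical right-skewed data in Python. For many positively skewed distributions the order is mode < median < mean, though this is a rule of thumb rather than a law.

```python
import statistics

# Hypothetical, positively (right-) skewed data: mostly small values,
# with a long tail of a few large ones.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

# The tail pulls the mean furthest, the median less, the mode not at all.
print(f"mode = {mode}, median = {median}, mean = {mean}")
```
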
16
Bimodal distributions

17
Normal distribution (Gaussian distribution)
  • Example: IQ scores, mean = 100, sd = 16

[Figure: normal curve; mean, median and mode coincide at the peak]

18
z-scores
  • z-score: the deviation of a given score from the
    mean, expressed in standard deviations
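
The definition amounts to z = (x - mean) / sd. A minimal Python sketch, reusing the IQ example (mean 100, sd 16); the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# IQ example: mean 100, sd 16.
print(z_score(132, 100, 16))  # 2.0: two standard deviations above the mean
print(z_score(92, 100, 16))   # -0.5: half a standard deviation below the mean
```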

19
How likely is a given event?
  • Example: time to utter a particular sentence:
    x̄ = 3.45 s and sd = 0.84 s
  • Questions:
  • What proportion of the population of utterance
    times will fall below 3 s?
  • What proportion would lie between 3 s and 4 s?
  • What is the time value below which we will find
    1% of the data?
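
Assuming utterance times are normally distributed with the mean and sd given above (an assumption the data would need to support), all three questions can be answered with Python's `statistics.NormalDist`: `cdf` gives the proportion below a value, and `inv_cdf` gives the value below which a given proportion lies.

```python
from statistics import NormalDist

# Assumption: utterance times are normal with mean 3.45 s and sd 0.84 s.
times = NormalDist(mu=3.45, sigma=0.84)

below_3 = times.cdf(3.0)                        # proportion below 3 s
between_3_and_4 = times.cdf(4.0) - times.cdf(3.0)
first_percentile = times.inv_cdf(0.01)          # time below which 1% of the data lie

print(f"below 3 s:        {below_3:.1%}")
print(f"between 3 and 4 s: {between_3_and_4:.1%}")
print(f"1% cut-off:       {first_percentile:.2f} s")
```

Under this assumption roughly 30% of utterances take less than 3 s, roughly 45% take between 3 and 4 s, and 1% take less than about 1.5 s.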