2DS00 - PowerPoint PPT Presentation

Slides: 51
Provided by: adibucc

1
2DS00
  • Statistics 1 for Chemical Engineering

2
Lecturers
  • Dr. A. Di Bucchianico
  • Department of Mathematics,
  • Statistics group
  • HG 9.24
  • phone (040) 247 2902
  • a.d.bucchianico_at_tue.nl
  • Ir. G.D. Mooiweer,
  • Department of Mathematics
  • ICTOO
  • HG 9.12
  • phone 040 247 4277 (Thursdays)
  • g.d.mooiweer_at_tue.nl
  • Dr. R.W. van der Hofstad
  • Department of Mathematics,
  • Statistics group
  • HG 9.04
  • phone (040) 247 2910
  • rhofstad_at_win.tue.nl

3
Goals of this course
  • to prepare students for (first-year) laboratory
    assignments
  • to teach students how to perform basic
    statistical analyses of experiments
  • to teach students how to use software for data
    analysis
  • to teach students how to avoid pitfalls in
    analysing measurements

4
Important to remember
  • Web site for this course: www.win.tue.nl/sandro/2DS00/
  • No textbook, but handouts (Word) and PowerPoint
    sheets through the web site
  • Bring notebook to both lectures and self-study
  • (Optional) buy lecture notes 2256 Statgraphics
    voor regulier onderwijs
  • (Optional) buy lecture notes 2218 Statistisch
    Compendium

5
How to study
  • read lecture notes briefly before lecture
  • ask questions during lecture
  • study lecture notes carefully after lecture
  • do exercises during guided self-study
  • reread lecture notes after guided self-study
  • try out previous examinations shortly before the
    examination
  • N.B. Lecture notes (pdf documents) ≠ PowerPoint
    files

6
Week schedule
  • Week 1: Measurement and statistics
  • Week 2: Error propagation
  • Week 3: Simple linear regression analysis
  • Week 4: Multiple linear regression analysis
  • Week 5: Nonlinear regression analysis

7
Detailed contents of week 1
  • measurement errors
  • graphical displays of data
  • summary statistics
  • normal distribution
  • confidence intervals
  • hypothesis testing

8
Measurements and statistics
  • perfect measurements do not exist
  • possible sources of measurement errors
      • reading
      • environment
          • temperature
          • humidity
          • ...
      • impurities
      • ...

9
Necessity of good measurement system
10
Three experiments
11
Types of measurement errors
  • Random errors
      • always present
      • reduce influence by averaging repeated
        measurements
  • Systematic errors
      • require adjustment/repair of measuring devices
  • Outliers
      • recording errors
      • mistakes in applying procedures
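
The difference between random and systematic errors can be illustrated with a small simulation. This is only a sketch: the true value, the noise level, and the bias below are made-up numbers, and the measurement noise is assumed to be normally distributed.

```python
import random
import statistics

def simulate_averages(n_repeats, bias=0.0, n_experiments=2000,
                      true_value=5.0, sigma=0.2, seed=1):
    """Simulate many experiments that each average n_repeats noisy measurements.

    Returns (mean of the averages, standard deviation of the averages).
    All numeric defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(true_value + bias, sigma)
                         for _ in range(n_repeats))
        for _ in range(n_experiments)
    ]
    return statistics.fmean(averages), statistics.stdev(averages)

for n in (1, 4, 16):
    centre, spread = simulate_averages(n, bias=0.3)
    print(f"n = {n:2d}: centre of averages {centre:.3f}, spread {spread:.3f}")
```

The spread of the averages shrinks as more measurements are averaged, while the offset caused by the bias stays the same. Averaging helps against random errors only.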

12
Illustration of measurement concepts
13
Accuracy
difference between average of measured values and
true value
14
Accuracy
  • relates to systematic errors
  • absolute error
  • relative error
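
As a small sketch of these two quantities: the absolute error is taken here as the average of the measured values minus the true value, and the relative error as the absolute error divided by the true value. The sign convention and the example numbers are assumptions, not taken from the slides.

```python
import statistics

def accuracy_errors(measurements, true_value):
    """Return (absolute error, relative error) of the average measurement."""
    average = statistics.fmean(measurements)
    absolute = average - true_value      # same unit as the measurements
    relative = absolute / true_value     # dimensionless; multiply by 100 for %
    return absolute, relative

# Hypothetical measurements of a quantity whose true value is 5.0
absolute, relative = accuracy_errors([5.1, 4.9, 5.3, 5.1], true_value=5.0)
print(f"absolute error {absolute:.3f}, relative error {relative:.1%}")
```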

15
Location statistics
  • mean
  • median
  • trimmed means
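
The three location statistics react very differently to an outlier. The sketch below uses made-up data in which the last value is a recording error; the trimming proportion of 10% per tail is an arbitrary choice.

```python
import statistics

def trimmed_mean(data, proportion=0.1):
    """Mean after dropping the given proportion of observations at each end."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)       # number dropped per tail
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)

# Hypothetical data; the last value (50.0) mimics a recording error
data = [4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 5.0, 5.1, 4.9, 50.0]

print("mean        ", statistics.fmean(data))
print("median      ", statistics.median(data))
print("trimmed mean", trimmed_mean(data, 0.1))
```

The mean is pulled far away from the bulk of the data by the single outlier, while the median and the trimmed mean stay close to 5.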

16
Precision
  • the degree to which consistent results are
    obtained

17
Accurate and precise
18
Statistics for precision
  • standard deviation
  • standard error
  • variation coefficient
  • variance
  • range
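
A minimal sketch of how these statistics relate to each other, using the sample (n − 1) versions of variance and standard deviation. The data are hypothetical.

```python
import math
import statistics

def precision_summary(data):
    """Common precision statistics for a list of repeated measurements."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)              # sample standard deviation
    return {
        "variance": statistics.variance(data),   # sd squared
        "standard deviation": sd,
        "standard error": sd / math.sqrt(n),     # precision of the mean
        "variation coefficient": sd / mean,      # sd relative to the mean
        "range": max(data) - min(data),
    }

# Hypothetical measurements
for name, value in precision_summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]).items():
    print(f"{name:22s} {value:.4f}")
```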

19
Robust statistics for precision
  • robust statistics
      • less sensitive to outliers
      • difficult mathematical theory
      • requires use of statistical software
  • interquartile range
      • IQR = 75% quantile − 25% quantile = 3rd quartile −
        1st quartile
  • mean absolute deviation
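
The following sketch compares the interquartile range with the standard deviation on data with and without an outlier. Quartiles can be computed by several slightly different conventions; the "inclusive" method of Python's statistics module is used here as one possible choice, and Statgraphics may use another. The data are hypothetical.

```python
import statistics

def interquartile_range(data):
    """IQR = 3rd quartile - 1st quartile (inclusive quantile convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

def mean_absolute_deviation(data):
    """Average absolute distance of the observations to their mean."""
    mean = statistics.fmean(data)
    return statistics.fmean(abs(x - mean) for x in data)

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # last value replaced by an outlier

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(label,
          "| IQR", interquartile_range(data),
          "| mean abs. dev.", round(mean_absolute_deviation(data), 2),
          "| st. dev.", round(statistics.stdev(data), 2))
```

The IQR is the same for both data sets, whereas the standard deviation changes drastically.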

20
Graphical displays
  • always make graphical displays for first
    impression
  • one picture says more than 1000 words

2 3.1 4 1.9 2.8
21
Basic graphical displays
  • scatter plot
      • watch out for scale (automatic resizing)
  • time sequence plot
      • for detecting time effects like warming up
  • Box-and-Whisker plot
      • outliers
      • quartiles
      • skewness
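
The numbers behind a Box-and-Whisker plot can be sketched as follows. The rule that flags points more than 1.5 × IQR outside the quartiles is a widespread convention and is assumed here; the slides do not state which rule Statgraphics applies. The data are hypothetical.

```python
import statistics

def box_summary(data):
    """Quartiles, whisker ends and flagged outliers for a box plot."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr      # assumed 1.5 x IQR convention
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

A median that lies much closer to one quartile than to the other, or whiskers of very unequal length, hints at skewness.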

22
Time sequence plot
23
Box-and-Whisker plot
24
(No Transcript)
25
Probability theory
  • (cumulative) distribution function
  • density
  • density to distribution function

26
The concept of probability density
Figure: density function. The area under the curve between a and b
denotes the probability that an observation falls between a and b.
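
The link between density and distribution function can be checked numerically. The sketch below integrates the standard normal density between two arbitrary points a and b and compares the result with the difference of the cumulative distribution function. The choice of the normal density and of a and b is only for illustration.

```python
import math
from statistics import NormalDist

def density(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_between(a, b, steps=10_000):
    """Trapezoidal approximation of the area under the density from a to b."""
    h = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    for i in range(1, steps):
        total += density(a + i * h)
    return total * h

a, b = -1.0, 2.0                      # arbitrary illustration values
via_density = area_between(a, b)
via_cdf = NormalDist().cdf(b) - NormalDist().cdf(a)
print(f"area under density: {via_density:.6f}")
print(f"F(b) - F(a):        {via_cdf:.6f}")
```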
27
Normal distribution
28
Normal distribution
  • bell-shaped curve
  • Important because of Central Limit Theorem
  • Normal distribution
      • symmetric around µ (location of centre)
      • spread parametrised by σ²
  • http://www.win.tue.nl/marko/statApplets/functionPlots.html
  • http://www-stat.stanford.edu/naras/jsm/NormalDensity/NormalDensity.html
  • µ = 0 and σ² = 1: standard normal distribution Z

29
More on normal distribution
  • Area between
  • µ ± 0.67σ is 0.500
  • µ ± 1.00σ is 0.683
  • µ ± 1.645σ is 0.900
  • µ ± 1.96σ is 0.950
  • µ ± 2.00σ is 0.954
  • µ ± 2.33σ is 0.980
  • µ ± 2.58σ is 0.990
  • µ ± 3.00σ is 0.997
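
Areas of this kind can be recomputed with the standard normal distribution function. The sketch below is one way to do so; the function name is made up for illustration.

```python
from statistics import NormalDist

def area_within(k):
    """Probability that a normal observation lies within k standard deviations of its mean."""
    z = NormalDist()                  # standard normal: mean 0, sd 1
    return z.cdf(k) - z.cdf(-k)

for k in (0.67, 1.00, 1.645, 1.96, 2.00, 2.33, 2.58, 3.00):
    print(f"within {k:5.3f} standard deviations: {area_within(k):.3f}")
```

Because of standardisation the result does not depend on the values of the mean and the standard deviation.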

30
Standardisation
  • if X is normally distributed with parameters µ and σ²,
    then (X − µ)/σ is standard normal
  • suppose µ = 3 and σ² = 4
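
A sketch of standardisation with mean 3 and variance 4 (so standard deviation 2). The point x = 5 at which the probability is evaluated is a hypothetical choice, since the transcript does not show which probability the slide computed.

```python
import math
from statistics import NormalDist

mu = 3.0
variance = 4.0
sigma = math.sqrt(variance)           # standard deviation 2

def prob_at_most(x):
    """P(X <= x) via standardisation: P(Z <= (x - mu) / sigma)."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)

x = 5.0                               # hypothetical value of interest
print(f"z = {(x - mu) / sigma:.2f}, P(X <= {x}) = {prob_at_most(x):.4f}")
```

The same answer is obtained directly from NormalDist(mu, sigma).cdf(x); standardisation is what makes a single table of the standard normal distribution sufficient.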

31
Testing normality
  • many statistical procedures implicitly assume
    normality
  • if data are not normally distributed, then
    outcome of procedure may be completely wrong
  • user is always responsible for checking
    assumptions of statistical procedures
  • Graphical checks
      • normal probability plot
      • density trace
  • Formal check
      • Shapiro-Wilks test

32
Estimation of density function: histogram
Curve: normal distribution with sample mean and
variance as parameters
33
Drawbacks of the histogram
  • misused for investigating normality
  • time ordering of data is lost
  • shape depends heavily on bin width and bin location

Figure: histograms for strength, drawn from the same data set
(horizontal axis: strength, vertical axis: frequency)
  • shape is stable for data sets of size 75 or
    larger
  • optimal number of bins ≈ √n
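
How strongly a histogram depends on the bin location can be seen from a tiny example. The data and the bin settings below are made up; the square-root rule for the number of bins is implemented as a simple rounding of √n, which is one possible reading of that rule.

```python
import math

def histogram(data, bin_width, start):
    """Count observations per bin [start + k*w, start + (k+1)*w)."""
    counts = {}
    for x in data:
        k = math.floor((x - start) / bin_width)
        left_edge = start + k * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

def suggested_bins(n):
    """Square-root rule of thumb for the number of bins."""
    return round(math.sqrt(n))

data = [24, 28, 29, 30, 34]                          # hypothetical strength values
print(histogram(data, bin_width=5, start=24))        # bins start at 24
print(histogram(data, bin_width=5, start=21.5))      # same width, shifted bins
print(suggested_bins(75))
```

The same five observations give a different shape once the bins are shifted, which is why small data sets should not be judged by a histogram alone.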

34
Alternative to histogram: Density Trace
  • Density Trace (also called naive density
    estimator)
  • use moving bins instead of fixed bins
  • choose bin width (automatically in Statgraphics)
  • count number of observations in bin at each
    point
  • divide by length of bin
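
The steps above can be sketched as follows. One assumption is added: the count is also divided by the number of observations, so that the total area under the trace equals 1. The data, the evaluation points, and the bin width are hypothetical, and Statgraphics may use a different weighting within the bin.

```python
def density_trace(data, points, bin_width):
    """Naive density estimate: a bin of fixed width is centred at each point."""
    n = len(data)
    half = bin_width / 2.0
    trace = []
    for x in points:
        # count the observations that fall in the bin centred at x
        count = sum(1 for xi in data if x - half <= xi < x + half)
        # divide by bin length (and by n, an added normalisation)
        trace.append(count / (n * bin_width))
    return trace

data = [1, 2, 2, 3]                                  # hypothetical data
points = [0.5 * i for i in range(0, 9)]              # 0.0, 0.5, ..., 4.0
for x, value in zip(points, density_trace(data, points, bin_width=2.0)):
    print(f"{x:3.1f}  {value:.3f}")
```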

35
Density Trace
  • Example dataset

Figure: density trace of the example dataset (vertical axis ticks 1/9 to
4/9, horizontal axis 1 to 6)

36
Choice of bin widths in density trace
  • a bin width that is too small yields a curve that fluctuates too much
  • a bin width that is too large yields a curve that is too smooth

37
Patterns in distribution: normal curve
  • Depicted by a bell-shaped curve
  • Indicates that measurement process is running
    normally

38
Patterns in distribution: bi-modal curve
  • Distribution appears to have two peaks
  • May indicate that data from more than one process
    are mixed together

39
Patterns in distribution: saw-toothed
  • Also commonly referred to as a comb distribution,
    appears as an alternating jagged pattern
  • Often indicates a measuring problem
      • improper gauge readings
      • gauge not sensitive enough for readings

40
Testing normality
41
Normal Probability Plot
42
Normally distributed?
43
Normal Probability Plot of not normally
distributed data
44
Test for Normality: Shapiro-Wilks
  • statistical test for Normality: Shapiro-Wilks
  • idea: sophisticated regression analysis in the
    spirit of normal probability plot
  • makes Normal Probability Plot objective
  • check outliers (measurement error? normality
    sometimes disturbed by single observation)
  • analyse if not normally distributed
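
The idea of turning a normal probability plot into a number can be sketched with a simple correlation between the ordered observations and the corresponding standard normal quantiles. This is not the Shapiro-Wilks W statistic itself, only a simplified relative of it; the plotting positions (i − 0.5)/n are one common choice and are an assumption here, as are the example data.

```python
import math
from statistics import NormalDist, fmean

def normal_plot_points(data):
    """Pairs (standard normal quantile, ordered observation) for a probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]

def plot_correlation(data):
    """Correlation of the plot points; close to 1 if the plot is nearly a straight line."""
    pairs = normal_plot_points(data)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, strongly right-skewed data
skewed = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.9, 1.5, 3.0, 9.0]
print(round(plot_correlation(skewed), 3))
```

A value clearly below 1 indicates curvature in the normal probability plot. A formal test additionally needs the distribution of the statistic under normality, which statistical software provides.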

45
Statgraphics: Shapiro-Wilks
Tests for Normality for width
Computed Chi-Square goodness-of-fit statistic = 254.667
P-Value = 0.0
Shapiro-Wilks W statistic = 0.921395
P-Value = 0.000722338
  • Interpretation
  • the value of the statistic itself need not be
    interpreted
  • P-value indicates how compatible the data are with a
    normal distribution (small P-value: evidence against normality)
  • use α = 0.01 as critical value in order to avoid
    rejecting normality too readily

46
Dixon's test
  • Box-and-Whisker plot: graphical test for outliers
  • if data are normally distributed, then a formal
    test (Dixon's test) may be used
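
A minimal sketch of the simplest form of Dixon's statistic: the gap between a suspect extreme value and its nearest neighbour, divided by the range. Other ratios are commonly used for larger samples, and the slides do not say which variant Statgraphics computes, so this is an assumption. The statistic must be compared with a tabulated critical value for the sample size, which is not reproduced here. The data are hypothetical.

```python
def dixon_q(data):
    """Return (Q for the smallest value, Q for the largest value)."""
    ordered = sorted(data)
    data_range = ordered[-1] - ordered[0]
    if len(ordered) < 3 or data_range == 0:
        raise ValueError("need at least 3 observations that are not all equal")
    q_low = (ordered[1] - ordered[0]) / data_range      # gap at the lower end
    q_high = (ordered[-1] - ordered[-2]) / data_range   # gap at the upper end
    return q_low, q_high

q_low, q_high = dixon_q([4.9, 5.0, 5.1, 5.0, 6.4])      # hypothetical measurements
print(f"Q (smallest) = {q_low:.3f}, Q (largest) = {q_high:.3f}")
```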

47
Disadvantages of point estimators
48
Confidence intervals
  • 95% confidence interval for µ: probability 0.95
    that the interval contains the true value µ
  • more observations → narrower interval (effect in
    particular for n < 20)
  • higher confidence → wider interval
  • example: α = 0.05
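
The effect of sample size and confidence level on the interval width can be sketched as follows. For simplicity the standard deviation is treated as known, so that normal quantiles can be used; with an estimated standard deviation the quantile of Student's t distribution with n − 1 degrees of freedom should be used instead, which widens the interval noticeably for small n. All numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean, assuming a known standard deviation sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # e.g. about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

for n in (4, 16, 64):
    low, high = z_interval(5.0, sigma=0.2, n=n)
    print(f"n = {n:2d}, 95%: ({low:.3f}, {high:.3f})")

low, high = z_interval(5.0, sigma=0.2, n=16, confidence=0.99)
print(f"n = 16, 99%: ({low:.3f}, {high:.3f})")
```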

49
Confidence intervals: example
50
Hypothesis testing
  • example: test whether there is a systematic
    error

Hypothesis Tests for meting
Sample mean = 4.994
Sample median = 5.01
t-test
------
Null hypothesis: mean = 5.0
Alternative: not equal
Computed t statistic = -0.155011
P-Value = 0.880233
Do not reject the null hypothesis for alpha = 0.05.
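
The t statistic in such output is the difference between the sample mean and the hypothesised mean, divided by the standard error. The sketch below computes it for made-up data; these are not the data behind the output above, which the transcript does not include. The critical value 2.262 is the tabulated two-sided 5% value of Student's t distribution with 9 degrees of freedom and applies only because the example has 10 observations.

```python
import math
import statistics

def t_statistic(data, hypothesised_mean):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(data)
    mean = statistics.fmean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    return (mean - hypothesised_mean) / standard_error

# Hypothetical measurements of a reference value of 5.0
data = [4.98, 5.02, 5.01, 4.97, 5.03, 5.00, 4.99, 5.01, 4.96, 5.02]
t = t_statistic(data, hypothesised_mean=5.0)

T_CRITICAL = 2.262    # t table, 9 degrees of freedom, two-sided alpha = 0.05
if abs(t) > T_CRITICAL:
    print(f"t = {t:.3f}: reject the null hypothesis (systematic error likely)")
else:
    print(f"t = {t:.3f}: do not reject the null hypothesis")
```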