OA3103: Data Analysis

1
OA3103 Data Analysis
  • Spring 2002
  • Prof. Buttrey, Glasgow 290
  • buttrey_at_nps.navy.mil x3035
  • http://web.nps.navy.mil/buttrey

2
Before We Begin...
  • Labs, accounts, book, S-Plus (2000?), R
  • The command line is your friend
  • Moving the labs?
  • Mid-term and final projects
  • Three rules:
  • Try to show up on time
  • It might not be funny
  • No food or beverages in the computer room

3
Review of Univariate Statistics
  • Center: mean(x), median(x)
  • Spread: sqrt(var(x)), diff(quantile(x,
    c(.25, .75)))
  • Shape: hist(x), density(x), boxplot(x);
    (two-d) boxplot(list(x, y))
  • QQ plots: qqnorm(x); (two-d) qqplot(x, y)
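
The center and spread measures above can be sketched outside S-Plus as well. The following is a minimal Python illustration using only the standard library; the function name `univariate_summary` and the sample values are made up for this example. The "inclusive" quantile method is chosen on the assumption that it lines up with the default interpolation rule of R's quantile().

```python
import statistics

def univariate_summary(x):
    """Return simple center and spread measures for a list of numbers."""
    # Quartiles with linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return {
        "mean": statistics.mean(x),
        "median": statistics.median(x),
        "sd": statistics.stdev(x),  # sqrt(var(x)), using the n - 1 divisor
        "iqr": q3 - q1,             # diff(quantile(x, c(.25, .75)))
    }

sample = [2, 4, 4, 5, 7, 9, 11]  # made-up numbers, for illustration only
summary = univariate_summary(sample)
print(summary)
```

The two spread measures answer different questions: the standard deviation reacts strongly to a single extreme value, while the interquartile range ignores the tails entirely.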

4
S-Plus Interlude
  • Vector- and scalar-valued functions
  • List handling
  • Functions with side effects (graphics)
  • Handling NAs
  • Read the stuff on my home page

5
Normal QQ Plot
  • Suppose your data set has 20 points
  • The smallest point is the 2.5th percentile, the 2nd
    smallest is the 7.5th percentile, ..., the 20th
    (largest) is the 97.5th percentile
  • (Definitions of percentiles differ)
  • The 2.5th percentile of N(0, 1) is -1.96; the
    7.5th is -1.44; the 97.5th is 1.96
  • Plot your data, sorted, against -1.96, -1.44, ...,
    1.96 and look for a straight line
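
The construction above can be sketched in a few lines. This is a Python illustration rather than the S-Plus qqnorm() itself: it assumes the (i - 0.5)/n definition of percentiles used on this slide, and the function name `normal_qq_points` and the placeholder data are invented for the example.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each sorted data value with the matching N(0, 1) quantile."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n: 0.025, 0.075, ..., 0.975 when n = 20.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))

data = [float(v) for v in range(20, 0, -1)]  # placeholder data: 20 made-up values
points = normal_qq_points(data)

# Show the first two pairs and the last one.
for q, v in points[:2] + points[-1:]:
    print(f"{q:6.2f}  {v}")
```

Plotting the second element of each pair against the first gives the normal QQ plot; other percentile definitions shift the theoretical quantiles slightly but rarely change the picture.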

6
QQ-plot Example
7
Transformations
  • Symmetry is often good
  • Approximate Normality is often good
  • The Ladder of Transformations
  • x^3, x^2, x^1, log(x), 1/x, 1/x^2, ...,
    especially log
  • Think about the justification
  • Right tails are more common than left

8
Example: prim9[,4]
9
Symmetry Plots
  • Plot (distance from the median for the first point)
    against (distance for the last),
  • (distance for the second point)
    against (distance for the 2nd-last), etc.
  • For a symmetric distribution, these should all fall
    near a straight line

10
S-Plus Interlude II
  • Symmetry Plot
  • Sort the x's; get x - median(x); deal with the
    even/odd thing
  • The first half are negative: change their signs
  • Now plot the first half (reversed) against the
    second half. Pass additional arguments to plot()
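
The steps above can be sketched as follows. This is a Python illustration of the recipe, not the S-Plus function from the lab; the name `symmetry_pairs` is invented here, and the even/odd issue is handled by simply dropping the middle point when n is odd, which is one reasonable choice among several.

```python
import statistics

def symmetry_pairs(xs):
    """Return (distance above median, distance below median) pairs.

    Pairs run from the points nearest the median out to the extremes.
    With an odd number of points the middle one (the median) is dropped.
    """
    s = sorted(xs)
    med = statistics.median(s)
    half = len(s) // 2
    # Lower half: x - median is negative, so flip the sign.
    lower = [med - v for v in s[:half]]
    # Upper half: already non-negative distances.
    upper = [v - med for v in s[len(s) - half:]]
    # Reverse the lower half so it also runs from near the median outward.
    lower.reverse()
    return list(zip(upper, lower))

symmetric = symmetry_pairs([1, 2, 3, 4, 5])
skewed = symmetry_pairs([1, 2, 3, 4, 10])
print(symmetric)  # every pair has equal distances
print(skewed)     # the outermost pair has a much larger upper distance
```

For a symmetric sample every pair lies on the line y = x; a long right tail pulls the outer pairs toward larger upper distances.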

11
Example: prim9, cont'd.
12
What Should It Look Like?
13
Moderate-size samples
14
Nine Plot
  • Measures how weird your picture is under the
    null hypothesis
  • Nine pictures: yours in the middle, eight from
    the null all around
  • If yours is the weirdest, there's a p-value of
    about .11 (1 in 9)
  • Easy enough in S-Plus

15
S-Plus Interlude
  • Use par(mfrow = c(3,3)) to divide the screen into 9
    pieces (see also split.screen())
  • Simple way: use a for() loop for pictures 1-4, then
    draw yours, then another loop
  • Examples (hist, qqnorm)
  • Hard way: a function that takes a function, data,
    args, and a random number generator
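
The "simple way" can be sketched without any graphics by building the nine samples in drawing order. This is a Python illustration under stated assumptions: the null hypothesis is taken to be N(0, 1), and the names `nine_panels` and `standard_normal`, the seed, and the data values are all invented for the example.

```python
import random

def nine_panels(data, draw_null, rng):
    """Return 9 samples: 4 from the null, then the real data, then 4 more."""
    n = len(data)
    panels = []
    for _ in range(4):                 # pictures 1-4
        panels.append([draw_null(rng) for _ in range(n)])
    panels.append(list(data))          # picture 5: yours, in the middle
    for _ in range(4):                 # pictures 6-9
        panels.append([draw_null(rng) for _ in range(n)])
    return panels

def standard_normal(rng):
    """Hypothetical null distribution: independent N(0, 1) draws."""
    return rng.gauss(0.0, 1.0)

rng = random.Random(2002)              # fixed seed so the sketch is repeatable
data = [0.1, 0.4, 0.5, 0.9, 1.3, 2.2, 3.8, 6.1]  # made-up, right-skewed values
panels = nine_panels(data, standard_normal, rng)

# Row-major 3x3 layout, the same order in which par(mfrow = c(3, 3)) fills the screen.
grid = [panels[0:3], panels[3:6], panels[6:9]]
```

Each entry of `grid` would then be handed to the same plotting function (a histogram or a normal QQ plot, say). Passing the null generator as an argument is a small step toward the "hard way" described above.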

16
Four Views
  • Hamilton: a plot that shows a histogram, boxplot,
    symmetry plot, and qqnorm plot
  • Easy to construct in S-Plus; we can pass a power
    argument, too
  • We know how to add a Normal curve to a histogram
  • qqline() adds a line to a qqnorm plot

17
S-Plus Interlude
  • Try the v4() function on some known
    distributions: t, logistic, uniform, exp
  • Now try it on some real data, like prim9
  • Sometimes nothing works (e.g. col. 7)
  • Again: think about the justification!

18
Two-dimensional Data
  • (y1, x1), (y2, x2), ..., (yn, xn)
  • First rule: draw pictures
  • Primary: scatterplot (plot(x, y))
  • Example: beer data
  • Are higher calories associated with higher
    alcohol levels?
  • Different question: does alcohol cause calories?
19
Linear Regression
  • We have a bunch of (xi, yi) pairs
  • Maybe they are a sample from some bivariate
    distribution
  • Maybe the x's are fixed as part of an experiment,
    and the y's are measured with error -- another
    measurement, with the same x, might have produced
    a different y.

20
Scatterplot Example
21
Linear Regression
  • Assume E(Y|X) is linear; then it has the form
    E(Y|X) = β0 + β1 x
  • (This is a big assumption!)
  • β0 is the true intercept; β1 is the true slope
  • We want to estimate the β's by some pair of
    numbers b0 and b1.
  • Regression is the modeling of E(Y|X)

22
Modeling E(Y|X)
  • How to fit a line to a scatterplot?
  • Answer no. 1: the method of least squares
  • Every possible combination of b0 and b1 describes
    a line
  • Every line describes predictions of E(Y|X)

23
Modeling E(Y|X)
  • Pick the line for which the predictions are least
    wrong
  • Measure wrongness by the sum of squared
    differences between observed y's and predicted
    y's (residuals)
  • Smallest sum of squares = best line
  • Other measures possible
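
The idea of scoring every candidate line by its sum of squared residuals, then picking the smallest, can be sketched directly. This is a Python illustration, not the S-Plus fitting function: `sse` and `least_squares` are names invented here, the closed-form slope and intercept are the standard simple-regression formulas, and the data are made up.

```python
def sse(b0, b1, x, y):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Closed-form intercept and slope that minimize sse()."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
best = sse(b0, b1, x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {best:.4f}")

# Any other line does worse on this measure of wrongness.
print(sse(b0 + 0.1, b1, x, y) > best, sse(b0, b1 - 0.1, x, y) > best)
```

Swapping the squared differences in `sse` for absolute differences gives one of the "other measures"; that version has no simple closed form, which is part of why least squares is answer no. 1.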

24
Not a good model