Exploring Data - PowerPoint PPT Presentation

About This Presentation
Title:

Exploring Data

Description:

Side-by-side boxplots comparing gas mileages of minicompact and two-seater cars ... shows the relationship between two numerical variables measured on the same ... – PowerPoint PPT presentation

Slides: 26
Provided by: awc7

Transcript and Presenter's Notes

1
Exploring Data
  • The description of data is an important link
    between its collection and its interpretation.
  • Exploratory data analysis combines numerical
    summaries with graphical displays in an attempt to
    discern patterns in data.
  • Data are often values of a numerical variable.
    The pattern of these values is called a
    distribution.

2
Data Analysis
  • Individuals: objects described by a set of data
  • Variable: any measured characteristic of an
    individual
  • Explanatory variable: causes/reasons
  • Response variable: effects/consequences
  • The organizing principles:
  • Examine individual variables, then look for
    relationships between several variables
  • Draw a graph and add to it numerical summaries
  • Look first for an overall pattern and then for
    significant deviations from the pattern

3
Numerical Summary
  • Measures of center
  • Mean: the arithmetic average
  • Median: the midpoint of a set of observations
  • Measures of spread or variability
  • Range: the difference between the highest and
    lowest observations
  • Quartiles: the first quartile is the midpoint of
    all data below the median. The third quartile is
    the midpoint of all data above the median.
  • Standard deviation: the square root of the
    variance. It measures the spread of the data
    around the mean in the same units of measurement
    as the original data set.
  • Variance: the average of the squared differences
    between the individual observations and their mean

4
Example
  • For the following data set:
  • 85, 74, 64, 92, 76, 85, 32, 67, 71, and 59
  • Find the mean, median, quartiles, range, variance,
    and standard deviation.
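
One way to work this example is sketched below in Python, using only the standard library. The quartiles follow the rule from the Numerical Summary slide (midpoint of the data below and above the median). The variance uses the sample convention of dividing by n - 1, which is an assumption, since the slides say only "average". The helper name `quartiles` is illustrative.

```python
import statistics

data = [85, 74, 64, 92, 76, 85, 32, 67, 71, 59]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]  # leaves out the middle value when n is odd
    return statistics.median(lower), statistics.median(upper)

mean = statistics.mean(data)          # 70.5
median = statistics.median(data)      # 72.5
q1, q3 = quartiles(data)              # 64 and 85
data_range = max(data) - min(data)    # 92 - 32 = 60
variance = statistics.variance(data)  # sample variance (n - 1), about 288.28
std_dev = statistics.stdev(data)      # square root of the variance, about 16.98

print(mean, median, q1, q3, data_range, round(variance, 2), round(std_dev, 2))
```

If the population convention (dividing by n) is wanted instead, `statistics.pvariance` and `statistics.pstdev` are the corresponding calls.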

5
Standard Deviation
  • The standard deviation measures the typical
    distance of the observations from the mean. A
    small standard deviation means that the
    observations are clustered about the mean. A
    large standard deviation means that the
    observations are scattered far from the mean.

6
Standard Deviation
  • The variance of a set of data is the average of
    the squares of their deviations from the mean.
  • The standard deviation s is the square root of
    the variance.
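
In symbols, for observations x_1, ..., x_n with mean x-bar, the two definitions can be written as follows. This assumes the usual sample convention of dividing by n - 1, which the slide text (it says only "average") does not spell out.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}
```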

7
Standard Deviation
  • s measures spread about the mean, so it should be
    used only when the mean, not the median, is the
    measure of center.
  • s = 0 only if there is no spread. Otherwise s > 0.
    The larger s is, the greater the spread.
  • s has the same units as the observations.
  • s is strongly influenced by a few extreme
    observations.
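
A small check of two of these properties, reusing the data set from the Example slide. The constant list and the choice to drop the low value 32 are made up for illustration.

```python
import statistics

data = [85, 74, 64, 92, 76, 85, 32, 67, 71, 59]

# Drop the one unusually low observation to see how much it moves s.
without_low = [x for x in data if x != 32]

s_all = statistics.stdev(data)
s_trimmed = statistics.stdev(without_low)

# Identical observations have no spread, so s is zero.
s_constant = statistics.stdev([70, 70, 70, 70])

print(s_all, s_trimmed, s_constant)
```

Removing a single extreme observation lowers s noticeably, which is the sensitivity the last bullet describes.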

8
Visual Summary of Data
  • Histogram: bar graph
  • Stemplot
  • Boxplot
  • Scatterplot: points plotted on the xy-plane

9
Histograms
  • A histogram is a common graph of the distribution
    of outcomes (often divided into classes) for a
    single variable. The height of each bar is the
    number of observations in the class of outcomes
    covered by the base of the bar.
  • How to make a histogram:
  • Divide the range of data into classes of equal
    width
  • Count the number of individuals in each class
  • Draw the histogram
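
The three steps can be sketched in Python as below, again with the data from the Example slide. The class width of 10 is an arbitrary choice for illustration, and the bars are drawn as rows of asterisks rather than with a plotting library.

```python
data = [85, 74, 64, 92, 76, 85, 32, 67, 71, 59]

# Step 1: divide the range of the data into classes of equal width.
width = 10
start = (min(data) // width) * width             # lower edge of the first class
n_classes = (max(data) - start) // width + 1

# Step 2: count the number of individuals in each class.
counts = [0] * n_classes
for x in data:
    counts[(x - start) // width] += 1

# Step 3: draw the histogram, one row of asterisks per class.
for i, count in enumerate(counts):
    low = start + i * width
    print(f"{low}-{low + width - 1}: {'*' * count}")
```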

10
(No Transcript)
11
(No Transcript)
12
Interpreting Histograms
  • Important features:
  • Overall pattern: shape, center, and spread
  • Deviation: outliers
  • Symmetric distribution: a distribution with a
    histogram or stemplot in which the left and
    right sides are approximately mirror images of
    each other
  • A distribution is skewed to the right if the
    right side extends much farther out than the left
    side
  • A distribution is skewed to the left if the left
    side extends much farther out than the right side

13
(No Transcript)
14
Stemplot
  • A good way to represent small data sets
  • Looks like a histogram turned sideways
  • Easy to make
  • Quicker to create than a histogram and gives more
    detailed information
  • Preserves the actual values of the data

15
Stemplot
  • To make a stemplot:
  • Separate each observation into a stem, consisting
    of all but the final digit, and a leaf, the final
    digit. Stems may have as many digits as needed,
    but each leaf contains only a single digit.
  • Write the stems in a vertical column with the
    smallest at the top and draw a vertical line to
    the right of this column.
  • Write each leaf in the row to the right of its
    stem, in increasing order out from the stem.
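
A minimal sketch of these steps in Python, assuming non-negative whole-number observations. The function name `stemplot` is illustrative, and the data are taken from the Example slide.

```python
def stemplot(values):
    """Return the rows of a stemplot for non-negative integers."""
    ordered = sorted(values)
    # Split each observation into a stem (all but the final digit)
    # and a leaf (the final digit).
    stems = {}
    for x in ordered:
        stems.setdefault(x // 10, []).append(x % 10)
    # List every stem from smallest to largest, including empty ones,
    # with the leaves in increasing order to the right of the bar.
    rows = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = "".join(f" {leaf}" for leaf in stems.get(stem, []))
        rows.append(f"{stem} |{leaves}")
    return rows

rows = stemplot([85, 74, 64, 92, 76, 85, 32, 67, 71, 59])
print("\n".join(rows))
```

Keeping the empty stem 4 in the output preserves the gap between 32 and the rest of the data, so the low value stands out as it would in a histogram.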

16
Boxplot
  • Five-number summary:
  • Minimum, first quartile, median, third quartile,
    maximum
  • (Min, Q1, M, Q3, Max)
  • Boxplot: a visual display of the five-number
    summary
  • A central box spans the quartiles
  • A line in the box marks the median
  • Lines extend from the box out to the smallest and
    largest observations
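
As a rough illustration, the five-number summary that a boxplot displays can be computed as below. The quartile rule is the one given on the Numerical Summary slide, the function name is illustrative, and the data come from the Example slide.

```python
import statistics

def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max), with quartiles as medians of the halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])   # midpoint of data below the median
    q3 = statistics.median(ordered[-half:])  # midpoint of data above the median
    return (ordered[0], q1, statistics.median(ordered), q3, ordered[-1])

summary = five_number_summary([85, 74, 64, 92, 76, 85, 32, 67, 71, 59])
print(summary)  # (32, 64, 72.5, 85, 92)
```

In a boxplot of these data the box would run from 64 to 85 with a line at 72.5, and the lines would extend out to 32 and 92.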

17
Side-by-side boxplots comparing gas mileages of
minicompact and two-seater cars
18
(No Transcript)
19
Mean and Standard Deviation vs Five-number
Summary
  • The five-number summary is usually better than
    the mean and standard deviation for describing a
    skewed distribution or a distribution with
    outliers.
  • Use the mean and standard deviation for
    reasonably symmetric distributions free of
    outliers.

20
Scatterplot
  • A useful device for examining the relationship
    between two variables
  • Consists of points in the plane. These points
    represent pairs of values for the variables in
    question, one variable being plotted along the
    x-axis and the other along the y-axis.

21
Scatterplot
  • A scatterplot shows the relationship between two
    numerical variables measured on the same
    individuals.
  • The values of one variable are shown on the
    horizontal axis and the values of the other
    variable are shown on the vertical axis.
  • Each point corresponds to coordinates on both
    axes.
  • You can describe the overall pattern of a
    scatterplot by the form (what shape is it?),
    direction (up? down?), and strength (do the
    points lie close to the shape or are they
    scattered about it?) of the relationship.

22
Linear Relation
  • Often, a scatterplot suggests a linear relation
    between two variables
  • The points look as though they may be scattered
    about a line in the plane. Such a line is called
    a regression line.
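
The slides do not say how the regression line is chosen. A common choice is the least-squares line, and the sketch below assumes that method. The function name `least_squares_line` and the (x, y) values are hypothetical, made up only to show the calculation.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of products of deviations over sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

With real data the points would only scatter about the line, and the fitted slope and intercept would describe the direction and position of that overall pattern.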

23
(No Transcript)
24
(No Transcript)
25
Homework assignment
  • 3-9 odd, 13, 15, 19, 21, 23, 37, 39, 41