Title: Exploratory Data Analysis (EDA)
Section 3-5
- Exploratory Data Analysis (EDA)
EXPLORATORY DATA ANALYSIS
Exploratory data analysis (EDA) is the process of
using statistical tools (such as graphs, measures
of center, and measures of variation) to
investigate data sets in order to understand
their important characteristics.
OUTLIERS
- An outlier is a value that is located very far
away from almost all of the other values.
- An outlier is also known as an extreme value.
- Outliers can have a dramatic effect on the mean,
on the standard deviation, and on the scale of the
histogram, so that the true nature of the
distribution can be totally obscured.
- To find outliers, examine a sorted list of data
and look for values that are far from most other
values.
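A small sketch of the effect described above, using a hypothetical list of waiting times (the values are invented for illustration and are not the bank data used later). Appending one extreme value more than doubles the mean and inflates the standard deviation more than tenfold, while the median moves only from 5.5 to 6. Sorting the list puts the outlier at one end, where it is easy to see.

```python
from statistics import mean, median, stdev

# Hypothetical waiting times in minutes (illustrative only, not real bank data).
waits = [4, 5, 5, 6, 7, 9]
# The same data with one extreme value appended.
waits_with_outlier = waits + [60]

for label, data in [("without outlier", waits), ("with outlier", waits_with_outlier)]:
    print(f"{label:>16}: mean={mean(data):6.2f}  "
          f"median={median(data):5.2f}  stdev={stdev(data):6.2f}")

# Sorting puts the extreme value at the end of the list, where it stands out.
print(sorted(waits_with_outlier))
```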
5-NUMBER SUMMARY
For a set of data, the 5-number summary consists
of
- the minimum value
- the first quartile, Q1
- the median (or second quartile, Q2)
- the third quartile, Q3, and
- the maximum value.
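The five values above can be computed in a few lines of Python. This is a sketch under one assumption worth stating loudly: the quartiles come from the standard library's `statistics.quantiles` with its default `exclusive` method, and textbooks, calculators, and other software use different quartile conventions, so Q1 and Q3 may differ slightly from a by-hand answer. The function name `five_number_summary` is our own. The demo uses the Vita Needle Company ages from the example at the end of this section.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: quartiles use statistics.quantiles' default 'exclusive'
    method; other conventions can give slightly different Q1 and Q3.
    """
    q1, q2, q3 = quantiles(data, n=4)  # three cut points split the data into quarters
    return min(data), q1, q2, q3, max(data)

# Ages of Vita Needle Company employees (from the example at the end of this section).
ages = [76, 45, 72, 77, 63, 87, 73, 84, 86,
        79, 86, 75, 87, 74, 39, 75, 41, 82,
        34, 88, 85, 79, 73, 53, 65]

print(five_number_summary(ages))
```

For these 25 ages it prints `(34, 64.0, 75.0, 84.5, 88)`. A convention that instead takes the 7th and 19th sorted values as the quartiles would report Q1 = 65 and Q3 = 84, which is why it helps to know which rule a calculator or textbook uses.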
EXAMPLE
Find the 5-number summary for Bank of Providence
waiting times.
BOXPLOTS (BOX-AND-WHISKER DIAGRAMS)
Boxplots are good for revealing
1. the center of the data
2. the spread of the data
3. the distribution of the data
4. the presence of outliers
Boxplots are also excellent for comparing two or
more data sets.
CONSTRUCTING A BOXPLOT
- Find the 5-number summary.
- Construct a scale with values that include the
minimum and maximum data values.
- Construct a box (rectangle) extending from Q1 to
Q3, and draw a line in the box at the median
value.
- Draw lines extending outward from the box to the
minimum and maximum data values.
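The four construction steps can be sketched as a text-only boxplot. This is an illustration under our own assumptions, not a standard tool: the function name `text_boxplot`, the characters used (`|` for whisker ends and the median, `[` and `]` for the box edges, `-` for whiskers), and the use of `statistics.quantiles` (default `exclusive` method) for Q1 and Q3 are all our choices, so a calculator or plotting library may place the quartiles slightly differently.

```python
from statistics import quantiles

def text_boxplot(data, lo, hi, width=61):
    """Return (plot_line, scale_line) for a simple text boxplot.

    Steps: find the 5-number summary, use a scale from lo to hi that
    includes the minimum and maximum, draw a box from Q1 to Q3 with a
    line at the median, and draw whiskers out to the minimum and maximum.
    """
    q1, med, q3 = quantiles(data, n=4)   # 'exclusive' method by default
    low, high = min(data), max(data)
    if lo > low or hi < high:
        raise ValueError("scale must include the minimum and maximum data values")

    def col(value):
        # Position of a data value along a scale of `width` characters.
        return round((value - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(low), col(high) + 1):
        row[c] = "-"                       # whiskers span min to max (box drawn over the middle next)
    for c in range(col(q1), col(q3) + 1):
        row[c] = " "                       # clear the interior of the box
    row[col(q1)], row[col(q3)] = "[", "]"  # box edges at Q1 and Q3
    row[col(med)] = "|"                    # line in the box at the median
    row[col(low)] = row[col(high)] = "|"   # whisker ends at min and max

    # Scale line labelling both ends of the axis.
    scale = f"{lo:<{width // 2}}{hi:>{width - width // 2}}"
    return "".join(row), scale

# Vita Needle Company ages from the example at the end of this section.
ages = [76, 45, 72, 77, 63, 87, 73, 84, 86,
        79, 86, 75, 87, 74, 39, 75, 41, 82,
        34, 88, 85, 79, 73, 53, 65]
plot, scale = text_boxplot(ages, lo=30, hi=90)
print(plot)
print(scale)
```

With a scale from 30 to 90 and 61 columns, each character stands for one year of age, so the long left whisker (down to 34) is visible at a glance next to the short right one (up to 88).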
AN EXAMPLE OF A BOXPLOT
DRAWING A BOXPLOT ON THE TI-83/84
- Press STAT and select 1:Edit.
- Enter your data values in L1. (Note: You could
enter them in a different list.)
- Press 2ND, Y= (for STAT PLOT). Select 1:Plot1.
- Turn the plot ON. For Type, select the boxplot
(the middle one in the second row).
- For Xlist, put L1 by pressing 2ND, 1.
- For Freq, enter the number 1.
- Press ZOOM. Select 9:ZoomStat.
EXAMPLE
Use boxplots to compare the waiting times at
Jefferson Valley Bank and the Bank of Providence.
Interpret your results.
BOXPLOTS AND DISTRIBUTIONS
BOXPLOTS AND DISTRIBUTIONS (CONTINUED)
BOXPLOTS AND DISTRIBUTIONS (CONCLUDED)
EXPLORING
- Measures of Center: mean, median, and mode
- Measures of Variation: standard deviation and
range
- Measures of Dispersion: minimum value, maximum
value, and quartiles
- Unusual Values: outliers
- Distribution: histograms, stem-and-leaf plots, and
boxplots
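The numerical measures listed above can be gathered in one pass with Python's `statistics` module (Python 3.8 or later for `multimode` and `quantiles`). This is a minimal sketch: the helper name `explore` and the small data list are hypothetical, and quartiles again use the default `exclusive` method. Outliers and the shape of the distribution still call for a sorted list or a graph rather than a single number.

```python
from statistics import mean, median, multimode, quantiles, stdev

def explore(data):
    """Collect measures of center, variation, and dispersion for one data set."""
    return {
        # Measures of center
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),            # every value tied for most frequent
        # Measures of variation
        "stdev": stdev(data),               # sample standard deviation
        "range": max(data) - min(data),
        # Measures of dispersion
        "min": min(data),
        "max": max(data),
        "quartiles": quantiles(data, n=4),  # [Q1, Q2, Q3], 'exclusive' method
    }

# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
for name, value in explore([2, 4, 4, 6, 9]).items():
    print(f"{name:>9}: {value}")
```

For `[2, 4, 4, 6, 9]` the mean is 5, the median and the mode are both 4, the range is 7, the sample standard deviation is the square root of 7, and the quartiles are `[3.0, 4.0, 7.5]`.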
EXAMPLE
Explore the data below, which show the ages of
most employees at the Vita Needle Company.
76 45 72 77 63 87 73 84 86
79 86 75 87 74 39 75 41 82
34 88 85 79 73 53 65
(Based on data from "Where Retirement Became a
Dirty Word" by Julie Flaherty, New York Times.)
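One way to start exploring these ages is a stem-and-leaf plot next to the mean and median. This sketch is ours, not part of the original example: `stem_and_leaf` is a hypothetical helper that assumes non-negative whole numbers, using the tens digit as the stem and the units digit as the leaf.

```python
from statistics import mean, median

# Ages of Vita Needle Company employees, from the example above.
ages = [76, 45, 72, 77, 63, 87, 73, 84, 86,
        79, 86, 75, 87, 74, 39, 75, 41, 82,
        34, 88, 85, 79, 73, 53, 65]

def stem_and_leaf(data):
    """Return stem-and-leaf lines, assuming non-negative whole numbers
    (stem = tens digit, leaf = units digit)."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

print("\n".join(stem_and_leaf(ages)))
print(f"n = {len(ages)}, mean = {mean(ages):.2f}, median = {median(ages)}")
```

The leaves pile up in the 70s and 80s, with a thinner tail stretching down to 34. The mean (71.12) sits below the median (75), which is consistent with the few younger employees pulling the mean down.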