Title: Describing Data
1. Describing Data
2. Lecture Objectives
- You should be able to:
- Define Basic Terms
- Recognize Types of Data and Data Scales
- Draw appropriate graphs based on the type of data and the type of analysis desired
- Interpret the graphs
3. Basic Terms
- Data, Information, and Knowledge
- Populations and Samples
- Variables and Observations
- Types of Data
- Categorical and Numerical
- Cross Sectional and Time Ordered
4. Data, Information, and Knowledge
Knowledge
Information
- Processing
- Analysis
- Reports
- Application
- Meaning
- Relevance
5. Populations and Samples
- Population: the collection of all possible entities of interest; described by parameters.
- Sample: a subset of that collection; described by statistics.
- Statistical inference: the art and science of using samples to draw conclusions about populations.
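The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of integers 1 to 1000, the sample size of 100, and the seed are all assumptions chosen for the example, not values from the lecture.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 stand in for
# "all possible entities of interest".
population = list(range(1, 1001))

# Parameter: a number computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same number computed from a sample only.
# The seed is fixed so the draw is repeatable.
rng = random.Random(0)
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean}")
print(f"Sample mean (statistic):     {sample_mean}")
```

The sample mean will usually differ somewhat from the population mean; statistical inference is about reasoning from the first to the second.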
6. Variables and Observations
Each column is a variable, each row is an observation, and each cell is a measurement.
Entity    Height (inches)  Weight (pounds)  Age (years)  Sex (category)
Person 1  67               170              33           Male
Person 2  61               120              38           Female
Person 3  72               220              62           Male
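One possible way to hold this table in code is a list of records, one per observation, with one key per variable. The key names below (`height_in`, `weight_lb`, and so on) are made up for this sketch.

```python
# Each dict is one observation (row); each key is one variable (column).
observations = [
    {"entity": "Person 1", "height_in": 67, "weight_lb": 170, "age_yr": 33, "sex": "Male"},
    {"entity": "Person 2", "height_in": 61, "weight_lb": 120, "age_yr": 38, "sex": "Female"},
    {"entity": "Person 3", "height_in": 72, "weight_lb": 220, "age_yr": 62, "sex": "Male"},
]

# The variables are every key except the entity label.
variables = [key for key in observations[0] if key != "entity"]

# Pulling one variable across all observations gives a column of measurements.
heights = [row["height_in"] for row in observations]

print("Variables:", variables)
print("Heights (inches):", heights)
```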
7. Types of Data: Categorical and Numerical
Categorical
Numerical
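8. Data Scales
- Data are generally classified into four scales:
- Nominal: categorical data with no inherent order
- Ordinal: shows ranks; the intervals between ranks may vary
- Interval: intervals are constant, but the zero point is arbitrary
- Ratio: numeric data with a true zero value
- Ordinal, interval, and ratio scales are all treated as numeric data here, although ordinal values support ranking only, not arithmetic.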
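The difference between an interval scale and a ratio scale can be illustrated with temperature. The two readings below (10 and 20 degrees Celsius) are assumed values picked for the example; Celsius has an arbitrary zero, while Kelvin has a true zero.

```python
# Two hypothetical temperature readings in degrees Celsius (interval scale).
celsius_a, celsius_b = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = celsius_b - celsius_a

# Ratios are not: 20 C is not "twice as hot" as 10 C,
# because 0 C is an arbitrary reference point.
naive_ratio = celsius_b / celsius_a

# Converting to Kelvin (ratio scale, true zero) gives a meaningful ratio.
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15
true_ratio = kelvin_b / kelvin_a

print(f"Difference: {difference} degrees")
print(f"Ratio computed in Celsius: {naive_ratio:.2f} (not meaningful)")
print(f"Ratio computed in Kelvin:  {true_ratio:.3f}")
```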
9. Types of Data: Time Series and Cross-sectional
Time series: variable(s) over time
Year  Population (Millions)
1900  56
1910  58
1920  60
1930  65
1940  76
1950  84
1960  95
1970  120
Cross-sectional: variable(s) at one point in time across multiple entities (countries in this case)
1970
Country  Population (Millions)  GDP (Billions)  Gender Ratio
USA      160                    575             0.998
China    800                    155             1.105
India    600
Nigeria  100
Japan    120
Canada   30
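The two layouts can be sketched as two small dictionaries: a time series is keyed by time for one entity, and a cross-section is keyed by entity for one point in time. The variable names are chosen for this example, and only the population column is used.

```python
# Time series: one variable followed over time (keyed by year).
population_by_year = {
    1900: 56, 1910: 58, 1920: 60, 1930: 65,
    1940: 76, 1950: 84, 1960: 95, 1970: 120,
}

# A natural time-series question: how much did the value change each decade?
years = sorted(population_by_year)
changes = [
    population_by_year[later] - population_by_year[earlier]
    for earlier, later in zip(years, years[1:])
]

# Cross-section: one variable at one point in time (1970), keyed by entity.
population_1970 = {
    "USA": 160, "China": 800, "India": 600,
    "Nigeria": 100, "Japan": 120, "Canada": 30,
}

# A natural cross-sectional question: which entity has the largest value?
largest = max(population_1970, key=population_1970.get)

print("Change per decade (millions):", changes)
print("Largest population in the 1970 cross-section:", largest)
```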
10. Numeric Data (Interval or Ratio): Frequency Tables
A frequency table showing a classification of the age of attendees at an event.
Class     Frequency  Relative Frequency  Percent
10 to 20  3          0.15                15
20 to 30  6          0.30                30
30 to 40  5          0.25                25
40 to 50  4          0.20                20
50 to 60  2          0.10                10
Total     20         1.00                100
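The relative frequency and percent columns follow directly from the class counts: each count is divided by the total number of observations. A minimal sketch, using the counts from the table and variable names chosen for the example:

```python
# Class counts taken from the age frequency table.
class_counts = {
    "10 to 20": 3,
    "20 to 30": 6,
    "30 to 40": 5,
    "40 to 50": 4,
    "50 to 60": 2,
}

total = sum(class_counts.values())

# Relative frequency = count / total; percent = relative frequency * 100.
relative = {label: count / total for label, count in class_counts.items()}
percent = {label: rel * 100 for label, rel in relative.items()}

print(f"{'Class':<10}{'Freq':>5}{'Rel':>7}{'Pct':>6}")
for label, count in class_counts.items():
    print(f"{label:<10}{count:>5}{relative[label]:>7.2f}{percent[label]:>6.0f}")
print(f"{'Total':<10}{total:>5}{sum(relative.values()):>7.2f}{sum(percent.values()):>6.0f}")
```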
11. Frequency Histograms
A graphical display of the distribution of frequencies.
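As a rough stand-in for the chart, a histogram can be drawn in plain text, with one bar per class and the bar length equal to the class frequency. This sketch reuses the counts from the age frequency table; a plotting library would normally be used instead.

```python
# Class counts from the age frequency table.
class_counts = {
    "10 to 20": 3,
    "20 to 30": 6,
    "30 to 40": 5,
    "40 to 50": 4,
    "50 to 60": 2,
}

# One text bar per class: the number of '#' marks equals the frequency.
lines = [
    f"{label:<9}| {'#' * count} ({count})"
    for label, count in class_counts.items()
]

print("\n".join(lines))
```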
12. Developing Frequency Tables and Histograms
- Sort Raw Data in Ascending Order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find Range: 58 - 12 = 46
- Select Number of Classes: 5 (usually between 5 and 15)
- Compute Class Interval (width): 10 (range/classes = 46/5 = 9.2, then round up)
- Determine Class Boundaries (limits): 10, 20, 30, 40, 50, 60
- Compute Class Midpoints: 15, 25, 35, 45, 55
- Count Observations and Assign to Classes
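The steps above can be sketched in Python as follows. Two details are assumptions of this sketch rather than rules stated in the lecture: the first class starts at the multiple of the width at or below the minimum, and each class includes its lower limit but not its upper limit (which is how 30 ends up in the 30 to 40 class).

```python
import math

# Step 1: sort the raw data in ascending order.
data = sorted([12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
               32, 35, 37, 38, 41, 43, 44, 46, 53, 58])

# Step 2: find the range.
data_range = data[-1] - data[0]

# Step 3: select the number of classes.
num_classes = 5

# Step 4: compute the class width, rounding range/classes up.
width = math.ceil(data_range / num_classes)

# Step 5: determine class boundaries (assumed to start at the
# multiple of the width at or below the minimum value).
start = (data[0] // width) * width
boundaries = [start + i * width for i in range(num_classes + 1)]

# Step 6: compute class midpoints.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Step 7: count observations per class (lower limit inclusive,
# upper limit exclusive).
counts = [
    sum(1 for x in data if lo <= x < hi)
    for lo, hi in zip(boundaries, boundaries[1:])
]

print("Range:", data_range)
print("Width:", width)
print("Boundaries:", boundaries)
print("Midpoints:", midpoints)
print("Counts:", counts)
```

The resulting counts match the frequency column of the age table (3, 6, 5, 4, 2).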
13. Categorical Data: Bar Charts
Obs Age Gender State Salary
1 25 M FL 25
2 28 F SC 36
3 31 M GA 44
4 35 F GA 38
5 36 M SC 56
6 38 F FL 68
7 42 M SC 79
8 51 F FL 64
9 55 M GA 88
10 61 F FL 71
11 62 M GA 92
12 65 F SC 54
State Freq
FL 3
SC 5
GA 4
14. Categorical Data: Pie Charts
State Freq
FL 3
SC 5
GA 4
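A pie chart shows each category as a share of the whole, so each slice is proportional to frequency divided by total. A small sketch of that arithmetic, using the state frequencies above (the variable names are chosen for the example):

```python
# State frequencies from the table above.
state_freq = {"FL": 3, "SC": 5, "GA": 4}

total = sum(state_freq.values())

# Each slice: share of the total as a percent, and the matching
# angle out of the 360 degrees of the full circle.
percent = {state: freq * 100 / total for state, freq in state_freq.items()}
angle = {state: freq * 360 / total for state, freq in state_freq.items()}

for state in state_freq:
    print(f"{state}: {percent[state]:.1f}% of the pie, {angle[state]:.0f} degrees")
```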
15. Numeric Data by Category
Mean Salary by State and Gender
State  F      M
FL     66.00  25.00
GA     70.00  74.67
SC     53.67  67.50
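A table like this can be produced by grouping a numeric variable by one or more categorical variables and averaging within each group. The sketch below groups Salary by State and Gender using the twelve observations exactly as printed on the bar chart slide. Run on those rows, it reproduces the M column (25.00, 74.67, 67.50), but the F means come out as 67.67, 38.00, and 45.00, so the printed rows and the summary table appear to come from slightly different versions of the data.

```python
from collections import defaultdict

# (Obs, Age, Gender, State, Salary), copied as printed on the bar chart slide.
rows = [
    (1, 25, "M", "FL", 25), (2, 28, "F", "SC", 36), (3, 31, "M", "GA", 44),
    (4, 35, "F", "GA", 38), (5, 36, "M", "SC", 56), (6, 38, "F", "FL", 68),
    (7, 42, "M", "SC", 79), (8, 51, "F", "FL", 64), (9, 55, "M", "GA", 88),
    (10, 61, "F", "FL", 71), (11, 62, "M", "GA", 92), (12, 65, "F", "SC", 54),
]

# Collect the salaries that fall in each (State, Gender) group.
salaries = defaultdict(list)
for _obs, _age, gender, state, salary in rows:
    salaries[(state, gender)].append(salary)

# Average within each group.
means = {group: sum(values) / len(values) for group, values in salaries.items()}

print(f"{'State':<7}{'F':>7}{'M':>7}")
for state in sorted({state for state, _gender in means}):
    print(f"{state:<7}{means[(state, 'F')]:>7.2f}{means[(state, 'M')]:>7.2f}")
```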
16. Bivariate Numerical Data: Scatter Plot
17. Two variables, different units
Year CO NOx
1990 154,188 25,527
1991 147,128 25,180
1992 140,895 25,261
1993 135,902 25,356
1994 133,558 25,350
1995 126,778 24,955
1996 128,859 24,786
1997 117,911 24,706
1998 115,380 24,347
1999 114,541 22,843
2000 114,465 22,599
2001 106,263 21,546
2002 109,235 21,277
2003 107,062 20,476
2004 104,892 19,564
2005 102,721 18,947
2006 100,552 18,226
- Source: http://www.epa.gov/ttn/chief/trends/trends06/nationaltier1upto2006basedon2002finalv2.1.xls
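When two series differ greatly in magnitude, one option is to rescale each to an index with the first year set to 100, so both can share a single axis. This is an alternative to plotting against two separate axes, not necessarily the approach used on the slide; the helper `to_index` is a name made up for this sketch.

```python
# Values copied from the table above, 1990 through 2006.
years = list(range(1990, 2007))
co = [154188, 147128, 140895, 135902, 133558, 126778, 128859, 117911, 115380,
      114541, 114465, 106263, 109235, 107062, 104892, 102721, 100552]
nox = [25527, 25180, 25261, 25356, 25350, 24955, 24786, 24706, 24347,
       22843, 22599, 21546, 21277, 20476, 19564, 18947, 18226]


def to_index(series, base=100.0):
    """Rescale a series so that its first value equals `base`."""
    first = series[0]
    return [value / first * base for value in series]


co_index = to_index(co)
nox_index = to_index(nox)

print(f"{'Year':<6}{'CO index':>10}{'NOx index':>11}")
for year, c, n in zip(years, co_index, nox_index):
    print(f"{year:<6}{c:>10.1f}{n:>11.1f}")
```

On this scale the 2006 values are about 65.2 for CO and 71.4 for NOx, meaning both fell over the period, CO somewhat more in relative terms.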
18. Chapter Summary
- Categorization: Bar, Pie Charts
- Distribution: Stem and Leaf, Histogram, Box Plot
- Relationships: Scatter Plots, Line Charts
- Multivariate: Spider Plots, Maps, Bubble Charts