Title: Data Analysis Quantitative Methods
1Data Analysis (Quantitative Methods)
- Lecture 1
- Fundamental Statistics
2What is statistics?
- Statistics is the science of data. This involves
collecting, classifying, summarizing, organizing,
analyzing, and interpreting numerical information.
3Dealing with Data
- Measurement Scales
- Descriptive Statistics
- Inferential Statistics
4Dealing with Data
- Quantitative data are measurements that are
recorded on a naturally occurring numerical
scale.
- Qualitative data are measurements that cannot be
measured on a natural numerical scale; they can
only be classified into one of a group of
categories.
5Levels of Measurement.
- Nominal Scale (qualitative category membership,
e.g. gender, eye colour, nationality).
- Ordinal Scale (ranks or positions in
a group, e.g. 1st, 2nd, 3rd).
- Interval and Ratio Scales (measured on an
independent scale with units, e.g. the I.Q. scale.
A ratio scale also has an absolute zero point, e.g.
distance, the Kelvin scale).
6Population
- A population is a set of units (usually people,
objects, transactions, or events) that we are
interested in studying.
7Sample and Statistical Inference
- A sample is a subset of the units of a
population.
- A statistical inference is an estimate,
prediction, or some other generalization about a
population based on information contained in a
sample.
8Variables
- A variable is a characteristic or property of an
individual population unit (a member of the set of units we are
interested in studying).
- Discrete Variables: there are no possible values
between adjacent units on the scale. For
example, the number of children in a family. The possible values can be listed: X1, X2,
..., Xn.
- Continuous Variables: a variable that
theoretically can have an infinite number of
values between adjacent units on the scale. For
example, time, height, weight. The values lie in an interval, e.g. X ∈ [0, 100],
[0, 30), (12, 80], (1, 2).
9Descriptive Statistics
Descriptive statistics utilizes numerical and
graphical methods to look for patterns in a data
set, to summarize the information revealed in a
data set, and to represent that information in a
convenient form.
- Graphical Representation of Data
- Measures of Central Tendency
- Measures of Dispersion
10Representing Data Graphically
- Bar Charts
- Histograms
- Pie Charts
- Scattergrams
11The Bar Chart
- Used for Discrete variables
- Bars are separated
12Histogram
- Columns can only represent frequencies.
- All categories represented.
- Columns are not spaced apart.
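As a rough illustration of the difference between the two chart types, the sketch below tallies what each one would plot. The data values, bin start, and bin width are made up for this example and are not from the lecture. A bar chart gets one separated bar per distinct discrete value, while a histogram counts continuous values into adjacent intervals.

```python
from collections import Counter

# Hypothetical discrete data: number of children in ten families.
children = [0, 2, 1, 2, 3, 2, 0, 1, 2, 4]

# Bar chart: one separated bar per distinct value, height = frequency.
bar_heights = Counter(children)

# Hypothetical continuous data: heights in centimetres.
heights_cm = [151.2, 158.9, 160.0, 163.4, 167.8,
              169.9, 171.5, 174.3, 178.0, 185.6]

# Histogram: adjacent bins [150, 160), [160, 170), [170, 180), [180, 190).
bin_start = 150   # assumed lower edge of the first bin
bin_width = 10    # assumed bin width
n_bins = 4

hist_counts = [0] * n_bins
for h in heights_cm:
    index = int((h - bin_start) // bin_width)  # which bin the value falls in
    hist_counts[index] += 1

print(sorted(bar_heights.items()))  # [(0, 2), (1, 2), (2, 4), (3, 1), (4, 1)]
print(hist_counts)                  # [2, 4, 3, 1]
```

Every value lands in exactly one bin, so the histogram columns touch and together account for all ten observations.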
13Pie Chart
- Used to illustrate percentages
14Scattergrams - Positive Relationships
15Negative Correlation
16No Relationship
17Measures of Central Tendency
- The Mean
- The Median
- The Mode
18The Mean
Mean = sum of all values in a group divided by
the number of values in that group. So if 5
people took 135, 109, 95, 121, 140 seconds to
solve an anagram, the mean time taken is
(135 + 109 + 95 + 121 + 140) / 5 = 600 / 5 = 120 seconds.
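The same calculation can be sketched in Python. The variable names are illustrative only.

```python
# Times (in seconds) taken by five people to solve the anagram.
times = [135, 109, 95, 121, 140]

# Mean = sum of all values divided by the number of values.
total = sum(times)
mean_time = total / len(times)

print(total)      # 600
print(mean_time)  # 120.0
```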
19The Mean: Pros and Cons
- Advantages
- Very Sensitive Measure.
- Forms the basis of most tests used in inferential
statistics.
- Disadvantages
- Can be affected by outlying scores, e.g.
135, 109, 95, 121, 140, 480: Mean = 1080/6 = 180
seconds.
20The Median
The median is the central value of a set of
numbers that are placed in numerical order.
For an odd set of numbers, 95, 109, 121, 135, 140,
the median is 121.
For an even set of numbers, 95, 109, 121, 135,
140, 480, the median is the sum of the two central scores
divided by 2: (121 + 135)/2 = 128.
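A small Python sketch of both cases, using the standard library's statistics.median. It also recomputes the mean to show how much further the outlying score of 480 moves the mean than the median.

```python
from statistics import median

odd_set = [95, 109, 121, 135, 140]
even_set = odd_set + [480]  # the same scores plus the outlying score

# median() sorts the data and picks the middle value, or averages
# the two middle values when the number of values is even.
median_odd = median(odd_set)
median_even = median(even_set)

mean_odd = sum(odd_set) / len(odd_set)
mean_even = sum(even_set) / len(even_set)

print(median_odd, median_even)  # 121 128.0
print(mean_odd, mean_even)      # 120.0 180.0
```

Adding the outlier shifts the median by 7 seconds but the mean by 60 seconds.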
21The Median: Pros and Cons
- Advantages
- Easier and quicker to calculate than the mean.
- Unaffected by extreme values.
- Disadvantages
- Doesn't take into account the exact values of
each item.
- If values are few it can be unrepresentative,
e.g. for 2, 3, 5, 98, 112 the median is 5.
22The Mode
The Mode = the most frequently occurring value.
1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6,
7, 7, 7, 8
The Mode = 5
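One way to find the mode in Python is to count each value with collections.Counter. This is a sketch, and the variable names are illustrative.

```python
from collections import Counter

values = [1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6,
          7, 7, 7, 8]

# Count how often each value occurs, then take the most frequent one.
counts = Counter(values)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 5 6
```

The value 5 occurs six times, more than any other value in the set.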
23The Mode: Pros and Cons
- Disadvantages
- Doesn't take into account the value of each item.
- Not useful for small sets of data
- Advantages
- Shows the most common value of a set.
- Unaffected by extreme values
24Data Types and Central Tendency Measures.
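The Mode is the usual measure for Nominal Data, the Median
for Ordinal Data, and the Mean for Interval and Ratio Data.
The Mode may also be used on Ordinal and Interval
Data. The Median may also be used on Interval
Data.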
25Why look at dispersion?
- 17, 32, 34, 58, 69, 70, 98, 142
- Mean = 65
- 61, 62, 64, 65, 65, 66, 68, 69
- Mean = 65
- Both sets have the same mean, but the first is far more spread out.
26Measures of Dispersion
- The Range
- Variance
- The Standard Deviation
27The Range
The Range is the difference between the highest
and the lowest scores.
Range = highest score - lowest score
4, 10, 5, 12, 6, 14: Range = 14 - 4 = 10
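The range can be sketched in Python as the maximum minus the minimum. The helper name value_range is an assumption made for this example. Applying it to the two data sets from the "Why look at dispersion?" slide shows that they share a mean of 65 but differ greatly in spread.

```python
def value_range(scores):
    """Return the highest score minus the lowest score."""
    return max(scores) - min(scores)

example = [4, 10, 5, 12, 6, 14]
print(value_range(example))  # 10

# The two data sets from the dispersion slide: same mean, different spread.
wide = [17, 32, 34, 58, 69, 70, 98, 142]
narrow = [61, 62, 64, 65, 65, 66, 68, 69]

print(sum(wide) / len(wide), value_range(wide))        # 65.0 125
print(sum(narrow) / len(narrow), value_range(narrow))  # 65.0 8
```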
28Variance
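Population Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$, where $\mu$ is the population mean and $N$ is the population size.
Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is the sample mean and $n$ is the sample size.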
29Calculating the Standard Deviation
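Population standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$, the square root of the population variance.
Sample standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$, the square root of the sample variance.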
30Inferential Statistics
- Inferential statistics utilizes sample data to
make estimates, decisions, predictions, or other
generalizations about a larger set of data.
- Inferential statistics allows us to draw
conclusions about populations, and to test
research hypotheses.
- Inferential statistics involves:
- Probability, Distribution Theory,
- Tests of Hypothesis, etc.
31Summary
- All data is measured on either Nominal, Ordinal,
Interval or Ratio Scales.
- Variables can be discrete or continuous.
- Descriptive Statistics such as measures of
central tendency and dispersion are used to
describe or characterize data.
- Inferential Statistics is used to make inferences
from sample data about the population at large.
32References
- Statistics, 8th Edition
- McClave and Sincich
- Prentice Hall, 2000.
33Exercise
- Briefly explain what is meant by each of the
following:
- 1. Statistics
- 2. Descriptive statistics
- 3. Inferential statistics
- 4. Quantitative data
- 5. Qualitative data
- 6. Population variance
- 7. Sample standard deviation
34Exercise
- 1. Calculate the mode, mean, and median of the
following data:
- 8, 0, 5, 3, 7, 5, 2, 5, 8, 6, 1
- 2. Calculate the range, variance, and standard
deviation of the following sample:
- 6, 2, 3, 4, 3, 1, 4
- (Answers are on the next slide)
35Answers
- 1. Mean = 4.545455
- Median = 5
- Mode = 5
- 2. Range = 6 - 1 = 5
- Sample Variance = 2.57142857
- Sample Standard Deviation = 1.60356745