Title: Visualizing Data
1 Visualizing Data
- The Visual Display of Quantitative Information, E. R. Tufte, Graphics Press, 2001
- Graphical Methods for Data Analysis, J. Chambers, W. Cleveland, B. Kleiner, and P. Tukey, Duxbury Press, Boston, 1983
- Exploratory Data Analysis, J. Tukey, Addison-Wesley Pub Co., 1977
2 Perspective
3 Tufte: Graphics reveal data.
5 Communicate complex ideas with clarity, precision, and efficiency (Tufte)
- Show the data
- Substance rather than method
- Avoid distortion
- Present many numbers in a small space
- Make large data sets coherent
- Encourage eye to make comparisons
- Reveal data at several levels
- Purpose
- Description, exploration, tabulation, decoration
- Closely integrated with statistical and verbal
descriptions
6 Napoleon's Russian Campaign: Minard (1885), Tufte (2001)
http://www.math.yorku.ca/SCS/Gallery/re-minard.html
7 Leaves data (histogram)
hist(leaves)
8 (stem and leaf)
- N = 132 Median = 68
- Quartiles = 58, 79
- Decimal point is 1 place to the right of the colon
- 2 : 7
- 3 :
- 3 : 689
- 4 : 3
- 4 : 555688
- 5 : 00022223344
- 5 : 556777778888899
- 6 : 00001111122223344
- 6 : 5556667888888999
- 7 : 0011222233444
- 7 : 5555556666788899999
- 8 : 0000112222344
- 8 : 555567777999
- 9 : 01234
stem(leaves)
9 Boxplot
boxplot(leaves)
10 Quality Control
- Head, Neck, Arm Length (cm)
11 Shirt1 data
     head  neck   arm
 1   56.0 32.00 45.50
 2   22.0 12.00 20.00
 3   57.0 33.60 47.00
 4   59.5 32.00 35.00
 5   57.0 37.50 56.00
 6   57.0 34.00 51.00
 7   56.0 34.00 55.00
 8   54.2 30.00 52.00
 9     NA 36.00 53.00
10   55.2 31.60 68.00
. . .
12 Shirt1 data
dim(shirt1)
[1] 97 3
shirt1[1:2,1:2]
  head neck
1   56   32
2   22   12
shirt1$head[1:5]
[1] 56.0 22.0 57.0 59.5 57.0
shirt1[1:5,1]
[1] 56.0 22.0 57.0 59.5 57.0
13 hist(shirt1$head)
14 par(mfrow=c(2,2))
hist(shirt1$head)
title("head")
hist(shirt1$neck)
title("neck")
hist(shirt1$arm)
title("arm")
par(mfrow=c(1,1))
15 plot(shirt1$neck, shirt1$head)
16 Script (convert inches to cm)
shirt <- shirt1
for (i in 1:97) {
  # a neck size under 20 means the row was recorded in inches
  if (shirt1$neck[i] < 20)
    shirt[i,] <- shirt1[i,] * 2.54
}
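The loop above can also be written without an explicit index. The following is a minimal sketch, assuming the same rule (a neck size under 20 marks a row recorded in inches). The three-row data frame is a hypothetical stand-in built from the first rows shown on the Shirt1 data slide, not the full 97-row data set.

```r
# Hypothetical stand-in for shirt1: the first three rows shown on the data slide.
shirt1 <- data.frame(head = c(56.0, 22.0, 57.0),
                     neck = c(32.00, 12.00, 33.60),
                     arm  = c(45.50, 20.00, 47.00))

# Rows whose neck size is under 20 are assumed to be in inches.
# which() drops NA comparisons, so rows with a missing neck value are left alone.
inches <- which(shirt1$neck < 20)

shirt <- shirt1
shirt[inches, ] <- shirt1[inches, ] * 2.54   # 1 inch = 2.54 cm

print(shirt)
```

Selecting the rows once and converting them in a single assignment avoids hard-coding the number of rows (97) in the script.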
17 plot(shirt$neck, shirt$head)
18 lines(supsmu(shirt$neck, shirt$head))
19 par(mfrow=c(2,2))
hist(shirt$head)
title("head")
hist(shirt$neck)
title("neck")
hist(shirt$arm)
title("arm")
par(mfrow=c(1,1))
20 Further exploration
- range(shirt$head, na.rm=T)
- [1] 51 61
- range(shirt$neck)
- [1] 28.000 41.275
- range(shirt$arm)
- [1] 28 85
- seq(25, 90, by=5)
- [1] 25 30 35 40 45 50 55 60 65 70 75 80 85 90
21 To create histograms with the same axes
- par(mfrow=c(3,1))
- hist(shirt$head, na.rm=T, breaks=seq(25,90,by=5))
- title("head")
- hist(shirt$neck, breaks=seq(25,90,by=5))
- title("neck")
- hist(shirt$arm, breaks=seq(25,90,by=5))
- title("arm")
- par(mfrow=c(1,1))
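The shared breaks above were picked by hand after inspecting the three ranges. As a rough sketch, the same idea can be derived from the data: round the overall minimum down and the overall maximum up to a multiple of the bin width. The data frame here is hypothetical, with values chosen only to span the ranges reported on the previous slide, and the names `width`, `lo`, and `hi` are illustrative.

```r
# Hypothetical values spanning the reported ranges; not the real shirt data.
shirt <- data.frame(head = c(51, 56, 61, NA),
                    neck = c(28, 34, 41.275, 36),
                    arm  = c(28, 52, 85, 60))

width <- 5                                                # bin width in cm
lo <- floor(min(shirt, na.rm = TRUE) / width) * width     # round overall min down
hi <- ceiling(max(shirt, na.rm = TRUE) / width) * width   # round overall max up
breaks <- seq(lo, hi, by = width)

par(mfrow = c(3, 1))
for (v in names(shirt)) {
  # identical breaks give every panel the same x axis
  hist(shirt[[v]], breaks = breaks, main = v, xlab = "cm")
}
par(mfrow = c(1, 1))
```

With these values the breaks run from 25 to 85, one bin narrower than the hand-picked 25 to 90, because the largest value falls exactly on a bin edge.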
23 boxplot(shirt, cex=1.5)
24 Stem and leaf plot
- round(shirt$neck)
- [1] 32 30 34 32 38 34 34 30 36 32 33 32 30 38
- [15] 33 35 34 34 41 34 36 38 36 38 36 31 40 31
- [29] 38 32 34 32 36 36 28 39 38 28 33 32 37 37
- [43] 39 34 34 32 36 30 38 38 33 31 31 32 35 30
- [57] 38 30 36 34 36 36 36 33 38 29 32 33 36 36
- [71] 32 34 31 32 32 32 36 40 34 32 33 34 36 34
- [85] 40 33 37 31 30 33 40 34 37 37 36 37 36
25 stem(round(shirt$neck))
- N = 97 Median = 34
- Quartiles = 32, 36
- Decimal point is at the colon
- 28 : 00
- 29 : 0
- 30 : 0000000
- 31 : 000000
- 32 : 000000000000000
- 33 : 000000000
- 34 : 000000000000000
- 35 : 00
- 36 : 00000000000000000
- 37 : 000000
- 38 : 0000000000
- 39 : 00
26 Some commands
- mean(shirt$neck)
- [1] 34.33247
- median(shirt$neck)
- [1] 34
- var(shirt$neck)
- [1] 8.668525
- stdev(shirt$neck)
- [1] 2.944236
- sqrt(var(shirt$neck))
- [1] 2.944236
- sample(shirt$neck, 3)
- [1] 37.50 33.50 38.75
- shirt.sample <- sample(shirt$neck, 20)
- mean(shirt.sample)
- [1] 34.4605
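The commands above are S-PLUS. For readers working in R, here is a small sketch of the same steps: R provides `sd()` where S-PLUS has `stdev()`, and `set.seed()` makes a random sample repeatable. The vector `neck` is an assumption, filled with the first 14 rounded neck values from the stem and leaf slide rather than the full data, and the seed value is arbitrary.

```r
# Assumed stand-in: first 14 rounded neck sizes from the slide, not the full data.
neck <- c(32, 30, 34, 32, 38, 34, 34, 30, 36, 32, 33, 32, 30, 38)

neck.mean <- mean(neck)
neck.sd   <- sd(neck)          # R counterpart of the S-PLUS stdev()
same.sd   <- sqrt(var(neck))   # agrees with sd(), as on the slide

set.seed(1)                    # arbitrary seed so the draw can be repeated
neck.sample <- sample(neck, 5) # 5 values drawn without replacement
sample.mean <- mean(neck.sample)

print(c(mean = neck.mean, sd = neck.sd, sample.mean = sample.mean))
```

The sample mean changes from draw to draw, which is why it differs slightly from the full-data mean on the slide.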
27 Counts and Amounts
- 21 Freshmen
- 32 Sophomores
- 20 Juniors
- 21 Seniors
- 2 Graduates
- 1 Other
28 Piechart
pie(c(21,32,20,21,2,1), col=c(2,3,4,5,6,7),
  names=c("Freshman","Sophomore","Junior",
  "Senior","Graduate","Other"))
29 Exploding Piechart
pie(c(21,32,20,21,2,1), col=c(2,3,4,5,6,7),
  names=c("Freshman","Sophomore","Junior",
  "Senior","Graduate","Other"), explode=T)
30 Barplot
barplot(c(21,32,20,21,2,1),
  names=c("Freshman","Sophomore","Junior",
  "Senior","Graduate","Other"))
31 Dotchart
dotchart(c(21,32,20,21,2,1),
  labels=c("Freshman","Sophomore","Junior",
  "Senior","Graduate","Other"))
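Each call above repeats the counts and the labels. One possible tidy-up, sketched here in R under the assumption that a named vector is acceptable, keeps the two together so the plotting functions pick up the labels from the names. The object names `counts` and `pct` are illustrative.

```r
# Counts from the "Counts and Amounts" slide, stored once with their labels.
counts <- c(Freshman = 21, Sophomore = 32, Junior = 20,
            Senior = 21, Graduate = 2, Other = 1)

# Percentage of the total in each category, rounded to one decimal place.
pct <- round(100 * counts / sum(counts), 1)
print(pct)

barplot(counts, ylab = "Count")          # bar labels come from names(counts)
dotchart(sort(counts), xlab = "Count")   # sorting makes the ranking easier to read
```

Sorting before `dotchart()` is a design choice, not something the slides do; it puts the largest category at the top of the chart.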
32 Higher Level Analyses
33 1977 USA State Characteristics
34 Variables by State
- Population
- Per capita income
- Percent illiteracy
- Life expectancy
- Murder rate per 100,000
- Percent high-school graduates
- Mean number of days with min temperature < 32 degrees
- Land area in square miles
35 Accessing State Data
state.x77[1:5,]
           Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
California      21198   5114        1.1    71.71   10.3    62.6    20 156361
36 pairs(state.x77)
39 Population Size vs. Area
- my.state <- as.data.frame(state.x77)
- plot(my.state$Area, my.state$Population,
  xlab="Area", ylab="",
  pch=15, lwd=4, cex=1.5)
- mtext(side=2, "Population", line=3, cex=1.5)
- title("Population vs. Area", cex=2.0)
40 Population Size vs. Area
- my.state <- as.data.frame(state.x77)
- plot(my.state$Area, my.state$Population,
  xlab="Area", ylab="", type="n",
  pch=15, lwd=4, cex=1.5)
- mtext(side=2, "Population", line=3, cex=1.5)
- title("Population vs. Area", cex=2.0)
- text(my.state$Area, my.state$Population,
  row.names(my.state), cex=1.5)
42 State Data Faces
state.x77[1:9,]
            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama           3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska             365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona           2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas          2110   3378        1.9    70.66   10.1    39.9    65  51945
California       21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado          2541   4884        0.7    72.06    6.8    63.9   166 103766
Connecticut       3100   5348        1.1    72.48    3.1    56.0   139   4862
Delaware           579   4809        0.9    70.06    6.2    54.6   103   1982
Florida           8277   4815        1.3    70.66   10.7    52.6    11  54090
The feature parameters are:
- 1: area of face
- 2: shape of face
- 3: length of nose
- 4: location of mouth
- 5: curve of smile
- 6: width of mouth
- 7, 8, 9, 10, 11: location, separation, angle, shape and width of eyes
- 12: location of pupil
- 13, 14, 15: location, angle and width of eyebrow
faces(state.x77[1:9,],
  labels=c("Alabama","Alaska","Arizona","Arkansas","California",
  "Colorado","Connecticut","Delaware","Florida"))
44 State Data Stars
state.x77[1:9,]
            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama           3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska             365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona           2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas          2110   3378        1.9    70.66   10.1    39.9    65  51945
California       21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado          2541   4884        0.7    72.06    6.8    63.9   166 103766
Connecticut       3100   5348        1.1    72.48    3.1    56.0   139   4862
Delaware           579   4809        0.9    70.06    6.2    54.6   103   1982
Florida           8277   4815        1.3    70.66   10.7    52.6    11  54090
45 State Data Stars
plot(c(0.5,3.5), c(0.5,3.5), type='n', axes=F, xlab=' ', ylab=' ')
symbols(c(1,1,1,2,2,2,3,3,3), c(3,2,1,3,2,1,3,2,1),
  stars=t(t(state.x77[1:9,])/apply(state.x77[1:9,],2,max)), add=T)
text(c(1,1,1,2,2,2,3,3,3), c(3,2,1,3,2,1,3,2,1),
  c("Alabama","Alaska","Arizona","Arkansas","California",
  "Colorado","Connecticut","Delaware","Florida"))
symbols(2.5, 2.5, stars=t(c(.1,.2,.3,.4,.5,.6,.7,.8)), add=T)
text(2.5, 2.5, "Key")
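The stars call above first divides every column by its maximum, so each ray has length between 0 and 1. A short sketch in R of that scaling step on its own follows, with the plot drawn by R's `stars()` function instead of `symbols()`. Using `sweep()` and `stars()` is a substitution made for illustration, not the method on the slide.

```r
x <- state.x77[1:9, ]                  # first nine states, eight variables

# Divide each column by its own maximum, as t(t(x) / apply(x, 2, max)) does.
col.max <- apply(x, 2, max)
scaled  <- sweep(x, 2, col.max, "/")

# scale = FALSE because the columns are already scaled to at most 1.
stars(scaled, scale = FALSE, main = "State Data Stars")
```

`stars()` lays the states out on a grid and labels them from the row names, so the coordinate vectors and the separate `text()` call are not needed. It also accepts a `key.loc` argument for placing a key.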
48 brush(my.state)