1
Visualizing Data
  • The Visual Display of Quantitative Information,
    Tufte E. R., Graphics Press, 2001
  • Graphical Methods for Data Analysis,
    Chambers J., W. Cleveland, B. Kleiner,
    and P. Tukey, Duxbury Press, Boston, 1983
  • Exploratory Data Analysis,
    Tukey J., Addison-Wesley Pub Co., 1977

2
Perspective
3
Tufte: Graphics reveal data.
5
Communicate complex ideas with clarity,
precision, and efficiency (Tufte)
  • Show the data
  • Substance rather than method
  • Avoid distortion
  • Present many numbers in a small space
  • Make large data sets coherent
  • Encourage the eye to make comparisons
  • Reveal data at several levels
  • Serve a clear purpose: description, exploration,
    tabulation, or decoration
  • Closely integrated with statistical and verbal
    descriptions

6
Napoleon's Russian Campaign: Minard (1869), Tufte (2001)
http://www.math.yorku.ca/SCS/Gallery/re-minard.html
7
Leaves data (histogram)
hist(leaves)
8
Leaves data (stem and leaf)
  • N = 132   Median = 68
  • Quartiles = 58, 79
  • Decimal point is 1 place to the right of the
    colon
  • 2 : 7
  • 3 :
  • 3 : 689
  • 4 : 3
  • 4 : 555688
  • 5 : 00022223344
  • 5 : 556777778888899
  • 6 : 00001111122223344
  • 6 : 5556667888888999
  • 7 : 0011222233444
  • 7 : 5555556666788899999
  • 8 : 0000112222344
  • 8 : 555567777999
  • 9 : 01234

stem(leaves)
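
The `leaves` vector itself is not part of this transcript, but the stem-and-leaf display lists every value. As a rough sketch in R (the names `stems` and `leaf_digits` are introduced here for illustration only), the data can be rebuilt from the display, which should reproduce the N of 132, the median of 68, and the quartiles of 58 and 79 printed above:

```r
# Stems and leaf digits copied from the display (the empty "3 :" row is skipped).
stems <- c(2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9)
leaf_digits <- c("7", "689", "3", "555688",
                 "00022223344", "556777778888899",
                 "00001111122223344", "5556667888888999",
                 "0011222233444", "5555556666788899999",
                 "0000112222344", "555567777999", "01234")

# Decimal point is one place to the right of the colon: value = 10 * stem + leaf.
leaves <- unlist(Map(function(s, l) 10 * s + as.integer(strsplit(l, "")[[1]]),
                     stems, leaf_digits))

length(leaves)                    # number of observations
median(leaves)
quantile(leaves, c(0.25, 0.75))   # R's default rule; S-PLUS may use a different one
stem(leaves)                      # text display, no graphics device needed
```

With `leaves` rebuilt this way, `hist(leaves)` and `boxplot(leaves)` can be run as on the neighbouring slides.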
9
Boxplot
boxplot(leaves)
10
Quality Control
  • Head, Neck, Arm Length (cm)

11
Shirt1 data
   head  neck   arm
 1 56.0 32.00 45.50
 2 22.0 12.00 20.00
 3 57.0 33.60 47.00
 4 59.5 32.00 35.00
 5 57.0 37.50 56.00
 6 57.0 34.00 51.00
 7 56.0 34.00 55.00
 8 54.2 30.00 52.00
 9   NA 36.00 53.00
10 55.2 31.60 68.00
. . .
12
Shirt1 data
dim(shirt1)
[1] 97  3
shirt1[1:2,1:2]
  head neck
1   56   32
2   22   12
shirt1$head[1:5]
[1] 56.0 22.0 57.0 59.5 57.0
shirt1[1:5,1]
[1] 56.0 22.0 57.0 59.5 57.0
13
hist(shirt1$head)
14
par(mfrow=c(2,2))
hist(shirt1$head)
title("head")
hist(shirt1$neck)
title("neck")
hist(shirt1$arm)
title("arm")
par(mfrow=c(1,1))
15
plot(shirt1$neck, shirt1$head)
16
Script (convert inches to cm)
shirt <- shirt1
for(i in 1:97) if(shirt1$neck[i] < 20) shirt[i,] <- shirt1[i,]*2.54
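
The same correction can be written without an explicit loop. The sketch below is one possible R version, not the presenter's script: it uses only the first five rows of the Shirt1 data shown earlier, and the name `in_inches` is an illustrative assumption. A neck size under 20 is taken to mean the whole row was recorded in inches.

```r
# First five rows of the Shirt1 data as listed on the earlier slide.
shirt1 <- data.frame(
  head = c(56.0, 22.0, 57.0, 59.5, 57.0),
  neck = c(32.00, 12.00, 33.60, 32.00, 37.50),
  arm  = c(45.50, 20.00, 47.00, 35.00, 56.00)
)

shirt <- shirt1

# which() returns row positions and silently skips any NA comparisons.
in_inches <- which(shirt1$neck < 20)

# Convert every measurement in the flagged rows from inches to cm.
shirt[in_inches, ] <- shirt1[in_inches, ] * 2.54

shirt
```

Only row 2 is flagged in this excerpt; all other rows are left unchanged.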
17
plot(shirt$neck, shirt$head)
18
lines(supsmu(shirt$neck, shirt$head))
19
par(mfrow=c(2,2))
hist(shirt$head)
title("head")
hist(shirt$neck)
title("neck")
hist(shirt$arm)
title("arm")
par(mfrow=c(1,1))
20
Further exploration
  • range(shirt$head, na.rm=T)
  • [1] 51 61
  • range(shirt$neck)
  • [1] 28.000 41.275
  • range(shirt$arm)
  • [1] 28 85
  • seq(25, 90, by=5)
  • [1] 25 30 35 40 45 50 55 60 65 70 75 80 85 90

21
To create histograms with the same axes
  • par(mfrow=c(3,1))
  • hist(shirt$head, na.rm=T, breaks=seq(25,90,by=5))
  • title("head")
  • hist(shirt$neck, breaks=seq(25,90,by=5))
  • title("neck")
  • hist(shirt$arm, breaks=seq(25,90,by=5))
  • title("arm")
  • par(mfrow=c(1,1))
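
Sharing one `breaks` vector is what makes the three panels comparable. The following is a small sketch in R using only the first ten rows of the shirt data (row 2 already converted to cm); the names `head_cm` and `arm_cm` are illustrative. In R, `hist()` drops missing values on its own, so `na.rm` is not needed there, and `plot = FALSE` returns the bin counts without drawing.

```r
breaks <- seq(25, 90, by = 5)

# First ten rows of the shirt data, row 2 converted from inches to cm.
head_cm <- c(56.0, 55.88, 57.0, 59.5, 57.0, 57.0, 56.0, 54.2, NA, 55.2)
arm_cm  <- c(45.5, 50.8, 47.0, 35.0, 56.0, 51.0, 55.0, 52.0, 53.0, 68.0)

# Same breaks for both variables, so bin i covers the same interval in each.
h_head <- hist(head_cm, breaks = breaks, plot = FALSE)
h_arm  <- hist(arm_cm,  breaks = breaks, plot = FALSE)

# One row per variable, one column per shared bin.
rbind(head = h_head$counts, arm = h_arm$counts)
```

The breaks must span the full range of every variable, which is why the ranges were checked first.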

23
boxplot(shirt, cex=1.5)
24
Stem and leaf plot
  • round(shirt$neck)
  • [1] 32 30 34 32 38 34 34 30 36 32 33 32 30 38
  • [15] 33 35 34 34 41 34 36 38 36 38 36 31 40 31
  • [29] 38 32 34 32 36 36 28 39 38 28 33 32 37 37
  • [43] 39 34 34 32 36 30 38 38 33 31 31 32 35 30
  • [57] 38 30 36 34 36 36 36 33 38 29 32 33 36 36
  • [71] 32 34 31 32 32 32 36 40 34 32 33 34 36 34
  • [85] 40 33 37 31 30 33 40 34 37 37 36 37 36

25
  • stem(round(shirt$neck))
  • N = 97   Median = 34
  • Quartiles = 32, 36
  • Decimal point is at the colon
  • 28 : 00
  • 29 : 0
  • 30 : 0000000
  • 31 : 000000
  • 32 : 000000000000000
  • 33 : 000000000
  • 34 : 000000000000000
  • 35 : 00
  • 36 : 00000000000000000
  • 37 : 000000
  • 38 : 0000000000
  • 39 : 00
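
With whole-number data, each row of the stem-and-leaf display is simply a frequency count. As a sketch in R, the rounded neck sizes printed on the previous slide can be typed in (under the illustrative name `neck_cm`) and tabulated. Note that the printed vector also contains values of 40 and 41, which do not appear in the display as transcribed here.

```r
# Rounded neck sizes copied from the round(shirt$neck) output.
neck_cm <- c(
  32, 30, 34, 32, 38, 34, 34, 30, 36, 32, 33, 32, 30, 38,
  33, 35, 34, 34, 41, 34, 36, 38, 36, 38, 36, 31, 40, 31,
  38, 32, 34, 32, 36, 36, 28, 39, 38, 28, 33, 32, 37, 37,
  39, 34, 34, 32, 36, 30, 38, 38, 33, 31, 31, 32, 35, 30,
  38, 30, 36, 34, 36, 36, 36, 33, 38, 29, 32, 33, 36, 36,
  32, 34, 31, 32, 32, 32, 36, 40, 34, 32, 33, 34, 36, 34,
  40, 33, 37, 31, 30, 33, 40, 34, 37, 37, 36, 37, 36
)

counts <- table(neck_cm)          # one count per distinct value
counts

length(neck_cm)
median(neck_cm)
quantile(neck_cm, c(0.25, 0.75))  # R's default quantile rule
```

Each count should equal the number of zeros in the matching row of the display.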

26
Some commands
  • mean(shirt$neck)
  • [1] 34.33247
  • median(shirt$neck)
  • [1] 34
  • var(shirt$neck)
  • [1] 8.668525
  • stdev(shirt$neck)
  • [1] 2.944236
  • sqrt(var(shirt$neck))
  • [1] 2.944236
  • sample(shirt$neck, 3)
  • [1] 37.50 33.50 38.75
  • shirt.sample <- sample(shirt$neck, 20)
  • mean(shirt.sample)
  • [1] 34.4605
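
`stdev()` is an S-PLUS function; in R the equivalent is `sd()`. A brief sketch, assuming R and using ten neck values from the shirt data shown earlier (the seed is arbitrary and only makes the random draw repeatable):

```r
# Ten neck sizes (cm) from the shirt data, row 2 converted from inches.
neck <- c(32.00, 30.48, 33.60, 32.00, 37.50, 34.00, 34.00, 30.00, 36.00, 31.60)

sd(neck)          # R's counterpart of S-PLUS stdev()
sqrt(var(neck))   # the same quantity computed from the variance

set.seed(1)                      # arbitrary seed, for repeatability only
neck_sample <- sample(neck, 3)   # three values drawn without replacement
neck_sample
mean(neck_sample)                # sample mean, an estimate of mean(neck)
```

Rerunning `sample()` without resetting the seed gives a different subset and a slightly different mean, which is the point of the last three commands on the slide.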

27
Counts and Amounts
  • 21 Freshmen
  • 32 Sophomores
  • 20 Juniors
  • 21 Seniors
  • 2 Graduates
  • 1 Other

28
Piechart
pie(c(21,32,20,21,2,1), col=c(2,3,4,5,6,7),
    names=c("Freshman","Sophomore","Junior",
            "Senior","Graduate","Other"))
29
Exploding Piechart
pie(c(21,32,20,21,2,1), col=c(2,3,4,5,6,7),
    names=c("Freshman","Sophomore","Junior",
            "Senior","Graduate","Other"), explode=T)
30
Barplot
barplot(c(21,32,20,21,2,1),
        names=c("Freshman","Sophomore","Junior",
                "Senior","Graduate","Other"))
31
Dotchart
dotchart(c(21,32,20,21,2,1),
         labels=c("Freshman","Sophomore","Junior",
                  "Senior","Graduate","Other"))
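
The calls above use S-PLUS argument names. In R, `pie()` takes `labels` rather than `names` and has no `explode` argument, and `barplot()` uses `names.arg`. The sketch below is one way to draw the three displays side by side in R; the helper `draw_counts` is a hypothetical name introduced here.

```r
# Class counts from the "Counts and Amounts" slide, stored as a named vector.
counts <- c(Freshman = 21, Sophomore = 32, Junior = 20,
            Senior = 21, Graduate = 2, Other = 1)

# Percentage share of each class, the quantity a pie chart encodes as angle.
round(100 * counts / sum(counts), 1)

# Draw pie chart, bar plot and dot chart of the same counts in one row.
draw_counts <- function(x) {
  old <- par(mfrow = c(1, 3))
  on.exit(par(old))                       # restore the previous layout
  pie(x, labels = names(x), col = 2:7)
  barplot(x, names.arg = names(x), las = 2)
  dotchart(x, labels = names(x))
  invisible(NULL)
}
```

Calling `draw_counts(counts)` on any open graphics device shows how the small "Graduate" and "Other" groups are much easier to compare in the bar plot and dot chart than in the pie.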
32
Higher Level Analyses
  • Multivariate

33
1977 USA State Characteristics
34
Variables by State
  • Population
  • Per capita income
  • Percent illiteracy
  • Life expectancy
  • Murder rate per 100,000
  • Percent high-school graduates
  • Mean number of days with min temperature < 32
    degrees
  • Land area in square miles

35
Accessing State Data
state.x77[1:5,]
            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama           3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska             365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona           2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas          2110   3378        1.9    70.66   10.1    39.9    65  51945
California       21198   5114        1.1    71.71   10.3    62.6    20 156361
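
In R, `state.x77` ships with the built-in `datasets` package, so the table can be inspected directly. A short sketch, assuming a standard R session:

```r
dim(state.x77)          # rows are states, columns are variables
colnames(state.x77)     # the variables listed on the previous slide

state.x77[1:5, ]        # first five states, all columns

# Rows and columns can also be selected by name.
state.x77["Alaska", c("Population", "Area")]

# state.x77 is a numeric matrix, not a data frame, so columns cannot be
# reached with $ until it is converted with as.data.frame().
is.matrix(state.x77)
```

This is why later slides first create `my.state` with `as.data.frame(state.x77)` before plotting individual columns.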
36
pairs(state.x77)
39
Population Size vs. Area
  • my.state <- as.data.frame(state.x77)
  • plot(my.state$Area, my.state$Population,
    xlab="Area", ylab="",
    pch=15, lwd=4, cex=1.5)
  • mtext(side=2, "Population", line=3, cex=1.5)
  • title("Population vs. Area", cex=2.0)

40
Population Size vs. Area
  • my.state <- as.data.frame(state.x77)
  • plot(my.state$Area, my.state$Population,
    xlab="Area", ylab="", type="n",
    pch=15, lwd=4, cex=1.5)
  • mtext(side=2, "Population", line=3, cex=1.5)
  • title("Population vs. Area", cex=2.0)
  • text(my.state$Area, my.state$Population,
    row.names(my.state), cex=1.5)

42
State Data Faces
state.x77[1:9,]
            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama           3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska             365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona           2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas          2110   3378        1.9    70.66   10.1    39.9    65  51945
California       21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado          2541   4884        0.7    72.06    6.8    63.9   166 103766
Connecticut       3100   5348        1.1    72.48    3.1    56.0   139   4862
Delaware           579   4809        0.9    70.06    6.2    54.6   103   1982
Florida           8277   4815        1.3    70.66   10.7    52.6    11  54090
The feature parameters are:
  • 1-area of face
  • 2-shape of face
  • 3-length of nose
  • 4-location of mouth
  • 5-curve of smile
  • 6-width of mouth
  • 7, 8, 9, 10, 11-location, separation, angle, shape
    and width of eyes
  • 12-location of pupil
  • 13, 14, 15-location, angle and width of eyebrow
faces(state.x77[1:9,],
      labels=c("Alabama","Alaska","Arizona","Arkansas","California",
               "Colorado","Connecticut","Delaware","Florida"))
44
State Data Stars
state.x77[1:9,]
            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama           3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska             365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona           2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas          2110   3378        1.9    70.66   10.1    39.9    65  51945
California       21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado          2541   4884        0.7    72.06    6.8    63.9   166 103766
Connecticut       3100   5348        1.1    72.48    3.1    56.0   139   4862
Delaware           579   4809        0.9    70.06    6.2    54.6   103   1982
Florida           8277   4815        1.3    70.66   10.7    52.6    11  54090
45
State Data Stars
plot(c(0.5,3.5), c(0.5,3.5), type='n', axes=F, xlab=' ', ylab=' ')
symbols(c(1,1,1,2,2,2,3,3,3), c(3,2,1,3,2,1,3,2,1),
        stars=t(t(state.x77[1:9,])/apply(state.x77[1:9,],2,max)),
        add=T)
text(c(1,1,1,2,2,2,3,3,3), c(3,2,1,3,2,1,3,2,1),
     c("Alabama","Alaska","Arizona","Arkansas","California",
       "Colorado","Connecticut","Delaware","Florida"))
symbols(2.5,2.5, stars=t(c(.1,.2,.3,.4,.5,.6,.7,.8)), add=T)
text(2.5,2.5,"Key")
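
The expression passed to `stars=` rescales each variable by its largest value among the nine states, so every ray length lies between 0 and 1. A small sketch in R spelling out that step (the names `x`, `col_max`, `scaled` and `scaled_sweep` are illustrative):

```r
x <- state.x77[1:9, ]             # the nine states shown on the slide

# Largest value of each variable (column) among these states.
col_max <- apply(x, 2, max)

# Transposing puts variables in rows, so dividing by col_max recycles one
# maximum per variable; transposing back restores states as rows.
scaled <- t(t(x) / col_max)

# The same scaling written with sweep(), which some readers find clearer.
scaled_sweep <- sweep(x, 2, col_max, "/")

all.equal(scaled, scaled_sweep)
apply(scaled, 2, max)             # each column's maximum after scaling
round(scaled["Alaska", ], 2)      # ray lengths for one star
```

Without this step, variables on large scales such as Area would dominate every star and the other rays would be invisible.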
48
brush(my.state)