Title: PROC TABULATE
1PROC TABULATE
2(No Transcript)
301 Multiple Choice Poll
- What is your experience with the TABULATE
procedure? - I have never used the procedure.
- I seldom use the procedure.
- I often use this procedure.
- I have seen PROC TABULATE code that other people
use. - Other.
4Sample PROC TABULATE Output
- These two versions of the same summary report
show the count of the Order_Type variable
grouped by Order_Date and Customer_Country.
r102d01
5The TABULATE Procedure
- The TABULATE procedure produces
- summary reports based on class variables and
analysis variables - output that is tabular in structure
- statistics that are calculated based on the class
and analysis variables specified for the table.
6PROC TABULATE Statement
- The PROC TABULATE statement invokes the
procedure. Selected options for the PROC TABULATE
statement alter the format used for all the data
cells or affect other aspects of the report
(missing values or style of data cells). - General form of the PROC TABULATE statement
PROC TABULATE DATAlibref.filename
FORMATw.d
7TABULATE Procedure Features
- The TABULATE syntax enables you to specify the
output table dimensions (PAGE, ROW, COLUMN) and
the table structure, based on the table operators
you use. - Within the TABULATE syntax, a universal class
variable (ALL) enables you to obtain totals in
any dimension. - An extensive group of keyword statistics are also
available for analysis. - Before the TABULATE procedure can build the
table, the variables in the table must be
identified as analysis or class variables.
8Sample PROC TABULATE Output
- Table dimensions are page, row, and column.
Column
Page
Row
r102d01
9Model Syntax for the TABULATE Procedure
PROC TABULATE DATASASdataset CLASS
classification-variable(s) VAR
analysis-variable(s) TABLE
page-dimension,
row-dimension,
column-dimension RUN
The CLASS and VAR statements tell PROC TABULATE
how to use the variables. The TABLE statement
tells PROC TABULATE how to arrange the variables
in a table. All of the variables in the TABLE
statement must appear in either the VAR statement
or the CLASS statement.
10PROC TABULATE Statements
The TABLE statement is required, accompanied by
either a VAR statement, or a CLASS statement, or
both.
11Additional PROC TABULATE Statements
- Other SAS statements also work within a PROC
TABULATE step
12The CLASS Statement
You can use multiple CLASS statements.
13CLASS Statement
- The CLASS statement specifies one or more
variables that the TABULATE procedure uses to
group the data. - Class variables can be either character or
numeric - Class variables can have continuous values, but
they typically have a few discrete values that
define the classifications of the variable. - You do not have to sort the data by class
variables.
General form of the CLASS statement
CLASS classification-variable(s)
14(No Transcript)
15Setup for the Poll
- Consider the following hypothetical variables and
their values
1602 Multiple Answer Poll
- Which of the following variables are most likely
to be used as class variables to set categories
for PROC TABULATE? - Price
- Type
- Country
- Date
- Year
1702 Multiple Answer Poll Correct Answer
- Which of the following variables are most likely
to be used as class variables to set categories
for PROC TABULATE? - Price
- Type
- Country
- Date
- Year
class type country
18Alternative Class Variables
price
type
country
date
year
- The variables date and year could also be used in
a CLASS statement. - Although date and year are numeric, they can be
used as class variables. The variable date is a
SAS date variable, so you might want to use a
format to group your date values into other units
of time such as year, month, or quarter.
class date year
19Alternative Class Variables
price
type
country
date
year
- The variable price could also be used in a CLASS
statement. - However, because it contains continuous values
you might want to set categories for price. - A user-defined format would enable you to
classify price into categories such as Under 50
or Over 50.
class price
20The VAR Statement
You can use multiple VAR statements.
21VAR Statement
- The VAR statement identifies which variables are
used for analysis. - Analysis variables are numeric variables for
which the TABULATE procedure calculates
statistics. - The values of an analysis variable can be
continuous or discrete. - Only numeric variables can be used in a VAR
statement.
VAR analysis-variable(s)
22(No Transcript)
23Setup for the Poll
- Consider the following hypothetical variables and
their values.
2403 Multiple Choice Poll
- Which variable is most likely to be used in a
VAR statement for analysis (such as SUM or MEAN)
with PROC TABULATE? - Price
- Type
- Country
- Date
- Year
2503 Multiple Choice Poll Correct Answer
- Which variable is most likely to be used in a
VAR statement for analysis (such as SUM or MEAN)
with PROC TABULATE? - Price
- Type
- Country
- Date
- Year
var price
26Alternative VAR Variables
date
year
15JAN2004 23FEB2006 15APR2007 11JUN2007
2004 2005 2006
The variables date and year could also be used
for analysis. SAS date variables are numeric
variables, so they can be used in a VAR
statement. However, it is unlikely that you would
ever find the MEAN or SUM of year.
var date year
27(No Transcript)
28The TABLE Statement
You can use multiple TABLE statements within one
PROC TABULATE step, as long as all the variables
used by all of the TABLE statements appear in the
CLASS and VAR statements for the step.
29TABLE Statement Operators
- Almost all TABLE statement syntax uses table
operators.
30TABLE Statement Dimensions
- The TABLE statement enables you to specify how
the variables will be arranged in the table
structure. - The TABLE statement specifies which variable, or
variable and statistics, appear in which table
dimension.
TABLE row-dimension,
column-dimension
TABLE column-dimension
Column
Row
r102d01
31Simple One-Dimensional Table Example
- This TABLE statement puts Order_Type values in
the column dimension with the default statistic
of N shown for each Order_Type.
Column
class order_type table order_type
r102d01
32Simple Two-Dimensional Table Example
- This TABLE statement puts Order_Type values in
the row and the sum of Quantity in the column for
each Order_Type. The table operator in this
example is a comma (,).
Column
class order_type var quantity table
order_type, quantity
Row
r102d01
33Three-Dimensional TABLE Statement
Page
Column
TABLE page-dimension,
row-dimension, column-dimension
Row
- The TABLE statement specifies the table structure
for variable names, keyword statistics with table
operators. It is possible to have one-, two-, or
three-dimensional tables. The table operator in
this example is the comma (,).
r102d01
34Simple Three-Dimensional Table Example
- This TABLE statement puts Order_Type values in
the column dimension, the Customer_Country values
in the row dimension and the year of each
Order_Date in the page dimension. The statistic
shown is the count, or N statistic.
Column
Page
class customer_country order_type var
quantity table order_date,
customer_country, order_type
Row
r102d01
35Required Statements
- All of the variables in the TABLE statement must
appear in either the CLASS statement or the VAR
statement. - If you did not have Order_Type in a CLASS
statement, you would receive the following error
message in the SAS log - If you did not have Quantity in a VAR statement,
you would receive the following error message
r102d01
36About the Data
- The data set used is orion.customer_orders.
- A subset of orders in 2005, 2006, and 2007 are
selected from the above file to enable focus on
table structure. - The name of the subset data file is orders.
- User-defined formats are used for several of the
variables.
37Data Set Contents and Formats
- Three variables have user-defined formats
assigned
r102a01
38CLASS and TABLE Statement Example
39Review One-Dimensional Table Example
- This TABLE statement puts Order_Type values in
the column dimension with the default statistic
of N shown for each Order_Type. There are no
table operators in this example.
proc tabulate dataorders class order_type
table order_type run
r102d01
40CLASS and TABLE Statements
- Customer_country is in the column dimension.
- The default statistic for the class variable is
N. - Without a VAR statement, there is no analysis
variable.
proc tabulate dataorders class
customer_country table customer_country run
r102d02
41VAR and TABLE Statement Example
42VAR and TABLE Statements
proc tabulate dataorders var quantity
table quantity run
- Quantity is in the column dimension.
- The default statistic for the analysis variable
is SUM. - Without a class variable to set groups, 1661.00
is the sum of all the quantity values in the SAS
data subset.
r102d02
43Common TABLE Statement Operators
44CLASS, VAR, and TABLE Statements
- Customer_country is in the row dimension.
- Quantity is in the column dimension.
- The comma operator causes the table to have more
than one dimension.
proc tabulate dataorders class
customer_country var quantity table
customer_country, quantity run
r102d02
45Comma Operator Example
proc tabulate dataorders class
customer_country order_type table
customer_country, order_type run
- Customer_country is in the row dimension.
- Order_type is in the column dimension.
- The comma operator causes the table to have more
than one dimension.
r102d02
46Comma Operator
Column
Page
Row
table order_date, customer_country,
quantity
2 commas3 dimensions
r102d02
47(No Transcript)
48Setup for the Poll
- Given the following table examples, determine
which dimension goes in the blank.
table customer_country
1.
no commas ________ dimension
table customer_country, order_type
2.
one comma _______ and _______ dimensions
4904 Multiple Choice Poll
- With NO commas, what will the table dimension be?
- Page
- Row
- Column
5004 Multiple Choice Poll Correct Answer
- With NO commas, what will the table dimension be?
- Page
- Row
- Column
If you work backward from the semicolon, you
can always determine the correct dimension.
C
table customer_country
1.
column
no commas ________ dimension
5105 Multiple Choice Poll
- With ONE comma, what will the table dimensions
be? - Page and Row
- Page and Column
- Row and Column
5205 Multiple Choice Poll Correct Answer
- With ONE comma, what will the table dimensions
be? - Page and Row
- Page and Column
- Row and Column
R
C
2.
table customer_country, order_type
row column
one comma _______ and _______ dimensions
5305 Poll Review
- If you work backward from the semicolon, you can
always determine the correct dimension.
C
table customer_country
1.
column
no commas ________ dimension
R
C
2.
table customer_country, order_type
row column
one comma _______ and _______ dimensions
P
C
R
table customer_country, order_type, quantity
3.
page row column
two commas _______, _______, and _______
dimensions
two commas _______, _______, and _______
dimensions
54(No Transcript)
55Table Defaults
- This two-dimensional table shows all the
defaults default labels, default formats,
default statistic names. - Even though you do not specify a statistic, the
default statistic for an analysis variable is SUM.
proc tabulate dataorders class
customer_country var quantity table
customer_country, quantity run
r102d02
56PROC TABULATE Statement Options
- The FORMAT (or F) option in the PROC TABULATE
statement alters the format used for all the data
cells. Other options affect different aspects of
the report (missing values or style of data
cells).
PROC TABULATE DATAlibref.filename FORMATw.d
57Format Specification in TABULATE Statement
- PROC TABULATE supports the F or FORMAT option.
When specified in the PROC TABULATE statement,
this option controls the formats of the data
cells.
proc tabulate dataorders fcomma6. class
customer_country var quantity table
customer_country, quantity label
customer_country 'Location' quantity
'Quantity' keylabel Sum 'Total' run
r102d02
58PROC TABULATE Statements
- Other SAS statements also work within a PROC
TABULATE step
59Using the LABEL Statement
- The LABEL statement is supported by the TABULATE
procedure, as with other SAS procedures. You
specify a text string to use as the label
forthe variable, which in TABULATEoutput
becomes the columnheader label or the row
headerlabel.
proc tabulate dataorders fcomma6. class
customer_country var quantity table
customer_country, quantity label
customer_country 'Location' quantity
'Quantity' keylabel Sum 'Total' run
r102d02
60Additional PROC TABULATE Statements
- Other SAS statements also work within a PROC
TABULATE step
61Using the KEYLABEL Statement
- The KEYLABEL statement can provide a label for
keyword statistics or can suppress the appearance
of the keyword statistics as a header.
keylabel Sum 'Total'
keylabel Sum ' '
r102d02
62(No Transcript)
63The BLANK Table Operator
64Blank Operator
proc tabulate dataorders class
customer_age_group table customer_age_group
all run
The universal class variable all summarizes all
of the categories for class variables in the same
dimension. customer_age_group and all are in the
column dimension.
r102d03
65Blank Operator
proc tabulate dataorders class
customer_age_group table customer_age_group
all run
- The BLANK operator between customer_age_group and
all arranges the table for all next to the table
for customer_age_group.
r102d03
66Using an Analysis Variable
proc tabulate dataorders class
customer_age_group var quantity table
customer_age_group all,quantity label
quantity 'Quantity' run
- The space (or blank) and comma operators can be
used with all to place summary information in
any dimension.
r102d03
67Common TABLE Statement Operators
68Asterisk Operator
- The asterisk () operator is used to cross or
nest elements.
proc tabulate dataorders fcomma8. class
order_date order_type var quantity table
order_type,order_datequantity allquantity .
. .additional SAS code . . . run
r102d05
69Asterisk Operator
proc tabulate dataorders fcomma8. class
order_date order_type var quantity table
order_type,order_datequantity allquantity .
. .additional SAS code . . . run
- The asterisk () operator is used to cross or
nest elements.
r102d05
70Asterisk Operator
- In this example, the asterisk arranges the nested
variables in the column dimension.
order_date quantity
all quantity
r102d05
71Crossing in the Row and Column Dimensions
- The asterisk operator can be used for crossing in
more than one dimension.
Crossing or nesting in the ROW dimension
table order_typecustomer_country,
order_datequantity allquantity
r102d05
72(No Transcript)
73Requesting Specific Statistics
- The N and MEAN statistics are being calculated.
r102d05
74Requesting Statistics
continued
75Requesting Statistics
If a variable name (class or analysis) and a
statistic name are the same, enclose the
statistic name in single quotation marks.
76Requesting Statistics
TABLE page-var, row-var
statistic, col-var statistic
TABLE page-var, row-var,
col-var statistic
correct
TABLE page-var, row-var
statistic, col-var
correct
When you request statistics in the TABLE
statement,you can specify them in any dimension
however, allstatistics must be requested in only
one dimension.
77Incorrect Statistic Specification
- You get an error if you specify a statistic in
more than one dimension.
table order_typen, order_daten
quantitymean
ERROR There are multiple statistics associated
with a single table cell in the following nesting
Order_Type N Order_Date N. ERROR There
are multiple statistics associated with a single
table cell in the following nesting Order_Type
N Quantity Mean.
r102d05
78Incorrect Statistic Specification
- You will also get an error if you cross All with
a statistic, but you do not include an analysis
variable in the crossing.
table order_type, order_daten allmean
ERROR Statistic other than N was requested
without analysis variable in the following
nesting Order_Type All Mean.
r102d05
79(No Transcript)
80Setup for the Poll
- Which TABLE statement produces this output?
table order_type, order_daten
allquantitymean
a.
table order_type, order_daten allmean
b.
8106 Multiple Choice Poll
- Which TABLE statement produces this output?
- table order_type, order_daten
allquantitymean - table order_type, order_daten allmean
8206 Multiple Choice Poll Correct Answer
- Which TABLE statement produces this output?
- table order_type, order_daten
allquantitymean - table order_type, order_daten allmean
83(No Transcript)
84Review Changing PROC TABULATE Output
- SAS statements and global options can impact
PROC TABULATE output.
85Review Changing PROC TABULATE Defaults
- You have already seen examples of how to use
these statements and options to change some
PROC TABULATE defaults.
86Changing TABULATE Defaults
- The FORMAT, LABEL, and KEYLABEL statements and
options work as previously shown.
label Order_Date 'Order Year' format
order_date year4.
keylabel sum ' ' all'All Countries'
proc tabulate data... fcomma8.
r102d06
87Using TABLE Statement Methods
- You can also use TABLE statement modifiers and
labels to alter the output for specific variables
and/or statistics.
88Using TABLE Statement Options
- TABLE statement options also impact the table
appearance or alter the default behavior. These
options are specified after a slash in the TABLE
statement.
89The TABLE Statement Format Modifier
- You can also use PROC TABULATE specific modifiers
and options to alter the output.
90Format Modifier
- Note how the format modifier in the TABLE
statement takes precedence over the FORMAT option
in the PROC TABULATE statement.
proc tabulate dataorion.customer_orders
fcomma8. ltadditional SAS statementsgt
table order_date, customer_country all,
customer_age_groupquantitysumfcomma8
.2
r102d06
91Formats and PROC TABULATE
- These three methods all change TABULATE output
formats. Since formats can be requested in more
than one table dimension, you must be aware of
how the formats interact.
92Default Format Precedence Behavior
These rules represent the default TABULATE
behavior.
93FORMAT_PRECEDENCE Option
- The FORMAT_PRECEDENCE option is specified after
a slash in the TABLE statement.
94Format Precedence Syntax Model
- To specify which format should take precedence
in SAS9, use the FORMAT_PRECEDENCE option in
the TABLE statement
TABLE page-dimension, row-dimension,
column-dimension /
FORMAT_PRECEDENCEpagerowcolumn
95Format Specified in Two Dimensions
- The general rule is that the column format wins
over other formats, as shown in this example.
Want the row dimension format to be used, not the
column dimension format.
table customer_country'Country' all'All
Countries'fcomma10.0, total_retail_price
' 'order_date'Year' total_retail_price'T
otal'all'Price'fcomma10.2
r102d06
96Using FORMAT_PRECEDENCE
- The FORMAT_PRECEDENCE option instructs PROC
TABULATE to use the row dimension format.
table customer_country'Country' all'All
Countries'fcomma10.0, total_retail_price
' 'order_date'Year' total_retail_price'T
otal'all'Price'fcomma10.2 /
format_precedencerow
r102d06
97(No Transcript)
98Setup for the Poll
- Given the following code, which format is used
for the variable statistics?
proc tabulate dataorion.orders class
order_date customer_country
customer_age_group var quantity table
order_datefcomma8.0, customer_country
allfcomma8.3, customer_age_groupquanti
tysumf8.2 / misstext'N/A' format
order_date year4. run
9907 Multiple Choice Poll
- Given the following code, which format is used
for the variable statistics? - 8.0 (format in page dimension)
- 8.3 (format in row dimension)
- 8.2 (format in column dimension)
10007 Multiple Choice Poll Correct Answer
- Given the following code, which format is used
for the variable statistics? - 8.0 (format in page dimension)
- 8.3 (format in row dimension)
- 8.2 (format in column dimension)
The 8.2 format is used because the format in the
column definition takes precedence over the other
specified formats.
101(No Transcript)
102The TABLE Statement Label Method
- You can also use PROC TABULATE specific modifiers
and options to alter the output.
103Labels and PROC TABULATE
- By default, PROC TABULATE uses variable labels
for variables and statistic names for statistics.
If no labels exist or are specified in the PROC
TABULATE step, then the default is to use the
variable name or statistic name.
104Using Label Strings in the TABLE Statement
- This label method in the TABLE statement
overrides the LABEL statement and/or KEYLABEL
statement value.
The quoted text string to be used as a label
follows an equal sign next to the variable or
statistic name.
table order_date'Year', customer_country'Count
ry' all'All Countries', customer_age_group'Age
Group'quantity'Qty'sum' ' label
customer_country'Cust Cntry' keylabel
sum'Total'
r102d06
105Suppressing Variable Labels
- This LABEL statement does not suppress the
Quantity variable label.
label Order_Date 'Order Year' Quantity
' '
r102d07
106Suppressing Variable Labels
- The blank label specified in the TABLE statement
successfully suppresses the variable label for
Quantity.
table order_date, customer_country all,
customer_age_groupquantity' 'sumf8.0 /
misstext'N/A'
r102d07
107The MISSTEXT Option
- The MISSTEXT option is specified after a slash
in the TABLE statement.
108Using the MISSTEXT Option
- The MISSTEXT option is placed after a slash in
the TABLE statement. You can specify a text
string up to 256 characters to be used when a
cell contains missing values.
table order_date, customer_country all,
customer_age_groupquantity' 'sumf8.0 /
misstext'N/A'
r102d07
109The BOX Option
- The BOX option is specified after a slash in the
TABLE statement.
110Page Text or Box Area Position
- By default, page-dimension text is located at the
upper-left side and above the table.
111Page Text or Box Area Position
- BOX values enable you to place specific text
into the box area in any PROC TABULATE report.
112BOX_PAGE_ Option
table order_date, customer_country all,
customer_age_groupquantity' 'sumf8.0
/ misstext'N/A' box_page_
r102d08
113BOX"STRING" Option
table order_date, customer_country all,
customer_age_groupquantity' 'sumf8.0
/ misstext'N/A' box'Quantity Analysis'
r102d08
114BOX Variable Option
table order_date, customer_country' '
all, customer_age_groupquantity'
'sumf8.0 / misstext'N/A'
boxCustomer_Country
r102d08
115(No Transcript)
116Setup for the Poll
- These two tables were produced with two
different PROC TABULATE programs. Review
program r102a05 and determine which three things
are different between the two PROC TABULATE
steps.
r102a05
117Setup for the Poll
proc tabulate dataorders title
'BOX"STRING"' class order_date var
quantity table order_date' '
all,quantitysumf8.0 / box'Year'
keylabel sum ' ' all'All Countries' format
order_date year4. run proc tabulate
dataorders title 'BOXVariable' class
order_date var quantity table order_date'
' all,quantitysumf8.0 /
boxorder_date label Order_Date 'Year'
keylabel sum ' ' all'All Countries' format
order_date year4. run
r102a05
11808 Multiple Choice Poll
- Review program r102a05 and determine which three
things are different between the two PROC
TABULATE steps. - TITLE statement, BOX option, KEYLABEL statement
- BOX option, LABEL statement, FORMAT modifier
- TITLE statement, BOX option, LABEL statement
- CLASS statement, BOX option, TITLE statement
11908 Multiple Choice Poll Correct Answer
- Review program r102a05 and determine which three
things are different between the two PROC
TABULATE steps. - TITLE statement, BOX option, KEYLABEL statement
- BOX option, LABEL statement, FORMAT modifier
- TITLE statement, BOX option, LABEL statement
- CLASS statement, BOX option, TITLE statement
120(No Transcript)
121Default Column Headers
- Note how the Sum and Mean columns each contain a
header for the Quantity variable.
122Improving Table Appearance
- One way to streamline the table's appearance
would be to have the header for Quantity span
the Sum and Mean columns.
Quantity appears only once.
123Common TABLE Statement Operators
124Without Parentheses Operators
- Without the parentheses operator in your TABLE
statement, the label Quantity appears twice in
the column headers.
table order_type all, order_daten
quantitysum quantitymedian / box"No
Parentheses" label Order_Date 'Order Year'
Quantity 'Quantity'
r102d09
125Using Multiple Statistics with Parentheses
- With the parentheses operator, you can make the
label Quantity appear only once in the column
header.
table order_type all, order_daten
quantity(sum median) / box"Use
Parentheses" label Order_Date 'Order Year'
Quantity 'Quantity'
r102d09
126Using Multiple Statistics with Parentheses
- If you specify a format modifier outside the
parentheses, the format is applied to all the
statistics inside the parentheses.
table order_type all, order_daten
quantity(sum median)fcomma8.2 / box"Same
Format"
r102d09
127Using Multiple Statistics with Parentheses
- With a format modifier inside the parentheses,
the format is applied to the statistic or
variable it modifies.
table order_type all, order_daten
quantity(sumfcomma8.2 medianfcomma6.0)
/ box"Different Format"
r102d09
128Syntax for the TABULATE Procedure
General form of the TABULATE procedure with
commonly used options added
PROC TABULATE DATAlibref.filename Fformat.
CLASS classification-variable(s)
VAR analysis-variable(s) TABLE
page-var'text',
row-var'text', column-var'text'stati
stic'text'Fformat.
/MISSTEXT'text' BOXbox_option FORMAT
variable format. LABEL variable'label
' KEYLABEL statistic'label ' RUN
129(No Transcript)
130Setup for the Polls
- Build the correct TABLE statement for this sketch.
class Customer_Country Order_Type var
Quantity table __________________,
_______________________________
column dimension
row dimension
13109 Multiple Choice Poll
- What is the correct column dimension for the
desired table? - Order_TypeQuantity allSum
- Order_TypeQuantity allQuantity
- Customer_CountryQuantity allCustomer_CountryQua
ntity
13209 Multiple Choice Poll Correct Answer
- What is the correct column dimension for the
desired table? - Order_TypeQuantity allSum
- Order_TypeQuantity allQuantity
- Customer_CountryQuantity allCustomer_CountryQua
ntity
13310 Multiple Choice Poll
- What is the correct row dimension for the desired
table? - Order_Type
- ALL
- Customer_Country
- Quantity
13410 Multiple Choice Poll Correct Answer
- What is the correct row dimension for the desired
table? - Order_Type
- ALL
- Customer_Country
- Quantity
13510 Poll Review
class Customer_Country Order_Type var
Quantity table Customer_Country,
Order_TypeQuantity allQuantity
column dimension
row dimension
136Refine Correct Answer
class Customer_Country Order_Type var
Quantity table Customer_Country,
(Order_Type all) Quantity
class Customer_Country Order_Type var
Quantity table Customer_Country,
Quantity(Order_Type all)
137(No Transcript)
138Thanks for coming!