Creating a Compact Columnar Output with PROC REPORT - PowerPoint PPT Presentation

About This Presentation
Title:

Creating a Compact Columnar Output with PROC REPORT

Description:

Creating a Compact Columnar Output with PROC REPORT Walter R. Young Principal Clinical Programmer Analyst Wyeth Why create a columnar output from a data set. – PowerPoint PPT presentation

Number of Views:264
Avg rating:3.0/5.0
Slides: 41
Provided by: admin1621
Learn more at: http://www.philasug.org
Category:

less

Transcript and Presenter's Notes

Title: Creating a Compact Columnar Output with PROC REPORT


1
Creating a Compact Columnar Output with PROC
REPORT
  • Walter R. Young
  • Principal Clinical Programmer Analyst
  • Wyeth

2
Why create a columnar output from a data set.
  • For own use.
  • To show others at a meeting.
  • To put in a standardized regulatory report.
  • To put in a publication.
  • The above are ranked in approximately increasing
    effort required to create the report.
  • Authors Opinion It is the neatest, most
    heuristic way to present a data set.

3
PROC CONTENTS for ECGTEST57 Variables 28
Observations
4
Solution 1 Default PROC PRINT
  • Advantages
  • Easy and Neat for Narrow Data Set
  • Can Use ID and VAR Statements.
  • Disadvantages
  • Virtually No Beautification Options
  • Doesnt Work for a Wide Data Set
  • Wraps Neatly but One Cant Control Wrapping with
    the exception of increasing the page size.
  • Wide Columns are Truncated.

5
Default PROC PRINT
6
Default PROC REPORT
7
Solution 2 Use a Default PROC REPORT with NOWD
(Required in Batch) Option
  • Advantages
  • Output is columnar.
  • Small number of variables fit on a page.
  • Disadvantages
  • Rows are not identified for multiple pages.
  • Spacing between columns is uneven.
  • Column labels split unattractively.
  • Column order isnt optimum.
  • Wide columns cause program to bomb.

8
PROC REPORT Defaults and Properties
  • PROC was designed to run interactively.
  • Labels are used as column headers.
  • Variables are in position order.
  • Spacing is 2 including before the first column.
    Actual spacing rules are explainable but messy.
  • WRAP with NAMED options create messy output.
  • If all variables are numeric and none are
    specified as DISPLAY, they are summed instead of
    listed.
  • MISSING option is needed to print all data rows.
  • If a variable name not in the data set, is listed
    in both the COLUMN and DEFINE statements, no
    error will result.

9
PROC REPORT Justification Rules
  • Default is right for numeric - left for
    character.
  • Numerical values are right justified within
    formats which are justified within the width.
  • Character values are justified in width with
    leading blanks retained and trailing blanks
    eliminated.
  • Rules apply simultaneously to labels and values.

10
PROC REPORT SPLIT Character
  • The SPLIT character splits both labels and FLOW
    variables.
  • Choose a printable character(e.g., , , \)
    other than the default / which is common in
    many entered texts.
  • If a words length in a flowed variable is
    greater than the variables width, the word will
    split at that width.
  • To indent flowed text, insert a split character
    plus the desired number of spaces and one at the
    end of the text.
  • If there is a split character in the flowed
    variable, words at the end of the field will
    split randomly due to a SAS bug which will be
    fixed in a future SAS version. To fix this one
    either widens the field to eliminate non-indented
    flow or writes a macro to insert split characters
    where desired.
  • If unprintable printer control characters exist
    in the flowed variable, they must be removed.
    This is an uncommon problem which can happen if
    data is coming from many sources.

11
Solution 3 Use ID Statement (introduced in 6.12)
With PROC REPORT
  • Observations are identified.
  • However, All Other Default Problems Exist.
  • Wont Work if the Width of any Variable
  • Exceeds the Inherent PROC REPORT limit.
  • Plus the Width of the ID variables Plus the
    Spaces Between Columns Exceeds Line Size.
  • (In this case the FLOW option must be used.)

12
Solution 4 PROC REPORT with Minimal Options for
a Narrow Data Set
  • Must Use a COLUMN Statement (Analogous to a VAR
    statement in PROC PRINT).
  • Use a BREAK Statement for spaces between lines.
  • Use HEADLINE, HEADSKIP, BREAK Statement or
    underline to separate labels from the
    observations.
  • Specify a Constant Spacing Between Columns.
  • Customize Labels in the DEFINE statements or Use
    Variable Names (System NOLABEL option).
  • Possibly use PANEL option to minimize paper use.
  • The above gives you most of the features of using
    a PUT statement formatting (DATA _NULL_).

13
If the width of a data set wont fit within the
line size, one should make the output compact.In
a compact output, the maximum number of
observations of the variables should be made to
either fit on the width of a single page or on
the width of a minimum number of pages.Authors
Opinion Presenting the data in columns on a
single page width, neatly and informatively, is
more heuristic than presenting it on multiple
pages.
Solution 5 Use PROC REPORT With All Applicable
Options for a Wide Data Set
14
To make the report compact
  • Make every reasonable effort to limit width to a
    single page.
  • Reduce the space between columns to one.
  • Drop space before 1st column (SPACING0 in
    DEFINE).
  • Drop unnecessary variables from COLUMN statement.
  • Drop variables having the same value for all
    observations and consider putting them in a
    title, footnote or legend.
  • Sort the data by sensible variables having a fair
    number of rows for each combination in the BY
    statement and use the BY in PROC REPORT.
  • For data sets wider than a single page, pick the
    minimum of ID variables to adequately identify
    all observations. Balance the width of the non-ID
    variables across pages.
  • Use PROC FREQ to determine whether long variables
    can be coded and describe the code in a legend.

15
To make the report compact
  • Dont use the FLOW option unless necessary as it
    increases the number of lines per observation.
    Also, consider not using the SPLIT character in
    its label.
  • Dont alter any variable if proofreading.
  • Eliminate variables which have a one to one
    relationship with other variables.
  • Sensibly condense character variables.
  • Edit variables without eliminating their meaning.
  • Transfer meaning from a variable to its label.
  • Since formats can alter variable widths, apply
    them prior to calculating column widths.
  • Use the STYLE attribute, some of the 6 font
    parameters and ODS. While good for publications,
    this doesnt support a standardized line size and
    appearance.

16
For alphanumeric variables
  • Determine their maximum width in the data set.
  • If a format increases this width, use that width.
  • Consider removing any invariant prefixes or
    suffixes
  • If the FLOW parameter is required, consider the
    line size constraint, calculate the width plus
    spacing of all other variables and
  • For a single FLOW variable, use its maximum
    width.
  • For multiple FLOW variables, determine how to
    best allocate their widths to minimize lines per
    observation.
  • See if other data can be put on the added
    line(s) per observation (e.g., concatenate visit
    date, SPLIT character and visit name and use the
    FLOW option).

17
For numeric variables
  • Determine their range, maximum value and whether
    theyre integer and then specify an appropriate
    format (not the default BEST) and decimal point.
  • For date time variables, specify an appropriate
    compact format (e.g., MMDDYY6.). Separate date
    and time with DATEPART. If time is missing for
    all observations remove it from the report.
  • If it has a format which transforms it into an
    alphanumeric variable, apply the format and treat
    it as though it were an alphanumeric variable.

18
To increase the attractiveness of a compact
report
  • Appropriately order the COLUMN statement
    variables.
  • Appropriately specify ORDER variables and the
    BREAK statement (e.g., blank line between ID
    variables).
  • Use informative labels neatly spanned in the
    COLUMN statement and appropriately split in the
    DEFINE.
  • Use tricks (e.g., unprintable character at end of
    label, SPLIT character and blank at beginning of
    label, spacing 0) to separately justify labels
    and values.
  • Use neat and informative titles, footnotes and/or
    legends. If necessary, expand a labels meaning
    in a legend.

19
20 Possible Variables for Compact Output
20
Condensing and Editing LVALC and SASNAME
21
Compact Output (15 Variable) Data Step
22
Compact Output
23
Final Compact Output
24
Compact Output PROC REPORT
25
Compact Output With BY Statement
26
Compact Output With Patient ID Option
27
Compact Output With Patient ID Option
28
General ECGTEST Compact Report
  • Decide what variables to always exclude.
  • Decide constant variables for title.
  • Decide what 1 to 1 variables to include.
  • Count and remove applicable leading zeroes.
  • Determine which data condensing tricks work.
  • If not condensable, they must be output in full
  • Predetermine variables that need FLOW.
  • Determine which variables have a fixed width.
  • Calculate width of all remaining variables.
  • Use BY variables and the ID option for date.

29
Compact Output With Visit ID Option
30
Compact Output With Visit ID Option
31
Original AE PROC REPORT Code
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Changes Made to Produce Final Listing
  • Width of all variables was minimized.
  • Leading zeroes were stripped from subject it was
    concatenated with age sex and FLOW added.
  • Century was eliminated and date was output after
    subject and made an ORDER variable.
  • Body system was coded into footnotes.
  • Verbatim label indentation was corrected.
  • Labels were beautified. STUDY DAY was centered.
  • Width of indented column was maximized to
    eliminate FLOW of the concatenated variables.
  • DAI was put in data set and FLOW added.
  • The above reduced the output from 21 to 11 pages.

37
(No Transcript)
38
(No Transcript)
39
Final AE PROC REPORT Code
40
Creating an Automated AE Listing
  • Find variables needed for other projects and
  • Find if their width is variable and calculate it.
  • Make attractive labels with SPLIT characters.
  • Exclude them if they are blank (e.g., time).
  • Use minimum possible width.
  • 2 lines per observation Thus use FLOW?
  • Maximize width of the verbatim variable.
  • User should specify variables and their order.
  • Change footnotes to an automated legend.
  • Add options for the BY variables.
Write a Comment
User Comments (0)
About PowerShow.com