Statistical Methods II - PowerPoint PPT Presentation

1 / 21
About This Presentation
Title:

Statistical Methods II

Description:

Statistical Methods II Notes on SQL SQL Programming Employers increasingly tell us that they look for 2 things on a resume: SAS and SQL. You have learned A LOT of SAS ... – PowerPoint PPT presentation

Number of Views:62
Avg rating:3.0/5.0
Slides: 22
Provided by: Mauric123
Category:

less

Transcript and Presenter's Notes

Title: Statistical Methods II


1
Statistical Methods II
  • Notes on SQL

2
SQL Programming
  • Employers increasingly tell us that they look for
    2 things on a resume SAS and SQL.
  • You have learned A LOT of SASbut lets focus our
    attention on SQL.
  • In these notes you will learn
  • What SQL is
  • Why it is used
  • The basics of SQL syntax
  • And, we will go through a few REALLY fun and
    exciting examples.

3
SQL Programming
  • What is SQL?
  • SQL stands for Structured Query Language. It
    was designed as a language to manage data in
    relational database management systems (DBMS).
  • The SQL language is sub-divided into several
    language elements, including
  • Queries, which retrieve the data based on
    specific criteria. This is the most important
    element of SQL.
  • Clauses, which are constituent components of
    statements and queries.
  • Expressions, which can produce either scalar
    values or tables consisting of columns and rows
    of data.
  • Statements, which may have a persistent effect on
    schemas and data, or which may control
    transactions, program flow, connections,
    sessions, or diagnostics.
  • SQL statements also include the semicolon
    statement terminator. Though not required on
    every platform, it is defined as a standard part
    of the SQL grammar.

4
SQL Programming
  • Why is PROC SQL better than Data steps?
  • The syntax is transferable to other SQL software
    packages
  • You can join up to 250 SAS tables
  • No need to sort any of the input tables
  • Simplified syntax than in a normal SAS Data step
  • When is Proc SQL not better than Data steps?
  • Uses more memory than any regular data/procedure
    steps
  • Could take longer than other procedures when
    working with very large contributing tables
  • Logic flow becomes harder to implement

5
SQL Programming
  • Why do we use SQL?
  • SQL is used primarily to
  • Retrieve data from and manipulate tables/datasets
  • Add or modify data values in a table/datasets
  • Add, modify, or drop columns in a table/datasets
  • Create tables and views
  • Join/Merge multiple tables (whether or not they
    contain columns
  • with the same name)
  • Generate reports.

6
SQL Programming
  • Why do we use SQL?
  • You probably noticed that the previous list
    includes a lot of things that we do with DATA
    statements in SAS. In many cases, SQL is a
    better alternative to DATA statements in SAS it
    is more efficient.
  • Clarification regarding SQL in SAS
  • We use SQL like Data Statements in SASNOT like
    (most) Proc Statements. SQL is used to extract
    data, merge data and create variablesnot to
    analyze data.
  • Lets take a look

7
SQL Programming
  • Consider the Pennstate1 dataset. Lets say that
    you needed to
  • Only retain sex, earpierces, tattoos, height,
    height choice, looks and friends variables.
  • Sort by sex.
  • Delete observations with more than 4 earpierces.
  • Create a new variable called HeightDifference
    which is the difference between their current
    height and their Height Choice.
  • Create a new dataset called Modeling from the
    above requirements.

8
SQL Programming
  • My guess is that at this point, you would use a
    DATA step and your code would look something like
    this
  • Data Modeling (keep sex earprces tattoo height
    htchoice looks friends)
  • set jlp.pennstate2
  • where earprces lt4
  • Heightdiff Htchoice-Height
  • run
  • Proc sort datamodeling
  • by Sex
  • run
  • This code would run and produce what you need.

9
SQL Programming
  • Here is what this same requirement would look
    like using Proc SQL
  • proc sql
  • create table work.modeling as
  • Select sex,earprces,tattoo,height,htchoice,looks,f
    riends,
  • Htchoice-Height as HeightDiff
  • from jlp.pennstate2
  • where earprceslt4
  • order by sex
  • quit

What do you notice about this code that is
unexpected in SAS?
10
SQL Programming
  • Lets pull this apart
  • proc sql
  • ltThis is the Proc statement in SAS that calls
    SQL. Notice that there is no DATA reference.gt
  • create table work.modeling as
  • ltIn SQL order matters. If you want to retain the
    dataset (table) which is being created rather
    than just view it you must have this create
    table statement next. The syntax create table
    library.file as will create a dataset in the
    designated library with the designated filename.
    Please note that there is NO semicolon after this
    statement gt

11
SQL Programming
  • Select sex,earprces,tattoo,height,htchoice,looks,
    friends,
  • ltthis statement functions like a keep
    statement. Note that you could use an to
    simply include all variables. In SQL we use
    commas. Again, notice that there is no semicolon
    at the end of the select statementgt
  • Htchoice-Height as HeightDiff
  • ltthis part of the statement is the creation of a
    new variable HeightDiff. Notice that the
    nomenclature is Old variables then new variable.
    This is different than what we normally do in
    SASwhich is new variable and then old variablesgt
  • from jlp.pennstate2
  • ltthis part of the statement is the equivalent to
    the Set statement in a Data step. It references
    the source dataset where everything comes
    fromgt.

12
SQL Programming
  • where earprceslt4
  • ltThis part of the statement looks just like what
    we would expect to see in a Data step.gt
  • order by sex
  • ltThis part of the statement is the equivalent of
    embedding a Proc Sort in the Data step. Notice
    that since this is the end of the Proc SQL
    statement, it is concluded with a Semicolon.gt
  • quit
  • ltProc SQL ends with a quit rather than with a
    rungt

13
SQL Programming
  • Lets look at another examplelets focus on
    categorizing a variable. Consider the UCDAVIS1
    dataset.
  • Create a new dataset called UCTEST.
  • Only retain GPA, SEAT, SEX and ALCOHOL.
  • Create a new variable GPACAT which is a
    categorization of the GPA variablewhere lt2 is
    low, lt3 is medium and lt4 is high.
  • How would we do this without using SQL and using
    SQL

14
SQL Programming
  • Using a Data step, your code probably looks like
    this
  • Data UCTEST (keep GPA GPACAT SEX ALCOHOL)
  • set jlp.ucdavis1
  • Format GPACAT CHAR7.
  • If GPA . then GPACAT " "
  • else if GPA lt2 then GPACAT "LOW"
  • else if GPA lt3 then GPACAT "MEDIUM"
  • else GPACAT "HIGH"
  • Run
  • Proc print dataUCTEST
  • Run

? Why do we need this format statement?
15
SQL Programming
  • Using SQL, your code probably looks like this
  • PROC SQL
  • CREATE TABLE work.UCTEST AS
  • SELECT GPA,Sex,alcohol,
  • CASE
  • WHEN GPA . THEN '
  • WHEN GPAlt2.0 THEN 'LOW
  • WHEN GPAlt3.0 THEN 'MEDIUM
  • ELSE 'HIGH
  • END AS GPACAT
  • FROM jlp.ucdavis1
  • QUIT

What do you notice about this code that is
different from the Data step?
16
SQL Programming
  • Lets look at another example...lets focus on
    creating a new quantitative variable using a
    mathematical operator.
  • Consider the UCDAVIS1 dataset again.
  • Create a new dataset called UCTEST1.
  • Create a new variable that is called Leisure
    which is the amount of TV time plus the amount of
    Computer time.
  • Create a new variable that is 2x the sleep
    variable.
  • Only retain those sitting in the front and the
    back.
  • Sort the data by seat.
  • How would we do this without using SQL and using
    SQL

17
SQL Programming
  • Using a Data step, your code probably looks like
    this
  • Data UCTEST1 (keep TV Computer Sleepx2 Seat
    Leisure)
  • set jlp.ucdavis1
  • Leisure (TV Computer)
  • Sleepx2 Sleep2
  • If seat "Middle" then delete
  • Run
  • Proc sort data UCTEST1
  • by seat
  • Run

18
SQL Programming
  • Using SQL, your code probably looks like this
  • PROC SQL
  • CREATE TABLE work.TEST AS
  • SELECT TV, Computer, Sleep, Seat,(TV Computer)
    AS Leisure
  • FROM jlp.ucdavis1
  • WHERE SEAT IN ('Front', 'Back')
  • ORDER BY SEAT
  • QUIT

19
SQL Programming
The general form of PROC SQL includes the
following PROC SQL SELECT ltLIST THE
COLUMNS/VARIABLES TO BE INCLUDED IN THE ANALYSIS
OR NEW DATASETgt CREATE TABLE...AS ltCREATES A NEW
DATASETgt FROM ltIDENTIFY THE LIBRARY.FILENAME TO
BE USED AS THE SOURCE DATAgt WHERE ltIDENTIFY ANY
CONDITIONS HERE - LIKE ONLY OBSERVATIONS WITH A
GPA gt3.0gt ORDER BY ltCREATES A SORTING OF THE
DATAgt CASE WHEN ltCREATES A NEW VARIABLE FROM AN
OLD VARIABLE...UNLIKE IN A DATA STATEMENT, THE
VARIABLE WILL ACCOMMODATE ALL VALUE LENGTHS - NOT
JUST THE FIRST ONEgt END AS ltMUST COMPLETE A CASE
CLAUSEgt QUIT
20
SQL Programming Summary Statistics
Proc SQL syntax Description
AVG, MEAN means or average of values
COUNT, FREQ, N number of nonmissing values
CSS corrected sum of squares
CV coefficient of variation (percent)
MAX largest value
MIN smallest value
NMISS number of missing values
PRT probability of a greater absolute value of Student's t
RANGE range of values
STD standard deviation
STDERR standard error of the mean
SUM sum of values
SUMWGT sum of the WEIGHT variable values
T Student's t value for testing the hypothesis that the population mean is zero
USS uncorrected sum of squares
VAR variance

The above table can found here
http//www.tau.ac.il/cc/pages/docs/sas8/proc/zsum
func.htm
21
Any Questions?
Write a Comment
User Comments (0)
About PowerShow.com