Title: Introduction to Statistical Computing in Clinical Research
1. Introduction to Statistical Computing in Clinical Research
- Biostatistics 212
- Lecture 1
2. Today...
- Course overview
- Course objectives
- Course details: grading, homework, etc.
- Schedule, lecture overview
- Where does Stata fit in?
- Basic data analysis with Stata
- Stata demos
- Lab
3. Course Objectives
- Introduce you to using STATA and Excel for
  - Data management
  - Basic statistical and epidemiologic analysis
  - Turning raw data into presentable tables, figures, and other research products
- Prepare you for Fall courses
- Start analyzing your own data
4. Course details
- Introduction to Statistical Computing - 1 unit
- Schedule: 7 lectures, 7 lab sessions, on 7 Tuesdays in a row
- Dates: August 4 to September 15
- Lectures: 1:15-2:45
- Labs: 3:00-4:00
- All in China Basin, CBL 6702 (6704 for lab)
- Final Project due 9/22/09
5. Course details
- Introduction to Statistical Computing
- Grading: Satisfactory/Unsatisfactory
- Requirements:
  - Hand in all six Labs (even if late)
  - Satisfactory Final Project
  - 80% of total points
- Reading: Optional
6. Course details, cont.
- Course Director
  - Mark Pletcher
- Teaching Assistants
  - Justin Parekh, Section 1
  - Elena Flowers, Section 2 (Mac)
  - Tamara Castillo
  - Maurice Garcia
- Lecturers
  - Andy Choi
  - Jennifer Cocohoba
- Lab Instructor
  - Mandana Khalili
7. Overview of lecture topics
- 1- Introduction to STATA
- 2- Do files, log files, and workflow in STATA
- 3- Generating variables and manipulating data with STATA
- 4- Using Excel
- 5- Basic epidemiologic analysis with STATA
- 6- Making a figure with STATA
- 7- Organizing a project, making a table
8. Overview of labs
- Lab 1: Load a dataset and analyze it
- Lab 2: Learn how to use do and log files
- Lab 3: Import data from Excel, generate new variables and manipulate data, document everything with do and log files
- Lab 4: Using and creating Excel spreadsheets
- Lab 5: Epidemiologic analysis using Stata
- Lab 6: Making a figure with Stata
- Last lab session will be dedicated to working on the Final Project
- Labs 3 and 5 are significantly longer and harder than the others
9. Overview of labs, cont.
- Official Lab time is 3:00-4:00, but we will start right after lecture, and you can leave when you are done.
10. Overview of labs, cont.
- Labs are due the following week, prior to lecture. Labs turned in late (less than 1 week) will receive only half credit; after that, no points will be awarded. However, ALL labs must be turned in to pass the class (even if no points are awarded).
- Lab 1 is paper
- Labs 2-6 are electronic files, and should be emailed to your section leader's course email address: biostat212_section1_at_yahoo.com (Justin) or biostat212_section2_at_yahoo.com (Elena)
11. Final Project
- Create a Table and a Figure using your own data, and document the analysis using Stata.
- Due 1 week after the last lab session; 20 points docked for each day late.
12. Course Materials
- Course Overview
- Final Project
- Lectures and Labs (just in time)
- Other handouts
- Books
13. Getting started with STATA
14. Types of software packages used in clinical research
- Statistical analysis packages
- Spreadsheets
- Database programs
- Custom applications
  - Cost-effectiveness analysis (TreeAge, etc)
  - Survey analysis (SUDAAN, etc)
15. Software packages for analyzing data
- STATA
- SAS
- S-plus and R
- SPSS
- SUDAAN
- Epi-Info
- JMP
- MATLAB
- StatXact
16. Why use STATA?
- Quick start, user friendly
- Immediate results, response
- You can look at the data
- Menu-driven option
- Good graphics
- Log and do files
- Good manuals, help menu
17. Why NOT use STATA?
- SAS is used more often?
- SAS does some things STATA does not
- Programming easier with S-plus and R?
- R is free
- Complicated data structure and manipulation easier with SAS?
- Epi-Info (free) is even easier than STATA?
18. STATA - Basic functionality
- Holds data for you
  - Stata holds 1 flat file dataset only (.dta file)
- Listens to what you want
  - Type a command, press enter
- Does stuff
  - Statistics, data manipulation, etc
- Shows you the results
  - Results window
19. Demo 1
- Open the program
- Load some data
- Look at it
- Run a command
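Typed into the Command window, the four steps of this demo could look like the following. This is a minimal sketch that assumes the `auto` example dataset shipped with Stata, which may not be the dataset used in the lecture demo.

```stata
* Load an example dataset that ships with Stata (assumed here for illustration)
sysuse auto, clear

* Look at the data: open the Data Browser, then list the first five observations
browse
list make price mpg in 1/5

* Run a command: summary statistics for two variables
summarize price mpg
```

Each command is run by pressing Enter, and its output appears in the Results window.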
20. STATA - Windows
- Two basic windows
  - Command
  - Results
- Optional windows
  - Variable list
  - History of commands
- Other functions
  - Data browser/editor
  - Do file editor
  - Viewer (for log, help files, etc)
21. STATA - Buttons
- The usual: open, save, print
- Log-file open/suspend/close
- Do-file editor
- Browse and Edit
- Break
22. STATA - Menus
- Almost every command can be accessed via menu
23. Demo 2
- Enter in some data
- Look at it
- Run a couple of commands
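Data can be typed into the Data Editor, or entered from the Command window with `input`. The sketch below takes the second route; the three observations and the variable names `id`, `age`, and `sex` are made up for illustration and are not from the lecture.

```stata
* Start with an empty dataset
clear

* Enter three made-up observations: id, age in years, and sex (a 1-character string)
input id age str1 sex
1 34 "F"
2 51 "M"
3 47 "F"
end

* Look at what was entered, then run a couple of commands
list
summarize age
tabulate sex
```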
24. Menu vs. Command line
- Menu advantages
  - Look for commands you don't know about
  - See the options for each command
  - Complex commands are easier, and you learn their syntax
- Command line advantages
  - Faster (if you know the command!)
  - Closer to the program
  - Only way to write do files
  - Document and repeat analyses
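To illustrate why typed commands make an analysis documentable and repeatable, here is a minimal sketch of a do file. The file names `example.do` and `example.log` are hypothetical, and the `auto` dataset shipped with Stata stands in for real study data.

```stata
* example.do (hypothetical file name): a short, repeatable analysis

* Close any log left open by an earlier run; capture ignores the error if none is open
capture log close

* Record every command and its output in a plain-text log file
log using example.log, replace text

sysuse auto, clear
summarize price mpg
tabulate foreign

log close
```

Running `do example.do` repeats exactly the same steps and rewrites the log, which is hard to guarantee when clicking through menus.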
25. STATA commands: Describing your data
- describe varlist
  - Displays variable names, types, labels
- list varlist
  - Displays the values of all observations
- codebook varlist
  - Displays labels and codes for all variables
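As a minimal sketch of these three commands, assuming the `auto` example dataset shipped with Stata (the variable names below belong to that dataset, not to the course data):

```stata
sysuse auto, clear

* Variable names, storage types, and labels for the whole dataset
describe

* Values of selected variables; "in 1/10" limits the listing to the first 10 observations
list make price mpg in 1/10

* Codes, value labels, and missing-value counts for two variables
codebook rep78 foreign
```

Leaving out the varlist applies each command to every variable, which can produce a lot of output for `list` and `codebook` on a large dataset.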
26. STATA commands: Descriptive statistics, continuous data
- summarize varlist [, detail]
  - Obs, mean, SD, range
  - , detail gets you more detail (median, etc)
- ci varlist
  - Mean, standard error of mean, and confidence intervals
  - Actually works for dichotomous variables, too.
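A short sketch of both commands, assuming the `auto` example dataset shipped with Stata. The `ci` syntax on the slide is the Stata 10 form; newer releases document it as `ci means varlist` and `ci proportions varlist`, so the sketch uses a `version 10:` prefix to request the older interpretation.

```stata
sysuse auto, clear

* Observations, mean, SD, minimum, and maximum
summarize price mpg

* Adds percentiles (including the median), skewness, and kurtosis
summarize price, detail

* Mean, standard error, and 95% confidence interval (Stata 10 syntax)
version 10: ci mpg

* For a 0/1 variable, the binomial option gives an exact interval for the proportion
version 10: ci foreign, binomial
```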
27. STATA commands: Graphical exploration, continuous data
- histogram varname
  - Simple histogram of your variable
- graph box varlist
  - Box plot of your variable
- qnorm varname
  - Plots the quantiles of your variable against normal quantiles, to check normality
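The three plots can be sketched as follows, assuming the `auto` example dataset shipped with Stata. Each new graph replaces the previous one in the Graph window.

```stata
sysuse auto, clear

* Histogram of miles per gallon
histogram mpg

* Box plot of the same variable, then one box per category of foreign
graph box mpg
graph box mpg, over(foreign)

* Points close to the diagonal line suggest an approximately normal distribution
qnorm mpg
```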
28. STATA commands: Descriptive statistics, categorical data
- tabulate varname
- Counts and percentages
- (see also, table - this is very different!)
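A minimal sketch, assuming the `auto` example dataset shipped with Stata, where `rep78` is a 5-level repair-record variable with some missing values:

```stata
sysuse auto, clear

* Counts and percentages; observations with missing rep78 are left out
tabulate rep78

* The missing option adds a row for the missing values
tabulate rep78, missing

* table is a separate command meant for tables of summary statistics;
* with no options it shows only frequencies
table rep78
```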
29. STATA commands: Analytic statistics, 2 categorical variables
- tabulate var1 var2
  - Cross-tab
- Descriptive options
  - , row (row percentages)
  - , col (column percentages)
- Statistics options
  - , chi2 (chi-squared test)
  - , exact (Fisher's exact test)
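These options can be combined in one command. A minimal sketch, assuming the `auto` example dataset shipped with Stata (`foreign` has 2 levels and `rep78` has 5):

```stata
sysuse auto, clear

* Cross-tab with row percentages, then with column percentages
tabulate foreign rep78, row
tabulate foreign rep78, col

* Same cross-tab with a chi-squared test and Fisher's exact test
tabulate foreign rep78, chi2 exact
```

Fisher's exact test is usually the one to report when some cells have small expected counts, as several do in this table.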
31. Getting help
- Try to find the command on the pull-down menus
- Help menu
  - If you don't know the command: Search...
  - If you know the command: Stata command...
- Try the manuals
  - More detail, theoretical underpinnings, etc
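The two Help menu items also have command-line equivalents. A small sketch (the search keywords are only an example):

```stata
* You know the command name: open its help file in the Viewer
help tabulate

* You do not know the command name: search by keyword
search fisher exact
```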
32. STATA commands: Analytic statistics, 1 categorical, 1 continuous
- bysort catvar: summarize contvar
  - Mean, SD, and range of contvar within each subgroup of catvar
- ttest contvar, by(catvar)
  - t-test
- oneway contvar catvar
  - ANOVA
- table catvar, contents(mean contvar)
  - Table of statistics
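A minimal sketch of the four commands, assuming the `auto` example dataset shipped with Stata, with `mpg` as the continuous variable. The `contents()` option is the Stata 10 form shown on the slide; from Stata 17 the same table is written `table foreign, statistic(mean mpg)`, so the sketch uses a `version 10:` prefix.

```stata
sysuse auto, clear

* Mean, SD, and range of mpg within each level of foreign
bysort foreign: summarize mpg

* Two-sample t-test comparing mean mpg between the two groups
ttest mpg, by(foreign)

* One-way ANOVA across the five repair-record categories, with a table of group means
oneway mpg rep78, tabulate

* Table of mean mpg by group (Stata 10 syntax)
version 10: table foreign, contents(mean mpg)
```

`ttest` needs a grouping variable with exactly two levels; for more than two groups, `oneway` is the matching choice.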
34. STATA commands: Analytic statistics, 2 continuous
- scatter var1 var2
  - Scatterplot of the two variables (var1 on the y-axis, var2 on the x-axis)
- pwcorr varlist [, sig]
  - Pairwise correlations between variables
  - sig option gives p-values
- spearman varlist [, stats(rho p)]
  - Spearman rank correlations, with p-values
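A minimal sketch of these commands, assuming the `auto` example dataset shipped with Stata:

```stata
sysuse auto, clear

* mpg on the y-axis, weight on the x-axis
scatter mpg weight

* Pearson correlations for every pair of variables, with p-values
pwcorr mpg weight price, sig

* Spearman rank correlation and its p-value
spearman mpg weight, stats(rho p)
```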
36. Demo 3
- Load a STATA dataset
- Explore the data
- Describe the data
- Answer some simple research questions
  - Gender and hypertension (HTN), age and HTN
37. In Lab Today
- Familiarize yourself with Stata
- Load a dataset
- Use Stata commands to analyze data and fill in the blanks
38. Next week
- Do files, log files, and workflow in Stata
- Find a dataset!
39. Website addresses
- Course website
  - http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html
- Computing information
  - http://www.epibiostat.ucsf.edu/courses/ChinaBasinLocation.html#computing
- Download RDP for Macs (for Stata 10 Server)
  - http://www.microsoft.com/mac/otherproducts/otherproducts.aspx?pid=remotedesktopclient
- Citrix Web Server
  - http://apps.epi-ucsf.org/