Oswald - PowerPoint PPT Presentation

1 / 205
About This Presentation
Title:

Oswald

Description:

1530-1600 afternoon tea. 1600-1730 hands-on session. Oswald/Windows. 3. Outline - Day 2 ... 1530-1600 afternoon tea. 1600-1730 free laboratory session. Oswald ... – PowerPoint PPT presentation

Number of Views:128
Avg rating:3.0/5.0
Slides: 206
Provided by: davidm126
Category:
Tags: afternoon | oswald | tea

less

Transcript and Presenter's Notes

Title: Oswald


1
Oswald
  • for Windows
  • Lancaster University 7-9 June 1998

MathSoft International Knightway House Park
Street Bagshot, Surrey GU19 5AQ United Kingdom
David Smith S-PLUS Product Manager Phone 44
(0)1276 452299 ext. 212 Fax 44 (0)1276
451224 E-mail dsmith_at_mathsoft.co.uk
2
Outline - Day 1
  • 0915-1045 Intro to S-plus and Oswald
  • 1045-1115 coffee
  • 1115-1245 hands-on session
  • 1245-1400 lunch
  • 1400-1530 Longitudinal data
  • 1530-1600 afternoon tea
  • 1600-1730 hands-on session

3
Outline - Day 2
  • 0915-1045 Exploring longitudinal data
  • 1045-1115 coffee
  • 1115-1245 hands-on session
  • 1245-1400 lunch
  • 1400-1530 Basic modelling
  • 1530-1600 afternoon tea
  • 1600-1730 hands-on session

4
Outline - Day 3
  • 0915-1045 Advanced modelling
  • 1045-1115 coffee
  • 1115-1245 hands-on session
  • 1245-1400 lunch
  • 1400-1530 Other topics
  • 1530-1600 afternoon tea
  • 1600-1730 free laboratory session

5
Day 1
  • Introduction to S-plus and Oswald

6
Day 1 Session 1
  • What is S-PLUS?
  • Oswald overview
  • Starting S-PLUS for Windows
  • The S command language
  • Fundamentals -- expressions and assignments
  • Vectors, matrices and lists
  • Calling functions

7
S-PLUS History
  • 1980s S Language developed by ATT Bell Labs
  • Added value S-plus distributed by StatSci Inc
  • 1992 Mathsoft acquires Statsci, Axum
  • 1997 S-plus 4.0 released
  • Axum Graphics, S Language, Office GUI
  • 1998 S-plus 4.5 (Windows) S-plus 5.0 (Unix)
    released
  • Best-regarded statistical research tool in the
    industry
  • 50,000 users world-wide

8
S-PLUS 4.5For exploratory data analysis and
statistical modelling
  • From basic statistics to cutting edge statistics
  • Customisable and extensible GUI
  • For data analysts and researchers of all levels
  • Powerful, object-oriented S language
  • With thousands of built-in functions
  • Total control of data and graphics
  • Worldwide S-PLUS users group

9
Book Recommendations
  • Modern Applied Statistics with S-plus
  • W.N. Venables and B.D. Ripley
  • Springer 1997 (second edition)
  • The New S Language A programming environment for
    data analysis and graphics
  • Becker, Chambers Wilks
  • Wadsworth Brooks/Cole 1988
  • Analysis of Longitudinal Data
  • Diggle, Liang Zeger
  • Oxford University Press 1994

10
Oswald - Introduction
  • Object-oriented Software for the Analysis of
    Longitudinal Data

11
Longitudinal Data
  • A (largish) number of subjects
  • people, households, rats, cows,
  • A measurement taken repeatedly on each subjects
    over a (smallish) number of times
  • Daily blood-pressure on 65 patients taken over 2
    weeks. 30 patients on control drug, 35 on new
    anti-hypertensive drug.

12
Oswald features
  • New data type for longitudinal data
  • Subsample selection
  • Creating replicated covariates
  • Parallel plots
  • Shadow plots
  • Available for Windows and Unix
  • Analysis of longitudinal data
  • Kernel smoothing
  • Rice-Silverman C-V
  • Variogram
  • Mixed modelling
  • Dropout modelling
  • Comes with GEE and ALR

13
Using Oswald
  • From the S-plus prompt
  • library(oswald)
  • The Oswald function and datasets are now
    available.

14
Starting S-PLUS
15
(No Transcript)
16
Object Browser
  • S-plus is object-oriented
  • data, functions, analysis results, plots, ...
  • persistence (press DEL to delete)
  • The Object Browser organises all the objects you
    create
  • Left pane -- objects by classification
  • Right pane -- list/components of objects
  • Completely customisable

17
Getting Data
  • Data Select Data
  • Select existing data, or import data from a file
  • Dialog appears at startup (by default)
  • Import data (later)
  • Existing data enter the name of an S-plus object
    in the Name box and click OK.

18
Data Windows
  • Data appears in a data window
  • Like a spreadsheet, but column-oriented
  • Includes column names and row labels
  • Click on a column name to select a column
  • Shift-click to select a block of columns
  • Ctrl-click for non-adjacent columns
  • Order of selection is important!
  • Selected columns used for plots, analysis, ...

19
Simple graphs
  • Open the 2D or 3D plot palette
  • Select column(s) for plotting from data window
  • First ctrl-click -- x axis
  • Second ctrl-click -- y axis
  • (Third ctrl-click -- z axis)
  • Click on the palette button corresponding to the
    desired plot

20
Trellis Graphs
  • You can condition any plot by the value of any
    other variable in the data
  • Create a plot
  • Select the conditioning column in the data
    window, and drag to the title area
  • Be sure to start dragging from the data
  • Cursor changes to when ready to drop
  • Plot is redrawn with panels for subranges of the
    conditioning variable

21
Example Ozone data
  • Select Data
  • Histogram of Ozone
  • Scatterplot Temp vs Ozone
  • With loess line
  • Trellis conditioning on Wind
  • 3-D plot Temp vs Wind vs Ozone
  • Vary colour by Radiation
  • Export graphics

22
The Command Window
  • Behind all of the menus and toolbars lies the S
    command language
  • A complete programming language
  • expressions, data types, loops, functions,
  • object oriented
  • We will mostly use Oswald by entering commands in
    the S language using the Command Window
  • Click on to open.

23
S Language absolute basics
  • Enter an expression at the prompt gt
  • S-plus prints out the result (usually)
  • gt 11
  • 1 2
  • gt pi
  • 1 3.141593
  • gt sin((1sqrt(5))/2pi)
  • 1 -0.9320324

24
Secondary prompt
  • If you fail to complete the expression, continue
    at the prompt
  • gt exp(1/sqrt(2) -
  • 0.5)
  • 1 1.230114
  • If you get stuck, enter lots of )s
  • gt log(1sqrt(5 sin(2)
  • )))))
  • Syntax error No opening parenthesis, before ")"
    at this point
  • log(1sqrt(5 sin(2)

25
Vectors
  • Many functions return vectors
  • gt rnorm(6)
  • 1 -1.4217905 0.7963978 -0.2487190 -0.8279492
  • 5 0.7495827 1.0976972
  • The n on the left shows where the row starts.
    A single number is a length 1 vector
  • gt mean(rpois(50, mean5))
  • 1 4.74

26
Making vectors
  • Sequences ab
  • gt 110
  • 1 1 2 3 4 5 6 7 8 9 10
  • gt 5045
  • 1 50 49 48 47 46 45
  • Concatenation c(v1,v2,v3,)
  • gt c(6,2,1)
  • 1 6 2 1
  • gt c(15,51)
  • 1 1 2 3 4 5 5 4 3 2 1

27
Assignment
  • Store results in objects with lt- (gets)
  • gt a lt- 110
  • The result is not printed until you enter the
    object name by itself
  • gt a
  • 1 1 2 3 4 5 6 7 8 9 10

28
Object Names
  • Valid object names may contain only
  • Letters abcXYZ
  • Numbers 0123456789
  • Dot .
  • Valid names
  • y Y weight x.var LD50 .my.dat sum
  • Invalid names
  • m-1 heart_rate 9th T F NA

29
Handling objects
  • List all your objects
  • gt objects()
  • 1 ".Last.value" ".Random.seed"
  • 3 ".ldats.options" "a
  • Objects are persistent and remain until removed
    (even if you quit S-plus)
  • gt remove("a")
  • gt a
  • Error Object "a" not found
  • Or, use the Object Browser

30
Objects as variables
  • Objects can be used in expressions
  • gt a lt- 15
  • gt sum(a)
  • 1 15
  • gt b lt- c(a, 10)
  • gt length(b)
  • 1 6
  • gt 2b
  • 1 2 4 6 8 10 20

31
Vector arithmetic
  • Scalar functions work elementwise
  • gt a lt- 14
  • gt sqrt(a)
  • 1 1.000000 1.414214 1.732051 2.000000
  • Scalar and vector arithmetic
  • gt 2a
  • 1 2 4 6 8
  • gt 2a log(a)
  • 1 2.000000 4.693147 7.098612 9.386294

32
Logical vectors
  • Expressions with relational operators return
    logical vectors
  • ab agt5 blt0 a!0
  • T is True, F is False
  • gt a lt- rnorm(5)
  • gt a
  • 1 -0.1495632 0.1389647 2.3571278 2.1495335
  • 5 -1.8157597
  • gt a lt 0
  • 1 T F F F T

33
Missing values -- NA
  • The missing value is represented by NA
  • gt a lt- c(1, NA, 2)
  • Most operations on NA return NA
  • gt a 1
  • 1 2 NA 3
  • gt sum(a)
  • 1 NA
  • is.na checks for missing values
  • gt is.na(a)
  • 1 F T F

Not aNA
34
Vector indexing
  • Use to select elements of a vector
  • gt a lt- c(2,3,5,7,11,13,17)
  • gt a1
  • 1 2
  • gt a35
  • 1 5 7 11
  • gt ac(1,3,5)
  • 1 2 5 11
  • Negative indices remove elements
  • gt a-(13)
  • 1 7 11 13 17

35
Logical indices
  • A logical index selects elements
  • index as long as the indexed vector
  • gt b lt- rnorm(8)
  • gt bblt0
  • 1 -0.04100 -0.86600 -0.07229
  • gt log.b lt- log(b)
  • Warning messages
  • NAs generated in log(x)
  • gt bis.na(log.b)
  • 1 -0.04100 -0.86600 -0.07229
  • gt log.b!is.na(log.b)
  • 1 0.7242 0.9154 0.6976 -0.1279 0.2408

36
Replacement
  • You can also use on the left hand side of an
    assignment
  • gt a lt- sample(110)
  • gt a
  • 1 2 1 10 9 7 8 3 6 4 5
  • gt a5 lt- NA
  • gt a
  • 1 2 1 10 9 NA 8 3 6 4 5
  • gt ais.na(a) lt- 0
  • gt a
  • 1 2 1 10 9 0 8 3 6 4 5

37
Calling functions
  • Functions are called like this
  • function.name(argument, argument, ...)
  • Functions always return a value
  • NULL represents no value
  • Function arguments have a
  • position (first, second, )
  • name
  • default value (sometimes)

38
Example rep
  • rep(x, times inferred, length.out
    inferred)
  • gt rep(13,2)
  • 1 1 2 3 1 2 3
  • gt rep(13,length8)
  • 1 1 2 3 1 2 3 1 2
  • gt rep(13,c(3,2,1))
  • 1 1 1 1 2 2 3
  • gt rep()
  • Error in rep Argument "x" is missing, with no
  • default rep()
  • Dumped

39
Example seq
  • seq(from1, toend, by1,
  • lengthinferred, alongNULL)
  • gt seq(1,5)
  • 1 1 2 3 4 5
  • gt seq(10, 20, length6)
  • 1 10 12 14 16 18 20
  • gt seq(to100, by15, length7)
  • 1 10 25 40 55 70 85 100
  • gt seq(length10)
  • 1 1 2 3 4 5 6 7 8 9 10

40
Matrices
  • matrix(dataNA, nrowinferred, ncolinferred,
    byrowF, dimnamesNULL)
  • gt X lt- matrix(110, nrow2)
  • gt X
  • ,1 ,2 ,3 ,4 ,5
  • 1, 1 3 5 7 9
  • 2, 2 4 6 8 10
  • gt dim(X)
  • 1 2 5

41
Matrix indices
  • gt X2 lt- matrix(115, nrow3, byrowT)
  • gt X22,3
  • 1 8
  • gt X212,
  • ,1 ,2 ,3 ,4 ,5
  • 1, 1 2 3 4 5
  • 2, 6 7 8 9 10
  • gt X223, 35
  • ,1 ,2 ,3
  • 1, 8 9 10
  • 2, 13 14 15
  • gt X2,1
  • 1 1 6 11

gt X2,1,dropF ,1 1, 1 2,
6 3, 11
42
Transpose and multiply
  • Transpose t(X)
  • gt X lt- cbind(rep(1,5),15)
  • gt t(X)
  • ,1 ,2 ,3 ,4 ,5
  • 1, 1 1 1 1 1
  • 2, 1 2 3 4 5
  • Matrix multiply
  • gt t(X) X
  • ,1 ,2
  • 1, 5 15
  • 2, 15 55

43
Matrix algebra
  • Matrix inverse solve
  • gt solve(t(X) X)
  • ,1 ,2
  • 1, 1.1 -0.3
  • 2, -0.3 0.1
  • Decompositions
  • Eigenvector (eigen)
  • Singular value (svd)
  • Cholesky (chol)

gt eigen(t(X)X) values 1 1.1831
0.0169 vectors ,1 ,2 1,
0.9637 0.2669 2, -0.2669 0.9637
44
Lists
  • An ordered collection of arbitrary objects
  • gt mylist lt- list(dat15, name"short",
  • locc(1.0,-2.5))
  • gt mylist
  • dat
  • 1 1 2 3 4 5
  • name
  • 1 "short"
  • loc
  • 1 1.0 -2.5

gt mylistname 1 "short" gt mylist3 1 1.0
-2.5 gt names(mylist) 1 "dat" "name" "loc"
45
Getting help
  • Help on functions
  • ?sample
  • help(sample)
  • General help Help menu
  • Function index, keyword search
  • On-line manuals (Users/Programmers/Statistics)
  • Visual demo

46
Stop!
  • To cancel the current command, press ESC
  • Cancel long outputs
  • Abort long computations
  • To quit from S-plus (command prompt)
  • gt q()
  • Or File Exit from menu bar

47
Day 1 Session 2
  • Practical
  • S-plus tutorial

48
Day 1 Session 3
  • Reading data and data import
  • Data frames and factors
  • ldframe objects
  • adding columns
  • sub-selections
  • summary
  • ssv/tsv

49
Reading in data
  • File Import Data From File
  • Choose file and file type from dialog

50
Data import options
  • Select the Options tab for import options
  • By default, ASCII files are assumed to have
  • column labels on row 1 (blank rows ignored)
  • data from row 2 onwards
  • columns separated by whitespace or comma
  • Change defaults with import options
  • First data line, separators, column spec
  • NB .dat files assumed to be Gauss files!

51
Data objects
  • When data are imported, the data are displayed in
    a data window and a new data object is created
  • data frame
  • The name of the data object is chosen from the
    file name, or can be specified in the import
    dialog.

52
Data Frames
  • data.frame
  • usually used for data
  • Like a matrix with named columns
  • columns may be of different types
  • rows have labels
  • gt catalyst
  • Temp Conc Cat Yield
  • 1 160 20 A 60
  • 2 180 20 A 72
  • 3 160 40 A 54
  • 4 180 40 A 68
  • 5 160 20 B 52
  • 6 180 20 B 83
  • 7 160 40 B 45
  • 8 180 40 B 80

53
Data frame matrix?
  • You can operate on a data frame as if it were a
    matrix
  • gt catalyst15,-3
  • Temp Conc Yield
  • 1 160 20 60
  • 2 180 20 72
  • 3 160 40 54
  • 4 180 40 68
  • 5 160 20 52
  • But numeric functions usually dont work
  • gt mean(catalyst,-3)
  • Error in as.double Cannot coerce mode list to
  • double .Data list(..
  • Dumped
  • gt mean(
  • data.matrix(catalyst,-3))
  • 1 22.42

54
Data frame list
  • A data frame is actually a list whose components
    are all of equal length
  • gt names(catalyst)
  • 1 "Temp" "Conc" "Cat" "Yield"
  • gt catalystConc
  • 1 20 20 40 40 20 20 40 40
  • gt catalystnewcol lt- rnorm(8)

55
Data frame summary
  • gt summary(catalyst)
  • Temp Conc Cat
  • 1604 204 A4
  • 1804 404 B4
  • Yield newcol
  • Min. 45.0 Min. -0.9880
  • 1st Qu.53.5 1st Qu.-0.3780
  • Median 64.0 Median -0.1610
  • Mean 64.2 Mean -0.0075
  • 3rd Qu.74.0 3rd Qu. 0.2890
  • Max. 83.0 Max. 1.3700

56
Factors
  • A factor is a vector whose elements are
    categories (levels)
  • Used for categorical data in modelling
  • To check class of object class(object)
  • gt f lt- as.factor(str) convert string to factor
  • gt str lt- as.character(f) convert factor to
    string
  • gt val lt- as.numeric(as.character(f))
  • convert factor with numeric levels to numbers

57
Factor example
  • gt class(catalystConc)
  • 1 "factor"
  • gt levels(catalystConc)
  • 1 "20" "40"
  • gt catalystConc2 lt- 30
  • Warning messages
  • replacement values not all in levels(x) NA's
  • generated in "lt-.factor"(.A0, 2, value .A1)
  • gt catalystConc
  • 1 2 3 4 5 6 7 8
  • 20 NA 40 40 20 20 40 40

58
Longitudinal data
  • We have
  • m subjects, each with a subject id
  • subject ids are usually 1, 2, 3, , m
  • ni events for each subject i1,,m
  • responses, covariates
  • each event occurs at a particular time tij
  • We will use a rectangular data structure with one
    row per event.

59
Rat bodyweight data
  • gt rats.ldf
  • y Time Subject group
  • 1 57 1 1 Control
  • 2 86 2 1 Control
  • 3 114 3 1 Control
  • 4 139 4 1 Control
  • 5 172 5 1 Control
  • 6 60 1 2 Control
  • 51 59 1 11 Thyroxin
  • 52 85 2 11 Thyroxin
  • 134 104 4 27 Thiouracil
  • 135 122 5 27 Thiouracil

60
ldframe objects
  • Like data frames, but required to have these two
    columns (at least)
  • Time the time point of the event.
  • index number (1,2,3, )
  • days since start of study
  • weeks since calving
  • Subject the subject id code
  • identifies which events belong to which subject.
  • all observations on each subject must appear in a
    contiguous block

61
ldframe object -- data
  • Other columns are used for data
  • responses
  • its best to use the first column for the most
    important response
  • covariates
  • constant covariates (e.g. Sex) are simply
    repeated for each event within each subject
  • These columns can have any (legal) name

62
Converting data.frame to ldframe
  • Convert data frame to ldframe with as.ldframe
  • my.ldf lt- as.ldframe(my.df)
  • If the data frame does not have Subject and/or
    Time columns, specify them here.
  • names(Orthodont)
  • 1 "Subject" "Sex" "age" "distance"
  • Orthodont.ldf lt- as.ldframe(Orthodont,Timeage)

63
Two-way indexing
  • , indexing as for matrix or data.frame
  • gt rats.ldf12,
  • y Time Subject group
  • 1 57 1 1 Control
  • 2 86 2 1 Control
  • gt mean(rats.ldf,"y")
  • 1 100.8
  • gt rats.ldf123,"group"
  • 1 Thiouracil
  • Levels
  • 1 "Control" "Thiouracil" "Thyroxin"

64
Column operations
  • Select columns using list () notation
  • gt gp lt- rats.ldfgroup
  • gt table(gp)
  • Control Thiouracil Thyroxin
  • 50 50 35
  • Add or delete columns using notation
  • gt rats.ldfbodymass lt- newdata
  • gt rats.ldfy lt- NULL

65
Examples
  • First 3 observations from all subjects
  • gt rats.ldfrats.ldfTimelt3,
  • All from the Thyroxin group
  • gt rats.ldfrats.ldfgroupThyroxin,
  • Large responses from the Control group
  • gt rats.ldfrats.ldfgroup"Control"
  • rats.ldfygt170,
  • y Time Subject group
  • 5 172 5 1 Control
  • 10 177 5 2 Control
  • 15 185 5 3 Control

66
Occurrences -- nths
  • nths returns an index vector to nth within-block
    occurrences in a block-sorted vector
  • nths(n, vec)
  • vec is the vector to index
  • n1 first occurrences
  • n13 first three
  • n-1 last occurrence
  • nc(1,-1) first and last

67
Example
  • gt vec
  • 1 a a a b b b b b c c c c c c d d
  • gt nths(1,vec)
  • 1 1 4 9 15
  • gt nths(13,vec)
  • 1 1 2 3 4 5 6 9 10 11 15 16
  • gt nths(-1,vec)
  • 1 3 8 14 16
  • gt nths(c(1,-1),vec)
  • 1 1 3 4 8 9 14 15 16

4
9
15
68
Practical example
  • gt idx lt- nths(c(1,-1),rats.ldfSubject)
  • gt rats.ldfidx,
  • y Time Subject group
  • 1 57 1 1 Control
  • 5 172 5 1 Control
  • 6 60 1 2 Control
  • 10 177 5 2 Control
  • 11 52 1 3 Control
  • 15 185 5 3 Control
  • 16 49 1 4 Control
  • 20 164 5 4 Control
  • ...

69
Selecting by subject
  • Use a single index to select all events for
    selected subject(s)
  • gt rats.ldf1
  • y Time Subject group
  • 1 57 1 1 Control
  • 2 86 2 1 Control
  • 3 114 3 1 Control
  • 4 139 4 1 Control
  • 5 172 5 1 Control

70
More than one subject
  • gt rats.ldfc(10,27)
  • y Time Subject group
  • 46 57 1 10 Control
  • 47 82 2 10 Control
  • 48 110 3 10 Control
  • 49 139 4 10 Control
  • 50 169 5 10 Control
  • 131 53 1 27 Thiouracil
  • 132 72 2 27 Thiouracil
  • 133 89 3 27 Thiouracil
  • 134 104 4 27 Thiouracil
  • 135 122 5 27 Thiouracil

71
Selecting by subject id
  • To select by subject id, use quotes
  • gt rats.ldf.rev lt- rats.ldf271
  • gt rats.ldf.rev1,
  • y Time Subject group
  • 131 53 1 27 Thiouracil
  • gt rats.ldf.rev"15"12,
  • y Time Subject group
  • 71 57 1 15 Thyroxin
  • 72 72 2 15 Thyroxin

Note row labels stay with the correct event even
after the data has been reordered.
72
Bad ordering
  • Selecting so that subjects arent in blocks
    anymore
  • gt rats.perm lt- rats.ldfsample(1135),
  • Warning messages
  • Selection resulted in invalid ldframe object
  • converted to data.frame in ".ldframe\
  • "(rats.ldf, sample(1135), )
  • gt rats.perm13,
  • y Time Subject group
  • 30 153 5 6 Control
  • 8 123 3 2 Control
  • 29 131 4 6 Control
  • gt rats.perm lt- rats.ldfsample(127)

73
Replication
  • Some variables may be subject-specific
  • Constant (replicated) within a subject id
  • e.g. Sex, Treatment Group
  • Some variables may be time-specific
  • Constant (replicated) within a time point
  • e.g. daily temperature, regression dummy variables

74
ldframe summary
  • gt summary(rats.ldf)
  • Longitudinal data frame (ldframe) object
  • Number of subjects 27
  • Number of time points 5
  • Total events 135
  • Distribution of number of observations per
  • Subject Time
  • 527 275
  • Subject-specific variables
  • group
  • Control 10
  • Thiouracil10
  • Thyroxin 7
  • Time-specific variables none
  • Non-replicated variables
  • y
  • Min. 46
  • 1st Qu. 71
  • Median 100
  • Mean 101
  • 3rd Qu.122
  • Max. 189

75
ssv
  • ssv(ldframe)returns all subject-specific columns,
    condensed into one-per-subject
  • gt ssv(rats.ldf)
  • Subject group
  • 1 1 Control
  • 17 17 Thyroxin
  • 18 18 Thiouracil
  • 27 27 Thiouracil

76
selective ssv
  • To select a specific subject-specific column
  • gt gps lt- ssv(rats.ldf,"group")
  • gt data.class(gps)
  • 1 "data.frame"
  • gt table(gpsgroup)
  • Control Thiouracil Thyroxin
  • 10 10 7
  • gt ssv(rats.ldf,"y")
  • NULL
  • Warning messages
  • Variable y exists, but is not
  • subject-specific in ssv(rats.ldf, "y")

77
Adding columns with ssv
  • ssv(obj,"sex") lt- dat
  • gt sex lt- rep(c("m","f"),c(10,17))
  • gt ssv(rats.ldf,"sex") lt- sex

ldframe object
new col name
vector as long as number of subjects in obj
78
ssv results
  • gt rats.ldfc(1,27)
  • y Time Subject group sex
  • 1 57 1 1 Control m
  • 2 86 2 1 Control m
  • 3 114 3 1 Control m
  • 4 139 4 1 Control m
  • 5 172 5 1 Control m
  • 131 53 1 27 Thiouracil f
  • 132 72 2 27 Thiouracil f
  • 133 89 3 27 Thiouracil f
  • 134 104 4 27 Thiouracil f
  • 135 122 5 27 Thiouracil f

79
Another example
  • gt span lt- function(x) xlength(x)-x1
  • gt growth lt- tapply(rats.ldfy, rats.ldfSubject,
  • span)
  • gt growth
  • 1 2 3 4 5 6 7 8 9 10 11 12
  • 115 117 133 115 95 107 90 91 91 112 122 84
  • 13 14 15 16 17 18 19 20 21 22 23 24 25 26
  • 133 118 87 88 119 68 63 80 63 89 68 52 82 61
  • 27
  • 69
  • gt ssv(rats.ldf,"growth") lt- growth

80
Time-specific variables
  • tsv is like ssv for time-specific variables
  • varies within subject
  • constant for each time point
  • Sometimes data are unbalanced -- each subject
    measured at different times
  • in this case tsv requires a value for each unique
    time point

81
tsv example
  • gt tsv(rats.ldf, "temp") lt- c(21,20,25,28,35)
  • gt rats.ldf1011
  • y Time Subject group growth temp
  • 46 57 1 10 Control 112 21
  • 47 82 2 10 Control 112 20
  • 48 110 3 10 Control 112 25
  • 49 139 4 10 Control 112 28
  • 50 169 5 10 Control 112 35
  • 51 59 1 11 Thyroxin 122 21
  • 52 85 2 11 Thyroxin 122 20
  • 53 121 3 11 Thyroxin 122 25
  • 54 146 4 11 Thyroxin 122 28
  • 55 181 5 11 Thyroxin 122 35

82
Help Files
  • data.frame.object
  • read.table
  • ldframe
  • as.ldframe
  • ldframe.object
  • nths
  • summary.ldframe
  • ssv
  • tsv

83
Day 1 Session 4
  • Practical
  • Read in data
  • get summaries
  • create subsets of data
  • using tsv/ssv
  • complex subsetting -- nths

84
Day 2
  • Exploring longitudinal data

85
Day 2 Session 1
  • Plotting
  • Smoothing
  • Cross-validation
  • Object-orientation

86
Plotting longitudinal data
  • Goals
  • Show how response varies with time
  • Show variation within subject across time
  • Show variation between subjects
  • Show variation between groups (covariates)
  • Highlight unusual observations missing data
  • Problem too much data!

87
Scatterplot?
  • Scatterplot of yij versus tij

88
Basic parallel plot
  • Create a scatterplot of yij versus tij
  • Connect observations within each subject i with
    lines
  • "traces"
  • Shows overall variation
  • Allows observing within-subject behaviour by
    following lines

89
Basic plot
  • The standard S-plus plot function creates a
    parallel plot for ldframe objects
  • gt plot(rats.ldf)
  • NB plots created from the command line cannot
    (easily) be modified using the GUI

90
Milk Protein Data
  • 79 cows
  • Three treatment levels (group)
  • Barley diet
  • Lupins diet
  • Mixed (mixture of barley and lupins)
  • Response y milk protein content
  • Time is weeks since calving (up to 19)
  • Missing data

91
milk.ldf
  • gt idx lt- nths(c(12,-(12)),milk.ldfSubject)
  • gt milk.ldfidx,c(78,4)
  • y Time Subject group
  • 1464 3.67 1 78 Lupins
  • 1465 3.26 2 78 Lupins
  • 1481 3.35 18 78 Lupins
  • 1482 3.09 19 78 Lupins
  • 58 3.66 1 4 Barley
  • 59 3.50 2 4 Barley
  • 75 3.28 18 4 Barley
  • 76 NA 19 4 Barley
  • gt plot(milk.ldf)

92
Highlighting by group
  • Subject-specific columns can be used to
    differentiate subjects by colour, line style, or
    plotting character.
  • gt plot(milk.ldf, colorgroup)
  • gt plot(milk.ldf, linegroup)
  • gt plot(milk.ldf, pchargroup,
  • generallist(type"b"))
  • Combine color line pchar for 2 or 3-way
    discrimination.

93
Milk data plot
  • gt plot(milk.ldf, colorgroup)
  • Colours chosen according to levels of group
  • Missing data highlighted
  • Dashed lines for intermittent missing values
  • Red crosses for dropouts

94
Adding a legend
  • Use the legend argument to add a legend to the
    plot
  • gt plot(milk.ldf, colorgroup, legendlocator(1))
  • The legend is created automatically using
    information from the plot specification.

95
Shadow plots
  • The previous plot is very "busy"
  • difficult to follow individual trails
  • difficult to distinguish between-group behaviour
  • Idea plot most traces as light "shadows" to
    reduce clutter but maintain impression of
    variation.

96
Shadow plot
  • The n.full argument specifies the number of
    traces to plot in full
  • chosen at random
  • the remainder as shadows
  • use number between 0 and 1 for percentage
  • plot(milk.ldf, colorgroup,
  • n.full0.15)

97
Selector function
  • When n.full is used, the full traces are selected
    at random.
  • Alternative select an "even spread" according to
    some statistic of the trace
  • The selector argument may be used to specify a
    function to calculate the statistic
  • plot(milk.ldf, n.full12, colorgroup,
  • selectorfunction(y) y1)

98
Selector notes
  • The traces with the minimum and maximum
    statistics are always chosen
  • If the selector function returns NA, that trace
    will never be chosen
  • So be careful when using functions like mean as
    the selector function
  • If there are ties in the statistics, traces are
    randomly selected within ties

99
Example selector functions
  • selector
  • function(y) mean(y, na.rmT)
  • function(y) max(y, na.rmT)
  • function(y) diff(range(y, na.rmT))
  • function(y) sum(!is.na(y))
  • function(y) rev(y!is.na(y))1

100
Overlaying plots
  • When used with ldframe objects, lines and points
    add to the current plot
  • lines -- parallel plot
  • points -- scatterplot
  • plot(milk.ldf, n.full0)
  • lines(milk.ldfc(1,27,76),
  • colorgroup)

101
Different response
  • By default, plot.ldframe uses the first column of
    the ldframe as the response
  • To use a different response, supply an expression
    as the second argument
  • You can use any of the column names in the
    expression
  • plot(rats.ldf, log(y), colorgroup)
  • plot(my.ldf, var2)

102
Different x variable
  • By default, plot.ldframe uses the Time column as
    the x variable
  • To use a different x variable supply an
    expression for argument 3 (x.expr)
  • plot(rats.ldf, x.exprlog(Time))
  • plot(cog.ldf, x.exprAge)
  • plot(mydat, x.exprjitter(Time))

103
Customization
  • There are many optional arguments to plot.ldframe
    to change the appearance of the plot.
  • These can be included in the call to plot, or use
    tsopts to make permanent changes
  • See ?tsopts for a complete list
  • plot(milk.ldf, colorgroup, colsc(8,5,6))
  • tsopts(colsc(8,5,6))

104
general
  • A list of options to be passed to the standard
    plot function.
  • generallist(xlab"Hours")
  • generallist(type"o", cex2)
  • generallist(main"Rat Data")
  • gt plot(milk.ldf, generallist(type"o", pch""))
  • See ?plot, ?par

105
full
  • A list of graphics parameters to use when
    plotting traces in full (see ?par)
  • fulllist(lwd2)
  • fulllist(lty3, pch"o")
  • fulllist(col6)
  • gt plot(rats.ldf, fulllist(lwd2, col3))
  • Also shadows for shadow traces.

106
cols, pchs, ltys
  • Used when the color, line, or pchar arguments are
    used in plot
  • cols colours to cycle through
  • pchs plotting characters
  • ltys line types
  • tsopts(pchs115)
  • plot(rats.ldf, pchargroup,
  • generallist(type"b"), legendlocator(1))

107
Highlighting missing values
  • miss.style graphics parameters to use when
    drawing a line between two points in a trace
    separated by missing value(s)
  • dropout.style graphics parameters to use when
    plotting the last point in a trace terminated by
    missing value(s). Also dropin.style
  • tsopts(dropout.styleNULL)

108
Mean vs time
  • How best to observe changes in sample mean over
    time?
  • Possibly within several treatment groups
  • Within-time-point means?
  • What if data are unbalanced?
  • What if there is too much variability?
  • What about correlation?

109
Kernel Smoothing
  • Assume a single group for the moment.
  • Basic idea estimate mean at time t using a
    weighted average of points on all subjects
    observed near t
  • Bigger weights nearer to t
  • Weights defined by a kernel function K(t)

110
Kernel function
K(t)
h
t
  • The "width" of the kernel function is specified
    by a bandwidth h

111
Kernel estimation
  • Model yi(t) µ(t) ?i(t)
  • Data (yij, tij)
  • weight for yij w(t, tij, h) K((t-tij)/h)

112
ksmooth.lda
  • Simplest specify ldframe and bandwidth
  • sm1 lt- ksmooth.lda(milk.ldf, band3.5)
  • plot(sm1)
  • Uses a Normal kernel by default
  • Returns an ldframe with a single subject
  • smooth calculated at all unique time points by
    default

113
The bandwidth
  • If the bandwidth is large, the kernel estimate
    will be very smooth
  • A small bandwidth will reflect local behaviour,
    and be "rougher"
  • A zero bandwidth will calculate the average of
    any points which exists at the time point.

114
Bandwidth example
  • Pass a vector to bandwidth to calculate many
    smooths at once.
  • sm2 lt- ksmooth.lda(milk.ldf,
  • bandc(0,2,3.5, 10, 100))
  • plot(sm2, Smooth1, fulllist(col1))
  • lines(sm2, Smooth2, fulllist(col2))
  • lines(sm2, Smooth3, fulllist(col3))
  • lines(sm2, Smooth4, fulllist(col4))
  • lines(sm2, Smooth5, fulllist(col5))

115
More points
  • The kernel smooth can be calculated at any time
    point
  • Specify time points to calculate at with the
    x.points argument
  • sm3 lt-ksmooth.lda(milk.ldf, band3.5,
  • x.pointsseq(-1,22,length1000))
  • plot(sm3)

116
Smoothing within groups
  • Often we use smoothing to look at differences
    between treatment groups
  • Specify the groups with the by argument
  • A factor variable classifying the data points
    into subgroups (eg treatment group)
  • The result ldframe will have as many traces as
    there are levels in by

117
Within-groups example
  • Calculate smooths within each of the three diets
    for the milk protein data, and plot the smooths
    with the raw data.
  • sm4 lt- ksmooth.lda(milk.ldf,
  • band3.0, bygroup)
  • plot(milk.ldf, n.full0)
  • lines(sm4, colorSubject,
  • legendlocator(1))

118
Changing the kernel function
  • You can specify any kernel function with the
    kernel argument
  • A scalar valued function of one argument
  • Default kernels kernel.knormal,
    ksmooth.kbox,ksmooth.ktriangle.
  • All have quartiles at 0.25
  • The choice of kernel function isn't nearly as
    important as the choice of bandwidth

119
Kernel example
  • Create a new kernel function and plot it
  • mykernel lt- function(u)
  • ifelse(abs(u)lt1, 1, 1/abs(u))
  • x lt- seq(-2,2,length100)
  • plot(x,mykernel(x),type"l")
  • Use it on the milk protein data
  • plot(ksmooth.lda(milk.ldf, bygroup,
  • band0.5, kernelmykernel))

120
Selecting the bandwidth
  • The Rice-Silverman Criterion
  • Set bandwith at h
  • Remove subject i from the data
  • Calculate the kernel smooth
  • Add up the squared differences between subject i
    and the kernel smooth
  • Sum over all subjects
  • Minimise over h to determine bandwidth

121
ksmooth.rsc
  • ksmooth.rsc works like ksmooth.lda, but returns
    the criterion value instead
  • gt ksmooth.rsc(milk.ldf, band1.5)
  • bandwidth RSC
  • 1, 1.5 132.3614
  • gt rsc lt- ksmooth.rsc(milk.ldf,
  • bandseq(0,4,length30))
  • gt plot(rsc)

122
Grouped RSC
  • If you use the by argument with ksmooth.rsc, you
    get the sum of the criterion values within the
    subgroups.
  • rsc lt- ksmooth.rsc(milk.ldf, bygroup,
  • bandseq(0,2,length30))
  • plot(rsc)

123
For more information
  • ?plot.ldframe
  • ?tsopts
  • ?ksmooth.lda
  • ?ksmooth.rsc
  • Why plot.ldframe?

124
Object-oriented
  • S-plus is object oriented
  • Some objects have a class which identifies the
    type of object.
  • gt class(milk.ldf)
  • 1 "ldframe" "data.frame"
  • milk.ldf is of class ldframe, which inherits from
    class data.frame

125
Method functions
  • Some "generic" functions behave differently
    depending on the class of the first argument
  • plot, summary, print, operators,
  • You call plot(milk.ldf)
  • S runs plot.ldframe(milk.ldf)

126
Day 2 Session 2
  • Practical -- plotting and smoothing

127
Day 2 Session 3
  • Model specification
  • OLS models -- lm
  • Missing data issues -- na.add

128
Linear Models
  • Basic Model
  • Y X? ?

129
Models in S-plus
  • formula notation
  • response predictor predictor
  • response vector
  • predictors vector, factor
  • Intercept term (1) automatically included unless
    specifically excluded

130
Model formulae
131
Y 1
Response
Time
132
Y Time
Response
Time
133
Y Time Time2
Response
Time
134
Y poly(Time,3)
Response
Time
135
Y group Time
Response
Time
136
Y groupTime
Response
Time
137
More formula examples
  • y group - 1
  • y group - 1 Time
  • log(y) sex group/Time
  • log(y) sex group/Time - group
  • y poly(Time,3) sexgroup
  • y group/ns(Time,4)
  • (4 df natural splines within each group -- see
    also bs)

138
Factor contrasts
  • By default, S-PLUS uses Helmert contrasts
  • options(
  • contrastsc("contr.treatment","contr.poly"))
  • Intercept -- level 1 estimate
  • Factor g estimate -- Level g - Level 1

139
Ordinary Least Squares
  • Although we know there is within-subject
    correlation, sometimes useful to ignore it
  • The clever ostrich
  • Point estimates of parameters consistent and
    usually fairly efficient
  • Standard errors can be very wrong (small)
  • but possible to get better estimates

140
Fitting OLS models lm
  • Basic usage
  • obj lt- lm(formula, datadataframe)
  • formula is a model formula
  • dataframe is the name of a data frame in which to
    find the formula variables
  • obj is an object to store the results

141
Model results
  • You can print or extract various components of
    the result object
  • obj brief details
  • summary(obj) more details
  • coef(obj) estimated coefficients
  • anova(obj) AOV table
  • fitted(obj) fitted values
  • resid(obj) residuals
  • plot(obj) diagnostic plot

142
lm example
  • fit lt- lm(log(y) group
  • poly(Time,2), datarats.ldf)
  • fit
  • coef(fit)
  • summary(fit)
  • anova(fit)
  • plot(rats.ldf, fitted(fit))
  • plot(rats.ldf, resid(fit))
  • plot(fit)

143
Model comparison
  • You can compare two model objects using
    anova(obj1,obj2)
  • for lm this gives an AOV table with an F test for
    each term (not parameter)
  • fit2 lt- lm(log(y)
  • group poly(Time,3), datarats.ldf)
  • anova(fit1,fit2)

144
Prediction
  • predict(obj) return fitted values
  • predict(obj,newdata) make predictions
  • newdata is a data frame with new covariate values
    to predict at
  • WARNING predict gives wrong answers when the
    parameterisation depends on the covariate values
    (eg poly)
  • use predict.gam instead (safe prediction)

145
predict example
  • newdata lt- ldframe(Subjectrep(13,rep(100,3)))
  • tsv(newdata,"T2") lt- seq(1,6,length100)
  • newdataTime lt- newdataT2
  • ssv(newdata,"group") lt- levels(rats.ldfgroup)
  • plot(rats.ldf, log(y),
  • generallist(xlimc(1,6)))
  • lines(newdata, predict.gam(fit2,newdata),
  • colorSubject)
  • lines(newdata, predict(fit2,newdata),
  • colorSubject) wrong!!

146
coef example
  • Lets look at variation at residual slope and
    intercept across subjects
  • resfit lt- lm(resid(fit)
  • Subject/Time - 1, datarats.ldf)
  • co lt- coef(resfit)
  • hist(co127)
  • hist(co-(127))

147
Missing data
  • If the response or any covariates contain NAs, lm
    will fail with an error message
  • gt lm(y group, datamilk.ldf)
  • Error in function(object, ...) missing values
    not allowed found in y
  • Dumped
  • Include the option na.actionna.omit to delete
    all rows with missing values before doing
    calculations

148
na.omit example
  • gt fit3lt-lm(ygroup, datamilk.ldf,
    na.actionna.omit)
  • gt fit3
  • Call
  • lm(formula y group, data milk.ldf,
    na.action na.omit)
  • Coefficients
  • (Intercept) group1 group2
  • 3.42467 -0.1097727 0.002512743
  • Degrees of freedom 1337 total 1334 residual
  • Dropped 164 cases due to missing values
  • Residual standard error 0.3198013

149
na.omit a problem
  • na.omit works by temporarily deleting rows with
    missing data from the data frame
  • lm thinks there are fewer rows than there are
  • resid and fitted return vectors as long as the
    number of non-missing rows
  • gt plot(milk.ldfy, resid(fit3))
  • Error in plot Lengths of x and y must match
  • gt length(milk.ldfy)
  • 1 1501
  • gt length(resid(fit3))
  • 1 1337

150
na.add a solution
  • You can use na.add to add a shortened vector to
    an ldframe, expanding it by inserting missing
    values where needed.
  • na.add(milk.ldf, resid) lt- resid(fit3)
  • milk.ldf41719,
  • y Time Subject group resid
  • 74 3.35 17 4 Barley -0.1819294
  • 75 3.28 18 4 Barley -0.2519294
  • 76 NA 19 4 Barley NA

151
na.add example
  • Milk data 4df splines within each treatment
    group.
  • a lt-lm(y group/ns(Time,4), datamilk.ldf,
    na.actionna.omit)
  • af lt- fitted(a)
  • na.add(milk.ldf, "spline") lt- af
  • plot(milk.ldf, spline, colorgroup)

152
na.omit an alternative
  • Alternatively, you can simply use na.omit to
    remove all rows with missing data from your data
    before you begin.
  • milk.nona.ldf lt- na.omit(milk.ldf)
  • Not always appropriate
  • Oswald functions work best with missing data left
    in.

153
Day 2 Session 4
  • Practical
  • Fitting models
  • Getting coefficients
  • Making predictions
  • Robust SEs

154
Day 3
  • Analysing Longitudinal Data

155
Day 3 Session 1
  • Variogram
  • Parametric correlation models
  • Likelihood ratio testing
  • Structured random effects

156
The variogram
  • Assume our data are replications of a stationary
    process Y(t)
  • The variogram is used to explore the correlation
    structure

157
Variogram
  • When u is small
  • ?(u)gt0 -- Measurement error
  • When u is large
  • if ?(u)ltlt?2, no independence even at large lags
  • random intercept

158
Creating stationarity
  • Before we can calculate the variogram, need to
    de-trend the data
  • Calculate the most complex model imaginable, and
    take residuals
  • For balanced data, residuals from saturated
    treatmenttime model
  • For unbalanced data, residuals from kernel smooth
    by treatment group

159
Stationarity -- balanced data
  • Kernel smoothing with zero bandwidth calculates
    means within group at each time
  • ksmooth.res returns residuals
  • res lt- ksmooth.res(milk.ldf, bygroup, band0)
  • plot(res, n.full10, colorssv(milk.ldf)group)
  • Also works with "slightly" unbalanced data

160
Stationarity -- unbalanced data
  • Choose the most complex classification
  • Choose bandwidth with RSC
  • Kernel smooth and take residuals with ksmooth.res

161
Sample variogram
  • Take stationary residuals rij
  • Calculate vijk½(rij- rik)2, uijktij- tik
  • Variogram estimate v(u) at lag u is average of
    all vijk where uijk is near u
  • "Near" means within a binwidth eps
  • Also estimate process variance ?2
  • average ½(rij- rlk)2 for i? l

162
Calculating the variogram
  • res lt- ksmooth.res(milk.ldf,
  • bygroup, band0)
  • vg lt- variogram(res)
  • plot(vg)
  • The variogram is calculated at all unique lags at
    least 1 binwidth apart
  • Change the binwidth with the eps argument

163
Variogram cloud
  • You can also calculate and plot all the vijk's by
    including the cloudT argument
  • vg2 lt- variogram(res, cloudT)
  • plot(vg2)
  • The plot includes the cloud as a scatterplot
  • Warning there can be many cloud points, which is
    why the cloud is not returned by default

164
Variogram interpretation
  • Intercept
  • measurement error
  • Process variance - stable value at large lags
  • random intercept
  • if it doesnt stabilise, not stationary!
  • Rate of increase to stable value
  • serial correlation

165
Mixed Effects Modelling
  • Sometimes, it makes sense to include random
    coefficients in our model
  • random intercept
  • random slope
  • This is useful in many ways
  • more natural models
  • more efficient parameterisation
  • explains correlation

166
Mixed effects examples
  • Random intercept
  • Yij (? Ui) ?tij ?ij
  • Random slope
  • Yij (? Ui) (? Si) tij ?ij
  • Multilevel effects
  • Yij (? Ui) (? Si) tij Gg ?ij

-- no can do!
167
Structured random effects
  • Y X?Z? ?
  • Random effects ? Var(?)G
  • Serial correlation ? Var(?)R
  • Random effects zero-mean, Normal
  • include fixed effects for nonzero mean
  • ? realised independently for each subject
  • 2-level model (multilevel gt 2 not possible)

168
Basic lme usage
  • How to specify, Z, G, R?
  • lme(fixed, random, cluster, data)
  • fixed linear model (X)
  • random RE model (Z)
  • cluster independent units
  • usually cluster Subject
  • data ldframe (or dataframe)
  • Other options for specifying G, R, ML/REML,
    (see ?lme.formula)

169
Random effect design
  • Specify random model formula for Z
  • random 1 random intercept
  • random Time random slope intercept
  • NB no LHS to formula!
  • By default, all fixed components included in
    random model.
  • All random effects assumed correlated
  • general (unstructured) form for G

170
lme -- rats
  • NB lme does not allow within-formula
    transformations
  • rats.ldflogy lt- log(rats.ldfy)
  • Parallel quadratics, random intercept
  • lme(logy group poly(Time,2), 1,
  • clusterSubject, datarats.ldf)

171
lme -- orthodontic data
  • Orthodont.plot()
  • orth1 lt- lme(fixed distance age,
  • random age, cluster Subject,
  • data Orthodont)
  • orth1
  • summary(orth1)

172
Predictions
  • Population predictions
  • assuming ?0 (i.e. EY)
  • Cluster predictions
  • EYi Zmode(?ik Y)
  • predict(obj, newdata, cluster)
  • predict(orth1, Orthodont, Subject)

173
Diagnostic plots
  • lme provides 3 types of diagnostic plot
  • scatterplot matrix of predicted REs
  • plot(orth1,optionc)
  • Residual plots
  • plot(orth1, optionr)
  • response vs fitted values
  • residuals vs fitted values
  • residuals by cluster
  • Standardised residual plots
  • plot(orth1, options)

174
RE covariance structures
  • By default, all between-random-effect covariances
    are estimated independently
  • unstructured G
  • With many random effects, a simpler structure may
    be preferred.

175
re.structure
  • re.structureunstructured
  • independently estimate all correlations (default)
  • re.structurecompsymm (compound symmetry)
  • all variances and covariances equal
  • re.structurediagonal
  • independent, different variances
  • re.structureidentity
  • independent, common variance
  • re.structurear1
  • common variance, AR(1) correlation

176
More complex structures
  • re.block can be used to group random effects into
    independent blocks
  • re.block list(c(1,2),3)
  • re.structure is given as a vector for the
    strucure of each block
  • re.structurec(compsymm,identity)
  • Or, you can design your own structures by writing
    an S function

177
Serial correlation
  • Structure for Var(?i) (and therefore R)
  • serial.structure
  • identity (constant variance, default)
  • ar1 (AR(1), unit spacing)
  • compsymm
  • ar2, ma1, ma2, arma11
  • DIY
  • Dont overparameterise!!

178
Balanced or unbalanced?
  • Random effects model
  • no problem here
  • Serial correlation model
  • identity, compsymm OK
  • ar1 etc. assume unit time spacing by default
  • lme(, serial.structurear1,
  • serial.covariate Time, serial.covariate.trans
    formationnone)

179
Diggle 3-component
  • Measurement error variance ?2
  • Random intercept variance ?2
  • Random process Wi(t) SGP(0, ?2?(?t ))
  • ?(?t ) ??t, correlation function
  • continuous-time AR(1)

180
Diggle 3-component example
  • Fit D3C model (almost) to rats data
  • lme(logy group poly(Time,2),
  • random 1, cluster Subject,
  • serial.structure"ar1", datarats.ldf)
  • Compare variance estimates with variogram
  • plot(variogram(ksmooth.res(rats.ldf,
  • log(y), bygroup, band0)))

181
REML or ML
  • ML estimation
  • est.methodML
  • REML estimation (default)
  • est.methodRML
  • Model comparison (LR test) with REML
  • only when fixed models are the same!

182
Model testing
  • Use anova to compare models
  • obj1 lt- lme(logy grouppoly(Time,2),
  • 1, Subject, datarats.ldf)
  • obj2 lt- lme(logy grouppoly(Time,2),
  • 1Time, Subject, re.struct"diag",
  • datarats.ldf)
  • anova(obj1,obj2)

183
Simplifying the variance model
  • If you give a variance parameter a zero initial
    estimate, it will be fixed at zero
  • Useful for testing whether a certain variance
    component is necessary in the model
  • Use a Likelihood Ratio test
  • Use CS with covariance0 for ID

184
Parameter estimation a warning
  • It can be difficult to get reliable variance
    parameter estimates from mixed models
  • flat likelihoods (and no standard errors!)
  • correlated parameters (and no est. correlations!)
  • easily overparameterised (with no warning!)
  • Restart your model with different initial
    estimates to check convergence
  • start list(D matrix, alpha c())

185
Day 3 Session 2
  • Practical
  • variograms
  • Parametric correlation models
  • Random effect models

186
Day 3 Session 3
  • Other topics
  • NLME
  • GEE
  • dropout modelling
  • alternating logistic regressions
  • resources
  • general discussion

187
Nonlinear mixed effects
  • nlme Allows a nonlinear model for with random
    effects
  • gt Orange.plot()
  • gt Orange.fit lt- nlme(
  • circumference A/(1 exp((B - age)/C)),
  • fixed list(A ., B ., C .),
  • random list(A ., B .),
  • cluster Tree, data Orange,
  • start list(fixed c(160, 700, 350)),
  • serial.structure "ar2",
  • var.function "expon")

188
Pharmokinetics example
  • one-compartment open model with first-order
    absorption and elimination
  • clearance ?1i, absorption ?2i, elimination ?3i
  • randomly varying per patient i
  • Theoph.plot()

189
GEE
  • Generalised Estimating Equations
  • Simplest case
  • EY X?
  • VarY not specified
  • working correlation matrix R

190
GEE for non-normal data
  • For non-Normal data
  • h(EY) X?
  • Link function h determined by dist. family
  • familygaussian identity
  • familybinomial logit
  • familygamma inverse
  • familypoisson log
  • familyquasi(link, variance)
Write a Comment
User Comments (0)
About PowerShow.com