Title: Research Techniques
1 Research Techniques Probability Theory
Analysis of Multivariate Data Using JMP
Heavy metals in marine sediments, differences
between locations
Presented to Prof. Andy Mauromoustakos
Presented by Tareq Altamimi (econ0306)
2 Slides Index
- Database Overview and Objectives.
- Database Variables definition.
- Correlation between Variables.
- Repeated measures Response.
3 Presentation Overview
- This presentation describes an important
statistical subject, multivariate analysis, using
the JMP program. The analysis is carried out on a
database collected by scientists in the
Department of Biology at the University of
Melbourne. It concerns heavy metals in marine
sediments and the differences between locations.
We take three different locations with four
observations (samples) each.
- JMP makes it easy to see how the heavy metals in
marine sediments change by presenting the data
graphically. These graphics are called charts,
and statisticians can interpret them readily.
This presentation also interprets the results of
the analysis for the three locations covered by
the database.
- The analysis in this project uses multivariate
analysis of variance (MANOVA), which is used to
see the main and interaction effects of
categorical variables on multiple dependent
interval variables. MANOVA uses one or more
categorical independents as predictors. It tests
the differences in the centroid (vector) of means
of the multiple interval dependents across the
categories of the independent(s). MANOVA has
several potential purposes, and all of the
analysis here is done by MANOVA in JMP.
4 Why MANOVA, why Repeated Measures?
Why JMP?
- JMP dynamically links statistics and graphics so
you can easily explore data, make discoveries,
and gain the knowledge you need to make better
decisions. Click on a point in a graph to
highlight the corresponding observation
everywhere it is represented in JMP in other
graphs, in 3-D spinning plots, and in the data
tables. JMP provides a comprehensive set of
statistical tools as well as Design of
Experiments (DOE) and advanced quality control
(QC and SPC) tools for Six Sigma in a single
package. Advanced modelling techniques include
ANOVA and MANOVA, stepwise, log linear, ordinal
logistic regression, survival/reliability, true
non-linear modelling, partitioning (decision
trees), neural networks, time series,
multivariate, cluster, discriminant, and partial
least squares (PLS). The JMP Scripting Language
(JSL) lets you capture the results of your work
in automatically generated scripts, and offers
all the power of a programming language, complete
with matrix algebra support, so you can create
custom analyses, interactive graphics, and more.
- Multivariate statistics were developed to handle
situations where multiple variables or measures
are involved. Any analysis of more than two
variables or measures can loosely be considered a
multivariate statistical analysis or multivariate
analysis. One of the primary goals of
multivariate statistical analysis is to describe
the relationships among a set of variables. The
multivariate analysis is widely used in various
fields, such as agriculture, food and life
sciences, business and engineering and so on.
Repeated measures analysis (also called
longitudinal data analysis) applies when repeated
measurements are taken on each subject and you
want to analyse effects both between subjects and
within subjects across the measurements. This
multivariate
approach is especially important when the
correlation structure across the measurements is
arbitrary.
5 The JMP way of doing things is best summarised in
the following four points
- Variables are assigned to one of three levels of
measurement: nominal, ordinal or continuous. This
assignment is under user control and may be
changed at will. It allows JMP to decide what
summary statistics to provide and what techniques
are suitable for analysis using the variable.
- In any analysis, each variable may be assigned
one of the roles X, Y, weight, frequency or
label.
- Using a combination of the above two pieces of
information, JMP is able to decide on an
appropriate analysis if the user chooses one of
the following activities, referred to as
'personalities':
  - Distribution of Y: single variable summaries
  and plots.
  - Fit Y by X: one response and one explanatory
  variable. The techniques employed are ANOVA, LS
  regression, logistic regression or contingency
  table analysis, depending on the levels of the
  X and Y variables.
  - Fit Model: variable numbers of responses and
  explanatory variables. Under this heading a
  range of techniques is available, including
  ANOVA, ANCOVA, MANOVA, LS regression and
  stepwise procedures, logistic and ordinal
  regression, log-linear models, proportional
  hazard models, screening models and D-optimal
  designs.
  - Non-linear Fit: non-linear models specified by
  the user and fitted using Gauss-Newton or
  Newton-Raphson, with either one of a range of
  built-in loss functions or one specified by the
  user.
  - Correlation of Ys: examination of the
  correlation or covariance structure of a set of
  variables, including scatter plot matrices, PCA
  and factor analysis.
  - Cluster: cluster analysis using hierarchical
  and k-means approaches.
  - Survival: survival analysis using
  Kaplan-Meier, Cox regression and non-linear
  survival models.
- Most of the analysis 'personalities' produce
graphs, many of them dynamic, as part of their
standard output. There is always a range of
additional outputs, both textual and graphical,
available from analyses.
6 MANOVA Tests
- There are multiple potential purposes for MANOVA:
  - To compare groups formed by categorical
  independent variables on group differences in a
  set of interval dependent variables.
  - To use lack of difference for a set of
  dependent variables as a criterion for reducing
  a set of independent variables to a smaller,
  more easily modeled number of variables.
  - To identify the independent variables which
  differentiate a set of dependent variables the
  most.
- MANOVA has four main tests, described as follows:
  - Hotelling's T-Square is the most common,
  traditional test where there are two groups
  formed by the independent variables. Note that
  one may see the related statistic, Hotelling's
  Trace (a.k.a. Lawley-Hotelling or
  Hotelling-Lawley Trace). To convert from the
  Trace coefficient to the T-Square coefficient,
  multiply the Trace coefficient by (N-g), where
  N is the sample size across all groups and g is
  the number of groups. The T-Square result will
  still have the same F value, degrees of
  freedom, and significance level as the Trace
  coefficient.
  - Wilks' lambda, U. This is the most common,
  traditional test where there are more than two
  groups formed by the independent variables. It
  is a measure of the difference between groups
  of the centroid (vector) of means on the
  dependent variables. The smaller the lambda,
  the greater the differences. The Bartlett's V
  transformation of lambda is then used to
  compute the significance of lambda. Wilks'
  lambda is used, in conjunction with Bartlett's
  V, as a multivariate significance test of mean
  differences in MANOVA, for the case of multiple
  interval dependents and multiple (>2) groups
  formed by the independent(s). The t-test,
  Hotelling's T-Square, and the F test are
  special cases of Wilks' lambda.
  - Pillai-Bartlett trace, V. Multiple
  discriminant analysis (MDA) is the part of
  MANOVA where canonical roots are calculated.
  Each significant root is a dimension on which
  the vector of group means is differentiated.
  The Pillai-Bartlett trace is the sum of
  explained variances on the discriminant
  variates, which are the variables computed from
  the canonical coefficients for a given root.
  Olson (1976) found V to be the most robust of
  the four tests, and it is sometimes preferred
  for this reason.
  - Roy's greatest characteristic root (GCR) is
  similar to the Pillai-Bartlett trace but is
  based only on the first (and hence most
  important) root. Specifically, let $\lambda$ be
  the largest eigenvalue; then
  $\mathrm{GCR} = \lambda / (1 + \lambda)$.
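To make the four statistics concrete, here is a minimal sketch in Python with NumPy (not JMP) that computes them from the eigenvalues of E^{-1}H, using the standard textbook formulas. The function name `manova_statistics` and the two small matrices are made up for illustration and are not taken from the sediment data. The GCR follows the definition on this slide; some software reports the largest eigenvalue itself instead.

```python
import numpy as np

def manova_statistics(H, E):
    """Compute the four MANOVA statistics from SSCP matrices H (hypothesis) and E (error)."""
    # Eigenvalues of E^{-1} H; solve() avoids forming the explicit inverse.
    eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    largest = eigvals.max()
    return {
        "wilks_lambda": float(np.prod(1.0 / (1.0 + eigvals))),
        "pillai_trace": float(np.sum(eigvals / (1.0 + eigvals))),
        "hotelling_lawley_trace": float(np.sum(eigvals)),
        "roy_gcr": float(largest / (1.0 + largest)),
    }

# Made-up 2 x 2 matrices for illustration only (NOT the sediment data).
H = np.array([[8.0, 2.0], [2.0, 4.0]])
E = np.array([[4.0, 0.0], [0.0, 2.0]])
stats = manova_statistics(H, E)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A smaller Wilks' lambda and a larger trace or root both point to larger differences between the group centroids.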
7 Heavy metals in marine sediments, differences
between locations
The University of Melbourne is carrying out
research on marine sediments. The main question
is whether the levels of heavy metals in these
sediments change when the location changes. This
question is addressed by the database study in
this presentation, and the analysis is done using
JMP software.
Research objectives
One objective of this particular study was to
determine whether the percentage of heavy metals
in marine sediments differs between locations.
Treatment Design
The treatments included the quantities of copper
(CU), lead (PB), nickel (NI) and manganese (MN).
At every location we took four different samples
and ran the experiment on each. We did this at
the three locations, which are Delray, Seaspray
and Woodside.
Experiment Design
Samples of marine sediments were randomly
assigned to the four treatments in a completely
randomized design. The treated marine sediment
samples were placed in airtight containers and
incubated under conditions conducive to microbial
activity. The experiment was done to discover
whether the percentage of heavy metals in marine
sediments differs when the sample is taken from
different locations, that is, whether location
has any effect on these percentages. These
differences can be assessed by measuring the four
main heavy metals in the sediment. The heavy
metals were measured at three different
locations, with four different samples at each
location, so that we can reach a general
conclusion about this kind of marine sediment.
The heavy metal quantity in each marine sediment
sample was recorded on an idealized experiment
area. The data are shown on the database slides,
and a profile plot from the Fit Model MANOVA
personality is shown on the right.
8 Database Variables Definition
- Site: the first variable in this database is
the site, the place where the samples were
collected. The database has three different sites
(Delray Beach, Seaspray, Woodside). This variable
is important because it divides the data into
three groups according to where the samples were
taken. The remaining variables, CU, PB, NI and
MN, are continuous dependent variables, which is
one of the main conditions of MANOVA.
- Factor variable
  - Site: the area from which the marine sediments
  were taken.
- Responses
  - CU: the concentration of copper in marine
  sediment; each site has four samples.
  - PB: the concentration of lead in marine
  sediment.
  - NI: the concentration of nickel in marine
  sediment.
  - MN: the concentration of manganese in marine
  sediment.
- These four variables are the most important
because they show the differences or similarities
of the marine sediments at the three sites. The
database also contains the log10 transformations
of these variables, plus the log10 transformation
of FE:
  - LCU: log10 transformation of CU
  - LPB: log10 transformation of PB
  - LNI: log10 transformation of NI
  - LMN: log10 transformation of MN
  - LFE: log10 transformation of FE
- The data meet the MANOVA conditions, since we
have continuous dependent variables and a
categorical independent variable.
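As a small illustration of how the L-prefixed columns relate to the raw concentrations, the sketch below applies a log10 transformation to one row of values in plain Python. The helper name `add_log10_columns` and the concentrations are hypothetical and are not taken from the database.

```python
import math

# Hypothetical concentrations for one sediment sample; NOT values from the database.
sample = {"CU": 10.0, "PB": 100.0, "NI": 1.0, "MN": 1000.0}

def add_log10_columns(row):
    """Return a copy of row with an L-prefixed log10 column for each metal."""
    out = dict(row)
    for name, value in row.items():
        if value <= 0:
            # log10 is only defined for positive concentrations
            raise ValueError(f"log10 is undefined for {name}={value}")
        out["L" + name] = math.log10(value)
    return out

transformed = add_log10_columns(sample)
print(transformed)
```

The transformation compresses large concentrations, which is a common reason to analyse log-scale responses alongside the raw ones.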
9 Database of heavy metals in marine sediments,
differences between locations.
SITE - sites from which data were collected
(Delray Beach, Seaspray, Woodside)
CU - concentration of copper
PB - concentration of lead
NI - concentration of nickel
MN - concentration of manganese
LCU - log10 transformation of CU
LPB - log10 transformation of PB
LNI - log10 transformation of NI
LMN - log10 transformation of MN
LFE - log10 transformation of FE
[Figure: marine sediments at the three sites
(Seaspray, Woodside, Delray)]
10 The database in JMP program
[Figure: the data table as it looks in JMP during
the analysis, including the saved models
(scripts)]
For the meaning of each column, see the variable
definitions on the previous slide.
11 Correlation between variables
The table shown is called the Correlations table,
a matrix of correlation coefficients that
summarizes the strength of the linear
relationships between each pair of responses,
which are the heavy metal variables (CU, PB, NI,
MN, LCU, LFE, LNI, LMN). The scatter plot matrix
on the left shows that the variables are related,
so the data are correlated, although most of the
correlations are not high. LFE and LNI, for
example, show a strong relationship in the plot.
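A correlations table of this kind can be sketched outside JMP as follows. This is a minimal Python and NumPy example on made-up numbers: the column labels borrow names from the database, but the values are invented so that one pair is strongly related and another only weakly.

```python
import numpy as np

# Made-up values for illustration only; these are NOT the sediment measurements.
lfe = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
lni = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])  # built to track lfe closely
lcu = np.array([2.0, 1.0, 2.0, 1.0, 2.0, 1.0])  # built to be only weakly related

names = ["LFE", "LNI", "LCU"]
data = np.column_stack([lfe, lni, lcu])

# Pearson correlation matrix; rowvar=False treats each column as one variable.
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(names, corr):
    print(name, np.round(row, 3))
```

Each off-diagonal entry summarizes one panel of the scatter plot matrix: values near 1 or -1 indicate a tight linear relationship, values near 0 a weak one.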
12 Description of how some analyses work in JMP.
- In MANOVA analysis you can select a response
design that indicates whether you want to use the
response variables individually or in some linear
combination. JMP, like most software, supports
several response designs, but it also allows you
to build your own. Included designs:
  - Rep. Measures: automatic analysis of repeated
  measures design. This is the design used in
  this database analysis.
  - Sum: the sum of the responses, one value
  - Identity: each response, the identity matrix
  (no transformation, M = I)
  - Contrast: each response (except the first)
  minus the first
  - Polynomial: orthogonal polynomials
  - Helmert: each response versus the ones after
  it, except the last
  - Profile: each response versus all others,
  except the last
  - Mean: each response versus the mean of the
  others, except the last
  - Compound: for responses that form a compound
  of more than one effect
  - Custom: any M matrix you want to enter and
  edit yourself.
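As a rough illustration of what a response design does, the sketch below builds M matrices for three of the designs listed above (Sum, Identity and Contrast) and applies them to one row of responses. It is written in Python with NumPy rather than JMP. The function name `response_design` is hypothetical, and the convention that each column of M is one linear combination of the responses is an assumption, not a description of JMP internals.

```python
import numpy as np

def response_design(name, k):
    """Build a k x q M matrix; each column combines the k responses (assumed convention)."""
    if name == "sum":
        return np.ones((k, 1))      # one column: add all responses
    if name == "identity":
        return np.eye(k)            # no transformation, M = I
    if name == "contrast":
        # Column j is response j+1 minus the first response.
        M = np.zeros((k, k - 1))
        M[0, :] = -1.0
        M[1:, :] = np.eye(k - 1)
        return M
    raise ValueError(f"unsupported design: {name}")

# One made-up row of four responses (not sediment data).
y = np.array([[1.0, 2.0, 4.0, 8.0]])
summed = y @ response_design("sum", 4)
unchanged = y @ response_design("identity", 4)
contrasts = y @ response_design("contrast", 4)
print(summed, unchanged, contrasts)
```

Multiplying the response matrix by M turns the original responses into the transformed responses on which the multivariate tests are then run.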
13 The third column shows the cumulative percent of
variation represented by the eigenvalues. The
first three principal components account for
93.6277% of the variation in the sample.
The Spinning Plot platform displays a
three-dimensional spinnable plot.
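The cumulative percent column described above is simple arithmetic on the eigenvalues. The following Python sketch shows the calculation on made-up eigenvalues; they are chosen for easy arithmetic and are not the eigenvalues behind the 93.6277% figure.

```python
import numpy as np

# Made-up eigenvalues, largest first (NOT those from the sediment analysis).
eigenvalues = np.array([4.0, 2.0, 1.5, 0.5])

# Percent of total variation carried by each principal component.
percent = 100.0 * eigenvalues / eigenvalues.sum()

# Running total: the "cumulative percent" column.
cumulative = np.cumsum(percent)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{i}: {p:.2f}% of variation, {c:.2f}% cumulative")
```

Reading down the cumulative column shows how many components are needed to cover a chosen share of the variation.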
14 Analysis with MANOVA
- Least Squares Means Report
- For each pure nominal effect, the first graph
gives the overall least squares means of all the
heavy metals and their log transformations, with
a profile plot of the means and the table of
least squares means.
- The second graph shows the means for each site,
so it breaks the overall means down by the
location where each sample was taken. It also
includes the table of least squares means
organized by site (location). Here we can see
that the least squares means for Woodside are
higher than those for Seaspray, which in turn are
higher than those for Delray. The same pattern
can be seen in the table of least squares means
for each variable shown under the graph.
15 Partial covariance and correlation tables
The Partial Correlation table shows the
covariance matrix and the partial correlation
matrix of residuals from the initial fit. The
partial correlation table gives the partial
correlation of each pair of variables after
adjusting for all the other variables, so it
shows how each pair of heavy metals is related
once the others are accounted for. Notice that
the diagonal of the partial correlation matrix is
always 1.
- The main ingredients of multivariate tests are
the E and the H matrices:
  - The elements of the E matrix are the cross
  products of the residuals.
  - The H matrices correspond to hypothesis sums
  of squares and cross products.
  - There is an H matrix for the whole model and
  for each effect in the model. Diagonal elements
  of the H and E matrices correspond to the
  hypothesis (numerator) and error (denominator)
  sums of squares for the univariate F tests. New
  E and H matrices for any given response design
  are formed from these initial matrices, and the
  multivariate test statistics are computed from
  them.
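For a one-factor layout such as site, the E and H matrices can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not JMP's implementation: the function name `sscp_matrices` is hypothetical, and the responses are random numbers laid out as three sites with four samples each, not the real sediment measurements.

```python
import numpy as np

def sscp_matrices(Y, groups):
    """Return (H, E) sums-of-squares-and-cross-products matrices for one factor."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    H = np.zeros((p, p))
    E = np.zeros((p, p))
    for g in np.unique(groups):
        Yg = Y[groups == g]
        group_mean = Yg.mean(axis=0)
        # Hypothesis part: group centroid versus grand centroid, weighted by group size.
        d = (group_mean - grand_mean)[:, None]
        H += len(Yg) * (d @ d.T)
        # Error part: cross products of residuals around the group centroid.
        R = Yg - group_mean
        E += R.T @ R
    return H, E

# Synthetic data: 3 sites x 4 samples, 4 responses (NOT the real measurements).
rng = np.random.default_rng(0)
sites = np.repeat(["Delray", "Seaspray", "Woodside"], 4)
Y = rng.normal(size=(12, 4))
H, E = sscp_matrices(Y, sites)
print(np.round(np.diag(H), 3))  # univariate hypothesis sums of squares
print(np.round(np.diag(E), 3))  # univariate error sums of squares
```

A useful check on such a sketch is that H + E equals the total cross-product matrix of the responses around the grand mean.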
16 The MANOVA Analysis
In this MANOVA analysis we chose the Repeated
Measures response because the data have several
observations at every site. In the F test, the
intercept has a Prob>F of 0.0001, which is less
than 0.05 (α). Among the multivariate tests,
Roy's max root test has a Prob>F of 0.0353, which
is also less than 0.05. From this we conclude
that the level of heavy metals in marine
sediments differs depending on the location where
the sample is taken.
17 Conclusion
- After applying statistical analysis to the data
on marine sediments from three different
locations (Woodside, Seaspray, Delray), we find
that the heavy metal levels in marine sediments
depend on the area where the sediments are
located.
18 References
Gerry Quinn and Mick Keough, Experimental Design
and Data Analysis for Biologists, Chapter 16:
Multivariate analysis of variance and
discriminant analysis. Cambridge University
Press, 2002.
http://www.jmp.com/ and other websites.
PowerPoint presentation on Marketing Research,
Part B: Continuous Data Applications,
Multivariate Analysis, and other PDF and printed
material. Dr. Andy Mauromoustakos.
JMP version 5 help.
The database comes from a study by the University
of Melbourne.