Title: Development of Chemistry Indicators
1Development of Chemistry Indicators
Sediment Quality Objectives For California
Enclosed Bays and Estuaries
Scientific Steering Committee Meeting July 26,
2005
2Presentation Overview
- Objectives
- Data preparation
- SQG calibration and development
- Validation
- Conclusions
- Next steps
4Chemistry Indicators
- Several challenges to effective use
- Bioavailability
- Unmeasured chemicals
- Mixtures
5Objectives
- Identify important geographic, geochemical, or other factors that affect the relationship between chemistry and biological effects
- Develop indicator(s) that reflect relevant biological effects caused by contaminant exposure
- Develop thresholds and guidance for use in the multiple lines of evidence (MLOE) framework
6Approach
- Use CA sediment quality data in developing and validating indicators
- Address concerns and uncertainty regarding influence of regional factors
- Document performance for realistic applications
- Investigate multiple approaches
- Both mechanistic and empirical methods
- Existing methods used by other programs
- Existing methods calibrated to California
- New approaches
7Approach
- Evaluate SQG performance
- Use CA data
- Use quantitative and consistent approach
- Select methods with best performance for expected applications
- Describe response levels (thresholds)
- Consistent with needs of MLOE framework
- Based on observed relationships with biological effects
8Presentation Overview
- Objectives
- Data preparation
- SQG calibration and development
- Validation
- Conclusions
- Next steps
Data preparation topics: data screening and processing, strata, calibration and validation subsets
9Data Screening
- Appropriate habitat and geographic range
- Subtidal, embayment, surface sediment samples
- Chemistry data screening
- Valid data (from qualifier information)
- Nondetect values (estimated)
- Completeness (metals and PAHs)
- Minimum of 10 chemicals (metals and organics)
- Habitat type (surface, embayment, subtidal)
- Standardized sums: DDTs, PCBs, PAHs, chlordanes
10Data Screening
- Toxicity data screening
- Valid data
- Selection of candidate acute and chronic toxicity tests
- Lack of ammonia interference
- EPA toxicity test thresholds
- Acceptable control performance
- Matched data (toxicity and chemistry)
- Same station, same sampling event
- Test method: amphipod mortality only
- Eohaustorius or Rhepoxynius
11Data Screening
13Strata
- Are there differences in contamination among regions of CA that are likely to affect the development of a chemical indicator?
- Geographic Strata
- North (North of Pt. Conception)
- South (South of Pt. Conception)
- Habitat Strata
- Ports, Marinas, Shallow
- Magnitude of contamination
- Relationship between contamination and toxicity
14Strata
15Strata
16Strata Decisions
- Treat North and South as separate strata
- Different contamination levels and sources
- May be different empirical relationships with effects
- Adequate data for statistical analyses
- Do not distinguish among habitat regions
- Limited data for some habitats
- Added complexity of application
18Calibration and Validation Datasets
- Calibration/development dataset
- Screened data minus withheld validation data
- Calibration of SQGs
- Development of new SQGs
- Comparison of performance
- Validation dataset
- Confirm performance of candidate SQGs
19Validation Dataset
- Independent subset of SQO database plus new studies
- Approximately 30% of data, selected randomly to represent contamination gradient
- North and South data are proportional between the calibration/development and validation datasets
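The slides do not say how the random selection was made to span the contamination gradient. One way it could be done is sketched below: rank samples by a contamination score, cut the ranking into bins, and withhold about 30% at random within each bin. The `score` field, the number of bins, and the seed are all assumptions for illustration.

```python
import random

def split_validation(samples, frac=0.30, n_bins=5, seed=0):
    """Withhold roughly `frac` of the samples, drawn at random within
    contamination-ranked bins so the validation set spans the gradient."""
    rng = random.Random(seed)
    ranked = sorted(samples, key=lambda s: s["score"])  # hypothetical score
    calibration, validation = [], []
    for b in range(n_bins):
        lo = b * len(ranked) // n_bins
        hi = (b + 1) * len(ranked) // n_bins
        bin_samples = ranked[lo:hi]
        k = round(frac * len(bin_samples))
        picked = set(rng.sample(range(len(bin_samples)), k))
        for i, s in enumerate(bin_samples):
            (validation if i in picked else calibration).append(s)
    return calibration, validation

# Made-up samples; in practice this would be run once per stratum (North, South).
samples = [{"id": i, "score": i * 0.1} for i in range(100)]
cal, val = split_validation(samples)
print(len(cal), len(val))
```

Running the split separately for North and South would keep the two strata proportional between the datasets, as the slide describes.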
20Bay/Estuary Samples in Database After Screening
Number of samples (matched chemistry and toxicity)
Stratum    Calibration/Development    Validation
North      504                        298
South      800                        328
21Presentation Overview
- Objectives
- Data preparation
- SQG calibration and development
- Validation
- Conclusions
- Next steps
SQG calibration and development topics: existing national SQGs, calibration of national SQGs, new approaches
22National SQGs
- Two main types of approaches
- Empirical and Mechanistic
- Empirical
- Intended to aid in prediction of potential for adverse impacts
- Derived from analysis of extensive field datasets
- Various approaches for development of chemical values
- Little explicit consideration of bioavailability
- Incorporate a wide range of chemicals
- Work best when applied to mixture of contaminants in a sediment
23Empirical SQGs
ERM (Effects Range Median)
- Derivation: Analysis of diverse studies and effects values
- Metric: Mean quotient for chemical mixture
- Source: Long et al.
Consensus MEC (mid-range effect concentration)
- Derivation: Geometric mean of similar guidelines
- Metric: Mean quotient for chemical mixture
- Source: MacDonald et al., Swartz, SCCWRP
SQGQ-1 (mid-range effect concentration)
- Derivation: Subset of chemical guidelines from various sources
- Metric: Mean quotient for chemical mixture
- Source: Fairey et al.
Logistic Regression
- Derivation: Regression model for each chemical
- Metric: Probability of toxicity (Pmax) for chemical mixture
- Source: Field et al.
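The "mean quotient" metric used by several of the empirical SQGs can be illustrated with a minimal sketch: divide each measured concentration by its guideline value and average the quotients. The guideline numbers below are placeholders chosen for easy arithmetic, not published ERM or MEC values, and the function name is hypothetical.

```python
def mean_quotient(concs, guidelines):
    """Average of concentration/guideline over chemicals that have a guideline."""
    quotients = [concs[c] / guidelines[c] for c in concs if c in guidelines]
    if not quotients:
        raise ValueError("no chemicals with a guideline value")
    return sum(quotients) / len(quotients)

# Placeholder guideline values and one made-up sample (same units for each pair).
guidelines = {"Cu": 100.0, "Zn": 200.0, "Pb": 50.0}
sample = {"Cu": 50.0, "Zn": 50.0, "Pb": 37.5}

mq = mean_quotient(sample, guidelines)
print(mq)  # quotients are 0.5, 0.25, 0.75
```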
24National SQGs
- Mechanistic
- Intended to assess potential for impacts due to specific chemical groups, not predict overall effects
- Derived using equilibrium partitioning and toxicological dose-response information
- Incorporate water quality objectives
- Explicit consideration of bioavailability
- Applicable to a restricted range of chemicals
- Work best when applied to specific contaminants
25Mechanistic SQGs
EqP Organics (acute and chronic effects)
- Metric: Organic carbon normalized sum of toxic units (TU)
- Source: EPA, CA Toxics Rule
EqP Metals
- Metric: Acid volatile and organic carbon normalized difference between metal concentration and strong binding capability
- Source: EPA
26National SQGs
28Calibration of National SQGs
- Objective: Improve empirical relationship between chemistry and effects by modifying national SQGs to address potential sources of uncertainty
- Variation in bioavailability of organics: differences in organic carbon content of sediment influence exposure
- Variation in natural background concentration of metals: metal content of sediment matrix varies according to particle type and source material
- CA-specific variations in chemical mixtures: relative proportions of contaminants within regions of the State may differ from national average
29Organics Bioavailability Calibration
- TOC normalization to represent changes in bioavailability
- Conc./TOC
- Evaluate whether predictive relationship for chemical classes is improved after normalization
- Correlation analysis
- Use normalized values as basis for SQG calibration if there is evidence of improved predictive relationship
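The comparison described above can be sketched as follows: compute Conc./TOC for each sample, then compare a rank correlation with toxicity before and after normalization. The slides say only "correlation analysis"; using Spearman's coefficient here is an assumption, and all data values are made up.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up example: organic concentration, TOC fraction, amphipod mortality (%).
conc = [100.0, 200.0, 300.0, 400.0, 500.0]
toc = [0.01, 0.04, 0.02, 0.05, 0.02]
mortality = [5.0, 10.0, 20.0, 30.0, 45.0]

conc_toc = [c / t for c, t in zip(conc, toc)]   # Conc./TOC
rho_raw = spearman(conc, mortality)
rho_norm = spearman(conc_toc, mortality)
print(rho_raw, rho_norm)
```

In this toy dataset normalization weakens the correlation, so the raw values would be kept as the basis for calibration under the decision rule in the last bullet.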
30TOC Normalization
Relationship to sediment toxicity is not improved
by TOC normalization of organics
31Metal Background Calibration
- Metals occur naturally in the environment
- Silts and clays have higher metal content
- Source of uncertainty in identifying anthropogenic impact
- Background varies due to sediment type and regional differences in geology
- Need to differentiate between natural background levels and anthropogenic input
- Investigate utility for empirical guideline development
- Potential use for establishing regional background levels
32Reference Element Normalization
- Established methodology applied by geologists and environmental scientists
- Reference element covaries with natural sediment metals and is insensitive to anthropogenic inputs
- Regression between reference element and metal developed using a dataset of uncontaminated samples
- Regression line indicates natural background metal concentration for different sediment particle size composition
- Use of iron as reference element validated for southern California
- 1994 and 1998 Bight regional surveys
33Iron Normalization Approach
- Log transformed data
- Selected subset of reference stations from SQO database
- Least potential for anthropogenic metal enrichment
- Nontoxic stations in lowest 30th percentile of DDT, PCB, and PAH concentrations
- Reviewed selected stations using GIS to eliminate redundant and likely impacted sites
- Calculated regressions
- Used residuals from regression as normalized values
- Compared relationship of normalized/non-normalized data to toxicity
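A minimal sketch of the regression and residual steps, assuming an ordinary least-squares fit of log10(metal) on log10(iron) at the reference stations. The slides do not give the regression form or units, so the fit, the function names, and every number below are illustrative assumptions.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def iron_residual(fe, metal, slope, intercept):
    """Log-scale residual: observed metal minus background predicted from iron."""
    return math.log10(metal) - (intercept + slope * math.log10(fe))

# Made-up reference stations (iron and copper in arbitrary consistent units).
ref_fe = [1.0, 2.0, 4.0, 8.0]
ref_cu = [5.0, 10.0, 20.0, 40.0]
slope, intercept = fit_line([math.log10(v) for v in ref_fe],
                            [math.log10(v) for v in ref_cu])

enriched = iron_residual(2.0, 100.0, slope, intercept)    # well above background
background = iron_residual(4.0, 20.0, slope, intercept)   # on the regression line
print(round(enriched, 3), round(background, 3))
```

A positive residual indicates more metal than the iron content alone would predict, which is the quantity the slides then compare against toxicity.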
34Southern California Results
Significant regressions obtained for metals of interest in all strata
35Residual Calculation
Residual = relative metal enrichment
Used for correlation analysis with amphipod mortality
36Iron Normalization
Relationship to sediment toxicity is not improved
by iron normalization of metals
37Normalization Summary
- TOC and iron normalization are apparently not effective for improving relationships between chemistry and toxicity
- Have not pursued use of normalized data in calibrating/developing SQGs
- Iron normalization may be useful for establishing background metal levels
38Calibration of SQGs
- Adjustment of models or chemical specific values based on California data
- Logistic Regression Model (Pmax)
- Excluded individual chemical models with poor fit
- Antimony, Arsenic, Chromium, Nickel
- Adjusted Pmax model to fit CA data (N, S, All)
- ERM
- Derived CA-specific values using modified method of Ingersoll et al.
- Sample-based analysis
39CA ERM Calculation
- Select paired chemistry and amphipod toxicity data by stratum
- Log transform all chemistry data
- Classify samples as toxic/nontoxic based on 20% mortality threshold
- Calculate median concentration of the nontoxic samples
- Select only those toxic samples where concentration of individual chemicals > 2x nontoxic median
- CA ERM = median concentration from screened toxic samples
- At least 10 toxic samples required for ERM calculation
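The steps above can be sketched for a single chemical in one stratum. Several details are assumptions not stated on the slide: samples at exactly 20% mortality are treated as toxic, the 10-sample minimum is applied to the screened toxic samples, and the medians are taken on the log scale and back-transformed. The function name and data are hypothetical.

```python
import math
from statistics import median

def ca_erm(concs, mortality, tox_pct=20.0, factor=2.0, min_toxic=10):
    """Return a CA-style ERM for one chemical, or None if data are too sparse."""
    logs = [math.log10(c) for c in concs]            # log transform chemistry
    nontoxic = [x for x, m in zip(logs, mortality) if m < tox_pct]
    toxic = [x for x, m in zip(logs, mortality) if m >= tox_pct]
    if not nontoxic:
        return None
    cutoff = median(nontoxic) + math.log10(factor)   # 2x the nontoxic median
    screened = [x for x in toxic if x > cutoff]      # keep clearly elevated toxic samples
    if len(screened) < min_toxic:
        return None
    return 10 ** median(screened)                    # back-transform to concentration

# Made-up data: 5 nontoxic samples followed by 11 toxic samples.
concs = [10, 20, 30, 40, 50] + [40, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
mortality = [5] * 5 + [50] * 11
erm = ca_erm(concs, mortality)
print(round(erm, 1))
```

Here the nontoxic median is 30, so the toxic sample at 40 is screened out and the ERM is the median of the remaining ten toxic samples.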
40Substantial differences in some ERM values
derived for California datasets compared to
nationally derived values
42New SQG Characteristics
- Compatible with multiple lines of evidence assessment framework
- Capability to include/adapt to new contaminants of concern
- Adaptable to different application objectives
- Able to use toxicity and benthic community impact data in development
- Result reflects uncertainty of empirical relationship
- Categorical classification and multiple thresholds
- Based on individual chemical models or values
- Thresholds can be adjusted
- Accept continuous and categorical data
- Some type of weighting based on strength of relationship
43Kappa Statistic
- Developed in the 1960s-70s
- Peer-reviewed literature describes derivation and interpretation
- Used in medicine, epidemiology, psychology to evaluate observer agreement/reliability
- Similar problem to SQG development and assessment
- Accommodates multiple categories of classification
- Multiple thresholds can be adjusted by user
- Categorical or ordinal data
- Result reflects magnitude of disagreement (can be used to weight values)
- Sediment quality assessment is a new application
44Kappa
- Evaluates agreement between 2 methods of classification
- Chemical SQG result
- Toxicity test result
- Magnitude of error affects score
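A kappa in which the magnitude of error affects the score is usually a weighted kappa. The sketch below uses linear disagreement weights over four ordered categories; the slides do not state which weighting scheme was used, so this is one common form rather than the study's exact calculation.

```python
def weighted_kappa(a, b, n_cat=4):
    """Linearly weighted Cohen's kappa for two lists of categories 1..n_cat."""
    n = len(a)
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for x, y in zip(a, b):
        obs[x - 1][y - 1] += 1.0 / n                 # observed proportions
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]
    observed = expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = abs(i - j) / (n_cat - 1)             # larger miss, larger penalty
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]          # chance disagreement
    return 1.0 - observed / expected

# Made-up classifications: toxicity category vs. two chemical SQG results.
tox = [1, 1, 2, 2, 3, 3, 4, 4]
sqg_near = [1, 2, 2, 3, 3, 4, 4, 3]   # every error is one category off
sqg_far = [1, 4, 2, 4, 3, 1, 4, 1]    # errors are two or three categories off

print(weighted_kappa(tox, sqg_near), weighted_kappa(tox, sqg_far))
```

Both SQG results misclassify four of eight samples, but the one whose errors stay next to the diagonal scores much higher.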
45Chemical 1: Good Association Between Concentration and Effect (most errors in cells adjacent to diagonal)
46Chemical 2: Poor Association Between Concentration and Effect (more errors in categories distant from diagonal)
47Kappa Analysis Output
- Kappa (k)
- Similar to correlation coefficient
- Confidence intervals
- Multiple thresholds
- Optimized for correspondence to effect levels
- Applied to other data to predict effect category (cat)
- E.g., Category 1, 2, 3, or 4
48New Kappa SQGs
- Derived Kappa and thresholds for target chemicals using amphipod mortality data
- As, Cd, Cr, Cu, Pb, Hg, Ni, Ag, Zn, total chlordane, total DDT, total PAH, total PCB
- Calculated Kappa score for each chemical in sample
- k x cat
- Mean weighted Kappa score
- Average of k x cat
- Each constituent contributes to final classification in a manner proportional to reliability of relationship
- Mixture joint effects model
- Maximum Kappa
- Highest Kappa score for any individual chemical
- Independent mixture effects model
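The two summary scores can be sketched as follows, reading "average of k x cat" as a simple mean over the chemicals measured in the sample. The kappa values and thresholds are invented placeholders, and assigning a concentration equal to a threshold to the higher category is an assumption.

```python
from bisect import bisect_right

def category(conc, thresholds):
    """Effect category 1..4 from three ascending thresholds."""
    return bisect_right(thresholds, conc) + 1

def kappa_scores(sample, models):
    """Per-chemical score: kappa (k) times predicted category (cat)."""
    return {chem: m["kappa"] * category(sample[chem], m["thresholds"])
            for chem, m in models.items() if chem in sample}

# Hypothetical per-chemical models (not values from the study).
models = {
    "Cu": {"kappa": 0.4, "thresholds": [50.0, 100.0, 200.0]},
    "Zn": {"kappa": 0.2, "thresholds": [100.0, 300.0, 500.0]},
    "tDDT": {"kappa": 0.5, "thresholds": [1.0, 5.0, 20.0]},
}
sample = {"Cu": 120.0, "Zn": 80.0, "tDDT": 30.0}

scores = kappa_scores(sample, models)
mean_weighted_kappa = sum(scores.values()) / len(scores)  # joint effects view
max_kappa = max(scores.values())                          # independent effects view
print(scores, mean_weighted_kappa, max_kappa)
```

Chemicals with a weak relationship to toxicity (low k) contribute little to the mean, which is how each constituent ends up weighted by the reliability of its relationship.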
49Presentation Overview
- Objectives
- Data preparation
- SQG calibration and development
- Validation
- Conclusions
- Next steps
Validation topics: categorical classification, correlation, predictive ability
50Evaluation Process
- Compare performance of candidate SQG approaches in a manner relevant to desired application
- Ability to accurately classify presence and magnitude of biological effects based on chemistry
- California marine embayment data
- Use statistical measures to identify short list of best performing approaches
- Categorical classification
- Correlation
- Validate performance results
- Validation dataset
- Rank candidate approaches
- Examine significance of differences
- Predictive ability
51Evaluation of SQGs
- Categorical (ability to classify each station into one of four toxicity response categories)
- Kappa value
- Level 1: <10% mortality, Level 2: 10-20%, Level 3: 20-40%, Level 4: >40%
- SQG thresholds optimized for best score
- Spearman's correlation coefficient
- Nonparametric measure of association
- Independent of Kappa calculation
- Validation
- Used same thresholds selected for calibration dataset
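As a small illustration of the categorical evaluation, the sketch below assigns each station a toxicity level from percent mortality and cross-tabulates it against an SQG category. How the shared boundary values (10, 20, 40) are assigned is an assumption, since the slide lists overlapping ranges; the data are made up.

```python
def toxicity_level(mortality_pct):
    """Map amphipod mortality (%) to response Levels 1-4."""
    if mortality_pct < 10:
        return 1
    if mortality_pct < 20:
        return 2
    if mortality_pct <= 40:
        return 3
    return 4

def cross_tab(sqg_cats, tox_levels, n_cat=4):
    """Rows: SQG category; columns: toxicity level."""
    table = [[0] * n_cat for _ in range(n_cat)]
    for s, t in zip(sqg_cats, tox_levels):
        table[s - 1][t - 1] += 1
    return table

mortality = [3, 12, 25, 55, 8, 41]
levels = [toxicity_level(m) for m in mortality]
sqg = [1, 2, 2, 4, 2, 3]            # made-up SQG categories for the same stations

table = cross_tab(sqg, levels)
exact = sum(table[i][i] for i in range(4))                       # on the diagonal
adjacent = sum(table[i][j] for i in range(4) for j in range(4)
               if abs(i - j) == 1)                               # one level off
print(levels, exact, adjacent)
```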
52SQG Evaluation North
53SQG Evaluation South
54SQG Validation North
All top ranked SQGs validate
55SQG Validation South
All top ranked SQGs validate
56Significance of Differences
- Are the differences in performance significant to the user?
- Do differences in SQG ranking correspond to greater accuracy, applicability, or utility of the SQG?
- Better predictive ability (efficiency)?
- Better sensitivity or specificity?
- Need to look at the data
57SQGs Applied to So CA Data
58Predictive Ability
- Negative Predictive Value (Nontoxic Efficiency) = C/(C+A) x 100 (percent of no hits that are nontoxic)
- Specificity = C/(C+D) x 100 (percent of all nontoxic samples that are classified as a no hit)
- Positive Predictive Value (Toxic Efficiency) = B/(B+D) x 100 (percent of hits that are toxic)
- Sensitivity = B/(B+A) x 100 (percent of all toxic samples that are classified as a hit)
- As implied by these definitions: A = toxic, no hit; B = toxic, hit; C = nontoxic, no hit; D = nontoxic, hit
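The four measures can be computed from paired toxic/hit flags as in the sketch below. The cell labels follow the definitions on this slide (A = toxic with no hit, B = toxic with a hit, C = nontoxic with no hit, D = nontoxic with a hit); the counts are made up and the function name is hypothetical.

```python
def predictive_ability(is_toxic, is_hit):
    """Percent NPV, specificity, PPV, and sensitivity from paired flags."""
    pairs = list(zip(is_toxic, is_hit))
    A = sum(1 for t, h in pairs if t and not h)      # toxic, no hit
    B = sum(1 for t, h in pairs if t and h)          # toxic, hit
    C = sum(1 for t, h in pairs if not t and not h)  # nontoxic, no hit
    D = sum(1 for t, h in pairs if not t and h)      # nontoxic, hit
    return {
        "nontoxic_efficiency": 100.0 * C / (C + A),  # negative predictive value
        "specificity": 100.0 * C / (C + D),
        "toxic_efficiency": 100.0 * B / (B + D),     # positive predictive value
        "sensitivity": 100.0 * B / (B + A),
    }

# Made-up counts: A=10, B=30, C=40, D=20.
is_toxic = [True] * 40 + [False] * 60
is_hit = [False] * 10 + [True] * 30 + [False] * 40 + [True] * 20

result = predictive_ability(is_toxic, is_hit)
print(result)
```

Repeating the calculation over a range of SQG thresholds would trace out the efficiency versus sensitivity or specificity tradeoff discussed on the following slides.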
59South mERMq
- SQG performance is threshold dependent
- Inverse relationship between efficiency (toxic or nontoxic) and specificity or sensitivity
- Improved SQG accuracy when greater efficiency obtained
- Improved SQG utility when greater sensitivity or specificity obtained without sacrificing efficiency
60South mERMq
- Plots of efficiency vs. specificity or
sensitivity illustrate tradeoffs in SQG
performance at different thresholds
61South Candidate SQGs
- Mean weighted Kappa shows improved overall
utility for distinguishing both nontoxic and
toxic samples
62North Candidate SQGs
- Mean weighted Kappa shows improved specificity
and toxic efficiency
63Evaluation and Validation Summary
- North
- Mean weighted Kappa has highest performance
- Northern California ERM and Northern California Pmax also perform better than others
- South
- Mean weighted Kappa has highest performance
- Max Kappa also performs better than others
- Validation results consistent with evaluation
- The approaches are robust
65Conclusions
- Pursue mean weighted Kappa as component of chemistry LOE
- Best relationship with toxicity
- Easily adaptable to new chemicals or different datasets
- Provides information on strength of relationship
- Use EqP benchmarks as component of stressor identification, not chemical LOE score
- Predictive value not strong enough
- Provide guidance on calculation and interpretation
66Presentation Overview
- Objectives
- Data preparation
- SQG calibration and development
- Validation
- Conclusions
- Next steps
Next steps topics: thresholds, benthos
67Options for Threshold Development
- Optimum statistical fit to effects in CA
- Toxicity only?
- Benthos only?
- Combination?
- Based on accuracy or error rate
- Consideration of national patterns
68National vs. CA data
North
South
- Narrower contamination range in CA
- High range threshold (1.5) of limited utility
69Benthos
- How should benthic community response be incorporated into the chemical LOE?
- In the SQG approach?
- In the thresholds?
- Factors to consider
- Strength of relationship between benthos and chemistry or toxicity
- Relative sensitivity of benthos and toxicity responses
- Nature of association with chemistry
- Are there different drivers?
70Benthos
- Preliminary data analysis
- Used existing benthic response index (BRI) data for So. Calif. and San Francisco Bay
- South San Francisco Bay (North) n = 83
- Southern California (South) n = 203
- Examined three aspects of relationship with chemistry
- Strength of relationship with SQGs and chemicals
- Relative sensitivity of response compared to toxicity
- Chemical drivers
71Benthos
72Benthos
- Significant correlations are present between BRI
scores and SQGs or individual chemicals
73Benthos
- Strong correlation between benthic response and amphipod mortality
- Benthic response when no toxicity is evident
74Relative Sensitivity of Benthos Response
- Use cumulative distribution function to indicate approximate thresholds for increased incidence of impacts (10th percentile) and likely impacts (50th percentile)
- Compare results for toxicity and benthos (same dataset)
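One plausible reading of this step is to take the SQG scores of the samples that show an effect and read the 10th and 50th percentiles off their cumulative distribution. The slides do not specify the exact procedure, so the selection of samples, the interpolation rule, and the data below are all assumptions.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between order statistics."""
    data = sorted(values)
    pos = pct * (len(data) - 1) / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

# Made-up SQG scores for samples showing an effect (toxic, or impacted benthos).
affected_scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]

increased_incidence = percentile(affected_scores, 10)   # 10th percentile
likely_impact = percentile(affected_scores, 50)         # 50th percentile
print(increased_incidence, likely_impact)
```

Running the same calculation once with toxicity as the effect and once with benthic response, on the same dataset, gives the side-by-side comparison the second bullet calls for.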
75Relative Sensitivity of Benthos Response
- Toxicity and benthos responses occur over similar
contamination ranges
76Chemical Correlations North
Benthos
Chlordane, copper, and zinc show different
relative influence on effects
Toxicity
77Chemical Correlations South
Benthos
Cadmium, DDTs, and zinc show different relative
influence on effects
Toxicity
78Recommendations
- Develop thresholds of application specific to toxicity and benthos
- Need to incorporate both types of responses into assessment
- Continue development of a SQG that is best predictor of benthic community impacts
- May respond to different chemical mixtures
- Need revised benthic index data to complete development and evaluation
- Determine whether toxicity and benthos SQGs are needed
- A method to combine the results will be needed to produce a single chemistry LOE score