Data Mining Approaches for Water Quality Protection

1
Data Mining Approaches for Water Quality Protection
Edwin Brands, R. Rajagopal, Lifang Huang, Katie Foreman, Bethany Gast, David Riley
Department of Geography
Center for Health Effects of Environmental Contamination
The University of Iowa, Iowa City, IA
Cooperative State Research, Education, and Extension Service
Grant 2001-51130-11373
2
Data Mining Exercise
  • Goal: Create a word from the following

76 83 69 68 73
  • What do we need to do first?

3
Data Mining Exercise
  • Recognize the form of the data (ASCII character
    codes) and translate

76 = L, 83 = S, 69 = E, 68 = D, 73 = I
4
Data Mining Exercise
76 83 69 68 73
Re-order these characters
  • L S E D I

? ? ? ? ?
120 possible orderings (5! = 120)
S L I D E
5
Data Mining Steps
  • Cleaning (Standardization, Translation)
  • Processing (Reorientation, Re-ordering)
  • Winnowing (Sifting, Narrowing Down, Choosing the
    Right Word)
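
As a minimal sketch, the word exercise can be run through these three steps in Python. The small word list `known_words` is a hypothetical stand-in for a real dictionary, and all variable names are illustrative only.

```python
from itertools import permutations

codes = [76, 83, 69, 68, 73]

# Cleaning: recognize the numbers as ASCII character codes and translate them.
letters = [chr(code) for code in codes]

# Processing: generate every re-ordering of the five letters (5! = 120).
candidates = {"".join(p) for p in permutations(letters)}

# Winnowing: keep only the orderings that are real words.
# known_words is a hypothetical stand-in for a real dictionary.
known_words = {"SLIDE", "IDLES", "SIDLE", "DELIS", "WATER"}
matches = sorted(candidates & known_words)

print(letters)
print(len(candidates))
print(matches)
```

With this word list more than one anagram survives the winnowing step, so picking "SLIDE" still depends on context (a slide presentation). That is the "choosing the right word" part of winnowing.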

6
Data Mining (our definition)
  • Relies on cognition, pattern recognition,
    statistics, and computer programming in order to
    sift through large databases in an organized
    manner and uncover valuable relationships.
  • In the context of this project: the process of
    squeezing the juice out of the SDWA (Safe Drinking
    Water Act) and other datasets.

7
Background: Water Quality in Agroecosystems
  • 3-Year Effort
  • Year 1: SDWA
  • Year 2: CWA/ambient water
  • Year 3: Integration

8
SDWA, Overarching Goal
  • To protect public health: 250 million consumers
    of water from 54,000 Community Water Supplies
  • Treatment/Filtration
  • Monitoring/Reporting: 86 constituents

9
SDWA and Data Collection
  • Each year, millions of dollars' worth of data
    collected for compliance with SDWA
  • Total investment (1974-present): > $1 billion
  • Data used for:
      • making treatment/blending decisions
      • determining compliance
      • adjusting sampling frequency
      • informing consumers about water quality

10
SDWA: What is Measured?
  • Organic Chemicals (e.g. pesticides)
  • Inorganic Chemicals (e.g. arsenic)
  • Disinfectants/Disinfection Byproducts
    (e.g. chlorine)
  • Microorganisms (e.g. coliform bacteria)
  • Radionuclides (e.g. radium)

11
SDWA CONSTITUENTS
86 Contaminants
  • Microorganisms (7): Cryptosporidium, Giardia lamblia,
    Heterotrophic plate count, Legionella, Total Coliforms,
    Turbidity, Viruses (enteric)
  • Disinfection Byproducts (4): Bromate, Chlorite,
    Haloacetic acids (HAA5), Total Trihalomethanes (TTHMs)
  • Disinfectants (3): Chloramines (as Cl2), Chlorine (as Cl2),
    Chlorine dioxide (as ClO2)
  • Inorganic Chemicals (16): Antimony, Arsenic, Asbestos,
    Barium, Beryllium, Cadmium, Chromium (total), Copper,
    Cyanide (as free cyanide), Fluoride, Lead,
    Mercury (inorganic), Nitrate (measured as Nitrogen),
    Nitrite (measured as Nitrogen), Selenium, Thallium
  • Organic Chemicals (53): Acrylamide, Alachlor, Atrazine,
    Benzene, Benzo(a)pyrene (PAHs), Carbofuran,
    Carbon tetrachloride, Chlordane, Chlorobenzene, 2,4-D,
    Dalapon, 1,2-Dibromo-3-chloropropane, o-Dichlorobenzene,
    p-Dichlorobenzene, 1,2-Dichloroethane,
    1,1-Dichloroethylene, cis-1,2-Dichloroethylene,
    trans-1,2-Dichloroethylene, Dichloromethane,
    1,2-Dichloropropane, Di(2-ethylhexyl) adipate,
    Di(2-ethylhexyl) phthalate, Dinoseb,
    Dioxin (2,3,7,8-TCDD), Diquat, Endothall, Endrin,
    Epichlorohydrin, Ethylbenzene, Ethylene dibromide,
    Glyphosate, Heptachlor, Heptachlor epoxide,
    Hexachlorobenzene, Hexachlorocyclopentadiene, Lindane,
    Methoxychlor, Oxamyl (Vydate),
    Polychlorinated biphenyls (PCBs), Pentachlorophenol,
    Picloram, Simazine, Styrene, Tetrachloroethylene,
    Toluene, Toxaphene, 2,4,5-TP (Silvex),
    1,2,4-Trichlorobenzene, 1,1,1-Trichloroethane,
    1,1,2-Trichloroethane, Trichloroethylene,
    Vinyl chloride, Xylenes (total)
  • Radionuclides (4): Alpha particles,
    Beta particles and photon emitters,
    Radium 226 and Radium 228, Uranium
12
Public Water Supplies in Eastern and Central Iowa Watersheds
(Map. Legend: Public Water Supply, State Boundary, Iowa River Watershed, Des Moines River Watershed)
13
Data Received from CHEEC
14
Data Received from CHEEC

15
Data Standardization and Transformation
  • Problems with original data set
      • Incomplete information (Sample IDs missing in
        some records)
      • Difficult to collapse and generalize (due to file
        structure)
  • Data Standardization
      • Create unique sample ID for records missing IDs
  • Database Transformation
      • Transform original database into tabular format
      • Remove redundancy

16
Data Standardization
Missing Sample IDs
Replace with unique IDs
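
A minimal sketch of this standardization step, assuming each record is a dictionary with a `sample_id` field and that every record lacking an ID should receive its own generated ID. The field names, the `GEN-` prefix, and the function `fill_missing_ids` are hypothetical; the slides do not show how the IDs were actually generated.

```python
from itertools import count

def fill_missing_ids(records, prefix="GEN"):
    """Return copies of the records with a unique ID wherever one is missing."""
    # IDs already in use, so generated IDs never collide with real ones.
    existing = {r["sample_id"] for r in records if r.get("sample_id")}
    counter = count(1)
    filled = []
    for record in records:
        record = dict(record)  # copy, so the input list is left unchanged
        if not record.get("sample_id"):
            new_id = f"{prefix}-{next(counter):05d}"
            while new_id in existing:
                new_id = f"{prefix}-{next(counter):05d}"
            existing.add(new_id)
            record["sample_id"] = new_id
        filled.append(record)
    return filled

# Hypothetical example records (not from the CHEEC data).
records = [
    {"sample_id": "1001", "constituent": "Nitrate"},
    {"sample_id": "", "constituent": "Atrazine"},
    {"sample_id": None, "constituent": "Arsenic"},
]
filled = fill_missing_ids(records)
print([r["sample_id"] for r in filled])
```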
17
Original File Structure
18
Transformation Steps/Rules
  • Create a list of all measured constituents
  • Transpose them to read horizontally rather than
    vertically
  • Eliminate all duplicate analyses (same day,
    place, and sample)
  • Average all replicate analyses (same day, place,
    different sample)
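
The rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's original code: records are assumed to be tuples of (supply, date, sample ID, constituent, value), duplicates are assumed to repeat the same value so the first one is kept, and the function name `to_flat_table` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def to_flat_table(records):
    """Turn one-measurement-per-row records into one row per supply and date."""
    # Rule 1: list all measured constituents (these become the columns).
    constituents = sorted({r[3] for r in records})

    # Rule 3: duplicates share day, place, and sample; keep the first value.
    unique = {}
    for supply, date, sample_id, constituent, value in records:
        unique.setdefault((supply, date, sample_id, constituent), value)

    # Rule 4: replicates share day and place but not sample; average them.
    grouped = defaultdict(list)
    for (supply, date, _sample_id, constituent), value in unique.items():
        grouped[(supply, date, constituent)].append(value)

    # Rule 2: transpose so constituents read horizontally.
    rows = defaultdict(dict)
    for (supply, date, constituent), values in grouped.items():
        rows[(supply, date)][constituent] = mean(values)

    table = [
        {"supply": s, "date": d, **{c: vals.get(c) for c in constituents}}
        for (s, d), vals in sorted(rows.items())
    ]
    return constituents, table

# Hypothetical example records (values are made up for illustration).
records = [
    ("A", "2001-05-01", "S1", "nitrate", 4.0),
    ("A", "2001-05-01", "S1", "nitrate", 4.0),   # duplicate analysis
    ("A", "2001-05-01", "S2", "nitrate", 6.0),   # replicate analysis
    ("A", "2001-05-01", "S1", "atrazine", 0.5),
    ("B", "2001-05-02", "S3", "nitrate", 1.0),
]
constituents, table = to_flat_table(records)
for row in table:
    print(row)
```

Constituents that were not measured for a given supply and date appear as `None`, which corresponds to blank cells in a flat file.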

19
File Transformation: 4 Potential Methods
  • Manual
  • Excel Pivot Table
  • Access Cross Tab Query
  • Original Computer Code (Lifang)

20
Transformed File (Flat File, or Tabular Format)
21
(No Transcript)
22
File Structure Comparison
  • New Structure
      • Tabular
      • Each row = 1 sample; each column = 1 constituent
      • Easy to collapse or generalize
      • Large amount of blank space
  • Old Structure
      • Relational
      • Each row = 1 record
      • Not easy to collapse or generalize
      • Easy to retrieve individual items
      • Efficient storage

23
What do we want to know?
  • In the context of the SDWA, protection of public
    health is the main goal, so:
      • Which SDWA contaminants are of public
        health concern?
      • Which SDWA contaminants occur, and in
        what concentrations?

24
Narrowing the Database
  • Create a summary table (min/max, mean/median,
    percentiles)
  • Merge summary table with SDWA requirements
  • Keep only SDWA contaminants
  • Delete all non-occurring contaminants
  • Delete contaminants whose maximum found value is
    less than 1/5th of the SDWA standard
  • Generate a final table of contaminants
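
The narrowing rules above might look like the following in Python. This is a sketch under assumptions: measurements are held as a dictionary of detected concentrations per constituent (an empty list meaning the contaminant never occurred), standards are in the same units as the measurements, and the names `summarize` and `narrow` are hypothetical. Percentiles are omitted for brevity, and all values are illustrative, not taken from the project data.

```python
from statistics import mean, median

def summarize(values):
    """Summary statistics for one constituent (percentiles omitted here)."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

def narrow(measurements, standards, fraction=0.2):
    """Apply the narrowing rules and return the final table as a dict."""
    final = {}
    for name, values in measurements.items():
        if name not in standards:
            continue  # keep only SDWA contaminants
        if not values:
            continue  # delete non-occurring contaminants
        summary = summarize(values)
        if summary["max"] < fraction * standards[name]:
            continue  # maximum is below 1/5th of the standard
        final[name] = {**summary, "standard": standards[name]}
    return final

# Illustrative values only (mg/L); not results from the 18 supplies.
standards = {"nitrate": 10.0, "atrazine": 0.003, "arsenic": 0.010}
measurements = {
    "nitrate": [2.0, 8.5, 12.0],
    "atrazine": [0.0002, 0.0005],
    "arsenic": [],
    "sodium": [30.0],
}
final_table = narrow(measurements, standards)
print(final_table)
```

The 1/5th threshold is exposed as the `fraction` parameter so that a different cutoff can be tried without changing the rest of the procedure.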

25
Summary Table
26
Narrowing the Database

27
Final Table
28
Results from 18 Supplies
Average Number of Contaminants: 7
29
SDWA CONSTITUENTS
(Same list of 86 contaminants as shown on slide 11.)
30
Conclusion/Question
  • Only about 3.3% (7 out of 210) of contaminants
    are found in significant concentrations in our 18
    supplies.
  • If similar findings hold for the state of Iowa
    and/or the entire U.S., what does this say about
    SDWA monitoring requirements?

31
Limitations
  • Small number of supplies
  • Similar types of supplies: would such an approach
    be useful with a homogeneous group of systems or
    would additional processing steps be necessary?
  • Different data mining rules or procedures might
    produce different outcomes

32
Summary
  • Cleaning
  • Processing
  • Winnowing