Title: Better Data, Better Science! Better Science through Better Data Management
- Todd D. O'Brien
- NOAA NMFS - COPEPOD
BETTER DATA is
- Easily Accessible
- Well Documented
- Integrated / Interlinked
- The Best Quality possible
Oops! (When Data Management Fails)
WHY QC?
- To find errors in the data
- To detect instrument failure or sampling problems
- To detect phenomena of scientific interest
  - Natural physical or biological events
  - Something new
WHY QC?
- To find errors in the data that were not present in the original data?!
  - Data pathway errors
    - human error
    - computer error
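Data pathway errors can often be caught by comparing each transferred or reformatted copy against the original, record by record. A minimal Python sketch, assuming both versions can be loaded as dictionaries keyed by a station ID; the function name `compare_records` and the record layout are hypothetical:

```python
import math

def compare_records(original, transferred, rel_tol=1e-9):
    """Return (station, field, message) tuples for every difference found.

    Both arguments are hypothetical {station_id: {field: value}} mappings.
    """
    problems = []
    for station, orig_rec in original.items():
        new_rec = transferred.get(station)
        if new_rec is None:
            problems.append((station, None, "station lost in transfer"))
            continue
        for field, old in orig_rec.items():
            if field not in new_rec:
                problems.append((station, field, "field lost in transfer"))
                continue
            new = new_rec[field]
            # Compare numbers with a tolerance, everything else exactly.
            numeric = all(isinstance(v, (int, float)) for v in (old, new))
            same = math.isclose(old, new, rel_tol=rel_tol) if numeric else old == new
            if not same:
                problems.append((station, field, f"{old!r} became {new!r}"))
    # Stations that appear only in the copy are suspicious too (e.g. duplicates).
    for station in transferred.keys() - original.keys():
        problems.append((station, None, "station not in original"))
    return problems

original = {"ST01": {"lat": 45.5, "lon": -124.2}, "ST02": {"lat": 46.0, "lon": -125.0}}
transferred_copy = {"ST01": {"lat": 45.5, "lon": 124.2}}  # sign dropped, ST02 missing
print(compare_records(original, transferred_copy))
# [('ST01', 'lon', '-124.2 became 124.2'), ('ST02', None, 'station lost in transfer')]
```

Running the comparison after every hop in the data pathway makes it easier to tell which step introduced a problem.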
WHAT TO QC?
- Individual values (the measurements)?
- Profile of multiple values?
- Cruise of multiple profiles?
- Project of multiple cruises?
- Region or Ocean of multiple Projects?
- Entire World of multiple Regions?
What software, tools, and skills are available?
Let's get started
QC OF THE WHAT & HOW
- Need to first understand the methods, variables, and units of the data before trying to QC the data
  - Are all labels clear and unambiguous?
  - Are methods provided (or a reference)?
  - What are the value units?
QC OF THE WHEN & WHERE
- Primary Data
  - First, check the master ship record
  - Then check PI files
- Simple Range Checks
  - Time (0-23? 1-24?)
  - What is the time zone?
  - Lat +/-90, Lon +/-180
  - Are hemisphere signs present (E/W) or described?
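The simple range checks above translate directly into code. A minimal sketch for one station; `check_when_where` is a hypothetical helper, and it assumes integer hours and signed decimal degrees (west and south negative). The hour convention is a parameter because the source data may use 0-23 or 1-24:

```python
def check_when_where(hour, lat, lon, hour_range=(0, 23)):
    """Return a list of range-check failures for one station's time and position."""
    failures = []
    lo, hi = hour_range
    if not lo <= hour <= hi:
        failures.append(f"hour {hour} outside {lo}-{hi}")
    if not -90 <= lat <= 90:
        failures.append(f"latitude {lat} outside -90..90")
    if not -180 <= lon <= 180:
        failures.append(f"longitude {lon} outside -180..180")
    return failures

print(check_when_where(24, 45.5, -124.2))                      # ['hour 24 outside 0-23']
print(check_when_where(24, 45.5, -124.2, hour_range=(1, 24)))  # []
```

Range checks cannot catch a missing hemisphere sign: a longitude of 124.2 recorded for a station at 124.2°W is still within +/-180, which is one reason to map the cruise track as well. The time zone likewise has to come from the metadata, not from a range check.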
QC OF THE WHEN & WHERE
- Map the Cruise Track
  - sorted by station sequence
  - sorted by sampling time
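If station numbers were assigned in sampling order (an assumption; check the cruise report), the two track orderings should agree, and any station where they diverge deserves a closer look on the map. A minimal sketch with a hypothetical `order_mismatches` helper:

```python
from datetime import datetime

def order_mismatches(stations):
    """stations: (station_number, sample_time) tuples in any order.

    Returns the station numbers at positions where the track sorted by
    station sequence differs from the track sorted by sampling time.
    """
    by_sequence = [num for num, _ in sorted(stations, key=lambda s: s[0])]
    by_time = [num for num, _ in sorted(stations, key=lambda s: s[1])]
    return [a for a, b in zip(by_sequence, by_time) if a != b]

stations = [
    (1, datetime(2010, 5, 1, 6, 0)),
    (2, datetime(2010, 5, 1, 12, 0)),
    (3, datetime(2010, 5, 1, 9, 0)),  # e.g. a mistyped hour on station 2 or 3
]
print(order_mismatches(stations))  # [2, 3]
```

The code only says where the orderings disagree; deciding whether the station number or the time is wrong still needs the master ship record.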
QC OF THE WHEN & WHERE
- Calculate ship speed (distance/time) between stations
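Ship speed between consecutive stations can be estimated from the great-circle (haversine) distance divided by the elapsed time. A leg that implies an implausibly fast transit usually points to a wrong time, date, or position. A minimal sketch; `flag_fast_legs` is hypothetical, and the 15-knot default is an assumed ceiling to adjust for the vessel:

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
KM_PER_NAUTICAL_MILE = 1.852

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_fast_legs(stations, max_knots=15.0):
    """stations: (name, time, lat, lon) tuples sorted by sampling time.

    Returns (from_station, to_station, knots) for each leg faster than max_knots.
    Zero or negative elapsed time is reported as infinite speed.
    """
    flagged = []
    for (n1, t1, lat1, lon1), (n2, t2, lat2, lon2) in zip(stations, stations[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        nm = haversine_km(lat1, lon1, lat2, lon2) / KM_PER_NAUTICAL_MILE
        knots = nm / hours if hours > 0 else math.inf
        if knots > max_knots:
            flagged.append((n1, n2, round(knots, 1)))
    return flagged

stations = [
    ("A", datetime(2010, 5, 1, 0), 45.0, -125.0),
    ("B", datetime(2010, 5, 1, 6), 46.0, -125.0),  # ~60 nm in 6 h: ~10 knots
    ("C", datetime(2010, 5, 1, 7), 47.0, -125.0),  # ~60 nm in 1 h: suspicious
]
print(flag_fast_legs(stations))  # [('B', 'C', 60.0)]
```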
QC OF THE HOW MUCH
- First, look at the background environment
- Check for depth inversions
- Check for density inversions
- Look at T vs. S plot
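Depth and density inversions within a single cast can be screened with one pass down the profile. A minimal sketch, assuming density has already been computed from temperature and salinity (for example as sigma-theta, using an equation of state such as TEOS-10) and that values are listed in the order recorded; the helper name and the 0.03 kg/m³ tolerance are assumptions:

```python
def find_inversions(depths, densities, tol=0.03):
    """Screen one cast, values listed in the order recorded (surface first).

    Returns (index, message) pairs where depth fails to increase, or where
    density decreases with depth by more than tol (same units as densities).
    """
    issues = []
    for i in range(1, len(depths)):
        if depths[i] <= depths[i - 1]:
            issues.append((i, f"depth not increasing: {depths[i - 1]} -> {depths[i]}"))
        if densities[i] < densities[i - 1] - tol:
            issues.append((i, f"density inversion: {densities[i - 1]} -> {densities[i]}"))
    return issues

depths = [0, 10, 20, 15, 30]             # metres; 15 follows 20
sigma = [24.0, 24.5, 25.0, 25.2, 24.8]   # sigma-theta, kg/m^3 (illustrative values)
for idx, msg in find_inversions(depths, sigma):
    print(idx, msg)
# 3 depth not increasing: 20 -> 15
# 4 density inversion: 25.2 -> 24.8
```

Small density inversions can be real in a well-mixed layer, so the tolerance only separates noise-sized reversals from ones worth checking against the T vs. S plot.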
QC OF THE HOW MUCH
- Look at the variable vs. depth
QC OF THE HOW MUCH
- Check against basic value ranges
- Check for excessive gradients (spikes) between values at adjacent depths
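Both checks can be sketched in a few lines. The spike statistic used here, |V2 - (V3 + V1)/2| - |(V3 - V1)/2|, is one common formulation (used, for example, in Argo real-time QC): it discounts the part of the jump that a steady gradient between the neighbouring values would explain. The helper names, the temperature limits, and the threshold are assumptions to tune per variable, region, and depth:

```python
def range_flags(values, valid_min, valid_max):
    """Indices of values outside the plausible range [valid_min, valid_max]."""
    return [i for i, v in enumerate(values) if not valid_min <= v <= valid_max]

def spike_flags(values, threshold):
    """Indices of interior values whose spike statistic exceeds threshold.

    Values must be ordered by depth. For adjacent values v1, v2, v3 the
    statistic is |v2 - (v3 + v1)/2| - |(v3 - v1)/2|. The first and last
    values have only one neighbour and are not tested.
    """
    flagged = []
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        if abs(v2 - (v3 + v1) / 2) - abs((v3 - v1) / 2) > threshold:
            flagged.append(i)
    return flagged

temps = [15.0, 14.8, 14.5, 22.0, 14.0, 13.5, 99.9]  # illustrative profile, deg C
print(range_flags(temps, -2.5, 40.0))  # [6]
print(spike_flags(temps, 2.0))         # [3]
```

In this example the 22.0 spike is caught by the spike test, while the 99.9 at the bottom of the profile has only one neighbour and is caught by the range check instead, which is why the two checks are run together.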
Expert / Specialist Data Centers
- Can provide guidance on
  - Metadata (standards, minimum requirements)
  - Data Formats (format suggestions / review)
  - Tools and Methods
- May have advanced visualization or QC methods available for your data.
Empirical Comparisons with Historical Observations (ECHO)
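The slide gives only the name of this comparison, so what follows is a generic sketch of the idea rather than the method behind ECHO: compare a new value with historical observations from the same region and season, and flag it when it falls more than a chosen number of standard deviations from the historical mean. The helper name and the 3-SD cutoff are assumptions. A flagged value is a prompt for review, not proof of an error; it may be a natural event or "something new".

```python
import math
import statistics

def compare_to_history(value, historical, n_sd=3.0):
    """Return (z_score, is_outlier) for one value against historical values.

    Selecting historical values from the same region and season is left
    to the caller.
    """
    if len(historical) < 2:
        raise ValueError("need at least two historical observations")
    mean = statistics.mean(historical)
    sd = statistics.stdev(historical)  # sample standard deviation
    if sd == 0:
        # No historical spread: any departure from the mean is unprecedented.
        z = 0.0 if value == mean else math.copysign(math.inf, value - mean)
    else:
        z = (value - mean) / sd
    return z, abs(z) > n_sd

history = [10.0, 12.0, 14.0]  # illustrative values only
print(compare_to_history(20.0, history))  # (4.0, True)
print(compare_to_history(13.0, history))  # (0.5, False)
```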
Expert / Specialist Data Centers (just a few examples)
- CCHDO - CLIVAR & Carbon Hydrographic Data Office
- BCO-DMO - Biological and Chemical Oceanography Data Management Office
- BODC - British Oceanographic Data Centre
- COPEPOD - Coastal & Oceanic Plankton Ecology, Production, & Observation Database
The Conclusions
Some Conclusions
- Each additional layer of QC and examination may highlight issues that were previously undetected.
- Each transfer or reformatting of the data has a chance of introducing new errors (or data loss).
- The comprehensiveness of the co-stored metadata will determine the extent to which the data are still usable and understandable 10 years after the project.
BETTER DATA is
- Easily Accessible
- Well Documented
- Integrated / Interlinked
- The Best Quality possible