Title: The DES DM Team
1. The DES DM Team
- Tanweer Alam (1), Dora Cai (1), Joe Mohr (1,2)
- Jim Annis (3), Greg Daues (1), Choong Ngeow (2)
- Wayne Barkhouse (2), Patrick Duda (1), Ray Plante (1)
- Cristina Beldica (1), Huan Lin (3), Douglas Tucker (3)
- Affiliations: (1) NCSA, (2) UIUC Astronomy, (3) Fermilab
- Astronomers
- Grid Computing, Middleware, Portals
- Database development, maintenance, Archive web portal
- NVO lead at NCSA
- Senior Developer Oversight Group: Randy Butler, Mike Freemon, and Jay Alameda (NCSA)
2. Architecture Overview
- Components: Pipelines, Archive, Portals
- Development: 30 FTE-yrs total
- Current status: 13 FTE-yrs to date
3. Where Are We Today? Iterative/Spiral Development
- Oct 04 - Sep 05: initial design and development
  - basic image reduction, cataloguing, catalog and image archive, etc.
- Oct 05 - Jan 06: DC1, deployed DES DM system v1
  - Used TeraGrid to reduce 700GB of simulated raw data from Fermilab into 5TB of images, weight maps, bad pixel maps, and catalogs
  - Catalogued, ingested, and calibrated 50M objects
- Feb 06 - Sep 06: refine and develop
  - full science processing through coaddition, greater automation, ingestion from HPC platforms, quality assurance, etc.
- Oct 06 - Jan 07: DC2, deploy DES DM system v2
  - Use NCSA and SDSC TeraGrid platforms to process 500 deg² in griz with 4 layers of imaging in each (equivalent to 20% of the SDSS imaging dataset, 350M objects)
  - Use the DES DM system on a workstation to reduce Blanco Cosmology Survey data (http://cosmology.uiuc.edu/BCS) from the MOSAIC2 camera
  - Evaluate ability to meet DES data quality requirements
[Figures: DC1 photometry and DC1 astrometry scatter plots]
4. DES Archive
- Components of the DES Archive:
  - Archive nodes: filesystems that can host DES data files
    - Large number; no meaningful limit
    - Distributed; assumed to be non-local
  - Database: tracks data using metadata describing the files and file locations
  - Archive web portal: allows external (NVO) users to select and retrieve data from the DES archive
    - Try it at https://des.cosmology.uiuc.edu:9093/des/
5. Archive Filesystem Structure
- host/root/Archive (path construction from this fixed layout is sketched below)
  - raw/
    - nite/ (des20061005, des20061006, etc.)
      - src/ : original data from the telescope
      - raw/ : split and cross-talk corrected data
      - log/ : logs from observing and processing
  - red/
    - runid/
      - xml/ : location of main OGRE workflows
      - etc/ : location of SExtractor config files, etc.
      - bin/ : all binaries required for the job
      - data/nite/
        - cal/ : biases, flats, illumination correction, etc.
        - raw/ : simply a link to the appropriate raw data
        - log/ : processing logs
        - band1/ : reduced images and catalogs for band1
        - band2/ : and so on for each band
        - ...
  - cal/ : calibration data (bad pixel masks, pupil ghosts)
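To illustrate how a fixed layout lets file paths be derived from tags rather than stored explicitly, here is a minimal Python sketch. The helper names and the ARCHIVE_ROOT value are hypothetical; only the directory layout itself comes from the tree above.

    import os

    ARCHIVE_ROOT = "/host/root/Archive"   # hypothetical root for one archive node

    def raw_path(nite, subdir="raw"):
        """Raw-data area for one night, e.g. raw/des20061005/src/."""
        assert subdir in ("src", "raw", "log")
        return os.path.join(ARCHIVE_ROOT, "raw", nite, subdir)

    def red_path(runid, nite, band=None):
        """Reduced images/catalogs for one run, night, and (optionally) band."""
        parts = [ARCHIVE_ROOT, "red", runid, "data", nite]
        if band is not None:
            parts.append(band)            # e.g. "g", "r", "i", or "z"
        return os.path.join(*parts)

    print(raw_path("des20061005", "src"))
    # -> /host/root/Archive/raw/des20061005/src
    print(red_path("DES20061120_des20061010_01", "des20061010", "g"))
    # -> /host/root/Archive/red/DES20061120_des20061010_01/data/des20061010/g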
6. DES Database
- Image metadata
  - Many header parameters (including WCS params)
  - All image tags that uniquely identify the DES archive location (a minimal schema sketch follows below):
    - archive_site (fnal, mercury, gpfs-wan, bcs, etc.)
    - imageclass (raw, red, coadd, cal)
    - nite, runid, band, imagename
    - ccd_number, tilename, imagetype
  - As long as we adopt a fixed archive structure, we can very efficiently track extremely large datasets
- Simulation metadata
  - We could easily extend the DES archive to track simulation data
  - We need to adopt some logical structure, and we could be up and running very rapidly
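A minimal sketch of tag-based tracking in a relational table, using Python's built-in sqlite3 purely for illustration. The table name, column types, and example values are assumptions; the production database engine and schema are not specified by the slide, only the tag names.

    import sqlite3

    conn = sqlite3.connect(":memory:")    # stand-in for the real DES database
    conn.execute("""
        CREATE TABLE image_metadata (
            archive_site TEXT,   -- fnal, mercury, gpfs-wan, bcs, ...
            imageclass   TEXT,   -- raw, red, coadd, cal
            nite         TEXT,
            runid        TEXT,
            band         TEXT,
            imagename    TEXT,
            ccd_number   INTEGER,
            tilename     TEXT,
            imagetype    TEXT
        )""")
    # Hypothetical example row; only the tag names come from the slide.
    conn.execute(
        "INSERT INTO image_metadata VALUES (?,?,?,?,?,?,?,?,?)",
        ("mercury", "red", "des20061005", "DES20061120_des20061010_01",
         "g", "example_image_01", 13, "example_tile", "reduced"))
    # Because the tags map one-to-one onto the fixed archive layout, a tag
    # query is enough to locate every matching file on every archive node.
    rows = conn.execute(
        "SELECT archive_site, imagename FROM image_metadata "
        "WHERE nite = ? AND band = ?", ("des20061005", "g")).fetchall()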
7. Data Access Framework
- With DC2 we are fielding grid data movement tools that are integrated with the DES archive (a simplified sketch of such a tool follows below):
  - ar_copy: copies a dataset from one archive node to another
  - ar_verify: file-by-file comparison of datasets on two archive nodes
  - ar_remove: deletes a dataset from an archive node
- These tools update file locations within the DES database
- Data are selected using file tags:
  - ar_copy -imclass=raw -nite=des20051005 -imagetype=src mercury gpfs-wan
  - ar_copy -imclass=red -runid=DES20061120_des20061010_01 mercury mss
- The underlying grid-ftp tools can vary with the archive node
  - Most sites use Trebuchet, the data movement tools integrated with the Elf/OGRE middleware development project at NCSA
  - FNAL uses globus-url-copy, because there's an incompatibility with Trebuchet listing
- Metadata in the DES db encode the grid-ftp technology as well as combinations of buffer sizes, number of parallel streams, etc. for moving large and small files
- A recent test by Greg Daues achieved 100MB/s for a single copy; typically we've combined 5 or 6 copies in parallel to achieve total data movement off Mercury of about 50MB/s
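A sketch of how a tag-driven tool like ar_copy might be structured. The function body, the file_locations table, and its columns are hypothetical; the real tools wrap Trebuchet or globus-url-copy and update the DES database, as described above.

    import subprocess

    def ar_copy(db, tags, src_node, dst_node):
        """Illustrative only: copy all files matching `tags` from one archive
        node to another via globus-url-copy, then record the new locations."""
        where = " AND ".join(f"{k} = :{k}" for k in tags)
        rows = db.execute(
            f"SELECT path FROM file_locations WHERE node = :src AND {where}",
            dict(tags, src=src_node)).fetchall()
        for (path,) in rows:
            # Assumes paths are absolute and identical on both nodes.
            subprocess.run(
                ["globus-url-copy", "-p", "4",          # parallel streams; tunable per node
                 f"gsiftp://{src_node}{path}",
                 f"gsiftp://{dst_node}{path}"],
                check=True)
            db.execute("INSERT INTO file_locations (node, path) VALUES (?, ?)",
                       (dst_node, path))
        db.commit()

    # Rough equivalent of:
    #   ar_copy -imclass=raw -nite=des20051005 -imagetype=src mercury gpfs-wan
    # ar_copy(db, {"imageclass": "raw", "nite": "des20051005", "imagetype": "src"},
    #         "mercury", "gpfs-wan")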
8. Archive Portal: https://des.cosmology.uiuc.edu:9093/des/ (you will be redirected to the NVO Login)
9. Archive Portal: Image Query
10. DC2 Overview
- Transferred 10 nights of simulated data from FNAL Enstore
  - Roughly 3000 DECam exposures: 500 deg² in griz, 4 layers deep, plus 50 flats/biases each night
  - Currently processed 8 of 10 nights
- Use the Convert_Ingest pipeline to split the data; cross-talk correction happens in this stage
  - Typically 20 jobs, each running a couple of hours
  - Raw data are 600GB for each night
- Submit 62 processing jobs for each of these nights
  - Each night produces 3.4TB and 35 million catalogued objects for ingestion
  - Each job takes around 11 hrs; roughly 1 CPU-month to reduce a night of data
  - Stages (chained as sketched below): zerocombine, flatcombine, imcorrect, astrometry, remapping, cataloguing, fitscombine, ingestion
  - Currently some jobs fail because of failures in astrometric refinement
- Ingest objects into the db
- Move data from the processing platforms to the storage cluster and mass storage
- Then determine the photometric solution for each band/night
  - Update zeropoints for all objects/images for that night
- Total data production: 4.8TB raw, 27TB reduced, 240 million objects
- Still to do: complete the processing, co-add all data, extract summary statistics
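A schematic of how one per-night processing job might chain these stages. The stage names come from the slide; treating each as a standalone executable with these arguments is purely an assumption (the actual jobs are OGRE workflows).

    import subprocess

    STAGES = ["zerocombine", "flatcombine", "imcorrect", "astrometry",
              "remapping", "cataloguing", "fitscombine", "ingestion"]

    def run_night_job(nite, job_id):
        """One of the ~62 processing jobs submitted per night (hypothetical driver)."""
        for stage in STAGES:
            result = subprocess.run([stage, f"--nite={nite}", f"--job={job_id}"])
            if result.returncode != 0:
                # e.g. the astrometric-refinement failures noted above
                raise RuntimeError(f"{stage} failed for nite={nite}, job={job_id}")

At 62 jobs of roughly 11 hours each, one night costs about 680 CPU-hours, consistent with the 1 CPU-month quoted above.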
11. DC2 Challenges
- Scale of data: almost overwhelming
  - 330GB arrive; 3.4TB are produced by the next day
  - Ingesting 35 million objects is a challenge: it takes 10 hours if the ingest rate is 1000 objects/s (see the batched-ingest sketch below)
  - Exploring sqlldr alternatives; most come with a price
  - Moving processed data off the compute nodes is a challenge: it takes about 10 hours if the transfer rate is 100MB/s
  - New data movement tools are making this more reliable and automatic
- Astrometry problems persist
  - With BCS data we find that astrometry errors are bad enough to produce double sources in a few percent of the images; this translates to at least one failure per co-added image
  - Taking the advice of Emmanuel Bertin to run SCAMP on a per-exposure rather than a per-image basis; the new astrometric refinement framework is currently being tested
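For scale: 35 million objects at 1000 objects/s is 35,000 s, or roughly 10 hours, and 3.4TB at 100MB/s is likewise about 10 hours, matching the figures above. Below is a minimal sketch of the batched-insert pattern such ingestion typically starts from, written against the generic Python DB-API; the column names are assumptions, and sqlldr or its alternatives replace this for real bulk loads.

    def ingest_objects(db, objects, batch_size=10_000):
        """Insert catalogued objects in batches: one round-trip per batch,
        not per row, before moving on to a dedicated bulk loader."""
        cur = db.cursor()
        for start in range(0, len(objects), batch_size):
            batch = objects[start:start + batch_size]
            cur.executemany(
                "INSERT INTO objects (ra, dec, mag, band) VALUES (?, ?, ?, ?)",
                batch)
            db.commit()   # commit per batch so a failure only costs one batch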
12. DC2 Photometry and Astrometry
- Nightly spot checks; no exhaustive testing so far
- Astrometry scatter plots look much like DC1
- Photometry scatter plots don't look as good, but we think we have figured out why
  - Diffraction spikes/halos were added to stars in ImSim2
  - Done in such a way as to augment the total stellar flux
  - This leads to an offset in our photometry at the few-percent level
- Detailed statistics await further testing
  - What is the full distribution of astrometric and photometric errors?
  - How do both depend on seeing, location on the chip, intrinsic galaxy parameters, etc.?
13. Coaddition Framework
- Three steps to coaddition:
  - Remapping images to a standard reference frame
  - Determining the relative flux scale for overlapping remapped images
  - Combining the remapped images (with filtering)
- DES DM enables a simple, automated coadd
  - Coadd tiling is stored as metadata in the db
  - db tools (sketched below):
    - find all tiles associated with an image
    - find all images associated with a tile
- Execution
  - Reduced images are immediately remapped (SWarp) to each tile they overlap (and catalogued)
  - Flux scales are determined through (1) db object matching in overlapping images, (2) photometric calibration, and (3) the relative throughput of chips 1-62
  - Image combine (SWarp) happens en masse, using the archive to find the correct image combinations
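A sketch of the two tile-image association lookups; the tile_image table and its columns are hypothetical stand-ins for whatever the DES db actually uses.

    def tiles_for_image(db, imagename):
        """All coadd tiles that a remapped image contributes to."""
        return [row[0] for row in db.execute(
            "SELECT tilename FROM tile_image WHERE imagename = ?", (imagename,))]

    def images_for_tile(db, tilename, band):
        """All remapped images needed to build one coadd tile in one band."""
        return [row[0] for row in db.execute(
            "SELECT imagename FROM tile_image WHERE tilename = ? AND band = ?",
            (tilename, band))]

    # SWarp then combines images_for_tile(db, tile, band) en masse, with flux
    # scales supplied by the calibration steps listed above.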
14. BCS Coadd Tests
[Figure panels: g (2 deep), r (2 deep), i (3 deep), z (3 deep) coadd images]
- Test the framework by creating 46 coadd tiles that draw images from 10 different nights
- griz, 36'x36' tiles with 0.26" pixels
- <1 hr job on a server with a 14-drive RAID5 disk array
- Issues:
  - Flux scaling ignored
  - Combine algorithm: sum
  - Science quality?
  - Some astrometry failures (double sources)
15. Weak Lensing Framework (Mike Jarvis, Bhuv Jain, Gary Bernstein, Erin Sheldon)
- Science strategy
  - Start from complete object lists and measure the shear for each object jointly, using all available reduced data
- Draft DES DM strategy
  - Measure the shapes of all objects on reduced images as part of standard reduction and cataloguing
  - Use isolated stars to model PSF distortions across the survey
  - Catalog on coadded images to create complete object lists
  - Use archive tools to select all reduced objects (and images) for joint shear measurements that include PSF corrections
- Implementation is just in its infancy
  - Shape measurements: one more module for the pipeline, plus a db schema change
  - Modeling PSF distortions: a computational (not data) challenge
  - Complete object lists: coadd catalogs are already available in the db
  - Final shear measurements: a data challenge
    - Apply a data-parallel approach, grouping by sky coordinates (coadd tiling), as sketched below
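One way the data-parallel grouping by coadd tile might look in outline; this is a sketch under the assumption that each catalog entry carries a tilename, not a description of the actual (still nascent) weak-lensing pipeline.

    from collections import defaultdict

    def group_by_tile(objects):
        """Group complete-object-list entries by coadd tile so each shear job
        handles one tile's objects and the reduced images that overlap it."""
        jobs = defaultdict(list)
        for obj in objects:               # obj assumed to be a dict with a "tilename" key
            jobs[obj["tilename"]].append(obj)
        return jobs

    # Each tile's group can then be processed independently: gather the reduced
    # images covering that tile, apply the PSF model, and measure shear jointly.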