Transcript and Presenter's Notes

Title: The DES DM Team


1
  • The DES DM Team
  • Tanweer Alam (1), Dora Cai (1), Joe Mohr (1,2)
  • Jim Annis (3), Greg Daues (1), Choong Ngeow (2)
  • Wayne Barkhouse (2), Patrick Duda (1), Ray Plante (1)
  • Cristina Beldica (1), Huan Lin (3), Douglas Tucker (3)
  • Affiliations: (1) NCSA, (2) UIUC Astronomy, (3) Fermilab
  • Team roles: Astronomers; Grid Computing, Middleware,
    Portals; Database development, maintenance, Archive web
    portal; NVO lead at NCSA
  • Senior Developer Oversight Group: Randy Butler, Mike
    Freemon, and Jay Alameda (NCSA)

2
Architecture Overview
  • Components: Pipelines, Archive, Portals
  • Development: 30 FTE-yrs total
  • Current status: 13 FTE-yrs to date
3
Where are we today? Iterative/Spiral Development
  • Oct 04-Sep 05: initial design and development
  • basic image reduction, cataloguing, catalog and
    image archive, etc
  • Oct 05-Jan 06: DC1 deployed DES DM system v1
  • Used TeraGrid to reduce 700GB of simulated raw
    data from Fermilab into 5TB of images, weight maps,
    bad pixel maps, catalogs
  • Catalogued, ingested and calibrated 50M objects
  • Feb 06-Sep 06: refine and develop
  • full science processing through coaddition,
    greater automation, ingestion from HPC platforms,
    quality assurance, etc
  • Oct 06-Jan 07: DC2 deploys DES DM system v2
  • Use NCSA and SDSC TeraGrid platforms to process
    500 deg2 in griz with 4 layers of imaging in each
    (equivalent to 20% of the SDSS imaging dataset,
    350M objects)
  • Use the DES DM system on a workstation to reduce
    Blanco Cosmology Survey data
    (http://cosmology.uiuc.edu/BCS) from the MOSAIC2
    camera
  • Evaluate ability to meet DES data quality
    requirements

[Plots: DC1 Photometry, DC1 Astrometry]
4
DES Archive
  • Components of the DES Archive
  • Archive nodes: filesystems that can host DES data
    files
  • Large number-- no meaningful limit
  • Distributed-- assumed to be non-local
  • Database: tracks data using metadata describing
    the files and file locations
  • Archive web portal: allows external (NVO) users
    to select and retrieve data from the DES archive
  • Try it at https://des.cosmology.uiuc.edu:9093/des/

5
Archive Filesystem Structure (a path-composition sketch follows the listing)
  • host/root/Archive/
      raw/
        nite/ (des2006105, des20061006, etc)
          src/  -- original data from telescope
          raw/  -- split and cross-talk corrected data
          log/  -- logs from observing and processing
      red/
        runid/
          xml/  -- location of main OGRE workflows
          etc/  -- location of SExtractor config files, etc
          bin/  -- all binaries required for job
          data/nite/
            cal/  -- biases, flats, illumination correction, etc
            raw/  -- simply a link to appropriate raw data
            log/  -- processing logs
            band1/  -- reduced images and catalogs for band1
            band2/  -- and so on for each band
      cal/  -- calibration data (bad pixel masks, pupil ghosts)
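To show how a fixed layout like this lets tools locate any file from its tags alone, here is a minimal Python path-composition sketch; the helper name, the example tag values, and the exact ordering of path elements are a reading of the listing above, not the actual DES DM code.

    import os

    def reduced_image_path(archive_root, runid, nite, band, imagename):
        """Compose a reduced-image location purely from archive tags,
        following the red/<runid>/data/<nite>/<band>/ layout listed above.
        (Hypothetical helper; the real DES DM path rules may differ.)"""
        return os.path.join(archive_root, "red", runid, "data", nite, band,
                            imagename + ".fits")

    # Illustrative tag values only
    print(reduced_image_path("/home/Archive", "some_runid", "des20061006",
                             "g", "some_image"))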

6
DES Database
  • Image metadata
  • Many header parameters (including WCS params)
  • All image tags that uniquely identify the DES
    archive location
  • archive_site (fnal, mercury, gpfs-wan, bcs,
    etc)
  • imageclass (raw, red, coadd, cal)
  • nite, runid, band, imagename
  • ccd_number, tilename, imagetype
  • As long as we adopt a fixed archive structure we
    can very efficiently track extremely large
    datasets (an illustrative tag-based query follows
    this list)
  • Simulation metadata
  • We could easily extend the DES archive to track
    simulation data
  • Need to adopt some logical structure and we could
    be up and running very rapidly
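As a concrete illustration of tag-based tracking, a query along the following lines would pull every reduced g-band image from one night at one archive site; the table name and exact schema are assumptions for the example, not necessarily the real DES database layout.

    # Illustrative only -- the table name and column grouping are assumed.
    query = """
        SELECT imagename, runid, ccd_number
        FROM   image_metadata
        WHERE  imageclass   = 'red'
           AND nite         = 'des20061006'
           AND band         = 'g'
           AND archive_site = 'mercury'
    """
    # The string would be run through whatever database client the archive
    # tools already use; the point is that the WHERE clause is pure tags.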

7
Data Access Framework
  • With DC2 we are fielding grid data movement tools
    that are integrated with the DES archive
  • ar_copy: copies a dataset from one archive node to
    another
  • ar_verify: file-by-file comparison of datasets on
    two archive nodes
  • ar_remove: deletes a dataset from an archive node
  • These tools update file locations within the DES
    database
  • Data are selected using file tags, e.g.
  • ar_copy -imclass=raw -nite=des20051005
    -imagetype=src mercury gpfs-wan
  • ar_copy -imclass=red
    -runid=DES20061120_des20061010_01 mercury mss
  • Underlying grid-ftp tools can vary with archive
    node
  • Most sites use Trebuchet, data movement tools
    integrated with the Elf/OGRE middleware
    development project at NCSA
  • FNAL uses globus-url-copy, because there's an
    incompatibility with Trebuchet listing
  • Metadata in the DES db encode the grid-ftp
    technology as well as combinations of buffer
    sizes, number of parallel streams, etc for moving
    large and small files
  • A recent test by Greg Daues achieved 100MB/s for a
    single copy. Typically we've combined 5 or 6
    copies in parallel to achieve total data movement
    off Mercury of about 50MB/s (see the transfer
    sketch after this list)
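For the grid-ftp step itself, the sketch below shows the kind of call the tools wrap when an archive node uses globus-url-copy (the second mover named above); the hostnames, paths, stream count, and buffer size are placeholders standing in for the per-node values stored in the DES db.

    import subprocess

    def gridftp_copy(src_url, dst_url, streams=4, tcp_buffer_bytes=2097152):
        """One grid-ftp transfer via globus-url-copy.  In the real tools the
        parallelism and buffer settings come from archive-node metadata;
        the defaults here are illustrative only."""
        subprocess.run(
            ["globus-url-copy",
             "-p", str(streams),                 # parallel data streams
             "-tcp-bs", str(tcp_buffer_bytes),   # TCP buffer size
             src_url, dst_url],
            check=True)

    # Hypothetical example; real archive paths follow the layout on slide 5.
    gridftp_copy(
        "gsiftp://mercury.example.org/Archive/raw/des20061006/src/im.fits",
        "gsiftp://gpfs-wan.example.org/Archive/raw/des20061006/src/im.fits")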

8
Archive Portal: https://des.cosmology.uiuc.edu:9093/des/
(you will be redirected to an NVO login)
9
Archive Portal Image Query
10
DC2 Overview
  • Transferred 10 nights of simulated data from FNAL
    Enstore
  • Roughly 3000 DECam exposures: 500 deg2 in griz, 4
    layers deep, plus 50 flats/biases each night
  • Currently processed 8 of 10 nights
  • Use the Convert_Ingest pipeline to split the data;
    crosstalk correction is done in this stage
  • Typically 20 jobs, each running a couple of hours
  • Raw data are 600GB for each night
  • Submit 62 processing jobs for each of these
    nights
  • Each night produces 3.4TB, 35 million catalogued
    objects for ingestion
  • Each job takes around 11 hrs; roughly 1 CPU-month
    to reduce a night of data (arithmetic sketched
    after this list)
  • Stages: zerocombine, flatcombine, imcorrect,
    astrometry, remapping, cataloguing, fitscombine,
    ingestion
  • Currently some jobs fail because of failures in
    astrometric refinement
  • Ingest objects into the db
  • Move data from processing platforms to storage
    cluster and mass storage
  • Then determine photometric solution for each
    band/night
  • Update zeropoints for all objects/images for that
    night
  • Total data production: 4.8TB raw, 27TB reduced,
    240 million objects
  • Still to do: complete processing, co-add all
    data, extract summary statistics
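A quick back-of-the-envelope check of the per-night figures quoted above (numbers taken straight from this slide; the CPU-month conversion assumes a 30-day month):

    jobs_per_night, hours_per_job = 62, 11
    cpu_hours = jobs_per_night * hours_per_job    # 682 CPU-hours per night
    cpu_months = cpu_hours / (24 * 30)            # ~0.95, i.e. about 1 CPU-month

    nights_done, tb_per_night = 8, 3.4
    reduced_tb = nights_done * tb_per_night       # ~27 TB, matching the total quoted

    print(f"{cpu_hours} CPU-hrs (~{cpu_months:.2f} CPU-months) per night; "
          f"{reduced_tb:.0f} TB reduced so far")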

11
DC2 Challenges
  • Scale of data: almost overwhelming
  • 330GB arrive; 3.4TB produced by the next day
  • Ingesting 35 million objects is a challenge--
    takes about 10 hours if the ingest rate is 1000
    objects/s (see the estimate after this list)
  • Exploring sqlldr alternatives-- most come with a
    price
  • Moving processed data off compute nodes is a
    challenge-- takes about 10 hours if the transfer
    rate is 100MB/s
  • New data movement tools are making this more
    reliable and automatic
  • Astrometry problems persist
  • With BCS data we find that astrometry errors are
    bad enough to produce double sources in a few
    percent of the images; this translates to at
    least one failure per co-added image
  • Taking the advice of Emmanuel Bertin to run SCAMP
    on a per-exposure basis rather than a per-image
    basis-- a new astrometric refinement framework is
    currently being tested
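Both "about 10 hours" estimates above follow directly from the quoted rates; a minimal check:

    objects_per_night, ingest_rate = 35e6, 1000        # objects, objects/s
    ingest_hours = objects_per_night / ingest_rate / 3600        # ~9.7 hours

    night_volume_bytes, transfer_rate = 3.4e12, 100e6  # ~3.4 TB at 100 MB/s
    transfer_hours = night_volume_bytes / transfer_rate / 3600   # ~9.4 hours

    print(f"ingest ~{ingest_hours:.1f} h, data movement ~{transfer_hours:.1f} h per night")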

12
DC2 Photometry and Astrometry
  • Nightly spot checks-- no exhaustive testing so
    far
  • Astrometry scatter plots look much like DC1
  • Photometry scatter plots don't look as good, but
    we think we have figured out why
  • Diffraction spikes/halos added to stars in ImSim2
  • Done in such a way as to augment total stellar
    flux
  • This leads to an offset in our photometry at the
    few percent level
  • Detailed statistics await further testing
  • What is the full distribution of astrometric and
    photometric errors?
  • How do both depend on seeing, location on the
    chip, intrinsic galaxy parameters, etc?

13
Coaddition Framework
  • Three steps to coaddition:
  • Remapping images to a standard reference frame
  • Determining relative flux scale for overlapping
    remapped images
  • Combining remapped images (with filtering)
  • DES DM enables a simple automated coadd
  • Coadd tiling stored as metadata in the db
  • db tools (sketched after this list):
  • find all tiles associated with an image
  • find all images associated with a tile
  • Execution
  • Reduced images immediately remapped (SWarp) to
    each tile they overlap (and catalogued)
  • Flux scales determined through (1) db object
    matching in overlapping images, (2) photometric
    calibration and (3) relative throughput of chips
    1-62
  • Image combine (SWarp) happens en masse using
    archive to find correct image combinations
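A minimal sketch of the two db lookups and the final combine step, assuming a hypothetical coadd_tile_image mapping table and illustrative config/file names (not the actual DES schema or SWarp setup):

    # Assumed mapping table linking coadd tiles to the reduced images they overlap.
    FIND_TILES_FOR_IMAGE = (
        "SELECT tilename FROM coadd_tile_image WHERE imagename = :imagename")
    FIND_IMAGES_FOR_TILE = (
        "SELECT imagename FROM coadd_tile_image "
        "WHERE tilename = :tilename AND band = :band")

    def swarp_combine_command(remapped_images, tilename, band):
        """Build the SWarp call that combines already-remapped images for one
        tile and band.  Config file and output naming are placeholders."""
        return (["swarp"] + list(remapped_images) +
                ["-c", "default.swarp",
                 "-IMAGEOUT_NAME", f"{tilename}_{band}_coadd.fits"])

    # e.g. subprocess.run(swarp_combine_command(images, "tile0001", "g"), check=True)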

14
BCS Coadd Tests
[Coadd image panels: g (2 deep), r (2 deep), i (3 deep), z (3 deep)]
  • Test the framework by creating 46 coadd tiles that
    draw images from 10 different nights
  • griz, 36'x36' tiles with 0.26" pixels
  • <1 hr job on a server with a 14-drive RAID5 disk
    array
  • Issues
  • Flux scaling ignored
  • Combine algorithm: sum
  • Science quality?
  • Some astrometry failures (double sources)

15
Weak Lensing Framework (Mike Jarvis, Bhuv Jain,
Gary Bernstein, Erin Sheldon)
  • Science Strategy
  • start from complete object lists and measure
    shear for each object jointly using all available
    reduced data
  • Draft DES DM strategy
  • Measure shapes of all objects on reduced images
    as part of standard reduction and cataloguing
  • Use isolated stars to model PSF distortions
    across the survey
  • Catalog on coadded images to create complete
    object lists
  • Use archive tools to select all reduced objects
    (and images) for joint shear measurements that
    include PSF corrections
  • Implementation is just in its infancy
  • Shape measurements: one more module for the
    pipeline, plus a db schema change
  • Modeling PSF distortions: a computational (not
    data) challenge
  • Complete object lists: coadd catalogs already
    available in the db
  • Final shear measurements: a data challenge
  • Apply a data-parallel approach, grouping by sky
    coordinates (coadd tiling); a minimal sketch
    follows
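A minimal sketch of the data-parallel grouping idea, assuming each detection record already carries the coadd tile it falls in (all names here are illustrative, not DES DM code):

    from collections import defaultdict

    def group_detections_by_tile(detections):
        """detections: iterable of (tilename, object_id) pairs pulled from the db.
        Returns one work unit per coadd tile, so each joint shear-measurement
        job handles a single patch of sky."""
        work_units = defaultdict(list)
        for tilename, object_id in detections:
            work_units[tilename].append(object_id)
        return dict(work_units)

    # Each tile's object list would then be handed to one shear-measurement job.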