Title: Astrophysics on the OSG (LIGO, SDSS, DES), Kent Blackburn, LIGO Laboratory, California Institute of Technology
1 Astrophysics on the OSG (LIGO, SDSS, DES)
Kent Blackburn
LIGO Laboratory
California Institute of Technology
- Open Science Grid Consortium Meeting
- University of Florida
- January 23, 2006
2 Outline and Contributors
- LIGO on the OSG
- Kent Blackburn, Duncan Brown, Albert Lazzarini, David Meyers
- SDSS, NEO and DES on the OSG
- Nickolai Kuropatkin, Neha Sharma, Chris Stoughton, James Annis, Steve Kent
3 Gravitational Wave Physics on the OSG
- Laser Interferometer Gravitational-wave Observatory (LIGO)
- LIGO Scientific Collaboration (LSC)
4 LIGO on the Open Science Grid
- Search for Gravitational Waves
- Hanford, WA
- Livingston, LA
- Plus GEO, TAMA and VIRGO
- LIGO Scientific Collaboration
- 40 Institutions worldwide
- 400 individuals contributing
- LIGO Data Grid (LDG)
- Nine Grid Sites
- Over 2000 CPUs
- Multi-Petabyte Data Archive at Caltech
- Scientific Data Collection grouped into temporal Science Runs
- Currently in Science Run 5
- Goal to collect one year plus of design sensitivity data
- One Terabyte of data each day
- Analysis carried out primarily on the LIGO Data Grid (LDG)
- Stepping out onto the OSG
- http://www.ligo.caltech.edu
5 LIGO Data Analysis Classifications
- Principal Classifications of Searches
- Binary Inspiral (Neutron Stars and Black Holes)
- Consumes bulk of LIGO Data Grid resources
- Burst (Supernovae and other Unmodeled Events)
- Coincidence between different data streams necessary
- Stochastic Background (Similar to the CMB)
- Computationally least demanding but requires cross correlation
- Periodic (Pulsars, Rotating Neutron Stars)
- Signal sinusoidal in reference frame of source
- All Sky Survey could promote Global Warming (Order 10^20 FLOPS)
- Binary Inspiral Search selected for initial adoption onto the OSG
- Workflow well suited to Open Science Grid
- Already using a similar set of Grid Technologies within LIGO Data Grid
- Simple parametric parallelization of algorithms
- Optimal filtering of data against tens of thousands of waveforms
- Computationally demanding but interesting on the scale of the OSG
- Expect other searches to follow once OSG trailblazing work is done
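To illustrate why filtering against a bank of waveforms parallelizes so simply, here is a minimal sketch in Python. It is a toy, not the LIGO pipeline: the templates are windowed sinusoids standing in for inspiral waveforms, the correlation is done in the time domain with no noise weighting, and every name (`make_template`, `peak_correlation`, `search`) is hypothetical.

```python
import math


def make_template(freq, n=64, dt=1.0 / 256):
    """Toy unit-norm template: a windowed sinusoid standing in for a waveform."""
    raw = [math.sin(2 * math.pi * freq * i * dt) * math.sin(math.pi * i / n)
           for i in range(n)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def peak_correlation(data, template):
    """Slide the template over the data; return (largest |correlation|, offset)."""
    best, best_k = 0.0, 0
    m = len(template)
    for k in range(len(data) - m + 1):
        c = abs(sum(data[k + j] * template[j] for j in range(m)))
        if c > best:
            best, best_k = c, k
    return best, best_k


def search(data, freqs):
    """Filter the data against every template in the bank.

    Each template is processed independently of the others, so each
    iteration could be farmed out as its own grid job.
    """
    results = {f: peak_correlation(data, make_template(f)) for f in freqs}
    best_f = max(results, key=lambda f: results[f][0])
    return best_f, results[best_f]
```

The only coupling between templates is the final `max`, which is why the work splits cleanly by template (or by stretch of data) across many CPUs.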
6 Binary Inspiral Search Experiences on the Open Science Grid
- First attempt at July 2005 OSG Consortium Meeting in Milwaukee, Wisconsin
- Unsuccessful at submitting a binary inspiral workflow at any OSG site
- Authentication was primary reason for failures (LIGO VO not part of 0.2.1)
- Other issues discovered with the version of VDS distributed in 0.2.1
- First successful completion of a binary inspiral workflow October 1st, 2005 on LIGO's OSG Integration Testbed Cluster at Caltech
- Eight-node dual-CPU cluster with two terabytes of disk space
- Running a patched version of VDS on top of OSG 0.2.1
- Used a test workflow involving 38 GB of LIGO data and about 700 DAG nodes
- Followed up by running at LIGO's OSG Production sites at PSU (PBS) and UWM (Condor) (once VDS patch applied at each)
- Collaborated with several CMS resources to further test outside LIGO's VO
- Worked with clusters at San Diego, Nebraska and Caltech
- All clusters added LIGO's VOMS to allow authentication
- Updated OSG 0.2.1 with VDS patches
- Mixed results due to size of LIGO data sets transferred for this test workflow
- Worked with Deployment and Integration Teams to assure LIGO's functional requirements appeared in the OSG 0.4 software stack (just announced!)
7 Greatly Simplified LIGO DAG
8 LIGO's Next Move on the OSG
- The OSG 0.4.0 release should greatly improve the OSG for LIGO's Binary Inspiral Workflow
- A workflow geared toward actually conducting a scientific study would involve at least 16000 DAG nodes and close to two terabytes of data
- Recent OSG-motivated activities in LIGO have produced a nearly 10:1 reduction in data through improved data selection and compression
- Need to develop more flexible workflows that don't challenge the limited data storage resources typical of a present-day OSG site
- Pegasus is used to construct concrete DAGs from abstract DAX workflows
- Flexibility here to recognize and adapt to OSG site specifics could facilitate greater utilization of the OSG as an abstract Grid
- Develop ability to benefit from Storage Resource Management
- Typical LIGO data analyses benefit from being able to repeat the analysis on the same data set with improved calibration and selection criteria
- LIGO is currently bringing up an SE on our local ITB cluster at Caltech to experiment with SRM
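One way to picture "adapting to OSG site specifics" is a planner that checks each site's free disk before placing a job there. The sketch below is a hypothetical illustration in Python. It is not the Pegasus API or its planning algorithm, and the names `Site` and `plan_jobs` as well as the greedy strategy are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str            # hypothetical site label
    free_disk_gb: float  # storage the site can offer the workflow


def plan_jobs(job_input_gb, sites):
    """Greedy placement: largest jobs first, each on the site with most free disk.

    Returns (plan, deferred): plan maps job id to site name; deferred lists
    jobs that fit nowhere right now and must wait or be split further.
    """
    remaining = {s.name: s.free_disk_gb for s in sites}
    plan, deferred = {}, []
    for job_id, size in sorted(job_input_gb.items(), key=lambda kv: -kv[1]):
        site = max(remaining, key=remaining.get)
        if remaining[site] >= size:
            remaining[site] -= size
            plan[job_id] = site
        else:
            deferred.append(job_id)
    return plan, deferred
```

A workflow that defers or subdivides jobs in this way never asks a site to stage more data than it can hold, which is the constraint the slide identifies for present-day OSG sites.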
9 Astronomy on the OSG
- Sloan Digital Sky Survey (SDSS)
- Experimental Astrophysics Group (EAG)
- Fermi National Accelerator Laboratory
10 Near Earth Objects
- Near Earth Objects (NEOs)
- Comets and asteroids nudged by the gravitational attraction of planets into orbits that pass by the Earth's neighborhood
- Composed of water ice and dust, formed early in the history of the Solar System
- The scientific interest in comets and asteroids is due to their being remnants of the early Solar System; the interest in NEOs is their potential for hitting the Earth
- 37 Near Earth Object candidates are identified in the SDSS imaging data
- Apparent magnitudes r = 19 to 21 and proper motions of 1.3 to 18 degrees per day
- The Earth collision rate for this population (size greater than 20 m) is estimated to be one per century
11 How to find Near Earth Objects
12 NEO Workflow
13 NEO Job Statistics
- Total Jobs: 180
- Total Input Data: 9 x 180 = 1620 GB
- Total Output Data: 12 x 180 = 2160 KB
14 Quasar Spectra Fitting using SDSS
- Quasars are supermassive black holes. Swirling clouds of gas and plasma falling into the black hole glow at many different wavelengths. We measure the spectrum of the light to determine the properties of each quasar.
- The SDSS provides us with 50,000 quasar spectra. We make fits to these spectra that include the following components:
- Power-law continuum, decreasing as e^(-λ)
- A Balmer continuum due to ionized hydrogen, with a characteristic bump from 2000 to 4000 Angstroms
- Strong emission lines from ionized gas, such as hydrogen, nitrogen, oxygen, and magnesium
- Many faint emission lines from iron
- Starlight from the galaxy that surrounds the quasar
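A multi-component fit of this kind can be sketched as a sum of component functions plus a goodness-of-fit measure. The Python below is a minimal illustration covering only two of the listed components. The continuum is written as a generic power law in wavelength, the emission lines are given Gaussian profiles, and the 3000 Angstrom pivot and all function and parameter names are assumptions for the example, not the fitting code used for the SDSS spectra.

```python
import math


def power_law(wavelength, amplitude, alpha, pivot=3000.0):
    """Continuum falling as (wavelength / pivot) ** -alpha; pivot is an assumed value."""
    return amplitude * (wavelength / pivot) ** (-alpha)


def gaussian_line(wavelength, flux, center, sigma):
    """Emission line with an assumed Gaussian profile; flux is the integrated area."""
    z = (wavelength - center) / sigma
    return flux / (sigma * math.sqrt(2 * math.pi)) * math.exp(-0.5 * z * z)


def model(wavelength, params):
    """Sum of the continuum and every line in params['lines']."""
    total = power_law(wavelength, params["amplitude"], params["alpha"])
    for flux, center, sigma in params["lines"]:
        total += gaussian_line(wavelength, flux, center, sigma)
    return total


def chi_square(wavelengths, fluxes, errors, params):
    """Goodness of fit that a fitter would try to minimize."""
    return sum(((f - model(w, params)) / e) ** 2
               for w, f, e in zip(wavelengths, fluxes, errors))
```

The Balmer continuum, iron lines and host-galaxy starlight would enter as further terms in `model`. Each of the 50,000 spectra is an independent minimization of `chi_square`, so each can run as a separate grid job.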
15 Example Quasar Spectrum with Fit
16 Quasar Fit Production Science using the Generic Grid Gofer (GGG)
- All jobs are stored in the jobs table.
- Available grid sites are stored in the pool table.
- The Job Manager takes jobs from the database, creates Condor DAG files and submits them to sites from the pool in an automatic mode.
- Two main parts: Job Manager and DAG Creator
- All completed stages of a job are recorded in the database together with submission time and execution time
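The database-driven design described above can be sketched in a few lines of Python with an in-memory SQLite database. This is a hypothetical reconstruction, not the GGG code: the table columns, the three-step stage-in/run/stage-out DAG, the round-robin site choice and the function names are all assumptions. Only the `JOB` and `PARENT ... CHILD` lines follow the standard Condor DAGMan file syntax, and nothing is actually submitted.

```python
import sqlite3


def open_db():
    """Create the two tables named on the slide; the columns are assumed."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT,
                           stage TEXT DEFAULT 'new', site TEXT,
                           submitted_at TEXT, exec_seconds REAL);
        CREATE TABLE pool (site TEXT PRIMARY KEY, available INTEGER);
    """)
    return db


def make_dag(job_name):
    """DAG Creator: text of a DAGMan file chaining three assumed steps."""
    steps = ["stage_in", "run", "stage_out"]
    lines = [f"JOB {s} {job_name}_{s}.sub" for s in steps]
    lines += [f"PARENT {a} CHILD {b}" for a, b in zip(steps, steps[1:])]
    return "\n".join(lines) + "\n"


def dispatch(db, now):
    """Job Manager: pair each new job with an available site and record it."""
    sites = [r[0] for r in db.execute(
        "SELECT site FROM pool WHERE available = 1 ORDER BY site")]
    dags = {}
    if not sites:
        return dags
    new_jobs = db.execute(
        "SELECT id, name FROM jobs WHERE stage = 'new' ORDER BY id").fetchall()
    for i, (job_id, name) in enumerate(new_jobs):
        site = sites[i % len(sites)]  # simple round-robin over the pool
        dags[name] = make_dag(name)
        db.execute(
            "UPDATE jobs SET stage = 'submitted', site = ?, submitted_at = ? "
            "WHERE id = ?", (site, now, job_id))
    db.commit()
    return dags
```

Keeping job state in the database rather than in the manager process means a restarted manager can pick up where it left off, and the recorded submission and execution times can be queried afterwards for statistics.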
17 Workflow in Generic Grid Gofer
Nickolai Kuropatkin
18 Astronomy Experiences on the Grid
                              Spectra             NEO
                              CPU Intensive       Data and CPU Intensive
Grid Match                    Ideal for Grid      Grid not very happy
Total No. of Jobs             50000               180
Data Input/Job                1 Megabyte          9 Gigabytes
Data Output/Job               2 Megabytes         12 Kilobytes
Avg. Rate of Job Completion   800-1200 per day    10-15 per day ?
- Experience tells us that Grid is more suitable for CPU Intensive Jobs
- achieve parallelism
- more jobs
- finish sooner
- Running locally would limit the number of jobs run simultaneously
- On OSG, can run several run-reruns, and camcols within a run-rerun, in parallel
- Current Workflow also will facilitate further analysis
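The table's per-job figures translate into campaign totals and durations with simple arithmetic, worked here in Python. The sketch assumes the quoted completion rates are sustained for the whole campaign. The NEO rate carries a question mark in the table, so its duration is only indicative. The function name `summarize` is hypothetical.

```python
def summarize(jobs, input_per_job, output_per_job, rate_low, rate_high):
    """Totals and duration range from per-job sizes and a jobs-per-day range."""
    return {
        "total_input": jobs * input_per_job,
        "total_output": jobs * output_per_job,
        "days_best": jobs / rate_high,
        "days_worst": jobs / rate_low,
    }


# Quasar spectra: sizes in megabytes, 800-1200 jobs per day.
spectra = summarize(50000, 1, 2, 800, 1200)

# NEO search: input in gigabytes, output in kilobytes, 10-15 jobs per day.
neo = summarize(180, 9, 12, 10, 15)

print(spectra)  # 50000 MB in, 100000 MB out, roughly 42 to 62.5 days
print(neo)      # 1620 GB in, 2160 KB out, 12 to 18 days
```

The spectra campaign moves about 50 GB of input in total, while the NEO search moves 1620 GB for 180 jobs, which is why data staging rather than CPU time dominates the NEO case.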
19 Future Grid Projects in Astronomy
- In the coming year, 2005-2006, the Experimental Astrophysics Group (EAG) has 4 projects planned for the Open Science Grid
- The Simulation effort for the Dark Energy Survey (DES)
- Genetic algorithm fitting of Sloan Digital Sky Survey (SDSS) Quasar Spectra
- Search for Near Earth Asteroids (NEOs) in the SDSS Imaging data
- The Co-addition of the SDSS Southern Stripe