Title: USQCD ILDG Status
- Bálint Joó (bjoo_at_jlab.org)
- Jefferson Lab, Newport News, VA
- given at
- ILDG 8 Virtual Meeting
- May 11, 2006

Main Focus Since Last Meeting
- Metadata Catalogue Implementation
- Mark up example data: MILC ensembles from Gauge Connection
- Intense discussions on QCDML markup in MDWG
- Circling discussions on MDC functionality in MWWG
- Raw query, added in Tsukuba, now removed
- MDC_NO_DATA, or just return 0 results?
- How to ask for ALL results to a query?
- Learning to Program Web Services with Axis

Main Result: Prototype MDC Implementation
- Marked up 11 MILC ensembles (QCDML 1.3, March)
- Each one containing O(10) configs only, for demonstration.
- Store data in eXist native XML database
- Implemented MDC Service
- http://www.usqcd.org/mdc-service/services/ILDGMDCService
- Implemented 2 kinds of client
- Command-line Java applications
- Server-side client: web-based application

MDC Architecture
[Figure: MDC architecture diagram. The only label that survives is "JLab Firewall".]

MDC Service Functionality
- doEnsembleURIQuery(format,query,start,max)
- Returns markovChainURIs of matching ensemble IDs
- doConfigurationLFNQuery(format,query,start,max)
- Returns dataLFNs of matching config. IDs
- getEnsembleMetadata(mcURI)
- Returns ensemble metadata ID document
- getConfigurationMetadata(dataLFN)
- Returns config ID document.
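The four calls listed here can be illustrated with a small in-memory stand-in. This is only a sketch: the real service is a web service, and the class name `ToyMDC`, the substring matching used in place of XPath/XQuery evaluation, and the reading of `start`/`max` as a zero-based offset and a page size are all assumptions made for illustration. The helper `fetch_all` shows one conceivable client-side way to collect every result by paging.

```python
from typing import Dict, List


class ToyMDC:
    """In-memory stand-in for the MDC service (hypothetical, not the real API)."""

    def __init__(self, ensembles: Dict[str, str], configs: Dict[str, str]):
        self.ensembles = ensembles  # markovChainURI -> ensemble XML
        self.configs = configs      # dataLFN -> configuration XML

    @staticmethod
    def _page(keys: List[str], start: int, max_results: int) -> List[str]:
        # Assumption: 'start' is a zero-based offset, 'max' a page size.
        return keys[start:start + max_results]

    def doEnsembleURIQuery(self, fmt, query, start, max_results):
        # Assumption: a substring match replaces real XPath/XQuery evaluation.
        hits = sorted(uri for uri, doc in self.ensembles.items() if query in doc)
        return self._page(hits, start, max_results)

    def doConfigurationLFNQuery(self, fmt, query, start, max_results):
        hits = sorted(lfn for lfn, doc in self.configs.items() if query in doc)
        return self._page(hits, start, max_results)

    def getEnsembleMetadata(self, mc_uri):
        return self.ensembles[mc_uri]

    def getConfigurationMetadata(self, data_lfn):
        return self.configs[data_lfn]


def fetch_all(query_fn, fmt, query, page_size=2):
    """Page through a query until a short page signals the end."""
    results, start = [], 0
    while True:
        page = query_fn(fmt, query, start, page_size)
        results.extend(page)
        if len(page) < page_size:
            return results
        start += page_size


mdc = ToyMDC(
    ensembles={
        "mc://example/ens_a": "<beta>6.81</beta>",
        "mc://example/ens_b": "<beta>6.81</beta>",
        "mc://example/ens_c": "<beta>8.00</beta>",
    },
    configs={"lfn://example/ens_a/cfg.1": "<update>1</update>"},
)
first_page = mdc.doEnsembleURIQuery("Xpath", "6.81", 0, 1)
all_hits = fetch_all(mdc.doEnsembleURIQuery, "Xpath", "<beta>", page_size=2)
print(first_page, all_hits)
```

Paging until a short page arrives avoids needing a special "return everything" value, at the cost of one extra request when the result count is an exact multiple of the page size.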

MDC Functionality (continued)
- getMDCInfo()
- Returns information about MDC Implementation
- Version of MDC standard with which the service is compliant
- Query types accepted (e.g. XQuery, XPath)
- Collaboration information: name, URL of web page
- doRawQuery(format,query,start,max_results)
- USQCD extension: allow raw XPath, XQuery to database.
- Can be useful in server-side clients, debugging etc.
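As a rough illustration of what a raw XPath query against stored metadata might look like, the sketch below evaluates an XPath expression over a small XML fragment with Python's `xml.etree.ElementTree`, which supports only a subset of XPath. The function name `raw_xpath_query`, the sample document and the paging interpretation of `start`/`max_results` are assumptions for illustration, not the service's actual behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the real catalogue.
SAMPLE = """
<ensembles>
  <ensemble><label>ens_a</label><beta>6.81</beta></ensemble>
  <ensemble><label>ens_b</label><beta>8.00</beta></ensemble>
  <ensemble><label>ens_c</label><beta>6.81</beta></ensemble>
</ensembles>
"""


def raw_xpath_query(xml_text, query, start=0, max_results=10):
    """Evaluate an ElementTree-subset XPath query and return one page of text hits."""
    root = ET.fromstring(xml_text)
    hits = [element.text for element in root.findall(query)]
    # Assumption: 'start' is a zero-based offset, 'max_results' a page size.
    return hits[start:start + max_results]


labels = raw_xpath_query(SAMPLE, "./ensemble[beta='6.81']/label")
print(labels)
```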

Server Side Web Client
[Figure: request flow of the server-side web client. From the HTML user input page, clicking the button makes a Java servlet send an ensemble URI query to the MDC service; the ensemble URI result is rendered through an XSLT stylesheet into the HTML ensemble list page. Clicking an ensemble makes the Java servlet send a config LFN query; the config LFN results are rendered to the HTML config list page.]
- XSLT Stylesheet
- Counts Fermion Flavors
- Makes links clickable, e.g. clicking an LFN retrieves the XML ID
- http://www.usqcd.org/mdc-web-client
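The flavor counting done by the stylesheet can be mimicked outside XSLT. The sketch below sums `Nf` entries of a simplified ensemble description; the element names (`m`, `Mass`, `Nf`) are an assumed, simplified layout and not the real QCDML schema, and `count_flavors` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed layout: one <m> block per quark mass.
ENSEMBLE = """
<MILCEnsemble>
  <m><Mass>0.030</Mass><Nf>2</Nf></m>
  <m><Mass>0.050</Mass><Nf>1</Nf></m>
</MILCEnsemble>
"""


def count_flavors(xml_text):
    """Sum the Nf entries over all quark-mass blocks."""
    root = ET.fromstring(xml_text)
    return sum(int(nf.text) for nf in root.iter("Nf"))


total_flavors = count_flavors(ENSEMBLE)
print(total_flavors)
```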

Server Side MDC Client (cont'd)
- Prototype only
- could do with redesign
- Tested against (used to browse)
- USQCD Service, LatFor Service, UKQCD Test Service
- Works, modulo stylesheet rendering
- Illustrates Web Technology
- Web service requests/responses made by servlet
- MDC results rendered with XSLT
- XSLT is namespace specific: only renders QCDML 1.3

Other Middleware Progress
- VOMS: not officially part of middleware yet, but
- Fermilab runs VOMS server
- there is an lqcd VO there
- Need to be member of lqcd VO to access data at FNAL with grid tools (dCache and SRM v1.x)
- Replica Catalogue: next on list
- Revive old JLab implementation
- Read only initially
- load with MILC data to back MDC prototype

What Data is Shared in the US
- DWF data at BNL: QCDSP data has been made public (Nf=2, V=16^3x32, Ls=12, m=0.02, 0.03, 0.04, M=1.8)
- BNL provides its own archive: HTTP (wget) access
- http://lattices.qcdoc.bnl.gov
- Can browse parameters of non-public production there

Existing and planned MILC ensembles
- MILC data available immediately as it is produced
- Configurations at NERSC and FNAL
- Data sets outside of Gauge Connection not in MDC yet
- Access through HTTP (NERSC) and dCache (FNAL)

Plans for more mark-up
- DWF at BNL
- Working with Enno Scholz at BNL on marking up publicly available DWF configs and adding to MDC
- Provide DWF template ensemble and config mark-up for publicly available datasets (only 3 of these)
- Co-work with Chris Maynard in UKQCD for this
- Then hand over to Enno for remaining ones (if initial mark-up is judged a success)

Plans for more mark-up
- There is now plenty of prototype mark-up for MILC and improved staggered data
- Hand off task of marking up to MILC
- Share scripts, stylesheets and expertise

Marking Up MILC Configs: My Own Experience
- Ensembles
- Python Script
- MILC filename + internal lookup table -> Simple XML
<?xml version='1.0' encoding='UTF-8'?>
<MILCEnsemble>
  <size> <X>20</X> <Y>20</Y> <Z>20</Z> <T>64</T> </size>
  <beta>6.81</beta>
  <markovChain>mc://USQCD/MILC/asqtad/2_plus_1_flavor/MILC_2064f21b681m030m050</markovChain>
  <m><Mass>0.030</Mass><Nf>2</Nf></m>
  <m><Mass>0.050</Mass><Nf>1</Nf></m>
  <u0>0.8696</u0>
  <ensembleLabel>USQCD_MILC_0.13fm</ensembleLabel>
</MILCEnsemble>
[Slide callouts on the XML: "From Filename" and "From Table (paper, email)"]
- Simple XML + XSLT (with boilerplate data) -> QCDML
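The filename-to-Simple-XML step can be sketched as follows. This is not the author's script: the decoding rules (two-digit spatial and temporal extents, `b681` meaning beta 6.81, `m030` meaning mass 0.030, one flavor digit per mass) are assumptions inferred from the single example on the slide, and `LOOKUP` is a hypothetical stand-in for the internal lookup table.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup table for values not encoded in the name.
LOOKUP = {
    "2064f21b681m030m050": {"u0": "0.8696", "ensembleLabel": "USQCD_MILC_0.13fm"},
}

# Assumed naming pattern: <Ns><Nt>f<flavors>b<beta*100>[m<mass*1000>...]
NAME_RE = re.compile(r"^(\d{2})(\d{2})f(\d+)b(\d{3})((?:m\d{3})*)$")


def parse_ensemble_name(name):
    """Decode lattice size, beta and masses from an ensemble name."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised ensemble name: {name}")
    ns, nt, flavors, beta, masses = match.groups()
    mass_list = re.findall(r"m(\d{3})", masses)
    nf_list = [int(digit) for digit in flavors if digit != "0"]
    if len(nf_list) != len(mass_list):
        raise ValueError("flavor digits and masses do not pair up")
    return {
        "size": (int(ns), int(ns), int(ns), int(nt)),
        "beta": f"{int(beta) / 100:.2f}",
        "masses": [(f"0.{mass}", nf) for mass, nf in zip(mass_list, nf_list)],
    }


def to_simple_xml(name):
    """Build a Simple-XML-like description from the name plus the lookup table."""
    info = parse_ensemble_name(name)
    root = ET.Element("MILCEnsemble")
    size = ET.SubElement(root, "size")
    for tag, extent in zip("XYZT", info["size"]):
        ET.SubElement(size, tag).text = str(extent)
    ET.SubElement(root, "beta").text = info["beta"]
    for mass, nf in info["masses"]:
        block = ET.SubElement(root, "m")
        ET.SubElement(block, "Mass").text = mass
        ET.SubElement(block, "Nf").text = str(nf)
    for tag, value in LOOKUP.get(name, {}).items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_simple_xml("2064f21b681m030m050")
print(xml_text)
```

Keeping the table-driven values separate from the filename-derived ones mirrors the split shown on the slide and makes it obvious which entries need manual upkeep.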

Marking Up MILC Configs: My Own Experience
- Configurations
- Most information from filename and ensemble info
- Plaquette and timestamp info from NERSC header
- CRC32 checksum
- Download config
- Convert to ILDG format
- Extract data record
- Compute checksum
- Produce Simple Configuration XML
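The "compute checksum" step can be sketched as a streaming CRC over the extracted data record. The code uses `zlib.crc32` purely as a stand-in: which CRC variant the ILDG metadata requires should be checked against the ILDG specification, and downloading, converting and extracting the data record are not shown. The function name `crc32_of_stream` is hypothetical.

```python
import io
import zlib


def crc32_of_stream(stream, chunk_size=1 << 20):
    """Compute zlib's CRC-32 of a binary stream without loading it whole."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Feed the running value back in so chunks combine correctly.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


# Stand-in for the data record; a real config would be read from a file.
checksum = crc32_of_stream(io.BytesIO(b"123456789"))
print(checksum)
```

Reading in chunks matters here because a single configuration can be large, and the result does not depend on the chunk size chosen.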

Marking up the MILC Configs
<?xml version='1.0' encoding='UTF-8'?>
<MILCConfig>
  <dataLFN>lfn://USQCD/MILC/quenched/MILC_2064f0b800/series_0/u_MILC_l2064f0b800.480</dataLFN>
  <dataTURL>http://qcd-dmz.nersc.gov/bin/getlat?MILC/2064f0b800/u_MILC_l2064f0b800.480</dataTURL>
  <series>0</series>
  <update>480</update>
  <crcChecksum>2180224234</crcChecksum>
  <markovChainURI>mc://USQCD/MILC/quenched/MILC_2064f0b800</markovChainURI>
  <NERSCChecksum>5554528e</NERSCChecksum>
  <NERSCLinkTrace>0.0000264118</NERSCLinkTrace>
  <Plaquette>0.6214973319</Plaquette>
  <timestamp>2003-03-22T10:22:47-0800</timestamp>
</MILCConfig>
- Simple Config XML contains NERSC URL
- for setting up Initial Replica Catalog
- dataLFN chosen by fiat
- info gleaned from header and user input table
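Since the Simple Config XML carries both the logical file name and the NERSC URL, an initial replica catalogue can in principle be seeded directly from it. The sketch below collects `dataLFN` to `dataTURL` mappings into a dictionary; the dictionary layout and the function name `replica_entries` are assumptions for illustration, not the planned replica catalogue design.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration example, keeping only the two fields used.
CONFIG_XML = """<MILCConfig>
  <dataLFN>lfn://USQCD/MILC/quenched/MILC_2064f0b800/series_0/u_MILC_l2064f0b800.480</dataLFN>
  <dataTURL>http://qcd-dmz.nersc.gov/bin/getlat?MILC/2064f0b800/u_MILC_l2064f0b800.480</dataTURL>
</MILCConfig>"""


def replica_entries(config_docs):
    """Map each dataLFN to the list of transfer URLs found for it."""
    catalogue = {}
    for doc in config_docs:
        root = ET.fromstring(doc)
        lfn = root.findtext("dataLFN")
        turl = root.findtext("dataTURL")
        if lfn and turl:
            # A list per LFN leaves room for further replicas later.
            catalogue.setdefault(lfn, []).append(turl)
    return catalogue


catalogue = replica_entries([CONFIG_XML])
print(catalogue)
```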

Marking up the MILC Configs
- Process is SLOW, CLUMSY and ERROR PRONE
- SLOW: Must download and process every config.
- About 5 minutes for every 20^3x64 config
- 400-500 configs per ensemble: work it out...
- CLUMSY: Need to retune scripts
- ERROR PRONE: Boilerplate info may need to change
- crc32Checksum is misleading if file is converted to ILDG, checksummed for metadata and then discarded, since a different conversion may produce a different checksum
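Working out the "SLOW" estimate explicitly, assuming configurations are processed one after another at the quoted rate (the serial-processing assumption is mine, not stated on the slide):

```python
MINUTES_PER_CONFIG = 5             # "about 5 minutes" per config, from the slide
CONFIGS_PER_ENSEMBLE = (400, 500)  # range quoted on the slide


def hours_per_ensemble(n_configs, minutes_per_config=MINUTES_PER_CONFIG):
    """Serial processing time for one ensemble, in hours."""
    return n_configs * minutes_per_config / 60


low, high = (hours_per_ensemble(n) for n in CONFIGS_PER_ENSEMBLE)
print(f"{low:.1f} to {high:.1f} hours per ensemble")
```

That is roughly 33 to 42 hours of download and processing per ensemble, i.e. well over a day each.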

Biggest Burdens for Full ILDG Compliance
- Metadata for legacy data
- plaquette and crcChecksum: need to download every legacy config
- Metadata during production
- crcChecksum: pain to compute during simulation -> post-processing: new data becomes legacy data
- ILDG file format
- some codes still don't produce it
- conversion for legacy data needs (disk) space and time