The Condor DB Group Report
- Jiansheng Huang, Ameet Kini, Shrinivas Lakshmikant, Erik Paulson, Christine Reilly, Eric Robinson, Srinath Shankar, David DeWitt, Jeff Naughton
Overview
- General overview of group projects (Naughton).
- Quill (Paulson).
Condor DB Group
- Overall task
- Focus on data management aspects of Condor
- Deliver prototypes of useful technology
- Explore, develop and evaluate technology that may
be useful to Condor down the road.
Projects other than Quill
- Provenance in a Condor System.
- Statistical mining of log data to evaluate system health.
- Interaction of user data placement, caching, and workflow job scheduling.
- Job-machine matching in a DB context.
- Condor functionality based on App-Server technology.
- Recency and consistency in captured data.
Provenance and Condor
- Christine Reilly (chrisr_at_cs.wisc.edu).
- Provenance: information on how data was produced.
- Observation: for each user job, Condor can record
- Which version of program(s) was used
- Which version of data was used
- When it was produced
- What system it ran on (hardware, software).
- Questions
- How much information should we gather?
- How much burden should we place on the system
designer, application programmer, or both?
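A per-job provenance record covering the four items listed above could look like the sketch below. It is only an illustration, not a Condor format: the `ProvenanceRecord` class, its field names, and the decision to identify program and data versions by SHA-256 content digests are all assumptions made for this example.

```python
import hashlib
import platform
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def content_version(data: bytes) -> str:
    """Assumed versioning scheme: identify a program or data file by its SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


def capture_system_info() -> dict:
    """Describe the execute machine (hardware and OS) using only the standard library."""
    return {
        "machine": platform.machine(),
        "os": platform.system(),
        "os_release": platform.release(),
    }


@dataclass
class ProvenanceRecord:
    """Hypothetical per-job provenance record; field names are illustrative only."""
    job_id: str                    # e.g. "cluster.proc"
    program_versions: dict         # program name -> version identifier
    input_versions: dict           # input file name -> version identifier
    produced_at: datetime          # when the output was produced
    system: dict = field(default_factory=dict)  # what system it ran on


record = ProvenanceRecord(
    job_id="23.2",
    program_versions={"analyze": content_version(b"analyze v1")},
    input_versions={"input.dat": content_version(b"1 2 3\n")},
    produced_at=datetime.now(timezone.utc),
    system=capture_system_info(),
)
print(asdict(record)["job_id"])
```

How many of these fields to gather, and whether the system or the application programmer supplies them, are the open questions on the slide; the sketch fills every field only to show the shape of the data.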
Debugging through log mining
- Srinivas Lakshmikant (pachu_at_cs.wisc.edu)
- Idea:
- Record events, logically associated with entities.
- E.g., job entities start, get scheduled, run, terminate.
- Find which entities have infrequent events.
- Find which entities lack frequent events.
- Can you use this to detect problems?
- Early results suggest yes: it finds and pinpoints problems that might not be found otherwise.
- How can you increase the accuracy and efficiency over naïve approaches?
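A naïve baseline for the two checks above can be written as simple frequency thresholds over (entity, event) pairs. This is a minimal sketch, not the group's method: the 0.2/0.8 thresholds and the event names (including `shadow_exception`) are hypothetical, and this is exactly the kind of approach the slide asks how to improve on.

```python
from collections import Counter, defaultdict


def mine_events(events, rare_frac=0.2, common_frac=0.8):
    """Naive baseline over (entity_id, event_type) pairs.

    Returns two dicts:
      has_rare: entity -> event types that appear in few entities
      lacks_common: entity -> event types most entities have but this one lacks
    """
    types_by_entity = defaultdict(set)
    for entity, event_type in events:
        types_by_entity[entity].add(event_type)

    n = len(types_by_entity)
    # Support = number of entities in which an event type occurs at least once.
    support = Counter(t for types in types_by_entity.values() for t in types)
    rare = {t for t, c in support.items() if c / n <= rare_frac}
    common = {t for t, c in support.items() if c / n >= common_frac}

    has_rare = {e: sorted(ts & rare) for e, ts in types_by_entity.items() if ts & rare}
    lacks_common = {e: sorted(common - ts) for e, ts in types_by_entity.items() if common - ts}
    return has_rare, lacks_common


# Ten jobs follow the normal lifecycle, except job7 never terminates
# and job3 logs an unusual event.
events = []
for i in range(10):
    job = f"job{i}"
    for event_type in ("start", "scheduled", "run"):
        events.append((job, event_type))
    if i != 7:
        events.append((job, "terminate"))
events.append(("job3", "shadow_exception"))

has_rare, lacks_common = mine_events(events)
print(has_rare)       # {'job3': ['shadow_exception']}
print(lacks_common)   # {'job7': ['terminate']}
```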
Caching, Scheduling, Workflow
- Srinath Shankar (srinath_at_cs.wisc.edu)
- Idea:
- Cache input files and intermediate files on disks of pool machines.
- Record where these files are cached.
- Schedule tasks in a workflow to minimize data fetches/moves.
- Result: potentially much greater throughput.
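The three steps above (cache, record locations, schedule to minimize fetches) can be illustrated with a greedy placement that sends each workflow task to the machine already holding the most input bytes. This is a sketch under assumptions, not the project's scheduler: the function names and file sizes are hypothetical, tasks are assumed to arrive in dependency order, and machine load is ignored entirely, which a real scheduler could not afford to do.

```python
def fetch_cost(machine, inputs, cache, sizes):
    """Bytes that must be moved to run a task with these inputs on this machine."""
    return sum(sizes[f] for f in inputs if machine not in cache.get(f, set()))


def schedule_workflow(tasks, machines, cache, sizes):
    """Greedy cache-aware placement.

    tasks:    list of (name, inputs, outputs), already in dependency order
    machines: candidate machine names
    cache:    file -> set of machines holding a copy (updated in place)
    sizes:    file -> size in bytes
    """
    assignments = {}
    bytes_moved = 0
    for name, inputs, outputs in tasks:
        # Pick the machine needing the fewest bytes fetched; ties go to the
        # machine listed first.
        best = min(machines, key=lambda m: fetch_cost(m, inputs, cache, sizes))
        bytes_moved += fetch_cost(best, inputs, cache, sizes)
        assignments[name] = best
        # Record where files are now cached: fetched inputs and new outputs.
        for f in list(inputs) + list(outputs):
            cache.setdefault(f, set()).add(best)
    return assignments, bytes_moved


sizes = {"big.dat": 1000, "small.dat": 10, "mid.dat": 100}
cache = {"big.dat": {"b"}}  # big.dat is already cached on machine b
tasks = [
    ("t1", ["big.dat", "small.dat"], ["mid.dat"]),
    ("t2", ["mid.dat"], ["out.dat"]),
]
assignments, moved = schedule_workflow(tasks, ["a", "b"], cache, sizes)
print(assignments, moved)  # {'t1': 'b', 't2': 'b'} 10
```

Because `t2` consumes the intermediate file `t1` just produced on `b`, it runs there with no transfer at all; only the 10-byte `small.dat` ever moves.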
Job Matching in a DBMS
- Ameet Kini (akini_at_cs.wisc.edu)
- Idea: matching looks a lot like a DBMS join.
- If machine and job data are already stored in a DBMS, can we, or should we, use the DBMS to do the matching?
- Answer: early results are promising, but this is a non-trivial problem.
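To make the "matching looks like a join" idea concrete, here is a minimal sketch using SQLite from Python. The `jobs`/`machines` schema is hypothetical and reduces job requirements to two fixed columns (memory and architecture). Real ClassAd matchmaking evaluates arbitrary Requirements and Rank expressions from both the job and the machine side, which a fixed-column join like this does not capture, so treat it only as the simple case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY, owner TEXT,
    req_memory_mb INTEGER, req_arch TEXT, prio INTEGER);
CREATE TABLE machines (
    name TEXT PRIMARY KEY, memory_mb INTEGER, arch TEXT, state TEXT);
""")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?)", [
    ("23.2", "epaulson", 512, "X86_64", 10),
    ("13.2", "jhuang", 4096, "X86_64", 5),
    ("13.3", "miron", 256, "INTEL", 30),
])
conn.executemany("INSERT INTO machines VALUES (?, ?, ?, ?)", [
    ("m1", 1024, "X86_64", "Unclaimed"),
    ("m2", 8192, "X86_64", "Unclaimed"),
    ("m3", 2048, "INTEL", "Claimed"),
])

# The join produces every compatible (job, machine) pair; ordering puts
# high-priority jobs first and, within a job, the smallest adequate machine first.
CANDIDATES_SQL = """
SELECT j.job_id, m.name
FROM jobs AS j
JOIN machines AS m
  ON m.memory_mb >= j.req_memory_mb AND m.arch = j.req_arch
WHERE m.state = 'Unclaimed'
ORDER BY j.prio DESC, j.job_id, m.memory_mb, m.name
"""

# The join alone may offer one machine to several jobs, so a greedy pass
# outside the DBMS gives each machine to at most one job.
matches, used = {}, set()
for job_id, machine in conn.execute(CANDIDATES_SQL):
    if job_id not in matches and machine not in used:
        matches[job_id] = machine
        used.add(machine)
print(matches)  # {'23.2': 'm1', '13.2': 'm2'}
```

Job `13.3` stays unmatched because the only INTEL machine is already claimed. The one-machine-per-job constraint needing a pass outside plain SQL hints at why the slide calls this non-trivial.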
Recency of Quill Data
- Jiansheng Huang (jhuang_at_cs.wisc.edu).
- Problem: daemons report in at uncontrollable and unpredictable times.
- Result: an out-of-date and inconsistent data set.
- Can we provide the user with a concise characterization of the recency of the sources relevant to a user query?
- Note: it is surprisingly non-trivial to define what we mean by "relevant" in this setting.
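One possible shape for a "concise characterization of recency" is sketched below. The sketch deliberately sidesteps the hard part the slide points out: the set of relevant sources is assumed to be given. The `RecencySummary` fields, source names, and timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecencySummary:
    stalest_source: str
    max_staleness: timedelta  # age of the oldest relevant report
    min_staleness: timedelta  # age of the newest relevant report
    skew: timedelta           # gap between newest and oldest reports


def summarize_recency(last_report, relevant, now):
    """Summarize how fresh the sources behind a query are.

    last_report: source name -> datetime of its most recent report
    relevant:    source names the query depends on (assumed already known)
    """
    times = {s: last_report[s] for s in relevant}
    if not times:
        raise ValueError("query has no relevant sources")
    stalest = min(times, key=times.get)
    oldest, newest = times[stalest], max(times.values())
    return RecencySummary(stalest, now - oldest, now - newest, newest - oldest)


now = datetime(2024, 1, 1, 12, 0, 0)
last_report = {
    "schedd@north": now - timedelta(seconds=30),
    "schedd@south": now - timedelta(minutes=10),
    "startd@m1": now - timedelta(minutes=2),
}
summary = summarize_recency(last_report, {"schedd@north", "schedd@south"}, now)
print(summary.stalest_source, summary.max_staleness, summary.skew)
# schedd@south 0:10:00 0:09:30
```

The skew field speaks to consistency as well as recency: a large gap means the query's answer mixes snapshots taken at quite different times.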
App. Servers and Condor
- Eric Robinson (erobinso_at_cs.wisc.edu)
- Idea: application servers provide a lot of technology that appears useful in a Condor setting.
- Approach: build a prototype of some Condor functionality using these tools, and evaluate the approach.
Moving on
- Further questions on these projects? Best bet is to contact the student listed on each slide.
- On to the Quill portion of the talk.
The Condor Quill
The Quill Developers
- Give me a condor's quill! Give me Vesuvius' crater for an ink stand. Friends, hold my arms! For in the mere act of penning my thoughts of this Leviathan, they weary me. . . To produce a mighty book you must choose a mighty theme.
- Melville, Moby-Dick
What is Quill?
- A non-invasive method of storing a read-only
version of the Condor operational data in a
relational database.
Quill in pictures
[Diagram panels: With Quill / Without Quill]
Quill: Where we've been
- First shipped in 6.7.11 (Sept 05)
- Now over the fence: the Condor Team is driving the 6.8 version
- Response from users very helpful!
- Lessons learned
- Passive collection good
- DBMSes are full of surprises
Quill: Where we'd like to be
- Shared databases
- Better job data
- Data from non-job sources
- More than just PostgreSQL DBMS
- Examples of usage
Quill in Condor 6.9.3
- Development effort mostly complete
- Previous bullet points addressed
- Migration path for historical job data
- Out-of-the-box changes for Quill users
- Horizontal and vertical schema for active jobs
- Jobs from multiple schedds in one database
- By default, no new historical data stored
Example tables
Horizontal Job Table
ScheddName         Cluster  Proc  Owner     JobStatus  JobPrio  Universe
north.cs.wisc.edu  23       2     epaulson  IDLE       10       Vanilla
north.cs.wisc.edu  23       3     epaulson  IDLE       10       Vanilla
south.cs.wisc.edu  13       2     jhuang    RUN        5        Grid
north.cs.wisc.edu  13       2     miron     HELD       30       Standard
Vertical Job Table
ScheddName         Cluster  Proc  Attr    Value
north.cs.wisc.edu  23       2     WantIO  TRUE
north.cs.wisc.edu  23       2     Group   Database
north.cs.wisc.edu  23       3     Group   Condor
south.cs.wisc.edu  13       2     Group   Condor
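The two layouts above can be queried together, as the following sketch shows. It uses SQLite with the slide's column names and rows; the table names `jobs_horizontal`/`jobs_vertical` and the DDL are assumptions, and Quill's actual schema may differ. Fixed, commonly used attributes live in the horizontal table. Arbitrary ClassAd attributes such as `Group` come from the vertical table through a join on the job key, which includes `ScheddName` so jobs from multiple schedds can share one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs_horizontal (
    ScheddName TEXT, Cluster INTEGER, Proc INTEGER, Owner TEXT,
    JobStatus TEXT, JobPrio INTEGER, Universe TEXT,
    PRIMARY KEY (ScheddName, Cluster, Proc));
CREATE TABLE jobs_vertical (
    ScheddName TEXT, Cluster INTEGER, Proc INTEGER, Attr TEXT, Value TEXT,
    PRIMARY KEY (ScheddName, Cluster, Proc, Attr));
""")
conn.executemany("INSERT INTO jobs_horizontal VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("north.cs.wisc.edu", 23, 2, "epaulson", "IDLE", 10, "Vanilla"),
    ("north.cs.wisc.edu", 23, 3, "epaulson", "IDLE", 10, "Vanilla"),
    ("south.cs.wisc.edu", 13, 2, "jhuang", "RUN", 5, "Grid"),
    ("north.cs.wisc.edu", 13, 2, "miron", "HELD", 30, "Standard"),
])
conn.executemany("INSERT INTO jobs_vertical VALUES (?, ?, ?, ?, ?)", [
    ("north.cs.wisc.edu", 23, 2, "WantIO", "TRUE"),
    ("north.cs.wisc.edu", 23, 2, "Group", "Database"),
    ("north.cs.wisc.edu", 23, 3, "Group", "Condor"),
    ("south.cs.wisc.edu", 13, 2, "Group", "Condor"),
])

# Fixed columns come from the horizontal table; an arbitrary attribute
# (here "Group") is pulled from the vertical table by joining on the job key.
rows = conn.execute("""
SELECT h.ScheddName, h.Cluster, h.Proc, h.Owner, v.Value AS GroupName
FROM jobs_horizontal AS h
JOIN jobs_vertical AS v USING (ScheddName, Cluster, Proc)
WHERE v.Attr = 'Group'
ORDER BY h.ScheddName, h.Cluster, h.Proc
""").fetchall()
for row in rows:
    print(row)
```

Job 13.2 appears on both north and south without conflict because `ScheddName` is part of the key. Adding a new attribute needs only new rows in the vertical table, not a schema change.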
More job information
- The lifecycle of the job would be nice to have
- Events like those in the user log
- But we need more info than what's in the job queue
- Passive data collection works
Quill 6.9.3 diagram
- The schedd writes events to the new event log; the Quill daemon passively picks up the events and inserts them into the database.
- For the schedd, the event log contains userlog events and job history events.
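The passive pickup described above can be illustrated with a loader that polls a log file from a saved byte offset and inserts only complete lines. This is a sketch in that spirit, not Quill's code. The pipe-delimited `timestamp|job|event` line format is hypothetical (Condor's real event log uses its own format), and SQLite stands in for the production DBMS. The writer only appends to a file, so it never waits on the database.

```python
import os
import sqlite3
import tempfile


def load_new_events(log_path, conn, offset):
    """Insert events appended to log_path since byte `offset`; return the new offset.

    Assumes a simplified, hypothetical line format: "timestamp|job|event".
    """
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partially written last line is left
    # for the next poll.
    end = data.rfind(b"\n") + 1
    rows = [tuple(line.split("|", 2)) for line in data[:end].decode("utf-8").splitlines()]
    with conn:  # commit the batch as one transaction
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return offset + end


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, job TEXT, event TEXT)")

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "event.log")
    # The writer appends two full events and is mid-way through a third.
    with open(path, "w", newline="\n") as f:
        f.write("100|23.2|submit\n101|23.2|execute\n102|23.2|term")
    offset = load_new_events(path, conn, 0)
    with open(path, "a", newline="\n") as f:
        f.write("inate\n")
    offset = load_new_events(path, conn, offset)

events = conn.execute("SELECT event FROM events ORDER BY ts").fetchall()
print(events)  # [('submit',), ('execute',), ('terminate',)]
```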
Examples
- Show me all the jobs that exited with a segfault that at some point ran on this machine.
- When my jobs get preempted, how long until they get matched again?
- What is the average runtime for jobs for each different type of input file?
- SQL GROUP BY
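Two of the example questions above map directly onto SQL. The sketch below runs them against SQLite. The `jobs`/`runs` tables, their columns, and the sample data are hypothetical stand-ins for whatever Quill actually stores. It treats "segfault" as exit signal 11, which is SIGSEGV on Linux.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, input_type TEXT,
                   runtime_s INTEGER, exit_signal INTEGER);
CREATE TABLE runs (job_id TEXT, machine TEXT);  -- one row per execution attempt
""")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    ("1", "fasta", 100, None),
    ("2", "fasta", 300, 11),
    ("3", "csv", 50, 11),
    ("4", "csv", 70, None),
])
conn.executemany("INSERT INTO runs VALUES (?, ?)", [
    ("1", "m1"), ("2", "m2"), ("2", "m1"), ("3", "m2"), ("4", "m1"),
])

# Jobs that died with signal 11 (SIGSEGV on Linux) and at some point ran on m1.
segfaulted_on_m1 = conn.execute("""
SELECT DISTINCT j.job_id
FROM jobs AS j JOIN runs AS r ON r.job_id = j.job_id
WHERE j.exit_signal = 11 AND r.machine = ?
ORDER BY j.job_id
""", ("m1",)).fetchall()

# Average runtime per input file type: the SQL GROUP BY case.
avg_runtime = conn.execute("""
SELECT input_type, AVG(runtime_s)
FROM jobs
GROUP BY input_type
ORDER BY input_type
""").fetchall()

print(segfaulted_on_m1)  # [('2',)]
print(avg_runtime)       # [('csv', 60.0), ('fasta', 200.0)]
```

Job 3 also segfaulted but never ran on `m1`, so the join on `runs` excludes it; that "at some point ran on" condition is why execution history has to be stored per attempt rather than as a single machine column.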
Collecting non-job information
New information stored
- StartD: machine status
- Negotiator: matches made
- Starter/Shadow: files transferred
- Collector: submitter ads
- All daemons: generic events, daemon ads
The DBMSD
- New daemon responsible for database housekeeping
- Only one needed per DBMS
- Purges old data
- Three classes, independent thresholds
- Resource: machine ClassAds
- Run: matches, job log events
- Job: condor_history information
- Estimates size of database
- Soft quota, warn when exceeded
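The housekeeping duties listed above (per-class purge thresholds, a size estimate, and a soft quota that only warns) can be sketched as follows. Everything concrete here is an assumption: a single `history` table with a `data_class` column stands in for separate tables, the day-based retention values are invented, and the size estimate uses SQLite's `PRAGMA page_count` and `page_size` (a PostgreSQL deployment might use `pg_database_size()` instead).

```python
import sqlite3
import warnings

# Hypothetical per-class retention thresholds, in days.
RETENTION_DAYS = {"resource": 7, "run": 30, "job": 365}


def purge_old_data(conn, today, retention=RETENTION_DAYS):
    """Delete rows older than their class's threshold; return rows deleted per class."""
    deleted = {}
    with conn:
        for data_class, days in retention.items():
            cur = conn.execute(
                "DELETE FROM history WHERE data_class = ? AND day < ?",
                (data_class, today - days),
            )
            deleted[data_class] = cur.rowcount
    return deleted


def check_soft_quota(conn, soft_quota_bytes):
    """Estimate the database size; warn, but delete nothing, if over the quota."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    size = pages * page_size
    if size > soft_quota_bytes:
        warnings.warn(f"database is {size} bytes, over soft quota of {soft_quota_bytes}")
    return size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (data_class TEXT, day INTEGER, payload TEXT)")
conn.executemany("INSERT INTO history VALUES (?, ?, ?)", [
    ("resource", 90, "old machine ad"), ("resource", 95, "recent machine ad"),
    ("run", 60, "old match"), ("run", 80, "recent match"),
    ("job", 1, "very old history record"),
])
deleted = purge_old_data(conn, today=100)
print(deleted)  # {'resource': 1, 'run': 1, 'job': 0}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    size = check_soft_quota(conn, soft_quota_bytes=1)  # tiny quota forces a warning
print(len(caught))  # 1
```

Independent thresholds let short-lived, high-volume data (machine ads) expire quickly while job history is kept much longer.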
Multiple DBMS systems
- Oracle supported
- Appears to need less maintenance
- A nearly unified schema
- Main difference is large text fields
- Same binaries, DBMS type selectable via
configuration file
Example Usage
- PHP web front end
- Good enough for some people
- Or, use as the basis for your own system
- BoF on Thursday at 11:00 am
- We'll use the web front end to explain the information Quill now stores