The Condor DB Group Report

1
The Condor DB Group Report
  • Jiansheng Huang, Ameet Kini, Shrinivas Lakshmikant, Erik Paulson,
    Christine Reilly, Eric Robinson, Srinath Shankar, David DeWitt,
    Jeff Naughton

2
Overview
  • General overview of group projects (Naughton).
  • Quill (Paulson).

3
Condor DB Group
  • Overall task
  • Focus on data management aspects of Condor
  • Deliver prototypes of useful technology
  • Explore, develop and evaluate technology that may
    be useful to Condor down the road.

4
Projects other than Quill
  • Provenance in a Condor System.
  • Statistical mining of log data to evaluate system
    health.
  • Interaction of user data placement, caching, and
    workflow job scheduling.
  • Job-machine matching in DB context.
  • Condor functionality based on App-Server
    technology.
  • Recency and consistency in captured data.

5
Provenance and Condor
  • Christine Reilly (chrisr_at_cs.wisc.edu).
  • Provenance: information on how data was produced.
  • Observation: for each user job, Condor can record
      • Which version of the program(s) was used
      • Which version of the data was used
      • When it was produced
      • What system it ran on (hardware, software)
  • Questions:
      • How much information should we gather?
      • How much burden should we place on the system
        designer, application programmer, or both?
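
The four items listed on this slide could be captured in a record like the one below. This is only a sketch: the class name, field names, and sample values are hypothetical and are not taken from Condor or from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical per-job provenance record (all field names are assumptions)."""
    job_id: str
    program_versions: dict   # program name -> version used
    data_versions: dict      # input data name -> version used
    produced_at: datetime    # when the output was produced
    hardware: str            # what hardware the job ran on
    software: str            # what software environment the job ran on

record = ProvenanceRecord(
    job_id="23.2",
    program_versions={"simulate": "1.4"},
    data_versions={"input.dat": "v7"},
    produced_at=datetime(2007, 5, 1, tzinfo=timezone.utc),
    hardware="x86",
    software="Linux",
)
```

Every extra field is something the system or the application programmer has to supply, which is the trade-off raised in the questions above.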

6
Debugging through log mining
  • Shrinivas Lakshmikant (pachu_at_cs.wisc.edu)
  • Idea:
      • Record events, logically associated with
        entities.
      • E.g., job entities: start, get scheduled, run,
        terminate.
      • Find which entities have infrequent events.
      • Find which entities lack frequent events.
  • Can you use this to detect problems?
      • Early results suggest yes: it finds and pinpoints
        problems that might not be found otherwise.
  • How can you increase the accuracy and efficiency
    over naïve approaches?
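
A naive version of the idea on this slide can be sketched as follows. The event data, the function name, and the 75%/25% support thresholds are all assumptions made for illustration; the group's actual mining method is not described in the slides.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (entity, event) pairs.
events = [
    ("job1", "start"), ("job1", "scheduled"), ("job1", "run"), ("job1", "terminate"),
    ("job2", "start"), ("job2", "scheduled"), ("job2", "run"), ("job2", "terminate"),
    ("job3", "start"), ("job3", "scheduled"), ("job3", "run"), ("job3", "terminate"),
    ("job4", "start"), ("job4", "scheduled"), ("job4", "evicted"),
]

def find_anomalies(events, frequent=0.75, infrequent=0.25):
    """Return {entity: (rare events it has, common events it lacks)}."""
    seen = defaultdict(set)
    for entity, event in events:
        seen[entity].add(event)
    total = len(seen)
    # Count, for each event type, how many entities show it at least once.
    support = Counter(event for evs in seen.values() for event in evs)
    common = {e for e, count in support.items() if count / total >= frequent}
    rare = {e for e, count in support.items() if count / total <= infrequent}
    report = {}
    for entity, evs in seen.items():
        has_rare = sorted(evs & rare)
        lacks_common = sorted(common - evs)
        if has_rare or lacks_common:
            report[entity] = (has_rare, lacks_common)
    return report

anomalies = find_anomalies(events)
# job4 is flagged: it has the rare "evicted" event and lacks "run" and "terminate".
```

This brute-force pass compares every entity against every event type, which is the kind of naive baseline the last bullet asks how to improve on.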

7
Caching, Scheduling, Workflow
  • Srinath Shankar (srinath_at_cs.wisc.edu)
  • Idea:
      • Cache input files and intermediate files on disks
        of pool machines
      • Record where these files are cached
      • Schedule tasks in a workflow to minimize data
        fetches/moves
  • Result: potentially much greater throughput.
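
One simple way to picture cache-aware placement is a greedy rule: run each task on the machine that would have to fetch the least data. The sketch below assumes a hypothetical cache map, file sizes, and function names; it is not the scheduler studied in this project.

```python
# Hypothetical record of which files are cached on which pool machine.
cache_locations = {
    "machineA": {"input1.dat", "stage1.out"},
    "machineB": {"input2.dat"},
    "machineC": set(),
}

# Hypothetical file sizes in megabytes.
sizes = {"input1.dat": 100, "input2.dat": 500, "stage1.out": 50}

def fetch_cost(task_inputs, machine):
    """Total size of the task's inputs that are not already cached on the machine."""
    return sum(sizes[name] for name in task_inputs - cache_locations[machine])

def place_task(task_inputs):
    """Greedy choice: the machine with the least data to fetch (ties broken by name)."""
    return min(sorted(cache_locations), key=lambda m: fetch_cost(task_inputs, m))

first = place_task({"input1.dat", "stage1.out"})   # everything is cached on machineA
second = place_task({"input2.dat", "stage1.out"})  # machineB only needs stage1.out
```

A real workflow scheduler would also have to weigh machine availability and task ordering, which this sketch ignores.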

8
Job Matching in a DBMS
  • Ameet Kini (akini_at_cs.wisc.edu)
  • Idea: matching looks a lot like a DBMS join.
  • If machine and job data are already stored in a
    DBMS, can we or should we use the DBMS to do the
    matching?
  • Answer: early results are promising, but this is a
    non-trivial problem.
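
To make the join analogy concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and rows are invented for illustration, and the match condition is reduced to two fixed requirements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machines (name TEXT, arch TEXT, memory_mb INTEGER);
CREATE TABLE jobs (job_id TEXT, req_arch TEXT, req_memory_mb INTEGER);
INSERT INTO machines VALUES ('m1', 'INTEL', 512);
INSERT INTO machines VALUES ('m2', 'INTEL', 2048);
INSERT INTO machines VALUES ('m3', 'SUN4u', 1024);
INSERT INTO jobs VALUES ('23.2', 'INTEL', 1024);
INSERT INTO jobs VALUES ('13.2', 'SUN4u', 256);
""")

# Matching expressed as a join: pair each job with every machine that
# satisfies its (hypothetical) architecture and memory requirements.
matches = conn.execute("""
    SELECT j.job_id, m.name
    FROM jobs AS j
    JOIN machines AS m
      ON m.arch = j.req_arch AND m.memory_mb >= j.req_memory_mb
    ORDER BY j.job_id, m.name
""").fetchall()
```

Condor's ClassAd matchmaking is two-sided and uses arbitrary requirement and rank expressions rather than fixed columns, which is one reason the full problem is harder than this fixed-schema join suggests.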

9
Recency of Quill Data
  • Jiansheng Huang (jhuang_at_cs.wisc.edu)
  • Problem: daemons report in at uncontrollable and
    unpredictable times.
  • Result: out-of-date and inconsistent data set.
  • Can we provide the user with a concise
    characterization of the recency of the sources
    relevant to a user query?
  • Note: surprisingly non-trivial to define what we
    mean by "relevant" in this setting.
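
As a rough illustration of what a recency characterization might look like, the sketch below reports the freshest and stalest source among those a query depends on. The source names, timestamps, and function are hypothetical, and the sketch sidesteps the hard part by assuming the caller already knows which sources are relevant.

```python
# Hypothetical time (in seconds) at which each source last reported in.
last_report = {
    "schedd@north": 1000,
    "schedd@south": 400,
    "startd@m1": 990,
}

def recency_summary(relevant_sources, now):
    """Summarize how stale the given sources are at time `now`."""
    ages = {source: now - last_report[source] for source in relevant_sources}
    stalest = max(ages, key=ages.get)
    return {
        "stalest_source": stalest,
        "max_age": ages[stalest],
        "min_age": min(ages.values()),
    }

summary = recency_summary(["schedd@north", "schedd@south"], now=1010)
```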

10
App. Servers and Condor
  • Eric Robinson (erobinso_at_cs.wisc.edu)
  • Idea: application servers provide a lot of
    technology that appears useful in a Condor
    setting.
  • Approach: build a prototype of some Condor
    functionality using these tools, then evaluate the
    approach.

11
Moving on
  • Further questions on these projects? Best bet is
    to contact the student listed on each slide.
  • On to Quill portion of talk.

12
The Condor Quill
The Quill Developers
  • Give me a condor's quill! Give me Vesuvius'
    crater for an ink stand. Friends, hold my arms!
    For in the mere act of penning my thoughts of
    this Leviathan, they weary me. . . To produce a
    mighty book you must choose a mighty theme.
  • -Melville, Moby Dick

13
What is Quill?
  • A non-invasive method of storing a read-only
    version of the Condor operational data in a
    relational database.

14
Quill: In pictures
[Figure: diagram with panels "With Quill" and "Without Quill"; one surviving label: "Disk"]

15
Quill: Where we've been
  • First shipped in 6.7.11 (Sept 05)
  • Now over the fence: Condor Team is driving the
    6.8 version
  • Response from users: very helpful!
  • Lessons learned:
      • Passive collection is good
      • DBMSes are full of surprises

16
Quill: Where we'd like to be
  • Shared databases
  • Better job data
  • Data from non-job sources
  • More than just PostgreSQL DBMS
  • Examples of usage

17
Quill in Condor 6.9.3
  • Development effort mostly complete
  • Previous bullet points addressed ?
  • Migration path for historical job data
  • Out of the box changes for Quill users
  • Horizontal and vertical schema for active jobs
  • Jobs from multiple schedds in one database
  • By default, no new historical data stored

18
Example tables
ScheddName         Cluster  Proc  Owner     JobStatus  JobPrio  Universe
north.cs.wisc.edu  23       2     epaulson  IDLE       10       Vanilla
north.cs.wisc.edu  23       3     epaulson  IDLE       10       Vanilla
south.cs.wisc.edu  13       2     jhuang    RUN        5        Grid
north.cs.wisc.edu  13       2     miron     HELD       30       Standard
Horizontal Job Table

ScheddName         Cluster  Proc  Attr    Value
north.cs.wisc.edu  23       2     WantIO  TRUE
north.cs.wisc.edu  23       2     Group   Database
north.cs.wisc.edu  23       3     Group   Condor
south.cs.wisc.edu  13       2     Group   Condor
Vertical Job Table
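
The two layouts can be queried together by joining on the job's key (schedd name, cluster, proc). The sketch below loads a subset of the slide's columns into an in-memory SQLite database; the table names `jobs_horizontal` and `jobs_vertical` and the lowercase column names are assumptions, not Quill's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs_horizontal "
             "(scheddname TEXT, cluster INTEGER, proc INTEGER, owner TEXT, jobstatus TEXT)")
conn.execute("CREATE TABLE jobs_vertical "
             "(scheddname TEXT, cluster INTEGER, proc INTEGER, attr TEXT, value TEXT)")
conn.executemany("INSERT INTO jobs_horizontal VALUES (?, ?, ?, ?, ?)", [
    ("north.cs.wisc.edu", 23, 2, "epaulson", "IDLE"),
    ("north.cs.wisc.edu", 23, 3, "epaulson", "IDLE"),
    ("south.cs.wisc.edu", 13, 2, "jhuang", "RUN"),
    ("north.cs.wisc.edu", 13, 2, "miron", "HELD"),
])
conn.executemany("INSERT INTO jobs_vertical VALUES (?, ?, ?, ?, ?)", [
    ("north.cs.wisc.edu", 23, 2, "WantIO", "TRUE"),
    ("north.cs.wisc.edu", 23, 2, "Group", "Database"),
    ("north.cs.wisc.edu", 23, 3, "Group", "Condor"),
    ("south.cs.wisc.edu", 13, 2, "Group", "Condor"),
])

# Fixed columns come from the horizontal table; the free-form attribute
# (here, Group) is looked up in the vertical table.
condor_group_jobs = conn.execute("""
    SELECT h.scheddname, h.cluster, h.proc, h.owner
    FROM jobs_horizontal AS h
    JOIN jobs_vertical AS v
      ON v.scheddname = h.scheddname AND v.cluster = h.cluster AND v.proc = h.proc
    WHERE v.attr = 'Group' AND v.value = 'Condor'
    ORDER BY h.scheddname, h.cluster, h.proc
""").fetchall()
```

The vertical layout can hold any attribute without a schema change, at the cost of one extra join per attribute consulted.
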
19
More job information
  • The lifecycle of the job would be nice to have
  • Events like those in the user log
  • But, need more info than what's in the job queue
  • Passive data collection works

20
Quill 6.9.3 diagram
[Figure: diagram; one surviving label: "Disk"]
  • Schedd writes events to the new Event log; the
    Quill daemon passively picks up the events and
    inserts them into the database.
  • For the schedd, the event log contains userlog
    events and job history events.
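
The passive pattern described here (one process appends to a log, another polls the log and loads new entries into a database) can be sketched as below. The one-line-per-event format, the `events` table, and the function name are hypothetical; the real event log format is not shown in the slides.

```python
import os
import sqlite3
import tempfile

def ingest_new_events(log_path, conn, offset=0):
    """Insert events appended since `offset` (in bytes); return the new offset."""
    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in log:
            if not raw.endswith(b"\n"):
                break  # incomplete trailing line; pick it up on the next poll
            job_id, event = raw.decode().split()
            conn.execute("INSERT INTO events VALUES (?, ?)", (job_id, event))
            offset += len(raw)
    conn.commit()
    return offset

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (job_id TEXT, event TEXT)")

log_path = os.path.join(tempfile.mkdtemp(), "event.log")
with open(log_path, "ab") as log:   # stands in for the schedd appending events
    log.write(b"23.2 SUBMIT\n23.2 EXECUTE\n")
offset = ingest_new_events(log_path, conn)

with open(log_path, "ab") as log:   # more events arrive later
    log.write(b"23.2 TERMINATED\n")
offset = ingest_new_events(log_path, conn, offset)

event_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The writer never talks to the database, so a slow or unavailable DBMS does not block it; the reader only needs to remember how far it has read.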

21
Examples
  • Show me all the jobs that exited with a segfault
    and that at some point ran on this machine.
  • When my jobs get preempted, how long until they
    get matched again?
  • What is the average runtime for jobs for each
    different type of input file?
      • SQL GROUP BY
22
Collecting non-job information
[Figure: diagram; one surviving label: "Disk"]

23
New information stored
  • StartD: Machine status
  • Negotiator: Matches made
  • Starter/Shadow: Files transferred
  • Collector: Submitter ads
  • All daemons: Generic events, daemon ads

24
The DBMSD
  • New daemon responsible for database housekeeping
  • Only one needed per DBMS
  • Purges old data
  • Three classes, independent thresholds:
      • Resource: Machine classads
      • Run: matches, job log events
      • Job: condor_history information
  • Estimates size of database
  • Soft quota, warn when exceeded
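
Purging with an independent threshold per class of data might look like the sketch below. The table names, the `recorded_day` column, and the retention values are invented for illustration and do not reflect the DBMSD's actual tables or configuration; the size estimate and soft quota are not shown.

```python
import sqlite3

# Hypothetical retention thresholds in days, one per class of data.
RETENTION_DAYS = {"resource_history": 7, "run_history": 30, "job_history": 365}

def purge_old_rows(conn, today):
    """Delete rows older than each table's threshold; return rows deleted per table."""
    deleted = {}
    for table, keep_days in RETENTION_DAYS.items():
        # Table names come from the fixed dict above, never from user input.
        cursor = conn.execute(
            f"DELETE FROM {table} WHERE recorded_day < ?", (today - keep_days,)
        )
        deleted[table] = cursor.rowcount
    conn.commit()
    return deleted

conn = sqlite3.connect(":memory:")
for table in RETENTION_DAYS:
    conn.execute(f"CREATE TABLE {table} (recorded_day INTEGER)")
conn.executemany("INSERT INTO resource_history VALUES (?)", [(390,), (395,)])
conn.executemany("INSERT INTO run_history VALUES (?)", [(360,), (380,)])
conn.executemany("INSERT INTO job_history VALUES (?)", [(10,), (100,)])

purged = purge_old_rows(conn, today=400)
```

Keeping the thresholds separate lets bulky, short-lived data such as machine ads be purged aggressively while job history is retained much longer.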

25
Multiple DBMS systems
  • Oracle supported
  • Appears to need less maintenance
  • A nearly unified schema
  • Main difference is large text fields
  • Same binaries, DBMS type selectable via
    configuration file

26
Example Usage
  • PHP web front end
  • Good enough for some people
  • Or, use as the basis for your own system
  • BoF on Thursday at 11:00am
  • We'll use the web front end to explain the
    information Quill now stores