Pan-STARRS PS1 Published Science Products Subsystem Critical Design Review

1
Pan-STARRS PS1 Published Science Products
Subsystem Critical Design Review
  • November 5-6, 2007
  • Honolulu

2
Welcome
  • Welcome to this Critical Design Review for
    the Pan-STARRS 1 (PS1) Published Science Products
    Subsystem.
  • To those who have agreed to serve on the
    review committee, we want to thank you for making
    your valuable time available to assess the
    progress of the team developing the PSPS
    component of the PS1 observatory.

3
Introductions
  • CDR Review Panel
  • Dr. Will Burgett, Pan-STARRS Project Manager
    panel chair
  • Dr. Bruce Berriman, IPAC
  • Dr. Andrew Connolly, University of Washington
  • Dr. Roc Cutri, IPAC

4
Introductions
  • IfA
  • Dr. Jim Heasley, IfA, Pan-STARRS, PSPS subsystem
    lead
  • JHU
  • Dr. Alex Szalay, ODM subsystem lead
  • Ms. Maria Nieto-Santisteban, system software
    architect
  • Dr. Ani Thakar, research associate
  • Mr. Jan vandenBerg, system hardware architect

5
Introductions
  • Referentia
  • Mr. Matt Shawver, DRL subsystem lead
  • Mr. Jay Knight, software engineer
  • Mr. Chris Richmond, software engineer
  • Mr. Kenn Yuen, quality control engineer

6
Introductions
  • Other Individuals Attending
  • Dr. Nick Kaiser, IfA, Pan-STARRS PI
  • Dr. Ken Chambers, IfA, PS1 Project Scientist
  • Mr. Larry Denneau, IfA, MOPS software engineer
  • Mr. Erik Small, IfA, Pan-STARRS OTIS software
    engineer
  • Mr. Conrad Holmberg, IfA, Pan-STARRS PSPS
    software engineer (starting in January 2008)
  • Mr. Mike Maberry, IfA, Asst Director External
    Relations
  • Dr. Dave Monet, USNO Flagstaff
  • Mr. George Spix, Microsoft

7
Charter for the CDR
  • The Pan-STARRS project will construct its
    first telescope (PS1) at the former LURE site
    located on the Haleakala summit on Maui as a
    prototype for the full four-telescope system
    (PS4) being developed by the University of
    Hawaii's Institute for Astronomy. As a
    prototype, the PS1 Published Science Products
    Subsystem (PSPS) may not meet all of the
    requirements for the full PS4 system, but part of
    its mission is to better understand how this full
    set of requirements can be achieved.
  • The intent of this critical design review
    is for the PSPS team to demonstrate to the Review
    Committee that
  • The elements of the PSPS baseline design have
    matured to the critical design stage,
  • The design will be reliable and maintainable,
  • The hardware choices are within the scope of the
    overall Project budget,
  • The test plan is well conceived to demonstrate
    requirements compliance,
  • Designs of major interfaces are also sufficiently
    mature,
  • Potential risk areas and risk mitigation
    strategies have been identified.
  • The Committee is asked to review the
    documentation, presentations, answers to specific
    questions, and other material in order to assess
    the status and likely progress of the PSPS team
    relative to the above items in moving forward
    toward PSPS implementation/integration into the
    PS1 system in the Summer 2008 time frame.
  • The Review Committee is asked to submit
    a report to the Pan-STARRS Project Management
    Office and the PSPS Subsystem Lead, Dr. James
    Heasley, no later than six weeks following the
    review with an assessment of the materials
    presented and recommendations for continued PS1
    PSPS development.

8
Meeting Focus
  • The presentations over today and tomorrow will
    focus on the detailed design of the PSPS, and how
    it meets the design requirements specified in the
    PS1 PSPS SRS/SCD (PSDC-630-00-02).
  • The PowerPoint slides prepared for this meeting
    are intended to stimulate real-time discussion of
    the design and the background materials provided
    to reviewers for their information prior to CDR.

9
Topics November 5 / Day 1
  • Welcome and Introductions (Heasley)
  • CDR Committee Charter (Heasley)
  • PSPS System Concept (Heasley)
  • Response to the PDR Report from May 2006 /
    Changes in the PSPS Since PDR (Heasley)
  • Top Level PSPS Requirements (Heasley)
  • The Pan-STARRS Data Store Mechanism (Small)
  • The Solar System Data Manager (Denneau)
  • Lunch Break
  • The Object Data Manager (JHU team)
  • Adjourn for the day

10
Topics November 6 / Day 2
  • Welcome back (Heasley)
  • ODM continued (JHU team)
  • The PSPS Data Retrieval Layer (Referentia team)
  • Lunch
  • The Web Based Interface (Heasley)
  • Pan-STARRS External Interfaces (Heasley)
  • PSPS Test Plan (Heasley)
  • PSPS Schedule (Heasley)
  • System Level Risk Assessment (Heasley)
  • Concluding Remarks (Heasley)
  • Executive Session (Committee only)
  • Recap Meeting of Review Panel with Subsystem and
    component leads
  • Adjourn

11
What is the PSPS? The Conceptual Design
  • Jim Heasley

12
What is the PSPS?
  • The Published Science Products Subsystem of
    Pan-STARRS will
  • Provide access to the data products generated by
    the Pan-STARRS telescopes and data reduction
    pipelines
  • Provide a data archive for the Pan-STARRS data
    products
  • Provide adequate security to protect the
    integrity of the Pan-STARRS data products and to
    protect the operational systems from malicious
    attacks.

13
What is PS1 PSPS?
  • The PS1 telescope is being developed as a
    prototype for the full four-telescope (PS4)
    system.
  • Part of the PS1 PSPS mission is to better
    understand how the full set of requirements might
    be achieved in the future. We understand these
    requirements to include the entire process, from
    hardware design to responding to user queries.
  • The PS1 PSPS will provide access to the PS1 data
    to members of the PS1 Science Consortium

14
What is PSPS? A Refresher from the PS1 System
View
  • PS1 PSPS will not receive image files, which are
    retained by IPP
  • Three significant PS1 I/O threads
  • Responsible for managing the catalogs of digital
    data
  • Ingest of detections and initial celestial object
    data from IPP
  • Ingest of moving object data from MOPS
  • User queries of detection/object data records

15
What is PSPS? A Refresher from the PS1 PSPS View
  • Web Based Interface: the link with the human
  • Data Retrieval Layer: the gate-keeper of the
    data collections
  • PS1 data collection managers
  • Object Data Manager
  • Solar System Data Manager
  • Other (future/PS4) data collection managers
    e.g.,
  • Postage stamp cutouts
  • Metadata database (vice attributes managed in PS1
    ODM)
  • Cumulative sky image server
  • Filtered transient database (or other special
    clients)

ODM
SSDM
16
PSPS Components: Overview/Terminology
  • DRL: Data Retrieval Layer
  • Software clients, not humans, are PDCs
  • Connects to DMs
  • PDC: Published Data Client
  • WBI: Web Based Interface
  • External PDCs (non-PSPS)
  • DM: Data Manager (generic)
  • ODM: Object Data Manager
  • IPP detections / objects
  • SSDM: Solar System Data Manager
  • MOPS orbits / detections

17
Issues Raised at the PDR
  • Jim Heasley

18
Major Items Identified at PDR
  • The amount of prototyping conducted to date
    appears to be less than what will be necessary to
    mature the design from the preliminary to the
    critical design stages in the time interval
    envisioned for reaching CDR 6 months after PDR
  • The close coupling of the current design concept
    to Oracle could limit the flexibility for design
    alternatives should the design path that includes
    Oracle not prove feasible (for cost and schedule
    as well as technical reasons)
  • There appears to be some risk with remaining
    imprecise or incomplete requirements definition
    within the PSPS that could affect design
    evolution. This includes the IPP-PSPS interface
    that still requires further attention given the
    maturity level of the IPP, and its planned
    deployment schedule being significantly in
    advance of the PSPS.
  • The PSPS Lead should implement some additional
    management methods and tools as a mitigation
    strategy to minimize cost, schedule, and
    technical risk.

19
Other Items Identified at PDR
  • PSPS Cost commitment by the project
  • Managing Subsystem Software Development
  • Ingest Issues
  • Role of data crawlers
  • Operational Issues
  • Processing for tuning ingest correlation is not
    defined
  • Backup/Recovery plan is needed
  • Reproducing results in a constantly changing
    database
  • Disk I/O and failure rate needs more
    consideration
  • Does PSPS provide real time alerts?

20
Other Items Identified at PDR
  • User Interface Issues
  • Requirements for the user interface are poorly
    defined. Concern that this is an area
    traditionally under-scoped. Consider, as an
    alternative, adopting an existing web interface
    already in general use in another astronomical database
  • One reviewer suggested providing a web form
    interface to hide SQL details from the users
  • Concern that the number of users and processes
    scoped was unrealistic (on the low side).
  • Concern about the precision of the public
    interface, specifically, there may be
    requirements that flow back to the IPP to ensure
    that data are passed with sufficient precision to
    satisfy the public presentation. The reviewers
    felt that it is unclear who is responsible for
    defining the public interface requirements,
    formats, etc.

21
Response to Changes Since PDR
  • Jim Heasley

22
Response to Changes Since PDR
  • Project Wide Management Issues
  • The project has demonstrated a real commitment to
    the PSPS along with a reasonable budget.
  • The PSPS has been declared a priority with the
    PS1 project.
  • Good working relationship with the PM(?)
  • ODM Architecture Decisions
  • SAIC proposal was deemed too expensive
  • Established a working relationship with JHU which
    has been formalized in a collaborative agreement.
    This is a different architecture that involves
    some risk. (Their work to date has been
    voluntary.)
  • PI also involved in the risk assessment relating
    to the ODM decision.

23
The ODM Architecture
Ingest
Query
RIGHT BRAIN
LEFT BRAIN
Publish
  • Single multi-processor machine
  • High performance storage
  • Objects
  • Staging
  • Ingest detections
  • Clustered small processor machines
  • High capacity storage
  • Published detections

24
The ODM Architecture
Ingest
Query
RIGHT BRAIN
LEFT BRAIN

Publish
  • Single multi-processor machine
  • High performance storage
  • Objects
  • Staging
  • Ingest detections
  • Clustered small processor machines
  • High capacity storage
  • Published detections

25
The ODM Architecture
  • The SAIC plan called for a large SMP machine for
    ingest and an Oracle RAC for query processing.
  • We plan to keep the left-right brain split for
    ingest and query processing, but implement it on
    a different hardware architecture.
  • The new plan leverages the SDSS development,
    scaled out to a shared-nothing architecture.

26
Response to Changes Since PDR
  • The ODM design is too tightly tied to Oracle
  • Oracle is out, along with the SAIC ODM plan.
  • Oracle cost, even at a 90% discount, was still too
    expensive.
  • Spatial model developed by JHU replaces the
    Oracle Spatial Module. This is a proven approach
    for general spatial queries in SDSS, and a
    powerful method for performing detection-object
    correlations for data ingest into the ODM.
  • But, are we replacing a dependency on Oracle with
    one on Microsoft? Microsoft pricing for academic
    use is very reasonable. (At least they're not IBM!)

27
Response to Changes Since PDR
  • Sharpening/Refining Subsystem Requirements
  • Problem sizing: we (IPP and PSPS) have converged
    on a common sizing estimate for the PS1
    Astrometric/Photometric Survey (the major
    contributor to the ODM). These estimates are used
    for the numbers of objects and detections
    expected in the ODM.
  • WBI requirements were called vague (and
    justifiably so). However, rather than revise
    these I have chosen to follow the recommendation
    of the PDR panel and adopt existing client-side
    software to be supported through the API as well
    as developing a new menu driven interface.

28
Response to Changes Since PDR
  • Subsystem Management
  • Rapid Development (OK, I bought the book!)
  • Weekly telecons with the JHU team and frequent
    face-to-face visits to JHU. Telecons were held
    with SAIC as well. The big difference is JHU
    seems to be more willing to listen.
  • Choices on software for several subcomponents
  • ODM will leverage development lessons learned
    during SDSS
  • WBI will make use of the SDSS web interface,
    software already developed for MOPS, and a
    Gator-like menu driven interface for the ODM.
  • SSDM will be a clone of the MOPS internal
    database.

29
Response to Changes Since PDR
  • Inadequate Prototyping?
  • We have targeted prototyping efforts (in the time
    available) at those items that have been deemed
    to have the greatest technical challenge and
    highest risk.
  • WBI: a new Gator-like interface has been
    developed.
  • DRL: interfacing to the SDSS MyBestDR5 database
    to test the JDBC interface
  • ODM
  • The ODM prototype already exceeds the size of any
    existing astronomical database!
  • Data ingest testing, specifically the
    detection-object correlation problem. Final
    configuration may still require more
    experimentation and testing with real data.
  • Scale out capability of the SDSS database.

30
Response to Changes Since PDR
  • Other Issues Raised at the PDR
  • Backup/Recovery/Replication Plan
  • We have added requirements for backup and recovery,
    and these will be addressed under the SSDM and
    ODM sections
  • I/O performance and failures
  • To be discussed during the ODM hardware
    presentation
  • User interfaces
  • Changes in this area have already been noted.
    However, it isn't clear that protecting the users
    from direct interaction with SQL is such a good
    thing (we hear LSST wants to do it), as it may
    severely limit the richness of queries users
    can pose. How does one develop an SQL-educated
    user community if one hides it from them?

31
Response to Changes Since PDR
  • Other Issues Raised at the PDR (continued)
  • Reproducing old results in a database that is
    continually being updated.
  • We don't believe this is a big issue. The ODM
    logical schema contains the information necessary
    to do this.
  • We plan to save periodic snapshots of the
    objects table.
  • The role of PSPS in real time alerts
  • There is no problem. PSPS does NOT generate real
    time alerts.
  • Coordination of IPP and PSPS, given that the
    former is more mature
  • That may be, but there still needs to be
    collaborative adjustments as PS1 really is still
    a development system.

32
Response to Changes Since PDR
  • Other Issues Raised at the PDR (continued)
  • Software and hardware migration
  • PS1 has a 3.5 year mission lifetime. Lessons we
    learn from it will be applied to the design of
    the PSPS for PS4. We will be augmenting the
    hardware over the PS1 mission which will inform
    us how to best upgrade the system over time.
  • Role of data crawlers (and who codes/runs them)
  • With the creation of the PS1 Science Consortium
    it's now possible to call on a pool of
    experienced astronomers to develop data
    applications, either crawlers or external codes,
    that will create value-added information to be
    stored in or adjacent to the ODM. This is no
    longer considered to be a PSPS role. We (PSPS)
    have several mechanisms for providing access to
    these data products within the context of the
    PSPS in general and ODM in particular.

33
Top Level Requirements for the PSPS
  • Jim Heasley

34
PSPS Top Level Requirements
  • 3.3.0.1 The PSPS shall be able to ingest a total
    of 1.5×10^11 P2 detections, 8.3×10^10 cumulative
    sky detections, and 5.5×10^9 celestial objects
    together with their linkages.
  • Subsystems affected
  • Object Data Manager
  • Note: these estimates are now adopted
    project-wide from detailed estimates by the IPP team.

35
PSPS Top Level Requirements
  • 3.3.0.2 The PSPS shall be able to ingest the
    observational metadata for up to a total of
    1.1×10^10 observations.
  • Subsystems affected
  • Object Data Manager
  • Note: these estimates are now adopted
    project-wide from detailed estimates by the IPP team.

36
PSPS Top Level Requirements
  • 3.3.0.3 The PS1 PSPS shall be capable of
    archiving up to 100 Terabytes of data.
  • Subsystems affected
  • Object Data Manager
  • Note: these estimates are now adopted
    project-wide from detailed estimates by the IPP team
    and database overhead estimates by the PSPS team

37
PSPS Top Level Requirements
  • 3.3.0.4 The PSPS shall archive the PS1 data
    products.
  • Subsystems affected
  • Object Data Manager
  • Solar System Data Manager

38
PSPS Top Level Requirements
  • 3.3.0.5 The PSPS shall possess a computer
    security system to protect potentially vulnerable
    subsystems from malicious external actions.
  • Note: This includes providing layers of
    internet security to prevent unauthorized access
    to the PSPS data stores or other Pan-STARRS
    subsystems.
  • Subsystems affected
  • Object Data Manager
  • Solar System Data Manager
  • Data Retrieval Layer
  • Web-based Interface

39
PSPS Top Level Requirements
  • 3.3.0.6 The PSPS shall provide end-users access
    to detections of objects in the Pan-STARRS
    databases.
  • Subsystems affected
  • Web-based Interface
  • Data Retrieval Layer

40
PSPS Top Level Requirements
  • 3.3.0.7 The PSPS shall provide end-users access
    to the cumulative stationary sky images generated
    by Pan-STARRS.
  • Subsystems affected
  • Web-based Interface
  • During PS1 operations, all cumulative sky
    images will be held and served by the IPP. We
    plan to provide a web interface to allow user
    access to the images and feed the requests to the
    IPP for service.

41
PSPS Top Level Requirements
  • 3.3.0.8 The PSPS shall provide end-users with
    metadata required to interpret the observational
    legacy and processing history of the Pan-STARRS
    data products.
  • Subsystems affected
  • Web-based Interface
  • Object Data Manager

42
PSPS Top Level Requirements
  • 3.3.0.9 The PSPS shall provide end-users with
    Pan-STARRS detections of objects in the Solar
    System for which attributes can be assigned.
  • Subsystems affected
  • Web-based Interface
  • Solar System Data Manager

43
PSPS Top Level Requirements
  • 3.3.0.10 The PSPS shall provide end-users with
    derived Solar System objects deduced from
    Pan-STARRS attributed observations and
    observations from other sources.
  • Subsystems affected
  • Web-based Interface
  • Solar System Data Manager

44
PSPS Top Level Requirements
  • 3.3.0.11 The PSPS shall provide the capability
    for end-users to construct queries to search the
    Pan-STARRS data products over space and time to
    examine magnitudes, colors, and proper motions.
  • Subsystems affected
  • Web-based Interface
  • Object Data Manager

45
PSPS Top Level Requirements
  • 3.3.0.12 The PSPS shall provide a mass storage
    system with a reliability requirement of 99.9%
    (TBR).
  • Subsystems affected
  • Data Retrieval Layer
  • Object Data Manager
  • Solar System Data Manager

46
PSPS Top Level Requirements
  • 3.3.0.13 The PSPS baseline configuration should
    accommodate future additions of databases (i.e.,
    be expandable).
  • Subsystems affected
  • Web-based Interface
  • Data Retrieval Layer
  • Expansion of the databases will be needed to
    support value added products generated by the
    analysis of primary data by the PS1 Consortium
    partners.

47
The Pan-STARRS Data Store
  • Erik Small
  • Pan-STARRS OTIS Software Engineer

48
Data Store - Motivation
  • Driven by data exchange needs of PS subsystems
  • Storage for image data
  • Passing of metadata
  • Read-write and read-only access methods

49
Data Store - Design
  • Simple and standard
  • Read-write file system-level access on the data
    Producer side
  • Implemented by NFSv3
  • Read-only access on the data Consumer side
  • Implemented by HTTP
  • Essentially a web server with indexing and
    automatic file expiration

50
Data Store Block Diagram
51
Data Store - The PSPS Client
  • PSPS is a Data Store client
  • Pulls data from IPP / MOPS
  • Ingests into own database

52
Data Store - PS Subsystem Context
Data Stores
53
Data Store - PSPS Concerns
  • Data TTL expiry -- ensuring that no data is
    missed
  • Reporting of higher-level status back to
    originating subsystems
  • e.g. metadata was invalid

54
The Solar System Data Manager
  • Larry Denneau
  • MOPS Software Engineer

55
Programming Language
  • Perl with embedded SQL
  • Perl database interface (DBI) and the DBD::mysql
    driver
  • Object interface to data collections
  • HTML::Mason web interface

56
Database Design
  • MySQL Version 5 database
  • Multi-CPU Linux server
  • High normalization
  • Efficiency monitoring
  • Complete derived object reconstruction

57
Database Design
  • MOPS database will be replicated in PSPS
  • InnoDB replication
  • Serves double-duty as MOPS backup
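
As a rough sketch of how such a replica could be attached under MySQL 5, the statements below use the standard replication commands; the host, account, and binary-log coordinates are placeholders, and the real configuration is up to the MOPS/PSPS teams.

-- Run on the PSPS replica; all values here are placeholders
CHANGE MASTER TO
    MASTER_HOST     = 'mops-db.example.edu',
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = '********',
    MASTER_LOG_FILE = 'mysql-bin.000001',   -- from SHOW MASTER STATUS on the MOPS master
    MASTER_LOG_POS  = 4;
START SLAVE;
-- Check that the I/O and SQL replication threads are running
SHOW SLAVE STATUS;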

58
Replication
59
Storage Requirements
Data Collection   Rows   Row Size (bytes)   Total (GB)
Fields 536K 300 0.16
Detections 1B 150 161
Tracklets 134M 280 75
Tracklet Attrib 134M 280 75
Derived Objects 4.3M 250 2.16
Derived Object Attrib 4.3M 250 2.16
Orbits 51M 1044 54.1
History/Efficiency 47M 100 4.75
Derived Objects 4.3M 50 0.22
Attribution 43M 50 2.16
Precovery 43M 50 2.16
Identification 4M 50 0.22
SSM 11M 228 2.51
Total Raw 304
Overhead (indexes, backup) 5X 1522
Total 1827
60
Table Layout
61
Database SURVEYS
  • Pan-STARRS survey mode types
  • e.g. SS, OP, MD

62
Database FILTERS
  • Pan-STARRS telescope filters
  • g, r, i, z, y, w

63
Database SSM
  • MOPS synthetic Solar System object definitions
  • Cometary orbital elements

64
Database SHAPES
  • Triaxial ellipsoid definitions for synthetic
    objects

65
Database FIELDS
  • IPP field metadata

66
Database DETECTIONS
  • IPP high-confidence (HC) detections
  • Aggregated by field

67
Database TRACKLETS
  • Intra-night linkages of detections
  • TRACKLETS contains summary data
  • TRACKLET_ATTRIB table manages DB associations

68
Database ORBITS
  • All orbits used by MOPS derived objects
  • IODs
  • Differentially-corrected orbits
  • New orbits supersede old

69
Database DERIVEDOBJECTS
  • Associations of inter-night linkages of tracklets
  • DERIVEDOBJECT table contains summary data
  • DERIVEDOBJECT_ATTRIB manages DB associations

70
Database PRECOVERY
  • Precovery to-do list of recently-modified derived
    objects

71
Database RUNTIME
  • Running history of MOPS operations, including
    operator requests and automatically-generated
    requests

72
Table Layout (Efficiency)
73
Database HISTORY
  • Hierarchical object representation
  • All derived object modifications have an
    associated HISTORY row
  • Derivation
  • Attribution
  • Precovery
  • Identification
  • Removal

74
The Object Data Manager System
  • The Johns Hopkins Team

75
Outline
  • ODM Overview
  • Critical Requirements Driving Design
  • Work Completed
  • Detailed Design
  • Spatial Querying (AS)
  • ODM Prototype (MN)
  • Hardware/Scalability (JV)
  • How Design Meets Requirements
  • WBS and Schedule
  • Issues/Risks
  • AS = Alex, MN = Maria, JV = Jan

76
ODM Overview
  • The Object Data Manager will
  • Provide a scalable data archive for the
    Pan-STARRS data products
  • Provide query access to the data for Pan-STARRS
    users
  • Provide detailed usage tracking and logging

77
ODM Driving Requirements
  • Total size: 100 TB
  • 1.5×10^11 P2 detections
  • 8.3×10^10 P2 cumulative-sky (stack) detections
  • 5.5×10^9 celestial objects
  • Nominal daily rate (divide by 3.5×365)
  • P2 detections: 120 Million/day
  • Stack detections: 65 Million/day
  • Objects: 4.3 Million/day
  • Cross-Match requirement: 120 Million / 12 hrs ≈
    2800/s
  • DB size requirement
  • 25 TB/yr
  • 100 TB by end of PS1 (3.5 yrs)
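
For reference, the daily and per-second rates quoted above follow directly from the totals (a quick check, assuming the 3.5-year survey):
  1.5×10^11 P2 detections / (3.5 × 365 days) ≈ 1.2×10^8 ≈ 120 Million/day
  8.3×10^10 stack detections / 1278 days ≈ 6.5×10^7 ≈ 65 Million/day
  5.5×10^9 objects / 1278 days ≈ 4.3×10^6 ≈ 4.3 Million/day
  1.2×10^8 detections / (12 hr × 3600 s/hr) ≈ 2800 detections/s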

78
Work completed so far
  • Built a prototype
  • Scoped and built prototype hardware
  • Generated simulated data
  • 300M SDSS DR5 objects, 1.5B Galactic plane
    objects
  • Initial load done: created a 15 TB DB of simulated
    data
  • Largest astronomical DB in existence today
  • Partitioned the data correctly using Zones
    algorithm
  • Able to run simple queries on distributed DB
  • Demonstrated critical steps of incremental
    loading
  • It is fast enough
  • Cross-match: > 60k detections/sec
  • Required rate is 3k/sec

79
Detailed Design
  • Reuse SDSS software as much as possible
  • Data Transformation Layer (DX): Interface to IPP
  • Data Loading Pipeline (DLP)
  • Data Storage (DS)
  • Schema and Test Queries
  • Database Management System
  • Scalable Data Architecture
  • Hardware
  • Query Manager (QM; CasJobs for prototype)

80
High-Level Organization
81
Detailed Design
  • Reuse SDSS software as much as possible
  • Data Transformation Layer (DX): Interface to IPP
  • Data Loading Pipeline (DLP)
  • Data Storage (DS)
  • Schema and Test Queries
  • Database Management System
  • Scalable Data Architecture
  • Hardware
  • Query Manager (QM; CasJobs for prototype)

82
Data Transformation Layer (DX)
  • Based on SDSS sqlFits2CSV package
  • LINUX/C application
  • FITS reader driven off header files
  • Convert IPP FITS files to
  • ASCII CSV format for ingest (initially)
  • SQL Server native binary later (3x faster)
  • Follow the batch and ingest verification
    procedure described in the ICD
  • 4-step batch verification
  • Notification and handling of broken publication
    cycle
  • Deposit CSV or Binary input files in directory
    structure
  • Create ready file in each batch directory
  • Stage input data on LINUX side as it comes in
    from IPP

83
DX Subtasks
  • Initialization Job: FITS schema, FITS reader, CSV Converter, CSV Writer
  • Batch Ingest: interface with IPP, naming convention, uncompress batch, read batch, verify batch
  • Batch Verification: verify manifest, verify FITS integrity, verify FITS content, verify FITS data, handle broken cycle
  • Batch Conversion: CSV Converter, Binary Converter, batch_ready, interface with DLP
84
DX-DLP Interface
  • Directory structure on staging FS (LINUX)
  • Separate directory for each JobID_BatchID
  • Contains a batch_ready manifest file
  • Name, rows and destination table of each file
  • Contains one file per destination table in ODM
  • Objects, Detections, other tables
  • Creation of the batch_ready file is the signal to the
    loader to ingest the batch
  • Batch size and frequency of ingest cycle TBD
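
The loader's own stored procedures handle the actual ingest, but as a minimal sketch of the underlying operation, a single CSV file named in a batch_ready manifest could be bulk-loaded into a staging table of a task DB roughly as follows (the path, database, and table names are illustrative only):

-- Illustrative only: path, database, and table names are placeholders
BULK INSERT TaskDB.dbo.P2PsfFits_stage
FROM '\\loadsupport\staging\Job123_Batch456\P2PsfFits.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK             -- allows a minimally logged bulk load
);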

85
Detailed Design
  • Reuse SDSS software as much as possible
  • Data Transformation Layer (DX): Interface to IPP
  • Data Loading Pipeline (DLP)
  • Data Storage (DS)
  • Schema and Test Queries
  • Database Management System
  • Scalable Data Architecture
  • Hardware
  • Query Manager (QM; CasJobs for prototype)

86
Data Loading Pipeline (DLP)
  • sqlLoader: the SDSS data loading pipeline
  • Pseudo-automated workflow system
  • Loads, validates and publishes data
  • From CSV to SQL tables
  • Maintains a log of every step of loading
  • Managed from Load Monitor Web interface
  • Has been used to load every SDSS data release
  • EDR, DR1-6, 15 TB of data altogether
  • Most of it (since DR2) loaded incrementally
  • Kept many data errors from getting into database
  • Duplicate ObjIDs (symptom of other problems)
  • Data corruption (CSV format invaluable in
    catching this)

87
sqlLoader Design
  • Existing functionality
  • Shown for SDSS version
  • Workflow, distributed loading, Load Monitor
  • New functionality
  • Schema changes
  • Workflow changes
  • Incremental loading
  • Cross-match and partitioning

88
sqlLoader Workflow
  • Distributed design achieved with linked servers
    and SQL Server Agent
  • LOAD stage can be done in parallel by loading
    into temporary task databases
  • PUBLISH stage writes from task DBs to final DB
  • FINISH stage creates indices and auxiliary
    (derived) tables
  • Loading pipeline is a system of VB and SQL
    scripts, stored procedures and functions

89
Load Monitor Tasks Page
90
Load Monitor Active Tasks
91
Load Monitor Statistics Page
92
Load Monitor New Task(s)
93
Data Validation
  • Tests for data integrity and consistency
  • Scrubs data and finds problems in upstream
    pipelines
  • Most of the validation can be performed within
    the individual task DB (in parallel)
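
One of the checks mentioned earlier, catching duplicate ObjIDs, reduces to a simple query that can run inside each task DB; the table and column names follow the PS1 schema used elsewhere in this review, but the actual validation steps are those of sqlLoader.

-- Report any object identifier that appears more than once in a task DB
SELECT objID, COUNT(*) AS nDuplicates
FROM Objects
GROUP BY objID
HAVING COUNT(*) > 1;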

94
Distributed Loading
(Diagram: distributed loading layout. A master LoadAdmin server and slave LoadSupport servers read Samba-mounted CSV/Binary files into separate task DBs under the Load Monitor; the slaves share a view of the master schema, and the Publish and Finish steps write into the final database.)
95
Schema Changes
  • Schema in task and publish DBs is driven off a
    list of schema DDL files to execute (xschema.txt)
  • Requires replacing DDL files in schema/sql
    directory and updating xschema.txt with their
    names
  • PS1 schema DDL files have already been built
  • Index definitions have also been created
  • Metadata tables will be automatically generated
    using metadata scripts already in the loader
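
The PS1 DDL files themselves are not reproduced here; purely as an illustration of what one xschema.txt entry contains, a fragment might look like the following, using columns (objID, ra, dec, Cartesian unit vector, zoneID) that appear elsewhere in this design.

-- Hypothetical fragment of a PS1 schema DDL file (the real DDL already exists)
CREATE TABLE Objects (
    objID   bigint NOT NULL,   -- zone-encoded object identifier
    ra      float  NOT NULL,   -- degrees
    dec     float  NOT NULL,   -- degrees
    cx      float  NOT NULL,   -- Cartesian unit vector used for spatial math
    cy      float  NOT NULL,
    cz      float  NOT NULL,
    zoneID  int    NOT NULL,   -- declination zone
    CONSTRAINT pk_Objects PRIMARY KEY (objID)
)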

96
Workflow Changes
  • Cross-Match and Partition steps will be added to
    the workflow
  • Cross-match will match detections to objects
  • Partition will horizontally partition data, move
    it to slice servers, and build DPVs on the main server
(Workflow diagram: LOAD = Export, Check CSVs, Create Task DBs, Build SQL Schema, Validate, XMatch; followed by PUBLISH and Partition)
97
Matching Detections with Objects
  • Algorithm described fully in prototype section
  • Stored procedures to cross-match detections will
    be part of the LOAD stage in loader pipeline
  • Vertical partition of Objects table kept on load
    server for matching with detections
  • Zones cross-match algorithm used to do 1 and 2
    matches
  • Detections with no matches saved in Orphans table

98
XMatch and Partition Data Flow
99
Detailed Design
  • Reuse SDSS software as much as possible
  • Data Transformation Layer (DX): Interface to IPP
  • Data Loading Pipeline (DLP)
  • Data Storage (DS)
  • Schema and Test Queries
  • Database Management System
  • Scalable Data Architecture
  • Hardware
  • Query Manager (QM; CasJobs for prototype)

100
Data Storage Schema
101
PS1 Table Sizes Spreadsheet
102
PS1 Table Sizes - All Servers

Table Year 1 Year 2 Year 3 Year 3.5
Objects 4.63 4.63 4.61 4.59
StackPsfFits 5.08 10.16 15.20 17.76
StackToObj 1.84 3.68 5.56 6.46
StackModelFits 1.16 2.32 3.40 3.96
P2PsfFits 7.88 15.76 23.60 27.60
P2ToObj 2.65 5.31 8.00 9.35
Other Tables 3.41 6.94 10.52 12.67
Indexes (20%) 5.33 9.76 14.18 16.48
Total 31.98 58.56 85.07 98.87
Sizes are in TB
103
Data Storage Test Queries
  • Drawn from several sources
  • Initial set of SDSS 20 queries
  • SDSS SkyServer Sample Queries
  • Queries from PS scientists (Monet, Howell,
    Kaiser, Heasley)
  • Two objectives
  • Find potential holes/issues in schema
  • Serve as test queries
  • Test DBMS integrity
  • Test DBMS performance
  • Loaded into CasJobs (Query Manager) as sample
    queries for prototype

104
Data Storage DBMS
  • Microsoft SQL Server 2005
  • Relational DBMS with excellent query optimizer
  • Plus
  • Spherical/HTM (C# library + SQL glue)
  • Spatial index (Hierarchical Triangular Mesh)
  • Zones (SQL library)
  • Alternate spatial decomposition with dec zones
  • Many stored procedures and functions
  • From coordinate conversions to neighbor search
    functions
  • Self-extracting documentation (metadata) and
    diagnostics
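
In the SDSS database these helpers are exposed as table-valued functions such as fGetNearbyObjEq(ra, dec, radius in arcmin); assuming the same convention carries over to the PS1 schema, a cone search would look roughly like the sketch below.

-- Assumes an SDSS-style neighbor search function is provided with the PS1 schema
SELECT o.objID, o.ra, o.dec, n.distance
FROM dbo.fGetNearbyObjEq(180.0, 1.25, 3.0) AS n   -- ra (deg), dec (deg), radius (arcmin)
JOIN Objects AS o ON o.objID = n.objID
ORDER BY n.distance;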

105
Documentation and Diagnostics
106
Data Storage Scalable Architecture
  • Monolithic database design (a la SDSS) will not
    do it
  • SQL Server does not have a cluster implementation
  • Do it by hand
  • Partitions vs Slices
  • Partitions are file-groups on the same server
  • Parallelize disk accesses on the same machine
  • Slices are data partitions on separate servers
  • We use both!
  • Additional slices can be added for scale-out
  • For PS1, use SQL Server Distributed Partitioned
    Views (DPVs)

107
Distributed Partitioned Views
  • Difference between DPVs and file-group
    partitioning
  • FG on same database
  • DPVs on separate DBs
  • FGs are for scale-up
  • DPVs are for scale-out
  • Main server has a view of a partitioned table
    that includes remote partitions (we call them
    slices to distinguish them from FG partitions)
  • Accomplished with SQL Server's linked server
    technology
  • NOT truly parallel, though
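
A minimal sketch of how such a view is declared is shown below; the server and database names are placeholders. Each remote member table carries a CHECK constraint on its objID range so the optimizer can prune slices that cannot contain the requested rows.

-- Minimal DPV sketch; server and database names are placeholders.
-- Each Detections_pN member table carries CHECK (objID BETWEEN <lo_N> AND <hi_N>),
-- so queries constrained on objID touch only the relevant slice(s).
CREATE VIEW Detections AS
     SELECT * FROM Slice01.PS1_p1.dbo.Detections_p1
UNION ALL
     SELECT * FROM Slice02.PS1_p2.dbo.Detections_p2
UNION ALL
     SELECT * FROM Slice03.PS1_p3.dbo.Detections_p3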

108
Scalable Data Architecture
  • Shared-nothing architecture
  • Detections split across cluster
  • Objects replicated on Head and Slice DBs
  • DPVs of Detections tables on the Headnode DB
  • Queries on Objects stay on head node
  • Queries on detections use only local data on
    slices

109
Hardware - Prototype
(Diagram: prototype hardware configuration. Server naming convention: LX = Linux, L = Load server, S/Head = DB server, M = MyDB server, W = Web server; PS0x = 4-core, PS1x = 8-core. Servers shown: LX (PS01), W (PS02), S2 (PS03), S3 (PS04), L2/M (PS05), Head (PS11), S1 (PS12), L1 (PS13). Storage racks of 10 x 13 x 750 GB and 3 x 12 x 500 GB drives; total space per function of 0-39 TB in RAID10 or RAID5 configurations; disk/rack configs 12D/4W and 14D/3.5W.)
110
Hardware - PS1
  • Ping-pong configuration to maintain high
    availability and query performance
  • 2 copies of each slice and of the main (head) node
    database on fast hardware (hot spares)
  • 3rd spare copy on slow hardware (can be just
    disk)
  • Updates/ingest on the offline copy, then switch
    copies when ingest and replication are finished
  • Synchronize the second copy while the first copy is
    online
  • Both copies live when no ingest
  • 3x basic config. for PS1

111
Detailed Design
  • Reuse SDSS software as much as possible
  • Data Transformation Layer (DX): Interface to IPP
  • Data Loading Pipeline (DLP)
  • Data Storage (DS)
  • Schema and Test Queries
  • Database Management System
  • Scalable Data Architecture
  • Hardware
  • Query Manager (QM; CasJobs for prototype)

112
Query Manager
  • Based on SDSS CasJobs
  • Configure to work with distributed database, DPVs
  • Direct links (contexts) to slices can be added
    later if necessary
  • Segregates quick queries from long ones
  • Saves query results server-side in MyDB
  • Gives users a powerful query workbench
  • Can be scaled out to meet any query load
  • PS1 Sample Queries available to users
  • PS1 Prototype QM demo
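
As an example of the kind of preloaded sample query this implies, the sketch below joins the Objects, P2ToObj, and P2PsfFits tables named in the PS1 schema; the link-table column names are assumptions, so treat it as illustrative only.

-- Illustrative sample query: all P2 detections of objects in a small patch of sky
-- (link-table column names are assumed, not taken from the actual schema)
SELECT o.objID, o.ra, o.dec, d.detectID
FROM Objects   AS o
JOIN P2ToObj   AS l ON l.objID    = o.objID
JOIN P2PsfFits AS d ON d.detectID = l.detectID
WHERE o.ra  BETWEEN 180.0 AND 180.1
  AND o.dec BETWEEN  -1.0 AND  -0.9;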

113
ODM Prototype Components
  • Data Loading Pipeline
  • Data Storage
  • CasJobs
  • Query Manager (QM)
  • Web Based Interface (WBI)
  • Testing

114
Spatial Queries (Alex)
115
Spatial Searches in the ODM
116
Common Spatial Questions
  • Points in region queries
  • Find all objects in this region
  • Find all good objects (not in masked areas)
  • Is this point in any of the regions
  • Region in region
  • Find regions near this region and their area
  • Find all objects with error boxes intersecting
    region
  • What is the common part of these regions
  • Various statistical operations
  • Find the object counts over a given region list
  • Cross-match these two catalogs in the region

117
Sky Coordinates of Points
  • Many different coordinate systems
  • Equatorial, Galactic, Ecliptic, Supergalactic
  • Longitude-latitude constraints
  • Searches are often in a mix of different coordinate
    systems
  • gb > 40 and dec between 10 and 20
  • Problem: coordinate singularities,
    transformations
  • How can one describe constraints in an easy,
    uniform fashion?
  • How can one perform fast database queries in an
    easy fashion?
  • Fast indexes
  • Easy, simple query expressions

118
Describing Regions
  • Spacetime metadata for the VO (Arnold Rots)
  • Includes definitions of
  • Constraint: a single small or great circle
  • Convex: intersection of constraints
  • Region: union of convexes
  • Support both angles and Cartesian descriptions
  • Constructors for
  • CIRCLE, RECTANGLE, POLYGON, CONVEX HULL
  • Boolean algebra (INTERSECTION, UNION, DIFF)
  • Proper language to describe the abstract regions
  • Similar to GIS, but much better suited for
    astronomy
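
Tying this to the SQL layer, a region described in this grammar can be handed to the HTM cover function used later in this talk; the region-string syntax below follows the SDSS convention and is meant only as an example.

-- Example only: the region string grammar follows the SDSS convention
DECLARE @area varchar(max)
SET @area = 'CIRCLE J2000 180.0 0.0 10.0'    -- ra, dec in degrees, radius in arcmin
SELECT * FROM dbo.fHtmCover(@area)           -- returns the ranges of covering trixels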

119
Things Can Get Complex
120
We Do Spatial 3 Ways
  • Hierarchical Triangular Mesh (extension to SQL)
  • Uses table valued functions
  • Acts as a new spatial access method
  • Zones: fits SQL well
  • Surprisingly simple and good
  • 3D Constraints: a novel idea
  • Algebra on regions; can be implemented in pure SQL

121
PS1 Footprint
  • Using the projection cell definitions as centers
    for tessellation (T. Budavari)

122
CrossMatch Zone Approach
  • Divide space into declination zones
  • Objects ordered by zoneid, ra (on the sphere we
    need a wrap-around margin)
  • Point search: look in neighboring zones within an
    (ra ± Δα) bounding box
  • All inside the relational engine
  • Avoids impedance mismatch
  • Can batch comparisons
  • Automatically parallel
  • Details in Maria's thesis

(Diagram: zone cross-match geometry, showing the search radius r, zoneMax, and the ra ± Δα search window.)
123
Indexing Using Quadtrees
  • Cover the sky with hierarchical pixels
  • COBE: start with a cube
  • Hierarchical Triangular Mesh (HTM) uses trixels
  • Samet, Fekete
  • Start with an octahedron, and split each triangle
    into 4 children, down to 20 levels deep
  • Smallest triangles are 0.3 arcsec
  • Each trixel has a unique htmID

124
Space-Filling Curve
(Figure: space-filling curve over HTM trixels, with ranges such as [0.12, 0.13) subdividing into [0.120, 0.121), [0.121, 0.122), [0.122, 0.123), [0.123, 0.130).)
Triangles correspond to ranges. All points inside
the triangle are inside the range.
125
SQL HTM Extension
  • Every object has a 20-deep htmID (44 bits)
  • Clustered index on htmID
  • Table-valued functions for spatial joins
  • Given a region definition, the routine returns up to
    10 ranges of covering triangles
  • A spatial query is mapped to 10 range queries
  • Current implementation rewritten in C#
  • Excellent performance, little calling overhead
  • Three layers
  • General geometry library
  • HTM kernel
  • IO (parsing, SQL interface)

126
Writing Spatial SQL
-- region description is contained by @area
DECLARE @cover TABLE (htmStart bigint, htmEnd bigint)
INSERT @cover SELECT * FROM dbo.fHtmCover(@area)
--
DECLARE @region TABLE (convexId bigint, x float, y float, z float, c float)
INSERT @region SELECT * FROM dbo.fGetHalfSpaces(@area)
--
SELECT o.ra, o.dec, 1 AS flag, o.objid
FROM ( SELECT objID AS objid, cx, cy, cz, ra, dec
       FROM Objects q
       JOIN @cover AS c
         ON q.htmID BETWEEN c.htmStart AND c.htmEnd ) AS o
WHERE NOT EXISTS
    ( SELECT p.convexId FROM @region AS p
      WHERE (o.cx*p.x + o.cy*p.y + o.cz*p.z < p.c)
      GROUP BY p.convexId )
127
Status
  • All three libraries extensively tested
  • Zones used for Maria's thesis, plus various
    papers
  • New HTM code in production use since July on SDSS
  • Same code also used by STScI HLA, Galex
  • Systematic regression tests developed
  • Footprints computed for all major surveys
  • Complex mask computations done on SDSS
  • Loading zones used for bulk crossmatch
  • Ad hoc queries use HTM-based search functions
  • Excellent performance

128
Prototype (Maria)
129
PS1 PSPS Object Data Manager Design
  • PSPS Critical Design Review
  • November 5-6, 2007
  • IfA

130
Detail Design
  • General Concepts
  • Distributed Database architecture
  • Ingest Workflow
  • Prototype

131
Zones
  • Zones (spatial partitioning and indexing
    algorithm)
  • Partition and bin the data into declination zones
  • ZoneID = floor((dec + 90.0) / zoneHeight)
  • A few tricks are required to handle spherical geometry
  • Place the data close on disk
  • Clustered index on ZoneID and RA
  • Fully implemented in SQL
  • Efficient
  • Nearby searches
  • Cross-Match (especially)
  • Fundamental role in addressing the critical
    requirements
  • Data volume management
  • Association Speed
  • Spatial capabilities

132
Zoned Table
ObjID ZoneID RA Dec CX CY CZ
1 0 0.0 -90.0
2 20250 180.0 0.0
3 20250 181.0 0.0
4 40500 360.0 90.0
ZoneID = floor((dec + 90.0) / zoneHeight)
ZoneHeight = 8 arcsec in this example
(A SQL sketch of this zone computation follows below.)
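
A minimal sketch of the same computation wrapped as a SQL function; the function name is hypothetical, and only the formula above is taken from the design.

-- Hypothetical helper; the formula matches the ZoneID definition above
CREATE FUNCTION dbo.fGetZoneID (@dec float, @zoneHeight float)
RETURNS int
AS
BEGIN
    -- ZoneID = floor((dec + 90.0) / zoneHeight)
    RETURN CAST(FLOOR((@dec + 90.0) / @zoneHeight) AS int)
END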
133
SQL CrossNeighbors
SELECT *
FROM prObj1 z1
JOIN zoneZone ZZ
  ON ZZ.zoneID1 = z1.zoneID
JOIN prObj2 z2
  ON ZZ.zoneID2 = z2.zoneID
WHERE
    z2.ra BETWEEN z1.ra - ZZ.alpha AND z1.ra + ZZ.alpha
AND z2.dec BETWEEN z1.dec - @r AND z1.dec + @r
AND (z1.cx*z2.cx + z1.cy*z2.cy + z1.cz*z2.cz) >
    cos(radians(@r))

134
Good CPU Usage
135
Partitions
  • SQL Server 2005 introduces technology to handle
    tables which are partitioned across different
    disk volumes and managed by a single server.
  • Partitioning makes management and access of
    large tables and indexes more efficient
  • Enables parallel I/O
  • Reduces the amount of data that needs to be
    accessed
  • Related tables can be aligned and collocated in
    the same place, speeding up JOINs

136
Partitions
  • 2 key elements
  • Partitioning function
  • Specifies how the table or index is partitioned
  • Partitioning schemes
  • Using a partitioning function, the schema
    specifies the placement of the partitions on file
    groups
  • Data can be managed very efficiently using
    Partition Switching
  • Add a table as a partition to an existing table
  • Switch a partition from one partitioned table to
    another
  • Reassign a partition to form a single table
  • Main requirement
  • The table must be constrained on the partitioning
    column
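
A minimal SQL Server 2005 sketch of a partition function and scheme is shown below; the boundary values and filegroup names are placeholders, since the actual boundaries are the objID ranges described on the next slide.

-- Placeholder boundary values and filegroup names
CREATE PARTITION FUNCTION pfObjID (bigint)
    AS RANGE RIGHT FOR VALUES (100000000000000000, 200000000000000000)
CREATE PARTITION SCHEME psObjID
    AS PARTITION pfObjID TO (fg1, fg2, fg3)
-- A table created ON psObjID(objID) is then constrained on the partitioning
-- column, and whole partitions can be moved with ALTER TABLE ... SWITCH.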

137
Partitions
  • For the PS1 design,
  • Partitions mean File Group Partitions
  • Tables are partitioned into ranges of ObjectID,
    which correspond to declination ranges.
  • ObjectID boundaries are selected so that each
    partition has a similar number of objects.

138
Distributed Partitioned Views
  • Tables participating in the Distributed
    Partitioned View (DPV) reside in different
    databases, which can be on different instances or
    different (linked) servers

139
Concept Slices
  • In the PS1 design, the bigger tables will be
    partitioned across servers
  • To avoid confusion with the File Group
    Partitioning, we call them Slices
  • Data is glued together using Distributed
    Partitioned Views
  • The ODM will manage slices. Using slices improves
    system scalability.
  • For PS1 design, tables are sliced into ranges of
    ObjectID, which correspond to broad declination
    ranges. Each slice is subdivided into partitions
    that correspond to narrower declination ranges.
  • ObjectID boundaries are selected so that each
    slice has a similar number of objects.

140
Detail Design Outline
  • General Concepts
  • Distributed Database architecture
  • Ingest Workflow
  • Prototype

141
PS1 Distributed DB system
(Diagram: PS1 distributed database system. Load support servers (Load Support 1..n) hold objZoneIndx, Orphans, Detections, and LnkToObj staging tables and are coordinated by a LoadAdmin server with the PartitionsMap; linked servers connect them to the slice databases P1..Pm (Objects_p, LnkToObj_p, Detections_p, Meta) and to the main PS1 database (PartitionsMap, Objects, LnkToObj, Meta, and the Detections partitioned view), which is accessed through the Query Manager (QM) and the Web Based Interface (WBI).)
142
Design Decisions: ObjID
  • Objects have their positional information encoded
    in their objID
  • fGetPanObjID (ra, dec, zoneH)
  • ZoneID is the most significant part of the ID
  • It gives scalability, performance, and spatial
    functionality
  • Object tables are range partitioned according to
    their object ID

143
ObjectID Clusters Data Spatially
Dec = 16.71611583°, ZH = 0.008333°
ZID = (90 - Dec) / ZH = 08794.0661
ObjectID = 087941012871550661
RA = 101.287155°
ObjectID is unique when objects are separated by
> 0.0043 arcsec
144
Design Decisions: DetectID
  • Detections have their positional information
    encoded in the detection identifier
  • fGetDetectID (dec, observationID, runningID,
    zoneH)
  • Primary key (objID, detectionID), to align
    detections with objects within partitions
  • Provides efficient access to all detections
    associated to one object
  • Provides efficient access to all detections of
    nearby objects

145
DetectionID Clusters Data in Zones
Dec = 16.71611583°, ZH = 0.008333°
ZID = (90 - Dec) / ZH = 08794.0661
DetectID = 0879410500001234567
ObservationID = 1050000
Running ID = 1234567
146
ODM Capacity
  • 5.3.1.3 The PS1 ODM shall be able to ingest into
    the ODM a total of
  • 1.5×10^11 P2 detections
  • 8.3×10^10 cumulative sky (stack) detections
  • 5.5×10^9 celestial objects
  • together with their linkages.

147
PS1 Table Sizes - Monolithic

Table Year 1 Year 2 Year 3 Year 3.5
Objects 2.31 2.31 2.31 2.31
StackPsfFits 5.07 10.16 15.20 17.74
StackToObj 0.92 1.84 2.76 3.22
StackModelFits 1.15 2.29 3.44 4.01
P2PsfFits 7.87 15.74 23.61 27.54
P2ToObj 1.33 2.67 4.00 4.67
Other Tables 3.19 6.03 8.87 10.29
Indexes (20%) 4.37 8.21 12.04 13.96
Total 26.21 49.24 72.23 83.74
Sizes are in TB
148
What goes into the main Server
(Diagram: the main PS1 database holds the PartitionsMap and the full Objects, LnkToObj, and Meta tables; the slice databases P1..Pm are reached through linked servers.)
149
What goes into slices
(Diagram: each slice database P1..Pm holds its own Objects_p, LnkToObj_p, and Detections_p tables plus copies of PartitionsMap and Meta; linked servers connect the slices to the main PS1 database, which carries PartitionsMap, Objects, LnkToObj, and Meta.)
150
What goes into slices
(Diagram: same as the previous slide.)
151
Duplication of Objects & LnkToObj
  • Objects are distributed across slices
  • Objects, P2ToObj, and StackToObj are duplicated
    in the slices to parallelize inserts and
    updates
  • Detections belong in their object's slice
  • Orphans belong to the slice where their position
    would allocate them
  • Orphans near slice boundaries will need special
    treatment
  • Objects keep their original object identifier
  • Even though positional refinement might change
    their zoneID and therefore the most significant
    part of their identifier

152
Glue Distributed Views
(Diagram: the Detections distributed partitioned view defined in the main PS1 database unions the Detections_p1..Detections_pm tables held on the linked slice servers P1..Pm.)
153
Partitioning in Main Server
  • Main server is partitioned (objects) and
    collocated (lnkToObj) by objid
  • Slices are partitioned (objects) and collocated
    (lnkToObj) by objid

154
PS1 Table Sizes - Main Server

Table Year 1 Year 2 Year 3 Year 3.5
Objects 2.31 2.31 2.31 2.31
StackPsfFits ? ? ? ?
StackToObj 0.92 1.84 2.76 3.22
StackModelFits ? ? ? ?
P2PsfFits ? ? ? ?
P2ToObj 1.33 2.67 4.00 4.67
Other Tables 0.41 0.46 0.52 0.55
Indexes (20%) 0.99 1.46 1.92 2.15
Total 5.96 8.74 11.51 12.90
Sizes are in TB
155
PS1 Table Sizes - Each Slice
m = 4 (Year 1), m = 8 (Year 2), m = 10 (Year 3), m = 12 (Year 3.5)
Table Year 1 Year 2 Year 3 Year 3.5
Objects 0.58 0.29 0.23 0.19
StackPsfFits 1.27 1.27 1.52 1.48
StackToObj 0.23 0.23 0.28 0.27
StackModelFits 0.29 0.29 0.34 0.33
P2PsfFits 1.97 1.97 2.36 2.30
P2ToObj 0.33 0.33 0.40 0.39
Other Tables 0.75 0.81 1.00 1.01
Indexes (20%) 1.08 1.04 1.23 1.19
Total 6.50 6.23 7.36 7.16
Sizes are in TB
156
PS1 Table Sizes - All Servers

Table Year 1 Year 2 Year 3 Year 3.5
Objects 4.63 4.63 4.61 4.59
StackPsfFits 5.08 10.16 15.20 17.76
StackToObj 1.84 3.68 5.56 6.46
StackModelFits 1.16 2.32 3.40 3.96
P2PsfFits 7.88 15.76 23.60 27.60
P2ToObj 2.65 5.31 8.00 9.35
Other Tables 3.41 6.94 10.52 12.67
Indexes (20%) 5.33 9.76 14.18 16.48
Total 31.98 58.56 85.07 98.87
Sizes are in TB
157
Detail Design Outline
  • General Concepts
  • Distributed Database architecture
  • Ingest Workflow
  • Prototype

158
PS1 Distributed DB system
(Diagram: the PS1 distributed database system, as shown earlier: the LoadAdmin and Load Support servers feed the slice databases P1..Pm and the main PS1 database, whose Detections partitioned view is queried through the Query Manager (QM) and the Web Based Interface (WBI).)
159
Insert & Update
  • SQL Insert and Update are expensive operations
    due to logging and re-indexing
  • In the PS1 design, Insert and Update have been
    re-factored into the sequence
  • Merge, Constrain, Switch Partition (see the sketch below)
  • Frequency
  • f1: daily
  • f2: at least monthly
  • f3: TBD (likely to be every 6 months)
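
A minimal sketch of that sequence for one detections partition; table names, the partition number, and the objID range literals are placeholders, and the real procedure must also rebuild matching indexes and an empty switch-out table before the final step.

-- 1. Merge: rebuild one partition's worth of rows together with the new batch
SELECT *
INTO   Detections_p7_new
FROM ( SELECT * FROM Detections
       WHERE objID >= 700000000000000000 AND objID < 800000000000000000
       UNION ALL
       SELECT * FROM Detections_batch ) AS merged
-- 2. Constrain: prove the rebuilt table fits the target partition's objID range
ALTER TABLE Detections_p7_new WITH CHECK
    ADD CONSTRAINT ck_p7_objID
    CHECK (objID >= 700000000000000000 AND objID < 800000000000000000)
-- 3. Switch: move the old partition out and the rebuilt table in
--    (both switches are metadata-only, so the database stays online;
--     Detections_p7_out is an empty table with the same structure)
ALTER TABLE Detections SWITCH PARTITION 7 TO Detections_p7_out
ALTER TABLE Detections_p7_new SWITCH TO Detections PARTITION 7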

160
Ingest Workflow
(Diagram: CSV batches feeding the zoned ObjectsZ table.)
161
Ingest @ frequency f1
(Diagram: the LOADER sends ObjectsZ, P2ToObj, P2PsfFits, Metadata, and Orphans to SLICE_1 and the MAIN server.)
162
Updates @ frequency f2
163
Updates @ frequency f2
(Diagram: Objects and Metadata updates flowing between the LOADER, SLICE_1, and the MAIN server.)
164
Snapshots @ frequency f3
(Diagram: a Snapshot and Metadata on the MAIN server.)
165
Batch Update of a Partition
166
Scaling-out
  • Apply Ping-Pong strategy to satisfy query
    performance during ingest

2 × (1 main + m slices)
(Diagram: two copies of the main PS1 database and of every slice P1..Pm, spread over the linked servers; each copy has its own PartitionsMap, Objects, LnkToObj, Meta, and Detections partitioned view, and the Query Manager is pointed at the online copy.)