1
AMGA Metadata Catalogue Service

Academia Sinica Grids Clouds, Jingya You
jingya.you@twgrid.org, Aug 4th, 2009
2
Contents
  • Metadata services: background and possible uses
    in a grid environment
  • Architecture and features of the gLite Metadata
    Service
  • New AMGA features:
  • import of existing DBs
  • native SQL-92 support
  • multi-threaded server
  • WS-DAIR interface

3
Why Grid Needs Metadata
  • Grids make it possible to store millions of files
    spread over several storage sites.
  • Users and applications need an efficient
    mechanism
  • to describe files
  • to locate files based on their contents
  • This is achieved by
  • associating descriptive attributes with files
    (metadata is data about data)
  • answering user queries against the associated
    information

4
Basic Metadata Concept
  • Entries
  • Representations of the real-world entities that
    we attach metadata to in order to describe them
  • Attributes
  • Type: the type (int, float, string, ...)
  • Name/Key: the name of the attribute
  • Value: the value of an entry's attribute
  • Schema: a set of attributes
  • Collection: a set of entries associated with a
    schema
  • Metadata: a list of attributes (including their
    values) associated with entries

5
Example: Movie Trailer
  • Movie trailer files (entries) are saved on Grid
    Storage Elements and registered in the file
    catalogue
  • We want to add metadata to describe the movie content
  • A possible schema
  • Title: varchar
  • Runtime: int
  • Cast: varchar
  • LFN: varchar
  • A metadata catalogue will be the repository of
    the movies' metadata and will allow users to find
    movies that satisfy their queries
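
The movie trailer schema can be sketched in a few lines of plain Python. This is only an illustrative in-memory model, not the AMGA API: the class name MovieCollection, the entry names, and the sample values are all hypothetical, and varchar/int are mapped to Python's str/int.

```python
from dataclasses import dataclass, field

# Schema from the slide: attribute name -> Python type standing in for
# the catalogue type (varchar -> str, int -> int).
MOVIE_SCHEMA = {"Title": str, "Runtime": int, "Cast": str, "LFN": str}


@dataclass
class MovieCollection:
    """Hypothetical stand-in for a collection: entries sharing one schema."""
    schema: dict
    entries: dict = field(default_factory=dict)

    def add_entry(self, name, **attrs):
        # Reject attributes that are not in the schema or have the wrong type.
        for key, value in attrs.items():
            if key not in self.schema:
                raise KeyError(f"unknown attribute: {key}")
            if not isinstance(value, self.schema[key]):
                raise TypeError(f"{key} must be {self.schema[key].__name__}")
        self.entries[name] = attrs

    def select(self, predicate):
        # Return the names of the entries whose metadata satisfies the query.
        return sorted(n for n, a in self.entries.items() if predicate(a))


trailers = MovieCollection(MOVIE_SCHEMA)
# Sample values below are invented for illustration only.
trailers.add_entry("trailer1", Title="Example One", Runtime=95,
                   Cast="A. Actor", LFN="lfn:/grid/example/trailer1.avi")
trailers.add_entry("trailer2", Title="Example Two", Runtime=140,
                   Cast="B. Actor", LFN="lfn:/grid/example/trailer2.avi")

short = trailers.select(lambda a: a["Runtime"] < 120)
print(short)
```

The query is a plain predicate over the attribute values, which mirrors the idea of answering user queries against the metadata rather than against the files themselves.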

6
Example: Movie Trailer
[Figure: the movie trailer example, with its schema, attributes, entry, and collection labelled]
7
Metadata Service on Grid
  • Information about files, but not only that
  • Metadata can describe any grid entity/object
  • e.g. JobIDs: add logging information to your jobs
  • Input sets for a storm of parametric jobs
  • Monitoring of running applications
  • e.g. ongoing results from running jobs can be
    published on the metadata server
  • Information exchange among grid peers
  • e.g. producer/consumer job collections: master
    jobs produce data to be analyzed; slave jobs
    query the metadata server to retrieve input to
    consume
  • Simplified DB access on the grid
  • Grid applications that need structured data can
    model their data schemas as metadata

8
Inputset for Parametric Jobs
  • /grid/my_simulation/input
  • This collection lists all the parameter sets to be
    run on the Grid
  • On the WN, one of the input sets is selected and
    isTaken is set to the JOB_ID of the job that has
    fetched it
  • Results are also written to the found column to
    monitor the simulation
  • so users can check the simulation from a UI by
    querying the metadata server, or from a web page
    (using the APIs, for example)
  • StdOutput can also be copied into the output
    text column
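
The claim-and-report cycle described above can be sketched with Python's built-in sqlite3 module standing in for the metadata server. This is an assumption-laden illustration, not AMGA client code: the table name input, the id and params columns, and the function names are hypothetical; only isTaken and found come from the slide.

```python
import sqlite3


def make_input_table(param_sets):
    """Build an in-memory table that plays the role of the input collection."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE input (id INTEGER PRIMARY KEY, params TEXT,"
               " isTaken TEXT, found TEXT)")
    db.executemany("INSERT INTO input (params) VALUES (?)",
                   [(p,) for p in param_sets])
    db.commit()
    return db


def claim_input(db, job_id):
    """Tag one free parameter set with job_id and return (id, params)."""
    with db:  # commit on success, roll back on error
        row = db.execute("SELECT id, params FROM input"
                         " WHERE isTaken IS NULL ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None  # nothing left to run
        # The isTaken IS NULL guard keeps a row from being claimed twice.
        cur = db.execute("UPDATE input SET isTaken = ?"
                         " WHERE id = ? AND isTaken IS NULL", (job_id, row[0]))
        return row if cur.rowcount == 1 else None


def publish_result(db, row_id, result):
    """Write the job's result into the found column for monitoring."""
    with db:
        db.execute("UPDATE input SET found = ? WHERE id = ?", (result, row_id))


db = make_input_table(["a=1", "a=2"])
first = claim_input(db, "job-1")
publish_result(db, first[0], "ok")
```

A user on the UI could then monitor progress with a simple query on isTaken and found, which is the role the slide assigns to the metadata server.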

9
A possible parametric-get.sh script
10
Monitoring of Running Application
11
Using a Metadata Service to Exchange Data among
Running Jobs
  • Suppose we have two sets of jobs
  • Producers: they generate a file, store it on an SE,
    and register it in the LFC File Catalogue, assigning
    it an LFN
  • Consumers: they take an LFN, download the
    file, and process it
  • A metadata collection can be used to share the
    information generated by the Producers: it can
    act as a bag of LFNs (bag-of-tasks model) from
    which Consumers fetch files for further
    processing
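
The bag-of-LFNs idea can be illustrated with a small thread-safe container. This is a local sketch only: the LfnBag class, the consumer names, and the LFN strings are hypothetical, and a real setup would keep the bag in a metadata collection rather than in process memory.

```python
import threading


class LfnBag:
    """In-memory stand-in for a metadata collection used as a bag of LFNs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._free = []    # LFNs published by producers, not yet taken
        self._taken = {}   # LFN -> name of the consumer that fetched it

    def publish(self, lfn):
        with self._lock:
            self._free.append(lfn)

    def fetch(self, consumer):
        # Hand each LFN to exactly one consumer.
        with self._lock:
            if not self._free:
                return None
            lfn = self._free.pop()
            self._taken[lfn] = consumer
            return lfn


def drain(bag, consumer, out):
    # A consumer keeps fetching until the bag is empty.
    while (lfn := bag.fetch(consumer)) is not None:
        out.append(lfn)


bag = LfnBag()
for i in range(20):  # producers register their output files
    bag.publish(f"lfn:/grid/example/file{i}")

outputs = {name: [] for name in ("consumer-a", "consumer-b", "consumer-c")}
threads = [threading.Thread(target=drain, args=(bag, name, out))
           for name, out in outputs.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

fetched = [lfn for out in outputs.values() for lfn in out]
print(len(fetched), "LFNs consumed")
```

Producers and consumers never talk to each other directly; the shared collection is the only coupling between them.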

12
Information exchanging among grid peers
13
AMGA Metadata Catalogue
  • Metadata Service for the gLite middleware
  • but with no dependencies on gLite software
  • it can be used with other grid technologies and in
    other environments
  • AMGA: ARDA Metadata Grid Application
  • Provides a complete but simple interface, so that
    all users can use it easily
  • Designed with scalability in mind, in order to
    deal with large numbers of entries
  • based on a lightweight, streamed text-based
    protocol over TCP/IP
  • Grid security is provided to grant different
    access levels to different users
  • Flexible, with support for dynamic schemas in order
    to serve several application domains
  • Simple installation from a source tarball, RPMs, or
    Yum/YAIM

14
AMGA Analogies
  • Analogy to the RDBMS world
  • Schema ↔ table schema
  • Collection ↔ DB table
  • Attribute ↔ schema column
  • Entry ↔ table row/record
  • Analogy to a file system
  • Collection ↔ Directory
  • Entry ↔ File
  • Example
  • createdir /jobs (create table jobs)
  • addattr /jobs jobStatus int (alter table jobs add
    column jobStatus int)
  • addentry /jobs/job1 jobStatus 0 (insert into jobs
    (jobStatus) values (0))
  • updateattr /jobs jobStatus 1 'jobID > 100' (update
    jobs set jobStatus = 1 where jobID > 100)
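
The SQL side of the analogy can be run as-is against SQLite from Python. This sketch only exercises the relational statements; the AMGA commands appear as comments for comparison. The entry and jobID columns and the two sample rows are assumptions added so that the update condition has something to match.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# createdir /jobs  ~  create the table behind the collection
db.execute("CREATE TABLE jobs (entry TEXT PRIMARY KEY, jobID INTEGER)")

# addattr /jobs jobStatus int  ~  add a column to the table
db.execute("ALTER TABLE jobs ADD COLUMN jobStatus INTEGER")

# addentry /jobs/<name> jobStatus 0  ~  insert one row per entry
db.executemany(
    "INSERT INTO jobs (entry, jobID, jobStatus) VALUES (?, ?, 0)",
    [("job1", 50), ("job2", 150)],
)

# updateattr /jobs jobStatus 1 <condition>  ~  conditional update
db.execute("UPDATE jobs SET jobStatus = 1 WHERE jobID > 100")

rows = db.execute("SELECT entry, jobStatus FROM jobs ORDER BY entry").fetchall()
print(rows)
```

Only job2 satisfies the condition, so only its jobStatus changes.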

15
Features
  • Dynamic Schemas
  • Schemas can be modified at runtime by the client
  • Create and delete schemas
  • Add and remove attributes
  • AMGA collections are hierarchically organized
  • Collections can contain sub-collections
  • Sub-collections can inherit/extend the parent
    collection's schema
  • Flexible Queries
  • SQL-like query language
  • Different join types (inner, outer, left, right)
    between schemas are provided
  • Support for Views, Constraints, Indexes
16
Example
17
AMGA Security
  • Unix-style permissions: users and groups
  • ACLs: per-collection or per-entry (table row)
  • Secure client/server connections: SSL
  • Client authentication based on
  • Username/password
  • General X509 certificates (DN based)
  • Grid-proxy certificates (DN based)
  • VOMS support
  • VO attribute maps to a defined AMGA user
  • VOMS Role maps to a defined AMGA user
  • VOMS Group maps to a defined AMGA group
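
A minimal sketch of how an ACL check with group mapping might work is given below. It is a conceptual model only: the function names, the "system:anyuser" catch-all principal, the "gilda:users" group, and the VOMS-to-group lookup table are assumptions for illustration, and the real server's evaluation rules may differ.

```python
ANYUSER = "system:anyuser"  # assumed catch-all principal


def is_allowed(acl, user, groups, permission):
    """True if the user, one of its groups, or the catch-all grants permission.

    acl maps a principal name to a Unix-style permission string such as "rx".
    """
    principals = [user, *groups, ANYUSER]
    return any(permission in acl.get(p, "") for p in principals)


def voms_to_groups(voms_groups, mapping):
    """Translate VOMS group names into catalogue groups (hypothetical table)."""
    return [mapping[g] for g in voms_groups if g in mapping]


# Hypothetical ACL attached to one collection.
collection_acl = {"root": "rwx", "gilda:users": "rx"}
mapping = {"/gilda": "gilda:users"}

groups = voms_to_groups(["/gilda", "/other"], mapping)
print(is_allowed(collection_acl, "alice", groups, "r"))
print(is_allowed(collection_acl, "alice", groups, "w"))
```

Here alice gains read access only through her mapped group, while write access stays with root.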

18
AMGA Implementation
  • C++ multiprocess server
  • Backend
  • Postgres, MySQL 4/5, SQLite, Oracle
  • Frontend
  • TCP text streaming
  • High performance
  • mdclient CLI
  • Client API for C++, Java, Python, Perl, PHP
  • SOAP
  • Interoperability
  • Scalability
  • Standalone Python library implementation

19
AMGA Datatypes
  • Using the above datatypes you are sure that your
    metadata can be easily moved to all supported
    backends
  • If you do not care about DB portability, you can
    use, in principle, as entry attribute type ALL
    the datatypes supported by the backend, even the
    more esoteric ones (PostgreSQL Network Address
    type or Geometric ones)

20
Accessing AMGA from UI/WNs
  • TCP Streaming Front-end
  • mdcli and mdclient CLI, and C/C++ API (md_cli.h,
    MD_Client.h)
  • Java Client API and command-line tools mdjavaclient.sh
    and mdjavacli.sh (also under Windows)
  • Python and Perl Client API
  • PHP Client API (NEW)
  • developed entirely by the GILDA team, INFN CT
  • AMGA Web Interface (AMGA WI) (NEW)
  • developed entirely by the GILDA team, INFN CT
  • based on the standard Java AMGA APIs
  • web application using standards such as JSP Custom
    Tags and Servlets
  • SOAP Frontend (WSDL)
  • C++ gSOAP
  • AXIS (Java)
  • ZSI (Python)
21
Advanced Features: Metadata Replication
  • AMGA provides a replication/federation mechanism
  • Motivation
  • Scalability: support hundreds/thousands of
    concurrent users
  • Geographical distribution: hide network latency
  • Reliability: no single point of failure
  • DB-independent replication: heterogeneous DB
    systems
  • Disconnected computing: off-line access (laptops)
  • Architecture
  • Asynchronous replication
  • Master-slave: writes only allowed on the master
  • Application-level replication
  • Replicates metadata commands
  • Partial replication: supports replication of only
    sub-trees of the metadata hierarchy
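
The architecture bullets (asynchronous, master-slave, command-level, partial) can be sketched as a command log that slaves replay at their own pace. This is a toy model under stated assumptions, not the AMGA replication protocol: the Master and Slave classes, the pull method, and the sample commands are hypothetical.

```python
class Master:
    """Accepts writes and records each metadata command in an ordered log."""

    def __init__(self):
        self.log = []  # list of (collection path, command text)

    def execute(self, path, command):
        self.log.append((path, command))


class Slave:
    """Replays logged commands, optionally for one sub-tree only."""

    def __init__(self, subtree="/"):
        self.subtree = subtree
        self.position = 0   # index of the next log record to read
        self.applied = []   # commands replayed so far

    def _in_subtree(self, path):
        prefix = self.subtree.rstrip("/") + "/"
        return path == self.subtree or path.startswith(prefix)

    def pull(self, master):
        # Asynchronous: the slave catches up only when it pulls.
        for path, command in master.log[self.position:]:
            if self._in_subtree(path):
                self.applied.append(command)
        self.position = len(master.log)


master = Master()
master.execute("/jobs", "addattr /jobs jobStatus int")
master.execute("/world/City", "addentry /world/City/c1 Name Amsterdam")

full = Slave()             # replicates the whole hierarchy
partial = Slave("/world")  # replicates only the /world sub-tree
full.pull(master)
partial.pull(master)
print(len(full.applied), len(partial.applied))
```

Replaying commands rather than copying database files is what makes this kind of replication independent of the backend DB on each side.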

22
Metadata Replication
23
DB Access and Replication
24
Existing DB access with AMGA
  • Since AMGA 1.2.10, a new import feature allows
    access to existing DB tables
  • Once you have imported into AMGA the tables from
    the one or more DBs you want to access, you can
    exploit many AMGA features on your existing
    tables
  • Advantages
  • your DB tables can be accessed by grid
    users/applications, using grid authentication
    (VOMS proxies) and authorization with ACLs
  • by exploiting AMGA federation features you can
    access several databases together from the Grid

25
Set up AMGA to access your tables
  • Remember: AMGA stores its own tables in its DB
    backend
  • To access an existing DB you have 2 options
  • import the tables of the DB you want to access
    into the AMGA DB backend
  • or vice versa, add the AMGA DB backend tables to
    the DB you want to access
  • Use the import command as root to mount your
    tables into the AMGA collection hierarchy
  • Query> whoami
  • >> root
  • Query> createdir /world
  • Query> cd /world/
  • Query> import world.City /world/City
  • Query> import world.Country /world/Country
  • Query> import world.CountryLanguage
    /world/CountryLanguage

26
Set up AMGA to access your tables
  • Properly set up authorization on the imported
    tables
  • Query> acl_remove /world/City/ system:anyuser
  • Query> acl_remove /world/Country system:anyuser
  • Query> acl_add /world/ gilda:users rx
  • Query> acl_show /world
  • >> root rwx
  • >> gilda:users rx
  • >> system:anyuser rx
  • Query> selectattr City:CountryCode City:Name
    'like(City:Name, "Am%") limit 5'
  • >> NLD
  • >> Amsterdam
  • >> NLD
  • >> Amersfoort
  • >> BRA
  • >> Americana
  • >> ECU
  • >> Ambato
  • >> IDN
  • More information on existing DB access _at_

27
Native SQL syntax Support
  • Goal
  • To implement native SQL query processing
    functionality in AMGA
  • Reason
  • Many requests from user communities
  • to take advantage of their SQL expertise
  • to ease the work needed to port existing SQL DB
    applications to the Grid with AMGA
  • Complements the existing AMGA metadata query
    language
  • SQL-92 Entry Level direct data statements
  • SELECT, INSERT, UPDATE, DELETE

28
Native SQL support in AMGA
  • All SQL commands should be uppercase
  • Entry name
  • FILE special attribute
  • file column (primary key) in the backend DB
  • With INSERT, file is automatically filled with
    a random GUID
  • Permission modification
  • GRANT/REVOKE not allowed
  • use the existing AMGA commands (acl_*)
  • Table name
  • <table name> = <Collection pathname> in AMGA
  • Column name
  • <table name>.<attribute>
  • <table name>:<attribute>

29
Enable Postgres array Support
  • PostgreSQL supports arrays as a column data type
  • e.g. keywords varchar[]
  • {manuscripts, federico de roberto, envelope 32}
  • keywords[2] = federico de roberto
  • Both the AMGA language and SQL provide access to
    array datatypes
  • selectattr /tmp/array:keywords[2] 'keywords[1] =
    "manuscripts"'
  • SQL syntax offers ANY, ALL, ARRAY_UPPER,
    GENERATE_SERIES
  • SELECT * FROM /tmp/array WHERE 'manuscripts' =
    ANY(keywords)
  • SELECT COUNT(*) FROM PROJ WHERE CITY = ANY
    (SELECT CITY FROM STAFF WHERE EMPNUM = 'E8')

30
Multi-Threading Server
  • The classic AMGA server is implemented as a
    multi-process daemon
  • each process has its own DB connection
  • each process takes care of one connected client
  • a configurable number of listening processes is
    set in amgad.config
  • MinProcesses = 2
  • MaxProcesses = 50
  • With thousands of concurrent clients,
    thousands of server processes and thousands of DB
    connections are needed
  • DB connections are very expensive system
    resources
  • A new multi-threaded AMGA server is available in
    1.9
  • one process holds multiple threads with only
    one DB connection

31
Implementation
  • Thread pool
  • Pre-forked threads for each server
  • configurable number in amgad.config
  • initThreadNumber = 16
  • DB connection sharing
  • all threads belonging to the same process share
    the same DB connection
  • Architecture
  • uses the Pthread library
  • each thread has its own MDServer instance
32
Tuning AMGA for High Loads
  • Advice 1
  • use the multi-threaded version
  • it can handle a thousand concurrent
    connections with only 25-30 DB connections
  • Advice 2
  • use session caching: many concurrent requests
    from the same client will share the same AMGA
    server
  • can be configured in amgad.config
  • Sessions = (no | allow | force)
  • Default is allow
  • Advice 3
  • in case of high memory consumption, use two
    separate machines for the AMGA server and the DB

33
WS-DAIR Interface
  • What is WS-DAIR?
  • A proposed OGF standard recommendation for access
    to relational DBs on the Grid
  • Allows AMGA to be integrated seamlessly into the
    OGF standardized Grid Data Access Services

34
WS-DAIR Interface
35
AMGA WS-DAIR Implementation
  • Written in C++ (gSOAP)
  • SOAP binding: document/literal
  • The WSDLs given in the WS-DAIR specification were
    used with few modifications
  • Features
  • Supported dataset format: SUN JDBC WebRowSet
    (default)
  • Supported languages: SQL-92 Direct Data
    Statements, AMGA Metadata Language
  • Security: SSL, GSI, VOMS, and ACLs
  • Indirect Data Access Service
  • Data for a new indirect service is stored as a DB
    VIEW

36
References
  • AMGA website: http://amga.web.cern.ch/amga/
  • AMGA Forum: http://amga.ct.infn.it/support/
  • ISGC 2009: http://event.twgrid.org/isgc2009/program.htm