Title: AMGA Metadata Catalogue Service
1. AMGA Metadata Catalogue Service
Academia Sinica Grids Clouds, Jingya You
jingya.you_at_twgrid.org
Aug 4th, 2009
2. Contents
- Metadata services: background and possible uses in a grid environment
- Architecture and features of the gLite Metadata Service
- New AMGA features
  - existing DB import
  - native SQL-92 support
  - multi-threaded server
  - WS-DAIR interface
3. Why the Grid Needs Metadata
- Grids allow users to save millions of files spread over several storage sites.
- Users and applications need an efficient mechanism
  - to describe files
  - to locate files based on their contents
- This is achieved by
  - associating descriptive attributes to files (metadata is data about data)
  - answering user queries against the associated information
4. Basic Metadata Concepts
- Entries
  - Representations of the real-world entities that we attach metadata to in order to describe them
- Attributes
  - Type: the type (int, float, string, ...)
  - Name/Key: the name of the attribute
  - Value: the value of an entry's attribute
- Schema: a set of attributes
- Collection: a set of entries associated with a schema
- Metadata: a list of attributes (including their values) associated with entries
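These concepts can be sketched as a small in-memory model. This is a toy illustration in Python, not AMGA code: the `Collection` class, its methods, and the sample `jobs` data are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from attribute type names to Python types.
PYTHON_TYPES = {"int": int, "float": float, "string": str}


@dataclass
class Collection:
    """A set of entries that all share one schema."""
    schema: dict                                  # attribute name -> type name
    entries: dict = field(default_factory=dict)   # entry name -> {attribute: value}

    def add_entry(self, name, **values):
        """Attach attribute values to a new entry after checking them against the schema."""
        for attr, value in values.items():
            if attr not in self.schema:
                raise KeyError(f"attribute {attr!r} is not in the schema")
            if not isinstance(value, PYTHON_TYPES[self.schema[attr]]):
                raise TypeError(f"{attr!r} expects type {self.schema[attr]}")
        self.entries[name] = dict(values)

    def metadata(self, name):
        """Return the attributes (with their values) attached to one entry."""
        return self.entries[name]


jobs = Collection(schema={"status": "int", "owner": "string"})
jobs.add_entry("job1", status=0, owner="alice")
print(jobs.metadata("job1"))
```

The schema is kept on the collection rather than on each entry, which mirrors the definition above: every entry in a collection is described by the same set of attributes.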
5. Example: Movie Trailers
- Movie trailer files (entries) are saved on Grid Storage Elements and registered in the file catalogue
- We want to add metadata to describe the movie content
- A possible schema
  - Title: varchar
  - Runtime: int
  - Cast: varchar
  - LFN: varchar
- A metadata catalogue will be the repository of the movies' metadata and will allow users to find movies that satisfy their queries
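The schema above can be tried out with an ordinary relational table. The sketch below uses Python's built-in sqlite3 module as a stand-in for a metadata catalogue; it is not the AMGA API, and the table name `trailers`, the sample rows, the LFN paths, and the `find_trailers` helper are all invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# "Cast" is quoted because CAST is also an SQL keyword.
db.execute('CREATE TABLE trailers (Title TEXT, Runtime INTEGER, "Cast" TEXT, LFN TEXT)')
db.executemany(
    "INSERT INTO trailers VALUES (?, ?, ?, ?)",
    [
        ("Trailer A", 120, "Actor One, Actor Two", "/grid/demo/trailers/a.mov"),
        ("Trailer B", 95, "Actor Two", "/grid/demo/trailers/b.mov"),
        ("Trailer C", 150, "Actor Three", "/grid/demo/trailers/c.mov"),
    ],
)


def find_trailers(max_runtime, actor):
    """Return (Title, LFN) pairs for trailers no longer than max_runtime featuring actor."""
    rows = db.execute(
        'SELECT Title, LFN FROM trailers '
        'WHERE Runtime <= ? AND "Cast" LIKE ? ORDER BY Title',
        (max_runtime, f"%{actor}%"),
    )
    return rows.fetchall()


matches = find_trailers(130, "Actor Two")
print(matches)
```

The query returns the LFN along with the title, so a job can go straight from a metadata match to the file it needs to download.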
6. Example: Movie Trailers
[Figure: the movie trailer collection, with its schema, attributes, and entries labelled]
7. Metadata Service on the Grid
- Information about files, but not only
  - Metadata can describe any grid entity/object
  - e.g. JobIDs: add logging information to your jobs
- Input sets for a storm of parametric jobs
- Monitoring of running applications
  - e.g. ongoing results from running jobs can be published on the metadata server
- Information exchange among grid peers
  - e.g. producer/consumer job collections: master jobs produce data to be analyzed, and slave jobs query the metadata server to retrieve input to consume
- Simplified DB access on the grid
  - Grid applications that need structured data can model their data schemas as metadata
8. Input Sets for Parametric Jobs
- /grid/my_simulation/input
- This collection lists all the parameter sets to be run on the Grid
- On the WN, one of the input sets is selected and isTaken is set to the JOB_ID of the job that has fetched it
- Results are also written in the found column to monitor the simulation
  - so users can check the simulation from a UI by querying the metadata server, or from a web page (using the APIs, for example)
- StdOutput can also be copied into the output text column
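The claim step described above can be sketched as follows. This is a local simulation with Python's sqlite3 module, not the AMGA client API: the `isTaken`, `found`, and `output` column names come from the slide, while the `name` and `param` columns, the job IDs, and the `claim_input_set` helper are assumptions made for the example.

```python
import sqlite3


def claim_input_set(db, job_id):
    """Mark one free parameter set as taken by job_id and return it, or None if none is left."""
    while True:
        row = db.execute(
            "SELECT name, param FROM input WHERE isTaken IS NULL ORDER BY name LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Conditional update: it only succeeds if no other job claimed the row in between.
        cur = db.execute(
            "UPDATE input SET isTaken = ? WHERE name = ? AND isTaken IS NULL",
            (job_id, row[0]),
        )
        db.commit()
        if cur.rowcount == 1:
            return row


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE input (name TEXT PRIMARY KEY, param REAL, "
    "isTaken TEXT, found TEXT, output TEXT)"
)
db.executemany("INSERT INTO input (name, param) VALUES (?, ?)", [("set1", 0.1), ("set2", 0.2)])

first = claim_input_set(db, "job-A")
second = claim_input_set(db, "job-B")
third = claim_input_set(db, "job-C")   # nothing left to claim

# A job publishes its result so the simulation can be monitored.
db.execute("UPDATE input SET found = ? WHERE name = ?", ("done", first[0]))
db.commit()
```

Making the update conditional on `isTaken IS NULL` is what keeps two jobs from running the same parameter set when they select at the same moment.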
9. A Possible parametric-get.sh Script
10. Monitoring of a Running Application
11. Using a Metadata Service to Exchange Data among Running Jobs
- Suppose we have two sets of jobs
  - Producers: they generate a file, store it on an SE, and register it in the LFC File Catalogue, assigning it an LFN
  - Consumers: they take an LFN, download the file, and process it
- A metadata collection can be used to share the information generated by the Producers: it can act as a bag of LFNs (bag-of-tasks model) from which Consumers fetch files for further processing
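The bag-of-LFNs idea can be illustrated with a small thread-based simulation. Everything here is a stand-in: the `LfnBag` class plays the role of the metadata collection, threads play the role of grid jobs, and the LFN paths and consumer names are made up.

```python
import threading


class LfnBag:
    """Stand-in for a metadata collection holding LFNs and which consumer took each one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._taken_by = {}   # LFN -> consumer id, or None while unclaimed

    def publish(self, lfn):
        """Producer side: register a newly produced file."""
        with self._lock:
            self._taken_by.setdefault(lfn, None)

    def fetch(self, consumer_id):
        """Consumer side: claim one unclaimed LFN, or return None if the bag is empty."""
        with self._lock:
            for lfn, owner in self._taken_by.items():
                if owner is None:
                    self._taken_by[lfn] = consumer_id
                    return lfn
        return None


bag = LfnBag()
for i in range(6):
    bag.publish(f"/grid/demo/produced/file{i}.dat")

claimed = {"c1": [], "c2": []}


def consume(consumer_id):
    while True:
        lfn = bag.fetch(consumer_id)
        if lfn is None:
            break
        claimed[consumer_id].append(lfn)   # a real consumer would download and process here


threads = [threading.Thread(target=consume, args=(cid,)) for cid in claimed]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

How the six files are split between the two consumers depends on scheduling, but each LFN is handed out exactly once.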
12. Information Exchange among Grid Peers
13. AMGA Metadata Catalogue
- Metadata service for the gLite middleware
  - but with no dependencies on gLite software
  - it can be used with other grid technologies and in other environments
- AMGA: ARDA Metadata Grid Application
- Provides a complete but simple interface, so that all users can use it easily
- Designed with scalability in mind, in order to deal with a large number of entries
  - based on a lightweight, streamed, text-based protocol carried over TCP/IP
- Grid security is provided to grant different access levels to different users
- Flexible, with support for dynamic schemas in order to serve several application domains
- Simple installation from a source tarball, RPMs, or Yum/YAIM
14. AMGA Analogies
- Analogy to the RDBMS world
  - Schema ↔ table schema
  - Collection ↔ DB table
  - Attribute ↔ schema column
  - Entry ↔ table row/record
- Analogy to a file system
  - Collection ↔ directory
  - Entry ↔ file
- Example
  - createdir /jobs (create table jobs)
  - addattr /jobs jobStatus int (alter table jobs add column jobStatus int)
  - addentry /jobs/job1 jobStatus 0 (insert into jobs (jobStatus) values (0))
  - updateattr /jobs jobStatus 1 jobID>100 (update jobs set jobStatus = 1 where jobID > 100)
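The SQL side of this analogy can be run as-is against any relational engine. The sketch below uses Python's sqlite3 module; the AMGA commands appear only as comments, and the `file` key column, the extra `jobID` attribute, and the sample entry `job150` are assumptions added so that the update has something to filter on.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# createdir /jobs
db.execute("CREATE TABLE jobs (file TEXT PRIMARY KEY)")   # "file" holds the entry name

# addattr /jobs jobStatus int   (jobID is added the same way)
db.execute("ALTER TABLE jobs ADD COLUMN jobStatus INT")
db.execute("ALTER TABLE jobs ADD COLUMN jobID INT")

# addentry /jobs/job1 jobStatus 0 ...
db.executemany(
    "INSERT INTO jobs (file, jobID, jobStatus) VALUES (?, ?, 0)",
    [("job1", 1), ("job150", 150)],
)

# updateattr /jobs jobStatus 1 jobID>100
db.execute("UPDATE jobs SET jobStatus = 1 WHERE jobID > 100")
db.commit()

rows = db.execute("SELECT file, jobStatus FROM jobs ORDER BY jobID").fetchall()
print(rows)
```

Only the entry whose jobID exceeds 100 ends up with jobStatus 1, which is the behaviour the updateattr condition expresses.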
15. Features
- Dynamic schemas
  - Schemas can be modified at runtime by the client
    - Create, delete schemas
    - Add, remove attributes
- AMGA collections are hierarchically organized
  - Collections can contain sub-collections
  - Sub-collections can inherit/extend the parent collection's schema
- Flexible queries
  - SQL-like query language
  - Different join types (inner, outer, left, right) between schemas are provided
- Support for views, constraints, and indexes
16. Example
17. AMGA Security
- Unix-style permissions: users and groups
- ACLs: per-collection or per-entry (table row)
- Secure client/server connections: SSL
- Client authentication based on
  - Username/password
  - General X.509 certificates (DN based)
  - Grid proxy certificates (DN based)
- VOMS support
  - VO attribute maps to a defined AMGA user
  - VOMS Role maps to a defined AMGA user
  - VOMS Group maps to a defined AMGA group
18. AMGA Implementation
- C++ multiprocess server
- Backend
  - Postgres, MySQL 4/5, SQLite, Oracle
- Frontend
  - TCP text streaming
    - High performance
    - mdclient CLI
    - Client API for C++, Java, Python, Perl, PHP
  - SOAP
    - Interoperability
- Scalability
- Standalone Python library implementation
19. AMGA Datatypes
- Using the portable AMGA datatypes, you can be sure that your metadata can easily be moved to all supported backends
- If you do not care about DB portability, you can in principle use as an entry attribute type any datatype supported by the backend, even the more esoteric ones (such as the PostgreSQL network address or geometric types)
20. Accessing AMGA from UI/WNs
- TCP streaming front-end
  - mdcli and mdclient CLI and C++ API (md_cli.h, MD_Client.h)
  - Java client API and command line: mdjavaclient.sh and mdjavacli.sh (also under Windows)
  - Python and Perl client API
  - PHP client API (new)
    - developed entirely by the GILDA team, INFN CT
- AMGA Web Interface (AMGA WI) (new)
  - developed entirely by the GILDA team, INFN CT
  - based on the standard AMGA Java APIs
  - web application using standards such as JSP custom tags and servlets
- SOAP front-end (WSDL)
  - C++ gSOAP
  - AXIS (Java)
  - ZSI (Python)
21. Advanced Features: Metadata Replication
- AMGA provides replication/federation mechanisms
- Motivation
  - Scalability: support hundreds/thousands of concurrent users
  - Geographical distribution: hide network latency
  - Reliability: no single point of failure
  - DB-independent replication: heterogeneous DB systems
  - Disconnected computing: off-line access (laptops)
- Architecture
  - Asynchronous replication
  - Master-slave: writes are only allowed on the master
  - Application-level replication
    - Replicates metadata commands
  - Partial replication: supports replication of only sub-trees of the metadata hierarchy
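The architecture bullets above (asynchronous, master-slave, command-level, partial) can be illustrated with a toy model. This is a sketch of the general idea only, not of AMGA's actual replication protocol: the `Master` and `Slave` classes, the `pull` method, and the sample paths are hypothetical.

```python
class Master:
    """Accepts writes and records each one as a command in an ordered log."""

    def __init__(self):
        self.entries = {}
        self.log = []   # ordered list of (path, attributes) write commands

    def write(self, path, attributes):
        self.entries[path] = attributes
        self.log.append((path, attributes))


class Slave:
    """Read-only replica of one sub-tree, updated by replaying the master's log."""

    def __init__(self, subtree):
        self.subtree = subtree.rstrip("/") + "/"
        self.entries = {}
        self.applied = 0   # number of log records already examined

    def pull(self, master):
        """Replay the commands logged since the last pull, keeping only the subscribed sub-tree."""
        for path, attributes in master.log[self.applied:]:
            if path.startswith(self.subtree):
                self.entries[path] = attributes
        self.applied = len(master.log)


master = Master()
replica = Slave("/world")

master.write("/world/City/Amsterdam", {"CountryCode": "NLD"})
master.write("/jobs/job1", {"jobStatus": 0})

before_pull = dict(replica.entries)   # asynchronous: the replica lags until it pulls
replica.pull(master)
```

Because the slave replays commands rather than copying database files, the master and the slave can sit on different database backends, which is the point of DB-independent replication.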
22. Metadata Replication
23. DB Access and Replication
24. Existing DB Access with AMGA
- Since AMGA 1.2.10, a new import feature allows access to existing DB tables
- Once you have imported into AMGA the tables from the DB or DBs you want to access, you can exploit many AMGA features for your existing tables
- Advantages
  - your DB tables can be accessed by grid users/applications, using grid authentication (VOMS proxies) and authorization with ACLs
  - by exploiting the AMGA federation features, you can access several databases together from the Grid
25. Set Up AMGA to Access Your Tables
- Remember: AMGA stores its own tables in its DB backend
- To access an existing DB you have 2 options
  - import the tables of the DB you want to access into the AMGA DB backend
  - vice versa, add the AMGA DB backend tables to the DB you want to access
- Use the import command as root to mount your tables into the AMGA collection hierarchy
  - Query> whoami
  - >> root
  - Query> createdir /world
  - Query> cd /world/
  - Query> import world.City /world/City
  - Query> import world.Country /world/Country
  - Query> import world.CountryLanguage /world/CountryLanguage
26. Set Up AMGA to Access Your Tables
- Properly set up authorization on the imported tables
  - Query> acl_remove /world/City/ system:anyuser
  - Query> acl_remove /world/Country system:anyuser
  - Query> acl_add /world/ gildausers rx
  - Query> acl_show /world
  - >> root rwx
  - >> gildausers rx
  - >> system:anyuser rx
  - Query> selectattr City:CountryCode City:Name 'like(City:Name, "Am%") limit 5'
  - >> NLD
  - >> Amsterdam
  - >> NLD
  - >> Amersfoort
  - >> BRA
  - >> Americana
  - >> ECU
  - >> Ambato
  - >> IDN
- More information on existing DB access _at_
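To make the ACL listing printed by acl_show /world easier to read, here is a small sketch of how such a listing could be evaluated. The evaluation rule is an assumption made for illustration (permissions from every matching principal are combined, and every user matches the any-user principal); it is not taken from the AMGA documentation, and the `allowed` helper and the user `alice` are hypothetical.

```python
def allowed(acl, user, groups, wanted):
    """Return True if every permission letter in `wanted` is granted to the user.

    Assumed rule: the user's own entry, the entries of the user's groups, and the
    any-user entry are combined.
    """
    principals = {user, "system:anyuser"} | set(groups)
    granted = set()
    for principal, perms in acl.items():
        if principal in principals:
            granted |= set(perms)
    return set(wanted) <= granted


# The ACL shown for /world in the session above.
world_acl = {"root": "rwx", "gildausers": "rx", "system:anyuser": "rx"}

can_read = allowed(world_acl, "alice", ["gildausers"], "r")
can_write = allowed(world_acl, "alice", ["gildausers"], "w")
root_full = allowed(world_acl, "root", [], "rwx")
```

Under this rule a member of gildausers can list and read /world but cannot write to it, while root keeps full access.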
27. Native SQL Syntax Support
- Goal
  - To implement native SQL query processing functionality in AMGA
- Reason
  - A lot of requests from user communities
    - take advantage of their SQL expertise
    - ease the work needed to port existing SQL DB applications to the Grid with AMGA
  - Complement the existing AMGA metadata query language
- SQL-92 Entry Level direct data statements
  - SELECT, INSERT, UPDATE, DELETE
28. Native SQL Support in AMGA
- All SQL commands should be uppercase
- Entry name
  - FILE special attribute
  - file column (primary key) in the backend DB
  - Using INSERT, file is automatically filled with a random GUID
- Permission modification
  - GRANT/REVOKE not allowed
  - use the existing AMGA commands (acl_*)
- Table name
  - <table name> = <Collection pathname> in AMGA
- Column name
  - <table name>.<attribute>
  - <table name>:<attribute>
29. Enabling Postgres Array Support
- PostgreSQL supports arrays as a column data type
  - e.g. keywords varchar[]
  - {manuscripts, federico de roberto, envelope 32}
  - keywords[2] = 'federico de roberto'
- Both the AMGA language and SQL provide access to array datatypes
  - selectattr /tmp/array:keywords[2] 'keywords[1] = "manuscripts"'
- SQL syntax offers ANY, ALL, ARRAY_UPPER, GENERATE_SERIES
  - SELECT * FROM /tmp/array WHERE 'manuscripts' = ANY(keywords)
  - SELECT COUNT(*) FROM PROJ WHERE CITY = ANY (SELECT CITY FROM STAFF WHERE EMPNUM = 'E8')
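For readers less familiar with PostgreSQL arrays, the semantics of indexing and of the ANY/ALL comparisons can be mimicked in a few lines of Python. The helper names `pg_index`, `eq_any`, `eq_all`, and `array_upper` are made up for this sketch; only the behaviour they imitate belongs to PostgreSQL.

```python
keywords = ["manuscripts", "federico de roberto", "envelope 32"]


def pg_index(array, i):
    """PostgreSQL arrays are 1-based by default, so index 2 is the second element."""
    return array[i - 1]


def eq_any(value, array):
    """Mimics value = ANY(array): true if at least one element equals value."""
    return any(value == item for item in array)


def eq_all(value, array):
    """Mimics value = ALL(array): true only if every element equals value."""
    return all(value == item for item in array)


def array_upper(array):
    """Mimics ARRAY_UPPER(array, 1) for a default 1-based, one-dimensional array."""
    return len(array)


second = pg_index(keywords, 2)
has_manuscripts = eq_any("manuscripts", keywords)
all_manuscripts = eq_all("manuscripts", keywords)
```

The 1-based indexing is the detail most likely to surprise: the element selected by index 2 here is the one Python itself would call index 1.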
30. Multi-Threaded Server
- The classic AMGA server is implemented as a multi-process daemon
  - each process has its own DB connection
  - each process takes care of one connected client
  - a configurable number of listening processes is set up in amgad.config
    - MinProcesses = 2
    - MaxProcesses = 50
- With thousands of concurrent clients, thousands of server processes and thousands of DB connections are needed
  - DB connections are very expensive system resources
- A new multi-threaded AMGA server is available in 1.9
  - one process holds multiple threads with only one DB connection
31. Implementation
- Thread pool
  - Pre-forked threads for each server
  - configurable number in amgad.config
    - initThreadNumber = 16
- DB connection sharing
  - all threads belonging to the same process share the same DB connection
- Architecture
  - uses the Pthread library
  - each thread has its own MDServer instance
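The thread-pool design can be sketched in Python. This illustrates the general pattern only (a fixed pool of worker threads that share one database connection); it is not the AMGA server code, which uses the Pthread library. The `hits` table, the `handle_client` function, and the use of a lock to serialise access to the connection are assumptions of the sketch; the pool size of 16 mirrors initThreadNumber from the slide.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

INIT_THREAD_NUMBER = 16

# One connection for the whole process; the lock serialises access to it.
db = sqlite3.connect(":memory:", check_same_thread=False)
db_lock = threading.Lock()
db.execute("CREATE TABLE hits (client INTEGER)")


def handle_client(client_id):
    """Serve one client request using the shared connection."""
    with db_lock:
        db.execute("INSERT INTO hits VALUES (?)", (client_id,))
        db.commit()
    return client_id


# A thousand client requests are served by 16 threads and a single DB connection.
with ThreadPoolExecutor(max_workers=INIT_THREAD_NUMBER) as pool:
    served = list(pool.map(handle_client, range(1000)))

with db_lock:
    total = db.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
print(total)
```

The number of database connections stays at one no matter how many clients are served, which is the saving the multi-threaded server aims for compared with one connection per process.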
32. Tuning AMGA for High Loads
- Advice 1
  - use the multi-threaded version
  - it can handle a thousand concurrent connections with only 25-30 DB connections
- Advice 2
  - use session caching: many concurrent requests from the same client will share the same AMGA server
  - can be configured in amgad.config
    - Sessions: no, allow, or force
    - Default is allow
- Advice 3
  - in case of high memory consumption, use two separate machines for the AMGA server and the DB
33. WS-DAIR Interface
- What is WS-DAIR
  - A proposed OGF standard Recommendation for access to relational DBs on the Grid
  - Allows AMGA a seamless integration into the OGF-standardized Grid Data Access Services
34. WS-DAIR Interface
35. AMGA WS-DAIR Implementation
- Written in C++ (gSOAP)
- SOAP binding: document/literal
- The WSDLs given in the WS-DAIR specification were used with few modifications
- Features
  - Supported dataset format: SUN JDBC WebRowSet (default)
  - Supported languages: SQL-92 Direct Data Statement, AMGA Metadata Language
  - Security: SSL, GSI, VOMS, and ACL
  - Indirect Data Access Service
    - Data for a new indirect service is stored as a DB VIEW
36. References
- AMGA website: http://amga.web.cern.ch/amga/
- AMGA Forum: http://amga.ct.infn.it/support/
- ISGC 2009: http://event.twgrid.org/isgc2009/program.htm