Using Grid Computing - PowerPoint PPT Presentation

1
Using Grid Computing
David Groep, NIKHEF, 2002-07-15
2
The Grid, But Why?
  • Physics @ CERN
  • LHC particle accelerator
  • operational in 2007
  • 5-10 Petabyte per year
  • 150 countries
  • > 10000 users
  • lifetime 20 years

40 MHz (40 TB/sec)
level 1 - special hardware
75 kHz (75 GB/sec)
level 2 - embedded
5 kHz (5 GB/sec)
level 3 - PCs
100 Hz (100 MB/sec)
data recording and offline analysis
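
The trigger figures above imply a fixed event size and a rejection factor per level. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and that each quoted rate is what flows into the next stage; the stage labels are only illustrative:

```python
# Trigger chain figures as quoted on the slide:
# (stage label, event rate in Hz, data rate in bytes per second).
STAGES = [
    ("detector output", 40e6, 40e12),
    ("after level 1 (special hardware)", 75e3, 75e9),
    ("after level 2 (embedded)", 5e3, 5e9),
    ("after level 3 (PCs)", 100.0, 100e6),
]


def event_size_bytes(rate_hz, bytes_per_s):
    """Average size of one event implied by a rate and a bandwidth."""
    return bytes_per_s / rate_hz


def rejection_factors(stages):
    """Factor by which each trigger level reduces the event rate."""
    return [prev[1] / cur[1] for prev, cur in zip(stages, stages[1:])]


for label, rate, bandwidth in STAGES:
    size_mb = event_size_bytes(rate, bandwidth) / 1e6
    print(f"{label}: {rate:g} Hz, {size_mb:g} MB per event")

print("rejection per level:", [round(f, 1) for f in rejection_factors(STAGES)])
print("overall reduction:", STAGES[0][1] / STAGES[-1][1])
```

Every stage works out to about 1 MB per event, so the triggers reduce the data volume by discarding events, not by shrinking them.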
3
CPU and Data Requirements
4
More Reasons Why
ENVISAT
  • 3500 MEuro programme cost
  • 10 instruments on board
  • 200 Mbps data rate to ground
  • 400 Tbytes data archived/year
  • 100 standard products
  • 10 dedicated facilities in Europe
  • 700 approved science user projects

5
And More
Bio-informatics
  • For access to data:
  • Large network bandwidth to access computing
    centers
  • Support for data bank replicas (easier and
    faster mirroring)
  • Distributed data banks
  • For interpretation of data:
  • Grid-enabled algorithms: BLAST on distributed
    data banks, distributed data mining

6
Common Ground
  • Large amounts of data
  • Distributed, ad-hoc user community
  • Problems are distributable
  • Need for resources grows faster than market
  • Network grows faster than the application needs
  • Willingness to share resources, if security and
    integrity are guaranteed

7
The One-Liner
  • Resource sharing and coordinated problem solving
    in dynamic multi-institutional virtual
    organisations

8
What is Grid computing?
  • Dependable, consistent and pervasive access
  • Combining resources from various organizations
  • Virtual Organizations: user-based view on the Grid
  • Technical challenges:
  • transparent decisions for the user
  • uniformity in access methods
  • secure and crack-resistant
  • authentication, authorization, accounting (AAA)
    and quota

9
Grid Middleware
  • Globus Project, started 1997
  • de facto standard
  • Reference implementation of Grid Forum standards
  • Large community effort
  • Basis of several projects, including EU-DataGrid
  • Toolkit 'bag-of-services' approach
  • Successful test beds, with single sign-on, etc.

10
In The Beginning
  • Distributed Computing: synchronous processing
  • High-Throughput Computing: asynchronous processing
  • On-Demand Computing: dynamic resources
  • Data-Intensive Computing: databases
  • Collaborative Computing: science

Ian Foster and Carl Kesselman, editors, The Grid:
Blueprint for a New Computing Infrastructure,
Morgan Kaufmann, 1999
11
Grid Architecture
Make all resources talk standard protocols.
Promote interoperability of application toolkits,
similar to the interoperability of networks through
Internet standards.
  • Application Toolkits: DUROC, MPICH-G2, Condor-G,
    VLAM-G
  • Grid Services: GRAM, GridFTP, MDS, ReplicaSrv
  • Grid Security Infrastructure (GSI)
  • Grid Fabric: Condor, MPI, PBS, Internet, Linux, SUN
12
OGSA: new directions
  • Looks superficially like web services
  • Based on common standards: WSDL, SOAP, UDDI
  • Adds:
  • Transient services
  • State of distributed activities
  • Workflow, videoconf, distributed data analysis
  • Management of service instances
  • Grid Security Infrastructure

13
EU DataGrid
HEP, EO, Bio
Resource Broker
Data Replicas, Databases, Mass storage
Fabric, Network
14
Looking for Resources
  • Resource Brokerage based on matchmaking (Condor)
  • Information Services Mesh
  • Meta-computing directory
  • Replica Catalogues
  • DataGrid: http://marianne.in2p3.fr/
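
Matchmaking-based brokerage can be pictured as two steps: filter the advertised resources against the job's requirements, then rank what is left. The sketch below is a simplified illustration in that spirit, not the DataGrid Resource Broker or Condor's ClassAd language; all class names, fields, site names and the ranking rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceAd:
    """What a site advertises about itself (hypothetical fields)."""
    name: str
    free_cpus: int
    os: str
    datasets: set = field(default_factory=set)


@dataclass
class JobAd:
    """What a job requires (hypothetical fields)."""
    cpus: int
    os: str
    dataset: str


def matches(job, res):
    """Hard requirements: enough free CPUs and the right operating system."""
    return res.free_cpus >= job.cpus and res.os == job.os


def rank(job, res):
    """Preference: sites already holding the data first, then more free CPUs."""
    return (job.dataset in res.datasets, res.free_cpus)


def broker(job, resources):
    """Return the best matching resource, or None if nothing matches."""
    candidates = [r for r in resources if matches(job, r)]
    return max(candidates, key=lambda r: rank(job, r), default=None)


sites = [
    ResourceAd("site-a", free_cpus=100, os="linux"),
    ResourceAd("site-b", free_cpus=20, os="linux", datasets={"run-42"}),
    ResourceAd("site-c", free_cpus=400, os="solaris", datasets={"run-42"}),
]
job = JobAd(cpus=10, os="linux", dataset="run-42")
chosen = broker(job, sites)
print(chosen.name if chosen else "no match")
```

Here the smaller site wins because it already holds the data, which is the reason the broker consults the replica catalogue.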

15
Submitting a Job
16
Locating a Replica
  • Grid Data Mirror Package
  • Moves data across sites
  • Replicates both files and individual objects
  • Catalogue used by Broker
  • Replica Location Service (Giggle)
  • Read-only copies owned by the Replica Manager
  • http://cmsdoc.cern.ch/cms/grid
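
A replica catalogue is, at its core, a mapping from a logical file name to the physical copies of that file. The sketch below is an in-memory illustration only, not the interface of the Grid Data Mirror Package or the Replica Location Service; the class, the `pick_replica` helper, the host names and the use of `gsiftp://` URLs are assumptions.

```python
from urllib.parse import urlparse


class ReplicaCatalogue:
    """Toy catalogue: logical file name -> list of physical file URLs."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, pfn):
        """Record that a physical copy `pfn` exists for logical name `lfn`."""
        self._replicas.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        """Return all known physical copies (empty list if none)."""
        return list(self._replicas.get(lfn, []))


def pick_replica(catalogue, lfn, local_domain):
    """Prefer a copy hosted in the caller's own domain, else the first one."""
    replicas = catalogue.lookup(lfn)
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    for pfn in replicas:
        host = urlparse(pfn).hostname or ""
        if host.endswith(local_domain):
            return pfn
    return replicas[0]


cat = ReplicaCatalogue()
cat.register("lfn:run-42/events.dat", "gsiftp://se.site-a.example/data/events.dat")
cat.register("lfn:run-42/events.dat", "gsiftp://se.site-b.example/data/events.dat")
print(pick_replica(cat, "lfn:run-42/events.dat", "site-b.example"))
```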

17
Sending Your Data
  • Tape robots, disks, etc. share GridFTP interface
  • Supports single-sign-on and confidentiality
  • Optimized for high-speed (> 1 Gbit/s) networks
  • In the future: automatic optimizations,
    bandwidth reservations, directory-enabled
    networking

18
Grid-enabled Databases?
  • SpitFire: uniform access to persistent storage on
    the Grid
  • Multiple roles support
  • Compatible with GSI (single sign-on) through CoG
  • Uses standard stuff: JDBC, SOAP, XML
  • Supports various back-end databases

http://hep-proj-spitfire.web.cern.ch/hep-proj-spitfire/
19
DataGrid Test Bed 1
  • DataGrid TB1
  • 14 countries
  • 21 major sites
  • Growing rapidly
  • Submitting Jobs
  • Log in only once, run everywhere
  • Cross administrative boundaries in a secure and
    trusted way
  • Mutual authorization

20
DutchGrid Platform
  • DutchGrid
  • Test bed coordination
  • PKI security
  • Participation by:
  • NIKHEF/FOM, VU, UvA, Utrecht, Nijmegen
  • KNMI, SARA
  • AMOLF
  • DAS-2 (ASCI): TU Delft, Leiden, VU, UvA, Utrecht
  • Telematics Institute

21
And now for some Technical Details
  • For Users

22
Resources
  • Current start-up resources to be (ab)used:
  • NIKHEF
  • Several Globus test machines (try them now from
    your desk!)
  • 50x2 CPUs D0 cluster
  • 2x10x2 (40) CPUs LHCb at NIKHEF (WCW) and VU
  • 10x2 CPUs Alice NIKHEF (WCW)
  • ca. 4x2 CPUs Alice Utrecht
  • ca. 10x2 CPUs D0 Nijmegen
  • Lots of disk: dedicated 1.3 TByte cache server
  • DAS-II: 200 dual-PIII systems, some disk
    (2 TByte)
  • Spread over 5 locations (NIKHEF is one!)
  • SARA: tape robot (> 200 TByte), some clusters
  • More systems (NCF) to come this year

23
Start using the grid
  • All the necessary client tools are on all
    Linux and Solaris systems
  • You just need:
  • Credentials/tokens for the Grid (see next slides)
  • Authorization to use resources (you get all
    NIKHEF resources by default)
  • Information on which resources to use effectively

24
Your Grid Credentials
  • You will use resources across several domains
  • You may not care about security and authorization
  • But the remote site admin will!
  • All communications are authenticated using X.509
    Public Key Certificates
  • The technology used to secure credit card
    transactions on the web (https://)
  • Uniquely binds name/affiliation to a digital
    token

25
Certification Authorities
  • CAs act as trusted third parties
  • Remote sites trust the CA for a proper binding
  • They will not do authentication again, so only
    authorization is left
  • CAs are highly valuable: crack one to
    impersonate others on the Grid (and abuse
    resources)
  • Registration Authorities do in-person ID checks
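
Once the CA has vouched for an identity, what remains for the remote site is an authorization decision keyed on the certificate's distinguished name. A minimal sketch of that step, assuming a slash-separated DN and a site-local table from DN to local account; the table contents, the account name and both helper functions are hypothetical, and real sites may implement this differently.

```python
def parse_dn(dn):
    """Split a slash-separated DN into (attribute, value) pairs.

    Naive on purpose: values that themselves contain '/' are not handled.
    """
    parts = [p for p in dn.split("/") if p]
    return [tuple(p.split("=", 1)) for p in parts]


# Hypothetical site-local authorization table: DN -> local account.
AUTHORIZED = {
    "/O=dutchgrid/O=users/O=nikhef/CN=David Groep": "davidg",
}


def authorize(dn, table):
    """Return the local account for an authenticated DN, or None if refused."""
    return table.get(dn)


dn = "/O=dutchgrid/O=users/O=nikhef/CN=David Groep"
print(parse_dn(dn))
print(authorize(dn, AUTHORIZED))
print(authorize("/O=dutchgrid/O=users/O=nikhef/CN=Somebody Else", AUTHORIZED))
```

The site never re-checks who the user is; it only decides what the already authenticated name may do.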

26
CAs in DataGrid
  • 10 National CAs (one per EU country)
  • Each one has a detailed policy and practice
    statement
  • NIKHEF operates the CA for DutchGrid. See
    http://www.dutchgrid.nl/ca
  • Get a certificate from the DutchGrid CA before
    you can start using the Grid
  • It's valuable, protect it with a pass phrase
  • One cert valid for all DataGrid sites

27
The Proxy
  • A proxy certificate is a limited-lifetime
    delegation without a pass phrase to protect it
  • Implements the single sign-on for Grid
  • Valid for 12 hours (by default)
  • Use it to:
  • Run your jobs
  • Get access to your data
  • Get it by running grid-proxy-init

28
Now see for yourself
29
Getting a Certificate
  • Initialize your environment for the Grid
  • Use the Globus local guide from
    http://www.dutchgrid.nl/Support/
  • Send the result to ca@nikhef.nl; you will be
    contacted by phone
  • Put the certificate (sent by mail) in
    your $HOME/.globus/usercert.pem
  • Or use the Web at
    http://certificate.nikhef.nl/userhelp.html

30
Using the Grid
  • Request authorization: grid.support@nikhef.nl
  • Look what is out there using grid-info-search or
    http://marianne.in2p3.fr/datagrid/giis/giis-browse.html
  • Try some local hosts:
  • bilbo, kilogram, triangel
  • kilogramdavidg1009 globus-job-run
    dommel.wins.uva.nl /usr/ucb/quota -v
  • Disk quotas for random (uid 12xxx):
  • Filesystem usage quota limit timeleft
    files quota limit timeleft
  • /home/random 13067 1500000 2000000
    0 0 0
  • kilogramdavidg1010
  • Start running your analysis/MC/other jobs

31
grid-proxy-init
  • kilogramdavidg1003 grid-proxy-init
  • Your identity: /O=dutchgrid/O=users/O=nikhef/CN=David
    Groep
  • Enter GRID pass phrase for this identity:
    PassPhrase
  • Creating proxy ...................................
    . Done
  • Your proxy is valid until Wed Sep 26 05:50:53
    2001
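
Because a proxy expires (12 hours by default), scripts that run long jobs may want to check how much lifetime is left. The sketch below only does the date arithmetic on a "valid until" timestamp like the one printed above; it does not call any Globus tool, the helper names are hypothetical, and parsing `%a`/`%b` assumes an English (C) locale.

```python
from datetime import datetime, timedelta

# Default proxy lifetime as stated on the slides.
DEFAULT_LIFETIME = timedelta(hours=12)


def parse_expiry(text):
    """Parse a timestamp such as 'Wed Sep 26 05:50:53 2001'."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def time_left(expiry, now):
    """Remaining proxy lifetime, never negative."""
    return max(expiry - now, timedelta(0))


expiry = parse_expiry("Wed Sep 26 05:50:53 2001")
issued = expiry - DEFAULT_LIFETIME  # when a default-lifetime proxy was created
print("issued around:", issued)
print("left after 10 hours:", time_left(expiry, issued + timedelta(hours=10)))
```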

32
GridFTP
  • Universal high-performance file transfer
  • Extends the FTP protocol with:
  • Single sign-on (GSI, GSSAPI, RFC2228)
  • Parallel streams for speed-up
  • Striped access (ftp from multiple sites to be
    faster)
  • Clients: gsincftp, globus-url-copy
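
The idea behind parallel streams can be illustrated by cutting one file into byte ranges, one per stream, that are moved concurrently and reassembled by offset. This is a conceptual sketch only and not how GridFTP lays out data on the wire; the function name and the equal-split policy are assumptions.

```python
def split_ranges(size, streams):
    """Return (offset, length) pairs covering [0, size) in near-equal parts.

    Earlier ranges get one extra byte when size is not divisible by streams;
    empty ranges are dropped when there are more streams than bytes.
    """
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams >= 1")
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


for offset, length in split_ranges(10, 3):
    print(f"stream reads bytes {offset}..{offset + length - 1}")
```

Striped access follows the same pattern, except that the ranges are fetched from different servers holding copies or parts of the file.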

33
What's Next?
  • Some of the nice user features to come:
  • Finding data files by characteristics (give me
    all golden decays)
  • Moving your job to where the data is
  • Automatic partitioning of jobs
  • Support for truly interactive work
  • Better network utilisation (faster access to
    data)
  • If you are in the DataGrid project, ask your WP
    leader for authorization in TB1