1
The GSI Mass Storage for Experiment Data
  • DVEE-Palaver GSI Darmstadt
  • Feb. 15, 2005
  • Horst Göringer, GSI Darmstadt
  • H.Goeringer@gsi.de

2
Overview
  • different views
  • current status
  • last enhancements
  • - write cache
  • - on-line connection to DAQ
  • future plans
  • conclusions

3
GSI Mass Storage System
  • Gsi mass STORagE system
  • gstore

4
gstore storage view
5
gstore hardware view
  • 3 automatic tape libraries (ATL)
  • (1) IBM 3494 (AIX)
  • 8 tape drives IBM 3590 (14 MByte/s)
  • ca. 2300 volumes (47 TByte, 13 TByte backup)
  • 1 data mover (adsmsv1)
  • access via adsmcli, RFIO (read access)
  • read cache 1.1 TByte
  • StagePool, RetrievePool

6
gstore hardware view
  • (2) StorageTek L700 (Windows 2000)
  • 8 tape drives LTO2 ULTRIUM (35 MByte/s)
  • ca. 170 volumes (32 TByte)
  • 8 data movers (gsidmxx), connected via SAN
  • access via tsmcli, RFIO
  • read cache 2.5 TByte
  • StagePool, RetrievePool
  • write cache
  • ArchivePool 0.28 TByte
  • DAQPool 0.28 TByte

7
gstore hardware view
  • (3) StorageTek L700 (Windows 2000)
  • 4 tape drives LTO1 ULTRIUM (15 MByte/s)
  • ca. 80 volumes (10 TByte)
  • backup copy of 'irrecoverable' archives
    ...raw
  • mainly for backup of user data (~30 TByte)

8
gstore software view
  • 2 major components
  • TSM (Tivoli Storage Manager) commercial
  • handles tape drives and robots
  • data base
  • GSI software (~80,000 lines of code)
  • C, sockets, threads
  • - interface to user (tsmcli / adsmcli,
    RFIO)
  • - interface to TSM (TSM API client)
  • - cache administration

9
gstore user view: tsmcli
  • tsmcli subcommands (usage examples below)
  • archive <file> <archive> <path>
  • retrieve <file> <archive> <path>
  • query <file> <archive> <path>
  • stage <file> <archive> <path>
  • delete <file> <archive> <path>
  • ws_query <file> <archive> <path>
  • pool_query <pool>
  • any combination of wildcard characters (*, ?) allowed
  • soon: <file> may contain a list of files (with wildcard chars)
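
For illustration only, a few possible invocations following the syntax above; the archive name, path, and file pattern are hypothetical, and the exact quoting of wildcards may differ in the real client:

    tsmcli archive run042*.lmd myexp /lmd/feb05     (archive all matching local files)
    tsmcli stage   run042*.lmd myexp /lmd/feb05     (stage them into the read cache)
    tsmcli query   *           myexp /lmd/feb05     (list what is stored under this path)
    tsmcli pool_query StagePool                     (show the state of the stage pool)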

10
gstore user view: RFIO
  • rfio_fopen
  • rfio_fread
  • rfio_fwrite
  • rfio_fclose
  • rfio_fstat
  • rfio_lseek
  • GSI extensions (for on-line DAQ connection)
  • rfio_fendfile
  • rfio_fnewfile (usage sketches below)
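
The RFIO calls mirror the standard C stdio functions. Below is a minimal read sketch; the header name, the RFILE handle type, and the remote file name are assumptions for illustration and may differ from the actual GSI client library:

/* minimal RFIO read sketch; signatures assumed to follow stdio,
   header name and RFILE type are assumptions                    */
#include <stdio.h>
#include "rfio.h"                /* assumed client header */

int main(void)
{
    /* hypothetical remote file name */
    char name[] = "/myexp/lmd/feb05/run042_001.lmd";
    char buf[64 * 1024];
    int n;

    RFILE *fp = rfio_fopen(name, "r");           /* open file in gstore */
    if (fp == NULL) {
        fprintf(stderr, "cannot open %s\n", name);
        return 1;
    }
    while ((n = rfio_fread(buf, 1, sizeof(buf), fp)) > 0) {
        /* process n bytes of event data here */
    }
    rfio_fclose(fp);                             /* close remote file */
    return 0;
}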

11
gstore server view: query
12
gstore server view: archive to cache
13
gstore server view: archive from cache
14
gstore server view: retrieve from tape
15
server view: retrieve from write cache
16
gstore overall server view
17
server view: gstore design concepts
  • strict separation of control and data flow (sketch below)
  • no bottleneck for data
  • scalable in
  • capacity (tape and disk)
  • I/O bandwidth
  • hardware independent
  • (as long as supported by TSM)
  • platform independent
  • unique name space
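
To make the separation of control and data flow concrete, here is a toy sketch of the idea; the ticket structure, function names, and the two-step protocol are illustrative assumptions, not the actual gstore protocol:

#include <stdio.h>

/* illustrative only: control messages go to a small entry server,
   the bulk data then flows directly between client and data mover */
typedef struct {
    char data_mover[64];         /* host serving the actual data transfer */
    int  port;
} transfer_ticket;

/* placeholder for asking the entry server where a file can be read;
   a real client would send a small control message over a socket    */
static transfer_ticket request_retrieve(const char *archive, const char *file)
{
    (void)archive; (void)file;
    transfer_ticket t = { "gsidm03", 1234 };     /* hypothetical data mover */
    return t;
}

/* placeholder for the direct data connection to the selected mover */
static long pull_data(const transfer_ticket *t, const char *local_path)
{
    (void)t; (void)local_path;
    return 0;                                    /* no real transfer in this sketch */
}

int main(void)
{
    /* 1) control flow: small request/reply with the entry server */
    transfer_ticket t = request_retrieve("myexp", "run042_001.lmd");

    /* 2) data flow: fetch the file directly from the data mover, so the
          entry server never becomes a bottleneck for the data           */
    long nbytes = pull_data(&t, "/tmp/run042_001.lmd");
    printf("received %ld bytes from %s\n", nbytes, t.data_mover);
    return 0;
}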

18
server view: cache administration
  • multithreaded servers for read and write cache
  • each with own metadata DB
  • main tasks
  • - lock/unlock files
  • - select data movers and file systems (selection sketch below)
  • - collect current info on
  • disk space
  • soon: data mover and disk load -> load balancing
  • - trigger asynchronous archiving
  • - disk cleaning
  • several disk pools with different attributes
  • StagePool, RetrievePool, ArchivePool,
    DAQPool, ...
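
As a rough illustration of the "select data movers and file systems" task, a toy selection routine; the fs_info fields and the least-load/most-space criterion are assumptions, not the real cache administration logic:

#include <stddef.h>

/* illustrative per-file-system state the cache server might keep */
struct fs_info {
    const char *data_mover;      /* e.g. "gsidm05" (hypothetical)  */
    const char *mount_point;     /* file system on that data mover */
    long long   free_bytes;      /* last collected free disk space */
    int         active_jobs;     /* running read/write processes   */
};

/* pick the file system on the least loaded mover, preferring more
   free space on ties; the real gstore criteria may differ         */
const struct fs_info *select_fs(const struct fs_info *fs, size_t n)
{
    const struct fs_info *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL ||
            fs[i].active_jobs < best->active_jobs ||
            (fs[i].active_jobs == best->active_jobs &&
             fs[i].free_bytes  > best->free_bytes))
            best = &fs[i];
    }
    return best;
}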

19
usage profile: batch farm
  • batch farm: 120 dual-processor nodes
  • -> highly parallel mass storage access (read and write)
  • read requests
  • 'good' user: stages all files beforehand
  • using wildcard chars
  • 'bad' user: reads lots of single files from tape
  • 'bad' system: stage disk / DM crashes during analysis
  • write requests
  • via write cache
  • distribute as uniformly as possible

20
usage profile: experiment DAQ
  • several continuous data streams from DAQ
  • keep the same DM during the lifetime of a data stream
  • only via RFIO
  • GSI extensions necessary
  • rfio_fendfile, rfio_fnewfile (sketch below)
  • disks are emptied faster than filled
  • network -> disk: ~10 MByte/s
  • disk -> tape: ~30 MByte/s
  • -> time to stage for on-line analysis
  • enough disk buffer necessary in case of problems
  • (robot, TSM, ...)
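
The slides only name the two GSI extensions; the sketch below shows how a DAQ writer might use them, assuming rfio_fendfile() takes the open remote handle and rfio_fnewfile() takes the handle plus the next file name. Both signatures, the read_event() helper, and the file names are assumptions for illustration only:

#include <stdio.h>
#include <stddef.h>
#include "rfio.h"                /* assumed client header */

/* hypothetical DAQ readout: fills buf, returns bytes read (0 = stop) */
extern int read_event(void *buf, size_t len);

/* hypothetical writer loop: one long-lived RFIO stream, many files,
   always staying on the same data mover (DAQPool)                    */
void daq_stream(RFILE *fp, int nfiles, int events_per_file)
{
    char buf[32 * 1024];
    char name[256];

    for (int i = 0; i < nfiles; i++) {
        for (int ev = 0; ev < events_per_file; ev++) {
            int n = read_event(buf, sizeof(buf));
            if (n <= 0)
                break;                           /* DAQ stopped or paused   */
            rfio_fwrite(buf, 1, (size_t)n, fp);  /* stream into the DAQPool */
        }
        rfio_fendfile(fp);                       /* close current file, keep connection (assumed) */
        if (i + 1 < nfiles) {
            snprintf(name, sizeof(name), "run042_%03d.lmd", i + 2);
            rfio_fnewfile(fp, name);             /* start next file on the same DM (assumed) */
        }
    }
}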

21
current plans: new hardware
  • more and safer disks
  • write cache all RAID
  • 4 TByte (ArchivePool, DAQPool)
  • read cache 7.5 TByte new RAID
  • StagePool, RetrievePool,
  • new pools, e.g. with longer file lifetime
  • 5 new data movers
  • new fail-safe entry server
  • hosts query server, cache administration servers
  • -> query performance!
  • take-over in case of host failure
  • metadata DBs mirrored on 2nd host

22
current plans: merge tsmcli / adsmcli
  • new command gstore
  • replaces tsmcli and adsmcli
  • unique name space (already available)
  • users need not care in which robot data reside
  • new archive policy computing center

23
brief excursion: future of IBM 3494?
  • still heavily used
  • rather full
  • hardware highly reliable
  • should be decided this year!

24
usage IBM 3494 (AIX)
25
brief excursion: future of IBM 3494?
  • 2 extreme options (and more in between)
  • no more money investment
  • use as long as possible
  • in a few years move data to other robot
  • upgrade tape drives and connect to SAN
  • 3590 (30 GB, 14 MB/s) -> 3592 (300 GB, 40 MB/s)
  • new media: > 700 TByte capacity
  • access with available data movers via SAN
  • new fail-safe TSM server (Linux?)

26
current plans: load balancing
  • acquire current info on the no. of read/write processes
  • for each disk, data mover, pool
  • new write request
  • select resource with lowest load
  • new read request
  • avoid 'hot spots'
  • -> create additional instances of staged files
  • new option '-randomize' for stage/retrieve (sketch below)
  • distribute equally to different data movers / disks
  • split into n (parallel) jobs
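
A toy sketch of what the planned '-randomize' splitting could look like; the round-robin distribution over data movers and the function interface are assumptions, not the actual implementation:

#include <stdio.h>

/* illustrative only: split a file list into njobs parallel jobs and
   spread the jobs round-robin over the available data movers        */
void split_jobs(const char **files, int nfiles,
                const char **movers, int nmovers, int njobs)
{
    for (int f = 0; f < nfiles; f++) {
        int job   = f % njobs;                   /* which parallel job      */
        int mover = job % nmovers;               /* which data mover / disk */
        printf("job %d on %s: %s\n", job, movers[mover], files[f]);
    }
}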

27
current plans: new org. of DMs
  • Linux platform
  • more familiar environment (shell scripts, Unix commands, ...)
  • case-sensitive file names
  • current mainstream OS for experiment data processing
  • '2nd level' data movers
  • no SAN connection
  • disks filled via ('1st level') DMs with SAN connection
  • for stage pools with guaranteed lifetime of files

28
current plans: new org. of DMs
  • integration of selected group file servers
  • as '2nd level' data movers
  • disk space (logically) reserved for owners
  • pool policy according to owners
  • many advantages
  • no NFS -> much faster I/O
  • files physically distributed over
    several servers
  • load balancing of gstore
  • disk cleaning
  • disadvantages
  • only for exp. data, access via gstore
    interface

29
current plans: user interface
  • a large number of user requests
  • - longer file names
  • - option to rename files
  • - more specific return codes
  • - ...
  • program code consolidation
  • improved error recovery after HW failures
  • support for successor of AliEn
  • GRID support
  • - gstore as Storage Element (SE)
  • - Storage Resource Manager (SRM)
  • -> new functionalities, e.g. reserve resources

30
Conclusions
  • GSI concept for mass storage successfully
    verified
  • hardware and platform independent
  • scalable in capacity and bandwidth to keep up
    with
  • - requirements of future batch farm(s)
  • - data rates of future experiments
  • gstore able to manage very different usage
    profiles
  • but still a lot of work ...
    to fully realize all the plans discussed