Intelligent Storage and Data Management with IBM General Parallel File System (GPFS)

Transcript and Presenter's Notes
1
Intelligent Storage and Data Management with IBM
General Parallel File System (GPFS)
  • Maciej REMISZEWSKI
  • IBM Forum 2012 Estonia
  • Tallinn, October 9, 2012

2
Red Bull STRATOS redbullstratos.com/live/
3
(No Transcript)
4
(No Transcript)
5
Critical IT Trends for Technical Computing Users
  • Explosion of data: How to spot trends, predict
    outcomes and take meaningful actions?
  • Inflexible IT infrastructures: How to manage
    inflexible, siloed systems and business processes
    to improve business agility?
  • Escalating IT complexity: How to manage IT costs
    and complexity while speeding time-to-market for
    new services?
6
Introducing the new IBM Technical Computing
Portfolio: Powerful. Comprehensive. Intuitive.
  • Solutions: HPC Cloud, Integrated Solutions,
    Industry Solutions, Intelligent Cluster, Big Data
  • Software: Platform HPC, Parallel Environment
    Runtime, Parallel Environment Developer, Platform
    LSF, Platform Symphony, Platform Application
    Center, GPFS, Platform MPI, Engineering and
    Scientific Libraries, Platform Cluster Manager
  • Systems and Storage: BG/Q, SoNAS, LTO Tape, 3592,
    Automation, DS5000, DS3000, PureFlex, System x,
    BladeCenter, iDataPlex, Intelligent Cluster,
    DCS3700, P7-775
7
NEW HPC Cloud Solutions
  • Overview
  • Innovative solutions for dynamic, flexible HPC
    cloud environments
  • What's New
  • New LSF add-on: IBM Platform Dynamic Cluster V9.1
  • Workload-driven dynamic node re-provisioning
  • Dynamically switch nodes between physical and
    virtual machines
  • Automated job checkpoints and migration
  • Smart, flexible policy and performance controls
  • Enhanced Platform Cluster Manager Advanced
    capabilities
  • New complete, end-to-end solutions

8
NEW Financial Risk and Crimes Solution
  • Overview
  • High-performance, low-latency integrated risk
    solution stack with Platform Symphony Advanced
    Edition and partner products, including
  • BigInsights and IBM Algorithmics
  • 3rd-party partner products: Murex and Calypso
  • What's New
  • New solution stacks to manage and process big
    data with speed and scale
  • Sales tools that highlight the value of IBM
    Platform Symphony
  • Financial Risk: customer testimonial videos,
    inclusion in SWG risk frameworks, and SD
    blueprints
  • Financial Crime with BigInsights for credit card
    fraud analytics
  • TCO tool and benchmarks

Use Case 1: Financial Risk, including Credit Value
Adjustment (CVA) analytics
  • Accelerates compute-intensive workloads up to 4X,
    e.g. Monte Carlo simulations, Algorithmics
    RiskWatch cube simulations
  • Integrated with IBM Algorithmics, Murex and
    Calypso
  • High throughput: 17K tasks/sec

Use Case 2: Big Data for Financial Crimes
  • Accelerates analysis of data for fraud and
    irregularities
  • Supports BigInsights
  • Faster than the Apache Hadoop distribution

9
NEW Technical Computing for Big Data Solutions
  • Overview
  • High-performance, low-latency Big Data solution
    stack featuring Platform Symphony, GPFS, DCS3700
    and Intelligent Cluster, proven across many
    industries
  • Low-latency Hadoop stack with Platform Symphony
    Advanced Edition and InfoSphere BigInsights
  • What's New
  • New solution stacks to manage and process big
    data with speed and scale

10
IBM is delivering a Smarter Computing foundation
for Technical Computing
Designed for data
Achieve faster time to insight with scalable,
low-latency data access and control for Big Data
analytics
Tuned to the task
Increase throughput and utilization, and lower
operating costs, with workload-optimized systems
and intelligent resource management
Managed with cloud technologies
Optimize agility with on-demand and
workload-driven dynamic cluster, grid, and HPC
clouds
11
  • Technical Computing Software
  • Simplified management, optimized performance
  • The backbone of Technical Computing

12
IBM acquired Platform Computing, a leader in
cluster, grid, and HPC cloud management software
  • 20-year history delivering leading management
    software for technical computing and analytics
    distributed computing environments
  • Enables the use and management of 1,000s of
    systems as one, powering the evolution from
    clusters to grids to HPC clouds
  • 2,000 global customers, including 23 of the 30
    largest enterprises
  • Market-leading scheduling engine with high
    performance, mission-critical reliability and
    extreme scalability
  • Comprehensive capability footprint, from
    ready-to-deploy complete cluster systems to large
    global grids
  • Heterogeneous systems support
  • Large ISV and global partner ecosystem
  • Global services and support coverage

De facto standard for commercial HPC
60 of the top Financial Services firms
Over 5 million CPUs under management
As of June 2012, the IBM Platform Computing
portfolio is ready to deploy!
13
IBM Platform Computing can help accelerate your
application results
For technical computing and analytics distributed
computing environments
Optimizes Workload Management (see the submission
sketch after these lists)
  • Batch and highly parallelized workloads
  • Policy- and resource-aware scheduling
  • Service level agreements
  • Automation / workflow

Aggregates Resource Pools
  • Compute- and data-intensive apps
  • Heterogeneous resources
  • Physical, virtual, cloud
  • Easy user access

Delivers Shared Services
  • Multiple user groups, sites
  • Multiple applications and workloads
  • Governance
  • Administration / Reporting / Analytics

Transforms Static Infrastructure to Dynamic
  • Workload-driven dynamic clusters
  • Bursting and in the cloud
  • Enhanced self-service / on-demand
  • Multi-hypervisor and multi-boot
14
Clients span many industries
Watch Red Bull video
15
IBM also offers the most widely used,
commercially available technical computing
data management software
16
GPFS pioneered Big Data management
Extreme Scalability
  • 2^63 files per file system
  • Maximum file system size: 2^99 bytes (see the
    conversion sketch below)
  • Maximum file size equals the file system size
  • 5.4 PB production file system
  • Number of nodes: 1 to 8,192
Proven Reliability
  • No special nodes
  • Add/remove nodes and storage on the fly
  • Rolling upgrades
  • Administer from any node
  • Data replication
Performance
  • High-performance metadata
  • Striped data
  • Equal access to data
  • Integrated tiered storage
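
A quick back-of-the-envelope conversion of the architectural
limits above into more familiar units (a sketch using binary
units):

    # Back-of-the-envelope conversion of the GPFS architectural limits above.
    max_files = 2 ** 63        # files per file system
    max_bytes = 2 ** 99        # maximum file system size, in bytes

    PiB = 2 ** 50              # pebibyte
    YiB = 2 ** 80              # yobibyte

    print(f"maximum files            : {max_files:.3e}")             # ~9.2e18 files
    print(f"maximum file system size : {max_bytes / YiB:,.0f} YiB")  # 524,288 YiB
    print(f"5.4 PB production system : {5.4e15 / max_bytes:.1e} of the limit")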
17
IBM innovation continues with GPFS Active File
Management (AFM) for global namespace
(Timeline: 1993, 2005, 2011)
18
(Diagram) GPFS eliminating data islands:
storage-rich servers (SNC), legacy HPC storage via
NSD and GPFS Native RAID (GNR), TSM/HPSS, legacy
NFS storage and a MapReduce farm, connected
through AFM and AFM multicluster.
19
How can GPFS deliver value to your business?
20
Speed time-to-market with faster analytics
  • Issue
  • We are in the era of Smarter Analytics
  • Data explosion makes I/O a major hurdle
  • Deep analytics result in longer-running workloads
  • Demand for lower-latency analytics to beat the
    competition
  • GPFS was designed for complex and/or large
    workloads accessing lots of data
  • Real-time disk scheduling and load balancing
    ensure all relevant information and data can be
    ingested for analysis
  • Built-in replication ensures that deep analytics
    workloads can continue running should a hardware
    or low-level software failure occur
  • Distributed design means it can scale as needed

21
Reduce storage costs through lifecycle management
  • Issue
  • Increasing storage costs as dormant files sit on
    spinning disks
  • Redundant files stored across the enterprise to
    ease access
  • Aligning user file requirements with the cost of
    storage
  • GPFS has policy-driven, automated tiered storage
    management for optimizing file location (see the
    policy sketch after this list)
  • ILM tools manage sets of files across pools of
    storage based upon user requirements
  • Tiering across different economic classes of
    storage (SSD, spinning disk, tape) regardless of
    physical location
  • Interfaces with external storage subsystems such
    as TSM and HPSS to exploit ILM capability
    enterprise-wide

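A minimal sketch of what the policy-driven tiering above can
look like in practice: a GPFS placement/migration policy
applied in dry-run mode with mmapplypolicy. The pool names,
the 30-day threshold and the file system path are
assumptions; check the exact policy grammar and command
options against the GPFS documentation for your release.

    import subprocess
    import tempfile

    # Illustrative policy; pool names 'system'/'nearline' and the
    # 30-day access threshold are assumptions, not from the deck.
    POLICY = """
    /* Place new files on the fast 'system' pool by default. */
    RULE 'default_placement' SET POOL 'system'

    /* When the system pool passes 80% full, migrate files not accessed
       for 30 days down to the 'nearline' pool until it drops to 60%. */
    RULE 'move_cold_files' MIGRATE FROM POOL 'system'
        THRESHOLD(80,60) TO POOL 'nearline'
        WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
    """

    def test_policy(filesystem: str = "/gpfs/fs1") -> None:
        """Write the policy to a temp file and ask mmapplypolicy for a dry run."""
        with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
            f.write(POLICY)
            policy_file = f.name
        # '-I test' evaluates the rules without moving any data.
        subprocess.run(
            ["mmapplypolicy", filesystem, "-P", policy_file, "-I", "test"],
            check=True)

    if __name__ == "__main__":
        test_policy()
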
22
Maintain business continuity through disaster
recovery
  • Issue
  • Need for real-time or low-latency file access
  • File data contained in geographic areas
    susceptible to downtime
  • Fragmented file-based information across a wide
    geographic area
  • GPFS has inherent features that are designed to
    ensure high availability of file-based data
  • Remote file replication with built-in failover
  • Multi-site clustering enables risk reduction of
    stored data via WAN
  • Space-efficient point-in-time snapshot view of
    the file system, enabling quick recovery (see the
    snapshot sketch after this list)

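A minimal sketch of the snapshot capability mentioned above,
driven from a script. The file system device name 'gpfs0' and
the snapshot naming scheme are assumptions; verify the
command options against your GPFS level's documentation.

    import subprocess
    from datetime import datetime, timezone

    def snapshot(device: str = "gpfs0") -> str:
        """Create a snapshot named after the current UTC time and return its name."""
        name = "backup_" + datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
        # mmcrsnapshot creates a read-only, point-in-time view of the file system.
        subprocess.run(["mmcrsnapshot", device, name], check=True)
        return name

    def list_snapshots(device: str = "gpfs0") -> None:
        """Show the snapshots currently held for the file system."""
        subprocess.run(["mmlssnapshot", device], check=True)

    if __name__ == "__main__":
        print("created", snapshot())
        list_snapshots()
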
23
Innovate with Big Data or Map-Reduce/Hadoop
  • Issue
  • Unlocking value in large volumes of unstructured
    data
  • Mission-critical applications requiring
    enterprise-tested reliability
  • Looking for alternatives to the Hadoop Distributed
    File System (HDFS) for MapReduce applications
  • As part of an IBM Research project, there is an
    active development effort called GPFS-SNC to
    provide a robust alternative to HDFS
  • HDFS relies on a centralized metadata server (the
    NameNode) that is a single point of failure,
    unlike the distributed design of GPFS
  • GPFS POSIX compliance expands the range of
    applications that can access files (read, write,
    append) vs. HDFS, which cannot append or overwrite
    (see the sketch after this list)
  • GPFS contains all of the rich ILM features for
    high availability and storage management; HDFS
    does not

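Because GPFS presents a POSIX file system, the
append/overwrite contrast with HDFS can be shown with nothing
more than standard file I/O. A minimal sketch, assuming a
hypothetical GPFS mount point of /gpfs/fs1:

    import os

    # Hypothetical path on a GPFS mount; any node in the cluster sees the same file.
    LOG = "/gpfs/fs1/analytics/results.log"

    def append_record(line: str) -> None:
        """Append a line to a shared file using plain POSIX semantics."""
        os.makedirs(os.path.dirname(LOG), exist_ok=True)
        with open(LOG, "a") as f:     # ordinary append, no special API required
            f.write(line + "\n")

    def overwrite_header(header: str) -> None:
        """Overwrite bytes at the start of the file in place, also plain POSIX."""
        with open(LOG, "r+") as f:
            f.seek(0)
            f.write(header)

    if __name__ == "__main__":
        append_record("job 42 finished")
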
24
Knowledge management and efficiency through file
sharing
  • Issue
  • Geographically dispersed employees need access to
    the same set of file-based information
  • Supporting follow-the-sun product engineering
    and development processes (CAD, CAE, etc.)
  • Managing and integrating the workflow of highly
    fragmented and geographically dispersed file data
    generated by employees
  • GPFS global namespace support and Active File
    Management provide core capabilities for file
    sharing
  • Global namespace enables a common view of files
    no matter where the requestor or the file resides
  • Active File Management handles file version
    control to ensure integrity
  • Parallel data access allows large numbers of
    files and people to collaborate without
    performance impact

25
  • Intelligent Cluster
  • System x and iDataPlex
  • Optimized platforms to right-size your Technical
    Computing operations

26
IBM leadership for a new generation of Technical
Computing
  • Technical Computing is no longer just the domain
    of large problems
  • Businesses of all sizes need to harness the
    explosion of data for business advantage
  • Workgroups and departments are increasingly using
    clustering at a smaller scale to drive new
    insights and better business outcomes
  • Smaller groups lack the skills and resources to
    deploy and manage the system effectively
  • IBM brings experience in supercomputing to
    smaller workgroup and department clusters with
    IBM Intelligent Cluster
  • Reference solutions for simple deployment across
    a range of applications
  • Simplified end-to-end deployment and resource
    management with Platform HPC software
  • Factory integrated and installed by IBM
  • Supported as an integrated solution
  • Now even easier with IBM Platform Computing

IBM Technical Computing expertise
IBM intelligence for clusters of all sizes!
27
IBM Intelligent Cluster: it's about faster
time-to-solution
Take the time and risk out of Technical Computing
deployment
IBM Intelligent Cluster: a factory-integrated,
interoperability-tested system with compute,
storage, networking and cluster management
tailored to your requirements and supported as a
solution!
  • Building blocks: industry-leading IBM and
    3rd-party components (management servers, compute
    nodes, networking, storage, OS, cluster
    management)
  • Design, Build, Test, Install, Support
  • Allows clients to focus on their business, not
    their IT, backed by IBM
28
IBM Intelligent Cluster simplifies large and
small deployments
  • Research, large: LRZ SuperMUC, Europe-wide
    research cluster, 9,587 servers, direct
    water-cooled
  • Research, small: University of Chile, earthquake
    prediction and astronomy, 56 servers, air-cooled
  • Media, large: Illumination Entertainment, 3D
    feature-length movies, 800 iDataPlex servers,
    Rear-Door Heat eXchanger cooled
  • Media, small: Kantana Animation Studios, Thailand
    television production, 36 iDataPlex servers,
    air-cooled
29
  • Technical Computing Storage
  • Complete, scalable, dense solutions from a
    single vendor

30
IBM System Storage for Technical Computing
SONAS
  • Complete, scalable, integrated solutions from a
    single vendor
  • Scaling to multiple petabytes and hundreds of
    gigabytes/sec
  • Industry-leading data management software and
    services
  • Big Green features lower overall costs
  • Worldwide support and service

31
IBM's Densest Storage Solution Just Got Better
IBM System Storage DCS3700 Performance Module
6Gb/s x4 SAS-based storage system
  • Expandable performance, scalability and density
    starting at entry-level prices
  • Powerful hardware platform
  • 2.13GHz quad core processor
  • 12, 24 or 48GB cache / controller pair
  • 8x base 8Gb FC ports / controller pair
  • Additional host port options via Host Interface
    Cards
  • Drastically Improved Performance
  • Supports up to 360 drives
  • Fully supports features in recent and upcoming
    releases
  • 10.83 feature set
  • DDP, Enhanced FlashCopy, FlashCopy Consistency
    Groups, Thin Provisioning, ALUA, VAAI

32
Expanded Capabilities of IBM's Densest Storage
Solution
IBM System Storage DCS3700 now with Performance
Module Option 6Gb/s x4 SAS-based storage system
  • Expandable performance, scalability and density
    starting at entry-level prices
  • New DCS3700 Performance Controller
  • High density storage system designed for General
    Purpose Computing and High Performance Technical
    Computing applications
  • IBM's densest disk system: 60 drives and dual
    controllers in 4U, now scaling to over 1PB per
    system with 3TB drives
  • New Dynamic Disk Pooling feature enables
    easy-to-configure, worry-free storage, reducing
    maintenance requirements and delivering consistent
    performance
  • New Thin Provisioning, ALUA, VAAI, Enhanced
    FlashCopy features deliver increased utilization,
    higher efficiency, and performance
  • Superior serviceability and easy installation
    with front load drawers
  • Bullet-proof reliability and availability
    designed to ensure continuous high-speed data
    delivery

33
The DCS3700 Can Scale in Clusters with IBM GPFS
  • Combining IBM's GPFS clustered file management
    software and the DCS3700 creates an extremely
    scalable and dense file-based management system
  • Using a flexible architecture, building blocks
    of DCS3700 and GPFS can be organized (the scaling
    sketch after the table builds on these figures)
                               | Single Building Block            | Two Building Blocks
Configuration                  | 2 GPFS x3650 Servers, 3 DCS3700  | 4 GPFS x3650 Servers, 6 DCS3700
Capacity (Raw / Usable)        | 360TB / 262TB                    | 720TB / 524TB
Streaming Rate (Write / Read)  | Up to 4.8 GB/s / Up to 5.5 GB/s  | Up to 9.6 GB/s / Up to 11.0 GB/s
IOP Rate, 4K (Write / Read)    | 3,600 IOP/s / 6,000 IOP/s        | 7,200 IOP/s / 12,000 IOP/s
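
As a rough illustration of the building-block scaling in the
table above, the sketch below extrapolates the single-block
figures linearly. It is a back-of-the-envelope estimate only;
actual sizing should use IBM configuration tools.

    # Per-block figures taken from the table above.
    PER_BLOCK = {
        "usable_tb": 262,        # usable capacity per building block, TB
        "write_gbs": 4.8,        # streaming write rate, GB/s
        "read_gbs": 5.5,         # streaming read rate, GB/s
        "write_iops_4k": 3600,   # 4K write IOP/s
        "read_iops_4k": 6000,    # 4K read IOP/s
    }

    def estimate(blocks: int) -> dict:
        """Scale the per-block figures linearly to 'blocks' building blocks."""
        return {key: value * blocks for key, value in PER_BLOCK.items()}

    if __name__ == "__main__":
        for n in (1, 2, 4):
            est = estimate(n)
            print(f"{n} block(s): {est['usable_tb']} TB usable, "
                  f"{est['read_gbs']:.1f} GB/s read, {est['read_iops_4k']} read IOP/s")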
34
  • Customer Success Stories
  • Applying IBM technology and experience
  • to solve real-world issues and deliver value

35
National Digital Archive (NDA), Poland: heritage
and cultural preservation for society
The Need: NDA needed a cost-effective IT solution
that it could use to significantly increase
internal efficiencies and archiving capacity. NDA
currently holds almost 15 million photographs,
30,000 sound recordings and 2,500 films, and it
provides free access to these materials.
The Solution: The client implemented a solution
based on IBM Power Systems servers, IBM System
Storage devices, and IBM GPFS. To provide
scalability for the ongoing work, CompFort
Meridian helped NDA implement an IBM Power 750
Express server. Using this system, the client
will be able to rapidly expand the digital
archive without impeding the performance of the
ZoSIA service. NDA also uses IBM General Parallel
File System to conduct online storage management
and integrated information lifecycle management
and to scale accessibility to the expanding
volume of archived material. Using this solution,
the client can maintain the performance of the
ZoSIA service even when numerous users access the
same resource at the same time.
The Benefit: NDA saved its nearly 290,000 users
an estimated 35 million by enabling them to check
archives online rather than spending the time and
money required to visit NDA in person. The client
also gained a high-performance, stable and secure
solution to support the ZoSIA online archive
system. In addition, with this solution in place,
NDA can consolidate national and cultural
remembrance and extend a sense of national origin
and heritage into the future.
  • Solution components
  • IBM Power
  • IBM General Parallel File System

36
DEISA: Enabling 15 European supercomputing centers
to collaborate
The Need: DEISA wanted to advance European
computational science through close collaboration
between Europe's most important supercomputing
centers by supporting challenging computing tasks
and sharing data across a wide-area network.
The Solution: To allow different IBM and non-IBM
supercomputer architectures to access data from
across a wide-area network, DEISA worked closely
with the IBM Deep Computing team to create a
global multicluster file system based on IBM
General Parallel File System (GPFS).
"Our work with IBM GPFS demonstrates the
viability and usefulness of a global file system
for collaboration and data-sharing between
supercomputing centers, even when the individual
supercomputing clusters are based on very
different technical architectures. The
flexibility of GPFS and its ability to support
all the different DEISA supercomputers is highly
impressive." Dr. Stefan Heinzel, Director of
the Rechenzentrum Garching at the Max Planck
Society
  • The Benefit
  • Allows scientists in different countries to share
    supercomputing resources and collaborate on
    large-scale projects
  • Enables allocation of specific computing tasks to
    the most suitable supercomputing resources,
    boosting performance
  • Provides a rapid, reliable and secure shared file
    system for a variety of supercomputing
    architectures, both IBM and non-IBM
  • Solution components
  • IBM Power Systems
  • IBM BlueGene/P
  • IBM PowerPC
  • Several non-IBM supercomputing architectures,
    including Cray XT5, NEC SX-8 and SGI Altix
  • IBM General Parallel File System (GPFS)

37
Snecma: HPC to achieve regulatory objectives
The Need: To reduce its impact on the
environment, Snecma focuses on a number of key
factors, such as reducing fuel consumption and
therefore greenhouse gas emissions, reducing
noise, and choosing environmentally friendly
materials for manufacturing and maintenance of
aviation engines. The company was required to
meet the 'Vision 2020' plan set by the European
Community. The plan defines the European aviation
industry's objectives for 2020, with ambitious
environmental targets, including a 50% reduction
in perceived noise and CO2 emissions per
passenger-kilometer and an 80% reduction in
nitrogen oxide (NOx) emissions compared to the
year 2000. To meet these objectives, Snecma
needed heavy investment in research and
development, powered by supercomputers.
The Solution: Snecma implemented a powerful high
performance computing (HPC) environment with
optimal energy efficiency. The core architectural
components were based upon highly dense,
low-power server cluster packaging, low-latency
interconnect, and a high performance parallel
file system. Thanks to IBM technologies, Snecma
gained a powerful and reliable high performance
computing solution. The new supercomputer will be
used by leading-edge researchers to make highly
complex computations in the aviation field. The
simulations carried out on the iDataPlex
supercomputer allow Snecma to reduce fuel
consumption and therefore greenhouse gas
emissions and reduce noise, while addressing the
data center energy crisis.
  • Solution components
  • IBM System x iDataPlex
  • IBM General Parallel File System

38
  • Vestas Wind Systems
  • Maximize power generation and durability in its
    wind turbines with HPC
  • The Opportunity
  • What Makes it Smarter
  • This wind technology company relied on the
    Weather Research and Forecasting modeling system
    to run its turbine location algorithms, in a
    process generally requiring weeks and posing
    inherent data capacity limitations. Poised to
    begin the development of its own forecasts and
    adding actual historical data from existing
    customers to the mix of factors used in the
    model, Vestas needed a solution to its Big Data
    challenge that would be faster, more accurate,
    and better suited to its expanding data set.
  • Precise placement of a wind turbine can make a
    significant difference in the turbine's
    performance and its useful life. In the
    competitive new arena of sustainable energy,
    winning the business can depend on both the value
    demonstrated in the proposal and the speed of RFP
    response. Vestas broke free of its dependency on
    the Weather Research and Forecasting model with a
    powerful solution that sliced weeks from the
    processing time and more than doubled the
    capacity needed to include all the factors it
    considers essential for accurately predicting
    turbine success. Using a supercomputer that is
    one of the world's largest to date and a modeling
    solution designed to harvest insights from both
    structured and unstructured data, the company can
    factor in temperature, barometric pressure,
    humidity, precipitation, wind direction and wind
    velocity from ground level up to 300 feet,
    along with its own recorded data from customer
    turbine placements. Other sources to be
    considered include global deforestation metrics,
    satellite images, geospatial data and data on
    phases of the moon and tides. The solution raises
    the bar for due diligence in determining
    effective turbine placement.
  • Real Business Results
  • Reduces from weeks to hours the response time for
    business user requests
  • Provides the capability to analyze ALL modeling
    and related data to improve the accuracy of
    turbine placement
  • Reduces cost to customers per kilowatt hour
    produced and increases the precision of customer
    ROI estimates
  • Solution Components
  • IBM Technical Computing: General Parallel File
    System
  • IBM InfoSphere BigInsights Enterprise Edition
  • IBM System x iDataPlex
  • "Today, more and more sites are in complex
    terrain. Turbulence is a big factor at these
    sites, as the components in a turbine operating
    in turbulence are under more strain and
    consequently more likely to fail. Avoiding these
    pockets of turbulence means improved cost of
    energy for the customer."
  • Anders Rhod Gregersen, Senior Specialist, Plant
    Siting and Forecasting

39
  • It's best to hear from a client themselves
  • Please join me in welcoming
  • Ivar Koppel
  • Deputy Director of Research