SecureFiles - PowerPoint PPT Presentation

Provided by: seAuck
Transcript and Presenter's Notes

1
SecureFiles
  • VLDB August 23 - 28, 2008
  • Database Storage Development
  • Oracle Corporation

2
A revolutionary technology for unstructured
(file) data storage, specifically engineered to
provide filesystem-like performance and advanced
filesystem and database features, all within the
database server (released in 2007)
3
Agenda
  • Preface
  • Introduction
  • Performance Proof Points
  • SecureFiles Showcase: NIF, LLN Laboratories, USA
  • Architecture
  • Advanced Features

4
Enterprise Data Growth
  • Yearly Data Growth (IDC, Gartner)
  • Structured: 15-20%
  • Unstructured: 50-200%
  • Drivers for unstructured data growth
  • Increased digitization of content
  • Healthcare, Finance, Insurance, Banking
  • Compliance
  • Web 2.0
  • Scientific/Research Community
  • Storage, network and processor bandwidth
  • By 2010, enterprise data volumes are expected to
    reach multiple petabytes, ingested on hundreds of
    cores

5
Challenges for Near Future
  • As data volumes and ingestion rates rise, the
    following requirements arise:
  • Maximum storage throughput and scalability
  • Highest degree of robustness through atomicity,
    consistency, durability, security and
    availability
  • Scalable query ability of metadata and
    manageability
  • Efficient storage utilization: space and power
    costs
  • Effective storage lifecycle management

6
Why not Databases for Unstructured Data?
  • Current solution
  • Filesystems: preferred choice for unstructured
    data storage
  • Low performance and scalability of RDBMS is a
    major reason
  • RDBMS: preferred choice for relational data
    accompanying files
  • Fragmented solution, however, is not a long-term
    one

7
Consolidation Without Compromises
  • Lack of consolidation compromises security,
    robustness, and management
  • Disjoint security and auditing models
  • Differences in transaction semantics
  • Integrity and Consistency not guaranteed
  • Backup and recovery are fragmented
  • Storage management is complicated
  • Separate interfaces and protocols
  • Two data storage managers for one application is
    one too many as data volumes explode
  • Data Storage Integration: Precursor to
    Information Integration
  • Need for a consolidated, industry-strength
    semi-structured data management solution that
    does not compromise on these challenges

8
Vision
  • Jim Gray: For less than 1 MB, DB faster than
    filesystem. Most things are less than 1 MB. DB
    should work to make this 10 MB. Filesystem should
    borrow ideas from DB. (FAST 2005)
  • David DeWitt, Objects and Databases in 2006: We
    envision large enterprises reaping the benefits
    of families of products that offer ... A Fully
    Integrated Solution ... scalably and robustly.
    (VLDB 1996)
  • Michael Stonebraker: There have been some
    extensions over the years ... time has come for a
    complete rewrite. (VLDB 2007)

9
Agenda
  • Preface
  • Introduction
  • Performance Proof Points
  • SecureFiles Showcase: NIF, LLN Laboratories, USA
  • Architecture
  • Advanced Features

10
SecureFiles: Consolidated Secure Management of
Data
  • SecureFiles is a new Oracle 11g database server
    feature designed to break the performance barrier
    that has been keeping file data out of databases
  • Delivers comparable performance with respect to
    traditional filesystems for all file sizes
    without compromising on throughput and
    scalability
  • Maximizes throughput to match the underlying
    device, on single-instance multi-core systems as
    well as clusters
  • Scales from terabytes to petabytes on all storage
    tiers
  • Enables consolidation of file data with
    associated relational data
  • Single platform of storage
  • Single security model
  • Single view and management of data
  • Extends security, reliability, and scalability of
    database to file data.
  • Is a cluster filesystem
  • Provides high scalability and availability using
    commodity hardware
  • Leverages and extends Real Application Clusters
    (RAC) cache fusion technology

11
SecureFiles
  • Implemented in a parallel, integrated filesystem
    stack. Designed from the ground up for the next
    10-15 years
  • Layered filesystem: extensible transform
    architecture
  • Dynamic Write Buffering: Deferred write requests
    within transaction boundaries. Full utilization
    of I/O bandwidth
  • Space Management: Scalable in-memory management
    of free-space metadata within SMPs and across
    clusters, maintaining ACID properties.
    Self-adaptive best-fit on-disk data layout
    optimizing I/O requests
  • Inode and I/O Management: Fast, scalable access
    to file metadata for sequential as well as random
    access. Scales with concurrency as well as across
    clusters. Parallel, pipelined, asynchronous I/Os,
    intelligent read-ahead based on access patterns,
    overlap of network and storage bandwidths

12
The Best of Filesystems and Databases
  • SecureFiles have all the leading-edge file system
    capabilities
  • Options for Deduplication, Encryption,
    Compression, Snapshots
  • SecureFiles have advanced database features not
    in file systems
  • Transactions, Read Consistency, Various
    Durability Options
  • Readable Standby, Consistent Incremental Backup,
    Point in Time Recovery
  • Unlimited Temporal Data Access using Oracle
    Flashback Archive
  • Sliding Inserts Using Delta Updates: Inherent
    Support for XML Operations
  • Text, Functional and XML Indexes
  • Search across meta-data and file content
  • Partitioning and ILM
  • Leading the architectural confluence of databases
    and filesystems
  • Having the best of both worlds removes the need
    to compromise
  • Visions Fulfilled in the Domain of
    Storage

13
Agenda
  • Preface
  • Introduction
  • Performance Proof Points
  • SecureFiles Showcase: NIF, LLN Laboratories, USA
  • Architecture
  • Advanced Features

14
SecureFiles vs NFS
15
Experiment Setup
  • Clients: 2 hyperthreaded Intel Xeon 2.8 GHz, 6 GB
    RAM, RHEL
  • Server: 2 hyperthreaded Intel Xeon 3.2 GHz, 6 GB
    RAM, RHEL, 2 Gb Fibre Channel SAN host adapter
  • OCI client for the database, NFSv3 client for
    the filesystem, TCP/IP
  • Server machine running Oracle 11g database server
    and NFSv3 server
  • Two 2 TB RAID 5 storage arrays
  • Managed by Ext3
  • Managed by Oracle ASM

16
Dataset
  • DICOM application consisting of digital
    diagnostic images and patient information.
  • Images stored on an Ext3 fileserver accessed
    through NFSv3 vs. images stored as SecureFiles
    within the database.
  • Patient information is stored in the Oracle
    database in both cases.
  • Test images range from 10 KB to 100 MB, with total
    data size from 1 GB up to 100 GB.
  • Filesystem_like_logging is used for SecureFiles,
    comparable to filesystems with metadata
    journaling.

17
Single Threaded Read Performance
18
Multi Threaded Read Performance
19
Single Threaded Write Performance
20
Multi Threaded Write Performance
21
Single DB Instance Scalability
22
Scalability: Document Archiving Workload
23
Scalability: Image and Video Storage
24
Secure Files on RAC
25
Setup
  • 4-node RAC, Xeon 3.4 GHz, 2 CPUs, 6 GB RAM, 3 EMC
    CX700 connected through 2 switches
  • DICOM application dataset reused
  • SecureFiles: Filesystem_like_logging, NoCache

26
Breaking The Performance Barriers: Writes
27
Breaking The Performance Barriers: Reads
28
SecureFiles Performance Summary
  • High performance: meets 100% of data storage and
    access requirements
  • 462MB/s Ingest, 776MB/s Reads
  • Meets or Beats NFS/Ext3 performance on same h/w
  • Solution for the future
  • YouTube - 65,000 uploads a day, 100MB maximum
    video size, 6.5TB of uploads a day
  • SecureFiles on 4-node RAC (in-house test setup):
    30 TB of possible inserts a day, 4x the peak
    YouTube requirement
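
The daily-volume figures on this slide can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB), treats every upload as being at the 100 MB maximum, and assumes the ingest rate is sustained for a full day; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal units (assumption)


def daily_volume_tb(items_per_day: int, max_item_mb: float) -> float:
    """Upper bound on daily volume if every item has the maximum size."""
    return items_per_day * max_item_mb / MB_PER_TB


def sustained_tb_per_day(rate_mb_per_s: float) -> float:
    """Volume written in one day at a constant rate."""
    return rate_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


youtube_tb = daily_volume_tb(65_000, 100)   # 65,000 uploads of 100 MB each
ingest_tb = sustained_tb_per_day(462)       # 462 MB/s held for 24 hours
ratio_vs_slide = 30 / youtube_tb            # the slide's 30 TB vs. YouTube
```

Under these assumptions the YouTube bound comes to 6.5 TB a day, and 462 MB/s sustained for 24 hours comes to roughly 39.9 TB, so the 30 TB quoted on the slide sits below that ceiling while still exceeding four times the YouTube figure.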

29
Early Beta (external) User Feedback
  • Major telecommunication company
  • Fingerprinting application for government agencies
  • up to 7x speedup with SecureFiles
  • Major digital video company
  • Digital asset management
  • Mostly photos and videos
  • up to 3x speedup for large image loading
  • Major research institution
  • The tests showed a clear performance advantage
    of storing LOB data in SecureFiles, by a factor
    of up to 5.45

30
Agenda
  • Preface
  • Introduction
  • Performance Proof Points
  • SecureFiles Showcase: NIF, LLN Laboratories, USA
  • Architecture
  • Advanced Features

31
National Ignition Facility
  • World's largest and most energetic laser
  • Experiments to harness the potential of fusion as
    a future source of safe and usable energy
  • When fully operational, 192 laser beams will
    generate 500 trillion watts, with a pulse energy
    of 1.8 MJ and a pulse length of 20 billionths of
    a second

32
Content Management Requirements
  • Optics make NIF work: laser glass slabs,
    crystals, lenses, precision optical components
  • High-resolution cameras
  • Generate multiple images every 6 seconds for 6
    hours
  • Images need to be processed within 6 seconds
  • For detection of defects on optical surfaces (1/5
    the size of a human hair)
  • Throughput requirement: more than 40 MB/sec
  • More than 300 TB of storage by 2010
  • 30 years of retention (on tape)

33
On Production With SecureFiles
  • Performance
  • Able to push the server and storage to the
    maximum
  • Provides more than 100MB/s on their in-house
    hardware
  • Storage Overhead
  • Mitigated with compression
  • Manageability
  • Consolidated store for images and metadata
  • Better governance
  • Archival
  • Unlimited history
  • Partitioning
  • Backup and Recovery of all data

34
Conclusion
  • Unification of storage of unstructured and
    structured data without compromises: a
    reality

35
Special Note
  • We continue to innovate
  • Lots of challenges in the field of data storage
    management
  • We collaborate extensively with our users in the
    scientific community for new ideas, requirements
    and feedback
  • We welcome you to join us
  • We welcome you to collaborate with us

36
Agenda
  • Preface
  • Introduction
  • Performance Proof Points
  • SecureFiles Showcase: NIF, LLN Laboratories, USA
  • Architecture
  • Advanced Features

37
Layered Architecture
  • Write Gather Cache enables large disk I/O and
    contiguous space allocation.
  • Layering allows multiple data transformations to
    be applied.
  • Features like delta updates are pluggable based
    on application requirements.
  • Delta updates allow non-length-preserving
    updates to be performed efficiently.
  • SF Compression provides significant space savings.
  • SF Encryption extends DB security to file data.
  • Deduplication eliminates multiple copies of
    identical data.

38
Delta Updates
  • Enables non-length-preserving updates on Oracle
    SecureFile objects
  • Provides special APIs to the user to specify the
    object, the list of deltas, their lengths and
    offsets
  • I/O size proportional to the length of the update
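
The semantics of a delta list (object, offsets, lengths, replacement data) can be modeled in a few lines. This is an in-memory sketch of the idea only, not the Oracle API: the `Delta` type and `apply_deltas` function are hypothetical, and the sketch rebuilds the whole object, whereas the slide states that the real I/O is proportional to the length of the update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    offset: int   # byte offset in the ORIGINAL object
    length: int   # number of original bytes replaced (0 means pure insert)
    data: bytes   # replacement bytes (empty means pure delete)


def apply_deltas(obj: bytes, deltas: list[Delta]) -> bytes:
    """Apply non-overlapping deltas expressed against the original offsets.

    Replacement data may be shorter or longer than the bytes it replaces,
    which is what makes the update non-length-preserving.
    """
    out = bytearray()
    cursor = 0  # position in the original object consumed so far
    for d in sorted(deltas, key=lambda d: d.offset):
        if d.offset < cursor or d.offset + d.length > len(obj):
            raise ValueError("overlapping or out-of-range delta")
        out += obj[cursor:d.offset]  # untouched bytes before this delta
        out += d.data                # new bytes, any length
        cursor = d.offset + d.length
    out += obj[cursor:]              # untouched tail
    return bytes(out)


updated = apply_deltas(
    b"hello world",
    [Delta(offset=6, length=5, data=b"there!"), Delta(0, 0, b">> ")],
)
```

Here the first delta replaces the five bytes of "world" with six bytes, and the second inserts three bytes at the start without removing anything.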

39
Write Gather Cache
  • Subset of database buffer cache private to Oracle
    SecureFiles
  • A user-specified parameter governs how much data
    is buffered during write operations before it is
    flushed to the underlying storage layer
  • Maintained on a per-transaction basis
  • Optimizes on-disk layout of SecureFiles data
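
The buffering behavior described above can be sketched as a small class that gathers many small writes and hands them to storage as fewer, larger I/Os. The class name, the `limit_bytes` parameter and the `flush` callback are assumptions made for illustration; they are not Oracle parameter names.

```python
class WriteGatherCache:
    """Buffers the writes of one transaction and flushes them in large I/Os."""

    def __init__(self, flush, limit_bytes: int):
        self._flush = flush        # callable that receives one bytes object
        self._limit = limit_bytes  # hypothetical user-specified buffer size
        self._pending = bytearray()

    def write(self, data: bytes) -> None:
        """Gather data; flush only when a full buffer has accumulated."""
        self._pending += data
        while len(self._pending) >= self._limit:
            self._flush(bytes(self._pending[:self._limit]))
            del self._pending[:self._limit]

    def commit(self) -> None:
        """Flush whatever is left at the transaction boundary."""
        if self._pending:
            self._flush(bytes(self._pending))
            self._pending.clear()


ios = []                                   # records each I/O issued
cache = WriteGatherCache(ios.append, limit_bytes=8)
for _ in range(5):
    cache.write(b"abc")                    # five small 3-byte writes
cache.commit()
```

In this run the five small writes reach storage as two I/Os, one full 8-byte buffer and one 7-byte remainder at commit.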

40
Securefile Deduplication
  • Eliminates duplicates of identical file data.
    Results in efficient space usage.
  • File copy is an efficient non-space consuming
    operation.
  • The more duplicates, the higher the space savings.
  • A secure hash is evaluated over the file contents
    and stored in a per-segment index
  • Hashes are prefix-matched to identify potential
    duplicates, followed by a byte-by-byte comparison
    to eliminate false positives from hash collisions.
  • Updates result in break-away from the source and
    copy-on-write.
  • Content management, email and data archive
    applications can greatly leverage deduplication.
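
The deduplication flow above (hash lookup, byte-by-byte confirmation, copy without new space, break-away on update) can be sketched as a toy store. Everything here is an assumption for illustration: the `DedupStore` class, the use of SHA-256, and a dictionary keyed on the full digest in place of the prefix-matched per-segment index mentioned on the slide.

```python
import hashlib


class DedupStore:
    def __init__(self):
        self._blobs = {}   # blob id -> bytes, one physical copy per content
        self._refs = {}    # blob id -> number of files sharing the blob
        self._index = {}   # hash digest -> list of candidate blob ids
        self._files = {}   # file name -> blob id
        self._next_id = 0

    def _intern(self, data: bytes) -> int:
        digest = hashlib.sha256(data).digest()
        for blob_id in self._index.get(digest, []):
            if self._blobs[blob_id] == data:  # byte-by-byte confirmation
                self._refs[blob_id] += 1
                return blob_id
        blob_id, self._next_id = self._next_id, self._next_id + 1
        self._blobs[blob_id] = data
        self._refs[blob_id] = 1
        self._index.setdefault(digest, []).append(blob_id)
        return blob_id

    def _release(self, blob_id: int) -> None:
        self._refs[blob_id] -= 1
        if self._refs[blob_id] == 0:
            data = self._blobs.pop(blob_id)
            del self._refs[blob_id]
            digest = hashlib.sha256(data).digest()
            self._index[digest].remove(blob_id)
            if not self._index[digest]:
                del self._index[digest]

    def put(self, name: str, data: bytes) -> None:
        """Store or update a file; an update breaks away from shared data."""
        old = self._files.get(name)
        self._files[name] = self._intern(data)
        if old is not None:
            self._release(old)

    def copy(self, src: str, dst: str) -> None:
        """File copy shares the existing blob, so it consumes no new space."""
        blob_id = self._files[src]
        self._refs[blob_id] += 1
        old = self._files.get(dst)
        self._files[dst] = blob_id
        if old is not None:
            self._release(old)

    def get(self, name: str) -> bytes:
        return self._blobs[self._files[name]]

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self._blobs.values())
```

Storing the same 1,000 bytes under two names, then copying one of them, leaves a single physical copy; updating one name afterwards allocates new space only for that file and leaves the others untouched.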

41
Compression and Encryption
  • Results in reduced I/O and significant space
    savings.
  • Transparent to end users
  • Efficient random access with partial
    decompression/decryption.
  • Random updates involve updating only the affected
    portion of the data.
  • Layering lets encryption operate on data that is
    already compressed, so less data is encrypted.
  • Encryption uses 3DES with 168-bit keys or AES
    with 128-, 192-, or 256-bit keys.
  • Encryption leverages the existing database
    security model.
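
One common way to get random access with partial decompression is to compress fixed-size units independently, so a read only decompresses the units it overlaps. The sketch below illustrates that general idea with zlib; the unit size and function names are assumptions, and the slides do not describe the actual SecureFiles on-disk format. Encryption is left out to keep the example within the standard library.

```python
import zlib

UNIT = 64 * 1024  # hypothetical compression unit size in bytes


def compress_units(data: bytes, unit: int = UNIT) -> list[bytes]:
    """Compress each fixed-size unit of the plaintext independently."""
    return [zlib.compress(data[i:i + unit]) for i in range(0, len(data), unit)]


def read_range(units: list[bytes], offset: int, length: int,
               unit: int = UNIT) -> bytes:
    """Return plaintext[offset:offset+length], decompressing only the
    units that overlap the requested range."""
    if length <= 0:
        return b""
    first = offset // unit
    last = (offset + length - 1) // unit
    plain = b"".join(zlib.decompress(u) for u in units[first:last + 1])
    start = offset - first * unit
    return plain[start:start + length]


data = bytes(range(256)) * 1000            # 256,000 bytes of sample data
units = compress_units(data, unit=4096)
piece = read_range(units, offset=10_000, length=5_000, unit=4096)
```

The read of 5,000 bytes at offset 10,000 touches two of the 4,096-byte units rather than the whole object.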

42
Inode and I/O Management
  • Responsible for maintaining persistent,
    transactional disk structures that map file data
    to physical storage space.
  • Metadata is either a simple array of chunk
    entries for small files or grows into a B-tree
    for large files.
  • Enables efficient random access to an arbitrary
    offset within the data.
  • Highly scalable disk layout to map TB-sized
    objects efficiently
  • Inode metadata is transactionally managed and
    recoverable across process, instance and media
    failures.
  • In-place updates for small changes. Large changes
    are versioned at appropriate levels of
    granularity.
  • Supports intelligent pre-fetching based on access
    patterns and asynchronous writes within
    transactions.
  • Reduces read/write latency by overlapping network
    and storage throughput
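
Mapping a file offset to physical storage can be sketched with a sorted array of extents and a binary search. This stands in for both the small-file array and the large-file B-tree mentioned above; the `Extent` and `ChunkMap` names and the integer disk addresses are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    logical_start: int  # first file offset covered by this extent
    length: int         # number of bytes in the extent
    disk_address: int   # hypothetical physical location of the first byte


class ChunkMap:
    def __init__(self, extents: list[Extent]):
        self._extents = sorted(extents, key=lambda e: e.logical_start)
        self._starts = [e.logical_start for e in self._extents]

    def locate(self, offset: int) -> tuple[int, int]:
        """Return (disk address, bytes left in the extent) for a file offset."""
        i = bisect.bisect_right(self._starts, offset) - 1
        if i < 0:
            raise ValueError("offset is before the first extent")
        extent = self._extents[i]
        within = offset - extent.logical_start
        if within >= extent.length:
            raise ValueError("offset is not mapped")
        return extent.disk_address + within, extent.length - within


chunk_map = ChunkMap([
    Extent(logical_start=0, length=100, disk_address=5000),
    Extent(logical_start=100, length=50, disk_address=9000),
    Extent(logical_start=150, length=200, disk_address=1000),
])
address, remaining = chunk_map.locate(120)
```

The lookup cost grows with the logarithm of the number of extents, which is the property that lets random access stay efficient as objects grow.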

43
Efficient Space Management
  • Supports variable-sized chunks up to 64 MB.
  • Allocation based on a best-fit approach.
  • In-memory dispenser: primary high-performance
    allocation provider.
  • CFS unit: pool of committed free space blocks.
  • UFS unit: pool of de-allocated, uncommitted free
    space blocks.
  • Freed space is not reused until the retention
    time needed for consistent reads (CR) has passed.
  • Space reclamation is a background process.
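
The interplay of best-fit allocation, the two free-space pools and retention can be sketched as follows. This is a simplified model under stated assumptions: the `SpaceManager` class is hypothetical, time is passed in explicitly, and adjacent free extents are not coalesced.

```python
import bisect


class SpaceManager:
    def __init__(self, retention: float):
        self._retention = retention
        self._committed = []    # sorted (size, address): reusable free space
        self._uncommitted = []  # (freed_at, size, address): awaiting retention

    def add_free(self, address: int, size: int) -> None:
        bisect.insort(self._committed, (size, address))

    def allocate(self, size: int):
        """Best fit: take the smallest committed extent that is big enough."""
        i = bisect.bisect_left(self._committed, (size, -1))
        if i == len(self._committed):
            return None  # nothing large enough is currently reusable
        extent_size, address = self._committed.pop(i)
        if extent_size > size:  # return the unused tail to the pool
            self.add_free(address + size, extent_size - size)
        return address

    def free(self, address: int, size: int, now: float) -> None:
        """Freed space is parked; old readers may still need its contents."""
        self._uncommitted.append((now, size, address))

    def reclaim(self, now: float) -> None:
        """Background step: release extents whose retention has expired."""
        still_held = []
        for freed_at, size, address in self._uncommitted:
            if now - freed_at >= self._retention:
                self.add_free(address, size)
            else:
                still_held.append((freed_at, size, address))
        self._uncommitted = still_held

    def committed_bytes(self) -> int:
        return sum(size for size, _ in self._committed)


sm = SpaceManager(retention=10)
sm.add_free(address=0, size=100)
sm.add_free(address=1000, size=30)
sm.add_free(address=2000, size=50)
first = sm.allocate(40)  # the 50-byte extent is the tightest fit
```

A request for 40 bytes is served from the 50-byte extent rather than the 100-byte one, and space freed later only becomes allocatable again after `reclaim` runs with a time past the retention window.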

44
Database Semantics in SecureFiles
  • Atomicity
  • SF is a transactional data store.
  • Ability to roll back and recover from transaction
    failures
  • Copy-on-write semantics for large
    modifications
  • Undo generation for metadata and small data
    changes
  • Read Consistency
  • Multi-version read consistency for relational
    data
  • Old versions of SecureFile objects are retained
    for a user-specified amount of time. Read
    requests on SecureFile objects succeed as long as
    the versions as of the query time are retained
  • Data Durability
  • Relational data and SecureFile metadata are
    always logged to achieve durability across
    instance, database, and media failures
  • Filesystem-like logging, as in journaling
    filesystems, still achieves data durability
    across transaction and instance failures
  • User data can be logged conditionally based on
    user settings, thus allowing recovery from media
    failures.
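
The read-consistency rule above (a read succeeds as long as the version as of the query time is still retained) can be illustrated with a toy versioned object. The `VersionedObject` class and its explicit timestamps are assumptions for illustration and say nothing about how Oracle stores versions internally.

```python
class VersionedObject:
    def __init__(self, retention: float):
        self._retention = retention
        self._versions = []  # (commit_time, data), oldest first

    def write(self, data: bytes, now: float) -> None:
        """Copy-on-write: the previous version is kept, not overwritten."""
        self._versions.append((now, data))
        self._purge(now)

    def _purge(self, now: float) -> None:
        # A version is dropped once the version that superseded it has
        # been in place for longer than the retention period.
        while (len(self._versions) > 1
               and now - self._versions[1][0] > self._retention):
            self._versions.pop(0)

    def read_as_of(self, query_time: float) -> bytes:
        """Return the newest version committed at or before query_time."""
        for commit_time, data in reversed(self._versions):
            if commit_time <= query_time:
                return data
        raise LookupError("version as of query time is no longer retained")


obj = VersionedObject(retention=10)
obj.write(b"v1", now=0)
obj.write(b"v2", now=5)
early_read = obj.read_as_of(3)   # still sees v1 while it is retained
obj.write(b"v3", now=20)         # v1 is now past retention and is purged
```

After the third write, a query as of time 3 can no longer be answered, while a query as of time 19 still sees the second version.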

45
Agenda
  • Preface
  • Introduction
  • Performance Proof Points
  • SecureFiles Showcase: NIF, LLN Laboratories, USA
  • Architecture
  • Advanced Features

46
Advanced Features Inherited from the RDBMS
  • Temporal Filesystem Features Using Flashback
  • The Flashback framework allows users to query,
    retrieve, and recreate relational data consistent
    as of any point in time in the past.
  • Flashback Archive enables users to retrieve and
    recreate data as of several years earlier.
  • SecureFiles data retrieval as of any point in
    time is guaranteed as long as the accompanying
    relational data can be retrieved.
  • SecureFiles with Flashback Archive provide
    tamper-proof filesystem behavior to applications,
    which has many practical uses in the area of data
    security.
  • Data Retrieval in Standby Systems
  • Oracle 11g provides the capability to query and
    retrieve database objects from physical standby
    database systems using Active Data Guard
  • Being first-class database objects, SecureFiles
    support query-ability of both unstructured and
    relational content on standby database systems if
    data manipulation operations on SecureFile
    objects are logged in the database Redo logs.

47
Advanced Features Inherited from the RDBMS
  • Secure Incremental Backup and Point-in-Time
    Recovery
  • Because SecureFiles are first-class database
    objects, a secure encrypted backup of the database
    system also covers SecureFile objects and their
    accompanying relational data.
  • Oracle provides the capability to perform point
    in time recovery of the database.
  • Point in time recovery can be performed on
    SecureFile objects if users choose to log
    manipulation operations on SecureFile object data
    with the full LOGGING option.
  • Clustered Filesystem Features Using RAC
  • Allows shared access to the entire underlying
    disk subsystem staging the database and provides
    opportunities for maximizing the scalability of
    database operations.
  • SecureFiles inherit the capabilities provided by
    Real Application Clusters. The design of the
    space management component in SecureFiles is
    tuned to provide scalability in throughput
    proportionally with the number of active database
    instances

48
Advanced Features Inherited from the RDBMS
  • Information Lifecycle Management Using
    Partitioning
  • Partitioning achieves effective lifecycle
    management of data.
  • SecureFiles makes use of similar partitioning
    techniques to achieve lifecycle management of
    SecureFile objects.
  • Partitioning of base tables containing the
    relational and SecureFile metadata results in
    partitioning of the underlying SecureFiles
    segments.
  • Storage Support on Flash-based Devices
  • SecureFiles architecture provides a variant of a
    log-structured filesystem.
  • The space management framework in SecureFiles
    assists the architecture to adapt to storage on
    flash devices.
  • With optimal wear-leveling, semi-structured data
    management becomes highly feasible on flash-based
    storage devices

49
More Experiments
50
WAN Performance (NFS vs SecureFiles)
51
SecureFiles Compression: Data Reduction
  • Calgary dataset: standard compression dataset
  • 3-9x reduction in size
  • Additional 20% reduction in size from Compress
    High

52
SecureFiles Compression: Read CPU Impact
  • Compression makes Encrypted SF consume less CPU
    for Reads
  • Reading Compressed consumes 2-3x more CPU

53
SecureFiles Compression: Write CPU Impact
  • Compress High can consume 2x more CPU than
    Compress Medium

54
SecureFiles LZO Compression
  • LZO Write is 3x faster than ZLIB
  • LZO Read is 2x faster than ZLIB
  • ZLIB gives an additional 15% compression

55
Q & A