Title: Oracle Database 11g Unstructured Data
1 Daniel M. Seurer
Oracle Public Sector
daniel.seurer_at_oracle.com
608.695.0269
2 Trends
- Enterprise applications incorporating content with data
- Compliance driving content to databases
- Documents becoming semi-structured, driving value of database
- Maps and location analysis are becoming part of Business Intelligence and commercial applications
- XML becoming a standard for representing information
  - Information exchange for Web Services
  - Format for Microsoft Office documents
- Search becoming central to information access
3 Strategy
- Evolve the database to manage all enterprise information
- Meld database and file-system metaphors
- Enable integration of all enterprise information sources
- Enable rich information retrieval capabilities
- Provide solutions built on top of the database
4 Evolving the Database
- 1997, Oracle8: VLDB, LOBs, Object-relational, Extensibility
- 1999, Oracle8i: Text, Spatial, Media
- 2001, Oracle9i: XML DB, Repository, SQL/XML
- 2004, Oracle 10g: ULDB, Location Services, XQuery
- 2006, Oracle 10g: Secure Enterprise Search (Intranets, Databases, Files, Email, DMS, Portals)
- 2007, Oracle 11g: SecureFiles, RDF Ontology, DICOM, Binary XML, XML Index
5 Managing All Your Information
- Oracle SecureFiles: High Performance, Secure LOBs
- XML DB: Integrated Native XML Database
- Secure Enterprise Search: Across All Enterprise Sources
- Oracle Text: Text Indexing and Classification
- Location and Spatial: Location Enabled Databases
- Semantic Database: Ontologies, OWL, RDF
- Multimedia: Audio, Image and Video
- New with 11g: DICOM Medical Imaging
- Relational: Characters, Numbers, Dates
Uniform Management of content and metadata
- Scalable
- Secure
- Highly Available
- Integrated
- Robust
- Available on all platforms
6 LOBs
7 Managing Enterprise Information
- Organizations need to efficiently and securely manage structured, semi-structured, and unstructured data (e.g. XML, PDF)
- Simplicity and performance of file systems make it attractive to store file data in file systems, while keeping relational data in the DB
- Enterprise applications manipulate both files and relational data
  - e.g. Document Management, Media, Medical, CAD, Imaging
8 Files belong with Relational Data
- Two data managers for one application is one too many
  - The application must patch over the gap
This split compromises security, robustness, and management
- Disjoint security and auditing models
- Changes cannot be made atomically
- Backup and recovery are fragmented
- Search across relational data and files is difficult
- Space management is complicated
- Separate interfaces and protocols
9 Oracle SecureFiles: Consolidated Secure Management of Data
- SecureFiles is a new 11g feature designed to break the performance barrier keeping file data out of databases
- Next-generation LOBs: faster, and with more capabilities
  - Transparent deduplication, compression and encryption
  - Leverage the security, reliability, and scalability of the database
- Superset of LOB interfaces allows easy migration from LOBs
- Enables consolidation of file data with associated relational data
  - Single security model
  - Single view of data
  - Single management of data
- Scalable to any level using SMP scale-up, or grid scale-out
10 Designed from Scratch
- SecureFiles is a major rearchitecture of how the database handles unstructured (file) data
- Not an incremental improvement to LOBs
- Entirely new
- Disk format
- Network protocol
- Versioning and sharing mechanisms
- Caching and locking
- Redo and undo algorithms
- Space and memory management
- Cluster consistency algorithms
11 SecureFile Innovations
- Write Gather Cache
  - Cache above the storage layer buffers data up to 4MB during writes before flushing to disk
  - Allows for large contiguous space allocation for LOB data and reduced write latency
- Intelligent Pre-fetching
  - Improves read performance by pre-fetching LOB data from disk
  - Overlaps disk I/O with network latency to improve throughput
- New Space Management routine
  - Automates new space allocation and freed space reclamation
  - Optimized chunk size reduces fragmentation
  - No more High Water Mark contention as with old LOBs
  - Deletion and reuse of entire LOBs, not just individual chunks
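The write-gather idea above can be illustrated with a small sketch. This is not Oracle's implementation; the class name, the `sink` object and the flush counter are hypothetical, and only the 4MB threshold comes from the slide. It shows how many small writes can be gathered into a few large ones before they reach storage.

```python
import io

FLUSH_THRESHOLD = 4 * 1024 * 1024  # 4 MB, the buffer size quoted on the slide


class WriteGatherBuffer:
    """Toy write-gathering buffer: collects small writes, flushes large runs."""

    def __init__(self, sink, threshold=FLUSH_THRESHOLD):
        self.sink = sink            # any object with a write(bytes) method
        self.threshold = threshold
        self.pending = bytearray()
        self.flush_count = 0        # number of writes that reached the sink

    def write(self, data: bytes) -> None:
        self.pending += data
        # Flush only once enough data has gathered for one large write.
        while len(self.pending) >= self.threshold:
            self._flush(self.threshold)

    def close(self) -> None:
        # Push out whatever is left, even if it is below the threshold.
        if self.pending:
            self._flush(len(self.pending))

    def _flush(self, nbytes: int) -> None:
        self.sink.write(bytes(self.pending[:nbytes]))
        del self.pending[:nbytes]
        self.flush_count += 1


sink = io.BytesIO()
buf = WriteGatherBuffer(sink)
for _ in range(1024):               # 1024 application writes of 8 KB each
    buf.write(b"x" * 8192)
buf.close()
print(buf.flush_count, len(sink.getvalue()))
```

In this toy run, 1024 small writes reach the sink as two large ones, which is the effect the slide attributes to larger contiguous allocations.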
12 SecureFiles Meets Filesystem Performance
[Charts: File Read Performance and File Write Performance, SecureFiles vs. Linux NFS]
- SQL file test, single stream, single host
- Using SecureFiles is faster across the board
  - Up to 2x faster for reads, 6x for writes
- Tests run with SecureFiles and NFS/ext3
- Meets or beats filesystem performance
No Compromises
Get filesystem performance on your favorite platform
13 SecureFiles: Breaking the Performance Barrier
- High Performance
  - 38TB/day ingest
  - 5x YouTube
- Unlimited Scalability
  - RAC for Server
  - ASM for Storage
- SecureFiles is free
High Performance Experiment
[Chart: SecureFile read and write throughput in MB/s (axis 0 to 800) against file size in MB (0.01 to 100)]
- 776 MB/s for file reads
  - 67TB/day of data served
- 462 MB/s for file writes
  - 38TB/day of data ingest
- 8 sessions, 4-node RAC, 2x Xeon, 6GB RAM, 3 EMC CX700
Designed to Scale
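The per-day figures follow from the sustained MB/s rates by simple arithmetic. The sketch below redoes that conversion; the function name and the two unit conventions are assumptions added for illustration. On this arithmetic, the 67TB/day read figure matches decimal units, while the 38TB/day write figure matches binary units (decimal units would give roughly 40TB/day).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day(mb_per_s: float, mb_per_tb: float) -> float:
    """Convert a sustained MB/s rate into TB moved per day."""
    return mb_per_s * SECONDS_PER_DAY / mb_per_tb


DECIMAL = 1000 * 1000   # MB per TB when using powers of 10
BINARY = 1024 * 1024    # MB per TB when using powers of 2

read_decimal = tb_per_day(776, DECIMAL)
read_binary = tb_per_day(776, BINARY)
write_decimal = tb_per_day(462, DECIMAL)
write_binary = tb_per_day(462, BINARY)

print(f"reads:  {read_decimal:.1f} TB/day decimal, {read_binary:.1f} TB/day binary")
print(f"writes: {write_decimal:.1f} TB/day decimal, {write_binary:.1f} TB/day binary")
```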
14 SecureFiles vs Old LOBs (BasicFile): Performance Comparisons
15 SecureFiles Scalability: Concurrent Writes, OCI, non-RAC, 4 streams
[Charts: Writes on a New Database and Writes after Deletes, SecureFiles vs. LOBs, by file size (MB)]
- Adding files using new disk space: up to 2x faster
- Adding files using deleted space: up to 22x faster
- No high water mark enqueue bottleneck
16 SecureFiles Scalability: Concurrent Reads, OCI, non-RAC, 4 streams
[Chart: Concurrent Read Performance, SecureFiles vs. LOBs, by file size (MB)]
17 Advanced Features - Compression
- Huge storage savings
  - Industry-standard compression algorithms
  - 2-3x compression for typical files (doc, pdf, xml)
- Minimal CPU overhead during compression
  - Two levels of compression provide flexibility to optimize the trade-off between storage savings and CPU overhead
  - Compression levels: MEDIUM (default), HIGH
  - The higher the degree of compression, the higher the latency and CPU overhead incurred
- Intelligent compression
  - Skips compression for already compressed data
  - Automatically turns off compression when space savings are minimal or zero
- Application transparent
  - Random reads and writes allowed on compressed SecureFile data
  - Compressed data can be indexed and searched
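The "intelligent compression" behaviour can be sketched as a savings check per chunk. This is only an illustration using Python's `zlib`; the 10% threshold, the function name and the sample data are assumptions, and the slide does not say which algorithm or cutoff SecureFiles uses.

```python
import os
import zlib

MIN_SAVINGS = 0.10  # hypothetical cutoff: keep compression only if it saves >= 10%


def maybe_compress(chunk: bytes, level: int = 6):
    """Return (stored_bytes, was_compressed) for one chunk of data."""
    if not chunk:
        return chunk, False
    packed = zlib.compress(chunk, level)
    savings = 1 - len(packed) / len(chunk)
    if savings >= MIN_SAVINGS:
        return packed, True
    # Savings are minimal or negative, so store the chunk as it is.
    return chunk, False


text_chunk = b"<order><item>widget</item></order>" * 1000   # repetitive XML
random_chunk = os.urandom(32 * 1024)                        # stands in for already-compressed data

text_stored, text_compressed = maybe_compress(text_chunk)
random_stored, random_compressed = maybe_compress(random_chunk)
print(text_compressed, random_compressed)
```

Random bytes stand in for data that is already compressed (for example a JPEG or ZIP file): compressing them again saves nothing, so the sketch stores them unchanged.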
18 Advanced Features - Deduplication
Secure hash
- Enables storage of a single physical image for duplicate data
- Significantly reduces space consumption
- Dramatically improves write and copy operations
- No adverse impact on read operations
  - May actually improve read performance for cached data
- Duplicate detection happens within a table, partition or sub-partition
- Especially useful for content management, email applications and data archival applications
- Part of the Advanced Compression Option
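Hash-based deduplication can be sketched as a content-addressed store: identical content hashes to the same digest, so only one physical copy is kept. The class below is a hypothetical illustration, not SecureFiles internals, and the choice of SHA-256 is an assumption (the slide only says "secure hash").

```python
import hashlib


class DedupStore:
    """Toy store that keeps one physical copy per distinct content."""

    def __init__(self):
        self.blobs = {}   # digest -> bytes (the single physical image)
        self.refs = {}    # logical key -> digest

    def put(self, key: str, data: bytes) -> bool:
        """Store data under key; return True if a new physical copy was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.refs[key] = digest
        return is_new

    def get(self, key: str) -> bytes:
        return self.blobs[self.refs[key]]

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
attachment = b"quarterly report " * 1000           # 17,000 bytes
first = store.put("mail-1/report.doc", attachment)
second = store.put("mail-2/report.doc", attachment)  # duplicate content
third = store.put("mail-3/notes.txt", b"different content")
print(first, second, third, store.physical_bytes())
```

The duplicate write skips the physical copy entirely, which is one way to read the slide's claim that deduplication speeds up write and copy operations.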
19 Advanced Features - Encryption
- Extends Transparent Data Encryption (TDE) syntax to SecureFile data
  - Old LOB (BasicFile) data cannot be encrypted
- Performed at block level
- Support for industry-standard encryption algorithms
  - 3DES168
  - AES128
  - AES192 (default)
  - AES256
- Encrypt on a per-column basis
- Part of the Advanced Security Option
20 Unified Security Model
- Unified security and identity model for relational and file data
  - Large reduction in attack footprint
- And better security features
  - Transparent Data Encryption (TDE) secures stored data
  - Existing applications require no changes
  - Backups, log files, etc. are also encrypted
  - Encryption on the network
  - Fine-Grained Auditing for content-based auditing of access
  - Label Security restricts access based on level (e.g. confidential), compartment (e.g. finance), and group (e.g. Japan)
21 SecureFile Interfaces
- SecureFiles can be accessed by both database clients and file system clients
- Database clients use extended LOB interfaces
  - JDBC, ODBC, OCI, .NET, PL/SQL
  - 11g has a highly optimized streaming protocol for SecureFiles
- File system clients use the file system protocols implemented in the Content DB repository
  - FTP access
  - WebDAV access
  - HTTP access
22 Integration with Other Products and Features
- SecureFiles is fully integrated with
  - XML DB (Binary XML)
  - Oracle interMedia
  - Oracle Spatial
  - Content DB
- Out-of-the-box benefits for new installations
  - By setting db_securefile to FORCE or ALWAYS
- Efforts underway to integrate with Stellent
23 Spatial/GeoRaster Load, Pyramid Generation
- Loading and scaling performance test
  - 30% faster
- Pyramid generation test on a 9GB image that generated 4GB of pyramid images
  - 6x faster
- Raw read/write improvements accrue to application access of LOBs
24 XDB Binary XML: XML Index Performance
[Chart: speedup with SecureFiles, by file size (MB)]
- For data-centric XML (10KB): 5x speedup
- For document-centric XML (1MB): 35x speedup
25 Using SecureFiles
- Old LOBs are still supported and are referred to as BASICFILE
  - Default LOB storage type in Oracle Database 11g
- New keyword SECUREFILE to refer to SecureFile
  - Requires compatibility set to 11.1 or higher
- New init.ora parameter db_securefile to manage LOB storage policy
  - PERMITTED: allow SecureFiles to be created (default)
  - NEVER: disallow new SecureFiles
  - FORCE: create all LOBs as SecureFiles
  - ALWAYS: attempt to create SecureFiles, but fall back to BasicFiles
  - IGNORE: ignore attempts to create SecureFiles
- Example
    ALTER SYSTEM SET db_securefile = 'ALWAYS';
- Locally managed tablespaces with ASSM are required to use SecureFiles
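The five db_securefile settings can be read as a small decision table. The sketch below encodes one reading of the slide; it is not Oracle's actual logic. The function name, the string values and the choice to raise an error when a SecureFile is required but the tablespace lacks ASSM are assumptions made for illustration.

```python
def lob_storage(policy: str, requested: str, assm_tablespace: bool) -> str:
    """Decide the storage type for a new LOB.

    policy:     PERMITTED, NEVER, FORCE, ALWAYS or IGNORE (db_securefile)
    requested:  "SECUREFILE", "BASICFILE", or "" if the DDL names neither
    Returns "SECUREFILE" or "BASICFILE"; raises ValueError when a SecureFile
    is required but the tablespace is not ASSM (assumed behaviour).
    """
    if policy in ("NEVER", "IGNORE"):
        return "BASICFILE"                    # SECUREFILE requests are not honoured
    if policy == "ALWAYS":
        # Try SecureFile first, fall back to BasicFile where it is not possible.
        return "SECUREFILE" if assm_tablespace else "BASICFILE"
    if policy == "FORCE":
        wants_securefile = True               # every LOB becomes a SecureFile
    elif policy == "PERMITTED":
        wants_securefile = requested == "SECUREFILE"
    else:
        raise ValueError(f"unknown db_securefile value: {policy}")
    if not wants_securefile:
        return "BASICFILE"
    if not assm_tablespace:
        raise ValueError("SecureFiles need a locally managed ASSM tablespace")
    return "SECUREFILE"


for policy in ("PERMITTED", "NEVER", "FORCE", "ALWAYS", "IGNORE"):
    print(policy, lob_storage(policy, "", assm_tablespace=True))
```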
26 Advanced Features - Deduplication Examples
- Keywords: DEDUPLICATE / KEEP_DUPLICATES
- Create table with SECUREFILE LOB column with LOB-level deduplication
    CREATE TABLE tbl1 (a BLOB)
      LOB(a) STORE AS SECUREFILE (
        DEDUPLICATE);
- Disable deduplication on SECUREFILE LOB
    ALTER TABLE tbl1 MODIFY
      LOB(a) (
        KEEP_DUPLICATES);
27 Advanced Features - Compression Examples
- Keywords: COMPRESS / NOCOMPRESS
- Create table with SECUREFILE compressed LOB column
    CREATE TABLE tbl1 (a BLOB)
      LOB(a) STORE AS SECUREFILE (COMPRESS);
- Create table with SECUREFILE compressed LOB column with highest level of compression
    CREATE TABLE tbl2 (b BLOB)
      LOB(b) STORE AS SECUREFILE (COMPRESS HIGH);
- Modify compression level from HIGH to MEDIUM
    ALTER TABLE tbl2 MODIFY
      LOB(b) (COMPRESS MEDIUM);
28 Advanced Features - Encryption Examples
- Create table with SECUREFILE LOB column with encryption (two alternative forms)
    CREATE TABLE tbl1 (a BLOB ENCRYPT)
      LOB(a) STORE AS SECUREFILE;
  or
    CREATE TABLE tbl1 (a BLOB)
      LOB(a) STORE AS SECUREFILE (ENCRYPT);
- Create table with SECUREFILE LOB column with AES256 encryption
    CREATE TABLE tbl2 (b BLOB ENCRYPT USING 'AES256')
      LOB(b) STORE AS SECUREFILE;
- Enable encryption on SECUREFILE LOB column using AES128
    ALTER TABLE tbl1 MODIFY LOB(a) (ENCRYPT USING 'AES128');
29 Migration to SecureFiles
- Easiest approach is to just enable SecureFiles on new partitions
  - Old data stays as LOBs
- Migrating existing data requires a table rebuild
  - Can be done at the partition level
- Online Table Redefinition can be used to eliminate downtime
  - No need to take the table or partition offline
  - Additional storage equal to the entire table and all LOB segments must be available
  - Global indexes need to be rebuilt
- Recommend setting the NOLOGGING storage attribute for destination SecureFile columns during migration to avoid performance problems with redo generation
- If the destination table is partitioned, online redefinition can be done in parallel
30 The Best of Files and Databases
- SecureFiles have all the leading-edge file system capabilities
  - Deduplication, Encryption, Compression, Logging
- SecureFiles have advanced DB capabilities not in file systems
  - Transactions, Read Consistency, Flashback
  - Readable Standby, Consistent Backup, Point in Time Recovery
  - Fine Grained Auditing, Label Security
  - XML indexing, XML Queries, XPath
  - Real Application Clusters
  - Automatic Storage Management
  - Partitioning and ILM
  - Search across metadata and file content
- Capabilities go far beyond any other database or file system
- Having the best of both worlds removes the need to compromise
31 XML DB
32 Evolution of Oracle's XML Support
- 1998: XML APIs
- 2001: XML Storage and Repository
- 2004: XQuery, Performance
- 2007: Binary XML Storage and Indexing
33 XML DB Customers
SQL Centric
XML Centric
Document Centric
34 Extending the lead in 11g
- Improvements for XML Schema Optimized XML
- In-place XML Schema Evolution
- Support for Partitioning
- Intelligent Defaults for XML Storage
- Performance
- Improvements for Schema-less XML Storage
- Binary XML
35 Schema-Optimized XML Storage: In-place XML Schema Evolution
- Makes certain changes to an XML schema with zero downtime
- Provides support for the most common types of changes, including
  - Addition of new elements
  - Addition of new attributes
36 Schema-Optimized XML Storage: Partitioning Support
- XML Schema storage can be used in conjunction with Partitioning
- Full support for partition maintenance operations
- Full support for partition pruning
- Correct solution for managing large volumes of data!
37 Schema-Optimized XML Storage: Intelligent Defaults
- Defaults implemented based on in-depth knowledge of how best to structure the underlying storage model
  - Delivers optimal performance
- Makes it easier to use Oracle Text capabilities to perform text-based searches on the contents of specific elements within an XML document
38 Schema-Optimized XML Storage: Performance Optimizations
- When compared with Oracle 10g R2
  - 10x improvement in throughput when loading certain types of XML data into XML schema-optimized storage
  - 10x improvement for XML publishing operations that use the XMLAgg SQL operator
- Eliminates the 64K size limit on text nodes
39 New in 11g: Binary XML
40 Binary XML: New 11g Storage Model
- Binary XML is a new storage model and indexing technique
- Delivers high-performance insert, update, and query operations in cases where the full flexibility of the XML data model is required and use of an XML schema is not appropriate
- Syntax
    CREATE TABLE purchaseOrder OF XMLTYPE
      XMLTYPE STORE AS BINARY XML;
41 Schema-less XML Storage
- Benefits
  - Reduced storage requirements
  - More efficient CPU, network, and memory utilization
  - Flexible XML Schema support
  - Support for 11g SecureFiles
42 Schema-less XML Storage: Reduced Storage Requirements
- Binary XML is a compact representation of an XML document
- Reduces disk space even before traditional database compression techniques are applied
- Achieved by tokenizing tags and converting text nodes and attribute values from text to a native representation
43 Schema-less XML Storage: Reduced CPU and Memory Overhead
- Binary XML format addresses one of the biggest problems associated with large-scale XML deployments
  - Overhead associated with parsing and serializing XML every time data moves between different application tiers
- With Binary XML
  - On-disk representation of data is the same as the in-memory representation and the on-wire representation
  - Since the representation of XML is shared by all tiers, this enables efficient exchange of XML content
44 Schema-less XML Storage: Reduced Network Overhead
- Network overhead is reduced by using the compact internal format rather than the traditional serialized text format to transmit XML on the wire
45 Schema-less XML Storage: Flexible Schema Support
- Supports both schema and non-schema XML
- Binary model allows documents associated with one or more schemas to be stored in the same table or column
- This eliminates the majority of problems that can arise when changes occur to an XML schema
46 Schema-less XML Storage: SecureFiles
- Binary XML leverages all features of 11g SecureFiles
- Achieves maximum throughput for storage and retrieval
47 Secure Enterprise Search
48 Oracle SES Solution
- Make Information and Business Applications Easy to Access
  - Allow Unified View of All Enterprise Sources
  - Oracle Sources / Non-Oracle Sources
- Maintain High-Level Security
- Provide High Quality Search Capability
  - Advanced Query Capability
  - Better User Experience
- Better Manageability
  - Crawling, Indexing, Search Reporting, UI Customization
- High Scalability and High Availability
  - Oracle Clusterware etc.
49 All Your Enterprise Sources: Oracle and Non-Oracle Repositories
- ERP/CRM App.
- E-Business Suite Employees
- E-Business Suite iProcurement
- E-Business Suite Learning Management
- Oracle Calendar
- Siebel Accounts
- Siebel Contacts
- Siebel Solutions
- Siebel SR Attachments
- Siebel SRs
- Siebel Sales Tool
- Content
- Documentum
- FileNet CE
- IBM Content Manager
- OpenText LiveLink
- Hummingbird
- Lotus Domino
- Oracle Content DB
- Oracle Content Server
- Interwoven
- SAP KM
- Email
- MS Exchange
- IBM Lotus Notes
- Oracle Collaboration Suite
- Portal
- Documentum eRoom
- Oracle Portal and WebCenter
- MS SharePoint
- Websphere
- Database
- Oracle DB
- SQL Server
- Any JDBC-enabled Databases
- BI Applications
- Hyperion
- Business Objects
- Microstrategy
- Cognos
- File System
- Unix/Linux
- Windows
50 Simplify Information Access Through a Single Unified Search
Today
- Siloed search
- Inconsistent user experience
- Primarily structured sources
- Lack of security
Tomorrow
- Joined-up / federated
- Consistent user experience
- Structured and unstructured sources
- Highly secured
51 Secure Search across your Enterprise: Provide the right access and the right results
52 Directly Accessing Applications: Accessing Your Enterprise Application Easily
1. Enter search terms, e.g. "Product sales southeast region"
2. Results shown based on user access for reports and other BI content
3. Click through to view actual reports
53 SES Customers
Ministry of Science Denmark
University of Tokyo
Bank of Austria
54 Location and Spatial
55 Location and Spatial in Oracle Database
- Readily available location data
  - GPS, Web Services, packaged data
- Integrated information flows to and from multiple sources
  - Real-time information updates
- Extremely high data volumes
  - Terabytes to petabytes of machine-generated data
- Integrate location information into business processes, operational workflows, and business intelligence applications
- Leverage geospatial tools, solutions and analytics
56 Location and Spatial in Oracle Database
57 Oracle Core Spatial Capabilities
Spatial Data Types
Oracle10g Spatial
All Spatial Data Stored in the Database
Spatial Access Through SQL
    SELECT a.building_id
      FROM facility a, facility b
     WHERE SDO_WITHIN_DISTANCE(a.geom, b.geom,
             'distance=10 unit=mile') = 'TRUE';
58 Spatial and Location Customers
Mapping
Emergency Response and Resource Planning
Army Corps of Engineers
Location Based Services
Asset and Fleet Management
59 Oracle Spatial 11g Enables
3D, Point Clouds, and LIDAR
Scrollable, Interactive Maps
Spatial Web Services
Raster Imagery
Geocoding and Routing
Oracle BI Dashboards
60 3D and Web Services Support
- Comprehensive 3D infrastructure for modeling, visualization, simulation
- Meets business requirements for 3D simulations and models of
  - Cityscapes, viewscapes, viewsheds, line-of-sight
  - Hazard assessments, urban models, city planning
  - As-built and reverse engineering structures
- OGC and ISO TC211 Enterprise Web Services Support
  - Meets requirements to provide spatial features as a service
  - Full transaction support for SOA architectures used by mapping agencies, energy, utilities, public sector
61 MapViewer Overview
- Publish spatial data to the web
- Centrally managed map definitions, symbology, and styling rules
- Java, XML, JavaScript APIs
- Map and feature cache
10.1.3.1 Quickstart Kit: developers download at www.oracle.com/technology/products/mapviewer
62 MapViewer Architecture
- Map response consists of
  - A streamed map image, or
  - A URL to the map image along with the map MBR
- Map request consists of
  - Base map name
  - Center of map
  - Image format
  - Image width/height
  - Optional tags: map name, JDBC query, others
[Diagram labels: XML via HTTP (request and response); SQL]
63 OBIEE Crime Information Warehouse demo
64 OBI EE demo
65 Semantic Database: RDF and Ontology support
66 What is Ontology?
An ontology is a formal representation of a set
of concepts within a domain and the
relationships between those concepts. It is
used to reason about the properties of that
domain, and may be used to define the domain.
67 What is RDF?
Resource Description Framework (RDF) is a family
of World Wide Web Consortium (W3C)
specifications originally designed as a metadata
data model, but which has come to be used as a
general method of modeling information through a
variety of syntax formats.
68 What is RDF?
The RDF metadata model is based upon the idea
of making statements about Web resources in the
form of subject-predicate-object expressions,
called triples in RDF terminology. The subject
denotes the resource, and the predicate denotes
traits or aspects of the resource and expresses
a relationship between the subject and the
object.
69 What is RDF?
For example, one way to represent the notion
"The sky has the color blue" in RDF is as the
triple: a subject denoting "the sky", a
predicate denoting "has the color", and an
object denoting "blue".
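The triple structure can be sketched in a few lines. The `Triple` type, the `match` helper and the identifiers `sky`, `hasColor`, `grass` and `locatedAbove` are hypothetical names chosen for illustration; real RDF identifies resources with URIs, and only the sky/blue fact comes from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


store = [
    Triple("sky", "hasColor", "blue"),       # the slide's example
    Triple("grass", "hasColor", "green"),    # made-up extra fact
    Triple("sky", "locatedAbove", "grass"),  # made-up extra fact
]

blue_things = [t.subject for t in match(store, predicate="hasColor", obj="blue")]
about_sky = match(store, subject="sky")
print(blue_things, len(about_sky))
```

Pattern matching with wildcards over triples is also the basic shape of a graph query, which is why queries do not need to be fixed in advance.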
70 Reasons to Perform Semantic Query and Data Management in the Database
- Discover data relationships across applications or large amounts of data
  - Structured data (database, apps, SOA, RSS schemas)
  - Unstructured data (email, office documents)
  - Rich data types (graphs, spatial, text, machine-generated)
- Queries are not defined in advance
- Schemas are continuously evolving
- Overcome isolated systems design
- High-transaction data inputs
71 Customers Managing Semantic Relationships in the Database
- Enterprise Information Integration
  - Metadata mapping of application data models
  - Use of structured data
  - Early adopters: Life Science, Pharma, Utilities, U.S. Government
- Knowledge Management
  - Analyze social networks: who / what relates to X?
  - Create ontology to facilitate ETL: what to extract?
  - Use of unstructured and structured data
  - Early adopters: Intelligence, Public Health, Life Science
72 New in 11g
- 10gR2 introduced RDF support for graph data model
- 11g adds a new data model for scalable, secure, standards-based storage and query of OWL and RDF ontologies
  - SQL-based query of these ontologies
  - Scales up to a billion triples
- Semantic Match SQL operator to query any relational data using a referenced ontology
  - Expands results to include related, semantically relevant content
- Inferencing support
  - Native, practical subset of OWL inferencing capability
  - Built-in RDFS inferencing
  - User-defined rules
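As an illustration of what RDFS inferencing means, the sketch below applies two standard RDFS entailment rules by forward chaining: subclass transitivity and propagation of type membership to superclasses. It is a toy in-memory version, not how Oracle implements inferencing, and the drug-class facts are invented for the example.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Forward-chain two RDFS rules until no new triples appear.

    (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    (x type A),       (A subClassOf B)  =>  (x type B)
    """
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            if p not in (TYPE, SUBCLASS):
                continue
            for (s2, p2, o2) in inferred:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))   # keep p: type stays type, subclass stays subclass
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("Aspirin", TYPE, "NSAID"),            # invented example data
    ("NSAID", SUBCLASS, "Analgesic"),
    ("Analgesic", SUBCLASS, "Drug"),
}
closure = rdfs_closure(facts)
print(sorted(closure - facts))
```

A query for everything of type `Drug` now finds `Aspirin` even though that fact was never stated, which is the sense in which inferencing expands results to semantically related content.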
73 Customer Drivers and Business Needs: Semantics pain points
- Recognized need for semantic data management
  - Business advantage, competitive differentiator
  - Derive new insights from semantic reasoning
- Growing challenges to manage semantic data
  - Scalability, accessibility, security, performance, manageability
- Consolidation and interoperability
  - Eliminate proprietary semantic Web silos
  - Make semantic and relational data interoperable in the enterprise
- Overall business drivers
  - Reduce risk and costs associated with specialized and unproven technology providers
  - Oracle reliability
74 Why is Oracle Spatial unique in Semantic Data Management?
- Oracle 11g is the only commercial database with native RDF/OWL semantics capability
- Can readily scale to ultra-large repositories (1 billion triples)
- Growing ecosystem of 3rd-party tools and partners
- Leverages Oracle Partitioning and Advanced Compression; RAC is also supported
  - Semantics customers expect to deploy very large servers
75 Query Semantic Data
- Choice of SQL or SPARQL (W3C standard)
  - SPARQL-like graph queries can be embedded in SQL
  - Key advantage: semantic queries can be combined with relational data
- SPARQL is currently supported through a Jena plug-in for Oracle
- Oracle plans to natively support SPARQL in the next major release of the RDBMS
76 Knowledge Mining Workflows
[Diagram labels: Web Resources; News, Email, RSS; Content Mgmt. Systems; Information Extraction (Categorization, Feature/term Extraction); Processed Document Collection; Ontology Engineering, Modeling Process; RDF/OWL; OWL Ontologies; Domain Specific Knowledge Base; SQL/SPARQL Query; Explore; Analyst; Browsing, Presentation, Reporting, Visualization, Query]
- Knowledge Mining and Analysis
  - Text Indexing using Oracle Text
  - Non-Obvious Relationship Discovery
  - Pattern Discovery
  - Text Mining
  - Faceted Search
77 Oracle Partners
Commercial
Open Source
- Jena: Semantics Framework
- Pellet: Reasoning Engine
78 Conclusion
79 Managing All Your Information
- Two key trends driving unstructured data into databases
  - Next-generation business applications
  - Compliance
- Oracle is the leading platform for managing all your information
  - Dominant in Location and Spatial
  - Leader in XML and Semantic Database
  - Proven in Information Retrieval and Media Management
  - Trend-setting Secure Enterprise Search
80 Questions?
oracle.com/technology/
At Amazon and other booksellers
81 Thank You