Title: Warehousing on the Web
1Warehousing on the Web
2Why Utilize the Web?
- What is the data Webhouse
- Managing clickstreams
- WWW today
- ROI
- DSS
3Data Webhouse
- Defined by Ralph Kimball
- Two distinct focuses
- Bringing the web to the warehouse
- Clickstream data as a source of information
- Bringing existing data warehouses to web
- Fully distributed environment
4Required Capabilities
- Capture clickstream logs and convert to tables for analysis
- Merge customer demographic and account info with above
- Interpret customer paths in website
- Identify abandoned sessions
- Use DW to drive customer responses appearing on your website
- DW querying and reporting available through web browsers
- Attach multimedia to DW
- DW security
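
The first capability, turning clickstream logs into tables, can be sketched as below. This is a minimal illustration that assumes the web server writes Common Log Format; the sample line, the `parse_clf_line` helper, and the column names are hypothetical, and real logs may use one of the other formats.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Turn one log line into a flat dict (one table row); None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # Convert text fields into typed values so they can be loaded into a table.
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row

# Made-up sample line for illustration only.
sample = ('192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /catalog/item42.html HTTP/1.0" 200 2326')
row = parse_clf_line(sample)
print(row["host"], row["path"], row["status"], row["size"])
```

Each parsed row still describes a single page retrieval; merging with customer data and path analysis would build on rows like these.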
5Architecture Web to Warehouse
- Beyond a comprehensive real-time snapshot of the business, also want knowledge of customer behavior
- Extended design factors
- Timeliness: real-time
- Data volume: no upper limit
- Response time: less than 10 seconds
6Hot Response Cache
- A file server holding complex file objects
- As a file server it is an I/O engine (bandwidth)
- Must hold objects which will be requested
- Security responsibility of requesting server
- Extension of original operational data store (ODS)
- Does not physically speed up database; creates illusion by storing predictable answers
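
The idea of storing predictable answers ahead of time can be sketched as follows. This is only an illustration: the `HotResponseCache` class, the key names, and the `slow_warehouse_query` stand-in are all hypothetical, and a real hot response cache would be a file server rather than an in-memory dictionary.

```python
class HotResponseCache:
    """Holds pre-built answers keyed by request; it never runs queries itself."""

    def __init__(self):
        self._objects = {}
        self.hits = 0
        self.misses = 0

    def preload(self, key, payload):
        """Store an answer ahead of time for a request we expect to see."""
        self._objects[key] = payload

    def fetch(self, key, fallback):
        """Return the stored answer, or call the slower fallback on a miss."""
        if key in self._objects:
            self.hits += 1
            return self._objects[key]
        self.misses += 1
        return fallback(key)

def slow_warehouse_query(key):
    # Stand-in for a real database round trip (hypothetical).
    return f"computed:{key}"

cache = HotResponseCache()
cache.preload("top10-products-today", "<html>cached report</html>")
print(cache.fetch("top10-products-today", slow_warehouse_query))
print(cache.fetch("ad-hoc-question", slow_warehouse_query))
```

The database is no faster in this sketch; only requests that were anticipated and preloaded skip the slow path.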
7Who are our users?
- Traditional
- Power users
- need database connectivity
- Analysts
- want to manipulate existing data
- Report viewers
- view standardized reports
- Web
- Our customers
- Our business partners
- Our employees
8Clickstreams
- Clickstream is not just another data source
- Distributed nature leads to multiple data sources which require synchronization
- Multiple parties
- More than a dozen log file formats for capturing clickstream data
- Search specification
- Basic form of clickstream data is stateless
- Log shows isolated page retrieval event
- Clickstream data is anonymous
- Today's promotions
- Clickthroughs and referrals as a revenue source
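
Because each log entry is an isolated, anonymous page retrieval, sessions have to be inferred. The sketch below groups hits by a visitor key and starts a new session after a period of inactivity. The 30-minute cutoff, the use of the host address as the visitor key, and the sample hits are assumptions made for illustration.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cutoff, not from the slides

def sessionize(hits):
    """Group (visitor_key, timestamp, path) hits into sessions per visitor."""
    sessions = []
    open_session = {}  # visitor_key -> index of that visitor's latest session
    for visitor, ts, path in sorted(hits, key=lambda h: (h[0], h[1])):
        idx = open_session.get(visitor)
        if idx is None or ts - sessions[idx]["end"] > SESSION_TIMEOUT:
            # First hit for this visitor, or too long since the last one.
            sessions.append({"visitor": visitor, "start": ts, "end": ts,
                             "paths": [path]})
            open_session[visitor] = len(sessions) - 1
        else:
            sessions[idx]["end"] = ts
            sessions[idx]["paths"].append(path)
    return sessions

# Made-up hits: two visitors, one of whom returns after a long gap.
hits = [
    ("192.0.2.7", datetime(2000, 10, 10, 13, 55), "/home"),
    ("192.0.2.7", datetime(2000, 10, 10, 13, 58), "/catalog"),
    ("192.0.2.7", datetime(2000, 10, 10, 15, 30), "/home"),
    ("198.51.100.4", datetime(2000, 10, 10, 13, 56), "/home"),
]
sessions = sessionize(hits)
for s in sessions:
    print(s["visitor"], s["paths"])
```

A host address is a weak visitor key (proxies and shared machines blur it), so a cookie or login would be a better key where available.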
9Clickstreams
- Clickstream post-processor receives raw log data from web server and normalizes it into a format which can be combined with application-derived data for insertion into DW
- Today's promotions
- Clickthroughs and referrals as a revenue source
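
The combining step of the post-processor might look like the sketch below, which joins normalized session records with application-derived login data and flags abandoned sessions. The `build_session_rows` helper, the `/cart` and `/checkout/complete` paths, and the customer identifiers are hypothetical.

```python
def build_session_rows(sessions, logins):
    """Combine normalized sessions with application-derived login data."""
    rows = []
    for s in sessions:
        reached_cart = "/cart" in s["paths"]
        completed = "/checkout/complete" in s["paths"]
        rows.append({
            "session_id": s["session_id"],
            # None means the visitor never logged in and stays anonymous.
            "customer_id": logins.get(s["session_id"]),
            "page_count": len(s["paths"]),
            # Assumed definition: reached the cart but never completed checkout.
            "abandoned": reached_cart and not completed,
        })
    return rows

# Made-up normalized sessions and login records.
sessions = [
    {"session_id": "s1", "paths": ["/home", "/cart", "/checkout/complete"]},
    {"session_id": "s2", "paths": ["/home", "/cart"]},
    {"session_id": "s3", "paths": ["/home"]},
]
logins = {"s1": "C1001", "s2": "C2002"}

rows = build_session_rows(sessions, logins)
for r in rows:
    print(r)
```

The resulting rows have a fixed set of columns, which is what makes them ready for insertion into a warehouse table.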
10Why Bring DW to Web?
- Primary function of DW is to publish information; web is a good partner
- Need distributed DW; web provides universal connectivity
- Universal front-end: web browser
11Web Pushes Data Warehouse
- User interface effectiveness measurable
- Queries and updates mixed
- Speed expected: 10-second rule
- Global
- 24 x 7 expected
- International characters, dates, addresses
- Expanded multimedia
- Animation, zoomable images, maps, video clips
- Need material in digital form
- Enterprise information portal will require items
to be searchable
12Web Pushes Data Warehouse
- Mass customization
- Dynamically created web pages (XML)
- Fully distributed
- Linking together all the data marts
- Security and Privacy
- Publish only to those who need to know
- User profiles and access profiles defined in one place
- Full-time expert security person
13Second Generation User Interface Guidelines
- Near-instantaneous performance
- Website Design
- Design for lowest common denominator
- Measure page performance on a continuous basis
- Paint navigation buttons immediately
- Disclose content progressively
- Implement page caching
- Cache data, reports
- Improve web server bandwidth
- Improve server throughput
14Second Generation User Interface Guidelines
- Data Webhouse design
- Adapt all web design responses
- Select appropriate DBMS software: dimensional models, OLAP
- Use indexes, aggregations
- Partition files
- Increase RAM
- Use parallel processing
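
The effect of indexes and aggregations can be illustrated with a small in-memory SQLite database. The table names, columns, and rows are invented for this sketch; the point is only that a report reads a small pre-built aggregate instead of scanning detail rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_hits (hit_date TEXT, page TEXT, visitor TEXT);
    CREATE INDEX idx_hits_date ON page_hits (hit_date);
    INSERT INTO page_hits VALUES
        ('2000-10-10', '/home', 'a'),
        ('2000-10-10', '/home', 'b'),
        ('2000-10-10', '/catalog', 'a'),
        ('2000-10-11', '/home', 'c');
    -- Aggregate table built once, then queried many times.
    CREATE TABLE daily_page_agg AS
        SELECT hit_date, page, COUNT(*) AS hits
        FROM page_hits
        GROUP BY hit_date, page;
""")

def hits_for(day, page):
    """Answer from the small aggregate table instead of the detail rows."""
    cur = conn.execute(
        "SELECT hits FROM daily_page_agg WHERE hit_date = ? AND page = ?",
        (day, page),
    )
    found = cur.fetchone()
    return found[0] if found else 0

print(hits_for("2000-10-10", "/home"))
```

In a real webhouse the aggregate would need refreshing as new clickstream data arrives; this sketch builds it once.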
15Meet User Expectations
- Website design
- Site navigation choices
- Help choices
- Communication with various groups: response must be assured
- Headlines serious and define content
- Indicate off-screen material
- Survey customer needs and wants
16Meet User Expectations
- Data Webhouse design
- Report library
- Folder of previous queries, reports
- Dimension browser: viewing dimension can assist report creation
- Business metadata interface: understand organization's data assets
17Streamline Process
- Business processes designed from ground up to work seamlessly on web
- Website design
- Reengineer to streamline process and make navigation easier, uniform interfaces
- Remove barriers to reaching page
- Minimize clicks and new windows
- Allow interruption and return
18Streamline Process
- Data Webhouse design
- Build an explicit value chain for reporting and analysis around the application suite using conformed dimensions and facts
- Drill across functions
- Single user interface for reporting against all parts of business
- Master report library and FAQs
- Single login and single console access to webhouse
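
Drilling across with a conformed dimension can be sketched as follows: two fact sets from different business functions are each summarized by the same product key and then lined up side by side. The product keys and numbers are made up, and plain Python lists stand in for fact tables.

```python
from collections import defaultdict

# Both fact sets use the same product keys (the conformed dimension).
sales_facts = [("P1", 120.0), ("P2", 75.5), ("P1", 30.0)]
visit_facts = [("P1", 400), ("P2", 90), ("P3", 15)]

def drill_across(sales, visits):
    """Summarize each fact set by product, then line the results up by key."""
    report = defaultdict(lambda: {"revenue": 0.0, "visits": 0})
    for product, amount in sales:
        report[product]["revenue"] += amount
    for product, count in visits:
        report[product]["visits"] += count
    return dict(sorted(report.items()))

result = drill_across(sales_facts, visit_facts)
for product, measures in result.items():
    print(product, measures["revenue"], measures["visits"])
```

The merge only works because both sources agree on what a product key means; without a conformed dimension the rows could not be matched reliably.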
19Reassure Users
- Website Design
- Map of processes
- Data Webhouse design
- Provide status and lineage of current data
- Provide status of running reports
- Active notification
- Allow for entry of NA if data not available
- Time stamped dimensions
- Time stamped reports
20Allow Problem Resolution
- Website design
- Allow backtracking, rollback, play forward
- Keep old transactions
- Easy error reporting
- Acknowledge, track and follow up all user inputs, show wait time
- Assist searching
- Data Webhouse design
- Provide adequate end user support
- Show aggregates in use and available
- Show system load and percent completed
21Build Trust
- Clearly state and observe website's policies for using customers' identity
- Website design
- Do not abuse privacy
- Link to privacy statement
- Use friendly pictures of people
- Distinguish between ad content and editorial
content
22Build Trust
- Data Webhouse design
- Two-factor security
- What you know: password
- What you possess: token
- Track changes in employee and contractor status
- Create and enforce roles for employees, contractors and customers
- Manage webhouse security directly
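
The combination of two-factor security, status tracking, and roles can be sketched as below. Everything here is illustrative: the `Account` class, the permission names, and especially `expected_token_code`, which is a simplified stand-in for a real token scheme and not a standard algorithm.

```python
import hashlib
import hmac
import os

# Hypothetical role-to-permission table, defined in one place.
ROLE_PERMISSIONS = {
    "employee": {"view_reports", "run_queries"},
    "contractor": {"view_reports"},
    "customer": {"view_own_orders"},
}

def hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def expected_token_code(secret, counter):
    """Simplified counter-based code; a real system would use a standard scheme."""
    digest = hmac.new(secret, str(counter).encode(), hashlib.sha256).hexdigest()
    return digest[:6]

class Account:
    def __init__(self, role, password, token_secret):
        self.role = role
        self.active = True  # set to False when employment or contract ends
        self.salt = os.urandom(16)
        self.password_hash = hash_password(password, self.salt)
        self.token_secret = token_secret  # assumed shared with the user's token

def authorize(account, password, token_code, counter, permission):
    """Grant access only if status, both factors, and role all check out."""
    if not account.active:
        return False
    knows = hmac.compare_digest(
        hash_password(password, account.salt), account.password_hash)
    possesses = hmac.compare_digest(
        expected_token_code(account.token_secret, counter), token_code)
    allowed = permission in ROLE_PERMISSIONS.get(account.role, set())
    return knows and possesses and allowed

acct = Account("contractor", "s3cret", b"token-secret")
code = expected_token_code(b"token-secret", 1)
print(authorize(acct, "s3cret", code, 1, "view_reports"))
print(authorize(acct, "s3cret", code, 1, "run_queries"))
```

Deactivating the account (`acct.active = False`) blocks access regardless of password or token, which is where tracking changes in employee and contractor status comes in.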
23Provide Communication Hooks
- Website design
- Provide useful links to others, internal and external
- Remove links that invalidate the back button
- Use copyable URLs
- Use URL as medium of distribution
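
One way to make a URL copyable and usable as a medium of distribution is to carry every report parameter in the query string, so a pasted link reproduces the same view. The base address, parameter names, and helper functions below are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://webhouse.example.com/reports/sales"  # hypothetical address

def report_url(region, year, sort="revenue"):
    """Put every report parameter in the query string so the link is self-contained."""
    query = urlencode({"region": region, "year": year, "sort": sort})
    return f"{BASE_URL}?{query}"

def read_report_url(url):
    """Recover the report parameters from a pasted link."""
    query = parse_qs(urlparse(url).query)
    return {name: values[0] for name, values in query.items()}

link = report_url("North East", 2000)
print(link)
print(read_report_url(link))
```

Since no state lives in a server-side session, the link can be emailed or bookmarked and still open the same report.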
24Advantages of Web Today 1998 2000
- Immediate worldwide access
- Centralized management - Decentralized
- Thin client
- Multi-platform (client and server) - Distributed
- Little or no software distribution - Downloads
25Disadvantages of Web Today 1998 2000
- Immature technology - Teenager
- Security - Solutions
- Speed restricted by bandwidth - data and logic must both travel across internet
- Design limited to least common denominator or access restricted to specific browser
26Vulnerabilities
- Physical assets
- Information assets
- theft
- modification
- Software assets
- Ability to conduct business
27Web Architecture
- Thin client: browser, applets/ActiveX, email, spreadsheet, word processing
- Communication layer (network/internet)
- Internet server: analysis/statistics, graphics, report writer, SQL query
- OLAP server: multidimensional summary/alternative database
- Database servers: relational tables; data warehouse (relational database)
28Business Management through Information
- Analysis of historical records
- order processing, inventory levels, shipments, receivables, customer history, etc.
- Goals include
- Measures of efficiency
- Anticipate changes (planning and forecasting)
- Make adjustments
- Integration of model and control function
29Rule-Based Management
- Create Strategic rules
- IF market demand increases THEN implement marketing campaign A3
- IF profit margin drops below value X THEN adjust overhead by
- Must not forget alert rules
- If unanticipated condition, then notify CFO
- Must not be too reactive
- would cause thrashing
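
The strategic rules, the alert rule, and the warning about thrashing can be sketched together. This is one possible reading, not a prescribed design: the cooldown mechanism, the `MARGIN_FLOOR` threshold standing in for "value X", the expected ranges, and the metric names are all assumptions.

```python
class Rule:
    """IF condition(metrics) THEN action, with a cooldown to limit thrashing."""

    def __init__(self, condition, action, cooldown_periods=0):
        self.condition = condition
        self.action = action
        self.cooldown_periods = cooldown_periods
        self._last_fired = None

    def evaluate(self, period, metrics):
        if not self.condition(metrics):
            return None
        recently = (self._last_fired is not None
                    and period - self._last_fired <= self.cooldown_periods)
        if recently:
            return None  # fired too recently; hold off rather than over-react
        self._last_fired = period
        return self.action

MARGIN_FLOOR = 0.20  # stands in for "value X"; the number is made up
EXPECTED_RANGES = {"demand_change": (-0.5, 0.5), "profit_margin": (0.0, 1.0)}

def make_rules():
    return [
        Rule(lambda m: m["demand_change"] > 0,
             "implement marketing campaign A3", 2),
        Rule(lambda m: m["profit_margin"] < MARGIN_FLOOR,
             "adjust overhead", 2),
    ]

def run_period(rules, period, metrics):
    """Return the actions for one reporting period, alert rule included."""
    actions = [a for a in (r.evaluate(period, metrics) for r in rules) if a]
    unanticipated = any(not (lo <= metrics[k] <= hi)
                        for k, (lo, hi) in EXPECTED_RANGES.items())
    if unanticipated:
        actions.append("notify CFO")  # alert rule for conditions outside plan
    return actions

rules = make_rules()
for period in (1, 2, 4):
    print(period, run_period(rules, period,
                             {"demand_change": 0.1, "profit_margin": 0.3}))
```

With a cooldown of two periods, the campaign rule fires in period 1, stays quiet in period 2, and fires again in period 4, so a persistent condition does not trigger the same response every period.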
30OLDM Decision Process
- Simultaneous capture of
- Decision support information
- Surveyed customer on-line in exchange for an additional discount
- with business function inputs
- Immediate computation or estimation of secondary information
- based on planning and forecasting rules
- Decision support information is
- available on-line
- ready to use as is
Management Defined !
31OLDM Decision Process
- Derived data becomes control information
- Automation of analysis and decision support
- immediately available to management
- Problems documented on-line
- Classes of problem and corrective action codified
- problem recognition
- decision rules
32OLDM Decision Process
- Requires four types of information
- Characteristics which identify a class of problem
- Corrective action (management responses by problem class)
- Rules to implement actions
- Record of result
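
The four types of information can be pictured as fields in a small data structure. The sketch below is hypothetical throughout: the `ProblemClass` and `DecisionLog` names, the stockout example, and the reorder formula are invented to show how the four pieces might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ProblemClass:
    name: str
    identifies: Callable[[Dict], bool]  # 1. characteristics of the problem class
    corrective_action: str              # 2. management response for this class
    apply: Callable[[Dict], Dict]       # 3. rule that implements the action

@dataclass
class DecisionLog:
    entries: List[Dict] = field(default_factory=list)  # 4. record of result

    def handle(self, event, problem_classes):
        """Match the event to a problem class, act on it, and record the result."""
        for pc in problem_classes:
            if pc.identifies(event):
                outcome = pc.apply(event)
                self.entries.append({"problem": pc.name,
                                     "action": pc.corrective_action,
                                     "result": outcome})
                return outcome
        return None  # no codified problem class matched

# Made-up problem class: stock has fallen below the reorder point.
stockout = ProblemClass(
    name="stockout risk",
    identifies=lambda e: e["on_hand"] < e["reorder_point"],
    corrective_action="reorder",
    apply=lambda e: {"order_qty": 2 * e["reorder_point"] - e["on_hand"]},
)

log = DecisionLog()
print(log.handle({"on_hand": 3, "reorder_point": 10}, [stockout]))
print(log.entries)
```

Keeping the record of result alongside the rule is what lets the outcome feed back into later decisions.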
33Potential of OLDM
- Better managed business
- knowledge asset capture and retention
- consistency across enterprise
- flexible, highly responsive
- Close loop with customer
- event and market driven but controlled
- Direct customer interaction
- via web, telephone, remote connection
- Improved systems capacity planning and system management
- Re-alignment of business and IT