Title: Smart Home Technologies
1Smart Home Technologies
- Data Management and Databases
2Databases for Smart Homes
- Requirements
- Database Types
- Database Technologies
- Smart Home Databases
- Data Mining
3Data Storage Requirements
- Sensor data
- Temperature (15 @ 8 Kbps)
- Humidity (15 @ 8 Kbps)
- Gas (15 @ 8 Kbps)
- Light (15 @ 8 Kbps)
- Motion (15 @ 8 Kbps)
- Pressure (100 @ 8 Kbps)
- Microphone (15 @ 500 Kbps)
- Camera (15 @ 10 Mbps)
4Data Storage Requirements
- User data
- Multimedia
- Phone messages/conversations (500 Kbps to 10 Mbps)
- Music (500 Kbps)
- TV/Radio broadcasts (500 Kbps to 10 Mbps)
- Home movies (10 Mbps)
- Images
- Computer
- Programs
- Data files
- Operating systems
5Data Storage Issues
- Issues
- Query frequency and type
- Sampling/recording rates
- 205 sensors (158,900 Kbps)
- Multimedia recordings
- Simultaneous playback
- Analysis, prediction, decision-making queries
- Transaction granularity
- Historical data, decay
- Security and privacy
- Centralized vs. distributed
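The aggregate figure of 205 sensors at 158,900 Kbps follows from the per-sensor counts and rates listed under Data Storage Requirements. A minimal sketch of that arithmetic, assuming 1 Mbps = 1,000 Kbps, 1 Kbps = 1,000 bits per second, and continuous recording (the function names are illustrative only):

```python
# Sensor inventory from the "Data Storage Requirements" slide:
# name -> (number of sensors, rate per sensor in Kbps)
SENSORS = {
    "temperature": (15, 8),
    "humidity": (15, 8),
    "gas": (15, 8),
    "light": (15, 8),
    "motion": (15, 8),
    "pressure": (100, 8),
    "microphone": (15, 500),
    "camera": (15, 10_000),  # 10 Mbps, assuming 1 Mbps = 1,000 Kbps
}


def total_sensor_count(sensors):
    """Number of physical sensors across all types."""
    return sum(count for count, _rate in sensors.values())


def aggregate_rate_kbps(sensors):
    """Combined data rate if every sensor streams at its listed rate."""
    return sum(count * rate for count, rate in sensors.values())


def raw_bytes_per_day(rate_kbps):
    """Uncompressed bytes per day, assuming 1 Kbps = 1,000 bits/s."""
    seconds_per_day = 86_400
    return rate_kbps * 1000 / 8 * seconds_per_day


print(total_sensor_count(SENSORS))   # 205
print(aggregate_rate_kbps(SENSORS))  # 158900
```

Under these assumptions the raw stream comes to roughly 1.7 terabytes per day, almost all of it from the cameras, which is why the sampling rate and what to store matter so much.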
6What Data to Store
- Type of Data
- Raw data
- Pre-processed
- Compressed
- Frequency of Data Storage for Sensor Data
- Tradeoff between precision and quantity
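One possible way to trade precision for quantity is to store a reading only when it differs enough from the last stored value. The sketch below is an illustration of that idea, not a method named in these slides; `deadband_filter`, `tolerance`, and the sample readings are all assumptions.

```python
def deadband_filter(samples, tolerance):
    """Keep a (timestamp, value) sample only if its value differs from the
    last *stored* value by more than `tolerance`."""
    stored = []
    for timestamp, value in samples:
        if not stored or abs(value - stored[-1][1]) > tolerance:
            stored.append((timestamp, value))
    return stored


# Made-up temperature readings: (seconds, degrees F)
readings = [(0, 70.0), (1, 70.1), (2, 70.2), (3, 71.5), (4, 71.6), (5, 69.0)]

coarse = deadband_filter(readings, tolerance=0.5)
print(coarse)  # [(0, 70.0), (3, 71.5), (5, 69.0)]
```

A larger tolerance stores fewer rows but can no longer reconstruct small fluctuations; a tolerance of zero keeps every changed reading.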
7Sensor Data Example
- 9/8/2002 201 AM A5 (Coffee Maker) ON
- 9/8/2002 1659 AM A9 (A/C) ON
- 9/8/2002 35852 AM A0 (Stereo) ON
- 9/8/2002 5570 AM A2 (Kitchen Light) ON
- 9/8/2002 3142 AM A5 (Coffee Maker) OFF
- 9/8/2002 783 AM A3 (Stove) ON
- 9/8/2002 125452 PM A10 (Bathroom Light) ON
- 9/8/2002 4585 AM A0 (Stereo) OFF
- 9/8/2002 8120 AM A3 (Stove) OFF
- 9/8/2002 9610 AM A8 (Computer) ON
- 9/8/2002 10819 AM A4 (Bathtub Heater) ON
- 9/8/2002 1194 AM A0 (Stereo) ON
- 9/8/2002 945 AM A8 (Computer) OFF
- 9/8/2002 1094 AM A4 (Bathtub Heater) OFF
- 9/8/2002 225 PM A10 (Bathroom Light) OFF
- 9/8/2002 25237 PM A0 (Stereo) OFF
- 9/8/2002 420 PM A9 (A/C) OFF
8Media Viewing Example
9Multimedia Example
- Digital Silhouettes (Predictive Networks)
- Predicting web surfing behavior
- Microsoft (2002): track TV viewing preferences
- 140 data items for each user
- Demographics (50)
- Subcategories within gender, age, income, education, occupation, and race
- Content preferences (90)
- golf, music, yoga
10Database Types / Data Models
- Relational
- OO
- Hybrid (Object-Relational)
- Temporal
- Deductive
- Others
- Spatial, ...
11Example Data Representations
- Relational
- We all know: flat tables of atomic attributes with foreign key relationships
- OO
- Complex data reps
- multivalued, composite
- Temporal
- Relational model: add valid start, end dates to each table (versions of info and when valid)
- Includes time, events, durations
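The temporal extension of the relational model can be sketched with two extra columns per table. The example below uses Python's built-in `sqlite3`; the table `thermostat_setting`, its columns, the sample rows, and the `9999-12-31` "still valid" convention are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE thermostat_setting (
        room        TEXT    NOT NULL,
        target_f    INTEGER NOT NULL,
        valid_start TEXT    NOT NULL,  -- ISO-8601 date, inclusive
        valid_end   TEXT    NOT NULL   -- exclusive; '9999-12-31' = still valid
    )
    """
)
# Each row is one version of the setting plus the period when it was valid.
conn.executemany(
    "INSERT INTO thermostat_setting VALUES (?, ?, ?, ?)",
    [
        ("living room", 68, "2002-01-01", "2002-06-01"),
        ("living room", 74, "2002-06-01", "2002-09-15"),
        ("living room", 70, "2002-09-15", "9999-12-31"),
    ],
)


def setting_as_of(conn, room, day):
    """Return the target temperature that was valid for `room` on `day`."""
    row = conn.execute(
        "SELECT target_f FROM thermostat_setting "
        "WHERE room = ? AND valid_start <= ? AND ? < valid_end",
        (room, day, day),
    ).fetchone()
    return row[0] if row else None


print(setting_as_of(conn, "living room", "2002-09-08"))  # 74
```

ISO-8601 date strings compare correctly as text, which keeps the validity check a plain range predicate.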
12Operations
- DDL/DML (data def/manip languages)
- SQL
- OQL
- Update operations
- Built-in insert, delete, update
- Stored procedures for triggers, active (ECA) rules
13Example Operations for Temporal Databases
- INCLUDES
- Rows valid in a certain time period
- BEFORE/AFTER a time condition
- Set operations
- Union, intersection of 2 time periods
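The operations above can be illustrated on half-open time periods. This is a minimal sketch, not the syntax of any particular temporal query language; the `Period` class and its method names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Period:
    start: datetime  # inclusive
    end: datetime    # exclusive

    def includes(self, other: "Period") -> bool:
        """INCLUDES: `other` lies entirely within this period."""
        return self.start <= other.start and other.end <= self.end

    def before(self, other: "Period") -> bool:
        """BEFORE: this period ends no later than `other` starts."""
        return self.end <= other.start

    def after(self, other: "Period") -> bool:
        """AFTER: this period starts no earlier than `other` ends."""
        return other.before(self)

    def intersection(self, other: "Period") -> Optional["Period"]:
        """Overlap of two periods, or None if they do not overlap."""
        start, end = max(self.start, other.start), min(self.end, other.end)
        return Period(start, end) if start < end else None

    def union(self, other: "Period") -> List["Period"]:
        """One merged period if they overlap or touch, else both, in order."""
        if self.end < other.start or other.end < self.start:
            return sorted([self, other], key=lambda p: p.start)
        return [Period(min(self.start, other.start), max(self.end, other.end))]


morning = Period(datetime(2002, 9, 8, 6), datetime(2002, 9, 8, 12))
breakfast = Period(datetime(2002, 9, 8, 7), datetime(2002, 9, 8, 8))
midday = Period(datetime(2002, 9, 8, 10), datetime(2002, 9, 8, 14))
evening = Period(datetime(2002, 9, 8, 18), datetime(2002, 9, 8, 22))

print(morning.includes(breakfast))   # True
print(morning.intersection(midday))  # 10:00 to 12:00
```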
14Active DB
- Event-Condition-Action rules
- Allow for decisions to be made in the database instead of a separate application
- Relational
- Implemented as triggers
- Challenges
- Rule consistency
- (2 rules do not contradict)
- Guaranteed termination
- Trigger loops (T1 <-> T2)
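One common way to reason about guaranteed termination is to build a graph of which rule's action can raise which other rule's event and then look for cycles. The sketch below assumes that graph is already known and given as a dictionary; `find_trigger_cycle` is a hypothetical helper, and an acyclic graph is a sufficient (not necessary) condition for termination.

```python
def find_trigger_cycle(triggers):
    """Depth-first search for a cycle in a triggering graph.

    `triggers` maps a rule name to the rules its action may fire.
    Returns the rules along one cycle (first rule repeated at the end),
    or None if the graph is acyclic.
    """
    nodes = set(triggers)
    for targets in triggers.values():
        nodes.update(targets)
    state = {node: "unvisited" for node in nodes}
    path = []

    def visit(rule):
        state[rule] = "in_progress"
        path.append(rule)
        for nxt in triggers.get(rule, ()):
            if state[nxt] == "in_progress":      # back edge: loop found
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == "unvisited":
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[rule] = "done"
        return None

    for rule in sorted(nodes):
        if state[rule] == "unvisited":
            cycle = visit(rule)
            if cycle:
                return cycle
    return None


print(find_trigger_cycle({"T1": ["T2"], "T2": ["T1"]}))  # ['T1', 'T2', 'T1']
print(find_trigger_cycle({"T1": ["T2"], "T2": ["T3"]}))  # None
```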
15Smart Home Active DB Example
- Java, Postgres, Jess rules
- Event classification (local and composite)
- Data Manipulation Events
- TV show being viewed (channel, time, genre)
- Temporal Events (instance, recurring)
- Set temp to 70 degrees at 7:00 am workdays
- Exception Events
- Power failure
- Behavioral Events
- Time children home from school, dinner time
16Active DB Example (TCU)
| Title | Event | Condition | Action |
| --- | --- | --- | --- |
| TV View Menu | TV turned on | Molly is holding remote | Display shows matching Molly's preferences |
| Entry Lighting | Inhabitant enters house | Light level < threshold | Adjust lighting to predetermined level |
| Aroma-therapy | Every Friday night when Hanna sits on sofa | Always | Release aroma |
| Night Idle | John on sofa idle > 15 minutes, TV and lights are on | No other inhabitant in room | Turn off all devices in the room |
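The Entry Lighting rule can be sketched as a relational trigger. This is an illustration using Python's built-in `sqlite3`, not the Java/Postgres/Jess implementation mentioned earlier; the table names, the threshold of 40, and the preset brightness of 80 are all assumptions.

```python
import sqlite3

LIGHT_THRESHOLD = 40    # hypothetical sensor units
PRESET_BRIGHTNESS = 80  # hypothetical "predetermined level"


def build_home_db():
    """Create an in-memory database with one ECA rule expressed as a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        f"""
        CREATE TABLE light_sensor (room TEXT PRIMARY KEY, level INTEGER NOT NULL);
        CREATE TABLE lamp (room TEXT PRIMARY KEY, brightness INTEGER NOT NULL);
        CREATE TABLE entry_event (room TEXT NOT NULL, inhabitant TEXT NOT NULL);

        -- Event: a row is inserted into entry_event (inhabitant enters).
        -- Condition: the room's light level is below the threshold.
        -- Action: set the room's lamp to the preset brightness.
        CREATE TRIGGER entry_lighting
        AFTER INSERT ON entry_event
        WHEN (SELECT level FROM light_sensor WHERE room = NEW.room) < {LIGHT_THRESHOLD}
        BEGIN
            UPDATE lamp SET brightness = {PRESET_BRIGHTNESS} WHERE room = NEW.room;
        END;
        """
    )
    return conn


def record_entry(conn, room, inhabitant):
    """Raise the event by logging that someone entered a room."""
    conn.execute("INSERT INTO entry_event VALUES (?, ?)", (room, inhabitant))


def lamp_brightness(conn, room):
    return conn.execute(
        "SELECT brightness FROM lamp WHERE room = ?", (room,)
    ).fetchone()[0]
```

The application only records the entry event; the decision to adjust the lamp is made inside the database, which is the point of an active DB.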
17Distributed vs. Centralized
- Centralized database can produce a bottleneck
- Large volume of data input
- Large database
- Large volume of queries
- In distributed databases, data consistency, replication, and retrieval can be more problematic
- Consistency of schemas
- Retrieval in case the data location is not known
- Communication overhead to ensure database consistency
18SmartHome Database Architecture
- Centralized vs. distributed?
- Answer: Both
- Central storage of high demand, persistent data
- Distributed storage of low demand, dynamic data
- Distributed queries
- Push processing toward sensors
- Adaptive, hierarchical organization
- End-effector autonomy (smart sensor)
19Database Systems
- Commercial
- DB2
- Empress
- Informix
- Oracle
- MS Access
- MS SQL
- Sybase
- Free
- Berkeley DB
- PostgreSQL
- MySQL
20UTA MavHome DB
- Active
- Reactive and proactive (e.g., to predict)
- Distributed
- Information collection agents
- Rules
- Local (agent): what data they need to collect
- Distributed: coordinate overall monitoring of collected information
- Continuous monitoring of events
- Extension of SNOOP
21Microsoft Easy Living DB (2002)
- Relational
- Fast and robust, but awkward for some data
- World Model DB describes
- Computing devices
- People and their personal preferences/settings
- Services
- Rooms and doorways
- Serves as abstraction layer between sensors and applications that use data from sensors
- e.g., new sensors -> no change to applications
22Stanford Interactive Workspace
- Uses LORE
- A semi-structured XML DB system
- Still available, but work stopped in 2000
- Data stored is a catalog of (index to)
- documents, images, 3-D models, application-specific domain models
23Sensor Database Systems
- COUGAR project
- www.cs.cornell.edu/database/cougar
- Query processing over ad-hoc sensor networks
- Small database component (QueryProxy) at each sensor
- Sensor clusters provide local aggregations (e.g., min, max, mean)
- Assumes centralized index of all data sources
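Local aggregation works because min, max, and mean can be computed from small partial summaries that merge without the raw readings. The sketch below illustrates that idea only; it is not COUGAR's QueryProxy API, and `PartialAggregate` and the sample readings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PartialAggregate:
    """Summary a sensor cluster can send upstream instead of raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @classmethod
    def from_readings(cls, readings):
        readings = list(readings)
        return cls(len(readings), sum(readings), min(readings), max(readings))

    def merge(self, other):
        """Combine two cluster summaries into one."""
        return PartialAggregate(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        # The mean is derived from (total, count), so merged means stay exact.
        return self.total / self.count


upstairs = PartialAggregate.from_readings([68.0, 70.0, 72.0])
downstairs = PartialAggregate.from_readings([60.0, 80.0])
house = upstairs.merge(downstairs)
print(house.minimum, house.maximum, house.mean)  # 60.0 80.0 70.0
```

Carrying the sum and count rather than the mean itself is the design choice that makes the merge correct for clusters of different sizes.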
24Siemens Netabase
- The network is the database.
- Navas and Wynblatt, ACM SIGMOD 2001
- Sensor networks
- Large number of data sources (10^5)
- Volatile data and data organization
- Thin data servers on scaled-down hardware
- Netabase approach
- Query decomposition
- Characteristic routing (ala IP routing)
- Local joins
- Query evaluation
25Siemens Netabase
26Data Warehouses
- Repositories for data mining activities
- Aggregates/summaries of data help efficiency
- Optimized for decision-support, not transaction processing
- Definition (Elmasri, page 900)
- A subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decisions
- Replace "management" with "smart home agents"
27Warehouse Properties
- Very large: 100 gigabytes to many terabytes
- Tends to include historical data
- Workload: mostly complex queries that access lots of data, and do many scans, joins, aggregations. Tend to look for "the big picture".
- Updates pumped to warehouse in batches (overnight)
- Data may be heavily summarized and/or consolidated in advance (must be done in batches too, must finish overnight).
- Research work has been done (e.g. "materialized views") -- a small piece of the problem.
02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
28Data Warehouses
- Data Cleaning
- Data Migration: simple transformation rules (replace "gender" with "sex")
- Data Scrubbing: use domain-specific knowledge (e.g. zip codes) to modify data. Try parsing and fuzzy matching from multiple sources.
- Data Auditing: discover rules and relationships (or signal violations thereof). Not unlike data mining.
- Data Loading
- can take a very long time! (Sorting, indexing, summarization, integrity constraint checking, etc.) Parallelism a must.
- Full load: like one big transaction; change from old data to new is atomic.
- Incremental loading ("refresh") makes sense for big warehouses, but transaction model is more complex: have to break the load into lots of transactions, and commit them periodically to avoid locking everything. Need to be careful to keep metadata and indices consistent along the way.
02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
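Incremental loading breaks one large load into many small transactions that are committed periodically. A minimal sketch with Python's built-in `sqlite3` follows; the `device_usage` table, its columns, and the batch size are assumptions, and a real warehouse loader would also maintain summaries and indices along the way.

```python
import sqlite3


def make_warehouse():
    """Create a tiny in-memory stand-in for a warehouse fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device_usage (device TEXT, day TEXT, on_count INTEGER)"
    )
    return conn


def _commit_batch(conn, batch):
    # `with conn` commits if the block succeeds and rolls back on an error,
    # so each batch is its own atomic transaction.
    with conn:
        conn.executemany("INSERT INTO device_usage VALUES (?, ?, ?)", batch)


def incremental_load(conn, rows, batch_size=1000):
    """Load `rows` in batches; return the number of transactions committed."""
    committed = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            _commit_batch(conn, batch)
            committed += 1
            batch = []
    if batch:  # final partial batch
        _commit_batch(conn, batch)
        committed += 1
    return committed
```

A full load would instead wrap every insert in a single transaction, which is simpler but holds locks for the whole duration of the load.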
29Data Warehouses
02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
30Data Mining Definition
- Discovery of new information in terms of patterns or rules from vast amounts of data
- Extracts patterns that can't readily be found by asking the right questions (queries)
- TOO MUCH DATA FOR HUMANS
- Emerged from
- Artificial Intelligence: machine learning, neural nets, genetic algorithms
- Statistics
- Operations Research
31Data Mining Steps
- Data selection -- pick the data needed
- Data cleansing
- Fix bad data (e.g., spelling, zip codes)
- Hard to deal with missing, erroneous, conflicting, redundant data
- Enrichment
- Add data (e.g., age, gender, income)
- Data transformation
- Aggregate (e.g., zip codes -> regions)
- Data mining
- Reporting on discovered knowledge
32Types of Results
- Association rules
- Buy diapers -> buy lots of beer
- Sequential patterns
- Buy house -> buy furniture within months
- Classification trees
- Types of buyers (upscale, bargain-conscious, ...)
- Why do it?
- Make more money
- Science and medicine
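An association rule such as "buy diapers -> buy beer" is usually scored by its support and confidence. The sketch below uses the standard definitions of those two measures; the baskets are made-up data for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / antecedent_support


# Made-up shopping baskets
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

print(support(baskets, {"diapers", "beer"}))       # 0.4
print(confidence(baskets, {"diapers"}, {"beer"}))  # about 0.67
```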
33Data Mining Goals
- Find patterns to predict future events
- Find major groupings
- Groupings of buyers, stars, diseases
- Find which group something belongs to
- creditworthiness
34Data Mining Results
- Association rules
- Classification hierarchies
- Clustering
- Sequential patterns
- Patterns within time series
- Type of result, inputs, and algorithms vary
- Often interested in some combination of these types of knowledge
35Clustering
- Unsupervised learning techniques
- Training samples are unclassified
- Vs. supervised learning (classification)
- Drug categories for depression
- Categories of TV viewers
- Categories of buyers (likely, unlikely)
- Categories of households?
- Single male, mother/children, conventional (M/D/kids), DINKs.
36Sequential Patterns
- Detecting associations among events with certain temporal relationships
- Example
- Cardiac bypass for blocked arteries
- AND within 18 months, high blood urea
- THEN kidney failure likely in next 18 months
- Particularly important in smart homes
37Sequential Pattern Discovery
- Sequence of itemsets
- Grocery store purchases by 1 person (3 itemsets)
- {soy milk, bread, chocolate}, {bananas, chocolate}, {lettuce, tomato, chocolate}
- 2 Subsequences
- {soy milk, bread, chocolate}, {bananas, chocolate}
- {bananas, chocolate}, {lettuce, tomato, chocolate}
38Sequential Pattern Discovery
- The support for a sequence S is the percentage of the given set U of sequences of which S is a subsequence.
- That is: how many times does S show up?
- Find all subsequences from the given sequence sets that have a user-defined minimum support.
- The sequence S1, S2, ..., Sn is a predictor of the fact that a customer that buys itemset S1 is likely to buy itemset S2, then S3, ...
- Prediction support based on frequency of this sequence in the past
- Many research issues to create good algos
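Support for a sequence can be computed directly from its definition. The sketch below assumes one common reading of "subsequence": each itemset of the candidate must be contained in a later and later itemset of the customer's sequence. The first customer is the grocery example from the previous slide; the other two are made up.

```python
def is_subsequence(candidate, sequence):
    """True if the itemsets of `candidate` can be matched, in order, to
    itemsets of `sequence` that contain them (greedy left-to-right match)."""
    position = 0
    for itemset in candidate:
        while position < len(sequence) and not set(itemset) <= set(sequence[position]):
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next itemset must match strictly later
    return True


def sequence_support(candidate, sequences):
    """Fraction of the sequences in U of which `candidate` is a subsequence."""
    return sum(is_subsequence(candidate, s) for s in sequences) / len(sequences)


customers = [
    [{"soy milk", "bread", "chocolate"}, {"bananas", "chocolate"},
     {"lettuce", "tomato", "chocolate"}],
    [{"bread"}, {"chocolate", "bananas"}],  # made up
    [{"lettuce"}, {"bread"}],               # made up
]

pattern = [{"bread"}, {"bananas"}]
print(sequence_support(pattern, customers))  # 2 of 3 customers
```

Mining then amounts to enumerating candidate sequences and keeping those whose support meets the user-defined minimum, which is where the algorithmic research issues arise.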
39Patterns Within Time Series
- Finding 2 patterns that occur over time
- 2003 stock prices of Choice Homes and Home Depot
- 2 products show same sales pattern in summer but different one in winter
- Solar magnetic wind patterns may predict earth atmospheric changes
40Time Series Pattern Discovery
- Time series are sequences of events
- Event could be a transaction (closing daily stock price)
- Look at sequences over n days, or
- Longest period in which change is no greater than 1%
- Comparing
- Must define similarity measures
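Both ideas above can be sketched in a few lines. The example assumes "change" means the relative change between consecutive observations, and it picks Euclidean distance between z-normalized series as one possible similarity measure; neither choice comes from the slides, and the prices are made up.

```python
import math


def longest_stable_run(prices, max_change=0.01):
    """Length of the longest run of observations in which every
    consecutive relative change is at most `max_change` (1% by default)."""
    if not prices:
        return 0
    best = current = 1
    for previous, price in zip(prices, prices[1:]):
        if abs(price - previous) <= max_change * abs(previous):
            current += 1
        else:
            current = 1
        best = max(best, current)
    return best


def normalized(series):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(series) / len(series)
    spread = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if spread == 0:
        return [0.0] * len(series)
    return [(x - mean) / spread for x in series]


def shape_distance(a, b):
    """Euclidean distance between two equal-length series after
    normalization, so only the shape matters, not the price level."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(normalized(a), normalized(b))))


prices = [100.0, 100.5, 101.0, 105.0, 105.2, 105.3, 105.1, 110.0]  # made up
print(longest_stable_run(prices))  # 4
```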
41Other Approaches in Data Mining
- Neural nets
- Infer a function from a set of examples
- Non-parametric curve-fitting
- Interpolates to solve new problems
- Supervised and unsupervised algorithms
- Capabilities
- classification
- time-series prediction
- Disadvantages
- can't see what it learned (not declarative)
42Other Approaches in Data Mining
- Genetic algorithms
- Set up
- Representation (strings over an alphabet)
- Evaluation (fitness) function
- Parameters: number of generations, cross-over rate, mutation rate, etc.
- Randomized (probabilistic operators), parallel search over search space
- Used for problem solving and clustering
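The set-up pieces listed above map onto a very small program. The sketch below is a toy example, assuming a bit-string representation, a fitness function that simply counts 1s, tournament selection, and elitism; the parameter values are arbitrary.

```python
import random


def fitness(bits):
    """Evaluation function for the toy problem: number of 1s in the string."""
    return bits.count("1")


def evolve(length=20, population_size=30, generations=40,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Run a simple genetic algorithm; return (best string, best-fitness history)."""
    rng = random.Random(seed)
    # Representation: strings over the alphabet {0, 1}
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]

    def pick():
        """Tournament selection: the fitter of two random individuals."""
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_population = [best]  # elitism: never lose the best so far
        while len(next_population) < population_size:
            parent1, parent2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, length)  # single-point cross-over
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1
            # Mutation: flip each bit with a small probability
            child = "".join(
                ("1" if bit == "0" else "0") if rng.random() < mutation_rate else bit
                for bit in child
            )
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


best, history = evolve()
print(best, history[0], history[-1])
```

Because the best individual is always carried into the next generation, the best fitness never decreases from one generation to the next.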