Title: Research Interests
1Research Interests
Juan E. Vargas, Computer Science and Engineering
vargasje_at_engr.sc.edu
September 14, 2001
2Research Areas
Data Mining, Knowledge Discovery, Data Warehousing, Dynamic Uncertainty
3Data Mining: Anticipating users' questions in a
flexible and scalable manner
DM is searching for strong patterns within big
data that can be generalized to make predictions
and to support future decisions ...
DM is a cooperative effort involving humans and
computers...
DM is a process, not a product.
Features are identified from a problem domain
and measured over many cases, to do
classification or regression.
Complexity is given by the number of cases, the
number of features, and the number of distinct
values that features can assume.
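As a rough illustration of these three complexity measures, the sketch below counts cases, features, and distinct feature values. The data and the helper name complexity_summary are hypothetical, chosen only for this example.

```python
# Hypothetical toy data: each dict is one case, each key is one feature.
cases = [
    {"age_band": "young", "income": "low", "buys": "no"},
    {"age_band": "young", "income": "high", "buys": "yes"},
    {"age_band": "old", "income": "high", "buys": "yes"},
    {"age_band": "old", "income": "low", "buys": "no"},
]


def complexity_summary(rows):
    """Count cases, features, and distinct values per feature."""
    features = sorted({name for row in rows for name in row})
    distinct = {f: len({row[f] for row in rows if f in row}) for f in features}
    return {
        "cases": len(rows),
        "features": len(features),
        "distinct_values": distinct,
    }


summary = complexity_summary(cases)
print(summary)
```

Continuous features would first need to be binned for the distinct-value count to be meaningful.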
4Knowledge Discovery
KD is prior to prediction. KD is necessary when
the available information is insufficient to
predict accurately.
Given a set of data (D), a language (L), and some
measurement of certainty (C), find statements (S)
or patterns (P) that describe relationships among
subsets of D with certainty C.
Interesting patterns having sufficient certainty
can be treated as new pieces of knowledge that
can be incorporated into a knowledge base.
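One possible reading of this definition, as a minimal sketch: D is a set of toy transactions, L is restricted to rules of the form "a implies b", and C is measured as rule confidence. All data, names, and the 0.9 threshold are assumptions made for illustration, not the method used in the projects described here.

```python
from itertools import permutations

# D: hypothetical toy transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def find_rules(data, min_certainty):
    """Return (a, b, certainty) for rules 'a implies b' that meet the threshold.

    Certainty is the fraction of transactions containing a that also contain b.
    """
    items = sorted(set().union(*data))
    rules = []
    for a, b in permutations(items, 2):
        with_a = [t for t in data if a in t]
        if not with_a:
            continue
        certainty = sum(1 for t in with_a if b in t) / len(with_a)
        if certainty >= min_certainty:
            rules.append((a, b, certainty))
    return rules


# Patterns with sufficient certainty become entries in a knowledge base.
knowledge_base = find_rules(D, min_certainty=0.9)
print(knowledge_base)
```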
KD is the nontrivial extraction of implicit,
previously unknown, and potentially useful
information from data.
5Underlying Principle
6Data Mining
7Bayesian Belief Networks
A Bayesian belief network (BBN) is a directed
acyclic graph (DAG) in which nodes represent
probabilistic variables and links represent
relations between the variables. Causal
relations are quantified by conditional
probabilities associated with each link. Belief
(probability) is computed using Bayes' rule,
propagating messages among nodes (for
singly-connected networks) or among cliques in a
junction tree (for multiply-connected networks).
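The smallest case of belief updating is a single link, Fault -> Alarm, where Bayes' rule gives the belief in the parent after the child is observed. The sketch below shows only that one step, not full message passing; the variable names and probabilities are hypothetical.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary parent with one observed child.

    prior:               P(parent = true)
    likelihood_if_true:  P(evidence | parent = true)
    likelihood_if_false: P(evidence | parent = false)
    """
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Assumed numbers: P(fault) = 0.01, P(alarm | fault) = 0.9,
# P(alarm | no fault) = 0.05.
belief_fault_given_alarm = posterior(0.01, 0.9, 0.05)
print(round(belief_fault_given_alarm, 4))
```

With these numbers the belief in a fault rises from 1% to roughly 15% once the alarm is observed.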
8Bayesian Belief Networks
Bayesian Networks (BNs) offer the best
combination of formal representation, clear
semantics, and efficiency for reasoning under
uncertainty (incomplete, ambiguous, partially
available information).
Influence Diagrams (IDs) are knowledge
representations that combine probabilities with
utilities to offer the advantages of BNs plus a
uniform representation scheme for decision making.
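To make the combination of probabilities and utilities concrete, here is a minimal sketch of a one-decision influence diagram: one chance node (fault), one decision node (repair or wait), and a utility table. The decision is chosen by maximum expected utility. All names and numbers are hypothetical.

```python
# Chance node: assumed probability of each state.
p_state = {"fault": 0.2, "ok": 0.8}

# Utility node: assumed utility for each (decision, state) pair.
utility = {
    ("repair", "fault"): -10.0,
    ("repair", "ok"): -10.0,
    ("wait", "fault"): -100.0,
    ("wait", "ok"): 0.0,
}


def expected_utility(decision):
    """Sum utilities over the chance node, weighted by probability."""
    return sum(p * utility[(decision, state)] for state, p in p_state.items())


decisions = ["repair", "wait"]
expected = {d: expected_utility(d) for d in decisions}
best_decision = max(decisions, key=expected_utility)
print(expected, best_decision)
```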
We are furthering the state of the art on BNs and
IDs and applying these methodologies to Data
Mining and Data Warehousing.
9Rationale
Closed-form, analytical solutions are not always
available
e = ma^2, e = mb^2, e = mc^2 ..
10Current Projects
DARPA: Resource Allocation in Dynamic Uncertain
Domains (TargetShare) (Nagabushan Mahadevan,
Kiran Tvarlapati)
NCR/Wal-Mart: Teradata Architecture for Data
Mining and Data Warehousing (Natalie Pakhomkina)
DOD/SCARNG: Vibration Monitoring Enhancement
Program (VMEP) (Natalie Pakhomkina, Elena Zagrai)
11TargetShare: Allocating Resources via Negotiating
Agents that Use Bayesian Networks to Deal with
Uncertainty
12Sensor and Process Models
[Figure: Process model (unknown): State(T-1), State(T), State(T+1), State(T+2). Sensor model (observed): Signals(T-1), Signals(T), Signals(T+1), Signals(T+2).]
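A structure of this kind, with unobserved states evolving over time and observed signals at each step, is commonly handled with recursive Bayesian filtering. The sketch below shows one predict-and-update step. The two states, the "ping" signal, and every probability are assumptions for illustration, and the transition and sensor tables are taken as known here, which may not match the setup on the slide.

```python
def normalize(dist):
    """Scale a dict of non-negative weights so the values sum to 1."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}


def filter_step(belief, transition, sensor, signal):
    """Advance the belief over State(T-1) to State(T) given Signals(T)."""
    # Predict: push the previous belief through the process model.
    predicted = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
        for s2 in belief
    }
    # Update: weight each state by how well it explains the observed signal.
    return normalize({s: predicted[s] * sensor[s][signal] for s in predicted})


# Hypothetical process model P(State(T) | State(T-1)).
transition = {
    "near": {"near": 0.8, "far": 0.2},
    "far": {"near": 0.3, "far": 0.7},
}
# Hypothetical sensor model P(Signals(T) | State(T)).
sensor = {
    "near": {"ping": 0.9, "silence": 0.1},
    "far": {"ping": 0.2, "silence": 0.8},
}

belief = {"near": 0.5, "far": 0.5}
belief = filter_step(belief, transition, sensor, "ping")
print(belief)
```

Repeating filter_step for each new signal tracks the hidden state over time.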
13Sensor and Process Models (v2)
14Monitoring the Sensors
- Sensor is rewarded if its reading is a positive
contribution towards the correct prediction.
- Sensor is penalized (less trusted) if its reading was
a negative contribution or if no reading is
available for a long time.
[Figure: Sensor1, Sensor2, Sensor3, Sensor4, Target Loc]
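The reward and penalty rules above could be sketched as a simple trust score per sensor. This is only one possible scheme: the additive step, the silence limit, and the function name update_trust are assumptions, not the update actually used in TargetShare.

```python
def update_trust(trust, contribution, steps_since_reading,
                 step=0.1, max_silence=5):
    """Return a new trust score in [0, 1] for one sensor.

    contribution: +1 if the reading helped the correct prediction,
                  -1 if it hurt, None if no reading arrived this step.
    steps_since_reading: time steps since the sensor last reported.
    """
    if contribution is None:
        # Penalize only after the sensor has been silent for a long time.
        if steps_since_reading > max_silence:
            trust -= step
    elif contribution > 0:
        trust += step  # reward a positive contribution
    else:
        trust -= step  # penalize a negative contribution
    return min(1.0, max(0.0, trust))


rewarded = update_trust(0.5, +1, 0)
penalized = update_trust(0.5, -1, 0)
briefly_silent = update_trust(0.5, None, 2)
long_silent = update_trust(0.5, None, 6)
print(rewarded, penalized, briefly_silent, long_silent)
```

A trust score of this kind could then weight each sensor's reading when estimating the target location.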
15Tested Tracks
16Results
17Wal-Mart NCR Teradata Architecture for
E-Commerce, Data Mining, and Data Warehousing
18Teradata at USC
- 5100M System
- 10 nodes with 8 Processors
- 2 GB of memory/node
- 400 disks of 4.2 GB, or 1.7 Terabytes of Disk
Storage
- Ultra wide SCSI
- RAID 5
19VMEP
VMEP/VMU SYSTEM PROTOTYPE BETA TESTING AT SCARNG
[Figure labels: SC-ARNG AASF; UH-60; AH-64; Data Warehousing and Mining; VMUs; USC Data Repository; Vibration Data; Crew Chief's Laptop; Condition Indicators; O&S Cost Benefit Analysis; RTB and HUMS CIs; RTB Vibration Management; Parts and Maintenance; HUMS Vibration Diagnostics and Prognostics; Logistics Maintenance Data; ULLS-A; Product Qualification; AMCOM; IAC; MIMOSA]
- VMEP data must be:
  - Catalogued
  - Time/aircraft synchronized
  - Accessible and retrievable
VMU 1-- 50
20Teradata at USC
21Teradata at USC
22Teradata at USC
23Principles and Technologies
Probabilistic Networks and Influence Diagrams (C,
Java, XML)
Distributed Systems and DB Design (SQL, ODBC,
JDBC, XML, C, Java, SOAP .NET, C)