Title: Collection of general data mining briefings
1Data Management, Information Management, Knowledge Management for Network Centric Operations
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
October 2005
2Data, Information and Knowledge Management
Definitions
Knowledge Management
Acquiring knowledge
Collaboration and sharing
Managing the processes
Disseminating the knowledge
Taking action
Information Management
Extracting information from the data
Visualizing the data
Data Management
Data administration
Database management
3What is data management?
- One proposal: Data Management = Database System Management + Data Administration
- Includes data analysis, data administration, database administration, auditing, data modeling, database system development, database application development
4Data Administration
- Identifying the data
- Data may be in files, paper, databases, etc.
- Analyzing the data
- Is the data of good quality?
- Is the data complete?
- Data standardization
- Should one standardize all the data elements and metadata?
- Repositories for handling semantic heterogeneity?
- Data Security
- How should data be secured?
- Data modeling
- Structure the data, model the data and the
processes
5Data Administration (Continued)
- Data quality provides some measure for determining the accuracy of the data
- Is the data current? Can we trust the source?
- Data quality parameters can be passed from source to source
- E.g., Trust A 50 and Trust B 30
- Data may have different semantics
- E.g., Bank A may send out statements on the 20th day of each month and Bank B may send out statements on the 5th day of each month
- Fighter jet and passenger plane may be considered to be one and the same
6Data Administration (Concluded)
- Data Standards
- Standards for data semantics and administration
- E.g., XML (eXtensible Markup Language) for document interchange
- Data security includes data confidentiality and integrity
- Confidentiality is about preventing unauthorized access to the data
- Integrity is about preventing malicious corruption of the data
7An Example Database System
8Metadata
- Metadata describes the data in the database
- Example: Database D consists of a relation EMP with attributes SS#, Name, and Salary
- Metadatabase stores the metadata
- Could be physically stored with the database
- Metadatabase may also store constraints and administrative information
- Metadata is also referred to as the schema or data dictionary
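The relationship between a database and its metadata can be illustrated with any DBMS that exposes its catalog. The following is a minimal sketch using Python's built-in sqlite3 module and the EMP relation from this slide; the column types and the use of SQLite are assumptions made for illustration, not part of the briefing.

```python
import sqlite3

# In-memory database D holding the relation EMP from the slide.
# Column types are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (SS TEXT PRIMARY KEY, Name TEXT, Salary REAL)")

# SQLite stores its schema (the metadata) in the catalog table sqlite_master
# and exposes per-column details through PRAGMA table_info.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(EMP)")]

print(tables)   # names of the relations in D
print(columns)  # (attribute name, declared type) pairs for EMP
```

Here the catalog plays the role of the metadatabase: it is stored with the database and is queried with the same language as the data.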
9Three-level Schema Architecture Details
Users A1, A2, A3: External Model A, External Schema A
Users B1, B2: External Model B, External Schema B
External/Conceptual Mapping A, External/Conceptual Mapping B
Conceptual Model, Conceptual Schema
Conceptual/Internal Mapping
Internal Model (Stored Database), Internal Schema
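One common way to realize an external schema in a relational system is a view defined over the conceptual schema. The sketch below assumes SQLite and reuses the EMP relation from the Metadata slide; the view name EMP_PUBLIC and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the full EMP relation (sample rows are made up).
CREATE TABLE EMP (SS TEXT PRIMARY KEY, Name TEXT, Salary REAL);
INSERT INTO EMP VALUES ('111', 'Smith', 20000), ('222', 'Jones', 30000);

-- External schema for one user group: names only, no SS or Salary.
-- The view definition acts as the external/conceptual mapping.
CREATE VIEW EMP_PUBLIC AS SELECT Name FROM EMP;
""")

external_rows = conn.execute(
    "SELECT * FROM EMP_PUBLIC ORDER BY Name").fetchall()
print(external_rows)
```

The internal schema (file layout, indexes) is handled by the DBMS and is not visible at either of the two upper levels.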
10Functional Architecture
Data Management
User Interface Manager
Schema (Data Dictionary) Manager (metadata)
Security/ Integrity Manager
Query Manager
Transaction Manager
Storage Management
File Manager
Disk Manager
11Types of Database Systems
- Relational Database Systems
- Distributed and Federated Database Systems
- Object Database Systems
- Deductive Database Systems
- Other
- Real-time, Secure, Parallel, Scientific,
Temporal, Wireless, Functional,
Entity-Relationship, Sensor/Stream Database
Systems, etc.
12Relational Database Example
Relation S
S#   SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens

Relation P
P#   PNAME  COLOR  WEIGHT  CITY
P1   Nut    Red    12      London
P2   Bolt   Green  17      Paris
P3   Screw  Blue   17      Rome
P4   Screw  Red    14      London
P5   Cam    Blue   12      Paris
P6   Cog    Red    19      London

Relation SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
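To show how these relations are used, the sketch below loads the supplier and shipment data from this slide into SQLite and asks which suppliers ship part P2. The column names sno and pno and the choice of SQLite are assumptions; relation P is omitted because the query does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE SP (sno TEXT, pno TEXT, qty INTEGER, PRIMARY KEY (sno, pno));
""")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S1", "P4", 200), ("S1", "P5", 100), ("S1", "P6", 100),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
    ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# Join S and SP on the supplier number, keeping only shipments of part P2.
rows = conn.execute("""
    SELECT S.sname, SP.qty
    FROM S JOIN SP ON S.sno = SP.sno
    WHERE SP.pno = 'P2'
    ORDER BY S.sno
""").fetchall()
print(rows)
```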
13Example Object
Composite Document Object
Section 2 Object
Section 1 Object
Paragraph 1 Object
Paragraph 2 Object
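A composite object of this kind can be sketched with nested classes. The example below uses Python dataclasses; the class names, the paragraph_count helper, and the placement of both paragraphs under Section 1 are assumptions, since the slide does not show which section contains them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    text: str


@dataclass
class Section:
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class Document:
    title: str
    sections: List[Section] = field(default_factory=list)

    def paragraph_count(self) -> int:
        # Walk the composite: a document owns sections, which own paragraphs.
        return sum(len(section.paragraphs) for section in self.sections)


doc = Document("Composite Document", [
    Section("Section 1", [Paragraph("Paragraph 1"), Paragraph("Paragraph 2")]),
    Section("Section 2"),
])
print(doc.paragraph_count())
```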
14Distributed Database System
15Query Processing Example
DQP: Distributed Query Processor
Figure: three sites connected by a network, each running a DQP on top of a local DBMS (DBMS 1, DBMS 2, DBMS 3)
Fragment labels in the figure: EMP1 (20), EMP2 (30), EMP3 (50), DEPT2 (20), DEPT3 (30)
Query at site 1: Join EMP and DEPT on D#
- Move EMP2 to site 3
- Merge EMP1, EMP2, EMP3 to form EMP
- Move DEPT2 to site 3
- Merge DEPT2 and DEPT3 to form DEPT
- Join EMP and DEPT
- Move result to site 1
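The query plan above can be simulated in a few lines. This is a sketch under stated assumptions: the placement of fragments (EMP1 at site 1, EMP2 and DEPT2 at site 2, EMP3 and DEPT3 at site 3), the sample rows, and the move helper are all hypothetical. Moving EMP1 to site 3 is not listed on the slide but is needed before the merge, so it is marked as implied.

```python
# Each site holds horizontal fragments of EMP (id, name, dept no)
# and DEPT (dept no, dept name). Rows are made up for illustration.
sites = {
    1: {"EMP1": [("e1", "Ann", "d1")]},
    2: {"EMP2": [("e2", "Bob", "d2")], "DEPT2": [("d1", "Sales")]},
    3: {"EMP3": [("e3", "Cy", "d2")], "DEPT3": [("d2", "Ops")]},
}


def move(fragment, src, dst):
    """Ship a fragment from one site to another (simulated transfer)."""
    sites[dst][fragment] = sites[src].pop(fragment)


move("EMP1", 1, 3)   # implied by the merge step, not listed on the slide
move("EMP2", 2, 3)
move("DEPT2", 2, 3)

# Merge the fragments at site 3 to rebuild the full relations.
emp = sites[3]["EMP1"] + sites[3]["EMP2"] + sites[3]["EMP3"]
dept = sites[3]["DEPT2"] + sites[3]["DEPT3"]

# Join EMP and DEPT on the department number using a lookup table.
dept_by_no = dict(dept)
result = [(name, dept_by_no[dno]) for _, name, dno in emp if dno in dept_by_no]

# Move the result back to the site that issued the query.
sites[1]["RESULT"] = result
print(result)
```

Site 3 is a natural place for the join in this sketch because it already holds the largest fragments, so less data crosses the network.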
16Transaction Processing Example
DTM (Distributed Transaction Manager): responsible for executing the distributed transaction
Issues: Concurrency control, Recovery, Data Replication
Site 1 Coordinator
Transaction Tj
Subtransaction Tj4
Subtransaction Tj2
Subtransaction Tj3
Site 2 Participant
Site 4 Participant
Site 3 Participant
Two-phase commit: The coordinator queries the participants on whether they are ready to commit. If all participants agree, then the coordinator sends a request for the participants to commit.
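The two-phase commit exchange can be sketched as follows. The class and function names are hypothetical, and the sketch leaves out logging, timeouts, and failure recovery, which a real DTM needs. The abort branch is the standard complement of the rule on the slide and is an addition here.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether the local subtransaction can commit.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


outcome = two_phase_commit(
    [Participant("Site 2"), Participant("Site 3"), Participant("Site 4")])
print(outcome)
```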
17Interoperability of Heterogeneous Database Systems
Database System A (Relational)
Database System B (Object-Oriented)
Database System C (Legacy)
Network
Transparent access to heterogeneous databases for both users and application programs: Query, Transaction processing
18Technical Issues on the Interoperability of
Heterogeneous Database Systems
- Heterogeneity with respect to data models, schema, query processing, query languages, transaction management, semantics, integrity, and security policies
- Interoperability based on client-server architectures
- Federated database management
- Collection of cooperating, autonomous, and possibly heterogeneous component database systems, each belonging to one or more federations
19Different Data Models
Network
Node A
Node B
Node C
Node D
Database
Database
Database
Database
Network Model
Object-Oriented Model
Relational Model
Hierarchical Model
Developments: Tools for interoperability; commercial products
Challenges: Global data model
20Schema Integration and Transformation: An approach
External Schema I
External Schema II
External Schema III
Global Schema: Integrate the generic schemas
Generic schema describing the relational database
Generic schema describing the network database
Generic schema describing the hierarchical database
Generic schema describing the object-oriented database
Schema describing the relational database
Schema describing the network database
Schema describing the hierarchical database
Schema describing the object-oriented database
Challenges: Selecting appropriate generic representation; maintaining consistency during transformations
21Semantic Heterogeneity
- Semantic heterogeneity occurs when there is disagreement about the meaning or interpretation of the same data, i.e., the same data is interpreted differently
Object O
Challenges: Standard definitions; Repositories
Node A
Node B
Database
Database
Object O interpreted as a passenger ship
Object O interpreted as a submarine
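One way a repository can help is by recording how each node interprets a shared object so that disagreements can be detected. The sketch below is hypothetical: the REPOSITORY structure, the function names, and the assignment of "passenger ship" to Node A and "submarine" to Node B are assumptions, since the slide does not say which node holds which interpretation.

```python
# Hypothetical repository: (node, object id) -> that node's interpretation.
REPOSITORY = {
    ("Node A", "O"): "passenger ship",
    ("Node B", "O"): "submarine",
}


def interpret(node, obj):
    """Look up how a node interprets an object; 'unknown' if not recorded."""
    return REPOSITORY.get((node, obj), "unknown")


def conflicts(obj, nodes):
    """Return the per-node meanings if the nodes disagree, else an empty dict."""
    meanings = {node: interpret(node, obj) for node in nodes}
    return meanings if len(set(meanings.values())) > 1 else {}


print(conflicts("O", ["Node A", "Node B"]))
```

A detected conflict would then be resolved against a standard definition agreed on by the participating nodes.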
22Federated Database Management
Database System A
Database System B
Federation F1
Cooperating database systems yet maintaining some
degree of autonomy
Federation F2
Database System C
23Autonomy
component A honors the local request first
request from component
local request
Component A
Component B
Challenges: Adapt techniques to handle autonomy, e.g., transaction processing, schema integration; transition research to products
communication through federation
component A does not communicate with component C
Component C
24Federated Data and Policy Management
Data/Policy for Federation
Export Data/Policy (from each component)
Component Data/Policy for Agency A
Component Data/Policy for Agency B
Component Data/Policy for Agency C
25What is Information Management?
- Information management essentially analyzes the data and makes sense out of the data
- Several technologies have to work together for effective information management
- Data Warehousing: Extracting relevant data and putting this data into a repository for analysis
- Data Mining: Extracting previously unknown information from the data
- Multimedia: Managing different media including text, images, video and audio
- Web: Managing the databases and libraries on the web
26Data Warehouse
Data Warehouse: Data correlating Employees with Medical Benefits and Projects
Could be any DBMS; usually based on the relational data model
Users query the warehouse
Sources: Oracle DBMS for Employees, Sybase DBMS for Projects, Informix DBMS for Medical
27What is Data Mining?
28Steps to Data Mining
Data Sources
Integrate data sources
Clean/modify data sources
Mine the data
Examine results/Prune results
Report final results/Take actions
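The steps on this slide can be sketched as a small pipeline. Everything specific here is an assumption for illustration: the function names, the made-up item lists, and the choice of pair counting with a minimum support threshold as the mining step. The briefing does not prescribe a particular technique.

```python
from collections import Counter
from itertools import combinations


def integrate(*sources):
    # Combine records from all data sources into one list.
    return [record for source in sources for record in source]


def clean(records):
    # Normalize case and whitespace; drop empty items and empty records.
    cleaned = []
    for record in records:
        items = {item.strip().lower() for item in record if item and item.strip()}
        if items:
            cleaned.append(items)
    return cleaned


def mine(records):
    # Count how often each pair of items occurs together in a record.
    counts = Counter()
    for items in records:
        counts.update(combinations(sorted(items), 2))
    return counts


def prune(counts, min_support):
    # Keep only the patterns seen at least min_support times.
    return {pair: n for pair, n in counts.items() if n >= min_support}


source_a = [["Bread", "milk"], ["bread", "Milk", "eggs"]]
source_b = [["milk ", "bread"], ["eggs", ""]]

patterns = prune(mine(clean(integrate(source_a, source_b))), min_support=2)
print(patterns)  # the pruned results that would be reported
```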
29Data Mining Needs for Counterterrorism
Non-real-time Data Mining
- Gather data from multiple sources
- Information on terrorist attacks: who, what, where, when, how
- Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, . . .
- Unstructured data: newspaper articles, video clips, speeches, emails, phone records, . . .
- Integrate the data, build warehouses and federations
- Develop profiles of terrorists, activities/threats
- Mine the data to extract patterns of potential terrorists and predict future activities and targets
- Find the needle in the haystack - suspicious needles?
- Data integrity is important
- Techniques have to SCALE
30Data Mining Needs for Counterterrorism
Real-time Data Mining
- Nature of data
- Data arriving from sensors and other devices
- Continuous data streams
- Breaking news, video releases, satellite images
- Some critical data may also reside in caches
- Rapidly sift through the data and discard unwanted data for later use and analysis (non-real-time data mining)
- Data mining techniques need to meet timing constraints
- Quality of service (QoS) tradeoffs among timeliness, precision and accuracy
- Presentation of results, visualization, real-time alerts and triggers
31Data Mining as a Threat to Privacy
- Data mining gives us facts that are not obvious to human analysts of the data
- Can general trends across individuals be determined without revealing information about individuals?
- Possible threats
- Combine collections of data and infer information that is private
- Disease information from prescription data
- Military action from pizza delivery to the Pentagon
- Need to protect the associations and correlations between the data that are sensitive or private
32Privacy Preserving Data Mining
User Interface Manager
Privacy Constraints
Constraint Manager
Database Design Tool: Structures the database
Data Miner: Makes correlations; ensures privacy
Query Processor: Constraints during query and release operations
DBMS
Database
33Current Status, Challenges and Directions
- Status
- Data Mining is now a technology
- Several prototypes and tools exist; many or almost all of them work on relational databases
- Challenges
- Mining large quantities of data, dealing with noise and uncertainty, reasoning with incomplete data, eliminating false positives and false negatives
- Directions
- Mining multimedia and text databases, Web mining (structure, usage and content), mining metadata, real-time data mining, privacy
34Semantic Web Overview
- According to Tim Berners-Lee, the Semantic Web supports
- Machine readable and understandable web pages
- Enterprise application integration
- Nodes and links that essentially form a very large database
- Premise
- Semantic Web Applications: Web Database Management, Web Services, Information Integration
- Semantic Web Technologies: XML, RDF, Ontologies, Rules-ML
35Layered Architecture for Dependable
Semantic Web
- Adapted from Tim Berners-Lee's description of the Semantic Web
- Some Challenges: Interoperability between layers; security and privacy cut across all layers; integration of services; composability
36What is XML all about?
- XML is needed due to the limitations of HTML and complexities of SGML
- It is an extensible markup language specified by the W3C (World Wide Web Consortium)
- Designed to make the interchange of structured documents over the web easier
- Key to XML are Document Type Definitions (DTDs) and XML Schemas
- Allows users to bring multiple files together to form compound documents
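As a small illustration of structured document interchange, the sketch below parses an XML document with Python's standard xml.etree.ElementTree module. The element and attribute names (employees, emp, ss, name, salary) and the values are made up, loosely following the EMP relation used earlier. ElementTree checks that the document is well formed but does not validate it against a DTD or an XML Schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical document carrying employee records between systems.
xml_text = """
<employees>
  <emp ss="111"><name>Smith</name><salary>20000</salary></emp>
  <emp ss="222"><name>Jones</name><salary>30000</salary></emp>
</employees>
""".strip()

root = ET.fromstring(xml_text)

# The receiver walks the tagged structure instead of guessing at layout.
names = [emp.findtext("name") for emp in root.findall("emp")]
total_salary = sum(int(emp.findtext("salary")) for emp in root.findall("emp"))

print(names)
print(total_salary)
```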
37What is Knowledge Management?
- Knowledge management, or KM, is the process through which organizations generate value from their intellectual property and knowledge-based assets
- Gartner Group: KM is a discipline that promotes an integrated approach to identifying and sharing all of an enterprise's information assets, including databases, documents, policies and procedures as well as unarticulated expertise and experience resident in individual workers
- Peter Senge: Knowledge is the capacity for effective action; this distinguishes knowledge from data and information. KM is just another term in the ongoing continuum of business management evolution
38Knowledge Management Components
Knowledge Management: Components, Cycle and Technologies
Components: Strategies, Processes, Metrics
Cycle: Knowledge Creation, Sharing, Measurement and Improvement
Technologies: Expert systems, Collaboration, Training, Web
39KM Strategy, Process and Metrics
- Strategy
- Motivation for KM and how to structure a KM program
- Process
- Use of KM to make existing practice more effective
- Metrics
- Measure the impact of KM on an organization
40Strategy: Building Learning Organizations
- Adaptive learning and Generative learning
- Need to adapt to the changing environment
- Total quality movement (TQM) in Japan has migrated to a generative learning model
- Look at the world in a new way
- Changing roles of the leader
- Migrating from decision makers to designers, teachers and stewards
- Building a shared vision
- Encouraging ideas, requesting support, moving beyond blame, effective communication
- Learning tools
- Learning laboratory
41Knowledge Management in Process Management
- Types of Processes
- Simple processes: Low-level operation
- Complex and nonadaptive processes: Systems that use the same rules
- Complex and adaptive processes: Agents carrying out the processes are intelligent and adaptive
- Linking knowledge management with processes
- Knowledge management is needed for all processes and is critical for complex and adaptive processes
- Learn from experience and use the experience in unknown situations
42Metrics: The Balanced Scorecard
- Employee Capabilities: Measuring the following
- Employee satisfaction
- Employee retention
- Employee productivity
- Information system capabilities: Measuring the following
- Whether each employee segment has information to carry out its operations
- Motivation and Empowerment: Measuring the following
- Suggestions made and implemented
- Improvement
- Team performance
43Knowledge Management Architecture
Knowledge Creation and Acquisition Manager
Knowledge Representation Manager
Knowledge Dissemination and Sharing Manager
Knowledge Manipulation Manager
44Secure Knowledge Management
- Protecting the intellectual property of an organization
- Access control including role-based access control
- Security for process/activity management and workflow
- Users must have certain credentials to carry out an activity
- Composing multiple security policies across organizations
- Security for knowledge management strategies and processes
- Risk management and economic tradeoffs
- Digital rights management and trust negotiation
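Role-based access control, mentioned above, can be sketched in a few lines: users are assigned roles, roles carry permissions, and an activity is allowed only if one of the user's roles grants it. The role names, user names, and permissions below are hypothetical.

```python
# Hypothetical policy: role -> permissions, and user -> roles.
ROLE_PERMISSIONS = {
    "analyst": {"read_report"},
    "manager": {"read_report", "approve_workflow"},
}
USER_ROLES = {
    "alice": {"manager"},
    "bob": {"analyst"},
}


def is_allowed(user, permission):
    """A user may act only if one of their roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "approve_workflow"))
print(is_allowed("bob", "approve_workflow"))
```

Because permissions attach to roles and not to individuals, a change in someone's responsibilities only requires changing their role assignment.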
45Status and Directions
- Knowledge management has exploded due to the web
- Knowledge Management has different dimensions
- Technology, Business
- Goal is to take advantage of knowledge in a corporation for reuse
- Tools are emerging
- Need effective partnerships between business leaders, technologists and policy makers
- Knowledge management may subsume information management and data management
- Vague boundaries
46Other Ideas and Directions?
- Prof. Bhavani Thuraisingham
- Director Cyber Security Center
- Department of Computer Science
- Erik Jonsson School of Engineering and Computer Science
- The University of Texas at Dallas
- Richardson, Texas
- bhavani.thuraisingham_at_utdallas.edu
- http//www.utdallas.edu/bxt043000/
- President
- Dr-Bhavani Security Consulting
- Dallas, TX
- www.dr-bhavani.org