Title: Westphal
1. Westphal and Blaxton, Chapter 3: Defining the
Problems to Be Solved
- MIS 6473- DATA MINING
- Dr. Segall
- Spring 2004
2. Defining the Problems to Be Solved
- Analyzing data requires that you have a broad
sense of how to classify the types of information
and the knowledge that you are working with as
well as how to construct an analysis so as to
move from one class of knowledge to another.
3. Defining the Problems cont.
- The analyst must be flexible.
- Four frameworks:
  - Think of knowledge as being represented in a hierarchy ranging from single objects all the way up to full systems.
  - Distinguish between knowing how procedures are accomplished and knowing that certain facts are true about the situation.
  - Characterize your problem in terms of metaknowledge and actual knowledge.
  - Map your problem onto the orthogonal dimensions of situations versus parameter values.
4. Defining the Problem cont.
- Once the problem is mapped onto a conceptual
framework, the analyst must decide whether the
analysis will proceed in reactive mode,
proactive mode, or a combination of both.
5. Challenging Analysts to Think Outside the Box
- Analysts become comfortable with a few select, limited choices.
- Rat analogy
- Driving without maps and cooking without recipes.
- There are no road maps for data mining.
- Southern cooking requires no recipes, only simple
ingredients. Different dishes result from
variations in mixing the simple ingredients.
6. There's More Than One Way to Slice a Bagel
- How would you cut a bagel in half?
- Find out the real requirements of a task before
taking any action to complete it.
7. Mapping Your Problem onto a Hierarchical Framework
8. From Objects to Networks, Applications, and Systems
- The basic unit of knowledge representation is an object.
- Relationships are like objects: they are mutually exclusive of one another.
- Relationships and objects together form networks of data.
- The integration of one or more networks forms the basis of an application.
- Applications are usually stand-alone solutions developed for a particular problem or targeted to a specific domain.
- Systems are environments used by analysts to perform a wide range of data mining activities.
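
The first three levels of the hierarchy (objects, relationships, networks) can be sketched in a few lines of Python. This is only an illustrative sketch: the class names `Obj` and `Relationship`, the helper `build_network`, and the sample data are all assumptions made for the example and do not come from the chapter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """A basic unit of knowledge, e.g. a person, an account, or a phone."""
    kind: str
    name: str


@dataclass(frozen=True)
class Relationship:
    """A typed link between two objects."""
    source: Obj
    target: Obj
    kind: str


def build_network(relationships):
    """Combine objects and relationships into a network (undirected adjacency sets)."""
    network = defaultdict(set)
    for rel in relationships:
        network[rel.source].add(rel.target)
        network[rel.target].add(rel.source)
    return network


# Hypothetical example data, invented for illustration.
alice = Obj("person", "Alice")
acct = Obj("account", "ACCT-001")
phone = Obj("phone", "555-0100")

links = [
    Relationship(alice, acct, "owns"),
    Relationship(alice, phone, "subscribes_to"),
]

network = build_network(links)
neighbors = sorted(o.name for o in network[alice])
print(neighbors)
```

An application, in the slide's terms, would integrate one or more such networks for a particular problem; that layer is left out of the sketch.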
9. Distinguishing between Knowing How and Knowing
That: Procedural vs. Declarative Knowledge
- Procedural knowledge: knowing how to do things.
- Declarative knowledge, or knowing that,
represents factual information about the world.
10. Breaking Declarative Knowledge into Subcategories
- Two types of declarative knowledge: episodic and semantic.
- Episodic: temporal or spatial information
- Semantic: descriptive representations
11. Distinguishing between Metaknowledge and Actual
Knowledge
- First dimension: actual status
- Second dimension: metaknowledge
- Four categories of knowledge:
- YKYK
- YKDK
- DKYK
- DKDK
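
The four categories follow from crossing the two dimensions, which a small function can make explicit. This is a minimal sketch under stated assumptions: the function name `classify_knowledge` and its two boolean parameters are hypothetical, chosen so that the first two letters of the label reflect metaknowledge and the last two reflect actual status.

```python
def classify_knowledge(have_information: bool, aware_of_status: bool) -> str:
    """Return YKYK, YKDK, DKYK, or DKDK.

    have_information: actual status (do you actually have the information?)
    aware_of_status:  metaknowledge (do you know whether you have it?)
    Both parameter names are assumptions made for this sketch.
    """
    meta = "YK" if aware_of_status else "DK"
    actual = "YK" if have_information else "DK"
    return meta + actual


# Enumerate the full 2 x 2 grid of combinations.
for have in (True, False):
    for aware in (True, False):
        print(have, aware, classify_knowledge(have, aware))
```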
12. The Information That You Know You Know (YKYK)
- This is the simplest case of information knowledge.
- It is information that actually exists, and the user is aware that the information is there.
- Example: Water boils at 100 degrees Celsius.
13. The Information That You Know You Don't Know
(YKDK)
- Information that is not generally known or readily accessible but can be researched to find the answer.
- Examples:
  - The exact height of the Empire State Building
  - The area of Central Park
  - How many vehicles are registered at ASU
14. The Information That You Don't Know You Know
(DKYK)
- The type of information targeted in exploratory analyses, where you may not have definite ideas about what you expect to find.
- Discovery of information that is already accessible but not currently being used.
- Example: exposure of fraud through investigation (and a little snooping)
15. The Information That You Don't Know You Don't
Know (DKDK)
- This is the most vulnerable situation, and it affects all aspects of business.
- The boundaries of DKDK knowledge are undefined and based on unknown parameters.
16. Dimensions of Metaknowledge and Actual Knowledge
within the Problem Space
17. Distinguishing between Situations and Parameter
Values
- We can recast the world into a different paradigm.
- Conducting a financial transaction, making a telephone call, or fixing a car.
- All have a set of parameters.
- Construct a matrix
18. Known Situation and Established Parameter
Boundaries
- Risks and probabilities
- Exceptions easily flagged
- Example: going over the limit on a credit card.
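
When the situation is known and the parameter boundaries are established, flagging an exception amounts to a simple comparison against the boundary. The sketch below illustrates this with the credit card example; the function name `flag_over_limit`, the account identifiers, and all amounts are hypothetical values invented for the illustration.

```python
def flag_over_limit(balances, limits):
    """Return the accounts whose balance exceeds the established limit."""
    return [acct for acct, balance in balances.items() if balance > limits[acct]]


# Hypothetical balances and credit limits, invented for illustration.
balances = {"card-1": 4800.0, "card-2": 5200.0, "card-3": 150.0}
limits = {"card-1": 5000.0, "card-2": 5000.0, "card-3": 1000.0}

flagged = flag_over_limit(balances, limits)
print(flagged)
```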
19. Unknown Situation and Established Parameter
Boundaries
- Bottom-up analyses
- Detect patterns in existing data
- Example: medical case reports
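
One simple way to picture a bottom-up analysis is to count which terms co-occur across existing records without deciding in advance what to look for. The following is a rough sketch, not the method described in the chapter: the function `frequent_pairs`, the `min_count` threshold, and the toy case reports are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(reports, min_count=2):
    """Count co-occurring term pairs and keep those seen at least min_count times."""
    counts = Counter()
    for terms in reports:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical case reports, each reduced to a set of terms (invented data).
reports = [
    {"fever", "rash", "drug_a"},
    {"rash", "drug_a"},
    {"fever", "drug_b"},
]

pairs = frequent_pairs(reports)
print(pairs)
```

In this toy data only the pair (`drug_a`, `rash`) recurs, which is the kind of pattern an analyst would then investigate further.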
20. Unknown Situation and No Established Parameter
Boundaries
- Most threatening circumstances
- Proactive invigilation
- Behavior is not yet discovered
21. Figure 3.3
22. Performing Analyses in Reactive and Proactive
Modes
- Reactive Analysis
- Proactive Analysis
23. Performing Reactive Analysis
- Concrete question? yes/no
- Analysis centers on a specific entity
- Indexed structures
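
A reactive analysis can be pictured as an indexed lookup that answers a concrete yes/no question about one entity. The sketch below assumes a hypothetical list of transaction records; the helper names `build_index` and `has_large_transaction`, the field names, and the threshold are invented for the example.

```python
from collections import defaultdict


def build_index(records, key):
    """Index records by the given field so one entity can be looked up directly."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record)
    return index


def has_large_transaction(index, account, threshold):
    """Answer a yes/no question about a single entity using the index."""
    return any(r["amount"] > threshold for r in index.get(account, []))


# Hypothetical transaction records, invented for illustration.
records = [
    {"account": "A-1", "amount": 120.0},
    {"account": "B-2", "amount": 9500.0},
    {"account": "A-1", "amount": 15000.0},
]

by_account = build_index(records, "account")
print(has_large_transaction(by_account, "A-1", 10000.0))
print(has_large_transaction(by_account, "B-2", 10000.0))
```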
24. Performing Proactive Analysis
- Unknown and cannot be defined
- Big picture
- Structuring the Proactive Slice of Data
25. Combining Proactive and Reactive Techniques
- Work on terabytes of data
- Combination to navigate through the data
- Example - Internet
26. Figure 3.4
27. Summing Up
- Thinking of data
- Next get the data in shape
28. Dr. Segall's Questions
29. Dr. Segall's Question 1
- An object is the basic unit of knowledge
representation. At this level, what are all
patterns and trends based on, and how does this
relate to the hierarchical framework for
knowledge?
30. Dr. Segall's Question 2
- What are the four dimensions of actual or
metaknowledge? Explain each of these.
31. Dr. Segall's Question 3
- Explain what bottom-up analysis is.