Automated Learning and Discovery

1
Automated Learning and Discovery
  • Professor Tom M. Mitchell, Director
  • Center for Automated Learning and Discovery
  • Carnegie Mellon University
  • www.cs.cmu.edu/cald
  • March, 1999

2
Talk Organization
  • Overview of datamining issues
  • An example research project
  • Mining the world wide web

3
The Opportunity
  • Explosion in online data
  • Inexpensive computational power
  • Advances in automated learning algorithms

How can we best use historical data to improve future decisions?

4
Typical Datamining Task
Data:

  Patient103               time = t0   time = t1   time = tn
  Age                      23          23          23
  FirstPregnancy           no          no          no
  Anemia                   no          no          no
  Diabetes                 no          YES         no
  PreviousPrematureBirth   no          no          no
  Ultrasound               ?           abnormal    ?
  Elective C-Section       ?           no          no
  Emergency C-Section      ?           ?           Yes

Given:
  • 9714 patient records, each describing a pregnancy and birth
  • Each patient record contains 215 features

Learn to predict:
  • Classes of future patients at high risk for Emergency Cesarean Section

5
Datamining Result
One of 18 learned rules:

If   No previous vaginal delivery, and
     Abnormal 2nd Trimester Ultrasound, and
     Malpresentation at admission
Then Probability of Emergency C-Section is 0.6

Over training data: 26/41 = .63
Over test data: 12/20 = .60
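The two fractions above are the share of rule-covered patients who actually had an emergency C-section. A minimal sketch of that arithmetic follows. The feature names are hypothetical (the slide does not give the column names), and the records are synthetic, sized only to reproduce the quoted counts; this is not the learning algorithm used in the project.

```python
# Hypothetical feature names; the slide does not give the actual column names.
RULE_CONDITIONS = (
    "no_previous_vaginal_delivery",
    "abnormal_2nd_trimester_ultrasound",
    "malpresentation_at_admission",
)
TARGET = "emergency_c_section"


def rule_fires(record, conditions=RULE_CONDITIONS):
    """True when every precondition holds; a missing or unknown value fails."""
    return all(record.get(name) is True for name in conditions)


def rule_precision(records, target=TARGET):
    """Return (positives, covered, positives / covered) for covered records."""
    covered = [r for r in records if rule_fires(r)]
    positives = sum(1 for r in covered if r[target])
    fraction = positives / len(covered) if covered else None
    return positives, len(covered), fraction


def synthetic_records(positives, covered, uncovered):
    """Build made-up records with a chosen number of covered and positive cases."""
    records = []
    for i in range(covered):
        record = {name: True for name in RULE_CONDITIONS}
        record[TARGET] = i < positives
        records.append(record)
    for _ in range(uncovered):
        record = {name: False for name in RULE_CONDITIONS}
        record[TARGET] = False
        records.append(record)
    return records


# Synthetic data sized to reproduce the counts quoted on the slide.
train = synthetic_records(positives=26, covered=41, uncovered=100)
test = synthetic_records(positives=12, covered=20, uncovered=50)

train_result = rule_precision(train)
test_result = rule_precision(test)
print("training: %d/%d = %.2f" % train_result)
print("test:     %d/%d = %.2f" % test_result)
```

Reporting the fraction on held-out test data as well as on training data is what shows the rule is not merely fitted to the training records.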

6
Credit Risk Analysis
  Customer103               time = t0   time = t1   time = tn
  Years of credit           9           9           9
  Loan balance              2,400       3,250       4,500
  Income                    52k         ?           ?
  Own House                 Yes         Yes         Yes
  Other delinquent accts    2           2           3
  Max billing cycles late   3           4           6
  Profitable customer?      ?           ?           NO

7
Credit Risk Analysis

Rules learned from synthesized data:

If   Other-Delinquent-Accounts > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = No (Deny Credit Card Application)

If   Other-Delinquent-Accounts = 0, and
     (Income > 30k) OR (Years-of-Credit > 3)
Then Profitable-Customer? = Yes (Accept Credit Card Application)
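The two rules above can be read as a small decision procedure. The sketch below is one possible encoding, not the system that learned them. The function name and argument names are hypothetical, an unknown income ("?") is represented as None, and treating "Max billing cycles late" from the Customer103 example as the rule's billing-cycle count is an assumption.

```python
def credit_decision(other_delinquent_accounts, delinquent_billing_cycles,
                    income, years_of_credit):
    """Apply the two learned rules; return None when neither rule fires."""
    # Rule 1: predicted unprofitable, so deny the application.
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return "deny"
    # Rule 2: predicted profitable, so accept the application.
    # An unknown income (None) cannot satisfy the income test.
    income_ok = income is not None and income > 30_000
    if other_delinquent_accounts == 0 and (income_ok or years_of_credit > 3):
        return "accept"
    # The rules do not cover every customer.
    return None


# Customer103 at time tn: 3 other delinquent accounts, 6 billing cycles late,
# income unknown, 9 years of credit.
print(credit_decision(3, 6, None, 9))      # rule 1 fires
print(credit_decision(0, 0, 52_000, 1))    # rule 2 fires through income
print(credit_decision(1, 0, 52_000, 9))    # neither rule fires
```

Returning None for uncovered customers keeps the sketch honest about the fact that two rules alone do not make a complete classifier.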
8
Customer purchase behavior

  Customer103       time = t0   time = t1   time = tn
  Sex               M           M           M
  Age               53          53          53
  Income            50k         50k         50k
  Own House         Yes         Yes         Yes
  MS Products       Word        Word        Word
  Computer          386 PC      Pentium     Pentium
  Purchase Excel?   ?           ?           YES

9
Customer retention

  Customer103         time = t0   time = t1   time = tn
  Sex                 M           M           M
  Age                 53          53          53
  Income              50k         50k         50k
  Own House           Yes         Yes         Yes
  Checking            5k          20k         0
  Savings             15k         0           0
  Current-customer?   yes         yes         NO

10
Process optimization

  Product72              time = t0            time = t1          time = tn
  Stage                  mix                  cook               cool
                         Mixing-speed 60rpm   Temperature 325    Fan-speed medium
  Viscosity              1.3                  3.2                1.3
  Fat content            15                   12                 12
  Density                2.8                  1.1                1.2
  Spectral peak          2800                 3200               3100
  Product underweight?   ??                   ??                 YES

11
Where Is this Headed?
Today: tip of the iceberg
  • First-generation algorithms: regression, neural nets, d-trees, ...
  • Applied to single databases
  • Budding industry

Tomorrow:
  • Learn across multiple media data
  • Learn across multiple databases, including web, newsfeeds
  • Learn through active experimentation
  • Learn decisions rather than predictions

12
A Datamining Research Agenda
Scientific Issues, Basic Technologies
  • Learn from mixed media data, e.g., numeric, text, image, voice, sensor
  • Active experimentation, exploration
  • Optimize decisions rather than predictions
  • Invent new features to improve accuracy
  • Learn from multiple databases and the world wide web

Applications
  • Medicine
  • Manufacturing
  • Financial
  • Intelligence analysis
  • Public Policy
  • Marketing

13
Part II Mining the World Wide Web

14
Project Goals
Observation:
  • The web is rapidly becoming the world's largest information resource
  • Retrievable by computer, but readable only by humans

Our goal:
  • Automatic construction of a computer-readable knowledge base from the web

15
Automatically Extracted WebKB Description
17
Automatically Extracted WebKB Description
18
Hypertext Information Extraction
Analyze hypertext at 3 levels of resolution:
  • Individual web pages, e.g., faculty, student, company, product
  • Groups of interconnected web pages, e.g., teaches-course(p, c)
  • Individual sentences and fragments, e.g., "we produce oranges in Florida"

19
Bag of Words Classification
  word       count
  aardvark   0
  about      2
  all        2
  Africa     1
  apple      0
  anxious    0
  ...
  gas        1
  ...
  oil        1
  Zaire      0
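A bag-of-words representation discards word order and keeps only a count per vocabulary word. The sketch below shows one simple way to build such a vector. The tokenizer (lowercased runs of letters) is an assumption, and the sample sentence is invented purely so that its counts match the ones listed above; it is not text from the slides.

```python
import re
from collections import Counter

# Vocabulary words taken from the slide (the "..." entries are omitted).
VOCABULARY = ["aardvark", "about", "all", "Africa", "apple",
              "anxious", "gas", "oil", "Zaire"]


def bag_of_words(text, vocabulary=VOCABULARY):
    """Map each vocabulary word to its case-insensitive count in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur.
    return {word: counts[word.lower()] for word in vocabulary}


# Invented sample text, chosen to reproduce the counts shown on the slide.
sample = "All about oil and gas: all about our work in Africa."
vector = bag_of_words(sample)
for word, count in vector.items():
    print(word, count)
```

A classifier would then be trained on these count vectors rather than on the raw page text.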
20
Total Oil Corporation Web Site
21
Learned First Order Rules
IF   Person(B), Research_Project(A),
     Hyperlink(C, A, D), Neighborhood_word(C, people),
     Hyperlink(E, D, B)
Then Member_Of_Research_Project(A, B)

Accuracy: 135/138

[Diagram with nodes A (Research Project), D, and B (Person), and the label "people"]
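To make the rule above concrete, the sketch below evaluates it over a tiny hand-made set of facts. It assumes Hyperlink(C, A, D) means "hyperlink C leads from page A to page D" and Neighborhood_word(C, w) means "word w appears near hyperlink C"; the slide does not spell out either reading. All page and link names are invented for illustration.

```python
def member_of_research_project(persons, projects, hyperlinks, neighborhood_words):
    """Return the (project, person) pairs derived by the learned rule.

    hyperlinks: set of (link_id, source_page, target_page)
    neighborhood_words: set of (link_id, word)
    """
    members = set()
    for c, a, d in hyperlinks:
        # Hyperlink C must leave a project page and have "people" nearby.
        if a not in projects or (c, "people") not in neighborhood_words:
            continue
        # Some hyperlink E must lead from page D to a person's page B.
        for _e, source, b in hyperlinks:
            if source == d and b in persons:
                members.add((a, b))
    return members


# Invented toy facts.
persons = {"alice_home", "carol_home"}
projects = {"project_page"}
hyperlinks = {
    ("l1", "project_page", "people_page"),
    ("l2", "people_page", "alice_home"),
    ("l3", "project_page", "sponsors_page"),
    ("l4", "sponsors_page", "carol_home"),
}
neighborhood_words = {("l1", "people"), ("l3", "sponsors")}

members = member_of_research_project(persons, projects, hyperlinks,
                                     neighborhood_words)
print(members)
```

Only the path through the link labeled "people" yields a membership fact, which is the role the Neighborhood_word condition plays in the rule.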
22
Learning to Extract Information
"We are headquartered in sunny Tehran."

Focus:
  • linguistic structure
  • train with minimal effort

If   Verb phrase ?V has direct object ?O, and
     head of ?V is "headquartered"
Then Location is ?O
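As a rough illustration of how such an extraction rule could be applied, the sketch below runs it over verb phrases that have already been parsed. The parse is written by hand as plain dictionaries with hypothetical keys "head" and "direct_object"; no real parser is used, and the slide does not describe the project's actual parse representation.

```python
def extract_locations(verb_phrases, trigger="headquartered"):
    """Return the direct object of every verb phrase headed by the trigger."""
    locations = []
    for phrase in verb_phrases:
        if phrase.get("head") == trigger and phrase.get("direct_object"):
            locations.append(phrase["direct_object"])
    return locations


# Hand-written parses of two example sentences from the slides.
parsed = [
    {"head": "headquartered", "direct_object": "Tehran"},   # "We are headquartered in sunny Tehran."
    {"head": "produce", "direct_object": "oranges"},        # "we produce oranges in Florida"
]

locations = extract_locations(parsed)
print(locations)
```

Because the rule tests linguistic structure rather than exact wording, it still applies when modifiers such as "sunny" change.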
23
Web Information Extraction
                        Default Accuracy   Learned Accuracy
  Economic Sector       1                  76 (cov 100)
  Corporate Locations   < 49               83 (cov 60)
  Student               < 5                72
  Research Project      < 5                89
  Course                < 5                73

24
Automatically Extracted KB