Pattern Analysis - PowerPoint PPT Presentation

1
Pattern Analysis
  • Prof. Bennett
  • Math Model of Learning and Discovery 1/17/03
  • Based on Chapter 1 of Shawe-Taylor and Cristianini

2
Outline
  • What is pattern analysis?
  • Illustrate issues via example
  • Pattern definitions
  • Examples of practical tasks
  • Pattern algorithms
  • Summary

3
Pattern Analysis
  • The automatic detection of patterns in data from
    the same source.
  • Make predictions about new data coming from the same
    source.
  • Data may take many forms: images, text, records of
    commercial transactions, genome sequences, family trees.

4
Data Driven Analysis
  • Kepler analyzed Brahe's planetary motion data.
  • P = period, D = average distance from the Sun.
5
Found Regularities
  • Observed $P^2 = D^3$ (P in years, D in astronomical units).
  • Developed three laws of planetary motion.
  • Compressible: the data can be represented by one column.
  • Predictable: discovering hidden relations allows us to
    predict the other columns.
  • Third Law is exact.

6
Data Representation I
  • Nonlinear model of D and P: $P^2 - D^3 = 0$
  • Linear model of $\log D$ and $\log P$: $2 \log P - 3 \log D = 0$
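
As a rough illustration of why the logarithmic recoding helps, the sketch below fits a straight line to log P versus log D by ordinary least squares. Kepler's third law, $P^2 \propto D^3$, predicts a slope of 3/2. The orbital values are standard rounded figures (periods in years, distances in astronomical units) supplied here as an assumption, not taken from the slides, and `fit_line` is a hypothetical helper name.

```python
import math

# Approximate orbital data: (planet, period P in years, mean distance D in AU).
# Standard rounded values, supplied here for illustration only.
PLANETS = [
    ("Mercury", 0.241, 0.387),
    ("Venus", 0.615, 0.723),
    ("Earth", 1.000, 1.000),
    ("Mars", 1.881, 1.524),
    ("Jupiter", 11.86, 5.203),
    ("Saturn", 29.46, 9.537),
]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

log_d = [math.log(d) for _, _, d in PLANETS]
log_p = [math.log(p) for _, p, _ in PLANETS]
slope, intercept = fit_line(log_d, log_p)

# A slope near 3/2 and an intercept near 0 correspond to P^2 = D^3.
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In the original variables the relation is a power law, but after taking logarithms a plain linear fit is enough to recover it.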

7
Data Representation II
  • Suppose we know the plane of the orbit, so we can
    represent positions as (x, y) pairs
  • We also know the orbit is an ellipse

8
Data Representation
  • Pattern is a nonlinear function of x, y
  • Pattern is a linear function of the monomials
    $x^2, y^2, xy, x, y$
  • Linear relationships are easier to find.
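
A minimal sketch of this recoding, assuming a hypothetical axis-aligned ellipse centred at the origin: each point (x, y) is mapped to its monomials of degree at most two, and the ellipse equation becomes a linear function of those features. The names `monomial_features` and `linear_pattern`, and the axis lengths, are illustrative choices.

```python
import math

def monomial_features(x, y):
    """Recode a 2-D point as the monomials of degree <= 2."""
    return (x * x, y * y, x * y, x, y, 1.0)

def linear_pattern(weights, features):
    """Evaluate the linear function <weights, features>."""
    return sum(w * phi for w, phi in zip(weights, features))

# Hypothetical ellipse x^2/a^2 + y^2/b^2 = 1 centred at the origin.
a, b = 3.0, 2.0
weights = (1 / a**2, 1 / b**2, 0.0, 0.0, 0.0, -1.0)

# Sample points on the ellipse and evaluate the pattern function on each.
points = [(a * math.cos(t), b * math.sin(t))
          for t in (2 * math.pi * k / 12 for k in range(12))]
residuals = [linear_pattern(weights, monomial_features(x, y)) for x, y in points]

# Every residual is zero up to floating-point rounding.
print(max(abs(r) for r in residuals))
```

The pattern function is quadratic in (x, y) but linear in the six recoded features, so finding it reduces to finding a weight vector.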

9
Set of Hypotheses
  • Hypothesis: the orbit is an ellipse; compute its
    parameters from the data
  • Hypothesis: the orbit is a circle; compute its
    parameters from the data

UNDERFITS
10
Set of Hypotheses
  • Hypothesis: any continuous function

OVERFITS!
  • Underfitting and overfitting depend on the size of
    the hypothesis class
  • Use domain knowledge to limit the hypotheses
11
Approximate Pattern
Noisy Data
12
Typical Pattern Analysis
  • Approximate, not exact.
  • Data has errors and omissions.
  • Cannot predict graduate school performance from
    GREs and grades alone.
  • Best representation/model is unknown.
  • When making approximate predictions, we need to
    address how accurate the estimates are.

13
Definition: Exact Pattern
  • A general exact pattern, f, for data source S
    satisfies
  • $f(x) = 0$ for all data x from source S

14
Approximate Pattern
  • A general approximate pattern, f, for data source
    S satisfies
  • $f(x) \approx 0$ for all data x from source S

15
Statistical Pattern
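  • A general statistical pattern, f, for data source
    S generated i.i.d. according to distribution D
    is a non-negative function that satisfies
  • $\mathbb{E}_{x \sim D}[f(x)] \approx 0$, with the expectation
    taken over data x from source S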
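
The three definitions can be sketched as checks on a finite sample. This is an illustration only: the definitions quantify over all data from the source, so passing on a sample is evidence, not proof. The function names, the tolerances, and the toy source (points with y = 2x plus small noise) are assumptions made for this example.

```python
import random

def is_exact_on(f, sample):
    """True if f(x) == 0 for every x in the sample."""
    return all(f(x) == 0 for x in sample)

def is_approximate_on(f, sample, tol):
    """True if |f(x)| <= tol for every x in the sample."""
    return all(abs(f(x)) <= tol for x in sample)

def is_statistical_on(f, sample, tol):
    """True if the sample mean of f(x) is at most tol (f assumed non-negative)."""
    return sum(f(x) for x in sample) / len(sample) <= tol

# Hypothetical source: points (x, y) with y = 2x, with and without noise on y.
rng = random.Random(0)
clean = [(x, 2 * x) for x in range(10)]
noisy = [(x, 2 * x + rng.uniform(-0.05, 0.05)) for x in range(10)]

def f(point):
    """Candidate pattern function: zero exactly when y = 2x."""
    x, y = point
    return y - 2 * x

def f_squared(point):
    """Non-negative version of f, suitable for the statistical check."""
    return f(point) ** 2

print(is_exact_on(f, clean), is_exact_on(f, noisy))
print(is_approximate_on(f, noisy, tol=0.1))
print(is_statistical_on(f_squared, noisy, tol=0.01))
```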

16
Two and Multiclass Classification
  • Example: character recognition
  • Two-class: is it an A or not?
  • Multiclass: what letter is it?

17
Regression
  • Example: determine drug bioavailability through
    the intestine. Estimate apparent permeability as
    assayed via an intestinal cell line.

18
Density Estimation
  • Estimate the probability that a particular event
    occurs, p(x). Use it to detect improbable
    events like fraud.
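
A minimal sketch of this idea, assuming the simplest possible density model: a single Gaussian fitted to made-up transaction amounts. The data, the threshold, and the name `is_improbable` are hypothetical; a real fraud detector would need a far richer model.

```python
from statistics import NormalDist

# Hypothetical transaction amounts (made-up numbers for illustration only).
amounts = [21.0, 19.5, 22.3, 20.1, 18.7, 20.9, 21.4, 19.8, 20.5, 21.1]

# Estimate p(x) by fitting a single Gaussian to the observed amounts.
density = NormalDist.from_samples(amounts)

def is_improbable(x, threshold=1e-3):
    """Flag x when its estimated density falls below the threshold."""
    return density.pdf(x) < threshold

print(is_improbable(20.7))   # a typical amount
print(is_improbable(95.0))   # far from anything seen in the sample
```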

19
Principal Component Analysis
  • Find a projection of the data that captures the
    major variance in the data.
  • Eigenfaces - capture essential qualities of
    faces to help ID and reduce storage needs.
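
As a small sketch of the projection step, the code below finds the first principal component of 2-D data by power iteration on the sample covariance matrix. The data points are made up, and power iteration is one simple choice among several ways to compute the leading eigenvector; `first_principal_component` is a hypothetical name.

```python
import math

def first_principal_component(points, iterations=100):
    """Leading eigenvector of the 2x2 sample covariance, via power iteration."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum(x * x for x, _ in centred) / (n - 1)
    cyy = sum(y * y for _, y in centred) / (n - 1)
    cxy = sum(x * y for x, y in centred) / (n - 1)

    v = (1.0, 0.0)  # arbitrary starting direction
    for _ in range(iterations):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

# Hypothetical data scattered closely around the line y = x.
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1), (5.0, 4.9)]
direction = first_principal_component(data)
print(direction)
```

Projecting each centred point onto `direction` keeps one number per point while retaining most of the variance, which is the storage saving the eigenfaces example relies on.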

20
Other Tasks
  • Reinforcement Learning
  • Robot senses the state of the world,
  • must learn which action to take,
  • and periodically receives rewards (delivers mail)
    or punishments (hits wall)
  • What is the learning model?

21
Pattern Analysis Algorithm
  • A Pattern Analysis Algorithm
  • Input: a finite set of data from source S,
    a.k.a. the training set
  • Output: a detector function f,
    or "no patterns detected"
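
The input/output contract above can be sketched as a function that takes a training set and returns either a detector or `None`. This example looks only for power-law patterns y = c·x^k among positive pairs; the name `find_power_law`, the tolerance, and the toy data are assumptions for illustration, not a method from the slides.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]
Detector = Callable[[Point], float]

def find_power_law(training_set: Sequence[Point],
                   tolerance: float = 0.05) -> Optional[Detector]:
    """Look for a pattern y = c * x**k among positive (x, y) pairs.

    Returns a detector f with f(point) close to 0 on the training set,
    or None when no such pattern fits within the tolerance.
    """
    log_x = [math.log(x) for x, _ in training_set]
    log_y = [math.log(y) for _, y in training_set]
    n = len(training_set)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    if sxx == 0:
        return None  # all x equal: the exponent cannot be determined
    k = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / sxx
    log_c = mean_y - k * mean_x

    def detector(point: Point) -> float:
        x, y = point
        return math.log(y) - (log_c + k * math.log(x))

    if max(abs(detector(p)) for p in training_set) > tolerance:
        return None  # "no patterns detected"
    return detector

pattern = find_power_law([(1.0, 1.0), (2.0, 8.0), (3.0, 27.0), (4.0, 64.0)])
no_pattern = find_power_law([(1.0, 5.0), (2.0, 1.0), (3.0, 9.0), (4.0, 2.0)])
print(pattern is not None, no_pattern is None)
```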

22
Pattern Algorithm Issues
  • Efficiency and scalability: memory and CPU
    requirements, large data sets
  • Robustness: find approximate patterns in noisy
    data
  • Stability: discover genuine patterns, find the same
    patterns on different views of the dataset

23
Stability
  • Generalization
  • Find the pattern on future data
  • A pattern may exist by chance in a finite sample
  • Provide a statistical guarantee that the pattern
    truly exists, with the caveat that with small
    probability the algorithm may have been misled.

24
Example
  • Observe that, for one state agency, all 20 babies
    adopted in the last 10 years from country x are
    girls.
  • Pattern: only girls are available for adoption
    from that country.
  • With probability $p = (0.5)^{20} \approx 9.5 \times 10^{-7}$ we could
    observe this data even if girls and boys are
    equally likely.
  • So with chance p, we were misled.
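
The arithmetic behind this example, assuming each adoption is an independent draw with girls and boys equally likely:

```python
# Probability that 20 independent adoptions are all girls
# when girls and boys are equally likely.
n_babies = 20
p = 0.5 ** n_babies
print(f"p = {p:.3e}")  # roughly one chance in a million
```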

25
Statistical Learning Theory
  • Produce a pattern based on a finite sample.
    Provide bounds on the probability that the pattern
    approximately represents a true pattern.
  • Probably Approximately Correct (PAC)
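
One standard bound of this kind, not stated on the slides, covers a finite hypothesis class H and an algorithm that returns a hypothesis consistent with the training sample: $m \ge (\ln|H| + \ln(1/\delta)) / \epsilon$ i.i.d. examples suffice for true error at most $\epsilon$ with probability at least $1 - \delta$. The sketch below just evaluates that formula; `pac_sample_size` and the example numbers are hypothetical.

```python
import math

def pac_sample_size(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for a consistent learner over a finite class.

    With m >= (ln|H| + ln(1/delta)) / epsilon i.i.d. examples, any hypothesis
    that fits all of them has true error <= epsilon with probability >= 1 - delta.
    """
    m = (math.log(num_hypotheses) + math.log(1 / delta)) / epsilon
    return math.ceil(m)

print(pac_sample_size(num_hypotheses=1000, epsilon=0.05, delta=0.01))
```

The required sample grows only logarithmically with the size of the hypothesis class, which is one way to see why limiting the hypotheses controls overfitting.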

26
Recoding Strategy
  • With proper representation, the problem can
    become easier (linear model works).
  • Develop general purpose linear learning methods.
  • Change recoding using kernel functions
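
A minimal sketch of what a kernel function buys, using the homogeneous quadratic kernel as an assumed example: the inner product of two recoded points can be computed directly from the original points, without ever building the recoded features. The names `feature_map` and `quadratic_kernel` are illustrative.

```python
import math

def feature_map(x):
    """Explicit degree-2 recoding of a 2-D point."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Standard inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, z):
    """k(x, z) = <x, z>^2, computed without building the features."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(feature_map(x), feature_map(z))
implicit = quadratic_kernel(x, z)
print(explicit, implicit)
```

The two values agree up to rounding, so a linear method that only needs inner products can work in the recoded space by swapping `dot` for the kernel.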

27
Key Ideas
  • Patterns are regularities in data from a
    specified source
  • Algorithm takes finite sample and computes
    pattern
  • Efficiency, robustness, and stability
  • Representation -- Kernels
  • Strategy: generic algorithms + recoding
  • Many Learning Tasks in this framework