Title: Talk Announcement
1. Talk Announcement
- Michael Jordan (no, not that Michael Jordan)
- Statistical Machine Learning Researcher from Berkeley
- Very relevant topic: Recent Developments in Nonparametric Hierarchical Bayesian Modeling
- November 13 (Monday), 2006, at 4:00 p.m.
- 1404 Siebel Center for Computer Science
- We will require attendance for CS 446; try to arrive early
- Your next assignment!
- If you cannot attend (or even if you can): http://www.stat.berkeley.edu/jordan/674.pdf
- Reception after the talk in the 2nd Floor Atrium of Siebel Center; you are invited
2. Written Assignment due Wednesday 11/15
- Due next Wednesday in class
- One paragraph (or so, but no more than one page, double spaced, readable font)
- What I learned from Michael Jordan's research talk/paper
- Something you didn't know before
- Something you understand now (at least a little)
- Something IMPORTANT
- REACH!
- Some faculty love to show off technical prowess
- Michael will not be easy to follow
- Be tenacious
- Try to see the forest while he's describing tree leaves
- Tell me what you think the forest is / might be / should be
3. Next (future) Programming Assignment
- Not assigned yet
- Compare naïve Bayes and logistic regression as examples of generative and discriminative classifiers (a comparison sketch follows this list)
- A new text chapter available from Mitchell: "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression," at http://www.cs.cmu.edu/~tom/NewChapters.html, or navigate down from http://www.cs.cmu.edu/~tom/
- A classic paper: A. Y. Ng and M. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of the Fourteenth Conference on Neural Information Processing Systems, 2002.
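The assignment is not yet specified, but here is a minimal sketch of the kind of comparison Ng and Jordan make, assuming scikit-learn and a synthetic dataset (both illustrative choices, not part of the assignment):

```python
# Sketch: generative (naive Bayes) vs. discriminative (logistic regression)
# error as a function of training set size, on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_pool, y_pool = X[:1000], y[:1000]    # training pool
X_test, y_test = X[1000:], y[1000:]    # held-out test set

# Ng & Jordan's observation: the generative model often approaches its
# (higher) asymptotic error faster, while the discriminative model tends
# to win once enough training data is available.
for m in [10, 30, 100, 300, 1000]:
    nb = GaussianNB().fit(X_pool[:m], y_pool[:m])
    lr = LogisticRegression(max_iter=1000).fit(X_pool[:m], y_pool[:m])
    print(f"m={m:5d}  NB err={1 - nb.score(X_test, y_test):.3f}  "
          f"LR err={1 - lr.score(X_test, y_test):.3f}")
```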
4. Jordan's Abstract
Much research in statistics and machine learning
is concerned with controlling some form of
tradeoff between flexibility and variability. In
the Bayesian approach, such control is often
exerted via hierarchies---stochastic
relationships among prior distributions.
Nonparametric Bayesian statisticians work with
priors that are general stochastic processes
(e.g., distributions on spaces of continuous
functions, spaces of monotone functions, or
general measure spaces). Thus flexibility is
emphasized and it is of particular importance to
exert hierarchical control. In this talk I discuss Bayesian hierarchical modeling in the setting of two particularly interesting stochastic processes: the Dirichlet process and
the beta process. These processes are discrete
with probability one, and have interesting
relationships to various random combinatorial
objects. They yield models with open-ended
numbers of "clusters" and models with open-ended
numbers of "features," respectively. I discuss
Bayesian modeling based on hierarchical Dirichlet
process priors and hierarchical beta process
priors, and present applications of these models
to problems in bioinformatics, information
retrieval and computational vision.
5. Jordan's Abstract, Annotated
- Much research in statistics and machine learning is concerned with controlling some form of tradeoff between flexibility and variability. [Modeling, Bias, Variance]
- In the Bayesian approach, such control is often exerted via hierarchies---stochastic relationships among prior distributions. [Hierarchical Bayes, Hyper-Parameters; a small sketch follows this list]
- Nonparametric Bayesian statisticians work with priors that are general stochastic processes (e.g., distributions on spaces of continuous functions, spaces of monotone functions, or general measure spaces). [Non-parametric Models, Order Statistics, Weaker but More Robust Prior Assumptions; e.g., samples from an increasing function? (linear regression, goodness of fit)]
- Thus flexibility is emphasized and it is of particular importance to exert hierarchical control.
- In this talk I discuss Bayesian hierarchical modeling in the setting of two particularly interesting stochastic processes: the Dirichlet process and the beta process. [Stochastic processes as characterizing transitions among states, where a state is an assignment to a set of random variables (recall MDPs)]
- These processes are discrete with probability one, and have interesting relationships to various random combinatorial objects. [Dirichlet Process, e.g., the Chinese Restaurant Process]
- They yield models with open-ended numbers of "clusters" and models with open-ended numbers of "features," respectively. [E.g., the Chinese Restaurant Process]
- I discuss Bayesian modeling based on hierarchical Dirichlet process priors and hierarchical beta process priors, and present applications of these models to problems in bioinformatics, information retrieval and computational vision.
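To make "hierarchical control via hyper-parameters" concrete, here is a minimal sketch assuming a toy Beta-Binomial hierarchy (an illustrative example, not from the talk): a shared hyperparameter stochastically ties together the priors of several groups, so the groups borrow strength from each other.

```python
# Sketch: a two-level hierarchy. A hyperprior draws a shared prior mean mu;
# each group's coin bias is then drawn from a Beta prior centered on mu.
import numpy as np

rng = np.random.default_rng(0)

mu = rng.beta(2.0, 2.0)    # hyperparameter at the top of the hierarchy
kappa = 20.0               # concentration: how tightly groups cluster around mu

# Group-level priors are stochastically related through (mu, kappa).
n_groups, n_flips = 5, 50
theta = rng.beta(mu * kappa, (1 - mu) * kappa, size=n_groups)

# Data: each group flips its own coin.
heads = rng.binomial(n_flips, theta)

# Conjugate Beta-Binomial posterior mean per group, with the hierarchy
# supplying the prior pseudo-counts.
post_mean = (mu * kappa + heads) / (kappa + n_flips)
print("true theta:      ", np.round(theta, 2))
print("posterior means: ", np.round(post_mean, 2))
```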
6. Chinese Restaurant Process
- A Chinese restaurant serves an infinite number of alternative dishes and has an infinite number of tables, each with infinite capacity.
- Each new customer either sits at a table that is already occupied, with probability proportional to the number of customers already sitting at that table, or sits alone at a table not yet occupied, with probability α / (n + α), where n is how many customers were already in the restaurant.
- Customers who sit at an occupied table must order some dish already being served in the restaurant, but customers starting a new table are served a dish at random according to D.
- DP(α, D) is the distribution over the different dishes as n increases.
- Note the extreme flexibility afforded over the dishes.
- Applications: clustering microarray gene expression data, natural language modeling, visual scene classification.
- It invents clusters to best fit the data. (A simulation sketch follows this list.)
- These clusters can be semantically interpreted: images of shots in basketball games, outdoor scenes on gray days, beach scenes.
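Here is a minimal simulation sketch of the seating process described above, with α as the concentration parameter; the function name and constants are illustrative.

```python
# Sketch: Chinese Restaurant Process seating. Tables play the role of
# clusters; each new table would be served a dish drawn from the base
# distribution D (omitted here, since only the seating matters).
import numpy as np

def crp(n_customers, alpha, rng):
    """Seat customers one by one; return table sizes and assignments."""
    tables = []     # tables[k] = number of customers at table k
    seating = []    # seating[i] = table index of customer i
    for n in range(n_customers):
        # Existing table k with prob tables[k] / (n + alpha);
        # a new table with prob alpha / (n + alpha).
        probs = np.array(tables + [alpha]) / (n + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)    # open a new table (new cluster)
        else:
            tables[k] += 1
        seating.append(k)
    return seating, tables

rng = np.random.default_rng(0)
seating, tables = crp(n_customers=100, alpha=2.0, rng=rng)
print("number of tables (clusters):", len(tables))
print("table sizes:", tables)
```

Note how the number of tables is open-ended: it grows with the data (roughly as α log n) rather than being fixed in advance, which is exactly the "open-ended number of clusters" property from the abstract.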