1
Learning Module Networks

Eran Segal
Stanford University

Joint work with Dana Pe'er (Hebrew U.), Daphne Koller (Stanford),
Aviv Regev (Harvard), and Nir Friedman (Hebrew U.)
2
Learning Bayesian Networks

Density estimation
Model data distribution in population
Probabilistic inference
Prediction
Classification
Dependency structure
Interactions between variables
Causality
Scientific discovery

3
Stock Market

Learn dependency of stock prices as a function of
Global influencing factors
Sector influencing factors
Price of other major stocks

4
Stock Market

[Figure: the stocks MSFT, DELL, INTL, NVLS, and MOT]
5
Stock Market

[Figure: Bayesian network over DELL, INTL, MSFT, NVLS, and MOT]
6
Stock Market

4411 stocks (variables)
273 trading days (instances), from Jan. '02 to Mar. '03

Problems
Statistical robustness
Interpretability

7
Key Observation

Many stocks depend on the same influencing
factors in much the same way
Example: Intel, Novellus, Motorola, and Dell all depend on
the price of Microsoft
Many other domains with similar characteristics
Gene expression
Collaborative filtering
Computer network performance

8
The Module Network Idea
[Figure: Bayesian network over MSFT, MOT, INTL, DELL, AMAT, and HPQ]
9
Problems and Solutions

Statistical robustness
Interpretability

10
Outline

Module Network
Probabilistic model
Learning the model
Experimental results

11
Module Network Components

Module Assignment Function
A(MSFT) = MI
A(MOT) = A(DELL) = A(INTL) = MII
A(AMAT) = A(HPQ) = MIII

[Figure: the variables MSFT, AMAT, HPQ, INTL, MOT, and DELL grouped into Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ)]
12
Module Network Components

Module Assignment Function
Set of parents for each module
Pa(MI) = ∅
Pa(MII) = {MSFT}
Pa(MIII) = {DELL, INTL}

[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ)]
13
Module Network Components

Module Assignment Function
Set of parents for each module
CPD template for each module

[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ)]
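
The first two components can be written down directly as data. The sketch below is only an illustration: the class name `ModuleNetwork` and its methods are hypothetical, the values are the running example from the slides, and the CPD template is left out because the slides do not give its form.

```python
from dataclasses import dataclass


@dataclass
class ModuleNetwork:
    """Hypothetical container for the components listed on the slide."""
    assignment: dict   # variable name -> module name, i.e. A(X)
    parents: dict      # module name -> set of parent variables, i.e. Pa(M)
    # A CPD template per module would also live here; omitted in this sketch.

    def members(self, module):
        """Variables assigned to `module`, in sorted order."""
        return sorted(v for v, m in self.assignment.items() if m == module)

    def ground_parents(self, variable):
        """Parents a variable inherits from its module."""
        return set(self.parents[self.assignment[variable]])


network = ModuleNetwork(
    assignment={"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
                "AMAT": "MIII", "HPQ": "MIII"},
    parents={"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}},
)
print(network.members("MII"))
print(sorted(network.ground_parents("HPQ")))
```

Every variable in a module shares that module's parent set, which is why `ground_parents` only needs to look up the module.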
14
Ground Bayesian Network

A module network induces a ground BN over X
A module network defines a coherent probability
distribution over X if the ground BN is acyclic

[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ)]
15
Module Graph

Nodes correspond to modules
Mi → Mj if at least one variable in Mi is a parent
of Mj

Acyclicity can be checked efficiently using the module
graph

[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ)]
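
A minimal sketch of the module-graph check, assuming the same dictionary representation as the running example. The function names `module_graph` and `is_acyclic` are hypothetical. An acyclic module graph is used here as a sufficient condition for an acyclic ground BN.

```python
def module_graph(assignment, parents):
    """Edges Mi -> Mj whenever some variable assigned to Mi is a parent of Mj."""
    edges = {m: set() for m in set(assignment.values()) | set(parents)}
    for child_module, parent_vars in parents.items():
        for var in parent_vars:
            edges[assignment[var]].add(child_module)
    return edges


def is_acyclic(edges):
    """Depth-first search; meeting a node that is still open means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {m: WHITE for m in edges}

    def visit(m):
        colour[m] = GREY
        for nxt in edges[m]:
            if colour[nxt] == GREY:
                return False
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[m] = BLACK
        return True

    return all(colour[m] != WHITE or visit(m) for m in edges)


assignment = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
              "AMAT": "MIII", "HPQ": "MIII"}
parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}
graph = module_graph(assignment, parents)
print(is_acyclic(graph))

# Making AMAT (Module III) a parent of Module I closes the loop MI -> MII -> MIII -> MI.
cyclic_parents = {"MI": {"AMAT"}, "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}
print(is_acyclic(module_graph(assignment, cyclic_parents)))
```

The check runs over modules rather than individual variables, so its cost depends on the number of modules, not on the 4411 stocks.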
16
Outline

Module Network
Probabilistic model
Learning the model
Experimental results

17
Learning Overview

Given data D, find assignment function A and
structure S that maximize the Bayesian score
Marginal data likelihood
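
Written out, the objective has roughly the following shape. This is a sketch in standard Bayesian-score notation: the prior terms P(A) and P(S | A) and the parameter symbol θ are assumptions, since the slide names only the marginal data likelihood.

```latex
\mathrm{score}(S, A : D) = \log P(A) + \log P(S \mid A) + \log P(D \mid S, A),
\qquad
P(D \mid S, A) = \int P(D \mid S, A, \theta)\, P(\theta \mid S, A)\, d\theta
```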

18
Likelihood Function
Likelihood function decomposes by modules

[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ), shown with data instances 1, 2, and 3 and the label "Sufficient statistics of (X,Y)"]
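
To make the decomposition concrete, the sketch below pools sufficient statistics over all variables in one module and evaluates that module's log-likelihood term. It assumes discrete values and a table CPD with maximum-likelihood parameters; the slides do not specify the CPD form, and the function names and the four toy instances are invented for illustration.

```python
import math
from collections import Counter


def module_counts(data, members, parent_vars):
    """Pooled counts of (parent values, child value) over every member variable."""
    counts = Counter()
    for row in data:
        parent_values = tuple(row[p] for p in parent_vars)
        for x in members:
            counts[(parent_values, row[x])] += 1
    return counts


def module_log_likelihood(counts):
    """Log-likelihood of one module under maximum-likelihood table parameters."""
    totals = Counter()
    for (parent_values, _), n in counts.items():
        totals[parent_values] += n
    return sum(n * math.log(n / totals[pa]) for (pa, _), n in counts.items())


# Toy instances (made up): each row is one trading day, values are discretised moves.
data = [
    {"MSFT": "up", "MOT": "up", "DELL": "up"},
    {"MSFT": "up", "MOT": "up", "DELL": "down"},
    {"MSFT": "down", "MOT": "down", "DELL": "down"},
    {"MSFT": "down", "MOT": "down", "DELL": "down"},
]
counts = module_counts(data, members=["MOT", "DELL"], parent_vars=["MSFT"])
log_lik = module_log_likelihood(counts)
print(counts)
print(log_lik)
```

The total log-likelihood is the sum of such terms over modules. Because MOT and DELL share one CPD, each instance contributes two observations to the same table, which is the source of the statistical robustness the slides refer to.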
19
Bayesian Score Decomposition

Bayesian score decomposes by modules

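[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ); the score term for module j is labelled with the module j variables and the module j parents. Example operator: delete INTL → Module III]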
20
Bayesian Score Decomposition

Bayesian score decomposes by modules

[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ). Example operator: reassign MOT, A(MOT) = 2 → A(MOT) = 1]
21
Algorithm Overview

Find assignment function A and structure S that
maximize the Bayesian score

Find initial assignment A
Dependency structure S
22
Initial Assignment Function
[Figure: data matrix of variables (stocks) AMAT, MOT, MSFT, DELL, INTL, HPQ against instances (trading days) x1 to x4]
Find variables that are similar across instances
A(MOT) = MII, A(INTL) = MII, A(DELL) = MII
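
One way to find variables that behave similarly across instances is to cluster them. The sketch below uses greedy agglomerative merging on Pearson correlation; this specific procedure is an assumption, not necessarily the one used in the talk, and the price series are made-up toy values.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0


def initial_assignment(series, n_modules):
    """Merge the two most correlated clusters until n_modules clusters remain."""
    clusters = [[name] for name in sorted(series)]

    def centroid(cluster):
        columns = zip(*(series[v] for v in cluster))
        return [sum(col) / len(cluster) for col in columns]

    while len(clusters) > n_modules:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                r = pearson(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or r > best[0]:
                    best = (r, i, j)
        _, i, j = best
        merged = clusters.pop(j)      # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return {v: k for k, cluster in enumerate(clusters) for v in cluster}


# Toy series over four trading days x1..x4 (invented values).
series = {
    "MOT": [1, 2, 3, 4], "INTL": [2, 3, 5, 6], "DELL": [1, 2, 2, 4],
    "AMAT": [4, 3, 2, 1], "HPQ": [5, 3, 2, 2], "MSFT": [1, 3, 1, 3],
}
A = initial_assignment(series, n_modules=3)
print(A)
```

On this toy input MOT, INTL, and DELL end up in one cluster, matching the assignment shown on the slide. The result is only a starting point for the iterative learning that follows.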
23
Algorithm Overview

Find assignment function A and structure S that
maximize the Bayesian score

Find initial assignment A
Dependency structure S
24
Learning Dependency Structure

Heuristic search with operators
Add/delete parent for module
Cannot reverse edges
Handle acyclicity
Can be checked efficiently on the module graph
Efficient computation
After applying an operator for module Mj, only
update the scores of operators for module Mj

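[Figure: module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ), next to its module graph over MI, MII, MIII; candidate parent operators shown are MSFT → Module II, INTL → Module I, and INTL → Module III]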
25
Learning Dependency Structure

Structure search done at module level
Parent selection
Reduced search space relative to BN
Acyclicity checking
Individual variables only used for computation of
sufficient statistics
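
The module-level search can be sketched as greedy hill-climbing over add-parent and delete-parent operators. Everything here is illustrative: `module_score` stands in for the per-module Bayesian score (a toy function is used below), the function names are hypothetical, and greedy best-first selection is an assumption about the heuristic.

```python
def module_edges(assignment, parents):
    """Module graph: Mi -> Mj if a variable assigned to Mi is a parent of Mj."""
    edges = {m: set() for m in parents}
    for child, parent_vars in parents.items():
        for v in parent_vars:
            edges[assignment[v]].add(child)
    return edges


def reaches(edges, start, goal):
    """True if goal is reachable from start (start == goal counts as reachable)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return False


def greedy_structure_search(assignment, candidates, module_score, max_steps=100):
    modules = sorted(set(assignment.values()))
    parents = {m: set() for m in modules}
    for _ in range(max_steps):
        edges = module_edges(assignment, parents)
        best_gain, best_move = 0.0, None
        for m in modules:
            base = module_score(m, frozenset(parents[m]))
            for v in candidates:
                if v in parents[m]:
                    new = parents[m] - {v}            # delete-parent operator
                elif reaches(edges, m, assignment[v]):
                    continue                          # adding v would create a cycle
                else:
                    new = parents[m] | {v}            # add-parent operator
                gain = module_score(m, frozenset(new)) - base
                if gain > best_gain:
                    best_gain, best_move = gain, (m, new)
        if best_move is None:
            break
        parents[best_move[0]] = best_move[1]
    return parents


assignment = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
              "AMAT": "MIII", "HPQ": "MIII"}
# Toy score (assumption): closeness of a parent set to a fixed target set.
target = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}
toy_score = lambda m, pa: -len(set(pa) ^ target[m])
learned = greedy_structure_search(assignment, sorted(assignment), toy_score)
print(learned)
```

For clarity this sketch rescores every operator at each step; because the score decomposes by modules, only the operators for the module that just changed actually need rescoring.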

26
Algorithm Overview

Find assignment function A and structure S that
maximize the Bayesian score

Find initial assignment A
Dependency structure S
27
Learning Assignment Function

A(DELL) = MI: score 0.7

[Figure: DELL tried in Module I alongside MSFT; Module II (MOT, INTL); Module III (AMAT, HPQ)]
28
Learning Assignment Function

A(DELL) = MI: score 0.7
A(DELL) = MII: score 0.9

[Figure: DELL tried in Module II alongside MOT and INTL; Module I (MSFT); Module III (AMAT, HPQ)]
29
Learning Assignment Function

A(DELL) = MI: score 0.7
A(DELL) = MII: score 0.9
A(DELL) = MIII: cyclic!

[Figure: DELL tried in Module III alongside AMAT and HPQ; Module I (MSFT); Module II (MOT, INTL)]
30
Learning Assignment Function

A(DELL) = MI: score 0.7
A(DELL) = MII: score 0.9
A(DELL) = MIII: cyclic!

[Figure: DELL assigned to Module II alongside MOT and INTL; Module I (MSFT); Module III (AMAT, HPQ)]
31
Ideal Algorithm

Learn the module assignment of all variables
simultaneously

32
Problem

Because of the acyclicity constraint, the assignment cannot be
optimized for each variable separately

A(DELL) = Module IV
A(MSFT) = Module III

[Figure: module network over Modules I to IV, with DELL, MSFT, AMAT, and HPQ shown, next to its module graph over MI, MII, MIII, MIV]
34
Learning Assignment Function

Sequential update algorithm
Iterate over all variables
For each variable, find its optimal assignment
given the current assignment to all other
variables
Efficient computation
When changing assignment from Mi to Mj, only need
to recompute score for modules i and j
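
A sketch of the sequential update under stated assumptions: the parent sets stay fixed, `module_score(module, members)` is a hypothetical stand-in for the per-module Bayesian score, and the toy affinities reuse the 0.7 and 0.9 scores from the DELL example.

```python
def is_acyclic(assignment, parents):
    """Check the module graph induced by `assignment` and `parents` for cycles."""
    incoming = {m: {assignment[v] for v in pa} for m, pa in parents.items()}
    state = {}

    def visit(m):
        if state.get(m) == "done":
            return True
        if state.get(m) == "open":
            return False              # reached a module that is still on the path
        state[m] = "open"
        ok = all(visit(p) for p in incoming.get(m, ()))
        state[m] = "done"
        return ok

    return all(visit(m) for m in incoming)


def sequential_update(assignment, parents, module_score, max_sweeps=10):
    assignment = dict(assignment)
    modules = sorted(parents)

    def members(m):
        return frozenset(v for v, a in assignment.items() if a == m)

    for _ in range(max_sweeps):
        changed = False
        for var in sorted(assignment):
            old = assignment[var]
            best_module, best_gain = old, 0.0
            for new in modules:
                if new == old:
                    continue
                if not is_acyclic({**assignment, var: new}, parents):
                    continue          # illegal move: would create a cycle
                # Only the scores of the source and target modules change.
                gain = (module_score(new, members(new) | {var})
                        + module_score(old, members(old) - {var})
                        - module_score(new, members(new))
                        - module_score(old, members(old)))
                if gain > best_gain:
                    best_module, best_gain = new, gain
            if best_module != old:
                assignment[var] = best_module
                changed = True
        if not changed:
            break
    return assignment


parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}
start = {"MSFT": "MI", "DELL": "MI", "MOT": "MII", "INTL": "MII",
         "AMAT": "MIII", "HPQ": "MIII"}
# Toy per-variable affinities (assumption); Module III is attractive but illegal for DELL.
affinity = {("DELL", "MI"): 0.7, ("DELL", "MII"): 0.9, ("DELL", "MIII"): 5.0}
toy_score = lambda m, mem: sum(affinity.get((v, m), 0.0) for v in mem)
result = sequential_update(start, parents, toy_score)
print(result["DELL"])
```

DELL moves to Module II: Module III would score higher in this toy setup, but DELL is a parent of Module III, so that move is rejected as cyclic.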

35
Learning the Model
Initialize module assignment A
Optimize structure S
Optimize module assignment A
For each variable, find its optimal assignment
given the current assignment to all other
variables

[Figure: the variables MSFT, AMAT, HPQ, INTL, MOT, DELL and the module network with Module I (MSFT), Module II (MOT, INTL, DELL), and Module III (AMAT, HPQ); a second copy of MOT appears next to Module III]
36
Related Work
Bayesian networks
Parameter sharing
PRMs
OOBNs
Module Networks
37
Outline

Module Network
Probabilistic model
Learning the model
Experimental results
Statistical validation
Case study: Gene regulation

38
Learning Algorithm Performance
[Figure: Bayesian score (avg. per gene), axis from -131 to -128, versus algorithm iterations, axis from 0 to 20]
39
Generalization to Test Data

Synthetic data: 10 modules, 500 variables

40
Generalization to Test Data

Synthetic data: 10 modules, 500 variables

[Figure: test data likelihood (per instance) versus number of modules, with one curve each for 25, 50, 100, 200, and 500 instances]

Gain beyond 100 instances is small
41
Structure Recovery Graph

Synthetic data: 10 modules, 500 variables

[Figure: recovered structure (% correct) versus number of modules, with one curve each for 25, 50, 100, 200, and 500 instances]
42
Stock Market

4411 variables (stocks), 273 instances (trading
days)
Comparison to Bayesian networks (cross validation)

43
Regulatory Networks

Learn structure of regulatory networks
Which genes are regulated by each regulator

44
Gene Expression Data
Each experiment measures the mRNA level for all genes in one
condition
Learn dependency of the expression of genes as a
function of expression of regulators

[Figure: expression matrix of genes by experiments, with a color scale from induced to repressed]
45
Gene Expression

2355 variables (genes), 173 instances (arrays)
Comparison to Bayesian networks

46
Biological Evaluation

Find sets of co-regulated genes (regulatory
module)
Find the regulators of each module

46/50
30/50
Segal et al., Nature Genetics, 2003
47
Experimental Design

Hypothesis: regulator X activates process Y
Experiment: knock out X and repeat the experiment

Segal et al., Nature Genetics, 2003
48
Differentially Expressed Genes
Segal et al., Nature Genetics, 2003
49
Biological Experiments Validation

Were the differentially expressed genes predicted
as targets?
Rank modules by enrichment for diff. expressed
genes

Segal et al., Nature Genetics, 2003
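
Ranking modules by enrichment can be sketched with a hypergeometric tail probability. The slide does not say which statistic was used, so the hypergeometric test is an assumption here, and the gene identifiers and module contents below are invented toy data.

```python
from math import comb


def enrichment_p_value(population, hits, module_size, module_hits):
    """P(at least module_hits hits) when drawing module_size genes without replacement."""
    top = min(hits, module_size)
    tail = sum(comb(hits, i) * comb(population - hits, module_size - i)
               for i in range(module_hits, top + 1))
    return tail / comb(population, module_size)


def rank_modules(modules, diff_expressed, population):
    """Module names ordered from most to least enriched (smallest p-value first)."""
    scored = []
    for name, genes in modules.items():
        overlap = len(genes & diff_expressed)
        p = enrichment_p_value(population, len(diff_expressed), len(genes), overlap)
        scored.append((p, name))
    return [name for _, name in sorted(scored)]


# Toy example: 20 genes in total, 4 of them differentially expressed in the knockout.
diff_expressed = {"g1", "g2", "g3", "g4"}
modules = {
    "M1": {"g1", "g2", "g3", "g10"},
    "M2": {"g4", "g11", "g12", "g13"},
    "M3": {"g14", "g15", "g16", "g17"},
}
ranking = rank_modules(modules, diff_expressed, population=20)
print(ranking)
```

If the module predicted to be regulated by X ranks at or near the top, the knockout supports the predicted targets.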
50
Summary

Probabilistic model for learning modules of
variables and their structural dependencies
Improved performance over Bayesian networks
Statistical robustness
Interpretability
Application to gene regulation
Reconstruction of many known regulatory modules
Prediction of targets for unknown regulators

51
Thank You!
