1
Latent Tree Analysis of Unlabeled Data
Nevin L. Zhang, Dept. of Computer Science & Engineering,
The Hong Kong Univ. of Sci. & Tech.
http://www.cse.ust.hk/~lzhang
2
Outline
  • Latent tree models
  • Latent tree analysis algorithms
  • What can LTA be used for
  • Discovery of co-occurrence/correlation patterns
  • Discovery of latent variable/structures
  • Multidimensional clustering
  • Examples
  • Danish beer survey data
  • Text data
  • TCM survey data

3
Latent Tree Models
  • Tree-structured probabilistic graphical models
  • Leaves observed (manifest variables)
  • Discrete or continuous
  • Internal nodes latent (latent variables)
  • Discrete
  • Each edge is associated with a conditional
    distribution
  • One node with marginal distribution
  • Defines a joint distribution over all the
    variables
  • (Zhang, JMLR 2004)
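
As a minimal sketch of the definition above, the following Python snippet builds a toy latent tree model with one latent root and two observed binary leaves, then evaluates the joint distribution as the product of the root marginal and the edge conditionals. The variable names and all probability values are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
from itertools import product

# Hypothetical toy model: latent root H (2 states), observed binary leaves A and B.
p_h = [0.6, 0.4]                 # marginal distribution at the root node
p_a_given_h = [[0.9, 0.1],       # P(A | H=0)
               [0.2, 0.8]]       # P(A | H=1)
p_b_given_h = [[0.7, 0.3],       # P(B | H=0)
               [0.1, 0.9]]       # P(B | H=1)


def joint(h, a, b):
    """P(H=h, A=a, B=b) = P(h) * P(a | h) * P(b | h)."""
    return p_h[h] * p_a_given_h[h][a] * p_b_given_h[h][b]


def marginal_observed(a, b):
    """P(A=a, B=b), obtained by summing the latent variable out."""
    return sum(joint(h, a, b) for h in range(len(p_h)))


# The joint distribution sums to one over all configurations.
total = sum(joint(h, a, b) for h, a, b in product(range(2), repeat=3))
print(total, marginal_observed(1, 1))
```

Only the leaves A and B would appear in a data set; H is never observed, which is why the distribution over the leaves is obtained by summing H out.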

4
Latent Tree Analysis
From data on observed variables, obtain a latent
tree model.
  • Learning latent tree models: determine
  • Number of latent variables
  • Number of possible states for each latent variable
  • Connections among nodes
  • Probability distributions
  • Model selection criterion
  • Find the model that maximizes the BIC score
  • $\mathrm{BIC}(m \mid D) = \log P(D \mid m, \theta^*) - \frac{d}{2} \log N$
  • $D$: data, $N$: sample size
  • $m$: model, $\theta^*$: MLE of parameters
  • $d$: number of free parameters
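
The BIC criterion above can be sketched in a few lines of Python. The function below follows the formula on the slide directly; the candidate model names, log-likelihoods, and parameter counts in the usage example are made up for illustration and are not results from the presentation.

```python
import math


def bic_score(log_likelihood, num_free_params, sample_size):
    """BIC(m | D) = log P(D | m, theta*) - (d / 2) * log N."""
    return log_likelihood - 0.5 * num_free_params * math.log(sample_size)


# Hypothetical candidates: (maximized log-likelihood, number of free parameters).
candidates = {
    "small_model": (-1050.0, 10),
    "large_model": (-1040.0, 40),
}
sample_size = 100  # hypothetical N

scores = {name: bic_score(ll, d, sample_size)
          for name, (ll, d) in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this made-up example the larger model fits slightly better, but the penalty term outweighs the gain, so the smaller model has the higher BIC score.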

5
Algorithms: EAST
  • Search-based
  • Extension, Adjustment, Simplification until
    Termination
  • Can deal with 100 observed variables
  • (Chen, Zhang et al. AIJ 2011)

6
  • (Liu, Zhang et al. MLJ 2013)

Unidimensionality Test
7
  • (Liu, Zhang et al. MLJ 2013)

8
  • (Liu, Zhang et al. MLJ 2013)

Chow-Liu tree (1968)
9
(Liu, Zhang et al. MLJ 2013)
  • Close to EAST in terms of model quality. Can deal
    with 1,000 observed variables
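
The slides name the Chow-Liu tree (1968) without spelling it out. As a rough sketch of the textbook construction (not necessarily how it is used inside the algorithm of Liu, Zhang et al.), the Python code below estimates pairwise mutual information from data and keeps a maximum-weight spanning tree over the variables. The toy columns A, B, C are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())


def chow_liu_edges(columns):
    """Edges of a maximum-weight spanning tree under pairwise mutual information.

    columns maps a variable name to its list of observed values.
    Uses Kruskal's algorithm with a small union-find structure.
    """
    names = list(columns)
    weighted = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    parent = {name: name for name in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for _, a, b in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:          # adding this edge does not create a cycle
            parent[ra] = rb
            edges.append((a, b))
    return edges


# Hypothetical toy data: B copies A, while C is independent of both.
data = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
edges = chow_liu_edges(data)
print(edges)
```

The strongly dependent pair (A, B) is linked first; the remaining variable is attached by an arbitrary zero-weight edge, since it carries no information about the others.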

10
Outline
  • Latent tree models
  • Latent tree analysis algorithms
  • What can LTA be used for
  • Discovery of co-occurrence/correlation patterns
  • Discovery of latent variable/structures
  • Multidimensional clustering
  • Examples
  • Danish beer survey data
  • Text data
  • TCM survey data

11
Danish Beer Market Survey
(Mourad et al. JAIR 2013)
  • 463 consumers, 11 beer brands
  • Questionnaire: for each brand
  • Never seen the brand before (s0)
  • Seen before, but never tasted (s1)
  • Tasted, but do not drink regularly (s2)
  • Drink regularly (s3).

12
Why variables grouped as such?
  • GronTuborg and Carlsberg: main mass-market beers
  • TuborgClas and CarlSpec: frequent beers, a bit
    darker than the above
  • CeresTop, CeresRoyal, Pokal: minor local beers
  • Grouped as such because responses on brands in
    each group are strongly correlated.
  • Intuitively, latent tree analysis
  • Partitions observed variables into groups such
    that
  • Variables in each group are strongly correlated,
    and
  • The correlations within each group can be properly
    modeled using a single latent variable

13
Multidimensional Clustering
  • Each latent variable gives a partition of
    consumers.
  • H1
  • Class 1: Likely to have tasted TuborgClas,
    CarlSpec and Heineken, but do not drink them
    regularly
  • Class 2: Likely to have seen or tasted the
    beers, but do not drink them regularly
  • Class 3: Likely to drink TuborgClas and CarlSpec
    regularly
  • Intuitively, latent tree analysis is a technique
    for multiple clustering.
  • K-means and mixture models give only one partition.
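
To illustrate how a single latent variable partitions consumers, here is a small Python sketch that computes the posterior over the states of one latent variable from its observed children and assigns each consumer to the most probable class. It is a simplification: in a full latent tree model the posterior would be obtained by inference over the whole tree, whereas this sketch assumes the listed variables are direct children of the latent variable and ignores the rest of the tree. The two-class setup, the binary "drinks regularly" coding, and every probability are hypothetical.

```python
def posterior(prior, cond, observation):
    """P(H | observation) for a latent variable H with conditionally independent children.

    prior[h] is P(H=h); cond[h][var][value] is P(var=value | H=h).
    """
    unnormalized = []
    for h, p in enumerate(prior):
        for var, value in observation.items():
            p *= cond[h][var][value]
        unnormalized.append(p)
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Hypothetical parameters: value 1 means "drinks the brand regularly".
prior = [0.5, 0.5]
cond = [
    {"TuborgClas": [0.9, 0.1], "CarlSpec": [0.8, 0.2]},  # class 0: rarely regular
    {"TuborgClas": [0.2, 0.8], "CarlSpec": [0.3, 0.7]},  # class 1: often regular
]

consumer = {"TuborgClas": 1, "CarlSpec": 1}
post = posterior(prior, cond, consumer)
assigned_class = max(range(len(post)), key=post.__getitem__)
print(post, assigned_class)
```

Repeating the same computation for every latent variable in the tree yields one partition per latent variable, which is what distinguishes this from a single-partition method.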

14
Binary Text Data: WebKB
(Liu et al. PGM 2012, MLJ 2013)
1041 web pages collected from 4 CS departments in
1997; 336 words
15
Latent Tree Model for WebKB Data by BI Algorithm
89 latent variables
16
Latent Tree Models for WebKB Data
17
18
19
Why variables grouped as such?
  • Grouped as such because words in each group tend
    to co-occur.
  • On binary data, latent tree analysis
  • Partitions observed word variables into groups
    such that
  • Words in each group tend to co-occur, and
  • The correlations can be properly explained
    using a single latent variable

LTA is a method for identifying co-occurrence
relationships.
20
Multidimensional Clustering
  • LTA is an approach to topic detection
  • Y66 = 4: Object-oriented programming (OOP)
  • Y66 = 2: Non-OOP programming
  • Y66 = 1: Programming language
  • Y66 = 3: Not on programming

21
Outline
  • Latent tree models
  • Latent tree analysis algorithms
  • What can LTA be used for
  • Discovery of co-occurrence/correlation patterns
  • Discovery of latent variable/structures
  • Multidimensional clustering
  • Examples
  • Danish beer survey data
  • Text data
  • TCM survey data

22
Background of Research
  • Common practice in China, increasingly in the
    Western world
  • Patients with a Western medicine (WM) disease are
    divided into several TCM classes
  • Different classes are treated differently using
    TCM treatments.
  • Example
  • WM disease: Depression
  • TCM classes:
  • Liver-Qi Stagnation. Treatment principle:
    ????. Prescription: ?????
  • Deficiency of Liver Yin and Kidney Yin.
    Treatment principle: ????. Prescription:
    ?????????
  • Vacuity of both heart and spleen.
    Treatment principle: ????. Prescription: ???
  • ...

23
Key Question
  • How should patients of a WM disease be divided
    into subclasses from the TCM perspective?
  • What TCM classes?
  • What are the characteristics of each TCM class?
  • How to differentiate different TCM classes?
  • Important for
  • Clinical practice
  • Research
  • Randomized controlled trials for efficacy
  • Modern biomedical understanding of TCM concepts
  • No consensus. Different doctors/researchers use
    different schemes. Key weakness of TCM.

24
Key Idea
  • Our objective
  • Provide an evidence-based method for TCM patient
    classification
  • Key Idea
  • Cluster analysis of symptom data => empirical
    partition of patients
  • Check to see whether it corresponds to a TCM class
    concept
  • Key technology: multidimensional clustering
  • Motivation for developing latent tree analysis

25
Symptoms Data of Depressive Patients
(Zhao et al. JACM 2014)
  • Subjects
  • 604 depressive patients aged between 19 and 69
    from 9 hospitals
  • Selected using the Chinese Classification of
    Mental Disorders clinical guideline, CCMD-3
  • Exclusion
  • Subjects who took anti-depression drugs within two
    weeks prior to the survey, women in the
    gestational and suckling periods, etc.
  • Symptom variables
  • From the TCM literature on depression between
    1994 and 2004.
  • Searched with the phrases ?? and ? on the
    CNKI (China National Knowledge Infrastructure)
    database
  • Kept only studies where patients were
    selected using the ICD-9, ICD-10, CCMD-2, or
    CCMD-3 guidelines.
  • 143 symptoms reported in those studies altogether.

26
The Depression Data
  • Data as a table
  • 604 rows, each for a patient
  • 143 columns, each for a symptom
  • Table cells: 0 = symptom not present, 1 = symptom
    present
  • Removed: symptoms occurring <10 times
  • 86 symptom variables entered latent tree
    analysis.
  • Structure of the latent tree model obtained is
    shown on the next two slides.
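
The preprocessing step described above (a 0/1 patient-by-symptom table, with symptoms occurring fewer than 10 times removed) can be sketched as follows. The function name, the tiny example table, and the third symptom name are hypothetical; the real data has 604 rows and 143 columns.

```python
def drop_rare_symptoms(rows, names, min_count=10):
    """Keep only symptom columns that are present in at least min_count patients.

    rows is a list of 0/1 lists (one per patient); names labels the columns.
    """
    counts = [sum(column) for column in zip(*rows)]
    keep = [j for j, count in enumerate(counts) if count >= min_count]
    kept_names = [names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_names, kept_rows


# Hypothetical toy table: 12 patients, 3 symptoms.
names = ["fear of cold", "cold limbs", "rare symptom"]
rows = [[1, 1, 0] for _ in range(10)] + [[0, 0, 1] for _ in range(2)]

kept_names, kept_rows = drop_rare_symptoms(rows, names)
print(kept_names, len(kept_rows))
```

The threshold is inclusive on the keep side, so a symptom seen exactly 10 times is retained, matching "occurring <10 times" as the removal rule.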

27
Model Obtained for the Depression Data (Top)
28
Model Obtained for the Depression Data (Bottom)
29
The Empirical Partitions
  • The first cluster (Y29 = s0) consists of 54% of
    the patients, while the second cluster (Y29 = s1)
    consists of 46% of the patients.
  • The two symptoms 'fear of cold' and 'cold limbs'
    do not occur often in the first cluster,
  • while they both tend to occur with high
    probabilities (0.8 and 0.85) in the second
    cluster.

30
Probabilistic Symptom Co-occurrence Pattern
  • The table indicates that the two symptoms 'fear
    of cold' and 'cold limbs' tend to co-occur in the
    cluster Y29 = s1
  • The pattern is meaningful from the TCM perspective.
  • TCM asserts that YANG DEFICIENCY can lead
    to, among other symptoms, 'fear of cold' and
    'cold limbs'
  • So, the co-occurrence pattern suggests the TCM
    syndrome type YANG DEFICIENCY.
  • The partition Y29 suggests that
  • Among depressive patients, there is a subclass of
    patients with YANG DEFICIENCY.
  • In this subclass, 'fear of cold' and 'cold
    limbs' co-occur with high probabilities (0.8 and
    0.85)
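
As a worked illustration of what "co-occur with high probabilities" implies, the Python sketch below combines the figures quoted on the slides (cluster sizes of 54% and 46%, symptom probabilities 0.8 and 0.85 in the second cluster) under two assumptions that are not stated in the slides: that the two symptoms are conditionally independent given Y29, and that each occurs with probability 0.05 in the first cluster. The 0.05 values are placeholders for "do not occur often", and which of 0.8 and 0.85 belongs to which symptom is also assumed.

```python
# Cluster sizes as read from the slides (taken to be percentages).
p_cluster = {"s0": 0.54, "s1": 0.46}

# P(symptom present | Y29 state). The s1 values are from the slides;
# the s0 values are hypothetical stand-ins for "do not occur often".
p_fear_of_cold = {"s0": 0.05, "s1": 0.80}
p_cold_limbs = {"s0": 0.05, "s1": 0.85}


def p_both_given(state):
    """P(both symptoms | Y29 = state), assuming conditional independence."""
    return p_fear_of_cold[state] * p_cold_limbs[state]


# Marginal probability that a patient shows both symptoms.
p_both = sum(p_cluster[s] * p_both_given(s) for s in p_cluster)

# Bayes' rule: how strongly does seeing both symptoms point to cluster s1?
p_s1_given_both = p_cluster["s1"] * p_both_given("s1") / p_both

print(p_both_given("s1"), p_both, p_s1_given_both)
```

Under these assumptions, roughly two thirds of patients in the second cluster show both symptoms, and a patient showing both belongs to that cluster almost surely.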

31
Probabilistic Symptom Co-occurrence Pattern
  • Y28 = s1 captures the probabilistic co-occurrence
    of 'aching lumbus', 'lumbar pain like pressure'
    and 'lumbar pain like warmth'.
  • This pattern is present in 27% of the patients.
  • It suggests that
  • Among depressive patients, there is a subclass
    that corresponds to the TCM concept of KIDNEY
    DEPRIVED OF NOURISHMENT
  • Characteristics of the subclass are given by the
    distributions for Y28 = s1
32
Probabilistic Symptom Co-occurrence Pattern
  • Y27 = s1 captures the probabilistic co-occurrence
    of 'weak lumbus and knees' and 'cumbersome
    limbs'.
  • This pattern is present in 44% of the patients.
  • It suggests that
  • Among depressive patients, there is a subclass
    that corresponds to the TCM concept of KIDNEY
    DEFICIENCY
  • Characteristics of the subclass are given by the
    distributions for Y27 = s1
  • Y27, Y28, Y29 together provide evidence for
    defining KIDNEY YANG DEFICIENCY

33
Probabilistic Symptom Co-occurrence Pattern
  • Y21 = s1: evidence for defining STAGNANT
    QI TURNING INTO FIRE
  • Y15 = s1: evidence for defining QI DEFICIENCY
  • Y17 = s1: evidence for defining HEART QI
    DEFICIENCY
  • Y16 = s1: evidence for defining QI STAGNATION
  • Y19 = s1: evidence for defining QI STAGNATION IN
    HEAD

34
Probabilistic Symptom Co-occurrence Pattern
  • Y9 = s1: evidence for defining DEFICIENCY OF BOTH
    QI AND YIN
  • Y10 = s1: evidence for defining YIN DEFICIENCY
  • Y11 = s1: evidence for defining DEFICIENCY OF
    STOMACH/SPLEEN YIN

35
Symptom Mutual-Exclusion Patterns
  • Some empirical partitions reveal symptom
    exclusion patterns
  • Y1 reveals the mutual exclusion of 'white
    tongue coating', 'yellow tongue coating' and
    'yellow-white tongue coating'.
  • Y2 reveals the mutual exclusion of 'thin tongue
    coating', 'thick tongue coating' and 'little
    tongue coating'.

36
Summary of TCM Data Analysis
  • By analyzing 604 cases of depressive patient data
    using latent tree models we have discovered a
    host of probabilistic symptom co-occurrence
    patterns and symptom mutual-exclusion patterns.
  • Most of the co-occurrence patterns have clear TCM
    syndrome connotations, while the mutual-exclusion
    patterns are also reasonable and meaningful.
  • The patterns can be used as evidence for the task
    of defining TCM classes in the context of
    depressive patients and for differentiating
    between those classes.

37
Another Perspective: Statistical Validation of
TCM Postulates
(Zhang et al. JACM 2008)
[Figure: Y28 = s1 labeled Kidney deprived of nourishment; Y29 = s1 labeled Yang Deficiency]
  • TCM terms such as Yang Deficiency were introduced
    to explain symptom co-occurrence patterns
    observed in clinical practice.

38
Value of Work in View of Others
  • D. Haughton and J. Haughton. Living Standards
    Analytics: Development through the Lens of
    Household Survey Data. Springer, 2012.
  • Zhang et al. provide a very interesting
    application of latent class (tree) models to
    diagnoses in traditional Chinese medicine (TCM).
  • The results tend to confirm known theories in
    Chinese traditional medicine.
  • This is a significant advance, since the
    scientific bases for these theories are not
    known.
  • The model proposed by the authors provides at
    least a statistical justification for them.

39
Summary
  • Latent tree models
  • Tree-structured probabilistic graphical models
  • Leaf nodes: observed variables
  • Internal nodes: latent variables
  • What can LTA be used for
  • Discovery of co-occurrence patterns in binary
    data
  • Discovery of correlation patterns in general
    discrete data
  • Discovery of latent variable/structures
  • Multidimensional clustering
  • Topic detection in text data
  • Key role in TCM patient classification

40
References
  • N. L. Zhang (2004). Hierarchical latent class
    models for cluster analysis. Journal of Machine
    Learning Research, 5(6):697-723.
  • T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K.
    M. Poon (2011). Model-based multidimensional
    clustering of categorical data. Artificial
    Intelligence, 176(1), 2246-2269.
  • T. F. Liu, N. L. Zhang, A. H. Liu, L. K. M. Poon
    (2012). A Novel LTM-based Method for
    Multidimensional Clustering. European Workshop
    on Probabilistic Graphical Models (PGM-12),
    203-210.
  • T. F. Liu, N. L. Zhang, P. X. Chen, A. H. Liu, L.
    K. M. Poon, and Y. Wang (2013). Greedy learning
    of latent tree models for multidimensional
    clustering. Machine Learning,
    doi:10.1007/s10994-013-5393-0.
  • R. Mourad, C. Sinoquet, N. L. Zhang, T. F. Liu and
    P. Leray (2013). A survey on latent tree models
    and applications. Journal of Artificial
    Intelligence Research, 47, 157-203.
    doi:10.1613/jair.3879.
  • N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang
    (2008). Statistical Validation of TCM Theories.
    Journal of Alternative and Complementary
    Medicine, 14(5):583-587.
  • N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang
    (2008). Latent tree models and diagnosis in
    traditional Chinese medicine. Artificial
    Intelligence in Medicine, 42:229-245.
  • Z. X. Xu, N. L. Zhang, Y. Q. Wang, G. P. Liu, J. Xu,
    T. F. Liu, and A. H. Liu (2013). Statistical
    Validation of Traditional Chinese Medicine
    Syndrome Postulates in the Context of Patients
    with Cardiovascular Disease. The Journal of
    Alternative and Complementary Medicine.
  • Y. Zhao, N. L. Zhang, T. F. Wang, Q. G. Wang
    (2014). Discovering Symptom Co-Occurrence
    Patterns from 604 Cases of Depressive Patient
    Data using Latent Tree Models. The Journal of
    Alternative and Complementary Medicine.

41
  • Thank You!