Title: Randomness and prediction
1Finding scientific topics
Tom Griffiths, Stanford University
Mark Steyvers, UC Irvine
2Why map knowledge?
- Quickly grasp important themes in a new field
- Synthesize content of an existing field
- Discover targets for funding and research
3Why map knowledge?
- Quickly grasp important themes in a new field
- Synthesize content of an existing field
- Discover targets for funding and research
INFORMATION OVERLOAD
4Apoptosis Plant Biology
5Apoptosis Medicine
6Apoptosis Medicine
7Apoptosis Medicine
8Apoptosis Medicine
Apoptosis Medicine
9Apoptosis Medicine
probabilistic generative model
10Apoptosis Medicine
statistical inference
11- 1. A generative model for documents
- 2. Discovering topics with Gibbs sampling
- 3. Results
- Topics and classes
- Mapping science
- Topic dynamics
- 4. Future directions
- Tagging abstracts
13A generative model for documents
- Each document a mixture of topics
- Each word chosen from a single topic
- $P(w \mid z = j)$ from parameters $\phi^{(j)}$
- $P(z = j)$ from parameters $\theta$
(Blei, Ng, Jordan, 2003)
14A generative model for documents
topic 1: $w \sim P(w \mid z = 1) = \phi^{(1)}$
HEART 0.2 LOVE 0.2 SOUL 0.2 TEARS 0.2 JOY 0.2
SCIENTIFIC 0.0 KNOWLEDGE 0.0 WORK 0.0 RESEARCH 0.0 MATHEMATICS 0.0
topic 2: $w \sim P(w \mid z = 2) = \phi^{(2)}$
HEART 0.0 LOVE 0.0 SOUL 0.0 TEARS 0.0 JOY 0.0
SCIENTIFIC 0.2 KNOWLEDGE 0.2 WORK 0.2 RESEARCH 0.2 MATHEMATICS 0.2
15Choose mixture weights for each document, generate bag of words
$\theta = \{P(z = 1), P(z = 2)\}$
{0, 1}: MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
{0.25, 0.75}: SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
{0.5, 0.5}: MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
{0.75, 0.25}: WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
{1, 0}: TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
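The two-topic example can be simulated in a few lines. This is a minimal sketch, not the authors' code: the function name `generate_document`, the fixed document length of 10 words, and the use of Python's `random` module are assumptions made for illustration. Each word gets its own topic draw, then a word drawn from that topic.

```python
import random

# Topic-word distributions copied from the two-topic example
# (topic 1 = "love" words, topic 2 = "science" words).
TOPICS = {
    1: {"HEART": 0.2, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.2, "JOY": 0.2},
    2: {"SCIENTIFIC": 0.2, "KNOWLEDGE": 0.2, "WORK": 0.2,
        "RESEARCH": 0.2, "MATHEMATICS": 0.2},
}


def generate_document(p_topic1, n_words=10, rng=random):
    """Draw a bag of words: pick a topic per word, then a word from it."""
    words = []
    for _ in range(n_words):
        # P(z = 1) = p_topic1, P(z = 2) = 1 - p_topic1
        z = 1 if rng.random() < p_topic1 else 2
        vocab = list(TOPICS[z])
        weights = [TOPICS[z][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights, k=1)[0])
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, " ".join(generate_document(p, rng=rng)))
```

With `p_topic1 = 0.0` every word comes from topic 2, and with `p_topic1 = 1.0` every word comes from topic 1; intermediate weights give mixed documents.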
16A generative model for documents
(Graphical model: nodes θ, z, w, with one z and one w per word)
- Called Latent Dirichlet Allocation (LDA)
- Introduced by Blei, Ng, and Jordan (2003),
reinterpretation of PLSI (Hofmann, 2001)
17(Dumais, Landauer)
P(w)
18- 1. A generative model for documents
- 2. Discovering topics with Gibbs sampling
- 3. Results
- Topics and classes
- Mapping science
- Topic dynamics
- 4. Future directions
- Tagging abstracts
19Inverting the generative model
- Maximum likelihood estimation (EM)
- Variational EM (Blei, Ng, Jordan, 2003)
- Bayesian inference
20Bayesian inference
- Sum in the denominator over $T^n$ terms
- Full posterior only tractable up to a constant
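Written out, the posterior over topic assignments has the form below. The notation is an assumption here: $\mathbf{w}$ is the vector of $n$ observed word tokens, $\mathbf{z}$ the vector of their topic assignments, and $T$ the number of topics.

```latex
P(\mathbf{z} \mid \mathbf{w})
  = \frac{P(\mathbf{w}, \mathbf{z})}{\sum_{\mathbf{z}'} P(\mathbf{w}, \mathbf{z}')}
```

Each of the $n$ tokens can take any of $T$ topics, so the sum ranges over $T^n$ assignments. Even a toy case with $T = 2$ and $n = 10$ already has $2^{10} = 1024$ terms.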
21Markov chain Monte Carlo
- Sample from a Markov chain which converges to target distribution
- Allows sampling from an unnormalized posterior distribution
- Can compute approximate statistics from intractable distributions
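A toy illustration of sampling from an unnormalized distribution. This sketch uses a Metropolis sampler rather than the Gibbs sampler used in the talk, because it shows the cancellation of the normalizing constant most directly. The five-state target, the ring-shaped proposal, and the function name `metropolis` are all invented for the example.

```python
import random


def metropolis(weights, n_steps, rng):
    """Sample state indices from an unnormalized discrete distribution."""
    k = len(weights)
    state = 0
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: step to a neighbouring state on a ring.
        proposal = (state + rng.choice((-1, 1))) % k
        # Accept with probability min(1, ratio); the normalizer cancels.
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        samples.append(state)
    return samples


rng = random.Random(0)
weights = [1.0, 2.0, 4.0, 2.0, 1.0]    # unnormalized target
samples = metropolis(weights, 50_000, rng)
burned = samples[1_000:]                # discard burn-in
freq = [burned.count(s) / len(burned) for s in range(len(weights))]
print([round(f, 2) for f in freq])
```

The empirical frequencies should approach the normalized target (0.1, 0.2, 0.4, 0.2, 0.1), even though the sampler only ever uses ratios of unnormalized weights.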
22A visual example: Bars
sample each pixel from a mixture of topics
pixel = word, image = document
23(No Transcript)
24(No Transcript)
25Interpretable decomposition
- SVD gives a basis for the data, but not an interpretable one
- The true basis is not orthogonal, so rotation does no good
26Bayesian model selection
- How many topics do we need?
- A Bayesian would consider the posterior $P(T \mid \mathbf{w}) \propto P(\mathbf{w} \mid T)\, P(T)$
- Involves summing over assignments z
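The sum over assignments enters through the marginal likelihood of the corpus for a given number of topics. As a sketch in assumed notation (the slides do not spell this step out):

```latex
P(\mathbf{w} \mid T) = \sum_{\mathbf{z}} P(\mathbf{w} \mid \mathbf{z}, T)\, P(\mathbf{z} \mid T)
```

This is a sum over all $T^n$ assignments of the $n$ tokens, so in practice it has to be approximated, for example from samples of $\mathbf{z}$.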
27Bayesian model selection
(Figure: P(w | T) plotted against Corpus (w), for T = 10 and T = 100)
30Back to the bars
31- 1. A generative model for documents
- 2. Discovering topics with Gibbs sampling
- 3. Results
- Topics and classes
- Mapping science
- Topic dynamics
- 4. Future directions
- Tagging abstracts
32Corpus preprocessing
- Used all D = 28,154 abstracts from 1991-2001
- Used any word occurring in at least five abstracts, not on stop list (W = 20,551)
- Segmentation by any delimiting character, total of n = 3,026,970 word tokens in corpus
- Also, PNAS class designations for 2001
- (thanks to Kevin Boyack)
33Running the algorithm
- Memory requirements linear in T(W + D), runtime proportional to nT
- T = 50, 100, 200, 300, 400, 500, 600, (1000)
- Ran 8 chains for each T, burn-in of 1000 iterations, 10 samples/chain at a lag of 100
- All runs completed in under 30 hours on BlueHorizon supercomputer at San Diego
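The scaling statements can be made concrete with the corpus sizes from the preprocessing slide. This is back-of-the-envelope arithmetic only. It assumes the first sample is taken right after burn-in and the rest follow at intervals of the lag, which the slide does not state; the helper names are hypothetical.

```python
D = 28_154        # abstracts
W = 20_551        # vocabulary size
n = 3_026_970     # word tokens

chains, burn_in, samples_per_chain, lag = 8, 1000, 10, 100

# ASSUMPTION: first sample right after burn-in, then one every `lag` sweeps.
iterations_per_chain = burn_in + (samples_per_chain - 1) * lag
samples_per_T = chains * samples_per_chain


def count_entries(T):
    """Entries in the word-topic and document-topic count tables."""
    return T * (W + D)


def token_updates(T):
    """Topic probabilities evaluated per sweep (n tokens times T topics)."""
    return n * T


print(iterations_per_chain, samples_per_T)
for T in (50, 300, 1000):
    print(T, count_entries(T), token_updates(T))
```

For T = 300, for instance, the count tables hold 300 × (20,551 + 28,154) = 14,611,500 entries, and one sweep evaluates roughly 908 million topic probabilities.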
34How many topics?
35(No Transcript)
36A selection of topics
STRUCTURE ANGSTROM CRYSTAL RESIDUES STRUCTURES STRUCTURAL RESOLUTION HELIX THREE HELICES DETERMINED RAY CONFORMATION HELICAL HYDROPHOBIC SIDE DIMENSIONAL INTERACTIONS MOLECULE SURFACE
NEURONS BRAIN CORTEX CORTICAL OLFACTORY NUCLEUS NEURONAL LAYER RAT NUCLEI CEREBELLUM CEREBELLAR LATERAL CEREBRAL LAYERS GRANULE LABELED HIPPOCAMPUS AREAS THALAMIC
TUMOR CANCER TUMORS HUMAN CELLS BREAST MELANOMA GROWTH CARCINOMA PROSTATE NORMAL CELL METASTATIC MALIGNANT LUNG CANCERS MICE NUDE PRIMARY OVARIAN
MUSCLE CARDIAC HEART SKELETAL MYOCYTES VENTRICULAR MUSCLES SMOOTH HYPERTROPHY DYSTROPHIN HEARTS CONTRACTION FIBERS FUNCTION TISSUE RAT MYOCARDIAL ISOLATED MYOD FAILURE
HIV VIRUS INFECTED IMMUNODEFICIENCY CD4 INFECTION HUMAN VIRAL TAT GP120 REPLICATION TYPE ENVELOPE AIDS REV BLOOD CCR5 INDIVIDUALS ENV PERIPHERAL
FORCE SURFACE MOLECULES SOLUTION SURFACES MICROSCOPY WATER FORCES PARTICLES STRENGTH POLYMER IONIC ATOMIC AQUEOUS MOLECULAR PROPERTIES LIQUID SOLUTIONS BEADS MECHANICAL
37A selection of topics
STUDIES PREVIOUS SHOWN RESULTS RECENT PRESENT STUDY DEMONSTRATED INDICATE WORK SUGGEST SUGGESTED USING FINDINGS DEMONSTRATE REPORT INDICATED CONSISTENT REPORTS CONTRAST
MECHANISM MECHANISMS UNDERSTOOD POORLY ACTION UNKNOWN REMAIN UNDERLYING MOLECULAR PS REMAINS SHOW RESPONSIBLE PROCESS SUGGEST UNCLEAR REPORT LEADING LARGELY KNOWN
MODEL MODELS EXPERIMENTAL BASED PROPOSED DATA SIMPLE DYNAMICS PREDICTED EXPLAIN BEHAVIOR THEORETICAL ACCOUNT THEORY PREDICTS COMPUTER QUANTITATIVE PREDICTIONS CONSISTENT PARAMETERS
CHROMOSOME REGION CHROMOSOMES KB MAP MAPPING CHROMOSOMAL HYBRIDIZATION ARTIFICIAL MAPPED PHYSICAL MAPS GENOMIC DNA LOCUS GENOME GENE HUMAN SITU CLONES
ADULT DEVELOPMENT FETAL DAY DEVELOPMENTAL POSTNATAL EARLY DAYS NEONATAL LIFE DEVELOPING EMBRYONIC BIRTH NEWBORN MATERNAL PRESENT PERIOD ANIMALS NEUROGENESIS ADULTS
PARASITE PARASITES FALCIPARUM MALARIA HOST PLASMODIUM ERYTHROCYTES ERYTHROCYTE MAJOR LEISHMANIA INFECTED BLOOD INFECTION MOSQUITO INVASION TRYPANOSOMA CRUZI BRUCEI HUMAN HOSTS
MALE FEMALE MALES FEMALES SEX SEXUAL BEHAVIOR OFFSPRING REPRODUCTIVE MATING SOCIAL SPECIES REPRODUCTION FERTILITY TESTIS MATE GENETIC GERM CHOICE SRY
39- 1. A generative model for documents
- 2. Discovering topics with Gibbs sampling
- 3. Results
- Topics and classes
- Mapping science
- Topic dynamics
- 4. Future directions
- Tagging abstracts
40Topics and classes
- PNAS authors provide class designations
- major: Biological, Physical, Social Sciences
- minor: 33 separate disciplines
- Find topics diagnostic of classes
- validate reality of classes
- show topics pick out meaningful structure (classes, and the relations between them)
41(No Transcript)
42Topic 210: SYNAPTIC NEURONS POSTSYNAPTIC HIPPOCAMPAL SYNAPSES LTP PRESYNAPTIC TRANSMISSION POTENTIATION PLASTICITY EXCITATORY RELEASE DENDRITIC PYRAMIDAL HIPPOCAMPUS DENDRITES CA1 STIMULATION TERMINALS SYNAPSE
43Topic 201: RESISTANCE RESISTANT DRUG DRUGS SENSITIVE MDR MULTIDRUG SUSCEPTIBLE SELECTED GLYCOPROTEIN SENSITIVITY PGP AGENTS CONFERS MDR1 CYTOTOXIC CONFERRED CHEMOTHERAPEUTIC EFFLUX INCREASED
44Topic 280: SPECIES SELECTION EVOLUTION GENETIC POPULATIONS POPULATION VARIATION NATURAL EVOLUTIONARY FITNESS ADAPTIVE RATES THEORY TRAITS DIVERSITY EXPECTED NEUTRAL EVOLVED COMPETITION HISTORY
45Topic 222: CORTEX BRAIN SUBJECTS TASK AREAS REGIONS FUNCTIONAL LEFT MEMORY TEMPORAL IMAGING PREFRONTAL CEREBRAL TASKS FRONTAL AREA TOMOGRAPHY EMISSION POSITRON CORTICAL
46Topic 2: SPECIES GLOBAL CLIMATE CO2 WATER ENVIRONMENTAL YEARS MARINE CARBON DIVERSITY OCEAN EXTINCTION TERRESTRIAL COMMUNITY ABUNDANCE EARTH ECOLOGICAL CHANGE TIME ECOSYSTEM
47Topic 39: THEORY TIME SPACE GIVEN PROBLEM SHAPE SIMPLE DIMENSIONAL PAPER NUMBER CASE LOCAL TERMS SYMMETRY RANDOM EQUATION CLASSICAL COMPLEXITY NUMERICAL PROPERTIES
48- 1. A generative model for documents
- 2. Discovering topics with Gibbs sampling
- 3. Results
- Topics and classes
- Mapping science
- Topic dynamics
- 4. Future directions
- Tagging abstracts
49Mapping science
- Topics provide dimensionality reduction
- Some applications require visualization (and even lower dimensionality)
- Low-dimensional representation from methods for analysis of compositional data
50(No Transcript)
51(No Transcript)
52(No Transcript)
53- 1. A generative model for documents
- 2. Discovering topics with Gibbs sampling
- 3. Results
- Topics and classes
- Mapping science
- Topic dynamics
- 4. Future directions
- Tagging abstracts
54Topic dynamics
- We have the distribution over topics for abstracts from 1991 to 2001
- Analysis of dynamics
- perform linear trend analysis for each topic
- hot topics go up, cold topics go down
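A linear trend analysis of this kind can be sketched as a least-squares fit of each topic's yearly weight against the year. The yearly values below are invented purely for illustration, and the names `linear_trend` and `classify` are hypothetical; the slides do not give the exact procedure, so this is one plausible reading.

```python
import numpy as np

years = np.arange(1991, 2002)   # 1991..2001 inclusive

# HYPOTHETICAL yearly mean topic weights, invented for illustration only.
theta_by_year = {
    "rising topic": np.linspace(0.002, 0.006, len(years)),
    "falling topic": np.linspace(0.008, 0.003, len(years)),
    "flat topic": np.full(len(years), 0.004),
}


def linear_trend(weights, years=years):
    """Least-squares slope of topic weight against year."""
    centered = years - years.mean()   # centring keeps the fit well conditioned
    slope, _intercept = np.polyfit(centered, weights, deg=1)
    return float(slope)


def classify(slope, tol=1e-9):
    """Label a topic by the sign of its trend."""
    if slope > tol:
        return "hot"
    if slope < -tol:
        return "cold"
    return "flat"


slopes = {name: linear_trend(w) for name, w in theta_by_year.items()}
for name, slope in sorted(slopes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: slope={slope:+.5f} per year ({classify(slope)})")
```

Ranking topics by slope then puts the hot topics at the top and the cold topics at the bottom. A real analysis would also want a significance test on each slope.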
55Cold topics
Hot topics
56Cold topics
Hot topics
Topic 2: SPECIES GLOBAL CLIMATE CO2 WATER ENVIRONMENTAL YEARS MARINE CARBON DIVERSITY OCEAN EXTINCTION TERRESTRIAL COMMUNITY ABUNDANCE
Topic 134: MICE DEFICIENT NORMAL GENE NULL MOUSE TYPE HOMOZYGOUS ROLE KNOCKOUT DEVELOPMENT GENERATED LACKING ANIMALS REDUCED
Topic 179: APOPTOSIS DEATH CELL INDUCED BCL CELLS APOPTOTIC CASPASE FAS SURVIVAL PROGRAMMED MEDIATED INDUCTION CERAMIDE EXPRESSION
57Cold topics
Topic 37: CDNA AMINO SEQUENCE ACID PROTEIN ISOLATED ENCODING CLONED ACIDS IDENTITY CLONE EXPRESSED ENCODES RAT HOMOLOGY
Topic 289: KDA PROTEIN PURIFIED MOLECULAR MASS CHROMATOGRAPHY POLYPEPTIDE GEL SDS BAND APPARENT LABELED IDENTIFIED FRACTION DETECTED
Topic 75: ANTIBODY ANTIBODIES MONOCLONAL ANTIGEN IGG MAB SPECIFIC EPITOPE HUMAN MABS RECOGNIZED SERA EPITOPES DIRECTED NEUTRALIZING
Hot topics
Topic 2: SPECIES GLOBAL CLIMATE CO2 WATER ENVIRONMENTAL YEARS MARINE CARBON DIVERSITY OCEAN EXTINCTION TERRESTRIAL COMMUNITY ABUNDANCE
Topic 134: MICE DEFICIENT NORMAL GENE NULL MOUSE TYPE HOMOZYGOUS ROLE KNOCKOUT DEVELOPMENT GENERATED LACKING ANIMALS REDUCED
Topic 179: APOPTOSIS DEATH CELL INDUCED BCL CELLS APOPTOTIC CASPASE FAS SURVIVAL PROGRAMMED MEDIATED INDUCTION CERAMIDE EXPRESSION
58- 1. A generative model for documents
- 2. Discovering topics with Gibbs sampling
- 3. Results
- Topics and classes
- Mapping science
- Topic dynamics
- 4. Future directions
- Tagging abstracts
59Future directions
- Including different kinds of knowledge
- citations (Hofmann & Cohn, 2001)
- author, title, keywords, other fields
- word order information
- An example: scientific syntax and semantics
60Scientific syntax and semantics
Factorization of language based on statistical dependency patterns:
- long-range, document-specific dependencies: semantics (probabilistic topics)
- short-range dependencies, constant across all documents: syntax (probabilistic regular grammar)
(Graphical model: nodes θ, z, w, x, with one z, one w, and one x per word)
61x = 2: OF 0.6 FOR 0.3 BETWEEN 0.1
x = 1: z = 1 with probability 0.4, z = 2 with probability 0.6
z = 1: HEART 0.2 LOVE 0.2 SOUL 0.2 TEARS 0.2 JOY 0.2
z = 2: SCIENTIFIC 0.2 KNOWLEDGE 0.2 WORK 0.2 RESEARCH 0.2 MATHEMATICS 0.2
x = 3: THE 0.6 A 0.3 MANY 0.1
(Figure: transition diagram between the classes x; arc labels 0.8, 0.7, 0.1, 0.3, 0.2, 0.9)
62(Same figure) Generated so far: THE
63(Same figure) Generated so far: THE LOVE
64(Same figure) Generated so far: THE LOVE OF
65(Same figure) Generated so far: THE LOVE OF RESEARCH
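The build-up of THE LOVE OF RESEARCH can be mimicked with a small simulation. The emission tables and the topic weights (0.4, 0.6) are taken from the slide. The assignment of the six transition probabilities to particular arcs is NOT recoverable from the transcript, so the `TRANSITIONS` table below is an assumption chosen only so that the path 3 → 1 → 2 → 1 is possible. Starting in class 3 is also an assumption.

```python
import random

THETA = {1: 0.4, 2: 0.6}   # topic weights for the semantic class x = 1

TOPIC_WORDS = {
    1: {"HEART": 0.2, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.2, "JOY": 0.2},
    2: {"SCIENTIFIC": 0.2, "KNOWLEDGE": 0.2, "WORK": 0.2,
        "RESEARCH": 0.2, "MATHEMATICS": 0.2},
}

CLASS_WORDS = {
    2: {"OF": 0.6, "FOR": 0.3, "BETWEEN": 0.1},
    3: {"THE": 0.6, "A": 0.3, "MANY": 0.1},
}

# ASSUMPTION: which arc carries which probability is a guess; only the
# six values themselves appear on the slide.
TRANSITIONS = {
    1: {2: 0.8, 1: 0.2},
    2: {3: 0.7, 1: 0.3},
    3: {1: 0.9, 3: 0.1},
}


def draw(dist, rng):
    """Draw one key of `dist` with probability given by its value."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate(n_words, rng, start_class=3):
    """Emit words from a class chain; class 1 defers to the topic model."""
    x = start_class
    words = []
    for _ in range(n_words):
        if x == 1:
            z = draw(THETA, rng)              # semantic class: pick a topic
            words.append(draw(TOPIC_WORDS[z], rng))
        else:
            words.append(draw(CLASS_WORDS[x], rng))
        x = draw(TRANSITIONS[x], rng)         # move to the next class
    return words


if __name__ == "__main__":
    print(" ".join(generate(8, random.Random(0))))
```

Function words come from the class-specific tables, which are shared by all documents, while content words come from the document's own topic mixture.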
66Semantic topics
67Syntactic classes
Class: 5 8 14 25 26 30 33
IN ARE THE SUGGEST LEVELS RESULTS BEEN
FOR WERE THIS INDICATE NUMBER ANALYSIS MAY
ON WAS ITS SUGGESTING LEVEL DATA CAN
BETWEEN IS THEIR SUGGESTS RATE STUDIES COULD
DURING WHEN AN SHOWED TIME STUDY WELL
AMONG REMAIN EACH REVEALED CONCENTRATIONS FINDINGS DID
FROM REMAINS ONE SHOW VARIETY EXPERIMENTS DOES
UNDER REMAINED ANY DEMONSTRATE RANGE OBSERVATIONS DO
WITHIN PREVIOUSLY INCREASED INDICATING CONCENTRATION HYPOTHESIS MIGHT
THROUGHOUT BECOME EXOGENOUS PROVIDE DOSE ANALYSES SHOULD
THROUGH BECAME OUR SUPPORT FAMILY ASSAYS WILL
TOWARD BEING RECOMBINANT INDICATES SET POSSIBILITY WOULD
INTO BUT ENDOGENOUS PROVIDES FREQUENCY MICROSCOPY MUST
AT GIVE TOTAL INDICATED SERIES PAPER CANNOT
INVOLVING MERE PURIFIED DEMONSTRATED AMOUNTS WORK REMAINED
AFTER APPEARED TILE SHOWS RATES EVIDENCE ALSO
THEY ACROSS APPEAR FULL SO CLASS FINDING
AGAINST ALLOWED CHRONIC REVEAL VALUES MUTAGENESIS BECOME
WHEN NORMALLY ANOTHER DEMONSTRATES AMOUNT OBSERVATION MAG
ALONG EACH EXCESS SUGGESTED SITES MEASUREMENTS LIKELY
68Abstract tagging
- Highlight important words in text, to reduce demands on information users
- Can be done to identify different content
- words assigned to most prevalent topic reveal important themes (see the paper!)
- with syntactic/semantic factorization, we can highlight words that determine semantic content
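As a rough illustration of tagging by most prevalent label, the sketch below parses tokens of the form `word46` and upper-cases the words that carry the most frequent label. The sample text is the opening of the 1991 abstract shown on the next slide. Treating every trailing number as a single label space is an assumption (the slides do not say which numbers are topics and which are syntactic classes), and the names `parse` and `highlight` are hypothetical.

```python
import re
from collections import Counter

TAGGED = ("A23 generalized49 fundamental11 theorem20 of4 natural46 "
          "selection46 is32 derived17 for5 populations46 incorporating22 "
          "both39 genetic46 and37 cultural46 transmission46")


def parse(tagged_text):
    """Split 'word46' tokens into (word, label) pairs."""
    pairs = []
    for token in tagged_text.split():
        match = re.fullmatch(r"(.*?)(\d+)", token)
        if match:
            pairs.append((match.group(1), int(match.group(2))))
    return pairs


def highlight(pairs):
    """Upper-case the words carrying the most frequent label."""
    counts = Counter(label for _word, label in pairs)
    top_label, _count = counts.most_common(1)[0]
    text = " ".join(word.upper() if label == top_label else word.lower()
                    for word, label in pairs)
    return top_label, text


top_label, text = highlight(parse(TAGGED))
print(top_label)
print(text)
```

On this sentence the most frequent label is 46, and the highlighted words are the ones about natural selection, populations, and genetic and cultural transmission.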
69(PNAS, 1991, vol. 88, 4874-4876) A23
generalized49 fundamental11 theorem20 of4
natural46 selection46 is32 derived17 for5
populations46 incorporating22 both39 genetic46
and37 cultural46 transmission46. The14
phenotype15 is32 determined17 by42 an23
arbitrary49 number26 of4 multiallelic52 loci40
with22 two39-factor148 epistasis46 and37 an23
arbitrary49 linkage11 map20, as43 well33 as43
by42 cultural46 transmission46 from22 the14
parents46. Generations46 are8 discrete49 but37
partially19 overlapping24, and37 mating46 may33
be44 nonrandom17 at9 either39 the14 genotypic46
or37 the14 phenotypic46 level46 (or37 both39).
I12 show34 that47 cultural46 transmission46 has18
several39 important49 implications6 for5 the14
evolution46 of4 population46 fitness46, most36
notably4 that47 there41 is32 a23 time26 lag7 in22
the14 response28 to31 selection46 such9 that47
the14 future137 evolution46 depends29 on21 the14
past24 selection46 history46 of4 the14
population46.
(gray level = "semanticity", the probability of using LDA over HMM)
70(PNAS, 1996, vol. 93, 14628-14631) The14
''shape7'' of4 a23 female115 mating115
preference125 is32 the14 relationship7 between4
a23 male115 trait15 and37 the14 probability7 of4
acceptance21 as43 a23 mating115 partner20. The14
shape7 of4 preferences115 is32 important49 in5
many39 models6 of4 sexual115 selection46, mate115
recognition125, communication9, and37
speciation46, yet50 it41 has18 rarely19 been33
measured17 precisely19. Here12 I9 examine34
preference7 shape7 for5 male115 calling115
song125 in22 a23 bushcricket13 (katydid48).
Preferences115 change46 dramatically19 between22
races46 of4 a23 species15, from22 strongly19
directional11 to31 broadly19 stabilizing45 (but50
with21 a23 net49 directional46 effect46).
Preference115 shape46 generally19 matches10 the14
distribution16 of4 the14 male115 trait15. This41
is32 compatible29 with21 a23 coevolutionary46
model20 of4 signal9-preference115 evolution46,
although50 it41 does33 not37 rule20 out17 an23
alternative11 model20, sensory125
exploitation150. Preference46 shapes40 are8
shown35 to31 be44 genetic11 in5 origin7.
72Conclusion
- Probabilistic generative models can reveal the structure of knowledge domains
- We can use these models to
- identify important themes
- synthesize content
- discover targets for funding and research
- reduce the demands on information users
73(No Transcript)
74Gibbs sampling
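- For variables $\mathbf{z} = z_1, z_2, \ldots, z_n$
- Draw $z_i^{(t+1)}$ from $P(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$
- $\mathbf{z}_{-i} = z_1^{(t+1)}, z_2^{(t+1)}, \ldots, z_{i-1}^{(t+1)}, z_{i+1}^{(t)}, \ldots, z_n^{(t)}$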
75Gibbs sampling
- Need full conditional distributions for variables
- Since we only sample z, we need $P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w})$, which depends on two counts:
- number of times word w assigned to topic j
- number of times topic j used in document d
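For reference, the full conditional in collapsed Gibbs sampling for LDA has the standard form $P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$, where the counts exclude token $i$. The symmetric Dirichlet hyperparameters $\alpha$ and $\beta$ are not named on the slides; they and their default values below are assumptions. The following is a minimal sketch of one such sampler, not the authors' implementation, with word ids as integers and the hypothetical function name `gibbs_lda`.

```python
import numpy as np


def gibbs_lda(docs, W, T, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of lists of word ids in range(W).
    Returns (z, n_wt, n_dt): assignments and the two count tables.
    """
    rng = np.random.default_rng(seed)
    n_wt = np.zeros((W, T))           # times word w assigned to topic j
    n_dt = np.zeros((len(docs), T))   # times topic j used in document d
    n_t = np.zeros(T)                 # total tokens assigned to topic j

    # Random initial assignment, then build the count tables.
    z = [rng.integers(T, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            j = z[d][i]
            n_wt[w, j] += 1
            n_dt[d, j] += 1
            n_t[j] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove token i from the counts (the "-i" in the formula).
                j = z[d][i]
                n_wt[w, j] -= 1
                n_dt[d, j] -= 1
                n_t[j] -= 1
                # Unnormalized full conditional; the document-length
                # denominator is constant in j and is dropped.
                p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
                j = rng.choice(T, p=p / p.sum())
                # Add the token back under its new topic.
                z[d][i] = j
                n_wt[w, j] += 1
                n_dt[d, j] += 1
                n_t[j] += 1
    return z, n_wt, n_dt


if __name__ == "__main__":
    # Toy corpus: words 0-2 and words 3-5 never co-occur.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
    z, n_wt, n_dt = gibbs_lda(docs, W=6, T=2)
    print(n_wt)
```

Only the topic assignments are sampled; the two count tables are all the state the sampler needs, which is why memory grows with T(W + D).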
76Gibbs sampling
iteration 1
77Gibbs sampling
iteration 1 2
84Gibbs sampling
iteration 1 2 ... 1000