Searching Speech: A Research Agenda

Transcript and Presenter's Notes
1
Searching Speech: A Research Agenda
  • Douglas W. Oard
  • College of Information Studies and
  • Institute for Advanced Computer Studies
  • University of Maryland, College Park

2
Some Grid Use at Maryland
  • Global Land Cover Facility
  • 13 TB of raw and derived data from 5 satellites
  • Digital archives
  • Preserving the meaning of metadata structure
  • Access grid
  • No-operator information studies classroom

3
Expanding the Search Space
Scanned Docs
[Figure: scanned handwritten document; recognized writer identity: Harriet; text begins "Later, I learned that John had not heard ..."]
4
Indexable Speech
  • What if we could collect everything?
  • 1 billion users of speech-enabled devices
  • Each producing >10K words per day
  • Much of it not worth finding
  • Comparison case: Web search
  • Google indexes 10 billion Web pages
  • Perhaps averaging 1K words each
  • Much of it not worth finding
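
Taking the slide's round numbers at face value, the scale comparison works out to (a back-of-the-envelope estimate, not a measurement):

    10^9 users x 10^4 words per user per day = 10^13 spoken words per day
    10^10 pages x 10^3 words per page = 10^13 words in the indexed Web

So a single day of global speech would roughly match the word count of the entire indexed Web.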

5
A Web of Speech?
6
The Need for Scalable Solutions
7
Some Spoken Word Collections
  • Broadcast programming
  • News, interview, talk radio, sports,
    entertainment
  • Storytelling
  • Books on tape, oral history, folklore
  • Incidental recording
  • Speeches, courtrooms, meetings, phone calls

8
Indexing Options
  • Transcript-based (e.g., NASA)
  • Manual transcription, editing by interviewee
  • Thesaurus-based (e.g., Shoah Foundation)
  • Manually assign descriptors to points in an
    interview
  • Catalog-based (e.g., British Library)
  • Catalog record created from interviewer's notes
  • Speech-based (MALACH)
  • Create access points with speech processing

9
Supporting Intellectual Access
  • Speech Processing
  • Computational Linguistics
  • Information Retrieval
  • Information Seeking
  • Human-Computer Interaction
  • Digital Libraries

[Diagram label: Source Selection]
10
Some Technical Challenges
  • Fast ASR systems are way too slow
  • 6 orders of magnitude slower than tokenization
  • Situational sublanguage induces variability
  • Impedes interactive vocabulary acquisition
  • Knee in the WER/MAP curve comes early
  • 30-40% for broadcast news
  • Somewhere below 30% for conversations
  • Skimmable summaries from imperfect ASR
  • Particularly important for linear media
  • Classic IR measures focus on documents
  • Conversational boundaries are ambiguous

11
Start Time Error Cost
12
Shoah Foundation Collection
  • Substantial scale
  • 116,000 hours; 52,000 interviews; 32 languages
  • Spontaneous conversational speech
  • Accents, elderly speakers, emotional content, ...
  • Accessible
  • $100 million collection and digitization
    investment
  • Manually indexed (10,000 hours)
  • Segmented, thesaurus terms, people, summaries
  • Users
  • A department working full time on dissemination

13
Interview Excerpt
  • Audio characteristics
  • Accented (this one is unusually clear)
  • Separate channels for interviewer / interviewee
  • Dialog structure
  • Interviewers have different styles
  • Content characteristics
  • Domain-specific terms
  • Named entity mentions and relationships

14
MALACH Languages
[Table: testimonies per language (average 2.25 hours each), as of January 31, 2004]
15
Observational Studies
  • 8 independent searchers
  • Holocaust studies (2)
  • German Studies
  • History/Political Science
  • Ethnography
  • Sociology
  • Documentary producer
  • High school teacher
  • 8 teamed searchers
  • All high school teachers
  • Thesaurus-based search
  • Rich data collection
  • Intermediary interaction
  • Semi-structured interviews
  • Observational notes
  • Think-aloud
  • Screen capture
  • Qualitative analysis
  • Theory-guided coding
  • Abductive reasoning

16
Relevance Criteria
6 Scholars, 1 teacher, 1 film producer, working
individually
17
Topicality
[Chart: total mentions of topicality-related relevance criteria; 6 scholars, 1 teacher, 1 movie producer, working individually]
18
Test Collection Design
[Diagram: pipeline components: Query Formulation, Speech Recognition, Automatic Search, Boundary Detection, Interactive Selection, Content Tagging]
19
Test Collection Design
[Diagram: the same pipeline without Interactive Selection: Query Formulation, Speech Recognition, Automatic Search, Boundary Detection, Content Tagging]
20
CLEF-2005 CL-SR Track
  • Test collection distributed by ELDA
  • 7,800 segments from 300 English interviews
  • Hand segmented / known boundaries
  • 63 topics (title/description/narrative)
  • 38 for training, 25 for blind evaluation
  • 5 languages (EN, SP, CZ, DE, FR)
  • Relevance judgments
  • Search-guided post-hoc judgment pools
  • 5 participating teams
  • DCU, Maryland, Pitt, Toronto/Waterloo, UNED
  • One required cross-site baseline run
  • ASR segments / English TD topics

21
Additional Resources
  • Thesaurus
  • 3,000 core concepts
  • Plus alternate vocabulary standard combinations
  • 30,000 location-time pairs, with lat/long
  • Both is-a and part-whole relationships
  • In-domain expansion collection
  • 186,000 3-sentence summaries
  • Indexers' scratchpad notes
  • Digitized speech
  • .mp2 or .mp3

22
English ASR
[Chart: ASR2003A vs. ASR2004A performance; training: 200 hours from 800 speakers]
23
<DOCNO>VHF00017-062567.005</DOCNO>
<KEYWORD>Warsaw (Poland), Poland 1935 (May 13) - 1939 (August 31), awareness of political or military events, schools</KEYWORD>
<PERSON>Sophie P, Henry H</PERSON>
<SUMMARY>AH talks about the college she attended before the war. She mentions meeting her husband. She discusses young people's awareness of the political events that preceded the outbreak of war.</SUMMARY>
<SCRATCHPAD>graduated HS, went to college 1 year, professional college hotel management met future husband, knew that they'd end up together sister also in college, nice social life, lots of company, not too serious already got news from Czechoslovakia, Sudeten, knew that Poland would be next but what could they do about it, very passive just heard info from radio and press</SCRATCHPAD>
<ASRTEXT>no no no they did no not not uh i know there was no place to go we didn't have family in a in other countries so we were not financially at the at extremely went so that was never at plano of my family it is so and so that was the atmosphere in the in the country prior to the to the war i graduate take the high school i had one year of college which was a profession and that because that was already did the practical trends f so that was a study for whatever management that eh eh education and this i i had only one that here all that at that time i met my future husband and that to me about any we knew it that way we were in and out together so and i was quite county there was so whatever i did that and this so that was the person that lived my sister was it here is first year of of colleagues and and also she had a very strongly this antisemitic trend and our parents there was a nice social life young students that we had open house always pleasant we had a lot of that company here and and we were not too serious about that she we got there we were getting the they already did knew he knew so from czechoslovakia from they saw that from other part and we knew the in that that he is uhhuh the hitler spicy we go into this year this direction that eh poland will be the next country but there was nothing that we would do it at that time so he was a very very he says belong to any any organizations especially that the so we just take information from the radio and from the dress</ASRTEXT>
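
The record above is simple SGML-style markup (the ASRTEXT content is raw recognizer output, errors included). A minimal sketch of pulling the fields out of one such record; the regex-based approach is illustrative, not the project's actual tooling:

    import re

    # Field tags observed in the example record above.
    FIELDS = ["DOCNO", "KEYWORD", "PERSON", "SUMMARY", "SCRATCHPAD", "ASRTEXT"]

    def parse_segment(record: str) -> dict:
        """Extract the tagged fields from one segment record."""
        parsed = {}
        for tag in FIELDS:
            match = re.search(rf"<{tag}>(.*?)</{tag}>", record, re.DOTALL)
            if match:
                # Collapse line-wrapped field bodies to single spaces.
                parsed[tag] = " ".join(match.group(1).split())
        return parsed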
24
Segment duration (s)
[Summary statistics: Min. -2044.00; 1st Qu. 54.01; Median 224.90; Mean 391.70; 3rd Qu. 326.00; Max. 287400.00; NA's 75031.00]
25
Keywords vs. Segment duration
26
Nodes descending from parents of leaves
27
Years spoken in ASR
28
Spoken dates in released ASR
[Summary statistics per segment: Min. 0.0000; 1st Qu. 0.0000; Median 0.0000; Mean 0.6575; 3rd Qu. 1.0000; Max. 13.0000]
29
Current classifier performance
[Table fragment: 46,601 (1,175); 3,610 (169); 1,437 (168); 613 (47)]
MAP 0.2374, even after post-mixing of scratchpad/summary from 20NN, remixed with time-label densities estimated with a Gaussian kernel at 5x the default bandwidth
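
The kernel step can be sketched with scipy's gaussian_kde, widening its default (Scott's rule) bandwidth by 5x; the label times below are hypothetical:

    import numpy as np
    from scipy.stats import gaussian_kde

    # Hypothetical observations: times (seconds into an interview)
    # at which one thesaurus label was assigned in training data.
    label_times = np.array([120.0, 135.0, 410.0, 422.0, 431.0, 900.0])

    kde = gaussian_kde(label_times)
    kde.set_bandwidth(bw_method=kde.factor * 5)  # 5x default bandwidth

    # Estimated density of that label across a 1,000-second timeline.
    timeline = np.linspace(0, 1000, 200)
    density = kde(timeline)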
30
An Example English Topic
Number: 1148
Title: Jewish resistance in Europe
Description: Provide testimonies or describe actions of Jewish resistance in Europe before and during the war.
Narrative: The relevant material should describe actions of only- or mostly-Jewish resistance in Europe. Both individual and group-based actions are relevant. Types of actions may include survival (fleeing, hiding, saving children), testifying (alerting the outside world, writing, hiding testimonies), and fighting (partisans, uprising, political security). Information about undifferentiated resistance groups is not relevant.
31
5-level Relevance Judgments
  • Classic relevance (to "food in Auschwitz")
  • Direct: knew food was sometimes withheld
  • Indirect: saw undernourished people
  • Additional relevance types
  • Context: intensity of manual labor
  • Comparison: food situation in a different camp
  • Pointer: mention of a study on the subject

32
Comparing Index Terms
[Chart: retrieval effectiveness by index term type, including persons; title queries, adjudicated judgments]
33
Searching Manual Transcripts
Example queries: "jewish kapo(s)"; "fort ontario refugee camp"
(Title queries, adjudicated judgments)
34
Category Expansion
35
ASR-Based Search
[Chart: mean average precision; title queries, adjudicated judgments]
36
Rethinking the Problem
  • Segment-then-label models planned speech well
  • Producers assemble stories to create programs
  • Stories typically have a dominant theme
  • The structure of natural speech is different
  • Creation: digressions, asides, clarifications, ...
  • Use: intended use may affect desired granularity
  • Documentary film: a brief snippet to illustrate a
    point
  • Classroom teacher: a longer self-contextualizing
    story

37
Activation Matrix
[Figure: descriptor activations plotted over interview time]
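
Read as a data structure, an activation matrix is a descriptors-by-time grid of scores. A minimal sketch, with hypothetical descriptors and time resolution:

    import numpy as np

    # Hypothetical inventory: one row per descriptor, one column per
    # time step (here, one step per second of interview audio).
    descriptors = ["Berlin-1939", "Dresden-1939", "family life", "schooling"]
    n_steps = 3600  # a one-hour interview at 1-second resolution

    activation = np.zeros((len(descriptors), n_steps))
    # A classifier would fill each cell with the strength of evidence
    # that the descriptor applies at that moment, for example:
    activation[0, 0:600] = 0.8  # "Berlin-1939" active early on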
38
Training Data: 196,000 Segments
[Figure: label extents along interview time, drawn from segment summaries and indexers' notes]
  Berlin-1939: Employment (Josef Stein); Family life (Gretchen Stein, Anna Stein)
  Dresden-1939: Relocation (Transportation-rail); Schooling (Gunter Wendt, Maria)
39
Preprocessing Training Data
  • Normalize labeled categories? (see the sketch
    after this list)
  • "Food in hiding" -> food AND hiding
  • Develop class models
  • Existing hierarchy, types of personal
    relationships
  • Determine the extent for each label and class
  • Merge the extent of repeated labels
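
A minimal sketch of that normalization step, assuming a simple connective-based split of compound labels into ANDed concepts (the connective list is a hypothetical stand-in for real rules):

    # Hypothetical normalizer: split compound thesaurus labels like
    # "food in hiding" into conjoined atomic concepts.
    SPLIT_WORDS = {"in", "during", "while"}  # assumed connectives

    def normalize_label(label: str) -> list[str]:
        """Return the atomic concepts implied by one compound label."""
        words = label.lower().split()
        concepts, current = [], []
        for word in words:
            if word in SPLIT_WORDS and current:
                concepts.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            concepts.append(" ".join(current))
        return concepts

    assert normalize_label("food in hiding") == ["food", "hiding"]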

40
Characteristics of the Problem
  • Clear dependencies
  • Correlated assignment of applications
  • Living in Dresden negates living in Berlin
  • Heuristic basis for class models
  • Persons, based on type of relationship
  • Date/Time, based on part-whole relationship
  • Topics, based on a defined hierarchy
  • Heuristic basis for guessing without training
  • Text similarity between labels and spoken words
  • Heuristic basis for smoothing
  • Sub-sentence retrieval granularity is unlikely

41
Modeling Location
[Figure: location hierarchy, with Berlin and Dresden contained in Germany]
  • Presence in a new location negates presence in
    the prior location
  • Location granularity varies (inclusion
    relationships are known)
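
A minimal sketch of that state model, assuming a known part-whole table and the negation rule above (names and rules are illustrative):

    # Hypothetical part-whole table: city -> containing region.
    PART_OF = {"Berlin": "Germany", "Dresden": "Germany"}

    def update_location(active: set[str], mention: str) -> set[str]:
        """A newly mentioned city negates the previously active city;
        containing regions stay active (inclusion is known)."""
        cities = {loc for loc in active if loc in PART_OF}
        active = (active - cities) | {mention}
        if mention in PART_OF:
            active.add(PART_OF[mention])
        return active

    state = update_location(set(), "Berlin")   # {"Berlin", "Germany"}
    state = update_location(state, "Dresden")  # Berlin negated, Germany kept
    assert state == {"Dresden", "Germany"}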

42
A Class Model for People
[Figure: active-person states over interview time, e.g. {father, mother, sister}, then {father, mother}, then {sister, friend}, then {nobody}]
  • Several people may be discussed simultaneously
  • Small inventory of relationship types
  • Relationship type is known for most people that
    are mentioned

43
Search
  • Compute a score at each time based on
  • How likely is each descriptor? (TF)
  • How selective is each descriptor? (IDF)
  • What related descriptors are active? (expansion)
  • Determine passage start time based on
  • Score trajectory (sequence of scores)
  • Additional heuristics (e.g., pause, speaker turn)
  • Rank passages based on score trajectory
  • e.g., by peak score within the passage
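
A minimal sketch of this scoring-and-ranking step, assuming the activation-matrix representation from slide 37; a fixed threshold stands in here for the slide's richer start-time heuristics (pauses, speaker turns):

    import numpy as np

    def passage_search(activation: np.ndarray, idf: np.ndarray,
                       query_rows: list[int], threshold: float = 0.5):
        """Score each time step, cut passages where the score
        trajectory crosses the threshold, rank by peak score."""
        # TF-like activation weighted by descriptor selectivity (IDF).
        scores = (activation[query_rows] * idf[query_rows, None]).sum(axis=0)

        # Passage boundaries where the trajectory crosses the threshold.
        above = scores > threshold
        edges = np.diff(above.astype(int))
        starts = list(np.where(edges == 1)[0] + 1)
        ends = list(np.where(edges == -1)[0] + 1)
        if above[0]:
            starts.insert(0, 0)
        if above[-1]:
            ends.append(len(scores))

        passages = [(s, e, scores[s:e].max()) for s, e in zip(starts, ends)]
        return sorted(passages, key=lambda p: -p[2])  # rank by peak score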

44
(No Transcript)
45
Some Open Issues
  • Is the expressive power of a lattice needed?
  • An activation matrix is an unrolled lattice
  • What states do we need to represent?
  • Balance fidelity, accuracy, and complexity
  • How to integrate manual onset marks?
  • How much training data do we need?
  • Annotating new data costs $100/hour
  • How will people use the system we build?

46
Non-English ASR Systems
[Chart: word error rate (WER, 30-70 range) for non-English ASR systems (Polish, Czech, Slovak, Hungarian, Russian), plotted from 10/01 through 10/06; conditions include 20h, 45h, 84h, and 100h of training with stand., adapt., LMTr, and LMTr+TC variants; reported WER values: 34.49, 35.51, 38.57, 40.69, 41.15, 45.75, 45.91, 50.82, 57.92, and 66.07]
47
Planning for the Future
  • Tentative CLEF-2006 CL-SR Plans
  • Adding a Czech collection
  • Larger English collection (900 hours)
  • Adding word lattice as standard data
  • No-boundary evaluation design
  • ASR training data (by special arrangement)
  • Transcripts, pronunciation lexicon, language
    model
  • Possible CLEF-2007 CL-SR Options
  • Add a Russian or Slovak collection?
  • Much larger English collection (5,000 hours)?

48
The CLEF CL-SR Team
USA
Europe
  • Shoah Foundation
  • Sam Gustman
  • IBM TJ Watson
  • Bhuvana Ramabhadran
  • Martin Franz
  • U. Maryland
  • Doug Oard
  • Dagobert Soergel
  • Johns Hopkins
  • Zak Schefrin
  • U. Cambridge (UK)
  • Bill Byrne
  • Charles University (CZ)
  • Jan Hajic
  • Pavel Pecina
  • U. West Bohemia (CZ)
  • Josef Psutka
  • Pavel Ircing
  • UNED (ES)
  • Fernando López-Ostenero

49
More Things to Think About
  • Privacy protection
  • Working with real data has real consequences
  • Are fixed segments the right retrieval unit?
  • Or is it good enough to know where to start?
  • What will it cost to tailor an ASR system?
  • $100K to $1 million per application?
  • Do we need to change what we collect?
  • Speaker enrollment, metadata standards, ...

50
Final Thoughts
  • The moving hand, having writ, moves on
  • Ephemeral webcasting
  • Forgone acquisition opportunities

51
For More Information
  • The MALACH project
  • http://www.clsp.jhu.edu/research/malach
  • CLEF-2005 evaluation
  • http://www.clef-campaign.org
  • NSF/DELOS Spoken Word Access Group
  • http://www.dcs.shef.ac.uk/spandh/projects/swag

52
(No Transcript)