1
Spoken Document Retrieval experiments with IR-n system
Fernando Llopis Pascual, Patricio Martínez-Barco
Departamento de Lenguajes y Sistemas Informáticos
2
Index
IR-n System
Adapting IR-n System to SDR task
Evaluation
Conclusions and future work
4
IR-n System: Passage Retrieval Systems
  • Use short fragments of documents, instead of whole documents, to evaluate relevance or similarity
  • These fragments are called passages
  • Each document is divided into passages before its relevance is computed

5
IR-n System: Passage concept
  • Why does the IR-n system use the sentence to define passages?
  • A sentence expresses an idea in the document
  • There are algorithms that detect sentences with great precision
  • Sentences are complete units, so they can be shown to users as understandable information or passed on to a subsequent system

6
IR-n System: Passage concept
The IR-n system defines passages in the following way:
General Custer was Civil War Union Major
soldier. One of the most famous and controversial
figures in United States Military history.
Graduated last in his West Point Class (June
1861). Spent first part of the Civil War as a
courier and staff officer. Promoted from Captain
to Brigadier General of Volunteers just prior to
the Battle of Gettysburg, and was given command
of the Michigan "Wolverines" Cavalry brigade.
He helped defeat General Stuart's attempt to
make a cavalry strike behind Union lines on the
3rd Day of the Battle (July 3, 1863), thus
markedly contributing to the Army of the
Potomac's victory (a large monument to his
Brigade now stands in the East Cavalry Field in
Gettysburg). Participated in nearly every cavalry
action in Virginia from that point until the end
of the war, always performing boldly, most often
brilliantly, and always seeking publicity for
himself and his actions. Ended the war as a Major
General of Volunteers and a Brevet Major General
in the Regular Army. Upon Army reorganization
in 1866, he was appointed Lieutenant Colonel of the soon to be renowned 7th United States Cavalry.
Fought in the various actions against the Western
Indians, often with a singular brutality
(exemplified by his wiping out of a Cheyenne
village on the Washita in November 1868). His
exploits on the Plains were romanticized by
Eastern United States newspapermen, and he was elevated to legendary status in his time. The death of his friend, Lucarelli, changed his life.
(In the original slide, the example text above is shown segmented into SENTENCE 1 through SENTENCE 15.)
1 Obtains the sentences from the document
2 Defines passages according to a fixed number of sentences (sketched below)
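A minimal sketch of these two steps in Python (an illustration, not the authors' actual implementation; the sentence splitter here is a naive regular-expression one):

import re

def split_sentences(text):
    # Step 1: obtain the sentences of the document.
    # Naive boundary detection: split after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def build_passages(sentences, n):
    # Step 2: define passages as consecutive groups of a fixed number (n) of sentences.
    return [sentences[i:i + n] for i in range(0, len(sentences), n)]

# Example with the beginning of the Custer document above and passages of 2 sentences.
doc = ("General Custer was Civil War Union Major soldier. "
       "One of the most famous and controversial figures in United States Military history. "
       "Graduated last in his West Point Class (June 1861).")
passages = build_passages(split_sentences(doc), n=2)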
7
IR-n System: Passage concept
  • Every passage has the same number of sentences
  • This number depends on
  • The collection of documents
  • The size of the query

8
Index
IR-n System
Adapting IR-n System to SDR task
Evaluation
Conclusions and future work
10
Adapting the IR-n system to the SDR task: Spoken input
  • As pointed out by Dahlbäck (1997)
  • Spoken input is often incomplete and incorrect
  • It contains interruptions and repairs
  • Sentences occur only very occasionally
  • Conclusion
  • The sentence concept is not valid for spoken input
  • Therefore, new basic units for dialogue models must be proposed
  • Utterances instead of sentences
  • Turns instead of paragraphs

11
Adapting the IR-n system to the SDR task: Definitions
  • Utterance
  • A sequence of words uttered by a speaker between two pauses
  • Turn
  • The set of utterances that a speaker expresses between two speaker changes (dialogues)
  • The set of utterances that a speaker expresses about the same subject (monologues)
  • (Each section of the TREC SDR collection is treated as a turn)

12
Adapting the IR-n system to the SDR task: SDR problems
  • The lack of punctuation marks prevents the recognition of utterance boundaries
  • Utterance boundaries must be estimated by detecting the longest pauses
  • Some turns have no semantic content
  • "Morning, C.N.N. headline news, I'm Sachi Koto"
  • Some turns are interrupted due to
  • Overlaps
  • Speaker mistakes
  • Repetitions
  • Modifications of previous information
  • Noise introduced by the automatic transcription

13
Adapting the IR-n system to the SDR task: IR-n problems
  • The lack of sentences to define passages must be solved by using utterances
  • An utterance splitter was developed (see the sketch below)
  • The overlapping-passage technique was used to minimize failures of the utterance splitter
  • Noisy input
  • How well the system copes with it must be tested

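A minimal sketch of such an utterance splitter and of overlapping passages, assuming the recognizer output is available as (word, start_time, end_time) tuples; the function names and the 0.2 s default are illustrative only (the threshold actually chosen is discussed in the Evaluation section):

def split_utterances(words, pause=0.2):
    # Group time-stamped recognizer output into utterances: a silence longer
    # than `pause` seconds between two consecutive words starts a new utterance.
    utterances, current, prev_end = [], [], None
    for word, start, end in words:
        if prev_end is not None and start - prev_end > pause:
            utterances.append(current)
            current = []
        current.append(word)
        prev_end = end
    if current:
        utterances.append(current)
    return utterances

def overlapping_passages(utterances, n):
    # Overlapping passages: a window of n utterances shifted one utterance at a
    # time, so a mistake of the utterance splitter at one boundary is softened
    # by the neighbouring windows.
    return [utterances[i:i + n] for i in range(max(1, len(utterances) - n + 1))]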
14
Index
IR-n System
Adapting IR-n System to SDR task
Evaluation
Conclusions and future work
16
Evaluation: Evaluation goal
  • The main goal of this experiment is to assess the robustness of the IR-n system
  • How a system based on passages (and therefore on sentences) can be adapted to utterances
  • How well the system copes with noise

17
Evaluation: Training focus
  • Discovering the minimum time between words that marks the start of a new utterance
  • Example: ... TO THWART THEIR ABILITY TO ACQUIRE AND DEVELOP WEAPONS ...
  • That is not a new utterance
19
Evaluation: Training focus
  • Discovering the minimum time between words that marks the start of a new utterance
  • Example: ... BUT FOR THE BAY'S CHIEF I WHAT WOULD THEY ACHIEVED ...
  • That is a new utterance
21
Evaluation: Training focus
  • Discovering the best passage size

(In the original slide, the sequence UTTERANCE 1 ... UTTERANCE 15 is shown grouped into passages of different sizes.)
22
Evaluation: Training
  • Training corpus: the TREC SDR-8 collection (according to the track specification)
  • Parameters to be evaluated (see the sketch below)
  • Number of utterances per passage (from 1 to 9)
  • Pause size considered for the utterance split (0.1, 0.2, 0.3 sec.)
  • Models
  • With query expansion
  • Without query expansion

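A minimal sketch of this training sweep; run_and_score() is a placeholder (not part of IR-n) standing in for indexing the SDR-8 collection with the given parameters, running the training queries and computing the average precision:

from itertools import product

def run_and_score(passage_size, pause_threshold, query_expansion):
    # Placeholder for the real experiment: index the collection with these
    # parameters, retrieve for every training query and return the AvgP.
    return 0.0  # dummy value so the sketch is runnable

def parameter_sweep():
    best_score, best_params = -1.0, None
    for n, pause, expand in product(range(1, 10),      # utterances per passage
                                    (0.1, 0.2, 0.3),   # pause threshold in seconds
                                    (True, False)):    # query expansion on / off
        score = run_and_score(n, pause, expand)
        if score > best_score:
            best_score, best_params = score, (n, pause, expand)
    return best_score, best_params

On the SDR-8 training data this sweep selected 5 utterances per passage, a 0.2 s pause and query expansion (AvgP 0.4620), as shown on the next slide.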
23
Evaluation: Training
Training results:
  Best AvgP: 0.4620
  Best passage size: 5
  Best pause estimation: 0.2 sec.
  Best model: with query expansion
24
Evaluation: Monolingual test
Monolingual results:
  Rank  Organization    AvgP
  1     ITC-irst        0.3944
  2     Exeter          0.3824
  3     IR-n Alicante   0.3637
  4     JHU/APL         0.3184
  • Test corpus: the TREC SDR-9 collection
  • Parameters
  • Number of utterances per passage: 5
  • Pause size considered for the utterance split: 0.2 seconds
  • Model: with query expansion

25
Evaluation: Bilingual test (French-English)
  • French queries were translated into English using machine translation (see the sketch below)
  • Power Translator
  • Free translator
  • Babel Fish

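A minimal sketch of this cross-language setup; translate_fr_en() is a placeholder for whichever of the three translators is used (none of their real APIs is shown here), and retrieve_en stands for the ordinary monolingual retrieval run:

def translate_fr_en(query_fr):
    # Placeholder for one of the machine translation tools listed above;
    # it should return the English translation of the French query.
    return query_fr  # dummy pass-through so the sketch is runnable

def bilingual_run(queries_fr, retrieve_en):
    # Translate each French query, then reuse the monolingual (English)
    # retrieval over the SDR collection.
    return [retrieve_en(translate_fr_en(q)) for q in queries_fr]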
26
Evaluation: Bilingual test (French-English)
Bilingual results:
  Rank  Organization    AvgP
  1     ITC-irst        0.3064
  2     IR-n Alicante   0.3032
  3     Exeter          0.2825
  4     JHU/APL         0.1904
  • Test corpus: the TREC SDR-9 collection
  • Parameters
  • Number of utterances per passage: 5
  • Pause size considered for the utterance split: 0.2 seconds
  • Model: with query expansion

27
Index
IR-n System
Adapting IR-n System to SDR task
Evaluation
Conclusions and future work
29
Conclusions and future work
  • Conclusions
  • The IR-n system is robust when working on the SDR task (+)
  • The IR-n system's performance must be improved (-)
  • Future work
  • Reduce the noise produced by repetitions and modifications
  • Remove turns without semantic content
  • Evaluate and improve our utterance splitter

30
Spoken Document Retrieval experiments with IR-n system
Fernando Llopis Pascual, Patricio Martínez-Barco
Departamento de Lenguajes y Sistemas Informáticos