Title: Tesis doctoral (Doctoral thesis)

1
Spoken Document Retrieval experiments with the IR-n system
Fernando Llopis Pascual, Patricio Martínez-Barco
Departamento de Lenguajes y Sistemas Informáticos
2
Index
IR-n System
Adapting IR-n System to SDR task
Evaluation
Conclusions and future work
4
IR-n System: Passage Retrieval Systems

Use short fragments of documents, instead of whole documents, to evaluate relevance or similarity
These fragments are called passages
Each document is divided into passages before its relevance is calculated
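A minimal Python sketch of the passage idea, under stated assumptions: each passage is a list of terms, a passage is scored with a toy term-overlap count, and a document takes the score of its best passage. The scoring function, the max aggregation and the names `passage_score` and `document_score` are hypothetical illustrations, not IR-n's actual similarity measure.

```python
from collections import Counter


def passage_score(query_terms: list[str], passage_terms: list[str]) -> float:
    """Toy relevance score (assumption): occurrences of query terms in the passage."""
    counts = Counter(passage_terms)
    return float(sum(counts[t] for t in set(query_terms)))


def document_score(query_terms: list[str], passages: list[list[str]]) -> float:
    """Score a document by its best-matching passage instead of its whole text."""
    return max((passage_score(query_terms, p) for p in passages), default=0.0)


doc_passages = [
    ["custer", "west", "point", "class"],
    ["battle", "gettysburg", "cavalry", "brigade", "cavalry"],
]
print(document_score(["cavalry", "gettysburg"], doc_passages))  # 3.0
```

Only the second passage matches the query, so the document is ranked on that fragment alone, and the unrelated first passage does not dilute its score.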

5
IR-n System: Passage concept

Why does IR-n use sentences to define passages?
A sentence expresses an idea in the document
There are algorithms that detect sentence boundaries with high precision
Sentences are complete units, so the information they contain stays understandable when it is shown to users or passed on to a subsequent system

6
IR-n System: Passage concept
IR-n defines passages as follows:
General Custer was a Civil War Union Major General. One of the most famous and controversial figures in United States military history. Graduated last in his West Point class (June 1861). Spent the first part of the Civil War as a courier and staff officer. Promoted from Captain to Brigadier General of Volunteers just prior to the Battle of Gettysburg, and was given command of the Michigan "Wolverines" Cavalry brigade. He helped defeat General Stuart's attempt to make a cavalry strike behind Union lines on the 3rd day of the battle (July 3, 1863), thus markedly contributing to the Army of the Potomac's victory (a large monument to his brigade now stands in the East Cavalry Field in Gettysburg). Participated in nearly every cavalry action in Virginia from that point until the end of the war, always performing boldly, most often brilliantly, and always seeking publicity for himself and his actions. Ended the war as a Major General of Volunteers and a Brevet Major General in the Regular Army. Upon Army reorganization in 1866, he was appointed Lieutenant Colonel of the soon-to-be-renowned 7th United States Cavalry. Fought in the various actions against the Western Indians, often with a singular brutality (exemplified by his wiping out of a Cheyenne village on the Washita in November 1868). His exploits on the Plains were romanticized by Eastern United States newspapermen, and he was elevated to legendary status in his time. The death of his friend Lucarelli changed his life.
[Figure: the document above split into sentences, labelled SENTENCE 1 to SENTENCE 15]
1. Obtain the sentences of the document
2. Define passages as groups of a fixed number of sentences
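The two steps can be sketched in Python as follows. The regular-expression splitter is a deliberately naive stand-in (an assumption: IR-n relies on a more precise sentence detector), and `split_sentences` and `make_passages` are hypothetical names. `size` is the fixed number of sentences per passage. For simplicity, this sketch builds non-overlapping passages.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive rule (assumption): a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def make_passages(sentences: list[str], size: int) -> list[list[str]]:
    # Group consecutive sentences into passages of `size` sentences;
    # the last passage may be shorter.
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


text = ("Graduated last in his West Point Class (June 1861). "
        "Spent first part of the Civil War as a courier and staff officer. "
        "Promoted from Captain to Brigadier General of Volunteers.")
sentences = split_sentences(text)
print(len(sentences))                                   # 3
print([len(p) for p in make_passages(sentences, 2)])    # [2, 1]
```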
7
IR-n System: Passage concept

Every passage has the same number of sentences
This number depends on
the document collection
the size of the query

10
Adapting IR-n system to SDR task: Spoken input

As pointed out by Dahlback (1997)
Spoken input is often incomplete and incorrect
It contains interruptions and repairs
Sentences occur only very occasionally
Conclusion
The sentence concept is not valid for spoken input
Therefore, new basic units for dialogue models must be proposed
Utterances instead of sentences
Turns instead of paragraphs

11
Adapting IR-n system to SDR task: Definitions

Utterance
sequence of words uttered by a speaker between two pauses
Turn
set of utterances that a speaker produces between two speaker changes (dialogues)
set of utterances that a speaker produces about the same subject (monologues)
(each section of the TREC SDR collection is treated as a turn)
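For dialogues, the turn definition can be sketched as grouping consecutive utterances until the speaker changes. The `(speaker, utterance)` input format and the `group_turns` name are hypothetical. For the TREC SDR collection, where each section is taken as a turn, this grouping step would not be needed.

```python
from itertools import groupby


def group_turns(utterances: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Group consecutive (speaker, utterance) pairs into turns.

    A new turn starts whenever the speaker changes.
    """
    return [(speaker, [text for _, text in group])
            for speaker, group in groupby(utterances, key=lambda u: u[0])]


dialogue = [("A", "hello"), ("A", "how are you"), ("B", "fine thanks"), ("A", "good")]
for speaker, turn in group_turns(dialogue):
    print(speaker, turn)
```

`itertools.groupby` only merges *consecutive* items with the same key. That matches the definition here: speaker A returning after B opens a new turn.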

12
Adapting IR-n system to SDR task: SDR problems

The lack of punctuation marks prevents the recognition of utterance boundaries
Utterance boundaries must be estimated by detecting long pauses
Some turns have no semantic content
e.g. "Morning C.N.N. headline news I'm Sachi Koto"
Some turns are interrupted due to
Overlaps
Speaker mistakes
Repetitions
Modifications of previous information
Noise introduced by automatic transcribers
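Because the transcripts have no punctuation, utterance boundaries can be approximated from word timings. This is a minimal sketch under two assumptions. First, each recognized word comes with start and end times in seconds; this is a hypothetical input format, not the actual TREC SDR file layout. Second, a new utterance starts whenever the silence between two words is at least `min_pause` seconds. The timings in the example are invented.

```python
from typing import NamedTuple, Optional


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


def split_utterances(words: list[Word], min_pause: float) -> list[list[str]]:
    """Start a new utterance whenever the gap between consecutive words >= min_pause."""
    utterances: list[list[str]] = []
    prev_end: Optional[float] = None
    for w in words:
        if prev_end is None or w.start - prev_end >= min_pause:
            utterances.append([])
        utterances[-1].append(w.text)
        prev_end = w.end
    return utterances


# Invented timings, for illustration only.
words = [Word("GOOD", 0.00, 0.30), Word("MORNING", 0.35, 0.80),
         Word("THIS", 1.20, 1.40), Word("IS", 1.45, 1.55), Word("C.N.N.", 1.60, 2.10)]
print(split_utterances(words, 0.2))  # [['GOOD', 'MORNING'], ['THIS', 'IS', 'C.N.N.']]
```

With a larger threshold such as 0.5 s, the 0.4 s gap no longer counts as a boundary and all five words form one utterance. This is the trade-off the training runs explore.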

13
Adapting IR-n system to SDR task: IR-n problems

The lack of sentences for defining passages is solved by using utterances
An utterance splitter was developed
An overlapping-passage technique was used to minimize the impact of utterance-splitting errors
Noisy input
How well the system copes with it must be tested
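One way to read the overlapping-passage technique is as a sliding window. A passage of `size` utterances starts at every utterance, so text that the splitter wrongly separates into adjacent utterances still appears together in at least one passage (for `size` of 2 or more). This sketch assumes a step of one utterance, which may differ from IR-n's actual setting. `overlapping_passages` is a hypothetical name.

```python
def overlapping_passages(units: list[str], size: int) -> list[list[str]]:
    """Sliding window of `size` consecutive units, advancing one unit at a time."""
    if len(units) <= size:
        # Short documents form a single passage (or none if empty).
        return [units] if units else []
    return [units[i:i + size] for i in range(len(units) - size + 1)]


utts = ["u1", "u2", "u3", "u4"]
print(overlapping_passages(utts, 3))  # [['u1', 'u2', 'u3'], ['u2', 'u3', 'u4']]
```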

16
Evaluation: Evaluation goal

The main goal of this experiment is to assess the robustness of the IR-n system
How a system based on passages (and therefore on sentences) can be adapted to utterances
How well the system tolerates noise

18
Evaluation: Training focus

Finding the minimum pause between words that marks a new utterance
.. TO THWART THEIR ABILITY TO ACQUIRE AND DEVELOP WEAPONS ..

That is not a new utterance
20
Evaluation: Training focus

Finding the minimum pause between words that marks a new utterance
.. BUT FOR THE BAY'S CHIEF I WHAT WOULD THEY ACHIEVED ..

That is a new utterance
21
Evaluation: Training focus

Finding the best passage size

[Figure: the same sequence of utterances (UTTERANCE 1 to UTTERANCE 15) shown three times, grouped into passages of different sizes]
22
Evaluation: Training

Training corpus: TREC SDR-8 collection (as required by the track specification)
Parameters to be evaluated
Number of utterances per passage (1 to 9)
Pause length used to split utterances (0.1, 0.2, 0.3 s)
Models
With query expansion
Without query expansion
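The training runs amount to a grid over the three parameters: 9 passage sizes × 3 pause thresholds × 2 models = 54 configurations. Below is a minimal sketch of that search. `evaluate` is a hypothetical placeholder for indexing the TREC SDR-8 collection with one configuration and returning its AvgP. The dummy scorer is invented only so the code runs and does not reproduce the real results.

```python
from itertools import product
from typing import Callable

PASSAGE_SIZES = range(1, 10)      # utterances per passage (1 to 9)
PAUSES = (0.1, 0.2, 0.3)          # pause thresholds in seconds
QUERY_EXPANSION = (True, False)   # with / without query expansion

Config = tuple[int, float, bool]


def grid_search(evaluate: Callable[[int, float, bool], float]) -> tuple[Config, int]:
    """Return the configuration with the highest score and the number tried."""
    configs = list(product(PASSAGE_SIZES, PAUSES, QUERY_EXPANSION))
    best = max(configs, key=lambda c: evaluate(*c))
    return best, len(configs)


def dummy_evaluate(size: int, pause: float, expand: bool) -> float:
    # Invented scorer for demonstration only; not the real AvgP values.
    return -abs(size - 4) - abs(pause - 0.3) + (0.5 if expand else 0.0)


best, n = grid_search(dummy_evaluate)
print(n, best)  # 54 (4, 0.3, True)
```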

23
Evaluation: Training
Training results
Best AvgP: 0.4620
Best passage size: 5 utterances
Best pause threshold: 0.2 s
Best model: with query expansion
24
Evaluation: Monolingual test
Monolingual results
Rank  Organization   AvgP
1     ITC-irst       0.3944
2     Exeter         0.3824
3     IR-n Alicante  0.3637
4     JHU/APL        0.3184

Test corpus: TREC SDR-9 collection
Parameters
Number of utterances per passage: 5
Pause length used to split utterances: 0.2 s
Model: with query expansion
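AvgP in these tables is presumably the usual TREC measure: average precision computed per query and then averaged over the query set. For a single query, non-interpolated average precision takes the precision at the rank of each relevant document and averages these values over all relevant documents. A relevant document that is never retrieved contributes zero. The sketch below follows that standard definition and is not the official `trec_eval` code. The document IDs are invented.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated average precision for one query."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank      # precision at this relevant document's rank
    return total / len(relevant)      # unretrieved relevant documents add 0


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of per-query AP; `runs` holds one (ranking, relevant) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))  # (1/1 + 2/3) / 2
```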

25
Evaluation: Bilingual test (French-English)

French queries were translated into English using machine translation:
Power translator
Free translator
Babel fish

26
Evaluation: Bilingual test (French-English)
Bilingual results
Rank  Organization   AvgP
1     ITC-irst       0.3064
2     IR-n Alicante  0.3032
3     Exeter         0.2825
4     JHU/APL        0.1904

Test corpus: TREC SDR-9 collection
Parameters
Number of utterances per passage: 5
Pause length used to split utterances: 0.2 s
Model: with query expansion
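Comparing the monolingual and bilingual tables, one way to quantify robustness to translated queries is to express each group's bilingual AvgP as a fraction of its monolingual AvgP. The figures are copied from the two results tables. The ratio itself is a derived measure, not one reported in the presentation.

```python
monolingual = {"ITC-irst": 0.3944, "Exeter": 0.3824,
               "IR-n Alicante": 0.3637, "JHU/APL": 0.3184}
bilingual = {"ITC-irst": 0.3064, "Exeter": 0.2825,
             "IR-n Alicante": 0.3032, "JHU/APL": 0.1904}

# Fraction of monolingual effectiveness kept when queries are translated.
retention = {org: bilingual[org] / monolingual[org] for org in monolingual}
for org, r in sorted(retention.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{org:15s} {r:.1%}")
```

With these figures, IR-n Alicante keeps about 83% of its monolingual AvgP, the highest ratio of the four groups. This is consistent with the robustness claim in the conclusions.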

29
Conclusions and future work

Conclusions
The IR-n system is robust when working on the SDR task (+)
IR-n performance must be improved (-)
Future work
Reduce the noise produced by repetitions and modifications
Remove turns without semantic content
Evaluate and improve our utterance splitter
