Title: SIMILAR Review Slides
eNTERFACE'08, Project 2: Multimodal High-Level Data Integration
Mid-term presentation, August 19th, 2008
Team
- Olga Vybornova (Université catholique de Louvain, UCL-TELE, Belgium)
- Hildeberto Mendonça (Université catholique de Louvain, UCL-TELE, Belgium)
- Ao Shen (University of Birmingham, UK)
- Daniel Neiberg (TMH/CTT, KTH Royal Institute of Technology, Sweden)
- David Antonio Gomez Jauregui (TELECOM and Management SudParis, France)
Project objectives
- To extend and improve the previous work and look for new methods of data fusion
- To solve the problem of distinguishing between data from different modalities that should be fused and data that should not be fused but analyzed separately, and to implement a technique that does so
- To explore and employ a context-aware cognitive architecture for decision making
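The slides do not say how the system decides which data to fuse and which to analyze separately. As one hypothetical criterion, the sketch below fuses a speech interpretation with a video interpretation when their time spans overlap within a tolerance, and leaves unmatched events for separate analysis. The `Event` type, the `tolerance` value and the greedy first-match pairing are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical unimodal interpretation with a time span in seconds."""
    modality: str  # e.g. "speech" or "video"
    meaning: str
    start: float
    end: float


def overlaps(a: Event, b: Event, tolerance: float) -> bool:
    # Assumed criterion: the spans overlap once each end is relaxed by `tolerance`.
    return a.start <= b.end + tolerance and b.start <= a.end + tolerance


def partition(speech, video, tolerance=0.5):
    """Greedily pair co-occurring speech/video events; return (fused, separate)."""
    fused, separate, used = [], [], set()
    for s in speech:
        match = next((i for i, v in enumerate(video)
                      if i not in used and overlaps(s, v, tolerance)), None)
        if match is None:
            separate.append(s)  # nothing co-occurs: analyze on its own
        else:
            used.add(match)
            fused.append((s, video[match]))
    # Video events never paired with speech are also analyzed separately.
    separate.extend(v for i, v in enumerate(video) if i not in used)
    return fused, separate


speech = [Event("speech", "look at this", 1.0, 2.0),
          Event("speech", "hello", 10.0, 10.5)]
video = [Event("video", "pointing", 2.3, 3.0),
         Event("video", "walking", 20.0, 22.0)]
fused, separate = partition(speech, video)
print([(s.meaning, v.meaning) for s, v in fused])  # [('look at this', 'pointing')]
print([e.meaning for e in separate])               # ['hello', 'walking']
```

A purely temporal rule is the simplest possible choice; a real fusion mechanism would likely also check semantic compatibility (for example against the knowledge base) before merging.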
Background: Multimodality
Multimodality: a set of variables describing states of the world (user input, an object, an event, behavior, etc.) represented in different media and conveyed through different information channels.
Goal of data fusion: the result of the fusion (merging semantic content from multiple streams) should give an efficient joint interpretation of the multimodal behavior of the user(s), in order to provide effective and advanced interaction.
System architecture (diagram)
- Audio stream: sound waves → Speech Recognizer → recognized string → Syntactic Analyzer → syntactic triple → Semantic Analyzer → linguistic meanings
- Video stream: sequence of images → Video Analyzer → movement coordinates → Human Behavior Analyzer → movement meanings
- Fusion Mechanism: combines linguistic meanings and movement meanings, using the Knowledge Base, to advise people
System architecture with the tools used (diagram)
- Audio stream: sound waves → Sphinx-4 → recognized string → C&C Tools parser (syntax analysis) → C&C Tools Boxer → linguistic meanings
- Video stream: sequence of images → OpenCV → movement coordinates → Human Behavior Analyzer → movement meanings
- Fusion Mechanism: combines both, with semantic validation against the knowledge base (Protégé, Jena), to advise people
Integration
- All tools are integrated through socket communication
- C and Java components interoperate without problems
- The data interchange format is XML:
  - Verifiable
  - Easy data identification
  - Easy data compatibility
  - Low cost of manipulation
  - XML processed on demand
- Main concerns: transparency, extensibility and customization
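The slides state only that components exchange XML over sockets. A minimal sketch of such an exchange follows, assuming a 4-byte big-endian length prefix to delimit messages; that framing and the helper names `send_xml`, `recv_exact` and `recv_xml` are hypothetical, not the project's protocol. A connected socket pair stands in for two components (for instance the C video analyzer and a Java consumer).

```python
import socket
import struct
import xml.etree.ElementTree as ET


def send_xml(sock: socket.socket, element: ET.Element) -> None:
    """Serialize an element and send it with an assumed 4-byte length prefix."""
    payload = ET.tostring(element, encoding="utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_xml(sock: socket.socket) -> ET.Element:
    """Read one length-prefixed message and parse it as XML."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return ET.fromstring(recv_exact(sock, length))


# Two connected endpoints standing in for two system components.
a, b = socket.socketpair()
msg = ET.Element("position", obj="2", frame="1633", x="89", y="186", angle="234")
send_xml(a, msg)
received = recv_xml(b)
a.close()
b.close()
print(received.tag, received.get("x"))  # position 89
```

Length-prefixing is one common way to delimit messages on a stream socket; a newline or closing-tag delimiter would also work but requires care with XML that spans lines.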
Speech Recognition
- Sphinx-4
- Integrated into the system
- Fine-tuned for the maximum length of n-best lists
- Two language models created:
  - Scenario-dependent trigrams, 150-word vocabulary: 86.9% accuracy, 0.94× real time
  - Wall Street Journal trigrams, 5000-word vocabulary: 68.6% accuracy, 3.19× real time
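The slides do not define "accuracy" or "× real time". Assuming the usual meanings, word accuracy is 100 · (N − S − D − I) / N for N reference words with S substitutions, D deletions and I insertions, and the real-time factor is processing time divided by audio duration (so 0.94 means decoding is slightly faster than the audio plays). The sketch below computes both; the function names are hypothetical.

```python
def word_errors(reference: str, hypothesis: str):
    """Minimum substitutions + deletions + insertions turning reference into hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)], len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    errors, n = word_errors(reference, hypothesis)
    return 100.0 * (n - errors) / n


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1 mean decoding runs faster than the audio plays."""
    return processing_seconds / audio_seconds


ref = "yesterday i received an email from nick"
acc = word_accuracy(ref, "yesterday i received an email from nick to")
print(round(acc, 1))               # 85.7: one insertion over 7 words
rtf = real_time_factor(9.4, 10.0)  # e.g. 10 s of audio decoded in 9.4 s
print(round(rtf, 2))               # 0.94
```

Because insertions count as errors, word accuracy can drop below zero for very noisy hypotheses, unlike a simple "percent words correct" score.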
Speaker Identification
- Standard GMM-based speaker identification system
- Developed in Matlab
- Figure: identification results on a 2-person development set as a function of the number of Gaussians
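In a standard GMM-based identifier, each enrolled speaker has a Gaussian mixture model, and a test utterance is assigned to the speaker whose model gives the highest total log-likelihood over its feature frames. The sketch below shows only that scoring step, in Python rather than the Matlab used in the project; it assumes diagonal covariances and already-trained models, and the tiny 2-dimensional models and frames are made-up values, not real acoustic features (EM training and feature extraction such as MFCCs are not shown).

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log(sum_k w_k * N(x | mean_k, var_k)), via log-sum-exp."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(logs)
        total += top + math.log(sum(math.exp(lp - top) for lp in logs))
    return total


def identify(frames, models):
    """Return the speaker whose GMM gives the highest total log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


# Hypothetical 2-component models: (weight, mean vector, variance vector).
models = {
    "speaker1": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker2": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
}
test_frames = [[0.2, 0.1], [0.9, 1.2], [0.4, 0.6]]
print(identify(test_frames, models))  # speaker1
```

The number of mixture components per model is the parameter varied on the development set; more Gaussians fit each speaker more closely but need more enrollment data.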
Speech Recognition Output
    <sentence id="3" speaker="1">
      <heard>yesterday i received an email from nick</heard>
      <hypothesis id="0">yesterday i received an email from nick</hypothesis>
      <hypothesis id="2">yesterday i received an email from nick to</hypothesis>
      <hypothesis id="4">yesterday i received an email from nick for</hypothesis>
    </sentence>
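A downstream component such as the syntactic analyzer has to read this n-best output. The sketch below parses one `sentence` element with the standard library, assuming the structure shown on the slide; the `speaker="1"` attribute form is an assumption, since the extracted slide text is ambiguous there, and `read_nbest` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RESULT = """<sentence id="3" speaker="1">
  <heard>yesterday i received an email from nick</heard>
  <hypothesis id="0">yesterday i received an email from nick</hypothesis>
  <hypothesis id="2">yesterday i received an email from nick to</hypothesis>
  <hypothesis id="4">yesterday i received an email from nick for</hypothesis>
</sentence>"""


def read_nbest(xml_text: str):
    """Return (sentence id, heard text, list of (hypothesis id, text))."""
    root = ET.fromstring(xml_text)
    heard = root.findtext("heard", default="").strip()
    hyps = [(h.get("id"), (h.text or "").strip())
            for h in root.findall("hypothesis")]
    return root.get("id"), heard, hyps


sentence_id, heard, nbest = read_nbest(RESULT)
for hyp_id, text in nbest:
    print(hyp_id, text)
```

Keeping every hypothesis rather than only the top one lets later stages pick the alternative that best agrees with the ontology or the video evidence.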
Syntax and Semantics
Image Processing
- OpenCV library (open source)
- Motion history to calculate the direction of motion
- Template matching to identify objects in the scene
- Gaussian probability distribution to model the color of clothes
- Background subtraction to detect the foreground
- Blob identification to track people in the scene
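Two of the steps above, background subtraction and blob identification, can be sketched in a few lines. The version below is a pure-Python illustration on a tiny grayscale grid, assuming a running-average background model, a fixed difference threshold and 4-connected components; it is not the OpenCV implementation used in the project, and `alpha`, `threshold` and `min_size` are illustrative parameters.

```python
from collections import deque


def update_background(background, frame, alpha=0.05):
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def find_blobs(mask, min_size=2):
    """Label 4-connected foreground regions; return (size, (cx, cy)) per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:  # drop isolated noise pixels
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                blobs.append((len(pixels), (cx, cy)))
    return blobs


background = [[10] * 6 for _ in range(5)]
frame = [row[:] for row in background]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2), (4, 5)]:
    frame[y][x] = 200  # a 2x2 "person" plus one noisy pixel
blobs = find_blobs(foreground_mask(background, frame))
print(blobs)  # [(4, (1.5, 1.5))]
background = update_background(background, frame)
```

Tracking then amounts to matching blob centroids between consecutive frames, which is the kind of per-frame x/y data the image-processing output reports.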
Image Processing Output
    <positions>
      <position obj="1" frame="1633" x="58" y="58" angle="10"/>
      <position obj="2" frame="1633" x="89" y="186" angle="234"/>
      <position obj="1" frame="1634" x="58" y="58" angle="10"/>
      <position obj="2" frame="1634" x="82" y="190" angle="232"/>
      <position obj="1" frame="1635" x="58" y="58" angle="10"/>
      <position obj="2" frame="1635" x="74" y="196" angle="232"/>
      <position obj="1" frame="1636" x="58" y="58" angle="10"/>
      <position obj="2" frame="1636" x="74" y="196" angle="232"/>
    </positions>
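The Human Behavior Analyzer consumes these per-frame positions. As a first step, one might group them into per-object tracks and measure how far each object moved. The sketch below does that for the sample output; the helper names are hypothetical, and `angle` is kept but not interpreted, since the slides do not define it (it may be the motion direction in degrees from the motion-history step).

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict

POSITIONS = """<positions>
  <position obj="1" frame="1633" x="58" y="58" angle="10"/>
  <position obj="2" frame="1633" x="89" y="186" angle="234"/>
  <position obj="1" frame="1634" x="58" y="58" angle="10"/>
  <position obj="2" frame="1634" x="82" y="190" angle="232"/>
  <position obj="1" frame="1635" x="58" y="58" angle="10"/>
  <position obj="2" frame="1635" x="74" y="196" angle="232"/>
  <position obj="1" frame="1636" x="58" y="58" angle="10"/>
  <position obj="2" frame="1636" x="74" y="196" angle="232"/>
</positions>"""


def tracks_from_xml(xml_text: str):
    """Group (frame, x, y, angle) tuples by object id, ordered by frame."""
    tracks = defaultdict(list)
    for p in ET.fromstring(xml_text).iter("position"):
        tracks[p.get("obj")].append(
            (int(p.get("frame")), int(p.get("x")), int(p.get("y")), int(p.get("angle"))))
    for track in tracks.values():
        track.sort()
    return dict(tracks)


def displacement(track):
    """Straight-line distance between the first and last tracked position."""
    _, x0, y0, _ = track[0]
    _, x1, y1, _ = track[-1]
    return math.hypot(x1 - x0, y1 - y0)


tracks = tracks_from_xml(POSITIONS)
for obj, track in sorted(tracks.items()):
    print(obj, round(displacement(track), 1))  # prints "1 0.0" then "2 18.0"
```

In this sample, object 1 is static (plausibly a fixed object found by template matching) while object 2 moves between frames 1633 and 1635 and then stops.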
Ontology
- Restricted-domain ontology structure and its instantiation
- Pattern situations (semantic frames)
- User profile: information collected a priori about users (preferences, social relationships, etc.) together with dynamically obtained data
- Protégé is used to create and edit the ontology
- Jena is used to manage the ontology data
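To make "pattern situations" concrete, the sketch below stores hypothetical user-profile facts as subject-predicate-object triples and matches a frame, a list of triple patterns with `?variables`, by backtracking over the facts. It only illustrates the idea in Python: it is not Jena's API or SPARQL, and every fact, predicate and variable name is invented for the example.

```python
# Hypothetical user-profile facts as (subject, predicate, object) triples.
FACTS = {
    ("nick", "friendOf", "olga"),
    ("olga", "prefersChannel", "email"),
    ("nick", "colleagueOf", "ao"),
}


def query(facts, pattern):
    """Return sorted triples matching pattern; None in a slot matches anything."""
    return sorted(t for t in facts
                  if all(p is None or p == v for p, v in zip(pattern, t)))


def match_frame(facts, frame, binding=None):
    """Return every variable binding that satisfies all triple patterns in frame."""
    binding = binding or {}
    if not frame:
        return [binding]
    first, rest = frame[0], frame[1:]
    results = []
    for fact in sorted(facts):
        new, ok = dict(binding), True
        for slot, value in zip(first, fact):
            if slot.startswith("?"):
                # A variable must keep the same value across all patterns.
                if new.setdefault(slot, value) != value:
                    ok = False
                    break
            elif slot != value:
                ok = False
                break
        if ok:
            results.extend(match_frame(facts, rest, new))
    return results


# Hypothetical frame: "a user has a friend who prefers some channel".
FRAME = [("?user", "friendOf", "?friend"), ("?friend", "prefersChannel", "?channel")]
print(match_frame(FACTS, FRAME))
```

A binding such as `?channel = email` is the kind of context that a decision step could use when choosing how to advise a user.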
Project schedule
- Overall progress: 65%
- WP1 Workshop preparation: done
- WP2 Integration of multimodal components: done
- WP3 Multimodal fusion implementation: running
- WP4 Scenario implementation and reporting: to do
- Strategic changes to achieve the goal:
  - Everybody focuses on the fusion mechanism
  - Lower priority on improving the individual modalities
  - Each risky task has an associated plan B that is less time-consuming, but also less robust
Next Steps
- Integration of WordNet into the ontology
- Rules to process human behavior
- Mapping the semantic analysis onto the ontology
- Fusion mechanism