Title: ICANN 2006
1Affect Recognition
Artificial Intelligence Information Analysis Laboratory
Department of Informatics, Aristotle University of Thessaloniki, GREECE
www.aiia.csd.auth.gr
Ioannis Pitas, Constantine Kotropoulos, Nikos Nikolaidis
e-mail: pitas,costas,nikolaid_at_aiia.csd.auth.gr
Contributors: Dimitrios Ververidis, Irene Kotsia, Margarita Kotti, Vasiliki Moschou
ICANN 2006, Athens, Greece, 10-14 September 2006
2Scientific Research Competence
- Signal and Image Processing
- Speech processing
- Physiological signal processing
- Facial image and video processing
- Computer Graphics
- Human body posture analysis
- Virtual reality
- Computational Intelligence
- Pattern recognition
- Machine learning
3Speech Processing
- Detect the start and end of the utterance
- Short-term feature extraction
- Interpolation for obtaining contours
[Figure: contour segment labels. 1 = plateaux at maxima, 0.5 = rising slopes, 0 = silence, -0.5 = falling slopes, -1 = plateaux at minima]
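The first two steps above can be illustrated with a minimal sketch. It assumes short-term energy as the feature and a relative energy threshold for endpoint detection; the slides do not say which features or detector were used, and the frame length, hop size, threshold ratio, and 8 kHz sampling rate are all hypothetical choices.

```python
import math

def short_term_energy(signal, frame_len=200, hop=100):
    """Mean squared amplitude of each overlapping frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def detect_endpoints(energies, ratio=0.1):
    """First and last frame whose energy exceeds ratio * peak energy.

    Returns None when the signal is completely silent.
    """
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return None
    active = [i for i, e in enumerate(energies) if e > ratio * peak]
    return active[0], active[-1]

# Synthetic "utterance": silence, a 200 Hz tone, silence (assumed 8 kHz sampling).
fs = 8000
silence = [0.0] * 2000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(4000)]
signal = silence + tone + silence

energies = short_term_energy(signal)
first, last = detect_endpoints(energies)
print("utterance spans frames", first, "to", last)
```

A contour for any other short-term feature (pitch, for instance) could be obtained the same way by replacing the per-frame energy with that feature and interpolating across frames where it is undefined.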
4Physiological Signal Processing
- Sweat (Galvanic Skin Response)
- Heart Beat Rate
[Figure labels: Finger Pressure, Samples, Period, Freq.]
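One way to go from a periodic pressure signal to a heart beat rate is to estimate the period in samples and convert it to beats per minute. The sketch below assumes an autocorrelation-based period estimate on a synthetic sinusoidal pulse; the slides do not describe the actual method, and the sampling rate and search range (40 to 180 beats per minute) are hypothetical.

```python
import math

def estimate_period(signal, min_lag, max_lag):
    """Lag (in samples) that maximises the autocorrelation of the mean-removed signal."""
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n_terms = len(x) - lag
        score = sum(x[n] * x[n + lag] for n in range(n_terms)) / n_terms
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

fs = 100        # sampling rate in Hz (assumed)
true_bpm = 75   # rate of the synthetic pulse, used only to build test data
pulse = [math.sin(2 * math.pi * (true_bpm / 60) * n / fs) for n in range(1000)]

# Search only lags that correspond to 180 down to 40 beats per minute.
period = estimate_period(pulse, min_lag=fs * 60 // 180, max_lag=fs * 60 // 40)
bpm = 60 * fs / period
print("estimated rate:", bpm, "beats per minute")
```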
5Facial Image/Video Processing
6Classifier Design
- Mixtures of Gaussians
- Expectation Maximization algorithm
- 2-D Example
[Figure: "Sum Log-Likelihood function". Blue: samples. Red: total likelihood between 0.7 and 0.75. Green: partial probability in 0.8 to 0.85]
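As a rough illustration of Expectation Maximization for a mixture of Gaussians, the sketch below fits a two-component mixture. It is simplified to one dimension (the slide shows a 2-D example), and the synthetic data, initial values, and iteration count are assumptions made for the example. The log-likelihood recorded at each iteration should never decrease, which is the property the log-likelihood plot on the slide is tracking.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM.

    Returns the fitted parameters and the log-likelihood at each iteration.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(loglik)
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against a collapsed component
    return means, variances, weights, history

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [random.gauss(5.0, 1.0) for _ in range(200)]
means, variances, weights, history = em_gmm_1d(
    data, means=[min(data), max(data)], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print("means:", means, "weights:", weights)
```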
7Probability of correct classification
Probability of correct classification for the Bayes classifier
Class distributions: single Gaussian densities
[Figure, panels a) to e): real model; simulated database (500 samples), b); simulated design set (450 samples), c); simulated test set (50 samples), d). Legend: red = using the real model, blue = dispersion when cross-validation is used]
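The experiment outlined on this slide can be sketched as follows, under several assumptions: two equiprobable classes, one-dimensional features, and hypothetical class means and variances for the "real model". A database of 500 samples is split ten times into a 450-sample design set and a 50-sample test set, and the spread of the correct classification rate across the folds is compared with the rate obtained when the real model parameters are used directly.

```python
import math
import random
import statistics

def log_gauss(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def fit(design):
    """Estimate (prior, mean, variance) of each class from (x, label) pairs."""
    model = {}
    for label in {y for _, y in design}:
        xs = [x for x, y in design if y == label]
        model[label] = (len(xs) / len(design), statistics.fmean(xs), statistics.pvariance(xs))
    return model

def classify(model, x):
    """Bayes decision: pick the class with the largest log posterior."""
    return max(model, key=lambda c: math.log(model[c][0]) + log_gauss(x, model[c][1], model[c][2]))

def correct_rate(model, test):
    return sum(classify(model, x) == y for x, y in test) / len(test)

random.seed(1)
# Hypothetical "real model": class -> (prior, mean, variance).
real_model = {0: (0.5, 0.0, 1.0), 1: (0.5, 2.0, 1.0)}
database = [
    (random.gauss(real_model[y][1], math.sqrt(real_model[y][2])), y)
    for y in (0, 1)
    for _ in range(250)
]
random.shuffle(database)

fold_rates = []
for k in range(10):
    test = database[k * 50:(k + 1) * 50]                   # 50 test samples
    design = database[:k * 50] + database[(k + 1) * 50:]   # 450 design samples
    fold_rates.append(correct_rate(fit(design), test))

cv_mean = statistics.fmean(fold_rates)
cv_spread = statistics.stdev(fold_rates)
real_rate = correct_rate(real_model, database)
print("cross-validation:", cv_mean, "+/-", cv_spread, "| real model:", real_rate)
```

With only 50 test samples per fold, the per-fold rates scatter noticeably around the value obtained with the real model, which is the dispersion the figure legend refers to.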
8Feature Selection
- Optimum feature set selection
- Sequential and Floating Sequential algorithms
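A minimal sketch of plain sequential forward selection is given below. The criterion used here (resubstitution accuracy of a nearest-class-mean classifier) and the synthetic three-feature data are assumptions chosen to keep the example short; the slides do not state which criterion was used. The floating variant additionally tries to remove previously selected features after each inclusion, which is not shown.

```python
import random

def nearest_mean_rate(samples, labels, subset):
    """Criterion: accuracy of a nearest-class-mean classifier on the chosen features."""
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        rows = [s for s, y in zip(samples, labels) if y == c]
        means[c] = [sum(r[f] for r in rows) / len(rows) for f in subset]
    correct = 0
    for s, y in zip(samples, labels):
        dist = {c: sum((s[f] - m) ** 2 for f, m in zip(subset, means[c])) for c in classes}
        correct += min(dist, key=dist.get) == y
    return correct / len(samples)

def sequential_forward_selection(samples, labels, n_select, criterion=nearest_mean_rate):
    """Greedily add the feature that most improves the criterion."""
    selected = []
    remaining = list(range(len(samples[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: criterion(samples, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

random.seed(2)
labels = [0] * 100 + [1] * 100
# Feature 1 separates the classes; features 0 and 2 are pure noise.
samples = [[random.gauss(0, 1), random.gauss(3 * y, 1), random.gauss(0, 1)] for y in labels]

selected = sequential_forward_selection(samples, labels, n_select=2)
print("selected features:", selected)
```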
9Selected Related Publications
- D. Ververidis and C. Kotropoulos, "Emotional speech recognition: Resources, features, methods, and applications," Speech Communication, vol. 48, no. 9, pp. 1162-1181, Sep. 2006.
- I. Kotsia and I. Pitas, "Facial Expression Recognition in Image Sequences using Geometric Deformation Features and Support Vector Machines," IEEE Transactions on Image Processing, accepted 2006.
- D. Ververidis, C. Kotropoulos, and I. Pitas, "Automatic Emotional Speech Classification," in Proc. 2004 Int. Conf. Acoustics, Speech, and Signal Processing, vol. 1, pp. 593-596, 2004.
- D. Ververidis and C. Kotropoulos, "Fast sequential floating forward selection applied to emotional speech features estimated on DES and SUSAS data collections," in Proc. XIV European Signal Processing Conf., Florence, September 2006.
- I. Kotsia and I. Pitas, "Real time facial expression recognition from image sequences using Support Vector Machines," in Proc. 2005 IEEE Int. Conf. Image Processing, Genova, Italy, 11-14 September 2005.
- V. Moschou, D. Ververidis, and C. Kotropoulos, "On the variants of the self-organizing map that are based on order statistics," in Proc. 2006 Int. Conf. Artificial Neural Networks, Athens, Sep. 2006.
- C. Kotropoulos and V. Moschou, "Self Organizing Maps for Reducing the Number of Clusters by One on Simplex Subspaces," in Proc. 2006 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 5, pp. 725-728, May 2006.
- M. Kotti, C. Kotropoulos, B. Ziolko, I. Pitas, and V. Moschou, "A framework for dialogue detection in movies," in Proc. Int. Workshop Multimedia Content Representation, Classification, and Security, Istanbul, Sep. 2006.
10Related Projects Current Research Activities
- National projects
  - Use of Virtual Reality for training pupils to deal with earthquakes
  - Multimodal emotion recognition in call centers
  - Information organization, browsing, and retrieval in multimedia
- European projects
  - VISNET-NOE www.visnet-noe.org
  - MUSCLE-NOE www.muscle-noe.org
  - SIMILAR-NOE
11Video genre classification
12Immersion assessment in Virtual Reality
13Service quality assessment in Call Centers
14Multimodal emotion recognition at call centers
- Why?
  - It is a method to estimate customer needs and adapt to them for a better system deployment.
  - If a negative emotion (for example, irritation or anxiety) is detected in a customer, the call is diverted to a human agent.
- Multimodal?
  - A fusion of audio and video information channels can lead to better emotion recognition results.
- Motivation
  - MMS is a killer application of 3rd generation cell phones.
  - Videophones are also becoming widely used.
  - Computers with webcams are commonplace.
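The routing rule and the audio-video fusion described above can be sketched as a simple late-fusion scheme. Everything here is an assumption for illustration: the weighted average of per-emotion probabilities, the equal channel weights, the 0.5 decision threshold, and the emotion labels are hypothetical and not taken from the slides.

```python
NEGATIVE = {"irritation", "anxiety"}  # negative emotions named on the slide

def fuse(audio_probs, video_probs, audio_weight=0.5):
    """Weighted average of per-emotion probabilities from the two channels."""
    emotions = audio_probs.keys() & video_probs.keys()
    fused = {
        e: audio_weight * audio_probs[e] + (1 - audio_weight) * video_probs[e]
        for e in emotions
    }
    total = sum(fused.values())
    return {e: p / total for e, p in fused.items()}

def route_call(audio_probs, video_probs, threshold=0.5):
    """Divert to a human agent when the fused negative-emotion probability is high."""
    fused = fuse(audio_probs, video_probs)
    negative_mass = sum(p for e, p in fused.items() if e in NEGATIVE)
    return "human agent" if negative_mass >= threshold else "automated system"

# Hypothetical classifier outputs for one call.
audio = {"neutral": 0.5, "irritation": 0.4, "anxiety": 0.1}
video = {"neutral": 0.2, "irritation": 0.6, "anxiety": 0.2}
print(route_call(audio, video))
```

In this example the audio channel alone would sit exactly on the threshold, while the fused estimate places more weight on irritation, which illustrates how a second channel can change the routing decision.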