Ubiquitous Home: Retrieval of Experiences in a Home Environment

Provided by: csieN


1
Ubiquitous Home: Retrieval of Experiences in a
Home Environment

Gamhewage C. DE SILVA
Toshihiko YAMASAKI
Kiyoharu AIZAWA

2
Outline

Introduction
Ubiquitous Home
Sensors and Data Acquisition
Data Collection
Retrieval
Footstep segmentation, Video and Audio Handover
Key Frame Extraction, Audio Segmentation
User Interaction
User Study
Discussion
Future Work

3
Introduction
4
Introduction

Automated capture of experiences taking place at
home is interesting.
Ex.: a child's first steps
Some events are so important that people strongly
want to take part in the experience themselves,
rather than carry a camera and shoot photos.

5
Introduction

Capture and retrieval of experiences in a
home-like environment is extremely difficult.
A large number of cameras and microphones
Continuous recording results in a very large
amount of data
Level of privacy
Most difficult: retrieval and summarization of
captured data
Queries for retrieval could be at very different
levels of complexity

6
Introduction

Multimedia retrieval for ubiquitous environments
based solely on content analysis is neither
efficient nor accurate
Make use of supplementary data from other sensors
for easier retrieval
Ex.: proximity sensors, domain knowledge

7
Introduction

The research combines two main areas
Ubiquitous Environment
Multimedia Retrieval

8
Ubiquitous Environment

Providing services to the people in the
environment by detecting and recognizing their
actions.
Storing and retrieval of media, in different
levels from photos to experiences

9
Multimedia Retrieval

Common approach is content analysis
The use of context data where available can
improve the performance greatly

10
Main work of this article

Capture and retrieval of personal experiences
in a ubiquitous environment that simulates a
home.
Create an electronic chronicle of captured video,
retrieved using interactive queries
Main data: video and audio
Context data from pressure-based floor sensors is
used to achieve fast and effective retrieval and
summarization of video and audio data.
Audio analysis and segmentation are used to
complement context-based retrieval.

11
Ubiquitous Home
12
Ubiquitous Home

Sensors and Data Acquisition
Data Collection

13
Sensors and Data Acquisition

Layout of ubiquitous home.

14
Sensors and Data Acquisition

Images are recorded at a rate of five frames per
second and stored in JPEG format.
Audio is sampled at 44.1 kHz from each microphone
and recorded as 1-minute clips in MP3 format.
The floor sensors are point-based pressure
sensors spaced 180 mm apart in a rectangular grid.
The sampling rate is 6 Hz.
Each sensor starts in state 0 and switches to
state 1 when the pressure exceeds a threshold.
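
The two-state sensor model can be sketched as below. This is only an illustration: the threshold value, the units of the readings, and the function names are assumptions, since the slides give only the 6 Hz sampling rate and the 0/1 states.

```python
def sensor_states(readings, threshold):
    """Map raw pressure readings to states: 1 above the threshold, else 0."""
    return [1 if r > threshold else 0 for r in readings]


def activations(states, sample_rate_hz=6):
    """Return (start_s, end_s) intervals during which a sensor stays in state 1."""
    intervals = []
    start = None
    for i, s in enumerate(states):
        if s == 1 and start is None:
            start = i                      # activation begins
        elif s == 0 and start is not None:
            intervals.append((start / sample_rate_hz, i / sample_rate_hz))
            start = None                   # activation ends
    if start is not None:                  # still active at the end of the data
        intervals.append((start / sample_rate_hz, len(states) / sample_rate_hz))
    return intervals
```

The resulting activation intervals are the kind of input that footstep segmentation would work on.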

15
Data Collection

Student experiment
Acquiring training data for actions and events
Audio data was not available during the
experiment
Real-life experiment
No manual monitoring of video was performed
during the experiment
Processing and analysis were performed offline

16
Retrieval
17
Retrieval

Footstep Segmentation
Video Handover
Audio Handover
Key Frame Extraction
Audio Segmentation for Retrieval

18
Retrieval

Only a few data sources will convey useful
information at any given time.
Automatically select the sources that convey the
most information, based on context data.
Only the selected sources will be queried to
retrieve data and these data will be analyzed
further for retrieval.

19
Retrieval
20
Footstep Segmentation

Noise
Occurs when there are footsteps on adjacent
sensors (very short duration)
Occurs when a relatively small weight, such as a
leg of a stool, is placed on a sensor (periodic)
Kohonen Self-Organizing Maps (SOM)

21
Footstep Segmentation

A three-stage Agglomerative Hierarchical
Clustering (AHC) algorithm is used to segment
sensor activations into footstep sequences of
different persons

22
Agglomerative Hierarchical Clustering algorithm

First stage
Combine sensor activations into single footsteps
The distance function for clustering is based on
connectedness and overlap of duration
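
A minimal sketch of the first stage, under stated assumptions: the slides do not give the actual distance function, so here two activations are treated as "close" when their grid cells are neighbours (connectedness) and their time intervals overlap. Activations are hypothetical tuples (x, y, start, end) in grid coordinates and seconds.

```python
def connected(a, b):
    """Assumed connectedness: same cell or one of the 8 neighbouring cells."""
    return abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1


def overlaps(a, b):
    """True if the two activation intervals share any time."""
    return a[2] <= b[3] and b[2] <= a[3]


def cluster_footsteps(acts):
    """Agglomeratively merge activations (x, y, start, end) into footsteps."""
    clusters = [[a] for a in acts]          # start with one cluster per activation
    merged = True
    while merged:                           # repeat until no pair can be merged
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(connected(a, b) and overlaps(a, b)
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters
```

With a binary notion of distance like this, the procedure amounts to single-linkage merging; a graded distance with a cut-off would follow the same loop.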

23
Agglomerative Hierarchical Clustering algorithm

Second stage
Combine footsteps into path sequences based on
physiological constraints
Ex.: range of distance between steps, overlap of
duration of two steps, constraints on direction
change

24
Agglomerative Hierarchical Clustering algorithm

Third stage
Compensate for the fragmentation of individual
paths due to the absence of sensors in some areas
Uses start and end timestamps, locations of the
doors and furniture, and information about places
where floor sensors are not installed

25
Agglomerative Hierarchical Clustering algorithm
26
Footstep Segmentation

Errors
Some paths are still fragmented after clustering
in the third stage
There are some cases of swapping in paths between
two persons when they walk close to each other

27
Video Handover

Select cameras in a way that a good video
sequence can be constructed.
Position-based handover
Based on a simple view model, where the viewable
region of each camera is specified in terms of
floor sensor coordinates.

28
Position-based handover

Create a video sequence that has the minimum
possible number of shots.
If the person can be seen from the previous
camera, then that camera is selected.
Otherwise, the viewable regions for the cameras
are examined in a predetermined order and the
first match is selected.
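
The selection rule above can be sketched as follows. The data layout is an assumption made for illustration: each camera's viewable region is a set of floor sensor coordinates, and the camera names and examination order are hypothetical.

```python
def select_cameras(path, view_regions, camera_order):
    """Choose one camera per position on a person's path.

    path: list of (x, y) floor sensor coordinates.
    view_regions: dict mapping camera name -> set of visible (x, y) coordinates.
    camera_order: predetermined order in which cameras are examined.
    """
    selected = []
    current = None
    for pos in path:
        # Keep the previous camera while it still sees the person.
        if current is None or pos not in view_regions[current]:
            # Otherwise take the first camera, in the fixed order, that sees the position.
            current = next((c for c in camera_order if pos in view_regions[c]), None)
        selected.append(current)
    return selected
```

Staying with the previous camera whenever possible is what keeps the number of shots small; positions that no camera sees yield None in this sketch.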

29
Position-based handover
(1) The change of color of the arrow indicates
how the camera changes with the position of the
person.
(2) It is possible to acquire a frontal view due
to the positioning and orientation of the cameras.
30
Audio Handover

Dub the video sequences
It is not necessary to use all of the microphones,
since a microphone can cover a larger region than
a camera

31
Audio Handover

Each camera is associated with one microphone for
audio retrieval.
Camera installed in a room
Audio is taken from the microphone located in the
center of that room
Camera installed in the corridor
Audio is taken from the microphone closest to the
center of the region seen by that camera
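
For a corridor camera, the microphone choice could be sketched as below. Taking the "center of the region" to be the centroid of the viewable floor sensor coordinates is an assumption, as are the microphone names and positions.

```python
import math


def region_center(region):
    """Centroid of a camera's viewable region (assumed meaning of 'center')."""
    xs = [p[0] for p in region]
    ys = [p[1] for p in region]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def closest_microphone(region, mic_positions):
    """Return the name of the microphone nearest to the center of the region.

    mic_positions: dict mapping microphone name -> (x, y) position.
    """
    cx, cy = region_center(region)
    return min(
        mic_positions,
        key=lambda m: math.hypot(mic_positions[m][0] - cx, mic_positions[m][1] - cy),
    )
```

Because the camera-to-microphone association is fixed, it can be computed once and stored as a lookup table.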

32
Audio Handover
(1) Minimize transitions between microphones (2)
Uniform amplitude level
33
Video and Audio Handover
34
Key Frame Extraction

The video sequence constructed using video
handover has to be sampled to extract key frames.
For a complete and compact summary:
Minimize the number of redundant key frames while
ensuring that important key frames are not missed

35
Key Frame Extraction
T is a constant time interval.
36
Key Frame Extraction

Adaptive spatio-temporal sampling algorithm
The time interval for sampling the next key frame
is reduced with each footstep, thereby sampling
more key frames when there are more footsteps
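
One way to read the adaptive sampling idea is sketched below. The slides do not give the exact update rule, so the fixed reduction per footstep, the minimum interval, and all parameter values are assumptions; only the constant interval T comes from the slides.

```python
def adaptive_key_frames(footstep_times, duration, T=10.0, reduction=2.0,
                        min_interval=1.0):
    """Return key frame timestamps (seconds) for one video sequence.

    Without footsteps, a key frame is taken every T seconds. Each footstep
    that occurs before the next scheduled key frame brings that key frame
    forward by `reduction` seconds (assumed rule).
    """
    steps = sorted(footstep_times)
    key_frames = [0.0]
    i = 0
    while True:
        last = key_frames[-1]
        deadline = last + T
        while i < len(steps) and steps[i] < deadline:
            # Never schedule before the footstep itself or closer than min_interval.
            deadline = max(steps[i], last + min_interval, deadline - reduction)
            i += 1
        if deadline > duration:
            break
        key_frames.append(deadline)
    return key_frames
```

With no footsteps this reduces to sampling at the constant interval T; bursts of footsteps pull key frames closer together.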

37
Key Frame Extraction

Evaluation
The subjects extracted key frames from four video
clips according to their own choice.
Average key frame sets were created and used as
ground truth for evaluation
The subjects voted for the key frame set that
summarized the sequence best.

38
Key Frame Extraction
39
Key Frame Extraction
40
Audio Segmentation for Retrieval

The floor sensors are unable to capture data when
people are not treading on a floor area with
sensors.
They are not activated if the pressure on the
sensors is not sufficiently large.
Audio-based retrieval can also be conducted
independently to support various types of queries.

41
Audio Segmentation for Retrieval

The amount of audio to be processed is quite
large.
Trade-off
Utilizing the redundancy to improve the accuracy
of retrieval
Minimizing processing by removing redundancy

42
Audio Segmentation for Retrieval

Eliminate audio corresponding to silence.
Compare the RMS power of the audio signal against
a threshold value.
RMS (Root Mean Square) is a statistical measure of
the magnitude of a varying quantity.

43
Audio Segmentation for Retrieval

One-hour audio clips were extracted from
different times of day.
These clips were partitioned into frames of
300 samples.
Adjacent frames had a 50% overlap.
The RMS value of each frame was calculated and
recorded, and statistics were obtained for each
clip.
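
The framing step can be sketched as below, assuming the audio is already decoded into a list of numeric samples and that the overlap is 50% of the frame length (a hop of 150 samples for 300-sample frames). The function name is hypothetical.

```python
import math


def frame_rms(samples, frame_len=300, overlap=0.5):
    """Return the RMS value of each overlapping frame of the signal."""
    hop = int(frame_len * (1 - overlap))    # 150 samples for the defaults
    values = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # RMS = square root of the mean of the squared sample values.
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values
```

Samples left over at the end that do not fill a whole frame are ignored in this sketch.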

44
Audio Segmentation for Retrieval

The probability distributions of the RMS values
for different audio clips were not significantly
different.
They were combined into a single probabilistic
model for silence and noise

45
Audio Segmentation for Retrieval

The threshold for each microphone is estimated by
analyzing audio data for silence and noise for
that microphone.
The threshold value was selected at the 99% level
of confidence according to this distribution.
Below 100% because false negatives (sound
misclassified as silence) are more costly than
false positives (silence misclassified as sound).
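
A sketch of the threshold selection, with one substitution stated plainly: instead of the probabilistic model mentioned in the slides, it uses an empirical percentile of RMS values measured on silence and noise for one microphone. The function name and percentile rule are assumptions.

```python
import math


def silence_threshold(noise_rms, confidence_percent=99):
    """Smallest recorded RMS value that at least `confidence_percent` percent
    of the silence/noise frames do not exceed (empirical percentile)."""
    ordered = sorted(noise_rms)
    index = math.ceil(confidence_percent * len(ordered) / 100) - 1
    return ordered[max(index, 0)]
```

Choosing 99 rather than 100 keeps the threshold lower, so fewer frames containing real sound fall below it, at the price of letting some noise through.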

46
Silence Elimination

First stage: based on individual microphones
If the RMS value of a frame is larger than the
threshold, the frame is considered to contain
sound.
Sets of contiguous frames with a duration of less
than 0.1 s are removed.
Sets of contiguous frames less than 0.5 s apart
are combined to form a single segment.
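
The first stage can be sketched as below. The 0.1 s and 0.5 s limits come from the slides; the function name, the way frame timing is passed in (hop_s between frame starts, frame_s per frame), and the order "remove, then merge" are assumptions.

```python
def sound_segments(rms_values, threshold, hop_s, frame_s,
                   min_duration=0.1, max_gap=0.5):
    """Return (start_s, end_s) sound segments for one microphone."""
    # 1. Find contiguous runs of frames whose RMS exceeds the threshold.
    runs = []
    start = None
    for i, v in enumerate(rms_values):
        if v > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start * hop_s, (i - 1) * hop_s + frame_s))
            start = None
    if start is not None:
        runs.append((start * hop_s, (len(rms_values) - 1) * hop_s + frame_s))
    # 2. Remove runs shorter than min_duration.
    runs = [(s, e) for s, e in runs if e - s >= min_duration]
    # 3. Combine runs that are less than max_gap apart.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```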

47
Silence Elimination

Second stage: based on multiple microphones in
close proximity, to reduce false positives.
For each microphone
B(n): binary sound segment function
C(n): cumulative sound segment function

48
Silence Elimination

Binary sound segment function
B(n) = 1 if there is sound in the n-th second of
the audio stream
B(n) = 0 otherwise
For the set of microphones in the same room

49
Silence Elimination

Noise
Random
Noise in sound segments from different
microphones is unlikely to occur simultaneously.
Short duration

50
Silence Elimination

Voting algorithm to determine the sound segment
function S(n)
S(n) = 1 if (C * M)(n) > ceil(k/2), where * denotes convolution
S(n) = 0 otherwise
M(n) = {1, 1, 1}
k = number of microphones installed in the location
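
The voting rule can be sketched as below. Two points are assumptions because the slides do not show them: C(n) is taken to be the sum of the per-microphone B(n) values, and the convolution with M is centered on n with zero padding at the ends.

```python
import math


def sound_segment_function(binary_functions):
    """Combine per-microphone B(n) lists (one per microphone) into S(n)."""
    k = len(binary_functions)               # number of microphones
    length = len(binary_functions[0])
    # Assumed: C(n) counts the microphones that report sound in second n.
    c = [sum(b[n] for b in binary_functions) for n in range(length)]
    # Convolution with M = {1, 1, 1}: sum of C over seconds n-1, n, n+1.
    smoothed = [sum(c[m] for m in range(max(0, n - 1), min(length, n + 2)))
                for n in range(length)]
    limit = math.ceil(k / 2)
    return [1 if v > limit else 0 for v in smoothed]
```

In this sketch a sound reported by a single microphone for a single second does not pass the vote, which matches the idea that noise is random and short.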

51
Audio Segmentation for Retrieval

Video is retrieved from all cameras in the room
for each sound segment.
The video created by handover is extended to
include the time during which sounds were present
before the start of the footstep sequence

52
User Interaction
53
User Interaction
54
User Study-Real-Life Experiment
55
User Study

1st: requirements study
2nd:
Given a demonstration of how to use the system
Submit their own queries
Select video clips that they would like to keep
3rd: feedback about the system

56
Discussion
57
Discussion

Issues Related to Capture
Algorithm for Retrieval
Real-Life Experiment

58
Issues Related to Capture

Continuous capture
The research was carried out at a different
location from the home-like environment.
Experiments with families are quite difficult to
arrange, and the cost of losing important data due
to algorithms with insufficient accuracy is quite
high.
Problem: a large amount of disk space is required

59
Issues Related to Capture

Some of the microphones seem to be redundant, given
their range and directivity.
Save disk space
Floor sensors are more expensive and difficult to
maintain
Movement of furniture

60
Algorithm for Retrieval

The accuracy of footstep segmentation
deteriorates when the number of persons in the
house is large and when furniture is moved
Video handover can be improved by considering
occlusion by other persons when selecting the
camera.
For audio handover, smoother transitions are
possible by looking for silence near the point of
microphone change.

61
Algorithm for Retrieval

Key frame extraction
Human-human and human-object interaction
Audio-based video retrieval will retrieve false
results if the house is located in a place where
loud sounds can enter the house from outside

62
Real-Life Experiment

The subjects in the student experiment were
independent in their actions.
The family in the real-life experiment behaved
as a group.
As a result, the accuracy of footstep
segmentation decreased.

63
Future Work
64
Future Work