Ubiquitous Home: Retrieval of Experiences in a Home Environment

1
Ubiquitous Home: Retrieval of Experiences in a
Home Environment
  • Gamhewage C. DE SILVA
  • Toshihiko YAMASAKI
  • Kiyoharu AIZAWA

2
Outline
  • Introduction
  • Ubiquitous Home
  • Sensors and Data Acquisition
  • Data Collection
  • Retrieval
  • Footstep segmentation, Video and Audio Handover
  • Key frame Extraction, Audio Segmentation
  • User Interaction
  • User Study
  • Discussion
  • Future Work

3
Introduction
4
Introduction
  • Automated capture of experiences taking place at
    home is interesting.
  • Ex. the first footsteps of a child
  • Some events are so important that people want to
    take part in the experience themselves, rather
    than carry a camera and shoot photos.

5
Introduction
  • Capture and retrieval of experiences in a
    home-like environment is extremely difficult.
  • Large number of cameras and microphones
  • Continuous recording results in a very large
    amount of data
  • Level of privacy
  • Most difficult
  • Retrieval and summarization of captured data
  • Queries for retrieval can be at very different
    levels of complexity

6
Introduction
  • Multimedia retrieval for ubiquitous environments
    based solely on content analysis is neither
    efficient nor accurate
  • Make use of supplementary data from other sources
    for easier retrieval, e.g. proximity sensors,
    domain knowledge

7
Introduction
  • The research combines two main areas
  • Ubiquitous Environment
  • Multimedia Retrieval

8
Ubiquitous Environment
  • Providing services to the people in the
    environment by detecting and recognizing their
    actions.
  • Storage and retrieval of media at different
    levels, from photos to experiences

9
Multimedia Retrieval
  • Common approach is content analysis
  • The use of context data where available can
    improve the performance greatly

10
Main work in this article
  • Capturing and retrieval of personal experiences
    in a ubiquitous environment that simulates a
    home.
  • Create an electronic chronicle for capturing
    video and retrieving it using interactive queries
  • Main data: video and audio
  • Context data from pressure based floor sensors to
    achieve fast and effective retrieval and
    summarization of video and audio data.
  • Audio analysis and segmentation are used to
    complement context based retrieval.

11
Ubiquitous Home
12
Ubiquitous Home
  • Sensors and Data Acquisition
  • Data Collection

13
Sensors and Data Acquisition
  • Layout of ubiquitous home.

14
Sensors and Data Acquisition
  • Images are recorded at a rate of five frames
    per second and stored in JPEG file format.
  • Audio is sampled at 44.1 kHz from each microphone
    and recorded into 1-minute audio clips in MP3
    file format.
  • The floor sensors are point-based pressure
    sensors spaced 180 mm apart in a rectangular
    grid. The sampling rate is 6 Hz.
  • Each sensor starts in state 0 and switches to
    state 1 when the pressure exceeds a threshold.
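
The two-state behavior of a floor sensor can be sketched as follows. This is only an illustration: the function names, the threshold value, and the pressure readings are hypothetical, and only the 6 Hz sampling rate comes from the slide.

```python
def sensor_states(pressures, threshold):
    """Map raw pressure readings to binary states (0 = idle, 1 = pressed)."""
    return [1 if p > threshold else 0 for p in pressures]


def activations(states, sample_rate_hz=6):
    """Return (start_s, end_s) intervals during which the state is 1."""
    intervals = []
    start = None
    for i, state in enumerate(states):
        if state == 1 and start is None:
            start = i  # rising edge: sensor becomes active
        elif state == 0 and start is not None:
            intervals.append((start / sample_rate_hz, i / sample_rate_hz))
            start = None  # falling edge: activation ends
    if start is not None:
        # Sensor still active at the end of the recording.
        intervals.append((start / sample_rate_hz, len(states) / sample_rate_hz))
    return intervals


# Hypothetical readings from one sensor, threshold chosen arbitrarily.
states = sensor_states([0, 5, 9, 9, 2, 0], threshold=4)
print(states)
print(activations(states))
```

Each activation interval produced this way is the kind of unit that the footstep segmentation described later would operate on.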

15
Data Collection
  • Student experiment
  • Acquiring training data for actions and events
  • Audio data are not available during the
    experiment
  • Real-life experiment
  • No manual monitoring of video was performed
    during the experiment
  • The processing and analysis were performed offline

16
Retrieval
17
Retrieval
  • Footstep Segmentation
  • Video Handover
  • Audio Handover
  • Key Frame Extraction
  • Audio Segmentation for Retrieval

18
Retrieval
  • Only a few data sources will convey useful
    information at any given time.
  • Automatically select the sources that will convey
    the most information based on context data.
  • Only the selected sources will be queried to
    retrieve data and these data will be analyzed
    further for retrieval.

19
Retrieval
20
Footstep Segmentation
  • Noise
  • When there are footsteps on adjacent sensors
    (very short duration)
  • When a relatively small weight, such as a leg of
    a stool, is placed on a sensor (periodic)
  • Kohonen Self-Organizing Maps (SOM)

21
Footstep Segmentation
  • 3-stage Agglomerative Hierarchical Clustering
    (AHC) algorithm is used to segment sensor
    activations into footstep sequences of different
    persons

22
Agglomerative Hierarchical Clustering algorithm
  • First stage
  • Combine to form single footsteps
  • Distance function for clustering is based on
    connectedness and overlap of duration

23
Agglomerative Hierarchical Clustering algorithm
  • Second stage
  • Combine to form path sequences based on
    physiological constraints
  • Ex. Range of distance between steps, overlap of
    duration in two steps, constraints on direction
    change
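
The kind of constraint check used in the second stage might look like the sketch below. All numeric limits are hypothetical placeholders, not values from the study, and the direction-change constraint is left out for brevity. A footstep is assumed to be a tuple `(x_m, y_m, start_s, end_s)` with positions in meters.

```python
import math


def can_follow(step_a, step_b, min_dist=0.2, max_dist=1.0, max_gap=1.5):
    """Decide whether step_b is a plausible next footstep after step_a.

    min_dist, max_dist (meters) and max_gap (seconds) are illustrative
    placeholders, not values taken from the presentation.
    """
    dist = math.hypot(step_b[0] - step_a[0], step_b[1] - step_a[1])
    gap = step_b[2] - step_a[3]  # negative when the two steps overlap in time
    starts_later = step_b[2] > step_a[2]
    return min_dist <= dist <= max_dist and gap <= max_gap and starts_later


print(can_follow((0.0, 0.0, 0.0, 0.5), (0.6, 0.0, 0.4, 0.9)))
print(can_follow((0.0, 0.0, 0.0, 0.5), (3.0, 0.0, 0.6, 1.0)))
```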

24
Agglomerative Hierarchical Clustering algorithm
  • Third stage
  • Compensate for the fragmentation of individual
    paths due to the absence of sensors in some areas
  • Starting and ending timestamp, locations of the
    doors and furniture and information about places
    where floor sensors are not installed

25
Agglomerative Hierarchical Clustering algorithm
26
Footstep Segmentation
  • Errors
  • Some paths are still fragmented after clustering
    in the third stage
  • There are some cases of swapping in paths between
    two persons when they walk close to each other

27
Video Handover
  • Select cameras in a way that a good video
    sequence can be constructed.
  • Position-based handover
  • Based on a simple view model, where the viewable
    region for each camera is specified in terms of
    floor sensor coordinates.

28
Position-based handover
  • Create a video sequence that has the minimum
    possible number of shots.
  • If the person can be seen from the previous
    camera, then that camera is selected.
  • Otherwise, the viewable regions for the cameras
    are examined in a predetermined order and the
    first match is selected.
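
The selection rule above can be sketched in a few lines. Camera names, viewable regions, and the search order below are made-up examples; viewable regions are assumed to be sets of floor sensor coordinates, as in the view model.

```python
def select_camera(position, previous, view_regions, order):
    """Pick a camera for a floor sensor position (x, y).

    Keep the previous camera while the person stays visible from it;
    otherwise take the first camera in `order` whose region contains
    the position. Returns None if no camera sees the position.
    """
    if previous is not None and position in view_regions[previous]:
        return previous
    for cam in order:
        if position in view_regions[cam]:
            return cam
    return None


def handover(path, view_regions, order):
    """Return the camera chosen for each position along a walking path."""
    cameras = []
    current = None
    for position in path:
        current = select_camera(position, current, view_regions, order)
        cameras.append(current)
    return cameras


# Hypothetical layout: two cameras with overlapping viewable regions.
regions = {
    "cam1": {(0, 0), (0, 1)},
    "cam2": {(0, 1), (0, 2)},
}
print(handover([(0, 0), (0, 1), (0, 2)], regions, order=["cam2", "cam1"]))
```

At position `(0, 1)` both cameras can see the person, but the previous camera is kept, which is what keeps the number of shots low.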

29
Position-based handover
(1) The change of color of the arrow indicates
how the camera changes with the position of the
person. (2) It is possible to acquire a frontal
view due to the positioning and orientation of
cameras.
30
Audio Handover
  • Dub the video sequences
  • Not necessary to use all of them since a
    microphone can cover a larger region compared to
    a camera

31
Audio Handover
  • Each camera is associated with one microphone for
    audio retrieval.
  • Camera installed in a room: audio is taken from
    the microphone located in the center of that
    room
  • Camera installed in the corridor: audio is taken
    from the microphone closest to the center of the
    region seen by that camera

32
Audio Handover
(1) Minimize transitions between microphones (2)
Uniform amplitude level
33
Video and Audio Handover
34
Key Frame Extraction
  • The video sequence constructed using video
    handover has to be sampled to extract key frames.
  • Goal: a complete and compact summary
  • Minimize the number of redundant key frames while
    ensuring that important key frames are not missed

35
Key Frame Extraction
T is a constant time interval.
36
Key Frame Extraction
  • Adaptive spatio-temporal sampling algorithm
  • The time interval for sampling the next key frame
    is reduced with each footstep, thereby sampling
    more key frames when there are more footsteps
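
One way to read this rule is sketched below. The slides do not give the actual update formula, so the fixed `reduction` per footstep, the default values, and the function name are assumptions made for illustration; `T` plays the role of the constant interval used when nobody is walking.

```python
def adaptive_key_frames(footstep_times, duration, T=10.0, reduction=2.0):
    """Return key frame timestamps (seconds) for one video sequence.

    A key frame is due T seconds after the previous one, and every
    footstep seen in the meantime pulls the due time earlier by
    `reduction` seconds, but never before that footstep itself.
    """
    steps = sorted(footstep_times)
    key_times = [0.0]
    i = 0
    while True:
        last = key_times[-1]
        due = last + T
        # Skip footsteps already covered by the last key frame.
        while i < len(steps) and steps[i] <= last:
            i += 1
        # Each footstep before the due time shortens the wait.
        while i < len(steps) and steps[i] < due:
            due = max(steps[i], due - reduction)
            i += 1
        if due > duration:
            break
        key_times.append(due)
    return key_times


print(adaptive_key_frames([], duration=20))
print(adaptive_key_frames([1, 2, 3, 5, 6, 7], duration=20))
```

With no footsteps the sketch falls back to plain temporal sampling every `T` seconds, while the sequence with six footsteps yields more key frames over the same duration.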

37
Key Frame Extraction
  • Evaluation
  • The subjects extracted key frames from four video
    clips according to their own choice.
  • Create average key frame sets which are used as
    ground truth for evaluation
  • They voted for the key frame set that summarized
    the sequence best.

38
Key Frame Extraction
39
Key Frame Extraction
40
Audio Segmentation for Retrieval
  • The floor sensors are unable to capture data when
    people are not treading on a floor area with
    sensors.
  • They are not activated if the pressure on the
    sensors is not sufficiently large.
  • Audio-based retrieval can also be conducted
    independently to support various types of queries.

41
Audio Segmentation for Retrieval
  • The amount of audio to be processed is quite
    large.
  • Trade-off
  • Utilizing the redundancy to improve the accuracy
    of retrieval
  • Minimizing processing by removing redundancy

42
Audio Segmentation for Retrieval
  • Eliminate audio corresponding to silence.
  • Compare the RMS power of the audio signal against
    a threshold value.
  • RMS(Root Mean Square) is a statistical measure of
    the magnitude of a varying quantity.

43
Audio Segmentation for Retrieval
  • One-hour audio clips were extracted from
    different times of day.
  • These clips were partitioned into frames of
    300 samples.
  • Adjacent frames had a 50% overlap.
  • The RMS value of each frame is calculated and
    recorded, and the statistics obtained for each
    clip.
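
The framing and RMS computation can be sketched as follows, assuming the overlap figure is 50% of the frame length, so that consecutive frames start 150 samples apart. The function name and the test signal are illustrative only.

```python
import math


def frame_rms(samples, frame_len=300, overlap=0.5):
    """Split a signal into overlapping frames and return the RMS of each."""
    hop = max(1, int(frame_len * (1 - overlap)))  # 150 samples for 50% overlap
    values = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values


# A constant-amplitude square wave: every frame has RMS 3.0.
signal = [3.0, -3.0] * 300  # 600 samples
print(frame_rms(signal))
```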

44
Audio Segmentation for Retrieval
  • The probability distributions of the RMS values
    for different audio clips were not significantly
    different.
  • They were combined into a single probabilistic
    model for silence and noise

45
Audio Segmentation for Retrieval
  • The threshold for each microphone is estimated by
    analyzing audio data for silence and noise for
    that microphone.
  • The threshold value was selected at the 99% level
    of confidence according to this distribution.
  • Below 100% because false negatives (sound
    misclassified as silence) are more costly than
    false positives (silence misclassified as sound).
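
As a rough illustration of threshold selection, assuming the confidence level on the slide is 99%: the sketch below swaps the fitted probabilistic model for a simple empirical quantile of RMS values measured during silence and background noise. The function name and the sample data are hypothetical.

```python
import math


def silence_threshold(noise_rms, confidence=0.99):
    """Return the RMS value below which `confidence` of noise frames fall.

    Uses the nearest-rank empirical quantile as a stand-in for the
    probabilistic model described in the presentation.
    """
    ordered = sorted(noise_rms)
    rank = max(1, math.ceil(confidence * len(ordered)))
    return ordered[rank - 1]


# Hypothetical RMS values of 100 silence/noise frames from one microphone.
noise = [i / 1000 for i in range(1, 101)]
print(silence_threshold(noise))
```

Frames above the returned value would be treated as sound. Keeping the level below 100% lets a few noise frames through rather than raising the threshold to the loudest noise ever observed, which would risk classifying quiet sounds as silence.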

46
Silence Elimination
  • First stage based on individual microphone
  • If the RMS value of a frame is larger than the
    threshold, the frame is considered to contain
    sound.
  • Sets of contiguous frames with a duration of less
    than 0.1 s are removed.
  • Sets of contiguous frames less than 0.5 s apart
    are combined to form a single segment.
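
The two clean-up rules can be sketched on a list of sound segments given as `(start_s, end_s)` pairs. The function name and the example segments are illustrative, and the order of operations (remove first, then merge) simply follows the order of the bullets.

```python
def clean_segments(segments, min_duration=0.1, max_gap=0.5):
    """Drop very short sound segments, then merge segments that are close."""
    # Rule 1: remove segments shorter than min_duration seconds.
    kept = [(s, e) for s, e in sorted(segments) if e - s >= min_duration]

    # Rule 2: merge neighbors separated by less than max_gap seconds.
    merged = []
    for s, e in kept:
        if merged and s - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


segments = [(0.0, 1.0), (1.2, 2.0), (3.0, 3.05), (4.0, 5.0)]
print(clean_segments(segments))
```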

47
Silence Elimination
  • Second stage based on multiple microphones in
    close proximity to reduce false positives.
  • For each microphone
  • $B(n)$: binary sound segment function
  • $C(n)$: cumulative sound segment function

48
Silence Elimination
  • Binary sound segment function
  • $B(n) = 1$ if there is sound in the n-th second of
    the audio stream
  • $B(n) = 0$ otherwise
  • For the set of microphones in the same room

49
Silence Elimination
  • Noise
  • random
  • It is less likely that noise in sound segments
    from different microphones occur simultaneously.
  • Small duration

50
Silence Elimination
  • Voting algorithm to determine the sound segment
    function - S(n)
  • $S(n) = 1$ if $(C * M)(n) > \lceil k/2 \rceil$,
    where $*$ denotes convolution
  • $S(n) = 0$ otherwise
  • $M(n) = [1, 1, 1]$
  • $k$: number of microphones installed in the location
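
A small sketch of the voting step, under two assumptions that the slides do not state explicitly: C(n) is taken to be the sum of the binary functions B(n) over the microphones in the same room, and the three-tap mask M is centered on the current second. The function name and the example streams are hypothetical.

```python
import math


def vote(binary_streams):
    """Combine per-microphone B(n) streams into one S(n) for the room."""
    k = len(binary_streams)
    n_sec = len(binary_streams[0])

    # C(n): number of microphones reporting sound in second n (assumed sum).
    c = [sum(b[n] for b in binary_streams) for n in range(n_sec)]

    s = []
    for n in range(n_sec):
        # Convolution with M = [1, 1, 1]: sum over seconds n-1, n, n+1.
        window = sum(c[m] for m in (n - 1, n, n + 1) if 0 <= m < n_sec)
        s.append(1 if window > math.ceil(k / 2) else 0)
    return s


streams = [
    [0, 1, 1, 0, 0, 0],  # microphone 1
    [0, 1, 1, 0, 0, 0],  # microphone 2
    [0, 0, 1, 0, 0, 1],  # microphone 3, with an isolated blip at n = 5
]
print(vote(streams))
```

The isolated one-second detection from a single microphone is voted out, while the sound heard by several microphones over adjacent seconds is kept.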

51
Audio Segmentation for Retrieval
  • Video is retrieved from all cameras in the room
    for each sound segment.
  • The video created by handover is extended to
    include the time during which sounds were present
    before the start of the footstep sequence

52
User Interaction
53
User Interaction
54
User Study-Real-Life Experiment
55
User Study
  • 1st: requirement study
  • 2nd:
  • Given a demonstration on how to use the system
  • Submit their own queries
  • Select video clips that they would like to keep
  • 3rd: feedback about the system

56
Discussion
57
Discussion
  • Issues Related to Capture
  • Algorithm for Retrieval
  • Real-Life Experiment

58
Issues Related to Capture
  • Continuous capture
  • The research was carried out at a different
    location from the home-like environment.
  • Experiments with families are quite difficult to
    arrange, and the cost of losing important data
    due to algorithms with insufficient accuracy is
    quite high.
  • Problem: large amount of disk space

59
Issues Related to Capture
  • Some of the microphones seem to be redundant,
    given their range and directivity.
  • Save disk space
  • Floor sensors are more expensive and difficult to
    maintain
  • Movement of furniture

60
Algorithm for Retrieval
  • The accuracy of footstep segmentation
    deteriorates when the number of persons in the
    house is large and with the movement of furniture
  • Video handover can be improved by considering
    occlusion by other persons when selecting the
    camera.
  • For audio handover, smoother transitions are
    possible by looking for silence near the point of
    microphone change.

61
Algorithm for Retrieval
  • Key frame extraction
  • Human-human and human-object interaction
  • Audio-based video retrieval will retrieve false
    results if the house is located at a place where
    loud sounds can enter the house from outside

62
Real-Life Experiment
  • The subjects in the student experiment were
    independent in their actions.
  • The behavior of the family in the real-life
    experiment was in the form of a group.
  • Accuracy of footstep segmentation is decreased.

63
Future Work
64
Future Work
  • Further clustering of floor sensor data and
    classification of audio data.
  • Face detection

65
Thank you