CADAL Digital Library presentation


1
CADAL Digital Library
The 2nd International Conference on Universal Digital Library (ICUDL 2006)

Wu Jiang-Qin, Zhuang Yue-Ting, Pan Yun-he
College of Computer Science, Zhejiang University, China
November 18, 2006

2
Outline
1. Introduction
2. Unified Parallel Search
3. Multimedia Analysis and Retrieval
4. Bilingual Services
5. Chinese Calligraphy Character Retrieval
6. Conclusion and Future Work
3
Outline: 1. Introduction
4
CADAL

The China-US Million Book Digital Library (CADAL) is an international cooperation program between China and the US.
The objective of the CADAL project is to create a free-to-read, searchable collection of one million books, available to everyone over the Internet.
CADAL is an important part of the Universal Digital Library (UDL), whose aim is universal access to human knowledge.

5
The challenges and services (1)

The digital resources for research and education, including digital books and multimedia, can reach 100 terabytes. As of October 2006 there were 1,023,425 digital books, including ancient Chinese books, Chinese Minguo (Republic of China era) books, modern Chinese books, Chinese degree dissertations and English books, as well as images, video, etc.
Active services of unified parallel search across the different types of digital resources

6
The challenges and services (2)

Images, video, 3-D models and other types of media resources are included in the CADAL resources.
Services for quickly retrieving and structurally browsing multimedia documents, including images and video

7
The challenges and services (3)

The CADAL resources contain digital books in two languages, Chinese and English.
Bilingual translation services

8
The challenges and services (4)

Resources on traditional Chinese culture are an important part of the CADAL resources.
Services related to traditional Chinese culture resources

9
Outline: 2. Unified Parallel Search
10
Background

Terabytes of digital resources of various types, such as dissertations, ancient books, Minguo books, modern books, Minguo journals, English books, drawings, video and illustrations, are available in CADAL; this diversity is one of the distinctive characteristics of CADAL. It therefore challenges the techniques for searching resources based on metadata.

11
Metadata

Dublin Core metadata is used to describe the million digital books in the CADAL project.
The other types of multimedia resources are described with metadata suited to each type.
An independent data map is designed for each kind of resource metadata.

12
Unified parallel searching

To meet the requirements of different users and improve their interactive experience, a single search service covering the different types of digital resources is provided for convenient searching.
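The slides do not show how the parallel search is implemented. As a minimal sketch, assuming each resource type keeps its own metadata scheme and a per-type data map projects it onto common search fields, the Python below queries every type concurrently and merges the hits. The names `FIELD_MAPS`, `normalize`, `search_one` and `unified_search`, and the video field names, are hypothetical; only the use of Dublin Core for books comes from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data maps: each resource type keeps its own metadata scheme,
# and a per-type map projects it onto common search fields.
FIELD_MAPS = {
    "book": {"title": "dc:title", "creator": "dc:creator"},   # Dublin Core
    "video": {"title": "clip_title", "creator": "director"},  # invented video scheme
}


def normalize(rtype, record):
    """Project a type-specific record onto the common search fields."""
    common = {field: record.get(src, "") for field, src in FIELD_MAPS[rtype].items()}
    common["type"] = rtype
    return common


def search_one(rtype, records, keyword):
    """Search one resource type; a real system would query that type's own index."""
    kw = keyword.lower()
    hits = []
    for record in records:
        common = normalize(rtype, record)
        if kw in common["title"].lower() or kw in common["creator"].lower():
            hits.append(common)
    return hits


def unified_search(stores, keyword):
    """Query every resource type in parallel and merge the hits in type order."""
    with ThreadPoolExecutor(max_workers=max(len(stores), 1)) as pool:
        futures = [pool.submit(search_one, rtype, records, keyword)
                   for rtype, records in stores.items()]
        return [hit for future in futures for hit in future.result()]


stores = {
    "book": [{"dc:title": "Tang Poetry", "dc:creator": "Li Bai"}],
    "video": [{"clip_title": "Poetry Reading", "director": "Zhang"},
              {"clip_title": "Opera", "director": "Wang"}],
}
print(unified_search(stores, "poetry"))
```

Keeping the mapping per type means a new resource type only needs its own map entry, while the search front end stays unchanged.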

15
Outline: 3. Multimedia Analysis and Retrieval
16
Background

Because the digital library contains unstructured multimedia resources such as images, video and audio in addition to digital books, effective and efficient analysis and retrieval of multimedia resources is a challenging problem in the CADAL digital library.
Here we examine the analysis and retrieval issues related to two primary kinds of multimedia: image and video.

17
Contents

Content-based Image Retrieval
Image retrieval by peer indexing
Image annotation
Image search engine
Video analysis system
Video Browser (structure and summary)
Metadata-based Video Retrieval

18
Content-based image retrieval

Extracting visual features
Color features: color histogram, color moments, color coherence vector, color correlogram
Texture: Tamura texture features and co-occurrence texture features
Relevance feedback
Makes image retrieval results match the user's requirements
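As an illustration of the first color feature listed, here is a quantized RGB color histogram compared with histogram intersection. The bin count (4 per channel) and the intersection measure are assumptions for this sketch; the slides do not say which similarity measure CADAL uses.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize each 0-255 RGB channel into equal bins; return a normalized histogram."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        index = (r * n // 256) * n * n + (g * n // 256) * n + (b * n // 256)
        hist[index] += 1
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red = color_histogram([(255, 0, 0)] * 4)
half_red = color_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)
print(histogram_intersection(red, half_red))  # 0.5
```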

19
Content-based image retrieval
[Figure: relevance feedback loop: query example, image searching, positive example, negative example]
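The feedback loop in the figure can be illustrated with the classic Rocchio update. It is swapped in here as a familiar example and is not necessarily the authors' method: the query vector moves toward images the user marks positive and away from negative ones, then the database is re-ranked. The weights `alpha`, `beta`, `gamma`, the feature vectors and the Euclidean ranking are assumptions.

```python
import math


def rocchio(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward positive examples and away from negative ones."""
    dim = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos, neg = mean(positives), mean(negatives)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def rank(query, database):
    """Return image names ordered by Euclidean distance to the query vector."""
    return sorted(database, key=lambda name: math.dist(query, database[name]))


database = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.4]}
q = [0.5, 0.5]
# One feedback round: the user marks "b" as positive and "a" as negative.
q = rocchio(q, positives=[database["b"]], negatives=[database["a"]])
print(rank(q, database))
```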
20
Image retrieval by peer index

Peer Index is a new image-indexing scheme that describes an image through semantically relevant peer images.
Each image is associated with a two-level peer index:
a global peer index, describing the data characteristics of the image
personal peer indexes, describing the characteristics of an individual user with respect to this specific image
Both types of peer index are learned interactively and incrementally from user feedback.
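The slides do not give the learning rule for peer indexes. The sketch below assumes a simple one: images a user marks relevant in the same session become peers of each other, the global index accumulates weights from all users, and the personal index accumulates weights for that user only. The class `PeerIndex`, its increment `rate` and its `personal_weight` are hypothetical.

```python
from collections import defaultdict


class PeerIndex:
    """Hypothetical two-level peer index: global peer weights plus per-user weights."""

    def __init__(self, rate=1.0, personal_weight=2.0):
        self.rate = rate                        # increment per feedback event
        self.personal_weight = personal_weight  # how strongly personal weights count
        # global_index[image][peer] -> weight learned from all users
        self.global_index = defaultdict(lambda: defaultdict(float))
        # personal_index[user][image][peer] -> weight learned from one user
        self.personal_index = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))

    def feedback(self, user, relevant_images):
        """Images marked relevant in one session become peers of each other."""
        for image in relevant_images:
            for peer in relevant_images:
                if peer != image:
                    self.global_index[image][peer] += self.rate
                    self.personal_index[user][image][peer] += self.rate

    def peers(self, image, user=None, top_k=5):
        """Rank an image's peers, adding the user's personal weights if a user is given."""
        scores = dict(self.global_index[image])
        if user is not None:
            for peer, weight in self.personal_index[user][image].items():
                scores[peer] = scores.get(peer, 0.0) + self.personal_weight * weight
        return sorted(scores, key=scores.get, reverse=True)[:top_k]


index = PeerIndex()
index.feedback("u1", ["tiger1", "tiger2"])
index.feedback("u2", ["tiger1", "cat1"])
index.feedback("u2", ["tiger1", "cat1"])
print(index.peers("tiger1"))             # global view
print(index.peers("tiger1", user="u1"))  # personalized for u1
```

Because every feedback event only adds to existing weights, both levels are learned incrementally without retraining.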

21
Peer-index-based image retrieval
[Figure: semantic query and semantic relevance feedback]
22
Image annotation

Automatic semantic annotation of images by machine learning and statistical modeling
Classify the training images and create a semantic skeleton for each class of training images.
Automatically classify a new image with a Support Vector Machine (SVM) and describe it using the semantic skeleton of its class.
Select keywords for the image by statistical methods.
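The keyword-selection step can be sketched as follows, assuming a class's semantic skeleton is simply the frequency of keywords attached to its training images. The SVM classification step is not reproduced, so the predicted class is supplied directly; the functions and data here are hypothetical.

```python
from collections import Counter, defaultdict


def build_skeletons(training):
    """training: (class_label, keywords) pairs -> {class_label: Counter of keywords}."""
    skeletons = defaultdict(Counter)
    for label, keywords in training:
        skeletons[label].update(keywords)
    return skeletons


def annotate(predicted_class, skeletons, k=3):
    """Return the k most frequent keywords in the predicted class's skeleton."""
    return [word for word, _ in skeletons[predicted_class].most_common(k)]


training = [
    ("animal", ["tiger", "grass"]),
    ("animal", ["tiger", "forest"]),
    ("animal", ["tiger", "grass"]),
    ("city", ["street", "car"]),
]
skeletons = build_skeletons(training)
# The class label would come from the SVM classifier; it is given directly here.
print(annotate("animal", skeletons, k=2))
```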

23
Image annotation
[Figure: annotation pipeline: segment, classify, statistical learning of a semantic skeleton, annotation of visually similar images (example keyword: tiger)]
24
Text-based image retrieval
25
Image search engine

We implemented an image search engine, Octopus, which uses Peer Index and relevance feedback to bridge the gap between semantics and low-level features. It rests on a simple, intuitive idea: the semantic concept of an image is hidden in the image itself but becomes apparent in the image's relations to other images.

26
Integration into the CADAL digital library
27
The image retrieval interface
28
Our goals for video

Analyze multimodal information (visual, audio, motion and captions) to generate structural information and video summaries
Support efficient video browsing and retrieval based on metadata and structural information

29
Main idea

Nonlinear browsing: generate a structural index (key frames, shots and shot groups) from the original video stream
Content compression: analyze the temporal sequence of the video stream, eliminate redundant data, and generate a summary and highlight scenes for the original video

30
System

Video Fusion Analysis System (VideoFAS)
VideoBrowser

31
VideoFAS-system interface
[Figure: original video, video shots, and similar shots clustered together]
32
VideoFAS-system functions

Basic operations
Importing and saving
Appending
Separating the video stream into video and audio data
Transcoding and compressing

33
VideoFAS-system functions

Feature extraction
Visual features
Color: color histogram, color moments, color coherence vector, color correlogram
Texture: Tamura texture features and co-occurrence texture features
Shape: contour features
Audio features
Temporal features: zero-crossing rate
Frequency features: Mel coefficients, tone and sub-band statistical features
Target features
The OpenCV face detection module is integrated into the system to extract face features
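Zero-crossing rate, the temporal audio feature listed above, counts how often the signal changes sign within a frame. A minimal sketch, assuming non-overlapping frames and normalization by the number of adjacent sample pairs (conventions vary); the helper names are hypothetical.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into (possibly overlapping) fixed-length frames."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


signal = [1, -1, 1, -1, 1, 1, 1, 1]
print([zero_crossing_rate(f) for f in frame_signal(signal, frame_len=4, hop=4)])  # [1.0, 0.0]
```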

34
VideoFAS-system functions

Video structuring
Shot detection
Cut detection
Gradual transition detection
Key frame extraction
Similar shot grouping
Shots are grouped using a Support Vector Machine
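A common way to detect cuts is to threshold the difference between color histograms of consecutive frames. The sketch below illustrates that approach under stated assumptions: an L1 histogram distance, a fixed threshold of 0.5, and the middle frame of each shot as its key frame. It covers cuts only, not gradual transitions or SVM-based shot grouping, and it is not the VideoFAS implementation.

```python
def histogram_difference(h1, h2):
    """L1 distance between two normalized frame histograms (range 0..2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frame_hists, threshold=0.5):
    """Return the indices of frames that start a new shot (hard cuts only)."""
    return [i for i in range(1, len(frame_hists))
            if histogram_difference(frame_hists[i - 1], frame_hists[i]) > threshold]


def shots_and_key_frames(frame_hists, threshold=0.5):
    """Split frames into [start, end) shots and pick each shot's middle frame."""
    bounds = [0] + detect_cuts(frame_hists, threshold) + [len(frame_hists)]
    shots = list(zip(bounds, bounds[1:]))
    key_frames = [(start + end - 1) // 2 for start, end in shots]
    return shots, key_frames


# Three mostly-red frames followed by two mostly-blue frames (2-bin histograms).
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2
print(shots_and_key_frames(frames))  # ([(0, 3), (3, 5)], [1, 3])
```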

36
VideoFAS-system functions

Video summarization
Summarization by mining non-trivial repeating patterns
Frequent, non-trivial shot sequences are extracted to generate the video summary
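The slides do not define "non-trivial" precisely. Assuming shots have already been grouped into labels (for example by the SVM grouping step) and treating a pattern as non-trivial when it spans more than one shot group, frequent shot sequences can be mined by counting n-grams. The summary below keeps the first occurrence of each frequent pattern. All names and parameters are hypothetical.

```python
from collections import Counter


def frequent_patterns(labels, min_len=2, max_len=4, min_support=2):
    """Count shot-group n-grams; keep non-trivial ones seen at least min_support times."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(labels) - n + 1):
            pattern = tuple(labels[i:i + n])
            if len(set(pattern)) > 1:  # assumed "non-trivial": spans 2+ shot groups
                counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def summary_shots(labels, patterns):
    """Keep the shots of the first occurrence of each frequent pattern."""
    keep = set()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(labels) - n + 1):
            if tuple(labels[i:i + n]) == pattern:
                keep.update(range(i, i + n))
                break
    return sorted(keep)


shot_groups = list("ABCABDAB")  # group label of each shot, in time order
patterns = frequent_patterns(shot_groups)
print(patterns, summary_shots(shot_groups, patterns))
```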

37
VideoFAS-system functions

Metadata annotation
Video clips are annotated with metadata conforming to the Dublin Core standard
The metadata and the video structural information are saved in a database
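Dublin Core annotation can be illustrated with Python's standard `xml.etree.ElementTree`. The `dc` namespace URI and the 15 element names are those of the Dublin Core Metadata Element Set. The `metadata` wrapper element and the sample values are assumptions, and the slides do not say how records are laid out in the database.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core elements as an XML string."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({"title": "Opera excerpt", "type": "MovingImage",
                          "format": "video/mpeg"}))
```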

38
VideoBrowser-framework
39
VideoBrowser-system interface
40
VideoBrowser-system interface
[Figure: interface showing metadata, media player and video structural information]
41
System architecture
[Figure: system architecture. Components: Web, Movies, Internet, firewall, switch, web server, retrieval service, video data, online storage (disk array), archive server, annotation, structuring, summarization, tape (offline storage)]
42
Outline: 4. Bilingual Services
43
Background

Because CADAL contains both English and Chinese books, bilingual services are needed so that users can access resources in either language.

44
Services

The north technical center has developed technologies and prototypes for multi-layered English-Chinese bilingual machine translation of books, including:
metadata translation between English and Chinese
accurate translation of proper nouns, such as the names of unique individuals, events or places
selective translation in a full-text context
translation of Old Chinese text
a distributed translation service technique

45
Services

An online translation service is integrated into the CADAL digital library.
Users can directly perform semantic, multilingual retrieval of information in our CADAL digital library.
Online translation of the contents of a page
Translation of the metadata of a digital book

46
Bilingual Search
47
Translation of the contents of a page
49
Outline: 5. Chinese Calligraphy Character Retrieval
50
Background

Since most people are interested in the artistic styles of calligraphic characters rather than their meaning, the CADAL digital library provides a Chinese calligraphy character retrieval service that treats the characters purely as images, without recognizing them as OCR does.

51
Calligraphy art still alive in
52
Key issues

Feature extraction: three kinds of features of calligraphic characters are proposed: character complexity, stroke density and shape.
Similarity matching: a matching cost is computed, and relevant images are retrieved according to it.

53
Contents

Chinese Calligraphy Page Segmentation
Feature Extraction
Character Image Retrieval

54
Chinese Calligraphy Page Segmentation

The page image is binarized, with characters in black and the background in white.
The page is cut into columns according to the vertical projection histogram, and the columns are then cut into individual characters.
All characters are normalized so that their contour information is scale-invariant.
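Projection-based segmentation can be sketched on a binary page (1 = black ink, 0 = white). The vertical projection splits the page into columns as described above. Splitting each column into characters with a horizontal projection is an assumption, since the slides do not name the method for that second step. Scale normalization is not shown, and boxes are returned left to right even though traditional columns are read right to left.

```python
def runs(profile):
    """Return [start, end) index ranges where the projection profile is non-zero."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def segment_page(page):
    """page: 2-D list of 0/1 pixels. Returns character boxes (x0, y0, x1, y1)."""
    width = len(page[0])
    vertical = [sum(row[x] for row in page) for x in range(width)]  # ink per pixel column
    boxes = []
    for x0, x1 in runs(vertical):                       # each run is one text column
        horizontal = [sum(row[x0:x1]) for row in page]  # ink per pixel row in that column
        for y0, y1 in runs(horizontal):                 # each run is one character
            boxes.append((x0, y0, x1, y1))
    return boxes


page = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(segment_page(page))
```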

55
Page segmentation
56
Feature Extraction

shape
character complexity

57
shape representation

The shape of a calligraphic character is represented by its contour points.
Polar coordinates, rather than Cartesian coordinates, are used to describe the directional relationship between points.
For direction, we use 8 bins of equal angular size, dividing the whole space into 8 directions.
For radius, we use 4 bins.
For each point in the set of sampled contour points, its approximate shape context is described by its relationship to the remaining points, counted in weighted bins.
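A minimal sketch of the 8-by-4 polar histogram per contour point follows. The 8 equal direction bins follow the slide. The radius bin edges (0.25, 0.5 and 1.0 times the mean inter-point distance, which also makes the descriptor scale-invariant) are an assumption, and plain counts stand in for the unspecified weighting.

```python
import bisect
import math


def shape_context(points, n_theta=8, radius_edges=(0.25, 0.5, 1.0)):
    """Return one flattened n_theta x n_r histogram per contour point."""
    n_r = len(radius_edges) + 1  # 3 edges -> 4 radius bins
    # Mean distance over all ordered pairs, used to make radii scale-invariant.
    dists = [math.dist(p, q) for i, p in enumerate(points)
             for j, q in enumerate(points) if i != j]
    mean_d = sum(dists) / len(dists)
    sector = 2 * math.pi / n_theta  # 8 equal direction bins of 45 degrees
    descriptors = []
    for i, (px, py) in enumerate(points):
        hist = [0] * (n_theta * n_r)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
            t_bin = int(theta / sector) % n_theta
            r_bin = bisect.bisect_right(radius_edges, math.dist((px, py), (qx, qy)) / mean_d)
            hist[t_bin * n_r + r_bin] += 1
        descriptors.append(hist)
    return descriptors


contour = [(0, 0), (1, 0), (0, 1), (1, 1)]
descriptors = shape_context(contour)
print(descriptors[0])
```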

58
shape representation
[Figure: contour points of a calligraphic character]
59
Calligraphy Character Complexity

Calligraphy character complexity is used as an initial filter to discard candidate characters that have no possibility of being similar to the query.

L is the number of sampled contour points from the query, Li is the number of sampled contour points from a candidate image, and the threshold is obtained empirically.
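The complexity formula itself did not survive in this transcript. Purely as an illustration, the sketch below assumes a relative-difference test |L - Li| / L <= threshold on contour-point counts, with a hypothetical threshold of 0.3; the authors' actual criterion and threshold may differ.

```python
def complexity_filter(query_points, candidate_points, threshold=0.3):
    """Keep candidates whose contour-point count Li is close to the query's L.

    ASSUMED criterion: |L - Li| / L <= threshold (not the confirmed formula).
    """
    return [name for name, li in candidate_points.items()
            if abs(query_points - li) / query_points <= threshold]


kept = complexity_filter(100, {"a": 90, "b": 150, "c": 120}, threshold=0.3)
print(kept)
```

A cheap count comparison like this lets the expensive shape matching run only on plausible candidates.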
60
Character Image Retrieval

Compute the character complexity of each calligraphic character.
Normalize the scale of the query and sample its contour points.
Filter the candidate images by character complexity.
Extract the shape features and use the shape matching method introduced in [6] to compute the matching cost between the query and every remaining candidate image.
Rank the results by matching cost and return them.
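The retrieval steps above can be strung together as below. Reference [6] is not reproduced here, so the matching cost is an assumption: a chi-square distance between per-point shape histograms with a greedy one-to-one point assignment, which is a simplification of optimal bipartite matching. Unmatched query points are charged the maximum cost of 1.0. The function names are hypothetical.

```python
def chi2_cost(h1, h2):
    """Chi-square distance between two histograms after normalizing each to sum 1."""
    s1, s2 = sum(h1) or 1, sum(h2) or 1
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / s1, b / s2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost  # ranges from 0 (identical) to 1 (disjoint)


def matching_cost(query_desc, cand_desc):
    """Average cost of a greedy one-to-one assignment of query points to candidate points."""
    used, total = set(), 0.0
    for hq in query_desc:
        best_j, best_c = None, float("inf")
        for j, hc in enumerate(cand_desc):
            if j not in used:
                c = chi2_cost(hq, hc)
                if c < best_c:
                    best_j, best_c = j, c
        if best_j is None:
            total += 1.0  # no candidate point left: charge the maximum cost
        else:
            used.add(best_j)
            total += best_c
    return total / max(len(query_desc), 1)


def retrieve(query_desc, candidates, kept_names):
    """Rank the candidates that survived the complexity filter by matching cost."""
    scored = sorted((matching_cost(query_desc, candidates[name]), name) for name in kept_names)
    return [name for _, name in scored]


query = [[1, 0], [0, 1]]
candidates = {"A": [[1, 0], [0, 1]], "B": [[0, 1], [0, 1]]}
print(retrieve(query, candidates, ["B", "A"]))
```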

61
The calligraphy character retrieval
62
Interface for browsing the original works
63
Outline: 6. Conclusion and Future Work
64
Users from over 70 countries access the services 280,000 times per day.
65
Conclusion and Future Work

As the number of users and the amount of resources grow, future work on the CADAL digital library will proceed in several directions:
Improving the performance of the current services, making them more complete and more stable
Continuing to explore the application of multimedia in digital libraries

66
Thanks! Welcome to visit the CADAL Digital Library (WWW.CADAL.ZJU.EDU.CN)
Email: wujq@cs.zju.edu.cn, yzhuang@zju.edu.cn
