Title: XML Watermarking
1XML Watermarking & Information Hiding
2Markup Language
- SGML (Standard Generalized Markup Language)
- XML (Extensible Markup Language)
- HTML (HyperText Markup Language)
- XHTML
3Publishing Information in WWW
4Publishing Information in WWW
5XML Document
Corresponding watermarking and information
hiding techniques can be employed
- XML element type
- text
- image
- video
- audio
- executable code
Can we use the markup's own information for watermarking
or information hiding?
6Known content-based technique
- Change font size, color
- Append white space at the end of a line
  - bit 0: space (0x0020)
  - bit 1: tab (0x0009)
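The end-of-line technique above can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the function names are hypothetical, and it assumes the cover text has no trailing white space of its own and that one bit is carried per line.

```python
def embed_trailing_whitespace(text: str, bits: str) -> str:
    """Append a space (bit 0) or a tab (bit 1) to successive lines."""
    lines = text.split("\n")
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the payload")
    out = []
    for i, line in enumerate(lines):
        if i < len(bits):
            line += " " if bits[i] == "0" else "\t"
        out.append(line)
    return "\n".join(out)


def extract_trailing_whitespace(text: str) -> str:
    """Read the bits back from the last character of each line."""
    bits = []
    for line in text.split("\n"):
        if line.endswith(" "):
            bits.append("0")
        elif line.endswith("\t"):
            bits.append("1")
    return "".join(bits)
```

Selecting the text in an editor highlights the extra characters, which is why this channel is easy to detect.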
7Shortcomings
- White spaces at the end of a line
  - increase the page size
  - may change the layout
  - are easily detected by selecting the text
8Specification
- Element (Entity)
  - <name attribute1 ... attributeN> contents </name>
  - <name attribute1 ... attributeN> </name>
  - <name attribute1 ... attributeN>
- Attribute
  - name=value
- Example
  - <font face="Verdana" size="4" color="FFFF00">Student Number</font>
9Properties of markup labels
- Property 1: Element and attribute names are
  case-insensitive (in HTML; XML names are case-sensitive)
  - <font face="Verdana" size="4" color="FFFF00">Student Number</font>
  - <Font face="Verdana" size="4" color="FFFF00">Student Number</font>
  - <font face="Verdana" size="4" color="FFFF00">Student Number</Font>
  - <Font face="Verdana" size="4" color="FFFF00">Student Number</Font>
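Property 1 gives one bit per tag name. The sketch below is only an illustration under stated assumptions: lowercase first letter means 0, uppercase means 1, the function names and the regular expression are hypothetical, and the regex is a rough tag matcher rather than a full HTML parser.

```python
import re

# Matches "<name" or "</name"; a simplification, not a full HTML tokenizer.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def embed_case(html: str, bits: str) -> str:
    """Capitalize the first letter of a tag name for bit 1, lowercase it for bit 0."""
    remaining = iter(bits)

    def rewrite(match):
        bit = next(remaining, None)  # tags beyond the payload are lowercased
        name = match.group(2).lower()
        if bit == "1":
            name = name[0].upper() + name[1:]
        return match.group(1) + name

    return TAG_NAME.sub(rewrite, html)


def extract_case(html: str, n_bits: int) -> str:
    """Read the first n_bits bits from the case of successive tag names."""
    bits = ["1" if m.group(2)[0].isupper() else "0"
            for m in TAG_NAME.finditer(html)]
    return "".join(bits[:n_bits])
```

With the example element, the payload "10" yields the Font/font variant and "01" yields font/Font.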
10Properties of markup labels
- Property 2: Attributes are order-insensitive
  - <font face="Verdana" size="4" color="FFFF00">Student Number</font>
  - <font size="4" face="Verdana" color="FFFF00">Student Number</font>
11Pair attributes technique
- Pair attributes order (Corinna John)
- Key attribute, corresponding attribute
- key before corresponding encodes 1; corresponding before key encodes 0
  - <font face="Verdana" size="4" color="FFFF00">Student Name</font>
  - <font size="4" face="Verdana" color="FFFF00">Student Name</font>
- key / corresponding table
- No size change; difficult to detect
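A minimal sketch of the pair-order idea, working on an already parsed attribute list. The table contents (face as key, size as its corresponding attribute) and the function names are assumptions made for illustration; the slides do not say which attribute is the key.

```python
# Hypothetical key -> corresponding attribute table shared by embedder and detector.
PAIR_TABLE = {"face": "size"}


def embed_pair_bit(attrs, bit, table=PAIR_TABLE):
    """Order the first matching pair: key first for 1, corresponding first for 0."""
    names = [name for name, _ in attrs]
    values = dict(attrs)
    for key, corr in table.items():
        if key in names and corr in names:
            i, j = sorted((names.index(key), names.index(corr)))
            first, second = (key, corr) if bit == 1 else (corr, key)
            out = list(attrs)
            out[i] = (first, values[first])
            out[j] = (second, values[second])
            return out
    raise ValueError("element has no key/corresponding pair")


def extract_pair_bit(attrs, table=PAIR_TABLE):
    """Return 1 if the key precedes its corresponding attribute, 0 if not, None if absent."""
    names = [name for name, _ in attrs]
    for key, corr in table.items():
        if key in names and corr in names:
            return 1 if names.index(key) < names.index(corr) else 0
    return None
```

Only the two paired attributes swap positions, so the element keeps its length and its other attributes stay where they were.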
12Attributes permutation technique
- Equivalent attribute permutations
  - <font face="Verdana" size="4" color="FFFF00">Student Name</font>
  - <font face="Verdana" color="FFFF00" size="4">Student Name</font>
  - <font size="4" face="Verdana" color="FFFF00">Student Name</font>
  - <font size="4" color="FFFF00" face="Verdana">Student Name</font>
  - <font color="FFFF00" face="Verdana" size="4">Student Name</font>
  - <font color="FFFF00" size="4" face="Verdana">Student Name</font>
- Lexicographic (alphabetic) order: a permutation f precedes a
  permutation g iff f(k) < g(k) for the minimum
  value of k such that f(k) != g(k).
13Attributes permutation technique
- Generating attribute permutations in
  lexicographic order
  - <font color="FFFF00" face="Verdana" size="4">Student Name</font>
  - <font color="FFFF00" size="4" face="Verdana">Student Name</font>
  - <font face="Verdana" color="FFFF00" size="4">Student Name</font>
  - <font face="Verdana" size="4" color="FFFF00">Student Name</font>
  - <font size="4" color="FFFF00" face="Verdana">Student Name</font>
  - <font size="4" face="Verdana" color="FFFF00">Student Name</font>
- Attribute permutations and their order numbers
  - color face size 0
  - color size face 1
  - face color size 2
  - face size color 3
  - size color face 4
  - size face color 5
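The mapping between an attribute permutation and its order number can be computed without listing all permutations. The sketch below uses the standard factorial number system for lexicographic ranking; the function names are hypothetical and it assumes attribute names within an element are distinct.

```python
from math import factorial


def order_number(perm):
    """Lexicographic rank of perm among all orderings of its attribute names."""
    remaining = sorted(perm)
    rank = 0
    for position, name in enumerate(perm):
        index = remaining.index(name)
        rank += index * factorial(len(perm) - 1 - position)
        remaining.pop(index)
    return rank


def permutation_for(names, number):
    """Inverse of order_number: the ordering of names with the given rank."""
    remaining = sorted(names)
    if not 0 <= number < factorial(len(remaining)):
        raise ValueError("order number out of range")
    perm = []
    while remaining:
        block = factorial(len(remaining) - 1)
        index, number = divmod(number, block)
        perm.append(remaining.pop(index))
    return perm


# order_number(["face", "size", "color"]) returns 3
```

Embedding a value rewrites an element's attributes in the order returned by permutation_for; detection reads the order back with order_number.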
14Attributes permutation technique
- If the number of attributes of an element is >= 2, it
  may be used to embed hidden information or a
  watermark
- Let $e_1, \dots, e_m$ be the elements in a web page whose
  numbers of attributes satisfy $n_i \ge 2$; the
  embedded capacity is
  $C = \sum_{i=1}^{m} \lfloor \log_2 (n_i!) \rfloor$ bits
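As a rough check of page capacity, the sketch below assumes each element with n >= 2 attributes carries floor(log2(n!)) whole bits, since its n! orderings can be numbered 0 to n!-1. The function names and the per-element rounding are assumptions of this sketch.

```python
from math import factorial


def element_capacity_bits(n_attributes: int) -> int:
    """Whole bits that the attribute order of one element can carry."""
    if n_attributes < 2:
        return 0
    # bit_length() - 1 equals floor(log2(x)) exactly for positive integers.
    return factorial(n_attributes).bit_length() - 1


def page_capacity_bits(attribute_counts) -> int:
    """Sum the per-element capacity over all elements of a page."""
    return sum(element_capacity_bits(n) for n in attribute_counts)


# Elements with 3, 2, 1 and 4 attributes: 2 + 1 + 0 + 4 bits.
# page_capacity_bits([3, 2, 1, 4]) returns 7
```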
15Embedded capacity example
Name of web page | Capacity (bytes)
www.163.com | 48
www.sina.com.cn | 279
www.sohu.com.cn | 338
www.microsoft.com | 15
www.ebay.com | 78
www.yahoo.com | 33
16Perceivability
- Imperceptible when browsing the page
- Hard to perceive when reading the source code
17Robust or resistant against editing
18Robust or resistant against editing
- Font, size, and color values can be changed without disturbing the attribute order
19Security
- Attribute permutations and their order numbers
  - color face size 0
  - color size face 1
  - face color size 2
  - face size color 3
  - size color face 4
  - size face color 5
- Apply a hash to the concatenation of the attributes and a key
  to get the order number
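One way to read the keyed step is sketched below: each permutation is tagged with a keyed hash, and the permutations are numbered by sorting on that tag instead of alphabetically. HMAC-SHA256, the space-joined encoding, and the function name are assumptions of this sketch; the slide only says "hash".

```python
import hashlib
import hmac
from itertools import permutations


def keyed_order_table(names, key: bytes):
    """Map each attribute permutation to a key-dependent order number."""
    def tag(perm):
        message = " ".join(perm).encode("utf-8")
        return hmac.new(key, message, hashlib.sha256).digest()

    ranked = sorted(permutations(sorted(names)), key=tag)
    return {perm: number for number, perm in enumerate(ranked)}
```

A detector holding the same key rebuilds the same table, while a reader without the key cannot tell which order number a given attribute order stands for. Enumerating all n! permutations is only practical for elements with few attributes.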
20Performance comparison
Type | Size change | Perceivable by view | Perceivable by code | Capacity (bit) | Extra payload
White space | Y | easy | easy | Page lines | N
Case change | N | N | easy | Tags | N
Attribute pair | N | N | hard | | Pair table
Equivalent attributes | N | N | hard | | N
21Other potential properties
- String delimiters
  - name="value"
  - name='value'
- White space between the element name and the
  first attribute
  - <font face=verdana size=3>
  - <font  face=verdana size=3>
- White space between attributes
  - <font face=verdana size=3>
  - <font face=verdana  size=3>
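The string-delimiter property listed above could carry one bit per attribute. A minimal sketch, assuming double quotes mean 0 and single quotes mean 1, that values contain neither quote character nor "=", and with hypothetical function names:

```python
import re


def render_tag(name, attrs, bits):
    """Render a start tag, choosing each attribute's quote style from one bit."""
    if len(bits) != len(attrs):
        raise ValueError("need exactly one bit per attribute")
    parts = [name]
    for (attr, value), bit in zip(attrs, bits):
        quote = "'" if bit == "1" else '"'
        parts.append(f"{attr}={quote}{value}{quote}")
    return "<" + " ".join(parts) + ">"


def extract_quote_bits(tag):
    """Recover the bits from the quote character that follows each '='."""
    quotes = re.findall(r"=\s*([\"'])", tag)
    return "".join("1" if q == "'" else "0" for q in quotes)
```

The same pattern would apply to the white-space variants, with the presence or absence of an extra space as the bit.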
22Other potential properties
- White space after =
  - <font face=verdana size=3>
  - <font face= verdana size=3>
- White space between elements
  - <td>con1</td><td>con2</td>
  - <td>con1</td> <td>con2</td>
23Other potential properties
- The default value of an attribute
  - <font face=verdana size=3>
  - <font face=verdana>
24Current progress
- Introduce insignificant attributes
  - <font face=verdana>
  - <font face=verdana xyz=abcd>
- Break through the capacity bottleneck
- Web page watermarking
- Text watermarking
25Our focus on watermarking
- Text content security
- Funded by NSFC Key Project 60736016
- Funded by NSFC 60373062
- Software watermarking
- Funded by NSFC 60573045
- Wireless sensor network security
- Funded by 973 Project 2006CB303000
- Funded by NSFC 60873198
- Steganalysis
- Funded by 115 Project
26Contact
Tel: 0731-8821341, 13875971258
Email: sunnudt_at_163.com
http://nisl.hnu.cn/
27- HyperText Markup Language (HTML), version 4.0,
  the publishing language of the World Wide Web
- Recall that in HTML, element and attribute names
  are case-insensitive; the convention is meant to
  encourage readability.
- Element and attribute names in this document have
  been marked up and may be rendered specially by
  some user agents.
- http://www.w3.org/TR/1998/REC-html40-19980424/about.html#h-1.2.1
28http://www.w3.org/TR/html/#xhtml
- HTML 4 is an SGML (Standard Generalized
  Markup Language) application conforming to
  International Standard ISO 8879, and is widely
  regarded as the standard publishing language of
  the World Wide Web.
- SGML is a language for describing markup
  languages, particularly those used in electronic
  document exchange, document management, and
  document publishing. HTML is an example of a
  language defined in SGML.
- SGML has been around since the middle 1980's and
  has remained quite stable. Much of this stability
  stems from the fact that the language is both
  feature-rich and flexible. This flexibility,
  however, comes at a price, and that price is a
  level of complexity that has inhibited its
  adoption in a diversity of environments,
  including the World Wide Web.
- HTML, as originally conceived, was to be a
  language for the exchange of scientific and other
  technical documents, suitable for use by
  non-document specialists. HTML addressed the
  problem of SGML complexity by specifying a small
  set of structural and semantic tags suitable for
  authoring relatively simple documents. In
  addition to simplifying the document structure,
  HTML added support for hypertext. Multimedia
  capabilities were added later.
- In a remarkably short space of time, HTML became
  wildly popular and rapidly outgrew its original
  purpose. Since HTML's inception, there has been
  rapid invention of new elements for use within
  HTML (as a standard) and for adapting HTML to
  vertical, highly specialized, markets. This
  plethora of new elements has led to
  interoperability problems for documents across
  different platforms.
29- XML is the shorthand name for Extensible Markup
  Language.
- XML was conceived as a means of regaining the
  power and flexibility of SGML without most of its
  complexity. Although a restricted form of SGML,
  XML nonetheless preserves most of SGML's power
  and richness, and yet still retains all of SGML's
  commonly used features.
- While retaining these beneficial features, XML
  removes many of the more complex features of SGML
  that make the authoring and design of suitable
  software both difficult and costly.
30- XHTML is a family of current and future document
  types and modules that reproduce, subset, and
  extend HTML 4. XHTML family document
  types are XML based, and ultimately are designed
  to work in conjunction with XML-based user
  agents. The details of this family and its
  evolution are discussed in more detail in
  [XHTMLMOD].
- XHTML 1.0 (this specification) is the first
  document type in the XHTML family. It is a
  reformulation of the three HTML 4 document types
  as applications of XML 1.0. It is intended
  to be used as a language for content that is both
  XML-conforming and, if some simple guidelines are
  followed, operates in HTML 4 conforming user
  agents. Developers who migrate their content to
  XHTML 1.0 will realize the following benefits:
  - XHTML documents are XML conforming. As such, they
    are readily viewed, edited, and validated with
    standard XML tools.
  - XHTML documents can be written to operate as well
    or better than they did before in existing
    HTML 4-conforming user agents as well as in new,
    XHTML 1.0 conforming user agents.
  - XHTML documents can utilize applications (e.g.
    scripts and applets) that rely upon either the
    HTML Document Object Model or the XML Document
    Object Model [DOM].
  - As the XHTML family evolves, documents conforming
    to XHTML 1.0 will be more likely to interoperate
    within and among various XHTML environments.
- The XHTML family is the next step in the
  evolution of the Internet. By migrating to XHTML
  today, content developers can enter the XML world
  with all of its attendant benefits, while still
  remaining confident in their content's backward
  and future compatibility.
31Terrorism
http://www.arabteam2000-forum.com/
Jihad
32Watermark embedding
33Watermark detection
34Classification of watermarking by host
- Image
- Audio
- Video
- Text (Document)
- Software / Executable code
- Database
35Text watermarking & Information Hiding
Watermarking
Information hiding
36Any redundancy?
No: character and code correspond one to one
37Utilize format information
- Line-shift coding
  - vertically displacing an entire text line
- Word-shift coding
  - horizontally shifting the location of a word
    within a text line
- Character feature coding
  - altering a particular feature of an individual
    character
38Utilize language information
- Synonym substitution
- Syntactic transform
- TMR tree (text meaning representation)
- Add spaces at the end of a line
39Text recoverable watermarking
- Format based watermarking?
- Natural language watermarking?
- How to combine them?
- Recoverable text watermarking?