Title: Collection of general data mining briefings
1. Building Trustworthy Semantic Webs
Lecture 5: XML and XML Security
Dr. Bhavani Thuraisingham
September 2006
2. Objective of the Unit
- This unit provides an overview of XML and then discusses some security issues
3. Outline of the Unit
- XML Elements
- XML Attributes
- XML DTD
- XML Schema
- XML Namespaces
- Federations
- Policy/Credential
- Access Control
- Third Party Publication
- XML Databases
- Inference Control
4. What is XML all about?
- XML is needed due to the limitations of HTML and the complexities of SGML
- It is an extensible markup language specified by the W3C (World Wide Web Consortium)
- Designed to make the interchange of structured documents over the Internet easier
- Key to XML used to be Document Type Definitions (DTDs)
  - A DTD defines the role of each element of text in a formal model
- XML schemas have now become critical to specify the structure
- XML schemas are also XML documents
5. XML Elements
XML statement: John Smith is a Professor in Texas. This can be expressed as follows:
  <Professor>
    <name> John Smith </name>
    <state> Texas </state>
  </Professor>
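A minimal sketch of how a program might read the Professor statement, assuming it is stored as well-formed XML and parsed with Python's standard `xml.etree.ElementTree` module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The Professor statement from the slide, written as well-formed XML.
doc = """
<Professor>
  <name>John Smith</name>
  <state>Texas</state>
</Professor>
"""

professor = ET.fromstring(doc)

# Each child element carries one piece of the statement.
record = {child.tag: child.text.strip() for child in professor}
print(professor.tag, record)
```

The element names act as labels for the data, which is what later slides exploit when attaching access rules to individual elements.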
6. XML Elements
Now suppose this data can be read by anyone. Then we can augment the XML statement with an additional element called access, as follows:
  <Professor>
    <name> John Smith </name>
    <state> Texas </state>
    <access> All, Read </access>
  </Professor>
7. XML Elements
If only HR can update this XML statement, then we have the following:
  <Professor>
    <name> John Smith </name>
    <state> Texas </state>
    <access> HR department, Write </access>
  </Professor>
8. XML Elements
We may not wish for everyone to know that John Smith is a professor, but we can give out the information that this professor is in Texas. This can be expressed as:
  <Professor>
    <name> John Smith, Govt-official, Read </name>
    <state> Texas, All, Read </state>
    <access> HR department, Write </access>
  </Professor>
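The element-level rules above can be sketched as a small filter that builds the view a given subject is allowed to read. This is only an illustration: it assumes each data element holds `value, subject, privilege` exactly as in the slide's notation, that the subject `All` matches every requester, and the function name `readable_view` is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """
<Professor>
  <name>John Smith, Govt-official, Read</name>
  <state>Texas, All, Read</state>
  <access>HR department, Write</access>
</Professor>
"""

def readable_view(xml_text, subject):
    """Return {tag: value} for the elements `subject` may read.

    Assumes the 'value, subject, privilege' convention of the slide,
    which is an illustrative notation and not a standard.
    """
    root = ET.fromstring(xml_text)
    view = {}
    for child in root:
        if child.tag == "access":
            continue  # document-level rule, not a data element
        value, allowed, privilege = [p.strip() for p in child.text.split(",")]
        if privilege == "Read" and allowed in ("All", subject):
            view[child.tag] = value
    return view

print(readable_view(doc, "Student"))
print(readable_view(doc, "Govt-official"))
```

A student sees only the state, while a government official also sees the name.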
9. XML Attributes
Suppose we want to specify access based on attribute values. One way to specify such access is given below.
  <Professor
    Name = John Smith, Access = All, Read
    Salary = 60K, Access = Administrator, Read, Write
    Department = Security, Access = All, Read
  </Professor>
Here we assume that everyone can read the name John Smith and the department Security, but only the administrator can read and write the salary attribute.
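One way to picture attribute-level access is a policy table kept alongside the element. The sketch below assumes the Professor data is stored as ordinary XML attributes and that the rules live in a separate, hypothetical `ATTRIBUTE_ACCESS` table; neither the table nor the helper names come from the lecture.

```python
import xml.etree.ElementTree as ET

professor = ET.fromstring(
    '<Professor Name="John Smith" Salary="60K" Department="Security"/>'
)

# Hypothetical policy table: attribute -> {subject: set of privileges}.
ATTRIBUTE_ACCESS = {
    "Name": {"All": {"Read"}},
    "Salary": {"Administrator": {"Read", "Write"}},
    "Department": {"All": {"Read"}},
}

def can(subject, privilege, attribute):
    """True if `subject` holds `privilege` on `attribute`, directly or via 'All'."""
    rules = ATTRIBUTE_ACCESS.get(attribute, {})
    granted = rules.get("All", set()) | rules.get(subject, set())
    return privilege in granted

def visible_attributes(element, subject):
    """Keep only the attributes the subject may read."""
    return {k: v for k, v in element.attrib.items() if can(subject, "Read", k)}

print(visible_attributes(professor, "Student"))
print(visible_attributes(professor, "Administrator"))
```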
10. XML DTD
DTDs essentially specify the structure of XML documents. Consider the following DTD for Professor with elements name and state. This will be specified as:
  <!ELEMENT Professor (name, state)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT access (#PCDATA)>
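Python's standard library does not validate against DTDs, so the following is only a hand-written sketch of what the content model `(name, state)` demands: a Professor element whose children are exactly name followed by state. The `CONTENT_MODEL` table and `conforms` function are hypothetical, and lowercase element names are assumed.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of <!ELEMENT Professor (name, state)>.
CONTENT_MODEL = {"Professor": ["name", "state"]}

def conforms(xml_text):
    """Check that the root's children match the declared sequence exactly."""
    root = ET.fromstring(xml_text)
    expected = CONTENT_MODEL.get(root.tag)
    return expected is not None and [c.tag for c in root] == expected

print(conforms("<Professor><name>John Smith</name><state>Texas</state></Professor>"))
print(conforms("<Professor><state>Texas</state><name>John Smith</name></Professor>"))
```

The second document contains the right elements in the wrong order, so it does not conform to the sequence.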
11. XML Schema
While DTDs were an early attempt to specify the structure of XML documents, XML schemas are a far more elegant way to specify structure. Unlike DTDs, XML schemas use XML syntax for the specification. Consider the following example:
  <ComplexType name = "ProfessorType">
    <Sequence>
      <element name = "name" type = "string"/>
      <element name = "state" type = "string"/>
      <element name = "access" type = "string"/>
    </Sequence>
  </ComplexType>
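Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below rewrites the ProfessorType example in W3C XML Schema syntax (the `xs:` prefix and namespace URI are from that standard, not from the slide) and lists the declared elements with `xml.etree.ElementTree`. It only reads the schema; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProfessorType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
      <xs:element name="access" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

schema = ET.fromstring(schema_text)

# Walk the schema like any other XML tree and collect the element declarations.
fields = [
    (el.get("name"), el.get("type"))
    for el in schema.iter(f"{{{XS}}}element")
]
print(fields)
```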
12. XML Namespaces
Namespaces are used for DISAMBIGUATION
  <CountryX: Academic-Institution
    Xmlns: CountryX = http://www.CountryX.edu/Instution DTD
    Xmlns: USA = http://www.USA.edu/Instution DTD
    Xmlns: UK = http://www.UK.edu/Instution DTD
    <USA: Title = College
      USA: Name = University of Texas at Dallas
      USA: State = Texas
    <UK: Title = University
      UK: Name = Cambridge University
      UK: State = Cambs
  </CountryX: Academic-Institution>
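To illustrate the disambiguation, here is a rough sketch in standard XML namespace syntax. It assumes the slide's URLs (without the trailing "DTD") serve as the namespace names and models Title and Name as child elements; the prefixes `c`, `usa`, and `uk` are arbitrary choices.

```python
import xml.etree.ElementTree as ET

doc = """
<c:Academic-Institution
    xmlns:c="http://www.CountryX.edu/Instution"
    xmlns:usa="http://www.USA.edu/Instution"
    xmlns:uk="http://www.UK.edu/Instution">
  <usa:Title>College</usa:Title>
  <usa:Name>University of Texas at Dallas</usa:Name>
  <uk:Title>University</uk:Title>
  <uk:Name>Cambridge University</uk:Name>
</c:Academic-Institution>
"""

# Prefix-to-namespace map used when querying.
NS = {
    "usa": "http://www.USA.edu/Instution",
    "uk": "http://www.UK.edu/Instution",
}

root = ET.fromstring(doc)

# Both elements are called Title, but the namespace tells them apart.
usa_title = root.find("usa:Title", NS).text
uk_title = root.find("uk:Title", NS).text
print(usa_title, uk_title)
```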
13. XML Namespaces
  <CountryX: Academic-Institution
    <Access = Government-official, Read </Access>
    Xmlns: CountryX = http://www.CountryX.edu/Instution DTD
    Xmlns: USA = http://www.USA.edu/Instution DTD
    Xmlns: UK = http://www.UK.edu/Instution DTD
    <USA: Title = College
      USA: Name = University of Texas at Dallas
      USA: State = Texas
    <UK: Title = University
      UK: Name = Cambridge University
      UK: State = Cambs
  </CountryX: Academic-Institution>
14. Federations/Distribution
Site 1 document:
  <Professor-name>
    <ID> 111 </ID>
    <name> John Smith </name>
    <state> Texas </state>
  </Professor-name>
Site 2 document:
  <Professor-salary>
    <ID> 111 </ID>
    <salary> 60K </salary>
  </Professor-salary>
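The two site documents share the ID 111, so a federated query can recombine them. The following sketch, with hypothetical helper names, joins the documents on ID; it suggests why access control in a federation has to consider what becomes visible once separately held fragments are merged.

```python
import xml.etree.ElementTree as ET

site1 = "<Professor-name><ID>111</ID><name>John Smith</name><state>Texas</state></Professor-name>"
site2 = "<Professor-salary><ID>111</ID><salary>60K</salary></Professor-salary>"

def as_record(xml_text):
    """Flatten a one-level XML document into a dict of tag -> text."""
    return {child.tag: child.text.strip() for child in ET.fromstring(xml_text)}

def federated_join(left_xml, right_xml, key="ID"):
    """Merge two site documents when they agree on the key; otherwise None."""
    left, right = as_record(left_xml), as_record(right_xml)
    if key not in left or left[key] != right.get(key):
        return None
    return {**left, **right}

print(federated_join(site1, site2))
```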
15. XML Query
- XML-QL, XQuery, etc. are query languages for XML
- XPath is used for query specification
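As a small illustration of path-based query specification, the sketch below uses the limited XPath subset supported by Python's `xml.etree.ElementTree`. The Faculty wrapper and the second record's state are made-up sample data.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for the query.
doc = """
<Faculty>
  <Professor><name>John Smith</name><state>Texas</state></Professor>
  <Professor><name>Alice Brown</name><state>Ohio</state></Professor>
</Faculty>
"""

root = ET.fromstring(doc)

# The predicate [state='Texas'] keeps Professor elements whose state child equals 'Texas'.
texans = [p.findtext("name") for p in root.findall(".//Professor[state='Texas']")]
print(texans)
```

Full XPath, XQuery, and XML-QL offer much richer expressions than this subset.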
16. Presentations of XML Documents
17. Credentials in XML
  <Professor credID = "9" subID = "16" CIssuer = "2">
    <name> Alice Brown </name>
    <university> University of X </university>
    <department> CS </department>
    <research-group> Security </research-group>
  </Professor>
  <Secretary credID = "12" subID = "4" CIssuer = "2">
    <name> John James </name>
    <university> University of X </university>
    <department> CS </department>
    <level> Senior </level>
  </Secretary>
18. Policies in XML
  <?xml version = "1.0" encoding = "utf-8"?>
  <Policy-base>
    <policy-spec cred-expr = "//Professor[department = 'CS']"
      target = "annual_report.xml"
      path = "//Patent[@Dept = 'CS']//Node()"
      priv = "VIEW"/>
    <policy-spec cred-expr = "//Professor[department = 'CS']"
      target = "annual_report.xml"
      path = "//Patent[@Dept = 'EE']/Short-descr/Node() and //Patent[@Dept = 'EE']/authors"
      priv = "VIEW"/>
    <policy-spec cred-expr = - - - -
    <policy-spec cred-expr = - - - -
  </Policy-base>
Explanation: CS professors are entitled to access all the patents of their department. They are entitled to see only the short descriptions and authors of patents of the EE department.
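The two policy specs can be approximated with Python's `xml.etree.ElementTree`, under several stated assumptions: the credentials are wrapped in a hypothetical Credentials root, the report content is invented, paths are written with a leading `.` because ElementTree rejects absolute paths on an element, and `Node()` is approximated by `*` (element nodes only). It is a sketch of the idea, not the lecture's actual policy engine.

```python
import xml.etree.ElementTree as ET

credentials = ET.fromstring("""
<Credentials>
  <Professor credID="9" subID="16" CIssuer="2">
    <name>Alice Brown</name>
    <department>CS</department>
  </Professor>
  <Secretary credID="12" subID="4" CIssuer="2">
    <name>John James</name>
    <department>CS</department>
  </Secretary>
</Credentials>
""")

# Invented stand-in for annual_report.xml.
report = ET.fromstring("""
<Report>
  <Patent Dept="CS"><Short-descr>cs-short</Short-descr><details>cs-full</details></Patent>
  <Patent Dept="EE"><Short-descr>ee-short</Short-descr><authors>ee-authors</authors><details>ee-full</details></Patent>
</Report>
""")

# (credential expression, target paths, privilege), adapted from the two policy specs.
POLICIES = [
    (".//Professor[department='CS']", [".//Patent[@Dept='CS']//*"], "VIEW"),
    (".//Professor[department='CS']",
     [".//Patent[@Dept='EE']/Short-descr", ".//Patent[@Dept='EE']/authors"], "VIEW"),
]

def viewable_text(sub_id):
    """Collect the text of every node the subject may VIEW."""
    seen = []
    for cred_expr, paths, priv in POLICIES:
        holders = [c for c in credentials.findall(cred_expr) if c.get("subID") == sub_id]
        if priv == "VIEW" and holders:
            for path in paths:
                seen.extend(node.text for node in report.findall(path))
    return seen

print(viewable_text("16"))  # Alice Brown, a CS professor
print(viewable_text("4"))   # John James, a secretary
```

The professor sees the whole CS patent but only the short description and authors of the EE patent; the secretary matches no credential expression and sees nothing.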
19. Access Control Strategy
- Subjects request access to XML documents under two modes: browsing and authoring
- With browsing access, a subject can read/navigate documents
- Authoring access is needed to modify, delete, or append documents
- The access control module checks the policy base and applies policy specs
- Views of the document are created based on credentials and policy specs
- In case of conflict, the least access privilege rule is enforced
- Works for Push/Pull modes
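The conflict rule can be sketched as taking the minimum over all applicable privileges. This assumes a simple ordering in which authoring implies browsing, and reads "least access privilege" as "the most restrictive applicable spec wins"; the names `Priv`, `effective_privilege`, and `authorize` are hypothetical.

```python
from enum import IntEnum

class Priv(IntEnum):
    NONE = 0
    BROWSE = 1   # read/navigate
    AUTHOR = 2   # modify, delete, append (assumed to imply BROWSE)

def effective_privilege(applicable):
    """Resolve conflicting policy specs by taking the least privilege.

    With no applicable spec, nothing is granted.
    """
    return min(applicable, default=Priv.NONE)

def authorize(requested, applicable):
    """Grant the request only if it does not exceed the effective privilege."""
    return requested <= effective_privilege(applicable)

# Two specs apply to the same node and disagree.
specs = [Priv.AUTHOR, Priv.BROWSE]
print(effective_privilege(specs).name)
print(authorize(Priv.BROWSE, specs), authorize(Priv.AUTHOR, specs))
```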
20. System Architecture for Access Control
[Figure: architecture diagram with the components User, X-Access, X-Admin, Admin Tools, Credential base, Policy base, and XML Documents; interactions are labeled Pull/Query and Push/result]
21. Third-Party Architecture
- The Owner is the producer of information. It specifies access control policies
- The Publisher is responsible for managing (a portion of) the Owner information and answering subject queries
- Goal: Untrusted Publisher with respect to Authenticity and Completeness checking
[Figure: diagram with the labels XML Source, policy base, Credential base, SE-XML, Owner, Publisher, Reply document, credentials, Query, and User/Subject]
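The slide does not say how authenticity is checked. One known technique for this kind of setting is a Merkle hash over the XML tree: the Owner computes (and would sign) a digest of the document, and the User recomputes it over the Publisher's reply. The sketch below shows only that digest comparison using the standard `hashlib` module; signing, partial views, and completeness proofs are left out, and `merkle_hash` is a hypothetical helper.

```python
import hashlib
import xml.etree.ElementTree as ET

def merkle_hash(element):
    """Hash an element from its tag, text, attributes, and its children's hashes."""
    h = hashlib.sha256()
    for part in (element.tag, (element.text or "").strip()):
        h.update(part.encode("utf-8") + b"\x00")
    for key in sorted(element.attrib):
        h.update(f"{key}={element.attrib[key]}".encode("utf-8") + b"\x00")
    for child in element:
        h.update(merkle_hash(child))
    return h.digest()

source = "<Professor><name>John Smith</name><state>Texas</state></Professor>"

# The Owner computes this digest; in practice the Owner would sign it.
owner_digest = merkle_hash(ET.fromstring(source))

honest_reply = source
tampered_reply = source.replace("Texas", "Ohio")

# The User recomputes the digest over whatever the Publisher returned.
honest_ok = merkle_hash(ET.fromstring(honest_reply)) == owner_digest
tampered_ok = merkle_hash(ET.fromstring(tampered_reply)) == owner_digest
print(honest_ok, tampered_ok)
```

Because each parent hash depends on its children's hashes, a change anywhere in the subtree changes the root digest.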
22. XML Databases
- Data is presented as XML documents
- Query language: XML-QL
- Query optimization
- Managing transactions on XML documents
- Metadata management: XML schemas/DTDs
- Access methods and index strategies
- XML security and integrity management
23. Inference/Privacy Control
[Figure: architecture diagram with the labels Interface to the Semantic Web; Technology by UTD; Inference Engine/Rules Processor; Policies, Ontologies, Rules; XML Documents, Web Pages, Databases; XML Database]
24. Example Policies
- Temporal Access Control
  - After 1/1/05, only doctors have access to medical records
- Role-based Access Control
  - Manager has access to salary information
  - Project leader has access to project budgets, but he does not have access to salary information
  - What happens if the manager is also the project leader?
- Positive and Negative Authorizations
  - John has write access to EMP
  - John does not have read access to DEPT
  - John does not have write access to Salary attribute in EMP
  - How are conflicts resolved?
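The slide leaves conflict resolution as an open question. One common answer is "denials take precedence", sketched below for John's authorizations together with the temporal rule. The tuple layout, the `EMP.Salary` naming for an attribute inside EMP, the reading of 1/1/05 as January 1, 2005, and the behavior before that date are all assumptions made for illustration.

```python
from datetime import date

# (subject, privilege, object, sign): +1 is a grant, -1 is a denial.
AUTHORIZATIONS = [
    ("John", "write", "EMP", +1),
    ("John", "read", "DEPT", -1),
    ("John", "write", "EMP.Salary", -1),
]

def is_allowed(subject, privilege, obj):
    """Denials take precedence; a rule on EMP also covers EMP.Salary."""
    signs = [
        sign for s, p, o, sign in AUTHORIZATIONS
        if s == subject and p == privilege and (obj == o or obj.startswith(o + "."))
    ]
    # No applicable rule means no access (closed policy, an assumption).
    return bool(signs) and all(sign > 0 for sign in signs)

def medical_record_access(role, on_date):
    """Temporal rule: after 1/1/05 only doctors have access."""
    if on_date > date(2005, 1, 1):
        return role == "doctor"
    return True  # assumption: the slide does not say who had access earlier

print(is_allowed("John", "write", "EMP"))         # granted
print(is_allowed("John", "write", "EMP.Salary"))  # grant on EMP vs. denial on Salary
print(medical_record_access("nurse", date(2005, 6, 1)))
```

Under this rule the denial on the Salary attribute overrides the broader write grant on EMP. Other strategies, such as "most specific rule wins", would give the same answer here but can differ in general.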
25. Privacy Policies
- Privacy constraints processing
  - Simple Constraint: an attribute of a document is private
  - Content-based Constraint: if a document contains information about X, then it is private
  - Association-based Constraint: two or more documents taken together are private; individually each document is public
  - Release Constraint: after X is released, Y becomes private
- Augment a database system with a privacy controller for constraint processing
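A privacy controller of this kind could be sketched as a gatekeeper that remembers what has already been released. The example covers the content-based, association-based, and release constraints; the keyword, the document names, and the class itself are hypothetical and chosen only to make the constraints concrete.

```python
# Hypothetical constraint tables.
PRIVATE_KEYWORDS = {"diagnosis"}             # content-based constraint
PRIVATE_TOGETHER = [{"names", "salaries"}]   # association-based constraint
PRIVATE_AFTER = {"budget": "forecast"}       # release constraint: after 'budget', 'forecast' is private

class PrivacyController:
    def __init__(self):
        self.released = set()

    def request(self, doc_id, text):
        """Return True and record the release if no constraint is violated."""
        # Content-based: the document mentions a private topic.
        if any(word in text for word in PRIVATE_KEYWORDS):
            return False
        # Association-based: releasing this document would complete a private group.
        for group in PRIVATE_TOGETHER:
            if doc_id in group and group - {doc_id} <= self.released:
                return False
        # Release: an earlier release has made this document private.
        for trigger, target in PRIVATE_AFTER.items():
            if doc_id == target and trigger in self.released:
                return False
        self.released.add(doc_id)
        return True

controller = PrivacyController()
print(controller.request("names", "Alice, Bob"))    # public on its own
print(controller.request("salaries", "60K, 70K"))   # private once names are out
```

Keeping the release history inside the controller is what lets it enforce the association-based and release constraints, which depend on earlier answers rather than on a single document.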
26. Summary and Directions
- XML is widely used
- Securing XML documents is a challenge
- How can we specify the policies discussed in this unit in XML?
- How can query modification be carried out for XML documents?
- Design access control for XML databases