Title: Dr. Bhavani Thuraisingham
1Data and Applications Security Developments and
Directions
- Dr. Bhavani Thuraisingham
- The University of Texas at Dallas
- Lecture 6
- Trustworthy Semantic Webs
- September 15, 2008
2Outline
- Semantic web
- XML and XML security
- RDF and RDF security
- Ontologies
- Rules
- Applications
- Reference
- Building trustworthy semantic web, Thuraisingham,
CRC Press, 2007
3From Today's Web to the Semantic Web
- Today's web
- High recall, low precision: searches return too many web pages, many of them not relevant
- Sometimes low recall
- Results are sensitive to vocabulary: different words, even if they mean the same thing, do not return the same web pages
- Results are single web pages, not linked web pages
- Semantic web
- Machine-understandable web pages
- Activities on the web, such as searching, with little or no human intervention
- Technologies for knowledge management, e-commerce, interoperability
- Solutions to the problems faced by today's web
4Knowledge Management and Personal Agents
- Knowledge Management
- Corporate need: searching, extracting and maintaining information, uncovering hidden dependencies, viewing information
- Semantic web for knowledge management: organizing knowledge, automated tools for maintaining knowledge, question answering, querying multiple documents, controlling access to documents
- Personal Agent
- John is the president of a company. He needs to have surgery for a serious but not critical illness. With the current web he has to check each web page for relevant information and make decisions depending on the information provided
- With the semantic web, the agent will retrieve all the relevant information, synthesize it, ask John if needed, and then present the various options and make recommendations
5E-Commerce
- Business to Consumer
- Users shopping on the web: wrapper technology is used to extract information about user preferences etc. and display the products to the user
- Use of semantic web: develop software agents that can interpret privacy requirements, pricing and product information, and display timely and correct information to the user; also provides information about the reputation of shops
- Business to Business
- Organizations work together and carry out transactions such as collaborating on a product, supply chains etc. With today's web there is a lack of standards for data exchange
- Use of semantic web: XML is a big improvement, but the parties need to agree on vocabulary. The future will be the use of ontologies to agree on meanings and interpretations
6Semantic Web Technologies
- Explicit metadata
- Metadata is data about data. Metadata needs to be explicitly specified so that different groups and organizations will know what is on the web
- Metadata specification languages include XML and RDF
- Ontologies
- Explicit and formal specification of a conceptualization; describes a domain of discourse and its relationships
- Ontology languages include XML, RDF, OWL
- Logic
- Logic can be used to specify facts as well as rules. New facts are derived from existing facts based on the inference rules
- Description Logic is the type of logic that has been developed for semantic web applications
7Layered Approach: Tim Berners-Lee's Vision (www.w3c.org)
8What is XML all about?
- XML is needed due to the limitations of HTML and the complexities of SGML
- It is an extensible markup language specified by the W3C (World Wide Web Consortium)
- Designed to make the interchange of structured documents over the Internet easier
- Key to XML used to be Document Type Definitions (DTDs)
- Defines the role of each element of text in a formal model
- XML schemas have now become critical to specify the structure
- XML schemas are also XML documents
9XML Elements
XML statement: John Smith is a Professor in Texas. This can be expressed as follows:
<Professor>
  <name> John Smith </name>
  <state> Texas </state>
</Professor>
10XML Elements
Now suppose this data can be read by anyone. Then we can augment the XML statement with an additional element called access, as follows:
<Professor>
  <name> John Smith </name>
  <state> Texas </state>
  <access> All, Read </access>
</Professor>
11XML Elements
If only HR can update this XML statement, then we have the following:
<Professor>
  <name> John Smith </name>
  <state> Texas </state>
  <access> HR department, Write </access>
</Professor>
12XML Elements
We may not wish for everyone to know that John Smith is a professor, but we can give out the information that this professor is in Texas. This can be expressed as:
<Professor>
  <name> John Smith, Govt-official, Read </name>
  <state> Texas, All, Read </state>
  <access> HR department, Write </access>
</Professor>
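The element-level annotations above can be read as a filter that builds a per-subject view of the document. The following is a minimal sketch, not the lecture's implementation. It assumes a hypothetical encoding in which each child element carries `subject` and `priv` attributes instead of packing them into the element text; the function name `readable_view` is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: access information is held in attributes,
# not in the element text as on the slide.
DOC = """<Professor>
  <name subject="Govt-official" priv="Read">John Smith</name>
  <state subject="All" priv="Read">Texas</state>
</Professor>"""


def readable_view(xml_text, subject):
    """Return {tag: text} for the child elements this subject may read."""
    root = ET.fromstring(xml_text)
    view = {}
    for child in root:
        allowed = child.get("subject")
        # "All" grants read access to every subject.
        if child.get("priv") == "Read" and allowed in ("All", subject):
            view[child.tag] = child.text.strip()
    return view


public_view = readable_view(DOC, "Student")
official_view = readable_view(DOC, "Govt-official")
print(public_view)
print(official_view)
```

A subject without the Govt-official credential sees only the state, which matches the intent of the slide: the location is public, the name is not.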
13XML Attributes
Suppose we want to specify access based on attribute values. One way to specify such access is given below.
<Professor
  Name = "John Smith", Access = All, Read
  Salary = "60K", Access = Administrator, Read, Write
  Department = "Security", Access = All, Read
</Professor>
Here we assume that everyone can read the name John Smith and the department Security, but only the administrator can read and write the salary attribute.
14XML DTD
DTDs essentially specify the structure of XML documents. Consider the following DTD for Professor with elements name and state. This will be specified as:
<!ELEMENT Professor (name, state)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT access (#PCDATA)>
15XML Schema
While DTDs were the early attempts to specify structure for XML documents, XML schemas are a far more elegant way to specify structure. Unlike DTDs, XML schemas essentially use the XML syntax for specification. Consider the following example:
<ComplexType name = "ProfessorType">
  <Sequence>
    <element name = "name" type = "string"/>
    <element name = "state" type = "string"/>
    <element name = "access" type = "string"/>
  </Sequence>
</ComplexType>
16XML Namespaces
Namespaces are used for disambiguation
<CountryX: Academic-Institution
  Xmlns: CountryX = "http://www.CountryX.edu/Instution DTD"
  Xmlns: USA = "http://www.USA.edu/Instution DTD"
  Xmlns: UK = "http://www.UK.edu/Instution DTD"
  <USA: Title = College
    USA: Name = "University of Texas at Dallas"
    USA: State = Texas
  <UK: Title = University
    UK: Name = "Cambridge University"
    UK: State = Cambs
</CountryX: Academic-Institution>
17XML Namespaces
<CountryX: Academic-Institution
  <Access = Government-official, Read </Access>
  Xmlns: CountryX = "http://www.CountryX.edu/Instution DTD"
  Xmlns: USA = "http://www.USA.edu/Instution DTD"
  Xmlns: UK = "http://www.UK.edu/Instution DTD"
  <USA: Title = College
    USA: Name = "University of Texas at Dallas"
    USA: State = Texas
  <UK: Title = University
    UK: Name = "Cambridge University"
    UK: State = Cambs
</CountryX: Academic-Institution>
18Federations/Distribution
Site 1 document:
<Professor-name>
  <ID> 111 </ID>
  <Name> John Smith </Name>
  <State> Texas </State>
</Professor-name>
Site 2 document:
<Professor-salary>
  <ID> 111 </ID>
  <salary> 60K </salary>
</Professor-salary>
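The slide does not say how the two site documents are combined. One plausible reading is that a federated query joins them on the shared ID, which is also why access control has to consider both sites together. The sketch below assumes that reading; the helper names `fields` and `join_on_id` are hypothetical.

```python
import xml.etree.ElementTree as ET

SITE1 = ("<Professor-name><ID> 111 </ID><Name> John Smith </Name>"
         "<State> Texas </State></Professor-name>")
SITE2 = "<Professor-salary><ID> 111 </ID><salary> 60K </salary></Professor-salary>"


def fields(xml_text):
    """Flatten a one-level XML document into {tag: text}."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text.strip() for child in root}


def join_on_id(left, right):
    """Merge two records only when they describe the same ID."""
    if left["ID"] != right["ID"]:
        return None
    merged = dict(left)
    merged.update(right)
    return merged


combined = join_on_id(fields(SITE1), fields(SITE2))
print(combined)
```

Neither site alone links a name to a salary, but the joined record does, so a policy that protects salaries has to cover the join as well as each document.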
19Credentials in XML
<Professor credID = "9" subID = "16" CIssuer = "2">
  <name> Alice Brown </name>
  <university> University of X </university>
  <department> CS </department>
  <research-group> Security </research-group>
</Professor>
<Secretary credID = "12" subID = "4" CIssuer = "2">
  <name> John James </name>
  <university> University of X </university>
  <department> CS </department>
  <level> Senior </level>
</Secretary>
20Policies in XML
<?xml version = "1.0" encoding = "utf-8"?>
<Policy-base>
  <policy-spec cred-expr = "//Professor[department = 'CS']" target = "annual_report.xml"
    path = "//Patent[@Dept = 'CS']//Node()" priv = "VIEW"/>
  <policy-spec cred-expr = "//Professor[department = 'CS']" target = "annual_report.xml"
    path = "//Patent[@Dept = 'EE']/Short-descr/Node() and //Patent[@Dept = 'EE']/authors" priv = "VIEW"/>
  <policy-spec cred-expr = - - - -
  <policy-spec cred-expr = - - - -
</Policy-base>
Explanation: CS professors are entitled to access all the patents of their department. They are entitled to see only the short descriptions and authors of patents of the EE department.
21Access Control Strategy
- Subjects request access to XML documents under two modes: browsing and authoring
- With browsing access, a subject can read/navigate documents
- Authoring access is needed to modify, delete, append documents
- Access control module checks the policy base and applies policy specs
- Views of the document are created based on credentials and policy specs
- In case of conflict, the least access privilege rule is enforced
- Works for Push/Pull modes
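The strategy above can be sketched as a function that computes a browsing view from credentials and policy specs. This is an illustrative sketch under stated assumptions, not the access control module itself: document nodes are plain path strings rather than XPath expressions, the `PolicySpec` class and the `NONE` (explicit denial) privilege are hypothetical, and "least access privilege" is interpreted as "a denial wins over a grant". The conflicting last spec is added only to show conflict handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySpec:
    role: str        # credential type the subject must hold
    department: str  # credential attribute value
    node: str        # document node the spec applies to
    priv: str        # "VIEW", or "NONE" for an explicit denial (assumption)


POLICY_BASE = [
    PolicySpec("Professor", "CS", "patent/CS", "VIEW"),
    PolicySpec("Professor", "CS", "patent/EE/short-descr", "VIEW"),
    PolicySpec("Professor", "CS", "patent/EE/authors", "VIEW"),
    # Hypothetical conflicting spec, present only to exercise the conflict rule.
    PolicySpec("Professor", "CS", "patent/EE/authors", "NONE"),
]

NODES = [
    "patent/CS",
    "patent/EE/short-descr",
    "patent/EE/authors",
    "patent/EE/full-text",
]


def browsing_view(role, department, nodes, policy_base=POLICY_BASE):
    """Return the nodes the subject may browse."""
    view = []
    for node in nodes:
        privs = {
            spec.priv
            for spec in policy_base
            if (spec.role, spec.department, spec.node) == (role, department, node)
        }
        # Shown only if at least one spec grants VIEW and none denies it.
        if privs == {"VIEW"}:
            view.append(node)
    return view


cs_professor_view = browsing_view("Professor", "CS", NODES)
secretary_view = browsing_view("Secretary", "CS", NODES)
print(cs_professor_view)
print(secretary_view)
```

Nodes with no applicable spec are hidden by default, so a subject whose credentials match no policy gets an empty view.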
22System Architecture for Access Control
[Figure: system architecture for access control. Components: User, Pull/Query, Push/result, X-Access, X-Admin, Admin Tools, Credential base, Policy base, XML Documents]
23Third-Party Architecture
- The Owner is the producer of information. It specifies access control policies
- The Publisher is responsible for managing (a portion of) the Owner information and answering subject queries
- Goal: untrusted Publisher with respect to authenticity and completeness checking
[Figure: third-party architecture. Components: XML Source, policy base, Credential base, SE-XML, Owner, Publisher, Reply document, credentials, Query, User/Subject]
24XML Databases
- Data is presented as XML documents
- Query language: XML-QL
- Query optimization
- Managing transactions on XML documents
- Metadata management: XML schemas/DTDs
- Access methods and index strategies
- XML security and integrity management
25Inference/Privacy Control
[Figure: inference/privacy control architecture. Components: Interface to the Semantic Web; Technology by UTD; Inference Engine/Rules Processor; Policies, Ontologies, Rules; XML Documents, Web Pages, Databases; XML Database]
26Why RDF?
- XML cannot be used to specify semantics
- Example
- Professor is a subclass of Academic Staff
- Professor inherits all properties of Academic Staff
- RDF was specified so that the inadequacies of XML could be handled
- RDF uses XML syntax
- Additional constructs are needed for RDF
27RDF
- Resource Description Framework is the essence of the semantic web
- Adds semantics with the use of ontologies, XML syntax
- RDF Concepts
- Basic Model
- Resources, Properties and Statements
- Container Model
- Bag, Sequence and Alternative
28RDF Basics
- Resource: everything is a resource
- Person, Vehicle, etc.
- Property: properties describe relationships between resources
- E.g., Invented
- Statement: (Object, Property, Value) triple
- Berners-Lee invented the Semantic Web
29RDF Container Model
- Bag: unordered container, may contain multiple occurrences
- rdf:Bag
- Seq: ordered container, may contain multiple occurrences
- rdf:Seq
- Alt: a set of alternatives
- rdf:Alt
30RDF Specification
<rdf:RDF
  xmlns:rdf = "http://w3c.org/1999/02-22-rdf-syntax-ns"
  xmlns:xsd = "http:// - - -"
  xmlns:uni = "http:// - - - -">
  <rdf:Description rdf:about = "949352">
    <uni:name> Berners Lee </uni:name>
    <uni:title> Professor </uni:title>
  </rdf:Description>
  <rdf:Description rdf:about = "ZZZ">
    <uni:bookname> semantic web </uni:bookname>
    <uni:authoredby> Berners Lee </uni:authoredby>
  </rdf:Description>
</rdf:RDF>
31RDF Specification
- RDF specifications have been given for Attributes, Types, Nesting, Containers, etc.
- How can security policies be included in the specification?
- Example: consider the statement "Berners-Lee is the author of the book Semantic Web"
- Do we allow access to the connection between author and book? Do we allow access to the connection but not to the author name and book name?
32RDF Policy Specification
<rdf:RDF
  xmlns:rdf = "http://w3c.org/1999/02-22-rdf-syntax-ns"
  xmlns:xsd = "http:// - - -"
  xmlns:uni = "http:// - - - -">
  <rdf:Description rdf:about = "949352">
    <uni:name> Berners Lee </uni:name>
    <uni:title> Professor </uni:title>
    Level = L1
  </rdf:Description>
  <rdf:Description rdf:about = "ZZZ">
    <uni:bookname> semantic web </uni:bookname>
    <uni:authoredby> Berners Lee </uni:authoredby>
    Level = L2
  </rdf:Description>
</rdf:RDF>
33RDF Schema
- Need RDF Schema to specify statements such as professor is a subclass of academic staff
<rdfs:Class rdf:ID = "professor">
  <rdfs:comment>
    The class of Professors.
    All professors are Academic Staff Members.
  </rdfs:comment>
  <rdfs:subClassOf rdf:resource = "academicStaffMember"/>
</rdfs:Class>
34RDF Schema Security Policies
- How can security policies be specified?
<rdfs:Class rdf:ID = "professor">
  <rdfs:comment>
    The class of Professors.
    All professors are Academic Staff Members.
  </rdfs:comment>
  <rdfs:subClassOf rdf:resource = "academicStaffMember"/>
  Level = L
</rdfs:Class>
35RDF Axiomatic Semantics
- First order logic to specify formulas and inferencing
- Built-in functions (First) and predicates (Type)
- Modus Ponens
- From A and If A then B, deduce B
- Example: all containers are resources
- Type(?c, Container) → Type(?c, Resource)
- If we have Type(A, Container) then we can infer Type(A, Resource)
36RDF Inferencing
- While first order logic provides a proof system, it will be computationally infeasible
- As a result Horn clause logic was developed for logic programming; this is still computationally expensive
- RDF uses if-then rules
- IF E contains the triples (?u, rdfs:subClassOf, ?v) and (?v, rdfs:subClassOf, ?w)
- THEN E also contains the triple (?u, rdfs:subClassOf, ?w)
- That is, if u is a subclass of v, and v is a subclass of w, then u is a subclass of w
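The if-then rule above can be applied repeatedly until no new triples appear. The sketch below shows that fixpoint computation on a small set of triples. The class name `staffMember` and the function name `subclass_closure` are assumptions added for illustration; a real RDF Schema reasoner applies many more rules than this one.

```python
SUBCLASS = "rdfs:subClassOf"

# E: the initial set of triples. "staffMember" is a hypothetical superclass.
E = {
    ("professor", SUBCLASS, "academicStaffMember"),
    ("academicStaffMember", SUBCLASS, "staffMember"),
}


def subclass_closure(triples):
    """Apply the subClassOf transitivity rule until nothing new is derived."""
    closure = set(triples)
    while True:
        derived = set()
        for (u, p1, v) in closure:
            if p1 != SUBCLASS:
                continue
            for (v2, p2, w) in closure:
                # IF (?u subClassOf ?v) and (?v subClassOf ?w)
                if p2 == SUBCLASS and v2 == v:
                    candidate = (u, SUBCLASS, w)  # THEN (?u subClassOf ?w)
                    if candidate not in closure:
                        derived.add(candidate)
        if not derived:
            return closure
        closure |= derived


closed = subclass_closure(E)
for triple in sorted(closed):
    print(triple)
```

The loop stops as soon as a full pass derives nothing new, which is guaranteed because only finitely many triples can be formed from the classes in E.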
37RDF Query
- One can query RDF using an XML query language, but this will be very difficult as RDF is much richer than XML
- Is there an analogy between, say, XQuery and a query language for RDF?
- RQL, an SQL-like language, has been developed for RDF
- Select from RDF document where some condition
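The "select ... where some condition" idea can be illustrated with triple-pattern matching. This is plain Python over tuples, not RQL syntax, and the function name `select` is an assumption. The data reuses the statements from the earlier RDF specification slide.

```python
# Statements as (object, property, value) triples.
TRIPLES = [
    ("949352", "name", "Berners Lee"),
    ("949352", "title", "Professor"),
    ("ZZZ", "bookname", "semantic web"),
    ("ZZZ", "authoredby", "Berners Lee"),
]


def select(triples, obj=None, prop=None, value=None):
    """Return the triples matching every position that is not None."""
    return [
        t for t in triples
        if (obj is None or t[0] == obj)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]


authored = select(TRIPLES, prop="authoredby", value="Berners Lee")
about_zzz = select(TRIPLES, obj="ZZZ")
print(authored)
print(about_zzz)
```

Leaving a position as `None` plays the role of a query variable, so the same function answers "who authored what" and "what is known about ZZZ".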
38Policies in RDF
- How can policies be specified?
- Should policies be specified as shown in the examples, as extensions to RDF syntax?
- Should policies be specified as RDF documents?
- Is there an analogy to XPath expressions for RDF policies?
- <policy-spec cred-expr = "//Professor[department = 'CS']" target = "annual_report.xml" path = "//Patent[@Dept = 'CS']//Node()" priv = "VIEW"/>
39Ontology
- Common definitions for any entity, person or thing
- Several ontologies have been defined and are available for use
- Defining a common ontology for an entity is a challenge
- Mappings have to be developed for multiple ontologies
- Specific languages have been developed for ontologies
40Why is RDF not sufficient?
- RDF was developed as XML is not sufficient to specify semantics
- E.g., class/subclass relationship
- RDF has issues also
- Cannot express several other properties such as union, intersection, relationships, etc.
- Need a richer language
- Ontology languages were developed by the semantic web community for this purpose
- Essentially RDF is not sufficient to specify ontologies
41Security and Ontology
- Ontologies used to specify security policies
- Example: OWL to specify security policies
- Choice between XML, RDF, OWL, RuleML, etc.
- Security for Ontologies
- Access control on Ontologies
- Give access to certain parts of the Ontology
42OWL Background
- It's a language for ontologies and relies on RDF
- DARPA (Defense Advanced Research Projects Agency) developed the early language DAML (DARPA Agent Markup Language)
- Europeans developed OIL (Ontology Inference Layer)
- DAML+OIL combines both and was the starting point for OWL
- OWL was developed by W3C
43OWL Features
- Subclass relationship
- Class membership
- Equivalence of classes
- Classification
- Consistency (e.g., "x is an instance of A, A is a subclass of B, x is not an instance of B" is inconsistent)
- Three types of OWL: OWL-Full, OWL-DL, OWL-Lite
- Automated tools for managing ontologies
- Ontology engineering
44OWL Specification (e.g., Classes)
<owl:Class rdf:about = "associateProfessor">
  <owl:disjointWith rdf:resource = "professor"/>
  <owl:disjointWith rdf:resource = "assistantProfessor"/>
</owl:Class>
<owl:Class rdf:ID = "faculty">
  <owl:equivalentClass rdf:resource = "academicStaffMember"/>
</owl:Class>
Faculty and Academic Staff Member are the same.
Associate Professor is not a Professor.
Associate Professor is not an Assistant Professor.
45OWL Specification (e.g., Property)
Courses are taught by Academic staff members
<owl:ObjectProperty rdf:about = "isTaughtby">
  <rdfs:domain rdf:resource = "course"/>
  <rdfs:range rdf:resource = "academicStaffMember"/>
  <rdfs:subPropertyOf rdf:resource = "involves"/>
</owl:ObjectProperty>
46OWL Specification (e.g., Property Restriction)
All first year courses are taught only by professors
<owl:Class rdf:about = "firstyearCourse">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource = "isTaughtBy"/>
      <owl:allValuesFrom rdf:resource = "Professor"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
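The restriction "all first year courses are taught only by professors" can be illustrated as a check over instance data. This is a closed-world sketch with hypothetical individuals (`cs101`, `alice`, `bob`) and a hypothetical function `violations`. It is not how an OWL reasoner behaves: under OWL's open-world semantics, a reasoner would typically infer that the teacher of a first year course is a Professor rather than report a violation.

```python
# Hypothetical instance data for illustration only.
PROFESSORS = {"alice"}
FIRST_YEAR_COURSES = {"cs101"}
IS_TAUGHT_BY = {("cs101", "alice"), ("cs450", "bob")}


def violations(courses, taught_by, allowed_teachers):
    """List (course, teacher) pairs where a restricted course has a teacher outside the allowed class."""
    return sorted(
        (course, teacher)
        for (course, teacher) in taught_by
        if course in courses and teacher not in allowed_teachers
    )


ok = violations(FIRST_YEAR_COURSES, IS_TAUGHT_BY, PROFESSORS)
bad = violations(FIRST_YEAR_COURSES, IS_TAUGHT_BY | {("cs101", "bob")}, PROFESSORS)
print(ok)
print(bad)
```

The course `cs450` is unaffected because the restriction applies only to members of the first year course class.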
47Policies in OWL
- How can policies be specified?
- Should policies be specified as shown in the examples, as extensions to OWL syntax?
- Should policies be specified as OWL documents?
- Is there an analogy to XPath expressions for OWL policies?
- <policy-spec cred-expr = "//Professor[department = 'CS']" target = "annual_report.xml" path = "//Patent[@Dept = 'CS']//Node()" priv = "VIEW"/>
48Policies in OWL Example
<owl:Class rdf:about = "associateProfessor">
  <owl:disjointWith rdf:resource = "professor"/>
  <owl:disjointWith rdf:resource = "assistantProfessor"/>
  Level = L1
</owl:Class>
<owl:Class rdf:ID = "faculty">
  <owl:equivalentClass rdf:resource = "academicStaffMember"/>
  Level = L2
</owl:Class>
49Logic and Inference
- First order predicate logic
- High level language to express knowledge
- Well understood semantics
- Logical consequence - inference
- Proof systems exist
- Sound and complete
- OWL is based on a subset of logic: description logic
50Why Rules?
- RDF is built on XML and OWL is built on RDF
- We can express subclass relationships in RDF; additional relationships can be expressed in OWL
- However, reasoning power is still limited in OWL
- Therefore the need for rules and subsequently a markup language for rules so that machines can understand
51Example Rules
- Studies(X,Y), Lives(X,Z), Loc(Y,U), Loc(Z,U) → HomeStudent(X)
- i.e., if John studies at UTDallas and John lives on Campbell Road and the location of both Campbell Road and UTDallas is Richardson, then John is a home student
- Note that
- Person(X) → Man(X) or Woman(X) is not a rule in predicate logic
- That is, if X is a person then X is either a man or a woman. This can be expressed in OWL
- However we can have a rule of the form
- Person(X) and Not Man(X) → Woman(X)
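The HomeStudent rule is a join over four relations. The sketch below evaluates it directly over small sets of facts. Mary, Main Street and Plano are hypothetical facts added to show a case where the rule does not fire; the function name `home_students` is also an assumption.

```python
# Facts. John's facts come from the slide; Mary's are hypothetical.
STUDIES = {("John", "UTDallas"), ("Mary", "UTDallas")}
LIVES = {("John", "Campbell Road"), ("Mary", "Main Street")}
LOC = {
    ("UTDallas", "Richardson"),
    ("Campbell Road", "Richardson"),
    ("Main Street", "Plano"),
}


def home_students(studies, lives, loc):
    """Studies(X,Y), Lives(X,Z), Loc(Y,U), Loc(Z,U) -> HomeStudent(X)."""
    result = set()
    for (x, y) in studies:
        for (x2, z) in lives:
            if x2 != x:
                continue
            y_locations = {u for (place, u) in loc if place == y}
            z_locations = {u for (place, u) in loc if place == z}
            # The rule fires when some U is a location of both Y and Z.
            if y_locations & z_locations:
                result.add(x)
    return result


students = home_students(STUDIES, LIVES, LOC)
print(students)
```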
52Monotonic Rules
- → Mother(X,Y)
- Mother(X,Y) → Parent(X,Y)
- If Mary is the mother of John, then Mary is the parent of John
- Syntax: Facts and Rules
- Rule is of the form
- B1, B2, ..., Bn → A
- That is, if B1, B2, ..., Bn hold then A holds
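Rules of the form B1, B2, ..., Bn → A can be evaluated by forward chaining: keep adding A whenever all of its Bi are already known. The sketch below works on ground (variable-free) atoms written as strings, which is a simplifying assumption, and the function name `forward_chain` is hypothetical.

```python
def forward_chain(facts, rules):
    """Derive everything that follows from the facts under the given rules.

    Each rule is a pair (body, head): body is a tuple of atoms B1..Bn,
    head is the atom A. A fact is a rule with an empty body.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known


FACTS = {"Mother(Mary,John)"}
RULES = [
    # Ground instance of Mother(X,Y) -> Parent(X,Y)
    (("Mother(Mary,John)",), "Parent(Mary,John)"),
]

derived = forward_chain(FACTS, RULES)
more = forward_chain(FACTS | {"Mother(Ann,Mary)"}, RULES)
print(sorted(derived))
```

Adding a fact can only add conclusions and never removes one, which is what makes these rules monotonic: everything in `derived` is still in `more`.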
53Logic Programming
- Deductive logic programming is in general based on deduction
- i.e., deduce data from existing data and rules
- e.g., the father of a father is a grandfather; John is the father of Peter and Peter is the father of James, and therefore John is the grandfather of James
- Inductive logic programming induces rules from the data
- e.g., John is the father of Peter, Peter is the father of James, John is the grandfather of James, James is the father of Robert, Peter is the grandfather of Robert
- From the above data, induce that the father of a father is a grandfather
- Popular in Europe and Japan
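The two directions can be illustrated with the slide's own data. The deductive step applies the rule "the father of a father is a grandfather" to the father facts. The final comparison is only the coverage check that an inductive learner would perform on a candidate rule; it is not an inductive logic programming algorithm. The function name `grandfathers` is an assumption.

```python
FATHER = {("John", "Peter"), ("Peter", "James"), ("James", "Robert")}

# Grandfather facts given as examples on the slide.
OBSERVED_GRANDFATHER = {("John", "James"), ("Peter", "Robert")}


def grandfathers(father):
    """Deduction: Father(X,Y), Father(Y,Z) -> Grandfather(X,Z)."""
    return {
        (x, z)
        for (x, y) in father
        for (y2, z) in father
        if y == y2
    }


deduced = grandfathers(FATHER)
print(sorted(deduced))

# Coverage check: the candidate rule derives exactly the observed examples.
rule_fits_examples = deduced == OBSERVED_GRANDFATHER
print(rule_fits_examples)
```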
54Nonmonotonic Rules
- If we have X and NOT X, we do not treat them as inconsistent as in the case of monotonic reasoning.
- For example, consider an apartment that is acceptable to John. That is, in general John is prepared to rent an apartment unless the apartment has fewer than two bedrooms, does not allow pets, etc. This can be expressed as follows
- → Acceptable(X)
- Bedroom(X,Y), Y<2 → NOT Acceptable(X)
- NOT Pets(X) → NOT Acceptable(X)
- Note that there could be a contradiction. But with nonmonotonic reasoning this is allowed.
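One common way to resolve the contradiction is to treat Acceptable(X) as a default and let the more specific rules override it. The slide does not state a priority scheme, so that ordering is an assumption here, as are the function name `acceptable` and the dictionary encoding of what is known about an apartment.

```python
def acceptable(known):
    """Decide Acceptable(X) from what is currently known about apartment X.

    `known` may contain "bedrooms" (int) and "pets" (bool); missing keys
    mean the information is not available yet.
    """
    bedrooms = known.get("bedrooms")
    # Bedroom(X,Y), Y<2 -> NOT Acceptable(X): overrides the default.
    if bedrooms is not None and bedrooms < 2:
        return False
    # NOT Pets(X) -> NOT Acceptable(X): overrides the default.
    if known.get("pets") is False:
        return False
    # Default rule: -> Acceptable(X)
    return True


before = acceptable({})                 # nothing known: default applies
after = acceptable({"bedrooms": 1})     # new fact withdraws the conclusion
print(before, after)
```

Learning that the apartment has one bedroom retracts the earlier conclusion, which is exactly what monotonic rules cannot do.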
55Rule Markup
- The various components of logic are expressed in the Rule Markup Language, RuleML
- Both monotonic and nonmonotonic rules can be represented
- Example representation of Fact P(a): a is a parent
<fact>
  <atom>
    <predicate>p</predicate>
    <term>
      <const>a</const>
    </term>
  </atom>
</fact>
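The fact markup above can be generated programmatically. The sketch below builds the same element structure with Python's standard `xml.etree.ElementTree`. The element names follow the slide and are not checked against any RuleML schema version; the function name `fact` is an assumption.

```python
import xml.etree.ElementTree as ET


def fact(predicate, constant):
    """Build <fact><atom><predicate/><term><const/></term></atom></fact>."""
    fact_el = ET.Element("fact")
    atom = ET.SubElement(fact_el, "atom")
    ET.SubElement(atom, "predicate").text = predicate
    term = ET.SubElement(atom, "term")
    ET.SubElement(term, "const").text = constant
    return fact_el


xml_text = ET.tostring(fact("p", "a"), encoding="unicode")
print(xml_text)
```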
56Policies in RuleML
<fact>
  <atom>
    <predicate>p</predicate>
    <term>
      <const>a</const>
    </term>
  </atom>
  Level = L
</fact>
57An Application: Horizontal Information Products at Elsevier
- Elsevier is a publishing company based in Amsterdam
- E.g., publisher of the journal Computer Standards and Interfaces, which has papers on all kinds of computer-related standards
- Currently the journals and books are grouped by topics such as operating systems, databases, etc. (or at a higher level, Biology, Chemistry, etc.)
- Where do we then put the journal Computer Standards and Interfaces?
- Need horizontal groupings also
58Horizontal Information Products at Elsevier
- Semantic web technologies are being used by Elsevier
- RDF for document representation
- RDF for ontologies
- Query language based on RDF to query the documents and the ontologies
- E.g., Life Science Thesaurus EMTREE
- Other publishing companies are following in Elsevier's direction
59Common Threads and Challenges
- Common Threads
- Building Ontologies for Semantics
- XML for Syntax
- Challenges
- Scalability, Resolvability
- Security policy specification, securing the documents and ontologies
- Developing applications for secure semantic web technologies
- Automated tools for ontology management
- Creating, maintaining, evolving and querying ontologies