Title: Creating and Sharing Structured Semantic Web Contents through the Social Web
1Creating and Sharing Structured Semantic Web
Contents through the Social Web
- (Main Evaluation)
- Aman Shakya
- Advisor: Prof. Hideaki Takeda
- Sub-advisors: Assoc. Prof. Nigel Collier, Assoc. Prof. Kenro Aihara
2Outline
- Introduction
- Social Semantic Web
- State of the Art and Problems
- Proposed approach
- The StYLiD system
- Concept consolidation
- Concept grouping
- Evaluation
- Practical applications
- Conclusions
3Introduction
4Background
- Information Sharing
- Information publishing
- Understandable semantics
- Information dissemination
- Shared information
- Better utilization → Increased value
- Shared information put together
- Valuable knowledge
5Social Web and Web 2.0
- Easy to publish, understand and use
- Information sharing platform
- User generated contents
- Connecting people
- Collaboration
- Mass participation: Power of People
- Wisdom of the crowds
6Current Limitations and Needs
- Data processing and automation
- Unstructured data only for humans
- Interoperability
- Sharing data across different applications
- Integration
- Combining data from different applications
7The Semantic Web
- Web of Structured Data
- Machine understandable semantics
- Ontologies
- Represent Conceptualizations of things
- Consensus and common formats
- Enables
- Automated processing
- Interoperation and Integration
- Effective search and browsing
8Challenges
- Difficult to publish on the Semantic Web
- Wide variety of data to share
- Long Tail of information domains (Huynh et al., 2007)
- Not enough ontologies
- Ontology creation is a difficult process
- Goal: to enable people to easily share a wide variety of semantically structured data
9Social Semantic Web
- Social software + Semantic Web
- Web 3.0
Social connectivity
Social Semantic Web
Information connectivity
- Adapted from (Decker, 2005)
10State-of-Art Social Semantic Web
Structured content creation on the Social
Semantic Web
Direct Structured Contents
Derived Structured Contents
Instance Data Creation
Semantification of Social Data
Data Exporters
Semantic Blogging
Scrapers
Semantic Bookmarking
Semantics of Tags
Semantic Desktop
Semantics from Text
Semantic Annotation
Emergent Semantics
Ontology Instance Data creation
Semantic Wikis
Collaborative Ontology Creation
11Collaborative Knowledge Base Creation
Knowledge base ontology instance data
Collaborative Knowledge Base
Users
Users
12Collaborative Knowledge Base Creation Systems
System | Ease of use | Expressiveness | Constraints | Multiplicity | Consensus
Semantic Wikis (SMW, IkeWiki, etc.) | Complex: extended wiki syntax, some training needed | Moderate: mainly instances, concept schemas possible | Strict type constraints | No | Needed (wiki way)
Freebase (Metaweb Inc.) | Moderate: interactive but elaborate interface | Moderate: concept schemas, instances | Strict type constraints | Allowed, but concepts not related | Mostly needed (wiki way, by admin)
myOntology (Siorpaes and Hepp, 2007) | Complex: understanding of ontology needed | Moderate: concepts, relations, instances | Strict logical constraints | No | Needed (wiki way)
Ontology Maturing (Braun et al., 2007) | Fairly easy: need to build taxonomy | Low: concept hierarchy | Free tagging | No | Needed (by interaction)
Desired Solution | Easy | Moderate | Minimum | Yes | Optional
13Problems
- Complexity and learning curve
- Powerful collaborative systems are difficult for ordinary people
- Difficult to create perfect concept definitions and ontologies
- Difficult to accommodate all requirements
- Strict constraints can make the model rigid
- Existence of multiple conceptualizations
- Different perspectives or contexts
- Difficulty of collaboration and consensus
14Proposed Approach
15Proposed Collaborative Knowledge Base Creation
Collaborative Knowledge Base
Users
Users
Users
16Overview of Proposed Approach
Structured Data Collection
Concept Consolidation
Social Platform for Structured Data Authoring
Schema Alignment
Concepts
Instances
Concept Grouping
Structured Linked Data
Grouped concepts
Browsing, Searching, Services
Emerging Lightweight Ontologies
User Community
17StYLiD
- Structure Your own Linked Data
- http://www.stylid.org
- Social Software for
- Sharing a wide variety of Structured Data
- Users freely define their own concepts
- Easy for ordinary people
- Consolidate multiple concept schemas
- Group and organize similar concepts
- Popular, evolving concept definitions
18Hotel Concept
Creating a new Concept
List of Attributes
Description
Or Reuse / Modify existing Concept
Suggested Value Range
19Shinjuku Prince Hotel
Instance Data
Literal value
Pick value from Suggested range
Resource URI
External URI
Multiple Values
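The value kinds on this slide (literal values, resource URIs, external URIs, multiple values) map naturally onto linked-data triples. The following is a minimal sketch of that mapping, not StYLiD's actual export code: the function name `to_ntriples`, the `example.org` URIs, and the attribute values are all hypothetical placeholders.

```python
def to_ntriples(subject_uri, base_uri, attributes):
    """Serialize one instance as N-Triples lines.

    attributes maps an attribute name to a value or a list of values.
    Values starting with http:// or https:// are treated as resource URIs;
    everything else is emitted as a plain literal.
    """
    lines = []
    for name, values in attributes.items():
        if not isinstance(values, list):
            values = [values]          # single value -> one-element list
        predicate = f"<{base_uri}{name}>"
        for value in values:
            if value.startswith(("http://", "https://")):
                obj = f"<{value}>"
            else:
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'
            lines.append(f"<{subject_uri}> {predicate} {obj} .")
    return lines


# Hypothetical instance: URIs and values are placeholders, not real data.
hotel = {
    "name": "Shinjuku Prince Hotel",
    "facilities": ["restaurant", "bar"],          # multiple values
    "city": "http://example.org/resource/Tokyo",  # resource URI
}
triples = to_ntriples("http://example.org/instance/hotel1",
                      "http://example.org/concept/Hotel#", hotel)
print("\n".join(triples))
```

Each value of a multi-valued attribute becomes its own triple, so no special list syntax is needed.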
20Concept Consolidation
- Hotel 1
- Name
- Amenities
- Capacity
- Contact
- Price
- Access
- Rating
- Hotel 2
- Name
- Facilities
- No. of rooms
- Phone-number
- Single room price
- Double room price
- Nearest station
- Category
- Address
- Hotel 3
- Name
- Price
- Rating
- City
- Country
- Near-by attractions
- Hotel 4
- Name
- Phone-number
- Zip-code
- Latitude
- Longitude
- No. of stories
same
Synonymous / different labels
Different Contexts / Perspectives
Many-to-one
Complementary
21- Hotel (Consolidated Concept)
- Name
- Facilities
- Capacity
- Contact
- Single room price
- Double room price
- Access
- Rating
- Address
- Zip-code
- Latitude
- Longitude
- Near-by attractions
- No. of stories
Consolidated Concept
22Concept Consolidation
- A concept consolidation is defined as a triple $\langle \bar{C}, S, A \rangle$, where
- $\bar{C}$ is the consolidated concept
- $S$ is the set of constituent concepts $\{C_1, C_2, \ldots, C_n\}$
- $A$ is the attribute alignment between $\bar{C}$ and $S$
- Based on the Global-as-View (GAV) approach for data integration (Lenzerini, 2002)
- Global schema defined as views on source schemas
- Consolidated concept with consolidated attributes
- Aligned to source concept attributes as views
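The triple defined on this slide can be pictured as a small data structure. The sketch below is illustrative only: the class name `Consolidation`, the nested-dictionary layout of the alignment, and the sample instance values are assumptions, and the attribute pairings are one plausible reading of the Hotel example on the earlier slides.

```python
from dataclasses import dataclass


@dataclass
class Consolidation:
    """A concept consolidation: consolidated concept, sources, alignment."""
    consolidated: str   # name of the consolidated concept
    sources: list       # names of the constituent concepts C1..Cn
    # alignment[consolidated_attr][source_concept] -> source attribute name
    alignment: dict

    def view(self, source, instance):
        """Translate one source instance into consolidated attributes (GAV view)."""
        result = {}
        for target_attr, per_source in self.alignment.items():
            source_attr = per_source.get(source)
            if source_attr in instance:
                result[target_attr] = instance[source_attr]
        return result


hotel = Consolidation(
    consolidated="Hotel",
    sources=["Hotel 1", "Hotel 2"],
    alignment={
        "Name": {"Hotel 1": "Name", "Hotel 2": "Name"},
        "Facilities": {"Hotel 1": "Amenities", "Hotel 2": "Facilities"},
        "Capacity": {"Hotel 1": "Capacity", "Hotel 2": "No. of rooms"},
    },
)

# Hypothetical Hotel 2 instance; "Category" has no alignment here and is dropped.
row = hotel.view("Hotel 2", {"Name": "Hotel A", "No. of rooms": "120",
                             "Category": "business"})
print(row)
```

Because every consolidated attribute is defined purely in terms of source attributes, adding a new source concept only requires extending the alignment, not changing existing instances.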
23Concept Consolidation
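[Figure: concept consolidation. Consolidated attributes are views over aligned source attributes; the alignment A is the set of aligned(consolidated attribute, source attribute) pairs.]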
24Concept Consolidation
- Consolidated view of instances
- Translation of instances
- From one conceptualization to another
- Query Unfolding (Advantage of GAV over LAV)
- Queries over the consolidated concept (in terms of its attributes) are translated to queries over C1, C2, ..., Cn
- Using alignment A
- Union of results
- Translation of queries
25Concept Cloud
Consolidated concept
Sub-Cloud
26Experiment on Conceptualization
- Hypothesis
- Multiple conceptualizations by different people for the same thing can be consolidated
- Methodology
- Participants given short text passages (6 participants)
- List down facts structured as an (Attribute, Value) table
- All concept schemas aligned manually
Concept schema (example):
attribute | value
name | Kiyomizu
location | Kyoto
.. | ..
27Observations
Types of Alignment Relations found
Attribute label similarity
28Remarks
- People can express their conceptualizations in terms of schemas
- Different people have different conceptualizations
- No one covers all possible attributes
- Conceptualizations overlap significantly
- Most parts can be aligned
- Most have simple alignment relations
- Multiple conceptualizations can be consolidated
29Alignment of Concept Schemas
- Attribute alignments suggested automatically
- Alignment API implementation (with WordNet extension) (Euzenat, 2004)
- Community-supported alignment
- Human intelligence + machine intelligence
- Alignments are represented and saved
- Alignment ontology (Hughes and Ashpole, 2004)
- Alignment API alignment specification language (Euzenat et al., 2004)
- Other formats: C-OWL, SWRL, OWL axioms, XSLT, SEKT-ML and SKOS
- Incremental alignment (maintained collaboratively)
- A unified view
- Consolidated concept with consolidated attributes
- Homogeneous table of data
30Semi-automatic Schema Alignment
Two Hotel concepts
Consolidated attributes
31Consolidated Structured Search
Query: find all hotels with location "Tokyo" and type "luxury"
Search on Consolidated Concept
Hotel 1 ↔ Hotel 2: location ↔ address, type ↔ category
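The search on this slide is an instance of query unfolding: a query over consolidated attributes is rewritten for each source concept through the alignment, and the results are unioned. A minimal sketch follows, assuming equality-only queries; the function names `unfold` and `search` and the four sample hotel records are hypothetical, while the attribute pairs (location/address, type/category) come from the slide.

```python
def unfold(query, alignment, source):
    """Rewrite a query over consolidated attributes for one source concept.

    Returns None when some queried attribute has no counterpart in the source.
    """
    rewritten = {}
    for attr, wanted in query.items():
        source_attr = alignment.get(attr, {}).get(source)
        if source_attr is None:
            return None
        rewritten[source_attr] = wanted
    return rewritten


def search(query, alignment, data):
    """Run the unfolded query on every source and return the union of hits."""
    hits = []
    for source, instances in data.items():
        source_query = unfold(query, alignment, source)
        if source_query is None:
            continue
        for inst in instances:
            if all(inst.get(a) == v for a, v in source_query.items()):
                hits.append((source, inst))
    return hits


alignment = {
    "location": {"Hotel 1": "location", "Hotel 2": "address"},
    "type": {"Hotel 1": "type", "Hotel 2": "category"},
}
# Hypothetical instance data for the two Hotel concepts.
data = {
    "Hotel 1": [{"name": "A", "location": "Tokyo", "type": "luxury"},
                {"name": "B", "location": "Kyoto", "type": "luxury"}],
    "Hotel 2": [{"name": "C", "address": "Tokyo", "category": "luxury"},
                {"name": "D", "address": "Tokyo", "category": "budget"}],
}
results = search({"location": "Tokyo", "type": "luxury"}, alignment, data)
print([inst["name"] for _, inst in results])
```

Skipping a source that lacks a queried attribute is a design choice in this sketch; a real system might instead return partial matches.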
32Concept Grouping
- Concept similarity
- $\mathrm{ConceptSim}(C_1, C_2) = w_1 \cdot \mathrm{NameSim}(N_1, N_2) + w_2 \cdot \mathrm{SchemaSim}(S_1, S_2)$
- NameSim
- WordNet-based similarity: Lin's algorithm (1998)
- Levenshtein distance
- SchemaSim
- Average similarity of best matching pairs of attributes
- Calculate ConceptSim between all pairs of concepts
- Group similar concepts above a threshold
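The grouping procedure on this slide can be sketched as follows. Several parts are assumptions rather than the presented method: NameSim here uses only normalized Levenshtein distance (no WordNet), `schema_sim` is a simple Jaccard stand-in for the matching-based SchemaSim, and groups are formed as connected components of concept pairs whose similarity reaches the threshold. The default weights and threshold are the values used in the evaluation slide.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def name_sim(n1, n2):
    """Levenshtein distance normalized to a similarity in [0, 1]."""
    n1, n2 = n1.lower(), n2.lower()
    longest = max(len(n1), len(n2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(n1, n2) / longest


def schema_sim(s1, s2):
    """Stand-in: Jaccard overlap of lowercased attribute names."""
    a, b = {x.lower() for x in s1}, {x.lower() for x in s2}
    return len(a & b) / len(a | b) if a | b else 0.0


def concept_sim(c1, c2, w1=0.7, w2=0.3):
    """Weighted sum of name and schema similarity; a concept is (name, attributes)."""
    return w1 * name_sim(c1[0], c2[0]) + w2 * schema_sim(c1[1], c2[1])


def group_concepts(concepts, threshold=0.8):
    """Union-find over all pairs whose ConceptSim reaches the threshold."""
    parent = list(range(len(concepts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(concepts)), 2):
        if concept_sim(concepts[i], concepts[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(concepts)):
        groups.setdefault(find(i), []).append(concepts[i][0])
    return list(groups.values())


# Hypothetical concepts, loosely modeled on the Recipe example.
concepts = [
    ("Recipe", ["ingredients", "steps"]),
    ("Recipes", ["ingredients", "steps"]),
    ("Hotel", ["name", "price"]),
]
print(group_concepts(concepts))
```

Connected components make grouping transitive: two concepts can share a group through a chain of similar neighbors even if their own similarity is below the threshold.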
33Schema Similarity
- Calculate NameSim for all pairs of attributes to create an $n_1 \times n_2$ matrix
- $M = \mathrm{NameSim}(A_1 \times A_2)$
- Find best matching pairs using
- Hungarian algorithm on $M$
- (Kuhn, 1955; Munkres, 1957)
- Calculate matching average
- $\mathrm{SchemaSim}(S_1, S_2) = \dfrac{2 \times \sum \text{similarity of best matching pairs}}{|A_1| + |A_2|}$
- Adapted from semantic similarity between sentences (Simpson and Dao, 2005)
[Figure: schemas $S_1$ and $S_2$ with attribute sets $A_1$ and $A_2$]
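The matching average can be sketched in a few lines. Two substitutions are made for the sake of a self-contained example and are not the presented implementation: the best one-to-one matching is found by brute force over permutations instead of the Hungarian algorithm (same optimum, but only practical for a handful of attributes), and `name_sim` uses `difflib.SequenceMatcher` as a stand-in string similarity rather than WordNet or Levenshtein measures.

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_sim(a, b):
    """Stand-in string similarity in [0, 1] (not WordNet-based)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matching_total(matrix):
    """Maximum total similarity over one-to-one row/column assignments.

    Brute force over permutations; the Hungarian algorithm gives the same
    optimum in polynomial time and should be used for larger schemas.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows > cols:                       # transpose so that rows <= cols
        matrix = [list(col) for col in zip(*matrix)]
        rows, cols = cols, rows
    return max(
        sum(matrix[r][c] for r, c in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )


def schema_sim(a1, a2, sim=name_sim):
    """2 * (sum of similarities of best matching pairs) / (|A1| + |A2|)."""
    if not a1 or not a2:
        return 0.0
    matrix = [[sim(x, y) for y in a2] for x in a1]
    return 2.0 * best_matching_total(matrix) / (len(a1) + len(a2))


def exact(a, b):
    """Exact-match similarity, used to make the arithmetic easy to follow."""
    return 1.0 if a == b else 0.0


# Attribute subsets chosen for illustration.
a1 = ["name", "price", "rating"]
a2 = ["name", "rating", "city", "country"]
print(schema_sim(a1, a2, exact))   # two exact matches: 2 * 2 / (3 + 4)
print(schema_sim(a1, a2))          # with the stand-in string similarity
```

Dividing by the combined attribute count penalizes schemas of very different sizes even when every attribute of the smaller one finds a match.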
34Visualization of Concepts Grouping
Cytoscape
35Experiments on Freebase Data
- Purpose
- Evaluate automatic schema alignment
- Evaluate proposed concept grouping method
- Observations about user-defined concepts
- Freebase: community-driven database of the world's information
- User-defined Types (concept schemas)
- Queried out (May 20, 2008)
- Cleaning
- Filter out test types, stop-words, types without
instances
36Observations
- After cleaning
- 1,412 concepts
- 500 users who defined concepts
- People want to share a wide variety of data
- People define their own concept schemas
- Most people only define few concepts (1-5)
- Long tail of information types
37Freebase Concept Consolidation
- Concepts with same name, synonyms, morphological variants
- 57 consolidated concepts formed
- Multiple versions of concept by different users
- Up to 6 versions of the same concept
- Same user also defines multiple versions
- Alignments suggested automatically
- 51 alignment relations (44 aligned attribute sets)
- Human judgement
- Precision 88.24%
- Recall 67.16%
38Concept Consolidation Example
- Recipe (user1), Recipe (user2), Recipes (user3), ...
- Labeled r1, r2, r3
- Consolidated concept: Recipe
- Consolidated attributes (aligned attribute sets)
- r1:ingredient, r2:ingredients, r3:materials
- r1:steps, r2:instructions
- r3:directions
- r2:tools_required
- r3:taste
- r3:author
(adapted from Freebase)
39Evaluation of Concept Grouping
- $\mathrm{ConceptSim}(C_1, C_2) = w_1 \cdot \mathrm{NameSim}(N_1, N_2) + w_2 \cdot \mathrm{SchemaSim}(S_1, S_2)$
[Chart: Concept grouping with different thresholds ($w_1 = 0.7$, $w_2 = 0.3$)]
[Chart: Concept grouping with different weights (threshold = 0.8)]
40Emergence of Lightweight Ontologies
- Concepts contributed by community
- Concept consolidation
- Concept grouping
- Popularity of concepts (as in Tag clouds)
- Common vocabulary for structured information sharing
- Conceptual schemas (class/property)
- Informal organization by similarity
41Informal Lightweight Ontology
Source: Schaffert et al. (2005), p. 7
42Evaluation
43Evaluation of Usability
- Hypothesis
- StYLiD is more usable than Freebase (for given tasks)
- Methodology
- Tasks performed with StYLiD and Freebase
- Task 1: Structured data authoring
- Task 2: Concept schema creation
- Tasks 3, 4: Modifying and reusing concepts
- Task 5: Structured concepts and instances authoring
- Task 6: Searching
- Observations
- Questionnaires, screen logs, comments, etc.
44Example (Task 1)
Input Band The Beatles
45Participants
- Total 15 participants
- Including 6 without IT background
- Different backgrounds
- Public policy, international relations, psychology, telecommunication, networks, hotel staff, etc.
- From 10 countries
- Age 22-43 (avg. 28.3)
- Most did not know the systems before
46Results
- System Usability Scale (SUS) (Digital Equipment Corp.)
- Average scores: StYLiD 69.7, Freebase 39.3
- Enhanced Semantic MediaWiki: 54.8 (Pfisterer et al., 2008)
- Aggregated results from the tasks (score 0-4)
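For readers unfamiliar with how SUS scores such as those above are derived, the standard scoring rule is: ten items answered on a 1-5 scale, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. The sketch below applies that rule; the function name `sus_score` and the sample responses are hypothetical and are not data from this study.

```python
def sus_score(responses):
    """SUS score (0-100) from ten responses on a 1-5 scale, in item order."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected ten integer responses between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


# Hypothetical questionnaire from a single participant.
one_participant = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(one_participant))
```

A system's reported SUS score is the mean of the per-participant scores.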
47Results for non-IT participants
- 6 participants
- SUS scores
- StYLiD (71.67), Freebase (50.42)
48Observations
- StYLiD quite usable without any training, knowledge or help
- Most users preferred StYLiD to Freebase
- Specifying attribute value range not easy
- Strict data type constraints can cause problems
- Many people modify and reuse concepts
- People try to input all data in minimum steps
- Data entry can be made easier and quicker
- Auto-complete mechanisms would be helpful
49Comparison with some systems
Feature | StYLiD | Freebase | Semantic MediaWiki
Concept creation | UI supported | UI supported | Template markup
Instance creation | Form-based | Form-based | Extended wiki syntax, forms
Data authoring | Blogging / social bookmarking | Structured wiki | Wiki text annotation
Data import | Wrappers | Bulk import facility | Not possible
Constraints | Flexible | Strict type constraints | Strict type constraints
Multiplicity | Allowed | Partly | No
Consolidation | Schema-level | Some instances | No
Organization | Concept grouping | Bases | Categories
50Practical Applications
51Application Scenarios
- Social Site for
- Structured Information Sharing
Users
Concept Schemas
External Data Resources
Structured data
Information Sharing Social Semantic Website
Users
52Application Scenarios
- Integrated Semantic portal
IS1
Structured data
Wrapper1
IS2
Wrapper2
Wrapper3
IS3
External Data Resources
Concept Schemas
Information Sources
Integrated Semantic Portal
Users
Admin
53Adapting to different scenarios
- Variable aspects
- Data and concepts acquisition
- Community and motivation
- Functionalities and constraints
- Data quality
- Ways of adaptation
- Use of wrappers, etc.
- Delegate functionalities/constraints
- Extensible and customizable open source
- Customized queries and views
54Real practical applications
- Integration of research staff directories
- Osaka University and Nagoya University
- Data scraped from the websites
- A musical community website in Tokyo International Exchange Center
- Social data bookmarking site StYLiD.org
- A document management system in AIT
55University Directory Integration
- 10 alignments automatically suggested
- All correct
- Total 19 alignments
56Integrated interface
57TIEC Musical Community website
58StYLiD.org Data Bookmarking
59Document Management system
60Structured Information Dissemination in
Decentralized Communities
[Figure: several SocioBiblog systems connected by social network links over the Web; each performs publishing and aggregation, exchanging data as extended RSS]
61Conclusions
62Conclusions
- Social web application for sharing structured Semantic Web contents
- StYLiD
- Free contribution, no strict constraints
- Usable (even without training)
- Concept consolidation
- Multiple conceptualizations exist
- Overlap significantly and can be consolidated
- Automatic alignments with good precision and recall
- A loose collaborative approach for creating shared concept definitions
63Conclusions (contd.)
- Concept grouping by similarity
- Informal organization
- Good precision can be obtained
- Parameters can be tuned for appropriate coverage and precision
- Emergent lightweight informal ontologies
- Ontology as by-product of information sharing and integration
- Practical applications
64Future Directions
- Computing concept relations
- Hierarchical and non-hierarchical
- Better schema alignment techniques
- Consolidation of data instances
- Using existing vocabularies
- Mash-ups / plugins to utilize structured data
- Scrapers to collect data from the web
65Thank You!