Creating and Sharing Structured Semantic Web Contents through the Social Web

Description:

Users can also define their own concept groups. Grouped islands of concepts are connected by hierarchical relations from WordNet (e.g. hotel, guest house, ...).

Slides: 66
Provided by: aman69

Transcript and Presenter's Notes

Title: Creating and Sharing Structured Semantic Web Contents through the Social Web


1
Creating and Sharing Structured Semantic Web
Contents through the Social Web
  • (Main Evaluation)
  • Aman Shakya
  • Advisor: Prof. Hideaki Takeda
  • Sub-advisors: Assoc. Prof. Nigel Collier,
  • Assoc. Prof. Kenro Aihara

2
Outline
  • Introduction
  • Social Semantic Web
  • State of the art and Problems
  • Proposed approach
  • The StYLiD system
  • Concept consolidation
  • Concept grouping
  • Evaluation
  • Practical applications
  • Conclusions

3
Introduction
4
Background
  • Information Sharing
  • Information publishing
  • Understandable semantics
  • Information dissemination
  • Shared information
  • Better utilization → Increased value
  • Shared information put together
  • Valuable knowledge

5
Social Web and Web 2.0
  • Easy to publish, understand and use
  • Information sharing platform
  • User generated contents
  • Connecting people
  • Collaboration
  • Mass participation: Power of People
  • Wisdom of the crowds

6
Current Limitations and Needs
  • Data processing and automation
  • Unstructured data only for humans
  • Interoperability
  • Sharing data across
  • different applications
  • Integration
  • Combining data from
  • different applications

7
The Semantic Web
  • Web of Structured Data
  • Machine understandable semantics
  • Ontologies
  • Represent Conceptualizations of things
  • Consensus and common formats
  • Enables
  • Automated processing
  • Interoperation and Integration
  • Effective search and browsing

8
Challenges
  • Difficult to publish on the Semantic Web
  • Wide variety of data to share
  • Long Tail of information domains (Huynh et
    al., 2007)
  • Not enough ontologies
  • Ontology creation is a difficult process
  • Goal: to enable people to easily share a wide
    variety of semantically structured data

9
Social Semantic Web
  • Social software + Semantic Web
  • Web 3.0

Social connectivity
Social Semantic Web
Information connectivity
- Adapted from (Decker, 2005)
10
State of the Art: Social Semantic Web
Structured content creation on the Social
Semantic Web
Direct Structured Contents
Derived Structured Contents
Instance Data Creation
Semantification of Social Data
Data Exporters
Semantic Blogging
Scrapers
Semantic Bookmarking
Semantics of Tags
Semantic Desktop
Semantics from Text
Semantic Annotation
Emergent Semantics
Ontology Instance Data creation
Semantic Wikis
Collaborative Ontology Creation
11
Collaborative Knowledge Base Creation
Knowledge base = ontology + instance data
Collaborative Knowledge Base
Users
Users
12
Collaborative Knowledge Base Creation Systems
System | Ease of use | Expressiveness | Constraints | Multiplicity | Consensus
Semantic Wikis (SMW, IkeWiki, etc.) | Complex: extended wiki syntax, some training needed | Moderate: mainly instances, concept schemas possible | Strict type constraints | No | Needed (wiki way)
Freebase (Metaweb Inc.) | Moderate: interactive but elaborate interface | Moderate: concept schemas, instances | Strict type constraints | Allowed, but concepts not related | Mostly needed (wiki way, by admin)
myOntology (Siorpaes and Hepp, 2007) | Complex: understanding of ontology needed | Moderate: concepts, relations, instances | Strict logical constraints | No | Needed (wiki way)
Ontology Maturing (Braun et al., 2007) | Fairly easy: need to build taxonomy | Low: concept hierarchy | Free tagging | No | Needed (by interaction)
Desired Solution | Easy | Moderate | Minimum | Yes | Optional
13
Problems
  • Complexity and learning curve
  • Powerful collaborative systems difficult for
    ordinary people
  • Difficult to create perfect concept definitions
    and ontologies
  • Difficult to accommodate all requirements
  • Strict constraints can make the model rigid
  • Existence of multiple conceptualizations
  • Different perspectives or contexts
  • Difficulty of collaboration and consensus

14
Proposed Approach
15
Proposed Collaborative Knowledge Base Creation
Collaborative Knowledge Base
Users
Users
Users
16
Overview of Proposed Approach
Structured Data Collection
Concept Consolidation
Social Platform for Structured Data Authoring
Schema Alignment
Concepts
Instances
Concept Grouping
Structured Linked Data
Grouped concepts
Browsing, Searching, Services
Emerging Lightweight Ontologies
User Community
17
StYLiD
  • Structure Your own Linked Data
  • http://www.stylid.org
  • Social Software for
  • Sharing a wide variety of Structured Data
  • Users freely define their own concepts
  • Easy for ordinary people
  • Consolidate multiple concept schemas
  • Group and organize similar concepts
  • Popular, evolving concept definitions

18
Hotel Concept
Creating a new Concept
List of Attributes
Description
Or Reuse / Modify existing Concept
Suggested Value Range
19
Shinjuku Prince Hotel
Instance Data
Literal value
Pick value from Suggested range
Resource URI
External URI
Multiple Values
20
Concept Consolidation
  • Hotel 1
  • Name
  • Amenities
  • Capacity
  • Contact
  • Price
  • Access
  • Rating
  • Hotel 2
  • Name
  • Facilities
  • No. of rooms
  • Phone-number
  • Single room price
  • Double room price
  • Nearest station
  • Category
  • Address
  • Hotel 3
  • Name
  • Price
  • Rating
  • City
  • Country
  • Near-by attractions
  • Hotel 4
  • Name
  • Phone-number
  • Zip-code
  • Latitude
  • Longitude
  • No. of stories

same
Synonymous / different labels
Different Contexts / Perspectives
Many-to-one
Complementary
21
  • Hotel (Consolidated Concept)
  • Name
  • Facilities
  • Capacity
  • Contact
  • Single room price
  • Double room price
  • Access
  • Rating
  • Address
  • Zip-code
  • Latitude
  • Longitude
  • Near-by attractions
  • No. of stories

Consolidated Concept
22
Concept Consolidation
  • A concept consolidation C is defined as a triple
  • $C = \langle \bar{C}, S, A \rangle$, where
  • $\bar{C}$ is the consolidated concept
  • $S$ is the set of constituent concepts $\{C_1, C_2, \ldots, C_n\}$
  • $A$ is the attribute alignment between $\bar{C}$ and $S$
  • Based on the Global-as-View (GAV) approach to data
    integration (Lenzerini, 2002)
  • Global schema defined as views on source schemas
  • Consolidated concept with consolidated
    attributes
  • aligned to source concept attributes as views
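The triple defined above can be sketched as a small data structure. This is only an illustration, not the StYLiD implementation: the class name, the field names, the attribute pairings (e.g. Amenities with Facilities) and the phone number are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConceptConsolidation:
    """Illustrative triple: consolidated concept, constituent concepts, alignment."""
    consolidated: List[str]              # attributes of the consolidated concept
    sources: Dict[str, List[str]]        # S: constituent concept -> its attributes
    # A: consolidated attribute -> {constituent concept: source attribute}
    alignment: Dict[str, Dict[str, str]]

    def view(self, source: str, instance: dict) -> dict:
        """Express one source instance in terms of the consolidated attributes."""
        row = {}
        for attr in self.consolidated:
            src_attr = self.alignment.get(attr, {}).get(source)
            # Attributes with no counterpart in this source stay empty.
            row[attr] = instance.get(src_attr) if src_attr else None
        return row


# Hypothetical alignment between two of the Hotel schemas.
hotel = ConceptConsolidation(
    consolidated=["Name", "Facilities", "Contact"],
    sources={
        "Hotel 1": ["Name", "Amenities", "Contact"],
        "Hotel 2": ["Name", "Facilities", "Phone-number"],
    },
    alignment={
        "Name": {"Hotel 1": "Name", "Hotel 2": "Name"},
        "Facilities": {"Hotel 1": "Amenities", "Hotel 2": "Facilities"},
        "Contact": {"Hotel 1": "Contact", "Hotel 2": "Phone-number"},
    },
)

row = hotel.view(
    "Hotel 2",
    {"Name": "Shinjuku Prince Hotel", "Phone-number": "000-0000"},  # made-up number
)
print(row)
```

Because each consolidated attribute is defined by pointing at source attributes, adding a new constituent concept only requires extending the alignment; existing instances are left untouched.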

23
Concept Consolidation
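$\langle \bar{C}, S, A \rangle$
[Figure: the consolidated concept as a view over the constituent concepts; the alignment A is a set of aligned( , ) attribute relations]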
24
Concept Consolidation
  • Consolidated view of instances
  • Translation of instances
  • From one conceptualization to another
  • Query Unfolding (Advantage of GAV over LAV)
  • Queries over the consolidated concept $\bar{C}$ (in terms of
    its attributes)
  • to queries over $C_1, C_2, \ldots, C_n$
  • Using alignment A
  • Union of results
  • Translation of queries
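Query unfolding as outlined above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `ALIGNMENT`, `DATA`, `unfold` and `search` are hypothetical names, and the hotel records are invented for the example.

```python
# Hypothetical alignment A: consolidated attribute -> {source concept: source attribute}
ALIGNMENT = {
    "location": {"Hotel 1": "location", "Hotel 2": "address"},
    "type": {"Hotel 1": "type", "Hotel 2": "category"},
}

# Invented instance data, stored under each source concept's own schema.
DATA = {
    "Hotel 1": [
        {"name": "A", "location": "Tokyo", "type": "luxury"},
        {"name": "B", "location": "Kyoto", "type": "luxury"},
    ],
    "Hotel 2": [
        {"name": "C", "address": "Tokyo", "category": "luxury"},
        {"name": "D", "address": "Tokyo", "category": "budget"},
    ],
}


def unfold(query, source):
    """Rewrite a query over consolidated attributes into one over a source schema.

    Returns None when the source has no counterpart for a queried attribute.
    """
    rewritten = {}
    for attr, value in query.items():
        src_attr = ALIGNMENT.get(attr, {}).get(source)
        if src_attr is None:
            return None
        rewritten[src_attr] = value
    return rewritten


def search(query):
    """Run the unfolded query on every source and return the union of results."""
    results = []
    for source, instances in DATA.items():
        source_query = unfold(query, source)
        if source_query is None:
            continue
        results += [
            inst for inst in instances
            if all(inst.get(a) == v for a, v in source_query.items())
        ]
    return results


hits = search({"location": "Tokyo", "type": "luxury"})
print([h["name"] for h in hits])  # ['A', 'C']
```

The rewrite is a plain substitution of attribute names, which is why unfolding is straightforward when the global schema is defined as views on the sources.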

25
Concept Cloud
Consolidated concept
Sub-Cloud
26
Experiment on Conceptualization
  • Hypothesis
  • Multiple conceptualizations by different people
    for the same thing can be consolidated
  • Methodology
  • Participants given short text passages (6
    participants)
  • List down Facts structured as
  • (Attribute, Value) table
  • All concept schemas aligned manually

attribute value
name Kiyomizu
location Kyoto
.. ..
Concept schema
27
Observations
Types of Alignment Relations found
Attribute label similarity
28
Remarks
  • People can express their conceptualizations in
    terms of schema
  • Different people have different
    conceptualizations
  • No one covers all possible attributes
  • Conceptualizations overlap significantly
  • Most parts can be aligned
  • Most have simple alignment relations
  • Multiple conceptualizations can be consolidated

29
Alignment of Concept Schemas
  • Attribute Alignments suggested Automatically
  • Alignment API implementation (with WordNet
    extension)
  • (Euzenat, 2004)
  • Community-supported alignment
  • Human intelligence + Machine intelligence
  • Alignments are represented and saved
  • Alignment ontology (Hughes and Ashpole, 2004)
  • Alignment API alignment specification language
    (Euzenat et al., 2004)
  • Other formats: C-OWL, SWRL, OWL axioms, XSLT,
    SEKT-ML and SKOS
  • Incremental alignment (maintained
    collaboratively)
  • A Unified View
  • Consolidated concept with Consolidated Attributes
  • Homogeneous table of data

30
Semi-automatic Schema Alignment
Two Hotel concepts
x
Consolidated attributes
31
Consolidated Structured Search
Find all hotels with location Tokyo and type
luxury
Search on Consolidated Concept
Hotel 1 ↔ Hotel 2: location ↔ address, type ↔ category
32
Concept Grouping
  • Concept Similarity
  • $\mathrm{ConceptSim}(C_1, C_2) = w_1 \cdot \mathrm{NameSim}(N_1, N_2) + w_2 \cdot \mathrm{SchemaSim}(S_1, S_2)$
  • NameSim
  • WordNet-based similarity - Lin's algorithm (1998)
  • Levenshtein distance
  • SchemaSim
  • Average similarity of best matching pairs of
    attributes
  • Calculate ConceptSim between all pairs of
    concepts
  • Group similar concepts above Threshold
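The grouping procedure above can be sketched as follows. Several parts are assumptions rather than the method from the slides: NameSim uses only a normalized Levenshtein similarity (the WordNet-based part is left out), SchemaSim is replaced by a simple Jaccard overlap of attribute labels, and groups are formed transitively with a union-find structure. The weights and threshold are the values reported in the evaluation slide; the concepts are invented.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def name_sim(n1, n2):
    """Edit distance turned into a similarity in [0, 1]."""
    n1, n2 = n1.lower(), n2.lower()
    longest = max(len(n1), len(n2)) or 1
    return 1.0 - levenshtein(n1, n2) / longest


def jaccard(s1, s2):
    """Placeholder for SchemaSim: overlap of attribute label sets."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def concept_sim(c1, c2, concepts, w1=0.7, w2=0.3):
    return w1 * name_sim(c1, c2) + w2 * jaccard(concepts[c1], concepts[c2])


def group_concepts(concepts, threshold=0.8):
    """Concepts linked by ConceptSim >= threshold end up in the same group."""
    parent = {c: c for c in concepts}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for c1, c2 in combinations(concepts, 2):
        if concept_sim(c1, c2, concepts) >= threshold:
            parent[find(c2)] = find(c1)

    groups = {}
    for c in concepts:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())


# Invented concepts: name -> set of attribute labels.
concepts = {
    "Recipe": {"ingredient", "steps", "author"},
    "Recipes": {"ingredient", "steps", "author", "taste"},
    "Hotel": {"name", "price"},
}
groups = group_concepts(concepts)
print(groups)  # [['Recipe', 'Recipes'], ['Hotel']]
```

Raising the threshold produces smaller, more precise groups; lowering it merges more concepts at the cost of precision.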

33
Schema Similarity
  • Calculate NameSim for all pairs of attributes to
    create an $n_1 \times n_2$ matrix
  • $M = \mathrm{NameSim}(A_1 \times A_2)$
  • Find best matching pairs using
  • Hungarian Algorithm (M)
  • (Kuhn, 1955; Munkres, 1957)
  • Calculate matching average
  • $\mathrm{SchemaSim}(S_1, S_2) = 2 \times \sum (\text{similarity of best matching pairs}) / (|A_1| + |A_2|)$
  • Adapted from Semantic similarity between
    sentences (Simpson and Dao, 2005)
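The matching average above can be sketched with the standard library alone. Two substitutions are made and should be read as assumptions: NameSim is approximated by `difflib.SequenceMatcher`, and the Hungarian algorithm is replaced by an exhaustive search over assignments, which finds the same optimal matching but is only practical for small schemas.

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_sim(a, b):
    """Stand-in for NameSim: difflib similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matching_total(matrix):
    """Maximum total similarity over all one-to-one attribute assignments.

    Brute force in place of the Hungarian algorithm: same optimum,
    factorial running time.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows > cols:
        # Transpose so that every row can be matched to a distinct column.
        matrix = [list(col) for col in zip(*matrix)]
        rows, cols = cols, rows
    best = 0.0
    for perm in permutations(range(cols), rows):
        total = sum(matrix[i][j] for i, j in enumerate(perm))
        best = max(best, total)
    return best


def schema_sim(a1, a2):
    """2 * (sum of best-pair similarities) / (|A1| + |A2|)."""
    if not a1 or not a2:
        return 0.0
    m = [[name_sim(x, y) for y in a2] for x in a1]  # n1 x n2 matrix M
    return 2 * best_matching_total(m) / (len(a1) + len(a2))


print(schema_sim(["name", "price"], ["price", "name"]))
print(schema_sim(["name", "price", "rating"], ["name", "phone-number"]))
```

Dividing by the combined number of attributes penalizes schemas of different sizes: unmatched attributes lower the score even when every matched pair is identical.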

[Figure: schemas S1 and S2 with their attribute sets A1 and A2]
34
Visualization of Concepts Grouping
Cytoscape
35
Experiments on Freebase Data
  • Purpose
  • Evaluate automatic schema alignment
  • Evaluate proposed concept grouping method
  • Observations about user-defined concepts
  • Freebase: community-driven database of the world's information
  • User-defined types = concept schemas
  • Queried out (May 20, 2008)
  • Cleaning
  • Filter out test types, stop-words, types without
    instances

36
Observations
  • After cleaning
  • 1,412 concepts
  • 500 users who defined concepts
  • People want to share a wide variety of data
  • People define their own concept schemas
  • Most people define only a few concepts (1-5)
  • Long tail of information types

37
Freebase Concept Consolidation
  • Concepts with same name, synonyms, morphological
    variants
  • 57 consolidated concepts formed
  • Multiple versions of concept by different users
  • Up to 6 versions of the same concept
  • Same user also defines multiple versions
  • Alignments suggested automatically
  • 51 alignment relations (44 aligned attribute
    sets)
  • Human judgement
  • Precision: 88.24%
  • Recall: 67.16%
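The reported figures can be checked with a short calculation. Only the 51 suggested alignment relations come from the slide; the counts of 45 correct suggestions and 67 alignments in the human reference are assumptions, chosen because they are the smallest whole numbers that reproduce both percentages.

```python
suggested = 51   # alignment relations suggested automatically (from the slide)
correct = 45     # assumption: inferred from the reported precision
relevant = 67    # assumption: inferred from the reported recall

precision = correct / suggested   # share of suggestions judged correct
recall = correct / relevant       # share of reference alignments that were found

print(f"precision = {precision:.2%}, recall = {recall:.2%}")
# precision = 88.24%, recall = 67.16%
```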

38
Concept Consolidation Example
  • Recipe (user1), Recipe (user2), Recipes
    (user3)
  • denoted r1, r2, r3
  • Consolidated concept - Recipe
  • Consolidated attributes
  • r1:ingredient, r2:ingredients, r3:materials
  • r1:steps, r2:instructions, r3:directions
  • r2:tools_required
  • r3:taste
  • r3:author

Aligned attribute Sets
(adapted from Freebase)
39
Evaluation of Concept Grouping
  • $\mathrm{ConceptSim}(C_1, C_2) = w_1 \cdot \mathrm{NameSim}(N_1, N_2) + w_2 \cdot \mathrm{SchemaSim}(S_1, S_2)$

Concept grouping with different thresholds ($w_1 = 0.7$, $w_2 = 0.3$)
Concept grouping with different weights (threshold = 0.8)
40
Emergence of Lightweight Ontologies
  • Concepts contributed by community
  • Concept consolidation
  • Concept grouping
  • Popularity of concepts (as in Tag clouds)
  • Common vocabulary for structured information
    sharing
  • Conceptual schemas (class/property)
  • Informal organization by similarity

41
Informal Lightweight Ontology
Source: Schaffert et al. (2005), p. 7
42
Evaluation
43
Evaluation of Usability
  • Hypothesis
  • StYLiD is more usable than Freebase (for given
    tasks)
  • Methodology
  • Tasks performed with StYLiD and Freebase
  • Task 1 - Structured data authoring
  • Task 2 - Concept schema creation
  • Task 3, 4 - Modifying and reusing concepts
  • Task 5 - Structured concepts and instances
    authoring
  • Task 6 - Searching
  • Observations
  • Questionnaires, screen logs, comments, etc

44
Example (Task 1)
Input Band The Beatles
45
Participants
  • Total 15 participants
  • Including 6 without IT background
  • Different backgrounds
  • Public policy, international relations,
    psychology, telecommunication, networks, hotel
    staff, etc.
  • From 10 countries
  • Age: 22-43 (avg. 28.3)
  • Most did not know the systems before

46
Results
  • System Usability Scale (SUS) (Digital Equipment
    Corp.)
  • Average scores: StYLiD 69.7, Freebase 39.3
  • For comparison: Enhanced Semantic MediaWiki 54.8
    (Pfisterer et al., 2008)
  • Aggregated results from the Tasks (score 0-4)

47
Results for non-IT participants
  • 6 participants
  • SUS scores
  • StYLiD (71.67), Freebase (50.42)

48
Observations
  • StYLiD quite usable without any training,
    knowledge or help
  • Most users preferred StYLiD to Freebase
  • Specifying attribute value range not easy
  • Strict data type constraints can cause problems
  • Many people modify and reuse concepts
  • People try to input all data in minimum steps
  • Data entry can be made easier and quicker
  • Auto-complete mechanisms would be helpful

49
Comparison with some systems
Aspect | StYLiD | Freebase | Semantic MediaWiki
Concept creation | UI supported | UI supported | Template markup
Instance creation | Form-based | Form-based | Extended wiki syntax, forms
Data authoring | Blogging / social bookmarking | Structured wiki | Wiki text annotation
Data import | Wrappers | Bulk import facility | Not possible
Constraints | Flexible | Strict type constraints | Strict type constraints
Multiplicity | Allowed | Partly | No
Consolidation | Schema-level | Some instances | No
Organization | Concept grouping | Bases | Categories
50
Practical Applications
51
Application Scenarios
  • Social Site for
  • Structured Information Sharing

Users
Concept Schemas
External Data Resources
Structured data
Information Sharing Social Semantic Website
Users
52
Application Scenarios
  • Integrated Semantic portal

IS1
Structured data
Wrapper1
IS2
Wrapper2
Wrapper3
IS3
External Data Resources
Concept Schemas
Information Sources
Integrated Semantic Portal
Users
Admin
53
Adapting to different scenarios
  • Variable aspects
  • Data and concepts acquisition
  • Community and motivation
  • Functionalities and constraints
  • Data quality
  • Ways of adaptation
  • Use of wrappers, etc.
  • Delegate functionalities/constraints
  • Extensible and customizable open source
  • Customized queries and views

54
Real practical applications
  • Integration of research staff directories
  • Osaka University and Nagoya University
  • Data scraped from the websites
  • A musical community website at Tokyo
    International Exchange Center (TIEC)
  • Social data bookmarking site StYLiD.org
  • A document management system in AIT

55
University Directory Integration
  • 10 alignments automatically suggested
  • All correct
  • Total 19 alignments

56
Integrated interface
57
TIEC Musical Community website
58
StYLiD.org Data Bookmarking
59
Document Management system
60
Structured Information Dissemination in
Decentralized Communities
[Figure: several SocioBiblog systems, each with Publishing and Aggregation components, connected over the Web through social network links and extended RSS]
61
Conclusions
62
Conclusions
  • Social web application for sharing structured
    Semantic Web contents
  • StYLiD
  • Free contribution, no strict constraints
  • Usable (even without training)
  • Concept consolidation
  • Multiple conceptualizations exist
  • Overlap significantly and can be consolidated
  • Automatic alignments with good precision and
    recall
  • A loose collaborative approach for creating
    shared concept definitions

63
Conclusions (contd.)
  • Concept grouping by similarity
  • Informal organization
  • Good precision can be obtained
  • Parameters can be tuned for appropriate coverage
    and precision
  • Emergent lightweight informal ontologies
  • Ontology as by-product of information sharing and
    integration
  • Practical applications

64
Future Directions
  • Computing concept relations
  • Hierarchical and non-hierarchical
  • Better schema alignment techniques
  • Consolidation of data instances
  • Using existing vocabularies
  • Mash-ups / plugins to utilize structured data
  • Scrapers to collect data from the web

65
Thank You!
  • Questions
  • Suggestions