UMBC AN HONORS UNIVERSITY IN MARYLAND - PowerPoint PPT Presentation

Text Based Similarity Metrics and Delta for
Semantic Web Graphs
Krishnamurthy Viswanathan and Tim Finin,
University of Maryland, Baltimore County
Motivation
  • Case 3: Different versions of the same SW graph
  • In addition, when this case is detected, generate
    a delta between the two versions

Problem
  • Given a collection of SW graphs as RDF documents, identify pairs of graphs that are similar
  • Generate a delta for pairs of graphs identified as having a versioning relationship

Approach
  Input corpus of SWDs
  → Convert to canonical form
  → Convert to n-triples format
  → Create reduced forms
  → Compute text-based similarity metrics
  → Identify pairs of similar documents

Classification
  • Text similarity is very useful in information retrieval for near-duplicate and similarity detection
  • Similarity metrics are computed for each candidate pair
  • Naïve Bayes classifier: similarity in classes and properties
  • Naïve Bayes/SVM classifier: difference only in base-URI
  • SVM classifier: versioning relationship

Generating Deltas
  Identify ontology versions
  → Generate delta between versions

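When a pair is classified as having a versioning relationship, a delta between the two versions is generated. The poster does not describe the delta format, so the following is an assumption-based sketch. It assumes both versions have already been canonicalized into N-Triples, which means an unchanged statement has identical text in both. Under that assumption the delta is just the set of added statements and the set of removed statements. `compute_delta` and `apply_delta` are hypothetical names.

```python
def compute_delta(old_stmts, new_stmts):
    """Statement-level delta between two canonicalized versions (sketch).

    Assumes canonical N-Triples lines, so an unchanged statement is the
    same string in both versions.
    """
    old, new = set(old_stmts), set(new_stmts)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def apply_delta(old_stmts, delta):
    """Rebuild the new version's statement set from the old one plus the delta."""
    return (set(old_stmts) - set(delta["removed"])) | set(delta["added"])


old = ['<a> <p> <b> .', '<a> <q> "1" .']
new = ['<a> <p> <b> .', '<a> <q> "2" .']
d = compute_delta(old, new)
print(d)  # {'added': ['<a> <q> "2" .'], 'removed': ['<a> <q> "1" .']}
assert apply_delta(old, d) == set(new)
```

Blank-node labels come from canonicalization, so an edit can shift them. When that happens, unchanged statements that mention blank nodes show up as both removed and added, and a complete delta would need to handle that case.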
Contributions
  • Defined text-based similarity metrics characterizing relations between SW graphs
  • Evaluated these metrics for three specific cases of similarity

SW Graph Canonicalization
Example input statements
  <person:John> <a:livesIn> _x .
  _x <a:IsPartOf> USA .
  <person:John> <a:likes> cheese .
  _x <a:hasCapital> _y .
Statements in a deterministic order
  _x <a:hasCapital> _y .
  _x <a:IsPartOf> USA .
  <person:John> <a:likes> cheese .
  <person:John> <a:livesIn> _x .
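The example illustrates the two canonicalization steps: giving statements a deterministic order and assigning uniform identifiers to blank nodes. The poster calls its method empirical and does not spell it out, so the code below is a simple stand-in, not the authors' procedure. It makes these assumptions:

- one statement per line;
- blank nodes use the N-Triples `_:label` syntax (the poster abbreviates them as `_x`).

It sorts the statements with blank-node labels masked. It then numbers the blank nodes `_:g1`, `_:g2`, and so on, in order of first appearance. Its labels need not match the poster's BNode table.

```python
import re

# Blank-node labels in N-Triples syntax, e.g. _:x or _:b12.
BNODE = re.compile(r"_:\w+")


def canonicalize(lines):
    """Return (canonical statements, old->new blank-node mapping).

    Simplified sketch: assumes one statement per line and that no literal
    or IRI contains the substring "_:".
    """
    stmts = [line.strip() for line in lines if line.strip()]
    # Sort on the text with every blank-node label masked, so the order does
    # not depend on the labels the original serializer happened to choose.
    # Ties (statements identical after masking) keep their input order.
    stmts.sort(key=lambda s: BNODE.sub("_:b", s))

    mapping = {}

    def relabel(match):
        old = match.group(0)
        if old not in mapping:
            mapping[old] = "_:g%d" % (len(mapping) + 1)
        return mapping[old]

    return [BNODE.sub(relabel, s) for s in stmts], mapping


doc_a = ["<person:John> <a:livesIn> _:x .", "_:x <a:IsPartOf> USA .",
         "<person:John> <a:likes> cheese .", "_:x <a:hasCapital> _:y ."]
# The same graph with different blank-node labels and statement order.
doc_b = ["_:n7 <a:hasCapital> _:n9 .", "<person:John> <a:likes> cheese .",
         "_:n7 <a:IsPartOf> USA .", "<person:John> <a:livesIn> _:n7 ."]

canon_a, map_a = canonicalize(doc_a)
canon_b, _ = canonicalize(doc_b)
assert canon_a == canon_b
print("\n".join(canon_a))
# <person:John> <a:likes> cheese .
# <person:John> <a:livesIn> _:g1 .
# _:g1 <a:IsPartOf> USA .
# _:g1 <a:hasCapital> _:g2 .
```

Like the poster's method, this works for most examples but not all of them. Statements that become identical once blank nodes are masked keep their input order. As a result, two serializations of the same graph can still canonicalize differently.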
Canonical form (blank nodes relabeled)
  _g2 <a:hasCapital> _g1 .
  _g2 <a:IsPartOf> USA .
  <person:John> <a:likes> cheese .
  <person:John> <a:livesIn> _g2 .
BNode Table
  Old bnode identifier    New bnode identifier
  _y                      _g1
  _x                      _g2

  • Assigns uniform identifiers to blank nodes
  • Provides a deterministic order to statements
  • Empirical method that works for most examples

Evaluation
  • Case 1: Same classes and properties used, but differ only in literal content
  • Three datasets of 400 semantic web documents for training and testing
  • 17 combinations of similarity metrics tested: Jaccard, containment, cosine similarity, and Hamming distance between Simhash fingerprints
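The evaluation tests combinations of four measures: Jaccard, containment, cosine similarity, and the Hamming distance between Simhash fingerprints. The poster does not give the tokenization or the hash function, so this sketch makes several assumptions:

- whitespace tokens;
- word shingles for the set-based measures, with the shingle length `k` as an illustrative parameter;
- term-frequency vectors for cosine;
- a 64-bit Simhash built from MD5 token hashes.

The function names are illustrative.

```python
import hashlib
import math
from collections import Counter


def shingles(text, k=3):
    """Set of k-word shingles (k=3 is an illustrative default)."""
    words = text.split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """|A & B| / |A | B| for two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def containment(a, b):
    """Fraction of A's shingles that also occur in B (asymmetric)."""
    return len(a & b) / len(a) if a else 1.0


def cosine(text_a, text_b):
    """Cosine similarity of term-frequency vectors."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def simhash(text):
    """64-bit Charikar-style Simhash fingerprint over whitespace tokens."""
    acc = [0] * 64
    for token, weight in Counter(text.split()).items():
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            acc[bit] += weight if (h >> bit) & 1 else -weight
    return sum(1 << bit for bit in range(64) if acc[bit] > 0)


def hamming(x, y):
    """Number of differing bits between two fingerprints."""
    return bin(x ^ y).count("1")


doc1 = "john likes cheese john lives in x"
doc2 = "john likes bread john lives in x"
s1, s2 = shingles(doc1), shingles(doc2)
print(jaccard(s1, s2), containment(s1, s2), round(cosine(doc1, doc2), 3))  # 0.25 0.4 0.889
print(hamming(simhash(doc1), simhash(doc2)))  # bits that differ between the fingerprints
```

Containment is the only asymmetric measure here. That makes it useful when one document is largely contained in another.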

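The metric values for a candidate pair then serve as features for a classifier. The poster names naïve Bayes and SVM classifiers but gives no settings. The sketch below shows only the naïve Bayes side, in its Gaussian form, which is a common choice for numeric features and is assumed here. `TinyGaussianNB` is a hypothetical name. The training vectors and labels are made-up values for illustration, not the poster's data.

```python
import math
from collections import defaultdict


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes over numeric feature vectors (sketch)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        self.stats = {}
        for label, rows in groups.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # Per-feature variance, floored to avoid division by zero.
            variances = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                         for col, m in zip(columns, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, features):
        """Return the label with the highest log posterior."""
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(features, means, variances))
        return max(self.stats, key=log_posterior)


# Hypothetical training data: [jaccard, cosine] per candidate pair.
X = [[0.95, 0.97], [0.90, 0.92], [0.20, 0.10], [0.30, 0.15]]
y = ["similar", "similar", "different", "different"]
model = TinyGaussianNB().fit(X, y)
print(model.predict([0.93, 0.95]))  # similar
```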
Type of similarity                             TP rate   FP rate   Precision   Recall
Case 1: Similarity in classes and properties   0.986     0.014     0.987       0.986
Case 2: Difference only in base-URI            0.988     0.012     0.988       0.988
Case 3: Versioning relationship                0.909     0.091     0.913       0.909

Four reduced forms
  • Only literals from the original n-triple file
  • All non-literal content from the original n-triple file
  • Base-URI of every node replaced by
  • Literals and base-URIs replaced by
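The reduced forms can be derived from each N-Triples statement with regular expressions. This sketch rests on three assumptions:

- literals are quoted strings with an optional language tag or datatype;
- the base-URI of a node is the part of its IRI up to the last `#` or `/`;
- the poster's replacement tokens did not survive, so they are caller-supplied parameters (`uri_token`, `literal_token`).

All function names are hypothetical.

```python
import re

# N-Triples literal: quoted string (with escapes) plus optional @lang or ^^<datatype>.
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?')
IRI = re.compile(r"<([^>]*)>")


def literals_only(stmts):
    """Reduced form: only the literals of each statement."""
    return [" ".join(LITERAL.findall(s)) for s in stmts if LITERAL.search(s)]


def non_literals_only(stmts):
    """Reduced form: everything except the literals."""
    return [" ".join(LITERAL.sub(" ", s).split()) for s in stmts]


def replace_base_uris(stmts, uri_token):
    """Reduced form: the base-URI of every IRI replaced by uri_token.

    Assumption: the base-URI is the IRI prefix up to the last '#' or '/'.
    IRIs appearing inside literals are not protected in this sketch.
    """
    def strip(match):
        iri = match.group(1)
        cut = max(iri.rfind("#"), iri.rfind("/"))
        return "<" + uri_token + iri[cut + 1:] + ">"
    return [IRI.sub(strip, s) for s in stmts]


def replace_literals_and_base_uris(stmts, uri_token, literal_token):
    """Reduced form: literals and base-URIs both replaced by tokens."""
    no_literals = [LITERAL.sub(lambda _m: literal_token, s) for s in stmts]
    return replace_base_uris(no_literals, uri_token)


doc_v1 = ['<http://a.org/onto#Ann> <http://a.org/onto#name> "Ann"@en .']
doc_v2 = ['<http://b.net/v2#Ann> <http://b.net/v2#name> "Ann"@en .']
# "BASE:" is an arbitrary token chosen for this example. Graphs that differ
# only in base-URI become identical in this reduced form.
assert replace_base_uris(doc_v1, "BASE:") == replace_base_uris(doc_v2, "BASE:")
print(replace_base_uris(doc_v1, "BASE:"))  # ['<BASE:Ann> <BASE:name> "Ann"@en .']
```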
