Semantic Annotation and Search of Software Artefacts

1
Semantic Annotation and Search of Software Artefacts
  • Valentin Tablan
  • Kalina Bontcheva
  • Danica Damljanovic

2
Some Terminology
  • Ontology population: given an ontology, populate
    it with instances derived automatically from a
    text.
  • Annotation (of text): associating labels with
    text snippets from a larger document.
    • Can be linguistic, semantic, etc.
  • Semantic annotation: the labels used in
    annotation are associated with an ontology.
    • Can also include ontology population as a side
      effect.

3
Annotation
4
Semantic Annotation
5
Case Study: Software Artefacts
  • The GATE project
    • Open-source text mining infrastructure and tools.
  • Structured information
    • XML configuration files.
  • Unstructured information
    • Software documentation (JavaDocs).
    • User guide.
    • Project website.
    • Publications.
    • Mailing lists.

6
Case Study: Software Artefacts
  • The problem?
    • Information overload: thousands of pages.
    • Parameters for the ANNIE Tokeniser?
    • Where to look for the answer?
  • The solution?
    • Use an ontology as a shared store.
    • Populate the ontology using semantic annotation.
    • Provide search facilities backed by the ontology.

7
The GATE Ontology
8
Ontology Population: Structured Data
9
GATE Ontology - populated
10
Ontology Population: Unstructured Data
  • Extract all lexicalisations of ontological
    resources.
    • Names, labels, string property values.
  • Normalise the lexicalisations.
    • Extract the morphological root.
    • Break text at underscores.
    • Segment CamelCaseNames.
  • Build a gazetteer with all the lexicalisations.
  • Use the gazetteer to recognise mentions in text.
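The normalisation and gazetteer steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GATE's implementation: `crude_root` is a hypothetical suffix stripper standing in for a real morphological analyser, the resource URIs (`ex:...`) are made up, the gazetteer is a plain dictionary rather than a GATE gazetteer resource, and mentions are found by greedy longest match over normalised tokens.

```python
import re
from collections import defaultdict

def crude_root(word: str) -> str:
    """Hypothetical stand-in for a morphological analyser: strips simple plurals."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # countries -> country
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # capitals -> capital
    return word

# Splits CamelCase: "hasCapital" -> "has", "Capital"; "HTTPServer" -> "HTTP", "Server".
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

def normalise(label: str) -> tuple:
    """Break at underscores (also hyphens/whitespace), segment CamelCase, lowercase, root."""
    tokens = []
    for part in re.split(r"[_\-\s]+", label):
        tokens.extend(_CAMEL.findall(part))  # punctuation is silently dropped
    return tuple(crude_root(t.lower()) for t in tokens)

def build_gazetteer(lexicalisations: dict) -> dict:
    """Map each normalised lexicalisation to the set of resources it names."""
    gazetteer = defaultdict(set)
    for uri, labels in lexicalisations.items():
        for label in labels:
            key = normalise(label)
            if key:
                gazetteer[key].add(uri)
    return dict(gazetteer)

def find_mentions(text: str, gazetteer: dict) -> list:
    """Greedy longest-match lookup of gazetteer entries in normalised text."""
    tokens = normalise(text)
    max_len = max((len(k) for k in gazetteer), default=0)
    mentions, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tokens[i:i + n]
            if key in gazetteer:
                mentions.append((" ".join(key), sorted(gazetteer[key])))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return mentions

# Hypothetical lexicalisations (names and labels) of two ontology resources.
lexicalisations = {
    "ex:ANNIE_Tokeniser": ["ANNIE_Tokeniser", "ANNIE tokeniser"],
    "ex:Parameter": ["Parameter"],
}
gazetteer = build_gazetteer(lexicalisations)
print(find_mentions("Parameters for ANNIE Tokeniser?", gazetteer))
# -> [('parameter', ['ex:Parameter']), ('annie tokeniser', ['ex:ANNIE_Tokeniser'])]
```

Because labels and text go through the same `normalise`, "ANNIE_Tokeniser" and "ANNIE tokeniser" collapse onto one gazetteer key, and the plural "Parameters" in the text still matches the singular label.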

11
Ontology Population: Unstructured Data
12
Information Access: Conceptual Retrieval
  • Can make use of abstractions and generalisations
    powered by the ontology back-end.
  • Provides retrieval options not available to
    full-text search, e.g.
    • Capitals of countries in Asia.
  • The query language is very complex, somewhat
    similar to SQL, and not really suitable for end
    users.

13
Capitals of countries in Asia (simplified SeRQL)
    SELECT c0, p1, c3, p4, i6
    FROM
      {c0} rdf:type {<pupp:Capital>},
      {c3} p1 {c0},
      {c3} rdf:type {<pupp:Country>},
      {c3} p4 {i6},
      {i6} rdf:type {<pupp:Continent>}
    WHERE
      p1 = <pupp:hasCapital> AND
      p4 = <ptop:subRegionOf> AND
      i6 = <kim:Continent_T.2>
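To show what the query above computes, here is a minimal Python sketch of conjunctive graph-pattern matching over an in-memory list of triples. It is not Sesame or SeRQL: the instances (`ex:Tokyo`, `ex:Japan`, ...) are hypothetical, the `?` prefix for variables is an assumed convention, and the WHERE-clause constraints are inlined as constants, with the hypothetical `ex:Asia` standing in for `kim:Continent_T.2`.

```python
# Hypothetical toy knowledge base; class and property names follow the slide.
TRIPLES = [
    ("ex:Tokyo", "rdf:type", "pupp:Capital"),
    ("ex:Japan", "rdf:type", "pupp:Country"),
    ("ex:Japan", "pupp:hasCapital", "ex:Tokyo"),
    ("ex:Japan", "ptop:subRegionOf", "ex:Asia"),
    ("ex:Asia", "rdf:type", "pupp:Continent"),
    ("ex:Paris", "rdf:type", "pupp:Capital"),
    ("ex:France", "rdf:type", "pupp:Country"),
    ("ex:France", "pupp:hasCapital", "ex:Paris"),
    ("ex:France", "ptop:subRegionOf", "ex:Europe"),
    ("ex:Europe", "rdf:type", "pupp:Continent"),
]

# The pattern above, with p1, p4 and i6 fixed to their WHERE-clause values.
QUERY = [
    ("?c0", "rdf:type", "pupp:Capital"),
    ("?c3", "pupp:hasCapital", "?c0"),
    ("?c3", "rdf:type", "pupp:Country"),
    ("?c3", "ptop:subRegionOf", "ex:Asia"),
]

def is_var(term: str) -> bool:
    return term.startswith("?")

def extend(binding: dict, pattern: tuple, triple: tuple):
    """Return binding extended so that pattern matches triple, or None if impossible."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to something else
        elif p_term != t_term:
            return None      # constant does not match
    return new

def evaluate(patterns: list, triples: list) -> list:
    """Nested-loop join: each pattern extends or filters the bindings found so far."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

print(evaluate(QUERY, TRIPLES))  # -> [{'?c0': 'ex:Tokyo', '?c3': 'ex:Japan'}]
```

Paris matches the first three patterns but is dropped by the last one, because France is a sub-region of Europe rather than Asia; this join across several relations is exactly what full-text search cannot express.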

14
QuestIO: Question-based Interface to Ontologies
  • Natural-language interface for querying
    knowledge bases.
  • Easy to use: requires no training.
  • Domain independent.
  • Works with short, agrammatical (Google-like)
    queries.

15
QuestIO Initialisation
  • Vocabulary built automatically from the KB (hence
    domain independent).
    • Extract all possible textual descriptions from
      the ontology.
    • Normalise for morphology, missing tokenisation,
      CamelCasing, etc.
    • Store all lexicalisations in a GATE gazetteer
      (long initialisation time, fast run time).

16
Query Construction
  • Identify known concepts in the NL query.
    • Normalise the query for morphology, etc.
    • Find concepts by matching lexicalisations from
      the gazetteer.

[Figure: the query "Capitals of countries located in Asia" with the recognised resources Capital City, Country, Continent, Continent_T4 and Asia]
17
Query Construction (II)
  • Build a SeRQL query by finding appropriate
    properties to link the concepts found.
    • Build a list of candidate properties based on
      the ontology schema (using domain and range
      constraints).
    • Rank the properties.
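A minimal sketch of collecting candidate properties from domain and range constraints, assuming a toy schema: the class hierarchy and both properties below are hypothetical and only loosely modelled on the geography example. A property qualifies for a pair of concepts if one concept is (a subclass of) its domain and the other is (a subclass of) its range, in either direction.

```python
# Hypothetical schema: child class -> parent class.
SUBCLASS_OF = {
    "Capital": "City",
    "City": "Location",
    "Country": "Location",
    "Continent": "Location",
    "Location": "Thing",
}

# Hypothetical properties: name -> (domain, range).
PROPERTIES = {
    "hasCapital": ("Country", "Capital"),
    "subRegionOf": ("Location", "Location"),
}

def ancestors(cls: str) -> set:
    """The class itself plus all of its superclasses."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result

def candidate_properties(a: str, b: str) -> list:
    """All (subject class, property, object class) links permitted by domain/range."""
    candidates = []
    for prop, (domain, rng) in PROPERTIES.items():
        if domain in ancestors(a) and rng in ancestors(b):
            candidates.append((a, prop, b))
        if domain in ancestors(b) and rng in ancestors(a):
            candidates.append((b, prop, a))
    return candidates

print(candidate_properties("Capital", "Country"))
# -> [('Country', 'hasCapital', 'Capital'), ('Capital', 'subRegionOf', 'Country'), ('Country', 'subRegionOf', 'Capital')]
```

Even this tiny schema yields three candidate links for two concepts, because a generic property such as `subRegionOf` fits almost any pair of locations; this is why the candidates are then ranked.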
  • Rank the properties.

18
Ranking Properties
  • We combine three types of scores:
    • Similarity score: compares query fragments with
      candidate property names using the Levenshtein
      string similarity metric.
    • Specificity score: based on the subproperty
      relation in the ontology definition.
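The two scores above can be sketched as follows. The Levenshtein distance is the standard dynamic-programming edit distance; turning it into a similarity by dividing by the longer string's length, and measuring specificity as depth in the subproperty hierarchy, are assumptions for illustration, as is the hypothetical `SUBPROPERTY_OF` table.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def similarity_score(fragment: str, property_name: str) -> float:
    """1.0 for identical strings (case-insensitive), decreasing towards 0.0."""
    a, b = fragment.lower(), property_name.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

# Hypothetical subproperty hierarchy: property -> its direct superproperty.
SUBPROPERTY_OF = {"hasCapital": "hasCity", "hasCity": "hasPart"}

def specificity_score(prop: str) -> int:
    """Depth in the subproperty hierarchy: deeper means more specific."""
    depth = 0
    while prop in SUBPROPERTY_OF:
        prop = SUBPROPERTY_OF[prop]
        depth += 1
    return depth

print(round(similarity_score("capital", "hasCapital"), 2))  # 0.7
print(specificity_score("hasCapital"))                      # 2
```

The query fragment "capital" differs from "hasCapital" only by the three-letter prefix, so it scores well above an unrelated property name, and `hasCapital` outranks its superproperties on specificity.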

19
Ranking Properties (II)
  • Distance score: infers an implicit specificity
    of a property from the level of the classes used
    as its domain and range.
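A sketch of the distance score and of one possible way to combine the three scores. The slides say the scores are combined but not how, so the weighted sum and its weights are assumptions; the class hierarchy is hypothetical, and the similarity and specificity values in the example are illustrative numbers, not measured results.

```python
# Hypothetical class hierarchy: child -> parent.
SUBCLASS_OF = {"Capital": "City", "City": "Location",
               "Country": "Location", "Location": "Thing"}

def class_depth(cls: str) -> int:
    """Number of subclass steps from cls up to the root."""
    depth = 0
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        depth += 1
    return depth

def distance_score(domain: str, range_: str) -> int:
    """Deeper (more specific) domain and range classes give a higher score."""
    return class_depth(domain) + class_depth(range_)

def combined_score(similarity: float, specificity: float, distance: float,
                   weights=(1.0, 0.5, 0.25)) -> float:
    """Hypothetical weighted sum of the three scores."""
    w_sim, w_spec, w_dist = weights
    return w_sim * similarity + w_spec * specificity + w_dist * distance

# Illustrative candidates: property -> (similarity, specificity, domain, range).
candidates = {
    "hasCapital": (0.7, 0, "Country", "Capital"),
    "subRegionOf": (0.2, 0, "Location", "Location"),
}

def score(prop: str) -> float:
    sim, spec, domain, range_ = candidates[prop]
    return combined_score(sim, spec, distance_score(domain, range_))

ranked = sorted(candidates, key=score, reverse=True)
print(ranked)  # ['hasCapital', 'subRegionOf']
```

Here `hasCapital` links the deeper classes Country (depth 2) and Capital (depth 3), so it gets a higher distance score than `subRegionOf`, whose domain and range are the generic Location.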

20
Query Execution
  • Execute queries in ranking order until results
    are obtained.
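The execution strategy amounts to a best-first fallback loop. In this sketch `run` is a hypothetical callable standing in for whatever sends a SeRQL query to the repository; the query strings and the fake repository are made up for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def execute_until_results(
    ranked_queries: Sequence[str],
    run: Callable[[str], List[dict]],
) -> Tuple[Optional[str], List[dict]]:
    """Try queries in ranking order; return the first one that yields results."""
    for query in ranked_queries:
        results = run(query)
        if results:
            return query, results
    return None, []  # every candidate query came back empty

# Hypothetical stand-in for a repository: only the second query matches anything.
fake_repository = {
    "query using subRegionOf": [],
    "query using hasCapital": [{"c0": "ex:Tokyo"}],
}

query, results = execute_until_results(
    ["query using subRegionOf", "query using hasCapital"],
    lambda q: fake_repository.get(q, []),
)
print(query, results)  # query using hasCapital [{'c0': 'ex:Tokyo'}]
```

Stopping at the first non-empty result keeps the cost low when the top-ranked query is right, at the price of returning nothing from lower-ranked interpretations.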

21
Evaluation: coverage and correctness
  • 36 questions extracted from the GATE mailing list.
  • 22 of the 36 questions were answerable (the
    answer was in the knowledge base):
    • 12 answered correctly (54.5%).
    • 6 answered partially correctly (27.3%).
    • For 4 questions (18.2%), the system failed to
      create a SeRQL query or created a wrong one.
  • Total score:
    • 68% answered correctly.
    • 32% not answered at all or not answered
      correctly.
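The percentages can be re-derived from the counts; all are relative to the 22 answerable questions. The 68% total is consistent with counting each partially correct answer as half a correct one, (12 + 6/2) / 22 ≈ 68.2%. The slides do not state this weighting, so treat it as an inference.

```python
ANSWERABLE = 22
CORRECT, PARTIAL, FAILED = 12, 6, 4

def percent(count: float) -> float:
    """Share of the answerable questions, to one decimal place."""
    return round(100 * count / ANSWERABLE, 1)

assert CORRECT + PARTIAL + FAILED == ANSWERABLE
print(percent(CORRECT), percent(PARTIAL), percent(FAILED))  # 54.5 27.3 18.2

# Assumed weighting: a partially correct answer counts as half.
total = percent(CORRECT + 0.5 * PARTIAL)
print(total, round(100 - total, 1))  # 68.2 31.8
```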

22
Evaluation: scalability and portability
  • Sizes of the knowledge bases created based on:
    • GATE ontology: http://gate.ac.uk/ns/gate-ontology
    • Travel ontology:
      http://goodoldai.org.yu/ns/tgproton.owl
  • The ontologies were not customised or changed
    before being used with QuestIO!

23
Evaluation: scalability and portability
24
DEMO
  • http://www.gate.ac.uk/questio-client-app/search.jsp
  • Geography ontology
    • Continents, countries, cities (capitals only).
    • Part of the PROTON KIM KB.
  • Example questions
    • Countries in Europe or North America
    • Capitals in Asia
    • Capitals of countries (located) in Africa
    • ...