Ontea: Pattern based Annotation Platform

1
Ontea: Pattern based Annotation Platform
  • Michal Laclavík

2
Ontea Method
  • Motivation
    • To create semantic metadata from texts or
      documents
  • Approach
    • Even unstructured text contains patterns
    • Patterns can be used to extract various objects
      from text
    • Results are key-value pairs
    • Such pairs can be transformed to ontology
      individuals
      • Class - individual
      • Individual - property
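
The approach above can be sketched in a few lines of Python. This is only an illustration, not Ontea's own code: the names `KeyValue`, `Individual`, `extract` and `to_individual`, as well as the pattern itself, are assumptions made for this sketch. It shows a pattern producing a key-value pair and the class - individual mapping, where the key names the class and the value names the individual.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyValue:
    key: str    # e.g. "Location"
    value: str  # e.g. "Europe"


@dataclass(frozen=True)
class Individual:
    class_name: str  # ontology class, taken from the key
    name: str        # individual name, taken from the value


def extract(text, key, pattern):
    """Return one KeyValue per match; group 1 of the pattern is the value."""
    return [KeyValue(key, m.group(1)) for m in re.finditer(pattern, text)]


def to_individual(pair):
    """Class - individual mapping: key -> class, value -> individual."""
    return Individual(class_name=pair.key, name=pair.value)


# Hypothetical pattern: a capitalized word after "in" is treated as a Location.
pairs = extract("Slovakia is in Europe.", "Location", r"\bin\s+([A-Z][a-z]+)")
individuals = [to_individual(p) for p in pairs]
```

Keeping extraction and transformation as separate functions mirrors the split on the slide: patterns only produce pairs, and the mapping to the ontology happens afterwards.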

3
Result Examples
  • Text:
    • Bratislava is the capital of Slovakia. Slovakia
      is in Europe.
  • Pattern for Location:
    • (in|by) (the)? ([A-Z][a-z]+)
  • Ontea discovers the key-value pair
    • Location: Europe
  • By transformation to the ontology knowledge base,
    it finds Europe as a continent using inference
    (sub-class of Location)
    • Continent: Europe
  • More examples are in the table

#  Text                     Key-value pair               Pattern (regular expression)
1  Apple, Inc.              Company: Apple               Company: ([A-Z][a-z0-9]+), (Inc|Ltd)
2  Mountain View, CA 94043  Settlement: Mountain View    Settlement: ([A-Z][a-z]+ [A-Z][a-z]+) [A-Z]{2} [0-9]{5}
3  laclavik.ui@savba.sk     Email: laclavik.ui@savba.sk  Email: [-_.a-z0-9]+@[-_.a-zA-Z0-9]+\.[a-z]{2,8}
4  Mr. Michal Laclavik      Person: Michal Laclavik      Person: (Mr.|Mrs.|Dr.) ([A-Z][a-z]+ [A-Z][a-z]+)
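
The kind of matching shown in the table can be tried out with a short script. The regular expressions below are assumptions written for this sketch rather than the exact patterns from the slide, the address `someone@example.org` is a placeholder, and `PATTERNS` and `annotate` are hypothetical names. Group 1 of each pattern is taken as the value.

```python
import re

# Hypothetical patterns, one per key; group 1 captures the value.
PATTERNS = {
    "Company": r"\b([A-Z][A-Za-z0-9]+), (?:Inc|Ltd)\b",
    "Settlement": r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*), [A-Z]{2} [0-9]{5}\b",
    "Email": r"([-_.a-z0-9]+@[-_.a-zA-Z0-9]+\.[a-z]{2,8})",
    "Person": r"\b(?:Mr|Mrs|Dr)\. ([A-Z][a-z]+ [A-Z][a-z]+)",
}


def annotate(text):
    """Apply every pattern to the text and collect (key, value) pairs."""
    pairs = []
    for key, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            pairs.append((key, match.group(1)))
    return pairs


samples = [
    "Apple, Inc.",
    "Mountain View, CA 94043",
    "someone@example.org",
    "Mr. Michal Laclavik",
]
results = {sample: annotate(sample) for sample in samples}

for sample, pairs in results.items():
    print(sample, "->", pairs)
```

Each pattern is deliberately narrow, so one piece of text normally yields one key. Broader patterns find more instances but also more false matches.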

4
Features
  • Identification of concept instances from the
    ontology
  • Automatic population of ontologies with instances
  • Identifying the relevance of instances as they
    are created, using information retrieval techniques
  • Large-scale semantic annotation of documents or
    texts using Google's MapReduce architecture
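
How pattern-based annotation fits the MapReduce model can be sketched in plain Python. This is not Hadoop code and not Ontea's implementation: `map_document`, `reduce_pairs`, the sample documents and the Location pattern are all assumptions for illustration. The map step emits key-value pairs per document, and the reduce step groups the documents in which each pair was found.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a capitalized word after "in" or "by" is a Location.
LOCATION = re.compile(r"\b(?:in|by)\s+([A-Z][a-z]+)")


def map_document(doc_id, text):
    """Map step: emit ((key, value), doc_id) for every pattern match."""
    for match in LOCATION.finditer(text):
        yield ("Location", match.group(1)), doc_id


def reduce_pairs(emitted):
    """Reduce step: group document ids under each key-value pair."""
    grouped = defaultdict(set)
    for pair, doc_id in emitted:
        grouped[pair].add(doc_id)
    return {pair: sorted(ids) for pair, ids in grouped.items()}


documents = {
    "doc1": "Bratislava is the capital of Slovakia. Slovakia is in Europe.",
    "doc2": "The office is in Bratislava.",
    "doc3": "Bratislava lies in Europe.",
}

emitted = [
    item
    for doc_id, text in documents.items()
    for item in map_document(doc_id, text)
]
index = reduce_pairs(emitted)
```

Because each document is annotated independently in the map step, the work can be spread over many machines; only the grouping needs the results brought together.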

5
Advantages
  • Simple, customizable method
  • Not tied to document structure
  • Architecture built on detection of key-value
    pairs and their various transformations. For
    example (the Slovak text means "Slovakia is in
    Europe"):
    • Text: Slovensko je v Európe ->
    • Extraction: Location: Európe ->
    • Transformation, lemmatization: Location: Európa
      ->
    • Transformation, ontology: Continent: Europe
  • Scalable method. Ported to Grid and Hadoop.
  • Applicable to texts in any language
  • Success rate of 60-90%, depending on the patterns,
    transformers and application used
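
The chain of transformations in the Slovak example can be sketched as a list of functions applied one after another. The dictionaries `LEMMAS` and `ONTOLOGY` are toy stand-ins for a real lemmatizer and ontology lookup, and every function name and the pattern are assumptions made for this sketch.

```python
import re

# Toy lemma dictionary standing in for a real lemmatizer (assumption).
LEMMAS = {"Európe": "Európa"}

# Toy ontology lookup: lemma -> (class, individual). Assumption for illustration.
ONTOLOGY = {"Európa": ("Continent", "Europe")}


def extract(text):
    """Extraction: the word after 'v' (Slovak for 'in') is taken as a Location."""
    match = re.search(r"\bv\s+(\w+)", text)
    return ("Location", match.group(1)) if match else None


def lemmatize(pair):
    """Transformation: replace the value by its lemma when one is known."""
    key, value = pair
    return (key, LEMMAS.get(value, value))


def to_ontology(pair):
    """Transformation: replace the pair by the matching ontology individual."""
    _, value = pair
    return ONTOLOGY.get(value, pair)


def run(text, transformers):
    """Return the pair after extraction and after each transformation."""
    pair = extract(text)
    if pair is None:
        return []
    steps = [pair]
    for transform in transformers:
        pair = transform(pair)
        steps.append(pair)
    return steps


steps = run("Slovensko je v Európe", [lemmatize, to_ontology])
```

Since every transformer takes a pair and returns a pair, transformers can be added, removed or reordered per application without touching the extraction step.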

6
Integration with other tools
  • DocConverter: URL, Plain Text
  • Nalit: Language Identification
  • Ontea: Pattern Matching
  • Morphonary: Transformation: Lemmatization
  • Transformation: Individual Search and Creation
  • Transformation: Relevance Identification
  • Lucene
  • Ontology Repository

7
Future research and development
  • http://ontea.sourceforge.net/