Title: Ontea: Pattern based Annotation Platform
1. Ontea: Pattern based Annotation Platform
2. Ontea Method
- Motivation
  - To create semantic metadata from texts or documents
- Approach
  - Even unstructured text contains patterns
  - Patterns can be used to extract various objects from text
  - Results are key-value pairs
  - Such pairs can be transformed to ontology individuals
    - Class - individual
    - Individual - property
3. Result Examples
- Text
  - Bratislava is the capital of Slovakia. Slovakia is in Europe.
- Pattern for Location: (in|by) (the)? ([A-Za-z]+)
- Ontea discovers the key-value pair
  - Location: Europe
- By transformation to the ontology knowledge base, it finds Europe as a continent using inference (Continent is a sub-class of Location)
  - Continent: Europe
- More examples are in the table
#  Text                     Key: value                      Pattern (regular expression)
1  Apple, Inc.              Company: Apple                  Company: ([A-Za-z0-9]+), (Inc|Ltd)
2  Mountain View, CA 94043  Settlement: Mountain View       Settlement: ([A-Za-z]+ [A-Za-z]+) [A-Z]{2} [0-9]{5}
3  laclavik.ui_at_savba.sk  Email: laclavik.ui_at_savba.sk  Email: [-_.a-z0-9]+_at_[-_.a-zA-Z0-9]+\.[a-z]{2,8}
4  Mr. Michal Laclavik      Person: Michal Laclavik         Person: (Mr.|Mrs.|Dr.) ([A-Za-z]+ [A-Za-z]+)
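A minimal Python sketch of how patterns like these could yield key-value pairs. The regular expressions are illustrative rewrites of the table's patterns (an assumption, not Ontea's actual rules), and the names `PATTERNS` and `extract_pairs` are hypothetical.

```python
import re

# Illustrative patterns only: loosely modelled on the table, not Ontea's real rule set.
PATTERNS = {
    "Location": re.compile(r"\b(?:in|by) (?:the )?([A-Z][a-z]+)"),
    "Company": re.compile(r"\b([A-Z][A-Za-z0-9]+),? (?:Inc|Ltd)\b"),
    "Settlement": re.compile(r"\b([A-Z][a-z]+ [A-Z][a-z]+), [A-Z]{2} [0-9]{5}\b"),
    "Email": re.compile(r"[-_.a-z0-9]+_at_[-_.a-zA-Z0-9]+\.[a-z]{2,8}"),
    "Person": re.compile(r"\b(?:Mr|Mrs|Dr)\. ([A-Z][a-z]+ [A-Z][a-z]+)"),
}


def extract_pairs(text):
    """Return a list of (key, value) pairs, one per pattern match in text."""
    pairs = []
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the first capture group when the pattern has one,
            # otherwise fall back to the whole match (e.g. Email).
            value = match.group(1) if pattern.groups else match.group(0)
            pairs.append((key, value))
    return pairs


print(extract_pairs("Bratislava is the capital of Slovakia. Slovakia is in Europe."))
# [('Location', 'Europe')]
```

Keeping each pattern under its key means a new object type can be added by registering one more regular expression, which is roughly the kind of customization the method relies on.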
4. Features
- Identification of concept instances from the ontology
- Automatic population of ontologies with instances
- Identifying relevance when creating instances, using information retrieval techniques
- Large-scale semantic annotation of documents or texts using Google's MapReduce architecture
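The MapReduce-style annotation can be pictured roughly as below: a map step emits key-value pairs per document and a reduce step groups them. This is a plain-Python simulation under stated assumptions, not Hadoop or Google code; the pattern and the names `map_document`, `reduce_pairs` and `annotate` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed example pattern for the key "Location"; not taken from Ontea itself.
LOCATION = re.compile(r"\b(?:in|by) (?:the )?([A-Z][a-z]+)")


def map_document(doc_id, text):
    """Map step: emit ((key, value), doc_id) for every pattern match."""
    for match in LOCATION.finditer(text):
        yield ("Location", match.group(1)), doc_id


def reduce_pairs(emitted):
    """Reduce step: group document ids by their (key, value) pair."""
    grouped = defaultdict(list)
    for pair, doc_id in emitted:
        grouped[pair].append(doc_id)
    return dict(grouped)


def annotate(corpus):
    """Run the map step over every document, then reduce all emitted records."""
    emitted = []
    for doc_id, text in corpus.items():
        emitted.extend(map_document(doc_id, text))
    return reduce_pairs(emitted)
```

Because the map step looks at one document at a time, the documents can be processed independently, which is what makes this style of annotation easy to distribute.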
5. Advantages
- Simple, customizable method
- Not tied to document structure
- Architecture built on detection of key-value pairs and their various transformations. For example:
  - Text: Slovensko je v Európe (Slovak for "Slovakia is in Europe") ->
  - Extraction: Location: Európe ->
  - Transformation, lemmatization: Location: Európa ->
  - Transformation, ontology: Continent: Europe
- Scalable method, ported to Grid and Hadoop
- Applicable to texts in any language
- Success rate of 60-90% depending on the patterns, transformers and application used
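The transformation chain from the example can be sketched as a sequence of functions, each taking a key-value pair and returning a new one. The lookup tables stand in for a real lemmatizer and ontology and are assumptions made for illustration; all names are hypothetical.

```python
# Hypothetical stand-ins for a lemmatizer and an ontology lookup.
LEMMAS = {"Európe": "Európa"}
ONTOLOGY = {("Location", "Európa"): ("Continent", "Europe")}


def lemmatize(pair):
    """Replace the value with its base form when the toy dictionary knows it."""
    key, value = pair
    return key, LEMMAS.get(value, value)


def to_ontology(pair):
    """Map the pair to a class-individual pair if the toy ontology contains it."""
    return ONTOLOGY.get(pair, pair)


def apply_transformers(pair, transformers):
    """Pass the pair through each transformer in order."""
    for transform in transformers:
        pair = transform(pair)
    return pair


print(apply_transformers(("Location", "Európe"), [lemmatize, to_ontology]))
# ('Continent', 'Europe')
```

Pairs that a transformer does not recognize pass through unchanged, so transformers can be added or removed per application without breaking the chain.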
6. Integration with other tools
- DocConverter: URL, plain text
- Nalit: language identification
- Ontea: pattern matching
- Morphonary: transformation, lemmatization
- Transformation: individual search and creation
- Transformation: relevance identification
- Lucene
- Ontology repository
7. Future research and development
- http://ontea.sourceforge.net/