iDocument: Using Ontologies for Extracting Structured Information from Unstructured Text
Benjamin Adrian, Heiko Maus, Andreas Dengel
Knowledge Management Department, German Research Center for Artificial Intelligence (DFKI), Germany
  • Domain ontologies are exchangeable as long as
    they are written in RDFS.
  • The MOBIE mapping vocabulary lets users define
    the classes, attributes, and relations that are
    relevant for extraction.
  • Existing instance knowledge is reused for
    information extraction.
  • Extracted results are formalized in the RDF
    schema of the input domain ontology.
  • SPARQL queries are used to define extraction
    templates.
  • All intermediate and final extraction results
    are weighted hypotheses according to the
    Dempster-Shafer belief function.
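
The last point, weighting hypotheses with Dempster-Shafer belief functions, can be illustrated with a minimal Python sketch of Dempster's rule of combination. It is not iDocument's implementation: the two evidence sources (`gazetteer`, `pattern`), the hypothesis names, and all mass values are hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1];
    the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so that the remaining masses sum to 1 again.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


def belief(m, hypothesis):
    """Belief in a hypothesis: total mass of all focal sets contained in it."""
    return sum(mass for h, mass in m.items() if h <= hypothesis)


# Hypothetical evidence: is a recognized symbol a Person or an Organization?
frame = frozenset({"Person", "Organization"})
gazetteer = {frozenset({"Person"}): 0.6, frame: 0.4}
pattern = {frozenset({"Person"}): 0.5, frozenset({"Organization"}): 0.2, frame: 0.3}

fused = combine(gazetteer, pattern)
person_belief = belief(fused, frozenset({"Person"}))
```

Mass left on the whole frame expresses ignorance rather than support for either class, which is why a hypothesis can remain weighted instead of being accepted or rejected outright.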

Unstructured text is still the major information
carrier. Until now, Natural Language Processing
(NLP), Document Analysis and Understanding (DAU),
and Information Extraction (IE) systems have not
been able to fully understand and process
information written in natural language text.
Common business domains already provide
structured knowledge in relational databases,
native applications, and XML or CSV files. With
Semantic Web technologies (RDF/S), these knowledge
sources can be integrated into an ontology and used
as contextual background knowledge for
information extraction. iDocument uses
ontologies to extract structured, domain-relevant
information from unstructured text.
[Poster section headings: MOTIVATION, UNIQUE FEATURE, SCENARIO, OBIE PIPELINE]
[Figure: scenario diagram. Labels: User, Question, Template, about, Ontology, Corpus, SELECT WHERE, Extraction Template, RDF Graph]
OBIE Extraction Pipeline
i. Normalization
ii. Segmentation
iii. Symbolization
iv. Instantiation
v. Contextualization
vi. Population
i. Normalization: Extract plain text and existing
metadata from the document. The APERTURE framework
generates a document frame description in RDF.
http://aperture.sourceforge.net
ii. Segmentation:
  • Partition the text into segments: paragraphs,
    sentences, and tokens.
  • JAPE, GATE's regular rule language, provides
    efficient paragraph detection, sentence
    extraction, and tokenization.
  • http://gate.ac.uk/
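
The three segmentation levels can be sketched in a few lines of Python. This is a deliberately naive stand-in built on regular expressions from the standard library, not GATE or JAPE; the function name `segment` and the boundary rules are assumptions made for illustration.

```python
import re


def segment(text):
    """Split text into paragraphs, each a list of sentences, each a list of tokens."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Naive sentence boundary: '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # Tokens are runs of word characters or single punctuation marks.
        result.append([re.findall(r"\w+|[^\w\s]", s) for s in sentences])
    return result


sample = "iDocument extracts facts. It uses ontologies.\n\nDFKI is in Germany."
segments = segment(sample)
```

A rule this simple breaks on abbreviations such as "e.g." or "Dr.", which is one reason a rule language with richer context, such as JAPE, is used in practice.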

iii. Symbolization: Recognize known textual
attribute values of ontology concepts in the text
as symbols. The values of each attribute are stored
in gazetteers. Finite-state transducers recognize
gazetteer entries in the text as named entities.
Regular patterns learned from the gazetteers
extract structured entities.
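
A gazetteer lookup of this kind can be sketched as a longest-match scan over tokens. The sketch below uses a plain dictionary rather than a finite-state transducer, and the attribute names and values are hypothetical examples, not data from the iDocument ontology.

```python
def build_gazetteer(attribute_values):
    """Map each known value, as a tuple of lowercased tokens, to its attribute."""
    entries = {}
    for attribute, values in attribute_values.items():
        for value in values:
            entries[tuple(value.lower().split())] = attribute
    return entries


def symbolize(tokens, entries):
    """Return (start, end, attribute, text) for the longest entry found at each position."""
    max_len = max((len(key) for key in entries), default=0)
    lowered = [t.lower() for t in tokens]
    symbols = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + length])
            if key in entries:
                symbols.append((i, i + length, entries[key], " ".join(tokens[i:i + length])))
                i += length
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return symbols


# Hypothetical attribute values, as they might be read from instance data.
entries = build_gazetteer({
    "name": ["Andreas Dengel", "Heiko Maus"],
    "label": ["DFKI"],
})
tokens = "Andreas Dengel works at DFKI".split()
symbols = symbolize(tokens, entries)
```

Each resulting symbol keeps the attribute it was matched against, so later steps can ask which instances carry that attribute value.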
iv. Instantiation: Resolve instance candidates and
relationship candidates from symbols. Resolution
rules identify an inferred candidate closure filled
with the instances and relationships that are
described by matching attribute values.
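
One simple way to picture candidate resolution is to collect every known instance that carries at least one recognized attribute value and to rank the candidates by how many symbols support them. This is only a sketch under that assumption; the actual resolution rules are not given here, and the instance identifiers and attribute names below are hypothetical.

```python
def resolve_candidates(symbols, instances):
    """Map each matching instance id to the set of symbols that support it."""
    candidates = {}
    for attribute, value in symbols:
        for instance_id, attributes in instances.items():
            if value in attributes.get(attribute, ()):
                candidates.setdefault(instance_id, set()).add((attribute, value))
    return candidates


# Hypothetical instance knowledge: instance id -> attribute -> known values.
instances = {
    "ex:adrian": {"firstName": ["Benjamin"], "lastName": ["Adrian"]},
    "ex:dengel": {"firstName": ["Andreas"], "lastName": ["Dengel"]},
    "ex:other": {"firstName": ["Andreas"], "lastName": ["Example"]},
}
symbols = [("firstName", "Andreas"), ("lastName", "Dengel")]

candidates = resolve_candidates(symbols, instances)
# Candidates supported by more symbols come first.
ranked = sorted(candidates, key=lambda i: len(candidates[i]), reverse=True)
```

Here `ex:other` stays in the candidate set with weaker support, which matches the idea of keeping weighted hypotheses rather than committing to a single instance early.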
v. Contextualization: Resolve relevant relations
between resolved instances. Annotate relevant,
known relations and instances. Interpret relations
between co-occurring instances. Classify structured
entities as instances. Implemented with Java-based
rules and inductive learning algorithms.
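
Annotating known relations between co-occurring instances can be sketched as follows, assuming that co-occurrence means "resolved in the same sentence" and that known relations are available as a lookup from instance pairs to predicates. Both assumptions, and all identifiers in the example, are hypothetical.

```python
from itertools import permutations


def annotate_relations(sentence_instances, known_relations):
    """Return known (subject, predicate, object) triples whose ends co-occur in a sentence."""
    found = []
    for instances in sentence_instances:
        # Relations are directed, so every ordered pair is checked.
        for subject, obj in permutations(instances, 2):
            for predicate in known_relations.get((subject, obj), ()):
                triple = (subject, predicate, obj)
                if triple not in found:
                    found.append(triple)
    return found


# Hypothetical relation knowledge taken from the ontology's instance data.
known_relations = {("ex:dengel", "ex:dfki"): ["ex:worksFor"]}
# Instances resolved per sentence by the previous step.
sentence_instances = [["ex:dfki", "ex:dengel"], ["ex:maus"]]

relations = annotate_relations(sentence_instances, known_relations)
```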
vi. Population: Generate and populate template
instances with the extracted instances and
relations. Directed graph traversal algorithms
populate the templates hierarchically.
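
Hierarchical population by directed graph traversal can be sketched as a recursive walk that follows only the predicates a template asks for. The nested-dictionary template format, the adjacency-dictionary graph, and all identifiers are assumptions for illustration; in iDocument the templates are defined as SPARQL queries.

```python
def populate(template, node, graph):
    """Fill a nested template by following its predicates from a start node.

    template maps a predicate to the sub-template for its object nodes;
    graph maps a node to {predicate: [object, ...]}.
    """
    filled = {"id": node}
    for predicate, sub_template in template.items():
        objects = graph.get(node, {}).get(predicate, [])
        # Recurse one level deeper for every object reached via this predicate.
        filled[predicate] = [populate(sub_template, obj, graph) for obj in objects]
    return filled


# Hypothetical extracted graph and template.
graph = {
    "ex:dengel": {"ex:worksFor": ["ex:dfki"]},
    "ex:dfki": {"ex:locatedIn": ["ex:germany"]},
}
template = {"ex:worksFor": {"ex:locatedIn": {}}}

result = populate(template, "ex:dengel", graph)
```

Because the recursion depth is bounded by the nesting depth of the template, cycles in the extracted graph do not cause endless traversal.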
IMPLEMENTATION
[Figure labels: Extraction Workbench, Template Designer, Ontology Explorer]
VISUALIZATION
[Figure labels: Template Instances, Template Viewer]
ACKNOWLEDGEMENTS
This work was supported by Stiftung
Rheinland-Pfalz für Innovation.