OntosMiner Family: Content Extraction from MultiLingual Document Collections

1
OntosMiner Family: Content Extraction from MultiLingual Document Collections
  • Daniel Hladky, Ontos AG, Switzerland
  • Irina Efimenko, Moscow State University
  • Vladimir Khoroshevsky, Computer Center RAS
  • Victor Klintsov, AviComp AG, Russia

2
Presentation Map
  • Introduction
  • OntosMiner Project
  • OntosMiner Family
  • Ontos NLPs Architecture
  • Domain Ontology
  • NE Extraction
  • Extraction of Relations
  • Cognitive Maps
  • Conclusion & Future Trends
  • Demos

3
OntosMiner Project
  • Objectives
      • Combining AI & IT experience within the NLP domain
      • R&D in knowledge management
  • Approaches
      • Use of IE technologies in processing NL texts
      • Enrichment of IE techniques with NLP on the basis of special linguistic models
      • Representation of the meaning of NL texts in the form of cognitive maps
  • Results
      • New generation of MIE-systems
      • Practical use of the results within commercial & non-commercial organizations
      • Approaches developed for Semantic Indexing, Clustering and Summarization

4
Basic Principles
  • Processing of those constructions that can be processed correctly, and NON-processing of those that still cannot be processed correctly.
  • Development of reusable components for multi-platform implementation.
  • Providing domain ontology-driven analysis.
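The first principle favours precision over coverage: a construction is annotated only when it can be handled reliably, and anything else is left untouched. As a minimal sketch of that policy (the toy gazetteer, the example tokens and the names `annotate` and `annotate_all` are hypothetical, not OntosMiner's rules), a token is labelled only when it has exactly one candidate reading:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy gazetteer: surface form -> set of candidate entity types.
GAZETTEER: dict[str, set[str]] = {
    "Ontos": {"Organization"},
    "Moscow": {"Location"},
    "Jordan": {"Person", "Location"},  # ambiguous: surname or country
}


@dataclass(frozen=True)
class Annotation:
    text: str
    entity_type: str


def annotate(token: str) -> Optional[Annotation]:
    """Return an annotation only when the token has exactly one reading."""
    candidates = GAZETTEER.get(token, set())
    if len(candidates) == 1:
        (entity_type,) = candidates
        return Annotation(token, entity_type)
    # Unknown or ambiguous token: leave it unprocessed instead of guessing.
    return None


def annotate_all(tokens: list[str]) -> list[Annotation]:
    """Annotate a token sequence, skipping anything that cannot be handled reliably."""
    return [ann for ann in map(annotate, tokens) if ann is not None]


# Invented example sentence; "Jordan" stays unannotated because it is ambiguous.
print(annotate_all(["Jordan", "joined", "Ontos", "in", "Moscow"]))
# [Annotation(text='Ontos', entity_type='Organization'), Annotation(text='Moscow', entity_type='Location')]
```

Leaving "Jordan" unlabelled lowers recall but avoids a wrong annotation, which is the trade-off the principle describes.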

5
Main Requirements
  • Work with multilingual document collections (at present, OntosMiner systems handle English, French, German and Russian texts).
  • Work with monothematic document collections (at present, primarily the so-called Business Duties domain, plus the Crime domain for Russian texts). Business Duties collections include informational materials about IT companies, analytical materials on the founding, investment, sale, purchase and merging of companies, top-management CVs, etc.
  • Adequate processing of the relevant objects and relations according to the specific domain ontology.
  • Representation of processing results in the form of a cognitive map, i.e. a kind of semantic network.
  • Multi-platform implementation of all systems in the family.
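A cognitive map, as described in the requirements, is a kind of semantic network. As a minimal sketch, assuming entities are typed nodes and relations are named directed edges (the class and method names `Entity`, `CognitiveMap`, `add_relation` and `neighbors` are hypothetical and not OntosMiner's API), such a map could be held in Python like this:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str  # e.g. "Person", "Organization", "Location"


class CognitiveMap:
    """A minimal semantic network: typed entity nodes linked by named relation edges."""

    def __init__(self) -> None:
        self.nodes: set[Entity] = set()
        # Outgoing edges: source entity -> list of (relation name, target entity).
        self.edges: dict[Entity, list[tuple[str, Entity]]] = defaultdict(list)

    def add_relation(self, source: Entity, relation: str, target: Entity) -> None:
        self.nodes.update((source, target))
        self.edges[source].append((relation, target))

    def neighbors(self, entity: Entity, relation: Optional[str] = None) -> list[Entity]:
        """Targets linked from `entity`, optionally restricted to one relation name."""
        return [
            target
            for rel, target in self.edges.get(entity, [])
            if relation is None or rel == relation
        ]


# Affiliations from the title slide, read as Employ / LocatedIn purely for illustration.
ontos = Entity("Ontos AG", "Organization")
hladky = Entity("Daniel Hladky", "Person")
switzerland = Entity("Switzerland", "Location")

cmap = CognitiveMap()
cmap.add_relation(ontos, "Employ", hladky)
cmap.add_relation(ontos, "LocatedIn", switzerland)
print([e.name for e in cmap.neighbors(ontos, "Employ")])  # ['Daniel Hladky']
```

An adjacency list keyed by entity keeps "what is this entity linked to" queries cheap, which is the typical way such a map is browsed.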

6
OntosMiner Family Architecture
7
Domain Ontology: Business Duties
8
Named Entities
  • People
  • Organizations
  • Titles and JobTitles
  • Scientific degrees
  • Various kinds of Addresses
  • Money
  • Percent
  • URL, e-mail, phone (international style)
  • Locations
  • Cars
  • Dates and Periods of Time
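Some of these entity types (URL, e-mail, international-style phone numbers, money, percent) have a fairly regular surface form. The slides do not describe how OntosMiner recognises them; as a hedged sketch only, simple regular expressions could pick them out (the patterns, the sample text and the function name `extract_simple_entities` are all hypothetical):

```python
import re

# Hypothetical, deliberately simple patterns; production rules would be far richer.
PATTERNS: dict[str, re.Pattern] = {
    "URL": re.compile(r"\bhttps?://[^\s,;]+"),
    "E-mail": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # International style: "+", country code, then 2-4 groups of 2-4 digits.
    "Phone": re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}"),
    "Money": re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d+)?|\d[\d,]*(?:\.\d+)?\s?(?:USD|EUR|CHF)\b"),
    "Percent": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
}


def extract_simple_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity type, surface string) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()
    return [(label, surface) for _, label, surface in found]


sample = "Contact info@example.com or +41 44 123 45 67; revenue grew 12% to $3.5 million, see https://example.com"
for label, surface in extract_simple_entities(sample):
    print(f"{label}: {surface}")
```

Types such as People, Organizations or Locations cannot be captured this way; they need gazetteers, context and the ontology.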

9
Semantic Relations
  • Affiliate
  • Buy-Sell
  • Employ
  • Found
  • Graduate
  • Invest
  • JointVenture
  • Own
  • Rival
  • LocatedIn
  • EarnDegree
  • Employ
  • Own
  • LocatedIn
  • Reside
  • Belong
  • Hijack
  • Investigate
  • Petition
  • Crime Place Observation
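Since the analysis is ontology-driven, a candidate relation can be accepted only when its arguments have the types the ontology allows. The slides do not give the argument types of these relations, so the signatures below are assumptions, as are the names `SIGNATURES` and `is_licensed`; this is a sketch of the idea, not the OntosMiner ontology:

```python
from dataclasses import dataclass

# Hypothetical argument-type signatures for some of the relations listed above;
# the real OntosMiner ontology may define these differently.
SIGNATURES: dict[str, tuple[str, str]] = {
    "Employ": ("Organization", "Person"),
    "Found": ("Person", "Organization"),
    "Graduate": ("Person", "Organization"),
    "LocatedIn": ("Organization", "Location"),
    "Rival": ("Organization", "Organization"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str


def is_licensed(relation: str, source: Entity, target: Entity) -> bool:
    """Accept a candidate relation only if the ontology allows its argument types."""
    signature = SIGNATURES.get(relation)  # None for relations without a signature
    return signature == (source.entity_type, target.entity_type)


org = Entity("Ontos AG", "Organization")
person = Entity("Daniel Hladky", "Person")
print(is_licensed("Employ", org, person))  # True
print(is_licensed("Employ", person, org))  # False: argument types reversed
print(is_licensed("Hijack", org, person))  # False: no signature defined in this sketch
```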

10
OntosMiner Family Key Components
11
Evaluation
  • Materials about the companies in the KM Top100 list published by Knowledge Management Journal were used as the evaluation corpus for the OntosMiner family systems.
  • The corpus contained about 150 texts, all of which were manually annotated by linguists from Moscow State University; the resulting markup was checked by specialists from the OntosMiner Project.
  • In addition, some Russian texts from the Business Duties domain, as well as documents from a collection of police reports, were added to the corpus.
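The transcript does not say which measures were computed against the manual markup (the per-language result slides are not included as text). Assuming the usual exact-match precision, recall and F1 over annotated spans, the comparison could be sketched as follows; the span representation, the function name `evaluate` and the toy data are hypothetical:

```python
# A span is (start offset, end offset, entity type); this representation is an assumption.
Span = tuple[int, int, str]


def evaluate(system: set[Span], gold: set[Span]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of system output against manual markup."""
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented toy data, not figures from the OntosMiner evaluation.
gold = {(0, 8, "Organization"), (20, 33, "Person"), (40, 46, "Location"), (50, 55, "Money")}
system = {(0, 8, "Organization"), (20, 33, "Person"), (40, 46, "Organization")}
print(evaluate(system, gold))
```

Here two of the three system spans match the gold markup exactly, giving precision 2/3, recall 1/2 and F1 4/7; a span with the right offsets but the wrong type counts as an error.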

12
OntosMiner/English
13
OntosMiner/French
14
OntosMiner/Russian
15
Conclusion & Future Trends
  • A multilingual content extraction (CE) system has been developed
  • The results achieved will be applied to other tasks, such as Summarization, Semantic Indexing and QA
  • We are interested in collaboration and joint projects with other teams

16
  • Thank You for Your Attention