1
Automatic Extraction and Integration of Knowledge
from Documents
  • Professor Fabio Ciravegna
  • Natural Language Processing Group
  • Department of Computer Science,
  • University of Sheffield
  • fabio_at_dcs.shef.ac.uk

2
Sheffield's NLP Research Group
  • One of the largest NLP research groups in Europe
  • Over 45 people employed full-time
  • Over 5m in current active funding
  • From Research Councils, Government, Industry and
    the EU
  • Personnel
  • 8 Academics
  • 4 Full professors
  • 1 Reader
  • 1 Senior Lecturer
  • 2 Lecturers
  • 1 Very Senior Research Scientist
  • Around 30 Research Associates
  • 15 PhD students
  • 3 full-time admin staff
  • 2 technical support

3
Knowledge Written Documents
  • In any organization, 80-85% of the valuable
    knowledge is held in unstructured form,
    i.e. expressed in some form of natural language.

Products
Volume
  • Content tends to be more valuable than
    structured information
  • more up to date
  • more detailed
  • Includes cases not foreseen when the structured
    information was designed

4
Indexing and Searching in KM
  • Retrieval based on metadata
  • Documents are manually annotated
  • Identification of mentions of individuals
  • Events are rarely marked up
  • Cost of manually structuring knowledge is very
    high
  • Italian Police
  • 100 people annotating documents
  • People, weapons, locations etc. from witness and
    offender interviews and trial reports
  • 6 pages a day each on average
  • Outcome: ability to trace people, weapons and
    locations
  • Keyword searching
  • Can be laborious and ineffective
  • Requires human brainwork to extract and integrate
    information
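The Italian Police figures above imply a fixed daily throughput for manual annotation. The sketch below works through that arithmetic. The annotator count and the pages-per-day rate come from the slide; the archive size passed to `days_to_annotate` is an invented figure used only to show the scale, not a number from the presentation.

```python
ANNOTATORS = 100            # from the slide: 100 people annotating documents
PAGES_PER_PERSON_DAY = 6    # from the slide: 6 pages a day each on average


def pages_per_day(annotators=ANNOTATORS, rate=PAGES_PER_PERSON_DAY):
    """Total pages the whole team annotates in one working day."""
    return annotators * rate


def days_to_annotate(archive_pages, annotators=ANNOTATORS, rate=PAGES_PER_PERSON_DAY):
    """Working days needed to annotate an archive, rounded up to whole days."""
    daily = pages_per_day(annotators, rate)
    return -(-archive_pages // daily)  # ceiling division on integers


# Hypothetical archive size, chosen only for illustration.
HYPOTHETICAL_ARCHIVE_PAGES = 1_000_000
print(pages_per_day())
print(days_to_annotate(HYPOTHETICAL_ARCHIVE_PAGES))
```

At 600 pages a day for the whole team, even a modest archive takes years of working days to annotate, which is the cost the slide refers to.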

5
Keyword-based Searching

6
Needed Technology
Most of the retrieval, extraction and integration
work is done by the system!
Semantic Search

7
Knowledge Harvesting Technologies
  • Enables
  • Automatic creation of metadata directly from
    documents
  • Analysing large quantities of documents and data
  • Creation of a global picture
  • Of events and participants
  • Independent of the actual distribution within
    documents
  • Steps
  • Automatic information extraction
  • Facts, events, participants in single documents
  • Automatic information integration
  • Connecting information from different documents
    and archives (including structured information)
  • Basic technology
  • Natural Language Processing
  • Machine Learning
  • Semantic Web Technologies
  • Data mining
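The two steps listed above, extraction within single documents followed by integration across documents, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method used by the Sheffield group: the `GAZETTEER` lookup table, the `extract` and `integrate` functions, and the example documents are all hypothetical, and simple dictionary matching stands in for the NLP and machine learning components the slide names.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer mapping surface forms to (canonical entity, type).
# A real system would learn or import this; it is hard-coded for illustration.
GAZETTEER = {
    "john smith": ("John Smith", "person"),
    "j. smith": ("John Smith", "person"),
    "sheffield": ("Sheffield", "location"),
    "revolver": ("revolver", "weapon"),
}


def extract(doc_id, text):
    """Step 1, information extraction: find entity mentions in one document."""
    mentions = []
    for surface, (entity, etype) in GAZETTEER.items():
        # The lookarounds keep matches on whole words only.
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append((doc_id, entity, etype, match.start()))
    return mentions


def integrate(mentions):
    """Step 2, information integration: group mentions by entity across documents."""
    index = defaultdict(set)
    for doc_id, entity, etype, _offset in mentions:
        index[(entity, etype)].add(doc_id)
    return index


# Invented example documents, not taken from any real archive.
documents = {
    "interview-1": "John Smith carried a revolver.",
    "report-7": "J. Smith was seen in Sheffield.",
}
all_mentions = [m for doc_id, text in documents.items() for m in extract(doc_id, text)]
index = integrate(all_mentions)
for (entity, etype), doc_ids in sorted(index.items()):
    print(f"{etype}: {entity} appears in {sorted(doc_ids)}")
```

Because both surface forms resolve to the same canonical entity, the index links the interview and the report through one person, regardless of which document each fact came from.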

8
Aktive DocumentPlatform
  • Support to manual annotation
  • Automatic suggestions for human annotator
  • System learns from user annotations and
    corrections
  • Reduces annotation need by 80%
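The loop described above (the system suggests, the user annotates or corrects, the system learns) can be illustrated with a deliberately simple learner. This is a minimal sketch, assuming a per-token label count as the learning mechanism; the `SuggestionLearner` class and its methods are hypothetical names, and nothing here reflects how the platform itself is implemented.

```python
from collections import Counter, defaultdict


class SuggestionLearner:
    """Toy learner: remembers which label users gave each token."""

    def __init__(self):
        # token (lowercased) -> Counter of labels assigned by the user
        self.label_counts = defaultdict(Counter)

    def learn(self, token, label):
        """Record one user annotation."""
        self.label_counts[token.lower()][label] += 1

    def correct(self, token, wrong_label, right_label):
        """Record a user correction of an earlier label."""
        counts = self.label_counts[token.lower()]
        counts[wrong_label] -= 1
        counts[right_label] += 1

    def suggest(self, text):
        """Suggest the most frequent known label for each known token."""
        suggestions = []
        for raw in text.split():
            token = raw.strip(".,;:")
            key = token.lower()
            if key in self.label_counts:
                label, _count = self.label_counts[key].most_common(1)[0]
                suggestions.append((token, label))
        return suggestions


learner = SuggestionLearner()
learner.learn("Sheffield", "location")
learner.learn("Beretta", "person")              # an initial mistake
learner.correct("Beretta", "person", "weapon")  # the user fixes it
print(learner.suggest("A Beretta was found in Sheffield."))
```

Each correction shifts later suggestions, so the annotator's effort moves from marking every mention to confirming or fixing proposals.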

Sticky notes to record reader/writer notes
Automatic correlation with external knowledge

9
Use for Crime Detection
  • Possibility of discovering patterns across
    multiple sources of information,
  • Emerging trends and potential threats.
  • Contribution to information management,
  • Effective planning and decision making for crime
    detection and prevention.
  • Of benefit during the collection and preparation
    of evidence,
  • Helping reduce time in evidence collection,
    provision of information, and the investigation
    overall.
  • Help investigators and barristers to retrieve the
    information relevant to a court case
  • Prosecution: to present evidence (ensuring
    that criminals are convicted)
  • Defence: to demonstrate innocence or mitigating
    circumstances.

10
External Collaborations
  • Some cooperation activities with
    companies/organizations
  • Intelligence
  • GCHQ (UK)
  • Lawrence Livermore Research Lab (USA)
  • Aerospace: Boeing, Rolls-Royce
  • Biomedicine: GlaxoSmithKline, Merck
  • Knowledge Management: Solcara (UK), Accenture,
    Quinary (I), Ontoprise (D)
  • Other: Environmental Agency (UK)
  • Past cooperation with police forces
  • Yorkshire Police
  • Italian Police
  • Amt für Auslandsfragen, Germany
  • Bundeskriminalamt, Germany