Title: A Framework to Experiment with Different NLP Techniques
1A Framework to Experiment with Different NLP
Techniques
- Ricardo Gacitua1, Pete Sawyer1, Paul Rayson1, Scott Piao2
- 1 Computing Department, Lancaster University, Lancaster, UK
- 2 School of Computer Science, Manchester University, U
Workshop - Issues in Ontology Development and
Use Nottingham, UK. 2007
2Index
- Context
- Problems
- Research Question
- Objectives
- Framework
- Brief Demo OntoLancs Workbench
- Further Work
3Context
Focus
Most initiatives for Ontology Learning
combine techniques to find concepts and
relationships between them.
However, researchers have realised that the
output of the ontology learning process is far
from perfect [Cimiano et al., 2006].
Philipp Cimiano, Johanna Völker, Rudi Studer.
Ontologies on Demand? A Description of the
State-of-the-Art, Applications, Challenges and
Trends for Ontology Learning from Text.
Information, Wissenschaft und Praxis 57 (6-7),
315-320. October 2006.
5Problem
6Research Question
- Can shallow semantic analysis of the kind
enabled by semantic tagging, together with a
range of other statistical NLP techniques,
identify key domain concepts?
- Can it do so with sufficient confidence in
the correctness and completeness of the result?
7Background..
8A Flexible Framework
- Phase 1: Part-of-Speech (POS) and Semantic
Annotation of Corpus. Domain texts are tagged
morpho-syntactically and semantically.
- Phase 2: Extraction of Concepts. The domain
terminology is extracted from the tagged domain
corpus by identifying a list of candidate domain
terms. The system provides a set of statistical
and linguistic techniques which an ontology
engineer can combine.
- Phase 3: Domain Ontology Construction. Concepts
extracted during the previous phase are then
added to a concept hierarchy.
- Phase 4: Domain Ontology Editing. The bootstrap
ontology is converted into OWL. It is then processed
using an ontology editor (Protégé) to manage the
versioning of the domain ontology and to modify or
improve it.
An existing DAML ontology can be used as a
reference and to calculate precision and recall.
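The precision and recall calculation against a reference ontology can be sketched as follows. This is only an illustration, assuming concepts are compared as case-insensitive strings; the function name `precision_recall` and both concept lists are hypothetical and are not taken from OntoLancs.

```python
def precision_recall(extracted, reference):
    """Compare extracted candidate concepts with reference-ontology concepts.

    Both arguments are iterables of concept labels. Matching is done on
    lower-cased strings, which is an assumption made for this sketch.
    """
    extracted_set = {label.lower() for label in extracted}
    reference_set = {label.lower() for label in reference}
    matched = extracted_set & reference_set

    # Precision: share of extracted concepts that appear in the reference.
    precision = len(matched) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference concepts that were extracted.
    recall = len(matched) / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical data, for illustration only.
extracted = ["goal", "player", "match", "weather"]
reference = ["Goal", "Player", "Match", "Referee", "Team"]

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Three of the four extracted labels occur in the reference, and three of the five reference labels were found, so precision exceeds recall in this toy case.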
9Preliminary Results
Some researchers use different text processing
techniques such as stopword filtering,
lemmatization or stemming.
- StopWord Filtering: Bloehdorn et al., 2006
- Lemmatization: Buitelaar and Ramaka, 2005
- Stemming: Kietz et al., 2000
- S. Bloehdorn, P. Cimiano and A. Hotho.
Learning Ontologies to Improve Text Clustering
and Classification. Proc. of GFKL, 2005.
- Paul Buitelaar, Srikanth Ramaka. Unsupervised
Ontology-based Semantic Tagging for Knowledge
Markup. In Proc. of the Workshop on Learning in
Web Search at the International Conference on
Machine Learning, Bonn, Germany, August 2005.
- J. Kietz, et al. A Method for Semi-automatic
Ontology Acquisition from a Corporate Intranet.
In Proc. EKAW-2000, France, 2000.
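The three preprocessing techniques named above can be contrasted on a toy example. This is a rough sketch under stated assumptions: the stopword list, the suffix-stripping stemmer and the lemma lookup table are all hypothetical stand-ins written for this illustration, not the tools used in the experiments.

```python
# Hypothetical stopword list (illustration only).
STOPWORDS = {"the", "of", "a", "an", "and", "are", "in", "to"}

# Hypothetical lemma lookup table; a real lemmatizer uses a full lexicon.
LEMMAS = {"ontologies": "ontology", "learned": "learn", "studies": "study"}


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip the first matching suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ies", "es", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Map a token to its dictionary form, or return it unchanged."""
    return LEMMAS.get(token, token)


tokens = "the ontologies of a domain are learned in the studies".split()
filtered = remove_stopwords(tokens)
stems = [naive_stem(t) for t in filtered]
lemmas = [lemmatize(t) for t in filtered]

print(filtered)
print(stems)
print(lemmas)
```

In this toy case the stemmer reduces "ontologies" to a truncated string that is not a word, whereas the lemma lookup returns "ontology", which can be matched directly against a concept label.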
From the preliminary experiments, we can conclude
that the lemmatization technique (Group 3)
produces better results than the stemming
technique (Group 2) for the domain concept
acquisition process.
Our results are consistent with other studies.
For instance, Alkula [3] suggests that
lemmatization may be a better approach than
stemming.
[3] Alkula, R. 2001. From Plain Character Strings
to Meaningful Words: Producing Better Full Text
Databases for Inflectional and Compounding
Languages with Morphological Analysis Software.
Inf. Retr. 4, 3-4 (Sep. 2001), 195-208.
10Brief Demo
Ontology Framework
11Conclusions
Main challenge
Our research project addresses an important
challenge of ontology research, i.e. how to
evaluate quantitatively the usefulness and
accuracy of both individual techniques and
combinations of techniques when they are applied
to ontology learning.
Our ontology learning environment is unique in
providing not only a framework for integrating
linguistic techniques, but also, potentially, an
experimental platform for identifying the most
effective technique or combination of techniques.
12Further Work
Our Project
OntoLancs A Flexible Framework For Ontology
Learning
Future Work
13The End
OntoLancs Computing Department Lancaster
University 2006, UK
14Text2Onto vs. OntoLancs
Text2Onto treats user interaction as a core
aspect, whereas our framework supports running
algorithms in an unsupervised mode.
Our framework provides a graphical workflow
engine to support the composition of
complex ensemble techniques.
Like Text2Onto, our framework uses a plug-in-based
structure. In contrast, however, it can include
techniques from existing linguistic and ontology
tools by using Java APIs.
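The idea of composing individual techniques into an ensemble can be sketched as follows. This is a simplified illustration, assuming each technique is a function from a POS-tagged token list to a filtered token list; the names `pos_filter`, `raw_frequency_filter` and `compose`, the tag set and the sample data are hypothetical and do not reflect the OntoLancs plug-in API.

```python
from collections import Counter


def pos_filter(allowed_tags):
    """Build a technique that keeps only tokens with an allowed POS tag."""
    def technique(tagged):
        return [(word, tag) for word, tag in tagged if tag in allowed_tags]
    return technique


def raw_frequency_filter(min_count):
    """Build a technique that keeps words seen at least min_count times."""
    def technique(tagged):
        counts = Counter(word for word, _ in tagged)
        return [(word, tag) for word, tag in tagged if counts[word] >= min_count]
    return technique


def compose(*techniques):
    """Chain techniques so that each one consumes the previous output."""
    def pipeline(tagged):
        for technique in techniques:
            tagged = technique(tagged)
        return tagged
    return pipeline


# Hypothetical POS-tagged corpus, for illustration only.
tagged_corpus = [
    ("ontology", "NN"), ("is", "VBZ"), ("learned", "VBN"),
    ("ontology", "NN"), ("concept", "NN"), ("quickly", "RB"),
    ("ontology", "NN"), ("concept", "NN"), ("text", "NN"),
]

noun_pipeline = compose(pos_filter({"NN"}), raw_frequency_filter(2))
candidates = sorted({word for word, _ in noun_pipeline(tagged_corpus)})
print(candidates)
```

Because every technique shares the same input and output shape, the order and choice of steps can be changed without touching the techniques themselves, which is what makes experimenting with different combinations cheap.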
15Techniques included into OntoLancs
- Grouping by POS
- Raw Frequency Filtering
- POS Filtering
- Lemmatization
- Stemming
- StopWord Filtering
- Frequency Profiling
- Syntactic Pattern Co-occurrences
- Window-based Collocations
- Semantic Filter (soon)
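As an illustration of one technique in this list, frequency profiling compares how often a word occurs in the domain corpus with how often it occurs in a reference corpus. The sketch below assumes the log-likelihood statistic is used for that comparison; the slides do not state which statistic OntoLancs applies, and the function name and counts are hypothetical.

```python
import math


def log_likelihood(freq_domain, size_domain, freq_ref, size_ref):
    """Log-likelihood score for one word across two corpora.

    freq_* are the word's counts, size_* the total token counts.
    A higher score means the relative frequencies differ more.
    """
    total_freq = freq_domain + freq_ref
    total_size = size_domain + size_ref

    # Expected counts if the word were equally common in both corpora.
    expected_domain = size_domain * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    score = 0.0
    for observed, expected in (
        (freq_domain, expected_domain),
        (freq_ref, expected_ref),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts, for illustration only.
balanced = log_likelihood(10, 1000, 10, 1000)
skewed = log_likelihood(50, 1000, 5, 1000)
print(balanced, skewed)
```

A word with the same relative frequency in both corpora scores zero, while a word that is much more frequent in the domain corpus scores high and becomes a stronger candidate domain term.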