Title: Solving Some Text Mining Problems with Conceptual Graphs
1. Solving Some Text Mining Problems with Conceptual Graphs
Tula State University, Faculty of Cybernetics, Laboratory of Information Systems
2008
2. The Nature of Text Mining
Data mining:
- "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1]
- "the science of extracting useful information from large data sets or databases" [2]
Text mining (also called text data mining or text analytics):
- the process of deriving high-quality information from text
Text mining is interdisciplinary:
- information retrieval
- machine learning
- statistics
- computational linguistics
- 1. W. Frawley, G. Piatetsky-Shapiro, and C. Matheus (Fall 1992). Knowledge Discovery in Databases: An Overview. AI Magazine, pp. 213-228.
- 2. D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA.
3. Computational (Corpora) Linguistics
Text Mining
Natural Language Processing
Knowledge Discovery
Global Problems
- Analysis of
- syntax
- grammar
- morphology
- semantics
- text categorization,
- text clustering,
- concept/entity extraction,
- sentiment analysis,
- document summarization
Problems
- annotation
- abstraction
- ontologies
- semantic roles
- Objects of tagging
- clusters,
- trends,
- associations,
- deviations
Processing objects
- Knowledge Models
- rules
- ontologies
Metadata
- Corpora
- large and structured text
- tagging
Data: plain text
4. Conceptual Graph
Example: "John is going to Boston by bus"
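
As a minimal sketch of how the example sentence might be held as a conceptual graph in code: the class names `Concept` and `ConceptualGraph`, and the relation names `Agnt`, `Dest` and `Inst`, are assumptions chosen for illustration and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    type: str            # concept type, e.g. "Person"
    referent: str = ""   # optional individual marker, e.g. "John"

    def __str__(self):
        if self.referent:
            return f"[{self.type}: {self.referent}]"
        return f"[{self.type}]"


@dataclass
class ConceptualGraph:
    # Each edge is a (source concept, relation name, target concept) triple.
    edges: list = field(default_factory=list)

    def relate(self, source, relation, target):
        self.edges.append((source, relation, target))

    def concepts(self):
        # Distinct concepts in order of first appearance.
        seen = []
        for source, _, target in self.edges:
            for concept in (source, target):
                if concept not in seen:
                    seen.append(concept)
        return seen

    def linear_form(self):
        return "\n".join(f"{s} -> ({r}) -> {t}" for s, r, t in self.edges)


go = Concept("Go")
cg = ConceptualGraph()
cg.relate(go, "Agnt", Concept("Person", "John"))   # who is going
cg.relate(go, "Dest", Concept("City", "Boston"))   # where to
cg.relate(go, "Inst", Concept("Bus"))              # by what means
print(cg.linear_form())
```

The graph has one concept per content word and one relation per link to the verb concept, so the sentence yields four concepts and three relations.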
5. Conceptual Graphs in Digital Libraries
- Supporting CGs in Digital Libraries
- Building and storing CGs
- Automated building of CGs
- Organizing access to CGs in Datastore
- Solving applied problems with CGs
- Automated building and developing catalogues and rubricators of DLs
- KDD problems
6. Supporting Conceptual Graphs: Building and storing CGs
Lexical restrictions are needed
- Standard way of building CG
- The sentences are marked with part-of-speech tags.
- Some titles and sentences from abstracts are filtered.
- The selected sentences are parsed, obtaining their syntactic tree.
- The syntactic tree is traversed and the canonical conceptual graphs related to its nodes are joined.
- DL contains scientific papers
- Only abstracts are transformed to CGs
- Semantic Role Labelling helps to create conceptual relations in CGs
http://framenet.icsi.berkeley.edu/
http://wordnet.princeton.edu/
7. Semantic Role Labelling for CGs Building
- The working of a genetic algorithm is usually explained by the search for superior building blocks.
- John is going to Boston by bus
http://l2r.cs.uiuc.edu/cogcomp/srl-demo.php
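
The step from labelled semantic roles to conceptual relations can be sketched as a simple lookup. The role names, the `ROLE_TO_RELATION` table and the hand-written frame below are hypothetical; a real labeller uses its own label set, and its output for this sentence may differ.

```python
# Hypothetical mapping from semantic role names to CG relation names.
ROLE_TO_RELATION = {
    "agent": "Agnt",
    "destination": "Dest",
    "instrument": "Inst",
    "theme": "Thme",
    "location": "Loc",
}


def frame_to_triples(frame):
    """Turn one predicate-argument frame into (predicate, relation, argument) triples."""
    triples = []
    for role, filler in frame["roles"].items():
        relation = ROLE_TO_RELATION.get(role)
        if relation is None:
            continue  # roles without a known CG relation are skipped
        triples.append((frame["predicate"], relation, filler))
    return triples


# Hand-written frame for "John is going to Boston by bus" (not labeller output).
frame = {
    "predicate": "Go",
    "roles": {"agent": "John", "destination": "Boston", "instrument": "Bus"},
}

triples = frame_to_triples(frame)
for head, relation, argument in triples:
    print(f"[{head}] -> ({relation}) -> [{argument}]")
```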
8. Conceptual Graphs in Some Text Mining Problems
1. Building Association Rules
- Set of CGs
- initial set
Generalization for concepts; disjoin for relations
- transactional set
- Set of generalized CGs
- Association Rule
- Association Rule on CGs
Supported by
Having Confidence as
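
A minimal sketch of support and confidence, assuming the standard association-rule definitions (the measures used in the talk may differ). Each transaction is assumed to be the set of generalized CG fragments found in one document; the string labels for fragments are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent


# Hypothetical transactional set: one set of generalized CG fragments per document.
transactions = [
    {"Go-Agnt-Person", "Go-Dest-City"},
    {"Go-Agnt-Person", "Go-Dest-City", "Go-Inst-Vehicle"},
    {"Go-Agnt-Person"},
    {"Go-Inst-Vehicle"},
]

rule_support = support(transactions, {"Go-Agnt-Person", "Go-Dest-City"})
rule_confidence = confidence(transactions, {"Go-Agnt-Person"}, {"Go-Dest-City"})
print(rule_support, rule_confidence)
```

Here the rule "Go-Agnt-Person implies Go-Dest-City" holds in two of four transactions, and in two of the three transactions that contain its antecedent.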
9. Conceptual Graphs in Some Text Mining Problems
2. Building Ontologies by Aggregation of CGs
Supporting Contexts
- with CGs
- with Corpora
In analyzing the ambiguities, Wittgenstein
developed his theory of language games, which
allow words to have different senses in
different contexts, applications, or modes of
use.
10. Solving Text Mining problems by CGs clustering
CGs Hierarchy
- CGs Contexts problem
- CGs Similarity problem
Clustering algorithm for specific similarity measures
11. Conceptual Graphs Clustering
Similarity Measures
Conceptual similarity
Relational similarity
Some modifications of similarity measures
Unified similarity measure
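
One simplified way to compute such measures is sketched below. It assumes Dice-style overlap coefficients over shared concepts and shared relations, and a unified measure of the form s = s_c * (a + b * s_r) with a + b = 1. These formulas, the weights and the toy graphs are assumptions for illustration and may differ from the measures shown in the talk.

```python
def dice(first, second):
    """Dice coefficient of two sets: 2 * |common| / (|first| + |second|)."""
    total = len(first) + len(second)
    return 2 * len(first & second) / total if total else 0.0


def conceptual_similarity(g1, g2):
    return dice(g1["concepts"], g2["concepts"])


def relational_similarity(g1, g2):
    return dice(g1["relations"], g2["relations"])


def unified_similarity(g1, g2, a=0.5, b=0.5):
    # Assumed combination: relations refine the concept overlap.
    return conceptual_similarity(g1, g2) * (a + b * relational_similarity(g1, g2))


def make_graph(vehicle):
    # Toy graph: John is going to Boston by <vehicle>.
    return {
        "concepts": {"Go", "Person:John", "City:Boston", vehicle},
        "relations": {
            ("Go", "Agnt", "Person:John"),
            ("Go", "Dest", "City:Boston"),
            ("Go", "Inst", vehicle),
        },
    }


by_bus, by_train = make_graph("Bus"), make_graph("Train")
s_c = conceptual_similarity(by_bus, by_train)
s_r = relational_similarity(by_bus, by_train)
s = unified_similarity(by_bus, by_train)
print(s_c, s_r, s)
```

The two toy graphs share three of four concepts and two of three relations, so the conceptual score is higher than the relational one.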
12. Genetic algorithm: specifics of the solutions
Ackley test function
Fitness function trajectories
Final population
Initial population
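
For reference, the Ackley test function in its commonly used form can be written as below. The parameter values a = 20, b = 0.2 and c = 2π are the usual defaults and are an assumption about the setting used for the plots.

```python
import math


def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Ackley function of a point x (a sequence of floats); global minimum 0 at the origin."""
    n = len(x)
    mean_square = sum(xi * xi for xi in x) / n
    mean_cos = sum(math.cos(c * xi) for xi in x) / n
    return -a * math.exp(-b * math.sqrt(mean_square)) - math.exp(mean_cos) + a + math.e


print(ackley([0.0, 0.0]))
print(ackley([1.0, 1.0]))
```

The function has many local minima around a single global minimum, which is why it is a common benchmark for comparing initial and final GA populations.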
13. Genetic algorithm for clustering
GA chromosomes representing the clustering for various encoding schemes for clustering [5]: (a) group number, (b) matrix, (c) permutation with the separator character 7, (d) greedy permutation, (e) order based.
Clusters: {X1, X3, X6}, {X2, X4, X5}
Our encoding scheme: the i-th gene holds the number of an object that is in the same cluster as the i-th object.
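
A minimal sketch of decoding such a chromosome into clusters, assuming 1-based object numbers and that linked objects are merged transitively. The function name `decode_chain` and the sample chromosome are hypothetical; the chromosome is chosen so that it reproduces the two clusters of the example.

```python
def decode_chain(chromosome):
    """chromosome[i] is the 1-based number of an object in the same cluster as object i + 1."""
    n = len(chromosome)
    parent = list(range(n))  # union-find forest over object indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge each object with the object its gene points to.
    for i, gene in enumerate(chromosome):
        root_i, root_j = find(i), find(gene - 1)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(f"X{i + 1}")
    return sorted(clusters.values())


chromosome = [3, 4, 6, 5, 2, 1]   # X1->X3, X2->X4, X3->X6, X4->X5, X5->X2, X6->X1
print(decode_chain(chromosome))
```

Under this reading, several different chromosomes decode to the same clustering, and a gene that points to its own position leaves the object as a singleton unless another gene points to it.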
14. Genetic algorithm for clustering
Chain encoding for Conceptual Graphs
- realizes implicit parallelism of genetic algorithms
- makes the clustering algorithm work faster
- is invariant under the similarity measure on CGs
Idea: the fitness function of the GA can be varied by varying its parameters
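
To illustrate how a clustering fitness can stay independent of the particular similarity measure, the sketch below scores a clustering from a precomputed similarity matrix only. The fitness definition (mean within-cluster similarity minus mean between-cluster similarity) and the matrix values are assumptions, not the fitness function used in the project.

```python
from itertools import combinations


def clustering_fitness(clusters, sim):
    """Mean within-cluster similarity minus mean between-cluster similarity.

    clusters: list of lists of object indices; sim: symmetric similarity matrix.
    """
    label = {obj: k for k, members in enumerate(clusters) for obj in members}
    within, between = [], []
    for i, j in combinations(sorted(label), 2):
        (within if label[i] == label[j] else between).append(sim[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(within) - mean(between)


# Hypothetical similarities between four conceptual graphs.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]

good = clustering_fitness([[0, 1], [2, 3]], sim)
bad = clustering_fitness([[0, 2], [1, 3]], sim)
print(good, bad)
```

Swapping in a different similarity measure only changes the matrix, not the encoding or the fitness code.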
15. EVO LIB Project: System architecture
[Figure: system architecture diagram]
16. Conceptual Graphs Clustering: Data Example for Clustering
- We assume that the modality (i.e., number of local optima) of a fitness landscape is related to the difficulty of finding the best point on that landscape by evolutionary computation (e.g., hillclimbers and genetic algorithms (GAs)).
- We first examine the limits of modality by constructing a unimodal function and a maximally multimodal function.
- At such extremes our intuition breaks down.
- A fitness landscape consisting entirely of a single hill leading to the global optimum proves to be hard for hillclimbers but apparently easy for GAs.
- A provably maximally multimodal function, in which half the points in the search space are local optima, can be easy for both hillclimbers and GAs.
- Exploring the more realistic intermediate range between the extremes of modality, we construct local optima with varying degrees of attraction to our evolutionary algorithms.
- Most work on optima and their basins of attraction has focused on hills and hillclimbers, while some research has explored attraction for the GA's crossover operator.
- We extend the latter results by defining and implementing maximal partial deception in problems with k arbitrarily placed global optima.
- This allows us to create functions with multiple local optima attractive to crossover.
- The resulting maximally deceptive function has several local optima, in addition to the global optima, each with various size basins of attraction for hillclimbers as well as attraction for GA crossover.
- This minimum distance function seems to be a powerful new tool for generalizing deception and relating hillclimbers (and Hamming space) to GAs and crossover.
- This paper describes an initial version of a library of sharable and reusable medical ontological theories, organized according to a proposed classification of ontologies.
17. Conceptual Graphs Clustering: clustering results
- applying conceptual nearness
- applying relational nearness
18. Summary
- Conceptual graphs are a promising tool for modelling the semantics of texts in DLs.
- The process of creating ontologies can be based on technologies that use conceptual graphs.
- Conceptual graph clustering helps in solving structural problems in DLs and in understanding their data.
- The evolutionary approach is promising in semantic modelling with conceptual graphs.
- Advancing CG technologies requires the joint efforts of computer specialists and linguists.
19. References
- A World of Conceptual Graphs. http://conceptualgraphs.org/
- Boytcheva, S., Dobrev, P., Angelova, G. CGExtract: Towards Extraction of Conceptual Graphs from Controlled English. Lecture Notes in Computer Science 2120, Springer, 2001.
- Southey, F., Linders, J. G. Notio - A Java API for Developing CG Tools. 7th International Conference on Conceptual Structures, 1999, pp. 262-271.
- Hirst, G. Ontology and the Lexicon. In: Handbook on Ontologies in Information Systems. Berlin: Springer, 2003.
- Cole, R. M. Clustering With Genetic Algorithms. http://citeseer.ist.psu.edu/cole98clustering.html
- Montes-y-Gomez, Gelbukh, Lopez-Lopez, Baeza-Yates. Text Mining at Detail Level Using Conceptual Graphs. Lecture Notes in Computer Science, Vol. 2393. Springer-Verlag, 2002, pp. 122-136.
- Sarbo, J. Formal conceptual structure in language. In: Dubois, D. M. (ed.), Proceedings of Computing Anticipatory Systems (CASYS'98), pp. 289-300, Woodbury, New York, 1999.
- Sowa, J. F. Conceptual Graphs: Draft Proposed American National Standard. International Conference on Conceptual Structures ICCS-99, Lecture Notes in Artificial Intelligence 1640, Springer, 1999.
- ????????? ?.?. , ????? ?.?. ???????????? ???????????? ?????????? ?????????????. - ???. ?????. ???. ??????????. ????????. ???????????. ??? 8, ???. 3 . ???????????. - ????, 2002. - ?. 101-107.
- Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press. Reprinted by MIT Press, 1992.
- ????????? ?.?. ????????? ??????? ??????. ???? ???????, 1981. 375 ?.
- ????????? ?.?., ???????? ?.?., ???????? ?.?. ?????? ? ???????? ????????????? ?????????????. ?. ?????????, 2003 - 432 ?.
- ????????? ?.?. ???????????? ????????? ???????? ??????, ?????????????, ??????????. ????, ?????, 2003. 152 ?.
- Bogatyrev, M. Modelling Systems With Symmetry. Proceedings of the 4th International IMACS Symposium of Mathematical Modelling, Vienna, Austria, February 5-7, 2003. ARGESIM-Verlag, Vienna, 2003, pp. 270-275.
- Bogatyrev, M., Latov, V., Avdeev, K. Symmetry Based Decomposition and its Application in Evolutionary Modelling. Applied Mathematica: Proc. of the 8th International Mathematica Symposium, Avignon, France, 19-23 June 2006.