Title: Solving Some Text Mining Problems with Conceptual Graphs


1
Solving Some Text Mining Problems with Conceptual
Graphs
Tula State University, Faculty of Cybernetics,
Laboratory of Information Systems
  • M. Bogatyrev, V. Tuhtin

2008
2
The Nature of Text Mining
Data mining: "the nontrivial extraction of
implicit, previously unknown, and potentially
useful information from data" (1); "the science of
extracting useful information from large data
sets or databases" (2).
Text mining is interdisciplinary
  • Text mining (also called text data mining or
    text analytics)
  • the process of deriving high-quality
    information from text
  • draws on information retrieval, machine
    learning, statistics, and computational
    linguistics

  1. W. Frawley, G. Piatetsky-Shapiro, C. Matheus
    (Fall 1992). Knowledge Discovery in Databases:
    An Overview. AI Magazine, pp. 213-228.
  2. D. Hand, H. Mannila, P. Smyth (2001).
    Principles of Data Mining. MIT Press, Cambridge,
    MA.

3
Computational (Corpora) Linguistics
Text Mining
Natural Language Processing
Knowledge Discovery
Global Problems
  • Analysis of
  • syntax
  • grammar
  • morphology
  • semantics
  • text categorization,
  • text clustering,
  • concept/entity extraction,
  • sentiment analysis,
  • document summarization

Problems
  • annotation
  • abstraction
  • ontologies
  • semantic roles
  • Objects of tagging
  • clusters,
  • trends,
  • associations,
  • deviations

Processing objects
  • Knowledge Models
  • rules
  • ontologies

Metadata
  • Corpora
  • large and structured text
  • tagging

Data: Plain text
4
Conceptual Graph
Example: John is going to Boston by bus
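The graph on this slide is an image and is not preserved in the transcript. As a minimal sketch, the sentence can be modelled with the concept and relation labels usually used for this textbook example (Go, Person, City, Bus; Agnt, Dest, Inst). The class names `Concept` and `ConceptualGraph` are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A concept node: a type label plus an optional referent."""
    type: str
    referent: str = ""  # empty referent means a generic concept

    def __str__(self):
        if self.referent:
            return f"[{self.type}: {self.referent}]"
        return f"[{self.type}]"


@dataclass
class ConceptualGraph:
    """A graph stored as a list of (relation, source, target) triples."""
    relations: list = field(default_factory=list)

    def add(self, name, source, target):
        self.relations.append((name, source, target))

    def concepts(self):
        # Collect distinct concept nodes in order of first appearance.
        seen = []
        for _, source, target in self.relations:
            for concept in (source, target):
                if concept not in seen:
                    seen.append(concept)
        return seen

    def linear_form(self):
        # One line per relation, in a simple linear notation.
        return "\n".join(
            f"{source} -> ({name}) -> {target}"
            for name, source, target in self.relations
        )


go = Concept("Go")
john = Concept("Person", "John")
boston = Concept("City", "Boston")
bus = Concept("Bus")

graph = ConceptualGraph()
graph.add("Agnt", go, john)    # John is the agent of going
graph.add("Dest", go, boston)  # Boston is the destination
graph.add("Inst", go, bus)     # the bus is the instrument

print(graph.linear_form())
```

Storing the graph as relation triples keeps the sketch small; a real CG tool would also need a type hierarchy for concepts.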
5
Conceptual Graphs in Digital Libraries
  • Supporting CGs in Digital Libraries
  • Building and storing CGs
  • Automated building of CGs
  • Organizing access to CGs in Datastore
  • Solving applied problems with CGs
  • Automated building and developing catalogues and
    rubricators of DLs
  • KDD problems

6
Supporting Conceptual Graphs: Building and
storing CGs
Lexical restrictions are needed
  • Standard way of building a CG
  • The sentences are marked with part-of-speech
    tags.
  • Some titles and sentences from abstracts are
    filtered.
  • The selected sentences are parsed to obtain
    their syntactic trees.
  • The syntactic tree is traversed and the
    canonical conceptual graphs related to its nodes
    are joined.
  1. DL contains scientific papers
  2. Only abstracts are transformed to CGs
  • Semantic Role Labelling helps to create
    conceptual relations in CGs

http://framenet.icsi.berkeley.edu/
http://wordnet.princeton.edu/
7
Semantic Role Labelling for CGs Building
  • The working of a genetic algorithm is usually
    explained by the search for superior building
    blocks.

John is going to Boston by bus
http://l2r.cs.uiuc.edu/cogcomp/srl-demo.php
8
Conceptual Graphs in Some Text Mining Problems
1. Building Association Rules
- Set of CGs
- initial set
Generalization for concepts; disjoin for relations
- transactional set
  • subsets represented
  • in T

- Set of generalized CGs
- Association Rule
- Association Rule on CGs
Supported by
Having Confidence as
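The support and confidence formulas on this slide are not preserved in the transcript. As a hedged sketch, the standard definitions from association rule mining can be applied to a transactional set in which each transaction is the set of generalized concept labels of one CG. The function names and the toy transactions below are illustrative assumptions, not the authors' data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical transactional set: one set of generalized labels per CG.
transactions = [
    frozenset({"Person", "Go", "City"}),
    frozenset({"Person", "Go", "Vehicle"}),
    frozenset({"Person", "Go", "City", "Vehicle"}),
    frozenset({"Person", "Read", "Book"}),
]

rule_support = support(transactions, {"Go", "City"})        # 2 of 4 graphs
rule_confidence = confidence(transactions, {"Go"}, {"City"})  # 2 of 3 graphs with Go
print(rule_support, rule_confidence)
```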
9
Conceptual Graphs in Some Text Mining Problems
2. Building Ontologies by Aggregation of CGs
Supporting Contexts
- with CGs
- with Corpora
In analyzing the ambiguities, Wittgenstein
developed his theory of language games, which
allow words to have different senses in
different contexts, applications, or modes of
use.
10
Solving Text Mining problems by CGs clustering
CGs Hierarchy
  • CGs Contexts problem
  • CGs Similarity problem

Clustering algorithm for specific similarity
measures
11
Conceptual Graphs Clustering
Similarity Measures
Conceptual similarity
Relational similarity
Some modifications of similarity measures
Unified similarity measure
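The formulas for these measures are not preserved in the transcript. One common choice in the CG comparison literature is a Dice-style overlap computed separately on concepts and on relations, then combined into a single score. The sketch below assumes that form, represents a graph as a set of (relation, source, target) triples, and uses illustrative weights `a` and `b`; none of this is confirmed as the measure used in the presentation.

```python
def dice(first, second):
    """Dice coefficient of two collections: 2|A and B| / (|A| + |B|)."""
    first, second = set(first), set(second)
    if not first and not second:
        return 0.0
    return 2 * len(first & second) / (len(first) + len(second))


def concepts(graph):
    """All concept labels that occur in a set of relation triples."""
    return {node for _, source, target in graph for node in (source, target)}


def conceptual_similarity(g1, g2):
    return dice(concepts(g1), concepts(g2))


def relational_similarity(g1, g2):
    # Two relations match only if name, source and target all agree.
    return dice(g1, g2)


def unified_similarity(g1, g2, a=0.5, b=0.5):
    # Assumed combination: conceptual overlap scaled by relational overlap.
    return conceptual_similarity(g1, g2) * (a + b * relational_similarity(g1, g2))


g1 = {("Agnt", "Go", "John"), ("Dest", "Go", "Boston"), ("Inst", "Go", "Bus")}
g2 = {("Agnt", "Go", "Mary"), ("Dest", "Go", "Boston"), ("Inst", "Go", "Train")}

print(conceptual_similarity(g1, g2))
print(relational_similarity(g1, g2))
print(unified_similarity(g1, g2))
```

With this combination, two graphs that share no concepts score zero regardless of their relations, which is one reason to multiply rather than add the two parts.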
12
Genetic algorithm: speciality of decisions
Ackley test function
Fitness function trajectories
Final population
Initial population
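For reference, the Ackley test function in its usual form, with the conventional constants a = 20, b = 0.2 and c = 2π, can be sketched as follows. Whether the slide used exactly these constants is an assumption. The function has a single global minimum of 0 at the origin surrounded by many local minima, which makes it a common benchmark for genetic algorithms.

```python
import math


def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Ackley function for a point x given as a sequence of floats."""
    n = len(x)
    mean_square = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(c * v) for v in x) / n
    return (
        -a * math.exp(-b * math.sqrt(mean_square))
        - math.exp(mean_cos)
        + a
        + math.e
    )


print(ackley([0.0, 0.0]))  # global minimum, approximately 0
print(ackley([1.0, 1.0]))  # a point away from the origin
```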
13
Genetic algorithm for clustering
GA chromosomes representing the clustering for
various encoding schemes for clustering [5]: (a)
group number; (b) matrix; (c) permutation with
the separator character 7; (d) greedy
permutation; (e) order based.
Clusters: {X1, X3, X6}, {X2, X4, X5}
Our encoding scheme: the i-th gene holds
the number of an object which is in the same
cluster as the i-th object
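A minimal sketch of how such a chromosome could be decoded, assuming gene i stores the 0-based index of some object in the same cluster as object i. The clusters are then the connected components of the links i to chromosome[i], found here with a small union-find. The function name `decode_chain` and the example chromosomes are hypothetical.

```python
def decode_chain(chromosome):
    """Turn a chain-encoded chromosome into a sorted list of clusters.

    chromosome[i] is the 0-based index of an object in the same cluster
    as object i. Objects are reported with 1-based numbers.
    """
    n = len(chromosome)
    parent = list(range(n))

    def find(i):
        # Find the component root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(chromosome):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i + 1)
    return sorted(clusters.values())


# Six objects; both chromosomes describe clusters {1, 3, 6} and {2, 4, 5}.
chromosome_a = [2, 3, 5, 4, 1, 0]
chromosome_b = [5, 4, 0, 1, 3, 2]

print(decode_chain(chromosome_a))
print(decode_chain(chromosome_b))
```

Several different chromosomes decode to the same clustering, so the encoding is redundant; in exchange, every chromosome is valid and the number of clusters does not have to be fixed in advance.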
14
Genetic algorithm for clustering
Chain encoding for Conceptual Graphs
  • realizes implicit parallelism of genetic
    algorithms
  • forces clustering algorithm to work faster
  • is invariant under similarity measure on CGs

Idea: the fitness function of the GA can be
varied by varying its parameters
15
EVO LIB Project: System architecture
16
Conceptual Graphs Clustering: Data Example for
Clustering
  • We assume that the modality (i.e., number of
    local optima) of a fitness landscape is related
    to the difficulty of finding the best point on
    that landscape by evolutionary computation (e.g.,
    hillclimbers and genetic algorithms (GAs)).
  • We first examine the limits of modality by
    constructing a unimodal function and a maximally
    multimodal function.
  • At such extremes our intuition breaks down.
  • A fitness landscape consisting entirely of a
    single hill leading to the global optimum proves
    to be hard for hillclimbers but apparently easy
    for GAs.
  • A provably maximally multimodal function, in
    which half the points in the search space are
    local optima, can be easy for both hillclimbers
    and GAs.
  • Exploring the more realistic intermediate range
    between the extremes of modality, we construct
    local optima with varying degrees of attraction
    to our evolutionary algorithms.
  • Most work on optima and their basins of
    attraction has focused on hills and hillclimbers,
    while some research has explored attraction for
    the GA's crossover operator.
  • We extend the latter results by defining and
    implementing maximal partial deception in
    problems with k arbitrarily placed global optima.
  • This allows us to create functions with multiple
    local optima attractive to crossover.
  • The resulting maximally deceptive function has
    several local optima, in addition to the global
    optima, each with various size basins of
    attraction for hillclimbers as well as attraction
    for GA crossover.
  • This minimum distance function seems to be a
    powerful new tool for generalizing deception and
    relating hillclimbers (and Hamming space) to GAs
    and crossover.
  • This paper describes an initial version of a
    library of sharable and reusable medical
    ontological theories, organized according to a
    proposed classification of ontologies.

17
Conceptual Graphs Clustering: clustering results
- applying conceptual nearness
- applying relational nearness
18
Summary
  1. Conceptual graphs are a promising tool for
    modelling the semantics of texts in DLs.
  2. A process of creating ontologies can be based on
    technologies which use conceptual graphs.
  3. Conceptual graphs clustering helps in solving
    structural problems in DLs and in understanding
    their data.
  4. The evolutionary approach is promising in
    semantic modelling with conceptual graphs.
  5. Advancing CG technologies requires the joint
    efforts of computer specialists and linguists.

19
References
  1. A World of Conceptual Graphs.
    http://conceptualgraphs.org/
  2. Boytcheva, S., Dobrev, P., Angelova, G. CGExtract:
    Towards Extraction of Conceptual Graphs from
    Controlled English. Lecture Notes in Computer
    Science, Vol. 2120, Springer, 2001.
  3. F. Southey, J. G. Linders. Notio - A Java API for
    Developing CG Tools. 7th International Conference
    on Conceptual Structures, 1999. Pp. 262-271.
  4. Hirst, G. Ontology and the Lexicon. In: Handbook
    on Ontologies in Information Systems. Berlin:
    Springer, 2003.
  5. Cole, R. M. Clustering With Genetic Algorithms.
    http://citeseer.ist.psu.edu/cole98clustering.html
  6. Montes-y-Gomez, Gelbukh, Lopez-Lopez,
    Baeza-Yates. Text Mining at Detail Level Using
    Conceptual Graphs. Lecture Notes in Computer
    Science, Vol. 2393. Springer-Verlag, 2002.
    Pp. 122-136.
  7. Sarbo, J. Formal conceptual structure in
    language. In Dubois, D. M., editor, Proceedings
    of Computing Anticipatory Systems (CASYS'98),
    pp. 289-300, Woodbury, New York, 1999.
  8. Sowa, J. F. Conceptual Graphs: Draft Proposed
    American National Standard. International
    Conference on Conceptual Structures ICCS-99,
    Lecture Notes in Artificial Intelligence 1640,
    Springer, 1999.
  9. ????????? ?.?. , ????? ?.?. ????????????
    ???????????? ?????????? ?????????????. - ???.
    ?????. ???. ??????????. ????????. ???????????.
    ??? 8, ???. 3 . ???????????. - ????, 2002. - ?.
    101- 107.
  10. Holland, J. H. Adaptation in Natural and
    Artificial Systems. Ann Arbor: The University of
    Michigan Press. Reprinted by MIT, 1992.
  11. ????????? ?.?. ????????? ??????? ??????. ????
    ???????, 1981. 375 ?.
  12. ????????? ?.?., ???????? ?.?., ???????? ?.?.
    ?????? ? ???????? ????????????? ?????????????.
    ?. ?????????, 2003 - 432 ?.
  13. ????????? ?.?. ???????????? ????????? ????????
    ??????, ?????????????, ??????????. ????, ?????,
    2003. 152 ?.
  14. M. Bogatyrev. Modelling Systems With Symmetry.
    Proceedings of the 4th International IMACS
    Symposium of Mathematical Modelling. Vienna,
    Austria, February 5-7, 2003. ARGESIM-Verlag,
    Vienna, 2003. Pp. 270-275.
  15. M. Bogatyrev, V. Latov, K. Avdeev. Symmetry
    Based Decomposition and its Application in
    Evolutionary Modelling. Applied Mathematica:
    Proc. of the 8th International Mathematica
    Symposium. Avignon, France, 19-23 June 2006.