Practical semantic web mining platform - PowerPoint PPT Presentation

About This Presentation
Title:

Practical semantic web mining platform

Description:

Using ML to automate the process. Learn annotation rule ... D., Fikes R., Rice J., and Wilder S. :An environment for merging and testing large ontologies. ... – PowerPoint PPT presentation

Slides: 34
Provided by: kegCsTsi
Transcript and Presenter's Notes

1
Practical semantic web mining platform
2
What is SWM?
  • SWM includes
  • Semantic Web and RDF
  • Regular Expressions, Web Agents
  • HMMs and Information Extraction
  • Rule Mining, F-Logic, Description Logic
  • Information Integration
  • Planning for Data Gathering
  • Ontologies, Learning, Editing
  • Text Classification
  • Applications: E-Commerce
  • Web services
  • Semantic Web Browser
  • etc

3
Some Background
4
(No Transcript)
5
Algorithm/theory of ML
  • Techniques of Machine Learning /Data Mining
  • Bayesian classification/NN/GA
  • Statistical technique
  • Active Learning, Multi-View Learning
  • Risk Minimization/Maximum Entropy Model

6
Annotation
  • Multiple Sources
  • Annotation tools
  • Using ML to automate the process
  • Learn annotation rule
  • Active-learning driven (reduces training samples)
  • Multi-view (improve performance)
  • Multi-view detection (improve again)

7
Mapping & Link
  • Mapping
  • Find mapping points
  • Find Complex mapping points (subof, superof,
    5(ab), even conjunct of, etc)
  • Translate instances based on Mapping
  • Link
  • Find Link Points
  • Find Complex Links
  • Integrate Ontology
  • Mapping/Link detection.

8
(No Transcript)
9
Mapping & Link
  • Multi-view
  • name
  • Instance
  • Relationship, etc
  • Active learning. Ask the user to label the most
    uncertain mapping/link
  • Multi-view detection. Improves the performance
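A minimal sketch of how the views and the active-learning query could fit together. It assumes two views only (name and instance), equal weights, and "most uncertain" meaning a combined score closest to 0.5; the concept records and every function name here are hypothetical, not part of the platform.

```python
from difflib import SequenceMatcher


def name_view(a, b):
    """Name view: string similarity of the two concept names, in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def instance_view(a, b):
    """Instance view: Jaccard overlap of the two concepts' instance sets."""
    x, y = set(a["instances"]), set(b["instances"])
    return len(x & y) / len(x | y) if x | y else 0.0


def score_pairs(source, target):
    """Average the two views for every candidate pair (equal weights assumed)."""
    scores = {}
    for a in source:
        for b in target:
            scores[(a["name"], b["name"])] = (name_view(a, b) + instance_view(a, b)) / 2
    return scores


def most_uncertain(scores):
    """Active-learning query: the pair whose score is closest to 0.5."""
    return min(scores, key=lambda pair: abs(scores[pair] - 0.5))


# Hypothetical toy concepts from two ontologies.
source = [{"name": "Employee", "instances": ["ann", "bob", "eve"]}]
target = [
    {"name": "Employee", "instances": ["ann", "bob", "eve"]},
    {"name": "Staff", "instances": ["ann", "bob", "joe"]},
    {"name": "Invoice", "instances": ["inv-1", "inv-2"]},
]

scores = score_pairs(source, target)
query = most_uncertain(scores)  # the pair to show to the user for labelling
```

Here the identical pair scores high and the unrelated pair scores low, so the user is asked only about the pair the views disagree on. A relationship view could be added as a third scoring function in the same way.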

10
Indexing
  • What is the difference between SI and Text
    indexing/XML indexing?
  • How to define the data structure of SI? (note
    that such a structure should represent the
    characteristics of the SW ontology)
  • How to make it efficient? (how does it compare to
    others' work? Is there existing work on it?)

11
Semantic Retrieval
  • Domain vs. General
  • Make use of SI Ontology to improve the
    performance.
  • Make use of reasoning technique to improve.
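One way an ontology could improve retrieval is subclass expansion: a query for a concept also matches documents annotated with its sub-concepts. The sketch below assumes a toy child-to-parent hierarchy and documents that are already annotated with concepts; all names and data are hypothetical.

```python
# Hypothetical toy ontology: child concept -> parent concept.
SUBCLASS_OF = {
    "java developer": "programmer",
    "python developer": "programmer",
    "programmer": "it professional",
}


def descendants(concept, subclass_of):
    """Return the concept plus every concept below it in the hierarchy."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def semantic_search(query, documents, subclass_of):
    """Match documents annotated with the query concept or any sub-concept."""
    wanted = descendants(query, subclass_of)
    return [doc_id for doc_id, concepts in documents.items() if wanted & set(concepts)]


# Hypothetical annotated documents.
documents = {
    "resume-1": ["java developer"],
    "resume-2": ["accountant"],
    "resume-3": ["programmer"],
}

hits = semantic_search("programmer", documents, SUBCLASS_OF)
```

A plain keyword match on "programmer" would return only resume-3; the expansion also returns resume-1 because "java developer" is a sub-concept.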

12
Reasoning
  • Reasoning rules learning
  • Example: Resumes, Jobs
  • How to find the most appropriate job for
    individual?
  • How to find the most appropriate person for
    specified job?
  • Define the rules: if Person.Age(x) < 30 then
    Job(y).Salary > 8000
  • Rule Discovery
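A small sketch of how such a rule could drive both questions (best job for a person, best person for a job). It assumes one reading of the rule: when the condition holds for the person, the consequence must hold for the job. The `Rule` class, field names, and sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    condition: Callable[[dict], bool]    # tested on the person
    consequence: Callable[[dict], bool]  # required of the job when the condition holds


# The rule from the slide: if Person.Age(x) < 30 then Job(y).Salary > 8000.
RULES = [Rule(lambda p: p["age"] < 30, lambda j: j["salary"] > 8000)]


def satisfies(person, job, rules):
    """A (person, job) pair is acceptable if no applicable rule is violated."""
    return all(r.consequence(job) for r in rules if r.condition(person))


def jobs_for(person, jobs, rules):
    """Jobs appropriate for one individual."""
    return [j["title"] for j in jobs if satisfies(person, j, rules)]


def persons_for(job, persons, rules):
    """Persons appropriate for one specified job."""
    return [p["name"] for p in persons if satisfies(p, job, rules)]


# Hypothetical sample data.
persons = [{"name": "Li", "age": 26}, {"name": "Wang", "age": 41}]
jobs = [{"title": "Engineer", "salary": 9000}, {"title": "Clerk", "salary": 5000}]
```

With this data, `jobs_for(persons[0], jobs, RULES)` keeps only the Engineer job, while the rule does not constrain the older candidate. Rule discovery would replace the hand-written `RULES` list with rules learned from data.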

13
Applications
  • Jobs & Resumes
  • E-Commerce. E.g. Travel, Tickets, etc.
  • Personal Assistant. Track one's work and interests
    to find new information automatically.
  • Semantic Web Browser

14
Free discussion for the platform
15
Aspects
  • Data
  • Content
  • What we will do, what we can do, and what we cannot.
  • Semantic web, semantic web services
  • Theory ->> may be the basis for SCI?
  • Practical application!!!! important
  • Proposal Schedule.

16
Data
  • Data preparation
  • Domain: job & resume, software (from SourceForge),
    travel web services.
  • Ontology: metadata & instance
  • Works
  • Metadata definition -> integrate an ontology editor
    (Protégé, OntoEdit, or Orient)
  • Instance database -> use annotation or IE
    techniques to extract information from specific
    web sites.
  • How to save? Use Jena to store the data in a
    database and query it with RQL? Indexing?
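To make the storage question concrete, here is a plain-Python stand-in for a triple store with pattern queries. It is not Jena's API and not RQL syntax; the class, method names, and sample triples are hypothetical and only illustrate the store-then-query idea.

```python
class TripleStore:
    """In-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


# Hypothetical metadata and instances for the job/resume domain.
store = TripleStore()
store.add("resume:1", "rdf:type", "Resume")
store.add("resume:1", "hasSkill", "Java")
store.add("job:7", "rdf:type", "Job")
store.add("job:7", "requiresSkill", "Java")

typed = store.match(predicate="rdf:type")
java_related = store.match(obj="Java")
```

Every query here scans all triples, which is exactly where the indexing question comes in: an index per position (subject, predicate, object) would avoid the full scan.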

17
Content
  • Ontology building, knowledge base building -> use
    WordNet to assist
  • Composition for web services. If not web
    services, what can we do, e.g. jobs & resumes?
  • Annotation: deep annotation. Web service
    annotation, text annotation, even image
    annotation.
  • Mapping: concept mapping, instance mapping ->
    translation, merge, meaning negotiation (mapping
    representation)
  • Data Integration. Combine annotation and mapping

18
Content
  • Semantic search engine. Its definition? Simple
    search is data search; then how to make use of
    the ontology? Reasoning?
  • How to make it practical, that is, how to do it
    in our domain. Should it be a general one or a
    domain-specific one?
  • Ontology summary (needs a better name): output
    the knowledge in an ontology by NLP.
  • Indexing?
  • Tools integration

19
Theory
  • ML, data mining.
  • Inductive learning: NN, Bayes, SVM, GA. Code them
    (or one of them) ourselves. It will cost us
    time, but that does not mean the time is wasted.
  • Transductive learning.
  • Selective learning.
  • More general theory: risk minimization. Note that
    RM is not an algorithm; it is a framework for ML.
    Any learning algorithm can be used as its
    implementation.
  • Active learning & multi-view
  • Reduce the samples of training.
  • Improve the precision.

20
Practical application
  • Jobs & resumes
  • Target: find the best-qualified
    resumes/persons for a specified job, or the
    best jobs for a person.
  • Software from SourceForge, etc.
  • Aim: software composition -> web service
    composition; software search

21
Practical application
  • more?

22
Proposal schedule
  • Why proposal?
  • Why schedule?
  • Can we work together for the possible platform?

23
Further Reading
24
Further reading on Semantic Annotation
  • A. Kiryakov, B. Popov, et al. Semantic
    Annotation, Indexing, and Retrieval. 2nd
    International Semantic Web Conference (ISWC2003),
    http://www.ontotext.com/publications/index.html#KiryakovEtAl2003
  • Alani, 2003 Alani, H., Kim, S., Millard, D.,
    Weal, M., Hall, W., Lewis, P. and Shadbolt, N.
    Automatic Ontology-Based Knowledge Extraction
    from Web Documents. IEEE Intelligent Systems
    18(1)pp. 14-21.
  • Benjamins, 2002 Richard Benjamins, Jesus
    Contreras. White Paper: Six Challenges for the
    Semantic Web. Intelligent Software Components.
    Intelligent software for the networked economy
    (isoco). April, 2002.
  • Berners-Lee, 1999 Tim Berners-Lee, Mark
    Fischetti (Contributor), Michael L. Dertouzos.
    Weaving the Web: The Original Design and
    Ultimate Destiny of the World Wide Web. 1999.
  • Califf, 1998 Califf M. E. (1998), Relational
    Learning Techniques for Natural Language
    Information Extraction, Ph.D. thesis, Univ.
    Texas, Austin, 1998
  • Ciravegna, 2001 Fabio Ciravegna. (LP)2, an
    adaptive algorithm for information extraction
    from web-related texts. In Proceedings of the
    IJCAI-2001 Workshop on Adaptive Text Extraction
    and Mining held in conjunction with 17th
    International Joint Conference on Artificial
    Intelligence (IJCAI), Seattle, USA, August 2001.

25
Further reading on Semantic Annotation
  • Cohen, 2001 W. Cohen, L. Jensen, A structured
    wrapper induction system for extracting
    information from semi-structured documents, in
    Proceedings of the Workshop on Adaptive Text
    Extraction and Mining (IJCAI01), 2001.
  • Cunningham, 2002 H. Cunningham, D. Maynard, K.
    Bontcheva, and V. Tablan. GATE: A Framework and
    Graphical Development Environment for Robust NLP
    Tools and Applications. In Proceedings of the
    40th Anniversary Meeting of the Association for
    Computational Linguistics, 2002.
  • Czejdo, 2000 B. Czejdo, J. Dinsmore, C. H.
    Hwang, R. Miller, M. Rusinkiewicz. Automatic
    Generation of Ontology Based Annotations in XML
    and Their Use in Retrieval Systems. Proceedings
    of the First International Conference on Web
    Information Systems Engineering (WISE'00)-Volume
    1. IEEE Computer Society Washington, DC, USA.
    2000. 296-300
  • Dhamankar, 2004 Robin Dhamankar, Yoonkyong Lee,
    AnHai Doan, Alon Halevy, Pedro Domingos. iMAP:
    Discovering Complex Semantic Matches between
    Database Schemas. SIGMOD 2004, June 13-18, 2004,
    Paris, France.

26
Further reading on Semantic Annotation
  • Dill, 2003 Stephen Dill, Nadav Eiron, David
    Gibson, Daniel Gruhl, R. Guha, Anant Jhingran,
    Tapas Kanungo, Kevin S. McCurley, Sridhar
    Rajagopalan, Andrew Tomkins, John A. Tomlin,
    Jason Y. Zien. A case for automated large-scale
    semantic annotation. Journal of Web Semantics:
    Science, Services and Agents on the World Wide
    Web. Published by Elsevier B.V. July 2003:
    115-132
  • Eriksson, 1999 H. Eriksson, R. Fergerson, Y.
    Shahar, and M. Musen. Automatic generation of
    ontology editors. In Proceedings of the 12th
    Banff Knowledge Acquisition Workshop, Banff
    Alberta, Canada, 1999.
  • Handschuh, 2002 S. Handschuh, S. Staab, F.
    Ciravegna, S-CREAM: semi-automatic creation of
    metadata, in Proceedings of the 13th
    International Conference on Knowledge Engineering
    and Management (EKAW 2002), Siguenza, Spain,
    2002, pp. 358-372.
  • Heflin, 2000 J. Heflin, J. Hendler, Searching
    the web with SHOE, in AAAI-2000 Workshop on AI
    for Web Search, Austin, Texas, 2000.
  • Kahan, 2001 J. Kahan, M.-R. Koivunen, Annotea:
    an open RDF infrastructure for shared web
    annotations, in World Wide Web, 2001, pp.
    623-632.

27
Further reading on Semantic Annotation
  • Kogut, 2001 P. Kogut, W. Holmes, AeroDAML:
    applying information extraction to generate DAML
    annotations from web pages, 2001.
  • Kushmerick, 1997 N. Kushmerick, D.S. Weld, R.B.
    Doorenbos, Wrapper induction for information
    extraction, in Proceedings of the International
    Joint Conference on Artificial Intelligence
    (IJCAI), 1997, Nagoya, Japan, pp. 729-737.
  • Leonard, 2001 T. Leonard, H. Glaser, Large
    scale acquisition and maintenance from the web
    without source access,
    http://www.semannot2001.aifb.uni-karlsruhe.de/positionpapers/Leonard.pdf,
    2001.
  • Lerman, 2001 K. Lerman, C. Knoblock, S. Minton,
    Automatic data extraction from lists and tables
    in web sources, in IJCAI-2001 Workshop on
    Adaptive Text Extraction and Mining, Seattle, WA,
    August 2001.
  • Li, 2001 L.Z. Jianming Li, Y. Yu, Learning to
    generate semantic annotation for domain specific
    sentences, in Knowledge Markup and Semantic
    Annotation Workshop in K-CAP 2001, Victoria, BC,
    2001.
  • Popov, 2003 Borislav Popov, Atanas Kiryakov,
    Dimitar Manov, Angel Kirilov, Damyan Ognyanoff,
    and Miroslav Goranov. Towards Semantic Web
    Information Extraction. In ISWC'03 Workshop on
    Human Language Technology for the Semantic Web
    and Web Services, 2003, pp. 1-21

28
Further reading on Semantic Annotation
  • Schaffer, 1993 Selecting a classification
    method by cross-validation. Machine Learning,
    13(1): 135-143
  • Soderland, 1999 Soderland, S. Learning
    information extraction rules for semi-structured
    and free text. Machine Learning. 1999,1. 1-44
  • Soo, 2003 Von-Wun Soo, Chen-Yu Lee, Chung-Cheng
    Li, Shu Lei Chen and Ching-chih Chen. Automated
    Semantic Annotation and Retrieval Based on
    Sharable Ontology and Case-based Learning
    Techniques. Proceedings of the 2003 Joint
    Conference on Digital Libraries. 2003 IEEE.
  • Vargas-Vera, 2001 M. Vargas-Vera, E. Motta, J.
    Domingue, S. Buckingham Shum, and M. Lanzoni.
    Knowledge Extraction by using an Ontology-based
    Annotation Tool. In K-CAP 2001 workshop on
    Knowledge Markup and Semantic Annotation,
    Victoria, BC, Canada, October 2001.
  • Vargas-Vera, 2002 M. Vargas-Vera, E. Motta, J.
    Domingue, M. Lanzoni, A. Stutt, F. Ciravegna,
    MnM: ontology driven semiautomatic and automatic
    support for semantic markup, in Proceedings of
    the 13th International Conference on Knowledge
    Engineering and Management (EKAW 2002), Siguenza,
    Spain, 2002.

29
Further reading on Ontology Mapping
  • 1 Berger, J. Statistical decision theory and
    Bayesian analysis. Springer-Verlag. 1985
  • 2 Calvanese, D. De Giacomo, G. and Lenzerini,
    M. 2002. A framework for ontology integration. In
    Cruz, I. Decker, S. Euzenat, J. and
    McGuinness, D., eds., The Emerging Semantic Web.
    IOS Press. 201-214.
  • 3 H. Cunningham, D. Maynard, K. Bontcheva, and
    V. Tablan. GATE: A Framework and Graphical
    Development Environment for Robust NLP Tools and
    Applications. In Proceedings of the 40th
    Anniversary Meeting of the Association for
    Computational Linguistics, 2002.
  • 4 Robin Dhamankar, Yoonkyong Lee, AnHai Doan,
    et al. iMAP: Discovering Complex Semantic Matches
    between Database Schemas. Proceedings of the 2004
    ACM SIGMOD International Conference on Management
    of Data, 2004. Paris, France. ACM Press.
  • 5 H. Do and E. Rahm. COMA: A system for
    flexible combination of schema matching
    approaches. In Proc. of VLDB-2002.
  • 6 Doan, A.H., P. Domingos, A. Halevy.
    Reconciling Schemas of Disparate Data Sources: A
    Machine-Learning Approach. SIGMOD 2001.
  • 7 A. Doan, J. Madhavan, P. Domingos, and A.
    Halevy. Learning to map between ontologies on the
    semantic web. In Proceedings of the World-Wide
    Web Conference (WWW-2002), pages 662-673. ACM
    Press, 2002.

30
Further reading on Ontology Mapping
  • 8 J. Kang and J. Naughton. On schema matching
    with opaque column names and data values. In
    Proc. of SIGMOD-2003.
  • 9 W. Kim and J. Seo. Classifying schematic and
    data heterogeneity in multidatabase systems. IEEE
    Computer, 1991, 24(12): 12-18
  • 10 J. Madhavan, P. Bernstein, and E. Rahm.
    Generic schema matching with Cupid. In Proc. of
    VLDB-2001.
  • 11 A. Maedche, B. Motik, N. Silva and R. Volz.
    MAFRA: An Ontology MApping FRAmework in the
    Context of the Semantic Web. In Proceedings of the
    EKAW'2002, Siguenza, Spain. 2002.
  • 12 Alexander Maedche, Steffen Staab. Ontology
    Learning for the Semantic Web. IEEE Intelligent
    Systems 16(2): 72-79 (2001)
  • 13 Jayant Madhavan, Philip Bernstein, Kuang
    Chen, Alon Halevy, and Pradeep Shenoy. Corpus
    based schema matching. In Proc. of the IJCAI-03
    Workshop on Information Integration on the Web
    (IIWeb-03), 2003.
  • 14 McGuinness D., Fikes R., Rice J., and Wilder
    S. An environment for merging and testing large
    ontologies. Proceedings of the 7th International
    Conference on Principles of Knowledge
    Representation and Reasoning. Colorado, USA.

31
Further reading on Ontology Mapping
  • 15 S. Melnik, H. Garcia-Molina, and E. Rahm.
    Similarity flooding: a versatile graph matching
    algorithm. In Proc. of ICDE-2002.
  • 16 N. F. Noy and M. A. Musen. PROMPT: Algorithm
    and Tool for Automated Ontology Merging and
    Alignment. In Proc. of AAAI-2000, pages 450-455,
    2000.
  • 17 Nuno Silva and Joao Rocha. Semantic Web
    Complex Ontology Mapping. IEEE/WIC International
    Conference on Web Intelligence (WI'03), October
    13-17, 2003, Halifax, Canada: 82-100
  • 18 Omelayenko, B. RDFT: A Mapping Meta-Ontology
    for Business Integration. Workshop on Knowledge
    Transformation for the Semantic Web (KTSW 2002)
    at ECAI'2002. Lyon, France, 2002: 76-83
  • 19 Palopoli, L., G. Terracina, D. Ursino. The
    System DIKE: Towards the Semi-Automatic Synthesis
    of Cooperative Information Systems and Data
    Warehouses. ADBIS-DASFAA 2000, 108-117
  • 20 Park, J. Y., Gennari, J. H. and Musen, M.
    A. "Mappings for Reuse in Knowledge-based
    Systems" 11th Workshop on Knowledge Acquisition,
    Modelling and Management (KAW 98) Banff, Canada
    1998.
  • 21 Patrick Pantel, Dekang Lin. Discovering Word
    Senses from Text. In Proceedings of ACM SIGKDD
    Conference on Knowledge Discovery and Data Mining,
    2002: 613-619.

32
Further reading on Ontology Mapping
  • 22 Richard Benjamins, Jesús Contreras. White
    Paper: Six Challenges for the Semantic Web.
    Intelligent Software Components. Intelligent
    software for the networked economy (isoco).
    April, 2002.
  • 23 E. Rahm and P. A. Bernstein. A survey of
    approaches to automatic schema matching. The VLDB
    Journal, 10: 334-350, 2001.
  • 24 Tim Berners-Lee, Mark Fischetti
    (Contributor), Michael L. Dertouzos. "Weaving the
    Web: The Original Design and Ultimate Destiny of
    the World Wide Web" 1999.
  • 25 K. M. Ting and I. H. Witten. Issues in
    stacked generalization. Journal of Artificial
    Intelligence Research, 10: 271-289, 1999.
  • 26 Wache, H. Voegele, T. Visser, U.
    Stuckenschmidt, H. Schuster, G. Neumann, H. and
    Huebner, S. 2001. Ontology-based integration of
    information - a survey of existing approaches. In
    Proc. of IJCAI 2001 Workshop on Ontologies and
    Information Sharing.
  • 27 Wiesman, F., Roos, N., and Vogt, P. (2001).
    Automatic ontology mapping for agent
    communication. Technical report.
  • 28 L. Xu and D. Embley. Using domain ontologies
    to discover direct and indirect matches for
    schema elements. In Proc. of the Semantic
    Integration Workshop at ISWC-2003.

33
Further Reading on Machine Learning
  • Muslea. Multi-view plus active learning. (thesis)
  • Tom M. Mitchell. Machine Learning.
  • Richard O. Duda. Pattern Classification. (Second
    Edition)
  • Zhai-Xiang Chen. Risk Minimization based
    Information Retrieval. (thesis)
  • Wrapper Induction. Several theses: RAPIER, etc.
  • Data Mining. Han,