Title: Practical semantic web mining platform
What is SWM?
- SWM includes:
- Semantic Web and RDF
- Regular Expressions, Web Agents
- HMMs and Information Extraction
- Rule Mining, F-Logic, Description Logic
- Information Integration
- Planning for Data Gathering
- Ontologies, Learning, Editing
- Text Classification
- Applications: E-Commerce
- Web services
- Semantic Web Browser
- etc.
Some Background
Algorithms/Theory of ML
- Techniques of Machine Learning/Data Mining
- Bayesian classification, NN, GA
- Statistical techniques
- Active Learning, Multi-View Learning
- Risk Minimization, Maximum Entropy Model
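Risk minimization is listed above as a theoretical tool. For reference, the textbook (empirical) risk minimization objective is written out below; the symbols (loss $L$, data distribution $P$, hypothesis class $\mathcal{F}$, sample size $n$) are standard notation, not taken from these slides.

```latex
R(f) = \mathbb{E}_{(x,y)\sim P}\big[L(y, f(x))\big]
\qquad
\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i, f(x_i)\big)
\qquad
\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)
```

Choosing the loss is what turns the framework into a concrete learner: with the log loss $L = -\log p_f(y \mid x)$ over a log-linear model it is maximum-likelihood training of a maximum entropy model, and with the hinge loss plus a norm penalty on $f$ it becomes the SVM objective.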
Annotation
- Multiple sources
- Annotation tools
- Using ML to automate the process
- Learn annotation rules
- Active-learning driven (reduce the number of training samples)
- Multi-view (improve performance)
- Multi-view detection (improve performance further)
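The "active-learning driven" step can be illustrated with uncertainty sampling, one common query strategy (the slides do not say which strategy is intended). In this minimal sketch, `select_queries` and the probability scores are hypothetical: some annotation model is assumed to have already produced, for each unlabeled item, the probability that it belongs to the target class, and the user is asked to label only the items the model is least sure about.

```python
from typing import Dict, List


def select_queries(probs: Dict[str, float], k: int) -> List[str]:
    """Return the k item ids whose predicted probability is closest to 0.5.

    probs maps an unlabeled item id to the model's estimated probability
    that the item should receive the target annotation. Items near 0.5 are
    the most uncertain, so labelling them is expected to help the model
    most per manual annotation (the idea behind reducing training samples).
    """
    # Rank by distance from the 0.5 decision boundary; break ties by id
    ranked = sorted(probs, key=lambda item: (abs(probs[item] - 0.5), item))
    return ranked[:k]


if __name__ == "__main__":
    scores = {"s1": 0.97, "s2": 0.52, "s3": 0.10, "s4": 0.45, "s5": 0.80}
    print(select_queries(scores, 2))  # ['s2', 's4']
```

In a full loop the newly labelled items would be added to the training set, the model retrained and the remaining pool rescored; a multi-view variant would instead query the items on which separately trained views disagree.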
Mapping and Linking
- Mapping
- Find mapping points
- Find complex mapping points (sub-of, super-of, 5(ab), even conjunction-of, etc.)
- Translate instances based on the mapping
- Link
- Find link points
- Find complex links
- Integrate ontologies
- Mapping/link detection
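Once mapping points are known, translating instances is mostly mechanical. The sketch below is hypothetical (the function name, the data layout and the `Resume`/`CV` vocabulary are illustrations, not the platform's design). It handles simple one-to-one concept and attribute mappings, plus one kind of complex mapping in which a target attribute is computed from several source attributes.

```python
from typing import Callable, Dict, List, Tuple

# An instance is (concept name, {attribute: value}); names are illustrative.
Instance = Tuple[str, Dict[str, str]]
ComplexRule = Callable[[Dict[str, str]], str]


def translate(instances: List[Instance],
              concept_map: Dict[str, str],
              attr_map: Dict[str, str],
              complex_map: Dict[str, Dict[str, ComplexRule]]) -> List[Instance]:
    """Rewrite source-ontology instances in target-ontology terms.

    concept_map: source concept -> target concept (simple 1:1 mapping points)
    attr_map:    source attribute -> target attribute (simple 1:1 mapping points)
    complex_map: source concept -> {target attribute: function computing its
                 value from several source attributes} (complex mappings)
    Instances of unmapped concepts are skipped; unmapped attributes are dropped.
    """
    result = []
    for concept, attrs in instances:
        target = concept_map.get(concept)
        if target is None:
            continue  # no mapping point for this concept
        new_attrs = {attr_map[a]: v for a, v in attrs.items() if a in attr_map}
        for target_attr, rule in complex_map.get(concept, {}).items():
            new_attrs[target_attr] = rule(attrs)
        result.append((target, new_attrs))
    return result


if __name__ == "__main__":
    source = [
        ("Resume", {"firstName": "Li", "lastName": "Wang", "mail": "li@example.org"}),
        ("Memo", {"text": "unrelated"}),
    ]
    complex_rules = {"Resume": {"fullName": lambda a: a["firstName"] + " " + a["lastName"]}}
    print(translate(source, {"Resume": "CV"}, {"mail": "email"}, complex_rules))
    # [('CV', {'email': 'li@example.org', 'fullName': 'Li Wang'})]
```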
Mapping and Linking (continued)
- Multi-view
- Name
- Instance
- Relationship, etc.
- Active learning: ask the user to specify the most ambiguous mapping/link
- Multi-view detection: improve the performance
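A minimal sketch of multi-view mapping detection with an active-learning query, assuming just two of the views listed above: a name view (label string similarity via Python's `difflib.SequenceMatcher`) and an instance view (Jaccard overlap of instance values). The concept names and data are invented for illustration. The pair on which the views disagree most is the one worth showing to the user.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, float, float]  # (source concept, target concept, name score, instance score)


def name_view(a: str, b: str) -> float:
    """Name view: similarity ratio of the two concept labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_view(xs: Set[str], ys: Set[str]) -> float:
    """Instance view: Jaccard overlap of the two concepts' instance values."""
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0


def score_pairs(src: Dict[str, Set[str]], tgt: Dict[str, Set[str]]) -> List[Row]:
    """Score every (source, target) concept pair under both views."""
    return [(s, t, name_view(s, t), instance_view(si, ti))
            for s, si in src.items() for t, ti in tgt.items()]


def most_confused(rows: List[Row]) -> Row:
    """Active-learning query: the pair on which the two views disagree most."""
    return max(rows, key=lambda r: abs(r[2] - r[3]))


if __name__ == "__main__":
    src = {"Salary": {"5000", "8000"}, "JobTitle": {"engineer", "manager"}}
    tgt = {"Wage": {"5000", "8000", "9000"}, "Position": {"engineer", "clerk"}}
    s, t, by_name, by_inst = most_confused(score_pairs(src, tgt))
    print(s, t, round(by_name, 2), round(by_inst, 2))  # Salary Wage 0.2 0.67
```

Here the labels "Salary" and "Wage" look unrelated while their instances largely coincide, which is exactly the kind of case a user can settle quickly. Pairs on which both views score high could be accepted automatically, and a relationship view would add one more scoring function per pair.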
Indexing
- What is the difference between SI and text indexing/XML indexing?
- How to define the data structure of SI? (Note that such a structure should represent the characteristics of the SW ontology.)
- How to make it efficient? (How does it compare to others' work? Is there existing work on it?)
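One possible answer to the data-structure question, offered only as a hypothetical sketch: unlike a text index, whose postings are keyed by words, a semantic index can key postings by ontology terms (concept -> instances and (property, value) -> instances) and post each instance under all superclasses of its type, so that queries on general concepts need no reasoning at query time. `SemanticIndex`, the triple layout and the example vocabulary are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class SemanticIndex:
    """Toy semantic index keyed by ontology terms rather than words."""

    def __init__(self, subclass_of: Dict[str, str]):
        self.subclass_of = subclass_of  # child concept -> parent concept
        self.by_concept: Dict[str, Set[str]] = defaultdict(set)
        self.by_property: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def _ancestors(self, concept: str) -> List[str]:
        """The concept itself plus all of its superclasses (cycle-safe)."""
        chain: List[str] = []
        current: Optional[str] = concept
        while current is not None and current not in chain:
            chain.append(current)
            current = self.subclass_of.get(current)
        return chain

    def add(self, triples: Iterable[Triple]) -> None:
        for s, p, o in triples:
            if p == "rdf:type":
                # Post the instance under its type and every superclass
                for c in self._ancestors(o):
                    self.by_concept[c].add(s)
            else:
                self.by_property[(p, o)].add(s)

    def query(self, concept: str, constraints: Iterable[Tuple[str, str]] = ()) -> Set[str]:
        """Instances of `concept` that satisfy every (property, value) constraint."""
        result = set(self.by_concept.get(concept, set()))
        for p, o in constraints:
            result &= self.by_property.get((p, o), set())
        return result


if __name__ == "__main__":
    idx = SemanticIndex({"SoftwareEngineer": "Engineer", "Engineer": "Person"})
    idx.add([
        ("alice", "rdf:type", "SoftwareEngineer"), ("alice", "livesIn", "Beijing"),
        ("bob", "rdf:type", "Person"), ("bob", "livesIn", "Shanghai"),
    ])
    print(sorted(idx.query("Person")))                              # ['alice', 'bob']
    print(sorted(idx.query("Engineer", [("livesIn", "Beijing")])))  # ['alice']
```

The design trades index size for query-time work: materializing superclass postings enlarges the index, whereas expanding the query at search time keeps the index small.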
Semantic Retrieval
- Domain-specific vs. general
- Make use of SI Ontology to improve performance
- Make use of reasoning techniques to improve it further
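To illustrate how an ontology can improve retrieval, here is a minimal sketch of subclass-based query expansion over plain-text documents, assuming a hypothetical `subclasses` map and toy travel documents. A keyword search for "vehicle" would match none of them; expanding the query with the concept's subclasses recovers the relevant ones. Ranking by the number of matched terms is a deliberately crude choice.

```python
from typing import Dict, List, Set


def expand(term: str, subclasses: Dict[str, List[str]]) -> Set[str]:
    """The term plus all of its transitive subclasses in the ontology."""
    result = {term}
    stack = [term]
    while stack:
        for child in subclasses.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def search(query: str, docs: Dict[str, str], subclasses: Dict[str, List[str]]) -> List[str]:
    """Ids of docs mentioning any expanded term, most matched terms first."""
    terms = {t.lower() for t in expand(query, subclasses)}
    scored = []
    for doc_id, text in docs.items():
        hits = len(terms & set(text.lower().split()))
        if hits:
            scored.append((-hits, doc_id))  # negative so sorted() puts best first
    return [doc_id for _, doc_id in sorted(scored)]


if __name__ == "__main__":
    ontology = {"vehicle": ["car", "truck"], "car": ["taxi"]}
    docs = {
        "d1": "cheap taxi to the airport",
        "d2": "truck rental",
        "d3": "hotel booking",
        "d4": "car or taxi service",
    }
    print(search("vehicle", docs, ontology))  # ['d4', 'd1', 'd2']
```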
Reasoning
- Learning reasoning rules
- Example: resumes, jobs
- How to find the most appropriate job for an individual?
- How to find the most appropriate person for a specified job?
- Define rules, e.g. if Person.Age(x) < 30 then Job(y).Salary > 8000
- Rule discovery
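The example rule can be encoded directly. This is a hypothetical sketch (the `Rule` class, attribute names such as `age` and `salary`, and the data are assumptions): each rule has a condition tested on a person and a conclusion that constrains a job, and the same rule base answers both questions above, the most appropriate jobs for a person and the appropriate people for a job. Rule discovery would mean learning such `Rule` objects from data instead of writing them by hand.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Record], bool]   # tested on the person x
    conclusion: Callable[[Record], bool]  # constraint on the job y


# if Person.Age(x) < 30 then Job(y).Salary > 8000
RULES = [Rule("young-high-salary",
              condition=lambda person: person["age"] < 30,
              conclusion=lambda job: job["salary"] > 8000)]


def suitable_jobs(person: Record, jobs: List[Record], rules: List[Rule] = RULES) -> List[str]:
    """Jobs satisfying the conclusion of every rule whose condition holds for the person."""
    active = [r for r in rules if r.condition(person)]
    return [job["title"] for job in jobs if all(r.conclusion(job) for r in active)]


def suitable_people(job: Record, people: List[Record], rules: List[Rule] = RULES) -> List[str]:
    """People for whom no applicable rule is violated by the job."""
    return [p["name"] for p in people
            if all(r.conclusion(job) for r in rules if r.condition(p))]


if __name__ == "__main__":
    jobs = [{"title": "Developer", "salary": 9000}, {"title": "Tester", "salary": 6000}]
    people = [{"name": "A", "age": 25}, {"name": "B", "age": 40}]
    print(suitable_jobs(people[0], jobs))    # ['Developer']
    print(suitable_jobs(people[1], jobs))    # ['Developer', 'Tester']
    print(suitable_people(jobs[1], people))  # ['B']
```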
Applications
- Jobs, resumes
- E-Commerce, e.g. travel, tickets, etc.
- Personal assistant: track one's work and interests to find new information automatically
- Semantic Web Browser
Free discussion on the platform
Aspects
- Data
- Content
- What we will do, what we can do, and what we cannot do
- Semantic web, semantic web services
- Theory -> may be the basis for SCI papers?
- Practical application (important!)
- Proposal and schedule
Data
- Data preparation
- Domains: jobs/resumes, software (from SourceForge), travel web services
- Ontology: metadata, instances
- Work
- Metadata definition: integrate an ontology editor (Protégé, OntoEdit or Orient)
- Instance database: use annotation or IE techniques to extract information from specific web sites
- How to store it? Use Jena to store the data in a database and query it with RQL? Indexing?
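Jena is a Java framework, so the sketch below does not use it. It is a hypothetical, in-memory Python stand-in that only shows what querying stored triples involves: a conjunctive query is a list of triple patterns whose `?variables` must bind consistently, the idea behind SPARQL basic graph patterns. The data and predicate names are invented.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):            # variable
            if new.get(p, t) != t:       # already bound to a different value
                return None
            new[p] = t
        elif p != t:                     # constant that does not match
            return None
    return new


def query(store: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """All variable bindings satisfying every pattern (a naive nested-loop join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in store
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


if __name__ == "__main__":
    store = [
        ("job1", "rdf:type", "Job"), ("job1", "salary", "9000"),
        ("job2", "rdf:type", "Job"), ("job2", "salary", "6000"),
        ("res1", "rdf:type", "Resume"),
    ]
    print(query(store, [("?j", "rdf:type", "Job"), ("?j", "salary", "?s")]))
    # [{'?j': 'job1', '?s': '9000'}, {'?j': 'job2', '?s': '6000'}]
```

A real store would index triples by subject, predicate and object instead of scanning the whole list, which ties this point back to the indexing question.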
Content
- Ontology building, knowledge base building: use WordNet to assist
- Web service composition. If not web services, what else can we do, e.g. jobs/resumes?
- Annotation: deep annotation, web service annotation, text annotation, even image annotation
- Mapping: concept mapping, instance mapping -> translation, merging, meaning negotiation (mapping representation)
- Data integration: combine annotation and mapping
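Web service composition can be framed as planning, in line with the "Planning for Data Gathering" topic: each service consumes some inputs and produces outputs, and a plan is a sequence of calls that produces the goal from what the user supplies. The sketch below is a naive forward-chaining planner under that assumption; the service names and the travel example are invented, and in general the plan may contain services that are not strictly needed.

```python
from typing import List, Optional, Set, Tuple

Service = Tuple[str, Set[str], Set[str]]  # (name, required inputs, produced outputs)


def compose(services: List[Service], available: Set[str], goal: Set[str]) -> Optional[List[str]]:
    """Forward chaining: repeatedly call the first service whose inputs are
    known and which adds new data, until every goal item is known."""
    known = set(available)
    plan: List[str] = []
    while not goal <= known:
        step = next((s for s in services if s[1] <= known and not s[2] <= known), None)
        if step is None:
            return None  # nothing applicable adds new data: goal unreachable
        name, _, outputs = step
        plan.append(name)
        known |= outputs
    return plan


if __name__ == "__main__":
    services = [
        ("FindFlights", {"origin", "destination", "date"}, {"flight"}),
        ("BookFlight", {"flight", "passenger"}, {"ticket"}),
        ("FindHotel", {"destination", "date"}, {"hotel"}),
    ]
    user_data = {"origin", "destination", "date", "passenger"}
    print(compose(services, user_data, {"ticket"}))   # ['FindFlights', 'BookFlight']
    print(compose(services, {"origin"}, {"ticket"}))  # None
```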
Content (continued)
- Semantic search engine: what is its definition? Simple search -> data search; then, how to make use of the ontology? Reasoning?
- How to make it practical, that is, how to apply it in our domain? Should it be a general engine or a domain-specific one?
- Ontology summary (needs a better name): output the knowledge in the ontology using NLP
- Indexing?
- Tool integration
Theory
- ML, data mining
- Inductive learning: NN, Bayes, SVM, GA. Implement them, or one of them, ourselves. It will cost time, but that time is not wasted.
- Transductive learning
- Selective learning
- More general theory: risk minimization (RM). Note that RM is not an algorithm; it is a framework for ML, and any learning algorithm can serve as its implementation.
- Active learning, multi-view learning
- Reduce the number of training samples
- Improve the precision
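Since the slide suggests implementing one inductive learner ourselves, here is a compact multinomial naive Bayes text classifier with add-one (Laplace) smoothing as a possible starting point. It is a sketch rather than a tuned implementation; the class name and the tiny job/resume training set are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayes:
    """Multinomial naive Bayes for short texts, with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in docs:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label), i.e. the log posterior up to a constant."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_counts[label] / self.n_docs)
        for w in text.lower().split():
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


if __name__ == "__main__":
    train = [
        ("java developer wanted", "job"),
        ("senior python engineer position", "job"),
        ("experienced java programmer seeking work", "resume"),
        ("my skills include python and sql", "resume"),
    ]
    nb = NaiveBayes().fit(train)
    print(nb.predict("seeking python work"))     # resume
    print(nb.predict("senior engineer wanted"))  # job
```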
Practical application
- Jobs, resumes
- Target: find the best-qualified resumes/persons for a specified job, or the best jobs for a person
- Software from SourceForge, etc.
- Aim: software composition (-> web service composition), software search
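As a baseline for the jobs/resumes target, both directions of matching can be ranked by how much of a job's required skill set a resume covers. This hypothetical sketch uses exact set overlap; an ontology-aware version would replace the set intersection with concept matching (for example, treating a subclass skill as satisfying its superclass). Names and data are invented.

```python
from typing import Dict, List, Set, Tuple


def match_score(required: Set[str], offered: Set[str]) -> float:
    """Fraction of a job's required skills covered by a resume, in [0, 1]."""
    if not required:
        return 1.0
    return len(required & offered) / len(required)


def best_resumes(job_skills: Set[str], resumes: Dict[str, Set[str]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Best-qualified people for one job, highest coverage first (ties by name)."""
    ranked = sorted(resumes.items(), key=lambda kv: (-match_score(job_skills, kv[1]), kv[0]))
    return [(name, match_score(job_skills, skills)) for name, skills in ranked[:top_k]]


def best_jobs(person_skills: Set[str], jobs: Dict[str, Set[str]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Best jobs for one person, ranked by how much of each job's requirements they cover."""
    ranked = sorted(jobs.items(), key=lambda kv: (-match_score(kv[1], person_skills), kv[0]))
    return [(title, match_score(req, person_skills)) for title, req in ranked[:top_k]]


if __name__ == "__main__":
    resumes = {"Li": {"java", "sql"}, "Zhang": {"python"}, "Chen": {"java", "sql", "xml", "c++"}}
    print(best_resumes({"java", "sql", "xml"}, resumes, top_k=2))
    jobs = {"Backend": {"java", "sql"}, "DataAnalyst": {"sql", "python"}}
    print(best_jobs({"java", "sql"}, jobs))  # [('Backend', 1.0), ('DataAnalyst', 0.5)]
```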
Practical application (continued)
Proposal schedule
- Why a proposal?
- Why a schedule?
- Can we work together on the possible platform?