Title: crawler, spider, robot, ...
1????? ?????? ???????? ?????????? ??? ?? ??????
- ?????? ????????
- ????? ?????? ???? ???? ??????
- ????? 1391
2????? ?????
- ?????? ??
- ?? ??????
- ???? ??? ?????? ??? ???? ???? ?? ??????
- ?????? ??? ?? ??????
- ?????? ??? ?????? ???? ?? ??????
- ????? ????
3?????? ??
- ?????? ?? ??? ?? ?? ???? ??????? ?? ?? ?? ????
?????? ???? ????? ? ????? ???? ????? ??? ?? ????
?? ?? ???? ?? ????? ?????? ?? ???. - ??? ???? ????
- ????? ????? ???? ???? ???? ????? ???? ????? ?
??????? ????? ?? - crawler, spider, robot, ...
???? ???? Mae2006
4????? ??????
- ?? ?????? ???? ?????? ????? (????? ?????? ????)
- ??????? ????
- ????? ????
- ????? ????? ???? ?? ??? ????? (????? ?? ???
?????) - ????? ?? ??? ???? ????
- ?????? ?? ??? ?????
- ????? ???? ????? ???? ?? ?? ??? ???? ?? ?? ?????
?????? (????? politeness) - ???? ???? ???? ???? robots.txt
- ????? ?? ????? ?????? ??? ????? ??????? ???
?????? ?? ?? ???? - ????? ???? ?????? ???? ?? ????? ???? (????? ?????
????) - ?? ???? ????? ????? ??? ???? ?????????? ?????
???? ????.
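Here is a minimal sketch of the robots.txt politeness check listed above. It uses Python's standard `urllib.robotparser` to evaluate rules the crawler has already downloaded. The robots.txt text, the user-agent name `ExampleSemWebBot` and the URLs are hypothetical and serve only as illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, assumed to be already fetched from a site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that the crawler has already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, agent: str = "ExampleSemWebBot") -> bool:
    """True if the robots.txt rules let this (hypothetical) agent fetch url."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(allowed(rp, "http://example.org/data/foaf.rdf"))  # True
    print(allowed(rp, "http://example.org/private/x.rdf"))  # False
    print(rp.crawl_delay("ExampleSemWebBot"))                # 2
```

A real crawler would fetch `/robots.txt` once per host and cache the parser. `crawl_delay()` returns `None` when the file declares no delay.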
5?? ??????
6????????? ????? ????? ?? ?? ??????
- rdfs:seeAlso, rdfs:isDefinedBy, owl:sameAs,
owl:imports - ????? ??? ????? ?? ????? HTML ?? ????? ?? ?? ???
?? ????? .rdf ?? ????. - ????? ???? ????? ???? A-Box ? T-Box
- ??????? ?? ??? ?? ???? ?? (subject, predicate,
object) - T-Box ????? ?? ? ?????? ?? (???? ?? ???)
- URI ?? predicate
- URI ????? ?? ?? object? ?? ????? ?? ?? ?? ????
???? ???? predicate ?? ?? ??? rdf:type ????. - A-Box ????????? ????? ???
- URI ????? ?? ?? Subject ? ?? Object
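The link-extraction rules above can be sketched over a list of already-parsed triples. The sketch makes three assumptions:

- Each term is a plain string.
- Only `http(s)` strings count as URIs, so literals and blank nodes are ignored.
- The slide's rules apply literally: T-Box links are predicate URIs plus the object of `rdf:type`, and A-Box links are subject and object URIs.

All function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Predicates whose objects usually point to further RDF documents.
LINK_PREDICATES = {
    "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    "http://www.w3.org/2000/01/rdf-schema#isDefinedBy",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2002/07/owl#imports",
}


def is_uri(term: str) -> bool:
    # Simplification (assumption): only absolute http(s) IRIs are crawlable.
    return term.startswith(("http://", "https://"))


def link_predicate_uris(triples: Iterable[Triple]) -> Set[str]:
    """Objects of rdfs:seeAlso, rdfs:isDefinedBy, owl:sameAs and owl:imports."""
    return {o for _, p, o in triples if p in LINK_PREDICATES and is_uri(o)}


def rdf_hrefs(hrefs: Iterable[str]) -> List[str]:
    """Links found in HTML pages that point at files with an .rdf extension."""
    return [h for h in hrefs if h.lower().endswith(".rdf")]


def tbox_uris(triples: Iterable[Triple]) -> Set[str]:
    """T-Box strategy: predicate URIs, plus the object URI of rdf:type triples."""
    found: Set[str] = set()
    for _, p, o in triples:
        if is_uri(p):
            found.add(p)
        if p == RDF_TYPE and is_uri(o):
            found.add(o)
    return found


def abox_uris(triples: Iterable[Triple]) -> Set[str]:
    """A-Box strategy: subject and object URIs."""
    return {t for s, _, o in triples for t in (s, o) if is_uri(t)}
```

Consider a FOAF document. `abox_uris` would mostly return the people and documents it mentions, while `tbox_uris` would mostly return FOAF vocabulary terms.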
7???? ??? ?????? ??? ???? ???? ?? ??????
- ??? ???? ?? ?????? ???? ?? ?? ???? ??? ???? ??
???? ???? ? ???? ?? - ??? ????? ???? ??? ?????
- ????? ???? ???? ?? ??? ???? ?? ?????? ??????
????? - ???? ?????? ????? ??? ?????
- ?? ??? ????? ?????
8?????? ??? ?? ??????
9???? ?? ??????
10???? ?? ??????
- ??? ???? URI ??? ?????
- ???? ???? URI ???? ?????
- ??????? ?? ????? ????? ??? ????? ? ???? ?? ????
?? - ????? ??? ??????
- ??????? ?? ????? ? ????? ???? ??? ??? ??????
- ????? ????? ?? ?????? - ????? ???? - ?????? ????
????? - ?????? URI - ??? ?????/????? - ???????? ??????
- Jena - Any23 - NxParser
- ??????? ?? ???? ?? ?? ????? ? ????? ???? ?? ????
???? ?? - ???????? (subject, predicate, object, context)
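The quad form (subject, predicate, object, context) records which document each triple came from. The sketch below serialises parsed triples as N-Quads lines, with the source document as context. It assumes literals are already written in N-Triples syntax (they start with a double quote), blank nodes start with `_:`, and every other term is an IRI. It does not reflect the API of Jena, Any23 or NxParser, which handle the full syntaxes. `to_nquads` is a hypothetical helper.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def term(t: str) -> str:
    # Literals ('"Alice"') and blank nodes ('_:b0') are assumed pre-formatted;
    # anything else is treated as an IRI and wrapped in angle brackets.
    return t if t.startswith(('"', "_:")) else f"<{t}>"


def to_nquads(triples: Iterable[Triple], context: str) -> List[str]:
    """Attach the source document URI as the fourth element of each statement."""
    lines: List[str] = []
    seen: Set[Triple] = set()
    for s, p, o in triples:
        if (s, p, o) in seen:  # drop duplicate triples within one document
            continue
        seen.add((s, p, o))
        lines.append(f"{term(s)} {term(p)} {term(o)} <{context}> .")
    return lines
```

Keeping the context lets a crawler re-crawl or delete one document's statements without touching statements from other documents.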
11??? ??? ????
- ??? ???? ???? ??? Din2005
- ?? ????? ?? ??? ?????? ???? ??? ????? ??????
????? ??? ???? ????? ????. - ???? ???? ????? ????? ?? ?? ????
- ????? ????? ?????
- ??? ????? ?????? ?????
- ??? ???-???
- ???? ????? ????? ?????
- ???? ????? ?? ?????? ??? ???? ???
- ??? ???-???
- ???????? ???? ?? ?? ???? ?? ????
12?? ??? ????
- ?????? ?? ??? ????? ?? ????? Lee2008
- ????? ???? ???? ?? ?? ??? ????
- ????? ????? ???? ????? ??
- ??????? ?? ????? ???? ??? ????
- Top-Level Domain (TLD)
- .com, .net, .uk
- cc-TLD: co.uk, edu.au
- Pay-Level Domain (PLD)
- amazon.com, det.wa.edu.au
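Below is a minimal sketch of PLD extraction. It relies on a tiny hard-coded suffix list that covers only the examples above; a real crawler would load the full Public Suffix List instead. `pay_level_domain`, `pld_of_url` and `PUBLIC_SUFFIXES` are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny suffix list covering only the slide's examples.
PUBLIC_SUFFIXES = {"com", "net", "org", "uk", "co.uk", "au", "edu.au", "wa.edu.au"}


def pay_level_domain(host: str) -> Optional[str]:
    """Return the longest matching public suffix plus one more label."""
    labels = host.lower().rstrip(".").split(".")
    # i = 1 gives the longest candidate suffix, so the first hit is the longest match.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None


def pld_of_url(url: str) -> Optional[str]:
    host = urlsplit(url).hostname
    return pay_level_domain(host) if host else None
```

For example, `www.amazon.com` and `images.amazon.com` share the PLD `amazon.com`, so they count as one site when politeness limits are applied.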
13?? ??? ????
14???? ???? ?????? Hog2011
15??? ????? ?? ??????? Bat2012
- BioCrawler ?? ?????? ??????
- ?????? ?? ?? ?????? ?? ?? ????? ??? ???
- ?????? ???? - ????? ??? - ?????? ?????? - ???
???? - ????? - ???
- ?????? ????? ???? ????? ???? ??? ?????? (OWL
?? RDF) - ???????? ???? ???? ???-???
- ??? ???? ?????? ??????
- IF <vision_vector> THEN <select_domain>
- ????? ????? ?????? ???? ?????? ??
16????? ??? Politeness
- ????? ????? ???? ????? ??? ?? ????? ?????? ?? ??
????? (PLD - ????) - ????? ?????? ????? URI ??? ???? ??? ?? ???? ??
????? - ?? ??? ????? ?? ??????? ???? ???? ????? ?? ????
?? ??? ??????? - ???? ???? PLD ??
- ??? ????? PLD ???? ?? ????? ???? Hog2011
- ?????
- ????? ???? ?? ?? ?????? ?????? ????? ????
- ??? ????? ??? ????? ?? ?? ??????
- ??? ??
- ?????? ?? ???
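The per-PLD politeness policy above can be sketched as one queue per PLD, served round-robin, with a minimum delay between two requests to the same PLD. This is a minimal illustration, not SWSE's implementation. The class name `PoliteFrontier`, the delay value and the naive two-label PLD function are all assumptions. The clock is injected so the behaviour can be tested without sleeping.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional
from urllib.parse import urlsplit


def naive_pld(url: str) -> str:
    # Simplification (assumption): last two host labels; wrong for e.g. co.uk.
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


class PoliteFrontier:
    """Hypothetical frontier: one FIFO queue per PLD, visited round-robin."""

    def __init__(self, min_delay: float, clock: Callable[[], float]) -> None:
        self.min_delay = min_delay
        self.clock = clock
        self.queues: Dict[str, Deque[str]] = {}
        self.order: Deque[str] = deque()  # round-robin order of non-empty PLDs
        self.last_fetch: Dict[str, float] = {}

    def add(self, url: str) -> None:
        pld = naive_pld(url)
        if pld not in self.queues:
            self.queues[pld] = deque()
            self.order.append(pld)
        self.queues[pld].append(url)

    def next_url(self) -> Optional[str]:
        """Next URL whose PLD is outside its delay window, or None if all must wait."""
        now = self.clock()
        for _ in range(len(self.order)):
            pld = self.order[0]
            self.order.rotate(-1)  # move this PLD to the back of the rotation
            last = self.last_fetch.get(pld)
            if last is None or now - last >= self.min_delay:
                self.last_fetch[pld] = now
                url = self.queues[pld].popleft()
                if not self.queues[pld]:
                    del self.queues[pld]
                    self.order.pop()  # it was just rotated to the back
                return url
        return None
```

A `None` result while URLs remain queued means every remaining PLD is inside its delay window. With only a few PLDs left in the frontier, the crawler then has to wait, which is why a frontier spread over many PLDs keeps throughput up.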
17?? ??? ????? ???? ??
- ???? ?? ??? ?????
- ?? ??? ?????
- ??????
- ????? ?? ??? ????
- ????? ???? (If-Modified-Since) ?? ??? HTTP ????
- ?? ?????? ????
- ??????? ?? ????? "Pingthesemanticweb.com"
- ??????? ?? ???? ????
- ?????? ?? ??? ?????
- ????? ??? ???? ????? ?? ??? ??????
- ?????? ???? ???? ?????
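Re-crawling with the HTTP conditional header can be sketched with Python's standard library. The crawler sends `If-Modified-Since` with the time of its last fetch. A `304 Not Modified` reply means the stored copy is still current. The `Accept` value and the function names are illustrative assumptions.

```python
import email.utils
import urllib.error
import urllib.request
from typing import Optional


def conditional_request(url: str, last_fetch: Optional[float]) -> urllib.request.Request:
    """Build a GET request asking the server to answer 304 if nothing changed."""
    req = urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})
    if last_fetch is not None:
        # HTTP-date in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        req.add_header("If-Modified-Since", email.utils.formatdate(last_fetch, usegmt=True))
    return req


def fetch_if_changed(req: urllib.request.Request) -> Optional[bytes]:
    """Return the new body, or None when the server answers 304 Not Modified."""
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib's default handlers report 304 as an HTTPError
            return None
        raise
```

The crawler stores the time of each successful fetch and passes it back on the next visit. Servers that ignore the header simply return the full document again.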
18??? ??? ?????
19??? ??? ?????
- ?????? ????? ???
- ?????? ????? ?????
- ??? Master-Slave Hog2011
- ??????? ???? URI ??? seed ???? Master ??? Slave
?? - ????? ?? ??? ?? ?????? ???? ???? Slave ??
- ??????? ???? URI ?? ???? Slave ?? ??? ?????? ??
????? ?? ??? - ??? ?? ???? Har2006
- ??? ???? ????? ????? ?? ????? ??????
- ????? ?? ?? ?? ????? ??? ???? ?? ???? ??? ???
- ?????? ????? ????? ? ???? ???????
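One way for the master to split URIs among slaves is to hash each URI's PLD. Every URI of a site then lands on the same slave, and its politeness delay is enforced in one place. This partitioning rule is an assumption for illustration, not necessarily what the cited systems do. `slave_for` and `distribute` are hypothetical names, and the PLD is approximated by the last two host labels.

```python
import hashlib
from typing import Dict, List
from urllib.parse import urlsplit


def slave_for(url: str, num_slaves: int) -> int:
    """Map a URI to a slave index by hashing its (naively computed) PLD."""
    host = (urlsplit(url).hostname or "").lower()
    pld = ".".join(host.split(".")[-2:])  # assumption: last two labels
    # md5 gives the same result in every process, unlike Python's salted hash().
    digest = hashlib.md5(pld.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_slaves


def distribute(seeds: List[str], num_slaves: int) -> Dict[int, List[str]]:
    """Master side: group seed URIs into one work list per slave."""
    plan: Dict[int, List[str]] = {i: [] for i in range(num_slaves)}
    for url in seeds:
        plan[slave_for(url, num_slaves)].append(url)
    return plan
```

New URIs found by a slave can be routed through the same `slave_for` function, so the master never needs a central lookup table.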
20??? ?????? ????? ?????-????? ? ????? ?????-????
????? Dod2006
- ??? ???? ????? ????? (????? ?????-?????) ?? ?????
?????? (????? ?????-???? ?????)
?????? ?????? SLUG
21??? ?????? SWSE Hog2011 Sindice Cyg2011 Swoogle Han2006 Falcons Che2009 Watson Sab2007 Slug Dod2006 LDSpider Ise2010 BioCrawler Bat2012
???? ?????? ????? ????? SWSE ????? ????? Sindice ????? ????? Swoogle ????? ????? Falcons ????? ????? Watson ????? ????? ????? ????? WebOWL
????? RDF/XML RDF/XML Turtle N-Triples RDFa Notation3 RSS1.0 ??????????? HTML ????? RDF/XML N-Triples N3 RDF/XML RDF/XML RDF/XML RDF/XML Turtle RDFa Notation3 ????? ??????? RDF/XML
????? Quad N-Triples ?? ????? ????????? subject? dataset ? URL ??? N-Triples Quad N-Triples N-Triples RDF/XML N-Quads Object
??? ???? ???-??? ???-??? ???? ??? - - - ???-??? ???-??? ???-???
?????? ???? ????? ??PLD ????? ?? ???? - ????? ?? PLD - - ????? ?? PLD ????? ?? ?????
????? ?????? ????? ????????? ????? URI ???? ???? ?????? ?????? ???? ???? ????? - - - - ????? URI???? ????? ?? ?? PLD ?????? ???? ????? ????? ?????????
22??? ?????? SWSE Hog2011 Sindice Cyg2011 Swoogle Han2006 Falcons Che2009 Watson Sab2007 Slug Dod2006 LDSpider Ise2010 BioCrawler Bat2012
????? ??? Any23 ??? ??? ??? ??? RdfXml Nx Any23 ???
??? ???? YARS2 HBase MySQL MySQL MySQL ?????? ???? ????? ?? ???? RDFStore db4o
??? ???? URI???? ????? - PSW ????? ( URI? ???? ????) ???? ????? Swoogle ???? PSW ????? Swoogle ???????? Protege ????? - - -
??? ????? ?? ???? ????? ??? ?? ???? ????? ??? - - - ?????-???? ????? ????? ?????-????? - ???????? ?????? (JADE) ????? ???
????? ??? - ???????? ?? ???? ???? ?????? Ping API ????? ????? ?? ?????? - ??????? ?? ?????????? ????? ??????? ????? ?? ??? ???? ???? ??? ?????????? ???? ? ?????????? ?? ??? ?? ??? ????? ?? ???????
????/ ????? ?????? ????? ???????? DERI ?????? ????? ???????? DERI ???? eBiquity ?? UMBC ??? ???? ???????? Websoft ?????? ????? KM - ?????? ????? ???????? DERI ?????
23???? ??????
- ?????
- ??? ??? ????? ? ????? ???? ?????? ??
- ???? ?? ????? ?? ????? ? ?????? ????
- ?? ??? ????? ????? ?? ???? ??????? ?? ??????
- ????? ???? ?????? ???? ? ????? ?? ??? ???? ?????
?? ????? - ?????
- ??? ???? ???? ????? ??? ?????? ? ????
- ??? ????? ?? ??? ????? ? ??????? ???? ?? ????
24????? ???? ??????
- ????? ????? ?????? ?? ?????? ?? ?? ????? ?????
- ????????
- ????? ???? ????? ???? ????? ???
- ?????? ????? ?? ??? ????? ????? ??????
- ??? ??? ???? ?????
- ???? ???????? ?????
- ???? ????? ???
- ?????? ?????? ??? ?? ?? ?????
- ?????? ???? ?? ????? ?? ????????
- ????? ??? ???????? ????? ? ???????? ???
- ?????? ????? ??? ??????? ??? ?? ???? ??????
????????
25?????? ?????? ?? ???? ??????
26?????? ??? ??????
27?????? ??? ??????
- ???? ??
- ????? ?? ??? ???? ????? ???
- ?? ??? ????? ???? ??? ?? ?? ??? ??????? ????? ???
- ?? ??? ??????? ???? ??? ?? ?? ??? ????? ????? ???
- ??????? ????? ?? ?? ????? ????? ????? ????? ?????
?? ???? - ????? ??? ???????
- ??????? ????? ????? ????? (harvest)
- ????? ????? ????? / ????? ?? ????? ??????? ???
- ??????? ?????? ?????
- ??????? ?????? ???? ??? ????? ?? ??? ????? ?????
- ????? ?????? ????? / ????? ?? ????? ???????? ???
28?????? Cha1999 Dil2000 Ehr2003 Yuv2006 Mae2006
???? ???? HTML HTML HTML - RDF HTML HTML - RDF
??? ?????? ?????? ????????? ?????? ?????? ??????
??? ???? ??????? ??????? ?? ????? ????? ?????? ????? ???? ????? ???? ????? ??? ???????? ????? ?? ??????????? ????????
????? ??????? ???? ????? ?????? ??????? ?? ??? ???? ???? ?????? ????? ????? ??? ???? ???????? ????? ??????? ???? ????? ? ?????? ???????? ????? ?? ???? ???????? ?????
??? ????? ???? - ????? ???? ????? ?????? ????? ?????? ???????? ??????-?????
??? ?????? ?????? ?????? ???? ?? ??? ????? ?? ???? ??????? ??? / ???? ??? TF-IDF / ???? ??? ????? ??? ?????? ?? ???? ???????? ?????? ???? ????? ?? ???????? RDF ????? ??? ??????????? HTML TF-IDF
????? ?????? ?????? ???? ?? ??????? ????? ?????? ?????? ???? ?? ??????? ????? ?????? ????? ?????????? ???? ???? ?? ???? ?? ??? ??????? ?????????? ????? ???????? ??? ??????? ?????? ? ??????? ???????? RDF ????? ????? ????? ???? ???? ?? ?????? ????? HTML ??? ????? TF-IDF
?????? ?????? ????? ??? ????? ???? ????? ??????? ??????? ???? ????? ????? ???? ????? ???? ????? ????? ???????? ???? ????? - ??????? ????? ??? ????? ???? ?????
??????? 1/2/3 1/2/4 2/3/4 3/4 2/3/4
????? ??????? ???? ????? ????? ?? ???? ??????? ? ????? ?? ?????? ?? ?????? URL ?????? ???? ?? ?????? ??? ??? harvest ? ??????? ?????? ?????? ???? ?? ?????? ??? ??? ? ?????? ?????? harvest ?????? ???? ?? ?????????? ??? ??? ? ?????? ????? ?? ???? ????? ???????? ?????? ???? ?? ???? ????? ???? ??? harvest ?????? ???? ?? ?? ?????? ?????? ????? ?? ???? ?????
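TF-IDF appears in the comparison above as a way to judge page relevance. The sketch below illustrates that idea under three assumptions: the topic is given as a short text, the IDF is smoothed as log((1+N)/(1+df)) + 1, and relevance is measured by cosine similarity. A focused crawler would then follow links only from pages whose score exceeds some threshold. All names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights per document, using a smoothed IDF (assumption)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty pages
        vectors.append(
            {t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def relevance(topic: str, pages: List[str]) -> List[float]:
    """Cosine similarity between the topic text and each page in TF-IDF space."""
    vecs = tfidf_vectors([tokenize(topic)] + [tokenize(p) for p in pages])
    return [cosine(vecs[0], v) for v in vecs[1:]]
```

The score compares only words. The ontology-based crawlers in the same comparison replace or complement it with concept-level similarity.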
29????? ????
- ???? ?? ?? ????? ????? ??????
- ?????? ?? ??????? ????? ??? ??????
- ???? ???? ?? ???? ????? ???????? ? ????
- ?? ?? ??? ??? ????
- ???????? ?? ????? ?????????? ??????
- ??????? ??????? ??? ?????? ?? ????? ??? ??????
- ??? ???? ????? ????? ?? ????? ?????
- ???? ???-??? ????? ?? ?????
- ???? ????? ???? ?? ???? ?? ??? ????????
- ???? PLD ??
- ??? ????? ??? ????? ?? ?? ??????
- ??? ????? ????? ????? ??????
- ???? ???? ???
- ?????? ? ?? ??? ????? ????? ?????? ??
- ???? ???? ??????
- ??????? ?? ??? ??? ????? ?? ???????
- ??????? ?? ??? ??? ???? ??????
- ????? ????
- ????? ??? ?? ???? ????? ?????? ???? URI ??
30?????
- Bat2012 A. Batzios, P. A. Mitkas, WebOWL: A Semantic Web search engine development experiment. Expert Systems with Applications, vol. 39, pp. 5052-5060, 2012.
- Kum2012 R. K. Rana, N. Tyagi, A Novel Architecture of Ontology-based Semantic Web Crawler. International Journal of Computer Applications, vol. 44, Apr. 2012.
- Hog2011 A. Hogan, A. Harth, J. Umbrich, S. Kinsella, A. Polleres, S. Decker, Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine. Journal of Web Semantics, vol. 9, pp. 365-401, 2011.
- Cyg2011 R. Cyganiak, D1.1 Deployment of Crawler and Indexer Module, Linking Open Data Around The Clock (LATC) Project, 2011.
- Jal2011 O. Jalilian, H. Khotanlou, A New Fuzzy-Based Method to Weigh the Related Concepts in Semantic Focused Web Crawlers, IEEE Conference, 2011.
- Dhe2011 S. S. Dhenakaran, K. T. Sambanthan, WEB CRAWLER - AN OVERVIEW. International Journal of Computer Science and Communication, vol. 2, pp. 265-267, Jun. 2011.
- Ise2010 R. Isele, J. Umbrich, C. Bizer, A. Harth, LDSpider: An open-source crawling framework for the Web of Linked Data, poster, ISWC 2010, Shanghai, China, 2010.
- Del2010 R. Delbru, Searching Web Data: an Entity Retrieval Model. Ph.D. thesis, Digital Enterprise Research Institute, National University of Ireland, Sep. 2010.
- Che2009 G. Cheng, Y. Qu, Searching Linked Objects with Falcons: Approach, Implementation and Evaluation. International Journal on Semantic Web and Information Systems, vol. 5, pp. 50-71, Sep. 2009.
- Ore2008 E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, G. Tummarello, Sindice.com: A document-oriented lookup index for open linked data. International Journal of Metadata, Semantics and Ontologies, vol. 3, pp. 37-52, 2008.
- Umb2008 J. Umbrich, A. Harth, A. Hogan, S. Decker, Four heuristics to guide structured content crawling, in Proc. of the 2008 Eighth International Conference on Web Engineering, IEEE Computer Society, Jul. 2008, pp. 196-202.
- Cyg2008 R. Cyganiak, H. Stenzhorn, R. Delbru, S. Decker, G. Tummarello, Semantic Sitemaps: Efficient and Flexible Access to Datasets on the Semantic Web, in Proc. of the 5th European Semantic Web Conference, 2008, pp. 690-704.
- Lee2008 H. T. Lee, D. Leonard, X. Wang, D. Loguinov, IRLbot: Scaling to 6 billion pages and beyond, in Proc. of the 17th International Conference on World Wide Web, 2008, pp. 427-436.
- Don2008 H. Dong, F. K. Hussain, E. Chang, State of the art in metadata abstraction crawlers, IEEE International Conference on Industrial Technology, Chengdu, 2008.
- Sab2007 M. Sabou, C. Baldassarre, L. Gridinoc, S. Angeletou, E. Motta, M. d'Aquin, M. Dzbor, WATSON: A Gateway for the Semantic Web, in ESWC poster session, 2007.
- Bat2007 A. Batzios, C. Dimou, A. L. Symeonidis, P. A. Mitkas, BioCrawler: An intelligent crawler for the Semantic Web. Expert Systems with Applications, vol. 35, pp. 524-530, 2007.
- Dod2006 L. Dodds, Slug: A Semantic Web Crawler, 2006.
- Han2006 L. Han, L. Ding, R. Pan, T. Finin, Swoogle's Metadata about the Semantic Web, 2006.
- Har2006 A. Harth, J. Umbrich, S. Decker, MultiCrawler: A pipelined architecture for crawling and indexing semantic web data, in Proc. of the 5th International Semantic Web Conference, 2006, pp. 258-271.
- Mae2006 F. V. D. Maele, Ontology-based Crawler for the Semantic Web. M.A. thesis, Department of Applied Computer Science, Brussel, 2006.
- Yuv2006 M. Yuvarani, N. Ch. S. N. Iyengar, A. Kannan, LSCrawler: A Framework for an Enhanced Focused Web Crawler based on Link Semantics, in Proc. of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, 2006.
- Din2005 L. Ding, T. Finin, A. Joshi, R. Pan, P. Reddivari, Search on the Semantic Web. IEEE Computer, vol. 38, no. 10, pp. 62-69, Oct. 2005.
- Din2004 T. Finin, Y. Peng, R. S. Cost, J. Sachs, R. Pan, A. Joshi, P. Reddivari, V. Doshi, L. Ding, Swoogle: A Search and Metadata Engine for the Semantic Web, in Proc. of the Thirteenth ACM Conference on Information and Knowledge Management, 2004.
- Ehr2003 M. Ehrig, A. Maedche, Ontology-focused crawling of Web documents, in Proc. of the 2003 ACM Symposium on Applied Computing, 2003, pp. 1174-1178.
- Ara2001 A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan, Searching the Web. ACM Transactions on Internet Technology, vol. 1, pp. 2-43, Aug. 2001.
- Ber2001 T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web. Scientific American, vol. 284, pp. 35-43, May 2001.
- Dil2000 M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, M. Gori, Focused crawling using context graphs, in Proc. of the 26th International Conference on Very Large Data Bases, 2000, pp. 527-534.
- Cha1999 S. Chakrabarti, M. V. D. Berg, B. Dom, Focused crawling: a new approach to topic-specific web resource discovery. Computer Networks, vol. 31, pp. 1623-1640, 1999.
- Kle1998 J. Kleinberg, Authoritative sources in a hyperlinked environment, in Proc. ACM-SIAM Symposium on Discrete Algorithms, 1998.