Title: Building an Intelligent Web: Theory and Practice
1Building an Intelligent Web: Theory and Practice
- Pawan Lingras
- Saint Mary's University
- Rajendra Akerkar
- American University of Armenia and SIBER, India
4Information Retrieval
8Data mining has emerged as one of the most exciting and dynamic fields in computing science. The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, within the span of a few years, the software market for data mining alone is expected to exceed $10 billion. Data mining refers to a family of techniques used to detect interesting nuggets of relationships and knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of several databases, their size and the high complexity of many techniques present interesting computational challenges.
16Figure 2.43 Relationship between precision and
recall
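The two measures named in the figure caption can be computed per query as in the minimal sketch below. It assumes the standard set-based definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved); the function name and the document ids are hypothetical and do not come from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: 4 documents retrieved, 5 relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d2", "d4", "d5", "d6", "d7"])
```

Retrieving more documents usually raises recall and lowers precision, which is the trade-off such a figure typically plots.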
18Semantic Web
19Semantic Web: The layer language model (Berners-Lee, 2001; Broekstra et al., 2001)
22Figure 3.4 Representing classes and instances
(Noy et al., 2001)
26Queries 1 and 2
27Queries 3 and 4
32An RDF model for automobiles
35Classification and Association
36Data Preparation
- Database Theory
- SQL
- Data Transformation
- http://www.ecn.purdue.edu/KDDCUP/data/
37Classification
- Find a rule, a formula, or black box classifier for organizing data into classes.
- Classify clients requesting loans into categories based on the likelihood of repayment
- Classify customers into Big or Moderate Spenders based on what they buy
- Classify the customers into loyal, semi-loyal, infrequent based on the products they buy
- The classifier is developed from the data in the training set
- The reliability of the classifier is evaluated using the test set of data
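The training-set and test-set split described above can be illustrated with a deliberately simple classifier. The sketch below is an assumption for illustration only: it uses a one-rule classifier (the majority class per attribute value), not a method named in the slides, and the loan records and the names `train_one_rule` and `accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_rule(rows, attribute, label):
    """Learn the majority class for each value of a single attribute."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[label]] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fallback for attribute values never seen during training.
    default = Counter(row[label] for row in rows).most_common(1)[0][0]
    return rule, default


def accuracy(rule, default, rows, attribute, label):
    """Fraction of rows whose predicted class matches the true class."""
    correct = sum(rule.get(row[attribute], default) == row[label] for row in rows)
    return correct / len(rows)


# Hypothetical loan records: the classifier is built from the training set only.
training_set = [
    {"income": "high", "repaid": "yes"},
    {"income": "high", "repaid": "yes"},
    {"income": "low", "repaid": "no"},
    {"income": "low", "repaid": "no"},
    {"income": "low", "repaid": "yes"},
]
# The test set is held back and used only to estimate reliability.
test_set = [
    {"income": "high", "repaid": "yes"},
    {"income": "low", "repaid": "no"},
    {"income": "low", "repaid": "yes"},
    {"income": "medium", "repaid": "yes"},
]

rule, default = train_one_rule(training_set, "income", "repaid")
test_accuracy = accuracy(rule, default, test_set, "income", "repaid")
```

Measuring accuracy on records the classifier has never seen gives a more honest estimate than re-scoring the training set.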
38Classification
- ID3 Algorithm
- Numerical Illustration
- Application to a Small E-commerce Dataset
- C4.5 for Experimentation
- Other approaches
- Neural Networks
- Fuzzy Classification
- Rough Set Theory
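The core step of ID3, choosing the attribute to split on, can be sketched as follows. This assumes the usual entropy-based information gain criterion; the customer records and attribute names are hypothetical, and building the full tree is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[label] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[label] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, label):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, label))


# Hypothetical customer records.
customers = [
    {"visits": "many", "region": "east", "spender": "big"},
    {"visits": "many", "region": "west", "spender": "big"},
    {"visits": "few", "region": "east", "spender": "moderate"},
    {"visits": "few", "region": "west", "spender": "moderate"},
]
split_on = best_attribute(customers, ["visits", "region"], "spender")
```

ID3 would then recurse on each subset produced by the chosen attribute until the subsets are pure or no attributes remain.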
39Association
- Market basket analysis
- determine which things go together
- Transactions might reveal that
- customers who buy bananas also buy candles
- cheese and pickled onions seem to occur frequently in a shopping cart
- Information can be used for
- arranging a physical shop or structuring the Web site
- targeted advertising campaigns
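How strongly two items "go together" is commonly quantified with support and confidence. The sketch below assumes those standard definitions; the baskets are hypothetical and only reuse the item names from the bullets above.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"cheese", "pickled onions", "bread"},
    {"cheese", "pickled onions"},
    {"cheese", "bread"},
    {"bananas", "candles"},
]
pair_support = support(baskets, {"cheese", "pickled onions"})
rule_confidence = confidence(baskets, {"cheese"}, {"pickled onions"})
```

A rule such as "cheese implies pickled onions" is usually reported only when both values clear chosen thresholds.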
40Association
- Apriori Algorithm
- Demonstration for an E-commerce Application
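A compact sketch of the frequent-itemset phase of Apriori follows. It assumes the textbook level-wise scheme (join frequent (k-1)-itemsets, prune candidates that have an infrequent subset, then count); the page-visit transactions are hypothetical and are not the e-commerce data used in the demonstration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    singletons = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(singletons).items() if n >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical sessions on an e-commerce site (pages visited together).
sessions = [
    ["home", "laptops", "cart"],
    ["home", "laptops"],
    ["home", "cart"],
    ["laptops", "cart"],
    ["home", "laptops", "cart"],
]
frequent = apriori(sessions, min_support=0.6)
```

With a minimum support of 0.6, each single page and each pair is frequent (3 or more of 5 sessions), while the three-page itemset appears in only 2 sessions and is dropped.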
41Clustering
42Clustering
- Breaks a large database into different subgroups or clusters
- Unlike classification there are no predefined classes
- The clusters are put together on the basis of similarity to each other
- The data miners determine whether the clusters offer any useful insight
44Statistical Methods
- k-means
- Numerical Example
- Implementation
- Data Preparation
- Clustering
- Other Methods
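The k-means procedure listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: squared Euclidean distance, initial centroids supplied by the caller, and hypothetical two-dimensional points; it is not the implementation referred to in the slides.

```python
def k_means(points, centroids, iterations=10):
    """Alternate assignment and centroid update until the centroids settle."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separated groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
```

The result depends on the initial centroids, so in practice the algorithm is often run several times with different starting points.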
45Neural Network Based Approaches
- Kohonen Self-Organising Maps
- Numerical Demonstration
- Application to Web Data Collection
- Other Neural Network Based Approaches
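A single training step of a Kohonen map can be sketched as below. The assumptions are loud ones: a one-dimensional map, a fixed learning rate, and a simple "all units within the radius" neighbourhood; the weights and the input vector are hypothetical, and real training would also decay the learning rate and radius over time.

```python
def som_step(weights, x, learning_rate=0.5, radius=1):
    """Apply one Kohonen update in place and return the winning unit's index."""
    # Competition: the winner is the unit whose weights are closest to x.
    distances = [sum((w - xi) ** 2 for w, xi in zip(unit, x)) for unit in weights]
    winner = distances.index(min(distances))
    # Cooperation: the winner and its map neighbours move towards x.
    for j, unit in enumerate(weights):
        if abs(j - winner) <= radius:
            weights[j] = [w + learning_rate * (xi - w) for w, xi in zip(unit, x)]
    return winner


# Hypothetical map of three units with two-dimensional weights.
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
winner = som_step(weights, [1.0, 0.8])
```

Here the third unit wins, so it and its neighbour (the second unit) move halfway towards the input while the first unit is left unchanged.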
46Clustering of customers
48Web Usage Mining
49High-level web usage mining process (Srivastava et al., 2000)
50Applications of web usage mining (Romanko, 2006; Srivastava et al., 2000)
67Clustering exercise
70Classification exercise
71Association exercise
73Sequence Pattern Analysis of Web Logs
77Web Content Mining
78Data Collection
- Web Crawlers
- Public Domain Web Crawlers
- An Implementation of a Web Crawler
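The basic loop of a web crawler can be sketched with the Python standard library. This is an illustrative assumption, not the implementation from the slides: pages are visited breadth-first, and the `fetch` argument is a caller-supplied function that returns HTML or `None`, so the example runs against a hypothetical in-memory site instead of the network. A real crawler would also respect robots.txt, rate limits and URL normalisation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were fetched."""
    seen, queue, order = {start_url}, deque([start_url]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory site standing in for real HTTP requests.
pages = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.org/a.html": '<a href="/">home</a> <a href="/c.html">C</a>',
    "http://example.org/b.html": "",
    "http://example.org/c.html": '<a href="/a.html">A</a>',
}
visited = crawl("http://example.org/", pages.get)
```

The `seen` set is what keeps the crawler from looping on pages that link back to each other.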
79Architecture of a search engine (Romanko, 2006)
83Other topics in Web Content Mining
- Search Engines
- How to prepare for and set up a search engine
- Types and listings of search engines (freeware, remote hosting services, commercial)
- Multimedia Information Retrieval
84Web Structure Mining
86http://www.iprcom.com/papers/pagerank/
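PageRank can be approximated by simple iteration over a link graph. The sketch below is an assumption for illustration: it uses a normalised variant in which all ranks sum to 1, a damping factor of 0.85, and an even redistribution of rank from pages without out-links. The three-page graph is hypothetical, and every link target is assumed to appear as a key of `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict {page: [linked pages]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page divides its damped rank equally among its out-links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this graph C collects links from both A and B and ends up ranked highest, A is next because it receives all of C's rank, and B is lowest.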
91Index quality for different search engines (Henzinger et al., 1999)
92Index quality per page for different search engines (Henzinger et al., 1999)