Title: Linear Algebra: When are we ever going to use this stuff?
1Linear Algebra: When are we ever going to use this stuff?
- Dr. Chris Pavone
- CSU Chico
- October 7, 2006
2Objectives
- Learn a few (6) new concepts in linear algebra.
- Test your short-term memory.
- Learn how Google uses Linear Algebra to order search results.
- Gain an appreciation for the power of mathematics.
3Notation
5Recall
(We will be viewing matrices as linear
transformations)
6Definitions (short-term memory)
9Example
We can also get a matrix from a directed graph
10(HINT: The WWW is a directed graph!!!)
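One way to see how a directed graph gives a matrix is sketched below. The 3-node graph and the function name `adjacency_matrix` are made up for illustration (the graph on the slide is not reproduced here); the convention assumed is a 1 in row i, column j when node i points to node j.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 matrix with a 1 in row i, column j when node i links to node j."""
    matrix = [[0] * n for _ in range(n)]
    for source, target in edges:
        # Nodes are numbered from 1, list indices from 0.
        matrix[source - 1][target - 1] = 1
    return matrix


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
example_edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
A = adjacency_matrix(3, example_edges)
for row in A:
    print(row)
```

Each row lists where a node points, and each column lists who points at that node.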
12RECAP
- Nonnegative Matrix: all entries $\geq 0$.
- Stochastic Matrix: nonnegative and rows add up to 1.
- Irreducible Matrix: there is a path between any 2 nodes on the graph of the matrix (the graph of the matrix is strongly connected).
- Primitive Matrix: nonnegative, irreducible, and has exactly 1 eigenvalue with magnitude 1 (for a stochastic matrix, 1 is the largest magnitude an eigenvalue can have).
13A is nonnegative, stochastic, irreducible, and
primitive.
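The four definitions can be checked by hand on small matrices; here is a rough sketch of how that might look in code. All function names and the two 2 x 2 test matrices are invented for illustration. Note one substitution: instead of computing eigenvalues, `is_primitive` uses an equivalent test for nonnegative matrices (some power of the matrix has all entries positive, and it is enough to try powers up to n^2 - 2n + 2).

```python
def is_nonnegative(m):
    return all(x >= 0 for row in m for x in row)


def is_stochastic(m, tol=1e-12):
    return is_nonnegative(m) and all(abs(sum(row) - 1.0) <= tol for row in m)


def is_irreducible(m):
    """True when the graph of m is strongly connected."""
    n = len(m)

    def reachable(start, forward):
        # Depth-first search along links (forward) or against them (backward).
        seen = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                linked = m[i][j] != 0 if forward else m[j][i] != 0
                if linked and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    # Node 0 reaches everyone, and everyone reaches node 0.
    return len(reachable(0, True)) == n and len(reachable(0, False)) == n


def is_primitive(m):
    """Positive-power test: checks m, m^2, ..., m^(n^2 - 2n + 2) for all-positive entries."""
    if not is_nonnegative(m):
        return False
    n = len(m)
    pattern = [[x > 0 for x in row] for row in m]
    power = pattern
    for _ in range(n * n - 2 * n + 2):
        if all(all(row) for row in power):
            return True
        power = [[any(power[i][k] and pattern[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return False


cycle = [[0.0, 1.0], [1.0, 0.0]]    # eigenvalues 1 and -1: two of magnitude 1
mixing = [[0.5, 0.5], [1.0, 0.0]]   # eigenvalues 1 and -0.5
print(is_stochastic(cycle), is_irreducible(cycle), is_primitive(cycle))
print(is_stochastic(mixing), is_irreducible(mixing), is_primitive(mixing))
```

The `cycle` matrix shows why primitive is a stronger condition than irreducible: its graph is strongly connected, yet it has two eigenvalues of magnitude 1.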
14The Power Method
- Given a diagonalizable matrix with a dominant
eigenvalue, the power method is an iterative
technique for computing the dominant eigenvector
(i.e., the eigenvector that corresponds to the
dominant eigenvalue).
15Here is how the power method works
Your answer will be a vector pointing in the same direction as the dominant eigenvector. Note that this says nothing about magnitude.
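A minimal sketch of the power method follows, assuming the usual recipe: multiply by the matrix, normalize, repeat. The 2 x 2 matrix, the starting vector, and the choice to normalize by the entry of largest magnitude are illustrative assumptions, not taken from the slides.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Scale so the entry of largest magnitude equals 1 (assumes a nonzero vector)."""
    pivot = max(vector, key=abs)
    return [x / pivot for x in vector]


def power_method(matrix, start, iterations=100):
    x = normalize(start)
    for _ in range(iterations):
        x = normalize(mat_vec(matrix, x))
    return x


# Hypothetical matrix with eigenvalues 3 and 1; the dominant eigenvector points along (1, 1).
A = [[2.0, 1.0], [1.0, 2.0]]
result = power_method(A, [1.0, 0.0])
print(result)
```

Normalizing at every step only keeps the numbers from blowing up or shrinking to zero; it does not change the direction of the answer.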
16Example
The power method should give us a vector pointing in the same direction as $x_1 = (.7071, .7071, 0)$.
17normalize
18Here is why it works
Normalize
19Notice how crucial it is that we have a largest
eigenvalue and a diagonalizable matrix. This
guarantees convergence.
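The standard argument can be sketched as follows. The symbols are assumptions for this sketch: $v_1, \ldots, v_n$ are eigenvectors of $A$ forming a basis (this is where diagonalizability is used), $\lambda_1$ is the dominant eigenvalue, and the starting vector $x_0$ has a nonzero component $c_1$ along $v_1$.

```latex
x_0 = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n,
\qquad A v_i = \lambda_i v_i,
\qquad |\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_n|

A^k x_0 = c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + \cdots + c_n \lambda_n^k v_n
        = \lambda_1^k \left( c_1 v_1 + \sum_{i=2}^{n} c_i \left( \frac{\lambda_i}{\lambda_1} \right)^{k} v_i \right)
```

Since $|\lambda_i / \lambda_1| < 1$ for $i \geq 2$, every term in the sum shrinks to zero as $k$ grows, so $A^k x_0$ lines up with $v_1$. Normalizing removes the factor $\lambda_1^k$.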
21Disclaimer
I am not a computer scientist, and I don't claim
to know anything about computers, the internet,
hacking, or Bill Gates. The following is the
gist of what I learned from doing a little
research on the Google search algorithm.
Everything I learned I got from the internet,
research articles, and various linear algebra
books. There were some small variations in what
I found, and I have left out the annoying little
technicalities (I have references for those
interested). THIS IS JUST THE GIST. Don't ask
me why Google does what it does; it just does.
22Flashback to 1997
Sergey Brin received his B.S. degree in
mathematics and computer science from the
University of Maryland at College Park in 1993.
Currently, he is a Ph.D. candidate in computer
science at Stanford University where he received
his M.S. in 1995. He is a recipient of a National
Science Foundation Graduate Fellowship. His
research interests include search engines,
information extraction from unstructured sources,
and data mining of large text collections and
scientific data.
Lawrence Page was born in East Lansing, Michigan,
and received a B.S.E. in Computer Engineering at
the University of Michigan Ann Arbor in 1995. He
is currently a Ph.D. candidate in Computer
Science at Stanford University. Some of his
research interests include the link structure of
the web, human computer interaction, search
engines, scalability of information access
interfaces, and personal data mining.
These guys created Google using Linear Algebra.
23Taken from "Does Google know what it's doing?", BBC, 11-15-05.
Referring to Google's first office (1998): "It was unreconstructed 1960s California: bikes in the corridors, lava lamps everywhere, the famous ex-Grateful Dead chef cooking delights in the Google canteen, a grand piano in reception for the Google PhDs to tinkle on during breaks."
Referring to Brin and Page back in '98: "the founding duo was not at all clear about what the business plan actually was."
Now: "It is taking a million square feet of NASA property in Silicon Valley for a new Googleplex to create space for its army of PhDs who spend their time dreaming up ways of making search better."
"Its main asset is the number of PhDs it has working for it, ceaselessly trying to figure out how to extend the principle of search into everything, unbounded by time, space and (soon) language barriers."
"The company refuses to hire people more than a year or two out of university, for fear that experience in the conventional business world will taint their freshness of mind."
24Number of pages in Google's index as of 11/04
(http://blog.searchenginewatch.com/blog/041111-084221)
But how does it work?
25The Basic Idea
When you do a search, Google finds all the
relevant pages, and then orders the results using
PageRank. PageRank is a numeric value assigned
to each page that represents how important that
page is.
"... a page is important if it is pointed to by
other important pages. That is, they [Brin and
Page] decided that the importance of your page
(its PageRank score) is determined by summing the
PageRanks of all pages that point to yours."
In building a mathematical definition of
PageRank, Brin and Page also reasoned that when
an important page points to several places, its
weight (PageRank) should be distributed
proportionately.
In other words, if YAHOO! points to your Web
page, that's good, but you shouldn't receive the
full weight of YAHOO! because they point to many
other places. If YAHOO! points to 999 pages in
addition to yours, then you should only get
credit for 1/1000 of YAHOO!'s PageRank.
Langville, A., Meyer, C. 2004. The Use of
Linear Algebra by Web Search Engines.
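The two rules quoted above (sum the PageRanks pointing at you, and split each page's weight evenly among its links) can be sketched in a few lines. The function names and the PageRank values are made up for illustration; only the 1/1000 split comes from the YAHOO! example.

```python
def contribution_per_link(page_rank, outgoing_links):
    """Weight passed along each link when a page's PageRank is split evenly."""
    return page_rank / outgoing_links


def page_rank_from_inlinks(inlinks):
    """inlinks: list of (PageRank of the linking page, number of links on that page)."""
    return sum(contribution_per_link(rank, count) for rank, count in inlinks)


yahoo_rank = 500.0  # hypothetical PageRank for YAHOO!
share = contribution_per_link(yahoo_rank, 1000)  # your page plus 999 others
print(share)

# Hypothetical page pointed to by YAHOO! and by a small page with 4 links.
total = page_rank_from_inlinks([(yahoo_rank, 1000), (2.0, 4)])
print(total)
```

Notice that the small page passes on as much weight as YAHOO! does here, because it spreads its PageRank over far fewer links.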
26How Google uses Linear Algebra
Google turns the hyperlink structure of the WWW
(a directed graph) into a primitive stochastic
matrix G (The Google Matrix), and then uses the
power method to find the PageRank of each page.
27Building the Google Matrix
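Step 1: Let $H$ be the $n \times n$ matrix with $H_{ij} = 1/(\text{number of links on page } i)$ if page $i$ links to page $j$, and $H_{ij} = 0$ otherwise.
Then H is a nonnegative matrix.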
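As a rough sketch of building H, assume that row i of H spreads page i's weight evenly over the pages it links to (so the entry is 1 divided by the number of links on page i) and is all zeros when page i has no links. The 3-page web and the function name `hyperlink_matrix` are hypothetical.

```python
def hyperlink_matrix(n, links):
    """links maps each page number to the list of distinct pages it links to."""
    H = [[0.0] * n for _ in range(n)]
    for page, targets in links.items():
        for target in targets:
            # Page numbers start at 1, list indices at 0.
            H[page - 1][target - 1] = 1.0 / len(targets)
    return H


# Hypothetical web: page 1 links to 2 and 3, page 2 has no links, page 3 links to 1.
example_links = {1: [2, 3], 2: [], 3: [1]}
H = hyperlink_matrix(3, example_links)
for row in H:
    print(row)
```

Every entry is 0 or a positive fraction, so H is nonnegative, but the row for page 2 is all zeros.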
28Example
Suppose $n = 6$ (day 1 of the WWW).
[Figure: directed graph on pages 1 through 6]
29But we want a primitive stochastic matrix.
Problem: Some rows of H may be all zeros (i.e.,
some pages may have no links). Therefore, H is
not necessarily stochastic.
Step 2:
Replace all zero rows with $e = (1/n, 1/n, \ldots, 1/n)$. Call the resulting matrix $S$.
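Step 2 might look like this in code. The function name `replace_zero_rows` and the 3 x 3 matrix are made up for illustration; the result is called S to match the name used on the next slide.

```python
def replace_zero_rows(H):
    """Return a copy of H in which every all-zero row becomes (1/n, ..., 1/n)."""
    n = len(H)
    uniform = [1.0 / n] * n
    return [list(uniform) if all(x == 0 for x in row) else list(row) for row in H]


# Hypothetical H: page 2 has no links, so its row is all zeros.
H = [[0.0, 0.5, 0.5], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
S = replace_zero_rows(H)
for row in S:
    print(row)
```

After the replacement every row adds up to 1, so S is stochastic.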
30Example (cont.)
[Figure: directed graph on pages 1 through 6]
31We are still not in the clear: S may not be irreducible. Remember: we need a primitive (nonnegative, irreducible, and exactly 1 eigenvalue with absolute value equal to 1) stochastic matrix in order to perform the power method.
Step 3: Let $G = \alpha S + (1 - \alpha) E$, where $0 \leq \alpha \leq 1$ and $E$ is the $n \times n$ matrix with every entry equal to $1/n$.
G is called the Google Matrix.
Google applies the power method to $G^T$ to compute the PageRank of each page.
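Putting Step 3 and the power method together, a sketch might look like the following. It assumes the usual form of the Google Matrix, alpha times S plus (1 - alpha) times a matrix whose entries are all 1/n, with alpha = .85; the 3 x 3 matrix S and all function names are hypothetical.

```python
def google_matrix(S, alpha=0.85):
    """Blend S with the matrix whose entries are all 1/n."""
    n = len(S)
    return [[alpha * S[i][j] + (1.0 - alpha) / n for j in range(n)] for i in range(n)]


def transpose(M):
    return [list(column) for column in zip(*M)]


def power_method(M, iterations=100):
    """Repeatedly multiply by M, rescaling so the largest entry is 1."""
    n = len(M)
    x = [1.0] * n  # start with PageRank 1 for every page
    for _ in range(iterations):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        largest = max(y)
        x = [value / largest for value in y]
    return x


# Hypothetical stochastic matrix S for a 3-page web.
S = [[0.0, 0.5, 0.5], [1 / 3, 1 / 3, 1 / 3], [1.0, 0.0, 0.0]]
G = google_matrix(S)
ranks = power_method(transpose(G))
print(ranks)
```

Every entry of G is positive when alpha is less than 1, which makes G primitive, so the power method settles on a single direction.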
32Google orders all pages according to each page's
PageRank. If page m has the biggest PageRank
(i.e., $g_m \geq g_i$ for all $i$), then page m is the
first result you see after doing a search.
Keep in mind that this algorithm only RANKS
pages. Google does have a separate method for
finding the relevant pages before ranking them.
33To finish our example
[Figure: directed graph on pages 1 through 6]
34Recall
(Google uses $\alpha = .85$)
35Note: Using Matlab, the dominant eigenvector of
$G^T$ is $d = (.1044, .1488, .1160, .7043, .4038, .5425)$.
36Normalize $d = (.1044, .1488, .1160, .7043, .4038, .5425)$ to get $(0.1482, 0.2113, 0.1647, 1.0000, 0.5733, 0.7703)$.
Same EXACT direction
[Figure: directed graph on pages 1 through 6, labeled with these PageRanks]
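The normalization above amounts to dividing every entry of d by its largest entry. A small sketch that reproduces the numbers and reads off the resulting order of the pages (the variable names are made up; the values of d are the ones reported from Matlab):

```python
# Dominant eigenvector as reported on the slide.
d = [0.1044, 0.1488, 0.1160, 0.7043, 0.4038, 0.5425]

# Divide by the largest entry so the top page gets PageRank 1.
largest = max(d)
scaled = [round(x / largest, 4) for x in d]
print(scaled)

# Pages listed from highest PageRank to lowest (pages are numbered from 1).
ranking = sorted(range(1, len(d) + 1), key=lambda page: d[page - 1], reverse=True)
print(ranking)
```

Dividing by a positive number changes the length of the vector but not its direction, so the order of the pages is the same before and after.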
37Why find the dominant eigenvector of $G^T$?
PageRanks being contributed to page 1
Start with a random PageRank for each page
(e.g., assume every page has initial PageRank
equal to 1)
The new PageRanks for each page
Proportion of page 3's PageRank being
contributed to page 1.
Do it again, and again, and again
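In symbols, the procedure described above can be written as follows. The notation $x^{(k)}$ for the vector of PageRanks after $k$ rounds is an assumption of this sketch; $G_{j1}$ is the entry of $G$ in row $j$, column 1.

```latex
x^{(0)} = (1, 1, \ldots, 1)^T, \qquad x^{(k+1)} = G^T x^{(k)}

x^{(k+1)}_1 = G_{11} x^{(k)}_1 + G_{21} x^{(k)}_2 + G_{31} x^{(k)}_3 + \cdots + G_{n1} x^{(k)}_n
```

Here $G_{31}$ is the proportion of page 3's PageRank being contributed to page 1, so the first row of $G^T$ collects everything contributed to page 1. Doing it again and again is the power method applied to $G^T$ (up to normalization), which is why the PageRanks settle on the dominant eigenvector of $G^T$.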
38Did you know??? (Party Knowledge)
- It has been reported that Google computes PageRank once every few weeks for all documents in its web collection.
- The time required by Google to compute the PageRank vector has been reported to be on the order of several days.
- Google's index is larger than that of any other search engine.
- Brin and Page were 23 and 24 (respectively) when they created Google.
- Google claims that 50-100 iterates of the power method are all that's needed.
- The Google Toolbar has a PageRank indicator on it.
- PageRank has been called the world's largest matrix computation.
- The WWW is not an irreducible directed graph. Google's justification for their method is based on the random surfer.
39THANK YOU
References
- Strang, G. Introduction to Linear Algebra, 2nd edition, 1998.
- Brin, S., Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. http://www-db.stanford.edu/backrub/google.html
- Craven, P. Google's PageRank Calculator. http://webworkshop.net/pagerank_calculator.html
- Craven, P. Google's PageRank Explained and how to make the most of it. www.webworkshop.net/pagerank.html
- Day, P. Does Google know what it's doing? http://news.bbc.co.uk/2/hi/business/4436764.stm
- Google. http://www.google.com/technology/
- Langville, A., Meyer, C. The Use of the Linear Algebra by Web Search Engines. 2004. http://www.tufts.edu/mkilme01/siagla/articles/IMAGE.pdf
- Langville, A., Meyer, C. A Survey of Eigenvector Methods for Web Information Retrieval. SIAM Review, v. 47(1), 2005.
- Larson, Edwards. Elementary Linear Algebra, 2nd edition, 1991.
- Meyer, C. Matrix Analysis and Applied Linear Algebra, 2000.
- Rogers, I. The Google Pagerank Algorithm and How It Works. http://www.iprcom.com/papers/pagerank/