Linear Algebra: When are we ever going to use this stuff?

Slides: 40

Provided by: chrisp4

Transcript and Presenter's Notes

1
Linear Algebra: When are we ever going to use
this stuff?

Dr. Chris Pavone
CSU Chico
October 7, 2006

2
Objectives

Learn a few (6) new concepts in linear algebra.
Test your short-term memory.
Learn how Google uses Linear Algebra to order
search results.
Gain an appreciation for the power of
mathematics.

3
Notation
4
(No Transcript)
5
Recall
(We will be viewing matrices as linear
transformations)
6
Definitions (short-term memory)
7
(No Transcript)
8
(No Transcript)
9
Example
We can also get a matrix from a directed graph
10
(HINT: The WWW is a directed graph!!!)
11
(No Transcript)
12
RECAP

Nonnegative Matrix: all entries $\ge 0$.
Stochastic Matrix: nonnegative and rows add up
to 1.
Irreducible Matrix: there is a path between any
2 nodes on the graph of the matrix (the graph of
the matrix is strongly connected).
Primitive Matrix: nonnegative, irreducible, and
has exactly 1 eigenvalue of largest magnitude
(magnitude 1 for a stochastic matrix).
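The first three definitions in the recap can be checked mechanically. The following is a minimal sketch in plain Python; the function names and the two small test matrices are illustrative assumptions, not taken from the slides. Irreducibility is tested by checking that every node of the matrix's graph can reach every other node.

```python
def is_nonnegative(M):
    """All entries are >= 0."""
    return all(entry >= 0 for row in M for entry in row)


def is_stochastic(M, tol=1e-12):
    """Nonnegative and every row adds up to 1."""
    return is_nonnegative(M) and all(abs(sum(row) - 1.0) <= tol for row in M)


def is_irreducible(M):
    """The graph of M (edge i -> j when M[i][j] != 0) is strongly connected."""
    n = len(M)

    def reachable(start):
        # Depth-first search over the graph of the matrix.
        seen = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if M[i][j] != 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    return all(len(reachable(i)) == n for i in range(n))


# Hypothetical examples: a 3-cycle, and a matrix whose node 0 links only to itself.
cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
reducible = [[1.0, 0.0],
             [0.5, 0.5]]

print(is_stochastic(cycle), is_irreducible(cycle))
print(is_stochastic(reducible), is_irreducible(reducible))
```

The 3-cycle is stochastic and irreducible but not primitive: its eigenvalues are the three cube roots of unity, so all three have magnitude 1. Checking primitivity needs eigenvalues, which this sketch does not compute.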

13
A is nonnegative, stochastic, irreducible, and
primitive.
14
The Power Method

Given a diagonalizable matrix with a dominant
eigenvalue, the power method is an iterative
technique for computing the dominant eigenvector
(i.e., the eigenvector that corresponds to the
dominant eigenvalue).

15
Here is how the power method works:
Your answer will be a vector pointing in the same
direction as the dominant eigenvector. Note
that this says nothing about magnitude.
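The steps on this slide were not captured in the transcript. As a minimal sketch of the usual iteration (multiply by the matrix, normalize, repeat), assuming Euclidean normalization and a hypothetical 3 x 3 matrix chosen only because its dominant eigenvector points along (1, 1, 0):

```python
import math


def power_method(A, x0, iterations=50):
    """Approximate the direction of the dominant eigenvector of A."""
    x = list(x0)
    for _ in range(iterations):
        # Multiply: y = A x
        y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
        # Normalize to unit Euclidean length so the entries stay bounded.
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


# Hypothetical matrix (not the one from the slides): eigenvalues 3, 1, 1,
# with dominant eigenvector along (1, 1, 0).
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]

v = power_method(A, [1.0, 0.0, 1.0])
print([round(c, 4) for c in v])  # approximately [0.7071, 0.7071, 0.0]
```

The starting vector must have a nonzero component along the dominant eigenvector; otherwise the iteration cannot find it.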
16
Example
The power method should give us a vector pointing
in the same direction as $x_1 = (.7071, .7071, 0)$.
17
normalize
18
Here is why it works
Normalize
19
Notice how crucial it is that we have a largest
eigenvalue and a diagonalizable matrix. This
guarantees convergence.
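The derivation itself did not survive in the transcript. The standard argument runs as follows, with assumed notation: $v_1, \dots, v_n$ are eigenvectors of the diagonalizable matrix $A$, with eigenvalues ordered so that $|\lambda_1| > |\lambda_2| \ge \dots \ge |\lambda_n|$, and $x_0$ is the starting vector.

```latex
\begin{align*}
x_0 &= c_1 v_1 + c_2 v_2 + \cdots + c_n v_n, \qquad c_1 \neq 0 \\
A^k x_0 &= c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + \cdots + c_n \lambda_n^k v_n \\
        &= \lambda_1^k \left( c_1 v_1
           + \sum_{i=2}^{n} c_i \left( \frac{\lambda_i}{\lambda_1} \right)^{k} v_i \right)
\end{align*}
```

Diagonalizability is what allows $x_0$ to be written in terms of eigenvectors, and the dominant eigenvalue makes every ratio $|\lambda_i / \lambda_1|$ smaller than 1, so the sum shrinks to zero as $k$ grows. After normalizing, only the direction of $v_1$ remains.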
20
(No Transcript)
21
Disclaimer
I am not a computer scientist, and I don't claim
to know anything about computers, the internet,
hacking, or Bill Gates. The following is the
gist of what I learned from doing a little
research on the Google search algorithm.
Everything I learned I got from the internet,
research articles, and various linear algebra
books. There were some small variations in what
I found, and I have left out the annoying little
technicalities (I have references for those
interested). THIS IS JUST THE GIST. Don't ask
me why Google does what it does, it just does.

22
Flashback to 1997
Sergey Brin received his B.S. degree in
mathematics and computer science from the
University of Maryland at College Park in 1993.
Currently, he is a Ph.D. candidate in computer
science at Stanford University where he received
his M.S. in 1995. He is a recipient of a National
Science Foundation Graduate Fellowship. His
research interests include search engines,
information extraction from unstructured sources,
and data mining of large text collections and
scientific data.
Lawrence Page was born in East Lansing, Michigan,
and received a B.S.E. in Computer Engineering at
the University of Michigan Ann Arbor in 1995. He
is currently a Ph.D. candidate in Computer
Science at Stanford University. Some of his
research interests include the link structure of
the web, human computer interaction, search
engines, scalability of information access
interfaces, and personal data mining.
These guys created Google
using Linear Algebra.
23
Taken from "Does Google know what it's doing?",
BBC, 11-15-05. Referring to Google's first office
(1998): "It was unreconstructed 1960s
California: bikes in the corridors, lava lamps
everywhere, the famous ex-Grateful Dead chef
cooking delights in the Google canteen, a grand
piano in reception for the Google PhDs to tinkle
on during breaks." Referring to Brin and Page
back in '98: "the founding duo was not at all
clear about what the business plan actually was."
Now: "It is taking a million square feet of
NASA property in Silicon Valley for a new
Googleplex to create space for its army of PhDs
who spend their time dreaming up ways of making
search better." "Its main asset is the number
of PhDs it has working for it, ceaselessly trying
to figure out how to extend the principle of
search into everything, unbounded by time, space
and (soon) language barriers." "The company
refuses to hire people more than a year or two
out of university, for fear that experience in
the conventional business world will taint their
freshness of mind."
24
Number of pages in Google's index as of 11/04
(http://blog.searchenginewatch.com/blog/041111-084221)
But how does it work?
25
The Basic Idea
When you do a search, Google finds all the
relevant pages, and then orders the results using
PageRank. PageRank is a numeric value assigned
to each page that represents how important that
page is.
"... a page is important if it is pointed to by
other important pages. That is, they [Brin &
Page] decided that the importance of your page
(its PageRank score) is determined by summing the
PageRanks of all pages that point to yours.
In building a mathematical definition of
PageRank, Brin and Page also reasoned that when
an important page points to several places, its
weight (PageRank) should be distributed
proportionately.
In other words, if YAHOO! points to your Web
page, that's good, but you shouldn't receive the
full weight of YAHOO! because they point to many
other places. If YAHOO! points to 999 pages in
addition to yours, then you should only get
credit for 1/1000 of YAHOO!'s PageRank."
Langville, A., Meyer, C. 2004. The Use of
Linear Algebra by Web Search Engines.
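The quoted idea can be written as a single summation. The notation here is an assumption introduced for illustration: $r(P)$ is the PageRank of page $P$, $B_{P_i}$ is the set of pages that point to $P_i$, and $|P_j|$ is the number of links on page $P_j$.

```latex
r(P_i) = \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|P_j|}
```

In the YAHOO! example, $|P_j| = 1000$, so YAHOO! contributes $r(\text{YAHOO!})/1000$ to your page's sum.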
26
How Google uses Linear Algebra
Google turns the hyperlink structure of the WWW
(a directed graph) into a primitive stochastic
matrix G (The Google Matrix), and then uses the
power method to find the PageRank of each page.
27
Building the Google Matrix
Then H is a nonnegative matrix.
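The slide's formula for H was not captured in the transcript. A minimal sketch, assuming the usual row-normalized hyperlink matrix (entry (i, j) is 1 divided by the number of links on page i if page i links to page j, and 0 otherwise). The function name and the six-page link list are hypothetical, not the graph from the slides.

```python
def hyperlink_matrix(links, n):
    """Build H from a dict mapping each page (1..n) to the distinct pages it links to."""
    H = [[0.0] * n for _ in range(n)]
    for i, targets in links.items():
        for j in targets:
            # Page i splits its weight evenly among the pages it points to.
            H[i - 1][j - 1] = 1.0 / len(targets)
    return H


# Hypothetical six-page web; page 2 has no outgoing links.
links = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}
H = hyperlink_matrix(links, 6)

for row in H:
    print([round(entry, 3) for entry in row])
```

Every entry is either 0 or a positive fraction, so H is nonnegative. The row for page 2 is all zeros because that page links nowhere.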
28
Example
Suppose $n = 6$ (day 1 of the WWW)
(Figure: directed graph on pages 1 through 6.)
29
But we want a primitive stochastic matrix.
Problem: Some rows of H may be all zeros (i.e.,
some pages may have no links). Therefore, H is
not necessarily stochastic.
Step 2:
Replace all zero rows with $e = (1/n, 1/n, \dots, 1/n)$.
Call the resulting matrix S.
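Step 2 is a one-line transformation. A minimal sketch, with a hypothetical function name and a small 3 x 3 matrix that is not from the slides; the result plays the role of the matrix the later slides call S.

```python
def fix_dangling_rows(H):
    """Replace every all-zero row of H with (1/n, ..., 1/n); copy other rows unchanged."""
    n = len(H)
    return [row[:] if any(row) else [1.0 / n] * n for row in H]


# Hypothetical 3-page example: page 2 has no links, so its row is all zeros.
H = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
S = fix_dangling_rows(H)

for row in S:
    print([round(entry, 3) for entry in row])
```

After this step every row adds up to 1, so the matrix is stochastic.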
30
Example (cont.)
(Figure: directed graph on pages 1 through 6.)
31
We are still not in the clear: S may not be
irreducible. Remember: We need a primitive
(nonnegative, irreducible, and exactly 1
eigenvalue with absolute value equal to 1)
stochastic matrix in order to perform the power
method.
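Step 3: Let $G = \alpha S + (1 - \alpha) E$,
where $0 \le \alpha \le 1$ and E is the $n \times n$
matrix with every entry equal to $1/n$.
G is called the Google Matrix.
Google applies the power method to $G^T$ to compute
the PageRank of each page.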
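Step 3 and the power-method step can be sketched together. This assumes the usual form of the Google Matrix, in which each entry of the stochastic matrix from Step 2 is scaled by alpha and (1 - alpha)/n is added to it. The function names and the 3 x 3 matrix S are hypothetical, and each iterate is normalized by dividing by its largest entry.

```python
def google_matrix(S, alpha=0.85):
    """Blend S with the uniform matrix: alpha * S[i][j] + (1 - alpha) / n."""
    n = len(S)
    return [[alpha * S[i][j] + (1.0 - alpha) / n for j in range(n)]
            for i in range(n)]


def pagerank(G, iterations=100):
    """Power method applied to the transpose of G, starting from all ones."""
    n = len(G)
    x = [1.0] * n
    for _ in range(iterations):
        # y = G^T x: entry i collects column i of G weighted by x.
        y = [sum(G[j][i] * x[j] for j in range(n)) for i in range(n)]
        biggest = max(y)
        x = [v / biggest for v in y]
    return x


# Hypothetical stochastic matrix (not the one from the slides).
S = [[0.0, 0.5, 0.5],
     [1.0 / 3, 1.0 / 3, 1.0 / 3],
     [1.0, 0.0, 0.0]]
G = google_matrix(S)
ranks = pagerank(G)
print([round(r, 4) for r in ranks])
```

With alpha strictly between 0 and 1, every entry of G is positive, which is enough to make G primitive as well as stochastic.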
32
Google orders all pages according to each page's
PageRank. If page m has the biggest PageRank
(i.e., $g_m \ge g_i$ for all i), then page m is the
first result you see after doing a search.
Keep in mind that this algorithm only RANKS
pages. Google does have a separate method for
finding the relevant pages before ranking them.
33
To finish our example
(Figure: directed graph on pages 1 through 6.)
34
Recall
(Google uses $\alpha = .85$)
35
Note: Using Matlab, the dominant eigenvector of
$G^T$ is d = (.1044, .1488, .1160, .7043, .4038,
.5425)
36
Normalize d = (.1044, .1488, .1160, .7043, .4038,
.5425) to get (0.1482, 0.2113, 0.1647, 1.0000,
0.5733, 0.7703)
Same EXACT direction
(Figure: the graph on pages 1 through 6, labeled
with PageRanks.)
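The normalization on this slide divides every entry of d by its largest entry. A short check of that arithmetic, using the values from the slide (reading the last entry as .5425), plus the page ordering they imply:

```python
# Dominant eigenvector reported on the slide (last entry read as .5425).
d = [0.1044, 0.1488, 0.1160, 0.7043, 0.4038, 0.5425]

# Divide by the largest entry so the top page scores exactly 1.
biggest = max(d)
normalized = [round(v / biggest, 4) for v in d]
print(normalized)  # [0.1482, 0.2113, 0.1647, 1.0, 0.5733, 0.7703]

# Pages (numbered from 1) sorted from highest to lowest PageRank.
order = sorted(range(1, len(d) + 1), key=lambda page: d[page - 1], reverse=True)
print(order)  # [4, 6, 5, 2, 3, 1]
```

Scaling does not change the ordering, which is why only the direction of the eigenvector matters for ranking.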
37
Why find the dominant eigenvector of $G^T$?
PageRanks being contributed to page 1
Start with a random PageRank for each page
(e.g., assume every page has initial PageRank
equal to 1)
The new PageRanks for each page
Proportion of page 3's PageRank being
contributed to page 1.
Do it again, and again, and again...
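One pass of "do it again" is a single multiplication by the transpose of G. A minimal sketch, assuming a hypothetical 3 x 3 Google Matrix (not the six-page example from the slides), showing how the new PageRank of page 1 is the sum of what each page contributes to it:

```python
# Hypothetical 3 x 3 Google Matrix; each row adds up to 1.
G = [[0.05, 0.475, 0.475],
     [1.0 / 3, 1.0 / 3, 1.0 / 3],
     [0.9, 0.05, 0.05]]


def step(G, x):
    """One update x_new = G^T x."""
    n = len(G)
    return [sum(G[j][i] * x[j] for j in range(n)) for i in range(n)]


# Start with PageRank 1 for every page.
x = [1.0, 1.0, 1.0]

# G[j][0] is the proportion of page j+1's PageRank passed to page 1.
contributions_to_page_1 = [G[j][0] * x[j] for j in range(3)]
x_new = step(G, x)

print(contributions_to_page_1)
print(x_new)
```

Because the rows of G add up to 1, each step only redistributes PageRank and the total stays the same. Repeating the step is the power method, so the vector settles on the dominant eigenvector of the transpose of G.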
38
Did you know??? (Party Knowledge)

It has been reported that Google computes
PageRank once every few weeks for all documents
in its web collection.
The time required by Google to compute the
PageRank vector has been reported to be on the
order of several days.
Google's index is larger than any other search
engine's.
Brin and Page were 23 and 24 (respectively) when
they created Google.
Google claims that 50-100 iterates of the power
method are all that's needed.
The Google Toolbar has a PageRank indicator on
it.
PageRank has been called the world's largest
matrix computation.
The WWW is not an irreducible directed graph.
Google's justification for their method is based
on the random surfer.

39
THANK YOU
References

Strang, G. Introduction to Linear Algebra, 2nd
edition, 1998.
Brin, S., Page, L. The Anatomy of a Large-Scale
Hypertextual Web Search Engine.
http://www-db.stanford.edu/backrub/google.html
Craven, P. Google's PageRank Calculator.
http://webworkshop.net/pagerank_calculator.html
Craven, P. Google's PageRank Explained and how to
make the most of it.
www.webworkshop.net/pagerank.html
Day, P. Does Google know what it's doing?
http://news.bbc.co.uk/2/hi/business/4436764.stm
Google. http://www.google.com/technology/
Langville, A., Meyer, C. The Use of the Linear
Algebra by Web Search Engines. 2004.
http://www.tufts.edu/mkilme01/siagla/articles/IMAGE.pdf
Langville, A., Meyer, C. A Survey of Eigenvector
Methods for Web Information Retrieval. SIAM
Review, v. 47(1), 2005.
Larson & Edwards. Elementary Linear Algebra, 2nd
edition, 1991.
Meyer, C. Matrix Analysis and Applied Linear
Algebra, 2000.
Rogers, I. The Google Pagerank Algorithm and How
It Works. http://www.iprcom.com/papers/pagerank/