Mining the Deep Web for Economic Data

1
Mining the Deep Web for Economic Data
  • Joe Hellerstein
  • Hal R. Varian
  • UC Berkeley
  • http://www.sims.berkeley.edu/hal
  • http://www.cs.berkeley.edu/jmh

2
The Deep Web
  • The deep Web (databases) is about 400 times as
    large as the surface Web
  • There is a lot of interesting data there, if it can
    be harvested
  • Examples
    • FFF
    • Amazon
    • Monster
    • Traffic

3
Federated Facts and Figures
  • http://fff.cs.berkeley.edu
  • Mine political data
  • Collect contributions to the Democratic and
    Republican parties
  • Then cross-tab this with other data
    • Yahoo celebrities list
    • Geographic information system
    • Value of real estate in donors' neighborhoods
    • Clinton pardons list
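
The cross-tab step might look something like the minimal sketch below. It assumes contribution records arrive as (donor name, party, amount) tuples and that a donor matches another list only on an exact name after lowercasing and collapsing whitespace. The record layout, the helper names, and the sample names are all hypothetical; real donor records would need much fuzzier matching.

```python
from collections import Counter


def normalize(name: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial spelling variants match."""
    return " ".join(name.lower().split())


def crosstab_by_party(contributions, other_list):
    """Count contributions and total dollars by party for donors found in other_list.

    contributions: iterable of (donor_name, party, amount) tuples (hypothetical layout).
    other_list: names from a second source, e.g. a celebrity list.
    """
    wanted = {normalize(n) for n in other_list}
    counts = Counter()
    dollars = Counter()
    for donor, party, amount in contributions:
        if normalize(donor) in wanted:
            counts[party] += 1
            dollars[party] += amount
    return counts, dollars


# Made-up records: the two "Jane Doe" rows normalize to the same name.
contribs = [
    ("Jane Doe", "Democratic", 1000),
    ("John  Roe", "Republican", 500),
    ("Ann Smith", "Democratic", 250),
    ("jane doe", "Democratic", 2000),
]
celebs = ["Jane Doe", "John Roe"]
counts, dollars = crosstab_by_party(contribs, celebs)
```

Swapping `celebs` for a pardons list or a set of ZIP codes joined to real-estate values gives the other cross-tabs on the slide, under the same matching assumption.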

4
Celebrity donors
5
Distribution of donors
6
Mining Economic Data
  • Private sector applications
    • Competitive intelligence of various sorts
  • Public sector applications
    • Economic forecasting
    • Labor market histories
  • And more

7
Private Sector Applications
  • SIMS final projects
    • Competitors
    • Media Map
    • Footprint (a hack)
  • Book sales example
    • Courtesy of Madeline Schnapp, O'Reilly & Associates

8
(No Transcript)
9
(No Transcript)
10
Term lists (panels labeled Good and Bad): battleground adversaries
attacking catching up to fights contends opponents competitive challenge
leading market win wins losing arch-rival compete competes competitors
market share
11
Evaluation
             Competitors   NAICS
Precision    87            16
Recall       36            11
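
Precision is the share of companies a method flags as competitors that really are competitors; recall is the share of true competitors the method manages to flag. A minimal sketch, assuming each method's output and the hand-labeled truth are available as sets of company names (the letters below are placeholders, not data from the slides):

```python
def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """Return (precision, recall) of a predicted set against the true set."""
    true_pos = len(predicted & actual)  # flagged AND truly competitors
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


truth = {"A", "B", "C", "D"}   # hand-labeled competitors (hypothetical)
flagged = {"A", "B", "E"}      # companies a method flagged (hypothetical)
p, r = precision_recall(flagged, truth)
```

Here two of the three flagged names are correct (precision 2/3) and two of the four true competitors are found (recall 1/2).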

12
Footprint
  • Companies must file 10-Ks that disclose all
    information material to the value of the
    company
  • Potential bad news ends up in footnotes
  • SEC rules about visual presentation
  • What about the computer-readable versions on EDGAR?

13
Footnotes in SEC Filings
  • Our idea: extract and highlight footnotes, and link
    back to the 10-K
  • Add toenotes to interesting footnotes
  • See results at http://www.sims.berkeley.edu/hal/footprint
  • Deeper project: does content analysis help
    predict stock performance?
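
A rough sketch of just the extraction step, assuming the filing is plain text and each footnote begins on its own line with a heading such as "Note 2. Other Current Liabilities". That heading pattern is an assumption; real EDGAR filings vary widely (and are often HTML), and this is not the Footprint project's actual code. The function name and the sample filing text are hypothetical.

```python
import re

# Assumed heading format, one per line: "Note 2. Other Current Liabilities",
# "NOTE 2 - OTHER CURRENT LIABILITIES", etc. Real filings vary.
NOTE_HEADING = re.compile(
    r"^[ \t]*note[ \t]+(\d+)[ \t]*[.:\-]?[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_notes(filing_text: str) -> dict[int, tuple[str, str]]:
    """Return {note number: (title, body)} for each note heading in the text."""
    headings = list(NOTE_HEADING.finditer(filing_text))
    notes = {}
    for i, match in enumerate(headings):
        # A note's body runs until the next heading, or to the end of the text.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(filing_text)
        body = filing_text[match.end():end].strip()
        notes[int(match.group(1))] = (match.group(2).strip(), body)
    return notes


sample = """NOTES TO CONSOLIDATED FINANCIAL STATEMENTS
Note 1. Summary of Significant Accounting Policies
Revenue is recognized on shipment.
Note 2. Other Current Liabilities
Includes $216 million and $128 million in other current liabilities
for 1997 and 1996, respectively.
"""

notes = extract_notes(sample)
```

Highlighting and linking back to the 10-K would then work from the note numbers and titles returned here.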

14
Example of Footprint
  • Includes $216 million and $128 million in other
    current liabilities for 1997 and 1996,
    respectively.
  • Unaffiliated revenues include sales to
    unconsolidated subsidiaries

15
Intelligence about competitors' sales
  • Courtesy of Madeline Schnapp, formerly of
    O'Reilly & Associates

16
(No Transcript)
17
(No Transcript)
18
Average Rank 5500
19
Amazon Rank Calibration, 5/2001: ranks between 1
and 2000 (chart of Amazon rank vs. units sold)
20
Amazon Rank Calibration, 5/2001: ranks between 1
and 20000 (chart of Amazon data vs. units sold)
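
The slides do not say what functional form the calibration used. One common assumption is a power law, units ≈ a · rank^(−b), fitted by least squares on log(units) against log(rank) using titles whose true sales are known (for example, a publisher's own titles). The sketch below makes that assumption; the function names and the synthetic calibration points are hypothetical.

```python
import math


def fit_power_law(ranks, units):
    """Least-squares fit of log(units) = log(a) - b * log(rank); returns (a, b)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(u) for u in units]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def estimate_units(rank, a, b):
    """Predicted units sold for a given Amazon rank under the fitted power law."""
    return a * rank ** (-b)


# Synthetic calibration points generated from units = 20000 * rank**-0.8.
ranks = [10, 100, 1000, 10000]
units = [20000 * r ** -0.8 for r in ranks]
a, b = fit_power_law(ranks, units)
```

Because the synthetic points lie exactly on a power law, the fit recovers a = 20000 and b = 0.8; with real data the two slides' ranges (1 to 2000 and 1 to 20000) would show whether one curve fits both.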
21
How Well Does it Work?
22
ADDISON WESLEY TITLES
23
(Chart labels: CALCULATED WEEKLY SALES; PUBLISHER)
24
(No Transcript)
25
Monster.com
  • Help wanted
    • By city
    • By occupation
  • Jobs wanted
    • Resume generator
    • Salary aspirations
  • Better than help-wanted ads, since it also has
    job-wanted ads
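
Because listings like these sit behind a search form, harvesting them means issuing one query per combination of form inputs. A minimal sketch that only builds the query URLs, assuming a GET form; the base URL and the `city` and `occupation` parameter names are hypothetical, not Monster's actual interface, and fetching, parsing, and rate limiting are left out.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://jobs.example.com/search"  # hypothetical endpoint


def query_urls(cities, occupations):
    """Yield one search URL per (city, occupation) combination."""
    for city, occupation in product(cities, occupations):
        yield f"{BASE_URL}?{urlencode({'city': city, 'occupation': occupation})}"


urls = list(query_urls(["San Jose", "Houston"], ["engineer", "nurse"]))
```

The number of queries grows as the product of the input domains, which is why enumerating every form combination quickly becomes expensive for larger forms.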

26
Talentmarket
27
Resume data
  • How many job changes are optimal?
  • What is the role of big regional or industry
    employers in job histories?
  • Do immigrants accept lower wages?
  • Regional dynamics (e.g., Silicon Valley, Houston)
  • How long does it take before job seekers check
    "willing to relocate"?

28
Housing market at Craigslist
29
Traffic monitoring (PATH)
Decades of data has been collected by CalTrans
and others
30
Database of traffic conditions
31
Is Traffic a (Leading or Lagging) Indicator of
Regional Economic Activity?
  • Econometric issues: frequency of series,
    weekend/weekday effects
  • Correlate with help wanted/job wanted and
    apartment/housing prices
  • Split out trucks and cars
  • Maybe in the future split out full trucks and empty
    trucks
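
One simple way to ask whether traffic leads or lags another regional series is to correlate the two at a range of lags after putting both on a common frequency (weekly totals, for instance, which also sidesteps the weekend/weekday pattern). This is an illustrative assumption, not the authors' method; a serious analysis would also detrend the series and use proper time-series tests such as Granger causality. The function names and the sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(traffic, other, max_lag):
    """Return {k: corr(traffic[t], other[t + k])} for k in [-max_lag, max_lag].

    A peak at positive k suggests traffic leads `other` by k periods;
    a peak at negative k suggests traffic lags it.
    Both series must be the same length and on the same (e.g. weekly) frequency.
    """
    n = len(traffic)
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            xs, ys = traffic[: n - k], other[k:]
        else:
            xs, ys = traffic[-k:], other[: n + k]
        result[k] = pearson(xs, ys)
    return result


# Made-up weekly series in which `jobs` repeats `traffic` two weeks later.
base = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
traffic = base[2:]
jobs = base[:10]
corrs = lagged_correlations(traffic, jobs, max_lag=3)
best_lag = max(corrs, key=corrs.get)
```

In this toy example the correlation peaks at lag 2, i.e. traffic leads the job series by two weeks; running the same function separately on truck and car counts would address the split suggested on the slide.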