Data Cloud - PowerPoint PPT Presentation

About This Presentation
Title:

Data Cloud

Slides: 41
Provided by: yah94

Transcript and Presenter's Notes

Title: Data Cloud


1
Data Cloud
  • Yury Lifshits
  • Yahoo! Research
  • http://yury.name

2
My Beliefs
  • The key challenge in web search is structured
    search
  • Part 1: What is structured search?
  • The key challenge in structured search is
    collecting data
  • Part 2: Data distribution: the idea of Data Cloud
  • Part 3: Demo: numeric data distribution
  • The key challenge in collecting data is incentive
    design
  • Part 4: Economics of data distribution

3
  • Structured
  • Search

4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
Data
  • Data = data of entities + data of content
  • Semi-structured data
      • Content unit
          • Body: text, video, audio, or image
          • Metadata
              • Explicit key-value pairs
              • Relational properties
          • Evaluation
  • Structured data
      • Entity unit
          • Identifier
          • Metadata
              • Explicit key-value pairs
              • Relational properties
          • Evaluation
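
The two unit types on this slide can be sketched as plain record types. This is only an illustration: the class names, field names, and the numeric type for "evaluation" are assumptions, since the slide lists the components without defining a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Metadata:
    # Explicit key-value pairs, e.g. {"city": "SF"}.
    properties: Dict[str, str] = field(default_factory=dict)
    # Relational properties: relation name -> identifiers of related entities.
    relations: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class EntityUnit:
    """Structured data: an identified entity with metadata and an evaluation."""
    identifier: str
    metadata: Metadata = field(default_factory=Metadata)
    evaluation: Optional[float] = None  # assumed to be a numeric rating


@dataclass
class ContentUnit:
    """Semi-structured data: a body plus the same metadata and evaluation."""
    body: str  # text, or a reference to a video, audio, or image file
    metadata: Metadata = field(default_factory=Metadata)
    evaluation: Optional[float] = None


# Hypothetical example: a concert entity and a review that refers to it.
concert = EntityUnit(
    identifier="event:example-concert",
    metadata=Metadata(properties={"city": "SF"},
                      relations={"performer": ["artist:example-band"]}),
    evaluation=4.5,
)
review = ContentUnit(
    body="Great show.",
    metadata=Metadata(relations={"about": [concert.identifier]}),
)
```

The only difference between the two types in this sketch is that an entity carries an identifier while a content unit carries a body; the relation from the review to the concert is what links content to entities.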

12
Structured Search
  • Factoid search
  • "What's the value of property X of object Y?"
  • Entity hubs
  • Domain hubs
  • Structured object search
  • "all concerts this weekend in SF under $20 sorted
    by popularity"
  • Time focus
  • Ranking focus
  • Relations focus
  • Structured content search
  • "all videos with Tom Brady"
  • "all comments and blog posts about Bing"
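
A structured object search such as the concert query above amounts to filtering on typed properties and then ranking. The sketch below shows that shape on a few made-up records; the `Event` fields, the sample data, and the `search` helper are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    name: str
    kind: str
    city: str
    day: date
    price: float
    popularity: int


def search(events, kind, city, days, max_price):
    """Filter on structured properties, then rank by popularity (descending)."""
    hits = [
        e for e in events
        if e.kind == kind and e.city == city
        and e.day in days and e.price < max_price
    ]
    return sorted(hits, key=lambda e: e.popularity, reverse=True)


# Made-up sample data.
events = [
    Event("A", "concert", "SF", date(2009, 8, 1), 15.0, 120),
    Event("B", "concert", "SF", date(2009, 8, 2), 18.0, 300),
    Event("C", "concert", "SF", date(2009, 8, 2), 45.0, 900),  # too expensive
    Event("D", "concert", "LA", date(2009, 8, 1), 10.0, 500),  # wrong city
    Event("E", "lecture", "SF", date(2009, 8, 1), 0.0, 50),    # wrong kind
]
weekend = {date(2009, 8, 1), date(2009, 8, 2)}

results = search(events, "concert", "SF", weekend, max_price=20)
print([e.name for e in results])  # ['B', 'A']
```

The query decomposes into the focuses the slide names: a time filter (the weekend), a ranking criterion (popularity), and property constraints (kind, city, price).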

13
Yury's Wishlist
  • Business-generated data
  • Products, services, news, wishlists, contact data
  • Reality stream, sensors
  • What has happened, and where
  • Expert knowledge
  • Glossary, issues, typical solutions, object
    databases, related objects graph
  • Events
  • Sport, concerts, education, corporate, community,
    private
  • Market graph signals
  • Like, interested, use, following, want to buy,
    votes and ratings

14
Search as a Platform
(Diagram not transcribed; only the label "App 4" survives.)
15
  • Data Cloud

How to collect all structured data in one place?
16
Data Producers
  • People: forums, wikis, mail groups, blogs, social
    networks
  • Enterprises: product profiles, corporate news,
    professional content
  • Sensors: GPS modules, web cameras, traffic
    sensors, RFID
  • Transactional data

17
Data Distributors
  • A data distributor is any technical solution that
    accumulates, organizes, and provides access to
    structured and semi-structured data
  • Data publisher: the original distributor of some
    data
  • Data retailer: a consumer-facing distributor of
    some data

18
Data Consumers
  • Humans
  • Email
  • Aggregators: news, friend feeds, RSS readers
  • Search
  • Browsing / random walks
  • Intelligence projects
  • Recommendation systems
  • Trend mining

19
Data Cloud
  • Data Cloud is a centralized, fully functional
    data distribution service
  • Success metric for a data cloud strategy: the
    total value of data on the cloud

20
To-Cloud Solutions
  • Extraction
  • DBpedia.org, web tables
  • Semantic markup, data APIs
  • Yahoo! SearchMonkey
  • Feeds
  • Yahoo! Shopping
  • Disqus.com, js-kit.com, Facebook Connect
  • Direct publishing

21
On-Cloud Solutions
  • Ontology maintenance
  • Freebase
  • Normalization, de-duplication, antispam
  • Named entity recognition, metadata
    inference, ranking
  • Data recycling (cross-references)
  • Amazon Public Data Sets
  • Viral license
  • Hosted search
  • Yahoo! BOSS

22
From-Cloud Solutions
  • Search, audience
  • Y! SearchMonkey, Google Base
  • Data API, dump access, update stream
  • Custom notifications
  • Gnip.com
  • Data cloud as a primary backend
  • Access control
  • Ad distribution (AT&T and Yahoo! Local deal)

23
  • Demo
  • webNumbr.com

Joint work with Paul Tarjan
24
(No Transcript)
25
webNumbr.com Import
  • Crawl numbers from the web
  • URL + XPath + regex
  • Create numbr pages
  • Update their values every hour
  • Keep the history
  • Anyone can create a numbr
  • http://webnumbr.com/create
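
The import recipe (a URL, an XPath to locate a node, and a regex to pull the number out of its text) can be sketched as follows. This is not webNumbr's actual code: the `extract_number` helper and the sample page are hypothetical, the page is an inline string rather than a fetched URL, and the standard library's `ElementTree` supports only a subset of XPath and requires well-formed markup.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def extract_number(page_source, xpath, pattern):
    """Select a node with XPath, then take the first regex group from its text."""
    root = ET.fromstring(page_source)
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    match = re.search(pattern, node.text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


# Stand-in for a page that would normally be downloaded from the URL.
page = """<html><body>
  <div id="stats"><span class="count">Followers: 1,234</span></div>
</body></html>"""

# "Keep the history": store each observation with a timestamp.
history = []
value = extract_number(page, ".//span[@class='count']", r"([\d,]+)")
history.append((datetime.now(timezone.utc), value))
print(value)  # 1234.0
```

An hourly update would rerun `extract_number` on a fresh copy of the page and append to `history`; real-world HTML usually needs a more tolerant parser than the one assumed here.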

26
webNumbr.com Export
  • Embed code
  • Graphs
  • Search and browse
  • RSS

27
  • Economics of Data Distribution

Joint work with Ravi Kumar and Andrew Tomkins
28
Network Effect in Two-Sided Markets
  • Two-sided market: every product serves consumers
    of two types, A and B
  • Cross-side network effect: the more type-A users
    product X has, the more attractive it is for
    type-B consumers, and vice versa
  • Examples: operating systems, credit cards,
    e-commerce marketplaces
  • "Two-sided network effects: A theory of
    information product design"
  • G. Parker, M.W. Van Alstyne, N. Bulkley, M. Van
    Alstyne

29
Basic model
  • Distributors $D_1, \dots, D_k$
  • Each producer/consumer joins only one distributor
  • Initial shares $(p_1, c_1), \dots, (p_k, c_k)$
  • A new consumer selects a distributor with
    probability proportional to $p_i$
  • A new producer selects a distributor with
    probability proportional to $c_i$
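
The basic model can be simulated in a few lines. The sketch below follows the rules on this slide, but the arrival schedule (one consumer, then one producer, per step) and the `simulate` function itself are assumptions; the slide does not say in what order newcomers arrive.

```python
import random


def simulate(producers, consumers, steps, seed=0):
    """Grow k distributors by cross-side preferential attachment.

    producers[i] and consumers[i] are the initial counts p_i and c_i.
    """
    rng = random.Random(seed)
    p, c = list(producers), list(consumers)
    k = len(p)
    for _ in range(steps):
        # A new consumer picks distributor i with probability proportional to p_i.
        i = rng.choices(range(k), weights=p)[0]
        c[i] += 1
        # A new producer picks distributor j with probability proportional to c_j.
        j = rng.choices(range(k), weights=c)[0]
        p[j] += 1
    return p, c


p, c = simulate(producers=[1, 1], consumers=[1, 1], steps=1000)
print("producer shares:", [x / sum(p) for x in p])
print("consumer shares:", [x / sum(c) for x in c])
```

Running it with different seeds gives a feel for how strongly the early arrivals shape the final shares.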

30
Basic model
(Diagram of the basic model; only the node labels a1 to a4 survive.)
31
Market Shares Dynamics
  • Theorem 1: Market shares will stabilize
  • Theorem 2: With a super-linear preference rule,
    the market will tip to one of the distributors
  • Theorem 3: With a sub-linear preference rule,
    market shares will flatten
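
The intuition behind tipping and flattening can be seen by comparing, for a fixed split, the probability that the leader wins the next newcomer. The slides do not define the super-linear and sub-linear rules, so the power-law form below (weight proportional to share raised to `alpha`) is an assumption, and the snippet illustrates the direction of the effect rather than proving the theorems.

```python
def choice_probabilities(shares, alpha):
    """Probability of picking each distributor when weight = share ** alpha."""
    weights = [s ** alpha for s in shares]
    total = sum(weights)
    return [w / total for w in weights]


shares = [0.6, 0.4]
linear = choice_probabilities(shares, 1.0)       # leader gets 0.6 of newcomers
superlinear = choice_probabilities(shares, 2.0)  # leader gets about 0.69
sublinear = choice_probabilities(shares, 0.5)    # leader gets about 0.55

print(linear[0], superlinear[0], sublinear[0])
```

With `alpha > 1` the leader attracts more than its current share of newcomers, so its lead grows; with `alpha < 1` it attracts less than its share, so the gap shrinks.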

32
External Factor
  • Preference rule with external factor:
  • $e_i c_i / (c_1 + \dots + c_k)$

Theorem 4: Market shares will stabilize at $e_1 : e_2 : \dots : e_k$
33
Coalition
Data Cloud
34
Coalitions
  • Theorem 5: If all market shares are below
    $1/\sqrt{k}$, a coalition (sharing data) is
    profitable for all distributors
  • Corollary: Coalitions are not monotone
  • Example: 5 4 1 1
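
The condition in Theorem 5 is easy to check numerically. The sketch below only tests the threshold (every share below 1/sqrt(k)); it does not model profitability. The function name is hypothetical, and treating the slide's numbers 5, 4, 1, 1 as relative sizes to be normalized into shares is an assumption used here purely as sample input.

```python
import math


def coalition_condition_holds(sizes):
    """True if every market share is below 1/sqrt(k), k = number of distributors."""
    total = sum(sizes)
    threshold = 1 / math.sqrt(len(sizes))
    return all(size / total < threshold for size in sizes)


# Shares 5/11, 4/11, 1/11, 1/11; threshold 1/sqrt(4) = 0.5.
print(coalition_condition_holds([5, 4, 1, 1]))  # True

# A dominant distributor: share 0.8 exceeds 1/sqrt(3), about 0.577.
print(coalition_condition_holds([8, 1, 1]))  # False
```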

35
Model Variations
  • Same-side network effect
  • Different p-to-c and c-to-p rules
  • Multi-homing (overlapping audiences)
  • $n^2$ vs. $n \log n$ revenue models
  • Mature market: newcomer rate = departing rate
  • Diverse market (many types of producers and
    consumers)
  • Newcoming and departing distributors
  • Directed coalitions

36
  • Challenges

37
Marketing
  • Data demand?
  • Data offerings?
  • Requirements for distribution technology?

38
Incentive design
  • Incentives for data sharing?
  • Centralized or distributed?
  • For profit or non-profit?
  • Data licensing and ownership?
  • Monetizing data cloud?

39
More Challenges
  • Prototyping
      • Data marketplace: open data, data demand
      • Search plugins: related objects, glossaries,
        object timelines
      • Publishing tools for structured data
      • Data client: structured news, bookmarking,
        notifications
  • Tech design
      • Access management
      • Namespace design
  • User interface
      • Structured search UI
      • Discovery UI

40
  • Thanks!
  • Follow my research:
  • http://twitter.com/yurylifshits
  • http://yury.name/blog