Title: Bits about Bits: Bitzi and the Business of Metadata
2Bits about Bits: Bitzi and the Business of Metadata
- Gordon Mohr
- Bitzi Corporation
- Founder & Chief Technology Officer
- September 17, 2001
3Bits about Bits: Bitzi and Open, Cooperative Metadata
- Gordon Mohr
- Bitzi Corporation
- Founder & Chief Technology Officer
- November 7, 2001
4Overview
- P2P File Sharing: a cornucopia without confidence
- Four Missing Ingredients
- The Bitzi Approach
- Demos
- Future Directions
- Could metadata be a big business?
5Everything is now Bits
- Anything can be encoded, stored, shifted, shared
- The cloud is coming to include everything
- Tech and social trends are against strict control
... and Bits Move Freely
6No Confidence or Context
- You can get anything imaginable, BUT
- Is it complete? Where did it originate?
- Has it been damaged or altered?
- Is this the best or current instance?
- What's related? Is it legitimate?
- What should I seek next?
- Current ad hoc P2P sharing/distribution nets inherently blur these issues
- Filename-centric
- Mr. Short-Term Memory
7What's Missing?
- We're craving four things
- Reliable Names
- Nothing can masquerade as something else
- Easy to ask for exactly the right thing
- Rich Metadata
- Beyond just filename and length
- Easy Access
- Everywhere the files are, and then some
- A Consensus View
- Eliminate frivolous skew of understanding
8We Want Reliable Names
- Does a file have a True Name?
- Yes, via Cryptographic Hashes
- Essentially, these are digital fingerprints
- Any-sized input (any digital file) to fixed-sized output (hash value)
- Deterministic but unpredictable
- Infeasible to create specific desired hash value
- Infeasible to find two inputs with same hash value
- Examples
- MD5 (but maybe not as reliably as once thought)
- SHA1 (and now SHA256, SHA512)
- Tiger
- RIPEMD160
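The fingerprint properties above can be sketched with Python's standard hashlib module. This is only an illustration: the function name `fingerprint` and the sample inputs are invented here, and Tiger and RIPEMD160 are left out because hashlib does not guarantee them on every platform.

```python
import hashlib

def fingerprint(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of data under the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

# Illustrative inputs that differ by a single character.
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

# Any-sized input, fixed-sized output: SHA1 is always 160 bits (40 hex digits).
short_len = len(fingerprint(b""))
long_len = len(fingerprint(original * 1000))
print(short_len, long_len)

# Deterministic: the same bytes always give the same fingerprint.
print(fingerprint(original) == fingerprint(original))

# Unpredictable: a one-character change gives an unrelated fingerprint.
print(fingerprint(original))
print(fingerprint(altered))

# Other guaranteed algorithms are selected by name.
for name in ("md5", "sha256", "sha512"):
    print(name, fingerprint(original, name))
```

Because the fingerprint depends only on the bytes, two people holding the same file compute the same name for it without coordinating, whatever the file happens to be called locally.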
9We Want Rich Metadata
- Metadata is Data about other Data
- Filename and Length are a trivial start
- Intrinsic or extrinsic to file itself
- Examples
- Generic: Origin, Free-form description, Comments, Community Ratings
- Format-specific: Encoding parameters, Resolution, Playback length
- Growing body of useful standards and conventions
- XML, RDF, Dublin Core, domain-specific proposals
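As a rough sketch of what a richer record might look like, the snippet below wraps the trivial filename and length in an XML element and adds a few generic descriptive fields using Dublin Core element names. The `file` element, the `describe_file` helper, and all sample values are hypothetical; only the Dublin Core namespace URI is a published convention.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace (a published convention).
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def describe_file(filename: str, length: int, descriptive: dict) -> ET.Element:
    """Build a record: filename/length as attributes, Dublin Core fields as children.

    The <file> element and its attribute names are hypothetical, for illustration.
    """
    record = ET.Element("file", {"name": filename, "length": str(length)})
    for field, value in descriptive.items():
        child = ET.SubElement(record, f"{{{DC}}}{field}")
        child.text = value
    return record

# Sample values are invented for the example.
record = describe_file(
    "talk.mp3",
    4194304,
    {
        "title": "Example talk",
        "creator": "Example speaker",
        "description": "Free-form description contributed by a listener",
        "format": "audio/mpeg",
    },
)
print(ET.tostring(record, encoding="unicode"))
```

Format-specific fields such as encoding parameters or playback length would be added the same way, under whatever domain-specific vocabulary applies.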
10We Want Easy Access
- Ubiquity
- Anywhere the files are, and where they're not
- Simplicity
- Familiar interfaces
- Reliability
- Canonical location
- Redundant Mirrors
- Multiple paths: same paths as files
11We Want A Consensus View
- Avoid redundant efforts
- Achieve convergence on simple issues
- Trivial disagreements and mistakes should be quickly and permanently resolved
- Robustness against casual mischief
- Capture and highlight enduring disagreements
- Even arbitrary commonality is valuable
- Naming systems
- A central reference point is the easy solution
12The File Trust Utility
13The Bitzi Approach
- A metadata aggregator, consisting of
- Website
- Community of contributors
- Editorial/rating policies
- Canonical datastore
- Web service
- Free access and reuse
- Just give us attribution
- Other restrictions only get in the way
- Our long-term role: stewardship
- We live or die by the usefulness of the dataset
14Sources of Inspiration
- Open Directory Project
- AKA NewHoo, GnuHoo, DMoz(illa)
- Volunteer-built Yahoo-like categorical web index
- CD/Music projects
- CDDB (before dataset lockdown)
- FreeDB, MusicBrainz (since)
- Oxford English Dictionary
- The Professor and the Madman
- Napster et al.
- De facto quality filtering
- Usenet (esp. FAQs), Epinions, Amazon reviews, eBay, Zagat
15How Bitzi Works: Bitprints & Tickets
Every discrete file out there can be boiled down
to
Over time, the Tickets in our database collect
all the best metadata about the corresponding
original file.
At no point does Bitzi receive, store, transmit,
or link to actual files; we deal strictly in
Bitprints and metadata.
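One way to picture this is a catalog keyed by file fingerprint that accumulates contributed metadata and reports the best-supported value for each field, without ever holding file contents. The sketch below is an assumption-laden illustration, not Bitzi's actual datastore: the `Catalog` class, its method names, the simple additive rating scheme, and the use of a plain SHA1 digest as the key are all hypothetical.

```python
import hashlib
from collections import defaultdict

class Catalog:
    """Maps a file fingerprint to rated metadata contributions; holds no file bytes."""

    def __init__(self):
        # fingerprint -> field -> value -> accumulated rating
        self._votes = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def contribute(self, fingerprint: str, field: str, value: str, rating: int = 1) -> None:
        """Record one contributor's claim about a field of the fingerprinted file."""
        self._votes[fingerprint][field][value] += rating

    def ticket(self, fingerprint: str) -> dict:
        """Return the best-rated value per field (ties go to the earliest value)."""
        fields = self._votes.get(fingerprint, {})
        return {field: max(values, key=values.get) for field, values in fields.items()}

def fingerprint_of(data: bytes) -> str:
    """Hypothetical stand-in key: a plain SHA1 hex digest computed by the contributor."""
    return hashlib.sha1(data).hexdigest()

catalog = Catalog()
key = fingerprint_of(b"example file contents")  # computed locally; only the key is sent

catalog.contribute(key, "title", "Example Song")
catalog.contribute(key, "title", "Exmaple Song")   # a typo from another contributor
catalog.contribute(key, "title", "Example Song")
catalog.contribute(key, "format", "audio/mpeg")

print(catalog.ticket(key))
# {'title': 'Example Song', 'format': 'audio/mpeg'}
```

The catalog only ever sees the key and the claims made about it, which is the sense in which such a service deals in fingerprints and metadata rather than files.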
16How Bitzi Works: Tickets Out
Our database grows to describe a useful
proportion of all files in circulation.
A wide variety of people and applications use
ticket info for their own ends.
- Website visitors/searchers
- Desktop file lookups
- Media player apps
- Filesharing apps
- Derivative services
17How Bitzi Works: Tech Details
- Our Bitprint
- Master key into our catalog
- Concatenation of two nonproprietary hashes
- SHA1: safe, standard
- TigerTree: different basis, range benefits
- Robustness against research breakthroughs
- Our data model terminology
- Bitprints may be tagged
- Tags are arbitrary XML blobs
- Growing set of types
- Usually coercible into a database row or RDF
- Tags compete with each other as necessary
- Tickets are created from the best tags
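The idea of a Bitprint as two concatenated hashes, one of them a tree hash, can be sketched as follows. Several things here are assumptions for illustration only: Python's standard library has no Tiger, so SHA-256 is swapped in as the tree's underlying hash; the 1024-byte leaf size, the 0x00/0x01 leaf and node prefixes, the Base32 encoding, and the "." separator are likewise choices made for this sketch rather than details confirmed by the slides.

```python
import base64
import hashlib

BLOCK_SIZE = 1024  # assumed leaf size for this sketch

def tree_hash(data: bytes, hash_name: str = "sha256") -> bytes:
    """Root of a Merkle-style hash tree over fixed-size blocks.

    SHA-256 is a stand-in here; a real TigerTree would use the Tiger hash.
    """
    def h(prefix: bytes, payload: bytes) -> bytes:
        return hashlib.new(hash_name, prefix + payload).digest()

    # Split into leaf blocks; an empty input still gets one (empty) leaf.
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    level = [h(b"\x00", block) for block in blocks]

    # Combine pairs level by level; an unpaired node is carried up unchanged.
    while len(level) > 1:
        paired = [h(b"\x01", level[i] + level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0]

def bitprint_sketch(data: bytes) -> str:
    """Concatenate a whole-file SHA1 with a tree-hash root (illustrative encoding)."""
    sha1_part = base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")
    tree_part = base64.b32encode(tree_hash(data)).decode("ascii").rstrip("=")
    return sha1_part + "." + tree_part

sample = bytes(3000)               # three blocks: 1024 + 1024 + 952 bytes
damaged = bytes(2999) + b"\x01"    # same length, last byte changed

print(bitprint_sketch(sample))
print(bitprint_sketch(sample) == bitprint_sketch(damaged))  # False
```

Two design points follow from the slide. Using hashes built on different foundations means a research breakthrough against one does not immediately undermine the combined name. The tree structure is what gives the range benefit: a recipient who knows the interior hashes can check an individual block as it arrives instead of waiting for the whole file.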
18How Bitzi Works: Current Tools
- Data collection
- Downloadable Bitcollider utility
- Windows, Linux
- Free source code
- Calculates bitprint, extracts some intrinsic tags
- Web forms
- Viewing/rating/searching
- All at our website
19How Bitzi Works: Open Code & Data
- Bitcollider bitprinting code available
- Public Domain
- C, Java
- Free dataset access: OpenBits
- Draft OpenBits License based on Open Directory Project license
- Preliminary RDF dump available
- http://preview.openbits.org
- Eventually, at the Ticket granularity
20Using Bitzi
- On your desktop
- Identify anything you've got, including possible problems, newer versions, etc.
- At our website
- Find interesting potential new things to get, in context, presented alongside other options
- In other applications, devices, websites
- Identify what's playing
- Choose between offered options
- Organize/correct your collection
- Much more?
21Demos
- Bitzi Bitcollider
- Desktop utility
- LimeWire
- Evaluate search results before downloading
- WinAmp
- See more about what's playing
- Bitzi Website
- Search for new items of interest
22Future: Greater Integration
- Standard, generic "get" facility
- We expect a single click on a Ticket to ask multiple applications to locate the matching file
- Ticket info inside applications
- Get Ticket direct from Bitzi, or elsewhere
- Verify Ticket validity (cryptographically signed)
- Display as locally appropriate
23Future: Website and Community
- Enhanced search
- Improved rating and peer-review processes
- Browsing/Categorization
- Automatic and manual
- Dataset mining
- Variety of rankings
24Is this a Business?
- Not all Tickets are (or should be) equal
- Fuzzy vs. guaranteed trust
- Community vs. promotional info
- Attention is always scarce
- Some special inserts will cost
- Someone always needs to be found and trusted
- Users benefit
- Fees subsidize verification procedures
- Prices self-select for appropriateness
- Has anyone succeeded with free lookups, paid inserts? (Yes; examples should be obvious)
25The End
- Gordon Mohr
- Founder & Chief Technology Officer
- Bitzi Corporation
- Email: gojomo_at_bitzi.com
- Bitizen Page: http://bitzi.com/bitizen/gojomo
- O'Reilly Weblog: http://www.oreillynet.com/weblogs/gojomo