MFM Observation Of Magnetization Reversal Process In Recording Media PowerPoint PPT Presentation




1
Finding Needles in the Internet Haystack
Ron K. Cytron, Washington University in Saint Louis, Department of Computer Science, http://www.cs.wustl.edu/cytron/
Roger Chamberlain, Mark Franklin, Ron Indeck, John Lockwood, George Varghese (UCSD), Mahesh Jayaram
Thanks: Ben Brodie
Center for Distributed Object Computing, Department of Computer Science, Washington University
Century Club May 2002
7
Outline
  • Computers have come a long way
  • Today's computers are never lonely
  • Volumes and volumes of data
  • Fast searching of magnetic media
  • Internet packet filtering
  • Conclusion

8
A Grandchild's Gift
9
If cars improved that much in 30 years
  • 4000
  • 60,000 miles per hour
  • Seats 10,000 people
  • Gets 20,000 miles per gallon
  • Breaks down every 70 years

10
The Haystack
  • The Internet is large and growing
  • Content on the Internet is growing even faster
  • A haystack sits still, but the Internet.

11
Growth of the Internet (why computers aren't lonely anymore)
Y2K Problem (?) More computers sold than TVs
12
Growth of Internet Content (volumes and volumes of data)
Anybody can publish. The problem is how to find what you want.
13
9/17/2001, Page 6B
What can tech companies do? Some say they're at a loss, but others offer budding solutions
By Kevin Maney
On July 7, 1940, as the nation edged toward World War II, IBM put out a statement that made headlines. The company offered all its facilities for national defense, ready to convert to making anything the government needed. Other leaders in the electro-mechanical technology of the day -- Ford Motor, General Motors, General Electric -- also threw their weight into defense efforts. They switched from making cars and washing machines to building tanks, aircraft engines and machine guns. So here we are in 2001, readying for another war. The U.S. technology industry is the best and most innovative in the world. It is the nation's pride and joy. Shouldn't it do something?
14
. . . One possibility is in data-mining technology. Data mining is a way to collect millions of pieces of information in a computer system, sift through that data, make sense of them and come up with something useful. "We (the U.S. tech industry) are experts at data mining and have vast resources of data to mine," says Tom Evslin, CEO of Internet communications company ITXC. "We have used it to target advertising. We can probably use it to identify suspicious activity or potential terrorists." . . .
15
Fast searching of magnetic media with Roger
Chamberlain, Mark Franklin, Ron Indeck, John
Lockwood
16
Enabling Technology: Disk Drives
Almost 10,000,000x increase in 45 years!
Magnetic disk storage areal density vs. year of IBM product introduction (from D. A. Thompson)
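This is a back-of-envelope reading of the "almost 10,000,000x in 45 years" headline. It assumes steady compound growth, an assumption made only for this estimate, since actual year-to-year growth varied. Under that assumption, the implied yearly growth factor r and the doubling time t_2 work out as:

```latex
r = \left(10^{7}\right)^{1/45} = 10^{7/45} \approx 10^{0.156} \approx 1.43,
\qquad
t_{2} = \frac{\ln 2}{\ln r} \approx \frac{0.693}{0.358} \approx 1.9\ \text{years}
```

Under uniform growth, that is roughly 43% more bits per unit area each year, with areal density doubling about every two years.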
17
Cost per Megabyte
Cost decreasing 3 per week!
Price history of hard disk product vs. year of
product introduction (From D. A. Thompson)
18
Massive Storage Data
  • Storage industry will ship 4,000,000,000,000,000,000 Bytes this year
  • FedEx generated 14 Terabytes of data last year
  • US intelligence collects data equaling the
    printed collection of the US library every day!

19
Massive Data Sets
  • Employee records
  • Consumer information
  • Maps/mission/intelligence data
  • Genome maps
  • Data sets now measured in Terabytes, and are
    dynamic!

20
Genome Application
  • Genome maps expanding daily
  • Wash U sequencing center
  • Each of us has 80,000 genes found among 3 billion
    characters of DNA (A,C,G,T)
  • Look for matches
  • Identify function
  • Disease: understand, diagnose, detect, medicine, therapy
  • Biofuels, warfare, toxic waste
  • Understand evolution
  • Forensics, organ donors, authentication
  • More effective crops, disease resistance

21
DNA String Matching
  • Looking for CACGTTAGTTAGC
  • Interested in matches and near matches
  • Search human genome and other gene oceans
  • Need to search entire data sets
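The sketch below shows, in software, what "search the entire data set for matches and near matches" means. It assumes that a near match is a window differing from the pattern in at most k bases (Hamming distance). Insertions and deletions are ignored here. The function name `near_matches` and the sample sequence are hypothetical; only the pattern comes from the slide.

```python
def near_matches(sequence: str, pattern: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (offset, mismatches) for every window of `sequence` that differs
    from `pattern` in at most `max_mismatches` positions (Hamming distance)."""
    m = len(pattern)
    hits = []
    # No index: every window of the whole sequence is examined.
    for i in range(len(sequence) - m + 1):
        mismatches = 0
        for a, b in zip(sequence[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Hypothetical sequence: an exact copy of the slide's pattern at offset 2
# and a one-base variant (A replaced by C) at offset 18.
seq = "TTCACGTTAGTTAGCAAACACGTTCGTTAGCGG"
print(near_matches(seq, "CACGTTAGTTAGC", 1))  # [(2, 0), (18, 1)]
```

Because nothing is indexed in advance, the cost grows with the size of the data set, which is why the slide stresses searching the entire set.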

22
Bio Computation Problem
Figure: a DNA pattern compared against a DNA sequence from BIG genome databases. Match?
Approximate matches are just as useful
23
Finding a needel in a heystuck
  • DNA and live text can contain errors
  • We often seek an approximate match, for example
  • needle
  • No match? Try 2-transpositions
  • enedle, needle, nedele, neelde, needel
  • No match? Try 1-deletions
  • eedle, nedle, nedle, neele, neede, needl
  • No match? Try insertions, larger edits,
  • An exponential number of possibilities
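The variant lists on this slide can be generated mechanically. The sketch below reproduces the 2-transpositions and 1-deletions of "needle". It then adds a hypothetical `one_edit` helper, which also covers substitutions and insertions over lowercase letters, to show how fast the candidate set grows once edits are combined. All function names are illustrative.

```python
import string


def transpositions(word: str) -> list[str]:
    """Swap each pair of adjacent letters (the slide's "2-transpositions")."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:] for i in range(len(word) - 1)]


def deletions(word: str) -> list[str]:
    """Delete exactly one letter (the slide's "1-deletions")."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def one_edit(word: str) -> set[str]:
    """All strings one deletion, adjacent transposition, substitution, or
    insertion (lowercase letters only) away from `word`, including `word` itself."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    substitutions = {left + c + right[1:] for left, right in splits if right
                     for c in string.ascii_lowercase}
    insertions = {left + c + right for left, right in splits for c in string.ascii_lowercase}
    return set(deletions(word)) | set(transpositions(word)) | substitutions | insertions


print(transpositions("needle"))  # ['enedle', 'needle', 'nedele', 'neelde', 'needel']
print(deletions("needle"))       # ['eedle', 'nedle', 'nedle', 'neele', 'neede', 'needl']

# Applying one more edit to every variant multiplies the candidate set.
level1 = one_edit("needle")
level2 = {v for w in level1 for v in one_edit(w)}
print(len(level1), len(level2))
```

Each of these candidates would still need its own exact-match lookup, so the work grows with the size of the candidate set.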

24
How is this done today?
  • Think of every way a word can be misspelled
  • Present each misspelling to the computer for an
    exact match

Figure: the misspellings enedle, needle, nedele, neelde, needel presented one at a time for an exact match (No)
25
How can we do better?
  • Data is present on magnetic media
  • Hardware at the disk is
    • Already fault tolerant (more on this later)
      • needel ? needle
    • Distributed across all surfaces

We win if the number of misspellings is large and the number of false hits is small.
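The alternative to enumerating misspellings is to match approximately in a single pass over the data. The slides do not spell out the disk hardware's method. As a software stand-in, the sketch uses the classic dynamic-programming approach to approximate substring search (Sellers' algorithm). It reports every text position where some substring ending there is within k edits of the pattern. The function name `approx_search` is hypothetical.

```python
def approx_search(text: str, pattern: str, k: int) -> list[int]:
    """Return end positions j (exclusive) in `text` where some substring ending
    at j is within edit distance k of `pattern` (insertions, deletions and
    substitutions each cost 1). One pass over the text, O(len(pattern)) state."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and any substring
    # ending at the current text position; before any text, that is i.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(old + 1,            # extra character in the text
                         col[i - 1] + 1,     # pattern character left unmatched
                         prev_diag + cost)   # match or substitution
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits


text = "a needel in a haystack"
print(approx_search(text, "needle", 2))  # end position 8 ("needel") is among the hits
```

The state is m+1 counters regardless of how many misspellings lie within k edits. That is the trade-off the slide points to: the payoff is largest when the misspelling list would be long, as long as false hits stay rare.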
26
Another Application: Intelligence Data
  • Lots of data
  • Changing constantly
  • Many perturbations
  • Tzar, tsar, czar, . . .
  • Don't know what we want to look for beforehand

27
Google Search Engine
  • Crawls the web once per month
  • Caches web pages
  • Fast, exact text-based search (see how soon)

28
Image Database Applications
  • Challenging database
  • Unstructured
  • Massive data sets
  • Don't know what we need to look for in each picture

29
Satellite Data
  • Low-orbit fly-over every 90 minutes
  • Look for differences in images
  • Large objects
  • Troops
  • Changes to landscape
  • Flag, transmit these differences immediately
  • National Reconnaissance Office
  • City assessors . . .

30
Washington University
Hilltop Campus
31
How do we find what we're looking for?!
32
Conventional Structured Database
Word        Inverted list (pointers)
agent       <1,2>
Bond        <1,4>
computer    <2>
James       <1,3,4>
Madison     <3>
mobile      <2>
movie       <3,4>
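The sketch below builds an inverted index like the one in the table and answers multi-word queries by intersecting posting lists. The four documents are hypothetical and were chosen so that the index reproduces the table exactly. The function names are illustrative.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each word to the sorted list of document ids that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def lookup_all(index: dict[str, list[int]], *words: str) -> list[int]:
    """Documents containing every query word (intersection of posting lists)."""
    sets = [set(index.get(w, [])) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical documents that yield the table above.
docs = {1: "James Bond agent", 2: "mobile computer agent",
        3: "James Madison movie", 4: "James Bond movie"}
index = build_index(docs)
print(index["James"])                       # [1, 3, 4]
print(lookup_all(index, "James", "movie"))  # [3, 4]
print(lookup_all(index, "Jmaes"))           # [] (exact words only)
```

Lookups are fast, but only for words known when the index was built. A misspelled query finds nothing, and the index must be rebuilt or updated as the documents change.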
33
Challenges in Searching Massive Databases
  • Know what to search for
    • need to build index beforehand
    • maintain index as it changes
  • Do not know what to search for
    • need to search the whole database!

34
Conventional Search
Figure: processor, memory, and hard drive, connected by the memory bus and I/O bus
35
Conventional Search
Figure: processor, memory, and hard drive on the memory and I/O buses; the processor issues "find ."
36
Conventional Search
Figure: the hard drive's contents cross the I/O and memory buses; the processor answers yes, no, no, yes, yes .
37
Conventional Approach
38
WUSTL's Approach
39
Streaming Approach
Figure: processor, memory, and hard drive with reconfigurable hardware (memory/processing), connected by the memory bus and I/O bus
40
Streaming Approach
Figure: the same system; the processor issues "find"
41
Streaming Approach
Figure: the "find" request reaches the reconfigurable hardware at the hard drive
42
Streaming Approach
Figure: answers (yes, no, no, yes, yes) return from the reconfigurable hardware to the processor
Parallelism through each transducer and drive
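The following is a toy model of the contrast between the conventional and streaming diagrams, under loudly simplified assumptions. Each drive holds a list of records. In the conventional path, every record crosses the I/O bus before the processor tests it. In the streaming path, a filter beside each drive (standing in for the reconfigurable hardware) tests records as they stream off the platter and forwards only the matches. The function names and the byte-counting model are hypothetical, not the WUSTL prototype.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Record = bytes
Predicate = Callable[[Record], bool]


def conventional_search(drives: list[list[Record]], match: Predicate) -> tuple[list[Record], int]:
    """Ship every record over the I/O bus, then filter on the processor."""
    bus_bytes = 0
    shipped = []
    for drive in drives:
        for rec in drive:
            bus_bytes += len(rec)  # the whole contents cross the bus
            shipped.append(rec)
    hits = [rec for rec in shipped if match(rec)]
    return hits, bus_bytes


def streaming_search(drives: list[list[Record]], match: Predicate) -> tuple[list[Record], int]:
    """Filter beside each drive; only matching records cross the bus."""
    def filter_at_drive(drive: list[Record]) -> list[Record]:
        return [rec for rec in drive if match(rec)]

    # One worker per drive models the per-drive parallelism structurally;
    # Python threads are not a claim about real hardware speedup.
    with ThreadPoolExecutor(max_workers=max(1, len(drives))) as pool:
        per_drive = list(pool.map(filter_at_drive, drives))
    hits = [rec for drive_hits in per_drive for rec in drive_hits]
    return hits, sum(len(rec) for rec in hits)


def has_needle(rec: Record) -> bool:
    return b"needle" in rec


drives = [[b"needle in record", b"hay", b"more hay"], [b"hay", b"another needle"]]
print(conventional_search(drives, has_needle))
print(streaming_search(drives, has_needle))
```

Both paths return the same answers. Only the streaming path keeps non-matching data off the bus, and the saving grows with the fraction of data that does not match.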
43
Magnetic Recording Channel Schematic
Figure: input user data → encoder → channel bits → head and disk → analog readback → detector → decoder → decoded user data → to bus or cache; points A, B, and C are marked along this path
44
Key streaming over Data
45
Disk Level Implementation
Figure: matches and score for 100-bit-key matching through a pseudo-random binary series
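The figure plots a match score as a 100-bit key slides over a pseudo-random binary series. The disk-level hardware's exact scoring is not given here. This sketch assumes the score at each offset is the number of key bits that agree with the stream, i.e. 100 minus the Hamming distance. Under that assumption, a true occurrence scores 100 and random offsets cluster near 50. The seed, stream length, planted offset, and function name are all illustrative.

```python
import random


def match_scores(stream: list[int], key: list[int]) -> list[int]:
    """Score each alignment of `key` against `stream` by counting agreeing bits."""
    n = len(key)
    return [sum(1 for a, b in zip(stream[i:i + n], key) if a == b)
            for i in range(len(stream) - n + 1)]


rng = random.Random(2002)                          # fixed seed for repeatability
key = [rng.randint(0, 1) for _ in range(100)]      # hypothetical 100-bit key
stream = [rng.randint(0, 1) for _ in range(2000)]  # pseudo-random binary series
planted_at = 1234
stream[planted_at:planted_at + 100] = key          # embed one true occurrence

scores = match_scores(stream, key)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])   # the planted offset, with a perfect score
print(sorted(scores)[-2])   # runner-up: far below 100 for a random stream
```

A 100-bit key gives a wide gap between the true match and the best random alignment. That gap is what lets a threshold on the score tolerate a few bit errors without flooding the output with false hits.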
46
Status: Prototype in progress
Figure: host ATAPI controller, IDE_to_ATM module, hard drive
47
Internet Packet Filtering with Mahesh Jayaram
and George Varghese
48
Finding Needles in a Moving Haystack
49
Cost of Internet Request
  • As technology improves, transmission time
    decreases but latency stays the same

Figure: time vs. year
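The trend on this slide follows from a simple model, assumed here for illustration: total request time = latency + size / bandwidth. Higher bandwidth shrinks only the second term, so the total approaches the fixed latency. The 100 ms latency and 1 MB response size are hypothetical, and `request_time` is an illustrative name.

```python
def request_time(latency_s: float, size_bytes: float, bandwidth_bps: float) -> float:
    """Seconds to complete a request: fixed latency plus transmission time."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


LATENCY = 0.100     # hypothetical 100 ms latency
SIZE = 1_000_000    # hypothetical 1 MB response

for bandwidth in (1e6, 10e6, 100e6, 1e9, 10e9):
    total = request_time(LATENCY, SIZE, bandwidth)
    print(f"{bandwidth / 1e6:>8.0f} Mb/s: {total:.4f} s total, "
          f"latency share {LATENCY / total:.0%}")
```

Going from 1 Mb/s to 10 Gb/s cuts this request from about 8.1 s to about 0.1 s, but nearly all of what remains is latency. That is why the following examples focus on latency rather than bandwidth.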
50
Example: Garden Hose
Fire department and gardener suffer the same wait
51
Example: Hot Shower
You want this water
Latency (time to get hot water) grows with distance
52
Latency-Free Hot Shower
A convection circuit continuously circulates hot water: latency = 0
53
Better to receive than to give
  • Cable broadcast
  • Radio broadcast
  • TV guide channel
  • Gate connection announcements in flight
  • Winning lottery number

Modern name: push technology
54
Better to receive than to give
55
How do you get what you want?
56
Packet Filters
Filter F (Weather)
57
Packet Filters
Filter F (Weather)
58
Existing Approach
IBM Quote
Weather
Flight Schedule
59
Our approach
IBM Quote
Weather
Flight Schedule
Composite filter makes just one pass
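One way to keep to a single pass regardless of how many subscriptions exist is to merge all filters into one hash table, mapping a field value to the subscribers interested in it. This sketch assumes that each filter is an exact match on a single, hypothetical `topic` field. The existing approach tests every filter against every packet, while the composite version does one lookup per packet. The filters in this talk are grammar-based and more general than this, and all names below are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    topic: str     # hypothetical field the filters match on
    payload: str


def existing_approach(packet: Packet, filters: dict[str, str]) -> list[str]:
    """Test the packet against each subscriber's filter in turn: O(number of filters)."""
    return [sub for sub, wanted in filters.items() if packet.topic == wanted]


def build_composite(filters: dict[str, str]) -> dict[str, list[str]]:
    """Merge all filters into one table: topic -> subscribers that want it."""
    table: dict[str, list[str]] = defaultdict(list)
    for sub, wanted in filters.items():
        table[wanted].append(sub)
    return dict(table)


def composite_approach(packet: Packet, table: dict[str, list[str]]) -> list[str]:
    """One hash lookup per packet, independent of how many filters exist."""
    return table.get(packet.topic, [])


filters = {"alice": "IBM Quote", "bob": "Weather", "carol": "Flight Schedule", "dave": "Weather"}
table = build_composite(filters)
pkt = Packet("Weather", "Rain in St. Louis")
print(existing_approach(pkt, filters))   # ['bob', 'dave']
print(composite_approach(pkt, table))    # ['bob', 'dave']
```

Both give the same answer. However, the cost of the first grows with every new subscription, while the second stays at one lookup per packet.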
60
How we do it
IBM Quote
Weather
Flight Schedule
61
Sample grammar for TCP packet
TCPConnHeader → EtherType IPHeader TCPPortPair
EtherType → IP_TYPE
IPHeader → Vers HlenPlusRest
Vers → HalfByte
HlenPlusRest → 0 1 0 1 FixedRest
             | 0 1 1 0 FixedRest OneIPOption
             | 0 1 1 1 FixedRest TwoIPOption
             | 1 0 0 0 FixedRest ThreeIPOption
             | 1 0 0 1 FixedRest FourIPOption
             | 1 0 1 0 FixedRest FiveIPOption
             | 1 0 1 1 FixedRest FiveIPOption OneIPOption
             | 1 1 0 0 FixedRest FiveIPOption TwoIPOption
             | 1 1 0 1 FixedRest FiveIPOption ThreeIPOption
             | 1 1 1 0 FixedRest FiveIPOption FourIPOption
             | 1 1 1 1 FixedRest FiveIPOption FiveIPOption
FixedRest → ServiceType TotalLength Identification Flags FragmentOffset TimeToLive Protocol HeaderChecksum IPAddrPair
ServiceType → Byte
TotalLength → TwoByte
Identification → TwoByte
Flags → bit bit bit
FragmentOffset → bit Byte HalfByte
TimeToLive → Byte
Protocol → TCP_PROTOCOL
HeaderChecksum → TwoByte
IPAddrPair → IP_SRC_DST_PAIR
FiveIPOption → ThreeIPOption TwoIPOption
FourIPOption → TwoIPOption TwoIPOption
ThreeIPOption → TwoIPOption OneIPOption
TwoIPOption → OneIPOption OneIPOption
OneIPOption → Option Padding
Option → ThreeByte
Padding → Byte
TCPPortPair → TCP_PORT_PAIR
FourByte → TwoByte TwoByte
ThreeByte → TwoByte Byte
TwoByte → Byte Byte
Byte → HalfByte HalfByte
HalfByte → bit bit bit bit
bit → 0 | 1
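The grammar above accepts the following sequence: an EtherType of IP_TYPE; an IPv4 header whose 4-bit length field (HlenPlusRest) selects between zero and ten 32-bit option words; a protocol of TCP_PROTOCOL; and then the TCP port pair. The sketch below checks the same conditions by hand in Python. It assumes IP_TYPE is the IPv4 EtherType 0x0800 and TCP_PROTOCOL is IP protocol number 6. It treats IP_SRC_DST_PAIR and TCP_PORT_PAIR as the addresses and ports the filter is looking for. Like the grammar, it does not constrain Vers. The function names are hypothetical, and the talk's own grammar-driven implementation is not reproduced here.

```python
import struct

IP_TYPE = 0x0800   # EtherType for IPv4 (assumed meaning of IP_TYPE)
TCP_PROTOCOL = 6   # IP protocol number for TCP


def matches_tcp_conn(frame: bytes, src: bytes, dst: bytes, sport: int, dport: int) -> bool:
    """True if `frame` (2-byte EtherType, then IPv4 header, then TCP header)
    carries TCP from src:sport to dst:dport."""
    if len(frame) < 2 + 20:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 0)   # EtherType → IP_TYPE
    if ethertype != IP_TYPE:
        return False
    ip = 2
    ihl = frame[ip] & 0x0F   # low half-byte of Vers/HlenPlusRest: length in 32-bit words
    if ihl < 5:              # the grammar's alternatives run from 0101 (no options)
        return False         # to 1111 (ten option words); smaller values are rejected
    header_len = ihl * 4
    if len(frame) < ip + header_len + 4:
        return False
    # FixedRest: Protocol is byte 9, IPAddrPair is bytes 12..19 of the IP header.
    if frame[ip + 9] != TCP_PROTOCOL:
        return False
    ip_src, ip_dst = frame[ip + 12:ip + 16], frame[ip + 16:ip + 20]
    # TCPPortPair follows the header, after any IP option words.
    port_src, port_dst = struct.unpack_from("!HH", frame, ip + header_len)
    return (ip_src, ip_dst, port_src, port_dst) == (src, dst, sport, dport)


def make_frame(option_words: int = 0) -> bytes:
    """Build a hypothetical test frame with the given number of option words."""
    ihl = 5 + option_words
    ip_header = struct.pack("!BBHHHBBH4s4s", (4 << 4) | ihl, 0, 0, 0, 0, 64,
                            TCP_PROTOCOL, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    ip_header += b"\x01" * (4 * option_words)   # filler option bytes
    return struct.pack("!H", IP_TYPE) + ip_header + struct.pack("!HH", 12345, 80)


SRC, DST = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
print(matches_tcp_conn(make_frame(), SRC, DST, 12345, 80))    # True
print(matches_tcp_conn(make_frame(3), SRC, DST, 12345, 80))   # True: options skipped
print(matches_tcp_conn(make_frame(), SRC, DST, 12345, 443))   # False: wrong port
```

Hand-written checks like this have to be repeated per filter. Expressing filters as grammars lets several of them be combined before any packet is examined.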
62
Results
The more things you want, the slower existing approaches get; our performance doesn't degrade.
63
Conclusions
  • The Internet and its content are growing
    explosively
  • Disk storage is abundant, cheap, reliable
  • Technology must provide fast, inexact searching
    of text and images
  • As more data is hurled at and past us, fast
    filtering of Internet traffic is a must

64
Questions?