Mining the Stock Market: Which Measure is the Best? Martin Gavrilov, Dragomir Anguelov, Piotr Indyk, Rajeev Motwani Presented by Arun Qamra Main Idea Lot of interest ...
Heavy Hitters Piotr Indyk MIT Last Few Lectures Recap (last few lectures) Update a vector x Maintain a linear sketch Can compute Lp norm of x (in zillion different ...
Piotr Indyk (slides partially by Lars Arge and Jeff Vitter) Today 1D data structure for searching in external memory O(log N) I/Os using standard data structures ...
Near(est) Neighbor in High Dimensions. Alexandr Andoni (slides by Piotr Indyk) Nearest Neighbor ... coordinates in {1...M} into dM-dimensional Hamming space ...
a set of clients originates demands for some kind of goods or services. ... Indyk (1999) and Bose et al. (2003): linear in n and polynomial in 1/ε (1-median) ...
IRC: An Iterative Reinforcement Categorization Algorithm for ... S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. ...
Pick a subset I of random coordinates. Hash function, h(p), will return a bucket ID ... Requires parameter tweaking (size of I and number of hash buckets) ...
Efficient Nearest Neighbor Searching for Motion Planning Anna Atramentov Dept. of Computer Science Iowa State University Ames, IA, USA Steven M. LaValle
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality ... These results are obtained by reducing ε-NNS to a new problem: point location in ...
Geometric Data Stream Algorithms as Data Structures. Data structures that support: ... The algorithms will maintain certain statistics over nP(.), which will allow it ...
Interpolate degree-k polynomial q(zj) = S1 zj. Output q(0) Multiplicative ... For what other problems can we use this 'generalize-then-interpolate' strategy? ...
... with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic ... Goal: compress x into a 'sketch' Ax , where A is a carefully designed ...
Feature space. query. What is it good for? Many things! Examples: Optical Character Recognition ... Given a query point q, let: p* point in P closest to q. r ...
Models and Issues in Data Stream Systems Rajeev Motwani Stanford University (with Brian Babcock, Shivnath Babu, Mayur Datar, and Jennifer Widom) STREAM Project ...
Web Data Integration Using Approximate String Join Yingping Huang and Gregory Madey Computer Science and Engineering University of Notre Dame WWW2004, New York, 5/19/2004
Mission: To improve the healthcare of the underserved by training future leaders ... Natasha Anu Anandaraja,MD Pediatrics. Sigrid Hahn,MD Emergency Medicine ...
The Problem with Music: Modeling Distance Distributions of Large Music Collections Prof. Michael Casey Program in Digital Musics Dartmouth College, Hanover, NH
What is a Good Nearest Neighbors Algorithm for Finding Similar Patches in Images? Neeraj Kumar*, Li Zhang, Shree K. Nayar* *Columbia University, University of ...
Want to estimate similarity without looking at entire objects. ... Synopsis data structures [Gibbons,Matias] Compact distance oracles, distance labels. ...
Mining Text and Web Data Contents of this Chapter Introduction Data Preprocessing Text and Web Clustering Text and Web Classification [Han & Kamber 2006, Sections 10 ...
School of Computer Science. Carnegie Mellon. Sensor ... Given a semi-infinite stream of values (time series) x1, x2, ..., xt, ... Vision; Astronomy, seismology, ...
Pseudo-disc pairs: O1 and O2 are in pd position, if O1-O2 and O2-O1 ... Minkowski sums are pseudo-discs. Consider convex P,Q,R, such that P and Q are disjoint. ...
Spectral Hashing Y. Weiss (Hebrew U.) A. Torralba (MIT) Rob Fergus (NYU) How to handle non-uniform distributions Bit allocation between dimensions Compare value of ...
Handles scan and processing rate mismatch. PODS 2002. 31 ... Sliding windows as first-class construct. Awkward in SQL, needs reference to timestamps ...
Fingerprints. P(x): random, irreducible deg-k polynomial over Z2 ... Basic idea use min-hash of fingerprints. sk(A) = k minimal elements under p(SA) ...
[Ben-Amram, Galil FOCS '91] [Hampapuram, Fredman FOCS '93] [Chazelle STOC '95] ... never bow before the big problems (first O(lg n) bound; first separation between ...
In motion planning the following algorithms rely heavily on nearest ... T. Cover, P. Hart, 1967. D. Dobkin, R. Lipton, 1976. J. Bentley, M. Shamos, 1976 ...
ai in S ai. Example: S = 4, 5, 15, 4, 100, 4, 16, 15 ... 1m max1 i k ai,j. a. b. 8/26/09. IIT Kanpur Streams Workshop. 8. Reduction From Max Dominance Norm ...
Complicated metrics arise naturally in a number of applications. Image databases ... Isometric embedding |d(x, Ai) − d(y, Ai)| ≤ d(x,y) Follows from triangle inequality ...
New Applications data input as continuous, ordered data streams ... Mine patterns, process queries and compute statistics on data streams in real-time ...
Title: ASF Author: wet-mp Last modified by: Tomasz Wysocki Document presentation format: On-screen Show (4:3) Other titles: Bookman Old Style Lucida Sans Unicode ...
Error-Correcting Codes: Progress & Challenges Madhu Sudan MIT CSAIL Communication in presence of noise Shannon's Model: Probabilistic Noise Hamming Model: Worst ...
Title: Foundations of Cryptography Lecture 2 Author: Administrator Last modified by: Admin Created Date: 10/31/2003 10:32:22 AM Document presentation format
n = stream size, m = universe size. fi = # occurrences of item i ... Estimate query selectivity to huge DB without sorting. Routers gather # distinct destinations ...
Christiane Lammersen. Christian Sohler. Dynamic Geometric Data Streams ... IITK Workshop on Algorithms for Christiane Lammersen. Processing Massive Data Sets ...
Lift. Global ε-approximate sketch after lift. Merged ε/2-approximate sketch ... sketch on (1 − ε/2)N data items, then lifting the sketch by εN/2 results in an ε ...
We set f = L/(k(1 + log n)) Run online facility location algorithm (Online-Fac-Locn) Lemma: ... Run offline algorithm on weighted centers to get k centers with cost O(OPT) ...
1 Assumption: Servers are sorted l1 ln Counter number of clients for server i: C(i) - Lk [li, li+1) at the right side of server i C(0) at left side ...
Minimal Loss Hashing for Compact Binary Codes Mohammad Norouzi David Fleet University of Toronto Thank you! Questions? After giving form of hash function just in words ...
Invariant: Any adjacent bucket pair except B2,1 within right-half window W1 has ... C (C 1) pairs of adjacent buckets in merging step. Worst Case Time is ...