CS 179 Database Project - PowerPoint PPT Presentation

About This Presentation
Title:

CS 179 Database Project

Description:

CS 179 Database Project Instructor: Dr Eamonn Keogh Computer Science & Engineering Department 318 EBII University of California - Riverside Riverside, CA 92521 – PowerPoint PPT presentation

Number of Views:86
Avg rating:3.0/5.0
Slides: 23
Provided by: csUcrEdu85
Learn more at: http://www.cs.ucr.edu
Category:

less

Transcript and Presenter's Notes

Title: CS 179 Database Project


1
CS 179 Database Project
Instructor Dr Eamonn Keogh Computer Science
Engineering Department318 EBII University of
California - RiversideRiverside, CA
92521eamonn_at_cs.ucr.eduClass web page
www.cs.ucr.edu/eamonn/cs179
2
Administration I
 Class Meeting Times   Class Activities    Discu
ssion  M  0310 p.m. - 0400 p.m.  SPR
2339    LAB  F  0210 p.m. - 0500 p.m.  ENGR2
129 (first 15 minutes rule) We will not meet
every week. You are obliged to view the class web
page every Monday morning to check for
announcements. You are 100 responsible for any
announcements/changes I might post to the web
page.
3
Administration II
Presentation of Final Project You will need to
give a short group presentation in the last two
weeks (details later).   You must show up to one
of these final presentation sessions, or take a
failing grade (exception, you work out an
alternate plan with me by the end of week 4).
Note that the sessions may go very late!! You
must be prepared to stay for the entire sessions.
Sign ups for time slots will be made available on
a first-come first-served basis later in the
quarter.
4
Administration III
Groups Groups may be of size 2 or 3. Only one
person who did not get an A or B in CS 166 may be
in a group. (I may make exceptions if the numbers
require it). If you need to be a group of one,
talk to me after class. You should take your
responsibility to your group seriously. In most
case I expect that everyone in the group will get
the same grade, but I reserve the right to give
different grades where warranted.
5
Administration IIII
  • Grading
  •  
  • Project binder 90
  • Presentation (including demonstration of
    project) 10
  • Your project binder (exhaustive details in class
    handouts) is a document in which you prove to me
    (or any reader) that you solved the problem given
    to you using a good design process.
  • It must be in the format explicitly stated in the
    handouts.
  • Your presentation is your chance to review and
    highlight the quality of your work.

6
Administration V
Office Hours   I am normally in my office 6-7
days a week. You may visit me any time. If you
wish to be 100 certain I am there you may make
an appointment by email with at least 24 hours
notice. (Note that if you make an appointment,
and then fail to keep it or show up late, the
grade for your entire group will suffer).   If
you email me, you must include CS179 in the
subject heading and note your group name (i.e.
CS179-smith-jones-zoe) in the body.  
7
Administration VI
Important If a member of your group commits an
act of academic dishonesty, all members of the
group will receive a failing grade! Dont know
the exact definition of academic dishonesty? It
is your job to find out! (This is true in
general, not just for this class). http//www.cs.
ucr.edu/content/students/index.php?choiceacademdi
s http//cnas.ucr.edu/cnas/student/dishonesty.pd
f There are certain rules which must be followed
in this class, they are made clear on the
handouts, follow them or get a written exception
from me.  
8
If you write In order to handle spatial data
efficiently, as required in computer aided
design, we decided to use an R-tree. We
implemented it... Everyone in your group gets
a failing grade. Instead you should write It was
noted by Guttman 12 that In order to handle
spatial data efficiently, as required in computer
aided
9
XXX 2004 Similarity matching is useful in two aspects. First, it is a subroutine of many data mining tasks, such as classification, clustering, rule discovery, outlier detection, and query by contents. Second, it is important in its own right for exploratory data analysis. It is possible to convert the subsequence matching problem into whole matching, by placing a sliding window of size . A time series of length N is by definition a sequence of real numbers, and therefore can be considered as a point in N-dimensional space. This immediately suggests that R-tree . Since a time series may contain thousands of points,.. This phenomenon is known as the dimensionality curse problem, and in order to utilize the powers of SAMs we need to first perform dimensionality reduction. .. three steps Establish a distance measure Disttrue for the raw data series. In this thesis, we focus on Euclidean distance Disttrue. Produce a feature extraction function F that reduces the dimensionality of the data from the original length N to n that can be handled by an appropriate index structure. Establish a distance measure Distfeature in the feature space (of n dimensions). The first dimensionality reduction technique proposed for indexing time series in the literature is to use the Discrete Fourier Transform. The basic idea is that any realistic signal can be characterized by the superposition of a finite number of sine/cosine waves, each of which is represented by a single complex number known as a Fourier coefficient. and many Fourier coefficients have a very low amplitude and therefore can be discarded without much loss of information. Keogh 2000 Similarity search is useful in its own right as a tool for exploratory data analysis, and it is also an important subroutine of many data mining applications such as clustering , classification and mining of association rules. Keogh 2000 it is possible to convert subsequence matching to whole matching by sliding a "window" of length n. A time series C c1cn with n datapoints can be considered as a point in n-dimensional space. This immediately suggests that time series could be indexed by multidimensional index structure such as the R-tree and its many variant. Since realisticqueries typically contain 20 to 1,000 datapoints (i.e. n varies from 20 to 1000) and most multidimensional index structures have poor performance at dimensionalities greater than 8-12 12, we need to first perform dimensionality reduction. following three steps. Establish a distance metric from a domain expert (in this case Euclidean distance). Produce a dimensionality reduction technique that reduces the dimensionality of the data from n to N, where N can be efficiently handled by your favorite index structure. Produce a distance measure defined ... The first technique suggested for dimensionality reduction of time series was the Discrete Fourier Transform (DFT) 1. The basic idea of spectral decomposition is that any signal, no matter how complex, can be represented by the superposition of a finite number of sine (and/or cosine) waves, where each wave represented by a single complex number known as a Fourier coefficient 29. .. . many of the Fourier coefficients have very low amplitude and thus contribute little to reconstructed signal. These low amplitude coefficients can be discarded without much loss of information
10
So, what are spatial queries?
11
  • Databases are applications which store data in a
    format which supports querying.
  • Imagine we have a database of restaurants in
    California. The database should probably be able
    to support queries like
  • Return a list of all vegetarian restaurants.
  • Return the phone number of Marios Pizza on 123
    Spruce st.
  • Return the restaurants that have a 4-star or
    higher rating.
  • However there are many reasonable queries that
    most of-the-shelf database systems do not
    support.
  • Return a list of all restaurants with 5 miles of
    my house.
  • Return (in order of distance) the 3 pizza
    restaurants nearest to UCR.

12
Nearest neighbor query
Range query
13
Your project is to build a database that supports
spatial queries, as well as classic database
queries. Although you could do this from
scratch, I highly recommend that you do this by
building some code that sits on top of an
off-the-shelf database (ie Microsoft Access,
Oracle, FoxPro, PostgreSQL). I also highly
recommend that you do this by implementing an
R-tree. In some sense the sentence above,
Your project is to build a database that, is
misleading. I wont be grading the quality of
your database directly. Your project is really
to demonstrate your ability to design medium to
large scale software.
14
User Interface
Spatial Search Engine (probably R-Tree)
Classic Database
Name ID Type Phone Location Grade
Marios Pizza 1 ITA 888-1212 244, 365 D
Joes Bugers 2 US 848-1298 34, 764 A
Jos Mexican 3 MEX 878-1333 123, 32 A
Sues Pasta 4 ITA 878-1342 876, 65 B
15
User Interface
Enter an address and we will find the location of
the nearest Californian university
Spatial Search Engine (probably R-Tree)
221 Baker Street, Riverside
Exclude Religious Schools Exclude Cal States
Classic Database
The nearest university is CSUSB. Click here for
admissions information
Name ID Type Phone Location UCV
Marios Pizza 1 ITA 888-1212 244, 365 Q
Joes Bugers 2 US 848-1298 34, 764 S
Jos Mexican 3 MEX 878-1333 123, 32 G
Sues Pasta 4 ITA 878-1342 876, 65 W
16
User Interface
Click on the map and we will find the location of
the nearest Californian university
Spatial Search Engine (probably R-Tree)
Exclude Religious Schools Exclude Cal States
Classic Database
Name ID Type Phone Location UCV
Marios Pizza 1 ITA 888-1212 244, 365 Q
Joes Bugers 2 US 848-1298 34, 764 S
Jos Mexican 3 MEX 878-1333 123, 32 G
Sues Pasta 4 ITA 878-1342 876, 65 W
The nearest university is CSUSB. Click here for
admissions information
17
User Interface
Choose a location and we will find the location
of the nearest Californian university
Spatial Search Engine (probably R-Tree)
LAX Golden Gate Bridge Balboa Park, SD Ontario
Mills
Classic Database
Name ID Type Phone Location UCV
Marios Pizza 1 ITA 888-1212 244, 365 Q
Joes Bugers 2 US 848-1298 34, 764 S
Jos Mexican 3 MEX 878-1333 123, 32 G
Sues Pasta 4 ITA 878-1342 876, 65 W
Exclude Religious Schools Exclude Cal States
The nearest university is CSUSB. Click here for
admissions information
18
User Interface
The GPS unit tells me you are in UCR, Riverside
California. Do you want to know the location of
the nearest University?
Spatial Search Engine (probably R-Tree)
Exclude Religious Schools Exclude Cal States
Classic Database
Name ID Type Phone Location UCV
Marios Pizza 1 ITA 888-1212 244, 365 Q
Joes Bugers 2 US 848-1298 34, 764 S
Jos Mexican 3 MEX 878-1333 123, 32 G
Sues Pasta 4 ITA 878-1342 876, 65 W
The nearest university is CSUSB. Click here for
admissions information
19
To begin, you must come up with an application
area which has a spatial element (I.e Restaurants
in Orange County, California brown bear
sightings, Locations of car crashes in
Riverside). You must write a two page
description of the problem, in the first
person. The project description should begin by
informally explaining the domain from the
customers perspective (As a restaurant critic
). Then explaining the utility of database for
the customer (The database will allow me to
it will also help me). After I approve the
project description, I (and/or our TA) will
assume the role of the customer (I may add some
requirements). Thereafter anytime you have a
question about what the customer wants, you must
come to see me. If you make an assumption, and it
is the wrong assumption, you will have to redo
your work, or take a major grade penalty.
20
How am I going to get the Spatial locations of
500 places?
  • The web.
  • A GPS unit.
  • Use a grid overlay.
  • If you use a grid overlay you must do it very
    carefully, and document the process.
  • Note that treating the problem as existing on a
    Euclidian plane is actually incorrect. Since the
    locations are on a sphere there will be an
    inherent error in the distances reported. This
    effect would not show up in an area the size of
    Riverside, but would show up for an area the size
    of California. However you may ignore it in this
    project.

21
Important Reminder
Do not leave here today thinking how am I going
to code this R-tree thing, or what language
should I use. Leave here thinking How is our
group going to elicit the problem, design, build
and test this piece of software? What is the best
design process to use? How are we going to
convince the professor, (with the contents of our
project binder) that we used a high quality
process to solve this problem?. In particular,
you probably want to spend a few weeks
researching the design process before you even
consider the particular application problem.
22
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com