Title: Final Presentation by Visualization team
1 Final Presentation by Visualization team
- Team members
- Haibo Liu
- Robertas Baronas
- Jonathan Krentel
- Zaixia Zhang
2 Introduction, Requirements and Website
3 Final Presentation
- Background
- Requirements
- Input Files for First Iteration
- Web site
4 Background
- Radviz
- Visualization: n dimensions -> 2 dimensions
- Our project: data table -> graph
- Terms
- target, anchor
- division
- attribute, numeric attribute, non-numeric attribute
- data point, graph point
- normalization
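As a rough illustration of how Radviz takes a data table from n dimensions to 2, here is a minimal sketch of the textbook mapping: each numeric attribute is min-max normalized, one anchor per attribute is placed on the unit circle, and each row becomes the weighted average of the anchors. The function names, the even anchor spacing, and the handling of constant columns and all-zero rows are assumptions made for this example, not the team's implementation.

```python
import math

def normalize_column(values):
    """Min-max scale numbers to [0, 1]; a constant column maps to 0.5 (assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def radviz_points(rows):
    """Map rows of n numeric attributes to 2-D points inside the unit circle."""
    n = len(rows[0])
    # One anchor per attribute, evenly spaced on the unit circle.
    anchors = [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
               for j in range(n)]
    # Normalize each attribute (column) independently.
    columns = [normalize_column([row[j] for row in rows]) for j in range(n)]
    points = []
    for i in range(len(rows)):
        weights = [columns[j][i] for j in range(n)]
        total = sum(weights)
        if total == 0:
            points.append((0.0, 0.0))  # no pull toward any anchor: use the center
            continue
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        points.append((x, y))
    return points
```

A row that is high on only one attribute lands on that attribute's anchor, while a row that is equally high on every attribute lands at the center of the circle.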
5 Requirements (1)
- Input file formats: comma-separated txt, Excel, Oracle, Access, XML
- Target attribute
- Anchors: only numeric attributes, 25
- Data: 100, missing value, dirty value, dirty attribute
- Web download version
- Early prototype: finished!
6 Requirements (2)
- User: opens files
- User: selects target and anchors
- Our system: offers some useful information and suggestions
- User: plays with the graph
- User: saves a session/graph or prints
7 Input Files for 1st Iteration
8 Names File
- Description of data file
- Format
- Non-numeric: name, nominal
- Example: country, nominal
- Numeric
- 1. name, numeric
- Example: number of cylinders, numeric
- 2. name, numeric, unit
- Example: price, numeric, dollar
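The three line formats above could be read with a small parser like the following. This is only a sketch: `AttributeSpec` and `parse_names_line` are hypothetical names, and it assumes one attribute per line with no quoting or embedded commas.

```python
from typing import NamedTuple, Optional

class AttributeSpec(NamedTuple):
    name: str
    kind: str             # "nominal" or "numeric"
    unit: Optional[str]   # only present for numeric attributes

def parse_names_line(line):
    """Parse 'name, nominal', 'name, numeric' or 'name, numeric, unit'."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) == 2 and fields[1] in ("nominal", "numeric"):
        return AttributeSpec(fields[0], fields[1], None)
    if len(fields) == 3 and fields[1] == "numeric":
        return AttributeSpec(fields[0], "numeric", fields[2])
    raise ValueError(f"unrecognized names-file line: {line!r}")
```

For example, `parse_names_line("price, numeric, dollar")` yields an attribute named `price` of kind `numeric` with unit `dollar`, while a line such as `price, nominal, dollar` is rejected.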
9 Why?
- Sometimes it is hard for our system to tell whether a value is numeric or non-numeric.
- Example: a telephone number or a bus number.
- So the user has to tell us before using our system.
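A tiny sketch of the problem: a type check based on the value alone accepts a telephone number or a bus number as numeric, even though neither is a quantity. `looks_numeric` is a hypothetical helper and the sample values are made up for illustration.

```python
def looks_numeric(text):
    """Naive guess: a value is 'numeric' if it parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

# Made-up sample values: a phone number, a bus number and a city name.
samples = ["5550100", "66", "Boston"]
guesses = [looks_numeric(s) for s in samples]
```

The first two samples are guessed as numeric, which is why the names file asks the user to declare each attribute's type.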
10 Web Site
- www.cs.umb.edu/visualization
11 Testing later
- Later, I am going to show the testing of our system, mainly of FileProcessor.
12 System Design, Architecture, Implementation
13 Design Concerns
- Adaptability to unforeseen new view needs
- Easy maintenance of view consistency
- Restrained but flexible exposure of internally held data
- Hot spot performance: plotting data points
- Data structure divorce
- Avoidance of data redundancy
- Flexibility to potential new data sources
18 Design Decisions
- Internally held data exposed via Focus
- User selections held, managed in Model
- Public getters, package setters and constructors
- Publicly available Division objects
- Objectification of system components
- Row based internal data model
- Heavy referencing, light construction
- Abstraction of data processing
19 Scenarios
20 Open Concerns
- Data structure marriage
- Attribute data management
- DivisionSet exception handling
- GroupSet home
- Performance hot spots
- Dirty data handling
- Object-verb command structure
- Graph Architecture Outline
21 Schedule, Features, Databases, Testing
22 Schedule
- Old
- 10/29/03 to 11/24/03: 26 days
- 11/25/03 to 12/30/03: 35 days (includes holiday)
- 01/01/04 to 02/15/04: 45 days
- 02/16/04 to 04/30/04: 75 days
- Revised
- 10/29/03 to 12/16/03: 48 days (includes holiday)
- 12/17/03 to 01/25/04: 40 days (includes holiday)
- 01/26/04 to 03/15/04: 50 days
- 03/16/04 to 04/30/04: 45 days
23 Schedule
24 Features (1)
- Deal with missing values and dirty values
- Handle attributes whose minimum and maximum values are equal
- Report statistical information about the user's data and give suggestions to users for selecting anchors
- Display the outer big circle
- Compute and display anchors and data points
- Set different colors for data points depending on whether the target has a non-numeric or numeric data type
- User can select target and anchors, and add anchors
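The first three features could be combined into a per-column summary along the following lines. This is a sketch under stated assumptions: the empty string and `?` are treated as missing-value markers, any other cell that does not parse as a finite number counts as dirty, and the "suggestion" is simply to avoid constant columns as anchors. `classify` and `summarize_column` are hypothetical names, not part of the system described here.

```python
import math

MISSING = {"", "?"}  # assumed missing-value markers

def classify(cell):
    """Return ('ok', value), ('missing', None) or ('dirty', None) for one raw cell."""
    text = cell.strip()
    if text in MISSING:
        return "missing", None
    try:
        value = float(text)
    except ValueError:
        return "dirty", None
    if not math.isfinite(value):
        return "dirty", None  # 'nan' and 'inf' parse as floats but are unusable
    return "ok", value

def summarize_column(cells):
    """Count ok/missing/dirty cells and report min, max and an anchor suggestion."""
    counts = {"ok": 0, "missing": 0, "dirty": 0}
    values = []
    for cell in cells:
        status, value = classify(cell)
        counts[status] += 1
        if value is not None:
            values.append(value)
    lo = min(values) if values else None
    hi = max(values) if values else None
    # A column with equal minimum and maximum cannot separate data points.
    usable = bool(values) and lo != hi
    return {**counts, "min": lo, "max": hi, "suggest_as_anchor": usable}
```

A column such as `["4", "4", "4"]` is reported with equal minimum and maximum and is not suggested as an anchor, since normalizing it would divide by zero.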
25 Features (2)
- Table view display
- User can input a value range for each division after selecting the target; target ranges are displayed with color
- User can move anchor points by using the mouse
- Anchor locations are recomputed and anchors are reordered
- The locations of data points are recomputed and data points are re-plotted
- Add or remove anchors by mouse dragging
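One way the reordering step might work is sketched below: the dragged anchor is given the angle where it was dropped, the anchors are sorted by angle, and they are then re-spaced evenly around the circle. The even re-spacing and the names `reorder_after_drag` and `anchor_positions` are assumptions for illustration; the slides do not say how the system recomputes locations.

```python
import math

def reorder_after_drag(order, dragged, drop_angle):
    """Return the new anchor order after `dragged` is dropped at `drop_angle` (radians)."""
    step = 2 * math.pi / len(order)
    # Angle of each anchor before the drag, assuming even spacing.
    angles = {name: i * step for i, name in enumerate(order)}
    angles[dragged] = drop_angle % (2 * math.pi)
    return sorted(order, key=lambda name: angles[name])

def anchor_positions(order):
    """Place the anchors evenly on the unit circle in the given order."""
    n = len(order)
    return {name: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i, name in enumerate(order)}

# Dragging "a" to an angle between "b" and "c" moves it into second place.
new_order = reorder_after_drag(["a", "b", "c", "d"], "a", 2.0)
positions = anchor_positions(new_order)
```

After the anchors move, every data point would be recomputed from the new anchor positions and re-plotted.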
26 Features (3)
- Display a menu system (File, Tool, Help) with submenus
- Save, create, or open a session
- Display the original data table when the user requests it
- Compute and display the t-test value for every two groups and the correlation value for every two numeric attributes
- User can set the base value for correlation
- Display data point and anchor information when the user right-clicks on a data point or anchor
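The pairwise statistics could be computed as in the sketch below. The slides do not say which t-test or correlation coefficient is used, so this example assumes Welch's two-sample t statistic and the Pearson correlation; all function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def welch_t(a, b):
    """Welch's t statistic for two groups (sample variances, unequal sizes allowed)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def pairwise_correlations(columns):
    """Correlation for every pair of numeric attributes in a {name: values} dict."""
    return {(p, q): pearson(columns[p], columns[q])
            for p, q in combinations(columns, 2)}
```

Both functions divide by a spread term, so a constant column or a group with fewer than two values would need to be filtered out before calling them.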
27 Features (4)
- User can select an area in the graph and zoom in
- Draw a histogram to show the range of the target attribute
- Sort attributes in the table by original values
- Blinking/jiggling data points
- Access different data sources
- Design a nice web page, put our system on it, and make it downloadable
28 UCI Repository of Databases
- Contains a wide set of different databases
- Tree structured, table structured
- What are we interested in?
- Table structured data sets
- A mix of numeric and non-numeric attributes
- A large enough number of attributes
- A sufficient number of instances
29 Data Source Collection
- Data set collection
- Full or part of real data set from UCI repository
- Data type transfer
- Continuous, integer -> numeric
- Boolean, nominal -> nonnumeric
- Unit -> add one if the attribute has one, otherwise skip
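The type transfer above amounts to a small lookup table. The sketch below turns a UCI attribute description into a names-file line, assuming that non-numeric attributes are written as `nominal` (as in the names-file examples) and that a unit is appended only for numeric attributes. `UCI_TO_KIND` and `names_line` are hypothetical names.

```python
# Mapping from UCI attribute types to the two kinds used in the names file.
UCI_TO_KIND = {
    "continuous": "numeric",
    "integer": "numeric",
    "boolean": "nominal",
    "nominal": "nominal",
}

def names_line(name, uci_type, unit=None):
    """Build one names-file line; the unit is added only if the attribute has one."""
    kind = UCI_TO_KIND[uci_type.lower()]
    if kind == "numeric" and unit:
        return f"{name}, numeric, {unit}"
    return f"{name}, {kind}"
```

For example, `names_line("price", "continuous", "dollar")` gives `price, numeric, dollar`, and `names_line("country", "nominal")` gives `country, nominal`.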
30 Testing
- Test strategy
- Test by use cases
- Unit test by developer
- Integration test by two developers
- System test by specified tester
- Test cases (7)
- Test equal values, missing/dirty data, accuracy, special cases, capacity, color distribution
31 Testing
- Bug reports and fixes
- Report by bug report form (one bug per form)
- Fixed by developer or tester
- Test assumptions
- Input data is comma separated and its format follows our specification
- The user first selects the target, then selects anchors
- Cannot handle Boolean type attributes
32 Test Case 4: Accuracy Test
- Plotting algorithm (RADVIZ approach)
- Data segment
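For reference, the standard RADVIZ mapping that an accuracy test would typically check by hand can be written as follows. The notation is assumed for this sketch: $x_{ij}$ is the value of attribute $j$ in row $i$, $\theta_j$ is the angle of anchor $j$, and $n$ is the number of anchors. The slides do not give the team's exact formula, so this is the textbook form, not necessarily the one implemented.

```latex
\tilde{x}_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}},
\qquad
a_j = (\cos\theta_j,\ \sin\theta_j),
\qquad
p_i = \frac{\sum_{j=1}^{n} \tilde{x}_{ij}\, a_j}{\sum_{j=1}^{n} \tilde{x}_{ij}}
```

Each plotted point $p_i$ is therefore a weighted average of the anchor positions, with the normalized attribute values as weights.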
33 Test Case 4: Accuracy Test
34 Test Case 5: Special Case
- No anchor is selected
- One anchor is selected
- 2 anchors are selected
- 25 anchors are selected
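To see why these anchor counts are worth testing, the sketch below applies the textbook RADVIZ weighted average to one row of already-normalized values, with anchors assumed to be evenly spaced. With no anchors there is nothing to plot, with one anchor every point collapses onto it, and with two anchors every point falls on the line between them. `radviz_xy` is a hypothetical helper, and how the real system reacts to these cases is not stated in the slides.

```python
import math

def radviz_xy(weights):
    """Place one row of normalized values (each in [0, 1]) among len(weights) anchors."""
    k = len(weights)
    if k == 0:
        return None  # no anchors selected: nothing to plot
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)  # no pull toward any anchor: use the center (assumed)
    x = sum(w * math.cos(2 * math.pi * j / k) for j, w in enumerate(weights)) / total
    y = sum(w * math.sin(2 * math.pi * j / k) for j, w in enumerate(weights)) / total
    return (x, y)

no_anchor = radviz_xy([])              # None
one_anchor = radviz_xy([0.3])          # lands on the single anchor at (1, 0)
two_anchors = radviz_xy([0.25, 0.75])  # lies on the x-axis between the two anchors
many_anchors = radviz_xy([1.0] * 25)   # equal pull from 25 anchors: near the center
```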
35 Test Case 6: System Capacity
- Data source
- case6/housing-names.txt (14 attributes, one of them is nominal)
- case6/housing-data.txt (number of instances: 506)
- Database: housing
- No missing values.
36 Test Case 7: Color Distribution
- Data source
- case6/bignames.txt (9 attributes, 4 of them are nominal; car name is unique)
- case6/bigdata.txt (number of instances: 392)
- Database: auto-mpg
- No missing values.
37 Thanks!