Dealing with Non-Ideal Data

1
Dealing with Non-Ideal Data
  • How to handle large, incomplete, and inconsistent
    datasets for use in trend analysis.
  • Brian May, October 2005

2
Project Goals
  • Complete trend analysis on visibility data
    collected from Asheville Regional Airport
  • Create a tool that can help with trend analysis
    on any type of data from any weather station

3
Trend Analysis
Trend analysis is the application of statistical
techniques to make and justify statements about
trends in data. To show a trend visually, the
data must first be filtered to smooth out noise
and cyclic variation.
[Figures: Noisy Data; Cyclic Data]

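The slides do not say which filter is used at this point, so the following is only a minimal sketch under an assumption: a trailing moving average, one common way to smooth both noise and a cycle. The class name TrailingMeanFilter and its method are hypothetical. When the window length equals the cycle length, each full window covers exactly one cycle and the cyclic component averages out.

```java
// Hypothetical helper, not taken from the slides: a trailing moving average.
final class TrailingMeanFilter {

    /**
     * Returns out[i] = mean of the `window` samples ending at index i.
     * Entries before the first full window are NaN, because there is
     * not yet enough data to average.
     */
    static double[] smooth(double[] values, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (i < window - 1) {
                out[i] = Double.NaN;
                continue;
            }
            double sum = 0.0;
            for (int j = i - window + 1; j <= i; j++) {
                sum += values[j];
            }
            out[i] = sum / window;
        }
        return out;
    }
}
```

For example, a constant level of 10 with a 4-sample cycle (10, 11, 10, 9, repeated) smooths to a flat 10 once the window is full, provided the window is also 4.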
4
Problems with Data
  • Missing Data
  • Inconsistent Data
    • Changes to measurement techniques
    • Changes to measurement equipment
  • Large Data sets
    • 10 yrs: 10 × 365 × 24 = 87,600 observations

5
Missing Data
  • Ignore missing values
  • When averaging, divide by the actual number of values

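Divide by the number of values actually present:

    int N = 365 * 24, count = 0;
    float avg, sum = 0;
    for (int i = 0; i < N; i++)
        if (data[i] != null) { sum += data[i]; count++; }
    avg = sum / count;  // separate counter, so the loop bound N stays fixed

Dividing by N instead treats every missing value as 0 and pulls the
average toward zero:

    int N = 365 * 24;
    float avg, sum = 0;
    for (int i = 0; i < N; i++)
        if (data[i] != null) sum += data[i];
    avg = sum / N;
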
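As a self-contained illustration of the averaging rule on this slide, here is a minimal sketch. It assumes missing observations are represented as null entries in a Float[] array. The class name MissingAwareMean and the choice to return NaN when every value is missing are assumptions, not something the slides specify.

```java
// Hypothetical helper: mean of the values that are present, ignoring nulls.
final class MissingAwareMean {

    /**
     * Sums only the non-null entries and divides by how many there were.
     * Returns NaN when no value is present, which avoids dividing by zero.
     */
    static double mean(Float[] data) {
        double sum = 0.0;
        int count = 0;
        for (Float value : data) {
            if (value != null) {
                sum += value;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

With the input {2, null, 4} this yields 3, whereas dividing by the array length would yield 2.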
6
Inconsistent Data
  • Handling changes in data range
    • Calculated Max
      • Difficult to implement
      • Most versatile
    • User defined Max
      • Easy to implement
      • Less versatile
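The slide does not define "Max", so this sketch rests on an assumption: that the reportable range of the measurement changed at known points in time, and that both options cap every value at one common ceiling so the whole series is comparable. Under that reading, the user-defined option takes the ceiling as a parameter, and the calculated option derives it as the smallest of the per-segment maxima. The class RangeCap, its methods, and the segmentStarts parameter are all hypothetical.

```java
// Hypothetical sketch of the two options; the meaning of "Max" is an assumption.
final class RangeCap {

    /** User-defined max: clamp every value to a ceiling chosen by the caller. */
    static double[] capAt(double[] values, double max) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.min(values[i], max);
        }
        return out;
    }

    /**
     * Calculated max: the smallest of the per-segment maxima.
     * segmentStarts holds the first index of each segment, in ascending
     * order, and every segment is assumed to be non-empty.
     */
    static double calculatedMax(double[] values, int[] segmentStarts) {
        double common = Double.POSITIVE_INFINITY;
        for (int s = 0; s < segmentStarts.length; s++) {
            int from = segmentStarts[s];
            int to = (s + 1 < segmentStarts.length) ? segmentStarts[s + 1] : values.length;
            double segmentMax = Double.NEGATIVE_INFINITY;
            for (int i = from; i < to; i++) {
                segmentMax = Math.max(segmentMax, values[i]);
            }
            common = Math.min(common, segmentMax);
        }
        return common;
    }
}
```

The trade-off matches the slide: capAt alone is trivial but needs someone to know the right ceiling, while calculatedMax adapts to any dataset at the cost of having to locate the change points first.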

7
Large Datasets
Number of elements: 5 × 365 × 24 = 43,800
  • Performance
    • Retrieve: 74.8 s
    • Prepare: 0.1 s
    • Process: 92.8 s
    • Output: 0.8 s
    • Total: 168.5 s

Retrieve: Repetitive Sorting
Process: Bad Algorithm
8
Large Datasets Contd.
  • Retrieve Problem
    • SQL query used an ORDER BY clause
    • Dataset was already stored in a sorted data structure
  • Solution
    • Remove the ORDER BY clause from the SQL query
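To illustrate why sorting in the query is redundant here, the sketch below loads rows in arbitrary order into a sorted structure and reads them back in order. The slides do not name the structure, so java.util.TreeMap, the class SortedObservations, and the use of long timestamps as keys are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a sorted map orders the rows itself, so the
// database does not need to sort them first.
final class SortedObservations {

    /** Inserts rows in whatever order they arrive; returns the keys in sorted order. */
    static List<Long> timestampsInOrder(long[] timestamps, double[] values) {
        TreeMap<Long, Double> byTime = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            byTime.put(timestamps[i], values[i]);
        }
        // TreeMap iterates its keys in ascending order regardless of insertion order.
        return new ArrayList<>(byTime.keySet());
    }
}
```

Rows arriving as 3, 1, 2 come back as 1, 2, 3, so an unsorted result set costs nothing in correctness and spares the database a sort.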

avg += (1/N) * (v[i] - v[i-N])
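The formula on this slide reads most naturally as a sliding-window update: when the window moves forward one step, the average changes by the newest value minus the value that just left the window, divided by N. That reading is an assumption, since the operators were lost in the transcript. A minimal sketch, with a hypothetical class name RunningWindowMean:

```java
// Hypothetical sketch of an O(1)-per-step sliding-window average.
final class RunningWindowMean {

    /**
     * Returns the mean of every full window of length n over v.
     * out[k] is the mean of v[k] .. v[k + n - 1].
     */
    static double[] smooth(double[] v, int n) {
        if (n <= 0 || n > v.length) {
            throw new IllegalArgumentException("window must be between 1 and v.length");
        }
        double[] out = new double[v.length - n + 1];

        // The first window is summed directly, once.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += v[i];
        }
        double avg = sum / n;
        out[0] = avg;

        // Each later window reuses the previous average.
        for (int i = n; i < v.length; i++) {
            avg += (v[i] - v[i - n]) / n;
            out[i - n + 1] = avg;
        }
        return out;
    }
}
```

Each step does constant work instead of re-summing N values. One caveat of this design is that floating-point rounding error can accumulate over a long series, so recomputing the sum from scratch at intervals may be worthwhile.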
9
Large Datasets Contd.
Number of elements: 5 × 365 × 24 = 43,800
  • Tuned Performance
    • Retrieve: 59.7 s
    • Prepare: 0.1 s
    • Process: 1.4 s
    • Output: 0.8 s
    • Total: 62.0 s

Retrieve: 74.8 − 59.7 = 15.1 s
Process: 92.8 − 1.4 = 91.4 s
Total: 168.5 − 62.0 = 106.5 s
10
Project Timeline
  • Aug: Data access
  • Sept: Initial graph
  • Oct: Investigate filters
  • Nov: Apply filter and plot
  • Dec: Apply to new elements
  • Jan: Apply new filters
  • Feb: Investigate higher dimensions
  • Mar: Finalize project
  • Apr: Finalize report and presentation