Mirror Outlier Detection in Foreign Trade Data



1
Mirror Outlier Detection in Foreign Trade Data
  • Markos Fragkakis
  • NTTS 2009

2
Introduction
  • Foreign Trade data
  • Improvement of FT quality is essential
  • Quality can be assessed using several dimensions
    (e.g. accuracy, timeliness, clarity)
  • We focus on accuracy using outlier detection
  • Methods for outlier detection (e.g. threshold-based,
    model-based)
  • Presentation of the Mirror Outlier Detection
    application

3
Methodology
  • Univariate detection in time series (value,
    quantity, supplementary quantity)
  • Median Absolute Deviation (MAD)
  • Robust: uses the median, not the mean
  • Non-parametric
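
The slide names the Median Absolute Deviation but does not give the decision rule. The following is a minimal sketch, assuming the commonly used robust z-score with the 1.4826 consistency constant and a cut-off of 3.5; both numbers are conventional choices and are not taken from the presentation. The function name `mad_outliers` and the sample series are hypothetical. The sign of each outlier is returned because the mirror comparison later in the deck depends on it.

```python
from statistics import median

def mad_outliers(series, threshold=3.5):
    """Return {index: sign} for points whose robust z-score exceeds threshold.

    sign is +1 for an unusually high value and -1 for an unusually low one.
    The 1.4826 factor and the 3.5 cut-off are assumptions, not from the slides.
    """
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        # At least half of the points equal the median: no usable scale estimate.
        return {}
    flagged = {}
    for i, x in enumerate(series):
        score = (x - med) / (1.4826 * mad)
        if abs(score) > threshold:
            flagged[i] = 1 if score > 0 else -1
    return flagged

# Made-up monthly trade values with one obvious spike at index 5.
monthly_values = [100, 102, 98, 101, 99, 250, 100, 97]
print(mad_outliers(monthly_values))
```

Because both the centre and the scale are medians, the single spike does not inflate the scale estimate the way it would inflate a standard deviation.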

4
Mirror Outlier Detection
  • Characterization of outliers according to the mirror
    flow
  • Possible outlier types
  • Green: outlier appears in mirror (same sign)
  • Red: outlier does not appear in mirror
  • Violet: outlier appears in mirror (opposite sign)
  • Black: mirror series not present
  • Pink: mirror series not present (confidentiality)
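
The colour scheme above can be read as a small decision table. The sketch below is one possible encoding, assuming each outlier carries a sign (+1 or -1) and that the mirror flow is the same trade as reported by the partner country. The function and parameter names are hypothetical and do not come from the MOD application.

```python
def classify_outlier(sign, mirror_sign=None, mirror_present=True, confidential=False):
    """Map an outlier and its mirror observation to the slide's colour codes.

    sign:           +1 or -1 for the detected outlier.
    mirror_sign:    +1 or -1 if the mirror point is also an outlier, else None.
    mirror_present: False when no mirror series exists.
    confidential:   True when the mirror series is missing for confidentiality.
    """
    if not mirror_present:
        return "pink" if confidential else "black"
    if mirror_sign is None:
        return "red"      # outlier does not appear in the mirror
    if mirror_sign == sign:
        return "green"    # appears in the mirror with the same sign
    return "violet"       # appears in the mirror with the opposite sign

print(classify_outlier(+1, mirror_sign=+1))
print(classify_outlier(+1, mirror_sign=-1))
print(classify_outlier(-1, mirror_present=False, confidential=True))
```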

5
Additional functionalities
  • Outlier classification (error in dimension, not
    observed values)
  • Swapping of observations between series
  • Copy of observations
  • Time delay (hidden green outlier)
  • Outlier detection in short series (product code
    changes)
  • Reporting: detected outliers per country (e-mailed)
    and summary reporting
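
The slides do not say how a time delay is detected. One plausible reading of "hidden green outlier" is that the mirror series contains a same-sign outlier in a neighbouring period rather than in the same one. The sketch below illustrates that reading only; the function name `find_delayed_match` and the `max_lag` window of one period are assumptions.

```python
def find_delayed_match(period, sign, mirror_outliers, max_lag=1):
    """Return the lag at which the mirror has a same-sign outlier, or None.

    mirror_outliers maps a period index to the outlier sign (+1 or -1).
    Lag 0 is an ordinary green outlier; a non-zero lag suggests the partner
    recorded the same movement in an earlier or later period.
    """
    # Try the smallest shifts first: 0, then -1, +1, and so on.
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        if mirror_outliers.get(period + lag) == sign:
            return lag
    return None

# Made-up example: the mirror series shows the spike one period later.
print(find_delayed_match(5, +1, {6: +1}))
```

An outlier that would otherwise be coloured red can then be re-examined as a delayed green one when the returned lag is non-zero.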

6
Example of detected outlier

7
Example of error due to swap

8
Error due to time delay

9
Technical Information
  • MOD-DB has an RDBMS repository for storing outlier
    data (supports Oracle and MySQL)
  • Implemented in Java (portability,
    maintainability)
  • Command Line Interface
  • Performance issues
  • Large volumes of data cause a bottleneck in the DB
  • Storage requirements are a concern (several GB per month)

10
Architecture

11
Proposal for new platform
  • Use a multi dimensional viewer
  • Enable OLAP functions (slice, dice, roll-up,
    drill-down)
  • Create dynamic charts from data
  • Estimated variables (indices from raw outlier
    data)
  • Data mining could be performed for extracting
    inferences from data
  • Log-linear models
  • Pinpointing of poor-quality data involving high values
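
To make the OLAP terms concrete, here is a small sketch of slice and roll-up over outlier records using only the Python standard library. The record fields (`reporter`, `partner`, `product`, `colour`) and the sample rows are made up for illustration; the proposed platform's actual data model is not described in the slides.

```python
from collections import Counter

def slice_records(records, **fixed):
    """Slice: keep only records whose dimensions equal the given values."""
    return [r for r in records if all(r[k] == v for k, v in fixed.items())]

def roll_up(records, dims):
    """Roll-up: count outlier records grouped by the listed dimensions."""
    return Counter(tuple(r[d] for d in dims) for r in records)

# Hypothetical outlier records, one per detected outlier.
records = [
    {"reporter": "FR", "partner": "DE", "product": "0101", "colour": "red"},
    {"reporter": "FR", "partner": "IT", "product": "0101", "colour": "green"},
    {"reporter": "DE", "partner": "FR", "product": "0202", "colour": "red"},
]

# Outliers per reporting country, across all partners and products.
print(roll_up(records, ["reporter"]))

# Drill down into red outliers only, by reporter and product.
print(roll_up(slice_records(records, colour="red"), ["reporter", "product"]))
```

Counts like these are one simple form of the "indices from raw outlier data" mentioned above; a real multidimensional viewer would compute them inside the database or an OLAP engine rather than in memory.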

12
Conclusions
  • Use of mirror flow for outlier characterisation
  • New features
  • Improving quality
  • Enables building a new platform for data exploration
  • Expansion of MOD to FT data outside the EU and to
    other domains

13
Questions
  • Thank you for your attention