1
Data Mining with AURA
  • Jim Austin
  • University of York
  • Cybula Ltd

2
Overview
  • AURA
  • Background to AURA
  • Brief overview of its components
  • Its implementation
  • AURA within UK e-Science
  • What is e-Science
  • The DAME pilot project
  • Use of AURA in DAME
  • GRID issues in DM

3
The AURA Technology
  • Neural network based associative storage
  • Set of tools to build fast pattern recognition
    systems
  • Aimed at unstructured data
  • Aimed at large datasets
  • Scalable technology

4
AURA as a basis for search
  • The game is to remove the chaff using AURA.
  • Later processes find the exact match.

5
The storage system
  • Correlation Matrix Memory based
  • Exploits threshold logic methods
  • Uses distributed encoding of information
  • Implemented using binary weights for efficient
    software and hardware implementation

6
[Diagram: correlation matrix memory. Labels: inputs (P), weights (M), threshold (T), R.]
7
Why is it fast?
  • Access only the rows that are activated by inputs.
  • Inputs are made as sparse as possible and have a
    fixed weight (a fixed number of set bits).
  • Only the active rows (bit vectors) need to be
    summed, which is ideal for most processors.
  • Great for bit vector machines (DAP!).
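
The storage and recall steps described on the last few slides can be sketched as a toy binary correlation matrix memory. This is a minimal illustration, not the AURA C library: the class name `BinaryCMM`, its methods, and the choice to pack each row into a Python integer are all assumptions made for the example.

```python
class BinaryCMM:
    """Toy binary correlation matrix memory (illustrative only, not AURA)."""

    def __init__(self, n_inputs, n_outputs):
        # One row per input line; each row is a bit vector packed into an int.
        self.rows = [0] * n_inputs
        self.n_outputs = n_outputs

    def store(self, input_bits, output_bits):
        """Associate a sparse input pattern with an output pattern.

        Both patterns are collections of set-bit positions. A binary weight
        is set to 1 wherever an active input line meets an active output.
        """
        mask = 0
        for j in output_bits:
            mask |= 1 << j
        for i in input_bits:
            self.rows[i] |= mask

    def recall(self, input_bits, threshold):
        """Sum only the rows activated by the input, then apply the threshold."""
        sums = [0] * self.n_outputs
        for i in input_bits:  # rows of inactive inputs are never touched
            row = self.rows[i]
            for j in range(self.n_outputs):
                sums[j] += (row >> j) & 1
        return {j for j, s in enumerate(sums) if s >= threshold}


cmm = BinaryCMM(n_inputs=8, n_outputs=6)
cmm.store({0, 3, 5}, {1, 4})
cmm.store({1, 3, 6}, {2, 4})

# Exact recall: the threshold equals the fixed input weight (3 set bits).
full_match = cmm.recall({0, 3, 5}, threshold=3)
# Partial recall: one input bit is wrong, so the threshold is lowered to 2.
partial_match = cmm.recall({0, 3, 7}, threshold=2)
```

Because the inputs have a fixed number of set bits, the threshold for an exact match can simply be that number, and lowering it allows partial matches.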

8
Use of the CMM
[Diagram: Query and Data -> CMM system -> Data subset -> Slow algorithm -> Final data]
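
The two-stage idea (a fast stage removes the chaff, a slower stage finds the exact match) can be sketched as follows. The fast stage here is a simple shared-word filter standing in for the CMM; the function names and sample records are hypothetical and chosen only for illustration.

```python
def fast_filter(query, records, min_shared=1):
    """Stand-in for the CMM stage: keep records that share words with the query."""
    query_words = set(query.split())
    return [r for r in records if len(query_words & set(r.split())) >= min_shared]


def slow_exact_match(query, candidates):
    """Stand-in for the slow algorithm: exact phrase match on the small subset."""
    return [c for c in candidates if query in c]


records = [
    "engine vibration high",
    "fuel pump fault",
    "engine vibration low",
    "cabin light fault",
]
query = "vibration high"

subset = fast_filter(query, records)     # cheap, may keep some false positives
final = slow_exact_match(query, subset)  # expensive, runs on the subset only
```

The fast stage is allowed to return false positives ("engine vibration low" survives it here); it only has to be cheap and must not discard true matches.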
9
CMM system
[Diagram labels: Pre-process, Operations, Prepare data, Post process, CMM system]
10
Pre-processing
  • Implements a number of pre-processors
  • N-grams for text strings
  • CMAC for numeric data
  • Graphs for images and graphics
  • Tokens for logical data
  • Quantisation for time series
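
As an illustration of the first pre-processor in the list, the sketch below turns a text string into a binary vector through its character n-grams. Hashing each n-gram to a bit position with `zlib.crc32`, and the vector width of 64, are assumptions made for this example; the slides do not say how AURA maps n-grams to bits.

```python
import zlib


def ngrams(text, n=3):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_bit_vector(text, n=3, width=64):
    """Set one bit per n-gram; the bit position comes from a stable hash."""
    bits = [0] * width
    for gram in ngrams(text.lower(), n):
        bits[zlib.crc32(gram.encode("utf-8")) % width] = 1
    return bits


def overlap(a, b):
    """Count the positions where both vectors have a set bit."""
    return sum(x & y for x, y in zip(a, b))


v_correct = ngram_bit_vector("compressor")
v_misspelt = ngram_bit_vector("compresor")
shared = overlap(v_correct, v_misspelt)
```

Strings that share n-grams set the same bits, so a misspelt word typically still overlaps strongly with the correct one, which suits the approximate matching a CMM performs.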

11
Post processing
  • Data selected by the CMM must be accessed
    quickly.
  • Uses best bit index method to match output data
    and recover stored data.

12
Implementation
  • The AURA C library
  • Implemented on PC or workstation
  • Beowulf parallel cluster
  • Origin 2000 supercomputer
  • Bespoke hardware

13
Cortex-1
  • AURA parallel implementation
  • 28 dedicated PCI based processors
  • Beowulf configuration
  • 3.5Gb memory size
14
UK e-Science
  • Aims to build on the concept of Grids
  • To make computing and data provision as direct
    and simple as electrical power delivery
  • £110M initiative started 18 months ago
  • DAME is a £3.5M pilot project to demonstrate its
    application in the engineering field.

15
DAME Objectives
  • DAME: Distributed Aircraft Maintenance
    Environment.
  • Demonstrate diagnostic capability on the GRID
  • Examine timeliness properties of the GRID
  • Demonstrate on the Rolls-Royce (RR) aero-engine
    diagnostic problem

16
University of Sheffield, P Fleming.
University of Leeds, Peter Dew, Alison McKay.
York, J Austin, J McDermid, A Wellings.
University of Oxford, Lionel Tarassenko.
Rolls-Royce, Derby.
Data Systems Solutions.
Cybula Ltd.
17
[Diagram labels: Engine flight data, London Airport, New York Airport, Airline office, Grid, Diagnostics centre, Maintenance Centre, American data center, European data center]
18
Diagnostic issues
  • The system must analyse and report novel engine
    operation
  • Identify any cause of events
  • Do this quickly
  • Data is large (many Tb)

19
Data
[Figure: Zmod plots]
20
How does AURA contribute?
  • Search technology for multi-media data
  • Parallel pattern match engine based on neural
    networks.
  • Built on Correlation Matrix Memories.
  • High performance Beowulf and dedicated hardware
    implementations.
  • Commercially sold by Cybula Ltd.

21
[Diagram labels: Diagnostic station, Engine data, Novelty indication, Quote, Data used to identify novelty, Data reduction processes, Match requests, Features, Data to be searched for, Pattern match results, Data stores/data warehouse, Diagnosis, AURA-G, GRID]
22
Simple example of processing chain
[Diagram labels: Data sample, DM coding, CMM, Matching previous events]
23
Typical pre-processing
[Plot: frequency against time, with the resulting code 01101111011110111]
  • DM coding (1 up and 0 down)
  • Fast
  • Preserves information
  • Produces a binary vector
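
The "1 up and 0 down" coding can be sketched as below. This assumes DM stands for a delta-modulation style code and takes the simplest reading, comparing each sample with the one before it; classical delta modulation compares against a running estimate instead. Treating an unchanged value as 0 is also an assumption, and `dm_encode` is a hypothetical name.

```python
def dm_encode(samples):
    """Emit 1 when the signal rises and 0 otherwise (falls or stays level)."""
    return [1 if cur > prev else 0 for prev, cur in zip(samples, samples[1:])]


# Hypothetical frequency readings over time.
readings = [3, 5, 4, 4, 6, 7, 2]
code = dm_encode(readings)
```

A series of n samples produces n - 1 bits, so the result can be used directly as a binary input vector.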
24
AURA-G
  • This is a Globus enabled AURA implementation.
  • Developed under DAME
  • Will be available end of 2002 for use in other
    problems.

25
AURA-G
  • Support of scalable pattern matching
  • Supports distributed search, across multiple CMM
    engines at different sites
  • OGSA compliant
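
Distributed search across several engines can be pictured as sending the same query to every site and merging the partial results. The sketch below runs entirely in one process, using threads in place of remote sites; it makes no Globus or OGSA calls, and the engine functions, site names and stored patterns are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_engine(site, stored):
    """Build a stand-in search engine holding one site's stored bit patterns."""
    def search(query_bits, threshold):
        hits = []
        for name, bits in stored.items():
            score = len(query_bits & bits)  # number of matching set bits
            if score >= threshold:
                hits.append((score, site, name))
        return hits
    return search


engines = [
    make_engine("site_a", {"event_1": {0, 2, 5}, "event_2": {1, 3, 4}}),
    make_engine("site_b", {"event_3": {0, 2, 6}, "event_4": {7, 8, 9}}),
]


def distributed_search(query_bits, threshold):
    """Query every engine in parallel and merge the hits, best score first."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        partials = list(pool.map(lambda e: e(query_bits, threshold), engines))
    merged = [hit for part in partials for hit in part]
    return sorted(merged, reverse=True)


results = distributed_search({0, 2, 5}, threshold=2)
```

Each engine only returns hits above the threshold, so the merge step handles a small result list rather than the stored data itself.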

26
Grid Issues in Data Mining
  • Data provenance
  • Standards
  • Data transparency independent of location
  • Managing DB/Data mining link in distributed
    system
  • OGSA-DAI

27
Conclusions
  • AURA is a mature component for data search and
    retrieval
  • Robust software and hardware implementation
    available
  • Grid applications in e-Science are underway

28
Contacts
Jim Austin, Dept Computer Science, University of York, York, YO10 5DD.
www.cs.york.ac.uk/arch
austin@cs.york.ac.uk
01904 432734, 01904 432767
Cybula Ltd. www.cybula.com 01377 236382
DAME www.cs.york.ac.uk/dame