Working with SumatraTT application - PowerPoint PPT Presentation

Slides: 34
Provided by: marting151

Transcript and Presenter's Notes


1
Working with SumatraTT application

2
SumatraTT - GUI
Fast toolbar of modules
Control
Tree of available modules
Workspace
3
Create processing schema
Place it into workspace
Double-click on module to open properties
Find suitable module
4
JDBC Fetcher module properties
Fill in connection parameters
Write SQL query directly
Continue
or choose the wizard, which will help you build your
SQL query
5
JDBC Fetcher module properties
List of columns of selected table
Choose required columns
Choose source from list of available tables and
views
Specify query condition (optional)
Finish module properties
Click on Fetch button to see a part of result set

See query result
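
The wizard steps above (choose a table or view, choose columns, optionally add a condition) amount to assembling a SELECT statement. The following is a minimal sketch of that assembly in plain Java. The class and method names are hypothetical and are not part of SumatraTT, and the table and column names in the example are invented for illustration.

```java
import java.util.List;

// Hypothetical helper, not SumatraTT code: assembles the SQL text that a
// table/column/condition wizard might produce.
final class QuerySketch {

    // Builds "SELECT <columns> FROM <table> [WHERE <condition>]".
    // Names are inserted verbatim, so this is only suitable for trusted input.
    static String buildQuery(String table, List<String> columns, String condition) {
        // No selected columns is treated as "all columns" (an assumption).
        String cols = columns.isEmpty() ? "*" : String.join(", ", columns);
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(cols)
                .append(" FROM ")
                .append(table);
        // The condition is optional, as in the wizard.
        if (condition != null && !condition.trim().isEmpty()) {
            sql.append(" WHERE ").append(condition.trim());
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Invented example names; any real schema will differ.
        System.out.println(buildQuery("patients", List.of("id", "age"), "age > 60"));
    }
}
```

The resulting string is what one would otherwise type directly into the query field of the module properties.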
6
Create processing schema
Module shows incoming data in table
Place all remaining modules into workspace and connect
them to each other
Module exports incoming data into Weka format
Module splits data flow into two or more ways
Module writes incoming data into text file
Module performs scripting data modification
7
Scripting module properties
Specify output data format in Init section
Double-clicking the Scripting module opens its
properties
8
Scripting module properties
Each module has its own documentation which
specifies how to use it. Choose About module
item from popup menu of module in workspace to
see documentation
Specify processing script. Java language is
supported as scripting language
9
ToWeka and ToFile properties
Specify output file where data will be stored in
Weka format
Specify output file, field delimiter and header
option in properties of ToFile module
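
To make the two output options concrete, here is a minimal sketch in plain Java of what such exports could look like: delimited text with a configurable field delimiter and optional header line, and a bare-bones ARFF file (the text format Weka reads). The class and method names are hypothetical, only numeric columns are handled, and nothing here is taken from the actual ToWeka or ToFile modules.

```java
import java.util.List;

// Illustrative only: these helpers are hypothetical and are not SumatraTT code.
final class ExportSketch {

    // Delimited text with an optional header line.
    static String toDelimited(List<String> names, List<double[]> rows,
                              String delimiter, boolean writeHeader) {
        StringBuilder out = new StringBuilder();
        if (writeHeader) {
            out.append(String.join(delimiter, names)).append('\n');
        }
        for (double[] row : rows) {
            out.append(joinRow(row, delimiter)).append('\n');
        }
        return out.toString();
    }

    // Minimal ARFF text for numeric attributes only.
    static String toArff(String relation, List<String> names, List<double[]> rows) {
        StringBuilder out = new StringBuilder("@relation ").append(relation).append("\n\n");
        for (String name : names) {
            out.append("@attribute ").append(name).append(" numeric\n");
        }
        out.append("\n@data\n");
        for (double[] row : rows) {
            out.append(joinRow(row, ",")).append('\n');
        }
        return out.toString();
    }

    // Joins the values of one record with the given delimiter.
    private static String joinRow(double[] row, String delimiter) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            line.append(row[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Invented sample records.
        List<String> names = List.of("age", "weight");
        List<double[]> rows = List.of(new double[] {61, 80.5}, new double[] {47, 72});
        System.out.print(toDelimited(names, rows, ";", true));
        System.out.print(toArff("patients", names, rows));
    }
}
```

Both methods return strings rather than writing files, so the sketch can be tried without choosing an output path.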
10
Run schema
Run schema using control button
All module properties have been set. The processing
schema is now ready
11
Process done
Connections between modules are green during
transformation process and black when process is
done.
Output files were created
Table viewer module shows raw data
Numbers next to connections show how many records
have already been processed
12
Additional graphics
Tool bar of additional graphics
Additional graphics can be added into workspace
to improve schema understanding
13
Project description
Project description will be used as a part of
project documentation. Documentation can be
generated automatically from Tools menu
Choose Description item from Project menu to
specify project description
Write project description. HTML tags can be
used to format plain text
14
Automatic project documentation
Documentation is generated in html format. It can
be easily presented or distributed
15
Data mining applications of SumatraTT

16
Basic processing steps
  • direct conversion between various syntactic data
    formats
      • SQL, CSV, DBF, Weka, XML, Lisp,
  • data understanding and visualisation
      • First-touch review, Static, Interactive, Advanced
  • handling missing values, outliers and errors in
    data
      • Script module: Java syntax
  • creation of data sources for modelling and
    evaluation (e.g., random division, feature
    enhancement)
      • subset creation: Fair subset, Vario subset,
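
As an illustration of the "random division" step, the following minimal sketch in plain Java splits a list of records into a training part and a test part. It is not the Fair subset or Vario subset module; the class name, method name and the fixed-seed shuffle are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of random division; not SumatraTT code.
final class RandomDivisionSketch {

    // Returns a two-element list: [training records, test records].
    // A fixed seed makes the division reproducible.
    static <T> List<List<T>> split(List<T> records, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        // Shuffle a copy so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(records);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<T> train = new ArrayList<>(shuffled.subList(0, cut));
        List<T> test = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> parts = split(ids, 0.7, 42L);
        System.out.println("train size: " + parts.get(0).size());
        System.out.println("test size:  " + parts.get(1).size());
    }
}
```

Every record lands in exactly one of the two parts, which is the property a modelling/evaluation split needs.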

17
Basic processing steps
  • changing dimension of the problem
      • domain and range of individual attributes (e.g.,
        design of discrete/categorical values),
      • elimination/addition of the chosen attributes
        (e.g., data enrichment from external sources):
        Choose fields, Merge to one sequence, Split to x
        parts,
      • sophisticated techniques of data
        enhancement/reduction: Trends, Wavelets
        (Matlab),

18
Key features of SumatraTT
  • Modular architecture
  • Extensible
  • Processing / Formatting / Filtering
  • User-friendly Environment
  • Rich set of I/O modules: SQL database, text
    files, XML, WEKA, DBF, etc.
  • Automatic project documentation
  • Fast-use modules: First Touch Review
  • Internal SQL database

19
Internal design of SumatraTT
  • Various data formats
  • Numbers
  • Strings
  • Xml
  • Image
  • Missing values handling
  • Processing schema represents data flow
  • Two-channel communication (data and metadata)
  • Ad hoc data format negotiation
  • Metadata messages control the transformation process

20
Example of SumatraTT project
  • Final project focus
  • Preprocess medical data from patient examinations
  • Strongly predictive - predict number of
    prescriptions for all 35 procedures per week
  • SumatraTT tasks
  • Source format: raw data from database
  • Subgroup discovery criteria: otherness and
    frequency
  • Target format: Weka source files for subsequent
    data mining
  • Processing schema

21
Processed data in Weka
22
Solved SumatraTT projects

23
Resource allocation at spa
  • Data provider
  • find anything interesting which can help us to
    better understand and control our spa facilities
  • Interesting tasks - business understanding
  • identify previously unknown groups of clients
    exhibiting characteristic behavior or
    requirements
  • for such groups, predict a set of procedures to
    be passed
  • Final project focus
  • strongly predictive - predict number of
    prescriptions for all 35 procedures per week
  • SumatraTT
  • subgroup discovery criteria: otherness and
    frequency

24
Resource allocation at spa
25
Health - risk factors of atherosclerosis
  • Data provider
  • get new knowledge from Stulong data
  • Interesting tasks - business understanding
  • analytical questions were defined: interactions
    among factors, identification of cardiovascular
    disease (CVD) risk factors, influence of their
    development in time
  • Our focus
  • anachronism risks when dealing with time
    aggregates
  • global approach vs. windowing
  • SumatraTT
  • aggregations, windowing, trends
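
One way to read the "global approach vs. windowing" point: a global aggregate over a whole time series mixes in values recorded after the moment being described, whereas a trailing window only looks backwards. The sketch below, in plain Java, computes such a trailing-window mean. This reading of "anachronism", and all names in the code, are assumptions for illustration and do not come from the SumatraTT modules.

```java
import java.util.Arrays;

// Hypothetical sketch of a backward-looking window aggregate; not SumatraTT code.
final class WindowingSketch {

    // out[i] is the mean of the last `window` values ending at index i
    // (fewer values at the start of the series). No value after index i
    // contributes to out[i].
    static double[] trailingMean(double[] series, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        double[] out = new double[series.length];
        double sum = 0.0;
        for (int i = 0; i < series.length; i++) {
            sum += series[i];
            if (i >= window) {
                // Drop the value that just left the window.
                sum -= series[i - window];
            }
            int count = Math.min(i + 1, window);
            out[i] = sum / count;
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented measurements, for illustration only.
        double[] series = {1, 2, 3, 4};
        System.out.println(Arrays.toString(trailingMean(series, 2)));
    }
}
```

A trend feature could be derived in the same backward-looking way, for example as the difference between the newest and oldest value in each window.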

26
Health - risk factors of atherosclerosis
27
Health - risk factors of atherosclerosis
28
Health - risk factors of atherosclerosis
29
Health - risk factors of atherosclerosis
30
Industry - intelligent pump diagnostic
  • The final goal - result
  • an algorithmic framework for non-intrusive and
    early diagnosis of cavitation in centrifugal
    pumps
  • Interesting tasks
  • identify suitable sensors, optimize their number
    and placement
  • what is the influence of number (and thus
    resolution) of the power spectral density
    features?
  • can we deal with a large number of features
    having only a limited number of training
    examples?
  • class values are ordered, can we benefit from
    this ordering?
  • SumatraTT
  • visualizations in multidimensional attribute
    spaces - RadViz

31
Industry - intelligent pump diagnostic
32
Transport - improving road safety
  • Traffic geographical data
  • traffic accidents in UK, data collected for 20
    years, 1.5 GB
  • Interesting tasks
  • general objective: improve understanding of road
    safety
  • influence of road surface, skidding, location,
    street lighting
  • Results
  • clusters of common accidents, risk conditions
    more likely resulting in serious accidents
  • SumatraTT
  • segmentation with respect to time, aggregation with respect to location
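
Segmentation by time combined with aggregation by location can be pictured as grouping accident records by a (location, time segment) key and counting each group. The sketch below does this in plain Java. The record layout (a location name and a year), the choice of year as the time segment, and the sample values are all assumptions for illustration; the actual data set and SumatraTT schema are not shown in the slides.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of grouping by location and time segment; not SumatraTT code.
final class AccidentAggregationSketch {

    // Each record is {location, year}. Returns counts keyed by "location/year",
    // sorted by key so the output order is stable.
    static Map<String, Integer> countByLocationAndYear(String[][] records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] record : records) {
            String key = record[0] + "/" + record[1];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented records, not taken from the UK accident data.
        String[][] records = {
            {"A", "1990"}, {"A", "1990"}, {"B", "1991"}
        };
        System.out.println(countByLocationAndYear(records));
    }
}
```

Other aggregates (for example, the share of serious accidents per group) would follow the same grouping pattern with a different accumulator.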

33
Transport - improving road safety