Data Cleaning and Transformation - PowerPoint PPT Presentation

About This Presentation
Title:

Data Cleaning and Transformation

Description:

Data Cleaning and Transformation. Playing in the Mud. The Many Roles of Knowledge Workers ... It's fun to play in the mud sometimes ... – PowerPoint PPT presentation

Slides: 12
Provided by: marki8

Transcript and Presenter's Notes

Title: Data Cleaning and Transformation


1
Data Cleaning and Transformation
  • Playing in the Mud

2
The Many Roles of Knowledge Workers
Brilliant IS idea
3
The Gist of the Problem
  • Getting data out of some system to
      – Analyze it (e.g. Excel, Access, stats package)
      – Get it into another system (e.g. ERP)
  • Smart manipulation of electronic reports with
    embedded data
  • Don't want to do it manually (why?)

4
A former student describing new job with major
consulting firm
  • Lots of modeling, muddy data problems, and
    working with OLAP tools and data warehousing. I
    know the muddy data area was a particular area of
    interest to you and it seems that it is a really
    BIG issue for many businesses - actually bigger
    than I imagined.

5
ACD Report Example
  • Report Header
  • Date
  • Split
  • Blank lines
  • Data!
  • Totals Lines
  • Next report
6
Why Talk About This?
  • Very common problem in business
  • insert examples
  • Huge amount of time wasted doing manual
    processing
  • Really useful spreadsheet and database skills
    (and mindset)
  • Example from the EXCEL-L Developers listserv
  • It's fun to play in the mud sometimes
  • Commercial products such as Content Extractor
    (formerly Cambio) (www.datajunction.com) are
    available if you need to do this stuff routinely.
    Doing it here will allow you to become power
    users of such products quickly.
  • Next few slides are screen shots from Content
    Extractor

7
This is the main window where one defines style
definitions for the different types of lines in
the data file.
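The same idea of giving each kind of line its own definition can be sketched in a few lines of Python. This is only an illustration, not how Content Extractor works internally: the patterns below assume a hypothetical report layout (the actual ACD report text is not reproduced in this transcript), and only the name DAY comes from the slides.

```python
import re

# Hypothetical patterns for each line type; the real ACD report layout
# is not shown in the transcript, so adjust these to the actual file.
LINE_TYPES = [
    ("BLANK", re.compile(r"^\s*$")),
    ("DAY", re.compile(r"^Date:")),
    ("SPLIT", re.compile(r"^Split\s+\d+")),
    ("TOTALS", re.compile(r"^Totals")),
    ("DATA", re.compile(r"^\d{2}:\d{2}")),
]

def classify(line):
    """Return the name of the first line type whose pattern matches, else 'OTHER'."""
    for name, pattern in LINE_TYPES:
        if pattern.search(line):
            return name
    return "OTHER"
```

Order matters here: the first matching pattern wins, so the most specific definitions should come first.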
8
The date is always in the same spot so we can use
Fixed Column to get it out of the line we've
defined as DAY.
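Outside of Content Extractor, a fixed-column grab is just a string slice. A minimal sketch, assuming a made-up DAY line in which the date starts at character position 6 and is 10 characters wide (the real positions depend on the report):

```python
def fixed_column(line, start, width):
    """Return the text in a fixed character range, with surrounding spaces removed."""
    return line[start:start + width].strip()

# Hypothetical DAY line; the column positions are assumptions for illustration.
day_line = "Date: 01/15/2001   Page 1"
report_date = fixed_column(day_line, 6, 10)
```

This only works because the date never moves; anything that shifts from line to line needs a different approach.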
9
Use Floating Tags for data that could appear in
different positions on a line. Ex. we don't know
how long the split name will be, but we do know
that it will end with a period.
For data appearing in headers/footers (e.g.
Split Name and Number), we tell Content Extractor to
Propagate Field Contents so we get the name and
number with each detail line.
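Both ideas on this slide can be approximated in Python: a regular expression plays the role of the floating tag (read the name up to the closing period), and a pair of variables carries the latest header values forward onto each detail line. The header format "Split 12 Customer Service." is an assumption made for the sketch, as is the function name.

```python
import re

# Hypothetical split header, e.g. "Split 12 Customer Service."
# The name can be any length; the period marks where it ends.
SPLIT_HEADER = re.compile(r"^Split\s+(?P<num>\d+)\s+(?P<name>[^.]+)\.")

def propagate_split(lines):
    """Yield (split_num, split_name, detail_line), repeating the latest header values."""
    split_num = split_name = None
    for line in lines:
        match = SPLIT_HEADER.match(line)
        if match:
            # Header line: remember its values, emit nothing.
            split_num = match.group("num")
            split_name = match.group("name").strip()
        elif line.strip():
            # Non-blank detail line: attach the remembered header values.
            yield split_num, split_name, line.strip()
```

Blank lines are skipped, and a detail line that appears before any header simply gets None for both split fields.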
10
Repeating day, split num and split name
Check out our progress.
11
Export to a wide variety of file types.
ContExTest-acddata.asc
Here's an export I did to a delimited ASCII file.
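For comparison, writing cleaned rows to a delimited file takes only the standard csv module in Python. The column names and the sample row below are invented for the sketch; they are not the contents of the export file mentioned above.

```python
import csv
import io

def export_delimited(rows, out, delimiter=","):
    """Write a header row plus the data rows to an open text stream."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    # Hypothetical column names for the cleaned ACD data.
    writer.writerow(["day", "split_num", "split_name", "interval", "calls"])
    writer.writerows(rows)

# Write to an in-memory buffer; a real export would pass a file opened with
# open(path, "w", encoding="ascii", newline="") instead.
buffer = io.StringIO()
export_delimited([("01/15/2001", "12", "Customer Service", "08:00", "14")], buffer)
text = buffer.getvalue()
```

The csv writer quotes any field that contains the delimiter, which is the main reason to prefer it over joining strings by hand.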