Title: Spreadsheet Visualisation to Improve End-user Understanding
1Spreadsheet Visualisation to Improve End-user
Understanding
- Daniel Ballinger, Robert Biddle and James Noble
- School of Mathematical and Computing Sciences
- Email db, robert, kjx_at_mcs.vuw.ac.nz
- http//www.mcs.vuw.ac.nz/db/honours.html
2Motivation
- Spreadsheets are a common form of end-user
programming. - Unfamiliar spreadsheets can contain daunting
amounts of information in the layout and
inter-cell dependencies. - Visualisation can be used to aid in end-user
understanding of spreadsheets. - Working outside the spreadsheet application
allows for greater flexibility in visualisation. - We focused on Microsoft Excel due to its large
market share.
3Excels Current Visualisation Support Range
Finder
- Invoked by clicking in a cell and then in the
formula bar. - Components are coloured in the bar and sheet.
- Allows for visual manipulation.
- Mainly only useful for spatially close cells.
Range Finder
4Excels Current Visualisation Support Formula
Auditing Tools
Auditing Tools
- Invoked using Formula Auditing Toolbar.
- Trace dependents or precedents.
- Arrows always point to referenced cell.
- Users may navigate spatially disjoint cells.
(semantic navigation) - Complicated spreadsheets can create a tangle of
arrows.
5Related Work
- Takeo Igarashi
- Spreadsheets augment a visible tabular layout
with invisible formulas. - Created visualisations to help reveal the hidden
dataflow graphs and superficial tabular layouts
of spreadsheets. - Markus Clermont
- Most end-users are not trained programmers.
- Many spreadsheets exist beyond being simple
scratch pads. - Raymond Panko
- Studies of empirical data into spreadsheet
errors. - Found error rates can be disturbingly high.
- Errors attributed to over confidence and lack of
formal checking. - Margaret Burnett
- The importance of scalability in visualisations.
- Forms/3 and an embedded testing methodology.
6Spreadsheet Application Toolkit
- Find and store spreadsheets from the Internet.
- Extract low level structures. E.g. Cell values
and formulas. - Analyse spreadsheet structures. Either individual
or corpus. - Conveying the findings through visualisation.
7Visualisations
- Spreadsheet layout
- Clustering
- Data Dependency Flow
- Data Dependency Direction
- Graph Structure
- Fisheye view
- Formula Inspection
- Corpus Analysis
8Spreadsheet layout Real-estate Utilisation 2D
- Understanding layout is an important first step
in learning about a new spreadsheet. - Actual values and formulas are only shown as
occupied cells. - The visualisation layout mimics that of Excel,
with columns along the top of the x-axis and rows
running down the y-axis. - Cells with a higher occupancy level are coloured
towards the red end of the colour spectrum.
9Spreadsheet layout Real-estate Utilisation 3D
- Occupancy data is projected into 3D to create a
surface map. - Discrete to continuous data transformation helps
smooth the effects of spikes. - Coloured to give a Topographical terrain effect.
- Full benefit is seen with user interaction.
10BIRCH Clustering
- BIRCH clustering partitions records into clusters
that are similar according to two or more
attributes. - Current visualisations use the Euclidean distance
between cells as the similarity metric.
11Data Dependency Flow
- Visualising just average unit vectors for each
cell can reduce the visual clutter. - 3D can be used to separate vectors that occur at
different sheet levels. - Note the curvature back towards the origin for
this workbook.
12Data Dependency Direction
Radar Graph for Outgoing Dependencies
- Concentrate purely on the directions of data flow
relative to cells. - Angles are sorted into 36 buckets then feed to
Excel to create the graph. - After the four main axis the next significant
measure occurs between 300 and 360º.
13Graph Structure
- Disregarding spatial bounds allows some
structures to become clearer.
Spring view
Source Data
14Fisheye view - Focus Context
- Addresses formula dependencies that span large
distances or are many cells deep. - Trees are warped over a hyperbolic lens to
achieve both focus in the centre and context. - An artificial red root node is introduced to
connect disjoint trees.
15Formula Inspection Data Flow
- Visualising formula components and flow
direction. - Fully trace worksheets in one view.
Basic Referencing Components
16Formula Inspection - Dependency Types
Row Absolute
- Excel allows for combinations of relative and
absolute positioning. - Understanding the referencing type is important
when replicating formula and identifying regular
patterns.
Fully Absolute
Column Absolute
Relative
17Corpus Analysis of 259 Workbooks
Spatial Centre
- Demonstrations of visualisations created from a
corpus. - With this sample corpus the average worksheet
centre is more column centric. - Function utilisation suggests that the logical
functions, such as IF, actually outnumber simpler
math functions like SUM.
Number of non-empty Worksheets 227 Number of
empty Worksheets 195 Average Row 1.348 Average
Column 18.098 Max Row 1384 Max Col 82 Total
Occupied cells 55491 Orphans 51570 Root Cells
2105 Leafs 1031 Nodes in Cyclic References
29 Local Formula 108 Family Trees 509 Max Tree
Depth 22 Max Tree Breadth 150
Function Utilisation
18Summary
- Spreadsheets are significant examples of end-user
programming - Visualisation can assist end-users in better
understanding the structure of spreadsheets - In particular, the hidden structures created by
formula - Reviewed literature to investigate the
implications of the hidden structures. - Developed a toolkit to externally access the
spreadsheet structure and generate
visualisations. - Created several sample visualisations to help
improve end-user understanding.
19Current and Future Work
- Detailed user studies, including usability
evaluations - Domain specific visualisations
- Spreadsheet corpus analysis to find large
patterns - Visualisation scalability to larger more complex
spreadsheets
http//www.mcs.vuw.ac.nz/db/honours.html