CS5545 Data Interpretation and Communication - PowerPoint PPT Presentation

1
CS5545 Data Interpretation and Communication
  • Yaji Sripada
  • Ehud Reiter

2
Time table
  • Lectures
    • 2 lectures on Mondays in Meston 311
      • 9:30-10:30
      • 11:00-12:00
    • No lectures in Week 6 and Week 12
  • Practicals/Tutorials
    • 1 two-hour practical/tutorial on Mondays in
      Meston 311
      • 14:00-16:00

3
Assessment
  • Two components
    • 25%: continuous assessment
    • 75%: end-of-term exam
  • Continuous assessment
    • First assignment
      • Weight: 12.5%
      • Issued in Week 5
      • Due on the Friday of Week 6
    • Second assignment
      • Weight: 12.5%
      • Issued in Week 11
      • Due on the Thursday of Week 12

4
Course Organization
  • Three parts
    • Weeks 1-4: YS (Yaji Sripada)
    • Weeks 5-7: ER (Ehud Reiter)
    • Weeks 9-11: YS and ER

5
Reading
  • Weeks 1-4
    • Mostly lecture notes and some research papers
  • Weeks 5-7
    • Lecture notes and research papers
    • Background: Ehud Reiter and Robert Dale,
      Building Natural Language Generation Systems,
      Cambridge University Press
  • Weeks 9-11
    • Lecture notes and research papers

6
Introduction
  • Humans have access to large volumes of data in
    many domains
    • Scientific
      • Complete sequence data from the Human Genome
        Project, of about 3 billion DNA units (base pairs)
    • Medical
      • Physiological data: tens of parameters, such as
        blood pressure and heart rate, measured every
        second
    • Engineering
      • Hundreds of sensors on a gas turbine, each taking
        measurements every second
    • And many more

7
Varying purpose/task
  • Different people use the same data for different
    purposes/tasks
  • For example, physiological data is used by
    • Medical staff on the ward to monitor the patient
    • Medical researchers for scientific exploration
    • Medical administrative staff to archive it in
      patient records

8
Varying abilities/disabilities
  • Not everyone is equally able to use the available
    data
    • 1 in 4 adults in the UK has poor numerical skills
    • 1 in 7 people in the UK has some form of
      physical disability (such as visual impairment)
  • Many of us just don't have the time to use all
    the data at our disposal
    • e.g., data from our credit card and utility bills
  • Many of us don't have the domain knowledge
    required to interpret the data
    • e.g., data from medical lab tests such as blood
      tests

9
What we need
  • Novel computer technology to
    • (1) analyse and interpret large volumes of data
    • (2) communicate to us the information we need,
      in a way suited to our task/purpose and to our
      abilities/disabilities
  • In this course we study
    • (1) the issues involved in developing such novel
      technology
    • (2) currently available techniques that can be
      used as part of such technology
    • (3) some systems in limited domains developed
      using existing technology

10
Data Analysis and Interpretation
  • Data analysis
    • Techniques from several fields are used
      • Statistics
      • Medical signal processing
      • Image processing
      • Data mining, etc.
    • Issues with reusing data analysis methods
      • Choosing among the many algorithms available
        for a task may not be easy
      • Even when we find an algorithm, it may not be
        the best fit for use in a communication context
      • In other words, we may have to adapt available
        data mining algorithms to suit our purpose
  • Data interpretation
    • Knowledge-based techniques are used
    • Context-dependent
    • Varies from domain to domain
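
As a rough illustration of the split between domain-independent analysis and context-dependent interpretation, the Python sketch below computes summary statistics and a least-squares trend for a heart-rate series, then interprets them with a small rule table. The function names, the heart-rate bands and the trend threshold (0.5 beats per minute per sample) are hypothetical, chosen only for illustration; they are not clinical guidance and are not taken from any system described in this course.

```python
from statistics import mean, stdev

def analyse(samples):
    """Data analysis: domain-independent summary statistics plus the
    least-squares slope of the series against its sample index.
    Needs at least two samples."""
    n = len(samples)
    x_bar = (n - 1) / 2
    y_bar = mean(samples)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(samples))
    return {"mean": y_bar, "sd": stdev(samples), "slope": sxy / sxx}

# Hypothetical domain knowledge: resting heart-rate bands (bpm), checked
# from the highest threshold down. Illustrative values only.
HEART_RATE_BANDS = [(100, "high"), (60, "normal")]

def interpret(stats, bands=HEART_RATE_BANDS, trend_threshold=0.5):
    """Data interpretation: map the numbers onto context-dependent labels."""
    level = next((label for limit, label in bands if stats["mean"] >= limit), "low")
    if stats["slope"] > trend_threshold:
        trend = "rising"
    elif stats["slope"] < -trend_threshold:
        trend = "falling"
    else:
        trend = "stable"
    return f"heart rate {level} and {trend}"

# One reading per second (made-up values).
readings = [72, 74, 75, 78, 80, 83]
print(interpret(analyse(readings)))  # heart rate normal and rising
```

Only `interpret` carries domain knowledge: reusing `analyse` on, say, gas turbine sensor data would need a different rule table, which is why interpretation varies from domain to domain.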

11
Communication
  • Information can be presented to users
    • Graphically, using visualization technology
    • Textually, using Natural Language Generation
      (NLG) technology
    • As speech, using text-to-speech technology
    • Or in combinations of the above
  • Issues with communication
    • Visualization
      • A relatively mature technology: a large
        collection of visualization techniques for
        different kinds of data is available
      • Communicating high-dimensional data is hard
      • Communicating large data sets on low-resolution
        screens is a challenge
    • NLG
      • Communicates messages more directly
      • Effective for communicating over low-bandwidth
        channels such as SMS
      • Still under development, with a few success
        stories in some limited domains

12
Accessibility
  • Communication works
    • for an intended audience, with its particular
      abilities/disabilities
    • for an intended task/purpose
  • Therefore communication should be sensitive to
    users with different abilities and purposes
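
One small, concrete form of this sensitivity is choosing how to express a number for a given reader. The Python sketch below assumes a hypothetical one-flag user model (`low_numeracy`) and realises a proportion either as a percentage or as an approximate "1 in N" frequency, the style used for the UK statistics on slide 8. Whether the frequency form actually helps a particular audience is an assumption that would need to be checked in evaluation.

```python
def express_proportion(p, low_numeracy=False):
    """Realise a proportion p (0 < p <= 1) for a reader.

    low_numeracy is a hypothetical user-model flag. For such readers,
    proportions up to one half are approximated as '1 in N'; everything
    else falls back to a rounded percentage."""
    if low_numeracy and 0 < p <= 0.5:
        return f"about 1 in {round(1 / p)}"
    return f"{p:.0%}"

print(express_proportion(0.25))                     # 25%
print(express_proportion(0.25, low_numeracy=True))  # about 1 in 4
```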

13
System Building Life Cycle
  • Several iterations of the following phases
    • Knowledge acquisition (requirements collection
      and analysis)
    • System design
    • Implementation
    • Evaluation
  • Differs from the normal software development life
    cycle
    • Requirements are poorly understood
    • System design ideas are still under research
    • Evaluation methods are also still under research

14
Course Organization in detail
  • Lectures: 3 parts
    • Part 1: Data Analysis and Interpretation
      • Basic statistics
      • Data analysis: trend and pattern detection
    • Part 2: Data Communication
      • Visualization
      • NLG
      • Accessibility
    • Part 3: Real-World Applications
  • Practicals
    • Part 1
      • Basic data analysis techniques using Excel
      • Trend and pattern detection in time series and
        spatial data
      • Visualization of time series and spatial data
    • Part 2
      • Document planning for summarising time series
        data
      • Micro-planning for summarising time series data
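
To give a feel for the Part 2 practicals, the Python sketch below splits summary generation into a document-planning step (select significant changes and order them in time) and a micro-planning step (choose words and aggregate the clauses into one sentence). The `Segment` input format, the 5-unit significance threshold and the vocabulary are assumptions made for this sketch; the practicals may use a different design and different rules.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical analysis output: a linear change over an interval."""
    start_hour: int
    end_hour: int
    value_from: float
    value_to: float

def document_plan(segments, min_change=5.0):
    """Content selection and ordering: keep only segments whose change is at
    least min_change (an assumed threshold) and order them by start time."""
    selected = [s for s in segments if abs(s.value_to - s.value_from) >= min_change]
    return sorted(selected, key=lambda s: s.start_hour)

def micro_plan(plan, subject="Wind"):
    """Lexical choice (rising/falling) and simple aggregation of all
    clauses into one sentence joined by 'then'."""
    if not plan:
        return f"{subject} steady."
    clauses = [
        f"{'rising' if s.value_to > s.value_from else 'falling'} "
        f"to {s.value_to:g} by hour {s.end_hour}"
        for s in plan
    ]
    return f"{subject} " + ", then ".join(clauses) + "."

segments = [Segment(0, 6, 10, 12), Segment(12, 18, 20, 14), Segment(6, 12, 12, 20)]
print(micro_plan(document_plan(segments)))
# Wind rising to 20 by hour 12, then falling to 14 by hour 18.
```

NLG systems usually treat surface realisation (grammar, morphology, punctuation) as a separate stage after micro-planning; here it is folded into `micro_plan` to keep the sketch short.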

15
In our department
  • Many projects aim to develop technology for
    data interpretation and communication
  • It is one of the three research themes in the
    department
  • Projects
    • SumTime: Summarising Time Series Data
    • RoadSafe: Automatically generating advisory text
      for road maintenance vehicle routing (new
      project)
    • BabyTalk: Generating textual summaries of
      clinical temporal data (new project)
    • ScubaText: Generating textual reports of scuba
      dive computer data
    • Atlas.txt: Generating textual reports of census
      data for visually impaired people

16
Example 1: SumTime-Mousam
  • Software developed in the department as part of
    the SumTime project
  • Task: automatically generates weather forecast
    texts in English
  • Input: Numerical Weather Prediction (NWP) data,
    the output of weather simulation software
  • Output: English text, delivered
    • as an ASCII file to the client
    • in spoken form over a telephone line
    • as a text message over a mobile network
      (currently being explored)
  • Operationally deployed at a weather services
    company in Aberdeen
  • Produces around 150 draft forecasts/day
  • Produces text that is in some ways better than
    that written by human authors

17
SumTime-Mousam (2)
  • SumTime technology
    • (1) analyses NWP data using segmentation
      techniques developed in the time series data
      mining community
    • (2) automatically produces the English forecast
      text using Natural Language Generation (NLG)
      technology
  • The majority of SumTime output texts are used by
    oil company staff supporting oil rigs in the
    North Sea
  • Can we produce weather forecasts for a different
    purpose/task, say for hill climbers?
  • In this course, we study how data
    analysis/interpretation and its communication
    (presentation) vary with the end user's
    task/purpose.
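
The segmentation step in (1) can be illustrated with a generic bottom-up piecewise linear segmentation, one of the standard algorithms in the time series data mining literature. This is a sketch of that general technique, not SumTime's actual implementation: it starts with a breakpoint at every sample and repeatedly removes the interior breakpoint whose removal adds the least error, stopping once any further merge would exceed `max_error`. Error here is the largest vertical distance from a sample to the straight line joining the segment's endpoints; the threshold and the wind-speed values are made up.

```python
def line_error(ys, i, j):
    """Largest absolute deviation of ys[i..j] from the straight line
    joining (i, ys[i]) and (j, ys[j])."""
    if j - i < 2:
        return 0.0
    slope = (ys[j] - ys[i]) / (j - i)
    return max(abs(ys[k] - (ys[i] + slope * (k - i))) for k in range(i + 1, j))

def bottom_up_segment(ys, max_error):
    """Bottom-up piecewise linear segmentation. Returns (start, end) index
    pairs; adjacent segments share their boundary sample."""
    breakpoints = list(range(len(ys)))  # finest segmentation: every sample
    while len(breakpoints) > 2:
        # Cost of removing each interior breakpoint, i.e. merging its two
        # neighbouring segments into one.
        costs = [
            line_error(ys, breakpoints[m - 1], breakpoints[m + 1])
            for m in range(1, len(breakpoints) - 1)
        ]
        cheapest = min(range(len(costs)), key=costs.__getitem__)
        if costs[cheapest] > max_error:
            break
        del breakpoints[cheapest + 1]  # costs[c] belongs to breakpoints[c + 1]
    return list(zip(breakpoints, breakpoints[1:]))

# Hourly wind speeds (made-up values).
wind = [10, 11, 12, 13, 14, 20, 26, 32, 30, 28]
print(bottom_up_segment(wind, max_error=0.5))  # [(0, 4), (4, 7), (7, 9)]
```

The three segments (a slow rise, a sharp rise, then a fall) are the kind of compact summary an NLG stage can turn into forecast phrases. Recomputing every cost on each pass keeps the sketch simple; a practical version would update only the costs next to the merged segment.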

18
Example 2: GIS (Geographic Information Systems)
  • Technology to store, retrieve, analyse and
    visualize spatial data on geographic maps
    • Plot delivery routes on street maps at a level of
      detail that pinpoints even the locations of
      manholes and speed cameras
    • Plot census data such as residents' ages, gender,
      income, etc. on country or regional maps for
      businesses to target their customers

19
GIS (2)
  • GIS technology
    • (1) analyses/interprets spatial data
    • (2) presents spatial data in the form of visual
      maps
  • Great for sighted users, but useless for visually
    impaired users
  • In this course, we study technology based not
    just on what it does, but also on whom it
    serves.
    • Accessibility issues
20
Summary
  • You will learn novel technology to
    • analyse and interpret large data sets by adapting
      data analysis techniques developed in other
      fields
    • communicate (present) relevant information to
      different users with different tasks and
      abilities
  • Relevant to e-technologies
    • All modern organizations
      • possess large volumes of data, and
      • communicate information to different
        stakeholders