Digitisation - PowerPoint PPT Presentation

Slides: 21
Provided by: IT86

Transcript and Presenter's Notes

Title: Digitisation


1
Digitisation
Mick Eadie, Visual Arts Data Service
2
Source - Digitisation - Resource
The input channels of digitisation (keyboard,
scanner, etc.) are narrow and can only capture a
partial representation of the original source.
3
Digitisation Pathways
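[Flow diagram. Labels:]
  • Stages: Original Source, Copy of Source, Item to
    Digitise, Digital Object, Digital Resource
  • Copying methods: Photocopy, Photograph, Recording
  • Capture methods: Scan, Digital Camera, 3D Scan,
    Digital audio/movie recording
  • Digital object types: 2D Image, 3D Model, Sound,
    Moving image
  • Further processing: OCR, Line tracing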
4
Elements of a Digital Resource
  • Users
    • Knowledge
    • Experience
    • Culture
  • Environment
    • Hardware
    • Software
    • (OS)
    • (Network)
  • Digital Objects
    • Binary Data
    • Data Models
    • Relationships

The environment of a digital resource often
receives the most attention, but it is the users
and digital objects that are most important.
Hardware and software selection should be based
on the needs of the users and the types of
digital objects to be used.
Fit for purpose: digital objects must be created
with their intended use as the paramount
consideration.
5
Digital Objects
  • Text
    • Data stored as a stream of characters (numbers,
      letters, etc.)
  • Image
    • Data primarily understood as a spatial pattern or
      shape
    • Bitmap and vector images; raster (bitmap) and
      vector spatial data
  • Time
    • Data primarily understood as a sequence through
      time
    • Audio and/or video (multimedia)

6
Text
  • Essentially, numeric codes used by the computer
    to represent specific characters
    • Fonts must be designed to provide a visual image
      for each code
    • Software must be designed to interpret the codes
  • ASCII is the best-known text encoding scheme
    • 7 bits per character: 128 unique characters,
      primarily the unaccented Latin alphabet
    • 8-bit extensions use 1 byte per character,
      giving 256 unique characters
    • Other characters are handled by having multiple
      code pages
    • Each code page uses the same codes to represent
      different characters
  • Unicode is the replacement for ASCII
    • Originally 2 bytes to store each character
      (about 65,000 codes); it now has room for over
      a million codes, stored using encodings such as
      UTF-8 and UTF-16
    • Can represent characters from different alphabets
      simultaneously, as each character has a unique code
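
The code-page problem on this slide can be seen directly in a few lines of Python. This is a minimal sketch: the byte value 0xE9 and the two code pages (Latin-1 and the Greek ISO 8859-7) are chosen only for illustration and do not come from the presentation.

```python
# One stored byte, read through two different 8-bit code pages.
raw = bytes([0xE9])

as_latin = raw.decode("latin-1")    # Western European code page
as_greek = raw.decode("iso8859_7")  # Greek code page

# Same numeric code, two different characters.
print(ascii(as_latin), ascii(as_greek))

# In Unicode each character has its own code point, so both can
# sit in one string without switching code pages.
mixed = as_latin + as_greek
code_points = [hex(ord(ch)) for ch in mixed]
print(code_points)

# How many bytes are stored depends on the encoding chosen.
utf8_lengths = [len(ch.encode("utf-8")) for ch in mixed]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in mixed]
print(utf8_lengths, utf16_lengths)
```

The practical point is that a text file is only readable if the encoding used to write it is known, so it is worth recording the encoding alongside the text.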

7
Text Transcription
  • Advantages
    • Low overhead to start transcription: person,
      keyboard, document
    • Handwritten documents can be transcribed
    • A transcriber can follow complex, disorganised
      documents
  • Issues
    • Slow and expensive
    • Human error
  • Good practice
    • Double entry (two transcribers both enter the
      same document and the transcriptions are checked
      for differences)
    • Keep copies of originals with transcriptions
      (preferably as digital images, as this makes
      post-transcription checking simple and quick)
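
The double-entry check can be automated with a line-by-line comparison. The sketch below uses Python's standard difflib module; the function name `compare_transcriptions` and the sample text in the usage note are hypothetical, made up for illustration.

```python
import difflib


def compare_transcriptions(first, second):
    """Return (line_number, lines_in_first, lines_in_second) for each
    stretch where the two transcriptions disagree.

    line_number is 1-based and refers to the first transcription.
    """
    a = first.splitlines()
    b = second.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((i1 + 1, a[i1:i2], b[j1:j2]))
    return differences
```

For example, comparing "line one\nrecorded 12 births" with "line one\nrecorded 17 births" reports a single difference starting at line 2. A person still has to go back to the original (or its image) to decide which version is right.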

8
Optical Character Recognition
  • Advantages
    • Automatic, suitable for digitising large numbers
      of documents
    • Highly accurate for clean, clear typewritten
      documents
  • Issues
    • Current technology is very poor on handwriting
    • Complex document layout can become scrambled
  • Good practice
    • Proof-read and spell-check OCR output for errors
    • Provide an image of the page with the text so
      users can check the text themselves

9
Bitmap (Raster) Images
  • The image is made up of many pixels
  • Each pixel stores information about its colour
  • The standard archival file format is uncompressed
    TIFF
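
Because every pixel stores its own colour value, the size of an uncompressed bitmap follows directly from its pixel dimensions and bit depth. A minimal sketch, assuming the pixel data is stored with no compression and ignoring file headers and metadata; the function name is hypothetical.

```python
def uncompressed_size_bytes(width_px, height_px, bits_per_pixel=24):
    """Approximate size of the raw pixel data, in bytes.

    bits_per_pixel is the colour depth: 1 for black and white,
    8 for greyscale, 24 for typical RGB colour.
    """
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


print(uncompressed_size_bytes(2000, 3000, 24))
print(uncompressed_size_bytes(2000, 3000, 8))
```

A 2000 x 3000 pixel image in 24-bit colour therefore needs about 18 million bytes before any file overhead, which is why storage has to be planned when archiving uncompressed masters.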

10
Resolution
  • Resolution is often expressed as dots per inch
    (dpi)
    • More accurately, pixels per inch (ppi)
    • The frequency at which samples are taken by the
      capture device from the original source
  • Common misconceptions about ppi
    • It is not an indicator of image size or quality
      unless we know the size (inches, cm) of the
      original
  • A better guide to digital image size is pixel
    dimensions, e.g. 2000 x 3000 pixels, which allow
    us to work out the size of the image we will
    output to monitor or printer
    • Number of pixels / output resolution = output size
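
The relationship between pixel dimensions, resolution and physical size is simple division, sketched below. The function names and the 300 ppi and 100 ppi output figures are assumptions for illustration; only the 2000 x 3000 pixel example comes from the slide.

```python
def output_size_inches(width_px, height_px, output_ppi):
    """Physical size of an image when shown at a given pixels per inch."""
    return width_px / output_ppi, height_px / output_ppi


def pixels_needed(width_in, height_in, capture_ppi):
    """Pixel dimensions produced by scanning an original of known size."""
    return round(width_in * capture_ppi), round(height_in * capture_ppi)


# The same 2000 x 3000 pixel image at two output resolutions.
print(output_size_inches(2000, 3000, 300))  # a print resolution
print(output_size_inches(2000, 3000, 100))  # a coarser display

# Scanning an 8 x 10 inch original at 300 ppi.
print(pixels_needed(8, 10, 300))
```

The same file is a modest print at 300 ppi and a much larger image on a coarser display, which is why a ppi figure alone says little about size or quality.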

11
Scanners and Digital Cameras
  • Advantages
    • Accurate(?) visual representation of the source
  • Issues
    • Text and logical structure of a document are not
      captured (they can be through OCR or line tracing)
  • Good practice
    • Capture master images at appropriate resolution
      and bit depth
    • Check the optical resolution of the scanner
      (avoid interpolated resolution)
    • Check the colour resolution (bit depth)
    • Check scanning time
    • Record details of scanner settings and any image
      editing done afterwards
12
Vectors
  • A point represents an exact location in two- or
    three-dimensional space
  • Two points define a line
  • A series of connected lines defines an area

Point in two dimensions: (x, y)
Point in three dimensions: (x, y, z)
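
The point, line and area building blocks map naturally onto coordinate tuples. The sketch below works in two dimensions only; the helper names and the rectangle used as sample data are hypothetical. Area is computed with the standard shoelace formula.

```python
import math


def line_length(p, q):
    """Length of the line defined by two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def polygon_area(points):
    """Area enclosed by a series of connected lines (shoelace formula).

    points lists the corners in order; the last corner is joined
    back to the first.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def scale(points, factor):
    """Zoom by multiplying every coordinate; no detail is lost."""
    return [(x * factor, y * factor) for x, y in points]


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(line_length(rectangle[0], rectangle[2]))  # the diagonal
print(polygon_area(rectangle))
print(polygon_area(scale(rectangle, 10)))
```

Because the shape is stored as coordinates, scaling it is exact arithmetic. An enlarged bitmap, by contrast, has to invent pixels that were never captured.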
13
Vector Data
  • Advantages
    • Can be zoomed (cf. bitmap images)
    • Allows spatial analysis (spatial statistics,
      network analysis)
  • Issues
    • Precision versus accuracy (detail versus
      truthfulness)
    • Scale versus resolution
  • Good practice
    • Ensure polygon topology (the polygons each line
      belongs to) is stored

14
Digital Audio
  • Human hearing
    • Frequency (pitch): 20 Hz to 20,000 Hz (20 kHz)
    • Intensity (loudness): 0 to 120 dB
  • Full sound reproduction requires digitisation at
    more than 40,000 samples a second (44,100 is a
    common standard)
  • Nyquist rate: for lossless digitisation, the
    sampling rate should be at least twice the
    maximum audio frequency
  • One second of good quality uncompressed digital
    sound is equivalent to ¼ of the complete plays of
    Shakespeare
  • MP3 offers good quality compressed (lossy) files
  • MIDI is not a digital recording of actual sounds,
    but a set of instructions (which notes to play,
    when, and on which instrument) that a synthesiser
    or sample library turns into sound
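
The sampling-rate rule and the resulting data volume can be worked through as arithmetic. This sketch assumes an upper hearing limit of 20,000 Hz and, for the data rate, 16-bit stereo samples as used on audio CDs. The bit depth, channel count and function names are assumptions and are not stated on the slide.

```python
def nyquist_min_sample_rate(max_frequency_hz):
    """Minimum sampling rate: twice the highest frequency to capture."""
    return 2 * max_frequency_hz


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample=16, channels=2):
    """Data rate of uncompressed audio (assumed 16-bit stereo by default)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


print(nyquist_min_sample_rate(20_000))   # samples per second
print(pcm_bytes_per_second(44_100))      # bytes for one second
print(pcm_bytes_per_second(44_100) * 60) # bytes for one minute
```

The common 44,100 samples per second sits a little above the 40,000 minimum, and at these assumed settings one minute of uncompressed sound takes a little over 10 million bytes.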

15
Digital Moving Images
  • 1 second of uncompressed good quality digital
    video (without sound) is equivalent to about ¾ of
    the complete plays of Shakespeare
  • MPEG: the Moving Picture Experts Group
    standards are the most popular compression
    standards
    • The three main standards are MPEG-1, MPEG-2
      and MPEG-4
  • Compression basically works by selecting key
    frames and only recording changes between the
    frames (but it gets a lot more complicated!)
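
The key-frame idea can be illustrated with a toy encoder. This is not MPEG, which adds motion prediction, transforms and much else. It is a minimal sketch that treats each frame as a flat list of pixel values, assumes at least one frame and equal frame lengths, and uses hypothetical function names.

```python
def encode(frames):
    """Store the first frame whole, then only the pixels that change."""
    key_frame = list(frames[0])
    deltas = []
    previous = key_frame
    for frame in frames[1:]:
        changes = [
            (index, new)
            for index, (old, new) in enumerate(zip(previous, frame))
            if old != new
        ]
        deltas.append(changes)
        previous = frame
    return key_frame, deltas


def decode(key_frame, deltas):
    """Rebuild every frame by applying each set of changes in turn."""
    frames = [list(key_frame)]
    for changes in deltas:
        frame = list(frames[-1])
        for index, value in changes:
            frame[index] = value
        frames.append(frame)
    return frames


frames = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 9, 9, 0]]
key_frame, deltas = encode(frames)
print(key_frame, deltas)
print(decode(key_frame, deltas) == frames)
```

When little changes between frames, the list of changes is far smaller than a full frame, which is where the saving comes from.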

16
Data Models
  • A data model is a set of rules that defines a
    particular way of organising a collection of
    digital objects
    • List: one item follows another
    • Tree: each item can have several children
    • Sets: items belong to one or more groups
    • Geography/geometry: items are located using a
      co-ordinate system
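
The four data models correspond to familiar programming structures. The sketch below shows one possible representation of each in Python; all item names, group names and coordinates are made up for illustration.

```python
# List: one item follows another (e.g. pages in reading order).
pages = ["page 1", "page 2", "page 3"]

# Tree: each item can have several children (e.g. a document outline).
outline = {
    "book": {
        "chapter 1": {"section 1.1": {}, "section 1.2": {}},
        "chapter 2": {},
    }
}

# Sets: items belong to one or more groups.
groups = {
    "portraits": {"img01", "img02"},
    "oil paintings": {"img02", "img03"},
}

# Geography/geometry: items are located using co-ordinates.
locations = {"img01": (2.0, 5.5), "img02": (7.25, 1.0)}


def count_nodes(tree):
    """Count every item in a tree, at any depth."""
    return sum(1 + count_nodes(children) for children in tree.values())


def groups_containing(item, groups):
    """Names of all the groups an item belongs to."""
    return sorted(name for name, members in groups.items() if item in members)


print(pages[1])
print(count_nodes(outline))
print(groups_containing("img02", groups))
print(locations["img01"])
```

Each structure makes some questions easy and others awkward: "what comes next?" suits a list, "what is this part of?" suits a tree, and "what else is in this group?" suits sets.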

17
Selecting a Data Model
  • To be useful, digital objects must be:
    • Arranged according to the rules of an appropriate
      data model
    • Stored in a file format that can represent the
      data model
    • Accessed with software that understands the file
      format and the data model, and can present the
      data in an appropriate way
  • When selecting a data model:
    • Consider the natural organisation of your
      source
    • Consider what method of organisation will be
      familiar to your users
    • Consider the method of organisation that best
      fits your purposes
    • Then seek specialist advice if you need it!

18
Selecting Software
  • Selecting the right data model is more important
    than selecting a particular piece of software
  • Pick software that works with your preferred data
    model (can perform the right tasks)
    • Don't use a webpage editor as a database
    • Don't use a word processor as a spreadsheet
  • Avoid little-used software with proprietary
    features
  • Look for software with lots of export and import
    options
  • Look for software that supports important
    standards
    • Trees → markup → XML (SGML)
    • Sets → relational databases → SQL
    • Coordinates → CAD or GIS → less clear, use file
      formats like DXF, ESRI shapefiles
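
The pairing of sets with relational databases and SQL can be tried without installing anything, using the sqlite3 module in Python's standard library. This is a minimal sketch: the table names, column names and sample rows are hypothetical and exist only to show items belonging to more than one group.

```python
import sqlite3


def build_catalogue():
    """Create a small in-memory database of items and categories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- One row per membership, so an item can be in many categories.
        CREATE TABLE item_category (
            item_id INTEGER REFERENCES item(id),
            category_id INTEGER REFERENCES category(id),
            PRIMARY KEY (item_id, category_id)
        );
        INSERT INTO item VALUES (1, 'Sketch A'), (2, 'Sketch B');
        INSERT INTO category VALUES (1, 'drawings'), (2, 'portraits');
        INSERT INTO item_category VALUES (1, 1), (2, 1), (2, 2);
        """
    )
    return conn


def titles_in(conn, category_name):
    """Titles of all items that belong to the named category."""
    rows = conn.execute(
        """
        SELECT item.title
        FROM item
        JOIN item_category ON item_category.item_id = item.id
        JOIN category ON category.id = item_category.category_id
        WHERE category.name = ?
        ORDER BY item.title
        """,
        (category_name,),
    ).fetchall()
    return [title for (title,) in rows]


conn = build_catalogue()
print(titles_in(conn, "drawings"))
print(titles_in(conn, "portraits"))
```

Because the query is plain SQL, the same data could be moved to another relational database with little change, which is the benefit of choosing software that supports a standard.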

19
Digitisation: a Balancing Act
  • Successful digitisation involves several
    trade-offs
    • Amount and detail versus time and cost of
      digitisation
    • Complexity of the digital resource versus ease of
      use
    • Flexibility of the digital resource versus
      suitability for a specific use
    • Digitisation with current technology versus
      future possibilities
  • Your project should be guided by a firm
    understanding of the source and the intended
    purpose of the digital resource
  • Do not exceed available support (financial,
    technical, labour)
  • Minimise the loss of information from the
    original during the digitisation process
  • Keep information that tracks the origin and
    history of the digital resource with the digital
    resource

20
Where to get more advice
  • AHDS Guides to Good Practice series
    • http://vads.ahds.ac.uk/guides/index.html
  • Technical Advisory Service for Images (TASI)
    • http://www.tasi.ac.uk
  • Text Encoding Workshops
    • http://www.ota.ahds.ac.uk
  • BUFVC Workshops
    • http://www.bufvc.ac.uk