Down and Dirty Digitization: Everything you need to know about putting content online - PowerPoint PPT Presentation

Slides: 64
Provided by: royten2

1
Down and Dirty Digitization: Everything you need
to know about putting content online
  • Roy Tennant
  • California Digital Library

2
Outline
  • Project Planning
  • Selecting Material to Digitize
  • Digitization Purpose
  • Basic Imaging Principles
  • Capturing Images
  • Editing Images
  • Best Practices
  • Conversion to Text
  • Metadata
  • Access Systems
  • Skills Required of Staff
  • Preservation

3
Project Planning
  • Who will do the work?
  • What systems will be required?
  • What are the specifications for images and
    metadata?
  • How much will the project cost?
  • Who will own and manage the digital products that
    will be produced?

Steve Chapman, from Handbook for Digital
Projects, NEDCC
4
Selecting Material to Digitize
  • Publishing rights
  • Available support/funding opportunity
  • Critical mass
  • Uniqueness
  • Reputation
  • Audience and potential use
  • Diversity of material type
  • Ability to stand on its own and fit in with other
    collections

5
What Do We Preserve?
  • The body or the soul?
  • The artifact
  • The intellectual content
  • How do we decide that the artifact has
    preservation value?
  • Who decides?

6
The Artifact
  • The look and feel
  • The experience of interacting with a specific
    object
  • Consequences
  • Choices for providing access are limited
  • Time and money spent on recreating the artifact
    may be better spent on increasing access
  • In some cases, preserving the look and feel
    actually harms other uses

7
(No Transcript)
8
Written Material
  • Handwritten texts (diaries, etc.), or those with
    handwritten notations (manuscript drafts, etc.)
    can easily be considered to have artifactual
    value
  • But how much artifactual value do printed texts
    have?
  • And born-digital texts?
  • What's it worth to you?

9
If the goal of preservation is persistent
utility, then functionality rather than
aesthetics should drive system design.
Stephen Chapman, Content Follows Form:
Preservation via Systems Design, Microform
Imaging Review
10
Persistent Utility
  • Form must be allowed to be altered or destroyed
    to retain or enhance function
  • If function cannot be retained or enhanced, then
    form should be preserved

11
Considerations for Retaining Items in Original Format
  • Age
  • Evidential value
  • Aesthetic value
  • Scarcity
  • Associational value
  • Market value
  • Exhibition value

12
The issue is not to evaluate the artifact per
se to determine what survives and what does
not. The issue is the need to agree on a method
for interrogating the individual artifact, that
would, in a climate of finite resources, help
make a good decision about whether and how
to preserve it.
Council on Library and Information Resources,
The Evidence in Hand: the Report of the Task
Force on the Artifact in Library Collections
13
How Do We Preserve It?
Preservation costs by method calculated by the
Library of Congress Preservation Directorate
14
Types of Materials
  • Printed text/simple line art
  • Mixed
  • Halftones
  • Manuscripts
  • Continuous tone
From Anne Kenney et al., Moving Theory into
Practice
15
Benchmarking
  • The process whereby you determine your
    digitization requirements using the material you
    will digitize
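As a worked illustration of benchmarking for printed text, the sketch below assumes the formula from Cornell's digital imaging tutorial (the online companion to Moving Theory into Practice): dpi = 3 × QI / (0.039 × h) for bitonal capture and 2 × QI / (0.039 × h) for grayscale or color, where h is the height in millimeters of the smallest significant lowercase character and QI is a target quality index. The function name and defaults are hypothetical; check the quality-index thresholds against that source before relying on them.

```python
# Hypothetical helper. Assumes the Cornell-style benchmarking formula
#   dpi = k * QI / (0.039 * h)
# where h  = height in mm of the smallest significant lowercase character,
#       QI = target quality index (about 8 = excellent, 5 = good, 3.6 = marginal),
#       k  = 3 for bitonal (1 bit) capture, 2 for grayscale or color.
# 0.039 converts millimeters to inches (1 mm is about 0.039 in).

def benchmark_dpi(char_height_mm: float, quality_index: float = 8.0,
                  bitonal: bool = True) -> float:
    """Estimate the scanning resolution needed for legible small type."""
    if char_height_mm <= 0:
        raise ValueError("character height must be positive")
    k = 3 if bitonal else 2
    return k * quality_index / (0.039 * char_height_mm)


if __name__ == "__main__":
    # A 1 mm lowercase "e" at "excellent" quality:
    print(round(benchmark_dpi(1.0)))                 # 615 (bitonal)
    print(round(benchmark_dpi(1.0, bitonal=False)))  # 410 (grayscale)
```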

16
Resolution
The number of pixels in a given area defines the
resolution of an image
(Figure: a single pixel, and an image of 500 x 1,000 pixels)
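Resolution ties pixel counts to the physical size of the original. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
# Pixel dimensions of a scan: physical size (inches) times resolution (dpi).
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for an original of the given size."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # A letter-size page (8.5 x 11 inches) scanned at 300 dpi.
    print(pixel_dimensions(8.5, 11, 300))  # (2550, 3300)
    # A 5 x 10 inch original at 100 dpi yields a 500 x 1,000 pixel image.
    print(pixel_dimensions(5, 10, 100))    # (500, 1000)
```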
17
Dynamic Range (bit-depth)
(Figure: sample images at 1 bit, 8 bit grayscale, 8 bit color, and 24 bit
color, saved as GIF and JPEG)
  • 1 bit: black or white
  • 8 bits: 256 shades
  • 16 bits: thousands
  • 24 bits: millions
  • 36 bits: billions
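Each added bit doubles the number of values a pixel can hold, which is where the black-or-white, 256, thousands, millions, billions progression comes from. A minimal sketch (the function name is hypothetical):

```python
# The number of distinct values a pixel can hold doubles with each added bit.
def distinct_values(bits_per_pixel: int) -> int:
    """Number of shades or colors representable at a given bit depth."""
    if bits_per_pixel < 1:
        raise ValueError("bit depth must be at least 1")
    return 2 ** bits_per_pixel


if __name__ == "__main__":
    for bits in (1, 8, 16, 24, 36):
        print(f"{bits:>2} bits: {distinct_values(bits):,}")
    # Prints:
    #  1 bits: 2
    #  8 bits: 256
    # 16 bits: 65,536
    # 24 bits: 16,777,216
    # 36 bits: 68,719,476,736
```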
18
RGB Color Space
(Figure: an image separated into its Red, Green, and Blue color channels)
  • 8 bits per channel: 24 bit color image
  • 12 bits per channel: 36 bit color image
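To show how per-channel bit depth adds up to the image bit depth, this hypothetical sketch packs three channel values into one integer pixel. The helper names and the red-in-the-high-bits order are assumptions for illustration, not the layout of any particular file format.

```python
def pack_rgb(r: int, g: int, b: int, bits_per_channel: int = 8) -> int:
    """Pack three channel values into one integer pixel (red in the high bits)."""
    max_value = (1 << bits_per_channel) - 1
    for value in (r, g, b):
        if not 0 <= value <= max_value:
            raise ValueError(f"channel value {value} does not fit in {bits_per_channel} bits")
    return (r << (2 * bits_per_channel)) | (g << bits_per_channel) | b


def unpack_rgb(pixel: int, bits_per_channel: int = 8) -> tuple[int, int, int]:
    """Split a packed pixel back into its (r, g, b) channel values."""
    mask = (1 << bits_per_channel) - 1
    return ((pixel >> (2 * bits_per_channel)) & mask,
            (pixel >> bits_per_channel) & mask,
            pixel & mask)


if __name__ == "__main__":
    orange = pack_rgb(255, 128, 0)            # 8 bits x 3 channels = 24 bits
    print(hex(orange), unpack_rgb(orange))    # 0xff8000 (255, 128, 0)
    deep = pack_rgb(4095, 2048, 0, bits_per_channel=12)  # 12 x 3 = 36 bits
    print(deep.bit_length() <= 36)            # True
```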
19
Image Compression
  • Lossless: the image is unchanged after
    compression (no image data is lost)
  • Typical file size: 50% of original
  • Example: LZW compression
  • Lossy: the image is altered after compression
    (image data is lost)
  • Example: JPEG
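To make "lossless" concrete, here is a minimal LZW encoder and decoder whose round trip reproduces the input exactly. It is a simplified sketch with hypothetical function names: the LZW used in TIFF files also uses variable-width codes, reserved clear and end-of-information codes, and a 12-bit code limit, all omitted here.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW codes (unbounded code width)."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    w = b""
    out: list[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                      # keep extending the current match
        else:
            out.append(dictionary[w])   # emit code for the longest match
            dictionary[wc] = next_code  # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:         # code defined by this very step
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    print(len(sample), "bytes ->", len(codes), "codes")  # 24 bytes -> 16 codes
    assert lzw_decompress(codes) == sample  # lossless: exact round trip
```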

20
TIFF
  • Tagged Image File Format
  • Most often used to save master versions of
    images (unedited)
  • Can be compressed or uncompressed

21
CompuServe GIF
  • Graphics Interchange Format (GIF)
  • Maximum 8 bits/pixel: 256 colors (or shades)
  • Good for
  • Text and line art
  • Thumbnails
  • Not good for
  • Full-color pictures
  • Anything that requires more than 256 colors
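GIF stores indexed color: each pixel holds an index into a palette of at most 256 colors. The hypothetical sketch below builds such a palette and refuses images that need more than 256 colors, the case listed above as "not good for"; real GIF encoders would instead reduce the colors, which is a lossy step.

```python
Color = tuple[int, int, int]


def to_indexed(pixels: list[Color]) -> tuple[list[Color], list[int]]:
    """Convert RGB pixels to (palette, indices) as in an 8-bit indexed image."""
    palette: list[Color] = []
    lookup: dict[Color, int] = {}
    indices: list[int] = []
    for rgb in pixels:
        if rgb not in lookup:
            if len(palette) == 256:  # 8 bits/pixel cannot address a 257th color
                raise ValueError("more than 256 colors: reduce colors or use a 24-bit format")
            lookup[rgb] = len(palette)
            palette.append(rgb)
        indices.append(lookup[rgb])
    return palette, indices


if __name__ == "__main__":
    palette, indices = to_indexed([(0, 0, 0), (255, 255, 255), (0, 0, 0)])
    print(palette)  # [(0, 0, 0), (255, 255, 255)]
    print(indices)  # [0, 1, 0]
```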

22
JPEG
  • Joint Photographic Experts Group
  • JPEG is actually a compression scheme; the image
    file format is JFIF (JPEG File Interchange Format)
  • Good for
  • Full-color pictures
  • Anything that requires more than 256 colors
  • Not good for
  • Text or line art

23
New Image Formats
  • Portable Network Graphics (PNG) - from the W3C to
    replace the CompuServe GIF format and provide
    more capabilities
  • JPEG2000 - An upgrade of the JPEG format
  • Flashpix - from a consortium of commercial
    companies, to provide much higher-resolution
    images in a way that allows speedy network
    delivery
  • MrSID - From LizardTech, good for large format
    materials (maps, panoramic photos, etc.)
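Because JPEG is a compression scheme and JFIF the file format, tools usually identify image files by their leading "magic" bytes rather than by file extension. The sketch below checks the published signatures for TIFF, GIF, PNG, JP2 (JPEG 2000), and JPEG/JFIF; the function name is hypothetical, and Flashpix and MrSID are not covered.

```python
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
]


def sniff_format(header: bytes) -> str:
    """Guess an image format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    if header.startswith(b"\xff\xd8\xff"):
        # A JPEG stream starts with the SOI marker. JFIF files follow it with
        # an APP0 segment (FF E0) whose identifier "JFIF\0" begins at offset 6.
        if header[2:4] == b"\xff\xe0" and header[6:11] == b"JFIF\x00":
            return "JPEG (JFIF)"
        return "JPEG (not JFIF, e.g. Exif)"
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01"))  # JPEG (JFIF)
    print(sniff_format(b"GIF89a\x01\x00"))                            # GIF
```

In practice you would pass the first dozen or so bytes read from the file in binary mode.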

24
Capturing Images
  • Technologies
  • Digital Cameras
  • Flatbed Scanners
  • Film Scanners
  • Kodak PhotoCD
  • Outsourcing
  • Standards and Best Practices

25
Digital Cameras
  • Phase One PowerPhase FX: 10,500 x 12,600 pixels,
    760MB (48 bit RGB)
  • BetterLight Super6K: 6,000 x 8,000 pixels, 136MB
    (24 bit RGB), $16,990
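The camera file sizes follow from pixel dimensions and bit depth: uncompressed size = width × height × bits per pixel / 8. A quick check (helper names are hypothetical) gives about 757 MiB and 137 MiB, close to the 760MB and 136MB quoted above:

```python
def uncompressed_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw (uncompressed) image size: pixels x bits per pixel / 8."""
    return width_px * height_px * bits_per_pixel // 8


def to_mebibytes(n_bytes: int) -> float:
    """Convert bytes to MiB (2**20 bytes)."""
    return n_bytes / 2 ** 20


if __name__ == "__main__":
    phase_one = uncompressed_size_bytes(10_500, 12_600, 48)
    betterlight = uncompressed_size_bytes(6_000, 8_000, 24)
    print(phase_one, f"{to_mebibytes(phase_one):.0f} MiB")      # 793800000 757 MiB
    print(betterlight, f"{to_mebibytes(betterlight):.0f} MiB")  # 144000000 137 MiB
```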
26
Flatbed Scanners
  • Minimum requirements
  • 600 X 1200 dpi optical resolution
  • 36-bit color
  • Not for slides or transparencies, best for
    8 1/2 x 11 or 8 1/2 x 14 inch originals
  • Sheet feeder (often optional) helpful for
    digitizing text

27
Film Scanners
  • For 35mm slides and negatives; others are available
    for larger formats
  • $600 - $3,000
  • Most around 2,700-4,000 dpi, 30-36 bit color

28
Kodak PhotoCD
  • Take pictures with a normal camera, but have your
    pictures developed onto a PhotoCD
  • A proprietary image format (ImagePAC), but very
    high resolution (5 different resolutions)

29
Outsourcing Pros and Cons
  • Benefits
  • No ramp-up costs (both time and money)
  • Probably higher quality, at least to begin with
  • High volume capability
  • Drawbacks
  • May be more costly if you have underutilized
    staff time
  • No internal capability or experience developed
    (that is, when the money runs out, so does your
    chance to do anything more)
  • Rare items may require in-house digitization

30
Outsourcing: How
  • Write an RFQ (Request for Quote) outlining:
  • Type and amount of material being digitized
  • Quality requirements
  • Volume per unit of time requirements
  • For RFQ guidance and samples, see RLG Tools for
    Digital Imaging
  • www.rlg.org/preserv/RLGtools.html

31
Digital Image Work Flow
  • Original TIFF or PCD (10-100MB, RGB color space), stored offline
  • Rotate, crop, retouch, adjust brightness/contrast
  • Resize, sharpen
  • Derivatives, stored online: JPEG (about 100K, RGB color space)
    and GIF (about 10K, indexed color space)
32
Editing Images
  • Rotating
  • Cropping
  • Retouching
  • Adjusting
  • Resizing
  • Sharpening
  • Saving
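Resizing for web derivatives usually means scaling to fit a maximum dimension while keeping the aspect ratio. A minimal sketch of that arithmetic, assuming a hypothetical fit_within helper and a rule of never enlarging the original:

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side."""
    scale = max_side / max(width, height)
    if scale >= 1:
        return width, height  # already small enough; never upsample
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(3000, 4000, 1000))  # (750, 1000)
    print(fit_within(500, 400, 1000))    # (500, 400)
```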

33
Image Editing Demonstration
34
Conversion to Text
  • Optical Character Recognition (OCR) software is
    required (Caere OmniPage Pro, Xerox TextBridge,
    etc.)
  • Quality and typography of originals are key
  • Below 99.5% accuracy, it is less expensive to
    have the text re-keyed offshore
  • For some applications, uncorrected text is
    sufficient
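OCR accuracy is commonly measured per character. The sketch below assumes accuracy = 1 - (edit distance to a corrected transcription / length of that transcription), using Levenshtein distance; the function names are hypothetical. For scale: at 99.5% accuracy, a hypothetical 2,000-character page still carries about 10 character errors (2,000 × 0.005).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Share of characters the OCR got right, relative to a corrected text."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(ocr_text, ground_truth)
    return max(0.0, 1 - errors / len(ground_truth))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                             # 3
    print(f"{character_accuracy('hallo world', 'hello world'):.3f}")  # 0.909
```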

35
Imaging Best Practices
  • General guidelines for archival versions
  • Photos, illustrations, maps, etc.
  • 300-600 dpi
  • 24-36 bit color
  • B/W Text document
  • 300-600 dpi
  • 8 bit grayscale
  • Negatives and Slides
  • 2,000-4,000 pixels in longest dimension
  • 24-36 bit color for color originals; 8 bit grayscale for B/W
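The negatives-and-slides guideline is stated in pixels, so it helps to convert a scanner's dpi setting into pixels on the long side. A sketch assuming a 35mm frame whose long side is 36 mm (helper names are hypothetical):

```python
MM_PER_INCH = 25.4


def longest_side_pixels(longest_side_mm: float, dpi: int) -> int:
    """Pixels along the long side of an original scanned at the given dpi."""
    return round(longest_side_mm / MM_PER_INCH * dpi)


def meets_guideline(pixels: int, low: int = 2000, high: int = 4000) -> bool:
    """True if the long side falls within the 2,000-4,000 pixel range."""
    return low <= pixels <= high


if __name__ == "__main__":
    for dpi in (2700, 4000):
        px = longest_side_pixels(36, dpi)
        print(dpi, px, meets_guideline(px))
    # 2700 3827 True
    # 4000 5669 False
```

At 2,700 dpi a 35mm frame lands inside the range; at 4,000 dpi it exceeds what the guideline calls for, which echoes the "no more, no less" advice.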

36
Imaging Best Practices
The key to image quality is not to capture at
the highest resolution or bit depth possible, but
to match the conversion process to the
informational content of the original, and to
scan at that level--no more, no less.
Moving Theory Into Practice
37
Metadata Types
  • Structured description of an object or collection
    of objects
  • Three basic types
  • descriptive - e.g., title, creator, subject -
    used for discovery
  • administrative - e.g., resolution, bit depth -
    used for managing the collection
  • structural - e.g., table of contents page, page
    34, etc. - used for navigation
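The three metadata types can be pictured as fields of one record. This is an illustrative sketch only: the class, field names, and sample values are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObjectRecord:
    # Descriptive metadata: used for discovery.
    title: str
    creator: str
    subjects: list[str] = field(default_factory=list)
    # Administrative metadata: used for managing the collection.
    resolution_dpi: Optional[int] = None
    bit_depth: Optional[int] = None
    # Structural metadata: used for navigation (page label -> image file).
    pages: list[tuple[str, str]] = field(default_factory=list)

    def image_for(self, page_label: str) -> Optional[str]:
        """Navigate via structural metadata: find the file for a page label."""
        for label, filename in self.pages:
            if label == page_label:
                return filename
        return None


if __name__ == "__main__":
    record = DigitalObjectRecord(
        title="Example volume",      # illustrative values only
        creator="Example author",
        subjects=["Example subject"],
        resolution_dpi=400,
        bit_depth=8,
        pages=[("Table of contents", "img0003.tif"), ("Page 34", "img0038.tif")],
    )
    print(record.image_for("Page 34"))  # img0038.tif
```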

38
Metadata: Appropriate Level
  • Collection-level access
  • Discovery metadata describes the collection
  • Example: archival finding aid encoded in SGML;
    see http://www.oac.cdlib.org/
  • Item-level access
  • Discovery metadata describes the item
  • Example: individual metadata records for each
    item; see http://jarda.cdlib.org/cgi-bin/imagesearch.pl

39
Collection Level Access
(Diagram: a search interface (library catalog or dedicated), individual
finding aids, and images)
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
Item Level Access
(Diagram: a dedicated search interface, images, and finding aids)
44
jarda.cdlib.org/search.html
45
Metadata Granularity
  • William Randolph Hearst
  • William Randolph Hearst, with each part of the name
    (e.g., the middle name) in its own field
  • Consider all uses for the metadata
  • Design for the most granular use
  • Store it in a machine-parseable format

46
Metadata Qualification
  • William Randolph
    Hearst
  • Builder -- Castles --
    Southern California

47
Metadata Machine Parseability
  • The ability to pull apart and reconstruct
    metadata via software
  • For example, this
  • Can easily become this

William Randolph Hearst (each part of the name in its own field)
Hearst, William Randolph
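Machine parseability in practice: if the name is stored with each part in its own element, software can produce either form on demand. The element names (name, first, middle, last) and function names are assumptions for this sketch, not a standard schema.

```python
import xml.etree.ElementTree as ET


def _part(name: ET.Element, tag: str) -> str:
    """Text of a child element, or '' if it is missing or empty."""
    return (name.findtext(tag) or "").strip()


def inverted_form(name_xml: str) -> str:
    """'Last, First Middle' as used in catalog headings."""
    name = ET.fromstring(name_xml)
    given = " ".join(p for p in (_part(name, "first"), _part(name, "middle")) if p)
    last = _part(name, "last")
    return f"{last}, {given}" if given else last


def direct_form(name_xml: str) -> str:
    """'First Middle Last' as used for display."""
    name = ET.fromstring(name_xml)
    return " ".join(p for p in (_part(name, t) for t in ("first", "middle", "last")) if p)


if __name__ == "__main__":
    hearst = ("<name><first>William</first><middle>Randolph</middle>"
              "<last>Hearst</last></name>")
    print(direct_form(hearst))    # William Randolph Hearst
    print(inverted_form(hearst))  # Hearst, William Randolph
```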
48
Metadata Standards
  • Metadata
  • Collection Level
  • Encoded Archival Description (EAD) -
    lcweb.loc.gov/ead/
  • Item Level
  • MARC
  • Dublin Core - purl.org/DC/
  • MODS - www.loc.gov/standards/mods/
  • Harvesting
  • Open Archives Initiative, www.openarchives.org
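As an example of item-level metadata prepared for harvesting, this sketch writes simple (unqualified) Dublin Core inside the oai_dc wrapper that OAI-PMH repositories are required to support. The function name is hypothetical, the sample values reuse the Hearst examples above, and a production record would also carry an xsi:schemaLocation attribute, omitted here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def dublin_core_xml(fields: dict[str, list[str]]) -> str:
    """Serialize simple Dublin Core as an oai_dc record (repeatable elements)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_xml({
        "creator": ["Hearst, William Randolph"],
        "subject": ["Builder -- Castles -- Southern California"],
    }))
```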

49
Access Systems
  • Exhibit
  • Browse
  • Search

50
Access Systems: Exhibit
  • Goals
  • Inviting
  • Easy to navigate
  • Highlight selected parts of a collection
  • Teach
  • Requirements
  • Great graphic design
  • Informative and succinct commentary
  • Interesting subject matter

51
(No Transcript)
52
(No Transcript)
53
Access Systems: Browse
  • Goals
  • Provide intriguing and interesting paths into and
    throughout a collection
  • Give a broad sense of a collection, but not
    necessarily show everything
  • Requirements
  • Logical browse paths
  • May have multiple paths to the same items (e.g.,
    time, geography, subject)
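Multiple browse paths to the same items amount to indexing each item under several facets. A minimal sketch: the facet names follow the examples above, while the items and the function name are hypothetical.

```python
from collections import defaultdict


def build_browse_paths(items: list[dict], facets: list[str]) -> dict[str, dict[str, list[str]]]:
    """Index item ids under each value of each facet (e.g. time, geography, subject)."""
    paths = {facet: defaultdict(list) for facet in facets}
    for item in items:
        for facet in facets:
            for value in item.get(facet, []):
                paths[facet][value].append(item["id"])
    return {facet: dict(index) for facet, index in paths.items()}


if __name__ == "__main__":
    items = [  # hypothetical items
        {"id": "photo-1", "time": ["1920s"], "geography": ["California"], "subject": ["Architecture"]},
        {"id": "photo-2", "time": ["1940s"], "geography": ["California"], "subject": ["Agriculture"]},
    ]
    paths = build_browse_paths(items, ["time", "geography", "subject"])
    print(paths["geography"]["California"])  # ['photo-1', 'photo-2']
    print(paths["time"]["1940s"])            # ['photo-2']
```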

54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
Access Systems: Search
  • Goals
  • To provide post-coordinate access to all items in
    a collection relevant to a particular query
  • To provide good methods to create a search as
    well as refine or alter the display as required
  • Requirements
  • Good search software (database or indexing
    software)
  • Good metadata (minimum is probably a title or
    caption for each item)
  • Good interface (options for navigation, search
    refinement, etc.)
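Post-coordinate access means terms are indexed separately and combined at search time, rather than in precombined headings. A minimal sketch using an inverted index with AND logic; the function names and records are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase words and numbers; punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> ids of records whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            index[term].add(record_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Combine query terms at search time: records matching every term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    records = {  # hypothetical captions
        "a": "Hearst Castle, San Simeon",
        "b": "Castle ruins",
        "c": "San Simeon coastline",
    }
    index = build_index(records)
    print(sorted(search(index, "castle")))             # ['a', 'b']
    print(sorted(search(index, "castle san simeon")))  # ['a']
```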

58
(No Transcript)
59
Skills Required of Staff
  • Imaging
  • OCR
  • Markup languages (HTML, XML)
  • Cataloging metadata
  • Indexing and database technology
  • User interface design
  • Programming
  • Web technology
  • Project management

60
How Does Digital Data Die?
  • Let me count the ways
  • New replaces old
  • Death of a sponsor
  • Sponsor loses interest
  • Lost functionality
  • Format rot
  • Media format obsolescence
  • Content format obsolescence
  • Disaster

61
Preserving Digital Content
  • No preservation format
  • Digital preservation techniques
  • Print (on acid free paper!)
  • Store
  • Refresh
  • Encapsulate
  • Emulate
  • Proliferate (Lots Of Copies Keep Stuff Safe or
    LOCKSS)
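Refreshing and proliferating copies both depend on knowing that a copy is still intact. This sketch compares SHA-256 checksums across copies and flags those that disagree with the majority. It is only loosely in the spirit of LOCKSS, whose actual polling protocol is considerably more involved; the function names and site labels are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Fixity value for one copy."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return (majority checksum, sorted sites whose copy disagrees)."""
    digests = {site: sha256_hex(data) for site, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority among copies; cannot tell which is correct")
    damaged = sorted(site for site, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


if __name__ == "__main__":
    copies = {"site-A": b"content", "site-B": b"content", "site-C": b"c0ntent"}
    digest, damaged = audit_copies(copies)
    print(damaged)  # ['site-C'] -> repair it from a good copy
```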

62
Preserving Digital Content
  • Institutional commitment
  • Consortial agreements
  • Cooperatively funded central repositories
  • Preservation Open Market

63
The Best Defense
  • What will ensure that material will not be
    preserved?
  • Ignorance of its existence
  • Ignorance of its worth
  • Inability or unwillingness to pay for its
    preservation
  • Access helps with all of these problems