Text processing - PowerPoint PPT Presentation

Provided by: borameCs

Title: Text processing


1
Text processing
  • Programming Language Design and Implementation
    (4th Edition)
  • by T. Pratt and M. Zelkowitz
  • Prentice Hall, 2001
  • Section 12.1

2
Desktop publishing
  • Traditional publication systems
  • WYSIWYG - What you see is what you get
  • Typewriters are examples of early WYSIWYG systems
  • More complex today - Multiple fonts, colors,
    embedded graphics
  • Need for embedded commands to describe layout of
    document
  • Three approaches to desktop publishing
  • WYSIWYG
  • Page description languages
  • Document compiling

3
WYSIWYG
  • A common approach in PC world
  • Tools like Microsoft Word and Corel WordPerfect
  • Embedded commands in document to control layout
    (fonts, colors, font size, location of objects)
  • Rich Text Format (RTF) - An ASCII language for
    describing such layout. Can be used to pass
    documents among different word processors

4
LaTeX
  • TeX: a document processing system
  • developed by Donald Knuth
  • a macro processing system for creation of string
    text (i.e., documents)
  • arcane syntax
  • LaTeX: macros for TeX
  • a set of macros developed for TeX by Leslie
    Lamport
  • creates a series of environments and control
    structures similar to programming language
    structures
  • for lack of a better term, we often refer to
    "compiling" the book as its various chapters are
    processed by the TeX program
  • This book was developed using LaTeX

5
LaTeX execution
  • Executes much like a traditional compiler
  • First pass
  • Read in text and create output format.
  • Create symbol table for all internal references
    (section numbers, page numbers, figure numbers)
  • Create table of contents and index, if desired
  • Second pass
  • Read in text and create output format.
  • This time, internal references are correct
    because of symbol table created during pass 1.
  • Third pass
  • If pass 2 made no changes to the symbol table,
    the output is the same as pass 2; otherwise
    repeat pass 2 until no further changes are made
    to the symbol table
  • Why more than 2 passes? - Think of putting a
    table of contents at beginning of a report.
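
The repeat-until-stable behaviour described on this slide can be sketched in Python. This is only an illustration, not how TeX is implemented: the layout rule (one table-of-contents page per three entries) and the names run_pass and compile_document are assumptions made up for the example.

```python
def run_pass(sections, symbols):
    """One pass: lay out the document using the symbol table from the previous pass.

    sections is a list of (title, length_in_pages) pairs.
    Returns a new symbol table mapping each title to its starting page.
    """
    # Assumed layout rule: the table of contents needs one page per 3 entries,
    # and its entries are only known once an earlier pass has filled the table.
    toc_pages = -(-len(symbols) // 3)  # ceiling division; 0 on the first pass
    page = 1 + toc_pages
    new_symbols = {}
    for title, length in sections:
        new_symbols[title] = page
        page += length
    return new_symbols


def compile_document(sections, max_passes=10):
    """Repeat passes until the symbol table stops changing."""
    symbols = {}
    for pass_number in range(1, max_passes + 1):
        new_symbols = run_pass(sections, symbols)
        if new_symbols == symbols:
            return symbols, pass_number
        symbols = new_symbols
    raise RuntimeError("references did not stabilise")


sections = [("Intro", 2), ("Method", 3), ("Results", 1)]
symbols, passes = compile_document(sections)
```

With these sample sections, pass 1 numbers the pages without a table of contents, pass 2 inserts the table of contents and shifts every section by one page, and pass 3 confirms that nothing changed.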

6
LaTeX features
  • LaTeX creates environments that make TeX easier
    to use. These behave much like C or Pascal scope
    rules
  • For example, one can begin and end a list of
    items
  • Numbered
  • \begin{enumerate}
  • \item text (prints as number 1)
  • \item text (prints as number 2)
  • \end{enumerate} (end of list)
  • Bulleted (itemized)
  • Named (description)
  • Starting new sections or subsections
    automatically adjusts the appropriate section
    numbers. LaTeX has a syntax similar to the
    block-structured style of a programming
    language.
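
The scope-like behaviour of environments can be imitated with a stack, much as a compiler tracks nested blocks. The Python sketch below is a toy and not LaTeX's actual mechanism; the function name render and the output format are assumptions, and it only understands \begin, \end and \item.

```python
import re

# Matches \begin{env}, \end{env}, or \item followed by its text.
TOKEN = re.compile(r"\\(begin|end)\{(\w+)\}|\\item\s+([^\\]*)")


def render(source):
    """Return the list items as text lines, numbering enumerate items."""
    stack = []   # open environments, innermost last: [name, item_count]
    lines = []
    for match in TOKEN.finditer(source):
        kind, env, text = match.groups()
        if kind == "begin":
            stack.append([env, 0])          # enter a new scope
        elif kind == "end":
            if not stack or stack[-1][0] != env:
                raise ValueError(f"unmatched \\end{{{env}}}")
            stack.pop()                     # leave the scope
        else:
            if not stack:
                raise ValueError("\\item outside a list")
            stack[-1][1] += 1               # counter is local to the scope
            name, count = stack[-1]
            label = f"{count}." if name == "enumerate" else "-"
            indent = "  " * (len(stack) - 1)
            lines.append(f"{indent}{label} {text.strip()}")
    if stack:
        raise ValueError(f"missing \\end{{{stack[-1][0]}}}")
    return lines


sample = r"""
\begin{enumerate}
\item alpha
\item beta
\begin{itemize}
\item nested
\end{itemize}
\item gamma
\end{enumerate}
"""
result = render(sample)
```

Because each environment keeps its own counter on the stack, the numbering of the outer list resumes at 3 after the nested list closes.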

7
LaTeX structure

8
LaTeX execution
  • By invoking LaTeX, the latex.tex macros are read
    into TeX to create commands for chapters,
    sections, subsections, figures, tables, lists,
    and the numerous other structures needed to write
    simple documents.
  • The \documentstyle command (in LaTeX) allows the
    user to add other style features.
  • The required article parameter causes article.sty
    to be read in to tailor latex.tex with commands
    needed for an article. For example, there are no
    chapters in articles, but for style book (i.e.,
    book.sty), chapters are defined.
  • 11pt defines the size of the text font (11-point
    type), and art11.sty is read, giving additional
    information on line and character spacing for
    11-point type. The TeX program along with
    article.sty and art11.sty form the standard way
    to process a LaTeX article.
  • mystyle.sty defines additional macros a user can
    add to tailor LaTeX for a specific document.
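
The parameters discussed above presumably come from a command of the form \documentstyle[11pt,mystyle]{article}; the transcript does not show that command, so it is an assumption here. The Python sketch below lists the style files such a command would cause to be read, using only the article-to-art11.sty mapping stated on the slide. The helper name style_files is made up for illustration.

```python
import re

# Size-file prefix per document style. Only the article entry is taken from
# the slide (article + 11pt -> art11.sty); other styles are deliberately omitted.
SIZE_PREFIX = {"article": "art"}


def style_files(command):
    """Return the .sty files read for a \\documentstyle[options]{style} command."""
    match = re.fullmatch(r"\\documentstyle(?:\[([^\]]*)\])?\{(\w+)\}",
                         command.strip())
    if match is None:
        raise ValueError("not a \\documentstyle command")
    options, style = match.groups()
    files = [f"{style}.sty"]                 # the required style parameter
    for option in (options or "").split(","):
        option = option.strip()
        if not option:
            continue
        size = re.fullmatch(r"(\d+)pt", option)
        if size:
            if style not in SIZE_PREFIX:
                raise ValueError(f"size file for style {style!r} not known here")
            files.append(f"{SIZE_PREFIX[style]}{size.group(1)}.sty")
        else:
            files.append(f"{option}.sty")    # user-supplied style, e.g. mystyle
    return files


files = style_files(r"\documentstyle[11pt,mystyle]{article}")
```

For this command the sketch yields article.sty, art11.sty and mystyle.sty, in the order the slide discusses them.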

9
Page description languages
  • The Postscript system consists of five components
  • 1. An interpreter for performing calculations. A
    simple postfix execution stack is the basic
    model.
  • 2. A language syntax. This is based on Forth.
  • 3. Painting extensions. An extension to Forth
    with painting commands for managing the process
    of painting text and pictures on a sheet of
    paper.
  • 4. A virtual machine. Defines a virtual machine
    for drawing information (text and graphics) on a
    page. The showpage operator causes the described
    page to be displayed
  • 5. Conventions. A series of conventions, not part
    of the formal Postscript language, that various
    printers use for consistency in presentation.
    Following these conventions makes it easier to
    transport Postscript documents from one system
    to another.
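
The postfix execution stack in component 1 can be illustrated with a few lines of Python. This is a minimal sketch rather than a Postscript interpreter: it handles only numbers and the arithmetic operators add, sub and mul, and the function name evaluate is an assumption.

```python
def evaluate(program):
    """Evaluate a whitespace-separated postfix program; return the final stack."""
    operators = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    stack = []
    for token in program.split():
        if token in operators:
            b = stack.pop()          # the top of the stack is the second operand
            a = stack.pop()
            stack.append(operators[token](a, b))
        else:
            stack.append(float(token))   # operands are simply pushed
    return stack


result = evaluate("3 4 add 2 mul")   # (3 + 4) * 2
```

Operands are pushed as they are read, and each operator pops its arguments and pushes its result, so no parentheses or precedence rules are needed.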

10
Postscript execution model
  • A Postscript program consists of a sequence of
    commands that represent the postfix form of the
    algorithm necessary to paint the document.
  • Postscript execution begins with two entries
    initially on the dictionary stack, which the
    program may not remove
  • Systemdict is the system dictionary, which
    represents the initial binding of Postscript
    objects to their internal representation.
  • Userdict is the user dictionary, which represents
    the new definitions included within this
    execution of a Postscript program. This may
    include redefinition of primitive objects already
    defined in systemdict.
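
The relationship between the two dictionaries can be modelled with Python's collections.ChainMap, which searches a list of mappings in order and writes only to the first. This is an analogy, not Postscript itself; the string values are placeholders standing in for internal representations.

```python
from collections import ChainMap

# Placeholder bindings; a real systemdict maps names to built-in operators.
systemdict = {"add": "built-in add", "showpage": "built-in showpage"}
userdict = {}

# userdict is searched first, then systemdict.
dict_stack = ChainMap(userdict, systemdict)

dict_stack["box"] = "user procedure"        # a new definition goes into userdict
dict_stack["showpage"] = "user override"    # redefinition shadows systemdict

lookup_add = dict_stack["add"]              # found in systemdict
lookup_showpage = dict_stack["showpage"]    # found first in userdict
```

The redefinition of showpage leaves the systemdict entry untouched; it is merely hidden while userdict sits above it.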

11
Sample Postscript command
  • Each argument is stacked on the Postscript
    operand stack
  • /box {newpath 0 0 moveto 0 1 lineto 3 1 lineto 3
    0 lineto closepath} def
  • /box Add the name box to the stack. / marks a
    literal name: it is not evaluated, only moved
    to the stack (like quote in LISP)
  • newpath Start a new path
  • moveto Take the top two stack arguments and move
    the cursor to that (X,Y) location
  • lineto Add a line from the current cursor to the
    (X,Y) address given by the top two stack numbers
  • closepath Add a line back to the starting point
    of the path
  • def Everything within { ... } is defined to be
    the command box
  • Note that the command box now traces a rectangle
    from (0,0) to (0,1) to (3,1) to (3,0) and back to
    (0,0); an operator such as stroke is still needed
    to paint it
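
To make the stack mechanics of this sample concrete, here is a toy interpreter in Python, with the procedure body written between { and }. It is a sketch under strong assumptions: it knows only literal names, procedure bodies, def, newpath, moveto, lineto and closepath, it records the path as a list of points instead of painting anything, and the function name run is made up.

```python
def run(program):
    """Execute a tiny Postscript subset and return the points of the current path."""
    tokens = program.replace("{", " { ").replace("}", " } ").split()
    operands = []                      # operand stack
    userdict = {}                      # names bound by def
    state = {"path": [], "start": None}

    def execute(toks):
        i = 0
        while i < len(toks):
            tok = toks[i]
            i += 1
            if tok == "{":
                # Collect the procedure body unevaluated, up to the matching }.
                depth, body = 1, []
                while True:
                    inner = toks[i]
                    i += 1
                    if inner == "{":
                        depth += 1
                    elif inner == "}":
                        depth -= 1
                        if depth == 0:
                            break
                    body.append(inner)
                operands.append(body)
            elif tok.startswith("/"):
                operands.append(tok)           # literal name: push, do not look up
            elif tok == "def":
                body = operands.pop()
                name = operands.pop()
                userdict[name[1:]] = body      # bind the name without its slash
            elif tok == "newpath":
                state["path"] = []
            elif tok in ("moveto", "lineto"):
                y = operands.pop()
                x = operands.pop()
                if tok == "moveto":
                    state["start"] = (x, y)
                state["path"].append((x, y))
            elif tok == "closepath":
                state["path"].append(state["start"])
            elif tok in userdict:
                execute(userdict[tok])         # run a user-defined command
            else:
                operands.append(float(tok))    # numbers are pushed

    execute(tokens)
    return state["path"]


BOX = "/box {newpath 0 0 moveto 0 1 lineto 3 1 lineto 3 0 lineto closepath} def"
defined_only = run(BOX)
drawn = run(BOX + " box")
```

Running the definition alone leaves the path empty, because def only stores the body; the rectangle's points appear once the name box is executed.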

12
Summary
  • Note differences between models
  • LaTeX and MS Word - define the layout of the
    final document
  • Postscript - defines a program which computes the
    final layout. A Postscript printer contains an
    interpreter that executes the Postscript program
    to produce the final printed document

13
Postscript execution stacks
  • 1. The operand stack contains the operands as
    they are stacked, executed, and unstacked.
  • 2. The dictionary stack contains only dictionary
    objects. This stack defines the scope and context
    of each definition.
  • 3. The execution stack contains executable
    objects. For the most part, these are functions
    in intermediate stages of execution.
  • 4. The graphics state stack manages the context
    for painting objects on the page
  • A Postscript program is a sequence of ASCII
    characters. As each token is read, its definition
    is looked up in the dictionary stack (first in
    userdict and then in systemdict) and the
    appropriate action is executed.

14
Document conventions
  • Conventions built into all Postscript
    interpreters
  • The leading comment should be %!PS. That informs
    the interpreter that the file is a Postscript
    program.
  • Each page of a document is usually bracketed by a
    save and a restore command to isolate that page
    from the effects of other pages.
  • %%DocumentFonts: a list of fonts used in the
    document
  • %%Title: an arbitrary string, the title of the
    document
  • %%Creator: the name of the program that created
    the file
  • %%CreationDate: the date and time of creation
  • %%Pages: the number of pages in the document.
  • %%BoundingBox: the four values that represent the
    lower left and upper right corners of the area of
    the page that is actually painted by the program.
    This allows the pages to be inserted into other
    documents.
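
A program that receives a Postscript file can read these conventions without interpreting the file. The Python sketch below assumes the header comments are written as %%Key: value lines directly after the %!PS line; the function name parse_header and the sample values (including the creator name) are hypothetical.

```python
def parse_header(text):
    """Return the %%Key: value header comments of a Postscript file as a dict."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("%!PS"):
        raise ValueError("not a Postscript file: missing %!PS")
    header = {}
    for line in lines[1:]:
        if not line.startswith("%%"):
            break                      # header ends at the first other line
        key, _, value = line[2:].partition(":")
        header[key.strip()] = value.strip()
    # Convert the bounding box to four integers when it is given that way.
    box = header.get("BoundingBox", "").split()
    if len(box) == 4 and all(v.lstrip("-").isdigit() for v in box):
        header["BoundingBox"] = tuple(int(v) for v in box)
    return header


# Hypothetical sample file; all values are made up for illustration.
SAMPLE = """%!PS
%%Title: Sample report
%%Creator: example-tool
%%Pages: 2
%%BoundingBox: 0 0 612 792
newpath 0 0 moveto
"""
header = parse_header(SAMPLE)
```

The bounding box is what lets another program place the page inside a larger document without executing any of its drawing commands.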

15
Postscript summary
  • Postscript was developed to be a virtual machine
    architecture that can be used to create printable
    documents. Postscript of a document is not meant
    to be read by a programmer. However, the syntax
    is quite simple and easily understood.
  • Postscript has been developed further by Adobe
    with the creation of their Portable Document
    Format (PDF). PDF shares the Postscript imaging
    model but stores each page as a compact
    description rather than a general program. PDF
    readers are freely available over the Internet,
    and most Web browsers can display PDF files. PDF
    has become ubiquitous for the transmission and
    display of formatted documents.
  • Giving away PDF display programs was a shrewd
    move for Adobe because they sell the Acrobat
    program needed to create PDF documents.