Tools for Text - PowerPoint PPT Presentation

Slides: 22
Provided by: rober102

1
Tools for Text
  • Review

2
Algorithms
  • The heart of computer science
  • Definition: a finite sequence of instructions
    with the properties that
    • Each instruction is well-defined
    • Each instruction can be completed in finite time
    • The process terminates

3
Algorithms (2)
  • Examples
    • Linear search
    • Binary search
    • Boyer-Moore
    • Shift table for Boyer-Moore
    • Huffman code
    • Grammatical rules

4
Algorithms (3)
  • Issues
    • Preprocessing
    • Efficiency
    • Notation

5
Abstract Data Types
  • Abstraction of common ideas or features of
    computer systems
  • Definition: a set of objects and a collection of
    operations on those objects

6
Abstract Data Types (2)
  • Examples
    • Strings of characters, with operations: first,
      last, head, tail, concat, substr, match, in
    • Trees (rooted, unordered), with operations: insert
      node, delete node, localize
    • Networks, with operations from a web browser:
      forward, back, home, go

7
ADT Tree
  • Definition: a set of nodes and a set of links
    connecting pairs of nodes such that
    • No node is linked to itself (no loops)
    • One node is designated as the root
    • No pair of nodes is joined by more than one link
      (no superhighways)
    • There is a unique path from any node to any
      other node (no cycles)
  • Shows hierarchy
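
The tree conditions above can be checked mechanically. The sketch below is an illustration rather than part of the original slides: the name `is_tree` and the representation (a list of node names plus a list of undirected links written as pairs) are assumptions. It relies on the fact that, once loops and duplicate links are ruled out, "a unique path between any two nodes" is equivalent to "connected, with exactly one fewer link than nodes". It does not check which node is designated as the root.

```python
def is_tree(nodes, links):
    """Return True if (nodes, links) satisfies the tree conditions.

    nodes: list of node names; links: list of (a, b) pairs, undirected.
    """
    seen = set()
    for a, b in links:
        if a == b:                      # a node linked to itself (loop)
            return False
        key = frozenset((a, b))
        if key in seen:                 # a second link between the same pair
            return False
        seen.add(key)

    # A connected graph with n nodes and n - 1 links has no cycles.
    if not nodes or len(links) != len(nodes) - 1:
        return False

    adjacency = {n: [] for n in nodes}
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)

    # Walk outward from any node and count how many nodes are reachable.
    start = nodes[0]
    reached = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in reached:
                reached.add(neighbour)
                stack.append(neighbour)
    return len(reached) == len(nodes)


print(is_tree(["a", "b", "c"], [("a", "b"), ("a", "c")]))              # True
print(is_tree(["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c")]))  # False
```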

8
ADT Network
  • Definition: a set of nodes and a set of links
    connecting pairs of nodes such that
    • No node is linked to itself (no loops)
    • Each link has a direction
    • No pair of nodes is joined by more than one link
      in the same direction (no superhighways)
  • Sources and sinks
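
As a small illustration of sources and sinks, the sketch below assumes the usual reading: a source is a node with no incoming links, and a sink is a node with no outgoing links. The function names and the list-of-pairs representation are hypothetical, chosen only for this example.

```python
def degrees(nodes, links):
    """Count incoming and outgoing links for every node.

    links is a list of (from_node, to_node) pairs, one per directed link.
    """
    in_degree = {n: 0 for n in nodes}
    out_degree = {n: 0 for n in nodes}
    for src, dst in links:
        out_degree[src] += 1
        in_degree[dst] += 1
    return in_degree, out_degree


def sources_and_sinks(nodes, links):
    """Return (sources, sinks): nodes with in-degree 0 and out-degree 0."""
    in_degree, out_degree = degrees(nodes, links)
    sources = [n for n in nodes if in_degree[n] == 0]
    sinks = [n for n in nodes if out_degree[n] == 0]
    return sources, sinks


pages = ["home", "a", "b"]
hyperlinks = [("home", "a"), ("a", "b"), ("home", "b")]
print(sources_and_sinks(pages, hyperlinks))  # (['home'], ['b'])
```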

9
ADT Binary Tree
  • Definition: a (rooted) tree with the properties
    that
    • Each node has either 0, 1 or 2 child nodes
    • The child nodes are ordered (usually called left
      and right)

10
Measures of ADTs
  • Strings: length
  • Trees: degree of node, level of node, height of
    tree
  • Networks: degree of node
    • out-degree, in-degree
  • Arrays: dimension, size
  • In general: counts
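
The tree measures can be computed with a few lines each. This is a sketch under stated assumptions, since conventions vary between textbooks: here the degree of a node is its number of children, the root is at level 0, and the height is the largest level of any node. The dictionary representation (each node mapped to a list of its children) and the example tree are hypothetical.

```python
def degree(tree, node):
    """Number of children of node (assumed convention)."""
    return len(tree.get(node, []))


def level(tree, root, target):
    """Number of links on the path from root to target; the root is level 0."""
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        if node == target:
            return depth
        frontier.extend((child, depth + 1) for child in tree.get(node, []))
    return None  # target is not in the tree


def height(tree, root):
    """Largest level of any node below root; a single node has height 0."""
    children = tree.get(root, [])
    if not children:
        return 0
    return 1 + max(height(tree, child) for child in children)


# A document outline as a rooted tree: node -> list of children.
outline = {"book": ["ch1", "ch2"], "ch1": ["s1", "s2", "s3"]}
print(degree(outline, "ch1"))         # 3
print(level(outline, "book", "s2"))   # 2
print(height(outline, "book"))        # 2
```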

11
Data structures
  • Ways of storing information
  • array, an indexed set of values
  • ASCII-coded character
  • 1 byte = 8 bits
  • 2^8 = 256 choices per byte
  • Expressed in hexadecimal notation
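
To make the byte and hexadecimal points concrete, the sketch below prints character codes using Python's built-in `ord` and `format`. The helper names `to_hex` and `to_bits` are hypothetical, and the example assumes plain ASCII input so that every code fits in one byte.

```python
def to_hex(text):
    """Two-digit hexadecimal code of each character (assumes ASCII input)."""
    return [format(ord(ch), "02X") for ch in text]


def to_bits(text):
    """Eight-bit binary code of each character (assumes ASCII input)."""
    return [format(ord(ch), "08b") for ch in text]


print(to_hex("Text"))   # ['54', '65', '78', '74']
print(to_bits("T"))     # ['01010100']
print(2 ** 8)           # 256 distinct values fit in one byte
```

Each hexadecimal digit stands for four bits, so two digits describe one byte exactly.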

12
Arrays
  • Nonpositional binary digram array
  • Positional binary digram array
  • Boyer-Moore shift table
  • ASCII code chart
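
Of the arrays listed above, the Boyer-Moore shift table is the easiest to show in a few lines. The slides do not say which form of the table was used, so this sketch assumes the common bad-character table and drives it with Horspool's simplified search loop rather than full Boyer-Moore. The names `shift_table` and `horspool_search` are hypothetical.

```python
def shift_table(pattern):
    """Map each character of pattern (except the last) to a shift distance.

    The shift is the distance from the character's last occurrence to the
    end of the pattern. Characters not in the table shift by len(pattern).
    """
    m = len(pattern)
    table = {}
    for i, ch in enumerate(pattern[:-1]):
        table[ch] = m - 1 - i   # later occurrences overwrite earlier ones
    return table


def horspool_search(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    table = shift_table(pattern)    # preprocessing step
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift according to the text character under the pattern's last cell.
        pos += table.get(text[pos + m - 1], m)
    return -1


print(shift_table("tools"))                       # {'t': 4, 'o': 2, 'l': 1}
print(horspool_search("tools for text", "text"))  # 10
```

The table is built once from the pattern, which is the preprocessing cost that buys larger jumps through the text.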

13
Text Structure
  • Characters: letters, digits, alphanumeric,
    white space, punctuation
  • Words: with or without punctuation
  • Lines
  • Sentences
  • Paragraphs
  • Files

14
Tools for Text
  • Searching
  • Spell checking
  • Grammar checking
  • Displaying
  • Encrypting
  • Compressing

15
Searching
  • Alphabet
  • Set of strings
  • Wildcard notation
    • * matches 0 or more characters
    • ? matches exactly one character
    • [ ] designates a finite set of characters
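
Wildcard notation of this kind can be tried out with Python's standard `fnmatch` module, whose patterns use `*` for any run of characters, `?` for exactly one character, and `[...]` for a set of characters. The helper `matching` and the word list are hypothetical, and the slides may have used slightly different notation.

```python
from fnmatch import fnmatchcase


def matching(words, pattern):
    """Return the words that match a wildcard pattern (case-sensitive)."""
    return [w for w in words if fnmatchcase(w, pattern)]


words = ["text", "test", "tent", "toast", "t"]
print(matching(words, "t*"))        # all five words, including "t"
print(matching(words, "te?t"))      # ['text', 'test', 'tent']
print(matching(words, "te[sx]t"))   # ['text', 'test']
```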

16
Searching (2)
  • Linear search
    • Ordered vs. unordered list
  • Binary search
    • Efficiency compared to linear search
  • Indexed search
    • Modeled on thumb tabs
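
The two basic searches can be compared side by side. The sketch below is a minimal illustration with hypothetical function names. Linear search works on any list, while binary search assumes the list is already in sorted order.

```python
def linear_search(items, target):
    """Check each item in turn; works on ordered or unordered lists."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Repeatedly halve the search range; requires a sorted list."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1       # target can only be in the upper half
        else:
            high = mid - 1      # target can only be in the lower half
    return -1


words = ["array", "byte", "code", "file", "line", "tree", "word"]
print(linear_search(words, "tree"))   # 5
print(binary_search(words, "tree"))   # 5
```

For a list of n items, linear search may examine all n of them, whereas binary search examines roughly log2(n) items, which is the efficiency difference the slide refers to.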

17
Spell Checking
  • Detection
  • Correction
  • N-gram analysis
  • Edit distance
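
Edit distance can drive the correction step: suggest the dictionary word closest to the misspelling. The slides do not say which variant was meant, so this sketch assumes the Levenshtein distance, which counts single-character insertions, deletions and substitutions. The function names and the tiny dictionary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to word."""
    return min(dictionary, key=lambda entry: edit_distance(word, entry))


print(edit_distance("serch", "search"))              # 1
print(suggest("serch", ["search", "speech", "tree"]))  # search
```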

18
Grammar Checking
  • You: right; checker says: right; action: none
  • You: right; checker says: wrong; action: ignore
  • You: wrong; checker says: right; action: a
    difficulty
  • You: wrong; checker says: wrong; action: make
    correction

19
Displaying
  • Markup
    • HTML tag: <GI attribute>
    • GI: generic identifier
    • Attribute: name = 'value'
    • Shows hierarchy of content
  • Fonts

20
Encrypting
  • Character based
    • Shift
    • Monoalphabetic substitution (cryptograms)
    • Polyalphabetic substitution
  • Numerically based
    • PGP (Pretty Good Privacy)
    • Public key encryption
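
The simplest character-based scheme, the shift cipher, fits in a few lines. This sketch assumes the 26-letter English alphabet and leaves every other character unchanged. The function names are hypothetical.

```python
import string


def shift_encrypt(plaintext, key):
    """Shift each letter key places along the alphabet, wrapping around."""
    result = []
    for ch in plaintext:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            result.append(ch)   # spaces, digits and punctuation pass through
    return "".join(result)


def shift_decrypt(ciphertext, key):
    """Undo shift_encrypt by shifting the same distance backwards."""
    return shift_encrypt(ciphertext, -key)


secret = shift_encrypt("Tools for Text", 3)
print(secret)                     # Wrrov iru Whaw
print(shift_decrypt(secret, 3))   # Tools for Text
```

With only 25 useful keys, a shift cipher can be broken by trying every key, which is one reason the substitution and public key methods on this slide exist.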

21
Compressing
  • Frequency-based
    • Example: Huffman coding
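
As a sketch of frequency-based compression, the code below builds a Huffman code with Python's standard `heapq` and `collections.Counter`. The function names are hypothetical, and the exact bit patterns depend on how ties between equal frequencies are broken, so only the code lengths should be read as meaningful.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each character of text to its Huffman bit string."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {sym: "0" for sym in counts}   # a lone symbol still needs one bit

    # Heap entries: (frequency, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)


codes = huffman_codes("aaaabbc")
print(len(codes["a"]), len(codes["b"]), len(codes["c"]))  # 1 2 2
print(len(encode("aaaabbc", codes)))                      # 10 bits, versus 56 at 8 bits each
```

Frequent characters receive short codes and rare characters longer ones, which is where the saving comes from.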