Chapter 8 Characters and Strings - PowerPoint PPT Presentation

Slides: 25
Provided by: qiao

1
Chapter 8 Characters and Strings
2
Principle of enumeration
  • Computers tend to be good at working with numeric
    data.
  • The ability to represent an integer value,
    however, also makes it easy to work with other
    data types as long as it is possible to represent
    those types using integers. For types consisting
    of a finite set of values, the easiest approach
    is simply to number the elements of the
    collection.
  • Types that are identified by counting off the
    elements are called enumerated types.
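Java's enum construct is one way to declare an enumerated type. A minimal sketch, assuming a hypothetical Direction type: the compiler counts off the elements in declaration order, and ordinal() exposes the number assigned to each one.

```java
// Hypothetical enumerated type for illustration: Java numbers the
// elements 0, 1, 2, 3 in the order they are declared.
enum Direction { NORTH, EAST, SOUTH, WEST }

class EnumerationDemo {
    public static void main(String[] args) {
        for (Direction d : Direction.values()) {
            // ordinal() returns the integer assigned to each element
            System.out.println(d + " -> " + d.ordinal());
        }
    }
}
```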

3
Characters
  • Computers use the principle of enumeration to
    represent character data in memory. If you
    assign an integer to each character, you can use
    that integer as a code for the character it
    represents.
  • Character codes, however, are not particularly
    useful unless they are standardized.
  • The first widely adopted character coding was
    ASCII (American Standard Code for Information
    Interchange).
  • With only 128 characters (256 in its 8-bit
    extensions), the ASCII system proved inadequate
    to represent the many alphabets in use
    throughout the world.
  • ASCII has been superseded by Unicode.
  • Figure 8-1 (p. 256) shows the table of character
    codes.

4
Some notes
  • The first thing to remember about the Unicode
    table is that you don't actually have to learn
    the numeric codes for the characters. The
    important observation is that a character has a
    numeric representation, not what that
    representation happens to be.
  • A character constant consists of the desired
    character enclosed in single quotation marks.
    Thus, the constant 'A' in a program indicates the
    Unicode representation of an uppercase A. That
    it has the value 101 in octal (65 in decimal) is
    an irrelevant detail.
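To see that a character constant is just a number, a short sketch can compare 'A' with several integer spellings of the same value. The helper codeOf is a hypothetical name used only for illustration.

```java
class CharConstants {
    // A char widens to int automatically, exposing its Unicode code.
    static int codeOf(char ch) {
        return ch;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));       // 65
        System.out.println('A' == 65);         // true (decimal)
        System.out.println('A' == 0101);       // true (0101 is an octal int literal)
        System.out.println('A' == '\u0041');   // true (Unicode escape, hex 41)
    }
}
```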

5
Important properties
  • The codes for the digits '0' through '9' are
    consecutive.
  • '0' + 9 is the code for '9'.
  • The codes for the uppercase letters 'A' through
    'Z' are consecutive; the codes for the lowercase
    letters 'a' through 'z' are consecutive.
  • 'a' + 2 is the code for 'c'.
  • Arithmetic operations can be used with character
    values just as with integers.
  • Avoid using integer constants to refer to Unicode
    characters: write 'A', not 65.
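The consecutive-code properties are what make digit and letter arithmetic work. A sketch with hypothetical helper names (digitValue, digitChar, shift); note that char + int yields an int in Java, so a cast back to char is needed to get a character.

```java
class CharArithmetic {
    // '7' - '0' is 7 because '0'..'9' have consecutive codes.
    static int digitValue(char ch) {
        return ch - '0';
    }

    // Inverse of digitValue: 9 -> '9'. Assumes 0 <= n <= 9.
    static char digitChar(int n) {
        return (char) ('0' + n);
    }

    // Character k positions after ch, e.g. ('a', 2) -> 'c'.
    // Assumes the result stays inside the same letter range.
    static char shift(char ch, int k) {
        return (char) (ch + k);
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7'));  // 7
        System.out.println(digitChar(9));     // 9
        System.out.println(shift('a', 2));    // c
    }
}
```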

6
Special characters
  • Most of the characters in the Unicode table
    appear on the keyboard. They are called printing
    characters.
  • The table also includes special characters. In a
    program, they are written as an escape sequence,
    which consists of a backslash followed by a
    character or a sequence of digits.
  • \b Backspace
  • \f Form feed (starts a new page)
  • \n Newline (moves to the next line)
  • \r Return (moves to the beginning of the current
    line)
  • \t Tab (moves to the next tab stop)
  • \\ The backslash character itself
  • \' The ' character
  • \" The " character
  • \ddd The character whose Unicode value is the
    octal number ddd
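A minimal sketch of these escape sequences inside Java literals; SAMPLE is a hypothetical constant. Each escape sequence, however many characters it takes to write, stands for a single character in the string.

```java
class EscapeDemo {
    // Tab, double quotes, newline and a backslash, all written as escapes.
    static final String SAMPLE = "Name:\t\"Ada\"\nPath:\tC:\\temp";

    public static void main(String[] args) {
        System.out.println(SAMPLE);
        System.out.println("\\".length());   // 1: an escape sequence is one character
        System.out.println('\101');          // A: octal 101 is decimal 65
    }
}
```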

7
Conversion
  • It is better to make the conversion between int
    (Unicode) and char (character) explicit by
    introducing type casts.
  • Example: randomly generate an uppercase letter.
    private char randomLetter() {
        return (char) rgen.nextInt((int) 'A', (int) 'Z');
    }
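The slide's version relies on an rgen object whose nextInt(low, high) includes both ends. A sketch of the same idea using the standard java.util.Random instead (an assumption, not the slide's library): its nextInt(n) returns a value from 0 to n - 1, so adding it to 'A' stays within 'A' through 'Z'. The generator is passed in so that a seeded one can be used for testing.

```java
import java.util.Random;

class RandomLetter {
    // Stand-in for the slide's rgen, using java.util.Random.
    private static final Random rgen = new Random();

    // nextInt(26) is 0..25, so the explicit cast yields 'A'..'Z'.
    static char randomLetter(Random r) {
        return (char) ('A' + r.nextInt(26));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.print(randomLetter(rgen));
        }
        System.out.println();
    }
}
```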

8
  • The operations that generally make sense:
  • Adding an integer to a character (usually a
    digit).
  • Subtracting one character from another.
  • 'a' - 'A' gives the distance between a lowercase
    letter and its corresponding uppercase letter.
  • 'M' + ('a' - 'A') gives the code for 'm'.
  • This can be used to convert uppercase letters
    into lowercase letters.
  • Comparing two characters.
  • (ch >= 'a') && (ch <= 'z') is true if ch is a
    lowercase letter.
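The subtraction and comparison operations above combine into a hand-written lowercase conversion. A sketch with hypothetical method names; it only handles the letters 'A' through 'Z', whereas Character.toLowerCase also covers other alphabets.

```java
class CaseConversion {
    // True when ch lies in 'a'..'z' (relies on consecutive codes).
    static boolean isLowerCaseLetter(char ch) {
        return (ch >= 'a') && (ch <= 'z');
    }

    // Adds the fixed distance 'a' - 'A' to an uppercase letter;
    // any other character is returned unchanged.
    static char toLower(char ch) {
        if (ch >= 'A' && ch <= 'Z') {
            return (char) (ch + ('a' - 'A'));
        }
        return ch;
    }

    public static void main(String[] args) {
        System.out.println(toLower('M'));            // m
        System.out.println(isLowerCaseLetter('q'));  // true
    }
}
```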

9
Useful methods in the Character class
  • static boolean isDigit(char ch)
  • static boolean isLetter(char ch)
  • static boolean isLetterOrDigit(char ch)
  • static boolean isLowerCase(char ch)
  • static boolean isUpperCase(char ch)
  • static boolean isWhitespace(char ch)
  • static char toLowerCase(char ch)
  • static char toUpperCase(char ch)
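A small sketch that applies several of these static methods to classify the characters of a string; classify is a hypothetical helper returning the counts of letters, digits, and whitespace, in that order.

```java
import java.util.Arrays;

class CharacterClassDemo {
    // Returns {letters, digits, whitespace}, testing each character
    // with static methods of the Character class.
    static int[] classify(String s) {
        int letters = 0, digits = 0, spaces = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isLetter(ch)) {
                letters++;
            } else if (Character.isDigit(ch)) {
                digits++;
            } else if (Character.isWhitespace(ch)) {
                spaces++;
            }
        }
        return new int[] { letters, digits, spaces };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(classify("Java 8 rocks!")));  // [9, 1, 2]
        System.out.println(Character.toUpperCase('q'));                  // Q
    }
}
```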

10
Strings
  • Java defines many useful methods that operate on
    strings in the String class.
  • The String class uses the receiver syntax when
    you call a method on a string, as in
    str.length().
  • The String class is immutable: none of its
    methods ever changes a string's internal state.
    A class that prohibits clients from changing an
    object's state is said to be immutable.
  • Instead, these methods return a new string on
    which the desired changes have been performed.
  • To change the string a variable holds, overwrite
    the variable with the result:
  • str = str.toLowerCase();
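A sketch of immutability in practice: calling toLowerCase without assigning the result leaves the string unchanged, and a hypothetical capitalize method builds and returns a new string rather than modifying its argument.

```java
class ImmutableDemo {
    // Returns a new string whose first character is uppercase;
    // the argument is never modified, since String is immutable.
    static String capitalize(String str) {
        if (str.isEmpty()) {
            return str;
        }
        return str.substring(0, 1).toUpperCase() + str.substring(1);
    }

    public static void main(String[] args) {
        String str = "Hello";
        str.toLowerCase();                    // result discarded
        System.out.println(str);              // Hello
        str = str.toLowerCase();              // overwrite the variable
        System.out.println(str);              // hello
        System.out.println(capitalize(str));  // Hello
    }
}
```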

11
Strings vs. characters
  • Both the String and the Character classes export
    a toUpperCase method.
  • In the Character class, you call toUpperCase as a
    static method:
  • ch = Character.toUpperCase(ch);
  • In the String class, you apply toUpperCase to an
    existing string:
  • str = str.toUpperCase();
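To contrast the two calling styles, the following sketch builds an uppercase copy with the static Character.toUpperCase and compares it with the receiver-style str.toUpperCase(); upperByChars is a hypothetical name.

```java
class UpperCaseDemo {
    // Builds an uppercase copy one character at a time using the
    // static method Character.toUpperCase.
    static String upperByChars(String str) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            result += Character.toUpperCase(str.charAt(i));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "hello, world";
        System.out.println(upperByChars(str));  // HELLO, WORLD (static calls)
        System.out.println(str.toUpperCase());  // HELLO, WORLD (receiver syntax)
    }
}
```

For plain ASCII text the two agree, but they can differ for some characters: for example, "ß".toUpperCase() yields "SS", while Character.toUpperCase('ß') returns 'ß' unchanged because a char result must be a single character.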

12
Selecting characters from a string
  • In Java, positions within a string are numbered
    starting from 0.
  • str.charAt(1) gives the second character in
    str.
  • A substring can be extracted from a larger
    string. If a string variable str contains
    "hello, world", then
  • str.substring(1, 4)
  • returns "ell".
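A short sketch of charAt and substring on the slide's example string. lastChar is a hypothetical helper; note that the second argument of substring is the index just past the last character taken.

```java
class SelectDemo {
    // Returns the last character; assumes str is non-empty.
    static char lastChar(String str) {
        return str.charAt(str.length() - 1);
    }

    public static void main(String[] args) {
        String str = "hello, world";
        System.out.println(str.charAt(1));        // e (second character)
        System.out.println(str.substring(1, 4));  // ell (indices 1, 2, 3)
        System.out.println(str.substring(7));     // world (index 7 to the end)
        System.out.println(lastChar(str));        // d
    }
}
```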

13
Comparing strings
  • Equality: use s1.equals(s2) instead of s1 == s2,
    since s1 == s2 compares the objects s1 and s2
    (references), not the values (contents) of the
    objects.
  • Order: use s1.compareTo(s2). It compares two
    strings s1 and s2 using the numeric ordering
    imposed by the underlying character codes
    (lexicographic order), which differs from
    conventional dictionary ordering; for example,
    every uppercase letter comes before every
    lowercase letter. The result is negative, zero,
    or positive as s1 comes before, equals, or comes
    after s2.
  • For characters, c1 < c2 compares the codes of c1
    and c2.
  • Other methods in the String class: Figure 8-4, p.
    266.
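A sketch contrasting == with equals and showing the code-based ordering of compareTo; earlier is a hypothetical helper. Because 'B' (66) has a smaller code than 'a' (97), "Banana" sorts before "apple" even though a dictionary would put apple first; compareToIgnoreCase ignores case differences.

```java
class CompareDemo {
    // Returns whichever string comes first in lexicographic (code) order.
    static String earlier(String a, String b) {
        return (a.compareTo(b) <= 0) ? a : b;
    }

    public static void main(String[] args) {
        String s1 = "hello";
        String s2 = new String("hello");     // distinct object, same contents
        System.out.println(s1 == s2);        // false: different references
        System.out.println(s1.equals(s2));   // true: same characters
        System.out.println(earlier("apple", "Banana"));                 // Banana
        System.out.println("apple".compareToIgnoreCase("Banana") < 0);  // true
    }
}
```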

14
Searching within a string
    /**
     * Given a string composed of separate words, this method
     * returns its acronym.
     * @param str Given string composed of separate words.
     * @return The acronym of the given string.
     */
    private String acronym(String str) {
        String result = str.substring(0, 1);  /* get the first character */
        int pos = str.indexOf(' ');           /* position of the first space */
        while (pos != -1) {                   /* while not the end */
            result += str.substring(pos + 1, pos + 2);  /* concatenate a letter */
            pos = str.indexOf(' ', pos + 1);  /* position of the next space */
        }
        return result;
    }
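The method above assumes words are separated by exactly one space: two consecutive spaces add a space to the result, a trailing space makes str.substring(pos + 1, pos + 2) throw StringIndexOutOfBoundsException, and an empty string fails at str.substring(0, 1). A sketch of an alternative (not the slide's method) that walks the string with charAt and takes every non-whitespace character that starts the string or follows whitespace:

```java
class Acronym {
    // A character starts a word if it is not whitespace and is either
    // the first character or preceded by whitespace.
    static String acronym(String str) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            boolean startsWord = (i == 0) || Character.isWhitespace(str.charAt(i - 1));
            if (startsWord && !Character.isWhitespace(ch)) {
                result += ch;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(acronym("portable network  graphics"));  // png
    }
}
```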

15
Simple string idioms
  • Iterating through the characters in a string:
    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        . . . code to process each character in turn . . .
    }
  • Growing a new string character by character:
    String result = "";
    for (whatever limits) {
        . . . code to determine next ch to be added . . .
        result += ch;
    }
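Both idioms combined in one sketch: a hypothetical reverse method iterates through the characters and grows the result by prepending each one.

```java
class StringIdioms {
    // Iterates through str and grows result one character at a time.
    static String reverse(String str) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            result = ch + result;   // prepend, so the last char ends up first
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reverse("stressed"));  // desserts
    }
}
```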

16
A case study
    /*
     * File: PigLatin.java
     * -------------------
     * This file takes a line of text and converts each word into
     * Pig Latin while keeping punctuation marks.
     * The rules for forming Pig Latin words are as follows:
     * - If the word begins with a vowel, add "way" to the end of
     *   the word.
     * - If the word begins with a consonant, extract the set of
     *   consonants up to the first vowel, move that set of
     *   consonants to the end of the word, and add "ay".
     * - If the word contains no vowel, the word is unchanged.
     */
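A sketch of the word-level rules, assuming only a, e, i, o, u (in either case) count as vowels, so a word such as "rhythm" is left unchanged. The names findFirstVowel and translateWord match those mentioned later in these slides, but the bodies (and the hypothetical helper isEnglishVowel) are an illustration, not the slides' code.

```java
class PigLatinWord {
    // Index of the first vowel in word, or -1 if there is none.
    static int findFirstVowel(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (isEnglishVowel(word.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    // Assumption: only a, e, i, o, u count as vowels.
    static boolean isEnglishVowel(char ch) {
        switch (Character.toLowerCase(ch)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return true;
            default:
                return false;
        }
    }

    // Handles all three results of findFirstVowel: -1, 0, positive.
    static String translateWord(String word) {
        int vp = findFirstVowel(word);
        if (vp == -1) {
            return word;                            // no vowel: unchanged
        } else if (vp == 0) {
            return word + "way";                    // starts with a vowel
        } else {
            String head = word.substring(0, vp);    // leading consonants
            String tail = word.substring(vp);
            return tail + head + "ay";
        }
    }

    public static void main(String[] args) {
        System.out.println(translateWord("trash"));   // ashtray
        System.out.println(translateWord("apple"));   // appleway
        System.out.println(translateWord("rhythm"));  // rhythm
    }
}
```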

17
  • Top-level English pseudocode:
    public void run() {
        Tell the user what the program does.
        Ask the user for a line of text.
        Translate the line into Pig Latin and print it on the console.
    }
  • Implementation at the current level:
    public void run() {
        println("This program translates a line into Pig Latin.");
        String line = readLine("Enter a line: ");
        Translate the line into Pig Latin and print it on the console.
    }

18
  • Define a method to replace the English
    pseudocode (interface design):
    public void run() {
        println("This program translates a line into Pig Latin.");
        String line = readLine("Enter a line: ");
        println(translateLine(line));
    }

    /**
     * Translates a line into Pig Latin.
     * @param line An English line
     * @return The Pig Latin translation of the line
     */
    private String translateLine(String line)

19
  • Next-level English pseudocode.
  • Apply a pattern, recalling the acronym pattern:
    private String translateLine(String line) {
        String result = "";
        while (not end) {
            Get the next word.
            Translate that word into Pig Latin.
            Append the translated word to result.
        }
        return result;
    }
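One way to fill in this pseudocode is with java.util.StringTokenizer, passing true as the third constructor argument so that delimiters come back as tokens and spaces and punctuation are copied through. The DELIMITERS set is an assumption, the word-level methods are repeated so the block compiles on its own, and capitalization is not adjusted ("Hello" becomes "elloHay").

```java
import java.util.StringTokenizer;

class PigLatinLine {
    // Hypothetical delimiter set: spaces, tabs and common punctuation.
    private static final String DELIMITERS = " \t.,;:!?\"'()-";

    // Translates each word token and copies every delimiter unchanged.
    static String translateLine(String line) {
        String result = "";
        // true: delimiters are returned as one-character tokens
        StringTokenizer tokenizer = new StringTokenizer(line, DELIMITERS, true);
        while (tokenizer.hasMoreTokens()) {          // while not end
            String token = tokenizer.nextToken();    // get the next token
            if (isWord(token)) {
                token = translateWord(token);        // translate words only
            }
            result += token;                         // append to result
        }
        return result;
    }

    // A token is treated as a word if every character is a letter.
    static boolean isWord(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isLetter(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Word-level rules, repeated so this block is self-contained.
    static String translateWord(String word) {
        int vp = findFirstVowel(word);
        if (vp == -1) {
            return word;
        } else if (vp == 0) {
            return word + "way";
        } else {
            return word.substring(vp) + word.substring(0, vp) + "ay";
        }
    }

    // Assumption: only a, e, i, o, u (either case) are vowels.
    static int findFirstVowel(String word) {
        for (int i = 0; i < word.length(); i++) {
            if ("aeiouAEIOU".indexOf(word.charAt(i)) != -1) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(translateLine("this is pig latin."));  // isthay isway igpay atinlay.
    }
}
```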

20
  • As a programmer, you will often trip over some
    detail that the framers of the problem either
    overlooked or considered too obvious to mention.
    In some cases, the omission is serious enough
    that you have to discuss it with the person who
    assigned you the programming task. In many cases,
    however, you will have to choose for yourself a
    policy that seems reasonable.
  • In this case, the specification is unclear about
    spaces and punctuation marks. A reasonable
    decision is: keep spaces and punctuation marks;
    translate words only.

21
Implementation guideline
  • Identify reusable code.
  • Use library classes whenever possible.
  • StringTokenizer class
  • import java.util.*;
  • A token is a sequence of characters that acts as a
    single unit.
  • In this case, take a word as a token and
    punctuation marks as delimiters.
  • Define a DELIMITERS constant listing the
    punctuation marks (check Wikipedia or the
    keyboard).
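A sketch of how StringTokenizer behaves with and without returned delimiters; tokens is a hypothetical helper and the delimiter string is chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Splits text on the given delimiter characters. When returnDelims
    // is true, each delimiter also comes back as a one-character token.
    static List<String> tokens(String text, String delimiters, boolean returnDelims) {
        List<String> list = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters, returnDelims);
        while (st.hasMoreTokens()) {
            list.add(st.nextToken());
        }
        return list;
    }

    public static void main(String[] args) {
        for (String t : tokens("Hello, world!", " ,!", true)) {
            System.out.println("[" + t + "]");
        }
    }
}
```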

22
Implementation guideline (cont.)
  • Use the character methods (Figure 8-3) and the
    string methods (Figure 8-4).
  • Use for instead of while whenever possible.
  • Use for in findFirstVowel, since the number of
    iterations, word.length(), is known in advance.
  • Use for in isWord, since the number of
    iterations, token.length(), is known in advance.
  • Use a table to exhaust the cases.
  • findFirstVowel, which is called by translateWord,
    returns -1, 0, or a positive integer. Thus
    translateWord must handle all three cases.

23
Summary
  • For each level:
  • Write English pseudocode.
  • Implement directly at the current level.
  • Design methods to replace the English pseudocode.
  • Go to the next level's methods.
  • Apply the implementation guidelines.
  • English pseudocode can be used as comments.

24
Testing
  • Bottom-up testing (start with testing methods at
    the lowest level and move up, test callees before
    the caller)
  • Test normal cases
  • Test special or extreme cases (boundaries of the
    input values)
  • Black-box testing (verify input/output
    specifications)
  • White-box testing (execute every part of the
    code, including every branch of if and switch
    statements)
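A sketch of bottom-up, table-driven testing for the lowest-level method, findFirstVowel, before testing its callers. The rows mix normal cases with boundary cases (the empty string, a vowel in the first and last positions, no vowel at all); the vowel set and the test data are assumptions for illustration.

```java
class FindFirstVowelTest {
    // Method under test: index of the first vowel, or -1 if none.
    static int findFirstVowel(String word) {
        for (int i = 0; i < word.length(); i++) {
            if ("aeiouAEIOU".indexOf(word.charAt(i)) != -1) {
                return i;
            }
        }
        return -1;
    }

    // Test table: each input word paired with its expected result.
    private static final String[] WORDS    = { "", "a", "apple", "trash", "spa", "rhythm" };
    private static final int[]    EXPECTED = { -1,  0,   0,       2,       2,     -1 };

    // Runs every row and returns the number of failures.
    static int runTests() {
        int failures = 0;
        for (int i = 0; i < WORDS.length; i++) {
            int actual = findFirstVowel(WORDS[i]);
            if (actual != EXPECTED[i]) {
                System.out.println("FAIL: \"" + WORDS[i] + "\" expected "
                        + EXPECTED[i] + ", got " + actual);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = runTests();
        System.out.println(failures == 0 ? "all tests passed" : failures + " test(s) failed");
    }
}
```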