Parsing and Validating Text Input - PowerPoint PPT Presentation

Slides: 21
Provided by: richar219

1
Parsing and Validating Text Input
  • file opening and closing
  • fprintf, fscanf and sscanf
  • fgets and fputs
  • fgetc and putc
  • Parsing a Token Delimited Input Record
  • Example Program using strtok
  • Input Validation Approaches
  • Checking for safe or dangerous input

2
file opening and closing
  • Except for stdin, stdout and stderr, files have
    to be opened before reading or writing. fopen()
    opens a file and returns a filehandle or NULL on
    error. fclose() closes a filehandle.
  • Example 1
  • #include <stdio.h>
  • #define IN "master.txt"
  • #define OUT "backup.txt"

3
Example 1 continued
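The body of Example 1 did not survive in this transcript. As a minimal sketch, assuming the example opened master.txt for reading and backup.txt for writing, the open-and-check pattern might look like the following. The helper name open_pair() is hypothetical and not from the original slides.

```c
#include <stdio.h>

#define IN  "master.txt"
#define OUT "backup.txt"

/* Open in_name for reading and out_name for writing.
   Returns 0 on success, -1 if either fopen() fails.
   (open_pair is an illustrative name, not a library function.) */
int open_pair(const char *in_name, const char *out_name,
              FILE **in, FILE **out)
{
    *in = fopen(in_name, "r");
    if (*in == NULL) {
        perror(in_name);        /* report why the open failed */
        return -1;
    }
    *out = fopen(out_name, "w");
    if (*out == NULL) {
        perror(out_name);
        fclose(*in);            /* do not leak the first handle */
        *in = NULL;
        return -1;
    }
    return 0;
}
```

A caller would use open_pair(IN, OUT, &in, &out), do its reading and writing, and then fclose() both handles.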
4
fprintf and fscanf
  • fscanf() and fprintf() work in the same way as
    scanf() and printf(), the difference being an extra
    first parameter, a file handle, which directs
    input from or output to an opened file.
  • fscanf() has no built-in protection against buffer
    overflows unless every string conversion is given
    a maximum field width, and none against data in the
    input file being incompatible with the data wanted.
    A %s conversion stops reading a field on encountering
    whitespace. What if the field contains spaces, tabs
    or newlines?
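As a hedged illustration of the overflow point, the sketch below reads a whitespace-separated record with fscanf(), giving %s a field width and checking the return value. The record layout (name, weight, age) and the name read_record() are assumptions made for this example.

```c
#include <stdio.h>

#define NAME_LEN 32

/* Read one "name weight age" record from in.
   Returns 1 on success, 0 on end of file or malformed data.
   (Hypothetical helper; the record layout is assumed.) */
int read_record(FILE *in, char name[NAME_LEN], double *weight, int *age)
{
    /* %31s stores at most 31 characters plus the '\0' terminator,
       so name[] cannot overflow. The width must be NAME_LEN - 1. */
    int matched = fscanf(in, "%31s %lf %d", name, weight, age);

    /* fscanf() returns the number of fields converted; anything
       other than 3 means the data did not match what was wanted. */
    return matched == 3;
}
```

The width stops the overflow, but the second problem remains: a name such as "Joseph Smith" contains a space, so %s would stop after "Joseph" and the conversion of the weight would fail.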

5
sscanf
  • With sscanf() the first parameter is the string
    it reads and converts. This can be useful if the
    string has been validated e.g. as a number, and
    string fields don't contain embedded whitespace.
  • fscanf(stdin, args ... ) is the same as scanf(
    args ... ) .
  • fprintf(stdout, args ... ) is the same as
    printf( args ... ) .
  • sscanf(string, args ... ) is like fscanf but scans
    and parses a string instead of a file.

6
fgets and fputs
  • fgets() reads a line of data from a file up to
    and including the newline character '\n' into a
    string and then appends the string terminator
    character '\0' after the newline. fgets()
    returns the value NULL (not EOF!) when
    attempting to read beyond the end of file.
    fgets() requires as parameters:
  • the name of the string (or any other
    pointer giving the address at which it starts),
  • the size of the buffer, i.e. the maximum number
    of characters to read + 1 (to leave room for
    the '\0' end-of-string marker),
  • and the file handle.
  • fputs() writes a string to a file such that
    fputs(string,out) is the equivalent of
    fprintf(out,"%s",string).
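A small sketch of the fgets()/fputs() pairing described above, assuming a 256-byte line buffer. The function names copy_lines() and strip_newline() are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Copy in to out line by line, returning the number of fgets() calls
   that succeeded. Lines longer than the buffer are copied in pieces. */
long copy_lines(FILE *in, FILE *out)
{
    char line[256];
    long count = 0;

    /* sizeof line is the buffer size; fgets() reads at most 255
       characters and always adds the '\0' terminator. */
    while (fgets(line, sizeof line, in) != NULL) {
        fputs(line, out);
        count++;
    }
    return count;
}

/* Remove the trailing newline that fgets() keeps, if there is one. */
void strip_newline(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}
```

Note that the loop tests for NULL, not EOF, because fgets() returns a pointer.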

7
fgets continued
  • The fgets() function is particularly useful for
    robustly reading text files organised into
    records separated using newlines as it contains
    built in buffer overflow protection. Data can be
    input using fgets() into a character string,
    validated to ensure the correct number and types
    of data items are present and then read from the
    string into local program variables of the
    appropriate types using sscanf().
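The read, validate, then convert sequence above can be sketched as follows. The record layout (an item name, a quantity and a price) and the names parse_item() and next_item() are assumptions for illustration, not part of the original slides.

```c
#include <stdio.h>

#define NAME_LEN 32

struct item {
    char name[NAME_LEN];
    int quantity;
    double price;
};

/* Parse "name quantity price" from an already-read line.
   Returns 1 if exactly three fields were found with nothing after them. */
int parse_item(const char *line, struct item *it)
{
    char extra;
    /* The trailing " %c" only converts if non-whitespace text follows
       the price, so a result of 4 means there was unexpected data. */
    int n = sscanf(line, "%31s %d %lf %c",
                   it->name, &it->quantity, &it->price, &extra);
    return n == 3;
}

/* Read the next valid record from in; returns 1 on success, 0 at EOF.
   Malformed lines are reported on stderr and skipped. */
int next_item(FILE *in, struct item *it)
{
    char line[128];

    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_item(line, it))
            return 1;
        fprintf(stderr, "skipping malformed line: %s", line);
    }
    return 0;
}
```

Because fgets() bounds the read and sscanf() works on the buffered string, a bad line cannot leave the file position in the middle of a record as a failed fscanf() can.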

8
fgetc and putc
  • These functions are the file-enabled equivalents
    of getchar() and putchar(). They are used to read
    and write single characters from and to files
    respectively. getc() returns EOF if an attempt is
    made to read beyond the end of file. c = getc(in)
    is the equivalent of fscanf(in,"%c",&c) and
    putc(c,out) is the equivalent of
    fprintf(out,"%c",c).

9
Copying file one character at a time
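The code for this slide is missing from the transcript. A minimal sketch of a character-by-character copy, assuming both files are already open, might look like this. The name copy_chars() is hypothetical.

```c
#include <stdio.h>

/* Copy every character from in to out. Returns the number of
   characters copied, or -1 if a read or write error occurred. */
long copy_chars(FILE *in, FILE *out)
{
    int c;              /* int, not char, so EOF can be told apart
                           from every valid character value */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        count++;
    }
    /* getc() also returns EOF on a read error, so check which it was. */
    return ferror(in) ? -1 : count;
}
```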
10
Parsing a Token Delimited Record
  • Use of the strtok() function in string.h helps
    make this job a bit easier. The idea is to
    convert field delimiters into '\0' null
    characters. On the first call strtok() is passed
    the address of the string and returns the
    address of the start of field 1. For fields >= 2
    you can either pass it NULL instead, when it
    will automatically work out the address of the
    start of the next field, or calculate and pass
    the address of the start of the next field
    yourself, and so on. These string addresses
    can be stored in an array of char pointers for
    use later. This technique can be used to input
    fields which include unknown numbers of space and
    tab (\t) characters.
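The approach above, storing each field's address in an array of char pointers, can be sketched as below. The function name split_fields() and its parameters are illustrative assumptions.

```c
#include <string.h>

/* Split record in place on any character in delims.
   Stores up to max_fields pointers in fields[] and returns how many
   were found. The record is modified: delimiters become '\0'. */
int split_fields(char *record, const char *delims,
                 char *fields[], int max_fields)
{
    int n = 0;
    char *tok = strtok(record, delims);   /* first call: pass the string */

    while (tok != NULL && n < max_fields) {
        fields[n++] = tok;
        tok = strtok(NULL, delims);       /* later calls: pass NULL */
    }
    return n;
}
```

One design point to be aware of: strtok() treats a run of adjacent delimiters as a single delimiter, so empty fields are skipped rather than returned as empty strings.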

11
Warning concerning strtok
  • strtok() modifies the string it parses, by
    replacing field delimiters with '\0' null byte
    characters. If this is a problem, clone the
    string first using strcpy() and then parse the
    clone, e.g.
  • char *clone;
  • /* using malloc() to avoid buffer overrun */
  • if ((clone = (char *)
  • malloc(sizeof(char) * (strlen(original) + 1))) ==
  • NULL)
  • exit(1); /* error if insufficient memory */
  • strcpy(clone, original);
  • /* must remember to free(clone) later */

12
strtok example program
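The program itself is missing from this transcript. The following is only a sketch of what such a program could look like, assuming a comma-delimited record holding a name, a weight and an age. The delimiter, the struct and the function names are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct person {
    const char *name;   /* points into the parsed record */
    double weight;
    int age;
};

/* Parse "name,weight,age" in place. Returns 1 on success, 0 if a
   field is missing. The comma delimiter is an assumption. */
int parse_person(char *record, struct person *p)
{
    char *name   = strtok(record, ",\n");
    char *weight = strtok(NULL, ",\n");
    char *age    = strtok(NULL, ",\n");

    if (name == NULL || weight == NULL || age == NULL)
        return 0;
    p->name = name;
    p->weight = atof(weight);   /* conversions are not validated here */
    p->age = atoi(age);
    return 1;
}

/* Print one labelled field per line. */
void print_person(const struct person *p, FILE *out)
{
    fprintf(out, "Name %s\n", p->name);
    fprintf(out, "Weight %f\n", p->weight);
    fprintf(out, "Age %d\n", p->age);
}
```

Because the delimiter is a comma rather than whitespace, the name field can contain a space.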
13
strtok program output
  • Name Joseph Smith
  • Weight 64.300000
  • Age 25

14
A thread-safe strtok
  • The static pointer variable value used internally
    within strtok() won't survive concurrent use in a
    multi-threaded application. If this is a problem,
    you can use the re-entrant version strtok_r(),
    prototype defined in the POSIX.1-2001 standard as
    follows
  • char *strtok_r(char *str, const char *delim, char
    **saveptr);
  • The saveptr has to be passed the address of a
    pointer variable declared within the caller
    function, which enables the position within the
    string being parsed to be remembered between
    function calls.
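Because each call sequence keeps its own saveptr, strtok_r() also allows one tokenising loop to be nested inside another, which strtok() cannot do. A hedged sketch, assuming records separated by ';' and fields separated by ',' (both delimiters and the name count_fields() are made up for this example):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose strtok_r() under strict -std=c99 */
#endif
#include <string.h>

/* Count the comma-separated fields in every semicolon-separated
   record of text. text is modified in place. */
int count_fields(char *text)
{
    char *rec_save;     /* remembers the position within text */
    char *fld_save;     /* remembers the position within one record */
    int total = 0;

    for (char *rec = strtok_r(text, ";", &rec_save);
         rec != NULL;
         rec = strtok_r(NULL, ";", &rec_save)) {
        for (char *fld = strtok_r(rec, ",", &fld_save);
             fld != NULL;
             fld = strtok_r(NULL, ",", &fld_save))
            total++;
    }
    return total;
}
```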

15
Input Validation Approaches
  • Is input likely to be perfect, clumsy or hostile
    ?
  • Perfect input assumes the person entering data
    will never use an incorrect key on the keyboard.
    The program is otherwise allowed to crash.
  • Clumsy input is common for a stand-alone
    application. An application is fragile and less
    usable if it crashes e.g. due to casual use of
    the <enter> key by a user who hasn't read the
    prompt requesting input data correctly.
  • Hostile input has to be assumed very likely if
    the application accepts input data from
    non-authenticated users over the Internet. A
    standalone application might later become a
    web-browser plugin.

16
Buffer Overflow Protection
  • A buffer overflow occurs when a program writes
    beyond or outside allocated blocks of memory.
    Attackers may attempt to write specific data into
    the executable part of a program, e.g. vectoring
    execution into inserted code by overwriting a
    function return address (stack smashing). The
    allocated block might be a structure or
    character array, or a block allocated dynamically
    using malloc().
  • Many network programs are compromised through
    buffer overflows. fgets() allows the programmer
    to specify the size of the buffer, which limits how
    many bytes it will write. Careful programming is
    needed to ensure access can only be made within
    allocated memory.
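One way to use the fgets() size limit defensively is to detect and reject lines that do not fit, instead of silently processing a truncated piece. The sketch below is one possible approach; the enum and the name read_bounded_line() are assumptions, not from the slides.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/* Read one line into buf (size bytes) without ever writing past it.
   The newline is stripped. A line that does not fit is discarded up to
   its newline and reported as LINE_TOO_LONG. */
enum line_status read_bounded_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_OK;
    }

    /* No newline was stored: either the text filled the buffer
       exactly, or the line was truncated. Look at the next character. */
    int c = getc(in);
    if (c == EOF || c == '\n')
        return LINE_OK;

    /* Truncated: swallow the rest so the next call starts at the
       beginning of the next line. */
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    return LINE_TOO_LONG;
}
```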

17
Hostile input example
  • A web-based calculator program reads data from an
    HTML form expected to be in the format a op b,
    where a and b are numbers and op is an arithmetic
    operation e.g. +, -, * and /. A naive programmer
    has used a Perl or Python eval() function upon
    this input data and writes the result of the
    calculation to the web browser.
  • Mr Evil Cracker tests for this possibility with
    the form input
  • open("/etc/shadow").read()
  • This results in the output
  • Traceback (most recent call last):
  • File "", line 1, in ?
  • File "", line 0, in ?
  • IOError: [Errno 13] Permission denied:
    '/etc/shadow'
18
Hostile input example 2
  • This shows that there is some rudimentary
    security on this system, as the webserver program
    is not running with the administrator privileges
    which would allow reading the shadow password
    file.
  • Mr Evil Cracker hasn't got a crackable form of
    the password hash file yet, but he now knows that
    he can run any Python code on the target system
    with the permissions of the webserver program. As
    this allows him to create and execute other
    program files on this server, all he now needs is
    to find a local privilege escalation exploit,
    rather than a remote one. His chances of running
    a program giving him full control of this server
    are now much greater.
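By contrast with eval(), a calculator can accept only the exact shape a op b and reject everything else. The sketch below is written in C to match the rest of this presentation; it is one possible approach, and the name safe_calc() and the 64-character limit are arbitrary assumptions.

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Evaluate "a op b" where op is one of + - * /.
   Returns 1 and stores the answer in *result, or returns 0 if the
   input is anything other than that exact shape. */
int safe_calc(const char *input, double *result)
{
    double a, b;
    char op, extra;

    if (strlen(input) > 64)                 /* arbitrary length limit */
        return 0;
    /* Exactly three conversions must succeed; a fourth means
       there was trailing text after the second number. */
    if (sscanf(input, " %lf %c %lf %c", &a, &op, &b, &extra) != 3)
        return 0;
    if (!isfinite(a) || !isfinite(b))       /* reject "inf" and "nan" */
        return 0;

    switch (op) {
    case '+': *result = a + b; break;
    case '-': *result = a - b; break;
    case '*': *result = a * b; break;
    case '/':
        if (b == 0.0)                       /* refuse division by zero */
            return 0;
        *result = a / b;
        break;
    default:
        return 0;                           /* unknown operator */
    }
    return 1;
}
```

With this approach the hostile form input shown earlier is never executed; it simply fails to match and is rejected.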

19
Check safe or dangerous input ?
  • The problem with checking for dangerous input is
    that crackers will know things about your system
    that you don't. Therefore you don't really know
    what might be dangerous and what isn't so you
    can't easily check for specifically dangerous
    data. However, safe input is within the range of
    input values which you have designed your program
    to handle.
  • If the required data is in the form of a string,
    what are the maximum and minimum string lengths,
    and what characters should be allowed in a string
    e.g. to input someone's name or address? You
    should reject anything not in your designed and
    tested range of allowed values, sending a
    suitable error message to the user so that input
    mistakes can be corrected.
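Checking for safe input can be sketched as an allow-list test. The length limits and the permitted character set below are assumptions chosen for illustration; a real application would choose its own, and this ASCII-only version would wrongly reject names with accented letters.

```c
#include <ctype.h>
#include <string.h>

#define NAME_MIN 1
#define NAME_MAX 40

/* Return 1 if name has an allowed length and contains only allowed
   characters, otherwise 0. Anything not explicitly allowed is rejected. */
int name_is_valid(const char *name)
{
    size_t len = strlen(name);

    if (len < NAME_MIN || len > NAME_MAX)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)name[i];
        /* letters, plus space, hyphen, apostrophe and full stop */
        if (!isalpha(ch) && strchr(" -'.", ch) == NULL)
            return 0;
    }
    return 1;
}
```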

20
Checking safe input numbers
  • If you want to input numbers, you need to ensure
    that the data string input can be safely
    converted to a number. Consider what range of
    numbers your program should accept as
    input:
  • Minimum and maximum values, avoiding numeric
    overflows.
  • Whether integer or floating point.
  • What the maximum acceptable string length is for
    each input.
  • Numbers should always be input as strings and
    then validated and converted.
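The checks listed above can be sketched for integers using strtol(), which reports both trailing junk and overflow. The function name parse_long_in_range() and the 20-character length limit are assumptions for this example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Convert text to a long in [min, max]. Returns 1 and stores the
   value on success; returns 0 for empty, overlong, non-numeric,
   out-of-range or overflowing input. */
int parse_long_in_range(const char *text, long min, long max, long *value)
{
    char *end;
    long n;

    if (text[0] == '\0' || strlen(text) > 20)   /* length limit is arbitrary */
        return 0;

    errno = 0;
    n = strtol(text, &end, 10);
    if (end == text || *end != '\0')    /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE)                /* does not fit in a long */
        return 0;
    if (n < min || n > max)             /* outside the designed range */
        return 0;

    *value = n;
    return 1;
}
```

For example, parse_long_in_range(text, 0, 150, &age) would accept "25" but reject "151", "25abc" and an empty string. Note that strtol() itself skips leading whitespace, so " 25" is accepted.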