Unix Programming: working with files

1
Unix Programming: working with files
  • CSRU3130, Spring 2008
  • Ellen Zhang

2
Last Class
  • Programming with standard I/O
  • What's inside a file?
  • ASCII code
  • getchar(), putchar()
  • printf

3
vis program
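  • #include <stdio.h>
  • #include <stdlib.h>
  • #include <ctype.h>
  • int main(void)
  • {
  •     int c;
  •     /* pass printable characters through; show the rest as \ooo octal escapes */
  •     while ((c = getchar()) != EOF)
  •         if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c == ' '))
  •             putchar(c);
  •         else
  •             printf("\\%03o", c);
  •     exit(0);
  • }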

Character test macros in /usr/include/ctype.h
4
Escape sequence in C
  • \' Single quote
  • \" Double quote
  • \\ Backslash
  • \nnn Octal number (nnn), \xnnn Hexadecimal
    number (nnn)
  • \0 Null character (really just the octal
    number zero)
  • \a Audible bell
  • \b Backspace
  • \f Formfeed
  • \n Newline (NL, LF)
  • \r Carriage return (CR)
  • \t Horizontal tab

In Unix, lines are terminated by NL (\n). In
Windows, lines are terminated by CR followed by NL (\r\n).
5
A program genfile.c
  • #include <stdio.h>
  • #include <ctype.h>
  • int main(void)
  • {
  •     int c;
  •     c = 0;
  •     putchar(c);         /* NUL byte */
  •     putchar('\007');    /* bell, same as \a */
  •     c = '\a';
  •     putchar(c);
  •     putchar('a');
  •     putchar('b');
  •     putchar('c');
  •     putchar('\n');
  •     putchar('\011');    /* tab, \t */
  •     putchar('d');
  •     putchar('e');
  •     putchar('\012');    /* newline, \n */
  •     return 0;
  • }

6
Now try vis
  • [zhang@storm vis]$ ./genfile
  • abc
  •         de
  • [zhang@storm vis]$ ./genfile | od -cb
  • 0000000  \0  \a  \a   a   b   c  \n  \t   d   e  \n
  •         000 007 007 141 142 143 012 011 144 145 012
  • 0000013
  • [zhang@storm vis]$ ./genfile | ./vis
  • \000\007\007abc
  •         de

7
Today
  • Processing command line options/arguments
  • More on printf, scanf
  • File I/O
  • A script a day: overwrite
  • A command a day: find

8
Adding options to vis
  • vis -s: strip away non-printable characters
  • Command-line arguments are available to main()
  • int main(int argc, char *argv[])
  • argc is the number of command-line arguments,
    counting the command name
  • The arguments are stored in argv[0], ...,
    argv[argc-1]. Note that argv[0] is the command name
    itself.
  • Equivalently,
  • int main(int argc, char **argv)
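
As a minimal sketch of walking argv, the helper below scans the arguments after the command name for a given option string. The name has_flag is hypothetical and is not part of the vis program; main would call it with its own argc and argv.

```c
#include <string.h>

/* Return 1 if any argument after the command name equals flag, else 0.
   has_flag is a hypothetical helper used only for illustration. */
int has_flag(int argc, char *argv[], const char *flag)
{
    for (int i = 1; i < argc; i++) {   /* argv[0] is the command name */
        if (strcmp(argv[i], flag) == 0)
            return 1;
    }
    return 0;
}
```

For the command line "vis -s notes.txt", has_flag(argc, argv, "-s") returns 1 because argv[1] is "-s".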

9
Interpreting complex declaration
  • char *argv[]
  • Brackets and parentheses (that is, modifiers to
    the right of the identifier) take precedence over
    asterisks (that is, modifiers to the left of the
    identifier).
  • Read from the inside out
  • argv is
  • an array of
  • pointers to
  • char (apply the type specifier, i.e., char,
    last)
  • Can you draw a memory diagram for argv?

10
Interpreting complex declaration
  • char (*tmp)[]
  • Brackets and parentheses (that is, modifiers to
    the right of the identifier) take precedence over
    asterisks (that is, modifiers to the left of the
    identifier).
  • Parentheses can be used to override the default order
  • Read from the inside out: go to the right first, then
    left
  • tmp is (start with the identifier)
  • a pointer to (interpret what's in the parentheses
    first)
  • an array of (go to the right)
  • char (apply the type specifier, i.e., char,
    last)
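
The difference between the two declarations can be sketched in code. The function names, the array sizes, and the sample strings below are illustrative assumptions; the point is that an array of pointers and a pointer to an array behave differently under sizeof and pointer arithmetic.

```c
#include <stddef.h>

/* names is an array of pointers to char: count its elements. */
size_t count_names(void)
{
    char *names[] = { "vis", "-s", "ch2" };   /* array of 3 pointers */
    return sizeof names / sizeof names[0];
}

/* row is a pointer to an array of 4 chars: row + 1 skips a whole array. */
char first_char_of_second_row(void)
{
    char grid[2][4] = { "abc", "xyz" };
    char (*row)[4] = grid;       /* points at grid[0] */
    return (*(row + 1))[0];      /* first char of grid[1] */
}
```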

11
Example of processing options
  • #include <stdio.h>
  • #include <string.h>
  • #include <ctype.h>
  • int main(int argc, char *argv[])
  • {
  •     int c, strip = 0;
  •     if (argc > 1 && strcmp(argv[1], "-s") == 0)
  •         strip = 1;
  •     while ((c = getchar()) != EOF)
  •         if (isprint(c) || isspace(c) || c == '\n' ||
              c == '\t' || c == ' ')
  •             putchar(c);
  •         else if (!strip)
  •             printf("\\%03o", c);
  •     return 0;
  • }

strcmp is part of the standard C library (declared in
<string.h>). It compares two strings and returns 0 if
they are identical.
12
Today
  • Processing command line options/arguments
  • More on printf, scanf
  • File I/O
  • A script a day: overwrite
  • A command a day: find

13
Formatting output with printf
  • int printf(const char *format, ...)
  • Writes strings, integers, doubles, etc. to standard
    output, performing format conversion
  • ... means a variable number of arguments; the first
    argument (a string) is required.
  • Given a simple string, printf just prints the
    string (to standard output).
  • printf("I\thave\ttabs\n");
  • char s[100];
  • strcpy(s, "printf is fun!\a\n");
  • printf(s);   /* safe only because s contains no % characters; prefer printf("%s", s) */

14
Formatting output with printf
  • You can tell printf to embed values in the
    string. These values are determined at run time
    and are placed by formatting tags embedded in the first
    string argument
  • Example
  • printf("here is an integer %d\n", i);
  • printf("%d %d %d\n", x, y, x + y);
  • printf("reverse %s and we get %s\n", str, reverse(str));
  • printf("sqrt of %f is %lf\n", x, sqrt(x));

15
Printf formatting tag (1)
  • printf formatting tag
  • %[flags][width][.precision][length]specifier
  • Specifiers
  • %d treats the corresponding parameter as a signed
    integer
  • %u means unsigned integer
  • %x means print as hexadecimal
  • %s means treat it as a string
  • %c is for a character (char)
  • %f is for floating-point numbers

16
printf is dumb
  • printf will treat the corresponding parameter as
    the specifier suggests, even if the parameter is
    not of the given type
  • e.g., %d is replaced by the value of the
    parameter when treated as an integer, even if the
    parameter is not an integer variable
  • printf("print an int %d\n", "Hi Dave");
  • print an int 134513980
  • printf("print an int %d\n", 12.3);
  • print an int -1717986918
  • Formally, a mismatched argument is undefined behavior;
    the values shown are what one machine happened to print

17
Fun with printf
  • char *s = "Hi Dave";
  • printf("the string \"%s\" is %zu characters long\n", s, strlen(s));
  • the string "Hi Dave" is 7 characters long
  • int x = 10;
  • printf("x=%d is %o in octal and %x in hexadecimal\n", x, x, x);
  • x=10 is 12 in octal and a in hexadecimal
  • Note: %% prints a single %
  • e.g. printf("%.1f%% of the population\n", 12.4);
  • 12.4% of the population

18
Printf formatting tags (2)
  • printf formatting tag
  • %[flags][width][.precision][length]specifier
  • Flags
  • 0 means pad the output on the left with zeros
  • Width
  • A number specifying the minimum width of the output
  • Precision
  • The number of digits after the decimal point
  • Example
  • printf("square root of 10 is %20.15f\n", sqrt(10));
  • square root of 10 is    3.162277660168380
  • printf("\\%03o", '>');
  • \076
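
The zero flag and width can be checked without looking at a terminal by formatting into a buffer. This is a small sketch using snprintf; the function name format_octal_escape is an assumption made for illustration.

```c
#include <stdio.h>

/* Format byte c as a backslash followed by three octal digits, e.g. \076.
   Returns the number of characters the full result needs, excluding the
   terminating '\0', as reported by snprintf. */
int format_octal_escape(char *buf, size_t size, int c)
{
    /* %o expects an unsigned int; 0 pads to the width of 3 with zeros */
    return snprintf(buf, size, "\\%03o", (unsigned int)(c & 0xff));
}
```

Calling it with '>' fills the buffer with the four characters \076, matching the printf example above.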

19
printf family
  • Write formatted output to standard output
  • int printf(const char *format, ...)
  • Write formatted output to a file
  • int fprintf(FILE *stream, const char *format, ...)
  • Write formatted output to a string
  • int sprintf(char *str, const char *format, ...)
  • char birthDate[20];
  • sprintf(birthDate, "%02d/%02d/%04d", m, d, y);
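
sprintf trusts the caller to supply a large enough buffer. A hedged alternative, not shown on the slide, is the C99 function snprintf, which takes the buffer size and never writes past it. The wrapper name format_date is hypothetical.

```c
#include <stdio.h>

/* Write MM/DD/YYYY into buf. Returns 0 on success, -1 if buf is too
   small or formatting fails. */
int format_date(char *buf, size_t size, int m, int d, int y)
{
    int n = snprintf(buf, size, "%02d/%02d/%04d", m, d, y);

    /* snprintf returns the length the full result needs; a value >= size
       means the output was truncated */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```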

20
scanf (scan formatted)
  • int scanf(const char *format, ...)
  • Reads from standard input; the format string
    specifies what kind of variable(s) to read and
    how the values are separated/delimited
  • ... means a variable number of arguments; each of
    them is the address of a variable where input should be
    stored
  • Returns the number of successfully matched and
    assigned input items (can be zero, or fewer than
    the number requested)
  • Returns EOF (typically -1) if an error or end of file
    occurs before the first conversion

21
A simple example of scanf
  • int x, y;
  • scanf("%d %d", &x, &y); // a space in the format matches
  • // any amount of whitespace (blanks, tabs, newlines)
  • Reads two integers from standard input (separated
    by spaces, tabs, or newlines), saving the first
    in x and the second in y
  • Abc de // returns 0, as Abc cannot be converted
    to an integer
  • 1023 232 // returns 2, x set to 1023, y set to 232
  • 1023     232 // same as above
  • Note: always check the return value of scanf!

22
A simple example of scanf
  • scanf("%d:%d", &x, &y);
  • : can only match itself, i.e., the two
    integers must be separated by :
  • 1023:232 // x will be set to 1023, y set to 232
  • 1023 234 // x will be set to 1023, y unset,
    returns 1
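
The return values described above can be observed with sscanf, which applies the same matching rules to a string instead of standard input. This is a sketch; the function names are hypothetical, and the comma in the second format is an arbitrary literal delimiter chosen for illustration.

```c
#include <stdio.h>

/* Parse two integers separated by any amount of whitespace.
   Returns the sscanf result: 2 when both were converted, 1 or 0 on a
   partial or failed match, EOF if input ends before the first conversion. */
int parse_pair(const char *input, int *x, int *y)
{
    return sscanf(input, "%d %d", x, y);
}

/* Same, but the integers must be separated by a literal comma. */
int parse_comma_pair(const char *input, int *x, int *y)
{
    return sscanf(input, "%d,%d", x, y);
}
```

Comparing the result against the number of conversions requested is what "check the return value" means in practice: anything other than 2 here signals that at least one variable was left unassigned.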

23
Other scanf specifiers
  • Similar to the specifiers used in printf
  • %u unsigned integer
  • %o octal
  • %x hexadecimal
  • %f floating point
  • %s a string

24
Problems with scanf
  • %s reads only the next word from input, not
    a whole line
  • char s[100];
  • printf("Type in your name\n");
  • scanf("%s", s);
  • printf("Your name is %s\n", s);
  • What if the user types a word of 100 or more
    characters?
  • Buffer overflow, typically a segmentation fault

25
Ways to work around
  • char *string1, *string2;
  • string1 = (char *) malloc(25);
  • puts("Please enter a string of 20 characters or fewer.");
  • scanf("%20s", string1);
  • printf("\nYou typed the following string:\n%s\n\n", string1);
  • With the field width (20 in the example), scanf
    reads at most 20 characters of the next word
    into string1.
  • The string1 buffer must still be large enough:
    the field width plus one byte for the terminating '\0'
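
The effect of the field width can be sketched with sscanf on a string, which avoids the need for keyboard input. The names read_word and WORD_MAX are assumptions introduced for this example.

```c
#include <stdio.h>

#define WORD_MAX 20   /* must match the field width in the format below */

/* Copy the first whitespace-delimited word of input into out, truncated
   to WORD_MAX characters. out must have room for WORD_MAX + 1 bytes.
   Returns 1 if a word was read, otherwise 0. */
int read_word(const char *input, char out[WORD_MAX + 1])
{
    return sscanf(input, "%20s", out) == 1;
}
```

A 26-letter input leaves only its first 20 letters in out; the rest stays unread.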

26
Ways to work around (2)
  • char *string2;
  • puts("Now enter a string of any length.");
  • scanf("%as", &string2);
  • printf("\nYou typed the following string:\n%s\n", string2);
  • The a flag character (a GNU extension) tells scanf to allocate
    a buffer as large as needed (string2 will be
    set to point to the allocated buffer)
  • POSIX.1-2008 spells this flag m (%ms); free the
    buffer when you are done with it

27
scanf is still not preferable
  • Avoid using scanf
  • can cause the program to crash due to buffer overflow
  • can loop forever on unexpected non-numeric
    input, because the characters that failed to match
    stay unread in the input buffer
  • difficult to recover from errors when the scanf
    template string does not match the input exactly
  • Better practice for reading input from the keyboard
  • read a line (i.e., until a newline) with
    getline() or fgets() (never gets(), which cannot
    limit how much it reads)
  • parse the resulting string with sscanf (string
    scan formatted, similar to scanf, but with input
    given by a string)

28
Reading a line
  • fgets: read an entire line from a file
  • char *fgets(char *s, int size, FILE *stream)
  • s: pointer to a buffer, which should be big
    enough
  • size: the size of the buffer; at most size-1 characters are read
  • FILE *: a file handle; for now, remember
  • stdin (a constant): standard input
  • Reads a line (i.e., reads characters until a newline
    is met or size-1 characters have been read) from the specified
    file, and saves it, newline included, in the buffer pointed to by s
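
Because fgets keeps the newline, callers usually strip it. The wrapper below is one possible sketch of that step; the name read_line is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from stream into buf (capacity size) and drop the
   trailing newline if present. Returns buf, or NULL at end of file
   or on error. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the first newline, if any */
    return buf;
}
```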

29
Example of fgets/sscanf
  • char s[101];
  • int month, day, year;
  • printf("Type in your name\n");
  • fgets(s, 100, stdin);
  • printf("Your name is %s\n", s);
  • printf("Type in your date of birth (MM/DD/YYYY)\n");
  • fgets(s, 100, stdin);
  • sscanf(s, "%d/%d/%d", &month, &day, &year);
  • printf("You were born on %02d/%02d/%04d\n", month, day, year);
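
The example above ignores the return values of fgets and sscanf. A sketch of the same read-then-parse pattern with both checks in place follows; read_date is a hypothetical helper that takes the input stream as a parameter so it is not tied to stdin.

```c
#include <stdio.h>

/* Read one line from in and parse it as MM/DD/YYYY.
   Returns 1 on success, 0 if the line is missing or malformed. */
int read_date(FILE *in, int *month, int *day, int *year)
{
    char line[101];

    if (fgets(line, sizeof line, in) == NULL)
        return 0;                       /* end of file or read error */
    if (sscanf(line, "%d/%d/%d", month, day, year) != 3)
        return 0;                       /* not all three fields matched */
    return 1;
}
```

A malformed line is consumed whole by fgets, so the next call starts cleanly on the following line, which is the recovery property scanf lacks.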

30
getline: a GNU extension (standardized in POSIX.1-2008)
  • ssize_t getline(char **lineptr, size_t *n, FILE *stream)
  • FILE *fp;
  • char *line = NULL;
  • size_t len = 0;
  • ssize_t read;
  • while ((read = getline(&line, &len, stdin)) != -1) {
  •     printf("Retrieved line of length %zd:\n", read);
  •     printf("%s", line);
  • }
  • if (line) free(line);

getline automatically mallocs/reallocs the buffer
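
A short sketch of getline on an arbitrary stream: the function below, whose name longest_line is an assumption, lets getline grow one buffer across all calls and frees it once at the end. The feature-test macro on the first line asks the C library to declare getline and assumes a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the getline declaration */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the length of the longest line in stream, counting the newline. */
size_t longest_line(FILE *stream)
{
    char *line = NULL;    /* getline allocates and grows this buffer */
    size_t cap = 0;
    ssize_t n;
    size_t longest = 0;

    while ((n = getline(&line, &cap, stream)) != -1) {
        if ((size_t)n > longest)
            longest = (size_t)n;
    }
    free(line);           /* one free, however many lines were read */
    return longest;
}
```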
31
scanf family
  • Read from standard input
  • int scanf(const char *format, ...)
  • Read from a file
  • int fscanf(FILE *fp, const char *format, ...)
  • Scan from a string
  • int sscanf(const char *s, const char *format, ...)

32
Standard I/O File Access
  • So far, we have learned how to read from standard
    input and write to standard output
  • Next: how to read from or write to a file

33
File Access: typical use
  • #include <stdio.h>
  • FILE *fp;
  • To open a file
  • fp = fopen("/tmp.txt", "r");
  • To read a character from a file
  • int c; c = getc(fp);
  • To write a character to a file
  • putc(c, fp);
  • Read/write using fscanf/fprintf, fgets/fputs, ...
  • Finally, fclose(fp) is used to close the file

34
fopen() routine (1)
  • #include <stdio.h>
  • FILE *fopen(const char *path, const char *mode)
  • path: a relative or full path name of the file
  • mode: access mode
  • "r": open the file to read
  • "w": open the file to write (existing content
    will be discarded)
  • "a": open the file to append (writes start at the end of
    the file)
  • "r+": open the file to read and write
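
Putting fopen, getc, and fclose together, a minimal sketch of the open/read/close cycle might look like this. The function name count_bytes is hypothetical.

```c
#include <stdio.h>

/* Count the bytes in the file at path.
   Returns -1 if the file cannot be opened for reading. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "r");
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)   /* getc returns EOF at end of file */
        n++;
    fclose(fp);
    return n;
}
```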

35
fopen library routine (2)
  • FILE *fopen(const char *path, const char *mode)
  • FILE: a data structure containing the information needed
    to perform input or output operations on a stream,
    including
  • a file descriptor (to be covered under low-level file
    access)
  • the current stream position
  • an end-of-file indicator and an error indicator
  • a pointer to the stream's buffer, if applicable
  • Note
  • reads and writes happen at the current stream position
  • Buffered I/O: not every write is applied to the
    disk immediately

36
Example
  • FILE *fp = fopen("/tmp.txt", "r");
  • takes a filename, does some housekeeping and
    negotiation with the kernel
  • Returns a pointer to the FILE data structure on
    success; returns NULL on failure
  • Always check for errors after the call
  • if (fp == NULL) {
  •     printf("failed to open file /tmp.txt\n");
  •     exit(1);
  • }

37
Meaningful error message
  • errno: integer variable, set by system calls and
    some library functions in the event of an error to
    indicate what went wrong
  • #include <errno.h>
  • if (fp == NULL) {
  •     switch (errno) {
  •     case EACCES:
  •         printf("You don't have permission\n");
  •         break;
  •     case EINVAL:
  •         printf("Invalid argument to fopen\n");
  •         break;
  •     default:
  •         printf("Something went wrong in fopen\n");
  •     }
  •     exit(1);
  • }

38
Using perror()
  • #include <stdio.h>
  • void perror(const char *s)
  • perror() writes a message to standard error
    describing the last error encountered,
    i.e., the current value of errno.
  • FILE *fp = fopen("/tmp.txt", "r");
  • if (fp == NULL) {
  •     perror("open /tmp.txt");
  •     exit(1);
  • }
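
When the message needs a custom layout, strerror from <string.h> returns the same text that perror prints. The sketch below assumes a POSIX system, where fopen sets errno on failure; the helper name open_for_reading is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open path for reading. On failure, print "path: reason" to
   standard error, roughly what perror(path) prints, and return NULL. */
FILE *open_for_reading(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        int saved = errno;   /* save errno before calls that may change it */
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
        errno = saved;       /* leave errno intact for the caller */
    }
    return fp;
}
```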

39
Three special files
  • Whenever a program is started, three files are
    automatically opened, with file pointers stdin,
    stdout, and stderr.
  • getchar() is the same as getc(stdin)
  • putchar(c) is the same as putc(c, stdout).
  • printf(s, ...) is the same as fprintf(stdout, s, ...)
  • scanf(s, ...) is the same as fscanf(stdin, s, ...)

40
Other standard I/O Functions
  • feof(FILE *): returns non-zero once a read has hit
    the end of the file
  • ferror(FILE *): returns non-zero if an error has occurred on the stream
  • fflush(FILE *): flushes any buffered output to the
    file
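
getc returns EOF both at end of file and on a read error, so ferror is what tells the two apart. A minimal sketch, with copy_stream as a hypothetical name:

```c
#include <stdio.h>

/* Copy in to out one character at a time.
   Returns the number of characters copied, or -1 on a read or write error. */
long copy_stream(FILE *in, FILE *out)
{
    int c;
    long n = 0;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        n++;
    }
    if (ferror(in))            /* EOF from getc: was it an error? */
        return -1;
    if (fflush(out) == EOF)    /* push buffered output to the file now */
        return -1;
    return n;
}
```

After a successful copy, feof(in) is non-zero because the last getc call ran into the end of the file.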

41
New vis: handling files
  • int main(int argc, char *argv[])
  • {
  •     int strip = 0;
  •     int i;
  •     FILE *fp;
  •     while (argc > 1 && argv[1][0] == '-') {
  •         switch (argv[1][1]) {
  •         case 's': /* -s: strip funny characters */
  •             strip = 1;
  •             break;
  •         default:
  •             fprintf(stderr, "%s: unknown arg %s\n",
                  argv[0], argv[1]);
  •             return 1;
  •         }
  •         argc--; argv++;
  •     }

42
main() cont'd
  •     if (argc == 1)
  •         vis(stdin, strip);
  •     else for (i = 1; i < argc; i++)
  •         if ((fp = fopen(argv[i], "r")) == NULL) {
  •             fprintf(stderr, "%s: can't open %s\n",
                  argv[0], argv[i]);
  •             return 1;
  •         } else {
  •             vis(fp, strip);
  •             fclose(fp);
  •         }
  •     return 0;
  • }

43
Now vis
  • void vis(FILE *fp, int strip)
  • {
  •     int c;
  •     while ((c = getc(fp)) != EOF)
  •         if (isprint(c) || isspace(c))
  •             putchar(c);
  •         else if (!strip)
  •             printf("\\%03o", c);
  • }
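
One way to make vis easier to test is to pass the output stream as a parameter as well. The variant below is a sketch, not the version from the slides; vis_to is a hypothetical name, and the behavior shown assumes the default "C" locale.

```c
#include <ctype.h>
#include <stdio.h>

/* Like vis, but writes to out instead of standard output. */
void vis_to(FILE *in, FILE *out, int strip)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (isprint(c) || isspace(c))
            putc(c, out);                           /* visible: copy as-is */
        else if (!strip)
            fprintf(out, "\\%03o", (unsigned int)c); /* show as \ooo */
        /* with strip set, non-printable characters are dropped */
    }
}
```

Calling vis_to(fp, stdout, strip) behaves like the vis function above.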

44
A script a day: overwrite
  • To replace UNIX with UNIX(TM) in a file called
    ch2
  • How about sed 's/UNIX/UNIX(TM)/g' ch2 > ch2 ?
  • This fails: the shell truncates ch2 before sed
    ever reads it
  • A general solution?
  • A script, overwrite, that saves standard input to
    a file
  • sed 's/UNIX/UNIX(TM)/g' ch2 | overwrite ch2
  • sort -k 3 -n data.txt | overwrite data.txt

45
A script a day: overwrite
  • #!/bin/bash
  • PATH=/bin:/usr/bin
  • case $# in
  • 1) ;;
  • *) echo 'Usage: overwrite file' 1>&2; exit 2
  • esac
  • new=/tmp/overwr.$$
  • cat >"$new"        # collect the input
  • cp "$new" "$1"     # overwrite the input file
  • rm -f "$new"

46
A command a day: find
  • The find command processes a set of files and/or
    directories in a file subtree. You can specify
  • where to search (pathname)
  • what type of file to search for (-type:
    directories, data files, links)
  • how to process the files (-exec: run a process
    against a selected file)
  • the name of the file(s) (-name)
  • logical operations on selections (-o and
    -a)

47
Examples
  • Search for a file with a specific name in a set of
    files (-name)
  • find . -name "rc.conf" -print
  • Apply a Unix command to all files found
  • find . -name "rc.conf" -exec chmod o+r '{}' \;
  • Search for a string in a selection of files
  • find . -exec grep "www.athabasca" '{}' \; -print
  • This searches the current directory and all
    subdirectories. All files containing the string will
    have their path printed to standard output.

48
More ways to find files
  • Find all regular files under the root directory that
    were modified within the last seven days
  • find / -type f -mtime -7 -print

49
Homework 6 due April 8 (2 wks)
  • Implement your own wc command
  • wc [-l] [-w] [-c] [filename ...]
  • Options
  • -l report the number of lines
  • -w report the number of words
  • -c report the number of characters
  • List of filenames
  • Count for each file, and then report the
    total line/word/character counts for all files
  • If no filename is given, count standard input

50
What's next?
  • Development tools (1-1.5 weeks)
  • gcc, make, gdb
  • Memory-related topics (1 week)
  • Dynamic memory allocation
  • GNU C Library malloc, free, etc.
  • Debugging memory problems
  • System calls
  • Low-level File access
  • Processes, pipes, signals
  • Threads

51
What's next?
  • A final project, which can be any of the following
  • You can define your own project, but the instructor's
    approval is needed
  • Some interesting tasks that require
  • Use multiple unix commands, shell programming
  • Some C programming to analyze data
  • Timeline for the project
  • Proposal presentation
  • Prototype due
  • Final project/presentation due

52
Project ideas
  • Example
  • Use command wget to retrieve a web page, parse
    the file and retrieve the web pages by following
    the hyperlinks within
  • Analyze
  • web page size statistics, or web contents
    statistics (word frequency)

53
Project ideas
  • A program that analyzes C programs
  • Beautifier: automatically indent the
    program
  • Statistics: report lines of code, number of
    functions defined, number of loops, number of
    static variables, number of memory
    allocations/deallocations