Title: Unix Programming: working with files
1Unix Programming: working with files
- CSRU3130, Spring 2008
- Ellen Zhang
2Last Class
- Programming with standard I/O
- What's inside a file?
- ASCII code
- getchar(), putchar()
- printf
3vis program
- #include <stdio.h>
- #include <stdlib.h>
- #include <ctype.h>
- int main(void)
- {
-   int c;
-   while ((c = getchar()) != EOF)
-     if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c == ' '))
-       putchar(c);
-     else
-       printf("\\%03o", c);
-   exit(0);
- }
Character test macros are declared in /usr/include/ctype.h
4Escape sequence in C
- \' Single quote
- \" Double quote
- \\ Backslash
- \nnn Octal number (nnn), \xnnn Hexadecimal number (nnn)
- \0 Null character (really just the octal number zero)
- \a Audible bell
- \b Backspace
- \f Formfeed
- \n Newline (NL, LF)
- \r Carriage return (CR)
- \t Horizontal tab
In Unix, lines are separated by NL (\n). In Windows, lines are separated by CR and NL.
5A program genfile.c
- #include <stdio.h>
- #include <ctype.h>
- int main(void)
- {
-   int c;
-   c = 0;
-   putchar(c);
-   putchar('\007');
-   c = '\a';
-   putchar(c);
-   putchar('a');
-   putchar('b');
-   putchar('c');
-   putchar('\n');
-   putchar('\011'); // tab, \t
-   putchar('d');
-   putchar('e');
-   putchar('\012');
-   return 0;
- }
Online resource for ASCII here
6Now try vis
- [zhang@storm vis]$ ./genfile
- abc
-         de
- [zhang@storm vis]$ ./genfile | od -cb
- 0000000  \0  \a  \a   a   b   c  \n  \t   d   e  \n
-         000 007 007 141 142 143 012 011 144 145 012
- 0000013
- [zhang@storm vis]$ ./genfile | ./vis
- \000\007\007abc
-         de
7Today
- Processing command line options/arguments
- More on printf, scanf
- File I/O
- A script a day: overwrite
- A command a day: find
8Adding options to vis
- vis -s: strip away non-printable characters
- Command-line arguments are available to main()
- int main(int argc, char *argv[])
- argc is the number of command-line parameters
- parameters are stored in argv[0], ..., argv[argc-1]. Note argv[0] is the command name itself.
- Equivalently,
- int main(int argc, char **argv)
9Interpreting complex declaration
- char *argv[]
- brackets and parentheses (that is, modifiers to the right of the identifier) take precedence over asterisks (that is, modifiers to the left of the identifier).
- Read from inside out
- argv is (start with the identifier)
- an array of
- pointers to
- character (apply the type specifier, i.e., char, last)
- Can you draw a memory diagram for argv?
10Interpreting complex declaration
- char (*tmp)[]
- brackets and parentheses (that is, modifiers to the right of the identifier) take precedence over asterisks (that is, modifiers to the left of the identifier).
- Parentheses can be used to override the default order
- Read from inside out, go to the right first, then left
- tmp is (start with the identifier)
- pointer to (interpret what's in the parentheses first)
- array of (go to right)
- character (apply the type specifier, i.e., char, last)
11Example of processing options
- #include <stdio.h>
- #include <ctype.h>
- #include <string.h>
- int main(int argc, char *argv[])
- {
-   int c, strip = 0;
-   if (argc > 1 && strcmp(argv[1], "-s") == 0)
-     strip = 1;
-   while ((c = getchar()) != EOF)
-     if (isprint(c) || isspace(c) || c == '\n' || c == '\t' || c == ' ')
-       putchar(c);
-     else if (!strip)
-       printf("\\%03o", c);
-   return 0;
- }
strcmp is part of the standard C library (declared in <string.h>); it compares two strings and returns 0 if the two strings are identical.
12Today
- Processing command line options/arguments
- More on printf, scanf
- File I/O
- A script a day: overwrite
- A command a day: find
13Formatting output with printf
- int printf(const char *format, ...)
- Write strings, integers, doubles, etc. to standard output, performing format conversion
- ... means a variable number of arguments; the first argument is required (a string).
- Given a simple string, printf just prints the string (to standard output).
- printf("I\thave\ttabs\n");
- char s[100];
- strcpy(s, "printf is fun!\a\n");
- printf(s);
14Formatting output with printf
- You can tell printf to embed some values in the string; these values are determined at run time, by using formatting tags embedded in the first string argument
- Example
- printf("here is an integer %d\n", i);
- printf("%d + %d = %d\n", x, y, x+y);
- printf("reverse %s and we get %s\n", str, reverse(str));
- printf("sqrt of %f is %f\n", x, sqrt(x));
15Printf formatting tag (1)
- printf formatting tag
- %[flags][width][.precision][length]specifier
- Specifiers
- %d: treat the corresponding parameter as a signed integer
- %u: unsigned integer
- %x: print as hexadecimal
- %s: treat it as a string
- %c: character (char)
- %f: floating-point numbers
16printf is dumb
- printf will treat the corresponding parameter as the specifier suggests, even if the parameter is not of the given type
- e.g., %d is replaced by the value of the parameter when treated as an integer, even if the parameter is not an integer variable
- printf("print an int %d\n", "Hi Dave");
- print an int 134513980
- printf("print an int %d\n", 12.3);
- print an int -1717986918
- The numbers shown are what one machine printed; a mismatched argument is undefined behavior, so the output can differ elsewhere
17Fun with printf
- char *s = "Hi Dave";
- printf("the string \"%s\" is %d characters long\n", s, (int) strlen(s));
- the string "Hi Dave" is 7 characters long
- int x = 10;
- printf("x=%d is %o in octal and %x in hexadecimal\n", x, x, x);
- x=10 is 12 in octal and a in hexadecimal
- Note: use %% to print a single %
- e.g. printf("%.1f%% of the population\n", 12.4);
- 12.4% of the population
18Printf formatting tags (2)
- printf formatting tag
- %[flags][width][.precision][length]specifier
- Flags
- 0 means pad the output on the left with 0
- Width
- A number specifying the minimum width of the output
- Precision
- The number of digits after the decimal point
- Example
- printf("square root of 10 is %20.15f\n", sqrt(10));
- square root of 10 is    3.162277660168380
- printf("\\%03o", '>');
- \076
19printf family
- Write formatted output to standard output
- int printf(const char *, ...)
- Write formatted output to a file
- int fprintf(FILE *, const char *, ...)
- Write formatted output to a string
- int sprintf(char *, const char *, ...)
- char birthDate[20];
- sprintf(birthDate, "%02d/%02d/%04d", m, d, y);
20scanf (scan formatted)
- int scanf(const char *format, ...)
- Read from standard input, with the format string specifying what kind of variable(s) to read, and how the variables are separated/delimited
- ... means a variable number of arguments; each of them is the address of a variable where the input should be stored
- Returns the number of successfully matched and assigned input items (can be zero or smaller than the number provided)
- Returns EOF (typically -1) if an error or end of file occurs before any item is converted
21A simple example of scanf
- int x, y;
- scanf("%d %d", &x, &y); // a space in the format matches any amount of white space (blanks, tabs, newlines)
- Read two integers from standard input (separated by spaces, tabs, or newlines), and save the first one to x and the second one to y
- Abc de // returns 0, as Abc and de cannot be converted to integers
- 1023 232 // returns 2, x set to 1023, y set to 232
- 1023      232 // same as above
- Note: always check the return value of scanf!
- %d is a conversion specifier
22A simple example of scanf
- scanf("%d:%d", &x, &y)
- A literal character in the format, such as :, can only match itself, i.e., the two integers must be separated by :
- 1023:232 // x will be set to 1023, y set to 232
- 1023 234 // x will be set to 1023, y unset, returns 1
- %d is a conversion specifier
23Other scanf specifiers
- Similar to the specifiers used in printf
- %u: unsigned integer
- %o: octal
- %x: hexadecimal
- %f: floating point
- %s: a string
24Problems with scanf
- Using %s only reads the next word from input, not a line of input
- char s[100];
- printf("Type in your name\n");
- scanf("%s", s);
- printf("Your name is %s\n", s);
- What if the user inputs a string of 100 or more characters?
- The buffer overflows, possibly causing a segmentation fault
25Ways to work around
- char *string1, *string2;
- string1 = (char *) malloc(25);
- puts("Please enter a string of 20 characters or fewer.");
- scanf("%20s", string1);
- printf("\nYou typed the following string:\n%s\n\n", string1);
- With the field width (20 in the example), scanf will read at most the first 20 characters of the user input and save them to string1.
- Still need to make sure the string1 buffer is large enough (at least 21 bytes here, counting the terminating null character)
26Ways to work around (2)
- char *string2;
- puts("Now enter a string of any length.");
- scanf("%as", &string2);
- printf("\nYou typed the following string:\n%s\n", string2);
- The a flag character tells scanf to allocate a buffer as large as needed (string2 will be set to point to the allocated buffer)
- The a flag is a GNU extension; POSIX.1-2008 specifies the m flag (%ms) for the same purpose
27scanf is still not preferable
- Avoid using scanf
- can cause the program to crash due to buffer overflow
- can loop forever on unexpected non-numeric input, because the characters that fail to match are left unread in the input
- difficult to recover from errors when the scanf template string does not match the input exactly
- Better practice for reading input from the keyboard
- read a line (i.e., until a newline) with getline() or fgets()
- parse the resulting string with sscanf (string scan formatted, similar to scanf, but with input given by a string)
28Reading a line
- fgets: read an entire line from a file
- char *fgets(char *s, int size, FILE *stream)
- s: pointer to a character buffer, which should be big enough
- size: the size of the buffer; at most size-1 chars are read
- FILE *: a file handle; for now, remember
- stdin (a constant): standard input
- Reads a line (i.e., reads characters until a newline is met or the maximum is reached) from the specified file, and saves it to the string pointed to by s
29Example of fgets/sscanf
- char s[101];
- int month, day, year;
- printf("Type in your name\n");
- fgets(s, 100, stdin);
- printf("Your name is %s\n", s);
- printf("Type in your date of birth (MM/DD/YYYY)\n");
- fgets(s, 100, stdin);
- sscanf(s, "%d/%d/%d", &month, &day, &year);
- printf("You were born on %02d/%02d/%04d", month, day, year);
30getline: a GNU extension
- ssize_t getline(char **lineptr, size_t *n, FILE *stream)
- FILE *fp;
- char *line = NULL;
- size_t len = 0;
- ssize_t read;
- while ((read = getline(&line, &len, stdin)) != -1) {
-   printf("Retrieved line of length %zu:\n", read);
-   printf("%s", line);
- }
- if (line) free(line);
getline automatically allocates and resizes the buffer with malloc/realloc
31scanf family
- Read from standard input
- int scanf(const char *format, ...)
- Read from a file
- int fscanf(FILE *fp, const char *format, ...)
- Scan from a string
- int sscanf(const char *s, const char *format, ...)
32Standard I/O File Access
- So far, we learnt how to read from standard input and write to standard output
- Next: how to read from or write to a file
33File Access typical use
- #include <stdio.h>
- FILE *fp;
- To open a file
- fp = fopen("/tmp.txt", "r");
- To read a character from a file
- int c; c = getc(fp);
- To write a character to a file
- putc(c, fp);
- Read/write using fscanf/fprintf, fgets/fputs, ...
- Finally, fclose(fp) is used to close a file
34fopen() routine (1)
- #include <stdio.h>
- FILE *fopen(const char *path, const char *mode)
- path: a relative or full path name of the file
- mode: access mode
- "r": open the file to read
- "w": open the file to write (existing content will be discarded)
- "a": open the file to append (writes start at the end of the file)
- "r+": open the file to read and write
35fopen library routine (2)
- FILE *fopen(const char *path, const char *mode)
- FILE: a data structure containing the information needed to perform input or output operations on a file, including
- a file descriptor (will study in low-level file access)
- current stream position
- an end-of-file indicator, an error indicator
- a pointer to the stream's buffer, if applicable
- Note
- read/write at current stream position
- Buffered I/O: not every write is applied to the disk immediately
36Example
- FILE *fp = fopen("/tmp.txt", "r");
- takes a filename, does some housekeeping and negotiation with the kernel
- Returns a pointer to the FILE data structure on success; returns NULL on failure
- Always check for error after the call
- if (fp == NULL) {
-   printf("failed to open file /tmp.txt\n");
-   exit(1);
- }
37Meaningful error message
- errno: integer variable, set by system calls and some library functions in the event of an error to indicate what went wrong
- #include <errno.h>
- ...
- if (fp == NULL) {
-   switch (errno) {
-   case EACCES:
-     printf("You don't have permission\n");
-     break;
-   case EINVAL:
-     printf("Invalid argument to fopen\n");
-     break;
-   default:
-     printf("Something went wrong in fopen\n");
-   }
-   exit(1);
- }
38Using perror()
- #include <stdio.h>
- void perror(const char *s)
- perror() produces a message on standard error output, describing the last error encountered, i.e., errno.
- FILE *fp = fopen("/tmp.txt", "r");
- if (fp == NULL) {
-   perror("open /tmp.txt");
-   exit(1);
- }
39Three special files
- Whenever a program is started, three files are automatically opened, with file pointers stdin, stdout, stderr.
- getchar() is the same as getc(stdin)
- putchar(c) is the same as putc(c, stdout)
- printf(s, ...) is the same as fprintf(stdout, s, ...)
- scanf(s, ...) is the same as fscanf(stdin, s, ...)
40Other standard I/O Functions
- feof(FILE *): returns non-zero when end of file is reached
- ferror(FILE *): returns non-zero when an error has occurred
- fflush(FILE *): flush any buffered output to the file
41New vis handling files
- int main(int argc, char *argv[])
- {
-   int strip = 0;
-   int i;
-   FILE *fp;
-   while (argc > 1 && argv[1][0] == '-') {
-     switch (argv[1][1]) {
-     case 's': /* -s: strip funny characters */
-       strip = 1;
-       break;
-     default:
-       fprintf(stderr, "%s: unknown arg %s\n", argv[0], argv[1]);
-       return 1;
-     }
-     argc--; argv++;
-   }
42main() contd
-   if (argc == 1)
-     vis(stdin, strip);
-   else for (i = 1; i < argc; i++)
-     if ((fp = fopen(argv[i], "r")) == NULL) {
-       fprintf(stderr, "%s: can't open %s\n", argv[0], argv[i]);
-       return 1;
-     }
-     else {
-       vis(fp, strip);
-       fclose(fp);
-     }
-   return 0;
- }
43Now vis
- void vis(FILE *fp, int strip)
- {
-   int c;
-   while ((c = getc(fp)) != EOF)
-     if (isprint(c) || isspace(c))
-       putchar(c);
-     else if (!strip)
-       printf("\\%03o", c);
- }
44A script a day: overwrite
- To replace UNIX with UNIX(TM) in a file called ch2
- How about sed 's/UNIX/UNIX(TM)/g' ch2 > ch2 (this fails: the shell truncates ch2 before sed reads it)
- A general solution?
- A script, overwrite, that saves standard input to a file
- sed 's/UNIX/UNIX(TM)/g' ch2 | overwrite ch2
- sort -k 3 -n data.txt | overwrite data.txt
45A script a day overwrite
- #!/bin/bash
- PATH=/bin:/usr/bin
- case $# in
- 1) ;;
- *) echo 'Usage: overwrite file' 1>&2; exit 2
- esac
- new=/tmp/overwr.$$
- cat >$new
- cp $new $1
- rm -f $new
46A command a day find
- The find command processes a set of files and/or directories in a file subtree; you can specify
- where to search (pathname)
- what type of file to search for (-type: directories, data files, links)
- how to process the files (-exec: run a process against a selected file)
- the name of the file(s) (-name)
- logical operations to perform on selections (-o and -a)
47Examples
- Search for a file with a specific name in a set of files (-name)
- find . -name "rc.conf" -print
- Apply a unix command to all files found
- find . -name "rc.conf" -exec chmod o+r '{}' \;
- Search for a string in a selection of files
- find . -exec grep "www.athabasca" '{}' \; -print
- search in the current directory and all subdirectories. All files containing the string will have their path printed to standard output.
48More ways to find files
- Find all files under the root that are regular files and were modified seven or fewer days ago
- find / -type f -mtime -7 -print
49Homework 6 due April 8 (2 wks)
- Implement your own wc command
- wc -l -w -c filename ...
- Options
- -l: show # of lines
- -w: report # of words
- -c: report # of characters
- List of filenames
- Count for each of the files, and then report the total line/word/character numbers for all files
- If no filename is given, count standard input
50What's next?
- Development tools (1-1.5 week)
- gcc, make, gdb
- Memory-related topics (1 week)
- Dynamic memory allocation
- GNU C Library malloc, free, etc.
- Debugging memory problems
- System calls
- Low-level File access
- Processes, pipes, signals
- Threads
51What's next?
- A final project, can be any of the following
- You can define your own project, but the instructor's approval is needed
- Some interesting tasks that require
- Use multiple unix commands, shell programming
- Some C programming to analyze data
- Timeline for the project
- Proposal presentation
- Prototype due
- Final project/presentation due
52Project ideas
- Example
- Use the command wget to retrieve a web page, parse the file, and retrieve the web pages by following the hyperlinks within
- Analyze
- web page size statistics, or web contents statistics (word frequency)
53Project ideas
- C program analyzing program
- Beautifying program: automatically indent the program
- Statistics: reporting lines of code, number of functions defined, number of loops, number of static variables, number of memory allocations/deallocations