Title: Web servers
1. Web servers
- Miroslav Milinovic
- Croatian Academic and Research Network - CARNet
- Zagreb, Croatia
- <miro_at_srce.hr>
6th CEENet Workshop on Network Technology,
Budapest, Hungary, August 2000.
2. Content
- Web server
- Apache
- directory structure, configuration files, directives, running
- access control, authentication
- Common Gateway Interface (CGI), passing data
- Server side includes (SSI)
- API, modules, handlers
- virtual servers
- Measuring the Web
- Summary
3. How does the Web work?
[Figure: authors write HTML files on WWW servers; users browse them over the Internet (WWW)]
4. Web server
- a general-purpose data delivery vehicle
- a program (daemon, httpd)
- responds to an incoming TCP connection and provides a service to the client
- runs independently
- Web servers
- do NOT validate HTML code (they do not parse documents)
- do NOT check links
- follow MIME rules (without checking file content)
- Web site = host + Web server + information (file system)
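The request/response cycle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Apache works internally: it handles only HTTP/1.0-style GET requests, assumes the DocumentRoot used on later slides, and uses the standard `mimetypes` module to pick the Content-Type from the file name alone. The HTML is never opened as HTML, so a broken page is served unchanged. The names `build_response`, `RequestHandler` and `serve` are illustrative.

```python
import mimetypes
import os
import socketserver

DOCUMENT_ROOT = "/home/httpd/htdocs"  # assumption: same DocumentRoot as on the httpd.conf slides


def build_response(request_line, docroot=DOCUMENT_ROOT):
    """Turn a request line such as 'GET /index.html HTTP/1.0' into raw response bytes."""
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET":
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    path = parts[1].split("?", 1)[0]          # the query string is not part of the file name
    if path.endswith("/"):
        path += "index.html"                  # similar to "DirectoryIndex index.html"
    root = os.path.realpath(docroot)
    full = os.path.realpath(os.path.join(root, path.lstrip("/")))
    # refuse paths that escape the document root, e.g. /../../etc/passwd
    if not full.startswith(root + os.sep) or not os.path.isfile(full):
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    # MIME type comes from the extension only; the content is never parsed or validated
    content_type = mimetypes.guess_type(full)[0] or "text/plain"
    with open(full, "rb") as f:
        body = f.read()
    head = "HTTP/1.0 200 OK\r\nContent-Type: %s\r\nContent-Length: %d\r\n\r\n" % (
        content_type, len(body))
    return head.encode("ascii") + body


class RequestHandler(socketserver.StreamRequestHandler):
    """Called once for every incoming TCP connection."""

    def handle(self):
        # read only the request line; the remaining request headers are ignored in this sketch
        request_line = self.rfile.readline().decode("latin-1").strip()
        self.wfile.write(build_response(request_line))


def serve(port=8080):
    """Accept connections forever (one request per connection, no keep-alive)."""
    with socketserver.TCPServer(("", port), RequestHandler) as server:
        server.serve_forever()
```

Calling `serve()` and pointing a browser at port 8080 (an arbitrary choice) exercises the whole path; everything a real server adds (access control, CGI, logging) sits on top of this loop.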
5. Web server software
- traditionally freely available
- for most platforms
- UNIX, MS Windows, Macintosh, VMS, VM, ...
- list of available server software
- http://www.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/
- Web Server Survey
- http://www.netcraft.com/Survey/
- popular server programs
- CERN, NCSA (the first ones)
- Apache, MS IIS, Netscape servers, ...
6. Apache
- "A PAtCHy server", a plug-in replacement for NCSA httpd
- under constant development
- freely available
- in source code
- binaries for many platforms (v1.3.x also supports Windows NT)
- supports HTTP/1.1 since v1.2
- useful addresses
- Apache home: http://www.apache.org/
- http://www.apacheweek.com/
- support via Usenet group(s)
7. Where to put the server?
- the server should run where the information is created
- choose the host carefully
- give the selected host a DNS alias (www.mydomain.mycountry)
- choose the ServerRoot, DocumentRoot and log file directories carefully, following the rules for all daemons and the disk space requirements
- User Home Pages?
- CGI rules!
8. Apache directory structure
- can be designed (changed) during the installation (compilation) process
- some important directories
- cgi-bin/ - CGI scripts directory (examples present)
- conf/ - configuration files for the httpd server
- htdocs/ - main directory for documents
- logs/ - directory for log files (empty after installation)
- other stuff (bin/, man/)
9. Apache configuration files
- look in the conf/ directory
- access.conf - access configuration
- httpd.conf - server configuration
- mime.types - MIME type to extension mapping
- srm.conf - resource configuration
- *-dist - distribution templates (e.g. httpd.conf-dist)
- since v1.3.6 it is recommended to use only the main configuration file, httpd.conf
10. Apache configuration directives
- general rules
- directive names are case-insensitive (not true for file/directory names)
- comment lines begin with "#"
- one directive per line
- each line of these files has the form
- directive data1 data2 ... dataN
- extra whitespace is ignored
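The rules above are simple enough to check with a tiny reader. The sketch below is a simplified assumption of the syntax, not Apache's own parser: it does not understand <Directory>-style sections, quoted arguments or backslash line continuations. `parse_directives` is an illustrative name.

```python
def parse_directives(text):
    """Return (directive, [arguments]) pairs from httpd.conf-style text.

    Simplified sketch: no <Directory> sections, quoted arguments
    or backslash line continuations.
    """
    directives = []
    for raw_line in text.splitlines():        # one directive per line
        line = raw_line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        name, *args = line.split()            # any run of whitespace separates fields
        # directive names are case-insensitive, so normalise them;
        # arguments such as file and directory names keep their case
        directives.append((name.lower(), args))
    return directives
```

For example, `parse_directives("Port 80\n# comment\n")` yields `[("port", ["80"])]`.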
11. httpd.conf
- ServerType standalone
- Port 80
- User nobody
- Group nogroup
- ServerAdmin your_e-mail_address
- ServerRoot /home/httpd/
- ErrorLog /home/httpd/logs/error_log
- TransferLog /home/httpd/logs/access_log
- PidFile /home/httpd/logs/httpd.pid
- more directives
- KeepAlive, spare servers, proxy, cache, virtual servers, ...
12. httpd.conf (srm.conf)
- DocumentRoot /home/httpd/htdocs/
- UserDir public_html
- DirectoryIndex index.html
- AccessFileName .htaccess
- DefaultType text/plain
- ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
- more directives
- Icons, Language, Handlers, ...
13. httpd.conf (access.conf)
- defines
- which types of services are allowed
- in what circumstances
- <Directory dir_name> directives </Directory>
- <DirectoryMatch regex> directives </DirectoryMatch>
- <Files filename> directives </Files>
- <FilesMatch regex> directives </FilesMatch>
- be very careful due to possible problems
- operational
- security
14. mime.types
- list of MIME types known to your server
- format: type/subtype file_extension(s)
- example: text/html html htm
- image/gif gif
- files with other extensions are sent with the DefaultType
- add entries according to your needs
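A minimal sketch of how a server can use mime.types: build an extension-to-type map, then choose a type from the file name alone, falling back to a default like the `DefaultType text/plain` from the srm.conf slide. The function names are illustrative, and the case-insensitive matching of extensions is an assumption of this sketch.

```python
def load_mime_types(text):
    """Build an extension -> MIME type map from mime.types-style lines."""
    by_extension = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        mime_type, *extensions = line.split()  # "type/subtype ext1 ext2 ..."
        for ext in extensions:
            by_extension[ext.lower()] = mime_type
    return by_extension


def content_type_for(filename, by_extension, default_type="text/plain"):
    """Choose a type from the file name only; the file content is never inspected."""
    _, dot, ext = filename.rpartition(".")
    if not dot:                                # no extension at all, e.g. "README"
        return default_type
    return by_extension.get(ext.lower(), default_type)
```

A type listed without extensions (e.g. `application/octet-stream`) is simply never chosen by extension.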
15. Starting and stopping Apache
- if you selected the standalone server type
- simply execute the program (apachectl start)
- set up automatic startup (during boot)
- apachectl options: start, stop, configtest
- Apache dynamically adapts the number of server processes to the workload
- to stop (restart) the server use
- the kill command (UNIX); the PID is in the httpd.pid file
- apachectl stop
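The kill-based method can be scripted. This UNIX-only sketch reads the PID that the `PidFile` directive points to and signals the parent httpd process: Apache stops on SIGTERM and restarts (re-reading its configuration) on SIGHUP. The function name is illustrative and the path is taken from the httpd.conf slide.

```python
import os
import signal

PID_FILE = "/home/httpd/logs/httpd.pid"  # the PidFile from the httpd.conf slide


def signal_httpd(sig=signal.SIGTERM, pid_file=PID_FILE):
    """Send `sig` to the parent httpd process whose PID is stored in pid_file.

    UNIX only: SIGTERM stops the server, SIGHUP restarts it.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    return pid
```

`signal_httpd()` corresponds to ``kill -TERM `cat /home/httpd/logs/httpd.pid` ``, and `signal_httpd(signal.SIGHUP)` to a restart.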
16. Access control
- two levels
- per-server (global access configuration file) - using directives in httpd.conf (access.conf)
- per-directory (per-directory access configuration files) - using .htaccess files (the file name can be changed with the AccessFileName directive in httpd.conf (srm.conf))
- two ways
- by user/password
- by host/domain
17. httpd.conf (access.conf): DocumentRoot settings
- <Directory /home/httpd/htdocs>
- instead of Directory it is possible to use Location (controls URLs) or Files (controls files)
- wildcards (? and *) can be used here
- Options Indexes FollowSymLinks
- Options can be FollowSymLinks, SymLinksIfOwnerMatch, ExecCGI, Includes, Indexes, IncludesNoExec, All, None
- AllowOverride All
- specifies which directives can be overridden by per-directory access files (.htaccess)
- </Directory>
18. httpd.conf (access.conf): Scripts directory
- <Directory /home/httpd/cgi-bin>
- Options FollowSymLinks
- AllowOverride None
- </Directory>
- directives that come later (in the order of the configuration files) take precedence, as they are more specific
- if permitted, the settings in .htaccess files are more specific still
19. User/password authentication
- create a file called .htaccess in the protected directory (of course you can also do this at the server level)
- AuthUserFile /home/httpd/admin/.htpasswd
- AuthGroupFile /dev/null
- AuthName ByPassword
- AuthType Basic
- <Limit GET>
- require user username
- </Limit>
20. User/password authentication
- create the password file with the htpasswd command
- htpasswd -c /home/httpd/admin/.htpasswd username
- enter a password of your choice (afterwards you can check the contents of the .htpasswd file); -c creates a new file, so omit it when adding further users
- multiple users (of course you have to create their entries in the .htpasswd file)
- add the users to the require directive in .htaccess
- OR
- create a group file (.htgroup) and use the AuthGroupFile and require group directives in the .htaccess file
- OR
- use the require valid-user directive (all users from .htpasswd have access)
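To see what htpasswd writes, the sketch below produces one line in the `{SHA}` format (base64 of the SHA-1 digest) that `htpasswd -s` uses. It is shown only because the format is easy to follow: unsalted SHA-1 is weak, and htpasswd supports other formats. The function names are illustrative, and `add_user` simply appends, so unlike htpasswd it does not replace an existing entry for the same user.

```python
import base64
import hashlib


def htpasswd_sha_line(user, password):
    """Return one 'user:{SHA}...' line as written by `htpasswd -s`."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "%s:{SHA}%s" % (user, base64.b64encode(digest).decode("ascii"))


def add_user(path, user, password):
    """Append a user to a password file (like htpasswd without -c).

    Sketch only: an existing entry for the same user is not replaced.
    """
    with open(path, "a") as f:
        f.write(htpasswd_sha_line(user, password) + "\n")
```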
21. It works, but ...
- the server asks the browser for a user/password before allowing access
- the password is sent over the network unencrypted, only base64-encoded
- the password is not visible in the clear, but can easily be decoded by anyone who captures the right network packet (sniffers in action)
- this method of authentication is only as safe as telnet-style username and password security
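Why this is weak is easy to demonstrate. With Basic authentication the browser sends `Authorization: Basic <token>`, where the token is just base64 of `user:password`; decoding it needs no key. The function names below are illustrative.

```python
import base64


def basic_credentials(user, password):
    """Value of the Authorization header a browser sends for Basic authentication."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def sniff_credentials(header_value):
    """What an eavesdropper can do with a captured header: just decode it."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic authentication")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

`basic_credentials("alice", "s3cret")` gives `"Basic YWxpY2U6czNjcmV0"`, and `sniff_credentials` turns it straight back into `("alice", "s3cret")`.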
22. Host/domain authentication
- protective .htaccess file looks like
- <Limit GET>
- order deny,allow
- deny from all
- allow from hostname/domain
- </Limit>
23. Host/domain authentication
- open .htaccess file looks like
- <Limit GET>
- order allow,deny
- allow from all
- deny from hostname/domain
- </Limit>
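The two `order` variants differ only in which list wins and in the default. The sketch below models that logic for client host names: with `deny,allow` access is allowed by default and a matching Allow overrides a Deny; with `allow,deny` access is denied by default and the client must match an Allow and no Deny. It is a simplified model with illustrative names: IP addresses and network masks are not handled, and a domain is assumed to match only whole name components (so `srce.hr` matches `www.srce.hr` but not `notsrce.hr`).

```python
def host_matches(client_host, pattern):
    """'all' matches everyone; a (partial) domain matches whole name components."""
    if pattern.lower() == "all":
        return True
    host = client_host.lower().rstrip(".")
    domain = pattern.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def access_allowed(order, allow, deny, client_host):
    """Evaluate order/allow/deny for one client host name."""
    allowed = any(host_matches(client_host, p) for p in allow)
    denied = any(host_matches(client_host, p) for p in deny)
    order = order.replace(" ", "").lower()
    if order == "deny,allow":
        # Deny is evaluated first and Allow can override it; default is to allow
        return allowed or not denied
    if order == "allow,deny":
        # Allow is evaluated first and Deny can override it; default is to deny
        return allowed and not denied
    raise ValueError("unsupported order: %r" % order)
```

The protective file corresponds to `access_allowed("deny,allow", ["srce.hr"], ["all"], host)`, the open one to `access_allowed("allow,deny", ["all"], ["example.com"], host)`.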
24. Access control
- it is possible to use authentication by host/domain and by user/password together
- for better security, compile Apache with SSL (Secure Sockets Layer) support
- then the server and client exchange keys at the beginning of the session and all transactions are encrypted
25. Common Gateway Interface (CGI)
- a WWW server is able to communicate with other programs (CGI scripts)
- CGI scripts can be written in any programming language (shell script, Perl, C, ...)
- CGI scripts can use CGI environment variables
- CGI is used for
- getting input from users, processing forms, returning any kind of dynamic information, gateways to other services, ...
- the workload is on the server side (be careful)
26. CGI
- the server needs to be configured for CGI operation so that security procedures can be enforced
- ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
- all files in /cgi-bin/ are treated as executable scripts (regardless of the file name)
- security measures (with CGI scripts)
- parse and check user input
- programs should have only the power they require
- dynamically generated programs are not permitted
- carefully examine all CGI scripts (do not allow users to execute their own programs)
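"Parse and check user input" in practice means accepting only what you expect (a whitelist) and never handing raw input to a shell. The sketch assumes a hypothetical script that takes a login name and runs `finger` on it; the pattern, the command and the function names are all illustrative assumptions.

```python
import re

# hypothetical rule: login names are 1-16 characters, lowercase letter first
LOGIN_NAME = re.compile(r"[a-z][a-z0-9_-]{0,15}")


def checked_login(value):
    """Accept only values matching the whitelist; reject everything else."""
    if not LOGIN_NAME.fullmatch(value):
        raise ValueError("rejected input: %r" % value)
    return value


def finger_argv(value):
    """Argument list for subprocess.run(argv): no shell is involved,
    so characters such as ';' or '|' could not start another command anyway."""
    return ["finger", checked_login(value)]
```

Input such as `miro; rm -rf /` is rejected before any program is started.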
27. Passing data (GET method)
- data is simply attached to the end of the URL
- "?" separates the data from the URL (http://url?data)
- CGI programs are executed by requesting their URL
- http://www_server/cgi-bin/program_name?data
- simple example: the <ISINDEX> tag
- the browser asks the user for input and attaches it to the URL
- the input is encoded by the browser (spaces become "+", special characters such as \n become "%XX" hex codes, ...)
- the server puts the part of the URL after "?" into the environment variable QUERY_STRING
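Both ends of a GET request can be sketched with Python's standard `urllib.parse`: the browser side encodes form fields into the URL, and the CGI program decodes `QUERY_STRING` from its environment. The function names are illustrative. An `<ISINDEX>` query has no `name=` part, so `parse_qs` would skip it; for that case `urllib.parse.unquote_plus` on the whole string is the simpler choice.

```python
import os
from urllib.parse import parse_qs, urlencode


def get_url(script_url, fields):
    """URL a browser requests for a GET form: data appended after '?'."""
    # urlencode turns spaces into '+' and other special characters into %XX
    return script_url + "?" + urlencode(fields)


def query_fields(environ=None):
    """Inside the CGI program: decode the QUERY_STRING set by the server."""
    environ = os.environ if environ is None else environ
    return parse_qs(environ.get("QUERY_STRING", ""))
```

For example, `get_url("http://www_server/cgi-bin/program_name", {"q": "web servers\n"})` gives `http://www_server/cgi-bin/program_name?q=web+servers%0A`.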
28. Passing data (POST method)
- recommended method for processing FORMS
- the HTML page with the form declares which script will be called to process the form data
- <FORM METHOD="POST" ACTION="/cgi-bin/script_name">
- ...
- </FORM>
- when the user hits the submit button, the client contacts the server and sends the request (POST /cgi-bin/script_name) with the data from the form (the data follows the request headers as the message body)
- to pass the data to the CGI program, the server uses environment variables and stdin
29. Passing data (POST method)
- the server executes the CGI script and provides it with
- a list of environment variables
- an input stream of FORM contents as name=value pairs (name1=value1&name2=value2&name3=value3)
- the script knows the length of this input stream from the environment variable CONTENT_LENGTH
- general order of a CGI script
- read input from stdin
- split the name=value pairs and decode the values (spaces, ...)
- do something and print the results as HTML to stdout
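The first two steps of that procedure (read exactly CONTENT_LENGTH bytes from stdin, then split and decode the name=value pairs) can be sketched as follows. The function name is illustrative, and the sketch assumes the usual URL-encoded form data, which is plain ASCII; file uploads (multipart forms) are not handled.

```python
import os
import sys
from urllib.parse import parse_qsl


def post_fields(environ=None, stdin=None):
    """Read exactly CONTENT_LENGTH bytes of form data from stdin and decode it."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin.buffer if stdin is None else stdin
    length = int(environ.get("CONTENT_LENGTH") or 0)  # never read past the body
    body = stdin.read(length).decode("ascii")          # URL-encoded data is plain ASCII
    # "name1=value1&name2=value2" -> [("name1", "value1"), ("name2", "value2")],
    # with '+' turned back into spaces and %XX escapes decoded
    return parse_qsl(body)
```

Returning a list of pairs keeps repeated field names (e.g. several checked boxes), which a plain dict would lose.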
30. Passing data (POST method)
- CGI scripts are responsible for formatting their output on "stdout" back to the server (the server finally passes this information to the client)
- the CGI script is responsible for generating content-specific headers and sending them as the first lines of output to the server
- for example
- Content-type: text/plain
- FOLLOWED BY ONE BLANK LINE!
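The last step, writing the response, is sketched below: one header line, the blank line that ends the headers, then the document. The function name is illustrative; it writes to stdout by default, which is where the server reads a CGI script's output.

```python
import sys


def send_page(body, content_type="text/html", out=None):
    """Write a CGI response: header line, one blank line, then the document."""
    out = sys.stdout if out is None else out
    out.write("Content-type: %s\n" % content_type)
    out.write("\n")   # the blank line tells the server where the headers end
    out.write(body)
```

A complete form handler would call a POST reader first and then something like `send_page("<h1>Thanks</h1>\n")`.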
31. Server side includes (SSI)
- the server can be configured to scan documents with the .shtml extension for constructs like
- <!--#command tag1="value1" tag2="value2" -->
- and replace them with the result of the command
- this concept is used to add
- the current date, or the value of any other CGI environment variable
- the last modification date and size of the document (or of another file)
- the contents of another document, inlined into the current document
- the output of any other program on the Web server side
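The scan-and-replace idea can be shown for the simplest command, `echo var="..."`. This is a sketch, not Apache's mod_include: it handles only `#echo`, leaves every other command untouched, and prints `(none)` for unknown variables (the text Apache uses by default). The names `ECHO` and `expand_echo` are illustrative.

```python
import re

# matches e.g. <!--#echo var="DATE_LOCAL" -->
ECHO = re.compile(r'<!--#echo\s+var="([^"]*)"\s*-->')


def expand_echo(document, variables):
    """Replace each #echo element with the value of the named variable."""
    return ECHO.sub(lambda m: variables.get(m.group(1), "(none)"), document)
```

A server would call this on each .shtml file before sending it, with `variables` filled from the CGI environment (DATE_LOCAL, LAST_MODIFIED, ...).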
32. API, modules, handlers
- Apache breaks down the handling of a client request into a series of steps
- URL --> filename translation
- Auth ID checking
- Auth access checking
- access checking other than the above Auth
- determining the MIME type of the requested object
- 'fixups', if needed
- actually sending a response back to the client
- logging the request
33. API, modules, handlers
- at any of those steps you may hook in a handler (a procedure)
- a set of handlers makes up a module, e.g. the CGI module, log module, server side includes module, access module, ...
- because the steps are consistently specified, you can connect your own modules to Apache that replace the old ones or add new possibilities
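The phase/handler/module idea can be modelled in a few lines. This is a simplified Python model, not Apache's C API: the phase names, the example modules and the paths are hypothetical, and only the `OK`/`DECLINED` names echo Apache's return codes. Handlers for a phase are tried in turn until one does not decline, and an error status ends the request (real Apache has more detailed rules, e.g. logging still runs after an error).

```python
OK = "OK"
DECLINED = "DECLINED"

# phases follow the steps listed above; the names are illustrative
PHASES = ["translate", "auth_id", "auth_access", "access",
          "mime_type", "fixups", "response", "logging"]


class Server:
    def __init__(self):
        self.handlers = {phase: [] for phase in PHASES}

    def add_module(self, module):
        """A module is a set of handlers, here a dict: phase -> function."""
        for phase, handler in module.items():
            self.handlers[phase].append(handler)

    def process(self, request):
        for phase in PHASES:
            for handler in self.handlers[phase]:
                result = handler(request)
                if result == DECLINED:
                    continue          # let the next handler for this phase try
                if result != OK:
                    return result     # an error status (e.g. 403) ends the request
                break                 # OK: this phase is done
        return OK


# hypothetical example handlers
def docroot_translate(request):
    request["filename"] = "/home/httpd/htdocs" + request["uri"]
    return OK


def deny_private(request):
    return 403 if request["uri"].startswith("/private/") else DECLINED


def html_type(request):
    if request["filename"].endswith(".html"):
        request["content_type"] = "text/html"
        return OK
    return DECLINED


def log_request(request):
    request.setdefault("log", []).append(request["uri"])
    return OK
```

Adding a module is just another `add_module` call, which is the point of the consistent step specification: the core loop never changes.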
34. Virtual servers
- one server may answer for many host names
- virtual servers (same port, different host names)
- part of the basic server configuration (httpd.conf)
- <VirtualHost hostname> ... </VirtualHost>
- each virtual server may have totally different content and configuration, separate access and error log files, ...
- the alternative is to run another server on a different port
35. Virtual servers - example
- Port 80
- ServerName test.ceenet.ceu.hu
- NameVirtualHost 193.225.201.125
- <VirtualHost 193.225.201.125>
- ServerName test.ceenet.ceu.hu
- DocumentRoot /home/httpd/test
- </VirtualHost>
- <VirtualHost 193.225.201.125>
- ServerName virtualtest.ceenet.ceu.hu
- DocumentRoot /home/httpd/virtualtest
- ...
- </VirtualHost>
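With name-based virtual hosts, all names share one IP address, so the server picks the configuration from the HTTP/1.1 `Host` header; if no ServerName matches (or an old client sends no Host header), Apache uses the first `<VirtualHost>` listed for that address. The sketch below models that choice for the two hosts above; ServerAlias and IPv6 literals are not handled, and the names are illustrative.

```python
VHOSTS = [  # the two name-based virtual hosts from the example above
    {"ServerName": "test.ceenet.ceu.hu", "DocumentRoot": "/home/httpd/test"},
    {"ServerName": "virtualtest.ceenet.ceu.hu", "DocumentRoot": "/home/httpd/virtualtest"},
]


def select_vhost(host_header, vhosts=VHOSTS):
    """Pick the virtual host whose ServerName matches the Host header."""
    # drop an optional ":port" and a trailing dot; host names are case-insensitive
    name = (host_header or "").split(":", 1)[0].rstrip(".").lower()
    for vhost in vhosts:
        if vhost["ServerName"].lower() == name:
            return vhost
    return vhosts[0]  # no match: the first <VirtualHost> is the default
```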
36. Web measurement? Why?
- for content providers
- usability testing
- detection of performance problems
- tuning the server software / benchmarking servers
- for network providers
- evaluating proxies
- detection of performance problems
- for protocol developers
- measuring protocol (DNS, TCP, HTTP) performance
- evaluating changes and new mechanisms
37. Web measurement techniques
- server logs
- proxy (cache) logs
- browser (client) logs
- packet traces
38. Server logs
- the server logs access information to a file
- client host,
- date,
- client request,
- status,
- count of the bytes sent by server
- ...
- standard log files and formats
- it is possible (and easy) to produce many kinds of activity reports from that data
- plenty of log analyzers (wwwstat, analog, ...)
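A very small activity report shows how easy this is. The sketch assumes the access log is in the Common Log Format (`host ident authuser [date] "request" status bytes`) and counts requests per status code and per client host, plus the total bytes sent; the names are illustrative and lines in other formats are skipped.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def summarize(lines):
    """Count requests per status code and per client host, and total bytes sent."""
    statuses, hosts, total_bytes = Counter(), Counter(), 0
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue                      # skip lines in other formats
        statuses[m.group("status")] += 1
        hosts[m.group("host")] += 1
        if m.group("bytes") != "-":       # "-" means no body was sent
            total_bytes += int(m.group("bytes"))
    return statuses, hosts, total_bytes
```

With the TransferLog from the httpd.conf slide, `summarize(open("/home/httpd/logs/access_log"))` reads the log line by line.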
39. Some results
- derived from various studies
- the average resource is small (e.g. 8-10 KB)
- 10% of resources attract 90% of traffic
- on average 3-5 resources per Web document (including the HTML file)
- number of clicks per session
- 4 (server studies)
- 10 (client studies)
40. Summary
- WWW server
- Apache
- directory structure, configuration files, directives, running
- access control, authentication
- Common Gateway Interface (CGI), passing data
- Server side includes (SSI)
- API, modules, handlers
- virtual servers
- Measuring the Web