CGI-based - PowerPoint PPT Presentation

About This Presentation
Title:

CGI-based

Description:

CGI-based Proxy (URL-rewriter) Used at TCU. Developed in March 1996, initially for access to First Search, then other vendors. Love child of Naïveté and Desperation.

Slides: 12
Provided by: KerryBo1
Learn more at: http://lib.tcu.edu
Tags: cgi | based | proxy | server


Transcript and Presenter's Notes


1
CGI-based Proxy (URL-rewriter) Used at
TCU
  • Developed in March 1996, initially for access to
    First Search, then other vendors
  • Love child of Naïveté and Desperation (getting
    people into commercial databases was easy with
    Telnet and C-Kermit -- isn't the web supposed to
    make things easier?)
  • Desperation: needed a solution now
  • Naïveté: figured we could cobble together a solution
    to use for about 18 months, then IETF/vendors
    would have a better solution available

2
Goals...
  • Support the following vendor authorization scenarios
    with one program:
      • User enters ID/password in a form to gain access
        (Cambridge Scientific Abstracts, Ovid on TexShare
        server, OCLC)
      • Vendor uses standard HTTP username/password
        validation (Hoover's Company Data, CenStats)
      • Vendor validates based on IP address (many
        vendors)
  • Make it completely transparent to the user (no
    reconfiguration of Web browser)
  • Use tools we already had (e.g., not maintain a
    Unix box just to run WebScript)

3
Tools
  • The Lynx Web browser supports most validation
    scenarios as long as the vendor doesn't require use
    of cookies (many systems send cookies, but still
    work without them)
  • URL-rewriting technique makes access transparent
    to the user: no reconfiguration of browser.
  • Components are CERN Web server on VMS, CGI
    scripts written in DCL, the Lynx web browser, and
    a Pascal URL-rewriter program written in-house
    (tools we already had, except for URL-rewriter)

4
How it works (IP validation scenario)
  • User clicks on a link to a URL that looks like
    this:
      http://lib.tcu.edu/htbin/validate.pp?Britannica
  • Validate.pp looks at the WWW_REMOTE_ADDR field to
    determine if the user is already coming from a TCU IP
    address:
      If F$Loc("138.237", WWW_REMOTE_ADDR) .lt. F$Len(WWW_REMOTE_ADDR)
      then
        WS "Location: http://www.eb.com:180/"
        WS ""
      else

5
... (IP validation scenario)
  • If the user is off-campus, then send them back a
    screen explaining they're about to be asked for a
    username and password, with a link that calls the
    CGI script for the remote resource. The URL for the link
    looks like this:
      http://lib.tcu.edu/htbin/Britannica.pp?http://www.eb.com:180/
  • If their username and password are registered,
    Britannica.pp uses Lynx to fetch the page
    specified as the parameter and dumps it in a
    temp file.
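
The on-campus/off-campus decision described above can be sketched in Python. This is only an illustration, not the DCL script used at TCU: the `RESOURCES` table and the function names are hypothetical, and the sketch tests the address prefix, whereas the DCL test on the slide matches "138.237" anywhere in the address.

```python
# Values taken from the slides; the lookup table itself is a hypothetical
# stand-in for however validate.pp maps a resource name to a vendor URL.
CAMPUS_PREFIX = "138.237."
PROXY_BASE = "http://lib.tcu.edu/htbin/"
RESOURCES = {"Britannica": "http://www.eb.com:180/"}


def is_on_campus(remote_addr: str) -> bool:
    """True when the client address starts with the campus prefix.

    Note: this is a prefix test. A substring test (as in the DCL
    fragment) would also accept an address such as 10.138.237.9.
    """
    return remote_addr.startswith(CAMPUS_PREFIX)


def validate(resource: str, remote_addr: str) -> tuple[str, str]:
    """Decide what to send back for a validate.pp?<resource> request."""
    target = RESOURCES[resource]
    if is_on_campus(remote_addr):
        # On campus: the vendor already accepts the user's own IP address,
        # so a plain redirect to the vendor is enough.
        return ("redirect", target)
    # Off campus: hand back a link that routes the request through the
    # per-resource CGI script, which asks for a username and password.
    return ("login_link", f"{PROXY_BASE}{resource}.pp?{target}")
```

For example, `validate("Britannica", "138.237.10.5")` yields a redirect straight to the vendor, while any other address yields the wrapped `Britannica.pp?...` link.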

6
... (IP validation scenario)
  • Pascal program opens the temp file and looks for
    all HREF, SRC and other tags that specify URLs.
    If a URL is relative:
      <IMG SRC="/images/pascal.jpg">
  • Then it must first be converted to absolute
    form:
      <IMG SRC="http://www.eb.com:180/images/pascal.jpg">
  • Then, assuming the URL points to something that also
    requires IP validation, the URL of the CGI script
    is prepended to it:
      <IMG SRC="http://lib.tcu.edu/htbin/britannica.pp?http://www.eb.com:180/images/pascal.jpg">
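
The two rewriting steps above (relative to absolute, then prefixing the CGI script's URL) can be sketched in Python as follows. This is a minimal illustration, not the in-house Pascal program: the regular expression, the function name `rewrite_urls` and the choice to handle only HREF and SRC are assumptions, and every matched URL is wrapped without checking whether it actually needs IP validation.

```python
import re
from urllib.parse import urljoin

# Prefix taken from the slide's example.
PROXY_PREFIX = "http://lib.tcu.edu/htbin/britannica.pp?"

# HREF= or SRC= followed by a double-quoted, single-quoted or bare value.
ATTR_RE = re.compile(
    r'''\b(href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))''',
    re.IGNORECASE,
)


def rewrite_urls(html: str, page_url: str, proxy_prefix: str = PROXY_PREFIX) -> str:
    """Make each HREF/SRC URL absolute, then route it through the proxy."""

    def fix(match: re.Match) -> str:
        attr = match.group(1)
        raw = match.group(2) or match.group(3) or match.group(4) or ""
        # Step 1: resolve a relative URL against the page it came from.
        absolute = urljoin(page_url, raw)
        # Step 2: prepend the CGI script URL so the next request is proxied.
        return f'{attr}="{proxy_prefix}{absolute}"'

    return ATTR_RE.sub(fix, html)
```

Calling `rewrite_urls('<IMG SRC="/images/pascal.jpg">', "http://www.eb.com:180/")` reproduces the final tag shown on the slide. A fuller version would need to skip `mailto:` links and fragment-only references, which this sketch wraps like any other URL.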

7
... (IP validation scenario)
  • So, when the user clicks on the link (or when the browser
    fetches data automatically, as with IMG tags),
    access continues to be routed through the CGI
    script and the vendor's server continues to see
    our IP address. Lynx will gladly fetch .JPG,
    .GIF, .PDF and other non-HTML data and dump it in
    a temp file, so the CGI script just does a
    redirect when it sees that it's fetched non-HTML
    data:
      If (F$Loc(".GIF", TSTR) .lt. F$Len(URL)) .OR. (F$Loc(".JPG", TSTR) ...
      then
        WS "Location: http://library.tcu.edu/wwwscratch/''HTML_InFile'"
        WS ""
      else
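
The non-HTML shortcut above can be sketched in Python. This is a hedged illustration only: the extension set lists just the three types the slide names (the real script's list is elided), and `is_non_html` and `respond` are hypothetical names. Unlike the DCL test, which looks for the extension anywhere in the string, the sketch checks the extension of the URL path.

```python
import posixpath
from urllib.parse import urlsplit

# Only the types named on the slide; the original list is longer but elided.
NON_HTML_EXTENSIONS = {".gif", ".jpg", ".pdf"}
SCRATCH_BASE = "http://library.tcu.edu/wwwscratch/"


def is_non_html(url: str) -> bool:
    """True when the URL path ends in one of the known non-HTML extensions."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in NON_HTML_EXTENSIONS


def respond(url: str, temp_file_name: str) -> tuple[str, str]:
    """Choose between redirecting to the temp file and rewriting it."""
    if is_non_html(url):
        # Binary data has no URLs to rewrite: point the browser at the
        # already-fetched copy in the web-visible scratch directory.
        return ("redirect", SCRATCH_BASE + temp_file_name)
    # HTML goes on to the URL-rewriting step before being returned.
    return ("rewrite", temp_file_name)
```

Because the redirect points at a copy held on the library's own server, the vendor only ever sees the request Lynx made from the campus address.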

8
Surprisingly...
it's worked for three years
  • Currently in use for 18 different sources. Most
    use IP validation, three use HTTP
    Username/Password, and five use logon forms with
    session ids.
  • 250-1,400 files a day fetched through the
    scripts in February.
  • Although technique is clunky (creates temporary
    files for everything fetched), the response time
    for the Internet is generally so slow that the
    extra time required by the scripts is negligible
    in comparison. Likewise, the load it adds to our
    Alpha server is small relative to other
    applications.

9
But
  • Lack of cookie support is becoming an issue for
    more and more databases.
  • Creating scripts requires a fair amount of
    proficiency in DCL.
  • Limitations in DCL and Lynx create a need for some
    kludgy workarounds in some of the DCL scripts;
    they can be complex to diagnose when something
    stops working.
  • How much longer will we be running a VMS server?

10
So working on a new version.
  • Written as a Java Servlet so as to be portable
    to different web servers and operating systems.
    (The Servlet interface replaces CGI.)
  • Instead of scripts, uses a configuration file;
    should be easier to set up and maintain.
  • Java has built-in support for cookies and
    maintaining state in both directions (to the
    vendor, and to the user's browser), so cookie
    support becomes doable.

11
Finally.
  • If vendors start supporting a better solution,
    like the "Referrer URL" technique, there will be
    no need for it. All together now, let's
    start holding our breath...