Capturing Websites - PowerPoint PPT Presentation

Slides: 41
Provided by: Wau1

1
Capturing Websites
  • PROV Advice to Agencies 20b: Technical issues for
    capturing web records

2
Overview
  • What is Advice 20b and what does it contain?
  • How do you use Advice 20b?

3
What is Advice 20b?
  • Technical advice on how to capture websites as
    records
  • Does not discuss recordkeeping issues; see
    Advice 20a for those

4
What does Advice 20b contain?
  • Description of the various technical options to
    capture web records
  • Recommendations as to which option should be
    chosen under various conditions
  • Technical advice for each option
  • Limitations
  • Metadata to be captured
  • Transfer to PROV (including VERS mapping)

5
Relationship to Advice 20a
  • Where 20a aims to help you work out what you need
    to do, 20b aims to help you do it!
  • The two documents are complementary and refer to
    each other throughout
  • Advice 20b relies on the recordkeeping analysis
    in 20a to determine what records to capture

6
Who should read Advice 20b?
  • Technical staff who design, maintain, and operate
    websites
  • Management who commission and oversee websites
  • Records managers and others who need to capture
    records

7
Requirements and timetables
  • This advice provides recommendations, not
    requirements
  • PROV prefers to get experience with recommended
    approaches before mandating approaches
  • Agencies already have a legal and policy
    requirement to keep full and accurate records

8
How do you use Advice 20b?

9
Your goal is
  • To create a record of your interaction with the
    user of the website:
      • The information that you presented to them
      • Possibly how the information was presented
        (look and feel)
      • Possibly what information they supplied to you
      • Possibly the sequence of interactions with the
        user

10
Before using Advice 20b
  • Appraise your website using Advice 20a to
    determine what portions (if any) need to be
    captured as records and how often they need to be
    captured

11
Appearance or Content?
  • Do you need to capture the actual appearance of
    the web page, or just the data that is used to
    construct the web page?
  • Is the website simply a front end to a business
    application?
  • Do you capture screenshots of your business
    applications?

12
What did that user see?
  • Is the page different depending on who is viewing
    the page?
      • Common view (everyone sees the same)
      • Group view (e.g. staff, public)
      • Individual view
  • Is it important to show what a particular user
    saw?

13
Where is the data held?
  • Is the content drawn from one business
    application or assembled on the fly from many
    different data sources?

14
How do you capture websites?

15
Options
  • Web harvesting
  • Capture from back-end application
  • Transaction capture
  [Diagram labels: web harvester, web server, Internet, business application]

16
Which option is best?
  • Depends on
      • Whether you need to capture what the users saw
      • How the web pages are generated
  • Advice 20b gives recommendations on which option
    to choose

17
Web harvesting

18
What is it?
  • Software that acts as a user, follows links in a
    site, and downloads every page of a portion of a
    website
  • Cannot capture sites where the pages are not
    linked together
  • Captures a snapshot of what the site looks like
    at a point in time
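
The behaviour described above can be illustrated with a minimal sketch of a harvester. It is not taken from Advice 20b or from any particular product. The `harvest` function, the `fetch` callback, and the `example.org` pages are all hypothetical names chosen for this illustration, and the "site" is held in memory so the example runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(start_url, fetch):
    """Follow links breadth-first within one host and return {url: html}.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns the page HTML, or None if unavailable.
    """
    host = urlparse(start_url).netloc
    snapshot = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in snapshot:
            continue
        html = fetch(url)
        if html is None:
            continue
        snapshot[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == host and target not in snapshot:
                queue.append(target)
    return snapshot


# A tiny in-memory "site": /orphan exists, but no page links to it.
SITE = {
    "http://example.org/": '<a href="/about">About</a> <a href="http://other.example/">Elsewhere</a>',
    "http://example.org/about": '<a href="/">Home</a>',
    "http://example.org/orphan": "Only reachable through a search form",
}

captured = harvest("http://example.org/", SITE.get)
print(sorted(captured))
# prints ['http://example.org/', 'http://example.org/about']
```

The unlinked `/orphan` page never appears in the snapshot, which is the limitation noted in the second bullet. A real harvester would also need to handle images, stylesheets, politeness delays, and errors.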

19
When to use?
  • Recommended default approach

20
Why?
  • Captures exactly what the user saw
  • Range of off-the-shelf software (including public
    domain)
  • Does not matter where the information on web
    pages is drawn from

21
Do not use if
  • The pages are not linked together (e.g. you must
    do a search to find the pages)
  • Pages change frequently (as this approach
    captures a snapshot)
  • Pages are significantly customised (as this
    approach captures the view of a particular user)

22
You may need to limit
  • You may need to restrict the (bleeding-)edge
    features used on your web pages so that the
    harvester can find all the links and does not get
    confused
      • e.g. JavaScript, Flash
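
One way to see why script-driven navigation can confuse a harvester is the sketch below. It assumes a simple harvester that reads `href` attributes and does not execute JavaScript; some harvesters do more than this. The page content and the `HrefCollector` class are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Records href attributes of <a> tags, as a simple harvester might."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


# One plain link and one "link" whose target is only built when a script runs.
PAGE = """
<a href="/reports/index.html">Plain link</a>
<span onclick="window.location = '/reports/' + year + '.html'">Scripted link</span>
"""

collector = HrefCollector()
collector.feed(PAGE)
print(collector.hrefs)
# prints ['/reports/index.html']
```

Only the plain link is found. The scripted target exists only after the script runs in a browser, so a harvester that does not run scripts would miss that page.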

23
Implementation options
  • Many commercial and public domain products
    available to carry out harvesting
  • Harvesting services also available (e.g. Internet
    Archive)

24
Transaction Capture

25
What is it?
  • Software that attaches to your Web server that
    captures and stores a copy of each request and
    each web page as it is being served to users
  • Uses a custom client to replay interactions
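
As a rough illustration of the first bullet, the sketch below wraps a Python WSGI application in middleware that stores each request together with the exact response served. It is not one of the commercial products mentioned in the advice. `TransactionCapture`, `demo_app`, and the in-memory `log` list are hypothetical, and the replay client is not shown.

```python
import datetime


class TransactionCapture:
    """WSGI middleware that stores each request and the response served for it."""

    def __init__(self, app, store):
        self.app = app
        self.store = store  # any list-like object; a real system would persist this

    def __call__(self, environ, start_response):
        record = {
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "method": environ.get("REQUEST_METHOD"),
            "path": environ.get("PATH_INFO"),
            "query": environ.get("QUERY_STRING", ""),
        }

        def capturing_start_response(status, headers, exc_info=None):
            # Note the status and headers, then pass them on unchanged.
            record["status"] = status
            record["headers"] = list(headers)
            return start_response(status, headers, exc_info)

        # Buffer the whole body so an exact copy is stored with the request.
        result = self.app(environ, capturing_start_response)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        record["body"] = body
        self.store.append(record)
        return [body]


def demo_app(environ, start_response):
    """Hypothetical page that is customised per user via the query string."""
    user = environ.get("QUERY_STRING", "") or "guest"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"Hello, {user}".encode("utf-8")]


log = []
app = TransactionCapture(demo_app, log)
served = app(
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/", "QUERY_STRING": "alice"},
    lambda status, headers, exc_info=None: None,
)
print(served, log[0]["status"], log[0]["body"])
# prints [b'Hello, alice'] 200 OK b'Hello, alice'
```

Buffering the full body keeps the sketch short but would prevent streaming responses; a production capture tool would need to handle that, along with storage volume and privacy.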

26
When to use?
  • When it is necessary to show exactly what you
    told each user
  • But this amount of detail is often not required

27
Why?
  • Particularly useful where pages
      • are heavily customised for particular users
      • change frequently

28
Implementation options
  • Two commercial products available (details in
    advice)

29
Capture from back-end application

30
What is it?
  • Capture records directly from the business
    application that sits behind the website
      • e.g. database, business application, content
        management system

31
When to use?
  • Recommended where the application already
    captures records (or could capture records)

32
Why?
  • Simplicity (why repeat work?)
  • Particularly useful where pages
      • are dynamically generated
      • change frequently

33
Do not use if
  • It is necessary to show exactly how a page
    appeared
      • This approach provides no evidence as to what
        the web server did

34
Implementation options
  • No products; each website would be different
  • Capture content from the business application
  • Design documentation showing how the content was
    turned into web pages
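
Because each website differs, the following is only a sketch of what capturing content from a back-end might look like. It assumes, purely for illustration, that the content sits in a SQLite table named `pages`. The table layout, the `capture_from_backend` function, the checksum field, and the placeholder timestamp are all assumptions of this example, not part of Advice 20b.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone


def capture_from_backend(conn, captured_at=None):
    """Export every row of the hypothetical `pages` table as one record."""
    captured_at = captured_at or datetime.now(timezone.utc).isoformat()
    rows = conn.execute(
        "SELECT slug, title, body FROM pages ORDER BY slug"
    ).fetchall()
    content = [{"slug": s, "title": t, "body": b} for s, t, b in rows]
    # A checksum over the exported content lets later readers detect changes.
    payload = json.dumps(content, sort_keys=True)
    return {
        "captured_at": captured_at,
        "source": "pages table",
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "content": content,
    }


# Stand-in for the business application's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [("fees", "Fees", "Current fee schedule"),
     ("contact", "Contact", "How to reach us")],
)

record = capture_from_backend(conn, captured_at="2000-01-01T00:00:00+00:00")
print(record["captured_at"], [page["slug"] for page in record["content"]])
# prints 2000-01-01T00:00:00+00:00 ['contact', 'fees']
```

The record holds the data but says nothing about how it looked on screen, which is why the slide pairs content capture with design documentation.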

35
Summary
  • Appraise using 20a: is it a record?
  • Where is the data held?
  • Do you need to show what the user saw on the web
    site?
  • Is the web page significantly customised for
    individual users (or groups of users)?

36
Choose web harvesting if
  [Diagram labels: web harvester, web server, Internet, business application]
  • By default due to its simplicity
  • Do not use if pages heavily customised, change
    frequently, or are not linked together

37
Choose transaction capture if
  • Pages are heavily customised and you must show
    what you said to individuals
  • Pages change frequently, or are dynamically
    generated

38
Choose capture from the back-end application if
  • Records are already captured in the application,
    or content changes frequently or is heavily
    customised
  • Do not use if you must be able to demonstrate the
    web page presented to the user
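
The selection rules on the last three slides can be summarised as a small decision function. This is a sketch only. The function name and parameters are hypothetical, and the order in which the conditions are checked is an assumption, since the slides list the conditions without ranking them.

```python
def choose_capture_option(*, must_show_what_user_saw, heavily_customised,
                          changes_frequently, pages_linked_together,
                          backend_captures_records):
    """Return one of the three capture options named in the slides.

    The precedence of the checks below is an assumption of this sketch.
    """
    harvesting_ruled_out = (
        heavily_customised or changes_frequently or not pages_linked_together
    )
    if not harvesting_ruled_out:
        return "web harvesting"  # the recommended default
    if backend_captures_records and not must_show_what_user_saw:
        return "capture from back-end application"
    if heavily_customised or changes_frequently:
        return "transaction capture"
    return "no clear match; consult Advice 20b"


static_site = choose_capture_option(
    must_show_what_user_saw=True, heavily_customised=False,
    changes_frequently=False, pages_linked_together=True,
    backend_captures_records=False)
personalised_site = choose_capture_option(
    must_show_what_user_saw=True, heavily_customised=True,
    changes_frequently=False, pages_linked_together=True,
    backend_captures_records=True)
database_front_end = choose_capture_option(
    must_show_what_user_saw=False, heavily_customised=False,
    changes_frequently=True, pages_linked_together=True,
    backend_captures_records=True)

print(static_site)          # prints web harvesting
print(personalised_site)    # prints transaction capture
print(database_front_end)   # prints capture from back-end application
```

A site whose only problem is unlinked pages falls through to the last line, because the slides do not name an alternative for that case.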

39
More information
  • PROV Advice to Agencies 20a and 20b
      • http://www.prov.vic.gov.au/records/standards.asp
  • Contact PROV
      • Kathy Sinclair: kathy.sinclair_at_prov.vic.gov.au
      • Andrew Waugh: andrew.waugh_at_prov.vic.gov.au

40
Questions?