LCC 6310 Computation as an Expressive Medium

Provided by: michael74
1
LCC 6310 Computation as an Expressive Medium
  • Lecture 10

2
Parsing HTML
  • Parsing HTML
  • Process web information and create
    visualizations or manipulations of it
  • Learning HtmlParser
  • Available at htmlparser.sourceforge.net
  • A Java package for parsing HTML
  • Using external Java libraries within Processing

3
Accessing an external class library
  • Place jars (or class files) within the libraries
    folder in Processing
  • For example, if the class library is called
    htmlparser, create an htmlparser/library folder
    within the libraries folder and add the jar files
  • Jason will go over how to use external class
    libraries in your exported applets
  • Use the import statement to bring the external
    classes into your program
  • Class libraries live in packages
  • import <packagename>.*; means make all the
    classes in <packagename> available for use in my
    program
  • Package names can be hierarchical (e.g.
    org.htmlparser.util)
  • If you get an error that an htmlparser class
    cannot be found, look in the documentation to see
    which package you need to import into your program
  • The imported packages in the example code should
    get you pretty far
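
To make the import rules above concrete, here is a minimal sketch that uses only packages from the standard Java library (not HtmlParser), so it runs without any extra jars. It illustrates that a wildcard import covers a single package: java.util.* does not bring in the classes of the sub-package java.util.regex, which needs its own import. The class name ImportDemo and the method words() are made up for illustration.

```java
import java.util.*;        // every class directly in java.util (List, ArrayList, ...)
import java.util.regex.*;  // the sub-package has to be imported separately

class ImportDemo {
  // Collects the words in a string, using classes from both imported packages.
  static List<String> words(String text) {
    List<String> found = new ArrayList<String>();       // from java.util
    Matcher m = Pattern.compile("\\w+").matcher(text);  // from java.util.regex
    while (m.find()) {
      found.add(m.group());
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(words("import org.htmlparser.util")); // [import, org, htmlparser, util]
  }
}
```

Removing the second import line makes Pattern and Matcher unresolvable, which is the same kind of "class cannot be found" error the slide describes.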

4
The Parser class
  • Parser is the main class for parsing the HTML
    file pointed to by a URL
  • What is parsing? To parse an HTML file means to
    turn the raw text of the page into structured
    tags that you can process
  • Look for specific tags
  • Get attributes from tags
  • One way to parse an HTML file
  • Parser parser = new Parser(<URL>);
  • NodeList nodeList = parser.parse(null);
  • Parser.parse(NodeFilter filter) returns a
    NodeList of all HTML nodes (tags) that satisfy
    the filter
  • The null filter returns a list containing all the
    tags
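
As a toy illustration of what "turning raw text into structured tags" means, the sketch below pulls tag names out of a string with a regular expression. This is not how HtmlParser works and it is far too naive for real pages; it only shows the idea of going from characters to a list you can loop over. ToyTagScanner and tagNames() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToyTagScanner {
  // Matches an opening tag such as <img src="a.gif"> and captures its name.
  private static final Pattern OPEN_TAG =
      Pattern.compile("<([A-Za-z][A-Za-z0-9]*)[^>]*>");

  // Returns the upper-cased names of the opening tags, in document order.
  static List<String> tagNames(String html) {
    List<String> names = new ArrayList<String>();
    Matcher m = OPEN_TAG.matcher(html);
    while (m.find()) {
      names.add(m.group(1).toUpperCase());
    }
    return names;
  }

  public static void main(String[] args) {
    String page = "<html><body><img src=\"a.gif\" alt=\"logo\">"
        + "<a href=\"b.html\">next</a></body></html>";
    System.out.println(tagNames(page)); // [HTML, BODY, IMG, A]
  }
}
```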

5
The try/catch block
  • Exceptions are thrown whenever Java encounters an
    error situation
  • You've probably all run into exceptions, like the
    NullPointerException
  • Many different exceptions are defined by
    Java, and programmers can define their own
  • When an exception is thrown, it travels up the
    call stack (the stack of method calls)
  • The default behavior for exceptions is for them
    to travel all the way to the top, where they
    terminate your program (and print out the
    exception)
  • Sometimes, however, you want to handle an
    exception yourself and keep on going. You do this
    with try { <statements> } catch
    (<ExceptionClass> e) { <exception code> }
  • The idea is that you want your program to keep
    going, so you write special code to clean up
    after the error
  • You can declare that a method can throw specific
    exceptions; if you do, any caller of the method
    must handle the exception with a try/catch (or
    declare that it throws the exception too)
  • Parser.parse() does this, so we have to handle
    the exception
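
The pieces above (a programmer-defined exception, a method that declares what it throws, and a caller that catches and keeps going) can be sketched in plain Java as follows. Everything here is a stand-in: PageUnavailableException, fetch() and fetchOrDefault() are hypothetical names, and no real network access happens.

```java
class TryCatchDemo {
  // A checked exception defined by the programmer.
  static class PageUnavailableException extends Exception {
    PageUnavailableException(String message) {
      super(message);
    }
  }

  // Declares that it can throw, so every caller must catch or re-declare.
  static String fetch(String url) throws PageUnavailableException {
    if (url == null || url.length() == 0) {
      throw new PageUnavailableException("no URL given");
    }
    return "contents of " + url;
  }

  // Handles the exception itself and keeps going with a fallback value.
  static String fetchOrDefault(String url) {
    try {
      return fetch(url);
    } catch (PageUnavailableException e) {
      return "(unavailable: " + e.getMessage() + ")";
    }
  }

  public static void main(String[] args) {
    System.out.println(fetchOrDefault("http://example.com/")); // contents of http://example.com/
    System.out.println(fetchOrDefault(""));                    // (unavailable: no URL given)
  }
}
```

Calling fetch() directly from main() without a try/catch or a throws clause would be a compile error, which is why code that calls a method declared this way has to deal with the exception.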

6
NodeLists
  • NodeLists contain Nodes, where each node
    represents a tag or text
  • Nodes are hierarchical, just like the structure
    of HTML documents
  • Let's look at the top-level tag structure for the
    syllabus
  • for (int i = 0; i < nodeList.size(); i++) {
  •   Node n = nodeList.elementAt(i);
  •   if (n instanceof Tag) {
  •     Tag t = (Tag) n;
  •     println(t.getTagName());
  •   }
  • }
  • Let's take a look at reading documentation

7
Traversing the hierarchical structure
  • NodeList Node.getChildren()
  • Get a list of the children nodes of a node
  • So, to search the nodeList for specific nodes,
    you need to search the top level, then search
    within the children of the top level, and so
    forth
  • NodeList NodeList.searchFor(Class classType,
    boolean recursive)
  • If the second parameter is true, it will look in
    the children lists for you
  • Need to tell it what class of node you're looking
    for
  • In Java, classes are themselves objects
  • To get the Class object for an object's class,
    call getClass() on that object
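
To see why the recursive flag matters, here is a small stand-alone sketch of a node hierarchy with a searchFor-style method. ToyNode, ToyImage and ToySearch are invented stand-ins, not HtmlParser classes, and matching with Class.isInstance() is an assumption of this sketch rather than a statement about how the library matches nodes.

```java
import java.util.ArrayList;
import java.util.List;

class ToyNode {
  final String name;
  final List<ToyNode> children = new ArrayList<ToyNode>();

  ToyNode(String name) { this.name = name; }

  // Adds a child and returns this node so calls can be chained.
  ToyNode add(ToyNode child) { children.add(child); return this; }
}

class ToyImage extends ToyNode {
  ToyImage(String name) { super(name); }
}

class ToySearch {
  // Collects the nodes that are instances of classType.
  // When recursive is true, the children lists are searched as well.
  static List<ToyNode> searchFor(List<ToyNode> nodes, Class<?> classType, boolean recursive) {
    List<ToyNode> hits = new ArrayList<ToyNode>();
    for (ToyNode n : nodes) {
      if (classType.isInstance(n)) {
        hits.add(n);
      }
      if (recursive) {
        hits.addAll(searchFor(n.children, classType, true));
      }
    }
    return hits;
  }

  // A tiny page: BODY contains IMG-1 and a DIV, and the DIV contains IMG-2.
  static List<ToyNode> samplePage() {
    ToyNode body = new ToyNode("BODY")
        .add(new ToyImage("IMG-1"))
        .add(new ToyNode("DIV").add(new ToyImage("IMG-2")));
    List<ToyNode> top = new ArrayList<ToyNode>();
    top.add(body);
    return top;
  }

  public static void main(String[] args) {
    Class<?> wanted = new ToyImage("temp").getClass();
    System.out.println(searchFor(samplePage(), wanted, false).size()); // 0
    System.out.println(searchFor(samplePage(), wanted, true).size());  // 2
  }
}
```

Without recursion only the top level (BODY) is inspected, so no images are found; with recursion both images turn up, in document order.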

8
Example: find all the images on a page
  • First create a throwaway instance of the tag
    you're looking for; you just need it to get your
    hands on the class
  • ImageTag tempImageTag = new ImageTag();
  • Use NodeList.searchFor() to create a list of just
    the image tags (searching recursively)
  • NodeList imageList =
    nodeList.searchFor(tempImageTag.getClass(), true);
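
A side note on the throwaway-instance idiom: Java also has class literals, which give you the same Class object without creating an instance. The sketch below uses StringBuilder from the standard library as a stand-in for a tag class; ImageTag.class would be the analogous expression, assuming the library class is on the classpath.

```java
class ClassObjectDemo {
  // Class object obtained from a throwaway instance, as on the slide.
  static Class<?> viaInstance() {
    StringBuilder temp = new StringBuilder();
    return temp.getClass();
  }

  // The same Class object obtained from a class literal, no instance needed.
  static Class<?> viaLiteral() {
    return StringBuilder.class;
  }

  public static void main(String[] args) {
    System.out.println(viaInstance() == viaLiteral()); // true
    System.out.println(viaLiteral().getName());        // java.lang.StringBuilder
  }
}
```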

9
Getting tag attributes
  • String Tag.getAttribute(<attribute name>)
  • Returns the String value of the attribute
  • Some useful example attributes
  • SRC for image tags (gives you the URL for the
    image)
  • ALT for image tags (gives you the alt text
    associated with an image)
  • HREF for link tags (gives you the URL associated
    with a link)
  • If a tag doesn't have the requested attribute,
    getAttribute() returns null
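
Because a missing attribute comes back as null, code that uses the value should check for it first. The sketch below uses a plain Map as a stand-in for a tag (Map.get also returns null for a missing key); AttributeDemo, sampleImageTag() and attributeOr() are hypothetical names, not part of HtmlParser.

```java
import java.util.HashMap;
import java.util.Map;

class AttributeDemo {
  // Stand-in for an image tag that has a SRC attribute but no ALT attribute.
  static Map<String, String> sampleImageTag() {
    Map<String, String> attrs = new HashMap<String, String>();
    attrs.put("SRC", "http://example.com/logo.gif");
    return attrs;
  }

  // Returns the attribute value, or the fallback when the attribute is absent.
  static String attributeOr(Map<String, String> tag, String name, String fallback) {
    String value = tag.get(name); // null when the key is missing
    return (value != null) ? value : fallback;
  }

  public static void main(String[] args) {
    Map<String, String> tag = sampleImageTag();
    System.out.println(attributeOr(tag, "SRC", "(none)")); // http://example.com/logo.gif
    System.out.println(attributeOr(tag, "ALT", "(none)")); // (none)
  }
}
```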

10
Example: Drawing images you find on a web site
  • Assume you have a NodeList of ImageTags (e.g.
    imageList)
  • for (int i = 0; i < imageList.size(); i++) {
  •   ImageTag img = (ImageTag) imageList.elementAt(i);
  •   String src = img.getAttribute("SRC");
  •   PImage pim = loadImage(src);
  •   image(pim, random(width), random(height));
  • }
  • Red text (on the original slide) marks Processing
    methods and classes

11
Example: Only drawing images with the given alt text
  • Just like we used getAttribute() to get the image
    source, use getAttribute() to get the image text
    (the ALT attribute)
  • Use String methods to see if the alt text
    contains the text you care about
  • String.indexOf(<testString>): if <testString>
    appears anywhere within the String, returns its
    location within the String; otherwise returns -1
  • Let's use this to get only the images we care
    about from cnn.com.
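
A minimal sketch of the indexOf() test described above, using a plain list of alt-text strings instead of real tags. AltTextFilter and its methods are hypothetical names. Note that indexOf() is case-sensitive, and that a null alt text (a tag with no ALT attribute) has to be ruled out before calling it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AltTextFilter {
  // True when altText contains wanted; a missing (null) alt text never matches.
  static boolean matches(String altText, String wanted) {
    return altText != null && altText.indexOf(wanted) != -1;
  }

  // Keeps only the alt texts that contain the wanted text.
  static List<String> keepMatching(List<String> altTexts, String wanted) {
    List<String> kept = new ArrayList<String>();
    for (String alt : altTexts) {
      if (matches(alt, wanted)) {
        kept.add(alt);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println("World news".indexOf("news"));   // 6
    System.out.println("World news".indexOf("sports")); // -1
    List<String> alts = Arrays.asList("World news", null, "Weather map", "Local news");
    System.out.println(keepMatching(alts, "news"));     // [World news, Local news]
  }
}
```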

12
Following links
  • When processing information from the web, we may
    have to crawl a site
  • This means following links and recursively
    parsing
  • Links are just another tag type, and the href is
    just an attribute

13
Non-disturbing code
  • void method1() {
  •   println("In method 1");
  •   method2();
  • }
  • void method2() {
  •   println("In method 2");
  • }

14
Disturbing code
  • void method1() {
  •   println("Beginning of method 1");
  •   method1();
  •   println("End of method 1");
  • }
  • What will this do? It's recursive
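
One way to answer the question without flooding the console: the sketch below keeps the same self-calling shape but counts calls instead of printing, and catches the StackOverflowError that Java raises when the call stack runs out. RunawayRecursion and callsBeforeOverflow() are hypothetical names, and the number of calls reached depends on the JVM and its stack size.

```java
class RunawayRecursion {
  static int calls = 0;

  // No base case: the method calls itself until the call stack is exhausted,
  // so anything placed after the recursive call is never reached.
  static void method1() {
    calls++;
    method1();
  }

  // Runs method1() and reports how many calls happened before the overflow.
  static int callsBeforeOverflow() {
    calls = 0;
    try {
      method1();
    } catch (StackOverflowError e) {
      // The stack has been unwound by the time we get here; just stop.
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println("Calls before the stack overflowed: " + callsBeforeOverflow());
  }
}
```

In the slide's version, "Beginning of method 1" would print over and over and "End of method 1" would never print.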

15
Making our disturbing code work
  • void method1(int depth) {
  •   if (depth <= 0) return;
  •   else {
  •     println("Beginning of method 1");
  •     method1(depth - 1);
  •     println("End of method 1");
  •   }
  • }
  • Good recursive functions have a base case (where
    they stop) and a recursive case (where they call
    themselves)
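
To watch the base case and the recursive case interact, this sketch keeps the same structure but records each line in a list, tagged with the current depth, instead of printing it. RecursionTrace and trace() are hypothetical names added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RecursionTrace {
  // Same shape as method1(int depth) on the slide, but logging to a list.
  static void method1(int depth, List<String> log) {
    if (depth <= 0) {
      return;                        // base case: stop here
    }
    log.add("Beginning of method 1 (depth " + depth + ")");
    method1(depth - 1, log);         // recursive case: one level closer to 0
    log.add("End of method 1 (depth " + depth + ")");
  }

  // Convenience wrapper that returns the recorded lines.
  static List<String> trace(int depth) {
    List<String> log = new ArrayList<String>();
    method1(depth, log);
    return log;
  }

  public static void main(String[] args) {
    for (String line : trace(2)) {
      System.out.println(line);
    }
    // Beginning of method 1 (depth 2)
    // Beginning of method 1 (depth 1)
    // End of method 1 (depth 1)
    // End of method 1 (depth 2)
  }
}
```

All the "Beginning" lines come first and the "End" lines follow in reverse order, because each call has to wait for the deeper call to return.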

16
Example: Following links (recursively)
  • // When depth + 1 > maxDepth, we stop
  • if (depth + 1 <= maxDepth) {
  •   // Get a list containing just the links
  •   LinkTag tempLink = new LinkTag();
  •   NodeList links =
  •       nodeList.searchFor(tempLink.getClass(), true);
  •   for (int i = 0; i < links.size(); i++) {
  •     // Get the link
  •     LinkTag lnk = (LinkTag) links.elementAt(i);
  •     // extractLink() gives us an absolute link
  •     String href = lnk.extractLink();
  •     // Recursively parse
  •     parsePage(depth + 1, href, altText);
  •   }
  • }
17
Putting the pieces together
  • ImageCollage from CNN