Title: Building Intelligent Web Agents with CFML Michael Dinowitz November, 2000
1 Building Intelligent Web Agents with CFML
Michael Dinowitz
November, 2000
2 Intelligent Agents in ColdFusion
- What are Agents?
  - Code that does automatic work for you
  - Involves retrieving information
  - Processing or storing that information
  - Usually a single page or has no interface
- What are Intelligent Agents (IA)?
  - Term used for a specific class of agents
  - Retrieves remote information
  - Processes the retrieved information
  - Decision-making code built in
  - Usually involves parsing operations
  - Interfaces with remote processes
3 Intelligent Agents in ColdFusion
- What aren't Intelligent Agents?
  - Push of any sort (CFMAIL)
  - Calls to structured locations
    - DBs
    - LDAP
    - Browsers
- Grey Areas - Structured data
  - Syndicated data (Spectra)
  - HTTP query returns
  - Comma-delimited information
  - Most local information calls
4 Intelligent Agents in ColdFusion
- Broad examples
  - CF_StockGrabber - grabs and processes stock information
  - CF_UPS - interface to UPS shipping data
  - CF_MetaSearch - searches multiple search engines and collates results
  - CF_GetTags
5 Intelligent Agents in ColdFusion
- Technologies used for retrieval
  - CFHTTP - retrieves websites
  - CFFTP - retrieves FTP information
  - CFX_Socket - socket calls for information
  - CFX_NNTP - retrieves Usenet news
- Technologies used for parsing
  - Find() / FindNoCase()
  - Replace() / ReplaceNoCase()
  - Mid()
  - REFind() / REFindNoCase()
  - REReplace() / REReplaceNoCase()
6 IA technique I - CF_EbayItem
- 1. Define what you want
  - A page from eBay with the results of a search
- 2. Define how it will be displayed
  - Whole page returned in a variable. No parsing
- 3. Define the steps to get it
  - CFHTTP to retrieve a page
  - Place information in file or on browser
7 CFHTTP Basics
- Url - URL to retrieve. Does not need the http:// prefix
- Method - Get or Post
- ResolveUrl - Turns all relative links into full ones. Needed for graphics and links from the page
- Notes
  - The URL does not need to be prefixed by http://, but it's good practice to do so.
  - Get is standard and uses the tag as is. Post requires a CFHTTPPARAM as well as a closing CFHTTP tag.
  - ResolveUrl should only be used when you expect to follow links from the called page or want to see the media content.
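The two call styles in the notes above can be sketched as follows. This is only an illustration: the example.com URLs, the form field name `query`, its value, and the variable names `PageText` and `ResultText` are placeholders, not part of the original tags, and both calls need network access to run.

```cfml
<!--- GET: the tag is used as is; the page body comes back in CFHTTP.FileContent --->
<cfhttp url="http://www.example.com/" method="GET" resolveurl="true">
<cfset PageText = CFHTTP.FileContent>

<!--- POST: needs at least one CFHTTPPARAM and a closing CFHTTP tag --->
<cfhttp url="http://www.example.com/search.cfm" method="POST">
    <cfhttpparam type="FormField" name="query" value="hebrew amulets">
</cfhttp>
<cfset ResultText = CFHTTP.FileContent>

<cfoutput>#Len(PageText)# characters retrieved by GET</cfoutput>
```

ResolveUrl is set only on the GET call here, on the assumption that its page will be displayed or have its links followed.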
8 IA technique I - CF_EbayItem
- IA technique I - CF_Ebay (Code)
<CFPARAM name="attributes.ReturnVar" default="ReturnVar">
<CFHTTP url="http://search-desc.ebay.com/search/search.dll?MfcISAPICommand=GetResult&ebaytag1=ebayreg&ht=1&query=#attributes.searchitem#&ebaytag1code=0&srchdesc=y&SortProperty=MetaNewSort" method="GET" resolveurl="true">
<CFSET "Caller.#Attributes.ReturnVar#" = CFHttp.FileContent>
9 IA technique II - CF_EbayItem
- 1. Define what you want
  - All items from an eBay search
- 2. Define how it will be displayed
  - In a return array
- 3. Define the string to search for in the page
  - ...ViewItem&item=449570667">HEBREW AMULETS By T Schrire...
- 4. Define the steps to get it
  - CFHTTP to retrieve a page
  - CFLOOP over the page for elements
  - FindNoCase() to get start of specific element
  - FindNoCase() to get end of specific element
  - Mid() to get whole element
  - Place information in array for return
10 Find()/FindNoCase() Basics
- FindNoCase(substring, string [, start])
  - SubString - The exact string you're looking for
  - String - The string that you're searching
  - Start - Optional start position
- Notes
  - FindNoCase is slightly slower, but better when you don't know the exact case of the text you're looking for.
  - Always a good idea to set a start. Speeds up the search.
  - Remember that the return value is the START position of the SubString. Add the SubString length to get the end position.
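A small sketch of the last note, using a hard-coded string in place of a retrieved page. The sample text and the variable names `PageText`, `Marker`, `StartPos`, and `EndPos` are made up for illustration.

```cfml
<!--- Sample text standing in for a retrieved page --->
<cfset PageText = "<html><body><B>Price:</B> 19.95 USD</body></html>">
<cfset Marker = "<b>price:</b>">

<!--- Case-insensitive search beginning at position 1; returns 0 when nothing is found --->
<cfset StartPos = FindNoCase(Marker, PageText, 1)>

<cfif StartPos GT 0>
    <!--- StartPos is where the marker begins; adding its length gives the first character after it --->
    <cfset EndPos = StartPos + Len(Marker)>
    <cfoutput>Marker starts at #StartPos#; the text after it starts at #EndPos#</cfoutput>
<cfelse>
    <cfoutput>Marker not found</cfoutput>
</cfif>
```

The marker is written in lower case while the page uses mixed case, which is the situation FindNoCase is meant for.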
11 Mid() Basics
- Mid(string, start, count)
  - String - The string that contains the SubString you want
  - Start - The start position of the SubString you want
  - Count - The number of characters in the SubString that you want
- Notes
  - When used with FindNoCase, it is usual to have a start variable and an end variable. The count would then be End-Start
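The start/end pattern from the note might look like this in practice. The sample markup and the names `PageText`, `StartPos`, `EndPos`, and `Element` are assumptions for illustration; the 4 added to the end position is the length of the closing `</a>` tag.

```cfml
<!--- Sample markup standing in for a retrieved page --->
<cfset PageText = '<td><a href="item.cfm?id=1">First item</a></td>'>

<!--- Start of the element: where the opening anchor tag begins --->
<cfset StartPos = FindNoCase("<a ", PageText, 1)>

<!--- End of the element: position of the closing tag plus its 4 characters --->
<cfset EndPos = FindNoCase("</a>", PageText, StartPos) + 4>

<!--- Count is End - Start --->
<cfset Element = Mid(PageText, StartPos, EndPos - StartPos)>

<!--- HTMLEditFormat escapes the markup so the tag itself is shown in the browser --->
<cfoutput>#HTMLEditFormat(Element)#</cfoutput>
```

A production version would check that both FindNoCase calls returned a non-zero position before calling Mid.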
12 IA technique II - CF_EbayItem
<CFPARAM name="Attributes.ReturnArray" default="ReturnArray">
<CFHTTP url="http://search-desc.ebay.com/search/search.dll?MfcISAPICommand=GetResult&ebaytag1=ebayreg&ht=1&query=#Attributes.SearchItem#&ebaytag1code=0&srchdesc=y&SortProperty=MetaNewSort" method="GET" resolveurl="true">
<CFSET LocalArray = ArrayNew(1)>
13 IA technique II - CF_EbayItem
- ...FindNoCase('...y.com/aw-cgi/eBayISAPI.dll?ViewItem&item=', cfhttp.filecontent, end)
- ...FindNoCase('...', cfhttp.filecontent, start)+4
- ...Mid(cfhttp.filecontent, start, end-start))
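The steps of technique II can be sketched as a loop. This is an illustration under stated assumptions, not the original tag's code: a hard-coded string stands in for CFHTTP.FileContent so the example runs without a network call, and the `view.cfm` links and the variable names `PageText`, `LinkStart`, `StartPos`, `ClosePos`, `EndPos`, and `Temp` are invented.

```cfml
<!--- Sample markup standing in for CFHTTP.FileContent; the links are invented --->
<cfset PageText = '<a href="view.cfm?item=101">First</a> text <a href="view.cfm?item=102">Second</a> more text'>
<cfset LinkStart = '<a href="view.cfm?item='>
<cfset LocalArray = ArrayNew(1)>
<cfset EndPos = 1>

<cfloop condition="true">
    <!--- Look for the next link, resuming where the previous one ended --->
    <cfset StartPos = FindNoCase(LinkStart, PageText, EndPos)>
    <cfif StartPos EQ 0>
        <cfbreak>
    </cfif>
    <!--- Find the closing tag; stop if the markup is incomplete --->
    <cfset ClosePos = FindNoCase("</a>", PageText, StartPos)>
    <cfif ClosePos EQ 0>
        <cfbreak>
    </cfif>
    <!--- End of the element is the closing tag plus its 4 characters --->
    <cfset EndPos = ClosePos + 4>
    <cfset Temp = ArrayAppend(LocalArray, Mid(PageText, StartPos, EndPos - StartPos))>
</cfloop>

<cfoutput>#ArrayLen(LocalArray)# links found</cfoutput>
```

Passing the previous end position as the start of the next search is what keeps the loop moving forward through the page.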
14 IA technique III - CF_EbayItem
- 1. Define what you want
  - All items from an eBay search
- 2. Define how it will be displayed
  - In a return array
- 3. Define the string to search for in the page
  - ...ViewItem&item=449570667">HEBREW AMULETS By T Schrire...
- 4. Define the steps to get it
  - CFHTTP to retrieve a page
  - CFLOOP over the page for elements
  - REFindNoCase() to get specific element
  - Mid() to get whole element
  - Place information in array for return
15 REFind()/REFindNoCase() Basics
- REFindNoCase(RegEx, String [, Start] [, ReturnSub])
  - RegEx - Regular expression to use as search criteria
  - String - String to search in
  - Start - Position in String to start search at
  - ReturnSub - Returns subexpressions as defined in the RegEx
- Notes
  - Start should always be used as it speeds up the search. If using ReturnSub, it is required and can be set to 1.
  - This function returns the numeric position of the searched-for text unless ReturnSub is specified. Then it returns a structure.
16 REFind()/REFindNoCase() Basics
- The structure returned by this function will have two keys (Pos, Len), with each key being an array. The first array element (Variable.Pos[1], Variable.Len[1]) will always contain the position and length of the ENTIRE match. Each additional array element will contain the position and length of a subexpression.
- Variable
  - Pos
    - [1]
    - [2]
  - Len
    - [1]
    - [2]
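A short sketch of reading that structure, assuming a pattern with one subexpression. The sample text, the pattern, and the names `PageText`, `Match`, `Whole`, and `ItemNumber` are made up for illustration.

```cfml
<!--- Sample text standing in for a retrieved page --->
<cfset PageText = 'See <a href="view.cfm?item=12345">Old book</a> for details'>

<!--- One subexpression (the item number); Start is 1 and ReturnSub is True --->
<cfset Match = REFindNoCase('item=([0-9]+)"', PageText, 1, True)>

<!--- When nothing matches, Pos[1] and Len[1] are both 0 --->
<cfif Match.Pos[1] GT 0>
    <!--- Element 1 covers the entire match; element 2 covers the first subexpression --->
    <cfset Whole = Mid(PageText, Match.Pos[1], Match.Len[1])>
    <cfset ItemNumber = Mid(PageText, Match.Pos[2], Match.Len[2])>
    <cfoutput>#HTMLEditFormat(Whole)# / #ItemNumber#</cfoutput>
</cfif>
```

Because Pos and Len line up index by index, each pair can be passed straight to Mid() as its start and count.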
17 RegEx Basics
- The following is a fast rundown of important characters in Regular Expressions
  - In most cases, a character is equal to itself
  - A \ will escape any special character
  - A period (.) represents any one character
    - .at can mean bat, cat, rat, or anything that has a single character and ends with at.
  - A pair of brackets ([]) denotes a set of characters (i.e. one of them can be used)
    - [01256] means any one of those numbers
  - A dash (-) within a set means a range of characters
    - [0-9] means any single number of 0 through 9
  - A caret (^) at the start of a set means not the set
    - [^aeiou] means any character but a vowel
18 RegEx Basics
- Parentheses are used to denote a compound expression OR a subexpression
  - (this) will return the position and length of the word this
- When used within a compound, a pipe (|) means either/or
  - (this|that) will return the position and length of the first occurrence of this or that
- A question mark (?) means that the previous character, set or compound may or may not exist, but if it does, it will exist 1 time
- A plus (+) means that the previous character, set or compound must exist 1 or more times
- An asterisk (*) means that the previous character, set or compound may exist 0 or more times
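The characters covered in the rundown can be tried out with REFind(), which returns the position of the first match. The test strings and the variable names `P1` through `P6` are invented for illustration.

```cfml
<!--- .at : any single character followed by "at"; matches "cat" at position 5 --->
<cfset P1 = REFind(".at", "the cat sat")>

<!--- [0-9]+ : one or more digits; the digits begin at position 6 --->
<cfset P2 = REFind("[0-9]+", "item 449570667")>

<!--- [^aeiou] : first character that is not a lowercase vowel; "x" is at position 4 --->
<cfset P3 = REFind("[^aeiou]", "aeixo")>

<!--- (this|that) : either word; "that" begins at position 5 --->
<cfset P4 = REFind("(this|that)", "not that one")>

<!--- colou?r : the u may or may not be present; "color" begins at position 4 --->
<cfset P5 = REFind("colou?r", "my color")>

<!--- \. : an escaped period matches only a literal period, here at position 5 --->
<cfset P6 = REFind("\.", "ebay.com")>

<cfoutput>#P1# #P2# #P3# #P4# #P5# #P6#</cfoutput>
```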
19 IA technique III - CF_EbayItem
<CFPARAM name="Attributes.ReturnArray" default="ReturnArray">
<CFHTTP url="http://search-desc.ebay.com/search/search.dll?MfcISAPICommand=GetResult&ebaytag1=ebayreg&ht=1&query=#Attributes.SearchItem#&ebaytag1code=0&srchdesc=y&SortProperty=MetaNewSort" method="GET" resolveurl="true">
<CFSET LocalArray = ArrayNew(1)>
20 IA technique III - CF_EbayItem
- ...REFindNoCase('...y\.com/aw-cgi/eBayISAPI\.dll\?ViewItem&item=[0-9]+"...', cfhttp.filecontent, end, 1)
21 IA technique III - CF_EbayItem
- ...Mid(cfhttp.filecontent, Item.pos[1], Item.len[1]))
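Technique III replaces the two FindNoCase() calls with a single REFindNoCase() call. The loop below is a sketch under stated assumptions rather than the original tag's code: a hard-coded string stands in for CFHTTP.FileContent, and the `view.cfm` links, the pattern, and the names `PageText`, `EndPos`, `Item`, and `Temp` are invented.

```cfml
<!--- Sample markup standing in for CFHTTP.FileContent; the links are invented --->
<cfset PageText = '<a href="view.cfm?item=101">First</a> text <a href="view.cfm?item=102">Second</a> more text'>
<cfset LocalArray = ArrayNew(1)>
<cfset EndPos = 1>

<cfloop condition="true">
    <!--- One call returns both the position and the length of the next link --->
    <cfset Item = REFindNoCase('<a href="view\.cfm\?item=[0-9]+">[^<]*</a>', PageText, EndPos, True)>
    <cfif Item.Pos[1] EQ 0>
        <cfbreak>
    </cfif>
    <!--- Pos[1] and Len[1] describe the entire match, so they feed Mid() directly --->
    <cfset Temp = ArrayAppend(LocalArray, Mid(PageText, Item.Pos[1], Item.Len[1]))>
    <!--- Resume the search just past this match --->
    <cfset EndPos = Item.Pos[1] + Item.Len[1]>
</cfloop>

<cfoutput>#ArrayLen(LocalArray)# links found</cfoutput>
```

Compared with the FindNoCase() version, the pattern also checks that the item number is numeric, so stray links with a similar prefix are skipped.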
22 Extra Information
- CFHTTP Headers - extra information returned by a CFHTTP (or any HTTP) call
  - FILECONTENT - Text grabbed
  - HEADER - Header info (including cookies)
  - MIMETYPE - Return MIME type
  - RESPONSEHEADER - Structure with all information except content
  - STATUSCODE - HTTP return code
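One way an agent might use these values is to check the status before parsing. This is a sketch only: the example.com URL is a placeholder, the call needs network access, and it assumes StatusCode comes back as a string beginning with the numeric code (for example "200 OK").

```cfml
<cfhttp url="http://www.example.com/" method="GET">

<!--- Only parse the page when the request succeeded --->
<cfif Left(CFHTTP.StatusCode, 3) EQ "200">
    <cfoutput>Retrieved #Len(CFHTTP.FileContent)# characters of type #CFHTTP.MimeType#</cfoutput>
<cfelse>
    <cfoutput>Request failed: #CFHTTP.StatusCode#</cfoutput>
</cfif>
```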
23 Syndication (WDDX Queries)
- Can return structured information as a query
- Better to use WDDX to send query encoded in a packet
- Basis of Spectra syndication
- Can pass binary files encoded with ToBase64() function
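A minimal sketch of sending a query as a WDDX packet, with both ends shown on one page for brevity. The query contents and the names `Products`, `Packet`, `Received`, and `Temp` are invented; in a real syndication setup the packet would be served by one site and retrieved by the agent with CFHTTP.

```cfml
<!--- Build a small query to stand in for real data --->
<cfset Products = QueryNew("Product,Price")>
<cfset Temp = QueryAddRow(Products)>
<cfset Temp = QuerySetCell(Products, "Product", "Red paint")>
<cfset Temp = QuerySetCell(Products, "Price", "12.50")>

<!--- Sending side: serialize the query into a WDDX packet (a plain string) --->
<cfwddx action="CFML2WDDX" input="#Products#" output="Packet">

<!--- Receiving side: turn the packet back into a query --->
<cfwddx action="WDDX2CFML" input="#Packet#" output="Received">

<cfoutput query="Received">#Product# - #Price#<br></cfoutput>
```

Because the packet deserializes directly into a query, the receiving agent needs no parsing code at all.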