News Aggregator - PowerPoint PPT Presentation

Provided by: cun79

1
News Aggregator
  • A news aggregator is a system (a software
    application, webpage, or service) that collects
    syndicated content using RSS and other XML feeds
    from weblogs and mainstream media sites.
    Aggregators reduce the time and effort needed to
    regularly check websites of interest for
    updates, creating a unique information space or
    "personal newspaper." An aggregator can
    subscribe to a feed, check for new content at
    user-determined intervals, and retrieve the
    content. The content is sometimes described as
    being "pulled" to the subscriber, as opposed to
    "pushed" with email or other channels. Unlike
    recipients of some "pushed" information, the
    aggregator user can easily unsubscribe from a
    feed.
  • Software which allows syndicated news content
    (such as RSS feeds) to be brought together and
    displayed.

2
Introduction to Python
  • For Engr 101 5-Week News Aggregator Module
  • Fall 2010
  • Instructor: Tao Wang

3
What is a computer?
4
Computer Organization
5
Software / Programs
  • Computer Programs instruct the CPU which
    operations it should perform/execute
  • Programmers write the code for the tasks a
    computer program performs
  • But computers only understand binary (1s and
    0s)
  • Programmers need a language to write computer code

6
Types of Programming Languages
  • Low-Level Languages
  • Machine Languages
  • CPU instructions are binary strings of 1s and
    0s, e.g. 10010000
  • Each kind of CPU has a different instruction set
  • Programs are not portable across CPU
    architectures
  • Difficult for programmers to read/write
  • Assembly Languages
  • Use English-like abbreviations to represent CPU
    instructions
  • A bit easier to understand, e.g. MOV AL, 42h
  • Converted to machine language using an assembler
  • Still not portable

7
Types of Programming Languages
  • High-Level Languages
  • C/C++, Java/C#, Python, Ruby, many more...
  • These languages abstract hardware implementation
    details
  • Provide programmers a logical computer model
  • Allow programmers to focus on solving problems
    instead of low-level hardware details
  • Use English-like keywords and statements to write
    code
  • Use a compiler or interpreter that translates
    code to machine language
  • Make code portable across different CPUs and
    platforms
  • Programmer does not have to learn each CPU's
    instructions

8
Compiled vs. Interpreted
  • Both Compilers and Interpreters translate source
    code to machine language
  • Compiled
  • Must compile program on each target CPU
    architecture prior to execution.
  • Interpreted
  • Code can be directly run on any platform that has
    an available interpreter

9
About Python
  • Designed by Guido van Rossum in the late 1980s
  • Interpreted programming language
  • Imperative, Dynamic, and Object-Oriented
  • Python Programs
  • are a sequence of instructions written in Python
    language
  • interpreter executes the instructions
    sequentially, in order
  • programs can take inputs and send data to outputs
  • programs can process and manipulate data
  • programs may read and write data to RAM, Hard
    Drive, ...
  • First Program
  • print "Welcome to learning Python."

10
PYTHON: LET'S BEGIN
  • The Python Shell (2.6)
  • You type commands
  • It does stuff
  • It converts Python to machine instructions and
    runs them right away

11
Python as Calculator
  • >>> 2 + 2      # add
  • 4
  • >>> 2 * 3      # multiply
  • 6
  • >>> 3 ** 2     # powers
  • 9
  • >>> 12 % 11    # modulo (remainder)
  • 1

12
Division
  • Integer division
  • 1/4  # integer division, no decimals
  • 0
  • Float division
  • 1.0/4.0  # float number division
  • 0.25
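
The two kinds of division can be tried side by side. This is a minimal sketch written for Python 3 (an assumption; the slides use Python 2.6, where `1/4` between two integers already yields 0), so floor division is spelled `//`. The variable names are illustrative only.

```python
# Python 3 spelling: // is integer (floor) division, / is float division.
int_quotient = 1 // 4         # the fractional part is dropped
float_quotient = 1.0 / 4.0    # the decimals are kept
remainder = 12 % 11           # modulo gives what is left over

print(int_quotient, float_quotient, remainder)
```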

13
Literals, Variables, Data Types, Statements and Expressions
14
Literals, Data Types
  • Numbers
  • Integers are whole numbers: ..., -2, -1, 0, 1,
    2, ... (32 bits)
  • Floats contain decimals: 4.5, 67.444443335,
    7E2...
  • Booleans: True, False
  • Long: ints that exceed 32-bit capacity
  • Complex numbers: 4.5 + 1j, 1j * 1j = -1 + 0j
  • Strings
  • Strings are used to represent words, text, and
    characters
  • examples (can use single, double or triple
    quotes)
  • "I am learning python."
  • 'hello.'

15
Variables
  • Literals are data values that our programs use
  • Data is stored in memory, and we need a way to
    access it
  • We use variables to name our data in memory
  • We can retrieve data by calling the name that
    refers to it
  • student_name = "Ben"
  • print student_name
  • = is the assignment operator, assigns a data
    value to a variable

16
Variables Syntax Rules
  • Variable names must begin with a letter
    (uppercase or lowercase) or underscore (_)
  • good programming convention: variables should
    not start with uppercase letters, which are
    commonly used for something else
  • remainder of name can contain letters,
    underscores (_), and numbers
  • names may not contain spaces or special
    characters
  • names are case sensitive and must not be a
    reserved python keyword
  • myVariable and myvariable refer to different data

17
Statements and Expressions
  • Statements perform a task; they do not return a
    value
  • x = 2
  • y = 3
  • print y
  • Expressions return a value
  • >>> x + y
  • 5

18
Expressions (evaluate to values)
  • Math expressions
  • >>> 10 * 2 + 3
  • >>> 10 * (2.0 + 3)
  • Boolean expressions
  • >>> 10 < 2     # False
  • >>> 10 >= 10   # True
  • Combined with logic operators
  • >>> (10 < 2) or (10 == 10)
  • Can combine
  • >>> (a*c+d) > (d*a-c)

19
Expressions (evaluate to values)
  • String expressions
  • >>> "hel" + "lo"   # "hello"
  • >>> "Hi" * 3       # "HiHiHi"
  • Operator Precedence
  • Parentheses
  • Exponentiation
  • Multiplication and Division
  • Addition and Subtraction

20
Operator Precedence (top-to-bottom)
21
Data Types
  • Finding out a data type

22
Data Types
  • What if data types don't match?
  • STRONG TYPES: no automatic conversion (for
    non-number types)

23
Data Types
  • Explicit conversion
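
The three data-type slides above (finding a type, mismatched types, explicit conversion) can be illustrated together. This is a small sketch in Python 3 syntax; the variable names are made up for the example.

```python
count = 3
label = "items"

# type() reports the data type of a value
kind_of_count = type(count)    # int
kind_of_label = type(label)    # str

# Strong typing: a string and a number are not combined automatically.
try:
    broken = label + count
except TypeError:
    broken = None              # the addition was rejected

# Explicit conversion with str(), int() and float()
message = str(count) + " " + label
total = int("42") + count
ratio = float("2.5") * 2
```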

24
Python Keywords, User Input
25
Python Keywords
  • RESERVED: do not use as variable names

26
User Input
  • Create interactive programs by requesting user
    input

27
Control Structures
28
Branching / Conditional Statements
  • Decision making

29
if - statement
  • if a < 0:
  •     print "a is negative"

30
if - else
31
if - elif - else
  • If one test fails, perform next test
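
A chain of tests might look like the sketch below, which extends the "a is negative" example from the if-statement slide. The function name `describe_sign` is hypothetical.

```python
def describe_sign(a):
    """Return a word describing the sign of the number a."""
    if a < 0:
        return "negative"
    elif a == 0:          # checked only when the first test fails
        return "zero"
    else:                 # reached only when every test above fails
        return "positive"

print(describe_sign(-4))
```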

32
Nested if statements
33
Modules
34
Modules
  • Python strength: large collection of open source
    modules
  • Modules are collections (packages) of useful
    (tested) code you can reuse
  • Common modules: random, math
  • The modules we use for the project:
  • urllib, xml

35
Modules
  • Python Standard Library (packages included with
    most python distributions)
  • http://docs.python.org/library/index.html
  • PyPI (Python Package Index)
  • http://pypi.python.org/pypi
  • repository of optional modules available (11,000)

36
Using Modules
  • Math module contains many useful functions and
    values: math.sin, math.cos, math.pow, math.sqrt,
    ...
  • Using modules
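
As a brief illustration of using the math module, the sketch below imports it in the two common ways. The variable names are illustrative.

```python
import math              # access names as math.<name>
from math import sqrt    # or import a single name directly

hypotenuse = math.sqrt(3 ** 2 + 4 ** 2)
power = math.pow(2, 10)          # math.pow always returns a float
cosine_of_zero = math.cos(0)
same_root = sqrt(16)             # no "math." prefix needed here
```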

37
Getting help
  • In python interpreter you can get documentation

38
Control Structures: Repetition, Iteration
39
Repetition
  • selection statements (if-elif-else) let us make
    simple decisions
  • repeating statements and decisions let us build
    more complex programs

40
while
41
Testing Primeness
42
break statement
  • break immediately ends loops
  • DO NOT overuse: can be difficult to
    read/understand logic

43
Testing Primeness
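
The code for this slide is not in the transcript. One possible primality test using while and break is sketched below; `is_prime` and its trial-division approach are assumptions, not necessarily what the slide showed.

```python
def is_prime(n):
    """Trial division; n is assumed to be an integer."""
    if n < 2:
        return False
    prime = True
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            prime = False
            break             # a divisor was found, stop looping at once
        divisor += 1
    return prime

print(is_prime(97))
```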
44
range(...)
  • Built-in function, produces a sequence (list)
  • range(0, 3) → [0, 1, 2]
  • range(3) → [0, 1, 2]
  • range(1, 3) → [1, 2]
  • range(1, 7, 2) → [1, 3, 5]
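
These calls can be checked directly. The sketch assumes Python 3, where range() is lazy, so list() is used to show the values; in Python 2.6, as used in the slides, range() returns the list itself.

```python
# start defaults to 0, the stop value is excluded, step defaults to 1
from_zero = list(range(0, 3))
short_form = list(range(3))
from_one = list(range(1, 3))
every_second = list(range(1, 7, 2))

print(from_zero, short_form, from_one, every_second)
```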

45
for
  • The for loop is used for iteration

46
continue statement
  • break and continue work in both while and for
    loops

47
Find all primes
48
while - else
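
The else branch of a while loop runs only when the loop finishes without break. The sketch below combines that with continue to collect primes; `first_divisor` is a hypothetical helper and the upper bound of 30 is arbitrary.

```python
def first_divisor(n):
    """Return the smallest divisor of n above 1, or None if n is prime.

    n is assumed to be an integer of at least 2.
    """
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            result = divisor
            break
        divisor += 1
    else:
        # Reached only when the loop ended without break
        result = None
    return result

primes = []
for n in range(2, 30):
    if first_divisor(n) is not None:
        continue              # composite: skip to the next number
    primes.append(n)
```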
49
Nesting Control Structures
  • We can nest control structures (if, while, for)
  • We can nest many times
  • while
  • while
  • while
  • If
  • for ...
  • There is a limit. If you reach it, something is
    WRONG
  • Abuse makes code unreadable. Use functions
    instead... (more in a bit)

50
Counter-Controlled Loops
51
Sentinel-Controlled Loops
52
Accumulating
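
The three loop patterns named on the last three slides can be sketched as follows. The sentinel value -1 and the function name `sum_until_sentinel` are assumptions chosen for illustration.

```python
# Counter-controlled loop: runs a fixed number of times,
# accumulating a running total along the way.
count = 0
squares_total = 0
while count < 5:
    squares_total += count * count
    count += 1

# Sentinel-controlled loop: runs until a special marker value appears.
def sum_until_sentinel(values, sentinel=-1):
    total = 0                 # the accumulator starts at zero
    for value in values:
        if value == sentinel:
            break
        total += value
    return total

partial_sum = sum_until_sentinel([4, 8, 15, -1, 16])
```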
53
Swapping
  • x = 2
  • y = 3
  • Swap (WRONG)
  • x = y
  • y = x
  • x → 3
  • y → 3
  • Swap (CORRECT)
  • z = x
  • x = y
  • y = z
  • x → 3
  • y → 2

54
Multiple Assignments
  • aInt, bInt, cInt = 1, 2, 3
  • Swapping with multiple assignment
  • aInt, bInt = bInt, aInt
  • Why does this work? (entire right side is
    evaluated before assignments)

55
Everything is an object
56
Debugging
57
Debugging
  • Syntax Errors: Python gives us an alert, code
    crashes
  • Runtime Errors: how do we fix incorrect results
    (logic) in our programs?
  • We need to trace the code's execution flow.
  • Tracing: keep track of variable values as each
    line is executed
  • Print Statements: strategically add print to view
    results at each step; don't overdo it or it will
    be difficult to keep track
  • Can help us detect Infinite Loops

58
More Data Types...(LISTS)
59
Collection Types
  • Lists
  • Sequential and mutable
  • >>> k = [1, 3, 5]
  • >>> m = ['hel', 3]
  • Tuples
  • Sequential and immutable
  • >>> (1, 2, 3)
  • Dictionaries
  • map collection
  • >>> d = {'name': 'Alice', 'grade': 100}
  • >>> print d['name']
  • Alice
  • Sets
  • Have unique elements
  • >>> aSet = set(['a', 'b'])

60
Lists (also called arrays)
  • Lists are sequences of objects
  • Mutable (unlike strings, and tuples)
  • Lists are defined with square brackets [ ],
    elements are comma (,) separated
  • List elements can have different types
  • List indexing starts at 0
  • If index is negative, begin at end of list
  • If index is past the last element: ERROR

61
List access
  • Indexing and Slicing just like strings (same
    rules apply)

62
Working with lists
  • Can convert other collections to lists
  • List can contain mixed types, including other
    collections

63
Indexing lists of lists
  • Lists can be nested

64
List operators
  • + concatenates two lists (list1 + list2)
  • * repeats the list a number of times (list1 *
    Integer)
  • in tests membership
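
The three list operators can be seen in a few lines. `list1` and `list2` follow the names used on the slide; the other names are illustrative.

```python
list1 = [1, 2]
list2 = [3, 4]

joined = list1 + list2       # concatenation builds a new list
repeated = list1 * 3         # repetition builds a new list
has_three = 3 in joined      # membership test gives a boolean
has_nine = 9 in joined
```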

65
List comparisons
  • >, <, ==, <=, >=, !=
  • Similar rules to strings, compares elements in
    order
  • ordered elements being compared should have same
    type

66
Collection functions
  • len(C) returns the length of the collection C
  • min(C) returns the minimum element in C; for a
    list of lists, sublists are compared starting
    with their first elements
  • max(C) returns the maximum element in C; for a
    list of lists, sublists are compared starting
    with their first elements
  • sum(L) returns the sum of elements in list L;
    elements must all be numbers

67
Lists can change
  • Lists can be modified, MUTABLE
  • Strings cannot be changed, IMMUTABLE

68
List methods
  • There are Non-modifying methods (don't change the
    list)
  • index(x)
  • count(x)
  • and Modifying methods (Will change the list
    without need for assignment)
  • append(x)
  • extend(C)
  • insert(i, x)
  • remove(x)
  • pop()
  • sort()
  • reverse()

69
Appending and Extending
  • append(...) adds a single element to a list
  • extend(...) joins two lists
  • can also use '+='
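
The difference between the two methods shows up when the argument is itself a list. A short sketch with made-up list names:

```python
letters = ['a', 'b']
letters.append('c')              # adds one element
letters.extend(['d', 'e'])       # adds each element of the other list

nested = ['a', 'b']
nested.append(['c', 'd'])        # the whole list becomes ONE element

grown = ['a', 'b']
grown += ['c']                   # behaves like extend for lists
```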

70
List methods
  • sort()
  • count(x)
  • pop()
  • del keyword also removes an element

71
split() and join()
  • Going from strings to lists and back again
  • These are string methods
  • join(C) takes as an argument any iterable
    collection type (such as lists, strings, tuples)
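
A round trip between strings and lists might look like this sketch; the sample strings are invented for the example.

```python
sentence = "news from many feeds"

words = sentence.split()             # splits on whitespace by default
dashed = "-".join(words)             # the separator string does the joining
fields = "title,pubDate,guid".split(",")
spelled = ", ".join("abc")           # a string is iterable too
```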

72
List Comprehension
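
The slide body is missing from the transcript. As a minimal sketch of the idea, a list comprehension builds a new list from a loop and an optional filter in a single expression:

```python
squares = [n * n for n in range(5)]
evens = [n for n in range(10) if n % 2 == 0]

# The same result as `squares`, written as an ordinary loop
squares_loop = []
for n in range(5):
    squares_loop.append(n * n)
```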
73
Strings
74
Quote Use
  • Single Quotes
  • These strings must fit on a single line of
    source
  • Double Quotes
  • Also has to fit on a single line of source
  • Triple (single or double) Quotes
  • """ These quotes are very useful when you need to
    span multiple lines. They are also often used for
    long code comments """

75
Quotes inside strings
  • To use apostrophes
  • "Let's use double quotes"
  • To use double quotes in our strings
  • 'They say, "use single quotes"'
  • Triple Quotes can take care of both cases
  • """ With 3 quotes it's "easy" to use apostrophes
    & quotes. """
  • ''' With 3 quotes it's "easy" to use apostrophes
    & quotes. '''

76
Slash \
  • We can use the \ to span multiple lines
  • Works with strings or expressions
  • No character can follow the \

77
Character escaping
  • Since some characters have special meanings, we
    have to escape them to use them in our strings
  • "We can \"escape\" characters like this"
  • 'Or let\'s escape them like this'
  • 'and this \\ is how we get a backslash in our
    string'

78
Whitespace
  • '' is an empty string, not a character
  • ' ' is a space
  • '\t' is a tab (a single character)
  • '\n' is a new line (in Unix/Mac OS X)
  • '\r\n' is a new line (in Windows)
  • '\r' is a new line (in old Mac < 9)

79
Strings are sequences
80
Simple string usage
  • Can access with indexing like lists
  • Strings do not have append(...) and extend(...)
    functions

81
Adding (+) and Repeating (*)
  • We can add (concatenate) strings with +
  • We can also repeat them with *

82
Compare strings
  • Test equality using ==
  • What about <, >, <=, >=

83
Strings
  • Strings are sequences like lists
  • Each element in a string is a character
  • Characters we can print: letters ('a', 'b', 'c',
    ...), numbers ('1', '3', '4', ...), and symbols
    ('@', ...)
  • Non printing characters
  • Whitespace '\t', '\n', '\r\n'
  • try printing this '\a'

84
Characters are really numbers
  • ASCII table

85
Character numerical values
86
Print the ABCs
  • Using numbers...
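
One way to print the alphabet from character codes is sketched below, using the built-in ord() and chr() functions. The slide's own code is not in the transcript, so this is an assumption about its approach.

```python
# ord() gives the number behind a character, chr() goes back again
code_of_A = ord('A')

alphabet = ""
for code in range(ord('A'), ord('Z') + 1):
    alphabet += chr(code)

print(alphabet)
```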

87
String Comparisons
  • Characters are compared by their numerical value
  • shorter strings are smaller
  • If first characters are equal, compare the next
    one

88
String Comparisons
  • These are characters, not numbers
  • Capital letters are smaller (refer to ascii table)
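
The comparison rules from the two slides above can be checked with a few expressions. The sample strings are invented; every comparison below evaluates to True.

```python
first_char_decides = "apple" < "banana"   # 'a' (97) < 'b' (98)
shorter_is_smaller = "app" < "apple"      # equal so far, shorter one wins
capitals_first = "Zebra" < "apple"        # 'Z' (90) < 'a' (97)
digits_as_text = "10" < "9"               # '1' (49) < '9' (57)
```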

89
Testing membership
90
import string
91
String is an Object
  • Objects contain
  • Data
  • x 'Hello' data is sequence of characters
  • Actions (methods)
  • things object can do (often on self)

92
Upper/Lower Case methods
  • These methods are available to all string objects
  • Strings are IMMUTABLE
  • this means that characters in the sequence cannot
    change
  • methods return a new string
  • original data is unchanged

93
What kind of character
  • These methods are available to all string objects
  • Tests that return boolean types
  • isalnum() - does the string contain only letters
    and digits
  • isalpha() - does the string contain only letters
  • isdigit() - does the string contain only digits

94
Formatting strings with strip(...)
95
Formatting strings
96
String Formatting
97
Formatting Floats
98
Creating Forms/Templates
99
Using replace
100
Output
101
find(...) rfind(...)
102
Go through all matches and capitalize
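
The transcript omits the code for this slide. A possible version, built on find() with a start position, is sketched below; `capitalize_matches` is a hypothetical name and the sample sentence is made up.

```python
headline = "the cat and the hat"

first = headline.find("the")      # index of the first match
last = headline.rfind("the")      # index of the last match
missing = headline.find("dog")    # -1 means "not found"

def capitalize_matches(text, word):
    """Return a copy of text with every occurrence of word uppercased."""
    if not word:
        return text               # an empty word would match everywhere
    result = text
    position = result.find(word)
    while position != -1:
        end = position + len(word)
        result = result[:position] + word.upper() + result[end:]
        position = result.find(word, end)   # continue after this match
    return result

shouted = capitalize_matches(headline, "the")
```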
103
Dictionaries
104
Dictionaries
  • Another collection type, but NOT a sequence
    (order doesn't matter)
  • Also referred to as an associative array, or map
  • Dictionaries are a collection of keys that point
    to values

105
Key --> Value
106
About Dictionaries
  • Define dictionaries using curly braces
  • key-value pairs are separated using colons
  • Dictionaries are MUTABLE (can add or remove
    entries)
  • Keys
  • Can only be IMMUTABLE objects (int, float,
    string, tuples)
  • Values
  • Can be anything
  • Idea: it is easier to find data based on a simple
    key, as in an English-language dictionary such as
    Webster's

107
Indexing and Assignment
  • Index using square brackets and keys; returns
    the associated value
  • Numbered indices are not defined
  • Can modify the dictionary by assigning new
    key-value pairs or changing value a key points to
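
Indexing and assignment on a dictionary can be sketched as follows. The `grades` dictionary and its contents are invented for the example.

```python
grades = {"Alice": 100, "Ben": 85}

alice_grade = grades["Alice"]     # index with a key, get its value
grades["Cara"] = 92               # a new key adds an entry
grades["Ben"] = 90                # an existing key has its value replaced

# Numbered indices are not defined: 0 is simply a key that is missing
try:
    grades[0]
    numbered_index_works = True
except KeyError:
    numbered_index_works = False

has_ben = "Ben" in grades         # membership tests the keys
names = sorted(grades)            # iterating a dictionary yields its keys
```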

108
Dictionaries with Different Key Types
  • Cannot index or search based on values, only
    through keys
  • Numbers, Strings, Tuples can be keys (anything
    IMMUTABLE)

109
Operators
  • [ ] for indexing using key inside square
    brackets
  • len() "length" is the number of key-value pairs
    in the dictionary
  • in boolean test of membership (is key in
    dictionary?)
  • for iterates through keys in the dictionary

110
Operators in use
111
Dictionary Methods
  • items() returns all the key-value pairs as a
    list of tuples
  • keys() returns a list of all the keys
  • values() returns a list of all the values
  • copy() returns a shallow copy of the dictionary

112
Methods
113
zip( ) - dict( )
  • zip() creates a list of tuples from two lists
  • dict() creates a dictionary from a mapping
    object, or an empty dictionary
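
Building a dictionary from two parallel lists might look like this sketch. It assumes Python 3, where zip() is lazy and list() is needed to see the pairs; the names and scores are made up.

```python
names = ["Alice", "Ben"]
scores = [100, 85]

pairs = list(zip(names, scores))   # pair up elements by position
lookup = dict(pairs)               # each (key, value) tuple becomes an entry
empty = dict()                     # no arguments gives an empty dictionary
```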

114
Functions
115
Why Use Functions?
  • Functions provide encapsulation, making code
    better and more readable
  • Divide and Conquer Problem Solving
  • Break large complicated problems into smaller
    sub-problems
  • Solution to sub-problems written in functions
  • sub-problem solutions can be reused, shared
  • Simplification/Readability
  • Removes duplicate code with function calls

116
Why Use Functions?
  • Abstraction
  • Provides a high-level interface to program
  • You know WHAT it does, not HOW it does it
  • Security
  • A small well defined piece of code is easier to
    prove correct
  • Code can be fixed in one place, and will be
    corrected everywhere function is called

117
Function Definition
118
Function Calls
119
Functions that do things...
  • no parameters
  • no return statement
  • affect the environment

120
Functions that have parameters...
  • definition has parameters
  • call takes arguments
  • no return statement
  • affect the environment

121
Functions that return results...
  • return keyword
  • Function performs processing
  • Returns value/object

122
Functions with default parameters...
  • Parameters can have default values
  • We can call this function with
  • print_message("Hello class.")
  • print_message("Hello class.", 3)
  • Can explicitly name arguments
  • print_message(times=3, msg="Hello class.")
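
The definition of print_message is not in the transcript. The sketch below is one definition that would accept all three calls; the parameter names come from the slide, while the default of 1, the body, and the returned list are assumptions.

```python
def print_message(msg, times=1):
    """Print msg `times` times and return the printed lines.

    The default value, the body and the return value are assumptions
    made for this sketch; the slide shows only the calls.
    """
    lines = [msg] * times
    for line in lines:
        print(line)
    return lines

once = print_message("Hello class.")                 # times falls back to 1
thrice = print_message("Hello class.", 3)            # positional arguments
named = print_message(times=3, msg="Hello class.")   # named, in any order
```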

123
Variable Scope (local variables)
124
Variable Scope (global variables)
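
The two scope slides have no body in the transcript. As a hedged illustration, the sketch below contrasts a local variable with a global one; all names are hypothetical.

```python
counter = 0                    # a global variable

def local_only():
    counter = 10               # creates a new LOCAL variable
    return counter             # the global counter is untouched

def update_global():
    global counter             # refer to the global variable instead
    counter += 1
    return counter

local_result = local_only()
after_local = counter          # the global value is still 0 here
global_result = update_global()
after_global = counter         # the global value is now 1
```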
125
Introduction to the News Aggregator
126
First Module - urllib
  • Usage: import urllib
  • This module provides a high-level interface for
    fetching data across the World Wide Web. In
    particular, the urlopen() function is similar to
    the built-in function open(), but accepts
    Universal Resource Locators (URLs) instead of
    filenames. Some restrictions apply -- it can only
    open URLs for reading, and no seek operations are
    available

127
urlopen()
  • urllib.urlopen(url)
  • Example
  • import urllib
  • f = urllib.urlopen("http://www.python.org")
  • text = f.read()

128
Second Module - xml.dom.minidom
  • Usage:
  • from xml.dom.minidom import parse, parseString
  • xml.dom.minidom is a light-weight implementation
    of the Document
    Object Model interface. It is intended to be
    simpler than the full DOM and also significantly
    smaller
  • DOM applications typically start by parsing some
    XML into a DOM
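
Parsing XML into a DOM and reading tags from it can be sketched with parseString. The feed text below is made-up sample data, not a real feed, and the variable names are illustrative.

```python
from xml.dom.minidom import parseString

# Made-up sample data standing in for a downloaded RSS feed
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Sample feed</title>
<item><title>Game review</title><guid>1</guid></item>
<item><title>Weather update</title><guid>2</guid></item>
</channel></rss>"""

dom = parseString(SAMPLE_RSS)                  # text -> DOM tree
item_nodes = dom.getElementsByTagName("item")  # every <item> element

titles = []
for node in item_nodes:
    title_node = node.getElementsByTagName("title")[0]
    titles.append(title_node.firstChild.data)  # the text inside <title>
```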

129
RSS feed program
  • def get_latest_feed_items():
  •     return item_list
  • def search_latest_feed_items(item_list,
    searchterm):
  •     return filtered_item_list
  • Example: search item description
  • Function usage
  • latests = get_latest_feed_items()
  • search = search_latest_feed_items(latests,
    "game")

130
Example of Modification
  • Retrieve latest feed item list
  • item_list = get_latest_feed_items()
  • Define a search term, "" means all
  • searchterm = ""
  • Obtain the filtered item list
  • filtered_item_list =
    search_latest_feed_items(item_list, searchterm)
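
The bodies of get_latest_feed_items and search_latest_feed_items are not shown in the transcript. The sketch below is one way they might work, under stated assumptions: the feed is parsed from an in-memory sample string instead of being downloaded (the `xml_text` parameter is hypothetical), items are plain dictionaries, and the search looks in the title and description without regard to case.

```python
from xml.dom.minidom import parseString

USEFUL_KEYS = ["title", "pubDate", "description", "guid"]

# Made-up sample data so the sketch runs without a network connection
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>New game released</title><pubDate>Mon, 01 Nov 2010</pubDate>
<description>A strategy title.</description><guid>a1</guid></item>
<item><title>Election results</title><pubDate>Tue, 02 Nov 2010</pubDate>
<description>Votes are counted.</description><guid>a2</guid></item>
</channel></rss>"""

def get_latest_feed_items(xml_text=SAMPLE_FEED):
    """Return one dictionary per <item>, keyed by tag name."""
    item_list = []
    for node in parseString(xml_text).getElementsByTagName("item"):
        item = {}
        for key in USEFUL_KEYS:
            children = node.getElementsByTagName(key)
            if children and children[0].firstChild is not None:
                item[key] = children[0].firstChild.data
            else:
                item[key] = ""        # tag missing or empty
        item_list.append(item)
    return item_list

def search_latest_feed_items(item_list, searchterm):
    """Keep items whose title or description contains searchterm."""
    filtered_item_list = []
    for item in item_list:
        text = (item["title"] + " " + item["description"]).lower()
        if searchterm.lower() in text:    # "" is contained in every string
            filtered_item_list.append(item)
    return filtered_item_list

latests = get_latest_feed_items()
search = search_latest_feed_items(latests, "game")
```

Because the empty string is contained in every string, passing "" as the search term returns every item.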

131
Example of Modification
  • Remember, keys = tagnames in the XML!
  • If you want to modify useful_keys, make sure you
    attach the "u".
  • For example, if you want to add author, add
    u'author' to the list
  • Define your useful keys
  • useful_keys = [u'title', u'pubDate',
    u'description', u'guid']

132
Example of Modification
  • Display all items and keys
  • for item in filtered_item_list:
  •     for key in useful_keys:
  •         print "%s: %s" % (key, item[key])
  •         print key + ": " + item[key]
  •     print " "

133
Some Modification Ideas (I)
  • Read in an RSS feed and find MULTIPLE keywords
    (as many as the user wants),
  • Return the corresponding articles.
  • You may want to think about the readability of
    the results.
  • Note that articles MAY be repeated if different
    keywords occur in their titles and/or description
    (hint: useful keys).

134
Some Modification Ideas (II)
  • Filter articles from an RSS feed based on
    multiple keywords.
  • (hint: nested loops, filtering by one keyword in
    each loop).
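
The nested-loop hint could be followed roughly as below: each pass of the outer loop keeps only the items that match one more keyword. The function name, the sample items, and the choice to search titles only are assumptions for this sketch.

```python
def filter_by_all_keywords(item_list, keywords):
    """Keep items whose title contains every keyword (case-insensitive)."""
    remaining = item_list
    for keyword in keywords:              # outer loop: one keyword per pass
        kept = []
        for item in remaining:            # inner loop: test each item
            if keyword.lower() in item["title"].lower():
                kept.append(item)
        remaining = kept
    return remaining

# Made-up items shaped like the dictionaries used in the project
sample_items = [
    {"title": "New game released"},
    {"title": "Old game patched"},
    {"title": "New election results"},
]
both = filter_by_all_keywords(sample_items, ["game", "new"])
```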

135
Some Modification Ideas (III)
  • Count how many times certain interesting words
    appear in an RSS feed
  • Plot Excel charts (bar, pie, or line graphs).
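
Counting words is a natural use of a dictionary. The sketch below counts whole-word matches in item descriptions; the function name and sample items are hypothetical, and punctuation handling is left out for brevity. The resulting counts could then be copied into a spreadsheet for charting.

```python
def count_words(item_list, interesting_words):
    """Count how often each interesting word occurs in the descriptions."""
    counts = {}
    for word in interesting_words:
        counts[word] = 0                  # start every counter at zero
    for item in item_list:
        words = item["description"].lower().split()
        for word in interesting_words:
            counts[word] += words.count(word.lower())
    return counts

# Made-up items shaped like the dictionaries used in the project
sample_items = [
    {"description": "The game is a strategy game"},
    {"description": "Election results and game scores"},
]
word_counts = count_words(sample_items, ["game", "election"])
```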

136
Some Modification Ideas (IV)
  • Read an RSS feed and allow the user to specify
    how many news items he/she wants to see at one
    time.
  • You may want to display the total number of
    news items first,
  • THEN ask the user how many news items they want
    to see.

137
Some Modification Ideas (V)
  • The ability to take MULTIPLE RSS feeds, then go
    through them ALL and look for articles with a
    certain keyword.
  • You can either give user a limit on maximum
    number of feeds, or allow as many feeds as user
    wants.
  • Note: probably the hardest. This one simulates a
    mini search engine / web crawler.

138
Your Works
  • Specify roles
  • Come up with some ideas or use those ideas, but
    explain them in your own words
  • How much progress you can make
  • Team work, coordinate with each other (Project
    manager)
  • Try to answer all listed questions
  • Prepare your presentation and all other works
  • Grade is based on creativity and complexity as
    well as the role you performed

139
Discussion