Title: News Aggregator
1. News Aggregator
- A news aggregator is a system (a software application, web page, or
service) that collects syndicated content using RSS and other XML feeds
from weblogs and mainstream media sites. Aggregators reduce the time and
effort needed to regularly check websites of interest for updates,
creating a unique information space or "personal newspaper." An
aggregator can subscribe to a feed, check for new content at
user-determined intervals, and retrieve the content. The content is
sometimes described as being "pulled" to the subscriber, as opposed to
"pushed" with email or other channels. Unlike recipients of some
"pushed" information, the aggregator user can easily unsubscribe from a
feed.
- Software that allows syndicated news content (such as RSS feeds) to be
brought together and displayed.
2. Introduction to Python
- For Engr 101 5-Week News Aggregator Module
- Fall 2010
- Instructor: Tao Wang
3. What is a computer?
4. Computer Organization
5. Software / Programs
- Computer programs instruct the CPU which operations it should
perform/execute
- Programmers write the code for the tasks a computer program performs
- But computers only understand binary (1s and 0s)
- Programmers need a language to write computer code
6. Types of Programming Languages
- Low-Level Languages
  - Machine Languages
    - CPU instructions are binary strings of 1s and 0s, e.g. 10010000
    - Each kind of CPU has a different instruction set
    - Programs are not portable across CPU architectures
    - Difficult for programmers to read/write
  - Assembly Languages
    - Use English-like abbreviations to represent CPU instructions
    - A bit easier to understand, e.g. MOV AL, 42h
    - Converted to machine language using an assembler
    - Still not portable
7. Types of Programming Languages
- High-Level Languages
  - C/C++, Java/C#, Python, Ruby, many more...
  - These languages abstract hardware implementation details
  - Provide programmers a logical computer model
  - Allow programmers to focus on solving problems instead of low-level
    hardware details
  - Use English-like keywords and statements to write code
  - Use a compiler or interpreter that translates code to machine language
  - Make code portable across different CPUs and platforms
  - Programmers do not have to learn each CPU's instructions
8. Compiled vs. Interpreted
- Both compilers and interpreters translate source code to machine
language
- Compiled
  - Must compile the program for each target CPU architecture prior to
    execution
- Interpreted
  - Code can be run directly on any platform that has an interpreter
    available
9. About Python
- Designed by Guido van Rossum in the late 1980s
- Interpreted programming language
- Imperative, Dynamic, and Object-Oriented
- Python Programs
  - are a sequence of instructions written in the Python language
  - the interpreter executes the instructions sequentially, in order
  - programs can take inputs and send data to outputs
  - programs can process and manipulate data
  - programs may read and write data to RAM, hard drive, ...
- First Program
  - print "Welcome to learning Python."
10. PYTHON: LET'S BEGIN
- The Python Shell (2.6)
  - You type commands
  - It does stuff
  - It translates each Python command into instructions the computer can
    run, and runs them right away
11. Python as Calculator
- >>> 2 + 2      # add
  4
- >>> 2 * 3      # multiply
  6
- >>> 3 ** 2     # powers
  9
- >>> 12 % 11    # modulo (remainder)
  1
12. Division
- Integer division
  - >>> 1/4        # integer division, no decimals
    0
- Float division
  - >>> 1.0/4.0    # float number division
    0.25
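A minimal sketch of the same divisions, assuming Python 3 rather than the Python 2.6 shell used in these slides. In Python 3, `/` always performs true division and `//` is the integer (floor) division operator; in Python 2.6, `from __future__ import division` gives the same behavior.

```python
# Python 3: "/" is true division, "//" is floor (integer) division.
true_quotient = 1 / 4        # 0.25 (in the Python 2.6 shell, 1/4 gives 0)
floor_quotient = 1 // 4      # 0, like Python 2's integer division
float_quotient = 1.0 / 4.0   # 0.25 in both Python 2 and Python 3
negative_floor = -7 // 2     # -4: floor division rounds toward negative infinity

print(true_quotient, floor_quotient, float_quotient, negative_floor)
```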
13. Literals, Variables, Data Types, Statements and Expressions
14. Literals, Data Types
- Numbers
  - Integers are whole numbers: ..., -2, -1, 0, 1, 2, ... (32 bits)
  - Floats contain decimals: 4.5, 67.444443335, 7E2, ...
  - Booleans: True, False
  - Long ints: integers that exceed the 32-bit capacity
  - Complex numbers: 4.5, 1j   (1j * 1j gives (-1+0j))
- Strings
  - Strings are used to represent words, text, and characters
  - examples (can use single, double or triple quotes)
    - "I am learning python."
    - 'hello.'
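A quick way to check these literal kinds is to ask Python for each value's type. This sketch assumes Python 3, which merged `int` and `long`, so a number too big for 32 bits is still an `int` there (in Python 2.6 it would be a `long`).

```python
# Python 3 sketch: the literal kinds from this slide and their types.
examples = {
    "integer": -2,
    "float": 67.444443335,
    "scientific float": 7E2,    # 7 * 10**2, i.e. 700.0
    "boolean": True,
    "big integer": 2 ** 64,     # would be a long in Python 2.6
    "complex": 1j * 1j,         # (-1+0j)
    "string": 'hello.',
}

for name, value in examples.items():
    print(name, repr(value), type(value).__name__)
```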
15. Variables
- Literals are data values that our programs use
- Data is stored in memory, and we need a way to access it
- We use variables to name our data in memory
- We can retrieve data by calling the name that refers to it
  - student_name = "Ben"
  - print student_name
- = is the assignment operator; it assigns a data value to a variable
16. Variables Syntax Rules
- Variable names must begin with a letter (uppercase or lowercase) or an
underscore (_)
  - good programming convention: variables should not start with
    uppercase letters, which are commonly used for something else
- the remainder of the name can contain letters, underscores (_), and
digits
- names may not contain spaces or special characters
- names are case sensitive and must not be a reserved Python keyword
  - myVariable and myvariable refer to different data
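The naming rules can be checked in code. This is a sketch assuming Python 3: `str.isidentifier()` covers the letter/underscore/digit rules (note that it also accepts non-ASCII letters, which Python 2.6 did not allow), and `keyword.iskeyword()` rejects reserved words. The helper name `is_valid_variable_name` is hypothetical.

```python
import keyword


def is_valid_variable_name(name):
    """Hypothetical helper: True if name is a legal, non-reserved variable name."""
    # isidentifier(): starts with a letter or _, then letters, digits or _
    # iskeyword(): rejects reserved words such as "if", "for", "while"
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["student_name", "_total", "2nd_place", "my variable", "for"]:
    print(candidate, is_valid_variable_name(candidate))
```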
17. Statements and Expressions
- Statements perform a task; they do not return a value
  - x = 2
  - y = 3
  - print y
- Expressions return a value
  - >>> x + y
    5
18. Expressions (evaluate to values)
- Math expressions
  - >>> 10 2 3
  - >>> 10 (2.0 3)
- Boolean expressions
  - >>> 10 < 2       False
  - >>> 10 >= 10     True
- Combined with logic operators
  - >>> (10 < 2) or (10 == 10)
- Can combine
  - >>> (acd) > (da-c)
19. Expressions (evaluate to values)
- String expressions
  - >>> "hel" + "lo"     'hello'
  - >>> "Hi" * 3         'HiHiHi'
- Operator Precedence
  - Parentheses
  - Exponentiation
  - Multiplication and Division
  - Addition and Subtraction
20. Operator Precedence (top-to-bottom)
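The precedence order above can be seen by evaluating the same numbers with and without parentheses. A small Python 3 sketch (these expressions behave the same in Python 2.6); note that `**` is the one operator that groups right to left.

```python
results = {
    "2 + 3 * 4": 2 + 3 * 4,        # multiplication before addition -> 14
    "(2 + 3) * 4": (2 + 3) * 4,    # parentheses first -> 20
    "2 * 3 ** 2": 2 * 3 ** 2,      # exponentiation before multiplication -> 18
    "(2 * 3) ** 2": (2 * 3) ** 2,  # -> 36
    "10 - 4 - 3": 10 - 4 - 3,      # same level: evaluated left to right -> 3
    "2 ** 3 ** 2": 2 ** 3 ** 2,    # ** groups right to left: 2 ** 9 -> 512
}

for expression, value in results.items():
    print(expression, "=", value)
```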
21. Data Types
22. Data Types
- What if data types don't match?
- STRONG TYPES: no automatic conversion (for non-number types)
23. Data Types
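A sketch of what "strong types" means in practice, assuming Python 3 (Python 2.6 behaves the same for these cases): numbers of different kinds mix, but a string and a number do not, so you must convert explicitly with `str()` or `int()`. The helper `try_add` is hypothetical.

```python
def try_add(a, b):
    """Hypothetical helper: return a + b, or a description of the TypeError."""
    try:
        return a + b
    except TypeError as error:
        return "TypeError: " + str(error)


print(try_add(3, 4.5))         # 7.5: int and float mix automatically
print(try_add("3", 4))         # TypeError: no automatic string/number conversion
print(try_add("3", str(4)))    # '34': convert the number to a string first
print(try_add(int("3"), 4))    # 7: or convert the string to a number
```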
24. Python Keywords, User Input
25. Python Keywords
- RESERVED: do not use as variable names
26. User Input
- Create interactive programs by requesting user input
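A minimal sketch of requesting input, assuming Python 3, where `input()` always returns a string (the Python 2.6 equivalent is `raw_input()`). The `read` parameter is a hypothetical addition so the function can be tried without typing; normally you would call `ask_for_age()` with no arguments and type an answer at the prompt.

```python
def ask_for_age(read=input):
    """Ask the user for an age and return it as an int.

    input() returns text, so int() is needed before doing arithmetic.
    """
    answer = read("How old are you? ")
    return int(answer)


# Example with a stand-in for the keyboard:
age = ask_for_age(lambda prompt: "19")
print("Next year you will be", age + 1)   # Next year you will be 20
```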
27. Control Structures
28. Branching / Conditional Statements
29. if - statement
- if a < 0:
      print "a is negative"
30. if - else
31. if - elif - else
- If one test fails, perform the next test
32. Nested if statements
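The branching forms above, combined in one Python 3 sketch: `if`/`elif`/`else` picks the first test that succeeds, and a nested `if` is only reached when its outer branch runs. The function name `describe` is hypothetical.

```python
def describe(a):
    """Classify a number with if / elif / else and a nested if."""
    if a < 0:
        return "a is negative"
    elif a == 0:                    # only tested if a < 0 was False
        return "a is zero"
    else:
        if a % 2 == 0:              # nested if: only reached when a > 0
            return "a is positive and even"
        else:
            return "a is positive and odd"


for value in (-3, 0, 4, 7):
    print(value, describe(value))
```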
33. Modules
34. Modules
- Python's strength: a large collection of open source modules
- Modules are collections (packages) of useful (tested) code you can reuse
- Common modules: random, math
- The modules we use for the project
  - urllib, xml
35. Modules
- Python Standard Library (packages included with most Python
distributions)
  - http://docs.python.org/library/index.html
- PyPI (Python Package Index)
  - http://pypi.python.org/pypi
  - repository of optional modules available (11,000)
36. Using Modules
- The math module contains many useful functions and values: math.sin,
math.cos, math.pow, math.sqrt, ...
- Using modules
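Two common ways of using a module, sketched in Python 3 (the import statements are the same in Python 2.6): import the whole module and prefix names with `math.`, or import a single name directly.

```python
import math                 # import the whole module; use math.name
from math import sqrt       # or import one name and use it directly

print(math.sqrt(16))        # 4.0
print(math.pow(2, 3))       # 8.0 (math.pow always returns a float)
print(math.pi)              # a value stored in the module
print(sqrt(2) ** 2)         # about 2.0, with floating-point rounding
```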
37. Getting help
- In the Python interpreter you can get documentation
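In the interactive shell, `help(math.sqrt)` displays a function's documentation and `dir(math)` lists what a module contains. This Python 3 sketch reads the same documentation text through the `__doc__` attribute so it also works in a script.

```python
import math

# help(math.sqrt) in the shell shows this same documentation text.
print(math.sqrt.__doc__)

# dir(module) lists the names a module provides; skip the internal "_" names.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:8])
```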
38. Control Structures: Repetition, Iteration
39. Repetition
- Selection statements (if-elif-else) let us make simple decisions
- Repeating statements and decisions lets us build more complex programs
40. while
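A minimal `while` loop in Python 3: the condition is checked before every pass, and the loop variable must change inside the body or the loop never ends. The function `countdown` is a hypothetical example.

```python
def countdown(start):
    """Collect start, start-1, ..., 1 using a while loop."""
    numbers = []
    n = start
    while n > 0:            # checked before every pass
        numbers.append(n)
        n = n - 1           # move toward the stopping condition
    return numbers


print(countdown(5))         # [5, 4, 3, 2, 1]
```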
41. Testing Primeness
42. break statement
- break immediately ends a loop
- DO NOT overuse it: it can make the logic difficult to read/understand
43. Testing Primeness
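One way to test primeness with `while` and `break`, sketched in Python 3. Stopping once `divisor * divisor` exceeds `n` is an optimization assumed here, not stated in the slides: any factor larger than the square root pairs with one smaller than it.

```python
def is_prime(n):
    """Return True if n is prime, testing divisors with while and break."""
    if n < 2:
        return False                 # 0, 1 and negative numbers are not prime
    prime = True
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            prime = False
            break                    # found a divisor: stop testing
        divisor = divisor + 1
    return prime


print([k for k in range(2, 30) if is_prime(k)])
```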
44. range(...)
- Built-in function that produces a sequence (list)
  - range(0, 3) -> 0, 1, 2
  - range(3) -> 0, 1, 2
  - range(1, 3) -> 1, 2
  - range(1, 7, 2) -> 1, 3, 5
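The same calls in Python 3, where `range()` returns a lazy range object instead of a list (as it did in Python 2.6); wrapping it in `list()` shows the values.

```python
print(list(range(0, 3)))      # [0, 1, 2]: start included, stop excluded
print(list(range(3)))         # [0, 1, 2]: start defaults to 0
print(list(range(1, 3)))      # [1, 2]
print(list(range(1, 7, 2)))   # [1, 3, 5]: the third argument is the step
print(list(range(5, 0, -2)))  # [5, 3, 1]: a negative step counts down
```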
45. for
- The for loop is used for iteration
46. continue statement
- continue skips the rest of the current pass and moves on to the next one
- break and continue work in both while and for loops
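A `for` loop with `continue`, in Python 3: `for` visits each element in turn, and `continue` jumps straight to the next element. The function `sum_of_odds` is a hypothetical example.

```python
def sum_of_odds(numbers):
    """Add up only the odd numbers, skipping even ones with continue."""
    total = 0
    for n in numbers:         # n takes each element in order
        if n % 2 == 0:
            continue          # skip the rest of this pass
        total = total + n
    return total


print(sum_of_odds(range(1, 10)))   # 1 + 3 + 5 + 7 + 9 = 25
```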
47. Find all primes
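A Python 3 sketch of finding all primes up to a limit with nested `for` loops, `break`, and `continue`. The square-root cutoff is an optimization assumed here, and `primes_up_to` is a hypothetical name.

```python
def primes_up_to(limit):
    """List every prime from 2 to limit."""
    primes = []
    for n in range(2, limit + 1):
        composite = False
        for d in range(2, n):
            if d * d > n:
                break                # no divisor up to sqrt(n): n is prime
            if n % d == 0:
                composite = True
                break                # found a divisor: n is not prime
        if composite:
            continue                 # go to the next n without appending
        primes.append(n)
    return primes


print(primes_up_to(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```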
48. while - else
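In a `while - else`, the `else` block runs only if the loop ends because its condition became false, not because of `break`. A Python 3 sketch (the same syntax works in Python 2.6) that searches a list; `find_index` is a hypothetical name.

```python
def find_index(items, target):
    """Return the position of target in items, or -1 if it is not there."""
    i = 0
    while i < len(items):
        if items[i] == target:
            break                 # found it: the else block is skipped
        i = i + 1
    else:
        return -1                 # loop ended normally: target never found
    return i


print(find_index(["a", "b", "c"], "c"))   # 2
print(find_index(["a", "b", "c"], "z"))   # -1
```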
49. Nesting Control Structures
- We can nest control structures (if, while, for)
- We can nest many times
  - while
    - while
      - while
        - if
          - for ...
- There is a limit: if you reach it, something is WRONG
- Abuse makes code unreadable: use functions instead... (more in a bit)
50. Counter-Controlled Loops
51. Sentinel-Controlled Loops
52. Accumulating
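A Python 3 sketch of the two loop styles and of accumulating a total. A counter-controlled loop runs a known number of times; a sentinel-controlled loop runs until a special "stop" value appears. The function names are hypothetical, and the `values` list stands in for repeated `input()` calls so the loop can be run without typing.

```python
def sum_first_n(n):
    """Counter-controlled loop: the counter decides how many passes run."""
    total = 0                    # accumulator starts at 0
    count = 1                    # counter
    while count <= n:
        total = total + count    # accumulate the running total
        count = count + 1
    return total


def sum_until_sentinel(values, sentinel=-1):
    """Sentinel-controlled loop: keep reading until the sentinel appears.

    Running out of values is treated like reading the sentinel.
    """
    readings = iter(values)
    total = 0
    value = next(readings, sentinel)      # read the first value
    while value != sentinel:
        total = total + value             # accumulate
        value = next(readings, sentinel)  # read the next value
    return total


print(sum_first_n(10))                          # 55
print(sum_until_sentinel([5, 10, 20, -1, 99]))  # 35 (99 comes after the sentinel)
```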
53. Swapping
- x = 2
- y = 3
- Swap (WRONG)
  - x = y
  - y = x
  - Result: x is 3, y is 3
- Swap (CORRECT)
  - z = x
  - x = y
  - y = z
  - Result: x is 3, y is 2
54. Multiple Assignments
- aInt, bInt, cInt = 1, 2, 3
- Swapping with multiple assignment
  - aInt, bInt = bInt, aInt
- Why does this work? (the entire right side is evaluated before any
assignment happens)
55. Everything is an object
56. Debugging
57. Debugging
- Syntax errors: Python gives us an alert, and the code crashes
- Runtime errors: how do we fix incorrect results (logic) in our programs?
  - We need to trace the code's execution flow.
  - Tracing: keep track of variable values as each line is executed
  - Print statements: strategically add print statements to view results
    at each step; don't overdo it, or it will be difficult to keep track
  - Can help us detect infinite loops
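A Python 3 sketch of tracing with strategic print statements: a flag turns the trace on, and each pass prints the variables that change, so a wrong total (or a loop that never stops) shows up line by line. The function `average` and its `trace` flag are hypothetical.

```python
def average(numbers, trace=False):
    """Return the mean of numbers; with trace=True, print each step."""
    total = 0
    count = 0
    for n in numbers:
        total = total + n
        count = count + 1
        if trace:
            # Strategic print: shows how the variables change on each pass.
            print("n =", n, "| total =", total, "| count =", count)
    return total / count


print(average([2, 4, 9], trace=True))   # three trace lines, then 5.0
```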
58. More Data Types... (LISTS)
59. Collection Types
- Lists
  - Sequential and mutable
  - >>> k = [1, 3, 5]
  - >>> m = ["hel", 3]
- Tuples
  - Sequential and immutable
  - >>> (1, 2, 3)
- Dictionaries
  - map collection
  - >>> d = {"name": "Alice", "grade": 100}
  - >>> print d["name"]
    Alice
- Sets
  - Have unique elements
  - >>> aSet = set(["a", "b"])
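A short Python 3 sketch contrasting tuples and sets: a tuple is ordered like a list but cannot be changed, and a set keeps only one copy of each element.

```python
point = (1, 2, 3)                 # tuple: ordered like a list, but immutable
letters = set(["a", "b", "a"])    # set: the duplicate "a" is dropped

try:
    point[0] = 10                 # tuples cannot be changed in place
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(point, tuple_changed)       # (1, 2, 3) False
print(sorted(letters))            # ['a', 'b'] (sets have no fixed order)
print("a" in letters)             # True
```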
60. Lists (also called arrays)
- Lists are sequences of objects
- Mutable (unlike strings and tuples)
- Lists are defined with square brackets [ ], and elements are separated
by commas ,
- List elements can have different types
- List indexing starts at 0
- If the index is negative, count from the end of the list
- If the index is past the last element: ERROR
61. List access
- Indexing and slicing work just like strings (the same rules apply)
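Indexing and slicing rules in a Python 3 sketch (they work the same in Python 2.6): positions start at 0, negative indexes count from the end, a slice stops just before its end index, and an index past the end raises `IndexError`.

```python
colors = ["red", "green", "blue", "yellow"]

print(colors[0])      # red: indexing starts at 0
print(colors[-1])     # yellow: a negative index counts from the end
print(colors[1:3])    # ['green', 'blue']: the slice stops before index 3
print(colors[:2])     # ['red', 'green']
print(colors[::2])    # ['red', 'blue']: every second element

try:
    colors[4]         # one past the last element
except IndexError:
    print("IndexError: index past the last element")
```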
62. Working with lists
- Can convert other collections to lists
- Lists can contain mixed types, including other collections
63. Indexing lists of lists
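A Python 3 sketch of converting collections with `list()` and of indexing a list of lists: the first index picks an inner list, and the second picks an element inside it.

```python
print(list("abc"))           # ['a', 'b', 'c']: string to list
print(list((1, 2, 3)))       # [1, 2, 3]: tuple to list

grid = [[1, 2, 3], [4, 5, 6]]    # a list whose elements are lists
print(grid[1])               # [4, 5, 6]: the second inner list
print(grid[1][0])            # 4: the first element of that list

mixed = [1, "two", [3, 4], (5, 6)]
print(mixed[2][1])           # 4
```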
64. List operators
- + concatenates two lists (list1 + list2)
- * repeats the list a number of times (list1 * integer)
- in tests membership
65. List comparisons
- >, <, ==, <=, >=, !=
- Similar rules to strings: compares elements in order
- Ordered elements being compared should have the same type
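The list operators and comparisons in a Python 3 sketch. Comparison looks at elements one pair at a time and stops at the first difference. Python 3 enforces the "same type" advice strictly: ordering an `int` against a `str` raises `TypeError` (Python 2.6 allowed it but gave arbitrary-looking results).

```python
a = [1, 2]
b = [3]
print(a + b)                  # [1, 2, 3]: a new, concatenated list
print(a * 3)                  # [1, 2, 1, 2, 1, 2]
print(2 in a)                 # True

print([1, 2, 3] < [1, 3])     # True: 1 == 1, then 2 < 3 decides
print([1, 2] < [1, 2, 0])     # True: equal so far, and the shorter list is smaller
print([1, 2] == [1, 2])       # True: same elements in the same order

try:
    [1] < ["a"]               # Python 3 refuses to order an int against a str
except TypeError:
    print("elements must have comparable types")
```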
66. Collection functions
- len(C) returns the length of the collection C
- min(C) returns the minimum element in C; for a list of lists, it
compares the inner lists element by element, starting with the first
- max(C) returns the maximum element in C (same rule for a list of lists)
- sum(L) returns the sum of the elements in list L; the elements must all
be numbers
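A Python 3 sketch of these functions. The second part shows why `min()` and `max()` on a list of lists start with the first elements but fall back to later elements when the first ones tie.

```python
scores = [70, 95, 88]
print(len(scores), min(scores), max(scores), sum(scores))   # 3 70 95 253

pairs = [[2, 1], [1, 5], [1, 3]]
print(min(pairs))   # [1, 3]: first elements tie at 1, so 3 < 5 decides
print(max(pairs))   # [2, 1]: 2 is the largest first element
```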
67. Lists can change
- Lists can be modified: MUTABLE
- Strings cannot be changed: IMMUTABLE
68. List methods
- There are non-modifying methods (they don't change the list)
  - index(x)
  - count(x)
- and modifying methods (they change the list in place, without the need
for assignment)
  - append(x)
  - extend(C)
  - insert(i, x)
  - remove(x)
  - pop()
  - sort()
  - reverse()
69. Appending and Extending
- append(...) adds a single element to a list
- extend(...) joins two lists
- can also use '+'
70. List methods
- sort()
- count(x)
- pop()
- the del keyword also removes an element
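The list methods traced step by step in a Python 3 sketch (the methods behave the same in Python 2.6). Each comment shows the list after that line; note that the modifying methods change the list in place and, like `sort()`, return `None`.

```python
nums = [3, 1, 2]
nums.append(5)          # [3, 1, 2, 5]: one new element
nums.extend([1, 4])     # [3, 1, 2, 5, 1, 4]: every element of the other list
nums.insert(0, 9)       # [9, 3, 1, 2, 5, 1, 4]: insert at position 0
nums.remove(1)          # [9, 3, 2, 5, 1, 4]: removes the FIRST 1 only
last = nums.pop()       # returns 4; nums is [9, 3, 2, 5, 1]
nums.sort()             # [1, 2, 3, 5, 9]
nums.reverse()          # [9, 5, 3, 2, 1]
del nums[0]             # [5, 3, 2, 1]
print(nums, last, nums.count(3), nums.index(2))   # [5, 3, 2, 1] 4 1 2

nested = [1, 2]
nested.append([3, 4])   # [1, 2, [3, 4]]: append adds the list as ONE element
print(nested)
```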
71. split() and join()
- Going from strings to lists and back again
- These are string methods
- join(C) takes as an argument any iterable collection type (such as
lists, strings, tuples) whose elements are strings
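A Python 3 sketch of going from a string to a list and back. `split()` with no argument splits on any run of whitespace; `join()` is called on the separator string and glues the elements together.

```python
headline = "Python news aggregator module"
words = headline.split()           # split on whitespace
print(words)                       # ['Python', 'news', 'aggregator', 'module']
print("-".join(words))             # Python-news-aggregator-module
print("a,b,,c".split(","))         # ['a', 'b', '', 'c']: explicit separator
print("".join(("x", "y", "z")))    # xyz: join works on tuples too
print(", ".join("abc"))            # a, b, c: a string is iterable
```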
72. List Comprehension
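A list comprehension builds a list in one expression of the form `[expression for item in collection if condition]`. This Python 3 sketch (list comprehensions also exist in Python 2.6) shows one next to the equivalent ordinary loop.

```python
squares = [n * n for n in range(6)]                      # [0, 1, 4, 9, 16, 25]
even_squares = [n * n for n in range(6) if n % 2 == 0]   # [0, 4, 16]

# The same result as even_squares, written as an ordinary loop:
loop_result = []
for n in range(6):
    if n % 2 == 0:
        loop_result.append(n * n)

print(squares, even_squares, loop_result)
```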
73. Strings
74. Quote Use
- Single quotes
  - These strings must fit on a single line of source
- Double quotes
  - Also have to fit on a single line of source
- Triple (single or double) quotes
  - """ These quotes are very useful when you need to span multiple
    lines. They are also often used for long code comments. """
75. Quotes inside strings
- To use apostrophes
  - "Let's use double quotes"
- To use double quotes in our strings
  - 'They say, "use single quotes"'
- Triple quotes can take care of both cases
  - """With 3 quotes it's "easy" to use apostrophes and quotes."""
  - '''With 3 quotes it's "easy" to use apostrophes and quotes.'''
76. Slash \
- We can use the \ to continue a line of code onto the next line
- Works with strings or expressions
- No character can follow the \ (not even a space)
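A Python 3 sketch of line continuation with a backslash, for an expression and for a string (both work the same in Python 2.6). The backslash must be the very last character on the line. Inside parentheses, Python also continues lines automatically, which the PEP 8 style guide prefers.

```python
# A backslash at the very end of a line continues the statement.
total = 1 + 2 + \
        3 + 4

# Inside a string literal, backslash-newline is dropped, joining the lines.
message = "This string continues \
on the next line"

print(total)     # 10
print(message)   # This string continues on the next line
```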
77. Character escaping
- Since some characters have special meanings, we have to escape them to
use them in our strings
  - "We can \"escape\" characters like this"
  - 'Or let\'s escape them like this'
  - 'and this \\ is how we get a backslash in our string'
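The escaped strings above, printed in Python 3 to show what each escape sequence produces. Each escape sequence counts as one character in the finished string.

```python
double = "We can \"escape\" characters like this"
single = 'Or let\'s escape them like this'
backslash = 'and this \\ is how we get a backslash in our string'

print(double)       # We can "escape" characters like this
print(single)       # Or let's escape them like this
print(backslash)    # and this \ is how we get a backslash in our string
print(len("\\"))    # 1: the escape sequence is a single character
```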
78. Whitespace
- '' This is an empty string, not a character
- ' ' This is a space
- '\t' This is a tab (a single character)
- '\n' This is a new line (in Unix/Mac OS X)
- '\r\n' This is a new line (in Windows)
- '\r' This is a new line (in old Mac OS < 9)
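A Python 3 sketch that prints each whitespace string with `repr()`, so the invisible characters show up, along with its length. `splitlines()` understands all three newline styles, which is handy when text comes from different systems.

```python
samples = {
    "empty": "",
    "space": " ",
    "tab": "\t",
    "unix newline": "\n",
    "windows newline": "\r\n",
    "old mac newline": "\r",
}
for name, text in samples.items():
    print(name, repr(text), len(text))

parts = "one\ntwo\r\nthree\rfour".splitlines()
print(parts)   # ['one', 'two', 'three', 'four']
```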
79. Strings are sequences
80. Simple string usage
- Can access characters with indexing, like lists
- Strings do not have append(...) and extend(...) methods
81. Adding (+) and Repeating (*)
- We can add (concatenate) strings with +
- We can also repeat them with *
82. Compare strings
- Test equality using ==
- What about <, >, <=, >=?
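Basic string operations in a Python 3 sketch. Because strings have no `append()` and cannot be changed in place, "adding" to a string always builds a new one with `+`.

```python
word = "hello"
print(word[0], word[-1], word[1:4])   # h o ell
print(word + " world")                # hello world
print("ab" * 3)                       # ababab
print(word == "hello")                # True

exclaimed = word + "!"                # a new string; word is unchanged
try:
    word[0] = "j"                     # strings cannot be changed in place
except TypeError:
    print("strings are immutable")
```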
83. Strings
- Strings are sequences like lists
- Each element in a string is a character
- Characters: we can print letters ('a', 'b', 'c', ...), numbers ('1',
'3', '4', ...) and symbols ('@', ...)
- Non-printing characters
  - Whitespace: '\t', '\n', '\r\n'
  - try printing this: '\a'
84. Characters are really numbers
85. Character numerical values
86. Print the ABCs
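A Python 3 sketch of the link between characters and numbers: `ord()` gives a character's numeric code and `chr()` goes back the other way, so looping over the codes from 'A' to 'Z' prints the ABCs.

```python
print(ord("A"), ord("a"), ord("0"))   # 65 97 48: each character's number
print(chr(66))                        # B: from a number back to a character

abcs = ""
for code in range(ord("A"), ord("Z") + 1):
    abcs = abcs + chr(code)
print(abcs)                           # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```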
87. String Comparisons
- Characters are compared by their numerical value
- A shorter string is smaller only when it matches the start of the
longer one
- If the first characters are equal, compare the next ones
88. String Comparisons
- These are characters, not numbers
- Capital letters are smaller (refer to the ASCII table)
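These comparison rules, checked in Python 3 (they give the same results in Python 2.6). Each comment names the first pair of characters that decides the result.

```python
print("apple" < "banana")   # True: 'a' (97) < 'b' (98)
print("Zebra" < "apple")    # True: 'Z' (90) < 'a' (97), capitals are smaller
print("ab" < "abc")         # True: "ab" matches the start of "abc"
print("b" > "abc")          # True: the first characters decide, not the length
print("10" < "9")           # True: '1' (49) < '9' (57), characters, not numbers
```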
89. Testing membership
90. import string
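A Python 3 sketch of membership tests on strings and of a few constants from the `string` module. On strings, `in` checks for a whole substring, not just a single character.

```python
import string

print("ell" in "hello")           # True: substring test
print("z" in "hello")             # False
print(string.ascii_uppercase)     # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits)              # 0123456789
print("!" in string.punctuation)  # True
```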
91. String is an Object
- Objects contain
  - Data
    - x = 'Hello': the data is a sequence of characters
  - Actions (methods)
    - things the object can do (often on itself)
92. Upper/Lower Case methods
- These methods are available to all string objects
- Strings are IMMUTABLE
  - this means that characters in the sequence cannot change
  - methods return a new string
  - the original data is unchanged
93. What kind of character
- These methods are available to all string objects
- Tests that return boolean types
  - isalnum(): does the string contain only letters and digits?
  - isalpha(): does the string contain only letters?
  - isdigit(): does the string contain only digits?
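A Python 3 sketch of string methods: the case methods return new strings and leave the original alone, and the `is...()` tests return `True` or `False`. An empty string fails all three tests because it contains no characters at all.

```python
x = "Hello"
print(x.upper(), x.lower())   # HELLO hello: new strings are returned
print(x)                      # Hello: the original is unchanged

print("abc123".isalnum(), "abc".isalpha(), "123".isdigit())   # True True True
print("abc 123".isalnum())    # False: the space is neither a letter nor a digit
print("".isalpha())           # False: an empty string has no letters
```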
94. Formatting strings with strip(...)
95. Formatting strings
96. String Formatting
97. Formatting Floats
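A Python 3 sketch of cleaning up text with `strip()` and of `%`-formatting, the formatting style the RSS display code in these slides uses. In a `%` format, `%s` inserts a string, `%d` an integer, and `%8.3f` a float in a field 8 wide with 3 digits after the decimal point.

```python
raw = "   Breaking news!  \n"
cleaned = raw.strip()          # 'Breaking news!': whitespace removed from both ends
left_only = raw.lstrip()       # 'Breaking news!  \n': only the left side
no_x = "xxhixx".strip("x")     # 'hi': strip chosen characters instead

summary = "%s has %d items" % ("feed", 3)   # 'feed has 3 items'
pi_text = "%.2f" % 3.14159                  # '3.14': 2 digits after the point
padded = "[%8.3f]" % 2.5                    # '[   2.500]': width 8, 3 decimals
left = "[%-6s]" % "left"                    # '[left  ]': left-aligned, width 6

for line in (cleaned, summary, pi_text, padded, left):
    print(line)
```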
98. Creating Forms/Templates
99. Using replace
100. Output
101. find(...) rfind(...)
102. Go through all matches and capitalize
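A Python 3 sketch covering a simple `%` template, `replace()`, and `find()`/`rfind()`, which return the index of the first/last match or -1 if there is none. The helper `capitalize_matches` is a hypothetical example of going through all matches with `find()`: each search restarts just after the previous match.

```python
template = "Dear %s,\nYour grade is %d.\n"
letter = template % ("Alice", 100)             # fill the form's blanks

swapped = "I like cats".replace("cats", "dogs")   # 'I like dogs'

text = "the cat and the hat"
print(text.find("the"), text.rfind("the"), text.find("dog"))   # 0 12 -1


def capitalize_matches(text, word):
    """Return text with every occurrence of word upper-cased, using find()."""
    if not word:                       # an empty word would match forever
        return text
    upper = word.upper()
    result = text
    start = result.find(word)
    while start != -1:
        result = result[:start] + upper + result[start + len(word):]
        start = result.find(word, start + len(upper))   # search after this match
    return result


print(letter + swapped)
print(capitalize_matches(text, "the"))   # THE cat and THE hat
```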
103. Dictionaries
104. Dictionaries
- Another collection type, but NOT a sequence (order doesn't matter)
- Also referred to as an associative array, or map
- Dictionaries are a collection of keys that point to values
105. Key --> Value
106. About Dictionaries
- Define dictionaries using curly braces { }
- Each key is separated from its value by a colon; key-value pairs are
separated by commas
- Dictionaries are MUTABLE (can add or remove entries)
- Keys
  - Can only be IMMUTABLE objects (int, float, string, tuples)
- Values
  - Can be anything
- Idea: it is easier to find data based on a simple key, like looking up a
word in an English-language Webster's dictionary
107. Indexing and Assignment
- Index using square brackets and a key; this returns the associated value
- Numbered indices are not defined
- Can modify the dictionary by assigning new key-value pairs or by
changing the value a key points to
108. Dictionaries with Different Key Types
- Cannot index or search based on values, only through keys
- Numbers, strings, tuples can be keys (anything IMMUTABLE)
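A Python 3 sketch of indexing and assigning by key, of mixing key types, and of the two errors you are most likely to meet: `KeyError` for a key that was never added, and `TypeError` for a mutable key such as a list.

```python
grades = {"Alice": 100, "Ben": 85}   # key: value pairs inside curly braces
print(grades["Alice"])               # 100: look up by key, not by position
grades["Ben"] = 90                   # change the value a key points to
grades["Carol"] = 77                 # add a new key-value pair

places = {(0, 0): "origin", 3: "three", "x": 1.5}   # tuple, int, str keys
print(places[(0, 0)])                # origin

try:
    grades["Dave"]                   # a key that was never added
except KeyError:
    print("KeyError: no such key")

try:
    {[1, 2]: "list key"}             # a list is mutable, so it cannot be a key
except TypeError:
    print("TypeError: lists cannot be keys")
```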
109. Operators
- [ ] for indexing, using a key inside square brackets
- len(): the "length" is the number of key-value pairs in the dictionary
- in: boolean test of membership (is the key in the dictionary?)
- for: iterates through the keys in the dictionary
110. Operators in use
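The dictionary operators in a Python 3 sketch. Note that `in` and `for` both work with the keys; the values are only reached through those keys.

```python
ages = {"Alice": 20, "Ben": 19}
print(len(ages))            # 2: number of key-value pairs
print("Ben" in ages)        # True: membership tests the KEYS
print(19 in ages)           # False: values are not searched
for name in ages:           # the loop variable takes each key in turn
    print(name, ages[name])
```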
111. Dictionary Methods
- items(): returns all the key-value pairs as a list of tuples
- keys(): returns a list of all the keys
- values(): returns a list of all the values
- copy(): returns a shallow copy of the dictionary
112. Methods
113. zip() - dict()
- zip() creates a list of tuples from two lists
- dict() creates a dictionary from a mapping object or from a sequence of
key-value pairs, or creates an empty dictionary
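A Python 3 sketch of these methods and of building a dictionary with `zip()` and `dict()`. One difference from the Python 2.6 behavior on the slide: in Python 3, `items()`, `keys()`, `values()` and `zip()` return views or iterators rather than lists, so this sketch wraps them in `sorted()` or `list()`.

```python
prices = {"apple": 3, "pear": 5}
print(sorted(prices.items()))    # [('apple', 3), ('pear', 5)]
print(sorted(prices.keys()))     # ['apple', 'pear']
print(sorted(prices.values()))   # [3, 5]

backup = prices.copy()           # a new dictionary
backup["kiwi"] = 2               # changing the copy leaves prices alone

names = ["title", "guid"]
values = ["Python news", "id-42"]
pairs = list(zip(names, values))   # [('title', 'Python news'), ('guid', 'id-42')]
record = dict(pairs)               # {'title': 'Python news', 'guid': 'id-42'}
empty = dict()                     # {}
print(record, empty)
```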
114. Functions
115. Why Use Functions?
- Functions provide encapsulation, making code better and more readable
- Divide and conquer problem solving
  - Break large, complicated problems into smaller sub-problems
  - Solutions to sub-problems are written in functions
  - Sub-problem solutions can be reused and shared
- Simplification/Readability
  - Removes duplicate code by replacing it with function calls
116. Why Use Functions?
- Abstraction
  - Provides a high-level interface to the program
  - You know WHAT it does, not HOW it does it
- Security
  - A small, well-defined piece of code is easier to prove correct
  - Code can be fixed in one place, and it will be corrected everywhere
    the function is called
117. Function Definition
118. Function Calls
119. Functions that do things...
- no parameters
- no return statement
- affect the environment
120. Functions that have parameters...
- definition has parameters
- call takes arguments
- no return statement
- affect the environment
121. Functions that return results...
- return keyword
- Function performs processing
- Returns value/object
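The three kinds of functions above, side by side in a Python 3 sketch. All function names are hypothetical: one only affects the environment (prints), one takes a parameter, and one hands a result back with `return`.

```python
def print_banner():
    """Does something: no parameters, no return statement."""
    print("=== Today's news ===")       # only affects the environment (the screen)


def print_headline(title):
    """Has a parameter: the call supplies an argument for title."""
    print("* " + title)


def count_words(text):
    """Returns a result instead of printing it."""
    return len(text.split())


print_banner()
print_headline("News aggregator project starts")   # argument -> parameter
n = count_words("Python news aggregator")          # n is 3
print(n)
```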
122. Functions with default parameters...
- Parameters can have default values
- We can call this function with
  - print_message("Hello class.")
  - print_message("Hello class.", 3)
- Can explicitly name arguments
  - print_message(times=3, msg="Hello class.")
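The slide's definition of `print_message` did not survive, so this is a hypothetical Python 3 version that matches the three calls shown: `times` defaults to 1, and keyword arguments may be given in any order. Returning the printed lines is an extra assumption added so the behavior can be checked.

```python
def print_message(msg, times=1):
    """Print msg `times` times (default 1) and return the printed lines."""
    lines = [msg] * times
    for line in lines:
        print(line)
    return lines


print_message("Hello class.")                  # prints once
print_message("Hello class.", 3)               # prints three times
print_message(times=3, msg="Hello class.")     # named arguments, any order
```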
123. Variable Scope (local variables)
124. Variable Scope (global variables)
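A Python 3 sketch of scope (the rules are the same in Python 2.6). Assigning to a name inside a function creates a local variable that disappears when the call ends; to change a global variable from inside a function you must declare it with `global`. The names here are hypothetical.

```python
counter = 0          # global variable: defined at the top level


def show_local():
    counter = 100    # local variable: a different name that exists only in this call
    return counter


def increment():
    global counter   # without this, "counter += 1" raises UnboundLocalError
    counter += 1
    return counter


show_local()         # returns 100, but the global counter is still 0
increment()          # the global counter becomes 1
print(counter)       # 1
```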
125. Introduction to the News Aggregator
126. First Module - urllib
- Usage: import urllib
- This module provides a high-level interface for
fetching data across the World Wide Web. In
particular, the urlopen() function is similar to
the built-in function open(), but accepts
Universal Resource Locators (URLs) instead of
filenames. Some restrictions apply -- it can only
open URLs for reading, and no seek operations are
available
127. urlopen()
- urllib.urlopen(url)
- Example
  - import urllib
  - f = urllib.urlopen("http://www.python.org")
  - text = f.read()
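In Python 3, `urllib.urlopen` moved to `urllib.request.urlopen`, and `read()` returns bytes that must be decoded to text. This sketch assumes UTF-8, which a real feed may not use. Because fetching a real site needs a network connection, the example call uses a `data:` URL, which `urllib.request` supports since Python 3.4. The helper name `fetch_text` is hypothetical.

```python
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Download url and return its body as a string (assumed encoding)."""
    with urlopen(url) as response:   # works like a read-only file
        data = response.read()       # bytes
    return data.decode(encoding)


# Real use, with a network connection:
#   text = fetch_text("http://www.python.org")
print(fetch_text("data:text/plain,hello%20feed"))   # hello feed
```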
128. Second Module - xml.dom.minidom
- Usage
  - from xml.dom.minidom import parse, parseString
- xml.dom.minidom is a lightweight implementation of the Document Object
Model interface. It is intended to be simpler than the full DOM and also
significantly smaller.
- DOM applications typically start by parsing some XML into a DOM
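A minimal Python 3 sketch of parsing RSS with `parseString` and pulling out each item's title. The feed text is a made-up sample; `parse()` does the same job for a file name or an open file, such as the response from `urlopen`.

```python
from xml.dom.minidom import parseString

rss = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First story</title><description>About games</description></item>
  <item><title>Second story</title><description>About weather</description></item>
</channel></rss>"""

dom = parseString(rss)                      # parse the XML text into a DOM
titles = []
for item in dom.getElementsByTagName("item"):            # every <item>, in order
    title_node = item.getElementsByTagName("title")[0]
    titles.append(title_node.firstChild.data)            # the text inside <title>
print(titles)   # ['First story', 'Second story']
```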
129. RSS feed program
- def get_latest_feed_items():
      return item_list
- def search_latest_feed_items(item_list, searchterm):
      return filtered_item_list
- Example: search the item description
- Function usage
  - latests = get_latest_feed_items()
  - search = search_latest_feed_items(latests, "game")
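The body of `search_latest_feed_items` is not shown in the slides; this is one possible Python 3 version, not the course's actual code. It assumes each item is a dictionary keyed by XML tag name (`u'title'`, `u'description'`), like the `useful_keys` in this project, and it ignores upper/lower case. Since the empty string is contained in every string, a search term of `""` keeps all items.

```python
def search_latest_feed_items(item_list, searchterm):
    """Sketch: keep items whose title or description contains searchterm."""
    term = searchterm.lower()
    filtered_item_list = []
    for item in item_list:
        text = item.get(u'title', u'') + u' ' + item.get(u'description', u'')
        if term in text.lower():        # "" is in every string: matches all
            filtered_item_list.append(item)
    return filtered_item_list


latests = [   # made-up sample items
    {u'title': u'New game released', u'description': u'A puzzle game'},
    {u'title': u'Weather update', u'description': u'Rain expected'},
]
search = search_latest_feed_items(latests, "game")
print([item[u'title'] for item in search])   # ['New game released']
```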
130. Example of Modification
- Retrieve the latest feed item list
  - item_list = get_latest_feed_items()
- Define a search term; "" (empty) means all
  - searchterm = ""
- Obtain the filtered item list
  - filtered_item_list = search_latest_feed_items(item_list, searchterm)
131. Example of Modification
- Remember, keys = tag names in the XML!
- If you want to modify useful_keys, make sure you attach the "u" (it
marks a Unicode string).
- For example, if you want to add author, add u'author' to the list
- Define your useful keys
  - useful_keys = [u'title', u'pubDate', u'description', u'guid']
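A Python 3 sketch of how "keys = tag names" can work: for each `<item>`, look up every tag listed in `useful_keys` and store its text under that key. The helper `item_to_dict` and the sample feed are hypothetical. The `u` prefix matters in Python 2.6, where minidom returns Unicode text; in Python 3.3+, `u'title'` and `'title'` are the same string.

```python
from xml.dom.minidom import parseString

useful_keys = [u'title', u'pubDate', u'description', u'guid']

rss = """<rss version="2.0"><channel><item>
  <title>Sample story</title><pubDate>Mon, 06 Sep 2010 10:00:00 GMT</pubDate>
  <description>Short summary</description><guid>story-1</guid>
  <author>editor@example.com</author>
</item></channel></rss>"""


def item_to_dict(item_element, keys):
    """Map each wanted tag name to the text inside that tag (if present)."""
    record = {}
    for key in keys:
        nodes = item_element.getElementsByTagName(key)
        if nodes and nodes[0].firstChild is not None:
            record[key] = nodes[0].firstChild.data
    return record


items = [item_to_dict(item, useful_keys)
         for item in parseString(rss).getElementsByTagName("item")]
print(items[0][u'guid'])   # story-1 (author is skipped: not in useful_keys)
```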
132. Example of Modification
- Display all items and keys
  - for item in filtered_item_list:
        for key in useful_keys:
            print "%s %s" % (key, item[key])
            # or: print key + " " + item[key]
        print " "
133. Some Modification Ideas (I)
- Read in an RSS feed and find MULTIPLE keywords (as many as the user
wants)
- Return the corresponding articles
- You may want to think about the readability of the results
- Note that articles MAY be repeated if different keywords occur in their
titles and/or descriptions (hint: useful keys)
134. Some Modification Ideas (II)
- Filter articles from an RSS feed based on multiple keywords
- (hint: nested loops, filtering by one keyword in each loop)
135. Some Modification Ideas (III)
- Count how many times certain interesting words appear in an RSS feed
- Plot Excel charts (bar, pie, or line graphs)
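One possible starting point for the counting idea, sketched in Python 3 with a dictionary as the counter. It assumes the feed text has already been collected into a list of strings (for example, titles and descriptions) and that the interesting words are given in lowercase; `count_interesting_words` is a hypothetical name. Only exact words count, so "games" does not count as "game".

```python
def count_interesting_words(texts, interesting_words):
    """Count how often each interesting word appears across a list of texts."""
    counts = {}
    for word in interesting_words:
        counts[word] = 0
    for text in texts:
        for token in text.lower().split():
            token = token.strip(".,!?:;\"'()")   # drop surrounding punctuation
            if token in counts:
                counts[token] += 1
    return counts


sample = ["Game news: new game!", "Weather and games"]
print(count_interesting_words(sample, ["game", "weather"]))   # {'game': 2, 'weather': 1}
```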
136. Some Modification Ideas (IV)
- Read an RSS feed and allow the user to specify how many news items
he/she wants to see at one time
- You may want to display the total number of news items first,
- THEN ask the user how many they want to see
137. Some Modification Ideas (V)
- The ability to take MULTIPLE RSS feeds, then go through them ALL and
look for articles with a certain keyword
- You can either give the user a limit on the maximum number of feeds, or
allow as many feeds as the user wants
- Note: probably the hardest. This one simulates a mini search engine /
web crawler.
138. Your Work
- Specify roles
- Come up with some ideas, or use those ideas but explain them in your
own words
- How much progress you can make
- Teamwork: coordinate with each other (project manager)
- Try to answer all listed questions
- Prepare your presentation and all other work
- Grade is based on creativity and complexity, as well as the role you
performed
139. Discussion