Title: Integrating Techniques for Event-based Business Intelligence Gathering
1. Integrating Techniques for Event-based Business Intelligence Gathering
Kareem S. Aggour, John Interrante, Ibrahim Gokcen
July 16, 2006
2. Business Problems
- Manual search of existing news sources/aggregators
- Emergence of novel news sources
- Dealing with information explosion vs. keeping abreast of important developments
- Distributed data collection across marketing, sales
3. Motivation
- Identify sales/risk leads on 8 topics
  - Risk: Bankruptcy, Management Succession, Litigation, Change in Auditors, Rating Change
  - Sales: Bankruptcy, Outsourcing, Mergers & Acquisitions, Facility Expansions
- Provide actionable and focused content to risk and sales reps in financial services businesses
- Automate extraction and integration of events from multiple providers
- Reduce repetitious work by centralizing event collection
4. Anticipated Results (M&A examples)
- First Financial Management Corp said it has offered to acquire Comdata Network Inc for $18 per share in cash and stock, or a total of about $342.7 million.
- Delta announced last September that it was purchasing Western.
- Nerco Inc said its oil and gas unit closed the acquisition of a 47% working interest in the Broussard oil and gas field from Davis Oil Co for about $22.5 million in cash.
Extract key information from articles efficiently and with good precision/recall for all topics
5. How? EBIG Agent Architecture
- Ontology generation
- Named entity extraction
- Targeted phrase extraction using a dependency grammar
- Query generation & expansion
- Data visualization
- Text classification
6. Integrating Techniques
Query expansion
Text classification
Ontology generation
Data visualization
Event extraction
Named entity extraction
7. Extraction Pipeline
Articles
Sentences
Events
Text classification
Ontology patterns
Named entity extraction
Phrase extraction
8. Query Generation & Expansion
- Store queries (to a news source for a given date range) to prevent duplicate retrieval
  - If articles exist in the DB, retrieve from DB
- Expand queries based on previously retrieved articles
  - Word frequency analysis on bag of words
  - Present frequent words in relevant articles for review
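The two ideas on this slide can be illustrated with a minimal sketch. It assumes an in-memory dictionary in place of the DB and a hypothetical `fetch` callable standing in for a news-source API; `QueryStore`, `frequent_terms`, and the stopword list are illustrative names, not part of EBIG.

```python
from collections import Counter
import re

# Small illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for",
             "it", "its", "that", "was", "said", "has"}


class QueryStore:
    """Remembers (source, query, date range) keys so each query is fetched once."""

    def __init__(self, fetch):
        # Hypothetical callable: (source, query, start, end) -> list of article texts.
        self.fetch = fetch
        self.cache = {}  # stands in for the article DB

    def retrieve(self, source, query, start, end):
        key = (source, query, start, end)
        if key not in self.cache:  # contact the news source only on a miss
            self.cache[key] = self.fetch(source, query, start, end)
        return self.cache[key]


def frequent_terms(articles, query_terms, top_n=5):
    """Bag-of-words frequency count over relevant articles.

    Returns the most frequent words that are not stopwords and not already
    in the query, as candidates for an analyst to review.
    """
    known = {t.lower() for t in query_terms}
    counts = Counter()
    for text in articles:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS and word not in known:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]
```

The candidate terms are only suggestions: per the slide, frequent words are presented for review rather than added to the query automatically.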
9. Text Classification with SVMs
- Linear Support Vector Machines (SVMlight)
- High-dimensionality enables good class separation
- One-vs-all for 8 topics
- Amenable to incremental learning
  - Label corrections by research analysts
  - Incoming new articles
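For reference, one-vs-all with linear SVMs can be written in a standard textbook form (the notation here is an assumption, not taken from the slides): one binary classifier per topic, trained on feature vectors x_i of m labeled articles, with a regularization weight λ.

```latex
% One binary problem per topic k = 1, ..., 8, with y_i^{(k)} = +1 if article i
% is labeled with topic k and -1 otherwise.
\min_{w_k,\, b_k} \; \frac{\lambda}{2}\,\lVert w_k \rVert_2^2
  + \frac{1}{m} \sum_{i=1}^{m} \max\!\bigl(0,\; 1 - y_i^{(k)} (w_k^{\top} x_i + b_k)\bigr)

% Scoring a new article x against all 8 topics:
f_k(x) = w_k^{\top} x + b_k, \qquad \hat{k}(x) = \arg\max_{k} f_k(x)
```

Whether the system assigns only the top-scoring topic or every topic with a positive score f_k(x) is not stated in the slides.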
10. Data Visualization
- Centroid algorithm for cluster-preserving dimension reduction (Kim et al. 2005): compute a p-dimensional representation $\hat{q}$ of an n-dimensional vector $q$ ($p < n$)
  - Compute two centroids: $C = [c_1, c_2]$
  - Solve $\min_{\hat{q}} \lVert C\hat{q} - q \rVert_2$
Figure: Rating change articles
Used primarily for article label validation and finding anomalies
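A minimal sketch of the two-centroid reduction, assuming two article classes (for example, relevant and not relevant to a topic) and dense feature vectors as Python lists. It solves the least-squares problem through the 2x2 normal equations, which is one convenient choice and not necessarily the solver used by the authors; `centroid` and `reduce_to_2d` are illustrative names.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reduce_to_2d(q, c1, c2):
    """Least-squares coordinates of q with respect to C = [c1, c2].

    Solves (C^T C) q_hat = C^T q, the normal equations of
    min over q_hat of ||C q_hat - q||_2, for the case p = 2.
    """
    a11, a12, a22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    b1, b2 = dot(c1, q), dot(c2, q)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("centroids are (nearly) linearly dependent")
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y
```

Plotting the resulting (x, y) pair for every article gives a 2-D scatter in which mislabeled or anomalous articles tend to sit away from their own class.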
11. Ontology Generation
- Topic patterns filter sentences
  - Key nouns and key verbs combined (accept+offer, agree+acqui) symmetrically
  - Refined after precision/recall analysis
- Topic keywords are used to extract events
  - Key nouns, verbs themselves
  - Phrases are extracted around them
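A minimal sketch of a topic-pattern sentence filter. It assumes that each pattern pairs a key-verb stem with a key-noun stem, that stems are matched as word prefixes, and that "symmetrically" means the two stems may appear in either order; `MA_PATTERNS` and the function names are hypothetical.

```python
import re

# Hypothetical M&A patterns built from the stems shown on the slide.
MA_PATTERNS = [("accept", "offer"), ("agree", "acqui")]


def matches_topic(sentence, patterns=MA_PATTERNS):
    """True if some pattern has both of its stems present in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    for stem_a, stem_b in patterns:
        has_a = any(w.startswith(stem_a) for w in words)
        has_b = any(w.startswith(stem_b) for w in words)
        if has_a and has_b:  # the order of the two stems does not matter
            return True
    return False


def filter_sentences(sentences, patterns=MA_PATTERNS):
    """Keep only the sentences that match at least one topic pattern."""
    return [s for s in sentences if matches_topic(s, patterns)]
```

Prefix matching lets a stem such as `acqui` cover "acquire", "acquired", and "acquisition" without a separate stemmer.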
12. Named Entity Extraction
- Existence of an entity (company, organization) in a sentence indicates an event
  - Entities become a part of extraction rules
  - Sentences with at least one entity are sent to the event extractor
- No anaphora resolution
- Commercial and open-source tools available
  - Connexor's MEX, GATE
  - Ability to add custom lexicons in both
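A minimal sketch of the "at least one entity" gate, using a plain lexicon lookup as a stand-in for a real named entity extractor. It does not use the MEX or GATE APIs; `CUSTOM_LEXICON` and both function names are hypothetical.

```python
import re

# Hypothetical custom lexicon of company names.
CUSTOM_LEXICON = {"Delta", "Western", "Nerco Inc", "Davis Oil Co"}


def find_entities(sentence, lexicon=CUSTOM_LEXICON):
    """Return lexicon entries found in the sentence, in alphabetical order."""
    return [name for name in sorted(lexicon)
            if re.search(r"\b" + re.escape(name) + r"\b", sentence)]


def sentences_for_extractor(sentences, lexicon=CUSTOM_LEXICON):
    """Pair each sentence that mentions at least one entity with its entities."""
    selected = []
    for sentence in sentences:
        entities = find_entities(sentence, lexicon)
        if entities:
            selected.append((sentence, entities))
    return selected
```

Because there is no anaphora resolution, a sentence that refers to a company only through a pronoun ("It closed the deal in cash.") contains no entity and is not passed on.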
13. Targeted Phrase Extraction (TPE)
- Originates from Functional Dependency Grammar (Tapanainen et al.)
- The syntax tree of a sentence has a unique root, which is the main verb of the sentence
- All other verbs are also roots of subtrees
Example: "Delta announced last September that it was purchasing Western"
14. Targeted Phrase Extraction (TPE)
- Given a target string S (key noun, verb, or company name), compute its subtree
  - If S is the main verb, output the entire parse tree (except tmp)
  - If S is a subject or an object in the sentence, output the corresponding parse subtree
  - If S is a modifier of a subject or object, output the corresponding parse subtree
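A minimal sketch of subtree extraction on the example sentence "Delta announced last September that it was purchasing Western". The parse is hand-written and hypothetical (a real dependency parser may attach words differently), and `tmp` is assumed to mark the temporal adverbial. Only the main-verb rule and the plain subtree rule are covered; the modifier rule is not.

```python
# Hypothetical hand-written parse: (index, word, head index, role).
# A head index of -1 marks the root, i.e. the main verb.
PARSE = [
    (0, "Delta", 1, "subj"),
    (1, "announced", -1, "main"),
    (2, "last", 3, "attr"),
    (3, "September", 1, "tmp"),
    (4, "that", 7, "pm"),
    (5, "it", 7, "subj"),
    (6, "was", 7, "aux"),
    (7, "purchasing", 1, "obj"),
    (8, "Western", 7, "obj"),
]


def subtree(parse, root_idx, skip_roles=()):
    """Indices of root_idx and its descendants, pruning branches by role."""
    children = {}
    for idx, _, head, _ in parse:
        children.setdefault(head, []).append(idx)
    roles = {idx: role for idx, _, _, role in parse}
    keep, stack = set(), [root_idx]
    while stack:
        node = stack.pop()
        keep.add(node)
        for child in children.get(node, []):
            if roles[child] not in skip_roles:
                stack.append(child)
    return keep


def extract_phrase(parse, target):
    """Phrase for the first token equal to target, in sentence order."""
    for idx, word, head, _ in parse:
        if word == target:
            # Main verb: whole tree minus the tmp branch; otherwise its subtree.
            skip = ("tmp",) if head == -1 else ()
            keep = subtree(parse, idx, skip)
            return " ".join(w for i, w, _, _ in parse if i in keep)
    return None
```

With this parse, targeting the main verb `announced` drops "last September", while targeting `purchasing` returns only the subordinate clause.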
15. Targeted Phrase Extraction (TPE)
- Simple TPE rules become predicate-argument pairs (word/concept, role)
  - (C-Company, SUBJ): Extract all clauses where a company name is a subject
    - Company X acquired Company Y
  - (C-Company, OBJ): Extract all clauses where a company name is an object
    - Company X acquired Company Y
  - ((C-Company, ?), (takeover, MOD_OBJ)): Extract all clauses where a company name is present and the word takeover is an object modifier
    - Company X rebuffed a takeover proposal from Company Y
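A minimal sketch of matching such rules against an analyzed clause. It assumes a clause is available as (word, concept, role) triples, that a `C-` prefix marks a concept and anything else a literal word, and that `?` matches any role. The clause analysis below, including the role given to "Company Y", is hypothetical.

```python
# Hypothetical analysis of "Company X rebuffed a takeover proposal from Company Y".
CLAUSE = [
    ("Company X", "C-Company", "SUBJ"),
    ("rebuffed", None, "MAIN"),
    ("takeover", None, "MOD_OBJ"),
    ("proposal", None, "OBJ"),
    ("Company Y", "C-Company", "MOD_OBJ"),
]

# The three rules from the slide, each a list of (word/concept, role) pairs.
RULE_SUBJ = [("C-Company", "SUBJ")]
RULE_OBJ = [("C-Company", "OBJ")]
RULE_TAKEOVER = [("C-Company", "?"), ("takeover", "MOD_OBJ")]


def pair_matches(pair, token):
    """Check one (word/concept, role) pair against one clause token."""
    target, role = pair
    word, concept, token_role = token
    if target.startswith("C-"):       # concept match, e.g. C-Company
        name_ok = target == concept
    else:                             # literal word match, case-insensitive
        name_ok = target == word.lower()
    role_ok = role == "?" or role == token_role
    return name_ok and role_ok


def rule_matches(rule, clause):
    """A rule matches when every pair is satisfied by some token in the clause."""
    return all(any(pair_matches(pair, tok) for tok in clause) for pair in rule)
```

Under this analysis the clause satisfies the subject rule and the takeover rule but not the object rule, since neither company name carries the OBJ role.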
16. Experimental Results
- Reuters M&A: Reuters-21578, Apte-90 split, ACQ category
- WSJ: The Wall Street Journal articles on M&A, Bankruptcy, Facility Expansions
17. Extraction Results
- First Financial Management Corp said it has offered to acquire Comdata Network Inc for $18 per share in cash and stock, or a total of about $342.7 million
  - Named entities: First Financial Management Corp, Comdata Network Inc
- Delta announced last September that it was purchasing Western
  - Named entities: Delta
- Nerco Inc said its oil and gas unit closed the acquisition of a 47% working interest in the Broussard oil and gas field from Davis Oil Co for about $22.5 million in cash.
  - Named entities: Nerco Inc, Davis Oil Co
- Gander Mountain Inc said it acquired the privately held Western Ranchman Outfitters, a catalog and point-of-purchase retailer of western apparel based in Cheyenne, WY.
  - Named entities: Gander Mountain Inc, Western Ranchman Outfitters
18. Using EBIG
19. Company Searching
20. Industry Searching
21. Event Reports
22. Heatmap Event Visualization
23. Conclusions
- Illustrated an end-to-end business application of event extraction
- Demonstrated the applicability of a multi-agent system integrating ML and NLP techniques to collection of focal events
- Analyst relevance feedback will be critical in filtering content
- Learning costs and benefits of news sources will improve information quality and system efficiency
  - Deliberative learning
24. Q&A