Title: I256: Applied Natural Language Processing
1 I256 Applied Natural Language Processing
Marti Hearst, Nov 13, 2006
2 Today
- Automating Lexicon Construction
3 PMI (Turney 2001)
- Pointwise Mutual Information
- Posed as an alternative to LSA
- score(choice_i) = log2( p(problem & choice_i) / (p(problem) p(choice_i)) )
- With various assumptions, this simplifies to
  score(choice_i) = p(problem & choice_i) / p(choice_i)
- Conducts experiments with 4 ways to compute this, the first being
  score1(choice_i) = hits(problem AND choice_i) / hits(choice_i)
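The simplified score above can be computed directly from search-engine hit counts. A minimal sketch, with entirely hypothetical hit counts (the numbers below are made up for illustration, not from Turney's paper):

```python
def pmi_ir_score(hits_problem_and_choice: int, hits_choice: int) -> float:
    """Turney's simplified PMI-IR score for one answer choice:
    score1(choice_i) = hits(problem AND choice_i) / hits(choice_i)."""
    if hits_choice == 0:
        return 0.0
    return hits_problem_and_choice / hits_choice

# Hypothetical hit counts for a TOEFL-style problem word and two
# candidate synonyms (illustrative numbers only).
hits = {
    "imposed":  {"joint": 1200, "alone": 90_000},
    "believed": {"joint": 150,  "alone": 500_000},
}
best = max(hits, key=lambda c: pmi_ir_score(hits[c]["joint"], hits[c]["alone"]))
print(best)  # the choice co-occurring most strongly with the problem word
```

The choice with the highest ratio of joint hits to solo hits is picked as the synonym.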
4 Dependency Parser (Lin 98)
- Syntactic parser that emphasizes dependency relationships between lexical items.
- Alice is the author of the book.
- The book is written by Alice.
[Dependency parse illustration (relations: s, pred, det, mod, pcomp-n) by Bengi Mizrahi]
5 Automating Lexicon Construction
6 What is a Lexicon?
- A database of the vocabulary of a particular domain (or a language)
- More than a list of words/phrases
- Usually some linguistic information
  - Morphology (manag-e/es/ing/ed → manage)
  - Syntactic patterns (transitivity, etc.)
- Often some semantic information
  - Is-a hierarchy
  - Synonymy
  - Numbers converted to normal form: Four → 4
  - Dates converted to normal form
  - Alternative names converted to explicit form
    - Mr. Carr, Tyler, Presenter → Tyler Carr
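The normalization steps above can be sketched as simple table lookups. This is a minimal illustration; the mappings below are toy stand-ins for a real lexicon's normalization tables:

```python
# Toy normalization tables (illustrative, not a real lexicon).
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4"}
ALIASES = {"mr. carr": "Tyler Carr", "tyler": "Tyler Carr",
           "presenter": "Tyler Carr"}

def normalize(token: str) -> str:
    """Map number words and alternative names to a canonical form;
    leave unknown tokens unchanged."""
    key = token.lower()
    if key in NUMBER_WORDS:
        return NUMBER_WORDS[key]
    return ALIASES.get(key, token)

print(normalize("Four"))      # -> 4
print(normalize("Mr. Carr"))  # -> Tyler Carr
```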
7 Lexica in Text Mining
- Many text mining tasks require named entity recognition.
- Named entity recognition requires a lexicon in most cases.
- Example 1: Question answering
  - Where is Mount Everest?
  - A list of geographic locations increases accuracy
- Example 2: Information extraction
  - Consider scraping book data from amazon.com
  - Template contains the field "publisher"
  - A list of publishers increases accuracy
- Manual construction is expensive: 1000s of person hours!
- Sometimes an unstructured inventory is sufficient
- Often you need more structure, e.g., a hierarchy
8 Semantic Relation Detection
- Goal: automatically augment a lexical database
- Many potential relation types
  - ISA (hypernymy/hyponymy)
  - Part-Of (meronymy)
- Idea: find unambiguous contexts which (nearly) always indicate the relation of interest
9 Lexico-Syntactic Patterns (Hearst 92)
10 Lexico-Syntactic Patterns (Hearst 92)
11 Adding a New Relation
12 Automating Semantic Relation Detection
- Lexico-syntactic patterns
  - Should occur frequently in text
  - Should (nearly) always suggest the relation of interest
  - Should be recognizable with little pre-encoded knowledge.
- These patterns have been used extensively by other researchers.
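One of the Hearst (1992) hyponym patterns, "NP such as NP (, NP ...)", can be sketched with a regular expression. The noun-phrase matching below is a crude word-level approximation (a real system would use a parser, as later slides note):

```python
import re

# "NP_h such as NP_1 (, NP_2 ...)" -- each NP_i ISA head(NP_h).
PATTERN = re.compile(r"(\w[\w ]*?)\s+such as\s+([\w ]+(?:,\s*[\w ]+)*)")

def extract_isa(sentence: str):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for m in PATTERN.finditer(sentence):
        hypernym = m.group(1).split()[-1]          # head noun of the first NP
        for hyponym in m.group(2).split(","):
            pairs.append((hyponym.strip(), hypernym))
    return pairs

print(extract_isa("The attack used weapons such as bombs, dynamite, explosives"))
```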
13 Lexicon Construction (Riloff 93)
- Attempt 1: Iterative expansion of a phrase list
- Start with
  - Large text corpus
  - List of seed words
1. Identify good seed word contexts
2. Collect close nouns in contexts
3. Compute confidence scores for nouns
4. Iteratively add high-confidence nouns to the seed word list. Go to 2.
- Output: Ranked list of candidates
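The loop above can be sketched in a few lines. This is a deliberately simplistic stand-in for Riloff's method: the only "context" modeled is conjunction ("X and Y", "X and other Y"), the confidence score is a raw co-occurrence count, and plural matching is a crude strip-trailing-s hack:

```python
def expand_lexicon(corpus, seeds, iterations=2, top_k=2):
    """Toy sketch of iterative lexicon expansion: collect words that appear
    in a conjunction context with current lexicon entries, score them by
    co-occurrence count, add the best-scoring ones, and repeat."""
    lexicon = set(seeds)
    known = lambda w: w in lexicon or w.rstrip("s") in lexicon  # crude stemming
    for _ in range(iterations):
        scores = {}
        for sentence in corpus:
            words = sentence.lower().replace(".", "").split()
            for i, w in enumerate(words):
                if w != "and" or i == 0 or i + 1 >= len(words):
                    continue
                left, right = words[i - 1], words[i + 1]
                if right == "other" and i + 2 < len(words):  # "X and other Y"
                    right = words[i + 2]
                if known(right) and not known(left):
                    scores[left] = scores.get(left, 0) + 1
                if known(left) and not known(right):
                    scores[right] = scores.get(right, 0) + 1
        for cand, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]:
            lexicon.add(cand)
    return lexicon

corpus = ["They use TNT and other explosives.",
          "Rockets and bombs hit the city."]
print(expand_lexicon(corpus, {"bomb", "dynamite", "explosives"}))
```

On this toy corpus the seed set picks up "tnt" and "rockets", mirroring the example on the next slide.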
14 Lexicon Construction Example
- Category: weapon
- Seed words: bomb, dynamite, explosives
- Context: <new-phrase> and <seed-phrase>
- Iterate
  - Context: "They use TNT and other explosives."
  - Add word: TNT
- Other words added by the algorithm: rockets, bombs, missile, arms, bullets
15 Lexicon Construction: Attempt 2
- Multilevel bootstrapping (Riloff and Jones 1999)
- Generate two data structures in parallel
  - The lexicon
  - A list of extraction patterns
- Input as before
  - Corpus (not annotated)
  - List of seed words
16 Multilevel Bootstrapping
- Initial lexicon: seed words
- Level 1: Mutual bootstrapping
  - Extraction patterns are learned from lexicon entries.
  - New lexicon entries are learned from extraction patterns.
  - Iterate
- Level 2: Filter lexicon
  - Retain only the most reliable lexicon entries
  - Go back to level 1
- Two levels perform better than level 1 alone.
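The two-level structure can be sketched on toy data. In this simplification a "pattern" is just the word immediately preceding a lexicon entry (e.g. "use <x>") and reliability is the number of distinct patterns that find a word; the real system uses richer extraction patterns and the RlogF score from the next slide:

```python
def multilevel_bootstrap(corpus, seeds, levels=2, keep=1):
    """Toy sketch of multilevel bootstrapping: level 1 alternates between
    learning patterns from lexicon entries and entries from patterns;
    level 2 keeps only the most reliable new entries and restarts."""
    lexicon = set(seeds)
    for _ in range(levels):
        found_by = {}
        # Level 1: mutual bootstrapping between patterns and lexicon entries
        for _ in range(2):
            patterns = {words[i] for words in corpus
                        for i in range(len(words) - 1) if words[i + 1] in lexicon}
            for words in corpus:
                for i in range(len(words) - 1):
                    if words[i] in patterns and words[i + 1] not in seeds:
                        found_by.setdefault(words[i + 1], set()).add(words[i])
            lexicon |= set(found_by)
        # Level 2: retain only the most reliable new entries, then restart
        best = sorted(found_by, key=lambda w: len(found_by[w]), reverse=True)[:keep]
        lexicon = set(seeds) | set(best)
    return lexicon

corpus = [["they", "use", "tnt"],
          ["rebels", "use", "rockets"],
          ["police", "found", "tnt"],
          ["police", "found", "rockets"],
          ["soldiers", "found", "grenades"]]
print(multilevel_bootstrap(corpus, {"tnt"}))
```

Here "rockets" survives the level-2 filter (found by two independent patterns) while "grenades" (one pattern) is discarded, showing how the filter limits drift.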
17 Scoring of Patterns
- Example
  - Concept: company
  - Pattern: "owned by <x>"
- Patterns are scored as follows
  - score(pattern) = (F/N) * log(F)
  - F = number of unique lexicon entries produced by the pattern
  - N = total number of unique phrases produced by the pattern
- Selects for patterns that are
  - Selective (the F/N part)
  - High-yield (the log(F) part)
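The RlogF formula is direct to implement. A minimal sketch on toy data (base-2 log is an assumption here; the phrases below are illustrative, not from the paper):

```python
import math

def pattern_score(pattern_extractions, lexicon):
    """score(pattern) = (F/N) * log(F), where F = unique extractions already
    in the lexicon and N = all unique extractions the pattern produced."""
    unique = set(pattern_extractions)
    F = len(unique & lexicon)
    N = len(unique)
    if F == 0 or N == 0:
        return 0.0
    return (F / N) * math.log2(F)

lexicon = {"ibm", "boeing", "microsoft"}
# Toy output of the pattern "owned by <x>": 3 of 4 phrases are known entries.
print(pattern_score(["ibm", "boeing", "microsoft", "the city"], lexicon))
```

A pattern that extracts only known entries (high F/N) but extracts many of them (high log F) scores best.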
18 Scoring of Noun Phrases
- Noun phrases are scored as follows
  - score(NP) = sum_k (1 + 0.01 * score(pattern_k))
  - where we sum over all patterns k that fire for the NP
- Main criterion is the number of independent patterns that fire for this NP.
- Gives a higher score to NPs found by high-confidence patterns.
- Example
  - New candidate phrase: boeing
  - Occurs in "owned by <x>", "sold to <x>", "offices of <x>"
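The NP score is a one-liner; the 0.01 weight makes the pattern count dominate and pattern quality act only as a tie-breaker. The pattern scores below are illustrative numbers, not values from the paper:

```python
def np_score(pattern_scores):
    """score(NP) = sum over patterns that fire for the NP of
    (1 + 0.01 * score(pattern_k))."""
    return sum(1 + 0.01 * s for s in pattern_scores)

# "boeing" fires in "owned by <x>", "sold to <x>", "offices of <x>"
# with hypothetical pattern scores:
print(np_score([1.19, 0.85, 0.60]))  # ~3.03: three patterns, slight quality bonus
```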
19 Shallow Parsing
- Shallow parsing is needed
  - For identifying noun phrases and their heads
  - For generating extraction patterns
- For scoring: when are two noun phrases the same?
  - Head phrase matching
  - X matches Y if X is the rightmost substring of Y
  - "New Zealand" matches "Eastern New Zealand"
  - "New Zealand cheese" does not match "New Zealand"
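Head phrase matching at the word level can be sketched directly from the definition:

```python
def head_match(x: str, y: str) -> bool:
    """X matches Y iff X is the rightmost word sequence of Y."""
    xw, yw = x.lower().split(), y.lower().split()
    return len(xw) <= len(yw) and yw[-len(xw):] == xw

print(head_match("New Zealand", "Eastern New Zealand"))  # True
print(head_match("New Zealand cheese", "New Zealand"))   # False
```

Matching whole words (not raw substrings) avoids spurious matches like "land" against "New Zealand".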
20 Seed Words
21 Mutual Bootstrapping
22 Extraction Patterns
23 Level 1: Mutual Bootstrapping
- Drift can occur.
- It only takes one bad apple to spoil the barrel.
- Example: "head"
- Introduce level 2 bootstrapping to prevent drift.
24 Level 2: Meta-Bootstrapping
25 Evaluation
26 Co-Training (Collins & Singer 99)
- Similar back and forth between
  - an extraction algorithm and
  - a lexicon
- New: they use word-internal features
  - Is the word all caps? (IBM)
  - Is the word all caps with at least one period? (N.Y.)
  - Non-alphabetic character? (AT&T)
  - The constituent words of the phrase ("Bill" is a feature of the phrase "Bill Clinton")
- Classification formalism: Decision Lists
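The word-internal ("spelling") features above are easy to compute. A simplified approximation of the feature inventory (the exact features in the paper differ):

```python
import re

def spelling_features(phrase: str):
    """Word-internal features in the spirit of Collins & Singer (1999)."""
    feats = set()
    if phrase.isupper():
        feats.add("all-caps")                  # IBM
    if re.fullmatch(r"([A-Z]\.)+", phrase):
        feats.add("caps-with-periods")         # N.Y.
    if re.search(r"[^A-Za-z\s]", phrase):
        feats.add("non-alpha-char")            # AT&T
    for word in phrase.split():
        feats.add("contains=" + word.lower())  # "Bill" in "Bill Clinton"
    return feats

print(spelling_features("N.Y."))
print(spelling_features("Bill Clinton"))
```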
27 Collins & Singer: Seed Words
Note that the categories are more generic than in the Riloff/Jones case.
28 Collins & Singer: Algorithm
- Train decision rules on the current lexicon (initially the seed words).
  - Result: new set of decision rules.
- Apply the decision rules to the training set.
  - Result: new lexicon.
- Repeat
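The train/apply loop can be sketched on toy data. This simplification induces a rule "feature → label" only when a feature's known labels are unanimous, standing in for the confidence-ranked decision list of the paper:

```python
def cotrain(examples, seeds, rounds=2):
    """Toy sketch of the Collins & Singer loop. An example is a
    (name, feature-set) pair; seeds maps some names to labels."""
    labels = dict(seeds)
    for _ in range(rounds):
        # Train decision rules on the current lexicon
        votes = {}
        for name, feats in examples:
            if name in labels:
                for f in feats:
                    votes.setdefault(f, set()).add(labels[name])
        rules = {f: labs.pop() for f, labs in votes.items() if len(labs) == 1}
        # Apply the rules to the training set -> new lexicon
        for name, feats in examples:
            for f in feats:
                if f in rules:
                    labels.setdefault(name, rules[f])
    return labels

examples = [("IBM", {"all-caps"}), ("AT&T", {"all-caps", "non-alpha"}),
            ("Smith", {"cap-word"}), ("Jones", {"cap-word"})]
print(cotrain(examples, {"IBM": "company", "Smith": "person"}))
```

Two seed names suffice to label the whole toy set: rules learned from IBM and Smith carry over to AT&T and Jones.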
29 Collins & Singer: Results
Per-token evaluation?
30 More Recent Work
- KnowItAll system at U. Washington
- WebFountain project at IBM
31 Lexica: Limitations
- Named entity recognition is more than lookup in a list.
- Linguistic variation
  - Manage, manages, managed, managing
- Non-linguistic variation
  - Human gene MYH6 in the lexicon, MYH7 in the text
- Ambiguity
  - What if a phrase has two different semantic classes?
  - Bioinformatics example: gene/protein metonymy
32 Discussion
- Partial resources are often available.
  - E.g., you have a gazetteer and want to extend it to a new geographic area.
- Some manual post-editing is necessary for high quality.
- Semi-automated approaches offer good coverage with much-reduced human effort.
- Drift is not a problem in practice if there is a human in the loop anyway.
- An approach that can deal with diverse evidence is preferable.
- Hand-crafted features (the period for N.Y.) help a lot.