Foundations of Statistical NLP Chapter 4. Corpus-Based Work - PowerPoint PPT Presentation


About This Presentation


Slides: 16

Provided by: klplReP


Transcript and Presenter's Notes


1
Foundations of Statistical NLP, Chapter 4: Corpus-Based Work


2
Abstract

Getting Set Up
  Computers, Corpora, Software
Looking at Text
  Low-level formatting issues
  Tokenization: What is a word?
  Morphology
  Sentences
Mark-up Data
  Markup schemes
  Grammatical tagging

3
Getting Set Up (1/2)

Computers
  Text corpora are usually big.
  Their size can be a major limitation on the use of corpora.
Corpora
  Use text corpora distributed by the main organizations.
  Corpus: a special collection of textual material
  General issue: whether the corpus is a representative sample of the population of interest

4
Getting Set Up (2/2)

Software
  Text editors: show the text fairly literally
  Regular expressions: find certain patterns in text
  Programming languages: C, C++, Perl
  Programming techniques
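A regular expression plus a hash table already covers a lot of basic corpus work. The sketch below is only an illustration: the word pattern and the name `count_words` are assumptions, and Python's `Counter` stands in for the hash table. It reads the corpus one line at a time, so a big corpus never has to fit in memory at once.

```python
import re
from collections import Counter

# Hypothetical word pattern (an assumption, not a standard definition):
# a run of letters, optionally joined to more letters by apostrophes.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def count_words(lines):
    """Count lowercased word tokens, reading one line at a time."""
    counts = Counter()
    for line in lines:
        counts.update(match.lower() for match in WORD.findall(line))
    return counts

if __name__ == "__main__":
    sample = ["The cat saw the dog.", "The dog didn't run."]
    print(count_words(sample).most_common(2))  # [('the', 3), ('dog', 2)]
```

Because any iterable of strings works, an open file can be passed directly, e.g. `with open("corpus.txt", encoding="utf-8") as f: count_words(f)`. The file name is hypothetical.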

6
Looking at Text

Text comes either in a raw format or marked up.
Markup
  A term used for putting codes of some sort into a computer file
  Commercial word processing: WYSIWYG, with the markup kept hidden
Features of text in human languages
  Some of them make text difficult to process automatically.

7
Low-level formatting issues

Junk formatting/content
  Junk: document headers, separators, tables, diagrams, etc.
  OCRed text, or a corpus from which we only want the English text → remove the junk (e.g., text in other languages)
Uppercase and lowercase
  The original Brown corpus was all in capital letters (a * marked letters that were capitals in the source text).
  Should we treat "brown" in "Richard Brown" and in "brown paint" as the same?
  Proper name detection is a difficult problem.
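One simple heuristic for the capitalization question is sketched below. It is an illustrative choice, not a method given on the slide, and `normalize_case` is a hypothetical name. The heuristic lowercases a sentence-initial word only when the same word also occurs in lowercase somewhere in the corpus, and it leaves capitals in the middle of a sentence alone, so that the name *Brown* stays separate from the color *brown*.

```python
from collections import Counter

def normalize_case(sentences):
    """Lowercase sentence-initial tokens only if the corpus also has them in lowercase."""
    # Evidence that a word is an ordinary (non-name) word: it occurs in lowercase.
    seen_lower = {tok for sent in sentences for tok in sent if tok.islower()}
    normalized = []
    for sent in sentences:
        sent = list(sent)
        if sent and sent[0].lower() in seen_lower:
            sent[0] = sent[0].lower()
        normalized.append(sent)
    return normalized

corpus = [
    ["Richard", "Brown", "bought", "brown", "paint"],
    ["Brown", "paint", "dries", "fast"],
]
counts = Counter(tok for sent in normalize_case(corpus) for tok in sent)
print(counts["Brown"], counts["brown"])  # 1 2
```

The heuristic fails when a surname such as Brown starts a sentence in a corpus that also contains lowercase *brown*. This is a small instance of why proper name detection is hard.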

8
Tokenization: What is a word? (1)

Tokenization
  Dividing the input text into units called tokens
What is a word?
  Graphic word (Kučera and Francis 1967): a string of contiguous alphanumeric characters with space on either side; may include hyphens and apostrophes, but no other punctuation marks
  → A workable definition, but it fails on cases such as $22.50, Micro$oft, C|net
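The graphic-word definition maps fairly directly onto a regular expression. The sketch below encodes it under stated assumptions: the pattern and the name `graphic_words` are hypothetical, and "alphanumeric" is read as Unicode letters and digits. Running it on the slide's problem cases ($22.50, Micro$oft, C|net) shows how the definition splits them apart.

```python
import re

# Approximation of a "graphic word" (an assumption about how to encode it):
# letters/digits ([^\W_] excludes the underscore that \w would allow),
# optionally joined by single internal hyphens or apostrophes.
GRAPHIC_WORD = re.compile(r"[^\W_]+(?:['-][^\W_]+)*")

def graphic_words(text):
    return GRAPHIC_WORD.findall(text)

print(graphic_words("C|net's co-operative sells it for $22.50 at Micro$oft."))
# ['C', "net's", 'co-operative', 'sells', 'it', 'for', '22', '50', 'at', 'Micro', 'oft']
```

Hyphens and apostrophes survive inside words, but `$`, `.` and `|` act as separators. That is exactly where the definition stops being adequate.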

9
Tokenization: What is a word? (2)

Periods
  Distinguish end-of-sentence punctuation marks from abbreviation marks, as in etc. or Wash.
Single apostrophes
  English contractions such as I'll or isn't
  dog's: "dog is", "dog has", or the genitive case
Hyphenation
  Line-breaking hyphens present in the typographical source
  Hyphenated forms: e-mail, 26-year-old, co-operate
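Contractions can be split off with a small set of suffix rules. The sketch below follows the Penn Treebank convention of splitting *isn't* into *is* + *n't*. That convention, the rule list, and the function names are assumptions from outside these slides. The sketch only separates the pieces: it cannot decide whether *'s* means *is*, *has*, or the genitive, because that needs context.

```python
import re

# Hypothetical clitic rules in the Penn Treebank style (an assumption):
# a non-empty host word followed by one of these clitics at the end of the token.
CLITIC = re.compile(r"^(.+?)(n't|'ll|'re|'ve|'m|'d|'s)$", re.IGNORECASE)

def split_clitics(token):
    match = CLITIC.match(token)
    return [match.group(1), match.group(2)] if match else [token]

def tokenize(text):
    tokens = []
    for chunk in text.split():
        tokens.extend(split_clitics(chunk))
    return tokens

print(tokenize("I'll say the dog's bone isn't here"))
# ['I', "'ll", 'say', 'the', 'dog', "'s", 'bone', 'is', "n't", 'here']
```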

10
Tokenization: What is a word? (3)

The same form representing multiple words
  Homographs: saw has two lexemes (Chapter 7)
Word segmentation in other languages
  Many languages do not put spaces between words.
Whitespace not indicating a word break
  the New York-New Haven railroad
Variant coding of information of a certain semantic type
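For languages written without spaces, a classic baseline is greedy *maximum matching*. The slide does not discuss this technique; it is brought in here as an illustration. At each position the algorithm takes the longest dictionary word that fits. The sketch uses English with the spaces removed, so the output is easy to read, together with a tiny hypothetical lexicon.

```python
# Tiny hypothetical lexicon; the words are English so the result is easy to read.
LEXICON = {"the", "table", "down", "there", "theta", "bled", "own"}

def max_match(text, lexicon, max_len=10):
    """Greedy left-to-right longest-match segmentation."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one is in the lexicon.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit a single character and move on.
            words.append(text[i])
            i += 1
    return words

print(max_match("thetabledownthere", LEXICON))  # ['theta', 'bled', 'own', 'there']
```

The intended reading is "the table down there", but the greedy matcher commits to the longer *theta* and never recovers. This weakness is why practical segmenters generally add statistics or search over alternative segmentations.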

11
Morphology

Stemming
  A process that strips off affixes and leaves you with a stem
Lemmatization
  Finding the lemma or lexeme of which the word at hand is an inflected form
The IR community has found that stemming does not help performance.
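Even a toy example shows the difference between stemming and lemmatization. The sketch below pairs a deliberately crude suffix stripper (it is not the Porter stemmer) with a hypothetical hand-written lemma table. A real lemmatizer would need a full morphological lexicon.

```python
# Toy suffix list, checked in order (an assumption; not a real stemmer).
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word, min_stem=3):
    """Strip the first matching suffix, as long as at least min_stem letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma table mapping inflected forms to their lemma.
LEMMAS = {"was": "be", "were": "be", "mice": "mouse", "flies": "fly"}

def lemmatize(word):
    """Look the form up; forms missing from the table are returned unchanged."""
    return LEMMAS.get(word, word)

for w in ["walking", "walked", "flies", "mice", "was"]:
    print(w, stem(w), lemmatize(w))
```

The stemmer turns *flies* into the non-word *fli* and cannot relate *mice* or *was* to anything. The lemma table handles those forms but knows only what is listed in it.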

12
Sentences

What is a sentence?
  Something ending with a '.', '?' or '!'
  Segments separated by a colon, semicolon, or dash may also be regarded as sentences.
Recent research on sentence boundary detection
  Riley (1989): statistical classification trees
  Palmer and Hearst (1994, 1997): a neural network to predict sentence boundaries
  Mikheev (1998): Maximum Entropy approaches to the problem
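The work cited here uses learned classifiers. For contrast, the following is a minimal rule-based splitter, offered as an illustration rather than any of the cited methods. It treats '?' and '!' as boundaries. A period also counts as a boundary unless it belongs to a known abbreviation, and an abbreviation such as *etc.* ends a sentence only when a capitalized word follows. The abbreviation lists and function name are hypothetical and far from complete.

```python
# Hypothetical abbreviation lists (illustrative only, far from complete).
TITLES = {"mr.", "mrs.", "dr.", "prof."}          # usually followed by a name
OTHER_ABBREVS = {"etc.", "wash.", "e.g.", "i.e."}  # may also end a sentence

def split_sentences(text):
    """Rule-based sentence splitter over whitespace-separated tokens."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        lowered = tok.lower()
        if tok.endswith(("?", "!")):
            boundary = True
        elif tok.endswith("."):
            if lowered in TITLES:
                boundary = False            # "Dr. Smith": never a boundary
            elif lowered in OTHER_ABBREVS:
                # e.g. "etc.": a boundary only before a capitalized word (or at the end)
                boundary = following == "" or following[0].isupper()
            else:
                boundary = True             # ordinary word followed by a period
        else:
            boundary = False
        if boundary:
            sentences.append(" ".join(current))
            current = []
    if current:                             # text that ends without punctuation
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Smith lives in Seattle, Wash. He buys pens, ink, etc. and paper. Does he?"
for sentence in split_sentences(text):
    print(sentence)
```

Failure cases are easy to find with rules like these. For example, *Wash.* followed by a capitalized name inside a sentence is wrongly split. Cases of this kind help explain the interest in statistical approaches.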

13
Mark-up Schemes

Early days
  Markup schemes included header information in texts (giving author, date, title, etc.)
SGML
  A general language that lets one define a grammar for texts
XML
  A subset of SGML designed particularly for the web
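Once text is in XML, header information such as author, date, and title can be read with a standard parser rather than ad hoc string matching. The sketch below uses Python's built-in `xml.etree.ElementTree`. The document and its element names (`header`, `author`, `s`, and so on) are made up and do not come from any particular markup scheme.

```python
import xml.etree.ElementTree as ET

# A made-up marked-up document; the element names are illustrative only.
DOC = """<text>
  <header>
    <author>A. Writer</author>
    <date>1999</date>
    <title>A Sample Text</title>
  </header>
  <body>
    <s>The cat sat .</s>
    <s>It purred .</s>
  </body>
</text>"""

def read_document(xml_string):
    """Return the header fields as a dict and the text of every <s> element."""
    root = ET.fromstring(xml_string)
    header = {field.tag: field.text for field in root.find("header")}
    sentences = [s.text for s in root.iter("s")]
    return header, sentences

header, sentences = read_document(DOC)
print(header["author"], header["date"])  # A. Writer 1999
print(sentences)                          # ['The cat sat .', 'It purred .']
```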

14
Grammatical tagging

A common first step of analysis
  Automatic grammatical tagging for (often fine-grained) categories, e.g., distinguishing comparative and superlative forms
Tag sets (Table 4.5)
  Incorporate the morphological distinctions of a particular language
The design of a tag set
  Target feature of classification: useful information about the grammatical class of a word
  Predictive feature: predicting the behavior of other words in the context
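Tagged corpora are often distributed as word/TAG tokens, a format used, for example, in distributions of the tagged Brown corpus. The sketch below parses that format and collects two kinds of counts: how often each tag occurs, and which tags each word form receives. The tags are Penn Treebank tags (JJR for comparatives, JJS for superlatives). They are used here as an assumption rather than the tag set of Table 4.5, and the sample sentence is made up.

```python
from collections import Counter, defaultdict

def parse_tagged(text):
    """Turn 'word/TAG' tokens into (word, tag) pairs, splitting on the LAST slash."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]

def tag_stats(pairs):
    tag_counts = Counter(tag for _, tag in pairs)
    tags_per_word = defaultdict(set)  # which tags each word form received
    for word, tag in pairs:
        tags_per_word[word.lower()].add(tag)
    return tag_counts, tags_per_word

sample = ("The/DT bigger/JJR saw/NN cut/VBD the/DT biggest/JJS log/NN ./. "
          "I/PRP saw/VBD it/PRP ./.")
tag_counts, tags_per_word = tag_stats(parse_tagged(sample))
print(tag_counts["JJR"], tag_counts["JJS"])   # 1 1
print(sorted(tags_per_word["saw"]))           # ['NN', 'VBD']
```

The per-word tag sets make homographs such as *saw* visible: one form receives a noun tag in one place and a verb tag in another.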

