1
Re-ranking for large-scale statistical machine translation
  • Kenji Yamada and Ion Muslea
  • Language Weaver
  • {kyamada, imuslea}@languageweaver.com

2
The Task
  • Re-rank the n-best output of a phrase-based statistical machine translation (SMT) system.
  • Possible gain from the re-ranker:
  • 1-best BLEU: 31.6
  • 256-best BLEU: 42.8
  • (see the JHU03 workshop report)
  • BLEU: geometric average of the 1- to 4-gram matches with the reference (sketch below).
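
The BLEU definition above can be made concrete with a minimal sketch: the geometric average of the clipped 1- to 4-gram precisions against a single reference. The brevity penalty and multi-reference handling are omitted, and the function names are illustrative.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """All n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hyp, ref, max_n=4):
    """Geometric average of the clipped 1- to 4-gram precisions of `hyp`
    against one reference (brevity penalty omitted)."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(matched / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0.0:
        return 0.0
    return exp(sum(log(p) for p in precisions) / max_n)

print(sentence_bleu("the cat is on the mat".split(),
                    "the cat is on the red mat".split()))
```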

3
How phrase-based SMT works
  • Use a sentence-aligned bilingual corpus.
  • Run a word-alignment algorithm (such as the IBM models).
  • Extract translation phrase-pairs (based on word-alignment heuristics).
  • Obtain phrase-level and word-level translation probabilities, and other feature values.
  • Build a log-linear model over the above probabilities, language models, and other feature values (sketch of the scoring below).
  • The model weights are tuned on a dev-corpus.
  • Translate (decode) by beam search in the model.
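
A minimal sketch of the log-linear scoring step described above, assuming the model is a weighted sum of feature values; the feature names and weight values are illustrative, not the actual feature set of the baseline system.

```python
def loglinear_score(features, weights):
    """Log-linear model: score(e, f) = sum_k w_k * h_k(e, f).
    `features` maps feature names to their values for one candidate translation."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Illustrative feature values for one candidate and illustrative tuned weights.
candidate = {
    "phrase_translation_logprob": -4.2,
    "word_translation_logprob": -6.1,
    "language_model_logprob": -12.3,
    "phrase_count": 5,
}
weights = {
    "phrase_translation_logprob": 1.0,
    "word_translation_logprob": 0.5,
    "language_model_logprob": 1.2,
    "phrase_count": -0.1,
}
print(loglinear_score(candidate, weights))  # the decoder searches for the best-scoring translation
```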

4
Junk phrase-pairs
  • Translation phrase-pairs are automatically extracted from the bilingual corpus.
  • For a Chinese-English corpus of 80 million words:
  • 12 million unique phrase-pairs are extracted.
  • They contain a lot of junk.
  • Idea: use each phrase-pair as a feature for the n-best re-ranker (sketch below).
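
A small sketch of this idea, assuming each n-best hypothesis records the IDs of the phrase-pairs the decoder used to build it; every ID becomes a sparse count feature for the re-ranker (feature names and IDs are hypothetical).

```python
from collections import Counter

def phrase_pair_features(phrase_pair_ids):
    """One sparse count feature per phrase-pair ID used in a hypothesis."""
    return Counter("pp_%d" % pid for pid in phrase_pair_ids)

# Hypothetical IDs of the phrase-pairs used to produce one hypothesis.
print(phrase_pair_features([104, 88213, 104, 7]))
# Counter({'pp_104': 2, 'pp_88213': 1, 'pp_7': 1})
```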

5
Re-ranking by Perceptron
  • A fast algorithm for huge amounts of data and large numbers of parameters.
  • Previous work:
  • PRank (Crammer and Singer, 2003)
  • OAP-BPM (Harrington, 2003)
  • Applied to SMT:
  • Shen and Joshi (2005)
  • Liang et al. (2006)

6
Our extension
  • Partial pair-wise comparison:
  • the oracle (the best hypothesis in the n-best) vs. each non-oracle hypothesis.
  • The strength of the weight update is proportional to the BP1 difference between them.
  • Ensemble training:
  • split the training data, train separately, and average the learned weights.

7
Prepare the training set for the re-ranker
  • Generate the n-best translations for each training sentence:
  • 4 million sentences × 200-best = 800 million data points.
  • Calculate the BP1 score for each hypothesis in the n-best:
  • BP1 = 1-floored BLEU score (per sentence).
  • Use the best-BP1 hypothesis as the reference (sketch below).
  • Extract features from each hypothesis:
  • decoder cost
  • phrase-pair IDs
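
A hedged sketch of the scoring step: the slide defines BP1 only as a "1-floored" per-sentence BLEU, so the flooring below (match counts floored at 1, which keeps every precision non-zero) is an assumption. The oracle is then simply the best-BP1 hypothesis in each n-best list.

```python
from collections import Counter
from math import exp, log

def bp1(hyp, ref, max_n=4):
    """Per-sentence BLEU variant with the n-gram match count floored at 1,
    so no precision is ever zero.  This is a guess at the BP1 score named
    on the slide; the exact definition is not given."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(max(matched, 1) / max(sum(hyp_counts.values()), 1))
    return exp(sum(log(p) for p in precisions) / max_n)

def pick_oracle(nbest, ref):
    """The best-BP1 hypothesis in the n-best list serves as the re-ranker's reference."""
    return max(nbest, key=lambda hyp: bp1(hyp, ref))
```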

8
Algorithm
  • Init w_0 = (1 for decoder cost, 0 for phrase-pairs)
  • For each epoch:
  •   For each sentence:
  •     For each non-oracle hypothesis x_i in the n-best:
  •       if w_t · x_i < w_t · x_oracle then  // mis-classification
  •         w_{t+1} = w_t + (x_i - x_oracle) × a, where a = BP1(x_oracle) - BP1(x_i)
  •       else w_{t+1} = w_t
  •       t = t + 1
  • Output w_t, or the average Σ w_i / t (Python sketch below)
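
A Python sketch of the loop above. The slide's notation lost its operators in extraction, so two readings are assumed here: w · x is treated as a decoder-style cost (lower is better), which makes the "<" test a mis-classification check, and the update is read as w_{t+1} = w_t + a(x_i - x_oracle). The hypothesis records, with sparse 'features' and a per-sentence 'bp1' score, are illustrative data structures.

```python
def dot(w, feats):
    """Sparse dot product w · x."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def perceptron_rerank_train(data, n_epochs=5):
    """Pairwise perceptron as sketched on the slide.  `data` is a list of
    n-best lists; each hypothesis is a dict with sparse 'features'
    (decoder cost plus phrase-pair indicators) and a per-sentence 'bp1'."""
    w = {"decoder_cost": 1.0}          # init: 1 for decoder cost, 0 for phrase-pairs
    w_sum, t = {}, 0                   # running sum of w_t for weight averaging
    for _ in range(n_epochs):
        for nbest in data:
            oracle = max(nbest, key=lambda h: h["bp1"])
            for hyp in nbest:
                if hyp is oracle:
                    continue
                # mis-classification: a non-oracle hypothesis received a lower cost
                if dot(w, hyp["features"]) < dot(w, oracle["features"]):
                    a = oracle["bp1"] - hyp["bp1"]   # update strength: the BP1 gap
                    for f, v in hyp["features"].items():
                        w[f] = w.get(f, 0.0) + a * v
                    for f, v in oracle["features"].items():
                        w[f] = w.get(f, 0.0) - a * v
                for f, v in w.items():               # accumulate for averaging
                    w_sum[f] = w_sum.get(f, 0.0) + v
                t += 1
    # "Output w_t or sum(w_i) / t": return the averaged weights here
    return {f: v / t for f, v in w_sum.items()}
```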

9
Parallel Training
  • Split the training data into X sets.
  • Train a perceptron on each split.
  • Average the learned weight vectors (sketch below).
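
A sketch of the split/train/average scheme; `train_fn` stands for any trainer that returns a feature-to-weight dict, for example the `perceptron_rerank_train` sketch from the algorithm slide.

```python
def parallel_train(data, n_splits, train_fn):
    """Split the training data, train one re-ranker per split (each call could
    run on its own machine), then uniformly average the learned weight vectors."""
    splits = [data[i::n_splits] for i in range(n_splits)]
    weight_vectors = [train_fn(split) for split in splits]
    averaged = {}
    for w in weight_vectors:
        for f, v in w.items():
            averaged[f] = averaged.get(f, 0.0) + v / len(weight_vectors)
    return averaged
```

The "non-uniform weight mixture" mentioned as future work on the conclusion slide would replace the uniform 1/X averaging above with learned mixture weights.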

10
Interleaving Dev-data
  • There is a distribution difference between the Train and the Dev/Test corpora.
  • Mix duplicated copies of the Dev data into the Train data (sketch below).
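
A minimal sketch of the interleaving step; the duplication factor is an assumption, since the slide does not give one.

```python
def interleave_dev(train_sentences, dev_sentences, dup_factor=8):
    """Mix several duplicated copies of the small Dev set into the large Train
    set so the Dev/Test distribution is better represented during training."""
    return train_sentences + dev_sentences * dup_factor
```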

11
Experiment Setup
  • Baseline: phrase-based SMT system
  • Chinese-to-English
  • Train: 80 million words
  • Dev: 993 sentences
  • Test: 919 sentences
  • Test BLEU: 31.19
  • The baseline system uses 12 million phrase-pairs,
  • extracted automatically from the Train corpus.
  • Only 4 million phrase-pairs are used for the re-ranker
  • (a phrase-pair is pruned if it appears >100k times or only once; sketch below).
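
A sketch of the pruning rule as read from this slide: a phrase-pair is dropped if it occurs more than 100k times or only once in the Train corpus, which cuts the 12 million extracted pairs down to the roughly 4 million used by the re-ranker.

```python
def prune_phrase_pairs(pair_counts, min_count=2, max_count=100_000):
    """Keep phrase-pairs whose training-corpus frequency lies in [min_count, max_count]."""
    return {pair for pair, count in pair_counts.items()
            if min_count <= count <= max_count}

# Illustrative counts: only ("zhongguo", "china") survives the pruning.
counts = {("zhongguo", "china"): 5321, ("de", "the"): 250_000, ("junk", "pair"): 1}
print(prune_phrase_pairs(counts))
```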

12
Result (953-split)
  [Plot: BLEU vs. training epochs]
13
Result (no split)
  [Plot: BLEU vs. training epochs]
14
Conclusion and Future Work
  • Large-scale n-best re-ranking for phrase-based SMT.
  • Parallel perceptron training.
  • Interleaving the dev-corpus helps.
  • Future work:
  • other feature types (e.g., n-grams)
  • non-uniform weight mixture

15
Thank You!
16
Result (953-split)
  [Plot: BLEU vs. training epochs]
17
Result (no-split)
  [Plot: BLEU vs. training epochs]
18
Using Dev-data only?
  • How much do Train and Dev each contribute?
  • Baseline: 31.19
  • Train only: 31.31
  • Dev only: 31.41
  • Train + Dev: 31.72