The European Patent Office

1

European Patent Office
European Machine Translation Programme
Wolfgang Täger
December 2006
2
Overview
  • Programme Partners and Goals
  • MT engine
  • Dictionary format
  • Available corpora
  • Alignment and Extraction
  • Validation and Concordancing
  • DEMO

3
Programme Partners and Goals
  • Trigger: success of JP-EN patent translation
  • Agreement between the EPO and its Member States
  • MT of patents, abstracts and communications
    to/from English
  • Three language pairs per year
  • First three languages: FR, DE, ES
  • Candidates for next year: Swedish, Dutch,
    Italian, Romanian, Greek

4
MT engine
  • Trial with SMT system (Language Weaver)
  • Call for tender; winner: WorldLingo (Systran)
  • Going public (esp@cenet): December 2006
  • Needed: improve translation with specific
    dictionaries

5
Dictionary format
  • Desiderata:
    • open standard
    • XML, Unicode
    • support features of MT engines
    • support conditional translations (e.g. based on
      IPC, the International Patent Classification)
  • Not intended for terminology work (no definitions;
    lexical rather than semantic focus)
  • OLIF format was chosen
  • How to get dictionaries? By bilingual term
    extraction!
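
The "conditional translations (e.g. based on IPC)" requirement could be modelled roughly as follows. This is a minimal sketch and not the OLIF representation: the `Entry` and `Translation` classes, the longest-prefix rule, and the example word with IPC subclass F16B are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Translation:
    target: str
    # IPC prefixes under which this translation applies (hypothetical field).
    # An empty tuple marks the default translation.
    ipc_prefixes: tuple[str, ...] = ()


@dataclass
class Entry:
    source: str
    translations: list[Translation] = field(default_factory=list)

    def translate(self, ipc_class: str) -> str:
        """Pick the translation with the longest matching IPC prefix."""
        best_target, best_len = None, -1
        for t in self.translations:
            if t.ipc_prefixes:
                matched = [len(p) for p in t.ipc_prefixes if ipc_class.startswith(p)]
            else:
                matched = [0]  # the default matches everything, but only weakly
            if matched and max(matched) > best_len:
                best_target, best_len = t.target, max(matched)
        if best_target is None:
            raise LookupError(f"no translation of {self.source!r} for {ipc_class!r}")
        return best_target


# Illustrative entry: German "Mutter" is "nut" in fastening technology,
# "mother" elsewhere. The IPC prefix is an assumption of this sketch.
mutter = Entry(
    source="Mutter",
    translations=[
        Translation("mother"),
        Translation("nut", ipc_prefixes=("F16B",)),
    ],
)

print(mutter.translate("F16B 37/00"))
print(mutter.translate("A61K 31/00"))
```

Choosing the longest matching prefix lets a dictionary hold a general default alongside progressively narrower, field-specific translations.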

6
Available corpora
  • 560,000 EP-B publications -> claims in EN, DE, FR
  • 300,000 DE-T2 publications
  • 37,000 ES-B3/T3 publications
  • -> Align corpora for term extraction,
    concordancing, translation memory (and SMT)

Diagram: contents of each publication type (DESC = description, CL = claims)
  • ES-B3/T3 (LaTeX): DESC ES, CL ES
  • DE-T2: DESC DE, (CL DE)
  • EP-B1: DESC EN or FR or DE; CL EN, CL FR, CL DE
8
Alignment and Extraction
  • Alignment: trial at EPO with internally developed
    software
  • External companies did not improve on this result
    during the call for tender.

9
Alignment and Extraction
  • Call for tender for bilingual term extraction
  • Winner: DFKI
  • Alignment of corpora, POS tagging, identification
    of terms
  • Pairing of terms using clues like co-occurrence
    score, string similarity, grammatical clues,
    position, available dictionaries, ...
  • Providing further information like gender,
    inflection, transitivity, countability, ...
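
Two of the pairing clues named above, co-occurrence score and string similarity, can be combined as in the sketch below. It is not the DFKI method: the Dice coefficient, the `difflib` similarity measure, the 0.7 weighting, and the sample terms are assumptions chosen to keep the example small.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import product


def dice_scores(aligned):
    """Dice co-occurrence score for every term pair seen in the same segment.

    `aligned` is a list of (source_terms, target_terms) tuples,
    one tuple per aligned segment.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned:
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def string_similarity(a, b):
    """Surface similarity in [0, 1]; helps with cognates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_terms(aligned, weight=0.7):
    """Return the best target term and combined score for each source term."""
    best = {}
    for (s, t), dice in dice_scores(aligned).items():
        score = weight * dice + (1 - weight) * string_similarity(s, t)
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Invented sample data: terms already extracted from aligned DE-EN segments.
aligned_segments = [
    (["Ventil", "Kolben"], ["valve", "piston"]),
    (["Ventil"], ["valve"]),
    (["Kolben", "Zylinder"], ["piston", "cylinder"]),
]

for source, (target, score) in sorted(pair_terms(aligned_segments).items()):
    print(f"{source} -> {target} ({score:.2f})")
```

The other clues on the slide (grammatical clues, position, available dictionaries) would enter as further weighted terms in the same score.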

10
Validation and Concordancing
  • Development of OLIF editor at EPO
  • Remove noise
  • Correct entries
  • Use a concordancer (provides statistics based on
    parallel corpora)
  • -> DEMO
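
A concordancer of the kind mentioned above can be reduced to a lookup over aligned segment pairs plus a count of candidate translations. The sketch below is not the EPO tool: the function name, the plain substring matching, and the sample sentences are assumptions.

```python
from collections import Counter


def concordance(parallel, source_term, candidates):
    """Find segment pairs containing `source_term` and count candidates.

    `parallel` is a list of (source_segment, target_segment) strings.
    Matching is case-insensitive substring search, so inflected forms
    and compounds containing the term are matched as well.
    """
    hits = []
    counts = Counter()
    needle = source_term.lower()
    for src, tgt in parallel:
        if needle not in src.lower():
            continue
        hits.append((src, tgt))
        tgt_lower = tgt.lower()
        for cand in candidates:
            if cand.lower() in tgt_lower:
                counts[cand] += 1
    return hits, counts


# Invented sample corpus for illustration only.
corpus = [
    ("Das Ventil ist geschlossen.", "The valve is closed."),
    ("Ein Ventil nach Anspruch 1.", "A valve according to claim 1."),
    ("Der Kolben bewegt sich.", "The piston moves."),
    ("Das Ventil öffnet.", "The tap opens."),
]

hits, counts = concordance(corpus, "Ventil", ["valve", "tap"])
for src, tgt in hits:
    print(f"{src}  |  {tgt}")
print(counts.most_common())
```

Counts like these give a validator a quick signal when deciding whether an extracted dictionary entry is noise or a genuine translation.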

11
OLIF format
  • Support of more languages
  • Clarification of inflection scheme
  • Clarification of term vs lex approach
  • Tools

12
Relational database?
Diagram (entities): Transl, SemRel, Concept, Term, Naming, SurfForm, InflForm, Lemma, RegEx, LexType, Infl
13
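Relational database?
Diagram (example instance; values matched by position to the entities on the previous slide)
  • Concept: hot drink ...
  • Term: grüner Tee
  • SurfForm: grüner
  • InflForm: Nom. Sg. str. f. pos.
  • Lemma: grün
  • RegEx: -er
  • LexType: DE, Adj
  • Infl: iLike klein
  • Transl, SemRel, Naming: no example value shown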
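
The slides name the entities but the transcript does not preserve how they are linked, so the following is only one possible reading. It covers three of the entities (Concept, Term, Naming) and assumes that Naming links a term to a concept. The table and column names, the derivation of translations via a shared concept, and the English term "green tea" are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL
    );
    CREATE TABLE term (
        id   INTEGER PRIMARY KEY,
        lang TEXT NOT NULL,
        form TEXT NOT NULL
    );
    -- Assumed meaning of "Naming": this term names this concept.
    CREATE TABLE naming (
        term_id    INTEGER NOT NULL REFERENCES term(id),
        concept_id INTEGER NOT NULL REFERENCES concept(id),
        PRIMARY KEY (term_id, concept_id)
    );
    """
)

conn.execute("INSERT INTO concept (id, label) VALUES (1, 'hot drink')")
conn.executemany(
    "INSERT INTO term (id, lang, form) VALUES (?, ?, ?)",
    [(1, "DE", "grüner Tee"), (2, "EN", "green tea")],  # EN term is invented
)
conn.executemany(
    "INSERT INTO naming (term_id, concept_id) VALUES (?, ?)",
    [(1, 1), (2, 1)],
)


def translations(conn, form, source_lang, target_lang):
    """Terms in `target_lang` that name a concept also named by `form`."""
    rows = conn.execute(
        """
        SELECT tgt.form
        FROM term AS src
        JOIN naming AS n1 ON n1.term_id = src.id
        JOIN naming AS n2 ON n2.concept_id = n1.concept_id
        JOIN term AS tgt ON tgt.id = n2.term_id
        WHERE src.form = ? AND src.lang = ? AND tgt.lang = ?
        ORDER BY tgt.form
        """,
        (form, source_lang, target_lang),
    ).fetchall()
    return [row[0] for row in rows]


print(translations(conn, "grüner Tee", "DE", "EN"))
```

The slides list Transl as an entity of its own, so translations may well have been intended as a direct relation rather than one derived through concepts as here.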
14
End
  • Thank you!