CLARIN and PAROLE: a selfish talk (PowerPoint presentation)

Provided by: stev283

1
CLARIN and PAROLE: a selfish talk
  • Steven Krauwer
  • Utrecht institute of Linguistics
  • UiL-OTS

2
Overview
  • What I know about PAROLE in half a slide
  • CLARIN in two slides
  • First selfish outburst: back to the LREC 2008
    panel session
  • My assigned task here
  • Second selfish outburst: sitting on an orphan
  • Summary and concluding remarks

3
What I know about PAROLE
  • is very little
  • lexical project
  • late 90s
  • morphosyntactic info for > 10 languages
  • semantic info added later (SIMPLE)
  • but
  • was it a typical project for resource creators,
  • or has anyone ever used the results?
  • has it been maintained / extended?
  • would it be worth including in CLARIN?

4
What is CLARIN
  • Common Language Resources and Technology
    Infrastructure (http://www.clarin.eu)
  • Basic idea
  • European federation of digital archives with
    language data and tools (text, speech,
    multimodal, gesture)
  • target audience: humanities and social sciences
    scholars
  • with uniform single sign-on access to the
    archives
  • with access to language and speech technology
    tools to retrieve, manipulate, enhance, explore
    and exploit data
  • all languages are equally important
  • to cover all EU and associated countries

5
Phasing and funding
  • 2008-10 Preparatory phase
  • funded by the EU (grant 212230, €4.1 M, 33
    consortium partners in 23 countries, plus 123
    other organisations in 32 countries), with (at
    this moment) additional funding from 19 national
    governments (> €14 M, ranging from €50 K to €5 M)
  • 2011-14 Construction phase
  • To be funded by the member states (€100 M
    needed, €5 M committed by one country, more to
    follow)
  • 2015- Exploitation phase
  • to be jointly funded by national governments
  • 2008-2018 estimated cost: ca. €200 M

6
My first selfish outburst
  • I see at least 5 ways for CLARIN to fail
  • Technical problems?
  • don't think so
  • Lack of enthusiasm from linguists?
  • don't think so
  • Lack of resources and technologies?
  • don't think so

7
How could we then fail? 1: standards
  • Common standards are crucial for an
    infrastructure of this type, where (ideally) all
    resources and technologies can work together
  • Threat 1: lack of agreement on standards
  • Our strategy:
  • single standards not always necessary (support
    more than one, provide mappings between them)
  • take existing best and safe practice very
    seriously
  • broad involvement of the community at large, not
    just the project partners, to ensure broad support
  • international collaboration
  • teaming up with related initiatives (e.g.
    FLaReNet)
  • Role for PAROLE? Is PAROLE international? What
    is PAROLE's scope?
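The strategy of supporting more than one standard and providing mappings between them can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical part-of-speech tag schemes; the tag names are placeholders, not the real PAROLE tagset or any other published standard. It shows why such mappings need care: the forward direction can collapse distinctions, so the reverse direction yields a set of candidates rather than a single tag.

```python
"""Minimal sketch: keep two annotation schemes side by side and map between them.

All tag names here are hypothetical placeholders, not real PAROLE or
other standard tagsets.
"""

# Hypothetical mapping from a fine-grained scheme ("scheme A") to a
# coarser scheme ("scheme B"). Several A tags may collapse onto one B tag.
A_TO_B = {
    "NOUN-COMMON-SG": "N",
    "NOUN-COMMON-PL": "N",
    "NOUN-PROPER": "PROPN",
    "VERB-FINITE": "V",
    "VERB-INFINITIVE": "V",
}


def map_tags(tags, mapping, unknown="X"):
    """Translate a sequence of tags; unmapped tags become `unknown`
    so that gaps in the mapping stay visible instead of silently vanishing."""
    return [mapping.get(tag, unknown) for tag in tags]


def invert(mapping):
    """Build the reverse direction. Because the forward mapping is
    many-to-one, each B tag maps back to a *set* of candidate A tags."""
    reverse = {}
    for a_tag, b_tag in mapping.items():
        reverse.setdefault(b_tag, set()).add(a_tag)
    return reverse


if __name__ == "__main__":
    sentence = ["NOUN-PROPER", "VERB-FINITE", "NOUN-COMMON-PL", "ADJ"]
    print(map_tags(sentence, A_TO_B))    # ['PROPN', 'V', 'N', 'X']
    print(sorted(invert(A_TO_B)["V"]))   # ['VERB-FINITE', 'VERB-INFINITIVE']
```

Marking unmapped tags as `X` rather than dropping them is a deliberate choice in this sketch: gaps in a mapping then show up in the output, which is exactly where agreement between standards has to be negotiated.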

8
How could we then fail? 2: sustainability
  • CLARIN started as a bottom-up initiative, but
    major public financial efforts are needed to
    build and exploit the infrastructure
  • Threat 2: lack of financial support
  • Our strategy/hope:
  • The EU has put CLARIN on its roadmap of essential
    infrastructures for consideration by national
    governments
  • We try to persuade national funding agencies to
    participate in CLARIN
  • Investment per country/language is not excessive
  • Don't think PAROLE could help here

9
How could we then fail? 3: ignoring content evolution
  • CLARIN's original constituency is mostly the
    written language community, but much of the
    material relevant for e.g. modern historians or
    sociologists includes other linguistic modalities
  • Threat 3: CLARIN misses the multimodal boat and
    will be obsolete before its completion
  • Our strategy
  • reaching out actively to the other communities
  • openness, with a standing invitation to others to
    join
  • teaming up with initiatives with a broader scope
    than just textual data
  • Is PAROLE in this respect broader than CLARIN?

10
How could we then fail? 4: language coverage
  • One of our ideological pillars is that all
    languages are equal
  • Threat 4: minor languages will effectively be
    excluded because of poor resource and technology
    coverage (no market → no €)
  • Our strategy:
  • the BLARK concept (Basic Language Resource Kit)
    may help to bring all languages to a certain
    minimal level of coverage
  • porting of technologies and expertise between
    languages can save time and effort
  • joint transnational actions to fill the gaps
  • Role for PAROLE (esp. all languages?)

11
How could we then fail? 5: take-up
  • CLARIN can only be successful if our target
    audience actually makes use of the facilities
    we offer to support and innovate their research
  • Threat 5: failure to reach and convince our
    audience
  • Our strategy:
  • teaming up with other infrastructure initiatives
    targeting related audiences (e.g. sister project
    DARIAH)
  • in the next phase, a strong focus on training,
    education and awareness
  • start building convincing demonstrators and joint
    pilots to get a better understanding of the
    users' needs
  • Not sure whether PAROLE could help here

12
My task here
  • "The suggested subject of your talk would be an
    overall presentation of the new approaches to
    Linguistic engineering that are represented by
    Clarin, and the key issues that could be taken
    into account from the point of view of the
    producers of Linguistic Resources"

13
new approaches to linguistic engineering in
CLARIN
  • Not directly: we want to use what is (and will
    be) there, except maybe in responding to specific
    humanities needs (e.g. tools for diachronic
    research or comparing manuscripts)
  • More indirectly: we might want to start a new
    fashion by focusing on interoperable tools and
    resources, offered as services, and chainable
    into workflows
  • Not sure how PAROLE would come in here
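The idea of interoperable tools offered as services and chainable into workflows can be sketched as follows. This is a hypothetical illustration, not CLARIN's actual architecture: each "service" is just a local Python function that declares the format label it consumes and produces (the labels `text`, `tokens` and `tagged`, the service names, and the toy tagger are all invented for the example), and a workflow is only built if adjacent steps agree on format.

```python
"""Minimal sketch: tools as services with declared input/output formats,
chained into a workflow only when adjacent formats match.

Service names, format labels and the toy "tagger" are hypothetical; they
stand in for real web services, which this sketch does not call.
"""
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Service:
    name: str
    consumes: str               # hypothetical format label the service accepts
    produces: str               # hypothetical format label the service returns
    run: Callable[[Any], Any]   # the actual processing step


def build_workflow(steps: List[Service]) -> Callable[[Any], Any]:
    """Validate the chain up front, then return a function that runs it."""
    for left, right in zip(steps, steps[1:]):
        if left.produces != right.consumes:
            raise ValueError(
                f"{left.name} produces {left.produces!r} but "
                f"{right.name} consumes {right.consumes!r}"
            )

    def workflow(data):
        # Pass each step's output on as the next step's input.
        for step in steps:
            data = step.run(data)
        return data

    return workflow


# Hypothetical toy services.
tokenizer = Service("tokenizer", "text", "tokens", lambda s: s.split())
tagger = Service(
    "toy-tagger", "tokens", "tagged",
    # Toy rule only: capitalised tokens are tagged PROPN, everything else X.
    lambda toks: [(t, "PROPN" if t[:1].isupper() else "X") for t in toks],
)

if __name__ == "__main__":
    pipeline = build_workflow([tokenizer, tagger])
    print(pipeline("Utrecht hosts CLARIN"))
    # [('Utrecht', 'PROPN'), ('hosts', 'X'), ('CLARIN', 'PROPN')]
```

Checking compatibility when the chain is assembled, rather than while it runs, means a mismatch such as `build_workflow([tagger, tokenizer])` fails immediately with a `ValueError`; in a real infrastructure that check would rest on shared format and metadata standards.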

14
key issues that could be taken into account by
producers
  • Issue number 1 is... standards!

15
Issue nr 1
  • standards standards standards standards standards
    standards standards standards standards!!!!!
    standards standards standards standards!!!!!!
    standards standards standards standards
    standards!!!!! standards!!!!!!!!!!!!!!!!!
    standards!!!!!!!!!!!!!! standards!!!!!!!!!!!
    standards!!!!!!!!!!!

16
Other issues
  • Some more
  • Licensing and business models, including
  • Fair use
  • Repurposing of data
  • Quality assurance (incl. definition of quality
    and validation)
  • Broad language coverage (esp. smaller languages)
  • Identifying resources that are really innovative
    or that might help cross discipline boundaries
  • Anything here for PAROLE?

17
Second outburst of selfishness
  • The BLARK concept exists, but has become pretty
    much an orphan after ELSNET stopped being funded
  • It has been taken up in a number of places (Dutch
    Language Union, NEMLAR/MEDAR, France, Sweden,
    CLARIN, ELDA), but there is no body that feels
    responsible for the concept
  • Could PAROLE play a role here (maybe for the
    lexical part)?

18
Summary and concluding remarks
  • My ignorance concerning PAROLE has become evident,
    and I need to be filled in, but not necessarily
    here and now
  • For some issues I really don't know whether
    PAROLE would be capable of playing a role
  • If PAROLE represents a significant community, it
    might want to participate in standards
    discussions
  • PAROLE might be an interesting instrument to help
    look after (lexical aspects of?) the BLARK
  • THANKS!