Ontologically-based Searching for Jobs in Linguistics - PowerPoint PPT Presentation

DLLS 2003
Slides: 27
Provided by: derylewl
Learn more at: https://www.deg.byu.edu

1
Ontologically-based Searching for Jobs in
Linguistics
  • Deryle Lonsdale
  • lonz_at_byu.edu

Funded by
2
The BYU Data Extraction Group
  • Group of faculty (5) and students (15) from CS,
    Linguistics, SOAIS
  • Goal: ontology-based data extraction
  • NSF funding: CISE/IIS/IDM TIDIE
  • Website: www.deg.byu.edu/
    • Papers, presentations
    • Tools
    • Demos

3
The BYU Data Extraction Group
4
Overview
  • Ontology-based extraction
  • Building knowledge sources
  • Jobs in linguistics (Sproat)
  • Putting it all together
  • Some sample results

5
Ontologies and IE
Source
Target
6
Document-based IE
7
Conceptual modeling (OSM)
8
Recognition and Extraction
9
Car-Ads Ontology (textual)
  • Car -> object
  • Car 0..1 has Year 1..*
  • Car 0..1 has Make 1..*
  • Car 0..1 has Model 1..*
  • Car 0..1 has Mileage 1..*
  • Car 0..* has Feature 1..*
  • Car 0..1 has Price 1..*
  • PhoneNr 1..* is for Car 0..*
  • PhoneNr 0..1 has Extension 1..*
  • Year matches [4]
  • constant extract "\d{2}"
  • context "([^\$\d])[4-9]\d[^\d]"
  • substitute "^" -> "19",
  • End
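The cardinality constraints in the ontology above can be read as checks on each extracted record. A minimal sketch in Python, assuming a record is a dictionary from relationship-set names to lists of extracted values; the `RelationshipSet` class and the `violations` helper are hypothetical, not part of the DEG tools, and only the Car-side (left-hand) cardinalities are encoded. The PhoneNr relationship sets are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical encoding of the Car-side cardinalities from the textual ontology.
@dataclass(frozen=True)
class RelationshipSet:
    name: str                 # relationship set, e.g. "has Year"
    car_min: int              # minimum number of values per Car
    car_max: Optional[int]    # maximum number of values per Car (None = "*")

CAR_ADS = [
    RelationshipSet("has Year", 0, 1),
    RelationshipSet("has Make", 0, 1),
    RelationshipSet("has Model", 0, 1),
    RelationshipSet("has Mileage", 0, 1),
    RelationshipSet("has Feature", 0, None),
    RelationshipSet("has Price", 0, 1),
]

def violations(record: Dict[str, List[str]]) -> List[str]:
    """Names of relationship sets whose Car-side cardinality the record breaks."""
    broken = []
    for rel in CAR_ADS:
        n = len(record.get(rel.name, []))
        if n < rel.car_min or (rel.car_max is not None and n > rel.car_max):
            broken.append(rel.name)
    return broken

print(violations({"has Year": ["1997"], "has Feature": ["a/c", "cassette stereo"]}))  # []
print(violations({"has Year": ["1997", "1998"]}))  # ['has Year']
```

One way such a check could be used is to flag ads where extraction produced more candidate values than the ontology allows, such as two years for one car.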

10
The data-frame library
  • Low-level patterns implemented as regular
    expressions
  • Match items such as email addresses, phone
    numbers, names, etc.
  • Mileage matches [8]
  • constant extract "\b[1-9]\d{0,2}k"
    substitute "[kK]" -> "000",
  • extract "[1-9]\d{0,2}?,\d{3}"
  • context "[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"
    substitute "," -> "",
  • extract "[1-9]\d{0,2}?,\d{3}"
  • context "(mileage\\s)[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"
    substitute "," -> "",
  • extract "[1-9]\d{3,6}"
  • context "[^\$\d][1-9]\d{3,6}\smi(\.\b\les\b)",
  • extract "[1-9]\d{3,6}"
  • context "(mileage\\s)[^\$\d][1-9]\d{3,6}\b"
  • keyword "\bmiles\b", "\bmi\.", "\bmi\b", "\bmileage\b"
  • end
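The extract/context/substitute pattern in the data frame above could be approximated as follows. This is a sketch under stated assumptions, not the DEG implementation: it assumes a rule's context is matched first, the extract pattern is then searched inside that context match, and substitutions are applied to the extracted string. `Rule` and `apply_rules` are hypothetical names, and the two rules are simplified versions of the Mileage rules (the `?` and the mileage-keyword contexts are omitted).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Rule:
    """Hypothetical encoding of one data-frame extraction rule."""
    extract: str                   # pattern for the value itself
    context: Optional[str] = None  # wider pattern the value must sit inside
    substitute: List[Tuple[str, str]] = field(default_factory=list)  # (pattern, replacement)

def apply_rules(rules: List[Rule], text: str) -> List[str]:
    values = []
    for rule in rules:
        # Assumption: without a context, the extract pattern is its own context.
        for ctx in re.finditer(rule.context or rule.extract, text):
            m = re.search(rule.extract, ctx.group(0))
            if m is None:
                continue
            value = m.group(0)
            for pattern, replacement in rule.substitute:
                value = re.sub(pattern, replacement, value)
            values.append(value)
    return values

# Two simplified Mileage rules: "60k" style and "45,000" style.
MILEAGE_RULES = [
    Rule(extract=r"\b[1-9]\d{0,2}k", substitute=[(r"[kK]", "000")]),
    Rule(extract=r"[1-9]\d{0,2},\d{3}",
         context=r"[^\$\d][1-9]\d{0,2},\d{3}[^\d]",  # not preceded by "$" or a digit
         substitute=[(",", "")]),
]

print(apply_rules(MILEAGE_RULES, "Low miles: 45,000 miles. Price $12,500."))  # ['45000']
print(apply_rules(MILEAGE_RULES, "Only 60k on engine"))  # ['60000']
```

The `[^\$\d]` context is what keeps the price `$12,500` from being read as a mileage.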

11
Lexicons
  • Repositories of enumerable classes of lexical
    information
  • FirstNames, LastNames, USstates, ProvoOremApts,
    CarMakes, Drugs, CampGroundFeats, etc.
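A lexicon can be turned into a recognizer by compiling its entries into a single regular expression. A minimal sketch, assuming a lexicon is just a list of strings; `compile_lexicon` and the four-entry `CAR_MAKES` sample are hypothetical stand-ins for the real CarMakes lexicon.

```python
import re
from typing import Iterable

def compile_lexicon(entries: Iterable[str]) -> re.Pattern:
    """Build one case-insensitive, word-bounded regex from a lexicon's entries."""
    # Longest entries first, so "Mercedes-Benz" wins over its prefix "Mercedes".
    ordered = sorted(set(entries), key=len, reverse=True)
    alternation = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

CAR_MAKES = ["Ford", "Honda", "Mercedes", "Mercedes-Benz"]  # hypothetical sample lexicon
matcher = compile_lexicon(CAR_MAKES)
print(matcher.findall("1997 Mercedes-Benz, 1995 ford escort"))  # ['Mercedes-Benz', 'ford']
```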

12
Accessing the output
  • Extracted information is stored in a relational
    database
  • Results can be queried using SQL
  • Wide range of views is possible
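Because the extracted information lands in relational tables, each view is an ordinary SQL query. The sketch below uses Python's built-in `sqlite3` module; the `posting` and `posting_skill` tables, the `postings_by_year` helper, and the three sample rows are invented for illustration and are not the group's actual schema or data.

```python
import sqlite3

def postings_by_year(conn: sqlite3.Connection, skill: str):
    """Count postings per year that list a given skill (hypothetical schema)."""
    return conn.execute(
        """SELECT p.year, COUNT(DISTINCT p.posting_id)
           FROM posting AS p JOIN posting_skill AS s ON s.posting_id = p.posting_id
           WHERE s.skill = ?
           GROUP BY p.year ORDER BY p.year""",
        (skill,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (posting_id INTEGER PRIMARY KEY, year INTEGER, title TEXT);
CREATE TABLE posting_skill (posting_id INTEGER REFERENCES posting(posting_id), skill TEXT);
""")
# Made-up rows for illustration only.
conn.executemany("INSERT INTO posting VALUES (?, ?, ?)", [
    (1, 2000, "computational linguist"),
    (2, 2000, "NLP engineer"),
    (3, 2001, "lexicographer"),
])
conn.executemany("INSERT INTO posting_skill VALUES (?, ?)", [
    (1, "Perl"), (1, "C++"), (2, "Perl"), (3, "XML"),
])
print(postings_by_year(conn, "Perl"))  # [(2000, 2)]
```

Keeping skills in a separate table reflects that one posting can list many skills; a view such as "postings per year requiring Perl" is then one join and a GROUP BY.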

13
Finding jobs in linguistics
  • Linguistlist.org, LSA
  • Email distribution lists (Corpora, Langage
    Naturel, CAAL/ACLA, etc.)
  • Usual commercial sites (monster.com, flipdog.com,
    dice.com)
  • Word-of-mouth sources

14
Sproat's analysis
  • Random sample (224/2250) of LinguistList
    postings, 1994-2001
  • Development vs. research, academic vs. industrial
  • Linguists are most often (approx. 80% of the
    time) offered development jobs
  • Linguists are hired more for specific tasks (e.g.,
    grammar, lexicon development) than for
    more general research-oriented tasks (e.g.,
    creating new technological approaches).

15
The banner years
    Year          Academia   Industry   Industry
    1994          27         2          7
    1995          45         5          10
    1996          52         3          5
    1997          48         3          6
    1998          57         3          5
    1999          56         14         20
    2000          55         43         39
    2001 (mid)    22         10         31
  • Dramatic rise in 1999, 2000
  • Steep drop-off since 2001
  • Rising demand for technical, computational
    skills
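The rise and drop-off described above can be checked by totaling each row. A short sketch using only the figures from the table; it assumes the three count columns are disjoint (the slide does not say how the two Industry columns differ), and the 2001 row covers only part of the year. `totals` and `industry_share` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# (year, academia, industry, industry), copied from the table above.
BANNER_YEARS: List[Tuple[str, int, int, int]] = [
    ("1994", 27, 2, 7),
    ("1995", 45, 5, 10),
    ("1996", 52, 3, 5),
    ("1997", 48, 3, 6),
    ("1998", 57, 3, 5),
    ("1999", 56, 14, 20),
    ("2000", 55, 43, 39),
    ("2001 (mid)", 22, 10, 31),
]

def totals(rows) -> Dict[str, int]:
    # Assumes the columns do not overlap, so a row total is a plain sum.
    return {year: a + i1 + i2 for year, a, i1, i2 in rows}

def industry_share(rows) -> Dict[str, float]:
    # Fraction of the row total that falls in the two Industry columns.
    return {year: (i1 + i2) / (a + i1 + i2) for year, a, i1, i2 in rows}

t = totals(BANNER_YEARS)
print(t["1998"], t["1999"], t["2000"])  # 65 90 137
print(round(industry_share(BANNER_YEARS)["2000"], 2))  # 0.6
```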

16
Linguistic jobs ontology
  • Why?
    • User-specifiable constraints
  • Follows existing ontologies (e.g., jobs,
    software) fairly closely
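User-specifiable constraints could be expressed as predicates over postings populated from the ontology. A minimal sketch, assuming a flattened `JobPosting` record; the field names, the `requires`/`located_in`/`search` helpers, and the two sample postings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class JobPosting:
    """Hypothetical flattened view of one posting in the jobs ontology."""
    title: str
    subfields: Set[str] = field(default_factory=set)
    languages: Set[str] = field(default_factory=set)
    programming_languages: Set[str] = field(default_factory=set)
    country: str = ""

Constraint = Callable[[JobPosting], bool]

def requires(attribute: str, value: str) -> Constraint:
    """Constraint: the set-valued attribute must contain value."""
    return lambda job: value in getattr(job, attribute)

def located_in(country: str) -> Constraint:
    return lambda job: job.country == country

def search(postings: List[JobPosting], constraints: List[Constraint]) -> List[JobPosting]:
    """Keep the postings that satisfy every constraint."""
    return [job for job in postings if all(c(job) for c in constraints)]

postings = [
    JobPosting("NLP engineer", {"computational linguistics"}, {"English"}, {"Perl", "C++"}, "USA"),
    JobPosting("Lexicographer", {"lexicography"}, {"French"}, set(), "France"),
]
hits = search(postings, [requires("programming_languages", "Perl"), located_in("USA")])
print([job.title for job in hits])  # ['NLP engineer']
```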

17
Data frames and lexicons
  • Language names
    • Ethnologue
  • (Sub)fields of linguistics
    • Linguistlist.org
  • Tools, toolkits
  • Software components, programming languages
  • Linguistics-related job titles
  • Activities
  • Responsibilities
  • Country names

18
The corpus
  • 3727 postings (LinguistList, Corpora, LN, WoM)
    • 1998: 541
    • 1999: 575
    • 2000: 871
    • 2001: 952
    • 2002: 788
  • Some noise (non-English, factored, program
    descriptions, attachments, etc.)
  • Semi-automatic edits (boilerplate, publicity
    blurbs about institutions, etc.)
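One possible way to make the boilerplate edits semi-automatic is to propose lines that recur verbatim across many postings and let a person approve them before removal. This is an assumed technique, not necessarily the one used for this corpus; `boilerplate_candidates`, `strip_lines`, the threshold of three postings, and the sample postings are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set

def boilerplate_candidates(postings: Iterable[str], min_postings: int = 3) -> Set[str]:
    """Non-empty lines that recur verbatim in at least min_postings postings."""
    seen: Counter = Counter()
    for text in postings:
        # A set, so a line repeated inside one posting is counted once.
        seen.update({line.strip() for line in text.splitlines() if line.strip()})
    return {line for line, n in seen.items() if n >= min_postings}

def strip_lines(text: str, approved: Set[str]) -> str:
    """Drop the lines a human reviewer has approved as boilerplate."""
    return "\n".join(line for line in text.splitlines() if line.strip() not in approved)

postings = [
    "Job A: computational linguist\nEqual opportunity employer.",
    "Job B: lexicographer\nEqual opportunity employer.",
    "Job C: NLP engineer\nEqual opportunity employer.",
]
candidates = boilerplate_candidates(postings)
print(candidates)  # {'Equal opportunity employer.'}
print(strip_lines(postings[0], candidates))  # Job A: computational linguist
```

The review step between the two functions is what keeps the process semi-automatic: only approved lines are stripped.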

19
Sample output

20
Observations
  • 270 don't have "linguist" (!)
  • Demand for knowledge of English equals that for
    all other languages combined (G, F, S, J, C)
  • Computer/computational background required for
    almost 1/3 (1116)
  • Noticeable amount of headhunting, particularly in
    the Seattle and DC areas

21
Programming languages
22
Popular subfields
23
Subfields (another perspective)
24
An engineering discipline?
  • 160 linguistics jobs with titles ending in "engineer"
  • Software development cycle
    • research e., software design e.
    • development e., software e.
    • software quality e., linguistic test e.,
      linguistic quality e.
    • linguistic support e., user experience e.
    • presales e., technical sales e.
  • Specific subfields
    • web site e.
    • speech e., voice recognition e., speech
      recognition application e., ASR tuning e.,
      audio e.
    • dialog e.
    • tools e.
    • AI e., NLP e.
    • knowledge e.
    • linguist e., natural language e.
    • staff e.
    • human factors e., user interface e.
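Tallies like the one above could be produced from extracted job titles by keying on the words before a final "engineer". A minimal sketch, assuming the titles have already been extracted as strings; `engineer_titles` is hypothetical, and it keeps seniority words such as "senior" as part of the modifier.

```python
from collections import Counter
from typing import Iterable

def engineer_titles(titles: Iterable[str]) -> Counter:
    """Hypothetical tally of titles ending in 'engineer', keyed by modifier + ' e.'."""
    counts: Counter = Counter()
    for title in titles:
        words = title.lower().split()
        if words and words[-1] in ("engineer", "engineers"):
            modifier = " ".join(words[:-1]) or "(no modifier)"
            counts[modifier + " e."] += 1
    return counts

counts = engineer_titles(["Speech Recognition Engineer", "NLP Engineer",
                          "nlp engineer", "Computational Linguist"])
print(counts["nlp e."], counts["speech recognition e."])  # 2 1
```

Lowercasing before counting merges "NLP Engineer" and "nlp engineer"; grouping the modifiers into the two categories above would still need a hand-built mapping.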

25
Paradigms
26
Other observations
  • Often a job title is not even listed (!)
  • More internationalization (i18n) of data frames
    (e.g., email, phone numbers)
  • Great need for (preferably hierarchical) lexical
    repositories related to linguistics
    • job titles
    • theoretical frameworks, subfields
    • typical linguist job activities
    • linguistic research/development venues
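A hierarchical repository would let a search for a broad subfield also retrieve postings that name a narrower one. A minimal sketch, assuming a simple child-to-parent mapping; the `PARENT` entries are an illustrative fragment, not an existing linguistics taxonomy, and `ancestors`/`matches` are hypothetical helpers.

```python
from typing import Dict, List

# Hypothetical fragment of a subfield hierarchy: child -> parent.
PARENT: Dict[str, str] = {
    "phonetics": "linguistics",
    "phonology": "linguistics",
    "computational linguistics": "linguistics",
    "speech recognition": "computational linguistics",
    "machine translation": "computational linguistics",
}

def ancestors(term: str) -> List[str]:
    """Broader terms for term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

def matches(query: str, term: str) -> bool:
    """A query matches a posting's term if it is the term or one of its ancestors."""
    return query == term or query in ancestors(term)

print(ancestors("speech recognition"))  # ['computational linguistics', 'linguistics']
print(matches("computational linguistics", "machine translation"))  # True
```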