Michael P. Oakes - PowerPoint PPT Presentation

Slides: 24
Provided by: osirisSun
Transcript and Presenter's Notes

Title: Michael P. Oakes


1
Michael P. Oakes
  • University of Sunderland

2
Contents
  • Proposals for a Masters programme in Natural
    Language Processing
  • Future research plans / link with Wolverhampton
  • Plans for publications
  • Plans for grant proposals
  • Other funding ideas

3
Proposals for a Masters programme in Natural
Language Processing
  • Some preliminaries
  • Entry requirements: first or second class degree
    in a related discipline. Computer programming
    will be taught from scratch.
  • Funding: Erasmus, European Social Fund, ESRC
    Masters training package scheme for programme
    development, work-based learning.
  • Students must receive an accurate idea of the
    content of the programme beforehand
  • Induction week: meet the teaching team,
    familiarity with the University, formal
    registration, etc.
  • Diploma, Certificate and Masters awards. 8
    taught modules (24 lectures, 18 hours practical,
    58 directed reading, 50 self-directed research).

4
Certificate Stage

5
Diploma Stage

6
Project
  • Close links with industry established through
    3-month industrial placements, based either with
    the company or at the University.
  • The sponsor will either be from industry or
    academia, and there will also be a staff member
    from Wolverhampton to act as supervisor.
  • Project management (TOR, reviews), poster, viva,
    dissertation (typically introduction, research,
    analysis, implementation, evaluation /
    experiments, reflective conclusions).

7
Administration
  • Programme board of studies: Institute Director or
    deputy, student representatives, one or more
    employers' representatives, module leaders and
    programme leader. Responsible for the management
    of the programme and the well-being of each
    module.
  • Board of assessment: decides student
    progression. Includes the External Examiner; no
    student representatives.
  • Internal (prior to hand-out) and External (sample
    work shown prior to programme assessments)
    moderation.
  • Other quality control: student and staff
    feedback, EE's (External Examiner's) report,
    programme annual report.
  • Each student has a personal tutor and student
    handbook.
  • Timely, face-to-face assessment may improve
    student satisfaction.

8
Future Research Plans,
  • And how these might complement the research
    topics of the Research Group in Computational
    Linguistics.

9
Automatic Summarisation
  • CAST Project: produced an automatic summarisation
    tool; term-based summarisation.
  • Concept-Based Abstracting (Paice).
  • TRESTLE (Gaizauskas).
  • David Evans: evaluation of information extraction.
  • Query-based summaries. Intrinsic
    (representativeness) vs. Extrinsic (judgeability)
    evaluation (Liang).
  • SumTrain reached second round of EU evaluation.
  • Extraction of statistics-related phrases, e.g.
    "greater than", "significant reduction in", "was
    directly proportional to", "did not affect".

10
Concept-Based Abstracting Project
  • window length 4
  • STOP 6 "and foliar treatment AGEN"
  • 5 "foliar treatment AGEN"
  • 5 "treatment AGEN AGEN"
  • 4 "effect of mildew AGEN"
  • 3 "AGEN gave a significant"
  • 2 "AGEN was the most"
  • 2 "AGEN at different sowing"
  • 2 "AGEN increased fertile tillers"
  • LOW-FQ 1 "effect of AGEN sprays"
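
The list above reads as frequency counts for four-word windows that contain the placeholder AGEN. A minimal Python sketch of that kind of count follows. It assumes the windows are plain runs of consecutive tokens; the slide does not say how the patterns were produced, and the STOP and LOW-FQ labels are not modelled. The function name `agen_windows` and the toy sentences are hypothetical.

```python
from collections import Counter

def agen_windows(tokens, window=4, marker="AGEN"):
    """Yield every run of `window` consecutive tokens that contains `marker`."""
    for i in range(len(tokens) - window + 1):
        chunk = tokens[i:i + window]
        if marker in chunk:
            yield " ".join(chunk)

# Toy sentences invented for illustration; AGEN is the placeholder token.
sentences = [
    "effect of mildew AGEN on yield",
    "effect of mildew AGEN was small",
    "AGEN gave a significant increase",
]

counts = Counter()
for sentence in sentences:
    counts.update(agen_windows(sentence.split()))

# Most frequent windows first, in the same "count, pattern" layout as the slide.
for pattern, freq in counts.most_common():
    print(freq, f'"{pattern}"')
```

Sorting by frequency makes it easy to apply cut-offs at the top and bottom of the list, which may be what the STOP and LOW-FQ markers indicate.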

11
Automatic Terminology Processing
  • Le An Ha: looked at the concept of a terminology
    rather than individual terms. Knowledge patterns
    from glossaries: a store of terms and the
    relations between them.
  • David Evans. Identification of terms using TF.IDF
    and other statistical methods (see slide 20).
  • Shiyan Ou. Sentiment classification (see slide
    20).
  • Constantin Orasan. Corpus of junk mail (spam
    filters, Farrow).
  • Constantin Orasan. Analysis of genre differences:
    project on Language, Computation and Style
    (authorship).
  • Englishes, Scrip newsfeeds, BELGA: feature
    extraction for text classification.
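
As a rough illustration of the TF.IDF weighting mentioned in this list, the sketch below scores each word by its count in a document multiplied by log(N / document frequency). This is one common variant, assumed here for illustration and not necessarily the one used in the work cited; the function name `tf_idf` and the three toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in the document) * log(N / document frequency)
    """
    n_docs = len(docs)
    tokenised = [d.lower().split() for d in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokenised:
        df.update(set(toks))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenised
    ]

# Toy documents invented for illustration.
docs = [
    "the anaphora resolver links the pronoun",
    "the summariser ranks the sentence",
    "the tagger labels the token",
]
weights = tf_idf(docs)

# Candidate terms for the first document, highest weight first.
for term, w in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{w:.3f}")
```

A word that occurs in every document, such as "the" here, gets weight zero, while words confined to one document rise to the top as candidate terms.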

12
Annotation tools
  • Constantin Orasan: PALinkA, automatic annotation
    of anaphoric links.
  • Lewandowska, Oakes and Rayson: part-of-speech and
    semantic code tagging in English; alignment
    enables partial semantic tagging of L2.

13
Annotation: Aligned and Partially Tagged Polish
Text (Lewandowska, Oakes and Rayson)
  • Tak jest_A3 mowi Polemarch_Z99 a do_Z5 tego
    jeszcze urzadza nocne nabozenstwo, ktore_Z8 warto
    zobaczyc
  • __PUNC That_DD1_Z8 's_VBZ_A3 the_AT_Z5
    way_NN1_X4.2 of_IO_Z5 it_PPH1_Z8 ,_,_PUNC
    __PUNC said_VVD_Q2.1 Polymarchus_NP1_Z99
    ,_,_PUNC __PUNC and_CC_Z5 ,_,_PUNC
    besides_RR_Z5 ,_,_PUNC there_EX_Z5 is_VBZ_A3
    to_TO_Z5 be_VBI_A3 a_AT1_Z5 night_NNT1_T1.3
    festival_NN1_K1/S1.1.3 which_DDQ_Z8
    will_VM_T1.1.3 be_VBI_A3 worth_II_I1.3
    seeing_VVG_X3.4 ._._PUNC
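
The partial tagging shown above can be pictured as copying semantic codes from tagged English words onto the Polish words they are aligned with. The following Python sketch assumes a word alignment is already available; the alignment links, the choice of English tokens and the names `project_tags` and `render` are hypothetical and are not taken from the project itself.

```python
def project_tags(english, alignment):
    """Copy semantic tags from tagged English tokens onto aligned L2 tokens.

    english   -- list of (word, pos_tag, semantic_tag) triples
    alignment -- list of (l2_word, english_index or None) pairs
    Unaligned L2 words come back with tag None, so the tagging is partial.
    """
    tagged = []
    for l2_word, en_idx in alignment:
        sem = english[en_idx][2] if en_idx is not None else None
        tagged.append((l2_word, sem))
    return tagged

def render(tagged):
    """Write word_TAG for tagged words and the bare word otherwise."""
    return " ".join(w if t is None else f"{w}_{t}" for w, t in tagged)

# English tokens and tags as they appear on the slide; the links are assumed.
english = [
    ("is", "VBZ", "A3"),
    ("said", "VVD", "Q2.1"),
    ("Polymarchus", "NP1", "Z99"),
]
alignment = [("Tak", None), ("jest", 0), ("mowi", None), ("Polemarch", 2)]

print(render(project_tags(english, alignment)))
# prints: Tak jest_A3 mowi Polemarch_Z99
```

Only the semantic code is projected in this sketch, since part-of-speech tags are language-specific and would not transfer directly.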

14
Mobile Devices
  • Laura Hasler and Dalila Mekhaldi: QALL-ME,
    Question-Answering for Digital Phones.
  • Chufeng Chen: Annotation of digital photographs
    taken with a GPS camera. A gazetteer
    translated latitude and longitude data into
    place name and geographical feature, e.g. Lat
    54.91, Long -1.4, place Sunderland, feature
    harbour. Episodic memory.
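
A gazetteer lookup of this kind can be sketched as a nearest-neighbour search over known places. The Python below is an assumption-laden illustration, not the project's implementation: the table `GAZETTEER`, the functions `haversine_km` and `annotate`, and every entry except Sunderland are hypothetical. The coordinates are read as latitude 54.91, longitude -1.4, which is where Sunderland lies.

```python
import math

# Hypothetical gazetteer rows: (latitude, longitude, place, feature).
# Only the Sunderland row reflects the slide; the others are illustrative.
GAZETTEER = [
    (54.91, -1.40, "Sunderland", "harbour"),
    (54.97, -1.61, "Newcastle upon Tyne", "bridge"),
    (54.78, -1.58, "Durham", "cathedral"),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def annotate(lat, lon):
    """Return the place name and feature of the nearest gazetteer entry."""
    entry = min(GAZETTEER, key=lambda e: haversine_km(lat, lon, e[0], e[1]))
    return {"place": entry[2], "feature": entry[3]}

# A photograph taken a short distance from the Sunderland entry.
print(annotate(54.915, -1.37))
```

A real system would presumably also apply a distance threshold, so that a photograph taken far from any known place is left unannotated.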

15
Other Related Work
  • Andrea Mulloni: Corpus Linguistics.
  • Empirical vs. Chomskyan.
  • Own interest: Statistics for Corpus Linguistics.
  • Driving the process rather than merely testing
    for statistical significance, e.g. Mutual
    Information to find collocations.
  • Irina Temnikova: Machine Translation.
  • Alignment for example-based machine translation
    (Lewandowska, Oakes).
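
To illustrate how Mutual Information can be used to find collocations, here is a minimal Python sketch of pointwise mutual information over adjacent word pairs. The estimate by simple relative frequency, the `min_count` cut-off, the function name `pmi_scores` and the toy text are all assumptions made for the example.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Pointwise mutual information for adjacent word pairs.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated by relative frequency. Pairs seen fewer than `min_count`
    times are skipped because PMI tends to overrate rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue
        p_xy = c / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy text invented for illustration.
text = "strong tea and strong tea but powerful computer and powerful computer"
scores = pmi_scores(text.split())

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(pair), round(score, 2))
```

Pairs with a high positive score co-occur more often than their individual frequencies would predict, which is what makes them collocation candidates.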

16
Plans for Publications (1)
  • Book Chapters in press
  • Processing Multilingual Corpora, Chapter 32 of
    Corpus Linguistics: An International Handbook,
    eds. Anke Lüdeling and Merja Kytö, Mouton de
    Gruyter.
  • Corpus Linguistics and Stylometry, Chapter 52,
    ibid.
  • Corpus Linguistics and Language Variation, in
    Contemporary Approaches to Corpus Linguistics,
    ed. Paul Baker, Continuum.
  • Javanese, in Languages of the World, ed.
    Bernard Comrie, Routledge.
  • J. Vilares, M. Oakes and M. Vilares, A
    Knowledge-Light Approach to Query Translation in
    CLIR. RANLP V, ed. N. Nicolov, Benjamins.

17
Plans for Publications (2)
  • Under second review
  • S-W. Ke, C. Bowerman and M. Oakes, Automatic
    classification of personal email with PERC and
    time-related strategies, ACM Transactions on
    Information Systems.
  • W-C Lin, M. Oakes and J. Tait, Improving image
    annotation via representative feature selection,
    Cognitive Processing.

18
Plans for Publications (3)
  • Future plans
  • VITALAS: Video and image Indexing and reTrievAl in
    the LArge Scale.
  • Update Statistics for Corpus Linguistics: sold
    over 1500 copies, but now 10 years old.
  • Last chapter was Literary Detective Work, which
    could be a book in its own right: disputed
    authorship (compendium of techniques,
    Shakespeare, religious texts, still unsolved
    mysteries, e.g. The Quiet Don, Marxism and the
    Philosophy of Language), unknown languages
    (Linear B, Voynich manuscript). JLLC, QL.

19
Plans for Grant Proposals (1)
  • Closing the Semantic Gap
  • Related to machine learning (boosting), caption
    analysis, gazetteers, alignment of low-level
    image content features and high-level semantic
    features (words).
  • Son of VITALAS?

20
Plans for Grant Proposals (2)
  • Which words are truly characteristic of a corpus?
    X² etc.
  • Countable linguistic features.
  • Measures from IR e.g. PageRank (Lódz, Palomino).
  • AHRC (if theoretical, Englishes), ESRC (if
    applied, e.g. spam filters).
  • Sentiment analysis (Thijs Westerveld at Teezir)
    mining online opinions. Cheerful, chic, cheap,
    clean vs. chaos, cranky, cumbersome, damaged.
  • Interface between NLP and IR: sentence analysis,
    e.g. adjectives, negatives; follow links to
    navigate websites.
  • IR: relevant vs. irrelevant documents.
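
One way to ask which words are characteristic of a corpus is to compare a word's frequency in that corpus with its frequency in a reference corpus, using the chi-squared statistic on a 2x2 table. The Python sketch below shows that calculation; the function name `chi_squared` and the counts in the example are invented for illustration, and the word is assumed to occur at least once overall so that no expected value is zero.

```python
def chi_squared(freq_a, size_a, freq_b, size_b):
    """Pearson chi-squared for a 2x2 table.

    Rows are corpus A and corpus B; columns are "the word" and "all other
    words". Assumes the word occurs at least once across the two corpora.
    """
    observed = [
        [freq_a, size_a - freq_a],
        [freq_b, size_b - freq_b],
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - freq_a - freq_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Invented counts: a word seen 90 times in 1000 tokens of corpus A
# and 10 times in 1000 tokens of corpus B.
print(round(chi_squared(90, 1000, 10, 1000), 2))
```

With one degree of freedom, a value above 3.84 is significant at the 5% level, so the word in this invented example would count as characteristic of corpus A.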

21
Plans for Grant Proposals (3)
  • Temporal relations in query language modelling
    (Dawei Song).
  • Temporal similarity and semantic similarity
    combine to give overall similarity.
  • The temporal similarity between texts (e.g. query
    and document) can be estimated by a) time stamp,
    b) temporal logic between the texts (Andrea
    Setzer).

22
Plans for Grant Proposals (4)
  • Corpus Profiling Workshop on October 18th.
  • Aims: to explore how corpus characteristics affect
    the behaviour of techniques in IR and NLP, and to
    set out a roadmap for a shared research agenda.
  • Data set profile impacts on automatic
    classification, IR, anaphora resolution,
    automatic summarisation and word sense
    disambiguation.

23
Other Funding Ideas
  • IRSG-like Industry Day to foster industrial
    contacts (consultancy? Grant proposals?)
  • Organise conferences, e.g. bid for Corpus
    Linguistics, CLEF, ECIR.
  • Exploitation of Intellectual Property.
  • Is there an equivalent of CEDEC (Computing and
    Engineering Distance Education Centre) with whom
    we can discuss marketing programmes world-wide /
    part-time? Work-based learning?