SARA - PowerPoint PPT Presentation

About This Presentation
Title:

SARA

Description:

Enables the user to search through the BNC for examples of ... Aston, G./Burnard, L. (1998): The BNC Handbook. Exploring the British National Corpus with SARA. ... – PowerPoint PPT presentation

Slides: 29
Provided by: Sus5143

Transcript and Presenter's Notes

Title: SARA


1
SARA & BNCweb: Specialised Concordancing Software
  • Presentation by Susan Boser

2
SARA

3
What is SARA?
  • SARA: SGML Aware Retrieval Application
  • Client/Server software tool to query the BNC
  • Developed by Tony Dodd as part of the BNC Project
  • Holds a central database of texts with SGML
    mark-up to be queried by clients

4
What does SARA do?
  • Enables the user to search through the BNC for
    examples of specific words, phrases, patterns of
    words etc.
  • Examples can be sorted and displayed in different
    formats
  • Searches can be limited to particular SGML
    contexts or to particular kinds of texts
  • Wild card or regular expression searching is
    possible
  • Supports complex queries

5
5 Types of Queries
  • Word Query
  • Phrase Query
  • POS Query
  • Pattern Query
  • SGML Query

6
Types of Queries 1

7
Types of Queries 2

8
Types of Queries 3

9
Types of Queries 4

10
Types of Queries 5

11
Complex Queries: Query Builder
  • Combines queries of the same or of different
    kinds from the 5 main query options
  • The parts are represented by nodes of various
    types
  • Consists of at least two nodes
  • Scope node defines the context (an SGML element
    or a span, i.e. a number of words)
  • Content nodes may be linked together, and the
    type of query at each node can be edited

12
CQL Query
  • CQL: Corpus Query Language
  • Defines a query using SARA's own internal
    command language
  • Atomic query
  • A word, punctuation mark, or delimited string
    (e.g. jam, ?, Mrs.)
  • A word-and-PoS pair (e.g. CAN=NN1)
  • A phrase (e.g. not in your life)
  • A pattern (e.g. colo?r)
  • An SGML query (e.g. <body>)
  • Wildcard character _ (e.g. home _ center)
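
The wildcard example above (home _ center) can be illustrated with a small Python sketch. This is not SARA's implementation: the function name `find_phrase`, the whitespace tokenisation, and the assumption that `_` stands for exactly one arbitrary word are all choices made here for illustration.

```python
def find_phrase(tokens, pattern):
    """Return the start index of every match of `pattern` in `tokens`.

    Assumption for this sketch: "_" in the pattern matches exactly
    one arbitrary token; every other pattern word must match
    case-insensitively.
    """
    pat = pattern.lower().split()
    toks = [t.lower() for t in tokens]
    hits = []
    # Slide a window of the pattern's length over the token list.
    for start in range(len(toks) - len(pat) + 1):
        window = toks[start:start + len(pat)]
        if all(p == "_" or p == w for p, w in zip(pat, window)):
            hits.append(start)
    return hits


# Toy "corpus" (invented sentence, not BNC data).
tokens = "the home improvement center and the home care center".split()
print(find_phrase(tokens, "home _ center"))  # [1, 6]
```

A real concordancer would look words up in an index rather than scan every token, but the matching rule is the same.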

13
CQL 2
  • Unary operators
  • Case ($) operator makes the query case-sensitive
  • Header (@) operator makes the query search
    within headers as well as bodies of texts
  • Not (!) operator matches everything which is not
    a solution to the query (e.g. !cat dog finds
    occurrences of dog not preceded by cat)

14
CQL 3
  • Binary operators
  • Sequence: blanks between two queries (e.g. cat
    dog)
  • Disjunction (|) operator matches cases which
    satisfy either query (e.g. cat|dog)
  • Join operators * (order matters) and # (order
    does not matter) match cases which satisfy both
    queries (e.g. cat*dog)
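
The three binary operations (sequence, disjunction, join) can be sketched in Python over a toy token list. This is a hedged illustration, not SARA's CQL engine: the function names, the single-word operands, and the reading of "span" as a window of `span` tokens are assumptions made for this example.

```python
def positions(tokens, word):
    """Indices at which `word` occurs (case-insensitive)."""
    return [i for i, t in enumerate(tokens) if t.lower() == word.lower()]


def sequence(tokens, first, second):
    """Positions of `first` that are immediately followed by `second`."""
    second_pos = set(positions(tokens, second))
    return [i for i in positions(tokens, first) if i + 1 in second_pos]


def disjunction(tokens, first, second):
    """Positions that satisfy either query."""
    return sorted(set(positions(tokens, first)) | set(positions(tokens, second)))


def join(tokens, first, second, span, ordered):
    """Pairs (i, j) where both words fall inside a window of `span` tokens.

    With ordered=True, `first` must come before `second`;
    with ordered=False, either order counts.
    """
    pairs = []
    for i in positions(tokens, first):
        for j in positions(tokens, second):
            gap = j - i
            if gap == 0 or (ordered and gap < 0):
                continue
            if abs(gap) < span:
                pairs.append((i, j))
    return pairs


# Invented example sentence, not BNC data.
tokens = "the cat chased the dog and the dog ignored the cat".split()
print(sequence(tokens, "the", "dog"))                       # [3, 6]
print(disjunction(tokens, "cat", "dog"))                    # [1, 4, 7, 10]
print(join(tokens, "cat", "dog", span=5, ordered=True))     # [(1, 4)]
print(join(tokens, "cat", "dog", span=5, ordered=False))    # [(1, 4), (10, 7)]
```

The ordered join drops the pair where dog comes before cat, which is the practical difference between the two join variants.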

15
Displaying the results
  • Line mode: every occurrence of the item is
    displayed on a single line
  • Page mode: each occurrence is displayed in full
    on the screen

16
Display Format
  • PLAIN, displays only the words and punctuation of
    the results
  • POS, displays the Part of Speech information for
    any word on the screen
  • SGML, each result displayed with full SGML
    mark-up
  • CUSTOM, displays each hit according to a
    user-defined format

17
BNCweb

18
What is BNCweb?
  • User-friendly, web-based client program
  • searches for and retrieves lexical, grammatical
    and textual data from the BNC
  • relies on SARA
  • offers a wide range of additional features to
    SARA

19
Search Options
  • 2 Main Options
  • Standard Query
  • Lemma Query

20
1 Standard Query
  • Searches for words or phrases
  • Searches are not sensitive to word-class
    distinctions (e.g. nouns, verbs, adjectives,
    adverbs, conjunctions etc.)
  • Possibility to restrict searches to a subset of
    either written or spoken texts
  • Subsets are defined by selecting relevant
    metatextual categories

21
2 Lemma Query
  • Searches for words by additionally specifying the
    lemma type (nouns, verbs, adjectives etc.)
  • Also possibility to restrict searches to a subset
    of written or spoken texts and its metatextual
    categories (compare Standard Query)

22
Other Functions
  • Browse a file (by filename and sentence number)
  • Word lookup (produces alphabetically ordered
    lists of lexical items)
  • Scan keyword/titles (retrieves a list of BNC text
    files on the basis of the classification
    contained in the "title" and "keyword" element of
    the file headers)
  • Explore genre labels (retrieves a list of BNC
    text files according to genre classification
    criteria )
  • Frequency lists

23
Display Options
  • Sentence format, views concordance lines
  • KWIC (Keyword-in-Context) format
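
A KWIC display lines up each hit with a fixed amount of context on either side. The sketch below shows the idea in Python; it is not BNCweb code, and the function names, the context width, and the bracket formatting are assumptions made for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def format_kwic(lines):
    """Right-align the left context so the keywords form one column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  [{kw}]  {right}" for left, kw, right in lines]


# Invented example sentence, not BNC data.
tokens = "she spread jam on the toast and put the jam away".split()
for line in format_kwic(kwic(tokens, "jam", width=2)):
    print(line)
```

Sentence format would instead print the whole sentence containing each hit, without aligning the keyword.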

24
SARA vs. BNCweb 1: Features included in both
  • Usual functions of
  • thin & delete
  • sort & save
  • collocations
  • Search words, phrases, patterns, POS
  • Restrict search with the help of metatextual
    categories
  • Bibliographical data available

25
SARA vs. BNCweb 2: Advantages
  • SARA
  • Definition of complex queries with the help of
    Query Builder
  • BNCweb
  • User-friendly
  • Works faster than SARA
  • No restrictions on the number of search results
  • Possibility of tag-sequence search (specify
    syntactic structures within the 4 words
    preceding and/or following the match of a
    given query)
  • Offers detailed descriptive statistics for query
    results in the distribution option
  • Gives both absolute numbers of hits and
    normalised frequency counts
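
Normalised frequency counts make hit counts comparable across subcorpora of different sizes. A minimal Python sketch follows, assuming normalisation to hits per million words; the function name and all figures are hypothetical and are not taken from the BNC or from BNCweb.

```python
def per_million(hits, corpus_size):
    """Normalise an absolute hit count to hits per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply first so round figures stay exact in floating point.
    return hits * 1_000_000 / corpus_size


# Hypothetical figures for two subcorpora of different sizes.
written = per_million(hits=250, corpus_size=10_000_000)
spoken = per_million(hits=120, corpus_size=1_000_000)
print(written)  # 25.0
print(spoken)   # 120.0
```

In this invented case the word has fewer absolute hits in the spoken subcorpus but is almost five times as frequent there once corpus size is taken into account.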

26
SARA vs. BNCweb 3: Disadvantages
  • SARA
  • Not very user-friendly (complex, complicated)
  • Download restrictions (max 2000 hits)
  • no facility for automatic generation of collocate
    lists or further statistical analysis
  • No lemmatized index or lemmatizing component
  • No possibility to define, save and re-use
    subcorpora except by saving and re-using the
    queries which define them
  • SARA client can address only the whole of the
    SARA index
  • Cannot be used to search for patterns of POS
    codes without specifying the query by attaching a
    word

27
SARA vs. BNCweb 4: Disadvantages
  • BNCweb
  • Can only be installed on the basis of a full
    installation of BNC text files, the software tool
    SARA (with its index files) and some additional
    UNIX-tools
  • Only compatible with the 2nd version of the BNC
    (BNC World Edition)
  • Complex queries are only possible if the user
    knows CQL

28
Sources
  • Literature
  • Aston, G./Burnard, L. (1998): The BNC Handbook.
    Exploring the British National Corpus with SARA.
    Edinburgh: Edinburgh University Press.
  • Internet Sources
  • www.natcorp.ox.ac.uk/using/papers/burnard96a.htm
  • http://homepage.mac.com/bncweb/manual/bncwebman-home.htm
  • http://www.linguistlist.org/issues/13/13-2840.html
  • http://www.natcorp.ox.ac.uk/sara/index.html