1
Masters Thesis Defense
  • Bibliographic Tools in the Context of WWW and
    LaTeX
  • Munushree Thummala
  • Committee members
  • Dr. Prabhaker Mateti (Advisor)
  • Dr. Thomas Hartrum
  • Dr. T.K. Prasad

2
Agenda
  • Introduction
  • BibTeX Primer
  • Bibliographic Tool Survey
  • Requirements for the BibTeX Tools
  • Design Discussion
  • Conclusion
  • Future Work
  • Questions & Answers Session
  • Demonstration

3
Introduction
  • Preparing academic papers
  • Collecting bibliographic entries
  • Tools used to prepare the papers
  • Common problems

4
BibTeX Primer
  • What is BibTeX?
  • Helps authors prepare the References section of
    their documents
  • Defines entry types and required/optional fields
  • Uses style files to define the format of
    references
  • Standards for publications are specified in style
    files
  • Used with LaTeX
  • LaTeX collects the \cite commands in the .tex file
  • BibTeX extracts corresponding references from
    .bib file
  • BibTeX formats and sorts according to the .bst
    style
  • Output of BibTeX program is LaTeX formatted text
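
The first step of this workflow, gathering the cited keys, can be sketched in a few lines. This is only an illustration: real LaTeX writes \citation lines into the .aux file, which BibTeX then reads, whereas the sketch scans the .tex source directly with a regular expression. The function name cited_keys and the key Knuth-1984 are hypothetical.

```python
import re

# Matches \cite, \citep, \citet, ... with optional [note] arguments.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")

def cited_keys(tex_source):
    """Return citation keys in order of first appearance, without repeats."""
    keys = []
    for group in CITE_RE.findall(tex_source):
        for key in group.split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys

sample = r"See \cite{Thummala-2007} and \cite[p.~3]{Knuth-1984, Thummala-2007}."
print(cited_keys(sample))
```

Only the entries whose keys appear in this list need to be pulled from the .bib file and formatted.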

5
Sample BibTeX entry
  • @mastersthesis{Thummala-2007,
      author = {Munushree Thummala},
      title = {Bibliographic tools in the context of WWW and \latex},
      month = {November},
      year = {2007},
      school = {Wright State University},
      OPTkey = {},
      OPTtype = {},
      OPTaddress = {},
      OPTnote = {},
      OPTannote = {},
      advisor = {Prabhaker Mateti}
    }

6
Contribution Of Thesis
  • Evaluation of Bibliographic tools
  • BiBTeX to Database Suite of Tools
  • Database to store BibTeX entries
  • LoadBiBTeX
  • BibSearch
  • Discovery of Duplicate BiBTeX entries
  • Normalization of BiBTeX entries
  • Text to BiBTeX Translation
  • TextToBiBTeX command line tool API
  • PDFrefsToBiBTeX command line tool
  • Integration of TextToBiBTeX into Aigaion

7
Bibliographic Tools
  • There are 100 tools
  • In this thesis 87 are reviewed
  • Tools were evaluated for the following
  • Formats supported
  • Navigating, Searching and Sorting capabilities
  • Ease of maintaining bibliographic entries
  • Duplicate discovery
  • Import/Export to other formats

8
Bibliographic Tools
  • Web browser based tools
  • Aigaion, Bibsonomy, CiteULike, Zotero, BibORB,
    Basilic, PubsOnline, etc.
  • Desktop/Small scale tools
  • JabRef, KBibTeX, TkBibTeX, BibDB, BibEdit, Open
    Office Bibliographic Manager, Tellico, etc.
  • Commercial tools
  • Scholars Aid, Bookends, NotaBene, ProCite, etc.
  • Utilities
  • Bib2html, Bibclean, Bp, Bibdup, Sixpack, etc.

9
A Few Notable Tools
  • Aigaion
  • Zotero
  • Bibsonomy
  • JabRef

10
Aigaion
  • Web application, Open source
  • Easy to use
  • Supports basic editing features
  • Supports Multiple Users
  • Native format is BiBTeX
  • Organizes references by Topics & Sub Topics
  • Maintains a list of authors to eliminate
    duplication
  • Duplicate discovery present in import feature

11
Aigaion (Contd. 2)
12
Aigaion (Contd. 3)
13
Aigaion (Contd. 4) Author Profile
14
Zotero
  • Firefox Browser Extension Easy to use
  • Organizes entries in collections
  • Captures bibliographic entries from websites
    automatically
  • Some drawbacks
  • Loses BiBTeX citation keys and custom fields
    while importing
  • Not well suited for managing BiBTeX
    bibliographies
  • Local storage

15
Zotero (Contd. 2)
16
Zotero (Contd. 3)
17
Zotero (Contd. 4)
18
Zotero (Contd. 5)
19
Bibsonomy
  • Web browser based, hosted service
  • Easy to use
  • References
  • Users upload refs and bookmarks to Bibsonomy
  • Made available to other users
  • Tagged with keywords for categorization and
    search
  • Can be exported as BiBTeX
  • Browser shortcuts to capture entries from web

20
Bibsonomy (Contd. 2)
21
Bibsonomy (Contd. 3)
22
Bibsonomy (Contd. 4)
23
Bibsonomy (Contd. 5)
24
JabRef
  • Desktop Application
  • Easy to use
  • Multiple bib files can be edited
  • Search online
  • CiteSeer, Medline, IEEExplore, ArXiv.org
  • Native format is BibTeX
  • Auto generate BiBTeX keys
  • Imports/Exports multiple formats

25
JabRef (Contd. 2)
26
JabRef (Contd. 3)
27
JabRef (Contd. 4)
28
Requirements for New Tools
  • Text to BiBTeX translation
  • Translating free style text into BibTeX
  • Customizing the translation
  • Certainty of Recognition measure
  • Extract references section from PDF papers
  • Provide an API for other developers to integrate
    free style translation into their applications
  • Command line invocation
  • GUI also
  • Normalized BiBTeX output

29
Requirements (Contd. 2)
  • Database of Bibliographic entries
  • Database to store BiBTeX files
  • Tool to Detect duplicates
  • Command line invocation
  • Normalized BiBTeX output

30
Requirements (Contd. 3)
  • Search and Generate BiBTeX files
  • Flexible searches
  • Command line invocation
  • Outputs BiBTeX format
  • Normalized BiBTeX output
  • Platform Independent

31
Database on Local Machine
  • Tables to store
  • BiBTeX entries
  • lookup data for text to BiBTeX translation
  • search index data for fast and flexible searching

32
Database Of BiBTeX Entries
  • A schema to store BiBTeX entries
  • including string macros
  • Ability to specify a tag for each entry
  • Tag defaults to .bib filename
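
One possible shape for such a schema is sketched below with SQLite. The table and column names are assumptions made for illustration, not the thesis's actual schema, and the default tag is taken to be the file's base name including the .bib extension.

```python
import os
import sqlite3

# Hypothetical schema: one row per entry, one row per field, plus @String macros.
SCHEMA = """
CREATE TABLE entries (
    id INTEGER PRIMARY KEY,
    cite_key TEXT NOT NULL,
    entry_type TEXT NOT NULL,
    tag TEXT NOT NULL
);
CREATE TABLE fields (
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE TABLE string_macros (
    tag TEXT NOT NULL,
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
"""

def store_entry(conn, bibfile, cite_key, entry_type, fields, tag=None):
    """Insert one entry; the tag falls back to the .bib file name."""
    if tag is None:
        tag = os.path.basename(bibfile)
    cur = conn.execute(
        "INSERT INTO entries (cite_key, entry_type, tag) VALUES (?, ?, ?)",
        (cite_key, entry_type, tag),
    )
    conn.executemany(
        "INSERT INTO fields (entry_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value) for name, value in fields.items()],
    )
    return tag

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tag = store_entry(conn, "thesis.bib", "Thummala-2007", "mastersthesis",
                  {"author": "Munushree Thummala", "year": "2007"})
```

Storing fields as rows rather than columns keeps custom fields such as advisor without schema changes.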

33
Database Of Lookup Data
  • A database Schema to store lookup tables
  • Lookup Tables
  • Author Sub Names
  • Journal Names
  • Publishers
  • Cities
  • States
  • Months
  • Organizations

34
Database Of Search Indexes
  • A database Schema to store BiBTeX Search Index
    data
  • Stores data as sequence of tokens
  • Provides ability to search
  • Any field(s)
  • Any keyword(s)
  • Citation key also stored as tokens
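
A token-level index of this kind can be sketched as an in-memory inverted index. The class TokenIndex, its method names, and the sample entry Knuth-1984 are hypothetical; the thesis stores the index in database tables rather than in a Python dictionary.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-cased alphanumeric tokens; punctuation acts as a separator."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TokenIndex:
    def __init__(self):
        # (field name, token) -> set of citation keys
        self.postings = defaultdict(set)

    def add(self, cite_key, fields):
        for name, value in fields.items():
            for token in tokenize(value):
                self.postings[(name, token)].add(cite_key)
        # The citation key itself is indexed as tokens too.
        for token in tokenize(cite_key):
            self.postings[("key", token)].add(cite_key)

    def search(self, field_names, keywords):
        """Keys of entries where every keyword token occurs in one of the fields."""
        result = None
        for token in (t for word in keywords for t in tokenize(word)):
            hits = set()
            for name in field_names:
                hits |= self.postings.get((name, token), set())
            result = hits if result is None else result & hits
        return result or set()

index = TokenIndex()
index.add("Knuth-1984", {"author": "Donald E. Knuth", "title": "The TeXbook"})
index.add("Thummala-2007", {"author": "Munushree Thummala",
                            "title": "Bibliographic tools in the context of WWW and LaTeX"})
print(index.search(["author"], ["Donald", "Knuth"]))
```

Because matching happens per token, word order, case, and punctuation in the stored field do not affect the result.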

35
LoadBiBTeX Tool
  • Loads BiBTeX files into the database and updates
    the search index tables
  • Loads the lookup tables used by Text to BiBTeX
    tool
  • Detects duplicates

36
LoadBiBTeX: Loads BibTeX Files
  • Program Usage
  • LoadBiBTeX loadentries bibtag thesis2007
    bibfile thesis.bib
  • Any entries that have errors are not loaded and
    are shown in the output
  • Updates the index tables used by the BibSearch
    tool

37
LoadBiBTeX: Populate Lookup Tables
  • Program Usage
  • LoadBiBTeX loadauthors loadpublishers
    loadjournals bibfile thesis.bib
  • Only new values are loaded
  • The above command does not load the BiBTeX entries

38
LoadBiBTeX: Duplicate Discovery
  • Program Usage
  • LoadBiBTeX dupdisc bibtag thesis2007 bibfile
    thesis.bib
  • The BiBTeX entries in thesis.bib are read and
    compared to the entries in the database
    corresponding to the bibtag thesis2007
  • Any entries considered to be duplicates are
    displayed for the user

39
BibSearch: Searching the Database
  • Program Usage
  • BibSearch bibtag thesis2007 fields author
    keywords Donald Knuth
  • The database is searched for entries with the tag
    thesis2007 and the words Donald and Knuth
    in the author field
  • The resulting BibTeX entries and any required
    @String constructs are normalized and written to
    the output

40
Normalization
  • Make BiBTeX entries consistent
  • Some of the rules
  • Citation Keys are consistent
  • Fields are enclosed in braces to preserve formatting
  • Month field abbreviations are expanded
  • Missing required fields are indicated to the user
    appropriately
  • Order of the fields in the output
  • Where is it implemented?
  • In whichever tool a particular rule makes sense
  • Spread across TextToBiBTeX, LoadBibTeX, BibSearch
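
A few of these rules can be sketched as follows. The leading field order (author, title, booktitle, month, year, then the remaining fields in input order) is inferred from the normalized examples in the slides, and marking a missing required field with an empty value is likewise an assumption. The function names are hypothetical; the required fields listed for mastersthesis are the standard BibTeX ones.

```python
MONTHS = {
    "jan": "January", "feb": "February", "mar": "March", "apr": "April",
    "may": "May", "jun": "June", "jul": "July", "aug": "August",
    "sep": "September", "oct": "October", "nov": "November", "dec": "December",
}
LEADING_FIELDS = ["author", "title", "booktitle", "month", "year"]  # assumed order
REQUIRED = {"mastersthesis": ["author", "title", "school", "year"]}

def citation_key(fields):
    """Build Lastname-Lastname-Year from the author and year fields."""
    people = fields.get("author", "").split(" and ")
    last_names = [p.split()[-1] for p in people if p.split()]
    return "-".join(last_names + [fields.get("year", "")])

def normalize(entry_type, fields):
    fields = {name.lower(): value.strip() for name, value in fields.items()}
    if "month" in fields:                      # expand Nov -> November
        fields["month"] = MONTHS.get(fields["month"][:3].lower(), fields["month"])
    for name in REQUIRED.get(entry_type.lower(), []):
        fields.setdefault(name, "")            # empty value flags a missing field
    ordered = [n for n in LEADING_FIELDS if n in fields]
    ordered += [n for n in fields if n not in LEADING_FIELDS]
    lines = ["@%s{%s," % (entry_type.upper(), citation_key(fields))]
    lines += ["  %s = {%s}," % (n.upper(), fields[n]) for n in ordered]
    lines.append("}")
    return "\n".join(lines)

normalized = normalize("mastersthesis", {
    "title": r"Bibliographic tools in the context of WWW and \latex",
    "year": "2007", "school": "Wright State University", "month": "Nov",
    "author": "Munushree Thummala", "advisor": "Prabhaker Mateti",
})
print(normalized)
```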

41
Normalization (Example 2)
  • @mastersthesis{Thummala2007,
      title = {Bibliographic tools in the context of WWW and \latex},
      year = {2007},
      school = {Wright State University},
      month = {Nov},
      author = {Munushree Thummala},
      advisor = {Prabhaker Mateti},
    }
  • @MASTERSTHESIS{Thummala-2007,
      AUTHOR = {Munushree Thummala},
      TITLE = {Bibliographic tools in the context of WWW and \latex},
      MONTH = {November},
      YEAR = {2007},
      SCHOOL = {Wright State University},
      ADVISOR = {Prabhaker Mateti},
    }

42
Normalization (Example 3)
  • @InCollection{lawrence01access,
      author = "Steve Lawrence",
      title = "Access to Scientific Literature",
      journal = "The {\it Nature} Yearbook of Science and Technology",
      editor = "Declan Butler",
      publisher = "Macmillan",
      address = "London, England",
      pages = "86-88",
      year = 2001
    }
  • @INCOLLECTION{Lawrence-2001,
      AUTHOR = {Steve Lawrence},
      TITLE = {Access to Scientific Literature},
      BOOKTITLE = {},
      YEAR = {2001},
      JOURNAL = {The {\it Nature} Yearbook of Science and Technology},
      EDITOR = {Declan Butler},
      PUBLISHER = {Macmillan},
      ADDRESS = {London, England},
    }

43
Text to BiBTeX Translation
  • What are free style references, and where do
    authors find them?
  • References at the end of academic papers
  • References on Internet sites like CiteSeer
  • A jotted-down text description
  • How do authors benefit from this translation?
  • No need to manually convert to BiBTeX
  • Significantly better accuracy
  • Speeds the process of translating multiple
    references

44
Text to BiBTeX Translation (Contd. 2)
  • Ways to translate free style text
  • Write a routine to analyze the strings and guess
    the fields
  • Develop
  • Language Grammar
  • Recursive Descent Parser
  • Which method did we pick?
  • Recursive Descent Parsing
  • Tried other methods with varying degrees of
    success

45
Text to BiBTeX Translation (Contd. 3)
  • How does the Parser work?
  • Extent: a sequence of tokens
  • Field type: an extent that matches the set of
    okTokens for that field and ends when a
    notOkToken (including a delimiting token) is hit
  • Backtrack: if the current token in an extent does
    not match the field, the parser backtracks to the
    beginning token and gives it a chance to match
    other field types
  • Unrecognized: if the current token does not
    match any field type, it is appended to the
    unrecognized field list and the above process is
    repeated starting at the next token
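
The extent, backtrack, and unrecognized steps can be sketched as a small loop. This is a simplified illustration, not the thesis parser: each field type is assumed to be a triple of a name, an ok-token predicate, and a validity check on the whole extent, and the publisher lookup table holds a single sample value.

```python
DELIMITERS = {",", ";"}
PUBLISHERS = {"Addison-Wesley"}            # stand-in for the publishers lookup table

def parse(tokens, field_types):
    """Return (recognized fields, unrecognized tokens)."""
    fields, unrecognized = {}, []
    pos = 0
    while pos < len(tokens):
        if tokens[pos] in DELIMITERS:      # delimiters only end extents
            pos += 1
            continue
        for name, ok_token, is_valid in field_types:
            end = pos
            while end < len(tokens) and ok_token(tokens[end]):
                end += 1                   # grow the extent over ok tokens
            extent = tokens[pos:end]
            if extent and name not in fields and is_valid(extent):
                fields[name] = " ".join(extent)
                pos = end
                break
            # Otherwise backtrack: the next field type restarts at pos.
        else:
            unrecognized.append(tokens[pos])
            pos += 1
    return fields, unrecognized

FIELD_TYPES = [
    ("pages",
     lambda t: t.lower() == "pp." or t.isdigit() or t == "-",
     lambda ext: ext[0].lower() == "pp." and len(ext) >= 2),
    ("year",
     lambda t: t.isdigit(),
     lambda ext: len(ext) == 1 and 1900 <= int(ext[0]) <= 2100),
    ("publisher",
     lambda t: t in PUBLISHERS,
     lambda ext: " ".join(ext) in PUBLISHERS),
]

tokens = ["Addison-Wesley", ",", "1991", ",", "pp.", "579", "-", "601", ",", "xyzzy"]
fields, unrecognized = parse(tokens, FIELD_TYPES)
print(fields, unrecognized)
```

The token 1991 is first tried as a pages extent, fails the validity check, and is then matched as a year, which is the backtracking step in miniature.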

46
Text to BiBTeX Translation (Contd. 4)
  • How is a series of tokens recognized as a field?
  • Author, Journal fields - lookup table and
    heuristics
  • Title field - quoted strings or heuristics
  • Pages field
  • PAGES. | PP. | P. <number number>
  • Year field - a four-digit number between 1900
    and 2100
  • Volume field
  • VOL. | VOLUME <number>
  • Number field
  • NO. | NUMBER <number>
  • Abbrev field
  • <volume>(<number>)<startpage>-<endpage>
  • Edition field
  • EDITION <number> or <number> EDITION
  • Publisher field, Place, State - Lookup table
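
The pattern-based fields lend themselves to regular expressions. The sketch below is one possible encoding; the hyphen between page numbers and the optional colon in the abbreviated form are assumptions, and classify is a hypothetical helper name.

```python
import re

PATTERNS = {
    "pages":   re.compile(r"^(?:pages\.?|pp\.|p\.)\s*(\d+)\s*-+\s*(\d+)$", re.I),
    "volume":  re.compile(r"^(?:vol\.|volume)\s*(\d+)$", re.I),
    "number":  re.compile(r"^(?:no\.|number)\s*(\d+)$", re.I),
    "edition": re.compile(r"^(?:edition\s*(\d+)|(\d+)\s*edition)$", re.I),
    # volume(number) startpage-endpage, with an optional colon (assumed)
    "abbrev":  re.compile(r"^(\d+)\((\d+)\)\s*:?\s*(\d+)\s*-+\s*(\d+)$"),
}

def classify(text):
    """Name of the field type that the text matches, or None."""
    text = text.strip()
    if re.fullmatch(r"\d{4}", text) and 1900 <= int(text) <= 2100:
        return "year"
    for name, pattern in PATTERNS.items():
        if pattern.match(text):
            return name
    return None

for sample in ["pp.579-601", "1975", "Vol. 20", "No. 4", "20(3):59-101", "2 edition", "1850"]:
    print(sample, "->", classify(sample))
```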

47
Text to BiBTeX Translation (Contd. 5)
  • A lexical analyzer tokenizes
  • Holland, J. H. Adaptation in Natural and
    Artificial Systems. The University of Michigan
    Press, Ann Arbor, MI (1975).
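
A minimal tokenizer for this reference might look like the sketch below. Keeping an initial such as J. together with its period as one token is an assumption made so that author recognition can spot initials easily; the thesis does not describe its token rules.

```python
import re

# An initial with its period, or a word, or a single punctuation mark.
TOKEN_RE = re.compile(r"[A-Za-z]\.|[\w'-]+|[^\w\s]")

def tokenize(reference):
    return TOKEN_RE.findall(reference)

reference = ("Holland, J. H. Adaptation in Natural and Artificial Systems. "
             "The University of Michigan Press, Ann Arbor, MI (1975).")
tokens = tokenize(reference)
print(tokens)
```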

48
Text to BiBTeX Translation (Contd. 6)
  • Author Field Recognition
  • Holland was present in author lookup table
  • J., H. are initials and the author is
    recognized as present in the form lastname,
    firstname
  • Author Field is set to J. H. Holland

49
Text to BiBTeX Translation (Contd. 7)
  • Title Field Recognition
  • Since Adaptation is not recognized as a
    possible starting token of any other field,
    tokens are gathered till the next punctuation as
    title field

50
Text to BiBTeX Translation (Contd. 8)
  • Publisher Field Recognition
  • The sequence of tokens The, University, of,
    Michigan and Press represents a valid
    publisher name in the publishers lookup table
  • Thus The University of Michigan Press is the
    publisher field

51
Text to BiBTeX Translation (Contd. 9)
  • Place and State Field Recognition
  • The sequence of tokens Ann and Arbor
    represents a valid place name in the cities
    lookup table
  • The token MI represents a valid state name in
    the states lookup table

52
Text to BiBTeX Translation (Contd. 10)
  • Year Field Recognition
  • The token 1975 is a valid year value in the
    range 1900 - 2100. As such it becomes the year
    field

53
Text to BiBTeX Translation (Contd. 11)
  • Citation Entry Type
  • Since there are no distinguishing fields
    recognized, the entry type is defaulted to Misc
  • CORN calculations
  • Author field is fully recognized → a CORN of 100
  • Title field follows Author field → a CORN of 100
  • Publisher field is in lookup table → a CORN of
    100
  • There are no required fields for the Misc entry
    type, so the multiplier is 1
  • Entry CORN = AVG(Author, Title, Publisher) ×
    multiplier = 100

54
Text to BiBTeX Translation (Contd. 12)
  • -- Entry CORN = 100, Author = 100, Title = 100,
    Publisher = 100
  • @MISC{Holland-1975,
      AUTHOR = {J. H. Holland},
      TITLE = {Adaptation in Natural and Artificial Systems},
      YEAR = {1975},
      PUBLISHER = {The University of Michigan Press},
      PLACE = {Ann Arbor},
      STATE = {MI},
    }

55
Text to BiBTeX Translation Example 1
  • Werner Damm and Bernhard Josko. A sound and
    relatively complete Hoare-logic for a language
    with higher type procedures. Acta Informatica,
    20:59-101, 1983.
  • -- Entry CORN = 87, Author = 50, Title = 100,
    Journal = 100, Pages = 100
  • @ARTICLE{Damm-Josko-1983,
      AUTHOR = {Werner Damm and Bernhard Josko},
      TITLE = {A sound and relatively complete Hoare-logic for a language with higher type procedures},
      YEAR = {1983},
      JOURNAL = {Acta Informatica},
      PAGES = {59-101},
      VOLUME = {20},
    }

56
Text to BiBTeX Translation Example 2
  • Collins R. J. and Jefferson D. R. "AntFarm
    towards simulated evolution." In C. G. Langton,
    C. Taylor, J. D. Farmer, and S. Rasmussen (Eds.),
    Artificial Life II, Vol. X of SFI Studies in the
    Sciences of Complexity. Redwood City, CA
    Addison-Wesley, 1991, pp.579-601.

  • @INPROCEEDINGS{J-R-1991,
      AUTHOR = {Collins R. J. and Jefferson D. R.},
      TITLE = {AntFarm towards simulated evolution.},
      YEAR = {1991},
      EDITOR = {G. Langton and C. Taylor and J. D. Farmer and S. Rasmussen},
      PAGES = {579-601},
      PUBLISHER = {Addison - Wesley},
      JOURNAL = {In C},
      PLACE = {Redwood City},
      STATE = {CA},
      OPTERRORFIELD0 = {Artificial Life II},
      OPTERRORFIELD1 = {Vol. X of SFI Studies in the Sciences of Complexity},
    }

57
Correctness Of Recognition Number
  • CORN for entire BiBTeX entry is based on
  • CORN for each field recognized
  • Completeness of the entry (fraction of required
    fields present)
  • CORN is calculated for
  • Author field
  • Editor field
  • Title field
  • Journal field
  • Publisher field
  • Pages field

58
CORN Example 1
  • @INPROCEEDINGS{Wegener-2002,
      AUTHOR = {I. Wegener},
      TITLE = {Methods for the Analysis of Evolutionary Algorithms on PseudoBoolean Functions},
      BOOKTITLE = {},
      YEAR = {2002},
      PUBLISHER = {Kluwer Academic Publishers},
      JOURNAL = {In Evolutionary Optimization},
    }
59
CORN Example 1 (Contd.)
  • Author, Title and Publisher were correctly
    recognized and their field CORN is set to 100
    each.
  • The journal field was recognized due to the
    presence of string In. As such it is assigned
    a CORN of 50.
  • The required field Booktitle is not present so
    the multiplier is ¾.
  • This reduces the entry CORN to 65:
    (100 + 100 + 100 + 50)/4 × 3/4 ≈ 65

60
CORN Example 2
  • @MISC{Luckham-1990,
      AUTHOR = {David Luckham},
      TITLE = {Programming with Specifications},
      YEAR = {1990},
      EDITION = {1},
      OPTERRORFIELD0 = {Springer},
      OPTERRORFIELD1 = {Berlin},
    }

61
CORN Example 2 (Contd.)
  • One of the Author names is not fully recognized
    and hence reduces the CORN for the author field to
    1/2 × 100 = 50
  • Title is correctly recognized and its field CORN
    is set to 100.
  • Year and Edition fields are correctly recognized
    but do not impact entry CORN.
  • Entry CORN = (100 + 50)/2 = 75. Since the entry
    type is MISC, the multiplier is 1.

62
CORN Example 3
  • @INPROCEEDINGS{Collins-Jefferson-1990,
      AUTHOR = {Robert J. Collins and David R. Jefferson},
      TITLE = {AntFarm Towards simulated evolution},
      BOOKTITLE = {},
      YEAR = {1990},
      PAGES = {579--601},
      MONTH = {February},
      PUBLISHER = {Addison - Wesley},
      JOURNAL = {In Artificial Life II Proceedings of the Workshop on Artificial Life},
      PLACE = {Santa Fe},
      STATE = {NM},
    }
63
CORN Example 3 (Contd.)
  • Author names are fully recognized and hence CORN
    is set to 100.
  • Title is correctly recognized and its field CORN
    is set to 100.
  • Pages is recognized and the page range is valid
    so CORN is 100.
  • Journal is recognized with a heuristic, so CORN
    is set to 50.
  • Publisher is in the publishers lookup table, so
    CORN is set to 100.
  • Entry CORN = (100 + 100 + 50 + 100 + 100)/5 ×
    3/4 ≈ 67. The multiplier ¾ is due to the missing
    booktitle required field.

64
TextToBiBTeX API
  • SetupDbConnection
  • setInputString
  • setMarkupStream: for colorized HTML
  • setBiBTeXStream: for BibTeX entries
  • textToBiBTeX: text to BibTeX translation
  • getEntriesCount
  • getBibTeXEntryFieldCount
  • getBibTeXEntryField

65
TextToBiBTeX API (Contd.)
  • Java library jar
  • Non-java programs can invoke
  • TextToBiBTeX
  • PDFrefsToBiBTeX

66
TextToBiBTeX Command line tool
  • Free style input in a file
  • BiBTeX output
  • Marked up HTML output
  • Uses TextToBiBTeX API
  • Usage
  • TextToBiBTeX <txt file> bib file

67
PDFrefsToBiBTeX Command line tool
  • PDF file as input
  • BiBTeX output
  • Marked up HTML output
  • Uses 3rd party tool PDFBox for parsing PDF file
  • Uses TextToBiBTeX API
  • Usage
  • PDFrefsToBiBTeX -clean <pdf file> bib file

68
Integrating into Aigaion
  • Free Style translation functionality integrated
    into Aigaion
  • Free Style recognition from PDF files
  • Logic to clean the text recognized from PDF
  • Synchronizing TextToBiBTeX lookup tables with
    entries from Aigaion database

69
Integrating Into Aigaion (Contd. 2)
70
Integrating Into Aigaion (Contd. 3)
71
Integrating Into Aigaion (Contd. 4)
72
Integrating Into Aigaion (Contd. 5)
73
Sync Tables with Aigaion (Contd. 6)
74
Sync Tables with Aigaion (Contd. 7)
75
Conclusion
  • Tool Survey
  • Evaluated over 80 tools
  • Tool Recommendations
  • Database of BiBTeX entries
  • Store BiBTeX files as database entries
  • Searching is based on token level instead of
    string level which yields good results
  • Duplicates are detected logically instead of
    string comparisons

76
Conclusion (Contd.)
  • Text to BiBTeX translation
  • TextToBiBTeX saves scholars time and effort by
    relieving them from the burden of translating and
    maintaining BiBTeX entries
  • TextToBiBTeX API allows other tools to reuse free
    style functionality
  • Integrated into Aigaion tool
  • Converted PDF references into BiBTeX format

77
Future Work
  • Better duplicate detection by letting the users
    configure the base rules for detecting duplicates
  • Recognizing more variations in Free style text
  • Recognizing more fields
  • Optimizing the database loading speed for BiBTeX
    entries

78
Demonstration
  • Integration of free style into Aigaion
  • Text file input
  • PDF file input
  • LoadBiBTeX Duplicate Discovery
  • BibSearch Searching the database
  • LoadBiBTeX loading a BiBTeX file
  • LoadBiBTeX updating lookup tables