Title: Additional NLS Tools
1 Additional NLS Tools
- NLS's Java NLP tools
- MMTx
- GSpell
2 NLS Java NLP Tools
- Tokenizer
- Lexical Lookup
- NP Parser
- Document Centric
- Java Programs and APIs
3 Java NLP Tools: Tokenizer
- Tokenizes text into
  - Sections (paragraphs)
  - Sentences
  - Tokens
- Can handle
  - Free text
  - HTML
  - MEDLINE abstracts
[Diagram: Document > Sections > Section 1 > Sentences > Sentence 1 > Tokens > Token 1]
4 Java NLP Tools: Tokenizer
- Usage
- tokenize.bat|sh Options
- --fileName=fileName
- --outputFileName=fileName
- --inputType=freeText|HTML|medlineCitations
- --sections
- --sentences
- --tokens
- --pipedOutput
- --indicate_citation_end
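The option list above can be turned into a command line programmatically. The following is a minimal sketch, not part of the NLS distribution: the helper name `build_tokenize_command` is hypothetical, the flag names are copied from the list above, and the `=` between a flag and its value is an assumption.

```python
from typing import List, Optional, Sequence

# Input types listed on the usage slide.
ALLOWED_INPUT_TYPES = {"freeText", "HTML", "medlineCitations"}


def build_tokenize_command(
    input_file: str,
    input_type: str = "freeText",
    output_file: Optional[str] = None,
    levels: Sequence[str] = ("sentences", "tokens"),
    piped: bool = True,
    script: str = "tokenize.sh",
) -> List[str]:
    """Build an argument list for the tokenizer script (hypothetical helper).

    Assumes flags take their value as --flag=value.
    """
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unsupported input type: {input_type}")
    argv = [script, f"--fileName={input_file}", f"--inputType={input_type}"]
    if output_file is not None:
        argv.append(f"--outputFileName={output_file}")
    # Each requested level (sections, sentences, tokens) becomes a bare flag.
    argv.extend(f"--{level}" for level in levels)
    if piped:
        argv.append("--pipedOutput")
    return argv


if __name__ == "__main__":
    print(" ".join(build_tokenize_command("5.txt")))
```

The resulting list can be passed to `subprocess.run` once the script is on the path; building a list rather than a single string avoids shell quoting problems with file names.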
5 Java NLP Tools: Tokenizer
tokenize.bat --inputFile=5.txt --inputType=freeText --sentences --tokens --pipedOutput
- Sentence|1|97|182|But those follow-up tests have been inconclusive, state and federal officials said.
- Token|16|97|99|0|0|But
- Token|17|101|105|1|0|those
- Token|18|108|113|2|0|follow
- Token|19|114|114|2|0|-
- Token|20|115|116|3|0|up
- Token|21|118|122|4|0|tests
- Token|22|124|127|5|0|have
- Token|23|129|132|6|0|been
- Token|24|134|145|7|0|inconclusive
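Piped output of this kind is easy to consume downstream. The sketch below assumes the records are pipe-delimited and that a Token record carries, in order, a record type, a token id, start and end character offsets, two further numeric fields whose meaning is not stated on the slide, and the token text. The `Token` class and `parse_tokens` function are illustrative names, not part of the NLS API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Token:
    token_id: int
    start: int   # offset of the first character
    end: int     # offset of the last character (inclusive, judging by the sample)
    text: str


def parse_tokens(lines: Iterable[str]) -> List[Token]:
    """Collect Token records from piped tokenizer output; skip other records."""
    tokens = []
    for line in lines:
        # maxsplit=6 keeps any '|' inside the token text intact.
        fields = line.strip().split("|", 6)
        if len(fields) != 7 or fields[0] != "Token":
            continue
        # fields[4] and fields[5] are not documented on the slide; ignored here.
        tokens.append(Token(int(fields[1]), int(fields[2]), int(fields[3]), fields[6]))
    return tokens


def span_matches_text(token: Token) -> bool:
    """Check that the inclusive offsets span exactly the token text."""
    return token.end - token.start + 1 == len(token.text)


if __name__ == "__main__":
    sample = [
        "Sentence|1|97|182|But those follow-up tests have been inconclusive.",
        "Token|16|97|99|0|0|But",
        "Token|19|114|114|2|0|-",
        "Token|24|134|145|7|0|inconclusive",
    ]
    for tok in parse_tokens(sample):
        print(tok, span_matches_text(tok))
```

The span check is a cheap sanity test when aligning tokens back to the source text.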
6 NLP Tools: Lexical Lookup
- Chunks tokens into terms
  - From the SPECIALIST Lexicon
  - From regular expressions
[Diagram: Document > Sections > Section 1 > Sentences > Sentence 1 > LexicalElements > Lexical Element 1 > Tokens]
7 Java NLP Tools: Lexical Lookup
- Usage
- LexicalLookup.bat|sh Options
- --fileName=fileName
- --outputFileName=fileName
- --inputType=freeText|HTML|medlineCitations
- --sections
- --sentences
- --lexicalElements
- --lexicalEntries
- --tokens
- --pipedOutput
8 Java NLP Tools: Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput
- Lexical Element|17|LEXICON|prep|But|97|99
- LexicalEntry|but|conj|base|E0014465
- LexicalEntry|but|prep|base|E0014464
- Lexical Element|18|LEXICON|det|those|101|105
- LexicalEntry|those|det|plural|E0060728
- LexicalEntry|those|pron|base|E0060729
- Lexical Element|20|LEXICON|adj|follow-up|108|116
- LexicalEntry|follow-up|adj|base|E0028422
- Lexical Element|23|LEXICON|noun|tests|118|122
- LexicalEntry|tests|verb|pres3s|E0060349
- LexicalEntry|tests|noun|plural|E0060348
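The output above pairs each lexical element with the lexicon entries that could match it, which exposes part-of-speech ambiguity ("tests" as noun or verb). A minimal sketch of grouping the two record types follows. It assumes pipe-delimited records, that entries immediately follow the element they belong to, and that the field order is as read off the sample; `group_entries` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def group_entries(lines: Iterable[str]) -> List[Dict]:
    """Attach each LexicalEntry record to the preceding Lexical Element record."""
    elements: List[Dict] = []
    for line in lines:
        fields = line.strip().split("|")
        kind = fields[0].replace(" ", "")  # accept "Lexical Element" or "LexicalElement"
        if kind == "LexicalElement" and len(fields) == 7:
            _, elem_id, source, pos, text, start, end = fields
            elements.append({
                "id": int(elem_id), "source": source, "pos": pos, "text": text,
                "start": int(start), "end": int(end), "entries": [],
            })
        elif kind == "LexicalEntry" and len(fields) == 5 and elements:
            _, term, pos, inflection, eui = fields
            elements[-1]["entries"].append(
                {"term": term, "pos": pos, "inflection": inflection, "eui": eui}
            )
    return elements


def ambiguous(elements: List[Dict]) -> List[str]:
    """Return the text of elements whose entries span more than one part of speech."""
    return [e["text"] for e in elements if len({x["pos"] for x in e["entries"]}) > 1]


if __name__ == "__main__":
    sample = [
        "Lexical Element|20|LEXICON|adj|follow-up|108|116",
        "LexicalEntry|follow-up|adj|base|E0028422",
        "Lexical Element|23|LEXICON|noun|tests|118|122",
        "LexicalEntry|tests|verb|pres3s|E0060349",
        "LexicalEntry|tests|noun|plural|E0060348",
    ]
    print(ambiguous(group_entries(sample)))
```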
9 NLP Tools: NpParser
- Chunks sentences into simple phrases
10 Java NLP Tools: NpParser
- Usage
- npParser.bat|sh Options
- --fileName=fileName
- --outputFileName=fileName
- --inputType=freeText|HTML|medlineCitations
- --sections
- --sentences
- --phrases|--nps|--mincoMan
- --lexicalElements
- --lexicalEntries
- --tokens
- --pipedOutput
11 Java NLP Tools: NpParser
npParser.bat --inputFile=5.txt --inputType=freeText --phrases --pipedOutput
- Phrase|0|0|10|The company|company
- Phrase|1|12|14|has
- Phrase|2|16|24|forwarded
- Phrase|3|26|39|some materials|materials
- Phrase|4|41|62|to a state laboratory|state laboratory
- Phrase|5|64|74|in Richmond|Richmond
- Phrase|6|76|86|for further|further
- Phrase|7|88|94|testing
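In the sample, some phrase records end with an extra field that looks like the head of the phrase ("company" for "The company"), while others do not. The sketch below reads such records under that interpretation, which is an assumption drawn from the sample rather than documented behaviour. `Phrase` and `parse_phrases` are illustrative names.

```python
from typing import Iterable, List, NamedTuple, Optional


class Phrase(NamedTuple):
    phrase_id: int
    start: int
    end: int
    text: str
    head: Optional[str]  # None when the record has no trailing head field


def parse_phrases(lines: Iterable[str]) -> List[Phrase]:
    """Read pipe-delimited Phrase records; the sixth field is treated as the head."""
    phrases = []
    for line in lines:
        fields = line.strip().split("|")
        if fields[0] != "Phrase" or len(fields) < 5:
            continue
        head = fields[5] if len(fields) > 5 and fields[5] else None
        phrases.append(Phrase(int(fields[1]), int(fields[2]), int(fields[3]), fields[4], head))
    return phrases


if __name__ == "__main__":
    sample = [
        "Phrase|0|0|10|The company|company",
        "Phrase|1|12|14|has",
        "Phrase|5|64|74|in Richmond|Richmond",
    ]
    # Print only the phrases that carry a head.
    for p in parse_phrases(sample):
        if p.head is not None:
            print(p.text, "->", p.head)
```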
12 MMTx: MetaMap Technology Transfer
- Maps text phrases to Metathesaurus concepts
- Java implementation of MetaMap
[Diagram labels: Document, Tokenization, POS Tagger Client, Lexical Lookup, Parser, Variant Generation, Candidate Retrieval, Evaluation, Phrase 1, Final Mapping, Post-processing, Presentation]
13 MMTx
- Usage
- MMTx <options> --fileName=infile --outputFileName=outfile
- --strict_model|--moderate_model|--relaxed_model
- --KSYear=year --mm_data_version=customName
- --threshold=lowestScore
- --truncate_candidates_mappings
- --term_processing --allow_overmatches --allow_concept_gaps
- --composite_phrases
- --prefer_multiple_concepts
- --fielded_output
14 MMTx
MMTx --inputFile=5.txt --inputType=freeText
- Processing 00000000.tx.3: One problem is caused by the VecTest itself, which uses a dipstick to measure the presence of a protein associated with the parasite that causes malaria.
- Phrase: "One problem"
- Meta Candidates (2)
- 861 Problem, NOS [Finding,Pathologic Function]
- 694 One [Quantitative Concept]
- Meta Mapping (888)
- 694 One [Quantitative Concept]
- 861 Problem, NOS [Finding,Pathologic Function]
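Candidate lines in this human-readable output can be picked apart with a regular expression. The sketch below assumes each candidate line has the shape `score concept-name [semantic type, ...]`, with the semantic types in square brackets; the brackets are an assumption about the original output, and `parse_candidate` is a hypothetical helper, not an MMTx API.

```python
import re
from typing import List, Optional, Tuple

# Optional leading bullet, integer score, concept name, bracketed semantic types.
CANDIDATE = re.compile(r"^\s*(?:-\s*)?(\d+)\s+(.+?)\s+\[([^\]]+)\]\s*$")


def parse_candidate(line: str) -> Optional[Tuple[int, str, List[str]]]:
    """Return (score, concept name, semantic types), or None for other lines."""
    match = CANDIDATE.match(line)
    if match is None:
        return None
    score, name, types = match.groups()
    return int(score), name, [t.strip() for t in types.split(",")]


if __name__ == "__main__":
    for line in [
        'Phrase: "One problem"',
        "861 Problem, NOS [Finding,Pathologic Function]",
        "694 One [Quantitative Concept]",
    ]:
        print(parse_candidate(line))
```

For machine consumption, the `--fielded_output` option listed on the usage slide is likely a better starting point than parsing the display format.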
15 GSpell
16 GSpell
- Spelling suggestion tool
- Pure Java application with Java APIs
- Support for multi word dictionary entries
17 GSpell Usage
- Usage
- GSpellFind.sh|bat
- --dictionary=NameOfDictionary
- --inputFile=Source --outputFile=target
- --truncate=N --considerNCandidates=N
- --maxEditDistance=N
- --fieldedText --termField=X --correctField=Y
- --reportTime --version|--help
18 GSpell Example
- anonomous|anonymous|1.0|0.8734230160180236|NGrams
- anonomous|allonomous|2.0|0.5819672267388108|NGrams
- anonomous|autonomous|2.0|0.5819672267388108|NGrams
- anonomous|anadromous|3.0|0.2958160192082048|NGrams
- anonomous|analogous|3.0|0.2958160192082048|NGrams
- anonomous|anomalous|3.0|0.2958160192082048|NGrams
- anonomous|anonymously|3.0|0.295816019208248|NGrams
- anonomous|anonymes|3.0|0.2958160192082048|Metaphone
- anonomous|anonyms|3.0|0.2958160192082048|Metaphone
- anonomous|acoprous|4.0|0.11470810702102521|NGrams
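The third field in the output above appears to be an edit distance between the misspelling and each suggestion (1.0 for "anonymous", 2.0 for "autonomous"). The slides do not say how GSpell computes it, so the following is only a sketch of the standard Levenshtein distance, which matches those values, together with a hypothetical `suggest` helper that filters a word list by a maximum distance in the spirit of `--maxEditDistance`.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: Iterable[str], max_distance: int) -> List[Tuple[str, int]]:
    """Return (candidate, distance) pairs within max_distance, closest first."""
    scored = [(cand, edit_distance(word, cand)) for cand in dictionary]
    return sorted((s for s in scored if s[1] <= max_distance), key=lambda s: (s[1], s[0]))


if __name__ == "__main__":
    words = ["autonomous", "analogous", "anonymous", "acoprous"]
    print(suggest("anonomous", words, max_distance=2))
```

This brute-force scan compares the input against every dictionary entry; the NGrams and Metaphone labels in the output suggest GSpell first retrieves candidates through indexes and then ranks them, which this sketch does not attempt.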
19 GSpell Indexing
- Usage
- GSpellIndex.sh|bat
- --dictionary=NameOfDictionary
- --inputFile=SourceFile
- --reportTime --version|--help
- Format for the input file
- One word per line
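Preparing an input file in this format is straightforward. The sketch below builds the file contents from an arbitrary word list; trimming, de-duplicating and sorting are conveniences chosen for the example, not stated requirements of GSpellIndex, and `format_word_list` is a hypothetical helper.

```python
from typing import Iterable


def format_word_list(words: Iterable[str]) -> str:
    """Return file contents with one entry per line.

    Blank entries are dropped; duplicates are removed and entries sorted
    (assumptions made for tidiness, not GSpell requirements).
    """
    cleaned = sorted({w.strip() for w in words if w.strip()})
    return "".join(entry + "\n" for entry in cleaned)


if __name__ == "__main__":
    contents = format_word_list(["anonymous", " heart attack ", "", "anonymous"])
    # Writes the list next to the script; the file name is arbitrary.
    with open("dictionary_source.txt", "w", encoding="utf-8") as handle:
        handle.write(contents)
```

A multi-word entry such as "heart attack" is kept on a single line here, on the assumption that this is how multi-word dictionary entries are supplied.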
20 Downloadable Resources
- umlslex.nlm.nih.gov
  - Lvg
  - Java NLP Tools
  - GSpell
- mmtx.nlm.nih.gov
21 Lexical Tools for UMLS Developers
Allen C. Browne, Guy Divita, Chris Lu
Lister Hill National Center for Biomedical Communications
National Library of Medicine
Lexical Systems: umlsLex.nlm.nih.gov
Email: umlslex_at_nlm.nih.gov
Knowledge Source Server: http://umlsks.nlm.nih.gov
UMLS Information: http://umlsInfo.nlm.nih.gov