Title: The Neighborhood Auditing Tool
1The Neighborhood Auditing Tool
- James Geller
- Yehoshua Perl
- C. Paul Morrey
2Participating Student Developers
- Dayanand Sagar
- Kushal Chopra
- Sandeep Ramachandran
- Anisa Vishnani
- Aditi Dekhane
- Kandarp Shah
- Rajesh Gupta
- Suraj Pal Singh
- Saurabh Patel
- Kartik Gopal
- Yakup Kav
- Rahul Bhave
- Sirish Motati
- Pratik Shah
- Saurabh Singhi
- Sirish Motati Reddy
- Sandeep Pasuparthy
- Ramya Gokanakonda
3Overview
- Goals of an Auditor's Tool for the UMLS
- Principles of Auditing with Neighborhoods
- The Idea of a Hybrid Display
- Current State of the NAT: Serving the Auditor
- Feature Presentation
- Live Audit Session
- Planned State of the NAT: Guiding the Auditor
- Conclusions and Future Work
4Auditing the UMLS
- The UMLS consists of over 100 terminologies.
- It is natural that inconsistencies will appear.
- Over 1.5 million concepts and over 7 million terms
- Two-level structure consisting of the Semantic Network and the Metathesaurus
5How We Did It before the NAT: Paper Form
- CPT: C1081844 Antonospora locustae
- SRC: NCBI
- STY: T004, T009 (Fungus, Invertebrate)
- DEF:
- SYN: Antonospora locustae, Nosema locustae
- PAR: Antonospora (STY: Invertebrate)
- CHD:
6Previous Work on Auditing
- H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and J.J. Cimino. Representing the UMLS as an Object-oriented Database: Modeling Issues and Advantages. J Am Med Inform Assoc, 7(1):66-80, 2000.
- J. Geller, H. Gu, Y. Perl, and M. Halper. Semantic refinement and error correction in large terminological knowledge bases. Data & Knowledge Engineering, 45(1):1-32, 2003.
- Y. Chen, Y. Perl, J. Geller, and J.J. Cimino. Analysis of a study of the users, uses, and future agenda of the UMLS. J Am Med Inform Assoc, 14(2):221-231, 2007.
- H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G. Elhanan, J.J. Cimino, J. Geller, and Y. Perl. Evaluation of a UMLS auditing process of semantic type assignments. In J.M. Teich, J. Suermondt, and G. Hripcsak, editors, Proc AMIA Symp, pages 294-298, Chicago IL, Nov. 2007.
7Auditing Results: Paper Form
- (C1081844) Antonospora locustae
- STY: Fungus, Invertebrate
- No errors
- Semantic Type Error Fungus
- Semantic Type Error Invertebrate
- Ambiguity
- Add Semantic Type______________________
- Other error_____________________________
- Comments _____________________________
______________________________________
8Goals of an Auditor's Tool for the UMLS
- Display relevant information to the auditor.
- Do not overwhelm the auditor with too much information.
- Help the auditor focus on areas most likely to contain errors.
- Neighborhood display of reviewed concepts
- Algorithms suggest likely erroneous concepts
9Principles of Auditing with Neighborhoods
- Several years of experience: Auditing is to a large degree a local activity.
- Concepts have two kinds of knowledge elements:
- Textual Knowledge Elements: Preferred term, CUI, synonyms, LUI, definition, sources, semantic types
- CONtextual Knowledge Elements: Neighbors
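The two kinds of knowledge elements can be pictured as a small record type. This is only a sketch: the class and field names are hypothetical and do not come from the NAT itself, and neighbors are stored by name for readability. The example values are taken from the paper form for Antonospora locustae.

```python
from dataclasses import dataclass, field


@dataclass
class TextualElements:
    """Knowledge elements that describe the concept itself."""
    cui: str
    preferred_term: str
    synonyms: list[str] = field(default_factory=list)
    luis: list[str] = field(default_factory=list)
    definition: str = ""
    sources: list[str] = field(default_factory=list)
    semantic_types: list[str] = field(default_factory=list)


@dataclass
class ContextualElements:
    """Knowledge elements that come from the concept's neighbors."""
    parents: list[str] = field(default_factory=list)
    children: list[str] = field(default_factory=list)
    # (relationship name, neighbor) pairs for lateral relationships
    lateral: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class Concept:
    textual: TextualElements
    contextual: ContextualElements


# Values from the paper form; the form lists no definition and no children.
focus = Concept(
    textual=TextualElements(
        cui="C1081844",
        preferred_term="Antonospora locustae",
        synonyms=["Nosema locustae"],
        sources=["NCBI"],
        semantic_types=["Fungus", "Invertebrate"],
    ),
    contextual=ContextualElements(parents=["Antonospora"]),
)
```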
10Neighborhoods
- Focus concept: The concept presently under review
- Immediate neighborhood: The set of concepts reachable from the focus concept by stepping one relationship (up, down, lateral, etc.)
- Extended neighborhood: Includes parents of parents (grandparents), children of children (grandchildren) and siblings. No lateral chains.
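The neighborhood definitions above can be sketched on a toy graph. The `ConceptGraph` class and all concept names below are hypothetical, and the sketch assumes that the extended neighborhood contains the immediate neighborhood; the NAT's actual data model is not described in the slides.

```python
class ConceptGraph:
    """Toy terminology graph; every name used with it here is hypothetical."""

    def __init__(self):
        self.parents = {}   # concept -> set of parents
        self.children = {}  # concept -> set of children
        self.lateral = {}   # concept -> set of laterally related concepts

    def add_parent(self, child, parent):
        self.parents.setdefault(child, set()).add(parent)
        self.children.setdefault(parent, set()).add(child)

    def add_lateral(self, source, target):
        self.lateral.setdefault(source, set()).add(target)

    def immediate_neighborhood(self, focus):
        """Concepts one relationship away from the focus concept."""
        near = (self.parents.get(focus, set())
                | self.children.get(focus, set())
                | self.lateral.get(focus, set()))
        return near - {focus}

    def extended_neighborhood(self, focus):
        """Immediate neighborhood plus grandparents, grandchildren and siblings."""
        up = self.parents.get(focus, set())
        down = self.children.get(focus, set())
        grandparents = {g for p in up for g in self.parents.get(p, set())}
        grandchildren = {g for c in down for g in self.children.get(c, set())}
        siblings = {s for p in up for s in self.children.get(p, set())}
        # Lateral relationships are followed one step only (no lateral chains).
        return (self.immediate_neighborhood(focus)
                | grandparents | grandchildren | siblings) - {focus}


g = ConceptGraph()
g.add_parent("focus", "parent")
g.add_parent("parent", "grandparent")
g.add_parent("sibling", "parent")
g.add_parent("child", "focus")
g.add_parent("grandchild", "child")
g.add_lateral("focus", "lateral")
g.add_lateral("lateral", "lateral_of_lateral")

print(sorted(g.immediate_neighborhood("focus")))
print(sorted(g.extended_neighborhood("focus")))
```

In this toy graph, `lateral_of_lateral` stays outside both neighborhoods because it is only reachable through a chain of two lateral steps.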
11Immediate Neighborhood
12Extended Neighborhood
13Up-Extended and Down-Extended Neighborhood
- An up-extended neighborhood includes grandparents and the immediate neighborhood.
- A down-extended neighborhood includes grandchildren and the immediate neighborhood.
- Give the auditor all s/he needs but not more.
14Semantic Type Neighborhood
- If we provide the semantic types for every concept, those also form a neighborhood.
- It is important to keep track of which semantic types belong to which concepts.
15References about Neighborhood
- M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S. Erlbaum, W.D. Sperzel, L.F. Fuller, et al. Using META-1, the first version of the UMLS Metathesaurus. In Proc 14th Annu Symp Comput Appl Med Care, pages 131-135, Washington, D.C., 1990.
- S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D. Sherertz, W.D. Sperzel, M.S. Erlbaum, L.L. Fuller, and N.E. Olson. From meaning to term: semantic locality in the UMLS Metathesaurus. In Proc Annu Symp Comput Appl Med Care, pages 209-213, Washington, D.C., 1991.
- J.J. Cimino, H. Min, and Y. Perl. Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus. J Biomed Inform, 36(6):450-461, 2003.
16Desirable Information Beyond Neighborhoods
- Concept definition for Focus Concept
- Concept sources for Focus Concept
- Assigned Semantic Types of concepts
- Definitions of relevant Semantic Types
- Global view of the Semantic Network
- Indented (better for wide branches)
- Graphical (better for almost everything else); we set the standard on this.
17The Idea of a Hybrid Display
- Diagrams are wonderful as long as they fit on one screen.
- Indented text is wonderful as long as there are no or very few multiple parents.
- But the UMLS does not fit onto one screen and there are many cases of multiple parents.
18What Makes a Diagram Wonderful?
- You can follow parent/child paths with your eyes.
- You can get a feeling for everything a concept is connected to with one look.
- You can see multiple parents and paths with one look.
- You can see global features (short and bushy versus tall and sparse, or (gasp) tall and bushy).
19What makes Indented Text Wonderful?
- Indentation expresses parenthood elegantly.
- There are no lines crossing.
- You don't need a layout algorithm.
- There is a linear order in which to study text.
20The Idea of a Hybrid Display (cont.)
- Keep the best features of text and the best features of diagrams.
- Maintain relative positions between the focus concept and its children, parents, etc.
- Eliminate clutter of arrows.
21A Hybrid Diagram/Form Display of a Neighborhood
Parents
Synonyms
Relationships
Focus Concept
Children
22Important Auditing Principles
- If a concept C has a combination of semantic types assigned, and very few other concepts C1, ..., Cn (n < 6) have that same combination assigned, then C and C1, ..., Cn are suspicious concepts.
- We call this a small intersection.
- Group-based auditing: Audit sets of similar concepts.
- Y. Chen, H. Gu, Y. Perl, J. Geller, and M. Halper. Structural group auditing of a UMLS semantic type's extent. J Biomed Inform, 2007. Accepted for publication.
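The small-intersection rule can be sketched as a grouping step over semantic type assignments. The function name, the `n_limit` parameter, and the filler concepts are hypothetical; the sketch also assumes that a "combination" means two or more semantic types. Only the Antonospora locustae assignment comes from the slides.

```python
from collections import defaultdict


def small_intersections(assignments, n_limit=6):
    """Return combinations of semantic types shared by very few concepts.

    assignments maps a concept name to the semantic types assigned to it.
    A combination is kept when a concept C shares it with n other
    concepts and n < n_limit.
    """
    extents = defaultdict(set)
    for concept, types in assignments.items():
        combo = frozenset(types)
        if len(combo) > 1:  # assumption: an intersection needs two or more types
            extents[combo].add(concept)
    return {combo: members for combo, members in extents.items()
            if len(members) - 1 < n_limit}


assignments = {"Antonospora locustae": {"Fungus", "Invertebrate"}}
# Hypothetical filler: seven concepts share one combination, too many to be flagged.
for i in range(7):
    assignments[f"concept_{i}"] = {"Type A", "Type B"}
assignments["concept_single"] = {"Type A"}

suspicious = small_intersections(assignments)
for combo, members in suspicious.items():
    print(sorted(combo), sorted(members))
```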
23Current State of the NAT: Serving the Auditor
- The Neighborhood Auditing Tool has been implemented to fully support display of neighborhoods.
- Navigation to adjacent neighborhoods is easy.
- Additional features listed before have been implemented.
24Demonstration of NAT Features
- Neighborhood
- Relationships
- Siblings
- Grandparents and grandchildren
- Synonyms
- Focus concept definition
- Focus concept sources
- Semantic Type display
- Semantic Type definition
- Semantic Network (indented)
- Semantic Network (diagram)
- Display Options
- Navigation
- Search
- Viewing History
- UMLS version
offline version
25Audit Example
- An algorithm determined that the concept Antonospora locustae was likely assigned incorrect semantic types.
- We follow an auditor's review of this concept using the data from 2007AA.
offline version
26Preliminary Evaluation Study with NAT
- Compare paper-based auditing and NAT-based auditing.
- Counterbalanced groups.
- Recall improves with NAT use. Auditors seem willing to investigate more concepts.
- Precision stays the same. Auditors' mental process does not improve (?).
27Planned State of the NAT: Guiding the Auditor by Finding (i.e., Computing) Audit Sets
- As noted before, errors are likely in small intersections.
- Planned new version of the NAT will compute and display small intersections.
- Errors are clearly visible in small groups of supposedly similar concepts.
- Planned new version of the NAT will compute small groups of supposedly similar concepts.
29Finding Successively Smaller Groups of Concepts
- Finding audit sets by selecting:
1. Concepts with the same semantic type.
2. Concepts with 1. and the same root.
3. Concepts with 1. and 2. that have the same relationships.
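The successive selection steps can be sketched as grouping with progressively longer keys, so that each step splits the groups of the previous one. The record layout (`name`, `types`, `root`, `rels`), the concept names, and the relationship names are all hypothetical; the semantic types and root are borrowed from Example A in these slides purely as labels.

```python
from collections import defaultdict


def group_by(concepts, key):
    """Group concept names by the value that key() computes for each record."""
    groups = defaultdict(list)
    for concept in concepts:
        groups[key(concept)].append(concept["name"])
    return dict(groups)


def audit_sets(concepts):
    """Return the groupings for steps 1, 2 and 3, each finer than the last."""
    by_type = group_by(concepts, lambda c: frozenset(c["types"]))
    by_root = group_by(concepts, lambda c: (frozenset(c["types"]), c["root"]))
    by_rels = group_by(
        concepts,
        lambda c: (frozenset(c["types"]), c["root"], frozenset(c["rels"])),
    )
    return by_type, by_root, by_rels


TYPES = ["Manufactured Object", "Organization"]
concepts = [
    {"name": "concept_a", "types": TYPES, "root": "School (environment)", "rels": ["rel_x"]},
    {"name": "concept_b", "types": TYPES, "root": "School (environment)", "rels": ["rel_x"]},
    {"name": "concept_c", "types": TYPES, "root": "School (environment)", "rels": ["rel_y"]},
    {"name": "concept_d", "types": TYPES, "root": "other_root", "rels": ["rel_x"]},
]

by_type, by_root, by_rels = audit_sets(concepts)
for label, groups in (("type", by_type), ("root", by_root), ("rels", by_rels)):
    print(label, sorted(groups.values()))
```

With this toy data the four concepts form one group at step 1, two groups at step 2, and three groups at step 3.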
31Audit Set Examples
- Example A: A selection of concepts in the intersection of Manufactured Object and Organization under the root School (environment).
- Example B: All concepts that are in a non-chemical intersection with an extent size less than five.
32Possible Auditor's Recommendations (see Pg. 7)
- Mark concept as reviewed and correct.
- Mark semantic types that should be removed.
- Mark semantic types that should be added.
- Mark other kinds of errors.
- Attach notes to a reviewed concept.
34Conclusions and Future Work
- Preliminary study showed that people are more successful finding errors with NAT than with paper sources.
- Recall improved with the NAT, precision did not.
- NAT seems to nicely complement use of the UMLSKS.
35Conclusions and Future Work (cont.)
- This year, work with more human subjects to quantify these observations.
- Integration of algorithms for finding audit sets with NAT.
- By extent size
- Using roots, and relationship patterns within extents.
36Thank you!
38Preliminary Evaluation Study
Auditor | Errors with NAT | Errors w/o NAT | Recall with NAT | Recall w/o NAT | Precision with NAT | Precision w/o NAT | F with NAT | F w/o NAT
1 | 57 | 45 | 0.97 | 0.82 | 0.53 | 0.51 | 0.86 | 0.63
2 | 22 | 20 | 0.43 | 0.35 | 0.55 | 0.55 | 0.48 | 0.43
3 | 39 | 34 | 0.64 | 0.58 | 0.46 | 0.53 | 0.54 | 0.55
4 | 56 | 44 | 0.55 | 0.54 | 0.30 | 0.34 | 0.39 | 0.42
Avg. | 44 | 36 | 0.65 | 0.57 | 0.46 | 0.48 | 0.57 | 0.51
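The slides do not say how the F column was computed. A minimal sketch, assuming the usual balanced F-measure (the harmonic mean of precision and recall), reproduces the rounded F values listed for auditors 2 to 4; the function name is hypothetical and the formula is an assumption, not something the source confirms.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table rows for auditors 2 to 4, with NAT.
with_nat = {2: (0.55, 0.43), 3: (0.46, 0.64), 4: (0.30, 0.55)}
for auditor, (precision, recall) in with_nat.items():
    print(auditor, round(f_measure(precision, recall), 2))
```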
39Improved Recall
- The auditor finds it easy to search for more errors in the neighborhood of the suspicious concept.
- With better recall and the same precision you still find more errors.
40Auditing Demonstration
- The concept Antonospora locustae was selected for audit by an algorithm that found it was the only concept assigned to the intersection of Fungus and Invertebrate in the UMLS 2007AA.
53NAT Features Demonstration
54Neighborhood