Title: Semantic Processing of Twitter Traffic for Epidemic Surveillance
1Semantic Processing of Twitter Traffic for Epidemic Surveillance
David Hale
Project Lead, SemanticTwitter
david.hale_at_nih.gov _at_lostonroute66
U.S. National Library of Medicine
National Institutes of Health
Department of Health and Human Services
2Pandemic Preparedness
- Outbreaks: data requires agile information collection / dissemination
- Passive vs. active information acquisition
- Engagement within utilized channels
- Disaster information traffic: delays, loss, overload
3Future of Syndromic Surveillance
- Social media
- Real-time data
- Monitor sentiment as well as events
- NLP analysis
- Requires less data / lower computational intensity than massive ingestion / keyword searches
- More informative
- "swine flu and travel" vs. "how fast swine flu travels" and "is it safe to travel during a swine flu epidemic"
4Twitter
- SMS gateway enables posting from mobile devices
- Users post without breaking context or setting
- JIT (just-in-time) blogging
- Grammaticality variable
- Folksonomy: user-defined vocabularies
- Hashtags (#) denote topics
5Twitter
- Some posts provide (purported) information
- Authority/accuracy not determined
- Majority express opinions
- Often with humor or sarcasm
- Value for syndromic surveillance
- Source for assessing public sentiment
- Observation of information trending
- As a guide for government action
6Examples
- CDC tips for preventing the flu: wash hands often and stay home when sick
- Oklahoma health officials say swine flu headed to state, public needs to take precautions
- I bet this whole swine flu scare really has Kermit the Frog rethinking his relationship
- Whats next? Three-toed sloth flu?
7NLP Analysis
- Unified Medical Language System (UMLS)
- Medical concepts in semantic types (or classes)
- MetaMap
- Identifies UMLS concepts in text
- SemRep
- Identifies semantic relations between concepts
- Tools currently available for download
- http://skr.nlm.nih.gov/
- Substantial learning curve
11Monitoring Twitter with NLP
- Processed 1300 Twitter posts
- Known to be about swine flu
- Sent during 1 hour on Monday, April 27, 2009
- Preprocessed, to accommodate format
- Ran MetaMap and SemRep
- Extracted semantic concepts and relationships
- Defined a semantic schema for influenza epidemic
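The slides do not say what the preprocessing step involved. As a minimal sketch only, assuming that "accommodating the format" means stripping Twitter-specific markup before the text reaches an NLP tool, something like the following could be used. The function name `preprocess_tweet` and the specific cleanup rules are hypothetical and are not taken from the project.

```python
import re

# Hypothetical cleanup rules; the actual preprocessing used in the
# project is not described in the slides.
URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"^\s*RT\b:?\s*", re.IGNORECASE)
MENTION_RE = re.compile(r"@\w+:?")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess_tweet(text: str) -> str:
    """Strip Twitter-specific markup so the text reads as plain prose."""
    text = URL_RE.sub(" ", text)        # drop links
    text = RETWEET_RE.sub("", text)     # drop a leading "RT" marker
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop '#'
    return " ".join(text.split())       # collapse runs of whitespace


if __name__ == "__main__":
    raw = "RT @cdc: Texas confirms third case of #swineflu http://example.com/x"
    print(preprocess_tweet(raw))
```

Keeping the hashtag word rather than deleting it is a design choice: hashtags denote topics, so the word itself may carry the concept the NLP step needs.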
12Schema: UMLS Semantic Types
- Schema for influenza epidemic
- Disease or Syndrome
- Sign or Symptom
- Geographic Area
- Mammal
- Health Care Organization
- Medical Device
13MetaMap and SemRep Output
- Tweet
- Texas confirms third case of swine flu
- Concepts extracted
- Texas: Geographic Area
- Third: Quantitative Concept
- Family suidae: Mammal
- Influenza: Disease or Syndrome
- Relationship
- Influenza PROCESS_OF Family suidae
14Results Filtered through Schema
- Disease or Syndrome: Influenza
- Sign or Symptom: Coughing
- Geographic Area: Mexico
- Mammal: Family suidae
- Health Care Organization: Centers for Disease Control and Prevention (U.S.)
- Medical Device: Mask
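Filtering extracted concepts through the schema can be sketched as a simple membership test on semantic type. The code below is an illustration only: it assumes concepts have already been reduced to (concept name, semantic type) pairs, which is not MetaMap's native output format, and the name `filter_by_schema` is hypothetical.

```python
from collections import defaultdict

# The six UMLS semantic types listed in the influenza epidemic schema.
SCHEMA_TYPES = {
    "Disease or Syndrome",
    "Sign or Symptom",
    "Geographic Area",
    "Mammal",
    "Health Care Organization",
    "Medical Device",
}


def filter_by_schema(concepts, schema=SCHEMA_TYPES):
    """Group (name, semantic_type) pairs by type, keeping only schema types."""
    grouped = defaultdict(list)
    for name, semtype in concepts:
        # Skip out-of-schema types and avoid listing a concept twice.
        if semtype in schema and name not in grouped[semtype]:
            grouped[semtype].append(name)
    return dict(grouped)


if __name__ == "__main__":
    # Concepts from the "Texas confirms third case of swine flu" example.
    extracted = [
        ("Texas", "Geographic Area"),
        ("Third", "Quantitative Concept"),
        ("Family suidae", "Mammal"),
        ("Influenza", "Disease or Syndrome"),
    ]
    for semtype, names in filter_by_schema(extracted).items():
        print(f"{semtype}: {', '.join(names)}")
```

In this example "Third" is dropped because Quantitative Concept is not one of the schema's semantic types.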
15Results: PROCESS_OF Relation
- Influenza PROCESS_OF Family suidae
- Influenza PROCESS_OF Farmer, unspecified
- Influenza PROCESS_OF Hispanics
- Influenza PROCESS_OF Mexican
- Influenza in Birds PROCESS_OF Human
- Influenza-like symptoms PROCESS_OF Passenger
- Flu symptoms PROCESS_OF Family suidae
- Swine influenza PROCESS_OF Family suidae
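One way to work with relations like those listed above is to split each line into a (subject, predicate, object) triple and tally how often each object occurs. This is a sketch under stated assumptions: it parses the plain-text lines as they appear on the slide, not SemRep's actual output format, and the helper names `parse_relation` and `count_objects` are hypothetical.

```python
from collections import Counter

# The PROCESS_OF relations as listed on the slide.
RELATION_LINES = [
    "Influenza PROCESS_OF Family suidae",
    "Influenza PROCESS_OF Farmer, unspecified",
    "Influenza PROCESS_OF Hispanics",
    "Influenza PROCESS_OF Mexican",
    "Influenza in Birds PROCESS_OF Human",
    "Influenza-like symptoms PROCESS_OF Passenger",
    "Flu symptoms PROCESS_OF Family suidae",
    "Swine influenza PROCESS_OF Family suidae",
]


def parse_relation(line, predicate="PROCESS_OF"):
    """Split 'SUBJECT PREDICATE OBJECT' into a (subject, predicate, object) triple."""
    subject, sep, obj = line.partition(f" {predicate} ")
    if not sep:
        raise ValueError(f"predicate {predicate!r} not found in {line!r}")
    return subject.strip(), predicate, obj.strip()


def count_objects(triples, predicate="PROCESS_OF"):
    """Count how often each object appears for the given predicate."""
    return Counter(obj for _, pred, obj in triples if pred == predicate)


if __name__ == "__main__":
    triples = [parse_relation(line) for line in RELATION_LINES]
    for obj, n in count_objects(triples).most_common():
        print(f"{n}  {obj}")
```

A tally of this kind is one possible basis for the frequency filter mentioned under next steps.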
16Next Steps
- Further testing (w/ noise) for effectiveness
- Grammatical analysis as determinant of authority
- Refine filters (frequency, semantic types)
- User control
- Implementation of proof-of-concept
- Preprocessing for tweet format
- NLP
- Final filtering
- Optimize output for specific roles
17Opportunities
- Biosurveillance
- Monitoring of wide-spread sentiment
- Targeted information provision
- Respond to misinformation trends
- Potential for evaluating authenticity
- Semantic comparison to trusted source
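The slides do not specify how a semantic comparison to a trusted source would be computed. As one possible illustration, not the project's method, the concepts extracted from a tweet could be compared to those extracted from a trusted document using Jaccard similarity (shared concepts divided by all concepts). The concept sets below are made up for the example, reusing concept names that appear in the results above.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Illustrative concept sets only; not data from the study.
    trusted = {
        "Influenza",
        "Mask",
        "Coughing",
        "Centers for Disease Control and Prevention (U.S.)",
    }
    tweet = {"Influenza", "Mask", "Family suidae"}
    print(jaccard(tweet, trusted))  # 2 shared / 5 total = 0.4
```

A score like this measures only concept overlap; it says nothing about whether the relations asserted between those concepts agree with the trusted source.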