Title: Toward an integration of qualitative and quantitative text analysis methods
1 Toward an integration of qualitative and quantitative text analysis methods
- Words instead of Numbers
- Words as Numbers
- Words and Numbers
Normand Péladeau, President, Provalis Research
peladeau_at_provalisresearch.com
2 NUMERICAL DATA
TEXTUAL DATA
3 [Figure: Qualitative Analysis, Content Analysis, and Text Mining plotted by Validity and Time Requirements (scale: minutes, hours, days, weeks, months)]
4 [Figure: Content Analysis, Text Mining, and Qualitative Analysis plotted by Reliability and Time Requirements (scale: minutes, hours, days, weeks, months)]
5 SIMSTAT (1989): Statistical Analysis
8 SIMSTAT (1989): Statistical Analysis
10 THREE MAJOR OBSTACLES
1) Large number of word forms
2) Polymorphy of language: one idea → multiple forms
3) Polysemy of words: one word → many ideas
11
- List of most frequent words
- Extraction of common phrases and technical vocabulary
- Categorization of words and phrases using a user-defined dictionary (or taxonomy)
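A minimal sketch of the frequency list and the dictionary-based categorization listed above, assuming a simple regex tokenizer. The DICTIONARY contents and the function names are hypothetical illustrations, not the format of any particular product.

```python
import re
from collections import Counter

# Hypothetical categorization dictionary: category -> set of word forms.
DICTIONARY = {
    "EMOTION": {"stress", "anxiety", "worried"},
    "WORK": {"job", "office", "deadline"},
}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    """Count how often each word form occurs."""
    return Counter(tokenize(text))

def category_frequencies(text, dictionary):
    """Count tokens per category, using the user-defined dictionary."""
    counts = Counter()
    for token in tokenize(text):
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return counts

sample = "The deadline caused stress. Stress at the office made me worried."
print(word_frequencies(sample).most_common(3))
print(category_frequencies(sample, DICTIONARY))
```

Grouping several word forms under one category is one way to reduce the "large number of word forms" obstacle: many distinct tokens collapse into a handful of countable categories.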
13
- Thesaurus and semantic database (WordNet)
- Keyword in context list (KWIC)
14 Keyword in Context List (KWIC)
Senses of the word "stress":
1. (psychology) a state of mental or emotional strain or suspense
2. (physics) force that produces strain on a physical body
3. (verb) single out as important
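A keyword-in-context list can be sketched in a few lines. This is an illustration only: the function name kwic, the width parameter, and the sample text are assumptions, and real KWIC tools usually offer sorting and larger context windows.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    `width` is the number of words kept on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

text = "Workers under stress make errors. Managers stress that rest matters."
for left, key, right in kwic(text, "stress"):
    print(f"{left:>25} [{key}] {right}")
```

Reading the surrounding words side by side is what lets an analyst see which sense of "stress" each occurrence carries.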
15 Keyword in Context List (KWIC)
Disambiguation using phrases:
STRESS_THE or STRESS_THAT → single out as important
UNDER_STRESS → emotional state
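The phrase-based disambiguation above could be sketched as a lookup on the word just before or just after the ambiguous keyword. The PHRASES table, the sense labels, and the function name are hypothetical; only the three phrases come from the slide.

```python
import re

# Hypothetical phrase table built from the slide's examples: phrase -> sense.
PHRASES = {
    "stress the": "SINGLE_OUT_AS_IMPORTANT",
    "stress that": "SINGLE_OUT_AS_IMPORTANT",
    "under stress": "EMOTIONAL_STATE",
}

def classify_stress(text):
    """Assign a sense to each occurrence of 'stress' using two-word phrases."""
    tokens = re.findall(r"[a-z]+", text.lower())
    senses = []
    for i, token in enumerate(tokens):
        if token != "stress":
            continue
        before = " ".join(tokens[max(0, i - 1):i + 1])  # e.g. "under stress"
        after = " ".join(tokens[i:i + 2])               # e.g. "stress that"
        senses.append(PHRASES.get(before) or PHRASES.get(after) or "UNRESOLVED")
    return senses

print(classify_stress(
    "Workers under stress make errors. Managers stress that rest matters. Stress rises."
))
```

Occurrences that match no phrase stay UNRESOLVED here, so they can be reviewed in a KWIC list rather than silently miscounted.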
16 Keyword in Context List (KWIC)
Disambiguation using rules:
TRANSFER IS NEAR TECHNOLOGY
TRANSFER IS NOT NEAR BUS
UNSATISFIED OR (SATISFIED IS AFTER NEGATION)
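A rough sketch of how such proximity rules might be evaluated, assuming that NEAR means "within a few tokens on either side" and AFTER means "preceded within a few tokens". The window sizes, the NEGATIONS word list, and all function names are assumptions made for illustration; the slide does not define them.

```python
import re

# Hypothetical content of the NEGATION category.
NEGATIONS = {"not", "never", "no"}

def tokens_of(text):
    return re.findall(r"[a-z]+", text.lower())

def is_near(tokens, index, targets, window=5):
    """True if a target word occurs within `window` tokens of position `index`."""
    lo, hi = max(0, index - window), index + window + 1
    return any(t in targets
               for j, t in enumerate(tokens[lo:hi], lo) if j != index)

def is_after(tokens, index, targets, window=3):
    """True if a target word occurs in the `window` tokens preceding `index`."""
    return any(t in targets for t in tokens[max(0, index - window):index])

def count_near(text, word, targets, negate=False):
    """Count occurrences of `word` that are (or, with negate, are not) near a target."""
    toks = tokens_of(text)
    return sum(1 for i, t in enumerate(toks)
               if t == word and is_near(toks, i, targets) != negate)

def count_dissatisfaction(text):
    """UNSATISFIED OR (SATISFIED IS AFTER NEGATION)."""
    toks = tokens_of(text)
    return sum(1 for i, t in enumerate(toks)
               if t == "unsatisfied"
               or (t == "satisfied" and is_after(toks, i, NEGATIONS)))

print(count_near("The university signed a technology transfer agreement.",
                 "transfer", {"technology"}))
print(count_dissatisfaction("I am not satisfied, and my wife is unsatisfied too."))
```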
18
- Cluster analysis of words or documents
- Automatic classification of documents using machine learning algorithms
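As an illustration of cluster analysis of documents, the sketch below groups texts by single-linkage agglomerative clustering on cosine similarity of raw word counts. The linkage method, the threshold value, and the sample documents are assumptions chosen for brevity, not a description of any specific product's algorithm.

```python
import math
import re
from collections import Counter

def doc_vector(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Merge the two most similar clusters until no pair reaches `threshold`."""
    vectors = [doc_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters)))
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

docs = [
    "pilot error during landing",
    "landing gear error reported by pilot",
    "customer praised the new router",
    "router price upset the customer",
]
print(cluster(docs))
```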
23 Automatic Classification of Documents
24
- Cluster analysis of words or documents
- Automatic classification of documents using machine learning algorithms
- Statistical reduction of the words x documents matrix (SVD, factor analysis, PCA)
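To give a feel for the statistical reduction of a words x documents matrix, here is a sketch that approximates only the first singular dimension by power iteration, in pure Python. A full SVD would normally come from a numerical library; the function name, the iteration count, and the toy matrix are assumptions.

```python
import math

def leading_singular_triplet(matrix, iterations=100):
    """Approximate the largest singular value and its vectors.

    Uses power iteration on A^T A. Rows are words, columns are documents.
    Starting from a uniform vector works for non-negative count matrices;
    it is not a general-purpose SVD.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = A v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma:
        u = [x / sigma for x in u]
    return sigma, u, v

# Toy counts: 3 words (rows) x 4 documents (columns).
counts = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
sigma, word_loadings, doc_loadings = leading_singular_triplet(counts)
print(round(sigma, 3), [round(x, 3) for x in doc_loadings])
```

The document loadings place each document on one derived dimension instead of one column per word, which is the sense in which the matrix is "reduced".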
28
- Frequency analysis (words, phrases, categories)
  - Univariate analysis of frequency
  - Comparison with normative data (frequency of words)
- Co-occurrence of keywords / similarity of documents
  - Hierarchical cluster analysis, multidimensional scaling, proximity plots
- Keywords x numeric or categorical variables
  - Crosstab (with statistical test), bar charts, line charts, heatmaps, correspondence analysis
- Automatic classification of documents
  - Machine learning algorithms (Naïve Bayes, Nearest Neighbors)
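Automatic classification with Naïve Bayes can be sketched as follows. This is a bare multinomial Naïve Bayes with add-one smoothing written for illustration; the class name, the training texts, and the labels are invented, and a production classifier would add feature selection and validation.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocabulary:        # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Invented training examples.
texts = [
    "pilot missed the checklist before takeoff",
    "crew ignored the altitude warning",
    "engine sensor failed during climb",
    "hydraulic pump failed after takeoff",
]
labels = ["human", "human", "equipment", "equipment"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("the pilot ignored a warning"))
print(model.predict("sensor pump failed"))
```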
29 SIMSTAT (1989): Statistical Analysis
36
- More Information Retrieval and Text Mining tools in QDA Miner
- Benefits for qualitative analysis
  - Provide assistance to human coders
  - Assess the reliability of coding made by a single coder
  - Identify typical and atypical examples
  - Gradually move from manual to automatic coding
41 Information Retrieval Models
Boolean search (AND, OR, NOT)
Similarity search
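The Boolean model can be illustrated with an inverted index and set operations. The function names and the sample documents below are assumptions for this sketch.

```python
import re

def build_index(docs):
    """Inverted index: word -> set of ids of the documents containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index.setdefault(word, set()).add(doc_id)
    return index

def search_and(index, *words):
    """Documents containing every word."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, *words):
    """Documents containing at least one word."""
    return set().union(*(index.get(w, set()) for w in words))

def search_not(index, all_ids, word):
    """Documents that do not contain the word."""
    return all_ids - index.get(word, set())

docs = [
    "delayed flight and lost luggage",
    "flight arrived on time",
    "lost luggage claim",
]
index = build_index(docs)
all_ids = set(range(len(docs)))

print(search_and(index, "flight", "lost"))
print(search_or(index, "flight", "claim"))
# flight AND NOT lost
print(search_and(index, "flight") & search_not(index, all_ids, "lost"))
```

A Boolean query either matches a document or does not; similarity search instead ranks documents by how close they are to the query.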
42
- QUERY by EXAMPLE
  - Find sentences or paragraphs similar to a given example
  - Starting example can be
    - Typed or selected text
    - Text segments associated with a code
  - Relevance feedback mechanism to improve search results
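One way to picture query by example with relevance feedback is cosine ranking followed by a Rocchio-style query update. The slide does not name the feedback mechanism, so Rocchio is used here purely as a well-known stand-in; the weights, function names, and sample segments are assumptions.

```python
import math
import re
from collections import Counter

def vectorize(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, segments):
    """Segment indices ordered from most to least similar to the query."""
    scored = [(cosine(query_vec, vectorize(s)), i) for i, s in enumerate(segments)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

def rocchio(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant examples and away from non-relevant ones."""
    updated = {w: alpha * v for w, v in query_vec.items()}
    for group, weight in ((relevant, beta), (non_relevant, -gamma)):
        for text in group:
            for w, v in vectorize(text).items():
                updated[w] = updated.get(w, 0.0) + weight * v / len(group)
    return {w: v for w, v in updated.items() if v > 0}  # drop negative weights

segments = [
    "the staff was rude and unhelpful",
    "rude employees ignored my complaint",
    "the food was excellent",
    "employees ignored us for an hour",
]
query = vectorize("rude staff")          # the starting example
first_pass = rank(query, segments)
# The user marks segment 1 as relevant and segment 2 as not relevant.
query2 = rocchio(query, [segments[1]], [segments[2]])
second_pass = rank(query2, segments)
print(first_pass, second_pass)
```

After feedback, a segment that shares no word with the original example ("employees ignored us for an hour") can still be retrieved, because the query has absorbed vocabulary from the segment marked relevant.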
43
- Fuzzy string matching
  - Resistant to misspelling
  - Matches related words
- Example: "resistant to misspelling"
  - resistent to mispeling
  - resist to misspelled words
  - resist to words not correctly spelled
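The misspelling-resistant part of fuzzy matching can be approximated with edit (Levenshtein) distance, sketched below. This is only one possible technique and an assumption on my part; it does not cover matching related words such as "resist" and "resistant", which would need stemming or a similar step.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical (case-insensitive)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a.lower(), b.lower()) / longest

print(edit_distance("resistant", "resistent"))
print(edit_distance("misspelling", "mispeling"))
print(similarity("resistant to misspelling", "resistent to mispeling"))
```

A search would then accept candidates whose similarity exceeds a chosen cutoff, so that the misspelled variant above still matches the original phrase.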
44 Automatic Document Classification
47
- CLIENT: Federal Aviation Administration (FAA), JetBlue Airline
- PRODUCTS: WordStat, SimStat
- APPLICATION: Knowledge discovery
  - Identification of human errors in flight irregularity reports.
  - Comparison of collision risks at different airports from text analysis of TCAS reports.
  - Development of a taxonomy of aviation safety terms.
52
- CLIENT: CISCO Systems Inc (Product Marketing Department)
- PRODUCTS: WordStat, SimStat, QDA Miner
- APPLICATION: Market Research
  - Analysis of the impact of media campaigns (CATA of exchanges on internet forums).
  - Analysis of consumer satisfaction related to numerous products and services.
53 Reactions to the launch of CRS-1
54
- CLIENT: US Office of Personnel Management
- PRODUCTS: WordStat, SimStat
- APPLICATIONS: Test item analysis, job analysis, surveys
  - Identification of gender or racial bias, inappropriate language, and vague instructions in test items.
  - Assessment of skills and competencies from critical incidents for job analysis.
  - Analysis of answers to open-ended questions.
55
- CLIENT: The Planning Commission, Hillsborough County (Florida)
- PRODUCTS: WordStat, SimStat
- APPLICATION: Thematic analysis of consultations for urban planning.
  - Content analysis of
    - About 3000 comments from citizens.
    - Transcripts of community meetings and public hearings.
  - Identification of major local issues and concerns for different communities.
56
- CLIENT: Political Science, University of Michigan and Princeton University
- PRODUCTS: WordStat, QDA Miner
- APPLICATION: Thematic content analysis of judicial briefs and political speeches
  - Judicial: identification of differences in style of argumentation of
    - Groups in favor of affirmative action programs
    - Groups opposed to affirmative action programs
  - Russia's response to the introduction and use of American Forces in Central Asia after September 11.
57
- Analysis of responses to open-ended questions
- Summarize interview or focus-group transcripts
- Identification of patterns of vocabulary usage
- Literature profiling in genome research
- Identification of trends in historical archives
- Optimize library searches
- Determine authorship of disputed documents
- Detect psychological disorders or processes
- Analysis of food and drug interactions
- Fraud detection
58
- WANT MORE INFORMATION? A FREE DEMONSTRATION? A TRIAL VERSION?
  1. Meet me at our exhibit booth
  2. Email me at peladeau_at_provalisresearch.com
  3. Visit our web site WWW.PROVALISRESEARCH.COM
  4. Schedule a free demo with me at your workplace (in London until October 4th) or over the web.