Toward an integration of qualitative and quantitative text analysis methods


1
  • Toward an integration of qualitative and
    quantitative text analysis methods
  • Words instead of Numbers
  • Words as Numbers
  • Words and Numbers

Normand Péladeau, President, Provalis Research
peladeau_at_provalisresearch.com
2
NUMERICAL DATA
TEXTUAL DATA
3
[Chart: Qualitative Analysis, Content Analysis, and Text Mining compared on Validity against Time Requirements (minutes, hours, days, weeks, months)]
4
[Chart: Content Analysis, Text Mining, and Qualitative Analysis compared on Reliability against Time Requirements (minutes, hours, days, weeks, months)]
5
SIMSTAT (1989) Statistical Analysis
8
SIMSTAT (1989) Statistical Analysis
10
THREE MAJOR OBSTACLES
1) Large number of word forms
2) Polymorphy of language: one idea → multiple forms
3) Polysemy of words: one word → many ideas
11
  • List of most frequent words
  • Extraction of common phrases and technical
    vocabulary
  • Categorization of word and phrases using a user
    defined dictionary (or taxonomy)
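
The three steps listed above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how WordStat implements it: the `DICTIONARY` categories, the sample `report` text, and the function names are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical user-defined dictionary: category -> words or phrases.
DICTIONARY = {
    "SAFETY": ["collision", "near miss", "hazard"],
    "WEATHER": ["fog", "wind", "storm"],
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    """Frequency of every word form in the text."""
    return Counter(tokenize(text))

def categorize(text, dictionary):
    """Count how often the entries of each category occur in the text."""
    normalized = " ".join(tokenize(text))
    counts = Counter()
    for category, entries in dictionary.items():
        for entry in entries:
            pattern = r"\b" + re.escape(entry) + r"\b"
            counts[category] += len(re.findall(pattern, normalized))
    return counts

# Invented sample text, used only to exercise the functions.
report = ("Fog and wind at the airport. A near miss was reported; "
          "fog delayed the collision inquiry.")

print(word_frequencies(report).most_common(3))
print(categorize(report, DICTIONARY))
```

Matching on the normalized string rather than on single tokens lets multi-word entries such as "near miss" count as one unit, which is the point of extracting phrases before categorizing.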

13
  • Thesaurus and semantic database (WordNet)
  • Keyword in context list (KWIC)
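
A keyword-in-context list is easy to sketch: find each occurrence of the keyword and keep a few words on either side. The version below is a minimal illustration; the `kwic` function, its `width` parameter, and the sample sentence are assumptions made for this example.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text.lower())
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

# Invented sample text containing several senses of the keyword.
sample = ("Pilots work under stress. Managers stress that stress "
          "on the wing is measured.")

for left, key, right in kwic(sample, "stress"):
    print(f"{left:>25} | {key} | {right}")
```

Lining up the occurrences this way makes it easy to see at a glance which sense of the word each context is using.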

14
Keyword in Context List (KWIC)
Senses of the word "stress"
1. (psychology) a state of mental or emotional strain or suspense
2. (physics) force that produces strain on a physical body
3. (verb) single out as important
15
Keyword in Context List (KWIC)
Disambiguation using phrases
STRESS_THE or STRESS_THAT → single out as important
UNDER_STRESS → emotional state
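
Phrase-based disambiguation can be approximated by looking at the word immediately before or after the ambiguous keyword. The sketch below follows the slide's three phrases; the sense labels (`EMPHASIZE`, `EMOTIONAL_STATE`, `UNRESOLVED`) and the function name are invented for illustration.

```python
import re

# Two-word phrase -> sense label. The phrases come from the slide;
# the label names are hypothetical.
PHRASE_SENSES = {
    ("stress", "the"): "EMPHASIZE",
    ("stress", "that"): "EMPHASIZE",
    ("under", "stress"): "EMOTIONAL_STATE",
}

def tag_stress(text):
    """Label each occurrence of 'stress' by the two-word phrase it sits in."""
    tokens = re.findall(r"\w+", text.lower())
    senses = []
    for i, token in enumerate(tokens):
        if token != "stress":
            continue
        before = (tokens[i - 1], token) if i > 0 else None
        after = (token, tokens[i + 1]) if i + 1 < len(tokens) else None
        sense = (PHRASE_SENSES.get(before)
                 or PHRASE_SENSES.get(after)
                 or "UNRESOLVED")
        senses.append(sense)
    return senses

print(tag_stress("Crews under stress make errors. We stress that "
                 "training helps. Stress fractures appeared."))
```

This simplification ignores sentence boundaries and leaves unmatched occurrences unresolved; those are the cases where rules or manual review would still be needed.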
16
Keyword in context list (KWIC)
Disambiguation using rules
TRANSFER IS NEAR TECHNOLOGY
TRANSFER IS NOT NEAR BUS
UNSATISFIED OR (SATISFIED IS AFTER NÉGATION)
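
Proximity rules of this kind can be sketched with two small helpers, one for NEAR and one for AFTER. The window sizes, the contents of the negation list, and all function names below are assumptions for the example; the actual rule syntax and defaults of the software are not reproduced here.

```python
import re

def tokens_of(text):
    return re.findall(r"\w+", text.lower())

def is_near(tokens, index, others, window=5):
    """True if a word from `others` lies within `window` tokens of `index`."""
    lo, hi = max(0, index - window), index + window + 1
    return any(t in others
               for j, t in enumerate(tokens[lo:hi], start=lo) if j != index)

def is_after(tokens, index, others, window=3):
    """True if a word from `others` occurs in the `window` tokens before `index`."""
    return any(t in others for t in tokens[max(0, index - window):index])

NEGATIONS = {"not", "never", "no"}  # hypothetical negation category

def technology_transfer_hits(text):
    """TRANSFER IS NEAR TECHNOLOGY"""
    toks = tokens_of(text)
    return sum(1 for i, t in enumerate(toks)
               if t == "transfer" and is_near(toks, i, {"technology"}))

def non_bus_transfer_hits(text):
    """TRANSFER IS NOT NEAR BUS"""
    toks = tokens_of(text)
    return sum(1 for i, t in enumerate(toks)
               if t == "transfer" and not is_near(toks, i, {"bus"}))

def dissatisfaction_hits(text):
    """UNSATISFIED OR (SATISFIED IS AFTER a negation word)"""
    toks = tokens_of(text)
    return sum(1 for i, t in enumerate(toks)
               if t == "unsatisfied"
               or (t == "satisfied" and is_after(toks, i, NEGATIONS)))
```

For example, in "I am not satisfied. She was unsatisfied too. He is satisfied." the last rule counts the first two sentences and skips the third.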
18
  • Cluster analysis of words or documents
  • Automatic classification of documents using
    machine learning algorithms

23
Automatic Classification of Documents
24
  • Cluster analysis of words or documents
  • Automatic classification of documents using
    machine learning algorithms
  • Statistical reduction of words x documents matrix
    (SVD, factor analysis, PCA)
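
To give a feel for what statistical reduction of a words x documents matrix does, the sketch below extracts only the leading singular value and vectors using power iteration. This is a deliberately simplified stand-in for a full SVD, written in plain Python; the function name and the small `counts` matrix are invented for the example, and the code assumes the matrix is not all zeros.

```python
import math

def leading_singular_triplet(matrix, iterations=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # w = A^T (A v), then renormalize
        av = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v

# Hypothetical counts: rows are words, columns are documents.
# The first two words occur together in the first two documents.
counts = [
    [2, 2, 0],
    [2, 2, 0],
    [0, 0, 1],
]

sigma, u, v = leading_singular_triplet(counts)
print(round(sigma, 6), [round(x, 3) for x in v])
```

The leading dimension loads equally on the first two documents and not at all on the third, which is the kind of grouping that dimension reduction is meant to surface.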

28
  • Frequency analysis (words, phrases, categories)
      • Univariate analysis of frequency
      • Comparison with normative data (frequency of words)
  • Co-occurrence of keywords and similarity of documents
      • Hierarchical cluster analysis, multidimensional scaling, proximity plots
  • Keywords x numeric or categorical variables
      • Crosstab (with statistical test), bar charts, line charts, heatmaps, correspondence analysis
  • Automatic classification of documents
      • Machine learning algorithms (Naïve Bayes, Nearest Neighbors)
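
As an illustration of automatic document classification, here is a minimal multinomial Naïve Bayes classifier with add-one smoothing, written from scratch. It is a textbook sketch, not the implementation used in any product named in these slides; the class name, training texts, and labels are invented.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # class prior
            for word in tokenize(text):
                if word in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Invented training examples with hypothetical labels.
texts = [
    "engine failure during climb",
    "engine fire warning",
    "late baggage delivery",
    "lost baggage complaint",
]
labels = ["TECHNICAL", "TECHNICAL", "SERVICE", "SERVICE"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("baggage was lost"))
print(model.predict("engine warning light"))
```

In practice the training set would be the segments already coded by hand, which is how a classifier of this kind can assist human coders.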

29
SIMSTAT (1989) Statistical Analysis
36
  • More Information Retrieval and Text Mining tools in QDA Miner
  • Benefits for qualitative analysis
      • Provide assistance to human coders
      • Assess the reliability of coding made by a single coder
      • Identify typical and atypical examples
      • Gradually move from manual to automatic coding

41
Information Retrieval Models
Boolean search (AND, OR, NOT)
Similarity search
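
The Boolean model can be sketched as simple set membership tests on each document's words. The function below is illustrative only: its name, its three keyword arguments, and the sample documents are assumptions made for the example.

```python
import re

def terms(text):
    return set(re.findall(r"\w+", text.lower()))

def boolean_search(documents, all_of=(), any_of=(), none_of=()):
    """Indexes of documents matching: all_of AND (any of any_of) AND NOT none_of."""
    hits = []
    for index, text in enumerate(documents):
        words = terms(text)
        if not all(t in words for t in all_of):
            continue
        if any_of and not any(t in words for t in any_of):
            continue
        if any(t in words for t in none_of):
            continue
        hits.append(index)
    return hits

# Invented sample documents.
docs = [
    "Runway incursion at night",
    "Runway closed for snow",
    "Taxiway incursion reported",
]

print(boolean_search(docs, all_of=["runway"], none_of=["snow"]))
print(boolean_search(docs, all_of=["incursion"], any_of=["runway", "taxiway"]))
```

A Boolean search returns an unranked set: a document either satisfies the expression or it does not, which is the main contrast with a similarity search.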
42
  • QUERY BY EXAMPLE
  • Find sentences or paragraphs similar to a given example
  • Starting example can be:
      • Typed or selected text
      • Text segments associated with a code
  • Relevance feedback mechanism to improve search results
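
One common way to implement query by example is to turn the example and each candidate segment into word-count vectors and rank by cosine similarity. The sketch below assumes that approach, which may differ from what QDA Miner does internally; the function names, the simple additive feedback step, and the sample segments are all invented for illustration.

```python
import math
import re
from collections import Counter

def vectorize(text):
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_by_example(example, segments):
    """Rank segments from most to least similar to the example vector."""
    scored = [(cosine(example, vectorize(s)), s) for s in segments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def add_feedback(example, relevant_segments):
    """Fold segments the user marked as relevant into the query vector."""
    updated = Counter(example)
    for segment in relevant_segments:
        updated.update(vectorize(segment))
    return updated

# Invented sample segments.
segments = [
    "The pilot reported severe turbulence",
    "Baggage was delivered late",
    "Crew reported turbulence on descent",
]

query = vectorize("severe turbulence reported by pilot")
for score, segment in rank_by_example(query, segments):
    print(f"{score:.2f}  {segment}")

# After the user accepts the third segment, its words join the query.
query = add_feedback(query, [segments[2]])
```

Each round of feedback widens the query with vocabulary from accepted hits, so later searches can retrieve segments that share no words with the original example.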

43
  • Fuzzy string matching
      • Resistant to misspelling
      • Matches related words
  • Example: "resistant to misspelling" also matches
      • "resistent to mispeling"
      • "resist to misspelled words"
      • "resist to words not correctly spelled"
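
Tolerance to misspelling can be illustrated with Python's standard `difflib` module, which scores how similar two strings are. This is a sketch of the general idea only: the `VOCABULARY` list, the `normalize_query` function, and the 0.75 cutoff are assumptions, and the matching method used by the software itself is not specified in the slides.

```python
from difflib import get_close_matches

# Hypothetical list of terms known to the index.
VOCABULARY = ["resistant", "misspelling", "to", "words"]

def normalize_query(query, vocabulary, cutoff=0.75):
    """Replace each query word with its closest vocabulary term, if close enough."""
    corrected = []
    for word in query.lower().split():
        match = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

print(normalize_query("resistent to mispeling", VOCABULARY))
```

Lowering the cutoff makes the match more forgiving, at the cost of mapping unrelated words onto vocabulary terms.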

44
Automatic Document Classification
47
  • CLIENT: Federal Aviation Administration (FAA), JetBlue Airline
  • PRODUCTS: WordStat, SimStat
  • APPLICATION: Knowledge discovery
      • Identification of human errors in flight irregularity reports.
      • Comparison of collision risks at different airports from text analysis of TCAS reports.
      • Development of a taxonomy of aviation safety terms.

52
  • CLIENT: CISCO Systems Inc (Product Marketing Department)
  • PRODUCTS: WordStat, SimStat, QDA Miner
  • APPLICATION: Market research
      • Analysis of the impact of media campaigns (CATA of exchanges on internet forums).
      • Analysis of consumer satisfaction related to numerous products and services.

53
Reactions to the launching of CRS-1
54
  • CLIENT: US Office of Personnel Management
  • PRODUCTS: WordStat, SimStat
  • APPLICATIONS: Test item analysis, job analysis, surveys
      • Identification of gender or racial bias, inappropriate language, and vagueness of instructions in test items.
      • Assessment of skills and competencies from critical incidents for job analysis.
      • Analysis of answers to open-ended questions.

55
  • CLIENT: The Planning Commission, Hillsborough County (Florida)
  • PRODUCTS: WordStat, SimStat
  • APPLICATION: Thematic analysis of consultations for urban planning.
  • Content analysis of:
      • About 3000 comments from citizens.
      • Transcripts of community meetings and public hearings.
  • Identification of major local issues and concerns for different communities.

56
  • CLIENT: Political Science, University of Michigan, Princeton University
  • PRODUCTS: WordStat, QDA Miner
  • APPLICATION: Thematic content analysis of judicial briefs and political speeches
  • Judicial: identification of differences in style of argumentation of:
      • Groups in favor of affirmative action programs
      • Groups opposed to affirmative action programs
  • Russia's response to the introduction and use of American forces in Central Asia after September 11.

57
  • Analysis of responses to open-ended questions
  • Summarize interview or focus-group transcripts
  • Identification of patterns of vocabulary usage
  • Literature profiling in genome research
  • Identification of trends in historical archives
  • Optimize library searches
  • Determine authorship of disputed documents
  • Detect psychological disorders or processes
  • Analysis of food and drug interactions
  • Fraud detection

58
  • WANT MORE INFORMATION? A FREE DEMONSTRATION? A TRIAL VERSION?
      1. Meet me at our exhibit booth
      2. Email me at peladeau_at_provalisresearch.com
      3. Visit our web site WWW.PROVALISRESEARCH.COM
      4. Schedule a free demo with me at your workplace (in London until October 4th) or over the web.