Title: CATPAC
CATPAC and WordStat
- Anne D. Sito
- Erin Sonenstein
- COM 633 FA 09
CATPAC
Overview of CATPAC
- Designed to recognize frequently used words in text
- Identifies and groups patterns of similar words
- Provides output of clustering algorithms, perceptual maps, and interactive clustering
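To make the frequency step concrete, here is a minimal Python sketch of counting the most frequent words in a text. It is not CATPAC's implementation: the `STOP_WORDS` set and the `top_words` function are hypothetical, and CATPAC's own tokenizing and word-exclusion rules may differ.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption, not CATPAC's exclusion list).
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "it"}


def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop words and their counts."""
    # Lowercase, then keep runs of letters and apostrophes so that
    # contractions such as "you'll" stay a single token.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # most_common orders equal counts by first appearance in the text.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "You will see that you'll want the book, and you will keep it."
    print(top_words(sample, 3))
```

With only a short stop-word list, function words such as "you" still rise to the top of the counts, which is the kind of result the Discussion slide reports.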
Data Preparation: Text
1. Convert the Document into a .txt File
2. Inputting Data
3. Select the Text File You Want to Analyze
4. Select "Make Dendrogram"
5. Initial Output Screen
6. Output Data Screen
7. Output Dendrogram
8. Data Presented in ThoughtView 2D
9. Data Presented in ThoughtView 3D
10. ThoughtView 3D (Rotated)
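The dendrogram and ThoughtView outputs above come from clustering words by how strongly they are associated. As a rough illustration only, the sketch below runs average-linkage agglomerative clustering over a table of pairwise word similarities (for example, co-occurrence counts) and records the merge order a dendrogram would draw. Average linkage is an assumed choice; CATPAC's own clustering method may differ, and `cluster_words` and the sample scores are hypothetical.

```python
from itertools import combinations


def cluster_words(words, similarity):
    """Average-linkage agglomerative clustering of words.

    words: list of distinct words.
    similarity: dict mapping frozenset({w1, w2}) to a similarity score,
        e.g. a co-occurrence count; missing pairs are treated as 0.
    Returns a list of (cluster_a, cluster_b, score) merges in order,
    which is the information a dendrogram plots.
    """
    clusters = [(w,) for w in words]
    merges = []

    def avg_sim(a, b):
        # Mean similarity over every cross-cluster pair of words.
        total = sum(similarity.get(frozenset((x, y)), 0) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two current clusters with the highest average similarity.
        a, b = max(combinations(clusters, 2), key=lambda pair: avg_sim(*pair))
        merges.append((a, b, avg_sim(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges


if __name__ == "__main__":
    scores = {
        frozenset(("book", "read")): 5,
        frozenset(("love", "great")): 4,
        frozenset(("book", "great")): 1,
    }
    for a, b, s in cluster_words(["book", "read", "love", "great"], scores):
        print(a, "+", b, "at", s)
```

Reading the merges from first to last corresponds to reading a dendrogram from its leaves toward its root: tightly associated words join early, loosely related groups join late.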
Discussion and Limitations
Discussion:
- Found words such as "you," "you'll," and "to" to be the most frequently used in this text.
- Examines relationships between words based on their proximity in the text.
Limitations:
- Words are measured by frequency, not importance.
- Focuses less on what words mean or how they fit together according to dictionaries.
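Because CATPAC relates words by their proximity in the text, the raw material for that step can be pictured as co-occurrence counts within a moving window. The sketch below assumes a hypothetical window of `window` consecutive tokens and counts each pair of distinct words that share a window; it does not reproduce CATPAC's default window size or any further weighting CATPAC applies to these counts.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrence(tokens, window=3):
    """Count pairs of distinct words that appear in the same window.

    tokens: list of words in text order.
    window: number of consecutive tokens per window (an assumed setting).
    Returns a Counter keyed by alphabetically sorted (word1, word2) tuples.
    """
    counts = Counter()
    # Slide the window forward one token at a time; a text shorter than
    # the window is treated as a single window.
    for start in range(max(1, len(tokens) - window + 1)):
        span = set(tokens[start:start + window])
        for pair in combinations(sorted(span), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    words = "great book great read".split()
    print(window_cooccurrence(words, window=2))
```

Each window counts a given pair at most once, so words that repeatedly sit near each other accumulate higher scores. This also illustrates the limitation above: the counts reflect closeness and frequency, not what the words mean.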
WordStat http://www.provalisresearch.com/wordstat/wordstat.html
Overview of WordStat
- Content Analysis Module for SIMSTAT
- Designed specifically to process textual information, especially open-ended data such as journal articles, speeches, electronic communications, and interviews
- Includes an existing dictionary library and can also run analyses with new dictionaries built by the user
- Can perform statistical analyses (e.g., factor analysis, word frequencies, multiple regression)
- KWIC (Key Word In Context) tables are available for any word or word pattern, whether or not it is included in the analysis
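A KWIC table lists every occurrence of a keyword with a few words of surrounding context. The minimal sketch below shows the idea; the `kwic` function, its `width` parameter, and the simple whitespace tokenizing are assumptions for illustration, not WordStat's implementation, and it matches a single word rather than a word pattern.

```python
import re


def kwic(text, keyword, width=3):
    """Return Key Word In Context lines for each occurrence of keyword.

    Each line shows up to `width` words on either side of the match,
    with the left context right-aligned so the keywords line up.
    """
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        # Ignore case and surrounding punctuation when matching.
        if re.sub(r"[^\w']", "", tok).lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    review = "I found the ending terrific and the start slow. Terrific read."
    for line in kwic(review, "terrific"):
        print(line)
```

Seeing each hit in context is what lets an analyst notice when the same word is used in different senses.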
Data: Comparing Amazon.com Reviews of the Book Between Men and Women
1. Create a Text File
2. Input the Text File into WordStat
3. Define Your Variables
4. Running the Analysis
5. Existing Dictionary Was Not Relevant to Our Data
6. New Dictionary Available Online!
7. (Free) New Dictionary Download
8. Import New Dictionary; Maintain Exclusion List
9. Level 1 Analysis
10. Level 2 Analysis
11. Overall Frequencies
12. Gender Differences
13. Dendrogram
14. Clustering
15. 3-D Figure of Output
16. Co-occurrence Matrix
17. KWIC by Gender
18. Words by Each Text Case
19. Word Count and Category Frequency
20. Aggression Example
21. Limitations: Terrific = Anxiety?
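The dictionary, exclusion-list, and gender-comparison steps above can be approximated in a few lines. In this minimal sketch the `DICTIONARY` categories, the `EXCLUSIONS` set, and the sample reviews are all hypothetical; they are not the dictionary used in this analysis. "terrific" is deliberately listed under two categories to show that exact word matching cannot tell which sense a reviewer meant.

```python
import re
from collections import Counter, defaultdict

# Hypothetical category dictionary; "terrific" is placed in two
# categories on purpose to show how word-level matching ignores context.
DICTIONARY = {
    "positive": {"great", "love", "terrific"},
    "anxiety": {"worried", "afraid", "terrific"},
}
# Hypothetical exclusion list: these words are never counted.
EXCLUSIONS = {"the", "a", "and", "book"}


def category_frequencies(reviews):
    """Count dictionary-category hits per group.

    reviews: iterable of (group, text) pairs, e.g. ("F", "I love it").
    Returns a dict mapping each group to a Counter of category counts;
    groups with no hits do not appear.
    """
    result = defaultdict(Counter)
    for group, text in reviews:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in EXCLUSIONS:
                continue
            # A word adds one hit to every category that lists it.
            for category, members in DICTIONARY.items():
                if word in members:
                    result[group][category] += 1
    return dict(result)


if __name__ == "__main__":
    data = [
        ("F", "A terrific book, I love it"),
        ("M", "I was worried the book was great"),
    ]
    print(category_frequencies(data))
```

Because matching is exact, a misspelling such as "terrfic" produces no hit at all, which is one reason the input text must be spelled correctly.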
Discussion and Limitations
- Allows multiple independent variables
- Dictionaries may not always be complete
- Words in the .txt file must be spelled correctly
- Could not distinguish between quotes from the book and the reviewers' original thoughts
- May not account for different usages of certain words (e.g., "combating," "terrific")
Any Questions? Thank You!