Title: Part.I Data Analysis
1Part.I Data Analysis
ISAGA2003 114 Applying Data Mining to Video
Game Clustering Based on the Data of the
Internet Survey
- Tetsuya Onoda, Yuka Nakano, and Daiki Arai
- Keio University, Kenji Kumasaka Laboratory
2- ----- CONTENTS ----------
- Why we need Internet Survey
- The method of Data Cleaning
- The method of Layer Division
- The method of Clustering items
41. Why we need Internet Survey
The Two Major Approaches of Cultural Phenomenon
Analysis
Objective approach Subjective approach
51. Why we need Internet Survey
Objective approach
The Annual Video Game Industry Report and so on.
The SALES is the main theme.
61. Why we need Internet Survey
Subjective approach
Subculture Criticism and so on.
Contents analysis, Comment, and Experiences
71. Why we need Internet Survey
The Two Major Approaches of Cultural Phenomenon
Analysis
Objective approach Subjective approach
Inter-Subjective approach
81. Why we need Internet Survey
Inter-Subjective approach
Community(User)
Industry(Company)
Media Communication
Change of the MARKET
Change of the LIFESTYLE
91. Why we need Internet Survey
- To investigate
- the diversity of works and
- the diversity of players,
- an Internet Survey is needed.
101. Why we need Internet Survey
Industry side
Industry(Company)
Analysis of a Purchase history
http://www.amazon.co.jp/
111. Why we need Internet Survey
Community side
Community(User)
Analysis of BBS or Weblog
http://www.kanshin.com/
12One of the best studies analyzing the structure and function of MEDIA COMMUNICATION is iMap.gr.jp.
Community(User)
Industry(Company)
141. Why we need Internet Survey
Game titles for analysis
Popularity
151. Why we need Internet Survey
Game titles for analysis
Conventional Analysis
Like or Dislike
Already Played or Not Played yet
I have or I don't have
16Everyone knows them. Therefore we cannot grasp the diversity.
Image data from amazon.co.jp (the following images as well).
171. Why we need Internet Survey
Game titles for analysis
All 963 items
iMap Game Analysis
Click or NULL
He/She knows or He/She may not know
18A LIFESTYLE is recalled even though not everyone knows them.
19- ----- CONTENTS ----------
- Why we need Internet Survey
- The method of Data Cleaning
- The method of Layer Division
- The method of Clustering items
202. The method of Data Cleaning
In an Internet Survey, noise in the data is unavoidable. Therefore, it is necessary to correct the data before starting the analysis. iMap has two main types of noise:
- Order Effect
- Apparent Degree of Support
212. The method of Data Cleaning
Noise1. Order Effect
22The "order effect" is the problem that clicks concentrate on items in specific positions because of the display order.
23In other words, it is like ABC order in the Japanese alphabet.
242. The method of Data Cleaning
For example, Genre MANGA
Adachi, Mitsuru
Toriyama, Akira
Akimoto, Osamu
Fujiko, Fujio, F
Araki, Hirohiko
In Japan, too, "A" comes first in the alphabet.
252. The method of Data Cleaning
For example, Genre MUSIC
It influences not only the ranking but also the distance between items.
262. The method of Data Cleaning
Solution of the "Order effect"
272. The method of Data Cleaning
We evaluated the "Order effect"
The distribution of the average value.
282. The method of Data Cleaning
We evaluated the "Order effect"
The distribution of the standard deviation (SD). Users whose SD is 0 should be deleted.
292. The method of Data Cleaning
We evaluated the "Order effect"
The distribution is approximated by a normal distribution. A 95% confidence interval is applied here, and the user data in the rejection region is deleted.
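A minimal sketch of this cleaning step. The slides do not say what the "average value" is computed over, so this assumes each user's record is the list of display positions of the items they clicked. The function name `clean_users`, the input format, and the 1.96 cutoff (two-sided 95% under a normal approximation) are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def clean_users(clicks_by_user, z_cut=1.96):
    """Return the ids of users kept after the order-effect cleaning.

    clicks_by_user: dict user_id -> list of clicked display positions
    (hypothetical input format).
    """
    # Per-user mean and standard deviation of the clicked positions.
    stats = {u: (mean(p), pstdev(p)) for u, p in clicks_by_user.items() if p}
    # Users whose SD is 0 clicked one position only; delete them.
    stats = {u: (m, s) for u, (m, s) in stats.items() if s > 0}
    if len(stats) < 2:
        return set(stats)
    # Treat the per-user means as roughly normal and keep the central
    # 95% (|z| <= 1.96); everything else is the rejection region.
    means = [m for m, _ in stats.values()]
    mu, sigma = mean(means), pstdev(means)
    if sigma == 0:
        return set(stats)
    return {u for u, (m, _) in stats.items() if abs(m - mu) / sigma <= z_cut}
```

A user who clicked only the first few displayed items ends up far from the overall mean and is dropped, as is a user whose clicks all fall on one position.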
302. The method of Data Cleaning
We solved the "Order effect"
The distribution of the average value after Data Cleaning.
312. The method of Data Cleaning
Noise2. Apparent Degree of Support
322. The method of Data Cleaning
Space Invaders is supported by the older generation, isn't it?
Male 182, Female 76, Total 258
[Graph: by BirthYear, Female / Male]
332. The method of Data Cleaning
The click rate is influenced by the uneven number of users in each generation.
User Number: Male 8675, Female 6745, Total 15420
Space Invaders: Male 182, Female 76, Total 258
[Graphs: User Number and Space Invaders, each by BirthYear, Male / Female]
342. The method of Data Cleaning
"The support rate" is divided by "the user rate" for each generation.
User Number: Male 8675, Female 6745, Total 15420
Space Invaders: Male 182, Female 76, Total 258
[Graphs: User Number and Space Invaders, each by BirthYear, Male / Female]
352. The method of Data Cleaning
It's reasonable!!
Specialization coefficient
[Graph: by BirthYear, Female / Male]
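A small sketch of the specialization coefficient, assuming "support rate" means a group's share of the clicks on one item and "user rate" means that group's share of all users. The slides only give totals by gender, so the example groups by gender instead of birth year; the function name `specialization` is hypothetical.

```python
def specialization(item_clicks, user_counts):
    """Support rate of each group divided by that group's user rate."""
    total_clicks = sum(item_clicks.values())
    total_users = sum(user_counts.values())
    if total_clicks == 0 or total_users == 0:
        raise ValueError("need at least one click and one user")
    coeff = {}
    for group, users in user_counts.items():
        if users == 0:
            continue  # no users in this group, coefficient undefined
        support_rate = item_clicks.get(group, 0) / total_clicks
        user_rate = users / total_users
        coeff[group] = support_rate / user_rate
    return coeff


# Totals taken from the slides (Space Invaders clicks, all users).
space_invaders = {"male": 182, "female": 76}
all_users = {"male": 8675, "female": 6745}
coeff = specialization(space_invaders, all_users)
```

A coefficient above 1 means the group supports the item more than its share of users would suggest; below 1 means less.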
36The shift in the top-20 ranking after Data Cleaning
Items like "Ice Climber" (Aisu-kuraima) settled in a reasonable position. Globally recognized items like "Pokemon" rose to a higher rank.
37- ----- CONTENTS ----------
- Why we need Internet Survey
- The method of Data Cleaning
- The method of Layer Division
- The method of Clustering items
383. The method of Layer Division
If there are three items, strength of the
relation is...
393. The method of Layer Division
Considered ordinarily...
KOEI, Historical SLG, and the stage in Japan.
403. The method of Layer Division
However, in the data:
Everyone knows, Popular, Majority
413. The method of Layer Division
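The confidence of Nobunaga's Ambition given Meiji Restoration is high, but the confidence of Meiji Restoration given Nobunaga's Ambition is low.
Support 56
Support 18
Confidence: 94 (Meiji Restoration → Nobunaga's Ambition), 39 (Nobunaga's Ambition → Meiji Restoration)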
423. The method of Layer Division
The confidence in each direction between Nobunaga's Ambition and Final Fantasy is high.
Support 78
Support 56
Confidence: 92 and 69 (one value per direction)
433. The method of Layer Division
Strength of the relation between items
94, 92, 39, 60
Therefore, Nobunaga's Ambition and Final Fantasy tend to be connected.
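The slides do not give formulas, so the sketch below uses the standard association-rule definitions of support and confidence. The click data is made up for illustration, and the function names are hypothetical.

```python
def support(baskets, item):
    """Share of users who clicked `item`."""
    return sum(item in b for b in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Share of users who clicked `antecedent` that also clicked `consequent`."""
    with_a = [b for b in baskets if antecedent in b]
    if not with_a:
        return 0.0
    return sum(consequent in b for b in with_a) / len(with_a)


# Toy data (not from the survey): each set is one user's clicked items.
baskets = (
    [{"Nobunaga", "Meiji"}] * 2   # clicked both titles
    + [{"Nobunaga"}] * 4          # clicked only the popular title
    + [set()] * 4                 # clicked neither
)
meiji_to_nobunaga = confidence(baskets, "Meiji", "Nobunaga")
nobunaga_to_meiji = confidence(baskets, "Nobunaga", "Meiji")
```

Confidence is directional: almost everyone who clicks the minor title also clicks the popular one, but not the other way round, which is the asymmetry the slides describe.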
443. The method of Layer Division
Game titles for analysis
All 963 items
We assume a neat pyramid of layers, but we should not divide it arbitrarily. We need a quantitative method.
[Figure: layer pyramid, L1 to L6]
453. The method of Layer Division
Real Number Graph (Support Average)
463. The method of Layer Division
A popularity vote generally shows such a sharply peaked shape.
- Sport team
- Music Artist
- Novelist
- Comic
- Movie
- Game
- and so on
473. The method of Layer Division
Therefore we took the logarithm.
For example, with common logarithms (base 10): $\log_{10} 10000 = 4$, $\log_{10} 1000 = 3$, $\log_{10} 100 = 2$, $\log_{10} 10 = 1$.
483. The method of Layer Division
Natural Logarithm Graph (Support Average)
493. The method of Layer Division
The method of Layer Division
The maximum average support is 38.632, for "Dragon Warrior". The natural logarithm of 38.632 is 3.654, and 3.654 divided by 6 is 0.6090. Cutting at every 0.6090 from the top makes six layers.
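The division can be sketched as follows. The function name `layer_of` is hypothetical, and two details are assumptions the slides do not state: an item exactly on a boundary goes to the lower layer, and everything below the fifth cut goes to Layer 6.

```python
import math


def layer_of(avg_support, max_support=38.632, n_layers=6):
    """Layer number (1 = top) for an item's average support."""
    if avg_support <= 0:
        raise ValueError("average support must be positive")
    top = math.log(max_support)        # natural log, about 3.654
    width = top / n_layers             # about 0.6090
    depth = top - math.log(avg_support)
    # Count how many cuts of `width` lie above the item, capped at the
    # bottom layer.
    return min(n_layers, int(depth // width) + 1)
```

For example, an item with average support 25 falls in Layer 1, while one with average support 20 already falls in Layer 2, because the first cut sits at about exp(3.654 - 0.609).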
503. The method of Layer Division
Game titles for analysis
All 963 items
Layer1: from Dragon Warrior to Pokemon, 9 items
[Figure: layer pyramid, L1 to L6]
513. The method of Layer Division
Game titles for analysis
All 963 items
Layer2: from Pac-Man to The Legend Of Valkyrie, 54 items
[Figure: layer pyramid, L1 to L6]
523. The method of Layer Division
Game titles for analysis
All 963 items
Layer3: from R-TYPE to Soccer, 128 items
[Figure: layer pyramid, L1 to L6]
533. The method of Layer Division
Game titles for analysis
All 963 items
Layer4: from Famicom Wars to [Japanese title lost in extraction] (Real Sound), 202 items
[Figure: layer pyramid, L1 to L6]
543. The method of Layer Division
Game titles for analysis
All 963 items
Layer5: from [Japanese title lost in extraction] (Eternal Bonds) to Psychic War, 210 items
[Figure: layer pyramid, L1 to L6]
553. The method of Layer Division
Game titles for analysis
All 963 items
Layer6: from BLOOD THE LAST VAMPIRE, 360 items
[Figure: layer pyramid, L1 to L6]
Layer6 is excluded from the analysis because it is still unstable.
563. The method of Layer Division
Game titles for analysis
All 963 items
The items for analysis: from L1 Dragon Warrior to L5 Psychic War, 603 items
[Figure: layer pyramid, L1 to L6]
573. The method of Layer Division
Game titles for analysis
603 items
The major titles supported by all users
[Figure: layer pyramid, L1 to L6]
583. The method of Layer Division
Game titles for analysis
603 items
The minor titles supported by enthusiast users
[Figure: layer pyramid, L1 to L6]
593. The method of Layer Division
Game titles for analysis
603 items
The hint for enriching game culture is hidden in Layer3.
[Figure: layer pyramid, L1 to L6]
60- ----- CONTENTS ----------
- Why we need Internet Survey
- The method of Data Cleaning
- The method of Layer Division
- The method of Clustering items
614. The method of Clustering items
Game titles for analysis
603 items
603 items are too many to handle at once. Clustering the items is needed to grasp their characteristics more effectively.
[Figure: layer pyramid, L1 to L6]
624. The method of Clustering items
A Kohonen Network (a neural network technique) is used for CLUSTERING. The input data is a matrix that crosses ITEMs and USERs. Each cell is TRUE or FALSE; TRUE means that the user clicked the item.
634. The method of Clustering items
The Basics of a Kohonen network
- Dividing the field
- Seeding by random numbers
- Competitive Learning
- The output of a Self-Organizing Map
644. The method of Clustering items
1. The division of the field is fixed.
654. The method of Clustering items
2. Seeds (items) are scattered.
664. The method of Clustering items
3. Congenial items attract each other, while uncongenial items repel each other.
674. The method of Clustering items
4. Finally, they settle in stable positions.
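The four steps above can be sketched as a very small self-organizing map in plain Python. This is a generic textbook-style SOM, not the tool the authors used; the grid size, learning rate, neighbourhood rule, and the function name `train_som` are all illustrative assumptions. Each item is a row of the TRUE/FALSE item-by-user matrix, written here as 0/1 flags.

```python
import random


def train_som(items, rows=3, cols=3, epochs=50, seed=0):
    """items: dict name -> list of 0/1 flags (one per user).

    Returns dict name -> (row, col) of the node the item settles on.
    """
    rng = random.Random(seed)
    dim = len(next(iter(items.values())))
    # Steps 1-2: fix the grid and seed every node with random weights.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}

    def bmu(vec):
        # Best matching unit: the node whose weights are closest to vec.
        return min(weights, key=lambda n: sum(
            (w - v) ** 2 for w, v in zip(weights[n], vec)))

    names = list(items)
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink over time.
        frac = 1 - epoch / epochs
        rate = 0.5 * frac
        radius = max(rows, cols) / 2 * frac
        rng.shuffle(names)
        for name in names:
            vec = items[name]
            br, bc = bmu(vec)
            # Step 3: competitive learning. The winning node and its
            # grid neighbours move toward the item.
            for (r, c), w in weights.items():
                if abs(r - br) + abs(c - bc) <= radius:
                    for i in range(dim):
                        w[i] += rate * (vec[i] - w[i])
    # Step 4: every item settles on its best matching node.
    return {name: bmu(items[name]) for name in items}
```

Changing `seed` changes the initial weights, so the final positions can differ from run to run.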
684. The method of Clustering items
A limitation of this technique is that the initial seeding influences the final result.
694. The method of Clustering items
With luck, congenial items are seeded near each other.
704. The method of Clustering items
But if they are seeded far apart, they may never meet.
714. The method of Clustering items
"Score Between Items", which solves the problems caused by chance
72Seeding with a different random number is repeated ten times.
734. The method of Clustering items
The positional relation between every pair of items is converted into a score.
744. The method of Clustering items
Consequently, every item gets a score against each of the other items in its layer, with 100 points as full marks.
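The slides do not give the scoring formula. The sketch below assumes the simplest rule consistent with "ten runs, 100 points full marks": a pair of items earns 10 points for every run in which both land on the same node. The function name `pair_scores` and the input format are hypothetical.

```python
from itertools import combinations


def pair_scores(runs, points_per_run=10):
    """runs: list of dicts, each mapping item name -> node position.

    Returns dict (item_a, item_b) -> accumulated score.
    """
    scores = {}
    for placement in runs:
        for a, b in combinations(sorted(placement), 2):
            # Assumed rule: points only when both items share a node.
            hit = placement[a] == placement[b]
            scores[(a, b)] = scores.get((a, b), 0) + (points_per_run if hit else 0)
    return scores
```

Because the score accumulates over runs with different seeds, a pair that meets only by chance in one run scores low, while a pair that meets in every run reaches the full 100 points.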
754. The method of Clustering items
Clustering in a Layer: Layer1
[Figure: layer pyramid, L1 to L6]
764. The method of Clustering items
They are all too popular to be connected.
774. The method of Clustering items
Therefore 9 items → 9 clusters
[Figure: layer pyramid, L1 to L6]
784. The method of Clustering items
Clustering in a Layer: Layer2
[Figure: layer pyramid, L1 to L6]
794. The method of Clustering items
We linked only items that are strongly related.
804. The method of Clustering items
Therefore 54 items → 30 clusters
[Figure: layer pyramid, L1 to L6]
814. The method of Clustering items
Clustering in a Layer: Layer3
[Figure: layer pyramid, L1 to L6]
824. The method of Clustering items
The top 2.5% level of the ranking
834. The method of Clustering items
Therefore 128 items → 39 clusters
[Figure: layer pyramid, L1 to L6]
844. The method of Clustering items
Clustering in a Layer: Layer4, Layer5
[Figure: layer pyramid, L1 to L6]
854. The method of Clustering items
The top 5.0% level of the ranking
864. The method of Clustering items
202 items → 58 clusters
[Figure: layer pyramid, L1 to L6]
For more detail on this method, please refer to http://web.sfc.keio.ac.jp/ond/ISAGA2003/.
874. The method of Clustering items
210 items → 50 clusters
[Figure: layer pyramid, L1 to L6]
884. The method of Clustering items
Clustering not only reduces the number of items but also removes the differences between layers. We call this "Super Flat".
89[Figure: clusters labeled by layer, L1 to L4]
90[Figure: clusters labeled by layer, L1 to L4]
914. The method of Clustering items
The Shifting of the degree of support
In the case of items: L1(9) > L2(54) > L3(128) > L4(202)
In the case of clusters: L1(9) ≈ L2(30) ≈ L3(39) ≈ L4(58)
924. The method of Clustering items
Super Flat
enables us to draw a Self-Organizing Map across the differences between layers.
934. The method of Clustering items
Media Map: 40x30 SOMs are drawn many times. We adopted the map with the greatest spread of cluster positions.
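One way to pick such a map, as a sketch: the slides do not say how the spread of positions is measured, so this assumes it is the variance of the x coordinates plus the variance of the y coordinates. The function names `spread` and `pick_map` are hypothetical.

```python
from statistics import pvariance


def spread(positions):
    """positions: dict cluster name -> (x, y) on the map grid."""
    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]
    # Assumed measure of spread: variance along x plus variance along y.
    return pvariance(xs) + pvariance(ys)


def pick_map(candidate_maps):
    """Return the candidate whose cluster positions are most spread out."""
    return max(candidate_maps, key=spread)
```

A map where the clusters pile up in one corner scores low under this measure, so a map that uses the whole 40x30 field is preferred.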
944. The method of Clustering items
Kohonen between Layers: from Layer1 to Layer3
[Figure: layer pyramid, L1 to L6]
954. The method of Clustering items
Major Map
964. The method of Clustering items
Kohonen between Layers: Layer3 and Layer4
[Figure: layer pyramid, L1 to L6]
974. The method of Clustering items
Middle Map
984. The method of Clustering items
Kohonen between Layers: Layer4 and Layer5
[Figure: layer pyramid, L1 to L6]
994. The method of Clustering items
Minor Map
100- ----- CONTENTS ----------
- Why we need Internet Survey
- The method of Data Cleaning
- The method of Layer Division
- The method of Clustering items
Finally we got three Media Maps.