Using Weka's SimpleKMeans from R

This is @gepuro, in charge of day 21 of the R Advent Calendar 2011.

This post shows how to use Weka, a data mining tool written in Java, from the R language. Here I try out Weka's SimpleKMeans.

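Referring to the help page (available once RWeka is loaded),

> help(SimpleKMeans)

I ran the following, with the number of clusters set to 3 (N = 3):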

> install.packages("RWeka")
> library("RWeka")
> data(iris)
> cl1 <- SimpleKMeans(iris[, -5], Weka_control(N = 3))

> cl1

kMeans
======

Number of iterations: 6
Within cluster sum of squared errors: 6.982216473785234
Missing values globally replaced with mean/mode

Cluster centroids:
                            Cluster#
Attribute       Full Data          0          1          2
                    (150)       (61)       (50)       (39)
==========================================================
Sepal.Length       5.8433     5.8885      5.006     6.8462
Sepal.Width        3.0573     2.7377      3.428     3.0821
Petal.Length        3.758     4.3967      1.462     5.7026
Petal.Width        1.1993      1.418      0.246     2.0795

> table(predict(cl1), iris$Species)

    setosa versicolor virginica
  0      0         47        14
  1     50          0         0
  2      0          3        36
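
To make the cross-tabulation easier to read, here is a minimal sketch in base R that retypes the counts from the table above and summarizes them. The names `tab`, `majority`, and `purity` are my own for illustration, and "purity" (the share of observations that fall in their cluster's majority species) is a summary I am adding here, not something RWeka reports. It does not need Weka or Java to run.

```r
# Counts copied by hand from the table(predict(cl1), iris$Species) output.
tab <- matrix(
  c( 0, 47, 14,
    50,  0,  0,
     0,  3, 36),
  nrow = 3, byrow = TRUE,
  dimnames = list(cluster = c("0", "1", "2"),
                  species = c("setosa", "versicolor", "virginica"))
)

# Majority species in each cluster (row-wise position of the largest count).
majority <- colnames(tab)[apply(tab, 1, which.max)]

# Share of the 150 flowers that belong to their cluster's majority species.
purity <- sum(apply(tab, 1, max)) / sum(tab)

print(setNames(majority, rownames(tab)))
print(purity)  # (47 + 50 + 36) / 150
```

Read this way, cluster 1 is exactly setosa, while clusters 0 and 2 mix versicolor and virginica: 17 of the 150 flowers land outside their cluster's majority species.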

In this way, Weka can be used from R.

I also tried to use Weka's XMeans, but I couldn't figure out how. orz