To run the SOM algorithm, load the package SOMbrero. The function that trains
the map is trainSOM(), detailed below.
This documentation only considers the case of dissimilarity matrices.
The trainSOM function has several arguments, but only the first one, x.data,
is required: it is the dataset used to train the SOM. In this documentation, it
is passed to the function as a matrix or a data frame, and it must be a
dissimilarity matrix, i.e., a symmetric matrix of non-negative numbers with
zeros on the diagonal. For such data, the argument type="relational" must also
be set, as in the example below.
The other arguments are the same as those of the initSOM function (they are
the parameters that define the algorithm; see help(initSOM) for further
details).
The trainSOM function returns an object of class somRes (see 
help(trainSOM) for further details on this class).
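Before calling trainSOM(), it can help to check that x.data meets the requirements stated above. The helper below, is_dissimilarity(), is hypothetical (it is not part of SOMbrero) and uses only base R; the numerical tolerance is an arbitrary assumption.

```r
# Hypothetical helper (not part of SOMbrero): check that a matrix is a valid
# dissimilarity matrix before passing it to trainSOM().
is_dissimilarity <- function(d, tol = 1e-8) {
  d <- as.matrix(d)                        # also accept data frames
  is.numeric(d) &&
    nrow(d) == ncol(d) &&                  # square
    isSymmetric(unname(d), tol = tol) &&   # symmetric (names ignored)
    all(d >= 0) &&                         # non-negative entries
    all(abs(diag(d)) < tol)                # zero diagonal
}

# Usage on a small toy matrix
d <- matrix(c(0, 1, 2,
              1, 0, 1,
              2, 1, 0), nrow = 3, byrow = TRUE)
is_dissimilarity(d)       # TRUE
is_dissimilarity(d + 1)   # FALSE: the diagonal is no longer zero
```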
The following table indicates which graphics are available for a relational SOM.
| Type | Energy | Obs | Prototypes | Add | Super Cluster | 
|---|---|---|---|---|---|
| no type | x | ||||
| hitmap | x | x | |||
| color | x | ||||
| lines | x | x | x2 | ||
| barplot | x | x | x2 | ||
| radar | x | x | x2 | ||
| pie | x | x2 | |||
| boxplot | x | ||||
| 3d | |||||
| poly.dist | x | x | |||
| umatrix | x | ||||
| smooth.dist | x | ||||
| words | x | ||||
| names | x | x | |||
| graph | x | x | |||
| mds | x | x | |||
| grid.dist | x | ||||
| grid | x | ||||
| dendrogram | x | ||||
| dendro3d | x | 
In the “Super Cluster” column, a plot marked by “x2” means it is available for both data set variables and additional variables.
lesmis data set
The lesmis data set provides the co-appearance graph of the characters of
the novel Les Misérables (Victor Hugo). Each vertex stands for a character whose
name is given by the vertex label. An edge means that the two corresponding
characters appear in a common chapter of the book, and each edge has a value
indicating the number of co-appearances. The lesmis data contain two
objects: the first one, lesmis, is an igraph object (see the igraph
web page) with 77 nodes and 254 edges; the second one, dissim.lesmis, is
described below.
Further information on this data set is provided with help(lesmis).
data(lesmis)
lesmis
## IGRAPH 3babff7 U--- 77 254 -- 
## + attr: layout (g/n), id (v/n), label (v/c), value (e/n)
## + edges from 3babff7:
##  [1]  1-- 2  1-- 3  1-- 4  3-- 4  1-- 5  1-- 6  1-- 7  1-- 8  1-- 9  1--10
## [11] 11--12  4--12  3--12  1--12 12--13 12--14 12--15 12--16 17--18 17--19
## [21] 18--19 17--20 18--20 19--20 17--21 18--21 19--21 20--21 17--22 18--22
## [31] 19--22 20--22 21--22 17--23 18--23 19--23 20--23 21--23 22--23 17--24
## [41] 18--24 19--24 20--24 21--24 22--24 23--24 13--24 12--24 24--25 12--25
## [51] 25--26 24--26 12--26 25--27 12--27 17--27 26--27 12--28 24--28 26--28
## [61] 25--28 27--28 12--29 28--29 24--30 28--30 12--30 24--31 31--32 12--32
## [71] 24--32 28--32 12--33 12--34 28--34 12--35 30--35 12--36 35--36 30--36
## + ... omitted several edges
plot(lesmis, vertex.size=0)
The dissim.lesmis object is a matrix whose entries are the lengths of the
shortest paths between pairs of characters (obtained with the function
shortest.paths of package igraph). Its row and column names have been set to
the characters' names to ease the use of the graphical functions of SOMbrero.
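To make the construction of dissim.lesmis concrete, here is a base-R sketch computing all-pairs shortest-path lengths with the Floyd-Warshall algorithm on a toy graph. It is not how dissim.lesmis was built (igraph was used); it assumes, for illustration, that every edge has length 1, i.e. that edge values are ignored. shortest_path_lengths() is a hypothetical helper.

```r
# Hypothetical helper: all-pairs shortest-path lengths (number of edges)
# with the Floyd-Warshall algorithm, for an undirected, unweighted graph.
shortest_path_lengths <- function(edges, n) {
  d <- matrix(Inf, n, n)                 # Inf = no path found yet
  diag(d) <- 0
  d[edges] <- 1                          # each edge has length 1
  d[edges[, 2:1, drop = FALSE]] <- 1     # undirected: fill both directions
  for (k in seq_len(n)) {
    # allow vertex k as an intermediate step for every pair (i, j)
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# Toy graph: path 1 - 2 - 3 - 4 plus an extra edge 2 - 4
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(2, 4))
d <- shortest_path_lengths(edges, 4)
d[1, 4]   # 2, via the path 1 - 2 - 4
```

The result has the properties required by trainSOM for relational data: it is symmetric, non-negative and has a zero diagonal (as long as the graph is connected, so that no entry stays infinite).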
set.seed(622)
mis.som <- trainSOM(x.data=dissim.lesmis, type="relational", nb.save=10,
                   init.proto="random", radius.type="letremy")
plot(mis.som, what="energy")
The dissimilarity matrix dissim.lesmis is passed to the trainSOM
function as input. Because intermediate states of the SOM were saved
(nb.save=10), the evolution of the energy can be plotted: it stabilizes during
the last 100 iterations.
The clustering component gives the neuron assigned to each of the 77
characters. The table function is a simple way to see how the observations
are distributed on the map (neurons with no observation do not appear in its
output).
mis.som$clustering
##           Myriel         Napoleon   MlleBaptistine      MmeMagloire 
##                5                5                4                4 
##     CountessDeLo         Geborand     Champtercier         Cravatte 
##                5                5                5                5 
##            Count           OldMan          Labarre          Valjean 
##                5                5                2                2 
##       Marguerite           MmeDeR          Isabeau          Gervais 
##                2                6                1                7 
##        Tholomyes        Listolier          Fameuil      Blacheville 
##               21               21               21               21 
##        Favourite           Dahlia          Zephine          Fantine 
##               21               21               21               22 
##    MmeThenardier       Thenardier          Cosette           Javert 
##               18               23               13               17 
##     Fauchelevent       Bamatabois         Perpetue         Simplice 
##                1               11               22               17 
##      Scaufflaire           Woman1            Judge     Champmathieu 
##                3                1               11               11 
##           Brevet       Chenildieu      Cochepaille        Pontmercy 
##               11               11               11               19 
##     Boulatruelle          Eponine          Anzelma           Woman2 
##               23               23               23                3 
##   MotherInnocent          Gribier        Jondrette        MmeBurgon 
##                1                1               15               15 
##         Gavroche     Gillenormand           Magnon MlleGillenormand 
##               20               13               18               13 
##     MmePontmercy      MlleVaubois   LtGillenormand           Marius 
##               13               13               13               19 
##        BaronessT           Mabeuf         Enjolras       Combeferre 
##               19               25               25               25 
##        Prouvaire          Feuilly       Courfeyrac          Bahorel 
##               25               25               25               25 
##          Bossuet             Joly        Grantaire   MotherPlutarch 
##               25               25               20               25 
##        Gueulemer            Babet       Claquesous     Montparnasse 
##               23               23               23               23 
##        Toussaint           Child1           Child2           Brujon 
##                7               15               15               23 
##     MmeHucheloup 
##               20
table(mis.som$clustering)
## 
##  1  2  3  4  5  6  7 11 13 15 17 18 19 20 21 22 23 25 
##  5  3  2  2  8  1  2  6  6  4  2  2  3  3  7  2  9 10
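In the table above, neurons 8, 9, 10, 12, 14, 16 and 24 are missing because table() only reports values that actually occur. The toy sketch below (hypothetical clustering vector) shows how converting the clustering to a factor with one level per neuron also lists the empty neurons. For mis.som, the same idea would read table(factor(mis.som$clustering, levels=1:25)), assuming the 25 neurons reported by the super-cluster summary further down.

```r
# Hypothetical toy clustering: neuron index of each observation, named by
# observation, with the same structure as mis.som$clustering
clustering <- c(a = 1, b = 1, c = 3, d = 5, e = 5, f = 5)

# table() lists only the neurons that received at least one observation
table(clustering)

# With one factor level per neuron (a 6-neuron map is assumed here),
# empty neurons are reported with a count of 0
counts <- table(factor(clustering, levels = 1:6))
counts
```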
plot(mis.som)
The clustering can be displayed using the plot function with what="obs"
and type="names":
plot(mis.som, what="obs", type="names")
or by superimposing the original igraph object on the map:
plot(mis.som, what="add", type="graph", var=lesmis)
Overviews of the cluster profiles can be plotted, e.g., with lines or radar.
plot(mis.som, what="prototypes", type="lines")
plot(mis.som, what="prototypes", type="radar")
On these graphics, each variable is represented by a point (lines) or by a slice (radar), which makes it easy to see which variables characterize which cluster.
To see how different the clusters are, some graphics show the distances between prototypes. These graphics behave exactly as for the other SOM types:
- "poly.dist" represents the distances between neighboring prototypes with polygons plotted for each cell of the grid. The smaller the distance between a polygon's vertex and a cell border, the closer the pair of prototypes. The colors indicate the number of observations in the neuron (white is used for empty neurons);
- "umatrix" fills the neurons of the grid using colors that represent the average distance between the current prototype and its neighbors;
- "smooth.dist" plots the mean distance between the current prototype and its neighbors with a color gradation;
- "mds" plots the neuron numbers at positions given by a multidimensional scaling (MDS) projection of the prototypes;
- "grid.dist" plots a point for each pair of prototypes, with the x coordinate representing the distance between the prototypes in the input space and the y coordinate representing the distance between the corresponding neurons on the grid.
plot(mis.som, what="prototypes", type="poly.dist", print.title=TRUE)
plot(mis.som, what="prototypes", type="smooth.dist")
plot(mis.som, what="prototypes", type="umatrix", print.title=TRUE)
plot(mis.som, what="prototypes", type="mds")
plot(mis.som, what="prototypes", type="grid.dist")
Here we can see that the prototypes located in the top left corner of the map (e.g., clusters 4 and 5) are far from the others.
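The quantities behind the "umatrix" and "grid.dist" plots can be sketched in base R. This is only an illustration under stated assumptions: a 5x5 square grid, toy Euclidean prototypes (a relational SOM expresses its prototypes through the dissimilarities instead), and neighbors defined as neurons at grid distance exactly 1. SOMbrero's own computations may differ.

```r
# Hypothetical sketch of the "umatrix" and "grid.dist" quantities.
set.seed(1)
coords <- expand.grid(x = 1:5, y = 1:5)             # neuron positions, 5x5 grid
protos <- matrix(rnorm(nrow(coords) * 3), ncol = 3) # toy prototypes (assumed)

grid_d  <- as.matrix(dist(coords))   # distances between neurons on the grid
proto_d <- as.matrix(dist(protos))   # distances between prototypes (input space)

# "umatrix"-like value: mean distance from each prototype to its direct
# neighbors (assumed: grid distance exactly 1, i.e. a 4-neighborhood)
neigh <- abs(grid_d - 1) < 1e-12
umat  <- rowSums(proto_d * neigh) / rowSums(neigh)

# "grid.dist"-like scatter plot: one point per pair of prototypes
pairs <- upper.tri(proto_d)
plot(proto_d[pairs], grid_d[pairs],
     xlab = "distance between prototypes (input space)",
     ylab = "distance between neurons (grid)")
```

A well-organized map shows an increasing trend in the scatter plot: prototypes that are close on the grid should also be close in the input space.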
Finally, a graphical overview of the clustering is obtained by coloring the vertices of the original graph according to their cluster:
plot(lesmis, vertex.label.color=rainbow(25)[mis.som$clustering], vertex.size=0)
legend(x="left", legend=1:25, col=rainbow(25), pch=19)
We can see that cluster 5 is very meaningful with respect to the story: the
characters of this cluster appear only in the sub-story of Bishop Myriel, who is
the only connection of all the other characters of cluster 5. The same kind of
conclusion holds for cluster 11, among others. Most of the other clusters have a
small number of observations, so it seems relevant to compute super clusters.
As the SOM algorithm produces a rather large number of clusters, they can be grouped with a hierarchical clustering of the prototypes. First, let us have an overview of the dendrogram:
plot(superClass(mis.som))
## Warning in plot.somSC(superClass(mis.som)): Impossible to plot the rectangles: no super clusters.
According to the proportion of variance explained by super clusters, 5 groups seem to be a good choice.
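The choice of the number of super clusters can be illustrated with a small base-R sketch: toy prototypes are clustered hierarchically with hclust(), the tree is cut with cutree(), and the proportion of variance explained is computed as 1 minus the within-group sum of squares divided by the total sum of squares. The toy prototypes, the Ward linkage (method="ward.D2") and the helper explained_variance() are assumptions made for illustration; superClass() works on the SOM prototypes and its exact criterion may differ.

```r
# Hypothetical sketch: hierarchical clustering of toy prototypes and the
# proportion of variance explained for k = 1..6 groups.
set.seed(2)
# 20 toy prototypes in 2 dimensions, forming two separated groups
protos <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                matrix(rnorm(20, mean = 5), ncol = 2))

tree <- hclust(dist(protos), method = "ward.D2")   # Ward linkage (assumed)

explained_variance <- function(x, groups) {
  total <- sum(scale(x, scale = FALSE)^2)          # total sum of squares
  within <- sum(sapply(split(seq_len(nrow(x)), groups), function(idx) {
    sum(scale(x[idx, , drop = FALSE], scale = FALSE)^2)  # within-group SS
  }))
  1 - within / total                               # between SS / total SS
}

ve <- sapply(1:6, function(k) explained_variance(protos, cutree(tree, k = k)))
round(ve, 3)
```

Because cutree() yields nested partitions, the explained proportion can only grow with k; a reasonable choice is usually the value after which it stops increasing sharply.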
sc.mis <- superClass(mis.som, k=5)
summary(sc.mis)
## 
##    SOM Super Classes
##      Initial number of clusters :  25 
##      Number of super clusters   :  5 
## 
## 
##   Frequency table
## 1 2 3 4 5 
## 9 2 4 6 4 
## 
##   Clustering
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
##  1  1  1  2  2  1  1  1  1  3  1  1  4  4  3  5  5  4  4  3  5  5  4  4  3 
## 
## 
##   ANOVA
##          F                       :  10.09429 
##          Degrees of freedom      :  4 
##          p-value                 :  1.526375e-06 
##                  significativity :  ***
table(sc.mis$cluster)
## 
## 1 2 3 4 5 
## 9 2 4 6 4
plot(sc.mis)
plot(sc.mis, type="grid", plot.legend=TRUE)
plot(sc.mis, type="lines", print.title=TRUE)
plot(sc.mis, type="mds", plot.legend=TRUE)
plot(sc.mis, type="dendro3d")
library(RColorBrewer)
plot(lesmis, vertex.size=0, vertex.label.color=
       brewer.pal(5, "Set2")[sc.mis$cluster[mis.som$clustering]])
legend(x="left", legend=paste("SC",1:5), col=brewer.pal(5, "Set2"), pch=19)
- super cluster 1 contains Valjean, who has a central position in the MDS visualization;
- super cluster 2 contains Myriel and the characters involved in his sub-story;
- super cluster 3 contains Gavroche, the abandoned child of the Thenardiers, and the characters of his sub-story;
- super cluster 4 contains the Thenardier family (Mr. and Mrs. Thenardier and their daughter Eponine) and the characters involved in their story. It also contains Marius, as well as Javert, who is seeking Valjean, the main character of the story. All these characters are strongly related to Cosette, who is Marius's lover and lived with the Thenardier family during her childhood;
- super cluster 5 contains Fantine and the characters involved in her sub-story.
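The vertex colors in the plot above rely on a two-step lookup: sc.mis$cluster gives the super cluster of each neuron, and indexing it by mis.som$clustering gives the super cluster of each character. The sketch below reproduces this indexing in base R on three characters, using the neuron-to-super-cluster assignment printed by summary(sc.mis) and the neurons printed for Myriel, Valjean and Fantine; the variable names are hypothetical.

```r
# Neuron of each observation (values copied from mis.som$clustering above)
clustering <- c(Myriel = 5, Valjean = 2, Fantine = 22)

# Super cluster of each of the 25 neurons (copied from summary(sc.mis) above)
super <- c(1, 1, 1, 2, 2, 1, 1, 1, 1, 3, 1, 1, 4, 4, 3,
           5, 5, 4, 4, 3, 5, 5, 4, 4, 3)

# Indexing by neuron numbers gives the super cluster of each observation
obs_super <- super[clustering]
names(obs_super) <- names(clustering)
obs_super   # Myriel 2, Valjean 1, Fantine 5

# The same index selects one color per observation from a 5-color palette
cols <- rainbow(5)[obs_super]
```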
SOMbrero also contains functions to compute a projected graph based on the super-clusters and to display it:
projectIGraph(sc.mis, lesmis)
## IGRAPH aedb8ca UNW- 5 7 -- 
## + attr: layout (g/n), name (v/c), size (v/n), weight (e/n)
## + edges from aedb8ca (vertex names):
## [1] 1--2 1--3 1--4 1--5 3--4 3--5 4--5
par(mar=rep(0,4))
plot(sc.mis, type="projgraph", variable=lesmis, s.radius=2)
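One plausible reading of the projected graph is sketched below in base R: each super cluster becomes a vertex, two super clusters are joined when at least one edge of the original graph links their members, the edge weight counts those edges, and the vertex size counts the members. This definition, the toy data and the helper project_edges() are assumptions for illustration; projectIGraph() may compute weights and sizes differently.

```r
# Hypothetical helper: project an edge list onto groups of vertices.
project_edges <- function(edges, membership) {
  g1 <- membership[edges[, 1]]           # group of the first end of each edge
  g2 <- membership[edges[, 2]]           # group of the second end
  keep <- g1 != g2                       # drop edges inside a single group
  a <- pmin(g1[keep], g2[keep])          # order each pair as "small--large"
  b <- pmax(g1[keep], g2[keep])
  w <- table(paste(a, b, sep = "--"))    # number of original edges per pair
  data.frame(pair = names(w), weight = as.vector(w))
}

# Toy graph with 5 vertices grouped into 3 super clusters
membership <- c(1, 1, 2, 2, 3)
edges <- rbind(c(1, 2), c(2, 3), c(1, 3), c(3, 4), c(4, 5), c(2, 5))
p <- project_edges(edges, membership)
p
sizes <- table(membership)   # assumed vertex size: members per super cluster
```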