To be able to run the SOM algorithm, you have to load the package called 
SOMbrero. The function used to run it is called trainSOM() and is 
detailed below.
This documentation only considers the case of contingency tables.
The trainSOM function has several arguments, but only the first one is
required. This argument is x.data which is the dataset used to train the 
SOM. In this documentation, it is passed to the function as a matrix or a data
frame. This set must be a contingency table, i.e., it must contain either 0 or 
positive integers. Column and row names must be supplied.
The other arguments are the same as the arguments passed to the initSOM
function (they are parameters defining the algorithm, see help(initSOM)
for further details).
The trainSOM function returns an object of class somRes (see 
help(trainSOM) for further details on this class).
presidentielles2002 data setThe presidentielles2002 data set provides the number of votes at the first
round of the 2002 French presidential election for each of the 16 candidates in
all of the 106 French administrative districts called “departements”. Further
details about this data set and the 2002 French presidential election are given
with help(presidentielles2002).
data(presidentielles2002)
apply(presidentielles2002, 2, sum)
##      MEGRET      LEPAGE  GLUCKSTEIN      BAYROU      CHIRAC      LE_PEN 
##      667043      535875      132696     1949219     5666021     4804772 
##     TAUBIRA SAINT_JOSSE      MAMERE      JOSPIN      BOUTIN         HUE 
##      660515     1204801     1495774     4610267      339157      960548 
## CHEVENEMENT     MADELIN   LAGUILLER  BESANCENOT 
##     1518568     1113551     1630118     1210562
(the two candidates that ran the second round of the election were Jacques Chirac and the far-right candidate Jean-Marie Le Pen)
set.seed(01091407)
korresp.som <- trainSOM(x.data=presidentielles2002, dimension=c(8,8),
                        type="korresp", scaling="chi2", nb.save=10,
                        radius.type="letremy")
korresp.som
##       Self-Organizing Map object...
##          online learning, type: korresp 
##          8 x 8 grid with square topology
##          neighbourhood type: letremy 
##          distance type: letremy
As the energy is registered during the intermediate backups, we can have a look at its evolution
plot(korresp.som, what="energy")
which is stabilized during the last 100 iterations.
The clustering component contains the final classification of the dataset. As both row and column variables are classified, the length of the resulting vector is equal to the sum of the number of rows and the number of columns.
NB: The clustering component shows first the column variables (here, the candidates) and then the row variables (here, the departements).
The following table indicates which graphics are available for a korresp SOM.
| Type | Energy | Obs | Prototypes | Add | Super Cluster | 
|---|---|---|---|---|---|
| no type | x | ||||
| hitmap | x | x | |||
| color | x2 | x2 | |||
| lines | x2 | x2 | |||
| barplot | x | ||||
| radar | x | ||||
| pie | |||||
| boxplot | |||||
| 3d | x2 | ||||
| poly.dist | x | x | |||
| umatrix | x | ||||
| smooth.dist | x | ||||
| words | |||||
| names | x | ||||
| graph | |||||
| mds | x | x | |||
| grid.dist | x | ||||
| grid | x | ||||
| dendrogram | x | ||||
| dendro3d | x | 
In the column “Prototypes”, a plot marked “x2” means that this plot is available for both row and column variables. In the “Super Cluster” column, a “x2” cell means the plot is available for both data set variables and additional variables.
korresp.som$clustering
##                   MEGRET                   LEPAGE               GLUCKSTEIN 
##                       15                       16                        8 
##                   BAYROU                   CHIRAC                   LE_PEN 
##                       32                       54                        9 
##                  TAUBIRA              SAINT_JOSSE                   MAMERE 
##                        6                       12                       22 
##                   JOSPIN                   BOUTIN                      HUE 
##                       57                       16                       21 
##              CHEVENEMENT                  MADELIN                LAGUILLER 
##                       22                       23                       21 
##               BESANCENOT                      ain                    aisne 
##                       21                       27                       18 
##                   allier  alpes_de_haute_provence             hautes_alpes 
##                       43                       33                       41 
##          alpes_maritimes                  ardeche                 ardennes 
##                        2                       35                       25 
##                   ariege                     aube                     aude 
##                       33                       25                       26 
##                  aveyron         bouches_du_rhone                 calvados 
##                       43                        3                       52 
##                   cantal                 charente        charente_maritime 
##                       42                       43                       52 
##                     cher                  correze                corse_sud 
##                       35                       51                       41 
##              haute_corse                cote_d'or            cotes_d'armor 
##                       41                       27                       44 
##                   creuse                 dordogne                    doubs 
##                       42                       43                       27 
##                    drome                     eure             eure_et_loir 
##                       26                       18                       26 
##                finistere                     gard            haute_garonne 
##                       61                       18                       28 
##                     gers                  gironde                  herault 
##                       34                        4                       19 
##          ille_et_vilaine                    indre          indre_et_loire_ 
##                       61                       34                       44 
##                    isere                     jura                   landes 
##                       28                       25                       43 
##             loir_et_cher                    loire              haute_loire 
##                       35                       27                       33 
##         loire_atlantique                   loiret                      lot 
##                       61                       27                       34 
##          lot_et_garonne_                   lozere          maine_et_loire_ 
##                       34                       41                       60 
##                   manche                    marne              haute_marne 
##                       52                       27                       33 
##                  mayenne       meurthe_et_moselle                    meuse 
##                       51                       27                       33 
##                 morbihan                  moselle                   nievre 
##                       52                        2                       33 
##                     nord                     oise                     orne 
##                       21                       27                       35 
##            pas_de_calais              puy_de_dome     pyrenees_atlantiques 
##                       13                       44                       52 
##          hautes_pyrenees      pyrenees_orientales                 bas_rhin 
##                       34                       26                       40 
##                haut_rhin                    rhone              haute_saone 
##                       27                       39                       25 
##          saone_et_loire_                   sarthe                   savoie 
##                       27                       44                       26 
##             haute_savoie                    paris          seine_maritime_ 
##                       27                       30                       13 
##          seine_et_marne_                 yvelines              deux_sevres 
##                       28                       48                       43 
##                    somme                     tarn          tarn_et_garonne 
##                       12                       43                       34 
##                      var                 vaucluse                   vendee 
##                        2                       18                       52 
##                   vienne             haute_vienne                   vosges 
##                       43                       43                       26 
##                    yonne    territoire_de_belfort                  essonne 
##                       26                       41                       28 
##          hauts_de_seine_        seine_saint-denis             val_de_marne 
##                       37                       28                       28 
##               val_d'oise               guadeloupe               martinique 
##                       28                        6                        6 
##                   guyane               la_reunion                  mayotte 
##                        6                       57                       41 
##       nouvelle_caledonie      polynesie_francaise saint_pierre_et_miquelon 
##                       50                       50                       41 
##         wallis_et_futuna   francais_de_l'etranger 
##                       41                       51
The resulting distribution of the clustering on the map can also be visualized by a hitmap:
plot(korresp.som, what="obs", type="hitmap")
For a more precise view, "names" plot is implemented: it prints, 
in each neuron, the names of the variables assigned to it ; in the korresp SOM, 
both row and column variable names are printed.
plot(korresp.som, what="obs", type="names", scale=c(0.9,0.5))
The map is divided into two main parts: minor candidates are classified at its top left hand side whereas the first main candidates CHIRAC, LE PEN and JOSPIN are classified at the bottom right hand side of the map, in three different parts of this corner. Some strinking facts are:
the position of TAUBIRA, opposed to that of JOSPIN from who she is very close from a political point of view. She is associated with Guyane, Guadeloupe and Martinique, overseas departements;
the position of HUE, LAGUILLER and BESANCENOT, far left candidates, all associated in the same cluster and close to traditional departements voting for far left candidates: Nord, Pas de Calais and Seine Maritime ;
the fact that most departement are classified at the bottom right hand side of the map with the three main candidates.
Some graphics from the numeric SOM algorithm are still available in the korresp 
case. They are detailed below. As the resulting clustering provides the 
classification for both rows and columns, a new argument view is used to 
specify which one should be considered. Its possible values are either 
"r" for row variables (the default value) or "c" for column 
variables.
Three representations are available:
view 
argument is used)# plot the line prototypes (106 French departements)
plot(korresp.som, what="prototypes", type="lines", view="r", print.title=TRUE)
# plot the column prototypes (16 candidates)
plot(korresp.som, what="prototypes", type="lines", view="c", print.title=TRUE)
The peaks in neurons 5 and 6 correspond, in the row view, to the overseas departements and, in the column view, to the candidate TAUBIRA. In the column views, the two peaks clearly identified in the bottom right side clusters correspond to the two “main” tranditional candidates JOSPIN and CHIRAC (respectively, left and right candidates).
A more precise individual view are given with the graphics “color” and “3d”, here drawn, as an example for the candidate “Le Pen” and for the departement “Martinique”.
variable) is represented on the map;"color".par(mfrow=c(1,2))
plot(korresp.som, what="prototypes", type="color", variable="TAUBIRA")
plot(korresp.som, what="prototypes", type="3d", variable="martinique")
The first graphic shows that TAUBIRA obtained her best scores in the departements located at the left hand side of the map. The second graphic shows that the candidates that obtained the higher scores in Martinique are located in the bottom right hand side of the map.
The graphics can also be drawn by giving the variable number and its type, either “r” or “c” (here, as an example, CHIRAC who is the 5th candidate):
par(mfrow=c(1,2))
plot(korresp.som, what="prototypes", type="color", variable=5, view="c")
plot(korresp.som, what="prototypes", type="3d", variable=5, view="c")
Hence CHIRAC obtained more votes in departement located at the top of the map and has his lowest scores in departements Jura, Aubde, Haute Saone, Ardennes, Alpes de Haute Provence, Ariege, … located at the bottom (middle) of the map.
These graphics are exactly the same as in the numerical case:
"poly.dist" represents the distances between neighboring prototypes with
polygons plotted for each cell of the grid. The smaller the distance between 
a polygon's vertex and a cell border, the closer the pair of prototypes.
The colors indicates the number of observations in the neuron (white=empty);
"umatrix" fills the neurons of the grid using colors that represent
the average distance between the current prototype and its neighbors (darker
colors for prototypes that are farther from their neighbors);
"smooth.dist" plots the mean distance between the current prototype and 
its neighbors with a color gradation;
"mds" plots the number of the neuron on a map according to a Multi
Dimensional Scaling (MDS) projection;
"grid.dist" plots a point for each pair of prototypes, with x 
coordinates representing the distance between the prototypes in the 
input space, and y coordinates representing the distance between the 
corresponding neurons on the grid.
plot(korresp.som, what="prototypes", type="poly.dist", print.title=TRUE)
plot(korresp.som, what="prototypes", type="umatrix", print.title=TRUE)
plot(korresp.som, what="prototypes", type="smooth.dist", print.title=TRUE)
plot(korresp.som, what="prototypes", type="mds")
plot(korresp.som, what="prototypes", type="grid.dist")
Neuron 6 which has been already picked out in the section Clustering interpretation for having prototypes rather different than the rest of the map shows a larger distance to the others.
quality(korresp.som)
## $topographic
## [1] 0.1415094
## 
## $quantization
## [1] 11292.83
By default, the quality function calculates both quantization and topographic 
errors. It is also possible to specify which one you want to obtain, by using
the argument quality.type.
The topographic error value varies between 0 (good projection quality) and 1 (poor projection quality). Here, the topographic quality of the mapping is rather good with a topographic error equal to 0.142.
The quantization error is an unbounded positive number. The closer from 0 it is, the better the projection quality is.
In the SOM algorithm, the number of clusters is necessarily close to the number of neurons on the grid (not necessarily equal as some neurons may have no observations assigned to them). This - quite large - number may not suit the original data for a clustering purpose.
A usual way to address clustering with SOM is to perform a hierarchical
clustering on the prototypes. This clustering is directly available in the
package SOMbrero using the function superClass. To do so, you can
first have a quick overview to decide on the number of super clusters which 
suits your data.
plot(superClass(korresp.som))
## Warning in plot.somSC(superClass(korresp.som)): Impossible to plot the rectangles: no super clusters.
By default, the function plots both a dendrogram and the evolution of the
percentage of explained variance. Here, 3 super clusters seem to be a good
choice. The output of superClass is a somSC class object.
Basic functions have been defined for this class:
my.sc <- superClass(korresp.som, k=3)
summary(my.sc)
## 
##    SOM Super Classes
##      Initial number of clusters :  64 
##      Number of super clusters   :  3 
## 
## 
##   Frequency table
##  1  2  3 
## 13 24 27 
## 
##   Clustering
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
##  1  1  2  2  2  2  2  2  1  1  2  2  2  2  2  2  1  1  1  2  2  2  2  2  1 
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
##  1  1  3  2  2  2  2  1  1  1  3  3  2  2  2  3  3  3  3  3  3  3  3  3  3 
## 51 52 53 54 55 56 57 58 59 60 61 62 63 64 
##  3  3  3  3  3  3  3  3  3  3  3  3  3  3
plot(my.sc, plot.var=FALSE)
Like plot.somRes, the function plot.somSC has an 
argument 'type' which offers many different plots and can thus be 
combined with most of the graphics produced by plot.somSC:
Case "grid" fills the grid with colors according to the super clustering 
(and can provide a legend).
Case "dendro3d" plots a 3d dendrogram.
plot(my.sc, type="grid", plot.legend=TRUE)
plot(my.sc, type="dendro3d")
The three super-clusters correspond to traditional votes (blue), far right votes (green) and votes for minor candidates (orange).
A couple of plots from plot.somRes are also available for the super 
clustering. Some identify the super clusters with colors:
plot(my.sc, type="hitmap", plot.legend=TRUE)
plot(my.sc, type="lines", print.title=TRUE)
plot(my.sc, type="lines", print.title=TRUE, view="c")
plot(my.sc, type="mds", plot.legend=TRUE)
And some others identify the super clusters with titles:
plot(my.sc, type="color", view="r", variable="correze")
plot(my.sc, type="color", view="c", variable="JOSPIN")
plot(my.sc, type="poly.dist")
Let us consider the first super cluster. It contains departements which were crucial in promoting the far right candidate LE PEN, unexpected winner of the first round:
##  [1] "LE_PEN"                  "ain"                    
##  [3] "aisne"                   "alpes_de_haute_provence"
##  [5] "alpes_maritimes"         "ardeche"                
##  [7] "ardennes"                "ariege"                 
##  [9] "aube"                    "aude"                   
## [11] "cher"                    "cote_d'or"              
## [13] "doubs"                   "drome"                  
## [15] "eure"                    "eure_et_loir"           
## [17] "gard"                    "gers"                   
## [19] "herault"                 "indre"                  
## [21] "jura"                    "loir_et_cher"           
## [23] "loire"                   "haute_loire"            
## [25] "loiret"                  "lot"                    
## [27] "lot_et_garonne_"         "marne"                  
## [29] "haute_marne"             "meurthe_et_moselle"     
## [31] "meuse"                   "moselle"                
## [33] "nievre"                  "oise"                   
## [35] "orne"                    "hautes_pyrenees"        
## [37] "pyrenees_orientales"     "haut_rhin"              
## [39] "haute_saone"             "saone_et_loire_"        
## [41] "savoie"                  "haute_savoie"           
## [43] "tarn_et_garonne"         "var"                    
## [45] "vaucluse"                "vosges"                 
## [47] "yonne"