The CpDyna package is dedicated to dynamic graphs and implements the algorithm described in (Corneli, Latouche, and Rossi 2018). In that paper, a model based approached is proposed in order to cluster the vertices of a dynamic graph, while detecting multiple change points in the interaction intensities. In this vignette, we show how tu use the package on simulated and real datasets.
set.seed(1)library(CpDyna)The package contains 1 simulated dataset and 2 real datasets.
Gnu is a \((3\times40)\) simulated data matrix. It reports on the first row \(40\) interacion times. One interaction time corresponds to an undirected interaction from a source node to a destination node. Source nodes are reported on the second row of Gnu, destination nodes on the third one.
Loading the simulated dataset:
data("Gnu")A custom partition is created based on the time horizon in Gnu:
tail(Gnu[1,])## [1] 99.43402 99.44803 99.73759 99.82499 99.87731 99.97434
custom_ptn <- c(1:100)Call to the function ModFit:
res <- ModFit(Gnu,
4,
4,
MinimalPartition = FALSE,
N_initializations = 20,
custom_partition = custom_ptn)Looking at the estimated clusters/change points:
res$est_z## [1] 3 3 3 3 3 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [36] 2 2 2 2 2
res$est_cp## [1] 20 30 60 100
and plotting the results:
par(mfrow = c(1,4))
ModPlot(Gnu, res, type = "adjacency")## Warning: package 'knitr' was built under R version 3.4.3
Loading the real datasets:
data(wtab)
data(stations_df)In order to take a look at the dynamics, we plot an histogram of the interaction dates:
par(mfrow=c(1,1), bty="n", cex.axis=1.5, cex.lab=1.5)
hist(wtab$Start.Date, breaks = 60, xaxt="n", yaxt="n", main = "", xlab = "time (hours)")
last <- 86340 # roughly 24h
time.seq<-seq(from=1, to=last, by=3600)
axis(1, at=time.seq, lab=c(0:23), lwd=.2)
axis(2, lwd=.2)Gnu is slightly manipulated to fit the ModFit function format:
Gnu <- as.matrix(wtab)
Gnu <- t(Gnu)Finally, the ModFit function is called to perform node clustering and change point detection:
N <- max(Gnu[2,], Gnu[3,]) # number of nodes/stations
step <- 900 # in seconds, corresponds to 15 minutes
custom_ptn <- seq(from = step, to = max(Gnu[1,]), by = step) # user defind partition
res <- ModFit(Gnu, 4, 4, eps=10^(-1),MinimalPartition = FALSE, custom_partition = custom_ptn, N_initializations = 1, verbose = TRUE)The estimated change points are added to the histogram:
cp <- res$est_cp
abline(v = cp, col="red", lwd=1.5)library(mapview)
library(sp)
new_df <- stations_df[, c(1,2,4,5)]
#converting columns from factor to proper formats
sq<-c(1,3,4)
for(i in 1:length(sq)) new_df[,sq[i]]<-as.numeric(levels(new_df[,sq[i]]))[new_df[,sq[i]]]
new_df[,2]<-as.character(levels(new_df[,2]))[new_df[,2]]
WhoIsWho <- seq(1:N)
match_pos <- res$est_z[match(new_df$id, WhoIsWho)] # for each station (id) in new_df, i look for its position in WhoIsWho and take outz at this position
new_df<-cbind(new_df, match_pos)
pos_na <- which(is.na(new_df$match_pos))
new_df <- new_df[-pos_na, ]
tav <-c ("RoyalBlue3","red", "green3","gold")
sub_df <- new_df[,c(3,4)]
coordinates(sub_df) <- ~long+lat
proj4string(sub_df) <- CRS("+init=epsg:4326")
mapview(sub_df, color=tav[new_df$match_pos], cex= 0.5, alpha = 0.8, lwd=6)Corneli, Marco, Pierre Latouche, and Fabrice Rossi. 2018. “Multiple Change Points Detection and Clustering in Dynamic Networks.” Statistics and Computing 28 (5): 989–1007. doi:10.1007/s11222-017-9775-1.