In the introduction we have seen that a dependency network can be built using get_dep(). While it is theoretically possible to use get_dep() iteratively to obtain all dependencies of all packages available on CRAN, it is not practical to do so. This package provides two functions, get_dep_all_packages() and get_graph_all_packages(), for obtaining the dependencies of all CRAN packages directly, as well as an example dataset.
The example dataset cran_dependencies contains all dependencies as of 2020-05-09.
data(cran_dependencies)
cran_dependencies
#> # A tibble: 211,381 × 4
#>    from  to             type     reverse
#>    <chr> <chr>          <chr>    <lgl>  
#>  1 A3    xtable         depends  FALSE  
#>  2 A3    pbapply        depends  FALSE  
#>  3 A3    randomForest   suggests FALSE  
#>  4 A3    e1071          suggests FALSE  
#>  5 aaSEA DT             imports  FALSE  
#>  6 aaSEA networkD3      imports  FALSE  
#>  7 aaSEA shiny          imports  FALSE  
#>  8 aaSEA shinydashboard imports  FALSE  
#>  9 aaSEA magrittr       imports  FALSE  
#> 10 aaSEA Bios2cor       imports  FALSE  
#> # ℹ 211,371 more rows
dplyr::count(cran_dependencies, type, reverse)
#> # A tibble: 8 × 3
#>   type       reverse     n
#>   <chr>      <lgl>   <int>
#> 1 depends    FALSE   11123
#> 2 depends    TRUE     9672
#> 3 imports    FALSE   57617
#> 4 imports    TRUE    51913
#> 5 linking to FALSE    3433
#> 6 linking to TRUE     3721
#> 7 suggests   FALSE   35018
#> 8 suggests   TRUE    38884
This is essentially a snapshot of CRAN. We can obtain all the current dependencies using get_dep_all_packages(), which requires no arguments:
df0.cran <- get_dep_all_packages()
head(df0.cran)
#>       from         to    type reverse
#> 3 AATtools   magrittr imports   FALSE
#> 4 AATtools      dplyr imports   FALSE
#> 5 AATtools doParallel imports   FALSE
#> 6 AATtools    foreach imports   FALSE
#> 7   ABACUS    ggplot2 imports   FALSE
#> 8   ABACUS      shiny imports   FALSE
dplyr::count(df0.cran, type, reverse) # numbers in general larger than above
#>          type reverse     n
#> 1     depends   FALSE 10436
#> 2     depends    TRUE  9028
#> 3    enhances   FALSE   611
#> 4    enhances    TRUE   625
#> 5     imports   FALSE 95904
#> 6     imports    TRUE 88142
#> 7  linking to   FALSE  5532
#> 8  linking to    TRUE  5933
#> 9    suggests   FALSE 61165
#> 10   suggests    TRUE 67632
As of 2023-11-21, there are 0 packages that have all 10 types of dependencies, and 5 packages that have 9 types of dependencies: Matrix, bigmemory, miceadds, rstan, xts.
We can build a dependency network using get_graph_all_packages(). Furthermore, we can verify that the forward and reverse dependency networks are (almost) the same, by looking at their size (number of edges) and order (number of nodes).
g0.depends <- get_graph_all_packages(type = "depends")
g0.depends
#> IGRAPH 887dafa DN-- 4585 7434 -- 
#> + attr: name (v/c)
#> + edges from 887dafa (vertex names):
#>  [1] A3          ->xtable     A3          ->pbapply    abc         ->abc.data  
#>  [4] abc         ->nnet       abc         ->quantreg   abc         ->MASS      
#>  [7] abc         ->locfit     ABCp2       ->MASS       abctools    ->abc       
#> [10] abctools    ->abind      abctools    ->plyr       abctools    ->Hmisc     
#> [13] abd         ->nlme       abd         ->lattice    abd         ->mosaic    
#> [16] abodOutlier ->cluster    abundant    ->glasso     Ac3net      ->data.table
#> [19] acc         ->mhsmm      accelmissing->mice       accelmissing->pscl      
#> [22] accessrmd   ->ggplot2    accrualPlot ->lubridate  acdcR       ->raster    
#> + ... omitted several edges
We could obtain essentially the same graph, but with the direction of the edges reversed, by specifying type = "reverse depends" and storing the result as g0.rev_depends (not run here).
The dependency words accepted by the argument type are the same as in get_dep(). The two networks’ size and order should be very close, if not identical, to each other. Because of the dependency direction, their edge lists should be the same but with the column names from and to swapped.
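The column swap can be illustrated on a toy edge list. The sketch below uses base R only and made-up package names (pkgA, pkgB and pkgC are hypothetical); it is not output from CRAN.

```r
# Toy forward edge list: pkgA depends on pkgB and pkgC (hypothetical names)
forward <- data.frame(
  from = c("pkgA", "pkgA"),
  to   = c("pkgB", "pkgC"),
  stringsAsFactors = FALSE
)

# The matching reverse edge list: pkgB and pkgC are each depended on by pkgA
reverse <- data.frame(
  from = c("pkgB", "pkgC"),
  to   = c("pkgA", "pkgA"),
  stringsAsFactors = FALSE
)

# Swapping the two columns of the reverse list should recover the forward list
swapped <- data.frame(
  from = reverse$to,
  to   = reverse$from,
  stringsAsFactors = FALSE
)

identical(swapped, forward)
```

On real data the two lists may differ in a few rows, which is why the text says "(almost) the same".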
For verification, the exact same graphs can be obtained by filtering the data frame for the required dependency and applying df_to_graph():
g1.depends <- df0.cran |>
  dplyr::filter(type == "depends" & !reverse) |>
  df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.depends # same as g0.depends
#> IGRAPH 7dcb402 DN-- 4585 7434 -- 
#> + attr: name (v/c), type (e/c), reverse (e/l)
#> + edges from 7dcb402 (vertex names):
#>  [1] A3          ->xtable     A3          ->pbapply    abc         ->abc.data  
#>  [4] abc         ->nnet       abc         ->quantreg   abc         ->MASS      
#>  [7] abc         ->locfit     ABCp2       ->MASS       abctools    ->abc       
#> [10] abctools    ->abind      abctools    ->plyr       abctools    ->Hmisc     
#> [13] abd         ->nlme       abd         ->lattice    abd         ->mosaic    
#> [16] abodOutlier ->cluster    abundant    ->glasso     Ac3net      ->data.table
#> [19] acc         ->mhsmm      accelmissing->mice       accelmissing->pscl      
#> [22] accessrmd   ->ggplot2    accrualPlot ->lubridate  acdcR       ->raster    
#> + ... omitted several edges
If we extract the equivalent graph of reverse dependencies, we should obtain the same graph as before (had it been extracted above):
# Not run
g1.rev_depends <- df0.cran |>
  dplyr::filter(type == "depends" & reverse) |>
  df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.rev_depends # should be same as g0.rev_depends
The networks obtained above should all be directed acyclic graphs.
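The acyclicity check itself is not shown here. A minimal sketch, assuming the igraph package is installed, applies igraph::is_dag() to a toy graph with hypothetical package names; the same call could presumably be applied to g0.depends or g1.depends.

```r
# Toy dependency edge list with hypothetical package names
edges <- data.frame(
  from = c("pkgA", "pkgA", "pkgB"),
  to   = c("pkgB", "pkgC", "pkgC")
)
g_toy <- igraph::graph_from_data_frame(edges, directed = TRUE)

# TRUE when the directed graph contains no cycle
toy_is_dag <- igraph::is_dag(g_toy)

# Adding the edge pkgC -> pkgA closes a cycle, so the check should now fail
edges_cyclic <- rbind(edges, data.frame(from = "pkgC", to = "pkgA"))
g_cyclic <- igraph::graph_from_data_frame(edges_cyclic, directed = TRUE)
cyclic_is_dag <- igraph::is_dag(g_cyclic)

c(toy = toy_is_dag, cyclic = cyclic_is_dag)
```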
One may notice that there are external reverse dependencies which won’t appear in the forward dependencies if the scraping is limited to CRAN packages. We can find these external reverse dependencies by setting nodelist = NULL in df_to_graph():
df1.rev_depends <- df0.cran |>
  dplyr::filter(type == "depends" & reverse) |>
  df_to_graph(nodelist = NULL, gc = FALSE) |>
  igraph::as_data_frame() # to obtain the edge list
df1.depends <- df0.cran |>
  dplyr::filter(type == "depends" & !reverse) |>
  df_to_graph(nodelist = NULL, gc = FALSE) |>
  igraph::as_data_frame()
dfa.diff.depends <- dplyr::anti_join(
  df1.rev_depends,
  df1.depends,
  c("from" = "to", "to" = "from")
)
head(dfa.diff.depends)
#>     from         to    type reverse
#> 1  abind     baySeq depends    TRUE
#> 2  abind     CNORdt depends    TRUE
#> 3  abind FISHalyseR depends    TRUE
#> 4  abind   riboSeqR depends    TRUE
#> 5  abind   S4Arrays depends    TRUE
#> 6 adabag   m6Aboost depends    TRUE
This means we are extracting the reverse dependencies whose forward equivalents are not listed. The column to shows the packages external to CRAN. On the other hand, if we apply dplyr::anti_join() with the order of the two edge lists switched,
dfb.diff.depends <- dplyr::anti_join(
  df1.depends,
  df1.rev_depends,
  c("from" = "to", "to" = "from")
)
head(dfb.diff.depends)
#>                 from       to    type reverse
#> 1           abctools parallel depends   FALSE
#> 2                abd     grid depends   FALSE
#> 3 AcceptanceSampling  methods depends   FALSE
#> 4 AcceptanceSampling    stats depends   FALSE
#> 5              acdcR    stats depends   FALSE
#> 6               acid    stats depends   FALSE
the column to lists those which are not on the page of available packages on CRAN (anymore). These are either defunct or core packages.
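Both anti-joins hinge on matching from in one edge list against to in the other, and vice versa. A minimal sketch on toy data, assuming dplyr is installed and using hypothetical package names (extPkg stands in for a package outside the scraped set):

```r
# Toy reverse edge list: 'from' is depended on by 'to'
rev_edges <- data.frame(
  from = c("pkgB", "pkgB"),
  to   = c("pkgA", "extPkg")
)

# Toy forward edge list: only pkgA -> pkgB is recorded
fwd_edges <- data.frame(from = "pkgA", to = "pkgB")

# Keep the reverse edges whose swapped counterpart is absent from the forward list
unmatched <- dplyr::anti_join(
  rev_edges,
  fwd_edges,
  by = c("from" = "to", "to" = "from")
)
unmatched
```

Only the edge involving extPkg is expected to survive, since pkgB -> pkgA has the forward counterpart pkgA -> pkgB.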
Using the data frame df0.cran, we can also obtain the degree for each package and each type:
df0.summary <- dplyr::count(df0.cran, from, type, reverse)
head(df0.summary)
#>       from     type reverse n
#> 1       A3  depends   FALSE 2
#> 2       A3 suggests   FALSE 2
#> 3 AATtools  imports   FALSE 4
#> 4   ABACUS  imports   FALSE 2
#> 5   ABACUS suggests   FALSE 2
#> 6  ABC.RAP  imports   FALSE 3
We can look at the “winner” in each of the reverse dependencies:
df0.summary |>
  dplyr::filter(reverse) |>
  dplyr::group_by(type) |>
  dplyr::top_n(1, n)
#> # A tibble: 5 × 4
#> # Groups:   type [5]
#>   from    type       reverse     n
#>   <chr>   <chr>      <lgl>   <int>
#> 1 Rcpp    linking to TRUE     2958
#> 2 ggplot2 depends    TRUE      465
#> 3 ggplot2 imports    TRUE     3687
#> 4 knitr   suggests   TRUE     9574
#> 5 shiny   enhances   TRUE       14
This is not surprising given the nature of each package. To take the summarisation one step further, we can obtain the frequencies of the degrees, and visualise the empirical degree distribution neatly on the log-log scale:
df1.summary <- df0.summary |>
  dplyr::count(type, reverse, n)
#> Storing counts in `nn`, as `n` already present in input
#> ℹ Use `name = "new_name"` to pick a new name.
gg0.summary <- df1.summary |>
  dplyr::mutate(reverse = ifelse(reverse, "reverse", "forward")) |>
  ggplot2::ggplot() +
  ggplot2::geom_point(ggplot2::aes(n, nn)) +
  ggplot2::facet_grid(type ~ reverse) +
  ggplot2::scale_x_log10() +
  ggplot2::scale_y_log10() +
  ggplot2::labs(x = "Degree", y = "Number of packages") +
  ggplot2::theme_bw(20)
gg0.summary
This shows that the reverse dependencies, in particular Reverse_depends and Reverse_imports, follow a power law, which is empirically observed in various academic fields.
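The log-log scale is the natural choice for this check. Suppose, as an assumed functional form (the exponent is not estimated here), that the number of packages $N(k)$ with degree $k$ is roughly proportional to $k^{-\alpha}$ for some constant $C > 0$ and exponent $\alpha > 0$. Taking logarithms gives

```latex
N(k) \approx C\,k^{-\alpha}
\quad\Longrightarrow\quad
\log N(k) \approx \log C - \alpha \log k ,
```

so the points would fall near a straight line with slope $-\alpha$ when both axes are logarithmic.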
We can now visualise (the giant component of) the CRAN network of Depends, using functions in the package visNetwork. To do this, we will need to convert the igraph object g0.depends to the node list and edge list as data frames.
prefix <- "https://CRAN.R-project.org/package=" # canonical form
degrees <- igraph::degree(g0.depends)
df0.nodes <- data.frame(id = names(degrees), value = degrees) |>
  dplyr::mutate(title = paste0('<a href=\"', prefix, id, '\">', id, '</a>'))
df0.edges <- igraph::as_data_frame(g0.depends, what = "edges")
We could use igraph::membership() and igraph::cluster_*() for community detection, and visualise the clusters using different colours. This, however, would take too much computing time and is therefore not shown here.
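For a sense of what such a step could look like, here is a minimal sketch on a small built-in graph rather than the CRAN network. The choice of igraph::cluster_walktrap() is an assumption made for illustration; any other member of the cluster_*() family could be swapped in, and the igraph package is assumed to be installed.

```r
# Small built-in example graph (Zachary's karate club), not CRAN data
g_small <- igraph::make_graph("Zachary")

# One member of the cluster_*() family, chosen here only for illustration
comm <- igraph::cluster_walktrap(g_small)

# Integer community label for each vertex
memb <- as.integer(igraph::membership(comm))

# One colour per community, expanded to one colour per vertex
node_colours <- grDevices::rainbow(max(memb))[memb]

table(memb)
```

The resulting vector has one colour per node and could be attached to a node data frame before plotting.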
By adding the column title in df0.nodes, we enable clicking the nodes and being directed to their CRAN pages, in the interactive visualisation below:
set.seed(2345L)
vis0 <- visNetwork::visNetwork(df0.nodes, df0.edges, width = "100%", height = "720px") |>
  visNetwork::visOptions(highlightNearest = TRUE) |>
  visNetwork::visEdges(arrows = "to", color = list(opacity = 0.5)) |>
  visNetwork::visNodes(fixed = TRUE) |>
  visNetwork::visIgraphLayout(layout = "layout_with_drl")
vis0
Methods in social network analysis, such as stochastic block models, can be applied to study the properties of the dependency network. Ideally, by analysing the dependencies of all CRAN packages, we can obtain a bird’s-eye view of the ecosystem. The number of reverse dependencies is modelled in this other vignette.