An implementation of common higher-order functions with syntactic sugar for anonymous functions. It also provides a link to ‘dplyr’ for common transformations on data frames, working around non-standard evaluation by default.
devtools::install_github("wahani/dat")
install.packages("dat")

R CMD check. And you don’t like that.
dplyr does not respect the class of the object it operates on; the class attribute changes on the fly.
Neither dplyr nor data.table plays nicely with S4, but you really, really want an S4 data.table or tbl_df.
rlist and purrr.

dplyr

The examples are from the introductory vignette of dplyr. You still work with data frames, so you can simply mix in dplyr features whenever you need them. The functions filtar, mutar and sumar are R CMD check friendly replacements for the corresponding versions in dplyr. For select you can use extract. The function names are chosen so that they are similar to, but do not conflict with, dplyr's function names, so dplyr can be safely attached to the search path.

library("nycflights13")
library("dat")
## Loading required package: aoos
## 
## Attaching package: 'dat'
## The following object is masked from 'package:base':
## 
##     replace

filtar can be used as a replacement for filter and slice. When you reference a variable in the data itself, you indicate this by using a one-sided formula.

filtar(flights, ~ month == 1 & day == 1)
filtar(flights, 1:10)

And for sorting:

filtar(flights, ~ order(year, month, day))
## # A tibble: 336,776 x 19
##     year month   day dep_t… sched_… dep_d… arr_… sched… arr_d… carr… flig…
##    <int> <int> <int>  <int>   <int>  <dbl> <int>  <int>  <dbl> <chr> <int>
##  1  2013     1     1    517     515   2.00   830    819  11.0  UA     1545
##  2  2013     1     1    533     529   4.00   850    830  20.0  UA     1714
##  3  2013     1     1    542     540   2.00   923    850  33.0  AA     1141
##  4  2013     1     1    544     545  -1.00  1004   1022 -18.0  B6      725
##  5  2013     1     1    554     600  -6.00   812    837 -25.0  DL      461
##  6  2013     1     1    554     558  -4.00   740    728  12.0  UA     1696
##  7  2013     1     1    555     600  -5.00   913    854  19.0  B6      507
##  8  2013     1     1    557     600  -3.00   709    723 -14.0  EV     5708
##  9  2013     1     1    557     600  -3.00   838    846 - 8.00 B6       79
## 10  2013     1     1    558     600  -2.00   753    745   8.00 AA      301
## # ... with 336,766 more rows, and 8 more variables: tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>

You can use characters, logicals, regular expressions and functions to select columns. Regular expressions are indicated by a leading “^”. Characters are simply passed to dplyr::select_.

flights %>%
  extract(c("year", "month", "day")) %>%
  extract("year:day") %>%
  extract("^day$") %>%
  extract(is.numeric)

The main difference between mutate and mutar is that you use a ~ instead of =.

mutar(
  flights,
  gain ~ arr_delay - dep_delay,
  speed ~ distance / air_time * 60
)

Grouping data is handled within mutar:

mutar(flights, n ~ n(), by = "month")

sumar(flights, delay ~ mean(dep_delay, na.rm = TRUE), by = "month")

You can also provide additional arguments to a formula. This is especially helpful when you want to pass arguments from a function to such expressions. The additional argument, given after the |, can be anything you can use to select columns (a character, a regular expression, a function) or a named list where each element is a character.

sumar(
  flights,
  .n ~ mean(.n, na.rm = TRUE) | "^.*delay$",
  x ~ mean(x, na.rm = TRUE) | list(x = "arr_time"),
  by = "month"
)
## # A tibble: 12 x 4
##    month dep_delay arr_delay arr_time
##    <int>     <dbl>     <dbl>    <dbl>
##  1     1     10.0      6.13      1523
##  2     2     10.8      5.61      1522
##  3     3     13.2      5.81      1510
##  4     4     13.9     11.2       1501
##  5     5     13.0      3.52      1503
##  6     6     20.8     16.5       1468
##  7     7     21.7     16.7       1456
##  8     8     12.6      6.04      1495
##  9     9      6.72   - 4.02      1504
## 10    10      6.24   - 0.167     1520
## 11    11      5.44     0.461     1523
## 12    12     16.6     14.9       1505

Using this package you can create S4 classes that contain a data frame (or a data.table) and use the interface to dplyr. Neither dplyr nor data.table supports integration with S4. The main function here is mutar, which is generic enough to link to subsetting of rows and columns as well as to mutate and summarise. In the background, dplyr's ability to work on a data.table is used.

library("data.table")
setClass("DataTable", "data.table")
DataTable <- function(...) {
  new("DataTable", data.table::data.table(...))
}
setMethod("[", "DataTable", mutar)
dtflights <- do.call(DataTable, nycflights13::flights)
dtflights[1:10, "year:day"]
dtflights[n ~ n(), by = "month"]
dtflights[n ~ n(), sby = "month"]
dtflights %>%
  filtar(~month > 6) %>%
  mutar(n ~ n(), by = "month") %>%
  sumar(n ~ first(n), by = "month")

Inspired by rlist and purrr, some low-level operations on vectors are supported. The aim here is to integrate syntactic sugar for anonymous functions. Furthermore, the functions should support the use of pipes.

map and flatmap as replacements for the apply functions
extract for subsetting
replace for replacing elements in a vector

What we can do with map:

map(1:3, ~ .^2)
flatmap(1:3, ~ .^2)
map(1:3 ~ 11:13, c) # zip
dat <- data.frame(x = 1, y = "")
map(dat, x ~ x + 1, is.numeric)

What we can do with extract:

extract(1:10, ~ . %% 2 == 0) %>% sum
extract(1:15, ~ 15 %% . == 0)
l <- list(aList = list(x = 1), aAtomic = "hi")
extract(l, "^aL")
extract(l, is.atomic)

What we can do with replace:

replace(c(1, 2, NA), is.na, 0)
replace(c(1, 2, NA), rep(TRUE, 3), 0)
replace(c(1, 2, NA), 3, 0)
replace(list(x = 1, y = 2), "x", 0)
replace(list(x = 1, y = 2), "^x$", 0)
replace(list(x = 1, y = "a"), is.character, NULL)
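
The one-sided formulas passed to map and extract above can be pictured as ordinary functions whose single argument is named `.`. The following base R sketch illustrates that idea only. `as_fun` is a hypothetical helper written for this illustration; it is not part of dat, and the package's actual implementation may differ.

```r
# Hypothetical helper (not part of dat): turn a one-sided formula
# such as ~ .^2 into a function of a single argument named `.`.
as_fun <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)
  fun <- function(.) NULL
  body(fun) <- f[[2L]]               # the right-hand side becomes the function body
  environment(fun) <- environment(f) # variables resolve where the formula was written
  fun
}

square <- as_fun(~ .^2)
lapply(1:3, square)                  # a list of squares, in the spirit of map(1:3, ~ .^2)
unlist(lapply(1:3, square))          # the same values as a flat vector
Filter(as_fun(~ . %% 2 == 0), 1:10)  # keeps the even numbers, like the first extract example
```

Because the function keeps the formula's environment, a formula such as `~ . + k` can refer to a variable `k` defined next to it, which is what makes the shorthand usable inside other functions.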