There are a number of areas where Crunch needs to represent an object
as belonging to one of many categories. The simplest and most common
example of this is the categories of a categorical variable. For a
categorical variable, the values of the variable can be one of a limited
set of categories and those categories are specified in the Crunch API
as metadata about the variable. These categoricals are similar to R’s
factors but are richer because Crunch categoricals can have
any number of missing values (compared to just NA for
factors), as well as a numeric representation that is
separate from the category ids (which is useful for things like income
bins, where you might put the middle of the bin as the value).
Moving beyond categorical variables, we need to be able
to represent a number of other properties, transformations, etc. in
a category-like way. One concrete example, and the one used most heavily, is
adding subtotals and headings to representations of categorical variables.
To do this, we have two families of S4 classes:
AbstractCategory and AbstractCategories.
Although subtotals and headings were the initial motivation for these
classes, they will allow for other types of representations and
manipulations in the future.
The core classes that all other classes inherit from are
AbstractCategory and AbstractCategories. The
first, AbstractCategory, is designed to represent a single
category, which might have a number of properties about it (what those
are will be explained in more detail below). The second,
AbstractCategories, is designed to hold more than one
AbstractCategory together to form a coherent group. As a
simple example, an AbstractCategories for binned income
could have 5 AbstractCategorys: <$25,000,
$25,000-$49,999, $50,000-$99,999, $100,000-$199,999, >$200,000. This
could be represented in R as:
income <- AbstractCategories(AbstractCategory(name = "<$25,000"),
                             AbstractCategory(name = "$25,000-$49,999"),
                             AbstractCategory(name = "$50,000-$99,999"),
                             AbstractCategory(name = "$100,000-$199,999"),
                             AbstractCategory(name = ">$200,000"))

An alternative (and shorter) way to instantiate this same
AbstractCategories is to pass lists, and the constructor
takes care of calling AbstractCategory on each one
(as below). Each of the child classes of AbstractCategories
(described in the sections below) has its own mapping from plural
container to singular entity constructor in the same way, so passing
Categories a list will result in a Categories
object full of Category objects.
income <- AbstractCategories(list(name = "<$25,000"),
                             list(name = "$25,000-$49,999"),
                             list(name = "$50,000-$99,999"),
                             list(name = "$100,000-$199,999"),
                             list(name = ">$200,000"))

Finally, there is a data argument for when you already have a
list of AbstractCategorys (or simply named lists!) that you want
to pass in (the same thing could also be accomplished with
do.call):
income_list <- list(list(name = "<$25,000"),
                    list(name = "$25,000-$49,999"),
                    list(name = "$50,000-$99,999"),
                    list(name = "$100,000-$199,999"),
                    list(name = ">$200,000"))
income <- AbstractCategories(data = income_list)

Any methods that are defined for the abstract classes will also work
on the subclasses. Child classes might have special override
methods defined for them, but for the most part, if a method can be used
on AbstractCategories or AbstractCategory, it
can be used on the child classes as well.
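The do.call route mentioned above can be sketched as follows, assuming the crunch package is installed. The sketch builds the same income object by splicing the lists in as separate arguments, then builds a Categories version (with illustrative ids, since every Category needs one) to show that names(), defined on the abstract class, works unchanged on the subclass.

```r
library(crunch)

# Plain named lists describing the five income bins
income_list <- list(
    list(name = "<$25,000"),
    list(name = "$25,000-$49,999"),
    list(name = "$50,000-$99,999"),
    list(name = "$100,000-$199,999"),
    list(name = ">$200,000")
)

# do.call() passes each element of income_list as a separate argument,
# which should match AbstractCategories(data = income_list)
income <- do.call(AbstractCategories, income_list)
length(income)
names(income)

# The same lists with illustrative ids added, built as the Categories subclass
with_ids <- Map(function(cat, i) c(cat, id = i), income_list, seq_along(income_list))
income_cats <- do.call(Categories, with_ids)

# names() comes from the abstract class and works on the subclass too
names(income_cats)
```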
AbstractCategories inherits from list and
AbstractCategory inherits from namedList, so
many of the same methods will work with both of them. This includes
using [, [[, [<-, and
[[<- to get and set subsets of
AbstractCategories, and $ and [[
to get the properties in an AbstractCategory.
lapply has also been defined for
AbstractCategories for easily iterating over all members.
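A short sketch of the list-style access and lapply() iteration described above, assuming the crunch package is installed and reusing the income bins from the earlier examples.

```r
library(crunch)

income <- AbstractCategories(
    list(name = "<$25,000"),
    list(name = "$25,000-$49,999"),
    list(name = "$50,000-$99,999"),
    list(name = "$100,000-$199,999"),
    list(name = ">$200,000")
)

# [[ extracts a single member; $ and [[ read its properties
first <- income[[1]]
first$name
first[["name"]]

# [ extracts a subset of members (here the middle three bins)
middle <- income[2:4]
names(middle)

# lapply() visits every member; unlist() flattens the collected names
bin_names <- unlist(lapply(income, function(cat) cat$name))
bin_names
```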
modifyCats also allows for modifying one
AbstractCategories object by updating with new information
from a second AbstractCategories object in the same way
that modifyList works, but crucially it does not recurse
into the AbstractCategory objects themselves.
Finally, there are a few custom methods that return the values of the
properties as either a vector of that property for each member (when
using the plural versions against AbstractCategories) or a
vector (typically of length one) for a single member (when using the
singular versions against AbstractCategory).
names returns the names associated with each
AbstractCategory in an AbstractCategories
object, and name returns the name of a single
AbstractCategory object. ids and
id follow the same pattern.
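The plural/singular pattern can be sketched on a small hypothetical Yes/No container, assuming the crunch package is installed. Integer ids (1L, 2L) are used so that the id type is unambiguous.

```r
library(crunch)

# A hypothetical two-member container whose members carry ids
yes_no <- AbstractCategories(
    list(name = "Yes", id = 1L),
    list(name = "No", id = 2L)
)

# Plural accessors on the container: one value per member
names(yes_no)
ids(yes_no)

# Singular accessors on one member: a length-one value
name(yes_no[[2]])
id(yes_no[[2]])
```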
Categories from a categorical variable are represented by the
Categories and Category classes. They inherit
directly from AbstractCategories and AbstractCategory,
respectively. Each Category must have a
name and an id, and can optionally have
numeric_value, missing, and
selected properties.
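A minimal sketch of a Categories object that uses the optional properties, read back with the accessors listed next. It assumes the crunch package is installed; the income bins, their midpoint values, and the "Refused" category are illustrative. "Refused" has no numeric_value, so its value is expected to be NA, as in the printed categories later in this document.

```r
library(crunch)

# Income bins whose numeric_value is the midpoint of the bin, plus a
# "Refused" category flagged as missing (it has no numeric_value)
income_cats <- Categories(
    list(name = "<$25,000", id = 1L, numeric_value = 12500, missing = FALSE),
    list(name = "$25,000-$49,999", id = 2L, numeric_value = 37500, missing = FALSE),
    list(name = "$50,000-$99,999", id = 3L, numeric_value = 75000, missing = FALSE),
    list(name = "Refused", id = 4L, missing = TRUE)
)

# Plural forms: one numeric_value / missing flag per category
values(income_cats)
is.na(income_cats)

# Singular forms on a single Category
value(income_cats[[1]])
is.na(income_cats[[4]])
```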
* values and value return the numeric_value property from Categories or a single Category, respectively.
* is.na returns the missing property from Categories or a single Category.
* is.selected returns the selected property from Categories or a single Category.

Insertions allow users to insert new categories into a variable or a
CrunchCube for display purposes. This is useful when the user would like
to show things like aggregates (e.g. subtotals) without manipulating the
underlying data (or creating a new variable). Insertions are defined as
part of the Crunch API (see the Transforms section below for an
explanation of where Insertions live). The Insertions
class is designed to mirror the Crunch API for insertions as closely as
possible. Insertions and Insertion inherit
directly from AbstractCategories and AbstractCategory,
respectively.
Each Insertion must have a name and an
anchor. The name works just like
Category names and is used as the label to display. The
anchor is the id of the category after which the insertion
should be placed.
Since insertions can represent a number of different aggregations,
they also can have function and args
properties. The function property is a character describing
the aggregation to use (e.g. "subtotal") and the
args property is a vector of the category ids
to use as operands for the function.
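A sketch of a single Insertion carrying all four properties, assuming the crunch package is installed and that the Insertion constructor accepts the properties as named arguments. The name, anchor, and args values are illustrative. Because function is a reserved word in R, it has to be written in backticks both when constructing and when reading it.

```r
library(crunch)

# A hypothetical subtotal of categories 1 and 2, displayed after category 2
ins <- Insertion(
    name = "Generally Happy",
    anchor = 2L,
    `function` = "subtotal",
    args = c(1L, 2L)
)

# Read the raw properties back with $
ins$name
ins$anchor
ins$`function`
ins$args
```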
The Insertion class has two child classes:
Subtotal and Heading. The
Insertions class can contain anything that inherits from
Insertion. Therefore an Insertions object might
include Insertion, Subtotal, and
Heading objects.
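A sketch of such a mixed container, assuming the crunch package is installed. The Subtotal and Heading calls mirror the ones used in the worked example later in this document; the plain Insertion is illustrative. Each member keeps its own class inside the Insertions object.

```r
library(crunch)

mixed <- Insertions(
    # A plain Insertion, using API-style properties
    Insertion(name = "Generally Happy", anchor = 2L,
              `function` = "subtotal", args = c(1L, 2L)),
    # A Subtotal, using the friendlier after/categories properties
    Subtotal(name = "Generally Unhappy", after = 5, categories = c(4, 5)),
    # A Heading
    Heading(name = "How I feel about cheese", position = "top")
)

sapply(mixed, class)
```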
* anchors and anchor return the anchor property from Insertions or a single Insertion, respectively.
* funcs and func return the function property from Insertions or a single Insertion, respectively.
* arguments returns the args property from a single Insertion.

Subtotals and headings are both types of insertions. Because
of this, the Subtotal and Heading classes inherit
from Insertion rather than directly from
AbstractCategory. These classes are designed to hold known
types of Insertions to make them easier to work with (for
example: testing which insertion to style in what way when using
prettyPrint functions). Additionally, these classes have
slightly more user-friendly names (e.g. after instead of
anchor), and they accept either ids or
names to refer to specific Categorys.
A Subtotal must have name,
after, and categories properties.
name works the same way as for other abstract categories.
after is similar to anchor but can be either a
category id or a category name after which the
subtotal should be placed. categories is either a vector of
category ids or a vector of category names to
subtotal.
The methods are the same as for Insertion, with some customizations:
* func always returns the string "subtotal" (because by definition a Subtotal object is an Insertion with function="subtotal").
* anchor and arguments both have an option var_items, which is required if the Subtotal uses category names instead of ids in the after or categories properties. Supplying the variable's Categories is required in order to translate category names to ids, which a well-formed Insertion requires.
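A sketch of these customizations, assuming the crunch package is installed and using the same happiness categories as the worked example later in this document. The Subtotal refers to categories by name, so anchor and arguments are given the Categories through var_items to translate names into ids.

```r
library(crunch)

feeling_cats <- Categories(
    list(name = "Very Happy", id = 1L),
    list(name = "Somewhat Happy", id = 2L),
    list(name = "Neither Happy nor Unhappy", id = 3L),
    list(name = "Somewhat Unhappy", id = 4L),
    list(name = "Very Unhappy", id = 5L)
)

# A Subtotal that refers to categories by name rather than id
happy <- Subtotal(name = "Generally Happy", after = "Somewhat Happy",
                  categories = c("Very Happy", "Somewhat Happy"))

# func() is fixed for every Subtotal
func(happy)

# var_items lets the names be translated into category ids
anchor(happy, var_items = feeling_cats)
arguments(happy, var_items = feeling_cats)
```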
A Heading must have name and
after properties, both of which have the same
interpretation as for Subtotal above.
anchor works the same way as for Subtotal, while
func and arguments return NA.
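A sketch of the Heading methods, assuming the crunch package is installed; the heading name and its position after category id 2 are illustrative. Since the heading refers to an id rather than a name, no var_items should be needed.

```r
library(crunch)

# A hypothetical Heading placed after the category with id 2
heading <- Heading(name = "Happy responses", after = 2)

# anchor() behaves as for a Subtotal given an id
anchor(heading)

# A heading performs no aggregation, so these are NA
func(heading)
arguments(heading)
```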
As a concrete example, let’s take the following categories:

feeling_cats <- Categories(
    list(name = "Very Happy", id = 1),
    list(name = "Somewhat Happy", id = 2),
    list(name = "Neither Happy nor Unhappy", id = 3),
    list(name = "Somewhat Unhappy", id = 4),
    list(name = "Very Unhappy", id = 5)
)
feeling_cats
## id name value missing
## 1 1 Very Happy NA FALSE
## 2 2 Somewhat Happy NA FALSE
## 3 3 Neither Happy nor Unhappy NA FALSE
## 4 4 Somewhat Unhappy NA FALSE
## 5 5 Very Unhappy NA FALSE
Then make some subtotals and a heading to use as insertions:

feeling_subtotals <- Insertions(
    Heading(name = "How I feel about cheese", position = "top"),
    Subtotal(name = "Generally Happy", after = "Somewhat Happy",
             categories = c("Very Happy", "Somewhat Happy")),
    Subtotal(name = "Generally Unhappy", after = 5,
             categories = c(4, 5))
)

Notice that the “Generally Happy” subtotal is made by specifying
category names for after and
categories:

feeling_subtotals[[2]]$after
## [1] "Somewhat Happy"
feeling_subtotals[[2]]$categories
## [1] "Very Happy" "Somewhat Happy"

The “Generally Unhappy” subtotal, on the other hand, uses ids:

feeling_subtotals[[3]]$after
## [1] 5
feeling_subtotals[[3]]$categories
## [1] 4 5
Since the Crunch API does not distinguish between
Subtotals, Headings, and other
Insertions, we sometimes need to convert from
Subtotals or Headings to
Insertions. This is accomplished with the method
makeInsertion(), which takes a Subtotal
or Heading and returns a valid Insertion. If
the Subtotal or Heading has category
name references instead of ids, then you must
include a Categories object as the var_items
argument. In general, this is only needed before sending a heterogeneous
set of Insertions to the Crunch API.
Using the earlier example, we can see how this works:

feeling_insertions <- Insertions(data = lapply(feeling_subtotals, makeInsertion,
                                               var_items = feeling_cats))

Now, all of the Subtotals and the Heading from
feeling_subtotals are proper Insertions:

sapply(feeling_insertions, class)
## [1] "Insertion" "Insertion" "Insertion"

This means that the after property has been translated
into anchor, and the function and
args properties have been filled in appropriately:

feeling_insertions[[3]]$anchor
## [1] 5
feeling_insertions[[3]]$`function`
## [1] "subtotal"
feeling_insertions[[3]]$args
## [1] 4 5
Because Insertions are required to use only category
ids, the new all-Insertions object
feeling_insertions has translated the “Generally Happy”
subtotal’s category names to ids:

feeling_insertions[[2]]$anchor
## [1] 2
feeling_insertions[[2]]$args
## [1] 1 2
Since the Crunch API does not distinguish between
Subtotals, Headings, and other
Insertions, when we get data about Insertions
from the API we need to change the classes of the
Insertions that the crunch package knows
about. To do this, we can use either subtypeInsertions to
change the types of all of the members of an Insertions
object, or subtypeInsertion to change the type of a single
Insertion object.
These functions work by inspecting the Insertion and
determining if it can be identified as one of the known child classes of
Insertion (namely: Subtotal or
Heading).
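A sketch of the single-object form, assuming the crunch package is installed. The plain Insertion below is illustrative, shaped like one read back from the API; since its function is "subtotal", subtypeInsertion() is expected to identify it as a Subtotal, whose func() always returns "subtotal".

```r
library(crunch)

# A plain Insertion, as it might arrive from the API (values are illustrative)
ins <- Insertion(name = "Generally Unhappy", anchor = 5L,
                 `function` = "subtotal", args = c(4L, 5L))

# Inspect the single Insertion and convert it to a known subclass
st <- subtypeInsertion(ins)
methods::is(st, "Subtotal")
func(st)
```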
Using the same example as above, we can convert back from all
Insertions to the subtypes:

feeling_subtotals_again <- subtypeInsertions(feeling_insertions)
sapply(feeling_subtotals_again, class)
## [1] "Heading" "Subtotal" "Subtotal"
There are two sets of inheritance: one for containers and one for members. In the table below, each class inherits from the class immediately to its left.

| | top-level classes | 1st children | 2nd children |
|---|---|---|---|
| containers | AbstractCategories | Categories | |
| | AbstractCategories | Insertions | |
| members | AbstractCategory | Category | |
| | AbstractCategory | Insertion | Subtotal |
| | AbstractCategory | Insertion | Heading |
The Transforms class and set of functions is not an
abstract category at all, but rather it mirrors the Crunch API’s set of
transformations that are allowed on a variable or CrunchCube. One of the
possible transformations is insertions (which is where
Insertions are stored). Currently the crunch
package doesn’t support other transformations.