--- title: "An overview of package plot3logit" author: "Flavio Santi, Maria Michela Dickson, Giuseppe Espa, Diego Giuliani" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{An overview of package plot3logit} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: ../inst/REFERENCES.bib --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, fig.align = "center" ) ``` ## Introduction The package permits the covariate effects of trinomial regression models to be represented graphically by means of a ternary plot. The aim of the plots is helping the interpretation of regression coefficients in terms of the effects that a change in regressors' values has on the probability distribution of the dependent variable. Such changes may involve either a single regressor, or a group of them (composite changes), and the package permits both cases to be handled in a user-friendly way. Methodological details are illustrated and discussed in @santi2019. The package can read the results of **both categorical and ordinal trinomial logit** regression fitted by various functions (see the next section) and creates a `field3logit` object which may be represented by means of functions `gg3logit` and `stat_field3logit`. The `plot3logit` package inherits graphical classes and methods from the package `ggtern` [@hamilton2018] which, in turn, is based on the package `ggplot2` [@wickham2016a]. Graphical representation based on **standard graphics** is made available through the package `Ternary` [@smith2017] by functions `plot3logit` and `TernaryField`, and by the `plot` method of `field3logit` objects. See the help of `field3logit` for representing composite effects and `multifield3logit` for drawing multiple fields. See @santi2022 for a detailed presentation of the package and its features, or read the main vignette of the package which is based on it. ## Compatibility Function `field3logit` of package `plot3logit` can read trinomial regression estimates from the output of the following functions: * `polr` of package `MASS` (ordinal logit regression); * `mlogit` of package `mlogit` (logit regression); * `multinom` of package `nnet` (logit regression); * `clm` and `clm2` of package `ordinal` (ordinal logit regression); * `vgam` and `vglm` of package `VGAM` (logit regression). Explicit estimates can be passed to `field3logit()` by means of a named list too (see the help of `field3logit` or the main vignette of the package); moreover, users can easily extend the range of objects that can be processed by `field3logit()` by implementing the S3 methods of the generic `extract3logit` (see the help). In the following, an example is provided for each object `field3logit` can work with. First, however, package `plot3logit` must be attached and dataset `cross_1year` must be loaded: ```{r, message=FALSE} library(plot3logit) data(cross_1year) str(cross_1year) ``` ### Package `MASS` Function `polr` of package `MASS` [@venables2002] fits a *proportional odds logistic regression* by means of a fairly simple syntax, however the order of labels of the dependent variable must be explicitly stated: ```{r, message=FALSE, results='hide'} library(MASS) mydata <- cross_1year mydata$finalgrade <- factor( x = mydata$finalgrade, levels = c('Low', 'Average', 'High'), ordered = TRUE ) mod0 <- polr(finalgrade ~ gender + irregularity, data = mydata) field3logit(mod0, 'genderFemale') ``` ### Package `mlogit` Package `mlogit` of package `mlogit` [@mlogit] can fit a wide range of models. The current implementation of `plot3logit` only works with pure trinomial models, where only specific coefficients are considered: ```{r, message=FALSE, results='hide'} library(mlogit) mydata <- mlogit.data(cross_1year, choice = 'employment_sit', shape = 'wide') mod0 <- mlogit(employment_sit ~ 0 | gender + finalgrade, data = mydata) field3logit(mod0, 'genderFemale') ``` ### Package `nnet` Function `multinom` of package `nnet` [@venables2002] fits a multinomial model by means of a very simple syntax: ```{r, message=FALSE, results='hide'} library(nnet) mod0 <- multinom(employment_sit ~ gender + finalgrade, data = cross_1year) field3logit(mod0, 'genderFemale') ``` ### Package `ordinal` Package `ordinal` [@ordinal] permits ordinal multinomial logistic models to be fitted by means of two functions: `clm` and `clm2`. The latter is "a new improved implementation" of the former (see the help of `clm2`), in both cases the syntax is simple and standard. Unlike `polr` of package `MASS` [@venables2002], the dependent variable may be an unordered factor, and in that case both `clm` and `clm2` consider the vector of the labels as ordered: if this is not the case, the dependent variable should be properly redefined. ```{r, message=FALSE, results='hide'} library(ordinal) mydata$finalgrade <- factor(mydata$finalgrade, c('Low', 'Average', 'High')) mod0 <- clm(finalgrade ~ gender + irregularity, data = mydata) field3logit(mod0, 'genderFemale') mod0 <- clm2(finalgrade ~ gender + irregularity, data = mydata) field3logit(mod0, 'genderFemale') ``` ### Package `VGAM` Package `VGAM` [@yee2010] permits multinomial logistic models to be fitted by means of two functions: `vgam` and `vglm`. In case of multinomial logistic models they share the same syntax: ```{r, message=FALSE, results='hide'} library(VGAM) mod0 <- vgam( formula = employment_sit ~ gender + finalgrade, family = multinomial(), data = cross_1year ) field3logit(mod0, 'genderFemale') mod0 <- vglm( formula = employment_sit ~ gender + finalgrade, family = multinomial(), data = cross_1year ) field3logit(mod0, 'genderFemale') ``` ### Reading from `list` Point estimates and other information on a fitted model may be passed as a named list to `field3logit`. In case of a non-ordinal trinomial model, the matrix of coefficients (component `B`) should be a matrix with named rows and two columns, whereas possible levels of the dependent variable should be included as a separate component (`levels`) which should be a character vector of length three, whose first component is the reference level of the model (see the help of `extract3logit` for details): ```{r, message=FALSE, results='hide'} mod0 <- list( B = matrix( data = c(-2.05, 0.46, -2.46, 0.37, -1.2, 0.8, 0.7, 0.4), ncol = 2, dimnames = list(c('Intercept', 'X1', 'X2', 'X3')) ), levels = c('Class A', 'Class B', 'Class C') ) field3logit(mod0, 'X1') ``` ## A brief example Fit a trilogit model by means of package `nnet` where the student's employment situation is analysed with respect to all variables in the dataset `cross_1year`: ```{r, message = FALSE} library(plot3logit) data(cross_1year) library(nnet) mod0 <- multinom(employment_sit ~ ., data = cross_1year) ``` The gender effect is analysed by means of a ternary plot which is generated in two steps. Firstly, the vector field is computed: ```{r} field0 <- field3logit(mod0, 'genderFemale') ``` Secondly, the field is represented on a ternary plot, using either `gg`-graphics: ```{r} gg3logit(field0) + stat_field3logit() ``` or standard graphics: ```{r, fig.width=6, fig.asp=1} plot(field0) ``` ## How to specify the covariate changes Ternary plots represent the effect of a change in covariate values on the probability distribution of the dependent variable. The function `field3logit` permits such change to be specified in three different ways: explicitly, as a numeric vector, or implicitly, either by means of a named numeric vector, or by means of an R expressions. All three methods are briefly illustrated below. As an example, the following subsections refer to this trinomial logistic regression model: ```{r} library(plot3logit) library(nnet) mod0 <- multinom(employment_sit ~ finalgrade + irregularity + hsscore, cross_1year) mod0 ``` ### Explicit specification of $\Delta x$ This method for specifying the change in the covariate values requires the vector $\Delta x$ to be explicitly defined, thus it may be suitable when $\Delta x$ results from some calculations. On the other hand, it is less user-friendly than implicit syntax, as it depends on the order of regressors in the design matrix. #### First example If the effect of a *high final grade* has to be assessed, the vector of changes $\Delta x$ can be set according to the position of the dummy variable `finalgradeHigh` in the matrix of coefficients of the model `mod0`: ```{r} coef(mod0) ``` in this case, we have that \[ \Delta x=[0, 0, 1, 0, 0, 0]' \] since `finalgradeHigh` is the fourth coefficient (including the intercept) of the matrix of coefficients. It follows that the function `field3logit` can be invoked as it follows: ```{r} field3logit(mod0, c(0, 0, 1, 0, 0, 0)) ``` #### Second example It is also possible to set $\Delta x$ so as to consider changes involving more than one regressor, as well as fractional changes. In such cases, $\Delta x$ will consist in a vector where there are several non-zero elements which may take any positive or negative value. Assume, for example, that we want to study the effect of a decrease by 10 in the high school final score, associated to an high final grade. In such a case, we have that: \[ \Delta x =[0, 0, 1, 0, 0, -10]'\,, \] thus: ```{r} field3logit(mod0, c(0, 0, 1, 0, 0, -10)) ``` ### Implicit specifications of $\Delta x$ Unlike the explicit method, the implicit syntaxes allows the user to initialise the vector $\Delta x$ by specifying only the covariates which should vary. There are two possible syntaxes: one is based on named numeric vectors, the other is based on `R` code. In the following, both of them are illustrated. #### First example If the effect of a *high final grade* has to be assessed, the implicit syntaxes which allow to assess the effect of a unitary change of `finalgradeHigh` are the following: ```{r} # Named numeric vectors: field3logit(mod0, c(finalgradeHigh = 1)) # R code: field3logit(mod0, 'finalgradeHigh') ``` Note that the console output produced by printing the output of `field3logit` shows both the implicit effect (line `Effect`) and the associated vector $\Delta x$ (line `Explicit effect`). #### Second example If we want to study the effect of a decrease by 10 in the high school final score, associated to an high final grade, the implicit syntaxes are: ```{r} # Named numeric vectors: field3logit(mod0, c(finalgradeHigh = 1, hsscore = -10)) # R code: field3logit(mod0, 'finalgradeHigh - 10 * hsscore') ``` Compare the line `Explicit effect` of this output to the line `Effect` of the same example in the previous section: as expected, they are the same. ## Comparing multiple effects When effects of multiple changes have to be compared at a time, multiple fields should be computed and represented on the same plot. This task can be easily done by creating a `multifield3logit` object and directly representing it. Since objects `multifield3logit` result by putting together two or more `field3logit` objects, the package `plot3logit` allows the user to create a `multifield3logit` object by adding up two or more `filed3logit` or `multifield3logit` objects using standard sum operator `+`. Here it is an example. The following command fit a trilogit model where all available variables are used as regressors. Then four `fields3logit` objects are computed for assessing the effects of a some combined changes in the duration of studies and in students' final degree score. Note that each field is computed just with respect to a single probability distribution (`refpoint`) of the dependent variable, and only one arrow is computed. The reason of this is that we have to represent four fields on the same plot, thus olny a small number of arrows can be drawn in order to preserve the clarity of the graph. ```{r, results='hide'} data(cross_1year) mod0 <- nnet::multinom(employment_sit ~ ., data = cross_1year) refpoint <- list(c(0.7, 0.15, 0.15)) field_Sdur <- field3logit(mod0, 'durationShort', label = 'Short duration', p0 = refpoint, narrows = 1) field_Ldur <- field3logit(mod0, 'durationLong', label = 'Long duration', p0 = refpoint, narrows = 1) field_Hfgr <- field3logit(mod0, 'finalgradeHigh', label = 'High final grade', p0 = refpoint, narrows = 1) field_Lfgr <- field3logit(mod0, 'finalgradeLow', label = 'Low final grade', p0 = refpoint, narrows = 1) ``` Now the `multifield3logit` object can be created by adding all the `field3logit` objects up together: ```{r} mfields <- field_Sdur + field_Ldur + field_Lfgr + field_Hfgr mfields ``` and the `multifield3logit` object `mfield` can be represented in a graph: ```{r, message = FALSE} gg3logit(mfields) + stat_field3logit(aes(colour = label)) + theme_zoom_L(0.45) ``` The code needed for generating the object `mfields` may be conveniently made shorter in this way (see the main vignette of the package or the help of `field3logit` for details on syntax): ```{r} depo <- list( list(delta = 'durationShort', label = 'Short duration'), list(delta = 'durationLong', label = 'Long duration'), list(delta = 'finalgradeHigh', label = 'High final grade'), list(delta = 'finalgradeLow', label = 'Low final grade') ) mfields <- field3logit(mod0, delta = depo, p0 = refpoint, narrows = 1) mfields ``` It is also possible to rely on syntax based on delimiters `<<...>>` (see the help of `field3logit`, section **Details** or the main vignette of the package) in order to create a `multifield3logit` object consisting of fields based on the effect of each dummy variable generated by the same `factor` regressor: ```{r} field3logit(mod0, delta = '<>', p0 = refpoint, narrows = 1) ``` ## Confidence regions The package `plot3logit` allows also to draw the confidence regions associated to each effect, both in case of `field3logit` and `multifield3logit` objects. The confidence regions can be computed when the function `field3logit` is called by setting the argument `conf`. Otherwise, they can be added later through the function `add_confregions` as it follows: ```{r} field0 <- add_confregions(field0, conf = 0.95) field0 ``` and through the same syntax in case of `multifield3logit` objects: ```{r} mfields <- add_confregions(mfields, conf = 0.95) ``` The statistic `stat_conf3logit` permits confidence regions to be drawn, if available: ```{r, eval=FALSE} gg3logit(field0) + stat_field3logit() + stat_conf3logit() ``` and ```{r, message = FALSE} gg3logit(mfields) + stat_field3logit(aes(colour = label)) + stat_conf3logit(aes(fill = label)) + theme_zoom_L(0.45) ``` ## References