eatATA efficiently translates test design requirements for Automated Test Assembly (ATA) into constraints for a Mixed Integer Linear Programming (MILP) model. The package offers a number of efficient and user-friendly functions, and the resulting constraint matrix can easily be converted into the input format of a MILP solver such as Gurobi. In the remainder of this vignette, I illustrate the typical use of eatATA with a case-based example.
The eatATA package can be installed from GitHub. Note that older R versions had issues with installations from online repositories like GitHub. R version > 3.6.0 should work without any issues.
devtools::install_github("beckerbenj/eatATA")
To use eatATA, you also need to install the Gurobi solver and its corresponding R package. A detailed vignette on the installation process can be found here.
First, eatATA is loaded into your R session.
# loading eatATA
library(eatATA)
No ATA without an item pool. In this example, I use a fictional item pool of 80 items. The item pool information is stored in an Excel file that is included in the package. To import it into R, I recommend the readxl package. readxl imports the data as a tibble, but in the code below the item pool is immediately converted into a data.frame.
Note that R requires a rectangular data set. However, Excel files often store additional information in rows above or below the "rectangular" item pool information. The skip argument of the read_excel() function can be used to skip such rows. (Note that the item pool can also be accessed directly in the package via items; see ?items for more information.)
items_path <- system.file("extdata", "items.xlsx", package = "eatATA")
items <- as.data.frame(readxl::read_excel(path = items_path), stringsAsFactors = FALSE)
Inspecting the item pool shows that the items differ in several properties: item format (MC, CMC, short_answer, or open), difficulty (diff_1 to diff_5), number of sub-items (subitems), and average response time in minutes (RT_in_min). In addition, similar items cannot be in the same booklet or test form. This information is stored in the column exclusions, which lists the items that are too similar to the item in that row and therefore must not appear in the same booklet.
head(items)
#>   Item_ID                exclusions RT_in_min subitems MC CMC short_answer open
#> 1 item_00          item_01, item_06       1.0        1 NA  NA            1   NA
#> 2 item_01          item_00, item_06       1.5        1 NA  NA            1   NA
#> 3 item_02 item_04, item_63, item_62       2.0        1 NA  NA           NA    1
#> 4 item_03                      <NA>       1.5        1 NA  NA            1   NA
#> 5 item_04 item_02, item_63, item_62       1.5        1 NA  NA            1   NA
#> 6 item_05                      <NA>       1.0        1 NA  NA            1   NA
#>   diff_1 diff_2 diff_3 diff_4 diff_5
#> 1      1     NA     NA     NA     NA
#> 2     NA      1     NA     NA     NA
#> 3     NA     NA      1     NA     NA
#> 4     NA     NA      1     NA     NA
#> 5     NA      1     NA     NA     NA
#> 6      1     NA     NA     NA     NA
Before defining the constraints, the item pool data have to be brought into the correct format. For instance, some dummy (indicator) variables in the item pool use both NA and 0 to indicate "the category does not apply". The dummy variables should therefore be recoded so that they take only two values (1 = "the category applies", 0 = "the category does not apply").
Often a set of dummy variables can be summarized into a single factor variable. The function dummiesToFactor() does this automatically. However, it can only be used when the categories are mutually exclusive and every dummy column contains only 0, 1 or NA. In the example item pool, items can consist of several sub-items with different formats or difficulties. As a result, some items fall into two categories at once (e.g., item_23 is both MC and CMC), and some dummies contain the value 2 (e.g., item_32 has two short-answer sub-items). Therefore, dummiesToFactor() throws an error in this example and cannot be used directly.
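Before calling dummiesToFactor(), it can help to see which rows violate its requirements. The following base-R sketch uses a hypothetical helper, findNonExclusiveRows(), which is not part of eatATA. It treats NA as 0 and flags rows in which a dummy is neither 0 nor 1, or in which more than one category applies. The toy data mimic item_23 and item_32 from the output further below.

```r
# Hypothetical helper (not part of eatATA): rows that prevent a set of dummy
# columns from being collapsed into a single factor.
findNonExclusiveRows <- function(df, dummies) {
  vals <- as.matrix(df[, dummies])
  vals[is.na(vals)] <- 0                           # NA = "does not apply"
  notBinary <- rowSums(vals != 0 & vals != 1) > 0  # e.g. a value of 2
  multiple  <- rowSums(vals) > 1                   # more than one category
  which(notBinary | multiple)
}

# toy rows mimicking item_23 (MC and CMC), item_32 (two short answers)
# and an ordinary short-answer item
toy <- data.frame(
  MC           = c(1, NA, NA),
  CMC          = c(1, NA, NA),
  short_answer = c(NA, 2, 1),
  open         = c(NA_real_, NA_real_, NA_real_)
)
findNonExclusiveRows(toy, c("MC", "CMC", "short_answer", "open"))
#> [1] 1 2
```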
# try to collapse the dummy variables into factors (fails for this item pool)
items <- dummiesToFactor(items, dummies = c("MC", "CMC", "short_answer", "open"), facVar = "itemFormat")
#> Error in dummiesToFactor(items, dummies = c("MC", "CMC", "short_answer", : All values in the 'dummies' columns have to be 0, 1 or NA.
items <- dummiesToFactor(items, dummies = paste0("diff_", 1:5), facVar = "itemDiff")
#> Error in dummiesToFactor(items, dummies = paste0("diff_", 1:5), facVar = "itemDiff"): All values in the 'dummies' columns have to be 0, 1 or NA.
items[c(24, 33, 37, 47, 48, 54, 76), ]
#>    Item_ID       exclusions RT_in_min subitems MC CMC short_answer open diff_1
#> 24 item_23             <NA>       3.5        2  1   1           NA   NA     NA
#> 33 item_32          item_36       1.5        2 NA  NA            2   NA      1
#> 37 item_36 item_27, item_32       1.5        2 NA  NA            2   NA      1
#> 47 item_46 item_54, item_44       2.5        2 NA  NA            2   NA     NA
#> 48 item_47 item_45, item_37       2.0        2 NA  NA            2   NA     NA
#> 54 item_53 item_43, item_59       2.5        2 NA  NA            2   NA     NA
#> 76 item_75             <NA>       1.5        2 NA  NA            2   NA     NA
#>    diff_2 diff_3 diff_4 diff_5
#> 24     NA      2     NA     NA
#> 33     NA      1     NA     NA
#> 37      1     NA     NA     NA
#> 47      1      1     NA     NA
#> 48     NA      1      1     NA
#> 54      1      1     NA     NA
#> 76     NA      1      1     NA
In addition, the column short_answer can take the value 2 and is consequently not a dummy variable. Therefore, I will (a) treat short_answer as a numerical variable, (b) collapse the mutually exclusive dummies MC and open into a new factor MC_open_none, (c) recode NA to 0 in CMC, short_answer and the difficulty indicators, which are then treated as numerical variables, and (d) turn CMC into a factor. (See ?autoItemValuesMinMax and ?computeTargetValues for further information on the different treatment of factors and numerical variables.)
# make a new factor with three levels: "MC", "open" and "_none_"
items <- dummiesToFactor(items, dummies = c("MC", "open"), facVar = "MC_open_none")
#> Warning in dummiesToFactor(items, dummies = c("MC", "open"), facVar = "MC_open_none"): For these rows, there is no dummy variable equal to 1: 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, 21, 25, 28, 29, 30, 32, 33, 34, 36, 37, 38, 39, 40, 41, 44, 45, 46, 47, 48, 50, 54, 55, 58, 60, 65, 67, 68, 69, 70, 72, 74, 76, 77, 79, 80
#> A '_none_ 'category is created for these rows.
# clean data set (NA should be 0)
for(ty in c(paste0("diff_", 1:5), "CMC", "short_answer")){
  items[, ty] <- ifelse(is.na(items[, ty]), yes = 0, no = items[, ty])
}
# turn the CMC dummy into a factor
items$f_CMC <- factor(items$CMC, labels = paste("CMC", c("no", "yes"), sep = "_"))
# distribution of the short_answer variable
table(items$short_answer)
#> 
#>  0  1  2 
#> 34 38  8
In this example, the goal is to assemble 14 booklets from the 80-item pool. Every item should be assigned to exactly one booklet, so that there is no item overlap and the item pool is completely depleted.
To be more precise, the required constraints are:
no item overlap between test blocks
complete item pool depletion
equal distribution of item formats across test blocks
equal difficulty distribution across test blocks
some items cannot appear together in the same booklet (item exclusions)
as similar as possible response times across booklets
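The constraints listed above can be written as a small MILP. The following is a sketch of a standard ATA formulation, not taken from the eatATA source. The notation is introduced here: $x_{if} = 1$ if item $i$ is assigned to booklet $f$ ($i = 1,\dots,I$; $f = 1,\dots,F$); $v_{ik}$ is the value of item $i$ on property $k$ (a 0/1 category indicator or a numerical value) with per-booklet bounds $L_k$ and $U_k$; $\mathcal{E}$ is the set of mutually exclusive item pairs; $r_i$ is the response time of item $i$; $T = 10$ minutes is the target; and $t$ is the largest deviation from $T$.

```latex
\begin{aligned}
\min_{x,\,t}\quad & t \\
\text{s.t.}\quad
  & \sum_{f=1}^{F} x_{if} = 1, && i = 1,\dots,I \\
  & L_k \le \sum_{i=1}^{I} v_{ik}\, x_{if} \le U_k, && f = 1,\dots,F;\ k = 1,\dots,K \\
  & x_{if} + x_{jf} \le 1, && (i,j) \in \mathcal{E};\ f = 1,\dots,F \\
  & -t \le \sum_{i=1}^{I} r_i\, x_{if} - T \le t, && f = 1,\dots,F \\
  & x_{if} \in \{0,1\},\quad t \ge 0
\end{aligned}
```

With $I = 80$ and $F = 14$, this gives 1,120 binary variables plus one continuous variable $t$. This is consistent with the variable counts in the Gurobi log further below, although eatATA's internal representation may differ in detail.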
For ease of use, I set up two variables that I will use frequently: the number of test forms or booklets to be created (nForms) and the number of items in the item pool (nItems).
# set up fixed variables
nItems <- nrow(items)  # number of items
nForms <- 14           # number of blocks
eatATA offers a variety of functions that automatically compute the constraints listed above.
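The first two constraints (no item overlap and item pool depletion) can be implemented with a single function: itemUsageConstraint(). To achieve this, the operator argument is set to "=", so that every item is used exactly once: not more than once (no overlap) and not less than once (depletion).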
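Conceptually, this constraint adds one row per item. Each row sums that item's decision variables over all booklets and fixes the sum to 1. The sketch below builds such a matrix in base R. It assumes that the decision variables are ordered booklet by booklet, so that column (f - 1) * nItems + i refers to item i in booklet f. eatATA's internal ordering and object structure may differ, and usageMatrix() is a hypothetical helper.

```r
# Hypothetical helper: one row per item, with a 1 in every column that refers
# to that item (assumed ordering of decision variables: booklet by booklet).
usageMatrix <- function(nForms, nItems) {
  kronecker(matrix(1, nrow = 1, ncol = nForms), diag(nItems))
}

usageA   <- usageMatrix(nForms = 14, nItems = 80)
usageRhs <- rep(1, 80)   # operator "=" with 1: each item in exactly one booklet
dim(usageA)
#> [1]   80 1120

# a toy assignment that puts item i into booklet ((i - 1) %% 14) + 1
toyForm <- ((seq_len(80) - 1) %% 14) + 1
toyX <- numeric(ncol(usageA))
toyX[(toyForm - 1) * 80 + seq_len(80)] <- 1
all(usageA %*% toyX == usageRhs)
#> [1] TRUE
```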
itemOverlap <- itemUsageConstraint(nForms, nItems = nItems, operator = "=")
Constraints on categorical variables or factors (like MC_open_none) or on numerical variables (like short_answer) can easily be implemented with the autoItemValuesMinMax() function. What the function computes depends on the variable type. For a factor, it determines the minimum and maximum frequency of each category per booklet. For a numerical variable, it determines a target value per booklet.
The allowedDeviation argument specifies how much the booklets may deviate from the computed target, either in the frequency of a category or in the numerical value. If the argument is omitted, it defaults to "no deviation is allowed" for numerical variables and to the minimal possible deviation for categorical variables or factors. Hence, for the numerical variables, I specify allowedDeviation = 1. The function prints the calculated target value or the resulting allowed range per booklet.
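The ranges printed below can be reproduced with simple arithmetic. In the following sketch, sketchMinMax() is a hypothetical helper, not eatATA's implementation. For a factor, each category count is divided by the number of booklets and rounded down and up. For a numerical variable, the target is the pool total divided by the number of booklets, and allowedDeviation is subtracted and added. Truncating negative lower bounds at 0 is an assumption, inferred from the printed output for diff_4 and diff_5.

```r
# Hypothetical helper reproducing the printed ranges (not eatATA's code).
sketchMinMax <- function(nForms, itemValues, allowedDeviation = NULL) {
  if (is.factor(itemValues)) {
    # spread each category as evenly as possible over the booklets
    counts <- as.vector(table(itemValues))
    names(counts) <- levels(itemValues)
    return(cbind(min = floor(counts / nForms), max = ceiling(counts / nForms)))
  }
  # numerical variable: pool total divided by the number of booklets
  target <- sum(itemValues) / nForms
  dev <- if (is.null(allowedDeviation)) 0 else allowedDeviation
  c(min = max(0, target - dev), max = target + dev)  # lower bound truncated at 0
}

# short_answer reconstructed from table(items$short_answer) above
shortAnswer <- rep(c(0, 1, 2), times = c(34, 38, 8))
round(sketchMinMax(nForms = 14, itemValues = shortAnswer, allowedDeviation = 1), 2)
#>  min  max
#> 2.86 4.86
```

For the factor MC_open_none, the 50 items in the _none_ category give 50 / 14 ≈ 3.57, hence the printed range of 3 to 4.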
# item formats
mc_openItems <- autoItemValuesMinMax(nForms = nForms, itemValues = items$MC_open_none)
#> The minimum and maximum frequences per test form for each item category are
#>        min max
#> MC       1   2
#> _none_   3   4
#> open     0   1
cmcItems <- autoItemValuesMinMax(nForms = nForms, itemValues = items$f_CMC)
#> The minimum and maximum frequences per test form for each item category are
#>         min max
#> CMC_no    5   6
#> CMC_yes   0   1
saItems <- autoItemValuesMinMax(nForms = nForms, itemValues = items$short_answer, allowedDeviation = 1)
#> The minimum and maximum values per test form are: min = 2.86 - max = 4.86
# difficulty categories
Items1 <- autoItemValuesMinMax(nForms = nForms, itemValues = items$diff_1, allowedDeviation = 1)
#> The minimum and maximum values per test form are: min = 0 - max = 2
Items2 <- autoItemValuesMinMax(nForms = nForms, itemValues = items$diff_2, allowedDeviation = 1)
#> The minimum and maximum values per test form are: min = 0.57 - max = 2.57
Items3 <- autoItemValuesMinMax(nForms = nForms, itemValues = items$diff_3, allowedDeviation = 1)
#> The minimum and maximum values per test form are: min = 1.64 - max = 3.64
Items4 <- autoItemValuesMinMax(nForms = nForms, itemValues = items$diff_4, allowedDeviation = 1)
#> The minimum and maximum values per test form are: min = 0 - max = 1.86
Items5 <- autoItemValuesMinMax(nForms = nForms, itemValues = items$diff_5, allowedDeviation = 1)
#> The minimum and maximum values per test form are: min = 0 - max = 1.29
To implement item exclusion constraints, two functions can be used: itemExclusionTuples() and itemExclusionConstraint(). When item exclusions are supplied as a single character string per item, with item identifiers separated by ", ", they have to be transformed first.
# item exclusions variable
items$exclusions[1:5]
#> [1] "item_01, item_06"          "item_00, item_06"         
#> [3] "item_04, item_63, item_62" NA                         
#> [5] "item_02, item_63, item_62"
This transformation can be done with the itemExclusionTuples() function, which creates so-called tuples: pairs of mutually exclusive items. These tuples can be used directly with the itemExclusionConstraint() function.
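The idea behind this transformation can be sketched in base R. exclusionPairs() below is a hypothetical helper, not the implementation of itemExclusionTuples(). It splits each exclusion string at ", " and pairs the result with the item in that row. It then keeps only one copy of pairs that are listed from both sides, such as (item_00, item_01) and (item_01, item_00). This sketch makes no claim that itemExclusionTuples() handles such duplicates the same way.

```r
# Hypothetical helper: item pairs that must not share a booklet.
exclusionPairs <- function(ids, exclusions, sep = ", ") {
  pairs <- lapply(seq_along(ids), function(k) {
    if (is.na(exclusions[k])) return(NULL)          # no exclusions for this item
    others <- strsplit(exclusions[k], sep, fixed = TRUE)[[1]]
    cbind(ids[k], others)
  })
  pairs <- do.call(rbind, pairs)
  if (is.null(pairs)) return(matrix(character(0), ncol = 2))
  # sort each pair so that (a, b) and (b, a) become identical, then deduplicate
  pairs <- t(apply(pairs, 1, sort))
  unique(pairs)
}

# three items of the example pool, with their exclusions as printed above
exIds  <- c("item_00", "item_01", "item_03")
exExcl <- c("item_01, item_06", "item_00, item_06", NA)
exclusionPairs(exIds, exExcl)   # 3 distinct pairs
```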
# item exclusions
exclusionTuples <- itemExclusionTuples(items, idCol = "Item_ID", 
                                       exclusions = "exclusions", sepPattern = ", ")
excl_constraints <- itemExclusionConstraint(nForms = nForms, exclusionTuples = exclusionTuples, 
                                            itemIDs = items$Item_ID)
Another helpful function is itemsPerFormConstraint(), which constrains the number of items per test form. Since this constraint is not required in this example, I do not include it in the final set of ATA constraints.
# number of items per test form
min_Nitems <- floor(nItems / nForms) - 3
noItems <- itemsPerFormConstraint(nForms = nForms, nItems = nItems, 
                                  operator = ">=", min_Nitems)
Finally, I set up an optimization constraint. Unlike the constraints above, it is not a strict yes-or-no requirement and does not have to be met exactly. Instead, the solver minimizes the distance between each booklet's value and a target value. In this example, I specify a target of 10 minutes of total response time (the sum of RT_in_min) per booklet.
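The following sketch evaluates such an objective for a fixed assignment. It assumes a minimax criterion, that is, the largest absolute deviation of any booklet's total response time from the target. This is consistent with the single continuous variable in the Gurobi log further below, but it remains an assumption, and maxTargetDeviation() is a hypothetical helper. The toy response times are those of the first two booklets in the solution output further below.

```r
# Hypothetical helper: largest absolute deviation of a booklet's total
# response time from the target (assumed minimax criterion).
maxTargetDeviation <- function(rt, form, target) {
  totals <- tapply(rt, form, sum)   # total response time per booklet
  max(abs(totals - target))
}

# response times of booklets 1 and 2 from the solution further below
toyRT   <- c(2.0, 3.5, 1.5, 1.0, 2.0,   1.5, 2.0, 1.5, 1.5, 3.0)
toyBook <- rep(1:2, each = 5)
maxTargetDeviation(toyRT, toyBook, target = 10)
#> [1] 0.5
```

Booklet 1 totals 10.0 minutes and booklet 2 totals 9.5 minutes. The resulting 0.5 matches the best objective reported in the Gurobi log.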
# optimize the total response time per booklet
av_time <- itemTargetConstraint(nForms, nItems = nItems, itemValues = items$RT_in_min, targetValue = 10)
Before calling the optimization algorithm, the specified constraints have to be brought into the format Gurobi expects. To do so, I collect all constraints to be used in a list and pass it to the prepareConstraints() function.
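The preparation step can be sketched in base R. The constraint rows of all blocks are stacked into one matrix, together with their right-hand sides and senses, in the list shape accepted by the gurobi R package (A, obj, modelsense, rhs, sense, vtype). The block structure list(A, sense, rhs) and the helper stackConstraints() are assumptions for illustration. eatATA's constraint objects are organized differently, and the toy model omits the continuous deviation variable.

```r
# Hypothetical helper: stack constraint blocks into one gurobi-style model list.
stackConstraints <- function(blocks, nVars, obj) {
  list(
    A          = do.call(rbind, lapply(blocks, `[[`, "A")),
    rhs        = unlist(lapply(blocks, `[[`, "rhs")),
    sense      = unlist(lapply(blocks, `[[`, "sense")),
    obj        = obj,
    modelsense = "min",
    vtype      = rep("B", nVars)      # all decision variables binary
  )
}

# toy problem: 2 items, 2 booklets; columns x[1,1], x[2,1], x[1,2], x[2,2]
toyUsage <- list(A = kronecker(matrix(1, 1, 2), diag(2)),
                 sense = rep("=", 2), rhs = rep(1, 2))
# items 1 and 2 exclude each other: at most one of them per booklet
toyExcl  <- list(A = rbind(c(1, 1, 0, 0), c(0, 0, 1, 1)),
                 sense = rep("<", 2), rhs = rep(1, 2))
toyModel <- stackConstraints(list(toyUsage, toyExcl), nVars = 4, obj = rep(0, 4))
dim(toyModel$A)
#> [1] 4 4
```

With the gurobi package installed, such a list could be passed to gurobi(). Any feasible solution of this toy model places items 1 and 2 in different booklets.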
# Prepare constraints
gurobi_constr <- list(itemOverlap, mc_openItems, cmcItems, saItems, 
                      Items1, Items2, Items3, Items4, Items5, excl_constraints,
                      av_time)
gurobi_rdy <- prepareConstraints(gurobi_constr, nForms = nForms, nItems = nItems)
Now I can call Gurobi's gurobi() function, which solves the optimization problem. The params argument sets options and parameters of the solver. Setting TimeLimit to 30 makes Gurobi stop searching for an optimal solution after 30 seconds. For most small test assembly problems, however, computation times are much shorter.
library(gurobi)
# Optimization
solver_raw <- gurobi(gurobi_rdy, params = list(TimeLimit = 30))
If the problem is feasible (which means that there is at least one solution that satisfies all the constraints), the output printed by Gurobi will look like this:
#> Optimize a model with 990 rows, 1121 columns and 9324 nonzeros
#> Variable types: 1 continuous, 1120 integer (1120 binary)
#> Coefficient statistics:
#>   Matrix range     [1e+00, 4e+00]
#>   Objective range  [1e+00, 1e+00]
#>   Bounds range     [0e+00, 0e+00]
#>   RHS range        [5e-01, 1e+01]
#> Found heuristic solution: objective 3.5000000
#> Presolve removed 392 rows and 0 columns
#> Presolve time: 0.01s
#> Presolved: 598 rows, 1121 columns, 8204 nonzeros
#> Variable types: 0 continuous, 1121 integer (1120 binary)
#> 
#> Root relaxation: objective 2.857143e-01, 476 iterations, 0.01 seconds
#> 
#>     Nodes    |    Current Node    |     Objective Bounds      |     Work
#>  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
#> 
#>      0     0    0.28571    0   41    3.50000    0.28571  91.8%     -    0s
#> H    0     0                       1.5000000    0.28571  81.0%     -    0s
#> H    0     0                       1.0000000    0.28571  71.4%     -    0s
#> H    0     0                       0.5000000    0.28571  42.9%     -    0s
#> 
#> Explored 1 nodes (1119 simplex iterations) in 0.07 seconds
#> Thread count was 8 (of 8 available processors)
#> 
#> Solution count 4: 0.5 1 1.5 3.5 
#> 
#> Optimal solution found (tolerance 1.00e-04)
#> Best objective 5.000000000000e-01, best bound 5.000000000000e-01, gap 0.0000%
If the problem is infeasible, the output printed by Gurobi will look like this:
#> Optimize a model with 1004 rows, 1121 columns and 10444 nonzeros
#> Variable types: 1 continuous, 1120 integer (1120 binary)
#> Coefficient statistics:
#>   Matrix range     [1e+00, 4e+00]
#>   Objective range  [1e+00, 1e+00]
#>   Bounds range     [0e+00, 0e+00]
#>   RHS range        [5e-01, 8e+01]
#> Presolve removed 70 rows and 1120 columns
#> Presolve time: 0.00s
#> 
#> Explored 0 nodes (0 simplex iterations) in 0.00 seconds
#> Thread count was 1 (of 8 available processors)
#> 
#> Solution count 0
#> 
#> Model is infeasible
#> Best objective -, best bound -, gap -
If this happens, one option is to relax some of the constraints. For an initial diagnosis, you can also omit some constraints completely to see which ones are especially hard to satisfy. Once you have a better grasp of what the item pool allows, you can add these constraints back, for example with a larger allowedDeviation.
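To react to these cases in a script, you can inspect the list that gurobi() returns before processing it. The sketch below uses the status, objval and x elements of that list. checkSolverResult() is a hypothetical helper that handles only a few of Gurobi's status codes, and the mock lists stand in for real solver output.

```r
# Hypothetical helper: summarize a gurobi() result before further processing.
checkSolverResult <- function(res) {
  if (identical(res$status, "INFEASIBLE")) {
    return("infeasible: relax or drop constraints")
  }
  if (is.null(res$x)) {
    return(paste("no solution available, status:", res$status))
  }
  if (identical(res$status, "OPTIMAL")) {
    return(paste("optimal, objective:", res$objval))
  }
  # e.g. "TIME_LIMIT": a solution exists but optimality is not proven
  paste("feasible, not proven optimal, status:", res$status,
        "objective:", res$objval)
}

checkSolverResult(list(status = "INFEASIBLE"))
#> [1] "infeasible: relax or drop constraints"
checkSolverResult(list(status = "OPTIMAL", objval = 0.5, x = c(1, 0, 1)))
#> [1] "optimal, objective: 0.5"
```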
The solution provided by Gurobi can be inspected with the processGurobiOutput() function. If the output argument is set to "list", you get an easy-to-read list with one element per booklet. If it is set to "data.frame", you get an output optimized for exporting, for example to Excel.
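Conceptually, this step maps the solution vector back to booklet numbers. The following sketch assumes that the decision variables are ordered booklet by booklet (column (f - 1) * nItems + i for item i in booklet f), possibly followed by further continuous variables such as a deviation variable. decodeAssignment() is a hypothetical helper and not how processGurobiOutput() is implemented.

```r
# Hypothetical helper: booklet number for each item from a solution vector
# (assumed ordering: booklet by booklet; extra variables at the end ignored).
decodeAssignment <- function(x, nItems, nForms) {
  X <- matrix(round(x[seq_len(nItems * nForms)]), nrow = nItems, ncol = nForms)
  apply(X, 1, which.max)   # column (= booklet) holding the 1 in each row
}

# toy solution: 3 items, 2 booklets, plus one trailing continuous variable
toySol <- c(1, 0, 1,   0, 1, 0,   0.5)
decodeAssignment(toySol, nItems = 3, nForms = 2)
#> [1] 1 2 1
```

Rounding before which.max() guards against solver values such as 0.9999999 for binary variables.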
out_list <- processGurobiOutput(solver_raw, items = items, nForms = nForms, output = "list")
## first two booklets
out_list[1:2]
#> [[1]]
#>    Item_ID       exclusions RT_in_min subitems MC CMC short_answer open diff_1
#> 21 item_20          item_45       2.0        1 NA   0            1   NA      0
#> 24 item_23             <NA>       3.5        2  1   1            0   NA      0
#> 36 item_35             <NA>       1.5        1 NA   0            1   NA      0
#> 42 item_41          item_26       1.0        1  1   0            0   NA      0
#> 48 item_47 item_45, item_37       2.0        2 NA   0            2   NA      0
#>    diff_2 diff_3 diff_4 diff_5 MC_open_none   f_CMC form_1 form_2 form_3 form_4
#> 21      0      1      0      0       _none_  CMC_no      1      0      0      0
#> 24      0      2      0      0           MC CMC_yes      1      0      0      0
#> 36      0      0      0      1       _none_  CMC_no      1      0      0      0
#> 42      1      0      0      0           MC  CMC_no      1      0      0      0
#> 48      0      1      1      0       _none_  CMC_no      1      0      0      0
#>    form_5 form_6 form_7 form_8 form_9 form_10 form_11 form_12 form_13 form_14
#> 21      0      0      0      0      0       0       0       0       0       0
#> 24      0      0      0      0      0       0       0       0       0       0
#> 36      0      0      0      0      0       0       0       0       0       0
#> 42      0      0      0      0      0       0       0       0       0       0
#> 48      0      0      0      0      0       0       0       0       0       0
#> 
#> [[2]]
#>    Item_ID                exclusions RT_in_min subitems MC CMC short_answer
#> 2  item_01          item_00, item_06       1.5        1 NA   0            1
#> 19 item_18          item_48, item_56       2.0        1  1   0            0
#> 20 item_19                   item_76       1.5        1 NA   0            1
#> 60 item_59 item_53, item_43, item_29       1.5        1 NA   0            1
#> 62 item_61                   item_60       3.0        1 NA   0            0
#>    open diff_1 diff_2 diff_3 diff_4 diff_5 MC_open_none  f_CMC form_1 form_2
#> 2    NA      0      1      0      0      0       _none_ CMC_no      0      1
#> 19   NA      0      0      0      1      0           MC CMC_no      0      1
#> 20   NA      0      0      1      0      0       _none_ CMC_no      0      1
#> 60   NA      0      1      0      0      0       _none_ CMC_no      0      1
#> 62    1      0      0      1      0      0         open CMC_no      0      1
#>    form_3 form_4 form_5 form_6 form_7 form_8 form_9 form_10 form_11 form_12
#> 2       0      0      0      0      0      0      0       0       0       0
#> 19      0      0      0      0      0      0      0       0       0       0
#> 20      0      0      0      0      0      0      0       0       0       0
#> 60      0      0      0      0      0      0      0       0       0       0
#> 62      0      0      0      0      0      0      0       0       0       0
#>    form_13 form_14
#> 2        0       0
#> 19       0       0
#> 20       0       0
#> 60       0       0
#> 62       0       0
In this case, I also want to combine the created booklets into test forms. Therefore, I am interested in booklet exclusions that result from item exclusions: two booklets that contain mutually exclusive items must not end up in the same test form. The analyzeBlockExclusion() function can be used to obtain tuples of booklet exclusions.
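The underlying logic can be sketched as follows. For every pair of mutually exclusive items, look up the booklets they were assigned to, and record the pair of booklets if they differ. blockExclusions() is a hypothetical helper, not the implementation of analyzeBlockExclusion(). The booklet numbers in the toy data are invented for illustration; only the item pairs come from the example pool.

```r
# Hypothetical helper: booklet pairs that must not be combined into a test form.
blockExclusions <- function(bookletOf, pairs) {
  # bookletOf: named vector, names = item IDs, values = booklet numbers
  b1 <- bookletOf[pairs[, 1]]
  b2 <- bookletOf[pairs[, 2]]
  # skip pairs with an unassigned item and pairs within the same booklet
  keep <- !is.na(b1) & !is.na(b2) & b1 != b2
  blocks <- cbind(pmin(b1, b2), pmax(b1, b2))[keep, , drop = FALSE]
  unique(unname(blocks))
}

toyBooklet <- c(item_00 = 3, item_01 = 2, item_06 = 5, item_20 = 1, item_45 = 7)
toyPairs   <- rbind(c("item_00", "item_01"), c("item_00", "item_06"),
                    c("item_01", "item_06"), c("item_20", "item_45"))
blockExclusions(toyBooklet, toyPairs)   # 4 booklet pairs, smaller number first
```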
analyzeBlockExclusion(processedObj = out_list, idCol = "Item_ID", exclusionTuples = exclusionTuples)
#>      Name 1   Name 2
#> 1   block 2  block 4
#> 2   block 4  block 9
#> 3   block 2  block 9
#> 4   block 8  block 9
#> 5  block 14  block 9
#> 6   block 5  block 9
#> 7  block 14  block 8
#> 8   block 5  block 8
#> 9  block 10  block 9
#> 10 block 10  block 6
#> 11 block 14  block 6
#> 12  block 6  block 8
#> 14 block 10  block 7
#> 15 block 10 block 14
#> 16 block 10  block 8
#> 19  block 2  block 7
#> 20  block 2  block 5
#> 22  block 1  block 5
#> 23 block 10 block 13
#> 24  block 3  block 9
#> 25  block 1 block 11
#> 28  block 3  block 5
#> 30 block 12  block 5
#> 31  block 1 block 12
#> 33 block 11  block 6
#> 34  block 5  block 7
#> 36 block 14  block 3
#> 37 block 14  block 7
#> 39  block 3  block 7
#> 41 block 12  block 8
#> 43  block 2  block 8
#> 44 block 14  block 5
#> 45 block 11  block 4
Finally, the solution can be exported as an Excel file (.xlsx). This can, for example, be achieved with the eatAnalysis package, which has to be installed from GitHub.
devtools::install_github("beckerbenj/eatAnalysis")
out_df <- processGurobiOutput(solver_raw, items = items, nForms = nForms, output = "data.frame")
eatAnalysis::write_xlsx(out_df, filePath = "example_excel.xlsx",
                        row.names = FALSE)