The R package `multinomineq` facilitates the Bayesian analysis of
discrete/categorical data using multinomial models with inequality
constraints. Essentially, these types of models allow one to analyze
discrete choice frequencies while assuming that the corresponding choice
probabilities \(\theta\) are constrained by linear inequality
constraints (e.g., by the simple order
\(\theta_{1} \leq \theta_{2}\leq \theta_{3}\leq \theta_{4}\)). In the
following, this vignette shows how to define and test such models in R.
The second section below provides a concise formal definition of the
model class.

The following example shows how inequality constraints can be used to test the description-experience (DE) gap in risky decision making (Hertwig et al., 2004). In the experiment, participants had to choose one of two risky gambles in each of 6 different paired comparisons (e.g., a lottery in which you win 10€ with \(p=.20\) versus a lottery in which you can win 20€ with \(p=.45\)). The description-experience gap states that people underweigh small probabilities \(p\) when making decisions based on experience. We will conduct a reanalysis of Hertwig’s data set using the inequality-constrained multinomial model in Regenwetter & Robinson (2017).

First, it is necessary to install the R package `multinomineq` from
GitHub. Note that the R developer tools need to be installed first.
Afterwards, `multinomineq` can be installed and loaded via:

```
### uncomment to install packages:
# install.packages(c("devtools", "coda", "RcppArmadillo", "RcppProgress", "Rglpk", "quadprog"))
# devtools::install_github("danheck/multinomineq")
library("multinomineq")
library("coda")
set.seed(1234)
```

We reanalyze the data by Hertwig et al. (2004) who collected choice frequencies of \(n=25\) participants for 6 paired comparisons of risky gambles. The observed choice frequencies can simply be summarized by the frequencies of choosing “Option H” (a specific lottery, cf. original article) in each of the 6 paired comparisons:
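The data can be entered as follows. The total number of participants and the Description-condition frequencies match the multinomial data format shown later in this vignette; the Experience-condition frequencies below are placeholder values for illustration only and should be replaced by the observed frequencies from the original data set:

```r
n <- 25 # number of participants per paired comparison
# choices of "Option H" in the Description condition:
HER_desc <- c(9, 16, 16, 7, 12, 16)
# Experience condition (PLACEHOLDER values for illustration;
# replace with the observed frequencies from Hertwig et al., 2004):
HER_exp <- c(22, 11, 7, 14, 16, 14)
```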

First, it is shown how to use the vertex (\(V\)-)representation for
inequality-constrained models. Essentially, the rows of the matrix
\(V\) list all patterns that are predicted by a substantive theory of
interest. Each prediction either states that “Option H” is chosen
(value `1`) or not (value `0`). For example, if the theory predicts
that “Option H” is chosen only in the 4th paired comparison but
otherwise not, the matrix \(V\) needs to include the pattern
`c(0,0,0,1,0,0)` in one row. Similarly, all possible predictions need
to be combined to form the matrix \(V\). Inequality-constrained models
assume a mixture distribution over all these possible predicted
patterns (e.g., due to heterogeneity across participants or trials).
Geometrically, the corresponding parameter space thus represents the
convex hull of the predicted patterns/vertices (i.e., all linear
combinations of the rows of \(V\) with nonnegative weights that sum to
one).

For the description-experience gap, Regenwetter & Robinson (2017) used a specific decision making model (Cumulative Prospect Theory) to generate all possible predictions. The matrix \(V\) lists all of these predictions when assuming that participants underweigh small probabilities \(p\) of winning money. The following matrix contains all predicted patterns for the 6 risky gambles:

```
# Underweighting of small probabilities CPT:
# (1 = choose "Option H"; 0 = do not choose "Option H")
V <- matrix(
  c(
    0, 0, 0, 0, 0, 0, # first predicted pattern
    0, 0, 0, 1, 0, 0, # second predicted pattern
    0, 0, 1, 1, 0, 0, # ...
    1, 0, 0, 0, 0, 0,
    1, 0, 0, 1, 0, 0,
    1, 0, 1, 1, 0, 0,
    1, 1, 0, 0, 0, 0,
    1, 1, 0, 0, 1, 0,
    1, 1, 0, 1, 1, 0,
    1, 1, 0, 0, 1, 1,
    1, 1, 0, 1, 0, 0,
    1, 1, 0, 1, 1, 1,
    1, 1, 1, 1, 0, 0,
    1, 1, 1, 1, 1, 0,
    1, 1, 1, 1, 1, 1
  ),
  ncol = 6, byrow = TRUE
)
```
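To illustrate the mixture interpretation: any nonnegative weighting of the rows of \(V\) that sums to one yields a choice-probability vector inside the model's parameter space. A minimal base-R sketch (the uniform weights are an arbitrary choice for illustration):

```r
# arbitrary mixture weights over the 15 predicted patterns (sum to 1):
alpha <- rep(1 / nrow(V), nrow(V))
# convex combination of the predicted patterns
# = choice probabilities implied by this mixture:
p_mix <- as.vector(t(V) %*% alpha)
p_mix
```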

Given this representation of the theoretical predictions, it is not
easy to check whether a given vector of choice probabilities for the 6
gambles is inside the convex hull of the predicted patterns. As a
remedy, one can use the function `inside()` to test this, or the
function `find_inside()` to find a probability vector that is inside
the convex hull of the predictions:

```
# define a specific vector of choice probabilities:
p_observed <- c(.25, .48, .93, .10, .32, .50)
# check whether p is inside the convex hull defined by V:
inside(p_observed, V = V)
#> [1] FALSE
# to find a (random) probability that is inside the convex hull:
find_inside(V = V, random = TRUE)
#> [1] 0.8396019 0.7133954 0.2367023 0.4547242 0.4952754 0.4340429
```

Instead of listing the vertices (or edges) of the inequality-constrained parameter space, we may instead define a set of linear inequality constraints on the parameters explicitly. For this purpose, we specify a matrix \(A\) and a vector \(b\). The model only allows parameter vectors \(\theta\) that satisfy the matrix inequality \(A \, \theta \leq b\).

For the description-experience gap, one can show that instead of listing the predictions by the matrix \(V\) as shown above, one can equivalently describe the linear inequalities as follows:

```
A <- matrix(
  c(
    0, 0, -1, 0, 0, 0, # first inequality
    0, 0, 0, 0, 0, -1, # second inequality
    -1, 1, 0, 0, 0, 0, # ...
    0, -1, 0, 0, 1, 0,
    0, 0, 0, 0, -1, 1,
    0, 0, 1, -1, 0, 0,
    0, 0, 0, 1, 0, 0,
    1, 0, 0, 0, 0, 0
  ),
  ncol = 6, byrow = TRUE
)
b <- c(0, 0, 0, 0, 0, 0, 1, 1)
```

For example, the third row of \(A\) and the third value of \(b\) specify that \(-1 \cdot \theta_1 + 1 \cdot \theta_2 + 0 \cdot \theta_3+ 0 \cdot \theta_4+ 0 \cdot \theta_5+ 0 \cdot \theta_6 \leq 0\) (which is equivalent to \(\theta_2 \leq \theta_1\)).
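A quick (one-directional) consistency check of the two representations is possible in base R: every predicted pattern, i.e., every row of the matrix \(V\) from above, must satisfy all of the inequalities defined by \(A\) and \(b\):

```r
# check that each row of V satisfies A %*% v <= b:
all(apply(V, 1, function(v) all(A %*% v <= b)))
#> [1] TRUE
```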

Using the matrix \(A\) and the vector \(b\), one can use standard
matrix multiplication in R (`%*%`) to check whether a given vector of
choice probabilities satisfies all of the inequality constraints:

```
p_observed <- c(.25, .48, .93, .10, .32, .50)
all(A %*% p_observed <= b)
#> [1] FALSE
# corresponding function in multinomineq:
inside(p_observed, A, b)
#> [1] FALSE
# find a point that satisfies the constraints:
find_inside(A, b, random = TRUE)
#> [1] 0.4849912 0.4236391 0.4196148 0.4196248 0.4236291 0.4236191
```

Note that it is sometimes possible to convert the vertex
(\(V\)-)representation (convex hull) to the inequality
(\(Ab\)-)representation using specialized software. For older R
versions (before 4.0.0), this could be done with the package `rPorta`
from GitHub.

In a Bayesian framework, inequality-constrained multinomial models can be fitted by drawing posterior samples for the parameter vector of choice probabilities \(\theta\). If the model is defined in terms of a matrix \(V\) with predicted patterns, we obtain \(M=2,000\) posterior samples as follows:

```
# fit data from Description and Experience condition:
p_D <- sampling_binom(k = HER_desc, n = n, V = V, M = 2000, progress = FALSE)
p_E <- sampling_binom(k = HER_exp, n = n, V = V, M = 2000, progress = FALSE)
```

If the model is defined in terms of the inequality
\(Ab\)-representation, we obtain posterior samples in a similar way by
replacing the argument `V = V` by the two arguments `A = A, b = b`:

```
# fit data from Description and Experience condition:
p_D <- sampling_binom(
  k = HER_desc, n = n, A = A, b = b,
  M = 2000, progress = FALSE
)
p_E <- sampling_binom(
  k = HER_exp, n = n, A = A, b = b,
  M = 2000, progress = FALSE
)
```

If possible, the \(Ab\)-representation should be used because the sampling is much more efficient than for the \(V\)-representation. Irrespective of how the posterior samples have been obtained, we can use standard methods in Bayesian analysis to check the convergence of the MCMC sampler and obtain summary statistics of the posterior distribution:

```
# summarize posterior samples:
summary(p_D)
#>
#> Iterations = 11:2000
#> Thinning interval = 1
#> Number of chains = 1
#> Sample size per chain = 1990
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> [1,] 0.5981 0.05581 0.001251 0.003776
#> [2,] 0.5607 0.05367 0.001203 0.004074
#> [3,] 0.4417 0.06956 0.001559 0.004187
#> [4,] 0.4865 0.07051 0.001581 0.004135
#> [5,] 0.5098 0.05610 0.001258 0.003800
#> [6,] 0.4632 0.06159 0.001381 0.003376
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> var1 0.4862 0.5614 0.5977 0.6350 0.7078
#> var2 0.4490 0.5267 0.5595 0.5956 0.6657
#> var3 0.3060 0.3924 0.4422 0.4881 0.5771
#> var4 0.3552 0.4399 0.4853 0.5329 0.6282
#> var5 0.3913 0.4730 0.5114 0.5459 0.6188
#> var6 0.3453 0.4214 0.4638 0.5069 0.5832
```
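Convergence can be assessed with the standard tools of the `coda` package loaded above, since the posterior samples are returned as an `mcmc` object (a sketch using the fitted object `p_D` from above):

```r
# visual convergence check: traceplots and density plots
plot(p_D)
# effective sample size per parameter (should not be much
# smaller than the number of posterior samples M):
effectiveSize(p_D)
```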

Similarly, we can use posterior-predictive checks to test whether an
inequality-constrained model fits the data adequately (i.e., whether
the constraints hold empirically). Essentially, this method computes
Pearson’s \(X^2\)-statistic to measure the discrepancy both for the
observed and the posterior-predicted frequencies. If the model does
*not* fit the data, the posterior-predictive \(p\)-value `ppp` will
become very small (whereas it will be close to 0.50 if the model
fits). Hence, we usually apply a criterion such as `ppp < .05` to
reject models that do not fit the data, as in the following example:
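A sketch of such a check, assuming the package's `ppp_binom()` function and the posterior samples `p_D` obtained above:

```r
# posterior-predictive p-value for the Description condition,
# based on the posterior samples p_D from above:
ppp_binom(prob = p_D, k = HER_desc, n = n)
```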

Often, one is interested in quantifying the evidence in favor of the
inequality constraints. We can use Bayesian model selection to compare
the inequality-constrained model \(M_0\) against the unconstrained
model \(M_u\) that does not restrict the choice probabilities
\(\theta\) (also called the encompassing model). The Bayes factor
`bf_0u` quantifies the evidence in favor of the inequality-constrained
model \(M_0\) versus the unconstrained model \(M_u\). Similarly,
`bf_u0` is the inverse of `bf_0u` and provides the evidence for the
unconstrained model. Moreover, the Bayes factor `bf_00'` compares the
inequality-constrained model \(M_0\) against the model \(M_{0'}\),
which has a parameter space defined as the *complement* of the
parameter space of \(M_0\).

Bayes factors are computed as follows:

```
# compute Bayes factor using the V-representation
bf_D <- bf_binom(k = HER_desc, n = n, V = V, M = 10000, progress = FALSE)
bf_E <- bf_binom(k = HER_exp, n = n, V = V, M = 10000, progress = FALSE)
```

```
# compute Bayes factor using the Ab-representation
bf_D <- bf_binom(HER_desc, n = n, A = A, b = b, M = 10000, progress = FALSE)
bf_E <- bf_binom(HER_exp, n = n, A = A, b = b, M = 10000, progress = FALSE)
bf_D
#> bf se ci.5% ci.95%
#> bf_0u 0 3.608596e-03 1.200689e-05 9.795163e-03
#> bf_u0 Inf 3.608004e+07 1.020912e+02 8.328551e+04
#> bf_00' 0 3.539881e-03 1.179143e-05 9.617549e-03
bf_E
#> bf se ci.5% ci.95%
#> bf_0u 36.1269036 2.598473041 32.18187465 40.67857138
#> bf_u0 0.0276802 0.001987101 0.02458297 0.03107339
#> bf_00' 122.8414969 9.369839417 108.56683879 139.11544600
```

Note that `multinomineq` also provides an error of approximation for
the Bayes factor to account for random sampling error. The matrix that
is provided as output shows the standard error `se` of the Bayes
factor as well as the 90% credibility interval (`ci.5%` and
`ci.95%`).

Within the function `bf_binom`, the encompassing Bayes factor is
computed in two steps based on the *unconstrained* multinomial model
\(M_u\):

- Draw *prior* parameter samples from \(M_u\) and count the proportion
  of samples \(c\) inside the inequality-constrained parameter space.
- Draw *posterior* parameter samples from \(M_u\) and count the
  proportion of samples \(f\) inside the inequality-constrained
  parameter space.

Based on these two proportions, the Bayes factor in favor of the inequality constraints is approximated as \(B_{0u}=f/c\). It might be convenient to perform these computations in separate steps in R for some models and datasets. For example, the prior constant \(c\) is sometimes known analytically. Moreover, when analyzing multiple datasets or participants, it is more efficient to estimate the prior constant \(c\) only once because \(c\) is identical for all analyses:

```
# proportion of *prior* samples satisfying inequality constraints
cnt <- count_binom(k = 0, n = 0, A = A, b = b, M = 50000, progress = FALSE)
cnt
#> Number of samples satisfying the inequality constraints:
#> count M steps
#> [1,] 1037 50000 8
#>
#> To extract the proporiton of samples, use:
#> attr(count, "proportion") = 0.02074 (SE = 0.00064279).
# proportion of *posterior* samples satisfying inequality constraints
# (dataset: Experience)
cnt_E <- count_binom(k = HER_exp, n = n, A = A, b = b, M = 10000, progress = FALSE)
cnt_E
#> Number of samples satisfying the inequality constraints:
#> count M steps
#> [1,] 7191 10000 8
#>
#> To extract the proporiton of samples, use:
#> attr(count, "proportion") = 0.7191 (SE = 0.004491353).
# obtain Bayes factor (including error of approximation)
count_to_bf(posterior = cnt_E, prior = cnt)
#> bf se ci.5% ci.95%
#> bf_0u 34.67213115 1.105057000 32.93231334 36.51122429
#> bf_u0 0.02884161 0.000919178 0.02738884 0.03036531
#> bf_00' 120.87230740 4.688334259 113.52968049 128.88221949
```

Since Bayes factors can be hard to compute for some models and data
sets, `multinomineq` provides a stepwise procedure that aims at
increasing the efficiency. The user can specify a minimum number of
samples `cmin` that must satisfy a subset of the inequality
constraints. The algorithm keeps sampling until this threshold is
reached. The argument `steps` determines at which rows of the matrix
\(A\) the inequalities are split into subsets of increasing size. For
example, `steps = c(3, 5, 7)` means that the first subset contains
only the first 3 inequalities, the second subset the first 5
inequalities, etc.:

```
# use a stepwise counting approach for the "Description" condition:
# ("cmin" = minimum of samples that satisfy constraints before sampling stops)
cnt_D <- count_binom(
  k = HER_desc, n = n, A = A, b = b,
  M = 5000, cmin = 1, steps = c(3, 5, 7), progress = FALSE
)
cnt_D
#> Number of samples satisfying the inequality constraints:
#> count M steps
#> [1,] 125 5000 3
#> [2,] 84 5000 5
#> [3,] 25 5000 7
#> [4,] 5000 5000 8
#>
#> To extract the proporiton of samples, use:
#> attr(count, "proportion") = 2.1e-06 (SE = 5.337439e-07).
# obtain Bayes factor (including error of approximation)
count_to_bf(posterior = cnt_D, prior = cnt)
#> bf se ci.5% ci.95%
#> bf_0u 1.012536e-04 2.530872e-05 6.689430e-05 1.489683e-04
#> bf_u0 9.876190e+03 2.589882e+03 6.712837e+03 1.494896e+04
#> bf_00' 9.915382e-05 2.479517e-05 6.549815e-05 1.458469e-04
```

In the present example, we only observed *binomial* data, which can
easily be described by the frequency of choosing one of two options.
If more than two choice options are provided for a comparison, a
different data format has to be used for *multinomial* data. In the
binomial case, we omitted the frequencies of *not* choosing a specific
option because we knew the total number \(n\). For multinomial data,
one needs to specify *all* choice frequencies \(k_{ij}\). Moreover,
one needs to specify a vector `options` that defines how many choice
options are available for each item type (e.g., `options = c(3, 3)`
means that there are two scenarios, each with three possible options).

Using this notation, the 6 binomial choice frequencies above can equivalently be specified by the multinomial data format as follows:

```
# binomial data format
n <- 25
HER_desc <- c(9, 16, 16, 7, 12, 16)
# multinomial data format ("k" must be a vector!)
k_multinom <- c(
  9, 16, # first binary gamble
  16, 9, # second binary gamble
  16, 9, # ...
  7, 18,
  12, 13,
  16, 9
)
options <- c(2, 2, 2, 2, 2, 2) # 2 options for each type
```

The binomial format can also be translated to the multinomial format automatically:

```
mn <- binom_to_multinom(HER_desc, n)
mn
#> $k
#> p1_1 p1_2 p2_1 p2_2 p3_1 p3_2 p4_1 p4_2 p5_1 p5_2 p6_1 p6_2
#> 9 16 16 9 16 9 7 18 12 13 16 9
#>
#> $options
#> [1] 2 2 2 2 2 2
```

The multinomial data format is used as input for the analysis as follows:

```
posterior <- sampling_multinom(k = mn$k, options = mn$options, A = A, b = b)
bayesfactor <- bf_multinom(k = k_multinom, options = options, A = A, b = b)
bayesfactor
#> bf se ci.5% ci.95%
#> bf_0u 0 5.662915e-03 1.542493e-05 1.555550e-02
#> bf_u0 Inf 2.935595e+08 6.428594e+01 6.483043e+04
#> bf_00' 0 5.527939e-03 1.503289e-05 1.515908e-02
```

In the following, we define inequality-constrained multinomial models for discrete data. We use the term “item type” to refer to a category system \(i\) that is modeled by a multinomial distribution with a fixed total number of observations \(n_i\). For an item type \(i\), we label the observed frequencies as \(k_{ij}\).

The parameter vector of interest refers to the choice probabilities: \[\theta = (\theta_{11},\dots,\theta_{1 (J_1-1)},\theta_{21},\dots, \theta_{I (J_I-1)})\] Note that the last probability within each item type is omitted because probabilities have to sum to one. Hence, for \(J_i=3\) choice options there are only 2 free parameters.

Using this notation, we can define a product-multinomial likelihood function: \[p(k \mid \theta) = \prod_{i=1}^I {n_i \choose k_{i1} ,\dots ,k_{i J_i} } \prod_{j=1}^{J_i - 1} \theta_{ij}^{k_{ij}} \left(1- \sum_{r=1}^{J_i-1} \theta_{ir}\right)^{k_{i J_i}}\]

Inequality constraints can be defined in two different, equivalent ways. First, one may define a list of inequality constraints that have to hold for the choice probabilities, for instance: \[\begin{align*} \theta_{11} \leq \theta_{21} \, \text{ and } \, \theta_{21} \leq \theta_{31} \end{align*}\] These inequalities can be expressed in vector notation as \(A \,\theta \leq b\) by defining a matrix \(A\) and a vector \(b\) as follows: \[ A = \pmatrix{1 & -1 & 0 \\ 0 & 1 & -1} \,\, \text{ and }\,\, b = \pmatrix{0 \\ 0}\] To formalize this notation, we define the constrained parameter space \(\Omega_0\) via a matrix \(A\) and a vector \(b\) that specify a list of inequalities that have to hold: \[\Omega_0 = \left\{\theta \in \Omega \, \middle | \, A \, \theta \leq b \right\}\]

Alternatively, one may list all the vertices \(v^{(s)}\) defining the constrained parameter space in the rows of a matrix \(V\). In the example above, we have \[ v^{(1)} = \pmatrix{0\\0\\0} \,,\, v^{(2)} = \pmatrix{0\\0\\1} \,,\, v^{(3)} = \pmatrix{0\\1\\1} \,,\, v^{(4)} = \pmatrix{1\\1\\1}\] These vectors can be conveniently summarized in a matrix \(V\) with one vertex per row: \[ V = \pmatrix{0 & 0 & 0\\ 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1}\] In general, the restricted parameter space \(\Omega_0\) is defined as the convex hull of the set of vertices \(v^{(s)}\): \[\Omega_0 = \left\{ \theta = \sum_{s=1}^S \alpha_s v^{(s)} \,\middle |\, \alpha_s \geq 0 \text{ for all } s=1,\dots, S \text{ and }\sum_{s=1}^S \alpha_s=1 \right\}\]
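The equivalence of the two representations for this three-parameter example (the simple order \(\theta_{11} \leq \theta_{21} \leq \theta_{31}\)) can be illustrated in base R by checking that every vertex satisfies the inequalities:

```r
# Ab-representation of theta1 <= theta2 <= theta3:
A3 <- matrix(c(
  1, -1, 0, # theta1 - theta2 <= 0
  0, 1, -1 # theta2 - theta3 <= 0
), nrow = 2, byrow = TRUE)
b3 <- c(0, 0)
# V-representation: one vertex per row
V3 <- matrix(c(
  0, 0, 0,
  0, 0, 1,
  0, 1, 1,
  1, 1, 1
), ncol = 3, byrow = TRUE)
# every vertex satisfies A3 %*% v <= b3:
all(apply(V3, 1, function(v) all(A3 %*% v <= b3)))
#> [1] TRUE
```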

As a prior distribution for the parameters \(\theta\), we assume independent Dirichlet distributions for the different item types \(i\) with shape parameters \(\alpha_{ij}\). Note that we only allow parameters that are inside the constrained parameter space \(\Omega_0\): \[p(\theta) = \frac {1}{c} \, \mathbb I_{\Omega_0}(\theta) \, \prod_{i=1}^I \prod_{j=1}^{J_i} \theta_{ij}^{\alpha_{ij}-1}\] Here, \(\mathbb I_{\Omega_0}(\theta)\) is the indicator function which is one if \(\theta\in\Omega_0\) and zero otherwise. Moreover, the normalizing constant \(c\) ensures that the prior density function integrates to one.

Finally, the analytical solution of the posterior distribution is also a product-Dirichlet distribution, truncated to the constrained parameter space: \[p(\theta \mid k) = \frac {1}{f} \, \mathbb I_{\Omega_0}(\theta) \, \prod_{i=1}^I \prod_{j=1}^{J_i} \theta_{ij}^{k_{ij} + \alpha_{ij}-1}\] Similarly as before, \(f\) is the normalizing constant of the posterior density function.

For a more detailed introduction with technical details, see:

- Heck, D. W., & Davis-Stober, C. P. (2019). Multinomial models with linear inequality constraints: Overview and improvements of computational methods for Bayesian inference. Manuscript under revision. https://arxiv.org/abs/1808.07140