`icons`

dataset in the `hyper2`

packageTo cite the `hyper2`

package in publications, please use
Hankin (2017). This short document
analyses the climate change dataset introduced by O’Neill (2007) and discussed in the
`hyperdirichlet`

R package (Hankin
2010), but using the improved `hyper2`

package
instead. O’Neill and Hulme (2009) observe
that lay perception of climate change is a complex and interesting
process, and here we assess the engagement of non-experts by the use of
“icons” (this word is standard in this context. An icon is a
“representative symbol”) that illustrate different impacts of climate
change.

Here we analyse results of one experiment (O’Neill 2007), in which subjects were presented
with a set of icons of climate change and asked to identify which of
them they found most concerning. Six icons were used: PB [polar bears,
which face extinction through loss of ice floe hunting grounds], NB [the
Norfolk Broads, which flood due to intense rainfall events], L [London
flooding, as a result of sea level rise], THC [the thermo-haline
circulation, which may slow or stop as a result of anthropogenic
modification of the water cycle], OA [oceanic acidification as a result
of anthropogenic emissions of \({\mathrm
C}{\mathrm O}_2\)], and WAIS [the West Antarctic Ice Sheet, which
is rapidly calving as a result of climate change]. Methodological
constraints dictated that each respondent could be presented with a
maximum of four icons. The R idiom below (dataset `icons`

in
the package) shows the experimental results.

```
## NB L PB THC OA WAIS
## [1,] 5 3 NA 4 NA 3
## [2,] 3 NA 5 8 NA 2
## [3,] NA 4 9 2 NA 1
## [4,] 10 3 NA 3 4 NA
## [5,] 4 NA 5 6 3 NA
## [6,] NA 4 3 1 3 NA
## [7,] 5 1 NA NA 1 2
## [8,] 5 NA 1 NA 1 1
## [9,] NA 9 7 NA 2 0
```

(M is called `icons_table`

in the package). Each row of
`M`

corresponds to a particular cue given to respondents. The
first row, for example, means that a total of \(5+3+4+3=15\) people were shown icons NB, L,
THC, WAIS [column names of the the non-NA entries]; 5 people chose NB as
most concerning, 3 chose L, and so on. The dataset is more fully
described in the package. The builtin `icons`

likelihood
function in the `hyper2`

package may be created by using the
`saffy()`

function:

```
## log(L^24 * (L + NB + OA + THC)^-20 * (L + NB + OA + WAIS)^-9 * (L + NB
## + THC + WAIS)^-15 * (L + OA + PB + THC)^-11 * (L + OA + PB + WAIS)^-18
## * (L + PB + THC + WAIS)^-16 * NB^32 * (NB + OA + PB + THC)^-18 * (NB +
## OA + PB + WAIS)^-8 * (NB + PB + THC + WAIS)^-18 * OA^14 * PB^30 *
## THC^24 * WAIS^9)
```

`## [1] TRUE`

At this point, the `icons`

object as created above is
mathematically identical to the `icons`

object in the
`hyperdirichlet`

package (and indeed the `hyper2`

package), but the terms might appear in a different order due to
`disordR`

discipline.

The first step is to find the maximum likelihood estimate for the
`icons`

likelihood:

```
## NB L PB THC OA WAIS
## 0.25230 0.17364 0.22458 0.17011 0.11069 0.06867
```

We also need the log-likelihood at an unconstrained maximum:

`## [1] -174.997435`

We see agreement to 4 decimal places with the value given in the
`hyperdirichlet`

package. The next step is to assess a number
of hypotheses concerning the relative sizes of \(p_1\) through \(p_6\).

Below, I investigate a number of inequality hypotheses about \(p_1,\ldots, p_6\). The general analysis proceeds as follows; we can use \(H_0\colon p_1=1/6\) as an example but the method applies to any observation. I consider the general case first, then modifications for one-sided hypotheses.

- We have an observation \({\mathcal O}\) and wish to quantify the strength of evidence to support this.
- Translate our observation into a hypothesis phrased in terms as a restriction on parameter space.
- State a null hypothesis and alternative. For example, we might have \(H_0\colon p_1=1/6\) and \(H_A\colon p_1\neq 1/6\), and \(H_A\) is the complement of \(H_0\). We sometimes write \(H_A\colon\sum p_i=1\), ignoring the (measure zero) \(H_0\).
- We then attempt to reject \(H_0\) and to do this we perform two likelihood maximizations: one unconstrained, corresponding to \(H_A\), and one constrained, corresponding to \(H_0\).
- We calculate \({\mathcal L}_{H_0} :=\operatorname{argmax}_{p\in H_0}{\mathcal L}(p)\) and \({\mathcal L}_{H_A} := \operatorname{argmax}_{p\in H_A}{\mathcal L}(p)\) and observe that, in general, \({\mathcal O}\) implies the strict inequality \({\mathcal L}_{H_A} > {\mathcal L}_{H_0}\).
- Calculate the likelihood ratio \(R={\mathcal L}_{H_A}/{\mathcal
L}_{H_0}>1\) or equivalently the
*support*for \(H_0\), \(\Lambda = \log R>0\). - We may either use Edwards’s criterion of two units of support per degree of freedom, or Wilks’s theorem: \(H_0\longrightarrow 2\Lambda\sim\chi^2_d\). If we are considering inequality hypotheses, we have \(d=1\).
- If the criterion is met we may reject \(H_0\) and infer that \(H_A\) is the case.

As another example, consider \(H_0\colon p_1=p_2=p_3\). The vector \({\mathbf p}\in \left\{\left.\left(p_1,\ldots,p_6\right)\right| \sum p_i=1\right\}\) is a point in a 5-dimensional manifold, and \(H_0\) asserts that \({\mathbf p}\in \left\{\left.\left(p_1,p_1,p_1,p_4,p_5,p_6\right)\right| 3p_1+p_4+p_5+p_6=1\right\}\), that is, a 3-dimensional manifold. The null imposes a loss of two degrees of freedom.

The logic above operates for one-sided tests. We might observe that \(\hat{p_1}\) is the largest and wish to test a null of \(p_1\leqslant 1/6\) against an alternative of \(p_1>1/6\). Note that the one-sided p-value and likelihood ratio statistic are the same as the two-sided values.

Below I will rework some of the hypotheses tested in the hyperdirichlet package, with consistent labelling of null and alternative hypotheses, and renumbering

The most straightforward null would be the hypothesis of player equality, specifically:

\[ H_E\colon p_1=p_2=\cdots=p_n=\frac{1}{n}. \]

(“E” for equal). This was not carried out in Hankin (2010), because I had not thought of it.
Testing \(H_E\) is implemented in the
package as `test.equalp()`

. Note that because \(H_E\) is a point hypothesis we would have
\(n-1\) degrees of freedom (because
\(H_0\) has \(n-1\) df).

```
##
## Constrained support maximization
##
## data: icons
## null hypothesis: NB = L = PB = THC = OA = WAIS
## null estimate:
## NB L PB THC OA WAIS
## 0.166666667 0.166666667 0.166666667 0.166666667 0.166666667 0.166666667
## (argmax, constrained optimization)
## Support for null: -184.37715 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## NB L PB THC OA WAIS
## 0.2523041150 0.1736443259 0.2245818764 0.1701128100 0.1106860420 0.0686708306
## (argmax, free optimization)
## Support for alternative: -174.997435 + K
##
## degrees of freedom: 5
## support difference = 9.37971546
## p-value: 0.00213083779
```

Following the analysis in Hankin (2010), and restated above, we first observe that NB [the Norfolk Broads] is the icon with the largest estimated probability with \(\hat{p_1}\simeq 0.25\) (O’Neill gives a number of theoretical reasons to expect \(p_1\) to be large). This would suggest that \(p_1\) is in fact large, in some sense, and here I show how to assess this statement statistically. Consider the hypothesis \(H_0\colon p_1=1/6\), and as per the protocol above we will try to reject it.

To that end, we perform a constrained optimization, with (active)
constraint that \(p_1\leqslant 1/6\)
(note that the inequality constraint allows us to use fast
`maxp()`

, which has access to derivatives). We note the
support at the evaluate and then compare this support with the support
at the unconstrained evaluate; if the difference in support is large
then this constitutes strong evidence for \(H_A\) and then would conclude that \(p_1 > 1/6\). In package idiom, the
optimization is implemented by the `specificp.test()`

suite
of functions; these work by imposing additional constraints to the
`maxp()`

function via the `fcm`

and
`fcv`

arguments. Using the defaults we have:

```
##
## Constrained support maximization
##
## data: H
## null hypothesis: sum p_i=1, p_1 = 0.166666667
## null estimate:
## NB L PB THC OA WAIS
## 0.1666666514 0.1935760159 0.2520057446 0.1848196512 0.1248364824 0.0780954545
## (argmax, constrained optimization)
## Support for null: -177.614346 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## NB L PB THC OA WAIS
## 0.2523041150 0.1736443259 0.2245818764 0.1701128100 0.1106860420 0.0686708306
## (argmax, free optimization)
## Support for alternative: -174.997435 + K
##
## degrees of freedom: 1
## support difference = 2.61691171
## p-value: 0.0221517868 (two sided)
```

which tests a null of \(p_1\leqslant\frac{1}{6}\); see how the
evaluate under the null is on the boundary and we have \(\hat{p_1}=\frac{1}{6}\). Compare the
support of 2.607 with 2.608181 from the `hyperdirichlet`

package. This exceeds Edwards’s two-units-of-support criterion; the
\(p\)-value is obtained by applying
Wilks’s theorem on the asymptotic distribution of 2.

Both these criteria indicate that we may reject that hypothesis that \(p_1\leqslant 1/6\) and thereby infer \(p_1>\frac{1}{6}\).

We observe that NB (the Norfolk Broads) has large strength, and hypothesise that it is in fact stronger than any other icon. This is another constrained likelihood maximization, although this one is not possible with convex constraints. In the language of the generic procedure given above, we would have

\[ H_0\colon (p_1\leqslant p_2)\cup (p_1\leqslant p_3)\cup (p_1\leqslant p_4)\cup (p_1\leqslant p_5)\cup (p_1\leqslant p_6) \]

\[ \overline{H_0}\colon (p_1 > p_2)\cap (p_1 > p_3)\cap (p_1 > p_4)\cap (p_1 > p_5)\cap (p_1 > p_6) \]

(although note that \(H_0\) has
nonzero measure). In words, \(H_0\)
states that \(p_1\) is smaller than
\(p_2\), or \(p_1\) is smaller than \(p_3\), etc; while \(\overline{H_0}\) states that \(p_1\) is larger than \(p_2\), and is larger than \(p_3\), and so on. Considering the evaluate,
we see that \({\mathcal L}_{H_0} <
{\mathcal L}_{\overline{H_0}}\), so optimizing over \(\overline{H_0}\) is equivalent to
unconstrained optimization [of course, the intrinsic constraints \(p_i\geqslant 0, \sum p_i\leqslant 1\) have
to be respected]. Here, \(\overline{H_0}\) is regions of \((p_1,\ldots,p_6)\) with \(p_1\) being greater than *at least
one* of \(p_2,\ldots,p_6\). The
union of convex sets is not necessarily convex (e.g. a two-way Venn
diagram). As far as I can see, the only way to do it is to perform a
sequence of five constrained optimizations: \(p_1\leqslant p_2, p_1\leqslant p_3, p_1\leqslant
p_4, p_1\leqslant p_5\). The fillup constraint would be \(p_1\leqslant p_6\longrightarrow 2p_1+p_2+\cdots
+p_5\leqslant 1\). We then choose the largest likelihood from the
five.

```
o <- function(Ul,Cl,startp,give=FALSE){
small <- 1e-4 # ensure start at an interior point
if(missing(startp)){startp <- small*(1:5)+rep(0.1,5)}
out <- maxp(icons, startp=small*(1:5)+rep(0.1,5), give=TRUE, fcm=Ul,fcv=Cl)
if(give){
return(out)
}else{
return(out$value)
}
}
p2max <- o(c(-1, 1, 0, 0, 0), 0) # p1 <= p2
p3max <- o(c(-1, 0, 1, 0, 0), 0) # p1 <= p3
p4max <- o(c(-1, 0, 0, 1, 0), 0) # p1 <= p4
p5max <- o(c(-1, 0, 0, 0, 1), 0) # p1 <= p5
p6max <- o(c(-2,-1,-1,-1,-1),-1) # p1 <= p6 (fillup)
```

(the final line is different because \(p_6\) is the fillup value).

`## [1] -175.810521 -175.081858 -175.983441 -178.204173 -181.515387`

`## [1] -175.081858`

Thus the first element of `likes`

corresponds to the
maximum likelihood, constrained so that \(p_1\leqslant p_2\); the second element
corresponds to the constraint that \(p_1\leqslant p_3\), and so on. The largest
likelihood is the easiest constraint to break, in this case \(p_1\leqslant p_3\): this makes sense
because \(p_3\) has the second highest
MLE after \(p_1\). The extra likelihood
is given by

`## [1] 0.0844236585`

(the `hyperdirichlet`

package gives 0.0853 here, a
surprisingly small discrepancy given the difficulties of optimizing over
a nonconvex region). We conclude that there is no evidence for \(p_1\geqslant\max\left(p_2,\ldots,p_6\right)\).

It’s worth looking at the evaluate too:

```
o2 <- function(Ul,Cl){
jj <-o(Ul,Cl,give=TRUE)
out <- c(jj[[1]],1-sum(jj[[1]]),jj[[2]])
names(out) <- c("p1","p2","p3","p4","p5","p6","support")
return(out)
}
rbind(
o2(c(-1, 1, 0, 0, 0), 0), # p1 <= p2
o2(c(-1, 0, 1, 0, 0), 0), # p1 <= p3
o2(c(-1, 0, 0, 1, 0), 0), # p1 <= p4
o2(c(-1, 0, 0, 0, 1), 0), # p1 <= p5
o2(c(-2,-1,-1,-1,-1),-1) # p1 <= p6
)
```

```
## p1 p2 p3 p4 p5 p6
## [1,] 0.211674267 0.211674268 0.226222108 0.169452254 0.111434270 0.0695428329
## [2,] 0.238552641 0.174097331 0.238552998 0.169739010 0.110517185 0.0685408341
## [3,] 0.209075905 0.175046706 0.228380262 0.209075907 0.110003635 0.0684175840
## [4,] 0.181049944 0.176566992 0.228589945 0.165076513 0.181049945 0.0676666608
## [5,] 0.159212367 0.174456695 0.234412699 0.168868084 0.103837779 0.1592123762
## support
## [1,] -175.810521
## [2,] -175.081858
## [3,] -175.983441
## [4,] -178.204173
## [5,] -181.515387
```

In the above, the evaluate is the first five columns (\(p_6\) being the fillup) and the final
column is the log-likelihood at the evaluate. See how the constraint is
active in each line: `M[1,] == M[1:5,2:6]`

. Also note that
the largest log-likelihood is the second row: if we were to violate any
of the constraints, it would be \(p_1<p_3\), consistent with the fact that
\(p_3\) (polar bears) has the second
highest strength, after \(p_1\).

The next hypothesis follows from the observed smallness of \(\hat{p_5}\) (ocean acidification) and \(\hat{p_6}\) (West Antarctic Ice Sheet) at 0.111 and 0.069 respectively. These two strengths correspond to “distant” concerns and O’Neill had reason to consider their sum (which she argued would be small). Thus we specify \(H_0\colon p_5+p_6\leqslant \frac{1}{3}\).

The optimizing constraint of \(p_5+p_6\geqslant\frac{1}{3}\) translates to an operational constraint of \(-p_1-p_2-p_3-p_4\geqslant-\frac{2}{3}\) (because \(p_5+p_6=1-p_1-p_2-p_3-p_4\)):

`## [1] -182.21078`

then the extra support is

`## [1] 7.21334577`

(compare 7.711396 in `hyperdirichlet`

, not sure why the
discrepancy is so large).

The final example is motivated by the fact that both the distant
icons \(p_5\) and \(p_6\) had lower strength than any of the
local icons \(p_1,\ldots, p_4\). Thus
we would have \(H_0\colon\max\left\{p_5,p_6\right\}\geqslant\min\left\{p_1,p_2,p_3,p_4\right\}\).
This means the null optimization is constrained so that *at least
one* of \(\left\{p_5,p_6\right\}\)
exceeds *at least one* of \(\left\{p_1,p_2,p_3,p_4\right\}\). So we
have the union of the various possibilities:

\[\begin{equation}\label{eq:Habar}H_0= \overline{H_A}\colon \bigcup_{j\in\left\{5,6\right\}\atop k\in\left\{1,2,3,4\right\}} \left\{\left(p_1,p_2,p_3,p_4,p_5,p_6\right)\left|\sum p_i=1, p_j\geqslant p_k\right.\right\} \end{equation}\]

The fillup value \(p_6\) behaves differently in this context and \(p_6\geqslant p_2\), say, translates to \(-p_1-2p_2-p_3-p_4-p_5\geqslant -1\).

```
small <- 1e-4
start <- indep(c(small,small,small,small,0.5-2*small,0.5-2*small))
jj <- c(
o(c(-1, 0, 0, 0, 1), 0,start=start), # p1 >= p5
o(c( 0,-1, 0, 0, 1), 0,start=start), # p2 >= p5
o(c( 0, 0,-1, 0, 1), 0,start=start), # p3 >= p5
o(c( 0, 0, 0,-1, 1), 0,start=start), # p4 >= p5
o(c(-2,-1,-1,-1,-1),-1,start=start), # p1 >= p6
o(c(-1,-2,-1,-1,-1),-1,start=start), # p2 >= p6
o(c(-1,-1,-2,-1,-1),-1,start=start), # p3 >= p6
o(c(-1,-1,-1,-2,-1),-1,start=start) # p4 >= p6
)
jj
```

```
## [1] -178.204173 -175.894803 -177.318942 -175.747872 -181.515387 -177.978365
## [7] -180.406280 -177.753717
```

Above, the elements of vector `jj`

are the maximum
likelihoods for each of the separate components of the parameter space
allowed under \(H_0\). Note that these
components are not disjoint. So the maximum likelhood for the whole of
allowable parameters under \(H_0\)
would be the maximum of these maxima:

`## [1] -175.747872`

This corresponds to the fourth component of \(H_0\), viz \(p_4=p_5\). The extra support is thus

`## [1] 0.750437512`

(compare `hyperdirichlet`

which gives 3.16, not sure why
the difference although if pressed I would point to `hyper2`

using multiple walkers in its optimization [defaulting to \(n=10\)] ). We should look at the maximum
value:

```
## $par
## [1] 0.250298786 0.173616322 0.224279186 0.141760611 0.141760626
##
## $value
## [1] -175.747872
##
## $counts
## function gradient
## 551 86
##
## $convergence
## [1] 0
##
## $message
## NULL
##
## $outer.iterations
## [1] 2
##
## $barrier.value
## [1] 0.000172310891
##
## $likes
## [1] -175.747872
##
## $evaluate
## NB L PB THC OA WAIS
## 0.2502987861 0.1736163215 0.2242791861 0.1417606108 0.1417606257 0.0682844698
```

So the evaluate is at the boundary, for \(p_4=p_5\); `THC`

and
`OA`

have the same Bradley-Terry strength. The small amount
of extra support given by allowing an unconstrained optimization would
suggest that there is no strong evidence for the contention
investigated, viz “both the distant icons \(p_5\) and \(p_6\) have lower strength than any of the
local icons \(p_1,\ldots, p_4\)”.

Hankin, R. K. S. 2010. “A Generalization of the Dirichlet
Distribution.” *Journal of Statistical Software* 33 (11):
1–18. https://doi.org/10.18637/jss.v033.i11.

———. 2017. “Partial Rank Data with the
*The R Journal* 9 (2): 429–39.

`hyper2`

Package: Likelihood Functions for
Generalized Bradley-Terry Models.”
O’Neill, S. J. 2007. “An Iconic Approach to Representing Climate
Change.” PhD thesis, School of Environmental Science, University
of East Anglia.

O’Neill, S. J., and M. Hulme. 2009. “An Iconic Approach for
Representing Climate Change.” *Global Environmental
Change* 19: 402–10.