Using jrt

Nils Myszkowski, PhD

2023-04-13

This package provides user-friendly functions for fitting Item Response Theory (IRT) models to judgment data and for scoring with them. Although it can be used in a variety of contexts, it was originally motivated by the needs of creativity researchers.

Disclaimer

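jrt is not an estimation package: it provides wrapper functions that call estimation packages and extract, report, and plot information from them. At this stage, jrt uses the (excellent) package mirt (Chalmers, 2012) as its only IRT engine. Thus, if you use jrt for your research, please make sure to cite mirt as the estimation package/engine:

Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. doi: 10.18637/jss.v048.i06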

We also encourage you to cite jrt, especially if you use the plots or the automatic model selection. The citation information for the installed version can be displayed in R with citation("jrt").

Ok now let’s get started…

What the data should look like

A judgment data.frame is provided to the function jrt(). Here we’ll use the simulated one in jrt::ratings.

data <- jrt::ratings

It looks like this:

head(data)
#>   Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6
#> 1       5       4       3       4       4       4
#> 2       3       3       2       3       2       2
#> 3       3       3       3       3       3       2
#> 4       3       2       2       3       4       2
#> 5       2       3       1       2       2       1
#> 6       3       2       2       3       2       1
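As the preview suggests, each row is a product and each column is a judge. The following is a minimal sketch of building a data.frame with that layout by hand in base R. The object name my_ratings and the rating values are hypothetical and serve only as an illustration.

```r
# Hypothetical ratings: 4 products (rows) rated by 3 judges (columns)
my_ratings <- data.frame(
  Judge_1 = c(5, 3, 3, 2),
  Judge_2 = c(4, 3, 2, 3),
  Judge_3 = c(3, 2, 2, 1)
)

# Basic sanity checks: a data.frame whose columns are all numeric
stopifnot(is.data.frame(my_ratings))
stopifnot(all(vapply(my_ratings, is.numeric, logical(1))))

# Response categories actually observed across all judges
observed_categories <- sort(unique(unlist(my_ratings)))
observed_categories
```

An object shaped like this could then take the place of data in the calls that follow.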

jrt is in development and these features will hopefully appear soon (check back!), but in this release:

I know, that’s a lot that you can’t do…but this covers the typical cases, at least for the Consensual Assessment Technique – which is why it was originally created.

Model fitting, scoring and statistics with jrt()

You will first want to load the library.

library(jrt)
#> Loading required package: directlabels

The main function of the jrt package is jrt(). By default, this function will:

Let’s do it!

fit <- jrt(data, progress.bar = FALSE)
#> The possible responses detected are: 1-2-3-4-5
#> 
#> -== Model Selection (6 judges) ==-
#> AIC for Rating Scale Model: 4414.163 | Model weight: 0.000
#> AIC for Generalized Rating Scale Model: 4368.776 | Model weight: 0.000
#> AIC for Partial Credit Model: 4022.956 | Model weight: 0.000
#> AIC for Generalized Partial Credit Model: 4014.652 | Model weight: 0.000
#> AIC for Constrained Graded Rating Scale Model: 4399.791 | Model weight: 0.000
#> AIC for Graded Rating Scale Model: 4307.955 | Model weight: 0.000
#> AIC for Constrained Graded Response Model: 3999.248 | Model weight: 0.673
#> AIC for Graded Response Model: 4000.689 | Model weight: 0.327
#>  -> The best fitting model is the Constrained Graded Response Model.
#> 
#>  -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#> 
#> -== IRT Summary ==-
#> - Model: Constrained (equal slopes) Graded Response Model (Samejima, 1969) | doi: 10.1007/BF03372160
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Factor scoring method: Expected A Posteriori (EAP)
#> - AIC = 3999.248 | BIC = 4091.843 | SABIC = 4091.843 | HQ = 4036.305
#> 
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .893
#> - Expected reliability | Assumes a Normal(0,1) prior density: .894

Of course, there is more available here than one would report. If using IRT scoring (which is the main purpose of this package), we recommend reporting which IRT model was selected, along with IRT indices primarily, since the scoring is based on the estimation of the \(\theta\) abilities. In this case, what is typically reported is the empirical reliability (here 0.893), which is the estimate of the reliability of the observations in the sample. It can be interpreted similarly to other, more traditional indices of reliability (like Cronbach’s \(\alpha\)).
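The "Model weight" column in the model selection output appears consistent with Akaike weights. The sketch below recomputes them in base R from the AIC values printed above. That jrt computes its weights exactly this way is an assumption; the variable names are illustrative.

```r
# AIC values copied from the model selection output above
aic <- c(
  "Rating Scale Model"                    = 4414.163,
  "Generalized Rating Scale Model"        = 4368.776,
  "Partial Credit Model"                  = 4022.956,
  "Generalized Partial Credit Model"      = 4014.652,
  "Constrained Graded Rating Scale Model" = 4399.791,
  "Graded Rating Scale Model"             = 4307.955,
  "Constrained Graded Response Model"     = 3999.248,
  "Graded Response Model"                 = 4000.689
)

# Akaike weights: relative likelihood of each model, normalised to sum to 1
delta   <- aic - min(aic)
rel_lik <- exp(-delta / 2)
weights <- rel_lik / sum(rel_lik)

round(weights, 3)
names(which.max(weights))  # model with the largest weight
```

With these inputs, the two graded response models receive weights of about .673 and .327, matching the printed output, and every other model rounds to .000.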

fit <- jrt(data, silent = TRUE)

One may of course select a model based on assumptions about the data rather than on model fit comparisons. This is done by passing the name of a model to the irt.model argument of the jrt() function, which bypasses the automatic model selection stage.

fit <- jrt(data, "PCM")
#> The possible responses detected are: 1-2-3-4-5
#> 
#>  -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#> 
#> -== IRT Summary ==-
#> - Model: Partial Credit Model (Masters, 1982) | doi: 10.1007/BF02296272
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Factor scoring method: Expected A Posteriori (EAP)
#> - AIC = 4022.956 | BIC = 4115.55 | SABIC = 4115.55 | HQ = 4060.012
#> 
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .889
#> - Expected reliability | Assumes a Normal(0,1) prior density: .759

See the documentation for a list of available models. Most models are directly those of mirt. Others are versions of the Graded Response Model or Generalized Partial Credit Model that are constrained in various ways (equal discriminations and/or equal category structures) through the mirt.model() function of mirt.

Note that they can also be called by their full names (e.g. jrt(data, "Graded Response Model")).

head(fit@factor.scores)
#>   Judgments.Factor.Score Judgments.Standard.Error Judgments.Mean.Score
#> 1              1.7075935                0.5824540             4.000000
#> 2             -0.7213210                0.5581823             2.500000
#> 3             -0.1527368                0.5119554             2.833333
#> 4             -0.4246422                0.5319891             2.666667
#> 5             -2.2557844                0.6720457             1.833333
#> 6             -1.4155178                0.6202796             2.166667
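Factor scores and their standard errors are the ingredients of an empirical reliability estimate. The sketch below uses one common definition: the variance of the estimated scores divided by that variance plus the mean squared standard error. Whether jrt and mirt use exactly this formula is an assumption, and the helper name empirical_reliability is hypothetical. Only the six rows printed above are used, so the result will not match the value reported for the full sample.

```r
# The six factor scores and standard errors printed above
theta <- c(1.7075935, -0.7213210, -0.1527368, -0.4246422, -2.2557844, -1.4155178)
se    <- c(0.5824540, 0.5581823, 0.5119554, 0.5319891, 0.6720457, 0.6202796)

# Hypothetical helper: score variance over (score variance + error variance)
empirical_reliability <- function(theta, se) {
  score_var <- var(theta)   # variance of the estimated abilities
  error_var <- mean(se^2)   # average squared standard error
  score_var / (score_var + error_var)
}

rel <- empirical_reliability(theta, se)
rel

# With a fitted object, the same columns could be taken from the scores, e.g.
# fit@factor.scores[, "Judgments.Factor.Score"]
```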

Note: If you want a more complete output with the original data, use @output.data. If there were missing data, @output.data also appends imputed data.

head(fit@output.data)
#>   Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6 Judgments.Factor.Score
#> 1       5       4       3       4       4       4              1.7075935
#> 2       3       3       2       3       2       2             -0.7213210
#> 3       3       3       3       3       3       2             -0.1527368
#> 4       3       2       2       3       4       2             -0.4246422
#> 5       2       3       1       2       2       1             -2.2557844
#> 6       3       2       2       3       2       1             -1.4155178
#>   Judgments.Standard.Error Judgments.Mean.Score
#> 1                0.5824540             4.000000
#> 2                0.5581823             2.500000
#> 3                0.5119554             2.833333
#> 4                0.5319891             2.666667
#> 5                0.6720457             1.833333
#> 6                0.6202796             2.166667
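In the rows shown, the Judgments.Mean.Score column matches the plain average of the six judges' ratings for each product. A short base R check on those rows (the object name ratings_head is illustrative):

```r
# First six rows of the judge ratings, copied from the output above
ratings_head <- data.frame(
  Judge_1 = c(5, 3, 3, 3, 2, 3),
  Judge_2 = c(4, 3, 3, 2, 3, 2),
  Judge_3 = c(3, 2, 3, 2, 1, 2),
  Judge_4 = c(4, 3, 3, 3, 2, 3),
  Judge_5 = c(4, 2, 3, 4, 2, 2),
  Judge_6 = c(4, 2, 2, 2, 1, 1)
)

# Average rating per product (row) across judges (columns)
mean_scores <- rowMeans(ratings_head)
round(mean_scores, 6)
```

The resulting values (4, 2.5, 2.833333, 2.666667, 1.833333, 2.166667) agree with the printed mean scores, which makes them a convenient point of comparison for the IRT-based factor scores.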

Judge Category Curves

Judge characteristics can be inspected with Judge Category Curve (JCC) plots. They are computed with the function jcc.plot().

A basic example for Judge 3…

jcc.plot(fit, judge = 3)

Of course, there are many options; here are a few things that you could try:

jcc.plot(fit)

jcc.plot(fit, judge = c(1,6))

jcc.plot(fit, facet.cols = 2)

jcc.plot(fit, 1, greyscale = TRUE)
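The options shown above can presumably be combined in a single call. The sketch below assumes that jrt is installed and that judge, facet.cols and greyscale may be passed together. Each argument appears individually in the examples above, but the combination is an assumption rather than something documented here.

```r
# Run only if jrt is available, so the snippet is safe to source anywhere
if (requireNamespace("jrt", quietly = TRUE)) {
  library(jrt)

  # Fit quietly on the simulated ratings that ship with the package
  fit <- jrt(jrt::ratings, silent = TRUE)

  # Judges 1 and 6, arranged in two columns, in greyscale
  jcc.plot(fit, judge = c(1, 6), facet.cols = 2, greyscale = TRUE)
}
```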