31 Item Response Theory

One could make a case that item response theory is the most important statistical method about which most of us know little or nothing.
— David Kenny

Item response theory (IRT) is arguably one of the most influential developments in the field of educational and psychological measurement. IRT provides a foundation for statistical methods that are utilized in contexts such as test development, item analysis, equating, item banking, and computerized adaptive testing. Its applications also extend to the measurement of a variety of latent constructs in a variety of disciplines.

Given its role and influence in educational and psychological measurement, the topic of IRT has accumulated an extensive literature. Rather than cover every detail, this module gives a broad overview of IRT, with the intention of helping you understand key concepts and common applications. For comprehensive treatments of IRT, see de Ayala (2009) and Embretson and Reise (2000). For a comparison of CTT and IRT, see Hambleton and Jones (1993). Harvey and Hammer (1999) describe IRT specifically in the context of psychological testing.

This module begins with a comparison of IRT with classical test theory (CTT), including a discussion of strengths and weaknesses and some typical uses of each. Next, the traditional dichotomous IRT models are introduced with definitions of key terms and a comparison based on assumptions, benefits, limitations, and uses. Finally, details are provided on applications of IRT in item analysis, test development, item banking, and computer adaptive testing.

31.1 Learning objectives

Compare and contrast IRT and CTT in terms of their strengths and weaknesses.
Identify the two main assumptions that are made when using a traditional IRT model, regarding dimensionality and functional form or the number of model parameters.
Identify key terms in IRT, including probability of correct response, logistic curve, theta, IRF, TRF, SEM, and information functions.
Define the three item parameters and one ability parameter in the traditional IRT models, and describe the role of each in modeling performance with the IIF.
Distinguish between the 1PL, 2PL, and 3PL IRT models in terms of assumptions made, benefits and limitations, and applications of each.
Describe how IRT is utilized in item analysis, test development, item banking, and computer adaptive testing.

In this module, we’ll use epmr to run IRT analyses with PISA09 data, and we’ll use ggplot2 for plotting results.

# R setup for this module
library("epmr")
#> 
#> Attaching package: 'epmr'
#> The following object is masked from 'package:psych':
#> 
#>     skew
library("ggplot2")
# Functions we'll use in this module
# rirf() for getting an item response function
# scale_y_continuous() for plotting a continuous scale
# irtstudy() from epmr for running an IRT study
# rtef() from epmr for getting a test error function
# rtif() from epmr for getting a test information function