30 Summary

This module provided an introduction to item analysis in cognitive and noncognitive testing, with guidelines on collecting and scoring pilot data and an overview of five types of statistics used to examine item-level performance: item difficulty, item discrimination, internal consistency, option analysis, and differential item functioning. Together, these statistics are used to identify items that contribute to or detract from the quality of a measure.
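To make the first three of these statistics concrete, here is a minimal base R sketch that computes item difficulty, ITC, CITC, and AID for a small set of simulated dichotomous items. All object names and data here are hypothetical, not from PISA09.

```r
# Simulate correlated 0/1 responses for 5 items from a common ability
set.seed(42)
theta <- rnorm(200)
x <- as.data.frame(sapply(1:5, function(j)
  as.integer(theta + rnorm(200) > qnorm(0.4))))
names(x) <- paste0("item", 1:5)

# Item difficulty: proportion correct per item (p-values)
pvals <- colMeans(x)

# Item discrimination: item-total (ITC) and corrected item-total (CITC)
# correlations; CITC removes the item from the total before correlating
total <- rowSums(x)
itc <- apply(x, 2, cor, y = total)
citc <- sapply(seq_len(ncol(x)), function(j) cor(x[, j], total - x[, j]))

# Coefficient alpha, and alpha if item deleted (AID)
alpha <- function(d) {
  k <- ncol(d)
  k / (k - 1) * (1 - sum(apply(d, 2, var)) / var(rowSums(d)))
}
aid <- sapply(seq_len(ncol(x)), function(j) alpha(x[, -j]))

round(data.frame(p = pvals, itc = itc, citc = citc, aid = aid), 2)
```

In practice these indices are examined together; items with low or negative CITC, or with an AID above the overall alpha, would be flagged for review.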

Item analysis, as described in this module, is based on a CTT model of test performance. We have assumed that a single construct is being measured and that item analysis results are based on a representative sample from our population of test takers. Module 31 builds on the concepts introduced here by extending them to the more complex, but also more popular, IRT model of test performance.

30.1 Exercises

  1. Explain why we should be cautious about interpreting item analysis results based on pilot data.
  2. For an item with high discrimination, how should \(p\)-values on the item compare when calculated separately for two groups that differ in their true mean abilities?
  3. Why is discrimination usually lower for the CITC than for the ITC for a given item?
  4. What features of a response option, in terms of the item content itself, would make it stand out as problematic in an option analysis?
  5. Explain how AID is used to identify items contributing to internal consistency.
  6. Conduct an item analysis on the PISA09 reading items for students in Great Britain (PISA09$cnt == "GBR"). Examine and interpret results for item difficulty, discrimination, and AID; see the starter sketch after this list.
  7. Conduct an option analysis on SR reading item r414q09, and interpret the results.
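For exercises 6 and 7, a possible starting point is sketched below. It assumes the epmr package is loaded and that rsitems, the vector of scored reading item names used earlier in the module, is available in the workspace; adjust the names to match your own setup.

```r
library(epmr)

# Exercise 6: subset PISA09 to students in Great Britain
gbr <- subset(PISA09, cnt == "GBR")

# Item difficulty for the scored reading items; rsitems is assumed to be
# the character vector of scored item names from earlier in the module
pvals <- colMeans(gbr[, rsitems], na.rm = TRUE)

# Exercise 7: frequencies of raw responses to r414q09, a first step in
# an option analysis
table(gbr$r414q09, useNA = "ifany")
```

Discrimination and AID for exercise 6 can then be computed as in the earlier sketch, using total scores from rowSums over the same scored item columns.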