References

Abedi, Jamal. 2004. “The No Child Left Behind Act and English Language Learners: Assessment and Accountability Issues.” Educational Researcher 33: 4–14.

AERA, APA, and NCME. 1999. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.

Beck, A. T., R. A. Steer, and G. K. Brown. 1996. “Manual for the BDI-II.” San Antonio, TX: Psychological Corporation.

Bennett, Randy Elliot. 2011. “Formative Assessment: A Critical Review.” Assessment in Education: Principles, Policy & Practice 18 (1): 5–25.

Black, P., and D. Wiliam. 1998. “Inside the Black Box: Raising Standards Through Classroom Assessment.” Phi Delta Kappan 80: 139–48.

Briggs, D. C. 2009. “Preparation for College Admission Exams.” Arlington, VA: National Association for College Admission Counseling.

Carter, S. D. 2002. “Matching Training Methods and Factors of Cognitive Ability: A Means to Improve Training Outcomes.” Human Resource Development Quarterly 13: 71–88.

Cizek, G. J. 2010. “An Introduction to Formative Assessment.” In Handbook of Formative Assessment, edited by H. L. Andrade and G. J. Cizek, 3–17. New York, NY: Routledge.

College Board. 2012. “The SAT Report on College and Career Readiness: 2012.” New York, NY: College Board.

de Ayala, R. J. 2009. The Theory and Practice of Item Response Theory. New York, NY: The Guilford Press.

De Boeck, Paul, Marjan Bakker, Robert Zwitser, Michel Nivard, Abe Hofman, Francis Tuerlinckx, and Ivailo Partchev. 2011. “The Estimation of Item Response Models with the lmer Function from the lme4 Package in R.” Journal of Statistical Software 39 (12): 1–28.

Deno, S. L. 1985. “Curriculum-Based Measurement: The Emerging Alternative.” Exceptional Children 52: 219–32.

Deno, S. L., L. S. Fuchs, D. Marston, and J. Shin. 2001. “Using Curriculum-Based Measurement to Establish Growth Standards for Students with Learning Disabilities.” School Psychology Review 30: 507–24.

Doran, H., D. Bates, P. Bliese, and M. Dowling. 2007. “Estimating the Multilevel Rasch Model: With the lme4 Package.” Journal of Statistical Software 20 (2): 1–18.

Ebel, R. 1961. “Must All Tests Be Valid?” American Psychologist 16: 640–47.

Embretson, S. E., and S. P. Reise. 2000. Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.

Ferketich, S. 1991. “Focus on Psychometrics: Aspects of Item Analysis.” Research in Nursing & Health 14: 165–68.

Fuchs, L. S., and D. Fuchs. 1999. “Monitoring Student Progress Toward the Development of Reading Competence: A Review of Three Forms of Classroom-Based Assessment.” School Psychology Review 28: 659–71.

Hambleton, R. K., and R. W. Jones. 1993. “Comparison of Classical Test Theory and Item Response Theory and Their Applications to Test Development.” Educational Measurement: Issues and Practice 12 (3): 38–47.

Harvey, R. J., and A. L. Hammer. 1999. “Item Response Theory.” The Counseling Psychologist 27: 353–83.

Haynes, S. N., D. C. S. Richard, and E. S. Kubany. 1995. “Content Validity in Psychological Assessment: A Functional Approach to Concepts and Methods.” Psychological Assessment 7: 238–47.

Hursh, D. 2005. “The Growth of High-Stakes Testing in the USA: Accountability, Markets, and the Decline in Educational Equality.” British Educational Research Journal 31: 605–22.

Kane, M. T. 2013. “Validating the Interpretations and Uses of Test Scores.” Journal of Educational Measurement 50: 1–73.

Kuncel, N. R., S. A. Hezlett, and D. S. Ones. 2001. “A Comprehensive Meta-Analysis of the Predictive Validity of the Graduate Record Examinations: Implications for Graduate Student Selection and Performance.” Psychological Bulletin 127: 162–81.

Linn, R. L., E. L. Baker, and D. W. Betebenner. 2002. “Accountability Systems: Implications of Requirements of the No Child Left Behind Act of 2001.” Educational Researcher 31: 3–16.

Lord, F. M. 1952. “A Theory of Test Scores.” Psychometric Monographs, no. 7.

Mehrens, W. A. 1992. “Using Performance Assessment for Accountability Purposes.” Educational Measurement: Issues and Practice 11: 3–9.

Messick, S. 1980. “Test Validity and the Ethics of Assessment.” American Psychologist 35: 1012–27.

Militello, Matthew, Jason Schweid, and Stephen G. Sireci. 2010. “Formative Assessment Systems: Evaluating the Fit Between School Districts’ Needs and Assessment Systems’ Characteristics.” Educational Assessment, Evaluation and Accountability 22 (1): 29–52.

Miller, C., and K. Stassun. 2014. “A Test That Fails.” Nature 510: 303–4.

Nunnally, J. C., and I. H. Bernstein. 1994. Psychometric Theory. New York, NY: McGraw-Hill.

Pope, K. S., J. N. Butcher, and J. Seelen. 2006. The MMPI, MMPI-2, & MMPI-A in Court: A Practical Guide for Expert Witnesses and Attorneys. 3rd ed. Washington, DC: American Psychological Association.

Rasch, G. 1960. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago, IL: University of Chicago Press.

Raymond, M. 2001. “Job Analysis and the Specification of Content for Licensure and Certification Examinations.” Applied Measurement in Education 14: 369–415.

Rizopoulos, Dimitris. 2006. “ltm: An R Package for Latent Variable Modelling and Item Response Theory Analyses.” Journal of Statistical Software 17 (5): 1–25. http://www.jstatsoft.org/v17/i05/.

Robinson, Ken. 1999. “All Our Futures: Creativity, Culture and Education.” London: Department for Education and Employment.

Santelices, M. V., and M. Wilson. 2010. “Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning.” Harvard Educational Review 80: 106–34.

Spearman, Charles. 1904. “General Intelligence, Objectively Determined and Measured.” The American Journal of Psychology 15 (2): 201–92.

Sternberg, Robert J., and Wendy M. Williams. 1997. “Does the Graduate Record Examination Predict Meaningful Success in the Graduate Training of Psychology? A Case Study.” American Psychologist 52 (6): 630–41.

Stiggins, R. J. 1987. “The Design and Development of Performance Assessments.” Educational Measurement: Issues and Practice 6: 33–42.

Torrance, E. P. 1981a. “Empirical Validation of Criterion-Referenced Indicators of Creative Ability Through a Longitudinal Study.” Creative Child and Adult Quarterly 6: 136–40.

———. 1981b. “Predicting the Creativity of Elementary School Children.” Gifted Child Quarterly 25: 55–62.

US Department of Education. 2002. “A New Era: Revitalizing Special Education for Children and Their Families.” Washington, DC: US Department of Education.

Whisman, Mark A., John E. Perez, and Wiveka Ramel. 2000. “Factor Structure of the Beck Depression Inventory - Second Edition (BDI-II) in a Student Sample.” Journal of Clinical Psychology 56 (4): 545–51.

Wiliam, D., and P. Black. 1996. “Meanings and Consequences: A Basis for Distinguishing Formative and Summative Functions of Assessment?” British Educational Research Journal 22: 537–48.