Enright, M., & Quinlan, T. (2010). Using e-rater® to score essays written by English language
learners: A complement to human judgment. Language Testing, 27(3), 317–334.
Golub-Smith, M. L., Reese, C. M., & Steinhaus, K. (1993). Topic and topic type comparability on
the Test of Written English™ (TOEFL Research Report No. 42; ETS RR-93-10).
Princeton, NJ: ETS.
Herrington, A., & Moran, C. (2001). What happens when machines read our students' writing?
College English, 63(4), 480–499.
Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Kane, M. (2001). Validating high-stakes testing programs. Educational Measurement: Issues and
Practice, 21(1), 31–35.
Kirsch, I., Jamieson, J., Taylor, C., & Eignor, D. (1998). Computer familiarity among TOEFL
examinees (TOEFL Research Report No. 59; ETS RR-98-06). Princeton, NJ: ETS.
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2001). A comprehensive meta-analysis of the
predictive validity of the Graduate Record Examinations®: Implications for graduate
student selection and performance. Psychological Bulletin, 127(1), 162–181.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation of essays
with the Intelligent Essay Assessor. In M. Shermis & J. Burstein (Eds.), Automated essay
scoring: A cross-disciplinary perspective (pp. 87–112). Mahwah, NJ: Lawrence Erlbaum
Associates.
Lee, Y.-W. (2006). Variability and validity of automated essay scores for TOEFL iBT: Generic,
hybrid, and prompt-specific models. Unpublished manuscript. Princeton, NJ: ETS.
Lee, Y.-W., Gentile, C., & Kantor, R. (2008). Analytic scoring of TOEFL® CBT essays: Scores
from humans and e-rater® (TOEFL Research Report No. RR-81; ETS RR-08-01).
Princeton, NJ: ETS.
Light, R. L., Xu, M., & Mossop, J. (1987). English proficiency and academic performance of
international students. TESOL Quarterly, 21(2), 251–261.
Linacre, J. M. (2010). Facets Rasch measurement computer program, version 3.67.1 [Computer
software]. Chicago, IL: Winsteps.com.
Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to
the raters? Language Testing, 19, 246–276.