Books:

  • Jiao, H., & Lissitz, R. W. (in press, Eds.). Test fairness in the new generation of large-scale assessment. Information Age Publishing.
  • Jiao, H., & Lissitz, R. W. (in press, Eds.). Technology enhanced innovative assessment: Development, modeling, and scoring from an interdisciplinary perspective. Information Age Publishing.
  • Jiao, H., & Lissitz, R. W. (2015, Eds.). The next generation of testing: Common core standards, Smarter-Balanced, PARCC, and the nationwide testing movement. Charlotte: Information Age Publishing Inc.
  • Lissitz, R. W., & Jiao, H. (2014, Eds.). Value added modeling and growth modeling with particular application to teacher and school effectiveness. Charlotte: Information Age Publishing Inc.
  • Lissitz, R. W. (Editor, in press). Informing the practice of teaching using formative and interim assessment: A systems approach. Charlotte: Information Age Publishing Inc.
  • Lissitz, R. W., & Jiao, H. (2012). Computers and their impact on state assessment: Recent history and predictions for the future.  Charlotte: Information Age Publishing Inc.
  • Lissitz, R. W. (Editor, 2010). The concept of validity: Revisions, new directions and applications. Charlotte: Information Age Publishing Inc.
  • Schafer, W., & Lissitz, R. W. (Editors, 2009). Assessment for alternate achievement standards: Current practices and future directions. Baltimore: Brookes Publishing.
  • Lissitz, R. W. (Editor, 2007). Assessing and modeling cognitive development in school: Intellectual growth and standard setting. Maple Grove: JAM Press.
  • Lissitz, R. W. (Editor, 2006). Longitudinal and value added modeling of student performance. Maple Grove: JAM Press.
  • Lissitz, R. W. (Editor, 2005). Value added models in education:  Theory and applications. Maple Grove: JAM Press.
  • Lissitz, R. W., & Schafer, W. D. (Editors, 2002). Assessment in educational reform: Both means and ends. Allyn and Bacon.

 

Journal Articles & Book Chapters:

  • Li, T., Xie, C., & Jiao, H. (in press). Assessing fit of alternative unidimensional polytomous item response models using posterior predictive model checking. Psychological Methods.
  • Li*, T., Jiao, H., & Macready, G. (in press). Different approaches to covariate inclusion in the mixture Rasch model. Educational and Psychological Measurement.
  • Carroll, A. J., Corlett-Rivera, K., Hackman, T., & Zou, J. (2016). E-Book Perceptions and Use in STEM and Non-STEM Disciplines: A Comparative Follow-Up Study. portal: Libraries and the Academy, 16(1), 131-163.
  • Li, Y., Panagiotou, O. A., Black, A., Liao, D., & Wacholder, S. (2016). Multivariate piecewise exponential survival modeling. Biometrics.
  • Jiao, H., Kamata, A., & Xie, C. (2015). A multilevel cross-classified testlet model for complex item and person clustering in item response modeling. In J. Harring, L. Stapleton, & S. Beretvas (Eds.), Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications. Charlotte, NC: Information Age Publishing.
  • Jiao, H., & Zhang*, Y. (2015). Polytomous multilevel testlet models for testlet-based assessments with complex sampling designs. British Journal of Mathematical and Statistical Psychology, 1, 65-83. DOI: 10.1111/bmsp.12035.
  • Luo, Y., Jiao, H., & Lissitz, R. W. (2015). An empirical study of the impact of the choice of persistence model in value-added modeling upon teacher effect estimates. In L. A. van der Ark, D. Bolt, W.-C. Wang, J. A. Douglas & S.-M. Chow (Eds.), Quantitative psychology research (pp.133-143). Springer, Switzerland.
  • Wolfe, E. W., Jiao, H., & Song, T. (2015). A family of rater accuracy models. Journal of Applied Measurement, 16.
  • Wolfe, E., Song, T. W., & Jiao, H. (2015). Features of difficult-to-score essays. Assessing Writing, 27, 1-10.
  • Jiao, H., & Lissitz, R. W. (2014). Direct modeling of student growth with multilevel and mixture extensions. In R. W. Lissitz & H. Jiao (Eds.), Value added modeling and growth modeling with particular application to teacher and school effectiveness. Charlotte: Information Age Publishing Inc.
  • Jiao, H., & Chen, Y.-F. (2014). Differential item and testlet functioning. In A. Kunnan (Ed.), The companion to language assessments (pp.1282-1300). John Wiley & Sons, Inc.
  • Chen, Y.-F., & Jiao, H. (2014). Does model misspecification lead to spurious latent classes? An evaluation of model comparison indices. In R. E. Millsap et al. (Eds.), New developments in quantitative psychology, Springer Proceedings in Mathematics & Statistics, 66. DOI 10.1007/978-1-4614-9348-8_22. Springer Science+Business Media, New York.
  • Chen, Y.-F. & Jiao, H. (2014). Exploring the utility of background and cognitive variables in explaining latent differential item functioning: An example of the PISA 2009 reading assessment. Educational Assessment, 19, 77-96.
  • Li, Y. & Lissitz, R. W. (2012). Exploring the full-information bi-factor model in vertical scaling with construct shift. Applied Psychological Measurement, 36, 3-20.
  • Lissitz, R. W., Hou, X., & Slater, S. (2012). The contribution of constructed response items to large scale assessment: measuring and understanding their impact. Journal of Applied Testing Technology, 13, 1-50.
  • Jiao, H., Wang, S., & He, W. (2013). Estimation methods for one-parameter testlet models. Journal of Educational Measurement, 50, 186-203.
  • Wang, S., Jiao, H., & Zhang, L. (2013). Validation of longitudinal achievement constructs of vertically scaled computerized adaptive tests: A multiple-indicator, latent-growth modeling approach. International Journal of Quantitative Research in Education, 1, 383-407.
  • Tao, J., Xu, B., Shi, N., & Jiao, H. (2013). Refining the two-parameter testlet response model by introducing testlet discrimination parameters. Japanese Psychological Research, 55, 284-291.
  • Wang, S., McCall, M., Jiao, H., & Harris, G. (2013). Construct validity and measurement invariance of computerized adaptive testing: Application to Measures of Academic Progress (MAP) using confirmatory factor analysis. Journal of Educational and Developmental Psychology, 3, 88-100.
  • Jiao, H., Macready, G., Liu, J., & Cho, Y. (2012). A mixture Rasch model based computerized adaptive test for latent class identification. Applied Psychological Measurement, 36, 469-493.
  • Li, Y., Jiao, H., & Lissitz, R.W. (2012). Applying multidimensional IRT models in validating test dimensionality: An example of K-12 large-scale science assessment. Journal of Applied Testing Technology, issue 2.
  • Lissitz, R. W. (2012). Standard setting: Past, present and perhaps the future. In Ercikan, K., Simon, M., & Rousseau, M. (Eds.), Improving large scale education assessment: Theory, issues and practice. Taylor and Francis/Routledge.
  • Lissitz, R. W., & Caliço, T. (2012). Validity is an action verb: Commentary on: Clarifying the consensus definition of validity. Measurement: Interdisciplinary Research and Perspectives, 10, 75-79.
  • Schafer, W. D., Lissitz, R. W., Zhu, X., Zhang, Y., Hou, X., & Li, Y. (2012). Evaluating teachers and schools using student growth models. Practical Assessment, Research & Evaluation, 17(17), 2.
  • Jiao, H., Lissitz, R. W., Macready, G., Wang, S., & Liang, S. (2011). Comparing the use of mixture Rasch modeling and judgmental procedures for standard setting. Psychological Test and Assessment Modeling, 53, 499-522.
  • Lissitz, R. W., & Li, F. F. (2011). Standard setting in complex performance assessments: An approach aligned with cognitive diagnostic models. Psychological Test and Assessment Modeling, 53, 461-485.
  • Templin, J., & Jiao, H. (2011). Applying model-based approaches to identify performance categories. In G. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (pp. 379-397). New York, NY: Routlege.
  • Fan, W., & Lissitz, R. W.  (2010). A multilevel analysis of students and schools on high school graduation exam: A case of Maryland. International Journal of Applied Educational Studies, 9, 1-18.
  • Jiao, H., & Wang, S. (2010). A multifaceted approach to investigating the equivalence between computer-based and paper-and-pencil assessments: An example of Reading Diagnostics. International Journal of Learning Technology, 5, 264-288.
  • Lissitz, R. W., & Wei, Hua (2008). Consistency of Standard Setting in an Augmented State Testing System. Educational Measurement: Issues and Practice, 27, 46-56.
  • Lissitz, R. W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36, 437-448.
  • Lissitz, R. W., & Samuelsen, K. (2007). Further clarification regarding validity and education. Educational Researcher, 36, 482-484.
  • Schafer, W. D., Liu, M., & Wang, H. (2007). Content and Grade Trends in State Assessments and NAEP. Practical Assessment, Research & Evaluation, 12.
  • Lissitz, R., Doran, H., Schafer, W., & Wilhoft, J. (2006). Growth modeling, value added modeling, and linking: An introduction. In Lissitz, R. W. (Ed.), Longitudinal and value-added models of student performance (pp. 1-46). Maple Grove, MN: JAM Press.
  • Schafer, W. (2006). Growth Scales as Alternative to Vertical Scales. Practical Assessment, Research & Evaluation, 11.
  • Schafer, W., & Twing, J. (2006). Growth scales and pathways. In Lissitz, R. W. (Ed.), Longitudinal and value-added models of student performance (pp. 321-345). Maple Grove, MN: JAM Press.
  • Walston, J., Lissitz, R. W., & Rudner, L. (2006). The Influence of Web-based Questionnaire Presentation Variations on Survey Cooperation and Perceptions of Survey Quality. The Journal of Official Statistics, 22, 271-291.
  • Li, Y., & Schafer, W. (2005). Increasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests. Journal of Educational Measurement, 42, 245-269.
  • Schafer, W. (2005). Technical documentation for alternate assessments. Practical Assessment, Research & Evaluation, 10.
  • Schafer, W., Gagne, P., & Lissitz, R. (2005). Resistance to Confounding Style and Content in Scoring Constructed Response Items. Educational Measurement: Issues and Practice, 24, 22-28.

 

Professional Presentations:

  • Li, C., & Jiao, H. (2016, April). Modeling Learning Growth with a Cross-Classified Multilevel IRT Model. Paper to be presented at the 2016 Annual Meeting of the American Educational Research Association, Washington, D.C.
  • Li, C., & Jiao, H. (2016, April). A Multilevel Cross-classified Dichotomous Item Response Theory (IRT) Model for Complex Person Clustering Structures. Paper to be presented at the 2016 Annual Meeting of the National Council on Measurement in Education, Washington, D.C.
  • Liao, D., & Jiao, H. (2016, April). A multi-group cross-classified testlet model for dual local item dependence in the presence of DIF items. Paper submitted to 2016 International Objective Measurement Workshop, Washington, D.C.
  • Liao, D., Jiao, H., & Lissitz, R. W. (2016, April). A conditional IRT model for directional local item dependency in multipart items. Paper to be presented at the 2016 Annual Meeting of the National Council on Measurement in Education, Washington, D.C.
  • Liao, D., & Yang, J. S. (2016, April). Sensitivity analysis of fit indices for multilevel-multidimensional item response models. Paper to be presented at the 2016 Annual Meeting of American Educational Research Association, Washington, D.C.
  • Jiao, H., Wolfe, E., Foltz, P., & Harrell-Williams, L. M. (2015, November). Distributional agreement indices for evaluating the performance of automated scoring. AEA-Europe Conference, Glasgow, England.
  • Song, T., Wolfe, E., & Jiao, H. (2015, November). What makes an essay difficult to score. AEA-Europe Conference, Glasgow, England.
  • Li, Y., Liao, D., & Lee, M-LT. (2015, August). Using Threshold Regression to Analyze Survival Data from Complex Surveys: with Application to Mortality Linked NHANES III Phase II Genetic Data. Paper presented at the 2015 Joint Statistical Meeting, Seattle, WA.
  • Li, C., Jiao, H., & Liao, D. (2015, July). A multilevel cross-classified polytomous item response theory model for complex person clustering structures. Paper presented at the 2015 International Meeting of the Psychometric Society, Beijing, China.
  • Liao, D., & Yang, J. S. (2015, July). Model fit evaluation in multilevel-multidimensional item response models: sensitivity to model misspecification. Paper presented at the 2015 International Meeting of the Psychometric Society, Beijing, China.
  • Jiao, H. (2015, April). A multilevel testlet model for mixed-format tests. Invited presentation at the Annual Meeting of the National Council on Measurement in Education, Chicago, Illinois.
  • Jiao, H., Dogan, E., & Lissitz, R. W. (2015, April). Modeling local item dependence in multipart items using item splitting. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, Illinois.
  • Li, T. & Jiao, H. (2015, April). Guessing detection using hybrid mixture IRT model with response times. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, Illinois.
  • Liao, D., & Jiao, H. (2015, April). Multilevel graded response testlet model with complex sampling designs. Paper presented at the 2015 Annual Meeting of the National Council on Measurement in Education, Chicago, IL.
  • Zheng, X., Jiao, H., & Zheng, Q. (2015, April). Evaluating dimensionality assessment procedures in complex-structure noncompensatory framework. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, Illinois.
  • Li, C., Jiao, H., & Liao, D. (2015, March). Native Language and School Cluster Effects on English Language Learner’s English Proficiency Using a Cross-classified Multilevel Item Response Theory (IRT) Model. Paper presented at the 2015 EDMS Research Day at the University of Maryland, College Park, MD.
  • Liao, D., & Jiao, H. (2015, March). Polytomous multilevel testlet models with complex sampling designs. Paper presented at the 2015 EDMS Research Day at the University of Maryland, College Park, MD.
  • Jiao, H., Kamata, A., & Xie, C. (2014, November). A multilevel cross-classified testlet model for complex item and person clustering in item response modeling. Presented at the conference on Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications. University of Maryland, College Park.
  • Jiao, H., Bryant, R., & Luo, Y. (2014, October). Random vs. adaptive assignment of field-test items in computerized adaptive tests. Paper presented at the International Association of Computerized Adaptive Testing 2014 Meeting, Princeton, New Jersey.
  • Jiao, H., & Lissitz, R. (2014, October). Exploring a psychometric model for calibrating innovative items embedded in multiple contexts. Presented at the Fourteenth Annual Maryland Assessment Conference: Technology enhanced innovative assessment: Development, modeling, and scoring from an interdisciplinary perspective. University of Maryland, College Park.
  • Jiao, H., & Yao, L. (2014, August). Estimation of noncompensatory multidimensional Rasch model. Paper presented at the Meeting of the Pacific-Rim Objective Measurement Symposium. Guangzhou, China.
  • Luo, Y., Jiao, H., & Lissitz, R. (2014, July). An Empirical study of the impact of the choice of persistence model in value-added modeling upon teacher effect estimates. Paper presented at the Psychometric Society 2014 Meeting, Madison, Wisconsin.
  • Jiao, H. & Yang, X. (2014, May). A multicomponent testlet model. Presented at the Third Workshop on Statistical Methods in Cognitive Assessments. Fudan University, Shanghai, China.
  • Liao, D., & Jiao, H. (2014, May). Polytomous multilevel testlet models with complex sampling designs. Paper presented at the international conference “Frontiers of Hierarchical Modeling in Observational Studies, Complex Surveys and Big Data”, College Park, MD.
  • Luo, Y., & Jiao, H. (2014, May). Estimation methods for four-level Rasch model. Poster presented at the conference on the Frontiers of Hierarchical Modeling in Observational Studies, Complex Surveys and Big Data, College Park, Maryland.
  • Jiao, H., & Wang, S. (2014, April). Modeling complex binary item responses with an IRT model with internal restrictions on item difficulty. Paper presented at the National Council on Measurement in Education 2014 Meeting, Philadelphia, PA.
  • Jiao, H., Wolfe, E., & Song, T. (2014, April). Guessing in Rasch modeling. Paper presented at the 2014 International Objective Measurement Workshop, Philadelphia, PA.
  • Song, T., Wolfe, E., & Jiao, H. (2014, April). Features of difficult-to-score essays. Paper presented at the National Council on Measurement in Education 2014 Meeting, Philadelphia, PA.
  • Li, M., Li, T., Jiao, H., & Lissitz, R. (2014, April). Handling missing data in value added models: An empirical study. Paper presented at the National Council on Measurement in Education 2014 Meeting, Philadelphia, PA.
  • Li, T., Li, M., Jiao, H., & Lissitz, R. (2014, April). Bias in multilevel IRT estimation of teacher effectiveness. Paper presented at the National Council on Measurement in Education 2014 Meeting, Philadelphia, PA.
  • Xie, C., & Jiao, H. (2014, April). Cross-classified modeling of dual local item dependence. Paper presented at the National Council on Measurement in Education 2014 Meeting, Philadelphia, PA.
  • Jiao, H. (2014, February). Polytomous multilevel testlet models for testlet-based assessments with complex sampling designs. Presented at the Joint Program in Survey Methodology, University of Maryland, College Park.
  • Jiao, H., Kamata, A., Van Wie, A., & Luo, Y. (2013). A multilevel testlet model for multiple hierarchical levels of person clustering effects. Paper presented at the National Council on Measurement in Education 2013 Meeting, San Francisco, CA.
  • Kang, Y., Lissitz, R. W., Li, M., & Xie, C. (2013). Effect of unmodeled measurement error in value-added modeling: a simulation study. Paper presented at the American Educational Research Association 2013 Meeting, Division D 2013 In-Progress Research Gala, San Francisco, CA.
  • Li, M., Lissitz, R. W., Kang, Y., & Xie, C. (2013). Applying cross-validation method to value-added models. Paper presented at the 2013 annual meeting of the American Educational Research Association 2013 Meeting, Division D 2013 In-Progress Research Gala, San Francisco, CA.
  • Li, T., Xie, C., & Jiao, H. (2013). Assessing fit of alternative polytomous item response models using posterior predictive model checking. Paper presented at the American Educational Research Association 2013 Meeting, San Francisco, CA.
  • Xie, C., & Jiao, H. (2013, April). The Rasch model plus ability based slipping. Paper presented at the American Educational Research Association 2013 Meeting, Division D 2013 In-Progress Research Gala, San Francisco, CA.
  • Xie, C., Li, T., Rupp, A., & Jiao, H. (2013). Posterior predictive model checking for dichotomous item response theory models with upper asymptote effects. Paper presented at the American Educational Research Association 2013 Meeting, San Francisco, CA.
  • Xie, C., Lissitz, R., Jiao, H., Kang, Y., & Li, M. (2013). Accounting for team-teaching in value-added modeling of teacher effectiveness: A real data analysis. Paper presented at the American Educational Research Association 2013 Meeting, Division D 2013 In-Progress Research Gala, San Francisco, CA.
  • Jiao, H., & Lissitz, R. (2012). Modeling latent growth using mixture item response theory. Presented in the Twelfth Annual Maryland Assessment Conference: Value Added Modeling and Growth Modeling with Particular Application to Teacher and School Effectiveness, University of Maryland, College Park.
  • Li, Y., & Lissitz, R. W. (2012). Exploring the full-information bi-factor model in vertical scaling with construct shift.  Paper presented at the National Council on Measurement in Education 2012 Meeting, Vancouver, Canada.
  • Lissitz, R. W. (2012). The evaluation of teacher and school effectiveness using growth models and value added models: Hope versus reality. Invited address given at the American Educational Research Association 2012 Meeting, Division H, Vancouver, Canada.
  • Luo, Y., Jiao, H., & van Wie, A. (2012). A four-level three-parameter IRT. Paper presented at the Psychometric Society 2012 Meeting, Lincoln, Nebraska.
  • van Wie, A., Jiao, H., & Luo, Y. (2012). A four-level IRT for simultaneous evaluation of student, teacher, and school effects. Paper presented at the Psychometric Society 2012 Meeting, Lincoln, Nebraska.
  • Xie, C., & Jiao, H. (2012). A four-parameter multidimensional item response theory model. Paper presented at the annual meeting of the National Council on Measurement in Education 2012 Meeting, Vancouver, Canada.
  • Zhu, X., & Jiao, H. (2012). The testlet effect in vertical scaling. Paper presented at the 18th International Objective Measurement Workshop, Vancouver, Canada.
  • Jiao, H. (2011). Item response theory models for locally dependent item response data. Presented in the Workshop on Modern Psychometric and Statistical Methods for Large-Scale Education Assessments. Beijing Normal University, Beijing, China.
  • Jiao, H. (2011). Current status in K-12 state assessment programs in the USA. Presented at Morning Star, Guangzhou, China.
  • Jiao, H., Lissitz, R., & Zhu, X. (2011). Constructing a common scale in a testing program to model growth: Joint consideration of vertical scaling and horizontal equating. Paper presented at the American Educational Research Association 2011 Meeting, New Orleans, LA.
  • Lissitz, R. W. (Chairperson, 2011). Computerized adaptive tests for classification: Algorithms and applications. Symposium at New Orleans, LA.
  • Lissitz, R. W., Hou, X., & Slater, S. C. (2011). The contribution of constructed response items to large scale assessment: Measuring and understanding their impact. Presented at the National Council on Measurement in Education 2011 Meeting, New Orleans, LA.
  • Jiao, H., Lissitz, R. W., & Li, Y. (2011). Constructing a Common Scale in a Testing Program to Model Growth: Joint Consideration of Vertical Scaling and Horizontal Equating. Paper presented at the American Educational Research Association 2011 Meeting, New Orleans, LA.
  • Jiao, H. (2010). Effects of items and person clustering on measurement precision. Invited presentation at the Educational Psychology Colloquium in the Department of Human Development, University of Maryland, College Park.
  • Jiao, H., Lissitz, R. W., Macready, G., & Wang, S. (2010). Comparing the use of Mixture Rasch Modeling and Judgmental Procedures for Standard Setting. Paper presented at the National Council on Measurement in Education 2010 Meeting, Denver, CO.
  • Jiao, H., Lissitz, B., Macready, G., Wang, S., & Liang, S. (2010). Exploring using the Mixture Rasch Model for standard setting. Paper presented at the National Council on Measurement in Education 2010 Meeting, Denver, CO.
  • Li, Y., & Jiao, H. (2010). Multilevel polytomous testlet model. Paper presented at the National Council on Measurement in Education 2010 Meeting, Denver, CO.
  • Li, Y., Jiao, H., & Lissitz, B. (2010). Investigation of content clustering in large-scale science assessments using Rasch multidimensional IRT and testlet models. Paper presented at the National Council on Measurement in Education 2010 Meeting Graduate Student Poster Session, Denver, CO.
  • Li, Y., Jiao, H. & Lissitz, R. W. (2010). Providing validity evidence for fair use in international testing: A confirmatory factor analysis approach.  7th Conference of the International Test Commission, Hong Kong.
  • Li, Y., Jiao, H., & Lissitz, B. (2010). Construct equivalence of a state high-school graduation test with and without accommodations. Paper presented at the meeting of the International Commission of Testing, Hong Kong, China.
  • Lin, P., & Lissitz, R. W. (2010). The impact of calibration decision on developing a multidimensional vertical scale. Paper presented at the National Council on Measurement in Education 2010 Meeting, Denver, CO.
  • Lissitz, R. W., & Li, F. F. (2010). Standard Setting in Complex Performance Assessments: An Approach Aligned with Cognitive Diagnostic Models. Paper presented at the National Council on Measurement in Education 2010 Meeting, Denver, CO.
  • Cao, Y., & Lissitz, R. W. (2009). Mixed-format test equating: Effects of test dimensionality and common-item sets. Paper presented at the National Council on Measurement in Education 2009 Meeting, San Diego, CA.
  • Li, F. F., Patelis, T., & Lissitz, R. W. (2008). Rigorous Curriculum and SAT. Conference Proceedings 2008, Paper 5. Rocky Hill, CT: NERA.
  • Lissitz, R. W., & Li, Y. (2008). Reporting aggregated data at the system, school and classroom level.  Maryland Assessment Group, Ocean City, Maryland.
  • Schafer, W. D., Wang, J., & Wang, V. (2008, October). Validity in action: State assessment validity evidence for compliance with NCLB. Ninth Annual Maryland Assessment Conference: The Concept of Validity: Revisions, New Directions and Applications, College Park.
  • Lin, P., Wei, H., & Lissitz, R. W. (2007). Equivalent test structure across grades: A comparison of methods using empirical data. In the symposium Measuring change and growth when the measure itself changes over time: Measurement and methodological issues. Paper presented at the American Educational Research Association 2007 Meeting, Chicago, IL.
  • Lissitz, R. W., & Wei, H. (2007). Consistency of standard setting in an augmented state testing system.  Paper presented at the National Council on Measurement in Education 2007 Meeting, Chicago, IL.
  • Lissitz, R. W., & Kroopnick, M. (2007). An adaptive procedure for standard setting and a comparison with traditional approaches. Paper presented at the National Council on Measurement in Education 2007 Meeting, Chicago, IL.
  • Schafer, W. D., Liu, M., & Wang, H. (2007). Cross-Grade Comparisons among Statewide Assessments and NAEP. Paper presented at the American Educational Research Association 2007 Meeting, Chicago, IL.
  • Hislop, B., Von Secker, C., Bedford, S., & Perakis, S. (2006, Nov). Using Data to Drive Change. MAG Conference, MD.
  • Lissitz, R. W. (2006). The Maryland Assessment Research Center for Education Success (MARCES): Standard Setting and Assessment of Student Growth in Achievement. International Conference on Educational Evaluation, National Taiwan Normal University.
  • Lissitz, R. W. (2006). Using Data to Drive Change: Interventions. Maryland Assessment Group.
  • Lissitz, R. W., Fan, W. H., Alban, T., Hislop, B., Strader, D., Wood, C., & Perakis, S. (2006). The Prediction of Performance on the Maryland High School Graduation Exam: Magnitude, Modeling and Reliability of Results. Paper presented at the National Council on Measurement in Education 2006 Meeting, San Francisco, CA.
  • Mulvenon, S., Zumbo, B. D., Stegman, C., & Lissitz, R. W. (2006). Improving educational data, statistical models, and assessment designs for NCLB: The role of educational statisticians. Invited symposium for the American Educational Research Association 2006 Meeting, SIG/ES.
  • Corliss, T., & Lissitz, R. W. (2005). An empirical history of focus group studies in education and psychology from 1979 to 2003. Paper presented at the National Council on Measurement in Education 2005 Meeting, Montreal, Canada.
  • Lissitz, R. W., & Alban, T. (2005). Predicting Performance on the Maryland High School Assessments. Maryland Assessment Group, Ocean City, Maryland.
  • Schafer, W. (2005, December). Comments on school-level database design and use. Paper presented at National Research Council Symposium on Use of School-Level Data to Evaluate Federal Education Programs, Washington, DC.
  • Schafer, W., Papapolydorou, M., Rahman, T., & Parker, L. (2005). Effects of Test Administrator Characteristics on Achievement Test Scores. Paper presented at the Conference of the National Council on Measurement in Education in Montreal, Canada.

 

Technical Reports and Invited Talks:

  • Jiao, H., Zou, J., Liao, D., Li, C., & Lissitz, R. W. (2016). Investigating the concordance relationship between the HSA cut scores and PARCC cut scores (MARC Research Report). College Park, MD: University of Maryland.
  • Liao, D., Li, C., Jiao, H., & Lissitz, R. W. (2015). Investigating the relationship between the PARCC test scores and the college admission tests: SAT/ACT/PSAT (MARC Technical Report). College Park, MD: University of Maryland.
  • Liao, D., Jiao, H., & Lissitz, R. W. (2015). Comparison of different approaches to dealing with directional local item dependence in multipart items (MARC Technical Report). College Park, MD: University of Maryland.
  • Jiao, H. (2014). Differential item functioning analysis for testlet-based assessments. (Research Report submitted to the Governing Board for the AERA Grants Program). College Park, MD: University of Maryland.
  • Jiao, H., Lissitz, R., & Hou, X. (2010). Computer-based testing in the K-12 state assessments. (MARCES Technical Report). College Park, MD: University of Maryland.
  • Lissitz, R., Li, Y., & Jiao, H. (2009). Investigation of Factorial Structure of the 2009 HSA Biology Test across Accommodated and non-accommodated Students. (MARCES Technical Report). College Park, MD: University of Maryland.

 

Completed Projects:

Teacher/School Effectiveness Using TRSG and Binary Models (2012)

MARC completed a large-scale data analysis under the guidance of the Maryland State Department of Education (MSDE). The project compared teacher and school effectiveness results from two models: the TRSG model (the state model) and a binary model, which classifies each student by whether he or she improved or maintained the same performance level in the second year or did not improve. For confidentiality reasons, the results are not available for disclosure.
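
As a rough illustration only (not the TRSG model, and using hypothetical column names and performance-level codes), the binary improvement indicator and a teacher-level aggregation could be computed along these lines:

```python
import pandas as pd

# Hypothetical two-year student file: performance levels coded 1 (basic),
# 2 (proficient), 3 (advanced), with a teacher identifier. Column names
# and values are illustrative only.
scores = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "teacher_id": ["A", "A", "A", "B", "B", "B"],
    "level_y1":   [1, 2, 3, 1, 2, 2],
    "level_y2":   [2, 2, 2, 1, 3, 1],
})

# Binary indicator: 1 if the student improved or maintained the same
# performance level in the second year, 0 otherwise.
scores["improved_or_held"] = (scores["level_y2"] >= scores["level_y1"]).astype(int)

# Aggregate to the teacher level: proportion of students improving or holding.
print(scores.groupby("teacher_id")["improved_or_held"].mean())
```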

 

The Impact of Testing in Maryland (2011)

The objective of this project was to investigate validity evidence based on the consequences of the two statewide tests in Maryland, the MSA and the HSA, focusing on the impact on three groups: students, teachers, and central administrations.

1. The Impact of Testing in Maryland: Main effects tables
2. The Impact of Testing in Maryland: Interaction tables
3. Principal Component Analysis (PCA) of the Survey Data: Item loadings
4. Full Report of The Impact of Testing in Maryland

Using Student Growth Models For Evaluating Teachers and Schools (2012)

The Evaluation of Teacher and School Effectiveness Using Growth Models and Value Added Modeling: Hope Versus Reality (2012)

PowerPoint Presentation (longer version)
Presentation for AERA Division H, April 2012 - Vancouver, Canada

A Comparison of VAM Models (2012)

Quality Control Charts in Large-Scale Assessment Programs (2010)

Consideration of Test Score Reporting Based on Cut Scores (2009)

Modeling Growth for Accountability and Program Evaluation: An Introduction for Wisconsin Educators (2009)

This work was funded by a contract between the senior author, AIR, and the Wisconsin State Department of Education. We thank the state for permission to make this paper available.

Multiple Choice Items and Constructed Response Items: Does It Matter? (2008)

Content and Grade Trends in State Assessments and NAEP (2007)

Each state is required by the No Child Left Behind Act to report the percents of its students who have reached a score level called "proficient" or above for certain grades in the content areas of reading (or a similar construct) and math. Using 2005 data from public web sites of states and the National Assessment of Educational Progress (NAEP), state-to-state differences in percents were analyzed, both unconditionally and conditionally on NAEP, for (1) trend across content areas (horizontal moderation), (2) trend across grade levels (vertical moderation), and (3) consistency with NAEP. While there was considerable variation from state to state, especially on an idealistic-realistic dimension, the results generally show that states are relatively consistent in trends across grades and content areas.
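
As a sketch of this kind of comparison, the following example uses made-up state and NAEP percents proficient rather than the 2005 data:

```python
import pandas as pd

# Hypothetical state-reported and NAEP percents proficient by grade and
# content area; the actual study used 2005 data from public sources.
data = pd.DataFrame({
    "state":     ["MD", "MD", "MD", "MD", "VA", "VA", "VA", "VA"],
    "grade":     [4, 4, 8, 8, 4, 4, 8, 8],
    "subject":   ["reading", "math", "reading", "math"] * 2,
    "state_pct": [82, 77, 67, 55, 86, 92, 79, 81],
    "naep_pct":  [32, 38, 30, 30, 37, 39, 36, 33],
})

# Unconditional comparison: gap between the state percent and the NAEP percent.
data["gap"] = data["state_pct"] - data["naep_pct"]
print(data[["state", "grade", "subject", "gap"]])

# Horizontal moderation: consistency of state percents across content areas.
by_subject = data.pivot_table(index=["state", "grade"], columns="subject",
                              values="state_pct")
print(by_subject.corr())

# Vertical moderation: consistency of state percents across grade levels.
by_grade = data.pivot_table(index=["state", "subject"], columns="grade",
                            values="state_pct")
print(by_grade.corr())
```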

Universal Design in Educational Assessments (2006)

Consistency in the Decision-Making of HSA and MSA: Identifying Students for Remediation for HSA (2006)

This study examined the consistency in the decision-making of the HSA and MSA with the purpose of identifying students who are at risk of failing the HSA. The data on which the study was based consisted of HSA and MSA scores collected from four counties in Maryland. The HSA scores were obtained from the 2004 administration in the subject area of English, and the MSA scores came from the 2003 administration in reading.

In this study, existing cutoff scores were used to dichotomize the MSA scale and re-categorize students so that the resulting categorizations could be compared with the pass/fail determinations of the HSA. Results show that the cut score for passing the HSA was more demanding than the cut score for the "proficient" category on the MSA. Similarly, the cut score for the "advanced" category on the MSA was set slightly too low if the purpose is to identify students who are likely to fail the HSA. With regard to which cut score to use to identify students for remediation, it was recommended that students below proficient be selected initially so that wasted remediation resources are minimized.
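
A simplified sketch of the decision-consistency comparison, with hypothetical scores and cut values standing in for the operational MSA and HSA data and cut scores, could look like this:

```python
import pandas as pd

# Hypothetical scores and cut points; the real study used operational
# MSA (2003, reading) and HSA (2004, English) data and existing cut scores.
df = pd.DataFrame({
    "msa_score": [378, 405, 441, 362, 420, 398, 455, 389],
    "hsa_score": [396, 410, 452, 380, 418, 401, 463, 399],
})
MSA_PROFICIENT_CUT = 400   # hypothetical "proficient" cut on the MSA scale
HSA_PASS_CUT = 412         # hypothetical passing score on the HSA scale

# Dichotomize each scale at its cut score.
df["msa_proficient"] = df["msa_score"] >= MSA_PROFICIENT_CUT
df["hsa_pass"] = df["hsa_score"] >= HSA_PASS_CUT

# Cross-tabulate the classifications and compute the decision-consistency rate.
table = pd.crosstab(df["msa_proficient"], df["hsa_pass"], margins=True)
agreement = (df["msa_proficient"] == df["hsa_pass"]).mean()
print(table)
print(f"Decision consistency: {agreement:.2f}")
```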

The Prediction of Performance on the Maryland High School Graduation Exam: Magnitude, Modeling and Reliability of Results (2006)

This research identified potential predictors of at-risk students before they take the Maryland High School Assessment (HSA) English examination. The research was based on data collected for the years 2002-2004 for students in four different school systems. To the extent possible, this study utilized the same data in each of the four school systems. The systems differ considerably as to the nature of the student population, the size of the system, and whether the system setting is more urban or rural.

The analysis of these data was done separately for each county. It proceeded sequentially, with initial calculation of descriptive statistics, followed by ordinary least squares (OLS) regression, and finally multilevel modeling (HLM). The results of the four school systems are compared on three factors: 1) the similarity of the variables that are significantly related to HSA performance (their reliability); 2) the modeling (OLS versus HLM) that works best for predicting the HSA score; and 3) the magnitude of the prediction. Several potential indicators of HSA performance are identified and discussed in the paper. These include two measures of reading (performance on MSA Reading and the Scholastic Reading Inventory), poverty, special education, and English Language Learner status. There was some evidence that a student's attendance, English scores at midterm, and GPA also seem to be related to his or her performance on the HSA English 1 exam.
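
The sequence of analyses described above (descriptive statistics, then OLS regression, then a multilevel model with students nested in schools) can be sketched roughly as follows; the file name and column names are hypothetical, and in the actual study the models were fit separately for each county:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical student-level file for one county: HSA English score,
# MSA reading score, poverty / special-education / ELL indicators, and
# the school each student attends.
students = pd.read_csv("county_students.csv")

# Step 1: descriptive statistics for the outcome and candidate predictors.
print(students[["hsa_english", "msa_reading", "poverty", "sped", "ell"]].describe())

# Step 2: ordinary least squares regression of the HSA score on the predictors.
ols_fit = smf.ols("hsa_english ~ msa_reading + poverty + sped + ell",
                  data=students).fit()
print(ols_fit.summary())

# Step 3: multilevel (random-intercept) model with students nested in schools.
hlm_fit = smf.mixedlm("hsa_english ~ msa_reading + poverty + sped + ell",
                      data=students, groups=students["school_id"]).fit()
print(hlm_fit.summary())
```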

Growth Scales as Alternative to Vertical Scales (2006)

Vertical scales are intended to allow longitudinal interpretations of student change over time, but several deficiencies of vertical scales call their use into question. Deficiencies of vertical scales were discussed and growth scales, a criterion-referenced alternative, were described. Some considerations in developing and using growth scales were suggested.

Harford Reading Excellence Program 2001-2002 - Focus: Program Evaluation & Support

The Harford Reading Excellence Act Grant program was a two-year program that strove to improve reading instruction in five target schools in Harford County (MD) Public Schools by implementing research-based instructional approaches through professional development and by providing a literacy-centered summer school program. This report summarized and highlighted student outcomes based on Reading Excellence Act Grant program data as well as data from the Maryland State Department of Education. Contact: Robert Lissitz; Melissa Fein performed the evaluation.

Weighting Components - Focus: High School Assessments

This topic was prompted by the problem of weighting constructed-response and selected-response items in the high school assessment program. A literature-based paper was written by Lawrence Rudner and forwarded to the Psychometric Council. The paper was subsequently accepted for publication in Educational Measurement: Issues and Practice.

MD Assessment Web Site - Focus: All Assessment Programs

A searchable database for all known literature on Maryland assessment programs was needed. Lawrence Rudner was the lead researcher and used MARCES-funded, ERIC-based personnel for support. They received copies of all papers MSDE had collected over the years and organized the database. See http://marces.org/mdarch/

Evaluating MSPAP - Focus: MSPAP

MSDE had received two evaluative reviews of MSPAP, a content review chaired by Bill Evers and a psychometric review chaired by Ronald Hambleton. Each of these generated considerable reactions from MSDE and outside individuals. MSDE had a need for an evaluative synthesis of all this material with recommendations about how to proceed to improve assessment in Maryland. After much discussion, MARCES decided to solicit an independent contractor for this purpose. The study was done by Edys Quellmalz at SRI (Stanford). Bill Schafer was the MARCES lead for the project.

SE(PAC) - Focus: MSPAP

There was interest in expanding the methodology for generating standard errors of percents above cut (PAC) to the district and state levels, but there was concern that the theory might not support generalizations at these larger levels. A Monte-Carlo study was proposed and agreed to. The MARCES contact was Bill Schafer; Yuan Lee performed the study as a subcontractor.
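
A minimal Monte-Carlo sketch of that question, estimating the sampling variability of a district-level percent above cut under purely illustrative distributional assumptions (none of the values below are MSPAP parameters), might look like this:

```python
import numpy as np

rng = np.random.default_rng(2024)
CUT = 500                      # illustrative cut score on the score scale
N_SCHOOLS, N_STUDENTS = 50, 100
N_REPLICATIONS = 2000

district_pac = []
for _ in range(N_REPLICATIONS):
    # Draw school means around a district mean, then student scores within
    # each school; both layers of variability are illustrative assumptions.
    school_means = rng.normal(505, 20, size=N_SCHOOLS)
    scores = rng.normal(school_means[:, None], 60, size=(N_SCHOOLS, N_STUDENTS))
    district_pac.append(100 * (scores > CUT).mean())

# Empirical (Monte-Carlo) standard error of the district percent above cut.
print(f"SE of district percent above cut: {np.std(district_pac, ddof=1):.2f}")
```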

Evaluating Service Learning - Focus: High School Graduation Requirements

Maryland has a requirement of a service-learning project for high school graduation. The effectiveness of the requirement was evaluated for its purpose and the quality of its implementation. The MARCES contact was Bob Lissitz; Melissa Fein, a MARCES affiliate, performed the study.

Test-Based Methods for Evaluating Teachers - Focus: General

Measurement of teacher quality has obvious potential for policy as well as personnel decisions. It seems reasonable to base assessment of teacher quality on student outcomes. Since the outcomes deemed most important are measured with educational tests, it is desirable to explore ways to quantify teacher quality with data from them. However, there are several methodological approaches to this problem. A study to compare them was carried out by Terry Alban on a contract basis; Bob Lissitz was the MARCES contact.

Annual Conference - Focus: General

The annual MARCES conference is held each year in mid-August. Bob Lissitz is the MARCES contact.

Computer Adaptive Testing - Focus: High School Assessments

It would be desirable to have a diagnostic, computer-adaptive assessment to use for candidates who are about to take the high school assessments. This would yield a likelihood of passing and areas of particular weakness, should they exist. In order to demonstrate what is feasible, a prototype system was under development. Lawrence Rudner was the MARCES lead and Phill Gagne was assisting.

Confounding with Writing - Focus: MSPAP

Since the MSPAP format was constructed-response, the potential existed for writing to be confounded with other achievement domains as an artifact of the assessment itself. The degree to which the MSPAP scoring process was resistant to such confounding had not been assessed. A study was planned in which writing quality was manipulated for responses with good and poor content. Bob Lissitz was the MARCES contact and Phill Gagne was assisting.

Process Control for MSPAP - Focus: MSPAP

Each year the Psychometric Council reviewed each phase of the MSPAP analyses. Since eight years of data existed by this time, it seemed reasonable to use process control methods to identify, quantitatively, the areas where the Council should focus particular attention. Bill Schafer was the MARCES contact and Ying Jin was assisting.

Validity of Accommodations - Focus: MSPAP

Tippets, using confirmatory factor analysis for accommodated vs. non-accommodated students, evaluated the comparative internal structures of MSPAP. However, newer insights from the structural equation modeling field had not been applied to her report. Further, it may not have been reasonable to report reading scores for students who received the reading accommodation if the validity of MSPAP was different for them, whether or not doing so seems logically appropriate. Bill Schafer was the MARCES contact and Mara Freeman was assisting.

Web-Based Technical Manual - Focus: MSPAP

The MSPAP technical manual was recreated and redistributed every year, although much of it was redundant. It seemed possible to use the web to update the manual: perhaps only one manual was needed, with tables and text added only to replace whatever was no longer current at a given time. We considered organizing the manual around the AERA/APA/NCME Standards for Educational and Psychological Testing. Bill Schafer was the MARCES contact and Mara Freeman was assisting.

Combining Data for School Performance Indices - Focus: MSPAP

At the time, only MSPAP data and attendance were combined into school performance indices (SPIs) for elementary schools. There was interest in adding a component for CTBS/5 data. A review of approaches in the various states, with emphasis on several states that illustrate very different approaches, was developed. A briefing paper on the desirability of adding a new data source to the SPI was written, and a meeting was held with the State Superintendent where the topic was discussed. Bill Schafer was the MARCES contact and Geoff Wiggins was assisting.

 

Last Modified: October 2014

 