PhD Candidate
Optimizing Personalized Learning at Scale
Department of Psychology
University of Amsterdam
Preprint (PDF) · GitHub repository · Poster (IOPS 2024)
The estimation of student ability is paramount in large-scale personalized learning. To this end, state-of-the-art adaptive learning environments use item response theory (IRT). Previous work in traditional learning assessment has demonstrated that unidimensional IRT models fall short in adequately estimating ability when items on a test are skipped. In this study, we extend this work to online learning platforms. We analyze data from a large-scale online learning platform used to practice arithmetic. Using the IRTree framework, we compare the unidimensional model of accuracy to a multidimensional model that additionally accounts for the decision to respond to or skip a problem. We found support for problem-skipping as a non-ignorable process: students who were more likely to skip problems were also more likely to respond incorrectly. Further exploration revealed individual differences in the strategies behind problem-skipping. To ensure that learning analytics tools are supported by fair measurement models, we suggest several ways to account for problem-skipping when estimating student ability.
Figure 1. Correlation between estimated nodes in the fully estimated IRTree model. The left graph displays the correlation between the item skipping threshold ($\beta^{(1)}$) and the item difficulty ($\beta^{(2)}$). The right graph displays the correlation between the individual propensity to skip an item ($\theta^{(1)}$) and the individual propensity to answer incorrectly ($\theta^{(2)}$). In both graphs, the scatter plot denotes individual data points, and the regression line displays the best-fitting linear relationship between the nodes. Density plots denote the distribution of each random parameter.
Figure 2. Eight individual users, selected based on their estimated propensities to skip and to respond incorrectly (center graph). Users are displayed clockwise, starting from the upper right corner (low accuracy, high skipping) and ending at the upper left corner (low accuracy, low skipping) of the scatter plot. Each surrounding graph plots one individual's response sequence to items in the Series game. The position on the y axis reflects the response time to the item. Colors denote the response type (correct, incorrect, question mark). While some users had longer response sequences in the extracted data (users C and E), the x axis is limited to the first 300 responses for readability.
We analyze data from the Series game in Prowise Learn (formerly Math Garden), a computer-adaptive arithmetic practice platform. The dataset contains responses from 3,795 students across 386 items. On each item, students can choose to skip the problem by pressing a dedicated skip button. We fit a two-node IRTree model using Stan and compare it to a standard unidimensional 1PL IRT model using posterior predictive checks and model fit indices. Code and materials are available on the GitHub repository.
Figure 2. The IRTree model used in this study. Node Y1 captures the student's decision to skip a problem (outcome: 1 = skip) or attempt it (continue to Y2). Node Y2 captures the accuracy of the response (1 = correct, 0 = incorrect). Each node has its own latent trait, allowing problem-skipping and accuracy to be modeled as distinct processes.
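The tree structure described above can be illustrated with a minimal Python sketch. It is not the Stan code from the repository: the function names (`to_nodes`, `category_probs`) are hypothetical, and it assumes that each node follows a 1PL (Rasch) logistic form, with the skip node written as $\theta^{(1)} - \beta^{(1)}$ and the accuracy node as $\theta^{(2)} - \beta^{(2)}$, where a higher $\theta^{(2)}$ means a higher chance of a correct response. The study's own parameterization may use a different sign convention (Figure 1 describes $\theta^{(2)}$ as the propensity to answer incorrectly).

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def to_nodes(response):
    """Recode one observed response into the two node outcomes (Y1, Y2).

    Y1: 1 = skip, 0 = attempt. Y2: 1 = correct, 0 = incorrect.
    Y2 is None (missing) when the problem was skipped, because the
    accuracy node is never reached.
    """
    if response == "skip":
        return (1, None)
    if response == "correct":
        return (0, 1)
    if response == "incorrect":
        return (0, 0)
    raise ValueError(f"unknown response type: {response!r}")


def category_probs(theta_skip, theta_acc, beta_skip, beta_acc):
    """Probabilities of the three observable outcomes for one student-item pair.

    Assumed 1PL form per node (an illustration, not the authors' exact model):
      P(Y1 = 1)          = logistic(theta_skip - beta_skip)
      P(Y2 = 1 | Y1 = 0) = logistic(theta_acc - beta_acc)
    """
    p_skip = logistic(theta_skip - beta_skip)
    p_correct_given_attempt = logistic(theta_acc - beta_acc)
    return {
        "skip": p_skip,
        "correct": (1.0 - p_skip) * p_correct_given_attempt,
        "incorrect": (1.0 - p_skip) * (1.0 - p_correct_given_attempt),
    }


if __name__ == "__main__":
    print(to_nodes("skip"))
    print(category_probs(theta_skip=0.5, theta_acc=-0.3, beta_skip=1.0, beta_acc=0.2))
```

Because the outcome probabilities are products along the branches of the tree, the three categories always sum to one. Under these assumptions, a unidimensional accuracy model corresponds to using only the second node.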