A statistical model for cluelessness named after the president. Because why not?
The problem three boffins from Rice University and Princeton are trying to solve stems from the rise of large-scale online learning.
At a smaller scale – for example, in a lecture theatre or tutorial gathering – it’s relatively easy for a capable instructor to work out from a student’s question what part of a topic they’re finding hard to grasp.
That scales badly: in a MOOC (massive open online course), the student-to-tutor ratio could be thousands to one or more. There’s an upside, though, since the scale of the student body is also a rich source of data.
D.TRUMP seeks to mine that student data for evidence of clue deficit, with the authors of this paper writing: “The scale of this data presents a great opportunity to revolutionise education by using machine learning algorithms to automatically deliver personalised analytics and feedback to students and instructors in order to improve the quality of teaching and learning.”
To achieve that, D.TRUMP first transforms students’ answers into low-dimensional textual feature vectors using tools such as Word2Vec. The authors’ own contribution is a statistical model that “jointly models both the transformed response textual feature vectors and expert labels on whether a response exhibits one or more misconceptions”.
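For the curious, the general shape of that pipeline (turn a response into a vector, then learn from expert labels) can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the authors’ method: the two-dimensional word vectors in `TOY_VECTORS` are made up and merely stand in for trained Word2Vec embeddings, the labelled responses are invented, and a simple nearest-centroid classifier is swapped in for the paper’s joint statistical model. The names `embed`, `fit` and `predict` are hypothetical.

```python
import math
from statistics import fmean

# ASSUMPTION: made-up 2-d word vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "inbreeding": (0.5, 0.5),
    "causes": (0.9, 0.1),
    "creates": (0.8, 0.2),
    "mutations": (0.7, 0.3),
    "harmful": (0.9, 0.2),
    "recessive": (0.1, 0.9),
    "alleles": (0.2, 0.8),
    "pairs": (0.1, 0.8),
    "combines": (0.2, 0.9),
}


def embed(response):
    """Average the vectors of the known words in a response; skip unknown words."""
    words = (w.strip(".,;:!?") for w in response.lower().split())
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(fmean(dim) for dim in zip(*vecs))


def fit(labelled):
    """Compute one centroid per expert label.

    labelled: list of (response, has_misconception) pairs.
    Both labels must appear at least once.
    """
    groups = {True: [], False: []}
    for text, label in labelled:
        groups[label].append(embed(text))
    return {label: tuple(fmean(dim) for dim in zip(*vecs))
            for label, vecs in groups.items()}


def predict(model, response):
    """Return the label whose centroid is closest to the response vector."""
    vec = embed(response)
    return min(model, key=lambda label: math.dist(vec, model[label]))


# ASSUMPTION: invented responses and labels, for illustration only.
labelled = [
    ("inbreeding causes harmful mutations", True),
    ("inbreeding creates mutations", True),
    ("inbreeding pairs recessive alleles", False),
    ("inbreeding combines recessive alleles", False),
]
model = fit(labelled)
print(predict(model, "Inbreeding leads to harmful mutations."))
```

A real system would need embeddings trained on a large corpus and far more labelled responses; the sketch only shows why vectorising answers makes it possible to flag a misconception automatically.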
The researchers tested their work against 386 students’ answers to 1,668 questions in the AP Biology high-school level classes at OpenStax Tutor, giving them a total of 60,000 labelled responses.
It’s probably helpful at this point to identify just how fine the line can be between a correct answer and an almost-correct misconception. From the paper:
For those (like the author) who didn’t study biology: “Inbreeding leads to harmful mutations” is a lay understanding of genetics. To be marked correct, the student needs to identify the mechanism, that inbreeding can bring together recessive mutations from mother and father.
Having developed D.TRUMP to the level that it can spot that kind of misconception, the system provides another bit of help to the educator: it can identify groups of students who share a misconception. This could indicate either that the students arrived in a course with a clue deficit, or that the courseware isn’t getting its message across. ®