I run some online assessments for an HCI module, and the question structure and marking scheme for the multiple-choice questions are a little unusual. I’ve written before about the initial marking scheme we used, in the article Teaching tip: online assessment of a fuzzy topic for our local teaching news publication, and in the article Structuring an on-line assessment of students’ learning for an e-learning conference; however, the marking scheme has evolved significantly since then. In this post I document the various versions that the marking scheme went through, and explain why we currently use the scheme that we do.
Here’s an example of the type of question we used, along with the marking scheme we used initially:
A web design company is developing a web site for a health foods shop, and has created a prototype site. One of the company’s employees explores the prototype site carefully and systematically, looking specifically at responses to any actions that potential customers might perform. For example, in response to a customer clicking on a button to put an item into a shopping basket, there should then appear a clear indication to the customer that an item has been put into the shopping basket, and which item it is. Any responses that are missing, or inadequate, are noted and reported to the design team.
Please indicate whether the statements below are true or false.
Fig. 1 shows the general structure of questions. There is one big question per topic. For each question, a description of a scenario is given, followed by several mini-questions. This grouping of mini-questions by topic makes answering the questions relatively speedy, as there is less text to read per question. For each mini-question, students have to decide between one of two answers (in this case, between true and false).
An automated marking system needs to be able to differentiate clearly between right and wrong answers, which is easy to do for multiple-choice questions. However, HCI is not a black-and-white topic in which the answer to a question is clearly right or wrong, so rather than present 4 or 5 different answers for students to select from, I felt that a choice of just two answers made for better differentiation between correct and incorrect answers. The questions also had to be very carefully worded so that, if a student did understand the HCI principle being tested, it was crystal clear which answer was correct.
As for the marking scheme, it was chosen for various reasons:
Firstly, we didn’t want students to be able to get a substantial portion of marks simply from guessing, so we didn’t choose a marking scheme such as “Correct answer = +1, Incorrect answer = 0”, which would have yielded 50% on average for complete guesswork. Instead we chose “Correct answer = +2, Incorrect answer = −2”, which yields an average of 0 for guesswork and allocates 2 marks for each correct answer to a mini-question. Why 2 marks? Well, one of the questions actually used a three-way split for its mini-questions, which were marked using “Correct answer = +2, Incorrect answer = −1” so that, once again, a correct answer gets 2 marks and guesswork yields an average of 0.
Secondly we included a “Don’t Know” option, scoring zero, as recommended in Blueprint for Computer-assisted Assessment (Bull & McKenna, 2003), as it seemed reasonable that students might complain about being forced to guess and possibly losing two marks as a result.
Thirdly, it seemed a bit unfair to give someone less than zero if they didn’t know anything about a topic; they should just get zero for that topic. So we added the rider that the total score for a whole question could not fall below 0. This does raise slightly the average expected mark for complete guesswork, but only by a small amount, acceptably low given the weighting of the assessment and the proportion of guesswork that students are likely to employ.
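To make the guesswork arithmetic concrete, here is a short Python sketch (the function name and structure are my own illustration, not part of the actual test software) that computes a pure guesser’s expected score on one question exactly, with and without the floor at zero:

```python
from math import comb

def expected_guess_score(n, k=2, plus=2, minus=-2, floor=True):
    """Expected score on a question with n mini-questions, each a k-way
    choice, for a pure guesser: `plus` marks per correct answer, `minus`
    per wrong one, optionally with the question total floored at zero."""
    expected = 0.0
    for c in range(n + 1):  # c = number of mini-questions guessed correctly
        p = comb(n, c) * (1 / k) ** c * (1 - 1 / k) ** (n - c)
        raw = plus * c + minus * (n - c)
        expected += p * (max(0, raw) if floor else raw)
    return expected

print(expected_guess_score(5, floor=False))                 # ≈ 0: +2/−2 averages zero
print(expected_guess_score(5, floor=True))                  # 1.875 out of 10: the floor's effect
print(expected_guess_score(5, k=3, minus=-1, floor=False))  # ≈ 0: +2/−1 three-way also averages zero
```

For a five-part True/False question the floor lifts a guesser’s expectation from 0 to 1.875 out of a maximum 10, which quantifies the increase mentioned above for a single question.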
As you’d expect, a student’s mark for the whole test was simply his/her total marks divided by the maximum possible marks, scaled to a percentage.
With this marking scheme, students didn’t like the negative marking. They understood that I didn’t want to award a mark of 50% for guesswork (once I had explained that perspective to them!) but they still didn’t like the idea that they could put down an answer and get 2 marks removed from their score.
Also, from looking at their answers I formed the impression that many students were scared of losing marks and would sometimes answer “Don’t Know” when they were not 100% sure of the right answer, so a score might reflect how confident the student was rather than how much the student knew and understood.
Furthermore, it concerned me that students were potentially spending time during the test trying to weigh up the risk of putting down an answer, rather than thinking about the answers to the questions.
So in the second version of the marking scheme, students didn’t have the “Don’t Know” option:
[description of scenario]
Please indicate whether the statements below are True or False.
After implementing this marking scheme, I compared the students’ scores to those from the previous year to see what effect the change had had. I found that, if anything, there had been a slight increase (the lectures and other teaching methods remained the same), so I deduced that the change of scheme had very likely done no harm!
However, I was still getting fed up with complaints about negative marking. Even though students got 2 marks for every answer they knew, and on average 0 for those they guessed, this explanation did not sit well with them.
So I changed the marking scheme again. This is what’s in the rubric of the test papers:
Scoring is as follows: each correct box ticked for a part of a question
Or did I…? The marking scheme that we currently use is mathematically equivalent to the previous one; it just sounds different.
The way I explain it to students is that their marks are proportional to how much better they do than guesswork. A student who does no better than guesswork gets zero; a student who ticks every single box correctly gets a score equal to the number of boxes; anything in between is scaled proportionally.
So for example, take a large question composed of 5 True/False mini-questions. A guesser would correctly tick 2.5 boxes on average, so we have:
- Student gets precisely 0, 1 or 2 boxes correct: scores 0 for the whole question.
- Student gets precisely 3 boxes correct: scores 1.
- Student gets precisely 4 boxes correct: scores 3.
- Student gets precisely 5 boxes correct: scores 5.
To be formal about it, for a question with n True/False mini-questions, if a student gets c boxes correct, then their score is max(0, 2(c − n/2)).
More generally, for a question with n mini-questions, each a k-way choice, a student who gets c boxes correct scores max(0, (kc − n)/(k − 1)).
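As a sketch (the function name is mine, not from any actual marking software), the general formula can be written directly in Python and checked against the worked example above:

```python
def question_score(n, c, k=2):
    """Score for a question with n mini-questions, each a k-way choice,
    when a student ticks c boxes correctly: max(0, (k*c - n) / (k - 1))."""
    return max(0, (k * c - n) / (k - 1))

# The five-part True/False example above (n=5, k=2):
for c in range(6):
    print(c, question_score(5, c))  # 0, 1, 2 correct -> 0; then 1.0, 3.0, 5.0
```

With k = 2 this reduces to max(0, 2c − n), i.e. the True/False formula above, and a three-way guesser averaging c = n/3 correct scores zero.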
Again, a student’s mark for the whole test is simply his/her total marks divided by the maximum possible marks, scaled to a percentage. And this results in exactly the same overall marks as under the previous version of the scoring scheme. For example, if a student answers a question with 5 True/False mini-questions and gets 3 out of 5 correct, that student receives 20% of the available marks for that question under the previous marking scheme (2 out of a maximum 10) and under this one (1 out of a maximum 5).
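That equivalence is easy to verify exhaustively. Here is a quick Python check (the helper names are mine; the old scheme below uses the +2/−2 penalty for True/False questions and +2/−1 for the three-way ones, each floored at zero):

```python
def old_score(n, c, k=2):
    """Original scheme: +2 per correct answer, -2 (True/False) or -1
    (three-way) per wrong one, question total floored at zero; max is 2*n."""
    penalty = 2 if k == 2 else 1
    return max(0, 2 * c - penalty * (n - c))

def new_score(n, c, k=2):
    """Current scheme: max(0, (k*c - n) / (k - 1)); maximum is n."""
    return max(0, (k * c - n) / (k - 1))

# As percentages of the maximum, the two schemes agree for every outcome.
for k in (2, 3):
    for n in range(1, 11):
        for c in range(n + 1):
            assert old_score(n, c, k) / (2 * n) == new_score(n, c, k) / n
print("schemes are equivalent")
```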
However, psychologically there is a big difference. Now it is much more obvious that you can’t lose anything by guessing; guessing might increase your number of boxes correctly ticked whereas leaving a box blank definitely won’t.
The students seem to prefer this new “measure how much better you did than guessing” approach to “points taken off if you got it wrong”. Whilst they would still prefer a scheme in which guessing yields 50% of the marks on a True/False test, they can understand why I don’t like that idea.
So I’m not getting complaints any more. I do still have to explain the marking scheme to them carefully, though, as it’s rather different from the other tests that they do and I want to be sure that they understand how they are being assessed.
I really like the scheme as it stands now, for the following reasons:
- It measures students’ actual accomplishments and does not reward guesswork. It allows True/False multiple-choice questions to be used without giving away the up-to-50% of marks for free that would occur if the marking scheme simply counted the number of right answers. In multiple-choice testing there is always an element of randomness (indeed, I think there is an element of randomness in many other forms of assessment too, such as which topics happen to turn up on an exam) because unknown answers are guessed, and I think this scheme does a good job of minimising that randomness.
- Rather than a seemingly unfair negative marking scheme, where students may feel aggrieved because wrong answers don’t just score zero but take points away, the perception is more one of needing to reach a high enough level of knowledge in a question in order to get points for it. When explaining the marking scheme to students, the challenge is not defending negative marking, but explaining the “proportional to how much better you do than guessing” idea, and explaining why you don’t want to allocate a simple 1 mark per correct box when a guesser could get 50% on average.