
A Marking Scheme for Multiple-Choice Questions

I run some online assessment for an HCI module, and the question structure and marking scheme for the multiple-choice questions are a little unusual. I’ve written before about the initial marking scheme we used, in the article “Teaching tip: online assessment of a fuzzy topic” for our local teaching news publication, and also in the article “Structuring an on-line assessment of students’ learning” for an e-learning conference. However, the marking scheme has evolved significantly since then. In this post I document the various versions that the marking scheme went through, and explain why we currently use the scheme that we do.

Question Format

Here’s an example of the type of questions we used, along with the marking scheme we used initially:

Figure 1: an example question

Question 5

A web design company is developing a web site for a health foods shop, and has created a prototype site. One of the company’s employees explores the prototype site carefully and systematically, looking specifically at responses to any actions that potential customers might perform. For example, in response to a customer clicking on a button to put an item into a shopping basket, there should then appear a clear indication to the customer that an item has been put into the shopping basket, and which item it is. Any responses that are missing, or inadequate, are noted and reported to the design team.

Please indicate whether the statements below are true or false.

Options for each statement: True / False / Don’t Know

a) The evaluation by the employee was formative.
b) The evaluation involved GOMS analysis.
c) This situation describes an example of heuristic evaluation.

Scoring:
Correct answer=+2, Incorrect answer=-2, Don’t Know=0
(Total score for Question 5 will not be less than 0.)

Fig. 1 shows the general structure of questions. There is one big question per topic. For each question, a description of a scenario is given, followed by several mini-questions. This grouping of mini-questions by topic makes answering the questions relatively speedy, as there is less text to read per question. For each mini-question, students have to choose between two answers (in this case, between true and false).

An automated marking system needs to be able to clearly differentiate right and wrong answers, which is easy to do for multiple-choice questions. However, HCI is not a black-and-white topic in which the answer to a question is very clearly right or wrong, so rather than present 4 or 5 different answers for students to select from, I felt that the choice of just two answers made for better differentiation between correct and incorrect answers. The questions also had to be very carefully worded so that if a student did understand the HCI principle being tested, it was crystal clear which the correct answer was.

Marking Scheme

As for the marking scheme, it was chosen for various reasons:

Firstly, we didn’t want students to be able to get a substantial portion of marks simply from guessing, so we didn’t choose a marking scheme such as “Correct answer=+1, Incorrect answer=0” which would have yielded 50% on average for complete guesswork. The marking scheme was chosen so that “Correct answer=+2, Incorrect answer=-2” yields an average of 0 for guesswork, and allocates 2 marks for each correct answer to a mini-question. Why 2 marks? Well, one of the questions actually used a three-way split for its mini-questions, which were marked using a scheme of “Correct answer=+2, Incorrect answer=-1” so that once again, a correct answer gets 2 marks and guesswork yields an average of 0.
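The arithmetic behind these averages can be checked with a short sketch. The function name `expected_guess_mark` is purely illustrative, and the sketch assumes a guesser picks uniformly at random among the k options of a mini-question:

```python
from fractions import Fraction

def expected_guess_mark(k, correct_mark, wrong_mark):
    """Average mark for one mini-question when guessing uniformly among k options."""
    p_correct = Fraction(1, k)
    return p_correct * correct_mark + (1 - p_correct) * wrong_mark

print(expected_guess_mark(2, 1, 0))    # 1/2 (half of the 1 mark available)
print(expected_guess_mark(2, 2, -2))   # 0
print(expected_guess_mark(3, 2, -1))   # 0
```

This ignores the "not less than 0" rider on whole questions, which is discussed below.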

Secondly we included a “Don’t Know” option, scoring zero, as recommended in Blueprint for Computer-assisted Assessment (Bull & McKenna, 2003), as it seemed reasonable that students might complain about being forced to guess and possibly losing two marks as a result.

Thirdly, it seemed a bit unfair to give someone less than zero if they didn’t know anything about a topic; they should just get zero for that topic. So we added the rider that the total score for a whole question would not be less than 0. This does raise very slightly the average expected number of marks for complete guesswork, but only by a very small amount, acceptably low in the context of the weighting for the assessment and the proportion of guesswork that students may employ.

As you’d expect, a student’s mark for the whole test was simply his/her total marks divided by the maximum possible marks, scaled to a percentage.

With this marking scheme, students didn’t like the negative marking. They understood that I didn’t want to award a mark of 50% for guesswork (once I had explained that perspective to them!) but they still didn’t like the idea that they could put down an answer and get 2 marks removed from their score.

Also, I formed the impression from looking at their answers that many students were scared of losing marks and would sometimes answer “Don’t Know” when they were not 100% sure of the right answer, so a score might be down to how confident the student was, rather than how much the student knew and understood.

Furthermore, it concerned me that students were potentially spending time during the test trying to weigh up the risk of putting down an answer, rather than thinking about the answers to the questions.

So in the second version of the marking scheme, students didn’t have the “Don’t Know” option:

Figure 2: the second version of the marking scheme

Question #

[description of scenario]

Please indicate whether the statements below are True or False.

Options for each statement: True / False

a) [first mini-question]
b) [second mini-question]
… [other mini-questions]…

Scoring:
Correct answer=+2, Incorrect answer=-2
(Total score will not be less than 0.)

After implementing this marking scheme, I compared the students’ scores to those from the previous year, to see what effect this had had. I found that if anything there had been a slight increase (the lectures and other teaching methods remained the same) so I deduced that it was highly likely that the change of scheme had done no harm!

However, I was still getting fed up with complaints about negative marking. Even though students get 2 marks for every answer they know correctly, and on average get 0 for those they guess, this explanation did not sit well with them.

So I changed the marking scheme again. This is what’s in the rubric of the test papers:

Figure 3: the third version of the marking scheme

Scoring is as follows: each correct box ticked for a part of a question
gets 1 mark, and then the total marks for a whole question is scaled
linearly to measure how much better the answers are than pure guesswork.
The score for a whole question will result in a number from 0
up to the total number of question parts.

Or did I….? The marking scheme that we currently use is mathematically equivalent to the previous one; it just sounds different.

The way I explain it to students is that their marks are proportional to how much better they do than guesswork. If a student does no better than guesswork then that student gets zero. If a student ticks every single box correctly then that student obtains a score equal to the number of boxes. If in between, then it is proportional.

So for example, take a large question composed of 5 mini-questions, each with a True/False option. A guesser would score 2.5 on average, so we have:

  • Student gets precisely 0, 1 or 2 boxes correct: scores 0 for the whole question.
  • Student gets precisely 3 boxes correct: scores 1.
  • Student gets precisely 4 boxes correct: scores 3.
  • Student gets precisely 5 boxes correct: scores 5.

To be formal about it, for a question with n True/False mini-questions, if a student gets c boxes correct, then their score is max(0, 2(c – n/2)).

More generally, for a question with n mini-questions, each with a k-way choice, if a student gets c boxes correct then they get a score of max(0, (kc – n)/(k – 1)).
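As a minimal sketch of the general formula (the function name `question_score` is mine, not part of any marking software we use), the scaled score can be computed with exact fractions so that k-way questions don't pick up rounding errors:

```python
from fractions import Fraction

def question_score(n, k, c):
    """Scaled score for one question: max(0, (k*c - n) / (k - 1)).

    n: number of mini-questions
    k: number of choices per mini-question
    c: number of boxes ticked correctly
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    return max(Fraction(0), Fraction(k * c - n, k - 1))

# The 5-part True/False example from the list above.
print([str(question_score(5, 2, c)) for c in range(6)])
# ['0', '0', '0', '1', '3', '5']
```

With k = 2 this reduces to the True/False formula, since (2c – n)/1 = 2(c – n/2).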

Again, a student’s mark for the whole test is simply his/her total marks divided by the maximum possible marks, scaled to a percentage. And this results in the exact same overall marks that would have been obtained on the previous version of the scoring scheme. For example, if a student answers a question with 5 True/False mini-questions and gets 3 out of 5 correct, that student receives 20% of the available marks for that question in the previous marking scheme (2 out of a maximum 10) and this one (1 out of a maximum 5).
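The equivalence can also be checked by brute force. This is only a sketch for True/False questions with no “Don’t Know” option, and the names `share_v2` and `share_v3` are made up for the illustration:

```python
from fractions import Fraction

def share_v2(n, c):
    """Share of marks under +2/-2 marking, floored at 0 (maximum 2n marks)."""
    raw = 2 * c - 2 * (n - c)
    return Fraction(max(0, raw), 2 * n)

def share_v3(n, c):
    """Share of marks under the 'better than guesswork' scheme (maximum n marks)."""
    return Fraction(max(0, 2 * c - n), n)

# Compare every possible outcome for questions with 1 to 20 parts.
all_equal = all(
    share_v2(n, c) == share_v3(n, c)
    for n in range(1, 21)
    for c in range(n + 1)
)
print(all_equal)       # True
print(share_v2(5, 3))  # 1/5, i.e. the 20% in the example above
```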

However, psychologically there is a big difference. Now it is much more obvious that you can’t lose anything by guessing; guessing might increase your number of boxes correctly ticked whereas leaving a box blank definitely won’t.

The students seem to prefer this new “measure how much better you did than guessing” approach, rather than “points taken off if you got it wrong”. Whilst they would still prefer the scheme of guessing to get 50% of the marks in a True/False marking scheme, they can understand why I don’t like that idea :-)

So I’m not getting complaints any more. I do still have to explain the marking scheme to them carefully though, as it’s rather different to the other tests that they do and I want to be sure that they understand how they are getting assessed.

I really like the scheme as it stands now, for the following reasons:

  • It measures students’ actual accomplishments and does not reward guesswork. It allows T/F multiple-choice questions to be used without giving away up to 50% of the marks for free, which is what would happen if the marking scheme simply counted the number of right answers. In a multiple-choice test there is always an element of randomness because unknown answers are guessed (indeed I think there is an element of randomness in many other forms of assessment too, like which topics happen to turn up on the exam), and I think this scheme does a good job of minimising that amount of randomness.
  • Rather than a seemingly-unfair negative marking scheme where students may feel aggrieved because wrong answers get not zero but points taken away from them, the perception is more one of needing to reach a high enough level of knowledge in a question in order to get points for that question. When explaining the marking scheme to students, the challenge is not defending a negative marking scheme but explaining the “proportional to how much better you do than guessing” idea, and explaining why you don’t want to allocate a simple 1 mark per box correct because a guesser could get 50% on average.

Hashing and Group Activities

For me, it’s a constant challenge, when planning teaching activities, to figure out how best to help students learn various topics within the constraints of the available resources. A while ago, I was pleased to find what seems to be a good teaching method for hash tables, which is the subject of this post.

More of hash tables later. First, some background:

Usually for a module, we have a large lecture room booked for two hours, followed by multiple small-group rooms booked for one hour. Typically, the larger room is used for a lecture, containing an initial presentation of the module concepts. Whilst there can be a certain amount of participation and interactivity in a lecture setting, it’s the smaller-group sessions (practical classes) that usually offer a better opportunity to engage students and make sure that they are learning the material, because with a smaller group there’s more opportunity for tutors to give students individual feedback on how they are doing. Obviously, we want students to attend the small-group sessions because people generally learn better if they are doing some kind of activity where they are active participants, rather than sitting back and watching someone else do it. The idea is that if you have to do it yourself, you have to understand it, because you need the understanding to do the task.

So far, so good, but there’s an issue we have to bear in mind when planning what’s going to happen in the practical classes: as at other universities up and down the country, levels of attendance aren’t what they should (or could) be, so how do we best encourage students to attend their practical classes? We asked our students some years ago what affected their attendance: it turns out one of the factors that affects whether they go to practical classes is how much value they see themselves as getting out of that class.

And herein lies the tricky bit.

If we set worksheets that the students have to work through individually, with the idea that the tutor can help when they get stuck, then despite the help available, several of them tend to take the line “Oh I can work through this at home”, and I can understand why they’d prefer the comfort of their own desk to one of the University’s computer labs.

On the other hand, if we set worksheets more oriented towards a group activity, that does tend to be more attractive for them, as the class is an experience they can’t get at home (which contributes to their perception of the practical’s value). However, some students like keeping quiet in a group and letting others do the active work, and then they’re not getting as much out of the class as if they were actively participating.

In a nutshell, it’s quite a challenge to both get them through the door and give them the best chance of learning the material once they’re there in the practical class.

And that brings me on to hash tables.

Hash tables are a method of storing data: you have a table with slots in it for data, and understanding hashing is all about understanding which slot to put data into. To get students to understand hashing, as a teacher I feel that I really want each individual student to insert data into a hash table.

I used to have a practical worksheet which involved students being given data to insert into various hash tables, on their own, and I wasn’t happy with this worksheet. One reason was that the time taken to insert the data was quite substantial, and many students didn’t get to try all the different hashing schemes covered on the worksheet. Another reason was that it was the kind of exercise that students might feel they could do on their own rather than attend practicals.

Here’s what I did that I felt worked really well:

I had prepared in advance several little cards like this, each with a different number on, and randomly distributed them to the students. We were looking at three different hashing schemes, so each student got three cards, each with a different number on (the number being the data to be inserted into the hash table), one card for each hashing scheme.

Stage 1: students on their own went to the computers (or their pocket calculators) and calculated the hash values of their three numbers, filling the values in on the space on the cards. These “hash values” are useful information, to be used in the second stage.

Stage 2: for each of the three hashing schemes, we started with an empty hash table. This hash table was displayed on an OHP screen, but could have been displayed on a whiteboard or on the floor, just so long as everyone could see it. Then, one by one, the students took it in turns to enter their own individual piece of data for that table. I supervised, to make sure that they were putting the data in the correct place and understanding why they did so.
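The two stages can be sketched in code as below. I haven't said here which three hashing schemes we used or how collisions were handled, so the modulo hash, the linear probing, the table size and the card numbers are all assumptions chosen purely for illustration, as are the names `hash_value` and `place_card`:

```python
TABLE_SIZE = 11  # assumed table size, for illustration only

def hash_value(number, table_size=TABLE_SIZE):
    """Stage 1: the value a student writes on their card (assumed: modulo hash)."""
    return number % table_size

def place_card(table, number):
    """Stage 2: place the number, probing linearly if the slot is taken (assumed)."""
    size = len(table)
    start = hash_value(number, size)
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None:
            table[slot] = number
            return slot
    raise RuntimeError("hash table is full")

table = [None] * TABLE_SIZE
for card in [23, 45, 14, 36]:  # made-up card numbers
    print(card, "goes in slot", place_card(table, card))
# 23 goes in slot 1
# 45 goes in slot 2   (slot 1 was already taken)
# 14 goes in slot 3
# 36 goes in slot 4   (slot 3 was already taken)
```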

This seemed to work really well. Every student had to place a piece of data, so every student had to participate and understand why the data gets placed where it does, including the shy students (who need not go first but can do one after their friend has done one). Because students only calculated hash values for one piece of data per hashing scheme, in the first 5 minutes of the session, they didn’t spend so long calculating hash values but instead spent much more time on watching how all the various numbers got inserted into the hash table. That meant we had enough time for all of the students to look at all the hashing schemes, unlike before, when only the speedy students covered them all.

Of course, it doesn’t have to be numbers for the data. Strings work pretty well where you can base hash values on things like the length of the string, or how many vowels the string has got.
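As a rough sketch of those two string-based ideas (the function names, the table size of 7 and the example words are all made up for illustration):

```python
def hash_by_length(text, table_size):
    """Hash a string by its length."""
    return len(text) % table_size

def hash_by_vowels(text, table_size):
    """Hash a string by how many vowels it contains."""
    return sum(ch in "aeiou" for ch in text.lower()) % table_size

for word in ["stack", "queue", "tree"]:
    print(word, hash_by_length(word, 7), hash_by_vowels(word, 7))
# stack 5 1
# queue 5 4
# tree 4 2
```

Note that “stack” and “queue” collide under the length-based hash but not under the vowel-based one, which is the kind of thing that makes for a good discussion in class.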

Next time, I think I’ll bring a prop, a kind of large mat, to be placed on the floor where everyone can see it, with the outline of the slots of a hash table, so that students can physically place the numbers in the slots on the hash table mat.

Algorithms and a Moustache

Me looking silly

I run a course on Data Structures and Algorithms.

The first week of the course introduces students to the ideas of complexity and big O notation, and I wasn’t happy with the existing worksheets used in the practical classes. The questions needed to be better at getting across the idea of time complexity to undergraduates with a wide range of abilities.

I had a vague idea of what I wanted to do: I wanted some algorithms that the students could follow by physically acting them out in the practical session, and then afterwards we could discuss the complexity of the algorithms. I hoped that the “learning by doing” approach would help them to understand and remember the complexity ideas better (i.e. what educational researchers would call experiential learning).

I looked around the web for inspiration, and I came across this web page which presents some course notes for “CS367 Introduction to Data Structures” at the University of Wisconsin (instructor Sally Peterson, web notes hosted by Rebecca Hasti). It had this algorithm suggestion:

“Consider the following three algorithms for determining whether anyone in the room has the same birthday as you.

Algorithm 1: You say your birthday, and ask whether anyone in the room has the same birthday. If anyone does have the same birthday, they answer yes.

Algorithm 2: You tell the first person your birthday, and ask if they have the same birthday; if they say no, you tell the second person your birthday and ask whether they have the same birthday; etc, for each person in the room.

Algorithm 3: You only ask questions of person 1, who only asks questions of person 2, who only asks questions of person 3, etc. You tell person 1 your birthday, and ask if they have the same birthday; if they say no, you ask them to find out about person 2. Person 1 asks person 2 and tells you the answer. If it is no, you ask person 1 to find out about person 3. Person 1 asks person 2 to find out about person 3, etc.”

This was a nice idea for an algorithm to act out, but it didn’t quite suit my purposes. I wanted to discuss the complexities for all three versions. Once the students have identified their tutor’s birthday from the first run of the algorithm, it seems a bit pointless to run another version of the algorithm to find out something you already know. In addition, this would require people to reveal their birthdays, a fairly personal piece of information often used for identity checks (or students would temporarily have to pretend to have a different birthday). Rather than do birthdays, I had a different idea: the photo identification algorithm:

The purpose of this exercise is to look at the time complexities of three different algorithms to solve the same problem. The problem is to identify whether any student in the room knows who is in the photo, and if so, who is it?

Algorithm 1
Tutor asks first student, then asks second student, then asks third student etc.
Algorithm 2
Tutor asks first student. If the identity is not known, the tutor passes the photo to the first student and asks them to ask the second student. The information is passed back to the tutor. If not known, the tutor asks the first student to ask the second student to ask the third student… etc.
Algorithm 3
The tutor holds up a photo. All students in the room immediately either speak the name of the person in the photo (if identified) or shake their head otherwise.
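For the discussion afterwards, one way to compare the three algorithms is to count the work done in the worst case, where nobody in a room of n students recognises the photo. The counting model in this sketch is an assumption of mine: each question put from one person to another counts as one step, answers passed back are not counted, and the function names are illustrative only.

```python
def questions_algorithm_1(n):
    """Tutor asks each of the n students in turn: n questions."""
    return n

def questions_algorithm_2(n):
    """Reaching student i takes a chain of i questions: 1 + 2 + ... + n in total."""
    return sum(range(1, n + 1))

def rounds_algorithm_3(n):
    """Everyone responds at once, so a single round whatever the class size."""
    return 1

for n in (5, 10, 20):
    print(n, questions_algorithm_1(n), questions_algorithm_2(n), rounds_algorithm_3(n))
# 5 5 15 1
# 10 10 55 1
# 20 20 210 1
```

Under this model Algorithm 1 grows linearly with the class size, Algorithm 2 grows quadratically (n(n + 1)/2 questions), and Algorithm 3 stays constant.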

These algorithms don’t require personal information and you can run them as many times as you have photos printed out to show them. I chose photos of celebrities that were very well-known in some circles, but not others, as I didn’t want the algorithms to terminate too soon. I chose Benazir Bhutto (twice Prime Minister of Pakistan):

Benazir Bhutto

and Stephen Colbert (US comedian ubiquitous in the US but mostly unknown in the UK):

Stephen Colbert

For the third photo, I had the idea of creating a comedy moment, where the practical tutor shows the photo to the students and it is a photo of the practical tutor in some kind of comic style, like in comedy glasses and moustache. I thought this might make it fun and more memorable for the students. So I sent round an email to the department:

Does anyone have a comedy glasses/noses/moustache kind of thing that they could bring in and I could borrow for a few hours?

(Don’t ask. The short answer is that it’s for a practical class to help explain Big O notation to [...] students.)

In the end I went round the practical tutors who were going to be giving the practical sessions, taking photos of them in a silly Jimmy hat (Scottish tartan beret with red pom pom and fake ginger wig). All of them looked suitably amusing, except mine! Mine looked too sensible, not nearly comedic enough:

Me in a silly hat

So I donned a fake moustache instead. No, I am not posting a picture of that.

Things I learnt as a result:

  • The students (and tutors) did indeed seem to find the practical class more memorable, and fun too.
  • In a practical room, you can hear the peals of laughter from the neighbouring lab.
  • The student grapevine circulates rumours of comedy tutors extremely quickly. Attendance the following week was up!
  • When you send round an email asking for comedy glasses and moustache, all of a sudden you have an instant conversation topic with people you meet in the corridors, who will all want to know what on earth is going on in your practical classes, and can they watch?
  • I have a lot of colleagues who are good sports and not afraid to make fools of themselves for the cause of explaining algorithm concepts.
  • If one of your colleagues looks unexpectedly hilarious in such a getup then he or she may not be too happy if you fall about laughing for half an hour…
  • … but it will make your day!
  • My departmental colleagues are very kind and if you laugh for half an hour straight they will offer to fetch you some oxygen.
  • Everyone will suddenly develop a keen interest in being emailed photos of your colleagues wearing silly hats and/or moustaches.
  • Giving your colleagues your solemn assurance that you are NOT sending electronic copies of the photos to anyone is very important.