Testing Spoken Language, A Handbook of Oral Testing Techniques, Nic Underhill, Cambridge Handbooks for Teachers, 1987


Summary notes

The most important influence on the development of language testing has been the legacy of psychometrics, in particular intelligence testing. A lot of time was devoted to proving there was a single measurable attribute called general intelligence. Psychometrics wanted to be a science, so the aspects of human behavior that could be predicted and measured were emphasized. The multiple-choice test offered the learner no opportunity to behave as an individual. Individualism was described as ‘variance,’ and a lot of effort was put into reducing the amount of variance a test produces.

Expectations – Every culture values education highly, but does so in different ways.


  • interviewer
  • interlocutor – A person whose job is to help the learner to speak, but who is not required to assess him/her.
  • assessor
  • marker or rater
  • authentic
  • objective
  • stimulus
  • validity – Does the test measure what it’s supposed to?
  • reliability – Does the test give consistent results?
  • evaluate – Find out if the test is working.
  • moderate – To compare the way different assessors award marks and to reduce discrepancies.


Tests can be used to ask four basic kinds of question, concerning:

  • proficiency
  • placement
  • diagnosis
  • achievement

Who does a learner speak to in a test?

  • Learner <> Interviewer/Assessor
  • Learner <> Interlocutor
  • Learner <> Learner
  • Learner <> Group

Chapter 3 presents a nice list of task types that could be given in oral exams.

A marking key or marking protocol saves time and reduces uncertainty by specifying in advance, as far as possible, how markers should approach the marking of each question or task.

Performance criteria might include: Length of utterances; complexity; speed; flexibility; accuracy; appropriacy; independence; repetition; hesitation.

Mark categories could be given a weighting.
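
As a rough illustration of weighting, the sketch below combines per-category marks into a single score. The category names are taken from the performance criteria listed above, but the weights, the 0–5 mark scale, and the function name `weighted_score` are assumptions made for the example, not something the book prescribes.

```python
def weighted_score(marks, weights):
    """Return the weighted average of per-category marks.

    marks and weights are dicts keyed by category name (hypothetical values).
    """
    total_weight = sum(weights.values())
    return sum(marks[category] * weights[category] for category in weights) / total_weight


# Hypothetical weights: accuracy counts twice as much as appropriacy here.
weights = {"accuracy": 2, "complexity": 3, "appropriacy": 1}
# Hypothetical marks awarded by the assessor on a 0-5 scale.
marks = {"accuracy": 3, "complexity": 4, "appropriacy": 5}

print(round(weighted_score(marks, weights), 2))
```

Dividing by the total weight keeps the result on the same scale as the individual marks, which makes scores easier to compare if the weights are later changed.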

Few learners are ‘typical.’ It may be helpful to look for a range, not a point on a scale.

Additive marking is where the assessor has prepared a list of features to listen out for during the test. She awards a mark for each of these features that the learner produces correctly, and adds these marks to give the score. This is also known as an incremental mark system; the learner starts with a score of zero and earns each mark, one by one.
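
A minimal sketch of additive marking, assuming the assessor's checklist is a simple list of feature names and that each feature is worth one mark. The feature names and the function name `additive_score` are invented for illustration.

```python
def additive_score(target_features, features_produced):
    """Start at zero and add one mark for each listed feature produced correctly."""
    score = 0
    for feature in target_features:
        if feature in features_produced:
            score += 1
    return score


# Hypothetical checklist prepared before the test.
target_features = ["greeting", "past tense", "question form", "polite request"]
# Features the assessor heard the learner produce correctly.
features_produced = {"greeting", "question form"}

print(additive_score(target_features, features_produced))  # 2
```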

Subtractive marking is where the assessor subtracts one mark from a total for each mistake the learner makes, down to a minimum of zero.
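
The same idea in code, as a sketch only: the starting total of 10 and the function name `subtractive_score` are assumptions for the example. The `max` call enforces the floor of zero.

```python
def subtractive_score(total, mistakes):
    """Subtract one mark per mistake from the starting total, never going below zero."""
    return max(0, total - mistakes)


print(subtractive_score(10, 3))   # 7
print(subtractive_score(10, 14))  # 0
```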


Evaluating Your Students, A. Baxter, Richmond, 1997


Summary notes:

The traditional way to assess students has been through tests. Testing has largely been aligned with scientific study – to administer a (placement) test, to make some changes (i.e., to teach), and to re-test (an end-of-course test). As a result, anything too difficult to measure has traditionally gone untested (although newer ideas allow for other forms of assessment and evaluation, such as student portfolios).

A good test is valid, reliable, and practical, and has no negative backwash.

  • VALID – There are three types of validity:

content validity – Does the test test what was covered in class?

construct validity – Does the test test what it’s supposed to test and nothing else?

face validity – Does the test look like it’s testing what it is supposed to test from an initial glance?

  • RELIABLE – There are two areas of reliability:

test reliability – If you gave the same test to the same person, would the result be the same?

scorer reliability – Would two people come up with the same mark for the same test?

  • PRACTICAL – How practical is the test to administer?



direct testing – We ask the student to perform what we want to test (preferable).

indirect testing – We test things that give us an indication of a student’s performance (less preferable).

norm-referenced testing – When the results of a test compare students with one another (popular with state and exam-board tests).

criterion-referenced testing – When test results tell you what an individual student can do.

summative testing – Done at the end of a semester/year.

formative testing – Ongoing assessment that allows change to take place before the course is over.

congruent testing – This looks at the whole process before it starts so that any issues get resolved before a course is underway.

profiles/profiling and analytic mark schemes – A profile is not so much a score, but a reference to a set of descriptors of a person’s ability. A student might fall into a certain band. Some students will not have a flat profile.



A cloze test is a test in which words are deleted not according to what we want to test (as in regular gap fills), but at regular intervals. Thus, every seventh word, or near enough, will be deleted. A variation on this is the C-test, in which the first letter of each deleted word is given (other literature says the first half of the word).
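
The fixed-interval deletion can be sketched in a few lines. This is only an illustration: the function name `make_cloze`, the gap marker, and the sample sentence are assumptions, and a real test writer would usually adjust the gaps by hand (the "or near enough" above).

```python
def make_cloze(text, n=7, gap="_____"):
    """Replace every n-th word with a gap; return the gapped text and the answers."""
    words = text.split()
    answers = []
    # Index n-1 is the n-th word; step by n to reach every n-th word after it.
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = gap
    return " ".join(words), answers


# Hypothetical sample text, chosen so the deleted words are easy to check.
sample = ("one two three four five six seven eight nine ten "
          "eleven twelve thirteen fourteen fifteen")
gapped, answers = make_cloze(sample)
print(gapped)
print(answers)  # ['seven', 'fourteen']
```

Because the text is split on whitespace, any punctuation attached to a deleted word is removed along with it and appears in the answer list.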

When to give assistance (i.e., provide clues/hints) depends on three testing problems:

  • When we are testing the students’ ability to transform something.
  • When we want to force the student to use a desired item.
  • When we want to put the same idea in each student’s head.



It is obvious that students don’t always learn everything we teach. On the other hand, it must also be true that they learn things we don’t teach.

If a student has a question about something in a text, this may mean that he/she is ready to learn it. Baxter calls this the saliency effect. However, what is suddenly salient for one individual will probably not be salient for the whole class. Also, what the individual is ready to learn will probably not fit in with the teacher’s plan. If the teacher is practising skimming, and a student asks what a particular word means, the teacher would probably tell them the word wasn’t important because they are practising skim-reading.