Evaluating Your Students, A. Baxter, Richmond, 1997



Summary notes:

The traditional way to assess students has been through tests. Testing has largely been modelled on scientific study: administer a (placement) test, make some changes (i.e., teach), and re-test (with an end-of-course test). On this model, anything too difficult to measure has traditionally gone untested, although newer approaches allow for other forms of assessment and evaluation, such as student portfolios.

A good test is valid, reliable, and practical, and has no negative backwash.

  • VALID – There are three types of validity:

content validity – Does the test test what was covered in class?

construct validity – Does the test test what it’s supposed to test and nothing else?

face validity – At first glance, does the test look like it is testing what it is supposed to test?

  • RELIABLE – There are two areas of reliability:

test reliability – If you gave the same test to the same person, would the result be the same?

scorer reliability – Would two people come up with the same mark for the same test?

  • PRACTICAL – How practical is the test to administer?
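One rough, practical way to check scorer reliability is to have two markers score the same set of papers and compare their marks. A minimal sketch (the marks below are invented purely for illustration):

```python
# Rough check of scorer reliability: two markers score the same ten papers.
# The marks here are invented for illustration only.
marker_a = [14, 12, 18, 9, 15, 11, 17, 13, 10, 16]
marker_b = [13, 12, 17, 10, 15, 12, 18, 13, 9, 16]

# Mean absolute difference between the two markers' scores.
diffs = [abs(a - b) for a, b in zip(marker_a, marker_b)]
mean_diff = sum(diffs) / len(diffs)

# Exact agreement rate: how often both markers gave the same mark.
agreement = sum(a == b for a, b in zip(marker_a, marker_b)) / len(marker_a)

print(f"mean absolute difference: {mean_diff:.1f}")
print(f"exact agreement: {agreement:.0%}")
```

A small mean difference and high agreement suggest the mark scheme is being applied consistently; large gaps suggest the scheme, not just the markers, needs attention.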



direct testing – We ask the student to perform what we want to test (preferable).

indirect testing – We test things that give us an indication of a student’s performance (less preferable).

norm-referenced testing – When the results of a test compare students (a popular notion among state and exam board tests).

criterion-referenced testing – When test results tell you what an individual student can do.

summative testing – Done at the end of a semester/year.

formative testing – Ongoing assessment that allows change to take place before the course is over.

congruent testing – This looks at the whole process before it starts so that any issues get resolved before a course is underway.

profiles/profiling and analytic mark schemes – A profile is not so much a score as a reference to a set of descriptors of a person's ability. A student might fall into a certain band. Some students will not have a flat profile: they may be stronger in some skills than in others.
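The norm- vs criterion-referenced distinction above can be sketched with a toy example (the names, scores, and band descriptors below are invented for illustration): a norm-referenced report ranks a student against the group, while a criterion-referenced report maps the raw score onto descriptors of what the student can do.

```python
# Toy illustration of norm- vs criterion-referenced reporting.
# Names, scores, and band cut-offs are invented for illustration.
scores = {"Ana": 72, "Ben": 55, "Carla": 88, "Dan": 64}

def percentile_rank(name):
    """Norm-referenced: the share of the group this student scored above."""
    others = [s for n, s in scores.items() if n != name]
    return sum(scores[name] > s for s in others) / len(others)

# Criterion-referenced: map a raw score onto ability-band descriptors,
# checked from the highest cut-off down.
BANDS = [(80, "can handle unfamiliar texts independently"),
         (60, "can handle familiar texts with some support"),
         (0,  "needs substantial support with most texts")]

def band(score):
    for cutoff, descriptor in BANDS:
        if score >= cutoff:
            return descriptor

print(percentile_rank("Ana"))   # compares Ana with the rest of the group
print(band(scores["Ana"]))      # describes what Ana can do
```

The same raw score produces two very different reports: the first says nothing about ability without knowing the group, and the second says nothing about rank.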



A cloze test is a test in which words are deleted not according to what we want to test (as in regular gap fills), but at regular intervals: every seventh word, or near enough, is deleted. A variation on this is the C-test, in which the first letter (elsewhere, the literature says the first half) of each deleted word is given.
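The deletion rule above is mechanical enough to sketch in code. Assuming, as a simplification, that words are simply whitespace-separated tokens, a minimal cloze and C-test generator might look like this:

```python
# Minimal sketch of a cloze test: delete every nth word regardless of
# what it is. Words are whitespace-separated tokens (a simplification;
# real materials would handle punctuation and word choice more carefully).
def cloze(text, n=7):
    words = text.split()
    return " ".join("____" if i % n == n - 1 else w
                    for i, w in enumerate(words))

# C-test variant: instead of deleting the word entirely, keep its first
# half (some literature gives only the first letter instead).
def c_test(text, n=7):
    words = text.split()
    out = []
    for i, w in enumerate(words):
        if i % n == n - 1:
            half = w[: (len(w) + 1) // 2]   # first half, rounded up
            out.append(half + "_" * (len(w) - len(half)))
        else:
            out.append(w)
    return " ".join(out)

print(cloze("a b c d e f g h", n=4))   # every fourth word blanked
print(c_test("a b reading", n=3))      # third word half-given
```

Because the gaps fall at fixed intervals rather than on chosen items, the test samples language broadly rather than targeting particular points.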

When to give assistance (i.e., provide clues/hints) depends on which of three testing situations applies:

  • When we are testing the students’ ability to transform something.
  • When we want to force the student to use a desired item.
  • When we want to put the same idea in each student’s head.



It is obvious that students don’t always learn everything we teach. On the other hand, it must also be true that they learn things we don’t teach.

If a student has a question about a text, this might mean that he/she is ready to learn it. Baxter calls this the saliency effect. However, what is suddenly salient for one individual will probably not be salient for the whole class, and what the individual is ready to learn will probably not fit in with the teacher's plan. For example, if the class is practising skimming and a student asks what a particular word means, the teacher would probably say the word isn't important because they are practising skim-reading.