Yes/No/Maybe Comprehension Assessments:

Valid, Reliable, Easy to Use

Ashley Hastings, Barbara Wheatley, Brenda Murphy
Global Language Education Services
Shenandoah University
Contact: hastings@glesismore.com

This is the online version of a poster that was presented at TESOL 2013 in Dallas, Texas, on March 23.
Click here for a full-size PDF view of the poster (you can scroll around and resize it to view individual elements).


Examples of Yes/No/Maybe Questions

Patricia graduated from high school last month. She plans to go to college, but she needs to save some money first, so she’s doing data entry for an insurance company. It’s boring work, but the pay is pretty good.

Q: Does Patricia have a job?


Fred used to own a car, but gasoline and insurance were costing him a fortune. He sold his car, and now he uses public transportation whenever he needs to go somewhere.

Q: Does Fred drive to work?


Charlie is always asking Sandra for favors. Just yesterday, he called her and asked if she could feed his cats for him next week. He can really be a nuisance sometimes!

Q: Is Charlie Sandra’s brother?

When it comes to objective tests of language comprehension, the Yes/No/Maybe format has several advantages over its better-known rivals, True/False and Multiple Choice.

First, Y/N/M is an entirely natural type of question. People ask and answer such questions all the time. T/F questions are syntactically equivalent to statements, which makes them appear natural, but it is relatively uncommon for statements to function as questions in ordinary discourse. MC questions are formally unlike anything real people say to each other in real life.

Y/N/M questions are also quite easy to construct. One simply selects or creates a short text and then asks a factual question based on the text. The “Maybe” option provides great latitude in item creation, because it allows one to pose a question whose answer is not found in the text. The lack of the “Maybe” option makes T/F items somewhat more challenging to construct. As for MC items, anyone who has worked with them knows how difficult it often is to think of enough plausible distractors.

With respect to the guessing factor, Y/N/M questions (a 33% chance of guessing the correct answer) are clearly superior to T/F questions (a 50% chance). Statistically, a T/F test must contain (for example) 60 items to equal the performance of a 45-item Y/N/M test: once the expected lucky guesses are removed, each format leaves 30 items' worth of real information. Four-choice MC items (25%) are, of course, even better than Y/N/M in this respect, but only if all of the distractors are equally plausible, which may not be the case.
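As a quick illustration of this arithmetic, the expected number of lucky guesses can be removed from each format to compare effective test lengths. The short Python sketch below is purely illustrative:

    def effective_items(n_items, n_choices):
        # Subtract the items a blind guesser would be expected to answer correctly.
        return n_items - n_items / n_choices

    print(effective_items(45, 3))   # 30.0  (45 Y/N/M items)
    print(effective_items(60, 2))   # 30.0  (60 T/F items)
    print(effective_items(45, 4))   # 33.75 (45 four-choice MC items, if every distractor works)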

The data and images in this poster presentation were provided by Global Language Education Services, which created the tests and has delivered them online to thousands of ESL and EFL students. The data were accumulated over a two-year period from online test administrations held under controlled, supervised conditions, usually for the purpose of placing students in various levels of English language programs, at the following institutions: University of Wisconsin-Milwaukee, Wisconsin Lutheran College, Elgin Community College, Palo Alto College, Kilgore College, Maharishi University of Management, Stratford University, King Faisal University, and Sanyuan Foreign Languages School. Levels of proficiency represented in the data range from near-beginner to high-intermediate.

Each time a student took one of these tests, he or she was given a set of 45 Y/N/M items, presented one at a time in random order, with no backtracking allowed. Each student’s item set was drawn randomly from a larger item pool: 135 items (listening), and 180 items (reading). Time limits of varying length were imposed by the institutions themselves in accordance with the abilities and needs of their students. Roughly 15% of the tests taken were incomplete, either because of time limits or because individuals failed to respond to one or more items. Only complete tests with 45 item responses were included in the data for these analyses.

Item evaluation has been ongoing, and from time to time items with inferior discrimination have been discarded and new items have been added. All the items in our data were included in the analyses presented here, whether they are in the current item pool or not.

Scores shown on this poster were computed using a correction-for-guessing formula and expressed as the percentage of items that were actually understood. KR-21 reliability estimates were computed as follows. The total number of correct answers by each subject was reduced by 15 (and set to zero in case of a negative result) to account for guessing. The mean and standard deviation were then found and used in the KR-21 formula; N was set at 30 (45 questions minus the 15 removed for guessing).
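These calculations are easy to reproduce. The Python sketch below is an illustration only (the variable names and the assumption that raw correct-answer counts are available as a simple list are ours), not the original scoring scripts:

    import statistics

    N_ITEMS = 45
    EXPECTED_GUESSES = 15                        # 45 items / 3 choices
    N_EFFECTIVE = N_ITEMS - EXPECTED_GUESSES     # 30

    def adjusted_correct(raw_correct):
        # Correction for guessing: subtract 15 and floor at zero.
        return max(raw_correct - EXPECTED_GUESSES, 0)

    def percent_understood(raw_correct):
        # Score as reported on this poster: percentage of the 30 "real" items understood.
        return 100.0 * adjusted_correct(raw_correct) / N_EFFECTIVE

    def kr21(raw_correct_counts):
        # KR-21 reliability on the guess-corrected counts, with N set to 30.
        adjusted = [adjusted_correct(r) for r in raw_correct_counts]
        mean = statistics.mean(adjusted)
        var = statistics.pvariance(adjusted)
        n = N_EFFECTIVE
        return (n / (n - 1)) * (1 - mean * (n - mean) / (n * var))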

Item discriminations were computed as standard product-moment correlations between the items and the total score, which in the case of binary items is equivalent to the point-biserial computation.
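In the same spirit, here is a short sketch of the item-discrimination computation; the data layout (one row of 0/1 responses per student) is again only an assumption for illustration:

    def pearson(x, y):
        # Product-moment correlation; with a binary x this equals the point-biserial.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def item_discriminations(responses):
        # responses: one row per student, each a list of 45 ones (correct) and zeros (incorrect).
        totals = [sum(row) for row in responses]
        return [pearson([row[i] for row in responses], totals)
                for i in range(len(responses[0]))]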

Test        Scores in Sample   Item Responses in Sample   Mean Score (%)   Standard Deviation   KR-21   Mean Item Discrimination
Listening   1,578              71,010                     31.8             28.1                 0.94    0.41
Reading     1,415              63,675                     38.5             26.2                 0.92    0.38

The Reading scores were higher on average than the Listening scores, probably because some of the programs that contributed data normally give the Reading test only to students who have already passed the Listening test.

Items were also divided into categories for separate analysis. First, the items were sorted according to mean scores and divided into thirds: Easy, Medium, and Hard.


Difficulty   Listening                                      Reading
             Mean Score (%)   Mean Item Discrimination      Mean Score (%)   Mean Item Discrimination
Easy         51.3             0.40                          59.9             0.40
Medium       35.5             0.45                          40.2             0.39
Hard         16.8             0.37                          21.6             0.35

Second, the items were divided according to their correct answers: Yes, No, and Maybe.


Correct Answer   Listening                                      Reading
                 Mean Score (%)   Mean Item Discrimination      Mean Score (%)   Mean Item Discrimination
Yes              45.9             0.33                          47.8             0.36
No               32.6             0.42                          37.0             0.42
Maybe            28.7             0.48                          36.8             0.36

The scores in these categories were arranged by percentile and plotted as shown below.

[Figure: percentile plots of scores for Easy, Medium, and Hard items and for Yes, No, and Maybe items]

[Figure: percentile plots of scores for Easy, Medium, and Hard items and for Yes, No, and Maybe items]


The construct validity of these tests is supported by the even spacing and similar slopes of the Easy, Medium, and Hard items.

It is also clear that the Yes, No, and Maybe items perform well. The Yes items tend to have higher scores than the other two types; this may be due to response bias, as a quick survey of our data suggests that less proficient individuals tend to use Yes as their favored guessing response. We plan to do further research on this topic, using our existing data as well as new data that is constantly being generated.

The Maybe items appear to be better discriminators in Listening than in Reading. We hypothesize that Maybe items require more scrutiny than Yes or No items (because the correct answer depends on determining that the information asked for in the question is not actually given), and that this is especially difficult for less proficient individuals in the case of the Listening test, because it takes more time and effort to listen repeatedly to a Listening item than it does to repeatedly scan a Reading item. An unpublished pilot study conducted at Shenandoah University by Brenda Murphy tends to support this hypothesis, but the number of subjects was very small. We now have detailed data from the server that supplies these test items, including information about the amount of time that individuals spend on each item; we plan to analyze this data to see whether it sheds any light on this issue.


CourseSites by Blackboard is an excellent platform for online education. Its instructional tools are probably better known than its assessment tools, but both can be of great value—especially since individual instructors can create courses on CourseSites at no cost. See CourseSites for details.

Disclaimer: The authors are neither affiliated with nor compensated by CourseSites or its parent company, Blackboard.

The CourseSites assessment tools do not include a special format for Y/N/M tests, but it is easy to use the Multiple Choice format for this purpose. Just assign the response values "Yes," "No," and "Maybe" to the first three choices, and delete the fourth.

With CourseSites, you can:
  • control access to your assessments by setting passwords
  • create student groups and control which groups have access to which tests
  • make access to one test dependent on the results of another test
  • make a test open and close at specified times that you set in advance
  • control how long each student has to complete a test
  • decide whether items will be presented in a fixed or random order, all at once or one at a time
  • create a large item pool and have CourseSites randomly select a unique subset of items for each student
  • allow or prevent backtracking to earlier items
  • protect item security by turning off feedback options
To get a rough idea of how your language proficiency assessments might look to your students in CourseSites, click the link below and log in as

Username: eslteacher
Password: tesol2013


The CourseSites course contains a link to a tutorial that gives some technical pointers on creating Y/N/M Listening and Reading comprehension items and installing them in CourseSites. You can also access the tutorial through this link:


When your students take a test in CourseSites, their results are recorded in the Grade Center. The Grade Center display shows only how many total points each student earned (as a raw score or a percentage). However, you can download a data file that details exactly how each student responded to each item. Pasting this file into an Excel spreadsheet gives you the power to extract and process a great deal of information that is not revealed in the Grade Center view.

Here are some of the things you can do by entering appropriate formulas in Excel:
  • distinguish between items that were answered incorrectly and items that were not answered at all
  • correct for guessing, which helps level the playing field for students with different guessing strategies
  • compute individual item difficulties
  • compensate for differential item difficulties in scoring
  • sort and display results according to name, ID number, or score
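
For teachers who would rather script these steps than maintain spreadsheet formulas, the same analyses can be sketched in Python. The layout assumed below (one row per student, one cell per item, with 1 for correct, 0 for incorrect, and a blank for unanswered) and the file name are illustrative assumptions; the actual Grade Center download format may differ, so adjust the parsing to the file you receive:

    import csv

    def load_responses(path):
        # Assumed layout: student ID in the first cell, then one cell per item
        # ('1' = correct, '0' = incorrect, '' = unanswered).
        students = {}
        with open(path, newline="") as f:
            for row in csv.reader(f):
                name, cells = row[0], row[1:]
                students[name] = [None if c == "" else int(c) for c in cells]
        return students

    def corrected_percent(cells, n_choices=3):
        # Correction for guessing: unanswered items earn no credit, and the
        # expected number of lucky guesses is removed before scoring.
        n_items = len(cells)
        correct = sum(1 for c in cells if c == 1)
        expected_guesses = n_items / n_choices
        return 100.0 * max(correct - expected_guesses, 0) / (n_items - expected_guesses)

    def item_difficulties(students):
        # Proportion of students answering each item correctly.
        rows = list(students.values())
        return [sum(1 for row in rows if row[i] == 1) / len(rows)
                for i in range(len(rows[0]))]

    students = load_responses("grade_center_download.csv")   # hypothetical file name
    ranked = sorted(students, key=lambda s: corrected_percent(students[s]), reverse=True)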


That's all! We hope this has been of some use to you.
If you have any questions, don't hesitate to contact the lead author:

Ashley Hastings, hastings@glesismore.com