Influence of Different Scoring Algorithms for Multiple True-False Items on the Measurement Precision of Multiple Choice Exams

Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Guttormsen, Sissel; Fischer, Martin R; Huwendiek, Sören (29 August 2018). Influence of Different Scoring Algorithms for Multiple True-False Items on the Measurement Precision of Multiple Choice Exams. In: AMEE Conference (pp. 801-802). Dundee: Association for Medical Education in Europe

Full text not available from this repository.

Official URL: https://amee.org/getattachment/Conferences/AMEE-20...

High measurement precision in assessment is a main concern in medical education. It ensures that competent candidates pass and incompetent candidates fail an exam, and not vice versa. Measurement precision can be estimated globally as well as specifically at the cut score. Multiple True-False (MTF) items are a multiple-choice question format that prompts a true/false decision on every option of an item, enabling partial knowledge to be rewarded. Rewarding partial knowledge, in turn, can affect measurement precision. MTF items can be scored dichotomously, rewarding no partial knowledge (DS); by rewarding every bit of partial knowledge (PS1/n); or by rewarding partial knowledge only above a threshold, to suppress marginal knowledge and guessing (e.g. more than 50% correct true/false decisions on an item, PS50). This PhD thesis analyzes the influence of different scoring algorithms for MTF items on the measurement precision of medical exams.
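The three scoring algorithms named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `score_mtf` and the representation of responses as lists of booleans are hypothetical. The exact credit awarded under PS50 is also an assumption: the abstract only states that partial knowledge is rewarded above a threshold of 50% correct decisions, and the sketch assumes half credit above the threshold and full credit when every decision is correct.

```python
from fractions import Fraction


def score_mtf(responses, key, method):
    """Score one MTF item; returns a Fraction between 0 and 1.

    responses, key: equal-length lists of booleans (one true/false
    decision per option). method: "DS", "PS1/n" or "PS50".
    """
    if not key or len(responses) != len(key):
        raise ValueError("responses and key must be non-empty and equal in length")

    n = len(key)
    correct = sum(r == k for r, k in zip(responses, key))

    if method == "DS":
        # Dichotomous scoring: credit only if every decision is correct.
        return Fraction(1) if correct == n else Fraction(0)
    if method == "PS1/n":
        # Every correct decision earns 1/n of the item's point.
        return Fraction(correct, n)
    if method == "PS50":
        # ASSUMPTION: full credit if all decisions are correct, half
        # credit if strictly more than 50% are correct, otherwise none.
        if correct == n:
            return Fraction(1)
        if 2 * correct > n:
            return Fraction(1, 2)
        return Fraction(0)
    raise ValueError(f"unknown scoring method: {method}")
```

For an item with four options and three correct decisions, this sketch gives 0 under DS, 3/4 under PS1/n and 1/2 under PS50. With only two correct decisions, PS50 gives 0 because the threshold is not exceeded.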

To investigate the influence of scoring algorithms, we performed three studies. In the first study, we analyzed the effect of scoring on global reliability, i.e. Cronbach’s alpha. In the second study, we analyzed how to calculate measurement precision at the cut score by introducing the concept of conditional reliability, using both Item Response Theory (IRT) and Classical Test Theory (CTT). In the third study, we analyzed the influence of scoring algorithms for MTF items on the measurement precision at the cut score by determining the conditional reliability and the conditional Standard Error of Measurement (cSEM) at the cut score, as well as the number of candidates with ambiguous results.
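As an illustration of the global reliability measure mentioned above, the following sketch computes Cronbach's alpha from a candidates-by-items score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the list-of-lists input format are hypothetical, and the abstract does not describe how the authors computed the statistic.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per candidate, each row holding that
    candidate's score on every item (all rows the same length).
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidates")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and rows of equal length")

    # Sample variance of each item across candidates.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the candidates' total scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Item scores produced by any of the scoring algorithms (DS, PS1/n or PS50) could be passed to such a function to compare the resulting global reliability.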

We showed that rewarding partial knowledge in MTF items indeed influences measurement precision, both at a global level and at the cut score. In the first study, we showed that crediting partial knowledge with a threshold (PS50) leads to high global reliability. In the second study, we introduced the concept of conditional reliability to analyze measurement precision at the cut score, showed that the results differ markedly between IRT and CTT, and argued for using it within IRT. In the third study, we showed that scoring MTF items with PS50 leads to high conditional reliability and low cSEM at the cut score, as well as the lowest number of candidates with ambiguous results.

With this PhD project, we comprehensively analyzed the influence of scoring for MTF items on the measurement precision of summative medical exams. By examining the effect of different scoring algorithms, we advanced the understanding of measurement precision by introducing the concept of conditional reliability to assessment in medical education. Since rewarding partial knowledge above a certain level (PS50) showed high global reliability, high conditional reliability and low cSEM at the cut score, as well as the lowest number of candidates with ambiguous results, we recommend using this scoring algorithm. To use real data, we simulated the different scoring algorithms on existing items that were originally constructed for rewarding partial knowledge (PS50). It would be interesting to analyze whether these results hold true if items are constructed with another scoring algorithm in mind.

Item Type:	Conference or Workshop Item (Paper)
Division/Institute:	04 Faculty of Medicine > Medical Education > Institute for Medical Education > Assessment and Evaluation Unit (AAE); 04 Faculty of Medicine > Medical Education > Institute for Medical Education; 04 Faculty of Medicine > Medical Education > Institute for Medical Education > Education and Media Unit (AUM)
Graduate School:	Graduate School for Health Sciences (GHS)
UniBE Contributor:	Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Guttormsen, Sissel; Huwendiek, Sören
Subjects:	600 Technology > 610 Medicine & health
Publisher:	Association for Medical Education in Europe
Language:	English
Submitter:	Daniel Bauer
Date Deposited:	30 Aug 2018 10:24
Last Modified:	05 Dec 2022 15:17
Additional Information:	AMEE 2018 ABSTRACT BOOK, p. 801-802
URI:	https://boris.unibe.ch/id/eprint/119438
