I recently gave a talk to fellow faculty on the phenomenon of “failure to fail” in emergency medicine. I am no expert, but I have tried to synthesize the details in a useful way. I have broken it down into three parts. Part 1 deals with the phenomenon of Failure to Fail. In separate posts I will introduce some forms of evaluator bias and then provide a prescription for more effective learner assessment in the ER. As always, comments are welcome!
- We expect our medical trainees to acquire the fundamental clinical skills.
- We expect them to evolve from novice to expert.
- Our goal is to graduate cadres of competent physicians who will serve their communities safely, effectively and conscientiously.
The Importance of Direct Observation and Work-Based Assessment:
The current model of medical training is a blend of didactic teaching, clinical learning, simulation and self-directed endeavours. We then try to evaluate the learners formatively and summatively through written exams and standardised clinical scenarios.
We are learning that the best tool to evaluate learners is direct observation in the work context. This requires several things:
- Deliberate practice [on the part of the trainee]
- Intentional observation [on the part of the faculty]
- Action planning
This will place even more emphasis on direct faculty oversight. We will therefore need to develop faculty members' skills and coach them on how to:
- Perform direct observation
- Perform a valid [i.e. repeatable] evaluation of skills
- Provide effective feedback
Current State of Trainee Evaluation
FACT: the current model sometimes fails to discriminate between learners and to fail those who underperform.
- This occurs in spite of observed unsatisfactory performance.
- This occurs despite faculty confidence in their ability to discriminate.
- Most faculty agree that this is the single most important problem with trainee evaluation.
We’re trying to understand why… This is what I found.
- The Wobegon Effect – [wikilink]
- From the fictional town featured on a radio show – a town where
“all the women are strong, all the men are good looking, and all the children are above average”
- Describes the human tendency to overestimate one’s achievements and capabilities in relation to others. [Also known as illusory superiority - link]
- Grossly inflated performance ratings have been found practically everywhere in North America:
- Business – both employees and managers
- University professors – overestimate their excellence [gulp]
- Studies on ‘bad drivers’ – everyone has one of these in their family!
- Not surprisingly, this phenomenon is equally pervasive in medicine
- Faculty struggle to provide honest feedback and consistent [valid] evaluations. [In one study, raters rated 66% of trainees as "above average". This is simply not possible! Pubmed]
- Fazio et al [see references] demonstrated that 40% of IM clerks should have failed, yet were passed on…FORTY PERCENT!!!
- A study by Harasym et al [Link] showed that OSCE examiners are more likely to err on the side of passing than failing students
- Residents [in particular the lowest performing ones] overestimate their competency compared to their ratings by faculty, peers and nurses [Pubmed Link]
- Moreover, the biggest overestimations lay in the so-called “soft skills” – communication, teamwork and professionalism. These are often the problems that give faculty and colleagues headaches with a particular learner.
- One reason might be that soft skills are hard to quantify – unlike suturing skills, where incompetence is quickly identified
- The end result is a culture of “failure to fail” where …
- Many graduates are not acquiring the required skill set
- We are failing to serve patient needs
- Reduced safety, increased diagnostic error and reduced patient satisfaction
- Increasing negative fallout for the ENTIRE profession – our reputations are being besmirched in the digital era.
- Ultimately, public trust is being eroded.
- We cannot succeed at our job without public confidence in what we do.
Why we fail to fail learners:
Barriers to Adequate Evaluation:
- Learners are all different. Moreover, the same learner will vary in skills through time as they grow and develop.
- We all have good and bad days.
- There exists a phenomenon called “Group-work effect” where medical teams can mask deficiencies of individual learners.
- Evaluation tools are flawed – some evaluation forms are poorly designed to discriminate between learners
- We all work in the current culture of “too busy to teach”.
- There is an incredible amount of work needed to change this culture
- Faculty feel confident in their ability when polled.
- Faculty feel a sense of responsibility to patient, profession and learner BUT …
- Raters themselves are the largest source of variance in ratings of learners:
- Examiners account for 40% of the variance in scores on OSCEs
- Examiners’ ratings are FOUR times more varied than the “true” scores of students
- Some tend to be more strict – “Hawks” … some are more lenient – “Doves”
- Gender, ethnicity and age/experience have a negligible effect [although one UK study found that "hawks" are more likely to be from an ethnic minority and older - link]
- Clinical competence of faculty members is also correlated with better evaluations [link]
- In one interesting study, faculty took the OSCE themselves and then rated students. The results show that:
- Faculty use their own practice style as a frame of reference to rate learners
- Better performers on the OSCE were more insightful and attentive evaluators
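To make the idea of examiners being a source of variance more concrete, here is a minimal sketch in Python. It is not taken from any of the studies above. The student abilities and examiner offsets are made-up numbers, and I have assumed a simple model in which every examiner scores every student and each score is just the student's ability plus the examiner's stringency offset. The numbers were picked so that the examiner share comes out near the 40% figure quoted above.

```python
from statistics import mean, pvariance

# Hypothetical "true" abilities for five students (illustrative, not real data).
true_ability = [60, 65, 70, 75, 80]

# Hypothetical examiner offsets: negative = "hawk" (strict), positive = "dove" (lenient).
examiner_offset = [-8, -2, 2, 8]

# Assumed fully crossed design: every examiner scores every student,
# and each score is ability plus the examiner's offset.
scores = [[ability + offset for offset in examiner_offset]
          for ability in true_ability]

# Average score per student (rows) and per examiner (columns).
student_means = [mean(row) for row in scores]
examiner_means = [mean(column) for column in zip(*scores)]

# Population variance of each set of averages, and of all scores together.
student_var = pvariance(student_means)
examiner_var = pvariance(examiner_means)
total_var = pvariance([score for row in scores for score in row])

# Share of the total variance that comes from who did the rating.
examiner_share = examiner_var / total_var
print(f"Variance from students:  {student_var}")
print(f"Variance from examiners: {examiner_var}")
print(f"Examiner share of total: {examiner_share:.0%}")
```

With these invented numbers, roughly 40% of the spread in scores reflects which examiner was in the room rather than how good the student was. Real OSCE data would also contain station effects and random noise, which this sketch ignores.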
A 2013 convenience sample of U of S EM faculty gave these top three reasons:
1. Fear of being perceived as unfair
2. Lack of confidence in the supporting evidence when considering whether to “fail”
3. Uncertainty about how to identify specific behaviours
What I discovered in the literature:
1. Competing demands [clinical vs educational] mean that education suffers.
2. Lack of documentation - Preceptors fail to record day-to-day performance. So when it comes to the end-of-rotation eval, there is not enough evidence.
3. The interpersonal conflict model describes the following phenomenon:
- Faculty members’ goal is to improve trainees’ skills – preceptors do care a lot!
- They perceive the need to emphasize the positives and be gentle [to protect learner self-esteem and maintain an alliance/engage the trainee].
- Faculty try to make feedback constructive without the learner feeling like it's a personal attack.
- This creates tension when one is forced to give negative and critical feedback
- The emotional component of giving negative feedback makes it even more difficult
- Consequently, this tension forces us to overemphasize the positives
- This creates mixed messages regarding feedback. Learners walk away with the wrong message.
4. Lack of self-efficacy:
- There's a lack of knowledge of what specifically to document - a) Faculty don't know what type of information to jot down, b) Faculty struggle to identify specific behaviours associated with failure.
[The reported low self-confidence during evaluations is actually a product of our training [or rather lack thereof]. No one teaches you how to navigate minefields in evaluation. This is particularly evident for soft skills. Staff often think that their judgements are subjective interpretations].
5. Anticipating an [arduous] appeal process – having an extra commitment, having to defend one's actions/comments, fear of escalation [e.g. legal action].
6. Lack of remediation options - there is a lack of faculty support. This leaves faculty unsure about what to do or advise after diagnosing a problem.
We have seen that the current model of medical training is failing to identify and fail underperforming learners. There are several reasons why, but faculty themselves play a large role in this culture of “Failure to Fail”. In my next post I will highlight some biases that we encounter when judging learners and provide a prescription for more effective learner evaluation.
Acknowledgement - Dr Jason Frank @drjfrank for pointing me in the right direction [authors Kogan and Holmboe are ones you should search out in particular]
Dudek et al 2005. Failure to Fail: The Perspectives of Clinical Supervisors.
Fazio et al 2013. Grade inflation in the internal medicine clerkship: a national survey.
Harasym et al 2008. Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs.
Kogan and Holmboe 2013. Realizing the promise and importance of performance-based assessment.