Failure to Fail, Part 1: Why faculty evaluation may not identify a failing learner



I recently gave a talk to fellow faculty on the phenomenon of “failure to fail” in emergency medicine. I am no expert, but I have tried to synthesize the details in a useful way. I have broken the material into three parts. Part 1 deals with the phenomenon of failure to fail. In separate posts I will introduce some forms of evaluator bias and then provide a prescription for more effective learner assessment in the ER. As always, comments are welcome!


  1. We expect our medical trainees to acquire fundamental clinical skills.
  2. We expect them to evolve from novice to expert.
  3. Our goal is to graduate cadres of competent physicians who will serve their communities safely, effectively and conscientiously.

The Importance of Direct Observation and Work-Based Assessment:

The current model of medical training is a blend of didactic teaching, clinical learning, simulation and self-directed endeavours. We then try to evaluate the learners formatively and summatively through written exams and standardised clinical scenarios.

We are learning that the best tool to evaluate learners is direct observation in the work context. This requires four things:

  1. Deliberate practice [on the part of the trainee]
  2. Intentional observation [on the part of the faculty]
  3. Feedback
  4. Action planning

This will place even more emphasis on direct faculty oversight. We will therefore need to develop faculty skills and coach them in how to:

  • Perform direct observation
  • Perform a valid [i.e. repeatable] evaluation of skills
  • Provide effective feedback


Current State of Trainee Evaluation

FACT: the current model sometimes fails to discriminate among learners and to fail those who are underperforming

      • This occurs in spite of observed unsatisfactory performance.
      • This occurs despite faculty confidence in their ability to discriminate.
      • Most faculty agree that this is the single most important problem with trainee evaluation.

We’re trying to understand why… This is what I found.

  •  The Lake Wobegon Effect – [wikilink]
    • Named for the fictional town featured on a radio show – a town where

 “all the women are strong, all the men are good looking, and all the children are above average”

    • Describes the human tendency to overestimate one’s achievements and capabilities in relation to others. [Also known as illusory superiority – link] 
  • Grossly inflated performance ratings have been found practically everywhere in North America:
    • Business – both employees and managers
    • University professors – overestimate their excellence [gulp]
    • Studies on ‘bad drivers’ – everyone has one of these in their family! 
  • Not surprisingly, this phenomenon is equally pervasive in medicine
    • Faculty struggle to provide honest feedback and consistent [valid] evaluations. [In one study, raters rated 66% of trainees “above average” – this is simply not possible! Pubmed]
    • Fazio et al [see references] demonstrated that 40% of IM clerks should have failed, yet were passed on … FORTY PERCENT!
    • A study by Harasym et al [link] showed that OSCE examiners are more likely to err on the side of passing students than failing them


    • Residents [in particular the lowest-performing ones] overestimate their competency compared to their ratings by faculty peers and nurses [Pubmed Link]
    • Moreover, the biggest overestimations lie in the so-called “soft skills” – communication, teamwork and professionalism. These are often the very areas that give faculty and colleagues headaches with a particular learner.
    • One reason might be that soft skills are hard to quantify – unlike suturing skills, where incompetence is quickly identified
  • The end result is a culture of “failure to fail”, where …
    • Many graduates are not acquiring the required skill set
    • We are failing to serve patient needs
      • Reduced safety, increased diagnostic error and reduced patient satisfaction
    • There is increasing negative fallout for the ENTIRE profession – our reputations are being besmirched in the digital era.
    • Ultimately, public trust is being eroded.
    • We cannot succeed at our job without public confidence in what we do.

Why we fail to fail learners:

Barriers to Adequate Evaluation:

Learner factors

  • Learners are all different. Moreover, the same learner will vary in skills through time as they grow and develop.
  • We all have good and bad days.
  • There exists a phenomenon called “Group-work effect” where medical teams can mask deficiencies of individual learners.

Institution factors

  • The tools of evaluation are flawed – some evaluation forms are poorly designed to discriminate among learners.
  • We all work in the current culture of “too busy to teach”.
  • An incredible amount of work is needed to change this culture.

Faculty Factors

  • Faculty feel confident in their ability when polled.
  • Faculty feel a sense of responsibility to patient, profession and learner, BUT …
  • Raters themselves are the largest source of variance in ratings of learners:
    • Examiners account for 40% of the variance in scores on OSCEs
    • Examiners’ ratings are FOUR times more varied than the “true” scores of students
    • Some tend to be more strict – “Hawks” … some are more lenient – “Doves”
    • Gender, ethnicity and age/experience have a negligible effect [though one UK study found that “hawks” are more likely to be from an ethnic minority and older – link]
  • Clinical competence of faculty members is also correlated with better evaluations [link]
    • In one interesting study, faculty took an OSCE themselves and then rated students … the results showed that:
      • Faculty use their own practice style as a frame of reference to rate learners
      • Better performers on the OSCE were more insightful and attentive evaluators
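
A rough way to see what those variance figures mean is a simple variance-components sketch (an illustrative model in the spirit of generalizability theory – the symbols and arithmetic here are my own working, not taken from the cited papers):

```latex
% Observed score for student i rated by examiner j:
% mu = overall mean, s_i = true student ability,
% e_j = examiner stringency/leniency, eps = residual noise.
X_{ij} = \mu + s_i + e_j + \varepsilon_{ij},
\qquad
\operatorname{Var}(X_{ij}) = \sigma_s^2 + \sigma_e^2 + \sigma_\varepsilon^2
```

On this reading, “examiners account for 40% of the variance” says that sigma_e^2 is about 0.40 of the total, and “ratings are four times more varied than the true scores” says sigma_e^2 is about 4 times sigma_s^2. Taken together, those two claims would leave true student ability explaining only about 10% of the total score variance – which is exactly why rater effects can swamp the signal.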

A 2013 convenience sample of U of S EM faculty gave these top three reasons:

1. Fear of being perceived as unfair

2. Lack of confidence in the supporting evidence when considering to “fail”

3. Uncertainty about how to identify specific behaviours

What I discovered in the literature:

  1. Competing demands [clinical vs educational] mean that education suffers.
  2. Lack of documentation – preceptors fail to record day-to-day performance, so when the end-of-rotation evaluation comes there is not enough evidence.
  3. The interpersonal conflict model describes the following phenomenon:
    • Faculty members’ goal is to improve trainees’ skills – preceptors do care a lot!
    • They perceive the need to emphasize the positives and be gentle [to protect learner self-esteem and to maintain an alliance/engage the trainee].
    • Faculty try to make feedback constructive without the learner feeling that it is a personal attack.
    • This creates tension when one is forced to give negative and critical feedback.
    • The emotional component of giving negative feedback makes it even more difficult.
    • Consequently, this tension forces us to overemphasize the positives.
    • This creates mixed messages, and learners walk away with the wrong message.

4. Lack of self-efficacy:

  • There is a lack of knowledge of what specifically to document: a) faculty don’t know what type of information to jot down, and b) faculty struggle to identify the specific behaviours associated with failure.

[The reported low self-confidence during evaluations is actually a product of our training – or rather the lack thereof. No one teaches you how to navigate the minefields of evaluation. This is particularly evident for soft skills. Staff often think that their judgements are merely subjective interpretations.]

5. Anticipating an [arduous] appeal process – an extra commitment, having to defend one’s actions/comments, and fear of escalation [e.g. legal action].
6. Lack of remediation options – faculty lack support, which leaves them unsure about what to do or advise after diagnosing a problem.


We have seen that the current model of medical training is failing to identify and fail underperforming learners. There are several reasons why, but faculty themselves play a large role in this culture of “Failure to Fail”. In my next post I will highlight some biases that we encounter when judging learners and provide a prescription for more effective learner evaluation.

Acknowledgement- Dr Jason Frank @drjfrank for pointing me in the right direction [authors Kogan and Holmboe are ones you should search out in particular]


Dudek et al. 2005. Failure to fail: the perspectives of clinical supervisors.

Fazio et al. 2013. Grade inflation in the internal medicine clerkship: a national survey.

Harasym et al. 2008. Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs.

Kogan and Holmboe. 2013. Realizing the promise and importance of performance-based assessment.



  1. Interesting post Dr. Lalani! Here are some random ramblings….

    As a student, evaluation is often on my mind as well. It can be very difficult to be a good and fair evaluator. I have seen some attendings be overly soft, giving exceeds or meets expectations when a student was clearly needing remedial work (opinion of rest of student group), but I’ve also seen some attendings be overly critical just because a student isn’t doing things “his/her” way, even though the student may have been previously taught that specific technique.

    I don’t think it is unfair or anything of the sort to fail a student who hasn’t performed to a standard. *But* I do think it necessary to inform the student of this before so s/he can act on the feedback before final evaluation. This is supposed to happen all the time already, but sometimes feedback is very sparse, unclear, and contradictory. From a student’s perspective, it can be a maze to navigate.

    An additional stress/difficulty for faculty though is sometimes they spend very little time with the student. Often it is the nurses or residents who see the student day-to-day, so how can an attending evaluate properly? Difficult for them for sure. I’m not sure how often attendings speak with the residents and nurses for multi-source feedback, but it doesn’t seem to happen nearly as often as it should. Another thing I have seen at one med school is students provide some of the multi-source feedback as well. I would really appreciate this as I feel students know the other students best, especially their teamwork and communication skills, etc. I’ve witnessed some instances where unprofessional behavior has occurred with/toward other students or with residents, but the attending never finds out because no one communicates the concern for fear of reprisal (esp. w/ fellow students). Maybe adding a student to the evaluation process would create more genuineness, as there is no need to impress/suck up to a classmate as sometimes occurs with attendings. This might be a better reflection for the “soft” skills….?

    I also wonder as well how CaRMS plays into this. Nearly every program description says under selection criteria “above average performance”. Clearly this makes no mathematical sense.

    In any case, there’ some jumbled thoughts. Thanks for the post, look forward to part 2!

    • Hi Danica,
      thanks for the student perspective – you have it right. Part 2 discusses some of the bias evaluators unwittingly introduce into evaluation. I also prescribe my solution to some of the questions you raise … stay tuned!