How to Fix Recommendation Bias and Evaluation Inflation
Recommendation letters are essential to landing a career in research or higher education. Like most people, though, recommenders and hiring committees have biases, often unintentional, which can put some candidates at an unfair disadvantage and inflate the chances of others. To reduce bias and inflation in evaluations, research institutions could learn something from an organization that, perhaps surprisingly, has already taken steps to tackle such issues: the U.S. Marine Corps.
In STEM and beyond, researchers have seen trends of biased recommendations and evaluations in fields such as medicine, geoscience, chemistry and biochemistry, business and economics, and psychology. Letters written for men were more likely to discuss achievements and superiority, while letters for women were more likely to focus on effort and raise doubts about the candidate’s excellence. I didn’t know this until I needed to write my first recommendation letter for a student I mentored and came across a tip sheet online about how to avoid writing a gender-biased letter. I wondered how many other recommenders were unaware of their implicit biases, which are rooted in stereotypes we’ve absorbed throughout our lives.
I also confronted the dilemma of how to evaluate someone fairly without exaggerating their capabilities. That was something I could easily do, knowing that the competition was rigorous and that their future was partly in my hands. Recommendation inflation raises the question of whether admissions and hiring committees can compare applicants fairly if some evaluations are more honest than others. More importantly, implicit bias could affect how much a recommender inflates a letter or evaluation. Taking part in an implicit bias workshop and faculty hiring processes in graduate school further expanded my awareness of how bias and inflation affect professional advancement.
Many academics are trying to fix the problem, though. Resources developed by researchers include the tip sheet I’ve already mentioned, implicit bias courses, and online tools that check recommendations for gender bias. But an institution outside of academia can also offer a model. Over several years of friendship with a Marine Corps officer, the two of us have traded details about the culture and rituals of academia and the military. This is how I learned about FitReps, the evaluations for officers and senior enlisted Marines.
Since 1999, the U.S. Marine Corps has been using what it calls Fitness Reports, or FitReps, to help solve evaluation inflation. With FitReps, recommenders rate an individual on more than 10 different traits. These traits are general enough to apply to most of the hundreds of specialties within the Marine Corps, which are analogous to departments in research institutions. Giving someone the highest score for a trait requires written justification, which helps prevent widespread inflation.
Each trait is considered equally in calculating an average score, which prevents someone reviewing the FitRep from subjectively deciding which criteria are most important. Written comments about accomplishments, characteristics and recommendations for promotion are also included with the scores. And unlike most academic recommenders, Marine officers take required training on how to complete the evaluations and have a manual for more guidance. Each FitRep is completed by two people, providing accountability and added value.
Most importantly, someone reviewing a Marine’s FitRep score can see how it compares to scores that the same evaluator gave to other Marines of the same rank. This transparency encourages recommenders to give honest evaluations, since a score given to one person will affect how another person’s score is judged. In research institutions, this type of evaluation archive could also help recommenders confidentially check if their letters and evaluations reflect implicit biases.
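To make the two ideas above concrete, here is a minimal sketch in Python of an equal-weight average and a comparison against one evaluator's own track record. It is an illustration only, not the Marine Corps' actual formula: the trait names, the 1-to-7 scale, and the functions `report_average` and `relative_standing` are all hypothetical.

```python
from statistics import mean


def report_average(trait_scores):
    """Equal-weight average: every trait counts the same (hypothetical scale)."""
    return mean(trait_scores.values())


def relative_standing(report_avg, evaluator_history):
    """Place one report within the range of averages the same evaluator
    has given to people of the same rank (0.0 = lowest, 1.0 = highest).

    evaluator_history is assumed to include the current report's average.
    """
    low, high = min(evaluator_history), max(evaluator_history)
    if high == low:
        # No spread in the evaluator's history, so no basis for ranking.
        return 0.5
    return (report_avg - low) / (high - low)


# Hypothetical trait names and scores, for illustration only.
candidate = {"mentoring": 5, "research_design": 5, "writing": 5}
avg = report_average(candidate)

# The same raw average reads very differently depending on who gave it.
generous_evaluator = [5.0, 5.8, 6.2, 6.5]  # past averages, same rank
strict_evaluator = [3.0, 3.6, 4.1, 5.0]

standing_generous = relative_standing(avg, generous_evaluator)
standing_strict = relative_standing(avg, strict_evaluator)
print(avg, standing_generous, standing_strict)
```

In this toy example the same average sits at the bottom of the generous evaluator's range and at the top of the strict evaluator's range, which is why an inflated score is less useful to the person who gives it once the evaluator's history is visible.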
The system isn’t perfect. Data analysts published an extensive report for the Marine Corps in 2012 to answer questions about whether FitReps evaluated Marines fairly. The report showed that FitReps did prevent inflation but also showed potential biases. White evaluators gave somewhat higher scores on average to white officers than to black officers; similarly, black evaluators gave higher scores to black officers than to white officers. Black and Hispanic officers received weaker recommendations for promotion than white officers who had the same relative FitRep score. These findings confirm that implicit biases are more difficult to extinguish than inflation. One surprising result: rather than tending to favor men over women, as is the case in academia, Marine officers received somewhat higher FitRep scores on average from evaluators of the opposite gender.
I’m not insisting that all research institutions should adopt a regimented evaluation process, but FitReps show that solutions exist to control inflation and check for bias. Academics already use online systems spanning hundreds of institutions, like the CommonApp and ORCID iD, so the idea of a systematic way to compare evaluations and recommendations may not be that far-fetched. The fact that implementing fairer recommendation systems might be difficult shouldn’t stop us from taking steps towards change, starting with the creation of training materials. Both recommenders and admissions/hiring committees should use currently available resources to check for inflation and bias while changes slowly take shape.