We make up our minds about people after seeing their faces for barely a fraction of a second.
Far from being trivial, these impressions affect our decision making and have real-world consequences. For example, politicians who simply appear more competent are more likely to win elections.
Can we reliably discern character from people’s faces, or are we being misled?
In Face Value, Princeton psychologist Alexander Todorov tells the scientific story of first impressions and argues the snap judgements we make of people’s faces are predictable, yet usually inaccurate.
Alexander Todorov weaves the psychological science of first impressions into a grand story, accompanied by slick photography and illustrations on virtually every other page, which makes reading the book an engaging aesthetic experience. Face Value would make a great gift for anyone interested in the human mind, laypeople and psychology nerds alike.
The story starts with the history of physiognomy.
Johann Kaspar Lavater was the father of physiognomy, which was conceived as the ‘art’ of reading people’s character from their faces. Although it was sold as a science at the time, it was anything but. Lavater’s ‘universal axioms and incontestable principles’ were:
[T]he forehead to the eyebrows, the mirror of intelligence; the cheeks and the nose form the seat of moral life; and the mouth and chin aptly represent the animal life.
Todorov states Lavater’s ‘evidence’ came from analysing sketches, and counterfactual statements “peppered with what would now be widely regarded as blatantly racist beliefs”.
The extent of Lavater’s influence on 19th-century Europe cannot be overstated. For example, Lavater’s theories almost caused Charles Darwin to miss the Beagle voyage, the expedition that led to Darwin’s revolutionary observations and, subsequently, the theory of evolution. Why? The captain of the Beagle thought Darwin’s nose was too big, and doubted Darwin possessed ‘sufficient energy and determination for the voyage’. “But I think he was afterwards well-satisfied that my nose had spoken falsely”, Darwin wrote in his autobiography.
Such questionable hiring practices were apparently widespread, and persisted for some time after Lavater’s influence waned. For example, in the early 20th century, Katherine Blackford and Arthur Newcomb created a ‘scientific plan of employment’, a method they claimed selected the most competent employees. More than 200 companies used the services of Harrington Emerson, the firm Blackford worked for. Although this was referred to as the science of ‘character analysis’, Todorov makes clear it was really pseudo-scientific physiognomy.
What sort of things did these assessments entail? Whilst trying not to raise any suspicions, the interviewers would make observations about candidates’ appearance, such as their hair and eye colour, and gauge the shape of their heads. For many jobs, an interview was not even needed. Increasingly, companies simply requested photographs as part of the recruitment process.
Despite psychologists’ efforts to discredit physiognomy, the business world and the world at large remained receptive to physiognomic ideas.
More sophisticated strands of physiognomy subsequently emerged. For example, Francis Galton introduced empirical methods to the field with the invention of composite photography at the end of the 19th century. In contrast to Lavater, Galton was an established and respected scientist: a polymath, a cousin of Charles Darwin, and a hero to many scientists. Todorov argues Galton would be widely celebrated today as one of the greatest scientists of the 19th century were it not for his preoccupation with eugenics during the latter part of his life.
Galton’s interest in physiognomy stemmed from his interest in heritability. It has since been well established that personality and physical appearance are both partly heritable. However, Todorov argues it does not follow that there is a correspondence between personality and physical appearance.
Composite photography was an empirical method for creating ‘pictorial averages’: a way of establishing the shared facial features of a group by identifying commonalities across individuals. Galton produced composite images of various groups, including families, prisoners, and people in asylums.
Galton began his composite photography with prisoners, and the results ultimately left him disappointed. Galton is quoted as saying:
I have made numerous composites of various groups of convicts, which are interesting negatively rather than positively. They produce faces of a mean description, with no villainy written on them. The individual faces are villainous enough, but they are villainous in different ways, and when they are combined, the individual peculiarities disappear, and the common humanity of the low type is all that is left.
Despite Galton’s disappointing results, the methods he invented continue to thrive. As stated by Todorov:
Today, anyone with a computer can obtain decent morphing software and manipulate facial images. Morphs of faces are regularly used in the media to illustrate concepts like the new face of America: a morph of faces representing the ethnicities living in the United States. And Galton’s project is alive and well. In the past decade, a few psychologists have been working on creating composites of different character types. Galton would have been pleased.
Numerous studies have recently been published purporting to show people’s ability to infer personality, political orientation and sexuality from faces. However, Todorov stresses that the criterion for accuracy used in the vast majority of these studies is simply performing better than chance, and that this threshold is not good enough. “The criterion for accuracy should be whether impressions from faces make us do better than relying on general knowledge and ignoring faces.”
Let’s use sexuality as an example. One study found that people who were asked to guess the sexual orientation of men after brief exposure to their Facebook photos (using photos posted by their friends) did better than chance. How much better? Not by much: they guessed accurately 52 percent of the time (marginally better than chance, which is 50 percent).
Technological advancements have also helped revive physiognomists’ ambitions. A paper was recently published that claimed artificial intelligence can deduce the sexuality of people on a dating site ‘with up to 91% accuracy’. The Guardian raised the alarm about this research, and the publication sparked a backlash from LGBT rights activists who fear this kind of technology could be used to identify gay people and put them at risk of harm.
But how concerned should we be? Although the hit rate sounds impressive, the reality is that this technology is rather inaccurate. When shown one photo each of a gay and a straight man, both chosen at random, the model distinguished between them correctly 81% of the time. However, these hit rates don’t factor in base rates: the proportion of people in the overall population who are actually gay.
Roughly 1 in 16 people identify as gay. Using the simple rule that assumes everyone is straight, you would be approximately 94% accurate. The AI algorithm, by contrast, wouldn’t perform so well, as it would produce some false negatives (identifying gay men as straight) and a lot of false positives (mistaking straight men for gay). When selecting the top 100 men most likely to be gay, the algorithm was only 47% accurate.
As outlined by the Economist:
The 91% accuracy rate only applies when one of the two men whose images are shown is known to be gay. Outside the lab the accuracy rate would be much lower. To demonstrate this weakness, the researchers selected 1,000 men at random with at least five photographs, but in a ratio of gay to straight that more accurately reflects the real world; approximately seven in every 100. When asked to select the 100 males most likely to be gay, only 47 of those chosen by the system actually were, meaning that the system ranked some straight men as more likely to be gay than men who actually are.
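To make the base-rate point concrete, here is a minimal Python sketch of the three quantities discussed above: the paired-photo accuracy, the accuracy of simply labelling everyone straight, and the share of correct picks among the top 100. The function names and the toy scores are hypothetical illustrations, not the researchers’ code or data; only the 1-in-16 and 47-of-100 figures come from the text.

```python
# A minimal sketch of why headline "accuracy" figures can mislead.
# Function names and toy scores are hypothetical, not taken from the study.

def pairwise_accuracy(pos_scores, neg_scores):
    """Share of (gay, straight) pairs in which the gay man gets the higher score.

    This mirrors the two-photo task behind the 81% and 91% figures; it is the
    same quantity usually reported as the area under the ROC curve (AUC).
    Ties count as half a correct answer.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def majority_baseline(base_rate):
    """Accuracy of labelling everyone as the more common class (here, 'straight')."""
    return max(base_rate, 1.0 - base_rate)


def precision_at_k(true_hits, k):
    """Fraction of the top-k people flagged by a model who really are in the target group."""
    return true_hits / k


# Two-photo task on toy scores: 3 of the 4 gay/straight pairs are ordered correctly.
print(pairwise_accuracy([0.9, 0.6], [0.5, 0.7]))  # 0.75

# "Assume everyone is straight" with a base rate of 1 in 16.
print(majority_baseline(1 / 16))  # 0.9375, roughly 94%

# The real-world check quoted above: 47 of the top 100 flagged men were gay.
print(precision_at_k(47, 100))  # 0.47, so 53 of the 100 were straight
```

The paired task never asks the model to cope with how rare gay men are in a realistic sample, which is why a high paired score can coexist with a top-100 list that is mostly straight men.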
Todorov summarises his argument as follows:
Psychologists in the early twentieth century found little evidence for the accuracy of first impressions, but the past decade has seen a resurgence of physiognomic claims in scientific journals. We are told that it is possible to discern a person’s political leanings, religious affiliation, sexual orientation, and even criminal inclinations from images of their face… A closer look at the modern studies shows that the claims of the new physiognomy are almost as exaggerated as those in the eighteenth and nineteenth centuries.
Predicting elections? Child’s play
Todorov claims it is easy to dismiss the physiognomists, but that they were clearly onto something. “Deep down, their intuitions align with ours. We have immediate gut reactions to the appearance of others.”
Take a look at the two computer-generated illustrations below. Hypothetically, who would you rather talk to at a party?
If you’re like most people, you’ll want to chat to the person on the left. The individual on the left appears extroverted and therefore more fun than our introverted looking friend on the right.
Unlike the intuitions of physiognomists, these graphics have been produced by what Todorov terms ‘brute empirical force’. Todorov, along with Nick Oosterhof, created a computer model which produced these faces by randomly generating combinations of facial features and asking participants to rate them on a range of dimensions. “As long as there is agreement on impressions, we can build precise models of these impressions and visualise them.”
Todorov says he became interested in studying first impressions after he and his students discovered that such judgements predict the outcomes of important political elections.
His research team administered questionnaires at Princeton containing images of the winners and runners-up from all U.S. Senate races between 2000 and 2002, excluding highly recognisable politicians such as Hillary Clinton and John Kerry. Different students were given different questions, such as “who looks more competent?” and “who looks more honest?”.
Remarkably, judgements of who appeared more competent predicted approximately 70% of the elections.
Social scientists have since ruled out alternative explanations, such as differences in image quality or in campaign spending. Neither can explain the results.
Todorov states that a “general rule of science is that results should be replicable, especially if these results are surprising”. Accordingly, the appearance effect on election outcomes has since been replicated several times by researchers internationally.
One of the most famous replications was produced by John Antonakis and Olaf Dalgas, who found that Swiss children’s judgements predicted the outcomes of French parliamentary elections. Antonakis and Dalgas used children because they would be less knowledgeable about politics than adults, which allowed the researchers to rule out preferences based on additional information.
Antonakis and Dalgas had 5- to 13-year-old children first play a computer game reenacting Odysseus’s trip from Troy to Ithaca. The children were then asked to imagine that they were about to undertake the voyage, were shown pairs of pictures of politicians who ran for the French parliament, and were asked to choose one of each pair as the captain of their boat. Just like adults’ competence judgements, children’s captain choices predicted approximately 70% of the elections.
Have a go for yourself. Out of the two below, which person looks more competent?
If you chose the guy on the left, congratulations! These two politicians ran for the Wisconsin senate seat in 2004, and Democrat Russ Feingold (left) was victorious over Republican Timothy Michels (right).
More recently, Mark van Vugt and Allen Grabo in the Netherlands reviewed this body of research in a paper titled The Many Faces of Leadership: An Evolutionary-Psychology Approach. Their lab applied these methods to the then upcoming 2016 U.S. Presidential election, and made a contingency-based prediction that Trump would win. Note that this was back in November 2015, before the primaries had even concluded, and when a Trump victory was unthinkable for most people.
Evidently, we form judgements of people’s faces automatically and consistently. Todorov argues the mistake physiognomists made was conflating the predictability of our judgements with the accuracy of character assessments.
The agreement on our first impressions makes physiognomy possible. The ease with which we dispatch impressions makes the physiognomists’ promise appealing. Physiognomists used the wrong methods and reached the wrong conclusions, but they were right that we can’t help but form impressions.
Trust and dominance
What exactly do these judgements correspond to then?
We’ve already established competence is perceived as the most important characteristic for politicians.
However, a key point emphasised in Face Value is that what people perceive as important can change in different contexts.
Imagine your country is currently at war and you have to cast your vote today. Who would you vote for, the face on the left or the right below?
If you’re like most people, you quickly opted for the face on the left.
However, what if you had to cast your vote during peacetime. Who would you vote for now?
Most people will now reverse their choice and opt for the face on the right. Todorov states: “This preference reversal is so easy to obtain that I often use it in classrooms.”
These images were produced by psychologist Anthony Little and his colleagues here in the UK. The face on the left is more dominant and masculine, which is seen as a proxy for strong leadership. Conversely, the face on the right is perceived as more intelligent, forgiving and likeable.
Now, look at the faces below.
You recognise these guys, right? Back when the study was conducted, John Kerry was the Democratic candidate running against George W. Bush for the U.S. Presidency.
The first pair of photos is actually a pair of morphs built from 30 male faces. They were made by accentuating the differences between the shapes of Bush’s and Kerry’s faces and that of the average male face.
At the time of the U.S. election in 2004, the U.S. was at war in Iraq and Afghanistan. “I will leave the rest to your imagination.”
Todorov also makes clear that what we consider important also depends on our ideological inclinations and political leanings.
Out of the two faces below, who do you think would make a better leader?
The Danish researchers Lasse Laustsen and Michael Petersen used faces generated by Todorov and Oosterhof’s computer model and demonstrated that liberals tended to choose the face on the left, whereas conservatives tended to choose the more dominant face on the right.
Their research also illustrated that dominant-looking conservative candidates gained greater support, whereas dominant-looking liberal candidates were actually penalised, but only if they were men. Appearing dominant was bad news for female candidates, regardless of their political leanings.
Check out their face manipulations below.
The face has actually been manipulated to look less or more dominant (photo on the left and right, respectively). When the politicians were not well known, this manipulation had an effect. Liberals were more open to this guy’s policy position if he was made to look less dominant. Likewise, conservatives were more receptive when he was made to look more dominant.
The political situation or our ideological inclinations can change what we consider important, but they don’t change our propensity to form impressions and act on those impressions.
Although the book is repetitive at times, its take-home message is that our impressions of character attributes are highly similar to one another, and boil down to two dimensions: dominance and trustworthiness.
Masculine appearance matters a great deal when it comes to impressions of dominance. Todorov states these impressions have a ‘kernel of truth’, as judgements of masculinity from faces tend to be associated with physical strength. “Physical strength is what primarily drives our attempt to read the ability of others to physically harm us.”
Todorov stresses that this is only a kernel, as most dominance hierarchies that matter in modern times are not based on physical strength. However, this is reconcilable from an evolutionary perspective, and may serve as an example of mismatch: psychological dispositions that served us well in our ancestral past but are now misaligned with the demands of the modern world.
Also noted is that impressions of trustworthiness and dominance are not mutually exclusive: we tend to see untrustworthy faces as dominant, and dominant faces as untrustworthy.
Essentially, what we’re evaluating when we see a new face is our gut reaction to how ‘good or bad’ we perceive the individual to be, and their level of power. As stated by Todorov:
Impressions of trustworthiness and dominance are the most important impressions, because they are our best effort, when only appearance information is available, to figure out the goodness or badness of the intentions of others and their ability to act on these intentions.
Evolutionary theory is woven into the book’s overarching narrative, and Todorov evidently sees human psychology as a product of evolutionary processes. However, he argues that although evolutionary psychology is much more sophisticated than Blackford and Newcomb’s character analyses (which drew heavily on evolutionary thinking), making evolutionary inferences is just as hard today as it was 100 years ago.
But is this the case? Testing evolutionary hypotheses is certainly challenging, and we are still left inferring from present-day observations what happened hundreds of thousands of years ago. However, we now have a wealth of accumulated scientific evidence, along with new technologies and research methods that were not available then, which allow scientists to test evolutionary hypotheses more rigorously.
A whole chapter in Face Value is dedicated to evaluating the evolutionary hypothesis that the face provides an ‘honest signal’ of character. In particular, Todorov evaluates whether facial masculinity, specifically the ‘facial-width-to-height ratio’ (fWHR), is a predictor of aggression. “In a nutshell, the claim is that men with masculine faces are not only perceived as dominant, aggressive and threatening, but also that they are dominant, aggressive, and threatening.”
The facial-width-to-height ratio divides the distance between the left and right cheekbones by the distance between the upper lip and the brow, as illustrated below.
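As a rough sketch of the arithmetic, the Python function below computes the ratio from four facial landmarks given as hypothetical (x, y) pixel coordinates. It assumes width is measured horizontally between the cheekbones and height vertically between the brow and the upper lip; landmark placement and measurement conventions vary between studies, so treat this as an illustration rather than a standard protocol.

```python
def fwhr(left_cheekbone, right_cheekbone, brow, upper_lip):
    """Facial width-to-height ratio from four hypothetical (x, y) landmarks.

    Assumption: width is the horizontal distance between the cheekbones and
    height is the vertical distance between the brow and the upper lip.
    """
    width = abs(right_cheekbone[0] - left_cheekbone[0])
    height = abs(upper_lip[1] - brow[1])
    if height == 0:
        raise ValueError("brow and upper lip cannot be at the same height")
    return width / height


# Hypothetical pixel coordinates for a single face photo.
ratio = fwhr(left_cheekbone=(10, 50), right_cheekbone=(150, 52),
             brow=(80, 30), upper_lip=(81, 100))
print(ratio)  # 2.0  (a 140-pixel width over a 70-pixel height)
```

Because the ratio is scale-free, the same face photographed closer or further away gives the same value, which is part of what made it such a convenient measure for researchers.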
Various studies show that we are sensitive to differences in this facial ratio, as we perceive people with high ratios as more aggressive and less trustworthy. However, what excited researchers was the possibility that this simple measure also predicts actual character traits.
In less than 10 years, over 60 scientific papers were published on the facial-width-to-height ratio. One of these papers was titled ‘Bad to the Bone: Facial Structure Predicts Unethical Behavior’. The Canadian psychologists Justin Carré and Cheryl McCormick found that professional hockey players with higher fWHRs were more likely to be in the penalty box, which was used as a straightforward measure of aggressive behaviour in an already aggressive sport.
The evolutionary logic behind such studies is that sexually dimorphic features in men are the result of male intrasexual competition. That is, bigger and stronger men were more reproductively successful, as they had a fighting advantage over smaller and weaker males.
However, Todorov states the paper ran into many methodological and theoretical issues. Evolutionary psychologist Bob Deaner and his colleagues doubted the results, and produced a large-scale replication of the hockey study using all NHL players except goalkeepers. They found no evidence that the fWHR predicts penalties directly linked to aggressive behaviour in the rink. However, Deaner and his colleagues did identify a key predictor of aggression: the size of the hockey players. The bigger players (those heavier and taller) tended to be more aggressive in the hockey rink.
The research on the facial width-to-height ratio indicates that rather than focusing solely on people’s faces, we rely on body size when judging physical formidability and by extension, aggressiveness and dominance.
Another leading evolutionary hypothesis for why fWHR could be an honest signal for character is sexual selection. It’s well established that heterosexual men find feminine faces more attractive, as a sign of fertility. The idea here is that because women are the choosier sex (as they invest more time and resources in raising children), they are attracted to masculine faces as masculinity signals good genetic quality.
However, Todorov argues the evidence is also stacked against this hypothesis. The strongest evidence against it comes from a meta-analytic review which found that the correlation between gender and fWHR was .05, barely distinguishable from zero. A correlation this small means that gender can explain only about 0.25% of the variation in fWHR across individuals. One would expect a much stronger correlation if fWHR had been subject to sexual selection pressures.
Todorov notes that gender differences in fWHR are minuscule compared to gender differences we see in weight, height and muscle mass. Both weight and height have correlations with gender above .30, whereas the correlation between gender and muscle mass is above .80.
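The 0.25% figure comes from squaring the correlation: r² is the proportion of variance in one variable accounted for by the other. A minimal Python sketch applying the same rule to the correlations quoted above (for height, weight and muscle mass the quoted values are lower bounds, so the resulting percentages are minimums):

```python
# Correlations with gender as quoted in the text.
# Height, weight and muscle mass were given as "above" these values,
# so the variance figures for them are lower bounds.
correlations = {
    "fWHR": 0.05,
    "height": 0.30,
    "weight": 0.30,
    "muscle mass": 0.80,
}


def variance_explained(r):
    """Proportion of variance in one variable accounted for by a correlation r."""
    return r ** 2


for trait, r in correlations.items():
    print(f"{trait}: r = {r:.2f}, variance explained = {variance_explained(r):.2%}")
# fWHR: r = 0.05, variance explained = 0.25%
# height: r = 0.30, variance explained = 9.00%
# weight: r = 0.30, variance explained = 9.00%
# muscle mass: r = 0.80, variance explained = 64.00%
```

Squaring exaggerates the gap: a correlation six times larger (.30 versus .05) explains 36 times as much variance.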
For fWHR to be a product of sexual selection, women would have to find men with masculine faces more attractive. However, the evidence on the attractiveness of men with masculine faces is split. In some studies, women find men with masculine faces more attractive; in others, they prefer feminine faces. What complicates matters is a leading theory that women find dominant men more attractive during ovulation. However, a recently released large-scale study on women’s use of oral contraceptives failed to replicate this finding.
Instead, the research suggests that what women find attractive in men’s faces is colouring associated with good health, rather than a masculine face shape.
That the face does not provide honest signals of character does not mean all evolutionary hypotheses are false. Rather, it suggests that these specific evolutionary hypotheses are likely incorrect.
It also doesn’t mean that these first impressions are useless, irrational quirks.
When we have only the face to go on, it makes sense to draw inferences about a person’s character and intent. That we consistently perceive dominant faces as untrustworthy and threatening may be baggage from our evolutionary past, and could serve as an example of error management theory. It may be safer to assume bad intent behind a dominant-looking stranger and be wrong most of the time than to assume nothing and learn otherwise the hard way.
Todorov argues that, once we appreciate the timeline of human evolution, it is not surprising that the face shows evidence of adaptations for social communication but not for inferences of character. We humans spent the majority of our evolutionary history in small groups, and have only recently begun living in large-scale industrialised societies. Todorov states it is no coincidence that physiognomy was born at a time when chiefdoms emerged, and that it flourished in the 19th century, when industrialisation brought large-scale migration. The physiognomists promised an easy, intuitive way of dealing with the increased uncertainty of interacting with strangers on a daily basis.
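Error management theory is usually illustrated with asymmetric costs: when one kind of mistake is far more expensive than the other, a bias towards the cheaper mistake can pay off even if it is wrong most of the time. The Python sketch below compares two fixed policies towards a dominant-looking stranger. The probability and costs are entirely hypothetical numbers chosen for illustration, not estimates from the book.

```python
def expected_costs(p_hostile, cost_false_alarm, cost_miss):
    """Expected cost per encounter of two fixed policies towards a stranger.

    'wary': always assume bad intent, paying cost_false_alarm whenever the
            stranger turns out to be harmless.
    'trusting': never assume bad intent, paying cost_miss whenever the
                stranger turns out to be hostile.
    """
    wary = (1 - p_hostile) * cost_false_alarm
    trusting = p_hostile * cost_miss
    return {"wary": wary, "trusting": trusting}


# Hypothetical numbers: 1 in 10 such strangers is hostile, and being harmed
# costs 20 times as much as needlessly keeping your distance.
costs = expected_costs(p_hostile=0.1, cost_false_alarm=1.0, cost_miss=20.0)
print(costs)  # {'wary': 0.9, 'trusting': 2.0}
```

Here the wary policy is wrong in 90% of encounters yet costs less than half as much on average. Shrink the cost of being harmed (for example to 5) and the trusting policy becomes the cheaper one, so the argument rests entirely on the assumed asymmetry.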
Todorov concludes the book with the following passage:
If you imagine humankind evolving for 24 hours, the time we have been living in large societies populated with strangers amounts to less than 5 minutes: the last 5 minutes of the day. The rest of the time, we lived in small groups in which people did not have to rely on appearance information to draw inferences about character. The reliance on appearance emerged only in the last 5 minutes of our evolutionary history. Substantive person knowledge that was easily accessible in small-scale societies was replaced by appearance stereotypes in large societies. In our quest to know others in the absence of good information, we are forced to rely on appearance information. This information could be useful as a guide to the intentions and actions of the person in the immediate here and now, but it is misleading as a guide to the person’s character… As long as we remember that, we will be less likely to fall into the physiognomist’s trap of seeing the face as a source of information about character.
What should business professionals take from all this? Hopefully, recruitment specialists accept it’s a bad idea to measure the width of people’s heads for selection purposes.
Despite the field’s disappointments, the revival of physiognomy has entered the business world. For example, an Israeli start-up called Faception claims its software can not only identify complex personality traits from people’s faces, but can also spot terrorists and paedophiles. Business professionals should maintain a healthy scepticism of such technologies, and seek evidence for such bold claims. Likewise, they should evaluate the ethical and legal dimensions of such technologies carefully.
Given how consistent yet inaccurate our first impressions can be, the psychology of first impressions has serious implications for the conventional interview, and lends credence to the practice of blind assessments.
It is not surprising, then, that interviews are a weak predictor of job performance, with the typical correlation below .15. The discrepancy between the reality and our expectations of interviews is what Richard Nisbett and Lee Ross call the interview illusion. Todorov argues that letters of reference are much better predictors of performance than interviews, as they summarise a much larger sample of observations. “In the world of available evidence, first impressions are of little value.”
Anna Lelkes was the first woman to become a member of the Vienna Philharmonic Orchestra, one of the best in the world. It took her more than 20 years of playing as a ‘non-member’ to receive this status. Until recently, prestigious orchestras were composed entirely of men. The influx of women only began after the introduction of blind auditions, in which prospective candidates are evaluated by a hiring committee from behind a curtain.
Todorov argues that had Anna Lelkes been screened in a blind audition at the very beginning, she would not have had to wait 20 years to become a member of the Vienna Philharmonic Orchestra.
If we care about fairness and better outcomes, we should structure decisions to increase access to valid performance-based information and limit access to appearance-based information.
Written by Max Beilby for Darwinian Business
*Article updated 4th December 2017