# devtools::install_github("crsh/papaja")
library("papaja") # for manuscript formatting
library("kableExtra") # for table formatting
library("dplyr") # for data wrangling
library("magick") # for image processing

# devtools::install_github("debruine/webmorphR")
# devtools::install_github("debruine/webmorphR.stim")
# devtools::install_github("debruine/webmorphR.dlib")
library("webmorphR") # for reproducible stimuli
library("webmorphR.stim") # for additional stimuli
library("webmorphR.dlib") # for face detection

# bibliography
r_refs("r-references.bib", append = FALSE)

# Seed for random number generation
set.seed(42)
knitr::opts_chunk$set(
  cache       = FALSE,
  cache.extra = knitr::rand_seed,
  out.width   = "100%",
  warning     = FALSE,
  message     = FALSE,
  echo        = FALSE
)

wm_opts(plot.maxwidth = 850)

Introduction

Face stimuli are commonly used in research on visual and social perception. This almost always involves some level of stimulus preparation to rotate, resize, crop, and reposition faces on the image. In addition, many studies systematically manipulate face images by changing color and/or shape properties [e.g., @jones2019biological; reviewed in @Little_2011].

Gronenschild and colleagues [-@Gronenschild_2009] argue for the importance of standardizing face stimuli so that they are not "confounded by factors such as brightness and contrast, head size, hair cut and color, skin color, and the presence of glasses and earrings". They describe a three-step standardization process. First, they manually removed features such as glasses and earrings in Photoshop. Second, they geometrically standardized images by semi-automatically defining eye and mouth coordinates used to fit the images within an oval mask. Third, they optically standardized images by converting them to greyscale and remapping values between the minimum and 98% threshold onto the full range of values. While laudable in its aims, this procedure has not achieved widespread adoption, probably because the authors provided no code or tools. In personal communication, the main author said that this is because "the procedure is based on standard image processing algorithms described in many textbooks". However, we were unable to easily replicate the procedure and found several places where instructions had more than one possible interpretation or relied on the starting images having specific properties, such as symmetric lighting reflections in the eyes.
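
For example, the optical standardisation step alone admits more than one reading. A minimal sketch of one possible reading, using the {magick} package on a single image ("face.jpg" is a placeholder file name), might look like this:

library(magick)
img <- image_read("face.jpg") |>                        # placeholder file name
  image_convert(colorspace = "gray")                    # convert to greyscale
px  <- as.integer(image_data(img, channels = "gray"))   # intensities 0-255
hi  <- quantile(px, 0.98)                               # 98% threshold
remapped <- (px - min(px)) / (hi - min(px)) * 255       # remap min..98% onto 0-255
remapped <- pmin(pmax(round(remapped), 0), 255)         # clip values above the threshold

Even this short sketch involves choices the original description leaves open (e.g., whether the threshold is computed before or after greyscale conversion, and how clipped values are handled), which is exactly the kind of ambiguity that shared code removes.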

The goal of this paper is to argue for the importance of reproducible stimulus processing methods in face research and to introduce an open-source R package that allows researchers to create face stimuli with scripts that can then be shared so that others can create stimuli using identical methods.

Why are reproducible stimulus construction methods important?

Lisa once gave up on a research project because she couldn't figure out how to manipulate spatial frequency to make the stimuli look like those in a relevant paper. When she contacted the author, they didn't know how the stimuli were created because a postdoc had done it in Photoshop and didn't leave a detailed record of the method.

Reproducibility is especially important for face stimuli because faces are sampled, so replications should sample new faces as well as new participants [@barrgeneralizing]. The difficulty of creating equivalent face stimuli is a major barrier to this, resulting in stimulus sets that are used across dozens or hundreds of papers. For example, the Chicago Face Database [@CFD_2015] has been cited in almost 800 papers. Ekman and Friesen's [-@ekman1976pictures] Pictures of Facial Affect has been cited more than 5500 times. This image set is currently selling for $399 for "110 photographs of facial expressions that have been widely used in cross-cultural studies, and more recently, in neuropsychological research". Such extensive reuse of image sets means that any confounds present in the image set can cause highly "replicable" but potentially false findings.

Additionally, image sets are often private and reused without clear attribution. Our group has only recently been trying to combat this by making image sets public and citable where possible [e.g., @FRL_London; @Canada2003; @Morrison_2018] and including clear explanations of reuse where not possible [e.g., @jones2018no; @holzleitner2019comparing].

Common Techniques

In this section, we will give an overview of common techniques used to process face stimuli across a wide range of research involving faces. A systematic survey of the methods used to create facial stimuli was not feasible, in large part because of poor documentation. However, several common methods are discussed below.

Mystery Methods

Many researchers describe image manipulation generically or use "in-house" methods that are not specified in enough detail for another researcher to have any chance of replicating them.

Each of the images was rendered in gray-scale and morphed to a common shape using an in-house program based on bi-linear interpolation (see e.g., Gonzalez & Woods, 2002). Key points in the morphing grid were set manually, using a graphics program to align a standard grid to a set of facial points (eye corners, face outline, etc.). Images were then subject to automatic histogram equalization. [@burton2005robust, p. 263]

The reference above [@gonzalez2002digital] has been cited by 2384 papers on Google Scholar, and is a 190-page book. It mentions bilinear interpolation on pages 64--66 in the context of calculating pixel color when resizing images, and it is unclear how this could be used to morph shape.
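
For reference, bilinear interpolation itself is only a rule for computing a pixel value at a fractional coordinate from its four neighbouring pixels; a minimal sketch (ours, not the authors' in-house program) shows that it says nothing about how a morphing grid is defined or applied:

# bilinear interpolation of a greyscale matrix m at fractional
# coordinates (x, y); illustrative only, edge handling omitted
bilinear <- function(m, x, y) {
  x0 <- floor(x); y0 <- floor(y)   # nearest integer pixel below (x, y)
  fx <- x - x0;   fy <- y - y0     # fractional offsets
  m[y0, x0]         * (1 - fx) * (1 - fy) +
  m[y0, x0 + 1]     * fx       * (1 - fy) +
  m[y0 + 1, x0]     * (1 - fx) * fy +
  m[y0 + 1, x0 + 1] * fx       * fy
}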

They were cropped such that the hair did not extend well below the chin, resized to a height of 400 pixels, and placed on 400 x 400 pixel backgrounds consisting of phase-scrambled variations of a single scene image (for example stimuli, see Figure 1). [@pegors2015simultaneous, p. 665]

While the example images in the figure mentioned above help to clarify the methods, there is clearly a large degree of subjectivity in determining how to crop the hair.

Photoshop/Image editors

A search for "Photoshop face attractiveness" produced 19,300 responses in Google Scholar[^1]. Here are descriptions of the use of Photoshop from a few of the top hits.

[^1]: All web search figures are from Google Scholar in May 2022.

If necessary, scanned pictures were rotated slightly, using Adobe Photoshop software, clockwise to counterclockwise until both pupil centres were on the same y-coordinate. Each picture was slightly lightened a constant amount by Adobe Photoshop. [@Scheib_1999, p. 1914]

These pictures were edited using Adobe Photoshop 6.0 to remove external features (hair, ears) and create a uniform grey background. [@sforza2010my, p. 150]

The averaged composites and blends were sharpened in Adobe Photoshop to reduce any blurring introduced by blending. [@rhodes2001attractiveness, p. 615]

Most papers that use Photoshop methods simply state in lay terms what the editing accomplished, and not the specific tools used to accomplish it. For example, it is not clear what sharpening tool was used in the last quote above, and what settings were used. Were all images sharpened by the same amount or was this done "by eye"?

A potential danger of processing images "by eye" is the possibility of visual adaptation affecting the researcher's perception. It is well known that viewing images with specific alterations to shape or colour alters the perception of subsequent images [@rhodes2017adaptive]. Thus, a researcher's perception of the "typical" face can change after exposure to altered faces [@DeBruine:2007JEPHPP]. While some processing will always require human intervention, reproducible methods can also allow researchers to record their specific decisions so such biases can be detected and corrected for.

Scriptable Methods

There are several scriptable methods for creating image stimuli, including MatLab, ImageMagick, and GraphicConverter. Photoshop is technically scriptable, but a search for "Photoshop script face" only revealed a few computer vision papers on detecting photoshopped images [e.g., @wang2019detecting].

MatLab [@higham2016matlab] is widely used within visual psychophysics. A Google Scholar search for "MatLab face attractiveness" returned 23,000 hits, although the majority of papers we inspected used MatLab to process EEG data, present the experiment, or analyse image color, rather than using MatLab to create the stimuli. "MatLab face perception" generated 97,300 hits, more of which used MatLab to create stimuli.

The average pixel intensity of each image (ranging from 0 to 255) was set to 128 with a standard deviation of 40 using the SHINE toolbox (function lumMatch) (Willenbockel et al., 2010) in MATLAB (version 8.1.0.604, R2013a). [@visconti2014facilitated, p. 2]
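
The core of such luminance matching is a linear rescaling of greyscale pixel values; a rough single-image sketch in R (not the SHINE toolbox; "face.jpg" is a placeholder file name):

px <- magick::image_read("face.jpg") |>
  magick::image_convert(colorspace = "gray") |>
  magick::image_data(channels = "gray") |>
  as.integer()                                    # intensities 0-255
matched <- (px - mean(px)) / sd(px) * 40 + 128    # rescale to mean 128, SD 40
matched <- pmin(pmax(round(matched), 0), 255)     # clamp to the valid range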

ImageMagick [@imagemagick] is a free, open-source program that creates, edits, and converts images in a scriptable manner. The {magick} R package [@R-magick] allows you to script image manipulations in R using ImageMagick.

Images were cropped, resized to 150 × 150 pixels, and then grayscaled using ImageMagick (version 6.8.7-7 Q16, x86_64, 2013-11-27) on Mac OS X 10.9.2. [@visconti2014facilitated, p. 2]
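
The same operations can be scripted in R with the {magick} package; a sketch (the file name and crop geometry are placeholders, not the authors' values):

library(magick)
image_read("face.jpg") |>                 # placeholder file name
  image_crop("500x500+100+50") |>         # placeholder crop geometry
  image_resize("150x150!") |>             # force exactly 150 x 150 pixels
  image_convert(colorspace = "gray") |>   # convert to greyscale
  image_write("face_processed.png")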

GraphicConverter [@nishimura2000graphicconverter] is typically used to batch process images, such as making images a standard size or adjusting color. While not technically "scriptable", batch processing can be set up in the GUI interface and then saved to a reloadable ".gaction" file. (A search for '"gaction" graphicconvertor' on Google Scholar returned no hits.)

We used the GraphicConverterTM application to crop the images around the cat face and make them all 1024x1024 pixels. One of the challenges of image matching is to do this process automatically. [@paluszek2019pattern, p.214]

Scriptable methods are a laudable step toward reproducible stimuli, but the scripts themselves are often not shared, or are written for proprietary, closed software such as MatLab. Additionally, most images that were processed with scriptable methods also underwent some non-scripted pre-processing to manually crop or align the images.

Commercial morphing

Face averaging or "morphing" is a common technique for making images that are blends of two or more faces. We found 937 Google Scholar responses for "Fantamorph face", 170 Google Scholar responses for "WinMorph face" and fewer mentions of several other programs, such as MorphThing (no longer available) and xmorph.

Most of these programs do not use open formats for storing delineations: the x- and y-coordinates of the landmark points that define shape and the way these are connected with lines. Their algorithms also tend to be closed and there is no common language for describing the procedures used to create stimuli in one program in a way that is easily translatable to another program. Here are descriptions of the use of commercial morphing programs from a few of the top hits.

The faces were carefully marked with 112 nodes in FantaMorph™, 4th version: 28 nodes (face outline), 16 (nose), 5 (each ear), 20 (lips), 11 (each eye), and 8 (each eyebrow). To create the prototypes, I used FantaMorph Face Mixer, which averages node locations across faces. Prototypes are available online, in the Personality Faceaurus [http://www.nickholtzman.com/faceaurus.htm]. [@Holtzman_2011, p. 650]

The link above contains only morphed face images and no further details about the morphing or stimulus preparation procedure.

The 20 individual stimuli of each category were paired to make 10 morph continua, by morphing one endpoint exemplar into its paired exemplar (e.g. one face into its paired face, see Figure 1C) in steps of 5%. Morphing was realized within FantaMorph Software (Abrosoft) for faces and cars, Poser 6 for bodies (only between stimuli of the same gender with same clothing), and Google SketchUp for places. [@Weigelt_2013, p. 4]

Psychomorph/WebMorph {#psychomorph}

Psychomorph is a program developed by Tiddeman, Perrett and colleagues. It uses "template" files in a plain text open format to store delineations and the code is well documented in academic papers and available as an open-source Java package.

Benson and Perrett [@benson1991perception;@benson1991synthesising;@benson1993extracting] describe algorithms for creating composite images by marking corresponding coordinates on individual face images, remapping the images into the average shape, and combining the colour values of the remapped images. These images are also called "prototype" images and can be used to generate caricatures.

The averaging and caricaturing methods were later complemented by a transforming method [@rowland1995manipulating]. This method quantifies shape and colour differences between a pair of faces, creating a "face space" vector along which other faces can be manipulated. This method is distinct from averaging. For example, averaging an individual face with a prototype smiling face will produce a face that looks approximately halfway between the individual and the prototype. The smile will be more intense than the original individual's smile if they weren't smiling, and less intense if the individual was smiling more than the prototype. However, the transform method uses the shape and/or color difference between neutral and smiling prototypes to define a vector of smiling. Transforming an individual face by some positive percentage of the difference between neutral and smiling faces will then always result in an individual face that looks more cheerful than the original individual, no matter how cheerful they started out (Fig\ \@ref(fig:transform-demo)).

# setup for making figures below
neutral <- load_stim_london() |> 
  add_info(london_info) |>
  subset(face_gender == "female") |>
  webmorphR.dlib::auto_delin("dlib70", replace = TRUE)

smiling <- load_stim_smiling() |> 
  add_info(london_info) |>
  subset(face_gender == "female") |>
  webmorphR.dlib::auto_delin("dlib70", replace = TRUE)

neutral_avg <- avg(neutral, texture = FALSE)
smiling_avg <- avg(smiling, texture = FALSE)
neutral_avg_texture <- avg(neutral, texture = TRUE)
smiling_avg_texture <- avg(smiling, texture = TRUE)

aligned <- c(neutral_avg, smiling_avg, 
             neutral_avg_texture, smiling_avg_texture) |>
  rename_stim(new_names = c("neutral_avg", "smiling_avg", 
                            "neutral_avg_texture", "smiling_avg_texture")) |>
  align()

write_stim(aligned, "images", overwrite = TRUE)
write_stim(smiling[2], "images", "ind_1", overwrite = TRUE)
write_stim(smiling[3], "images", "ind_2", overwrite = TRUE)
avg <- read_stim("images", "_avg\\.") |> crop(0.6, 0.75)
ind <- read_stim("images", "ind_") |> crop(0.6, 0.75)
smi <- trans(ind, avg$neutral_avg, avg$smiling_avg, c(smile = 0.5), 0.5)
s1 <- c(avg$smiling_avg, ind[1]) |> avg()
s2 <- c(avg$smiling_avg, ind[2]) |> avg()

grid <- c(ind$ind_1, ind_s1 = s1, smi$ind_1_smile, 
  ind$ind_2, ind_s2 = s2, smi$ind_2_smile) |> 
  label(rep(LETTERS[3:5], 2), gravity = "northwest", location = "+20+10", size = 80) |>
  plot(external_pad = 0)

avg |>
  resize(height = height(grid)) |>
  label(LETTERS, gravity = "northwest", location = "+10+5", size = 40) |>
  c(grid) |>
  plot(padding = 3)

These methods were improved by wavelet-based texture averaging [@tiddeman2001prototyping], resulting in images with more realistic textural details, such as facial hair and eyebrows. This reduces the "fuzzy" look of composite images, but can also result in artifacts, such as lines on the forehead in Figure\ \@ref(fig:texture-comp), which are a result of some images having a fringe.

x <- load_stim_london("005|024|036|119")

notex <- avg(x, texture = FALSE)
tex <- avg(x, texture = TRUE)

c(notex, tex) |> crop(0.5, 0.75, y_off = 0.1)

The desktop version of Psychomorph was last updated in 2013, and can be difficult to install on some computers. To solve this problem, we started developing WebMorph [@webmorph], a web-based version that uses the Facemorph Java package from Psychomorph for averaging and transforming images, but has independent methods for delineation and batch processing. The desktop version of Psychomorph has limited batch processing ability and requires knowledge of Java to be fully scriptable. WebMorph has more extensive batch processing capacity, including the ability to set up image processing scripts in a spreadsheet, but some processes such as delineation still require a fair amount of manual processing. In this paper, we introduce webmorphR [@R-webmorphR], an R package companion to WebMorph that allows you to create R scripts to fully and reproducibly describe all of the steps of image processing and easily apply them to a new set of images.

terms <- tibble::tribble(
  ~Term, ~Definition,
  "template", "a set of landmark points that define shape and the way these are connected with lines; only image with the same template can be averaged or transformed",
  "landmark", "a point that marks corresponding locations on different images",
  "morphing", "blending two or more images to make an image with an average shape andor color",
  "transforming", "changing the shape and/or color of an image by some proportion of a vector that is  defined as the difference between two images",
  "delineation", "the x- and y-coordinates for a specific template that describe an image",
  "lines", "connections between landmarks; these may be used to interpolate new landmarks for morphing",
  "prototype", "an average of faces with similar characteristics, such as expression, gender, age, and/or ethnic group",
  "composite", "an average of more than one face image"
) |>
  dplyr::arrange(Term)

kableExtra::kable(terms, caption = "Glossary of terms.") |>
  kable_styling(bootstrap_options = c("striped", "responsive"))

Methods

In this section, we will cover some common image manipulations and how to achieve them reproducibly using webmorphR [@R-webmorphR]. We will also be using webmorphR.stim [@R-webmorphR.stim], a package that contains a number of open-source face image sets, and webmorphR.dlib [@R-webmorphR.dlib], a package that provides dlib models and functions for automatic face detection. These latter two packages cannot be made available on CRAN because of their large file size.

# install.packages("webmorphR")
# remotes::install_github("debruine/webmorphR.stim")
# remotes::install_github("debruine/webmorphR.dlib")
library("webmorphR") 
library("webmorphR.stim") # for additional stimuli
library("webmorphR.dlib") # for face detection with dlib

Editing

Almost all image sets start with raw images that need to be cropped, resized, rotated, padded, and/or color normalised. Although many reproducible methods exist to manipulate images in these ways, they are complicated when an image has an associated template, so webmorphR has functions that alter the image and template together (Fig.\ \@ref(fig:editing)).

orig <- demo_stim() # load demo images
mirrored <- mirror(orig)
cropped  <- crop(orig, width = 0.75, height = 0.75)
resized  <- resize(orig, 0.75)
rotated  <- rotate(orig, degrees = 180)
padded   <- pad(orig, 30, fill = "black")
grey     <- greyscale(orig)
c(orig, mirrored, cropped, resized, rotated, padded, grey) |>
  pad(0, 0, 0, 60) |>
  label(rep(LETTERS, each = 2), 
        gravity = "northwest", size = 60) |> 
  plot_stim(nrow = 2, byrow = FALSE)

Delineation

The image manipulations above work best if your raw images start the same size and aspect ratio, with the faces in the same orientation and position on each image. This is frequently not the case with raw images. Image delineation provides a way to set image manipulation parameters relative to face landmarks by marking corresponding points according to a template.

WebMorph.org's default face template marks 189 points (Fig.\ \@ref(fig:delineate)). Some of these points have very clear anatomical locations, such as point 0 ("left pupil"), while others have only approximate placements and are used mainly for masking or preventing morphing artifacts from affecting the background of images, such as point 147 ("about 2cm to the left of the top of the left ear (creates oval around head)"). Template point numbering is 0-based because PsychoMorph was originally written in Java.

load_stim_composite("f_multi") |>
  crop_tem(20) |>
  resize(2) |>
  draw_tem(pt.shape = "index", 
           pt.color = "#FFFFFFFF", 
           pt.size = 25)

The function tem_def() retrieves a template definition that includes point names, default coordinates, and the identity of the symmetrically matching point for mirroring or symmetrising images (Table\ \@ref(tab:tem-def)).

# get all information about a standard template
FRL <- tem_def("frl")

FRL$points[1:10, 1:5] |> 
  kableExtra::kable(caption = "The first 10 landmark points of WebMorph.org's default \"FRL\" template.", digits = 0) |>
  kable_styling(bootstrap_options = c("striped"))

You can automatically delineate faces with a simpler template (Fig.\ \@ref(fig:auto-delin)) using the online services provided through the free web platform Face++ [-@faceplusplus], or dlib models provided by Davis King under a CC0 license.

# load 5 images with FRL templates
f <- load_stim_london("006|038|064|066|135")

# remove templates and auto-delineate with dlib
dlib70_tem <- auto_delin(f, "dlib70", replace = TRUE)
dlib7_tem <- auto_delin(f, "dlib7", replace = TRUE)

# remove templates and auto-delineate with Face++
fpp106_tem <- auto_delin(f, "fpp106", replace = TRUE)
fpp83_tem <- auto_delin(f, "fpp83", replace = TRUE)
c(f[1], fpp106_tem[1], fpp83_tem[1], dlib70_tem[1], dlib7_tem[1]) |>
  draw_tem() |>
  label(LETTERS, gravity = "northwest", location = "+20+10")

A study comparing the accuracy of four common measures of face shape (sexual dimorphism, distinctiveness, bilateral asymmetry, and facial width to height ratio) between automatic and manual delineation concluded that automatic delineation had higher replicability and good correlations with manual delineation [@jones2021facial]. However, around 2% of images had noticeably inaccurate automatic delineation, which should be screened for by outlier detection and visual inspection.
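
A crude screen for gross delineation failures (a sketch only; it uses the metrics() function introduced below, and points 0 and 1 are the pupils in the FRL template, so the indices would need to be adjusted for other templates) is to compute a simple landmark-based measurement for every face and then visually inspect any face with an extreme value:

# flag faces whose inter-pupillary distance is unusual for the set
ipd <- metrics(f, c(0, 1))           # one value per face
z <- (ipd - mean(ipd)) / sd(ipd)     # standardise across the image set
names(f)[abs(z) > 2]                 # faces to inspect visually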

You can use the delin() function in webmorphR to open auto-delineated images in a visual editor to fix any inaccuracies.

dlib7_tem_fixed <- delin(dlib7_tem)
knitr::include_graphics("images/delin.png")

While automatic delineation has the advantage of being very fast and generally more replicable than manual delineation, it is more limited in the areas that can be described. Typically, automatic face detection algorithms outline the lower face shape and internal features of the face, but don't define the hairline, hair, neck, or ears. Manual delineation of these can greatly improve stimuli created through morphing or transforming (Fig.\ \@ref(fig:avg-comp)).

frl_avg <- avg(f)
fpp_avg <- avg(fpp106_tem)

c(frl_avg, fpp_avg) |> 
  label(LETTERS, gravity = "northwest", location = "+20+10")

Facial Metrics

Once you have images delineated, you can use the x- and y-coordinates to calculate various facial-metric measurements (Table\ \@ref(tab:metrics)). Get all or a subset of points with the function get_point(). Remember, points are 0-based, so the first point (left pupil) is 0. This function returns a data table with one row for each point for each face.

eye_points <- get_point(f, pt = 0:1)
eye_points |>
  kable(caption = "Coordinates of the first two points.", row.names = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "responsive"))

The metrics() function helps you quickly calculate the distance between any two points, such as the pupil centres, or use a more complicated formula, such as the face width-to-height ratio from Lefevre et al. [-@lefevre2013telling].

# inter-pupillary distance between points 0 and 1
ipd <- metrics(f, c(0, 1))

# face width-to-height ratio
left_cheek <- metrics(f, "min(x[110],x[111],x[109])")
right_cheek <- metrics(f, "max(x[113],x[112],x[114])")
bizygomatic_width <- right_cheek - left_cheek
top_upper_lip <- metrics(f, "y[90]")
highest_eyelid <- metrics(f, "min(y[20],y[25])")
face_height <- top_upper_lip - highest_eyelid
fwh <- bizygomatic_width/face_height

# alternatively, do all calculations in one equation
fwh <- metrics(f, "abs(max(x[113],x[112],x[114])-min(x[110],x[111],x[109]))/abs(y[90]-min(y[20],y[25]))")
data.frame(
  face = names(f),
  x0 = metrics(f, "x[0]"),
  y0 = metrics(f, "y[0]"),
  x1 = metrics(f, "x[1]"),
  y1 = metrics(f, "y[1]"),
  ipd = ipd,
  fwh = fwh
) |>
  kable(caption = "Facial metric measurements.", row.names = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "responsive"))

While it is possible to calculate metrics such as width-to-height ratio from 2D face images, this does not mean it is a good idea. Even on highly standardized images, head tilt can have large effects on such measurements [@hehman2013enhancing]. When image qualities such as camera type and head-to-camera distance are not standardized, facial metrics are meaningless at best [@tvrebicky2016focal].

Alignment

If your image set isn't highly standardised, you probably want to crop, resize and rotate your images to get them all in approximately the same orientation on images of the same size. There are several reproducible options, each with pros and cons.

One-point alignment (Fig.\ \@ref(fig:norm-comp)A) doesn't rotate or resize the image at all, but aligns one of the delineation points across images. This is ideal when you know that your camera-to-head distance and orientation was standard (or meaningfully different) across images and you want to preserve this in the stimuli, but you still need to get them all in the same position and image size.

Two-point alignment (Fig.\ \@ref(fig:norm-comp)B) resizes and rotates the images so that two points (usually the centres of the eyes) are in the same position on each image. This will alter relative head size such that people with very close-set eyes will appear to have larger heads than people with very wide-set eyes. This technique is good for getting images into the same orientation when you didn't have any control over image rotation and camera-to-head distance of the original photos.

Procrustes alignment (Fig.\ \@ref(fig:norm-comp)C) resizes and rotates the images so that each delineation point is as aligned as possible across all images. This can obscure meaningful differences in relative face size (e.g., a baby's face will be as large as an adult's), but can be superior to two-point alignment. While this requires that the whole face be delineated, you can use a minimal template such as a face outline or the Face++ auto-delineation to achieve good results.

You can very quickly delineate an image set with a custom template using the delin() function in webmorphR if auto-delineation doesn't provide suitable points.

# one-point alignment
onept <- align(f, pt1 = 55, pt2 = 55,
               x1 = width(f)/2, y1 = height(f)/2,
               fill = "dodgerblue")

# two-point alignment
twopt <- align(f, pt1 = 0, pt2 = 1, fill = "dodgerblue")

# procrustes alignment
proc <- align(f, pt1 = 0, pt2 = 1, procrustes = TRUE, fill = "dodgerblue")
plot_rows(A = onept, B = twopt, C = proc, top_label = FALSE)

Masking

Oftentimes, researchers will want to remove the background, hair, and clothing from an image to avoid confounds. For example, the presence versus absence of hairstyle information can reverse preferences for masculine versus feminine male averages [@debruine2006correlated].

The "standard oval mask" has enjoyed widespread popularity because it is straightforward to add to images using programs like PhotoShop. WebmorphR's mask_oval() function allows you to set oval boundaries manually (Fig.\ \@ref(fig:mask)A) or in relation to minimum and maximum template coordinates for each face (Fig.\ \@ref(fig:mask)B) or across the full image set. An arguably better way to mask out hair, clothing and background from images is to crop around the curves defined by the template (Fig.\ \@ref(fig:mask)C).

# standard oval mask
bounds <- list(t = 200, r = 400, b = 300, l = 400)
oval <- mask_oval(f, bounds, fill = "dodgerblue")

# template-aware oval mask
oval_tem <- f |>
  subset_tem(features("gmm")) |> # remove external points
  mask_oval(fill = "dodgerblue") # oval boundaries to max and min template points

# template-aware mask
masked <- mask(f, c("face", "neck", "ears"), fill = "dodgerblue")
plot_rows(A = oval, B = oval_tem, C = masked, top_label = FALSE)

Averaging

Creating average images (also called composite or prototype images) through morphing can be a way to visualise the differences between groups [@burton2005robust], manipulate averageness [@Little_2011], or create prototypical faces for image transformations.

Averaging faces with texture [@tiddeman2001prototyping;@tiddeman2005towards] makes composite images look more realistic (Fig.\ \@ref(fig:avg-texture)A). However, averages created without texture averaging look smoother and may be more appropriate for transforming color (Fig.\ \@ref(fig:avg-texture)B).

avg_tex <- avg(f, texture = TRUE)
avg_notex <- avg(f, texture = FALSE)
# close up on the eyes
eyes <- c(avg_tex, avg_notex) |> 
  align(x1=100, y1 = 100, x2 = 300, y2 = 100, width = 400, height = 150)

c(avg_tex, avg_notex, eyes) |> 
  label(c("A", "B", "", ""), gravity = "northwest", location = "+20+10") |>
  resize(width = width(avg_tex)) |>
  plot(nrow = 2)

Transforming

Transforming alters the appearance of one face by some proportion of the differences between two other faces. This technique is distinct from morphing. For example, you can transform a face in the dimension of sexual dimorphism by calculating the shape and color differences between a prototype female face (Fig.\ \@ref(fig:trans-vs-morph)A) and a prototype male face (Fig.\ \@ref(fig:trans-vs-morph)B). If you morph an individual female face with these images, you get faces that are halfway between the individual and prototype faces (Fig.\ \@ref(fig:trans-vs-morph)C,D). However, if you transform the individual face by 50% of the prototype differences, you get feminised and masculinised versions of the individual face (Fig.\ \@ref(fig:trans-vs-morph)E,F).

# load individual images
canada <- load_stim_canada() |> resize(0.5)
individual <- canada[8]

# make female and male composite images
f_avg <- canada |> subset("f") |> avg(texture = FALSE)
m_avg <- canada |> subset("m") |> avg(texture = FALSE)

# average the individual face with male and female averages
fem_morph <- c(individual, f_avg) |> avg()
masc_morph <- c(individual, m_avg) |> avg()

# make masc and fem versions of the individual
sexdim <- trans(trans_img = individual,
                from_img = f_avg,
                to_img  = m_avg,
                shape = c(fem = -0.5, masc = 0.5),
                color = c(fem = -0.5, masc = 0.5))
c(f_avg, m_avg, fem_morph, masc_morph, sexdim) |>
  label(LETTERS[1:6], 
        gravity = "northwest", 
        location = "+20+10") |>
  plot(nrow = 2, byrow = FALSE)

If, for example, the individual female face was more feminine than the average female face, morphing with the average female face produces an image that is less feminine than the original individual, while transforming along the male-female dimension produces an image that is always more feminine than the original. Morphing with a prototype also results in an image with increased averageness, while transforming maintains individually distinctive features.

Transforming also allows you to manipulate shape and colour independently (Fig.\ \@ref(fig:trans-shape-color)).

masc <- trans(trans_img = individual,
              from_img = f_avg,
              to_img  = m_avg,
              shape = c(shape = 0.5, color = 0, both = 0.5), 
              color = c(shape = 0, color = 0.5, both = 0.5))
c(individual, masc) |>
  label(LETTERS, 
        gravity = "northwest", 
        location = "+20+10") |>
  plot(nrow = 1)

Symmetrising

Although a common technique [e.g., @Mealey_1999], left-left and right-right mirroring (Fig.\ \@ref(fig:mirror-sym)) is not recommended for investigating perceptions of facial symmetry. This is because this method typically produces unnatural images for any face that isn't already perfectly symmetric. For example, if the nose does not lie in a perfectly straight line from the centre point between the eyes to the centre of the mouth, then one of the mirrored halves will have a much wider nose than the original face, while the other half will have a much narrower nose than the original face. In extreme cases, one mirrored version can end up with three nostrils and the other with a single nostril.

# make eye points exactly horizontal
hzeyes <- horiz_eyes(f)

# calculate midpoint of eyes for each image
pts <- get_point(hzeyes, pt = 0:1)
midpoint <- mean(pts$x)

# crop and mirror images
left_side <- crop(hzeyes, width = midpoint, x_off = 0)
left_mirror <- mirror(left_side)
right_side <- crop(hzeyes, width = width(hzeyes)-midpoint, x_off = midpoint)
right_mirror <- mirror(right_side)

# paste images together
left_left <- mapply(function(ls, rs) {
  c(ls, rs) |> plot(padding = 0)
}, left_side, left_mirror) |>
  crop(800, 800)


right_right <- mapply(function(ls, rs) {
  c(ls, rs) |> plot(padding = 0)
}, right_mirror, right_side) |>
  crop(800, 800)

plot_rows(left_left, right_right)

A morph-based technique is a more realistic way to manipulate symmetry [@Little_2011; @little2001self; @paukner2017capuchin]. It preserves the individual's characteristic feature shapes and avoids the problem of having to choose an axis of symmetry on a face that isn't perfectly symmetrical. In this method, the original face is mirror-reversed and each template point is re-labelled. The original and mirrored images are averaged together to create a perfectly symmetric version of the image that has the same feature widths as the original face (Fig.\ \@ref(fig:morph-sym)). You can also use this symmetric version to create asymmetric versions of the original face through transforming: exaggerating the differences between the original and the symmetric version.

sym_both <- symmetrize(f)
sym_shape <- symmetrize(f, color = 0)
sym_color <- symmetrize(f, shape = 0)
sym_anti <- symmetrize(f, shape = -1.0, color = 0)
plot_rows(A = sym_both, B = sym_color, 
          C = sym_shape, D = sym_anti,
          top_label = FALSE)

Case Studies

In this section, we will demonstrate how more complex face image manipulations can be scripted, such as the creation of prototype faces, making emotion continua, manipulating sexual dimorphism, manipulating resemblance, and labelling stimuli with words or images.

London Face Set

We will use the open-source, CC-BY licensed image set, the Face Research Lab London Set [@FRL_London]. Images are of 102 adults whose pictures were taken with a Nikon camera in London, UK, in April 2012 (Fig.\ \@ref(fig:london-set)). All individuals were paid and gave signed consent for their images to be "used in lab-based and web-based studies in their original or altered forms and to illustrate research (e.g., in scientific journals, news media or presentations)."

london <- load_stim_london() |>
  resize(0.5) |> # makes the demo run faster, change for final
  add_info(webmorphR.stim::london_info)

london |> plot(nrow=6, maxwidth = 2000)
smiling <- load_stim_smiling() |>
  resize(0.5) |> # makes the demo run faster, change for final
  add_info(webmorphR.stim::london_info)

Each subject has one smiling and one neutral pose. For each pose, 5 full colour images were simultaneously taken from different angles: left profile, left three-quarter, front, right three-quarter, and right profile, but we will only use the front-facing images in the examples below. These images were cropped to 1350x1350 pixels and the faces were manually centred (many years before we made the tools described in this paper). The neutral front images have template files that mark out 189 coordinates delineating face shape for use with Psychomorph or WebMorph.

Prototypes

The first step for many types of stimuli is to create prototype faces for some categories, such as expression or gender. The faces that make up these averages should be matched for other characteristics that you want to avoid confounding with the categories of interest, such as age or ethnicity. Here, we will choose 5 Black female faces, automatically delineate them, align the images, and create neutral and smiling prototypes (Fig.\ \@ref(fig:emo-avg)).

# select the relevant images and auto-delineate them
neu_orig <- subset(london, face_gender == "female") |> 
  subset(face_eth == "black") |> subset(1:5) |>
  auto_delin("dlib70", replace = TRUE)

smi_orig <- subset(smiling, face_gender == "female") |> 
  subset(face_eth == "black") |> subset(1:5) |>
  auto_delin("dlib70", replace = TRUE)

# align the images
all <- c(neu_orig, smi_orig) 
aligned <- all |>
  align(procrustes = TRUE, fill = patch(all)) |>
  crop(.6, .8, y_off = 0.05)

neu <- subset(aligned, 1:5)
smi <- subset(aligned, 6:10)

neu_avg <- avg(neu, texture = FALSE)
smi_avg <- avg(smi, texture = FALSE)
c(neu_avg,
  plot(neu, external_pad = 0, ncol = 1), 
  plot(smi, external_pad = 0, ncol = 1),
  smi_avg
) |> 
  resize(height = height(neu_avg))

We use the "dlib70" auto-delineation model, which is available through webmorphR.dlib [@R-webmorphR.dlib], but requires the installation of python and some python packages. However, it has the advantage of not requiring setting up an account at Face++ and doesn't transfer your images to a third party.

Emotion Continuum

Once you have two prototype images, you can set up a continuum that morphs between the images and even exaggerates beyond them (Fig.\ \@ref(fig:continuum)). Note that some exaggerations beyond the prototypes can produce impossible shape configurations, such as the negative smile, where the open lips from a smile go to closed at 0% and pass through each other at negative values.

steps <- continuum(neu_avg, smi_avg, from = -0.5, to = 1.5, by = 0.25)
lab <- paste0(seq(-0.5, 1.5, .25) * 100, "%")
steps |>
  label(lab, gravity = "northwest", location = "+20+10") |>
  plot(nrow = 1)

Sexual dimorphism transform

We can use the full templates to create sexual dimorphism transforms from neutral faces. Repeat the process above for 5 male and 5 female neutral faces, skipping the auto-delineation because these images already have webmorph templates (Fig.\ \@ref(fig:sexdim-avg)).

# select the relevant images
f_orig <- subset(london, face_gender == "female") |> 
  subset(face_eth == "black") |> subset(1:5)

m_orig <- subset(london, face_gender == "male") |> 
  subset(face_eth == "black") |> subset(1:5)

# align the images
all <- c(f_orig, m_orig) 
aligned <- all |>
  align(procrustes = TRUE, fill = patch(all)) |>
  crop(.6, .8, y_off = 0.05)

f <- subset(aligned, 1:5)
m <- subset(aligned, 6:10)

f_avg <- avg(f, texture = FALSE)
m_avg <- avg(m, texture = FALSE)
c(f_avg,
  plot(f, external_pad = 0, ncol = 1), 
  plot(m, external_pad = 0, ncol = 1),
  m_avg
) |> 
  resize(height = height(f_avg))

Next, transform each individual image using the average female and male faces as transform endpoints (Fig.\ \@ref(fig:sexdim-transform)).

# use a named vector for shape to automatically rename the images
sexdim <- trans(
  trans_img = c(f, m),
  from_img = f_avg,
  to_img = m_avg,
  shape = c(fem = -.5, masc = .5)
)
plot_rows(A = subset(sexdim, "_fem"),
          B = subset(sexdim, "_masc"), 
          size = 10,
          location = "+5+2")

Self-resemblance transform

Some research involves creating "virtual siblings" for participants to test how they perceive and behave towards strangers with phenotypic kinship cues [@DeBruine_2004PRSLB;@DeBruine_2005PRSLB;@DeBruine_2011PNAS]. As discussed in detail in DeBruine et al. [-@DeBruine_2008ASB], while morphing techniques are sufficient to create same-gender virtual siblings, transforming techniques are required to make other-gender virtual siblings without confounding self-resemblance with androgyny (Fig.\ \@ref(fig:virtual-sibs)).

virtual_sis <- trans(
  trans_img = f_avg,   # transform an average female face
  shape = 0.5,         # by 50% of the shape differences
  from_img = m_avg,    # between an average male face
  to_img = m) |>       # and individual male faces
  mask(c("face", "neck","ears")) 

virtual_bro <- trans(
  trans_img = m_avg,   # transform an average male face
  shape = 0.5,         # by 50% of the shape differences
  from_img = m_avg,    # between an average male face
  to_img = m) |>       # and individual male faces
  mask(c("face", "neck","ears"))
plot_rows(A = crop_tem(m), 
          B = crop_tem(virtual_bro), 
          C = crop_tem(virtual_sis),
          top_label = FALSE)

Labels

Many social perception studies require labelled images, such as minimal group designs. You can add custom labels and superimpose images on stimuli (Fig.\ \@ref(fig:label-comp)).

# download, resize and save flag images

dir.create("images/flags", FALSE)

magick::image_read("https://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Flag_of_Scotland.svg/2560px-Flag_of_Scotland.svg.png") |>
  magick::image_resize(magick::geometry_size_pixels(75)) |>
  magick::image_write("images/flags/saltire.png")

magick::image_read("https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/Flag_of_Wales.svg/2560px-Flag_of_Wales.svg.png") |>
  magick::image_resize(magick::geometry_size_pixels(75)) |>
  magick::image_write("images/flags/ddraig.png")
flags <- read_stim("images/flags")

ingroup <- f |>
  # pad 10% at the top with matching color
  pad(0.1, 0, 0, 0, fill = patch(f)) |> 
  label("Scottish", "north", "+0+10") |>
  image_func("composite", flags$saltire$img, 
              gravity = "northeast", offset = "+10+10")

outgroup <- f |>
  pad(0.1, 0, 0, 0, fill = patch(f)) |> 
  label("Welsh", "north", "+0+10") |>
  image_func("composite", flags$ddraig$img, 
             gravity = "northeast", offset = "+10+10")
plot_rows(ingroup, outgroup)

Discussion

Preparing your stimuli for face research in the ways described above has both personal and altruistic benefits. Once the original scripts are written, you will be able to prepare new stimuli without manual intervention. It also makes the process of changing your mind about the experimental design much less painful. If you decide that the images actually should have been aligned prior to several steps, you only need to add a line of code and rerun your script, instead of starting a whole manual process over from scratch. Even more importantly, providing reproducible scripts can allow others to build on your work with their own images. This is beneficial for generalisability, whether or not you can share your original images.

In this section, we will discuss a number of issues related to making sure research that uses face stimuli is ethical and methodologically robust. While these issues may not be directly related to stimulus reproducibility, they are important to discuss in a paper that aims to make it easier for people to do research with face images.

Ethical Issues

Research with identifiable faces has a number of ethical issues. This means it is not always possible to share the exact images used in a study. In this case, it is all the more important for the stimulus construction methods to be clear and reproducible. However, there are other ethical issues outside of image sharing that we feel are important to highlight in a paper discussing the use of face images in research.

The use of face photographs must respect participant consent and personal data privacy. Images that are "freely" available on the internet, such as in Twitter profiles, are a grey area and the ethical issues should be carefully considered by the researchers and relevant ethics board.

We strongly advise against using face images in research where there is a possibility of real-world consequences for the pictured individuals. For example, do not post identifiable images of real people on real dating sites without the explicit consent of the pictured individuals for that specific research.

Face image analysis should never be used to predict behaviour or as an automatic screening tool. For example, face images cannot be used to predict criminality or decide who should proceed to the interview stage in a job application. This type of application is unethical because the training data is always biased. Face image analysis can be useful for researching what aspects of face images give rise to the perception of traits like trustworthiness, but should not be confused with the ability to detect actual behaviour. Researchers have a responsibility to consider how their research may be misused in this manner.

Natural vs standardised source images

Use the right image for the question. -- Ben, do you think you could write a bit about this? I thought it would be useful to explain when/why you might use standardised images versus naturalistic "holiday snaps". WebmorphR can help process either, but the delineations are mainly specialised for front-facing faces (although profile face templates are available).

left_profile <- tem_def(33)
right_profile <- tem_def(32)

left_viz <- viz_tem_def(left_profile)
right_viz <- viz_tem_def(right_profile)
c(left_viz, right_viz) |> crop_tem()

Head position

Morphometrics -- Iris, can you add this?

Judging composites

In this section we will explain a serious caveat to research that draws conclusions about group differences from judgements of a single pair, or a small number of pairs, of composite faces. Since we are making it easier to create composites, we do not want to inadvertently encourage research with this particular design.

As a concrete illustration, a recent paper by @alper2021all used faces from the Faceaurus database [@holtzman2011facing]. “Holtzman (2011) standardized the assessment scores, computed average scores of self- and peer-reports, and ranked the face images based on the resulting scores. Then, prototypes for each of the personality dimensions were created by digitally combining 10 faces with the highest, and 10 faces with the lowest scores on the personality trait in question (Holtzman, 2011).” This was done separately for male and female faces.

Since scores on the three dark triad traits are positively correlated, the three pairs of composite faces are not independent. Indeed, Holtzman states that 5 individuals were in all three low composites for the male faces, while the overlap was less extreme in other cases. With 105 observers, Holtzman found that the ability to detect the composite higher in a dark triad trait was greater than chance.

While we commend both Holtzman and Alper, Bayrak, and Yilmaz for their transparency, data sharing, and material sharing, we argue that this test has an effective N of 2, not 105, and that further replications using these images, such as those done by Alper, Bayrak, and Yilmaz, regardless of the number of observers or preregistration status, lend no further weight of evidence to the assertion that dark triad traits are visible in physical appearance.

To explain this, we'll use an analogy that has nothing to do with faces (bear with us). Imagine a researcher predicts that women born on odd days are taller than women born on even days. Ridiculous, right? So let's simulate some data assuming that isn't true. The code below samples 20 women from a population with a mean height of 158.1 cm and an SD of 5.7. Half are born on odd days and half on even days.

set.seed(8675309)

stim_n <- 10
height_m <- 158.1
height_sd <- 5.7

odd <- rnorm(stim_n, height_m, height_sd)
even <- rnorm(stim_n, height_m, height_sd)

t.test(odd, even)

A t-test shows no significant difference, which is unsurprising. We simulated the data from the same distribution, so we know for sure there is no real difference here. Now we're going to average the height of the women with odd and even birthdays. So if we create a full-body composite of women born on odd days, she would be 161.2 cm tall, and a composite of women born on even days would be 156.7 cm tall.

If we ask 100 observers to look at these two composites, side-by-side, and judge which one looks taller, what do you imagine would happen? It's likely that nearly all of them would judge the odd-birthday composite as taller. But let's say that observers have to judge the composites independently, and they are pretty bad with height estimation, so their estimates for each composite have error with a standard deviation of 10 cm. We then compare their estimates for the odd-birthday composite with the estimate for the even-birthday composite in a paired-samples t-test.

obs_n <- 100 # number of observers
error_sd <- 10 # observer error

# add the error to the composite mean heights
odd_estimates <- mean(odd) + rnorm(obs_n, 0, error_sd)
even_estimates <- mean(even) + rnorm(obs_n, 0, error_sd)

t.test(odd_estimates, even_estimates, paired = TRUE)
p <- t.test(odd_estimates, even_estimates, paired = TRUE)$p.value |> round(3)
x <- replicate(10000, mean(rnorm(10))-mean(rnorm(10)))
mean_unsigned_diff_10 <- round(mean(abs(x)), 2)

Now the women with odd birthdays are significantly taller than the women with even birthdays (p = `r p`). Or are they?

We can be sure that by chance alone, our two composites will be at least slightly different on any measure, even if they are drawn from identical populations. The smaller the number of stimuli that go into each composite, the larger the mean (unsigned) size of this difference. With only 10 stimuli per composite (like the Faceaurus composites), the mean unsigned effect size of the difference between composites from populations with no real difference is `r mean_unsigned_diff_10` (in units of SD of the original trait distribution). If our observers are accurate enough at perceiving this difference, or we run a very large number of observers, we are virtually guaranteed to find significant results every time. Additionally, there is a 50% chance that these results will be in the predicted direction, and this direction will be replicable across different samples of observers for the same image set.

So what does this mean for studies of the link between personality traits and facial appearance? The analogy with birth date and height holds. As long as there are facial morphologies that are even slightly consistently associated with the perception of a trait, then composites will not be identical in that morphology. Thus, even if that morphology is totally unassociated with the trait as measured by, e.g., personality scales or peer report (which is often the case), using the composite rating method will inflate the false positive rate for concluding a difference.

The smaller the number of stimuli that go into each composite, the greater the chance that they will be visibly different in morphology related to the judgement of interest, just by chance alone. The larger the number of observers or the better observers are at detecting small differences in this morphology, the more likely that "detection" will be significantly above chance. Repeating this with a new set of observers does not increase the amount of evidence you have for the association between the face morphology and the measured trait. You've only measured it once in one population of faces. If observers are your unit of analysis, you are drawing conclusions about whether the population of observers can detect the difference between your stimuli; you cannot generalise this to new stimulus sets.

So how should researchers test for differences in facial appearance between groups? Assessment of individual face images, combined with mixed effects models [@debruine2021understanding], can allow you to simultaneously account for variance in both observers and stimuli, avoiding the inflated false positives of the composite method (or aggregating ratings). People often use the composite method when they have too many images for any one observer to rate, but cross-classified mixed models can analyse data from counterbalanced trials or randomised subset allocation.
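
For example, a cross-classified model with random intercepts for both observers and faces can be sketched with lme4 (the data frame and column names here are hypothetical):

# one row per rating, with random intercepts for observer and face
library(lme4)
mod <- lmer(rating ~ measured_trait + (1 | observer) + (1 | face),
            data = dat)
summary(mod)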

Another reason to use the composite rating method is when you are not ethically permitted to use individual faces in research, but are ethically permitted to use non-identifiable composite images. In this case, you can generate a large number of random composite pairs to construct the chance distribution. The equivalent to a p-value for this method is the proportion of the randomly paired composites that your target pair has a more extreme result than. While this method is too tedious to use when constructing composite faces manually, scripting allows you to automate such a task.

set.seed(8675309) # for reproducibility

# load 20 faces
f <- load_stim_canada("f") |> resize(0.5)

# set to the number of random pairs you want
n_pairs <- 5

# repeat this code n_pairs times
pairs <- lapply(1:n_pairs, function (i) {
  # sample a random 10:10 split
  rand1 <- sample(names(f), 10)
  rand2 <- setdiff(names(f), rand1)

  # create composite images
  comp1 <- avg(f[rand1])
  comp2 <- avg(f[rand2])

  # save images with paired names
  nm1 <- paste0("img_", i, "_a")
  nm2 <- paste0("img_", i, "_b")
  write_stim(comp1, dir = "images/composites", names = nm1)
  write_stim(comp2, dir = "images/composites", names = nm2)
})
pairs <- read_stim("images/composites/")
plot(pairs, byrow = FALSE, nrow = 2)
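
Once each composite in each random pair has been rated, the chance distribution is simply the rating differences within the random pairs, and the equivalent of a p-value is the proportion of those differences at least as extreme as the target pair's difference. A sketch, assuming a hypothetical data frame of mean ratings per composite:

# ratings: hypothetical data frame with one mean rating per composite,
# columns pair, version ("a" or "b"), and rating, ordered by pair
a <- ratings$rating[ratings$version == "a"]
b <- ratings$rating[ratings$version == "b"]
null_diffs <- a - b                           # chance distribution of differences
target_diff <- 0.8                            # placeholder: observed target-pair difference
mean(abs(null_diffs) >= abs(target_diff))     # proportion at least as extreme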

Open Resources

In conclusion, we hope that this paper has convinced you that it is both possible and desirable to use scripting to prepare stimuli for face research. You can access more detailed tutorials for webmorph.org at https://debruine.github.io/webmorph/ and for webmorphR at https://debruine.github.io/webmorphR/. All image sets used in this tutorial are available under a CC-BY license on figshare and all software is available open source. The code to reproduce this paper can be found at https://github.com/debruine/webmorphR/tree/master/paper.

\newpage

References

We used `r cite_r("r-references.bib")` to produce this manuscript.

\begingroup \setlength{\parindent}{-0.5in} \setlength{\leftskip}{0.5in}

\endgroup


