Adding to the dictionary

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
df <- data.frame(
  gender = c("male", "enby", "womn", "mlae", "mann", "frau", "femme", "homme", "nin"),
  stringsAsFactors = FALSE
)

Outline

While the gendercoder dictionaries aim to be as comprehensive as possible, new typos and variations will inevitably occur in real-world data. Moreover, at present the dictionaries are limited to English-language data that the authors have had access to. If you are collecting data, you will therefore at some point want to add to the built-in dictionaries or create your own. If you do, we strongly encourage contributions, either as a pull request via GitHub or by raising an issue so the team can help.

Adding to the dictionary

Let's say I have free-text gender data, but some of it is not in English.

library(gendercoder)
df

I can create a new dictionary as a named vector, where the names are the raw, uncoded values and the values are the desired outputs. This vector can then be passed as the dictionary argument of the recode_gender() function.

new_dictionary <- c(
  mann = "man", 
  frau = "woman", 
  femme = "woman", 
  homme = "man", 
  nin = "man")

new_dictionary_df <- df
new_dictionary_df$recoded_gender <- recode_gender(
  df$gender,
  dictionary = new_dictionary,
  retain_unmatched = TRUE
)
new_dictionary_df
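To make the named-vector idea concrete, here is a minimal base R sketch of a name-based lookup. It is an illustration only and not necessarily how recode_gender() is implemented; toy_dictionary, raw, and lookup are hypothetical names chosen for this example.

```r
# Hypothetical toy dictionary: names are raw responses, values are recoded outputs.
toy_dictionary <- c(mann = "man", frau = "woman")

raw <- c("mann", "frau", "enby")

# Indexing a named vector by name returns NA for names it does not contain.
lookup <- unname(toy_dictionary[raw])
lookup
#> [1] "man"   "woman" NA

# One way to keep unmatched responses: fall back to the raw value where the lookup is NA.
ifelse(is.na(lookup), raw, lookup)
#> [1] "man"   "woman" "enby"
```

Any response that is not a name in the dictionary simply has no match, which is why a small custom dictionary on its own covers only the values you explicitly listed.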

However, as you can see, using just this new dictionary leaves a number of responses uncoded that the built-in dictionaries could handle. Because the dictionaries are just named vectors, we can simply concatenate them with c() to use both at the same time.

We can do this in-line...

inline_df <- df
inline_df$recoded_gender <- recode_gender(
  df$gender,
  dictionary = c(manylevels_en, new_dictionary),
  retain_unmatched = TRUE
)
inline_df

Or we can create a new dictionary and call it later, which is useful if you want to save an augmented dictionary for later use or contribute it to the package.

# Combine the built-in English dictionary with the new entries
manylevels_plus <- c(manylevels_en, new_dictionary)

stepped_df <- df
stepped_df$recoded_gender <- recode_gender(
  df$gender,
  dictionary = manylevels_plus,
  retain_unmatched = TRUE
)
stepped_df
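One thing to watch when concatenating dictionaries is what happens if the same raw value appears in both. In base R, indexing a named vector by name returns the first matching element, so under that behaviour entries from the first dictionary would take precedence. Whether recode_gender() resolves duplicates this way is an assumption here; the sketch below uses hypothetical stand-in vectors (builtin, extra) with made-up values rather than the package data.

```r
# Hypothetical stand-ins for a built-in dictionary and a user-supplied one.
builtin <- c(male = "man", m = "man")
extra   <- c(m = "masculine", homme = "man")

combined <- c(builtin, extra)

# Names that occur more than once in the combined dictionary.
unique(names(combined)[duplicated(names(combined))])
#> [1] "m"

# Name-based indexing picks the first match, so the built-in entry wins here.
unname(combined["m"])
#> [1] "man"

# Putting the new entries first and dropping later duplicates gives them priority.
preferred <- c(extra, builtin)
preferred <- preferred[!duplicated(names(preferred))]
unname(preferred["m"])
#> [1] "masculine"
```

Checking for duplicated names before saving or contributing an augmented dictionary makes it explicit which coding you intend for each raw value.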

Making it official

Let's say you are happy with your manylevels_plus dictionary and think it should be part of the manylevels_en dictionary in the package. All you need to do is fork the gendercoder repo and clone it to your local machine. Then, working within the cloned package project, assign your vector to manylevels_en and use the usethis::use_data() function to overwrite the packaged dictionary, as shown below.

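# Give the augmented vector the name of the packaged dictionary
manylevels_en <- manylevels_plus
# Overwrite the existing data file in the package's data/ directory
usethis::use_data(manylevels_en, overwrite = TRUE)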

Once you've pushed the changes to your fork, you can make a pull request. Please tell us what you're adding so we know what to look out for and how to test it.
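When describing your additions in the pull request, it can help to list exactly which entries are new. A minimal sketch, using hypothetical stand-in vectors (old_dict, new_dict) in place of the original and augmented dictionaries:

```r
# Hypothetical stand-ins for the original and the augmented dictionary.
old_dict <- c(male = "man", enby = "non-binary")
new_dict <- c(old_dict, mann = "man", frau = "woman")

# Raw values present in the augmented dictionary but not in the original.
added <- setdiff(names(new_dict), names(old_dict))
added
#> [1] "mann" "frau"

# The new entries together with their recoded values.
new_dict[added]
```

With the package loaded, the same comparison could be run on manylevels_plus and the packaged gendercoder::manylevels_en.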



gendercoder documentation built on May 19, 2026, 1:08 a.m.