compute_name_index: Compute multiple indices of surnames and given names.
In ChineseNames: Chinese Name Database 1930-2008

View source: R/ChineseNames.R

compute_name_index

R Documentation


Description

Compute all available name features (indices) based on the familyname and givenname datasets. Input either a data frame containing a variable of Chinese full names (plus a variable of birth years, if necessary) or a vector of full names (plus a vector of birth years, if necessary).

Usage 1: Input a single value or a vector as name (and birth, if necessary).
Usage 2: Input a data frame as data and give the variable name in var.fullname (or in var.surname and/or var.givenname) (and var.birthyear, if necessary).

Caution: Name-character uniqueness (NU) for birth years >= 2010 is estimated by forecasting and may therefore be inaccurate.
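
As a minimal sketch of Usage 1, the call below passes vectors directly through name and birth instead of a data frame. The two full names and birth years are made-up illustrative values, not taken from the package data, and the ChineseNames package is assumed to be installed.

```r
library(ChineseNames)

# Hypothetical full names and birth years, for illustration only
full_names = c("王伟", "李娜")
birth_years = c(1985, 1995)

# Usage 1: no data frame, just vectors of names and birth years
res = compute_name_index(name = full_names, birth = birth_years)
res
```

Both birth years fall inside the 1930-2008 range covered by the database, so the forecasting caution above does not apply to this sketch.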

Usage

compute_name_index(
  data = NULL,
  var.fullname = NULL,
  var.surname = NULL,
  var.givenname = NULL,
  var.birthyear = NULL,
  name = NA,
  birth = NA,
  index = c("NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC"),
  NU.approx = TRUE,
  digits = 4,
  return.namechar = TRUE,
  return.all = FALSE
)

Arguments

`data`	Data frame.
`var.fullname`	Variable name of Chinese full names (e.g., `"name"`).
`var.surname`	Variable name of Chinese surnames (e.g., `"surname"`).
`var.givenname`	Variable name of Chinese given names (e.g., `"givenname"`).
`var.birthyear`	Variable name of birth year (e.g., `"birth"`).
`name`	If `data` is not provided, a vector of full name(s).
`birth`	If `data` is not provided, a vector of birth year(s).
`index`	Which indices to compute. By default, all available name indices are computed:
	`NLen`: full-name length (2~4)
	`SNU`: surname uniqueness (1~6)
	`SNI`: surname initial (1~26)
	`NU`: name-character uniqueness (1~6)
	`CCU`: character-corpus uniqueness (1~6)
	`NG`: name gender (-1~1)
	`NV`: name valence (1~5)
	`NW`: name warmth (1~5)
	`NC`: name competence (1~5)
`NU.approx`	Whether to approximate name-character uniqueness (NU) using the two nearest birth cohorts with relative weights, which is more precise than using a single birth cohort. Defaults to `TRUE`.
`digits`	Number of decimal places. Defaults to `4`.
`return.namechar`	Whether to return separate name characters. Defaults to `TRUE`.
`return.all`	Whether to return all temporary variables in the computation of the final variables. Defaults to `FALSE`.
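
The arguments above can be combined to restrict the output. The sketch below requests only two of the listed indices and rounds to two decimal places. The names and birth years are hypothetical illustrative values, and the ChineseNames package is assumed to be installed.

```r
library(ChineseNames)

# Hypothetical input, for illustration only
full_names = c("王伟", "李娜", "张敏")
birth_years = c(1980, 1990, 2000)

# Compute only name-character uniqueness (NU) and name gender (NG),
# rounded to 2 decimal places instead of the default 4
res = compute_name_index(name = full_names,
                         birth = birth_years,
                         index = c("NU", "NG"),
                         digits = 2)
res
```

Requesting a subset of index keeps the result narrow when only a few indices are needed for an analysis.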

Details

https://psychbruce.github.io/ChineseNames/

Value

A new data frame (class data.table) with name indices appended. Full names are split into name0 (surnames, with compound surnames automatically detected), name1, name2, and name3 (given-name characters).
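
To see how full names are split, the returned columns can be inspected directly. This is a minimal sketch: the two names (one of them intended to have a compound surname) and their birth years are hypothetical, and the ChineseNames package is assumed to be installed.

```r
library(ChineseNames)

# Hypothetical names; the first is meant to start with a compound surname
full_names = c("欧阳明", "王伟")
birth_years = c(1990, 1990)

res = compute_name_index(name = full_names, birth = birth_years)

# Column names of the result, including the appended indices
names(res)

# Surname column and first given-name character column
res[["name0"]]
res[["name1"]]
```

With the default return.namechar = TRUE, the columns name0 to name3 described above should be present alongside the computed indices.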

Examples

## Prepare ##
sn = familyname$surname[1:12]
gn = c(top100name.year$name.all.1960[1:6],
       top100name.year$name.all.2000[1:6],
       top100name.year$name.all.1960[95:100],
       top100name.year$name.all.2000[95:100])
demodata = data.frame(name=paste0(sn, gn),
                      birth=c(1960:1965, 2000:2005,
                              1960:1965, 2000:2005))
demodata

## Compute ##
newdata = compute_name_index(demodata,
                             var.fullname="name",
                             var.birthyear="birth")
newdata

ChineseNames documentation built on Aug. 21, 2025, 5:50 p.m.