Links97Pair: Kinship linking file for pairs of relatives in the NLSY97
In NlsyLinks: Utilities and Kinship Information for Research with the NLSY

Description

This dataset specifies the relatedness coefficient (i.e., R) between subjects in the same extended family. Each row represents a unique relationship pair.

NOTE: Two variable names changed in November 2013. Subject1Tag and Subject2Tag became SubjectTag_S1 and SubjectTag_S2.

Format

A data frame with 2,519 observations on the following 5 variables. There is one row per unique pair of subjects, irrespective of order.

ExtendedID: Identity of the extended family of the pair; it corresponds to the household identifier (HHID) in the NLSY97. See Details below.
SubjectTag_S1: Identity of the pair's first subject. See Details below.
SubjectTag_S2: Identity of the pair's second subject. See Details below.
R: The pair's relatedness coefficient. See Details below.
RelationshipPath: The relationship category of the pair. This variable is a factor, with level Housemates=1.
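To make the layout concrete, the sketch below builds a small toy data frame with the same five columns. It then checks two properties described on this page: the integer coding of the RelationshipPath factor, and the rule of one row per pair irrespective of order. All values, including the subject tags, are hypothetical and are not drawn from the NLSY97. With the package loaded, the same checks could be pointed at Links97Pair itself.

```r
# Toy stand-in with the same five columns as Links97Pair.
# All values are hypothetical; they are not real NLSY97 identifiers.
toy_pairs <- data.frame(
  ExtendedID       = c(1L, 1L, 1L),
  SubjectTag_S1    = c(101L, 101L, 102L),
  SubjectTag_S2    = c(102L, 103L, 103L),
  R                = c(0.5, 0.25, 0.25),
  RelationshipPath = factor(rep("Housemates", 3), levels = "Housemates")
)

str(toy_pairs)

# The factor stores its first (and here only) level, Housemates, as integer code 1.
path_codes <- as.integer(toy_pairs$RelationshipPath)

# Sort each pair so (a, b) and (b, a) produce the same key; a table with one
# row per unordered pair should contain no duplicated keys.
pair_key <- paste(
  pmin(toy_pairs$SubjectTag_S1, toy_pairs$SubjectTag_S2),
  pmax(toy_pairs$SubjectTag_S1, toy_pairs$SubjectTag_S2),
  sep = "-"
)
has_duplicate_pairs <- any(duplicated(pair_key))
has_duplicate_pairs  # FALSE for this toy table
```

The pmin()/pmax() key is used so that a pair stored in reverse order would still be detected as a duplicate.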

Details

The variable ExtendedID corresponds to the NLSY97 variable [SIDCODE] (e.g., R11930.00), which uniquely identifies a household that may contain multiple NLSY97 subjects.

The variables SubjectTag_S1 and SubjectTag_S2 uniquely identify subjects. They correspond to the NLSY97 variable [PUBID] (e.g., R00001.00).

The RelationshipPath variable is not informative in this dataset, but it is included for consistency with the Links79Pair dataset.

An extended family with k subjects will have k(k-1)/2 rows. Typically, Subject 1 (SubjectTag_S1) is the older member of the pair and Subject 2 (SubjectTag_S2) is the younger.
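The row count follows from choosing 2 of the k subjects without regard to order. Below is a minimal sketch using base R's combn() for a hypothetical family of four subjects with made-up tags.

```r
# Hypothetical subject tags for one extended family with k = 4 subjects.
family_tags <- c(101L, 102L, 103L, 104L)
k <- length(family_tags)

# combn() returns each unordered pair as a column; t() turns the pairs into rows.
family_pairs <- t(combn(family_tags, 2))
colnames(family_pairs) <- c("SubjectTag_S1", "SubjectTag_S2")

family_pairs
nrow(family_pairs) == k * (k - 1) / 2  # TRUE: 4 * 3 / 2 = 6 pairs
```

combn() orders each pair by position in family_tags. To mimic the older-first convention, the tags would first need to be sorted by birth date, which is not part of this sketch.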

MZ twins have R=1. DZ twins and full siblings have R=.5. Half siblings have R=.25. Typical first cousins have R=.125. Unrelated subjects have R=0 (this occasionally happens for Housemates pairs, but never for the other relationship paths). Other R coefficients are possible.
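The common coefficients listed above can be turned into readable labels with a named lookup vector. This is a minimal sketch. The label wording is ours, and r_values is a hypothetical stand-in for a column such as Links97Pair$R. Matching on as.character() works here because these coefficients are exact binary fractions, and any value outside the list maps to NA.

```r
# Common relatedness coefficients described above, keyed by their printed value.
# Note: R = .125 also arises for other, less common pair types.
r_lookup <- c(
  "1"     = "MZ twins",
  "0.5"   = "DZ twins or full siblings",
  "0.25"  = "Half siblings",
  "0.125" = "Typical first cousins",
  "0"     = "Unrelated"
)

# Hypothetical coefficients standing in for a real R column.
r_values <- c(0.5, 0.25, 0.5, 1, 0)

# Look up each value; anything not in r_lookup would become NA.
pair_labels <- unname(r_lookup[as.character(r_values)])
table(pair_labels)
```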

There are several other, uncommon possibilities, such as half-cousins (R=.0625) and ambiguous aunt-nieces (R=.125, the average of 1/4 and 0). The genetic-relatedness variable R in Links97Pair contains only the common values of R, whose groups are likely to have stable estimates; some of the excluded groups are too small to be estimated reliably. However, the variable RFull in Links97PairExpanded contains all R values. We strongly recommend using R in this data frame. Move to RFull (or some combination of the two) only if you have a good reason and are willing to carefully monitor a variety of validity checks.
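Two points in this paragraph can be made concrete. The ambiguous aunt-niece value is simply the mean of the two candidate coefficients, and restricting to common values amounts to keeping only coefficients from a short list. The sketch is illustrative only. r_full is a hypothetical stand-in for RFull, and common_r is our reading of the common values named above. Setting every other value to NA is one assumed way to do the restriction; it does not describe how the package derives R from RFull.

```r
# Ambiguous aunt-niece pair: R is either 1/4 (if related) or 0 (if not);
# the reported coefficient is the average of the two possibilities.
ambiguous_r <- mean(c(1/4, 0))
ambiguous_r  # 0.125

# Hypothetical full-precision coefficients, standing in for RFull.
r_full <- c(0.5, 0.0625, 0.25, 0.375, 0.125)

# Common values named in the text; this sketch sets every other value to NA.
common_r <- c(0, 0.125, 0.25, 0.5, 1)
r_restricted <- ifelse(r_full %in% common_r, r_full, NA_real_)
r_restricted  # values: 0.5, NA, 0.25, NA, 0.125

# How many pairs the restriction drops.
sum(is.na(r_restricted))  # 2
```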

Author(s)

Will Beasley

Source

Information comes from the Summer 2018 release of the NLSY97 sample. Data were extracted with the NLS Investigator (https://www.nlsinfo.org/investigator/).

References

For more information on R (i.e., the relatedness coefficient), please see Rodgers, Joseph Lee, & Kohler, Hans-Peter (2005). Reformulating and simplifying the DF analysis model. Behavior Genetics, 35(2), 211-217.

Examples

library(NlsyLinks) # Load the package into the current R session.
summary(Links97Pair) # Summarize the five variables.
hist(Links97Pair$R) # Display a histogram of the Relatedness coefficients.
table(Links97Pair$R) # Create a table of the Relatedness coefficients for the whole sample.

# Create a dataset of only monozygotic sibs.
mz_sibs <- subset(Links97Pair, R > .9)
summary(mz_sibs) # Summarize the MZ sibs.

NlsyLinks documentation built on Aug. 31, 2025, 5:08 p.m.