trio.check: Check Case-Parent Trio Data for Mendelian Errors


View source: R/trio.check.R

Description

This function checks case-parent trio data in linkage or genotype format for Mendelian errors. If no errors are found, the function returns an object suitable for input to the trio.prepare function. Otherwise, an object identifying the Mendelian errors is returned.

Usage

trio.check(dat, is.linkage=TRUE, replace=FALSE)

Arguments

dat

A matrix or data frame of pedigree data in linkage format, or in genotype format.

If the data are in linkage format, they must follow the standard linkage/pedigree format. Each row describes an individual, and the columns are <famid> <pid> <fatid> <motid> <sex> <affected> <genotype:1_1> <genotype:1_2> ... <genotype:n_1> <genotype:n_2>. Here, <famid> is a unique identifier for each family; <pid> is a unique identifier for an individual within each family; <fatid> and <motid> identify the father and mother of the individual; <sex> denotes the sex, using the convention 1=male, 2=female; and <affected> denotes the disease status (0=unknown, 1=unaffected, 2=affected). Only one phenotype column is allowed. Each genotype is encoded using two columns (<genotype:k_1> and <genotype:k_2>) giving the two alleles (1 for the major allele, 2 for the minor allele, 0 if missing). Any other allele value results in an error. See the data frames trio.ped1 and trio.ped2 in this package for examples of trio data in linkage format (complete and with missing records, respectively).
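To illustrate the allele coding, the sketch below converts the allele columns of a linkage-format data frame into one genotype column per SNP by counting minor alleles (allele 2). The function linkage_to_genotype is hypothetical and is not part of the trio package; treating a genotype as missing whenever either of its alleles is 0 is an assumption of this sketch. The example alleles reproduce the snp9 and snp10 values of family 10001 in the example output.

```r
# Hypothetical helper (not part of the trio package): convert the allele
# columns of a linkage-format data frame into one genotype column per SNP.
linkage_to_genotype <- function(ped) {
  a <- as.matrix(ped[, -(1:6), drop = FALSE])   # drop famid ... affected
  if (ncol(a) %% 2 != 0) stop("allele columns must come in pairs")
  if (!all(a %in% c(0, 1, 2))) stop("alleles must be coded 0, 1 or 2")
  n_snp  <- ncol(a) / 2
  first  <- a[, seq(1, by = 2, length.out = n_snp), drop = FALSE]
  second <- a[, seq(2, by = 2, length.out = n_snp), drop = FALSE]
  geno <- (first == 2) + (second == 2)          # number of minor alleles
  geno[first == 0 | second == 0] <- NA          # assumption: any 0 allele -> NA
  colnames(geno) <- paste0("snp", seq_len(n_snp))
  data.frame(famid = ped[, 1], pid = ped[, 2], geno)
}

ped <- data.frame(famid = 10001, pid = 1:3, fatid = c(0, 0, 1),
                  motid = c(0, 0, 2), sex = c(1, 2, 1), affected = c(1, 1, 2),
                  snp1_1 = c(1, 1, 2), snp1_2 = c(1, 1, 2),
                  snp2_1 = c(1, 2, 1), snp2_2 = c(2, 2, 1))
g <- linkage_to_genotype(ped)
print(g)
```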

If the data are in genotype format, each row describes an individual, and each block of three consecutive rows describes the two parents and the affected child in a trio. The columns are <famid> <pid> <genotype_1> ... <genotype_n>, where <famid> is a unique identifier for each family and <pid> is a unique identifier for an individual within each family. Each <genotype> is encoded as an integer giving the number of variant alleles (0=common homozygote, 1=heterozygote, 2=rare homozygote, NA=missing genotype). See the data frames trio.gen1 and trio.gen2 in this package for examples of trio data in genotype format (complete and with missing records, respectively).
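A minimal sketch of the structural checks implied by the genotype format: the number of rows is a multiple of three, genotypes are 0, 1, 2 or NA, and each block of three rows shares one family ID. The function check_genotype_format is hypothetical and not part of the trio package. It also assumes, consistent with the package's example output, that the two parents occupy the first two rows of each block and the affected child the third.

```r
# Hypothetical helper (not part of the trio package): sanity checks for a
# genotype-format data frame whose first two columns are famid and pid.
# Assumption: in each block of three rows, two parents come first, child third.
check_genotype_format <- function(gen) {
  if (nrow(gen) %% 3 != 0) stop("number of rows must be a multiple of 3")
  geno <- as.matrix(gen[, -(1:2), drop = FALSE])
  if (!all(is.na(geno) | geno %in% c(0, 1, 2)))
    stop("genotypes must be coded 0, 1, 2 or NA")
  trio <- rep(seq_len(nrow(gen) / 3), each = 3)
  # all three rows of a block must carry the same family id
  n_fam <- tapply(gen[[1]], trio, function(f) length(unique(f)))
  if (any(n_fam != 1)) stop("each block of three rows must come from one family")
  data.frame(trio = trio, famid = gen[[1]], pid = gen[[2]],
             role = rep(c("parent", "parent", "child"), times = nrow(gen) / 3))
}

# genotypes taken from the snp5 column of the example output
gen <- data.frame(famid = rep(c(2001, 2002), each = 3), pid = rep(1:3, 2),
                  snp5 = c(0, 0, 1, 1, 0, 2))
idx <- check_genotype_format(gen)
print(idx)
```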

is.linkage

A logical value indicating whether the case-parent trio data are in linkage format (TRUE, the default) or in genotype format (FALSE).

replace

A logical value indicating whether Mendelian errors should be replaced by missing values (default FALSE). If TRUE, then for each Mendelian error found (for a particular trio at a particular locus), all three genotypes of that trio at that locus are replaced by NA, and an object suitable for input to the trio.prepare function is returned.
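The replacement rule can be sketched as follows, assuming each error is located by a row r (the first row of the trio) and a column c (the SNP column), as in the errors element described under Value. The function replace_errors is hypothetical and is not the package's implementation. The example reproduces the replace=TRUE result for snp5 shown in the example output.

```r
# Hypothetical helper (not the package's implementation): given a genotype
# data frame and an error table with columns r (first row of the trio) and
# c (column of the SNP), set all three genotypes of that trio at that SNP to NA.
replace_errors <- function(geno, errors) {
  if (is.null(errors)) return(geno)       # nothing to replace
  for (i in seq_len(nrow(errors))) {
    rows <- errors$r[i] + 0:2             # the three rows of the trio
    geno[rows, errors$c[i]] <- NA
  }
  geno
}

gen <- data.frame(famid = rep(c(2001, 2002), each = 3), pid = rep(1:3, 2),
                  snp5 = c(0, 0, 1, 1, 0, 2))
errs <- data.frame(r = c(1, 4), c = c(3, 3))
fixed <- replace_errors(gen, errs)
print(fixed)
```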

Details

trio.check should always be the first function from this package applied to the data. By default, it assumes that the data are in linkage format; genotype-format data can be used by setting is.linkage=FALSE. If no Mendelian inconsistencies are found, trio.check creates an object that can be passed to trio.prepare for the subsequent analysis. If the data were in linkage format, the two allele columns for each SNP are converted into a single variable giving the number of variant alleles.

To derive the genotypes of the pseudo-controls in the subsequent analysis, the trio data must not contain any Mendelian errors. When Mendelian errors are encountered in the supplied trio data, trio.check reports them (printing "Found Mendelian error(s).") and returns an R object with the relevant information. It is the user's responsibility to find the cause of the Mendelian errors and correct them, if possible. However, Mendelian inconsistencies are often due to genotyping errors, so it might not be possible to correct them in a straightforward manner. In this case, the user might want to encode the genotypes that cause these Mendelian errors in the affected trios as missing data. trio.check allows for this with the argument replace=TRUE.
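For genotypes coded as the number of variant alleles, a trio is Mendelian-consistent at a locus when the child's count can be written as the sum of one allele transmitted by each parent: a parent with genotype 0 can transmit only 0 variant alleles, 1 can transmit 0 or 1, and 2 can transmit only 1. The sketch below encodes this rule; mendel_consistent is a hypothetical name, not a trio package function, and skipping trios with any missing genotype is a simplification of this sketch.

```r
# Hypothetical helper (not the package's code): is the child's genotype
# (0, 1 or 2 variant alleles) compatible with the parents' genotypes?
mendel_consistent <- function(father, mother, child) {
  if (anyNA(c(father, mother, child))) return(TRUE)   # simplification: skip missing
  # each parent transmits 0 or 1 variant allele; compute the possible range
  lo <- max(0, father - 1) + max(0, mother - 1)
  hi <- min(1, father) + min(1, mother)
  child >= lo && child <= hi
}

mendel_consistent(0, 0, 2)  # FALSE: neither parent carries the variant
mendel_consistent(1, 0, 1)  # TRUE
mendel_consistent(2, 2, 1)  # FALSE: both parents must transmit the variant
```

The first call corresponds to snp9 of family 10001 in the example output (parents 0 and 0, child 2), one of the errors reported there.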

Value

The function trio.check returns a list with the following elements:

trio

A data frame with the genotypes of the trios, suitable for input to the function trio.prepare. This element is NULL if Mendelian errors are detected and replace=FALSE.

errors

This element is NULL if no Mendelian errors are detected. Otherwise, it is a data frame with five columns identifying the Mendelian errors detected in dat: the trio (trio), the family ID (famid), the SNP (snp), the row number (r), and the column number (c). In the example output, r is the first row of the affected trio and c is the column of the SNP in trio.err.
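A sketch of how an error table with these five columns could be assembled from genotype-format data, scanning every trio and SNP. The function find_mendel_errors is hypothetical and not the package's implementation; it assumes the parents occupy the first two rows and the child the third row of each block, skips trios with missing genotypes, and reads r and c as the first row of the trio and the SNP's column, which is an interpretation of the example output.

```r
# Hypothetical sketch (not the trio package's implementation): list Mendelian
# errors in a genotype-format data frame (famid, pid, then SNP columns).
find_mendel_errors <- function(gen) {
  ok <- function(f, m, k) {
    if (anyNA(c(f, m, k))) return(TRUE)               # skip incomplete trios
    k >= max(0, f - 1) + max(0, m - 1) && k <= min(1, f) + min(1, m)
  }
  snp_cols <- seq_len(ncol(gen) - 2) + 2
  out <- list()
  for (tr in seq_len(nrow(gen) / 3)) {
    r <- 3 * (tr - 1) + 1                             # first row of this trio
    for (j in seq_along(snp_cols)) {
      g <- gen[r:(r + 2), snp_cols[j]]                # parent, parent, child
      if (!ok(g[1], g[2], g[3])) {
        out[[length(out) + 1]] <- data.frame(trio = tr, famid = gen[r, 1],
                                             snp = j, r = r, c = snp_cols[j])
      }
    }
  }
  if (length(out) == 0) return(NULL)                  # no errors found
  do.call(rbind, out)
}

gen <- data.frame(famid = rep(c(10001, 10002), each = 3), pid = rep(1:3, 2),
                  snp1 = c(0, 0, 2, 1, 1, 1),
                  snp2 = c(1, 2, 0, 0, 0, 0))
errs <- find_mendel_errors(gen)
print(errs)
```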

trio.err

This element is NULL if no Mendelian errors are detected (in the example output the returned list then has only two elements, and trio.tmp$trio.err evaluates to NULL). Otherwise, it is a data frame with the trio genotype data. If the input was in linkage format, the data are converted from alleles to genotypes. If the input was in genotype format, this element is identical to the input.

Author(s)

Qing Li, mail2qing@yahoo.com

References

Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.

See Also

trio.prepare

Examples

library(trio)
data(trio.data)
trio.tmp <- trio.check(dat=trio.ped1)
str(trio.tmp, max.level=1)
trio.tmp$trio[1:6,]

trio.tmp <- trio.check(dat=trio.ped.err)
str(trio.tmp, max.level=1)
trio.tmp$errors
trio.tmp$trio.err[1:3, c(1,2, 11:12)]
trio.ped.err[1:3,c(1:2, 23:26)]

trio.tmp <- trio.check(dat=trio.gen.err, is.linkage=FALSE)
trio.tmp$errors
trio.tmp$trio.err[1:6, c(1,2,7), drop=FALSE]

trio.rep <- trio.check(dat=trio.gen.err, is.linkage=FALSE, replace=TRUE)
trio.rep$trio[1:6,c(1,2,7)]

Example output

List of 2
 $ trio  :'data.frame':	300 obs. of  12 variables:
 $ errors: NULL
  famid pid snp1 snp2 snp3 snp4 snp5 snp6 snp7 snp8 snp9 snp10
1 10001   1    0    0    1    1    1    1    0    0    0     0
2 10001   2    0    2    0    0    2    0    0    0    0     0
3 10001   3    0    1    1    1    1    1    0    0    0     0
4 10002   1    0    0    2    2    0    2    0    0    0     1
5 10002   2    0    2    0    0    2    0    0    0    0     0
6 10002   3    0    1    1    1    1    1    0    0    0     0
[1] "Found Mendelian error(s)."
List of 3
 $ trio    : NULL
 $ errors  :'data.frame':	4 obs. of  5 variables:
 $ trio.err:'data.frame':	300 obs. of  12 variables:
  trio famid snp r  c
1    1 10001   9 1 11
2    1 10001  10 1 12
3    2 10002  10 4 12
4    3 10003  10 7 12
  famid pid snp9 snp10
1 10001   1    0     1
2 10001   2    0     2
3 10001   3    2     0
  famid pid snp9_1 snp9_2 snp10_1 snp10_2
1 10001   1      1      1       1       2
2 10001   2      1      1       2       2
3 10001   3      2      2       1       1
[1] "Found Mendelian error(s)."
  trio famid snp r c
1    1  2001   5 1 7
2    2  2002   5 4 7
   famid pid snp5
6   2001   1    0
7   2001   2    0
5   2001   3    1
9   2002   1    1
10  2002   2    0
8   2002   3    2
   famid pid snp5
6   2001   1   NA
7   2001   2   NA
5   2001   3   NA
9   2002   1   NA
10  2002   2   NA
8   2002   3   NA

trio documentation built on Nov. 8, 2020, 7:41 p.m.