vcf2geno: Transformation of VCF File

Description

Transforms a VCF file into a matrix in the genotype format required by, e.g., the functions for computing the genotypic TDT.

Usage

vcf2geno(vcf, ped, none = "0/0", one = c("0/1"), both = "1/1", na.string = ".",
    use.rownames = FALSE, allowDifference = FALSE, removeMonomorphic = TRUE,
    removeNonBiallelic = TRUE, changeMinor = FALSE)

Arguments

vcf

a matrix resulting from reading a VCF file into R, or an object of class CollapsedVCF (e.g. the output of the function readVcf from the VariantAnnotation package). If use.rownames = FALSE, the column names of the genotype matrix must correspond to the personal IDs in ped: the column pid of ped if the entries in pid are unique, and otherwise the columns famid and pid of ped joined by an underscore. If use.rownames = TRUE, the column names of the genotype matrix specified by vcf must correspond to the row names of ped.

ped

a data frame containing the family information for the subjects in vcf (might also contain information for other subjects, see allowDifference). This data frame must contain the columns famid, pid, fatid, and motid comprising the family ID, the personal ID as well as the ID of the father and the mother, respectively.
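The ID rule described for vcf (use pid if it is unique, otherwise famid and pid joined by an underscore) can be sketched in base R as follows. The helper sample_ids and the toy pedigree are hypothetical and only illustrate the rule; they are not part of the trio package and may differ from what vcf2geno does internally.

```r
# Toy pedigree with the four required columns (all values are made up).
ped <- data.frame(
  famid = c("F1", "F1", "F1", "F2", "F2", "F2"),
  pid   = c("1", "2", "3", "1", "2", "3"),
  fatid = c("0", "0", "1", "0", "0", "1"),
  motid = c("0", "0", "2", "0", "0", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return pid if its entries are unique,
# otherwise famid and pid joined by an underscore.
sample_ids <- function(ped) {
  if (anyDuplicated(ped$pid) == 0) {
    ped$pid
  } else {
    paste(ped$famid, ped$pid, sep = "_")
  }
}

ids <- sample_ids(ped)
print(ids)
```

Since pid repeats across the two families in this toy pedigree, the sample names expected in vcf would be of the form famid_pid.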

none

a character string or vector specifying the coding for the homozygous reference genotype.

one

a character string or vector specifying the coding for the heterozygous genotype.

both

a character string or vector specifying the coding for the homozygous variant genotype.

na.string

a character string or vector specifying how missing values are coded in the vcf file.

use.rownames

a logical value specifying whether the row names of ped correspond to the sample names in vcf. For details, see vcf.

allowDifference

a logical value specifying whether ped and vcf are allowed to also contain samples not available in the respective other object. If FALSE, all samples in ped must also be available in vcf, and vice versa (matched as described in vcf). If TRUE, at least 10% of the samples must be contained in both vcf and ped.
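To check in advance how well the samples in vcf and ped agree, the two sets of IDs can be compared with base R set operations. This is only a sketch with made-up sample names. The description above does not state which sample count the 10% refers to, so the shared fraction is reported relative to both objects instead of assuming one.

```r
# Made-up sample names; in practice these would be the column names of the
# genotype matrix and the IDs derived from ped.
vcf_samples <- c("F1_1", "F1_2", "F1_3", "F9_1")
ped_samples <- c("F1_1", "F1_2", "F1_3", "F2_1", "F2_2", "F2_3")

shared   <- intersect(vcf_samples, ped_samples)
only_vcf <- setdiff(vcf_samples, ped_samples)
only_ped <- setdiff(ped_samples, vcf_samples)

# With allowDifference = FALSE, both difference sets would have to be empty.
strict_ok <- length(only_vcf) == 0 && length(only_ped) == 0

# Shared fraction relative to each object (denominator is an assumption,
# so both are shown).
frac_vcf <- length(shared) / length(vcf_samples)
frac_ped <- length(shared) / length(ped_samples)

print(list(strict_ok = strict_ok, frac_vcf = frac_vcf, frac_ped = frac_ped))
```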

removeMonomorphic

a logical value specifying whether monomorphic SNVs should be removed from the output.

removeNonBiallelic

a logical value specifying whether SNVs showing genotypes other than those specified by none, one, and both (and which are, therefore, assumed to have more than two alleles) should be removed.
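The filter described for removeNonBiallelic can be illustrated on a small genotype matrix. The sketch below assumes variants in rows and samples in columns, uses the default codings, and treats the missing-value code as allowed; it is an illustration in base R, not the code used by vcf2geno.

```r
# Toy genotype matrix: rows are SNVs, columns are samples (values made up).
geno <- matrix(
  c("0/0", "0/1", "1/1",
    "0/0", "0/2", "0/1",
    "0/1", ".",   "0/0"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("snv1", "snv2", "snv3"), c("s1", "s2", "s3"))
)

# Default codings for none, one, both, plus the default na.string.
allowed <- c("0/0", "0/1", "1/1", ".")

# An SNV is flagged if any of its genotypes falls outside the allowed set.
non_biallelic <- apply(geno, 1, function(g) any(!g %in% allowed))

kept <- geno[!non_biallelic, , drop = FALSE]
print(rownames(kept))
```

Here snv2 is flagged because of the genotype "0/2", which points to a third allele.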

changeMinor

a logical value specifying whether the coding of the genotypes should be reversed for SNVs for which the default coding leads to a minor allele frequency larger than 0.5. By default, the genotypes are coded by the number of (presumed) minor alleles, i.e. the genotype(s) specified by none are coded by 0, the genotype(s) specified by one by 1, and the genotype(s) specified by both by 2. If this coding gives the counted allele a frequency larger than 0.5 for an SNV, so that it is in fact the major allele, and changeMinor = TRUE, the 0, 1, 2-coding is changed into a 2, 1, 0-coding for this SNV.
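The recoding controlled by changeMinor can be sketched for a single SNV as follows. The function code_genotypes is a hypothetical helper written for this illustration, not a function of the trio package. It assumes that the frequency of the counted allele is estimated as half the mean of the 0, 1, 2 codes over the non-missing samples.

```r
# Hypothetical helper: code the genotypes of one SNV as 0, 1, 2 and
# optionally flip the coding if the counted allele is the major allele.
code_genotypes <- function(g, none = "0/0", one = "0/1", both = "1/1",
                           changeMinor = FALSE) {
  # Anything not matching none, one, or both (including the missing-value
  # code) stays NA in this sketch.
  x <- rep(NA_integer_, length(g))
  x[g %in% none] <- 0L
  x[g %in% one]  <- 1L
  x[g %in% both] <- 2L

  # Estimated frequency of the counted allele (assumption: mean code / 2).
  freq <- mean(x, na.rm = TRUE) / 2

  # Flip 0, 1, 2 into 2, 1, 0 if that frequency exceeds 0.5.
  if (changeMinor && !is.na(freq) && freq > 0.5) {
    x <- 2L - x
  }
  x
}

g <- c("1/1", "1/1", "0/1", "0/0", ".")
default_coding <- code_genotypes(g)
flipped_coding <- code_genotypes(g, changeMinor = TRUE)
print(rbind(default_coding, flipped_coding))
```

In this toy SNV the counted allele has an estimated frequency of 0.625, so the coding is reversed when changeMinor = TRUE.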

Value

A matrix in genotype format required, e.g., by functions for performing different types of the genotypic TDT, such as colTDT.

Author(s)

Holger Schwender, holger.schwender@udo.edu

See Also

colTDT, colGxG, colGxE, ped2geno


trio documentation built on Nov. 8, 2020, 7:41 p.m.