getgenotypesdos: fetch dosage integer matrix for specified markers

View source: R/mega2rcreate.R

getgenotypesdos    R Documentation

fetch dosage integer matrix for specified markers

Description

This function is a wrapper around a C++ function that does all the heavy lifting. It assembles the arguments the C++ function needs: some come from the caller's arguments and some from data frames stored in the "global" environment, envir. From its markers_arg argument, it takes each marker's locus index and its index in the unified_genotype_table. From envir, it takes a bit vector of compressed genotype information and some bookkeeping data. Note: this function also dispatches on the type of compression used in the genotype vector; a different C++ function is called when the data are compressed than when they are not.

Usage

getgenotypesdos(markers_arg, envir = ENV)

Arguments

markers_arg

a data.frame with the following 5 variables (columns):

locus_link

is the ordinal ranking of this marker among all loci

locus_link_fill

is the position of corresponding genotype data in the unified_genotype_table

MarkerName

is the text name of the marker

chromosome

is the integer chromosome number

position

is the integer base pair position of the marker

envir

an environment that contains all the data frames created from the SQLite database.
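As a rough illustration of the shape markers_arg is expected to have, the sketch below builds a small data.frame by hand with the five columns listed above. All values are made up (including the marker names and whether the index columns start at 0 or 1); in practice the rows would be taken from the markers data frame in envir, as in the Examples section.

```r
# Hypothetical marker table; every value here is invented for illustration.
markers_arg <- data.frame(
  locus_link      = c(0L, 1L, 2L),          # rank of the marker among all loci
  locus_link_fill = c(0L, 1L, 2L),          # position in unified_genotype_table
  MarkerName      = c("rs0001", "rs0002", "rs0003"),
  chromosome      = c(1L, 1L, 2L),
  position        = c(10500L, 20750L, 5000L),
  stringsAsFactors = FALSE
)

# Row subsetting keeps all five columns, which is the form the function expects.
chr1 <- markers_arg[markers_arg$chromosome == 1L, ]
```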

Details

The unified_genotype_table contains one raw vector for each person. In each vector, there are two bits for each genotype. This function builds the output matrix one marker at a time: for a given marker it collects the genotype of every person, then repeats for each of the specified markers.
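The two-bits-per-genotype storage can be sketched in plain R. This is only an illustration, not the package's C++ code: the bit order within a byte (lowest bits first), the helper name unpack2bit, and the toy data are all assumptions, and the mapping from 2-bit codes to dosages is not shown because the text above does not specify it.

```r
# Assumed layout: four 2-bit codes per byte, lowest-order bits first.
unpack2bit <- function(rawvec, n) {
  bytes <- as.integer(rawvec)
  idx   <- seq_len(n) - 1L            # 0-based genotype index
  byte  <- bytes[idx %/% 4L + 1L]     # byte holding each genotype
  slot  <- idx %% 4L                  # which 2-bit slot inside that byte
  as.integer((byte %/% 4^slot) %% 4)  # extract the 2-bit code (0..3)
}

# Toy data: one raw vector per person, five markers each.
people <- list(
  as.raw(c(0xE4, 0x01)),
  as.raw(c(0x1B, 0x02))
)
n_markers <- 5L

# Fix a marker, collect the code for every person, repeat for all markers.
codes <- vapply(seq_len(n_markers), function(m) {
  vapply(people, function(p) unpack2bit(p, n_markers)[m], integer(1))
}, integer(length(people)))
```

The result, codes, has one row per person and one column per marker, matching the orientation of the geno matrix described under Value.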

Value

a list of 3 values, named "ncol", "zero", "geno".

geno

is a matrix of dosages as integers. The value 0 is given to the homozygous Major allele genotype, 1 to the heterozygote, and 2 to the homozygous Minor allele genotype. In the matrix, there is usually one column for each marker in the markers_arg argument. But if a column would contain only a single value, all 0 or all 2, the column is omitted from the matrix. There is one row for each person in the family (fam) table.

ncol

is the number of columns actually present in the geno matrix.

zero

is a vector with one entry per marker. The value is 0 if the marker is not in the geno matrix. Otherwise the value is the column number in the geno matrix where the marker data appears.
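How the three values relate can be shown with a hand-built stand-in for the returned list. This is a sketch under stated assumptions, not output from getgenotypesdos: the matrix full and the filtering code are invented to mimic the rule described above (columns that are all 0 or all 2 are left out).

```r
# Toy dosages before filtering: 4 people (rows) x 4 markers (columns).
full <- matrix(c(0L, 1L, 2L, 0L,   # marker 1: varies, kept
                 0L, 0L, 0L, 0L,   # marker 2: all 0, omitted
                 1L, 1L, 0L, 2L,   # marker 3: varies, kept
                 2L, 2L, 2L, 2L),  # marker 4: all 2, omitted
               nrow = 4)

# Keep a column unless it is entirely 0 or entirely 2.
keep <- apply(full, 2, function(col) !(all(col == 0L) || all(col == 2L)))

# Mock result with the same three names as the documented return value.
res <- list(
  ncol = sum(keep),
  zero = ifelse(keep, cumsum(keep), 0L),  # 0 = omitted, else column in geno
  geno = full[, keep, drop = FALSE]
)

# Look up the dosages of the third requested marker via zero.
col3 <- res$zero[3]
marker3 <- if (col3 > 0) res$geno[, col3] else NULL
```

With this layout, zero lets a caller translate a row of markers_arg into a column of geno without assuming that every requested marker survived.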

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

getgenotypesdos(ENV$markers[ENV$markers$chromosome == 1,])


Mega2R documentation built on May 29, 2024, 1:14 a.m.