new_acset: Construct an allele count set


View source: R/scphaser.R

Description

new_acset constructs an acset list, the data structure used by the scphaser phasing functions.

Usage

new_acset(featdata, refcount = NA, altcount = NA, phenodata = NA,
  gt = NA)

Arguments

featdata

Data-frame with four required columns and any number of additional columns. It maps variants to features and specifies the two alleles of each variant. The four required columns must be named 'feat', 'var', 'ref' and 'alt', and the rownames must be identical to the 'var' column. 'feat' is a character vector of feature names, such as gene names. 'var' is a character vector of variant names, such as dbSNP rs IDs. 'ref' and 'alt' are character vectors giving the two alleles of each variant, such as the reference and alternative allele.
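As a minimal sketch of the layout described above, the following builds a small featdata data-frame in base R. The gene names, variant ids and alleles are made up for illustration, and the two checks at the end only restate the requirements from the description; they are not the package's own validation code.

```r
## three variants mapped to two features (all names are hypothetical)
featdata <- data.frame(
  feat = c("geneA", "geneA", "geneB"),
  var  = c("rs1", "rs2", "rs3"),
  ref  = c("A", "C", "G"),
  alt  = c("G", "T", "A"),
  stringsAsFactors = FALSE
)

## the rownames must be identical to the 'var' column
rownames(featdata) <- featdata$var

## restate the documented requirements as assertions
stopifnot(all(c("feat", "var", "ref", "alt") %in% colnames(featdata)))
stopifnot(identical(rownames(featdata), featdata$var))
```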

refcount

Matrix with allelic counts for the reference allele, with variants as rows and cells as columns. The rownames must match the values in the 'var' column of featdata, and the colnames must match the values in the 'sample' column of phenodata. If refcount is provided, altcount must also be provided. If gt is provided, refcount and altcount are not required.

altcount

Matrix with allelic counts for the alternative allele, where the row- and col-names must match those of refcount.
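The count-based input can be sketched as follows. The counts, variant ids and cell names are invented for illustration. The call to new_acset is guarded so that the snippet also runs where scphaser is not installed, and it assumes that new_acset accepts these small toy matrices without further filtering.

```r
vars  <- c("rs1", "rs2", "rs3")
cells <- c("cell1", "cell2", "cell3", "cell4")

## feature annotation with rownames identical to 'var' (hypothetical values)
featdata <- data.frame(
  feat = c("geneA", "geneA", "geneB"),
  var  = vars,
  ref  = c("A", "C", "G"),
  alt  = c("G", "T", "A"),
  row.names = vars,
  stringsAsFactors = FALSE
)

## variants as rows, cells as columns
refcount <- matrix(c(10, 0, 8, 0,
                      0, 12, 0, 9,
                      7, 0, 11, 0),
                   nrow = 3, byrow = TRUE, dimnames = list(vars, cells))
altcount <- matrix(c(0, 9, 0, 11,
                     8, 0, 10, 0,
                     0, 6, 0, 12),
                   nrow = 3, byrow = TRUE, dimnames = list(vars, cells))

## row- and col-names of the two matrices must agree, and match featdata$var
stopifnot(identical(dimnames(refcount), dimnames(altcount)))
stopifnot(identical(rownames(refcount), featdata$var))

## construct the acset only if scphaser is available
if (requireNamespace("scphaser", quietly = TRUE)) {
  acset <- scphaser::new_acset(featdata, refcount = refcount,
                               altcount = altcount)
}
```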

phenodata

Data-frame which annotates the cells. It must contain a column named 'sample'. If the phenodata argument is not provided, it is created by the new_acset function with the 'sample' column set to be identical to the column names of refcount or gt.
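A small sketch of a user-supplied phenodata, assuming that columns beyond 'sample' are permitted. The 'batch' column and the cell names are hypothetical; the only documented requirement is the 'sample' column, whose values correspond to the column names of the count or genotype matrix.

```r
cells <- c("cell1", "cell2", "cell3")

## placeholder count matrix, used here only for its column names
refcount <- matrix(0L, nrow = 2, ncol = 3,
                   dimnames = list(c("rs1", "rs2"), cells))

## one row per cell; 'batch' is an illustrative extra annotation
phenodata <- data.frame(
  sample = colnames(refcount),
  batch  = c("b1", "b1", "b2"),
  stringsAsFactors = FALSE
)

## the 'sample' values line up with the matrix column names
stopifnot(identical(phenodata$sample, colnames(refcount)))
```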

gt

Matrix with integer values representing transcribed genotype calls. 0: reference allele most highly expressed, 1: bi-allelic expression with similar degree of expression from the two alleles, 2: alternative allele most highly expressed. NA values are allowed and can be used to indicate entries where no call could be made. The rownames must match the values in the 'var' column of featdata, and the colnames must match the values in the 'sample' column of phenodata. If refcount and altcount are provided, gt does not need to be provided.
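To make the encoding concrete, here is a toy gt matrix that uses all three call values plus NA. The variant and cell names are made up, and the assertion simply restates the allowed values listed above.

```r
vars  <- c("rs1", "rs2", "rs3")
cells <- c("cell1", "cell2", "cell3", "cell4")

## 0 = reference allele, 1 = bi-allelic, 2 = alternative allele, NA = no call
gt <- matrix(c(0L, 2L, NA, 0L,
               2L, 0L, 2L, NA,
               1L, 1L, 0L, 2L),
             nrow = 3, byrow = TRUE, dimnames = list(vars, cells))

## every entry must be one of the allowed values
stopifnot(all(gt %in% c(0L, 1L, 2L, NA)))

## tabulate the calls, including the entries without a call
table(gt, useNA = "ifany")
```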

Details

The function performs a number of error-checks to ensure that the constructed acset-list elements satisfy the data-format used by the phasing functions.
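The kind of checks implied by the argument descriptions can be sketched in base R as below. check_acset_inputs is a hypothetical helper written for illustration; it is not part of scphaser and does not reproduce the checks that new_acset actually performs.

```r
## hypothetical helper: validates inputs against the documented requirements
check_acset_inputs <- function(featdata, refcount = NULL, altcount = NULL,
                               gt = NULL) {
  required <- c("feat", "var", "ref", "alt")
  if (!all(required %in% colnames(featdata)))
    stop("featdata must contain the columns: ",
         paste(required, collapse = ", "))
  if (!identical(rownames(featdata), as.character(featdata$var)))
    stop("rownames(featdata) must be identical to featdata$var")

  ## refcount and altcount must be given together
  if (is.null(refcount) != is.null(altcount))
    stop("refcount and altcount must be provided together")
  if (is.null(gt) && is.null(refcount))
    stop("provide either gt, or both refcount and altcount")
  if (!is.null(refcount) &&
      !identical(dimnames(refcount), dimnames(altcount)))
    stop("refcount and altcount must have identical row- and col-names")

  ## every supplied matrix must have the variants as rownames
  for (m in list(refcount, altcount, gt)) {
    if (!is.null(m) && !identical(rownames(m), rownames(featdata)))
      stop("matrix rownames must match featdata$var")
  }
  TRUE
}

## usage with toy inputs (names are made up)
vars <- c("rs1", "rs2")
featdata <- data.frame(feat = c("geneA", "geneA"), var = vars,
                       ref = c("A", "C"), alt = c("G", "T"),
                       row.names = vars, stringsAsFactors = FALSE)
gt <- matrix(c(0L, 2L, 2L, 0L), nrow = 2,
             dimnames = list(vars, c("cell1", "cell2")))
check_acset_inputs(featdata, gt = gt)
```

Failing early with a specific message makes a mismatch in names or dimensions easier to trace than an error raised later inside a phasing function.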

Value

acset An acset list which contains elements required to apply the phasing functions.

Examples

##create a small artificial genotype matrix
ncells = 10
paternal = c(0, 2, 0, 0, 2)
maternal = c(2, 0, 2, 2, 0)
gt = as.matrix(as.data.frame(rep(list(paternal, maternal), ncells / 2)))
vars = 1:nrow(gt)
colnames(gt) = 1:ncells
rownames(gt) = vars

##create a feature annotation data-frame
nvars = nrow(gt)
featdata = as.data.frame(matrix(cbind(rep('jfeat', nvars),
    as.character(1:nvars), rep('dummy', nvars), rep('dummy', nvars)),
    ncol = 4, dimnames = list(vars, c('feat', 'var', 'ref', 'alt'))),
    stringsAsFactors = FALSE)

##create acset
acset = new_acset(featdata, gt = gt)

scphaser documentation built on May 29, 2017, 3:49 p.m.