GUSbase: Genotyping Uncertainty with Sequencing Data: Base Package

readRA

R Documentation

Read a Reference/Alternate (RA) file.

Description

Reads an RA file and processes it into an RA object.

Usage

readRA(rafile, snpsubset = NULL, sampthres = 0.01, excsamp = NULL, ...)

Arguments

`rafile`	Character string giving the path to the RA file to be read into R. Typically the required string is returned from the VCFtoRA function when the VCF file is converted to RA format.
`snpsubset`	Integer vector giving the indices of the SNPs from the RA file to be read in. These indices correspond to the rows of the RA file (excluding the header row).
`sampthres`	A numeric value giving the threshold below which individual samples are removed. The default is 0.01, which means that samples whose average number of reads per SNP is less than 0.01 are removed.
`excsamp`	A character vector of the sample IDs that are to be excluded (or discarded). Note that the sample IDs must correspond to those given in the RA file that is to be processed.
`...`	Additional arguments (not used).
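
As a rough illustration of what `sampthres` and `excsamp` mean, the following base R sketch applies the same two rules to a small made-up depth matrix. The matrix `depth`, the sample names S1 to S3 and the variable names are all hypothetical, and this is not the code readRA itself runs; it only mirrors the rule described above (remove samples whose mean read depth per SNP is below the threshold, and remove samples listed by ID).

```r
# Hypothetical read-depth matrix (reference + alternate reads):
# rows are SNPs, columns are samples. Values are made up for illustration.
depth <- matrix(c(5, 1, 0,   # S1
                  3, 8, 2,   # S2
                  0, 0, 0),  # S3 (no reads at any SNP)
                nrow = 3,
                dimnames = list(NULL, c("S1", "S2", "S3")))

sampthres <- 0.01   # same value as the readRA default
excsamp   <- "S2"   # hypothetical sample ID to exclude

# Average number of reads per SNP for each sample
mean_depth <- colMeans(depth)

# Keep samples at or above the threshold (those below it are removed)
filtered <- depth[, mean_depth >= sampthres, drop = FALSE]

# Additionally drop any samples named in excsamp
final <- filtered[, !(colnames(filtered) %in% excsamp), drop = FALSE]

print(colnames(filtered))
print(colnames(final))
```

In this sketch S3 is dropped by the threshold because it has no reads, and S2 is dropped because it is named in `excsamp`.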

Details

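RA format is a tab-delimited format with columns CHROM, POS, SAMPLES, where SAMPLES consists of one column per sample. Each sample column is headed by its sample ID, which typically consists of a colon-delimited sampleID, flowcellID, lane, and seqlibID. Each entry gives the reference and alternate allele read counts for that sample at that SNP, separated by a comma, e.g.,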

CHROM	POS	999220:C4TWKACXX:7:56	999204:C4TWKACXX:7:56
1	415	5,0	0,3
1	443	1,0	4,4
1	448	0,0	0,2

Note: Indels and multiple alternative alleles are removed, and ./. (a missing genotype) is translated into 0,0.
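
To make the layout concrete, here is a minimal base R sketch that parses the example table above into separate reference and alternate count matrices. It does not use readRA and is not how the package reads the file internally; the object names (`ra_text`, `ra`, `ref`, `alt`) are assumptions chosen for illustration.

```r
# In-memory copy of the example RA table (tab-delimited, one header row)
ra_text <- paste(
  "CHROM\tPOS\t999220:C4TWKACXX:7:56\t999204:C4TWKACXX:7:56",
  "1\t415\t5,0\t0,3",
  "1\t443\t1,0\t4,4",
  "1\t448\t0,0\t0,2",
  sep = "\n"
)

# check.names = FALSE keeps the colon-delimited sample IDs unchanged;
# colClasses = "character" stops the "ref,alt" entries being coerced
ra <- read.delim(text = ra_text, check.names = FALSE,
                 colClasses = "character")

# Drop CHROM and POS, leaving one column of "ref,alt" strings per sample
counts <- as.matrix(ra[, -(1:2)])
parts  <- strsplit(as.vector(counts), ",", fixed = TRUE)

# Label rows by CHROM:POS and columns by sample ID
dn <- list(paste(ra$CHROM, ra$POS, sep = ":"), colnames(counts))

# First number of each pair is the reference count, second the alternate count
ref <- matrix(as.integer(vapply(parts, `[`, character(1), 1)),
              nrow = nrow(counts), dimnames = dn)
alt <- matrix(as.integer(vapply(parts, `[`, character(1), 2)),
              nrow = nrow(counts), dimnames = dn)

print(ref)
print(alt)
```

An entry of 0,0 (for example SNP 1:448 in the first sample) means no reads were observed, which is also how a missing ./. genotype is represented.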

Value

An R6 object of class RA.

Author(s)

Timothy P. Bilton

Examples

file <- simDS()
RAfile <- VCFtoRA(file$vcf)
simdata <- readRA(RAfile)

## Reading in a subset of the data
# Takes SNPs 10 to 30
subset <- readRA(RAfile, snpsubset = 10:30)

# Read in a random set of SNPs
set.seed(675)
subset <- readRA(RAfile, snpsubset = sample(1:1000, size=10))

tpbilton/GUSbase documentation built on March 8, 2024, 1:35 p.m.