getgeninfo: Get information about a gen (IMPUTE2) file

View source: R/getinfo.R

Description

Routine to return information about a gen file. This information is used by other routines to allow for quicker extraction of values from the file.

Usage

getgeninfo(
  genfiles,
  snpcolumns = 1L:5L,
  startcolumn = 6L,
  impformat = 3L,
  chromosome = character(),
  header = c(FALSE, TRUE),
  gz = FALSE,
  index = TRUE,
  snpidformat = 0L,
  sep = c("\t", "\t")
)

Arguments

genfiles

A vector of file names. The first is the name of the gen file. The second is the name of the sample file that contains the subject information.

snpcolumns

Column numbers containing the chromosome, snpid, location, reference allele, and alternate allele, respectively. This must be an integer vector. All values must be positive except for the chromosome. The value for the chromosome may be -1 or 0. -1 indicates that the chromosome value is passed to the routine using the chromosome parameter. 0 indicates that the chromosome value is in the snpid and that the snpid has the format chromosome:other_data. Default value is c(1L, 2L, 3L, 4L, 5L).
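As an illustration of the chromosome:other_data convention used when the chromosome column is 0, the following base R sketch pulls the chromosome out of a few made-up SNP IDs. The IDs and the use of sub() are assumptions for illustration only; this is not necessarily how the package parses the file.

```r
# Hypothetical SNP IDs in the chromosome:other_data format
snpid <- c("1:10000", "1:20000:A:G", "X:5000")

# Everything before the first colon is treated as the chromosome
chromosome <- sub(":.*$", "", snpid)
chromosome
```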

startcolumn

Column number of first column with genetic probabilities or dosages. Must be an integer value. Default value is 6L.

impformat

Number of genetic data values per subject. 1 indicates dosage only, 2 indicates P(g=0) and P(g=1) only, 3 indicates P(g=0), P(g=1), and P(g=2). Default value is 3L.
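The three formats carry related information. The sketch below uses invented probabilities and assumes the usual convention that g counts copies of the alternate allele and that the three genotype probabilities sum to 1. It shows how the third probability and the dosage could be derived from a two-value record; it is not taken from the package source.

```r
# Hypothetical genotype probabilities for three subjects at one SNP
p0 <- c(0.90, 0.10, 0.00)  # P(g=0)
p1 <- c(0.10, 0.80, 0.25)  # P(g=1)

# With impformat = 2, P(g=2) is implied because the probabilities sum to 1
p2 <- 1 - p0 - p1

# With impformat = 1, only the expected alternate allele count is stored
dosage <- p1 + 2 * p2
dosage
```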

chromosome

Chromosome value to use if the first value of snpcolumns is equal to -1. Default value is character().

header

Indicators of whether the gen and sample files have headers, respectively. If the gen file does not have a header, a sample file must be included. Default value is c(FALSE, TRUE).

gz

Indicator if file is compressed using gzip. Default value is FALSE.

index

Indicator if file should be indexed. This allows for faster reading of the file. Indexing a gzipped file is not supported. Default value is TRUE.

snpidformat

Format to change the snpid to. 0 indicates to use the snpid format in the file. 1 indicates to change the snpid into chromosome:location. 2 indicates to change the snpid into chromosome:location:referenceallele:alternateallele. 3 indicates to change the snpid into chromosome:location_referenceallele_alternateallele. Default value is 0.
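To make the three rewritten formats concrete, this base R sketch builds each one by hand for a single made-up SNP. The variable names and values are hypothetical, and the code only mirrors the formats described above rather than calling the package.

```r
# Hypothetical values for one SNP
chromosome <- "1"
location <- 12345L
ref <- "A"
alt <- "G"

# snpidformat = 1: chromosome:location
id1 <- paste(chromosome, location, sep = ":")

# snpidformat = 2: chromosome:location:referenceallele:alternateallele
id2 <- paste(chromosome, location, ref, alt, sep = ":")

# snpidformat = 3: chromosome:location_referenceallele_alternateallele
id3 <- paste0(chromosome, ":", location, "_", ref, "_", alt)

c(id1, id2, id3)
```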

sep

Separators used in the gen and sample files, respectively. If only one value is provided, it is used for both files. Default value is c("\t", "\t").

Value

List with information about the gen file. This includes family and subject IDs along with a list of the SNPs in the file. Other information needed to read the file is also included.
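One way to explore the returned list without assuming any particular element names is to print its top-level structure with str(). The sketch below reuses the example files shipped with the package and runs only when BinaryDosage is installed. The variable names genfile and samplefile are chosen here for illustration.

```r
# Run only when the BinaryDosage package is installed
if (requireNamespace("BinaryDosage", quietly = TRUE)) {
  genfile <- system.file("extdata", "set3a.imp", package = "BinaryDosage")
  samplefile <- system.file("extdata", "set3a.sample", package = "BinaryDosage")

  geninfo <- BinaryDosage::getgeninfo(genfiles = c(genfile, samplefile),
                                      snpcolumns = c(0L, 2L:5L))

  # Show only the top-level elements of the returned list
  str(geninfo, max.level = 1)
}
```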

Examples

# Get the file names of the gen and sample files
gen3afile <- system.file("extdata", "set3a.imp", package = "BinaryDosage")
gen3ainfo <- system.file("extdata", "set3a.sample", package = "BinaryDosage")

# Get the information about the gen file
geninfo <- getgeninfo(genfiles = c(gen3afile, gen3ainfo),
                      snpcolumns = c(0L, 2L:5L))

BinaryDosage documentation built on Jan. 13, 2020, 5:06 p.m.