extract_snps: Extract snps from imputed chunk files

Description

Extract snps from imputed chunk files

Usage

extract_snps(snps, indir, outdir, chunkmap, idfile, keep = NULL,
             chunkmap_cols = 1:3, pattern = "\\.gz$", ncore = 1L,
             chr_chunk = ".*chr([^_]+)_(\\d+)", quiet = FALSE)

Arguments

snps

Character vector with identifiers of snps to be extracted.

indir

Directory with gz-compressed chunk files.

outdir

Directory where extracted chunk files should go.

chunkmap

Paths to files mapping snps to chromosomes and chunks.

idfile

Path to file listing individual identifiers in the order in which their snp data appears in the imputed chunk files.

keep

Character vector with identifiers of individuals for which to extract snps.

chunkmap_cols

Length-3 integer vector naming the positions of the columns in the ‘chunkmap’ files corresponding to snp identifier, chromosome, and chunk number.

pattern

Regex pattern used to match input chunk files in ‘indir’.

ncore

Number of cores to use in parallel.

chr_chunk

Extended regular expression with two parenthesized subexpressions matching chromosome and chunk number in input chunk file names.

quiet

If FALSE (the default), progress messages are printed to the screen.
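
The defaults of ‘pattern’ and ‘chr_chunk’ can be tried out on file names with base R before running an extraction. The following is only a sketch: the file names are hypothetical, and the parsing via regexec() is an assumption that may differ from what extract_snps does internally.

```r
# Hypothetical chunk file names in a temporary directory.
indir <- tempfile("chunks_")
dir.create(indir)
file.create(file.path(indir, c("study_chr1_1.gz", "study_chr1_2.gz",
                               "study_chrX_10.gz", "README.txt")))

# Default value of 'pattern': keep only files ending in ".gz".
chunk_files <- list.files(indir, pattern = "\\.gz$")

# Default value of 'chr_chunk': first group = chromosome, second = chunk.
chr_chunk <- ".*chr([^_]+)_(\\d+)"
m <- regmatches(chunk_files, regexec(chr_chunk, chunk_files))

# Element 1 of each match is the full match, elements 2 and 3 the groups.
parsed <- data.frame(
    file  = chunk_files,
    chr   = vapply(m, `[`, character(1), 2),
    chunk = as.integer(vapply(m, `[`, character(1), 3)),
    stringsAsFactors = FALSE)
print(parsed)
```

If your file names follow a different scheme, adjust ‘chr_chunk’ until both groups capture the expected values.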

Details

By default, the first 3 columns of the ‘chunkmap’ files are assumed to correspond to the snp identifier, the chromosome, and the chunk number (in that order). If your ‘chunkmap’ files use different columns, you must specify the corresponding columns in ‘chunkmap_cols’. Columns other than the ones named in ‘chunkmap_cols’ are ignored.
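
To illustrate the column selection, the sketch below reads a hypothetical ‘chunkmap’ file that has an extra column in second position and keeps only the three relevant columns. The file contents and the use of read.table() are assumptions made for illustration, not the package's own code.

```r
# Hypothetical chunkmap with an extra 'position' column in second place.
chunkmap_file <- tempfile(fileext = ".txt")
writeLines(c("snp position chr chunk",
             "rs123 10583 1 1",
             "rs456 20345 2 7"), chunkmap_file)

# Same meaning as chunkmap_cols = c(1L, 3L, 4L) in extract_snps():
# snp identifier in column 1, chromosome in column 3, chunk in column 4.
chunkmap_cols <- c(1L, 3L, 4L)

full <- read.table(chunkmap_file, header = TRUE, stringsAsFactors = FALSE)

# All other columns (here: 'position') are dropped.
chunkmap <- full[, chunkmap_cols]
names(chunkmap) <- c("snp", "chr", "chunk")
print(chunkmap)
```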

We assume that a ‘chunkmap’ file has a header if all fields in the first row are of type "character". Therefore, headers of your ‘chunkmap’ files must not contain any numbers in the column names.
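
One way such a header check could look is sketched below. The helper looks_like_header() is hypothetical and not part of genFun; it assumes that a field counts as "character" when as.numeric() cannot convert it, which may not match the package's exact rule.

```r
# Guess whether a file has a header: TRUE if no field in the first row
# can be read as a number. Hypothetical helper, not part of genFun.
looks_like_header <- function(path) {
    fields <- scan(path, what = character(), nlines = 1, quiet = TRUE)
    all(is.na(suppressWarnings(as.numeric(fields))))
}

# Header made up of names only.
with_header <- tempfile()
writeLines(c("snp chr chunk", "rs123 1 1"), with_header)

# No header: the first row already contains data.
without_header <- tempfile()
writeLines(c("rs123 1 1", "rs456 2 7"), without_header)

# A purely numeric column name makes the header look like a data row.
numeric_name <- tempfile()
writeLines(c("snp chr 3", "rs123 1 1"), numeric_name)

print(looks_like_header(with_header))
print(looks_like_header(without_header))
print(looks_like_header(numeric_name))
```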

The ‘idfile’ must not have a header and must contain one identifier per line. The order of individuals in the extracted chunk files will be written to the file ‘order_of_individuals.txt’ in ‘outdir’.
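
Before calling extract_snps it can be useful to check that every identifier in ‘keep’ actually occurs in the ‘idfile’. The sketch below uses made-up identifiers and base R only; it does not show how the package itself treats unknown identifiers.

```r
# Hypothetical 'idfile': one identifier per line, no header.
idfile <- tempfile(fileext = ".txt")
writeLines(c("person_1", "person_2", "person_3"), idfile)

ids  <- readLines(idfile)
keep <- c("person_3", "person_1", "person_9")

# Identifiers in 'keep' that are missing from the 'idfile'.
unknown <- setdiff(keep, ids)

# Positions of the requested individuals within the 'idfile' order.
positions <- which(ids %in% keep)

print(unknown)
print(positions)
```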

Value

Returns ‘NULL’ unless some snps could not be extracted, in which case a data frame is returned. The data frame contains the columns ‘snp’, ‘chr’, ‘chunk’, and ‘comment’. The comment column states the reason why a snp could not be extracted.
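
Calling code therefore has to handle both outcomes. The sketch below shows one possible way to do so; summarise_failures() is a hypothetical helper, and the data frame is a mock with the documented columns whose comment text is invented.

```r
# Summarise the return value of extract_snps(): NULL means success,
# a data frame lists the snps that could not be extracted.
# Hypothetical helper, not part of genFun.
summarise_failures <- function(result) {
    if (is.null(result)) {
        return("all snps extracted")
    }
    sprintf("%d snp(s) not extracted: %s",
            nrow(result), paste(result$snp, collapse = ", "))
}

# Mock return value with the documented columns; the comments are invented.
failed <- data.frame(
    snp     = c("rs123", "rs456"),
    chr     = c("1", "2"),
    chunk   = c(1L, 7L),
    comment = c("made-up reason", "made-up reason"),
    stringsAsFactors = FALSE)

print(summarise_failures(NULL))
print(summarise_failures(failed))
```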

Examples

library(genFun)

## Not run: 
extract_snps(
    snps      = c("rs123", "rs456"),
    indir     = "path/to/directory/with_chunk_files/",
    outdir    = "path/where/extracted/chunk/files/should/go",
    chunkmap  = c("path/to/chunkmap_part_1.txt", "path/to/chunkmap_part_2.txt"),
    idfile    = "path/to/file/with_all_person_ids_in_correct_order.txt",
    keep      = c("person_1", "person_2"),
    chunkmap_cols = c(1L, 3L, 4L),
    ncore     = 6L)

## End(Not run)

cbaumbach/genFun documentation built on May 13, 2019, 1:47 p.m.