partitionVariantFile: Partition the variant file to run in parallel


View source: R/run_annotation.R

Description

partitionVariantFile splits the variant file into smaller partition files, either by a user-defined chunk size (number of rows per file) or by a user-defined number of chunks.
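The two partitioning modes can be illustrated with a minimal base R sketch. It is not the package's implementation. The helper split_rows() and its arguments are hypothetical, and the sketch assumes that exactly one of the two sizes is supplied and that chunk_num partitions should be as equal in size as possible.

```r
# Hypothetical helper (not part of utr.annotation): split row indices 1..n
# either into chunks of chunk_size rows or into chunk_num near-equal chunks.
split_rows <- function(n, chunk_size = NULL, chunk_num = NULL) {
  # Assumption: exactly one of chunk_size / chunk_num is supplied.
  if (is.null(chunk_size) == is.null(chunk_num)) {
    stop("Specify exactly one of chunk_size or chunk_num")
  }
  idx <- seq_len(n)
  if (!is.null(chunk_size)) {
    # Rows 1..chunk_size go to chunk 1, the next chunk_size rows to chunk 2, ...
    groups <- ceiling(idx / chunk_size)
  } else {
    # Spread n rows over chunk_num chunks whose sizes differ by at most one.
    groups <- ((idx - 1) * chunk_num) %/% n + 1
  }
  unname(split(idx, groups))
}

variants <- data.frame(chr = "chr1", pos = 101:110)   # toy data, 10 rows
by_size <- split_rows(nrow(variants), chunk_size = 4)  # chunks of 4, 4, 2 rows
by_num  <- split_rows(nrow(variants), chunk_num = 3)   # chunks of 4, 3, 3 rows
chunks  <- lapply(by_num, function(i) variants[i, , drop = FALSE])
```

Each element of chunks could then be written to its own file in the chunk directory. Splitting by size fixes the rows per file and lets the number of files vary; splitting by number does the opposite.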

Usage

partitionVariantFile(
  variantFile,
  chunkSize = NULL,
  chunkNum = NULL,
  chunkPath = "chunks",
  species,
  ensemblVersion = NULL,
  overwrite = FALSE,
  dataDir = NULL,
  getTranscript = TRUE,
  format = "csv",
  verbose = FALSE
)

Arguments

variantFile

Path to the variant file (CSV by default; see format)

chunkSize

Number of rows (variants) in each chunk file

chunkNum

Number of chunk files to partition the variant file into

chunkPath

(optional) Directory in which to store all partition files. Defaults to a folder named "chunks".

species

Species of the variants: "human" or "mouse"

ensemblVersion

(optional) A number specifying which Ensembl annotation version to use. By default, the latest version is used.

overwrite

(optional) Whether to overwrite chunkPath if it already exists and is not empty. By default (FALSE), it is not overwritten and an error is raised.

dataDir

(optional) Path in which to store database information. If not specified, a folder named after the input variant file with a "db_" prefix is created.

getTranscript

(optional) Whether to retrieve the IDs of the transcripts that overlap the variants. Defaults to TRUE. If the number of variants is very large (for example, more than 100,000), set it to FALSE and retrieve the transcripts in runUTRAnnotation on each partition in parallel instead.

format

(optional) Format of the variant file: "csv" or "vcf". Defaults to "csv".

verbose

Whether to print diagnostic messages. Defaults to FALSE.
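The documented overwrite behaviour for chunkPath can be sketched in base R as follows. This is an illustration only: prepare_chunk_path() is a hypothetical helper rather than a function exported by utr.annotation, and the assumption that overwriting clears the old partition files first is not confirmed by this page.

```r
# Hypothetical helper (not part of utr.annotation) mirroring the documented
# overwrite behaviour for chunkPath.
prepare_chunk_path <- function(chunk_path, overwrite = FALSE) {
  non_empty <- dir.exists(chunk_path) && length(list.files(chunk_path)) > 0
  if (non_empty) {
    if (!overwrite) {
      stop("Directory '", chunk_path, "' exists and is not empty; set overwrite = TRUE")
    }
    # Assumption: overwriting means removing the old partition files first.
    unlink(chunk_path, recursive = TRUE)
  }
  dir.create(chunk_path, recursive = TRUE, showWarnings = FALSE)
  invisible(chunk_path)
}

demo_path <- file.path(tempdir(), "chunks_demo")
unlink(demo_path, recursive = TRUE)
prepare_chunk_path(demo_path)                             # creates an empty folder
file.create(file.path(demo_path, "chunk_1.csv"))          # simulate an earlier run
res <- try(prepare_chunk_path(demo_path), silent = TRUE)  # errors: folder not empty
prepare_chunk_path(demo_path, overwrite = TRUE)           # clears and recreates it
```

Raising an error by default protects partition files from an earlier run from being silently replaced.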

Value

A list of the partition variant files
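Because transcript retrieval can be deferred to each partition (see getTranscript), a natural next step is to process the returned files in parallel. The sketch below assumes the returned list holds paths to CSV files and fakes such a list with two temporary files. annotate_chunk() is a hypothetical placeholder for the per-partition work (for example a call to runUTRAnnotation, whose arguments are not shown on this page).

```r
library(parallel)

# Stand-in for the list returned by partitionVariantFile(): two small CSV
# files in a temporary directory (assumption: elements are file paths).
chunk_dir <- file.path(tempdir(), "parallel_demo")
dir.create(chunk_dir, showWarnings = FALSE)
chunk_files <- list(file.path(chunk_dir, "chunk_1.csv"),
                    file.path(chunk_dir, "chunk_2.csv"))
write.csv(data.frame(chr = "chr1", pos = 1:3), chunk_files[[1]], row.names = FALSE)
write.csv(data.frame(chr = "chr2", pos = 4:5), chunk_files[[2]], row.names = FALSE)

# Hypothetical per-partition worker; here it only counts the variants.
annotate_chunk <- function(path) {
  variants <- read.csv(path)
  nrow(variants)
}

# mclapply() forks worker processes on Unix-alikes; on Windows mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(chunk_files, annotate_chunk, mc.cores = n_cores)
unlist(results)
```

On Windows, parLapply() with a cluster from makeCluster() is the usual alternative to forking.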

Examples

variants_sample <- system.file("extdata", "variants_sample.csv", package = "utr.annotation")

# Partition variants_sample file equally into 3 variant files
# and store them in user specified chunkPath folder
partitionVariantFile(variantFile = variants_sample,
                     chunkNum = 3,
                     chunkPath = "chunks_3",
                     species = "human",
                     ensemblVersion = 93,
                     dataDir = "db_all_variants")

# Partition variants_sample file into smaller variant files each of which contains 7 variants,
# and store them in user specified chunkPath folder
partitionVariantFile(variantFile = variants_sample,
                     chunkSize = 7,
                     chunkPath = "chunks_7_vars",
                     species = "human",
                     ensemblVersion = 93,
                     dataDir = "db_all_variants")
unlink("db_all_variants", recursive = TRUE)
unlink("chunks_3", recursive = TRUE)
unlink("chunks_7_vars", recursive = TRUE)

utr.annotation documentation built on Aug. 23, 2021, 9:06 a.m.