locus: Genomic locus for CNV calls


View source: R/locus.R


Adds genomic locus information to each row of a data-frame consisting of CNV-like entries.


locus(, whichCyto = "remote", bands, assembly = "hg19",
  n.cores = 4)
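
A minimal sketch of how inputs for the call above might be prepared when using a local cytoband table. The object names (cnvs, bands, required) and all coordinates are hypothetical and chosen only for illustration. The call itself is left commented out because it needs the package to be installed, and the first argument is passed by position since its name is not shown in the usage.

```r
# Hypothetical CNV calls with the three required columns.
cnvs <- data.frame(
  chr   = c("1", "1", "2"),            # GRCh style: "1", not "chr1"
  start = c(1500000, 2000000, 500000),
  end   = c(2000000, 3000000, 900000),
  stringsAsFactors = FALSE
)

# Hypothetical local cytoband table (made-up coordinates). The first four
# columns must be, in this order: chromosome, start, end, band name.
bands <- data.frame(
  chr   = c("1", "1", "2"),
  start = c(0, 2300000, 0),
  end   = c(2300000, 5300000, 4400000),
  locus = c("p36.33", "p36.32", "p25.3"),
  stringsAsFactors = FALSE
)

# Basic checks before calling the function.
required <- c("chr", "start", "end")
missing_cols <- setdiff(required, names(cnvs))
stopifnot(length(missing_cols) == 0)      # all required columns present
stopifnot(!any(grepl("^chr", cnvs$chr)))  # no "chr" prefix

# res <- locus(cnvs, whichCyto = "local", bands = bands,
#              assembly = "hg19", n.cores = 2)
```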


a data-frame consisting of CNV calls


whichCyto: "remote" or "local"


bands: data-frame containing genomic locus reference for the selected assembly


assembly: genomic assembly, either "hg18", "hg19" or "hg38"


n.cores: number of usable CPU cores


This function takes as input a data-frame containing CNV calls or similar entries and returns the same data-frame with three additional columns: "loc.str", "loc.end" and "locus". This can be useful in itself, and it is a required step for comparing two datasets with the function inter_comp.

The input must have the following columns:
  "chr": chromosome of the call in GRCh format (i.e. "1", not "chr1")
  "start": start of the call
  "end": end of the call

By default the function attempts to download the required cytobands file for the selected assembly (default is "hg19"). To use a local file instead, pass it as bands and set the whichCyto parameter to "local". If a local file is used, its first four columns must be "chr", "start", "end" and "locus". Column names are not relevant as long as the correct order is maintained.

The function uses a for loop, which is its major bottleneck. To speed up the process, the input dataset is split according to the n.cores parameter and the splits are processed in parallel. As an example, on our system processing a data-frame with ~10500 entries takes around 31 seconds using one core, about 10 seconds using 4 cores and about 4 seconds using 16 cores. The default number of cores is 4, so the function should work with default parameters even on a laptop.
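
The split-and-process-in-parallel approach described above can be sketched roughly as follows. This is not the package's actual code: the helper names (band_at, annotate_chunk), the made-up band coordinates, the half-open interval test, and the format of the "locus" string are all assumptions made for illustration. The sketch relies on parallel::mclapply, which forks only on Unix-like systems, so it falls back to a single core on Windows.

```r
library(parallel)

# Illustrative cytoband table (made-up coordinates).
bands <- data.frame(
  chr   = c("1", "1", "1", "2"),
  start = c(0, 2300000, 5300000, 0),
  end   = c(2300000, 5300000, 7100000, 4400000),
  locus = c("p36.33", "p36.32", "p36.31", "p25.3"),
  stringsAsFactors = FALSE
)

# Illustrative CNV calls.
cnvs <- data.frame(
  chr   = c("1", "1", "2"),
  start = c(1500000, 2000000, 500000),
  end   = c(2000000, 6000000, 900000),
  stringsAsFactors = FALSE
)

# Band containing a single position, or NA if no band matches.
band_at <- function(chr, pos, bands) {
  hit <- bands$locus[bands$chr == chr & bands$start <= pos & pos < bands$end]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Annotate one chunk row by row (the for loop is the slow part).
annotate_chunk <- function(chunk, bands) {
  chunk$loc.str <- NA_character_
  chunk$loc.end <- NA_character_
  chunk$locus   <- NA_character_
  for (i in seq_len(nrow(chunk))) {
    s <- band_at(chunk$chr[i], chunk$start[i], bands)
    e <- band_at(chunk$chr[i], chunk$end[i], bands)
    chunk$loc.str[i] <- s
    chunk$loc.end[i] <- e
    # Assumed label format: "1p36.33" or "1p36.33-p36.31".
    chunk$locus[i] <- if (is.na(s) || is.na(e)) {
      NA_character_
    } else if (s == e) {
      paste0(chunk$chr[i], s)
    } else {
      paste0(chunk$chr[i], s, "-", e)
    }
  }
  chunk
}

# Split the input into at most n.cores contiguous chunks.
n.cores  <- 2
n.chunks <- min(n.cores, nrow(cnvs))
chunk_id <- sort(rep(seq_len(n.chunks), length.out = nrow(cnvs)))
chunks   <- split(cnvs, chunk_id)

# Process the chunks in parallel and bind the results back together.
use.cores <- if (.Platform$OS.type == "windows") 1L else n.cores
parts <- mclapply(chunks, annotate_chunk, bands = bands, mc.cores = use.cores)
res <- do.call(rbind, unname(parts))
rownames(res) <- NULL
```

Because every row is annotated independently of the others, the chunks can be processed in any order and simply concatenated afterwards, which is what makes this kind of splitting straightforward.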




Simone Montalbano

SinomeM/cnv_geaRs documentation built on Jan. 25, 2020, 8:39 p.m.