Description Usage Arguments Details Value Custom reader function See Also Examples
View source: R/faster_gene_cna.R
The function reads through in all sample CNA data given by the sample
description sample_desc
and returns a joint CNA gain/loss table based
on gene regions across samples.
1 2 3 |
sample_desc |
data.table object created by create_sample_desc. |
gain_threshold |
CNA expression above this will be considered as gain region. By default \log_2{2.5} - 1 |
loss_threshold |
CNA expression below this will be considered as loss region. By default \log_2{1.5} - 1 |
read_fun |
Custom reader function, see its own section for more detail. |
progress |
Whether to display a progress bar. By default |
progress_width |
The text width of the shown progress bar. By default is 48 chars wide. |
parallel |
Enable parallelism by plyr. One has to specify a parallel engine beforehand. See example for more information. |
... |
Arguments passed to the custom reader function specified in
|
A gene is considered to have CNA gain if the overlapped CNA record expression is higher than the given threshold. Similarly, a gene is considered CNA loss if the overlapped CNA record is lower than the given threshold. If multiple CNA records map onto the same gene region with both gain and loss, the majority wins. If none of the records map to the gene, NA is given.
By default it assumes the data to be of TCGA level 3 file format. For other data formats (e.g. raw data or other experiments from GEO), one should implement a custom reader function that accepts the filepath as the first argument. See section Custom reader function for full specification.
Currently the package ships a custom genome reference hg19, hg19DBNM, for gene region look up. Each gene's region is defined by the widest splicing form it has in NCBI curated records. The defined region includes intron regions. This limitation may change in the future.
data.table of CNA gain/loss on each gene region for all samples,
whose rows represent regions of genes and columns represent sample names.
First column GENE
contains the corresponding gene names.
Custom reader function is given by
read_fun = your_reader_fun
. It takes the filepath to CNA data as the
first argument and returns a data.table with at least the following four
columns: Chromosome
, Start
, End
, and
Segment_Mean
of type character, integer, integer and numeric
respectively.
Rest arguments of create_gene_cna(...)
will be passed to this reader
function.
Note: all string-like columns should NOT be of type factor
.
Remember to set stringsAsFactors = FALSE
.
read.table
and fread
for custom reader function implementation; create_sample_desc
for creating sample description. If the gene information already exists in
the data, try direct_gene_cna to skip the genome reference lookup.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | ## Use first three samples of the builtin dataset
sample_root <- system.file("extdata", package = "iGC")
sample_desc_pth <- file.path(sample_root, "sample_desc.csv")
sample_desc <- create_sample_desc(
sample_desc_pth, sample_root=sample_root
)[1:3]
## Define custom reader function for TCGA level 3 gene exp. data
my_cna_reader <- function(cna_filepath) {
cna <- data.table::fread(cna_filepath, sep = '\t', header = TRUE)
data.table::setnames(
cna,
c("Sample", "Chromosome", "Start", "End", "Num_Probes", "Segment_Mean")
)
# pick only the needed columns
cna[, .(Chromosome, Start, End, Segment_Mean)]
}
## Read all samples' CNA data and combined as a single table
gene_cna <- create_gene_cna(
sample_desc,
gain_threshold = log2(2.3) - 1, loss_threshold = log2(1.7) - 1,
read_fun = my_cna_reader,
)
gene_cna[GENE %in% c("BRCA2", "TP53", "SEMA5A"), ]
## Not run:
## To boost the speed, utilize parallelization
doMC::registerDoMC(4) # number of CPU cores
gene_cna <- create_gene_cna(
sample_desc,
gain_threshold = log2(2.3) - 1, loss_threshold = log2(1.7) - 1,
read_fun = my_cna_reader,
parallel = TRUE
)
## End(Not run)
|
| | 0%
|====== |33.33333% ~1 s remaining
|============= |66.66667% ~0 s remaining
|====================|100% ~0 s remaining
|====================|100% Completed after 1 s
GENE TCGA-AO-A0JL-01A TCGA-BH-A0HF-01A TCGA-BH-A0HK-01A
1: BRCA2 -1 -1 0
2: SEMA5A 1 0 0
3: TP53 0 0 -1
Progress disabled when using parallel plyr
Warning messages:
1: In readChild(ch) : unable to terminate child process: Permission denied
2: In readChild(ch) : unable to terminate child process: Permission denied
3: In readChild(ch) : unable to terminate child process: Permission denied
4: In cleanup(mc.cleanup) : unable to terminate child: Permission denied
5: In cleanup(mc.cleanup) : unable to terminate child: Permission denied
6: In cleanup(mc.cleanup) : unable to terminate child: Permission denied
Error while shutting down parallel: unable to terminate some child processes
sh: 1: rm: Permission denied
Warning messages:
1: In cleanup(kill = tools::SIGKILL, detach = TRUE, shutdown = TRUE) :
unable to terminate child: Permission denied
2: In cleanup(kill = tools::SIGKILL, detach = TRUE, shutdown = TRUE) :
unable to terminate child: Permission denied
3: In cleanup(kill = tools::SIGKILL, detach = TRUE, shutdown = TRUE) :
unable to terminate child: Permission denied
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.