Divide data along different dimensions into equally spaced bins, and summarize the datapoints that fall into any of these n-dimensional bins.
binNdimensions(
dims.df,
nbins = 10,
use_bin_numbers = TRUE,
ncores = getOption("mc.cores", 2L)
)
aggregateByNdimBins(
x,
dims.df,
nbins = 10,
FUN = mean,
...,
ignore.na = TRUE,
drop = FALSE,
empty = NA,
use_bin_numbers = TRUE,
ncores = getOption("mc.cores", 2L)
)
densityInNdimBins(
dims.df,
nbins = 10,
use_bin_numbers = TRUE,
ncores = getOption("mc.cores", 2L)
)
dims.df
A dataframe containing one or more columns of numerical data for which bins will be generated.
nbins
Either a single number giving the number of bins to use for all dimensions (default = 10), or a vector giving the number of bins to use for each dimension of the input data.
use_bin_numbers
A logical indicating if ordinal bin numbers should be returned (
ncores
Number of cores to use for computations.
x
The name of the dimension in
FUN
A function to use for aggregating data within each bin.
...
Additional arguments passed to
ignore.na
Logical indicating if
drop
A logical indicating if empty bin combinations should be removed from the output. By default (
empty
When
These functions take in data along one or more dimensions, and for
each dimension the data is divided into equally spaced bins from the minimum
value to the maximum value. For instance, if each row of dims.df
were a gene, the columns (the different dimensions) would be various
quantitative measures of that gene, e.g. expression level, number of exons,
length, etc. If plotted in Cartesian coordinates, each gene would be a
single datapoint, and each measurement would be a separate dimension.
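The binning described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper `bin_equal_width` and the toy dataframe are hypothetical, and the choice of right-closed intervals (with the minimum placed in bin 1) is an assumption that may differ from what these functions do at bin edges.

```r
# Hypothetical helper (not part of the package): assign each value to one of
# `nbins` equal-width bins spanning min(x) to max(x).
# Assumes x has no NA values and that min(x) < max(x).
bin_equal_width <- function(x, nbins = 10) {
  breaks <- seq(min(x), max(x), length.out = nbins + 1)
  # include.lowest = TRUE places the minimum value in bin 1
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE))
}

# Toy data: each row is a "gene", each column one quantitative dimension
toy <- data.frame(expression = c(0, 5, 10, 50, 100),
                  n_exons    = c(1, 2, 3, 4, 5))

# Bin every column independently; output has the same shape as the input
toy_bins <- as.data.frame(lapply(toy, bin_equal_width, nbins = 4))
toy_bins
```

Each column is binned over its own range, so the same bin number can correspond to very different value ranges in different dimensions.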
binNdimensions returns the bin numbers themselves. The output
dataframe has the same dimensions as the input dims.df, but each
input value has been replaced by its bin number (an integer). If
use_bin_numbers = FALSE, the center points of the bins are returned
instead of the bin numbers.
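To make the difference between bin numbers and bin centers concrete, here is a small base R sketch for a single dimension. It assumes a bin's center is the midpoint of its two edges, which is a plausible reading of "center points" rather than something this page states; the variable names are illustrative only.

```r
x <- c(0, 5, 10, 50, 100)
nbins <- 4

# Equal-width bin edges from the minimum to the maximum of x
breaks <- seq(min(x), max(x), length.out = nbins + 1)

# Ordinal bin number for each value (analogous to use_bin_numbers = TRUE)
bin_idx <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Assumed definition of a bin center: midpoint of the lower and upper edge
centers <- (breaks[-length(breaks)] + breaks[-1]) / 2

# Center of the bin each value falls into (analogous to use_bin_numbers = FALSE)
bin_center <- centers[bin_idx]

data.frame(x = x, bin_number = bin_idx, bin_center = bin_center)
```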
aggregateByNdimBins summarizes some input data x in each
combination of bins, i.e. in each n-dimensional bin. Each row of the output
dataframe is a unique combination of the input bins (i.e. each
n-dimensional bin), and the output columns are identical to those in
dims.df, with the addition of one or more columns containing the
aggregated data in each n-dimensional bin. If the input x was a
vector, the column is named "value"; if the input x was a dataframe,
the column names from x are maintained.
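The aggregation step can be sketched in base R with `aggregate()` and `merge()`. This is an illustration under assumptions, not the package's code: the helper `bin_equal_width`, the toy data, and the column name `signal` are hypothetical, and filling unoccupied bin combinations with `NA` reflects one reading of the `drop = FALSE` and `empty = NA` defaults shown in the usage.

```r
# Hypothetical helper: equal-width bin numbers from min(x) to max(x)
bin_equal_width <- function(x, nbins) {
  breaks <- seq(min(x), max(x), length.out = nbins + 1)
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE))
}

toy <- data.frame(a      = c(1, 2, 3, 4),
                  b      = c(10, 10, 40, 40),
                  signal = c(2, 4, 6, 8))

# Bin the two dimensions that define the 2-dimensional bins
bins <- data.frame(a = bin_equal_width(toy$a, 2),
                   b = bin_equal_width(toy$b, 2))

# Summarize `signal` within each occupied combination of bins
agg <- aggregate(toy["signal"], by = bins, FUN = mean)

# Add the unoccupied combinations, which get NA for the aggregated value
all_bins <- expand.grid(a = 1:2, b = 1:2)
full <- merge(all_bins, agg, all.x = TRUE)
full
```

Here only two of the four bin combinations contain data, so the other two rows carry `NA` in the `signal` column.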
densityInNdimBins returns a dataframe just like
aggregateByNdimBins, except the "value" column contains the number
of observations that fall into each n-dimensional bin.
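Counting observations per n-dimensional bin amounts to a cross-tabulation of the bin numbers. The following base R sketch shows the idea with hypothetical, already-binned toy data; it is not how densityInNdimBins is implemented, only an illustration of the kind of output described above.

```r
# Hypothetical bin numbers for five observations in two dimensions
bins <- data.frame(a = c(1L, 1L, 2L, 2L, 2L),
                   b = c(1L, 1L, 1L, 2L, 2L))

# Cross-tabulate the bin numbers, then flatten to one row per bin combination
counts <- as.data.frame(table(bins), responseName = "value")
counts
```

Every combination of bins appears as a row, including combinations with a count of zero, and the counts sum to the number of observations.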
A dataframe.
Mike DeBerardine
data("PROseq") # import included PROseq data
data("txs_dm6_chr4") # import included transcripts
#--------------------------------------------------#
# find counts in promoter, early genebody, and near CPS
#--------------------------------------------------#
pr <- promoters(txs_dm6_chr4, 0, 100)
early_gb <- genebodies(txs_dm6_chr4, 500, 1000, fix.end = "start")
cps <- genebodies(txs_dm6_chr4, -500, 500, fix.start = "end")
df <- data.frame(counts_pr = getCountsByRegions(PROseq, pr),
counts_gb = getCountsByRegions(PROseq, early_gb),
counts_cps = getCountsByRegions(PROseq, cps))
#--------------------------------------------------#
# divide genes into 20 bins for each measurement
#--------------------------------------------------#
bin3d <- binNdimensions(df, nbins = 20, ncores = 1)
length(txs_dm6_chr4)
nrow(bin3d)
bin3d[1:6, ]
#--------------------------------------------------#
# get number of genes in each bin
#--------------------------------------------------#
bin_counts <- densityInNdimBins(df, nbins = 20, ncores = 1)
bin_counts[1:6, ]
#--------------------------------------------------#
# get mean cps reads in bins of promoter and genebody reads
#--------------------------------------------------#
bin2d_cps <- aggregateByNdimBins("counts_cps", df, nbins = 20,
ncores = 1)
bin2d_cps[1:6, ]
subset(bin2d_cps, is.finite(counts_cps))[1:6, ]
#--------------------------------------------------#
# get median cps reads for those bins
#--------------------------------------------------#
bin2d_cps_med <- aggregateByNdimBins("counts_cps", df, nbins = 20,
FUN = median, ncores = 1)
bin2d_cps_med[1:6, ]
subset(bin2d_cps_med, is.finite(counts_cps))[1:6, ]