View source: R/ndimensional_binning.R
binNdimensions | R Documentation |
Divide data along different dimensions into equally spaced bins, and summarize the datapoints that fall into any of these n-dimensional bins.
binNdimensions(
dims.df,
nbins = 10L,
use_bin_numbers = TRUE,
ncores = getOption("mc.cores", 2L)
)
aggregateByNdimBins(
x,
dims.df,
nbins = 10L,
FUN = mean,
...,
ignore.na = TRUE,
drop = FALSE,
empty = NA,
use_bin_numbers = TRUE,
ncores = getOption("mc.cores", 2L)
)
densityInNdimBins(
dims.df,
nbins = 10L,
use_bin_numbers = TRUE,
ncores = getOption("mc.cores", 2L)
)
dims.df |
A dataframe containing one or more columns of numerical data for which bins will be generated. |
nbins |
Either a number giving the number of bins to use for all dimensions (default = 10), or a vector containing the number of bins to use for each dimension of input data given. |
use_bin_numbers |
A logical indicating if ordinal bin numbers should be returned (TRUE), or if the center points of the bins should be returned instead (FALSE). |
ncores |
Number of cores to use for computations. |
x |
The name of the dimension in dims.df to aggregate, or a vector or dataframe of data to be aggregated. |
FUN |
A function to use for aggregating data within each bin. |
... |
Additional arguments passed to FUN. |
ignore.na |
Logical indicating if NA values should be ignored. |
drop |
A logical indicating if empty bin combinations should be removed
from the output. By default (FALSE), empty bin combinations are kept. |
empty |
When drop = FALSE, the value to return for empty bins (default NA). |
These functions take in data along one or more dimensions, and for
each dimension the data is divided into equally spaced bins from the minimum
value to the maximum value. For instance, if each row of dims.df
were a gene, the columns (the different dimensions) would be various
quantitative measures of that gene, e.g. expression level, number of exons,
length, etc. If plotted in Cartesian coordinates, each gene would be a
single datapoint, and each measurement would be a separate dimension.
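The binning described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: bin_equal_width and the toy data are hypothetical names, and the handling of values that fall exactly on a bin boundary may differ from what binNdimensions does.

```r
# Hypothetical helper (not part of the package): assign each value of a
# numeric vector to one of `nbins` equal-width bins spanning min to max.
bin_equal_width <- function(v, nbins = 10L, use_bin_numbers = TRUE) {
  breaks <- seq(min(v, na.rm = TRUE), max(v, na.rm = TRUE),
                length.out = nbins + 1L)
  # include.lowest = TRUE places the minimum value in the first bin
  idx <- cut(v, breaks = breaks, labels = FALSE, include.lowest = TRUE)
  if (use_bin_numbers) return(idx)
  # otherwise return the midpoint of the bin each value falls into
  centers <- (breaks[-1L] + breaks[-length(breaks)]) / 2
  centers[idx]
}

# Toy input: each column is one dimension, each row one observation
toy <- data.frame(a = c(1, 2, 5, 9, 10), b = c(0, 50, 100, 25, 75))

# Bin every dimension independently; the result has the same shape as `toy`
binned <- as.data.frame(lapply(toy, bin_equal_width, nbins = 3L))
binned
```

With a single bin covering 1 to 3, this sketch returns bin number 1 when use_bin_numbers = TRUE and the center value 2 when it is FALSE.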
binNdimensions returns the bin numbers themselves. The output
dataframe has the same dimensions as the input dims.df, but each
input value has been replaced by its bin number (an integer). If
use_bin_numbers = FALSE, the center points of the bins are returned
instead of the bin numbers.
aggregateByNdimBins summarizes some input data x in each
combination of bins, i.e. in each n-dimensional bin. Each row of the output
dataframe is a unique combination of the input bins (i.e. each
n-dimensional bin), and the output columns are identical to those in
dims.df, with the addition of one or more columns containing the
aggregated data in each n-dimensional bin. If the input x was a
vector, the column is named "value"; if the input x was a dataframe,
the column names from x are maintained.
densityInNdimBins returns a dataframe just like
aggregateByNdimBins, except the "value" column contains the number
of observations that fall into each n-dimensional bin.
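To make the per-bin summaries concrete, here is a minimal base R sketch of the same idea: aggregate a value within each combination of bins, optionally keep empty combinations as NA, and count observations per combination. The names bin_num, toy and val are hypothetical, and the row and column ordering produced here is not claimed to match the output of aggregateByNdimBins or densityInNdimBins.

```r
# Toy data (hypothetical): two dimensions to bin, plus a value to summarize
toy <- data.frame(a = c(1, 2, 5, 9, 10), b = c(0, 10, 100, 25, 75))
val <- c(10, 20, 30, 40, 50)

# Equal-width bin numbers for one dimension
bin_num <- function(v, nbins) {
  cut(v, breaks = seq(min(v), max(v), length.out = nbins + 1L),
      labels = FALSE, include.lowest = TRUE)
}
bins <- as.data.frame(lapply(toy, bin_num, nbins = 3L))

# Mean of `val` within each occupied combination of bins
agg <- aggregate(list(value = val), by = bins, FUN = mean)

# All 3 x 3 bin combinations, with empty combinations left as NA
grid <- expand.grid(a = 1:3, b = 1:3)
full <- merge(grid, agg, all.x = TRUE)

# Number of observations in each occupied combination of bins
dens <- aggregate(list(value = rep(1L, nrow(bins))), by = bins, FUN = sum)
```

Here agg corresponds loosely to drop = TRUE (only occupied bins), while full corresponds to drop = FALSE with empty = NA.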
A dataframe.
Mike DeBerardine
data("PROseq") # import included PROseq data
data("txs_dm6_chr4") # import included transcripts
#--------------------------------------------------#
# find counts in promoter, early genebody, and near CPS
#--------------------------------------------------#
pr <- promoters(txs_dm6_chr4, 0, 100)
early_gb <- genebodies(txs_dm6_chr4, 500, 1000, fix.end = "start")
cps <- genebodies(txs_dm6_chr4, -500, 500, fix.start = "end")
df <- data.frame(counts_pr = getCountsByRegions(PROseq, pr),
counts_gb = getCountsByRegions(PROseq, early_gb),
counts_cps = getCountsByRegions(PROseq, cps))
#--------------------------------------------------#
# divide genes into 20 bins for each measurement
#--------------------------------------------------#
bin3d <- binNdimensions(df, nbins = 20, ncores = 1)
length(txs_dm6_chr4)
nrow(bin3d)
bin3d[1:6, ]
#--------------------------------------------------#
# get number of genes in each bin
#--------------------------------------------------#
bin_counts <- densityInNdimBins(df, nbins = 20, ncores = 1)
bin_counts[1:6, ]
#--------------------------------------------------#
# get mean cps reads in bins of promoter and genebody reads
#--------------------------------------------------#
bin2d_cps <- aggregateByNdimBins("counts_cps", df, nbins = 20,
ncores = 1)
bin2d_cps[1:6, ]
subset(bin2d_cps, is.finite(counts_cps))[1:6, ]
#--------------------------------------------------#
# get median cps reads for those bins
#--------------------------------------------------#
bin2d_cps_med <- aggregateByNdimBins("counts_cps", df, nbins = 20,
FUN = median, ncores = 1)
bin2d_cps_med[1:6, ]
subset(bin2d_cps_med, is.finite(counts_cps))[1:6, ]