View source: R/functions-binning.R
Description
This function takes two same-sized numeric vectors x and y, bins/cuts x into bins (either a predefined number of equal-sized bins or bins of a predefined size) and aggregates values in y corresponding to x values falling within each bin. By default (i.e. method = "max") the maximal y value for the corresponding x values is identified. The vector x is expected to be sorted in increasing order; if it is not, it is sorted internally and y is reordered according to the order of x.
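The basic idea can be illustrated in base R for the case where the breaks are already known. This is a minimal sketch, not the xcms implementation: the helper bin_max_sketch is hypothetical, it always aggregates with max (i.e. method = "max"), and it returns NA for empty bins, whereas binYonX uses baseValue for these.

```r
# Hypothetical helper, not part of xcms: per-bin maximum of y for given breaks.
bin_max_sketch <- function(x, y, brks) {
  stopifnot(length(x) == length(y))
  # Sort x in increasing order and reorder y accordingly
  o <- order(x)
  x <- x[o]
  y <- y[o]
  # Bin index of each x value; bins are [lower, upper) except the last,
  # which also includes its upper boundary
  idx <- findInterval(x, brks, rightmost.closed = TRUE)
  n_bins <- length(brks) - 1L
  # Group y by bin; values outside the breaks (index 0 or n_bins + 1)
  # become NA factor values and are dropped by split()
  per_bin <- split(y, factor(idx, levels = seq_len(n_bins)))
  # Maximum per bin; empty bins get NA in this sketch
  unname(vapply(per_bin, function(v) if (length(v)) max(v) else NA_real_,
                numeric(1)))
}

bin_max_sketch(x = 1:16, y = 1:16, brks = seq(2, 12, length.out = 6))
# returns c(3, 5, 7, 9, 12)
```

The result agrees with the first example in the Examples section: the maximum in [2,4) is 3, in [4,6) it is 5, and the closed last bin [10,12] contributes 12.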
Arguments
x 
Numeric vector to be used for binning. 
y 
Numeric vector (same length as
breaks 
Numeric vector defining the breaks for the bins, i.e. the lower and upper values for each bin. See examples below. 
nBins 
integer(1) defining the number of desired bins. 
binSize 
numeric(1) defining the desired bin size. 
binFromX 
Optional numeric(1) allowing one to manually specify
the range of x-values to be used for binning.
This will affect only the calculation of the breaks for the bins
(i.e. if 
binToX 
Same as 
fromIdx 
Integer vector defining the start position of one or multiple
subsets of input vector 
toIdx 
Same as 
method 
A character string specifying the method that should be used to
aggregate values in 
baseValue 
The base value for empty bins (i.e. bins into which either
no values in 
sortedX 
Whether 
shiftByHalfBinSize 
Logical specifying whether the bins should be
shifted by half the bin size to the left. Thus, the first bin will have
its center at 
returnIndex 
Logical indicating whether the index of the max (if

returnX 

Details
The breaks defining the boundary of each bin can either be passed directly to the function with the argument breaks, or be calculated on the data based on the arguments nBins or binSize along with fromIdx, toIdx and, optionally, binFromX and binToX.
Arguments fromIdx and toIdx allow specifying subset(s) of the input vector x on which bins should be calculated. By default the full x vector is considered. Also, if not specified otherwise with arguments binFromX and binToX, the range of the bins within each of the subsets will be from x[fromIdx] to x[toIdx]. Arguments binFromX and binToX allow overwriting this by manually defining a range on which the breaks should be calculated. See examples below for more details.
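The range from which the breaks of each subset are derived can be sketched as follows. The helper bin_ranges_sketch is hypothetical and only mirrors the rule stated above (default range x[fromIdx] to x[toIdx] for each subset, overridden by binFromX and binToX); it assumes x is already sorted.

```r
# Hypothetical helper (not part of xcms): the x range from which the breaks
# for each subset x[fromIdx[i]:toIdx[i]] would be calculated.
bin_ranges_sketch <- function(x, fromIdx = 1L, toIdx = length(x),
                              binFromX = NULL, binToX = NULL) {
  stopifnot(!is.unsorted(x), length(fromIdx) == length(toIdx))
  n <- length(fromIdx)
  # Default: first and last x value of each subset
  from <- if (is.null(binFromX)) x[fromIdx] else rep(binFromX, n)
  to <- if (is.null(binToX)) x[toIdx] else rep(binToX, n)
  cbind(from = from, to = to)
}

X <- 1:30
fIdx <- c(2, 8, 21)
tIdx <- c(10, 25, 30)
bin_ranges_sketch(X, fIdx, tIdx)                             # one range per subset
bin_ranges_sketch(X, fIdx, tIdx, binFromX = 4, binToX = 28)  # same range for all
```

With binFromX and binToX set, every subset shares the same range and therefore the same bins, as in the overlapping-subsets example.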
Calculation of breaks: for nBins the breaks correspond to seq(min(x[fromIdx]), max(x[toIdx]), length.out = (nBins + 1)). For binSize the breaks correspond to seq(min(x[fromIdx]), max(x[toIdx]), by = binSize), with the exception that the last break value is forced to be equal to max(x[toIdx]). This ensures that all values from the specified range are covered by the breaks defining the bins. The last bin could, however, in some instances be slightly larger than binSize. See breaks_on_binSize and breaks_on_nBins for more details.
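A literal reading of these two rules in base R might look as follows. The helper names are hypothetical and this is only a sketch: xcms computes the breaks with its own functions (breaks_on_nBins and breaks_on_binSize), which may treat edge cases such as floating-point rounding differently. Here fromX and toX stand for min(x[fromIdx]) and max(x[toIdx]), or for binFromX and binToX when these are given.

```r
# Hypothetical helpers, not the xcms functions breaks_on_nBins/breaks_on_binSize.
breaks_nbins_sketch <- function(fromX, toX, nBins) {
  # nBins equal-sized bins need nBins + 1 boundaries
  seq(fromX, toX, length.out = nBins + 1L)
}

breaks_binsize_sketch <- function(fromX, toX, binSize) {
  brks <- seq(fromX, toX, by = binSize)
  # Range narrower than binSize: one bin covering the whole range
  if (length(brks) == 1L) return(c(fromX, toX))
  # Force the last break to toX so the full range is covered; if the range
  # is not a multiple of binSize, the last bin becomes larger than binSize
  brks[length(brks)] <- toX
  brks
}

breaks_nbins_sketch(4, 10, 5)    # 4.0 5.2 6.4 7.6 8.8 10.0
breaks_binsize_sketch(1, 10, 2)  # 1 3 5 7 10, i.e. last bin [7, 10]
breaks_binsize_sketch(1, 10, 3)  # 1 4 7 10
```

With binSize = 2 on the range 1 to 10, seq() stops at 9, so replacing that last break by 10 leaves a final bin of width 3, which is the situation in which the last bin is larger than binSize.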
Value
Returns a list of length 2: the first element (named "x") contains the bin midpoints, the second element (named "y") the aggregated values from input vector y within each bin. For returnIndex = TRUE the list contains an additional element "index" with the index of the max or min (depending on whether method = "max" or method = "min") value within each bin in input vector x.
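To make the shape of the returned list concrete, the sketch below builds a list of the same form for method = "max": bin midpoints in "x", per-bin maxima in "y" and, optionally, positions of those maxima in "index". The helper bin_result_sketch is hypothetical; it assumes x is already sorted and that neither x nor y contains NA, and it gives NA for empty bins.

```r
# Hypothetical helper mimicking the structure of the returned list.
bin_result_sketch <- function(x, y, brks, returnIndex = FALSE) {
  n_bins <- length(brks) - 1L
  idx <- findInterval(x, brks, rightmost.closed = TRUE)
  # For each bin, the position in x of its maximal y (NA for empty bins)
  max_pos <- vapply(seq_len(n_bins), function(i) {
    pos <- which(idx == i)
    if (length(pos)) pos[which.max(y[pos])] else NA_integer_
  }, integer(1))
  res <- list(
    x = (brks[-1L] + brks[-length(brks)]) / 2,  # bin midpoints
    y = y[max_pos]                              # NA where max_pos is NA
  )
  if (returnIndex) res$index <- max_pos
  res
}

str(bin_result_sketch(1:16, 1:16, seq(2, 12, length.out = 6), returnIndex = TRUE))
```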
Note
The function ensures that all values within the range used to define the breaks are considered in the binning (and assigned to a bin). This means that for all bins except the last one, values in x have to be >= xlower and < xupper (with xlower and xupper being the lower and upper boundary, respectively). For the last bin the condition is x >= xlower & x <= xupper.
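The same membership rule is implemented by base R's findInterval() with rightmost.closed = TRUE, which makes it easy to check which bin a value falls into. This is only an illustration of the rule, not how binYonX computes it internally; the breaks and test values below are made up for the example.

```r
# Bins [2,4), [4,6), [6,8), [8,10) and the closed last bin [10,12]
brks <- c(2, 4, 6, 8, 10, 12)
x <- c(1, 2, 3.99, 4, 11.5, 12, 12.5)
# findInterval() returns i for brks[i] <= x < brks[i + 1]; with
# rightmost.closed = TRUE a value equal to the last break goes to the last bin
bin <- findInterval(x, brks, rightmost.closed = TRUE)
# 0 (below the first break) and length(brks) (above the last one)
# do not correspond to any bin
bin[bin < 1L | bin >= length(brks)] <- NA
bin
# NA 1 1 2 5 5 NA
```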
Note also that if shiftByHalfBinSize is TRUE, the range of values that is used for binning is expanded by binSize (i.e. the lower boundary will be fromX - binSize/2, the upper toX + binSize/2). Setting this argument to TRUE resembles the binning that was used in the profBin function of xcms versions < 1.51.
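The effect of this expansion on the breaks can be sketched as follows, assuming that (toX - fromX) is a multiple of binSize; the helper shifted_breaks_sketch is hypothetical and not part of xcms. Widening the range by binSize/2 on both sides places the bin centers on fromX, fromX + binSize, ..., toX.

```r
# Hypothetical helper: breaks for bins of size binSize after widening the
# range by binSize / 2 on each side (as described for shiftByHalfBinSize = TRUE).
shifted_breaks_sketch <- function(fromX, toX, binSize) {
  seq(fromX - binSize / 2, toX + binSize / 2, by = binSize)
}

brks <- shifted_breaks_sketch(1, 5, 1)
brks                                   # 0.5 1.5 2.5 3.5 4.5 5.5
(brks[-1] + brks[-length(brks)]) / 2   # bin centers: 1 2 3 4 5
```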
NA handling: by default the function ignores NA values in y (thus inherently assumes na.rm = TRUE). No NA values are allowed in x.
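This NA behaviour can be mimicked in a small sketch: NA values in y are dropped before aggregating, while NA values in x raise an error. The helper bin_max_na_sketch is hypothetical; bins left without any non-NA value get NA here, which is an assumption of the sketch rather than what binYonX returns (it uses baseValue for empty bins).

```r
# Hypothetical helper: per-bin maximum of y ignoring NA values in y
# (like na.rm = TRUE) and refusing NA values in x.
bin_max_na_sketch <- function(x, y, brks) {
  if (anyNA(x)) stop("NA values are not allowed in 'x'")
  idx <- findInterval(x, brks, rightmost.closed = TRUE)
  vapply(seq_len(length(brks) - 1L), function(i) {
    v <- y[idx == i & !is.na(y)]
    # Bins without any non-NA value get NA in this sketch
    if (length(v)) max(v) else NA_real_
  }, numeric(1))
}

bin_max_na_sketch(x = 1:6, y = c(1, NA, NA, NA, 5, 6), brks = c(1, 3, 5, 7))
# 1 NA 6
```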
Author(s)
Johannes Rainer
Examples
########
## Simple example illustrating the breaks and the binning.
##
## Define breaks for 5 bins:
brks <- seq(2, 12, length.out = 6)
## The first bin is then [2,4), the second [4,6) and so on.
brks
## Get the max value falling within each bin.
binYonX(x = 1:16, y = 1:16, breaks = brks)
## Thus, the largest value in x = 1:16 falling into the bin [2,4) (i.e. being
## >= 2 and < 4) is 3, the largest one falling into [4,6) is 5 and so on.
## Note however that the function ensures that the values at the lower and
## upper boundary of the breaks (in this example 2 and 12) fall within a bin,
## i.e. 12 is considered for the last bin.
#######
## Performing the binning on a subset of x
##
X <- 1:16
## Bin X from element 4 to 10 into 5 bins.
X[4:10]
binYonX(X, X, nBins = 5L, fromIdx = 4, toIdx = 10)
## This defines breaks for 5 bins on the values from 4 to 10 and bins
## the values into these 5 bins. Alternatively, we could manually specify
## the range for the binning, i.e. the minimal and maximal value for the
## breaks:
binYonX(X, X, nBins = 5L, fromIdx = 4, toIdx = 10, binFromX = 1, binToX = 16)
## In this case the breaks for 5 bins were defined from a value 1 to 16 and
## the values 4 to 10 were binned based on these breaks.
#######
## Bin values within a subset of x, second example
##
## This example illustrates how the fromIdx and toIdx parameters can be used.
## x defines 3 times the sequence from 1 to 10, while y is the sequence from
## 1 to 30. In this very simple example x is supposed to represent M/Z values
## from 3 consecutive scans and y the intensities measured for each M/Z in
## each scan. We want to get the maximum intensities for M/Z value bins only
## for the second scan, and thus we use fromIdx = 11 and toIdx = 20. The breaks
## for the bins are defined with the nBins argument (binFromX and binToX could
## additionally be used to set their range).
X <- rep(1:10, 3)
Y <- 1:30
## Bin the M/Z values in the second scan into 5 bins and get the maximum
## intensity for each bin. Note that we have to specify sortedX = TRUE as
## the x and y vectors would be sorted otherwise.
binYonX(X, Y, nBins = 5L, sortedX = TRUE, fromIdx = 11, toIdx = 20)
#######
## Bin in overlapping subsets of X
##
## In this example we define overlapping subsets of X and perform the binning
## within these.
X <- 1:30
## Define the start and end indices of the subsets.
fIdx <- c(2, 8, 21)
tIdx <- c(10, 25, 30)
binYonX(X, nBins = 5L, fromIdx = fIdx, toIdx = tIdx)
## The same, but predefining also the desired range of the bins.
binYonX(X, nBins = 5L, fromIdx = fIdx, toIdx = tIdx, binFromX = 4, binToX = 28)
## The same bins are thus used for each subset.
