mixedSortDF: sort data.frame keeping numeric values in proper order

mixedSortDF R Documentation

sort data.frame keeping numeric values in proper order

Description

sort data.frame keeping numeric values in proper order

Usage

mixedSortDF(
  df,
  byCols = seq_len(ncol(df)),
  na.last = TRUE,
  decreasing = NULL,
  useRownames = FALSE,
  verbose = FALSE,
  blanksFirst = TRUE,
  keepNegative = FALSE,
  keepInfinite = FALSE,
  keepDecimal = FALSE,
  ignore.case = TRUE,
  useCaseTiebreak = TRUE,
  sortByName = FALSE,
  honorFactor = TRUE,
  ...
)

Arguments

df

data.frame input

byCols

one of two types of input:

  1. integer vector referring to the order of columns to be used by mmixedOrder() to order the data.frame. Negative values reverse the sort order for the corresponding column number. To sort by rownames(df) use zero 0; to reverse-sort by rownames(df) use -0.1, where the negative sign reverses the sort and -0.1 is rounded to 0.

  2. character vector of values in colnames(df), optionally including prefix "-" to reverse the sort. Note that the argument decreasing can also be used to specify columns to have reverse sort, either as a single value or vector to be applied to each column in byCols. To sort rownames(df) use "rownames" or "row.names". To reverse sorting rownames(df) use "-rownames" or "-row.names".
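The integer form of byCols can be sketched in base R. This is only an illustration of the convention described above, not jamba's implementation; the variable names column and reversed are hypothetical. It also shows why -0.1 is used rather than -0: in R, -0 is not less than zero, so the sign would be lost.

```r
# Illustrative only: decode integer-style byCols values.
byCols <- c(2, -1, -0.1)

# Column index for each entry; 0 stands for rownames(df).
column <- round(abs(byCols))

# A negative entry requests a reversed sort for that column.
reversed <- byCols < 0

data.frame(byCols, column, reversed)

# A plain negative zero cannot carry the "reverse" flag:
-0 < 0
```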

na.last

logical whether to move NA entries to the end of the sort. When na.last=TRUE then NA values will always be last, even following blanks and infinite values. When na.last=FALSE then NA values will always be first, even before blanks and negative infinite values.

decreasing

NULL or logical vector indicating which columns in byCols should be sorted in decreasing order. By default, the sign(byCols) is used to define the sort order of each column, but it can be explicitly overridden with this decreasing parameter.

useRownames

logical whether to use rownames(df) as a last tiebreaker in the overall rank ordering. This parameter has the primary effect of assuring a reproducible result, provided the rownames are consistently defined, or if rownames are actually row numbers. When useRownames=FALSE then rows that would otherwise be ties will be returned in the same order they were provided in df.

verbose

logical whether to print verbose output. When verbose=2 there is slightly more verbose output.

blanksFirst, na.last, keepNegative, keepInfinite, keepDecimal, ignore.case, useCaseTiebreak, sortByName

arguments passed to mmixedOrder(), except sortByName which is not passed along.

...

additional arguments passed to mmixedOrder() for custom sort options as described in mixedSort().

Details

This function is a wrapper around mmixedOrder() so it operates on data.frame columns in the proper order, using logic similar to that used by base::order() when operating on a data.frame. The sort order logic is fully described in mixedSort() and mixedOrder().

Note that byCols can be given either as integer column index values or as a character vector of colnames(df). In either case, a negative prefix - reverses the sort order of the corresponding column.

For example byCols=c(2, -1) will sort column 2 increasing, then column 1 decreasing.

Similarly, one can supply colnames(df), such as byCols=c("colname2", "-colname1"). Values are matched as-is to colnames(df) first, then any values not matched are compared again after removing prefix - from the start of each character string. Therefore, if colnames(df) contains "-colname1" it will be matched as-is, but "--colname1" will only be matched after removing the first -, after which the sort order will be reversed for that column.
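The two-pass name matching described above can be sketched in base R. The helper resolve_by_cols() is hypothetical and is not part of jamba; it only mirrors the stated rule (match as-is first, then retry after removing one leading -) and ignores the special "rownames" values.

```r
# Hypothetical helper, not part of jamba: resolve byCols-style names
# against a set of column names.
resolve_by_cols <- function(byCols, cn) {
   # First pass: match each value as-is.
   idx <- match(byCols, cn)
   reversed <- rep(FALSE, length(byCols))
   # Second pass: for unmatched values, drop one leading "-" and retry.
   unmatched <- is.na(idx) & grepl("^-", byCols)
   stripped <- sub("^-", "", byCols[unmatched])
   idx[unmatched] <- match(stripped, cn)
   # Only values matched in the second pass are sorted in reverse.
   reversed[unmatched] <- !is.na(idx[unmatched])
   data.frame(byCols = byCols, column = idx, reversed = reversed,
      stringsAsFactors = FALSE)
}

cn <- c("colname1", "colname2", "-colname1")
res <- resolve_by_cols(c("colname2", "-colname1", "--colname1"), cn)
res
```

In this sketch "-colname1" resolves to the third column without reversal because it matches as-is, while "--colname1" resolves to the same column with the sort reversed.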

For direct control over the sort order of each column defined in byCols, you can supply a logical vector to the argument decreasing; this vector is recycled to length(byCols).
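As a rough base R sketch of how the direction of each column could be derived, assuming the default comes from the sign of byCols and an explicit decreasing vector is recycled with rep(). The names default_dec and dec are hypothetical, and the exact internals of mixedSortDF() may differ.

```r
# Illustrative only: per-column sort direction.
byCols <- c(2, -1, 3)

# Default direction taken from the sign of each entry.
default_dec <- byCols < 0

# An explicit decreasing vector is recycled to one value per column.
decreasing <- c(TRUE, FALSE)
dec <- rep(decreasing, length.out = length(byCols))

data.frame(byCols, default_dec, dec)
```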

Finally, for slight efficiency, only unique columns defined in byCols are used to determine the row order, so even if a column is defined twice in byCols, only the first instance is passed to mmixedOrder() to determine row order.
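A minimal base R sketch of keeping only the first instance of each column. It assumes duplicates are identified by column number regardless of sign, which is an assumption made here for illustration and not a statement about jamba's code.

```r
# Illustrative only: keep the first occurrence of each column in byCols.
byCols <- c(2, -1, 2, 1)

# Compare by absolute column number so that 1 and -1 count as the same column.
kept <- byCols[!duplicated(abs(byCols))]
kept
```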

Value

data.frame whose rows are ordered using mmixedOrder().

See Also

Other jam sort functions: mixedOrder(), mixedSorts(), mixedSort(), mmixedOrder()

Other jam string functions: asSize(), breaksByVector(), cPasteSU(), cPasteS(), cPasteUnique(), cPasteU(), cPaste(), fillBlanks(), formatInt(), gsubOrdered(), gsubs(), makeNames(), mixedOrder(), mixedSorts(), mixedSort(), mmixedOrder(), nameVectorN(), nameVector(), padInteger(), padString(), pasteByRowOrdered(), pasteByRow(), sizeAsNum(), tcount(), ucfirst(), uniques()

Examples

# start with a vector of miRNA names
x <- c("miR-12","miR-1","miR-122","miR-1b", "miR-1a","miR-2");
# add some arbitrary group information
g <- rep(c("Air", "Treatment", "Control"), 2);
# create a data.frame
df <- data.frame(group=g,
   miRNA=x,
   stringsAsFactors=FALSE);

# input data
df;

# output when using order()
df[do.call(order, df), , drop=FALSE];

# output with mixedSortDF()
mixedSortDF(df);

# mixedSortDF() respects factor level order
# reorder factor levels to demonstrate.
# "Control" should come first
gf <- factor(g, levels=c("Control", "Air", "Treatment"));
df2 <- data.frame(groupfactor=gf,
   miRNA=x,
   stringsAsFactors=FALSE);

# now the sort properly keeps the group factor levels in order,
# while also sorting the miRNA names in their proper order.
mixedSortDF(df2);


x <- data.frame(l1=letters[1:10],
   l2=rep(letters[1:2+10], 5),
   L1=LETTERS[1:10],
   L2=rep(LETTERS[1:2+20], each=5));
set.seed(123);
rownames(x) <- sample(seq_len(10));
x;

# sort by including rownames
mixedSortDF(x, byCols=c("rownames"));
mixedSortDF(x, byCols=c("L2", "-rownames"));

# demonstrate sorting a matrix with no rownames
m <- matrix(c(2, 1, 3, 4), ncol=2);
mixedSortDF(m, byCols=-2)

# add rownames
rownames(m) <- c("c", "a");
mixedSortDF(m, byCols=0)
mixedSortDF(m, byCols="-rownames")
mixedSortDF(m, byCols="rownames")

mixedSortDF(data.frame(factor1=factor(c("Cnot9", "Cnot8", "Cnot10"))), honorFactor=FALSE)


jmw86069/jamba documentation built on March 26, 2024, 5:26 a.m.