collapse.rows: collapse rows


Description

Collapse or uncollapse the rows of a data.frame.

Usage

collapse.rows(x, key.column = 1, cols2collapse = NULL, sep = " // ",
  max.nchar = NULL)

uncollapse.rows(x, cols2uncollapse = NULL, sep = " // ")

Arguments

x

a data.frame

key.column

the column that must end up with one row per unique key. May be numeric or character.

cols2collapse

the columns that you want to collapse. Often some columns contain the same information repeated in every row of a key, and you do not want the same word repeated N times in the collapsed cell. Leave those columns out, and supply only the names of the columns whose values should be joined.

sep

the separator, such as “, ” or “ // ”

max.nchar

the maximum number of characters in each collapsed cell. If NULL, no filtering is performed; otherwise long strings are truncated at max.nchar-3 with ... appended. If you intend to use uncollapse.rows later and depend on the correct number of values being found, leave max.nchar=NULL.
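
The effect of max.nchar on a later uncollapse can be illustrated with a minimal base R sketch. The helper truncate_cell below is hypothetical and is not part of the package; it assumes that "truncated at max.nchar-3" means keeping the first max.nchar-3 characters and then appending "...".

```r
# Hypothetical helper (not the package's code): shorten any string longer
# than max.nchar to its first max.nchar - 3 characters and append "...".
truncate_cell <- function(s, max.nchar = NULL) {
  if (is.null(max.nchar)) return(s)  # NULL means no truncation
  too_long <- nchar(s) > max.nchar
  s[too_long] <- paste0(substr(s[too_long], 1, max.nchar - 3), "...")
  s
}

cell  <- "K // L // M"                      # three values joined by " // "
short <- truncate_cell(cell, max.nchar = 9)
short

# Splitting the truncated cell no longer recovers the three original values.
pieces <- strsplit(short, " // ", fixed = TRUE)[[1]]
pieces
```

Because the truncated cell splits into fewer (and altered) pieces, a round trip through uncollapse.rows would presumably lose values, which is why the argument description recommends max.nchar=NULL in that case.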

cols2uncollapse

Which columns need uncollapsing? You must specify at least 1 column (hint: this is the column that contains the sep that you are trying to get rid of). If you specify more than 1 column, then within each row every specified cell must split into the same number of elements.

Value

collapse.rows: a data.frame with the same number of columns, but only N rows, corresponding to the N different values in the key column, sorted alphanumerically by the key column.
uncollapse.rows: a data.frame with the same number of columns, but with no rows that hold multiple sep-delimited values in the cols2uncollapse.

Collapsed data

Collapsed data means a data.frame with at least 1 column whose values are sep-delimited. Examples:
* a cell of data with multiple gene symbols, e.g. "Ankrd11|Galnt2"
* a cell of data with multiple GO terms, e.g. "GO:00001 /// GO:00002 /// GO:00003"
* a cell of data with multiple attributes, e.g. "TD, ND, CD"

These types of data are very common when there are multiple values per key.

uncollapse.rows

uncollapse.rows takes collapsed data and increases the number of rows so that each row holds 1 element. For example, 1 row with "Ankrd11|Galnt2" becomes 2 rows, one with "Ankrd11" and one with "Galnt2". This changes the data from n:1 to 1:1.

All columns that are not specified in cols2uncollapse will be repeated. If you have just 1 column to uncollapse, then only that column will be changed. If you have more than 1 column to expand, then within each row that needs uncollapsing, all specified columns MUST have the same number of elements. For example, consider a data.frame with 1 row per gene and 3 GO-term columns: GO.ID, GO.Name, GO.Evidence. For any given gene with 3 GO terms, there should be exactly 3 GO IDs, 3 GO Names and 3 GO evidence codes. If different numbers of elements are found, this will throw an error. @TODO: allow a different number of values per collapsed row.
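
The expansion described above can be sketched in base R. This is only an illustration of the idea under stated assumptions, not the package's implementation; the data and the names collapsed, cols, parts, n and expanded are made up for the example.

```r
# Made-up collapsed data: two columns that must expand in parallel.
collapsed <- data.frame(
  Gene    = c("g1", "g2"),
  GO.ID   = c("GO:00001 // GO:00002", "GO:00003"),
  GO.Name = c("name1 // name2", "name3"),
  stringsAsFactors = FALSE
)
sep  <- " // "
cols <- c("GO.ID", "GO.Name")

# Split every collapsed cell on the literal separator.
parts <- lapply(collapsed[cols], strsplit, split = sep, fixed = TRUE)

# Within each row, every column being expanded must yield the same
# number of elements; otherwise stop with an error.
n <- lengths(parts[[1]])
for (p in parts) stopifnot(all(lengths(p) == n))

# Repeat each row once per element, then overwrite the expanded columns.
idx <- rep(seq_len(nrow(collapsed)), times = n)
expanded <- collapsed[idx, , drop = FALSE]
for (col in cols) expanded[[col]] <- unlist(parts[[col]])
rownames(expanded) <- NULL
expanded
```

Columns outside cols (here Gene) are simply repeated, which matches the behaviour described for columns not listed in cols2uncollapse.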

It is strongly suggested that you use this function to reverse the effects of collapse.rows, with the same arguments that were supplied to collapse.rows itself.

collapse.rows

Takes a data.frame in which groups of rows contain mostly the same information, but some columns differ. The result has one row per unique value of the key from x[,key.column]; in the columns that contain non-equal data, the values are collapsed into a single value, separated by “, ” or “ // ” for example.
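
As a rough base R sketch of this idea (not the package's implementation), the values in one column can be joined per key with paste. The variable names key, cols, out and joined are chosen for the illustration, and the sketch assumes a character key column.

```r
# Same shape as the data used in the Examples section.
df <- data.frame(
  Name        = rep(LETTERS[1:3], each = 3),
  Description = rep(letters[1:3], each = 3),
  Value       = LETTERS[11:19],
  stringsAsFactors = FALSE
)
sep  <- " // "
key  <- "Name"    # column that should end up with one row per value
cols <- "Value"   # column(s) whose values are joined

# Keep the first row seen for each key, sorted by key.
out <- df[!duplicated(df[[key]]), , drop = FALSE]
out <- out[order(out[[key]]), , drop = FALSE]

# Join the values of each collapsed column within each key.
for (col in cols) {
  joined <- vapply(split(df[[col]], df[[key]]), paste, character(1),
                   collapse = sep)
  out[[col]] <- unname(joined[out[[key]]])
}
rownames(out) <- NULL
out
```

Columns that are not listed in cols (here Description) keep the value from the first row of each key, which is adequate when they repeat the same information within a key.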

Author(s)

Mark Cowley, 2009-01-07

See Also

uncollapse.rows

Examples

df <- data.frame(
   Name=rep(LETTERS[1:3], each=3), 
   Description=rep(letters[1:3], each=3),
   Value=LETTERS[11:19],
   stringsAsFactors=FALSE
)
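# Collapse to one row per Name, joining the Value column with " // "
a <- collapse.rows(df, key.column = 1, cols2collapse = 3)
a
# Expand the collapsed Value column back to one value per row
uncollapse.rows(a, cols2uncollapse = 3)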

drmjc/mjcbase documentation built on May 15, 2019, 2:27 p.m.