data types

Description Usage Arguments Value Collapsed data uncollapse.rows collapse.rows Author(s) See Also Examples

Collapse or uncollapse the rows in a data.frame.

collapse.rows(x, key.column = 1, cols2collapse = NULL, sep = " // ",
  max.nchar = NULL)

uncollapse.rows(x, cols2uncollapse = NULL, sep = " // ")

`x`	a `data.frame`
`key.column`	the column that must end up with one row per key. Either a column index (numeric) or a column name (character).
`cols2collapse`	the columns to collapse. Columns that repeat the same value within a key do not need collapsing, since joining them would only repeat that value N times. Leave those columns out and supply only the columns whose differing values should be joined.
`sep`	the separator, such as “, ” or “ // ”
`max.nchar`	the maximum number of characters in each collapsed cell. If `NULL`, no truncation is performed; otherwise long strings are truncated at `max.nchar-3` characters and `...` is appended. If you intend to use `uncollapse.rows` later and depend on the correct number of values being recovered, leave `max.nchar=NULL`.
`cols2uncollapse`	the columns to uncollapse. At least 1 column must be specified (hint: this is the column that contains the `sep` you are trying to get rid of). If you specify more than 1 column, then within each row every specified cell must contain the same number of elements to split.
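The effect of `max.nchar` can be sketched in base R as follows. This is only an illustration of the truncation rule as described above, not the package's code; `truncate_cell` is a hypothetical helper, and the handling of strings exactly at the limit is an assumption.

```r
# Hypothetical helper illustrating the documented rule: cells longer than
# max.nchar keep their first (max.nchar - 3) characters plus "...".
truncate_cell <- function(s, max.nchar = NULL) {
  if (is.null(max.nchar)) return(s)   # NULL means no truncation
  too_long <- nchar(s) > max.nchar
  s[too_long] <- paste0(substr(s[too_long], 1, max.nchar - 3), "...")
  s
}

cells <- c("K // L // M", "short")
truncated <- truncate_cell(cells, max.nchar = 8)
truncated
```

A truncated cell no longer contains all of its original elements, which is why splitting it again cannot recover the correct number of values.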

collapse.rows: a data.frame with the same number of columns, but only N rows, one for each of the N distinct values in the key column, sorted alphanumerically by the key column.
uncollapse.rows: a data.frame with the same number of columns, but with one row per element, so that no cell in cols2uncollapse still holds multiple sep-delimited values.

Collapsed data means a data.frame with at least 1 column whose values are sep-delimited. Examples:
* a cell of data with multiple gene symbols, eg "Ankrd11|Galnt2"
* a cell of data with multiple GO terms, eg "GO:00001 /// GO:00002 /// GO:00003"
* a cell of data with multiple attributes, eg "TD, ND, CD"

This type of data is very common when there are multiple values per key.

uncollapse.rows takes collapsed data and increases the number of rows so that each row holds 1 element. For example, 1 row with "Ankrd11|Galnt2" becomes 2 rows, one with "Ankrd11" and one with "Galnt2", changing the relationship from n:1 to 1:1.
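The expansion described above can be sketched in base R for a single column. This is a minimal illustration under stated assumptions, not the implementation of uncollapse.rows; `uncollapse_one` and the toy `genes` table (including the `ProbeID` column) are hypothetical.

```r
# Hypothetical helper: expand one sep-delimited column so that each
# element gets its own row; all other columns are repeated.
uncollapse_one <- function(x, col, sep = " // ") {
  # fixed = TRUE treats sep literally, so "|" is not read as a regex
  parts <- strsplit(x[[col]], sep, fixed = TRUE)
  n <- lengths(parts)
  # repeat each row once per element, then overwrite the split column
  out <- x[rep(seq_len(nrow(x)), n), , drop = FALSE]
  out[[col]] <- unlist(parts)
  rownames(out) <- NULL
  out
}

genes <- data.frame(
  ProbeID = c("p1", "p2"),
  Symbol  = c("Ankrd11|Galnt2", "Tp53"),
  stringsAsFactors = FALSE
)
expanded <- uncollapse_one(genes, "Symbol", sep = "|")
expanded
```

The row for "p1" appears twice in the result, once per gene symbol, while "p2" is left as a single row.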

All columns that are not specified in cols2uncollapse will be repeated. If you have just 1 column to uncollapse, then only that column will be changed. If you have more than 1 column to expand, then within each row that needs uncollapsing, all specified columns MUST have the same number of elements. Eg consider a data.frame with 1 row per gene and 3 GO-term columns: GO.ID, GO.Name, GO.Evidence. For any given gene with 3 GO terms, there should be exactly 3 GO IDs, 3 GO Names and 3 GO term evidence codes. If different numbers of elements are found, this will throw an error. @TODO: allow a different number of values per collapsed row.
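Before uncollapsing several columns at once, it can help to check that the element counts agree in every row. The following base-R sketch shows one way to do that; it is not part of the package, and the toy `go` table (gene names, GO names and evidence codes) is made up for illustration.

```r
# Toy data: gene g2 has 2 GO IDs and 2 GO names but only 1 evidence code.
go <- data.frame(
  Gene        = c("g1", "g2"),
  GO.ID       = c("GO:00001 // GO:00002 // GO:00003", "GO:00004 // GO:00005"),
  GO.Name     = c("name1 // name2 // name3", "name4 // name5"),
  GO.Evidence = c("IEA // IDA // IEA", "IEA"),
  stringsAsFactors = FALSE
)
cols <- c("GO.ID", "GO.Name", "GO.Evidence")

# One column of element counts per collapsed column (rows = genes).
counts <- do.call(cbind, lapply(cols, function(col) {
  lengths(strsplit(go[[col]], " // ", fixed = TRUE))
}))
colnames(counts) <- cols

# A row is inconsistent when its counts are not all equal.
mismatched <- apply(counts, 1, function(r) length(unique(r)) > 1)
go$Gene[mismatched]
```

Rows flagged here are the ones that would be expected to trigger the error described above.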

uncollapse.rows is best used to reverse the effects of collapse.rows, with the same arguments that were supplied to collapse.rows itself.

collapse.rows takes a data.frame whose rows contain mostly the same info, with only some columns changing. The result has one row per unique value of key from x[,key.column]; in the columns that contain non-equal data, the values are collapsed into a single value, separated by “, ” or “ // ” for example.
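The collapsing step can be approximated in base R as below. This is a sketch of the idea only, assuming a single key column and a single column to collapse; it is not the implementation of collapse.rows, and the variable names are illustrative.

```r
df <- data.frame(
  Name = rep(LETTERS[1:3], each = 3),
  Description = rep(letters[1:3], each = 3),
  Value = LETTERS[11:19],
  stringsAsFactors = FALSE
)

# Join the Value entries that share a key into one sep-delimited string.
joined <- vapply(split(df$Value, df$Name), paste, character(1),
                 collapse = " // ")

# Keep the first row per key for the columns that are not collapsed.
collapsed <- df[!duplicated(df$Name), c("Name", "Description")]
collapsed$Value <- unname(joined[collapsed$Name])

# Sort by key, as the documented return value is sorted by key column.
collapsed <- collapsed[order(collapsed$Name), ]
rownames(collapsed) <- NULL
collapsed
```

Each of the 3 keys ends up on a single row, with its 3 values joined by " // ".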

Mark Cowley, 2009-01-07

uncollapse.rows

df <- data.frame(
   Name=rep(LETTERS[1:3], each=3), 
   Description=rep(letters[1:3], each=3),
   Value=LETTERS[11:19],
   stringsAsFactors=FALSE
)
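# collapse on column 1 (Name), joining the values in column 3 (Value)
a <- collapse.rows(df, key.column = 1, cols2collapse = 3)
a
# reverse it: split column 3 (Value) back into one row per element
uncollapse.rows(a, cols2uncollapse = 3)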