wide_to_long: Convert ICD data from wide to long format
In icd: Comorbidity Calculations and Tools for ICD-9 and ICD-10 Codes


Convert ICD data from wide to long format

Note the distinction between labelling existing data with any classes which icd provides, and actually converting the structure of the data.

wide_to_long(
  x,
  visit_name = get_visit_name(x),
  icd_labels = NULL,
  icd_name = "icd_code",
  icd_regex = c("icd", "diag", "dx_", "dx")
)

`x`	`data.frame` in wide format, i.e. one row per patient, and multiple columns containing ICD codes, empty strings or NA.
`visit_name`	The name of the column in the data frame which contains the patient or visit identifier. Typically this is the visit identifier, since patients leave and enter hospital with different ICD-9 codes. It is a character vector of length one. If left empty, or `NULL`, then an attempt is made to guess which field has the ID for the patient encounter (not a patient ID, although this can of course be specified directly). The guesses proceed until a single match is made. Data frames may be wide with many matching fields, so to avoid false positives, anything but a single match is rejected. If there are no successful guesses, and `visit_name` was not specified, then the first column of the data frame is used.
`icd_labels`	vector of column names in which codes are found. If NULL, all columns matching the regular expression `icd_regex` will be included.
`icd_name`	The name of the column in the `data.frame` which contains the ICD codes. This is a character vector of length one. If it is `NULL`, `icd` will attempt to guess the column name, looking for progressively less likely possibilities until it matches a single column. Failing this, it will take the first column in the data frame. Specifying the column using this argument avoids the guesswork.
`icd_regex`	vector of character strings, each a regular expression used to identify ICD-9 diagnosis columns. The expressions are tried in order and matched case-insensitively. Default is `c("icd", "diag", "dx_", "dx")`.
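
As an illustration of how `icd_regex` could select columns when `icd_labels` is `NULL`, here is a minimal base R sketch. It assumes that each pattern is tried in turn and that the first pattern matching at least one column wins; the package's internal logic may differ. The data frame and its column names are invented for the example.

```r
# Hypothetical wide data; column names are made up for illustration only.
widedf <- data.frame(
  visit_name = c("a", "b"),
  ICD9_01 = c("441", "4424"),
  dx_02 = c(NA, "443"),
  age = c(60, 71),
  stringsAsFactors = FALSE
)

icd_regex <- c("icd", "diag", "dx_", "dx")

# Assumption: try each pattern in order and stop at the first one that
# matches any column name. This is a sketch, not the package's own code.
icd_labels <- character(0)
for (pattern in icd_regex) {
  icd_labels <- grep(pattern, names(widedf), ignore.case = TRUE, value = TRUE)
  if (length(icd_labels) > 0) break
}

icd_labels
```

Under that assumption only `ICD9_01` is picked up here, because the first pattern already matches and `dx_02` is never considered. Passing `icd_labels` explicitly avoids depending on the matching order.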

Reshaping data is a common task, and is made easier here by knowing more about the underlying structure of the data. This function wraps the reshape function with specific behavior and checks related to ICD codes. Empty strings and NA values will be dropped, and everything else kept. No validation of the ICD codes is done.
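
The behavior described above can be approximated in base R. The following is a rough sketch using `stats::reshape()`, not the package's implementation: the helper column `position` and the final ordering by visit are assumptions made for the example.

```r
# Wide input: one row per visit, with NA and empty strings as padding.
widedf <- data.frame(
  visit_name = c("a", "b", "c"),
  icd9_01 = c("441", "4424", "441"),
  icd9_02 = c(NA, "443", ""),
  stringsAsFactors = FALSE
)
icd_labels <- c("icd9_01", "icd9_02")

# Stack the code columns into a single 'icd_code' column.
# 'position' is a hypothetical helper recording which wide column
# each code came from (1 for icd9_01, 2 for icd9_02).
long <- stats::reshape(
  widedf,
  direction = "long",
  varying = list(icd_labels),
  v.names = "icd_code",
  idvar = "visit_name",
  timevar = "position"
)

# Keep codes grouped by visit, in their original column order.
long <- long[order(long$visit_name, long$position), ]

# Drop NA and empty strings; everything else is kept without validation.
keep <- !is.na(long$icd_code) & nzchar(long$icd_code)
long <- long[keep, c("visit_name", "icd_code")]
rownames(long) <- NULL

long
```

The result has one row per visit and code, with visit `b` appearing twice. Nothing checks whether the strings are valid ICD codes, which matches the note above that no validation is done.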

data.frame with the visit_name column named the same as in the input, and a column named by icd_name containing all the non-NA and non-empty codes found in the wide input data.

As is common with many data sets, key variables can be concentrated in one column or spread over several. By relying on the known format of clinical and administrative hospital data, we can perform the conversion efficiently and accurately, while keeping some metadata about the codes intact, e.g. whether they are ICD-9 or ICD-10.

Long or wide format ICD data are all expected to be in a data frame. The data.frame itself does not carry any ICD classes at the top level, even if it only contains one type of code; whereas its constituent columns may have a class specified, e.g. icd9 or icd10who.

Other ICD data conversion: comorbid_df_to_mat(), comorbid_mat_to_df(), convert, decimal_to_short(), long_to_wide(), short_to_decimal()

widedf <- data.frame(
  visit_name = c("a", "b", "c"),
  icd9_01 = c("441", "4424", "441"),
  icd9_02 = c(NA, "443", NA)
)
wide_to_long(widedf)

  visit_name icd_code
1          a      441
2          b     4424
3          b      443
4          c      441