Convert ICD data from long to wide format

Note the distinction between labelling existing data with any classes which icd provides, and actually converting the structure of the data.

long_to_wide(
  x,
  visit_name = get_visit_name(x),
  icd_name = get_icd_name(x),
  prefix = "icd_",
  min_width = 1L
)

Arguments

x

data.frame of long-form data, one column for visit_name and one for ICD code

visit_name

The name of the column in the data frame which contains the patient or visit identifier. Typically this is the visit identifier, since patients leave and enter hospital with different ICD-9 codes. It is a character vector of length one. If left empty, or NULL, then an attempt is made to guess which field has the ID for the patient encounter (not a patient ID, although this can of course be specified directly). The guesses proceed until a single match is made. Data frames may be wide with many matching fields, so to avoid false positives, anything but a single match is rejected. If there are no successful guesses, and visit_name was not specified, then the first column of the data frame is used.

icd_name

The name of the column in the data frame which contains the ICD codes. This is a character vector of length one. If it is NULL, icd will attempt to guess the column name, looking for progressively less likely possibilities until it matches a single column. Failing this, it will take the first column in the data frame. Specifying the column using this argument avoids the guesswork.

prefix

character string used to prefix the names of the new columns; the default is icd_

min_width

single integer, if specified, writes out this many columns even if no patients have that many codes. Must be greater than or equal to the maximum number of codes per patient.
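The column guessing described for visit_name and icd_name can be pictured with a small base R sketch. The helper guess_column and the patterns below are hypothetical and are not taken from icd; they only illustrate the rule that a guess is accepted when exactly one column matches, with the first column as the fallback.

```r
# Hypothetical helper, not part of icd: try candidate patterns in order and
# accept a pattern only when it matches exactly one column name.
guess_column <- function(x, patterns) {
  stopifnot(is.data.frame(x), is.character(patterns))
  for (p in patterns) {
    hits <- grep(p, names(x), ignore.case = TRUE, value = TRUE)
    if (length(hits) == 1L) {
      return(hits)
    }
  }
  # No pattern gave a single unambiguous match: fall back to the first column.
  names(x)[1L]
}

# Illustrative patterns only; the guesses actually made by icd may differ.
visit_patterns <- c("^visit", "^encounter", "id$")

df <- data.frame(
  visit_name = c("a", "b"),
  icd9 = c("441", "4424"),
  stringsAsFactors = FALSE
)
guess_column(df, visit_patterns)
```

Passing visit_name and icd_name explicitly sidesteps this kind of guessing altogether, which is the safer choice for wide data frames with many similar column names.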

Details

This conversion is more complicated than base::reshape or reshape2::dcast readily allow. This function is a reasonably simple solution using built-in functions.
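As a rough illustration of how such a conversion can be done with built-in functions alone, here is a minimal base R sketch. The function long_to_wide_sketch is hypothetical and is not the icd implementation. It assumes character visit identifiers and codes and at least one row of input, and it simply uses the larger of min_width and the longest code list as the number of columns.

```r
# Hypothetical re-implementation for illustration; not the icd source code.
long_to_wide_sketch <- function(x, visit_name, icd_name,
                                prefix = "icd_", min_width = 1L) {
  visits <- as.character(x[[visit_name]])
  codes <- as.character(x[[icd_name]])
  # Split the codes by visit, keeping visits in order of first appearance.
  by_visit <- split(codes, factor(visits, levels = unique(visits)))
  # Number of code columns: the longest visit, or min_width if that is larger.
  width <- max(lengths(by_visit), min_width)
  # Pad each visit's codes with NA up to the common width.
  padded <- lapply(by_visit, function(v) {
    c(v, rep(NA_character_, width - length(v)))
  })
  wide <- do.call(rbind, padded)
  # Name the code columns prefix + zero-padded index and drop the row names.
  dimnames(wide) <- list(NULL, sprintf("%s%03d", prefix, seq_len(width)))
  out <- data.frame(
    visit = names(by_visit), wide,
    stringsAsFactors = FALSE, check.names = FALSE
  )
  names(out)[1L] <- visit_name
  out
}

longdf <- data.frame(
  visit_name = c("a", "b", "b", "c"),
  icd9 = c("441", "4424", "443", "441"),
  stringsAsFactors = FALSE
)
long_to_wide_sketch(longdf, "visit_name", "icd9")
```

Unlike long_to_wide, this sketch numbers the output rows consecutively and does not preserve any ICD class on the code columns.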

Long and Wide Formats

As is common with many data sets, key variables can be concentrated in one column or spread over several. Given the typical format of clinical and administrative hospital data, we can perform the conversion efficiently and accurately, while keeping some metadata about the codes intact, e.g. whether they are ICD-9 or ICD-10.

Data structure

Long or wide format ICD data are all expected to be in a data frame. The data.frame itself does not carry any ICD classes at the top level, even if it only contains one type of code; whereas its constituent columns may have a class specified, e.g. icd9 or icd10who.
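The following base R sketch illustrates that arrangement: the class lives on the column, not on the data frame. The class is attached by hand here purely as a stand-in for the classes icd assigns, and the column name code is an arbitrary choice.

```r
df <- data.frame(
  visit_name = c("a", "b"),
  code = c("441", "4424"),
  stringsAsFactors = FALSE
)
# Attach a class to the code column only. The class name "icd9" is set by
# hand here as a stand-in for what icd itself would assign.
df$code <- structure(df$code, class = c("icd9", "character"))

inherits(df, "icd9")       # FALSE: the data frame carries no ICD class
inherits(df$code, "icd9")  # TRUE: the column does
class(df)                  # "data.frame"
```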

Examples

longdf <- data.frame(
  visit_name = c("a", "b", "b", "c"),
  icd9 = c("441", "4424", "443", "441")
)
long_to_wide(longdf)
#>   visit_name icd_001 icd_002
#> 1          a     441    <NA>
#> 2          b    4424     443
#> 4          c     441    <NA>
long_to_wide(longdf, prefix = "ICD10_")
#>   visit_name ICD10_001 ICD10_002
#> 1          a       441      <NA>
#> 2          b      4424       443
#> 4          c       441      <NA>