This function prepares the input data for categorization and, together with the C++ matrix code, forms the core of the package. It is pure data manipulation and generalizes beyond medical data.

categorize_simple(
  x,
  map,
  id_name,
  code_name,
  return_df = FALSE,
  return_binary = FALSE,
  restore_id_order = TRUE,
  preserve_id_type = FALSE,
  comorbid_fun = comorbid_mat_mul_wide,
  ...
)

Arguments

x

Data frame containing a column for an 'id' and a column for a code, e.g., an ICD-10 code.

map

Named list of character vectors of ICD-9 codes. For example, the AHRQ ICD-9 comorbidity map contains list(OBESE = c("2780", "27800", "27801", "27803", "V8554", "79391", "64910", "64911", "64912", "64913", "64914", "V8530", "V8531", "V8532", "V8533", "V8534", "V8535", "V8536", "V8537", "V8538", "V8539", "V8541", "V8542", "V8543", "V8544", "V8545"), DEPRESS = c("3004", "30112", "3090", "3091", "311")), among other, longer groups.
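As a minimal sketch of the structure described above, a map can be built by hand as a plain named list. The name toy_map is hypothetical, and the code vectors are shortened from the description, so this is not a complete comorbidity definition.

```r
# Hypothetical two-group map. The codes are copied from the argument
# description above and truncated for brevity.
toy_map <- list(
  OBESE = c("2780", "27800", "27801"),
  DEPRESS = c("3004", "30112", "3090", "3091", "311")
)

names(toy_map)              # the group names
lengths(toy_map)            # number of codes in each group
"311" %in% toy_map$DEPRESS  # membership test against one group
```

Any named list of character vectors with this shape should be usable in the same way, which is what makes the function applicable beyond medical codes.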

id_name

The name of the data.frame field which is the unique identifier.

code_name

String with name(s) of column(s) containing the codes.

return_df

Single logical value. If TRUE, return 'tidy' data, i.e., a data frame whose first column is the visit_id and whose second column is the count. If visit_id was a factor or was named differently in the input, this is preserved.

return_binary

Logical value. If TRUE, the output contains 0s and 1s instead of TRUE and FALSE.
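The difference between the two output forms can be illustrated in base R. This is only a sketch of the idea, not the package's internal conversion; the matrix flags and its IDs are made up for illustration.

```r
# Hypothetical logical result: rows are visit IDs, columns are map groups.
flags <- matrix(
  c(TRUE, FALSE, FALSE, TRUE),
  nrow = 2,
  dimnames = list(c("v1", "v2"), c("OBESE", "DEPRESS"))
)

# Changing the storage mode turns TRUE/FALSE into 1/0 while keeping
# the dimensions and dimnames intact.
binary <- flags
storage.mode(binary) <- "integer"
binary
```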

restore_id_order

Logical value. If TRUE (the default), the order of the visit IDs in the output matches the order in which they were first encountered in the input data. Restoring the order takes a third of the calculation time on data with tens of millions of rows, so if the visit IDs will be discarded when summarizing the data, set this to FALSE for a big speed-up.
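What "first encountered" order means can be sketched in base R with unique() and match(). The data here are invented, and the sketch assumes the computation returned its rows sorted by ID; it does not show how the package itself restores the order.

```r
# Hypothetical input IDs, with repeats, in their original order.
ids <- c("v3", "v1", "v3", "v2", "v1")
first_seen <- unique(ids)  # keeps the first occurrence of each ID

# Assume the computation returned rows sorted by ID.
res <- matrix(
  c(TRUE, FALSE, TRUE),
  ncol = 1,
  dimnames = list(c("v1", "v2", "v3"), "GROUP")
)

# Reorder the rows to follow the order of first appearance in the input.
restored <- res[match(first_seen, rownames(res)), , drop = FALSE]
rownames(restored)
```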

preserve_id_type

Single logical value. If TRUE, the visit ID column is converted back to its original type. With the default of FALSE, only factor and character types are restored in the returned data frame. For matrices, the row names are necessarily stored as character vectors.
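The remark about matrix row names follows from how base R stores dimnames, as this small sketch shows. The IDs are invented, and the conversion back with as.integer() is an illustration of the idea rather than the package's own code.

```r
# Hypothetical integer visit IDs.
visit_id <- c(1001L, 1002L, 1003L)

m <- matrix(c(TRUE, FALSE, TRUE), ncol = 1, dimnames = list(NULL, "GROUP"))
rownames(m) <- visit_id   # base R coerces row names to character

is.character(rownames(m)) # the integer type is lost at this point

# Converting back to the original type has to be done explicitly.
restored_id <- as.integer(rownames(m))
identical(restored_id, visit_id)
```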

comorbid_fun

Function, i.e., the function symbol (not a character string), to be called to do the comorbidity calculation.

...

arguments passed on to other functions

Details

The roadmap for icd includes packaging the optimized categorization component independently, with the comorbidity package taking on the front-end for ICD-code-based comorbidities. This is under discussion.

Examples

u <- uranium_pathology
m <- icd10_map_ahrq
u$icd10 <- decimal_to_short(u$icd10)
j <- icd:::categorize_simple(u, m, id_name = "case", code_name = "icd10")
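To show the kind of result categorization produces without relying on the package data, here is a base R sketch. It is a deliberately naive stand-in, not the C++ matrix implementation the package uses; the data frame visits, the list toy_map and the codes in them are made up for illustration and are not taken from the AHRQ map.

```r
# Hypothetical input: one row per (visit, code) pair.
visits <- data.frame(
  case = c("a", "a", "b", "c"),
  icd10 = c("E669", "F329", "I10", "F329"),
  stringsAsFactors = FALSE
)
# Hypothetical map with two groups of codes.
toy_map <- list(OBESE = c("E66", "E669"), DEPRESS = c("F32", "F329"))

ids <- unique(visits$case)

# For each group, flag the visits that have at least one code in that group.
# vapply() returns one column per map entry, named after the group.
res <- vapply(
  toy_map,
  function(codes) ids %in% visits$case[visits$icd10 %in% codes],
  logical(length(ids))
)
rownames(res) <- ids
res
```

The result is a logical matrix with one row per ID and one column per map group, which is the shape a non-data-frame return would be expected to have.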