emr: An analysis tool for diagnosis and procedure records

#' Common arguments of diagnostic functions
#'
#' Common arguments of diagnostic functions
#'
#' Common arguments of diagnostic functions
#' @name common_DxArg
#' @import data.table
#' @param dxDataFile A data frame of clinical diagnostic data with at least 3 columns: ID, ICD, and Date. Dates should be in YYYY/MM/DD or YYYY-MM-DD format.
#' @param idColName Name of the ID column in dxDataFile. The argument should be given as a string without quotation marks.
#' @param icdColName Name of the ICD column in dxDataFile. The argument should be given as a string without quotation marks.
#' @param dateColName Name of the date column in dxDataFile. The column should be of an R date type, or a string with the date in YYYY/MM/DD or YYYY-MM-DD format. The argument should be given as a string without quotation marks.
#' @param icdVerColName (Optional) Name of the column in dxDataFile that records the ICD version (ICD-9 or ICD-10), if there is one. Values in this column should be numeric, 9L or 10L, indicating which ICD version each record uses. See the examples below for more information.
#' @param icd10usingDate The date from which ICD-10 is used in dxDataFile. The format should be YYYY/MM/DD or YYYY-MM-DD. Required if icdVerColName is NULL.
#' @param isDescription Logical. If \code{TRUE}, the category description of the classification method is used in the group column; if \code{FALSE}, the category name is used. Default is \code{TRUE} (standard category description).
#' @param groupDataType The grouping method to use: CCS (\code{ccs}), CCSR (\code{ccsr}), multiple-level CCS (\code{ccslvl1}, \code{ccslvl2}, \code{ccslvl3}, \code{ccslvl4}), PheWAS (\code{PheWAS}), comorbidities (\code{ahrq}, \code{charlson}, \code{elix}), or a precise or fuzzy customized method (\code{customIcdGroup}, \code{customGrepIcdGroup}). The value should be one of the strings above, without quotation marks. Default is \code{ccs}. To select cases by ungrouped ICD codes, use \code{ICD}.
#' @param customGroupingTable User-defined grouping categories. \code{icdDxToCustom} needs a dataset with two columns, "Group" and "ICD": "Group" defines one or more disease categories, and "ICD" lists the ICD codes that belong to each category. \code{icdDxToCustomGrep} needs a dataset with two columns, "Group" and "grepIcd": "Group" defines one or more disease categories, and "grepIcd" holds regular expressions that match the ICD codes of each category.
#' @param selectedCaseFile A data frame that labels each case as selected or not. It can be generated by the \code{\link{selectCases}} function or defined by the user (not recommended; the column names and data types must match the output of \code{selectCases}). Default is \code{NULL}.
NULL

#' Code format transformation
#'
#' These two functions convert ICD diagnostic codes to a uniform format.
#'
#' \code{icdDxShortToDecimal} can be used for grouping diagnostic codes into the PheWAS classification (\code{\link{dxPheWAS}}: icdDxToPheWAS). \code{icdDxDecimalToShort} can be used for grouping into the other classification methods (\code{\link{dxCCS}}: icdDxToCCS & icdDxToCCSLvl; \code{\link{dxComorbid}}: icdDxToComorbid). Besides converting the ICD codes to a uniform format, these functions also check for potential errors in the ICD format and version.
#'
#'
#' @name dxUniform
#' @inherit common_DxArg
#' @param dxDataFile A data frame object of clinical diagnostic data with at least 2 columns: ICD, and Date. As for date column, the data format should be YYYY/MM/DD or YYYY-MM-DD.
#' @importFrom utils head
#' @return Two new \code{data.table}s. 1) \code{ICD}: Uniform format diagnostic codes with column name "ICD". 2) \code{Error}: Potential error codes with 5 columns: ICD, count, IcdVersionInFile, WrongType and Suggestion.
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # convert the diagnostic codes to the decimal format
#'
#' icdDxShortToDecimal(sampleDxFile,ICD,Date, icd10usingDate = "2015/10/01")
#'
#' # convert the diagnostic codes to the short format
#'
#' icdDxDecimalToShort(sampleDxFile,ICD,Date, icd10usingDate = "2015/10/01")
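#'
#' # A minimal base-R sketch of the short-to-decimal idea, not the package's
#' # implementation. shortToDecimalSketch is a hypothetical helper, and the dot
#' # positions (after the fourth character for ICD-9 E codes, after the third
#' # character otherwise) are assumed from the usual ICD code layout.
#'
#' shortToDecimalSketch <- function(code, version) {
#'   # drop any existing dot so that mixed input is handled the same way
#'   code <- gsub(".", "", code, fixed = TRUE)
#'   dotPos <- ifelse(version == 9L & startsWith(code, "E"), 4L, 3L)
#'   # codes with no digits after the dot position are returned unchanged
#'   ifelse(nchar(code) > dotPos,
#'          paste0(substr(code, 1L, dotPos), ".",
#'                 substr(code, dotPos + 1L, nchar(code))),
#'          code)
#' }
#' shortToDecimalSketch(c("5853", "N181", "E8490"), c(9L, 10L, 9L))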
NULL

#' Code classification for CCS
#'
#' These CCS functions (\code{icdDxToCCS} and \code{icdDxToCCSLvl}) collapse ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD diagnostic codes are.
#' CCS classification for ICD-9 and ICD-10 codes is a diagnostic categorization scheme that can be employed in many types of projects analyzing data on diagnoses.
#'
#' Notice: CCS has not been updated since 2019 and has been replaced by Clinical Classifications Software Refined (CCSR). The \code{\link{dxCCSR}} (icdDxToCCSR) function is also provided.
#'
#' @name dxCCS
#' @inherit common_DxArg
#' @param CCSLevel Numeric. Used for multi-level CCS. Default is \code{1}. There are four levels (1 to 4) of multi-level CCS for ICD-9 and two levels (1 and 2) for ICD-10.
#' @return Three new \code{data.table}s. 1) \code{groupedDT}: \code{dxDataFile} with two new columns, one for the uniform format diagnostic codes and one for the classified categories. 2) \code{summarised_groupedDT}: A summary of \code{groupedDT}, sorted by memberID. 3) \code{Error}: Potential error codes from the standardization step: \code{\link{dxUniform}} (icdDxShortToDecimal and icdDxDecimalToShort).
#' @seealso Other code classification functions: \code{\link{dxCCSR}} (icdDxToCCSR), \code{\link{dxPheWAS}} (icdDxToPheWAS), \code{\link{dxCustom}} (icdDxToCustom and icdDxToCustomGrep), \code{\link{dxComorbid}} (icdDxToComorbid).
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Group diagnostic codes into single level of CCS classification
#'
#' icdDxToCCS(sampleDxFile, ID, ICD, Date, icd10usingDate =  "2015-10-01", isDescription = TRUE)
#'
#' # Group diagnostic codes into multiple levels of CCS classification
#'
#' icdDxToCCSLvl(sampleDxFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01", 2, TRUE)
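#'
#' # A minimal sketch of how icd10usingDate can stand in for a version column,
#' # assuming that records dated on or after icd10usingDate are treated as
#' # ICD-10 and earlier records as ICD-9. inferVersionSketch is a hypothetical
#' # helper, not a function of this package.
#'
#' inferVersionSketch <- function(date, icd10usingDate) {
#'   ifelse(as.Date(date) >= as.Date(icd10usingDate), 10L, 9L)
#' }
#' inferVersionSketch(c("2015-09-30", "2015-10-01"), "2015-10-01")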
NULL

#' Code classification for CCSR
#'
#' The CCSR function (\code{icdDxToCCSR}) collapses ICD-10 codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD-10 diagnostic codes are.
#' CCSR classification for ICD-10 codes is a diagnostic categorization scheme that can be employed in many types of projects analyzing data on diagnoses.
#'
#' Notice: CCSR is only applicable to ICD-10. To process ICD-9 data, please use \code{\link{dxCCS}} (icdDxToCCS) function.
#'
#' @name dxCCSR
#' @inherit common_DxArg
#' @return Three new \code{data.table}s. 1) \code{groupedDT}: \code{dxDataFile} with two new columns, one for the uniform format diagnostic codes and one for the classified categories. 2) \code{summarised_groupedDT}: A summary of \code{groupedDT}, sorted by memberID. 3) \code{Error}: Potential error codes from the standardization step: \code{\link{dxUniform}} (icdDxShortToDecimal and icdDxDecimalToShort).
#' @seealso Other code classification functions: \code{\link{dxCCS}} (icdDxToCCS), \code{\link{dxPheWAS}} (icdDxToPheWAS), \code{\link{dxCustom}} (icdDxToCustom and icdDxToCustomGrep), \code{\link{dxComorbid}} (icdDxToComorbid).
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Group diagnostic codes into single level of CCSR classification
#'
#' icdDxToCCSR(sampleDxFile, ID, ICD, Date, icdVerColName = Version, isDescription = TRUE)
NULL


#' Code classification for PheWAS
#'
#' The PheWAS classification for ICD-9-CM codes is a diagnostic categorization scheme that can be employed in many types of projects analyzing data on diagnoses.
#'
#' The function collapses ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD diagnostic codes are.
#'
#' @name dxPheWAS
#' @inherit common_DxArg
#' @return Three new \code{data.table}s. 1) \code{groupedDT}: \code{dxDataFile} with two new columns, one for the uniform format diagnostic codes and one for the classified categories. 2) \code{summarised_groupedDT}: A summary of \code{groupedDT}, sorted by memberID. 3) \code{Error}: Potential error codes from the standardization step: \code{\link{dxUniform}} (icdDxShortToDecimal and icdDxDecimalToShort).
#' @seealso Other code classification functions: \code{\link{dxCustom}} (icdDxToCustom and icdDxToCustomGrep), \code{\link{dxComorbid}} (icdDxToComorbid), \code{\link{dxCCS}} (icdDxToCCS and icdDxToCCSLvl)
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Group diagnostic codes into PheWAS
#'
#' icdDxToPheWAS(sampleDxFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01", FALSE)
NULL

#' Code classification for customized group
#'
#' Researchers can define their own grouping categories, which gives more flexibility in grouping ICD diagnostic codes.
#'
#' There are two functions for customized grouping: \code{icdDxToCustom} groups by precise (exact) matching, and \code{icdDxToCustomGrep} groups by fuzzy (regular expression) matching.
#'
#' @name dxCustom
#' @inherit common_DxArg
#' @return Two new \code{data.table}s. 1) \code{groupedDT}: \code{dxDataFile} with two new columns, one for the uniform format diagnostic codes and one for the classified standard categories. 2) \code{summarised_groupedDT}: A summary of \code{groupedDT}, sorted by memberID.
#' @seealso Other code classification functions: \code{\link{dxPheWAS}}, \code{\link{dxComorbid}}, \code{\link{dxCCS}}
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Group diagnostic codes into "Chronic kidney disease" with precise grouping method
#'
#' groupingTable <- data.frame(Group = rep("Chronic kidney disease",6),
#'                             ICD = c("N181","5853","5854","5855","5856","5859"),
#'                             stringsAsFactors = FALSE)
#'
#' icdDxToCustom(sampleDxFile, ID, ICD, Date, customGroupingTable = groupingTable)
#'
#' # Group diagnostic codes into "Chronic kidney disease" with fuzzy grouping method
#'
#' grepTable <- data.frame(Group = "Chronic kidney disease",
#'                         grepIcd = "^585|^N18",
#'                         stringsAsFactors = FALSE)
#'
#' icdDxToCustomGrep(sampleDxFile, ID, ICD, Date, customGroupingTable = grepTable)
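#'
#' # A minimal sketch of the difference between the two methods, using base R
#' # only. This is not the package's implementation, and exampleCodes is a
#' # hypothetical vector. Precise grouping is an exact match against the listed
#' # codes; fuzzy grouping is a regular-expression match, so it also catches
#' # codes that are not listed one by one (here "N189").
#'
#' exampleCodes <- c("5853", "N181", "N189", "25000")
#' preciseMatch <- exampleCodes %in% c("N181", "5853", "5854", "5855", "5856", "5859")
#' fuzzyMatch <- grepl("^585|^N18", exampleCodes)
#' data.frame(exampleCodes, preciseMatch, fuzzyMatch)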
#'
NULL

#' Code classification for Comorbidity
#'
#' The comorbidity classifications (AHRQ, Charlson, and Elixhauser Comorbidity) for ICD diagnostic codes are diagnostic categorization schemes that can be employed in many types of projects analyzing data on diagnoses.
#'
#' The function collapses ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD diagnostic codes are.
#'
#' @name dxComorbid
#' @inherit common_DxArg
#' @param comorbidMethod The comorbidity method to use: AHRQ (\code{ahrq}), Charlson (\code{charlson}), or Elixhauser Comorbidity (\code{elix}).
#' @param isDescription Whether to use the category name or the category description of the standard classification method for ICD diagnostic codes. Default is \code{FALSE} (comorbidity categories).
#' @return Three new \code{data.table}s. 1) \code{groupedDT}: \code{dxDataFile} with two new columns, one for the uniform format diagnostic codes and one for the classified categories. 2) \code{summarised_groupedDT}: A summary of \code{groupedDT}, sorted by memberID. 3) \code{Error}: Potential error codes from the standardization step: \code{\link{dxUniform}} (\code{icdDxShortToDecimal} and \code{icdDxDecimalToShort}).
#' @seealso Other code classification functions: \code{\link{dxPheWAS}}, \code{\link{dxCustom}}, \code{\link{dxCCS}}
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Group diagnostic codes into charlson comorbidity categories
#'
#' icdDxToComorbid(sampleDxFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01", charlson)
NULL

#' Data integration for case selection
#'
#' This query function selects the cases that match the defined conditions for analysis.
#'
#' Users can select cases by diagnostic category, such as a CCS category or ICD codes. The function also provides options to set the minimum number of diagnoses within a specific duration. The output dataset can be passed to \code{\link{groupedDataLongToWide}} to create wide-format tables for statistical analysis.
#'
#' @name selectCases
#' @inherit common_DxArg
#' @param caseCondition The diseases to be selected. The condition can be a specific ICD code, a CCS category description, etc. Strings containing regular expressions are also supported.
#' @param caseCount Minimum number of diagnoses required for a patient to be selected. If \code{caseCount} = \code{2}, only patients who have been diagnosed at least twice are selected. Default is \code{1}.
#' @param PeriodRange The duration of interest for the case selection, given as a vector of the lower and upper bound in days. Default is 30 to 365 days (\code{c(30,365)}).
#' @param caseName The value used to label selected cases. It is filled in the labeling column called \code{selectedCase}. Default is \code{"selected"}.
#' @return A new \code{data.table} based on the standard classification dataset with a new column, \code{selectedCase}, in which each cell is labeled as selected or not. If a patient was diagnosed with the diseases of interest but does not satisfy the selection condition, the \code{selectedCase} cell is labeled with a star (*).
#' @seealso Other data integration functions: \code{\link{splitDataByDate}}, \code{\link{getEligiblePeriod}}, \code{\link{getConditionEra}}
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Select cases with "Diseases of the urinary system" by level 2 of CCS classification
#'
#' selectCases(dxDataFile = sampleDxFile,
#'             ID, ICD, Date,
#'             icdVerColName = NULL,
#'             groupDataType = ccslvl2,
#'             icd10usingDate = "2015/10/01",
#'             caseCondition = "Diseases of the urinary system",
#'             caseCount = 1)
NULL

#' Data integration for data split
#'
#' This function splits data by the date of the clinical event and shows the data recorded before or after the clinical event, and calculates the period between the record date and index date based on a self-defined window gap.
#'
#' In most cases, users need to extract data around a specific clinical event (e.g., the first diagnosis date of a chronic disease). Users can define a table of clinical index dates for each patient. The dates can be generated by \code{\link{selectCases}}, or taken from the first or last admission date returned by \code{\link{getEligiblePeriod}}.
#'
#' @name splitDataByDate
#' @inherit common_DxArg
#' @param gap Gap length of the window in days. Default is \code{30} (data will be separated every 30 days).
#' @param indexDateFile A data frame containing the index date of each patient in an observed period. The column names should be 'ID' and 'indexDate'.
#' @return A new \code{data.table} based on \code{dxDataFile} and classified by \code{indexDateFile} for each patient.
#' @seealso Other data integration functions: \code{\link{selectCases}}, \code{\link{getEligiblePeriod}}, \code{\link{getConditionEra}}
#' @examples
#' # sample file for example
#'
#' SampleforCertainPatient <- sampleDxFile[grepl("A0|B0|C0|D0",ID),]
#'
#' head(SampleforCertainPatient)
#'
#' # Define index dates for patients A0, B0, C0, and D0
#'
#' indexDateTable <- data.frame(ID = c("A0","B0","C0","D0"),
#'                              indexDate = c("2023-08-12", "2024-02-12",
#'                                            "2015-12-05", "2017-01-29"),
#'                              stringsAsFactors = FALSE)
#' indexDateTable
#' # Split data by index date for each patient
#'
#' splitedData <- splitDataByDate(SampleforCertainPatient, ID, ICD, Date,
#'                                indexDateFile = indexDateTable,
#'                                gap = 30)
#' splitedData[15:19,]
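#'
#' # A minimal sketch of the before/after labeling and the window numbering,
#' # under the assumption that windows are counted in steps of gap days away
#' # from the index date. All names below are hypothetical, and the package may
#' # number its windows differently.
#'
#' sketchRecordDate <- as.Date(c("2015-11-20", "2015-12-05", "2016-01-10"))
#' sketchIndexDate <- as.Date("2015-12-05")
#' sketchGap <- 30
#' # signed distance in days between each record and the index date
#' sketchDayDiff <- as.numeric(sketchRecordDate - sketchIndexDate)
#' # "B" for records before the index date, "A" for records on or after it
#' sketchTimeTag <- ifelse(sketchDayDiff < 0, "B", "A")
#' # window 1 covers the first gap days on either side, window 2 the next, ...
#' sketchWindow <- abs(sketchDayDiff) %/% sketchGap + 1
#' data.frame(sketchRecordDate, sketchTimeTag, sketchWindow)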
NULL

#' Data integration for patient record period
#'
#' \code{getEligiblePeriod} finds the first and last clinical event for each patient. The result can serve as the index date in subsequent steps such as \code{\link{splitDataByDate}}.
#'
#' The function queries the earliest and latest admission date for each patient.
#'
#' @name getEligiblePeriod
#' @inherit common_DxArg
#' @return A new \code{data.table} based on \code{dxDataFile} with the earliest and latest admission date for each patient.
#' @seealso Other data integration functions: \code{\link{selectCases}}, \code{\link{splitDataByDate}}, \code{\link{getConditionEra}}
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Earliest and latest admission date for each patient
#'
#' record <- getEligiblePeriod(sampleDxFile, ID, Date)
#' head(record)
NULL

#' Data integration for condition era calculation
#'
#' A condition era integrates the scattered clinical records of a condition into a single progression record.
#'
#' This function calculates condition era by grouped categories of each patient.
#'
#' @name getConditionEra
#' @importFrom stats na.omit
#' @inherit common_DxArg
#' @param gapDate Numeric. Length of the condition gap in days. Default is \code{30}.
#' @return A new \code{data.table} based on the classified \code{dxDataFile}, with the condition era calculated by \code{groupDataType} for each patient.
#' @seealso Other data integration functions: \code{\link{selectCases}}, \code{\link{splitDataByDate}}, \code{\link{getEligiblePeriod}}
#' @examples
#' # sample file for example
#'
#' head(sampleDxFile)
#'
#' # Select case with "Diseases of the urinary system" by level 2 of CCS classification
#'
#' selectedCaseFile <- selectCases(sampleDxFile, ID, ICD, Date,
#'                                 icdVerColName = NULL,
#'                                 icd10usingDate = "2015/10/01",
#'                                 groupDataType = ccslvl2,
#'                                 caseCondition = "Diseases of the urinary system",
#'                                 caseCount = 1)
#'
#' # Condition era calculation with case selection
#'
#' Era1 <- getConditionEra(sampleDxFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01",
#'                         groupDataType = ccslvl3,
#'                         selectedCaseFile = selectedCaseFile)
#' head(Era1)
#'
#' # Define the grouping categories
#'
#' grepTable <- data.frame(Group = "Chronic kidney disease",
#'                         grepIcd = "^58|^N18",
#'                         stringsAsFactors = FALSE)
#'
#' # Condition era calculation with grouping custom method of code standardization
#'
#' Era2 <- getConditionEra(sampleDxFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01",
#'                         groupDataType = customGrepIcdGroup,
#'                         customGroupingTable = grepTable)
#' head(Era2)
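#'
#' # A minimal sketch of the era idea for one patient and one category, assuming
#' # that a new era starts whenever two consecutive records are more than
#' # gapDate days apart. The names below are hypothetical and this is not the
#' # package's implementation.
#'
#' sketchDates <- sort(as.Date(c("2015-01-01", "2015-01-20",
#'                               "2015-03-15", "2015-03-30")))
#' sketchGapDate <- 30
#' # days since the previous record (0 for the first record)
#' sketchDayGap <- c(0, as.numeric(diff(sketchDates)))
#' # the era counter goes up by one each time the gap is exceeded
#' sketchEra <- cumsum(sketchDayGap > sketchGapDate) + 1
#' data.frame(sketchDates, sketchDayGap, sketchEra)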
NULL

#' Data format transformation
#'
#' This function converts grouped data from long format into a wide format that suits other analytical and plotting packages.
#'
#' The output of this function can be numeric or binary wide format.
#'
#' @name groupedDataLongToWide
#' @inherit common_DxArg
#' @param dxDataFile a \code{groupedDT} format data frame generated from four strategies of code classification (using unprocessed ICD codes as input is not recommended).
#' @param numericOrBinary Whether each diagnostic category in the wide format is marked as binary (\code{TRUE}/\code{FALSE}, diagnosed or not) or as a numeric count of diagnoses. Type N or B (character without quotation marks) to specify the method. Default is binary (\code{B}).
#' @return A new \code{data.table} based on the classified \code{dxDataFile} dataset, converted into wide format.
#' @examples
#'
#' # Create a grouped data
#'
#' ELIX <- icdDxToComorbid(dxDataFile = sampleDxFile,
#'                        idColName = ID,
#'                        icdColName = ICD,
#'                        dateColName = Date,
#'                        icd10usingDate = "2015/10/01",
#'                        comorbidMethod = elix)
#'
#' head(ELIX$groupedDT)
#'
#' # Select case with "Diseases of the urinary system" by level 2 of CCS classification
#'
#' selectedCaseFile <- selectCases(dxDataFile = sampleDxFile,
#'                                 idColName = ID,
#'                                 icdColName = ICD,
#'                                 dateColName = Date,
#'                                 icdVerColName = NULL,
#'                                 icd10usingDate = "2015/10/01",
#'                                 groupDataType = ccslvl2,
#'                                 caseCondition = "Diseases of the urinary system",
#'                                 caseCount = 1)
#'
#' # Convert the long format of grouped data into a wide binary format with selected case
#'
#' groupedDataWide <- groupedDataLongToWide(ELIX$groupedDT,
#'                                          idColName = ID,
#'                                          categoryColName = Comorbidity,
#'                                          dateColName = Date,
#'                                          selectedCaseFile = selectedCaseFile)
#' groupedDataWide
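#'
#' # A minimal base-R sketch of the two wide formats, not the package's
#' # implementation. sketchLong is a hypothetical long table with one row per
#' # diagnosis; the numeric format counts diagnoses per patient and category,
#' # and the binary format only records whether the count is above zero.
#'
#' sketchLong <- data.frame(ID = c("A", "A", "B", "B", "B"),
#'                          Category = c("CHF", "DM", "DM", "DM", "Renal"),
#'                          stringsAsFactors = FALSE)
#' sketchNumericWide <- table(sketchLong$ID, sketchLong$Category)
#' sketchBinaryWide <- sketchNumericWide > 0
#' sketchNumericWide
#' sketchBinaryWide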
NULL

#' Plot for error ICD list
#'
#' Pareto chart of error ICD list
#'
#' The first-phase functions (code standardization) detect diagnosis codes with potential errors.
#' The Pareto chart combines a bar plot and a line chart: the bars show the individual potentially erroneous ICD codes in descending order of frequency, and the line shows the cumulative total.
#'
#' @import data.table
#' @import ggplot2
#' @importFrom stats reorder
#' @name plotICDError
#' @param errorFile Error file (a data frame) from ICD uniform functions \code{dxUniform} (icdDxDecimalToShort or icdDxShortToDecimal)
#' @param ICDVersion The ICD version of interest. The argument should be a string: 9, 10 or all (without quotation marks).
#' @param wrongICDType The error type of interest. The value can be version, format, or all, and is also a string without quotation marks.
#' @param groupICD Logical. Only ICD-9 codes can be grouped, because ICD-10 already has unique alphanumeric codes to identify known diseases. Default is FALSE.
#' @param others Default is TRUE
#' @param topN Numeric. Default is 10 (the 10 most common erroneous ICD codes).
#' @return A Pareto plot and a \code{data.table} of statistical information about error codes.
#' @seealso Other plot functions: \code{\link{plotDiagCat}}
#' @examples
#' # sample file for example
#' head(sampleDxFile)
#'
#' # Data of diagnosis codes with potential error
#'
#' error <- icdDxDecimalToShort(sampleDxFile, ICD, Date, icdVerColName = NULL, "2015/10/01")
#'
#' # Plot of top 3 common error ICD-9 codes and a list of the detail of error ICD codes
#'
#' plotICDError(errorFile = error$Error,
#'               icdVersion = 9,
#'               wrongICDType = all,
#'               groupICD = TRUE,
#'               others = TRUE,
#'               topN = 3)
#'
#' # Plot of top 10 common error ICD codes and a list of the detail of error ICD codes
#'
#' plotICDError(errorFile = error$Error,
#'               icdVersion = all,
#'               wrongICDType = all,
#'               groupICD = FALSE,
#'               others = TRUE)
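#'
#' # A minimal sketch of the numbers behind a Pareto chart, using base R only.
#' # sketchErrorCount is a hypothetical set of error counts, not output of this
#' # package: the bars are the counts in descending order, and the line is the
#' # cumulative percentage of all errors.
#'
#' sketchErrorCount <- c(code1 = 4, code2 = 20, code3 = 8, code4 = 18)
#' sketchSorted <- sort(sketchErrorCount, decreasing = TRUE)
#' sketchCumPercent <- cumsum(sketchSorted) / sum(sketchSorted) * 100
#' data.frame(count = sketchSorted, cumulativePercent = sketchCumPercent)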
#'
NULL

#' Plot of diagnostic categories
#'
#' Histogram plot of diagnostic categories
#'
#' This function provides an overview of the grouped categories of the diagnostic codes in a histogram plot. Users can observe the proportion of each diagnostic category in their dataset.
#' The chi-square test and Fisher’s exact test are also included. Users can test whether the proportion of each diagnostic category differs significantly between the case group and the control group.
#'
#' @import data.table
#' @import ggplot2
#' @importFrom stats chisq.test
#' @importFrom stats fisher.test
#' @name plotDiagCat
#' @param groupedDataWide A wide-format data frame (generated by the \code{\link{groupedDataLongToWide}} function).
#' @param topN Numeric. Default is 10 (the 10 most common diagnostic categories).
#' @param limitFreq Numeric. Minimum frequency shown (categories with a frequency below this threshold are not shown in the plot). Default is 0.01; in other words, a category is shown only if at least 1 percent of all patients were diagnosed in it.
#' @param pvalue Numeric. Significance threshold for the p-value of the test (\code{chisq.test}). Default is 0.05.
#' @return A histogram plot and a \code{data.table} of summarized classified data.
#' @seealso Other plot functions: \code{\link{plotICDError}}
#' @examples
#' # sample file for example
#' head(sampleDxFile)
#'
#' # Create a grouped data
#'
#' ELIX <- icdDxToComorbid(dxDataFile = sampleDxFile,
#'                        idColName = ID,
#'                        icdColName = ICD,
#'                        dateColName = Date,
#'                        icd10usingDate = "2015/10/01",
#'                        comorbidMethod = elix)
#'
#' head(ELIX$groupedDT)
#'
#' # Convert long format of grouped data into wide binary format
#'
#' groupedDataWide <- groupedDataLongToWide(ELIX$groupedDT,
#'                                          idColName = ID,
#'                                          categoryColName = Comorbidity,
#'                                          dateColName = Date)
#'
#' # plot of top 10 common grouped categories and a list of the detail of grouped categories
#'
#' plot1 <- plotDiagCat(groupedDataWide = groupedDataWide,
#'                           idColName = ID,
#'                           topN = 10,
#'                           limitFreq = 0.01)
#' plot1
#'
#' # Select case with "Diseases of the urinary system" by level 2 of CCS classification
#'
#' selectedCaseFile <- selectCases(dxDataFile = sampleDxFile,
#'                                 idColName = ID,
#'                                 icdColName = ICD,
#'                                 dateColName = Date,
#'                                 icdVerColName = NULL,
#'                                 icd10usingDate = "2015/10/01",
#'                                 groupDataType = ccslvl2,
#'                                 caseCondition = "Diseases of the urinary system",
#'                                 caseCount = 1)
#'
#' # Convert the long format of grouped data into a wide binary format with selected case
#'
#' groupedDataWide <- groupedDataLongToWide(ELIX$groupedDT,
#'                                          idColName = ID,
#'                                          categoryColName = Comorbidity,
#'                                          dateColName = Date,
#'                                          selectedCaseFile = selectedCaseFile)
#'
#' # plot of top 10 common grouped categories and a list of the detail of grouped categories
#'
#' plot2 <- plotDiagCat(groupedDataWide = groupedDataWide,
#'                           idColName = ID,
#'                           topN = 10,
#'                           limitFreq = 0.01,
#'                           pvalue = 0.05,
#'                           groupColName = selectedCase)
#' plot2
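#'
#' # A minimal sketch of the case/control comparison for one category, using
#' # stats::chisq.test and stats::fisher.test directly. The counts in sketchTab
#' # are hypothetical, and the rule for choosing a test (Fisher's exact test
#' # when any expected count is below 5) is a common convention assumed here,
#' # not necessarily the rule used inside plotDiagCat.
#'
#' sketchTab <- matrix(c(12, 8, 3, 17), nrow = 2,
#'                     dimnames = list(group = c("case", "control"),
#'                                     diagnosed = c("yes", "no")))
#' sketchExpected <- suppressWarnings(chisq.test(sketchTab)$expected)
#' sketchP <- if (any(sketchExpected < 5)) {
#'   fisher.test(sketchTab)$p.value
#' } else {
#'   chisq.test(sketchTab)$p.value
#' }
#' sketchP < 0.05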
NULL

#' Common arguments of procedure functions
#'
#' Common arguments of procedure functions
#'
#' Common arguments of procedure functions
#' @name common_PrArg
#' @import data.table
#' @param prDataFile A data frame of clinical procedure data with at least 3 columns: ID, ICD, and Date. Dates should be in YYYY/MM/DD or YYYY-MM-DD format.
#' @param idColName Name of the ID column in prDataFile. The argument should be given as a string without quotation marks.
#' @param icdColName Name of the ICD column in prDataFile. The argument should be given as a string without quotation marks.
#' @param dateColName Name of the date column in prDataFile (with dates in YYYY/MM/DD or YYYY-MM-DD format). The argument should be given as a string without quotation marks.
#' @param icdVerColName (Optional) Name of the column in prDataFile that records the ICD version (ICD-9 or ICD-10). Values in this column should be numeric, 9L or 10L.
#' @param icd10usingDate The date from which ICD-10 is used in prDataFile. The format should be YYYY/MM/DD or YYYY-MM-DD. Required if icdVerColName is NULL.
#' @param isDescription Logical. If \code{TRUE}, the category description of the classification method is used in the group column; if \code{FALSE}, the category name is used. Default is \code{TRUE} (standard category description).
NULL

#' Code format transformation
#'
#' These two functions convert ICD procedure codes to a uniform format.
#'
#' Besides converting the ICD codes to a uniform format, these functions also check for potential errors in the ICD codes’ format or version.
#'
#' @inherit common_PrArg
#' @name prUniform
#' @param prDataFile A data frame object of clinical procedure data with at least 2 columns: ICD, and Date. As for date column, the data format should be YYYY/MM/DD or YYYY-MM-DD.
#' @return Two new \code{data.table}s. 1) \code{ICD}: Uniformly formatted procedure codes. 2) \code{Error}: Potential error codes.
#' @examples
#' # sample file for example
#'
#' head(samplePrFile)
#'
#' # convert the procedure codes to the short format
#'
#' icdPrDecimalToShort(samplePrFile,ICD,Date, icdVerColName = NULL, "2015/10/01")
#'
#' # convert the procedure codes to the decimal format
#'
#' icdPrShortToDecimal(samplePrFile,ICD,Date, icdVerColName = NULL, "2015/10/01")
NULL

#' Code classification for CCS
#'
#' The CCS classification for ICD-9 and ICD-10 codes is a procedure categorization scheme that can be employed in many types of projects analyzing data on procedures.
#'
#' These CCS functions collapse ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD procedure codes are.
#'
#' @importFrom stats complete.cases
#' @name prCCS
#' @inherit common_PrArg
#' @param CCSLevel Numeric. Used for multi-level CCS. Default is \code{1}. There are three levels (1 to 3) of multi-level CCS for ICD-9 and two levels (1 and 2) for ICD-10.
#' @return Two new \code{data.table}s. 1) \code{groupedDT}: \code{prDataFile} with two new columns, one for the uniform format procedure codes and one for the classified standard categories. 2) \code{Error}: Potential error codes from \code{\link{prUniform}}.
#' @seealso Other code classification functions: \code{\link{PC}} (icdPrToProcedureClass).
#' @examples
#' # sample file for example
#'
#' head(samplePrFile)
#'
#' # Group procedure codes into single level of CCS classification
#'
#' icdPrToCCS(samplePrFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01", TRUE)
#'
#' # Group procedure codes into multiple levels of CCS classification
#'
#' icdPrToCCSLvl(samplePrFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01", 2, TRUE)
#'
NULL

#' Code classification for procedure code
#'
#' The Procedure Class classification for ICD-9 and ICD-10 codes is a procedure categorization scheme that can be employed in many types of projects analyzing data on procedures.
#'
#' This function collapses ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD procedure codes are.
#'
#' @importFrom stats complete.cases
#' @name PC
#' @inherit common_PrArg
#' @return Two new \code{data.table}s. 1) \code{groupedDT}: Based on \code{prDataFile} with two new columns for uniform format procedure codes and classified standard categories. 2) \code{Error}: Potential error codes from \code{\link{prUniform}}.
#' @seealso Other code classification functions: \code{\link{prCCS}}
#' @examples
#' # sample file for example
#'
#' head(samplePrFile)
#'
#' # Group procedure codes into procedure class classification
#'
#' icdPrToProcedureClass(samplePrFile, ID, ICD, Date, icdVerColName = NULL, "2015-10-01", TRUE)
#'
NULL
DHLab-CGU/emr documentation built on Sept. 2, 2023, 9:16 p.m.