View source: R/classify_risk.R
| classify_risk | R Documentation |
Reads a CSV file containing sample metadata and assigns each sample to a risk category based on a specified scoring column. Supports built-in presets for seven major disease types, fully custom user-defined risk boundaries, or automatic classification using a normalised Risk Score derived from the data itself.
classify_risk(
  file_path,
  column_name,
  disease_type = "auto",
  n_groups = 3,
  score_min = NULL,
  score_max = NULL,
  risk_groups = NULL,
  output_dir = NULL
)
file_path |
Character. Path to the input CSV file containing sample metadata. |
column_name |
Character. Name of the column containing the grading or staging score (e.g., Gleason score, Nottingham score, TNM stage). |
disease_type |
Character. Disease type for built-in preset risk groupings. Supported values: |
n_groups |
Integer. Number of risk groups to create. Only used when disease_type = "auto". |
score_min |
Numeric or NULL. Minimum possible value of the score. If NULL (default), automatically detected from the data. |
score_max |
Numeric or NULL. Maximum possible value of the score. If NULL (default), automatically detected from the data. |
risk_groups |
Named list of functions. Required only when disease_type = "custom". |
output_dir |
Character or NULL. Directory to save the output CSV file. If NULL (default), output is saved in the same directory as the input file. |
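Because risk_groups is a named list of functions, it can help to confirm before calling classify_risk that every score falls into exactly one group. The following is a minimal sketch under stated assumptions: check_risk_groups is a hypothetical helper, not part of the package, and it assumes each function is vectorised, returns a logical vector, and that the scores contain no missing values.

```r
# Hypothetical helper (not part of the package): reports scores that match
# no predicate or more than one predicate in a risk_groups list.
check_risk_groups <- function(scores, risk_groups) {
  # One logical column per group, one row per score
  hits <- do.call(
    cbind,
    lapply(risk_groups, function(f) as.logical(f(scores)))
  )
  n_matches <- rowSums(hits)
  list(
    unassigned  = scores[n_matches == 0],
    overlapping = scores[n_matches > 1]
  )
}

scores <- c(2, 5, 6, 9)

# Same two-group definition as in the custom example
risk_groups <- list(
  low_risk  = function(x) x <= 5,
  high_risk = function(x) x > 5
)
check <- check_risk_groups(scores, risk_groups)

# A definition with a gap: a score of exactly 5 matches neither group
gappy_groups <- list(
  low_risk  = function(x) x < 5,
  high_risk = function(x) x > 5
)
gappy_check <- check_risk_groups(scores, gappy_groups)
```

Both elements of check are empty for the first definition, while gappy_check$unassigned contains the score 5, which is the kind of boundary gap worth catching before classification.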
When disease_type = "auto", the function computes a normalised Risk Score for each sample using min-max normalisation:
\text{Risk Score} = \frac{\text{score} - \min(\text{score})}{\max(\text{score}) - \min(\text{score})}
The Risk Score ranges from 0 (lowest risk) to 1 (highest risk). Risk group boundaries are then determined automatically:
If the score distribution is approximately symmetric (skewness between -0.5 and +0.5), equal-width boundaries are used, dividing the 0-1 range into n_groups equal intervals.
If the score distribution is skewed (skewness outside -0.5 to +0.5), quantile-based boundaries are used, ensuring approximately equal numbers of samples per group.
The splitting method chosen is reported via a message. Risk group labels are generated automatically based on n_groups.
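The auto-mode logic described above can be sketched in base R as follows. This is an illustration, not the package's implementation: auto_groups and skewness are hypothetical helpers, the moment-based skewness estimator, the handling of values exactly at ±0.5, and the group_1, group_2, ... labels are all assumptions, and the sketch expects non-constant scores whose quantile breaks are distinct.

```r
# Moment-based sample skewness (assumption: the package may use another estimator)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical sketch of the auto-mode steps
auto_groups <- function(score, n_groups = 3) {
  # Min-max normalisation to the 0-1 range (fails if all scores are equal)
  risk_score <- (score - min(score)) / (max(score) - min(score))
  probs <- seq(0, 1, length.out = n_groups + 1)

  if (abs(skewness(score)) <= 0.5) {
    # Approximately symmetric: split the 0-1 range into equal intervals
    method <- "equal-width"
    breaks <- probs
  } else {
    # Skewed: split at quantiles so groups have similar sizes
    method <- "quantile"
    breaks <- quantile(risk_score, probs = probs, names = FALSE)
  }

  group <- cut(
    risk_score,
    breaks = breaks,
    include.lowest = TRUE,
    labels = paste0("group_", seq_len(n_groups))  # hypothetical labels
  )
  list(method = method, risk_score = risk_score, group = group)
}

symmetric <- auto_groups(1:9)
skewed <- auto_groups(c(1, 2, 3, 4, 5, 6, 7, 30, 60))
table(skewed$group)
```

With the symmetric input the sketch selects equal-width boundaries, and with the right-skewed input it switches to quantile boundaries, so the two large outliers do not leave the lower groups crowded and the top group nearly empty.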
Built-in presets use clinically validated risk stratification systems:
Prostate cancer: D'Amico classification (D'Amico et al., 1998): low_risk (<=6), intermediate_risk (7), high_risk (>=8).
Breast cancer: Nottingham Prognostic Index (Galea et al., 1992): low_risk (3-5), intermediate_risk (6-7), high_risk (8-9).
Colorectal cancer: Dukes-based risk (Dukes, 1932): low_risk (A), intermediate_risk (B/C), high_risk (D).
Lung cancer: TNM stage-based (Goldstraw et al., 2016): low_risk (I), intermediate_risk (II/III), high_risk (IV).
Cervical cancer: FIGO stage-based (Bhatla et al., 2019): low_risk (I), intermediate_risk (II/III), high_risk (IV).
Lymphoma: Ann Arbor/Lugano (Cheson et al., 2014): limited (I/II), advanced (III/IV).
Melanoma: Breslow depth (Breslow, 1970): low_risk (<=1.0 mm), intermediate_risk (>1.0 to 4.0 mm), high_risk (>4.0 mm).
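To make the preset cut-offs concrete, the Gleason-based grouping listed above (low_risk <=6, intermediate_risk 7, high_risk >=8) can be written in the same named-list-of-functions form that risk_groups accepts. This is only an illustration of the thresholds; the name prostate_like and the labelling loop are hypothetical and do not describe how the package stores or applies its presets.

```r
# Illustrative only: the Gleason cut-offs from the list above, expressed as
# predicates in the same shape as a custom risk_groups argument.
prostate_like <- list(
  low_risk          = function(x) x <= 6,
  intermediate_risk = function(x) x == 7,
  high_risk         = function(x) x >= 8
)

gleason <- c(6, 7, 9, 8, 5)

# Label each score with the name of the first predicate it satisfies
labels <- vapply(gleason, function(s) {
  hit <- vapply(prostate_like, function(f) isTRUE(f(s)), logical(1))
  if (any(hit)) names(prostate_like)[which(hit)[1]] else NA_character_
}, character(1))

data.frame(gleason = gleason, risk_group = labels)
```

Writing a preset this way also gives a starting point for disease_type = "custom" when a study needs slightly different thresholds.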
A named list where each element corresponds to a risk group and contains the sample IDs belonging to that group. The number of elements matches the number of risk groups detected or specified.
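A short sketch of working with a return value of this shape is given below. The list is a hand-built stand-in with hypothetical sample IDs, so it runs without the package; only the structure (group names mapping to character vectors of sample IDs) is taken from the description above.

```r
# Mock return value with hypothetical sample IDs
result <- list(
  low_risk          = c("S01", "S04"),
  intermediate_risk = c("S02"),
  high_risk         = c("S03", "S05", "S06")
)

# Number of samples per risk group
group_sizes <- lengths(result)

# Long-format table: one row per sample with its risk group
long <- stack(result)
names(long) <- c("sample_id", "risk_group")
long
```

The long-format table is convenient for joining the risk group back onto other per-sample data with merge().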
D'Amico AV, et al. (1998). Biochemical outcome after radical prostatectomy. JAMA, 280(11):969-974.
Galea MH, et al. (1992). The Nottingham prognostic index. Breast Cancer Res Treat, 22(3):207-219.
Dukes CE. (1932). The classification of cancer of the rectum. J Pathol Bacteriol, 35:323-332.
Goldstraw P, et al. (2016). The IASLC Lung Cancer Staging Project. J Thorac Oncol, 11(1):39-51.
Bhatla N, et al. (2019). Revised FIGO staging for carcinoma of the cervix uteri. Int J Gynaecol Obstet, 145(1):129-135.
Cheson BD, et al. (2014). The Lugano Classification. J Clin Oncol, 32(27):3059-3068.
Breslow A. (1970). Thickness and depth of invasion in the prognosis of cutaneous melanoma. Ann Surg, 172(5):902-908.
# Auto mode - let the function decide risk grouping (any disease)
sample_file <- system.file("extdata", "sample_data.csv",
                           package = "RiskyCNV")
result <- classify_risk(
  file_path = sample_file,
  column_name = "gleason_score",
  disease_type = "auto",
  n_groups = 3,
  output_dir = tempdir()
)
print(names(result))

# Prostate cancer preset
result_prostate <- classify_risk(
  file_path = sample_file,
  column_name = "gleason_score",
  disease_type = "prostate",
  output_dir = tempdir()
)
print(result_prostate$low_risk)

# Custom risk groups for any disease
result_custom <- classify_risk(
  file_path = "samples.csv",
  column_name = "risk_score",
  disease_type = "custom",
  risk_groups = list(
    "low_risk" = function(x) x <= 5,
    "high_risk" = function(x) x > 5
  ),
  output_dir = tempdir()
)