statistically_relevant_phenotypes_server: Filter statistically relevant phenotypes by comparing two...
In SciOmicsLab/PhenoComb: Phenotype Combinatorial Analysis

View source: R/statistical_comparisons_server.R

statistically_relevant_phenotypes_server

R Documentation

Filter statistically relevant phenotypes by comparing two sample groups (Server Version)

Description

statistically_relevant_phenotypes_server filters statistically relevant phenotypes by comparing two sample groups defined in the sample data read from sample_file.

Usage

statistically_relevant_phenotypes_server(
  output_folder,
  channel_file,
  sample_file,
  input_phenotype_counts = "combinatorial_phenotype_counts.csv",
  input_phenotype_counts_log = NULL,
  output_file = "significant_phenotypes.csv",
  log_file = "significant_phenotypes.log",
  test_type = "group",
  groups_column = NULL,
  g1 = NULL,
  g2 = NULL,
  correlation_column = NULL,
  survival_time_column = NULL,
  survival_status_column = NULL,
  max_pval = 0.05,
  parent_phen = NULL,
  continue = FALSE,
  n_threads = 1,
  verbose = FALSE
)

Arguments

`output_folder`	Path to folder where output files from this and previous steps should be saved.
`channel_file`	Path to a ".csv" file containing columns named: Channel, Marker, T1, [T2, T3, ... , Tn], [OOB].
`sample_file`	Path to a ".csv" file containing a Sample_ID column and additional grouping columns for the samples.
`input_phenotype_counts`	Name of the file inside `output_folder` containing the input cell counts. Default: "combinatorial_phenotype_counts.csv".
`input_phenotype_counts_log`	Name of the file inside `output_folder` containing the cell counts log. Used as a faster way to get the number of phenotypes in `input_phenotype_counts`. If not provided, the number of phenotypes is read from the CSV file itself, which might not work for huge files. Default: NULL.
`output_file`	Output file. Default: "significant_phenotypes.csv".
`log_file`	Output log file. Default: "significant_phenotypes.log".
`test_type`	Type of statistical test to be performed. Value can be "group", "correlation", or "survival". Default: "group". Additional parameters should be provided accordingly, such as [groups_column, g1, g2] for "group", [correlation_column] for "correlation", and [survival_time_column, survival_status_column] for "survival". Parameters not used in the test will be ignored.
`groups_column`	Column name in the sample data (read from `sample_file`) where group identifications are stored.
`g1`	Group label or vector with group labels for first group.
`g2`	Group label or vector with group labels for second group.
`correlation_column`	Column name in the sample data (read from `sample_file`) where the data to be correlated is stored.
`survival_time_column`	Column name in the sample data (read from `sample_file`) where survival time is stored.
`survival_status_column`	Column name in the sample data (read from `sample_file`) where survival status is stored.
`max_pval`	Maximum p-value. Used to filter phenotypes in the final output. Default: 0.05.
`parent_phen`	Parent phenotype to filter for. All phenotypes in the output will contain the parent phenotype.
`continue`	If TRUE, look for files from a previous run and resume execution from them. It must also be TRUE for the files needed to resume later to be saved. Default: FALSE.
`n_threads`	Number of threads to be used. Default: 1.
`verbose`	If TRUE, print outputs from log to stdout.
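
As a rough illustration of how the test-specific arguments fit together, the sketch below builds one argument list per `test_type`. Every path, the sample table, the column names other than Sample_ID (Group, Age, Time, Status), and the group labels are hypothetical placeholders. The final call assumes the function is exported by the PhenoComb package, and it is skipped unless both the package and the output folder are available.

```r
# Hypothetical sample table: a Sample_ID column plus grouping columns.
# All column names other than Sample_ID, and all values, are made up.
sample_table <- data.frame(
  Sample_ID = c("S1", "S2", "S3", "S4"),
  Group     = c("control", "control", "treated", "treated"),
  Age       = c(34, 51, 47, 29),
  Time      = c(12, 30, 8, 24),
  Status    = c(1, 0, 1, 0)  # the status coding here is an assumption
)
sample_file <- file.path(tempdir(), "sample_data.csv")
write.csv(sample_table, sample_file, row.names = FALSE)

# Arguments shared by every test type (paths are placeholders).
common_args <- list(
  output_folder = "phenocomb_output",
  channel_file  = "channel_data.csv",
  sample_file   = sample_file,
  max_pval      = 0.05
)

# Each test type adds only its own parameters; the others stay at NULL.
group_args <- c(common_args, list(
  test_type     = "group",
  groups_column = "Group",
  g1            = "control",
  g2            = "treated"
))
correlation_args <- c(common_args, list(
  test_type          = "correlation",
  correlation_column = "Age"
))
survival_args <- c(common_args, list(
  test_type              = "survival",
  survival_time_column   = "Time",
  survival_status_column = "Status"
))

# Run the group comparison only when the package and folder are present.
if (requireNamespace("PhenoComb", quietly = TRUE) &&
    dir.exists(common_args$output_folder)) {
  do.call(PhenoComb::statistically_relevant_phenotypes_server, group_args)
}
```

Swapping `group_args` for `correlation_args` or `survival_args` in the final call would select the other test types.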

Details

This version reads the input phenotype counts file (by default "output_folder/combinatorial_phenotype_counts.csv") in chunks and saves the partial results to a file. It is intended for large datasets whose inputs would not fit in the available memory.
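
The general idea of chunked processing can be sketched in base R as follows. This is not the package's implementation: the demo file, the chunk size, and the "keep even counts" filter are stand-ins chosen only to show how a CSV can be read a few rows at a time while partial results are appended to a file.

```r
# Small demo CSV standing in for a large phenotype counts file.
demo_file <- tempfile(fileext = ".csv")
demo <- data.frame(phenotype = paste0("P", 1:10), count = 1:10)
write.csv(demo, demo_file, row.names = FALSE)

chunk_size   <- 4                      # rows per chunk (illustrative)
partial_file <- tempfile(fileext = ".csv")

con <- file(demo_file, open = "r")
# Read the header line once and strip the quotes added by write.csv.
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

n_chunks <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break        # end of file reached

  # Parse only this chunk; memory use is bounded by chunk_size.
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)

  # Placeholder for the statistical filter: keep rows with even counts.
  kept <- chunk[chunk$count %% 2 == 0, , drop = FALSE]

  # Append partial results; write the column names only for the first chunk.
  write.table(kept, partial_file, sep = ",", row.names = FALSE,
              col.names = (n_chunks == 0), append = (n_chunks > 0))
  n_chunks <- n_chunks + 1
}
close(con)

result <- read.csv(partial_file)
```

Because each chunk's results are written out before the next chunk is read, only one chunk needs to be held in memory at a time.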