statistically_relevant_phenotypes_server: Filter statistically relevant phenotypes by comparing two...

View source: R/statistical_comparisons_server.R

statistically_relevant_phenotypes_server    R Documentation

Filter statistically relevant phenotypes by comparing two sample groups (Server Version)

Description

statistically_relevant_phenotypes_server filters statistically relevant phenotypes by comparing two sample groups defined in the sample data read from sample_file.

Usage

statistically_relevant_phenotypes_server(
  output_folder,
  channel_file,
  sample_file,
  input_phenotype_counts = "combinatorial_phenotype_counts.csv",
  input_phenotype_counts_log = NULL,
  output_file = "significant_phenotypes.csv",
  log_file = "significant_phenotypes.log",
  test_type = "group",
  groups_column = NULL,
  g1 = NULL,
  g2 = NULL,
  correlation_column = NULL,
  survival_time_column = NULL,
  survival_status_column = NULL,
  max_pval = 0.05,
  parent_phen = NULL,
  continue = FALSE,
  n_threads = 1,
  verbose = FALSE
)

Arguments

output_folder

Path to folder where output files from this and previous steps should be saved.

channel_file

Path to a ".csv" file containing columns named: Channel, Marker, T1, [T2, T3, ... , Tn], [OOB].

sample_file

Path to a ".csv" file containing a Sample_ID column and additional grouping columns for the samples.
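As a minimal sketch of what such a sample file could look like, the snippet below writes a small ".csv" with the required Sample_ID column and one grouping column. The column name Group, the sample IDs, and the group labels are hypothetical and chosen only for illustration.

```r
# Hypothetical sample table: only Sample_ID is required by the documentation;
# "Group" and its labels are assumptions made for this example.
samples <- data.frame(
  Sample_ID = c("S1", "S2", "S3", "S4"),
  Group = c("control", "control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Write the table to a temporary location without row names,
# so the first column in the file is Sample_ID.
sample_file <- file.path(tempdir(), "samples.csv")
write.csv(samples, sample_file, row.names = FALSE)
```

The resulting path can then be passed as sample_file, with "Group" as groups_column.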

input_phenotype_counts

Name of the input file inside output_folder containing the phenotype cell counts. Default: "combinatorial_phenotype_counts.csv".

input_phenotype_counts_log

Name of the file inside output_folder with the cell counts log. Used to obtain the number of phenotypes in input_phenotype_counts more quickly. If not provided, the number of phenotypes is read from the csv file itself, which might not work for huge files. Default: NULL.

output_file

Output file. Default: "significant_phenotypes.csv".

log_file

Output log file. Default: "significant_phenotypes.log".

test_type

Type of statistical test to be performed. Value can be "group", "correlation", or "survival". Default: "group". Additional parameters should be provided accordingly, such as [groups_column, g1, g2] for "group", [correlation_column] for "correlation", and [survival_time_column, survival_status_column] for "survival". Parameters not used in the test will be ignored.
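The pairing of test_type with its extra parameters can be sketched as three argument lists. All file names, folder names, column names, and group labels below are hypothetical; only the argument names come from the Usage section. The call itself is guarded, so nothing runs unless PhenoComb is installed and the assumed input files exist.

```r
# Arguments shared by every test type (all paths are hypothetical).
common <- list(
  output_folder = "phenocomb_output",
  channel_file = "channels.csv",
  sample_file = "samples.csv"
)

# "group": needs groups_column, g1, g2 (column and labels are assumptions).
group_args <- c(common, list(
  test_type = "group",
  groups_column = "Group",
  g1 = "control",
  g2 = "treated"
))

# "correlation": needs correlation_column (column name is an assumption).
correlation_args <- c(common, list(
  test_type = "correlation",
  correlation_column = "Age"
))

# "survival": needs both survival columns (column names are assumptions).
survival_args <- c(common, list(
  test_type = "survival",
  survival_time_column = "OS_Time",
  survival_status_column = "OS_Status"
))

# Only attempt the call when the package and the assumed inputs are present.
inputs_present <- all(file.exists(
  common$channel_file,
  common$sample_file,
  file.path(common$output_folder, "combinatorial_phenotype_counts.csv")
))
if (requireNamespace("PhenoComb", quietly = TRUE) && inputs_present) {
  do.call(PhenoComb::statistically_relevant_phenotypes_server, group_args)
}
```

Keeping the shared arguments in one list makes it easy to see which parameters each test type adds.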

groups_column

Column name in the sample data (read from sample_file) where group identifications are stored.

g1

Group label or vector with group labels for first group.

g2

Group label or vector with group labels for second group.

correlation_column

Column name in the sample data (read from sample_file) where the data to be correlated is stored.

survival_time_column

Column name in the sample data (read from sample_file) where survival time is stored.

survival_status_column

Column name in the sample data (read from sample_file) where survival status is stored.

max_pval

Maximum p-value. Used as the threshold to filter phenotypes in the final output. Default: 0.05.

parent_phen

Parent phenotype to filter for. All phenotypes in the output will contain the parent phenotype.

continue

If TRUE, look for files from a previous run to resume execution. It must also be TRUE for the files needed to resume a later run to be saved. Default: FALSE.

n_threads

Number of threads to be used. Default: 1.

verbose

If TRUE, print outputs from log to stdout.

Details

This version reads "output_folder/combinatorial_phenotype_counts.csv" in chunks and saves the partial results to a file. It is intended for large datasets whose inputs would not fit in the available memory.
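The general idea of chunked processing can be illustrated in base R. This is a sketch of the technique only, not PhenoComb's actual implementation: the helper filter_csv_in_chunks, its arguments, and the toy data are all hypothetical. It reads a fixed number of rows at a time from an open connection, filters each chunk, and appends the surviving rows to an output file, so only one chunk is held in memory at once.

```r
# Hypothetical helper: filter a CSV chunk by chunk, appending results to disk.
filter_csv_in_chunks <- function(in_path, out_path, chunk_size, keep) {
  con <- file(in_path, open = "r")
  on.exit(close(con))

  # Read the header once; scan() strips the quotes written by write.csv().
  header <- scan(text = readLines(con, n = 1), what = character(),
                 sep = ",", quiet = TRUE)

  first <- TRUE
  repeat {
    # The open connection remembers its position between reads.
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break

    chunk <- read.csv(text = lines, header = FALSE,
                      col.names = header, check.names = FALSE)
    kept <- chunk[keep(chunk), , drop = FALSE]

    # Write the header with the first chunk only, then append.
    write.table(kept, out_path, sep = ",", row.names = FALSE,
                col.names = first, append = !first)
    first <- FALSE
  }
  invisible(out_path)
}

# Toy input: ten rows, processed three rows at a time.
in_path <- tempfile(fileext = ".csv")
out_path <- tempfile(fileext = ".csv")
toy <- data.frame(id = 1:10, count = c(5, 0, 3, 0, 8, 1, 0, 2, 0, 7))
write.csv(toy, in_path, row.names = FALSE)

filter_csv_in_chunks(in_path, out_path, chunk_size = 3,
                     keep = function(d) d$count > 0)
result <- read.csv(out_path)
```

Appending each chunk's results as soon as they are computed is also what makes resuming an interrupted run feasible, since partial output already exists on disk.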

Value

Output is saved to output_file inside output_folder.


SciOmicsLab/PhenoComb documentation built on Aug. 26, 2023, 1:28 p.m.