run_SMC: Wrapper function for the Stratification of a Mutational...


Description

run_SMC takes as input a large data frame constructed from a vcf-like file of a whole cohort. This wrapper function calls custom functions to construct a mutational catalogue and stratify it according to the categories given in a dedicated column of the input data frame.

This stratification yields a collection of stratified mutational catalogues. These are reformatted and passed to the custom function SMC, and thus indirectly to LCD_SMC, to perform a signature analysis of the stratified mutational catalogues. The result is then handed over to plot_SMC for visualization.
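The stratification step described above can be illustrated with base R alone. The following is a minimal sketch, not the code run_SMC executes: `toy_table`, its column names and its values are invented for illustration, and `table()` stands in for the package's own catalogue construction.

```r
# Toy stand-in for my_table: one row per variant (all values invented).
toy_table <- data.frame(
  PID     = c("P1", "P1", "P2", "P2", "P2", "P1"),
  motif   = c("C>A ACA", "C>T TCG", "C>A ACA", "C>T TCG", "C>T TCG", "C>A ACA"),
  stratum = c("early", "late", "early", "early", "late", "late"),
  stringsAsFactors = FALSE
)

# Unstratified catalogue: motif counts per PID.
overall_catalogue <- table(toy_table$motif, toy_table$PID)

# Fix the row and column levels so every stratum has the same dimensions.
motif_levels <- sort(unique(toy_table$motif))
pid_levels <- sort(unique(toy_table$PID))

# One catalogue per stratum, split on the stratification column.
stratified_catalogues <- lapply(
  split(toy_table, toy_table$stratum),
  function(df) {
    table(factor(df$motif, levels = motif_levels),
          factor(df$PID, levels = pid_levels))
  }
)

# The stratum catalogues add up to the unstratified catalogue.
summed <- Reduce(`+`, stratified_catalogues)
all(summed == overall_catalogue)
```

Fixing the factor levels before counting keeps motifs or PIDs that are absent from a stratum as zero rows or columns, so the per-stratum matrices stay comparable.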

Usage

run_SMC(my_table, this_signatures_df, this_signatures_ind_df, this_subgroups_df,
  column_name, refGenome, cohort_method_flag = "all_PIDs",
  in_strata_order_ind = seq_len(length(unique(my_table[, column_name]))),
  wordLength = 3, verbose_flag = 1, target_dir = NULL,
  strata_dir = NULL, output_path = NULL, in_all_exposures_df = NULL,
  in_rownames = c(), in_norms = NULL, in_label_orientation = "turn",
  this_sum_ind = NULL)

Arguments

my_table

A large data frame constructed from a vcf-like file of a whole cohort. The first columns are those of a standard vcf file, followed by an arbitrary number of custom or user-defined columns. One of these must carry a PID (patient or sample identifier) and one must be the category used for stratification.

this_signatures_df

A numeric data frame W with n rows and l columns, n being the number of features and l being the number of signatures

this_signatures_ind_df

A data frame containing meta information about the signatures

this_subgroups_df

A data frame indicating which PID (patient or sample identifier) belongs to which subgroup

column_name

Name of the column in my_table which is going to be used for stratification

refGenome

FaFile of the reference genome to extract the motif context of the variants in my_table

cohort_method_flag

One or several of c("all_PIDs","cohort","norm_PIDs"), representing alternative ways to average over the cohort.

in_strata_order_ind

Index vector defining reordering of the strata
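As a hedged illustration of what such an index vector looks like, the sketch below builds the default from the usage section and a custom reordering. The `category` vector is invented, and applying the indices to the alphabetically sorted strata names is an assumption made here for illustration; the source does not state which base order run_SMC uses.

```r
# Invented category column standing in for my_table[, column_name].
category <- c("small", "big", "intermediate", "small", "big")

# Default from the usage section: the identity ordering 1, 2, ..., k.
default_order_ind <- seq_len(length(unique(category)))

# Assumed base order of the strata (alphabetical, as returned by sort()).
strata_names <- sort(unique(category))

# A custom index vector that rearranges the strata by size.
custom_order_ind <- c(3, 2, 1)
strata_names[custom_order_ind]
```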

wordLength

Integer defining the length of the features or motifs, e.g. 3 for triplets or 5 for pentamers
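The motif length determines how many features the catalogue has. The sketch below assumes the common convention of six pyrimidine-centred substitution classes with symmetric flanking bases; `number_of_features` is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper: feature count for a given odd motif length,
# assuming 6 substitution classes and 4 choices per flanking base.
number_of_features <- function(word_length) {
  stopifnot(word_length >= 1, word_length %% 2 == 1)
  6 * 4^(word_length - 1)
}

number_of_features(3)  # triplets: 96 features
number_of_features(5)  # pentamers: 1536 features
```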

verbose_flag

Verbose if verbose_flag=1

target_dir

Path to directory where the results of the stratification procedure are going to be stored if non-NULL.

strata_dir

Path to directory where the mutational catalogues of the different strata are going to be stored if non-NULL

output_path

Path to directory where the results, especially the figures produced by plot_SMC, are going to be stored.

in_all_exposures_df

Optional argument. If specified, H, i.e. the overall exposures without stratification, is set equal to in_all_exposures_df. This is equivalent to forcing the LCD_SMC procedure to use, for example, the exposures of a previously performed NMF decomposition.
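To make the roles of W, H and V concrete, here is a small sketch of their shapes under the usual signature model, in which the catalogue is approximated by the product of signatures and exposures. That model, and all numbers below, are assumptions for illustration; the code does not call or reproduce LCD_SMC.

```r
set.seed(1)
n_features <- 6
n_signatures <- 2
n_samples <- 3

# W: signatures (features x signatures), each column scaled to sum to 1.
W <- matrix(runif(n_features * n_signatures), nrow = n_features)
W <- sweep(W, 2, colSums(W), "/")

# H: exposures (signatures x samples); invented values playing the
# role that in_all_exposures_df would play.
H <- matrix(c(10, 5, 0, 20, 7, 3), nrow = n_signatures)

# V: catalogue reconstructed from W and H (features x samples).
V <- W %*% H
dim(V)

# With normalized signatures, per-sample totals of V and H agree.
all.equal(colSums(V), colSums(H))
```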

in_rownames

Optional parameter to specify the rownames of the mutational catalogue V, i.e. the names of the features.

in_norms

If specified, vector of correction factors, one per motif, accounting for differing trinucleotide content. If NULL, no correction is applied.

in_label_orientation

Whether or not to turn the labels on the x-axis.

this_sum_ind

Optional set of indices for reordering the PIDs

Value

A list with entries exposures_list, catalogues_list, cohort and name_list.

See Also

create_mutation_catalogue_from_df

normalizeMotifs_otherRownames

plot_SMC

Examples

 library(BSgenome.Hsapiens.UCSC.hg19)
 data(sigs)
 data(lymphoma_test)
 data(lymphoma_cohort_LCD_results)
 strata_list <- 
   cut_breaks_as_intervals(lymphoma_test_df$random_norm,
                           in_outlier_cutoffs=c(-4,4),
                           in_cutoff_ranges_list=list(c(-2.5,-1.5),
                                                      c(0.5,1.5)),
                           in_labels=c("small","intermediate","big"))
 lymphoma_test_df$random_cat <- strata_list$category_vector
 choice_ind <- (names(lymphoma_Nature2013_COSMIC_cutoff_exposures_df) 
                %in% unique(lymphoma_test_df$PID))
 lymphoma_test_exposures_df <- 
   lymphoma_Nature2013_COSMIC_cutoff_exposures_df[,choice_ind]
 temp_subgroups_df <- make_subgroups_df(lymphoma_test_df,
                                        lymphoma_test_exposures_df)
 mut_density_list <- run_SMC(lymphoma_test_df,
                             AlexCosmicValid_sig_df,
                             AlexCosmicValid_sigInd_df,
                             temp_subgroups_df,
                             column_name="random_cat",
                             refGenome=BSgenome.Hsapiens.UCSC.hg19,
                             cohort_method_flag="norm_PIDs",
                             in_rownames = rownames(AlexCosmicValid_sig_df))

eilslabs/YAPSA documentation built on May 16, 2019, 1:23 a.m.