v_prepareVdata_CHM_T: Prepare data for module: CHM_T (Transcript)

View source: R/module_CHM_T.R

Description

The workflow of vigilante is highly module-based. To ensure a successful and smooth run, vigilante needs to prepare input data before continuing.

Usage

v_prepareVdata_CHM_T(
  doTE = FALSE,
  customFileName_selection = NULL,
  selectBy = "ENST",
  customFileName_commonName = NULL,
  addCommonName = FALSE
)

Arguments

doTE

logical; whether to prepare Transcript Expression (TE) data. If FALSE, no TE data will be available to the downstream v_chmTranscript function. If TRUE, make sure the TE data files are properly named; see Details for the file naming convention.

customFileName_selection

string; relative or absolute path to the custom transcript selection data file. Set to "example" to use the transcript selection data embedded in vigilante; see Details for more information.

selectBy

character; one of c("ENST", "ENSG", "Gene"), the type of ID or name used to select target transcripts: "ENST" for ENSEMBL transcript ID, "ENSG" for ENSEMBL gene ID, "Gene" for gene name.

customFileName_commonName

(optional) string; relative or absolute path to the custom transcript common name data file. Set to "example" to use the transcript common name data embedded in vigilante; see Details for more information.

addCommonName

logical; whether to add the common name of each corresponding transcript in downstream analysis. It will be overridden and reset to FALSE if customFileName_commonName is not properly provided.

Details

Input data files generated by upstream tools often come with diverse naming conventions. The user may recognize those files easily, but vigilante cannot if the names follow no recognizable pattern.

To make input data files recognizable to vigilante, name them along the lines of "studyID_sampleID_(other descriptions).file extension". Here "studyID" is the name of the study or project; because it is reused in several places (such as on plots and in output file names), it should be concise and meaningful.

For module CHM_T, the currently supported input data files are listed below; please contact the author if you want more files added to the supported list:

Transcript Expression (TE): *cufflinks.isoforms* for Cufflinks
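As a rough illustration of the naming convention above, the sketch below checks a few file names against the *cufflinks.isoforms* pattern and splits out the study and sample IDs. The file names are hypothetical, and the matching logic is an assumption made for illustration; it is not taken from the vigilante source.

```r
# Hypothetical file names; only the first two follow the suggested convention.
file_names <- c(
  "studyA_sample01_cufflinks.isoforms.fpkm_tracking",
  "studyA_sample02_cufflinks.isoforms.fpkm_tracking",
  "sample03.isoforms.txt"
)

# Assumption: a TE file is recognized by the literal substring "cufflinks.isoforms".
is_te <- grepl("cufflinks.isoforms", file_names, fixed = TRUE)

# Split the recognized names on "_" and keep the first two fields.
parts <- strsplit(file_names[is_te], "_", fixed = TRUE)
parsed <- data.frame(
  file = file_names[is_te],
  studyID = vapply(parts, `[`, character(1), 1),
  sampleID = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
print(parsed)
```

A file that lacks the pattern, like the third one here, would simply not be picked up under this assumption, which is why consistent naming matters.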

About 'customFileName_selection': if the user already has transcripts or genes of interest and wants to check how those transcripts (or the transcripts of the selected genes) perform, the user can provide a csv file through 'customFileName_selection' containing a panel of transcripts/genes in data frame format (see vigilante.knights.sword::transcript_selection for an example). Note that the example has 3 columns, but in practice the user only needs to provide 1 column and set 'selectBy' to match the provided data: "ENST" for ENSEMBL transcript ID, "ENSG" for ENSEMBL gene ID, "Gene" for gene name.
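A minimal sketch of building such a one-column selection file in base R follows. The column name "ENST", the transcript IDs, and the output file name are illustrative assumptions; check vigilante.knights.sword::transcript_selection for the column names the package actually expects.

```r
# Assumption: a single column named after the chosen selectBy value.
# The IDs below are illustrative only.
selection <- data.frame(
  ENST = c("ENST00000269305", "ENST00000275493"),
  stringsAsFactors = FALSE
)

# Write the panel to a temporary csv file (hypothetical file name).
selection_path <- file.path(tempdir(), "my_transcript_selection.csv")
write.csv(selection, selection_path, row.names = FALSE)

# Read it back to confirm the layout before passing the path to
# customFileName_selection together with selectBy = "ENST".
check <- read.csv(selection_path, stringsAsFactors = FALSE)
print(check)
```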

About 'customFileName_commonName': this argument works like 'customFileName_selection' but is optional. If the user already has some (transcript ID, common name) pairs and wants to use them in downstream analysis, the user can provide a csv file through 'customFileName_commonName' containing a panel of transcripts in data frame format (see vigilante.knights.sword::transcript_commonName for an example).

Both 'customFileName_selection' and 'customFileName_commonName' can alternatively be set to "example"; the user can then try the embedded example data (derived from publicly available external data) and get an idea of how this function works.

Value

list. Because R CMD check discourages assignments to the global environment within functions, the user needs to explicitly assign the return value to a global variable named "prepareVdata_CHM_T_returnList", which will be a list containing the variables required for downstream analyses.
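The assignment described above might look like the following sketch. It assumes that the vigilante package is installed, that the function is exported, and that any setup the package requires beforehand has already been done; the call is guarded so the snippet does nothing when the package is unavailable.

```r
# Check whether vigilante is installed before attempting the call.
vigilante_available <- requireNamespace("vigilante", quietly = TRUE)

if (vigilante_available) {
  # Assign the return value to the exact global name the documentation asks for.
  prepareVdata_CHM_T_returnList <- vigilante::v_prepareVdata_CHM_T(
    doTE = TRUE,
    customFileName_selection = "example",
    selectBy = "ENST",
    customFileName_commonName = "example",
    addCommonName = TRUE
  )
  # Inspect the top-level elements of the returned list.
  str(prepareVdata_CHM_T_returnList, max.level = 1)
}
```

With doTE = TRUE the function would also expect properly named TE files to be present, as described in Details.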


yilixu/vigilante documentation built on June 4, 2021, 5:07 a.m.