knitr::opts_chunk$set( collapse = TRUE, comment = "#>", library(kableExtra), path2_annotated_sample_review <- "/home/axel/my_code/nursa/geometric2/vignettes/GSE6346_sample_review_anno.txt", path2_contrast_file <- "/home/axel/my_code/nursa/geometric2/vignettes/contrasts.txt", path2_eset <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/GEOtemp" , fit_file <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/Results/fit.txt", path2fit_file_directory <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/Results/", path2GEOmetadb <- "/home/axel/projects/nursa/GEOmetric/GEOmetadb.sqlite" )
NURSA requires a treating GEO data in a specific way. Details are described in "Step by step guide for NURSA curation". In brief: Appropriate GEO datasets (GSE) are selected, for microarray data a data matrix file with processed data is downloaded as well as platform information. 2-color microarrays and analyzing RNAseq data sets are discussed in separate vignettes.
library(geometric2) data("biosamps")
gse <- "GSE6346" get_input(gse, "/home/axel/my_code/nursa/geometric2/vignettes")
Running the get_input command creates the following file structure
system("tree .GSE6346", intern = TRUE)
The file sample_review.txt is a table that can be loaded into LibreOffice Calc or other spreadsheet programs.
samples <- fread("vignettes/.GSE6346/sample_review.txt") kable(head(samples), caption="sample_review.txt")
Columns that need to be annotated are: - Group - Node - NodeFunction - BSM - BSMDCD - BSMDCD2
The get_input() function tries to provide some of the required information. This process, however, is not very reliable and needs to be checked. Most of the time the information needs to be added manually. In many instances the required information is not provided in the GEO entry and it is necessary to consult the corresponding publication for the required information. But also this effort is not always rewarded.
While the Node and BSM fields are important for annotation later the Group field is required to create the desired contrasts. The github repository cited above provides a function to assign the groups and to create contrasts based on this annotated file. The process is however buggy and I recommended to do this step manually.
samples <- fread(path2_annotated_sample_review) kable(head(samples), caption="Annotated sample_review.txt")
Performed by running Anh Ngueyet’s Step2.Rmd script. The script is hosted in the R folder of the geometric package. This is to ensure the data is reasonably coherent. Outliers in the PCA might indicate potentially mislabeled samples, or a batch effect. The boxplots give a good indication if the data is satisfactorily normalized. In extreme cases samples can be removed from further analysis.
The functions below require the presence of a contrast file which should look like this:
contrasts <- fread(path2_contrast_file) kable(contrasts, caption="Contrasts")
The essential columns are Contrast which holds the title of the contrast which is constructed according to NURSA requirements and the Formula which utilizes the groups assigned earlier. The other fields are not strictly required for the process laid out here but are required when following the protocol given in the github repository mentioned in the intro.
Generally experiments that differ in only one parameter can be compared. Examples are:
The following contrast is not valid: - untreated vs treated with molecule A and B
The function get_eset() creates an expression set object. It requires a GSE value and the path to the GEO data
my_eset <- get_eset(gse, path2_eset)
Before continuing here I recommend reading the relevant sections in the limma user guide (pp 35)
my_design <- get_design() kable(my_design)
The next step requires the contrast.txt file to be in the same directory as the sample_review.txt directory.
fit <- lmFit(my_eset, my_design) fit
contrast <- get_contrast.matrix(path2_contrast_file) kable(contrast)
fit2 <- contrasts.fit(fit, contrast) fit3 <- eBayes(fit2)
write_fit_file(fit3, fit_file)
This constitutes the last step. It's the topic of another vignettes.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.