#knitr::opts_knit$set(root.dir = "/home/axel/my_code/nursa/geometric2/test_data")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  library(kableExtra),
  path2contrast_file <- "~/my_code/nursa/geometric2/vignettes/contrasts.txt",
  path2_annotated_samples <- "~/my_code/nursa/geometric2/vignettes/GSE6346_sample_review_anno.txt",
  path2my_geo_data <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/GEOtemp",
  output_file <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/Results/fit.txt"
)

Warning

The geometric2 package aims to streamline limma processing and annotation of GEO data specifically for the NURSA project. It uses code from https://github.com/avucoh/GEOmetric. Some of the code has been modified, some hasn't, but none has been unit-tested! So, I can't recommend that anybody outside of the Kaddis-Lab uses it. Those who use it should be aware of the shortcomings.

Introduction

NURSA requires a treating GEO data in a specific way. Details are described in "Step by step guide for NURSA curation". In brief: Appropriate GEO datasets (GSE) are selected, for microarray data a data matrix file with processed data is downloaded as well as platform information. RNAseq data processing is discussed in a separate vignette.

Loading the library

library(geometric2)
data("biosamps")

Getting the input data

gse <- "GSE6346"
get_input(gse, "/home/axel/my_code/nursa/geometric2/vignettes")

Running the get_input command creates the following file structure

system("tree .GSE6346", intern = TRUE)

The file sample_review.txt is a table that can be loaded into LibreOffice Calc or other spreadsheet programs.

Annotating the input data

samples <- fread(".GSE6346/sample_review.txt")
kable(head(samples), caption="sample_review.txt")

Columns that need to be annotated are: - Group - Node - NodeFunction - BSM - BSMDCD - BSMDCD2

The get_input() function tries to provide some of the required information. This process, however, is not very reliable and needs to be checked. Most of the time the information needs to be added manually. In many instances the required information is not provided in the GEO entry and it is necessary to consult the corresponding publication for the required information. But also this effort is not always rewarded.

While the Node and BSM fields are important for annotation later the Group field is required to create the desired contrasts. The github repository cited above provides a function to assign the groups and to create contrasts based on this annotated file. The process is however buggy and it is recommended to do this step manually.

samples <- fread(path2_annotated_samples)
kable(head(samples), caption="Annotated sample_review.txt")

Creating the contrast file

The functions below require the presence of a contrast file which should look like this:

contrasts <- fread(path2contrast_file)
kable(contrasts, caption="Contrasts")

The essential columns are Contrast which holds the title of the contrast which is constructed according to NURSA requirements and the Formula which utilizes the groups assigned earlier. The other fields are not strictly required for the process laid out here but are required when following the protocol given in the github repository mentioned in the intro.

Loading the processed data

The function get_eset() creates an expression set object. It requires a GSE value and the path to the GEO data

my_eset <- get_eset(gse, path2my_geo_data)

Creating a design matrix

Before continuing here I recommend reading the relevant sections in the limma user guide (pp 35)

my_design <- get_design()
kable(my_design)

fitting

The next step requires the contrast.txt file to be in the same directory as the sample_review.txt directory.

fit <- lmFit(my_eset, my_design)
fit
contrasts <- get_contrast.matrix(path2contrast_file = path2contrast_file)
kable(contrasts)
fit2 <- contrasts.fit(fit, contrasts)
fit3 <- eBayes(fit2)

Writing fitdata to a file

write_fit_file(fit3, output_file )
fit3$

Annotating the output file



axelmuller/geometric2 documentation built on May 25, 2019, 6:26 p.m.