In axelmuller/geometric2: NURSA annotation aid

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  library(kableExtra),
  path2_annotated_sample_review <- "/home/axel/my_code/nursa/geometric2/vignettes/GSE6346_sample_review_anno.txt",
  path2_contrast_file <-  "/home/axel/my_code/nursa/geometric2/vignettes/contrasts.txt",
  path2_eset <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/GEOtemp" ,
  fit_file <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/Results/fit.txt",
  path2fit_file_directory <- "/home/axel/my_code/nursa/geometric2/vignettes/.GSE6346/Results/",
  path2GEOmetadb <- "/home/axel/projects/nursa/GEOmetric/GEOmetadb.sqlite"
)

Introduction

NURSA requires a treating GEO data in a specific way. Details are described in "Step by step guide for NURSA curation". In brief: Appropriate GEO datasets (GSE) are selected, for microarray data a data matrix file with processed data is downloaded as well as platform information. 2-color microarrays and analyzing RNAseq data sets are discussed in separate vignettes.

Loading the library

library(geometric2)
data("biosamps")

Getting the input data

gse <- "GSE6346"
get_input(gse, "/home/axel/my_code/nursa/geometric2/vignettes")

Running the get_input command creates the following file structure

system("tree .GSE6346", intern = TRUE)

The file sample_review.txt is a table that can be loaded into LibreOffice Calc or other spreadsheet programs.

Annotating the input data

samples <- fread("vignettes/.GSE6346/sample_review.txt")
kable(head(samples), caption="sample_review.txt")

Columns that need to be annotated are: - Group - Node - NodeFunction - BSM - BSMDCD - BSMDCD2

The get_input() function tries to provide some of the required information. This process, however, is not very reliable and needs to be checked. Most of the time the information needs to be added manually. In many instances the required information is not provided in the GEO entry and it is necessary to consult the corresponding publication for the required information. But also this effort is not always rewarded.

While the Node and BSM fields are important for annotation later the Group field is required to create the desired contrasts. The github repository cited above provides a function to assign the groups and to create contrasts based on this annotated file. The process is however buggy and I recommended to do this step manually.

samples <- fread(path2_annotated_sample_review)
kable(head(samples), caption="Annotated sample_review.txt")

QC

Performed by running Anh Ngueyet’s Step2.Rmd script. The script is hosted in the R folder of the geometric package. This is to ensure the data is reasonably coherent. Outliers in the PCA might indicate potentially mislabeled samples, or a batch effect. The boxplots give a good indication if the data is satisfactorily normalized. In extreme cases samples can be removed from further analysis.

Creating the contrast file

The functions below require the presence of a contrast file which should look like this:

contrasts <- fread(path2_contrast_file)
kable(contrasts, caption="Contrasts")

The essential columns are Contrast which holds the title of the contrast which is constructed according to NURSA requirements and the Formula which utilizes the groups assigned earlier. The other fields are not strictly required for the process laid out here but are required when following the protocol given in the github repository mentioned in the intro.

Rules for suitable contrasts

Generally experiments that differ in only one parameter can be compared. Examples are:

wild type vs mutant
untreated vs treated with molecule A
untreated vs treated with molecule B
treated with molecule A vs treated with molecule A and B

The following contrast is not valid: - untreated vs treated with molecule A and B

Loading the processed data

The function get_eset() creates an expression set object. It requires a GSE value and the path to the GEO data

my_eset <- get_eset(gse, path2_eset)

Creating a design matrix

Before continuing here I recommend reading the relevant sections in the limma user guide (pp 35)

my_design <- get_design()
kable(my_design)

fitting

The next step requires the contrast.txt file to be in the same directory as the sample_review.txt directory.

fit <- lmFit(my_eset, my_design)
fit

contrast <- get_contrast.matrix(path2_contrast_file)
kable(contrast)

fit2 <- contrasts.fit(fit, contrast)
fit3 <- eBayes(fit2)

Writing the fitted data to a file

write_fit_file(fit3, fit_file)

Adding gene symbols to the fitted data

This constitutes the last step. It's the topic of another vignettes.

axelmuller/geometric2 documentation built on May 25, 2019, 6:26 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

axelmuller/geometric2
NURSA annotation aid

In axelmuller/geometric2: NURSA annotation aid

Introduction

Loading the library

Getting the input data

Annotating the input data

QC

Creating the contrast file

Rules for suitable contrasts

Loading the processed data

Creating a design matrix

fitting

Writing the fitted data to a file

Adding gene symbols to the fitted data

R Package Documentation

Browse R Packages

We want your feedback!

axelmuller/geometric2 NURSA annotation aid

In axelmuller/geometric2: NURSA annotation aid

Introduction

Loading the library

Getting the input data

Annotating the input data

QC

Creating the contrast file

Rules for suitable contrasts

Loading the processed data

Creating a design matrix

fitting

Writing the fitted data to a file

Adding gene symbols to the fitted data

R Package Documentation

Browse R Packages

We want your feedback!

axelmuller/geometric2
NURSA annotation aid