In axelmuller/geometric2: NURSA annotation aid

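knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# Packages used below, attached explicitly in case geometric2 does not attach them.
library(knitr)       # kable()
library(kableExtra)
library(geometric2)
library(readr)       # read_lines()
library(data.table)  # fread(), fwrite()
library(Biobase)     # ExpressionSet()
library(limma)       # lmFit(), eBayes(), write.fit()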

Types of 2-color experiments

I recommend reading the limma user guide before processing 2-color experiments; the topic is discussed in chapters 10 and 11 (pp. 51 ff.).

2-color arrays fall into two categories:

2-color experiments with a common reference
direct 2-color experiments

Processing 2-color experiments with a common reference is analogous to processing 1-color experiments.

Direct 2-color experiments can include a dye-swap variant, in which the dye assignments are reversed between arrays. This is intended to correct for dye-specific bias.
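A small simulation can make the dye-swap idea concrete. The sketch below uses base R only, and every number in it (a true log-ratio of 1, a dye bias of 0.5, four arrays) is an assumption chosen for illustration; it is not part of the GSE15512 workflow.

```r
# Illustrative simulation with made-up values, not real array data.
true_effect <- 1.0   # assumed true log2 ratio (treated vs. untreated)
dye_bias    <- 0.5   # assumed constant bias of one dye relative to the other

# Orientation of each array: +1 = original dye assignment, -1 = dyes swapped.
orientation <- c(1, -1, 1, -1)

# Observed log-ratio on each array: the effect changes sign with the swap,
# the dye bias does not.
M <- orientation * true_effect + dye_bias

# Averaging without accounting for the swap returns only the bias.
naive_estimate <- mean(M)

# Flipping the sign of the swapped arrays before averaging cancels the bias.
swap_estimate <- mean(orientation * M)
```

In limma, an orientation vector of this kind would typically be passed to lmFit as the design, so that swapped arrays enter the fit with the opposite sign.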

In this vignette I discuss processing the GSE15512 dataset. In this experiment human monocytes are treated with lipoteichoic acid (LTA) from S. aureus or L. plantarum; the two LTAs are referred to as aLTA and pLTA, respectively. Each treatment has 6 replicates, and the other color channel measures untreated cells. As a result, no contrast needs to be set up: it is provided by the experiment itself. A straightforward way to process the data is to separate the two experiments and create a series matrix file for each. These can then be processed individually.

Getting the input data

gse <- "GSE15512"
get_input(gse, ".")

Edit the sample_review.txt file and perform QC as described in the 1C_experiments vignette.

Creating series matrix files for the individual experiments

As a first step, the downloaded series matrix file needs to be unzipped.

system("gunzip .GSE15512/GEOtemp/GSE15512_series_matrix.txt.gz")

A quick look at the series matrix file reveals that metadata precedes the experimental data. These metadata comment lines start with an exclamation point.

sm_file <- read_lines(".GSE15512/GEOtemp/GSE15512_series_matrix.txt", n_max = 5)
kable(sm_file)

Here we see the transition from comments to experimental data:

sm_file <- read_lines(".GSE15512/GEOtemp/GSE15512_series_matrix.txt", skip = 60, n_max = 5)
kable(sm_file)
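The skip value of 60 is specific to this file. As a hedged alternative, the position of the transition can be computed instead of hard-coded. The sketch below uses base R and a toy file whose contents are invented for illustration; only the leading "!" convention is taken from the text above.

```r
# Toy stand-in for a series matrix file (contents invented for illustration).
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  '!Series_title\t"toy series"',
  '',
  '!Sample_title\t"s1"\t"s2"',
  '!series_matrix_table_begin',
  '"ID_REF"\t"GSM1"\t"GSM2"',
  '1\t0.12\t-0.40',
  '2\t-1.05\t0.33',
  '!series_matrix_table_end'
), toy_file)

lines <- readLines(toy_file)

# A line counts as metadata if it starts with "!"; blank lines are skipped too.
is_comment <- startsWith(lines, "!")
is_blank   <- !nzchar(lines)

# Index of the first line that holds table content (the column header).
first_data_line <- which(!is_comment & !is_blank)[1]

# Number of lines that precede the table.
n_before <- first_data_line - 1
```

Here n_before could be used as the skip argument when previewing the file, so the preview follows the file rather than a fixed number.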

The grep command can be used to remove the comments.

system("grep -v '^!' '.GSE15512/GEOtemp/GSE15512_series_matrix.txt' > '.GSE15512/GEOtemp/GSE15512_edited_matrix.txt'")
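On systems without grep, the same filtering can be done inside R. This is a minimal sketch in base R, run on a toy file with invented contents rather than the real series matrix; the file names are temporary placeholders.

```r
# Toy input and output files (placeholders, not the GSE15512 paths).
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
writeLines(c(
  '!Series_title\t"toy series"',
  '!series_matrix_table_begin',
  '"ID_REF"\t"GSM1"\t"GSM2"',
  '1\t0.12\t-0.40',
  '2\t-1.05\t0.33',
  '!series_matrix_table_end'
), in_file)

lines <- readLines(in_file)

# Drop every line that begins with "!" and keep the rest unchanged.
kept <- lines[!grepl("^!", lines)]
writeLines(kept, out_file)

# The remaining lines form a plain tab-separated table.
toy <- read.delim(out_file, check.names = FALSE)
```

Anchoring the pattern with ^ matters: it removes only lines that start with an exclamation point, so a data line that happens to contain one elsewhere is kept.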

We can now read in the series matrix as a data.table and select the desired columns to create new matrix files.

mf <- fread(".GSE15512/GEOtemp/GSE15512_edited_matrix.txt")
kable(head(mf))

The first column holds the IDs for the spots, columns 2 to 7 contain the experimental data for the aLTA challenge, and columns 8 to 13 store the pLTA data. For this tutorial we create a new series matrix with the pLTA data. For this we need the IDs and the data.

mf_pLTA <- mf[, c(1, 8:13)]
kable(head(mf_pLTA))
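If both treatments are needed, the split can be written once and applied to each group. The sketch below is an assumption-laden illustration in base R: it uses a toy data.frame with invented column names and random values, and only the column positions (2 to 7 and 8 to 13) come from the text above.

```r
# Toy table with the same layout: one ID column, then 6 + 6 data columns.
# Column names and values are invented for illustration.
set.seed(1)
toy <- data.frame(ID_REF = 1:4, matrix(rnorm(4 * 12), nrow = 4))
names(toy)[-1] <- paste0("GSM", 1:12)

# Column positions of each treatment, as described in the text.
groups <- list(aLTA = 2:7, pLTA = 8:13)

# Build one table per treatment: the ID column plus that group's data columns.
split_tables <- lapply(groups, function(cols) toy[, c(1, cols)])
```

With a data.table such as mf, the selection inside the function would probably need with = FALSE, because the column positions are held in a variable rather than written out literally.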

The resulting data.table can either be transformed into a matrix directly or written to a file and then loaded. For record keeping I prefer the latter.

fwrite(mf_pLTA, ".GSE15512/GEOtemp/GSE15512_pLTA_matrix.txt")

Creating an ExpressionSet

exprs_pLTA <- as.matrix(read.table(".GSE15512/GEOtemp/GSE15512_pLTA_matrix.txt", 
                                   header = TRUE,
                                   sep = ",",
                                   row.names = 1,
                                   as.is = TRUE))
exprs_pLTA <- ExpressionSet(assayData = exprs_pLTA)

Fitting

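For this type of experiment neither a design matrix nor contrasts are required: each value already compares treated with untreated cells, and without a design lmFit treats all arrays as replicates of a single group. All that is left is fitting the data and adding the gene symbols to the output.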

fit_pLTA <- lmFit(exprs_pLTA)
fit2_pLTA <- eBayes(fit_pLTA)
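To see what a fit without a design amounts to, the sketch below reproduces the intercept-only model in base R on a toy matrix of invented log-ratios. It is an illustration of the model, not of limma's implementation: eBayes additionally moderates the variances across spots, which this sketch does not do.

```r
# Toy log-ratio matrix: 3 spots x 6 replicate arrays (values invented).
set.seed(42)
M <- matrix(rnorm(18, mean = 0.5), nrow = 3,
            dimnames = list(paste0("spot", 1:3), paste0("rep", 1:6)))

# Intercept-only design: a single column of ones, one row per array.
design <- matrix(1, nrow = ncol(M), ncol = 1)

# Least-squares fit for all spots at once; arrays must be rows of the response.
coefs <- lm.fit(x = design, y = t(M))$coefficients

# The fitted coefficient is simply the mean log-ratio of each spot.
mean_lr <- rowMeans(M)

# Ordinary one-sample t-statistic against zero (no variance moderation).
t_stat <- mean_lr / (apply(M, 1, sd) / sqrt(ncol(M)))
```

The question answered per spot is therefore whether the mean log-ratio across the replicates differs from zero.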

write.fit(fit2_pLTA, results = NULL, file = ".GSE15512/GSE15512_pLTA_fit.txt", digits = 10, adjust = "fdr", method = "separate", F.adjust = "none", sep = "\t")
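The adjust = "fdr" argument requests a false discovery rate correction of the p-values. As a rough sketch of what that means, the base R example below applies the correction to six invented p-values and compares it with a hand-written Benjamini-Hochberg calculation; in p.adjust, "fdr" is an alias for "BH".

```r
# Invented p-values for illustration only.
p <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# "fdr" and "BH" name the same method in p.adjust().
adj_fdr <- p.adjust(p, method = "fdr")
adj_bh  <- p.adjust(p, method = "BH")

# Manual Benjamini-Hochberg: scale each sorted p-value by n / rank, then
# enforce monotonicity from the largest p-value downwards and cap at 1.
n <- length(p)
o <- order(p)
scaled <- p[o] * n / seq_len(n)
manual <- numeric(n)
manual[o] <- pmin(1, rev(cummin(rev(scaled))))
```

Adjusted values are never smaller than the raw p-values, so fewer spots pass a given threshold after correction.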