knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 5,
  fig.height = 4,
  dpi=200
)

Introduction

The NanoStringGeoMxSet was inherited from Biobase's ExpressionSet class. The NanoStringGeoMxSet class was designed to encapsulate data and corresponding methods for NanoString DCC files generated from the NanoString GeoMx Digital Spatial Profiling (DSP) platform.

There are numerous functions that NanoStringGeoMxSet inherited from ExpressionSet class. You can find these in this link: https://www.bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf

Loading Packages

Loading the NanoStringNCTools and GeoMxTools packages allow users access to the NanoStringGeoMxSet class and corresponding methods.

library(NanoStringNCTools)
library(GeomxTools)
library(EnvStats)
library(ggiraph)

Building a NanoStringGeoMxSet from .DCC files

Use the readNanoStringGeoMxSet function to read in your DCC files.

The phenoDataFile variable takes in the annotation file, the phenoDataDccColName is to specify which column from your annotation contains the DCC file names. The protocolDataColNames are the columns in your annotation file that you want to put in the protocol data slot.

datadir <- system.file("extdata", "DSP_NGS_Example_Data",
                       package="GeomxTools")
DCCFiles <- dir(datadir, pattern=".dcc$", full.names=TRUE)
PKCFiles <- unzip(zipfile = file.path(datadir,  "/pkcs.zip"))
SampleAnnotationFile <- file.path(datadir, "annotations.xlsx")

demoData <-
  suppressWarnings(readNanoStringGeoMxSet(dccFiles = DCCFiles,
                                          pkcFiles = PKCFiles,
                                          phenoDataFile = SampleAnnotationFile,
                                          phenoDataSheet = "CW005",
                                          phenoDataDccColName = "Sample_ID",
                                          protocolDataColNames = c("aoi",
                                                                   "cell_line",
                                                                   "roi_rep",
                                                                   "pool_rep",
                                                                   "slide_rep")))

class(demoData)
isS4(demoData)
is(demoData, "ExpressionSet")
demoData

How is the DSP data stored in the NanoStringGeoMxSet object?

# access the count matrix 
assayData(demoData)[["exprs"]][1:3, 1:3]

# access pheno data
pData(demoData)[1:3, ]

# access the protocol data
pData(protocolData(demoData))[1:3, ]

# access the probe information
fData(demoData)[1:3, ]

# check feature type
featureType(demoData)

# access PKC information
annotation(demoData)

Accessing and Assigning NanoStringGeoMxSet Data Members

Alongside the accessors associated with the ExpressionSet class, NanoStringGeoMxSet objects have unique additional assignment and accessor methods facilitating common ways to view DSP data and associated labels.

Access annotations

The package provide functions to get the annotations of the data

Access the available pheno and protocol data variables

svarLabels(demoData)
head(sData(demoData), 2)

Design information can be assigned to the NanoStringGeoMxSet object, as well as feature and sample labels to use for NanoStringGeoMxSet plotting methods.

design(demoData) <- ~ `segments`
design(demoData)

dimLabels(demoData)
dimLabels(demoData)[2] <- "Sample ID"
dimLabels(demoData)

Summarizing NanoString GeoMx Data

Easily summarize count results using the summary method. Data summaries can be generated across features or samples. Labels can be used to generate summaries based on feature or sample groupings.

head(summary(demoData, MARGIN = 1), 2)
head(summary(demoData, MARGIN = 2), 2)
unique(sData(demoData)$"cell_line")
head(summary(demoData, MARGIN = 2, GROUP = "cell_line")$"HS578T", 2)
head(summary(demoData, MARGIN = 2, GROUP = "cell_line")$"COLO201", 2)
head(summary(demoData, MARGIN = 2, GROUP = "cell_line", log2 = FALSE)$"COLO201", 2)

Subsetting NanoStringGeoMxSet Objects

NanoStringGeoMxSet provides subsetting methods including bracket subsetting and subset functions. Users can use the subset or select arguments to further subset by feature or sample, respectively.

dim(demoData)

use the bracket notation

dim(demoData[, demoData$`slide name` == "6panel-old-slide1 (PTL-10891)"])

Or use subset method to subset demoData object by selecting only certain slides

dim(subset(demoData, select = phenoData(demoData)[["slide name"]] == "6panel-old-slide1 (PTL-10891)"))

Subset by selecting specific targets and slide name

dim(subset(demoData, TargetName == "ACTA2", `slide name` == "6panel-old-slide1 (PTL-10891)"))
dim(subset(demoData, CodeClass == "Control", `slide name` == "6panel-old-slide1 (PTL-10891)"))

use endogenousSubset and negativeControlSubset function to subset the demodata and include only features that belong to endogenous code class or negative code class.

dim(endogenousSubset(demoData))
dim(negativeControlSubset(demoData))

endogenousSubset function also takes select arguments to further subset by phenodata

dim(endogenousSubset(demoData, 
                              select = phenoData(demoData)[["slide name"]] == "6panel-old-slide1 (PTL-10891)"))

# tally the number of samples according to their protocol or phenodata grouping
with(endogenousSubset(demoData), table(`slide name`))
with(demoData [1:10, 1:10], table(cell_line))    
with(negativeControlSubset(demoData), table(CodeClass))

Apply Functions Across Assay Data

Similar to the ExpressionSet's esApply function, an equivalent method is available with NanoStringGeoMxSet objects. Functions can be applied to assay data feature-wise or sample-wise.

Add the demoElem data which is computed as the logarithm of the count matrix (exprs) into the demoData by using assayDataApply function. The accessor function assayDataElement from eSet returns matrix element from assayData slot of object. Elt refers to the element in the assayData.

assayDataElement(demoData, "demoElem") <- 
  assayDataApply(demoData, MARGIN=2, FUN=log, base=10, elt="exprs")
assayDataElement(demoData, "demoElem")[1:3, 1:2]

# loop over the features(1) or samples(2) of the assayData element and get the mean
assayDataApply(demoData, MARGIN=1, FUN=mean, elt="demoElem")[1:5]

# split the data by group column with feature, pheno or protocol data then get the mean
head(esBy(demoData, 
            GROUP = "cell_line", 
            FUN = function(x) { 
              assayDataApply(x, MARGIN = 1, FUN=mean, elt="demoElem") 
            }))

Built-in Quality Control Assessment

Users can flag samples that fail QC thresholds or have borderline results based on expression. The setQC Flags will set the QC flags in the protocolData for the samples and probes that are low in count and saturation levels. It will also set flags for probe local outliers (low and high) and Global Outliers

demoData <- shiftCountsOne(demoData, useDALogic=TRUE)
demoData <- setSegmentQCFlags(demoData)
head(protocolData(demoData)[["QCFlags"]])
demoData <- setBioProbeQCFlags(demoData)
featureData(demoData)[["QCFlags"]][1:5, 1:4]

Probes and Samples that were flagged can be removed from analysis by subsetting.

Subset object to exclude all that did not pass Sequencing and background QC.

QCResultsIndex <- which(apply(protocolData(demoData)[["QCFlags"]], 
                              1L , function(x) sum(x) == 0L))
QCPassed <- demoData[, QCResultsIndex]
dim(QCPassed) 

After cleaning the object from low counts, the counts can be collapsed to Target using aggregateCounts function.

Save the new object as target_demoData when you call the aggregateCounts function. This will change the dimension of the features. After aggregating the counts, feature data will contain target counts and not probe counts. To check the feature type, you can use the featureType accessor function.

target_demoData <- aggregateCounts(demoData)
dim(target_demoData)

Note that feature data changed to target.

featureType(target_demoData)
exprs(target_demoData)[1:5, 1:5]

Normalization

There is a preloaded GeoMx DSP-DA Normalization that comes with the NanoStringGeoMxSet class. This includes the options to normalize on quantile, housekeeping or negative normalization.

target_demoData <- normalize(target_demoData , norm_method="quant", 
                      desiredQuantile = .9, toElt = "q_norm")
target_demoData <- normalize(target_demoData , norm_method="neg", fromElt="exprs",  toElt="neg_norm")
target_demoData <- normalize(target_demoData , norm_method="hk", fromElt="exprs", toElt="hk_norm")
assayDataElement( target_demoData , elt = "q_norm" )[1:3, 1:2]
assayDataElement( target_demoData , elt = "hk_norm" )[1:3, 1:2]
assayDataElement( target_demoData , elt = "neg_norm" )[1:3, 1:2]

Transforming NanoStringRCCSet Data to Data Frames

The NanoStringGeoMxSet munge function generates a data frame object for downstream modeling and visualization. This combines available features and samples into a long format.

neg_set <- negativeControlSubset(demoData)
class(neg_set)
neg_ctrls <- munge(neg_set, ~ exprs)
head(neg_ctrls, 2)
class(neg_ctrls)
head(munge(demoData, ~ exprs), 2)
munge(demoData, mapping = ~`cell_line` + GeneMatrix)

Transforming NanoSetRccSet assayData matrices

Subtract max count from each sample Create log1p transformation of adjusted counts

thresh <- assayDataApply(negativeControlSubset(demoData), 2, max)
demoData <-
  transform(demoData,
            negCtrlZeroed = sweep(exprs, 2, thresh),
            log1p_negCtrlZeroed = log1p(pmax(negCtrlZeroed, 0)))
assayDataElementNames(demoData)
sessionInfo()


Nanostring-Biostats/GeomxTools documentation built on April 14, 2024, 1:25 a.m.