# NAPPA: A novel statistical method for the processing and normalisation of mRNA data output from the Nanostring nCounter software

### Description

Enables the processing and normalisation of the mRNA data output from the Nanostring nCounter software. Performs an adjustment based on the observed field of view for each lane. Performs a background correction using the truncated Poisson distribution adjustment. Performs a positive control normalisation using the E2 value. Performs a housekeeper normalisation by estimating the slope multiplier from a sigmoidal curve fit.

### Usage

```
NAPPA(
  data,
  tissueType = c("tumour", "cells"),
  NReferenceSamples = sampleNumber,
  sampleNumber = ncol(data) - 3,
  scaleFOV = T,
  background.method = c("poisson", "subtract", "poisson.global",
                        "subtract.global", "subtract.max", "subtract.globalmax",
                        "subtract.mean2sd", "subtract.globalmean2sd", "none"),
  nposcontrols = 4,
  poscontrol.method = c("average", "weighted.average",
                        "geometric.mean", "average.prebc"),
  hk.method = c("shrunken.correct", "shrunken.subtract",
                "subtract", "correct"),
  betas = NULL,
  hknormfactor.mean = NULL,
  sigmoidparameters = NULL,
  addconstant = 10,
  imputezeroes.method = c("min", "min.retro", "none"),
  raise.low.counts = 2,
  output = NULL)
```

### Arguments

`data`
The output RCC files from the Nanostring nCounter software, saved as a tab-delimited text file. If there are multiple cartridges from the same experiment, merge them into a single file before running. In the first column, indicate housekeeping genes by changing the standard 'Endogenous' call generated by nCounter to 'Housekeeping'.

`tissueType`
Either 'tumour' or 'cells', depending on the sample type used.

`NReferenceSamples`
The number of samples used to determine the mean expression level in the calculation of the normalisation shrinkage parameters and housekeeping correction factor. The first NReferenceSamples sample columns in the input file are used. The default is to use all samples.

`sampleNumber`
A synonym for NReferenceSamples.

`scaleFOV`
Logical flag indicating whether to normalise for the number of successfully imaged fields of view. By default set to TRUE.

`background.method`
The method used for background correction. The default is poisson, which performs a Truncated Poisson Correction using the average background counts for each lane (sample). The option poisson.global performs a Truncated Poisson Correction using a global average background over all lanes. The options subtract, subtract.max and subtract.mean2sd subtract the mean, the maximum, and the mean plus 2 standard deviations of the background count within each lane, respectively. Their global versions subtract.global, subtract.globalmax and subtract.globalmean2sd subtract the same statistics calculated over all lanes. The option none omits the background correction.

`nposcontrols`
The number of positive control probes to use in calculating a positive control normalisation factor. By default set to 4.

`poscontrol.method`
The method used to combine the positive controls to generate a single positive control value for each lane. The default option is average. Other options are weighted.average, geometric.mean and average.prebc (an average based on the values before background correction, effectively reversing the order of the background and positive control steps).

`hk.method`
The method used for the housekeeping normalisation step. The default is shrunken.correct, which performs a shrunken correction. Other options are shrunken.subtract, correct and subtract, which perform housekeeper subtraction or correction in either shrunken or standard form.

`betas`
The shrinkage parameters for shrunken housekeeping normalisation. These may be taken from a previous run of NAPPA using the output="Betas" option. By default these are calculated for each gene based on the first NReferenceSamples samples.

`hknormfactor.mean`
The correction term for the housekeeping normalisation factors. This may be taken from a previous run of NAPPA using the output="HousekeepingFactor" option. By default this is calculated as the average housekeeping factor within the data.

`sigmoidparameters`
A vector of length two containing the location and slope parameters for the sigmoid curve used to calculate the shrinkage parameters for the shrunken housekeeping normalisation. By default these are determined from the tissueType argument.

`addconstant`
Constant added to the final expression levels to present them on a more user-friendly scale. By default set to 10.

`imputezeroes.method`
The method used to impute values for zero raw counts. The default option min imputes the lowest normalised value observed for that gene in the normalised data set. The option none performs no imputation. The option min.retro is a legacy option that provides backward compatibility with earlier versions of NAPPA.

`raise.low.counts`
The minimum value to which all raw counts are raised in the initial pre-processing step. By default set to 2. To omit this step, set the value to zero.

`output`
Values to be returned from the function. By default the function returns just a gene expression matrix. If output is set to a non-null value then a list is returned containing the requested components as detailed in the Value section.
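
The subtract-type `background.method` options differ only in which summary of the background counts is removed, and the `average` and `geometric.mean` choices for `poscontrol.method` differ only in how the positive controls are pooled. The following is a minimal base R sketch of those summaries, not NAPPA's internal code. The matrices `neg`, `pos` and `counts`, their probe and gene names, and all values are hypothetical. The truncated Poisson correction and the weights used by `weighted.average` are not described on this page, so they are not reproduced here.

```r
# Hypothetical control counts: probes as rows, lanes (samples) as columns.
lanes <- paste0("lane", 1:3)
neg <- matrix(c(4, 6, 5, 9,      # lane1
                2, 3, 1, 2,      # lane2
                10, 12, 8, 14),  # lane3
              nrow = 4, dimnames = list(paste0("NEG_", 1:4), lanes))
pos <- matrix(c(1000, 250, 60, 16,   # lane1
                2000, 500, 120, 32,  # lane2
                500, 125, 30, 8),    # lane3
              nrow = 4, dimnames = list(paste0("POS_", 1:4), lanes))

# Background summaries within each lane (one value per lane).
lane_mean    <- colMeans(neg)
lane_max     <- apply(neg, 2, max)
lane_mean2sd <- lane_mean + 2 * apply(neg, 2, sd)

# The same summaries pooled over all lanes (one value in total).
global_mean    <- mean(neg)
global_max     <- max(neg)
global_mean2sd <- global_mean + 2 * sd(as.vector(neg))

# Per-lane mean subtraction applied to hypothetical endogenous counts.
# Negative results are left as-is here; how NAPPA treats them is not
# stated on this page.
counts <- matrix(c(50, 7, 120,
                   30, 1, 200),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GENE_A", "GENE_B"), lanes))
corrected <- sweep(counts, 2, lane_mean, "-")

# Two ways of combining the positive controls into one value per lane.
pos_average   <- colMeans(pos)             # arithmetic mean
pos_geometric <- exp(colMeans(log(pos)))   # geometric mean
```

In this toy data the geometric mean is lower than the arithmetic mean in every lane, because the positive control counts span several orders of magnitude and the arithmetic mean is dominated by the largest probe.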

### Details

The RCC file that is output by the Nanostring nCounter software contains two empty rows when output. These rows must be removed prior to analysis. Multiple cartridges that use the same codeset can be merged together into a single file (there is no upper limit to sample size). It is recommended that as many samples as possible are used to calculate the gene means for the housekeeping normalisation (12 samples is the recommended minimum).
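
The two preparation steps described here and under `data` (removing the empty rows and relabelling housekeeping genes) could be sketched in base R as follows. This is an illustration under stated assumptions: the column names, gene names, accessions and counts are hypothetical, and the real export layout may differ. The three annotation columns are chosen only to match the default `sampleNumber = ncol(data) - 3`.

```r
# Hypothetical miniature of a tab-delimited export containing two empty rows.
raw_text <- c(
  "Code.Class\tName\tAccession\tSample1\tSample2",
  "Endogenous\tGENE_A\tNM_0001\t120\t95",
  "\t\t\t\t",
  "Endogenous\tGENE_HK\tNM_0002\t800\t760",
  "\t\t\t\t",
  "Positive\tPOS_A\tERCC_0001\t5000\t5200"
)
raw <- read.delim(text = raw_text, stringsAsFactors = FALSE)

# Drop rows in which every field is missing or blank.
is_empty <- apply(raw, 1, function(r) all(is.na(r) | trimws(r) == ""))
clean <- raw[!is_empty, , drop = FALSE]

# Relabel the chosen housekeeping genes in the first column.
housekeepers <- c("GENE_HK")  # hypothetical housekeeping gene names
is_hk <- clean[[1]] == "Endogenous" & clean$Name %in% housekeepers
clean[[1]][is_hk] <- "Housekeeping"
```

For a real file, `read.delim()` would be pointed at the saved export instead of the `text` argument, and `housekeepers` would list the housekeeping genes in the codeset.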

### Value

The value of the NAPPA function is determined by the output option. By default a matrix is returned containing the gene expression values with genes as rows and lanes (samples) as columns. If output is set to a non-null value then a list is returned containing the gene expression matrix as the item geneexpression and those of the following items listed in output:

`Housekeeping`
A matrix of the housekeeping genes.

`HousekeepingFactor`
A vector of the housekeeping normalisation factor for each lane, and also the housekeeping correction factor, HousekeepingFactor.Mean.

`Betas`
A vector of the shrinkage factors used in the housekeeping normalisation for each gene.

`Backgrounds`
A vector of the average background for each lane.

`PosFactor`
A vector of the positive control factors used for each lane.

`Description`
The parameters used by the NAPPA function.

If output is set to "All", a list containing all components is returned.

### Author(s)

Chris Harbron, Mark Wappett

### Examples

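
A minimal sketch of one possible workflow, in which the shrinkage parameters and housekeeping correction factor estimated on a reference set are reused to normalise new samples. The wrapper `normalise_with_reference` and its arguments `reference` and `new_data` are hypothetical names introduced for illustration. The list element names `Betas` and `HousekeepingFactor.Mean` are assumptions taken from the Value section, so it is worth checking them with `names()` on the returned list before relying on them.

```r
# Sketch only: requires the NAPPA package and two prepared input objects
# (reference samples and new samples) in the format described under `data`.
normalise_with_reference <- function(reference, new_data, tissue = "tumour") {
  if (!requireNamespace("NAPPA", quietly = TRUE)) {
    stop("The NAPPA package is not installed.")
  }

  # First run: request the shrinkage parameters and housekeeping factors
  # in addition to the gene expression matrix.
  ref <- NAPPA::NAPPA(reference, tissueType = tissue,
                      output = c("Betas", "HousekeepingFactor"))

  # Second run: reuse the reference parameters for the new samples.
  # Assumption: the list elements are named as in the Value section.
  NAPPA::NAPPA(new_data, tissueType = tissue,
               betas = ref$Betas,
               hknormfactor.mean = ref$HousekeepingFactor.Mean)
}
```

With the default `output = NULL` in the second call, the function would return only the gene expression matrix for the new samples.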