
Determine which genes have sufficiently large counts to be retained in a statistical analysis.

```
## S3 method for class 'DGEList'
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL, ...)
## S3 method for class 'SummarizedExperiment'
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL, ...)
## Default S3 method:
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL,
             min.count = 10, min.total.count = 15, large.n = 10, min.prop = 0.7, ...)
```

`y`
matrix of counts, or a `DGEList` object, or a `SummarizedExperiment` object.

`design`
design matrix. Ignored if `group` is not `NULL`.

`group`
vector or factor giving group membership for a oneway layout, if appropriate.

`lib.size`
library size, defaults to `colSums(y)`.

`min.count`
numeric. Minimum count required for at least some samples.

`min.total.count`
numeric. Minimum total count required.

`large.n`
integer. Number of samples per group that is considered to be “large”.

`min.prop`
numeric. Minimum proportion of samples in the smallest group that express the gene.

`...`
any other arguments.
For the `DGEList` and `SummarizedExperiment` methods, other arguments are passed to the default method.

This function implements the filtering strategy that was intuitively described by Chen et al (2016).
Roughly speaking, the strategy keeps genes that have at least `min.count` reads in a worthwhile number of samples.
More precisely, the filtering keeps genes that have counts-per-million (CPM) above *k* in *n* samples, where *k* is determined by `min.count` and by the sample library sizes, and *n* is determined by the design matrix.
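As a rough illustration of how *k* follows from `min.count` and the library sizes, the base R sketch below assumes that the cutoff is `min.count` expressed as a CPM at the median library size. That scaling, the toy counts, and the fixed value of `n` are assumptions made for illustration and are not copied from the edgeR source.

```r
# Toy count matrix: 4 genes (rows) by 4 samples (columns); values are made up.
counts <- matrix(c(  0,   1,   0,   2,
                    20,  35,  18,  40,
                    12,   0,  15,   0,
                   500, 450, 600, 520),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))

min.count <- 10
lib.size  <- colSums(counts)            # library size of each sample

# Assumed scaling: min.count converted to a CPM at the median library size.
k <- min.count / median(lib.size) * 1e6

# CPM of every entry: each column is divided by its own library size.
cpm <- t(t(counts) / lib.size) * 1e6

n <- 2                                  # fixed here; normally derived from the design
keep.cpm <- rowSums(cpm >= k) >= n      # TRUE for genes at or above k in at least n samples
```

In this toy data `gene1` never reaches the cutoff, while `gene3` reaches it in exactly two samples and so is kept only because `n` is 2.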

*n* is essentially the smallest group sample size or, more generally, the minimum inverse leverage of any fitted value.
If all the group sizes are larger than `large.n`, then this is relaxed slightly, but with *n* always greater than `min.prop` of the smallest group size (70% by default).
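The rule for *n* can be sketched in base R as follows. The helper name `min_sample_size` is hypothetical, and the linear form of the relaxation above `large.n` is an assumption that is consistent with the description (it keeps *n* above `min.prop` of the smallest group size) rather than a quotation of the edgeR source.

```r
# Hypothetical helper: derive n from a group factor or from a design matrix.
min_sample_size <- function(group = NULL, design = NULL,
                            large.n = 10, min.prop = 0.7) {
  stopifnot(!is.null(group) || !is.null(design))
  if (!is.null(group)) {
    # Smallest group size; factor() drops unused levels.
    n <- min(table(factor(group)))
  } else {
    # Minimum inverse leverage: 1 / largest diagonal element of the hat matrix.
    n <- 1 / max(hat(design, intercept = FALSE))
  }
  # Assumed relaxation: beyond large.n, only a fraction min.prop of the excess counts.
  if (n > large.n) n <- large.n + (n - large.n) * min.prop
  n
}

group  <- rep(c("A", "B"), times = c(3, 5))
design <- model.matrix(~ factor(group))

n.group  <- min_sample_size(group = group)     # smallest group has 3 samples
n.design <- min_sample_size(design = design)   # same value via leverages
n.large  <- min_sample_size(group = rep(c("A", "B"), times = c(20, 30)))
```

For a oneway layout the leverage of a sample is the reciprocal of its group size, which is why the two routes agree for this design.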

In addition, each kept gene is required to have at least `min.total.count` reads across all the samples.
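The total-count requirement is a separate check that must hold together with the CPM criterion. A minimal sketch, in which the toy counts are made up and `keep.cpm` is a placeholder standing in for the result of the CPM criterion:

```r
# Toy counts: 3 genes by 4 samples; values are made up.
counts <- matrix(c(12,  0,  0,  0,
                    3,  4,  4,  5,
                   11, 12,  0,  0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"), NULL))

min.total.count <- 15

# Total reads per gene across all samples, compared with the threshold.
keep.total <- rowSums(counts) >= min.total.count

# Placeholder for the CPM criterion (assumed, not computed here).
keep.cpm <- c(geneA = TRUE, geneB = FALSE, geneC = TRUE)

# A gene is kept only if it satisfies both requirements.
keep <- keep.cpm & keep.total
```

Here `geneA` is dropped despite passing the placeholder CPM criterion, because its total of 12 reads is below `min.total.count`.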

Logical vector of length `nrow(y)` indicating which rows of `y` to keep in the analysis.

Gordon Smyth

Chen Y, Lun ATL, and Smyth GK (2016).
From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.
*F1000Research* 5, 1438.
http://f1000research.com/articles/5-1438

```
## Not run: 
keep <- filterByExpr(y, design)
y <- y[keep,]
## End(Not run)
```
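Because the example above is not run and assumes that `y` and `design` already exist, a self-contained variant may be useful. The simulated counts, the group labels, and the sample names are made up for illustration; the edgeR calls are guarded so that the script still runs where edgeR is not installed.

```r
set.seed(1)

# Simulated counts: 1000 genes by 6 samples (illustrative values only).
counts <- matrix(rnbinom(1000 * 6, mu = 20, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

if (requireNamespace("edgeR", quietly = TRUE)) {
  y <- edgeR::DGEList(counts = counts, group = group)
  keep <- edgeR::filterByExpr(y, design)     # logical vector, one entry per gene
  # Subset the genes and recompute library sizes from the retained rows.
  y <- y[keep, , keep.lib.sizes = FALSE]
}
```

Setting `keep.lib.sizes = FALSE` when subsetting recalculates the library sizes from the remaining genes; omit it to retain the original library sizes.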
