This is NOT cryptographically secure, nor is it equivalent to a proper two-key de-identification (de-ID) scheme! It is likely to dissuade a casual attack, but it cannot stop a motivated attacker.
x | a RangedSummarizedExperiment to anonymize
salt | a salting phrase to slow brute-force attacks ("0x")
strip | strip rehashed objects of any deID'ing metadata? (TRUE)
algo | algorithm to use for the one-way hash (default is "md5")
deorder | scramble rows and columns? (FALSE; disrupts data digest)
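The salt and algo arguments describe a salted one-way hash of identifiers. The following is a minimal sketch of that idea, assuming the CRAN digest package is installed. hash_ids() is a hypothetical helper written for illustration only; it is not taken from the rehash() source, which may combine the salt and the names differently.

```r
library(digest)

# Hypothetical helper (not part of the package): prepend the salt to each
# identifier and hash the resulting string with the chosen algorithm.
hash_ids <- function(ids, salt = "0x", algo = "md5") {
  vapply(paste0(salt, ids), digest, character(1),
         algo = algo, serialize = FALSE, USE.NAMES = FALSE)
}

ids <- c("sampleA", "sampleB", "sampleC")
hashed <- hash_ids(ids, salt = "testing")

# Keep the mapping so the original names can be restored later.
mapping <- data.frame(original = ids, hashed = hashed,
                      stringsAsFactors = FALSE)
restored <- mapping$original[match(hashed, mapping$hashed)]
```

The same salt and input always yield the same hash, so anyone who learns or guesses the salt can hash candidate names and compare them. A salt slows this kind of brute-force attack but does not prevent it.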
Specialized functions for rehash'ing specialized SE-like objects and for providing key-exchangeable versions of this functionality are forthcoming.
Assay renaming currently works by matching the assay name to the actual hdf5 path name used in HDF5 backing files (assays.h5), as produced by HDF5Array::saveHDF5SummarizedExperiment(...). This should ease interop with e.g. Python consumers of the data (they'll still need reverse mappings for the column and row names, but that's not too terribly difficult either).
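To see which HDF5 path names a non-R consumer would encounter, one can list the contents of the backing file. This is a sketch under stated assumptions: it needs the Bioconductor packages SummarizedExperiment, HDF5Array and rhdf5, and the toy object and the directory name "toySE" are made up for illustration. The dataset names that appear depend on the HDF5Array version, so no expected output is shown.

```r
library(SummarizedExperiment)
library(HDF5Array)

# A tiny toy experiment with a single assay named "counts".
counts <- matrix(runif(20), nrow = 5,
                 dimnames = list(paste0("r", 1:5), paste0("c", 1:4)))
se <- SummarizedExperiment(assays = SimpleList(counts = counts))

# Save it with HDF5 backing; the assay data are written to assays.h5.
se_dir <- file.path(tempdir(), "toySE")
saveHDF5SummarizedExperiment(se, se_dir, replace = TRUE)

# List what is stored in the backing file. The `name` column holds the
# dataset names that another HDF5 reader would have to use.
h5_contents <- rhdf5::h5ls(file.path(se_dir, "assays.h5"))
print(h5_contents[, c("group", "name", "dim")])
```

Comparing these names with assayNames() of a rehashed object shows whether the renaming described above lines up with what is on disk.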
At some point, it may make more sense to save metadata for rehash/dehash purposes to a relatively language-agnostic data format like Feather, or else break up all the pieces into CSVs and write a Python package to handle the reversing of hash-mappings. Either should be fine for interop.
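The CSV option could look roughly like the sketch below, which uses only base R. The column layout (dimension, original, hashed) and the file name are assumptions made for illustration, not the structure of res$meta, and placeholder strings stand in for real hashes.

```r
# Hypothetical mapping table; placeholder strings stand in for real hashes.
mapping <- data.frame(
  dimension = c("row", "row", "column"),
  original  = c("aa", "ba", "A"),
  hashed    = c("0123456789", "1e5", "f00d"),
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "rehash_mapping.csv")
write.csv(mapping, csv_path, row.names = FALSE)

# Read everything back as character so that hashes which look like numbers
# (e.g. "1e5" or all-digit strings) are not silently converted.
roundtrip <- read.csv(csv_path, colClasses = "character")

# Reverse lookup: recover original names from hashed ones.
lookup <- setNames(roundtrip$original, roundtrip$hashed)
unname(lookup[c("f00d", "1e5")])
```

A consumer in another language would need the same precaution of reading every column as text (for example `dtype=str` in pandas) so that hex digests are not parsed as numbers.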
library(SummarizedExperiment)
ncols <- 6
nrows <- 200
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
# two-letter row names; use nrows here because rse does not exist yet
rownames(counts) <- apply(expand.grid(letters, letters), 1,
                          paste0, collapse="")[seq_len(nrows)]
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE),
                     feature_id=sprintf("ID%03d", 1:200))
names(rowRanges) <- rownames(counts)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
# a toy RangedSummarizedExperiment (?SummarizedExperiment)
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)
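# per-sample library sizes, used to scale counts to counts per million
normalizers <- colSums(assays(rse)$counts)
assays(rse)$cpm <- sweep(assays(rse)$counts * 1e6, 2, normalizers, `/`)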
covs <- colData(rse) # alternative to pulling these from res$covs
# rehash the toy RangedSummarizedExperiment:
res <- rehash(rse, salt="testing", strip=TRUE, algo="md5", deorder=FALSE)
deIDed <- res$object
# test it out with HDF5-backed storage:
library(HDF5Array)
deIDedPath <- file.path(tempdir() , "deIDed")
deIDed <- saveHDF5SummarizedExperiment(deIDed, deIDedPath, replace=TRUE)
# recover the rehashed object using the saved metadata:
meta <- res$meta
covs <- res$covs
reIDed <- dehash(deIDed, meta=meta, covs=covs, check=TRUE)
if (!is.null(colnames(rse))) {
  stopifnot(identical(colnames(reIDed), colnames(rse)))
}
if (!is.null(rownames(rse))) {
  stopifnot(identical(rownames(reIDed), rownames(rse)))
}
# seeing is believing
show(reIDed)