library(SomaDataIO)
library(withr)
Sys.setlocale("LC_COLLATE", "en_US.UTF-8")
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>"
)
Occasionally, additional clinical data is obtained after samples have been submitted to SomaLogic, Inc. or even after 'SomaScan' results have been delivered.
This requires the new clinical, i.e. non-proteomic, data to be merged
with the 'SomaScan' data into a "new" ADAT prior to analysis.
For this purpose, a command-line interface ("CLI") tool has been included with SomaDataIO in the cli/merge/ directory, which allows one to generate an updated *.adat file via the command line without having to launch an integrated development environment ("IDE"), e.g. RStudio.
To use SomaDataIO's exported functionality from within an R session, please see merge_clin().
The clinical merge tool is an R script that comes with an installation of SomaDataIO:
dir(system.file("cli", "merge", package = "SomaDataIO", mustWork = TRUE))
merge_script <- system.file("cli/merge", "merge_clin.R", package = "SomaDataIO")
merge_script
First create a temporary "analysis" directory:
analysis_dir <- tempfile(pattern = "somascan-")
# create directory
dir.create(analysis_dir)
# sanity check
dir.exists(analysis_dir)
# copy merge tool into analysis directory
file.copy(merge_script, to = analysis_dir)
Let's create some dummy 'SomaScan' data derived from the example_data object from SomaDataIO.
First we reduce its size to 9 samples and 5 proteomic features, and then write it to a text file in our new analysis directory with write_adat().
This will be the "new" starting point for the clinical data merge and represents where customers would typically begin an analysis.
feats <- withr::with_seed(3, sample(getAnalytes(example_data), 5L))
sub_adat <- dplyr::select(example_data, PlateId, SlideId, Subarray,
                          SampleId, Age, all_of(feats)) |>
  head(9L)
withr::with_dir(analysis_dir,
  write_adat(sub_adat, file = "ex-data-9.adat")
)
Next we create random clinical data with a common key (this is typically the SampleId identifier but it could be any common key).
df <- data.frame(SampleId = as.character(seq(1, 9, by = 2)), # common key
                 group    = c("a", "b", "a", "b", "a"),
                 newvar   = withr::with_seed(1, rnorm(5)))
df
# write clinical data to file
withr::with_dir(analysis_dir,
  write.csv(df, file = "clin-data.csv", row.names = FALSE)
)
At this point there are now 3 files in our analysis directory:
dir(analysis_dir)
- merge_clin.R: the merge script engine itself
- clin-data.csv: the clinical data, with columns:
  - SampleId
  - group
  - newvar
- ex-data-9.adat: the 'SomaScan' data, with non-proteomic columns PlateId, SlideId, Subarray, SampleId, and Age
  - PlateId, SlideId, and Subarray are key fields common to almost all ADATs; removing them could yield unintended results
  - SampleId is required

The clinical data merge tool is simple to use from most common command-line terminals (Bash, Zsh, etc.). You must have R installed (and available), with SomaDataIO and its dependencies installed.
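
Before running the tool, it can help to confirm these prerequisites from an R session. The following is a minimal sketch using only base R; the variable names are illustrative, and it only reports what it finds rather than installing anything.

```r
# Locate the Rscript executable on the search path;
# Sys.which() returns an empty string when it is not found.
rscript_path  <- Sys.which("Rscript")
rscript_found <- unname(nzchar(rscript_path))

# Check that SomaDataIO is installed, without attaching it.
pkg_found <- requireNamespace("SomaDataIO", quietly = TRUE)

message("Rscript found: ", rscript_found)
message("SomaDataIO installed: ", pkg_found)
```

If either check reports FALSE, the command-line examples in this document are unlikely to run as written.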
The merge script takes 4 (four), ordered arguments:
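1. the input ADAT (*.adat) file
2. the clinical data (*.csv) file
3. the common key variable name (e.g. SampleId)
4. the output file name (*.adat) for new ADAT

The primary syntax applies when the common key has the same variable name in both files (ADAT and CSV):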
# change directory to the analysis path
# (placeholder: use the path stored in `analysis_dir` in the R session)
cd /path/to/analysis_dir
# run the Rscript:
# - we recommend using the --vanilla flag
Rscript --vanilla merge_clin.R ex-data-9.adat clin-data.csv SampleId ex-data-9-merged.adat
withr::with_dir(analysis_dir,
  base::system2(
    "Rscript",
    c("--vanilla", "merge_clin.R", "ex-data-9.adat", "clin-data.csv",
      "SampleId", "ex-data-9-merged.adat")
  )
)
dir(analysis_dir)
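
Because the four arguments are positional, their order matters. The sketch below shows one way an R script could read and name four ordered arguments. It is an illustration only, not the contents of merge_clin.R: the helper parse_merge_args() and the element names adat_file, csv_file, key, and out_file are hypothetical.

```r
# Hypothetical helper (not part of SomaDataIO): validate and name
# the four ordered arguments.
parse_merge_args <- function(args) {
  if (length(args) != 4L) {
    stop("Expected 4 arguments: <adat> <csv> <key> <output>", call. = FALSE)
  }
  stats::setNames(as.list(args),
                  c("adat_file", "csv_file", "key", "out_file"))
}

# In a script run via Rscript, the trailing arguments would come from:
#   args <- commandArgs(trailingOnly = TRUE)
# Here the example values are passed directly so the sketch runs anywhere.
args <- c("ex-data-9.adat", "clin-data.csv", "SampleId", "ex-data-9-merged.adat")
parsed <- parse_merge_args(args)
str(parsed)
```

Swapping any two arguments on the command line would assign them to the wrong roles, which is why the order listed above should be followed exactly.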
In certain instances the common key may be stored under a different variable name in each file.
This is handled by a modification to argument 3, which now takes the form key1=key2, where key1 is the variable holding the common keys in the *.adat file, and key2 is the variable holding the keys in the *.csv file.
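
To make the key1=key2 form concrete, here is a minimal sketch of how such an argument could be split into one key name per file. This is an assumption about a possible approach, not the parsing used by merge_clin.R; split_key() and the labels adat and csv are hypothetical.

```r
# Hypothetical helper (not part of SomaDataIO): split "key1=key2"
# into the key name for each file.
split_key <- function(key_arg) {
  parts <- strsplit(key_arg, "=", fixed = TRUE)[[1L]]
  if (length(parts) == 1L) {
    parts <- rep(parts, 2L)  # same variable name in both files
  }
  if (length(parts) != 2L || any(!nzchar(parts))) {
    stop("Key must look like 'key' or 'key1=key2'", call. = FALSE)
  }
  c(adat = parts[1L], csv = parts[2L])
}

split_key("SampleId")         # one name shared by both files
split_key("SampleId=ClinID")  # different name in each file
```

With a single name the same key is used for both files, which corresponds to the primary syntax described earlier.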
To highlight this syntax, first let's create a new clinical data file with a different variable name, ClinID:
# rename original `df`
names(df) <- c("ClinID", "letter", "size")
df
# write clinical data to file
withr::with_dir(analysis_dir,
  write.csv(df, file = "clin-data2.csv", row.names = FALSE)
)
We can now execute the same merge script at the command line with a slightly modified syntax:
Rscript --vanilla merge_clin.R ex-data-9.adat clin-data2.csv SampleId=ClinID ex-data-9-merged2.adat
withr::with_dir(analysis_dir,
  base::system2(
    "Rscript",
    c("--vanilla", "merge_clin.R", "ex-data-9.adat", "clin-data2.csv",
      "SampleId=ClinID", "ex-data-9-merged2.adat")
  )
)
dir(analysis_dir)
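
For intuition about what merging on differently named keys involves, the following base R sketch joins two toy data frames. It assumes left-join semantics (every ADAT row is kept, and samples without clinical data receive NA); this is an illustrative assumption, not a description of the code inside merge_clin.R. The objects adat_like and clin are made-up stand-ins.

```r
# Toy stand-ins (assumptions) for the ADAT and the clinical data.
adat_like <- data.frame(SampleId = as.character(1:9),
                        Age      = c(40, 52, 33, 61, 47, 58, 29, 45, 50),
                        stringsAsFactors = FALSE)
clin <- data.frame(ClinID = as.character(seq(1, 9, by = 2)),
                   letter = c("a", "b", "a", "b", "a"),
                   stringsAsFactors = FALSE)

# Left join: keep every row of `adat_like`, matching SampleId to ClinID.
merged <- merge(adat_like, clin, by.x = "SampleId", by.y = "ClinID",
                all.x = TRUE, sort = FALSE)

# merge() does not guarantee row order, so restore the sample order.
merged <- merged[order(as.integer(merged$SampleId)), ]
merged
```

In this toy case only the odd-numbered samples have clinical values, so the even-numbered rows carry NA in the new column.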
Now let's check that the clinical data was merged successfully and yields the expected *.adat:
new <- withr::with_dir(analysis_dir,
  read_adat("ex-data-9-merged2.adat")
)
new
getMeta(new)
getAnalytes(new)
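- The command-line merges above use the merge_clin.R script provided with SomaDataIO.
- To merge from within an R session instead, see merge_clin().

# clean up; recursive = TRUE is required for unlink() to delete a directory
if ( dir.exists(analysis_dir) ) {
  unlink(analysis_dir, recursive = TRUE, force = TRUE)
}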