bioset is intended to help you work with sets of raw data.
When working in a lab, it is not uncommon to end up with a data set of raw values (because your measuring device spat it out) that you now need to transform and organise before you can work with it.
A stable version of bioset is available on CRAN: https://cran.r-project.org/package=bioset
So all you need to do is:
install.packages("bioset")
You can find the latest additions and changes on GitHub. To spare CRAN administrators' time it is requested of all package authors not to submit changes too frequently.
Consequently, I will make new features available on GitHub first. Packages I have not yet submitted to CRAN will be labelled vX.Y.Z-pre.N and appear under: https://github.com/randomchars42/bioset/releases.
To install those packages you can use githubinstall:
# install.packages("githubinstall")
githubinstall::gh_install_packages("bioset", ref = "vX.Y.Z-pre.N")
You can install the very latest changes from the master branch of bioset on GitHub with:
# install.packages("devtools")
devtools::install_github("randomchars42/bioset")
bioset
lets you:
Suppose you have an ods / xls(x) file with raw values obtained from a measurement like this:
data <- utils::read.csv(
  system.file("extdata", "values.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:4]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
Save them as set_1.csv. That's like an ods / xls(x) file, but it's basically a text file with the values separated by commas. In the current versions of LibreOffice / OpenOffice / Microsoft Office there's an option "Save as" > "csv".
Load the package.
library("bioset")
Then you can use set_read() to get all values, with their position as name, in a nice tibble:
set_read()
data <- bioset::set_read(
  file_name = "values.csv",
  path = system.file("extdata", package = "bioset")
)
knitr::kable(data)
set_read() automagically reads set_1.csv in your current directory. If you have more than one set, use set_read(num = 2) to read set 2, etc.
If your files are called plate_1.csv, plate_2.csv, ... (or run_1.csv, run_2.csv, ...) you can set file_name = "plate_#NUM#.csv" (or "run_#NUM#.csv", ...).
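To make the placeholder concrete, here is a minimal base-R sketch of how a pattern such as "plate_#NUM#.csv" could expand into one file name per set. The helper expand_file_name() is hypothetical and only illustrates the idea; it is not taken from bioset and may not match what the package does internally.

```r
# Hypothetical helper (not part of bioset): replace the #NUM# placeholder
# in a file name pattern with the number of the set.
expand_file_name <- function(pattern, num) {
  gsub("#NUM#", as.character(num), pattern, fixed = TRUE)
}

# Build the file names for sets 1 to 3.
file_names <- vapply(
  1:3,
  function(i) expand_file_name("plate_#NUM#.csv", i),
  character(1)
)
print(file_names)
```

With file_name = "plate_#NUM#.csv", set_read(num = 2) would then be expected to look for plate_2.csv.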
If your files are stored in ./files/, tell set_read() where to look via path = "./files/".
Before feeding your samples into your measuring device you most likely drafted some sort of plan which position corresponds to which sample (didn't you?).
data <- utils::read.csv(
  system.file("extdata", "names.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:4]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
So you had some calibrators (1-4) and samples A, B, C, D, E, F, G, H, each in duplicates.
To easily set the names for your samples, just copy the names into your set_1.csv:
data <- utils::read.csv(
  system.file("extdata", "values_names.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:8]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
Tell set_read() that your data contains the names, and which column should hold them, by setting additional_vars = c("name").
set_read( additional_vars = c("name") )
This will get you:
data <- bioset::set_read(
  file_name = "values_names.csv",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name")
)
knitr::kable(data)
Suppose samples A, B, C, D were taken at day 1 and E, F, G, H were taken from the same rats / individuals / patients on day 2.
It would be more elegant to encode that into the data:
data <- utils::read.csv(
  system.file("extdata", "values_names_properties.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:8]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
Now, tell set_read() that your data contains the names and the day by setting additional_vars = c("name", "day"). This will get you:
set_read( additional_vars = c("name", "day") )
data <- bioset::set_read(
  file_name = "values_names_properties.csv",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name", "day")
)
knitr::kable(data)
Probably, your measuring device only gave you raw values (extinction rates / relative light units / ...). You know the concentrations of CAL1, CAL2, CAL3 and CAL4. Conveniently, the raw values and concentrations follow a linear relationship. To get the concentrations for the rest of the samples you need to interpolate between those calibrators.
set_calc_concentrations() does exactly this for you:
set_calc_concentrations(
  data,
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4) # ng / ml
)
data <- bioset::set_calc_concentrations(
  data,
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4) # ng / ml
)
knitr::kable(data)
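If you are curious what interpolating between calibrators amounts to, the following base-R sketch fits a straight line through four calibrators and uses it to estimate sample concentrations. The numbers are made up for illustration, and the code is not bioset's implementation; it only shows the underlying idea.

```r
# Made-up raw values for illustration only (not the package's example data).
calibrators <- data.frame(
  value = c(10, 20, 30, 40), # raw readings of CAL1 to CAL4
  conc  = c(1, 2, 3, 4)      # known concentrations in ng / ml
)
samples <- data.frame(value = c(15, 25, 35))

# Fit a straight line through the calibrators ...
model <- stats::lm(conc ~ value, data = calibrators)

# ... and use it to estimate the concentrations of the samples.
samples$conc <- unname(stats::predict(model, newdata = samples))
print(samples)
```

Because the made-up calibrators lie exactly on a line, each sample's estimate falls exactly between its neighbouring calibrators.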
Your calibrators are not so linear? Perhaps they are after a ln-ln transformation? You can use model_func = fit_lnln and interpolate_func = interpolate_lnln. Basically, you can use any function as model_func that returns a model which is understood by your interpolate_func.
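As an illustration of such a pair of functions, here is a sketch of a ln-ln fit and the matching interpolation in base R. The names fit_loglog() and interpolate_loglog(), their argument names and their argument order are assumptions made for this example; check the package documentation for the signatures that model_func and interpolate_func are actually expected to have.

```r
# Hypothetical pair of functions (not part of bioset). The signatures are
# assumptions and may differ from what bioset expects.
fit_loglog <- function(value, conc) {
  # Linear model on the ln-transformed raw values and concentrations.
  stats::lm(log(conc) ~ log(value))
}

interpolate_loglog <- function(value, model) {
  # Predict on the ln scale, then transform back to concentrations.
  unname(exp(stats::predict(model, newdata = data.frame(value = value))))
}

# Made-up calibrators that are linear after a ln-ln transformation.
cal_value <- c(1, 10, 100, 1000)
cal_conc  <- c(2, 20, 200, 2000)

model <- fit_loglog(cal_value, cal_conc)
estimate <- interpolate_loglog(c(5, 50), model)
print(estimate)
```

The important point is the contract between the two functions: whatever the fitting function returns must be something the interpolation function knows how to use.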
So the samples were measured in duplicate. For your further research you might want to use the mean and perhaps exclude samples with too much spread in their values.
set_calc_variability() to the rescue.
data <- set_calc_variability( data = data, ids = sample_id, value, conc )
This will give you the mean and coefficient of variation (as well as n of the samples and the standard deviation) for the columns value and conc. It will use sample_id to determine which rows belong to the same sample.
data <- bioset::set_calc_variability(
  data = data,
  ids = sample_id,
  value,
  conc
)
knitr::kable(data)
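For reference, the statistics mentioned above can be computed by hand as in the following base-R sketch. The data are made up, and the coefficient of variation is taken here as the standard deviation divided by the mean, which is an assumption; bioset may scale or name its output columns differently.

```r
# Made-up duplicate measurements for two samples.
measurements <- data.frame(
  sample_id = c("A", "A", "B", "B"),
  conc      = c(1.0, 3.0, 4.0, 4.0)
)

# Split the concentrations by sample and summarise each group.
groups <- split(measurements$conc, measurements$sample_id)
summary_table <- data.frame(
  sample_id = names(groups),
  n         = vapply(groups, length, integer(1)),
  mean      = vapply(groups, mean, numeric(1)),
  sd        = vapply(groups, stats::sd, numeric(1)),
  row.names = NULL
)

# Coefficient of variation: standard deviation relative to the mean.
summary_table$cv <- summary_table$sd / summary_table$mean
print(summary_table)
```

Sample A, whose duplicates differ widely, gets a large coefficient of variation, while sample B, whose duplicates agree, gets zero. A threshold on this column is one way to exclude samples with too much spread.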
If you need to read and transform multiple sets, sets_read() can do that for you.
It takes basically the same arguments as set_read(), set_calc_concentrations() and set_calc_variability() together and combines their functionality. The principal difference is that sets_read() takes sets, the number of sets to process.
It returns a list and may (write_data = TRUE) create two files in your current directory, data_all.csv and data_samples.csv, with the processed data.
The list returned by sets_read() holds the following items:
$all: here you will find all the data, including calibrators, duplicates, ... (saved in data_all.csv if write_data = TRUE)
$samples: only one row per distinct sample here - no calibrators, no duplicates -> most often you will work with this data (saved in data_samples.csv if write_data = TRUE)
$set1: a list
  $plot: a plot showing you the function used to calculate the concentrations for this set. The points represent the calibrators.
  $model: the model as returned by model_func
($set2 - $setN): the same information for every set you have
Take a look at the data
# now you may run it :)
result_list <- sets_read(
  sets = 1,
  sep = ",",
  additional_vars = c("name", "day"),
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4) # ng / ml
)
result_list <- bioset::sets_read(
  sets = 1,
  sep = ",",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name", "day"),
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4), # ng / ml
  write_data = FALSE
)
result_list$all
knitr::kable(result_list$all)
result_list$samples
knitr::kable(result_list$samples)
result_list$set1$plot