bioset is intended to help you work with sets of raw data.
When working in a lab, it is not uncommon to have a data set of raw values (because your measuring device spat it out) that you now need to transform and organise before you can work with it.
A stable version of bioset is available on CRAN: https://cran.r-project.org/package=bioset
So all you need to do is:
install.packages("bioset")
You can find the latest additions and changes on GitHub. To spare the CRAN administrators' time, package authors are asked not to submit changes too frequently.
Consequently, I will make new features available on GitHub first. Packages I have not yet submitted to CRAN will be labelled vX.Y.Z-pre.N and appear under: https://github.com/randomchars42/bioset/releases.
To install those packages you can use githubinstall:
# install.packages("githubinstall")
githubinstall::gh_install_packages("bioset", ref = "vX.Y.Z-pre.N")
You can install the very latest changes from the master branch of bioset on GitHub with:
# install.packages("devtools")
devtools::install_github("randomchars42/bioset")
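bioset lets you read raw values, attach names and properties to them, calculate concentrations from calibrators and summarise duplicates, as described in the following steps.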
Suppose you have an ods / xls(x) file with raw values obtained from a measurement like this:
data <- utils::read.csv(
  system.file("extdata", "values.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:4]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
Save them as set_1.csv. That's like an ods / xls(x) file, but it's basically a text file with the values separated by commas. In the current versions of LibreOffice / OpenOffice / Microsoft Office there's an option "Save as" > "csv".
Load the package.
library("bioset")
Then you can use set_read() to get all values with their position as name in a nice tibble:
set_read()
data <- bioset::set_read(
  file_name = "values.csv",
  path = system.file("extdata", package = "bioset")
)
knitr::kable(data)
set_read() automagically reads set_1.csv in your current directory. If you have more than one set use set_read(num = 2) to read set 2, etc.
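To make "values with their position as name" concrete, here is a minimal base R sketch that turns a small plate-shaped matrix into one row per well. The matrix values and the "A1"-style position labels are assumptions made for illustration; this is not how set_read() is implemented, and its actual output columns may differ.

```r
# Hypothetical 2 x 3 plate of raw values (not the package's example data).
plate <- matrix(
  c(10, 20, 30,
    40, 50, 60),
  nrow = 2, byrow = TRUE
)

# Label rows with letters and columns with numbers, like a microplate.
rows <- LETTERS[seq_len(nrow(plate))]
cols <- seq_len(ncol(plate))

# One row per well: an assumed position name (e.g. "A1") plus its raw value.
# as.vector() walks the matrix column by column, so the labels are built
# in the same column-major order.
long <- data.frame(
  position = paste0(rep(rows, times = ncol(plate)),
                    rep(cols, each = nrow(plate))),
  value = as.vector(plate),
  stringsAsFactors = FALSE
)
long
```

The long layout is what makes later steps (naming, grouping duplicates) straightforward, because every measurement sits in its own row.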
If your files are called plate_1.csv, plate_2.csv, ... (run_1.csv, run_2.csv, ...) you can set file_name = "plate_#NUM#.csv" (file_name = "run_#NUM#.csv", ...).
If your files are stored in ./files/ tell set_read() where to look via path = "./files/".
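The #NUM# placeholder can be pictured as a simple text substitution. The helper below, expand_file_name(), is hypothetical and only illustrates the naming pattern; it is not part of bioset and does not claim to mirror how set_read() resolves file names internally.

```r
# Hypothetical helper: replace the #NUM# placeholder with each set number
# and prepend the directory the files live in.
expand_file_name <- function(template, num, path = ".") {
  file_names <- vapply(
    num,
    function(n) gsub("#NUM#", as.character(n), template, fixed = TRUE),
    character(1)
  )
  file.path(path, file_names)
}

# File names for sets 1 to 3 stored in ./files
expand_file_name("plate_#NUM#.csv", 1:3, path = "./files")
```

Checking the expanded names with file.exists() before reading is a cheap way to catch a typo in the template or the path.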
Before feeding your samples into your measuring device you most likely drafted some sort of plan which position corresponds to which sample (didn't you?).
data <- utils::read.csv(
  system.file("extdata", "names.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:4]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
So you had some calibrators (1-4) and samples A, B, C, D, E, F, G, H, each in duplicate.
To easily set the names for your samples just copy the names into your set_1.csv:
data <- utils::read.csv(
  system.file("extdata", "values_names.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:8]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
Tell set_read() your data contains the names and which column should hold those names by setting additional_vars = c("name").
set_read(additional_vars = c("name"))
This will get you:
data <- bioset::set_read(
  file_name = "values_names.csv",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name")
)
knitr::kable(data)
Suppose samples A, B, C, D were taken at day 1 and E, F, G, H were taken from the same rats / individuals / patients on day 2.
It would be more elegant to encode that into the data:
data <- utils::read.csv(
  system.file("extdata", "values_names_properties.csv", package = "bioset"),
  header = FALSE)
rownames(data) <- LETTERS[1:8]
knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))
Now, tell set_read() your data contains the names and the day by setting additional_vars = c("name", "day"). This will get you:
set_read(additional_vars = c("name", "day"))
data <- bioset::set_read(
  file_name = "values_names_properties.csv",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name", "day")
)
knitr::kable(data)
Probably, your measuring device only gave you raw values (extinction rates / relative light units / ...). You know the concentrations of CAL1, CAL2, CAL3 and CAL4. Conveniently, the concentrations follow a linear relationship. To get the concentrations for the rest of the samples you need to interpolate between those calibrators.
set_calc_concentrations() does exactly this for you:
set_calc_concentrations(
  data,
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4)  # ng / ml
)
data <- bioset::set_calc_concentrations(
  data,
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4)  # ng / ml
)
knitr::kable(data)
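For readers who want to see the idea behind a linear calibration, the following base R sketch fits a straight line through made-up calibrator readings and inverts it for two sample readings. The numbers are hypothetical, and the code only illustrates the general technique; it is not a description of what set_calc_concentrations() does internally.

```r
# Hypothetical calibrators: known concentrations (ng / ml) and raw readings.
cal <- data.frame(
  conc  = c(1, 2, 3, 4),
  value = c(0.10, 0.20, 0.30, 0.40)
)

# Fit the raw value as a linear function of the concentration.
model <- stats::lm(value ~ conc, data = cal)
intercept <- unname(stats::coef(model)[1])
slope <- unname(stats::coef(model)[2])

# Invert the fitted line to estimate concentrations from raw sample readings.
sample_values <- c(0.15, 0.35)
sample_conc <- (sample_values - intercept) / slope
sample_conc
```

With these made-up readings the two samples come out close to 1.5 and 3.5 ng / ml, i.e. halfway between the neighbouring calibrators.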
Your calibrators are not so linear? Perhaps they are after a ln-ln transformation? Then you can use model_func = fit_lnln and interpolate_func = interpolate_lnln. Basically, you can use any function as model_func that returns a model which is understood by your interpolate_func.
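As a rough illustration of what a ln-ln fit and the matching interpolation amount to, here is a base R sketch. The names my_fit_lnln() and my_interpolate_lnln() and their argument order are hypothetical; check the package documentation for the signatures that model_func and interpolate_func are actually expected to have before plugging in your own functions.

```r
# Hypothetical model function: fit ln(value) as a linear function of ln(conc).
my_fit_lnln <- function(conc, value) {
  stats::lm(log(value) ~ log(conc))
}

# Hypothetical interpolation function: invert the fitted line on the log
# scale, then undo the logarithm to get back to concentrations.
my_interpolate_lnln <- function(value, model) {
  b <- unname(stats::coef(model))
  exp((log(value) - b[1]) / b[2])
}

# Made-up calibrators that follow value = 2 * conc^1.5.
cal_conc <- c(1, 2, 4, 8)
cal_value <- 2 * cal_conc^1.5

model <- my_fit_lnln(cal_conc, cal_value)

# A reading generated from conc = 3 should map back to roughly 3.
my_interpolate_lnln(2 * 3^1.5, model)
```

The point of the pairing is that the interpolation function must know how to read whatever the model function returns; here both agree on a plain lm object fitted on the log scale.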
So the samples were measured in duplicate. For your further research you might want to use the mean and perhaps exclude samples with too much spread in their values.
set_calc_variability() to the rescue.
data <- set_calc_variability(
  data = data,
  ids = sample_id,
  value,
  conc
)
This will give you the mean and coefficient of variation (as well as n of the samples and the standard deviation) for the columns value and conc. It will use sample_id to determine which rows belong to the same sample.
data <- bioset::set_calc_variability(
  data = data,
  ids = sample_id,
  value,
  conc
)
knitr::kable(data)
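The statistics named above can be reproduced by hand in base R, which may help when deciding on a cut-off for "too much spread". The data below are made up, the helper summarise_one() is hypothetical, and the coefficient of variation is computed as sd / mean (a fraction); whether the package reports it as a fraction or a percentage is not stated here, so treat that as an assumption.

```r
# Made-up duplicate measurements for two samples.
measurements <- data.frame(
  sample_id = c("A", "A", "B", "B"),
  conc      = c(1.0, 1.2, 3.0, 3.0)
)

# Hypothetical helper: n, mean, standard deviation and coefficient of
# variation (sd / mean) for one sample's values.
summarise_one <- function(x) {
  c(n = length(x), mean = mean(x), sd = stats::sd(x),
    cv = stats::sd(x) / mean(x))
}

# Group the values by sample_id and stack the per-sample summaries.
stats_by_sample <- do.call(
  rbind,
  lapply(split(measurements$conc, measurements$sample_id), summarise_one)
)
stats_by_sample
```

Sample B has identical duplicates, so its standard deviation and coefficient of variation are both zero, while sample A shows some spread around its mean of 1.1.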
If you need to read and transform multiple sets, sets_read() can do that for you.
It takes basically the same arguments as set_read(), set_calc_concentrations() and set_calc_variability() combined and combines their functionality. The principal difference is that sets_read() takes sets, the number of sets to process.
It returns a list and may (write_data = TRUE) create two files in your current directory, data_all.csv and data_samples.csv, with the processed data.
sets_read()'s list holds the following items:
- $all: here you will find all the data, including calibrators, duplicates, ... (saved in data_all.csv if write_data = TRUE)
- $samples: only one row per distinct sample here - no calibrators, no duplicates -> most often you will work with this data (saved in data_samples.csv if write_data = TRUE)
- $set1: a list
    - $plot: a plot showing you the function used to calculate the concentrations for this set. The points represent the calibrators.
    - $model: the model as returned by model_func
- ($set2 - $setN): the same information for every set you have
Take a look at the data
# now you may run it :)
result_list <- sets_read(
  sets = 1,
  sep = ",",
  additional_vars = c("name", "day"),
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4)  # ng / ml
)
result_list <- bioset::sets_read(
  sets = 1,
  sep = ",",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name", "day"),
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4),  # ng / ml
  write_data = FALSE
)
result_list$all
knitr::kable(result_list$all)
result_list$samples
knitr::kable(result_list$samples)
result_list$set1$plot