bioset
is intended to help you working with sets of raw data.
Working in a lab it is not uncommon to have a data set of raw values (because your measuring device spat it out) and you now need to somehow transform and organise the data so that you can work with it.
A stable version of bioset
is available on CRAN: https://cran.r-project.org/package=bioset
So all you need to do is:
install.packages("bioset")
You can find the latest additions and changes on GitHub. To spare CRAN administrators' time it is requested of all package authors not to submit changes too frequently.
Consequently, I will make new features available on GitHub first. Packages I have not yet submitted to CRAN will be labelled vX.Y.Z-pre.N
and appear under: https://github.com/randomchars42/bioset/releases.
To install those packages you can use githubinstall
# install.packages("githubinstall")
gh_install_packages("bioset", ref = "vX.Y.Z-pre.N")
You can install the very latest changes in bioset
-master from github with:
# install.packages("devtools")
devtools::install_github("randomchars42/bioset")
bioset
lets you:
Suppose you have an ods
/ xls(x)
file with raw values obtained from a measurement like this:
| | 1| 2| 3| 4| 5| 6| |-----|----:|----:|----:|----:|----:|----:| | A | 102| 107| 156| 145| 360| 342| | B | 198| 203| 101| 121| 231| 226| | C | 296| 291| 276| 283| 430| 413| | D | 430| 386| 325| 298| 110| 119|
Save them as set_1.csv
- thats like an ods
/ xls(x)
file but its basically a text file with the values separated by commas. In the current versions of LibreOffice / OpenOffice / Microsoft office theres an option "Save as" > "csv".
Load the package.
library("bioset")
Then you can use set_read()
to get all values with their position as name in a nice tibble:
set_read()
| set| position | sample_id | name | value| |----:|:---------|:-----------|:-----|------:| | 1| A1 | A1 | A1 | 102| | 1| B1 | B1 | B1 | 198| | 1| C1 | C1 | C1 | 296| | 1| D1 | D1 | D1 | 430| | 1| A2 | A2 | A2 | 107| | 1| B2 | B2 | B2 | 203| | 1| C2 | C2 | C2 | 291| | 1| D2 | D2 | D2 | 386| | 1| A3 | A3 | A3 | 156| | 1| B3 | B3 | B3 | 101| | 1| C3 | C3 | C3 | 276| | 1| D3 | D3 | D3 | 325| | 1| A4 | A4 | A4 | 145| | 1| B4 | B4 | B4 | 121| | 1| C4 | C4 | C4 | 283| | 1| D4 | D4 | D4 | 298| | 1| A5 | A5 | A5 | 360| | 1| B5 | B5 | B5 | 231| | 1| C5 | C5 | C5 | 430| | 1| D5 | D5 | D5 | 110| | 1| A6 | A6 | A6 | 342| | 1| B6 | B6 | B6 | 226| | 1| C6 | C6 | C6 | 413| | 1| D6 | D6 | D6 | 119|
set_read()
automagically reads set_1.csv
in your current directory. If you have more than one set use set_read(num = 2)
to read set 2, etc.
If your files are called plate_1.csv
, plate_2.csv
, ..., (run_1.csv
, run_1.csv
) you can set file_name = "plate_#NUM#.csv"
(run_#NUM#.csv
, ...).
If your files are stored in ./files/
tell set_read()
where to look via path = "./files/"
.
Before feeding your samples into your measuring device you most likely drafted some sort of plan which position corresponds to which sample (didn't you?).
| | 1 | 2 | 3 | 4 | 5 | 6 | |-----|:-----|:-----|:----|:----|:----|:----| | A | CAL1 | CAL1 | A | A | B | B | | B | CAL2 | CAL2 | C | C | D | D | | C | CAL3 | CAL3 | E | E | F | F | | D | CAL4 | CAL4 | G | G | H | H |
So you had some calibrators (1-4) and samples A, B, C, D, E, F, G, H, each in duplicates.
To easily set the names for your samples just copy the names into your set_1.csv
:
| | 1 | 2 | 3 | 4 | 5 | 6 | |-----|:-----|:-----|:----|:----|:----|:----| | A | 102 | 107 | 156 | 145 | 360 | 342 | | B | 198 | 203 | 101 | 121 | 231 | 226 | | C | 296 | 291 | 276 | 283 | 430 | 413 | | D | 430 | 386 | 325 | 298 | 110 | 119 | | E | CAL1 | CAL1 | A | A | B | B | | F | CAL2 | CAL2 | C | C | D | D | | G | CAL3 | CAL3 | E | E | F | F | | H | CAL4 | CAL4 | G | G | H | H |
Tell set_read()
your data contains the names and which column should hold those names by setting additional_vars = c("name")
.
set_read(
additional_vars = c("name")
)
This will get you:
| set| position | sample_id | name | value| |----:|:---------|:-----------|:-----|------:| | 1| A1 | CAL1 | CAL1 | 102| | 1| B1 | CAL2 | CAL2 | 198| | 1| C1 | CAL3 | CAL3 | 296| | 1| D1 | CAL4 | CAL4 | 430| | 1| A2 | CAL1 | CAL1 | 107| | 1| B2 | CAL2 | CAL2 | 203| | 1| C2 | CAL3 | CAL3 | 291| | 1| D2 | CAL4 | CAL4 | 386| | 1| A3 | A | A | 156| | 1| B3 | C | C | 101| | 1| C3 | E | E | 276| | 1| D3 | G | G | 325| | 1| A4 | A | A | 145| | 1| B4 | C | C | 121| | 1| C4 | E | E | 283| | 1| D4 | G | G | 298| | 1| A5 | B | B | 360| | 1| B5 | D | D | 231| | 1| C5 | F | F | 430| | 1| D5 | H | H | 110| | 1| A6 | B | B | 342| | 1| B6 | D | D | 226| | 1| C6 | F | F | 413| | 1| D6 | H | H | 119|
Suppose samples A, B, C, D were taken at day 1 and E, F, G, H were taken from the same rats / individuals / patients on day 2.
It would be more elegant to encode that into the data:
| | 1 | 2 | 3 | 4 | 5 | 6 | |-----|:-----|:-----|:-----|:-----|:-----|:-----| | A | 102 | 107 | 156 | 145 | 360 | 342 | | B | 198 | 203 | 101 | 121 | 231 | 226 | | C | 296 | 291 | 276 | 283 | 430 | 413 | | D | 430 | 386 | 325 | 298 | 110 | 119 | | E | CAL1 | CAL1 | A_1 | A_1 | B_1 | B_1 | | F | CAL2 | CAL2 | C_1 | C_1 | D_1 | D_1 | | G | CAL3 | CAL3 | A_2 | A_2 | B_2 | B_2 | | H | CAL4 | CAL4 | C_2 | C_2 | D_2 | D_2 |
Now, tell set_read()
your data contains the names and day by setting additional_vars = c("name", "day")
. This will get you:
set_read(
additional_vars = c("name", "day")
)
| set| position | sample_id | name | day | value| |----:|:---------|:-----------|:-----|:----|------:| | 1| A1 | CAL1 | CAL1 | NA | 102| | 1| B1 | CAL2 | CAL2 | NA | 198| | 1| C1 | CAL3 | CAL3 | NA | 296| | 1| D1 | CAL4 | CAL4 | NA | 430| | 1| A2 | CAL1 | CAL1 | NA | 107| | 1| B2 | CAL2 | CAL2 | NA | 203| | 1| C2 | CAL3 | CAL3 | NA | 291| | 1| D2 | CAL4 | CAL4 | NA | 386| | 1| A3 | A_1 | A | 1 | 156| | 1| B3 | C_1 | C | 1 | 101| | 1| C3 | A_2 | A | 2 | 276| | 1| D3 | C_2 | C | 2 | 325| | 1| A4 | A_1 | A | 1 | 145| | 1| B4 | C_1 | C | 1 | 121| | 1| C4 | A_2 | A | 2 | 283| | 1| D4 | C_2 | C | 2 | 298| | 1| A5 | B_1 | B | 1 | 360| | 1| B5 | D_1 | D | 1 | 231| | 1| C5 | B_2 | B | 2 | 430| | 1| D5 | D_2 | D | 2 | 110| | 1| A6 | B_1 | B | 1 | 342| | 1| B6 | D_1 | D | 1 | 226| | 1| C6 | B_2 | B | 2 | 413| | 1| D6 | D_2 | D | 2 | 119|
Propably, your measuring device only gave you raw values (extinction rates / relative light units / ...). You know the concentrations of CAL1, CAL2, CAL3 and CAL4. Conveniently, the concentrations follow a linear relationship. To get the concentrations for the rest of the samples you need to interpolate between those calibrators.
set_calc_concentrations()
does exactly this for you:
set_calc_concentrations(
data,
cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
cal_values = c(1, 2, 3, 4) # ng / ml
)
| set| position | sample_id | name | day | value| real| conc| recovery| |----:|:---------|:-----------|:-----|:----|------:|-----:|----------:|----------:| | 1| A1 | CAL1 | CAL1 | NA | 102| 1| 1.0089686| 1.0089686| | 1| B1 | CAL2 | CAL2 | NA | 198| 2| 1.9656203| 0.9828102| | 1| C1 | CAL3 | CAL3 | NA | 296| 3| 2.9422023| 0.9807341| | 1| D1 | CAL4 | CAL4 | NA | 430| 4| 4.2775286| 1.0693822| | 1| A2 | CAL1 | CAL1 | NA | 107| 1| 1.0587942| 1.0587942| | 1| B2 | CAL2 | CAL2 | NA | 203| 2| 2.0154459| 1.0077230| | 1| C2 | CAL3 | CAL3 | NA | 291| 3| 2.8923767| 0.9641256| | 1| D2 | CAL4 | CAL4 | NA | 386| 4| 3.8390633| 0.9597658| | 1| A3 | A_1 | A | 1 | 156| NA| 1.5470852| NA| | 1| B3 | C_1 | C | 1 | 101| NA| 0.9990035| NA| | 1| C3 | A_2 | A | 2 | 276| NA| 2.7428999| NA| | 1| D3 | C_2 | C | 2 | 325| NA| 3.2311908| NA| | 1| A4 | A_1 | A | 1 | 145| NA| 1.4374689| NA| | 1| B4 | C_1 | C | 1 | 121| NA| 1.1983059| NA| | 1| C4 | A_2 | A | 2 | 283| NA| 2.8126557| NA| | 1| D4 | C_2 | C | 2 | 298| NA| 2.9621325| NA| | 1| A5 | B_1 | B | 1 | 360| NA| 3.5799701| NA| | 1| B5 | D_1 | D | 1 | 231| NA| 2.2944694| NA| | 1| C5 | B_2 | B | 2 | 430| NA| 4.2775286| NA| | 1| D5 | D_2 | D | 2 | 110| NA| 1.0886896| NA| | 1| A6 | B_1 | B | 1 | 342| NA| 3.4005979| NA| | 1| B6 | D_1 | D | 1 | 226| NA| 2.2446437| NA| | 1| C6 | B_2 | B | 2 | 413| NA| 4.1081216| NA| | 1| D6 | D_2 | D | 2 | 119| NA| 1.1783757| NA|
Your calibrators are not so linear? Perhaps after a ln-ln transformation? You can use: model_func = fit_lnln
and interpolate_func = interpolate_lnln
. Basicallly, you can use any function as model_function
that returns a model which is understood by your interpolate-func
.
So samples were measured in duplicates. For our further research you might want to use the mean and perhaps exclude samples with too much spread in their values.
set_calc_variability()
to the rescue.
data <- set_calc_variability(
data = data,
ids = sample_id,
value,
conc
)
This will give you the mean and coefficient of variation (as well as n of the samples and the standard deviation) for the columns value
and conc
. It will use sample_id
to determine which rows belong to the same sample.
| set| position | sample_id | name | day | value| real| conc| recovery| value_n| value_mean| value_sd| value_cv| conc_n| conc_mean| conc_sd| conc_cv| |----:|:---------|:-----------|:-----|:----|------:|-----:|----------:|----------:|---------:|------------:|----------:|----------:|--------:|-----------:|----------:|----------:| | 1| A1 | CAL1 | CAL1 | NA | 102| 1| 1.0089686| 1.0089686| 2| 104.5| 3.535534| 0.0338329| 2| 1.033881| 0.0352320| 0.0340774| | 1| B1 | CAL2 | CAL2 | NA | 198| 2| 1.9656203| 0.9828102| 2| 200.5| 3.535534| 0.0176336| 2| 1.990533| 0.0352320| 0.0176998| | 1| C1 | CAL3 | CAL3 | NA | 296| 3| 2.9422023| 0.9807341| 2| 293.5| 3.535534| 0.0120461| 2| 2.917289| 0.0352320| 0.0120770| | 1| D1 | CAL4 | CAL4 | NA | 430| 4| 4.2775286| 1.0693822| 2| 408.0| 31.112698| 0.0762566| 2| 4.058296| 0.3100418| 0.0763970| | 1| A2 | CAL1 | CAL1 | NA | 107| 1| 1.0587942| 1.0587942| 2| 104.5| 3.535534| 0.0338329| 2| 1.033881| 0.0352320| 0.0340774| | 1| B2 | CAL2 | CAL2 | NA | 203| 2| 2.0154459| 1.0077230| 2| 200.5| 3.535534| 0.0176336| 2| 1.990533| 0.0352320| 0.0176998| | 1| C2 | CAL3 | CAL3 | NA | 291| 3| 2.8923767| 0.9641256| 2| 293.5| 3.535534| 0.0120461| 2| 2.917289| 0.0352320| 0.0120770| | 1| D2 | CAL4 | CAL4 | NA | 386| 4| 3.8390633| 0.9597658| 2| 408.0| 31.112698| 0.0762566| 2| 4.058296| 0.3100418| 0.0763970| | 1| A3 | A_1 | A | 1 | 156| NA| 1.5470852| NA| 2| 150.5| 7.778175| 0.0516822| 2| 1.492277| 0.0775105| 0.0519411| | 1| B3 | C_1 | C | 1 | 101| NA| 0.9990035| NA| 2| 111.0| 14.142136| 0.1274066| 2| 1.098655| 0.1409281| 0.1282733| | 1| C3 | A_2 | A | 2 | 276| NA| 2.7428999| NA| 2| 279.5| 4.949747| 0.0177093| 2| 2.777778| 0.0493248| 0.0177569| | 1| D3 | C_2 | C | 2 | 325| NA| 3.2311908| NA| 2| 311.5| 19.091883| 0.0612902| 2| 3.096662| 0.1902529| 0.0614381| | 1| A4 | A_1 | A | 1 | 145| NA| 1.4374689| NA| 2| 150.5| 7.778175| 0.0516822| 2| 1.492277| 0.0775105| 0.0519411| | 1| B4 | C_1 | C | 1 | 121| NA| 1.1983059| NA| 2| 111.0| 14.142136| 0.1274066| 2| 1.098655| 0.1409281| 0.1282733| | 1| C4 | A_2 | A | 2 | 283| NA| 2.8126557| NA| 2| 279.5| 4.949747| 0.0177093| 2| 2.777778| 0.0493248| 0.0177569| | 1| D4 | C_2 | C | 2 | 298| NA| 2.9621325| NA| 2| 311.5| 19.091883| 0.0612902| 2| 3.096662| 0.1902529| 0.0614381| | 1| A5 | B_1 | B | 1 | 360| NA| 3.5799701| NA| 2| 351.0| 12.727922| 0.0362619| 2| 3.490284| 0.1268353| 0.0363395| | 1| B5 | D_1 | D | 1 | 231| NA| 2.2944694| NA| 2| 228.5| 3.535534| 0.0154728| 2| 2.269557| 0.0352320| 0.0155237| | 1| C5 | B_2 | B | 2 | 430| NA| 4.2775286| NA| 2| 421.5| 12.020815| 0.0285191| 2| 4.192825| 0.1197889| 0.0285700| | 1| D5 | D_2 | D | 2 | 110| NA| 1.0886896| NA| 2| 114.5| 6.363961| 0.0555804| 2| 1.133533| 0.0634176| 0.0559469| | 1| A6 | B_1 | B | 1 | 342| NA| 3.4005979| NA| 2| 351.0| 12.727922| 0.0362619| 2| 3.490284| 0.1268353| 0.0363395| | 1| B6 | D_1 | D | 1 | 226| NA| 2.2446437| NA| 2| 228.5| 3.535534| 0.0154728| 2| 2.269557| 0.0352320| 0.0155237| | 1| C6 | B_2 | B | 2 | 413| NA| 4.1081216| NA| 2| 421.5| 12.020815| 0.0285191| 2| 4.192825| 0.1197889| 0.0285700| | 1| D6 | D_2 | D | 2 | 119| NA| 1.1783757| NA| 2| 114.5| 6.363961| 0.0555804| 2| 1.133533| 0.0634176| 0.0559469|
If you need to read and transform multiple sets sets_read
can do that for you.
It takes basically the same arguments as set_read
, set_calc_concentrations
and set_calc_variability
combined and combines their functionality. The principal difference is, that sets_read
takes sets
- the number of sets to process.
It returns a list and may (write_data = TRUE
) create two files in your current directory: data_all.csv
and data_samples.csv
with the processed data.
sets_read()
's list holds the following items:
$all
: here you will find all the data , including calibrators, duplicates, ... (saved in data_all.csv
if write_data = TRUE
)$samples
: only one row per distinct sample here - no calibrators, no duplicates -> most often you will work with this data (saved in data_samples.csv
if write_data = TRUE
)$set1
: a list$plot
: a plot showing you the function used to calculate the concentrations for this set. The points represent the calibrators.$model
: the model as returned by model_func
$set2
- $setN
): the same information for every set you haveTake a look at the data
# now you may run it :)
result_list <- sets_read(
sets = 1,
sep = ",",
additional_vars = c("name", "day"),
cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
cal_values = c(1, 2, 3, 4) # ng / ml
)
result_list$all
| set| position | sample_id | name | day | value| real| recovery| n| raw| raw_mean| raw_sd| raw_cv| concentration| concentration_sd| concentration_cv| |----:|:---------|:-----------|:-----|:----|------:|-----:|----------:|----:|----:|----------:|----------:|----------:|--------------:|------------------:|------------------:| | 1| A1 | CAL1 | CAL1 | NA | 102| 1| 1.0089686| 2| 102| 104.5| 3.535534| 0.0338329| 1.033881| 0.0352320| 0.0340774| | 1| B1 | CAL2 | CAL2 | NA | 198| 2| 0.9828102| 2| 198| 200.5| 3.535534| 0.0176336| 1.990533| 0.0352320| 0.0176998| | 1| C1 | CAL3 | CAL3 | NA | 296| 3| 0.9807341| 2| 296| 293.5| 3.535534| 0.0120461| 2.917289| 0.0352320| 0.0120770| | 1| D1 | CAL4 | CAL4 | NA | 430| 4| 1.0693822| 2| 430| 408.0| 31.112698| 0.0762566| 4.058296| 0.3100418| 0.0763970| | 1| A2 | CAL1 | CAL1 | NA | 107| 1| 1.0587942| 2| 107| 104.5| 3.535534| 0.0338329| 1.033881| 0.0352320| 0.0340774| | 1| B2 | CAL2 | CAL2 | NA | 203| 2| 1.0077230| 2| 203| 200.5| 3.535534| 0.0176336| 1.990533| 0.0352320| 0.0176998| | 1| C2 | CAL3 | CAL3 | NA | 291| 3| 0.9641256| 2| 291| 293.5| 3.535534| 0.0120461| 2.917289| 0.0352320| 0.0120770| | 1| D2 | CAL4 | CAL4 | NA | 386| 4| 0.9597658| 2| 386| 408.0| 31.112698| 0.0762566| 4.058296| 0.3100418| 0.0763970| | 1| A3 | A_1 | A | 1 | 156| NA| NA| 2| 156| 150.5| 7.778175| 0.0516822| 1.492277| 0.0775105| 0.0519411| | 1| B3 | C_1 | C | 1 | 101| NA| NA| 2| 101| 111.0| 14.142136| 0.1274066| 1.098655| 0.1409281| 0.1282733| | 1| C3 | A_2 | A | 2 | 276| NA| NA| 2| 276| 279.5| 4.949747| 0.0177093| 2.777778| 0.0493248| 0.0177569| | 1| D3 | C_2 | C | 2 | 325| NA| NA| 2| 325| 311.5| 19.091883| 0.0612902| 3.096662| 0.1902529| 0.0614381| | 1| A4 | A_1 | A | 1 | 145| NA| NA| 2| 145| 150.5| 7.778175| 0.0516822| 1.492277| 0.0775105| 0.0519411| | 1| B4 | C_1 | C | 1 | 121| NA| NA| 2| 121| 111.0| 14.142136| 0.1274066| 1.098655| 0.1409281| 0.1282733| | 1| C4 | A_2 | A | 2 | 283| NA| NA| 2| 283| 279.5| 4.949747| 0.0177093| 2.777778| 0.0493248| 0.0177569| | 1| D4 | C_2 | C | 2 | 298| NA| NA| 2| 298| 311.5| 19.091883| 0.0612902| 3.096662| 0.1902529| 0.0614381| | 1| A5 | B_1 | B | 1 | 360| NA| NA| 2| 360| 351.0| 12.727922| 0.0362619| 3.490284| 0.1268353| 0.0363395| | 1| B5 | D_1 | D | 1 | 231| NA| NA| 2| 231| 228.5| 3.535534| 0.0154728| 2.269557| 0.0352320| 0.0155237| | 1| C5 | B_2 | B | 2 | 430| NA| NA| 2| 430| 421.5| 12.020815| 0.0285191| 4.192825| 0.1197889| 0.0285700| | 1| D5 | D_2 | D | 2 | 110| NA| NA| 2| 110| 114.5| 6.363961| 0.0555804| 1.133533| 0.0634176| 0.0559469| | 1| A6 | B_1 | B | 1 | 342| NA| NA| 2| 342| 351.0| 12.727922| 0.0362619| 3.490284| 0.1268353| 0.0363395| | 1| B6 | D_1 | D | 1 | 226| NA| NA| 2| 226| 228.5| 3.535534| 0.0154728| 2.269557| 0.0352320| 0.0155237| | 1| C6 | B_2 | B | 2 | 413| NA| NA| 2| 413| 421.5| 12.020815| 0.0285191| 4.192825| 0.1197889| 0.0285700| | 1| D6 | D_2 | D | 2 | 119| NA| NA| 2| 119| 114.5| 6.363961| 0.0555804| 1.133533| 0.0634176| 0.0559469|
result_list$samples
| position | sample_id | name | day | plate| n| raw| raw_sd| raw_cv| concentration| concentration_sd| concentration_cv| |:---------|:-----------|:-----|:----|------:|----:|------:|----------:|----------:|--------------:|------------------:|------------------:| | A3 | A_1 | A | 1 | 1| 2| 150.5| 7.778175| 0.0516822| 1.492277| 0.0775105| 0.0519411| | B3 | C_1 | C | 1 | 1| 2| 111.0| 14.142136| 0.1274066| 1.098655| 0.1409281| 0.1282733| | C3 | A_2 | A | 2 | 1| 2| 279.5| 4.949747| 0.0177093| 2.777778| 0.0493248| 0.0177569| | D3 | C_2 | C | 2 | 1| 2| 311.5| 19.091883| 0.0612902| 3.096662| 0.1902529| 0.0614381| | A5 | B_1 | B | 1 | 1| 2| 351.0| 12.727922| 0.0362619| 3.490284| 0.1268353| 0.0363395| | B5 | D_1 | D | 1 | 1| 2| 228.5| 3.535534| 0.0154728| 2.269557| 0.0352320| 0.0155237| | C5 | B_2 | B | 2 | 1| 2| 421.5| 12.020815| 0.0285191| 4.192825| 0.1197889| 0.0285700| | D5 | D_2 | D | 2 | 1| 2| 114.5| 6.363961| 0.0555804| 1.133533| 0.0634176| 0.0559469|
result_list$set1$plot
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.