
bioset is intended to help you work with sets of raw data.

When working in a lab, it is not uncommon to end up with a data set of raw values (because your measuring device spat it out) that you now need to transform and organise before you can work with it.


A stable version of bioset is available on CRAN:

So all you need to do is:

install.packages("bioset")

You can find the latest additions and changes on GitHub. To spare the CRAN administrators' time, all package authors are asked not to submit changes too frequently.

Consequently, I will make new features available on GitHub first. Packages I have not yet submitted to CRAN will be labelled vX.Y.Z-pre.N and appear under:

To install those packages you can use githubinstall:

# install.packages("githubinstall")
githubinstall::gh_install_packages("bioset", ref = "vX.Y.Z-pre.N")

You can install the very latest changes from the master branch of bioset on GitHub with:

# install.packages("devtools")
devtools::install_github("randomchars42/bioset")

Why? What bioset can do for you

bioset lets you:

Data import

Suppose you have an ods / xls(x) file with raw values obtained from a measurement like this:

# read the example values and show them as a table
data <-
  utils::read.csv(
    system.file("extdata", "values.csv", package = "bioset"),
    header = FALSE)
rownames(data) <- LETTERS[1:4]

knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))

Save them as set_1.csv. That is like an ods / xls(x) file, but it is basically a text file with the values separated by commas. Current versions of LibreOffice / OpenOffice / Microsoft Office offer the option "Save as" > "csv".
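To make the CSV format concrete, here is a minimal base R sketch (it does not use bioset) that writes a small grid of values to a temporary file and reads it back. The values and the 2 x 3 layout are made up for illustration.

```r
# Made-up raw values laid out like a 2 x 3 corner of a plate (illustration only).
csv_lines <- c(
  "0.12,0.25,0.37",
  "0.11,0.26,0.39"
)

# Write the lines to a temporary file; a CSV file is nothing more than this.
csv_file <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_file)

# Read it back without treating the first line as a header.
raw <- utils::read.csv(csv_file, header = FALSE)
rownames(raw) <- LETTERS[seq_len(nrow(raw))]
colnames(raw) <- as.character(seq_len(ncol(raw)))
print(raw)
```

Opening the temporary file in a text editor shows exactly the two comma-separated lines from above.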

Load the package.

library("bioset")

Then you can use set_read() to get all values with their position as name in a nice tibble:

data <- bioset::set_read(
  file_name = "values.csv",
  path = system.file("extdata", package = "bioset")
)

set_read() automagically reads set_1.csv in your current directory. If you have more than one set use set_read(num = 2) to read set 2, etc.

If your files are called plate_1.csv, plate_2.csv, ... (or run_1.csv, run_2.csv, ...), you can set file_name = "plate_#NUM#.csv" (or "run_#NUM#.csv", ...).

If your files are stored in ./files/, tell set_read() where to look via path = "./files/".
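The idea behind the #NUM# placeholder can be sketched in a few lines of base R. The helper resolve_set_file() below is hypothetical and only illustrates how a set number, a file name pattern and a path might combine into one file location; it is not bioset's own code.

```r
# Hypothetical helper (not part of bioset): build the path of set number `num`
# from a file name pattern that contains the placeholder "#NUM#".
resolve_set_file <- function(num, file_name = "set_#NUM#.csv", path = ".") {
  file.path(path, gsub("#NUM#", as.character(num), file_name, fixed = TRUE))
}

# Default pattern in the current directory.
resolve_set_file(1)

# Custom pattern in a sub-directory.
resolve_set_file(2, file_name = "plate_#NUM#.csv", path = "./files")
```

The first call yields "./set_1.csv" and the second "./files/plate_2.csv".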

Naming the values

Before feeding your samples into your measuring device you most likely drafted some sort of plan which position corresponds to which sample (didn't you?).

# read the example names and show them as a table
data <-
  utils::read.csv(
    system.file("extdata", "names.csv", package = "bioset"),
    header = FALSE)
rownames(data) <- LETTERS[1:4]

knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))

So you had some calibrators (1-4) and samples A, B, C, D, E, F, G, H, each in duplicate.

To easily set the names for your samples just copy the names into your set_1.csv:

# read the example values with names and show them as a table
data <-
  utils::read.csv(
    system.file("extdata", "values_names.csv", package = "bioset"),
    header = FALSE)
rownames(data) <- LETTERS[1:8]

knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))

Tell set_read() that your data contains the names, and which column should hold them, by setting additional_vars = c("name").

set_read(
  additional_vars = c("name")
)

This will get you:

data <- bioset::set_read(
  file_name = "values_names.csv",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name")
)
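To illustrate what pairing values with names amounts to, here is a base R sketch. It assumes the file stacks the grid of names directly below the grid of values, as the eight-row example above suggests. The numbers are made up, and the position labels (A1, B1, ...) are only one plausible scheme, not necessarily the one set_read() uses.

```r
# Made-up 2 x 3 value grid stacked on top of a 2 x 3 name grid,
# mimicking the assumed layout of a file that holds values and names.
grid <- rbind(
  c("0.10", "0.20", "0.31"),
  c("0.11", "0.21", "0.29"),
  c("CAL1", "CAL2", "A"),
  c("CAL1", "CAL2", "A")
)

n_rows <- nrow(grid) / 2 # rows per block
values <- grid[seq_len(n_rows), , drop = FALSE]
names_ <- grid[n_rows + seq_len(n_rows), , drop = FALSE]

# One row per position: label, name and numeric value (column by column).
long <- data.frame(
  position = paste0(LETTERS[as.vector(row(values))], as.vector(col(values))),
  name = as.vector(names_),
  value = as.numeric(as.vector(values)),
  stringsAsFactors = FALSE
)
print(long)
```

Each row of long now carries both the raw value and the name of the sample measured at that position.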

Encoding additional properties

Suppose samples A, B, C, D were taken at day 1 and E, F, G, H were taken from the same rats / individuals / patients on day 2.

It would be more elegant to encode that into the data:

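# read the example values with names and properties and show them as a table
data <-
  utils::read.csv(
    system.file("extdata", "values_names_properties.csv", package = "bioset"),
    header = FALSE)
rownames(data) <- LETTERS[1:8]

knitr::kable(
  data,
  row.names = TRUE,
  col.names = as.character(1:6))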

Now, tell set_read() your data contains the names and day by setting additional_vars = c("name", "day"). This will get you:

data <- bioset::set_read(
  file_name = "values_names_properties.csv",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name", "day")
)
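Splitting an encoded label into separate properties can be sketched in base R. The sketch assumes labels of the form name_day (for example A_1); the underscore separator and the labels themselves are assumptions made for illustration, and this is not bioset's own parsing code.

```r
# Made-up labels, assuming the form "<name>_<day>" (the separator is an assumption).
labels <- c("A_1", "B_1", "A_2", "B_2")

# Split each label at the underscore and bind the pieces into columns.
parts <- do.call(rbind, strsplit(labels, "_", fixed = TRUE))

props <- data.frame(
  name = parts[, 1],
  day = as.integer(parts[, 2]),
  stringsAsFactors = FALSE
)
print(props)
```

With the properties in their own columns, filtering or grouping by day becomes a plain column operation.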


Calculating concentrations

Probably, your measuring device only gave you raw values (extinction rates / relative light units / ...). You know the concentrations of CAL1, CAL2, CAL3 and CAL4. Conveniently, the raw values follow a linear relationship with the concentrations. To get the concentrations for the rest of the samples you need to interpolate between those calibrators.

set_calc_concentrations() does exactly this for you:

data <- bioset::set_calc_concentrations(
  data = data,
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4) # ng / ml
)
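What a linear calibration amounts to can be sketched in base R. The sketch fits a least-squares line through the calibrators with lm() and applies it to the samples with predict(). This is one plausible reading of "linear" and not necessarily what set_calc_concentrations() does internally. The raw readings are made up.

```r
# Made-up raw readings for the four calibrators and two unknown samples.
cal <- data.frame(
  value = c(0.10, 0.20, 0.30, 0.40), # raw signal
  real  = c(1, 2, 3, 4)              # known concentration, ng / ml
)
samples <- data.frame(value = c(0.15, 0.35))

# Fit the concentration as a linear function of the raw signal ...
model <- stats::lm(real ~ value, data = cal)

# ... and use the fit to estimate the concentrations of the samples.
samples$conc <- unname(stats::predict(model, newdata = samples))
print(samples)
```

Because the made-up calibrators lie exactly on a line, the two samples come out at about 1.5 and 3.5 ng / ml.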


Your calibrators are not so linear? Perhaps they are after a ln-ln transformation? Then you can use model_func = fit_lnln and interpolate_func = interpolate_lnln. Basically, you can use any function as model_func that returns a model which is understood by your interpolate_func.
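The ln-ln idea can be sketched in base R as well: fit a line to the logarithms of signal and concentration, then transform the prediction back with exp(). This only illustrates the transformation, under the assumption of made-up calibrators that follow a power law; it is not the code behind fit_lnln or interpolate_lnln.

```r
# Made-up calibrators that follow a power law (value = real^2).
cal <- data.frame(
  value = c(1, 4, 9, 16), # raw signal
  real  = c(1, 2, 3, 4)   # known concentration, ng / ml
)

# Fit a straight line on the ln-ln scale.
model <- stats::lm(log(real) ~ log(value), data = cal)

# Predict on the ln scale and transform back to a concentration.
new_sample <- data.frame(value = 6.25)
conc <- unname(exp(stats::predict(model, newdata = new_sample)))
print(conc)
```

For the made-up sample with a raw value of 6.25 this gives a concentration of about 2.5 ng / ml.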

Duplicates / Triplicates / ...

So the samples were measured in duplicate. For your further research you might want to use the mean and perhaps exclude samples with too much spread in their values.

set_calc_variability() to the rescue.

data <- set_calc_variability(
  data = data,
  ids = sample_id,
  value,
  conc
)

This will give you the mean and coefficient of variation (as well as n of the samples and the standard deviation) for the columns value and conc. It will use sample_id to determine which rows belong to the same sample.

data <- bioset::set_calc_variability(
  data = data,
  ids = sample_id,
  value,
  conc
)
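The statistics involved (n, mean, standard deviation and coefficient of variation per sample) can be sketched in base R. The data and the helper summarise_one() are made up for illustration, and the output layout differs from what set_calc_variability() returns.

```r
# Made-up duplicate measurements for two samples.
measurements <- data.frame(
  sample_id = c("A", "A", "B", "B"),
  conc = c(1.0, 1.2, 2.0, 2.0)
)

# Hypothetical helper: n, mean, sd and coefficient of variation (sd / mean).
summarise_one <- function(x) {
  c(n = length(x), mean = mean(x), sd = stats::sd(x), cv = stats::sd(x) / mean(x))
}

# Apply the helper to the values of each sample and stack the results.
stats_by_sample <- do.call(
  rbind,
  lapply(split(measurements$conc, measurements$sample_id), summarise_one)
)
print(stats_by_sample)
```

A sample whose duplicates agree perfectly (B here) has a coefficient of variation of 0, while a larger value flags samples you may want to exclude.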


The short way

If you need to read and transform multiple sets, sets_read() can do that for you.

It takes basically the same arguments as set_read(), set_calc_concentrations() and set_calc_variability() together and combines their functionality. The principal difference is that sets_read() takes sets, the number of sets to process.

It returns a list and may (write_data = TRUE) create two files in your current directory: data_all.csv and data_samples.csv with the processed data.

sets_read()'s list holds the following items:

Take a look at the data

# now you may run it :)
result_list <- sets_read(
  sets = 1,
  sep = ",",
  additional_vars = c("name", "day"),
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4) # ng / ml
)

result_list <- bioset::sets_read(
  sets = 1,
  sep = ",",
  path = system.file("extdata", package = "bioset"),
  additional_vars = c("name", "day"),
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4), # ng / ml
  write_data = FALSE
)

randomchars42/bioset documentation built on May 7, 2019, 9:43 p.m.