ds_basic: A basic datasource (DS)


View source: R/ds_basic.R

Description

The standard datasource used to get training and test splits of data.

Usage

ds_basic(
  binned_data,
  var_to_decode,
  num_cv_splits,
  use_count_data = FALSE,
  num_label_repeats_per_cv_split = 1,
  label_levels_to_use = NULL,
  num_resample_sites = NULL,
  site_IDs_to_use = NULL,
  site_IDs_to_exclude = NULL,
  randomly_shuffled_labels_before_running = FALSE,
  create_simultaneously_recorded_populations = 0
)

Arguments

binned_data

A string giving the path to a file that contains data in binned format, or a data frame that is in binned format.

var_to_decode

A string specifying the name of the labels that should be decoded. This label must be one of the columns in the binned data that starts with 'label.'

num_cv_splits

A number specifying how many cross-validation splits should be used.

use_count_data

If the binned data is neural spike counts, then setting use_count_data = TRUE will convert the data into spike counts. This is useful for classifiers that work on spike count data, e.g., the poisson_naive_bayes_CL.

num_label_repeats_per_cv_split

A number specifying how many times each label should be repeated in each cross-validation split.

label_levels_to_use

A vector of strings specifying specific label levels that should be used. If this is set to NULL then all label levels available will be used.

num_resample_sites

The number of sites that should be randomly selected when constructing training and test vectors. This number needs to be less than or equal to the number of sites available that have num_cv_splits * num_label_repeats_per_cv_split repeats.

site_IDs_to_use

A vector of integers specifying which sites should be used.

site_IDs_to_exclude

A vector of integers specifying which sites should be excluded.

randomly_shuffled_labels_before_running

A boolean specifying whether the labels should be shuffled before the get_data() function is called. This is used to create a null distribution for assessing whether decoding results are above chance.

create_simultaneously_recorded_populations

If the data from all sites was recorded simultaneously, then setting this variable to 1 will cause the get_data() function to return simultaneous populations rather than pseudo-populations.
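The constraint on num_resample_sites described above can be checked by hand before creating a datasource. The following is a minimal base R sketch on toy data, not the package's internal code. The data frame, its column names (siteID, label), and the assumption that every label level at a site needs the full number of repeats are all illustrative.

```r
# Toy stand-in for binned data: one row per trial, with a site ID and a label.
# The column names are hypothetical and chosen only for this illustration.
toy <- data.frame(
  siteID = rep(1:3, times = c(12, 12, 8)),
  label = c(
    rep(c("car", "hand"), each = 6), # site 1: 6 repeats per label level
    rep(c("car", "hand"), each = 6), # site 2: 6 repeats per label level
    rep(c("car", "hand"), each = 4)  # site 3: 4 repeats per label level
  )
)

num_cv_splits <- 3
num_label_repeats_per_cv_split <- 2

# Each label level must have been repeated at least this many times at a site
repeats_needed <- num_cv_splits * num_label_repeats_per_cv_split

# Count the repeats of each label level at each site (sites in rows)
repeat_counts <- table(toy$siteID, toy$label)

# Assumption: a site is usable only if all label levels have enough repeats
min_repeats_per_site <- apply(repeat_counts, 1, min)
usable_sites <- as.integer(
  names(min_repeats_per_site)[min_repeats_per_site >= repeats_needed]
)

usable_sites
length(usable_sites) # an upper bound for num_resample_sites in this toy case
```

In this toy case site 3 has only 4 repeats per label level, so it falls short of the 6 that are needed and would not count toward the sites available for resampling.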

Details

This 'basic' datasource is the one that will be used for most analyses. It can generate training and test sets for data that was recorded simultaneously, or pseudo-populations for data that was not recorded simultaneously.

Like all datasources, this datasource takes binned format data and has a get_data() method that is called by a cross-validation object to get training and testing splits of data that can be passed to a classifier.
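To make the idea of a pseudo-population concrete, the sketch below builds one population vector in base R by drawing a random trial independently at each site. It is only an illustration of the general concept under assumed, hypothetical names (toy, response, make_pseudo_population); it does not reproduce how get_data() is implemented.

```r
set.seed(42) # for a reproducible illustration

# Toy data: 3 sites, 2 label levels, 4 trials per label level per site,
# with one response value per trial. All names are hypothetical.
toy <- expand.grid(
  trial = 1:4,
  label = c("car", "hand"),
  siteID = 1:3,
  stringsAsFactors = FALSE
)
toy$response <- rnorm(nrow(toy))

# Build one pseudo-population vector for a given label level by picking a
# random trial separately at each site, as if the sites had been recorded
# together even though the trials were chosen independently.
make_pseudo_population <- function(data, label_level) {
  label_data <- data[data$label == label_level, ]
  sapply(split(label_data, label_data$siteID), function(site_data) {
    site_data$response[sample(nrow(site_data), 1)]
  })
}

car_vector <- make_pseudo_population(toy, "car")
length(car_vector) # one value per site
```

For simultaneously recorded data one would instead keep the same trial across all sites, which preserves any trial-by-trial correlations between sites.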

See Also

Other datasource: ds_generalization()

Examples

# A typical example of creating a datasource to be passed to a cross-validation object
data_file <- system.file("extdata/ZD_150bins_50sampled.Rda", package = "NDTr")
ds <- ds_basic(data_file, "stimulus_ID", 18)

# If one has many repeats of each label, decoding can be faster if one
# uses fewer CV splits and repeats each label multiple times in each split.
ds <- ds_basic(data_file, "stimulus_ID", 6,
  num_label_repeats_per_cv_split = 3
)

# One can specify a subset of label levels to be used in decoding. Here
# we just do a three-way decoding analysis between "car", "hand" and "kiwi".
ds <- ds_basic(data_file, "stimulus_ID", 18,
  label_levels_to_use = c("car", "hand", "kiwi")
)

# One never explicitly calls the get_data() function, but rather this is
# done by the cross-validator. However, to illustrate what this function
# does, we can call it explicitly here to get training and test data:
all_cv_data <- NDTr:::get_data(ds)
names(all_cv_data)

emeyers/NDTr documentation built on Aug. 8, 2020, 3:41 p.m.