cogdat

![Build Status](https://travis-ci.org/daniel1noble/cogdat.png?branch =master) codecov

Package containing a single function which can be used to process datasets collected on learning or cognition experiments from horizontal format to vertical format for analysis. Data needs to be in a specific format for the processDat function to work correctly.

How to set up data

Data needs to be set up with some rules in mind. First, every data must contain an ID, Date and var column label. It is important to note that these need to be placed at specific locations in the data set. An example data set below should better demonstrate this point:

| | | | | | | | | | | |
|------|----------|----------|----------|------------|----------|----------|----------|----------|----------|----------|----| | | | | | Date |10.11.12 | 11.11.12 | 12.11.12 | 13.11.12 | 14.11.12 | 15.11.12 | …. | | | | |Trial | 1 | 2 | 3 | 4 | 5 | 6 | …. | | | | | | Task | 1 | 1 | 1 | 1 | 1 | 1 | …. | | ID | Sex | Chan | trt | var | | | | | | | | | 12 | M | 1 | 2 | correct | 1 | 1 | 1 | 0 | 0 | 0 | …. | | | | | | latency | 12 | 2 | 34 | 5 | 6 | 7 | …. | | | | | | researcher | FK | FK | FK | FK | FK | FK | …. | | 34 | F | 1 | 2 | correct | 1 | 1 | 1 | 0 | 0 | 0 | …. | | | | | | latency | 12 | 2 | 34 | 5 | 6 | 7 | …. | | | | | | researcher | FK | FK | FK | FK | FK | FK | …. | | …. | …. | …. | …. | …. | …. | …. | …. | …. | …. | …. | …. |

Above is an example of how one would set up their data. The important structure here is that Date is in the 1st row, ID in the first column and var between trial data and all the individual data. This is important because these markers are used as reference points to define the boundaries of the data set so that all the relevant data can be extracted, re-shuffled and put back together in the right format. Something one should look out for are extra spaces (make sure that there are no rogue spaces in "empty boxes"). Best way to be sure that isn't the case is to simply put NA in the empty cells. So long as you stick to the rules above you can have as many unique ID's in the data as you want, as many trial specific data (i.e. Trial, Task etc), as many unique variables for individuals (i.e. correct, latency, researcher etc) and as many trials as your heart desires! This is great because it creates a lot of flexibility in data structure that can accommodate different types of projects, while still allowing you to use a data format that is easy to read and follow as data is being collected.

Installation and running processDat

Package installation can be done with devtools, however, given that it's only just a single function one can just copy the function into R and run it. If you want to install with devtools you can do this:

install.packages("devtools")
devtools::install_github("daniel1noble/cogdat")
library(cogdat)

As an example of how to process data consider the example above (which is an example data file). First, we must import the non-processed data file(s) as follows:

data(dataEg1)
head(dataEg1)

data(dataEg2)
head(dataEg2)

Note that we set the header argument as false. This is done so that Data is located in the first row of the imported data file. This must always be the case. Also, it is important to treat all strings variables as characters, not factors (at least until I fix a few things up). Once the data are imported in R we can process it by calling processDat on our data frame:

#Load the dataset that contains the above table
data_procEg1 <- processDat(dataEg1)
head(data_procEg1)

#Try with the second example, which is a bit more elaborate. 
data_procEg2 <- processDat(dataEg2)
head(data_procEg2)

Now the data is formatted vertically and you can change the variables to the relevant types (numeric, factors, integer etc) and it will be ready for analysis.

head(data_procEg1)
  ID Sex Chan trt     Date Trial Task correct latency researcher
1  12   M    1   2 10.11.12     1    1       1      12         FK
2  12   M    1   2 11.11.12     2    1       1       2         FK
3  12   M    1   2 12.11.12     3    1       1      34         FK
4  12   M    1   2 13.11.12     4    1       0       5         FK
5  12   M    1   2 14.11.12     5    1       0       6         FK
6  12   M    1   2 15.11.12     6    1       0       7         FK
7  34   F    1   2 10.11.12     1    1       1      12         FK
8  34   F    1   2 11.11.12     2    1       1       2         FK
9  34   F    1   2 12.11.12     3    1       1      34         FK
10 34   F    1   2 13.11.12     4    1       0       5         FK
11 34   F    1   2 14.11.12     5    1       0       6         FK
12 34   F    1   2 15.11.12     6    1       0       7         FK

Note: This function is in very early stages yet and one needs to be careful to correctly check the output with the raw data that was put in to make sure everything matches. I have NOT implemented tests or proper warning messages at this stage. If you have questions, concerns and/or suggestions for improvement please let me know.



daniel1noble/cogdat documentation built on May 14, 2019, 3:40 p.m.