knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/load-raw-data-",
  out.width = "100%"
)

options(knitr.kable.NA = '')
hcolor <- "#363B74"
tcolor <- "#EEEEEE"

```{css styles, echo = FALSE, eval = TRUE} p { margin-bottom: 0px !important; }

<!-- Utilitary functions -->
```r
# Include sub-index with HTML tag sub
sub_html <- function(str1, str2) {
  return(paste0("[", str1, "]<sub>", str2, "</sub>"))
}
library(MetaPipe)

MetaPipe only accepts comma-separated values (CSV) files with the following structure:

nrows <- 1
raw_df <- data.frame(A = rep("&nbsp;", nrows),
                     B = rep(NA, nrows),
                     C = rep(NA, nrows),
                     D = rep(NA, nrows),
                     E = rep(NA, nrows),
                     G = rep(NA, nrows),
                     H = rep(NA, nrows))
colnames(raw_df) <- c("ID", 
                      sub_html("Property", 1), "...", sub_html("Property", "M"), 
                      sub_html("Trait", 1), "...", sub_html("Trait", "N"))
`%>%` <- dplyr::`%>%`
knitr::kable(raw_df, format = "html", escape = FALSE, table.attr = "style='width:100%;'", align = 'c') %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>%
  kableExtra::row_spec(0, extra_css = "vertical-align: middle;", background = hcolor, color = tcolor) %>%
  kableExtra::column_spec(1:7, border_left = TRUE, border_right = TRUE)

where the first column (ID) should be an unique identifier for each entry, if there are repeated values MetaPipe will aggregate and replace them by a single row (mean across entries). The data structure can have 0 to M properties, including categorical and numerical. Finally, at least one one trait is expected.

The function call is as follows:

load_raw(raw_data_filename = "FILE.CSV", 
         excluded_columns = c(2, 3, ..., M))

where raw_data_filename is the filename containing the raw data, either absolute or relative paths are accepted. The argument excluded_columns is a vector containing the indices of the properties, e.g. c(2, 3, ..., M).

# Toy dataset
set.seed(123)
example_data <- data.frame(ID = c(1,2,3,4,5),
                           P1 = c("one", "two", "three", "four", "five"), 
                           T1 = rnorm(5), 
                           T2 = rnorm(5),
                           T3 = c(NA, rnorm(4)),                     #  20 % NAs
                           T4 = c(NA, 1.2, -0.5, NA, 0.87),          #  40 % NAs
                           T5 = NA)                                  # 100 % NAs
## Write to disk
write.csv(example_data, "example_data.csv", row.names = FALSE)
## Create copy with duplicated rows
write.csv(example_data[c(1:5, 1, 2), ], "example_data_dup.csv", row.names = FALSE)

# Load the data
load_raw("example_data.csv", c(2))
load_raw("example_data_dup.csv", c(2))

Next, see either Replace Missing Data [Optional] or Assess Normality.

filenames <- c(list.files(".", "example_*"))
output <- lapply(filenames, file.remove)


villegar/MetaPipe documentation built on Nov. 22, 2022, 10:44 p.m.