Data Analysis with `augmentedRCBD`

out_type <- knitr::opts_knit$get("rmarkdown.pandoc.to")

r = getOption("repos")
r["CRAN"] = "https://cran.rstudio.com/"
#r["CRAN"] = "https://cloud.r-project.org/"
#r["CRAN"] = "https://ftp.iitm.ac.in/cran/"
options(repos = r)

# Workaround for missing pandoc in CRAN OSX build machines
out_type <- ifelse(out_type == "", "latex", out_type)

# Workaround for missing pandoc in Solaris build machines
out_type <- ifelse(identical (out_type, vector(mode = "logical", length = 0)),
                   "latex", out_type)
switch(out_type,
    html = {cat("<p>1. Division of Germplasm Conservation, ICAR-National Bureau of Plant Genetic Resources, New Delhi.</p>

<p>2. Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi.</p>

<p>3. Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, New Delhi.</p>

<p>4. Division of Germplasm Evaluation, ICAR-National Bureau of Plant Genetic Resources, New Delhi.</p>")},
    latex = cat("\\begin{center}
1. Division of Germplasm Conservation, ICAR-National Bureau of Plant Genetic Resources, New Delhi.

2. Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi.

3. Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, New Delhi.

4. Division of Germplasm Evaluation, ICAR-National Bureau of Plant Genetic Resources, New Delhi.

\\end{center}" )
)

\begin{center} \vspace{6pt} \hrule \end{center}

knitr::opts_chunk$set(echo = TRUE,
                      comment = "",
                      fig.cap = "")

\tableofcontents

\begin{wrapfigure}{r}{0.35\textwidth} \vspace{-10pt} \begin{center} \includegraphics[width=0.33\textwidth]{`r system.file("extdata", "augmentedRCBD.png", package = "augmentedRCBD")`} \end{center} \vspace{-10pt} \end{wrapfigure}

1 Overview

The software augmentedRCBD is built on the R statistical programming language as an add-on (or 'package' in R parlance). It performs the analysis of data generated from experiments in augmented randomised complete block design according to Federer, W.T. [-@federer_augmented_1956; -@federer_augmented_1956-1; -@federer_augmented_1961; -@federerModelConsiderationsVariance1976]. It also computes analysis of variance, adjusted means, descriptive statistics, genetic variability statistics etc. and includes options for data visualization and report generation.

This tutorial aims to help users apply this package for such analysis. Using augmentedRCBD for data analysis requires a basic knowledge of the R programming language. However, as many of the intended end-users may not be familiar with R, sections 2 to 4 give a 'gentle' introduction to R, especially those aspects which are necessary to get augmentedRCBD up and running for performing data analysis in a Windows environment. Users already familiar with R can skip to section 5.

rlogo_url = 'https://www.r-project.org/logo/Rlogo.png'
if (!file.exists(rlogo_file <- 'rlogo.png')) download.file(rlogo_url, rlogo_file, mode = 'wb')
#knitr::include_graphics(cover_file)

\begin{wrapfigure}{r}{0.35\textwidth} \vspace{-10pt} \begin{center} \includegraphics[width=0.20\textwidth]{`r "rlogo.png"`} \end{center} \vspace{-5pt} \end{wrapfigure}

2 R software {#rsoft}

R is a free software environment for statistical computing and graphics. It is open source, platform independent (works on Linux, Windows or MacOS), very flexible and comprehensive, with robust interfaces to many popular programming languages as well as databases. It is strengthened by its diverse library of add-on packages, which extend its abilities, as well as by its incredible community support. It is one of the most popular tools being used in academia today [@tippmann_programming_2015].

rbase_url = 'https://raw.githubusercontent.com/aravind-j/augmentedRCBD/master/vignettes/rbase.png'
if (!file.exists(rbase_file <- 'rbase.png')) download.file(rbase_url, rbase_file, mode = 'wb')

rstudio_url = 'https://github.com/aravind-j/augmentedRCBD/raw/master/vignettes/rstudio.png'
if (!file.exists(rstudio_file <- 'rstudio.png')) download.file(rstudio_url, rstudio_file, mode = 'wb')

rstudiopanes_url = 'https://github.com/aravind-j/augmentedRCBD/raw/master/vignettes/rstudio%20panes.png'
if (!file.exists(rstudiopanes_file <- 'rstudio panes.png')) download.file(rstudiopanes_url, rstudiopanes_file, mode = 'wb')

\clearpage

3 Getting Started

This section details the steps required to set up the R programming environment under a third-party interface called RStudio in Windows.

3.1 Installing R

Download and install R for Windows from http://cran.r-project.org/bin/windows/base/.

switch(out_type,
    html = cat('<img src="https://github.com/aravind-j/augmentedRCBD/raw/master/vignettes/rbase.png" align="center" alt="The `R` download location.">'),
    latex = cat('\\includegraphics{rbase.png}'))
switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Fig. 1</strong>: The `R` download location.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Fig. 1}: The \\texttt{R} download location.
                \\end{center}'))

3.2 Installing RStudio

The basic command line interface in native R is rather limiting. There are several interfaces which enhance its functionality and ease of use, RStudio being one of the most popular among R programmers.

Download and install RStudio for Windows from https://www.rstudio.com/products/rstudio/download/#download

switch(out_type,
    html = cat('<img src="https://github.com/aravind-j/augmentedRCBD/raw/master/vignettes/rstudio.png" align="center" alt="The `RStudio` download location.">'),
    latex = cat('\\includegraphics{rstudio.png}'))
switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Fig. 2</strong>: The `RStudio` download location.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Fig. 2}: The \\texttt{RStudio} download location.
                \\end{center}'))

3.3 The RStudio Interface

On opening RStudio, the default interface with four panes/windows is visible as follows. Some panes have several tabs.

switch(out_type,
    html = cat('<img src="https://github.com/aravind-j/augmentedRCBD/raw/master/vignettes/rstudio%20panes.png" align="center" alt="The default `RStudio` interface with the four panes.">'),
    latex = cat('\\includegraphics{rstudio panes.png}'))
switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Fig. 3</strong>: The default `RStudio` interface with the four panes.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Fig. 3}: The default \\texttt{RStudio} interface with the four panes.
                \\end{center}'))

3.3.1 Console

This is where the action happens. Any valid R code typed here after the '>' prompt is executed on pressing 'Enter' to generate the output.

For example, type 1+1 in the console and press 'Enter'.

1+1

3.3.2 Source

This is where R Scripts (collection of code) can be created and edited. R scripts are text files with a .R extension. R Code for analysis can be typed and saved in such R scripts. New scripts can be opened by clicking 'File|New File' and selecting 'R Script'. Code can be selected from R Scripts and sent to console for evaluation by clicking 'Run' on the 'Source' pane or by pressing 'Ctrl + Enter'.

3.3.3 Environment|History|Connections

The 'Environment' tab shows the list of all the 'objects' (see section 4.3) defined in the current R session. It also has buttons at the top to open, save and clear the environment, as well as a few options for importing data under 'Import Dataset'.

The 'History' tab shows a history of all the code that was previously evaluated. This is useful if you want to go back to some code.

The 'Connections' tab helps to establish and manage connections with different databases and data sources.

3.3.4 Files|Plots|Packages|Help|Viewer

The 'Files' tab shows a sleek file browser to access the file directory in the computer with options to manage the working directory (see section 4.1) under the More button.

The 'Plots' tab shows all the plots generated in R with buttons to delete unnecessary ones and export useful ones as a pdf file or as an image file.

The 'Packages' tab shows a list of all the R add-on packages installed. The check box on the left shows whether they are loaded or not. There are also buttons to install and update R packages.

The 'Help' tab displays the help documentation for functions and packages (see section 4.6).

The 'Viewer' tab shows any web content generated by R code.

4 Some Basics

This section describes some basics to give users a working knowledge of R in order to use augmentedRCBD.

4.1 Working Directory {#wdir}

The working directory is a file path to a folder on the computer which is recognised by R as the default location to read files from or write files to. The function getwd() shows the current working directory, while setwd() can be used to change it.

# Print current working directory
getwd()
print("C:/Users/Computer/Documents")
# Set new working directory
setwd("C:/Data Analysis/")
getwd()
print("C:/Data Analysis/")

One key detail is that file paths in R use forward slashes (/), as in MacOS or Linux, rather than the backslashes (\) used in Windows. Keep this in mind when copying paths from the default Windows file explorer.
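
As a minimal sketch of this conversion, a path copied from the Windows file explorer can be rewritten with forward slashes in base R. The path shown is hypothetical. Note that inside an R string each backslash has to be typed twice.

```r
# A hypothetical path copied from the Windows file explorer.
# Inside an R string, every backslash must be written twice.
win_path <- "C:\\Data Analysis\\augdata.csv"

# Replace each backslash with a forward slash
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path

# Alternatively, build the path from its components
alt_path <- file.path("C:", "Data Analysis", "augdata.csv")
alt_path
```

Both approaches give the same forward-slash path, which can then be passed to functions such as setwd() or read.csv().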

4.2 Expression and Assignment

Expressions are instructions in the form of code to be entered after the > prompt in the console. An expression can be a constant, an arithmetic operation or a condition. A more advanced and most useful expression is a function call (see section 4.3).

# Constant
123
# Arithmetic (add two numbers)
1 + 2
# Condition
34 > 25
1 == 2
# Function call (mean of a series of numbers)
mean(c(25,56,89,35))

Information from an expression can be stored as an 'object' (see section 4.3) by assigning a name using the operator '<-'.

# Assign the result of the expression 1 + 2 to an object 'a'
a <- 1 + 2
a

It is recommended to add comments to explain the code by using the '#' sign. Any text after the '#' sign on the same line is ignored by R.

4.3 Objects and Functions {#ObjFun}

R is an object-oriented programming language (OOP). Any kind of construct created in R is an 'object'. Each object has a 'class' (shown using the class() function) and different 'attributes' which define what operations can be done on that object. There are different types of data structure objects in R such as vectors, matrices, factors, data frames, and lists. A 'function' is also an object, which defines a procedure or a sequence of expressions.

4.3.1 Vector {#vector}

A vector is a collection of elements of a single type (or 'mode'). The common vector modes are 'numeric', 'integer', 'character' and 'logical'. The c() function is used to create vectors. The functions class(), str() and length() show the attributes of vectors.

The vector mode 'numeric' stores real numbers, while 'integer' stores integers. The integer mode can be enforced by suffixing the elements with 'L'.

# A numeric vector
a <- c(1, 2, 3.3)
class(a)
str(a)
length(a)

# An integer vector
b <- c(1L, 2L, 3L)
class(b)
str(b)
length(b)

The vector mode 'character' stores text.

# A character vector
c <- c("one","two","three")
class(c)
str(c)
length(c)

The vector mode 'logical' stores 'TRUE' or 'FALSE' logical data.

# A logical vector
d <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)
class(d)
str(d)
length(d)
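
Because a vector holds elements of a single mode, c() silently converts mixed inputs to the most general mode among them. The following sketch, with made-up values, illustrates this coercion.

```r
# Mixing numbers, text and logicals gives a character vector
mixed <- c(1, "two", TRUE)
class(mixed)
mixed

# Mixing numbers and logicals gives a numeric vector (TRUE = 1, FALSE = 0)
num_lgl <- c(1.5, TRUE, FALSE)
class(num_lgl)
num_lgl
```

This is worth keeping in mind when a column of numbers picks up a stray text entry during data entry.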

4.3.2 Factor {#factor}

A 'factor' in R stores categorical data as a set of distinct levels.

catg <- c("male","female","female","male","male")
catg
is.factor(catg)

# Apply the factor function
factor_catg <- factor(catg)

factor_catg
is.factor(factor_catg)
class(factor_catg)
str(factor_catg)

A character, numeric or integer vector can be transformed to a factor by using the as.factor() function.

# Conversion of numeric to factor
a <- c(1, 2, 3.3)
class(a)
str(a)
fac_a <- as.factor(a)
class(fac_a)
str(fac_a)

# Conversion of integer to factor
b <- c(1L, 2L, 3L)
class(b)
str(b)
fac_b <- as.factor(b)
class(fac_b)
str(fac_b)

# Conversion of character to factor
c <- c("one","two","three")
class(c)
str(c)
fac_c <- as.factor(c)
class(fac_c)
str(fac_c)
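
One point worth noting when converting in the other direction: applying as.numeric() directly to a factor returns the internal level codes rather than the original numbers. A small sketch with made-up values:

```r
f <- as.factor(c(10, 20, 20, 5))
levels(f)

# Internal integer codes of the levels, not the original values
codes <- as.numeric(f)
codes

# Convert through character to recover the original values
values <- as.numeric(as.character(f))
values
```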

4.3.3 Matrix

A 'matrix' in R is a vector with a 'dim' attribute that stores the number of rows ('nrow') and columns ('ncol').

# Generate 5 * 4 numeric matrix
m <- matrix(1:20, nrow = 5, ncol = 4)
m
class(m)
typeof(m)
# Dimensions of m
dim(m) 
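
The link between vectors and matrices can be seen by attaching dimensions to a plain vector. This sketch uses only base R.

```r
v <- 1:20
is.matrix(v)

# Giving the vector dimensions turns it into a 5 x 4 matrix
dim(v) <- c(5, 4)
is.matrix(v)
attributes(v)

# Same values as matrix(1:20, nrow = 5, ncol = 4), filled column by column
all(v == matrix(1:20, nrow = 5, ncol = 4))
```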

4.3.4 List

A 'list' is a container holding different objects. The contents of a list need not be of the same type or mode. A list can encompass a mixture of data types such as vectors, matrices, data frames, other lists or any other data structure.

w <- list(a, m, d, list(b, c))
class(w)
str(w)

4.3.5 Data Frame {#dataframe}

A 'data frame' in R is a special kind of list in which every element has the same length. It is very important for handling tabular data in R. It is an array-like structure with rows and columns. Each column needs to be of a single data type; however, the data type can vary between columns.

L <- LETTERS[1:4]
y <- 1:4
z <- c("This", "is", "a", "data frame")
df <- data.frame(L, x = 1, y, z)
df
str(df)
attributes(df)
rownames(df)
colnames(df)

4.3.6 Functions

All of the work in R is done by functions. A function is an object defining a procedure which takes one or more objects as input (or 'arguments'), performs some action on them and finally gives a new object as output (or 'return'). class(), mean(), getwd(), +, etc. are all functions.

For example, the function mean() takes a numeric vector as argument and returns the mean as a numeric vector.

a <- c(1, 2, 3.3)
mean(a)

The user can also create custom functions. For example, the function foo adds two numbers and gives the result.

foo <- function(n1, n2) {
  out <- n1 + n2
  return(out)
}
foo(2,3)
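
As a slightly larger sketch, a custom function can also have arguments with default values. The function name std_error and its argument na.rm below are illustrative choices and not part of any package.

```r
# Standard error of the mean; 'na.rm' defaults to FALSE
std_error <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  sd(x) / sqrt(length(x))
}

vals <- c(92, 79, 87, 81)
std_error(vals)

# Missing values are dropped only when requested
std_error(c(vals, NA), na.rm = TRUE)
```

Calling std_error(vals) uses the default na.rm = FALSE, while the second call overrides it by name.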

4.4 Special Elements

In addition to numbers and text, there are some special elements which can be included in different data objects.

NA (not available) indicates missing data.

x <- c(2.5, NA, 8.6)
y <- c(TRUE, FALSE, NA)
z <- c("k", NA, "m", "n", "o")
is.na(x)
is.na(z)
anyNA(x)
a
is.na(a)

Inf indicates infinity.

1/0

NaN (Not a Number) indicates any undefined value.

0/0
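
Missing values propagate through most computations. Many base R functions, such as mean(), offer an na.rm argument to drop them first, as in this short sketch.

```r
x <- c(2.5, NA, 8.6)

# The result is NA because one element is missing
mean(x)

# Drop missing values before computing the mean
mean(x, na.rm = TRUE)

# Count the missing values and keep only the observed ones
sum(is.na(x))
x[!is.na(x)]
```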

4.5 Indexing

The [ function is used to extract elements of an object by indexing (numeric or logical). Named elements in lists and data frames can be extracted by using the $ operator.

Consider a vector a.

a <- c(1, 2, 3.3, 2.8, 6.7)
# Numeric indexing
# Extract first element
a[1]
# Extract elements 2:3
a[2:3]
# Logical indexing
a[a > 3]

Consider a matrix m.

m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
colnames(m) <- c('a', 'b', 'c')
m
# Extract elements
m[,2] # 2nd column of matrix
m[3,] # 3rd row of matrix
m[2:3, 1:3] # rows 2,3 of columns 1,2,3 
m[2,2] # Element in 2nd column of 2nd row
m[, 'b'] # Column 'b'
m[, c('a', 'c')] # Column 'a' and 'c'

Consider a list w.

w <- list(vec = a, mat = m, data = df, alist = list(b, c))

# Indexing by number
w[2] # As list structure
w[[2]] # Without list structure

# Indexing by name
w$vec
w$data

Consider a data frame df.

df

# Indexing by number
df[,2] # 2nd column of data frame (as a vector)
df[2] # 2nd column of data frame (as a data frame)
df[3,] # 3rd row of data frame
df[2:3, 1:3] # rows 2,3 of columns 1,2,3 
df[2,2] # Element in 2nd column of 2nd row

# Indexing by name
df$L
df$z
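
Logical indexing also works on the rows of a data frame, which is a common way to filter data. A minimal sketch, using a small made-up data frame named dat:

```r
dat <- data.frame(trt = c("G 1", "G 2", "G 3", "G 4"),
                  y1 = c(92, 79, 87, 81))

# Rows where y1 exceeds 80, all columns
high <- dat[dat$y1 > 80, ]
high

# Values of 'trt' for those rows
dat$trt[dat$y1 > 80]
```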

4.6 Help Documentation

The help documentation regarding any function can be viewed using the ? or help() function. The help documentation shows the default usage of the function, including the arguments that are taken by the function and the type of output object returned ('Value').

?ls
help(ls)

?mean

?setwd

4.7 Packages {#pack}

Packages in R are collections of R functions, data, and compiled code in a well-defined format. They are add-ons which extend the functionality of R. At present, there are `r nrow(available.packages())` packages available for deployment and use at the official repository, the Comprehensive R Archive Network (CRAN).

Valid packages from CRAN can be installed by using the install.packages() command.

# Install the package 'readxl' for importing data from excel
install.packages("readxl")

Installed packages can be loaded using the function library().

# Load the package 'readxl' for importing data from excel
library(readxl)
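
To avoid reinstalling a package that is already present, its availability can be checked first with requireNamespace(). The helper name is_installed below is an illustrative choice.

```r
# TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R
is_installed("stats")

# Only report a missing package; nothing is installed here
if (!is_installed("readxl")) {
  message("Package 'readxl' is not installed; run install.packages(\"readxl\")")
}
```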

4.8 Importing and Exporting Tabular Data {#impexp}

Tabular data from a spreadsheet can be imported into R in different ways. Consider some data such as in Table 1. Copy this data into a spreadsheet editor such as MS Excel and save it in the working directory (getwd()) both as augdata.csv, a comma-separated-value file, and as augdata.xlsx, an Excel file.

switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Table 1</strong>: Example data from an experiment in augmented RCBD design.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Table 1}: Example data from an experiment in augmented RCBD design.
                \\end{center}'))
blk <- c(rep(1,7),rep(2,6),rep(3,7))
trt <- c(1, 2, 3, 4, 7, 11, 12, 1, 2, 3, 4, 5, 9, 1, 2, 3, 4, 8, 6, 10)
y1 <- c(92, 79, 87, 81, 96, 89, 82, 79, 81, 81, 91, 79, 78, 83, 77, 78, 78,
        70, 75, 74)
y2 <- c(258, 224, 238, 278, 347, 300, 289, 260, 220, 237, 227, 281, 311, 250,
        240, 268, 287, 226, 395, 450)
augdata <- data.frame(blk = as.factor(as.character(as.roman(blk))), trt, y1, y2)
knitr::kable(augdata, row.names = FALSE)

The augdata.csv file can be imported into R using the read.csv() function or the read_csv() function in the readr package.

data <- read.csv(file = "augdata.csv")
str(data)
str(augdata)
augdata$blk <- as.character(augdata$blk)

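The argument stringsAsFactors = FALSE reads the text columns as type character instead of factor. Reading text as factor was the default in R versions before 4.0.0; from R 4.0.0 onwards, character is the default.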

data <- read.csv(file = "augdata.csv", stringsAsFactors = FALSE)
str(data)
str(augdata)

The augdata.xlsx file can be imported into R using the read_excel() function in the readxl package.

library(readxl)
data <- read_excel(path = "augdata.xlsx")
str(augdata)

The tabular data can be exported from R to a .csv (comma-separated-value) file by the write.csv() function.

write.csv(x = data, file = "augdata.csv")
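
By default write.csv() also writes the row names as an extra first column, which then reappears as a column named X on re-import. The following sketch of a round trip avoids this with row.names = FALSE. It uses a temporary file and a small made-up data frame named dat.

```r
dat <- data.frame(blk = c("B1", "B1", "B2"),
                  trt = c(1, 2, 1),
                  y1 = c(92, 79, 79))

# Write to a temporary file without the row names column
csv_file <- tempfile(fileext = ".csv")
write.csv(x = dat, file = csv_file, row.names = FALSE)

# Read it back and inspect the structure
dat_in <- read.csv(file = csv_file, stringsAsFactors = FALSE)
str(dat_in)
colnames(dat_in)
```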

4.9 Additional Resources

To learn more about R, there are numerous online tutorials as well as free courses available. Queries about various aspects can be put to the active and vibrant `R` community online.

5 Installation of augmentedRCBD {#install}

The package augmentedRCBD can be installed using the following functions.

# Install from CRAN
install.packages('augmentedRCBD', dependencies=TRUE)

# Install development version from Github
if (!require('devtools')) install.packages('devtools')
library(devtools)
install_github("aravind-j/augmentedRCBD")

The stable release is hosted on CRAN (see section 4.7), while the under-development version is hosted as a GitHub repository. To install from GitHub, you need to use the install_github() function from the `devtools` package.

Then the package can be loaded using the function library().

library(augmentedRCBD)
# Fetch release version
rver <- ifelse(test = gsub("(.\\.)(\\d+)(\\..)", "", getNamespaceVersion("augmentedRCBD")) == "",
               yes = getNamespaceVersion("augmentedRCBD"),
               no = as.vector(available.packages()["augmentedRCBD",]["Version"]))

The current version of the package is `r rver`. The previous versions are as follows.

Table 2. Version history of augmentedRCBD R package.

if (requireNamespace("RCurl", quietly = TRUE) && requireNamespace("httr", quietly = TRUE) && requireNamespace("XML", quietly = TRUE)) {
  pkg <- "augmentedRCBD"
  link <- paste0("https://cran.r-project.org/src/contrib/Archive/", pkg, "/")
  # cafile <- system.file("CurlSSL", "cacert.pem", package = "RCurl")
  # page <- httr::GET(link, httr::config(cainfo = cafile))
  page <- httr::GET(link)
  page <- httr::content(page, as = 'text')
  # page <- RCurl::getURL(link)

  VerHistory <- XML::readHTMLTable(page)[[1]][,2:3]
  colnames(VerHistory) <- c("Version", "Date")
  VerHistory <- VerHistory[VerHistory$Version != "Parent Directory",]
  VerHistory <- VerHistory[!is.na(VerHistory$Version), ]
  VerHistory$Date <- as.Date(VerHistory$Date)
  VerHistory$Version <- gsub("augmentedRCBD_", "", VerHistory$Version)
  # fixed = TRUE so that the dots are matched literally
  VerHistory$Version <- gsub(".tar.gz", "", VerHistory$Version, fixed = TRUE)

  VerHistory <- VerHistory[order(VerHistory$Date), c("Version", "Date")]
  rownames(VerHistory) <- NULL

  knitr::kable(VerHistory)

} else {
  print("Packages 'RCurl', 'httr' and 'XML' are required to generate this table")
}

To see the detailed history of changes, use news(package='augmentedRCBD').

6 Data Format

Certain details need to be considered for arranging experimental data for analysis using the augmentedRCBD package.

The data should be in long/vertical form, where each row has the data from one genotype per block. For example, consider the following data (Table 3) recorded for a trait from an experiment laid out in an augmented block design with 3 blocks and 12 genotypes (or treatments) with 6 to 7 genotypes/block. 8 genotypes (Test, G 5 to G 12) are not replicated, while 4 genotypes (Check, G 1 to G 4) are replicated.

switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Table 3</strong>: Data from an experiment in augmented RCBD design.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Table 3}: Data from an experiment in augmented RCBD design.
                \\end{center}'))
dataeg <- structure(list(X__1 = c("**Block I**", "", "**Block II**", 
"", "**Block III**", ""), X__2 = c("G12", 
"82", "G5", "79", "**G4**", "78"), X__3 = c("**G4**", "81", "G9", 
"78", "**G2**", "77"), X__4 = c("G11", "89", "--", "--", "**G1**", 
"83"), X__5 = c("**G2**", "79", "**G3**", "81", "G6", "75"), 
    X__6 = c("**G1**", "92", "**G1**", "79", "G10", "74"), X__7 = c("G7", 
    "96", "**G2**", "81", "**G3**", "78"), X__8 = c("**G3**", 
    "87", "**G4**", "91", "G8", "70")), row.names = c(NA, -6L
), class = c("data.frame"))

knitr::kable(dataeg, col.names = NULL)

This data needs to be arranged with columns showing block, genotype (or treatment) and the data of the trait for each genotype per block (Table 4).

switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Table 4</strong>: Data from an experiment in augmented RCBD design arranged in long-form.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Table 4}: Data from an experiment in augmented RCBD design arranged in long-form.
                \\end{center}'))
Block <- c(rep("Block I",7),rep("Block II",6),rep("Block III",7))
Treatment <- c("G 1", "G 2", "G 3", "G 4", "G 7", "G 11", "G 12", "G 1", "G 2", 
               "G 3", "G 4", "G 5", "G 9", "G 1", "G 2", "G 3", "G 4", "G 8", 
               "G 6", "G 10")
Trait <- c(92, 79, 87, 81, 96, 89, 82, 79, 81, 81, 91, 79, 78, 83, 77, 78, 78,
        70, 75, 74)
augdata <- data.frame(Block, Treatment, Trait)
knitr::kable(augdata, row.names = FALSE)

The data for block and genotype (or treatment) can also be depicted as numbers (Table 5).

switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Table 5</strong>: Data from an experiment in augmented RCBD design arranged in long-form (Block and Treatment as numbers).</p>'),
    latex = cat('\\begin{center}
                \\textbf{Table 5}: Data from an experiment in augmented RCBD design arranged in long-form (Block and Treatment as numbers).
                \\end{center}'))
Block <- c(rep(1,7),rep(2,6),rep(3,7))
Treatment <- c(1, 2, 3, 4, 7, 11, 12, 1, 2, 3, 4, 5, 9, 1, 2, 3, 4, 8, 6, 10)
Trait <- c(92, 79, 87, 81, 96, 89, 82, 79, 81, 81, 91, 79, 78, 83, 77, 78, 78,
        70, 75, 74)
augdata <- data.frame(Block, Treatment, Trait)
knitr::kable(augdata, row.names = FALSE)

Multiple traits can be added as additional columns (Table 6).

switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Table 6</strong>: Data from an experiment in augmented RCBD design arranged in long-form (Multiple traits).</p>'),
    latex = cat('\\begin{center}
                \\textbf{Table 6}: Data from an experiment in augmented RCBD design arranged in long-form (Multiple traits).
                \\end{center}'))
Block <- c(rep("Block I",7),rep("Block II",6),rep("Block III",7))
Treatment <- c("G 1", "G 2", "G 3", "G 4", "G 7", "G 11", "G 12", "G 1", "G 2", 
               "G 3", "G 4", "G 5", "G 9", "G 1", "G 2", "G 3", "G 4", "G 8", 
               "G 6", "G 10")
Trait1 <- c(92, 79, 87, 81, 96, 89, 82, 79, 81, 81, 91, 79, 78, 83, 77, 78, 78,
        70, 75, 74)
Trait2 <- c(258, 224, 238, 278, 347, 300, 289, 260, 220, 237, 227, 281, 311, 250,
        240, 268, 287, 226, 395, 450)
augdata <- data.frame(Block, Treatment, Trait1, Trait2)
knitr::kable(augdata, row.names = FALSE)

Data should preferably be balanced i.e. all the check genotypes should be present in all the blocks. If not, a warning is issued. The number of test genotypes can vary within a block. There should not be any missing values. Rows of genotypes with missing values for one or more traits should be removed.

Such tabular data should be imported (see section 4.8) into R as a data frame object (see section 4.3.5). The columns with the block and treatment categorical data should be of the type factor (see section 4.3.2), while the column(s) with the trait data should be of the type integer or numeric (see section 4.3.1).
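
The requirements above can be checked with base R before running the analysis. The following is a sketch only; the data frame name dat and its column names are illustrative, with values taken from Table 5.

```r
dat <- data.frame(
  Block = rep(c(1, 2, 3), times = c(7, 6, 7)),
  Treatment = c(1, 2, 3, 4, 7, 11, 12, 1, 2, 3, 4, 5, 9,
                1, 2, 3, 4, 8, 6, 10),
  Trait = c(92, 79, 87, 81, 96, 89, 82, 79, 81, 81, 91, 79, 78,
            83, 77, 78, 78, 70, 75, 74)
)

# Drop rows with missing values, if any
dat <- dat[complete.cases(dat), ]

# Block and treatment as factors
dat$Block <- as.factor(dat$Block)
dat$Treatment <- as.factor(dat$Treatment)

# Confirm the column types
str(dat)
is.factor(dat$Block) && is.factor(dat$Treatment) && is.numeric(dat$Trait)
```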

7 Data Analysis for a Single Trait

Analysis of data for a single trait can be performed by using the augmentedRCBD function. It generates an object of class augmentedRCBD. Such an object can then be taken as input by several functions to print the results to the console (print.augmentedRCBD), generate descriptive statistics from adjusted means (describe.augmentedRCBD), plot the frequency distribution (freqdist.augmentedRCBD) and compute genetic variability statistics (gva.augmentedRCBD). All these outputs can also be exported as an MS Word report using the report.augmentedRCBD function.

if (requireNamespace("diagram", quietly = TRUE)) {
  suppressMessages(library(diagram))

# Plot matrix
elpos <- coordinates(pos = c(1, 1, 3, 1, 1))
elpos[c(-3,-4), 1] <- elpos[5, 1]

par(mar = c(1, 1, 1, 1))
openplotmat()

# text(elpos, lab = as.character(c(1:7)), cex = 2)

# Arrows
arrows <- data.frame(from = c(3, 4, 4, 4, 4, 4), 
                     to   = c(4, 1, 2, 5, 6, 7))

for (i in 1:dim(arrows)[1]) {
  straightarrow(from = elpos[arrows[i,1], ],
                to = elpos[arrows[i,2], ], arr.type = "curved", arr.lwd = 0.5,
                lwd = 2, arr.pos = 0.5, arr.length = 0.2, arr.width = 0.15,
                lcol = "midnightblue", arr.col = "midnightblue")
}

# Textbox
elpostext <- elpos[c(3, 4, 1, 2, 5, 6, 7),]
flowtext <- c("Data", "augmentedRCBD",
              "print.augmentedRCBD", "describe.augmentedRCBD",
              "freqdist.augmentedRCBD", "gva.augmentedRCBD",
              "report.augmentedRCBD")
flowfont <- c("sans", rep("sans", 6))
flowradx <- c(0.065, 0.1, 0.13, 0.13, 0.13, 0.13, 0.13)
flowcex <- c(0.7, rep(0.7, 6))
flowtcol <- c("black", rep("dodgerblue4", 6))
for (i in 1:dim(elpostext)[1]) {
  textround(elpostext[i,], radx = flowradx[i], rady = 0.03, lab = flowtext[i],
           box.col = "white", shadow.col = "lightskyblue3", shadow.size = 0.005,
           family = flowfont[i], cex = flowcex[i], col = flowtcol[i], rx = 0.0075)
}

} else {
  print("package 'diagram' is required to generate this figure")
}

Fig. 4. Workflow for analysis of single traits with augmentedRCBD.

7.1 augmentedRCBD()

Consider the data in Table 1. The data can be imported into R as vectors as follows.

blk <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)
trt <- c(1, 2, 3, 4, 7, 11, 12, 1, 2, 3, 4, 5, 9, 1, 2, 3, 4, 8, 6, 10)
y1 <- c(92, 79, 87, 81, 96, 89, 82, 79, 81, 81, 91, 79, 78, 83, 77, 78, 78,
        70, 75, 74)
y2 <- c(258, 224, 238, 278, 347, 300, 289, 260, 220, 237, 227, 281, 311, 250,
        240, 268, 287, 226, 395, 450)

The blk and trt vectors with the block and treatment data need to be converted into factors as follows before analysis.

# Convert block and treatment to factors
blk <- as.factor(blk)
trt <- as.factor(trt)

With the data in the appropriate format, the analysis can be performed for the trait y1 as follows.

out1 <- augmentedRCBD(blk, trt, y1, method.comp = "lsd",
                      alpha = 0.05, group = TRUE, console = TRUE)
class(out1)

Similarly the analysis for the trait y2 can be computed as follows.

out2 <- augmentedRCBD(blk, trt, y2, method.comp = "lsd",
                      alpha = 0.05, group = TRUE, console = TRUE)
class(out2)

The data can also be imported as a data frame and then used for analysis. Consider the data frame data imported from Table 1 according to the instructions in section 4.8.

data <- data.frame(blk, trt, y1, y2)
str(data)
# Convert block and treatment to factors
data$blk <- as.factor(data$blk)
data$trt <- as.factor(data$trt)
# Results for variable y1
out1 <- augmentedRCBD(data$blk, data$trt, data$y1, method.comp = "lsd",
                      alpha = 0.05, group = TRUE, console = TRUE)
class(out1)

# Results for variable y2
out2 <- augmentedRCBD(data$blk, data$trt, data$y2, method.comp = "lsd",
                     alpha = 0.05, group = TRUE, console = TRUE)
class(out2)

Check genotypes are inferred by default on the basis of number of replications. However, if some test genotypes are also replicated, they may also be falsely detected as checks. To avoid this, the checks can be specified by the checks argument.

# Results for variable y1 (checks specified)
out1 <- augmentedRCBD(data$blk, data$trt, data$y1, method.comp = "lsd",
                      alpha = 0.05, group = TRUE, console = TRUE,
                      checks = c("1", "2", "3", "4"))

# Results for variable y2 (checks specified)
out2 <- augmentedRCBD(data$blk, data$trt, data$y2, method.comp = "lsd",
                      alpha = 0.05, group = TRUE, console = TRUE,
                      checks = c("1", "2", "3", "4"))
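
Before relying on the default, the number of replications per treatment can be inspected with table(). The sketch below uses the block and treatment vectors of Table 1. Treating every treatment that occurs in all blocks as a check is a simple heuristic used here for illustration, and is not necessarily how the package infers checks.

```r
blk <- as.factor(rep(c(1, 2, 3), times = c(7, 6, 7)))
trt <- as.factor(c(1, 2, 3, 4, 7, 11, 12, 1, 2, 3, 4, 5, 9,
                   1, 2, 3, 4, 8, 6, 10))

# Number of replications of each treatment
reps <- table(trt)
reps

# Number of blocks in which each treatment occurs
in_blocks <- rowSums(table(trt, blk) > 0)

# Treatments present in every block
checks <- names(in_blocks)[in_blocks == nlevels(blk)]
checks
```

A vector obtained this way can be reviewed by hand and then passed to the checks argument.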

In case of a large number of treatments or genotypes, it is advisable to avoid treatment comparisons by using the group = FALSE argument, as they are memory and processor intensive. Further, it is advised to simplify the output with simplify = TRUE in order to reduce the size of the output object.

If truncate.means = TRUE, then any negative adjusted means will be truncated to zero with a warning.

7.2 print.augmentedRCBD()

The results of analysis in an object of class augmentedRCBD can be printed to the console as follows.

# Print results for variable y1
print(out1)

# Print results for variable y2
print(out2)

7.3 describe.augmentedRCBD()

The descriptive statistics such as count, mean, standard error, minimum, maximum, skewness (with p-value from D'Agostino test of skewness (@dagostino_transformation_1970)) and kurtosis (with p-value from Anscombe-Glynn test of kurtosis (@anscombe_distribution_1983)) for the adjusted means from the results in an object of class augmentedRCBD can be computed as follows.

# Descriptive statistics for variable y1
describe.augmentedRCBD(out1)

# Descriptive statistics for variable y2
describe.augmentedRCBD(out2)

7.4 freqdist.augmentedRCBD()

The frequency distribution of the adjusted means from the results in an object of class augmentedRCBD can be plotted as follows.

# Frequency distribution for variable y1
freq1 <- freqdist.augmentedRCBD(out1, xlab = "Trait 1")
plot(freq1)

# Frequency distribution for variable y2
freq2 <- freqdist.augmentedRCBD(out2, xlab = "Trait 2")
plot(freq2)

The colours for the check values may be specified using the argument check.col.

colset <- c("red3", "green4", "purple3", "darkorange3")

# Frequency distribution for variable y1
freq1 <- freqdist.augmentedRCBD(out1, xlab = "Trait 1", check.col = colset)
plot(freq1)

# Frequency distribution for variable y2
freq2 <- freqdist.augmentedRCBD(out2, xlab = "Trait 2", check.col = colset)
plot(freq2)

The default check highlighting can be turned off using the argument highlight.check = FALSE.

# Frequency distribution for variable y1
freq1 <- freqdist.augmentedRCBD(out1, xlab = "Trait 1",
                                highlight.check = FALSE)
plot(freq1)

# Frequency distribution for variable y2
freq2 <- freqdist.augmentedRCBD(out2, xlab = "Trait 2",
                                highlight.check = FALSE)
plot(freq2)
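As a rough picture of what the frequency distribution in this section summarises, the base R sketch below bins a set of adjusted means into class intervals and reports the interval into which each check falls. All values and object names are hypothetical, and the sketch does not reproduce the plots drawn by freqdist.augmentedRCBD.

```r
# Hypothetical adjusted means of all entries and of two checks
adj_means <- c(62, 68, 71, 74, 75, 79, 83, 88, 91, 97)
check_means <- c(C1 = 72, C2 = 89)

# Bin the adjusted means into class intervals without drawing a plot
h <- hist(adj_means, breaks = seq(60, 100, by = 10), plot = FALSE)
bins <- cut(adj_means, breaks = h$breaks, include.lowest = TRUE)
freq <- setNames(h$counts, levels(bins))
freq

# Class interval into which each check mean falls
check_bin <- cut(check_means, breaks = h$breaks, include.lowest = TRUE)
setNames(as.character(check_bin), names(check_means))
```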

7.5 gva.augmentedRCBD()

The genetic variability statistics such as mean, phenotypic, genotypic and environmental variance (@federerModelConsiderationsVariance1976), phenotypic, genotypic and environmental coefficient of variation (@burton_quantitative_1951, @burton_qualitative_1952), category of phenotypic and genotypic coefficient of variation according to @sivasubramaniam_genotypic_1973, broad-sense heritability (H^2^) (@lush_intra-sire_1940), H^2^ category according to @robinson_quantitative_1966, genetic advance (GA), genetic advance as per cent of mean (GAM) and GAM category according to @johnson_estimates_1955 are computed from an object of class augmentedRCBD as follows. Genetic variability analysis needs to be performed only if the sum of squares of "Treatment: Test" is significant.

# Genetic variability statistics for variable y1
gva.augmentedRCBD(out1)

# Genetic variability statistics for variable y2
gva.augmentedRCBD(out2)

Negative estimates of variance components, if computed, are not abnormal. For information on how to deal with these, refer to @robinson_genetic_1955 and @dudley_interpretation_1969.
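The relationships among these statistics can be sketched in base R using their textbook definitions. The helper gva_sketch, its argument names and the input values are hypothetical. It assumes that genotypic variance is taken as phenotypic minus environmental variance and that the selection differential k is 2.063 (5% selection intensity). The package's own computation may differ in detail.

```r
# Illustrative helper (not part of augmentedRCBD): genetic variability
# statistics from a trait mean, phenotypic variance (pv) and
# environmental variance (ev)
gva_sketch <- function(mean, pv, ev, k = 2.063) {
  gv  <- pv - ev                          # genotypic variance (assumed pv - ev)
  pcv <- sqrt(pv) / mean * 100            # phenotypic coefficient of variation
  gcv <- if (gv >= 0) sqrt(gv) / mean * 100 else NA_real_
  ecv <- sqrt(ev) / mean * 100            # environmental coefficient of variation
  hbs <- gv / pv * 100                    # broad-sense heritability (%)
  ga  <- k * sqrt(pv) * hbs / 100         # genetic advance
  gam <- ga / mean * 100                  # genetic advance as per cent of mean
  c(Mean = mean, PV = pv, GV = gv, EV = ev,
    PCV = pcv, GCV = gcv, ECV = ecv, hBS = hbs, GA = ga, GAM = gam)
}

# Hypothetical inputs
round(gva_sketch(mean = 80, pv = 50, ev = 18), 2)

# A negative genotypic variance estimate leaves GCV undefined (NA here)
round(gva_sketch(mean = 80, pv = 10, ev = 18), 2)
```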

7.6 report.augmentedRCBD()

The results generated by the analysis can be exported to an MS Word file as follows.

# MS Word report for variable y1
report.augmentedRCBD(aug = out1,
                     target = file.path(tempdir(), "augmentedRCBD output - y1.docx"))

# MS Word report for variable y2
report.augmentedRCBD(aug = out2,
                     target = file.path(tempdir(), "augmentedRCBD output - y2.docx"))
switch(out_type,
    html = cat('<img src="https://github.com/aravind-j/augmentedRCBD/raw/master/vignettes/augRCBDword.png" align="center" alt="MS Word report generated with the report.augmentedRCBD function.">'),
    latex = cat('\\includegraphics{augRCBDword.png}'))
switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Fig. 6</strong>: MS Word report generated with `report.augmentedRCBD` function.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Fig. 6}: MS Word report generated with \\texttt{report.augmentedRCBD} function.
                \\end{center}'))

8 Data Analysis for Multiple Traits

Data for multiple traits can be analysed simultaneously using the augmentedRCBD.bulk function, which generates an object of class augmentedRCBD.bulk. Such an object can then be passed to print.augmentedRCBD.bulk to print the results to the console. The results can also be exported as an MS Word report using the report.augmentedRCBD.bulk function.

if (requireNamespace("diagram", quietly = TRUE)) {
  suppressMessages(library(diagram))

# Plot matrix
elpos <- coordinates(pos = c(1, 2, 1))
elpos[c(-2, -3), 1] <- 0.833333
elpos[c(-2, -3), 2] <- elpos[c(-2, -3), 2] + c(-0.1, 0.1)
elpos[c(2, 3), 1] <- elpos[c(2, 3), 1] - c(0.1, 0.3)

par(mar = c(1, 1, 1, 1))
openplotmat()

text(elpos, lab = as.character(c(1:4)), cex = 2)

# Arrows
arrows <- data.frame(from = c(2, 3, 3), 
                     to   = c(3, 1, 4))

for (i in 1:dim(arrows)[1]) {
  straightarrow(from = elpos[arrows[i,1], ],
                to = elpos[arrows[i,2], ], arr.type = "curved", arr.lwd = 0.5,
                lwd = 2, arr.pos = 0.5, arr.length = 0.2, arr.width = 0.15,
                lcol = "midnightblue", arr.col = "midnightblue")
}

# Textbox
elpostext <- elpos[c(2, 3, 1, 4),]
flowtext <- c("Data", "augmentedRCBD.bulk",
              "print.augmentedRCBD.bulk", "report.augmentedRCBD.bulk")
flowfont <- c("sans", rep("sans", 3))
flowradx <- c(0.065, 0.11, 0.13, 0.13)
flowcex <- c(0.7, rep(0.7, 3))
flowtcol <- c("black", rep("dodgerblue4", 3))
for (i in 1:dim(elpostext)[1]) {
  textround(elpostext[i,], radx = flowradx[i], rady = 0.03, lab = flowtext[i],
           box.col = "white", shadow.col = "lightskyblue3", shadow.size = 0.005,
           family = flowfont[i], cex = flowcex[i], col = flowtcol[i], rx = 0.0075)
}

} else {
  print("package 'diagram' is required to generate this figure")
}

Fig. 7. Workflow for analysis of multiple traits with augmentedRCBD.

8.1 augmentedRCBD.bulk()

Consider the data frame data imported from Table 1 according to the instructions in section 4.8.

data <- data.frame(blk, trt, y1, y2)
str(data)
# Convert block and treatment to factors
data$blk <- as.factor(data$blk)
data$trt <- as.factor(data$trt)

Rather than performing the analysis separately for each variable/trait using augmentedRCBD, the analysis can be performed simultaneously for both traits using the augmentedRCBD.bulk function. It is a wrapper around the augmentedRCBD core function and its associated helper functions.

However, in this case treatment comparisons/grouping by the least significant difference or Tukey's honest significant difference method are not computed. Also, the output object size is reduced using the simplify = TRUE argument in the augmentedRCBD function.

The logical arguments describe, freqdist and gva can be used to specify whether to generate the descriptive statistics, frequency distribution plots and genetic variability statistics respectively. If gva = TRUE, then plots to compare phenotypic and genotypic coefficient of variation, broad sense heritability and genetic advance over mean between traits are also generated.

bout <- augmentedRCBD.bulk(data = data, block = "blk",
                           treatment = "trt", traits = c("y1", "y2"),
                           checks = NULL, alpha = 0.05, describe = TRUE,
                           freqdist = TRUE, gva = TRUE,
                           check.col = c("brown", "darkcyan",
                                         "forestgreen", "purple"),
                           console = TRUE)
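The wrapper idea described in this section, running the same single-trait analysis over several trait columns and collecting the results in a named list, can be sketched in base R. Everything in the sketch is hypothetical: the toy data are simulated, and analyse_trait is a stand-in that fits an ordinary block + treatment linear model rather than calling augmentedRCBD. It illustrates only the looping pattern, not the internals of augmentedRCBD.bulk.

```r
# Toy augmented layout: 3 blocks, checks C1 and C2 in every block,
# test entries T1..T6 appearing once each (all values are simulated)
set.seed(1)
toy <- data.frame(
  blk = factor(rep(1:3, each = 4)),
  trt = factor(c("C1", "C2", "T1", "T2",
                 "C1", "C2", "T3", "T4",
                 "C1", "C2", "T5", "T6"))
)
toy$y1 <- rnorm(nrow(toy), mean = 80, sd = 5)
toy$y2 <- rnorm(nrow(toy), mean = 250, sd = 20)

# Stand-in for the single-trait analysis (NOT the augmentedRCBD function):
# analysis of variance table of a block + treatment linear model
analyse_trait <- function(data, trait) {
  fit <- lm(reformulate(c("blk", "trt"), response = trait), data = data)
  anova(fit)
}

# Apply the same analysis to every trait and keep the results by trait name
traits <- c("y1", "y2")
results <- lapply(setNames(traits, traits),
                  function(tr) analyse_trait(toy, tr))

names(results)
results$y1
```

Keeping the results in a list named by trait makes it straightforward to pull out the same statistic for every trait afterwards, for example with sapply.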

8.2 print.augmentedRCBD.bulk()

The results of analysis in an object of class augmentedRCBD.bulk can be printed to the console as follows.

# Print results
print(bout)

8.3 report.augmentedRCBD.bulk()

The results generated by the analysis can be exported to an MS Word file as follows.

# MS Word report
report.augmentedRCBD.bulk(aug.bulk = bout,
                          target = file.path(tempdir(),
                          "augmentedRCBD bulk output.docx"))
switch(out_type,
    html = cat('<img src="https://github.com/aravind-j/augmentedRCBD/raw/master/vignettes/augRCBDbulkword.png" align="center" alt="MS Word report generated with the report.augmentedRCBD.bulk function.">'),
    latex = cat('\\includegraphics{augRCBDbulkword.png}'))
switch(out_type,
    html = cat('<p style="text-align: left;"><strong>Fig. 8</strong>: MS Word report generated with `report.augmentedRCBD.bulk` function.</p>'),
    latex = cat('\\begin{center}
                \\textbf{Fig. 8}: MS Word report generated with \\texttt{report.augmentedRCBD.bulk} function.
                \\end{center}'))

9 Citing augmentedRCBD

# detach("package:augmentedRCBD", unload=TRUE)
suppressPackageStartupMessages(library(augmentedRCBD))
cit <- citation("augmentedRCBD")
# yr <- format(Sys.Date(), "%Y")
# cit[1]$year <- yr
# oc <- class(cit)
# 
# cit <- unclass(cit)
# attr(cit[[1]],"textVersion") <- gsub("\\(\\)",
#                                      paste("\\(", yr, "\\)", sep = ""),
#                                      attr(cit[[1]],"textVersion"))
# class(cit) <- oc
cit

10 Session Info

sessionInfo()

References




augmentedRCBD documentation built on June 12, 2021, 9:06 a.m.