What does this tutorial cover?

This tutorial explains the input formats supported by IPCAPS.BIOC.

Required packages

In this tutorial, we will use the functions from IPCAPS.BIOC.

library(IPCAPS.BIOC)

Supported formats

IPCAPS.BIOC supports three input formats: binary PLINK, RData, and text.

Binary PLINK format

If your data are in PLINK text format, either PED or TPED, they can be converted to the binary BED format. The PLINK binary format consists of three files: bed, bim, and fam. To create these files with PLINK, use the option --make-bed. See the PLINK documentation for more details.
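
As a minimal sketch, the conversion can be driven from R by calling the PLINK executable. It assumes that a plink binary is on the PATH and that the text files share the hypothetical prefix mydata (mydata.ped and mydata.map). For TPED input, --tfile would replace --file.

```r
# Hypothetical file prefix: expects mydata.ped and mydata.map in the working directory
prefix <- "mydata"

# Arguments for PLINK: read the text fileset and write mydata.bed/.bim/.fam
plink.args <- c("--file", prefix, "--make-bed", "--out", prefix)

# Only run the conversion when the executable and the input file are available
plink.path <- unname(Sys.which("plink"))
if (nzchar(plink.path) && file.exists(paste0(prefix, ".ped"))) {
  system2(plink.path, args = plink.args)
} else {
  message("PLINK or input files not found; the command would be: plink ",
          paste(plink.args, collapse = " "))
}
```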

The package includes one example in binary PLINK format, consisting of the files ipcaps_example.bed, ipcaps_example.bim, and ipcaps_example.fam.

To locate these files, which are bundled with the package, use these commands:

BED.file <- system.file("extdata", 
                        "ipcaps_example.bed", 
                        package = "IPCAPS.BIOC")

BIM.file <- system.file("extdata", 
                        "ipcaps_example.bim", 
                        package = "IPCAPS.BIOC")

FAM.file <- system.file("extdata", 
                        "ipcaps_example.fam", 
                        package = "IPCAPS.BIOC")

print(BED.file)
print(BIM.file)
print(FAM.file)
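
Before passing a fileset on, it can help to confirm that all three files are present and that the bed file really is a PLINK binary file. The sketch below uses only base R; the helper names is.plink.bed and has.plink.fileset are hypothetical and not part of IPCAPS.BIOC. It checks the two magic bytes of the PLINK BED format (0x6c 0x1b) followed by the SNP-major mode byte (0x01), demonstrated on a temporary dummy fileset rather than real genotype data.

```r
# Return TRUE if the file starts with the PLINK BED magic number (0x6c 0x1b)
# followed by the SNP-major mode byte (0x01)
is.plink.bed <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 3)
  identical(header, as.raw(c(0x6c, 0x1b, 0x01)))
}

# Return TRUE if the .bed, .bim and .fam files all exist for a given prefix
has.plink.fileset <- function(prefix) {
  all(file.exists(paste0(prefix, c(".bed", ".bim", ".fam"))))
}

# Demonstration on a temporary dummy fileset (not real genotype data)
demo.prefix <- tempfile("demo")
writeBin(as.raw(c(0x6c, 0x1b, 0x01)), paste0(demo.prefix, ".bed"))
invisible(file.create(paste0(demo.prefix, c(".bim", ".fam"))))

print(has.plink.fileset(demo.prefix))
print(is.plink.bed(paste0(demo.prefix, ".bed")))
```

The same helpers could be applied to the path stored in BED.file.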

RData format

If your data are already in R, you can prepare an RData file in the format accepted by ipcaps(). This file must contain four variables, which can be listed with ls.str() after loading the example file, as shown later in this section.

For re-analysis, it is convenient to run ipcaps() on the file rawdata.RData generated by a previous ipcaps() run. This file is located in the RData subdirectory of the specified output directory and contains all of the required variables.

The package includes one example in RData format: ipcaps_example.RData.

To locate this file, which is bundled with the package, use these commands:

RDATA.file <- system.file("extdata", 
                          "ipcaps_example.RData", 
                          package = "IPCAPS.BIOC")

print(RDATA.file)

You can check the variables inside this file by using these commands:

RDATA.file <- system.file("extdata", 
                          "ipcaps_example.RData", 
                          package = "IPCAPS.BIOC")

load(RDATA.file)

# show all variables and objects
ls.str() 
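
Because load() restores objects into the current workspace, it can silently overwrite existing variables with the same names. A minimal sketch of a safer inspection loads the file into a separate environment instead. The variable names demo.matrix and demo.label are hypothetical placeholders and are not the names required by ipcaps().

```r
# Create a small RData file with hypothetical variable names for demonstration
demo.file <- tempfile(fileext = ".RData")
demo.matrix <- matrix(c(0, 1, 2, 1, 0, 2), nrow = 2)
demo.label <- c("pop1", "pop2")
save(demo.matrix, demo.label, file = demo.file)

# Load into a dedicated environment so that existing objects are not overwritten
rdata.env <- new.env()
loaded.names <- load(demo.file, envir = rdata.env)

# load() returns the names of the restored objects
print(loaded.names)

# Inspect the structure of each object inside the environment
print(ls.str(envir = rdata.env))
```

Replacing demo.file with RDATA.file would inspect the example file shipped with the package in the same way.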

Text format

The function ipcaps() supports a text format that contains a data matrix. The rows of the data matrix must represent samples or individuals, and the columns must represent features or SNPs. Columns can be separated by a single space or a tab. The data matrix must not contain row names or column names.

For text format, ipcaps() supports both SNP data and data matrices of continuous numbers (e.g. gene expression, peak counts, or cell counts). If the data type is 'snp', the data must use additive coding: 0 (homozygous allele), 1 (heterozygous allele), and 2 (mutant allele). If the data type is not SNP, the data must be numeric. Strings are not allowed in the text format.
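
The requirements above can be sketched with base R as follows. The matrix geno is a hypothetical, randomly generated example (4 individuals in rows, 5 SNPs in columns, following the layout described above). The file is written without row names, column names, or quotes, and then read back to confirm that it is numeric and additively coded.

```r
# Hypothetical genotype matrix: 4 individuals (rows) by 5 SNPs (columns)
set.seed(1)
geno <- matrix(sample(0:2, 20, replace = TRUE), nrow = 4, ncol = 5)

# Write without row names, column names or quotes, separated by a tab
geno.file <- tempfile(fileext = ".txt")
write.table(geno, file = geno.file, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read the file back and check that it is numeric and additively coded
geno.check <- as.matrix(read.table(geno.file, header = FALSE))
is.additive <- is.numeric(geno.check) && all(geno.check %in% c(0, 1, 2))
print(dim(geno.check))
print(is.additive)
```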

A text file can be either a plain text file or a text file compressed with gzip. A large text file should be divided into smaller files so that it loads faster. For instance, to supply three input files, use:

files <- c('input1.txt', 'input2.txt', 'input3.txt')
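
A gzip-compressed input file can be produced directly from R. The sketch below writes a hypothetical matrix of continuous values through a gzfile() connection and reads it back. The matrix expr and the temporary path are illustrative only.

```r
# Hypothetical continuous data matrix, e.g. expression values
set.seed(2)
expr <- matrix(round(rnorm(12), 3), nrow = 3)

# Write through a gzfile() connection to obtain a gzip-compressed text file
gz.path <- tempfile(fileext = ".txt.gz")
write.table(expr, file = gzfile(gz.path),
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# read.table() decompresses gzip files transparently
expr.check <- as.matrix(read.table(gz.path, header = FALSE))
print(dim(expr.check))
```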

The package includes one example in text format: ipcaps_example_rowVar_colInd.txt.gz.

To locate this file, which is bundled with the package, use these commands:

TEXT.file <- system.file("extdata", 
                         "ipcaps_example_rowVar_colInd.txt.gz", 
                         package = "IPCAPS.BIOC")

print(TEXT.file)

You can check the data inside this file by using these commands:

text.input <- read.table(TEXT.file)

# check the size of data matrix
print(dim(text.input))

# show one part of the data matrix
print(text.input[1:5, 1:10])

Additional information file

The additional information file holds useful information about the subjects (called "labels" in IPCAPS.BIOC), for example, geographic location or disease phenotype. These labels are used, one at a time, to display the clustering outcome of ipcaps(). The file is not used in the clustering process, so ipcaps() can be run without it, but providing it is recommended because it makes the results and visualization more meaningful.

A label file must be in text format and must contain at least one column. If it contains more than one column, the columns must be separated by a space or a tab. The file can be either a plain text file or a text file compressed with gzip.

The package includes one example of an additional information file: ipcaps_example_individuals.txt.gz.

To locate this file, which is bundled with the package, and to view its first rows, use these commands:

EXTRA.file <- system.file("extdata", 
                          "ipcaps_example_individuals.txt.gz",
                          package = "IPCAPS.BIOC")

extra.info <- read.table(file = EXTRA.file,
                         header = FALSE)

head(extra.info)
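
To prepare a label file of your own, a sketch along the following lines may help. It assumes one row per individual and no header line, matching the way the example file is read above. The population and phenotype values are invented for illustration.

```r
# Hypothetical labels: one row per individual, two columns (population, phenotype)
labels <- data.frame(population = c("EUR", "EUR", "ASN", "AFR"),
                     phenotype = c("case", "control", "case", "control"))

# Write as a gzip-compressed, space-separated text file without a header
label.path <- tempfile(fileext = ".txt.gz")
write.table(labels, file = gzfile(label.path),
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read it back; columns are named V1, V2, ... because there is no header
label.check <- read.table(label.path, header = FALSE)
print(ncol(label.check))
print(table(label.check$V1))
```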


kridsadakorn/ipcaps.bioc documentation built on Jan. 22, 2020, 11:18 p.m.