Home

/

Bioconductor

/

ADaCGH2

/

inputToADaCGH: Convert CGH data to ff or RAM objects for use with ADaCGH2

inputToADaCGH: Convert CGH data to ff or RAM objects for use with ADaCGH2
In ADaCGH2: Analysis of big data from aCGH experiments using parallel computing and ff objects

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/ADaCGH-2.R

Input data with CGH data are converted to several ff files and data checked for potential errors and location duplications.

inputToADaCGH(ff.or.RAM = "RAM",
                      robjnames = c("cgh.dat", "chrom.dat",
                                   "pos.dat", "probenames.dat"),
                      ffpattern = paste(getwd(), "/", sep = ""),
                      MAList = NULL,
                      cloneinfo = NULL,
                      RDatafilename = NULL,
                      textfilename = NULL,
                      dataframe = NULL,
                      path = NULL,
                      excludefiles = NULL,
                      cloneinfosep = "\t",
                      cloneinfoquote = "\"",
                      minNumPerChrom = 10,
                      verbose = FALSE,
                      mc.cores = floor(detectCores()/2))

`ff.or.RAM`	Whether you want to store the output as `ff` or RAM objects ("usual" R objects, such as data frames and vectors).
`robjnames`	Name of the objects that will be created in you use `ff.or.RAM = "RAM"`. If the names existing in the environment from where the function is called, they will not be overwritten, and the function will abort.
`ffpattern`	See argument `pattern` in `ff`. The default is to create the "ff" files in the current working directory.
`MAList`	The name of an object of class `MAList` (`as.MAList`) or `SegList` (e.g., `dim.SegList`). See vignnettes for these packages (limma and snapCGH) for details about these objects. You have to specify one, and only one, of `MAList`, `RDatafilename`, `textfilename`, `path`.
`cloneinfo`	A character vector with the full path to a file that conforms to the characteristics of `file` in function `read.clonesinfo` (see details in the vignette) or the name of a data frame with at least a column named "Chr" (with chromosomal informtaion) and "Position". This is only needed if you use `MAList` and your MAList object does not have Position and Chr columns.
`RDatafilename`	Name of data RData file that contains the data frame with original, non-ff, data. Note: this is the name of the RData file (possibly including path), NOT the name of the data frame. (For that, look at `dataframe`). The first three columns of the data frame are the IDs of the probes, the chromosome number, and the position, and all remaining columns contain the data for the arrays, one column per array. The names of the first three column do not matter, but the order does. Names of the remaining columns will be used if existing; otherwise, fake array names will be created. You have to specify one, and only one, of `MAList`, `RDatafilename`, `textfilename`, `path`.
`textfilename`	The name of a text file with the data. It should be a tab separated file, with a header. The first three columns of the data frame are the IDs of the probes, the chromosome number, and the position, and all remaining columns contain the data for the arrays, one column per array. The names of the first three column do not matter, but the order does. Names of the remaining columns will be used if existing; otherwise, fake array names will be created. You have to specify one, and only one, of `MAList`, `RDatafilename`, `textfilename`, `path`.
`dataframe`	The name of a data frame with the data. The first three columns of the data frame are the IDs of the probes, the chromosome number, and the position, and all remaining columns contain the data for the arrays, one column per array. The names of the first three column do not matter, but the order does. Names of the remaining columns will be used if existing; otherwise, fake array names will be created.
`path`	The name of the directory (the full path) to where each of the individual, one-column, files are. We will read ALL of the files in this directory, except for those listed under `excludefiles`. One file has to be named "ID.txt", another "Chrom.txt", and a third "Pos.txt". The rest of the files can be named any way you want and those are the files that contain the CGH data. All of the files are expected to be one-column text files, with a first row with a header. The header will not be used for "ID.txt", "Chrom.txt", or "Pos.txt", but the header will be used as the name of the array/subject for the CGH data files. You have to specify one, and only one, of `MAList`, `RDatafilename`, `textfilename`, `path`.
`excludefiles`	If you have specified `path`, names of files not to be read. A vector of strings. These should be the names of the files, without path information (as all of the files are in the same directory, as specified by `path`).
`cloneinfosep`	Argument to `read.table` if reading a `cloneinfo` file. Note: this is NOT used if reading a text file given in `textfilename`.
`cloneinfoquote`	Argument to `read.table` if reading a `cloneinfo` file. Note: this is NOT used if reading a text file given in `textfilename`.

`minNumPerChrom`	If any chromosome has fewer observations than minNumPerChrom the function will fail. This can help detect upstream pre-processing errors.
`verbose`	If TRUE, provide additional information that can be useful to debug problems. Right now it provides the list of files that will be read if using a directory. The default is FALSE.
`mc.cores`	The number of cores to use when reading files. This is always 1 in Windows. See details about the number of cores in `mclapply` and `detectCores`. Contention problems in I/O might be minimized by making this number smaller or much smaller than what is returned by `detectCores`. For long running jobs, please do some initial benchmarking. See comments and discussion in file "benchmark.pdf". In general, if you have a single SATA disk make this a small number (say, 2 or 3 or 6); in contrast, if you have many fast SAS disks in a RAID0 or RAID10 array, you can increase the number quite a bit (but generally always well below what detectCores gives).

If there are identical positions (in the same chromosome) a small random uniform variate is added to get unique locations.

We carry out several checks (e.g., no duplicated positions), but note that we DO NOT check for extremely large or small values, and this includes NOT CHECKING for infinite values.

Missing values are allowed in the data columns. However, we do not check for missing values in the ID, chromosome, or position columns, except if you are using as input an RData file or MA list. You better not have any missing values there; otherwise, things will break in strange ways. Why this inconsistency? Checking for missing values can consume a lot of resources (CPU and memory). If your are really huge, they will probably be stored as text files, and you are expected to use the appropriate tools there to filter (e.g., sed, awk, whatever). If they exist as an MA list or an RData file, they once fitted in RAM, so checking for these NAs is probably reasonable.

If you provide a text file as input (textfilename), the reading operation is carried out using read.table.ffdf, to allow for reading very large files. Using this option, however, does not force you to produce as output ff objects.

Commented examples of reading objects from limma and snapCGH are provided in the vignnette.

This function is used mainly for its side effects: writing either several ff files to the current working directory, or several RAM objects (the usual, in memory, local, R objects). The actual names are printed out.

Ramon Diaz-Uriarte rdiaz02@gmail.com

cutFile for obtaining files in the format needed if you read from a directory.

## Create a temp dir for storing output.
## (Not needed, but cleaner).
dir.create("ADaCGH2_example_input_dir")
originalDir <- getwd()
setwd("ADaCGH2_example_input_dir")
## Sys.sleep(1)

## Get location (and full filename) of example data file
fnameRData <- list.files(path = system.file("data", package = "ADaCGH2"),
                         full.names = TRUE, pattern = "inputEx.RData")

fnametxt <- list.files(path = system.file("data", package = "ADaCGH2"),
                         full.names = TRUE, pattern = "inputEx.txt")

namepath <- system.file("example-datadir", package = "ADaCGH2")

## Read from RData and write to ff
inputToADaCGH(ff.or.RAM = "ff",
              RDatafilename = fnameRData)

## Read from text file and write to ff
##    You might want to adapt mc.cores to your hardware
inputToADaCGH(ff.or.RAM = "ff",
              textfilename = fnametxt,
              mc.cores = 2)


## Read from text file and write to RAM
##    You might want to adapt mc.cores to your hardware
inputToADaCGH(ff.or.RAM = "RAM",
              textfilename = fnametxt,
              mc.cores = 2)

## Read from a directory and write to ff
##    You might want to adapt mc.cores to your hardware
inputToADaCGH(ff.or.RAM = "ff",
              path = namepath,
              mc.cores = 2)

### Clean up (DO NOT do this with objects you want to keep!!!)
load("chromData.RData")
load("posData.RData")
load("cghData.RData")

delete(cghData); rm(cghData)
delete(posData); rm(posData)
delete(chromData); rm(chromData)
unlink("chromData.RData")
unlink("posData.RData")
unlink("cghData.RData")
unlink("probeNames.RData")


### Running in a separate process. Only makes sense
### if returning ff objects (ff.or.RAM = "ff")
### This example will not work on Windows

## Not run: 
mcparallel(inputToADaCGH(ff.or.RAM = "ff",
                                 RDatafilename = fnameRData),
                                 silent = FALSE)
  tableChromArray <- mccollect()
  if(inherits(tableChromArray, "try-error")) {
    stop("ERROR in input data conversion")
  }
### Clean up (DO NOT do this with objects you want to keep!!!)
load("chromData.RData")
load("posData.RData")
load("cghData.RData")

delete(cghData); rm(cghData)
delete(posData); rm(posData)
delete(chromData); rm(chromData)
unlink("chromData.RData")
unlink("posData.RData")
unlink("cghData.RData")
unlink("probeNames.RData")

## End(Not run)

### Try to prevent problems in R CMD check
## Sys.sleep(2)

### Delete temp dir
setwd(originalDir)
## Sys.sleep(2)
unlink("ADaCGH2_example_input_dir", recursive = TRUE)
## Sys.sleep(2)

ADaCGH2 documentation built on Nov. 8, 2020, 4:57 p.m.

ADaCGH2 index

ADaCGH2 Overview

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ADaCGH2
Analysis of big data from aCGH experiments using parallel computing and ff objects

inputToADaCGH: Convert CGH data to ff or RAM objects for use with ADaCGH2
In ADaCGH2: Analysis of big data from aCGH experiments using parallel computing and ff objects

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Related to inputToADaCGH in ADaCGH2...

R Package Documentation

Browse R Packages

We want your feedback!

ADaCGH2 Analysis of big data from aCGH experiments using parallel computing and ff objects

inputToADaCGH: Convert CGH data to ff or RAM objects for use with ADaCGH2 In ADaCGH2: Analysis of big data from aCGH experiments using parallel computing and ff objects

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Related to inputToADaCGH in ADaCGH2...

R Package Documentation

Browse R Packages

We want your feedback!

ADaCGH2
Analysis of big data from aCGH experiments using parallel computing and ff objects

inputToADaCGH: Convert CGH data to ff or RAM objects for use with ADaCGH2
In ADaCGH2: Analysis of big data from aCGH experiments using parallel computing and ff objects