Input CGH data are converted to several ff files (or in-memory R objects), and the data are checked for potential errors and duplicated locations.
inputToADaCGH(ff.or.RAM = "RAM",
              robjnames = c("cgh.dat", "chrom.dat",
                            "pos.dat", "probenames.dat"),
              ffpattern = paste(getwd(), "/", sep = ""),
              MAList = NULL,
              cloneinfo = NULL,
              RDatafilename = NULL,
              textfilename = NULL,
              dataframe = NULL,
              path = NULL,
              excludefiles = NULL,
              cloneinfosep = "\t",
              cloneinfoquote = "\"",
              minNumPerChrom = 10,
              verbose = FALSE,
              mc.cores = floor(detectCores()/2))
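ff.or.RAM
Whether you want to store the output as ff objects or as RAM objects (the usual, in-memory R objects).
robjnames
Names of the objects that will be created if you use ff.or.RAM = "RAM".
ffpattern
See argument
MAList
The name of an object of class MAList. You have to specify one, and only one, of MAList, RDatafilename, textfilename, dataframe, or path.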
cloneinfo
A character vector with the full path to a file that conforms to the characteristics of This is only needed if you use
RDatafilename
Name of the RData file that contains the data frame with the original, non-ff, data. Note: this is the name of the RData file (possibly including the path), NOT the name of the data frame. (For that, look at dataframe.) The first three columns of the data frame are the IDs of the probes, the chromosome number, and the position; all remaining columns contain the data for the arrays, one column per array. The names of the first three columns do not matter, but their order does. Names of the remaining columns will be used if they exist; otherwise, fake array names will be created. You have to specify one, and only one, of MAList, RDatafilename, textfilename, dataframe, or path.
textfilename
The name of a text file with the data. It should be a tab-separated file with a header. The first three columns of the file are the IDs of the probes, the chromosome number, and the position; all remaining columns contain the data for the arrays, one column per array. The names of the first three columns do not matter, but their order does. Names of the remaining columns will be used if they exist; otherwise, fake array names will be created. You have to specify one, and only one, of MAList, RDatafilename, textfilename, dataframe, or path.
dataframe
The name of a data frame with the data. The first three columns of the data frame are the IDs of the probes, the chromosome number, and the position; all remaining columns contain the data for the arrays, one column per array. The names of the first three columns do not matter, but their order does. Names of the remaining columns will be used if they exist; otherwise, fake array names will be created.
path
The name of the directory (the full path) where each of the individual, one-column files is located. We will read ALL of the files in this directory, except for those listed under excludefiles. All of the files are expected to be one-column text files whose first row is a header. The header will not be used for "ID.txt", "Chrom.txt", or "Pos.txt", but it will be used as the name of the array/subject for the CGH data files. You have to specify one, and only one, of MAList, RDatafilename, textfilename, dataframe, or path.
excludefiles
If you have specified path, the names of the files in that directory that should not be read.
cloneinfosep
Argument to
cloneinfoquote
Argument to
minNumPerChrom
If any chromosome has fewer observations than minNumPerChrom, the function will fail. This can help detect upstream preprocessing errors.
verbose
If TRUE, provide additional information that can be useful for debugging problems. Currently, it lists the files that will be read when reading from a directory. The default is FALSE.
mc.cores
The number of cores to use when reading files. This is always 1 on Windows. See details about the number of cores in
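The RData, text-file, and directory inputs described above all carry the same information: probe ID, chromosome, position, then one column per array. A minimal base-R sketch that prepares toy data in each of these three formats follows. The data values, the object name cghDF, the directory names, and the array file names are hypothetical (only "ID.txt", "Chrom.txt", and "Pos.txt" come from the description above), and it assumes the RData file should hold just the one data frame.

```r
# Hypothetical toy data: probe ID, chromosome, position, then one
# column per array. Values are invented for illustration only.
cghDF <- data.frame(
  ID     = paste0("probe", 1:6),
  chrom  = c(1, 1, 1, 2, 2, 2),
  pos    = c(100, 200, 300, 150, 250, 350),
  arrayA = c(0.10, -0.20, 0.05, 0.30, 0.25, -0.10),
  arrayB = c(-0.05, 0.15, 0.00, 0.20, 0.35, 0.05)
)

outDir <- file.path(tempdir(), "adacgh_input_sketch")  # hypothetical location
dir.create(outDir, showWarnings = FALSE)

# 1. RData file: RDatafilename takes the file path,
#    not the name of the data frame stored inside it.
rdataFile <- file.path(outDir, "cghInput.RData")
save(cghDF, file = rdataFile)

# 2. Tab-separated text file with a header row (for textfilename).
txtFile <- file.path(outDir, "cghInput.txt")
write.table(cghDF, file = txtFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# 3. Directory of one-column files, each with a header row (for path).
#    Keep only these files in it, since every file is read unless
#    it is listed in excludefiles.
colDir <- file.path(outDir, "columns")
dir.create(colDir, showWarnings = FALSE)
writeOneColumn <- function(values, header, fname) {
  # First line is the header, then one value per line
  writeLines(c(header, as.character(values)), file.path(colDir, fname))
}
writeOneColumn(cghDF$ID, "ID", "ID.txt")
writeOneColumn(cghDF$chrom, "Chrom", "Chrom.txt")
writeOneColumn(cghDF$pos, "Pos", "Pos.txt")
for (arr in c("arrayA", "arrayB")) {
  # The header of each data file becomes the array name
  writeOneColumn(cghDF[[arr]], arr, paste0(arr, ".txt"))
}
```

Any one of these could then be passed on its own, as RDatafilename = rdataFile, textfilename = txtFile, or path = colDir.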
If there are identical positions within the same chromosome, a small random uniform variate is added to obtain unique locations.
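The tie-breaking idea can be sketched in a few lines of base R. This is not the package's own code: the helper name jitterDuplicatedPos and the jitter size eps are assumptions, with eps chosen well below the spacing of integer base-pair positions so a jittered value cannot land on the next position.

```r
# Hypothetical helper: add a small uniform variate to every (chrom, pos)
# pair that repeats an earlier one; first occurrences are left unchanged.
jitterDuplicatedPos <- function(chrom, pos, eps = 1e-3) {
  dup <- duplicated(data.frame(chrom, pos))
  pos[dup] <- pos[dup] + runif(sum(dup), min = 0, max = eps)
  pos
}

set.seed(1)
chrom  <- c(1, 1, 1, 2, 2)
pos    <- c(100, 100, 200, 100, 300)  # 100 repeats only on chromosome 1
newPos <- jitterDuplicatedPos(chrom, pos)
```

Only the second probe is moved; position 100 on chromosome 2 is not a duplicate because duplicates are defined within a chromosome.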
We carry out several checks (e.g., no duplicated positions), but note that we DO NOT check for extremely large or small values, and this includes NOT CHECKING for infinite values.
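Because infinite values are not checked, it may be worth screening the data columns yourself before conversion. A minimal base-R sketch, assuming the column layout described under the arguments (three annotation columns, then arrays) and hypothetical toy data; replacing infinite values with NA is only one possible choice, relying on missing values being allowed in the data columns.

```r
# Hypothetical toy data with infinite values in the array columns
cghDF <- data.frame(
  ID     = paste0("probe", 1:3),
  chrom  = c(1, 1, 2),
  pos    = c(100, 200, 150),
  arrayA = c(0.1, Inf, -0.2),
  arrayB = c(0.3, 0.1, -Inf)
)

dataCols <- 4:ncol(cghDF)                # everything after ID, chromosome, position
dataMat  <- as.matrix(cghDF[, dataCols])

# Locate infinite entries (row = probe, col = array)
infIdx <- which(is.infinite(dataMat), arr.ind = TRUE)
print(infIdx)

# One option: turn them into NA, since missing values are allowed
# in the data columns
dataMat[is.infinite(dataMat)] <- NA
cghDF[, dataCols] <- dataMat
```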
Missing values are allowed in the data columns. However, we do not check for missing values in the ID, chromosome, or position columns, except if you are using an RData file or an MA list as input. You had better not have any missing values there; otherwise, things will break in strange ways. Why this inconsistency? Checking for missing values can consume a lot of resources (CPU and memory). If your data are really huge, they will probably be stored as text files, and you are expected to use the appropriate tools there to filter them (e.g., sed, awk, whatever). If they exist as an MA list or an RData file, they once fitted in RAM, so checking for these NAs is probably reasonable.
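For data that fit in memory (for example, before saving them to an RData file or writing them as text), a quick base-R check of the three annotation columns could look like the sketch below. The toy data are hypothetical, and dropping incomplete rows is only one option; fixing them upstream may be preferable.

```r
# Hypothetical toy data: missing values in ID, chromosome, and position
cghDF <- data.frame(
  ID     = c("probe1", "probe2", NA, "probe4"),
  chrom  = c(1, 1, 2, NA),
  pos    = c(100, NA, 150, 250),
  arrayA = c(NA, 0.1, 0.2, 0.3)     # NAs in data columns are allowed
)

# Rows with a missing ID, chromosome, or position
annotOK <- complete.cases(cghDF[, 1:3])
badRows <- which(!annotOK)
badRows   # rows 2, 3 and 4

# Keep only rows with complete annotation; NAs in arrayA are kept
cghClean <- cghDF[annotOK, , drop = FALSE]
```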
If you provide a text file as input (textfilename), the reading operation is carried out using read.table.ffdf, to allow for reading very large files. Using this option, however, does not force you to produce ff objects as output.
Commented examples of reading objects from limma and snapCGH are provided in the vignette.
This function is used mainly for its side effects: writing either several ff files to the current working directory, or several RAM objects (the usual, in-memory, local R objects). The actual names are printed out.
Ramon Diaz-Uriarte rdiaz02@gmail.com
cutFile for obtaining files in the format needed if you read from a directory.
## Load the package (ff and parallel provide delete(), mcparallel(), mccollect())
library(ADaCGH2)
library(ff)
library(parallel)
## Create a temp dir for storing output.
## (Not needed, but cleaner).
dir.create("ADaCGH2_example_input_dir")
originalDir <- getwd()
setwd("ADaCGH2_example_input_dir")
## Sys.sleep(1)
## Get location (and full filename) of example data file
fnameRData <- list.files(path = system.file("data", package = "ADaCGH2"),
                         full.names = TRUE, pattern = "inputEx.RData")
fnametxt <- list.files(path = system.file("data", package = "ADaCGH2"),
                       full.names = TRUE, pattern = "inputEx.txt")
namepath <- system.file("example-datadir", package = "ADaCGH2")
## Read from RData and write to ff
inputToADaCGH(ff.or.RAM = "ff",
              RDatafilename = fnameRData)
## Read from a text file and write to ff
## You might want to adapt mc.cores to your hardware
inputToADaCGH(ff.or.RAM = "ff",
              textfilename = fnametxt,
              mc.cores = 2)
## Read from a text file and write to RAM
## You might want to adapt mc.cores to your hardware
inputToADaCGH(ff.or.RAM = "RAM",
              textfilename = fnametxt,
              mc.cores = 2)
## Read from a directory and write to ff
## You might want to adapt mc.cores to your hardware
inputToADaCGH(ff.or.RAM = "ff",
              path = namepath,
              mc.cores = 2)
### Clean up (DO NOT do this with objects you want to keep!!!)
load("chromData.RData")
load("posData.RData")
load("cghData.RData")
delete(cghData); rm(cghData)
delete(posData); rm(posData)
delete(chromData); rm(chromData)
unlink("chromData.RData")
unlink("posData.RData")
unlink("cghData.RData")
unlink("probeNames.RData")
### Running in a separate process. Only makes sense
### if returning ff objects (ff.or.RAM = "ff")
### This example will not work on Windows
## Not run:
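mcparallel(inputToADaCGH(ff.or.RAM = "ff",
                         RDatafilename = fnameRData),
           silent = FALSE)
tableChromArray <- mccollect()
## mccollect() returns a list with one element per job; an error
## inside the job is returned as an object of class "try-error"
if(inherits(tableChromArray[[1]], "try-error")) {
  stop("ERROR in input data conversion")
}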
### Clean up (DO NOT do this with objects you want to keep!!!)
load("chromData.RData")
load("posData.RData")
load("cghData.RData")
delete(cghData); rm(cghData)
delete(posData); rm(posData)
delete(chromData); rm(chromData)
unlink("chromData.RData")
unlink("posData.RData")
unlink("cghData.RData")
unlink("probeNames.RData")
## End(Not run)
### Try to prevent problems in R CMD check
## Sys.sleep(2)
### Delete temp dir
setwd(originalDir)
## Sys.sleep(2)
unlink("ADaCGH2_example_input_dir", recursive = TRUE)
## Sys.sleep(2)