fret_stats: Calculate FRET test statistics
In jean997/fret: Association testing with spatially structured phenotypes


fret_stats(pheno_file_list, trait_file, mode = c("dry_run", "s0_only",
  "full"), s0, zmin, z0 = if (!missing(zmin)) 0.3 * zmin,
  s0_est_size = pheno_file_list[1], pheno_transformation = NULL,
  trait = "x", covariates = c(), sample = "name", stat_type = c("huber",
  "lm", "qp", "custom"), stat_fun = NULL, resid_fun = NULL, libs = c(),
  seed, n_perm = 0, bandwidth = 151, smoother = c("ksmooth_0", "ksmooth",
  "none"), chunksize = 1e+05, which_chunks = "all", temp_dir = "./",
  temp_prefix = NULL, labels = pheno_file_list, cores = 1)

`pheno_file_list`	Name or vector of names of genomic phenotype files
`trait_file`	Name of trait file
`mode`	One of "dry_run", "s0_only", or "full". See description for details.
`s0`	Variance inflation constant. If missing, s0 will be estimated using a random chunk of the data. (See description and s0_est_size argument)
`zmin`	Minimum threshold. If missing, will be set to the 90th percentile of a sample of test statistics (See details below).
`z0`	Merging threshold. If missing, z0 is set to 0.3*zmin.
`s0_est_size`	Number of test statistics used to estimate s0. Alternatively, s0_est_size may be a character string giving a phenotype file name or may be "all" to use all files.
`pheno_transformation`	If the phenotype is to be transformed, provide a function that takes one argument (the phenotype vector) and returns the transformed phenotype.
`trait`	Name of trait (should match header of trait_file)
`covariates`	List of covariates to adjust for (names should match header of trait_file)
`stat_type`	Type of test statistic. May be one of "huber" or "lm". The usage signature also lists "qp" and "custom".
`seed`	Seed (used for permutations). If missing, a random seed will be chosen and stored.
`n_perm`	Number of permutations. If n_perm = 0, only test statistics for unpermuted data will be calculated.
`bandwidth`	Smoothing bandwidth
`chunksize`	Size of chunks to read at one time
`which_chunks`	Either a vector of chunk numbers or "all". See "Running on a cluster or in parallel" below.
`temp_dir`	Directory to write temporary chunk output to
`temp_prefix`	Prefix to use for chunk output
`labels`	Vector of labels for each phenotype file. These will be used in results tables and also in temporary file names.
`cores`	Number of cores to use. Using more than one requires the parallel package.
`huber_maxit`	Maximum iterations for Huber estimator.
`smoother`	Choice of smoother for smoothing test statistics. Can be one of "ksmooth_0", "ksmooth", or "none". See details below.

mode: The function can run in one of three modes. If mode="dry_run" it will report information about the data provided, the model and the number of chunks and then exit. If mode = "s0_only" it will report this information, estimate s0 and zmin and exit. If mode="full" it will proceed to calculate all test statistics and permutation test statistics for the specified chunks.
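The three modes suggest a staged workflow: check the setup, estimate the thresholds, then run the full analysis. The following is a minimal sketch of that sequence, not a tested recipe. The file names are hypothetical, the wrapper run_fret is introduced only for illustration, and the calls are skipped unless the fret package is installed and the files exist.

```r
# Hypothetical file names; replace with real paths.
pheno_files <- c("chr1.pheno.txt", "chr2.pheno.txt")
trait_file  <- "traits.txt"

# Illustrative wrapper that varies only the mode; all other arguments
# keep the defaults shown in the usage signature.
run_fret <- function(mode, ...) {
  fret::fret_stats(pheno_file_list = pheno_files, trait_file = trait_file,
                   mode = mode, trait = "x", ...)
}

# Only run when the package and the (hypothetical) input files are available.
inputs_ready <- requireNamespace("fret", quietly = TRUE) &&
  all(file.exists(c(pheno_files, trait_file)))

if (inputs_ready) {
  run_fret("dry_run")              # report data, model, and number of chunks, then exit
  run_fret("s0_only")              # also estimate s0 and zmin, then exit
  run_fret("full", n_perm = 100)   # test statistics plus permutation statistics
}
```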

Running on a cluster or in parallel: The which_chunks argument is intended to make it easy to break a very large job into many small jobs that can be submitted to a cluster. It can also help with resuming an analysis that was interrupted. To limit memory requirements, the data are read in and analyzed one chunk of size chunksize at a time. The results for each chunk are then written to disk in files named temp_dir/temp_prefix-label.chunknum.RDS. Make sure that temp_dir has enough space to store a large number of test statistics. If which_chunks = "all", these temporary files will be automatically aggregated into a single set of results. If the analysis is conducted over many jobs, the user will need to call the collect_fret_stats function to aggregate the results themselves (see documentation for collect_fret_stats). In addition to splitting chunks over many nodes or many jobs, the cores parameter can be used to perform calculations using multiple cores via the parallel package.
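As a rough illustration of how chunks might be divided among cluster jobs, the base R sketch below computes a chunk count, assigns chunk numbers to jobs, and builds file names following the pattern described above. The row count, the number of jobs, and the helper chunk_file are assumptions made for this example. How fret numbers chunks across several phenotype files is not stated here, so the chunk count reported by mode = "dry_run" should be preferred over this arithmetic.

```r
# Hypothetical sizes, for illustration only.
n_rows    <- 1250000   # rows in one phenotype file (assumed)
chunksize <- 1e5       # default from the usage signature
n_jobs    <- 4         # number of cluster jobs (assumed)

n_chunks  <- ceiling(n_rows / chunksize)
chunk_ids <- seq_len(n_chunks)

# Assign chunk numbers to jobs round-robin; each list element is a
# candidate value for the which_chunks argument of one job.
job_chunks <- split(chunk_ids, (chunk_ids - 1) %% n_jobs + 1)

# Illustrative helper following the documented naming pattern
# temp_dir/temp_prefix-label.chunknum.RDS (not a function from fret).
chunk_file <- function(temp_dir, temp_prefix, label, chunknum) {
  file.path(temp_dir, paste0(temp_prefix, "-", label, ".", chunknum, ".RDS"))
}

job_chunks[["1"]]
chunk_file("tmp", "run1", "chr1", job_chunks[["1"]])
```

Each job would then pass its own vector as which_chunks, and collect_fret_stats would be called once all jobs have finished.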

Smoother choice: There are three options for smoothing test statistics: "ksmooth_0", "ksmooth", and "none". "ksmooth_0" is a box kernel smoother for observations made at integer positions. It assumes that observations at missing positions are equal to 0. This is an appropriate choice for DNase-seq and similar data types: in DNase-seq data, a position that is absent from the data has 0 cleavages observed in every sample, so the test statistic at that position is 0. "ksmooth" is a box kernel smoother that treats observations at missing positions as missing. This is appropriate for bisulfite sequencing data. "none" presumably leaves the test statistics unsmoothed.
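The difference between the two box kernel smoothers can be seen on a toy example. The sketch below is not fret's implementation; box_smooth is a hypothetical helper, the window width is counted in positions, and windows are simply truncated at the ends of the grid. It only shows how treating absent positions as 0 versus as missing changes the smoothed values.

```r
# Observed positions and test statistics; positions 3 and 4 are absent.
pos   <- c(1, 2, 5, 6)
stat  <- c(2, 4, 6, 8)
width <- 3   # window width in positions (odd; toy value, far below the default bandwidth)

# Place the statistics on a full integer grid, with NA where nothing was observed.
grid    <- seq(min(pos), max(pos))
on_grid <- rep(NA_real_, length(grid))
on_grid[match(pos, grid)] <- stat

# Hypothetical box smoother illustrating the two treatments of absent positions.
box_smooth <- function(y, width, missing_as_zero) {
  half <- (width - 1) / 2
  vapply(seq_along(y), function(i) {
    window   <- y[max(1, i - half):min(length(y), i + half)]
    observed <- window[!is.na(window)]
    if (missing_as_zero) {
      sum(observed) / width   # absent positions contribute 0 to the average
    } else if (length(observed) == 0) {
      NA_real_                # nothing observed in this window
    } else {
      mean(observed)          # average over observed positions only
    }
  }, numeric(1))
}

smooth_zero <- box_smooth(on_grid, width, missing_as_zero = TRUE)   # in the spirit of "ksmooth_0"
smooth_na   <- box_smooth(on_grid, width, missing_as_zero = FALSE)  # in the spirit of "ksmooth"
```

With zero filling, the smoothed values are pulled toward 0 around the gap at positions 3 and 4; with missing values ignored, the gap is bridged by the neighbouring observations.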

Estimating s0, zmin, and z0: If s0 is not provided, it will be estimated from the data (if mode = "s0_only" or "full"). The function will use the amount of data specified by the s0_est_size argument. If this argument is a file name, all the data in that file will be used. If it is an integer, that number of data points will be used. If fewer than 1,000,000 data points are used, the estimate may be unstable and a warning will be given. If zmin is missing, it will be set to the 90th percentile of the test statistics in the data sample (after correcting using s0). If z0 is missing, it will be set to 0.3*zmin.
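The default threshold rules amount to a quantile and a multiplication. The sketch below applies them to simulated values that stand in for a sample of s0-corrected test statistics. The simulated sample and its size are assumptions, and whether fret takes the percentile of signed or absolute statistics is not stated here, so the signed values are used as-is.

```r
set.seed(1)

# Stand-in for a sample of s0-corrected test statistics (simulated, not real data).
stat_sample <- rnorm(1e4)

# zmin: 90th percentile of the sampled statistics.
zmin <- unname(quantile(stat_sample, probs = 0.9))

# z0: default merging threshold, 0.3 * zmin.
z0 <- 0.3 * zmin

# The documentation warns that estimates may be unstable below 1,000,000 data points.
if (length(stat_sample) < 1e6) {
  message("Sample has fewer than 1,000,000 statistics; estimates may be unstable.")
}
```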

jean997/fret documentation built on May 18, 2019, 11:43 p.m.