R::construct.model.matrix

Introduction

This package came about as a replacement for the corresponding script shared-source/construct_model_matrix.R within the repository NCI-CGR/plco-analysis. The goal of the package was to generate a variant of the script that is modularized, extensible, testable, installable, and just generally better. Happily, that all seems to have been achieved as v1.0.0 approaches! The addition of formal test cases with testthat (set up via usethis) in particular has been a relief.

There isn't much reason to install this package other than for integration with NCI-CGR/plco-analysis; but now that it lives in its own repo, it's at least theoretically possible for someone to extend this package or use it for other purposes.

Installation

Dependencies

Certain installation methods require manual installation of dependencies, since this package is extremely unlikely to ever end up on CRAN or conda. If needed, the required dependencies are: devtools, stringr, and data.table. These can be installed under those names within R using install.packages, or from conda under those names with r- prepended (for example, r-data.table).
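As a minimal sketch of the manual dependency step described above, the following checks which of the three required packages are missing and installs only those. The helper names find.missing.packages and install.missing.packages are hypothetical (they are not part of this package), and the CRAN mirror URL is an assumption; substitute whichever mirror you normally use.

```r
# Dependencies named in this README.
required.packages <- c("devtools", "stringr", "data.table")

# Hypothetical helper: return the subset of `packages` that cannot be loaded
# from the current R library.
find.missing.packages <- function(packages) {
  is.installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!is.installed]
}

# Hypothetical helper: install whatever is missing.
# The default mirror below is an assumption, not a requirement of this package.
install.missing.packages <- function(packages,
                                     repos = "https://cloud.r-project.org") {
  missing.packages <- find.missing.packages(packages)
  if (length(missing.packages) > 0) {
    utils::install.packages(missing.packages, repos = repos)
  }
  invisible(missing.packages)
}
```

Calling install.missing.packages(required.packages) once from an R session with network access should be enough; it does nothing if all three packages are already present.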

With a tarball

This repo is formatted as a CRAN-compliant R package and can be installed using the standard R package installation methods. It's not on CRAN, but if you have a tarball of this repository, you should be able to install it with the following command:

R CMD INSTALL construct.model.matrix-1.0.0.tar.gz

With git/devtools

Possibly the most practical option is to use the same process I'm using during development and testing. Unfortunately, since this package is not on CRAN or conda, you have to handle the dependencies yourself, as mentioned above. Then use devtools to install the package; install_github fetches the repo from GitHub for you, so no manual clone is needed:

R

require(devtools)

devtools::install_github("https://github.com/NCI-CGR/construct.model.matrix")

Input Data and Formats

The main entry point function in this package, construct.model.matrix, builds a model matrix given a series of parameter specifications. It is designed to replace the functionality of the NCI-CGR/plco-analysis script shared-source/construct_model_matrix.R.

The primary function construct.model.matrix accepts the following arguments:

Output Format

The data output format is consistent with the format established in NCI-CGR/plco-analysis. The first output row is a header of column names. The first two columns are "FID" and "IID"; the naming convention follows traditional PLINK phenotype files, but the subject IDs are by default derived from the "plco_id" column of the PLCO backend phenotype files. The next column is always the single phenotype outcome; if the trait is continuous, it will have been inverse normalized. The remaining columns are any additional covariates in the model, in the order specified to covariate.list.csv.
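To illustrate what inverse normalization of a continuous trait typically means, here is a sketch of one common definition, the rank-based inverse normal transform. This is an assumption for illustration only: the function name inverse.normal.transform, the rank offset, and the handling of ties and missing values are not taken from this package, and the package's own transform may differ in those details.

```r
# Hypothetical helper: rank-based inverse normal transform.
# Each observed value is replaced by the standard normal quantile of its
# rescaled rank; missing values stay missing.
# offset = 0.5 gives (rank - 0.5) / n; offset = 3/8 gives the Blom variant.
inverse.normal.transform <- function(x, offset = 0.5) {
  observed <- !is.na(x)
  n <- sum(observed)
  ranks <- rank(x[observed], ties.method = "average")
  result <- rep(NA_real_, length(x))
  result[observed] <- stats::qnorm((ranks - offset) / (n - 2 * offset + 1))
  result
}

# Invented example values; the NA is carried through untouched.
transformed <- inverse.normal.transform(c(3.1, NA, 1.2, 2.5))
```

The transformed values keep the ordering of the original trait but are spread symmetrically around zero, which is why the phenotype column of a continuous trait no longer shows the original units.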

The remaining rows each correspond to a single subject from the backend phenotype file, after filtering out subjects not requested by the relevant parameters (e.g. ancestry, chip, category.filename, sex.specific, control.inclusion.filename, control.exclusion.filename). The subjects are guaranteed to be in the same order in which they are encountered in the backend phenotype file. Entries are tab-delimited. There is no row ID column. String entries are not enclosed in quotation marks. The output file is plain text, not compressed.
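The layout described above can be read back with base R as follows. This is only a sketch: the file contents, the phenotype name "bmi", and the covariates "age" and "sex" are invented so that the example is self-contained, and a real output file will have whatever phenotype and covariates were requested.

```r
# Invented example file in the described layout: header row, tab-delimited,
# no row ID column, unquoted strings, plain text.
example.file <- tempfile(fileext = ".tsv")
writeLines(c(
  "FID\tIID\tbmi\tage\tsex",
  "A001\tA001\t-0.52\t61\tmale",
  "A002\tA002\t1.13\t58\tfemale"
), example.file)

# quote = "" because string entries are not enclosed in quotation marks.
model.matrix.data <- read.table(example.file,
                                header = TRUE,
                                sep = "\t",
                                quote = "",
                                comment.char = "",
                                stringsAsFactors = FALSE,
                                check.names = FALSE)

# Split the columns by role, following the column order described above.
id.columns <- colnames(model.matrix.data)[1:2]
phenotype.column <- colnames(model.matrix.data)[3]
covariate.columns <- colnames(model.matrix.data)[-(1:3)]
```

Because the column roles are positional, downstream code can rely on the third column being the phenotype without knowing its name in advance.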

Version History

21 January 2021: migrate to GitHub, reset to v1.0.0 on that platform.

17 December 2020: release candidate: v1.1.0! now with speeeeeedy loading.

16 December 2020: release candidate: v1.0.0! this has gone very smoothly.

14 December 2020: initial migration of version from plco.analysis.



NCI-CGR/construct.model.matrix documentation built on Aug. 10, 2021, 8:53 a.m.