learnFamilyBasedDAGs: Learning of Gaussian Directed Acyclic PGMs from Family Data


View source: R/learnFamilyBasedPGMs.R

Description

The CPDAG representing the Markov equivalence class to which the Gaussian directed acyclic PGM belongs, together with its decomposition into genetic and environmental components, is learned from observational family data. Learning applies the IC/PC algorithm \insertCite{pearl2000causality,spirtes2000causation}{FamilyBasedPGMs}, using the zero partial correlation tests derived by \insertCite{ribeiro2019family;textual}{FamilyBasedPGMs} as d-separation oracles.
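To illustrate what "a zero partial correlation test used as a d-separation oracle" means, the sketch below uses the classical Fisher z test for independent Gaussian observations. This is a stand-in technique, not the family-based test used by this package, and the helper name `fisher_z_pcor_test` is hypothetical. In a chain x -> y -> z, the test is expected to reject marginal independence of x and z, while conditioning on y should usually leave no evidence of dependence.

```r
# Hypothetical helper (not part of FamilyBasedPGMs): Fisher z test of
# H0: the partial correlation between columns i and j given columns S is zero.
# Assumes i.i.d. Gaussian rows, which does NOT hold for family data.
fisher_z_pcor_test <- function(data, i, j, S = integer(0)) {
  vars <- c(i, j, S)
  P <- solve(cor(data[, vars, drop = FALSE]))  # precision matrix of the subset
  r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])      # partial correlation of i and j given S
  n <- nrow(data)
  stat <- sqrt(n - length(S) - 3) * atanh(r)   # approximately N(0, 1) under H0
  2 * pnorm(-abs(stat))                        # two-sided p-value
}

# Simulated chain x -> y -> z, so x and z are d-separated given y.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- x + rnorm(n)
z <- y + rnorm(n)
dat <- cbind(x = x, y = y, z = z)

alpha <- 0.05
p_marginal    <- fisher_z_pcor_test(dat, 1, 3)         # x vs z, empty conditioning set
p_conditional <- fisher_z_pcor_test(dat, 1, 3, S = 2)  # x vs z given y
```

The PC algorithm removes the edge between two variables as soon as some conditioning set yields a p-value above `alpha`; the family-based tests play the same role while accounting for relatedness among subjects.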

These tests are based on univariate polygenic linear mixed models \insertCite{almasy1998multipoint}{FamilyBasedPGMs} with two variance components: the polygenic or family-specific random effect, which models the phenotypic variability across families, and the environmental or subject-specific error, which models the phenotypic variability that remains after removing the familial aggregation effect.
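For orientation, a common way to write the univariate polygenic mixed model for the phenotype vector of one family is given below. The notation is an assumption made here for illustration and is not taken from the package source: y is the phenotype vector, X the covariate design matrix, g the polygenic random effect, and ε the environmental error, with g and ε independent.

```latex
y = X\beta + g + \varepsilon, \qquad
g \sim N\!\left(0,\; 2\Phi\,\sigma_g^2\right), \qquad
\varepsilon \sim N\!\left(0,\; I\,\sigma_e^2\right)
```

Here \Phi denotes the kinship matrix of the family, \sigma_g^2 the polygenic variance, and \sigma_e^2 the environmental variance, so that \operatorname{Var}(y) = 2\Phi\,\sigma_g^2 + I\,\sigma_e^2.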

Usage

learnFamilyBasedDAGs(phen.df, covs.df, pedigrees, sampled, fileID,
  dirToSave, alpha = 0.05, max_cores = NULL, minK = 10,
  maxFC = 0.01, orthogonal = TRUE, hidden_vars = FALSE,
  maj.rule = TRUE, useGPU = FALSE, debug = TRUE, savePlots = FALSE,
  logFile = NULL)

Arguments

phen.df

A data.frame with phenotype variables of only sampled subjects. Column names must be properly set with the names of the phenotypes.

covs.df

A data.frame with covariates of only sampled subjects. Column names must be properly set with the names of the covariates.

pedigrees

A data.frame with columns famid, id, dadid, momid, and sex for all sampled and non-sampled subjects.

sampled

A logical vector in which element i indicates whether individual i was sampled or not.

fileID

A character string to be used as prefix in the filenames of RData objects with the partial correlation results. Note that covariates are not identified in these files.

dirToSave

Path to the folder in which the output objects are to be saved.

alpha

The significance level to be used in the partial correlation tests.

max_cores

An integer indicating the maximum number of CPU cores to be used for parallel execution.

minK

A scalar indicating the minimum dimension allowed in the dimensionality reduction for confounding correction.

maxFC

A scalar between 0 and 1, indicating the maximum fraction of confounding allowed.

orthogonal

A logical value indicating whether the transformation matrix used in the confounding correction is orthogonal or not.

hidden_vars

A logical value indicating if the causal structure learning method should account for hidden variables. The rfci algorithm is used if hidden_vars is TRUE and the pc algorithm is used otherwise.

maj.rule

A logical value to be used in the skeleton function, indicating whether the majority rule must be applied or not.

useGPU

A logical value indicating whether GPU cores can be used for parallel execution.

debug

A logical value indicating whether debug messages should be shown.

savePlots

A logical value indicating whether plots for the confounding correction must be generated.

logFile

Optional file path and name to save progress and error messages. If not provided and debug is TRUE, a default file is created in the dirToSave folder.
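The input layout described for pedigrees, sampled, and phen.df can be sketched with a toy example. Everything below is hypothetical illustration data; in particular, coding a missing parent as 0 and sex as 1/2 are assumptions, not conventions confirmed by this page.

```r
# Hypothetical toy pedigree: one nuclear family (two founders, two children).
pedigrees <- data.frame(
  famid = c(1, 1, 1, 1),
  id    = c(1, 2, 3, 4),
  dadid = c(0, 0, 1, 1),  # assumption: 0 codes a missing (founder) parent
  momid = c(0, 0, 2, 2),
  sex   = c(1, 2, 1, 2)   # assumption: 1 = male, 2 = female
)

# One logical entry per pedigree row; here the father was not sampled.
sampled <- c(FALSE, TRUE, TRUE, TRUE)

# phen.df holds one row per sampled subject and one named column per phenotype.
set.seed(1)
phen.df <- data.frame(
  trait1 = rnorm(sum(sampled)),
  trait2 = rnorm(sum(sampled))
)

# Consistency checks implied by the argument descriptions.
stopifnot(
  is.logical(sampled),
  length(sampled) == nrow(pedigrees),
  nrow(phen.df) == sum(sampled),
  all(c("famid", "id", "dadid", "momid", "sex") %in% names(pedigrees))
)
```

Running checks like these before calling learnFamilyBasedDAGs can catch a mismatch between the pedigree rows and the phenotype rows early.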

Value

Returns a list with the partial correlation matrices (pcor), the adjacency matrices (adjM), and the igraph objects representing the undirected PGM (udg).

References

\insertAllCited{}

Examples

library(FamilyBasedPGMs)
library(igraph) # provides plot.igraph() and graph.adjacency()
library(pcalg)  # provides the as(., "amat") coercion for pc/rfci fits

data(scen3) # available simulated datasets are scen1, scen2, scen3, and scen4

scenario <- 3 # the data were simulated according to scenario 3

fam.nf <- scen3$fam.nf
pedigrees <- scen3$pedigrees
phen.df <- scen3$phen.df[[1]] # accessing the first replicate
covs.df <- NULL # no covariates were used in the simulation process

N <- sum(fam.nf) # total number of individuals
sampled <- rep(TRUE, N) # in the simulated data, all individuals were sampled

fileID <- paste0("scen", scenario)
dirToSave <- paste0("./objects-PC-", fileID, "/")
dir.create(dirToSave, showWarnings=FALSE)

alpha <- 0.05

dags <- learnFamilyBasedDAGs(phen.df, covs.df, pedigrees, sampled,
                             fileID, dirToSave, alpha, max_cores=NULL,
                             minK=10, maxFC = 0.05, orthogonal=TRUE,
                             hidden_vars=FALSE, maj.rule=TRUE,
                             useGPU=FALSE, debug=TRUE, savePlots=FALSE)

# the adjacency matrix of the learned directed acyclic genetic PGM
adjM_g <- as(dags$g, "amat")
adjM_g

# plotting the learned directed acyclic genetic PGM as an `igraph` object:
plot.igraph(graph.adjacency(adjM_g), vertex.size=30, vertex.color="lightblue")

# the adjacency matrix of the learned directed acyclic environmental PGM
adjM_e <- as(dags$e, "amat")
adjM_e

# plotting the learned directed acyclic environmental PGM as an `igraph` object:
plot.igraph(graph.adjacency(adjM_e), vertex.size=30, vertex.color="lightblue")

adele/FamilyBasedPGMs documentation built on Feb. 16, 2021, 8:29 a.m.