generate_data: Simulate data from the cFIT model


View source: R/simulation_by_model.R

Description

Simulate

$$X_j = H_j W^\top \Lambda_j + \mathbf{1}_{n_j} b_j^\top + E_j, \quad j = 1, \dots, \mathrm{ntask}.$$

The nonnegative p-by-K matrix W is generated as the K cluster centers. H_j is the binary membership matrix, whose labels are generated from a Dirichlet distribution with parameter alpha. The per-dataset scaling (distortion) $\Lambda_j = \operatorname{diag}(\lambda_j)$ and shift $b_j$ are drawn from truncated normal distributions, and each entry of the noise matrix $E_j$ is drawn i.i.d. from a normal distribution.
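The model above can be sketched in base R as follows. This is not cFIT's implementation: the helper names `rtnorm_sketch` and `simulate_cfit_sketch` are hypothetical, and the distribution of W, the truncation bounds for lambda and b, and the reading of `sig` and `batch.effect.sig` as variances are assumptions made only for illustration. It does follow the dimensions documented under Value (W is p-by-K, so H_j W^T is n-by-p).

```r
# Minimal sketch of the cFIT generative model, NOT the package's generate_data():
#   X_j = H_j W^T Lambda_j + 1_{n_j} b_j^T + E_j,  j = 1, ..., ntask
# Every choice marked "assumed" below is illustrative only.

# Inverse-CDF sampler for N(mean, sd^2) truncated to [lower, upper]
rtnorm_sketch <- function(m, mean = 0, sd = 1, lower = -Inf, upper = Inf) {
  u <- runif(m, pnorm(lower, mean, sd), pnorm(upper, mean, sd))
  qnorm(u, mean, sd)
}

simulate_cfit_sketch <- function(n, ntask, K, p, alpha = 10, sig = 1,
                                 cl.sep = 1, batch.effect.sig = 0.1) {
  # Shared nonnegative p-by-K matrix of cluster centers (assumed |N(0, cl.sep^2)|)
  W <- matrix(abs(rnorm(p * K, sd = cl.sep)), p, K)
  res <- list(X.list = list(), H.list = list(), lambda.list = list(),
              b.list = list(), E.list = list(), label.list = list(), W = W)
  s <- sqrt(batch.effect.sig)  # assumed: batch.effect.sig is a variance
  for (j in seq_len(ntask)) {
    # Cluster proportions ~ Dirichlet(alpha), via normalized gamma draws
    g <- rgamma(K, shape = alpha)
    labels <- sample.int(K, n, replace = TRUE, prob = g / sum(g))
    H <- matrix(0, n, K)
    H[cbind(seq_len(n), labels)] <- 1  # exactly one 1 per row
    # Assumed truncations: positive scaling around 1, shift within +/- 2 sd
    lambda <- rtnorm_sketch(p, mean = 1, sd = s, lower = 0)
    b <- rtnorm_sketch(p, mean = 0, sd = s, lower = -2 * s, upper = 2 * s)
    E <- matrix(rnorm(n * p, sd = sqrt(sig)), n, p)  # assumed: sig is a variance
    X <- H %*% t(W) %*% diag(lambda, p) + matrix(1, n, 1) %*% t(b) + E
    res$X.list[[j]] <- X
    res$H.list[[j]] <- H
    res$lambda.list[[j]] <- lambda
    res$b.list[[j]] <- b
    res$E.list[[j]] <- E
    res$label.list[[j]] <- labels
  }
  res
}
```

Here Λ_j is applied as `diag(lambda, p)`, so each gene (column) of H_j W^T is rescaled by its own lambda entry before the shift b_j is added to every row; passing `p` to `diag()` keeps this correct even when p = 1.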

Usage

generate_data(
  n,
  ntask,
  K,
  p,
  alpha = NULL,
  sig = 1,
  cl.sep = 1,
  batch.effect.sig = 0.1
)

Arguments

n

number of data points per dataset

ntask

number of batches

K

number of clusters

p

number of genes

alpha

parameter of the Dirichlet distribution used to generate the labels (default 10, which gives roughly equal cluster sizes; smaller alpha gives more unbalanced cluster sizes)

sig

within-cluster variance

cl.sep

cluster center separation; higher values give better-separated clusters

batch.effect.sig

batch effect variance; higher values give larger batch effects
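To see how alpha controls cluster balance, the sketch below draws cluster proportions from a symmetric Dirichlet distribution by normalizing independent gamma draws, a standard construction. It assumes the labels are sampled according to such proportions; `rdirichlet_sym` and `largest_share` are hypothetical helpers, not cFIT functions.

```r
# Hypothetical helper: one draw from a symmetric Dirichlet(alpha, ..., alpha) on K categories
rdirichlet_sym <- function(K, alpha) {
  g <- rgamma(K, shape = alpha)  # independent Gamma(alpha, 1) draws
  g / sum(g)                     # normalizing gives Dirichlet-distributed proportions
}

# Average share of the largest cluster over many draws
largest_share <- function(alpha, K = 5, reps = 2000) {
  mean(replicate(reps, max(rdirichlet_sym(K, alpha))))
}

set.seed(42)
largest_share(10)   # only modestly above 1/K = 0.2: roughly balanced clusters
largest_share(0.5)  # substantially larger: unbalanced clusters
```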

Value

A list of generated data with the following components:

X.list

a list of n-by-p expression matrices, one per dataset

H.list

a list of n-by-K binary membership matrices

lambda.list

a list of length-p vectors of per-dataset scaling

b.list

a list of length-p vectors of per-dataset shift

E.list

a list of n-by-p noise matrices

label.list

a list of length-n vectors of cluster labels

W

p-by-K common factor matrix
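The returned lists are indexed by batch, so a common next step is to pool them. The sketch below builds a toy list with the same component names and shapes as described above (random numbers, not output of generate_data), stacks the batches row-wise with batch and label annotations, and checks that each row of H_j marks the column given by label.list.

```r
# Toy stand-in for part of the documented return value (random numbers, not cFIT output)
set.seed(7)
n <- 4; p <- 3; K <- 2; ntask <- 2
toy <- list(
  X.list = replicate(ntask, matrix(rnorm(n * p), n, p), simplify = FALSE),
  label.list = replicate(ntask, sample.int(K, n, replace = TRUE), simplify = FALSE)
)
# Binary membership matrices built from the labels
toy$H.list <- lapply(toy$label.list, function(l) {
  H <- matrix(0, length(l), K)
  H[cbind(seq_along(l), l)] <- 1
  H
})

# Stack all batches into one (ntask * n)-by-p matrix with batch and label annotations
X.all <- do.call(rbind, toy$X.list)
batch <- rep(seq_along(toy$X.list), times = vapply(toy$X.list, nrow, integer(1)))
label <- unlist(toy$label.list)

# Each row of H_j has a single 1, in the column given by its label
stopifnot(all(unlist(lapply(toy$H.list, max.col)) == label))
```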


pengminshi/cFIT documentation built on July 11, 2021, 11:12 p.m.