draw8empidata: performs an empidata [multivariate] draw or prediction


Description

(dn) According to the parent values provided in X, returns draws from the data frame E for simulation. The returned variables are given by names(win@nat), and the subset selection is defined by win. See the Details section for the handling of missing values.
When draws are performed individually (within a loop), a record giving the size of the candidate subset is appended to the file 'rebastaba.efi' after each draw.
Note that the parent names (names(win@swg)) and the variables to extract (names(win@nat)) are given at the variable level.

Usage

draw8empidata(X, E, win, nat, bnid, strict=FALSE)

Arguments

X

The matrix of already simulated values. It must comprise the parents. If there are no parents, the matrix must still be given (possibly with no columns) so that its number of rows provides the number of simulations.

E

The data frame to be used for the empidata distribution. It must comprise all possible parents (WITH node name and brackets) and variables (WITHOUT the node name and brackets).

win

A /win/ object defining the selection. Note that win@swg indicates the parents and names(win@nat) indicates the variables of the node.

nat

Named vector giving the nature of the parents (possibly of more variables; usually those of the columns of X).

bnid

Character string identifying the bn (used as an indication in the records sunk to 'rebastaba.efi').

strict

Logical: must the elimination of missing values be common to all variables of the node?
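When the node has no parents, X only carries the number of simulations through its row count. A minimal base-R sketch of such an input, assuming 10 simulations are wanted (n_sim is an illustrative name, not a package argument):

```r
# Root node: no parent columns, so only nrow(X) carries information.
n_sim <- 10
X <- matrix(numeric(0), nrow = n_sim, ncol = 0)
dim(X)  # 10 0
```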

Details

To allow successive links between empidata nodes, the node name is removed from the correspondence. For example, suppose we are dealing with the sequence SEX -> E[HGT] -> F[WGT], where E and F are associated with empidata bases. With this convention, the corresponding parents E[SEX] and F[HGT] must exist under the reduced names SEX in base E and HGT in base F.
Note that this implies that the same variable name cannot be used by two co-parents of a given node. For example, E1[HGT] and E2[HGT] cannot be simultaneous parents of F[WGT].
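The naming convention of the Details can be illustrated with a small base-R sketch. The helper reduce_name() is hypothetical (it is not a rebastaba function); it assumes full names of the form node[variable] and strips the node name and brackets, which also makes the co-parent restriction visible.

```r
# Hypothetical helper (not part of rebastaba): turn "node[variable]" into
# "variable"; names without brackets (e.g. "SEX") are returned unchanged.
reduce_name <- function(full) {
  sub("^[^\\[]*\\[([^\\]]*)\\]$", "\\1", full, perl = TRUE)
}

# Chain SEX -> E[HGT] -> F[WGT]
reduce_name("SEX")     # "SEX": parent name looked up in base E
reduce_name("E[HGT]")  # "HGT": parent name looked up in base F

# Two co-parents sharing a variable name collide once reduced,
# which is why E1[HGT] and E2[HGT] cannot both be parents of F[WGT].
reduced <- reduce_name(c("E1[HGT]", "E2[HGT]"))
anyDuplicated(reduced) > 0  # TRUE
```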
Missing values can be dealt with in two ways; the following rules apply.
Let X[parents], E[parents] and E[variables] denote, respectively, the parents inherited from the previously simulated values, the corresponding parents in the database, and the variables which are drawn.

1. When there are missing values in E[variables], they are either left in place (strict=FALSE) or, otherwise, excluded so that these observations cannot be drawn.

2. When there are parents, any missing value in X[parents] leads to missing values in all E[variables], since there is no way to identify the corresponding observations in the E database.

3. For the same reason, when there are parents, all observations with missing values in E[parents] are discarded.
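The three rules above can be sketched in base R as a candidate-selection step followed by one draw per row of X. This is a hypothetical illustration, not rebastaba's implementation: candidate_rows() and sketch_draw() are invented names, exact equality on the parents stands in for the win-based selection (which is not reproduced), and strict = TRUE is read as "drop any observation with a missing value in E[variables]", one reading of rule 1.

```r
# Hypothetical sketch (not rebastaba code): candidate rows of E for one
# simulated individual, following rules 1-3 of the Details.
candidate_rows <- function(x_par, E, par_names, var_names, strict = FALSE) {
  keep <- rep(TRUE, nrow(E))
  # Rule 1 (assumed reading): with strict = TRUE, observations with a
  # missing value in E[variables] cannot be drawn.
  if (strict) {
    keep <- keep & stats::complete.cases(E[, var_names, drop = FALSE])
  }
  if (length(par_names) > 0) {
    # Rule 2: a missing parent value in X leaves no candidate.
    if (anyNA(x_par)) return(integer(0))
    # Rule 3: observations with missing E[parents] are discarded.
    keep <- keep & stats::complete.cases(E[, par_names, drop = FALSE])
    # Stand-in selection: exact equality on every parent (assumption).
    for (p in par_names) {
      keep <- keep & E[[p]] == x_par[[p]]
    }
  }
  which(keep)
}

# One draw of E[variables] per row of X; a row of NA when no candidate.
sketch_draw <- function(X, E, par_names, var_names, strict = FALSE) {
  out <- E[rep(NA_integer_, nrow(X)), var_names, drop = FALSE]
  for (i in seq_len(nrow(X))) {
    x_par <- if (length(par_names) > 0) X[i, par_names, drop = FALSE] else NULL
    rows <- candidate_rows(x_par, E, par_names, var_names, strict)
    if (length(rows) > 0) {
      pick <- rows[sample.int(length(rows), 1)]
      out[i, ] <- E[pick, var_names, drop = FALSE]
    }
  }
  rownames(out) <- NULL
  out
}
```

For instance, with E <- data.frame(A = c(1, 1, 2, NA), v = c(10, NA, 20, 30)) and X <- data.frame(A = c(1, 2, NA, 3)), sketch_draw(X, E, "A", "v", strict = TRUE) returns v = 10, 20, NA, NA: the third row has a missing parent (rule 2) and the fourth has no matching observation.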

Value

A data.frame with nrow(X) rows and length(win@nat) columns, containing the simulated values in the order of the rows of X.

Examples

 library(rebastaba)
 rebastaba3k("RESET") # to comply with R checks
 E <- data.frame(A=1:100, c=100:1, w=round(sin(1:100), 2)) # the empirical distribution
 X <- as.data.frame(matrix(20:29, 10, 1, dimnames=list(NULL, "A"))) # the covariate values
 win <- new("win", nat=structure("conti", .Names="c"),
            swg=structure(1, .Names="A"), skk=0, sdi=c(0, 1), snb=c(1, 2))
 draw8empidata(X, E, win, nat=structure("conti", .Names="A"), bnid="rebastaba.bn2")

rebastaba documentation built on May 2, 2019, 5:24 p.m.