This is the main function for performing snowball analysis. It requires only minimal input, since most operating parameters have default values.
y 
a factor variable for mutation status 
X 
data.frame containing gene expression data. The columns of
ncore 
number of processors to use for parallel computation. Set
d 
the size of gene subset for gene level resampling. See references on d in X_d^x 
B 
bootstrap size, which is B in J_n(x), defining the total number of gene subsets used to estimate J_n: J_n(x) = \frac{1}{B} \sum_{i=1}^{B} \left( \frac{1}{K} \sum_{j=1}^{K} \phi_n(g(X_{i,j}), \kappa) \right)
B.i 
bootstrap size deployed on each child job in parallel mode 
sample.n 
number of samples drawn from the subject level resampling, denoted as K in J_n(x). It is ignored if
resample.method 
this defines how the subject level resampling is performed. The possible values are

mode.resample 
this specifies how the subjects are counted for subject level leave-k-out random sampling, and whether stratification by group is applied. The possible input values are
k.resample 
A numerical value specifying the number of subjects left out during the subject level resampling. It is an integer number if
A data.frame containing two variables: weights and positives. weights are the J_n(x) values for all genes, and positives indicate whether a specific J_n(x) is above or below the median of all J_n(x) values.
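
To illustrate how the two returned columns relate, the following minimal sketch averages a hypothetical B x K matrix of φ_n values into one J_n(x) per gene and then flags the values above the median. The list `phi`, the gene names, and the use of `>` for the indicator direction are assumptions made for illustration only; the package's internal computation may differ.

```r
set.seed(1)
genes <- paste0("gene", 1:6)   # hypothetical gene identifiers
B <- 4                         # number of gene subsets (bootstrap size)
K <- 3                         # number of subject level resamples

# Hypothetical phi_n values: one B x K matrix per gene
phi <- lapply(genes, function(g) matrix(runif(B * K), nrow = B, ncol = K))
names(phi) <- genes

# J_n(x) = (1/B) * sum_i ( (1/K) * sum_j phi[i, j] )
weights <- vapply(phi, function(m) mean(rowMeans(m)), numeric(1))

# Assumed direction: TRUE when J_n(x) lies above the median of all J_n(x)
positives <- weights > median(weights)

res <- data.frame(weights = weights, positives = positives)
print(res)
```

With an even number of distinct weights, exactly half of the genes are flagged in this sketch.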
The resampling is applied on two dimensions (see references): gene level resampling and subject level resampling. The gene level resampling is straightforward: each time, it randomly takes d genes from all the genes in X. The subject level resampling is specified by the combination of values given in sample.n, resample.method, mode.resample and k.resample. The flat resampling on all subjects regardless of grouping, specified by letting resample.method="none", is simply a leave-k-out random sampling, where k is given by k.resample. In more complex cases, the subject level resampling can be stratified based on the groups defined by y, in which case resample.method takes the value of either "sample" or "combn". When resample.method = "sample", it applies a leave-k-out random sampling within each group, and only sample.n samples are finally generated from the resampling.
When resample.method = "combn", all possible combinations that satisfy the restrictions given by mode.resample and k.resample are included. In this case, the total number of resampled samples varies depending on the sample size of the study.
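
The resampling schemes described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy factor `y`, the vector `gene.ids`, and the helper `leave.k.out` are hypothetical names introduced here.

```r
set.seed(42)
# Hypothetical inputs: 8 subjects in two groups and 20 gene identifiers
y <- factor(c("wt", "wt", "wt", "wt", "wt", "mut", "mut", "mut"))
gene.ids <- paste0("gene", 1:20)
d <- 5          # gene subset size
k <- 1          # subjects left out
sample.n <- 3   # number of subject level resamples

# Gene level: draw d genes at random, without replacement
gene.subset <- sample(gene.ids, d)

# Flat leave-k-out: drop k subjects regardless of group
flat.keep <- sort(sample(seq_along(y), length(y) - k))

# Stratified leave-k-out (hypothetical helper): drop k subjects within each group
leave.k.out <- function(y, k) {
  idx <- split(seq_along(y), y)
  kept <- lapply(idx, function(i) i[sample.int(length(i), length(i) - k)])
  sort(unlist(kept, use.names = FALSE))
}
stratified <- replicate(sample.n, leave.k.out(y, k), simplify = FALSE)

# Exhaustive variant: every combination of kept subjects within each group
per.group <- lapply(split(seq_along(y), y),
                    function(i) combn(i, length(i) - k, simplify = FALSE))
n.combn <- prod(lengths(per.group))   # choose(3, 2) * choose(5, 4) = 15
```

The last line shows why the exhaustive variant depends on study size: the count is a product of binomial coefficients over the groups, so it grows quickly as groups get larger.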
mode.resample="count.class" or "percent.class" defines two ways to calculate the number of subjects to be left out in the random sampling. The value "count.class" indicates the exact number to be left out, and "percent.class" indicates the percentage of total subjects to be left out. In all cases, k.resample specifies the number of subjects left out in the leave-k-out sampling. If k.resample is a scalar integer, the subjects will be sampled with exactly k.resample subjects left out, either across all the subjects in the case of flat sampling, or within each group in the case of stratified resampling by group.
If k.resample is instead a vector of two integers, the sampling will leave out the given number of subjects from each of the two groups. The first number is assigned to the first factor level and the second number to the second factor level of factor(y). Check factor(y) to see how the two numbers in k.resample would be assigned to the two groups. A vector with two values for k.resample produces an error if mode.resample = "flat". This flexible way of defining the sampling scheme allows easy specification of balanced sample sizes between groups. See references for more details.
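
The assignment of a two-value k.resample to the groups can be checked with a short sketch such as the one below. The helper `resolve.k` is hypothetical and only mimics the rule described above (first value to the first level of factor(y), second value to the second level); it is not a function of the package.

```r
# Hypothetical helper: expand k.resample into one count per factor level
resolve.k <- function(y, k.resample) {
  lv <- levels(factor(y))
  if (length(k.resample) == 1) {
    k <- rep(k.resample, length(lv))   # same k within every group
  } else if (length(k.resample) == length(lv)) {
    k <- k.resample                    # one k per group, in level order
  } else {
    stop("k.resample must have length 1 or one value per group")
  }
  setNames(k, lv)
}

y <- c("wt", "mut", "wt", "wt", "mut", "wt", "mut", "wt")
levels(factor(y))                      # levels are sorted, so "mut" comes first

k.by.group <- resolve.k(y, c(1, 3))    # mut = 1, wt = 3
group.sizes <- table(factor(y))        # mut = 3, wt = 5
kept <- as.vector(group.sizes) - k.by.group
print(kept)                            # both groups keep 2 subjects
```

Here the unbalanced groups (3 and 5 subjects) are reduced to 2 subjects each, which is the kind of balanced design the two-value form makes easy to specify.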
Xu, Y., Guo, X., Sun, J. and Zhao, Z. Snowball: resampling combined with distance-based regression to discover transcriptional consequences of driver mutation, manuscript.
require(DESnowball)
data(snowball.demoData)
# check the demo dataset
print(sb.mutation)
head(sb.expression)
## A test run
Bn <- 10000
ncore <- 4
# call Snowball
## Not run:
sb <- snowball(y=sb.mutation, X=sb.expression,
               ncore=ncore, d=100, B=Bn,
               sample.n=1)
# process the gene ranking and selection
sb.sel <- select.features(sb)
# plot the Jn values
plotJn(sb, sb.sel)
# get the significant gene list
top.genes <- toplist(sb.sel)
## End(Not run)
