Parallel infrastructure for computing on data generally centers on data parallelism. This means calling the same function on many different pieces of data, as in Single Instruction Multiple Data (SIMD).
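As a minimal sketch of data parallelism, the snippet below applies one function to several inputs with mclapply from the parallel package. The function name square, the input 1:4, and the choice of two cores are assumptions made only for illustration; they do not come from the package described here.

```r
library(parallel)

# Hypothetical worker function, applied unchanged to every input.
square = function(i) i^2

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
cores = if (.Platform$OS.type == "windows") 1L else 2L

# Data parallelism: the same function is called on each element of 1:4.
squares = mclapply(1:4, square, mc.cores = cores)
```

Each element of squares holds the result for the corresponding input, in the same order as the input.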
Independent tasks can also run in parallel. This is task parallelism: calling different functions on different data simultaneously.
This can be done today through R's included parallel package:

    library(parallel)

    # Begins asynchronous evaluation of rnorm(10)
    job1 = mcparallel(expr = rnorm(10))

    # This can happen before the above expression is finished
    x = mean(1:10)

    y = mccollect(job1)[[1]]

This introduces overhead compared to standard serial evaluation, but it may speed up the program if the following conditions hold:
1:1e8 generates a sequence of 100 million integers. This takes 10 times longer in parallel because the serialization time far exceeds the time for actual computation.

Suppose the user would like to run a script multiple times. The software essentially needs to do the following: