This class was designed as a thin wrapper around Spark's
SparkContext. It is initialized when spark_submit is called
and inserted into the workspace as sc. Note that running
sc$stop will end your session. For information on method and type
requirements, refer to the Spark javadoc.
Not all methods are implemented, due to compatibility
and tidyspark best-practice usage conflicts. If you need to use a method not
included, try calling it using call_method(sc$jobj, <yourMethod>).
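For example, a minimal sketch of this fallback, assuming an active session; "applicationId" is a standard SparkContext method that is not wrapped here:
## Not run:
spark <- spark_session()
sc <- spark$sparkContext
app_id <- call_method(sc$jobj, "applicationId")  # call the Java method directly
## End(Not run)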
Public fields

jobj: the SparkContext Java object.
getConf: the SparkConf.

Methods
new(): Create a new SparkContext.
SparkContext$new(sc = NULL)
sc: optional; can instantiate with another SparkContext's jobj.
print(): print the SparkContext.
SparkContext$print()
addFile(): Add a file to be downloaded with this Spark job on every node.
SparkContext$addFile(path, recursive = FALSE)
path: string.
recursive: boolean.
addJar(): Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
SparkContext$addJar(path)
path: string.
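A minimal usage sketch for both methods; the file and JAR paths are hypothetical placeholders:
## Not run:
sc$addFile("data/lookup.csv")    # ship a file to every node
sc$addJar("libs/my-udfs.jar")    # JAR available to all future tasks
## End(Not run)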
appName(): get the app name.
SparkContext$appName()
broadcast(): Broadcast a variable to executors.
SparkContext$broadcast(value)
value: the variable to broadcast.
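A minimal sketch: broadcast a small lookup table once so every executor holds a read-only copy; the variable is illustrative:
## Not run:
lookup <- c(a = 1, b = 2, c = 3)
lookup_bc <- sc$broadcast(lookup)
## End(Not run)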
cancelAllJobs(): Cancel all jobs that have been scheduled or are running.
SparkContext$cancelAllJobs()
cancelJobGroup(): Cancel active jobs for the specified group.
SparkContext$cancelJobGroup(groupId)
groupId: string.
clearJobGroup(): Clear the current thread's job group ID and its description.
SparkContext$clearJobGroup()
defaultMinPartitions(): Default minimum number of partitions for Hadoop RDDs when not given by user. Notice that we use math.min, so defaultMinPartitions cannot be higher than 2.
SparkContext$defaultMinPartitions()
defaultParallelism(): Default level of parallelism to use when not given by user.
SparkContext$defaultParallelism()
emptyRDD(): Get an RDD that has no partitions or elements.
SparkContext$emptyRDD()
Returns: RDD.
isLocal(): is the Spark process local?
SparkContext$isLocal()
Returns: boolean.
jars(): get the JARs added to this SparkContext.
SparkContext$jars()
Returns: a jobj representing scala.collection.Seq<String>.
master(): get the Spark master URL.
SparkContext$master()
Returns: string.
parallelize(): Distribute a list (or Scala collection) to form an RDD.
SparkContext$parallelize(seq, numSlices = 1L)
seq: list (or Scala collection) to distribute.
numSlices: number of partitions to divide the collection into.
parallelize acts lazily. If seq is a mutable collection and is altered after the call to parallelize but before the first action on the RDD, the resulting RDD will reflect the modified collection; pass a copy of the argument to avoid this. Also avoid using parallelize(Seq()) to create an empty RDD: consider emptyRDD for an RDD with no partitions, or parallelize(Seq[T]()) for an RDD of T with empty partitions.
Returns: RDD.
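A minimal sketch, mirroring the Examples below; numSlices = 4L splits the data into four partitions:
## Not run:
an_rdd <- sc$parallelize(list(1:10), numSlices = 4L)
## End(Not run)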
setCheckpointDir(): Set the directory under which RDDs are going to be checkpointed.
SparkContext$setCheckpointDir(directory)
directory: string, path to the directory where checkpoint files will be stored (must be an HDFS path if running on a cluster).
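A hedged sketch; the directory is a hypothetical placeholder and must be HDFS-backed on a cluster:
## Not run:
sc$setCheckpointDir("/tmp/spark-checkpoints")
## End(Not run)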
setJobDescription(): Set a human-readable description of the current job.
SparkContext$setJobDescription(value)
value: string.
setJobGroup(): Assign a group ID to all jobs started by this thread until the group ID is set to a different value or cleared.
SparkContext$setJobGroup(groupId, description, interruptOnCancel)
groupId: string.
description: string.
interruptOnCancel: if TRUE, job cancellation will result in Thread.interrupt() being called on the job's executor threads. This helps ensure that tasks are actually stopped in a timely manner, but it is off by default due to HDFS-1208, where HDFS may respond to Thread.interrupt() by marking nodes as dead.
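A minimal sketch of the job-group lifecycle; the group ID and description are illustrative:
## Not run:
sc$setJobGroup("nightly-etl", "nightly aggregation jobs", interruptOnCancel = FALSE)
# ...run jobs from this thread...
sc$cancelJobGroup("nightly-etl")   # cancel the group's active jobs
sc$clearJobGroup()                 # detach this thread from the group
## End(Not run)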
setLocalProperty(): Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
SparkContext$setLocalProperty(key, value)
key: string.
value: string.
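For example, a sketch assigning this thread's jobs to a fair-scheduler pool; the pool name is illustrative, while "spark.scheduler.pool" is the standard Spark property key:
## Not run:
sc$setLocalProperty("spark.scheduler.pool", "production")
## End(Not run)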
sparkUser(): get the name of the user running this SparkContext.
SparkContext$sparkUser()
startTime(): get the start time of the SparkContext.
SparkContext$startTime()
stop(): Shut down the SparkContext.
SparkContext$stop()
textFile(): Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of strings.
SparkContext$textFile(path, minPartitions)
path: string, path to the text file on a supported file system.
minPartitions: int, suggested minimum number of partitions for the resulting RDD.
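A hedged sketch; the path is a hypothetical placeholder:
## Not run:
log_lines <- sc$textFile("hdfs:///data/logs.txt", minPartitions = 8L)
## End(Not run)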
version(): The version of Spark on which this application is running.
SparkContext$version()
union(): Build the union of a list of RDDs.
SparkContext$union(rdds)
rdds: a list of RDDs or RDD jobjs.
Returns: RDD.
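A minimal sketch combining two RDDs built with parallelize; the data is illustrative:
## Not run:
rdd_a <- sc$parallelize(list(1:5), 2L)
rdd_b <- sc$parallelize(list(6:10), 2L)
rdd_all <- sc$union(list(rdd_a, rdd_b))
## End(Not run)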
wholeTextFiles(): Read a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI.
SparkContext$wholeTextFiles(path, minPartitions)
path: directory of input data files; the path can be a comma-separated list of input paths.
minPartitions: a suggested minimum number of partitions for the input data.
Returns: RDD.
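A hedged sketch; the directory is a hypothetical placeholder:
## Not run:
docs <- sc$wholeTextFiles("hdfs:///data/articles", minPartitions = 4L)
## End(Not run)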
clone(): The objects of this class are cloneable with this method.
SparkContext$clone(deep = FALSE)
deep: whether to make a deep clone.
Examples

## Not run:
spark <- spark_session()
sc <- spark$sparkContext
sc$defaultParallelism()
an_rdd <- sc$parallelize(list(1:10), 4)
sc$getConf$get("spark.submit.deployMode")
spark_session_stop()
## End(Not run)