This document is here as a soundboard for ideas and learnings around the sparklyr
package.
invoke_new() # create new scala objects
invoke_static() # call static methods from scala
invoke() # call methods from scala object
sparklyr has a function named sdf_schema()
for exploring the columns of a tibble on the R side. The return value is a list, and each element is a list with two elements, containing the name and data type of each column.
Here is a comparison of how R data types map to Spark data types. Other data types are not currently supported by sparklyr
.
| R type | Spark type | ---------|------------- | logical | BooleanType | | numeric | DoubleType | | integer | IntegerType | | character | StringType | | list | ArrayType |
sparklyr
doesn't currently have the ability to pass over more complex data types such as a List[String]
.
When passing an R list
over to Scala, we get a Scala ArrayType
and there is no current way to send a Scala List
from R using sparklyr
. However, some of our Scala functions require List
inputs. Potential solutions to this issue are:
Seq
instead of List
as the input type since Array
has also the Seq
trait in Scala, so everything works out-of-the-box.toList
function on the ArrayType
and then calls the existing Scala method.We can create Java ArrayList
s in the Spark environment using the following code:
# map some R vector `x` to a java ArrayList
al <- invoke_new(sc, "java.util.ArrayList")
lapply(x, FUN = function(y){invoke(al, "add", y)})
Note we don't need to reassign the results of the lapply
because it is adding values to the Scala List
in the JVM. We can then convert this code to a Scala List
using:
invoke_static(sc, "scala.collection.JavaConversions", "asScalaBuffer", al) %>%
invoke("toSeq") %>%
invoke("toList")
static
method?A static
method is one type of method which doesn't need any object to be initialized for it to be called. For instance, here’s an example of a method named increment
in a Scala object named StringUtils
:
object StringUtils {
def increment(s: String) = s.map(c => (c + 1).toChar)
}
Because it’s defined inside an object (not a class), the increment
method can be called directly on the StringUtils
object, without requiring an instance of StringUtils
to be created:
scala> StringUtils.increment("HAL")
res0: String = IBM
In fact, when an object is defined like this without a corresponding class, you can’t create an instance of it. This line of code won’t compile:
val utils = new StringUtils
So let's say you want to create a class that has instance methods and static methods. First you define nonstatic (instance) members in your class, and define members that you want to appear as “static” members in an object that has the same name as the class, and is in the same file as the class. This object is known as a companion object. For example:
// Pizza class
class Pizza (var crustType: String) {
override def toString = "Crust type is " + crustType
}
// companion object
object Pizza {
val CRUST_TYPE_THIN = "thin"
val CRUST_TYPE_THICK = "thick"
def getFoo = "Foo"
}
With the Pizza
class and Pizza
object defined in the same file (presumably named Pizza.scala), members of the Pizza
object can be accessed as so:
println(Pizza.CRUST_TYPE_THIN)
println(Pizza.getFoo)
You can also create a new Pizza
instance and use it as usual:
var p = new Pizza(Pizza.CRUST_TYPE_THICK)
println(p)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.