Home

/

GitHub

/

sparkts: Call Methods from the spark-ts Package

This document is here as a soundboard for ideas and learnings around the sparklyr package.

invoke_new()    # create new scala objects
invoke_static() # call static methods from scala
invoke()        # call methods from scala object

sparklyr has a function named sdf_schema() for exploring the columns of a tibble on the R side. The return value is a list, and each element is a list with two elements, containing the name and data type of each column.

Here is a comparison of how R data types map to Spark data types. Other data types are not currently supported by sparklyr.

| R type | Spark type | ---------|------------- | logical | BooleanType | | numeric | DoubleType | | integer | IntegerType | | character | StringType | | list | ArrayType |

sparklyr doesn't currently have the ability to pass over more complex data types such as a List[String].

When passing an R list over to Scala, we get a Scala ArrayType and there is no current way to send a Scala List from R using sparklyr. However, some of our Scala functions require List inputs. Potential solutions to this issue are:

Use Seq instead of List as the input type since Array has also the Seq trait in Scala, so everything works out-of-the-box.
Use overloading, which allows us to define methods of same name but having different parameters or data types, though this has issues. For an example of how this works, see this link.
Define a new Scala method for the same class that is called from R, which effectively invokes the toList function on the ArrayType and then calls the existing Scala method.

We can create Java ArrayLists in the Spark environment using the following code:

# map some R vector `x` to a java ArrayList
al <- invoke_new(sc, "java.util.ArrayList")
lapply(x, FUN = function(y){invoke(al, "add", y)})

Note we don't need to reassign the results of the lapply because it is adding values to the Scala List in the JVM. We can then convert this code to a Scala List using:

invoke_static(sc, "scala.collection.JavaConversions", "asScalaBuffer", al) %>%
  invoke("toSeq") %>%
  invoke("toList")

What is a `static` method?

A static method is one type of method which doesn't need any object to be initialized for it to be called. For instance, here’s an example of a method named increment in a Scala object named StringUtils:

object StringUtils {
  def increment(s: String) = s.map(c => (c + 1).toChar)
}

Because it’s defined inside an object (not a class), the increment method can be called directly on the StringUtils object, without requiring an instance of StringUtils to be created:

scala> StringUtils.increment("HAL") 
res0: String = IBM

In fact, when an object is defined like this without a corresponding class, you can’t create an instance of it. This line of code won’t compile:

val utils = new StringUtils

So let's say you want to create a class that has instance methods and static methods. First you define nonstatic (instance) members in your class, and define members that you want to appear as “static” members in an object that has the same name as the class, and is in the same file as the class. This object is known as a companion object. For example:

// Pizza class
class Pizza (var crustType: String) {
  override def toString = "Crust type is " + crustType
}

// companion object
object Pizza {
  val CRUST_TYPE_THIN = "thin" 
  val CRUST_TYPE_THICK = "thick" 
  def getFoo = "Foo"
}

With the Pizza class and Pizza object defined in the same file (presumably named Pizza.scala), members of the Pizza object can be accessed as so:

println(Pizza.CRUST_TYPE_THIN)
println(Pizza.getFoo)

You can also create a new Pizza instance and use it as usual:

var p = new Pizza(Pizza.CRUST_TYPE_THICK)
println(p)

nathaneastwood/sparkts documentation built on May 25, 2019, 10:34 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

nathaneastwood/sparkts
Call Methods from the spark-ts Package

inst/docs/sparklyr_learnings.md
In nathaneastwood/sparkts: Call Methods from the spark-ts Package

Introduction

Calling Scala

Data Types

Using other data types

What is a `static` method?

R Package Documentation

Browse R Packages

We want your feedback!

nathaneastwood/sparkts Call Methods from the spark-ts Package

inst/docs/sparklyr_learnings.md In nathaneastwood/sparkts: Call Methods from the spark-ts Package

Introduction

Calling Scala

Data Types

Using other data types

What is a static method?

R Package Documentation

Browse R Packages

We want your feedback!

nathaneastwood/sparkts
Call Methods from the spark-ts Package

inst/docs/sparklyr_learnings.md
In nathaneastwood/sparkts: Call Methods from the spark-ts Package

What is a `static` method?