Description

Use a sample CSV file to create a Hive table, or pass the 'columns' argument to spark_read_csv.
Usage

db_map_csv(sample_file, db = "sparklyr", sample_size = 5,
  dir_location = NULL, table_name = NULL, ...)
Arguments

sample_file    The path to a sample CSV file that will be used to determine the column types.

db             The type of connection or database. Possible values: 'hive', 'sparklyr'.

sample_size    The number of top rows that will be sampled to determine each column's class. Defaults to 5.

dir_location   'hive' only - The location of the directory where the data files are.

table_name     'hive' only - The name of the table. Defaults to 'default'.
Details

This technique is meant to cut down the time it takes to read CSV files into the Spark context. It does so either by passing the column names and types to spark_read_csv, or by using SQL to create the table.
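The two code paths produce different artifacts: with db = "sparklyr", db_map_csv returns a column specification that can be handed to spark_read_csv, while with db = "hive" it returns a SQL string to run against the cluster. The snippet below is an illustrative sketch only, not the function's actual output - the column names come from nycflights13::flights, and the exact DDL that db_map_csv emits may differ:

```r
# Illustrative sketch only: the general shape of the Hive DDL string that
# db_map_csv() builds when db = "hive" (the exact statement may differ).
# '/path/to/csv' is a placeholder for the dir_location argument.
create_sql <- "
CREATE EXTERNAL TABLE sql_flights (
  year INT,
  month INT,
  day INT,
  dep_delay INT,
  carrier STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/path/to/csv'"
```

Because the schema is stated explicitly, neither Hive nor Spark has to scan the file to infer column types, which is where the time savings come from.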
Examples

## Not run:
# Libraries needed for this example
library(tidyverse)
library(sparklyr)
library(dbutilities)
library(nycflights13)

# Creating a local Spark context
conf <- spark_config()
conf$`sparklyr.shell.driver-memory` <- "16G"
sc <- spark_connect(master = "local",
                    version = "2.1.0",
                    config = conf)

# Using flights from nycflights13 as the example data
data("flights")
flights

# Creating a CSV file out of the flights table
if (!dir.exists("csv")) dir.create("csv")
write_csv(flights, "csv/flights.csv")

# Mapping the CSV file (Hive)
create_sql <- db_map_csv(sample_file = "csv/flights.csv",
                         dir_location = file.path(getwd(), "csv"),
                         db = "hive",
                         table_name = "sql_flights")

# Run the resulting SQL command to create the table
DBI::dbGetQuery(sc, create_sql)

# Mapping the CSV file (sparklyr)
flights_columns <- db_map_csv(sample_file = "csv/flights.csv")

# Use spark_read_csv with the infer_schema argument set to FALSE
flights_noinfer <- spark_read_csv(sc,
                                  name = "noinfer_flights",
                                  path = "csv/",
                                  infer_schema = FALSE,
                                  columns = flights_columns)

spark_disconnect(sc)
## End(Not run)