Man pages for rstudio/sparklyr
R Interface to Apache Spark

arrow_enabled_object: Determine whether arrow is able to serialize the given R...
checkpoint_directory: Set/Get Spark checkpoint directory
collect: Collect
collect_from_rds: Collect Spark data serialized in RDS format into R
compile_package_jars: Compile Scala sources into a Java Archive (jar)
connection_config: Read configuration values for a connection
connection_is_open: Check whether the connection is open
connection_spark_shinyapp: A Shiny app that can be used to construct a 'spark_connect'...
copy_to: Copy To
copy_to.spark_connection: Copy an R Data Frame to Spark
DBISparkResult-class: DBI Spark Result.
distinct: Distinct
download_scalac: Downloads default Scala Compilers
dplyr_hof: dplyr wrappers for Apache Spark higher order functions
ensure: Enforce Specific Structure for R Objects
fill: Fill
filter: Filter
find_scalac: Discover the Scala Compiler
ft_binarizer: Feature Transformation - Binarizer (Transformer)
ft_bucketizer: Feature Transformation - Bucketizer (Transformer)
ft_chisq_selector: Feature Transformation - ChiSqSelector (Estimator)
ft_count_vectorizer: Feature Transformation - CountVectorizer (Estimator)
ft_dct: Feature Transformation - Discrete Cosine Transform (DCT)...
ft_elementwise_product: Feature Transformation - ElementwiseProduct (Transformer)
ft_feature_hasher: Feature Transformation - FeatureHasher (Transformer)
ft_hashing_tf: Feature Transformation - HashingTF (Transformer)
ft_idf: Feature Transformation - IDF (Estimator)
ft_imputer: Feature Transformation - Imputer (Estimator)
ft_index_to_string: Feature Transformation - IndexToString (Transformer)
ft_interaction: Feature Transformation - Interaction (Transformer)
ft_lsh: Feature Transformation - LSH (Estimator)
ft_lsh_utils: Utility functions for LSH models
ft_max_abs_scaler: Feature Transformation - MaxAbsScaler (Estimator)
ft_min_max_scaler: Feature Transformation - MinMaxScaler (Estimator)
ft_ngram: Feature Transformation - NGram (Transformer)
ft_normalizer: Feature Transformation - Normalizer (Transformer)
ft_one_hot_encoder: Feature Transformation - OneHotEncoder (Transformer)
ft_one_hot_encoder_estimator: Feature Transformation - OneHotEncoderEstimator (Estimator)
ft_pca: Feature Transformation - PCA (Estimator)
ft_polynomial_expansion: Feature Transformation - PolynomialExpansion (Transformer)
ft_quantile_discretizer: Feature Transformation - QuantileDiscretizer (Estimator)
ft_regex_tokenizer: Feature Transformation - RegexTokenizer (Transformer)
ft_r_formula: Feature Transformation - RFormula (Estimator)
ft_robust_scaler: Feature Transformation - RobustScaler (Estimator)
ft_standard_scaler: Feature Transformation - StandardScaler (Estimator)
ft_stop_words_remover: Feature Transformation - StopWordsRemover (Transformer)
ft_string_indexer: Feature Transformation - StringIndexer (Estimator)
ft_tokenizer: Feature Transformation - Tokenizer (Transformer)
ft_vector_assembler: Feature Transformation - VectorAssembler (Transformer)
ft_vector_indexer: Feature Transformation - VectorIndexer (Estimator)
ft_vector_slicer: Feature Transformation - VectorSlicer (Transformer)
ft_word2vec: Feature Transformation - Word2Vec (Estimator)
full_join: Full join
generic_call_interface: Generic Call Interface
get_spark_sql_catalog_implementation: Retrieve the Spark connection's SQL catalog implementation...
grapes-greater-than-grapes: Infix operator for composing a lambda expression
hive_context_config: Runtime configuration interface for Hive
hof_aggregate: Apply Aggregate Function to Array Column
hof_array_sort: Sorts array using a custom comparator
hof_exists: Determine Whether Some Element Exists in an Array Column
hof_filter: Filter Array Column
hof_forall: Checks whether all elements in an array satisfy a predicate
hof_map_filter: Filters a map
hof_map_zip_with: Merges two maps into one
hof_transform: Transform Array Column
hof_transform_keys: Transforms keys of a map
hof_transform_values: Transforms values of a map
hof_zip_with: Combines 2 Array Columns
inner_join: Inner join
invoke: Invoke a Method on a JVM Object
invoke_method: Generic Call Interface
jarray: Instantiate a Java array with a specific element type.
jfloat: Instantiate a Java float type.
jfloat_array: Instantiate an Array[Float].
j_invoke: Invoke a Java function.
j_invoke_method: Generic Call Interface
jobj_class: Superclasses of object
jobj_set_param: Parameter Setting for JVM Objects
join.tbl_spark: Join Spark tbls.
left_join: Left join
list_sparklyr_jars: List all sparklyr-*.jar files that have been built
livy_config: Create a Spark Configuration for Livy
livy_install: Install Livy
livy_service: Start Livy
ml_add_stage: Add a Stage to a Pipeline
ml_aft_survival_regression: Spark ML - Survival Regression
ml_als: Spark ML - ALS
ml_als_tidiers: Tidying methods for Spark ML ALS
ml_bisecting_kmeans: Spark ML - Bisecting K-Means Clustering
ml_call_constructor: Wrap a Spark ML JVM object
ml_chisquare_test: Chi-square hypothesis testing for categorical data.
ml_clustering_evaluator: Spark ML - Clustering Evaluator
ml-constructors: Constructors for Pipeline Stages
ml_corr: Compute correlation matrix
ml_decision_tree: Spark ML - Decision Trees
ml_default_stop_words: Default stop words
ml_evaluate: Evaluate the Model on a Validation Set
ml_evaluator: Spark ML - Evaluators
ml_feature_importances: Spark ML - Feature Importance for Tree Models
ml_fpgrowth: Frequent Pattern Mining - FPGrowth
ml_gaussian_mixture: Spark ML - Gaussian Mixture clustering.
ml_generalized_linear_regression: Spark ML - Generalized Linear Regression
ml_glm_tidiers: Tidying methods for Spark ML linear models
ml_gradient_boosted_trees: Spark ML - Gradient Boosted Trees
ml_isotonic_regression: Spark ML - Isotonic Regression
ml_isotonic_regression_tidiers: Tidying methods for Spark ML Isotonic Regression
ml_kmeans: Spark ML - K-Means Clustering
ml_kmeans_cluster_eval: Evaluate a K-means clustering
ml_lda: Spark ML - Latent Dirichlet Allocation
ml_lda_tidiers: Tidying methods for Spark ML LDA models
ml_linear_regression: Spark ML - Linear Regression
ml_linear_svc: Spark ML - LinearSVC
ml_linear_svc_tidiers: Tidying methods for Spark ML linear svc
ml_logistic_regression: Spark ML - Logistic Regression
ml_logistic_regression_tidiers: Tidying methods for Spark ML Logistic Regression
ml_metrics_binary: Extracts metrics from a fitted table
ml_metrics_multiclass: Extracts metrics from a fitted table
ml_metrics_regression: Extracts metrics from a fitted table
ml-model-constructors: Constructors for 'ml_model' Objects
ml_model_data: Extracts data associated with a Spark ML model
ml_multilayer_perceptron_classifier: Spark ML - Multilayer Perceptron
ml_multilayer_perceptron_tidiers: Tidying methods for Spark ML MLP
ml_naive_bayes: Spark ML - Naive-Bayes
ml_naive_bayes_tidiers: Tidying methods for Spark ML Naive Bayes
ml_one_vs_rest: Spark ML - OneVsRest
ml-params: Spark ML - ML Params
ml_pca_tidiers: Tidying methods for Spark ML Principal Component Analysis
ml-persistence: Spark ML - Model Persistence
ml_pipeline: Spark ML - Pipelines
ml_power_iteration: Spark ML - Power Iteration Clustering
ml_prefixspan: Frequent Pattern Mining - PrefixSpan
ml_random_forest: Spark ML - Random Forest
ml_stage: Spark ML - Pipeline stage extraction
ml_standardize_formula: Standardize Formula Input for 'ml_model'
ml_summary: Spark ML - Extraction of summary metrics
ml_survival_regression_tidiers: Tidying methods for Spark ML Survival Regression
ml-transform-methods: Spark ML - Transform, fit, and predict methods (ml_...
ml_tree_tidiers: Tidying methods for Spark ML tree models
ml-tuning: Spark ML - Tuning
ml_uid: Spark ML - UID
ml_unsupervised_tidiers: Tidying methods for Spark ML unsupervised models
mutate: Mutate
na.replace: Replace Missing Values in Objects
nest: Nest
pipe: Pipe operator
pivot_longer: Pivot longer
pivot_wider: Pivot wider
print_jobj: Generic method for printing a jobj for a connection type
quote_sql_name: Translate input character vector or symbol to a SQL...
random_string: Random string generation
reactiveSpark: Reactive spark reader
reexports: Objects exported from other packages
registerDoSpark: Register a Parallel Backend
register_extension: Register a Package that Implements a Spark Extension
replace_na: Replace NA
right_join: Right join
sdf_along: Create DataFrame for along Object
sdf_bind: Bind multiple Spark DataFrames by row and column
sdf_broadcast: Broadcast hint
sdf_checkpoint: Checkpoint a Spark DataFrame
sdf_coalesce: Coalesces a Spark DataFrame
sdf_collect: Collect a Spark DataFrame into R.
sdf_copy_to: Copy an Object into Spark
sdf_crosstab: Cross Tabulation
sdf_debug_string: Debug Info for Spark DataFrame
sdf_describe: Compute summary statistics for columns of a data frame
sdf_dim: Support for Dimension Operations
sdf_distinct: Invoke distinct on a Spark DataFrame
sdf_drop_duplicates: Remove duplicates from a Spark DataFrame
sdf_expand_grid: Create a Spark DataFrame containing all combinations of...
sdf_fast_bind_cols: Fast cbind for Spark DataFrames
sdf_from_avro: Convert column(s) from avro format
sdf_is_streaming: Spark DataFrame is Streaming
sdf_last_index: Returns the last index of a Spark DataFrame
sdf_len: Create DataFrame for Length
sdf_num_partitions: Gets number of partitions of a Spark DataFrame
sdf_partition_sizes: Compute the number of records within each partition of a...
sdf_persist: Persist a Spark DataFrame
sdf_pivot: Pivot a Spark DataFrame
sdf_project: Project features onto principal components
sdf_quantile: Compute (Approximate) Quantiles with a Spark DataFrame
sdf_random_split: Partition a Spark DataFrame
sdf_rbeta: Generate random samples from a Beta distribution
sdf_rbinom: Generate random samples from a binomial distribution
sdf_rcauchy: Generate random samples from a Cauchy distribution
sdf_rchisq: Generate random samples from a chi-squared distribution
sdf_read_column: Read a Column from a Spark DataFrame
sdf_register: Register a Spark DataFrame
sdf_repartition: Repartition a Spark DataFrame
sdf_residuals: Model Residuals
sdf_rexp: Generate random samples from an exponential distribution
sdf_rgamma: Generate random samples from a Gamma distribution
sdf_rgeom: Generate random samples from a geometric distribution
sdf_rhyper: Generate random samples from a hypergeometric distribution
sdf_rlnorm: Generate random samples from a log normal distribution
sdf_rnorm: Generate random samples from the standard normal distribution
sdf_rpois: Generate random samples from a Poisson distribution
sdf_rt: Generate random samples from a t-distribution
sdf_runif: Generate random samples from the uniform distribution U(0,...
sdf_rweibull: Generate random samples from a Weibull distribution.
sdf_sample: Randomly Sample Rows from a Spark DataFrame
sdf-saveload: Save / Load a Spark DataFrame
sdf_schema: Read the Schema of a Spark DataFrame
sdf_separate_column: Separate a Vector Column into Scalar Columns
sdf_seq: Create DataFrame for Range
sdf_sort: Sort a Spark DataFrame
sdf_sql: Spark DataFrame from SQL
sdf_to_avro: Convert column(s) to avro format
sdf-transform-methods: Spark ML - Transform, fit, and predict methods (sdf_...
sdf_unnest_longer: Unnest longer
sdf_unnest_wider: Unnest wider
sdf_weighted_sample: Perform Weighted Random Sampling on a Spark DataFrame
sdf_with_sequential_id: Add a Sequential ID Column to a Spark DataFrame
sdf_with_unique_id: Add a Unique ID Column to a Spark DataFrame
select: Select
separate: Separate
spark_adaptive_query_execution: Retrieves or sets status of Spark AQE
spark_advisory_shuffle_partition_size: Retrieves or sets advisory size of the shuffle partition
spark-api: Access the Spark API
spark_apply: Apply an R Function in Spark
spark_apply_bundle: Create Bundle for Spark Apply
spark_apply_log: Log Writer for Spark Apply
spark_auto_broadcast_join_threshold: Retrieves or sets the auto broadcast join threshold
spark_coalesce_initial_num_partitions: Retrieves or sets initial number of shuffle partitions before...
spark_coalesce_min_num_partitions: Retrieves or sets the minimum number of shuffle partitions...
spark_coalesce_shuffle_partitions: Retrieves or sets whether coalescing contiguous shuffle...
spark_compilation_spec: Define a Spark Compilation Specification
spark_compile: Compile Scala sources into a Java Archive
spark_config: Read Spark Configuration
spark_config_exists: A helper function to check whether a value exists under 'spark_config()'
spark_config_kubernetes: Kubernetes Configuration
spark_config_packages: Creates Spark Configuration
spark_config_settings: Retrieve Available Settings
spark_configuration: Runtime configuration interface for the Spark Session
spark_config_value: A helper function to retrieve values from 'spark_config()'
spark_connection: Retrieve the Spark Connection Associated with an R Object
spark_connection-class: spark_connection class
spark_connection_find: Find Spark Connection
spark-connections: Manage Spark Connections
spark_connect_method: Function that negotiates the connection with the Spark...
spark_context_config: Runtime configuration interface for the Spark Context.
spark_dataframe: Retrieve a Spark DataFrame
spark_default_compilation_spec: Default Compilation Specification for Spark Extensions
spark_default_version: Determine the version that will be used by default if version...
spark_dependency: Define a Spark dependency
spark_dependency_fallback: Fallback to Spark Dependency
spark_extension: Create Spark Extension
spark_get_java: Find path to Java
spark_home_dir: Find the SPARK_HOME directory for a version of Spark
spark_home_set: Set the SPARK_HOME environment variable
spark_ide_connection_open: Set of functions to provide integration with the RStudio IDE
spark_insert_table: Inserts a Spark DataFrame into a Spark table
spark_install: Download and install various versions of Spark
spark_install_find: Find a given Spark installation by version.
spark_install_sync: Helper function to sync sparkinstall project to sparklyr
spark_integ_test_skip: It lets the package know if it should test a particular...
spark_jobj: Retrieve a Spark JVM Object Reference
spark_jobj-class: spark_jobj class
spark_last_error: Surfaces the last error from Spark captured by internal...
spark_load_table: Reads from a Spark Table into a Spark DataFrame.
spark_log: View Entries in the Spark Log
sparklyr_get_backend_port: Return the port number of a 'sparklyr' backend.
spark_pipeline_stage: Create a Pipeline Stage Object
spark_read: Read file(s) into a Spark DataFrame using a custom reader
spark_read_avro: Read Apache Avro data into a Spark DataFrame.
spark_read_binary: Read binary data into a Spark DataFrame.
spark_read_csv: Read a CSV file into a Spark DataFrame
spark_read_delta: Read from Delta Lake into a Spark DataFrame.
spark_read_image: Read image data into a Spark DataFrame.
spark_read_jdbc: Read from JDBC connection into a Spark DataFrame.
spark_read_json: Read a JSON file into a Spark DataFrame
spark_read_libsvm: Read libsvm file into a Spark DataFrame.
spark_read_orc: Read an ORC file into a Spark DataFrame
spark_read_parquet: Read a Parquet file into a Spark DataFrame
spark_read_source: Read from a generic source into a Spark DataFrame.
spark_read_table: Reads from a Spark Table into a Spark DataFrame.
spark_read_text: Read a Text file into a Spark DataFrame
spark_save_table: Saves a Spark DataFrame as a Spark table
spark_statistical_routines: Generate random samples from some distribution
spark_table_name: Generate a Table Name from Expression
spark_version: Get the Spark Version Associated with a Spark Connection
spark_version_from_home: Get the Spark Version Associated with a Spark Installation
spark_versions: Retrieves a dataframe of available Spark versions that can be...
spark_web: Open the Spark web interface
spark_write: Write Spark DataFrame to file using a custom writer
spark_write_avro: Serialize a Spark DataFrame into Apache Avro format
spark_write_csv: Write a Spark DataFrame to a CSV
spark_write_delta: Writes a Spark DataFrame into Delta Lake
spark_write_jdbc: Writes a Spark DataFrame into a JDBC table
spark_write_json: Write a Spark DataFrame to a JSON file
spark_write_orc: Write a Spark DataFrame to an ORC file
spark_write_parquet: Write a Spark DataFrame to a Parquet file
spark_write_rds: Write Spark DataFrame to RDS files
spark_write_source: Writes a Spark DataFrame into a generic source
spark_write_table: Writes a Spark DataFrame into a Spark table
spark_write_text: Write a Spark DataFrame to a Text file
sql-transformer: Feature Transformation - SQLTransformer
src_databases: Show database list
stream_find: Find Stream
stream_generate_test: Generate Test Stream
stream_id: Spark Stream's Identifier
stream_lag: Apply lag function to columns of a Spark Streaming DataFrame
stream_name: Spark Stream's Name
stream_read_csv: Read files created by the stream
stream_render: Render Stream
stream_stats: Stream Statistics
stream_stop: Stops a Spark Stream
stream_trigger_continuous: Spark Stream Continuous Trigger
stream_trigger_interval: Spark Stream Interval Trigger
stream_view: View Stream
stream_watermark: Watermark Stream
stream_write_csv: Write files to the stream
stream_write_memory: Write Memory Stream
sub-.tbl_spark: Subsetting operator for Spark DataFrame
tbl_cache: Cache a Spark Table
tbl_change_db: Use specific database
tbl_uncache: Uncache a Spark Table
transform_sdf: Transform a subset of column(s) in a Spark DataFrame
unite: Unite
unnest: Unnest
worker_spark_apply_unbundle: Extracts a bundle of dependencies required by 'spark_apply()'
rstudio/sparklyr documentation built on March 29, 2024, 3:30 p.m.