Spark SQL Functions – Listed by Category

In this post we will address Spark SQL Functions, i.e., built-in functions, its syntax and what it does. The comprehensive list of Spark functions provided in the Apache Spark API documentation can be a bit difficult to navigate. In this post we breakdown the Apache Spark built-in functions by Category: Operators, String functions, Number functions, Date functions, Array functions, Conversion functions and Regex functions. Hopefully this will simplify the learning process and serve as a better reference article for Spark SQL functions.

  • Spark SQL Operators
  • Spark SQL String functions
  • Spark SQL Date functions
  • Spark SQL Number functions
  • Spark SQL Array functions
  • Spark SQL Conversion functions
  • Spark SQL Regex functions

Spark API

When writing and executing Spark SQL from Scala, Java, Python or R, a SparkSession is still the entry point. Once a SparkSession has been established, a DataFrame or a Dataset needs to be created on the data before Spark SQL can be executed

Spark SQL CLI

This Spark SQL Command Line interface is a lifesaver for writing and testing out SQL. However, the SQL is executed against Hive, so make sure test data exists in some capacity.

Spark SQL Operators

Operator
Description
!
Logical NOT
&
Returs the bitwise AND of two or more expressions
AND
Logical AND
|
Returs the bitwise OR of two or more expressions
OR
Logical OR
^
Returns the bitwise exclusive OR of two or more expressions
~
Returns the bitwise NOT of two or more expressions
<
Less than
<=
Less than or equal to
>
Greater than
>=
Greater than or equal to
= , ==
Returns true if two expressions are equal and false if not
<==>
Same as equals to, but returns TRUE if both values are NULL
and FALSE if one of the values is NULL
%
Retains the remainder of two or more expressions
*
Multiplication
/
Division
+
Addition
Subtraction

Spark SQL String Functions

Function
Syntax
Description
ascii
ascii(string)
Returns the numeric value of the left most character
bin
bin(expression)
Returns the string representation of “expression” represented in binary
bit_length
bit_length(expr)
Returns the bit length of string data or number of bits of binary data
char_length
character_length
char_length(expression)
character_length(expression)
Returns the character length of string data or number of bytes of binary data
chr
char
chr(expression)
char(expression)
Returns the ASCII character of the binary equivalent
first
first_value
first(expression[, isIgnoreNull])
first_value(expression[, isIgnoreNull])
Returns the first value of “expression” from a group of rows
initcap
initcap(string)
Converts the FIRST character of “string” to UPPERCASE
lcase
lower
lcase(string)
lower(string)
Converts “string” to LOWERCASE
left
left(string, length)
Returns the leftmost characters from “string” to a length specified by “length”
length
length(string)
Returns the length of “string”
lpad
lpad(string, length, pad)
Left pads “string” using character specified by “pad” to a length of “length”
ltrim
ltrim(trimString, string)
Removes all occurances of “trimString” from the leftmost/beginning of “string”
position
position(substring, string[, position])
Returns “substring” from “string”, starting at “positon”
printf
printf(strfmt, obj, …)
Returns a formatted string from printf-style format strings
repeat
repeat(string, n)
Repeats “string”, “n” number of times
replace
replace(string, search[, replace])
Replaces all occurances of “search” in “string” with “replace”
right
right(string, length)
Returns the characters of “length” from “string” starting from the rigthmost position
rpad
rpad(string, length, pad)
Right pads “string” with character specified in “pad” to length specified by “length”
rtrim
rtrim(trimString,string)
Right trims the character specified by “trimString” in “string”
soundex
soundex(str)
Returns the soundex code of the string
space
space(n)
Returns a string with “n” spaces
substr
substring
substr(string, position[, length])
substring(string, position[, length])
Returns the substring from “string”, starting at “position” of specified “length”
substring_index
substring_index(string, delimiter, count)
Returns the substring from “string” before “count” number of occurances of the specified “delimiter”
translate
translate(input, from, to)
Converts the characters in the “input” string “from” a given character “to” a new character
trim
trim(string)
Removes the leading and trailing white space characters from a string
ucase
upper
ucase(string)
upper(string)
Converts a given input string to UPPERCASE
unbase64
unbase64(string)
Converts a given string from base 64 to binary

Apache Spark Documentation

  • For the official Apache Spark v2.3.2 Spark SQL Programming Guide, click here.
  • For the official Apache Spark SQL API documentation, click here.

Other related documents

  • For the official Apache Spark v2.3.2 Spark SQL Programming Guide, click here.
  • For the official Apache Spark SQL API documentation, click here.

Email us at : info@obstkel.com

Copyright 2019 © OBSTKEL LLC. All rights Reserved.
Scroll to Top