Phoenix Language Reference

Phoenix is a SQL layer over HBase delivered as an embedded JDBC driver. The top level commands, SQL syntax, built-in functions, and supported data types are documented in the separate tabs below.


Commands

SELECT
UPSERT VALUES
UPSERT SELECT
DELETE
CREATE
DROP
ALTER TABLE
EXPLAIN

Other Grammar

Constraint
Table Options
Options
Hint
Column
Select Expression
Split Point
Table Expression
Order
Expression
And Condition
Condition
Compare
Operand
Summand
Factor
Term
Bind Parameter
Value
Case
Case When
Name
Quoted Name
Alias
Null
Data Type
String
Boolean
Numeric
Int
Long
Decimal
Number
Comments

SELECT

SELECT [/*+ hint */] [DISTINCT | ALL] selectExpression [,...]
FROM tableExpression [( column [,...] )] [ WHERE expression ]
[ GROUP BY expression [,...] ] [ HAVING expression ]
[ ORDER BY order [,...] ] [ LIMIT {bindParameter | number} ]
SELECT
 
/ * + hint * /
 
DISTINCT
ALL
selectExpression
 
, ...

FROM tableExpression
 
( column
 
, ...
)
 
WHERE expression

 
GROUP BY expression
 
, ...
 
HAVING expression

 
ORDER BY order
 
, ...
 
LIMIT
bindParameter
number

Selects data from a table. DISTINCT filters out duplicate results while ALL, the default, includes all results. FROM identifies the table being queried (single table only currently - no joins or derived tables yet). Dynamic columns not declared at create time may be defined in parenthesis after the table name and then used in the query. GROUP BY groups the the result by the given expression(s). HAVING filter rows after grouping. ORDER BY sorts the result by the given column(s) or expression(s) and is only allowed for aggregate queries or queries with a LIMIT clause. LIMIT limits the number of rows returned by the query with no limit applied if specified as null or less than zero. The LIMIT clause is executed after the ORDER BY clause to support TopN type queries. An optional hint overrides the default query plan.

Example:

SELECT * FROM TEST;
SELECT a.* FROM TEST;
SELECT DISTINCT NAME FROM TEST;
SELECT ID, COUNT(1) FROM TEST GROUP BY ID;
SELECT NAME, SUM(VAL) FROM TEST GROUP BY NAME HAVING COUNT(1) > 2;
SELECT 'ID' COL, MAX(ID) AS MAX FROM TEST;
SELECT * FROM TEST LIMIT 1000;

UPSERT VALUES

UPSERT INTO tableName [( columnName [,...] )] VALUES ( constantTerm [,...] )
UPSERT INTO tableName
 
( columnName
 
, ...
)
VALUES ( constantTerm
 
, ...
)

Inserts if not present and updates otherwise the value in the table. The list of columns is optional and if not present, the values will map to the column in the order they are declared in the schema. The values must evaluate to constants.

Example:

UPSERT INTO TEST VALUES('foo','bar',3);
UPSERT INTO TEST(NAME,ID) VALUES('foo',123);

UPSERT SELECT

UPSERT INTO tableName [( columnName [,...] )] select
UPSERT INTO tableName
 
( columnName
 
, ...
)
select

Inserts if not present and updates otherwise rows in the table based on the results of running another query. The values are set based on their matching position between the source and target tables. The list of columns is optional and if not present will map to the column in the order they are declared in the schema. If auto commit is on, and both a) the target table matches the source table, and b) the select performs no aggregation, then the population of the target table will be done completely on the server-side (with constraint violations logged, but otherwise ignored). Otherwise, data is buffered on the client and, if auto commit is on, committed in row batches as specified by the UpsertBatchSize connection property (or the phoenix.mutate.upsertBatchSize HBase config property which defaults to 10000 rows)

Example:

UPSERT INTO test.targetTable(col1, col2) SELECT col3, col4 FROM test.sourceTable WHERE col5 < 100
UPSERT INTO foo SELECT * FROM bar;

DELETE

DELETE [/*+ hint */] FROM tableName [ WHERE expression ]
[ ORDER BY order [,...] ] [ LIMIT {bindParameter | number} ]
DELETE
 
/ * + hint * /
FROM tableName
 
WHERE expression

 
ORDER BY order
 
, ...
 
LIMIT
bindParameter
number

Deletes the rows selected by the where clause. If auto commit is on, the deletion is performed completely server-side.

Example:

DELETE FROM TEST;
DELETE FROM TEST WHERE ID=123;
DELETE FROM TEST WHERE NAME LIKE 'foo%';

CREATE

CREATE { TABLE | VIEW } [IF NOT EXISTS] tableName
( column [,...] [constraint] )
[tableOptions] [ SPLIT ON ( splitPoint [,...] ) ]
CREATE
TABLE
VIEW
 
IF NOT EXISTS
tableName

( column
 
, ...
 
constraint
)

 
tableOptions
 
SPLIT ON ( splitPoint
 
, ...
)

Creates a new table or view. For the creation of a table, the HBase table and any column families referenced are created if they don't already exist (using uppercase names unless they are double quoted in which case they are case sensitive). Column families outside of the ones listed are not affected. At create time, an empty key value is added to the first column family of any existing rows. Upserts will also add this empty key value. This is done to improve query performance by having a key value column we can guarantee always being there (minimizing the amount of data that must be projected). Alternately, if a view is created, the HBase table and column families must already exist. No empty key value is added to existing rows and no data mutations are allowed - the view is read-only. Query performance for a view will not be as good as performance for a table. For a table only, HBase table and column configuration options may be passed through as key/value pairs to setup the HBase table as needed.

Example:

CREATE TABLE my_table ( id BIGINT not null primary key, date DATE not null)
CREATE TABLE my_table ( id INTEGER not null primary key desc, date DATE not null,
    m.db_utilization DECIMAL, i.db_utilization)
    m.DATA_BLOCK_ENCODING='DIFF'
CREATE TABLE prod_metrics ( host char(50) not null, created_date date not null,
    txn_count bigint CONSTRAINT pk PRIMARY KEY (host, created_date) )
CREATE TABLE IF NOT EXISTS my_table ( id char(10) not null primary key, value integer)
    DATA_BLOCK_ENCODING='NONE',VERSIONS=?,MAX_FILESIZE=2000000 split on (?, ?, ?)

DROP

DROP {TABLE | VIEW} [IF EXISTS] tableName
DROP
TABLE
VIEW
 
IF EXISTS
tableName

Drops a table or view. When dropping a table, the data in the table is deleted. For a view, on the other hand, the data is not affected. Note that the schema is versioned, such that snapshot queries connecting at an earlier time stamp may still query against the dropped table, as the HBase table itself is not deleted.

Example:

DROP TABLE my_schema.my_table
DROP VIEW my_view

ALTER TABLE

ALTER TABLE tableName { { ADD [IF NOT EXISTS] column [options] } | { DROP COLUMN [IF EXISTS] [ familyName. ] columnName } }
ALTER TABLE tableName
ADD
 
IF NOT EXISTS
column
 
options
DROP COLUMN
 
IF EXISTS
 
familyName .
columnName

Adds or removes a column from an existing table or view. When a column is dropped from a table, the data in that column is deleted as well. PK columns may not be dropped, and only nullable PK columns may be added. For a view, the data is not affected when a column is dropped. Note that creating or dropping columns only affects subsequent queries and data modifications. Snapshot queries that are connected at an earlier timestamp will still use the prior schema that was in place when the data was written.

Example:

ALTER TABLE my_schema.my_table ADD d.dept_id char(10) VERSIONS=10
ALTER TABLE my_table ADD dept_name char(50)
ALTER TABLE my_table ADD parent_id char(15) null primary key
ALTER TABLE my_table DROP COLUMN d.dept_id
ALTER TABLE my_table DROP COLUMN dept_name
ALTER TABLE my_table DROP COLUMN parent_id

EXPLAIN

EXPLAIN {select|upsertSelect|delete}
EXPLAIN
select
upsertSelect
delete

Computes the logical steps necessary to execute the given command. Each step is represented as a string in a single column result set row.

Example:

EXPLAIN SELECT NAME, COUNT(*) FROM TEST GROUP BY NAME HAVING COUNT(*) > 2;
EXPLAIN SELECT entity_id FROM CORE.CUSTOM_ENTITY_DATA WHERE organization_id='00D300000000XHP' AND SUBSTR(entity_id,1,3) = '002' AND created_date < CURRENT_DATE()-1;

Constraint

CONSTRAINT constraintName PRIMARY KEY ( columnName
 
ASC
DESC
 
, ...
)

Defines a multi-part primary key constraint. Each column may be declared to be sorted in ascending or descending ordering. The default is ascending.

Example:

CONSTRAINT my_pk PRIMARY KEY (host,created_date)
CONSTRAINT my_pk PRIMARY KEY (host ASC,created_date DESC)

Table Options

 
familyName .
name =
value
bindParameter
 
, ...

Sets an option on an HBase table or column by modifying the respective HBase metadata. The option applies to the named family or if omitted to all families if the name references an HColumnDescriptor property. Otherwise, the option applies to the HTableDescriptor.

One built-in option is SALT_BUCKETS. This option causes an extra byte to be transparently prepended to every row key to ensure an even distribution of write load across all your region servers. This is useful when your row key is always monotonically increasing causing hot spotting on a single region server. The byte is determined by hashing the row key and modding it with the SALT_BUCKETS value. The value may be from 1 to 256. If not split points are defined for the table, it will automatically be pre-split at each possible salt bucket value. For an excellent write-up of this technique, see http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

Example:

SALT_BUCKETS=10
DATA_BLOCK_ENCODING='NONE',a.VERSIONS=10
MAX_FILESIZE=2000000000,MEMSTORE_FLUSHSIZE=80000000

Options

name =
value
bindParameter
 
, ...

Sets an option on an HBase column by modifying the HColumnDescriptor.

Example:

DATA_BLOCK_ENCODING='NONE',VERSIONS=10
MAX_FILESIZE=2000000000,MEMSTORE_FLUSHSIZE=80000000

Hint

name
 
, ...

Advanced features that overrides default query processing behavior. The three supported hints are 1) SKIP_SCAN to force a skip scan to be performed on the query when it otherwise would not. This option may improve performance if a query does not include the leading primary key column, but does include other, very selective primary key columns. 2) RANGE_SCAN to force a range scan to be performed on the query. This option may improve performance if a query filters on a range for non selective leading primary key column along with other primary key columns 3) NO_INTRA_REGION_PARALLELIZATION to prevent the spawning of multiple threads to process data within a single region. This option is useful when the overall data set being queries is known to be small.

Example:

/*+ SKIP_SCAN */
/*+ RANGE_SCAN */
/*+ NO_INTRA_REGION_PARALLELIZATION */

Column

 
familyName .
columnName dataType
 
 
NOT
NULL
 
PRIMARY KEY
 
ASC
DESC

Define a new primary key column. The column name is case insensitive by default and case sensitive if double quoted. The sort order of a primary key may be ascending (ASC) or descending. The default is ascending.

Example:

id char(15) not null primary key
key integer null
m.response_time bigint

Select Expression

*
( familyName . * )
term
 
 
AS
columnAlias

An expression in a SELECT statement. All columns in a table may be selected using *, and all columns in a column family may be selected using <familyName>.*.

Example:

*
cf.*
ID AS VALUE
VALUE + 1 VALUE_PLUS_ONE

Split Point

value
bindParameter

Defines a split point for a table. Use a bind parameter with preparedStatement.setBinary(int,byte[]) to supply arbitrary bytes.

Example:

'A'

Table Expression

 
schemaName .
tableName
 
 
AS
tableAlias

A reference to a table. Joins and sub queries are not currently supported.

Example:

PRODUCT_METRICS AS PM

Order

expression
 
ASC
DESC
 
NULLS
FIRST
LAST

Sorts the result by an expression. Only supported for queries that use GROUP BY.

Example:

NAME DESC NULLS LAST

Expression

andCondition
 
OR andCondition
 
...

Value or condition.

Example:

ID=1 OR NAME='Hi'

And Condition

condition
 
AND condition
 
...

Value or condition.

Example:

ID=1 AND NAME='Hi'

Condition

operand
 
compare operand
IN ( constantOperand
 
, ...
)
 
NOT
LIKE operand
IS
 
NOT
NULL
NOT expression
( expression )

Boolean value or condition. When comparing with LIKE, the wildcards characters are _ (any one character) and % (any characters). To search for the characters % and _, the characters need to be escaped. The escape character is \ (backslash). Patterns that end with an escape character are invalid and the expression returns NULL.

Example:

NAME LIKE 'Jo%'

Compare

< >
< =
> =
=
<
>
! =

Comparison operator. The operator != is the same as <>.

Example:

<>

Operand

summand
 
|| summand
 
...

A string concatenation.

Example:

'foo'|| s

Summand

factor
 
+
-
factor
 
...

An addition or subtraction of numeric or date type values

Example:

a + b
a - b

Factor

term
 
*
/
term
 
...

A multiplication or division.

Example:

c * d
e / 5

Term

value
bindParameter
Function
case
caseWhen
( operand )
 
tableAlias .
columnName

A value.

Example:

'Hello'

Bind Parameter

?
: number

A parameters can be indexed, for example :1 meaning the first parameter.

Example:

:1
?

Value

string
numeric
boolean
null

A literal value of any data type, or null.

Example:

10

Case

CASE term WHEN expression THEN term
 
...

 
ELSE expression
END

Returns the first expression where the value is equal to the test expression. If no else part is specified, return NULL.

Example:

CASE CNT WHEN 0 THEN 'No' WHEN 1 THEN 'One' ELSE 'Some' END

Case When

CASE WHEN expression THEN term
 
...

 
ELSE term
END

Returns the first expression where the condition is true. If no else part is specified, return NULL.

Example:

CASE WHEN CNT<10 THEN 'Low' ELSE 'High' END

Name

A-Z | _
 
A-Z | _
0-9
 
...
quotedName

Unquoted names are not case sensitive. There is no maximum name length.

Example:

my_column

Quoted Name

" anything "

Quoted names are case sensitive, and can contain spaces. There is no maximum name length. Two double quotes can be used to create a single double quote inside an identifier.

Example:

"first-name"

Alias

name

An alias is a name that is only valid in the context of the statement.

Example:

A

Null

NULL

NULL is a value without data type and means 'unknown value'.

Example:

NULL

Data Type

charType
varcharType
integerType
bigintType
decimalType
timestampType
dateType
timeType
unsignedLongType
unsignedIntType
binaryType
varBinaryType

A type name.

Example:

CHAR(15)
VARCHAR
VARCHAR(1000)
INTEGER
BINARY(200)

String

' anything '

A string starts and ends with a single quote. Two single quotes can be used to create a single quote inside a string.

Example:

'John''s car'

Boolean

TRUE
FALSE

A boolean value.

Example:

TRUE

Numeric

int
long
decimal

The data type of a numeric value is always the lowest possible for the given value. If the number contains a dot this is decimal; otherwise it is int, long, or decimal (depending on the value).

Example:

SELECT -10.05
SELECT 5
SELECT 12345678912345

Int

 
-
number

The maximum integer number is 2147483647, the minimum is -2147483648.

Example:

10

Long

 
-
number

Long numbers are between -9223372036854775808 and 9223372036854775807.

Example:

100000

Decimal

 
-
number
 
. number

A decimal number with fixed precision and scale. Internally, java.lang.BigDecimal is used.

Example:

SELECT -10.5

Number

0-9
 
...

The maximum length of the number depends on the data type used.

Example:

100

Comments

- - anything
/ / anything
/ * anything * /

Comments can be used anywhere in a command and are ignored by the database. Line comments end with a newline. Block comments cannot be nested, but can be multiple lines long.

Example:

// This is a comment