Phoenix Language Reference

Phoenix is a SQL layer over HBase delivered as an embedded JDBC driver. The top level commands, SQL syntax, built-in functions, and supported data types are documented in the separate tabs below.


Commands

SELECT
UPSERT VALUES
UPSERT SELECT
DELETE
CREATE
DROP
ALTER TABLE
CREATE INDEX
DROP INDEX
ALTER INDEX
EXPLAIN

Other Grammar

Constraint
Options
Hint
Column Def
Table Ref
Column Ref
Select Expression
Split Point
Table Expression
Order
Expression
And Condition
Condition
Compare
Operand
Summand
Factor
Term
Row Value Constructor
Bind Parameter
Value
Case
Case When
Name
Quoted Name
Alias
Null
Data Type
String
Boolean
Numeric
Int
Long
Decimal
Number
Comments

SELECT

SELECT [/*+ hint */] [DISTINCT | ALL] selectExpression [,...]
FROM tableExpression [( columnDef [,...] )] [ WHERE expression ]
[ GROUP BY expression [,...] ] [ HAVING expression ]
[ ORDER BY order [,...] ] [ LIMIT {bindParameter | number} ]
SELECT
 
/ * + hint * /
 
DISTINCT
ALL
selectExpression
 
, ...

FROM tableExpression
 
( columnDef
 
, ...
)
 
WHERE expression

 
GROUP BY expression
 
, ...
 
HAVING expression

 
ORDER BY order
 
, ...
 
LIMIT
bindParameter
number

Selects data from a table. DISTINCT filters out duplicate results while ALL, the default, includes all results. FROM identifies the table being queried (single table only currently - no joins or derived tables yet). Dynamic columns not declared at create time may be defined in parenthesis after the table name and then used in the query. GROUP BY groups the the result by the given expression(s). HAVING filter rows after grouping. ORDER BY sorts the result by the given column(s) or expression(s) and is only allowed for aggregate queries or queries with a LIMIT clause. LIMIT limits the number of rows returned by the query with no limit applied if specified as null or less than zero. The LIMIT clause is executed after the ORDER BY clause to support TopN type queries. An optional hint overrides the default query plan.

Example:

SELECT * FROM TEST;
SELECT a.* FROM TEST;
SELECT DISTINCT NAME FROM TEST;
SELECT ID, COUNT(1) FROM TEST GROUP BY ID;
SELECT NAME, SUM(VAL) FROM TEST GROUP BY NAME HAVING COUNT(1) > 2;
SELECT 'ID' COL, MAX(ID) AS MAX FROM TEST;
SELECT * FROM TEST LIMIT 1000;

UPSERT VALUES

UPSERT INTO tableName [( { columnRef | columnDef } [,...] )] VALUES ( constantTerm [,...] )
UPSERT INTO tableName
 
(
columnRef
columnDef
 
, ...
)
VALUES ( constantTerm
 
, ...
)

Inserts if not present and updates otherwise the value in the table. The list of columns is optional and if not present, the values will map to the column in the order they are declared in the schema. The values must evaluate to constants.

Example:

UPSERT INTO TEST VALUES('foo','bar',3);
UPSERT INTO TEST(NAME,ID) VALUES('foo',123);

UPSERT SELECT

UPSERT [/*+ hint */] INTO tableName [( { columnRef | columnDef } [,...] )] select
UPSERT
 
/ * + hint * /
INTO tableName
 
(
columnRef
columnDef
 
, ...
)
select

Inserts if not present and updates otherwise rows in the table based on the results of running another query. The values are set based on their matching position between the source and target tables. The list of columns is optional and if not present will map to the column in the order they are declared in the schema. If auto commit is on, and both a) the target table matches the source table, and b) the select performs no aggregation, then the population of the target table will be done completely on the server-side (with constraint violations logged, but otherwise ignored). Otherwise, data is buffered on the client and, if auto commit is on, committed in row batches as specified by the UpsertBatchSize connection property (or the phoenix.mutate.upsertBatchSize HBase config property which defaults to 10000 rows)

Example:

UPSERT INTO test.targetTable(col1, col2) SELECT col3, col4 FROM test.sourceTable WHERE col5 < 100
UPSERT INTO foo SELECT * FROM bar;

DELETE

DELETE [/*+ hint */] FROM tableName [ WHERE expression ]
[ ORDER BY order [,...] ] [ LIMIT {bindParameter | number} ]
DELETE
 
/ * + hint * /
FROM tableName
 
WHERE expression

 
ORDER BY order
 
, ...
 
LIMIT
bindParameter
number

Deletes the rows selected by the where clause. If auto commit is on, the deletion is performed completely server-side.

Example:

DELETE FROM TEST;
DELETE FROM TEST WHERE ID=123;
DELETE FROM TEST WHERE NAME LIKE 'foo%';

CREATE

CREATE { TABLE | VIEW } [IF NOT EXISTS] tableRef
( columnDef [,...] [constraint] )
[tableOptions] [ SPLIT ON ( splitPoint [,...] ) ]
CREATE
TABLE
VIEW
 
IF NOT EXISTS
tableRef

( columnDef
 
, ...
 
constraint
)

 
tableOptions
 
SPLIT ON ( splitPoint
 
, ...
)

Creates a new table or view. For the creation of a table, the HBase table and any column families referenced are created if they don't already exist (using uppercase names unless they are double quoted in which case they are case sensitive). Column families outside of the ones listed are not affected. At create time, an empty key value is added to the first column family of any existing rows. Upserts will also add this empty key value. This is done to improve query performance by having a key value column we can guarantee always being there (minimizing the amount of data that must be projected). Alternately, if a view is created, the HBase table and column families must already exist. No empty key value is added to existing rows and no data mutations are allowed - the view is read-only. Query performance for a view will not be as good as performance for a table. For a table only, HBase table and column configuration options may be passed through as key/value pairs to setup the HBase table as needed.

Example:

CREATE TABLE my_schema.my_table ( id BIGINT not null primary key, date DATE not null)
CREATE TABLE my_table ( id INTEGER not null primary key desc, date DATE not null,
    m.db_utilization DECIMAL, i.db_utilization)
    m.DATA_BLOCK_ENCODING='DIFF'
CREATE TABLE stats.prod_metrics ( host char(50) not null, created_date date not null,
    txn_count bigint CONSTRAINT pk PRIMARY KEY (host, created_date) )
CREATE TABLE IF NOT EXISTS my_table ( id char(10) not null primary key, value integer)
    DATA_BLOCK_ENCODING='NONE',VERSIONS=?,MAX_FILESIZE=2000000 split on (?, ?, ?)

DROP

DROP {TABLE | VIEW} [IF EXISTS] tableRef
DROP
TABLE
VIEW
 
IF EXISTS
tableRef

Drops a table or view. When dropping a table, the data in the table is deleted. For a view, on the other hand, the data is not affected. Note that the schema is versioned, such that snapshot queries connecting at an earlier time stamp may still query against the dropped table, as the HBase table itself is not deleted.

Example:

DROP TABLE my_schema.my_table
DROP VIEW my_view

ALTER TABLE

ALTER TABLE tableRef { { ADD [IF NOT EXISTS] columnDef [options] } | { DROP COLUMN [IF EXISTS] columnRef } | { SET options } }
ALTER TABLE tableRef
ADD
 
IF NOT EXISTS
columnDef
 
options
DROP COLUMN
 
IF EXISTS
columnRef
SET options

Alters an existing table by adding or removing a column or updating table options. When a column is dropped from a table, the data in that column is deleted as well. PK columns may not be dropped, and only nullable PK columns may be added. For a view, the data is not affected when a column is dropped. Note that creating or dropping columns only affects subsequent queries and data modifications. Snapshot queries that are connected at an earlier timestamp will still use the prior schema that was in place when the data was written.

Example:

ALTER TABLE my_schema.my_table ADD d.dept_id char(10) VERSIONS=10
ALTER TABLE my_table ADD dept_name char(50)
ALTER TABLE my_table ADD parent_id char(15) null primary key
ALTER TABLE my_table DROP COLUMN d.dept_id
ALTER TABLE my_table DROP COLUMN dept_name
ALTER TABLE my_table DROP COLUMN parent_id
ALTER TABLE my_table SET IMMUTABLE_ROWS=true

CREATE INDEX

CREATE INDEX [IF NOT EXISTS] indexName
ON tableRef ( columnRef [ASC | DESC] [,...] )
[ INCLUDE ( columnRef [,...] ) ]
[indexOptions] [ SPLIT ON ( splitPoint [,...] ) ]
CREATE INDEX
 
IF NOT EXISTS
indexName

ON tableRef ( columnRef
 
ASC
DESC
 
, ...
)

 
INCLUDE ( columnRef
 
, ...
)

 
indexOptions
 
SPLIT ON ( splitPoint
 
, ...
)

Creates a new secondary index on a table or view. The index will be automatically kept in sync with the table as the data changes. At query time, the optimizer will use the index if it contains all columns referenced in the query and produces the most efficient execution plan. If a table has rows that are write-once and append-only, then the table may set the IMMUTABLE_ROWS property to true (either up-front in the CREATE TABLE statement or afterwards in an ALTER TABLE statement). This reduces the overhead at write time to maintain the index. Otherwise, if this property is not set on the table, then incremental index maintenance will be performed on the server side when the data changes.

Example:

CREATE INDEX my_idx ON sales.opportunity(last_updated_date DESC)
CREATE INDEX my_idx ON log.event(created_date DESC) INCLUDE (name, payload) SALT_BUCKETS=10
CREATE INDEX IF NOT EXISTS my_comp_idx ON server_metrics ( gc_time DESC, created_date DESC )
    DATA_BLOCK_ENCODING='NONE',VERSIONS=?,MAX_FILESIZE=2000000 split on (?, ?, ?)

DROP INDEX

DROP INDEX [IF EXISTS] indexName ON tableRef
DROP INDEX
 
IF EXISTS
indexName ON tableRef

Drops an index from a table. When dropping an index, the data in the index is deleted. Note that since metadata is versioned, snapshot queries connecting at an earlier time stamp may still use the index, as the HBase table backing the index is not deleted.

Example:

DROP INDEX my_idx ON sales.opportunity
DROP INDEX IF EXISTS my_idx ON server_metrics

ALTER INDEX

ALTER INDEX [IF EXISTS] indexName ON tableRef { DISABLE | REBUILD | UNUSABLE | USABLE }
ALTER INDEX
 
IF EXISTS
indexName ON tableRef
DISABLE
REBUILD
UNUSABLE
USABLE

Alters the state of an existing index.  DISABLE will cause the no further index maintenance to be performed on the index and it will no longer be considered for use in queries. REBUILD will completely rebuild the index and upon completion will enable the index to be used in queries again. UNUSABLE will cause the index to no longer be considered for use in queries, however index maintenance will continue to be performed. USABLE will cause the index to again be considered for use in queries. Note that a disabled index must be rebuild and cannot be set as USABLE.

Example:

ALTER INDEX my_idx ON sales.opportunity DISABLE
ALTER INDEX IF EXISTS my_idx ON server_metrics REBUILD

EXPLAIN

EXPLAIN {select|upsertSelect|delete}
EXPLAIN
select
upsertSelect
delete

Computes the logical steps necessary to execute the given command. Each step is represented as a string in a single column result set row.

Example:

EXPLAIN SELECT NAME, COUNT(*) FROM TEST GROUP BY NAME HAVING COUNT(*) > 2;
EXPLAIN SELECT entity_id FROM CORE.CUSTOM_ENTITY_DATA WHERE organization_id='00D300000000XHP' AND SUBSTR(entity_id,1,3) = '002' AND created_date < CURRENT_DATE()-1;

Constraint

CONSTRAINT constraintName PRIMARY KEY ( columnName
 
ASC
DESC
 
, ...
)

Defines a multi-part primary key constraint. Each column may be declared to be sorted in ascending or descending ordering. The default is ascending.

Example:

CONSTRAINT my_pk PRIMARY KEY (host,created_date)
CONSTRAINT my_pk PRIMARY KEY (host ASC,created_date DESC)

Options

 
familyName .
name =
value
bindParameter
 
, ...

Sets an option on an HBase table or column by modifying the respective HBase metadata. The option applies to the named family or if omitted to all families if the name references an HColumnDescriptor property. Otherwise, the option applies to the HTableDescriptor.

One built-in option is SALT_BUCKETS. This option causes an extra byte to be transparently prepended to every row key to ensure an even distribution of write load across all your region servers. This is useful when your row key is always monotonically increasing causing hot spotting on a single region server. The byte is determined by hashing the row key and modding it with the SALT_BUCKETS value. The value may be from 1 to 256. If not split points are defined for the table, it will automatically be pre-split at each possible salt bucket value. For an excellent write-up of this technique, see http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

Another built-in options is IMMUTABLE_ROWS. Only tables with immutable rows are allowed to have indexes. Immutable rows are expected to be inserted once in their entirety and then never updated. This limitation will be removed once incremental index maintenance has been implemented. The current implementation inserts the index rows when the data row is inserted.

Example:

IMMUTABLE_ROWS=true
SALT_BUCKETS=10
DATA_BLOCK_ENCODING='NONE',a.VERSIONS=10
MAX_FILESIZE=2000000000,MEMSTORE_FLUSHSIZE=80000000

Hint

name
 
, ...

Advanced features that overrides default query processing behavior. The supported hints include 1) SKIP_SCAN to force a skip scan to be performed on the query when it otherwise would not be. This option may improve performance if a query does not include the leading primary key column, but does include other, very selective primary key columns. 2) RANGE_SCAN to force a range scan to be performed on the query. This option may improve performance if a query filters on a range for non selective leading primary key column along with other primary key columns 3) NO_INTRA_REGION_PARALLELIZATION to prevent the spawning of multiple threads to process data within a single region. This option is useful when the overall data set being queries is known to be small. 4) NO_INDEX to force the data table to be used for a query, and 5) INDEX(<table_name> <index_name>...) to suggest which index to use for a given query. Double quotes may be used to surround a table_name and/or index_name that is case sensitive.

Example:

/*+ SKIP_SCAN */
/*+ RANGE_SCAN */
/*+ NO_INTRA_REGION_PARALLELIZATION */
/*+ NO_INDEX */
/*+ INDEX(employee emp_name_idx emp_start_date_idx) */

Column Def

columnRef dataType
 
 
NOT
NULL
 
PRIMARY KEY
 
ASC
DESC

Define a new primary key column. The column name is case insensitive by default and case sensitive if double quoted. The sort order of a primary key may be ascending (ASC) or descending. The default is ascending.

Example:

id char(15) not null primary key
key integer null
m.response_time bigint

Table Ref

 
schemaName .
tableName

References a table with an optional schema name qualifier

Example:

Sales.Contact
HR.Employee
Department

Column Ref

 
familyName .
columnName

References a column with an optional family name qualifier

Example:

e.salary
dept_name

Select Expression

*
( familyName . * )
term
 
 
AS
columnAlias

An expression in a SELECT statement. All columns in a table may be selected using *, and all columns in a column family may be selected using <familyName>.*.

Example:

*
cf.*
ID AS VALUE
VALUE + 1 VALUE_PLUS_ONE

Split Point

value
bindParameter

Defines a split point for a table. Use a bind parameter with preparedStatement.setBinary(int,byte[]) to supply arbitrary bytes.

Example:

'A'

Table Expression

 
schemaName .
tableName
 
 
AS
tableAlias

A reference to a table. Joins and sub queries are not currently supported.

Example:

PRODUCT_METRICS AS PM

Order

expression
 
ASC
DESC
 
NULLS
FIRST
LAST

Sorts the result by an expression.

Example:

NAME DESC NULLS LAST

Expression

andCondition
 
OR andCondition
 
...

Value or condition.

Example:

ID=1 OR NAME='Hi'

And Condition

condition
 
AND condition
 
...

Value or condition.

Example:

ID=1 AND NAME='Hi'

Condition

operand
 
compare operand
 
NOT
IN ( constantOperand
 
, ...
)
 
NOT
LIKE operand
 
NOT
BETWEEN operand AND operand
IS
 
NOT
NULL
NOT expression
( expression )

Boolean value or condition. When comparing with LIKE, the wildcards characters are _ (any one character) and % (any characters). To search for the characters % and _, the characters need to be escaped. The escape character is \ (backslash). Patterns that end with an escape character are invalid and the expression returns NULL. BETWEEN does an inclusive comparison for both operands.

Example:

NAME LIKE 'Jo%'

Compare

< >
< =
> =
=
<
>
! =

Comparison operator. The operator != is the same as <>.

Example:

<>

Operand

summand
 
|| summand
 
...

A string concatenation.

Example:

'foo'|| s

Summand

factor
 
+
-
factor
 
...

An addition or subtraction of numeric or date type values

Example:

a + b
a - b

Factor

term
 
*
/
term
 
...

A multiplication or division.

Example:

c * d
e / 5

Term

value
bindParameter
Function
case
caseWhen
( operand )
 
tableAlias .
columnRef
rowValueConstructor

A value.

Example:

'Hello'

Row Value Constructor

( term , term
 
...
)

A row value constructor is a list of other terms which are treated together as a kind of composite structure. They may be compared to each other or to other other terms. The main use case is 1) to enable efficiently stepping through a set of rows in support of query-more type functionality, or 2) to allow IN clause to perform point gets on composite row keys.

Example:

(col1, col2, 5)

Bind Parameter

?
: number

A parameters can be indexed, for example :1 meaning the first parameter.

Example:

:1
?

Value

string
numeric
boolean
null

A literal value of any data type, or null.

Example:

10

Case

CASE term WHEN expression THEN term
 
...

 
ELSE expression
END

Returns the first expression where the value is equal to the test expression. If no else part is specified, return NULL.

Example:

CASE CNT WHEN 0 THEN 'No' WHEN 1 THEN 'One' ELSE 'Some' END

Case When

CASE WHEN expression THEN term
 
...

 
ELSE term
END

Returns the first expression where the condition is true. If no else part is specified, return NULL.

Example:

CASE WHEN CNT<10 THEN 'Low' ELSE 'High' END

Name

A-Z | _
 
A-Z | _
0-9
 
...
quotedName

Unquoted names are not case sensitive. There is no maximum name length.

Example:

my_column

Quoted Name

" anything "

Quoted names are case sensitive, and can contain spaces. There is no maximum name length. Two double quotes can be used to create a single double quote inside an identifier.

Example:

"first-name"

Alias

name

An alias is a name that is only valid in the context of the statement.

Example:

A

Null

NULL

NULL is a value without data type and means 'unknown value'.

Example:

NULL

Data Type

charType
varcharType
decimalType
tinyintType
smallintType
integerType
bigintType
floatType
doubleType
timestampType
dateType
timeType
unsignedTinyintType
unsignedSmallintType
unsignedIntType
unsignedLongType
unsignedFloatType
unsignedDoubleType
binaryType
varbinaryType

A type name.

Example:

CHAR(15)
VARCHAR
VARCHAR(1000)
INTEGER
BINARY(200)

String

' anything '

A string starts and ends with a single quote. Two single quotes can be used to create a single quote inside a string.

Example:

'John''s car'

Boolean

TRUE
FALSE

A boolean value.

Example:

TRUE

Numeric

int
long
decimal

The data type of a numeric value is always the lowest possible for the given value. If the number contains a dot this is decimal; otherwise it is int, long, or decimal (depending on the value).

Example:

SELECT -10.05
SELECT 5
SELECT 12345678912345

Int

 
-
number

The maximum integer number is 2147483647, the minimum is -2147483648.

Example:

10

Long

 
-
number

Long numbers are between -9223372036854775808 and 9223372036854775807.

Example:

100000

Decimal

 
-
number
 
. number

A decimal number with fixed precision and scale. Internally, java.lang.BigDecimal is used.

Example:

SELECT -10.5

Number

0-9
 
...

The maximum length of the number depends on the data type used.

Example:

100

Comments

- - anything
/ / anything
/ * anything * /

Comments can be used anywhere in a command and are ignored by the database. Line comments end with a newline. Block comments cannot be nested, but can be multiple lines long.

Example:

// This is a comment