Batch Streaming
The general syntax of the SELECT statement is:
SELECT select_list FROM table_expression [ WHERE boolean_expression ]
The table_expression refers to any source of data. It could be an existing table, a view, a VALUES clause, the joined results of multiple existing tables, or a subquery. Assuming that the table is available in the catalog, the following would read all rows from Orders.
SELECT * FROM Orders
The select_list specification * means the query will resolve all columns. However, usage of * is discouraged in production because it makes queries less robust to catalog changes. Instead, a select_list can specify a subset of available columns or make calculations using said columns. For example, if Orders has columns named order_id, price, and tax, you could write the following query:
SELECT order_id, price + tax FROM Orders
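A computed column can also be given an explicit name so that downstream consumers do not depend on a generated one. A minimal sketch, assuming the Orders columns described above; the alias total_price is illustrative and not part of the original schema:

```sql
-- total_price is a hypothetical alias for the computed column
SELECT order_id, price + tax AS total_price FROM Orders
```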
Queries can also consume from inline data using the VALUES clause. Each tuple corresponds to one row, and an alias may be provided to assign names to each column.
SELECT order_id, price FROM (VALUES (1, 2.0), (2, 3.1)) AS t (order_id, price)
Rows can be filtered based on a WHERE clause.
SELECT price + tax FROM Orders WHERE order_id = 10
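The pieces above (inline data, a computed column, and a filter) can be combined in one statement that does not depend on any catalog table. This is only a sketch: the sample values and the alias total_price are made up for illustration.

```sql
-- Inline rows stand in for the Orders table; all values are illustrative.
SELECT order_id, price + tax AS total_price
FROM (VALUES (1, 2.0, 0.2), (2, 3.1, 0.3), (3, 10.0, 1.0)) AS t (order_id, price, tax)
-- The boolean expression is evaluated per row; only rows where it is true are kept.
WHERE price > 3.0
```

With this data, only the rows with order_id 2 and 3 satisfy the condition.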
Additionally, built-in and user-defined scalar functions can be invoked on the columns of a single row. User-defined functions must be registered in a catalog before use.
SELECT PRETTY_PRINT(order_id) FROM Orders
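One way to register such a function is a CREATE FUNCTION statement issued before the query. The sketch below assumes a hypothetical Java class, com.example.PrettyPrint, that implements a scalar function and is available on the classpath; PRETTY_PRINT is not a built-in function.

```sql
-- com.example.PrettyPrint is a hypothetical class name used for illustration.
CREATE FUNCTION PRETTY_PRINT AS 'com.example.PrettyPrint' LANGUAGE JAVA;

-- Once registered, the function can be applied to a column of each row.
SELECT PRETTY_PRINT(order_id) FROM Orders;
```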
Batch Streaming
If SELECT DISTINCT is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates).
SELECT DISTINCT order_id FROM Orders
For streaming queries, the state required to compute the query result might grow without bound. The state size depends on the number of distinct rows. You can provide a query configuration with an appropriate state time-to-live (TTL) to prevent excessive state size. Note that this might affect the correctness of the query result, because a row whose state has expired is no longer recognized as a duplicate. See query configuration for details.
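As a rough sketch, the state TTL could be set in the SQL client before the query is submitted. This assumes the table.exec.state.ttl option and the client's SET syntax; the one-hour value is an arbitrary illustration, not a recommendation.

```sql
-- Assumption: idle state is cleared after one hour; choose a value that fits your data.
SET 'table.exec.state.ttl' = '1 h';

-- A duplicate arriving after its state has expired may be emitted again.
SELECT DISTINCT order_id FROM Orders;
```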