Query Syntax Reference
This document details the syntax of the Datui query language. For examples and typical workflows, see Querying Data.
Structure of a query
A query has the form:
select [columns] [by group_columns] [where conditions]
select— Required. Starts every query. May be followed by nothing (select all columns), or a comma‑separated list of column expressions.by— Optional. Grouping and aggregation. Everything afterbyup towhere(if present) is the group specification.where— Optional. Filtering. Everything afterwhereis the filter expression.
Clause order is fixed: select → by → where. The parser splits on the keywords where and by (respecting parentheses and brackets), so you cannot reorder or repeat clauses.
The : assignment (aliasing)
Use : to name an expression. The form is name : expression:
- Left of
:— The new column or group name. Must be an identifier (e.g.total) orcol["name with spaces"]. - Right of
:— The expression (column reference, literal, arithmetic, function call, etc.).
Examples:
select a, b, sum_ab: a + b
select renamed: col["Original Name"]
by region_name: region, total: sales + tax
Assignment is supported in both the select and by clauses. In by, it defines computed group keys or renames (e.g. region_name: region, total: sales + tax). Expressions may use column references, arithmetic, literals, and function calls.
Columns with spaces in their names
Identifiers cannot contain spaces. For columns (or aliases) with spaces, use col["..."] or col[identifier]:
select col["First Name"], col["Last Name"]
select no_spaces: col["name with spaces"]
Inside the brackets use either a quoted string ("name with spaces") or a single identifier (no spaces). The same syntax works in select, by, and where.
Right‑to‑left expression parsing
Expressions are parsed right‑to‑left: the leftmost binary operator is the root, and the right subexpression is parsed first (so it effectively binds tighter).
What this means
a + b * c→ parsed asa + (b * c)(multiplication binds tighter).a * b + c→ parsed asa * (b + c)(*is leftmost; the right subexpressionb + cis parsed as a unit).
So “higher‑precedence”‑style grouping happens when you put those operations on the right. You can often avoid parentheses by ordering:
select x, y: a * b + c → a * (b + c)
select x, y: a + b * c → a + (b * c)
Using parentheses
Use () to override grouping:
select (a + b) * c
select a, b where (x > 1) | (y < 0)
Parentheses also matter for , and | in where: splitting on comma and pipe respects nesting, so you can wrap ORs (or more complex logic) in () and then combine with commas. See Where clause: , and |.
Select clause
selectalone — All columns, no expressions.select a, b, c— Those columns (and/or expressions), in order.select a, b: x + y, c— Columns and aliased expressions.
Columns are separated by ,. Each item is either a plain expression or alias : expression. The same expression rules (arithmetic, col[], functions, etc.) apply everywhere.
By clause (grouping and aggregation)
by col1, col2— Group by those columns. Non‑group columns become list columns; the UI supports drill‑down.by region, total: sales + tax— Group byregionand a computed expression (e.g. arithmetic).select avg salary, min id by department— Aggregations per group.
By uses the same comma‑separated list and name : expression rules as select. Aggregation functions (avg, min, max, count, sum, std, med) can be written as fn[expr] or fn expr (brackets optional).
Where clause: , and |
The where clause combines conditions using two separators:
,— AND. Each comma‑separated segment is one ANDed condition.|— OR. Within a single comma‑separated segment,|separates alternatives that are ORed together.
Parsing order:
- Split the where part on
,(respecting()and[]). Each segment is ANDed. - Within each segment, split on
|. The pieces are ORed.
So:
where a > 10, b < 2→(a > 10) AND (b < 2).where a > 10 | a < 5→(a > 10) OR (a < 5).where a > 10 | a < 5, b = 2→(a > 10 OR a < 5) AND (b = 2).
Interaction of , and |
, has broader scope than |: it splits the where clause into top‑level AND terms. | only groups within one of those terms. So:
A, B | C=A AND (B OR C).A | B, C | D=(A OR B) AND (C OR D).
To express (A or B) and (C or D), use (A)|(B), (C)|(D). Parentheses keep | inside each AND term. For more complex logic, use () to group OR subexpressions, then separate those groups with ,.
Comparison operators
In where conditions, each comparison operator compares a left-hand expression (often a column) to a right-hand expression (column, literal, or expression).
| Operator | Meaning | Example |
|---|---|---|
= | Equal | where a = 10 |
!= | Not equal | where a != 0 |
<> | Not equal (same as !=) | where a <> 0 |
< | Less than | where a < 100 |
> | Greater than | where a > 5 |
<= | Less than or equal | where a <= 10 |
>= | Greater than or equal | where a >= 1 |
Operators and literals
- Arithmetic:
+-*%(%is division, not modulo). - Comparison: See Comparison operators (where clauses) above.
- Literals: Numbers (
42,3.14), strings ("hello",\"for embedded quotes), date literals (2021.01.01in YYYY.MM.DD format), timestamp literals (2021.01.15T14:30:00.123456in YYYY.MM.DDTHH:MM:SS[.fff…] format; fractional seconds set precision: 1–3 digits = ms, 4–6 = μs, 7–9 = ns). - Coalesce:
^— first non-null from left to right.a^b^c= coalesce(a, b, c). Right-to-left binding:a^b^c=a^(b^c).
Date and datetime accessors
For columns of type Date or Datetime, use dot notation to extract components:
column_ref.accessor
Datetime accessors
Apply to Datetime columns (with or without timezone):
| Accessor | Result | Description |
|---|---|---|
date | Date | Date part (year-month-day) |
time | Time | Time part (Polars Time type) |
year | Int32 | Year |
month | Int8 | Month (1–12) |
week | Int8 | Week number |
day | Int8 | Day of month (1–31) |
dow | Int8 | Day of week (1=Monday … 7=Sunday, ISO) |
month_start | Datetime/Date | First day of month at midnight |
month_end | Datetime/Date | Last day of month |
format["fmt"] | String | Format as string (chrono strftime, e.g. "%Y-%m") |
Date accessors
Apply to Date columns:
| Accessor | Result | Description |
|---|---|---|
year | Int32 | Year |
month | Int8 | Month (1–12) |
week | Int8 | Week number |
day | Int8 | Day of month |
dow | Int8 | Day of week (1=Monday … 7=Sunday) |
month_start | Date | First day of month |
month_end | Date | Last day of month |
format["fmt"] | String | Format as string (chrono strftime) |
String accessors
Apply to String columns:
| Accessor | Result | Description |
|---|---|---|
len | Int32 | Character length |
upper | String | Uppercase |
lower | String | Lowercase |
starts_with["x"] | Boolean | True if string starts with x |
ends_with["x"] | Boolean | True if string ends with x |
contains["x"] | Boolean | True if string contains x |
Accessor aliases
When you use an accessor on a column, the result is automatically aliased to {column}_{accessor} (e.g. timestamp.date → timestamp_date). This avoids duplicate column names and keeps results clear.
Syntax examples
Select:
select event_date: timestamp.date
select col["Created At"].date, col["Created At"].year
select name, event_time.time
select order_date, order_date.month, order_date.dow by order_date.year
Where (with date literal YYYY.MM.DD, timestamp literal):
select where created_at.date > other_date_col
select where dt_col.date > 2021.01.01
select where ts_col > 2021.01.15T14:30:00.123456
select where event_ts.month = 12, event_ts.dow = 1
select where city_name.ends_with["lanta"]
select where null col1
select where not null col1
Select (coalesce, string accessors, format):
select a: coln^cola^colb
select name.len, name.upper, dt_col.format["%Y-%m"]
Functions
Functions are used for aggregation (typically in select with by) and for logic in where. Write fn[expr] or fn expr; brackets are optional.
Aggregation functions
| Function | Aliases | Description | Example |
|---|---|---|---|
avg | mean | Average | select avg[price] by category |
min | — | Minimum | select min[qty] by region |
max | — | Maximum | select max[amount] by id |
count | — | Count of non-null values | select count[id] by status |
sum | — | Sum | select sum[amount] by year |
first | — | First value in group | select first[value] by group |
last | — | Last value in group | select last[value] by group |
std | stddev | Standard deviation | select std[score] by group |
med | median | Median | select med[price] by type |
len | length | String length (chars) | select len[name] by category |
Logic functions
| Function | Description | Example |
|---|---|---|
not | Logical negation | where not[a = b], where not x > 10 |
null | Is null | where null col1, where null[col1] |
not null | Is not null | where not null col1 |
Scalar functions
| Function | Description | Example |
|---|---|---|
len / length | String length | select len[name], where len[name] > 5 |
upper | Uppercase string | select upper[name], where upper[city] = "ATLANTA" |
lower | Lowercase string | select lower[name] |
abs | Absolute value | select abs[x] |
floor | Numeric floor | select floor[price] |
ceil / ceiling | Numeric ceiling | select ceil[score] |
Summary
| Topic | Detail |
|---|---|
| Query shape | select [cols] [by groups] [where conditions] |
: | name : expression in select and by |
| Spaces in names | col["name"] or col[identifier] |
| Date/datetime accessors | col.date, col.time, col.year, col.month, col.week, col.day, col.dow, col.month_start, col.month_end, col.format["fmt"] |
| String accessors | col.len, col.upper, col.lower, col.starts_with["x"], col.ends_with["x"], col.contains["x"] |
| Literals | Numbers, strings, YYYY.MM.DD, YYYY.MM.DDTHH:MM:SS[.fff...] |
| Coalesce | a^b^c = first non-null of a, b, c |
| Expressions | Right‑to‑left; right side binds tighter; use () to override |
Where , | AND between top‑level conditions |
Where | | OR within one top‑level condition |
Combining , and | | Top-level , = AND; | = OR within a term. e.g. “A, B or C” ⇒ A AND (B OR C). Use () to group. |