Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Query Syntax Reference

This document details the syntax of the Datui query language. For examples and typical workflows, see Querying Data.

Structure of a query

A query has the form:

select [columns] [by group_columns] [where conditions]
  • select — Required. Starts every query. May be followed by nothing (select all columns), or a comma‑separated list of column expressions.
  • by — Optional. Grouping and aggregation. Everything after by up to where (if present) is the group specification.
  • where — Optional. Filtering. Everything after where is the filter expression.

Clause order is fixed: selectbywhere. The parser splits on the keywords where and by (respecting parentheses and brackets), so you cannot reorder or repeat clauses.


The : assignment (aliasing)

Use : to name an expression. The form is name : expression:

  • Left of : — The new column or group name. Must be an identifier (e.g. total) or col["name with spaces"].
  • Right of : — The expression (column reference, literal, arithmetic, function call, etc.).

Examples:

select a, b, sum_ab: a + b
select renamed: col["Original Name"]
by region_name: region, total: sales + tax

Assignment is supported in both the select and by clauses. In by, it defines computed group keys or renames (e.g. region_name: region, total: sales + tax). Expressions may use column references, arithmetic, literals, and function calls.


Columns with spaces in their names

Identifiers cannot contain spaces. For columns (or aliases) with spaces, use col["..."] or col[identifier]:

select col["First Name"], col["Last Name"]
select no_spaces: col["name with spaces"]

Inside the brackets use either a quoted string ("name with spaces") or a single identifier (no spaces). The same syntax works in select, by, and where.


Right‑to‑left expression parsing

Expressions are parsed right‑to‑left: the leftmost binary operator is the root, and the right subexpression is parsed first (so it effectively binds tighter).

What this means

  • a + b * c → parsed as a + (b * c) (multiplication binds tighter).
  • a * b + c → parsed as a * (b + c) (* is leftmost; the right subexpression b + c is parsed as a unit).

So “higher‑precedence”‑style grouping happens when you put those operations on the right. You can often avoid parentheses by ordering:

select x, y: a * b + c    →  a * (b + c)
select x, y: a + b * c    →  a + (b * c)

Using parentheses

Use () to override grouping:

select (a + b) * c
select a, b where (x > 1) | (y < 0)

Parentheses also matter for , and | in where: splitting on comma and pipe respects nesting, so you can wrap ORs (or more complex logic) in () and then combine with commas. See Where clause: , and |.


Select clause

  • select alone — All columns, no expressions.
  • select a, b, c — Those columns (and/or expressions), in order.
  • select a, b: x + y, c — Columns and aliased expressions.

Columns are separated by ,. Each item is either a plain expression or alias : expression. The same expression rules (arithmetic, col[], functions, etc.) apply everywhere.


By clause (grouping and aggregation)

  • by col1, col2 — Group by those columns. Non‑group columns become list columns; the UI supports drill‑down.
  • by region, total: sales + tax — Group by region and a computed expression (e.g. arithmetic).
  • select avg salary, min id by department — Aggregations per group.

By uses the same comma‑separated list and name : expression rules as select. Aggregation functions (avg, min, max, count, sum, std, med) can be written as fn[expr] or fn expr (brackets optional).


Where clause: , and |

The where clause combines conditions using two separators:

  • ,AND. Each comma‑separated segment is one ANDed condition.
  • |OR. Within a single comma‑separated segment, | separates alternatives that are ORed together.

Parsing order:

  1. Split the where part on , (respecting () and []). Each segment is ANDed.
  2. Within each segment, split on |. The pieces are ORed.

So:

  • where a > 10, b < 2(a > 10) AND (b < 2).
  • where a > 10 | a < 5(a > 10) OR (a < 5).
  • where a > 10 | a < 5, b = 2(a > 10 OR a < 5) AND (b = 2).

Interaction of , and |

, has broader scope than |: it splits the where clause into top‑level AND terms. | only groups within one of those terms. So:

  • A, B | C = A AND (B OR C).
  • A | B, C | D = (A OR B) AND (C OR D).

To express (A or B) and (C or D), use (A)|(B), (C)|(D). Parentheses keep | inside each AND term. For more complex logic, use () to group OR subexpressions, then separate those groups with ,.


Comparison operators

In where conditions, each comparison operator compares a left-hand expression (often a column) to a right-hand expression (column, literal, or expression).

OperatorMeaningExample
=Equalwhere a = 10
!=Not equalwhere a != 0
<>Not equal (same as !=)where a <> 0
<Less thanwhere a < 100
>Greater thanwhere a > 5
<=Less than or equalwhere a <= 10
>=Greater than or equalwhere a >= 1

Operators and literals

  • Arithmetic: + - * % ( % is division, not modulo).
  • Comparison: See Comparison operators (where clauses) above.
  • Literals: Numbers (42, 3.14), strings ("hello", \" for embedded quotes), date literals (2021.01.01 in YYYY.MM.DD format), timestamp literals (2021.01.15T14:30:00.123456 in YYYY.MM.DDTHH:MM:SS[.fff…] format; fractional seconds set precision: 1–3 digits = ms, 4–6 = μs, 7–9 = ns).
  • Coalesce: ^ — first non-null from left to right. a^b^c = coalesce(a, b, c). Right-to-left binding: a^b^c = a^(b^c).

Date and datetime accessors

For columns of type Date or Datetime, use dot notation to extract components:

column_ref.accessor

Datetime accessors

Apply to Datetime columns (with or without timezone):

AccessorResultDescription
dateDateDate part (year-month-day)
timeTimeTime part (Polars Time type)
yearInt32Year
monthInt8Month (1–12)
weekInt8Week number
dayInt8Day of month (1–31)
dowInt8Day of week (1=Monday … 7=Sunday, ISO)
month_startDatetime/DateFirst day of month at midnight
month_endDatetime/DateLast day of month
format["fmt"]StringFormat as string (chrono strftime, e.g. "%Y-%m")

Date accessors

Apply to Date columns:

AccessorResultDescription
yearInt32Year
monthInt8Month (1–12)
weekInt8Week number
dayInt8Day of month
dowInt8Day of week (1=Monday … 7=Sunday)
month_startDateFirst day of month
month_endDateLast day of month
format["fmt"]StringFormat as string (chrono strftime)

String accessors

Apply to String columns:

AccessorResultDescription
lenInt32Character length
upperStringUppercase
lowerStringLowercase
starts_with["x"]BooleanTrue if string starts with x
ends_with["x"]BooleanTrue if string ends with x
contains["x"]BooleanTrue if string contains x

Accessor aliases

When you use an accessor on a column, the result is automatically aliased to {column}_{accessor} (e.g. timestamp.datetimestamp_date). This avoids duplicate column names and keeps results clear.

Syntax examples

Select:

select event_date: timestamp.date
select col["Created At"].date, col["Created At"].year
select name, event_time.time
select order_date, order_date.month, order_date.dow by order_date.year

Where (with date literal YYYY.MM.DD, timestamp literal):

select where created_at.date > other_date_col
select where dt_col.date > 2021.01.01
select where ts_col > 2021.01.15T14:30:00.123456
select where event_ts.month = 12, event_ts.dow = 1
select where city_name.ends_with["lanta"]
select where null col1
select where not null col1

Select (coalesce, string accessors, format):

select a: coln^cola^colb
select name.len, name.upper, dt_col.format["%Y-%m"]

Functions

Functions are used for aggregation (typically in select with by) and for logic in where. Write fn[expr] or fn expr; brackets are optional.

Aggregation functions

FunctionAliasesDescriptionExample
avgmeanAverageselect avg[price] by category
minMinimumselect min[qty] by region
maxMaximumselect max[amount] by id
countCount of non-null valuesselect count[id] by status
sumSumselect sum[amount] by year
firstFirst value in groupselect first[value] by group
lastLast value in groupselect last[value] by group
stdstddevStandard deviationselect std[score] by group
medmedianMedianselect med[price] by type
lenlengthString length (chars)select len[name] by category

Logic functions

FunctionDescriptionExample
notLogical negationwhere not[a = b], where not x > 10
nullIs nullwhere null col1, where null[col1]
not nullIs not nullwhere not null col1

Scalar functions

FunctionDescriptionExample
len / lengthString lengthselect len[name], where len[name] > 5
upperUppercase stringselect upper[name], where upper[city] = "ATLANTA"
lowerLowercase stringselect lower[name]
absAbsolute valueselect abs[x]
floorNumeric floorselect floor[price]
ceil / ceilingNumeric ceilingselect ceil[score]

Summary

TopicDetail
Query shapeselect [cols] [by groups] [where conditions]
:name : expression in select and by
Spaces in namescol["name"] or col[identifier]
Date/datetime accessorscol.date, col.time, col.year, col.month, col.week, col.day, col.dow, col.month_start, col.month_end, col.format["fmt"]
String accessorscol.len, col.upper, col.lower, col.starts_with["x"], col.ends_with["x"], col.contains["x"]
LiteralsNumbers, strings, YYYY.MM.DD, YYYY.MM.DDTHH:MM:SS[.fff...]
Coalescea^b^c = first non-null of a, b, c
ExpressionsRight‑to‑left; right side binds tighter; use () to override
Where ,AND between top‑level conditions
Where |OR within one top‑level condition
Combining , and |Top-level , = AND; | = OR within a term. e.g. “A, B or C” ⇒ A AND (B OR C). Use () to group.