Loading Data

To load data, pass Datui a path to open, along with any command-line options.

Supported Formats

Format                    Extensions                 Eager load only  Hive partitioning
------------------------  -------------------------  ---------------  -----------------
Parquet                   .parquet                   No               Yes
CSV (or other-delimited)  .csv, .tsv, .psv, etc.     No               No
NDJSON                    .jsonl                     No               No
JSON                      .json                      Yes              No
Arrow IPC / Feather v2    .arrow, .ipc, .feather     No               No
Avro                      .avro                      Yes              No
Excel                     .xls, .xlsx, .xlsm, .xlsb  Yes              No
ORC                       .orc                       Yes              No

  • Eager load only: the file is read fully into memory before use; there is no lazy streaming.
  • Hive partitioning: use the --hive flag with a directory or glob; see Hive-partitioned data below.
  • Excel: use the --sheet flag to specify which sheet to open.

CSV date inference — By default, CSV string columns that look like dates (e.g. YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS) are parsed as Polars Date/Datetime. Use --parse-dates false or set parse_dates = false in configuration to disable.
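To make "looks like dates" concrete, here is a minimal Python sketch that tests strings against the two example formats named above. It is not Datui's inference code: the function names are hypothetical, and the rule that every non-empty value in a column must parse is an assumption made for illustration.

```python
from datetime import date, datetime

# The two example formats from the text. Datui's real inference may accept
# more (or different) patterns; this list is an assumption.
DATE_FORMAT = "%Y-%m-%d"
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def looks_like_date(value):
    """Return a date or datetime if value matches an example format, else None."""
    try:
        return datetime.strptime(value, DATE_FORMAT).date()
    except ValueError:
        pass
    try:
        return datetime.strptime(value, DATETIME_FORMAT)
    except ValueError:
        return None


def column_is_date_like(values):
    """Assumed rule: a column qualifies only if every non-empty value parses."""
    non_empty = [v for v in values if v]
    return bool(non_empty) and all(looks_like_date(v) is not None for v in non_empty)
```

Under this sketch, a column holding "2024-01-15" and "2024-02-01" would qualify, while a column that also contains "n/a" would stay a string column.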

Compression

Compressed files are identified by extension and decompressed before loading. Use the --compression option to specify the format when the file has no extension or the extension is wrong.

Supported Compression Formats

  • gz
  • zstd
  • bzip2
  • xz
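The extension-based detection with an explicit override can be pictured with a short Python sketch. The format names come from the list above; the file extensions in the mapping (.gz, .zst, .bz2, .xz) are the conventional ones and are an assumption, as is the hypothetical function name. This is not Datui's implementation.

```python
from pathlib import Path

# Format names are from the list above; the extensions are conventional
# choices and an assumption about what a tool like Datui would recognize.
EXTENSION_TO_FORMAT = {
    ".gz": "gz",
    ".zst": "zstd",
    ".bz2": "bzip2",
    ".xz": "xz",
}


def guess_compression(path, override=None):
    """Return the compression format for path, or None if it looks uncompressed.

    An explicit override (playing the role of --compression) wins over the
    file extension.
    """
    if override is not None:
        if override not in EXTENSION_TO_FORMAT.values():
            raise ValueError(f"unsupported compression format: {override}")
        return override
    # Only the final suffix matters: "data.csv.gz" has suffix ".gz".
    return EXTENSION_TO_FORMAT.get(Path(path).suffix.lower())
```

The override covers the two cases the text mentions: a file with no extension, and a file whose extension does not match its actual compression.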

Hive-partitioned data

You can load a Hive-style partitioned dataset (e.g. a directory tree with key=value segment names such as year=2024/month=01/) by using the --hive flag and passing a directory or a glob pattern instead of a single file.

  • Directory: point at the partition root, e.g. datui --hive /path/to/data
  • Glob: use a pattern that matches the partition layout, e.g. datui --hive /path/to/data/**/*.parquet
    You may need to quote the glob so your shell does not expand it (e.g. datui --hive "/path/to/data/**/*.parquet").

Only Parquet is supported for Hive-partitioned loading. If you pass a single file with --hive, the flag is ignored and the file is loaded as usual.

Partition columns (the keys from the path, e.g. year, month) are shown first in the table and listed in the Info panel under the Partitioned data tab.
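As an illustration of how partition columns follow from the directory layout, the Python sketch below collects key=value segments from a file path. The function name and the file name part-0.parquet are hypothetical, and values are kept as strings here; a real reader may infer column types differently.

```python
from pathlib import PurePosixPath


def partition_columns(path):
    """Collect key=value directory segments from a path, in path order."""
    columns = {}
    # Skip the file name itself; only directory segments carry partition keys.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            columns[key] = value
    return columns


# Hypothetical file inside a year=/month= layout.
example = partition_columns("/path/to/data/year=2024/month=01/part-0.parquet")
```

Here `example` maps year to "2024" and month to "01", in the same order as the path segments, which mirrors the partition columns appearing first in the table.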