Loading Data
Load data with Datui by passing command line options and a path to open. The path can be a local file or directory, an S3 URI (Amazon S3 or MinIO), a GCS URI (gs://), or an HTTP/HTTPS URL. Cloud and HTTP support are included by default.
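For a quick orientation, the basic invocation shapes look like this; the paths, bucket names, and URL below are placeholders, not real datasets:
# Local file or directory
datui ./sales.csv
datui ./data/
# Remote URIs
datui s3://my-bucket/data/file.parquet
datui gs://my-bucket/data/file.parquet
datui https://example.com/data/file.parquet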
Remote data (S3, GCS, and HTTP)
You can open data from Amazon S3, Google Cloud Storage (GCS), S3-compatible storage (e.g. MinIO), or HTTP/HTTPS by passing the appropriate URI. Support is included by default when you build or install datui.
Same behavior for all cloud and HTTP sources: Parquet (including Parquet globs/prefixes) is read directly where supported; all other formats (CSV, JSON, NDJSON, etc.) are downloaded to a temporary file first, then loaded. This applies to AWS S3, custom S3 endpoints (MinIO, etc.), and GCS.
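For example, a remote CSV (hypothetical URL) is fetched to a temporary file and then opened like a local one:
# Downloaded to the temp directory first, then loaded
datui https://example.com/exports/orders.csv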
One remote path at a time: If you pass more than one path and the first is a remote URL (S3, GCS, or HTTP), datui reports an error. Open a single remote URL per run; local files can still be opened as multiple paths (concatenated).
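A sketch of that rule, with placeholder file and bucket names:
# Fine: several local files, concatenated into one table
datui part1.csv part2.csv part3.csv
# Fine: a single remote URL per run
datui s3://my-bucket/data/file.parquet
# Error: a remote URL followed by additional paths
datui s3://my-bucket/data/file.parquet part1.csv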
Amazon S3 (s3://)
Credentials: Datui does not add its own credential system. It uses the same credentials as the rest of the AWS ecosystem:
- Environment variables (good for scripts and one-off use):
  - AWS_ACCESS_KEY_ID — access key
  - AWS_SECRET_ACCESS_KEY — secret key
  - AWS_REGION (or AWS_DEFAULT_REGION) — e.g. us-east-1
  - Optionally AWS_SESSION_TOKEN for temporary credentials
- Shared config (good for daily use):
  - ~/.aws/credentials — profiles and keys
  - ~/.aws/config — region and other settings
- IAM roles — If you run on EC2, ECS, Lambda, or similar, the instance/task role is used automatically; no env or config needed.
Set at least one of these before running Datui. Example:
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
datui s3://my-bucket/data/myfile.parquet
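The shared-config route looks like this; the file layout is the standard AWS one, and the key values are placeholders. In ~/.aws/credentials:
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
And in ~/.aws/config:
[default]
region = us-east-1
With those files in place, run Datui without setting any AWS environment variables:
datui s3://my-bucket/data/myfile.parquet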
Usage
- Single Parquet file:
  datui s3://bucket-name/path/to/file.parquet
- Hive-style partitioned data on S3:
  Use a prefix or glob so Polars can discover partitions:
  datui s3://bucket-name/prefix/
  datui "s3://bucket-name/prefix/**/*.parquet"
- Other formats (CSV, JSON, NDJSON, etc.) on S3:
  Datui downloads the object to a temporary file and then loads it like a local file. Use the same URI style:
  datui s3://bucket-name/path/to/file.csv
  The file is downloaded to your system temp directory (or --temp-dir if set), then opened normally. This is the same approach used for HTTP/HTTPS URLs.
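If you want that temporary download to land somewhere specific, for example on a larger disk, point --temp-dir at it (the directory is a placeholder):
# Download the CSV into /mnt/scratch instead of the system temp directory
datui --temp-dir /mnt/scratch s3://bucket-name/path/to/file.csv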
Configurable S3 (MinIO and other S3-compatible backends)
You can point S3 at a custom endpoint (e.g. MinIO) via config, environment variables, or CLI options. Priority is: CLI > env > config. That lets you keep a default in config and override per run when you use multiple backends.
Config — In ~/.config/datui/config.toml add a [cloud] section:
[cloud]
# MinIO example
s3_endpoint_url = "http://localhost:9000"
s3_access_key_id = "minioadmin"
s3_secret_access_key = "minioadmin"
s3_region = "us-east-1"
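With that [cloud] section in place, a plain run picks up the endpoint and credentials from the config file (the bucket name is a placeholder):
datui s3://local-bucket/file.parquet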
Environment variables (override config; standard for S3 tools):
- AWS_ENDPOINT_URL or AWS_ENDPOINT_URL_S3 — custom endpoint (e.g. http://localhost:9000)
- AWS_ACCESS_KEY_ID — access key
- AWS_SECRET_ACCESS_KEY — secret key
- AWS_REGION or AWS_DEFAULT_REGION — region (e.g. us-east-1)
CLI options (override env and config):
- --s3-endpoint-url URL
- --s3-access-key-id KEY
- --s3-secret-access-key SECRET
- --s3-region REGION
Examples for multiple backends without editing config each time:
# MinIO in another terminal/shell
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_REGION=us-east-1
datui s3://my-bucket/file.parquet
# Different MinIO or S3-compatible endpoint
datui --s3-endpoint-url https://s3.other.example s3://other-bucket/file.parquet
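All four CLI options can also be combined so a single run is fully self-contained; the endpoint and credentials here are the MinIO defaults from the config example, and the bucket name is a placeholder:
# Everything on the command line; overrides both env and config
datui --s3-endpoint-url http://localhost:9000 \
  --s3-access-key-id minioadmin \
  --s3-secret-access-key minioadmin \
  --s3-region us-east-1 \
  s3://my-bucket/file.parquet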
Google Cloud Storage (gs://)
You can open data from Google Cloud Storage using gs:// URIs. Credentials are resolved through Application Default Credentials:
- User or service account key file: set GOOGLE_APPLICATION_CREDENTIALS to the path of your JSON key file.
- gcloud CLI: run gcloud auth application-default login.
- GCE/Cloud Run: workload identity is used automatically.
Example:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
datui gs://my-bucket/path/to/file.parquet
- Parquet and Hive-style prefixes/globs work as for S3: datui gs://bucket/prefix/ or datui "gs://bucket/prefix/**/*.parquet".
- Other formats (CSV, JSON, NDJSON, etc.) work the same as S3 and HTTP: the object is downloaded to a temporary file, then loaded. Example:
  datui gs://my-bucket/path/to/file.csv
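If you prefer the gcloud route over a key file, the two steps look like this (bucket name is a placeholder):
# One-time: store Application Default Credentials locally
gcloud auth application-default login
# Then open data as usual
datui gs://my-bucket/path/to/file.parquet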
Minimal build (no cloud)
To build without cloud support (S3/GCS) and avoid the extra cloud dependencies:
cargo build --release --no-default-features
If you pass an S3 or gs:// URI to a binary built that way, you will see an error suggesting a build with default features.
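For comparison, a default build keeps the cloud features enabled (assuming you are building from a checkout of the datui repository):
# Default features, including cloud support
cargo build --release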
Supported Formats
| Format | Extensions | Eager load only | Hive partitioning |
|---|---|---|---|
| Parquet | .parquet | No | Yes |
| CSV (and other delimited text) | .csv, .tsv, .psv, etc. | No | No |
| NDJSON | .jsonl | No | No |
| JSON | .json | Yes | No |
| Arrow IPC / Feather v2 | .arrow, .ipc, .feather | No | No |
| Avro | .avro | Yes | No |
| Excel | .xls, .xlsx, .xlsm, .xlsb | Yes | No |
| ORC | .orc | Yes | No |
Eager load only — The file is read fully into memory before use; no lazy streaming.
Hive partitioning — Use the --hive flag with a directory or glob; see Hive-partitioned data below.
Excel — Use the --sheet flag to specify which sheet to open.
CSV date inference — By default, CSV string columns that look like dates (e.g. YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS) are parsed as Polars Date/Datetime. Use --parse-dates false or set parse_dates = false in configuration to disable.
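Two of those flags in use; the file names are placeholders, and the assumption that --sheet takes a sheet name rather than an index is mine:
# Open a specific worksheet from an Excel workbook
datui --sheet Sheet2 report.xlsx
# Keep date-like CSV columns as plain strings
datui --parse-dates false events.csv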
Compression
Compressed files are identified by extension and decompressed before loading. Use the --compression option to specify the format when the file has no extension or the extension is wrong.
Supported Compression Formats
- gz
- zstd
- bzip2
- xz
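For example (hypothetical file names; this assumes --compression takes one of the identifiers listed above):
# The .gz extension tells Datui how to decompress
datui data.csv.gz
# No useful extension, so state the compression explicitly
datui --compression zstd nightly_export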
Hive-partitioned data
You can load a Hive-style partitioned dataset (e.g. a directory tree with key=value segment names such as year=2024/month=01/) by using the --hive flag and passing a directory or a glob pattern instead of a single file.
- Directory: point at the partition root, e.g. datui --hive /path/to/data
- Glob: use a pattern that matches the partition layout, e.g. datui --hive /path/to/data/**/*.parquet
  You may need to quote the glob so your shell does not expand it (e.g. datui --hive "/path/to/data/**/*.parquet").
Only Parquet is supported for hive-partitioned loading. If you pass a single file with --hive, it is loaded as usual and the flag is ignored.
Schema from one file (default): For faster loading, datui infers the Parquet schema from a single file along one partition branch (single-spine) instead of scanning all files. This applies to both local Hive directories and S3/GCS prefixes (e.g. s3://bucket/prefix/ or gs://bucket/prefix/). If your dataset has inconsistent schemas or other complications and you prefer Polars to discover the schema over all files, disable this with --single-spine-schema=false or set single_spine_schema = false under [file_loading] in configuration.
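To opt out of the single-file schema inference, pass the flag per run (path reused from the glob example above):
datui --hive --single-spine-schema=false "/path/to/data/**/*.parquet"
Or set it once in ~/.config/datui/config.toml:
[file_loading]
single_spine_schema = false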
Partition columns (the keys from the path, e.g. year, month) are shown first in the table and listed in the Info panel under the Partitioned data tab.