Loading Data
Load data with Datui by passing it command line options
and a path to open. The path can be a local file or directory, an S3 URI (Amazon S3 or MinIO), a GCS URI (gs://), or an HTTP/HTTPS URL. Cloud and HTTP support are included by default.
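For example, to open a local file or a remote object (file and bucket names below are placeholders):
datui sales.parquet
datui s3://my-bucket/data/sales.parquet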
Remote data (S3, GCS, and HTTP)
You can open data from Amazon S3, Google Cloud Storage (GCS), S3-compatible storage (e.g. MinIO), or HTTP/HTTPS by passing the appropriate URI. Support is included by default when you build or install datui.
Same behavior for all cloud and HTTP sources: Parquet (including Parquet globs/prefixes) is read directly where supported; all other formats (CSV, JSON, NDJSON, etc.) are downloaded to a temporary file first, then loaded. This applies to AWS S3, custom S3 endpoints (MinIO, etc.), and GCS.
One remote path at a time: If you pass more than one path and the first is a remote URL (S3, GCS, or HTTP), datui reports an error. Open a single remote URL per run; local files can still be opened as multiple paths (concatenated).
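For example (URLs and file names here are made up), a remote Parquet object is read in place, while a remote CSV goes through a temporary file:
# Parquet over HTTPS: read directly where supported
datui https://example.com/exports/events.parquet
# CSV over HTTPS: downloaded to a temporary file, then loaded
datui https://example.com/exports/events.csv
# Multiple local paths are concatenated; only one remote URL is allowed per run
datui part1.csv part2.csv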
Amazon S3 (s3://)
Credentials: Datui does not add its own credential system. It uses the same credentials as the rest of the AWS ecosystem:
- Environment variables (good for scripts and one-off use):
  - AWS_ACCESS_KEY_ID — access key
  - AWS_SECRET_ACCESS_KEY — secret key
  - AWS_REGION (or AWS_DEFAULT_REGION) — e.g. us-east-1
  - Optionally AWS_SESSION_TOKEN for temporary credentials
- Shared config (good for daily use):
  - ~/.aws/credentials — profiles and keys
  - ~/.aws/config — region and other settings
- IAM roles — If you run on EC2, ECS, Lambda, or similar, the instance/task role is used automatically; no env or config needed.
Set at least one of these before running Datui. Example:
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
datui s3://my-bucket/data/myfile.parquet
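If you prefer the shared-config route instead, the standard AWS file layout is enough; a minimal sketch (profile values are placeholders). In ~/.aws/credentials:
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
And in ~/.aws/config:
[default]
region = us-east-1
With those files in place, datui s3://my-bucket/data/myfile.parquet works without exporting any variables.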
Usage
- Single Parquet file:
  datui s3://bucket-name/path/to/file.parquet
- Hive-style partitioned data on S3:
  Use a prefix or glob so Polars can discover partitions:
  datui s3://bucket-name/prefix/
  datui "s3://bucket-name/prefix/**/*.parquet"
- Other formats (CSV, JSON, NDJSON, etc.) on S3:
  Datui downloads the object to a temporary file and then loads it like a local file. Use the same URI style:
  datui s3://bucket-name/path/to/file.csv
  The file is downloaded to your system temp directory (or --temp-dir if set), then opened normally. This is the same approach used for HTTP/HTTPS URLs.
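If you want those downloads somewhere other than the system temp directory, a sketch (assuming --temp-dir takes the directory as its value; paths are placeholders):
datui --temp-dir /data/tmp s3://bucket-name/path/to/file.csv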
Configurable S3 (MinIO and other S3-compatible backends)
You can point S3 at a custom endpoint (e.g. MinIO) via config, environment variables, or CLI options. Priority is: CLI > env > config. That lets you keep a default in config and override per run when you use multiple backends.
Config — In ~/.config/datui/config.toml add a [cloud] section:
[cloud]
# MinIO example
s3_endpoint_url = "http://localhost:9000"
s3_access_key_id = "minioadmin"
s3_secret_access_key = "minioadmin"
s3_region = "us-east-1"
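With that section in place, a plain s3:// path already targets the configured endpoint (bucket and file names are placeholders):
datui s3://my-bucket/file.parquet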
Environment variables (override config; standard for S3 tools):
- AWS_ENDPOINT_URL or AWS_ENDPOINT_URL_S3 — custom endpoint (e.g. http://localhost:9000)
- AWS_ACCESS_KEY_ID — access key
- AWS_SECRET_ACCESS_KEY — secret key
- AWS_REGION or AWS_DEFAULT_REGION — region (e.g. us-east-1)
CLI options (override env and config):
- --s3-endpoint-url URL
- --s3-access-key-id KEY
- --s3-secret-access-key SECRET
- --s3-region REGION
Examples for multiple backends without editing config each time:
# MinIO in another terminal/shell
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_REGION=us-east-1
datui s3://my-bucket/file.parquet
# Different MinIO or S3-compatible endpoint
datui --s3-endpoint-url https://s3.other.example s3://other-bucket/file.parquet
Google Cloud Storage (gs://)
You can open data from Google Cloud Storage using gs:// URIs. Authentication uses Application Default Credentials:
- User or service account key file: set GOOGLE_APPLICATION_CREDENTIALS to the path of your JSON key file.
- gcloud CLI: run gcloud auth application-default login.
- GCE/Cloud Run: workload identity is used automatically.
Example:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
datui gs://my-bucket/path/to/file.parquet
- Parquet and Hive-style prefixes/globs work as for S3: datui gs://bucket/prefix/ or datui "gs://bucket/prefix/**/*.parquet".
- Other formats (CSV, JSON, NDJSON, etc.) work the same as S3 and HTTP: the object is downloaded to a temporary file, then loaded. Example:
  datui gs://my-bucket/path/to/file.csv
Minimal build (no cloud)
To build without cloud support (S3 and GCS) and avoid the extra cloud dependencies:
cargo build --release --no-default-features
If you pass an S3 or gs:// URI to a binary built that way, you will see an error suggesting a build with default features.
Supported Formats
| Format | Extensions | Eager load only | Hive partitioning |
|---|---|---|---|
| Parquet | .parquet | No | Yes |
| CSV (or other-delimited) | .csv, .tsv, .psv, etc. | No | No |
| NDJSON | .jsonl | No | No |
| JSON | .json | Yes | No |
| Arrow IPC / Feather v2 | .arrow, .ipc, .feather | No | No |
| Avro | .avro | Yes | No |
| Excel | .xls, .xlsx, .xlsm, .xlsb | Yes | No |
| ORC | .orc | Yes | No |
Eager load only — The file is read fully into memory before use; no lazy streaming.
Hive partitioning — Use the --hive flag with a directory or glob; see Hive-partitioned data below.
Excel — Use the --sheet flag to specify which sheet to open.
CSV date inference — By default, CSV string columns that look like dates (e.g. YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS) are parsed as Polars Date/Datetime. Use --parse-dates false or set parse_dates = false in configuration to disable.
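For example, a sketch assuming --sheet accepts a sheet name (file and sheet names are placeholders):
# Open a specific sheet from an Excel workbook
datui --sheet Sheet2 report.xlsx
# Keep date-like CSV columns as plain strings
datui --parse-dates false data.csv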
Compression
Compressed files are identified by extension and decompressed before loading. Use the --compression option to specify the format when the file has no extension or the extension is wrong.
Supported Compression Formats
- gz
- zstd
- bzip2
- xz
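For example (file names are placeholders; the --compression value is assumed to match the names in the list above):
# Recognized by extension and decompressed automatically
datui data.csv.gz
# No usable extension: state the compression explicitly
datui --compression zstd /tmp/export_without_extension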
Hive-partitioned data
You can load a Hive-style partitioned dataset (e.g. a directory tree with key=value segment names such as year=2024/month=01/) by using the --hive flag and passing a directory or a glob pattern instead of a single file.
- Directory: point at the partition root, e.g. datui --hive /path/to/data
- Glob: use a pattern that matches the partition layout, e.g. datui --hive /path/to/data/**/*.parquet
  You may need to quote the glob so your shell does not expand it (e.g. datui --hive "/path/to/data/**/*.parquet").
Only Parquet is supported for hive-partitioned loading. If you pass a single file with --hive, it is loaded as usual and the flag is ignored.
Partition columns (the keys from the path, e.g. year, month) are shown first in the table and listed in the Info panel under the Partitioned data tab.
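As an illustration, a Hive-style layout and the matching command might look like this (directory, column, and file names are made up):
sales/
  year=2024/
    month=01/
      part-0.parquet
    month=02/
      part-0.parquet
datui --hive sales/
The year and month partition columns then appear first in the table and in the Info panel's Partitioned data tab.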