Performance Tips
Sampling in Analysis Mode
When you use Datui’s Analysis Mode, the application may automatically sample from your data rather than analyzing every row. Sampling is used to improve responsiveness and keep memory usage low when working with very large datasets.
By default, if your table contains more rows than a configured threshold, Datui analyzes a representative sample instead of the full dataset. You can adjust this threshold in the configuration file. To learn how to change the sampling limit, see the Configuration Guide: Analysis & Performance Settings.
Pivot is Eager
Pivot operations are eager: to determine the output column names, they materialize all affected data in memory, which can increase RAM usage significantly for large tables.
Filter the data as much as possible before pivoting to keep memory usage manageable.
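The following is a minimal pure-Python sketch of why a pivot has to read every row before it knows its output columns, and why filtering first shrinks the work. It does not use Datui or its engine; the `pivot` helper, the sample rows, and the column names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def pivot(rows, index, on, values):
    """Pivot a list of dict rows (illustrative helper, not Datui's code).

    If two rows share the same index and `on` value, the last one wins.
    """
    # The output columns are the distinct values found in `on`, so every
    # row must be scanned before the result's column names are known.
    columns = sorted({row[on] for row in rows})
    table = defaultdict(dict)
    for row in rows:
        table[row[index]][row[on]] = row[values]
    result = [
        {index: key, **{col: cells.get(col) for col in columns}}
        for key, cells in table.items()
    ]
    return columns, result


# Hypothetical sample data.
rows = [
    {"region": "north", "year": 2022, "sales": 10},
    {"region": "north", "year": 2023, "sales": 12},
    {"region": "south", "year": 2022, "sales": 7},
    {"region": "south", "year": 2024, "sales": 9},
]

# Pivoting everything: all four rows are held and scanned.
all_columns, all_table = pivot(rows, "region", "year", "sales")

# Filtering first: fewer rows to hold, and fewer output columns.
recent = [row for row in rows if row["year"] >= 2023]
recent_columns, recent_table = pivot(recent, "region", "year", "sales")

print(all_columns)     # [2022, 2023, 2024]
print(recent_columns)  # [2023, 2024]
```

The same reasoning applies at larger scale: the fewer rows that reach the pivot step, the less data has to be materialized at once.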
Prefer Directories with --hive
Passing a directory with --hive is faster than passing an equivalent glob pattern.
For example, /path/to/partitioned/ is faster than /path/to/partitioned/**/*.parquet.
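For readers unfamiliar with the layout, the sketch below shows the common Hive partitioning convention, in which directory names under the root take the form key=value. It assumes that convention applies to your data; the `hive_partition_values` helper and the file name `part-0.parquet` are hypothetical, and the code illustrates the directory layout only, not how Datui reads it.

```python
from pathlib import PurePosixPath


def hive_partition_values(file_path, root):
    """Return the key=value pairs encoded in the directories below root.

    Illustrative helper only; it is not part of Datui.
    """
    relative = PurePosixPath(file_path).relative_to(root)
    values = {}
    for part in relative.parts[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if sep:  # keep only directories that follow the key=value form
            values[key] = value
    return values


# Hypothetical file inside a partitioned directory tree.
example = "/path/to/partitioned/year=2024/month=01/part-0.parquet"
partitions = hive_partition_values(example, "/path/to/partitioned")
print(partitions)  # {'year': '2024', 'month': '01'}
```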