Builds
A build refers to the process of running a transformation and producing an output dataset.
A build can be triggered by selecting one or more transformations on the lineage page and clicking the Build button.
Note: When multiple transformations are selected, the build system resolves their dependencies and chooses the most efficient build sequence.
When a transformation has many upstream transformations, the Upstream Build functionality can be useful: given a selected transformation, it recursively crawls all upstream dependencies and builds them before building the selected transformation.
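The recursive crawl can be pictured as a depth-first walk over the lineage graph. The sketch below is illustrative only and is not DataSpace's implementation: the function name `upstream_build_order`, the `lineage` dictionary, and the transformation names are all hypothetical, and the lineage is assumed to be available as a mapping from each transformation to the transformations it reads from.

```python
def upstream_build_order(target, upstream):
    """Return build order: every upstream dependency first, the target last.

    `upstream` maps a transformation name to the names it depends on
    (a hypothetical representation of the lineage graph).
    """
    order = []          # final build sequence
    done = set()        # transformations already placed in `order`
    in_progress = set() # transformations on the current recursion path

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle detected at {name!r}")
        in_progress.add(name)
        for dep in upstream.get(name, []):
            visit(dep)  # build everything upstream before `name`
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical lineage: each key lists the transformations it reads from.
lineage = {
    "report": ["joined"],
    "joined": ["clean_orders", "clean_customers"],
    "clean_orders": ["raw_orders"],
    "clean_customers": ["raw_customers"],
}

print(upstream_build_order("report", lineage))
```

Each transformation appears exactly once and only after all of its dependencies, so a shared upstream dataset is not rebuilt twice within the same run.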
A regular build always deletes the current dataset and replaces it with the freshly built dataset. In some cases, however, it is desirable to retain old data, for example when historisation is required. This use case is supported and can be achieved with the following setup:
DataSpace internally stores transforms as folders, code as files, and the produced datasets as Parquet files. To perform an incremental build, the existing Parquet file produced by the transform can be read manually and concatenated with the newly calculated data.
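A minimal sketch of this pattern with pandas follows. It rests on several assumptions that are not taken from DataSpace itself: the helper names `append_snapshot` and `incremental_build`, the `snapshot_date` column used to tell historised rows apart, and the way the path to the existing Parquet file is obtained are all hypothetical. pandas also needs a Parquet engine such as pyarrow or fastparquet installed for the read and write calls.

```python
from pathlib import Path

import pandas as pd


def append_snapshot(existing, new_rows, snapshot_date):
    """Tag new_rows with snapshot_date and append them to the existing history.

    `existing` is the previously built dataset, or None on the first build.
    """
    tagged = new_rows.assign(snapshot_date=snapshot_date)
    if existing is None:
        return tagged.reset_index(drop=True)
    return pd.concat([existing, tagged], ignore_index=True)


def incremental_build(dataset_path, new_rows, snapshot_date):
    """Read the previous output if it exists, append the new rows, write it back.

    `dataset_path` is a hypothetical location of the transform's Parquet file.
    """
    path = Path(dataset_path)
    existing = pd.read_parquet(path) if path.exists() else None
    result = append_snapshot(existing, new_rows, snapshot_date)
    result.to_parquet(path, index=False)
    return result
```

Keeping the concatenation in its own function makes the historisation logic easy to check without touching any files; only `incremental_build` performs I/O.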
It often makes sense to run such an incremental build periodically. Please refer to Build Schedules to learn more.