Transformation
Each node in the Workspace lineage graph represents a Transformation, which is essentially a Python function that accepts input data frames and returns a single data frame.
While the main purpose of a Transformation is to transform data, the system is flexible enough for a Transformation to also serve as an Ingest, a Load, Artifact Storage, a Plot, or a Dashboard.
On a technical level, a Transform is a Docker container that runs the user code and generates the following resources:
TransformId
code/
Contains the full Git repository.
datasets/
Populated automatically by writing the LazyFrame returned by the user code to a Parquet file.
logs/
Stores the log.txt file containing logging data from executing the user code.
meta/
Contains the columns.json file describing the column-level relationships to other Transforms.
artifacts/
Can be used to store intermediate resources or HTML files.
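The layout above can be pictured with a small standard-library sketch that recreates the directory tree in a temporary folder. The helper `create_layout`, the id `example-transform-id`, and the placeholder file contents are assumptions made for illustration; in particular, the real schema of columns.json is not described here, so an empty JSON object stands in for it.

```python
import json
import tempfile
from pathlib import Path

# Sub-directories named in the resource listing.
RESOURCE_DIRS = ("code", "datasets", "logs", "meta", "artifacts")


def create_layout(root: Path, transform_id: str) -> Path:
    """Create an illustrative per-Transform directory tree under `root`."""
    base = root / transform_id
    for name in RESOURCE_DIRS:
        (base / name).mkdir(parents=True, exist_ok=True)
    # Placeholder files only; the real contents are produced by the platform.
    (base / "logs" / "log.txt").write_text("transform started\n")
    (base / "meta" / "columns.json").write_text(json.dumps({}))
    return base


with tempfile.TemporaryDirectory() as tmp:
    base = create_layout(Path(tmp), "example-transform-id")
    # Collect every created path relative to the Transform's root folder.
    created = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))

for path in created:
    print(path)
```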