Transformation

A node in the Workspace lineage graph represents a Transformation, which is essentially a Python function that accepts input data frames and outputs one data frame.

While the main purpose of a Transformation is to transform data, the system is flexible enough that a Transformation can also serve as an Ingest, a Load, Artifact Storage, a Plot, or a Dashboard.

On a technical level, a Transform is a Docker container that runs user code and generates the following resources:

TransformId

  • code/

    • Contains the full git repository

  • datasets/

    • Populated automatically by writing the LazyFrame returned by the user code to a Parquet file.

  • logs/

    • Stores the log.txt file, which contains the log output produced while executing the user code.

  • meta/

    • Contains the columns.json file, which describes the column-level relationships to other Transforms.

  • artifacts/

    • Can be used to store intermediate resources or HTML files.
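
To make the layout above concrete, the following sketch recreates the directory tree in a temporary folder using only the Python standard library. It is an illustration, not the platform's implementation: the helper `create_layout`, the ID `example-transform-id`, the log line, and the structure of `columns.json` are all assumptions.

```python
import json
import tempfile
from pathlib import Path

# Subdirectories named in the resource list above.
RESOURCE_DIRS = ("code", "datasets", "logs", "meta", "artifacts")


def create_layout(root: Path, transform_id: str) -> Path:
    """Create an empty resource tree for one Transform (hypothetical helper)."""
    base = root / transform_id
    for name in RESOURCE_DIRS:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


with tempfile.TemporaryDirectory() as tmp:
    base = create_layout(Path(tmp), "example-transform-id")

    # logs/log.txt holds the log output of the user code.
    (base / "logs" / "log.txt").write_text("transform started\n")

    # Assumed schema: output column -> upstream columns it derives from.
    # The real columns.json format is not specified in this document.
    columns = {"total": ["orders.qty", "prices.unit_price"]}
    (base / "meta" / "columns.json").write_text(json.dumps(columns, indent=2))

    # Collect the relative paths of everything that was created.
    created = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))
```

Here `created` lists the five directories plus the two files written into `logs/` and `meta/`.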
