Code

The code repository can contain two types of code files: regular Python scripts and transformations. A file is treated as a transformation when it defines a function named transform that returns a Polars LazyFrame (or DataFrame). This function is called automatically when appropriate.

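import polars as pl

# Each parameter name matches the name of an upstream transform.
def transform(airport_info_ingest, jfk_ingest):
    # Join the two upstream frames on their shared "id" column.
    return airport_info_ingest.join(jfk_ingest, on="id")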

The transform function can take multiple parameters. Each parameter must have the same name as one of the transforms. Declaring a transform as a parameter adds it to the dependency graph; that transform is then read, and its DataFrame is passed to the transform function as the argument for that parameter.
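The name-matching rule can be illustrated with a minimal sketch that uses Python's standard inspect module to read the parameter names of a transform function and look each one up among the available outputs. The resolve_inputs helper, the available dictionary, and the placeholder string values are assumptions made for this illustration; this is not necessarily how the platform implements it.

```python
import inspect


def resolve_inputs(func, available):
    """Map each parameter name of func to the entry with the same name."""
    names = list(inspect.signature(func).parameters)
    missing = [name for name in names if name not in available]
    if missing:
        raise KeyError(f"no transform named: {', '.join(missing)}")
    return {name: available[name] for name in names}


def transform(airport_info_ingest, jfk_ingest):
    # Placeholder body: a real transform would join or reshape frames.
    return (airport_info_ingest, jfk_ingest)


# Hypothetical outputs of other transforms, keyed by transform name.
# Plain strings stand in for DataFrames to keep the sketch dependency-free.
available = {
    "airport_info_ingest": "airport rows",
    "jfk_ingest": "jfk rows",
    "unrelated_transform": "not requested, so never passed in",
}

inputs = resolve_inputs(transform, available)
result = transform(**inputs)
```

Only the names that appear in the signature are looked up, which is why adding or removing a parameter is enough to change what a transform depends on.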

The example above shows how the connection_statistics transform declares a dependency on airport_info_ingest and jfk_ingest simply by naming those transforms as parameters of its transform function.
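To make the resulting dependency graph concrete, here is a small sketch that orders these three transforms with graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later). The dependencies dictionary is written by hand for illustration, and it assumes the two ingest transforms take no parameters; the platform's own scheduling may work differently.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: transform name -> names of its transform() parameters.
# Assumes the two ingest transforms have no upstream dependencies.
dependencies = {
    "airport_info_ingest": set(),
    "jfk_ingest": set(),
    "connection_statistics": {"airport_info_ingest", "jfk_ingest"},
}

# static_order() yields every transform after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
```

In any valid order, connection_statistics comes last because both of its inputs must be available before it can run.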
