Versioning System

Overview

The DataSpace Versioning System provides a robust and transparent way to manage the evolution of data transformation pipelines through Git-based version control. Each workspace in DataSpace is backed by its own Git repository, ensuring that every change, from transformation logic to pipeline configuration, is tracked, auditable, and reproducible.

By using Git as the foundation, DataSpace brings software engineering best practices to data workflows, including branching, merging, and rollback capabilities.

Workspace-Level Versioning

In DataSpace, versioning operates at the workspace level. Each workspace represents an isolated development environment with its own Git repository. Within a workspace, users can define and evolve transformation pipelines as code. Every modification to the pipeline, such as adding a new transform, changing dependencies, or updating logic, is committed to the repository.

This approach ensures that:

  • Every version of a pipeline is reproducible.

  • Changes are traceable to specific commits and users.

  • Multiple users can collaborate on the same workspace by working in separate branches, which keeps in-progress changes isolated from one another until they are merged.
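
Because a workspace is an ordinary Git repository, the traceability described above can be inspected with standard Git tooling. The following is a minimal sketch, assuming Git is installed and the workspace repository has been cloned locally. The names `Change`, `parse_log`, and `history` are hypothetical helpers for illustration and are not part of DataSpace.

```python
from __future__ import annotations

import subprocess
from typing import NamedTuple

# Tab-separated fields: commit hash, author name, ISO 8601 author date, subject.
LOG_FORMAT = "%H%x09%an%x09%aI%x09%s"


class Change(NamedTuple):
    commit: str
    author: str
    date: str
    subject: str


def parse_log(output: str) -> list[Change]:
    """Turn `git log` output produced with LOG_FORMAT into Change records."""
    changes = []
    for line in output.split("\n"):
        if not line.strip():
            continue
        # The subject comes last, so any tabs inside it are preserved.
        commit, author, date, subject = line.split("\t", 3)
        changes.append(Change(commit, author, date, subject))
    return changes


def history(repo_dir: str, path: str = ".") -> list[Change]:
    """List the commits that touched `path` in a local clone (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--pretty=format:{LOG_FORMAT}", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_log(result.stdout)
```

Calling `history("path/to/clone", "some/transform/file")` would return one record per commit that modified that file, newest first, which shows who changed a transform and when.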

Branches and Pipelines

Because pipelines are declared in code, different Git branches within the same workspace naturally correspond to different pipeline versions.

Each branch defines its own state of the transformation graph, meaning:

  • Switching branches changes the entire pipeline configuration and logic.

  • Multiple branches can coexist, enabling parallel experimentation or environment separation (for example, main for production and dev for testing).

  • Merging branches allows integration of changes across pipelines while maintaining a full history of modifications.

This design allows DataSpace users to experiment safely, test changes in isolation, and roll out updates confidently.
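
Since each branch holds its own pipeline definition, comparing two branches shows how their pipelines differ. The sketch below is illustrative only: it assumes a local clone and uses `git diff --name-status`. The helper names and the file paths in the usage note are hypothetical, because the layout of a DataSpace workspace repository is not described here.

```python
from __future__ import annotations

import subprocess


def parse_name_status(output: str) -> dict[str, str]:
    """Map each changed path to its status letter (A, M, D, R, ...)."""
    changed = {}
    for line in output.split("\n"):
        if not line:
            continue
        status, *paths = line.split("\t")
        # Renames and copies list the old path, then the new one; keep the new path.
        changed[paths[-1]] = status[0]
    return changed


def branch_diff(repo_dir: str, base: str, other: str) -> dict[str, str]:
    """Files that differ between two branches of a local clone (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-status", base, other],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_status(result.stdout)
```

For example, `branch_diff("path/to/clone", "main", "dev")` might report that a hypothetical `transforms/clean.py` was modified (`M`) and `transforms/enrich.py` was added (`A`) on `dev`, which is the set of changes a merge into `main` would bring in.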

Working Offline

DataSpace allows users to work offline by interacting directly with the underlying Git repository of a workspace. Users can clone the repository to their local machine, edit transformation code or configuration files with their preferred development tools, and commit changes locally without a connection. Once the changes are ready, pushing them back to DataSpace publishes them to the workspace. This supports developers who prefer local workflows, including IDEs, editors, and external automation tools, while keeping the workspace version history consistent.
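
The clone, edit, commit, and push cycle above can be sketched as a sequence of plain Git commands. This is a minimal sketch under stated assumptions: the remote URL, directory, and branch name are placeholders, and the functions `clone_command`, `publish_commands`, and `run` are hypothetical helpers rather than a DataSpace API. How DataSpace exposes the repository URL and authenticates pushes is not covered here.

```python
from __future__ import annotations

import subprocess


def clone_command(remote_url: str, local_dir: str) -> list[str]:
    """Command that copies the workspace repository to the local machine."""
    return ["git", "clone", remote_url, local_dir]


def publish_commands(local_dir: str, message: str, branch: str = "main") -> list[list[str]]:
    """Commands that stage, commit, and push local edits.

    Staging and committing work without a connection; only the push
    (like the initial clone) needs network access.
    """
    return [
        ["git", "-C", local_dir, "add", "--all"],
        ["git", "-C", local_dir, "commit", "-m", message],
        ["git", "-C", local_dir, "push", "origin", branch],
    ]


def run(commands: list[list[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

A typical session would call `run([clone_command(url, "workspace")])` once, edit files locally, and later call `run(publish_commands("workspace", "Update transform"))` when a connection is available.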
