Overview
DataSpace is a self-contained platform that can be deployed either on-premises or in cloud infrastructure. This is made possible by dockerizing all of its modules, which also makes the entire system resilient and horizontally scalable. For more details about the infrastructure, see Infrastructure.
The following are some of its key features:
A major advantage of a hosted platform over a regular local development setup is that onboarding and environment setup add no overhead. As soon as a new developer gets access to the platform, they can start working on the code immediately without installing any dependencies. Furthermore, the data never leaves the platform, which increases security and enables data governance.
DataSpace consists of self-hostable components, so organisations can self-host the entire platform and ensure that sensitive data never leaves their dedicated environment.
Furthermore, access rights, , and data limits can be put in place to limit access for any given user.
The platform reacts to changes to the lineage in real time. Even when multiple developers are working on the pipeline at the same time, changes to the build status and dependency graph are streamed in and applied automatically.
DataSpace packages many convenient features into one system while retaining the ability to extract a pipeline from the platform. Because DataSpace derives the dependency graph from the user's source code, a build can also be run on a local machine by downloading the entire project and running the Python code manually.
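To illustrate what running a build manually involves, the following is a minimal sketch of executing pipeline steps in dependency order. It is not DataSpace's actual mechanism or API: the dataset names, the `dependencies` mapping, and the `build_order` and `run_build` helpers are all hypothetical, and the sketch assumes the dependency graph has already been written out as a plain dictionary. It relies only on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a dataset, each value is the set of
# datasets it reads from. These names are illustrative, not DataSpace's.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_by_customer": {"clean_orders", "raw_customers"},
}


def load_raw_orders():
    return [
        {"customer": "a", "amount": 10},
        {"customer": "b", "amount": -1},
        {"customer": "a", "amount": 5},
    ]


def load_raw_customers():
    return {"a": "Alice", "b": "Bob"}


def clean_orders(raw_orders):
    # Drop orders with a non-positive amount.
    return [order for order in raw_orders if order["amount"] > 0]


def orders_by_customer(clean_orders, raw_customers):
    # Sum the order amounts per customer name.
    totals = {}
    for order in clean_orders:
        name = raw_customers[order["customer"]]
        totals[name] = totals.get(name, 0) + order["amount"]
    return totals


# Hypothetical mapping from dataset name to the function that builds it.
transforms = {
    "raw_orders": load_raw_orders,
    "raw_customers": load_raw_customers,
    "clean_orders": clean_orders,
    "orders_by_customer": orders_by_customer,
}


def build_order(deps):
    """Return an order in which every dataset comes after its inputs."""
    return list(TopologicalSorter(deps).static_order())


def run_build(deps, steps):
    """Run each step after its inputs are built; return outputs by name."""
    outputs = {}
    for name in build_order(deps):
        inputs = {dep: outputs[dep] for dep in deps[name]}
        outputs[name] = steps[name](**inputs)
    return outputs


if __name__ == "__main__":
    results = run_build(dependencies, transforms)
    print(results["orders_by_customer"])
```

A topological sort guarantees that each dataset is built only after everything it depends on, and `TopologicalSorter` raises `graphlib.CycleError` if the dependencies are circular. The same ordering logic applies whether a build is triggered from a platform or run by hand on a local machine.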