Google Drive
Single File Extraction
To ingest a CSV file from Google Drive, you first have to enable sharing on the file by following the instructions on the Google Drive Help Page.
The generated share link will look something like this:
https://drive.google.com/file/d/1Se7_LKZykBWweXpBths1oCmgGTGK4yyD/view?usp=sharing
This link is meant to open the Google Drive web interface. However, since we want the file itself, we have to modify the link. The file ID needs to be extracted from the original URL and combined with the direct file access link:
https://drive.google.com/uc?id=1Se7_LKZykBWweXpBths1oCmgGTGK4yyD
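The rewrite from share link to direct link can also be done programmatically. The following is a minimal sketch, assuming the share link has the /file/d/<id>/ shape shown above; the helper name to_direct_url and the regular expression are illustrative and not part of DataSpace or any Google library.

```python
import re

# Hypothetical pattern: assumes share links of the form .../file/d/<FILE_ID>/...
_FILE_ID_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def to_direct_url(share_link: str) -> str:
    """Turn a Google Drive share link into a direct file access link."""
    match = _FILE_ID_PATTERN.search(share_link)
    if match is None:
        raise ValueError(f"No file ID found in: {share_link}")
    return f"https://drive.google.com/uc?id={match.group(1)}"


share_link = (
    "https://drive.google.com/file/d/"
    "1Se7_LKZykBWweXpBths1oCmgGTGK4yyD/view?usp=sharing"
)
print(to_direct_url(share_link))
```

Raising an error on an unexpected link shape, rather than returning the input unchanged, makes a wrong link fail early instead of at download time.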
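The full code is as follows:
import polars as pl
import os

# Build the direct file access link from the file ID stored in an environment variable
url = f"https://drive.google.com/uc?id={os.environ['CSV_FILE_ID']}"

def transform():
    # Read the CSV directly from the direct file access link
    df = pl.read_csv(url)
    return df
Multiple File Extraction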
If you have a folder with multiple files you would like to extract, it is not feasible to share every single file manually. In this case, we can use the Google Drive API to programmatically access the shared folder, list its files, and download them all.
1. Create a Google Cloud Project
Go to https://console.cloud.google.com/.
Click the project selector (top left) and choose “New Project.”
Give it a descriptive name, for example DataSpace Drive Access.
Click Create.
2. Enable the Google Drive API
In the left sidebar, go to APIs & Services → Library.
Search for Google Drive API.
Click Enable.
3. Create a Service Account
Go to APIs & Services → Credentials.
Click Create Credentials → Service Account.
Enter a name like dataspace-drive-access.
Leave Permissions and Principals with access empty; no roles or users are needed.
Click Done.
4. Create a Key File
In the service account list, click your new account.
Open the Keys tab.
Click Add Key → Create New Key → JSON.
Save the downloaded .json file (for example, mcp.json).
You’ll need to upload this file to your DataSpace workspace later.
5. Share Your Google Drive Folder with the Service Account
Go to Google Drive.
Right-click your folder and choose Share.
Copy the client email from the service account JSON file (it looks like dataspace-drive-access@your-project-id.iam.gserviceaccount.com).
Add that email as a Viewer.
Copy the folder ID from the URL. It is the long string that follows /folders/, up to the next / if there is one.
Example: https://drive.google.com/drive/folders/1r1cDZzgE7wclTYMv_Xzdv98znO9kHN8_ → Folder ID: 1r1cDZzgE7wclTYMv_Xzdv98znO9kHN8_
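If you prefer not to copy the folder ID by hand, it can be pulled out of the folder URL in code. This is a small sketch using only the standard library; the function name folder_id_from_url is hypothetical, and it assumes the URL path contains a /folders/ segment as in the example above.

```python
from urllib.parse import urlparse


def folder_id_from_url(folder_url: str) -> str:
    """Return the path segment that follows 'folders' in a Drive folder URL."""
    # urlparse drops the query string (e.g. ?usp=sharing) from .path
    parts = urlparse(folder_url).path.split("/")
    try:
        folder_id = parts[parts.index("folders") + 1]
    except (ValueError, IndexError):
        folder_id = ""
    if not folder_id:
        raise ValueError(f"No folder ID found in: {folder_url}")
    return folder_id


folder_url = "https://drive.google.com/drive/folders/1r1cDZzgE7wclTYMv_Xzdv98znO9kHN8_"
print(folder_id_from_url(folder_url))
```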
6. Prepare Your DataSpace Workspace
In DataSpace, declare the dependencies in _config.json:
{
  "packages": [
    "google-api-python-client",
    "google-auth"
  ]
}
Make sure your service account key (mcp.json) is uploaded to the workspace root.
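Before running the transformation, it can help to confirm that the key file is where the code expects it and to read the client email that has to be added as a Viewer. The sketch below relies only on the "type" and "client_email" fields that Google service account key files contain; the function name read_client_email and the default path ./mcp.json are assumptions for illustration.

```python
import json
from pathlib import Path


def read_client_email(key_path: str = "./mcp.json") -> str:
    """Check that a service account key file exists and return its client email."""
    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError(f"Service account key not found: {path}")
    key_data = json.loads(path.read_text(encoding="utf-8"))
    # Service account keys are marked with "type": "service_account"
    if key_data.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key_data["client_email"]
```

Calling read_client_email() from the workspace root should return the address to share the folder with, or fail with a clear error if the key is missing or is the wrong kind of credential file.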
7. Write the Transformation
Now we can download the files and save them in the artifacts folder for further processing downstream.
import polars as pl
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
import io, os

# Google Drive folder ID
FOLDER_ID = "<REPLACE_WITH_FOLDER_ID>"

# Path to your service account credentials
SERVICE_ACCOUNT_FILE = "./mcp.json"

# Drive API scope
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]

def transform():
    # Authenticate using the service account
    creds = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES
    )
    service = build("drive", "v3", credentials=creds)

    # Query all Excel files in the folder
    query = f"'{FOLDER_ID}' in parents and (name contains '.xlsx' or name contains '.xls')"
    results = service.files().list(q=query, fields="files(id, name)").execute()
    files = results.get("files", [])

    # Download files into the artifacts folder
    for f in files:
        print(f"Downloading {f['name']}...")
        request = service.files().get_media(fileId=f["id"])
        with io.FileIO(os.path.join(os.environ["ARTIFACT_FOLDER"], f["name"]), "wb") as fh:
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            while not done:
                status, done = downloader.next_chunk()
                if status:
                    print(f"  {int(status.progress() * 100)}%")
    print("✅ All files downloaded successfully")

    # Return an empty DataFrame (optional)
    df = pl.DataFrame()
    return df
All downloaded files are automatically stored in the artifacts folder, so they persist across runs and are available for further processing.
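One caveat for larger folders: the Drive API returns file listings in pages (100 files per page by default), so a single files().list call may not return every file. The sketch below follows the nextPageToken that the API returns until the listing is exhausted. The helper name list_all_files is hypothetical, and service is assumed to be the Drive v3 client created with build("drive", "v3", credentials=creds).

```python
def list_all_files(service, query: str) -> list:
    """Collect every file matching the query, following nextPageToken across pages."""
    files = []
    page_token = None
    while True:
        response = service.files().list(
            q=query,
            # nextPageToken must be requested explicitly when fields is set
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,
        ).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return files
```

In the transformation above, files = list_all_files(service, query) could stand in for the single list call; the download loop stays the same.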
Summary
You’ve now successfully configured your DataSpace workspace to:
Authenticate securely via a Google service account
Access a shared Google Drive folder
Automatically download Excel files into the artifacts folder