backgroundImage

Validate a Google Sheet directly with GX

Use GX PandasDatasource and CSVAsset to do data quality on your online spreadsheets

Nathan Farmer
June 21, 2023
the text "🐼 PandasDatasource + 📒 CSVAsset = 🧡🧪 GX testing that reads directly from a Google Sheet" displayed as a series of Google Sheet titles, with decorative orange lines above and below it

A couple weeks ago I saw one of my colleagues using a workflow where he copied data from a Google Sheet to a database, then read from that database for Great Expectations testing. Which works just fine, but I wondered if it’s possible to skip the copy step and read directly from the Sheet.

I’m happy to report that it’s pretty straightforward, using PandasDatasource and CSVAsset.

Here’s the snippet:

1import great_expectations as gx
2context = gx.get_context()
3# the key can be found in your Google Sheet URL
4key = ""
5sheet_name = "Sheet1"
6sheet_url = f"https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
7csv_asset = context.sources.add_or_update_pandas("pandas_datasource").add_csv_asset(
8    name="my_google_sheet_csv",
9    filepath_or_buffer=sheet_url,
10    # Set first row as column names
11    header=0,
12    # Set first column as index
13    index_col=0,
14    # Parse column values to datetime
15    parse_dates=[]
16)
17
18validator = context.get_validator(batch_request=csv_asset.build_batch_request())
19validator.head()

You can find more snippets and share yours in the #gx-share channel of our Slack.

Search our blog for the latest on data quality.


©2024 Great Expectations. All Rights Reserved.