A couple weeks ago I saw one of my colleagues using a workflow where he copied data from a Google Sheet to a database, then read from that database for Great Expectations testing. Which works just fine, but I wondered if it’s possible to skip the copy step and read directly from the Sheet.
I’m happy to report that it’s pretty straightforward, using PandasDatasource and CSVAsset.
Here’s the snippet:
1import great_expectations as gx2context = gx.get_context()3# the key can be found in your Google Sheet URL4key = ""5sheet_name = "Sheet1"6sheet_url = f"https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}"7csv_asset = context.sources.add_or_update_pandas("pandas_datasource").add_csv_asset(8 name="my_google_sheet_csv",9 filepath_or_buffer=sheet_url,10 # Set first row as column names11 header=0,12 # Set first column as index13 index_col=0,14 # Parse column values to datetime15 parse_dates=[]16)17
18validator = context.get_validator(batch_request=csv_asset.build_batch_request())19validator.head()
You can find more snippets and share yours in the #gx-share channel of our Slack.