
GX with AWS: follow the yellow brick road

GX has 4 new AWS integration guides, which also show off recent improvements to our docs

Erin Kapp
February 09, 2023
Our new step-by-step integration guides make AWS + GX as easy as following the yellow brick road. (📸: Akshay Nanavati via Unsplash)

GX documentation has new step-by-step integration guides for these AWS setups:

  • GX with AWS using S3 and Pandas 

  • GX with AWS using S3 and Spark 

  • GX with AWS using Athena 

  • GX with AWS using Redshift 

Our AWS integration documentation assumes that you’ve done all of the following:

  • Completed the Getting Started tutorial

  • Installed Python 3

  • Installed the AWS CLI

  • Configured your AWS credentials

  • Identified the S3 bucket and prefix where Expectations and Validation Results will be stored

If you haven’t completed any of these steps, hop on over to the Getting Started tutorial docs or the official AWS documentation to get caught up.
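If you want a quick way to confirm the AWS pieces are in place, here’s a minimal sanity check using boto3. It assumes boto3 is installed and your credentials are already configured (for example, via `aws configure`); the bucket name is a placeholder for your own.

```python
import boto3

# Confirm that your credentials resolve to a valid AWS identity.
identity = boto3.client("sts").get_caller_identity()
print(f"Authenticated as {identity['Arn']}")

# Confirm that the bucket you picked for Expectations and Validation
# Results is reachable (this raises a ClientError if it isn't).
boto3.client("s3").head_bucket(Bucket="my-gx-bucket")  # placeholder bucket name
print("S3 bucket is reachable")
```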

Once these prerequisites are in place, each guide walks you through all of the key parts of getting the integration up and running (condensed into a single code sketch after this list):

  • Setup, where you verify that the AWS CLI is ready, install GX and the necessary dependencies locally (if they’re not installed already), create your Data Context, and configure your Expectations Store, Validation Results Store, and Data Docs on S3.

  • Connecting to data, where you choose how to run the code for creating a new Datasource, instantiate your DataContext, configure and save your Datasource, then test it.

  • Creating Expectations, where you prepare a Batch Request, Expectation Suite, and Validator, then add Expectations to your Expectation Suite and save it.

  • Validating data, where you create and run a Checkpoint and then build and view your Data Docs.
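To give a feel for how those four parts fit together, here’s a condensed, illustrative sketch of the S3 + Pandas path using the Python dictionary workflow. The bucket, prefix, asset, suite, and column names are all placeholders, and the exact calls can vary by GX version; the guides themselves contain the tested versions.

```python
import great_expectations as gx
from great_expectations.core.batch import BatchRequest

# Setup: instantiate the Data Context. (In the guides, the context is
# already configured with S3-backed Expectations, Validation Results,
# and Data Docs stores.)
context = gx.get_context()

# Connecting to data: configure and save a Datasource that reads CSVs
# from S3 with Pandas.
datasource_config = {
    "name": "my_s3_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    "data_connectors": {
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetS3DataConnector",
            "bucket": "my-gx-bucket",
            "prefix": "data/",
            "default_regex": {
                "pattern": r"data/(.*)\.csv",
                "group_names": ["data_asset_name"],
            },
        }
    },
}
context.add_datasource(**datasource_config)

# Creating Expectations: prepare a Batch Request and a Validator, add
# Expectations, and save the suite.
batch_request = BatchRequest(
    datasource_name="my_s3_datasource",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="my_data_file",  # placeholder asset name
)
context.create_expectation_suite(
    expectation_suite_name="my_suite", overwrite_existing=True
)
validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name="my_suite"
)
validator.expect_column_values_to_not_be_null(column="id")  # placeholder column
validator.save_expectation_suite(discard_failed_expectations=False)

# Validating data: create and run a Checkpoint, then build Data Docs.
context.add_checkpoint(
    name="my_checkpoint",
    class_name="SimpleCheckpoint",
    validations=[
        {"batch_request": batch_request, "expectation_suite_name": "my_suite"}
    ],
)
context.run_checkpoint(checkpoint_name="my_checkpoint")
context.build_data_docs()
```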

These guides also provide examples for both of the configuration workflows that GX supports: a YAML-based workflow if you use GX to generate a pre-configured Jupyter Notebook, and a Python dictionary workflow if you choose to work from a blank Jupyter Notebook or Python script.
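For comparison, here’s roughly what the YAML-based counterpart to the dictionary config above looks like. Again, the names are placeholders; `test_yaml_config` gives you immediate feedback on the configuration before you save it.

```python
import yaml  # PyYAML, assumed installed
import great_expectations as gx

context = gx.get_context()

# The same Datasource as above, expressed as YAML -- the kind of block a
# pre-configured notebook generates for you.
datasource_yaml = r"""
name: my_s3_datasource
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetS3DataConnector
    bucket: my-gx-bucket
    prefix: data/
    default_regex:
      pattern: data/(.*)\.csv
      group_names:
        - data_asset_name
"""

# Check the configuration and report what GX can see, then save it.
context.test_yaml_config(datasource_yaml)
context.add_datasource(**yaml.safe_load(datasource_yaml))
```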

In addition to the actual integration information, our AWS guides also show off some of the recent behind-the-scenes updates that have made our docs even more helpful. Like…

Modular document content


We first used modular documentation content as a major part of our How to configure a Pandas/Spark/SQL Datasource guides, and it also features heavily in the AWS guides.

Modular content ensures that a process that’s identical across multiple guides is detailed in only a single file, which can then be imported into every document it applies to.

What it means: Documentation updates can happen faster. And most excitingly, we’ll be able to build future end-to-end tutorials more quickly, since we’ll be able to pull them together from the individual how-to guides that contribute to the workflow for a specific environment and data backend.

Named code examples


If you didn’t already know: most of our docs are under test! We keep our code snippets in separate files that run as tests, which ensures our docs stay up-to-date with our code.

To make this easier to manage, we’ve been improving on the process that pulls code examples from test scripts for display in our guides.

In the newest version, we can now name the example snippets, so examples in the docs aren’t dependent on the line numbers of the snippet in the source code. This lets us more easily update the source code for those examples to correspond to updates in the GX codebase: changes now automatically roll into the code examples in the docs.
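As a hypothetical illustration (the marker syntax below is illustrative, not necessarily our exact convention), a named snippet in a tested script might look like this. The docs build can pull everything between the markers by name, so the published example no longer breaks when edits shift line numbers.

```python
import great_expectations as gx

context = gx.get_context()

# <snippet name="create_expectation_suite">
suite = context.create_expectation_suite(
    expectation_suite_name="my_suite", overwrite_existing=True
)
# </snippet>
```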

What it means: Overall, code snippet consistency will be better with more automation and less room for human error. And with less time spent maintaining the relationship between the doc examples and the codebase, our hardworking developer relations team can spend more time creating new content for you.

Prerequisite framework


We now have a custom .jsx script with an extendable, standardized framework for our prerequisites box, which currently appears only in the AWS docs. We can edit common requirements from a central place, and add document-specific prerequisites in the relevant places.

What it means: As it’s rolled out to other documentation, the standardized display and messaging for our most common requirements will make it easier for you to know what’s needed to accomplish your goals.


You can find all of our AWS integration guides here. And if you haven’t already, don’t forget to check out our expanded API docs, now using Sphinx!


