
We've revamped Checkpoints!

Sam Bail
January 28, 2021

We’ve made Checkpoints highly configurable first-class citizens in Great Expectations to meet all your validation needs.

You might recall seeing something like this when creating or running a Checkpoint in the Great Expectations CLI:

Screenshot of Checkpoints CLI output

Well, we’re excited to announce that Checkpoints are now fully grown up and no longer experimental!

As of Great Expectations version 0.13.8, Checkpoints integrate the new logic and metadata from Batches, Validators and new Datasources released with Great Expectations 0.13, or what we call the “new” or “experimental” API. We extended Checkpoints to handle more use cases with far less boilerplate code, and while we were at it, we also tidied up the underlying classes and added a CheckpointStore, which makes them a first-class citizen just like Datasources and Expectations. This is why we call them “class-based” Checkpoints in the documentation, as opposed to “legacy” Checkpoints. This is what you’re going to see in our documentation for now to distinguish the two versions:

Screenshot of Checkpoints documentation

This blog post covers the improved Checkpoints, and introduces how to create, configure, and run them.

What can you do with Checkpoints?

A Checkpoint bundles an Expectation Suite with a batch, or multiple batches, of data, which allows you to easily run validation and kick off any follow-up actions. Some examples of where you can use pre-configured Checkpoints to validate data include:

Most importantly, please check out our Core Concepts documentation on Checkpoints, which is packed with awesome explanations and examples.

Side note and invitation to contribute: as you can see from the linked docs, we’ve produced documentation for some of these use cases. If you’re in the mood to give back to the community, we’d really appreciate PRs documenting additional deployment patterns, either on this list or new ideas of your own.

What’s changing?

First things first: This is a non-breaking change, so you can upgrade to 0.13.8 and continue to use your legacy Checkpoints as-is (including the "This feature is experimental" warning message). As usual, we’re releasing this version of Checkpoints as a v1, with some room to iterate, so there might be some smaller tweaks happening in future releases. These changes are based on a huge amount of community feedback; we’re confident that they are a big step in the right direction and that we’ll be able to smooth out any rough edges soon.

At the time of writing this blog post, there are three ways you can interact with Checkpoints. Which one applies depends mainly on whether you’re using the “stable” Great Expectations API for key concepts such as Datasources, or the “experimental” API:

  1. The Checkpoint-related CLI commands like checkpoint new and checkpoint run continue to work with configurations for concepts using the “stable” Great Expectations API. If you’re using Datasources from the stable API, your Checkpoint workflows won’t change.
  2. If you’re using Datasources from the “experimental” Great Expectations API, you can access new-style Checkpoints through code. The CLI does not support them yet; we are planning to switch the CLI over entirely to the new concepts in a future release. There’s just one thing you’ll need to do if you want to use class-based Checkpoints in 0.13.8: we’ve incremented the version number of the great_expectations.yml config file, which means you’ll have to run the CLI upgrade tool.
  3. You can also continue to use legacy Checkpoints via the CLI, but have them backed by a CheckpointStore if you upgrade the config version. This would allow you to store your Checkpoints somewhere other than the filesystem, e.g. in cloud storage.

You will see some warning messages regarding the new configuration file version, but that's ok. Only upgrade when you're confident you want to.

Creating and configuring Checkpoints

Let’s assume you’ve already configured a data asset MyDataAsset and an Expectation Suite my_suite, and you just want to create a Checkpoint that allows you to run validation of MyDataAsset with my_suite. This is the configuration that’ll bundle the Expectation Suite and the respective batch request using the SimpleCheckpoint class, which takes care of a few defaults:

config = """
name: my_checkpoint
config_version: 1.0
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_data_connector
      data_asset_name: MyDataAsset
      partition_request:
        index: 0
    expectation_suite_name: my_suite
"""
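
For readers who prefer to think in Python data structures, the configuration described above maps onto a plain dictionary. The sketch below is illustrative only: validation_pairs is a hypothetical helper that is not part of Great Expectations, and the fallback to a top-level expectation_suite_name is an assumption about how a suite shared by several batch requests would be resolved.

```python
# Illustrative only: the Checkpoint configuration as a plain Python dict.
checkpoint_config = {
    "name": "my_checkpoint",
    "config_version": 1.0,
    "class_name": "SimpleCheckpoint",
    "validations": [
        {
            "batch_request": {
                "datasource_name": "my_datasource",
                "data_connector_name": "my_data_connector",
                "data_asset_name": "MyDataAsset",
                "partition_request": {"index": 0},
            },
            "expectation_suite_name": "my_suite",
        }
    ],
}


def validation_pairs(config):
    """Hypothetical helper: list the (data asset, suite) pairs a config describes."""
    # Assumption: a top-level suite name acts as the default for every validation.
    default_suite = config.get("expectation_suite_name")
    pairs = []
    for validation in config["validations"]:
        asset = validation["batch_request"]["data_asset_name"]
        suite = validation.get("expectation_suite_name", default_suite)
        pairs.append((asset, suite))
    return pairs


print(validation_pairs(checkpoint_config))
```

Reading a configuration this way makes it easy to see at a glance which data asset will be validated against which Expectation Suite.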

Well, this looks pretty much like the old-school Checkpoint yml files we’ve seen previously - a batch and a suite, nothing special. They simply replace what was previously known as “ValidationOperators”. However, the real power of these new Checkpoints comes from their configurability. For example, we can add multiple batch_requests to a Checkpoint to validate several assets with the same Expectation Suite, we can nest ValidationActions, add Evaluation Parameters, set the output type for validation results, and use templates for run_name. See this epic example of a highly customized Checkpoint configuration using the Checkpoint base class instead of SimpleCheckpoint:

config = """
name: my_fancy_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%M-foo-bar-template-$VAR"
validations:
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_special_data_connector
      data_asset_name: users
      partition_request:
        index: -1
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_other_data_connector
      data_asset_name: users
      partition_request:
        index: -2
expectation_suite_name: users.delivery
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
evaluation_parameters:
  param1: "$MY_PARAM"
  param2: 1 + "$OLD_PARAM"
runtime_configuration:
  result_format:
    result_format: BASIC
    partial_unexpected_count: 20
"""
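
The run_name_template above mixes two kinds of placeholders: strftime-style date directives and a $VAR variable. How Great Expectations resolves the template internally is not covered in this post, so the following is only a sketch of the idea. It assumes the date directives are expanded against the run time and that $VAR is filled in from a mapping of variables; render_run_name is a hypothetical function, not a library call.

```python
from datetime import datetime
from string import Template


def render_run_name(template, when, variables):
    """Sketch: expand strftime directives first, then $NAME placeholders."""
    with_time = when.strftime(template)
    return Template(with_time).substitute(variables)


run_name = render_run_name(
    "%Y-%M-foo-bar-template-$VAR",
    datetime(2021, 1, 28, 9, 5),  # 28 January 2021, 09:05
    {"VAR": "nightly"},
)
print(run_name)
```

One detail worth knowing when writing your own templates: in strftime, %M is the minute and %m is the month, so the template above yields the year followed by the minute of the run.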

For more examples of the various configuration options for these new Checkpoints, take a look at our documentation!

Wait, what are ValidationActions again?

ValidationActions and ValidationOperators continue to exist inside of Checkpoints. However, we think of them as purely internal concerns. You will configure them within Checkpoints, but you would almost never instantiate or invoke them outside of Checkpoints.

This matters for extensibility. ValidationActions are pluggable actions that can kick off secondary processes after data validation, such as:

  • Storing validation results
  • Building Data Docs
  • Triggering notifications, such as Slack notifications

Because Checkpoints wrap ValidationActions, you can configure them just as you did before. The SimpleCheckpoint class actually defaults to the above action list! If we want more fine-grained control over which actions to run after validation, we can add a custom action_list, as in the example above:

config = """
...
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
...
"""
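
To illustrate what "pluggable actions that run after validation" means, here is a toy model of an action list. It is not how the Great Expectations classes are implemented: RecordingAction and run_action_list are hypothetical stand-ins, and the real ValidationAction classes have their own constructors and run signatures. The sketch only shows the pattern of running each configured action, in order, on a validation result.

```python
class RecordingAction:
    """Stand-in for a ValidationAction; it only records that it ran."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def run(self, validation_result):
        self.log.append((self.name, validation_result["success"]))


def run_action_list(actions, validation_result):
    """Run every configured action, in order, on one validation result."""
    for action in actions:
        action.run(validation_result)


log = []
actions = [
    RecordingAction("store_validation_result", log),
    RecordingAction("store_evaluation_params", log),
    RecordingAction("update_data_docs", log),
]
run_action_list(actions, {"success": True})
print(log)
```

Because each action only needs to know how to handle a validation result, adding a new integration amounts to adding one more entry to the list.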

We expect the list of integrations in ValidationActions, such as the types of notifications to send, to continue to grow. If you have ideas, we’d love to help you contribute them back to the community. Please check out our contribution guide to get started!

Running Checkpoints

Running Checkpoints is easy. We’ve designed them with two principles in mind:

  1. Minimal in-line code.
  2. Make them set-and-forget. Once you’ve configured a Checkpoint, you should be able to just run it repeatedly to validate your data assets, without requiring additional configuration.

You can simply run a new-style Checkpoint in code using the following snippet:

import great_expectations as ge

ge.get_context().run_checkpoint(
    checkpoint_name="my_checkpoint",
)
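
In a scheduled job, the set-and-forget principle usually means the process should fail visibly when validation fails. The sketch below shows one way to wire that up. It assumes the object returned by run_checkpoint exposes a boolean under the "success" key, which you should verify against the documentation for your version; exit_code_for and run_and_report are hypothetical helpers, and the run is stubbed out so the example executes without a Great Expectations project.

```python
def exit_code_for(result):
    """Map a Checkpoint run result to a process exit code (0 means passed)."""
    # Assumption: the result supports result["success"] and it is a boolean.
    return 0 if result["success"] else 1


def run_and_report(run):
    """Call `run` (any zero-argument callable returning a result) and report."""
    code = exit_code_for(run())
    print("validation passed" if code == 0 else "validation failed")
    return code


# Stubbed runs standing in for a real call to run_checkpoint.
passed_code = run_and_report(lambda: {"success": True})
failed_code = run_and_report(lambda: {"success": False})
```

In a real script, the callable passed to run_and_report would wrap the run_checkpoint call shown above, and the returned code would be handed to sys.exit so that a scheduler or CI system can react to a failed validation.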

As we mentioned above, we will soon integrate the new Checkpoints with the CLI, which means you will be able to trigger a run from the CLI too. Currently, the CLI checkpoint run command still supports the legacy Checkpoints, which only operate on concepts from the stable Great Expectations API.

Conclusion

We hope this article gives you an idea of what to expect when using Checkpoints for validation. For more detailed information about new Checkpoints, please refer to the updated how-to guides in the “Validation” section of our docs, as well as the Core Concepts pages in our documentation.
