
We've revamped Checkpoints!

We’ve made Checkpoints highly configurable first-class citizens in Great Expectations to meet all your validation needs.

Sam Bail
January 28, 2021

You might recall seeing something like this when creating or running a Checkpoint in the Great Expectations CLI:

Screenshot of Checkpoints CLI output

Well, we’re excited to announce that Checkpoints are now fully grown up and no longer experimental!

As of Great Expectations version 0.13.8, Checkpoints integrate the new logic and metadata from Batches, Validators, and the new Datasources released with Great Expectations 0.13, or what we call the “new” or “experimental” API. We extended Checkpoints to handle more use cases with far less boilerplate code. While we were at it, we also tidied up the underlying classes and added a CheckpointStore, which makes Checkpoints a first-class citizen just like Datasources and Expectations. This is why we call them “class-based” Checkpoints in the documentation, as opposed to “legacy” Checkpoints. This is what you’re going to see in our documentation for now to distinguish the two versions:

Screenshot of Checkpoints documentation

This blog post covers the improved Checkpoints, and introduces how to create, configure, and run them.

What can you do with Checkpoints?

A Checkpoint bundles an Expectation Suite with a batch, or multiple batches, of data, which allows you to easily run validation and kick off any follow-up actions. Some examples of where you can use pre-configured Checkpoints to validate data include:

Most importantly, please check out our Core Concepts documentation on Checkpoints, which is packed with awesome explanations and examples.

Side note and invitation to contribute: as you can see from the linked docs, we’ve produced documentation for some of these use cases. If you’re in the mood to give back to the community, we’d really appreciate PRs documenting additional deployment patterns, either on this list or new ideas of your own.

What’s changing?

First things first: this is a non-breaking change, so you can upgrade to 0.13.8 and continue to use your legacy Checkpoints as-is (including the "This feature is experimental" warning message). As usual, we’re releasing this version of Checkpoints as a v1 with some room to iterate, so expect some smaller tweaks in future releases. These changes are based on a huge amount of community feedback; we’re confident that they are a big step in the right direction and that we’ll be able to smooth out any rough edges soon.

At the time of writing, there are three ways you can interact with Checkpoints, depending on whether you’re using the “stable” Great Expectations API for key concepts such as Datasources, or the “experimental” API:

  1. The Checkpoint-related CLI commands like checkpoint new and checkpoint run continue to work with configurations for concepts using the “stable” Great Expectations API. If you’re using Datasources from the stable API, your Checkpoint workflows won’t change.
  2. If you’re using Datasources from the “experimental” Great Expectations API, you can access new-style Checkpoints through code for now; we’re planning to switch the CLI over entirely to the new concepts in a future release. There’s just one thing you’ll need to do if you want to use class-based Checkpoints in 0.13.8: we’ve incremented the version number of the great_expectations.yml config file, which means you’ll have to run the CLI upgrade tool.
  3. You can also continue to use legacy Checkpoints via the CLI, but have them backed by a CheckpointStore if you upgrade the config version. This allows you to store your Checkpoints somewhere other than the local filesystem, e.g. in cloud storage.

You will see some warning messages regarding the new configuration file version, but that's ok. Only upgrade when you're confident you want to.

Creating and configuring Checkpoints

Let’s assume you’ve already configured a data asset MyDataAsset and an Expectation Suite my_suite, and you just want to create a Checkpoint that allows you to run validation of MyDataAsset with my_suite. This is the configuration that’ll bundle the Expectation Suite and the respective batch request using the SimpleCheckpoint class, which takes care of a few defaults:

config = """
name: my_checkpoint
config_version: 1.0
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_data_connector
      data_asset_name: MyDataAsset
      partition_request:
        index: 0
    expectation_suite_name: my_suite
"""

Well, this looks pretty much like the old-school Checkpoint YAML files we’ve seen previously: a batch and a suite, nothing special. In everyday use, they take over the role of what were previously known as “ValidationOperators”. However, the real power of these new Checkpoints comes from their configurability. For example, we can add multiple batch_requests to a Checkpoint to validate several batches with the same Expectation Suite, configure a list of ValidationActions, add Evaluation Parameters, set the result format for validation results, and use templates for the run_name. See this epic example of a highly customized Checkpoint configuration using the Checkpoint base class instead of SimpleCheckpoint:

config = """
name: my_fancy_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%M-foo-bar-template-$VAR"
validations:
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_special_data_connector
      data_asset_name: users
      partition_request:
        index: -1
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_other_data_connector
      data_asset_name: users
      partition_request:
        index: -2
expectation_suite_name: users.delivery
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
evaluation_parameters:
  param1: "$MY_PARAM"
  param2: 1 + "$OLD_PARAM"
runtime_configuration:
  result_format:
    result_format: BASIC
    partial_unexpected_count: 20
"""
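In the configuration above, the two batch_requests share a single top-level expectation_suite_name. The following is a minimal sketch of that idea in plain Python: a hypothetical expand_validations helper (not part of the Great Expectations API) that turns a Checkpoint-style config into one (batch request, suite) pair per validation. It assumes, as an illustration only, that a suite named inside an individual validation entry takes precedence over the top-level default.

```python
from typing import Any, Dict, List


# Hypothetical helper, not Great Expectations code: it mimics how a top-level
# expectation_suite_name can serve as the default for every validation entry.
def expand_validations(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    default_suite = config.get("expectation_suite_name")
    expanded = []
    for validation in config.get("validations", []):
        # Assumption: a suite named inside the entry overrides the top-level default.
        suite = validation.get("expectation_suite_name", default_suite)
        expanded.append(
            {"batch_request": validation["batch_request"], "expectation_suite_name": suite}
        )
    return expanded


# The relevant parts of my_fancy_checkpoint, written as a plain dict so the
# example needs no YAML parser.
fancy = {
    "expectation_suite_name": "users.delivery",
    "validations": [
        {"batch_request": {"data_connector_name": "my_special_data_connector",
                           "data_asset_name": "users",
                           "partition_request": {"index": -1}}},
        {"batch_request": {"data_connector_name": "my_other_data_connector",
                           "data_asset_name": "users",
                           "partition_request": {"index": -2}}},
    ],
}

for v in expand_validations(fancy):
    print(v["batch_request"]["data_connector_name"], "->", v["expectation_suite_name"])
# prints:
# my_special_data_connector -> users.delivery
# my_other_data_connector -> users.delivery
```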

For more examples of the various configuration options for these new Checkpoints, take a look at our documentation!
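The run_name_template in the fancy example mixes strftime directives with a $VAR placeholder. Here is a minimal sketch of how such a template could be rendered, using Python's datetime.strftime and string.Template; the render_run_name helper is hypothetical and only illustrates the idea, not how Great Expectations resolves the template internally. Note that in strftime, %M is the minute and %m is the month, so the 05 below is a minute value.

```python
import os
from datetime import datetime
from string import Template
from typing import Mapping, Optional


# Hypothetical helper: fill strftime directives from the run time, then
# substitute $VARIABLES from a mapping (the process environment by default).
def render_run_name(template: str, run_time: datetime,
                    variables: Optional[Mapping[str, str]] = None) -> str:
    variables = os.environ if variables is None else variables
    with_time = run_time.strftime(template)  # "$VAR" passes through strftime untouched
    # safe_substitute leaves unknown $NAMES in place instead of raising KeyError
    return Template(with_time).safe_substitute(variables)


print(render_run_name("%Y-%M-foo-bar-template-$VAR",
                      datetime(2021, 1, 28, 14, 5), {"VAR": "nightly"}))
# prints: 2021-05-foo-bar-template-nightly
```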

Wait, what are ValidationActions again?

ValidationActions and ValidationOperators continue to exist inside of Checkpoints. However, we think of them as purely internal concerns. You will configure them within Checkpoints, but you would almost never instantiate or invoke them outside of Checkpoints.

This matters for extensibility. ValidationActions are pluggable actions that can kick off secondary processes after data validation, such as:

  • Storing validation results
  • Building Data Docs
  • Triggering notifications, such as Slack notifications
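To make "pluggable" concrete, here is a toy model in plain Python: each action is looked up by its class_name in a registry and runs after validation, in the order the action_list gives. The registry, the action functions, and the result dict are all hypothetical stand-ins (the real ValidationAction classes in Great Expectations do actual work); only the class names mirror ones used in this post.

```python
from typing import Any, Callable, Dict, List

# Hypothetical action signature: receives a validation result, returns a log line.
ActionFn = Callable[[Dict[str, Any]], str]


def store_result(result: Dict[str, Any]) -> str:
    return f"stored result (success={result['success']})"


def build_docs(result: Dict[str, Any]) -> str:
    return "rebuilt Data Docs"


def notify_slack(result: Dict[str, Any]) -> str:
    return "posted Slack notification"


# Plugging in a new integration means registering one more entry here.
ACTION_REGISTRY: Dict[str, ActionFn] = {
    "StoreValidationResultAction": store_result,
    "UpdateDataDocsAction": build_docs,
    "SlackNotificationAction": notify_slack,
}


def run_actions(action_list: List[Dict[str, Any]], result: Dict[str, Any]) -> List[str]:
    """Run each configured action, in list order, after validation has finished."""
    log = []
    for entry in action_list:
        action = ACTION_REGISTRY[entry["action"]["class_name"]]
        log.append(f"{entry['name']}: {action(result)}")
    return log


actions = [
    {"name": "store_validation_result", "action": {"class_name": "StoreValidationResultAction"}},
    {"name": "update_data_docs", "action": {"class_name": "UpdateDataDocsAction"}},
]
for line in run_actions(actions, {"success": True}):
    print(line)
# prints:
# store_validation_result: stored result (success=True)
# update_data_docs: rebuilt Data Docs
```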

Because Checkpoints wrap ValidationActions, you can configure them just as you could before. The SimpleCheckpoint class actually defaults to an action list that stores validation results, stores Evaluation Parameters, and updates Data Docs! If we want more fine-grained control over which actions to run after validation, we can add a custom action_list like in the above example:

config = """
...
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
...
"""
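The idea of SimpleCheckpoint "taking care of a few defaults" can be sketched as filling in that standard action list when a config doesn't carry one. The with_simple_defaults helper below is hypothetical and deliberately simplistic; it does not reproduce the real SimpleCheckpoint logic, it only shows defaults being merged without mutating the caller's config.

```python
import copy
from typing import Any, Dict

# The default actions described above: store results, store Evaluation
# Parameters, update Data Docs.
DEFAULT_ACTION_LIST = [
    {"name": "store_validation_result", "action": {"class_name": "StoreValidationResultAction"}},
    {"name": "store_evaluation_params", "action": {"class_name": "StoreEvaluationParametersAction"}},
    {"name": "update_data_docs", "action": {"class_name": "UpdateDataDocsAction"}},
]


# Hypothetical helper, not SimpleCheckpoint's implementation.
def with_simple_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    filled = copy.deepcopy(config)  # never mutate the caller's config
    if "action_list" not in filled:
        filled["action_list"] = copy.deepcopy(DEFAULT_ACTION_LIST)
    return filled


minimal = {"name": "my_checkpoint", "class_name": "SimpleCheckpoint"}
print([a["name"] for a in with_simple_defaults(minimal)["action_list"]])
# prints: ['store_validation_result', 'store_evaluation_params', 'update_data_docs']
```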

We expect the list of integrations in ValidationActions, such as the types of notifications to send, to continue to grow. If you have ideas, we’d love to help you contribute them back to the community. Please check out our contribution guide to get started!

Running Checkpoints

Running Checkpoints is easy. We’ve designed them with two principles in mind:

  1. Minimal in-line code.
  2. Make them set-and-forget. Once you’ve configured a Checkpoint, you should be able to just run it repeatedly to validate your data assets, without requiring additional configuration.

You can simply run a new-style Checkpoint in code using the following snippet:

import great_expectations as ge

# Load the project's Data Context and run the stored Checkpoint by name
ge.get_context().run_checkpoint(
    checkpoint_name="my_checkpoint",
)
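The set-and-forget principle boils down to "configure once, store it under a name, run it by name as often as you like". The toy, in-memory ToyCheckpointStore below illustrates only that pattern; the class, its add and run methods, and the validate callback are hypothetical and are not the CheckpointStore or run_checkpoint from Great Expectations.

```python
from typing import Any, Callable, Dict


# Toy stand-in for the idea of a CheckpointStore: configs are saved once under
# a name and looked up by that name on every run.
class ToyCheckpointStore:
    def __init__(self) -> None:
        self._configs: Dict[str, Dict[str, Any]] = {}

    def add(self, config: Dict[str, Any]) -> None:
        self._configs[config["name"]] = config

    def run(self, name: str, validate: Callable[[Dict[str, Any]], bool]) -> bool:
        config = self._configs[name]  # raises KeyError if never stored
        # Validate every configured batch; the run succeeds only if all pass.
        return all(validate(v) for v in config["validations"])


store = ToyCheckpointStore()
store.add({"name": "my_checkpoint",
           "validations": [{"batch_request": {"data_asset_name": "MyDataAsset"},
                            "expectation_suite_name": "my_suite"}]})

# No extra configuration per run: same name, repeated runs.
# The lambda is a stand-in validator that pretends every batch passes.
for _ in range(3):
    print(store.run("my_checkpoint", validate=lambda v: True))
```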

As we mentioned above, we will soon integrate the new Checkpoints with the CLI, which means you will be able to trigger a run from the CLI too. Currently, the CLI checkpoint run command still supports only the legacy Checkpoints, which operate on concepts from the stable Great Expectations API.

Conclusion

We hope this article gives you an idea of what to expect when using Checkpoints for validation. For more detailed information about new Checkpoints, please refer to the updated how-to guides in the “Validation” section of our docs, as well as the Core Concepts pages in our documentation.

©2024 Great Expectations. All Rights Reserved.