
This blog is about Metrics. That’s Metrics as in 'the specific GX Metric object,' not 'the generic concept of metrics.'
Most GX users won’t ever need to interact with a Metric. You’ll really only encounter Metrics directly when you’re deep in an Expectation’s code or creating a Custom Expectation.
So this post is a primer aimed at the subset of advanced GX users who are doing that under-the-hood Expectations work. And, of course, at anyone who’s interested just because.
If you were hoping to hear about generic metrics, can I interest you in this blog post instead?
What is a Metric?
Metrics are a key component of Expectations. One easy way to define a Metric is:
A Metric is an answer to a question you have about your data.
… where the question is part of your Expectation.
Minimalist Metrics
For a simple example of how a Metric relates to an Expectation, let’s consider expect_column_max_to_be_between. You use this Expectation to describe an acceptable range of values (provided by you) for the column’s maximum.
To determine if this Expectation is being met, GX needs to answer a question about the data: what is the column’s maximum value?
With the answer to that single question, you can get the results of the Expectation. So this Expectation needs just one Metric, column.max.
Similarly, you can determine the results of expect_column_unique_value_count_to_be_between if you answer how many unique values does the column have?—which corresponds to the sole Metric column.distinct_values.count.
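To make that concrete, here is a minimal sketch of resolving these single-Metric questions directly through a Validator. It assumes a recent GX 0.x release with the fluent pandas_default datasource; the CSV path and the fare_amount column are hypothetical placeholders, not anything prescribed by the Expectations themselves.

```python
import great_expectations as gx
from great_expectations.validator.metric_configuration import MetricConfiguration

context = gx.get_context()

# Hypothetical data: point the default pandas datasource at any CSV you have.
validator = context.sources.pandas_default.read_csv("./my_data.csv")

# The one question behind expect_column_max_to_be_between:
# "what is the column's maximum value?"
column_max = validator.get_metric(
    MetricConfiguration(
        metric_name="column.max",
        metric_domain_kwargs={"column": "fare_amount"},
        metric_value_kwargs=None,
    )
)

# The one question behind expect_column_unique_value_count_to_be_between:
# "how many unique values does the column have?"
unique_count = validator.get_metric(
    MetricConfiguration(
        metric_name="column.distinct_values.count",
        metric_domain_kwargs={"column": "fare_amount"},
        metric_value_kwargs=None,
    )
)

print(column_max, unique_count)
```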
Those examples are straightforward because those Expectations produce a single overall statistic or result for each Batch they evaluate. The Expectation is passed or failed based on that one answer.
But Expectations can also produce a pass/fail for each row, with the Expectation’s results based on the totality of the row results. Getting that kind of answer entails asking more than one question, which means more than one Metric.
With this kind of Expectation—which here we’ll call ColumnMap Expectations, after the ColumnMapExpectation class used to implement them—we start to see multiple Metrics.
Metrics for (Column)Maps
In a ColumnMap Expectation, you’re evaluating individual rows. If all the rows pass, the Expectation passes.
So the main questions you’re asking about the data as a whole are:
- How many rows are there?
- How many rows don’t meet the validation criteria?
- How many invalid values are there?
- What are the invalid values?
Generally, these questions show up in a ColumnMap Expectation as the following Metrics:
- table.row_count
- column_values.nonnull.unexpected_count
- column_values.<expectation_name>.unexpected_count
- column_values.<expectation_name>.unexpected_values
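As a concrete illustration, here is what those four Metric names could look like for one specific ColumnMap Expectation. The expect_column_values_to_be_in_set example below assumes its map Metric is named column_values.in_set, following the naming pattern above.

```python
# Hypothetical illustration: the four Metrics you'd expect to see for
# expect_column_values_to_be_in_set, assuming its map Metric is named
# "column_values.in_set" per the naming pattern described above.
column_map_metrics = [
    "table.row_count",
    "column_values.nonnull.unexpected_count",
    "column_values.in_set.unexpected_count",
    "column_values.in_set.unexpected_values",
]
```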
The first two questions and their respective Metrics are straightforward: table.row_count starts turning up to report the total rows. It’s usually accompanied by column_values.nonnull.unexpected_count with the number of rows that fail, though in some scenarios you’ll see a closely related count Metric instead.
The answers to these two Metrics are what you need to determine whether the Expectation passes.
Strictly speaking, you don’t need to ask how many unexpected values there are (column_values.<expectation_name>.unexpected_count) or what they are (column_values.<expectation_name>.unexpected_values). But without this information a failed ColumnMap Expectation can’t provide you with any context about the failure; in practice, you should always ask these questions.
For many ColumnMap Expectations in the Expectation Gallery, such as expect_column_values_to_be_increasing, these four Metrics are the ones you’ll see.
Metrics & mostly
There’s one more aspect to consider for ColumnMap Expectations: they can use the mostly parameter.
Using mostly allows you to set a threshold for the percentage of rows that have to pass in order for the Batch as a whole to pass. The default, without mostly, is 100%.
Using mostly allows you to pass data even if it’s less than perfect, while still specifying a point at which the data will no longer be ‘good enough.’ It’s calculated using the same row-count and unexpected-count Metrics that the default pass/fail behavior uses.
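As a quick illustration of mostly in practice, the sketch below runs a ColumnMap Expectation that tolerates up to 5% failing rows. The validator setup and the passenger_count column are hypothetical placeholders.

```python
# Assumes `validator` is any GX Validator pointing at a Batch of data,
# and that the data has a "passenger_count" column (hypothetical).
result = validator.expect_column_values_to_be_between(
    column="passenger_count",
    min_value=1,
    max_value=6,
    mostly=0.95,  # pass the Batch if at least 95% of rows are in range
)

print(result.success)
```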
Making Metrics
We’ve talked about Metrics as the answers to questions. It’s natural to ask if the Metrics also calculate those answers.
In short: no. This is where the MetricProvider steps in.
As we start talking about calculating, recall that GX can use different Execution Engines. And Pandas, Spark, and SQLAlchemy will each need different code to carry out the same calculation... so each Metric actually needs multiple implementations.
MetricProvider handles the connection between the Metric and the appropriate Execution Engine. To quote the MetricProvider conceptual guide:
To allow Expectations to work with multiple backends, methods for calculating Metrics need to be implemented for each ExecutionEngine. For example, [calculating the mean in] pandas is implemented by calling the built-in pandas .mean() method on the column, Spark is implemented with a built-in Spark mean function…
…the inputs for MetricProvider classes are methods for calculating the Metric on different backend applications. Each method must be decorated with an appropriate decorator. On __new__, the MetricProvider class registers the decorated methods as part of the Metrics registry so that they can be invoked to calculate Metrics.
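Putting that together, here is a hedged sketch of what a custom MetricProvider can look like, loosely following the pattern in GX’s custom Expectation documentation. The column.custom_max Metric name and the class name are illustrative, and the decorator imports assume a recent GX 0.x release.

```python
import sqlalchemy as sa

from great_expectations.execution_engine import (
    PandasExecutionEngine,
    SqlAlchemyExecutionEngine,
)
from great_expectations.expectations.metrics import (
    ColumnAggregateMetricProvider,
    column_aggregate_partial,
    column_aggregate_value,
)


class ColumnCustomMax(ColumnAggregateMetricProvider):
    """Answers the question: what is the column's maximum value?"""

    metric_name = "column.custom_max"

    # One implementation per Execution Engine; the decorators register
    # each method in the Metrics registry when the class is created.
    @column_aggregate_value(engine=PandasExecutionEngine)
    def _pandas(cls, column, **kwargs):
        return column.max()

    @column_aggregate_partial(engine=SqlAlchemyExecutionEngine)
    def _sqlalchemy(cls, column, **kwargs):
        return sa.func.max(column)
```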
That concludes this intro to Metrics in GX! You can read more about implementing a Metric here, or check out the rest of our documentation.