Expect a statistic's percent delta for a given column of a DataProfiler percent difference report to be within the specified threshold, inclusive.
expect_profile_numeric_columns_percent_diff_between_inclusive_threshold_range
This expectation level is EXPERIMENTAL
Contributors:
Tags:
Metrics:
Description
Expect a statistic's percent delta for a given column of a DataProfiler percent difference report to be within the specified threshold, inclusive.
This expectation takes the percent difference report between the data it is called on and a DataProfiler profile of the same schema loaded from a provided path.
Each numerical column percent delta will be checked against a user provided dictionary of columns paired with dictionaries of statistics containing lower and upper bounds.
This function builds upon the custom ProfileNumericColumnsDiff Expectation of Capital One's DataProfiler Expectations.
It is expected that a statistic's percent delta for a given column is within the specified threshold, inclusive.
Args:
- profile_path (str): A path to a saved DataProfiler profile object on the local filesystem.
limit_check_report_keys (dict): A dict, containing column names as keys and dicts as values that contain statistics as keys and dicts as values containing two keys:
"lower" denoting the lower bound for the threshold range, and "upper" denoting the upper bound for the threshold range.
- mostly (float - optional): a value indicating the lower bound percentage of successful values that must be present to evaluate to success=True.
validator.expect_profile_numeric_columns_percent_diff_between_inclusive_threshold_range( - profile_path = "C:/path_to/my_profile.pkl",
limit_check_report_keys = { - "column_one": {
- "min": {"lower": 0.5, "upper": 1.5}, #Indicating 50% lower bound and 150% upper bound
}, - "*": {
- "*": {"lower": 0.0, "upper": 2.0}, #Indicating 0% lower bound and 200% upper bound
},
}
) - Note: In limit_check_report_keys, "" in place of a column denotes a general operator in which the value it stores will be applied to every column in the data that has no explicit key.
"" in place of a statistic denotes a general operator in which the bounds it stores will be applied to every statistic for the given column that has no explicit key.
Want to make your own Expectation or an improvement to this one?
We've put together some great how to guides (including videos) on how to create your own expectations in a flash!
You can see those resources here: Contributor Resources