What's new in GX: May 2023

New and updated Expectations, Spark functionality updates, Fluent Datasource improvements, dropped support for Python 3.7, and more

Erin Kapp
May 30, 2023
Get the scoop on everything GX that was released into the wild last month. Also released into the wild: this gopher frog. (📸: Niva Hoffman USFWS, 2021)

Into hibernation: dropped support for Python 3.7

The day is here: as of GX version 0.16.14, we no longer support Python 3.7. 

If you’re still using it:

  • You’ll still be able to use GX, but pip install great-expectations won’t install newer versions, and you may need to pin or set upper bounds to make sure you’re using a version of GX that’s compatible with Python 3.7.

  • We recommend upgrading to Python 3.8 or later if possible—we want you to be able to use the improvements that we’re working on!
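
If you do need to stay on Python 3.7 for a while, the pinning advice above can be sketched as a small helper. This is only an illustration: the function name gx_requirement is hypothetical, and the upper bound assumes that every release before 0.16.14 still supports Python 3.7, which is what the announcement implies.

```python
import sys

# Per the announcement, 0.16.14 is the first release without Python 3.7 support.
# Assumption: all earlier releases remain installable on Python 3.7.
FIRST_VERSION_WITHOUT_PY37 = "0.16.14"


def gx_requirement(python_version=None):
    """Return a pip requirement string suited to the given (major, minor) version.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if tuple(python_version) < (3, 8):
        # Upper bound keeps pip on releases that predate the 3.7 drop.
        return f"great-expectations<{FIRST_VERSION_WITHOUT_PY37}"
    return "great-expectations"


# Show the requirement for the interpreter running this script.
print(gx_requirement())
```

The resulting string can be passed to pip directly, for example pip install "great-expectations<0.16.14", or placed in a requirements file.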

The full announcement with additional detail is on our Slack.

  • [MAINTENANCE] bump python minimum version to 3.8 (#7916)

Sounding great: new and updated Expectations

Notes in harmony:

  • [BUGFIX] fix class name (#7734) (thanks @tb102122)

  • [BUGFIX] expect_day_count_to_be_close_to_equivalent_week_day_mean (#7782) (thanks @Hadas Manor)

Catching fire: Spark functionality

There are a ton of updates brightening up GX’s Spark functionality, especially as it relates to Fluent Datasources.

Glimmers:

  • [FEATURE] Spark parquet reader support for fluent datasources (#7754)

  • [FEATURE] Spark read directory of files as a single batch for CSV (#7777)

  • [FEATURE] Enable passing "spark_config" through to "SparkDFExecutionEngine" constructor as arguments to "add_spark()" Fluent Datasources methods. (#7810)

  • [FEATURE] Splitters work with Spark Fluent Datasources (#7832)

  • [FEATURE] Spark file reader support for fluent datasources (#7844)

  • [FEATURE] Add Spark DeltaAsset type (#7872)

  • [FEATURE] Spark directory asset types (#7873)

  • [DOCS] FDS Deployment Pattern - AWS: Spark and S3 (#7775)

  • [BUGFIX] Handle “persist” directive in “SparkDFExecutionEngine” properly. (#7830)

  • [BUGFIX] Fix sparkDF cannot compute mean for DecimalType (#7867)

  • [MAINTENANCE] Enable Spark-S3 Integration tests on Azure CI/CD (#7819)
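
As a rough idea of what the spark_config change (#7810) enables, here is a minimal sketch. It assumes that add_spark() accepts a spark_config keyword argument, as the changelog entry describes; the datasource name, the helper add_spark_datasource, and the config values are placeholders, so check the API reference for your GX version before relying on it.

```python
from typing import Any, Dict

# Placeholder Spark settings for illustration. The keys are standard Spark
# options; the values are arbitrary.
SPARK_CONFIG: Dict[str, str] = {
    "spark.app.name": "gx_spark_example",
    "spark.sql.shuffle.partitions": "8",
}


def add_spark_datasource(context: Any, name: str = "my_spark_datasource") -> Any:
    """Register a Spark Fluent Datasource, forwarding SPARK_CONFIG.

    Assumes `context` is a GX Data Context (for example, the result of
    great_expectations.get_context()) on a release that includes #7810.
    The spark_config keyword is taken from the changelog entry, not verified
    against every release.
    """
    return context.sources.add_spark(name=name, spark_config=SPARK_CONFIG)
```

Taking the context as a parameter keeps the sketch importable without GX or PySpark installed; in a real project you would pass in your own Data Context.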

May flowers: Fluent Datasource improvements

This month saw a bouquet of improvements and fixes for the Datasource of the future.

New blooms:

  • [FEATURE] Add tests for SimpleCheckpoint utilizing Fluent Datasources with Pandas, Spark, and SQLAlchemy test cases. (#7778)

  • [FEATURE] Add batch.columns() convenience method to Fluent DataAsset implementation. (#7926)

  • [FEATURE] NotImplementedErrors for all FDS methods when accessed from BDS (#8002)

  • [DOCS] FDS Deployment Pattern - Google Cloud: BigQuery and GCS (#7741)

  • [BUGFIX] Adding support for Fluent Batch Requests to context.get_validator (#7808)

  • [BUGFIX] Fix remaining FDS config substitution issues (#7917)

  • [MAINTENANCE] FDS - Datasources can rebuild their own asset data_connectors (#7826)

Sprucing things up: other new features

  • [FEATURE] using os.path.sep for Windows OS. (#7339) (Thanks @Richard O’Hara)

  • [FEATURE] Update get_context to scaffold project structure for file-backed usecases (#7693)

  • [FEATURE] Plumbing of validation_result_url from cloud response (#7809)

  • [FEATURE] DataProfilerStructuredDataAssistant Float Rule (#7842) (thanks @Michael Davis)

  • [FEATURE] Add DirectoryDeltaAsset (#7877)

  • [FEATURE] add ssm parameter support for config secrets (#7940) (thanks @Isaacwhyuenac)
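
For anyone wondering why the os.path.sep change (#7339) matters on Windows, here is a minimal standard-library sketch. The path parts and the helper join_with_sep are hypothetical and are not taken from the GX codebase; the point is only that the separator differs by platform, so hardcoding "/" can produce paths that do not match on Windows.

```python
import ntpath
import os
import posixpath


def join_with_sep(parts, sep=os.path.sep):
    """Join path parts with the given separator (defaults to the native one)."""
    return sep.join(parts)


# Hypothetical project-relative path, split into its components.
parts = ["great_expectations", "expectations", "my_suite.json"]

# ntpath and posixpath expose each platform's separator regardless of
# which OS this script runs on.
windows_path = join_with_sep(parts, ntpath.sep)
posix_path = join_with_sep(parts, posixpath.sep)
native_path = join_with_sep(parts)

print(windows_path)  # great_expectations\expectations\my_suite.json
print(posix_path)    # great_expectations/expectations/my_suite.json
```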

New chapters: documentation updates

  • [DOCS] Remove Redundant Introduction Headings (#7747)

  • [DOCS] Creating a Checkpoint from an In-Memory Dataframe (#7701)

  • [DOCS] Updating Checkpoint terms page (#7722)

  • [DOCS] Review and Revise Great Expectations Quickstart (#7727)

  • [DOCS] Add CLI Admonition (#7765)

  • [DOCS] Update docs for how_to_initialize_a_filesystem_data_context_in_python (#7831)

  • [DOCS] Technical tags in Versioned Docs reference correct version (#7935)

  • [DOCS] add in-memory add expectation suite (#7973) (thanks @Tobias Bruckert)

Pest control: other bug fixes

  • [BUGFIX] Repair handling of regular expressions partitioning for cloud file storage environments utilizing prefix directive. (#7798)

  • [BUGFIX] Azure Package Presence/Absence Tests Strengthening (#7818)

  • [BUGFIX] Fix inability to extend SimpleCheckpoint -- and several additional enhancements and clean up (#7879)

  • [BUGFIX] Delete ExpectationSuite by name in GX Cloud (#7881)

  • [BUGFIX] Return qualified name when calling TableAsset.as_selectable() (#7942) (thanks @calabozo)

  • [BUGFIX] Change GXSqlDialect.AWSATHENA to awsathena (#7950) (thanks @calabozo)

  • [BUGFIX] Cloud - Fix context.sources.update_*() POST instead of PUT calls (#7989)

Upkeep: additional maintenance

  • [MAINTENANCE] Enable flake8-bugbear rules (#7776)

  • [MAINTENANCE] CLI warnings for suite new command (#7787)

  • [MAINTENANCE] Update GXCloudIdentifier to return nullable attrs instead of empty strings (#7985)


Get the full GX changelog here.
