Why GX Cloud?

What makes GX Cloud different from Open Source, and why.

Abe Gong
April 28, 2023

The Beta launch earlier this month brought Great Expectations Cloud one step closer to being available to absolutely everyone. I’m super excited about this step forward. It’s going to make data quality a lot easier for a lot of people.

If you’re wondering how exactly GX Cloud is different from GX Open Source (or if you’re hearing about GX for the first time), this post is for you.

Why GX Cloud builds on top of GX Open Source

We’re very proud of what we’ve built with GX Open Source. It’s grown to become the world’s most popular data quality tool, with nearly 10 million monthly downloads and a Slack community with over 10,000 members.

[Figure: all-time GX PyPI download stats]

It’s a powerful and extensible library that covers a broad range of data quality issues, including missing data, source-to-destination replication, identifying outliers across multiple datasets, and more. It validates data natively on a wide variety of data backends, including Pandas, Spark, and many SQL dialects. It can be deployed in all kinds of contexts.
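To make the missing-data case concrete, here is a minimal sketch in plain Python of the kind of check an Expectation encodes. It deliberately does not use the great_expectations API: the function name, the result keys, and the sample rows are all hypothetical, chosen only to show the idea of a declarative test that reports what it found.

```python
from typing import Any, Dict, List


def expect_values_not_null(rows: List[Dict[str, Any]], column: str) -> Dict[str, Any]:
    """Check that `column` is present and not None in every row.

    Hypothetical helper for illustration; not part of great_expectations.
    """
    # Collect the positions of rows where the value is missing or None.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


# Made-up sample data: the second order has no customer attached.
orders = [
    {"order_id": 1, "customer_id": "a17"},
    {"order_id": 2, "customer_id": None},
    {"order_id": 3, "customer_id": "c02"},
]

result = expect_values_not_null(orders, "customer_id")
print(result)
```

The point of returning a structured result rather than raising an error is that the same output can feed documentation, alerting, or a UI, which is roughly the role validation results play in GX.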

GX also provides tools for auto-generated data documentation to always keep tests and docs in sync, and tools for data profiling, so that teams can scale up coverage across many data assets.

However, because it’s packaged as a Python library, GX Open Source has certain limitations. Here are three key limitations that we’re addressing in Cloud.

Doesn’t provide persistent storage.

The great_expectations Python library is just that: a Python library. That makes GX Open Source a bring-your-own-database (BYODB) affair. You can use any of the major cloud providers to store your metadata and manage Data Docs, but setting that up still requires several manual steps. Most GX users don’t want to start off their data quality with a side of rolling their own infrastructure.

No concept of a user.

Effective collaboration requires you to know who you’re collaborating with, and many data quality problems are best solved or prevented by collaboration, so this can be a pretty severe limitation, especially since GX’s official mission as a company is to revolutionize the speed and integrity of data collaboration.

Limited options for user interface.

As a developer tool, GX exposes interface options that a pure SaaS application wouldn’t have: a CLI, notebooks, and the Python APIs themselves. This ability to integrate deeply with the programmatic tools where data developers are most at home is one of the major advantages of GX. But modern UIs can unlock certain kinds of productivity and collaboration above and beyond pure code… and sometimes, you just want to click a button.

GX Cloud is how we’re delivering all these things and more.

What GX Cloud does differently

GX Cloud takes the capabilities of GX Open Source and adds hosting, storage, an interactive UI, roles and authentication, and more. That means it’s easier to deploy, easier to use, and much easier to collaborate with.

Since Cloud is a fully hosted and managed data quality solution, it’s way faster to get started. All of GX’s moving parts—DataContext configs, Data Docs, and all of its standard metadata stores—come prepackaged with sensible defaults. 

As you add Datasources and Expectations, everything is persisted to a shared, durable environment. From there, all of those pieces are instantly available anywhere you can run gx.get_context().

That could be within an orchestrator like Airflow, Prefect, or Dagster; from a Jupyter or Deepnote notebook; as part of a data application like Streamlit or Hex; or as part of your CI/CD stack.  
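As a rough sketch of what that looks like in practice, the snippet below checks for Cloud credentials before asking for a context. It assumes the credentials are supplied through the environment variables GX_CLOUD_ACCESS_TOKEN and GX_CLOUD_ORGANIZATION_ID; treat those names as an assumption and confirm them against the GX Cloud docs for your version. The helper functions missing_cloud_settings and load_context are hypothetical and exist only for this example.

```python
import os
from typing import List, Mapping

# Assumed variable names; confirm against the GX Cloud docs for your version.
REQUIRED_ENV_VARS = ("GX_CLOUD_ACCESS_TOKEN", "GX_CLOUD_ORGANIZATION_ID")


def missing_cloud_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def load_context():
    """Return a GX context, failing early if Cloud credentials are absent."""
    missing = missing_cloud_settings(os.environ)
    if missing:
        raise RuntimeError("Missing GX Cloud settings: " + ", ".join(missing))
    # Imported lazily so the credential check works even where the
    # great_expectations package is not installed.
    import great_expectations as gx

    return gx.get_context()
```

Failing early with a clear message is a small design choice that tends to pay off in orchestrators and CI jobs, where a missing secret is otherwise easy to mistake for a data problem.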

In addition to bringing everything under one roof, GX Cloud streamlines certain operations. For example, it no longer requires a separate step to build Data Docs—they’ll always be up to date as soon as anyone makes edits or saves new results. You’ll also find that important steps in configuration are abstracted away with sensible defaults.

GX Cloud is also built to streamline collaboration around data quality. With persistent, easily shareable links to Data Docs and Validation Results, trend visualization, and the ability to edit Expectations directly from a UI, people across your organization can work together to identify and solve your data quality problems. 

We see collaboration features as one of the most exciting areas for data tooling and improvements to data infrastructure today. Data quality isn’t just a technical problem—it’s a collaboration problem. It can be hard for different data stakeholders to work effectively together.

GX Cloud is designed to be a collaborative data quality platform that meets data teams where they are. As a data analyst, scientist or engineer, you can choose whether you want to work in Python or the UI, depending on your task, your environment, and/or your feelings that day. At the same time, the domain experts you work with can use the kind of UI they’re familiar with.

Start using GX Cloud!

We’re looking for data teams to join us as we build the next stage of GX Cloud. As a Beta user, you’ll gain free access to GX Cloud in exchange for your feedback about the platform. For more details about GX Cloud, click here.

To join the Beta, click here or use the button below. Our team will be in touch with next steps.
