CapitalOne DataProfiler Expectations
Last updated on Apr 24, 2024

Data Profiler is an open source solution from Capital One that uses machine learning to help companies monitor big data and detect private customer information so that it can be protected. It provides a pre-trained deep learning model to efficiently identify sensitive information, components for statistical analysis of a dataset, and an API for building data labelers. Data Profiler accepts a wide range of data formats, including CSV, Avro, Parquet, JSON, text, and pandas DataFrames. Whether the data is structured, semi-structured, or unstructured, the library can identify the schema, statistics, and entities in the data. The data labeler is versatile: models can be modified as needed, and multiple models can be run on the same dataset with just a few lines of code.
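As a rough illustration of the "few lines of code" workflow described above, the following sketch profiles a small CSV file with the `dataprofiler` Python package. It assumes the package is installed (for example via `pip install DataProfiler`) and that the sensitive-data labeler's machine learning dependencies are available when `use_labeler=True`. The helper names `write_sample_csv` and `profile_file`, and the sample rows, are hypothetical and exist only for this example.

```python
import csv


def write_sample_csv(path):
    """Write a tiny, made-up CSV file to profile (illustrative data only)."""
    rows = [
        {"name": "Alice Example", "email": "alice@example.com", "balance": "120.50"},
        {"name": "Bob Example", "email": "bob@example.com", "balance": "87.25"},
        {"name": "Cara Example", "email": "cara@example.com", "balance": "310.00"},
    ]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "email", "balance"])
        writer.writeheader()
        writer.writerows(rows)


def profile_file(path, data_type=None, use_labeler=False):
    """Load a file with Data Profiler and return a compact report dict.

    Hypothetical helper: `data_type=None` lets the library auto-detect the
    format; pass e.g. "csv" to force it. `use_labeler` toggles the
    pre-trained sensitive-data labeler.
    """
    # Imported lazily so this module loads even where dataprofiler
    # (an assumed third-party dependency) is not installed.
    import dataprofiler as dp

    data = dp.Data(str(path), data_type=data_type)

    # Disable or enable the deep learning labeler via profiler options.
    options = dp.ProfilerOptions()
    options.set({"data_labeler.is_enabled": use_labeler})

    profile = dp.Profiler(data, options=options)
    # "compact" trims the report; other formats include "pretty" and "flat".
    return profile.report(report_options={"output_format": "compact"})
```

To try it, call `write_sample_csv("sample.csv")` and then `profile_file("sample.csv", data_type="csv")`; the returned dictionary groups dataset-level results under `global_stats` and per-column results under `data_stats`. The labeler is off by default here only to keep the sketch lightweight; enabling it adds predicted entity labels to each column's entry.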

Designed by experts in the field:
Taylor Turner, Principal Machine Learning Engineer at Capital One
Since joining Capital One in 2016, Taylor has worked on projects across the model lifecycle, consulted with other departments on their ML projects, and turned new ideas into reusable tools used both inside and outside Capital One. He developed the profiler validator, null data flag, and data quantile reporting features for Data Profiler. Prior to Capital One, Taylor worked as a lead engineer for a hedge fund focused on using deep learning to glean tradable datapoints from a corpus of niche data streams of high-profile market commentators.
Jeremy Goodsitt, Lead Machine Learning Engineer at Capital One
Since joining Capital One in 2017, Jeremy has worked on machine learning optimization infrastructure, NLP model development such as the sensitive data labeler within the Data Profiler, and the engineering design behind the open source library, Data Profiler. His doctoral studies were at the University of Illinois where he worked on improving medical diagnostic techniques through computer vision and optimization techniques.
©2024 Great Expectations. All Rights Reserved.