Gretel: Building AI Data Without Scraping the Web

Gretel generates privacy-preserving synthetic datasets so companies can train AI models without exposing sensitive customer data or relying on scraped web content.

Most AI systems are trained on large volumes of scraped internet data, but Gretel takes a different route. The company builds synthetic datasets designed to mimic real-world data without exposing the sensitive information behind it.

Founded in 2020 in San Diego by Alex Watson and John Myers, Gretel grew out of a simple frustration. Teams working with sensitive data either risked privacy violations by using real customer records, or settled for weak synthetic substitutes that hurt model performance. There wasn’t a middle ground that felt production-ready.

How It Works

Gretel generates synthetic versions of structured and text-based datasets using techniques like differential privacy, GANs, and transformer models. The goal isn’t to invent random data. It’s to reproduce the statistical patterns of a real dataset while stripping away personally identifiable information. The output can then be used for training, testing, or sharing without exposing the underlying source data.
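To make the differential-privacy idea concrete, here is a minimal sketch in Python. It is not Gretel's implementation or API. It assumes a single categorical column with a known, fixed set of categories, and the names dp_synthetic_column, domain, and epsilon are hypothetical, chosen only for this illustration. The sketch counts how often each category appears, adds Laplace noise to those counts, and samples new values from the noisy distribution, so the synthetic column follows the overall pattern without copying any individual record.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_column(values, domain, epsilon, n_samples, seed=None):
    """Sample a synthetic categorical column from a noisy histogram.

    Illustrative only: `domain` must be fixed in advance rather than
    derived from the data, and adding or removing one record changes one
    count by 1, so Laplace noise with scale 1/epsilon is applied per count.
    """
    rng = random.Random(seed)
    counts = Counter(values)
    # Noisy counts, clipped at zero so they can be used as sampling weights.
    noisy = [
        max(0.0, counts.get(category, 0) + laplace_noise(1.0 / epsilon, rng))
        for category in domain
    ]
    if sum(noisy) == 0:
        # Fall back to a uniform distribution if every noisy count was clipped.
        noisy = [1.0] * len(domain)
    return rng.choices(domain, weights=noisy, k=n_samples)


# Hypothetical "real" column: account types for 1,000 customers.
domain = ["checking", "savings", "brokerage"]
real = ["checking"] * 700 + ["savings"] * 250 + ["brokerage"] * 50

synthetic = dp_synthetic_column(real, domain, epsilon=1.0, n_samples=1000, seed=0)
print(Counter(synthetic))
```

A smaller epsilon adds more noise, which strengthens the privacy guarantee but makes the synthetic distribution drift further from the real one. Production systems model many columns and their correlations together, which this single-column sketch does not attempt.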

This matters most in regulated industries. Finance, healthcare, insurance, and large enterprises often sit on valuable data they can’t freely use because of GDPR, CCPA, HIPAA, or internal compliance constraints. Synthetic generation provides a way to experiment without moving raw records around.

Why It’s Gaining Attention

As debates around AI training data intensify, scraping the public web is no longer a neutral decision. Legal uncertainty and copyright disputes have made data sourcing harder to ignore. Gretel’s model sidesteps that problem by focusing on enterprise-owned data and privacy-preserving generation.

The company has raised over $135M since launch and works with enterprises in regulated sectors.
