This is the question we get more than any other. So here's the rundown.
Watt's Signal Graph holds 145,000+ signals on people and 55,000+ signals on businesses, covering 250M+ US adults and 60M+ US companies. Every one of those signals comes from somewhere. We don't generate any of it ourselves. Watt is signal infrastructure, not data collection.
A handful of vendors with hundreds of sources underneath
Right now, Watt acquires data from two primary partners.
- RevenueBase provides our B2B data. This is the same underlying data layer that powers ZoomInfo and most of the legacy B2B providers. Firmographics, technographics, hiring signals, and so on: the B2B stack.
- DataMoon provides our consumer data. Identity signals, purchase intent, life events, interests, lifestyle, household composition, and so on: the B2C stack.
These two partners each represent hundreds of upstream data sources of their own. So while we only have two contracts, the data flowing into the Signal Graph is sourced from thousands of underlying providers.
That number will grow. As Watt scales, we'll add more direct vendor relationships and likely move further upstream toward source-of-truth providers wherever possible. Fewer hands touching the data means higher fidelity, and we'll come back to why that matters in a minute.
All of our data is on the open market
Here's the part most data companies hide.
Every signal currently in the Signal Graph is available on the open market to anyone who knows where to look. We don't have exclusive licensing. We don't have proprietary collection. We're not the only buyers of any of the underlying datasets. If you wanted to assemble the same raw signals we work with, you could.
Most don't, because the hard part isn't the sourcing. It's consuming enormous amounts of raw signal data.
The moat is the substrate, not the sourcing
Anyone can buy the same raw signals we buy. The hard part (the part that took us years and that nobody else has built commercially) is structuring those signals so an AI can reason across all of them at once, without being handed a schema, and without anything getting pre-compressed upstream into someone's idea of what mattered.
Every other data product in the market was built for a human consumer. A warehouse assumes an analyst with a query in mind. An enrichment API assumes a CRM record being filled in one row at a time. Even the "AI-ready" data vendors are wrapping pre-aggregated fields in an MCP server and calling it done.
That's not the same shape. The moment a customer's AI agent needs to compose across signals nobody pre-imagined — and that's the moment a customer actually pays for an AI product — those tools fall over.
The Signal Graph is built on the opposite premise: raw signals at the resolution events actually happened, held with enough density and freshness that nothing the model might want to compose has been thrown away, exposed through composable operations the model assembles in real time against the live state of the substrate. That's what lets a marketer at one of our customers type a question nobody on our team anticipated and get back a 1.2M-row audience in a single session, no data team in the loop.
Why fewer hands on the data matters
The chain of custody behind commercial data is more layered than most buyers realize. There are companies that generate raw behavioral data at the source — retailers tracking purchases, shipping companies tracking deliveries, ad networks tracking impressions, data co-ops aggregating member contributions. Above them sit derivative companies that build signals on top of that raw data and sell those derivatives downstream. Above them sit middlemen who combine signals from multiple derivatives to create new packages. And above them sit aggregators who buy from the middlemen.
By the time data reaches a consumer like Watt, it may have passed through three or four layers of this ecosystem.
This is also a substrate question. Every hand the data passes through is another opportunity for fidelity to degrade and for upstream aggregation to compress raw signal into someone's pre-built summary. Each compression step throws away information the AI might have wanted to reason over later. The closer we can get to the original signal-generating event, the more raw the signal is when it enters the graph, and the more there is for an agent to compose against on the way out.
That's why our preference is to go as far upstream as possible when evaluating a vendor. It's not just a quality argument. It's a structural argument about what the substrate can support.
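To make the compression point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the event records, the field names, and the `summarize` and `bought_soon_after_move` helpers are invented for illustration and are not how the Signal Graph stores or queries data. It only shows that a question composed across two raw event types cannot be answered from a pre-built summary that dropped the dates.

```python
from datetime import date

# Hypothetical raw events, kept at the resolution they happened.
raw_events = [
    {"person": "p1", "kind": "move", "on": date(2024, 2, 27)},
    {"person": "p1", "kind": "purchase", "category": "furniture", "on": date(2024, 3, 2)},
    {"person": "p2", "kind": "move", "on": date(2024, 1, 5)},
    {"person": "p2", "kind": "purchase", "category": "furniture", "on": date(2024, 3, 20)},
]

def summarize(events):
    """An upstream-style summary: buyers per month. Exact dates and moves are discarded."""
    out = {}
    for e in events:
        if e["kind"] == "purchase":
            out.setdefault((e["on"].year, e["on"].month), set()).add(e["person"])
    return out

def bought_soon_after_move(events, within_days=14):
    """A composed question that needs both raw event types and their dates."""
    moves = {e["person"]: e["on"] for e in events if e["kind"] == "move"}
    return sorted(
        e["person"]
        for e in events
        if e["kind"] == "purchase"
        and e["person"] in moves
        and 0 <= (e["on"] - moves[e["person"]]).days <= within_days
    )

print(summarize(raw_events))               # both people look identical in the summary
print(bought_soon_after_move(raw_events))  # only the raw events can tell them apart
```

In the summary, p1 and p2 are both just "March furniture buyers". Only the raw events show that p1 bought within days of moving and p2 did not.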
How a signal gets into the Signal Graph
When we onboard a new vendor today, the process takes several months.
We start with qualification. Does this vendor have signals that are net-new to the graph? Do they improve signals we already have? What's the lineage? How fresh is the data? What's the coverage? How accurate is it?
If they pass that bar, we move to ingestion. Building a pipeline to feed the vendor's data into the base graph is technically non-trivial because data vendors are notoriously inconsistent. Formats change. Delivery schedules slip. Signals arrive on uneven cadences.
There's no existing tooling designed for this because we built a novel database. Once data is ingested, it flows into the base graph asynchronously, on whatever cadence the vendor delivers. The Signal Graph itself recalculates every day, incorporating all data that has arrived since the last cycle. Our goal is to get vendor onboarding under one day and eventually make it fully self-service. We're not there yet.
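The daily cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual pipeline: `Delivery`, `Graph`, and `recalculate` are hypothetical names, and the "last write wins" merge is a placeholder for whatever the real recalculation does. The point is only the shape: deliveries land whenever vendors send them, and each cycle picks up everything that arrived since the previous one.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Delivery:
    """One vendor drop (hypothetical shape): entity_id -> {signal_name: value}."""
    vendor: str
    arrived_at: datetime
    signals: dict

@dataclass
class Graph:
    last_cycle: datetime
    state: dict = field(default_factory=dict)

    def recalculate(self, inbox, now):
        """Fold in every delivery that arrived after the last cycle, up to `now`."""
        pending = [d for d in inbox if self.last_cycle < d.arrived_at <= now]
        # Apply in arrival order so later deliveries overwrite earlier ones
        # (an assumed merge rule, chosen only to keep the sketch short).
        for d in sorted(pending, key=lambda d: d.arrived_at):
            for entity_id, signals in d.signals.items():
                self.state.setdefault(entity_id, {}).update(signals)
        self.last_cycle = now
        return len(pending)
```

Because the cycle is keyed on arrival time rather than on a vendor schedule, a late or uneven delivery simply shows up in the next day's run.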
What we don't do
We don't collect any data ourselves. There's no first-party tracking, no SDK, no cookies on consumer browsers, no observation of end users. All data is acquired externally from licensed partners.
We don't have any exclusive relationships today. That will change as we grow. There's a future state where Watt will license data that isn't available anywhere else simply by being the first buyer. But that's not the current state, and we're not going to pretend otherwise.
We don't operate outside the US. All data in the graph is US-only. We've deliberately turned down deals to stay out of GDPR jurisdictions while we focus on building the US market.
One thing worth saying out loud
Because the Signal Graph unifies opt-in and opt-out flags across multiple data sources into a single layer, the privacy posture of the graph is arguably stronger than any individual source feeding into it. If five different vendors hold data on the same email address and only some of them have collected an opt-out flag, that flag is surfaced in the graph for the address as a whole, including against data from the vendors that never collected the flag themselves.
That's a quiet benefit of the architecture that we don't talk about often enough. It's also a useful reminder that aggregated infrastructure can be designed to raise the privacy floor, not lower it.
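Here is a minimal sketch of that unification idea. It rests on assumptions the post doesn't spell out: that records are keyed on a normalized email address, and that an opt-out from any one vendor wins over every other vendor's silence. The `unify_opt_outs` function and the record shape are hypothetical, for illustration only.

```python
from collections import defaultdict

def unify_opt_outs(vendor_records):
    """Merge per-vendor flags into one flag per email.

    vendor_records: iterable of (vendor, email, opted_out) tuples.
    Assumed rule: if any vendor reports an opt-out, the unified flag is True.
    """
    unified = defaultdict(bool)
    for _vendor, email, opted_out in vendor_records:
        key = email.strip().lower()  # assumed normalization
        unified[key] = unified[key] or opted_out
    return dict(unified)

records = [
    ("vendor_1", "Ana@example.com", False),
    ("vendor_2", "ana@example.com", True),   # only this vendor saw the opt-out
    ("vendor_3", "ana@example.com", False),
    ("vendor_1", "bo@example.com", False),
]
print(unify_opt_outs(records))
```

Under that rule, the one opt-out vendor_2 collected applies to the address everywhere, even though vendor_1 and vendor_3 never recorded it.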
More questions
We've published a longer FAQ on data sourcing, privacy flags, and customer responsibility in our Docs homebase. If there's something we haven't answered, we want to hear it.
The data industry has been selling broken signals through opaque chains of custody for years. We're going to do this differently.