"What is a Signal" defined the atomic unit: a single fact about a person or company at a specific moment, kept raw and composable, before someone upstream decided which behaviors counted. "Signal Engineering" described the operating model: you describe the outcome you want, and an agent composes the data work against a substrate, in real time.
Both pieces point at something they don't name. The first invokes "the schema that would have been built from the signals if anyone had known in advance which questions you'd ask." The second invokes "the substrate" the engineer composes against. They're both describing what we call the Signal Graph.
The Signal Graph is the substrate that holds signals together so an agent can use them at scale.
The bottleneck is shape, not speed
An agent can't reason over petabytes of raw vendor data. Not because it's too slow, but because there's nothing to grip. Raw vendor data is values: an exact income of $63,412, a raw intent counter that fired fourteen times last month, a precise coordinate, a free-text job title. Values are infinite, messy, and incomparable. You can't stack them, and neither can an agent.
So the Signal Graph answers one specific question: what shape does a signal need so an agent can compose against it in real time?
The answer, in a line: a signal is a boolean clause, and the graph stores, for every entity, whether that clause holds.
That definition raises two questions: what shape lets an agent traverse it, and how raw data becomes that shape. The rest of this post takes them in order, then shows what they imply together.
The shape an agent can traverse
A signal answers one atomic thing about an entity. Income in [$50K–$75K]: yes. In-market for a mortgage: yes. One term, one yes-or-no. That's a clause.
A composition is an arbitrary boolean combination of signals: signal-1 OR (signal-2 AND NOT signal-3). The signal is the atom; the composition is the molecule you build from it. A composition can take several forms: one produces an audience, another traverses relationships between entities, another finds the signals a set of entities holds in common.
This is the whole design, and it has three consequences.
- The signals are the asset; a composition is a recipe over them. A composition is just signal references with boolean operators between them; the weight sits in the signals it points at. They're finite, reusable, and shared: the same "in-market for a mortgage" signal is a clause in a thousand different compositions. The graph holds about 162,000 signals, and a library that size spans more audiences than anyone could ever enumerate. This is the structural form of the line from the first piece, "five signals stacked with the right boolean composition is a precise audience": the signals are the clauses, the stacking is the composition, and the graph only has to hold the clauses.
- Composition is fast, because the answers are already on the shelf. For every signal, the graph precomputes the exact set of entities who answer yes. Building an audience isn't recomputing anything; it's boolean composition over answers that already exist. Add a clause, intersect two answer-sets. That's the entire cost of composing, whether the audience is a thousand people or a hundred million. The bulk of the end-to-end cost is in writing and downloading the result files.
- The graph traverses, not just filters. A single edge, who employs whom, lets a composition cross from people to their companies and back. "Engineers at Series B startups" isn't a hand-written join across two tables; it's a short walk across the graph, expressed as a composition of person signals and business signals.
This is why it isn't a database. A database exposes tables and columns; you have to know the schema and write the query. The Signal Graph exposes named signals and the language to combine them. The agent navigates by intent, naming what it wants and combining the clauses, rather than by structure.
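A minimal sketch of that idea, assuming each signal's answers are held as a plain Python set of entity IDs. The signal names, the IDs, and the `employer_of` mapping are all hypothetical, invented for illustration; this is not the graph's actual storage or query language.

```python
# Hypothetical toy data: each signal name maps to the precomputed set of
# entity IDs that answer "yes". IDs below 100 are people, 100+ are companies.
answers = {
    "income_50k_75k": {1, 2, 3, 4},
    "in_market_mortgage": {2, 3, 5},
    "recent_mover": {3, 6},
    "job_engineer": {1, 3, 5},
    "company_series_b": {100, 101},
}

# Hypothetical "who employs whom" edge: person ID -> company ID.
employer_of = {1: 100, 2: 102, 3: 101, 5: 103}

# signal-1 OR (signal-2 AND NOT signal-3), as plain set algebra.
audience = answers["income_50k_75k"] | (
    answers["in_market_mortgage"] - answers["recent_mover"]
)


def people_employed_by(people, companies, edge):
    """Keep the people whose employer is in the given company set."""
    return {p for p in people if edge.get(p) in companies}


# "Engineers at Series B startups": a person signal, one edge, a business signal.
engineers_at_series_b = people_employed_by(
    answers["job_engineer"], answers["company_series_b"], employer_of
)

print(sorted(audience))               # [1, 2, 3, 4, 5]
print(sorted(engineers_at_series_b))  # [1, 3]
```

Nothing here touches a raw value. Both results come from combining answer sets that already exist, which is the property the three consequences above rest on.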
How raw data becomes a signal
If a signal is a boolean answer, then building the graph is the work of turning values into answers. That collapse, from raw values to boolean clauses, is semantic compression rather than a reduction in bytes, and the semantic kind is the one that matters.
It happens in two steps, and both are acts of judgment.
- Many records become one entity. The same person shows up across a dozen vendors under a dozen different IDs. Resolution doesn't compress that redundancy; it resolves it away: one person, represented once. Petabytes of overlapping raw records become a few hundred million canonical people and businesses.
- Raw values become named signals. A thousand vendor columns, raw counters, and free-text fields collapse into a compact set of signals an agent can reason about. The many raw events behind a single intent topic, for instance, become one intent signal. Low-information data that identifies nobody is processed out during ingestion, before it ever reaches the graph.
What comes out is far smaller: roughly 353 million entities, described by about 162,000 signals. And it runs on a heartbeat: the signal library and its answers rebuild daily, and identity resolves continuously.
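The two collapses can be sketched in a few lines, under loud assumptions: the records, the field names, the match-on-email resolution rule, the half-open income band, and the "any event counts" intent threshold are all hypothetical stand-ins. Real identity resolution is far more involved than a shared key.

```python
from collections import defaultdict

# Hypothetical raw vendor records: the same person under different vendor IDs.
raw_records = [
    {"vendor": "A", "vendor_id": "a-17", "email": "pat@example.com", "income": 63412},
    {"vendor": "B", "vendor_id": "b-92", "email": "pat@example.com", "intent_mortgage_events": 14},
    {"vendor": "A", "vendor_id": "a-40", "email": "lee@example.com", "income": 128000},
]

# Collapse 1: many records become one entity.
# Toy rule (an assumption): records sharing an email are the same person.
entities = defaultdict(dict)
for rec in raw_records:
    facts = {k: v for k, v in rec.items() if k not in ("vendor", "vendor_id", "email")}
    entities[rec["email"]].update(facts)

# Collapse 2: raw values become named boolean signals.
# Each definition is a predicate over an entity's merged values.
signal_defs = {
    "income_50k_75k": lambda e: 50_000 <= e.get("income", -1) < 75_000,
    "in_market_mortgage": lambda e: e.get("intent_mortgage_events", 0) > 0,
}

# Materialize one answer set per signal. Only the yes/no membership is kept;
# the raw figures (63412, 14) are not carried into `answers`.
answers = {
    name: {eid for eid, attrs in entities.items() if pred(attrs)}
    for name, pred in signal_defs.items()
}

print(len(raw_records), "records ->", len(entities), "entities")
print(answers)
```

The judgment lives in the two places a human had to choose: the resolution rule and the predicate for each signal.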
Answerability, not resolution
The two halves meet on one fact: the graph does not keep the raw values. A signal is its boolean answer; there's no $63,412 sitting underneath "income in [$50K–$75K]: yes," only the answer itself. So the graph doesn't sell resolution. It sells answerability: its power is the set of clauses it can answer, not a pile of raw values you can re-interrogate later.
That sounds like exactly the one-way compression the first piece warned against: someone chose a shape and threw the rest away. The difference is where the optionality lives. The lossless facts aren't destroyed; they persist upstream in the raw store, at the full grain of the original events. The graph is a materialized projection of that store: the clauses worth holding, precomputed for every entity.
So the contrast with a vendor's frozen segment is exact. A vendor chose its brackets once, discarded the raw, and walked away; you're stuck. Our signal library is ours, and it grows. The optionality isn't "every value is retained for ad-hoc slicing." It's "we can mint a new signal and it's live tomorrow."
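One way to picture "mint a new signal and it's live tomorrow" is a library of definitions that is re-projected from the raw store on each rebuild. The `SignalLibrary` class, its `mint` and `rebuild` methods, and the toy `raw_store` below are hypothetical names for illustration, not the actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

Predicate = Callable[[dict], bool]


@dataclass
class SignalLibrary:
    """Hypothetical library: signal definitions plus their materialized answers."""
    definitions: Dict[str, Predicate] = field(default_factory=dict)
    answers: Dict[str, Set[str]] = field(default_factory=dict)

    def mint(self, name: str, predicate: Predicate) -> None:
        """Register a new clause. It has no answers until the next rebuild."""
        self.definitions[name] = predicate

    def rebuild(self, raw_store: Dict[str, dict]) -> None:
        """Re-project the raw store into one answer set per signal."""
        self.answers = {
            name: {eid for eid, facts in raw_store.items() if pred(facts)}
            for name, pred in self.definitions.items()
        }


# Hypothetical upstream raw store, kept at full grain outside the graph.
raw_store = {
    "p1": {"income": 63_412},
    "p2": {"income": 58_000},
    "p3": {"income": 91_000},
}

lib = SignalLibrary()
lib.mint("income_50k_75k", lambda f: 50_000 <= f["income"] < 75_000)
lib.rebuild(raw_store)  # "today's" build

# Mint a finer signal; it is not answerable until the next rebuild.
lib.mint("income_60k_65k", lambda f: 60_000 <= f["income"] < 65_000)
before = lib.answers.get("income_60k_65k")  # None

lib.rebuild(raw_store)  # "tomorrow's" build
after = lib.answers["income_60k_65k"]       # {"p1"}
```

The raw figures are read only inside `rebuild`; what the library serves afterwards is the answer sets alone.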
Granularity is a dial, tuned to demand
Keeping only the answer doesn't lock the graph at one resolution. We don't retain the raw figure behind a signal, but we control completely how finely the signals themselves are cut. Because the signal library is ours, that granularity is a product decision rather than a property of the data: we sharpen it where customers need to cut more finely.
And we deepen it by decomposition, which is lossless upward. When a band is too coarse, it splits: "Income in [$100K–$200K]" becomes ten finer signals ([$100K–$110K], [$110K–$120K], and so on), and the coarse signal is simply the OR of its children. (Illustrative, not a commitment.) Nothing broad is lost; something sharp is gained. The library grows finer where demand shows up, and composition guarantees every coarser view still composes from the pieces.
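The "coarse signal is the OR of its children" property can be checked directly. This is a sketch under stated assumptions: half-open bands, a hypothetical `band_signals` helper, and a toy `incomes` table that exists only to build the answer sets for the example.

```python
def band_signals(low, high, step):
    """Split [low, high) into finer half-open bands, returned as (lo, hi) pairs."""
    return [(lo, min(lo + step, high)) for lo in range(low, high, step)]


# Hypothetical upstream values, used here only to materialize answer sets.
incomes = {1: 104_500, 2: 118_000, 3: 163_000, 4: 199_999, 5: 87_000}


def answer_set(band):
    """Entities whose income falls inside the half-open band."""
    lo, hi = band
    return {eid for eid, income in incomes.items() if lo <= income < hi}


# Decompose "Income in [$100K-$200K]" into ten finer signals.
children = band_signals(100_000, 200_000, 10_000)
child_answers = {band: answer_set(band) for band in children}

# The coarse signal is the OR (set union) of its children.
coarse_from_children = set().union(*child_answers.values())

assert len(children) == 10
assert coarse_from_children == answer_set((100_000, 200_000))
```

Because the children partition the parent band, the union reproduces the coarse answer set exactly, so any composition written against the coarse signal keeps working after the split.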
Where it stands today
Today the signal abstraction lives close to the surface, assembled at serving time rather than stored natively in the graph. The analytics that would make each signal richer aren't all wired up yet.
The work moving now pushes the abstraction down into the substrate itself: signals as first-class, precomputed objects; freshness tracked per signal; the whole library recomputed on a daily cadence. The trajectory is a graph that can answer questions it was never explicitly built for.
The moat is the substrate
Raw values are a vendor commodity, the Plumber's "what data can I get?" Anyone can buy the feed. The asset is the other thing: a finite, ever-sharpening library of signals, and the precomputed answer to each one for every entity. That's the Signal Engineer's "what can I find?", made askable at scale.
A finite library of signals. An infinite space of questions you never had to anticipate. That's the substrate, and the substrate is the moat.