Quick Answer

On-chain data is what the blockchain records directly: blocks, transactions, balances, events, and logs. Off-chain data is everything outside blocks: exchange trades, order books, derivatives, funding rates, liquidations, ETF flows, and developer activity. Real analysis needs both. On-chain explains flows and holder behaviour. Off-chain explains price discovery, depth, and leverage. The key thing to understand is that most popular "on-chain indicators" are not raw fields from the chain. They are constructed from raw data plus rules and heuristics. If you know where a metric comes from and how it is built, you can judge whether it is telling you something solid or something noisy.

Key points
On-chain data is the raw facts recorded by the network. Off-chain data comes from exchanges, funds, custodians, and market data vendors.
Most popular on-chain charts rely on heuristics and labels: entity clustering, cohort bucketing, exchange tagging. These are informed guesses, not perfect truth.
Provider methods change without clear version notes. A chart that looks different today from last month may reflect a methodology update, not a change in behaviour.
ETF flows are off-chain data but directly influence on-chain supply. This intersection is where some of the most useful current cycle signals live.
Trend is usually more reliable than absolute levels when comparing across providers. Different label sets produce different numbers; the direction is usually consistent.
Treat charts as claims that must be validated, not as screenshots to trust. Know the method, track the version, cross-check high-stakes reads.

The Two-Layer View

The most useful frame for understanding crypto data is the two-layer view. Every major market event has two distinct data layers sitting underneath it. They are not interchangeable and neither is complete without the other.

The two data layers
The holder and flow layer (on-chain): Who holds the asset, how long they have held it, whether they are moving it, whether supply is accumulating or distributing, and what the realised cost basis of the market looks like. This is what on-chain data explains.
The price and leverage layer (off-chain): Where price is actually setting, how much depth supports it, how much leverage is open, whether derivatives are amplifying moves, and where institutional flows are positioned. This is what off-chain data explains.
Why neither layer is enough alone: A strong on-chain accumulation signal is less useful without knowing whether derivatives leverage has built up on top of spot price. A funding rate extreme is less useful without knowing whether on-chain holder behaviour supports or contradicts the leveraged positioning. The reads that hold up through cycles combine both layers.

This two-layer frame also tells you where to look for specific questions. If you want to know about holder conviction and long-term behaviour, start with on-chain. If you want to know about short-term price mechanics, leverage, and institutional positioning, start with off-chain. Most serious analysts maintain visibility into both.


What Counts As On-Chain Data

On-chain data is the raw record kept by the network: blocks, transactions, balances, events, contract logs, and traces. Mempool entries are often grouped with it, although pending transactions sit in each node's memory rather than in blocks. You can access all of this by running a node yourself or by using a trusted indexer that parses chain state.

Sources And How They Work

1. Full node (highest integrity)

You run your own node and query chain state directly. Highest data integrity because you verify every block yourself. Slower to build features on top of, requires storage and maintenance.

2. Hosted nodes

Providers like Alchemy, Infura, or QuickNode give you API access to a fully synced node. Faster to start, but you trust the provider's uptime, filtering, and version management.

3. On-chain analytics providers

Glassnode, CryptoQuant, Nansen, Arkham, and others parse raw chain data and apply labels, clustering, and entity identification. Faster and more feature-rich than running your own node, but the value depends on the quality of their heuristics. The labels are the product, not just the data.
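For the node-based sources above, a raw query is just a JSON-RPC call. The sketch below builds an Ethereum `eth_getBalance` request and decodes the hex-encoded result; the sample response is hypothetical and stands in for what a self-hosted or hosted node would return over HTTP. The helper names are illustrative assumptions, not any provider's SDK.

```python
import json

WEI_PER_ETHER = 10**18


def balance_request(address: str, block: str = "latest", request_id: int = 1) -> str:
    """Build the JSON-RPC body for an eth_getBalance call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, block],
        "id": request_id,
    })


def parse_balance(response_body: str) -> float:
    """Convert the hex-encoded wei result of eth_getBalance into ether."""
    payload = json.loads(response_body)
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return int(payload["result"], 16) / WEI_PER_ETHER


# Hypothetical response for illustration. A real one comes from POSTing
# balance_request(...) to your node's HTTP endpoint.
sample_response = '{"jsonrpc": "2.0", "id": 1, "result": "0xde0b6b3a7640000"}'
balance_eth = parse_balance(sample_response)
print(balance_eth)
```

Note that the response carries a balance and nothing else. Whether the address belongs to an exchange, a miner, or a long-term holder is exactly the layer that analytics providers add on top.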

What On-Chain Data Is Genuinely Good At

On-chain strengths
Transparent, reconstructable, and tamper-evident. The chain state is auditable by anyone with a node.
Excellent for supply mechanics, large-holder flows, long-term holder behaviour, and settlement-level activity.
Difficulty, block intervals, and miner rewards are on-chain and unambiguous. Hashrate is estimated from them rather than recorded directly, but the raw inputs need no labelling.
Smart contract state, DeFi protocol positions, and staking balances are directly readable from chain.
The heuristic problem: As soon as you move from raw data to labelled entities ("exchange wallets", "miner wallets", "long-term holders"), you are relying on the provider's clustering heuristics. Those heuristics can be wrong, they can change, and they differ across providers. The chain tells you what happened. It does not tell you who did it. Entity attribution is a model, not a fact.

What Counts As Off-Chain Data

Off-chain data is anything not baked into blocks. Exchange trades, order books, derivatives, funding rates, liquidations, ETF flows, developer activity, social signals, web traffic, and survey data.

Sources And How They Work

1. Exchange APIs

Trades, order books, derivatives, open interest, funding rates, and liquidation events from centralised exchanges. Coverage depends on which exchanges an analyst tracks. Venues not covered introduce gaps.

2. Market data vendors

Kaiko, The Block, Coin Metrics (for market data), and others aggregate many exchanges into one feed. The quality of aggregation (how they handle duplicate trades, volume normalisation, and venue coverage) varies and matters significantly for any volume-based analysis.

3. ETF and custodian reports

Shares created or redeemed, AUM, and holdings disclosed by ETF issuers and custodians. Published on a reporting lag, not real-time, but important for understanding institutional positioning and flows. These are off-chain reports about assets that eventually move on-chain.

4. Developer and social data

GitHub commit data, developer activity counts, and social engagement metrics from platforms like Santiment or LunarCrush. Useful context but high noise-to-signal ratio. Treat as supporting evidence, not primary signals.

What Off-Chain Data Is Genuinely Good At

Off-chain strengths
Price discovery, order book depth, and liquidity structure that the chain cannot show.
Derivatives positioning, funding rate extremes, and liquidation cascades that reveal how leveraged the market is.
Near-real-time signals for short-term price mechanics, where on-chain data lags.
Institutional flow signals through ETF data, custody reports, and regulated venue activity.
The coverage problem: Off-chain data is only as good as the venues it covers. An exchange that is not in a vendor's feed does not exist in their charts. Wash trading and bot activity inflate volumes at some venues. APIs reshape historical data when venues change their methodology. Always know which venues are included in any off-chain chart before drawing conclusions.
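The coverage problem is easy to see with a toy volume-weighted average price (VWAP). The sketch below uses made-up trades on three hypothetical venues and shows how the aggregate price and volume shift depending on which venues are in the feed. It is an illustration only, not any vendor's aggregation method.

```python
def vwap(trades, venues):
    """Return (vwap, total_volume) over trades from the included venues only.

    trades: iterable of (venue, price, quantity) tuples.
    venues: set of venue names the feed covers.
    """
    included = [(price, qty) for venue, price, qty in trades if venue in venues]
    volume = sum(qty for _, qty in included)
    if volume == 0:
        raise ValueError("no trades from the included venues")
    return sum(price * qty for price, qty in included) / volume, volume


# Hypothetical trades: (venue, price, quantity)
trades = [
    ("venue_a", 100.0, 2.0),
    ("venue_b", 102.0, 1.0),
    ("venue_c", 110.0, 7.0),
]

full_price, full_volume = vwap(trades, {"venue_a", "venue_b", "venue_c"})
partial_price, partial_volume = vwap(trades, {"venue_a", "venue_b"})
print(full_price, full_volume)
print(partial_price, partial_volume)
```

Dropping one venue changes both the reported volume and the reported price, which is why the venue list belongs next to any off-chain chart.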

Where On-Chain And Off-Chain Intersect

The most analytically useful territory is where the two data types intersect. ETF flows are the clearest current example.

ETF inflows and outflows are off-chain data. The ETF issuer reports how many shares were created or redeemed. This is a disclosure, not a chain event. But the settlement of those flows (the bitcoin actually moving to or from the custodian) is visible on-chain. An inflow creates demand, and eventually the custodian receives BTC on-chain. An outflow reverses that.

Why this matters for cycle reading: When ETF flow data (off-chain) diverges from on-chain exchange reserve data, the gap is analytically meaningful. If ETF inflows are strong but on-chain exchange reserves are not falling, the demand may not be translating into real spot buying. If both move together, the signal is more robust. Using only one layer misses the cross-check.
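A minimal sketch of that cross-check, assuming you already have daily ETF net flows and daily changes in exchange-tagged reserves, both expressed in BTC. The figures are invented, the labels are hypothetical, and reporting lag is ignored, so real series would need lag alignment before a comparison like this means anything.

```python
def classify_day(etf_flow_btc: float, reserve_change_btc: float) -> str:
    """Label one day by whether exchange reserves moved opposite to ETF flow.

    ETF inflow with falling reserves, or ETF outflow with rising reserves,
    counts as 'confirms'. Same-sign moves count as 'diverges'.
    """
    if etf_flow_btc == 0 or reserve_change_btc == 0:
        return "flat"
    return "confirms" if (etf_flow_btc > 0) != (reserve_change_btc > 0) else "diverges"


# Hypothetical daily series, in BTC
etf_net_flow = [500.0, 300.0, -200.0, 400.0]
reserve_change = [-450.0, 100.0, 150.0, 0.0]

labels = [classify_day(f, r) for f, r in zip(etf_net_flow, reserve_change)]
print(labels)
```

A run of "diverges" days is a prompt to look one level deeper, not a conclusion in itself.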

Stablecoin dominance is another example of this intersection. The supply of stablecoins is an on-chain figure. But where those stablecoins are sitting, whether on exchanges ready to deploy or in cold storage as dry powder, requires combining on-chain supply data with exchange wallet labels and venue-reported balances. Neither data type alone answers the question.


The live application of this two-layer framework, how on-chain holder data and off-chain derivatives positioning are aligning right now, and what that means for the current cycle phase, is covered in the weekly member update.


How Popular On-Chain Indicators Are Actually Built

Most "on-chain indicators" are not a single raw field you read from the chain. They are constructed from raw chain data plus a set of rules, definitions, and heuristics. Understanding the construction is the difference between using a metric correctly and being misled by it.

Supply And Age Metrics

These bucket UTXO or account balances by the date they last moved. The result is a distribution showing how long different portions of supply have been dormant.

Supply and age metric examples
Realised cap: Each coin valued at the price when it last moved, rather than current market price. A lower-noise measure of aggregate cost basis.
HODL waves: Supply bucketed by age bands (1 week, 1 month, 3 months, etc.) to show whether old or new hands dominate the current distribution.
Accumulation score composites: Provider-specific constructions combining balance changes, exchange flows, and age bands to score accumulation behaviour. These are model-heavy: understand the methodology before citing them.
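To make the construction concrete, here is a minimal sketch of realised cap and HODL waves over a toy set of coins. The `Coin` record, the age bands, and the prices are assumptions for illustration; real providers work from the full UTXO set or account history and may use different band edges.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Coin:
    amount: float         # units of the asset in this output
    last_moved: date      # date the output was created (last on-chain move)
    price_at_move: float  # USD price on that date


# (upper age limit in days, label); anything older falls into ">1y"
AGE_BANDS = [(7, "<1w"), (30, "1w-1m"), (90, "1m-3m"), (365, "3m-1y")]


def realised_cap(coins):
    """Value each coin at the price when it last moved."""
    return sum(c.amount * c.price_at_move for c in coins)


def hodl_waves(coins, as_of):
    """Share of total supply in each age band as of a given date."""
    total = sum(c.amount for c in coins)
    waves = {label: 0.0 for _, label in AGE_BANDS}
    waves[">1y"] = 0.0
    for c in coins:
        age_days = (as_of - c.last_moved).days
        label = next((lab for limit, lab in AGE_BANDS if age_days < limit), ">1y")
        waves[label] += c.amount / total
    return waves


# Hypothetical supply of four coins
coins = [
    Coin(2.0, date(2021, 1, 1), 30_000.0),
    Coin(1.0, date(2024, 5, 20), 70_000.0),
    Coin(1.0, date(2024, 3, 1), 60_000.0),
]
rcap = realised_cap(coins)
waves = hodl_waves(coins, as_of=date(2024, 6, 1))
print(rcap, waves)
```

Every input here is a raw chain fact except the price series, which comes from off-chain market data. Even the simplest age metric already mixes the two layers.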

Profit And Loss Metrics

These compare the realised price of a coin (what it cost when last moved) to current market price to estimate whether the market is in profit or loss in aggregate.

Profit and loss metric examples
MVRV (Market Value to Realised Value): Current market cap divided by realised cap. When MVRV is high, the average holder is sitting on significant unrealised profit, historically associated with distribution and cycle peaks. When it is low, the current price is close to or below the average holder's cost basis, historically associated with capitulation zones.
SOPR (Spent Output Profit Ratio): The ratio of the value of coins at the time they were spent to their value when they were last acquired, that is, when the spent outputs were created. The chain records moves, not sales, so "sold" and "bought" are inferred. SOPR above 1 means coins are moving at a profit on average. SOPR below 1 means coins are moving at a loss. Used to identify whether holders are capitulating or realising gains.
NUPL (Net Unrealised Profit/Loss): The aggregate unrealised profit or loss across all holders as a percentage of market cap. A composite measure of where the market stands relative to its cost basis.
Important caveat: These metrics depend on accurate entity labelling. If a large exchange wallet moves coins internally for a wallet sweep or custody migration, that move appears in the data as a supply movement, potentially misclassifying dormant supply as active. Provider label quality determines how much this noise pollutes the signal.
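The three ratios above reduce to a few lines of arithmetic once market cap, realised cap, and the spent outputs are known. This is a sketch with invented numbers; the NUPL form used here, (market cap − realised cap) / market cap, is the common definition and individual providers may apply additional entity adjustments.

```python
def mvrv(market_cap: float, realised_cap: float) -> float:
    """Market Value to Realised Value."""
    return market_cap / realised_cap


def nupl(market_cap: float, realised_cap: float) -> float:
    """Net unrealised profit or loss as a fraction of market cap."""
    return (market_cap - realised_cap) / market_cap


def sopr(spent_outputs) -> float:
    """Spent Output Profit Ratio.

    spent_outputs: iterable of (amount, price_when_created, price_when_spent).
    """
    value_when_spent = sum(a * p_spent for a, _, p_spent in spent_outputs)
    value_when_created = sum(a * p_created for a, p_created, _ in spent_outputs)
    return value_when_spent / value_when_created


# Hypothetical aggregate values in USD
market_cap = 1_200e9
realised = 600e9

# Hypothetical outputs spent today: one in profit, one at a loss
spent_today = [(1.0, 40_000.0, 60_000.0), (2.0, 50_000.0, 45_000.0)]

mvrv_value = mvrv(market_cap, realised)
nupl_value = nupl(market_cap, realised)
sopr_value = sopr(spent_today)
print(mvrv_value, nupl_value, sopr_value)
```

In this toy case SOPR is above 1 even though the larger output moved at a loss, because the ratio is value-weighted across everything spent.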

Liquidity, Flows, And Activity

Other common constructions
Exchange net flows: Net of coins into and out of exchange-tagged wallets. Inflows suggest selling pressure building; outflows suggest self-custody or long-term holding. The quality of this metric depends entirely on which wallets are tagged as exchanges.
Miner flows: Similar construction for miner-tagged wallets. Used to track whether miners are selling or holding rewards. Hashrate data provides context for why miner behaviour changes.
Active addresses: Count of unique addresses involved in transactions per day. Useful for network activity trends. Easily inflated by MEV bots, airdrop farmers, and spam campaigns. Treat as a direction indicator, not an absolute count of users.
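The dependence on labels is easy to demonstrate. The sketch below computes exchange net flow from a list of transfers and a set of exchange-tagged addresses, then recomputes it with two alternative label sets. All addresses and amounts are hypothetical.

```python
def exchange_net_flow(transfers, exchange_addresses):
    """Net coins flowing into exchange-tagged wallets (positive means inflow).

    transfers: iterable of (sender, receiver, amount).
    Transfers between two tagged wallets, or two untagged ones, net to zero.
    """
    net = 0.0
    for sender, receiver, amount in transfers:
        sender_tagged = sender in exchange_addresses
        receiver_tagged = receiver in exchange_addresses
        if receiver_tagged and not sender_tagged:
            net += amount
        elif sender_tagged and not receiver_tagged:
            net -= amount
    return net


# Hypothetical transfers for one day
transfers = [
    ("user_a", "addr_exch_1", 5.0),
    ("addr_exch_1", "user_b", 8.0),
    ("addr_exch_1", "addr_exch_2", 100.0),  # internal sweep between exchange wallets
    ("addr_exch_3", "user_c", 4.0),
]

flow_base = exchange_net_flow(transfers, {"addr_exch_1", "addr_exch_2"})
flow_more_labels = exchange_net_flow(transfers, {"addr_exch_1", "addr_exch_2", "addr_exch_3"})
flow_missing_label = exchange_net_flow(transfers, {"addr_exch_1"})
print(flow_base, flow_more_labels, flow_missing_label)
```

The same transfers give three different readings. With one wallet untagged, an internal sweep shows up as a large outflow, which is the misclassification risk described above.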

The Pitfalls That Ruin Good Reads

These are the most common ways that technically correct data produces analytically wrong conclusions.

Provider drift: A provider updates their label set or methodology. Old comparisons stop matching. You are looking at a different version of history than you were before, but the chart looks continuous. Always check whether a provider has published method change notes alongside any anomaly in a chart.
Look-ahead bias: A signal is constructed or cited as if it was clear at the time, when the method actually uses data that was not available until later. Common in backtests and retrospective analysis. The indicator that looks predictive in hindsight may have been invisible in real time.
Sampling and time zone mismatches: Mixing UTC timestamps with local exchange time, or comparing per-block versus per-minute bins, can make charts appear to conflict even when the underlying data is consistent. Align everything to UTC and note the sampling window explicitly.
Double counting from bridges and wrappers: A token bridged from one chain to another can appear as both an outflow on the source chain and an inflow on the destination chain. If both are counted in the same volume or flow metric, activity is double-counted. Net both sides before drawing conclusions on cross-chain volume.
Inflated activity from bots and farming: MEV bots, airdrop farmers, and spam transactions inflate address counts, transaction counts, and active address metrics. A spike in active addresses during an incentive campaign or a high-MEV period does not indicate genuine user growth.
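For the sampling and time zone pitfall, a minimal sketch of the fix is to convert every timestamp to UTC before binning. The events below are invented, and the UTC+9 offset simply stands in for an exchange that reports in local time.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime) -> datetime:
    """Convert an aware timestamp to UTC and reject naive ones outright."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: time zone unknown")
    return ts.astimezone(timezone.utc)


def daily_totals(events):
    """Sum amounts per UTC calendar day.

    events: iterable of (timestamp, amount).
    """
    totals = defaultdict(float)
    for ts, amount in events:
        totals[to_utc(ts).date().isoformat()] += amount
    return dict(totals)


local_exchange_tz = timezone(timedelta(hours=9))  # hypothetical venue reporting in UTC+9

events = [
    (datetime(2024, 6, 1, 23, 30, tzinfo=timezone.utc), 10.0),
    (datetime(2024, 6, 2, 7, 30, tzinfo=local_exchange_tz), 5.0),  # 22:30 UTC on 1 June
]

totals = daily_totals(events)
print(totals)
```

Binned by local calendar date, the two events would land on different days and the charts would appear to disagree. In UTC they fall in the same daily bin.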

A 15-Minute Validation Workflow

When any on-chain or off-chain chart is cited as evidence of a market condition, this is the workflow that confirms or questions it before treating it as actionable.

1. Define the claim

State it precisely. "Exchange reserves fell this week" is a claim with testable components: which exchanges, which asset, what time window, which provider.

2. Pull on-chain

Exchange-tagged addresses from at least two on-chain providers. Net inflow or outflow by day. Note the label vintage and whether any method notes are visible.

3. Pull off-chain

Spot price, funding rate, and open interest for the same period. Check whether the off-chain positioning picture supports or conflicts with what on-chain is showing.

4. Check for known distortions

Look for label flips, contract migrations, venue coverage changes, ETF-related wallet movements, or known farming campaigns that could explain an anomaly independently of genuine market behaviour.

5. Explain one level deeper

If exchange reserves fell, where did the coins go? ETF custody, long-term holder wallets, staking, OTC settlement, or something else? The direction is the surface. The destination is the substance.

6. Write the caveats explicitly

Label confidence level, sampling window, known data gaps, and which providers were used. A conclusion without visible caveats is an opinion dressed as analysis. Visible caveats make the work citable and honest.
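One way to keep the workflow honest is to record the claim and its caveats as structured data and to check direction across providers before looking at levels. The sketch below is an assumption about how that might look; the `Claim` fields, provider names, and reserve figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str
    asset: str
    window: str
    providers: list
    caveats: list = field(default_factory=list)


def direction(series) -> str:
    """Direction of a series from its first to its last observation."""
    change = series[-1] - series[0]
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


def providers_agree(readings: dict) -> bool:
    """True when every provider's series moves in the same direction."""
    return len({direction(series) for series in readings.values()}) == 1


# Hypothetical exchange reserve readings (BTC) from two providers
readings = {
    "provider_a": [2.40e6, 2.35e6, 2.31e6],
    "provider_b": [2.10e6, 2.08e6, 2.02e6],
}

claim = Claim(
    statement="Exchange reserves fell this week",
    asset="BTC",
    window="7 days, daily bins, UTC",
    providers=list(readings),
)
if providers_agree(readings):
    claim.caveats.append("Direction agrees across providers; absolute levels differ")
else:
    claim.caveats.append("Providers disagree on direction; treat as unconfirmed")
print(claim)
```

The two providers disagree on the level by roughly 300,000 BTC yet agree on direction, which is the usual pattern when label sets differ.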


How To Judge Any Metric At A Glance

Before trusting any dashboard chart or data screenshot, run through these five questions. They take less than two minutes and catch most of the common problems.

The five-question check
Is the definition written down? Does the provider document the formula, the units, and exactly what is being measured? If not, you are trusting a black box.
Is there a method note? Are the labels, filters, sampling frequency, and time zone documented somewhere visible? This is what lets you compare across providers and across time.
Is it replicable? Could you reproduce the metric from raw sources if you had to? Replicable metrics are structurally more trustworthy than opaque constructions.
Is there a version history? Can you see when the methodology last changed? A chart without visible versioning may have been silently revised after the fact.
Does it lead or lag, and how does it behave under stress? A metric that looks predictive in trending conditions may be uninformative or misleading during rapid market moves or data anomalies.

Frequently Asked Questions

No. Both are incomplete on their own. On-chain data explains supply, flows, and holder behaviour. Off-chain data explains price discovery, depth, leverage, and institutional positioning. Real analysis needs both layers. Using only one produces blind spots.
Different label sets, different entity clustering heuristics, different contract maps, and different venue coverage. When providers update their labels, historical charts can change retroactively. Trends are usually consistent across providers even when absolute levels differ. Use multiple providers for directional confirmation rather than treating any single provider as definitive.
The direction of movement is usually useful. Absolute values depend heavily on which exchanges are tagged, how wrapped assets and cross-chain bridges are handled, and whether the provider's label set has changed recently. A sharp move in exchange reserves that coincides with a known ETF settlement event or custody migration may reflect operational movement, not organic buying or selling behaviour.
MVRV (Market Value to Realised Value) is the ratio of current market cap to realised cap. Realised cap values each coin at the price it last moved rather than current market price, giving a proxy for the aggregate cost basis of the market. When MVRV is elevated, average holders are sitting on large unrealised gains, which has historically been associated with cycle peaks and distribution. When MVRV is low, the current price is near or below the average cost basis, which has historically been associated with capitulation zones. It is a useful context indicator but not a precise timing tool.
Free tools are excellent for learning the mechanics and tracking general trends. For analysis that informs material decisions, paid feeds typically offer better documented methods, visible versioning, data support when anomalies appear, and more granular historical access. The difference is less about data accuracy and more about the support and accountability layer around the data.
ETF flows are reported as off-chain disclosures by issuers. When new ETF shares are created, the fund acquires spot BTC, which eventually moves on-chain to the custodian's wallet. When shares are redeemed, BTC moves out. Comparing ETF flow reports with on-chain exchange reserve data and custody wallet activity helps cross-check whether reported institutional demand is translating into on-chain supply movement, and at what lag.

The live application of this framework, how on-chain and off-chain data are aligning right now, and what the current evidence says about cycle phase will be covered in the weekly member update. Alpha Insider members get this analysis in real time every week across KAIROS timing, on-chain data, and macro signals.
