The Shape of Supply and Demand Curves in Rapidly Clearing Markets

A central challenge in economics is understanding how price affects the quantity of supply and demand, a relationship often assumed to be approximately linear near the market price. But there are markets where this local linearity, which underpins the usual notion of "elasticity," may not hold. In a paper that deserves more attention, Donier and Bouchaud show that supply/demand curves of rapidly clearing markets (with a Brownian price process) have an average shape that is locally quadratic, with no linear term.

Here is a classical illustration of how total supply and demand tend to vary with price:

Total demand decreases — and total supply increases — monotonically with price. The curves intersect at p^*, the volume-maximizing market price. [1] Near the market price, supply and demand vary approximately linearly with price.

If we “clear” the market, so that the supply on offer at p^* trades with the demand at p^*, then the curves will look like:

This is the same as the first plot, just shifted downward by the quantity traded. The curves are still linear near p^*. And if we zoom in, the average supply/demand curves will look like [2]:

But, Donier and Bouchaud [3] show that markets with certain features will actually have *average* supply/demand curves that are locally quadratic. I.e.:

In this regime, a market order will have price impact that scales with the square-root of its size, on average.

Their result also raises the question whether some markets operate at saddle points in their production and consumption curves, which may look like:

Where W_{saddle} is the rough width of the “saddle zone,” the region where the curves are predominantly quadratic.

This strikes me as qualitatively different from the classical economic picture. It also makes sense intuitively: when an asset is volatile, it’s difficult to know the exact price where supply and demand balance. Donier and Bouchaud don’t speculate on the size of W_{saddle}, but there is at least the possibility that their results apply to a wider range of prices than expected. If any real markets have a large W_{saddle}, estimating their price elasticities would be difficult or impossible. It could also explain the ennui of financial markets — where headline-generating price moves have little effect on real-world supply and demand. [4]

The Donier-Bouchaud Model

Donier and Bouchaud (and previous co-authors) use a reaction-diffusion model to obtain this result. Briefly and roughly:

  1. New buy orders are created in a time interval dt with probability \omega_+(y) dt, where y = p - p^* is the difference between the price of the new order (p) and the market price (p^*). Sell orders are created with probability \omega_-(y) dt.
  2. Existing orders are canceled with probability \nu_{\pm}(y) dt.
  3. The market clears with periodic time-interval \tau — when buy and sell orders with crossing prices are matched, and removed from the market.
  4. The underlying process of the price p^* is Brownian.

When \tau is small, they show that supply and demand curves are locally quadratic, for any reasonable \omega_{\pm}(y) and \nu_{\pm}(y).

Of course, real-world markets do not instantly clear crossed meta-orders, even when markets trade in continuous time. For example: a trader might intend to buy $100M of stock at any price below $100/share, while another trader intends to sell $100M of stock at any price above $90/share. The two traders might dribble out their order-flow over weeks, instead of instantly trading with each other at a price between $90 and $100 per share. [5]

Nonetheless, it is conceivable that some markets behave as if they’re in the small \tau limit. In the Bitcoin market, traders may be less inclined to hide their intentions than in traditional markets, and the visible order book might represent the true levels of supply and demand near the market price. The authors present the average displayed supply and demand for Bitcoin in Figure 6, which is very close to a quadratic function for prices within 2% of the clearing price (where cumulative supply/demand is typically ~400k BTC). So electronic markets’ “saddle zones” may be about as wide as their daily volatility, which doesn’t seem surprising; few oil producers are going to increase drilling because the price went up by 1%.

Latent Liquidity as First-Passage Time

Donier and Bouchaud’s result seems to be a general feature of Brownian price processes, and doesn’t depend much on the specifics of the model. The spirit of their model raises the question whether there’s a connection between the marginal supply/demand at a given price, and the time required for the market to move through that price. That is, perhaps latent liquidity has properties similar to first-passage time statistics.

A quadratic supply/demand curve is equivalent to marginal supply/demand varying linearly with price. By definition, cumulative supply (S(y)) at a price y away from the “true” price is just the sum of marginal supply (\rho_{S}) available up to that price: S(y) = \int_{0}^{y} \rho_{S}(y') dy'. [6] \rho_{S}(y) may also be called the volume of latent sell orders available at price y.

One way to reach Donier and Bouchaud’s result is by assuming that \rho_{S}(p) builds with time, after the market price moves through p. [7] To be clear, the following model is different and less sophisticated than what Donier and Bouchaud did, but I think it’s a good way to capture the intuition.

As an illustration, consider what happens to the latent order book after the clearing price drops instantly from p_0 to p_1. At first, the supply on offer between p_0 and p_1 will be zero:

Afterwards, inside this new gap, latent sell orders will start to build. Let’s assume that \rho_{S} grows as a function only of the time t(p) since the price dropped through p: \rho_{S}(t(p)). [8] The precise form of the function \rho_{S}(t) doesn’t really matter.

If the market doesn’t move after that initial price drop, time t_1 later, the latent order book will have replenished. Between p_0 and p_1, there will be quantity \rho_{S}(t_1) on offer:

To calculate the expected value of \rho_{S}(t(y)), we need the probability density of the time since the price last passed y away from the present price: \mathbf{p}_{y}^{LPT}[t]. For a time-reversible process, this distribution is the same as that of the first-passage time, \mathbf{p}_{y}^{FPT}[t].

For continuous-time Brownian motion, the first-passage time distribution is well-known:

\mathbf{p}_{y}^{FPT}[t] = \frac{y}{\sqrt{2\pi \sigma^2 t^3}} e^{-y^2 / (2 \sigma^2 t)}
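As a quick sanity check, this density can be compared against a simulation. Integrating it up to a horizon T gives the CDF \mathbf{P}[FPT \le T] = \mathrm{erfc}(y / (\sigma \sqrt{2T})). A minimal Monte Carlo sketch (the parameters y = \sigma = T = 1 are arbitrary):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

y, sigma, T = 1.0, 1.0, 1.0      # barrier distance, volatility, horizon
steps, paths = 1000, 40000
dt = T / steps

# Simulate Brownian paths step by step, tracking which have crossed y.
pos = np.zeros(paths)
crossed = np.zeros(paths, dtype=bool)
for _ in range(steps):
    pos += rng.normal(0.0, sigma * math.sqrt(dt), size=paths)
    crossed |= pos >= y

p_emp = crossed.mean()
# CDF implied by the first-passage density above:
p_exact = math.erfc(y / (sigma * math.sqrt(2.0 * T)))
print(p_emp, p_exact)
```

Discrete monitoring misses some crossings that happen between steps, so the simulated probability sits slightly below the continuous-time value; the gap shrinks as dt decreases.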

For y \ll \sigma \sqrt{t}, this is linear in y:

\mathbf{p}_{y}^{FPT}[t] \approx \frac{y}{\sqrt{2\pi \sigma^2 t^3}}

Which gives an average marginal supply curve that’s linear in y:

\mathbf{E}_{t}[\rho_{S}(t)] = \int_{0}^{T} \rho_{S}(t) \mathbf{p}_{y}^{FPT}[t] dt \propto y

Where T is the total time that the market has been operating.

Thus, the expected cumulative supply is quadratic in y: \mathbf{E}_{t}[S(y)] = \int_{0}^{y} \mathbf{E}_{t}[\rho_{S}(y')] dy' \propto y^2.
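This can be checked numerically by integrating the exact first-passage density against a concrete (and purely illustrative) replenishment function — say \rho_{S}(t) = \min(t, 10), which grows more slowly than \sqrt{t} at large t, as footnote [12] requires. Doubling y should then double the expected marginal supply:

```python
import numpy as np

def expected_marginal_supply(y, sigma=1.0):
    """E_t[rho_S(t(y))] under the Brownian first-passage density,
    for the illustrative replenishment function rho_S(t) = min(t, 10)."""
    t = np.logspace(-8, 6, 20001)          # integration grid (log-spaced)
    rho = np.minimum(t, 10.0)
    fpt = y / np.sqrt(2 * np.pi * sigma**2 * t**3) \
          * np.exp(-y**2 / (2 * sigma**2 * t))
    f = rho * fpt
    # trapezoid rule on the non-uniform grid
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))

e1 = expected_marginal_supply(0.01)
e2 = expected_marginal_supply(0.02)
print(e2 / e1)   # ~2: marginal supply is linear in y near the market price
```

Since the expected marginal supply is linear in y, the cumulative supply obtained by integrating it is quadratic, as claimed above.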

The model should fail at sufficiently large t, when it has been a long time since the price last reached its current level. E.g., if the price of oil rises past its high for the year to $60/bbl, then we’d expect the level of marginal supply near $60 to reflect the real-world economics of oil extraction. So clearly, the model shouldn’t work for t=1yr in the oil market. But if oil breached only its high-of-the-day, perhaps the marginal supply would just be a mechanical function of that duration. We could argue that the model will start to fail when t is long enough for businesses to react to new highs/lows, which should be about the typical time between business decisions (t_{InterDecision}). [9] In that case W_{saddle} \sim \sigma \sqrt{t_{InterDecision}}, which could be quite large for illiquid markets.

Discretizing the Process

If \tau > 0, the market clears every period \tau in a batch auction, and the price process becomes a discrete-time random walk. An infinitely long \tau should recover the uncleared, classical supply/demand curves at the top of this post. So, as \tau increases, we expect a transition from the quadratic supply of Brownian motion to a linear regime.

To get the average supply curve for discrete markets, we need the first-passage time distribution. When a random walk has price steps that are independently drawn from a symmetric, continuous probability distribution with finite second moment, its first-passage time PDF is asymptotically: [10]

\mathbf{p}_{y}^{FPT}[n] \approx \left( \frac{1}{2 \sqrt{\pi n^3}} + \frac{y}{\sqrt{2\pi \sigma_{step}^{2} n^3}} \right) e^{-y^2 / (2 \sigma_{step}^2 n)}

Where n is the number of steps the random walk has taken, and \sigma_{step}^2 is the variance of each step’s price movement. The approximation is valid in the limit n \to \infty with \frac{y}{\sqrt{n}} finite. The step count is related to continuous time via n = \frac{t}{\tau}. And if the underlying process is Brownian, \sigma_{step} = \sqrt{\tau}\sigma.

When the price process has a typical step size (\sigma_{step}) that’s small compared to the distance from the market price (y), then the second term dominates and \mathbf{E}_{t}[\rho_{S}(t)] is identical to the continuous Brownian case. That is, cumulative supply varies quadratically with price when \tau \ll \frac{y^2}{\sigma^2}.

When the process is heavily discretized, y is small compared to \sigma_{step} and the first term dominates, which will be approximately constant in y. Thus the marginal supply will be constant, and the cumulative supply linear in y.

This result is the same as Donier and Bouchaud’s. In fact, if we expand \mathbf{p}_{y}^{FPT}[n] to first order in \frac{y}{\sigma_{step}\sqrt{n}} (near the market price), then we get:

\mathbf{E}_{n}[\rho_{S}(n)] \approx L(y + u_0 \sigma \sqrt{\tau})

Where L is some constant obtained from integrating out t, identified as a measure of liquidity by Donier and Bouchaud. And u_0=\sqrt{\frac{1}{2}} \approx 0.71 is a constant not terribly far from the one obtained by Donier and Bouchaud (u_0 \approx 0.82). [11][12]
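The value u_0 = 1/\sqrt{2} can be recovered numerically from the discrete first-passage density: evaluate \mathbf{E}_{n}[\rho_{S}(n)] at two small values of y, fit the affine form L(y + u_0 \sigma_{step}), and read off the ratio of intercept to slope. A sketch with \sigma_{step} = 1 and an illustrative saturating \rho_{S}:

```python
import numpy as np

sigma_step = 1.0
n = np.arange(1, 1_000_001, dtype=float)
rho = np.minimum(n, 100.0)   # illustrative replenishment; sub-sqrt at large n

def expected_rho(y):
    """Sum rho_S(n) * p_y^FPT[n] over steps, using the asymptotic discrete pdf."""
    pdf = (1.0 / (2.0 * np.sqrt(np.pi * n**3))
           + y / np.sqrt(2.0 * np.pi * sigma_step**2 * n**3)) \
          * np.exp(-y**2 / (2.0 * sigma_step**2 * n))
    return np.sum(rho * pdf)

y1, y2 = 1e-3, 2e-3
m1, m2 = expected_rho(y1), expected_rho(y2)
slope = (m2 - m1) / (y2 - y1)
intercept = m1 - slope * y1
u0 = intercept / (slope * sigma_step)
print(u0)   # ~0.707, i.e. 1/sqrt(2)
```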

Thus, slowly-clearing markets — which are heavily discretized — may not have a saddle zone. [13]

Supply/Demand Curves when the Price Process is a Lévy Flight

The above asymptotics apply to a broad class of random walks if their variance is finite. But markets can have price fluctuations with fatter tails, particularly on shorter timescales. A Lévy flight of index 0 < \alpha < 2 has price increments (x_{t} = p_{t} - p_{t-\Delta t}) with divergent variance and a power-law tail: \mathbf{p}[x] \sim \frac{1}{|x|^{\alpha + 1}}.

The first passage times of a Lévy flight have asymptotic PDF: [14]

\mathbf{p}_{y}^{FPT}[t] \sim \frac{y^{\alpha / 2}}{t^{3/2}} for long t. [15]

This distribution gives, on average, a cumulative supply curve S(y) \sim y^{(2+ \alpha) / 2}. And a market order will have price impact \mathcal{I}(q)=S^{-1}(q) \sim q^{2 / (2+\alpha)}. As an example, \alpha=1.5 would correspond to a rather “jumpy” market, and would have S(y) \sim y^{1.75} and \mathcal{I}(q) \sim q^{0.57}. [16]
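A small helper makes the exponent arithmetic explicit (the function name is mine; this is just the scaling relation above, not a model of any particular market):

```python
def levy_supply_impact_exponents(alpha):
    """Exponents implied by the Levy-flight first-passage scaling:
    cumulative supply S(y) ~ y^s and impact I(q) ~ q^i, with i = 1/s."""
    if not 0 < alpha <= 2:
        raise ValueError("alpha must be in (0, 2]")
    s = (2 + alpha) / 2
    return s, 1 / s

print(levy_supply_impact_exponents(1.5))   # supply exponent 1.75, impact ~0.571
print(levy_supply_impact_exponents(2.0))   # Brownian-like limit: 2.0 and 0.5
```

The \alpha \to 2 limit recovers the quadratic supply curve and square-root impact of the Brownian case.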

Supply/Demand Curves for a Sub-Diffusive Price Process

The volatility of a sub-diffusion increases with timescale more slowly than the volatility of an ordinary Brownian motion, for example \sigma_{\Delta t}^2 \sim \Delta t^{\gamma}. When \gamma = 1, the volatility scales in the usual way for Brownian motion: linearly with the timescale. When 0 < \gamma < 1, the process is a sub-diffusion. A sub-diffusive market is mean-reverting in the sense that a price fluctuation is likely to be reversed in the future. Because sub-diffusive markets have “memory,” they’re considered “inefficient.” [17]

The first passage time distribution of a sub-diffusion is asymptotically: [18]

\mathbf{p}_{y}^{FPT}[t] \sim \frac{y}{t^{1 + \gamma / 2}}

This has linear price-dependence like ordinary Brownian motion. So the cumulative supply is again quadratic, and market impact is again square-root.

Certain types of “efficiency” can lead to square-root price impact. But if this model is approximately accurate, then “inefficient” markets like sub-diffusions can also have square-root impact.

Update: Benzaquen and Bouchaud just examined a reaction-diffusion model for sub-diffusions. They show that the latent order book is locally linear (eq. 10), like in the crude first-passage analysis here. For quickly executed meta-orders, they show \mathcal{I}(q) \sim \sqrt{q}. But for slow meta-orders that give latent orders more time to react mid-execution, they get \mathcal{I}(q) \sim q^{1-\gamma / 2}.

Order-of-Magnitude Scaling of Impact in Diffusions and Sub-Diffusions

I find this result interesting because it appears, at first glance, to contradict the simplest, order-of-magnitude “derivation” of the square-root impact law. But on closer inspection, I think order-of-magnitude logic is consistent with sub-diffusions and ordinary diffusions having similar impact scaling.

If a market is Brownian, its price changes will scale like \sigma \sqrt{\Delta t}. One view of price discovery is that a fraction (of order 1) of those price changes come from traders’ impact. Thus, a meta-order’s impact will roughly scale with the square root of its duration: \mathcal{I} \sim \sigma \sqrt{t_{OrderDuration}}. Market-wide volume roughly accumulates at a constant rate in time (V), so a meta-order of size q will last for a duration t_{OrderDuration} \sim \frac{q}{V}. This gives \mathcal{I}(q) \sim \sigma \sqrt{\frac{q}{V}}.
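With illustrative (entirely hypothetical) numbers, the estimate is easy to evaluate — e.g. a meta-order that is 4% of a day's volume in a market with 2% daily volatility:

```python
import math

def sqrt_impact(q, daily_volume, daily_vol):
    """Order-of-magnitude impact estimate I(q) ~ sigma * sqrt(q / V),
    with sigma and V measured over the same interval (here, one day)."""
    return daily_vol * math.sqrt(q / daily_volume)

# Hypothetical example: buy 4M shares in a name trading 100M shares/day
# with 2% daily volatility.
print(sqrt_impact(q=4e6, daily_volume=100e6, daily_vol=0.02))
# 0.02 * sqrt(0.04) = 0.004, i.e. roughly 40 bps
```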

Now, if a market is sub-diffusive then its price changes scale more slowly with time: \Delta p \sim \Delta t^{\gamma / 2} . If everything else in the above argument were the same, then we’d get \mathcal{I}(q) \sim q^{\gamma / 2} . But there’s no reason to expect that volume in a sub-diffusive market should occur at a constant rate in clock-time.

Sub-diffusions can be reformulated as processes with independent increments, where the time between steps is variable with a long tail. [19] In this formulation, steps may correspond to trading activity — with each step roughly having volume V_{step}. A meta-order would then last for N steps, with N \sim \frac{q}{V_{step}}. The step lengths of this sub-diffusion have finite variance \sigma_{step}^2, so we can follow the same argument as in the Brownian case, with the step count replacing time: \mathcal{I}(q) \sim \sigma_{step}\sqrt{N} \sim \sigma_{step}\sqrt{\frac{q}{V_{step}}} — which has the same square-root scaling.


Market behavior is crowd behavior, and a crowd can be more predictable than the individuals inside it. Donier and Bouchaud — and the loose interpretation in this post — show that when markets are mechanical and boring, they will have cumulative supply and demand curves that are roughly quadratic, on average. Now, lots of us do not think markets are boring! Even if many markets do operate at saddle points, they probably do not have very wide “saddle zones.” But if any do, a small price change might have virtually zero effect on production and consumption, while a large price change might have a giant effect. That’s different from the standard intuition.

[1] This picture is obviously heuristic. Supply/demand curves are well known to have many shapes, may be non-monotonic, and may have more than one price that locally maximizes the exchanged volume.

[2] The supply and demand curves near p* should (probably) be symmetric around p*, if we average over some ensemble of situations in a given market.

[3] Borrowing from their earlier work with Bonart, Deremble, de Lataillade, Lempérière, Kockelkoren, Mastromatteo, and Tóth. (e.g. 2014 and 2011)

[4] Even when large price changes barely affect supply and demand, they could still serve an important economic function. The opinions and information of traders are incorporated into prices, which — when markets are transparent — provide useful signals for long-term business decisions.

[5] Table 1 and Figure 13 of Waelbroeck and Gomes indicate that the bulk of institutional equity meta-orders are shorter than a few days.

[6] Demand is similar. For ease, let’s just discuss the supply side.

[7] Latent liquidity isn’t usually observable, so we can’t easily test the hypothesis that latent liquidity at a given price builds as a function of time since the market was at that price. But latent liquidity could be measured via the order book — if a market is heavily financialized, transparent, and dominated by traders who don’t hide their intentions. Arguably, Bitcoin is (or used to be) such a market.

[8] We could add an i.i.d noise term to \rho_{S} and it wouldn’t affect the results. The assumption that \rho_{S} has no explicit price-dependence is similar to Donier and Bouchaud’s example of constant \omega and \nu. This assumption is obviously wrong, but sufficiently close to the market price, it may be reasonable enough.

[9] The reaction time of businesses could be significantly longer than the human decision-making timescale. Especially when the market in question isn’t transparent or mature. E.g., if a farmer switches from growing olives to almonds, it might take her grove several years to become productive again. So the price of almonds may need to reach a multi-year high in order for her to be confident enough to switch crops. But, if the farmer could hedge her future production, she might quickly decide to switch crops after it becomes economical to do so. Perhaps a well-developed futures market reduces the valid range of this model from t \lesssim 1yr to t \lesssim 1day.

[10] See eq. 38 of Majumdar and citations for details. I changed the CDF of eq. 38 into a PDF, and dropped a non-leading term.

[11] Eq. 22

[12] Sparre-Andersen scaling applies to any Markov process with a continuous and symmetric distribution of price movements: For large t, first passage time distributions have to decay like t^{-3 / 2}. So, in order for the expectation over t to converge, we must have \rho_{S}(t) < \sqrt{t} for large t (assuming \rho_{S}(t) is monotonically increasing). That is, the marginal supply at a given price must grow more slowly than the square root of the time since the market was at that price. The connection between latent liquidity and first passage times won’t hold for very long t, but this may provide a loose bound on how quickly latent liquidity can replenish.

[13] According to the model, slowly-clearing supply/demand curves are still dominated by the quadratic term when price movements are sufficiently large (and thus last-passage times long). But the model should fail at very long timescales, when businesses are able to react to price moves. Though, as discussed in [9], the model’s valid timescale could be longer if a market is slow and opaque.

[14] See, e.g., eq. 10 in Koren, et al.

[15] Figure 2 shows pretty rapid convergence to this asymptotic result after a handful of time steps.

[16] Perhaps it shouldn’t be surprising that impact is steeper than in the Brownian case. Heuristically, the Lévy tails make the market more “momentum-driven.” In a Lévy-type market, a trader who initiates a price movement could find that the market quickly moves away from her. The Lévy process used here has independent increments, but we can imagine the independence breaking down in continuous-time, “mid-increment.” I.e., it’s conceivable that momentum-traders could trade in the middle of a timestep.

On a related note, Koren et al. show that the mean leapover length diverges for Lévy flights. The leapover length could be interpreted as the profit made by stop orders, *if* they execute in the middle of a price jump and are able to hit liquidity during tail events. Those are big “ifs,” but the potential to make near-infinite profit may partly explain the popularity of stop orders and short-term momentum strategies. It could also explain why traders are reluctant to post much liquidity far away from the current price.

[17] In a sub-diffusive market with low transaction costs, betting on mean-reversion is a profitable strategy. If you know of any electronic markets like this, let me know.

Even if sub-diffusions are rare in electronic markets, linear combinations of different assets could still be sub-diffusive. The stereotypical “pairs trade” involves a mean-reverting spread between prices of two assets. Also, sub-diffusions may be more common in the broader, off-exchange economy than we’d naively expect.

[18] See the first non-constant term in the series expansion of eq. 30 in Metzler and Klafter (“Boundary value problems for fractional diffusion equations”). Eq. 30 is the survival probability (CDF of the first passage time), the PDF is the time derivative. The approximation is valid for y \ll \sqrt{ K_{\gamma}t^{\gamma} }. K_{\gamma} is the “fractional diffusion coefficient,” analogous to \sigma^2 for \gamma=1.

Note that sub-diffusions are not Markov processes, so Sparre-Andersen scaling [12] doesn’t apply. Here, the analogous bound is that liquidity must replenish more slowly than t^{\gamma / 2}: \rho_{S}(t) < t^{\gamma / 2} for large t.

[19] Correlated waiting times may also lead to sub-diffusions in a process with independent increments.

CHX and Four Types of Speed Bump

Speed bumps, the latest fad in market structure, are proliferating. The Chicago Stock Exchange (CHX) recently proposed a new type of speed bump in the US equities market, called the “Liquidity Taking Access Delay” (LTAD). [1] The SEC’s decision whether to approve the LTAD could have a big impact on market structure, possibly more so than the IEX decision.

To understand the consequences of approving LTAD, I think it’s helpful to consider four types of asymmetric access delay: [2][3]

  1. An exchange applies a delay to orders that are accessing resting liquidity. Resting orders themselves may be modified or cancelled without delay. This is the LTAD.
  2. The same as #1, except only resting orders that are non-displayed avoid the delay.
  3. The delay applies to all client messages to the exchange, including cancellations and marketable orders. But the exchange operates algorithmic order-types which adjust their attributes without delay. In particular, a displayed order can be pegged to undelayed price data, while its potential counterparties are subject to the delay.
  4. The same as #3, except only non-displayed algorithmic order-types avoid the delay. This is the IEX speed bump. [4]

Here’s a table of the speed bump combinations, split by which types of resting orders have functionality similar to last look:

Last look                              | Can apply to displayed quotes | Can apply to non-displayed quotes only
At discretion of resting order-sender  | #1 (LTAD)                     | #2
At discretion of exchange algo         | #3                            | #4 (IEX)

In my opinion, all four types do more harm than good to market end-users (in the context of Reg NMS) — each allows resting liquidity to fade in some way. But many end-users disagree, and asked the SEC to approve #4. So, now that speed bumps exist and non-displayed peg orders may elide them, which of the other three asymmetric delays should be allowed?

One school of thought is that exchanges have a duty to protect their peg orders to the maximum extent possible. Combining this principle with the new “de minimis” interpretation may imply that speed bumps of types #3 and #4 should be allowed. Healthy Markets takes this view. [5]

I think it’s a bad idea for exchange algorithms to have a time-advantage that isn’t offered to traders. Exchanges have neither the incentive nor the expertise to optimize their pricing algorithms. [6] The traditional role of the exchange is to provide a meeting place for buyers and sellers, allowing them to transact at prices of their own choosing. Exchanges offer algorithmic order-types (like pegs) mainly for convenience, and I don’t think there should be any illusion that these order-types are as good as the techniques used by sophisticated brokers and traders. Providing exchange algos with a time-advantage will mean that they out-compete traders, who are otherwise superior in terms of their pricing accuracy, stability under stress, and diversity. Traders and their algorithms do make mistakes, but it’s hard to believe that an exchange algo monoculture can do better. Subsidizing inferior business methods reduces an industry’s productivity, and giving exchange algorithms a structural advantage would be no different.

In my view then, if #4 is allowed, then #2 should be allowed. And, if #3 is allowed, then #1 should be allowed. Since #4 is already approved, the only thing to consider is whether it should be permissible for delays to have asymmetries in how they apply to accessing displayed quotes. The comments mostly do a good job explaining why asymmetries are problematic for displayed market structure. [7] In short, if displayed orders are given extra time to decide whether to consummate a trade, a large number of quotes will be practically inaccessible, even though it’d be prohibited to lock, cross, or trade-through those quotes. [8] Even in markets without these regulations, such as FX, last look causes difficulty. [9] The consequences of combining last look with order protection are unpredictable, but they seem unlikely to be good for long-term traders.

So, if we don’t want Reg NMS order protection to apply to quotes eligible for last look, only options #2 and #4 should be permitted. There is a tradeoff, though. Giving only hidden orders a time-advantage will incentivize dark trading. I doubt that the major exchanges would become as dark as IEX, but the equities market would probably become less transparent.

IEX’s approval is already inspiring a dramatic increase in complexity. CHX might not be the most important exchange, but like IEX, any precedent it sets applies to other exchanges. [10] Restricting speed-bump-asymmetries to hidden orders has drawbacks, but it might be the only way Reg NMS can keep limping on.

[1] Technically, it’s not really new. The LTAD is similar to an old proposal from Nasdaq’s PSX, which was rejected by the SEC.

[2] In order to highlight the essential elements, I’m leaving out some details.

[3] There are other speed bump types of course. For example, an exchange could delay traders from cancelling resting orders, but allow incoming marketable orders to execute without delay. That sort of delay could be used to address complaints about “fading liquidity”, and is similar in some respects to Nasdaq’s tentative “Extended Life Order” and EBS’s “Minimum Quote Life”.

Delays in market data are speed bumps of an entirely different class. These are prohibited (I think) in US equities markets, which require quotes and trades to be published to the SIP without delay. Though perhaps this requirement doesn’t apply to “de minimis” delays?

[4] Algorithmic order-types may also use their undelayed data feeds to trade aggressively with resting orders which can’t be cancelled without going through a delay. IEX’s DPEG does this via its “book recheck” mechanism.

[5] From their comment letter:

Time delays should not apply to an exchange’s ability to price orders on behalf of all participants (i.e. Pegging).

Dave Lauer clarified on Twitter that this principle also applies to displayed peg orders.

[6] CHX’s justification for the LTAD evokes a useful thought experiment. CHX argues that its ETF quotes are victims of “latency arbitrage,” and that the LTAD will prevent this. If the SEC rejects the LTAD, CHX might propose another type of speed bump, where CHX manages displayed orders in ETFs (e.g. SPY, QQQ, etc) by pegging them to undelayed CME market data. Will CHX understand the relationship that these ETFs have to futures markets as well as professional traders do? And these ETFs are just the simple cases — imagine what kind of pegs an exchange might come up with to prevent market makers from being “picked off” on XOM when the price of crude or CVX moves. I’m sure exchanges can think of peg algos that market makers would find very useful, but is that really what we want exchanges doing?

[7] No discussion of the LTAD would be complete without mentioning CHX’s market data revenue sharing program. This post, though, isn’t intended to be a complete discussion. Many of the comment letters (and Matt Hurd’s playful summaries) address this issue.

I found it interesting that Virtu publicly supported the LTAD as soon as it was announced, presumably before they had time to review the filings. Was the LTAD proposed at Virtu’s request? Could Virtu’s CHX profitability rely on market data revenue sharing?

[8] Without using ISOs, that is. If enough displayed orders were granted a last look, then the market would be clogged with inaccessible quotes. Only traders with the legal infrastructure to submit ISOs would be able to navigate the equity market.

[9] This opinion isn’t unanimous. Some long-term traders appreciate last look, like Norway’s SWF, NBIM.

[10] James Angel’s letter argues that CHX should be given the benefit of the doubt just like IEX was. I’m not a lawyer, and I think the LTAD could hurt long-term traders, but some of his points are persuasive.

High-speed Trading Networks and Societal Value

The romanticized trader works in a caravan, braving the elements to move property and information between cities. Over millennia, humans have gone from trading physical goods with wagons and ships to trading symbolic contracts with undersea cables and specialized radio networks. Markets’ adoption of communications technology has undoubtedly benefited society. But has it gone too far? Could the benefits of further speed be outweighed by the costs? A lot of people think so. And high-speed networks aren’t cheap; a new cable between Tokyo and London may cost $850M. But while I’ve heard lots of complaints about the cost of improving network infrastructure, I haven’t seen any estimates of the benefit. I will attempt to provide one here.

The “Arms Race”

Budish, Cramton, and Shim call further investment in speed “socially wasteful.” They also describe the dynamic between trading firms as a “prisoner’s dilemma,” where firms could increase their profits if they all avoid expensive technology investments. But if one firm “deviates” and improves its speed, it can temporarily increase its profit at the expense of its rivals — until they make their own similar investments. The result is that firms are continually investing in speed, but these expenditures don’t increase industry revenue. Many HFTs, being the supposed prisoners in this dilemma, agree that this process is a waste — for example Mani Mahjouri of Tradeworx:

[W]e went from piecing together existing fiber routes to digging trenches across the country to lay straight fiber to now sending signals through microwave towers and laying new trans-Atlantic cables, doing these very, very expensive technological investments… But if you look at it from the perspective of society, that’s a tremendous amount of capital that’s being invested… In my view, maybe that would be better spent making a more competitive price, as opposed to spending it on speed

Now, capitalism is supposed to force competitors to invest in order to provide a better service. If an industry complains that high investment costs are squeezing its margins, that’s a sign that competition is working the way it should. The concern is normally the opposite, that competitors are colluding to avoid “defections” from the investment “prisoner’s dilemma.” The question is whether there’s something unique about trading networks that makes the investments worthless to society. I don’t see why there should be.

The Speed of Information and Social Welfare

It is obvious that social welfare suffers when communications between markets are sufficiently slow. A revealing example is the experience of a Mediterranean trader (ca. 1066), who needed to know the price of silk in an away market before he could proceed with his business. He sent the following letter, in which he reports that his time awaiting this information will be wasted:

The price in Ramle of the Cyprus silk, which I carry with me, is 2 dinars per little pound. Please inform me of its price and advise me whether I should sell it here or carry it with me to you in Misr (Fustat), in case it is fetching a good price there. By God, answer me quickly, I have no other business here in Ramle except awaiting answers to my letters… I need not stress the urgency of a reply concerning the price of silk

Clearly, there would have been an economic benefit if the trader had faster communications. In general, how can we estimate the size of this benefit? When prices aren’t synchronized, wealth is arbitrarily transferred between traders — some transactions receive a slightly better price and others a slightly worse price, purely through luck. These wealth transfers are zero on average, so it’d be wrong to say their cost to society is their entire amount. But it’d also be wrong to say that such transfers are completely innocuous because they’re “zero-sum.” Which means that a reasonable estimate of the societal cost of unsynchronized markets is: the volume (V_{transfer}) of wealth transfers due to price discrepancies between venues, multiplied by a factor between 0 and 1, representing the economic cost per unit wealth transfer (C_{transfer}).

Let’s start by estimating C_{transfer}. Here are three examples where markets assign a value to the cost associated with random wealth transfers:

  1. Insurance: Where customers transfer the risk of bad luck to insurance companies, paying significantly more than expected losses to offload risk. Loss ratios in the insurance industry are generally around 80%. [1] So the insurance market has a C_{transfer} of about 0.2. [2]
  2. The equity risk premium: Aswath Damodaran calculates the historical excess return of equities over short-term government debt to be 4.4% globally, and the standard deviation of the difference in returns to be 17.1% (Table 6). This means, roughly, that investors typically demand an additional 4.4% return in exchange for risking 17.1% of their capital — translating to a C_{transfer}=\frac{0.044}{0.171} \approx 0.25. Excluding the US, this ratio is 0.2, and in some countries isn’t far above 0.1 (e.g. Belgium or Norway).
  3. Jensen’s analysis of synchronization between fish markets in Kerala: After fishermen obtained mobile phones, they were able to find the nearby market with the highest price for fish, while they were still on the water. This information made it much easier for them to smooth out local fluctuations in supply and demand (see, e.g., Figure IV). Jensen estimates that mobile phones increased fishermen’s profits by 8% and decreased the average cost of fish by 4%. [3] These numbers suggest that the Kerala fish market has a C_{transfer} of at least 0.1. [4]

It’s striking that these disparate methods all give approximately the same value. For our calculation, we’ll conservatively take C_{transfer}=0.1, the lowest of the three estimates.

Now, we just need to estimate the reduction in random wealth transfers, \Delta V_{transfer}, for a famous HFT network — The NY-Chicago microwave route seems like a good example. We’ll just make a simple, order-of-magnitude estimate. It’s reasonable to approximate \Delta V_{transfer} with the total executed volume (V) for products with multiple important trading centers connected by the route, multiplied by the typical price dispersion (\sigma_{\Delta}) prevented by a reduction in inter-market latency. The CME traded over a quadrillion dollars notional in 2015. Some of that volume is “inflated” by contracts with a high notional value (e.g. options, Eurodollars) or may not be tightly coupled with away markets — so let’s cut it by 90% and say that V \approx \$10^{14} yr^{-1} is the volume in Chicago that’s highly-correlated with important markets in New York or Europe. [5][6] We can estimate \sigma_{\Delta} by making the usual assumption that price movements are Brownian, so that volatility scales with the square root of time. Let’s just roughly assume that the relevant CME contracts have an annual volatility of 10%. This makes it easy to calculate how much their prices typically move in the ~2.5ms that wireless networks save over fiber (one-way): \sigma_{\Delta} = 0.1 yr^{-\frac{1}{2}} \sqrt{2.5 ms \cdot \frac{1.6 \times 10^{-10} yr}{ms}} \approx 2 \times 10^{-6} [7]

So we have that \Delta V_{transfer} \approx V \cdot \sigma_{\Delta} \approx \$10^{14} yr^{-1} \cdot 2 \times 10^{-6}. In other words, upgrading the Chicago-NY route from fiber to microwave prevents around $200M of arbitrarily transferred wealth per year. Multiplying this by our estimated economic cost per unit transfer, C_{transfer}=0.1, gives the societal value of this microwave route: about $20M per year.
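The whole back-of-envelope chain is simple enough to reproduce in a few lines. A sketch, where every input is one of the rough assumptions stated above (not a measured quantity):

```python
# Back-of-envelope estimate of the societal value of the Chicago-NY
# microwave route. All inputs are the rough assumptions from the text.

import math

TRADING_YEAR_MS = 250 * 7 * 3600 * 1000   # ~250 trading days of ~7 hours
annual_vol = 0.10                          # assumed annual volatility
latency_saved_ms = 2.5                     # microwave vs. fiber, one-way
V = 1e14                                   # $/yr of tightly-coupled volume
C_transfer = 0.1                           # economic cost per unit transfer

# Brownian scaling: price dispersion over the saved latency
sigma_delta = annual_vol * math.sqrt(latency_saved_ms / TRADING_YEAR_MS)

delta_V_transfer = V * sigma_delta         # prevented wealth transfers, $/yr
societal_value = C_transfer * delta_V_transfer

print(f"sigma_delta ~ {sigma_delta:.1e}")
print(f"transfers prevented ~ ${delta_V_transfer / 1e6:.0f}M/yr")
print(f"societal value ~ ${societal_value / 1e6:.0f}M/yr")
```

With these assumptions, the output reproduces the ~$200M of prevented transfers and ~$20M of societal value per year.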

Comparing a Network’s Value to its Cost

Laughlin, Aguirre, and Grundfest estimate that Chicago-NY microwave networks required around $140M in capital expenditures, with $20M per year in operating expenses. Our estimate of the route’s value is very rough, but still surprisingly close to their (also rough) numbers. And I imagine that some fraction of the costs, which include things like salaries and radio licensing fees, are not a complete loss to the economy. So, it’s quite plausible that these microwave networks are “worth it” to society.

There’s also the possibility that the networking technology developed for HFTs will have beneficial applications in other industries, many of which are latency-sensitive. Finance has a history of subsidizing important innovations. According to Rocky Kolb, the first application of the telescope was spotting cargo ships and using that information to trade. Galileo himself described how impressed the Venetian leaders, who rewarded him, were with this application:

And word having reached Venice that I had made one [a spyglass], it is six days since I was called by the Signoria, to which I had to show it together with the entire Senate, to the infinite amazement of all; and there have been numerous gentlemen and senators who, though old, have more than once climbed the stairs of the highest campaniles in Venice to observe at sea sails and vessels so far away that, coming under full sail to port, two hours and more were required before they could be seen without my spyglass.

If — in my armchair — I were put in charge of the economy and somebody came to me with a $200M project to cut inter-market latency by 3ms, I really wouldn’t know whether it was a good idea. Sure, I could do a more thorough calculation than we did here. But I bet the best decision for the economy would be based on whether people would actually pay to use the network. That’s what telecoms do already. Capital markets sometimes invest in dead-end projects, but identifying these mistakes, in advance, is rarely easy.

Reducing latency from 2 weeks to 1 second has obvious benefits to society. Going from 7ms to 4ms is more subtle. [8] But just because progress is incremental, doesn’t mean we should dismiss its value. The economy is big, and our markets process tremendous volumes. A small improvement in price discovery can make a meaningful difference.

[1] Typical loss ratios for property and casualty insurance in 2015 were about 69% (Table 1). For accident and health insurance, they were about 80% (Figures 10 and 11).

[2] Of course, buyers of insurance may be especially risk-averse and willing to pay high premiums to avoid catastrophes. But hopefully competition and regulation keeps exploitation to a minimum.

[3] Jensen estimates that consumer surplus (the welfare benefit to fish-buyers) increased by 6%, a bit bigger than the decline in fish prices.

[4] Also keep in mind that prices were still not perfectly synchronized after the introduction of mobile phones. Perhaps profits would increase and prices decrease even more if fishing boats had automated, low-latency routing. They almost certainly would if new technology made boats faster.

[5] It isn’t just S&P 500 futures in Chicago that are highly-correlated with markets in NY and Europe. You can make obvious arguments to include FX and fixed income futures in this category. Energy pricing is critically important for a wide range of asset markets. Agricultural products and metals also trade around the world, and can provide important trading signals for some equities.

[6] I don’t know whether the current lowest latency route between Chicago and Europe goes through New York. It probably depends on whether Hibernia allows customers to connect at their Halifax landing, and use their own microwave networks to shuttle data between Halifax and Chicago. I’m far from an expert on such things, but Alexandre Laumonier suggests that Hibernia may have restricted connections at a different landing:

Different informants in the industry (and one journalist) told me that Hibernia will not allow (at least for now) dishes at the Brean landing station. I tried to know more about that but the only answers I got were some “neither confirm nor deny” responses. Huh! People know but don’t talk. I wrote an email to Hibernia but I got zero answer (obviously). Then other informants told me Hibernia may finally allow dishes…

It’d be pretty funny if Hibernia let customers connect at Halifax instead of New York or Chicago, but charged extra for the ostensibly inferior service (like airlines do).

In any case, even if there are proprietary wireless networks from Chicago to Halifax, they probably share some towers with the Chicago-NY route. So maybe it’s fair to include Chicago-Europe traffic in our estimate of the economic value of Chicago-NY microwave networks.

[7] Assuming there are ~250 trading days per year, and each trading day is ~7 hours — so there are about 1.6 \times 10^{-10} trading years per millisecond.

[8] Matt Levine describes (as many do) “a market as a giant distributed computer for balancing supply and demand; each person’s preferences are data, and their interaction is the algorithm that creates prices and quantities.” This analogy may be helpful for understanding high-speed trading. A supercomputer’s performance can depend heavily on its interconnect. If the market is a giant supercomputer, reducing its interconnect latency from 7ms to 4ms could dramatically increase its processing power — for some tasks by 40%. For such tasks, we’d expect a large increase in inter-node traffic when latency is improved. Perhaps we are seeing this increase in modern financial markets, which have far higher trade and message volumes than in the past.

Price Impact in Efficient Markets

Market prices generally respond to an increase in supply or demand. This phenomenon, called “price impact,” is of central importance in financial markets. Price impact provides feedback between supply and demand, an essential component of the price discovery mechanism. Price impact also accounts for the vast majority of large traders’ execution costs — costs which regulators may seek to reduce by tweaking market structure.

Price impact is a concave function of meta-order [1] size — approximately proportional to the square-root of meta-order size — across every well-measured financial market (e.g. European and U.S. equities, futures, and bitcoin). There are some nice models that help explain this universality, most of which require fine-grained assumptions about market dynamics. [2] But perhaps various financial markets, regardless of their idiosyncrasies, share emergent properties that could explain empirical impact data. In this post, I try to predict price impact using only conjectures about a market’s large-scale statistical properties. In particular, we can translate intuitive market principles into integral equations. Some principles, based on efficiency arguments, imply systems of equations that behave like real markets.

In part I, we’ll start with the simplest principles, which we’ll only assume to hold on average: the “fair pricing condition”, and that market prices efficiently anticipate the total quantity of a meta-order based on its quantity already-executed. In part II, we’ll replace the fair pricing condition with an assumption that traders use price targets successfully, on average. In part III, we’ll return to fair pricing, but remove some efficiency from meta-order anticipation — by assuming that execution information percolates slowly into the marketplace. In part IV, we’ll emulate front-running, by doing the opposite of part III: leaking meta-orders’ short-term execution plans into the market. In parts V and VI, we’ll discuss adding the notion of urgency into meta-orders.

Definitions and Information-Driven Impact

We can motivate price impact from a supply and demand perspective. During the execution of a large buyer’s meta-order, her order flow usually changes the balance between supply and demand, inducing prices to rise by an amount called the “temporary price impact.” After the buyer is finished, she keeps her newly-acquired assets off the market, until she decides to sell. This semi-permanent reduction in supply causes the price to settle at a new level, which is higher than the asset’s initial price by an amount called the “permanent price impact.” Changes in available inventory cause permanent impact, and changes in flow (as well as inventory) cause temporary impact. [3]

Another view is that informed trading causes permanent impact, and that uncertainty about informedness causes temporary impact. When a trader submits a meta-order, its permanent impact should correspond in some fashion to her information. And its temporary impact should correspond to the market-estimate of permanent impact. In an “efficient” market, the informational view and the supply/demand view should be equivalent.

Before we proceed, we need some more definitions. Define \alpha(q) as the typical permanent price impact associated with a meta-order of quantity q. By “typical”, I mean that \alpha(q) is the expectation value of permanent impacts, \alpha_{s}(q), associated with a situation, s, in the set of all possible situations and meta-orders, S. Specifically, \alpha(q) = \mathbf{E}_{s \in S}[\alpha_{s}(q)]. It’s reasonable to associate the colloquial term “alpha” — which describes how well a given trade (s) predicts the price — with \alpha_{s}.

Also define \mathcal{I}(q) as the typical temporary price impact after a quantity q has been executed. Again, “typical” means \mathcal{I}(q) = \mathbf{E}_{s \in S}[\mathcal{I}_{s}(q)].

These expectations can be passed through the integrals discussed below, so we don’t need to pay attention to them. In the rest of this post, “permanent impact” will refer to the expectation \alpha(q)=\mathbf{E}_{s \in S}[\alpha_{s}(q)] unless otherwise specified (and likewise for “temporary impact” and \mathcal{I}(q)).

I. A Bare-Bones Model

Starting from two assumptions of market efficiency, we can determine the typical price-trajectory of meta-orders. The two conditions are:

I.1) The “fair pricing condition,” which equates traders’ alpha with their execution costs from market-impact (on average):

\alpha(q) = \frac{1}{q} \int_{0}^{q} \mathcal{I}(q') dq'

The integral denotes the quantity-averaged temporary impact “paid” over the course of an entire meta-order. “Fair pricing” means that, in aggregate, meta-orders of a given size do not earn excess returns or below-benchmark returns.

Temporary price impact (black line) over the course of a meta-order of size q. After the execution is finished, the price impact decays (dashed line) to \alpha(q) (red), the quantity-weighted average of the meta-order’s temporary impact trajectory

I.2) Efficient linkage between temporary and permanent price impact:

\mathcal{I}(q') = \mathbf{E}_{q}[\alpha(q)|q \geq q'] = \int_{q'}^{\infty} \alpha(q)p[q|q \geq q']dq


Here, p[q] is the PDF of meta-order sizes and P[q] is the CDF. And p[q|q \geq q'] is the truncated probability density of meta-order sizes, \frac{p[q]}{1 - P[q']} — which represents the probability distribution of q given that quantity q' from the meta-order has already executed. This condition means that, on average, “the market” observes how much quantity an anonymous trader has executed, and uses that to estimate the distribution of her meta-order’s total quantity. “The market” then calculates an expectation value of the meta-order’s alpha, which sets the current clearing price (i.e. temporary impact). [4] [5] To emphasize, only the average temporary impact is determined this way; a single meta-order could have much different behavior. Here’s a heuristic example:
A. A trader is buying a lot of IBM stock and has so far bought a million shares.
B. The rest of the market sees signs (like higher price and volume) of that purchase and knows roughly that somebody has bought a million shares.
C. Once a trader has bought a million shares, there is a 50% chance that she’ll buy 5 million in total, and a 50% chance that she’ll buy 10 million. “The market” knows these probabilities.
D. For 5 million share meta-orders, the typical permanent price impact is 1%, and for 10 million share meta-orders it’s 2%. So “the market” expects our trader’s meta-order to have permanent impact of 1.5%. The *typical* temporary impact is determined by this expectation value. This particular meta-order may have temporary impact smaller or larger than 1.5%, but meta-orders sent under similar circumstances will have temporary impact of 1.5% on average.
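Step D is just a conditional expectation. A toy computation, using only the illustrative numbers from the example above:

```python
# The heuristic example as a conditional expectation: once the market
# knows a trader has bought 1M shares, it averages permanent impact over
# the possible total meta-order sizes. Numbers are the illustrative ones
# from the text, not estimates.

scenarios = [
    # (probability, total shares, permanent impact)
    (0.5, 5_000_000, 0.01),
    (0.5, 10_000_000, 0.02),
]

expected_impact = sum(prob * alpha for prob, _, alpha in scenarios)
print(expected_impact)  # 0.015, i.e. the 1.5% typical temporary impact
```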



An illustration of this linkage. The temporary price impact trajectory is the black line. At a given value of q', \mathcal{I}(q') is equal to the expected value (blue) of the permanent price impact given that the meta-order has size q' or bigger. The probability density of the final meta-order size, p[q|q \geq q'], is shown in shaded red. The permanent impact associated with those meta-order sizes is shown in green.

Relationship with Efficiency

The fair pricing condition could emerge when the capital management industry is sufficiently competitive. If a money manager uses a trading strategy that’s profitable after impact costs, other managers could copy it and make money. The strategy would continue to attract additional capital, until impact expenses balanced its alpha. (Some managers are protective of their methods, but most strategies probably get replicated eventually.) If a strategy ever became overused, and impact expenses overwhelmed its alpha, then managers would probably scale back or see clients pull their money due to poor performance. Of course these processes take time, so some strategies will earn excess returns post-impact and some strategies may underperform — fair pricing would hold so long as they average out to a wash.

A strictly stronger condition than I.2) should hold in a market where meta-orders are assigned an anonymous ID, and every trade is instantly reported to the public with its meta-order IDs disclosed. Farmer, Gerig, Lillo, and Waelbroeck call a similar market structure the “colored print” model. Under this disclosure regime, if intermediary profits are zero, the expected alpha would determine the temporary impact path of individual meta-orders, not just the average \mathcal{I}(q') as in I.2). All meta-orders would have the same impact path: \mathcal{I}_{s}(q') = \mathbf{E}_{q}[\alpha(q)|q \geq q'] = \int_{q'}^{\infty} \alpha(q)p[q|q \geq q']dq for any s. [6] Now, the colored print model doesn’t seem very realistic; most markets don’t have anywhere near that level of transparency. Nonetheless, Farmer et al. show preliminary measurements that partly support it. [7]

Even without colored prints, the linkage property I.2) could be due to momentum and mean-reversion traders competing away their profits. As discussed by Bouchaud, Farmer, and Lillo, most price movement is probably caused by changes in supply and demand. That is, if prices move on increased volume, it’s likely that someone activated a large meta-order, especially if there hasn’t been any news. So, if average impact overshot I.2) significantly, a mean-reversion trader could plausibly watch for these signs and profitably trade opposite large meta-orders. Likewise, if average impact undershot I.2), momentum traders might profit by following the price trend.

Solving the System of Equations

We can combine I.1) and I.2) to get an ODE [8]:

\alpha''(q) + (\frac{2}{q} - \frac{p[q]}{1-P[q]})\alpha'(q) = 0

This ODE lets us compute \alpha(q) and \mathcal{I}(q) for a given meta-order size distribution, p[q].

It’s common to approximate p[q] as a Pareto[q_{min},\beta] distribution (p[q] = \frac{\beta q_{min}^{\beta}}{q^{\beta+1}}). If we do so, then \frac{p[q]}{1-P[q]} = \frac{\beta}{q}, and the ODE has solution \alpha(q) = c_1 q^{\beta-1}+c_2. Equation I.1) implies \mathcal{I}(q) = \alpha(q)+q\alpha'(q), so we have that \mathcal{I}(q) = c_1 \beta q^{\beta-1} + c_2. Impact should nearly vanish for small q, so we can say that c_2 \approx 0. The post-execution decay in impact is then given by \frac{\alpha(q)}{\mathcal{I}(q)} = \frac{1}{\beta}.

If we choose \beta = \frac{3}{2} (roughly in-line with empirical data), we get the familiar square-root law: \mathcal{I}(q) \propto \sqrt{q}. We also get an impact ratio of \frac{\alpha(q)}{\mathcal{I}(q)} = \frac{2}{3}, very close to real-world values.
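It’s easy to verify numerically that this power-law solution really does satisfy both conditions. A quick check (mine, not code from any of the cited papers), dropping the arbitrary constant c_1:

```python
# Check that alpha(q) = q^(beta-1) and I(q) = beta * q^(beta-1) satisfy
# the fair pricing condition I.1) and the linkage condition I.2) when
# q ~ Pareto[q_min, beta].

from scipy.integrate import quad

beta, q_min = 1.5, 1.0

alpha = lambda q: q ** (beta - 1)
impact = lambda q: beta * q ** (beta - 1)            # alpha + q * alpha'
pdf = lambda q: beta * q_min**beta / q ** (beta + 1)
survival = lambda q: (q_min / q) ** beta             # 1 - P[q]

for q in [2.0, 5.0, 20.0]:
    # I.1) fair pricing: alpha(q) = (1/q) * integral_0^q I(q') dq'
    assert abs(quad(impact, 0, q)[0] / q - alpha(q)) < 1e-6 * alpha(q)
    # I.2) linkage: I(q') = E[alpha(q) | q >= q']
    cond = quad(lambda x: alpha(x) * pdf(x), q, float("inf"))[0] / survival(q)
    assert abs(cond - impact(q)) < 1e-6 * impact(q)

print("impact ratio alpha/I =", 1 / beta)            # 2/3 for beta = 1.5
```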

A similar method from Farmer, Gerig, Lillo, and Waelbroeck gives the same result. They use the fair pricing condition, but combine it with a competitive model of market microstructure. [9] Here, instead of having a specific model of a market, we’re making a broad assumption about efficiency with property I.2). There may be a large class of competitive market structures that have this efficiency property.

Distribution of Order Sizes Implied by a Given Impact Curve

Under this model, knowing an asset’s price elasticity (\mathcal{I}(q)) is equivalent to knowing its equilibrium meta-order size distribution (p[q]). [10] If a market impact function \mathcal{I}(q) is assumed, we can calculate the meta-order size distribution. [11] For instance, Zarinelli, Treccani, Farmer, and Lillo are able to better fit their dataset with an impact function of the form \mathcal{I}(q) = a Log_{10}(1+bq) (p. 17). This impact curve implies a p[q] that’s similar to a power-law, but with a slight bend such that its tail decays more slowly than its bulk:


Meta-order size distribution implied by the impact curve \mathcal{I}(q) = 0.03 Log_{10}(1+470q), which Zarinelli, Treccani, Farmer, and Lillo fit to their dataset of single-day meta-orders. In this case, q would be analogous to their chosen measure of size, the daily volume fraction \eta. The impact function’s fit might be invalid for very large meta-orders (q \approx 1), so the lack of a sharp cutoff near q \approx 1 in the implied size distribution isn’t problematic.
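The inversion can be sketched directly: differentiating I.2) gives the hazard rate of the size distribution, \frac{p[q]}{1-P[q]} = \frac{\mathcal{I}'(q)}{\mathcal{I}(q) - \alpha(q)}, with \alpha(q) recovered from fair pricing I.1). Here is my own numerical sketch of that inversion, treating the quoted fit constants (a=0.03, b=470) as given:

```python
# Implied meta-order size distribution from an assumed impact curve.
# Derivation (mine): differentiating the linkage condition I.2) gives
#   p[q] / (1 - P[q]) = I'(q) / (I(q) - alpha(q)),
# with alpha(q) the fair-pricing running average of I.

import numpy as np
from scipy.integrate import cumulative_trapezoid

a, b = 0.03, 470.0                       # fit constants quoted in the text

qs = np.logspace(-4, 0, 2000)            # daily volume fractions
I = a * np.log10(1 + b * qs)             # assumed impact curve

# alpha(q) from fair pricing: running average of I, with a small linear
# correction for the integral below the left edge of the grid
cum_I = cumulative_trapezoid(I, qs, initial=0.0) + 0.5 * I[0] * qs[0]
alpha = cum_I / qs

# Hazard rate of the implied size distribution
hazard = np.gradient(I, qs) / (I - alpha)

# Survival function 1 - P[q] and implied density p[q]
log_surv = -cumulative_trapezoid(hazard, qs, initial=0.0)
p = np.exp(log_surv) * hazard

# Local Pareto exponent q * hazard: ~2 in the near-linear bulk, drifting
# toward ~1 in the logarithmic tail -- a slower-decaying tail, as claimed
print(qs[0] * hazard[0], qs[-1] * hazard[-1])
```

As a consistency check, feeding a pure square-root impact curve through the same formula gives a hazard of \frac{3}{2q}, i.e. exactly the Pareto \beta = \frac{3}{2} of part I.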

II. A Replacement Principle for Fair Pricing: Traders’ Effective Use of Price Targets

The two integral equations in part I can be modified to accommodate other market structure principles. There’s some evidence that our markets obey the fair pricing condition, but it’s fun to consider alternatives. One possibility is that traders have price targets, and cease execution of their meta-orders when prices approach those targets. We can try replacing the fair pricing of I.1) with something that embodies this intuition:

II.1) \alpha(q) = a\mathcal{I}(q) + d

Where a and d are constants. This principle should be true when traders follow price-target rules, and their targets accurately predict the long-term price (on average). If d=0 and a=\frac{5}{4}, then traders typically stop executing when the price has moved \frac{4}{5} of the way from its starting value to its long-term value. If a=1 and d=0.01, then traders stop executing when the price is within 1% of its long-term value.

If we keep I.2), this gives the ODE:

\alpha'(q) + \frac{p[q](a-1)}{1 - P[q]}\alpha(q) + \frac{p[q]d}{1 - P[q]}=0

It’s readily solved. [12] In particular, if q \sim Pareto[q_{min},\beta] and a \neq 1 :

\alpha(q) = c q^{\beta (1-a)}+\frac{d}{1-a} and \mathcal{I}(q) = \frac{c q^{\beta (1-a)}+\frac{d}{1-a}-d}{a}.

For typical values of \beta \approx 1.5, we can get the usual square-root law by setting a \approx \frac{2}{3}. We need 1-\frac{1}{\beta} < a < 1 (i.e. \frac{1}{3} < a < 1 when \beta = \frac{3}{2}) in order for impact to be a concave, increasing function of order size, in agreement with empirical data. This suggests that perhaps traders do employ price targets, only instead of being conservative, their targets are overly aggressive. In other words, this model gives a realistic concave impact function if traders are overconfident and think their information is worth more than it is. [13] More generally, the partial reversion of impact after meta-orders’ completion could be explained with overconfidence. And when the “average” trader is overconfident just enough to balance out her alpha, the market will obey the fair pricing condition. I think there’s more to fair pricing than overconfidence, but this link between human irrationality and market efficiency is intriguing.
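As in part I, we can check numerically (my own verification, with illustrative constants c=1 and d=0.01) that this solution is consistent with the linkage condition I.2):

```python
# Verify that the part-II solution alpha(q) = c*q^(beta(1-a)) + d/(1-a),
# with I(q) = (alpha(q) - d)/a from the price-target condition II.1),
# satisfies the linkage condition I.2) under a Pareto size distribution.
# Constants c and d are illustrative, not fitted values.

from scipy.integrate import quad

beta, q_min = 1.5, 1.0
a, d, c = 2.0 / 3.0, 0.01, 1.0

alpha = lambda q: c * q ** (beta * (1 - a)) + d / (1 - a)
impact = lambda q: (alpha(q) - d) / a     # from II.1): alpha = a*I + d

pdf = lambda q: beta * q_min**beta / q ** (beta + 1)
survival = lambda q: (q_min / q) ** beta  # 1 - P[q]

for q in [2.0, 5.0, 20.0]:
    # I.2) linkage: I(q') = E[alpha(q) | q >= q']
    cond = quad(lambda x: alpha(x) * pdf(x), q, float("inf"))[0] / survival(q)
    assert abs(cond - impact(q)) < 1e-5

print("price-target solution satisfies the linkage condition")
```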

III. A Replacement Principle for Efficient Linkage, with Delayed Dissemination of Information

We can also think about alternatives for I.2). In I.2), “the market” could immediately observe the already-executed quantity of a typical meta-order. But markets don’t instantly process new information, so perhaps the market estimate of meta-orders’ already-executed quantity is delayed:

III.2) \mathcal{I}\left(q'\right) = \mathbf{E}_{q}[\alpha(q)|q \geq (q'-q_d)^+] = \frac{\int_{(q'-q_d)^+}^{\infty } p[q] \alpha (q) \, dq}{1-P[(q'-q_d)^+]}

Where q_d is a constant and (q'-q_d)^+ is the positive part of (q'-q_d): max(0,(q'-q_d)).
This condition should be true when the market (on average) is able to observe how much quantity an anonymous trader executed in the past, when her executed quantity was q_d less than it is in the present. This information can be used to estimate the distribution of her meta-order’s total size, and thus an expectation value of its final alpha. The temporary impact is set by this expectation value.

Intuitively, small meta-orders may blend in with background activity, but large ones are too conspicuous. If someone sends two 100-share orders to buy AAPL, other traders won’t know (or care) whether those orders came from one trader or two. But if a large buyer is responsible for a third of the day’s volume, other traders will notice and have a decent estimate of the buyer’s already-executed quantity, even if they don’t know whether the buyer was involved in the most recent trades on the tape. So, it’s very plausible for market participants to have a quantity-lagged, anonymized view of each other’s trading activity.

Combining III.2) with fair pricing I.1) gives the delay differential equation [14]:

\begin{cases} q \alpha ''(q) + \alpha '(q) \left(2-\frac{q p[q-q_d]}{1-P[q-q_d]}\right)-\left(\alpha (q)-\alpha (q-q_d)\right)\frac{p[q-q_d]}{1-P[q-q_d]}=0, & \mbox{if } q \geq q_d \\ \mathcal{I}(q)=\alpha(q)=constant, & \mbox{if } q < q_d \end{cases}.

We can solve it numerically [15]:


\mathcal{I}(q) and \alpha(q) when q is Pareto-distributed, for several values of q_d. The general behavior for q \gg q_d is similar to that of q_d=0, as in I.


The impact ratio \frac{\alpha(q)}{\mathcal{I}(q)} for several values of q_d. This ratio is 1 when the price does not revert at all post-execution, and 0 when the price completely reverts.

I gather that fundamental traders don’t like it when the price reverts on them, so some may want this impact ratio to be close to 1. Delayed information dissemination helps accomplish this goal when meta-orders are smaller than what can be executed within the delay period. But traders experience bigger than usual reversions if their meta-orders are larger than q_d. This behavior is intuitive: if a meta-order has executed a quantity less than q_d, other traders will have zero information about it and can’t react. But as soon as its executed quantity reaches q_d, the market is made aware that somebody is working an unusually big meta-order, and so the price moves considerably.
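As a sanity check on the claim that the q \gg q_d behavior matches part I, we can plug the part-I power law into the delay differential equation and confirm that the residual shrinks like \frac{q_d}{q}. This is my own check, taking a pure Pareto hazard \frac{\beta}{x} and ignoring boundary effects near q_{min}:

```python
# Plug the part-I solution alpha(q) = q^(beta-1) into the delay
# differential equation and check that, for q >> q_d, the residual is
# small relative to the equation's terms. (A sanity check I wrote, not
# code from the original numerical solution.)

beta, q_d = 1.5, 1e-3

def hazard(x):                       # Pareto hazard p[x]/(1-P[x]) = beta/x
    return beta / x

def residual(q):
    alpha = lambda x: x ** (beta - 1)
    d_alpha = (beta - 1) * q ** (beta - 2)
    dd_alpha = (beta - 1) * (beta - 2) * q ** (beta - 3)
    h = hazard(q - q_d)
    return (q * dd_alpha
            + d_alpha * (2 - q * h)
            - (alpha(q) - alpha(q - q_d)) * h)

for q in [0.1, 1.0, 10.0]:
    scale = abs((beta - 1) * q ** (beta - 2) * 2)   # size of the alpha' term
    assert abs(residual(q)) / scale < 10 * q_d / q  # shrinks like q_d/q

print("power-law solution satisfies the DDE up to O(q_d/q) corrections")
```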

Some bond traders are pushing for a longer delay in trade reporting. One rationale is that asset managers could execute meta-orders during the delay period, before other traders react and move the market. The idea feels superficially like condition III.2), but isn’t a perfect analogy, because counterparties still receive trade confirmations without delay. And counterparties do use this information to trade. [16] So, delaying prints may not significantly slow the percolation of traders’ information into the marketplace, it just concentrates that information into the hands of their counterparties. Counterparties might provide tighter quotes because of this informational advantage, but only if liquidity provision is sufficiently competitive. [17]

In theory, it’s possible for market structure to explicitly alter q_d. [18] An exchange could delay both prints and trade confirmations, while operating, on behalf of customers, execution algorithms which do not experience a delay. This was the idea behind IEX’s defunct router, which would have been able to execute aggressive orders against its hidden order book and route out the remainder before informing either counterparty about the trades. The router would’ve increased the equity market’s q_d by the resting size on IEX’s hidden order book, which (I’m guessing) is very rarely above $100k notional — an amount that doesn’t really move the needle for large fundamental traders, especially since orders larger than q_d experience significant price reversion. Regardless, it’s interesting to think about more creative ways of giving exchange execution algorithms an informational advantage. The general problem with such schemes is that they are anti-competitive; brokers would have to use the advantaged exchange algos, which could command exorbitant fees and suffer from a lack of innovation. [19]

IV. A Replacement Principle for Efficient Linkage, with Information Leakage from Sloppy Trading or Front-Running

In III., we altered condition I.2) so that market prices responded to meta-orders’ executions in a lagged fashion. We can try the same idea in reverse to see what happens if market prices adjust to meta-orders’ future executed quantity:


IV.2) \mathcal{I}\left(q_{tot},q_{executed}\right) = \begin{cases} \mathbf{E}_{q}[\alpha(q)|q \geq q_{executed}+q_{FR}] = \frac{\int_{q_{executed}+q_{FR}}^{\infty } p[q] \alpha (q) \, dq}{1-P[q_{executed}+q_{FR}]}, & \mbox{if } q_{executed}<q_{tot}-q_{FR} \\ \mathbf{E}_{q}[\alpha(q)|q=q_{tot}] = \alpha \left(q_{tot}\right), & \mbox{if } q_{executed}\geq q_{tot}-q_{FR} \end{cases}

Where \mathcal{I}\left(q_{tot},q_{executed}\right) is the temporary impact associated with a meta-order that has an already-executed quantity of q_{executed} and a total quantity of q_{tot}. q_{FR} is a constant. On average, a meta-order’s intentions are partly revealed to the market, which “knows” not only the meta-order’s already-executed quantity, but also whether it will execute an additional quantity q_{FR} in the future. If a meta-order will execute, in total, less than q_{executed}+q_{FR}, the market knows its total quantity exactly. “The market” uses this quantity information to calculate the meta-order’s expected alpha, which determines the typical temporary impact.

This condition may be an appropriate approximation for several market structure issues:

A. The sloppy execution methods described in “Flash Boys”: If a sub-par router sends orders to multiple exchanges without timing them to splash-down simultaneously, then “the market” may effectively “know” that some of the later orders are in-flight, before they arrive. If most fundamental traders use these sloppy routing methods (as “Flash Boys” claims), then we might be able to describe the market’s behavior with a q_{FR} approximately equal to the typical top-of-book depth.
B. Actual front-running: E.g., if fundamental traders split up their meta-orders into $10M pieces, and front-running brokers handle those pieces, the market will have a q_{FR} \approx \$ 10M. Though, brokers know their customers’ identities, so they may be able to predict a customer’s permanent impact with better precision than this model allows.
C. Last look: During the last-look-period, a fundamental trader’s counterparty can wait before finalizing the trade. If the fundamental trader sends orders to other exchanges during this period, her counterparty can take those into account when deciding to complete the trade. This is similar to A., except traders can’t avoid the information leakage by synchronizing their orders.

We can examine the solutions of this version of condition 2). Combining it with the fair pricing condition I.1) gives, for meta-orders with q_{tot}>q_{FR}: [20]

\alpha '(q_{tot}) \left(2-\frac{(q_{tot}-q_{FR}) p[q_{tot}]}{1-P[q_{tot}]}\right)+(q_{tot}-q_{FR}) \alpha ''(q_{tot})=0

If q_{tot} \sim Pareto[q_{min},\beta], this has solution:

\alpha (q_{tot}) = c_1 + c_2 (q_{tot}-q_{FR}){}^{\beta-1} \, _2F_1(1-\beta,-\beta;2-\beta;\frac{q_{FR}}{q_{FR}-q_{tot}})

For q_{tot} \gg q_{FR}, the _2F_1(...) \approx 1, so \alpha (q_{tot}) \approx c_1 + c_2 q_{tot}^{\beta-1}, which is the same behavior we saw in the base model of part I.
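This limit is easy to verify numerically with scipy's hyp2f1, using (for illustration) the same constants c_1=0 and c_2=1 as the plot below:

```python
import numpy as np
from scipy.special import hyp2f1

# Large-q_tot behavior of the solution: the hypergeometric factor tends
# to 1, recovering alpha ~ c1 + c2 * q_tot**(beta - 1).
beta, q_FR = 1.5, 1e-4
c1, c2 = 0.0, 1.0

def alpha(q_tot):
    z = q_FR / (q_FR - q_tot)
    return c1 + c2 * (q_tot - q_FR)**(beta - 1.0) * \
        hyp2f1(1.0 - beta, -beta, 2.0 - beta, z)

for q_tot in [1e-3, 1e-2, 1e-1]:
    approx = c1 + c2 * q_tot**(beta - 1.0)
    print(q_tot, alpha(q_tot) / approx)   # ratio approaches 1 as q_tot grows
```

By q_{tot} = 10^{3} q_{FR}, the ratio to c_1 + c_2 q_{tot}^{\beta-1} is already within about 1%.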

If we look at the solution’s behavior for q_{tot} \gtrsim q_{FR}, the story is quite different:


Permanent Impact and Peak-Temporary Impact when q_{tot} is slightly above q_{FR} = 10^{-4}, with constants c_1=0 and c_2=1. The temporary impact for a meta-order of size q_{tot} reaches its peak just before the meta-order’s end becomes known to the market, at q_{executed}=q_{tot}-q_{FR}. Peak-temporary impact goes negative when q_{tot} is sufficiently close to q_{FR}, but it’s possible to choose constants so that it stays positive (except at q_{tot}=q_{FR}, where it’s complex-valued). \alpha(q_{tot}), on the other hand, has a regular singular point at q_{tot}=q_{FR} and it is not possible to choose non-trivial constants such that \alpha(q_{tot}) is always positive. Temporary impact is calculated numerically via equation IV.2).

Under this model, meta-orders slightly larger than q_{FR} necessarily have negative long-term alpha. It’s possible that traders would adapt to this situation by never submitting meta-orders of that size, altering the Pareto-distribution of meta-order sizes so that no commonly-used q_{tot} is associated with negative alpha. But, it’s also possible that some traders would continue submitting orders that lose money in expectation. Market participants have diverse priorities, and long-term alpha is not always one of them.

V. Adding Time-Dependence

The model template above gets some general behavior right, but glosses over important phenomena in our markets. It makes no explicit mention of time, ignoring important factors like the urgency and execution rate of a meta-order. It’s not obvious how we could include these using only general arguments about efficiency, but we can imagine possible market principles and see where they lead.

For the sake of argument, say that every informed trading opportunity has a certain urgency, u, defined as the amount of time before its information’s value expires. For example, an informed trader may have a proprietary meteorological model which makes predictions 30 minutes before public forecasts are published. If her model predicts abnormal rainfall and she expects an effect on the price of wheat, she’d have 30 minutes to trade before her information becomes suddenly worthless. Of course, in real life she’d have competitors and her information would decay in value gradually over the 30 minutes, perhaps even retaining some value after it’s fully public. But let’s just assume that u is a constant for a given trading opportunity and see where it leads us.

If we try following a strict analogy with the time-independent model, we might write down these equations:

V.1) A “universal-urgency fair pricing condition,” that applies to meta-orders at every level of urgency:

\alpha(q,u) = \frac{1}{q} \int_{0}^{q} \mathcal{I}(q',u) dq'


This is a much stronger statement than ordinary fair pricing. It says that market-impact expenses equal alpha, on average, for meta-orders grouped by *any* given urgency. There are good reasons to expect this to be a bad approximation of reality — e.g. high-frequency traders probably constitute most short-urgency volume [21] and have large numbers of trades to analyze, so they can successfully tune their order sizes such that their profits are maximized (and positive). Perhaps some traders with long-urgency information submit orders that are larger than the capacity of their strategies, but I doubt HFTs do.

V.2) Efficient linkage between temporary and permanent price impact:

\mathcal{I}(q',u') = \mathbf{E}_{q,u}[\alpha(q,u)|q \geq q', u \geq u'] =\int_{u'}^{\infty}\int_{q'}^{\infty} \alpha(q,u)p[q,u|q \geq q', u \geq u']dqdu

Where p[q,u] is the PDF of meta-order sizes and urgencies, and P[q,u] is the CDF. p[q,u|q \geq q',u \geq u'] is the truncated probability distribution of meta-order sizes and urgencies, \frac{p[q,u]}{1 - P[q',\infty] - P[\infty,u'] + P[q',u']} — which represents the probability distribution of q and u given the knowledge that quantity q' from the meta-order has already executed in time u'. This is similar to the time-independent efficient linkage condition I.2). For example, a trader splits her meta-order into chunks, executing 1,000 shares per minute starting at 9:45. If she is still trading at 10:00, “the market,” having observed her order-flow imbalance, will “know” that her meta-order is at least 15,000 shares and has an urgency of at least 15 minutes. “The market” then calculates the expected alpha of the meta-order given these two pieces of information, which determines the average temporary impact.
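To make the truncated expectation concrete, here's a toy evaluation assuming (purely for illustration) that q and u are independent, with q \sim Pareto, u \sim Exponential, and \alpha(q,u) = q^{\beta-1}e^{-u}:

```python
import numpy as np
from scipy.integrate import quad

# Toy evaluation of V.2)'s conditional expectation. The distributional
# choices and alpha(q,u) here are assumptions, not the model's solution.
q_min, beta, lam = 1e-4, 1.5, 2.0

def cond_alpha(qp, up):
    # independence lets the double integral factor into two 1-D integrals
    num_q, _ = quad(lambda q: beta * q_min**beta * q**(-beta - 1.0)
                    * q**(beta - 1.0), qp, np.inf)
    num_u, _ = quad(lambda u: lam * np.exp(-lam * u) * np.exp(-u), up, np.inf)
    return (num_q / (q_min / qp)**beta) * (num_u / np.exp(-lam * up))

qp, up = 5e-4, 0.3
# closed form: beta*qp^(beta-1) times (lam/(lam+1))*exp(-up)
closed = beta * qp**(beta - 1.0) * (lam / (lam + 1.0)) * np.exp(-up)
print(cond_alpha(qp, up), closed)   # the two should agree
```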

We can combine these two equations to get a rather unenticing PDE. [22] As far as I can tell, its solutions are unrealistic. [23] Most solutions have temporary price impact that barely changes with varying levels of urgency. But in the real world, temporary impact should be greater for more urgent orders. The universal-urgency fair pricing here is too strong of a constraint on trader behavior. This condition means that markets don’t discriminate based on information urgency. Its failure suggests that markets do discriminate — and that informed traders, when they specialize in a particular time-sensitivity, face either a headwind or tailwind in their profitability.

VI. A Weaker Constraint

If we want to replace the universal-urgency of V.1) with something still compatible with ordinary fair pricing, perhaps the weakest constraint would be the following:

VI.1) \mathbf{E}_{u|q}[\alpha(q,u)] = \mathbf{E}_{u|q}[\frac{1}{q} \int_{0}^{q} \mathcal{I}(q',u) dq']

Which says that, for a given q, fair pricing holds on average across all u.
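Here's a toy example of the distinction: per-urgency fair pricing fails (some urgencies overpay, some underpay), but VI.1) holds after averaging over u. The functional forms are arbitrary assumptions, chosen only so that the alpha- and impact-shaping functions share a mean:

```python
import numpy as np
from scipy.integrate import quad

# Sketch of VI.1): fair pricing fails at each fixed urgency u but holds
# on average over u. All functional forms below are assumptions.
def f(u): return 1.0 + 0.5 * (u - 1.0)   # shapes alpha's u-dependence
def g(u): return 1.0 - 0.3 * (u - 1.0)   # shapes impact's u-dependence
def p_u(u): return np.exp(-u)            # u ~ Exp(1), so E[u] = 1

def alpha(q, u): return (2.0 / 3.0) * np.sqrt(q) * f(u)
def impact(q, u): return np.sqrt(q) * g(u)

def fair_pricing_gap(q, u):
    # per-urgency gap: alpha minus average temporary impact paid
    paid, _ = quad(lambda qp: impact(qp, u), 0.0, q)
    return alpha(q, u) - paid / q

q = 0.01
avg_gap, _ = quad(lambda u: fair_pricing_gap(q, u) * p_u(u), 0.0, np.inf)
print(fair_pricing_gap(q, 0.2), fair_pricing_gap(q, 2.0), avg_gap)
```

Low-urgency meta-orders here pay more in temporary impact than their alpha, high-urgency ones pay less, and the u-average of the gap vanishes — the situation sketched in the plot below.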

Requiring this, along with V.2), gives a large class of solutions. Many solutions have q -behavior similar to the time-independent model I, with u -behavior that looks something like this:


Stylized plot of permanent (\alpha) and temporary (\mathcal{I}) price impact vs urgency. Meta-orders of some urgencies pay more (on average) in temporary impact than they make in permanent impact, while meta-orders of other urgencies pay less than they make.

This weaker constraint leaves a great deal of flexibility in the shape of the market impact surface \mathcal{I}(q,u). Some of the solutions seem reasonable, e.g. for large u, \mathcal{I} could decay as a power of u. But there are plenty of unreasonable solutions too, so perhaps real markets obey a stronger form of fair pricing.


Price impact has characteristics that are universal across asset classes. This universality suggests that financial markets possess emergent properties that don’t depend too strongly upon their underlying market structure. Here, we consider some possible properties and their connection with impact.

The general approach is to think about a market structure principle, and write down a corresponding equation. Some of these equations, stemming from notions of efficiency, form systems which have behavior evocative of our markets. The simple system in part I combines the “fair pricing condition” with a linkage between expected short-term and long-term price impact. It predicts both impact’s size-dependence and post-execution decay with surprising accuracy. Fair pricing appears to agree with empirical equities data. The linkage condition is also testable. And, as discussed in part III, its form may weakly depend on how much and how quickly a market disseminates data. If we measure this dependence, we might further understand the effects of price-transparency on fundamental traders, and give regulators a better toolbox to evaluate the evolution of markets.

[1] A “meta-order” refers to a collection of orders stemming from a single trading decision. For example, a trader wanting to buy 10,000 lots of crude oil might split this meta-order into 1,000 child orders of 10 lots.

[2] There’s a good review and empirical study by Zarinelli et al. It has a brief overview of several models that can predict concave impact, including the Almgren-Chriss model, the propagator model of Bouchaud et al. and of Lillo and Farmer, the latent order book approach of Toth et al. and its extension by Donier et al., and the fair pricing and martingale approach of Farmer et al.

[3] Recall the “flow versus stock” (“stock” meaning available inventory) debate from the Fed’s Quantitative Easing programs, when people agonized over which of the two had a bigger impact on prices. E.g., Bernanke in 2013:

We do believe — although, you know, there’s room for debate — we do believe that the primary effect of our purchases is through the stock that we hold, because that stock has been withdrawn from markets, and the prices of those assets have to adjust to balance supply and demand. And we’ve taken out some of the supply, and so the prices go up, the yields go down.

For ordinary transactions, the “stock effect” is typically responsible for about two thirds of total impact (see, e.g., Figure 12). Central banks, though, are not ordinary market participants. But there are hints that their impact characteristics may not be so exceptional. Payne and Vitale studied FX interventions by the SNB. Their measurements show that the SNB’s price impact was a concave function of intervention size (Figure 2). The impact of SNB trades also appears to have partially reverted within 15-30 minutes, perhaps by about one third (Figures 1 and 2, Table 2). Though, unlike QE, these interventions were sterilised, so longer-term there shouldn’t have been much of a “stock effect” — and other participants may have known that.

[4] We can assume without loss of generality that the traders in question are buying (i.e. the meta-order sizes are positive). Sell meta-orders would have negative q, and the same arguments would apply, but with “\geq” replaced by “\leq“. Though, the meta-order size distribution for sell orders might not be symmetric to the distribution for buy orders (i.e. p[q] \neq p[-q]). Note that this model assumes that traders don’t submit sell orders when their intention is really to buy. There’s some debate over whether doing so would constitute market manipulation and I doubt it happens all that much, but that’s a discussion for another time.

[5] I’m being a little loose with words here. Say a meta-order in situation s has an already-executed quantity of q_{executed,s}, and the market-estimate of q_{executed,s} is \hat{q}_s. I.2) is not the same as saying that \mathbf{E}_{s \in S}[\hat{q}_s] = \mathbf{E}_{s \in S}[q_{executed,s}]. The market-estimate \hat{q}_s could be biased and I.2) might still hold. And I.2) could be wrong even if \hat{q}_s is unbiased.

[6] I’m being imprecise here. Intermediaries could differentiate some market situations from others, so we really should have: \mathcal{I}_{s_p}(q') = \mathbf{E}_{q}[\alpha_{S_p}|q \geq q'] = \int_{q'}^{\infty} \alpha_{S_p}(q)p[q|q \geq q']dq, where \alpha_{S_p} = \mathbf{E}_{s_p \in S_p}[\alpha_{s_p}(q)] is the average alpha for possible situations s_p given observable market conditions. E.g. average alpha increases when volatility doubles, and other traders know it — so they adjust their estimates of temporary impact accordingly. In this case, S_p is the set of meta-orders that could be sent when volatility is doubled. For this reason, and because impact is not the only cause of price fluctuations, the stronger “colored print” constraint wouldn’t eliminate empirically measured \mathbf{Var}_{s}[\mathcal{I}_{s}] — though it should dramatically reduce it.

[7] The draft presents some fascinating evidence in support of the colored print hypothesis. Using broker-tagged execution data from the LSE and an estimation method, the authors group trades into meta-orders. They then look at the marginal temporary impact of each successive child order from a given meta-order (call this meta-order M_{1}). In keeping with a concave impact-function, they find that M_{1}‘s child orders have lower impact if they’re sent later in M_{1}‘s execution. However, if another meta-order (M_{2}) is concurrently executing on the same side as M_{1}, M_{2}‘s child orders have nearly the same temporary impact, regardless of whether they occur early or late in the execution of M_{1} (p39-40). This means that “the market” is able to differentiate M_{1}‘s executions from M_{2}‘s!

I.2) might seem like a sensible approximation for real markets, but I’d have expected it to be pretty inaccurate when multiple large traders are simultaneously (and independently) active. There should be price movement and excess volume if two traders have bought a million shares each, but how could “the market” differentiate this situation from one where a single trader bought two million shares? It’s surprising, but the (draft) paper offers evidence that this differentiation happens. I don’t know what LSE market structure was like during the relevant period (2000-2002) — maybe it allowed information to leak — but it’s also possible that large meta-orders just aren’t very well camouflaged. A large trader’s orders might be poorly camouflaged, for example, if she has a favorite order size, or submits orders at regular time-intervals. In any case, if a meta-order is sufficiently large, its prints should effectively be “colored” — because it’s unlikely that an independent trading strategy would submit another meta-order of similar size at the same time.

[8] A. Take a \frac{d}{dq} of I.1): \mathcal{I}(q)=q \alpha '(q)+\alpha (q)
B. Set A. equal to the definition of \mathcal{I}(q') in I.2): q' \alpha '(q')+\alpha (q')=\frac{\int_{q'}^{\infty } p[q] \alpha (q) \, dq}{1-P[q']}
C. Take a \frac{d}{dq'} of B.: q' \alpha ''(q')+2 \alpha '(q')=\frac{P'[q'] (\int_{q'}^{\infty } p[q] \alpha (q) \, dq)}{(1-P[q'])^2}-\frac{p[q'] \alpha (q')}{1-P[q']}
D. Plug B. into C. to eliminate the integral: q' \alpha ''(q')+2 \alpha '(q')=\frac{P'[q'] (q' \alpha '(q')+\alpha (q'))}{1-P[q']}-\frac{p[q'] \alpha (q')}{1-P[q']}
E. Use P'[q']=p[q']: \alpha '(q') \left(2-\frac{q' p[q']}{1-P[q']}\right)+q' \alpha ''(q')=0
F. And for clarity, we can change variables from q' \rightarrow q, and divide by q (since we’re not interested in the ODE when q=0).
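For Pareto-distributed sizes, the hazard term satisfies q p[q]/(1-P[q]) = \beta, and the resulting ODE can be checked symbolically against the power-law solution \alpha(q) = c_1 + c_2 q^{\beta-1}:

```python
import sympy as sp

# Symbolic check that alpha(q) = c1 + c2*q**(beta - 1) solves the ODE
# from step E/F when q*p[q]/(1 - P[q]) = beta (the Pareto case).
q, beta, c1, c2 = sp.symbols('q beta c_1 c_2', positive=True)
alpha = c1 + c2 * q**(beta - 1)

ode = sp.diff(alpha, q) * (2 - beta) + q * sp.diff(alpha, q, 2)
print(sp.simplify(ode))   # reduces to 0
```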

[9] There’s a helpful graphic on p20 of this presentation.

[10] This equivalence comes from ODE uniqueness and applies more generally than the model here. Latent liquidity models have a similar feature. In latent liquidity models, traders submit orders when the market approaches a price that appeals to them. In addition to their intuitive appeal, latent liquidity models predict square-root impact under a fairly wide variety of circumstances.

It’s helpful to visualize how price movements change the balance of buy and sell meta-orders. Let’s call N_{s}(q) the number of meta-orders, of size q, live in the market at a given situation s (a negative q indicates a sell meta-order). When supply and demand are in balance, we have \sum_{q=-\infty}^{\infty} qN_{s}(q) = 0 (buy volume equals sell volume).

Say a new meta-order of size q' enters the market and disrupts the equilibrium. This changes the price by \delta_{s}(q'), and morphs N_{s}(q) into a new function N_{s}(q, \delta_{s}(q')), with \sum_{q=-\infty}^{\infty} qN_{s}(q, \delta_{s}(q')) = -q'. I.e., a new buy meta-order will fully execute only if the right volume of new sell meta-orders appear and/or buy meta-orders disappear. Here is a stylized illustration:


Pre-impact (blue) and post-impact (orange) distributions of meta-order sizes live in the market, at an arbitrary situation s. Before a new buy meta-order (red) enters the market, the volume between buy and sell meta-orders is balanced. After the new meta-order begins trading, the distribution shifts to accommodate it. This shift is facilitated by a change in price, which incentivizes selling and disincentivizes buying.

By definition, \mathcal{I}(q) = \mathbf{E}_{s \in S}[\delta_{s}(q)], where the expectation is over all situations when a meta-order of size q might be submitted. Also by definition, N_{s}(q) — if we assume that meta-orders are i.i.d. (which would preclude correlated trading behavior like herding) — is the empirical distribution function of meta-order sizes. So N_{s}(q) and p[q] have the same shape if there are a large number of meta-orders live.

Donier, Bonart, Mastromatteo, and Bouchaud show that a broad class of latent liquidity models predict similar impact functions. Fitting their impact function to empirical data would give a latent liquidity model’s essential parameters, which describe the equilibrium (or “stationary”) p[q], as well as how it gets warped into p[q,\delta] when the price changes by \delta.

[11] From the ODE: \frac{p[q]}{1 - P[q]} = \frac{\alpha''(q)}{\alpha'(q)} + \frac{2}{q}. We can use I.1) to get \alpha(q) from \mathcal{I}(q), and thus find p[q] (for a continuous probability distribution, p[q] \propto \frac{p[q]}{1 - P[q]} e^{-\int\frac{p[q]}{1 - P[q]}dq}).
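As a sketch of this recipe, take the assumed example \alpha(q) = \sqrt{q}; the recovered hazard rate is that of a Pareto distribution with \beta = 3/2:

```python
import sympy as sp

# Footnote [11]'s recipe: recover the hazard rate p[q]/(1 - P[q]) from
# alpha. The choice alpha(q) = sqrt(q) is an assumed example.
q = sp.symbols('q', positive=True)
alpha = sp.sqrt(q)

hazard = sp.diff(alpha, q, 2) / sp.diff(alpha, q) + 2 / q
print(sp.simplify(hazard))   # 3/(2*q): the hazard of a Pareto with beta = 3/2
```

This is the familiar pairing: square-root impact corresponds to a \beta = 3/2 power-law tail of meta-order sizes.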

[12] That is, if a \neq 1 : \alpha(q) = \frac{d}{1-a}+K \exp \left(\int_0^q \frac{(1-a) p[q']}{1-P[q']} \, dq'\right). And in the case that a=1 : \alpha(q) = d \int_0^q \frac{p[q']}{P[q']-1} \, dq'+K.

[13] If fund managers knowingly let their AUM grow beyond the capacity of their strategies, then “overconfidence” might not be the right word. Then again, maybe it is. Clients presumably have confidence that their money managers will not overload their strategies.

[14] A. Take a \frac{d}{dq} of the fair pricing condition I.1): \mathcal{I}(q)=q \alpha '(q)+\alpha (q)
B. Set equal to III.2): q' \alpha '\left(q'\right)+\alpha \left(q'\right)=\frac{\int_{q'-q_d}^{\infty } p[q] \alpha (q) \, dq}{1-P[q'-q_d]}
C. Take a \frac{d}{dq'} : q' \alpha ''\left(q'\right)+2 \alpha '\left(q'\right)=\frac{P'[q'-q_d] \left(\int_{q'-q_d}^{\infty } p[q] \alpha (q) \, dq\right)}{\left(1-P[q'-q_d]\right){}^2}-\frac{p[q'-q_d] \alpha \left(q'-q_d\right)}{1-P[q'-q_d]}
D. Substitute B. into C. to eliminate the integral: q' \alpha ''\left(q'\right)+2 \alpha '\left(q'\right)=\frac{\left(q' \alpha '\left(q'\right)+\alpha \left(q'\right)\right) P'[q'-q_d]}{1-P[q'-q_d]}-\frac{p[q'-q_d] \alpha \left(q'-q_d\right)}{1-P[q'-q_d]}
E. And use P'[q'-q_d]=p[q'-q_d] to get q \alpha ''(q) + \alpha '(q) \left(2-\frac{q p[q-q_d]}{1-P[q-q_d]}\right)-\left(\alpha (q)-\alpha (q-q_d)\right)\frac{p[q-q_d]}{1-P[q-q_d]}=0

[15] The solutions were generated with the following assumptions:

q \sim Pareto[q_{min}=10^{-7},\beta=\frac{3}{2}]
Initial conditions for q_d=0 : \alpha(q_{min})=10^{-5}, \alpha'(q_{min})=10^{3}
Initial conditions for q_d=10^{-5} : \alpha(q_{min})=1.1 \times 10^{-4}, \alpha'(q_{min})=10^{-2}
Initial conditions for q_d=10^{-2} : \alpha(q_{min})=1.1 \times 10^{-4}, \alpha'(q_{min})=10^{-3}
The q_d=0 solution was generated from the ODE of I.1).
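The q_d=0 case can be reproduced numerically: with \beta = 3/2, the ODE of I.1) has closed form \alpha(q) = c_1 + 2A\sqrt{q} with A = \alpha'(q_{min})\sqrt{q_{min}}, and a standard integrator matches it from the initial conditions above:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Numerical solution of the q_d = 0 ODE (part I) with the initial
# conditions listed above, checked against its closed form.
q_min, beta = 1e-7, 1.5
alpha0, dalpha0 = 1e-5, 1e3

def rhs(q, y):
    a, da = y
    return [da, -(2.0 - beta) * da / q]   # q*a'' + (2 - beta)*a' = 0

sol = solve_ivp(rhs, [q_min, 1e-3], [alpha0, dalpha0],
                rtol=1e-10, atol=1e-14, dense_output=True)

A = dalpha0 * np.sqrt(q_min)              # closed form: alpha = c1 + 2*A*sqrt(q)
c1 = alpha0 - 2.0 * A * np.sqrt(q_min)
q = 1e-3
print(sol.sol(q)[0], c1 + 2.0 * A * np.sqrt(q))   # should agree
```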

[16] Here’s Robin Wigglesworth on one reason bank market-makers like trade reporting delays:

These days, bank traders are loath or unable to sit on big positions due to regulatory restrictions. Even if an asset manager is willing to offload his position to a dealer at a deep discount, the price they agree will swiftly go out to the entire market through Trace, hamstringing the trader’s ability to offload it quickly. [Emphasis added]

[17] I don’t know whether bond liquidity provision is sufficiently competitive, but it has notoriously high barriers to entry.

Even for exchange-traded products, subsidizing market-makers with an informational advantage requires great care. E.g., for products that are 1-tick wide with thick order books, it’s possible that market-makers monetize most of the benefit of delayed trade reporting. On these products, market-makers may submit small orders at strategic places in the queue to receive advance warning of large trades. Matt Hurd calls these orders “canaries.” If only a handful of HFTs use canaries, a large aggressor won’t receive meaningful size-improvement, but the HFTs will have a brief window where they can advantageously trade correlated products. To be clear, canaries don’t hurt the aggressor at all (unless she simultaneously and sloppily trades these correlated products), but they don’t help much either. Here’s a hypothetical example:

1. Canary orders make up 5% of the queue for S&P 500 futures (ES).
2. A fundamental trader sweeps ES, and the canaries give her a 5% larger fill.
3. The canary traders learn about the sweep before the broader market, and use that info to trade correlated products (e.g. FX, rates, energy, cash equities).

Most likely, the fundamental trader had no interest in trading those products, so she received 5% size-improvement for free. But, if more HFTs had been using canaries, their profits would’ve been lower and maybe she could’ve received 10% size-improvement. The question is whether the number of HFTs competing over these strategies is large enough to maximize the size-improvement for our fundamental trader. You could argue that 5% size-improvement is better than zero, but delaying public market data does have costs, such as reduced certainty and wider spreads.

[18] If q_d were intentionally changed by altering market structure, there’d probably be corresponding changes in the distribution of q and the initial conditions. These changes could counteract the anticipated effects.

[19] A more competition-friendly version might be for exchange latency-structure to allow canaries. But the loss of transparency from delaying market data may itself be anti-competitive. E.g., if ES immediately transmitted execution reports, and delayed market data by 10ms, then market-makers would only be able to quote competing products (like SPY) when they have canary orders live in ES. Requiring traders on competing venues to also trade on your venue doesn’t sound very competition-friendly.

[20] A. Since \mathcal{I} is piecewise, split the fair pricing integral I.1) into the relevant two regions: \alpha(q_{tot}) = \frac{1}{q_{tot}} \left( \int_0^{q_{tot}-q_{FR}} \mathcal{I}(q_{tot},q_{executed}) dq_{executed} + \int_{q_{tot}-q_{FR}}^{q_{tot}} \mathcal{I}(q_{tot},q_{executed}) dq_{executed} \right)
B. Plugging in IV.2) to A.:
q_{tot}\alpha(q_{tot}) = \int_0^{q_{tot}-q_{FR}} \frac{\int_{q_{executed}+q_{FR}}^{\infty } p[q] \alpha (q) \, dq}{1-P[q_{executed}+q_{FR}]} \, dq_{executed}+q_{FR} \alpha(q_{tot})
C. Take a \frac{\partial}{\partial q_{tot}} : q_{FR} \alpha '(q_{tot})+\frac{\int_{q_{tot}}^{\infty } p[q] \alpha (q) \, dq}{1-P[q_{tot}]}=q_{tot} \alpha '(q_{tot})+\alpha (q_{tot})
D. Take another \frac{\partial}{\partial q_{tot}} : q_{FR} \alpha ''(q_{tot})+\frac{P'[q_{tot}] (\int_{q_{tot}}^{\infty } p[q] \alpha (q) \, dq)}{(1-P[q_{tot}])^2}-\frac{p[q_{tot}] \alpha(q_{tot})}{1-P[q_{tot}]}=q_{tot} \alpha ''(q_{tot})+2 \alpha '(q_{tot})
E. Substitute C. into D. to eliminate the integral, and use P'[q_{tot}] = p[q_{tot}] : \alpha '\left(q_{\text{tot}}\right) \left(2-\frac{\left(q_{\text{tot}}-q_{\text{FR}}\right) p[q_{\text{tot}}]}{1-P[q_{\text{tot}}]}\right)+\left(q_{\text{tot}}-q_{\text{FR}}\right) \alpha ''\left(q_{\text{tot}}\right)=0

[21] The value of HFTs’ information will decay in a complex manner over the span of their predicted time period. An HFT might predict 30-second returns and submit orders within 100μs of a change in its prediction. If that prediction maintained its value for the entire 30 seconds (becoming valueless at 31 seconds), then the HFT wouldn’t need to react so quickly. High-frequency traders, almost by definition, are characterized by having to compete for profit from their signals. From the instant they obtain their information, it starts decaying in value.

[22] Thanks to Mathematica.
q g(q,u)^2 \alpha ^{(2,1)}(q,u) = g(q,u) \left(g^{(1,0)}(q,u) \alpha ^{(0,1)}(q,u)+q g^{(1,0)}(q,u) \alpha ^{(1,1)}(q,u) + q g^{(1,1)}(q,u) \alpha ^{(1,0)}(q,u)+g^{(0,1)}(q,u) \left(2 \alpha ^{(1,0)}(q,u)+q \alpha ^{(2,0)}(q,u)\right)+g^{(1,1)}(q,u) \alpha (q,u)\right) - 2 g^{(0,1)}(q,u) g^{(1,0)}(q,u) \left(q \alpha ^{(1,0)}(q,u)+\alpha (q,u)\right)+g(q,u)^3 p(q,u) \alpha (q,u)-2 g(q,u)^2 \alpha ^{(1,1)}(q,u)

With g\left(q,u\right) = \frac{1}{1 - P[q,\infty] - P[\infty,u] + P[q,u]}

The procedure is to plug V.2) into V.1) and take 2 q partial derivatives and 1 u partial:

A. Inserting V.2) into V.1): \alpha \left(q,u'\right)=\frac{\int_0^q g\left(q',u'\right) \int_{u'}^{\infty} \left(\int_{q'}^{\infty} p(q'',u) \alpha (q'',u) \, dq''\right) \, du \, dq'}{q}
B. Take a \frac{\partial}{\partial q} : \alpha ^{(1,0)}\left(q,u'\right)=\frac{g\left(q,u'\right) \int_{u'}^{\infty} \left(\int_q^{\infty} p(q'',u) \alpha (q'',u) \, dq''\right) \, du}{q}-\frac{\int_0^q g\left(q',u'\right) \int_{u'}^{\infty} \left(\int_{q'}^{\infty} p(q'',u) \alpha (q'',u) \, dq''\right) \, du \, dq'}{q^2}
C. Substitute A. into B. to eliminate integrals where applicable: \alpha ^{(1,0)}\left(q,u'\right)=\frac{g\left(q,u'\right) \int_{u'}^{\infty} \left(\int_q^{\infty} p(q'',u) \alpha (q'',u) \, dq''\right) \, du}{q}-\frac{\alpha \left(q,u'\right)}{q}
D. Take another \frac{\partial}{\partial q} : \alpha ^{(2,0)}\left(q,u'\right)=\frac{g^{(1,0)}\left(q,u'\right) \int_{u'}^{\infty} \left(\int_q^{\infty} p(q'',u) \alpha (q'',u) \, dq''\right) \, du}{q}-\frac{g\left(q,u'\right) \int_{u'}^{\infty} \left(\int_q^{\infty} p(q'',u) \alpha (q'',u) \, dq''\right) \, du}{q^2}-\frac{g\left(q,u'\right) \int_{u'}^{\infty} p(q,u) \alpha (q,u) \, du}{q}+\frac{\alpha \left(q,u'\right)}{q^2}-\frac{\alpha ^{(1,0)}\left(q,u'\right)}{q}
E. Substitute C. into D to eliminate integrals where applicable: \alpha ^{(2,0)}\left(q,u'\right)=\frac{g^{(1,0)}\left(q,u'\right) \left(\alpha ^{(1,0)}\left(q,u'\right)+\frac{\alpha \left(q,u'\right)}{q}\right)}{g\left(q,u'\right)}-\frac{g\left(q,u'\right) \int_{u'}^{\infty} p(q,u) \alpha (q,u) \, du}{q}-\frac{2 \alpha ^{(1,0)}\left(q,u'\right)}{q}
F. Take a \frac{\partial}{\partial u'} : \alpha ^{(2,1)}\left(q,u'\right)=\frac{-g^{(0,1)}\left(q,u'\right) \left(\int_{u'}^{\infty} p(q,u) \alpha (q,u) \, du\right)}{q}+\frac{g^{(1,1)}\left(q,u'\right) \left(\alpha ^{(1,0)}\left(q,u'\right)+\frac{\alpha \left(q,u'\right)}{q}\right)}{g\left(q,u'\right)}+\frac{g^{(1,0)}\left(q,u'\right) \left(\frac{\alpha ^{(0,1)}\left(q,u'\right)}{q}+\alpha ^{(1,1)}\left(q,u'\right)\right)}{g\left(q,u'\right)}-\frac{g^{(0,1)}\left(q,u'\right) g^{(1,0)}\left(q,u'\right) \left(\alpha ^{(1,0)}\left(q,u'\right)+\frac{\alpha \left(q,u'\right)}{q}\right)}{g\left(q,u'\right)^2}+\frac{g\left(q,u'\right) p\left(q,u'\right) \alpha \left(q,u'\right)}{q}-\frac{2 \alpha ^{(1,1)}\left(q,u'\right)}{q}
G. To get the result, substitute E. into F. to eliminate integrals where applicable.

[23] I could be wrong, and it’s hard to define what “reasonable” solutions look like. But I checked this three ways:

1. I tried numerically solving for \alpha (and thus \mathcal{I}) assuming various joint probability distributions p[q,u] — where q and u are dependent and generated by functions of Weibull, Pareto, Log-Normal, or Stable random variables. I didn’t see any solutions where \alpha and \mathcal{I} had significant u -dependence without simultaneously having some other ridiculous feature (e.g. an infinity at small q).
2. I tried assuming \alpha(q,u) had a few reasonable forms (e.g. \alpha(q,u) \propto q^x u^{-y}) and solving numerically for p[q,u]. All the solutions I saw were not probability distributions (e.g. had negative probabilities).
3. It’s possible to solve the two integral equations directly if we assume that q and u are independent (p[q,u]=p_q[q]p_u[u]) and the solutions are separable (\alpha(q,u)=\alpha_q(q)\alpha_u(u) and \mathcal{I}(q,u)=\mathcal{I}_q(q)\mathcal{I}_u(u)). In this case, \mathcal{I}_q(q) and \alpha_q(q) obey the same ODE as the original time-independent system in part I. And \alpha_u(u)=\mathcal{I}_u(u)= constant, which isn’t realistic.

Pershing Square and Information Leakage on IEX

Pershing Square has filed an updated 13D disclosing that they sold 5 million shares of Valeant last week. Pershing almost surely considered this sale to be sensitive information, but I believe that their execution method was quite conspicuous.

In a recent blog post, I noted that some institutional flows may leave a signature in FINRA ATS data. IEX in particular seemed to have some enthusiastic customers, some of which were also their shareholders. [1] In addition to the FINRA data, IEX reports near-realtime volume data on their website, which could make customer flows detectable long before transactions are complete. [2] I noted that the last time Pershing Square traded Valeant common stock, IEX reported an anomalously high market share of Valeant trades that day:

[I]t may be more than coincidence that IEX’s share of VRX volume was anomalously high when Pershing Square recently bought 2 million shares.

It’s (almost) too easy to mention the irony if valuable information has leaked because of Greenlight’s or Pershing Square’s support for IEX. Ackman’s paranoia about front-running features prominently in “Flash Boys.”

So, when IEX abruptly began reporting very high market share in Valeant on Dec. 24, it piqued my interest. After a few more days of persistently high volume, it seemed very likely that Pershing Square was trading the stock. On the 30th, I tweeted “I hope Pershing Sq read my post,” along with some screenshots of Valeant volume on IEX.

Now, IEX has other loyal customers — it could have been that Greenlight was trading Valeant for instance. But this seemed less likely because of Pershing’s close relationship with the stock. Just looking by eye [3], when IEX showed a lot of activity on Valeant, the price seemed to either drift down or stay stable (sometimes appearing “pinned”) — and, when IEX showed a pause in activity, the price tended to rise. That kind of pattern may indicate that the IEX-favoring trader was selling. [4] Short-swing profit rules also mean that Pershing Square would have been more likely to be selling than buying (see Matt Levine’s footnote #5). Altogether, the information at the time was very suggestive that Pershing Square was selling Valeant common stock. [5]

Traders have always maintained relationships with their favorite brokers and market centers. Sometimes these relationships can result in suboptimal execution quality. But hopefully, in exchange for their loyalty, traders receive other benefits. Perhaps Pershing Square has decided that furthering IEX’s “pro-investor” agenda is worth leaking their trading intentions. Or, more cynically, perhaps they’ve decided that it’s more important to support their investment in IEX. Either way, if I were Pershing Square, I’d be giving those decisions another look. [6]

[1] IEX CEO Brad Katsuyama has disclosed that some customers heavily favor IEX (at around 15m:10s):

We have some very close partners that have shifted a lot of their trading towards IEX — a third of their volume is now executed on our market.

[2] IEX doesn’t report trades or volume in their current market data protocol. Perhaps that’s an indication that IEX thinks its trade data is sensitive? It wouldn’t be hard for a machine to read the data from the website though (a tool like PhantomJS might work). That data could probably be matched with off-exchange prints on the consolidated tape to get a more precise view of IEX trades. Regardless, if and when IEX becomes an exchange, its trades will be explicitly identified on the tape.

[3] This is far from rigorous, obviously. I’d love to see an analysis of price impact and IEX volume data with large sample size.

[4] Liquidity providers would obviously know whether their counterparties on IEX had a tendency to be buying or selling Valeant in a given minute/hour/day/week. I have never used information like this to make trading decisions (nor has my company), but I believe it’d be perfectly compliant to do so. Valeant volume (single-counted) on IEX was comparable to the volume Pershing Square reported for each day. So, if Pershing sent 1/3 of their volume to IEX (see [1]), that means that their order flow probably attracted liquidity onto IEX. A lot of that may have been due to execution algos noticing the flow and choosing to interact with it. If execution algos can use information from dark pool fills to make trading decisions, then surely prop traders can too. This is probably a much more important source of information leakage than the type that IEX claims to prevent.

[5] The larger picture could have been more complex, of course. Pershing could have been selling common stock, while simultaneously buying calls or selling puts.

[6] Some crude methods to estimate the potential benefit to Pershing Square of heavily patronizing IEX:

  1. Probably around 5 million extra shares traded on IEX as a result of this (likely) routing decision. At 18mils/share, IEX would’ve made $9k extra revenue. Maybe there’s some momentum effect where, as IEX receives more volume, they attract future business. Let’s be generous then and say this volume is worth $100k to IEX and its project, and that Pershing Square believes any benefit to this project fully accrues to institutional traders like themselves.
  2. Alternatively, let’s assume that “success” for IEX means achieving the same market cap as Nasdaq, $10B. 5 million shares is about 0.2% of IEX’s monthly volume of 2-3B shares. Say that Pershing Square typically trades about once per month, so that they can increase IEX’s long-term revenue by 0.2%. Again, let’s be generous and say that 0.2% of revenue increases IEX’s chance of “success” by 0.1%. So, Pershing Square’s loyalty could improve IEX’s expected value by as much as $10M ($10B × 0.1%). It’s hard for me to parse IEX’s cap table in their exchange application, but let’s guess that Pershing Square owns 5% of it. That’d mean that Pershing Square would receive an expected-value benefit of $500k from their routing favoritism.
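The two benefit estimates above reduce to a few lines of arithmetic. A sketch, where every input is one of the guesses stated in the text, not real data:

```python
# Back-of-envelope benefit estimates; all inputs are the post's own guesses.

# Estimate 1: direct fee revenue to IEX from the extra routed volume.
extra_shares = 5_000_000      # extra shares routed to IEX (guess)
fee_per_share = 0.0018        # 18 mils = $0.0018 per share
direct_revenue = extra_shares * fee_per_share    # ~$9,000

# Estimate 2: expected-value benefit via IEX's chance of "success".
ev_improvement = 10e6         # assumed ~$10M bump in IEX's expected value
pershing_stake = 0.05         # guessed ownership share of IEX
ev_benefit = ev_improvement * pershing_stake     # ~$500,000
```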

And estimating the cost to Pershing Square:

  1. Say that Pershing’s intention to decrease their Valeant stake leaked before they finished their trading. By what amount would Valeant’s stock move? I don’t know, but a conservative estimate should be at least 1%, right? Given the explosion in volume on IEX, what probability would the market assign to the possibility that Pershing was selling? My personal estimate at the time was over 50%, with a <10% chance that they were buying (neither I nor my company used this information to trade). So, perhaps the stock would (and did) drop by 0.5% because of this leak. If Pershing had $300M left to trade, this one incident would cost them $1.5M. And it’s not hard to imagine this number being several times higher.
  2. Instead of Valeant, say that Pershing Square wants to trade a stock that nobody expects them to. Now, people looking at IEX’s reported volume wouldn’t have much idea of the side of the trade, or that it came from Pershing Square. But, the market-makers on IEX would probably know the side. The first hour after Pershing’s meta-order begins trading, maybe these HFTs develop an inkling (10% probability) that the meta-order is large (50% of ADV). Let’s assume, as IEX seems to, that HFTs are ruthless and that they start moving the price in accordance with the meta-order’s expected market impact. Using the square-root law (e.g. p8), the 50% of ADV meta-order could be expected to move the price by sqrt(0.5) * the stock’s daily volatility. Say the daily volatility is 1%, so the market impact would be around 0.7%. It’s already been an hour though, so perhaps half of this market impact has already occurred. The HFTs have 10% confidence that this further 0.35% move will happen, so they move the price by 0.035%. If Pershing had $300M left to trade at this point, that 0.035% would cost them about $100k. And maybe Pershing trades this size 5 times per year, so the routing preference could cost them $500k/yr. This is a paranoid (and rough) estimate of HFT order anticipation, but paranoia seems to be part of the IEX ethos.
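The square-root-law arithmetic in that second scenario can be checked directly. A sketch, with all parameters taken from the hypotheticals above:

```python
import math

# Order-anticipation cost sketch using the square-root impact law.
# Every parameter is one of the text's hypotheticals, not measured data.
adv_fraction = 0.5    # meta-order size: 50% of average daily volume
daily_vol = 0.01      # 1% daily volatility
p_detect = 0.10       # HFTs' confidence that the meta-order exists

full_impact = math.sqrt(adv_fraction) * daily_vol   # ~0.71%
remaining_impact = full_impact / 2                  # half already realized
anticipatory_move = p_detect * remaining_impact     # ~0.035%

remaining_notional = 300e6                          # $300M left to trade
cost_per_episode = anticipatory_move * remaining_notional  # ~$106k
annual_cost = 5 * cost_per_episode                  # ~$530k at 5 trades/yr
```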

In any case, it seems to me that the cost to Pershing Square of favoring IEX outweighs a generously calculated benefit. But I guess a million here and there might not be a big deal to them.

Possible Compromises for IEX

The most controversial aspect of IEX’s proposed design seems to be the non-uniform application of their speed bump. [1] Before the community invests too much time debating this issue, I want to discuss why the unfair access proposed by IEX is unnecessary. IEX could accomplish their stated goals without offering an informational advantage to its peg orders or router.

Apparently, IEX doesn’t apply the speed bump to incoming market data used for repricing algorithmic order types, or for communications between the exchange and their router. A lot of people (including me) have explained how these disparities can cause problems. In short, repricing algorithmic order types with market data that counterparties haven’t yet seen is equivalent to “latency arbitrage.” And, it feels anti-competitive for IEX to delay communications between itself and every unaffiliated router. [2][3] In this post, I’ll explain why these two exceptions to the speed bump aren’t needed to prevent “latency arbitrage” and “front-running.”

Protecting Peg Orders Without Privileging Them

IEX wants to make it impossible for its peg orders to be “picked off” by traders that process market data faster than IEX can. But that doesn’t actually require IEX to reprice its peg orders with fully non-delayed market data. The CME is introducing functionality that timestamps client orders at the moment they are received, then processes them in the order of those timestamps. IEX could do the same, but also timestamp incoming market data. If IEX doesn’t want to subscribe to wireless data feeds, it could subtract the latency difference between wireless and fiber links from its market data timestamps. [4] Once IEX has levelized timestamps for all messages, all it needs to do is process the messages in the correct order. This would accomplish IEX’s goal of “[ensuring] that no market participants can take action on IEX in reaction to changes in market prices before IEX is aware of the same price changes.”
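One way to picture the levelized-timestamp idea is as a sort over adjusted receive times. This is my illustration, not IEX's actual design; the 100μs fiber-vs-wireless gap is an assumed number:

```python
# Sketch: process client orders and market data strictly in "levelized"
# timestamp order during the shoebox delay. The 100us wireless advantage
# is an assumption for illustration.

WIRELESS_ADVANTAGE_US = 100

def levelized_time(msg):
    """Timestamp at which the fastest trader could have seen the
    information behind this message."""
    if msg["type"] == "market_data":
        # IEX received this over fiber; a wireless subscriber saw it earlier.
        return msg["recv_us"] - WIRELESS_ADVANTAGE_US
    return msg["recv_us"]  # client orders keep their receive time

def replay_in_order(messages):
    # Stable ordering: ties fall back to arrival sequence number.
    return sorted(messages, key=lambda m: (levelized_time(m), m["seq"]))
```

With this ordering, a quote update IEX received at t=1000μs is applied before a client order that arrived at t=950μs, because the fastest trader could have acted on that quote at t=900μs — so the order cannot pick off a stale peg price.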

If re-ordering messages with software during the shoebox delay makes the delay appear more “intentional” (which violates Reg. NMS), there are analog options too. [5] IEX could introduce smaller shoeboxes for the direct feeds it processes. For example, if IEX receives market data messages from Nasdaq 200us before any trader can act on them, then it can add a delay coil of 200us to its cable from Nasdaq. And, if it receives market data from NYSE 50us before fast traders do, then it can add a 50us coil to its NYSE feed, etc.

Either of these options would prevent IEX peg orders from being repriced in a “last look”-like manner. Here’s a stylized, bad diagram:


Preventing Information-Leakage from IEX’s Router Without Privileging It

IEX says that it delays outgoing messages to all subscribers, except their routing broker-dealer (IEXS), “to prevent ‘information leakage’ or ‘liquidity fade’ when IEXS routes to other markets.” Their concern is that, without this asymmetric delay, market-makers could pull their quotes on other exchanges if a trader sends a large order to IEX which partially executes before being routed out. However, IEX could prevent that “front-running” [6] by locating its router outside the speed bump in Secaucus, with clients. The router could then maintain its view of exchanges’ visible order books, including IEX’s, and time the sending of its orders so that they arrive at all exchanges simultaneously.
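The simultaneous-arrival scheduling is simple to sketch: send each child order at the target arrival time minus that venue's one-way latency. The latencies below are invented for illustration; IEX's figure includes its 350μs speed bump plus an assumed transit time:

```python
# Sketch of a router, located outside the speed bump, timing its child
# orders so they hit every venue at once. Latencies are made up.
ONE_WAY_LATENCY_US = {
    "IEX": 350 + 40,   # speed bump plus assumed transit time
    "NYSE": 180,
    "NASDAQ": 120,
}

def dispatch_schedule(target_arrival_us):
    """When to send each child order so all arrive simultaneously."""
    return {venue: target_arrival_us - lat
            for venue, lat in ONE_WAY_LATENCY_US.items()}
```

Because every order lands at the same instant, a market-maker seeing the fill on one exchange has no time to fade its quotes elsewhere.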

IEX suggests that competing routers could operate in this way, so IEX should be aware that its router could do the same. [7][8] But there is a drawback. The router would only know the visible quantity posted on IEX, and wouldn’t be able to optimally interact with IEX’s hidden orders. The only way a router can fully access hidden liquidity at a given exchange is by operating sequentially: first sending an order to that exchange, waiting to hear back, then sending any unfilled balance to other markets. The whole point of hidden liquidity is that you only know it’s there after you (or others) trade with it. [9]

By allowing its router to bypass the speed bump, IEX effectively gives it exclusive access to IEX’s hidden order book information. That special access only lasts for the speed bump duration of 350us, but it still seems problematic. Lava was fined for allegedly using information from its hidden order book to help routing decisions at an affiliate. Matt Levine argues that this offense was (mostly) victimless:

What ColorBook did with the hidden orders is route its customers to those hidden orders… Once they submitted an order to buy X shares at Y price, ColorBook would send it toward the hidden orders. That’s exactly what you want when you submit a hidden order!

Which could certainly be true, though those hidden order users might not have liked interacting with flow from the ColorBook router. [10]

The argument to allow IEX to favorably treat its router is pretty much the same as Levine’s point about Lava. Such treatment, if fully disclosed, would probably improve fill rates for users of both the router and IEX hidden orders. It does, however, hurt users of non-IEX routers (and non-IEX resting orders, which miss fills). The question is whether exchanges should be permitted to help their users via any means, or whether they have to consider the broader competitive landscape. Should BATS be permitted to make routing decisions based on special access to Edge’s hidden orders? The same trade-offs apply.

Regardless, IEX is perfectly capable of operating a router immune to “front-running,” without giving it preferential access. This issue is not about “front-running,” it’s about accessing hidden liquidity. [11]


The rhetoric surrounding IEX has always been too hot for reasonable debate. That’s a shame. I think that there’s room for a compromise which allows IEX to accomplish its goals, while also satisfying automated traders and competitors. The “Flash Boys” would just have to admit that, sometimes, people who they hate make good points. Maybe that’s part of growing up. [12]

[1] This post, as always with IEX, is speculative. Their currently posted exchange application doesn’t have much information on the speed bump and when it applies. IEX’s comment letters provide more detail, but there are still some uncertainties in my mind as to what exactly their market model entails.

[2] IEX sort of denies this:

IEXS, the routing broker‐dealer, does not route to IEX and all orders, routable or otherwise, must pass through the POP, so there is no competitive disparity in terms of access to IEX’s trading system.

But also:

Following completion of routing actions, as instructed by the Exchange, any unfilled balance of shares returns to the Exchange for processing in accordance with applicable rules. That message does not traverse the POP

[3] IEX favorably treating its router could prompt other exchanges to create similar arrangements for their own routers, putting brokers’ smart order routers and small exchanges at a competitive disadvantage. I don’t really understand why IEX would want that to be permitted. If a larger exchange like Nasdaq were to introduce a speed bump that doesn’t apply to its router, traders would be strongly incentivized to use Nasdaq’s router, and nobody would use IEX’s. I’d think that a startup exchange would be most supportive of Reg NMS’s spirit of fair competition.

IEX’s peg order treatment could raise questions about fair competition as well. Traders and brokers may be forced to use IEX’s algorithmic order types rather than their own. Citadel expressed concern that IEX could one day “charge more to execute pegged orders… that have an inherent time advantage over other order types.” And perhaps IEX already does — by charging a higher rate for hidden orders. My understanding is that all hidden orders on IEX are effectively midpoint pegs which are repriced using non-speedbumped market data. It’s not unusual for an exchange to charge extra for hidden executions, but providing a latency advantage to hidden orders raises new questions about their fees.

[4] For example, if IEX receives a book update from Nasdaq at 10:00:00.000000 over fiber with a 1-way latency of 200us, but they know the fastest wireless link has a 1-way latency of 100us, then IEX could recalibrate the timestamp of that book update to 9:59:59.999900. That would represent the time that the fastest trader could have received the same market data message (100us earlier than IEX). There are some wrinkles when you consider that wireless links are not always operational, so if IEX were to be completely fair it would not perform this subtraction when the weather is bad. Rather than deal with that issue, it may be easier for IEX to just subscribe to wireless feeds from the most important markets. It’d probably cost a total of around $50k/mo, which doesn’t sound like a big burden.

[5] I don’t see why it should matter whether a delay has software components, but I’m also not a lawyer.

[6] Or whatever they’re calling it these days.

[7] Top of p15.

[8] Though I don’t understand the extent to which an exchange can act in a broker-like capacity. Perhaps locating their router in a different datacenter and offering functionality similar to brokers’ smart order routers (SORs) crosses some line? If so, that still seems better than their proposal to offer systematically-advantaged SOR-like functionality?

[9] IEX seems rather dismissive of sequential routing in a comment letter (p16). But sequential routing does have its advantages. Not every user wants to fully access lit quotes without regard for market impact, price improvement, or fees.

[10] This depends on many factors, including the toxicity of ColorBook routed orders. If the information sharing between the Lava ECN and the ColorBook router were disclosed to hidden order users, they could have taken that into account before sending those hidden orders. As Levine says:

That wasn’t disclosed in its filings, or consistently disclosed in its advertising. (Though it was sometimes: This is not so much “a fact LavaFlow kept secret” as it is “a fact LavaFlow forgot to tell people.

How consistently has IEX disclosed any favorable access it affords its router? I don’t know, but Citadel says:

While it is not explicit in the Application, IEX has explained informally that the IEX Router would not be required to go through the IEX Access Delay to access the IEX trading system or when routing orders from the IEX trading system to other market centers.

[11] There’s also an argument that allowing the IEX router to skip the speed bump guarantees that any unfilled portion of a routable order will be first in the queue when it returns to IEX. Ignoring the issue of whether IEX should be able to offer this benefit only to clients of its router, I don’t think it’s actually true. I don’t know exactly how IEX’s router works. But if it submits orders so that they hit BATS at the same time as NYSE, it should be possible for a trader to react to the sweep on BATS and submit an order to IEX more than 350us before IEX hears back from NYSE.

[12] Matt Levine is responsible for creating the pun. I’m responsible for using it badly.

Can We Tell Who Trades on Which Dark Pools?

Marketplace transparency ensures that investors receive a fair price and have accurate data to conduct their research. But, transparency can also make it harder for traders to conceal their intentions from competitors and counterparties. Exchanges and regulators are tasked with balancing the transparency needs of a market’s customers. Dark pools, by operating with the minimal amount of transparency permitted, are meant to help institutions hide their order flow. They do this, roughly speaking, in two ways:

  1. Lack of pre-trade transparency. Orders are invisible on dark pools until they execute.
  2. Reduced post-trade transparency. Dark pools are required to quickly report trades to the consolidated tape, but this process is not instant. Subscribers to the public tape also don’t know which dark pool (or wholesaler/ELP) reported a given trade.

Market structure is always changing, and there’s a new wrinkle to #2. FINRA Rule 4552 specifies that weekly dark pool volume be published per security.* The data is made public on a 2-week delayed basis, but as we’ll see, it may still have some informational value.

13F Holdings Data

Regulation also requires that large asset managers report their end-of-quarter long positions, within 45 days. [1] Many hedge funds wait until the last minute to file their 13Fs, which suggests that they consider the disclosed information to be valuable.

Some hedge funds, like Greenlight Capital, publicly promote the dark pool IEX. Greenlight also owns a stake in IEX, so it may make sense for it to preferentially trade there. We can combine the 13F-reported changes in Greenlight’s long positions with the FINRA 4552 data to get an idea of whether it trades disproportionate volume on IEX. Here’s a density plot of Greenlight’s quarterly trading activity versus IEX’s:


A measure of Greenlight’s volume versus a measure of IEX’s market share, for each stock and quarter. The x-axis is: \log [c + \frac{V_{a,s}V}{V_{a}V_{s}}] , where c is a small constant 10^{-15}, V is the total quarterly volume across all ATSs and stocks, V_{a} is the total quarterly volume on the given ATS a (in this case a is IEX), V_{s} is the quarterly volume on a given stock (across all ATSs), and V_{a,s} is the quarterly volume on a given ATS (IEX) and stock. The y-axis is: \log [c_{f,0.05} + \frac{V_{f,s}}{V_{s}}] , where V_{f,s} is the trading volume for the fund f (Greenlight) implied by the change in 13F-reported position for the stock s, c_{f,0.05} is the 5th percentile of the first quarter’s (ending 9-30-2014) 13F-implied volume for the fund f. Each data point is from a given quarter and stock.
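In code, the two measures look something like this (my re-implementation of the caption's formulas, not the post's original script):

```python
import math

C = 1e-15  # small constant keeping the log finite at zero volume

def ats_relative_volume(v_as, v_a, v_s, v_total):
    """x-axis: log[c + V_{a,s}*V / (V_a*V_s)]. The bracketed ratio
    exceeds 1 when the ATS traded more of the stock than its overall
    market share would predict."""
    return math.log(C + v_as * v_total / (v_a * v_s))

def fund_relative_volume(v_fs, v_s, c_f):
    """y-axis: log[c_{f,0.05} + V_{f,s}/V_s], where c_f is the 5th
    percentile of the fund's first-quarter 13F-implied volumes."""
    return math.log(c_f + v_fs / v_s)
```

For example, if IEX handled 100 shares of a stock while holding 0.1% of total volume overall and the stock was 1% of all ATS volume, the x-measure captures how far IEX's share of that stock deviates from its baseline share.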

It does look like something is going on here. The above is for the entire universe of NMS Tier 1 stocks. What if we limit it to stocks that we suspect Greenlight is more likely to trade? Here is a similar plot, restricted to stocks in which Greenlight reported long positions in their previous quarter’s 13F:


Similar to above. Includes a linear regression with shaded 95% confidence intervals.

Obviously, correlation is different from causation, but this relationship indicates that Greenlight may direct a lot of volume to IEX. IEX also reports near-realtime volume on its website, so one could potentially detect when Greenlight is currently trading a stock. Pershing Square, another backer of IEX, trades too infrequently to make a similar analysis worthwhile, but it may be more than coincidence that IEX’s share of VRX volume was anomalously high when Pershing Square recently bought 2 million shares. [2]

It’s (almost) too easy to note the irony: valuable information may have leaked because of Greenlight’s or Pershing Square’s support for IEX. Ackman’s paranoia about front-running features prominently in “Flash Boys.” [3] And Greenlight has sometimes felt that even 13F disclosure harms its business. [4]

A Broader Analysis

It seemed fun to check if any other hedge funds had easily-detected dark pool preferences. I selected the top 100 funds listed on Octafinance and attempted to query Jive Data for their 13F data for the 5 quarters leading up to June 30, 2015. I then did a Lasso regression of the relative volume of each hedge fund on the relative volume of all dark pools (these volume measures are defined in the caption of the first plot), using the first 4 quarters of data. The 5th quarter was used as test data. The regression only includes data points where the given fund was active in a stock that quarter. [5] It’s not anything fancy, but this process hopefully catches some superficial relationships like Greenlight’s with IEX. Here’s the R script used, as well as the plots and tables it outputs.
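To make the procedure concrete, here is a minimal coordinate-descent Lasso in plain Python. The post's analysis used an R script; this is an illustrative stand-in, not the original code:

```python
def soft_threshold(x, lam):
    """The Lasso's shrinkage operator."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    X is a list of sample rows; y a list of targets."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the residual excluding j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                                      for k in range(p) if k != j))
                for i in range(n)
            )
            beta[j] = soft_threshold(rho / n, lam) / (col_sq[j] / n)
    return beta
```

Regressing a fund's relative-volume series on the ATS relative-volume columns with a nonzero `lam` zeroes out most venues, leaving only those (like Greenlight/IEX) with a persistent association.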

See “lassoResultsWhenFundTraded_LogHFAnomVol_on_LogAtsAnomVol.csv” (in the second zip file above) for a summary table of the Lasso results. [6] Care is needed when assigning statistical significance to such a large number of regressions, but lots of things stick out. Mariner Investment Group appears to be one of the more detectable funds [7], with test-set R^{2} not much below 0.5.


Predicted and actual volume measures for Mariner Investment Group’s test data.

It appears as if Mariner likes to trade on Level ATS, and tends to avoid Sigma-X and the UBS dark pool. We can’t disentangle a fund’s routing decisions from other reasons for these correlations — e.g. a fund may be more likely to trade a stock if high retail participation has distorted its price, making the fund’s activity correlated with that of Interactive Brokers’ ATS (IATS), even if the fund doesn’t trade there. [8] But, there appears to be a tendency for hedge funds to route away from UBS’s ATS; Tortoise Capital Advisors is the only fund with a positive coefficient for UBS, and many have large negative coefficients. I don’t know the reason for that; it may be that hedge funds are displeased with the execution quality, or just that they’re not UBS’s clientele. If it’s the former, this analysis raises a sticky dilemma for traders who want to hide their intentions: if you don’t like a certain venue, your information leakage might rise if you avoid it. If that’s really the case, you may want to route there even if you think they do shady stuff. Sometimes, fixing market structure requires collective action, and we need regulators to effect that action on our behalf.

Some highly active funds have a surprisingly large test R^{2}. It’s possible that whenever you can make a confident prediction about a fund’s volume, it may turn out to be especially hard to predict the direction of that volume. I wonder if that’s the case for Citadel Advisors (their prediction has an R^{2} near 0.1), because I really would expect Citadel to be sophisticated enough to cloak their trading. Some highly active funds that appear to have more detectable flows include: Bridgewater (R^{2} ~ 0.07), Millennium (R^{2} ~ 0.05), Royce (R^{2} ~ 0.1, which apparently likes Morgan Stanley’s ATS, and avoids JP Morgan’s), BlueMountain (R^{2} ~ 0.07, possibly likes MS, and avoids UBS), Tudor (R^{2} ~ 0.1, possibly avoids UBS), Carlson (R^{2} ~ 0.13, which may have preferred ITG [9], and traded more volume on stocks less active on Fidelity’s and Interactive Brokers’ ATSs), and Ellington (R^{2} ~ 0.2). Highbridge, Adage, D.E. Shaw, and both Two Sigma entities have very weak detectability (R^{2} ~ 0.03). AQR, Renaissance, and Visium probably leak little or no volume information this way.

Plenty of less active funds have sizable R^{2}s too. But I do find it interesting to discuss the example where the prediction arguably fails most. The prediction for Magellan Asset Management does not do well during the test quarter:


Predicted and actual volume measures for Magellan Asset Management’s test data.

The largest component in its regression was an apparent tendency to trade on IEX. This relationship suddenly reversed in the last quarter:


Magellan’s relative volume vs anomalous IEX volume, for stocks that Magellan traded in the given quarter. A linear regression is shown with 95% confidence bands, for each quarter’s relationship.

Was Magellan formerly a big user of IEX, but started avoiding it in Q2? We can’t be sure because the tendency was found via data snooping, but it is suggestive. If so, Magellan may have taken the easiest countermeasure of all, changing their behavior.

Unpredictable Trading

The key to avoiding this sort of leakage is to trade unpredictably, or at least, to trade in the same manner as the population norm. Which, in my view, means that Einhorn’s reasoning described in “Flash Boys” could be almost exactly wrong:

After listening to Brad’s pitch, Einhorn asked him a simple question: Why aren’t we all just picking the same exchange? Why don’t investors organize themselves to sponsor a single stock exchange entrusted with guarding their interests and protecting them from Wall Street predators?

Block trading can be a valuable service, but its utility has a limit. To see why, say that 100 high-alpha investors agree to exclusively trade on a single venue, and public documents show that only one of them owns Micron stock. Suddenly, that venue reports an unusual volume of Micron trades. With a bit of ancillary data (perhaps news articles or observed price impact), other traders might ascertain whether that investor is reducing or adding to her position.

I imagine that this type of information leakage can occur on lit exchanges too. The major exchanges have more volume to camouflage institutional executions. But if a hedge fund were to preferentially trade at a minor exchange (or blacklist a major one) their activity may leave a signature. Investors who persistently use the same execution algorithms (or algorithmic order types) could perhaps even leak the side of their transactions. [10]

The premise of Reg. NMS is that competition between exchanges lowers costs and prevents abuses. If an upstart venue is widely seen as superior, it will rapidly attract market share. People dissatisfied with the major exchanges have yet to reach consensus on an alternative. Which means that if they unreservedly support their favorite upstart, their execution quality can suffer. That must be frustrating. It’s understandable then if upstarts try schemes that force participants to use their venue. NYSE has suggested that IEX’s design includes anti-competitive routing practices and peg order handling. Unless traders disaffected by market fragmentation stop being fragmented themselves, their only way forward is to attack the fundamentals of Reg. NMS. I’m not sure it’s the answer [11], but it shouldn’t be a surprise if market critics are wistful for the days when they traded on a single, monolithic exchange.

[1] Particularly large positions have to be reported sooner. Short positions do not have to be reported in the US, though there is a movement to change this. Large short positions in European equities have to be reported quickly, and I’d be curious to see this post’s analysis repeated with the higher resolution European data.

[2] Here’s a screenshot of IEX’s most traded stocks on Oct 21, shortly after the close. A very large chunk of this volume appeared before Pershing Square announced that they had traded (though I didn’t take a screenshot). This is a good opportunity for me to remind you that nothing on this blog is trading or investment advice.

[3] From “Flash Boys” (emphasis added):

Bill Ackman runs a famous hedge fund, Pershing Square, that often buys large chunks of companies. In the two years before Katsuyama turned up in his office to explain what was happening, Ackman had started to suspect that people might be using the information about his trades to trade ahead of him. “I felt that there was a leak every time,” Ackman says. “I thought maybe it was the prime broker. It wasn’t the kind of leak that I thought.”

It never is, is it?

[4] Greenlight has also said:

We believe that the best response for any investors that are worried about fast computers taking advantage of them is to ask that their orders be routed to IEX.

But what about investors worried about slow traders “taking advantage” of them? In that case, maybe they should think twice before sending all of their volume to IEX?

[5] Which means that in order to use this particular method to predict the volume of a fund’s activity in a given stock, you’d need to know whether they’re likely to be trading it at all. Perhaps that’s doable sometimes. But, in any case, it’s not what I’m trying to do here. This post is just to see whether funds might have any detectable preferences, not to determine if those preferences create trading opportunities.

[6] Which contains coefficients given by the Lasso regression of each hedge fund’s relative volume on ATS’s anomalous volume. Each (quarter, stock) pair is a data point. Mean-square error is given for the training set (Total_MSE_Train) and test set (Total_MSE_test). A measure of R^{2} is given for the training set (R-Squared_MSE_Train) and test set (R-Squared_MSE_Test) — note that the R^{2} is a bit unusual for the test set, in that it uses the mean from the training set as its “null prediction.” Sample sizes for each set are given by n_Train and n_Test.

[7] Their equity portfolio consists mostly of ETFs and biotech, so this could be an artifact.

[8] In that instance, IATS trading activity could still be a useful predictor of hedge fund volume.

[9] ITG’s volume has collapsed after being fined for prop trading in its own dark pool. I would imagine that they’ve lost many customers since the end of the last quarter in this dataset (June 30), so prediction accuracy may be lower for later quarters.

[10] If there’s demand for it, maybe I can look into whether any market data patterns are correlated with institutional flows.

[11] For one thing, it’s not clear why a movement to make trading infrastructure more utility-like should stop with exchanges. What about brokers, execution algorithms, and intermediaries? I think that similar game-theoretical dilemmas could apply to those groups too. Restructuring a competitive industry into a state-supervised monopoly is partly an admission that there’s no prospect of further value-adding innovation. As Cliff Asness says:

[I]t’s the argument monopolists always make — that they are really only trying to create efficiencies and eliminate waste for the customer.

* ATS data is provided via and is copyrighted by FINRA 2015.