Saturday, November 5, 2011

Thinning to reduce autocorrelation: Rarely useful!

Borys Paulewicz, commenting on a previous post, brought to my attention a very recent article about thinning of MCMC chains: Link, W. A. & Eaton, M. J. (2011). On thinning of chains in MCMC. Methods in Ecology and Evolution. First published online 17 June 2011. doi: 10.1111/j.2041-210X.2011.00131.x. The basic conclusion of the article is that thinning of chains is not usually appropriate when the goal is precision of estimates from an MCMC sample. (Thinning can be useful for other reasons, such as memory or time constraints in post-chain processing, but those are very different motivations from precision of estimation of the posterior distribution.)

Here's the idea: Consider an MCMC chain that is strongly autocorrelated. Autocorrelation produces clumpy samples that are unrepresentative, in the short run, of the true underlying posterior distribution. Therefore, if possible, we would like to get rid of autocorrelation so that the MCMC sample provides a more precise estimate of the posterior distribution. One way to decrease autocorrelation is to thin the sample, keeping only every nth step. If we keep 50,000 thinned steps with small autocorrelation, then we very probably have a more precise estimate of the posterior than 50,000 unthinned steps with high autocorrelation. But to get 50,000 kept steps in a thinned chain, we needed to generate n*50,000 steps. With such a long chain, the clumpy autocorrelation has probably all been averaged out anyway! In fact, Link and Eaton show that the longer (unthinned) chain usually yields better estimates of the true posterior than the shorter thinned chain, even for percentiles in the tail of the distribution, at least for the particular cases they consider.
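
To make the comparison concrete, here is a minimal numerical sketch (my own toy illustration, not the analysis in Link and Eaton's article). It uses a first-order autoregressive sequence as a stand-in for an autocorrelated MCMC chain, with a known true mean of zero, and compares the error of the mean estimate from the full chain against the error from the same chain thinned by a factor of 10. All parameter values (phi, the thinning factor, the chain length, the number of replications) are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
phi, n_thin, n_keep, n_reps = 0.8, 10, 2000, 500
full_len = n_thin * n_keep   # steps generated per replication

# Simulate n_reps independent AR(1) "chains" in parallel; the stationary mean is 0.
eps = rng.normal(size=(n_reps, full_len))
chains = np.empty_like(eps)
chains[:, 0] = eps[:, 0] / np.sqrt(1.0 - phi**2)   # start each chain in its stationary distribution
for t in range(1, full_len):
    chains[:, t] = phi * chains[:, t - 1] + eps[:, t]

est_full = chains.mean(axis=1)               # mean estimated from all n_thin*n_keep steps
est_thin = chains[:, ::n_thin].mean(axis=1)  # mean estimated from the thinned chain (n_keep steps)
print("RMSE of mean estimate, full chain:   ", np.sqrt(np.mean(est_full**2)))
print("RMSE of mean estimate, thinned chain:", np.sqrt(np.mean(est_thin**2)))

Across the replications, the full chain typically yields a smaller root-mean-square error than the thinned chain, even though the thinned chain has much lower autocorrelation, which mirrors the article's conclusion in this toy setting.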

The tricky part is knowing how long a chain needs to be to smooth out short-run autocorrelation. The more severe the autocorrelation, the longer the chain needs to be. I have not seen rules of thumb for translating an autocorrelation function into a recommended chain length. Link and Eaton suggest monitoring several independent chains and assaying whether the estimates produced by the different chains are suitably similar to each other.
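
In that spirit, here is a small sketch of the multiple-chain check (again a toy; the AR(1) simulation below is only a placeholder for real MCMC output, one row of samples per independent chain). The idea is simply to compute whatever summary you intend to report, such as a tail percentile, separately from each chain and inspect how much the per-chain estimates disagree.

import numpy as np

rng = np.random.default_rng(1)
phi, n_steps, n_chains = 0.8, 20000, 4

# Placeholder for real sampler output: simulate n_chains independent AR(1) chains.
eps = rng.normal(size=(n_chains, n_steps))
chains = np.empty_like(eps)
chains[:, 0] = eps[:, 0] / np.sqrt(1.0 - phi**2)
for t in range(1, n_steps):
    chains[:, t] = phi * chains[:, t - 1] + eps[:, t]

# Same summary from each chain (here the 2.5th percentile), then the spread across chains.
per_chain = np.percentile(chains, 2.5, axis=1)
print("per-chain 2.5th percentile estimates:", np.round(per_chain, 3))
print("spread of those estimates (SD):      ", np.round(np.std(per_chain, ddof=1), 4))

If the chain-to-chain spread is larger than the precision you need for reporting, the chains are not yet long enough; run them longer rather than thinning.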

For extreme autocorrelation, it's best not to rely on long-run averaging out, but instead to use other techniques that actually get rid of the autocorrelation. This usually involves reparameterization, as appropriate for the particular model.
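
As one generic example of reparameterization (not tied to any particular model discussed in the article), mean-centering a predictor before sampling a linear regression typically reduces the strong posterior correlation between the intercept and slope, which in turn tends to reduce autocorrelation in samplers that update parameters one at a time. The sketch below shows only the transformation and the back-transformation; the variable names and the placeholder MCMC draws are hypothetical.

import numpy as np

# Hypothetical predictor values; with real data, x would come from your data set.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
x_centered = x - x.mean()          # run the sampler on the model y = b0_c + b1 * x_centered

# Placeholder MCMC draws of the centered-scale intercept (b0_c) and slope (b1);
# in practice these would come from the sampler's output.
b0_c = np.array([1.20, 1.10, 1.30])
b1   = np.array([0.50, 0.45, 0.55])

# Convert the intercept back to the original scale of x:
#   y = b0_c + b1*(x - mean(x)) = (b0_c - b1*mean(x)) + b1*x
b0_original = b0_c - b1 * x.mean()
print(b0_original)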

Link and Eaton point out that the inefficiency of thinning has been known for years, but many practitioners have gone on using it anyway. My book followed those practitioners. It should be pointed out that thinning does not yield incorrect results (in the sense of being biased). Thinning merely produces correct results less efficiently (on average) than using the full chain from which the thinned chain was extracted. There are at least a couple more mathematically advanced textbooks that you might turn to for additional advice. For example, Jackman says in his 2009 book, Bayesian Analysis for the Social Sciences, "High levels of autocorrelation in a MCMC algorithm are not fatal in and of themselves, but they will indicate that a very long run of the sampler may be required. Thinning is not a strategy for avoiding these long runs, but it is a strategy for dealing with the otherwise overwhelming amount of MCMC output" (p. 263). Christensen et al. say in their 2011 book, Bayesian Ideas and Data Analysis, "Unless there is severe autocorrelation, e.g., high correlation with, say [lag]=30, we don't believe that thinning is worthwhile" (p. 146). Unfortunately, there is no sure advice about how long a chain is needed. Longer is better. Perhaps if you're tempted to thin by n to reduce autocorrelation, just use a chain n times as long without thinning.

3 comments:

  1. Dear Dr. Kruschke,

    Thank you for a very interesting post. Would you happen to know where I could get hold of the paper you mention ("On thinning of chains in MCMC")? I do not have access to it through my university library...

    Thanks again for another great post.

    Patrick

  2. I think you could find this paper interesting: http://arxiv.org/abs/1510.07727

  3. Here are more details regarding the previous comment. I have not read the paper, so am not endorsing it, merely reporting more info for interested readers.

    Active link: http://arxiv.org/abs/1510.07727

    Info:

    Statistically efficient thinning of a Markov chain sampler

    Art B. Owen

    (Submitted on 27 Oct 2015 (v1), last revised 11 Nov 2015 (this version, v3))

    It is common to subsample Markov chain samples to reduce the storage burden of the output. It is also well known that discarding k−1 out of every k observations will not improve statistical efficiency. It is less frequently remarked that subsampling a Markov chain allows one to omit some of the computation beyond that needed to simply advance the chain. When this reduced computation is accounted for, thinning the Markov chain by subsampling it can improve statistical efficiency. Given an autocorrelation parameter ρ and a cost ratio θ, this paper shows how to compute the most efficient subsampling frequency k. The optimal k grows rapidly as ρ increases towards 1. The resulting efficiency gain depends primarily on θ, not ρ. Taking k=1 (no thinning) is optimal when ρ≤0. For ρ>0 it is optimal if and only if θ≤(1−ρ)²/(2ρ). The efficiency gain never exceeds 1+θ.
