Thread

A short thread on this interesting new paper in JoE on testing the appropriate level of clustering.

TL;DR I do not think this is really a test of the *appropriate* level of clustering. Nor do I think such a test exists.

But I do think it may spur an interesting discussion 1/N

First, what does the paper do?

It develops a test for the null that “clustered and non-clustered SEs are the same (asymptotically)”

If the test rejects, we can say with confidence that whether you cluster or not matters 2/N

The implicit assumption then seems to be that if the level of clustering matters, then you should cluster rather than use heteroskedasticity-robust SEs.

But I’m not sure this is right. 3/N

Suppose I have an RCT where I assign treatment at the individual level. We know heteroskedasticity-robust SEs are valid. But I may get different SEs if I cluster at the state level. This paper’s test, it seems, would suggest I should cluster at the state level. 4/N
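To make the RCT example concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration (10 hypothetical states, 200 people each, made-up potential outcomes with treatment effects that differ by state, no small-sample corrections); it is not the paper's test. It computes robust and state-clustered SEs on one draw, then re-randomizes treatment many times, holding potential outcomes fixed, to see which SE tracks the actual design-based variability.

```python
import numpy as np

rng = np.random.default_rng(0)

G, n_per = 10, 200                      # hypothetical: 10 states, 200 people each
state = np.repeat(np.arange(G), n_per)
n = state.size

# Fixed potential outcomes: state-level intercepts and state-level treatment effects.
alpha = rng.normal(0.0, 1.0, G)
tau = 1.0 + np.linspace(-1.5, 1.5, G)   # effects differ across states (assumption)
y0 = alpha[state] + rng.normal(0.0, 1.0, n)
y1 = y0 + tau[state]


def ols_ses(y, d, cluster):
    """Return (coef on d, HC0 robust SE, cluster-robust SE); no small-sample corrections."""
    X = np.column_stack([np.ones_like(d, dtype=float), d])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    scores = X * (y - X @ beta)[:, None]          # x_i * u_i for each unit
    meat_hc = scores.T @ scores
    sums = np.zeros((cluster.max() + 1, X.shape[1]))
    np.add.at(sums, cluster, scores)              # sum scores within each cluster
    meat_cl = sums.T @ sums
    se_hc = np.sqrt((bread @ meat_hc @ bread)[1, 1])
    se_cl = np.sqrt((bread @ meat_cl @ bread)[1, 1])
    return beta[1], se_hc, se_cl


def assign():
    """Completely randomized individual-level assignment: exactly half treated."""
    d = np.zeros(n)
    d[rng.permutation(n)[: n // 2]] = 1.0
    return d


d = assign()
y = np.where(d == 1, y1, y0)
est, se_robust, se_cluster = ols_ses(y, d, state)

# Re-randomize treatment many times, holding the potential outcomes fixed.
draws = []
for _ in range(2000):
    d_b = assign()
    y_b = np.where(d_b == 1, y1, y0)
    draws.append(y_b[d_b == 1].mean() - y_b[d_b == 0].mean())
sd_design = np.std(draws)

print(f"robust SE                : {se_robust:.3f}")
print(f"clustered SE             : {se_cluster:.3f}")
print(f"SD over re-randomizations: {sd_design:.3f}")
```

Under these assumptions the clustered SE should come out well above the robust SE, because effects vary by state by construction, while the spread across re-randomizations should sit close to (and slightly below) the robust SE. So clustering "matters" here even though the robust SE is the one that matches the experiment as actually run.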

More generally, I do not think it’s possible to develop a test for the appropriate level of clustering b/c the appropriate level depends on the Q we want to answer.

The exact same dataset may warrant different clustering schemes depending on what we want to learn from it. 5/N

Let’s start w/an example. Suppose I have sampled 1000 ppl iid from 3 states, say CT, MA, and RI. I estimate average wages. Should I cluster my SEs? 6/N

Suppose first that I care about those three states in particular. Maybe I’m advising the governors of Southern New England. I have an iid sample from the population I care about, so clearly I don’t need to cluster at all. 7/N

Now, suppose I care about the avg in the entire US. But I only had $ to send surveyors to 3 states, and I happened to draw CT, MA, and RI. Now I need to cluster, because I effectively have only 3 observations (states) out of 50, rather than 1000 people out of the US population. 8/N
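A rough sketch of what "same data, two SEs" looks like for the average wage. The wage numbers are made up for illustration, and the clustered formula here is the plain sandwich version with no small-sample correction (with only 3 clusters, any clustered SE is itself very noisy).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 1000 people from 3 states with made-up mean wages.
states = np.array(["CT", "MA", "RI"])
state_mean = np.array([32.0, 34.0, 29.0])      # illustrative numbers only
idx = rng.integers(0, 3, size=1000)            # which state each person lives in
wage = state_mean[idx] + rng.normal(0.0, 10.0, size=1000)

n = wage.size
mean_wage = wage.mean()
resid = wage - mean_wage

# SE treating the 1000 people as an iid sample from the population of interest.
se_iid = np.sqrt(np.sum(resid**2)) / n

# SE clustering by state: sum residuals within each state first, then square.
cluster_sums = np.bincount(idx, weights=resid, minlength=states.size)
se_cluster = np.sqrt(np.sum(cluster_sums**2)) / n

print(f"mean wage: {mean_wage:.2f}")
print(f"iid SE: {se_iid:.3f}   state-clustered SE: {se_cluster:.3f}")
```

Both numbers are computed from the same 1000 rows. Nothing in the code knows whether the target is the three states or the whole US, which is the choice that determines which SE is the relevant one.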

So, with exactly the same data, the answer to whether I need to cluster depends on the Q I’m trying to answer. Or, in other words, the *estimand*.

Clearly, a function I write in R or Stata can’t tell me what Q I’m trying to answer [maybe GPT will be able to eventually] 9/N

Put otherwise, frequentist inference requires me to imagine repeated draws of data from the same DGP.

If I care abt only Southern NE, I imagine re-drawing 1000 ppl from the same 3 states.

If I care abt the whole US, I imagine re-drawing another 3 states and sampling again 10/N
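The two thought experiments can be simulated directly. This is a sketch under assumed numbers (50 hypothetical state mean wages, a common within-state SD, respondents spread at random over the surveyed states), not a statement about real wage data.

```python
import numpy as np

rng = np.random.default_rng(2)

S, G, n = 50, 3, 1000                   # 50 states, 3 surveyed, 1000 respondents
state_mean = rng.normal(30.0, 3.0, S)   # hypothetical state-level mean wages
sigma_within = 10.0                     # hypothetical within-state wage SD


def sample_mean(chosen):
    """Average wage of n people drawn at random from the chosen states."""
    s = rng.choice(chosen, size=n)
    return (state_mean[s] + rng.normal(0.0, sigma_within, n)).mean()


observed = np.array([0, 1, 2])          # stand-ins for CT, MA, RI

# Thought experiment A: the same 3 states every time, new people each time.
fixed = [sample_mean(observed) for _ in range(5000)]

# Thought experiment B: a new set of 3 states every time, then new people.
redraw = [sample_mean(rng.choice(S, size=G, replace=False)) for _ in range(5000)]

sd_fixed, sd_redraw = np.std(fixed), np.std(redraw)
print(f"SD of the estimate, states held fixed: {sd_fixed:.3f}")
print(f"SD of the estimate, states re-drawn  : {sd_redraw:.3f}")
```

The estimator is identical in both loops; only the imagined re-draw differs. Re-drawing states adds between-state variation to the sampling distribution, so the second SD should be several times larger under these assumptions.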

A statistical test is based on the data that you have. It can’t tell you about the alternative datasets that you might have received from the same DGP. Only you can tell me that. 11/N

Similar points arise if we imagine re-drawing treatments from some assignment mechanism rather than re-sampling units from a super-population. 12/N

The points in this thread are not new, but I think this is one of the trickiest issues in applied work.

I found this paper by Abadie, Athey, Imbens, and @jmwooldridge to be very helpful 13/N

academic.oup.com/qje/article/138/1/1/6750017

And, selfishly, I’ll add that @asheshrambachan and I have a paper extending this design-based view of uncertainty to “quasi-experimental” settings. N/N

arxiv.org/abs/2008.00602

Mentions

Arindrajit Dube @arindube · Apr 11, 2023

Good thread.

Thread by Jonathan Roth
