yellow brick road to stats heaven

~ a loose collection of statistical and quantitative research material for fun and enrichment ~

by roland b. stark

critique: occasional commentary on research methods and analyses


a brilliant way to investigate the effects of public protest using a natural experiment

sep. 9, 2017

Read Dan Kopf's excellent Quartz summary of a study by Andreas Madestam, Daniel Shoag, Stan Veuger, and David Yanagizawa-Drott from Harvard and Stockholm Universities. Want to know to what degree political demonstrations produced results in elections? Track the rain. The rain? It actually makes a beautiful example of what's termed an instrumental variable. Whether it rains at protest locations can scarcely have anything directly to do with ultimate election results, but it unquestionably relates to turnout for each demonstration. If the size of turnout relates to election results, then the rain should, statistically (if not causally), relate to them as well. "If the absence of rain means bigger protests, and bigger protests actually make a difference, then local political outcomes ought to depend on whether or not it rained [on protest days]...As it turns out, protest size really does matter."
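For readers who like to see the logic in code, here is a minimal sketch of the instrumental-variable idea using made-up data. Nothing in it comes from the study itself. The variable names, the effect sizes, and the hidden "enthusiasm" factor are all assumptions chosen for illustration, and the simple ratio estimator stands in for whatever estimation the authors actually used.

```python
import numpy as np

def simulate_protests(n=20000, true_effect=2.0, seed=0):
    """Made-up data: rain lowers turnout; hidden enthusiasm drives turnout AND votes."""
    rng = np.random.default_rng(seed)
    rain = rng.binomial(1, 0.3, size=n).astype(float)   # the instrument
    enthusiasm = rng.normal(size=n)                      # unobserved confounder
    turnout = 5.0 - 2.0 * rain + 1.5 * enthusiasm + rng.normal(size=n)
    votes = 10.0 + true_effect * turnout + 3.0 * enthusiasm + rng.normal(size=n)
    return rain, turnout, votes

def naive_slope(x, y):
    """Ordinary regression slope of y on x; contaminated by the confounder."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def iv_slope(z, x, y):
    """Single-instrument estimate: cov(instrument, outcome) / cov(instrument, exposure)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

rain, turnout, votes = simulate_protests()
print("naive regression slope:", round(naive_slope(turnout, votes), 2))
print("instrumental-variable slope:", round(iv_slope(rain, turnout, votes), 2))
```

In this toy setup the naive slope should overshoot the true effect of 2.0, because enthusiasm inflates both turnout and votes, while the rain-based estimate should land close to 2.0. It works only because rain is assumed to touch votes through turnout and nothing else.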

how not to attribute causality from statistical results

sep. 9, 2017

This one comes from a major outlet for health care research findings, Fierce Health Care. I've reproduced a key passage, with my comments inline in brackets.

"Employment status is the top socioeconomic factor affecting 30-day [US hospital] readmissions for heart failure, heart attacks or pneumonia, according to a new study from Truven Health Analytics.

[Such a conclusion is on very shaky ground, as you'll see.]

As readmission penalties reach record highs, analyzing causes is more important than ever.


Researchers, led by David Foster, Ph.D., collected 2011 and 2012 data from the Centers for Medicare & Medicaid Services and used a statistical test called the Variance Inflation Factor (VIF) for correlations among the nine factors in the Community Need Index (CNI): elderly poverty, single parent poverty, child poverty, uninsurance, minority, no high school, renting, unemployment and limited English.

[In truth, the VIF tells not what is the most important factor, but only to what extent the different factors, or independent variables, overlap with one another, potentially confounding the results. In this case, trying to isolate one indicator of socioeconomic status (SES) while controlling for eight others will surely distort the connection between any of these indicators and the outcome. These SES indicators are too much "part and parcel of" one another, too inseparable, to allow for valid use of control in this way. It's a mistake to ask "How much does SES (version 1) relate to readmission if we statistically remove SES (versions 2-9) from the relationship?" Much like saying, "How addicted am I to desserts if you discount my intake of cookies, pie, and ice cream?" Or there's Monty Python's "Apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system, and public health, what have the Romans ever done for us?"]

Their analysis found unemployment and lack of high school education were the only statistically significant factors in connection with readmissions, carrying a risk of 18.1 percent and 5.3 percent, respectively, according to the study."

[As explained above, these conclusions are not valid. But even if the numbers were somehow accurate, what could such statements mean? That readmission risk becomes on average 5.3% for non-high-school graduates? Can't be -- way too low. That it's 5.3 points higher than it would be otherwise? Can't be -- too high. 5.3% higher in relative terms? Maybe, but that would hardly merit calling high school education an important factor. So what's left?]
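The point about the VIF can be checked with a small sketch on simulated data. The four columns below are hypothetical stand-ins, not the CNI factors: three indicators built from one shared "SES" factor, plus one unrelated variable. The VIF is computed here as the diagonal of the inverse correlation matrix, a standard identity.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: the diagonal of the inverse correlation matrix.
    Note the signature: no outcome variable is involved at all."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

def simulate_indicators(n=2000, seed=0):
    """Hypothetical data: three overlapping indicators and one unrelated column."""
    rng = np.random.default_rng(seed)
    ses = rng.normal(size=n)  # assumed shared socioeconomic factor
    overlapping = [ses + 0.5 * rng.normal(size=n) for _ in range(3)]
    unrelated = rng.normal(size=n)
    return np.column_stack(overlapping + [unrelated])

vifs = variance_inflation_factors(simulate_indicators())
print(np.round(vifs, 2))  # first three columns overlap; the fourth does not
```

The function never sees readmissions or any other outcome, so it cannot rank factors by importance. All it can report is that the first three columns are heavily entangled with one another (VIFs well above 1) and the fourth is not (VIF near 1).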

readmission rates: 58% of variance explained!?

nov. 18, 2015

"Fifty-eight percent of national variation in hospital readmission rates was explained by the county in which the hospital was located," announce Jeph Herrin et al. in Community Factors and Hospital Readmission Rates, published in 2014 in Health Services Research. Sound odd to you? After all, for most readmission studies the percent explained is in single digits. Being able to account for 4 or 5% of the variation translates to an ability to assess individual risk that can meaningfully aid in clinical decisions. Even Harlan Krumholz and his team of 17 researchers and statisticians, the ones whose predictive models form the basis for the national readmission penalty system imposed by Medicare, have usually only explained 3-8%. And those models have taken into account about 50 input variables.

It turns out that Herrin et al. took their data on 4,073 hospitals and broke it down by 2,254 counties. There were almost as many counties as hospitals themselves. And many counties contained only a single hospital.

Now, suppose the authors had divided the 4,073 into, say, 4 groups defined by region, and found that the 4 groups had sizeable differences in readmission rate. That would have been a meaningful way to summarize the data. Even if they had formed somewhat more groups -- say, one for each of the 50 states -- that might have been meaningful, though the data would have been spread pretty thin for some states. But to "explain" differences using 2,254 groups? It's not a far cry from simply listing the readmission rates of all 4,073 hospitals and claiming victoriously to have "explained" 100% of the variance in the hospital-to-hospital rate. Sounds like a feat for Captain Obvious.
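To see how far sheer group count can carry "variance explained," here is a rough sketch using pure random noise in place of readmission rates. Only the two counts, 4,073 hospitals and 2,254 counties, come from the article. The noise, the random assignment of hospitals to counties, and the simple between-group share of variance are all assumptions, and the calculation is not necessarily the one Herrin et al. performed.

```python
import numpy as np

def share_explained_by_groups(values, groups):
    """Between-group sum of squares divided by total sum of squares (eta-squared)."""
    values = np.asarray(values, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    group_means = np.bincount(idx, weights=values) / counts
    grand_mean = values.mean()
    ss_between = np.sum(counts * (group_means - grand_mean) ** 2)
    ss_total = np.sum((values - grand_mean) ** 2)
    return ss_between / ss_total

def noise_only_demo(n_hospitals=4073, n_counties=2254, n_regions=4, seed=0):
    """Rates are pure noise, so no grouping truly explains anything."""
    rng = np.random.default_rng(seed)
    rates = rng.normal(size=n_hospitals)
    # Assumption: every county gets one hospital; the rest are assigned at random.
    extra = rng.integers(0, n_counties, size=n_hospitals - n_counties)
    county = np.concatenate([np.arange(n_counties), extra])
    region = rng.integers(0, n_regions, size=n_hospitals)
    return (share_explained_by_groups(rates, county),
            share_explained_by_groups(rates, region))

by_county, by_region = noise_only_demo()
print("share 'explained' by county:", round(by_county, 3))
print("share 'explained' by region:", round(by_region, 3))
```

With k groups among n units of random noise, the expected share is (k-1)/(n-1). For 2,254 counties and 4,073 hospitals that is 2253/4072, or about 0.55, before any real county effect enters the picture. For 4 regions it is 3/4072, essentially zero.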

One reason why this matters a great deal is that, to the extent that some geographic factor is considered responsible for this outcome, hospital performance will no longer be considered responsible. So if county in fact explained 58% of the variance, then hospital performance, it might be argued, couldn't account for more than 42%. This is the incorrect conclusion that was reported in unqualified fashion by news outlets such as Becker's Hospital Review.

The article by Herrin and colleagues makes contributions in other ways, of course, but the chief findings are very misleading. Watch for dialogue, in Health Services Research or elsewhere, on how to interpret the results. The upshot should be quite a bit more nuanced and moderated than what we've seen above. And if you're interested in the role of socioeconomic factors in hospital readmission, you'll find information at ReInforced Care, Inc.
