Saturday, January 26, 2013

Bayesian disease diagnosis with uncertain prior probabilities

A standard textbook example of Bayes' rule considers the posterior probability of having a disease given that a diagnostic test shows a positive result. A key assumption of the standard example is that the prior probability, or base rate, of the disease is very accurately known. Such an assumption is fine for simple illustrations of Bayes' rule, but what if there is uncertainty in the prior probability of the disease? Does the Bayesian approach then break down? No. In fact, the Bayesian approach is specifically appropriate for applications in which there is uncertainty. That's what the Bayesian approach is all about: describing uncertainty, and re-allocating credibility across possibilities when new data are observed. This post shows one way to use Bayesian analysis for the disease-diagnosis situation when the prior probability (i.e., base rate) of the disease is uncertain.

This post was provoked by a recent blog post at The New Yorker by Gary Marcus and Ernest Davis, criticizing Nate Silver's book, The Signal and the Noise. In the course of their critique, Marcus and Davis work through an instance of the standard textbook example, involving diagnosis of breast cancer. They assume, for purposes of the example, that there is a well-known prior probability of breast cancer. They say,
A Bayesian approach is particularly useful when predicting outcome probabilities in cases where one has strong prior knowledge of a situation.  ... But the Bayesian approach is much less helpful when there is no consensus about what the prior probabilities should be.
I think that the second sentence above is exactly wrong. The Bayesian approach is also helpful, perhaps uniquely helpful, when there is uncertainty in the prior. All we have to do is express the uncertainty in a mathematical model, and then let Bayesian inference tell us how to re-allocate uncertainty given the data. The present blog post is an explicit illustration of one way to do this.

Please note that this post neither defends nor criticizes Nate Silver's book, and it is not about anything Silver does or doesn't say about Bayesian or non-Bayesian methods. It is about what Marcus and Davis say about Bayesian methods. My goal is to clarify what Bayesian methods can do, specifically by showing one way to express uncertainty in the case of disease diagnosis.

First, the standard textbook example, as provided by Marcus and Davis in their blog:
Suppose, for instance (borrowing an old example that Silver revives), that a woman in her forties goes for a mammogram and receives bad news: a “positive” mammogram. However, since not every positive result is real, what is the probability that she actually has breast cancer? To calculate this, we need to know four numbers. The fraction of women in their forties who have breast cancer is 0.014 [bold added], which is about one in seventy. The fraction who do not have breast cancer is therefore 1 - 0.014 = 0.986. These fractions are known as the prior probabilities. The probability that a woman who has breast cancer will get a positive result on a mammogram is 0.75. The probability that a woman who does not have breast cancer will get a false positive on a mammogram is 0.1. These are known as the conditional probabilities. Applying Bayes’s theorem, we can conclude that, among women who get a positive result, the fraction who actually have breast cancer is (0.014 x 0.75) / ((0.014 x 0.75) + (0.986 x 0.1)) = 0.1, approximately. That is, once we have seen the test result, the chance is about ninety per cent that it is a false positive.
In the case above, the prior probability (emphasized by bold font) is assumed to be exactly 0.014. This probability presumably came from some gigantic previous survey of women, using some extremely accurate assay of the disease, so that the prior probability is conceived as having no uncertainty.
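
For concreteness, the point-prior arithmetic in the quote is easy to check in R:

( 0.014*0.75 ) / ( 0.014*0.75 + 0.986*0.1 )  # 0.0962, i.e., about 0.1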

But what if there is uncertainty about the prior probability of the disease? What if, instead of a gigantic previous survey, there was only a small survey, or no previous survey at all? Bayesian methods handle this situation well. I've got to introduce a little mathematical notation here, but the ideas are straightforward. Let's denote the true probability of the disease in the population by the symbol θ (Greek letter theta). The standard example above stated that θ=0.014. But in Bayesian methods, we can put a distribution of relative credibility across the entire range of θ values from 0 to 1. Instead of assuming we believe only in the exact value θ=0.014, we say there is a spectrum of possibilities, and the prior knowledge expresses how certain we are in the various possible values of θ. The distribution across the range of θ values could be very broad --- that's how we express uncertainty. Or, the distribution could be sharply peaked over θ=0.014 --- that's how we express certainty.

Here is the standard example in which there is high certainty that θ=0.014, with the positive test result denoted mathematically as y=1.
The left panel shows a sharp spike over the value θ=0.014, indicating that all other values have essentially zero credibility. (Ignore the messy numbers under "95% HDI"; they are irrelevant for present purposes.) This "spike" distribution can be thought of as being based on a previous survey of 20,000 women. That's why the header of the left panel says "Prior Certainty: 20000." The right panel above shows the posterior probability of having the disease, given the positive test result (y=1). The right bar, centered on disease=1.0, has a height of 0.097, which is the posterior probability of having the disease. This result closely matches the result stated by Marcus and Davis in their example.

We can use Bayesian inference seamlessly when the prior is less certain. Suppose, for example, that the previous survey involved only 20 women. Then the prior is a broader distribution over possible underlying probabilities of the disease, and the result looks like this:
The left panel shows a distribution that is spread out over θ, meaning that we are not very certain what the value of θ is. (The displayed distribution is actually the posterior distribution of θ given the test result; the prior distribution looks similar.) In particular, lots of values of θ greater than 0.014 are considered to be possible. The right panel above shows the posterior probability of having the disease, given the positive test result. Notice that the Bayesian inference indicates that the probability of having the disease is about 0.336, much more than the result of 0.097 when the prior was highly certain. This makes sense; after all, the prior knowledge is uncertain, so the test result should weigh more heavily.

We can use Bayesian inference seamlessly even when the prior is hugely uncertain. Suppose that the previous survey involved only 2 women, which effectively means we know merely that the disease can occur, and nothing more. Then the prior distribution looks like this:
The prior, displayed above, shows that any value of θ is equally possible, and the probability of having the disease, according to this prior, is 50-50. The posterior distribution, given a positive test result, looks like this:
Notice that the posterior probability of having the disease is 0.882. Why so high? Because we have very uncertain prior knowledge about the disease, and all we have to go by is the outcome of the diagnostic test. (The Bayesian inference also automatically updates the distribution of credibility across θ as shown in the left panel above. It shows that, given the single positive test result, we should shift our beliefs about the underlying probability of the disease toward higher values.)
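
For readers who want to reproduce these numbers, here is a small R sketch of the computation. It is not the program that generated the figures above; it assumes (an assumption on my part, since the parameterization isn't stated in the figures) a beta prior on θ with mode 0.014 and concentration κ equal to the "Prior Certainty" value in the panel headers, and it marginalizes over θ analytically:

# Sketch (not the program that made the figures): posterior probability
# of disease given a positive test result, with an uncertain base rate.
# Assumes a beta prior on theta with mode 0.014 and concentration kappa.
postDisease = function( kappa , mode=0.014 , pHit=0.75 , pFA=0.10 ) {
  a = mode*(kappa-2) + 1        # beta shape parameters, from the
  b = (1-mode)*(kappa-2) + 1    #   mode/concentration parameterization
  priorP = a/(a+b)              # E[theta] = marginal prior P(disease)
  # Because P(disease=1|theta) is linear in theta, theta integrates
  # out analytically and only its prior mean matters here:
  pHit*priorP / ( pHit*priorP + pFA*(1-priorP) )
}
postDisease( kappa=20000 )  # approx 0.097 : the high-certainty case
postDisease( kappa=20 )     # approx 0.334 : the low-certainty case
postDisease( kappa=2 )      # approx 0.882 : the uniform-prior case

The small discrepancy in the middle case (0.334 here versus 0.336 in the figure) presumably reflects MCMC sampling wobble or a slightly different prior parameterization.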

In this post I've tried to show that Bayesian inference is seamlessly useful regardless of how much uncertainty there is in prior knowledge. The simplistic framing of the standard textbook example should not be construed as the only way to do Bayesian analysis. Indeed, the whole point of Bayesian inference is to express uncertainty and then re-allocate credibility when given new knowledge. If there is a lack of consensus in prior knowledge, then the prior should express that lack of consensus, for example with a broad prior distribution as illustrated in the examples above. If different camps have different strong priors, then Bayesian analysis can tell each camp how it should re-allocate its idiosyncratic beliefs. With enough data, the idiosyncratic posterior distributions will converge, despite starting with different priors.
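
As a quick illustration of that last point, consider two camps with opposite beta priors on the base rate θ, both updating on the same (hypothetical) survey data; with conjugate beta-binomial updating, a Beta(a,b) prior and z diseased out of N yield a Beta(a+z, b+N-z) posterior:

# Sketch: two camps with opposite strong priors converge, given
# enough data. (Hypothetical numbers, for illustration only.)
z = 700 ; N = 10000                        # observed: 700 diseased of 10000
thetaGrid = seq( 0 , 0.15 , length=1001 )
postA = dbeta( thetaGrid , 2+z , 18+N-z )  # camp A prior: Beta(2,18)
postB = dbeta( thetaGrid , 18+z , 2+N-z )  # camp B prior: Beta(18,2)
plot( thetaGrid , postA , type="l" , xlab="theta" , ylab="posterior" )
lines( thetaGrid , postB , lty="dashed" )  # the posteriors largely coincide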

By the way, I am not aware of the sort of analysis I've provided above appearing elsewhere in the literature. But it's straightforward, and I imagine it must be "out there" somewhere. If any of you can point me to it, I'd appreciate it.

Thursday, January 24, 2013

Bayesian workshop at FAA Atlantic City, Feb. 28


I'll be doing an introductory workshop on Bayesian data analysis at the Federal Aviation Administration's Human Factors Lab in Atlantic City on February 28, 2013. Details are available here.

Tuesday, January 22, 2013

Marking sub-intervals of multimodal highest density intervals

Here's a small aesthetic fix for the program BernGrid.R, so that it marks all interior end points of sub-intervals in a multimodal highest density interval.

For example, Figure 6.4 (p. 107) shows the posterior HDI with only the lowest and highest points marked by vertical lines (even though there is a horizontal "water level" that shows the sub-intervals):

The new program puts vertical lines at all the interior sub-interval end points, like this:

Is that exciting or what?!? If you just gotta have it, you can get it from the usual program repository. Get the files BernGrid.R, BernGridExample.R, and openGraphSaveGraph.R.
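
If you're curious how the end points can be found, here is a minimal stand-alone sketch of the idea (hypothetical code, not the actual BernGrid.R implementation), for a posterior approximated on a grid:

# Sketch: find the end points of HDI sub-intervals from a grid
# approximation. theta is the grid; pTheta is the probability mass
# at each grid point (summing to 1).
HDIendPoints = function( theta , pTheta , credMass=0.95 ) {
  sortedMass = sort( pTheta , decreasing=TRUE )
  # The "water level" is the height at which the enclosed mass
  # first reaches credMass:
  waterLevel = sortedMass[ min( which( cumsum(sortedMass) >= credMass ) ) ]
  inHDI = ( pTheta >= waterLevel )
  # End points occur wherever inHDI switches on or off:
  switchIdx = which( diff( inHDI ) != 0 )
  sort( theta[ switchIdx ] )
}
# Example with a bimodal posterior:
theta = seq( 0 , 1 , length=1001 )
pTheta = dbeta( theta , 5 , 20 ) + dbeta( theta , 20 , 5 )
pTheta = pTheta / sum(pTheta)
plot( theta , pTheta , type="l" )
abline( v=HDIendPoints(theta,pTheta) , lty="dashed" )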

Thursday, January 17, 2013

Uniform R code for opening, saving graphs in Windows and Mac OS

[See follow-up here.]

A big frustration when trying to create R code that works across Windows and Mac OS is that the R commands for opening graphics windows, and for saving their contents, are different in the two operating systems. In Windows, the functions windows() and savePlot() do the job nicely. But in Mac OS, the equivalent commands are X11(type="cairo") and either savePlot() or dev.copy2pdf()/dev.copy2eps(), depending on the desired format.

In this post I present two simple utility functions, for opening graphics windows and for saving their contents, that operate the same way in Windows and Mac OS. (You'd think someone would have done this already, but apparently not.) The basic sequence of commands is this:

openGraph(...)  # open a graphics window
plot(...)       # create graph in the window
saveGraph(...)  # save to a file in desired format

I have tested the functions on recent Windows and Mac OS machines, running from RStudio desktop. I would like to know if users encounter problems with the functions when running in a different configuration. I am told that the functions also work on Linux.

Here is an R script that defines the functions and then calls them with several different file formats. Please copy and paste the entire script into R and give it a try. Let me know how it goes.

Update January 29, 2013: The functions defined below work robustly and they are now in the program openGraphSaveGraph.R in the program repository. Several other programs have been modified to use these functions, and the others will be modified eventually.

#------------------------------------------------------------------------
# Modified 2013-Jan-22 8:55pm EST

openGraph = function( width=7 , height=7 , ... ) {
  if ( .Platform$OS.type != "windows" ) { # Mac OS, Linux
    X11( width=width , height=height , type="cairo" , ... )
  } else { # Windows OS
    windows( width=width , height=height , ... )
  }
}

saveGraph = function( file="saveGraphOutput" , type="pdf" , ... ) {
  if ( .Platform$OS.type != "windows" ) { # Mac OS, Linux
    if ( type %in% c("png","jpeg","jpg","tiff","bmp") ) {
      # savePlot wants type "jpeg", not "jpg"; it works here because
      # openGraph opened a cairo device:
      sptype = type
      if ( type == "jpg" ) { sptype = "jpeg" }
      savePlot( file=paste(file,".",type,sep="") , type=sptype , ... )
    }
    if ( type == "pdf" ) {
      dev.copy2pdf( file=paste(file,".",type,sep="") , ... )
    }
    if ( type == "eps" ) {
      dev.copy2eps( file=paste(file,".",type,sep="") , ... )
    }
  } else { # Windows OS
    file=paste(file,".",type,sep="") # force explicit file extension
    savePlot( file=file , type=type , ... )
  }
}

graphics.off()
for ( typeVal in c("eps","pdf","jpg","jpeg","png") ) {
  openGraph(width=3,height=3)
  plot(1:4,c(1,3,2,4),type="o",main=bquote(typeVal==.(typeVal)))
  saveGraph( file="openGraphSaveGraphOutput", type=typeVal )
}
 

#------------------------------------------------------------------------

Monday, January 7, 2013

Bayesian multiple linear regression with interaction: Erratum

Background: In regression models, the posterior distribution of the regression coefficients and intercept can be strongly correlated, causing some MCMC samplers to become inefficient as they "bang into the walls" of the diagonal ridge in parameter space. There are various ways to address this problem, but one simple way is to standardize the data (or at least subtract out the means), and then convert the MCMC-derived parameters back to the original scale of the data. This approach was taken in the book (DBDA) for use with BUGS, and continued, vestigially, in the programs that were converted to JAGS, even though JAGS is often less susceptible to the problem.
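
To recall the idea concretely, here is a sketch for the simplest case, a single predictor with no interaction (the two-predictor interaction model of the erratum involves more terms but the same logic). The data are simulated, and lm() merely stands in for the MCMC sampler:

# Sketch of standardize-then-convert-back, for simple linear regression.
# Simulated data, for illustration only:
set.seed(47405)
x = rnorm( 50 , mean=100 , sd=15 )
y = 10 + 2*x + rnorm( 50 , mean=0 , sd=20 )
# Standardize the data:
zx = ( x - mean(x) ) / sd(x)
zy = ( y - mean(y) ) / sd(y)
# Estimate coefficients on the standardized scale (lm() stands in
# for the MCMC sampler here):
zeta = coef( lm( zy ~ zx ) )  # zeta[1]=intercept, zeta[2]=slope
# Convert back to the original scale of the data:
beta1 = zeta[2] * sd(y) / sd(x)
beta0 = zeta[1] * sd(y) + mean(y) - zeta[2] * sd(y) * mean(x) / sd(x)
c( beta0 , beta1 )  # approximately 10 and 2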

The news in this post is that a reader, Francis Campos, caught an error in the formulas for translating the standardized-scale parameters back to the original scale of the data, for the particular case of the multiple linear regression model with two predictors and a multiplicative interaction. Specifically, part of Equation 17.5, at the top of p. 472, has two subscripts reversed, and should be changed as shown in red here:
Correction to Eqn. 17.5, top of p. 472
It turns out that the error was not only in the printed equation. It was also in the programs MultiLinRegressInterBrugs.R and MultiLinRegressInterJags.R, and this error caused the graphical output to be wrong as well. The corrected programs are now available at the program repository, and the corrected figures are included below. (In a separate program, I verified the new results simply by removing the standardization of the data and the transformation back to original scale. JAGS handles the unstandardized, original-scale data very efficiently.) Fortunately, nothing conceptual in the discussion in the book needs to be changed. All the conceptual points still apply, even though the graphical details have changed as shown here:
Top of Fig. 17.9, p. 473, corrected.
Bottom of Fig. 17.9, p. 473, corrected.
Figure 17.10, p. 474, corrected.
My thanks to Francis Campos for finding the error and alerting me to it!

Friday, January 4, 2013

Bayesian estimation in a new web app!

A wonderful new web app is now available for doing Bayesian estimation of two groups. Just enter the data for each group, click the "go" button, and watch the MCMC sample emerge, revealing the posterior distribution for the means, standard deviations, and normality, along with the difference of means, difference of standard deviations, and effect size. This web app was created by Rasmus Bååth and is available at http://www.sumsar.net/best_online/. Give it a try!
 
The web app is fabulous for quickly and easily seeing the main results of a Bayesian estimation of parameters for two groups. There is no need to install R and JAGS and then invoke R and run commands. Instead, just paste in your data and click a button!

Here (below) is a screen shot from January 3, 2013, using the data from the main example of BEST. More comments about the app continue after the screen shot.


The original implementation of BEST, available from http://www.indiana.edu/~kruschke/BEST/, provides more functionality than the app does (at the time of this post), but only at the cost of installing (free) software and interacting with the R programming language. The original BEST software implements ROPEs for the decision rule, and it shows a posterior predictive check in the form of a smattering of credible model distributions superimposed on histograms of the data. It also shows correlations of parameters in the posterior, and it does power computations. Another benefit of the BEST software is that it is a gateway to the full spectrum of Bayesian data-analysis programs written in R and JAGS/BUGS. But for quickly and easily showing the primary results from a Bayesian analysis of two groups, this web app is brilliant!

Big thanks to Rasmus Bååth for creating this awesome web app! If you like it, let him know -- his e-mail is linked in the lower right "About" section on the web app. The URL again is http://www.sumsar.net/best_online/