## Saturday, April 7, 2012

### Negative Binomial Reparameterization

In a previous post, I showed that directly estimating the p and r parameters of a negative binomial distribution can produce severe autocorrelation in the MCMC chains. I suggested that there must be some standard reparameterization that solves the problem, and asked for a pointer. Dr. John Davey of the University of Edinburgh was good enough to point the way. The solution is indeed straightforward. In the direct estimation, the p and r parameters are given priors, while the mean m (and variance v) is derived from p and r. In the reparameterization, the m and r parameters are given priors, while p (and the variance v) is derived from m and r. Autocorrelation in the chains is greatly reduced. Here is an example.

#### The model specifications:

Parameterized by p and r:

```
model {
  for ( i in 1:N ) {
    y[i] ~ dnegbin( p , r )
  }
  p ~ dbeta(1.001,1.001)
  r ~ dgamma(0.01,0.01)
  m <- r*(1-p)/p
  v <- r*(1-p)/(p*p)
}
```

Parameterized by m and r:

```
model {
  for ( i in 1:N ) {
    y[i] ~ dnegbin( p , r )
  }
  p <- r/(r+m)
  r ~ dgamma(0.01,0.01)
  m ~ dgamma(0.01,0.01)
  v <- r*(1-p)/(p*p)
}
```
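As a quick sanity check on the algebra (in Python, my addition, outside the post's JAGS code): the two parameterizations describe the same distribution, since p = r/(r+m) is exactly the inverse of m = r(1-p)/p, and both give the same variance, v = r(1-p)/p², which also equals m + m²/r. The numbers below are arbitrary illustrative values.

```python
# Check that the (p, r) and (m, r) parameterizations agree.

def mean_from_p(p, r):
    # m = r * (1 - p) / p, as in the first model block
    return r * (1 - p) / p

def p_from_mean(m, r):
    # p = r / (r + m), as in the second model block
    return r / (r + m)

def variance(p, r):
    # v = r * (1 - p) / p^2, equivalently m + m^2 / r
    return r * (1 - p) / p**2

p, r = 0.4, 3.0
m = mean_from_p(p, r)                        # 3 * 0.6 / 0.4 = 4.5
assert abs(p_from_mean(m, r) - p) < 1e-12    # round-trips back to p
v = variance(p, r)                           # 3 * 0.6 / 0.16 = 11.25
assert abs(v - (m + m**2 / r)) < 1e-12       # v = m + m^2 / r
```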

#### Results:

Parameterized by p and r: *(figure not shown)*

Parameterized by m and r: *(figure not shown)*

The posteriors are a bit different for the two parameterizations, because I used "generic" priors without any attempt to transform them to be equivalent.

Thanks to John Davey for this pointer!

1. This is great. I'm going to try it with my data now. Thanks for the tip since this seems like a common problem I will encounter with my count data.

Cheers,
Dan

2. Thanks, Professor Kruschke. You inspired me to try this -- and JAGS as a vehicle for learning BUGS -- on another problem, available for public inspection at: http://hypergeometric.wordpress.com/2013/12/23/jags-for-finding-highs-and-lows-in-a-week-of-wikipedia-accesses/

3. G'day. What would a full script look like for comparing two vectors, both of which have a negative binomial distribution? I have been trying to adapt this post to the BEST.R function, but without much success.

4. Graeme:

I won't have a chance to write a full script, but the key thing to do is use nested indexing for the group identifier. Row i of the data file contains the count y[i] and the group identifier x[i]. Then the model statement uses

```
for ( i in 1:Nrows ) {
  y[i] ~ dnegbin( p[x[i]] , r[x[i]] )
}
for ( grpIdx in 1:2 ) {
  p[grpIdx] <- r[grpIdx]/(r[grpIdx]+m[grpIdx])
  m[grpIdx] ~ dgamma(0.01,0.01)
  # etc. ...
}
```
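To make the nested-indexing bookkeeping concrete, here is a small Python sketch (my addition, with made-up data): each row i carries a count y[i] and a group code x[i], and the group code selects that row's parameters, just as `p[x[i]]` and `r[x[i]]` do in the JAGS model. The sketch only tallies group means to show the indexing; it does not perform the MCMC estimation.

```python
# Illustration of y[i] ~ dnegbin( p[x[i]] , r[x[i]] ):
# the group code x[i] picks out which group's quantities row i uses.

y = [3, 5, 4, 10, 12, 11]   # counts (made-up data)
x = [1, 1, 1, 2, 2, 2]      # group identifier for each row

sums = {1: 0.0, 2: 0.0}
counts = {1: 0, 2: 0}
for i in range(len(y)):
    g = x[i]                # nested index: row i's group
    sums[g] += y[i]
    counts[g] += 1

group_mean = {g: sums[g] / counts[g] for g in sums}
print(group_mean)           # {1: 4.0, 2: 11.0}
```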

5. This seems to be working up to here. Anything obviously wrong?

```
  y[i] ~ dnegbin( p[x[i]] , r[x[i]] )
  }
  for ( grpIdx in 1:2 ) {
    p[grpIdx] <- r[grpIdx]/(r[grpIdx]+m[grpIdx])
    m[grpIdx] ~ dgamma(0.01, 0.01)
    r[grpIdx] ~ dgamma(0.01, 0.01)
    v[grpIdx] <- r[grpIdx]*(1-p[grpIdx])/(p[grpIdx]*p[grpIdx])
  }
}
"
```

```
library(rjags)  # for jags.model() and coda.samples()

# Combine the two groups into one data vector with a group-membership code:
y = c( snail$alive14 , snail$alive15 )
x = c( rep(1,length(snail$alive14)) , rep(2,length(snail$alive15)) )
Ntotal = length(y)
# Specify the data in a list, for later shipment to JAGS:
dataList <- list(
  x = x ,
  y = y ,
  Ntotal = Ntotal
)
length(y)  # check
length(x)

parameters <- c("m","r","p","v")  # collect these
jagsModel <- jags.model( textConnection(modelString) , data=dataList ,
                         n.chains=3 , n.adapt=1e3 )

codaSamples <- coda.samples( jagsModel , variable.names=parameters ,
                             n.iter=3000 , thin=1 )
mcmcMat <- as.matrix(codaSamples)  # a distinct name, so it does not shadow the monitored parameter m
```

Thanks, Graeme