Zero-inflation and overdispersion in count models and penalized regression
November 18, 2011
It has been a while since I’ve posted, but I am still busy working away at my dissertation. I am developing a Bayesian mixture model with my co-advisor, Dr. Chatterjee, in the School of Statistics at the University of Minnesota. My hope is that this approach will be useful when your data are a realization of a mixture model with more than two components. Specifically, it is being developed to address what I perceive as a shortcoming in how we deal with count data that are both zero-inflated and overdispersed. Currently, the best approaches for this kind of data are the zero-inflated Poisson and zero-inflated negative binomial models, with the latter fitting best when the non-zero counts are overdispersed in addition to the zero-inflation. These approaches work best when your data arise from two latent classes: one that produces only zeros (structural zeros) and one that produces counts from a Poisson or negative binomial distribution, some of which will also be zero (sampling zeros). However, in a data set that I have been working with, it seems reasonable that the data come from more than two latent classes, and that is the inspiration behind the development of this new technique. I will post more about this as we develop it, along with R code.
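To make the two-class idea concrete, here is a minimal sketch of the standard zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) probability mass functions. It is written in Python purely for illustration and is not the new mixture model described above. The function names and the parameter values (`pi`, `mu`, `size`) are assumptions made up for this example; the negative binomial uses the mean/dispersion parameterization, with variance `mu + mu**2 / size`.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam (computed on the log scale)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def neg_binomial_pmf(k, mu, size):
    """P(Y = k) for a negative binomial with mean mu and dispersion size."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)

def zero_inflated_pmf(k, pi, count_pmf):
    """Two-class mixture: with probability pi the observation is a structural
    zero; otherwise it is drawn from count_pmf, which can itself return zero
    (a sampling zero)."""
    from_count_class = (1.0 - pi) * count_pmf(k)
    return pi + from_count_class if k == 0 else from_count_class

# Hypothetical parameter values, chosen only for illustration.
pi, mu, size = 0.3, 2.0, 0.5

def zip_pmf(k):
    return zero_inflated_pmf(k, pi, lambda j: poisson_pmf(j, mu))

def zinb_pmf(k):
    return zero_inflated_pmf(k, pi, lambda j: neg_binomial_pmf(j, mu, size))

# Compare how much probability each model puts on a zero count.
for name, pmf in [("Poisson", lambda k: poisson_pmf(k, mu)),
                  ("ZIP", zip_pmf),
                  ("ZINB", zinb_pmf)]:
    print(f"{name:8s} P(Y = 0) = {pmf(0):.4f}")
```

In both zero-inflated models a zero can come from either class, which is exactly why the two kinds of zeros cannot be told apart from the observed counts alone. A model with more than two latent classes would add further components to this mixture.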
Finally, I am getting really excited about penalized regression, specifically ridge, lasso, and elastic-net regression. I will hopefully be back to blog about this soon with some examples of how to do this in R. These approaches have a lot to offer and would be a great addition to the arsenal of methodologies that an applied researcher in education or psychology might use. In particular, I think these types of regressions could work well as an alternative to factor analysis when the goal is to reduce the dimensionality of your data: the lasso and elastic net can shrink some coefficients to exactly zero, which effectively drops those predictors, while ridge shrinks coefficients toward zero without removing any of them.
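As a rough illustration of how the three penalties differ, here is a small Python sketch for the special case of orthonormal predictors (X'X = I), where the penalized estimates have closed forms in terms of the ordinary least squares (OLS) coefficients. It assumes the objective ½‖y − Xβ‖² + λ₁‖β‖₁ + (λ₂/2)‖β‖², with λ₂ = 0 for the lasso and λ₁ = 0 for ridge. The coefficient values and penalty sizes are made up for the example; real data need an iterative fitting routine rather than these formulas.

```python
def soft_threshold(b, lam):
    """Lasso estimate under an orthonormal design: move b toward zero by lam,
    and set it to exactly zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    """Ridge estimate under an orthonormal design: proportional shrinkage."""
    return b / (1.0 + lam)

def elastic_net_shrink(b, lam1, lam2):
    """Elastic-net estimate under an orthonormal design: soft-threshold,
    then shrink proportionally."""
    return soft_threshold(b, lam1) / (1.0 + lam2)

# Hypothetical OLS coefficients and penalty size, for illustration only.
ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [soft_threshold(b, lam) for b in ols]
enet = [elastic_net_shrink(b, lam, lam) for b in ols]

print("OLS        ", ols)
print("ridge      ", ridge)
print("lasso      ", lasso)
print("elastic net", enet)
```

The small coefficients are set to exactly zero by the lasso and the elastic net, while ridge keeps every predictor and only makes the coefficients smaller.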