Profile Analysis in R

May 7, 2010

I recently wrote a few functions to perform a profile analysis in R. These functions identify the criterion pattern and run a cross-validation (see Davison & Davenport, 2002). I’ve thought about getting serious about this: cleaning up my code, adding some new features (such as MCMC), and submitting it to CRAN. Please test the script (and documentation), and if you do profile analysis, let me know what you think and whether I should add other features. I’d be happy to do it, but since I am primarily interested in Bayesian statistics, multilevel modeling, and latent modeling, I don’t have much interest in maintaining and developing a profile analysis R package unless there is interest from the community, as it’s not my own research area.

This script is available here and the manual is available here.

The script contains two functions:

criterion.pattern()
profile.cv()

You should source the script at the start of an R session:
source("/path/to/profile_analysis-0.1.R")

JAGS 2.0.0 for Ubuntu 10.04

April 29, 2010

I built a package of JAGS 2.0.0 for 64-bit Ubuntu 10.04. This was just a quick rebuild from the upstream Debian Sid source, and it seems to work fine. It is available here. Please let me know if you have any issues with it. I plan on tracking 64-bit Ubuntu 10.04 on my laptop until Squeeze is released and then moving back to Debian. Until then I plan to keep this package up to date for Ubuntu.

EDIT: rjags has not yet been updated for JAGS 2.0.0, so if you want to continue to use rjags, you should hang tight with JAGS 1.0.4 for the time being.

2nd EDIT: I already got sick of Ubuntu and am back on Debian testing (and will be until Squeeze becomes stable then I will follow stable). So while I won’t be pulling my JAGS package, I won’t be updating it either. Sorry.

Also, I’ve noticed a lot of people seem to be coming here to figure out how to set up WinBUGS or OpenBUGS on their Mac. If possible, I’d like to encourage you to use JAGS. It really does work great, is cross-platform, and open source.

3rd EDIT (May 7th, 2010): JAGS 2.0.0 has now hit Squeeze, which means that rjags on CRAN is presently broken. However, you can grab the rjags tarball from SourceForge here and install it by running the following command from a shell in the directory where you saved it:


R CMD INSTALL rjags_2.0.0-2.tar.gz

Restart R and you can load the rjags library again. Thanks to Dirk Eddelbuettel for the tip that a compatible version of rjags was living on SourceForge.

4th EDIT (May 21st, 2010): JAGS 2.1.0 and rjags 2.1.0 have been released. rjags 2.1.0 is now on CRAN, so you no longer need to install rjags from SourceForge. I will not be updating the package for Ubuntu, as I am on Debian testing. Sorry.

Bayesian Multilevel Talk

April 28, 2010

As promised, I am posting my Bayesian multilevel talk. You can get it here. It briefly introduces Bayesian statistics, MCMC, and Bayesian multilevel modeling. The presentation was created in Beamer (the .tex file is available upon request), and I used JAGS, rjags, and MCMCglmm to run my models. The presentation includes syntax for running Bayesian models in these programs.

Next week I will post a new presentation on Bayesian CFA. The analysis will be performed in rjags again.

Please feel free to comment and let me know what you think. I am specifically interested in knowing if something is unclear or just plain wrong.

Also, this summer I again plan to work through another Bayesian textbook (Gelman’s book) and will attempt to publish more code on this website. I am going to focus primarily on regression and multilevel models, as I know these techniques a lot better than latent models. If you’re interested in seeing a model written in JAGS syntax for rjags, let me know. If there is interest, I might also put together a how-to for JAGS.

Finally, I will only be publishing code for the JAGS language, as it’s the only multi-platform open-source Gibbs sampler. From now on, all code posted here will be a computational solution that works for Windows, Mac, and Linux users (and the *BSDs and OpenSolaris too, I guess). No Windows-only stuff.

One other thing: COMMENTING IS WHAT LETS ME KNOW YOU FIND THIS USEFUL AND THAT I SHOULD KEEP DOING THIS. Unless you leave comments I have no way of knowing that this is useful and that I’m not wasting my time. The purpose of this blog is to be didactic not pedantic. So if you find this useful please let me know.

Image Analysis

April 14, 2010

Although I’ve been absent from this blog for well over a month, I haven’t abandoned it. I have been very busy with school work and with a few projects that use Bayesian methods. One is the multilevel ZIP project that I’ve been discussing on this blog for a while, and the other is a confirmatory factor analysis (CFA) project. I will be giving presentations on both on the 27th of April and will post PDFs of these talks. The latter talk, on CFA, will also include code for rjags, a package for calling JAGS from R. I’ve been playing around with this particular Gibbs sampler a little bit and I think it’s a good option for Linux and Mac users. So I will be posting some code here, similar to my SEM code, but this time in JAGS and rjags syntax, so it can be used on all platforms without the need for Wine.

On a related note, I recently gave a talk on Guttman’s image analysis for a seminar on factor analysis. Image analysis seems pretty interesting and able to overcome many of the issues associated with factor analysis, including factor score indeterminacy and factorial invariance. However, there are still issues of rotational indeterminacy. Basically, image analysis breaks manifest variables into two chunks: images and partial anti-images. The image is the part of a variable that is shared with the other manifest variables, while the anti-image is the part that is unique. Images are thus similar to common factors, and anti-images are similar to unique factors. If you are interested in factor analysis, please read my image analysis talk (by clicking here) and let me know what you think. I am not an expert on image analysis but am happy to host a discussion on the topic and will try to answer questions.
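The image/anti-image split can be sketched in a few lines of R. This is an illustrative sketch rather than code from the talk: the data are simulated, the loadings, sample size, and object names are all made up, and it assumes standardized variables, so that each image is simply the fitted value from regressing one variable on all the others. The closed-form expressions at the end are the standard ones written in terms of the correlation matrix and its inverse.

```r
# Illustrative sketch of Guttman's image / anti-image decomposition.
# All values below (loadings, n, seed) are assumptions chosen for the demo.
set.seed(1)
n <- 500
common <- rnorm(n)                      # a single common factor
loadings <- c(0.8, 0.7, 0.6, 0.5)
X <- sapply(loadings, function(l) l * common + sqrt(1 - l^2) * rnorm(n))
Z <- scale(X)                           # standardized manifest variables
R <- cor(Z)

# Image of variable j: the part predictable from the other variables.
# Anti-image of variable j: the residual from that regression.
images <- sapply(seq_len(ncol(Z)), function(j) fitted(lm(Z[, j] ~ Z[, -j])))
anti_images <- Z - images

# Closed forms that use only the correlation matrix.
R_inv <- solve(R)
S2 <- diag(1 / diag(R_inv))             # anti-image variances, i.e. 1 - SMC
anti_image_cov <- S2 %*% R_inv %*% S2
image_cov <- R - 2 * S2 + anti_image_cov

# The regression-based and matrix-based results should agree.
all.equal(cov(images), image_cov, check.attributes = FALSE)
all.equal(cov(anti_images), anti_image_cov, check.attributes = FALSE)
```

The diagonal of `image_cov` holds the squared multiple correlations, which is why image variances behave like communalities and anti-image variances like uniquenesses.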

Why GNU/Linux for academics and why Debian in particular?

February 26, 2010

This vignette is an attempt to explain my position as to why I use GNU/Linux and Debian GNU/Linux, in particular, and why you as a reader, surfer, or couch academic, should want to run GNU/Linux and Debian. I first briefly describe GNU/Linux, then Debian, and provide reasons (compelling?) why you should consider Debian GNU/Linux over Windows or Mac.

GNU/Linux is the proper name for what is casually referred to as the Linux operating system. Yes, whether referring to GNU/Linux as Linux is sufficient is an argument that has been rehashed over and over again. I do not believe it is, and I prefer to use GNU/Linux. GNU is the userland, i.e. the libraries, applications, compilers, etc. that you use to interact with the kernel, in this case Linux. There is also the GNU/Hurd operating system, but it is still in its infancy (and yes, I am aware that folks have ported the GNU userland over to FreeBSD, etc.). Linux is the kernel: the bridge between the software and the computer’s hardware. Put simply, you blame Linux when your hardware isn’t supported and blame GNU when your favorite application is buggy. Linux is central, it’s vital, it’s the heart of an operating system. GNU is the appendages, the organs, and the organ systems. Together, in a symbiotic interaction, they form the GNU/Linux operating system. Linux is named after Linus Torvalds, and GNU is primarily the brainchild of Richard Stallman.

GNU/Linux is open source (under several licenses such as the GPL, BSD, etc.), so you can get the source and freely modify it, improve on it, contribute to it, and even make money off of it (and other people’s free work!). Everything is developed completely in the open. In most cases, you can get GNU/Linux for free.

GNU/Linux is available in what are called distributions. A distribution takes the hard work out of GNU/Linux by putting the pieces together nicely, packaging up applications, adding pretty themes, and in general making the computing experience much nicer for the end user. Examples of popular distributions include Red Hat, Ubuntu, openSUSE, and Debian GNU/Linux. Red Hat and openSUSE (via Novell’s SLED) offer commercial distributions complete with support, with Red Hat geared towards servers and SLED towards desktops. Of course, you can run either as a server or a desktop.

There are also distributions, such as openSUSE, Ubuntu, Debian, and Fedora, that are available for free. Ubuntu is one of the most popular distributions, is geared towards new users of Linux, and is in general a pretty good computing experience. Mac OS X users should love Ubuntu. I am not going to get into the different desktop environments (GNOME, KDE, XFCE, etc.), but suffice it to say that there are some amazing desktop environments for Linux and other UNIX-based operating systems that are as good as or better than those of Windows or Mac.

Debian GNU/Linux, or Debian for short, is a distribution founded by Ian Murdock. It began as a call by Ian to develop a distribution in the open, entirely by volunteers. It has been hugely successful and has received attention and support (e.g. hardware donations, sponsorship of conferences) from several large hardware vendors, including HP and IBM. It has one of the largest selections, if not the largest, of precompiled packages in the Linux world. Debian has a long release cycle (releasing when ready) and is traditionally known for releases that are extremely stable, albeit sometimes with old software. Debian sports a developer community of 1000+ and has a very good, informal support community. Ubuntu is based on Debian.

So why GNU/Linux and Debian GNU/Linux in particular?
1) Transparency
Both GNU/Linux and Debian are 100% transparent. Bugs, security issues, etc. are in the open. That means you are aware of vulnerabilities and could potentially help to solve them. Academia is supposed to be a transparent community where we can contribute to and learn from one another. The source code in GNU/Linux and Debian is always available, and because of this we can stand on the shoulders of giants.

2) DFSG
Debian looks out for its users by strictly complying with the Debian Free Software Guidelines (DFSG). Some may see this as a problem as well, but I don’t (for example, Firefox ships as Iceweasel).

3) Stability
GNU/Linux and Debian are both extremely stable, especially when compared with Windows and Mac OS X. I realize that others’ experiences may be different, and that’s fine. The distribution you choose to run will also have a great bearing on stability. In addition, GNU/Linux has fewer security holes than Windows and Mac, and when they arise you are aware of them, they are not hidden, and fixes are usually fast. Finally, viruses are extremely rare on Linux.

4) Lots of applications just an apt-get away
OpenOffice.org
LaTeX
Emacs
R
Firefox
Thunderbird
Octave
GIMP
etc.
Some of these applications, such as Emacs and LaTeX, work even better on Linux than on Windows.

5) 64-bit support
64-bit support in Windows and Mac, in my opinion, pales in comparison to Linux, especially Debian. All of my applications, my kernel, etc. are 64-bit, whereas in Windows and Mac you’re still forced to run some 32-bit applications and, in the case of certain Mac hardware, a 32-bit kernel. The performance difference is huge.

6) In general, Debian just works
Debian in general just works out of the box. Unless you have extremely new hardware, everything should be detected. A default install of Debian, i.e. one where you close your eyes and just keep hitting Enter, will give you a complete desktop with an office suite, web browser, email client, graphics manipulation program, etc., all for free, and additional packages are just an apt-get away. This is not a feature unique to Debian but a function of the whole GNU/Linux community.

7) Academia is supposed to be a community that fosters the sharing of ideas, right?
This one, in my opinion, is the most important. Academia is supposed to be about sharing ideas, learning, and seeking knowledge collaboratively. So why would you use an operating system that doesn’t fully embrace these ideals? GNU/Linux, and Debian in particular, allows you to contribute, improve on the code, or just use it for free. If you have an idea, file a bug report and a developer may implement it. Or, with access to the code, you can develop it yourself.

While this list is not exhaustive, these are some of the reasons why I choose GNU/Linux and Debian. So why not Ubuntu, you ask? Well, in my opinion it’s buggier because they release too often, and I don’t really trust Canonical and Mark Shuttleworth. I see Mark as a little bit of a leech on Debian, and while I think Ubuntu has contributed to Debian and upstream projects, the contribution still seems meager compared to how much they’ve benefited. Debian’s slow release cycle also means you don’t have to update as often and you get a release that is more stable. All Ubuntu does is add some frosting on the cake, and that frosting looks more and more like Mac OS X.

Thoughts, comments, and things you think I should re-address or clarify are welcome.

Beamerposter for in-house conference

February 24, 2010

At a recent conference for graduate students in my department, I presented the attached poster. I am sharing it here for two reasons: 1) to show what a poster created entirely in LaTeX and R looks like, and 2) to receive feedback on my poster. I know that there are some typos in the poster (e.g. “Poission” instead of “Poisson”) as well as grammatical mistakes, but I am curious whether anyone has any thoughts about my discussion (yes, it’s very pithy). I am happy to share the *.tex and *.sty files used to create this if anyone is interested. Click on the picture below to download the PDF.

EDIT: The *.tex and *.sty files are in the comments. If you want them, please look for them there.

[Image: gsrd2010 poster]

Dropbox package for Debian Testing (Squeeze) for amd64

February 4, 2010

I am sharing a Dropbox package that I built from an Ubuntu source (from the Dropbox website) on a Debian testing machine, so it is binary compatible with Debian testing. Click here to download it. If there is a problem with it, please let me know.

NOTE: This is not the same as installing the Ubuntu binary of nautilus-dropbox. My package is a complete rebuild and therefore is 100% compatible with Debian as it was built on Debian using Debian build dependencies.

Icedove 3 amd64 packages for Debian Unstable (and Testing)

January 30, 2010

I grew impatient waiting for Icedove 3 (aka Thunderbird 3) to hit Debian Unstable, so I rolled my own packages for amd64 (the only architectures available in Experimental were i386 and armel). I’m uploading them in case they are of interest to anyone. They were built using build-deps from sid. I haven’t done any real testing with them, but they *seem* to work fine here.

Note: They likely work on Testing but I haven’t tried and can’t. Feel free to leave a comment if they do so that others know.

EDIT (Feb 3, 2010): This package works fine on Debian Testing, so feel free to install it there; all the dependencies are in place. Also note that this is just a rebuild of the source from Experimental, so I haven’t done anything fancy other than:


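apt-get source icedove              # fetch and unpack the source package
sudo apt-get build-dep icedove      # install the build dependencies
cd icedove-*/                       # enter the unpacked source tree (directory name depends on the version)
dch -l local                        # add the 'local' suffix to the package version
debuild -us -uc                     # build unsigned packages
sudo dpkg -i ../icedove_3.0.1-1local1_amd64.deb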

i.e. I did a complete rebuild, not just a checkinstall.

2nd EDIT (May 4 2010): I am pulling my icedove 3 packages because icedove 3 is in Squeeze now and that is what people should be installing because it’s maintained and gets security updates! My package is not maintained, is old, and is no more!

ZIP model updates and longitudinal talk

January 14, 2010

Since my last post, I’ve finally managed to make my ZIP models (a mixture of a binomial zero-inflation component and a Poisson count component) converge in MCMCglmm. The trick was that the model needed a longer burn-in and a larger number of iterations to converge. With a burn-in of 10,000 and 60,000 total iterations, my model converged. I am indebted to Jarrod Hadfield for his time and patience in helping me through this problem.

The reason I initially had to move to a ZIP model was that 86% of the student observations in my data set had zero suspensions, so there was a strong need to account for this in the model. The Poisson model could not account for the excess zeros and did a horrible job predicting. The ZIP model did a much better job. Below is the syntax that I used to solve my problem.


library(MCMCglmm)

# Priors: residual (R) variance of the zero-inflation part is fixed; one parameter-expanded prior per G-structure term
prior1 <- list(R = list(V = diag(2), nu = 1, fix = 2),
               G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 25^2),
                        G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 25^2),
                        G3 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 25^2)))

# Zero-inflated Poisson model; at.level(trait, 1) restricts a term to the Poisson part
q2.1000 <- MCMCglmm(sus ~ trait - 1 + at.level(trait, 1):grade + at.level(trait, 1):I(grade^2) +
                      at.level(trait, 1):male.f + at.level(trait, 1):ethnic.f +
                      at.level(trait, 1):sped.f + at.level(trait, 1):risk.f,
                    random = ~us(at.level(trait, 1)):id.f + us(at.level(trait, 1)):schn.f +
                      (at.level(trait, 1)):schn.f:id.f,
                    rcov = ~idh(trait):units, data = ransamp1000, family = "zipoisson",
                    prior = prior1, nitt = 60000, thin = 50, burnin = 10000)
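As a quick sanity check on the MCMC settings: in MCMCglmm, nitt is the total number of iterations including the burn-in, so the number of posterior draws that actually get stored follows from simple arithmetic. A small sketch using the values from the call above (the variable names are just for illustration):

```r
# Values copied from the MCMCglmm() call above.
nitt   <- 60000   # total iterations, burn-in included
burnin <- 10000   # iterations discarded at the start
thin   <- 50      # keep every 50th iteration after the burn-in

stored_draws <- (nitt - burnin) / thin
stored_draws
# 1000
```

So the posterior summaries for this model rest on 1,000 stored draws per parameter.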

The R structure in the prior corresponds to the residual (random error) variance-covariance matrix; the residual variance is not estimated in a binomial model, so it is fixed. The G structure corresponds to the random-effects variance-covariance matrix. In my case it has three elements, so I need three priors: one each for the random intercepts for students (~us(at.level(trait,1)):id.f), schools (~us(at.level(trait,1)):schn.f), and students nested within schools (~us(at.level(trait,1)):schn.f:id.f). The priors on the G-structure are weakly informative Cauchy priors, and I encourage you to read Jarrod’s extensive CourseNotes included with his package and see Gelman (2006), “Prior distributions for variance parameters in hierarchical models”, for more information about the priors.
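A rough way to see what these G-structure priors imply is to simulate from them. The sketch below assumes the parameterization described in the MCMCglmm CourseNotes, where a parameter-expanded prior with V = 1, nu = 1, alpha.mu = 0, and alpha.V = 25^2 corresponds to a half-Cauchy prior with scale 25 on the random-effect standard deviation. The number of draws, the seed, and the object names are arbitrary choices for the demo, and this is a check by simulation, not part of the model code.

```r
# Simulate the standard deviation implied by the parameter-expanded prior
# (assumed parameterization: V = 1, nu = 1, alpha.mu = 0, alpha.V = 25^2).
set.seed(42)
n_draws <- 1e5
alpha <- rnorm(n_draws, mean = 0, sd = 25)   # working parameter with variance alpha.V
eta_var <- 1 / rchisq(n_draws, df = 1)       # one-dimensional inverse-Wishart(V = 1, nu = 1)
sd_draws <- abs(alpha) * sqrt(eta_var)       # implied random-effect standard deviation

# Compare simulated quartiles with those of a half-Cauchy with scale 25.
probs <- c(0.25, 0.50, 0.75)
half_cauchy_q <- 25 * qcauchy((1 + probs) / 2)
rbind(simulated = quantile(sd_draws, probs), half_cauchy = half_cauchy_q)
```

The two rows should be close, with the median near 25. The heavy right tail is what keeps the prior only weakly informative: large standard deviations remain plausible.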

My estimates, both ‘fixed’ and ‘random’, appear to be robust to the priors; using the Cauchy ones with parameter expansion just speeds up convergence. (The model takes about 26.5 hours to run with 175,975 data points.)
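To make the earlier point about excess zeros concrete, here is a small numerical sketch of why a plain Poisson model struggles when 86% of the outcomes are zero. Only the 86% figure comes from my data; the Poisson mean lambda is an assumed value picked for illustration, and the object names are hypothetical.

```r
# Illustrative values only: lambda is an assumed Poisson mean, not an estimate
# from the suspension data. Only the 86% zero rate comes from the post.
lambda <- 1.5
p_zero_observed <- 0.86

# ZIP: P(Y = 0) = pi + (1 - pi) * exp(-lambda). Solve for the mixing weight pi
# that reproduces the observed share of zeros.
pi_zero <- (p_zero_observed - dpois(0, lambda)) / (1 - dpois(0, lambda))
p_zero_zip <- pi_zero + (1 - pi_zero) * dpois(0, lambda)

# A plain Poisson model matched to the same overall mean count.
overall_mean <- (1 - pi_zero) * lambda
p_zero_poisson <- dpois(0, overall_mean)

round(c(pi = pi_zero, zip = p_zero_zip, poisson = p_zero_poisson), 3)
```

With these assumed values the ZIP model reproduces the 86% zeros by construction, while a Poisson model with the same mean predicts only roughly 76% zeros. That gap is the kind of misfit the zero-inflation component is there to absorb.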

Finally, I am posting a recent talk I gave on longitudinal models here and the TeX file here. It was created in Beamer and should compile, except for the references to graphs that were not included, so comment those out.

PDF annotation in Linux and update about Bayesian guide

December 11, 2009

I have been searching for a program that can annotate PDFs natively in Linux. I stumbled upon this PDF guide. However, I didn’t want to use Wine if I didn’t have to. Fortunately I came across Jarnal. While Jarnal isn’t as fully featured as using a PDF viewer under Wine, it does work. Also there’s a Debian package. So if you’re looking for an alternative to running Foxit or PDF-XChange Viewer via Wine, Jarnal is a viable option.

To use Jarnal, I do the following: File -> Open Background (and find the PDF I’m interested in), then under Format -> Paper and Background I change “Lined” to “Plain”. I am not sure how to set this up as the default, but it works well and at least I can annotate.

Also, I plan to update the Bayesian guide during my break between semesters. Mostly, I plan to expand the information about priors and maybe write up some simple code comparing GLM to MCMC results. The code will most likely be written in R, but it might include some BUGS code. Ideally, it would be code in JAGS, but I don’t know JAGS yet. We’ll see if I get time to learn it over break. Probably won’t.