Sunday, February 2, 2014

Boxplot with mean and standard deviation in ggplot2 (plus jitter)

When you create a boxplot in R, it automatically computes the median, the first and third quartiles ("hinges") and an approximate 95% confidence interval for the median ("notches").
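
To see those default statistics as numbers, here is a minimal sketch using base R's boxplot.stats() on made-up data. The vector x and the seed are assumptions for illustration only; the notch formula in the comment is the one given in the boxplot.stats() help page.

```r
# Illustrative data (not from the post)
set.seed(1)
x <- runif(20)

bs <- boxplot.stats(x)

# Five numbers: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)

# Notch limits: median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n)
print(bs$conf)

# The hinges and the median are the same values that fivenum() returns
print(fivenum(x))
```

Note that the whiskers stop at the most extreme data points within 1.5 times the box height from the hinges, so they are not always the minimum and maximum.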

Here we would like to replace these default boxplot statistics with the mean, the mean + standard deviation (SD), the mean - SD, and the minimum and maximum values.
Here is an example using the ggplot2 package. The individual values are also plotted as points, jittered horizontally.
library(ggplot2)

# create fictitious data
a <- runif(10)
b <- runif(12)
c <- runif(7)
d <- runif(15)

# data groups
group <- factor(rep(1:4, c(10, 12, 7, 15)))

# data frame
mydata <- data.frame(value = c(a, b, c, d), group = group)

# function for computing min, mean - SD, mean, mean + SD and max
min.mean.sd.max <- function(x) {
  r <- c(min(x), mean(x) - sd(x), mean(x), mean(x) + sd(x), max(x))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

# ggplot code: box = mean +/- SD, middle line = mean, whiskers = min and max
p1 <- ggplot(mydata, aes(x = group, y = value))
p1 <- p1 +
  stat_summary(fun.data = min.mean.sd.max, geom = "boxplot") +
  geom_jitter(position = position_jitter(width = .2), size = 3) +
  ggtitle("Boxplot with mean, mean +/- SD, min and max") +
  xlab("Groups") +
  ylab("Values")
print(p1)
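
As a quick sanity check, the summary function can be called on a small vector to see the five named values that the boxplot geom receives. This is only a sketch: the vector x is made up for illustration, and the pnorm() lines assume normally distributed data to show how much of the distribution a mean +/- SD box would cover.

```r
# Same summary function as in the post
min.mean.sd.max <- function(x) {
  r <- c(min(x), mean(x) - sd(x), mean(x), mean(x) + sd(x), max(x))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

# Illustrative values (not from the post)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
s <- min.mean.sd.max(x)
print(round(s, 3))

# Under a normal distribution, mean +/- 1 SD covers about 68% of the values
within.1sd <- pnorm(1) - pnorm(-1)
# and mean +/- 2 SD covers about 95%
within.2sd <- pnorm(2) - pnorm(-2)
print(c(within.1sd, within.2sd))
```

So the box drawn here is a spread of the data around the mean, not a confidence interval of the mean.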

9 comments:

  1. "we would like to change the default values of boxplot graphics"

    Have you also considered changing the value of pi to 2.0 and renaming the mean as "the Oxford-Meier-Blubb"? Please, don't do any of these three. There are enough readers struggling with the meaning of a boxplot. If there end up being boxplot-like figures that are not boxplots, it's going to hurt. Please find an alternative way to display the results.

    Cheers,
    Bernhard

  2. @Duleep Samuel
    at the end of the code, type: print(p1)

  3. Thanks for rectifying, now the code rocks.

  4. Wonderful! I'm writing a report with a deadline coming soon, and this is exactly what I needed!

    Ciao!

  5. Hi, great post, but I have a question. Your function calculates mean - SD and mean + SD, which I take to be the box of the boxplot, and the min and max values, which I suppose are the whiskers. So why does the graph title say "95%CI"? Shouldn't it be "mean +/- SD"?

    Replies
    1. Your point is valid! It is not a 95% CI. Mean +/- 1 SD covers about 68% of normally distributed values; to cover roughly 95% you should change the function to mean(x) -+ 2*sd(x).

  6. Please elaborate on this more. A first-time reader who doesn't know about this could not understand it.

  7. I like how this post demonstrates customizing boxplots in R with ggplot2.
