Age-standardisation of means

Standardisation of rates by age (or other variables) is common. Standardisation of means is less common, but also seems useful. For example, I wanted to compare the mean duration of hospital admission between group with different age and sex composition without having to stratify my results or use regression.

Calculating the adjusted mean is easy – you just apply the stratified means to a reference population, then divide the total by the total size of the reference population. Calculating a standard error for the adjusted mean could be more difficult. Luckily there is a method in the ‘survey’ package, based on the method described by the CDC.

# generate data in which older people and women have longer spell durations, but men are older

n <- 250
datm <- data.frame(agegrp = sample(c('15-34', '35-44', '45-55'), n, T, c(0.1, 0.3, 0.6)), sex = 'm')
datf <- data.frame(agegrp = sample(c('15-34', '35-44', '45-55'), n, T, c(0.6, 0.3, 0.1)), sex = 'f')
dat <- rbind(datm, datf)
dat$agegrp <- as.factor(dat$agegrp)
dat$spell <- runif(n * 2, 0, 5)
dat$spell <- dat$spell * ifelse(dat$agegrp == '35-44', 1.5, 1)
dat$spell <- dat$spell * ifelse(dat$agegrp == '45-55', 2.0, 1)
dat$spell <- dat$spell * ifelse(dat$sex == 'f', 1.3, 1)

# calculate raw means: men higher than women; women higher than men within strata

aggregate(dat$spell, by = list(dat$sex), FUN = 'mean')
x <- aggregate(dat$spell, by = list(dat$agegrp, dat$sex), FUN = 'mean')

# calculate age-adjusted means: women higher than men

popage <- as.numeric(table(dat$agegrp))
sum(popage * x[1:3,3]) / sum(popage) # male age-adjusted mean
sum(popage * x[4:6,3]) / sum(popage) # female age-adjusted mean

# standard errors of age-adjusted means


design <- svydesign(ids = ~1, strata = ~agegrp, data = dat)
stdes <- svystandardize(design, by = ~agegrp, over = ~sex, population = popage)
y <- svyby(~spell, ~sex, svymean, design = stdes)
y$lower <- y$spell - 1.96 * y$se
y$upper <- y$spell + 1.96 * y$se

y$spell[2] / y$spell[1]

The object ‘y’ contains the adjusted means, standard errors and confidence intervals. The adjusted means are identical to those calculated manually. The final line shows that the age-adjusted mean spell for women is approx 1.3 times that of men, as you would expect from the data.


Correlation matrix for categorical variables

In R, you can easily create a correlation matrix of continuous variables using the base function ‘cor’. But there’s no comparable way to create a correlation matrix of categorical variables. The function below provides a matrix of Cramer’s V (requiring the ‘vcd’ package), where:

‘vars’ is a string vector of categorical variables that you want to correlate
‘dat’ is a data.frame containing the variables

catcorrm <- function(vars, dat) sapply(vars, function(y) sapply(vars, function(x) assocstats(table(dat[,x], dat[,y]))$cramer))

Wrapper for indirect standardisation function from epiR

I like the indirect standardisation function in epiR. I’ve made a wrapper that lets you put in data rather than fiddly summary tables.

‘dat’ is a data.frame containing the other variables.
‘outcome’ is a character name of the binary outcome variable.
‘strata’ is a character name of the factor variable that you want to stratify by (e.g. region).
‘agegrp’ is a character name of the factor variable that you want to standardise for (e.g. age group).
‘survival’ is a character name of the numeric variable showing survival time. If you include this variabe, the output will be a rate ratio; otherwise it will be a risk ratio.
‘ref_strat’ is a character name of the stratum that you want to use as the reference value. If you include this variable, the specified stratum will have an SMR of 100. Otherwise, the study population will be used as a reference.

SMR.wrapper <- function(dat, outcome, strata, agegrp, survival = NA, ref_strat = NA) {
	labs <- levels(dat[,strata])
	levs <- 1:nlevels(dat[,agegrp])
	dat$ind <- 1
	strat_deaths <- matrix(aggregate(dat[,outcome], by = list(dat[,strata]), FUN = sum)[,2], dimnames = list(labs, ""))
	mths_at_risk <- matrix(xtabs(ifelse(rep(, nrow(dat)), dat$ind, dat[,survival]) ~ dat[,strata] + dat[,agegrp]), nlevels(dat[,strata]), nlevels(dat[,agegrp]), dimnames = list(labs, levs))
	if(! {
		dat <- dat[dat[,strata] == ref_strat,]
	ref_deaths <- aggregate(dat[,outcome], by = list(dat[,agegrp]), FUN = sum)
	ref_deaths <- c(ref_deaths[,2], sum(ref_deaths[,2]))
	ref_mths <- aggregate(ifelse(rep(, nrow(dat)), dat$ind, dat[,survival]), by = list(dat[,agegrp]), FUN = sum)
	ref_mths <- c(ref_mths[,2], sum(ref_mths[,2]))
	ref_rate <- matrix(ref_deaths / ref_mths, 1, dimnames = list("", c(levs, "Total")))
	return(epi.indirectadj(strat_deaths, mths_at_risk, ref_rate, units = 100))

More efficient way to find primes in R

I previously posted a method of finding all primes up to a number ‘max’. Here’s a more efficient method that removes all non-primes rather than testing whether each number is a prime.

primes <- function(max) {
	a <- 2:max
	b <- 2:(max/2)
	for (i in b) {
		c <- (1:(max/i) * i)[-1]
		c <- c[c %in% a]
		a <- a[!(a %in% c)]

Nice output tables for logistic models

You often want effect sizesĀ and confidence intervals on the odds scale from your logistic regression models. The models in R (e.g. glm, zelig) are really useful but need a bit of work before you have something you can use in your paper. This function produces a nice output data frame.

LRoutput <- function(model) {
    coefs <- summary(model)$coefficients
    CIs <- confint(model)
    coefsCIs <- cbind(coefs[,1], CIs)
    x <- data.frame(exp(coefsCIs))
    x <- format(round(x, 2), digits = 2, nsmall = 2)
    p <- round(coefs[,4], 3)
    p2 <- ifelse(p < 0.001, "<0.001", paste0(" ", format(p, digits = 3, nsmall = 3)))
    stars <- findInterval(p, c(0, 0.001, 0.01, 0.05, 0.1))
    stars <- factor(stars, 1:5, labels = c("***", "**", "*", ".", ""))
    summ <- paste0(x[,1], " (", x[,2], "-", x[,3], ";", p2, ")")
    return(data.frame(coef = x[,1], lci = x[,2], uci = x[,3], p = p, stars = stars, summary = summ))

Recode a vector

I haven’t managed to find a function that can recode vectors in a simple way, and that works for any class of vector. I think this function should be quite efficient despite the loop. ‘Vect’ is the vector you want to recode (which could be a variable in a data frame), ‘originals’ is a vector of values that appear in that vector, e.g. c(1, 2, NA), and replacements are the values you want to replace them with, e.g. c(NA, 7, 4). There is no need to include all unique values in the ‘originals’ vector.

recode <- function(vect, originals, replacements) {
	u <- rep(NA, length(vect))
	for (i in 1:length(originals)) {
		if([i])) {
			u[] <- replacements[i]
		} else {
		u[vect == originals[i]] <- replacements[i]
	return(ifelse(vect %in% originals, u, vect))

Determine if a vector is ordered

This little bit of code tells you if a vector x is ordered: 1 = ascending, 2 = descending, 0 = not ordered.

ordered.status <- function (x) {
	if (x[1] > x[2]) {
		ord <- order(x, decreasing = T)
		ordx <- x[ord]
		return(ifelse(identical(x, ordx), 1, 0))
		} else {
			ord <- order(x)
			ordx <- x[ord]
			return(ifelse(identical(x, ordx), 2, 0))