Assigned Reading:
Chapter 5 from: Zuur, A. F., Ieno, E. N., Walker, N., Saveliev, A. A. and Smith, G. M. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer. link
Fixed and random effects affect mean and variance of y, respectively.
Examples
Fixed: Nutrient added or not, male or female, upland or lowland, wet versus dry, light versus shade, one age versus another
Random: genotype, block within a field, individuals with repeated measures, family, parent
Mixed-effects models maximize use of info in the data, compared to 2-stage analysis.
Random intercept model
Random intercept and slope model
Random effects model
Step 1: fit linear regression
Step 2: fit model with gls (so linear regression model can be compared with mixed-effects models)
Step 3: choose variance strcuture
Adjust variance structure to take care of heterogeneity
Step 4: fit the model
Make sure method="REML"
M1.lme=lme(Form, random=~1|Nest, method="REML", data=Owls)
Step 5: compare new mixed-effects model with old
Form <- formula(LogNeg~SexParent*FoodTreatment+SexParent*ArrivalTime)
M.gls = gls(Form, data=Owls)
M1.lme=lme(Form, random=~1|Nest, method="REML", data=Owls)
anova(M.gls, M1.lme)
Stept 6: validate model and, if necessary, repeat steps 4 and 5 until good model is found
Steps 7 and 8: find optimal fixed effects structure
First, interaction terms; drop one at time and compare models
Use methods="ML"
Step 9: refit with REML and validate model
Step 10: interpret
Step 11: additive model
Chapter 19. Mixed-Effercts Models, in Crawley 2012 The R Book, 2nd ed. link
Fixed vs. random effects
Pseudoreplication
More examples of mixed-effects models for different sampling and experimental designs
The data:
Hummingbird species specialization measured using z-score standardized d’ values
Load the data
This is a dataset of 3 columns:
# Beth's data are not availale on our class GitHub repository.
# To use this script, download the data from our class Canvas site and place it in the data folder of your working directory.
# Use this line if you downloaded the data file to the data directory on your computer
spec <- read.csv(file = "data/Specialization_Data_BIO202.csv", header = TRUE)
head(spec)
## Polygon.Area.100 fSpecies Specialization
## 1 1 Charming.Hummingbird -0.574054
## 2 1 Green.Hermit 8.986848
## 3 1 Rufous.tailed.Hummingbird 1.360791
## 4 1 Snowy.bellied.Hummingbird 1.980888
## 5 1 Stripe.throated.Hermit 2.902282
## 6 1 Violet.crowned.Woodnymph 1.662080
Originally when I did this analysis I followed Zuur et al.’s “10-step process” in the book starting on pg. 130. I do not outline all the steps I took here, but I found it useful in case any of you are interested for future reference.
All analayses are done using the package {nlme}
but can also be done in {lme4}
library(nlme)
Since I want to know if hummingbird specialization changes with increasing forest cover, I can start with a simple linear regression
Side note: instead of using the lm()
function for my linear model, I use the gls()
function in case I need to add a variance later on, this way I can compare the model with no variance structure to the model with the variance structure.
Form <- Specialization ~ Polygon.Area.100
M.lm <- gls(Form, data = spec)
plot(Form, data = spec, xlab = "Forest Cover (100m)", ylab = "Specialization (d')")
abline(M.lm, lwd = 4)
plot(M.lm)
The cone shape in my residuals illustrates that my data are violating the assumption of homogeneity, so I will apply a variance structure to hopefully take care of this.
The best variance structure I found was the varPower structure
vf3 <- varPower(form = ~Polygon.Area.100) ## power of the covariate variance structure = the variance of the residuals multiplied with the power of the absolute value of the variance covariate forest cover
## nested within linear model so can be compared to linear model fit using log likelihood ratio
## Variance covariate can not have values equal to zero
M.gls3 <- gls(Specialization ~ Polygon.Area.100, weights = vf3, data = spec)
AIC(M.lm, M.gls3) ## we used the gls function earlier instead of the lm function so we can do this anova
## df AIC
## M.lm 3 908.2300
## M.gls3 4 851.0649
plot(M.gls3) ## you can see the residuals are now less cone-y
summary(M.gls3)
## Generalized least squares fit by REML
## Model: Specialization ~ Polygon.Area.100
## Data: spec
## AIC BIC logLik
## 851.0649 863.4399 -421.5325
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~Polygon.Area.100
## Parameter estimates:
## power
## 0.7317331
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 0.193740 0.3090522 0.626885 0.5316
## Polygon.Area.100 4.830138 0.8255718 5.850657 0.0000
##
## Correlation:
## (Intr)
## Polygon.Area.100 -0.791
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.5689357 -0.6917083 -0.3274916 0.5277962 2.7803202
##
## Residual standard error: 5.450152
## Degrees of freedom: 165 total; 163 residual
From the model with the variance structure we can tell that it looks like there is a relationship between hummingbird species specialization and amount of forest cover, so that’s neat.
Biologically, however, it’s totally possible that each of the 13 hummingbird species has its own relationship between specialization and forest cover. For example, a morphologically specialized hummingbird species that has a really long, curved beak may not be able to display the same degree of change in its specialization as a hummingbird with a medium-length, straight beak.
In addition, as with the example of the RIKZ data from 5 beaches in the Zuur book Chp. 5, the specialization values of each hummingbird species across the sites are likely to be more related to each other than to the specialization values of other hummingbird species. This makes the data nested.
Thus, a mixed effects model for nested data is applicable in this case!
We can model specialization as a linear function of forest cover where the intercept is allowed to change per hummingbird species. This biologically makes sense because it is expected that each hummingbird species will have its own unique specialization behavior at each site.
Allowing the intercept to vary for each species is a random intercept model
Mlme1 <- lme(Form, random = ~1|fSpecies, data = spec, weights = vf3)
## lme stands for linear mixed effects. In this function you must specify a "random" argument
## ~1|fSpecies specifies the random intercept model. Remember | in this case means "by" so you are allowing the intercept to vary by species.
summary(Mlme1)
## Linear mixed-effects model fit by REML
## Data: spec
## AIC BIC logLik
## 807.9522 823.4209 -398.9761
##
## Random effects:
## Formula: ~1 | fSpecies
## (Intercept) Residual
## StdDev: 1.425693 4.349495
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~Polygon.Area.100
## Parameter estimates:
## power
## 0.7097861
## Fixed effects: list(Form)
## Value Std.Error DF t-value p-value
## (Intercept) 0.077894 0.4869339 151 0.159969 0.8731
## Polygon.Area.100 4.678450 0.6992141 151 6.691012 0.0000
## Correlation:
## (Intr)
## Polygon.Area.100 -0.467
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.1087088 -0.6534845 -0.2104922 0.6095179 3.2441853
##
## Number of Observations: 165
## Number of Groups: 13
The fixed effects part of the output shows that the general equation for the relationship between forest cover (FC) and specialization is \(0.08 + 4.68*FC\).
This is the average relationship for all the species considered together. However, we are letting each hummingbird species get its own intercept, so each hummingbird species can get its own equation as well. To incorporate the contribution of each species to the equation, we incorporate the random effect of each species into the intercept portion of the equation.
F0 <- fitted(Mlme1, level = 0) ## the fitted, fixed values for the population model. These values are the same for every hummingbird within a site -- this will plot our overall trend line
F1 <- fitted(Mlme1, level = 1) ## the fitted, random values for each hummingbird at each site (within-species fitted values). These vary for every hummingbird within a site -- this will plot the trend line for each individual species
I <- order(spec$Polygon.Area.100); Polygon.Area.100s <- sort(spec$Polygon.Area.100) ## order() gives the sorted row numbers corresponding to increasing forest cover, these numbers will be used to index the correct corresponding values of F0 (see plot function below), sort() gives the actual values of increasing forest cover to plot
thirteen2 <- unique(spec$fSpecies) ## I use this for indexing in my for-loop, it is simply a list of names of every species in the data set (there are thirteen species)
plot(Polygon.Area.100s, F0[I], lwd = 4, type = "l", ylab = "Specialization (d')", xlab = "Forest Cover (100m)", ylim = c(-5, 10)) ## draws average trend line
## following for-loop draws thirteen individual species trend lines
for(i in 1:length(thirteen2)) {
x1 <- spec$Polygon.Area.100[spec$fSpecies == thirteen2[i]]
y1 <- F1[spec$fSpecies == thirteen2[i]]
K <- order(x1)
lines(sort(x1), y1[K])
}
anova(M.gls3, Mlme1) ## compares the non-random effects model to the random-intercept model - which one fits better?
## Model df AIC BIC logLik Test L.Ratio p-value
## M.gls3 1 4 851.0649 863.4399 -421.5325
## Mlme1 2 5 807.9522 823.4209 -398.9761 1 vs 2 45.11275 <.0001
The AIC is lower for the random-intercept model and the L.Ratio is quite high, so the p-value is low (p<0.0001) which is good evidence that the random-effects model is a better fit for the data, and that each species does in fact have its own specialization relationship with forest cover.
However, we have good reason to think, biologically, that the relationship between specialization and forest cover is different for each hummingbird species (and thus not all species will have a relationship parallel to the population curve).
Thus, we will now fit a random intercept AND random slope model
Mlme2 <- lme(Form, random = ~1 + Polygon.Area.100|fSpecies, data = spec, weights = vf3) ## the "~1" specifies the random intecerpt and "Polygon.Area.100|fSpecies" allows the slope to vary by hummingbird species
summary(Mlme2)
## Linear mixed-effects model fit by REML
## Data: spec
## AIC BIC logLik
## 763.8287 785.485 -374.9144
##
## Random effects:
## Formula: ~1 + Polygon.Area.100 | fSpecies
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.5185718 (Intr)
## Polygon.Area.100 4.6121499 -0.079
## Residual 3.4333447
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~Polygon.Area.100
## Parameter estimates:
## power
## 0.6460465
## Fixed effects: list(Form)
## Value Std.Error DF t-value p-value
## (Intercept) 0.273699 0.2775939 151 0.9859696 0.3257
## Polygon.Area.100 3.919556 1.4091936 151 2.7814175 0.0061
## Correlation:
## (Intr)
## Polygon.Area.100 -0.324
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.7228479 -0.5905337 -0.0581716 0.5215418 3.5139731
##
## Number of Observations: 165
## Number of Groups: 13
The amount of variation around the intercept is 0.522 and the random vairation around the slope is 4.612 indicating that there is more variation in slopes than in intercepts between the hummingbird species (which is evident when you look at the plot below). There is a correlation between random intercepts and slopes of only -0.079 which is quite low (good).
AIC(Mlme1, Mlme2) ## does the random intercept model or the random intercept and slope model have a lower AIC?
## df AIC
## Mlme1 5 807.9522
## Mlme2 7 763.8287
The random intercept and slope model has a lower AIC than the random intercept model, which is a good sign.
## plotting the random intercept and slope model
F0 <- fitted (Mlme2, level = 0)
F1 <- fitted(Mlme2, level = 1)
I <- order(spec$Polygon.Area.100); Polygon.Area.100s <- sort(spec$Polygon.Area.100)
h.cols.rainbow = rainbow(13) ## because this time it would be nice to give a different color to each species
plot(Polygon.Area.100s, F0[I], lwd = 6, type = "l", ylab = "Specialization (d')", xlab = "Forest Cover (100m)", ylim = c(-3, 15))
for(i in 1:length(thirteen2)) {
x1 <- spec$Polygon.Area.100[spec$fSpecies == thirteen2[i]]
y1 <- F1[spec$fSpecies == thirteen2[i]]
K <- order(x1)
lines(sort(x1), y1[K], lwd = 2, lty = 4, col = h.cols.rainbow[i])
}
fixed.effects(Mlme2)
## (Intercept) Polygon.Area.100
## 0.2736991 3.9195557
random.effects(Mlme2) ## gives the intercept and slope adjustment to the population curve for each hummingbird species. To get the intercept and slope for each individual species you add each species' values below to the fixed effects values (i.e. the population curve values) given above.
## (Intercept) Polygon.Area.100
## Charming.Hummingbird -0.371743752 0.1196069
## Garden.Emerald 0.138848695 -2.7304097
## Green.crowned.Brilliant -0.343739802 -1.6064965
## Green.Hermit 0.148005627 10.4221947
## Long.billed.Starthroat -0.132222769 -4.3955088
## Rufous.tailed.Hummingbird 0.041034119 3.6722203
## Scaly.breasted.Hummingbird -0.343415166 -4.7436235
## Snowy.bellied.Hummingbird 0.552190446 -2.8180533
## Stripe.throated.Hermit 0.261321687 0.1362129
## Violet.crowned.Woodnymph -0.204904787 -0.9737176
## Violet.Sabrewing 0.189618494 -3.2726526
## White.tailed.Emerald 0.056395222 0.2902166
## White.tipped.Sicklebill 0.008611986 5.9000106
In practical terms, what these numbers mean is that if you were to catch a hummingbird in a site with no forest cover whatsoever (the intercept) and you didn’t know what species it was, you would predict that its specialization would be 0.27 because that’s the population estimate for all hummingbird species in general. However, if you caught, say, a Garden Emerald hummingbird in a site with no forest cover, because you know the species, you predict that its specialization is 0.27 + 0.14 = 0.41, which is a higher specialization than the population estimate.
Mlme3 <- lme(Specialization ~ 1, random = ~1|fSpecies, data = spec, weights = vf3)
summary(Mlme3)
## Linear mixed-effects model fit by REML
## Data: spec
## AIC BIC logLik
## 846.6653 859.0648 -419.3327
##
## Random effects:
## Formula: ~1 | fSpecies
## (Intercept) Residual
## StdDev: 1.443682 5.070841
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~Polygon.Area.100
## Parameter estimates:
## power
## 0.7504239
## Fixed effects: Specialization ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 1.549056 0.4422072 152 3.50301 6e-04
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.11140157 -0.27351685 0.07591166 0.63584959 3.40459610
##
## Number of Observations: 165
## Number of Groups: 13
Comparing our models with AIC
AIC(M.gls3, Mlme1, Mlme2, Mlme3)
## df AIC
## M.gls3 4 851.0649
## Mlme1 5 807.9522
## Mlme2 7 763.8287
## Mlme3 4 846.6653
Using this approach we would select our random intercept and random slope model (Mlme2) because it has the lowest AIC
Fitting the final model with Restricted Maximum Liklihood
M1 <- lme(Form, random = ~1 + Polygon.Area.100|fSpecies, data = spec, weights = vf3, method = "REML")
summary(M1)
## Linear mixed-effects model fit by REML
## Data: spec
## AIC BIC logLik
## 763.8287 785.485 -374.9144
##
## Random effects:
## Formula: ~1 + Polygon.Area.100 | fSpecies
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.5185718 (Intr)
## Polygon.Area.100 4.6121499 -0.079
## Residual 3.4333447
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~Polygon.Area.100
## Parameter estimates:
## power
## 0.6460465
## Fixed effects: list(Form)
## Value Std.Error DF t-value p-value
## (Intercept) 0.273699 0.2775939 151 0.9859696 0.3257
## Polygon.Area.100 3.919556 1.4091936 151 2.7814175 0.0061
## Correlation:
## (Intr)
## Polygon.Area.100 -0.324
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.7228479 -0.5905337 -0.0581716 0.5215418 3.5139731
##
## Number of Observations: 165
## Number of Groups: 13
fixed.effects(Mlme2)
## (Intercept) Polygon.Area.100
## 0.2736991 3.9195557
random.effects(Mlme2) ## gives the intercept and slope adjustment to the population curve for each hummingbird species. To get the intercept and slope for each individual species you add each species' values below to the fixed effects values (i.e. the population curve values) given above.
## (Intercept) Polygon.Area.100
## Charming.Hummingbird -0.371743752 0.1196069
## Garden.Emerald 0.138848695 -2.7304097
## Green.crowned.Brilliant -0.343739802 -1.6064965
## Green.Hermit 0.148005627 10.4221947
## Long.billed.Starthroat -0.132222769 -4.3955088
## Rufous.tailed.Hummingbird 0.041034119 3.6722203
## Scaly.breasted.Hummingbird -0.343415166 -4.7436235
## Snowy.bellied.Hummingbird 0.552190446 -2.8180533
## Stripe.throated.Hermit 0.261321687 0.1362129
## Violet.crowned.Woodnymph -0.204904787 -0.9737176
## Violet.Sabrewing 0.189618494 -3.2726526
## White.tailed.Emerald 0.056395222 0.2902166
## White.tipped.Sicklebill 0.008611986 5.9000106
In practical terms, what these numbers mean is that if you were to catch a hummingbird in a site with no forest cover whatsoever (the intercept) and you didn’t know what species it was, you would predict that its specialization would be 0.27 because that’s the population estimate for all hummingbird species in general. However, if you caught, say, a Garden Emerald hummingbird in a site with no forest cover, because you know the species, you predict that its specialization is \(0.27 + 0.14 = 0.41\), which is a higher specialization than the population estimate.
Another (easier) way to see the parameter estimates for each species is simply using the function coef()
coef(Mlme2)
## (Intercept) Polygon.Area.100
## Charming.Hummingbird -0.09804466 4.0391626
## Garden.Emerald 0.41254779 1.1891461
## Green.crowned.Brilliant -0.07004071 2.3130593
## Green.Hermit 0.42170472 14.3417505
## Long.billed.Starthroat 0.14147632 -0.4759531
## Rufous.tailed.Hummingbird 0.31473321 7.5917760
## Scaly.breasted.Hummingbird -0.06971607 -0.8240677
## Snowy.bellied.Hummingbird 0.82588954 1.1015025
## Stripe.throated.Hermit 0.53502078 4.0557686
## Violet.crowned.Woodnymph 0.06879431 2.9458381
## Violet.Sabrewing 0.46331759 0.6469032
## White.tailed.Emerald 0.33009432 4.2097723
## White.tipped.Sicklebill 0.28231108 9.8195663
Mlme3 <- lme(Specialization ~ 1, random = ~1|fSpecies, data = spec, weights = vf3)
summary(Mlme3)
## Linear mixed-effects model fit by REML
## Data: spec
## AIC BIC logLik
## 846.6653 859.0648 -419.3327
##
## Random effects:
## Formula: ~1 | fSpecies
## (Intercept) Residual
## StdDev: 1.443682 5.070841
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~Polygon.Area.100
## Parameter estimates:
## power
## 0.7504239
## Fixed effects: Specialization ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 1.549056 0.4422072 152 3.50301 6e-04
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.11140157 -0.27351685 0.07591166 0.63584959 3.40459610
##
## Number of Observations: 165
## Number of Groups: 13
Random effects only graph
## plotting the random intercept and slope model
F0 <- fitted (Mlme3, level = 0)
F1 <- fitted(Mlme3, level = 1)
I <- order(spec$Polygon.Area.100); Polygon.Area.100s <- sort(spec$Polygon.Area.100)
h.cols.rainbow = rainbow(13) ## because this time it would be nice to give a different color to each species
plot(Polygon.Area.100s, F0[I], lwd = 6, type = "l", ylab = "Specialization (d')", xlab = "Forest Cover (100m)", ylim = c(-3, 15))
for(i in 1:length(thirteen2)) {
x1 <- spec$Polygon.Area.100[spec$fSpecies == thirteen2[i]]
y1 <- F1[spec$fSpecies == thirteen2[i]]
K <- order(x1)
lines(sort(x1), y1[K], lwd = 2, lty = 4, col = h.cols.rainbow[i])
}
Comparing our models with AIC
AIC(M.gls3, Mlme1, Mlme2, Mlme3)
## df AIC
## M.gls3 4 851.0649
## Mlme1 5 807.9522
## Mlme2 7 763.8287
## Mlme3 4 846.6653
Using this approach we would select our random intercept and random slope model (Mlme2) because it has the lowest AIC
Fitting the final model with Restricted Maximum Liklihood
M1 <- lme(Form, random = ~1 + Polygon.Area.100|fSpecies, data = spec, weights = vf3, method = "REML")
summary(M1)
## Linear mixed-effects model fit by REML
## Data: spec
## AIC BIC logLik
## 763.8287 785.485 -374.9144
##
## Random effects:
## Formula: ~1 + Polygon.Area.100 | fSpecies
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.5185718 (Intr)
## Polygon.Area.100 4.6121499 -0.079
## Residual 3.4333447
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~Polygon.Area.100
## Parameter estimates:
## power
## 0.6460465
## Fixed effects: list(Form)
## Value Std.Error DF t-value p-value
## (Intercept) 0.273699 0.2775939 151 0.9859696 0.3257
## Polygon.Area.100 3.919556 1.4091936 151 2.7814175 0.0061
## Correlation:
## (Intr)
## Polygon.Area.100 -0.324
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.7228479 -0.5905337 -0.0581716 0.5215418 3.5139731
##
## Number of Observations: 165
## Number of Groups: 13
We now have evidence that hummingbird species’ specialization, in general, increases with increasing forest cover. However, not all hummingbird species show the same relationship; some species show a much greter increase in specialization with forest cover than others. Species becoming less specialized in less forested areas suggests that changes in resource availability as forest cover is lost are potentially forcing species to be less ‘picky’ in disturbed, agricultural areas.