Fixing the bridge between biologists and statisticians

Dealing with correlation in designed field experiments: part II

Published at February 10, 2025 · 11 min read

With field experiments, studying the correlation between the observed traits may not be an easy task. For example, we can consider a genotype experiment, laid out in randomised complete blocks, with 27 wheat genotypes and three replicates, where several traits were recorded, including yield (Yield) and weight of thousand kernels (TKW). We might be interested in studying the correlation between those two traits, but we would need to face two fundamental problems:

...

Repeated measures and subsampling with perennial crops

Published at December 4, 2023 · 5 min read

In a recent post, I have talked about repeated measures, for a case where measurements were taken repeatedly in the same plots across years see here. Previously, in another post, I had talked about subsampling, for a case where several random samples were taken from the same plot see here.

Repeated measures and subsampling are vastly different: in the first case I am specifically interested in the ‘evolution’ of the response over time (or space, sometimes). In the second case (subsampling), I only want to improve the precision/accuracy of my measurements, by taking multiple random samples in each plot.

...

Repeated measures with perennial crops

Published at March 30, 2023 · 8 min read

In this post, I want to discuss a concept that is often mistaken by some of my collegues. With all crops, we are used to repeating experiments across years to obtain multi-year data; the structure of the resulting dataset is always the same and it is exemplified in the box below, that refers to a multi-year genotype experiment with winter wheat.

rm(list = ls())
library(tidyverse)
library(nlme)
library(emmeans)
filePath <- "https://www.casaonofri.it/_datasets/WinterWheat.csv"
dataset <- read.csv(filePath)
dataset <- dataset %>%
  mutate(across(c(1:3, 5), .fns = factor))
head(dataset)
##   Plot Block Genotype Yield Year
## 1    2     1 COLOSSEO  6.73 1996
## 2  110     2 COLOSSEO  6.96 1996
## 3  181     3 COLOSSEO  5.35 1996
## 4    2     1 COLOSSEO  6.26 1997
## 5  110     2 COLOSSEO  7.01 1997
## 6  181     3 COLOSSEO  6.11 1997

We can see that we have a column for the blocks, a column for the experimental factor (the genotype, in this instance), a column for the year and a column for the response variable (the yield, in this instance).

...

Subsampling in field experiments

Published at March 29, 2023 · 11 min read

Subsampling is very common in field experiments in agriculture. It happens when we collect several random samples from each plot and we submit them to some sort of measurement process. Some examples? Let’s imagine that we have randomised field experiments with three replicates and, either,:

we collect the whole grain yield in each plot, select four subsamples and measure, in each subsample, the oil content or some other relevant chemical property, or
we collect, from each plot, four plants and measure their heights, or
we collect a representative soil sample from each plot and perform chemical analyses in triplicate.

For all the above examples, we end up with 3 by 4 equal 12 data for each treatment level. The question is: do we have 12 replicates? This is exactly the point: subsamples should never be mistaken for true-replicates, as the experimental treatments were not independently allocated to each one of them. In literature, subsamples are usually known as sub-replicates or pseudo-replicates: for the above examples, we have three true-replicates and four pseudo-replicates per true-replicate. Let’s see how to handle pseudo-replicates in data analysis. But, first of all, do not forget that: experiments with pseudo-replicates are valid only when we also have true-replicates! If we only have pseudo-replicates… well, there is nothing we can do in data analysis that transforms our experiment into a valid one…

...

Multi-environment split-plot experiments

Published at September 13, 2022 · 7 min read

Have you made a split-plot field experiment? Have you repeated such an experiment in two (or more) years/locations? Have you run into troubles, because the reviewer told you that your ANOVA model was invalid? If so, please, stop for awhile and read: this post might help you understand what was wrong with your analyses.

Let’s think of a field experiment, where 6 genotypes of faba bean were compared under two different sowing times (autumn and spring). For the ease of organisation, such an experiment was laid down as a split-plot, in a randomised complete block design; sowing times were randomly allocated to main-plots, while genotypes were randomly allocated to sub-plots. Let’s stop for a moment… does this sound strange to you? Do you need further information about split-plot designs? You can get some general information at this link and hints on how to analyse the results at this other link

...

Meta-analysis for a single study. Is it possible?

Published at July 21, 2022 · 12 min read

We all know that the word meta-analysis encompasses a body of statistical techniques to combine quantitative evidence from several independent studies. However, I have recently discovered that meta-analytic methods can also be used to analyse the results of a single research project. That happened a few months ago, when I was reading a paper from Damesa et al. (2017), where the authors describe some interesting methods of data analyses for multi-environment genotype experiments. These authors gave a few nice examples with related SAS code, that is rooted in mixed models. As an R enthusiast, I was willing to reproduce their analyses with R, but I could not succeed, until I realised that I could make use of the package ‘metafor’ and its bunch of meta-analityc methods.

...

Split-plot designs: the transition to mixed models for a dinosaur

Published at February 11, 2021 · 15 min read

Those who long ago took courses in ‘analysis of variance’ or ‘experimental design’ … would have learned methods … based on observed and expected mean squares and methods of testing based on ‘error strata’ (if you weren’t forced to learn this, consider yourself lucky). (Douglas Bates, 2006).

In a previous post, I already mentioned that, due to my age, I see myself as a dinosaur within the R-users community. I already mentioned how difficult it is, for a dinosaur, to adjust to new concepts and paradigms in data analysis, after having done things differently for a long time ( see this post here ). Today, I decided to sit and write a second post, relating to data analyses for split-plot designs. Some years ago, when switching to R, this topic required some adjustments to my usual workflow, which gave me a few headaches.

...

Accounting for the experimental design in linear/nonlinear regression analyses

Published at December 4, 2020 · 11 min read

In this post, I am going to talk about an issue that is often overlooked by agronomists and biologists. The point is that field experiments are very often laid down in blocks, using split-plot designs, strip-plot designs or other types of designs with grouping factors (blocks, main-plots, sub-plots). We know that these grouping factors should be appropriately accounted for in data analyses: ‘analyze them as you have randomized them’ is a common saying attributed to Ronald Fisher. Indeed, observations in the same group are correlated, as they are more alike than observations in different groups. What happens if we neglect the grouping factors? We break the independence assumption and our inferences are invalid (Onofri et al., 2010).

...

Building ANOVA-models for long-term experiments in agriculture

Published at August 20, 2020 · 29 min read

This is the follow-up of a manuscript that we (some colleagues and I) have published in 2016 in the European Journal of Agronomy (Onofri et al., 2016). I thought that it might be a good idea to rework some concepts to make them less formal, simpler to follow and more closely related to the implementation with R. Please, be patient: this lesson may be longer than usual.

First question: what are long-term experiments? Agricultural experiments have to deal with long-term effects of cropping practices. Think about fertilisation: certain types of organic fertilisers may give effects on soil fertility, which are only observed after a relatively high number of years (say: 10-15). In order to observe those long-term effects, we need to plan Long Term Experiments (LTEs), wherein each plot is regarded as a small cropping system, with the selected combination of rotation, fertilisation, weed control and other cropping practices. Due to the fact that yield and other relevant variables are repeatedly recorded over time, LTEs represent a particular class of multi-environment experiments with repeated measures.

...

Fitting complex mixed models with nlme. Example #5

Published at June 5, 2020 · 14 min read

Joint Regression is a very old, but, nonetheless, useful technique. It is widely known that the yield of a genotype in different environments depends on environmental covariates, such as the amount of rainfall in some critical periods of time. Apart from rain, also temperature, wind, solar radiation, air humidity and soil characteristics may concur to characterise a certain environment as good or bad and, ultimately, to determine yield potential.

Early in the 60s, several authors proposed that the yield of genotypes is expressed as a function of an environmental index $e_j$ , measuring the yield potential of each environment $j$ (Finlay and Wilkinson, 1963; Eberhart and Russel, 1966; Perkins and Jinks, 1968). For example, for a genotype $i$ , we could write:

...

Fitting 'complex' mixed models with 'nlme': Example #2

Published at September 13, 2019 · 10 min read

In this post I am going to deal with a repeated split-plot experiment with heteroscedastic errors. Ready for this trip?

Let’s imagine a field experiment, where different genotypes of khorasan wheat are to be compared under different nitrogen (N) fertilisation systems. Genotypes require bigger plots, with respect to fertilisation treatments and, therefore, the most convenient choice would be to lay-out the experiment as a split-plot, in a randomised complete block design. Genotypes would be randomly allocated to main plots, while fertilisation systems would be randomly allocated to sub-plots. As usual in agricultural research, the experiment should be repeated in different years, in order to explore the environmental variability of results.

...

Testing for interactions in nonlinear regression

Published at September 13, 2019 · 11 min read

Factorial experiments are very common in agriculture and they are usually laid down to test for the significance of interactions between experimental factors. For example, genotype assessments may be performed at two different nitrogen fertilisation levels (e.g. high and low) to understand whether the ranking of genotypes depends on nutrient availability. For those of you who are not very much into agriculture, I will only say that such an assessment is relevant, because we need to know whether we can recommend the same genotypes, e.g., both in conventional agriculture (high nitrogen availability) and in organic agriculture (relatively lower nitrogen availability).

...

The 'Environmental Variance' model for Multi-Environment Experiments

Published at August 20, 2019 · 9 min read

Fitting mixed models has become very common in biology and recent developments involve the manipulation of the variance-covariance matrix for random effects and residuals. To the best of my knowledge, within the frame of frequentist methods, the only freeware solution in R should be based on the ‘nlme’ package, as the ‘lmer’ package does not easily permit such manipulations. The ‘nlme’ package is fully described in Pinheiro and Bates (2000). Of course, the ‘asreml’ package can be used, but, unfortunately, this is not freeware.

...

Genotype experiments: fitting a stability variance model with R

Published at June 6, 2019 · 8 min read

Yield stability is a fundamental aspect for the selection of crop genotypes. The definition of stability is rather complex (see, for example, Annichiarico, 2002); in simple terms, the yield is stable when it does not change much from one environment to the other. It is an important trait, that helps farmers to maintain a good income in most years.

Agronomists and plant breeders are continuosly concerned with the assessment of genotype stability; this is accomplished by planning genotype experiments, where a number of genotypes is compared on randomised complete block designs, with three to five replicates. These experiments are repeated in several years and/or several locations, in order to measure how the environment influences yield level and the ranking of genotypes.

...

Dealing with correlation in designed field experiments: part II (asreml)

Published at May 10, 2019 · 16 min read

With field experiments, studying the correlation between the observed traits may not be an easy task. Indeed, in these experiments, subjects are not independent, but they are grouped by treatment factors (e.g., genotypes or weed control methods) or by blocking factors (e.g., blocks, plots, main-plots). I have dealt with this problem in a previous post and I gave a solution based on traditional methods of data analyses.

In a recent paper, Piepho (2018) proposed a more advanced solution based on mixed models. He presented four examplary datasets and gave SAS code to analyse them, based on PROC MIXED. I was very interested in those examples, but I do not use SAS. Therefore, I tried to ‘transport’ the models in R, which turned out to be a difficult task. After struggling for awhile with several mixed model packages, I came to an acceptable solution, which I would like to share.

...

#Mixed_models

Tags

Recent posts

Dealing with correlation in designed field experiments: part II

A trip from variance-covariance to correlation and back

How do we combine errors? The linear case

How do we combine errors, in biology? The delta method

Plotting weather data with ggplot()

Archives