Fixing the bridge between biologists and statisticians

Models are wrong... but, some are useful (G. Box)!


Correcting for multiplicity in the 'emmeans' package

Published at January 21, 2026 ·  7 min read

In my recent book (see below), on page 166 and earlier, I made the point that, with pairwise comparisons and, more generally, whenever simultaneous statistical tests are performed, it is necessary to provide adjusted P-values, i.e. P-values that account for the familywise error rate, which is the probability of committing at least one incorrect rejection within the whole family of simultaneous tests. In this respect, it may be useful to recall that, for a single test, the comparison-wise error rate \(E_c\) is the probability of a wrong rejection for that single test (based on a non-adjusted P-value), whereas the probability of at least one wrong rejection within a family of \(k\) comparisons can be much higher.
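To give a feeling for the size of the problem, here is a minimal sketch in base R. It assumes that the \(k\) tests are independent, in which case the familywise error rate is \(1 - (1 - E_c)^k\); the symbol `E_f`, the values of `k` and the three P-values are hypothetical and only serve as an illustration. The adjustment uses `p.adjust()` from the 'stats' package, not the 'emmeans' machinery discussed in the post.

```r
# Comparison-wise error rate for each single test
E_c <- 0.05

# Hypothetical family sizes
k <- c(1, 3, 10, 45)

# Familywise error rate, assuming independent tests
E_f <- 1 - (1 - E_c)^k
round(E_f, 3)

# Hypothetical unadjusted P-values from three comparisons
p_raw <- c(0.010, 0.020, 0.030)

# Two common adjustments available in base R
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_holm <- p.adjust(p_raw, method = "holm")
cbind(p_raw, p_bonf, p_holm)
```

With correlated tests (as is typical for pairwise comparisons sharing the same means), the independence formula is only an approximation.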

...

How do we combine errors, in biology? The delta method

Published at December 12, 2025 ·  8 min read

In a recent post I showed that we can build linear combinations of model parameters (see here). For example, if we have two parameter estimates, say \(X\) and \(Z\), with standard errors \(\sigma_X\) and \(\sigma_Z\), respectively, and a covariance of \(\sigma_{XZ}\), we can build a linear combination as follows:

\[ Y = f(X,Z) = aX + bZ + c\]

where \(a\), \(b\) and \(c\) are three numeric constants. The standard error for this combination can be obtained as:
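For a linear combination of this kind, the standard rule for the variance of a weighted sum gives \(\sigma_Y = \sqrt{a^2 \sigma_X^2 + b^2 \sigma_Z^2 + 2ab\,\sigma_{XZ}}\); the constant \(c\) does not contribute. A minimal sketch in R follows, where all numeric values are hypothetical and not taken from the post.

```r
# Hypothetical values, for illustration only
a <- 2
b <- -1
se_X <- 0.5      # standard error of X
se_Z <- 0.3      # standard error of Z
cov_XZ <- 0.05   # covariance between X and Z

# Variance and standard error of Y = a*X + b*Z + c
# (the additive constant c has no effect on the variance)
var_Y <- a^2 * se_X^2 + b^2 * se_Z^2 + 2 * a * b * cov_XZ
se_Y <- sqrt(var_Y)

# The same calculation in matrix form: w' V w
V <- matrix(c(se_X^2, cov_XZ,
              cov_XZ, se_Z^2), nrow = 2, byrow = TRUE)
w <- c(a, b)
se_Y_matrix <- sqrt(drop(t(w) %*% V %*% w))

c(se_Y, se_Y_matrix)
```

The matrix form is convenient when the variance-covariance matrix comes straight from a fitted model, e.g. via `vcov()`.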

...

Using `lme()` to fit the Environmental Variance mixed model to genotype experiments

Published at December 4, 2025 ·  11 min read

Yield stability is a key aspect in the selection of crop genotypes. Its definition is not entirely straightforward (see, for example, Annicchiarico, 2002), but, in simple terms, it refers to the ability of a crop to maintain its yield potential across different environments, helping farmers safeguard their income. Several statistical indicators of stability have been proposed (see, e.g., Mohammadi, 2008). In this post, however, I will focus on the so-called environmental variance, which represents the portion of phenotypic variance attributable to environmental (non-genetic) factors and is measured as the overall variance across environments for each genotype.
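As a purely descriptive illustration of this definition, the sketch below computes, for each genotype, the variance of its yields across environments. It is not the `lme()` mixed model that the post is about; the data frame `dat`, its column names and all the yield values are hypothetical.

```r
# Hypothetical genotype-by-environment means (t/ha), for illustration only
dat <- data.frame(
  genotype = rep(c("G1", "G2", "G3"), each = 4),
  env      = rep(c("E1", "E2", "E3", "E4"), times = 3),
  yield    = c(5.0, 6.0, 7.0, 8.0,
               6.4, 6.5, 6.6, 6.5,
               4.0, 7.0, 5.0, 8.0)
)

# Environmental variance: variance across environments, one value per genotype
env_var <- tapply(dat$yield, dat$genotype, var)
env_var

# Under this indicator, the genotype with the smallest value is the most stable
names(which.min(env_var))
```

In this made-up example the three genotypes have similar mean yields but very different variability across environments, which is exactly what the indicator is meant to capture.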

...

Using `lme()` to fit the Stability Variance mixed model to genotype experiments

Published at December 2, 2025 ·  11 min read

Yield stability is a fundamental aspect of the selection of crop genotypes. Its definition is rather complex (see, for example, Annicchiarico, 2002), but, in simple terms, it represents the ability of a crop to maintain its potential yield level across environments, which helps farmers preserve their income. Several statistical indicators of stability exist (see, e.g., Mohammadi, 2008). In this post, I would like to concentrate on the so-called stability variance, that is, for a specific genotype, the amount of yield variability across different environments after correcting for the additive effects of each environment, which are common to all genotypes under investigation (Shukla, 1972).
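To make the idea of "correcting for the additive effects of each environment" more concrete, here is a minimal moment-based sketch on a table of genotype-by-environment means. It removes genotype and environment main effects by double-centring and then applies the usual moment formula for Shukla's stability variance. This is not the `lme()` fit discussed in the post, and the matrix `Y` with its values is hypothetical.

```r
# Hypothetical genotype (rows) by environment (columns) means, illustration only
Y <- matrix(c(5.0, 6.0, 7.0, 8.0,
              6.4, 6.5, 6.6, 6.5,
              4.0, 7.0, 5.0, 8.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"),
                            c("E1", "E2", "E3", "E4")))
p <- nrow(Y)  # number of genotypes
q <- ncol(Y)  # number of environments

# Double-centring: remove genotype and environment main effects
D <- sweep(sweep(Y, 1, rowMeans(Y)), 2, colMeans(Y)) + mean(Y)

# Sum of squared interaction residuals for each genotype
W <- rowSums(D^2)

# Moment estimator of the stability variance for each genotype
stab_var <- (p * (p - 1) * W - sum(W)) / ((p - 1) * (p - 2) * (q - 1))
stab_var
```

A useful check is that the average of these values equals the genotype-by-environment interaction mean square, `sum(W) / ((p - 1) * (q - 1))`. Note that moment estimates of this kind can turn out negative for very stable genotypes.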

...

Getting the Absolute/Relative Growth Rate from growth curves

Published at November 27, 2025 ·  7 min read

Yesterday, a colleague of mine pointed me to the article “How to fit nonlinear plant growth models and calculate growth rates: an update for ecologists” (Paine et al., 2012). It addresses a relevant topic: many plant scientists are involved in growth analyses and need to determine Absolute Growth Rates (AGRs) and Relative Growth Rates (RGRs).

The main point made by Paine et al. is that we can use the observed data to fit a growth model via nonlinear regression and then calculate model-derived growth rates together with their standard errors. In principle, the process is straightforward: we select a suitable growth model to predict biomass at any given time \(t\); the AGR at time \(t\) is the derivative of the selected growth function with respect to time, and the RGR is simply the AGR at time \(t\) divided by the biomass at that same time point.
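The two definitions can be sketched in a few lines of base R. The example assumes a logistic growth function with hypothetical parameter values (`K`, `r` and `t0` are illustrative names, not taken from the post or from Paine et al.) and uses `D()` for the symbolic derivative; standard errors are not addressed here.

```r
# Logistic growth function: biomass as a function of time t
# (model choice and parameter values are hypothetical)
growth <- quote(K / (1 + exp(-r * (t - t0))))
pars <- list(K = 100, r = 0.2, t0 = 30)

# AGR: symbolic derivative of the growth function with respect to time
agr_expr <- D(growth, "t")

# Evaluate biomass, AGR and RGR at a few time points
times <- c(10, 30, 50)
W   <- eval(growth,   c(pars, list(t = times)))
AGR <- eval(agr_expr, c(pars, list(t = times)))
RGR <- AGR / W

data.frame(time = times, biomass = W, AGR = AGR, RGR = RGR)
```

For the logistic curve, the RGR obtained in this way coincides with the closed form \(r (1 - W/K)\), which offers a quick check on the calculation.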

...

Why are derivatives important in life? A case-study with nonlinear regression

Published at November 26, 2025 ·  7 min read

In general, undergraduate students in biology and ecology courses tend to regard derivatives as very abstract entities, with no real use in everyday life. In my work as a teacher, I have often tried to counter this attitude by providing convincing examples of how we can use derivatives to better understand the changes in a given system.

In this post I’ll tell you about a recent situation where I was involved with derivatives. A few weeks ago, a colleague of mine wrote to me with the following question (I’m changing it a little, to make it, hopefully, more interesting). He asked: “I am using a power curve to model how the size of the sampling area affects species richness. How can I quantify my knowledge gain?”. This is indeed an interesting question, although I feel I should first provide you with some background information.
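One plausible reading of "knowledge gain" is the marginal increase in species richness per unit of extra sampling area, i.e. the first derivative of the power curve. The sketch below assumes the curve has the form \(S = a A^b\) and uses hypothetical values for `a` and `b`; it is an illustration of the idea, not necessarily the answer developed in the full post.

```r
# Power curve for species richness S as a function of sampling area A
# (functional form and parameter values are assumptions for illustration)
richness <- quote(a * A^b)
pars <- list(a = 5, b = 0.3)

# Marginal gain in richness per unit area: dS/dA
gain_expr <- D(richness, "A")

# Evaluate richness and marginal gain at increasing sampling areas
areas <- c(1, 10, 100)
S    <- eval(richness,  c(pars, list(A = areas)))
gain <- eval(gain_expr, c(pars, list(A = areas)))

data.frame(area = areas, richness = S, gain = gain)
```

With \(0 < b < 1\) the derivative \(a b A^{b-1}\) decreases as the area grows, so each additional unit of sampled area adds fewer new species than the previous one.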

...

Field Research methods in Agriculture

Published at November 25, 2025 ·  2 min read

Hi everybody, I have exciting news!

After a few months of silence, I’m thrilled to finally share the reason why: I’ve been working intensely on a new book project, which has taken up most of my spare time. And now… the book is out!

This book is titled ‘Field Research Methods in Agriculture’ and it is published by Springer Nature. It offers a clear, accessible, and practical introduction to experimental design and basic data analysis for field experiments in agriculture and related disciplines. It’s specifically designed for students, researchers, and practitioners who want to strengthen their methodological skills without getting lost in heavy mathematics.

...

Dealing with correlation in designed field experiments: part II

Published at February 10, 2025 ·  11 min read

With field experiments, studying the correlation between the observed traits may not be an easy task. For example, we can consider a genotype experiment, laid out in randomised complete blocks, with 27 wheat genotypes and three replicates, where several traits were recorded, including yield (Yield) and thousand-kernel weight (TKW). We might be interested in studying the correlation between those two traits, but we would need to face two fundamental problems:

...

A trip from variance-covariance to correlation and back

Published at January 24, 2025 ·  6 min read

The variance-covariance and the correlation matrices are two entities that describe the association between the columns of a two-way data matrix. They are widely used, e.g., in agriculture, biology and ecology, and they can be easily calculated with base R, as shown in the box below.

data(mtcars)
matr <- mtcars[,1:4]

# Covariances
Sigma <- cov(matr)

# Correlations
R <- cor(matr)

Sigma
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669
R
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000
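The two matrices carry the same information once the standard deviations are known. The sketch below uses `cov2cor()` from base R to go from covariances to correlations and then rescales by the standard deviations to go back; the object names `R_from_Sigma`, `s` and `Sigma_back` are mine, and this may differ from the step-by-step route taken in the full post.

```r
data(mtcars)
matr <- mtcars[, 1:4]
Sigma <- cov(matr)

# From variance-covariance to correlation, without the original data
R_from_Sigma <- cov2cor(Sigma)

# Standard deviations are the square roots of the diagonal of Sigma
s <- sqrt(diag(Sigma))

# From correlation back to variance-covariance: D R D, with D = diag(s)
Sigma_back <- diag(s) %*% R_from_Sigma %*% diag(s)
dimnames(Sigma_back) <- dimnames(Sigma)

# Both round trips reproduce the matrices computed from the data
all.equal(R_from_Sigma, cor(matr))
all.equal(Sigma_back, Sigma)
```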

It is useful to be able to go back and forth between variance-covariance and correlation, without returning to the original data matrix. Suppose that the variance-covariance matrix of two variables X and Y is:

...

How do we combine errors? The linear case

Published at November 22, 2024 ·  7 min read

In our research work, we usually fit models to experimental data. Our aim is to estimate some biologically relevant parameters, together with their standard errors. Very often, these parameters are interesting in themselves, as they represent means, differences, rates or other important descriptors. In other cases, we use those estimates to derive further indices by way of some appropriate calculations. For example, suppose that we have two parameter estimates, say Q and W, with standard errors \(\sigma_Q\) and \(\sigma_W\), respectively: it might be relevant to calculate the quantity:

...