Published at November 30, 2023 · 5 min read

I am one of those old guys who still uses the stabilising transformations, when the data do not conform to the basic assumptions for ANOVA. Indeed, apart from counts and proportions, where GLMs can be very useful, I have not yet found a simple way to deal with heteroscedasticity for continuous variables, such as yield, weight, height and so on. Yes, I know, Generalised Least Squares (GLS) can be useful to fit heteroscedastic models, but I would argue that stabilising transformations are, conceptually, very much simpler and they can be easily thought to PhD students and practitioners, with only a basic level of knowledge about statistics.

...

Designed experiments with replicates: Principal components or Canonical Variates?

Published at November 2, 2023 · 16 min read

A few days ago, a colleague of mine wanted to hear my opinion about what multivariate method would be the best for a randomised field experiment with replicates. We had a nice discussion and I thought that such a case-study might be generally interesting for the agricultural sciences; thus, I decided to take my Apple Mac-Book PRO, sit down, relax and write a new post on this matter.

My colleague’s research study was similar to this one: a randomised block field experiment (three replicates) with 16 durum wheat genotypes, which was repeated in four years. The quality of grain yield was assessed by recording the following four traits:

...

GGE analyses for multi-environment studies

Published at May 31, 2023 · 12 min read

In a recent post we have seen that we can use Principal Component Analyses (PCA) to elucidate the ‘genotype by environment’ relationship (see this post). Whenever the starting point for PCA is the doubly-centered (centered by rows and columns) matrix of yields across environments, we talk about AMMI analysis, which is often used to get insight into the stability of genotype yields across environments.

By changing the starting matrix, we can obtain a different perspective and put focus on the definition of macroenvironments and on the selection of winning genotypes. In particular, if the two-way matrix of yields across environments is only column-centered before PCA, we talk about GGE analysis (Yan et al., 2000). In spite of some academic debate (see Gauch, 2006, Yan et al., 2007, Gauch et al., 2008), AMMI and GGE analyses are both useful and can be used as two complementary tools for the analysis of multi-environment genotype data.

...

AMMI analyses for multi-environment studies

Published at May 26, 2023 · 19 min read

Again into a subject that is rather important for most agronomists, i.e. the selection of crop varieties. All farmers are perfectly aware that crop performances are affected both by the genotype and by the environment. These two effects are not purely additive and they often show a significant interaction. By this word, we mean that a genotype can give particularly good/bad performances in some specific environmental situations, which we may not expect, considering its average behaviour in other environmental conditions. The Genotype by Environment (GE) interaction may cause changes in the ranking of genotypes, depending on the environment and may play a key role in varietal recommendation, for a given mega-environment.

...

Repeated measures with perennial crops

Published at March 30, 2023 · 8 min read

In this post, I want to discuss a concept that is often mistaken by some of my collegues. With all crops, we are used to repeating experiments across years to obtain multi-year data; the structure of the resulting dataset is always the same and it is exemplified in the box below, that refers to a multi-year genotype experiment with winter wheat.

rm(list = ls())
library(tidyverse)
library(nlme)
library(emmeans)
filePath <- "https://www.casaonofri.it/_datasets/WinterWheat.csv"
dataset <- read.csv(filePath)
dataset <- dataset %>%
  mutate(across(c(1:3, 5), .fns = factor))
head(dataset)
##   Plot Block Genotype Yield Year
## 1    2     1 COLOSSEO  6.73 1996
## 2  110     2 COLOSSEO  6.96 1996
## 3  181     3 COLOSSEO  5.35 1996
## 4    2     1 COLOSSEO  6.26 1997
## 5  110     2 COLOSSEO  7.01 1997
## 6  181     3 COLOSSEO  6.11 1997

We can see that we have a column for the blocks, a column for the experimental factor (the genotype, in this instance), a column for the year and a column for the response variable (the yield, in this instance).

...

Subsampling in field experiments

Published at March 29, 2023 · 11 min read

Subsampling is very common in field experiments in agriculture. It happens when we collect several random samples from each plot and we submit them to some sort of measurement process. Some examples? Let’s imagine that we have randomised field experiments with three replicates and, either,:

we collect the whole grain yield in each plot, select four subsamples and measure, in each subsample, the oil content or some other relevant chemical property, or
we collect, from each plot, four plants and measure their heights, or
we collect a representative soil sample from each plot and perform chemical analyses in triplicate.

For all the above examples, we end up with 3 by 4 equal 12 data for each treatment level. The question is: do we have 12 replicates? This is exactly the point: subsamples should never be mistaken for true-replicates, as the experimental treatments were not independently allocated to each one of them. In literature, subsamples are usually known as sub-replicates or pseudo-replicates: for the above examples, we have three true-replicates and four pseudo-replicates per true-replicate. Let’s see how to handle pseudo-replicates in data analysis. But, first of all, do not forget that: experiments with pseudo-replicates are valid only when we also have true-replicates! If we only have pseudo-replicates… well, there is nothing we can do in data analysis that transforms our experiment into a valid one…

...

Fitting threshold models to seed germination data

Published at March 13, 2023 · 19 min read

In previous posts we have shown that we can use time-to-event curves to describe the germination pattern of a seed population (see here). We have also shown that these curves can be modified to include the effects of external/internal factors/covariates, such as the genotype, the species, the humidity content and temperature in the substrate (see here and here). These modified time-to-event curves can be fitted in ‘one-step’, i.e., we start from the germination data with the appropriate shape (see here), fit the model and retrieve the estimates of model parameters ( go to here for an example ).

...

Fitting thermal-time-models to seed germination data

Published at February 10, 2023 · 7 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper.

A motivating examples

In recent times, we wanted to model the effect of temperature on seed germination for Hordeum vulgare and we made an assay with three replicated Petri dishes (50 seeds each) at 9 constant temperature levels (1, 3, 7, 10, 15, 20, 25, 30, 35, 40 °C). Germinated seeds were counted and removed daily for 10 days. This unpublished dataset is available as barley in the drcSeedGerm package, which needs to be installed from github (see below), together with the drcte package for time-to-event model fitting. The following code loads the necessary packages, loads the datasets and shows the first six lines.

...

Fitting hydro-thermal-time-models to seed germination data

Published at February 10, 2023 · 15 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper.

Germination assay

This dataset was obtained from previously published work (Mesgaran et al., 2017) with Hordeum spontaneum [C. Koch] Thell. The germination assay was conducted using four replicates of 20 seeds tested at six different water potential levels (0, −0.3, −0.6, −0.9, −1.2 and −1.5 MPa). Osmotic potentials were produced using variable amount of polyethylene glycol (PEG, molecular weight 8000) adjusted for the temperature level. Petri dishes were incubated at six constant temperature levels (8, 12, 16, 20, 24 and 28 °C), under a photoperiod of 12 h. Germinated seeds (radicle protrusion > 3 mm) were counted and removed daily for 20 days.

...

The coefficient of determination: is it the R-squared or r-squared?

Published at November 26, 2022 · 9 min read

We often use the coefficient of determination as a swift ‘measure’ of goodness of fit for our regression models. Unfortunately, there is no unique symbol for such a coefficient and both $R^2$ and $r^2$ are used in literature, almost interchangeably. Such an interchangeability is also endorsed by the Wikipedia (see at: https://en.wikipedia.org/wiki/Coefficient_of_determination ), where both symbols are reported as the abbreviations for this statistical index.

As an editor of several International Journals, I should not agree with such an approach; indeed, the two symbols $R^2$ and $r^2$ mean two different things, and they are not necessarily interchangeable, because, depending on the setting, either of the two may be wrong or ambiguous. Let’s pay a little attention to such an issue.

...

Fixing the bridge between biologists and statisticians

Models are wrong... but, some are useful (G. Box)!

A motivating examples

Germination assay

Tags

Recent posts

Dealing with correlation in designed field experiments: part II

A trip from variance-covariance to correlation and back

How do we combine errors? The linear case

How do we combine errors, in biology? The delta method

Plotting weather data with ggplot()

Archives