A Comprehensive Guide to Redundancy Analysis (RDA)

December 12, 2025


According to a 2025 review on landscape genomics, redundancy analysis (RDA) is a highly versatile tool for identifying and evaluating relationships between genetic and environmental variation. This guide explains what RDA is, how it works, when to use it, and how to interpret its results.

Quick Summary

Redundancy analysis is a multivariate statistical technique used to explain the variation observed in a set of response variables using a set of explanatory variables, commonly applied in ecological and genetic studies.

Key Points

Constrained Ordination: RDA is a form of constrained ordination that links response variables to a set of explanatory variables.
Combined Method: The analysis integrates multiple linear regression with principal component analysis (PCA) to explain variance.
Linear Relationships: RDA is most appropriate when linear relationships are assumed to exist between the response and explanatory variables.
Explained vs. Unexplained Variance: The results partition total variation into parts explained by the model (constrained axes) and residuals (unconstrained axes).
Biplot Visualization: A key output is the biplot, which graphically illustrates the relationships and strength of influence among all variables and observations.
Ecology and Genomics: It is a powerful tool frequently used in ecological and genetic research to study environmental gradients and genotype-environment associations.
Data Assumptions: Important assumptions include linearity and proper data transformation, which should be checked before running the analysis.

What is Redundancy Analysis?

Redundancy analysis (RDA) is a multivariate statistical method used to explain the variation in one set of variables (the response matrix, Y) using a second set of variables (the explanatory matrix, X). Conceived by Van den Wollenberg in 1977 as an alternative to Canonical Correlation Analysis (CCorA), RDA combines principles from multiple linear regression and principal component analysis (PCA). It is a "constrained" form of ordination, meaning the axes of the analysis are limited by the explanatory variables provided. This differs from an unconstrained ordination like PCA, which simply finds the axes of maximum variation without any external constraints.

Unlike CCorA, which is a symmetric method, RDA is non-symmetric. It specifically seeks linear combinations of the explanatory variables (X) that can best explain the variation in the response variables (Y). This makes it a powerful tool for testing hypotheses about how certain environmental factors might influence species composition, gene expression, or other complex ecological datasets.

How Does RDA Work?

Conceptually, RDA involves two main steps. First, a series of multiple linear regressions is performed, with each response variable regressed on all explanatory variables. This produces a matrix of fitted values. Second, a PCA is run on this matrix of fitted values. The resulting principal components are the constrained, or canonical, axes of the RDA. The variation accounted for by these axes is the portion of the response data explained by the explanatory variables. The remaining, unexplained variation is captured by the unconstrained axes, which come from a PCA of the regression residuals.
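
The two steps can be sketched directly in NumPy. This is a minimal illustration of the regression-then-PCA idea, not the implementation used by vegan or XLSTAT; the function name `rda_two_step`, the simulated data, and the score names are assumptions made for the example.

```python
import numpy as np

def rda_two_step(Y, X):
    """Conceptual RDA: regress Y on X, then run a PCA on the fitted values."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)   # center responses
    Xc = X - X.mean(axis=0)   # center explanatory variables

    # Step 1: multiple linear regression of every response column on all of X.
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B            # matrix of fitted values

    # Step 2: PCA of the fitted values, computed through an SVD.
    U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    eigvals = s**2 / (n - 1)

    # Keep only axes with non-negligible variance (at most rank(X) of them).
    rank = int(np.sum(eigvals > 1e-10 * eigvals.sum()))
    axes = Vt[:rank].T
    return {
        "eigenvalues": eigvals[:rank],       # variance of each canonical axis
        "response_scores": axes,             # loadings of the response variables
        "site_constraints": Y_fit @ axes,    # linear combinations of X
        "site_scores": Yc @ axes,            # observations projected on the axes
    }

# Simulated data (hypothetical): 30 sites, 2 explanatory and 5 response variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
B_true = rng.normal(size=(2, 5))
Y = X @ B_true + 0.5 * rng.normal(size=(30, 5))

res = rda_two_step(Y, X)
print("canonical eigenvalues:", res["eigenvalues"])
```

With two explanatory variables there can be at most two canonical axes, which is why the function trims the near-zero eigenvalues that the SVD of the fitted values also returns.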

Key Components of RDA

Response Variables (Y): This data matrix contains the variables you wish to explain or predict. Examples include species abundance in different sites or gene expression levels in different individuals.
Explanatory Variables (X): This matrix contains the variables that are hypothesized to influence the response variables. These can be environmental factors like pH, temperature, or land use type.
Constrained (Canonical) Axes: These axes represent the variation in the response data that is statistically explained by the explanatory variables. They are linear combinations of the explanatory variables.
Unconstrained (Residual) Axes: These represent the variation in the response data that remains unexplained by the explanatory variables included in the model.
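
To make the split between constrained and unconstrained axes concrete, the sketch below computes both sets of eigenvalues and checks that they add up to the total variance of Y. The helper `partition_variance` and the simulated data are hypothetical; the adjusted R² uses the standard Ezekiel correction, which is an addition for illustration and not something the text above prescribes.

```python
import numpy as np

def partition_variance(Y, X, tol=1e-10):
    """Split the total variance of Y into constrained and unconstrained parts."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)

    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B            # part of Y explained by X
    Y_res = Yc - Y_fit        # residual part

    total = np.sum(Yc**2) / (n - 1)
    constrained = np.linalg.svd(Y_fit, compute_uv=False) ** 2 / (n - 1)
    unconstrained = np.linalg.svd(Y_res, compute_uv=False) ** 2 / (n - 1)
    constrained = constrained[constrained > tol * total]  # drop null axes

    r2 = constrained.sum() / total
    m = np.linalg.matrix_rank(Xc)                    # number of independent predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - m - 1)    # Ezekiel adjustment
    return total, constrained, unconstrained, r2, adj_r2

# Simulated data (hypothetical): 30 observations, 2 predictors, 4 responses.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
B_true = np.array([[1.0, 0.5, 0.0, -0.8],
                   [0.0, 0.7, 1.0, 0.3]])
Y = X @ B_true + 0.5 * rng.normal(size=(30, 4))

total, constrained, unconstrained, r2, adj_r2 = partition_variance(Y, X)
print("total variance:          ", total)
print("constrained eigenvalues: ", constrained)
print("unconstrained eigenvalues:", unconstrained)
print("R2 =", r2, " adjusted R2 =", adj_r2)
```

Because the fitted values and the residuals are orthogonal, the two sets of eigenvalues sum to the total variance, and the constrained share is the R² of the model.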

Interpreting the RDA Biplot

The most common way to visualize RDA results is through a biplot. This plot displays the relationships among the response variables, explanatory variables, and observations (e.g., samples, sites) in a reduced-dimensional space.

Visual interpretation guidelines:

Arrow Length: The length of an explanatory variable's arrow indicates the strength of its correlation with the constrained ordination axes. Longer arrows indicate a greater influence.
Arrow Direction: The direction of an arrow shows the environmental gradient. Response variables pointing in a similar direction to an explanatory variable's arrow are positively correlated with it.
Angle Between Arrows: The angle between two variable arrows approximately reflects their correlation, provided the biplot uses a scaling that preserves correlations among variables. A small angle indicates a strong positive correlation, a 90-degree angle indicates no correlation, and a 180-degree angle indicates a strong negative correlation.
Observation Points: Points representing individual samples or sites are positioned based on their scores on the ordination axes. Sites that plot closer together are more similar in their response variable composition.
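
The quantities read off a biplot can be computed by hand. The sketch below derives arrow coordinates for the explanatory variables as their correlations with the first two constrained axes, then reports arrow lengths and the angle between two arrows. It assumes simulated data and one simple scaling choice; software such as vegan applies its own scaling conventions, so coordinates from a real package will differ.

```python
import numpy as np

# Simulated data (hypothetical): 40 sites, 3 explanatory and 4 response variables.
rng = np.random.default_rng(1)
n = 40
X = rng.normal(size=(n, 3))
B_true = np.array([[1.0, 0.8, 0.0, -0.5],
                   [0.0, 0.5, 1.0, 0.7],
                   [0.1, 0.0, 0.0, 0.1]])   # third variable has little effect
Y = X @ B_true + 0.5 * rng.normal(size=(n, 4))

Yc = Y - Y.mean(axis=0)
Xc = X - X.mean(axis=0)
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
U, s, Vt = np.linalg.svd(Xc @ B, full_matrices=False)

site_points = U[:, :2] * s[:2]   # observation points on the first two axes
response_scores = Vt[:2].T       # response variable coordinates

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Arrow for each explanatory variable: its correlation with each plotted axis.
arrows = np.array([[corr(Xc[:, j], site_points[:, k]) for k in range(2)]
                   for j in range(X.shape[1])])

lengths = np.linalg.norm(arrows, axis=1)   # longer arrow = better represented
cos_angle = arrows[0] @ arrows[1] / (lengths[0] * lengths[1])
angle_deg = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

print("arrow coordinates:\n", arrows)
print("arrow lengths:", lengths)
print("angle between arrows 1 and 2 (degrees):", angle_deg)
```

With this choice an arrow can never be longer than 1, and a short arrow means the variable is only weakly related to the two axes being plotted.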

When to Use RDA: Applications and Examples

RDA is a versatile tool applicable in any field dealing with multivariate data where a directional or constrained analysis is needed. Its primary use case is when you assume that the relationships between your response and explanatory variables are linear.

Some common applications include:

Ecology: Analyzing how environmental variables like soil properties, temperature, or precipitation influence community composition (e.g., microbes, plants, animals).
Genomics: Explaining patterns in genetic data (e.g., SNP data) using environmental factors to assess genotype-environment associations.
Bioinformatics: Understanding how different treatments or conditions affect gene expression profiles by using treatment labels as explanatory variables.
Soil Science: Investigating how management practices influence a suite of soil properties simultaneously.

RDA vs. CCA: Choosing the Right Analysis

The choice between RDA and Canonical Correspondence Analysis (CCA) is crucial and depends on the underlying data structure, specifically the expected nature of the relationships between variables.


Feature	Redundancy Analysis (RDA)	Canonical Correspondence Analysis (CCA)
Model Assumption	Assumes linear relationships between response and explanatory variables.	Assumes unimodal (curved, bell-shaped) relationships between response and explanatory variables.
Distance Measure	Based on Euclidean distances. Appropriate for shorter gradients.	Based on chi-square distances. Appropriate for longer gradients.
Focus	Maximizes the explained variance of the response variables.	Maximizes the correlation between site and species scores.
Data Type	Handles quantitative and qualitative variables. Can be sensitive to variables with many zeros.	Preferred for data with many double-zeros, like species abundance data across long gradients.
Application	Ecological studies with linear environmental responses, genomics.	Ecological niche modeling, long environmental gradients.
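
The sensitivity of Euclidean distance to zeros can be seen in a tiny example. The sketch below uses three made-up sites and compares raw Euclidean distances with distances after a Hellinger transformation (square root of relative abundances), one common remedy before running RDA on abundance data. The helper names are hypothetical, and the transformation assumes no site has a total abundance of zero.

```python
import numpy as np

def hellinger(Y):
    """Hellinger transformation: square root of row-wise relative abundances.

    Assumes every row (site) has a non-zero total.
    """
    Y = np.asarray(Y, dtype=float)
    return np.sqrt(Y / Y.sum(axis=1, keepdims=True))

def euclid(a, b):
    """Euclidean distance between two sites."""
    return float(np.linalg.norm(a - b))

# Three hypothetical sites (rows) by three species (columns).
sites = np.array([[0, 1, 1],    # site 1
                  [1, 0, 0],    # site 2: shares no species with site 1
                  [0, 4, 4]])   # site 3: same species as site 1, higher counts

raw_12 = euclid(sites[0], sites[1])
raw_13 = euclid(sites[0], sites[2])

H = hellinger(sites)
hel_12 = euclid(H[0], H[1])
hel_13 = euclid(H[0], H[2])

print(f"raw:       d(1,2) = {raw_12:.3f}, d(1,3) = {raw_13:.3f}")
# raw:       d(1,2) = 1.732, d(1,3) = 4.243
print(f"Hellinger: d(1,2) = {hel_12:.3f}, d(1,3) = {hel_13:.3f}")
# Hellinger: d(1,2) = 1.414, d(1,3) = 0.000
```

On raw counts, sites 1 and 2 look closer than sites 1 and 3 even though they share no species. After the transformation the ordering is reversed, which is closer to ecological intuition.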

The Step-by-Step Process for Conducting RDA

Data Preparation: Organize your data into a response matrix (Y) and an explanatory matrix (X). Each matrix should have the same number of observations (rows). Consider data transformations or scaling if necessary, for example, using Hellinger transformation for species abundance data.
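Multicollinearity Check: Evaluate the explanatory variables for high correlations among themselves, for example with variance inflation factors (VIFs). Highly correlated variables inflate the variance of the estimated regression coefficients, which makes them unstable and hard to interpret. If necessary, remove or combine some variables.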
Model Fitting: Use statistical software like R (with the vegan package) or XLSTAT to fit the RDA model. The software performs the regressions and constrained PCA internally.
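Significance Testing: Perform a permutation test on the fitted model (e.g., a permutation-based ANOVA on the RDA model) to determine whether the relationship between the explanatory and response variables is statistically significant. A significant result indicates that the explanatory variables explain more variation than would be expected by chance.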
Result Interpretation: Analyze the model summary, including eigenvalues and inertia percentages, to understand the amount of variance explained. Focus on the constrained axes.
Visualization: Generate and interpret a biplot to visualize the relationships between all variables and observations. Assess arrow directions, lengths, and angles to draw conclusions.
Refine and Report: Based on the results, you can perform further analyses like partial RDA to account for nuisance variables or build a more refined model. The final report should include the amount of explained variance and an interpretation of the biplot.
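
The multicollinearity check and the significance test from the steps above can be sketched without any specialized package. The functions `vif`, `pseudo_f`, and `permutation_test` are hypothetical helpers written for this example, and the data are simulated. The test permutes the rows of Y, which is a simplification suitable for a global test without covariates; dedicated software offers more permutation schemes.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each explanatory variable."""
    Xc = X - X.mean(axis=0)
    factors = []
    for j in range(Xc.shape[1]):
        others = np.delete(Xc, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
        resid = Xc[:, j] - others @ coef
        r2 = 1 - (resid @ resid) / (Xc[:, j] @ Xc[:, j])
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

def pseudo_f(Y, X):
    """Pseudo-F statistic: explained vs. residual variation per degree of freedom."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fit = Xc @ B
    m = np.linalg.matrix_rank(Xc)
    ss_fit = np.sum(fit**2)
    ss_res = np.sum((Yc - fit) ** 2)
    return (ss_fit / m) / (ss_res / (n - m - 1))

def permutation_test(Y, X, n_perm=999, seed=0):
    """Global permutation test: shuffle rows of Y to break the link with X."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(Y, X)
    hits = sum(pseudo_f(Y[rng.permutation(len(Y))], X) >= f_obs
               for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)

# Simulated data (hypothetical): 30 sites, 2 predictors, 4 responses.
rng = np.random.default_rng(7)
X = rng.normal(size=(30, 2))
B_true = np.array([[1.0, 0.5, 0.0, -0.8],
                   [0.0, 0.7, 1.0, 0.3]])
Y = X @ B_true + 0.5 * rng.normal(size=(30, 4))

vif_values = vif(X)
f_obs, p_value = permutation_test(Y, X)
print("VIF per explanatory variable:", vif_values)
print("pseudo-F:", f_obs, " permutation p-value:", p_value)
```

A VIF close to 1 means a variable is nearly uncorrelated with the others; how large a VIF is acceptable is a judgment call that varies between fields. The smallest p-value this test can return is 1 / (n_perm + 1), so the number of permutations sets the resolution of the result.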

Conclusion

Redundancy analysis provides a structured, hypothesis-driven approach to explore complex relationships within multivariate data. By merging elements of regression and PCA, RDA effectively identifies how a set of predictor variables drives variation in a set of response variables. Its visual output, the biplot, is an intuitive way to represent these multivariate relationships, making it a valuable tool for researchers across ecology, genetics, and other scientific domains. While it assumes linear relationships, its practical utility and interpretability solidify its place as a cornerstone of modern data analysis, particularly within the ecological community.

Visit the R vegan package documentation for more details on performing RDA.

Frequently Asked Questions

The primary purpose of an RDA analysis is to determine the linear combinations of a set of explanatory variables that best explain the variation observed in a set of response variables. This helps to understand and visualize the relationships between the two datasets.

RDA differs from PCA primarily because it is a constrained ordination method, while PCA is unconstrained. RDA uses a second matrix of explanatory variables to constrain the axes, whereas PCA simply finds the axes of maximum variation in a single dataset without considering external factors.

You should use RDA when you assume a linear relationship between your variables. Use CCA when the relationships are expected to be unimodal (curved or bell-shaped), typically over longer ecological gradients. The decision is often guided by the gradient length of the data, commonly estimated with a detrended correspondence analysis (DCA): short gradients favor RDA and long gradients favor CCA.

On an RDA biplot, arrows typically represent explanatory variables. The length of an arrow indicates the strength of its correlation with the ordination axes, and its direction represents the gradient of its influence. Response variables can also be plotted as points or arrows.

The constrained, or canonical, axes represent the portion of the total variation in the response data that is statistically explained by the explanatory variables in the model. The unconstrained, or residual, axes represent the remaining variation that is not explained by the model.

RDA is widely used in ecology, environmental science, and genetics. Specific applications include analyzing species abundance in relation to environmental factors, assessing genotype-environment associations, and evaluating the effects of different soil management practices.

Key assumptions for RDA include linear relationships between variables, homogeneity of variances, and the independence of observations. Appropriate data scaling and transformation should be applied to meet these assumptions and ensure reliable results.

Popular software options for performing RDA include R, using packages like vegan, ade4, and BiodiversityR. Other tools include specialized statistical software like XLSTAT, an Excel add-on, and PAST.

A small p-value from a permutation test (for example, below 0.05) indicates that the observed relationship between the response and explanatory variables is stronger than would be expected if the observations were matched at random. This allows you to reject the null hypothesis that there is no linear relationship between the two datasets.