Metric Multidimensional Scaling

Multivariate Statistics
mds
classical-mds
pcoa
Representing a distance matrix in low-dimensional Euclidean coordinates
Published

April 17, 2026

Introduction

Metric multidimensional scaling in its classical form, also known as classical MDS or principal coordinates analysis (PCoA), takes a pairwise distance matrix and produces a low-dimensional Euclidean configuration whose inter-point distances approximate the input distances. Strictly, the approximation is optimal for the double-centred inner products derived from the distances (a criterion often called strain), not for the distances themselves. It is the workhorse for visualising similarity or distance data when raw coordinates are unavailable: think of reconstructing a map from inter-city distances, projecting Bray-Curtis dissimilarities between microbial communities, or visualising a kernel-derived similarity matrix in a 2D plot.

Prerequisites

A working understanding of distance and similarity matrices, principal components analysis, and basic linear algebra (eigendecomposition).

Theory

Given an \(n \times n\) distance matrix \(D\), form the doubly-centred inner-product matrix

\[B = -\tfrac{1}{2}\bigl(I - J/n\bigr) D^{(2)} \bigl(I - J/n\bigr),\]

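where \(D^{(2)}\) has elements \(d_{ij}^2\), \(I\) is the \(n \times n\) identity and \(J\) is the \(n \times n\) all-ones matrix. Eigendecompose \(B = U \Lambda U^\top\), with eigenvalues ordered \(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n\). The \(k\)-dimensional configuration is \(X = U_k \Lambda_k^{1/2}\): the top \(k\) eigenvectors, each scaled by the square root of its eigenvalue. When those eigenvalues are positive, \(X\) minimises \(\lVert B - XX^\top \rVert_F^2\) over all \(n \times k\) configurations, and its Euclidean inter-point distances approximate the original distances. When \(D\) holds Euclidean distances computed from raw data, classical MDS is equivalent to PCA on the centred data. In that case \(B = Z Z^\top\) for the column-centred data matrix \(Z\), so the nonzero eigenvalues equal the squared singular values of \(Z\), and the configuration equals the principal component scores up to the sign of each axis.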
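The recipe above can be written out directly in base R. The sketch below uses a hypothetical helper, classical_mds(), for illustration only; cmdscale() remains the function to use in practice. It assumes the top \(k\) eigenvalues are positive. It checks the result against cmdscale() and then checks the PCA equivalence on simulated data.

```r
# Hypothetical helper: classical MDS written out from the formula for B.
# Assumes the top k eigenvalues of B are positive (true for Euclidean input).
classical_mds <- function(d, k = 2) {
  D2 <- as.matrix(d)^2                        # elements d_ij^2
  n  <- nrow(D2)
  C  <- diag(n) - matrix(1 / n, n, n)         # centring matrix I - J/n
  B  <- -0.5 * C %*% D2 %*% C                 # doubly-centred inner products
  e  <- eigen(B, symmetric = TRUE)            # eigenvalues in decreasing order
  X  <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), k)
  rownames(X) <- rownames(D2)
  list(points = X, eig = e$values)
}

# Check against cmdscale(). Eigenvector signs are arbitrary, so compare distances.
fit <- classical_mds(UScitiesD, k = 2)
ref <- cmdscale(UScitiesD, k = 2, eig = TRUE)
all.equal(c(dist(fit$points)), c(dist(ref$points)))
all.equal(fit$eig, ref$eig)

# PCA equivalence on simulated raw data: the eigenvalues of B equal the
# squared singular values of the column-centred data matrix
set.seed(1)
Z  <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
Zc <- scale(Z, center = TRUE, scale = FALSE)
m  <- classical_mds(dist(Z), k = 4)
all.equal(m$eig[1:4], svd(Zc)$d^2)
```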

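For non-Euclidean distances, \(B\) is not positive semi-definite, so some of its eigenvalues are negative. The configuration still uses only the top positive eigenvalues. The total magnitude of the negative eigenvalues measures how far \(D\) departs from a Euclidean distance, and it is part of the embedding error that no choice of \(k\) can remove.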
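A three-object toy matrix, constructed here purely for illustration, shows the effect. Its dissimilarities break the triangle inequality, so \(B\) must have a negative eigenvalue. As a result, cmdscale()'s two goodness-of-fit denominators disagree. The expected values in the comments come from working the double centring by hand.

```r
# Toy dissimilarities among three objects that violate the triangle inequality:
# d(2,3) = 3 > d(1,2) + d(1,3) = 2, so no Euclidean configuration reproduces them
d_toy <- as.dist(matrix(c(0, 1, 1,
                          1, 0, 3,
                          1, 3, 0), nrow = 3, byrow = TRUE))

toy <- cmdscale(d_toy, k = 1, eig = TRUE)
toy$eig   # 4.5, 0 and -5/6, up to floating-point rounding
toy$GOF   # 0.84375 (absolute-value denominator) and 1 (positive-only denominator)

# Share of the total absolute eigen-mass that sits on negative eigenvalues
neg_share <- sum(abs(toy$eig[toy$eig < 0])) / sum(abs(toy$eig))
neg_share # 0.15625
```

Here the positive-only denominator reports a perfect fit for \(k = 1\), while the absolute-value version exposes the roughly 16% of eigen-mass that no Euclidean configuration can represent.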

Assumptions

A pairwise distance matrix that is at least approximately Euclidean-embeddable. Severe violations (negative eigenvalues dominating) suggest non-metric MDS as a more appropriate alternative.

R Implementation

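# Classical MDS on US city distances (UScitiesD ships with R's datasets package)
data(UScitiesD)
mds <- cmdscale(UScitiesD, k = 2, eig = TRUE)

# asp = 1 keeps both axes on the same scale so the map is not stretched
plot(mds$points, type = "n", asp = 1, xlab = "Dimension 1", ylab = "Dimension 2")
text(mds$points, labels = labels(UScitiesD))

# Eigenvalues and cumulative share of eigen-mass, relative to ALL eigenvalues
mds$eig[1:5]
cumsum(mds$eig[1:5]) / sum(abs(mds$eig))
mds$GOF  # retained eigenvalues over sum(|eig|) and over the sum of positive eig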

Output & Results

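With eig = TRUE, cmdscale() returns a list containing the low-dimensional configuration ($points), all \(n\) eigenvalues ($eig), and a two-element goodness-of-fit vector ($GOF). The two elements are the sum of the \(k\) retained eigenvalues divided by the sum of the absolute eigenvalues and by the sum of the positive eigenvalues, respectively. The plot of the US-city coordinates recovers the recognisable geography, but only up to rotation and reflection. The distances carry no information about orientation, and the sign of each MDS axis is arbitrary, so the layout may appear mirrored or turned relative to the real map.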
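A more direct check than the eigenvalues is to compare each input distance with the corresponding distance in the two-dimensional configuration, in the spirit of a Shepard diagram. The last line is a stress-1-style ratio on raw distances, chosen here as an illustrative summary. It is not the stress that non-metric MDS minimises.

```r
mds <- cmdscale(UScitiesD, k = 2, eig = TRUE)

d_obs <- c(UScitiesD)           # the 45 input distances (lower triangle)
d_fit <- c(dist(mds$points))    # the same 45 pairs, measured in the 2-D configuration

# Shepard-style diagram: points near the dashed identity line are well represented
plot(d_obs, d_fit, xlab = "Input distance", ylab = "Distance in 2-D configuration")
abline(a = 0, b = 1, lty = 2)

cor(d_obs, d_fit)                               # linear agreement of the two sets
sqrt(sum((d_obs - d_fit)^2) / sum(d_obs^2))     # raw stress-1-style ratio
```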

Interpretation

A reporting sentence: “Classical MDS on the pairwise distances among 10 US cities produced a two-dimensional configuration that captured 99% of the eigen-mass; the resulting layout matches actual geography after a reflection, with east-west and north-south structure immediately visible.” Always report the proportion of eigen-mass retained in the displayed dimensions; a poor approximation signals that more dimensions or non-metric MDS would help.
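One way to turn the reporting rule into a choice of dimensionality is to take the smallest \(k\) whose cumulative eigen-mass reaches a target. The helper choose_k() and the 0.95 target below are assumptions for illustration, not a standard rule. The denominator matches the first element of cmdscale()'s $GOF.

```r
# Hypothetical helper: smallest k whose cumulative share of eigen-mass reaches
# `target`, using the sum of absolute eigenvalues as denominator (as in GOF[1])
choose_k <- function(eig, target = 0.9) {
  share <- cumsum(eig) / sum(abs(eig))
  which(share >= target)[1]   # NA if the target is never reached
}

mds <- cmdscale(UScitiesD, k = 2, eig = TRUE)
round(cumsum(mds$eig) / sum(abs(mds$eig)), 3)   # cumulative shares for k = 1..n
choose_k(mds$eig, target = 0.95)
```

Because negative eigenvalues sit at the end of the list, the cumulative share can peak below 1. This is why the helper returns NA rather than forcing a value when the target is unreachable.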

Practical Tips

  • For Euclidean distances computed from raw data, classical MDS reproduces PCA on the centred data; if you have the raw data, run PCA directly (for example with prcomp()), which also gives the variable loadings.
  • For non-Euclidean dissimilarities (Bray-Curtis, Jaccard, network distances), check the eigenvalue structure. Large negative eigenvalues indicate that the dissimilarities cannot be represented exactly in Euclidean space and motivate non-metric MDS instead.
  • In ecology and microbiome research, this method is called principal coordinates analysis (PCoA); the algorithm is identical.
  • Use Procrustes analysis (vegan::procrustes) when comparing MDS solutions across related datasets; it finds the rotation, reflection, translation, and scaling that best superimpose one configuration on the other.
  • For configurations beyond two dimensions, plot pairs of axes (for example with pairs(mds$points)); for ggplot-style figures, put $points in a data frame and plot it with ggplot2.
  • Pair the configuration with a fit measure, such as a stress value or the proportion of eigen-mass retained ($GOF from cmdscale()), to communicate fit quality alongside the visual.
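In practice, vegan::procrustes() is the tool for comparing configurations. The base-R sketch below only illustrates the least-squares superimposition idea behind it. procrustes_align() is a hypothetical helper. It centres both configurations, takes the SVD of their cross-product to get the optimal rotation or reflection, and then solves for the isotropic scale. In the usage example, Y is built as an exact similarity transform of X, so the fit should be perfect.

```r
# Hypothetical helper: least-squares similarity Procrustes in base R.
# Finds the rotation/reflection R, scale b and translation that bring Y onto X.
procrustes_align <- function(X, Y) {
  muX <- colMeans(X)
  muY <- colMeans(Y)
  Xc  <- sweep(X, 2, muX)                  # centre both configurations
  Yc  <- sweep(Y, 2, muY)
  s   <- svd(crossprod(Yc, Xc))            # SVD of t(Yc) %*% Xc
  R   <- s$u %*% t(s$v)                    # optimal orthogonal matrix
  b   <- sum(s$d) / sum(Yc^2)              # optimal isotropic scale
  Yfit <- b * Yc %*% R + matrix(muX, nrow(Y), ncol(Y), byrow = TRUE)
  list(fitted = Yfit, rotation = R, scale = b, ss = sum((X - Yfit)^2))
}

# Usage: Y is X rotated by 30 degrees, reflected, scaled by 3 and shifted
X   <- cmdscale(UScitiesD, k = 2)
th  <- pi / 6
Q   <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2, 2) %*% diag(c(1, -1))
Y   <- 3 * X %*% Q + 100
fit <- procrustes_align(X, Y)
fit$scale   # 1/3, undoing the factor of 3
fit$ss      # essentially zero, since Y is an exact similarity transform of X
```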

R Packages Used

Base R cmdscale() for classical MDS; vegan::wcmdscale() for weighted classical MDS in ecology; MASS::isoMDS() and vegan::metaMDS() for non-metric alternatives; smacof for advanced MDS variants including weighted and constrained versions.
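For the non-metric alternative, a minimal sketch with MASS::isoMDS() starts from the classical configuration, which is also isoMDS()'s default starting point. The reported stress is in percent and is not on the same scale as cmdscale()'s eigen-mass shares.

```r
library(MASS)  # MASS is a recommended package included with standard R builds

# Non-metric MDS, started from the classical solution
init <- cmdscale(UScitiesD, k = 2)
nm   <- isoMDS(UScitiesD, y = init, k = 2, trace = FALSE)
nm$stress      # final Kruskal stress, in percent

# How closely the two layouts agree, pair by pair
cor(c(dist(init)), c(dist(nm$points)))
```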