Overview
songR provides a native R/C++ implementation of the Self-Organizing Nebulous Growths (SONG) algorithm for nonlinear dimensionality reduction and data visualization.
SONG’s key advantages over t-SNE and UMAP:
| Feature | SONG | t-SNE | UMAP |
|---|---|---|---|
| Incremental updates | Yes | No | No |
| Parametric model | Yes (codebook) | No | Optional |
| Noise robustness | High | Low | Medium |
| Global structure | Good | Poor | Good |
New data can be added to an existing embedding without reinitializing or retraining the model, making SONG ideal for streaming data, growing datasets, and experiments where data arrives in batches.
Installation
# Install from CRAN (when available)
install.packages("songR")
# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("r-heller/songR")
Key Features
Incremental Visualization
Train on an initial batch, then add more data without recomputing from scratch. Existing point positions are preserved.
Parametric Projection
Once trained, a SONG model can project unseen data without retraining:
new_coords <- predict(model, newdata = held_out_data)
Bundled Dataset
songR ships with songR_blobs, a synthetic 8-cluster dataset in 20 dimensions for quick benchmarking.
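As a rough picture of what a dataset of this shape looks like, the base-R sketch below generates 8 Gaussian clusters in 20 dimensions. It is an illustrative stand-in and not the generator used to build songR_blobs; the number of points per cluster, the spread of the cluster centers, and the noise level are all assumptions.

```r
# Illustrative stand-in only: NOT the generator used to build songR_blobs.
set.seed(42)                      # reproducible draw

n_clusters    <- 8                # matches the description of songR_blobs
n_dims        <- 20
n_per_cluster <- 100              # assumed; the real dataset size may differ

# One random center per cluster, spread widely so clusters are separable.
centers <- matrix(rnorm(n_clusters * n_dims, sd = 10),
                  nrow = n_clusters, ncol = n_dims)

# Repeat each center n_per_cluster times and add unit-variance Gaussian noise.
labels <- rep(seq_len(n_clusters), each = n_per_cluster)
blobs  <- centers[labels, , drop = FALSE] +
  matrix(rnorm(length(labels) * n_dims), ncol = n_dims)

dim(blobs)      # 800 rows, 20 columns
table(labels)   # 100 points in each of the 8 clusters
```

A matrix like `blobs` together with its `labels` vector is a convenient input for trying any dimensionality-reduction method and checking whether the known clusters stay separated.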
When to Use SONG
- Incremental/streaming data: Data arrives in batches and you need a stable, growing visualization.
- Heterogeneous increments: New data may contain classes or structures not present in the original data.
- Noisy or mixed clusters: SONG is more tolerant of noise and cluster overlap than t-SNE.
- Parametric mapping: SONG retains a codebook model that can project new points without retraining.
- Large-scale cytometry/single-cell: SONG handles 1M+ cells incrementally with stable embeddings (see tutorials).
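The parametric mapping mentioned above can be pictured with a small base-R sketch: each coding vector has a position in the input space and a paired position in the low-dimensional embedding, and a new point is placed where its nearest coding vector sits. This is a simplified illustration of the general idea of projecting through a codebook. It is not songR's predict() method, which may well use a more refined rule, and the function name `project_nearest()` and the toy data are hypothetical.

```r
# Simplified illustration of codebook-based projection; NOT songR's predict().
# Each coding vector has a position in input space and a paired 2-D position.
project_nearest <- function(newdata, codebook, embedding) {
  stopifnot(ncol(newdata) == ncol(codebook),
            nrow(codebook) == nrow(embedding))
  # Squared Euclidean distance from every new point to every coding vector.
  d2 <- outer(rowSums(newdata^2), rowSums(codebook^2), "+") -
    2 * newdata %*% t(codebook)
  nearest <- max.col(-d2, ties.method = "first")  # index of the closest vector
  embedding[nearest, , drop = FALSE]
}

# Toy codebook: three coding vectors in 2-D input space with 2-D output positions.
codebook  <- rbind(c(0, 0), c(10, 0), c(0, 10))
embedding <- rbind(c(-1, -1), c(1, -1), c(0, 1))
new_pts   <- rbind(c(0.5, 0.2), c(9, 1), c(1, 8))

project_nearest(new_pts, codebook, embedding)
```

Because the lookup only reads the stored codebook, no training step is repeated when new points arrive, which is the property the parametric-mapping bullet refers to.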
Vignettes and Tutorials
CRAN Vignettes
- Introduction to songR – Full overview of the API with iris and songR_blobs examples
- Getting Started with songR – Quick-start guide covering fitting, updating, predicting, and tuning
pkgdown Articles
- Reproducing Paper Figures – Reproduces key experiments from Senanayake et al. (2021) using songR (heterogeneous/homogeneous increments, CDY stability, noise tolerance, topology preservation, AMI scores)
- Interactive Shiny App – Guide to the built-in comparison app with dark mode
Full-Scale Tutorial Scripts
The tutorials/ directory contains scripts for full-scale reproduction of all paper figures and tables on real datasets (MNIST, Fashion-MNIST, Wong CyTOF 1.27M cells, COIL-20, Samusik):
| Script | Paper Figure | Dataset |
|---|---|---|
| 02_fig3_fashion_mnist_heterogeneous.R | Fig. 3 | Fashion-MNIST (70k) |
| 03_fig4_mnist_heterogeneous.R | Fig. 4 | MNIST (70k) |
| 04_fig5_wong_homogeneous.R | Fig. 5 | Wong CyTOF (1.27M) |
| 05_fig6_cdy_lines.R | Fig. 6 | Multiple |
| 06_fig7_table_IV_noise_tolerance.R | Fig. 7 / Table IV | Gaussian blobs |
| 07_fig8_coil20_topology.R | Fig. 8 | COIL-20 (1440) |
| 08_table_II_heterogeneous_ami.R | Table II | Multiple |
| 09_table_III_homogeneous_ami.R | Table III | Multiple |
| 11_extra_static_benchmark.R | – | All datasets |
| 12_extra_hyperparam_sensitivity.R | – | songR_blobs |
| 13_extra_runtime_benchmark.R | – | Scaling test |
Run all tutorials:
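One way to do this, assuming the scripts are independent of each other and are meant to be run from the repository root, is to source them in filename order. The helper `run_tutorials()` below is hypothetical and not part of songR; it is a minimal sketch that uses only base R.

```r
# Hypothetical helper, not part of songR: sources every .R script in a
# directory in filename order, each in its own environment.
run_tutorials <- function(dir = "tutorials") {
  scripts <- sort(list.files(dir, pattern = "\\.R$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", basename(script))
    source(script, local = new.env())
  }
  invisible(scripts)
}

# run_tutorials("tutorials")  # uncomment to run; full-scale scripts may take a long time
```

Sourcing each script into a fresh environment keeps objects from one tutorial from leaking into the next. If the scripts do depend on shared state, source them into a single environment instead.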
Key Parameters
| Parameter | Default | Effect |
|---|---|---|
| epochs | 50 | More = better convergence, slower |
| epsilon | 0.9 | Edge decay rate (0–1); lower = sparser graph |
| spread_factor | 0.5 | Growth threshold; higher = more coding vectors |
| k | 3 | Neighborhood size for graph construction |
| dispersion | TRUE | UMAP refinement step for visual quality |
| alpha | 1.0 | Initial learning rate |
Citation
If you use songR in published research, please cite both this R package and the underlying SONG algorithm:
R package:
Heller, R. (2026). songR: Self-Organizing Nebulous Growths for Dimensionality Reduction. R package version 0.1.0. https://github.com/r-heller/songR
Underlying algorithm:
Senanayake, D. A., Wang, W., Naik, S. H., & Halgamuge, S. (2021). Self-Organizing Nebulous Growths for Robust and Incremental Data Visualization. IEEE Transactions on Neural Networks and Learning Systems, 32(10), 4588-4602. doi:10.1109/TNNLS.2020.3023941
citation("songR")
Acknowledgments
This package is an independent R/C++ re-implementation of the SONG algorithm using RcppArmadillo. The original algorithm was developed by Damith Senanayake, Wei Wang, Shalin Naik, and Saman Halgamuge at the University of Melbourne. The original Python implementation is available at github.com/damithsenanayake/SONG.