Rust/WASM API Reference

The Rust implementation provides f64 precision phylogenetic computations, compilable to both native Rust and WebAssembly (via wasm-bindgen). It serves as the browser fallback when WebGPU is unavailable.

Crate: phylo_wasm

Location: subby/wasm/

Targets: - cdylib — WebAssembly module via wasm-bindgen - rlib — Native Rust library for CLI and testing

Feature flags: - wasm — gates wasm-bindgen dependencies

[dependencies]
wasm-bindgen = { version = "0.2", optional = true }
js-sys = { version = "0.3", optional = true }

[features]
default = []
wasm = ["dep:wasm-bindgen", "dep:js-sys"]

Native Rust API

DiagModel

pub struct DiagModel {
    pub eigenvalues: Vec<f64>,   // (A,)
    pub eigenvectors: Vec<f64>,  // (A*A,) row-major
    pub pi: Vec<f64>,            // (A,)
}

log_like(alignment, parent_index, distances, model) -> Vec<f64>

Compute per-column log-likelihoods.

Parameter Type Description
alignment &[i32] Flat (R*C) row-major tokens
parent_index &[i32] (R,) preorder parent indices
distances &[f64] (R,) branch lengths
model &DiagModel Substitution model

Returns: Vec<f64> of length C.

counts(alignment, parent_index, distances, model, f81_fast_flag) -> Vec<f64>

Compute expected substitution counts and dwell times.

Returns: Vec<f64> of length A*A*C, row-major (A, A, C).

root_prob(alignment, parent_index, distances, model) -> Vec<f64>

Compute posterior root state distribution.

Returns: Vec<f64> of length A*C, row-major (A, C).

mixture_posterior_full(alignment, parent_index, distances, models, log_weights) -> Vec<f64>

Compute mixture component posteriors.

Returns: Vec<f64> of length K*C, row-major (K, C).


Model constructors

hky85_diag(kappa: f64, pi: &[f64]) -> DiagModel

HKY85 model with closed-form eigendecomposition.

jukes_cantor_model(a: usize) -> DiagModel

Jukes-Cantor model for a states.

f81_model(pi: &[f64]) -> DiagModel

F81 model with given equilibrium frequencies.

gamma_rate_categories(alpha: f64, k: usize) -> (Vec<f64>, Vec<f64>)

Discretized gamma rate categories.

scale_model(model: &DiagModel, rate_multiplier: f64) -> DiagModel

Scale eigenvalues by a rate multiplier.

eig_symmetric(s: &mut [f64], a: usize) -> (Vec<f64>, Vec<f64>)

Cyclic Jacobi eigendecomposition for a symmetric matrix. Computes eigenvalues and eigenvectors in-place.

Parameter Type Description
s &mut [f64] Flat (a*a) row-major symmetric matrix (modified in-place)
a usize Matrix dimension

Returns: (eigenvalues, eigenvectors) — eigenvalues as Vec<f64> of length a, eigenvectors as Vec<f64> of length a*a (row-major, v[i*a+k] = component $i$ of eigenvector $k$).

diagonalize_rate_matrix(sub_rate: &[f64], pi: &[f64]) -> DiagModel

Diagonalize a reversible rate matrix via symmetrization and Jacobi eigendecomposition. Symmetrizes $Q$ as $S_{ij} = Q_{ij} \sqrt{\pi_i} / \sqrt{\pi_j}$, then computes eigenvalues/eigenvectors of $S$.

Parameter Type Description
sub_rate &[f64] Flat (A*A) row-major rate matrix $Q$
pi &[f64] (A,) equilibrium distribution

Returns: DiagModel.

gy94_model(omega: f64, kappa: f64, pi: Option<&[f64]>) -> DiagModel

Goldman-Yang (1994) codon substitution model. Operates on 61 sense codons.

Parameter Type Description
omega f64 dN/dS ratio (Ka/Ks)
kappa f64 Transition/transversion ratio
pi Option<&[f64]> (61,) codon equilibrium frequencies (None = uniform $1/61$)

Returns: DiagModel with $A = 61$ states.


Format utilities

GeneticCode

pub struct GeneticCode {
    pub codons: Vec<String>,           // 64 codon strings
    pub amino_acids: Vec<char>,        // 64 amino acid letters (stop = '*')
    pub sense_mask: Vec<bool>,         // (64,) true for sense codons
    pub sense_indices: Vec<usize>,     // (61,) indices of sense codons in 0..63
    pub codon_to_sense: Vec<i32>,      // (64,) maps codon idx to sense idx (stop -> -1)
    pub sense_codons: Vec<String>,     // 61 sense codon strings
    pub sense_amino_acids: Vec<char>,  // 61 amino acid letters
}

genetic_code() -> GeneticCode

Return the standard genetic code. Codons in ACGT lexicographic order; stop codons (TAA, TAG, TGA) marked with '*'.

codon_to_sense(alignment: &[i32], n: usize, c: usize, a: usize) -> Vec<i32>

Remap a 64-codon tokenized flat alignment to 61-sense-codon tokens. Stop codons become the gap token.

Parameter Type Description
alignment &[i32] Flat (n*c) row-major tokens (64-codon encoding)
n usize Number of sequences
c usize Number of columns
a usize Input codon alphabet size (64)

Returns: Vec<i32> of length n*c with $A_\text{sense} = 61$ tokens.

KmerIndex

pub struct KmerIndex {
    pub tuples: Vec<Vec<i64>>,  // (T, k) column tuples
}

Maps between column tuples and output alignment indices. Provides O(1) lookup in both directions.

Method Description
new(tuples) Create from Vec<Vec<i64>>
tuple_to_idx(&self, t) Column tuple → output index. Returns -1 if absent.
idx_to_tuple(&self, idx) Output index → column tuple.
len(&self) Number of tuples.

KmerResult

pub struct KmerResult {
    pub alignment: Vec<i32>,  // flat (n * c_k) tokens
    pub a_kmer: usize,        // A^k
    pub n: usize,             // number of sequences
    pub c_k: usize,           // number of output columns
    pub index: KmerIndex,     // tuple ↔ index mapping
}

sliding_windows(c: usize, k: usize, stride: Option<usize>, offset: usize, edge: &str) -> Vec<Vec<i64>>

Generate column index tuples for sliding-window k-mer tokenization.

Parameter Type Description
c usize Number of columns
k usize Window size
stride Option<usize> Step between starts (Nonek)
offset usize Starting column (default 0)
edge &str "truncate" or "pad"

Returns: Vec<Vec<i64>>(M, k) column indices. -1 for out-of-bounds (with "pad").

all_column_ktuples(c: usize, k: usize, ordered: bool) -> Vec<Vec<i64>>

Generate all k-tuples of column indices. WARNING: $O(C^k)$.

Parameter Type Description
c usize Number of columns
k usize Tuple size
ordered bool true: permutations; false: combinations

Returns: Vec<Vec<i64>>.

kmer_tokenize(alignment: &[i32], n: usize, c: usize, a: usize, k: usize, gap_mode: &str) -> KmerResult

Convert single-character tokens to k-mer tokens using non-overlapping windows. c must be divisible by k. Backward-compatible API that delegates to kmer_tokenize_tuples.

Parameter Type Description
alignment &[i32] Flat (n*c) row-major tokens
n usize Number of sequences
c usize Number of columns
a usize Single-character alphabet size
k usize k-mer size (e.g., 3 for codons)
gap_mode &str "any" or "all"

Returns: KmerResult.

kmer_tokenize_tuples(alignment: &[i32], n: usize, c: usize, a: usize, column_tuples: &[Vec<i64>], gap_mode: &str) -> KmerResult

Core tokenizer accepting arbitrary column tuples. Entries of -1 in tuples are treated as unobserved positions.

Parameter Type Description
alignment &[i32] Flat (n*c) row-major tokens
n usize Number of sequences
c usize Number of columns
a usize Single-character alphabet size
column_tuples &[Vec<i64>] Column index tuples
gap_mode &str "any" or "all"

Returns: KmerResult. Token encoding: 0..$A^k - 1$ for observed k-mers, $A^k$ for ungapped-unobserved, $A^k + 1$ for gap.


Internal modules

Module Functions
tree children_of, validate_binary_tree
model Model constructors, DiagModel, eig_symmetric, diagonalize_rate_matrix, gy94_model
token token_to_likelihood
sub_matrices compute_sub_matrices
pruning upward_pass
outside downward_pass
eigensub compute_j, eigenbasis_project, accumulate_c, back_transform
f81_fast f81_counts
mixture mixture_posterior
branch_mask compute_branch_mask
formats parse_newick, parse_fasta, parse_strings, combine_tree_alignment, detect_alphabet, genetic_code, codon_to_sense, kmer_tokenize, kmer_tokenize_tuples, sliding_windows, all_column_ktuples, KmerIndex

WASM bindings

When compiled with --features wasm, the crate exports flat-array wasm-bindgen functions:

wasm_log_like(alignment, parent_index, distances, eigenvalues, eigenvectors, pi) -> Vec<f64>

wasm_counts(alignment, parent_index, distances, eigenvalues, eigenvectors, pi, f81_fast) -> Vec<f64>

wasm_root_prob(alignment, parent_index, distances, eigenvalues, eigenvectors, pi) -> Vec<f64>

wasm_branch_mask(alignment, parent_index, a) -> Vec<u8>

All parameters are flat typed arrays. Model parameters are passed individually rather than as a struct.

Format parser bindings

wasm_parse_newick(newick_str) -> JsValue

Parse a Newick string. Returns object with parentIndex (Int32Array), distanceToParent (Float64Array), leafNames, nodeNames, R.

wasm_parse_fasta(text) -> JsValue

Parse FASTA text. Returns object with alignment (Int32Array, flat N*C), leafNames, alphabet, N, C.

wasm_parse_strings(sequences) -> JsValue

Parse an array of equal-length strings. Returns object with alignment (Int32Array, flat N*C), alphabet, N, C.


Building

Native

cd subby/wasm
cargo build --release
cargo test

WASM

cd subby/wasm
wasm-pack build --target web --features wasm

Produces pkg/phylo_wasm_bg.wasm and pkg/phylo_wasm.js.


JS wrapper: PhyloWASM

subby/webgpu/phylo_wasm.js wraps the WASM module with the same async API as PhyloGPU:

import { PhyloWASM } from './phylo_wasm.js';

const engine = await PhyloWASM.create('./phylo_wasm_bg.wasm');
const logLike = await engine.LogLike(alignment, parentIndex, distances,
                                      eigenvalues, eigenvectors, pi);
engine.destroy();

PhyloWASM.create(wasmUrl)

Fetch and instantiate the WASM module.

PhyloWASM.fromModule(wasmModule)

Wrap a pre-initialized wasm-bindgen module.

The API methods (LogLike, Counts, RootProb, MixturePosterior, computeBranchMask, destroy) are identical to PhyloGPU — see the WebGPU API reference.


Precision

All internal computation uses f64. Test tolerance: atol=1e-8 against golden files.