Rust/WASM API Reference¶
The Rust implementation provides f64 precision phylogenetic computations, compilable to both native Rust and WebAssembly (via wasm-bindgen). It serves as the browser fallback when WebGPU is unavailable.
Crate: phylo_wasm¶
Location: subby/wasm/
Targets:
- cdylib — WebAssembly module via wasm-bindgen
- rlib — Native Rust library for CLI and testing
Feature flags:
- wasm — gates wasm-bindgen dependencies
[dependencies]
wasm-bindgen = { version = "0.2", optional = true }
js-sys = { version = "0.3", optional = true }
[features]
default = []
wasm = ["dep:wasm-bindgen", "dep:js-sys"]
Native Rust API¶
DiagModel¶
pub struct DiagModel {
pub eigenvalues: Vec<f64>, // (A,)
pub eigenvectors: Vec<f64>, // (A*A,) row-major
pub pi: Vec<f64>, // (A,)
}
log_like(alignment, parent_index, distances, model) -> Vec<f64>¶
Compute per-column log-likelihoods.
| Parameter | Type | Description |
|---|---|---|
alignment |
&[i32] |
Flat (R*C) row-major tokens |
parent_index |
&[i32] |
(R,) preorder parent indices |
distances |
&[f64] |
(R,) branch lengths |
model |
&DiagModel |
Substitution model |
Returns: Vec<f64> of length C.
counts(alignment, parent_index, distances, model, f81_fast_flag) -> Vec<f64>¶
Compute expected substitution counts and dwell times.
Returns: Vec<f64> of length A*A*C, row-major (A, A, C).
root_prob(alignment, parent_index, distances, model) -> Vec<f64>¶
Compute posterior root state distribution.
Returns: Vec<f64> of length A*C, row-major (A, C).
mixture_posterior_full(alignment, parent_index, distances, models, log_weights) -> Vec<f64>¶
Compute mixture component posteriors.
Returns: Vec<f64> of length K*C, row-major (K, C).
Model constructors¶
hky85_diag(kappa: f64, pi: &[f64]) -> DiagModel¶
HKY85 model with closed-form eigendecomposition.
jukes_cantor_model(a: usize) -> DiagModel¶
Jukes-Cantor model for a states.
f81_model(pi: &[f64]) -> DiagModel¶
F81 model with given equilibrium frequencies.
gamma_rate_categories(alpha: f64, k: usize) -> (Vec<f64>, Vec<f64>)¶
Discretized gamma rate categories.
scale_model(model: &DiagModel, rate_multiplier: f64) -> DiagModel¶
Scale eigenvalues by a rate multiplier.
eig_symmetric(s: &mut [f64], a: usize) -> (Vec<f64>, Vec<f64>)¶
Cyclic Jacobi eigendecomposition for a symmetric matrix. Computes eigenvalues and eigenvectors in-place.
| Parameter | Type | Description |
|---|---|---|
s |
&mut [f64] |
Flat (a*a) row-major symmetric matrix (modified in-place) |
a |
usize |
Matrix dimension |
Returns: (eigenvalues, eigenvectors) — eigenvalues as Vec<f64> of length a, eigenvectors as Vec<f64> of length a*a (row-major, v[i*a+k] = component $i$ of eigenvector $k$).
diagonalize_rate_matrix(sub_rate: &[f64], pi: &[f64]) -> DiagModel¶
Diagonalize a reversible rate matrix via symmetrization and Jacobi eigendecomposition. Symmetrizes $Q$ as $S_{ij} = Q_{ij} \sqrt{\pi_i} / \sqrt{\pi_j}$, then computes eigenvalues/eigenvectors of $S$.
| Parameter | Type | Description |
|---|---|---|
sub_rate |
&[f64] |
Flat (A*A) row-major rate matrix $Q$ |
pi |
&[f64] |
(A,) equilibrium distribution |
Returns: DiagModel.
gy94_model(omega: f64, kappa: f64, pi: Option<&[f64]>) -> DiagModel¶
Goldman-Yang (1994) codon substitution model. Operates on 61 sense codons.
| Parameter | Type | Description |
|---|---|---|
omega |
f64 |
dN/dS ratio (Ka/Ks) |
kappa |
f64 |
Transition/transversion ratio |
pi |
Option<&[f64]> |
(61,) codon equilibrium frequencies (None = uniform $1/61$) |
Returns: DiagModel with $A = 61$ states.
Format utilities¶
GeneticCode¶
pub struct GeneticCode {
pub codons: Vec<String>, // 64 codon strings
pub amino_acids: Vec<char>, // 64 amino acid letters (stop = '*')
pub sense_mask: Vec<bool>, // (64,) true for sense codons
pub sense_indices: Vec<usize>, // (61,) indices of sense codons in 0..63
pub codon_to_sense: Vec<i32>, // (64,) maps codon idx to sense idx (stop -> -1)
pub sense_codons: Vec<String>, // 61 sense codon strings
pub sense_amino_acids: Vec<char>, // 61 amino acid letters
}
genetic_code() -> GeneticCode¶
Return the standard genetic code. Codons in ACGT lexicographic order; stop codons (TAA, TAG, TGA) marked with '*'.
codon_to_sense(alignment: &[i32], n: usize, c: usize, a: usize) -> Vec<i32>¶
Remap a 64-codon tokenized flat alignment to 61-sense-codon tokens. Stop codons become the gap token.
| Parameter | Type | Description |
|---|---|---|
alignment |
&[i32] |
Flat (n*c) row-major tokens (64-codon encoding) |
n |
usize |
Number of sequences |
c |
usize |
Number of columns |
a |
usize |
Input codon alphabet size (64) |
Returns: Vec<i32> of length n*c with $A_\text{sense} = 61$ tokens.
KmerIndex¶
pub struct KmerIndex {
pub tuples: Vec<Vec<i64>>, // (T, k) column tuples
}
Maps between column tuples and output alignment indices. Provides O(1) lookup in both directions.
| Method | Description |
|---|---|
new(tuples) |
Create from Vec<Vec<i64>> |
tuple_to_idx(&self, t) |
Column tuple → output index. Returns -1 if absent. |
idx_to_tuple(&self, idx) |
Output index → column tuple. |
len(&self) |
Number of tuples. |
KmerResult¶
pub struct KmerResult {
pub alignment: Vec<i32>, // flat (n * c_k) tokens
pub a_kmer: usize, // A^k
pub n: usize, // number of sequences
pub c_k: usize, // number of output columns
pub index: KmerIndex, // tuple ↔ index mapping
}
sliding_windows(c: usize, k: usize, stride: Option<usize>, offset: usize, edge: &str) -> Vec<Vec<i64>>¶
Generate column index tuples for sliding-window k-mer tokenization.
| Parameter | Type | Description |
|---|---|---|
c |
usize |
Number of columns |
k |
usize |
Window size |
stride |
Option<usize> |
Step between starts (None → k) |
offset |
usize |
Starting column (default 0) |
edge |
&str |
"truncate" or "pad" |
Returns: Vec<Vec<i64>> — (M, k) column indices. -1 for out-of-bounds (with "pad").
all_column_ktuples(c: usize, k: usize, ordered: bool) -> Vec<Vec<i64>>¶
Generate all k-tuples of column indices. WARNING: $O(C^k)$.
| Parameter | Type | Description |
|---|---|---|
c |
usize |
Number of columns |
k |
usize |
Tuple size |
ordered |
bool |
true: permutations; false: combinations |
Returns: Vec<Vec<i64>>.
kmer_tokenize(alignment: &[i32], n: usize, c: usize, a: usize, k: usize, gap_mode: &str) -> KmerResult¶
Convert single-character tokens to k-mer tokens using non-overlapping windows. c must be divisible by k. Backward-compatible API that delegates to kmer_tokenize_tuples.
| Parameter | Type | Description |
|---|---|---|
alignment |
&[i32] |
Flat (n*c) row-major tokens |
n |
usize |
Number of sequences |
c |
usize |
Number of columns |
a |
usize |
Single-character alphabet size |
k |
usize |
k-mer size (e.g., 3 for codons) |
gap_mode |
&str |
"any" or "all" |
Returns: KmerResult.
kmer_tokenize_tuples(alignment: &[i32], n: usize, c: usize, a: usize, column_tuples: &[Vec<i64>], gap_mode: &str) -> KmerResult¶
Core tokenizer accepting arbitrary column tuples. Entries of -1 in tuples are treated as unobserved positions.
| Parameter | Type | Description |
|---|---|---|
alignment |
&[i32] |
Flat (n*c) row-major tokens |
n |
usize |
Number of sequences |
c |
usize |
Number of columns |
a |
usize |
Single-character alphabet size |
column_tuples |
&[Vec<i64>] |
Column index tuples |
gap_mode |
&str |
"any" or "all" |
Returns: KmerResult. Token encoding: 0..$A^k - 1$ for observed k-mers, $A^k$ for ungapped-unobserved, $A^k + 1$ for gap.
Internal modules¶
| Module | Functions |
|---|---|
tree |
children_of, validate_binary_tree |
model |
Model constructors, DiagModel, eig_symmetric, diagonalize_rate_matrix, gy94_model |
token |
token_to_likelihood |
sub_matrices |
compute_sub_matrices |
pruning |
upward_pass |
outside |
downward_pass |
eigensub |
compute_j, eigenbasis_project, accumulate_c, back_transform |
f81_fast |
f81_counts |
mixture |
mixture_posterior |
branch_mask |
compute_branch_mask |
formats |
parse_newick, parse_fasta, parse_strings, combine_tree_alignment, detect_alphabet, genetic_code, codon_to_sense, kmer_tokenize, kmer_tokenize_tuples, sliding_windows, all_column_ktuples, KmerIndex |
WASM bindings¶
When compiled with --features wasm, the crate exports flat-array wasm-bindgen functions:
wasm_log_like(alignment, parent_index, distances, eigenvalues, eigenvectors, pi) -> Vec<f64>¶
wasm_counts(alignment, parent_index, distances, eigenvalues, eigenvectors, pi, f81_fast) -> Vec<f64>¶
wasm_root_prob(alignment, parent_index, distances, eigenvalues, eigenvectors, pi) -> Vec<f64>¶
wasm_branch_mask(alignment, parent_index, a) -> Vec<u8>¶
All parameters are flat typed arrays. Model parameters are passed individually rather than as a struct.
Format parser bindings¶
wasm_parse_newick(newick_str) -> JsValue¶
Parse a Newick string. Returns object with parentIndex (Int32Array), distanceToParent (Float64Array), leafNames, nodeNames, R.
wasm_parse_fasta(text) -> JsValue¶
Parse FASTA text. Returns object with alignment (Int32Array, flat N*C), leafNames, alphabet, N, C.
wasm_parse_strings(sequences) -> JsValue¶
Parse an array of equal-length strings. Returns object with alignment (Int32Array, flat N*C), alphabet, N, C.
Building¶
Native¶
cd subby/wasm
cargo build --release
cargo test
WASM¶
cd subby/wasm
wasm-pack build --target web --features wasm
Produces pkg/phylo_wasm_bg.wasm and pkg/phylo_wasm.js.
JS wrapper: PhyloWASM¶
subby/webgpu/phylo_wasm.js wraps the WASM module with the same async API as PhyloGPU:
import { PhyloWASM } from './phylo_wasm.js';
const engine = await PhyloWASM.create('./phylo_wasm_bg.wasm');
const logLike = await engine.LogLike(alignment, parentIndex, distances,
eigenvalues, eigenvectors, pi);
engine.destroy();
PhyloWASM.create(wasmUrl)¶
Fetch and instantiate the WASM module.
PhyloWASM.fromModule(wasmModule)¶
Wrap a pre-initialized wasm-bindgen module.
The API methods (LogLike, Counts, RootProb, MixturePosterior, computeBranchMask, destroy) are identical to PhyloGPU — see the WebGPU API reference.
Precision¶
All internal computation uses f64. Test tolerance: atol=1e-8 against golden files.