◆ FORGE Suite
GitHubMechanical Neuroimaging Lab · Univ. of Delaware
Skip to content

Solver Backends

Sentinel uses a pluggable linear solver system built on the AbstractLinearSolver interface. All solvers implement the same linear_solve! method, so switching between backends requires only changing the solver argument — no changes to problem setup or assembly code.

AbstractLinearSolver Interface

Every solver backend subtypes AbstractLinearSolver and implements:

julia
linear_solve!(x::AbstractVector{ComplexF64}, K::SparseMatrixCSC{ComplexF64},
              f::AbstractVector{ComplexF64}, solver::YourSolver) -> x

The system K * x = f is solved in-place, overwriting x. Here K is always a complex-valued sparse matrix (complex symmetric for viscoelastic models, general for poroelastic model 4) with Dirichlet boundary conditions already applied.

Solvers that cache internal state (e.g., factorizations) should also implement:

julia
invalidate_cache!(solver::YourSolver)

This discards cached state when K changes (e.g., after re-assembly). The default fallback is a no-op.

See the Forward Solver API page for full docstrings of AbstractLinearSolver, linear_solve!, and invalidate_cache!.

DirectSolver

The default solver. Uses Julia's built-in \ operator, which dispatches to SuiteSparse UMFPACK for complex sparse matrices. No extra dependencies required.

julia
solver = DirectSolver()
forward_solve!(disp, K, f, model, 1; solver=solver)

Best for development, testing, and small to moderate problem sizes. Every call performs a fresh LU factorization.

See the Forward Solver API page for the full DirectSolver docstring.

CachedDirectSolver

Wraps UMFPACK with LU factorization caching. On the first linear_solve! call, computes lu(K) and stores it. Subsequent calls with the same K reuse the cached factorization, solving only by forward/back substitution.

julia
solver = CachedDirectSolver()
forward_solve!(disp, K, f, model, 1; solver=solver)   # ~55ms (factor + solve)
adjoint_solve!(adjnt, K, calc, meas, bcs, 1; solver=solver)  # ~0.2ms (reuse)

This is the primary solver for the inverse problem loop. The forward solve stores lu(K), and the adjoint solve reuses it — since both operate on the same stiffness matrix, the expensive factorization is performed only once per assembly.

Call invalidate_cache! whenever K is re-assembled. The evaluate_objective! function calls invalidate_cache! automatically after each assembly, so in normal usage the cache lifecycle is managed for you.

See the Forward Solver API page for the full CachedDirectSolver docstring.

MUMPSSolver

MUMPS (MUltifrontal Massively Parallel Sparse direct Solver) backend, provided by the SentinelMUMPSExt package extension. Loaded automatically when using MUMPS is called alongside Sentinel.

julia
using MUMPS
solver = MUMPSSolver(sym=2)   # complex symmetric (default)
forward_solve!(disp, K, f, model, 1; solver=solver)

The sym parameter controls the MUMPS symmetry mode:

symModeFactorizationUse case
2Complex symmetricLDL^TViscoelastic models 1, 2, 3, 5, 6, 7 (default)
0General unsymmetricLUPoroelastic model 4

Requires MPI to be initialized before use (MPI.Init() is called automatically if needed).

See the Forward Solver API page for the full MUMPSSolver docstring.

KrylovSolver

GMRES iterative solver with ILU(0) preconditioning, using Krylov.jl and IncompleteLU.jl. The ILU preconditioner is built lazily on the first solve and cached for reuse across subsequent solves with the same sparsity pattern.

julia
solver = KrylovSolver(rtol=1e-12, atol=1e-12, memory=20, itmax=200)
forward_solve!(disp, K, f, model, 1; solver=solver)

Parameters:

ParameterDefaultDescription
rtol1e-12Relative convergence tolerance
atol1e-12Absolute convergence tolerance
memory20GMRES restart parameter (Krylov subspace size)
itmax200Maximum GMRES iterations
verbose0Verbosity level (0 = silent)

Validation: All 276 zones in the MGH brain MRE dataset converge in 1 GMRES iteration with ILU(0) preconditioning, with maximum error of 1.91e-13 relative to the direct solver. This indicates that ILU(0) is an excellent preconditioner for the MRE stiffness matrices.

CUDAKrylovSolver

GPU-accelerated GMRES solver using cuSPARSE for sparse matrix-vector products and ILU(0) preconditioning. Provided by the SentinelCUDAExt package extension.

julia
using CUDA
solver = CUDAKrylovSolver(rtol=1e-12, atol=1e-12, memory=20)
forward_solve!(disp, K, f, model, 1; solver=solver)

The solver transfers K to the GPU, computes ILU(0) via cuSPARSE csrilu02, and runs GMRES with GPU-resident vectors. For zone-sized systems (~1700 DOFs), ILU(0) achieves 1-iteration convergence, making the effective cost one SpMV plus two triangular solves.

Note

Despite excellent convergence, CUDAKrylovSolver is slower than CUDSSBatchedSolver for zone-sized problems due to GPU kernel launch overhead. Use CUDSSBatchedSolver for production workloads.

CUDSSDirectSolver

GPU sparse direct solver via NVIDIA cuDSS. Performs per-zone factorization and solve on a dedicated CUDA stream for thread-safe concurrent operation.

julia
using CUDA, CUDSS
solver = CUDSSDirectSolver()  # allocates its own CuStream
forward_solve!(disp, K, f, model, 1; solver=solver)

Key features:

  • Per-solver CUDA stream — Each instance owns a CuStream, enabling safe concurrent use from multiple Julia threads during line search.

  • Refactorization reuse — First call performs full analysis + factorization. Subsequent calls with _needs_refactor=true only refactorize (same sparsity pattern). Same-K different-RHS calls (adjoint reuse) only update the RHS.

  • Cache invalidation — Call invalidate_cache!(solver) after material updates to trigger refactorization on the next solve.

Used in Phase 3 (line search) of the batched zone driver, where ~290 solvers run concurrently across Julia threads. Achieves ~8.5ms per refactor+solve on DGX Spark ARM cores, vs ~55ms for CPU LU on the same hardware.

CUDSSBatchedSolver

Batched GPU factorization and solve for all zones simultaneously. This is the primary production solver for Phase 1 (forward + adjoint) of the GPU zone decomposition driver.

julia
using CUDA, CUDSS
bsolver = CUDSSBatchedSolver(structure="G")
create_batched_solver!(bsolver, zone_Ks)      # symbolic analysis + factorize
batched_solve!(bsolver, zone_xs, zone_rhs)    # solve all zones
batched_update_and_refactor!(bsolver, zone_Ks) # update values + refactorize

Key functions:

  • create_batched_solver!(solver, Ks) — Transfers all zone K matrices to GPU, creates per-zone CudssSolver instances, performs symbolic analysis and initial factorization.

  • batched_solve!(solver, xs, rhs) — Solves all zone systems using cached factorizations. RHS vectors are transferred to GPU, solved, and results read back.

  • batched_update_and_refactor!(solver, Ks) — Updates GPU-side matrix values from CPU and re-factorizes, reusing the symbolic analysis from create_batched_solver!.

Handles variable-size zones (different sparsity patterns and dimensions) by maintaining independent CudssSolver instances per zone. On the MGH brain dataset (~290 zones), factorizes all zones in ~5s and solves in ~48ms.

Adding a New Solver

To add a new solver backend:

1. Define the solver type:

julia
struct MySolver <: AbstractLinearSolver
    # solver-specific configuration fields
end

2. Implement linear_solve!:

julia
function linear_solve!(x::AbstractVector{ComplexF64},
                        K::SparseMatrixCSC{ComplexF64},
                        f::AbstractVector{ComplexF64},
                        solver::MySolver)
    # Solve K * x = f, store result in x
    return x
end

3. (Optional) Implement invalidate_cache!:

julia
function invalidate_cache!(solver::MySolver)
    solver.cached_factor = nothing
    return nothing
end

4. Export from Sentinel:

Add the type to the export list in src/forward/solver.jl:

julia
export MySolver

The solver can then be passed to forward_solve!, adjoint_solve!, or set as the solver field in a ForwardProblemContext.

If the solver is provided by an external package, implement it as a package extension (see ext/SentinelMUMPSExt.jl for the pattern).