============ Day 1 (Sep 12) ============

1. Sparse Tensor Train Decomposition (Zhaonan Meng)

Problem

Standard TT decomposition (e.g., TT-SVD) gives low-rank approximations but cannot preserve sparsity in TT-cores, even if the input tensor is very sparse.
The resulting dense cores → memory blowup and inefficiency, so the method is not suitable for large-scale sparse tensors.
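
A minimal sketch of the problem, not taken from the talk. It runs only the first step of TT-SVD (an SVD of the first unfolding) on a random sparse tensor and compares densities. The tensor size, the 10% fill, and the helper name density are assumptions chosen for illustration.

```python
import numpy as np

def density(a, tol=1e-12):
    """Fraction of entries whose magnitude exceeds tol."""
    return float(np.mean(np.abs(a) > tol))

rng = np.random.default_rng(0)
shape = (16, 16, 16)

# Random sparse tensor: roughly 10% of the entries are nonzero.
mask = rng.random(shape) < 0.10
tensor = np.where(mask, rng.standard_normal(shape), 0.0)

# First TT-SVD step: unfold along mode 1 and take a truncated SVD.
unfolding = tensor.reshape(shape[0], -1)
u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
rank = int(np.sum(s > 1e-10 * s[0]))

# The first TT-core is built from the left singular vectors.
first_core = u[:, :rank]

print(f"input density:      {density(tensor):.3f}")
print(f"first core density: {density(first_core):.3f}")
```

Singular vectors are orthogonal combinations of the input's rows and columns, so they generally do not keep the zero pattern of the input. That is why the core comes out much denser than the tensor it was computed from.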

Solution

TT-ID: Tensor-Train Interpolative Decomposition
- Uses Partial Rank-Revealing LU (PRRLU) so that the TT-cores inherit the sparsity of the input.
- Maintains sparsity in TT-cores by selecting representative rows/columns (skeletons).
STTID: High-performance implementation of TT-ID
- Sparse PRRLU in COO format.
- Optimizations: Selective Data Separation, Hash Table Gaussian Elimination.
- GPU acceleration: cuBLAS for pivoting, cuCollections hash tables, kernel fusion.
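
A rough sketch of the skeleton idea on a single matrix, not the STTID implementation. It runs a partial LU with full pivoting on entries stored in a Python dict keyed by (row, column), which stands in here for the hash table. The function name prrlu_skeleton, the dict storage, and the small test matrix are all hypothetical. Selective Data Separation, the GPU kernels, and the extension to full TT-cores are not shown.

```python
import numpy as np

def prrlu_skeleton(entries, max_rank, tol=1e-12):
    """Partial LU with full pivoting on a dict {(i, j): value}.

    Returns the pivot rows and pivot columns in the order chosen.
    """
    work = dict(entries)  # current Schur complement, nonzeros only
    rows, cols = [], []
    for _ in range(max_rank):
        if not work:
            break
        # Full pivoting: pick the largest remaining entry.
        (pi, pj), pivot = max(work.items(), key=lambda kv: abs(kv[1]))
        if abs(pivot) <= tol:
            break
        rows.append(pi)
        cols.append(pj)
        pivot_row = {j: v for (i, j), v in work.items() if i == pi}
        pivot_col = {i: v for (i, j), v in work.items() if j == pj}
        # Drop the pivot row and pivot column from the working set.
        for j in pivot_row:
            del work[(pi, j)]
        for i in pivot_col:
            if i != pi:
                del work[(i, pj)]
        # Rank-1 update; only (i, j) pairs hit by both nonzeros change.
        for i, ci in pivot_col.items():
            if i == pi:
                continue
            for j, rj in pivot_row.items():
                if j == pj:
                    continue
                new = work.get((i, j), 0.0) - ci * rj / pivot
                if abs(new) > tol:
                    work[(i, j)] = new
                else:
                    work.pop((i, j), None)
    return rows, cols

# Sparse 6x6 matrix of rank 3, built from three sparse outer products.
factors = [
    ({0: 1.0, 1: 2.0}, {0: 1.0, 2: 1.0}),
    ({2: 1.0, 3: 1.0}, {1: 3.0}),
    ({4: 1.0, 5: -1.0}, {3: 1.0, 4: 1.0, 5: 1.0}),
]
entries = {}
for u, v in factors:
    for i, ui in u.items():
        for j, vj in v.items():
            entries[(i, j)] = entries.get((i, j), 0.0) + ui * vj

dense = np.zeros((6, 6))
for (i, j), value in entries.items():
    dense[i, j] = value

rows, cols = prrlu_skeleton(entries, max_rank=6)

# Skeleton factors are actual columns and rows of the input,
# so their zero pattern is inherited from it.
col_skeleton = dense[:, cols]
row_skeleton = dense[rows, :]
pivot_block = dense[np.ix_(rows, cols)]
approx = col_skeleton @ np.linalg.solve(pivot_block, row_skeleton)

print("pivot rows:", rows, "pivot cols:", cols)
print("max abs error:", np.max(np.abs(dense - approx)))
```

Because the factors are selected rows and columns rather than linear combinations of them, no new nonzeros appear in the skeletons. Fill-in can still occur in the working Schur complement during elimination.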

Results

Quality: Lower-density cores than TT-SVD/TT-cross; more stable and memory-efficient.
Performance: CPU 2.5–105× faster, GPU up to 728× speedup on H100.

Questions

Problem