============ Day 1 (Sep 12) ============
1. Sparse Tensor Train Decomposition (Zhaonan Meng)
Problem
- Standard TT decomposition (e.g., TT-SVD) gives low-rank approximations but cannot preserve sparsity in TT-cores, even if the input tensor is very sparse.
- Leads to dense cores → memory blowup, inefficiency, not suitable for large-scale sparse tensors.
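The densification described above can be reproduced with a small NumPy sketch. It assumes a random sparse 3-way tensor and a plain TT-SVD; the helper names tt_svd, tt_to_full, and density are illustrative and not taken from the talk.

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    """Plain TT-SVD: sequential truncated SVDs of the unfoldings."""
    dims = tensor.shape
    cores, rank, mat = [], 1, tensor
    for k in range(len(dims) - 1):
        mat = mat.reshape(rank * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = max(1, int(np.sum(s > tol * s[0])))
        # Left singular vectors become the k-th core.
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Carry the remainder (singular values times V^T) to the next step.
        mat = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the TT-cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=1)
    return full.reshape([c.shape[1] for c in cores])

def density(a, eps=1e-12):
    """Fraction of entries with magnitude above eps."""
    return np.count_nonzero(np.abs(a) > eps) / a.size

# Assumed toy input: 25 nonzeros in an 8 x 8 x 8 tensor.
rng = np.random.default_rng(0)
shape = (8, 8, 8)
tensor = np.zeros(shape)
idx = rng.choice(tensor.size, size=25, replace=False)
tensor.flat[idx] = rng.standard_normal(25)

cores = tt_svd(tensor)
print("input density:", density(tensor))
for k, core in enumerate(cores):
    print(f"core {k}: shape {core.shape}, density {density(core):.3f}")
```

The orthonormal factors produced by the SVD have no reason to inherit the zero pattern of the input, which is why the cores come out denser than the tensor they represent.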
Solution
- TT-ID: Tensor-Train Interpolative Decomposition
- Uses Partial Rank-Revealing LU (PRRLU) to enforce inheritance of sparsity.
- Maintains sparsity in TT-cores by selecting representative rows/columns (skeletons).
- STTID: High-performance implementation of TT-ID
- Sparse PRRLU in COO format.
- Optimizations: Selective Data Separation, Hash Table Gaussian Elimination.
- GPU acceleration: cuBLAS for pivoting, cuCollections hash tables, kernel fusion.
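The skeleton idea can be sketched on a toy matrix. This is not the STTID implementation: it assumes a Python dict keyed by (row, col) as a stand-in for the hash table, full pivoting on the largest remaining entry, and a hypothetical function name prrlu_skeleton.

```python
import numpy as np

def prrlu_skeleton(entries, tol=1e-12, max_rank=None):
    """Full-pivoting elimination on a dict-of-keys matrix.

    Returns the pivot rows and columns (the skeleton). The dict plays the
    role of a hash table: only stored nonzeros are ever visited or updated.
    """
    work = dict(entries)  # residual (Schur complement)
    rows, cols = [], []
    while work and (max_rank is None or len(rows) < max_rank):
        (pi, pj), pivot = max(work.items(), key=lambda kv: abs(kv[1]))
        if abs(pivot) <= tol:
            break
        rows.append(pi)
        cols.append(pj)
        pivot_row = {j: v for (i, j), v in work.items() if i == pi and j != pj}
        pivot_col = {i: v for (i, j), v in work.items() if j == pj and i != pi}
        # Drop the pivot row and column from the residual.
        for key in [k for k in work if k[0] == pi or k[1] == pj]:
            del work[key]
        # Rank-1 update restricted to rows/columns that hold a nonzero.
        for i, ci in pivot_col.items():
            factor = ci / pivot
            for j, rj in pivot_row.items():
                new = work.get((i, j), 0.0) - factor * rj
                if abs(new) > tol:
                    work[(i, j)] = new
                else:
                    work.pop((i, j), None)
    return rows, cols

# Assumed toy input: a sparse 6 x 6 matrix of rank 2.
dense = np.zeros((6, 6))
dense[0:3, 0:3] = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 2.0])
dense[3:5, 4:6] = np.outer([1.0, -1.0], [4.0, 5.0])
entries = {(int(i), int(j)): float(dense[i, j]) for i, j in zip(*np.nonzero(dense))}

rows, cols = prrlu_skeleton(entries)
col_factor = dense[:, cols]              # selected columns of the input
cross = dense[np.ix_(rows, cols)]        # pivot block
recon = col_factor @ np.linalg.solve(cross, dense[rows, :])
print("skeleton rows:", rows, "cols:", cols)
print("nonzeros in input:", np.count_nonzero(dense),
      "in column factor:", np.count_nonzero(col_factor))
```

Because the column factor is a subset of the original columns, it cannot contain more nonzeros than the input, which is the sense in which an interpolative factorization inherits sparsity.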
Results
- Quality: Lower density cores than TT-SVD/TT-cross, more stable & memory efficient.
- Performance: CPU 2.5–105× faster, GPU up to 728× speedup on H100.
Questions
- Dong Li:
- slide 16, phase 2 and phase 3 → do they happen in parallel?
- Z: no, there is some dependency
- Does Gaussian elimination happen on the GPU?
- Z: yes
- Does this introduce load imbalance?
- Z: Possibly; this was not investigated.
- Z: Updates are applied to some rows but not to others.
- Nikos:
- With TT decomposition, the ordering of the TT-cores is crucial. Do you take different orderings into account?
- Z: No, only the left-to-right order is used; other orderings of the TT-cores are not considered.
- Z: Identifying the optimal order for the TT decomposition was not a focus.
- can be future work → Explore different mode orderings in tensor train decomposition
- N: how does sparsity get affected by changing the order?
- Z: The order can also influence the rank and affect the sparsity → good future work direction
- Karl: Is int64 being used?
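On the mode-ordering question raised by Nikos, a small sketch shows how permuting modes changes the TT ranks. It relies on the fact that the k-th TT rank of an exact decomposition equals the rank of the k-th unfolding; the tensor and the helper name tt_ranks are assumptions chosen for illustration.

```python
import numpy as np

def tt_ranks(tensor):
    """Ranks of the sequential unfoldings, i.e. the exact TT ranks."""
    dims = tensor.shape
    ranks = []
    for k in range(1, len(dims)):
        unfolding = tensor.reshape(int(np.prod(dims[:k])), -1)
        ranks.append(int(np.linalg.matrix_rank(unfolding)))
    return ranks

# Assumed toy tensor: T[i, j, k] = I[i, k] * g[j], so modes i and k are
# strongly coupled while mode j is separable.
g = np.array([1.0, 2.0, 3.0, 4.0])
tensor = np.einsum("ik,j->ijk", np.eye(4), g)

# Same data with the coupled modes placed next to each other: (i, k, j).
permuted = tensor.transpose(0, 2, 1)

print("order (i, j, k):", tt_ranks(tensor))    # [4, 4]
print("order (i, k, j):", tt_ranks(permuted))  # [4, 1]
```

Placing the coupled modes i and k next to each other drops the second rank from 4 to 1, so the ordering changes the core sizes and therefore how many entries the cores can hold.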
2. Accelerating Sparse Matrix Multiplication on Tenstorrent (Rahmy Salman)
Problem
- Sparse matrix multiplication (SpMM) is critical.