Batched Least Squares of Tall Skinny Matrices on GPUs
Better GPU performance is possible when many independent least squares problems are batched and solved together.
Such batching is often possible, e.g. in XVA applications in finance. nAG has developed highly optimized GPU code that allows matrices of different sizes within the same batch.
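
To make the batching idea concrete, here is a minimal sketch in Python with NumPy. It runs on the CPU and is not nAG's GPU implementation. The function name `batched_lstsq` is hypothetical, and the sketch assumes that every matrix has full column rank and the same number of columns, so only the row counts differ. Shorter problems are padded with zero rows, which leaves each least squares solution unchanged, and all problems are then solved with one batched QR factorization.

```python
import numpy as np

def batched_lstsq(mats, rhs):
    """Solve min ||A_i x - b_i||_2 for every pair (A_i, b_i) in one batched call.

    Assumption: all A_i have the same column count n and full column rank;
    their row counts may differ.
    """
    m_max = max(A.shape[0] for A in mats)
    n = mats[0].shape[1]
    # Zero rows add nothing to the residual, so padding keeps each solution intact.
    A_pad = np.zeros((len(mats), m_max, n))
    b_pad = np.zeros((len(mats), m_max, 1))
    for i, (A, b) in enumerate(zip(mats, rhs)):
        A_pad[i, :A.shape[0], :] = A
        b_pad[i, :b.shape[0], 0] = b
    # Reduced QR on the whole stack: Q is (batch, m_max, n), R is (batch, n, n).
    Q, R = np.linalg.qr(A_pad)
    Qtb = np.swapaxes(Q, -1, -2) @ b_pad
    # Solve R x = Q^T b for every problem in the batch.
    return np.linalg.solve(R, Qtb)[..., 0]

# Three tall skinny problems with different row counts and 4 columns each.
rng = np.random.default_rng(0)
mats = [rng.standard_normal((m, 4)) for m in (200, 350, 120)]
rhs = [rng.standard_normal(A.shape[0]) for A in mats]
X = batched_lstsq(mats, rhs)  # one solution vector per row of X
```

Zero padding is only one simple way to handle mixed sizes, and it spends work on the padded rows. It is shown here because it keeps the example short, not because it is how an optimized GPU code would do it.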
Tags: GPU, Least Squares, NAG Library