Sparse versus dense in GraphBLAS: sometimes dense is better

GraphBLAS is fast, and getting faster.  I'm currently working on an OpenMP implementation, with support from Intel, and I'm starting on a CUDA implementation with support from with NVIDIA.

On my 24-core Intel Xeon server, I'm seeing a 15x speedup on sparse matrix-matrix multiply, for example.

All of this will speed up your application without the for you to do anything, but there are cases where you can modify your application to allow SuiteSparse:GraphBLAS to be still faster.  Here are some important cases.

Using a dense vector when appropriate:  a basic breadth-first-search makes incremental additions to a vector containing the level of the nodes visited.  In a simple approach, this vector v starts out sparse, and stays sparse.   SuiteSparse:GraphBLAS has a fast incremental update to its matrices and vectors, but theses updates must be finished if the vector is then used as an input to a method that requires the vector to be finalized.  That's the case for the BFS.

So what's happening internally, is that a few entries get added to v, and then the list of entries is sorted, in O(nnz(v)) time, inside the loop.  If the diameter of the graph is large (say O(n)) then this forces the total time of the BFS to be O(n^2).  Ouch.

Yet this vector v is expected to become dense, if the whole graph is visited.  So the simple solution is to construct v as a dense vector from the beginning.  SuiteSparse:GraphBLAS will detect that all entries are present, and so updates take O(1) time.  The LAGraph_bfs_simple uses this approach (see the code here).

A simple one-line assignment sets all of v to explicit zero values:

GrB_assign (v, NULL, NULL, 0, GrB_ALL, n, NULL)

An alternative is to let v start sparse (in case it ends up sparse), and convert it only when it starts to get too many entries. LAGraph_bfs_pushpull takes this approach (see the code here). The vector v is used as its own mask. Entries that do not appear in v are set to explicit zeros:

GrB_Descriptor_set (desc, GrB_MASK, GrB_SCMP) ;
GrB_assign (v, v, NULL, GrB_ALL, n, desc) ;

While not essential, in case v is updated again, it would be fastest to finalize the computations of v as well:

GrB_Matrix_nvals (&ignore, v)

That statement returns the number of entries in v (which are ignored), but it also has the side effect of forcing completion of any pending updates. If you know that v is dense, then it's best to force completion so that SuiteSparse:GraphBLAS can then deal with a truly dense vector v internally.

GraphBLAS does not have explicit dense vectors and matrices.  The job of finding the best data structure is up to the library implementor (me, for SuiteSparse:GraphBLAS).  It can be a difficult task to select automatically, but it would be possible to do this.  See my next blog post, My Friendly Zombie.

Leave a Reply

Your email address will not be published. Required fields are marked *