Performance of the MATLAB interface to SuiteSparse:GraphBLAS.

The MATLAB interface is getting polished, and most of the functionality is in place. I just posted a draft of SuiteSparse:GraphBLAS v3.1.0-Aug26, with these features. In the meantime, here are some exciting performance results.

C(I,J)=A can be 1000x faster than MATLAB

For the MATLAB statement C(I,J)=A, I can do the same syntax with a GraphBLAS matrix (a 'gb' object in MATLAB). On my 20-core DGX Station, MATLAB takes 0.5 seconds and GraphBLAS 0.23 seconds when the matrix C has dimension 4 million.

OK, a speedup of 2x is nice. But wait ...

When C is 9e6-by-9e6, MATLAB takes 238 seconds (about 4 minutes), and graphBLAS takes 0.26 seconds; so GraphBLAS is 915x faster. This is the very same syntax: either C(I,J)=A with MATLAB sparse matrices, or the very same statement but with gb objects. The 0.26 seconds for GraphbLAS even includes the time to convert C back to a MATLAB sparse matrix, for good measure (but that would be option in practice).

Sparse Deep Neural Network: 60x faster than MATLAB

On the same machine, the Sparse Deep Neural Network in pure MATLAB (dnn_matlab.m) takes 15,472 seconds (4.3 hours), with 64K neurons and 120 layers. This is the reference implementation at http://graphchallenge.org, slightly editted for style.

function Y = dnn_matlab (W, bias, Y0)
Y = Y0 ;
for i=1:length(W)
    % Propagate through layer.
    Z = Y * W {i} ;
    % Apply bias to non-zero entries.
    Y = Z + (double(logical(Z)) .* bias {i}) ;
    % Threshold negative values.
    Y (Y < 0) = 0 ;
    % Threshold maximum values. 
    Y (Y > 32) = 32 ;
end

The pure C code in my HPEC'19 paper takes 251 seconds with 40 threads. The dnn_gb.m function, below, which is simpler than dnn_matlab.m, takes 254 seconds with 40 threads. So I'm only losing a little bit of time in the MATLAB interface.

function Y = dnn_gb (W, bias, Y0)
Y = Y0 ;
for i=1:length(W)
    % Propagate through layer, apply bias, and threshold negative values.
    Y = gb.select ('>0', gb.mxm ('+.+', Y * W {i}, bias {i})) ;
    % Threshold maximum values.
    M = Y > 32 ;
    if (nnz (M) > 0)
        Y (M) = 32 ;
    end
end

(the test for nnz(M) could also be added to the dnn_matlab.m; it saves only a small bit of time though).

Logical Indexing can be 90,000x faster than MATLAB

MATLAB has a logical indexing that acts much like the mask in GraphBLAS, C(M)=A(M).  That statement assigns entries from A to C where M(i,j) is true. With its MATLAB interface, GraphBLAS can use the same syntax, just a lot faster.  First, some MATLAB sparse matrices:

n = 4000 ;
C = sprand (n, n, 0.1) ;
A = 100 * sprand (n, n, 0.1) ;
M = (C > 0.5) ;

That takes just under a second to setup the problem.  The matrices can then be converted into gb matrices, assigned, and the result converted back, as follows. The time taken is 0.08 seconds on my 4-core Dell laptop, using GraphBLAS. I could have added in the time to create A2 and C2, but that would be optional if all your matrices were already gb objects. I did add the time to convert the result from a gb object C2 to a MATLAB sparse matrix C2, just for good measure (but again, that would be optional if all your work is in gb matrices anyway).

A2 = gb (A) ;
C2 = gb (C) ;
tic
C2 (M) = A2 (M) ;
C2 = double (C2) ;
toc ;

In comparsion, the MATLAB statement below takes 568 seconds.

C (M) = A (M) ;

GraphBLAS computes the same result as MATLAB, using the exact same syntax, but is 6,773x faster on my laptop. On my DGX station, with 20 cores, the difference is around 20,000x. If instead of the nice C(M)=A(M) syntax, you were willing to use the GraphBLAS function:

C = gb.assign (C, M, A) ;

which computes C(M)=A(M), it cuts the time by 4x, and on my NVIDIA DGX Station, this statement in MATLAB is about 90,000x faster when using gb matrix objects as compared to C(M)=A(M) MATLAB sparse matrices.

One caveat though: my Dell is running MATLAB R2019a and my DGX Station is using MATLAB R2018a.

Try these results yourself

These are crazy results ... so you shouldn't just trust them at face value. Run the code yourself on your machine. It's easy.

Download the Aug26 draft of v3.1.0 on my web page (http://faculty.cse.tamu.edu/davis/GraphBLAS_archive/ and go to the bottom of the page). Just install the GraphBLAS library itself, then the MATLAB interface. Then run the gbdemo.m and gbdemo2.m in GraphBLAS/GraphBLAS/demo. For the sparse deep neural network, a small example is tested in gbdemo.m. For the larger problems, like the 64K-by-120 layer one used above, you'll need to download the problems from http://graphchallenge.org and use dnn_mat2gb.m to convert them to GraphBLAS matrices.

Leave a Reply

Your email address will not be published. Required fields are marked *