The design for the MATLAB interface to SuiteSparse:GraphBLAS starts with the decision to construct a MATLAB struct to hold the GrB_Matrix. The contents of this struct match my internal data structure exactly.
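For illustration only, here is roughly what such a struct could look like for a matrix held in standard compressed-sparse-column (CSC) form. The field names below are hypothetical, not the actual internal names:

    % hypothetical sketch of the struct contents (CSC format):
    G.type  = 'double' ;   % GraphBLAS type of the entries
    G.nrows = m ;          % number of rows
    G.ncols = n ;          % number of columns
    G.p     = colptr ;     % column "pointers" (offsets), length n+1
    G.i     = rowidx ;     % row indices, length nvals
    G.x     = values ;     % numerical values, length nvals

Because the struct fields hold the internal arrays directly, no data needs to be copied or translated when moving a matrix across the interface.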

Here is a list of functions that will be available. First, for GrB_Matrix_new, and to construct a GraphBLAS copy of a MATLAB sparse matrix:

    A = gb ;                % empty 1-by-1 GraphBLAS double matrix
    A = gb (X) ;            % GraphBLAS copy of a MATLAB sparse X, same type
    A = gb (type) ;         % empty 1-by-1 GraphBLAS matrix of the given type
    A = gb (X, type) ;      % GraphBLAS typecasted copy of a MATLAB sparse X
    A = gb (m, n) ;         % empty m-by-n GraphBLAS double matrix
    A = gb (m, n, type) ;   % empty m-by-n GraphBLAS matrix of the given type

where type is a string: 'double', 'single', 'int8', ..., 'uint64', or 'complex'. A is the MATLAB struct that holds the GraphBLAS sparse matrix. Yes, I could use MATLAB objects, but I don't need any operator overloading (yet...). Here are the operations; most are self-explanatory. This is a draft list.

    Cout = gbadd (Cin, M, accum, semiring, A, B, desc)
    Cout = gbapply (Cin, M, accum, op, A, desc)
    Cout = gbassign (Cin, M, accum, A, I, J, desc)
    A = gbbuild (I, J, X, m, n, dup, type)
    gbclear
    Cout = gbemult (Cin, M, accum, semiring, A, B, desc)
    Cout = gbextract (Cin, M, accum, A, I, J, desc)
    [I J X] = gbfind (A, onebased)
    A = gbget (arg1, arg2, arg3)
    Cout = gbkron (Cin, M, accum, op, A, B, desc)
    Cout = gbmxm (Cin, M, accum, semiring, A, B, desc)
    nvals = gbnvals (X)
    ok = gbprint (C, level)
    Cout = gbreduce (Cin, M, accum, op, A, desc)
    Cout = gbresize (Cin, m, n)
    Cout = gbselect (Cin, M, accum, op, A, thunk, desc)
    A = gbset (arg1, arg2, arg3) ;   % sets global and matrix options
    [m n] = gbsize (X)
    A = gbsparse (X)
    Cout = gbsubassign (Cin, M, accum, A, I, J, desc)
    nthreads = gbthreads (nthreads)
    Cout = gbtranspose (Cin, M, accum, A, desc)
    type = gbtype (X)

I can thus hold a GrB_Matrix of any type this way, and efficiently import/export the struct into/out of GraphBLAS in constant time. The inputs to the above functions can be GraphBLAS sparse matrices or MATLAB sparse matrices, interchangeably. The output is a GraphBLAS sparse matrix, except for gbsparse, which converts a GraphBLAS sparse matrix into a MATLAB sparse matrix, typecasting if need be.

For comparison, here is the sparse DNN MATLAB reference implementation, from http://graphchallenge.mit.edu/challenges , slightly edited for style:

    function Y = inferenceReLUvec (W, bias, Y0)
    % Performs ReLU inference using input feature vector(s) Y0, DNN weights W,
    % and bias vectors.
    Y = Y0 ;                        % Initialize feature vectors.
    for i = 1:length (W)            % Loop through each weight layer W{i}
        % Propagate through layer.
        Z = Y * W {i} ;
        b = bias {i} ;
        % Apply bias to non-zero entries:
        Y = Z + (double (logical (Z)) .* b) ;
        Y (Y < 0) = 0 ;             % Threshold negative values.
        Y (Y > 32) = 32 ;           % Threshold maximum values.
    end

Here's what the sparse DNN looks like in this (draft) MATLAB interface. If W, bias, and Y0 are MATLAB sparse matrices, then:

    % convert W and Y0 to GraphBLAS sparse single-precision matrices
    Y = gb (Y0, 'single') ;
    n = length (bias {1}) ;
    for i = 1:length (W)
        W {i} = gb (W {i}, 'single') ;
        B {i} = gbbuild (1:n, 1:n, bias {i}, n, n, '+', 'single') ;
    end

The above step can be skipped if the problem is constructed as GraphBLAS sparse matrices in the first place. With the whole problem stored in GraphBLAS format, inferenceReLUvec becomes:

    function Y = inferenceReLUvec (W, bias, Y0)
    % Performs ReLU inference using input feature vector(s) Y0, DNN weights W,
    % and bias vectors.
    Y = Y0 ;                        % Initialize feature vectors.
    for i = 1:length (W)            % Loop through each weight layer W{i}
        % Propagate through layer.
        Y = gbmxm ([ ], [ ], [ ], '+.*', Y, W {i}) ;
        % apply bias
        Y = gbmxm ([ ], [ ], [ ], '+.+', Y, B {i}) ;
        % apply ReLU
        Y = gbselect ([ ], [ ], [ ], '>0', Y) ;
        % apply threshold via user-defined function.  But I can't create a
        % user-defined operator that can be called from the MATLAB interface yet.
        % Y = gbapply ([ ], [ ], [ ], 'user defined ymax(x) = min(x,32)', Y) ;
        % So gbselect is used to construct a mask M so that M(i,j)=1 if
        % Y(i,j)>32, then gbassign is used with a mask to set those values to 32:
        M = gbselect ([ ], [ ], [ ], '>thunk', Y, 32) ;
        Y = gbassign (Y, M, [ ], 32) ;
    end

The semirings and operators are determined by MATLAB strings. A semiring has the format 'add.mult.type', where the type is optional. The type defaults to the type of the input matrices (say, either A or B for GrB_mxm; I have to decide this).
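For example, using the draft full-form syntax (the '+.*' and '+.+' spellings appear in the code above; the 'max.+' spelling is my illustrative guess and the exact strings are still subject to change):

    Cout = gbmxm ([ ], [ ], [ ], '+.*', A, B) ;         % plus-times; type from the inputs
    Cout = gbmxm ([ ], [ ], [ ], '+.*.single', A, B) ;  % plus-times over GrB_FP32
    Cout = gbmxm ([ ], [ ], [ ], 'max.+', A, B) ;       % max-plus (tropical) semiring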

Note that the format of most GraphBLAS operators is like this:

    Cout = gbmxm (Cin, Mask, accum, semiring, A, B, descriptor) ;

where if Cin is empty, it becomes an implied empty matrix of the right size and type on input to the GraphBLAS call:

    info = GrB_mxm (C, Mask, accum, semiring, A, B, descriptor) ;

The GraphBLAS call modifies its input/output argument C. But a MATLAB mexFunction is not supposed to modify its inputs (well, it can if you're careful, but doing so breaks the MATLAB API, so any such design would be fragile).

This constraint has deeper consequences for the efficiency of what you can do in MATLAB with GraphBLAS. I can't exploit non-blocking mode, so setElement and long sequences of tiny GrB_assign's will be slower than they would be in the native C API.

Variable argument lists are easy to handle in MATLAB, so as a shorthand the descriptor can be left off. I can also detect the position of the string that defines the operator; that position tells me that Cin, Mask, and accum do not appear. This is easy to do, so the above code becomes simpler:
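The detection could work roughly like the sketch below, written here as MATLAB for clarity (the real dispatch would live in the C mexFunction; this is only one simple rule, not a settled design):

    % inside gbmxm (sketch): dispatch on where the operator string appears
    if (ischar (arg1))
        % gbmxm (semiring, A, B [, desc]) :
        % Cin, Mask, and accum are all implied empty
    else
        % gbmxm (Cin, Mask, accum, semiring, A, B [, desc]) : full form
    end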

    function Y = inferenceReLUvec (W, bias, Y0)
    % Performs ReLU inference using input feature vector(s) Y0, DNN weights W,
    % and bias vectors.
    Y = Y0 ;                        % Initialize feature vectors.
    for i = 1:length (W)            % Loop through each weight layer W{i}
        % Propagate through layer.
        Y = gbmxm ('+.*', Y, W {i}) ;
        % apply bias
        Y = gbmxm ('+.+', Y, B {i}) ;
        % apply ReLU
        Y = gbselect ('>0', Y) ;
        % apply threshold via user-defined function.  But I can't create a
        % user-defined operator that can be called from the MATLAB interface
        % ... yet.
        % Y = gbapply ([ ], [ ], [ ], 'user defined ymax(x) = min(x,32)', Y) ;
        % So gbselect is used to construct a mask M, then gbassign to set
        % those entries to 32:
        M = gbselect ('>thunk', Y, 32) ;
        Y = gbassign (Y, M, [ ], 32) ;  % note that I and J default to ':'
    end

and if you like MATLAB one-liners, the inner-most loop can be written in just 2 lines of MATLAB. That's terser than the MATLAB reference implementation, where applying the bias must be done in a convoluted manner in pure MATLAB.

    function Y = inferenceReLUvec (W, bias, Y0)
    Y = Y0 ;
    for i = 1:length (W)
        Y = gbselect ('>0', gbmxm ('+.+', gbmxm ('+.*', Y, W {i}), B {i})) ;
        Y = gbassign (Y, gbselect ('>thunk', Y, 32), [ ], 32) ;
    end

Cool. The code above should be as fast, or almost as fast, as the native C API, because for this particular use of GraphBLAS the C API gets no speedup from my non-blocking exploit anyway. The MATLAB interface can't exploit it, and the LAGraph_dnn C function ( https://github.com/GraphBLAS/LAGraph/blob/master/Source/Algorithm/LAGraph_dnn.c ) doesn't either, so the performance will be about the same. I think even the number of matrix copies that need to be made will be the same.

Which is to say, on the large Sparse Deep Neural Network Challenge, it will be perhaps 50x to 100x faster than the MATLAB reference implementation on my 20-core Intel-Xeon NVIDIA DGX workstation (without using the GPUs ... yet).

The above code returns Y as a GraphBLAS sparse matrix. If you want Y back as a MATLAB sparse matrix, at the end, just do

    Y = gbsparse (Y) ;

That will also typecast Y from its GrB_FP32 type ('single') back to MATLAB's sparse double type. MATLAB supports only sparse logical, double, and double complex matrices. GraphBLAS supports all of those, plus sparse single and any integer type.

This is work in progress. I have the gb and gbbuild functions implemented, and their performance is no worse than the native C API. That means gbbuild is a bit faster than A = sparse (I,J,X) in MATLAB R2018b on my 2013 dual-core MacBook Pro (about the same if I,J are unsorted, 3x faster if they are sorted, and about 1.5x faster when I use both cores). On my new Dell XPS 13 laptop (Ubuntu Linux 18.04, gcc 7.3, Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz), gbbuild is 5x faster than the MATLAB sparse function. These are not estimates, but real timings.

So I expect that the other gb* functions will also be just as fast as the native C API. The only thing that will be slower is a sequence of GrB_setElement or GrB_assign operations. In the C API, I leave the matrix with pending work in this case (using zombies and pending tuples), and only at the end, when the matrix is needed for something else, do I kill all the zombies and assemble the pending tuples. This is not yet possible in a MATLAB interface, since the exploit requires that I modify an input matrix, and making a copy defeats the whole purpose. A single update can be just O(1) in size and take O(1) time, even when it modifies a matrix with O(nvals(A)) entries. But the MATLAB interface forces me to make a copy, thus limiting my exploitation of GraphBLAS non-blocking mode.
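To make the cost concrete, here is a sketch using the draft functions listed above (the quadratic cost estimate assumes each call copies its input, as the MATLAB API requires):

    % slow in the MATLAB interface: each gbassign must copy C,
    % so k scalar updates cost roughly O(k * nvals(C)) overall
    for t = 1:k
        C = gbassign (C, [ ], [ ], x (t), i (t), j (t)) ;
    end

    % fast: collect the tuples and build the matrix in one call
    C = gbbuild (i, j, x, m, n, '+', 'double') ;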

If I do figure out a stable and portable way to break the MATLAB convention, and do this in MATLAB:

    gbsetelement (C, i, j, x) ;   % do C(i,j)=x but do not finish the work

Then it would be just as fast as the native C API. The above statement is not possible as a MATLAB m-file, of course. A MATLAB mexFunction *can* (but is not supposed to) modify its input arguments. So far, when I try, I get a segfault. So for now, I won't write the above function.

In the future, I do hope to create an object-oriented wrapper for these functions. The difficulty will be determining which semiring to use. I can write C=A*B in MATLAB for objects A, B, and C, and the '*' operator is captured by the 'mtimes' method. But there is no way to specify the semiring to use in the MATLAB expression C=A*B.
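For instance, if gb were a classdef object, a default semiring could be hard-wired into mtimes, say conventional plus-times (a sketch only; this class does not exist yet):

    classdef gb
        methods
            function C = mtimes (A, B)
                % C = A*B using the conventional '+.*' semiring by default
                C = gbmxm ([ ], [ ], [ ], '+.*', A, B) ;
            end
        end
    end

Any non-default semiring would then have to come from somewhere outside the expression itself, perhaps a global or per-matrix option (gbset in the list above could serve that purpose), and that is the design question still to be settled.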