Sparse matrices


Note that you want SciPy >= 0.7.2


In SciPy 0.6, has a bug with singleton dimensions. There may be more bugs. It also has inconsistent implementation of sparse matrices.

We do not test against SciPy versions below 0.7.2.

We describe the details of the compressed sparse matrix types.
should be used if there are more rows than column (shape[0] > shape[1]).
should be used if there are more columns than rows (shape[0] < shape[1]).
is faster if we are modifying the array. After initial inserts, we can then convert to the appropriate sparse matrix format.
The following types also exist:
Dictionary of Keys format. From their doc: This is an efficient structure for constructing sparse matrices incrementally.
Coordinate format. From their lil doc: consider using the COO format when constructing large matrices.
There seems to be a new format planned for scipy 0.7.x:
Block Compressed Row (BSR). From their doc: The Block Compressed Row (BSR) format is very similar to the Compressed Sparse Row (CSR) format. BSR is appropriate for sparse matrices with dense sub matrices like the last example below. Block matrices often arise in vector-valued finite element discretizations. In such cases, BSR is considerably more efficient than CSR and CSC for many sparse arithmetic operations.
Sparse matrix with DIAgonal storage

There are four member variables that comprise a compressed matrix sp (for at least csc, csr and bsr):

gives the shape of the matrix.
gives the values of the non-zero entries. For CSC, these should be in order from (I think, not sure) reading down in columns, starting at the leftmost column until we reach the rightmost column.
gives the location of the non-zero entry. For CSC, this is the row location.
gives the other location of the non-zero entry. For CSC, there are as many values of indptr as there are columns + 1 in the matrix. sp.indptr[k] = x and indptr[k+1] = y means that column k contains[x:y], i.e. the xth through the y-1th non-zero values.

See the example below for details.

>>> import scipy.sparse
>>> sp = scipy.sparse.csc_matrix((5, 10))
>>> sp[4, 0] = 20
/u/lisa/local/byhost/ SparseEfficiencyWarning: changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
>>> sp[0, 0] = 10
>>> sp[2, 3] = 30
>>> sp.todense()
matrix([[ 10.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,  30.,   0.,   0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
        [ 20.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.]])
>>> print sp
  (0, 0)        10.0
  (4, 0)        20.0
  (2, 3)        30.0
>>> sp.shape
(5, 10)
array([ 10.,  20.,  30.])
>>> sp.indices
array([0, 4, 2], dtype=int32)
>>> sp.indptr
array([0, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3], dtype=int32)

Several things should be learned from the above example:

  • We actually use the wrong sparse matrix type. In fact, it is the rows that are sparse, not the columns. So, it would have been better to use sp = scipy.sparse.csr_matrix((5, 10)).
  • We should have actually created the matrix as a lil_matrix, which is more efficient for inserts. Afterwards, we should convert to the appropriate compressed format.
  • sp.indptr[0] = 0 and sp.indptr[1] = 2, which means that column 0 contains[0:2], i.e. the first two non-zero values.
  • sp.indptr[3] = 2 and sp.indptr[4] = 3, which means that column 3 contains[2:3], i.e. the third non-zero value.

TODO: Rewrite this documentation to do things in a smarter way.


For faster sparse code:
  • Construction: lil_format is fast for many inserts.
  • Operators: “Since conversions to and from the COO format are quite fast, you can use this approach to efficiently implement lots computations on sparse matrices.” (Nathan Bell on scipy mailing list)


The sparse equivalent of dmatrix is csc_matrix and csr_matrix.

Dot vs. StructuredDot

Often when you use a sparse matrix it is because there is a meaning to the structure of non-zeros. The gradient on terms outside that structure has no meaning, so it is computationally efficient not to compute them.

StructuredDot is when you want the gradient to have zeroes corresponding to the sparse entries in the matrix.

TrueDot and Structured dot have different gradients but their perform functions should be the same.

The gradient of TrueDot can have non-zeros where the sparse matrix had zeros. The gradient of StructuredDot can’t.

Suppose you have dot(x,w) where x and w are square matrices. If w is dense, like randn((5,5)) and x is of full rank (though potentially sparse, like a diagonal matrix of 1s) then the output will be dense too. (But i guess the density of the output is a red herring.) What’s important is the density of the gradient on the output. If the gradient on the output is dense, and w is dense (as we said it was) then the True gradient on x will be dense. If our dot is a TrueDot, then it will say that the gradient on x is dense. If our dot is a StructuredDot, then it will say the gradient on x is only defined on the diagonal and ignore the gradients on the off-diagonal.