In our problem we end up with the following:

If we assume that N is proportional to the identity matrix then it factors out as a constant factor. This means that we end up with the following for our maximum likelihood:

This is a traditional “inverse” problem. There are many ways to solve this, include “Least squares” but also using the “pseudo-inverse” of gamma. It’s useful to think about the latter for a moment because it can help us characterize the situation quite well. It’ll be easier to start by making the RHS a square matrix.

We can also note that if you take the derivative of this with respect to S you end up with the inverse of the covariance matrix:

Note that we can get an estimate of the model parameters, S, but that the observed cross-correlation data is actually product of some “true” set of model parameters convolved with our gamma matrix. That means we can get a (somewhat obvious) relationship between our estimated and “true” parameters:

If we can perfectly invert our matrix, then of course R is the identity matrix and we are perfectly estimating our parameters. In our case, the name of the game is estimating that inverse. The more parameters we have when we do it, the better we do it. However, the more parameters we have, the larger the covariance because our sky directions are all correlated!

We can use a pseudo-inverse to estimate that matrix. A pseudo-inverse uses a singular-value decomposition where, roughly, for a matrix M, you do something similar to diagonalizing You invert the diagonal matrix (which contains the “singular values”), and then multiply that inverted matrix by the matrices used in the diagonalization again. The singular values of a matrix M are given by the square root of the eigenvalues of Wikipedia has a good longer discussion.

Often, some of these singular values will be very small. We can keep them, and in bulk they’ll help give us resolution (better accurately estimate our model parameters), but we introduce extra covariance because our pixels are correlated. We have to make a trade-off between including some singular values and not including others. This becomes an optimization problem.

To start, we can look at the resolution matrix when we keep singular values above 10^{-4}. We see that some of our estimates will have contributions from the other pixels. We want to maximize the diagonals of this matrix! Ideally, they’d be 1.

For many different cut-off values for which singular values we keep, we can plot the diagonal above. We can also plot the diagonal of the covariance matrix. See those to plots below:

We want the left-hand plot to be large, while we want the right-hand plot to be small.

These are all for a single p-wave injection, but it’s important to remember that these matrices are independent of our data! However, we can see exactly these effects by looking at the maps. See the very end of this note.

In the end, we want to have the best of both worlds. If we can evaluate the “spread” of the resolution matrix, then I think we can choose an SVD cutoff such that the spread of that resolution matrix is roughly the diffraction limited spot size. This would be equivalent to what is done for the LIGO spherical harmonics search (except they worried about L_maxes for the sky instead of number of singular values…we could consider adding the size of our pixels into the mix as well and doing more optimization as well…).

We could also use the least-squares algorithm all the time instead of calculating a pseudo-inverse (which is what is done above), but if we’re going to do that we’d need to look in to how best to estimate the resolution and covariance matrices to try to characterize that method. I believe the python version of “lsqr” will estimate the covariance matrix for us, so we could potentially use that.

Cutoff: 10^{-1} (Good SNR, bad localization)

Cutoff: 10^{-10} (bad SNR, good localization)