
Faster medians using bottleneck

The bottleneck package provides very fast implementations of numpy aggregation functions such as median. Masked values can be handled by replacing them with numpy.nan, which bottleneck's NaN-aware functions (e.g., bottleneck.nanmedian) ignore.
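NaN-filling works because a NaN-aware median skips NaN entries in the same way that a masked median skips masked entries. The quick check below illustrates this. It uses numpy.nanmedian as a stand-in for bottleneck.nanmedian so it runs without bottleneck installed, and the random stack is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a stack of 7 small "images" with roughly 20% of pixels masked
values = rng.normal(loc=100.0, scale=5.0, size=(7, 4, 4))
mask = rng.random(values.shape) < 0.2
stack = np.ma.array(values, mask=mask)

# Reference: numpy's masked-array median along the stacking axis
ma_med = np.ma.median(stack, axis=0)

# NaN route: masked entries become NaN, and a NaN-aware median skips them
nan_med = np.nanmedian(stack.filled(np.nan), axis=0)

# The two routes should agree (fully masked pixels show up as NaN in both)
print(np.allclose(ma_med.filled(np.nan), nan_med, equal_nan=True))
```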

How much faster is bottleneck? On masked data, its median is roughly 1000x faster than numpy's masked-array median (numpy.ma.median).
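The actual speedup depends on array size, dtype and hardware, so it is worth measuring on your own data. The sketch below is a rough timeit harness that takes the median along the first (stacking) axis of a masked stack. The stack shape and masked fraction are arbitrary assumptions. The NaN-fill step is timed too, because a bottleneck-based median also has to pay for it. If bottleneck is not installed, the sketch falls back to numpy.nanmedian, and no large speedup should be expected in that case:

```python
import timeit

import numpy as np

try:
    import bottleneck as bn
    fast_nanmedian = bn.nanmedian
except ImportError:
    # Fallback so the sketch still runs; without bottleneck no large speedup is expected
    fast_nanmedian = np.nanmedian


def masked_median_timings(shape=(5, 200, 200), masked_fraction=0.1, number=3, seed=0):
    """Return (numpy.ma.median seconds, NaN-fill + nanmedian seconds) per call."""
    rng = np.random.default_rng(seed)
    data = rng.normal(size=shape)
    stack = np.ma.array(data, mask=rng.random(shape) < masked_fraction)

    # Time the masked-array median along the stacking axis
    t_ma = timeit.timeit(lambda: np.ma.median(stack, axis=0), number=number) / number
    # Time fill-with-NaN plus the NaN-aware median, as a bottleneck-based median would do
    t_fast = timeit.timeit(
        lambda: fast_nanmedian(stack.filled(np.nan), axis=0), number=number
    ) / number
    return t_ma, t_fast


t_ma, t_fast = masked_median_timings()
print(f"numpy.ma.median: {t_ma:.4f} s, NaN-fill + nanmedian: {t_fast:.4f} s, "
      f"ratio {t_ma / t_fast:.0f}x")
```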


The latest version of bottleneck works only with numpy 1.8.0 or later.

Installing bottleneck

This should be easy: pip install bottleneck will do the trick.
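After installing, a quick sanity check can confirm that bottleneck is importable and that numpy meets the 1.8.0 minimum mentioned above. bottleneck_usable is a hypothetical helper written for this page, not part of bottleneck or ccdproc. Newer bottleneck releases may require a newer numpy than this floor, and the check only encodes the version stated here:

```python
import importlib.util

import numpy as np
from numpy.lib import NumpyVersion


def bottleneck_usable(min_numpy="1.8.0"):
    """Hypothetical helper: True if bottleneck is importable and numpy is new enough."""
    # Compare the installed numpy version against the stated minimum
    numpy_ok = NumpyVersion(np.__version__) >= min_numpy
    # find_spec locates the package without importing it
    bottleneck_found = importlib.util.find_spec("bottleneck") is not None
    return numpy_ok and bottleneck_found


print(bottleneck_usable())
```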

Using bottleneck with ccdproc

To use bottleneck, we need to do three things:

  1. Fill any masked values in the data array with numpy.nan.
  2. Pass the filled data to a NaN-aware bottleneck function, e.g., bottleneck.nanmedian().
  3. Create a mask for the result by masking all numpy.nan values.
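Here is a small worked example of the three steps. One pixel is masked in every image so that step 3 has something to do. numpy.nanmedian stands in for bottleneck.nanmedian so the sketch runs without bottleneck, and the array values are made up for illustration:

```python
import warnings

import numpy as np

# Three images of 2x2 pixels (illustrative values)
data = np.array([
    [[1.0, 5.0], [3.0, 4.0]],
    [[2.0, 6.0], [9.0, 8.0]],
    [[7.0, 7.0], [5.0, 0.0]],
])
mask = np.zeros(data.shape, dtype=bool)
mask[:, 0, 1] = True   # pixel (0, 1) is masked in every image
mask[2, 1, 1] = True   # one masked value at pixel (1, 1)
stack = np.ma.array(data, mask=mask)

# Step 1: fill masked values with NaN
filled = stack.filled(np.nan)

# Step 2: NaN-aware median along the stacking axis (numpy stand-in for bottleneck.nanmedian)
with warnings.catch_warnings():
    # numpy warns about the all-NaN pixel; that NaN is exactly what step 3 relies on
    warnings.simplefilter("ignore", RuntimeWarning)
    med = np.nanmedian(filled, axis=0)

# Step 3: mask every NaN in the result
result = np.ma.array(med, mask=np.isnan(med))
print(result)
```

The fully masked pixel comes out of step 2 as NaN and ends up masked in the result, which matches how a masked-array median reports a pixel with no valid values.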

The function below can be used in place of a masked-array median function:

def bn_median(masked_array, axis=None):
    """
    Perform fast median on masked array.

    Parameters
    ----------
    masked_array : numpy.ma.MaskedArray
        Array of which to find the median.

    axis : int, optional
        Axis along which to perform the median. Default is to find the median of
        the flattened array.
    """
    import numpy as np
    import bottleneck as bn
    # replace masked entries with NaN so that bn.nanmedian ignores them
    data = masked_array.filled(fill_value=np.nan)
    med = bn.nanmedian(data, axis=axis)
    # construct a masked array result, setting the mask from any NaN entries
    return np.ma.array(med, mask=np.isnan(med))

To use this with sigma_clipping (assuming you have already done the necessary imports and created a Combiner named my_combiner):

>>> my_combiner.sigma_clipping(func=bn_median)  

To perform median_combine with bottleneck:

>>> my_combiner.median_combine(median_func=bn_median)  

Feedback on whether this should be incorporated directly into ccdproc would be appreciated.