Faster medians using bottleneck

The bottleneck package provides very fast implementations of numpy aggregation functions such as median. Masked data can be accommodated by replacing masked values with numpy.nan and using bottleneck's NaN-aware functions.

How much faster is bottleneck? The median on masked data is roughly 1000x faster than with numpy.ma.median.
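The actual speedup depends on array shape, mask fraction, and library versions, so it is worth measuring on your own data. The following is a minimal timing sketch, not a rigorous benchmark: the array size, mask fraction, and repeat count are arbitrary choices made for illustration, and the fallback to numpy.nanmedian when bottleneck is not installed is an assumption added only so the script still runs.

```python
import timeit

import numpy as np

try:
    import bottleneck as bn
except ImportError:
    # Assumption for this sketch: fall back to NumPy's NaN-aware median
    bn = None

# Synthetic stack of 20 small "images" with about 10% of pixels masked
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 50, 50))
mask = rng.random(data.shape) < 0.1
masked = np.ma.masked_array(data, mask=mask)


def numpy_masked_median():
    """Median along the stack axis using numpy's masked-array median."""
    return np.ma.median(masked, axis=0)


def nan_median():
    """Median along the stack axis after replacing masked values with NaN."""
    filled = masked.filled(np.nan)
    func = bn.nanmedian if bn is not None else np.nanmedian
    return func(filled, axis=0)


t_numpy = timeit.timeit(numpy_masked_median, number=5)
t_nan = timeit.timeit(nan_median, number=5)
print(f"numpy.ma.median: {t_numpy:.4f} s for 5 calls")
print(f"NaN-based median: {t_nan:.4f} s for 5 calls")
```

The two functions should return the same values wherever at least one input pixel is unmasked, so the comparison is like for like.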

Note

The latest version of bottleneck works only with numpy 1.8.0 or later.

Installing bottleneck

This should be easy: pip install bottleneck will do the trick.

Using bottleneck with `ccdproc`

To use bottleneck, we need to do three things:

1. Fill any masked values in the data array with numpy.nan.
2. Pass the data into, e.g., bottleneck.nanmedian().
3. Create a mask for the result by masking all numpy.nan values.

The function below can be used as a replacement for numpy.ma.median:

def bn_median(masked_array, axis=None):
    """
    Perform fast median on masked array

    Parameters
    ----------
    masked_array : `numpy.ma.masked_array`
        Array of which to find the median.

    axis : int, optional
        Axis along which to perform the median. Default is to find the median of
        the flattened array.
    """
    import numpy as np
    import bottleneck as bn
    # replace masked entries with NaN so that nanmedian ignores them
    data = masked_array.filled(fill_value=np.nan)
    med = bn.nanmedian(data, axis=axis)
    # construct a masked array result, setting the mask from any NaN entries
    return np.ma.array(med, mask=np.isnan(med))
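As a quick sanity check, the sketch below applies a compact copy of bn_median to a small hand-made masked array and computes numpy.ma.median on the same input for comparison. The sample values are invented for illustration, and the fallback to numpy.nanmedian when bottleneck is missing is an assumption of this sketch rather than part of the function above.

```python
import numpy as np

try:
    import bottleneck as bn
    nanmedian = bn.nanmedian
except ImportError:
    # Assumption for this sketch: use NumPy's NaN-aware median instead
    nanmedian = np.nanmedian


def bn_median(masked_array, axis=None):
    """Compact copy of the function above, repeated so the example runs alone."""
    data = masked_array.filled(fill_value=np.nan)
    med = nanmedian(data, axis=axis)
    return np.ma.array(med, mask=np.isnan(med))


# Three rows; the last column is fully masked, the middle one partly masked
values = np.ma.masked_array(
    [[1.0, 2.0, 30.0],
     [4.0, 5.0, 60.0],
     [7.0, 8.0, 90.0]],
    mask=[[False, False, True],
          [False, False, True],
          [False, True, True]],
)

fast = bn_median(values, axis=0)
reference = np.ma.median(values, axis=0)

print(fast)       # medians 4.0 and 3.5; the fully masked column stays masked
print(reference)  # numpy's masked median, shown for comparison
```

A column in which every value is masked produces NaN from the median, which is why the final step masks NaN entries in the result. With the numpy.nanmedian fallback, that column may also trigger an all-NaN warning.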

To use this with sigma_clipping (assuming you have done all of the necessary imports first and created the combiner):

>>> my_combiner.sigma_clipping(func=bn_median)  

To perform median_combine with bottleneck:

>>> my_combiner.median_combine(median_func=bn_median)  

Feedback on whether this should be incorporated directly into ccdproc would be appreciated.
