The bottleneck package provides very fast implementations of numpy aggregation functions such as median. It accommodates masking by replacing masked values with numpy.nan.
How much faster is bottleneck? The median on masked data is roughly 1000x faster with bottleneck than with numpy.
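The speedup depends on array size, mask fraction and library versions, so it is worth measuring on your own data. The block below is a minimal timing sketch, not a rigorous benchmark: the array shape, the 10% mask fraction and the repeat count are arbitrary choices made for illustration, and the fallback to numpy.nanmedian is an assumption added so the sketch still runs when bottleneck is not installed.

```python
import timeit

import numpy as np

try:
    import bottleneck as bn
    nanmedian = bn.nanmedian
except ImportError:
    # Assumption for illustration: fall back to numpy if bottleneck is missing.
    nanmedian = np.nanmedian

rng = np.random.default_rng(0)
values = rng.normal(size=(20, 100, 100))
# Mask roughly 10% of the pixels at random.
masked = np.ma.masked_array(values, mask=rng.random(values.shape) < 0.1)


def numpy_median():
    # Median of the unmasked values along the first axis.
    return np.ma.median(masked, axis=0)


def nan_median():
    # Replace masked values with NaN, then take a NaN-aware median.
    return nanmedian(masked.filled(np.nan), axis=0)


t_numpy = timeit.timeit(numpy_median, number=3)
t_nan = timeit.timeit(nan_median, number=3)
print(f"np.ma.median: {t_numpy:.4f} s, nanmedian: {t_nan:.4f} s")
```

The two timings are only comparable when both functions are run on the same array, as they are here.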
Note
The latest version of bottleneck works only with numpy 1.8.0 or later.
Installing bottleneck should be easy: pip install bottleneck will do the trick.
To use bottleneck, we need to do three things:
The function below can be used as a replacement for numpy.ma.median:
def bn_median(masked_array, axis=None):
    """
    Perform fast median on masked array

    Parameters
    ----------
    masked_array : `numpy.ma.masked_array`
        Array of which to find the median.
    axis : int, optional
        Axis along which to perform the median. Default is to find the median of
        the flattened array.
    """
    import numpy as np
    import bottleneck as bn

    # replace masked values with NaN so that nanmedian skips them
    data = masked_array.filled(fill_value=np.nan)
    med = bn.nanmedian(data, axis=axis)
    # construct a masked array result, setting the mask from any NaN entries
    return np.ma.array(med, mask=np.isnan(med))
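As a quick sanity check, the sketch below applies the same fill-with-NaN approach to a small masked array. The function is repeated in condensed form only so that the block runs on its own; the sample array, the fallback to numpy.nanmedian when bottleneck is not installed, and the warning filter are additions made for illustration.

```python
import warnings

import numpy as np

try:
    import bottleneck as bn
    _nanmedian = bn.nanmedian
except ImportError:
    # Assumption for illustration: fall back to numpy if bottleneck is missing.
    _nanmedian = np.nanmedian


def bn_median(masked_array, axis=None):
    # Condensed copy of the function above.
    data = masked_array.filled(fill_value=np.nan)
    med = _nanmedian(data, axis=axis)
    return np.ma.array(med, mask=np.isnan(med))


stack = np.ma.masked_array(
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]],
    mask=[[False, True, False],
          [False, True, False],
          [True, True, False]],
)

with warnings.catch_warnings():
    # The numpy fallback warns about the all-NaN middle column.
    warnings.simplefilter("ignore", RuntimeWarning)
    result = bn_median(stack, axis=0)

# Column 0 has unmasked values 1 and 4, so its median is 2.5.
# Column 1 is fully masked, so its result is masked.
# Column 2 has values 3, 6 and 9, so its median is 6.
print(result)
```

Note that a column with no unmasked values comes back masked rather than as a bare NaN, which is what the final line of the function is for.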
To use this with sigma_clipping (assuming you have done all of the necessary imports first and created the combiner):
>>> my_combiner.sigma_clipping(func=bn_median)
To perform median_combine with bottleneck:
>>> my_combiner.median_combine(median_func=bn_median)
Feedback on whether this should be incorporated directly into ccdproc would be appreciated.