  • A bug in the OpenCL implementation of the radial profiles prevented Models with multiple profiles from displaying correctly, as the output image would contain only the values of the last profile. This problem was introduced in the previous version of libprofit and was not a long-standing issue.
  • When using OpenCL, any radial profile specifying rough=true caused the output image not to be scaled properly, with values not taking into account the profile’s magnitude or pixel scale. This issue seems to have existed for a long time, but since rough=true is not a common option it had gone unnoticed.


  • All profile evaluation has been changed from being absolute (profiles set the final value of a pixel) to being additive (profiles add their pixel values onto the image). This change means that one less memory allocation is needed, which can make a big difference when generating big images, and it also simplifies the logic of the Model evaluation.
  • Model objects now internally store the normalized version of the PSF given by the user instead of the original, which was never really needed.
  • profit-cli now makes it easier to specify multiple copies of the same profile, which is useful for scaling tests. Also, writing FITS files on little-endian systems no longer allocates extra memory.
  • Minor improvements to imaging classes.
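
The additive evaluation described above can be sketched as follows. This is a minimal illustration and not libprofit's actual code: `PixelFn`, `add_profile` and `evaluate_model` are hypothetical names, and a one-dimensional image stands in for the real two-dimensional one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a profile: given a pixel index, it returns the
// flux that the profile contributes to that pixel.
using PixelFn = std::function<double(std::size_t)>;

// Additive evaluation: the profile adds its contribution onto the shared
// image, so no per-profile temporary image is allocated.
void add_profile(std::vector<double> &image, const PixelFn &profile)
{
    for (std::size_t i = 0; i != image.size(); ++i) {
        image[i] += profile(i);
    }
}

// Evaluates all profiles onto a single zero-initialised image.
std::vector<double> evaluate_model(std::size_t npixels, const std::vector<PixelFn> &profiles)
{
    assert(npixels > 0);
    std::vector<double> image(npixels, 0.0);  // the only image allocation
    for (const auto &profile : profiles) {
        add_profile(image, profile);
    }
    return image;
}
```

With absolute evaluation each profile would need its own image before the results could be summed; here the shared image is the only buffer.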


  • The implementation of the Model class has been improved. In particular, it is now more memory efficient, which matters most in scenarios where many profiles (on the order of thousands) are added to it. Previously each profile was allocated its own Image, which added to both the memory footprint and the total runtime. Now a single scratch space is used for all profiles, and individual results are immediately summed up, respecting the convolution settings of each profile. Experiments with the null profile show a significant decrease in runtime when many Model evaluations take place.
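
One way to picture the single scratch space is the sketch below. It is an assumption about the general approach, not a copy of the Model implementation: the `Profile` struct, `convolve_1d` and `evaluate_model` are hypothetical, images are one-dimensional, and profiles that need convolution are accumulated separately so that the PSF is applied only once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Image = std::vector<double>;

// Hypothetical profile: evaluate() writes this profile's pixel values into
// `out`; `convolve` says whether the PSF applies to it.
struct Profile {
    std::function<void(Image &out)> evaluate;
    bool convolve;
};

// Same-size 1-D brute-force convolution with zero padding (odd-sized kernel).
Image convolve_1d(const Image &image, const Image &kernel)
{
    assert(kernel.size() % 2 == 1);
    const auto n = static_cast<std::ptrdiff_t>(image.size());
    const auto half = static_cast<std::ptrdiff_t>(kernel.size() / 2);
    Image out(image.size(), 0.0);
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        for (std::ptrdiff_t k = -half; k <= half; ++k) {
            const std::ptrdiff_t j = i - k;
            if (j >= 0 && j < n) {
                out[static_cast<std::size_t>(i)] +=
                    image[static_cast<std::size_t>(j)] * kernel[static_cast<std::size_t>(k + half)];
            }
        }
    }
    return out;
}

Image evaluate_model(std::size_t npixels, const std::vector<Profile> &profiles, const Image &psf)
{
    Image scratch(npixels);           // reused by every profile
    Image to_convolve(npixels, 0.0);  // running sum of profiles that need the PSF
    Image result(npixels, 0.0);       // running sum of profiles that do not
    for (const auto &profile : profiles) {
        std::fill(scratch.begin(), scratch.end(), 0.0);
        profile.evaluate(scratch);
        Image &target = profile.convolve ? to_convolve : result;
        for (std::size_t i = 0; i != npixels; ++i) {
            target[i] += scratch[i];  // sum immediately, then reuse the scratch
        }
    }
    const Image convolved = convolve_1d(to_convolve, psf);
    for (std::size_t i = 0; i != npixels; ++i) {
        result[i] += convolved[i];
    }
    return result;
}
```

The number of image-sized buffers stays constant no matter how many profiles the model holds.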


  • Implemented correct flux capturing. This feature was previously implemented in the ProFit R package as part of its fitting process, but it was otherwise unavailable.
  • Added explicit support for convolving images against kernels with bigger dimensions than the images themselves. This was previously supported implicitly, and only in certain cases, by the OpenCL convolver, while the FFT convolver threw a proper exception and the brute-force convolvers usually crashed. This first implementation is not ideal, but the use case is rare.
  • Several performance and code improvements, like removing unnecessary code, avoiding unnecessary conversions and avoiding a few dynamic allocations.


  • Users can now select the underlying SIMD-capable instruction set to use for brute-force convolution.
  • New library method has_simd_instruction_set() for users to check whether libprofit was compiled with support for different instruction sets.
  • Improved FFTW-based convolver performance by avoiding dynamic memory allocation at convolution time. This brings a noticeable performance improvement of around 20%.


  • Adding support for FFTW versions lower than 3.3.


  • profit-cli now compiles on Windows.
  • New Profile::parameter() method to specify parameters and values with a single name=value string.
  • New utility methods: trim(), split() and setenv().
  • Using SSE2/AVX SIMD extensions to implement brute-force convolution if the CPU supports it, with pure C++ implementation as a fallback. Can be disabled with -DLIBPROFIT_NO_SIMD=ON.
  • Potentially fixed the importing of FFTW wisdom files in systems with more than one FFTW installation.
  • Fixed compilation of brokenexponential OpenCL kernel in platforms where it was failing to compile.
  • Compiling in release mode (i.e., -O3 -DNDEBUG in gcc/clang) by default.
  • Lowering OpenMP requirement to 2.0 (was 3.0).
  • The OpenCL kernel cache now works on some platforms/devices where it previously did not.
  • Many internal code cleanups and design changes to make code easier to read and maintain.
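
As an illustration of the name=value convention accepted by Profile::parameter(), the sketch below splits such a string into its two parts. It is not libprofit's implementation: `trim_copy` and `parse_parameter` are hypothetical helpers written for this example, and the error handling is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper (not libprofit's trim()): returns a copy of `s`
// without leading and trailing whitespace.
std::string trim_copy(const std::string &s)
{
    const char *whitespace = " \t\r\n";
    const auto first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";
    }
    const auto last = s.find_last_not_of(whitespace);
    assert(last >= first);
    return s.substr(first, last - first + 1);
}

// Splits "name=value" at the first '=' and trims both sides.
std::pair<std::string, std::string> parse_parameter(const std::string &spec)
{
    const auto eq = spec.find('=');
    if (eq == std::string::npos) {
        throw std::invalid_argument("missing '=' in parameter specification: " + spec);
    }
    std::string name = trim_copy(spec.substr(0, eq));
    if (name.empty()) {
        throw std::invalid_argument("empty parameter name in: " + spec);
    }
    return {name, trim_copy(spec.substr(eq + 1))};
}
```

Splitting at the first '=' only means that a value may itself contain further '=' characters.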


  • FFT convolution using hermitian redundancy. This increases performance of FFT-based convolution by at least 10% in release builds, and addresses some warnings pointed out by valgrind.
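
The hermitian redundancy mentioned above comes from a property of the discrete Fourier transform of real data: bin N-k is the complex conjugate of bin k, so only N/2 + 1 bins need to be computed and stored. The sketch below shows this with a naive O(N²) transform. It does not use FFTW, and `half_dft` and `full_bin` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive DFT of a real signal that computes only the first N/2 + 1 bins.
std::vector<std::complex<double>> half_dft(const std::vector<double> &x)
{
    const std::size_t n = x.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> out(n / 2 + 1);
    for (std::size_t k = 0; k != out.size(); ++k) {
        for (std::size_t t = 0; t != n; ++t) {
            const double angle = -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            out[k] += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
    }
    return out;
}

// Recovers bin k of the full N-point spectrum from the half spectrum,
// using X[N - k] == conj(X[k]) for real input.
std::complex<double> full_bin(const std::vector<std::complex<double>> &half, std::size_t n, std::size_t k)
{
    assert(k < n);
    assert(half.size() == n / 2 + 1);
    return k < half.size() ? half[k] : std::conj(half[n - k]);
}
```

For the input {1, 2, 3, 4} the half spectrum holds three bins, and the fourth bin is recovered as the conjugate of the second.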


  • Added init_diagnose() and finish_diagnose() functions to avoid printing to stdout/stderr from within libprofit.


  • Fixed detection of double-precision support for OpenCL devices, which now works regardless of the supported OpenCL version.
  • Fixed a few compilation issues under the Visual Studio compiler.
  • Continuous integration on Windows via AppVeyor.



  • Internal implementation dependencies clearly hidden from users. This means that users compiling against libprofit don’t need to search for header files other than libprofit’s, making it much easier to write code against libprofit.
  • Model redesigned. No member variables are exposed anymore; instead different setter/getter methods must be used.
  • Image redesigned. In summary, it looks much more like a standard container now.
  • New Model::set_crop() specifies whether cropping should be carried out after convolution, if the convolution needs to pad the image.
  • Model::evaluate() has an extra optional parameter to receive the offset at which cropping needs to happen to remove padding from the resulting image, in case cropping has not already been carried out (see Model::set_crop()).
  • FFTW convolution uses real-to-complex and complex-to-real forward and backwards transforms respectively (instead of complex-to-complex transforms both ways), which is more efficient and should use less memory.
  • New on-disk OpenCL kernel cache. This speeds up the creation of OpenCL environments by a big factor as compilation of kernels doesn’t happen every time an environment is created.
  • New on-disk FFTW plan cache. This speeds up the creation of FFT-based convolvers by a big factor as the plans are not calculated every time for a given set of parameters.
  • New null profile, useful for testing.
  • New init() and finish() calls to initialize and finalize libprofit. These are mandatory, and should be called before and after using anything else from libprofit.
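
Because init() and finish() must bracket every other use of the library, one option for callers is a small RAII guard. The sketch below assumes that pattern and does not link against libprofit: the functions in the `demo` namespace are stand-ins, and their names and signatures are assumptions made so that the example compiles on its own.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-ins for the library's init() and finish(); the namespace, the bool
// return value and the counter are assumptions for illustration only.
namespace demo {
int live_inits = 0;
bool init() { ++live_inits; return true; }
void finish() { assert(live_inits > 0); --live_inits; }
}

// RAII guard: initialises on construction and finalises on destruction, so
// finalisation also runs when a scope is left through an exception.
class LibraryGuard {
public:
    LibraryGuard()
    {
        if (!demo::init()) {
            throw std::runtime_error("library initialisation failed");
        }
    }
    ~LibraryGuard() { demo::finish(); }
    LibraryGuard(const LibraryGuard &) = delete;
    LibraryGuard &operator=(const LibraryGuard &) = delete;
};
```

A guard declared at the top of main() would keep the library initialised for the lifetime of the program.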


  • Brute-force convolver about 3x faster than old version.
  • Fixing compilation failure on MacOS introduced in 1.6.0.
  • Center pixel in sersic profile treated specially only if adjust parameter is on.