CUDA Fast Math
As noted in Fastmath, for certain classes of applications that utilize floating point, strict IEEE-754 conformance is not required. For this subset of applications, performance speedups may be possible.
The CUDA target implements Fastmath behavior with two differences.
First, the fastmath argument to the @jit decorator is limited to the values True and False. When True, the following optimizations are enabled:
Flushing of denormals to zero.
Use of a fast approximation to the square root function.
Use of a fast approximation to the division operation.
Contraction of multiply and add operations into single fused multiply-add operations.
See the documentation for nvvmCompileProgram for more details of these optimizations.
Second, calls to a subset of math module functions on float32 operands will be implemented using fast approximate implementations from the libdevice library.