CUDA Fast Math

As noted in Fastmath, some classes of applications that use floating-point arithmetic do not require strict IEEE-754 conformance. For these applications, relaxing that conformance may improve performance.

The CUDA target implements Fastmath behavior, but it differs from the behavior described in Fastmath in two ways.