About AVX CPP
Note
With GCC and Clang, turning on specific optimization flags might produce code very similar in performance to hand-written SIMD code. Both of these compilers can produce well-optimized code when SIMD instructions are enabled. With MSVC, manually optimized SIMD can result in code that is several times faster than the code “optimized” by MSVC.
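As an illustration of the note above, here is a minimal sketch of a loop that GCC and Clang can typically auto-vectorize when built with flags such as `-O3 -mavx2`. The function names `add_arrays` and `add_vectors` are hypothetical and not part of the library; whether the loop is actually vectorized depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise addition written as a plain scalar loop. The simple,
// dependency-free body is the kind of code an optimizing compiler can
// turn into SIMD instructions on its own.
void add_arrays(const int* a, const int* b, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Convenience wrapper (hypothetical helper) that checks sizes and
// returns the element-wise sum as a new vector.
std::vector<int> add_vectors(const std::vector<int>& a,
                             const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    add_arrays(a.data(), b.data(), out.data(), a.size());
    return out;
}
```

Inspecting the generated assembly is the reliable way to check whether a given compiler vectorized the loop.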
While developing this library I have followed the principles listed below:
Performance - the library should be as fast as possible, resulting in code that has as little overhead caused by abstraction as possible (that’s why almost all methods are inline).
Exception Avoidance - the library should not throw exceptions as they make code slower and harder to read. An exception to this rule is running code in Debug mode, where exceptions are explicitly thrown. Otherwise, for invalid inputs, functions have no effect or guarantee not to cause undefined behavior (e.g. buffer overflow).
Ease of Use - the library should be easy to use, providing an intuitive API for introducing SIMD to existing code.
Simplicity - no one likes to use overcomplicated code with many abstraction layers. That’s why each class is an independent unit (even though all of them share the same namespace and most methods) that does not use polymorphism or inheritance.
Portability - the library should be available on Windows and Linux platforms while supporting all major compilers (GCC, Clang and MSVC). The library targets only x86 processors supporting AVX2 (mandatory) and AVX512 (optional).
TDD - each feature should be tested before being added to the library (tests for a feature should be developed along with it). The tests should be run on all supported platforms and compilers to ensure seamless integration [1]. Tests can be found in the src/tests folder.
List of classes
The library provides the following classes, all of which are in the avx namespace:
Char256 holds 32 chars (8-bit integers),
UChar256 holds 32 unsigned chars (8-bit unsigned integers),
Short256 holds 16 shorts (16-bit integers),
UShort256 holds 16 unsigned shorts (16-bit unsigned integers),
Int256 holds 8 32-bit integers,
UInt256 holds 8 32-bit unsigned integers,
Long256 holds 4 64-bit integers,
ULong256 holds 4 64-bit unsigned integers,
Float256 holds 8 32-bit floats,
Double256 holds 4 64-bit doubles.
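The element counts above follow from dividing the 256-bit (32-byte) register width by the element size. A small compile-time sketch of that arithmetic, where `kRegisterBytes` and `lanes` are illustrative names that do not come from the library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Width of one 256-bit register in bytes.
constexpr std::size_t kRegisterBytes = 32;

// Number of elements of type T that fit in one 256-bit register.
template <typename T>
constexpr std::size_t lanes() {
    return kRegisterBytes / sizeof(T);
}

// Fixed-width types are used so the checks hold regardless of how wide
// `long` is on a given platform.
static_assert(lanes<std::int8_t>() == 32, "Char256 / UChar256");
static_assert(lanes<std::int16_t>() == 16, "Short256 / UShort256");
static_assert(lanes<std::int32_t>() == 8, "Int256 / UInt256");
static_assert(lanes<std::int64_t>() == 4, "Long256 / ULong256");
static_assert(lanes<float>() == 8, "Float256");
static_assert(lanes<double>() == 4, "Double256");
```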
Features
Each of the types supports regular math operators:
+ and +=
- and -=
* and *=
/ and /=
Each operator accepts either an object of the same class or a scalar value of the corresponding type. For example, the + operator for Int256 has two versions: one accepting Int256 and the other accepting int.
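To make the two-overload pattern concrete, here is a minimal sketch using a hypothetical stand-in type `Vec8i` built on a plain array. It is not the library's `Int256` and uses no SIMD; it only shows how a vector operand and a scalar operand can share one operator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an 8-lane integer vector (NOT the library's Int256).
struct Vec8i {
    std::array<int, 8> v{};

    // Vector + vector: lane-wise sum.
    Vec8i operator+(const Vec8i& rhs) const {
        Vec8i out;
        for (std::size_t i = 0; i < v.size(); ++i) out.v[i] = v[i] + rhs.v[i];
        return out;
    }

    // Vector + scalar: the scalar is added to every lane.
    Vec8i operator+(int rhs) const {
        Vec8i out;
        for (std::size_t i = 0; i < v.size(); ++i) out.v[i] = v[i] + rhs;
        return out;
    }

    // Compound forms reuse the binary operators.
    Vec8i& operator+=(const Vec8i& rhs) { *this = *this + rhs; return *this; }
    Vec8i& operator+=(int rhs) { *this = *this + rhs; return *this; }

    // Bounds-checked lane access (checked only in debug builds).
    int at(std::size_t i) const {
        assert(i < v.size());
        return v[i];
    }
};
```

With this shape, `a + b` and `a + 5` both compile, and overload resolution picks the matching version from the operand type.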
All types also support the following comparison operators:
== accepts the same operand types as the math operators,
!= same as above; additionally ignores -0.0 for floating-point types.
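The reason negative zero needs special attention in comparisons can be shown in a few lines of standard C++. This sketch does not reproduce the library's implementation, which is not described here; `bits_of`, `value_equal` and `bits_equal` are illustrative names. It only demonstrates that 0.0 and -0.0 are equal as values while their bit patterns differ, so a comparison built on raw bits has to account for the sign bit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw IEEE 754 bit pattern of a float.
std::uint32_t bits_of(float x) {
    std::uint32_t u;
    static_assert(sizeof(u) == sizeof(x), "float is expected to be 32 bits");
    std::memcpy(&u, &x, sizeof(u));
    return u;
}

// Value comparison: +0.0f and -0.0f compare equal under IEEE 754.
bool value_equal(float a, float b) { return a == b; }

// Bitwise comparison: +0.0f and -0.0f differ in the sign bit.
bool bits_equal(float a, float b) { return bits_of(a) == bits_of(b); }
```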
Integer types also support bitwise operators:
& and &=
| and |=
^ and ^=
~ (takes no arguments)
<< and <<= (an exception: these accept the same class or unsigned int)
>> and >>= (an exception: these accept the same class or unsigned int)
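The two operand kinds for shifts correspond to a uniform shift (one count for all lanes) and a per-lane shift (one count per lane). A minimal scalar sketch of the difference, assuming 8 lanes of 32 bits; `Lanes`, `shift_lane` and `shift_left` are hypothetical names, and returning 0 for counts of 32 or more is a choice made for this sketch, not documented library behavior.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for 8 lanes of 32-bit unsigned integers.
using Lanes = std::array<std::uint32_t, 8>;

// Shifts one lane left. A plain C++ shift by 32 or more is undefined for a
// 32-bit value, so such counts are mapped to 0 explicitly.
std::uint32_t shift_lane(std::uint32_t value, std::uint32_t count) {
    return count < 32 ? value << count : 0u;
}

// Uniform shift: every lane is shifted by the same count.
Lanes shift_left(const Lanes& a, unsigned int count) {
    Lanes out{};
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = shift_lane(a[i], count);
    return out;
}

// Per-lane shift: lane i is shifted by counts[i].
Lanes shift_left(const Lanes& a, const Lanes& counts) {
    Lanes out{};
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = shift_lane(a[i], counts[i]);
    return out;
}
```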
Each class documents its list of operators and the SIMD instructions used.
Limitations
For Long256 and ULong256, the overhead of loading and saving data makes the SIMD version slower than a regular sequential solution.
Each constructor or load method expects the source to contain at least 32 bytes of contiguous data. Providing a pointer to memory that contains fewer than 32 bytes can result in undefined behavior.
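One common way to respect the 32-byte requirement is to process only full 32-byte blocks with the wide path and handle the leftover elements one at a time. The sketch below assumes 4-byte `int` elements and uses `std::memcpy` on a local block as a stand-in for a SIMD load and store; `add_scalar` and `kLanes` are hypothetical names, not library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kLanes = 8;  // 8 x 32-bit ints = 32 bytes
static_assert(sizeof(int) * kLanes == 32, "sketch assumes 4-byte int");

// Adds `value` to every element of `data`. Full 32-byte blocks go through a
// fixed-size path; the tail is handled element by element, so no block read
// or write ever reaches past the end of the buffer.
void add_scalar(int* data, std::size_t n, int value) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        int block[kLanes];
        std::memcpy(block, data + i, sizeof(block));  // reads exactly 32 bytes
        for (std::size_t j = 0; j < kLanes; ++j) block[j] += value;
        std::memcpy(data + i, block, sizeof(block));  // writes exactly 32 bytes
    }
    for (; i < n; ++i) data[i] += value;  // fewer than 8 elements remain
}
```

For an array of 11 elements, the first 8 go through the block path and the remaining 3 through the scalar loop.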