* high performance through the integration of BLAS libraries and manually tuned HPC math kernels
* vectorization by SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2, AVX-512, FMA, and SVML
* parallel execution by OpenMP, C++11 threads and Boost threads
* the intuitive and easy-to-use API of a domain-specific language
* unified arithmetic with dense and sparse vectors and matrices
* thoroughly tested matrix and vector arithmetic
* completely portable, high-quality C++ source code