summaryrefslogtreecommitdiff
path: root/arch/arm64/crypto/chacha-neon-core.S
AgeCommit message (Collapse)Author
2019-02-28crypto: arm64/chacha - fix hchacha_block_neon() for big endianEric Biggers
On big endian arm64 kernels, the xchacha20-neon and xchacha12-neon self-tests fail because hchacha_block_neon() outputs little endian words but the C code expects native endianness. Fix it to output the words in native endianness (which also makes it match the arm32 version). Fixes: cc7cf991e9eb ("crypto: arm64/chacha20 - add XChaCha20 support") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-28crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endianEric Biggers
The change to encrypt a fifth ChaCha block using scalar instructions caused the chacha20-neon, xchacha20-neon, and xchacha12-neon self-tests to start failing on big endian arm64 kernels. The bug is that the keystream block produced in 32-bit scalar registers is directly XOR'd with the data words, which are loaded and stored in native endianness. Thus in big endian mode the data bytes end up XOR'd with the wrong bytes. Fix it by byte-swapping the keystream words in big endian mode. Fixes: 2fe55987b262 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: arm64/chacha - use combined SIMD/ALU routine for more speedArd Biesheuvel
To some degree, most known AArch64 micro-architectures appear to be able to issue ALU instructions in parellel to SIMD instructions without affecting the SIMD throughput. This means we can use the ALU to process a fifth ChaCha block while the SIMD is processing four blocks in parallel. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: arm64/chacha - optimize for arbitrary length inputsArd Biesheuvel
Update the 4-way NEON ChaCha routine so it can handle input of any length >64 bytes in its entirety, rather than having to call into the 1-way routine and/or memcpy()s via temp buffers to handle the tail of a ChaCha invocation that is not a multiple of 256 bytes. On inputs that are a multiple of 256 bytes (and thus in tcrypt benchmarks), performance drops by around 1% on Cortex-A57, while performance for inputs drawn randomly from the range [64, 1024) increases by around 30%. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: arm64/chacha20 - refactor to allow varying number of roundsEric Biggers
In preparation for adding XChaCha12 support, rename/refactor the ARM64 NEON implementation of ChaCha20 to support different numbers of rounds. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>