author | Jiong Wang <jiong.wang@netronome.com> | 2019-05-30 21:23:18 +0100 |
---|---|---|
committer | Alexei Starovoitov <ast@kernel.org> | 2019-05-31 17:07:13 -0700 |
commit | c231c22a989af95fae3e75cac9d4511e0fe79377 (patch) | |
tree | c3bb3a7f99175f958731086b8efc8b8dd9d5361e /Documentation/bpf | |
parent | d168286d773ca7d5f9e8de8765216557839579d8 (diff) |
bpf: doc: update answer for 32-bit subregister question
There has been quite a lot of progress on the two steps mentioned in the
answer to the following question:
Q: BPF 32-bit subregister requirements
This patch updates the answer to reflect what has been done.
v2:
- Add missing full stop. (Song Liu)
- Minor tweak on one sentence. (Song Liu)
v1:
- Integrated rephrase from Quentin and Jakub
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'Documentation/bpf')
-rw-r--r-- | Documentation/bpf/bpf_design_QA.rst | 30 |
1 files changed, 25 insertions, 5 deletions
diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index cb402c59eca5..12a246fcf6cb 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
 CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
 be added to BPF in the future?
 
-A: NO. The first thing to improve performance on 32-bit archs is to teach
-LLVM to generate code that uses 32-bit subregisters. Then second step
-is to teach verifier to mark operations where zero-ing upper bits
-is unnecessary. Then JITs can take advantage of those markings and
-drastically reduce size of generated code and improve performance.
+A: NO.
+
+But some optimizations on zero-ing the upper 32 bits for BPF registers are
+available, and can be leveraged to improve the performance of JITed BPF
+programs for 32-bit architectures.
+
+Starting with version 7, LLVM is able to generate instructions that operate
+on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
+compiling a program. Furthermore, the verifier can now mark the
+instructions for which zero-ing the upper bits of the destination register
+is required, and insert an explicit zero-extension (zext) instruction
+(a mov32 variant). This means that for architectures without zext hardware
+support, the JIT back-ends do not need to clear the upper bits for
+subregisters written by alu32 instructions or narrow loads. Instead, the
+back-ends simply need to support code generation for that mov32 variant,
+and to overwrite bpf_jit_needs_zext() to make it return "true" (in order to
+enable zext insertion in the verifier).
+
+Note that it is possible for a JIT back-end to have partial hardware
+support for zext. In that case, if verifier zext insertion is enabled,
+it could lead to the insertion of unnecessary zext instructions. Such
+instructions could be removed by creating a simple peephole inside the JIT
+back-end: if one instruction has hardware support for zext and if the next
+instruction is an explicit zext, then the latter can be skipped when doing
+the code generation.
 
 Q: Does BPF have a stable ABI?
 ------------------------------
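
The peephole described in the last added paragraph can be illustrated with a small stand-alone sketch. This is not kernel code: the opcode enum, struct toy_insn, toy_hw_zero_extends() and toy_count_skipped_zext() are hypothetical names made up for this example, and the assumption that only narrow loads zero-extend in hardware is arbitrary. The check that both instructions write the same register is an extra precaution added here; the patch text does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy opcodes; these are not the kernel's BPF encodings. */
enum toy_op {
	TOY_ALU32_ADD,	/* 32-bit ALU op writing a subregister */
	TOY_LDX_W,	/* narrow (32-bit) load */
	TOY_ALU64_ADD,	/* full 64-bit ALU op */
	TOY_ZEXT,	/* explicit zero-extension inserted by the verifier */
};

struct toy_insn {
	enum toy_op op;
	int dst;	/* destination register number */
};

/*
 * Assumption for this sketch: the imaginary target clears the upper
 * bits in hardware for narrow loads, but not for 32-bit ALU ops
 * ("partial hardware support for zext").
 */
static bool toy_hw_zero_extends(enum toy_op op)
{
	return op == TOY_LDX_W;
}

/*
 * Walk the program the way a JIT would and count the explicit zext
 * instructions that need no native code, because the previous
 * instruction already cleared the upper bits of the same register.
 */
static size_t toy_count_skipped_zext(const struct toy_insn *prog, size_t n)
{
	size_t i, skipped = 0;

	assert(prog != NULL);
	for (i = 1; i < n; i++) {
		if (prog[i].op == TOY_ZEXT &&
		    toy_hw_zero_extends(prog[i - 1].op) &&
		    prog[i - 1].dst == prog[i].dst)
			skipped++;
	}
	return skipped;
}

/* Sample input: two verifier-inserted zexts, only one of them redundant. */
static const struct toy_insn toy_sample[] = {
	{ TOY_LDX_W,     1 },
	{ TOY_ZEXT,      1 },	/* redundant: the load already cleared r1 */
	{ TOY_ALU32_ADD, 2 },
	{ TOY_ZEXT,      2 },	/* still needed: no hardware zext for this op */
	{ TOY_ALU64_ADD, 3 },
};
```

Calling toy_count_skipped_zext() on toy_sample reports one skippable zext (the one after the narrow load). A real back-end would apply the same test inside its per-instruction emit loop and simply emit nothing for the skipped instruction.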