Age | Commit message | Author |
|
Patch by Thomas Jarosch.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@28913 a1c6a512-1295-4272-9138-f99709370657
|
|
by ~6%.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@28632 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@28439 a1c6a512-1295-4272-9138-f99709370657
|
|
it might be a bug in the 4-year-old gcc version, but __ASSEMBLER__ is not
defined when preprocessing .S files with -std=gnu99
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@28026 a1c6a512-1295-4272-9138-f99709370657
|
|
N900. Speedup is 2.1x for -c5000 compared to the ARMv6 asm. Note that actually compiling it on device requires hand-assembling the 'vadd' and 'vsub' instructions due to a bug in binutils 2.18.50, and making the standalone decoder use it requires Makefile and demac_config.h hacks.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@27944 a1c6a512-1295-4272-9138-f99709370657
|
|
fancy preprocessor stuff.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@27490 a1c6a512-1295-4272-9138-f99709370657
|
|
We can't pop into pc on ARMv4t when using thumb: the T bit won't be
modified if we are returning to a thumb function.
Code running on ARMv4t should use the new ldrpc / ldmpc macros instead
of ldr pc, [sp], #4 and ldm(cond) sp!, {regs, pc}.
No modification on pure ARM builds and ARMv5+.
Note: USE_THUMB is currently never defined, so no targets can currently be
built with -mthumb; see FS#6734.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@26756 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@26376 a1c6a512-1295-4272-9138-f99709370657
|
|
directory, also standard'ify some parts of the code base (almost entirely #include fixes).
This is a) to clean up firmware/common and firmware/include a bit, and b) for Rockbox as an application, which should use the host system's C library and headers; separating them makes it easy to exclude our files from the build.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@25850 a1c6a512-1295-4272-9138-f99709370657
|
|
about half of the performance gap towards PP5022. The (relatively large) buffers for decoded data stay in IRAM, as does the reciprocal table. Clarify some comments.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@25108 a1c6a512-1295-4272-9138-f99709370657
|
|
dual-core split on PP. This also means less inlining, and hence speeds up decoding on single core slightly, due to better caching behaviour.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@25005 a1c6a512-1295-4272-9138-f99709370657
|
|
gcc to wrongly estimate the size of the asm(), leading to (potential) compilation problems. This is necessary for the upcoming restructuring, and should fix ARMv6+ sim builds as well. No functional change.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@25004 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24862 a1c6a512-1295-4272-9138-f99709370657
|
|
(high bit set) numerators.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24783 a1c6a512-1295-4272-9138-f99709370657
|
|
but speeds up decoding on x86/x86_64 sims. Average speedup ranges from 25% for -c2000 to 3 times for -c5000; on Intel Atom it's even 45% for -c2000 to 6 times for -c5000.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24663 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24602 a1c6a512-1295-4272-9138-f99709370657
|
|
~4% for -c2000..-c4000 (less for -c5000). Thanks to Frank Gevaerts for testing.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24590 a1c6a512-1295-4272-9138-f99709370657
|
|
-c2000, ~7% for -c3000 and higher.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24569 a1c6a512-1295-4272-9138-f99709370657
|
|
selection.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24512 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24508 a1c6a512-1295-4272-9138-f99709370657
|
|
Done by linking first with the table empty to determine free space, then sizing table to fill it.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24507 a1c6a512-1295-4272-9138-f99709370657
|
|
codec and an optimized divider is already provided for general use in codeclib.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24506 a1c6a512-1295-4272-9138-f99709370657
|
|
fusing vector math for the filters. Speedup is roughly 3.5% for -c2000, 8% for -c3000 and 12% for -c4000. To be extended to other architectures.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24473 a1c6a512-1295-4272-9138-f99709370657
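As an illustration of what "fusing vector math" can mean, the sketch below combines a scalar product and a vector update into a single pass over the data, so each element is loaded only once. It is a minimal sketch in portable C, not the code from this commit: the function name `fused_sp_add`, its signature, and the explicit `len` parameter are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions for this sketch).
 * Returns the scalar product of v1 and f2, computed from the values v1 held
 * before the call, and adds s2 into v1 element by element in the same loop. */
static int32_t fused_sp_add(int16_t *v1, const int16_t *f2,
                            const int16_t *s2, size_t len)
{
    int32_t sum = 0;

    assert(v1 != NULL && f2 != NULL && s2 != NULL);

    for (size_t i = 0; i < len; i++) {
        /* Accumulate the product first, using the old value of v1[i]. */
        sum += (int32_t)v1[i] * f2[i];
        /* Then update v1[i]; the caller is assumed to keep the result
         * within the int16_t range. */
        v1[i] = (int16_t)(v1[i] + s2[i]);
    }
    return sum;
}
```

Compared with calling a separate scalar product and a separate vector add, the fused loop reads `v1` once instead of twice, which is the kind of saving that matters more as the filter length grows with the compression level.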
|
|
out of IRAM for sizes that aren't near realtime and extend udiv32_arm reciprocal table.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24376 a1c6a512-1295-4272-9138-f99709370657
|
|
* Use Newton-Raphson divider on ARMv5e and ARMv6, about 7% speedup on Gigabeat S.
* On ARMv4 targets using IRAM, remove insane filter buffer from IRAM, fill available IRAM with LUT of reciprocals for small divisors - speedup varies according to target and available IRAM, APE normal sample is approx. 109% RT on e200.
* Rename apps/codecs/lib/udiv32_armv4.S to apps/codecs/lib/udiv32_arm.S, which includes dividers for all ARM targets specialized for APE.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@24354 a1c6a512-1295-4272-9138-f99709370657
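The lookup table of reciprocals for small divisors mentioned above can be sketched in portable C as follows. This is only an illustration of the general technique, not the assembler in udiv32_arm.S: the names `recip_table`, `recip_table_init` and `udiv32_sketch` are hypothetical, and the table size of 64 entries is an arbitrary assumption (the commit sizes the table according to the IRAM available on each target).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size; a real target would use whatever IRAM is free. */
#define RECIP_TABLE_SIZE 64u

static uint32_t recip_table[RECIP_TABLE_SIZE];

/* Fill entries 2..RECIP_TABLE_SIZE-1 with floor(2^32 / d).
 * Entries 0 and 1 are unused: d == 1 would need 2^32, which does not fit. */
static void recip_table_init(void)
{
    for (uint32_t d = 2; d < RECIP_TABLE_SIZE; d++)
        recip_table[d] = (uint32_t)((UINT64_C(1) << 32) / d);
}

/* Unsigned 32-bit division that multiplies by a tabulated reciprocal when
 * the divisor is small, and falls back to a plain division otherwise. */
static uint32_t udiv32_sketch(uint32_t n, uint32_t d)
{
    assert(d != 0);

    if (d == 1)
        return n;
    if (d >= RECIP_TABLE_SIZE)
        return n / d;               /* divisor not covered by the table */

    /* Because the reciprocal is rounded down, this estimate is either the
     * true quotient or one less than it. */
    uint32_t q = (uint32_t)(((uint64_t)n * recip_table[d]) >> 32);

    /* One correction step: q * d never exceeds n, so nothing overflows. */
    if (n - q * d >= d)
        q++;
    return q;
}
```

Call `recip_table_init()` once before the first `udiv32_sketch()` call. The single correction step is enough here because rounding the reciprocal down makes the estimate too small by at most one.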
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@23594 a1c6a512-1295-4272-9138-f99709370657
|
|
gain from using them is minimal (basically code size only).
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@21916 a1c6a512-1295-4272-9138-f99709370657
|
|
Use a smaller PCM buffer on targets with 2MB or less RAM.
(FS#9703)
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19743 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19643 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19612 a1c6a512-1295-4272-9138-f99709370657
|
|
-c2000), but also helps on the ARM targets (+0.9% for -c2000 on PP5002). This transformation is overflow-safe, as absres < 2^24 is guaranteed.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19556 a1c6a512-1295-4272-9138-f99709370657
|
|
standalone decoder contain debugging information.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19552 a1c6a512-1295-4272-9138-f99709370657
|
|
bit output in the standalone decoder a bit.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19517 a1c6a512-1295-4272-9138-f99709370657
|
|
on PP, ~8% on Gigabeat S (less for higher compression levels). Also fix some overlooked comments in the stereo predictor.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19375 a1c6a512-1295-4272-9138-f99709370657
|
|
for mono -c1000. Apply ideas gained from it back to the stereo predictor, saving 4 instructions. No speed increase for stereo, probably due to cache aliasing effects. * 80-column police.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19296 a1c6a512-1295-4272-9138-f99709370657
|
|
registers, for a slight speedup.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19287 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19268 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19264 a1c6a512-1295-4272-9138-f99709370657
|
|
giving a nice speedup for the higher compression levels (tested on Cowon D2).
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19260 a1c6a512-1295-4272-9138-f99709370657
|
|
add+ldmia/stmia for 2 registers. On ARM7TDMI a str pair is equally fast, so go for the simpler macro and use it for all ARMv4.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19250 a1c6a512-1295-4272-9138-f99709370657
|
|
shuffling around the register allocation somewhat. Performance on ARMv4 is unaffected.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19248 a1c6a512-1295-4272-9138-f99709370657
|
|
will be used in the dual core split.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19236 a1c6a512-1295-4272-9138-f99709370657
|
|
(sometimes using different registers to allow this). Speeds up the predictor by almost 20% on ARMv6 (overall speedup for -c1000 is 5%), and might also help a bit on ARMv5. ARMv4 speed is unaffected.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19210 a1c6a512-1295-4272-9138-f99709370657
|
|
-fprofile-arcs and gcov) and asm files. Biggest effect on coldfire (-c1000: +8%, -c2000: +5%), but ARM also profits a bit (less than 1% on ARM7TDMI, around 1% on ARM1136).
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19199 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19198 a1c6a512-1295-4272-9138-f99709370657
|
|
tree. Fully controlled dependencies give faster and more correct recompiles.
Many #include lines adjusted to conform to the new standards.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19146 a1c6a512-1295-4272-9138-f99709370657
|
|
repeating blocks. * Use MUL (variant) instead of MLA (variant) in the first step of the ARM scalarproduct() if there's no loop. * Unroll ARM assembler functions to 32 where not already done, plus the generic scalarproduct().
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19144 a1c6a512-1295-4272-9138-f99709370657
|
|
bit filters are faster on ARMv4 (with assembler code), so use them there. Nice speedup on PP and Gigabeat F/X.
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19140 a1c6a512-1295-4272-9138-f99709370657
|
|
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19121 a1c6a512-1295-4272-9138-f99709370657
|
|
was only used there, and defined some variables in the .h
git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19116 a1c6a512-1295-4272-9138-f99709370657
|