Age | Commit message (Collapse) | Author |
|
At present the lppaca - the structure shared with the iSeries
hypervisor and phyp - is contained within the PACA, our own low-level
per-cpu structure. This doesn't have to be so, the patch below
removes it, making a separate array of lppaca structures.
This saves approximately 500*NR_CPUS bytes of image size and kernel
memory, because we don't need aligning gap between the Linux and
hypervisor portions of every PACA. On the other hand it means an
extra level of dereference in many accesses to the lppaca.
The patch also gets rid of several places where we assign the paca
address to a local variable for no particular reason.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
The current ppc64 per cpu data implementation is quite slow. eg:
lhz 11,18(13) /* smp_processor_id() */
ld 9,.LC63-.LCTOC1(30) /* per_cpu__variable_name */
ld 8,.LC61-.LCTOC1(30) /* __per_cpu_offset */
sldi 11,11,3 /* form index into __per_cpu_offset */
mr 10,9
ldx 9,11,8 /* __per_cpu_offset[smp_processor_id()] */
ldx 0,10,9 /* load per cpu data */
5 loads for something that is supposed to be fast, pretty awful. One
reason for the large number of loads is that we have to synthesize 2
64bit constants (per_cpu__variable_name and __per_cpu_offset).
By putting __per_cpu_offset into the paca we can avoid the 2 loads
associated with it:
ld 11,56(13) /* paca->data_offset */
ld 9,.LC59-.LCTOC1(30) /* per_cpu__variable_name */
ldx 0,9,11 /* load per cpu data
Longer term we can should be able to do even better than 3 loads.
If per_cpu__variable_name wasnt a 64bit constant and paca->data_offset
was in a register we could cut it down to one load. A suggestion from
Rusty is to use gcc's __thread extension here. In order to do this we
would need to free up r13 (the __thread register and where the paca
currently is). So far Ive had a few unsuccessful attempts at doing that :)
The patch also allocates per cpu memory node local on NUMA machines.
This patch from Rusty has been sitting in my queue _forever_ but stalled
when I hit the compiler bug. Sorry about that.
Finally I also only allocate per cpu data for possible cpus, which comes
straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128)
and 4 possible cpus we see some nice gains:
total used free shared buffers cached
Mem: 4012228 212860 3799368 0 0 162424
total used free shared buffers cached
Mem: 4016200 212984 3803216 0 0 162424
A saving of 3.75MB. Quite nice for smaller machines. Note: we now have
to be careful of per cpu users that touch data for !possible cpus.
At this stage it might be worth making the NUMA and possible cpu
optimisations generic, but per cpu init is done so early we have to be
careful that all architectures have their possible map setup correctly.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
include/asm-ppc/ had #ifdef __KERNEL__ in all header files that
are not meant for use by user space, include/asm-powerpc does
not have this yet.
This patch gets us a lot closer there. There are a few cases
where I was not sure, so I left them out. I have verified
that no CONFIG_* symbols are used outside of __KERNEL__
any more and that there are no obvious compile errors when
including any of the headers in user space libraries.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
This patch removes several unnecessary fields from the paca:
- next_jiffy_update_tb was simply unused. Remove trivially.
- The exdsi exception save area was not used. There were plans to use
it, but they never seem to have gone anywhere. If they ever do, we
can put it back. Remove from the paca, and from asm-offsets.c
- The default_decr field was used from asm, but was only ever assigned
the value of tb_ticks_per_jiffy. Just access tb_ticks_per_jiffy from
asm directly instead.
Built and booted on POWER5 LPAR and iSeries RS64.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
On iSeries, the paca contains, amongst other things an ItLpRegSave
structure used by the hypervisor to save registers. The hypervisor
locates this area through a pointer at the beginning of the paca, so
the structure itself can be located elsewhere. This patch moves the
reg_save area out into its own array. This reduces the amount of
iSeries specific gunk which is visible to general powerpc code via
paca.h
Built and booted on POWER5 LPAR and iSeries RS64.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
This patch moves a bunch of files from arch/ppc64 and
include/asm-ppc64 which have no equivalents in ppc32 code into
arch/powerpc and include/asm-powerpc. The file affected are:
abs_addr.h
compat.h
lppaca.h
paca.h
tce.h
cpu_setup_power4.S
ioctl32.c
firmware.c
pacaData.c
The only changes apart from the move and corresponding Makefile
changes are:
- #ifndef/#define in includes updated to _ASM_POWERPC_ form
- trailing whitespace removed
- comments giving full paths removed
- pacaData.c renamed paca.c to remove studlyCaps
- Misplaced { moved in lppaca.h
Built and booted on POWER5 LPAR (ARCH=powerpc and ARCH=ppc64), built
for 32-bit powermac (ARCH=powerpc).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|