summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-02-17init: Move radix_tree_init() earlyYinghai Lu
Prepare for using radix trees in early_irq_init(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-30-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-17irq: Remove unnecessary bootmem codeYinghai Lu
mem_init is moved early already. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-29-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-16x86: Convert i8259_lock to raw_spinlockThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-10x86: Avoid race condition in pci_enable_msix()Brandon Phiilps
Keep chip_data in create_irq_nr and destroy_irq. When two drivers are setting up MSI-X at the same time via pci_enable_msix() there is a race. See this dmesg excerpt: [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X [ 85.170611] alloc irq_desc for 99 on node -1 [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X [ 85.170614] alloc kstat_irqs on node -1 [ 85.170616] alloc irq_2_iommu on node -1 [ 85.170617] alloc irq_desc for 100 on node -1 [ 85.170619] alloc kstat_irqs on node -1 [ 85.170621] alloc irq_2_iommu on node -1 [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X [ 85.170626] alloc irq_desc for 101 on node -1 [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X [ 85.170630] alloc kstat_irqs on node -1 [ 85.170631] alloc irq_2_iommu on node -1 [ 85.170635] alloc irq_desc for 102 on node -1 [ 85.170636] alloc kstat_irqs on node -1 [ 85.170639] alloc irq_2_iommu on node -1 [ 85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 As you can see igb and ixgbe are both alternating on create_irq_nr() via pci_enable_msix() in their probe function. ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data = NULL via dynamic_irq_init(). igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[] via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this: cfg_new = irq_desc_ptrs[102]->chip_data; if (cfg_new->vector != 0) continue; This hits the NULL deref. Another possible race exists via pci_disable_msix() in a driver or in the number of error paths that call free_msi_irqs(): destroy_irq() dynamic_irq_cleanup() which sets desc->chip_data = NULL ...race window... desc->chip_data = cfg; Remove the save and restore code for cfg in create_irq_nr() and destroy_irq() and take the desc->lock when checking the irq_cfg. Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org> Signed-off-by: Brandon Phililps <bphilips@suse.de> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10Merge branch 'i2c-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: i2c-tiny-usb: Fix on big-endian systems
2010-02-10Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] Fix struct _lowcore layout. [S390] qdio: prevent call trace if CHPID is offline [S390] qdio: continue polling for buffer state ERROR
2010-02-10Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PIT: control word is write-only kvmclock: count total_sleep_time when updating guest clock Export the symbol of getboottime and mmonotonic_to_bootbased
2010-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6: avr32: clean up memory allocation in at32_add_device_mci arch/avr32: Fix build failure for avr32 caused by typo
2010-02-10Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc: Fix address masking bug in hpte_need_flush()
2010-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix dentry hash calculation for case-insensitive mounts [CIFS] Don't cache timestamps on utimes due to coarse granularity [CIFS] Maximum username length check in session setup does not match cifs: fix length calculation for converted unicode readdir names [CIFS] Add support for TCP_NODELAY
2010-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits) drivers/net: Correct NULL test MAINTAINERS: networking drivers - Add git net-next tree net/sched: Fix module name in Kconfig cxgb3: fix GRO checksum check dst: call cond_resched() in dst_gc_task() netfilter: nf_conntrack: fix hash resizing with namespaces netfilter: xtables: compat out of scope fix netfilter: nf_conntrack: restrict runtime expect hashsize modifications netfilter: nf_conntrack: per netns nf_conntrack_cachep netfilter: nf_conntrack: fix memory corruption with multiple namespaces Bluetooth: Keep a copy of each HID device's report descriptor pktgen: Fix freezing problem igb: make certain to reassign legacy interrupt vectors after reset irda: add missing BKL in irnet_ppp ioctl irda: unbalanced lock_kernel in irnet_ppp ixgbe: Fix return of invalid txq ixgbe: Fix ixgbe_tx_map error path netxen: protect resource cleanup by rtnl lock netxen: fix tx timeout recovery for NX2031 chip Bluetooth: Enter active mode before establishing a SCO link. ...
2010-02-10powerpc: Fix address masking bug in hpte_need_flush()David Gibson
Commit f71dc176aa06359681c30ba6877ffccab6fba3a6 'Make hpte_need_flush() correctly mask for multiple page sizes' introduced bug, which is triggered when a kernel with a 64k base page size is run on a system whose hardware does not 64k hash PTEs. In this case, we emulate 64k pages with multiple 4k hash PTEs, however in hpte_need_flush() we incorrectly only mask the hardware page size from the address, instead of the logical page size. This causes things to go wrong when we later attempt to iterate through the hardware subpages of the logical page. This patch corrects the error. It has been tested on pSeries bare metal by Michael Neuling. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-09Merge branch 'for-linus' of git://neil.brown.name/mdLinus Torvalds
* 'for-linus' of git://neil.brown.name/md: md: fix some lockdep issues between md and sysfs. md: fix 'degraded' calculation when starting a reshape.
2010-02-10md: fix some lockdep issues between md and sysfs.NeilBrown
====== This fix is related to http://bugzilla.kernel.org/show_bug.cgi?id=15142 but does not address that exact issue. ====== sysfs does like attributes being removed while they are being accessed (i.e. read or written) and waits for the access to complete. As accessing some md attributes takes the same lock that is held while removing those attributes a deadlock can occur. This patch addresses 3 issues in md that could lead to this deadlock. Two relate to calling flush_scheduled_work while the lock is held. This is probably a bad idea in general and as we use schedule_work to delete various sysfs objects it is particularly bad. In one case flush_scheduled_work is called from md_alloc (called by md_probe) called from do_md_run which holds the lock. This call is only present to ensure that ->gendisk is set. However we can be sure that gendisk is always set (though possibly we couldn't when that code was originally written. This is because do_md_run is called in three different contexts: 1/ from md_ioctl. This requires that md_open has succeeded, and it fails if ->gendisk is not set. 2/ from writing a sysfs attribute. This can only happen if the mddev has been registered in sysfs which happens in md_alloc after ->gendisk has been set. 3/ from autorun_array which is only called by autorun_devices, which checks for ->gendisk to be set before calling autorun_array. So the call to md_probe in do_md_run can be removed, and the check on ->gendisk can also go. In the other case flush_scheduled_work is being called in do_md_stop, purportedly to wait for all md_delayed_delete calls (which delete the component rdevs) to complete. However there really isn't any need to wait for them - they have already been disconnected in all important ways. The third issue is that raid5->stop() removes some attribute names while the lock is held. There is already some infrastructure in place to delay attribute removal until after the lock is released (using schedule_work). So extend that infrastructure to remove the raid5_attrs_group. This does not address all lockdep issues related to the sysfs "s_active" lock. The rest can be address by splitting that lockdep context between symlinks and non-symlinks which hopefully will happen. Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix p9_client_destroy unconditional calling v9fs_put_trans 9p: fix memory leak in v9fs_parse_options() 9p: Fix the kernel crash on a failed mount 9p: fix option parsing 9p: Include fsync support for 9p client net/9p: fix statsize inside twstat net/9p: fail when user specifies a transport which we can't find net/9p: fix virtio transport to correctly update status on connect
2010-02-09KVM: PIT: control word is write-onlyMarcelo Tosatti
PIT control word (address 0x43) is write-only, reads are undefined. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09kvmclock: count total_sleep_time when updating guest clockJason Wang
Current kvm wallclock does not consider the total_sleep_time which could cause wrong wallclock in guest after host suspend/resume. This patch solve this issue by counting total_sleep_time to get the correct host boot time. Cc: stable@kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09Export the symbol of getboottime and mmonotonic_to_bootbasedJason Wang
Export getboottime and monotonic_to_bootbased in order to let them could be used by following patch. Cc: stable@kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09[S390] Fix struct _lowcore layout.Heiko Carstens
Offsets and sizes are wrong for 32 bit. Got broken with 866ba284 "[S390] cleanup lowcore.h". Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-09[S390] qdio: prevent call trace if CHPID is offlineJan Glauber
If a CHPID is offline during a device shutdown the ccw_device_halt|clear may fail and the qdio device stays in state STOPPED until the shutdown is finished. If an interrupt occurs before the device is set to INACTIVE the STOPPED state triggers a WARN_ON in the interrupt handler. Prevent this WARN_ON by catching the STOPPED state in the interrupt handler. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-09[S390] qdio: continue polling for buffer state ERRORUrsula Braun
Inbound traffic handling may hang if next buffer to check is in state ERROR, polling is stopped and the final check for further available inbound buffers disregards buffers in state ERROR. This patch includes state ERROR when checking availability of more inbound buffers. Cc: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-08Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
2010-02-08drivers/net: Correct NULL testJulia Lawall
Test the value that was just allocated rather than the previously tested one. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ expression *x; expression e; identifier l; @@ if (x == NULL || ...) { ... when forall return ...; } ... when != goto l; when != x = e when != &x *x == NULL // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08MAINTAINERS: networking drivers - Add git net-next treeJoe Perches
During the rc period, patches that are not bugfixes should be done using the net-next tree. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08net/sched: Fix module name in KconfigJan Luebbe
The action modules have been prefixed with 'act_', but the Kconfig description was not changed. Signed-off-by: Jan Luebbe <jluebbe@debian.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08cxgb3: fix GRO checksum checkDivy Le Ray
Verify the HW checksum state for frames handed to GRO processing. Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-09md: fix 'degraded' calculation when starting a reshape.NeilBrown
This code was written long ago when it was not possible to reshape a degraded array. Now it is so the current level of degraded-ness needs to be taken in to account. Also newly addded devices should only reduce degradedness if they are deemed to be in-sync. In particular, if you convert a RAID5 to a RAID6, and increase the number of devices at the same time, then the 5->6 conversion will make the array degraded so the current code will produce a wrong value for 'degraded' - "-1" to be precise. If the reshape runs to completion end_reshape will calculate a correct new value for 'degraded', but if a device fails during the reshape an incorrect decision might be made based on the incorrect value of "degraded". This patch is suitable for 2.6.32-stable and if they are still open, 2.6.31-stable and 2.6.30-stable as well. Cc: stable@kernel.org Reported-by: Michael Evans <mjevans1983@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-08Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: Revert "nfsd4: fix error return when pseudoroot missing"
2010-02-089p: fix p9_client_destroy unconditional calling v9fs_put_transEric Van Hensbergen
restructure client create code to handle error cases better and only cleanup initialized portions of the stack. Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/cluster: Make o2net connect messages KERN_NOTICE ocfs2/dlm: Fix printing of lockname ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map() ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node ocfs2: Plugs race between the dc thread and an unlock ast message ocfs2: Remove overzealous BUG_ON during blocked lock processing ocfs2: Do not downconvert if the lock level is already compatible ocfs2: Prevent a livelock in dlmglue ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast ocfs2: Use compat_ptr in reflink_arguments. ocfs2/dlm: Handle EAGAIN for compatibility - v2 ocfs2: Add parenthesis to wrap the check for O_DIRECT. ocfs2: Only bug out when page size is larger than cluster size. ocfs2: Fix memory overflow in cow_by_page. ocfs2/dlm: Print more messages during lock migration ocfs2/dlm: Ignore LVBs of locks in the Blocked list ocfs2/trivial: Remove trailing whitespaces ocfs2: fix a misleading variable name ocfs2: Sync max_inline_data_with_xattr from tools. ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
2010-02-089p: fix memory leak in v9fs_parse_options()Eric Van Hensbergen
If match_strdup() fail this function exits without freeing the options string. Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com> Sigend-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-089p: Fix the kernel crash on a failed mountAneesh Kumar K.V
The patch fix the crash repoted below [ 15.149907] BUG: unable to handle kernel NULL pointer dereference at 00000001 [ 15.150806] IP: [<c140b886>] p9_virtio_close+0x18/0x24 ..... .... [ 15.150806] Call Trace: [ 15.150806] [<c1408e78>] ? p9_client_destroy+0x3f/0x163 [ 15.150806] [<c1409342>] ? p9_client_create+0x25f/0x270 [ 15.150806] [<c1063b72>] ? trace_hardirqs_on+0xb/0xd [ 15.150806] [<c11ed4e8>] ? match_token+0x64/0x164 [ 15.150806] [<c1175e8d>] ? v9fs_session_init+0x2f1/0x3c8 [ 15.150806] [<c109cfc9>] ? kmem_cache_alloc+0x98/0xb8 [ 15.150806] [<c1063b72>] ? trace_hardirqs_on+0xb/0xd [ 15.150806] [<c1173dd1>] ? v9fs_get_sb+0x47/0x1e8 [ 15.150806] [<c1173dea>] ? v9fs_get_sb+0x60/0x1e8 [ 15.150806] [<c10a2e77>] ? vfs_kern_mount+0x81/0x11a [ 15.150806] [<c10a2f55>] ? do_kern_mount+0x33/0xbe [ 15.150806] [<c10b40b9>] ? do_mount+0x654/0x6b3 [ 15.150806] [<c1038949>] ? do_page_fault+0x0/0x284 [ 15.150806] [<c10b28ec>] ? copy_mount_options+0x73/0xd2 [ 15.150806] [<c10b4179>] ? sys_mount+0x61/0x94 [ 15.150806] [<c14284e9>] ? syscall_call+0x7/0xb .... [ 15.203562] ---[ end trace 1dd159357709eb4b ]--- [ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08dst: call cond_resched() in dst_gc_task()Eric Dumazet
Kernel bugzilla #15239 On some workloads, it is quite possible to get a huge dst list to process in dst_gc_task(), and trigger soft lockup detection. Fix is to call cond_resched(), as we run in process context. Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-089p: fix option parsingEric Van Hensbergen
Options pointer is being moved before calling kfree() which seems to cause problems. This uses a separate pointer to track and free original allocation. Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
2010-02-089p: Include fsync support for 9p clientM. Mohan Kumar
Implement the fsync in the client side by marking stat field values to 'don't touch' so that server may interpret it as a request to guarantee that the contents of the associated file are committed to stable storage before the Rwstat message is returned. Without this patch, calling fsync on a 9p file results in "Invalid argument" error. Please check the attached C program. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Acked-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Fix ondemand to not request targets outside policy limits [CPUFREQ] Fix use after free of struct powernow_k8_data [CPUFREQ] fix default value for ondemand governor
2010-02-08ocfs2/cluster: Make o2net connect messages KERN_NOTICESunil Mushran
Connect and disconnect messages are more than informational as they are required during root cause analysis for failures. This patch changes them from KERN_INFO to KERN_NOTICE. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Faseh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08ocfs2/dlm: Fix printing of locknameSunil Mushran
The debug call printing the name of the lock resource was chopping off the last character. This patch fixes the problem. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08Revert "nfsd4: fix error return when pseudoroot missing"J. Bruce Fields
Commit f39bde24b275ddc45d fixed the error return from PUTROOTFH in the case where there is no pseudofilesystem. This is really a case we shouldn't hit on a correctly configured server: in the absence of a root filehandle, there's no point accepting version 4 NFS rpc calls at all. But the shared responsibility between kernel and userspace here means the kernel on its own can't eliminate the possiblity of this happening. And we have indeed gotten this wrong in distro's, so new client-side mount code that attempts to negotiate v4 by default first has to work around this case. Therefore when commit f39bde24b275ddc45d arrived at roughly the same time as the new v4-default mount code, which explicitly checked only for the previous error, the result was previously fine mounts suddenly failing. We'll fix both sides for now: revert the error change, and make the client-side mount workaround more robust. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-08net/9p: fix statsize inside twstatEric Van Hensbergen
stat structures contain a size prefix. In our twstat messages we were including the size of the size prefix in the prefix, which is not what the protocol wants, and Inferno servers would complain. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08net/9p: fail when user specifies a transport which we can't findEric Van Hensbergen
If the user specifies a transport and we can't find it, we failed back to the default trainsport silently. This patch will make the code complain more loudly and return an error code. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08net/9p: fix virtio transport to correctly update status on connectEric Van Hensbergen
The 9p virtio transport was not updating its connection status correctly preventing it from being able to mount the server. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08netfilter: nf_conntrack: fix hash resizing with namespacesPatrick McHardy
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instanciating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently. Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08netfilter: xtables: compat out of scope fixAlexey Dobriyan
As per C99 6.2.4(2) when temporary table data goes out of scope, the behaviour is undefined: if (compat) { struct foo tmp; ... private = &tmp; } [dereference private] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08netfilter: nf_conntrack: restrict runtime expect hashsize modificationsAlexey Dobriyan
Expectation hashtable size was simply glued to a variable with no code to rehash expectations, so it was a bug to allow writing to it. Make "expect_hashsize" readonly. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08netfilter: nf_conntrack: per netns nf_conntrack_cachepEric Dumazet
nf_conntrack_cachep is currently shared by all netns instances, but because of SLAB_DESTROY_BY_RCU special semantics, this is wrong. If we use a shared slab cache, one object can instantly flight between one hash table (netns ONE) to another one (netns TWO), and concurrent reader (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be fooled without notice, because no RCU grace period has to be observed between object freeing and its reuse. We dont have this problem with UDP/TCP slab caches because TCP/UDP hashtables are global to the machine (and each object has a pointer to its netns). If we use per netns conntrack hash tables, we also *must* use per netns conntrack slab caches, to guarantee an object can not escape from one namespace to another one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> [Patrick: added unique slab name allocation] Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08netfilter: nf_conntrack: fix memory corruption with multiple namespacesPatrick McHardy
As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked" conntrack, which is located in the data section, might be accidentally freed when a new namespace is instantiated while the untracked conntrack is attached to a skb because the reference count it re-initialized. The best fix would be to use a seperate untracked conntrack per namespace since it includes a namespace pointer. Unfortunately this is not possible without larger changes since the namespace is not easily available everywhere we need it. For now move the untracked conntrack initialization to the init_net setup function to make sure the reference count is not re-initialized and handle cleanup in the init_net cleanup function to make sure namespaces can exit properly while the untracked conntrack is in use in other namespaces. Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08Merge branch 'v4l_for_linus' of git://linuxtv.org/fixesLinus Torvalds
* 'v4l_for_linus' of git://linuxtv.org/fixes: V4L/DVB: dvb-core: fix initialization of feeds list in demux filter V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet V4L/DVB: Fix the risk of an oops at dvb_dmx_release
2010-02-08Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Invalidate dcache before enabling it
2010-02-08Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/pseries: Fix kexec regression caused by CPPR tracking