summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-07-28UBI: block: Avoid disk size integer overflowRichard Weinberger
This patch fixes the issue that on very large UBI volumes UBI block does not work correctly. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-28UBI: block: Set disk_capacity out of the mutexEzequiel Garcia
There's no need to set the disk capacity with the mutex held, so this commit takes the variable setting out of the mutex. This simplifies the disk capacity fix for very large volumes in a follow up commit. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-28UBI: block: Make ubiblock_resize return somethingEzequiel Garcia
Currently, ubiblock_resize() can fail if the device is not found in the list. This commit changes the return type, so the function can return something meaningful on error paths. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: add a log overlap assertionArtem Bityutskiy
Add an assertion which checkes that the head of the log never overlaps with the tail of the log. Suggested-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: remove unnecessary checkArtem Bityutskiy
Remove the "if (c->lhead_offs == 0)" check because is unnecessary, since at that point the log head offset is guaranteed to be zero due to the previous operation. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: remove mst_mutexArtem Bityutskiy
The 'mst_mutex' is not needed since because 'ubifs_write_master()' is only called on the mount path and commit path. The mount path is sequential and there is no parallelism, and the commit path is also serialized - there is only one commit going on at a time. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: kernel-doc warning fixFabian Frederick
s/data/timer Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBI: init_volumes: Ignore volumes with no LEBsRichard Weinberger
UBI assumes that ubi_attach_info will only contain ubi_ainf_volume structures for volumes with at least one LEB. In scanning mode this is true because UBI can nicely create a ubi_ainf_volume on demand while creating the EBA table. For fastmap this is not true, the fastmap on-flash structure has a list of all volumes, the ubi_ainf_volume structures are created from this list. So it can happen that an empty volume ends up in init_volumes(). We can easely deal with that by looking into ->leb_count too. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: replace seq_printf by seq_putsFabian Frederick
Fix checkpatch warnings: "WARNING: Prefer seq_puts to seq_printf" Andrew Morton wrote: " - puts is presumably faster - puts doesn't go rogue if you accidentally pass it a "%". - this patch actually made fs/ubifs/super.o 12 bytes smaller. Perhaps because seq_printf() is a varargs function, forcing the caller to pass args on the stack instead of in registers. " Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: replace count*size kzalloc by kcallocFabian Frederick
kcalloc manages count*sizeof overflow. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: kernel-doc warning fixFabian Frederick
No grouped argument in drop_last_node. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: fix error path in create_default_filesystem()hujianyang
In the end of 'create_default_filesystem()' we need to check the return value of 'ubifs_write_node()' to ensure that we have successfully written the 'cs_node'. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: fix spelling of "scanned"Artem Bityutskiy
Randy Dunlap pointed that we should use "scanned" instead of "scaned". This patch makes the correction. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: fix some commentsSeunghun Lee
This patch fixes some comments about return type. Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: remove useless @ecc in struct ubifs_scan_lebhujianyang
We set @ecc in ubifs_scan_leb only if leb_read returns EBADMSG and do not use it any more. This patch removes this variable and adds comments about EBADMSG handling. Artem: re-phrase commentaries Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: remove useless statementshujianyang
This patch removes useless and duplicate statements. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: Add missing break statements in dbg_chk_pnode()hujianyang
This is a minor fix. These two branches in 'dbg_chk_pnode()' are dealing with different conditions. Although there is no fault in current state, I think adding "break"s in each end of branch is better. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-19UBIFS: fix error handling in dump_lpt_leb()hujianyang
This patch checks the return value of 'ubifs_unpack_nnode()'. If this function returns an error, 'nnode' may not be initialized, so just print an error message and break. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-18Merge tag 'upstream-3.16-rc6' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI fixes from Artem Bityutskiy: "Two UBI fastmap-related fixes for v3.16: - fix UBI fastmap support which we broke in 3.16-rc1 by reversing the volumes RB-tree sorting criteria. - make sure that we scrub all PEBs where we see bit-flips - we were missing some of them when the fastmap feature was enabled" * tag 'upstream-3.16-rc6' of git://git.infradead.org/linux-ubifs: UBI: fastmap: do not miss bit-flips UBI: fix the volumes tree sorting criteria
2014-07-18Merge tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs fixes from Dave Chinner: "Fixes for low memory perforamnce regressions and a quota inode handling regression. These are regression fixes for issues recently introduced - the change in the stack switch location is fairly important, so I've held off sending this update until I was sure that it still addresses the stack usage problem the original solved. So while the commits in the xfs tree are recent, it has been under tested for several weeks now" * tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs: xfs: null unused quota inodes when quota is on xfs: refine the allocation stack switch Revert "xfs: block allocation work needs to be kswapd aware"
2014-07-17Merge tag 'stable/for-linus-3.16-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from Konrad Rzeszutek Wilk: "Two fixes found during migration of PV guests. David would be the one doing this pull but he is on vacation. Fixes: - fix console deadlock when resuming PV guests - fix regression hit when ballooning and resuming PV guests" * tag 'stable/for-linus-3.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: set ballooned out pages as invalid in p2m xen/manage: fix potential deadlock when resuming the console
2014-07-17Merge tag 'trace-fixes-v3.16-rc5-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few more fixes for ftrace infrastructure. I was cleaning out my INBOX and found two fixes from zhangwei from a year ago that were lost in my mail. These fix an inconsistency between trace_puts() and the way trace_printk() works. The reason this is important to fix is because when trace_printk() doesn't have any arguments, it turns into a trace_puts(). Not being able to enable a stack trace against trace_printk() because it does not have any arguments is quite confusing. Also, the fix is rather trivial and low risk. While porting some changes to PowerPC I discovered that it still has the function graph tracer filter bug that if you also enable stack tracing the function graph tracer filter is ignored. I fixed that up. Finally, Martin Lau, fixed a bug that would cause readers of the ftrace ring buffer to block forever even though it was suppose to be NONBLOCK" This also includes the fix from an earlier pull request: "Oleg Nesterov fixed a memory leak that happens if a user creates a tracing instance, sets up a filter in an event, and then removes that instance. The filter allocates memory that is never freed when the instance is destroyed" * tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Fix polling on trace_pipe tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs tracing: Fix graph tracer with stack tracer on other archs tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs tracing: instance_rmdir() leaks ftrace_event_file->filter
2014-07-16Merge tag 'for-linus-20140716' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD fixes from Brian Norris: - Fix ELM suspend/resume - Reduce warnings if NAND ECC is too weak - Add CFI support for Sharp LH28F640BF NOR The last fix is coming in because other commits in the 3.16 cycle depended on this support. * tag 'for-linus-20140716' of git://git.infradead.org/linux-mtd: mtd: cfi_cmdset_0001.c: add support for Sharp LH28F640BF NOR mtd: nand: reduce the warning noise when the ECC is too weak mtd: devices: elm: fix elm_context_save() and elm_context_restore() functions
2014-07-16Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A cpufreq lockup fix and a compiler warning fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix compiler warnings x86, tsc: Fix cpufreq lockup
2014-07-16Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Tooling fixes and an Intel PMU driver fixlet" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Do not allow optimized switch for non-cloned events perf/x86/intel: ignore CondChgd bit to avoid false NMI handling perf symbols: Get kernel start address by symbol name perf tools: Fix segfault in cumulative.callchain report
2014-07-16Merge tag 'sound-3.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Things seem to calm down so far, just a small few HD-audio fixes (regression fixes and a new codec ID addition) popping up" * tag 'sound-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix broken PM due to incomplete i915 initialization ALSA: hda - Revert stream assignment order for Intel controllers ALSA: hda - Add new GPU codec ID 0x10de0070 to snd-hda ALSA: hda: Fix build warning
2014-07-16UBI: fastmap: do not miss bit-flipsBrian Norris
The return value from 'ubi_io_read_ec_hdr()' was stored in 'err', not in 'ret'. This fix makes sure Fastmap-enabled UBI does not miss bit-flip while reading EC headers, events and scrubs the affected PEBs. This issue was reported by Coverity Scan. Artem: improved the commit message. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-15Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fix from Jan Kara: "Fix locking of dquot shrinker" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: missing lock in dqcache_shrink_scan()
2014-07-15Merge tag 'gpio-v3.16-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fix from Linus Walleij: "Fix up some merge confusion from the merge window" * tag 'gpio-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: mcp23s08: Eliminates redundant checking.
2014-07-15ring-buffer: Fix polling on trace_pipeMartin Lau
ring_buffer_poll_wait() should always put the poll_table to its wait_queue even there is immediate data available. Otherwise, the following epoll and read sequence will eventually hang forever: 1. Put some data to make the trace_pipe ring_buffer read ready first 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) 3. epoll_wait() 4. read(trace_pipe_fd) till EAGAIN 5. Add some more data to the trace_pipe ring_buffer 6. epoll_wait() -> this epoll_wait() will block forever ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, ring_buffer_poll_wait() returns immediately without adding poll_table, which has poll_table->_qproc pointing to ep_poll_callback(), to its wait_queue. ~ During the epoll_wait() call in step 3 and step 6, ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue because the poll_table->_qproc is NULL and it is how epoll works. ~ When there is new data available in step 6, ring_buffer does not know it has to call ep_poll_callback() because it is not in its wait queue. Hence, block forever. Other poll implementation seems to call poll_wait() unconditionally as the very first thing to do. For example, tcp_poll() in tcp.c. Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com Cc: stable@vger.kernel.org # 2.6.27 Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled" Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Martin Lau <kafai@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15quota: missing lock in dqcache_shrink_scan()Niu Yawei
Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API) accidentally removed locking from quota shrinker. Fix it - dqcache_shrink_scan() should use dq_list_lock to protect the scan on free_dquots list. CC: stable@vger.kernel.org Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputszhangwei(Jovi)
The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing, so add it, to be consistent with __trace_printk/__trace_bprintk. Those functions are all called by the same function: trace_printk(). Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This contains miscellaneous fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: replace count*size kzalloc by kcalloc fuse: release temporary page if fuse_writepage_locked() failed fuse: restructure ->rename2() fuse: avoid scheduling while atomic fuse: handle large user and group ID fuse: inode: drop cast fuse: ignore entry-timeout on LOOKUP_REVAL fuse: timeout comparison fix
2014-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Bluetooth pairing fixes from Johan Hedberg. 2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB, from Max Stepanov. 3) New iwlwifi chip IDs, from Oren Givon. 4) bnx2x driver reads wrong PCI config space MSI register, from Yijing Wang. 5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu. 6) Fix double SKB free in openvswitch, from Andy Zhou. 7) Fix sk_dst_set() being racey with UDP sockets, leading to strange crashes, from Eric Dumazet. 8) Interpret the NAPI budget correctly in the new systemport driver, from Florian Fainelli. 9) VLAN code frees percpu stats in the wrong place, leading to crashes in the get stats handler. From Eric Dumazet. 10) TCP sockets doing a repair can crash with a divide by zero, because we invoke tcp_push() with an MSS value of zero. Just skip that part of the sendmsg paths in repair mode. From Christoph Paasch. 11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai. 12) Don't ignore path MTU icmp messages with a zero mtu, machines out there still spit them out, and all of our per-protocol handlers for PMTU can cope with it just fine. From Edward Allcutt. 13) Some NETDEV_CHANGE notifier invocations were not passing in the correct kind of cookie as the argument, from Loic Prylli. 14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul Maloy. 15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix from Dmitry Popov. 16) Fix skb->sk assigned without taking a reference to 'sk' in appletalk, from Andrey Utkin. 17) Fix some info leaks in ULP event signalling to userspace in SCTP, from Daniel Borkmann. 18) Fix deadlocks in HSO driver, from Olivier Sobrie. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) hso: fix deadlock when receiving bursts of data hso: remove unused workqueue net: ppp: don't call sk_chk_filter twice mlx4: mark napi id for gro_skb bonding: fix ad_select module param check net: pppoe: use correct channel MTU when using Multilink PPP neigh: sysctl - simplify address calculation of gc_* variables net: sctp: fix information leaks in ulpevent layer MAINTAINERS: update r8169 maintainer net: bcmgenet: fix RGMII_MODE_EN bit tipc: clear 'next'-pointer of message fragments before reassembly r8152: fix r8152_csum_workaround function be2net: set EQ DB clear-intr bit in be_open() GRE: enable offloads for GRE farsync: fix invalid memory accesses in fst_add_one() and fst_init_card() igb: do a reset on SR-IOV re-init if device is down igb: Workaround for i210 Errata 25: Slow System Clock usbnet: smsc95xx: add reset_resume function with reset operation dp83640: Always decode received status frames r8169: disable L23 ...
2014-07-15tracing: Fix graph tracer with stack tracer on other archsSteven Rostedt (Red Hat)
Running my ftrace tests on PowerPC, it failed the test that checks if function_graph tracer is affected by the stack tracer. It was. Looking into this, I found that the update_function_graph_func() must be called even if the trampoline function is not changed. This is because archs like PowerPC do not support ftrace_ops being passed by assembly and instead uses a helper function (what the trampoline function points to). Since this function is not changed even when multiple ftrace_ops are added to the code, the test that falls out before calling update_function_graph_func() will miss that the update must still be done. Call update_function_graph_function() for all calls to update_ftrace_function() Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputszhangwei(Jovi)
Currently trace option stacktrace is not applicable for trace_printk with constant string argument, the reason is in __trace_puts/__trace_bputs ftrace_trace_stack is missing. In contrast, when using trace_printk with non constant string argument(will call into __trace_printk/__trace_bprintk), then trace option stacktrace is workable, this inconstant result will confuses users a lot. Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15ALSA: hda - Fix broken PM due to incomplete i915 initializationTakashi Iwai
When the initialization of Intel HDMI controller fails due to missing i915 kernel symbols (e.g. HD-audio is built in while i915 is module), the driver discontinues the probe. However, since the probe was done asynchronously, the driver object still remains, thus the relevant PM ops are still called at suspend/resume. This results in the bad access to the incomplete audio card object, eventually leads to Oops or stall at PM. This patch adds the missing checks of chip->init_failed flag at each PM callback in order to fix the problem above. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-07-14hso: fix deadlock when receiving bursts of dataOlivier Sobrie
When the module sends bursts of data, sometimes a deadlock happens in the hso driver when the tty buffer doesn't get the chance to be flushed quickly enough. Remove the endless while loop in function put_rxbuf_data() which is called by the urb completion handler. If there isn't enough room in the tty buffer, discards all the data received in the URB. Cc: David Miller <davem@davemloft.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Dan Williams <dcbw@redhat.com> Cc: Jan Dumon <j.dumon@option.com> Signed-off-by: Olivier Sobrie <olivier@sobrie.be> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14hso: remove unused workqueueOlivier Sobrie
The workqueue "retry_unthrottle_workqueue" is not scheduled anywhere in the code. So, remove it. Signed-off-by: Olivier Sobrie <olivier@sobrie.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14Merge tag 'firewire-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Stefan Richter: "The 1394 drivers cannot and are not supposed to be built on platforms which don't provide the DMA mapping API (regression since v3.16-rc1 with CONFIG_COMPILE_TEST=y on some architectures)" * tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: IEEE 1394 (FireWire) support should depend on HAS_DMA
2014-07-14Merge git://git.kvack.org/~bcrl/aio-fixesLinus Torvalds
Pull another aio fix from Ben LaHaise: "put_reqs_available() can now be called from within irq context, which means that it (and its sibling function get_reqs_available()) now need to be irq-safe, not just preempt-safe" * git://git.kvack.org/~bcrl/aio-fixes: aio: protect reqs_available updates from changes in interrupt handlers
2014-07-14net/l2tp: don't fall back on UDP [get|set]sockoptSasha Levin
The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket. As David Miller points out: "If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be" Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL. Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: James Chapman <jchapman@katalix.com> Acked-by: David Miller <davem@davemloft.net> Cc: Phil Turnbull <phil.turnbull@oracle.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-14net: ppp: don't call sk_chk_filter twiceChristoph Schulz
Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") causes sk_chk_filter() to be called twice when setting a PPP pass or active filter. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from within get_filter(). The second one is through the call chain ppp_ioctl() or isdn_ppp_ioctl() --> sk_unattached_filter_create() --> __sk_prepare_filter() --> sk_chk_filter() The first call from within get_filter() should be deleted as get_filter() is called just before calling sk_unattached_filter_create() later on, which eventually calls sk_chk_filter() anyway. For 3.15.x, this proposed change is a bugfix rather than a pure optimization as in that branch, sk_chk_filter() may replace filter codes by other codes which are not recognized when executing sk_chk_filter() a second time. So with 3.15.x, if sk_chk_filter() is called twice, the second invocation may yield EINVAL (this depends on the filter codes found in the filter to be set, but because the replacement is done for frequently used codes, this is almost always the case). The net effect is that setting pass and/or active PPP filters does not work anymore, since sk_unattached_filter_create() always returns EINVAL due to the second call to sk_chk_filter(), regardless whether the filter was originally sane or not. Signed-off-by: Christoph Schulz <develop@kristov.de> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14mlx4: mark napi id for gro_skbJason Wang
Napi id was not marked for gro_skb, this will lead rx busy loop won't work correctly since they stack never try to call low latency receive method because of a zero socket napi id. Fix this by marking napi id for gro_skb. The transaction rate of 1 byte netperf tcp_rr gets about 50% increased (from 20531.68 to 30610.88). Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14bonding: fix ad_select module param checkNikolay Aleksandrov
Obvious copy/paste error when I converted the ad_select to the new option API. "lacp_rate" there should be "ad_select" so we can get the proper value. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option API") Reported-by: Karim Scheik <karim.scheik@prisma-solutions.at> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14net: pppoe: use correct channel MTU when using Multilink PPPChristoph Schulz
The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see ppp_generic module) tries to determine how big a fragment might be. According to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the corresponding comment and code in ppp_mp_explode(): /* * hdrlen includes the 2-byte PPP protocol field, but the * MTU counts only the payload excluding the protocol field. * (RFC1661 Section 2) */ mtu = pch->chan->mtu - (hdrlen - 2); However, the pppoe module *does* include the PPP protocol field in the channel MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under certain circumstances (one byte if PPP protocol compression is used, two otherwise), causing the generated Ethernet packets to be dropped. So the pppoe module has to subtract two bytes from the channel MTU. This error only manifests itself when using Multilink PPP, as otherwise the channel MTU is not used anywhere. In the following, I will describe how to reproduce this bug. We configure two pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is computed by adding the two link MTUs and subtracting the MP header twice, which is 4 bytes long.) The necessary pppd statements on both sides are "multilink mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server side, we additionally need to start two pppoe-server instances to be able to establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU of the PPP network interface to the MRRU (2976) on both sides of the connection in order to make use of the higher bandwidth. (If we didn't do that, IP fragmentation would kick in, which we want to avoid.) Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to server over the PPP link. This results in the following network packet: 2948 (echo payload) + 8 (ICMPv4 header) + 20 (IPv4 header) --------------------- 2976 (PPP payload) These 2976 bytes do not exceed the MTU of the PPP network interface, so the IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode() prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger than the negotiated MRRU. So this packet would have to be divided in three fragments. But this does not happen as each link MTU is assumed to be two bytes larger. So this packet is diveded into two fragments only, one of size 1489 and one of size 1488. Now we have for that bigger fragment: 1489 (PPP payload) + 4 (MP header) + 2 (PPP protocol field for the MP payload (0x3d)) + 6 (PPPoE header) -------------------------- 1501 (Ethernet payload) This packet exceeds the link MTU and is discarded. If one configures the link MTU on the client side to 1501, one can see the discarded Ethernet frames with tcpdump running on the client. A ping -s 2948 -c 1 192.168.15.254 leads to the smaller fragment that is correctly received on the server side: (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d) 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000, Flags [end], length 1492 and to the bigger fragment that is not received on the server side: (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d) 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000, Flags [begin], length 1493 With the patch below, we correctly obtain three fragments: 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [begin], length 1492 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [none], length 1492 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000, Flags [end], length 5 And the ICMPv4 echo request is successfully received at the server side: IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1), length 2976) 192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0, length 2956 The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698 ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies to 3.10 upwards but the fix can be applied (with minor modifications) to kernels as old as 2.6.32. Signed-off-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14neigh: sysctl - simplify address calculation of gc_* variablesMathias Krause
The code in neigh_sysctl_register() relies on a specific layout of struct neigh_table, namely that the 'gc_*' variables are directly following the 'parms' member in a specific order. The code, though, expresses this in the most ugly way. Get rid of the ugly casts and use the 'tbl' pointer to get a handle to the table. This way we can refer to the 'gc_*' variables directly. Similarly seen in the grsecurity patch, written by Brad Spengler. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15xfs: null unused quota inodes when quota is onDave Chinner
When quota is on, it is expected that unused quota inodes have a value of NULLFSINO. The changes to support a separate project quota in 3.12 broken this rule for non-project quota inode enabled filesystem, as the code now refuses to write the group quota inode if neither group or project quotas are enabled. This regression was introduced by commit d892d58 ("xfs: Start using pquotaino from the superblock"). In this case, we should be writing NULLFSINO rather than nothing to ensure that we leave the group quota inode in a valid state while quotas are enabled. Failure to do so doesn't cause a current kernel to break - the separate project quota inodes introduced translation code to always treat a zero inode as NULLFSINO. This was introduced by commit 0102629 ("xfs: Initialize all quota inodes to be NULLFSINO") with is also in 3.12 but older kernels do not do this and hence taking a filesystem back to an older kernel can result in quotas failing initialisation at mount time. When that happens, we see this in dmesg: [ 1649.215390] XFS (sdb): Mounting Filesystem [ 1649.316894] XFS (sdb): Failed to initialize disk quotas. [ 1649.316902] XFS (sdb): Ending clean mount By ensuring that we write NULLFSINO to quota inodes that aren't active, we avoid this problem. We have to be really careful when determining if the quota inodes are active or not, because we don't want to write a NULLFSINO if the quota inodes are active and we simply aren't updating them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-14net: sctp: fix information leaks in ulpevent layerDaniel Borkmann
While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15xfs: refine the allocation stack switchDave Chinner
The allocation stack switch at xfs_bmapi_allocate() has served it's purpose, but is no longer a sufficient solution to the stack usage problem we have in the XFS allocation path. Whilst the kernel stack size is now 16k, that is not a valid reason for undoing all our "keep stack usage down" modifications. What it does allow us to do is have the freedom to refine and perfect the modifications knowing that if we get it wrong it won't blow up in our faces - we have a safety net now. This is important because we still have the issue of older kernels having smaller stacks and that they are still supported and are demonstrating a wide range of different stack overflows. Red Hat has several open bugs for allocation based stack overflows from directory modifications and direct IO block allocation and these problems still need to be solved. If we can solve them upstream, then distro's won't need to bake their own unique solutions. To that end, I've observed that every allocation based stack overflow report has had a specific characteristic - it has happened during or directly after a bmap btree block split. That event requires a new block to be allocated to the tree, and so we effectively stack one allocation stack on top of another, and that's when we get into trouble. A further observation is that bmap btree block splits are much rarer than writeback allocation - over a range of different workloads I've observed the ratio of bmap btree inserts to splits ranges from 100:1 (xfstests run) to 10000:1 (local VM image server with sparse files that range in the hundreds of thousands to millions of extents). Either way, bmap btree split events are much, much rarer than allocation events. Finally, we have to move the kswapd state to the allocation workqueue work when allocation is done on behalf of kswapd. This is proving to cause significant perturbation in performance under memory pressure and appears to be generating allocation deadlock warnings under some workloads, so avoiding the use of a workqueue for the majority of kswapd writeback allocation will minimise the impact of such behaviour. Hence it makes sense to move the stack switch to xfs_btree_split() and only do it for bmap btree splits. Stack switches during allocation will be much rarer, so there won't be significant performacne overhead caused by switching stacks. The worse case stack from all allocation paths will be split, not just writeback. And the majority of memory allocations will be done in the correct context (e.g. kswapd) without causing additional latency, and so we simplify the memory reclaim interactions between processes, workqueues and kswapd. The worst stack I've been able to generate with this patch in place is 5600 bytes deep. It's very revealing because we exit XFS at: 37) 1768 64 kmem_cache_alloc+0x13b/0x170 about 1800 bytes of stack consumed, and the remaining 3800 bytes (and 36 functions) is memory reclaim, swap and the IO stack. And this occurs in the inode allocation from an open(O_CREAT) syscall, not writeback. The amount of stack being used is much less than I've previously be able to generate - fs_mark testing has been able to generate stack usage of around 7k without too much trouble; with this patch it's only just getting to 5.5k. This is primarily because the metadata allocation paths (e.g. directory blocks) are no longer causing double splits on the same stack, and hence now stack tracing is showing swapping being the worst stack consumer rather than XFS. Performance of fs_mark inode create workloads is unchanged. Performance of fs_mark async fsync workloads is consistently good with context switches reduced by around 150,000/s (30%). Performance of dbench, streaming IO and postmark is unchanged. Allocation deadlock warnings have not been seen on the workloads that generated them since adding this patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>