summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-07-02Merge tag 'usb-3.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB updates from Greg KH: "Here's the big USB 3.11-rc1 merge request. Lots of gadget and finally, chipidea driver updates (they were much needed), along with a new host controller driver, lots of little serial driver fixes, the removal of the 255 usb-serial device limitation, and a variety of other minor things. All of these have been in the linux-next releases for a while" * tag 'usb-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (254 commits) usb: musb: omap2430: make it compile again usb: chipidea: ci_hdrc_imx: access phy via private data xhci: Add missing unlocks on error paths USB: option,qcserial: move Novatel Gobi1K IDs to qcserial ehci-atmel.c: prepare clk before calling enable USB: ohci-at91: prepare clk before calling enable USB: HWA: fix device probe failure wusbcore: add entries in Documentation/ABI for new wusbhc sysfs attributes wusbcore: add sysfs attribute for retry count wusbcore: add sysfs attribute for DNTS count and interval usb: chipidea: drop "13xxx" infix usb: phy: tegra: remove duplicated include from phy-tegra-usb.c usb: host: xhci-plat: release mem region while removing module usbmisc_imx: allow autoloading on according to dt ids usb: fix build error without CONFIG_USB_PHY usb: check usb_hub_to_struct_hub() return value xhci: check for failed dma pool allocation usb: gadget: f_subset: fix missing unlock on error in geth_alloc() usb: gadget: f_ncm: fix missing unlock on error in ncm_alloc() usb: gadget: f_ecm: fix missing unlock on error in ecm_alloc() ...
2013-07-02Merge tag 'fscache-20130702' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull FS-Cache updates from David Howells: "This contains a number of fixes for various FS-Cache issues plus some cleanups. The commits are, in order: 1) Provide a system wait_on_atomic_t() and wake_up_atomic_t() sharing the bit-wait table (enhancement for #8). 2) Don't put spin_lock() in a while-condition as spin_lock() may have a do {} while(0) wrapper (cleanup). 3) Symbolically name i_mutex lock classes rather than using numbers in CacheFiles (cleanup). 4) Don't sleep in page release if __GFP_FS is not set (deadlock vs ext4). 5) Uninline fscache_object_init() (cleanup for #7). 6) Wrap checks on object state (cleanup for #7). 7) Simplify the object state machine by separating work states from wait states. 8) Simplify cookie retention by objects (NULL pointer deref fix). 9) Remove unused list_to_page() macro (cleanup). 10) Make the remaining-pages counter in the retrieval op atomic (assertion failure fix). 11) Don't use spin_is_locked() in assertions (assertion failure fix)" * tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: FS-Cache: Don't use spin_is_locked() in assertions FS-Cache: The retrieval remaining-pages counter needs to be atomic_t cachefiles: remove unused macro list_to_page() FS-Cache: Simplify cookie retention for fscache_objects, fixing oops FS-Cache: Fix object state machine to have separate work and wait states FS-Cache: Wrap checks on object state FS-Cache: Uninline fscache_object_init() FS-Cache: Don't sleep in page release if __GFP_FS is not set CacheFiles: name i_mutex lock class explicitly fs/fscache: remove spin_lock() from the condition in while() Add wait_on_atomic_t() and wake_up_atomic_t()
2013-07-02Merge tag 'dlm-3.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes a number of SCTP related fixes in the dlm, and a few other minor fixes and changes." * tag 'dlm-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: Avoid LVB truncation dlm: log an error for unmanaged lockspaces dlm: config: using strlcpy instead of strncpy dlm: remove duplicated include from lowcomms.c dlm: disable nagle for SCTP dlm: retry failed SCTP sends dlm: try other IPs when sctp init assoc fails dlm: clear correct bit during sctp init failure handling dlm: set sctp assoc id during setup dlm: clear correct init bit during sctp setup
2013-07-02Merge tag 'for-f2fs-3.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches: - remount_fs callback function - restore parent inode number to enhance the fsync performance - xattr security labels - reduce the number of redundant lock/unlock data pages - avoid frequent write_inode calls The other minor bug fixes are as follows. - endian conversion bugs - various bugs in the roll-forward recovery routine" * tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (56 commits) f2fs: fix to recover i_size from roll-forward f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes() f2fs: remove reusing any prefree segments f2fs: code cleanup and simplify in func {find/add}_gc_inode f2fs: optimize the init_dirty_segmap function f2fs: fix an endian conversion bug detected by sparse f2fs: fix crc endian conversion f2fs: add remount_fs callback support f2fs: recover wrong pino after checkpoint during fsync f2fs: optimize do_write_data_page() f2fs: make locate_dirty_segment() as static f2fs: remove unnecessary parameter "offset" from __add_sum_entry() f2fs: avoid freqeunt write_inode calls f2fs: optimise the truncate_data_blocks_range() range f2fs: use the F2FS specific flags in f2fs_ioctl() f2fs: sync dir->i_size with its block allocation f2fs: fix i_blocks translation on various types of files f2fs: set sb->s_fs_info before calling parse_options() f2fs: support xattr security labels f2fs: fix iget/iput of dir during recovery ...
2013-07-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds
Pull GFS2 updates from Steven Whitehouse: "There are a few bug fixes for various, mostly very minor corner cases, plus some interesting new features. The new features include atomic_open whose main benefit will be the reduction in locking overhead in case of combined lookup/create and open operations, sorting the log buffer lists by block number to improve the efficiency of AIL writeback, and aggressively issuing revokes in gfs2_log_flush to reduce overhead when dropping glocks." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Reserve journal space for quota change in do_grow GFS2: Fix fstrim boundary conditions GFS2: fix warning message GFS2: aggressively issue revokes in gfs2_log_flush GFS2: fix regression in dir_double_exhash GFS2: Add atomic_open support GFS2: Only do one directory search on create GFS2: fix error propagation in init_threads() GFS2: Remove no-op wrapper function GFS2: Cocci spatch "ptr_ret.spatch" GFS2: Eliminate gfs2_rg_lops GFS2: Sort buffer lists by inplace block number
2013-07-02Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 update from Ted Ts'o: "Lots of bug fixes, cleanups and optimizations. In the bug fixes category, of note is a fix for on-line resizing file systems where the block size is smaller than the page size (i.e., file systems 1k blocks on x86, or more interestingly file systems with 4k blocks on Power or ia64 systems.) In the cleanup category, the ext4's punch hole implementation was significantly improved by Lukas Czerner, and now supports bigalloc file systems. In addition, Jan Kara significantly cleaned up the write submission code path. We also improved error checking and added a few sanity checks. In the optimizations category, two major optimizations deserve mention. The first is that ext4_writepages() is now used for nodelalloc and ext3 compatibility mode. This allows writes to be submitted much more efficiently as a single bio request, instead of being sent as individual 4k writes into the block layer (which then relied on the elevator code to coalesce the requests in the block queue). Secondly, the extent cache shrink mechanism, which was introduce in 3.9, no longer has a scalability bottleneck caused by the i_es_lru spinlock. Other optimizations include some changes to reduce CPU usage and to avoid issuing empty commits unnecessarily." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits) ext4: optimize starting extent in ext4_ext_rm_leaf() jbd2: invalidate handle if jbd2_journal_restart() fails ext4: translate flag bits to strings in tracepoints ext4: fix up error handling for mpage_map_and_submit_extent() jbd2: fix theoretical race in jbd2__journal_restart ext4: only zero partial blocks in ext4_zero_partial_blocks() ext4: check error return from ext4_write_inline_data_end() ext4: delete unnecessary C statements ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree() jbd2: move superblock checksum calculation to jbd2_write_superblock() ext4: pass inode pointer instead of file pointer to punch hole ext4: improve free space calculation for inline_data ext4: reduce object size when !CONFIG_PRINTK ext4: improve extent cache shrink mechanism to avoid to burn CPU time ext4: implement error handling of ext4_mb_new_preallocation() ext4: fix corruption when online resizing a fs with 1K block size ext4: delete unused variables ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents jbd2: remove debug dependency on debug_fs and update Kconfig help text jbd2: use a single printk for jbd_debug() ...
2013-07-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS patches (part 1) from Al Viro: "The major change in this pile is ->readdir() replacement with ->iterate(), dealing with ->f_pos races in ->readdir() instances for good. There's a lot more, but I'd prefer to split the pull request into several stages and this is the first obvious cutoff point." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits) [readdir] constify ->actor [readdir] ->readdir() is gone [readdir] convert ecryptfs [readdir] convert coda [readdir] convert ocfs2 [readdir] convert fatfs [readdir] convert xfs [readdir] convert btrfs [readdir] convert hostfs [readdir] convert afs [readdir] convert ncpfs [readdir] convert hfsplus [readdir] convert hfs [readdir] convert befs [readdir] convert cifs [readdir] convert freevxfs [readdir] convert fuse [readdir] convert hpfs reiserfs: switch reiserfs_readdir_dentry to inode reiserfs: is_privroot_deh() needs only directory inode, actually ...
2013-07-02sync: don't block the flusher thread waiting on IODave Chinner
When sync does it's WB_SYNC_ALL writeback, it issues data Io and then immediately waits for IO completion. This is done in the context of the flusher thread, and hence completely ties up the flusher thread for the backing device until all the dirty inodes have been synced. On filesystems that are dirtying inodes constantly and quickly, this means the flusher thread can be tied up for minutes per sync call and hence badly affect system level write IO performance as the page cache cannot be cleaned quickly. We already have a wait loop for IO completion for sync(2), so cut this out of the flusher thread and delegate it to wait_sb_inodes(). Hence we can do rapid IO submission, and then wait for it all to complete. Effect of sync on fsmark before the patch: FSUse% Count Size Files/sec App Overhead ..... 0 640000 4096 35154.6 1026984 0 720000 4096 36740.3 1023844 0 800000 4096 36184.6 916599 0 880000 4096 1282.7 1054367 0 960000 4096 3951.3 918773 0 1040000 4096 40646.2 996448 0 1120000 4096 43610.1 895647 0 1200000 4096 40333.1 921048 And a single sync pass took: real 0m52.407s user 0m0.000s sys 0m0.090s After the patch, there is no impact on fsmark results, and each individual sync(2) operation run concurrently with the same fsmark workload takes roughly 7s: real 0m6.930s user 0m0.000s sys 0m0.039s IOWs, sync is 7-8x faster on a busy filesystem and does not have an adverse impact on ongoing async data write operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-02f2fs: fix to recover i_size from roll-forwardJaegeuk Kim
If user requests many data writes and fsync together, the last updated i_size should be stored to the inode block consistently. But, previous write_end just marks the inode as dirty and doesn't update its metadata into its inode block. After that, fsync just writes the inode block with newly updated data index excluding inode metadata updates. So, this patch introduces write_end in which updates inode block too when the i_size is changed. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-02f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()Gu Zheng
As destroy_fsync_dnodes() is a simple list-cleanup func, so delete the unused and unrelated f2fs_sb_info argument of it. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-02f2fs: remove reusing any prefree segmentsJaegeuk Kim
This patch removes check_prefree_segments initially designed to enhance the performance by narrowing the range of LBA usage across the whole block device. When allocating a new segment, previous f2fs tries to find proper prefree segments, and then, if finds a segment, it reuses the segment for further data or node block allocation. However, I found that this was totally wrong approach since the prefree segments have several data or node blocks that will be used by the roll-forward mechanism operated after sudden-power-off. Let's assume the following scenario. /* write 8MB with fsync */ for (i = 0; i < 2048; i++) { offset = i * 4096; write(fd, offset, 4KB); fsync(fd); } In this case, naive segment allocation sequence will be like: data segment: x, x+1, x+2, x+3 node segment: y, y+1, y+2, y+3. But, if we can reuse prefree segments, the sequence can be like: data segment: x, x+1, y, y+1 node segment: y, y+1, y+2, y+3. Because, y, y+1, and y+2 became prefree segments one by one, and those are reused by data allocation. After conducting this workload, we should consider how to recover the latest inode with its data. If we reuse the prefree segments such as y or y+1, we lost the old node blocks so that f2fs even cannot start roll-forward recovery. Therefore, I suggest that we should remove reusing prefree segments. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-02f2fs: code cleanup and simplify in func {find/add}_gc_inodeGu Zheng
This patch simplifies list operations in find_gc_inode and add_gc_inode. Just simple code cleanup. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: add description] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-02f2fs: optimize the init_dirty_segmap functionNamjae Jeon
Optimize the while loop condition Since this condition will always be true and while loop will be terminated by the following condition in code: if (segno >= TOTAL_SEGS(sbi)) break; Hence we can replace the while loop condition with while(1) instead of always checking for segno to be less than Total segs. Also we do not need to use TOTAL_SEGS() everytime. We can store this value in a local variable since this value is constant. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-02f2fs: fix an endian conversion bug detected by sparseJaegeuk Kim
This patch should fix the following bug reported by kbuild test robot. fs/f2fs/recovery.c:233:33: sparse: incorrect type in assignment (different base types) parse warnings: (new ones prefixed by >>) >> recovery.c:233: sparse: incorrect type in assignment (different base types) recovery.c:233: expected unsigned int [unsigned] [assigned] ofs_in_node recovery.c:233: got restricted __le16 [assigned] [usertype] ofs_in_node >> recovery.c:238: sparse: incorrect type in assignment (different base types) recovery.c:238: expected unsigned int [unsigned] ofs_in_node recovery.c:238: got restricted __le16 [assigned] [usertype] ofs_in_node Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-02f2fs: fix crc endian conversionJaegeuk Kim
While calculating CRC for the checkpoint block, we use __u32, but when storing the crc value to the disk, we use __le32. Let's fix the inconsistency. Reported-and-Tested-by: Oded Gabbay <ogabbay@advaoptical.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-01ext4: optimize starting extent in ext4_ext_rm_leaf()Ashish Sangwan
Both hole punch and truncate use ext4_ext_rm_leaf() for removing blocks. Currently we choose the last extent as the starting point for removing blocks: ex = EXT_LAST_EXTENT(eh); This is OK for truncate but for hole punch we can optimize the extent selection as the path is already initialized. We could use this information to select proper starting extent. The code change in this patch will not affect truncate as for truncate path[depth].p_ext will always be NULL. Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01jbd2: invalidate handle if jbd2_journal_restart() failsTheodore Ts'o
If jbd2_journal_restart() fails the handle will have been disconnected from the current transaction. In this situation, the handle must not be used for for any jbd2 function other than jbd2_journal_stop(). Enforce this with by treating a handle which has a NULL transaction pointer as an aborted handle, and issue a kernel warning if jbd2_journal_extent(), jbd2_journal_get_write_access(), jbd2_journal_dirty_metadata(), etc. is called with an invalid handle. This commit also fixes a bug where jbd2_journal_stop() would trip over a kernel jbd2 assertion check when trying to free an invalid handle. Also move the responsibility of setting current->journal_info to start_this_handle(), simplifying the three users of this function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Younger Liu <younger.liu@huawei.com> Cc: Jan Kara <jack@suse.cz>
2013-07-01ext4: translate flag bits to strings in tracepointsTheodore Ts'o
Translate the bitfields used in various flags argument to strings to make the tracepoint output more human-readable. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: fix up error handling for mpage_map_and_submit_extent()Theodore Ts'o
The function mpage_released_unused_page() must only be called once; otherwise the kernel will BUG() when the second call to mpage_released_unused_page() tries to unlock the pages which had been unlocked by the first call. Also restructure the error handling so that we only give up on writing the dirty pages in the case of ENOSPC where retrying the allocation won't help. Otherwise, a transient failure, such as a kmalloc() failure in calling ext4_map_blocks() might cause us to give up on those pages, leading to a scary message in /var/log/messages plus data loss. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-07-01jbd2: fix theoretical race in jbd2__journal_restartTheodore Ts'o
Once we decrement transaction->t_updates, if this is the last handle holding the transaction from closing, and once we release the t_handle_lock spinlock, it's possible for the transaction to commit and be released. In practice with normal kernels, this probably won't happen, since the commit happens in a separate kernel thread and it's unlikely this could all happen within the space of a few CPU cycles. On the other hand, with a real-time kernel, this could potentially happen, so save the tid found in transaction->t_tid before we release t_handle_lock. It would require an insane configuration, such as one where the jbd2 thread was set to a very high real-time priority, perhaps because a high priority real-time thread is trying to read or write to a file system. But some people who use real-time kernels have been known to do insane things, including controlling laser-wielding industrial robots. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-07-01ext4: only zero partial blocks in ext4_zero_partial_blocks()Lukas Czerner
Currently if we pass range into ext4_zero_partial_blocks() which covers entire block we would attempt to zero it even though we should only zero unaligned part of the block. Fix this by checking whether the range covers the whole block skip zeroing if so. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: check error return from ext4_write_inline_data_end()Theodore Ts'o
The function ext4_write_inline_data_end() can return an error. So we need to assign it to a signed integer variable to check for an error return (since copied is an unsigned int). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
2013-07-01ext4: delete unnecessary C statementsjon ernst
Comparing unsigned variable with 0 always returns false. err = 0 is duplicated and unnecessary. [ tytso: Also cleaned up error handling in ext4_block_zero_page_range() ] Signed-off-by: "Jon Ernst" <jonernst07@gmx.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()Al Viro
Both ext3 and ext4 htree_dirblock_to_tree() is just filling the in-core rbtree for use by call_filldir(). All updates of ->f_pos are done by the latter; bumping it here (on error) is obviously wrong - we might very well have it nowhere near the block we'd found an error in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-07-01jbd2: move superblock checksum calculation to jbd2_write_superblock()Theodore Ts'o
Some of the functions which modify the jbd2 superblock were not updating the checksum before calling jbd2_write_superblock(). Move the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so that the checksum is calculated consistently. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: stable@vger.kernel.org
2013-07-01ext4: pass inode pointer instead of file pointer to punch holeAshish Sangwan
No need to pass file pointer when we can directly pass inode pointer. Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: improve free space calculation for inline_databoxi liu
In ext4 feature inline_data,it use the xattr's space to store the inline data in inode.When we calculate the inline data as the xattr,we add the pad.But in get_max_inline_xattr_value_size() function we count the free space without pad.It cause some contents are moved to a block even if it can be stored in the inode. Signed-off-by: liulei <lewis.liulei@huawei.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Tao Ma <boyu.mt@taobao.com>
2013-07-01ext4: reduce object size when !CONFIG_PRINTKJoe Perches
Reduce the object size ~10% could be useful for embedded systems. Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and arguments, passing " " to functions when !CONFIG_PRINTK and still verifying format and arguments with no_printk. $ size fs/ext4/built-in.o* text data bss dec hex filename 239375 610 888 240873 3ace9 fs/ext4/built-in.o.new 264167 738 888 265793 40e41 fs/ext4/built-in.o.old $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config # CONFIG_PRINTK is not set CONFIG_EXT4_FS=y CONFIG_EXT4_USE_FOR_EXT23=y CONFIG_EXT4_FS_POSIX_ACL=y # CONFIG_EXT4_FS_SECURITY is not set # CONFIG_EXT4_DEBUG is not set Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: improve extent cache shrink mechanism to avoid to burn CPU timeZheng Liu
Now we maintain an proper in-order LRU list in ext4 to reclaim entries from extent status tree when we are under heavy memory pressure. For keeping this order, a spin lock is used to protect this list. But this lock burns a lot of CPU time. We can use the following steps to trigger it. % cd /dev/shm % dd if=/dev/zero of=ext4-img bs=1M count=2k % mkfs.ext4 ext4-img % mount -t ext4 -o loop ext4-img /mnt % cd /mnt % for ((i=0;i<160;i++)); do truncate -s 64g $i; done % for ((i=0;i<160;i++)); do cp $i /dev/null &; done % perf record -a -g % perf report This commit tries to fix this problem. Now a new member called i_touch_when is added into ext4_inode_info to record the last access time for an inode. Meanwhile we never need to keep a proper in-order LRU list. So this can avoid to burns some CPU time. When we try to reclaim some entries from extent status tree, we use list_sort() to get a proper in-order list. Then we traverse this list to discard some entries. In ext4_sb_info, we use s_es_last_sorted to record the last time of sorting this list. When we traverse the list, we skip the inode that is newer than this time, and move this inode to the tail of LRU list. When the head of the list is newer than s_es_last_sorted, we will sort the LRU list again. In this commit, we break the loop if s_extent_cache_cnt == 0 because that means that all extents in extent status tree have been reclaimed. Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is changed to save a local variable in these functions. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: implement error handling of ext4_mb_new_preallocation()Alexey Khoroshilov
If memory allocation in ext4_mb_new_group_pa() is failed, it returns error code, ext4_mb_new_preallocation() propages it, but ext4_mb_new_blocks() ignores it. An observed result was: - allocation fail means ext4_mb_new_group_pa() does not update ext4_allocation_context; - ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len = ac->ac_b_ex.fe_len;) to number of blocks preallocated (512) instead of number of blocks requested (1); - that activates update cycle in ext4_splice_branch(): for (i = 1; i < blks; i++) <-- blks is 512 instead of 1 here *(where->p + i) = cpu_to_le32(current_block++); - it iterates 511 times and corrupts a chunk of memory including inode structure; - page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty(); - system hangs with 'scheduling while atomic' BUG. The patch implements a check for ext4_mb_new_preallocation() error code and handles its failure as if ext4_mb_regular_allocator() fails. Found by Linux File System Verification project (linuxtesting.org). [ Patch restructed by tytso to make the flow of control easier to follow. ] Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: fix corruption when online resizing a fs with 1K block sizeMaarten ter Huurne
Subtracting the number of the first data block places the superblock backups one block too early, corrupting the file system. When the block size is larger than 1K, the first data block is 0, so the subtraction has no effect and no corruption occurs. Signed-off-by: Maarten ter Huurne <maarten@treewalker.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> CC: stable@vger.kernel.org
2013-06-30Linux 3.10Linus Torvalds
2013-06-30Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull another powerpc fix from Benjamin Herrenschmidt: "I mentioned that while we had fixed the kernel crashes, EEH error recovery didn't always recover... It appears that I had a fix for that already in powerpc-next (with a stable CC). I cherry-picked it today and did a few tests and it seems that things now work quite well. The patch is also pretty simple, so I see no reason to wait before merging it." * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/eeh: Fix fetching bus for single-dev-PE
2013-06-30Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of seven bug fixes. Several fcoe fixes for locking problems, initiator issues and a VLAN API change, all of which could eventually lead to data corruption, one fix for a qla2xxx locking problem which could lead to multiple completions of the same request (and subsequent data corruption) and a use after free in the ipr driver. Plus one minor MAINTAINERS file update" (only six bugfixes in this pull, since I had already pulled the fcoe API fix directly from Robert Love) * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] ipr: Avoid target_destroy accessing memory after it was freed [SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines MAINTAINERS: Fix fcoe mailing list libfc: extend ex_lock to protect all of fc_seq_send libfc: Correct check for initiator role libfcoe: Fix Conflicting FCFs issue in the fabric
2013-06-30powerpc/eeh: Fix fetching bus for single-dev-PEGavin Shan
While running Linux as guest on top of phyp, we possiblly have PE that includes single PCI device. However, we didn't return its PCI bus correctly and it leads to failure on recovery from EEH errors for single-dev-PE. The patch fixes the issue. Cc: <stable@vger.kernel.org> # v3.7+ Cc: Steve Best <sbest@us.ibm.com> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-29Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "We discovered some breakage in our "EEH" (PCI Error Handling) code while doing error injection, due to a couple of regressions. One of them is due to a patch (37f02195bee9 "powerpc/pci: fix PCI-e devices rescan issue on powerpc platform") that, in hindsight, I shouldn't have merged considering that it caused more problems than it solved. Please pull those two fixes. One for a simple EEH address cache initialization issue. The other one is a patch from Guenter that I had originally planned to put in 3.11 but which happens to also fix that other regression (a kernel oops during EEH error handling and possibly hotplug). With those two, the couple of test machines I've hammered with error injection are remaining up now. EEH appears to still fail to recover on some devices, so there is another problem that Gavin is looking into but at least it's no longer crashing the kernel." * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/pci: Improve device hotplug initialization powerpc/eeh: Add eeh_dev to the cache during boot
2013-06-29ARM: dt: Only print warning, not WARN() on bad cpu map in device treeOlof Johansson
Due to recent changes and expecations of proper cpu bindings, there are now cases for many of the in-tree devicetrees where a WARN() will hit on boot due to badly formatted /cpus nodes. Downgrade this to a pr_warn() to be less alarmist, since it's not a new problem. Tested on Arndale, Cubox, Seaboard and Panda ES. Panda hits the WARN without this, the others do not. Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-30powerpc/pci: Improve device hotplug initializationGuenter Roeck
Commit 37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc platform) fixes a problem with interrupt and DMA initialization on hot plugged devices. With this commit, interrupt and DMA initialization for hot plugged devices is handled in the pci device enable function. This approach has a couple of drawbacks. First, it creates two code paths for device initialization, one for hot plugged devices and another for devices known during the initial PCI scan. Second, the initialization code for hot plugged devices is only called when the device is enabled, ie typically in the probe function. Also, the platform specific setup code is called each time pci_enable_device() is called, not only once during device discovery, meaning it is actually called multiple times, once for devices discovered during the initial scan and again each time a driver is re-loaded. The visible result is that interrupt pins are only assigned to hot plugged devices when the device driver is loaded. Effectively this changes the PCI probe API, since pci_dev->irq and the device's dma configuration will now only be valid after pci_enable() was called at least once. A more subtle change is that platform specific PCI device setup is moved from device discovery into the driver's probe function, more specifically into the pci_enable_device() call. To fix the inconsistencies, add new function pcibios_add_device. Call pcibios_setup_device from pcibios_setup_bus_devices if device setup is not complete, and from pcibios_add_device if bus setup is complete. With this change, device setup code is moved back into device initialization, and called exactly once for both static and hot plugged devices. [ This also fixes a regression introduced by the above patch which causes dev->irq to be overwritten under some cirumstances after MSIs have been enabled for the device which leads to crashes due to the MSI core "hijacking" dev->irq to store the base MSI number and not the LSI. --BenH ] Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto fix from Herbert Xu: "This fixes a crash in the crypto layer exposed by an SCTP test tool" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algboss - Hold ref count on larval
2013-06-29Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm/qxl fix from Dave Airlie: "Bad me forgot an access check, possible security issue, but since this is the first kernel with it, should be fine to just put it in now" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/qxl: add missing access check for execbuffer ioctl
2013-06-29Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validationMathieu Desnoyers
This __put_user() could be used by unprivileged processes to write into kernel memory. The issue here is that even if copy_siginfo_to_user() fails, the error code is not checked before __put_user() is executed. Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle, so it has not hit a stable release yet. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Roland McGrath <roland@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Pedro Alves <palves@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This is a recently spotted regression in the snapshot behavior... It turns out several tests weren't being run in the nightlies so this took a while to spot" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: send snapshot context with writes
2013-06-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ubifs fixes from Al Viro: "A couple of ubifs readdir/lseek race fixes. Stable fodder, really nasty..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: UBIFS: fix a horrid bug UBIFS: prepare to fix a horrid bug
2013-06-29Merge tag 'for-linus-20130628' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300 Pull two MN10300 fixes from David Howells: "The first fixes a problem with passing arrays rather than pointers to get_user() where __typeof__ then wants to declare and initialise an array variable which gcc doesn't like. The second fixes a problem whereby putting mem=xxx into the kernel command line causes init=xxx to get an incorrect value." * tag 'for-linus-20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300: mn10300: Use early_param() to parse "mem=" parameter mn10300: Allow to pass array name to get_user()
2013-06-29Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Correct an ordering issue in the tick broadcast code. I really wish we'd get compensation for pain and suffering for each line of code we write to work around dysfunctional timer hardware." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Fix tick_broadcast_pending_mask not cleared
2013-06-29Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "One more fix for a recently discovered bug" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Disable monitoring on setuid processes for regular users
2013-06-29[readdir] constify ->actorAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29[readdir] ->readdir() is goneAl Viro
everything's converted to ->iterate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29[readdir] convert ecryptfsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29[readdir] convert codaAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>