summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-05-27exec: replace call_usermodehelper_pipe with use of umh init function and ↵Neil Horman
resolve limit The first patch in this series introduced an init function to the call_usermodehelper api so that processes could be customized by caller. This patch takes advantage of that fact, by customizing the helper in do_coredump to create the pipe and set its core limit to one (for our recusrsion check). This lets us clean up the previous uglyness in the usermodehelper internals and factor call_usermodehelper out entirely. While I'm at it, we can also modify the helper setup to look for a core limit value of 1 rather than zero for our recursion check Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27ufs: permit mounting of BorderWare filesystemsThomas Stewart
I recently had to recover some files from an old broken machine that was running BorderWare Document Gateway. It's basically a drop in web server for sharing files. From the look of the init process and using strings on of a few files it seems to be based on FreeBSD 3.3. The process turned out to be more difficult than I imagined, but to cut a long story short BorderWare in their wisdom use a nonstandard magic number in their UFS (ufstype=44bsd) file systems. Thus Linux refuses to mount the file systems in order to recover the data. After a bit of hunting I was able to make a quick fix to fs/ufs/super.c in order to detect the new magic number. I assume that this number is the same for all installations. It's quite easy to find out from ufs_fs.h. The superblock sits 8k into the block device and the magic number its 1372 bytes into the superblock struct. # dd if=/dev/sda5 skip=$(( 8192 + 1372 )) bs=1 count=4 2> /dev/null | hd 00000000 97 26 24 0f |.&$.| # Signed-off-by: Thomas Stewart <thomas@stewarts.org.uk> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27fs/autofs4: use memdup_userJulia Lawall
Use memdup_user when user data is immediately copied into the allocated region. Elimination of the variable ads, which is no longer useful. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = \(kmalloc@p\|kzalloc@p\)(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) || ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27ext3 uses rb_node = NULL; to zero rb_root.Venkatesh Pallipadi
The problem with this is that 17d9ddc72fb8bba0d4f678 ("rbtree: Add support for augmented rbtrees") in the linux-next tree adds a new field to that struct which needs to be NULLas well. This patch uses RB_ROOT as the intializer so all of the relevant fields will be NULL'd. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-27quota: Fixup dquot_transferJan Kara
Commit bc8e5f07392f05c47c8bdeff4f7098db440d065c had a typo which caused quota miscomputation when changing owner group of a file. Linus will hate me. Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-27reiserfs: Fix resuming of quotas on remount read-writeJan Kara
When quota was suspended on remount-ro, finish_unfinished() will try to turn it on again (which fails) and also turns the quotas off on exit. Fix the function to check whether quotas are already on at function entry and do not turn them off in that case. CC: reiserfs-devel@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-27Btrfs: add more error checking to btrfs_dirty_inodeChris Mason
The ENOSPC code will now return ENOSPC to btrfs_start_transaction. btrfs_dirty_inode needs to check for this and error out appropriately. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: allow unaligned DIOChris Mason
In order to support DIO that isn't aligned to the filesystem blocksize, we fall back to buffered for any unaligned DIOs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: drop verbose enospc printkChris Mason
Less printk is good printk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: Fix block generation verification raceYan, Zheng
After the path is released, the generation number got from block pointer is no long valid. The race may cause disk corruption, because verify_parent_transid() calls clear_extent_buffer_uptodate() when generation numbers mismatch. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: fix preallocation and nodatacow checks in O_DIRECTChris Mason
The O_DIRECT code wasn't checking for multiple references on preallocated or nodatacow extents. This means it wasn't honoring snapshots properly. The fix here is to add an explicit check for multiple references This also fixes the math for selecting the correct disk block, making sure not to go past the end of the extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: squashfs: update documentation to include description of xattr layout squashfs: fix name reading in squashfs_xattr_get squashfs: constify xattr handlers squashfs: xattr fix sparse warnings squashfs: xattr_lookup sparse fix squashfs: add xattr support configure option squashfs: add new extended inode types squashfs: add support for xattr reading squashfs: add xattr id support
2010-05-26fs/fscache/object-list.c: fix warning on 32-bitAndrew Morton
fs/fscache/object-list.c: In function 'fscache_objlist_lookup': fs/fscache/object-list.c:105: warning: cast to pointer from integer of different size Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-26Btrfs: avoid ENOSPC errors in btrfs_dirty_inodeChris Mason
btrfs_dirty_inode tries to sneak in without much waiting or space reservation, mostly for performance reasons. This usually works well but can cause problems when there are many many writers. When btrfs_update_inode fails with ENOSPC, we fallback to a slower btrfs_start_transaction call that will reserve some space. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: move O_DIRECT space reservation to btrfs_direct_IOChris Mason
This moves the delalloc space reservation done for O_DIRECT into btrfs_direct_IO. This way we don't leak reserved space if the generic O_DIRECT write code errors out before it calls into btrfs_direct_IO. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26NFS: Fix another nfs_wb_page() deadlockTrond Myklebust
J.R. Okajima reports that the call to sync_inode() in nfs_wb_page() can deadlock with other writeback flush calls. It boils down to the fact that we cannot ever call writeback_single_inode() while holding a page lock (even if we do set nr_to_write to zero) since another process may already be waiting in the call to do_writepages(), and so will deny us the I_SYNC lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26NFS: Ensure that we mark the inode as dirty if we exit early from commitTrond Myklebust
If we exit from nfs_commit_inode() without ensuring that the COMMIT rpc call has been completed, we must re-mark the inode as dirty. Otherwise, future calls to sync_inode() with the WB_SYNC_ALL flag set will fail to ensure that the data is on the disk. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26NFS: Fix a lock imbalance typo in nfs_access_cache_shrinkerTrond Myklebust
Commit 9c7e7e23371e629dbb3b341610a418cdf1c19d91 (NFS: Don't call iput() in nfs_access_cache_shrinker) unintentionally removed the spin unlock for the inode->i_lock. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26mm: export generic_pipe_buf_*() to modulesMiklos Szeredi
This is needed by fuse device code which wants to create pipe buffers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-05-25Btrfs: rework O_DIRECT enospc handlingChris Mason
This changes O_DIRECT write code to mark extents as delalloc while it is processing them. Yan Zheng has reworked the enospc accounting based on tracking delalloc extents and this makes it much easier to track enospc in the O_DIRECT code. There are a few space cases with the O_DIRECT code though, it only sets the EXTENT_DELALLOC bits, instead of doing EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because we don't want to mess with clearing the dirty and uptodate bits when things go wrong. This is important because there are no pages in the page cache, so any extent state structs that we put in the tree won't get freed by releasepage. We have to clear them ourselves as the DIO ends. With this commit, we reserve space at in btrfs_file_aio_write, and then as each btrfs_direct_IO call progresses it sets EXTENT_DELALLOC on the range. btrfs_get_blocks_direct is responsible for clearing the delalloc at the same time it drops the extent lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25driver core: add devname module aliases to allow module on-demand auto-loadingKay Sievers
This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix permissions checking for setflags ioctl() GFS2: Don't "get" xattrs for ACLs when ACLs are turned off GFS2: Rework reclaiming unlinked dinodes
2010-05-25Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: Ensure inode allocation buffers are fully replayed xfs: enable background pushing of the CIL xfs: forced unmounts need to push the CIL xfs: Introduce delayed logging core code xfs: Delayed logging design documentation xfs: Improve scalability of busy extent tracking xfs: make the log ticket ID available outside the log infrastructure xfs: clean up log ticket overrun debug output xfs: Clean up XFS_BLI_* flag namespace xfs: modify buffer item reference counting xfs: allow log ticket allocation to take allocation flags xfs: Don't reuse the same transaction ID for duplicated transactions.
2010-05-25smbfs: remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in fs/smbfs/symlink.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25fs: ldm: don't use own implementation of hex_to_bin()Andy Shevchenko
Remove own implementation of hex_to_bin(). Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com> Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25fatfs: ratelimit corruption reportOGAWA Hirofumi
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25ntfs: use add_to_page_cache_lru()Minchan Kim
Quote from Nick piggin's about btrfs patch - http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html. "add_to_page_cache_lru is exported, so it should be used. Benefits over using a private pagevec: neater code, 128 bytes fewer stack used, percpu lru ordering is preserved, and finally don't need to flush pagevec before returning so batching may be shared with other LRU insertions." Let's use it instead of private pagevec in ntfs, too. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25ntfs: clean up ntfs_attr_extend_initializedMinchan Kim
cached_page and lru_pvec have not been used. Let's remove the arguments. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, ↵Alexey Dobriyan
SHRT_MAX and SHRT_MIN - C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not USHORT_MAX/SHORT_MAX/SHORT_MIN. - Make SHRT_MIN of type s16, not int, for consistency. [akpm@linux-foundation.org: fix drivers/dma/timb_dma.c] [akpm@linux-foundation.org: fix security/keys/keyring.c] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25fs-writeback: check sync bit earlier in inode_wait_for_writebackRichard Kennedy
When wb_writeback() hasn't written anything it will re-acquire the inode lock before calling inode_wait_for_writeback. This change tests the sync bit first so that is doesn't need to drop & re-acquire the lock if the inode became available while wb_writeback() was waiting to get the lock. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25mm: migration: avoid race between shift_arg_pages() and rmap_walk() during ↵Mel Gorman
migration by not migrating temporary stacks Page migration requires rmap to be able to find all ptes mapping a page at all times, otherwise the migration entry can be instantiated, but it is possible to leave one behind if the second rmap_walk fails to find the page. If this page is later faulted, migration_entry_to_page() will call BUG because the page is locked indicating the page was migrated by the migration PTE not cleaned up. For example kernel BUG at include/linux/swapops.h:105! invalid opcode: 0000 [#1] PREEMPT SMP ... Call Trace: [<ffffffff810e951a>] handle_mm_fault+0x3f8/0x76a [<ffffffff8130c7a2>] do_page_fault+0x44a/0x46e [<ffffffff813099b5>] page_fault+0x25/0x30 [<ffffffff8114de33>] load_elf_binary+0x152a/0x192b [<ffffffff8111329b>] search_binary_handler+0x173/0x313 [<ffffffff81114896>] do_execve+0x219/0x30a [<ffffffff8100a5c6>] sys_execve+0x43/0x5e [<ffffffff8100320a>] stub_execve+0x6a/0xc0 RIP [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129 There is a race between shift_arg_pages and migration that triggers this bug. A temporary stack is setup during exec and later moved. If migration moves a page in the temporary stack and the VMA is then removed before migration completes, the migration PTE may not be found leading to a BUG when the stack is faulted. This patch causes pages within the temporary stack during exec to be skipped by migration. It does this by marking the VMA covering the temporary stack with an otherwise impossible combination of VMA flags. These flags are cleared when the temporary stack is moved to its final location. [kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25pagemap: add #ifdefs CONFIG_HUGETLB_PAGE on code walking hugetlb vmaNaoya Horiguchi
If !CONFIG_HUGETLB_PAGE, pagemap_hugetlb_range() is never called. So put it (and its calling function) into #ifdef block. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25Btrfs: use async helpers for DIO write checksummingChris Mason
The async helper threads offload crc work onto all the CPUs, and make streaming writes much faster. This changes the O_DIRECT write code to use them. The only small complication was that we need to pass in the logical offset in the file for each bio, because we can't find it in the bio's pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: don't walk around with task->state != TASK_RUNNINGChris Mason
Yan Zheng noticed two places we were doing a lot of work without task->state set to TASK_RUNNING. This sets the state properly after we get ready to sleep but decide not to. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: do aio_write instead of writeJosef Bacik
In order for AIO to work, we need to implement aio_write. This patch converts our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and nothing broke, and the AIO stuff magically started working. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: add basic DIO read/write supportJosef Bacik
This provides basic DIO support for reading and writing. It does not do the work to recover from mismatching checksums, that will come later. A few design changes have been made from Jim's code (sorry Jim!) 1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO code in order to account for all of BTRFS's oddities, but thanks to that work it seems like the best bet is to just ignore compression and such and just opt to fallback on buffered IO. 2) Fallback on buffered IO for compressed or inline extents. Jim's code did it's own buffering to make dio with compressed extents work. Now we just fallback onto normal buffered IO. 3) Use ordered extents for the writes so that all of the lock_extent() lookup_ordered() type checks continue to work. 4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with DIO writes. I've tested this with fsx and everything works great. This patch depends on my dio and filemap.c patches to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25direct-io: do not merge logically non-contiguous requestsJosef Bacik
Btrfs cannot handle having logically non-contiguous requests submitted. For example if you have Logical: [0-4095][HOLE][8192-12287] Physical: [0-4095] [4096-8191] Normally the DIO code would put these into the same BIO's. The problem is we need to know exactly what offset is associated with what BIO so we can do our checksumming and unlocking properly, so putting them in the same BIO doesn't work. So add another check where we submit the current BIO if the physical blocks are not contigous OR the logical blocks are not contiguous. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25direct-io: add a hook for the fs to provide its own submit_bio functionJosef Bacik
Because BTRFS can do RAID and such, we need our own submit hook so we can setup the bio's in the correct fashion, and handle checksum errors properly. So there are a few changes here 1) The submit_io hook. This is straightforward, just call this instead of submit_bio. 2) Allow the fs to return -ENOTBLK for reads. Usually this has only worked for writes, since writes can fallback onto buffered IO. But BTRFS needs the option of falling back on buffered IO if it encounters a compressed extent, since we need to read the entire extent in and decompress it. So if we get -ENOTBLK back from get_block we'll return back and fallback on buffered just like the write case. I've tested these changes with fsx and everything seems to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata ENOSPC handling for balanceYan, Zheng
This patch adds metadata ENOSPC handling for the balance code. It is consisted by following major changes: 1. Avoid COW tree leave in the phrase of merging tree. 2. Handle interaction with snapshot creation. 3. make the backref cache can live across transactions. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Pre-allocate space for data relocationYan, Zheng
Pre-allocate space for data relocation. This can detect ENOPSC condition caused by fragmentation of free space. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata ENOSPC handling for tree logYan, Zheng
Previous patches make the allocater return -ENOSPC if there is no unreserved free metadata space. This patch updates tree log code and various other places to propagate/handle the ENOSPC error. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata reservation for orphan inodesYan, Zheng
reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Introduce global metadata reservationYan, Zheng
Reserve metadata space for extent tree, checksum tree and root tree Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Update metadata reservation for delayed allocationYan, Zheng
Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Integrate metadata reservation with start_transactionYan, Zheng
Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Introduce contexts for metadata reservationYan, Zheng
Introducing metadata reseravtion contexts has two major advantages. First, it makes metadata reseravtion more traceable. Second, it can reclaim freed space and re-add them to the itself after transaction committed. Besides add btrfs_block_rsv structure and related helper functions, This patch contains following changes: Move code that decides if freed tree block should be pinned into btrfs_free_tree_block(). Make space accounting more accurate, mainly for handling read only block groups. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Kill init_btrfs_i()Yan, Zheng
All code in init_btrfs_i can be moved into btrfs_alloc_inode() Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Shrink delay allocated space in a synchronizedYan, Zheng
Shrink delayed allocation space in a synchronized manner is more controllable than flushing all delay allocated space in an async thread. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Kill allocate_wait in space_infoYan, Zheng
We already have fs_info->chunk_mutex to avoid concurrent chunk creation. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Link block groups of different raid typesYan, Zheng
The size of reserved space is stored in space_info. If block groups of different raid types are linked to separate space_info, changing allocation profile will corrupt reserved space accounting. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>