path: root/fs
Age | Commit message | Author
2009-06-19 | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block | Linus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Fix kernel-doc parameter name typo in blk-settings.c:
  block: rename CONFIG_LBD to CONFIG_LBDAF
  block: Fix bounce_pfn setting
  hd: stop defining MAJOR_NR
2009-06-19 | inotify: inotify_destroy_mark_entry could get called twice | Eric Paris
inotify_destroy_mark_entry could get called twice for the same mark since it is called directly in inotify_rm_watch and when the mark is being destroyed for another reason. As an example, assume that the file being watched was just deleted, so inotify_destroy_mark_entry would get called from the path fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() -> fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry(). If this happened at the same time as userspace tried to remove a watch via inotify_rm_watch, we could attempt to remove the mark from the idr twice, and could thus double-decrement the ref count and potentially end up in a use-after-free/double-free situation.

The fix is to have inotify_rm_watch use the generic, recursion-safe fsnotify_destroy_mark_by_entry() so we are sure the inotify_destroy_mark_entry() function can only be called once.

This patch also renames the function to inotify_ingored_remove_idr() so it is clear what is actually going on in the function.

Hopefully this fixes:

[ 20.342058] idr_remove called for id=20 which is not allocated.
[ 20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077
[ 20.353933] Call Trace:
[ 20.356410] [<ffffffff811a82b7>] idr_remove+0x115/0x18f
[ 20.361737] [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75
[ 20.367061] [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf
[ 20.373771] [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf
[ 20.380306] [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10
[ 20.386238] [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170
[ 20.393293] [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf
[ 20.399829] [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6
[ 20.405850] [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Peter Zijlstra <peterz@infradead.org>
2009-06-19 | block: rename CONFIG_LBD to CONFIG_LBDAF | Bartlomiej Zolnierkiewicz
Follow-up to "block: enable by default support for large devices and files on 32-bit archs".

Rename CONFIG_LBD to CONFIG_LBDAF to:
- allow update of existing [def]configs for "default y" change
- reflect that it is used also for large files support nowadays

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-18 | nfsd41: Backchannel: minorversion support for the back channel | Andy Adamson
Prepare to share backchannel code with NFSv4.1. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [nfsd41: use nfsd4_cb_sequence for callback minorversion] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 | nfsd41: Backchannel: cleanup nfs4.0 callback encode routines | Andy Adamson
Mimic the client and prepare to share the back channel xdr with NFSv4.1. Bump the number of operations in each encode routine, then backfill the number of operations. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
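The "bump, then backfill" idea from the message above can be sketched in user-space C. This is only an illustration under assumptions: `struct enc`, `enc_u32()` and `encode_compound()` are hypothetical names, not the kernel's callback encoder, and the operations are reduced to single 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder state; not the kernel's struct xdr_stream. */
struct enc {
    uint8_t *buf;
    size_t   len;   /* bytes written so far */
    size_t   cap;   /* buffer capacity */
};

/* Store a 32-bit value big-endian (XDR byte order) at byte offset off. */
static void put_be32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
}

/* Append one 32-bit word; returns 0 on success, -1 if the buffer is full. */
static int enc_u32(struct enc *e, uint32_t v)
{
    if (e->cap - e->len < 4)
        return -1;
    put_be32(e->buf, e->len, v);
    e->len += 4;
    return 0;
}

/* Reserve the count slot, encode each op while bumping a counter,
 * then write the final count back into the reserved slot. */
static int encode_compound(struct enc *e, const uint32_t *ops, size_t nops)
{
    size_t count_off = e->len;
    uint32_t encoded = 0;

    if (enc_u32(e, 0) < 0)                 /* placeholder for the count */
        return -1;
    for (size_t i = 0; i < nops; i++) {
        if (enc_u32(e, ops[i]) < 0)
            return -1;
        encoded++;                         /* bump per encoded operation */
    }
    put_be32(e->buf, count_off, encoded);  /* backfill */
    return 0;
}
```

The benefit of this shape is that each encode routine only has to increment a counter; nobody needs to know the final operation count up front.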
2009-06-18 | Merge branch 'devel-for-2.6.31' into for-2.6.31 | Trond Myklebust
Conflicts: fs/nfs/client.c fs/nfs/super.c
2009-06-18 | nfsd41: Remove ip address collision detection case | Mike Sager
Verified that cthon and pynfs exchange id tests pass (except for the two expected fails: EID8 and EID50) Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: clean up jbd2_journal_try_to_free_buffers()
  ext4: Don't update ctime for non-extent-mapped inodes
  ext4: Fix up whitespace issues in fs/ext4/inode.c
  ext4: Fix 64-bit block type problem on 32-bit platforms
  ext4: teach the inode allocator to use a goal inode number
  ext4: Use a hash of the topdir directory name for the Orlov parent group
  ext4: document the "abort" mount option
  ext4: move the abort flag from s_mount_opts to s_mount_flags
  ext4: update the s_last_mounted field in the superblock
  ext4: change s_mount_opt to be an unsigned int
  ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl
  ext4: avoid unnecessary spinlock in critical POSIX ACL path
  ext3: avoid unnecessary spinlock in critical POSIX ACL path
  ext4: convert instrumentation from markers to tracepoints
  jbd2: convert instrumentation from markers to tracepoints
2009-06-18 | seq_file: add function to write binary data | Peter Oberparleiter
seq_write() can be used to construct seq_files containing arbitrary data. Required by the gcov-profiling interface to synthesize binary profiling data files. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | elf_core_dump: use rcu_read_lock() to access ->real_parent | Oleg Nesterov
In theory it is not safe to dereference ->parent/real_parent without tasklist or rcu lock, we can race with re-parenting. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | reiserfs: fix warnings with gcc 4.4 | Jeff Mahoney
Several code paths in reiserfs have a construct like:

  if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ...

which, in addition to being ugly, ends up causing compiler warnings with gcc 4.4.0. Previous compilers didn't issue a warning.

  fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined
  fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined
  fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined
  fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefined

I believe this is due to the ih being passed to macros which evaluate the argument more than once. This is old code and we haven't seen any problems with it, but this patch eliminates the warnings.

It converts the multiple-evaluation macros to static inlines and does a preassignment for the cases that were causing the warnings, because that code is just ugly.

Reported-by: Chris Mason <mason@oracle.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
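The two techniques named in the message, a static inline instead of a multiple-evaluation macro plus a preassignment, can be illustrated with a small stand-alone sketch. All names here (`struct item_head` with a single `type` field, `TYPE_DIRENTRY`, `lookup_head()`) are stand-ins invented for the example and are not the reiserfs definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item header; stands in for the real reiserfs structure. */
struct item_head {
    uint32_t type;
};

#define TYPE_DIRENTRY 3u   /* illustrative value only */

/*
 * Before (shown as a comment): a macro that expands its argument twice.
 * Passing "ih = lookup_head(...)" would place the assignment twice in
 * one expression, which is what gcc 4.4 complains about:
 *
 *   #define is_direntry(ih) check(version_of(ih), (ih)->type)
 */

/* After: a static inline evaluates its argument exactly once. */
static inline int is_direntry(const struct item_head *ih)
{
    return ih != NULL && ih->type == TYPE_DIRENTRY;
}

static int lookups;   /* counts calls so single evaluation is observable */

static struct item_head *lookup_head(struct item_head *table, int n)
{
    lookups++;
    return &table[n];
}

/* Preassign, then test: no assignment buried inside the call. */
static int item_is_direntry(struct item_head *table, int n)
{
    struct item_head *ih = lookup_head(table, n);

    return is_direntry(ih);
}
```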
2009-06-18 | ufs: sector_t cannot be negative | Roel Kluin
unsigned i_block,fragment cannot be negative. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | isofs: cleanup mount option processing | Jan Kara
Remove unused variables from isofs_sb_info (used to be some mount options), unify variables for option to use 0/1 (some options used 'y'/'n'), use bit fields for option flags in superblock. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | isofs: fix setting of uid and gid to 0 | Jan Kara
isofs allows setting of default uid and gid of files but value 0 was used to indicate that user did not specify any uid/gid mount option. Since this option also overrides uid/gid set in Rock Ridge extension, it makes sense to allow forcing uid/gid 0. Fix option processing to allow this. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
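One common way to stop treating 0 as "not specified" is to track whether the option was given in a separate flag. The sketch below assumes that approach; `struct mount_opts`, `parse_uid_opt()` and `effective_uid()` are hypothetical names for illustration, and the isofs patch itself may be structured differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical option block; the field names are illustrative. */
struct mount_opts {
    unsigned int uid;
    unsigned int uid_set : 1;   /* 1 once the user passed uid=<n> */
};

/* Parse a single "uid=<n>" token. Returns 0 on success, -1 otherwise. */
static int parse_uid_opt(const char *tok, struct mount_opts *o)
{
    char *end;
    unsigned long v;

    if (strncmp(tok, "uid=", 4) != 0)
        return -1;
    if (tok[4] < '0' || tok[4] > '9')   /* require a leading digit */
        return -1;
    v = strtoul(tok + 4, &end, 10);
    if (*end != '\0')
        return -1;
    o->uid = (unsigned int)v;
    o->uid_set = 1;                     /* 0 is now a legitimate value */
    return 0;
}

/* The mount option wins only if it was actually given. */
static unsigned int effective_uid(const struct mount_opts *o,
                                  unsigned int disc_uid)
{
    return o->uid_set ? o->uid : disc_uid;
}
```

With the flag in place, `uid=0` overrides the on-disc owner, while omitting the option leaves the on-disc value untouched.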
2009-06-18 | isofs: let mode and dmode mount options override rock ridge mode setting | Jan Kara
So far, permissions set via 'mode' and/or 'dmode' mount options were effective only if the medium had no rock ridge extensions (or was mounted without them). Add 'overriderockmode' mount option to indicate that these options should override permissions set in rock ridge extensions. Maybe this should be default but the current behavior is there since mount options were created so I think we should not change how they behave. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | ext3: make sure inode is deleted from orphan list after truncate | Jan Kara
As Ted pointed out, it can happen that ext3_truncate() returns without removing inode from orphan list. This way we could in some rare cases (like when we get ENOMEM from an allocation in ext3_truncate called because of failed ext3_write_begin) leave the inode on orphan list and that triggers assertion failure on umount. So make ext3_truncate() always remove inode from in-memory orphan list. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | jbd: clean up journal_try_to_free_buffers() | Hisashi Hifumi
This removes the following patch:

  commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
  Author: Mingming Cao <cmm@us.ibm.com>
  Date: Fri Jul 25 01:46:22 2008 -0700

  jbd: fix race between free buffer and commit transaction

That patch is no longer needed: if the race between freeing a buffer and committing a transaction occurs and dio gets an error, dio now falls back to buffered IO thanks to the following patch:

  commit 6ccfa806a9cfbbf1cd43d5b6aa47ef2c0eb518fd
  Author: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
  Date: Tue Sep 2 14:35:40 2008 -0700

  VFS: fix dio write returning EIO when try_to_release_page fails

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Mingming Cao <cmm@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | ext3: fix chain verification in ext3_get_blocks() | Jan Kara
Chain verification in ext3_get_blocks() has been hosed since it called verify_chain(chain, NULL) which always returns success. As a result readers could in theory race with truncate. On the other hand the race probably cannot happen with the current locking scheme, since by the time ext3_truncate() is called all the pages are already removed and hence get_block() shouldn't be called on such pages... Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | ext2: Do not update mtime of a moved directory | Jan Kara
One of our users is complaining that his backup tool is upset on ext2 (while it's happy on ext3, xfs, ...) because of the mtime change.

The problem is:

  mkdir foo
  mkdir bar
  mkdir foo/a

Now under ext2:

  mv foo/a foo/b

changes mtime of 'foo/a' (foo/b after the move). That does not really make sense and it does not happen under any other filesystem I've seen.

More complicated is:

  mv foo/a bar/a

This changes mtime of foo/a (bar/a after the move) and it makes some sense since we had to update parent directory pointer of foo/a. But again, no other filesystem does this. So after some thoughts I'd vote for consistency and change ext2 to behave the same as other filesystems.

Do not update mtime of a moved directory. Specs don't say anything about it (neither that it should, nor that it should not be updated) and other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do it. So let's become more consistent.

Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel.

Reported-by: <ronny.pretzsch@dfs.de>
Cc: <hare@suse.de>
Cc: Jörn Engel <joern@logfs.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | proc: vmcore - use kzalloc in get_new_element() | Cyrill Gorcunov
Instead of kmalloc+memset better use straight kzalloc Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | procfs: remove sparse errors in proc_devtree.c | Michal Simek
  CHECK fs/proc/proc_devtree.c
fs/proc/proc_devtree.c:197:14: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:203:34: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:210:14: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:223:26: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:226:14: warning: Using plain integer as NULL pointer

Signed-off-by: Michal Simek <monstr@monstr.eu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | epoll: fix nested calls support | Davide Libenzi
This fixes a regression in 2.6.30.

I unfortunately accepted a patch some time ago, to drop the "current" usage from possible IRQ context, without giving it proper thought. The patch switched to using the CPU id by bounding the nested call callback with a get_cpu()/put_cpu(). Unfortunately the ep_call_nested() function can be called with a callback that grabs sleepy locks (from its own f_op->poll()), which results in epic fails. The following patch uses the proper "context" depending on the path where it is called, and on the kind of callback.

This was reported by Stefan Richter, who has also verified the patch in his previously failing environment.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 | proc: export statistics for softirq to /proc | Keika Kobayashi
Export statistics for softirq in /proc/softirqs and /proc/stat.

1. /proc/softirqs
   Implement /proc/softirqs which shows the number of softirq for each CPU like /proc/interrupts.

2. /proc/stat
   Add the "softirq" line to /proc/stat. This line shows the number of softirq for all cpu. The first column is the total of all softirqs and each subsequent column is the total for particular softirq.

[kosaki.motohiro@jp.fujitsu.com: remove redundant for_each_possible_cpu() loop]
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
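A reader of the new /proc/stat line could be sketched as follows, going only by the layout described above (the word "softirq", then the total, then one column per softirq type). `parse_softirq_line()` is a hypothetical helper, and the exact whitespace of the real file is assumed rather than quoted.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "softirq ..." line: the first number is the total, the
 * following numbers are per-softirq counts. Stores up to max per-type
 * counts in counts[] and returns how many were stored, or -1 if the
 * line does not start with "softirq " or carries no total.
 */
static int parse_softirq_line(const char *line, unsigned long long *total,
                              unsigned long long *counts, int max)
{
    const char *p;
    char *end;
    int n = 0;

    if (strncmp(line, "softirq ", 8) != 0)
        return -1;
    p = line + 8;
    *total = strtoull(p, &end, 10);
    if (end == p)
        return -1;                  /* no total present */
    p = end;
    while (n < max) {
        unsigned long long v = strtoull(p, &end, 10);

        if (end == p)
            break;                  /* no more numbers on the line */
        counts[n++] = v;
        p = end;
    }
    return n;
}
```

A caller would typically read /proc/stat line by line and pass each line to this helper until it returns a non-negative value.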
2009-06-18 | nfsd: optimise the starting of zero threads when none are running. | NeilBrown
Currently, if we ask to set the number of nfsd threads to zero when there are none running, we set up all the sockets and register the service, and then tear it all down again. This is pointless. So detect that case and exit promptly. (Also remove an assignment to 'error' which was never used.)

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
2009-06-18 | nfsd: don't take nfsd_mutex twice when setting number of threads. | NeilBrown
Currently when we write a number to 'threads' in nfsdfs, we take the nfsd_mutex, update the number of threads, then take the mutex again to read the number of threads.

Mostly this isn't a big deal. However if we write '0', and portmap happens to be dead, then we can get unpredictable behaviour. If the nfsd threads all got killed quickly and the last thread is waiting for portmap to respond, then the second time we take the mutex we will block waiting for the last thread. However if the nfsd threads didn't die quite that fast, then there will be no contention when we try to take the mutex again.

Unpredictability isn't fun, and waiting for the last thread to exit is pointless, so avoid taking the lock twice. To achieve this, have nfsd_svc return a non-negative number of active threads when not returning a negative error.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 | fs: Provide empty .set_page_dirty() aop for anon inodes | Peter Zijlstra
.set_page_dirty() is one of those a_ops that defaults to the buffer implementation when not set. Therefore provide a dummy function to make it do nothing. (Uncovered by perfcounters fd's which can now be writable-mmap-ed.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 | udf: Use device size when drive reported bogus number of written blocks | Jan Kara
Some drives report 0 as the number of written blocks when there are some blocks recorded. Use device size in such case so that we can automagically mount such media. Signed-off-by: Jan Kara <jack@suse.cz>
2009-06-17 | nfs: remove unnecessary NFS_INO_INVALID_ACL checks | James Morris
Unless I'm mistaken, NFS_INO_INVALID_ACL is being checked twice during getacl calls (i.e. first via nfs_revalidate_inode() and then by each call site).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: More "sloppy" parsing problems | Chuck Lever
Specifying "port=-5" with the kernel's current mount option parser generates "unrecognized mount option". If "sloppy" is set, this causes the mount to succeed and use the default values; the desired behavior is that, since this is a valid option with an invalid value, the mount should fail, even with "sloppy."

To properly handle "sloppy" parsing, we need to distinguish between correct options with invalid values, and incorrect options. We will need to parse integer values by hand, therefore, and not rely on match_token().

For instance, these must all fail with "invalid value":

  port=12345678
  port=-5
  port=samuel

and not with "unrecognized option," as they do currently.

Thus, for the sake of match_token() we need to treat the values for these options as strings, and do the conversion to integers using strict_strtol().

This is basically the same solution we used for the earlier "retry=" fix (commit ecbb3845), except in this case the kernel actually has to parse the value, rather than ignore it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
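The distinction drawn above, a known option with a bad value versus an unknown option, can be sketched in user-space C. This is an illustration only: `parse_port()` and `enum opt_result` are hypothetical names, standard `strtol()` is swapped in for the kernel's strict_strtol(), and the 0..65535 range is an assumption about what counts as a valid port.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum opt_result { OPT_OK, OPT_BAD_VALUE, OPT_UNKNOWN };

/*
 * Match the option name first and treat its value as a plain string,
 * then convert the value by hand so a bad number is reported as
 * "invalid value" rather than "unrecognized option".
 */
static enum opt_result parse_port(const char *tok, unsigned short *port)
{
    const char *val;
    char *end;
    long v;

    if (strncmp(tok, "port=", 5) != 0)
        return OPT_UNKNOWN;        /* only this case may be skipped by "sloppy" */
    val = tok + 5;
    errno = 0;
    v = strtol(val, &end, 10);
    if (end == val || *end != '\0' || errno == ERANGE)
        return OPT_BAD_VALUE;      /* e.g. "port=samuel" */
    if (v < 0 || v > 65535)
        return OPT_BAD_VALUE;      /* e.g. "port=-5", "port=12345678" */
    *port = (unsigned short)v;
    return OPT_OK;
}
```

A caller honoring "sloppy" would ignore `OPT_UNKNOWN` but still fail the mount on `OPT_BAD_VALUE`.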
2009-06-17 | NFS: Invalid mount option values should always fail, even with "sloppy" | Chuck Lever
Ian Kent reports:

"I've noticed a couple of other regressions with the options vers and proto option of mount.nfs(8).

The commands:

  mount -t nfs -o vers=<invalid version> <server>:/<path> /<mountpoint>
  mount -t nfs -o proto=<invalid proto> <server>:/<path> /<mountpoint>

both immediately fail. But if the "-s" option is also used they both succeed with the mount falling back to defaults (by the look of it).

In the past these failed even when the sloppy option was given, as I think they should. I believe the sloppy option is meant to allow the mount command to still function for mount options (for example in shared autofs maps) that exist on other Unix implementations but aren't present in the Linux mount.nfs(8). So, an invalid value specified for a known mount option is different to an unknown mount option and should fail appropriately."

See RH bugzilla 486266.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: Remove unused XDR decoder functions | Chuck Lever
Clean up: Remove xdr_decode_fhstatus() and xdr_decode_fhstatus3(), now that they are unused. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: Update MNT and MNT3 reply decoding functions | Chuck Lever
Solder xdr_stream-based XDR decoding functions into the in-kernel mountd client that are more careful about checking data types and watching for buffer overflows. The new MNT3 decoder includes support for auth-flavor list decoding. The "_sz" macro for MNT3 replies was missing the size of the file handle. I've added this back, and included the size of the auth flavor array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: add XDR decoder for mountd version 3 auth-flavor lists | Chuck Lever
Introduce an xdr_stream-based XDR decoder that can unpack the auth-flavor list returned in a MNT3 reply.

The nfs_mount() function's caller allocates an array, and passes the size and a pointer to it. The decoder decodes all the flavors it can into the array, and returns the number of decoded flavors.

If the caller is not interested in the auth flavors, it can pass a value of zero as the size of the pre-allocated array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
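The caller-allocated-array contract described above might look like this in a simplified user-space form. The sketch assumes the list is an XDR variable-length array (a big-endian count followed by that many big-endian 32-bit values); `decode_flavors()` is a hypothetical name and works on a raw byte buffer rather than the kernel's xdr_stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value (XDR byte order). */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Decode the flavor list into the caller's array. Stores at most max
 * entries in out[] and returns the number stored, or -1 if the buffer
 * is too short for the advertised count. Passing max == 0 means the
 * caller is not interested in the flavors.
 */
static int decode_flavors(const uint8_t *buf, size_t len,
                          uint32_t *out, size_t max)
{
    uint32_t count;
    size_t i, n;

    if (len < 4)
        return -1;
    count = get_be32(buf);
    if ((len - 4) / 4 < count)      /* range-check before touching the data */
        return -1;
    n = count < max ? count : max;  /* decode only what fits */
    for (i = 0; i < n; i++)
        out[i] = get_be32(buf + 4 + 4 * i);
    return (int)n;
}
```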
2009-06-17 | NFS: add new file handle decoders to in-kernel mountd client | Chuck Lever
Introduce xdr_stream-based XDR file handle decoders to the in-kernel mountd client. These are more careful than the existing decoder functions about buffer overflows and data type and range checking. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: Add separate mountd status code decoders for each mountd version | Chuck Lever
Introduce data structures and xdr_stream-based decoding functions for unmarshalling mountd status codes properly. Mountd version 3 uses specific standard error return codes that are not errno values and not NFS3ERR_ values. These have a well-defined standard mapping to local errno values. Introduce data structures and a decoder function that map these status codes to local errno values properly. This is new functionality (but not used yet). Version 1 mountd status values are defined by RFC 1094 as UNIX error values (errno values). Errno values on heterogeneous systems do not necessarily match each other. To avoid exposing possibly incorrect errno values to upper layers, the current XDR decoder converts all non-zero MNT version 1 status codes to -EACCES. The OpenGroup XNFS standard provides a mapping similar to but smaller than the version 3 error codes. Implement a decoder that uses the XNFS error codes, replacing the current decoder. For both mountd protocol versions, map unrecognized errors to -EACCES. Finally we introduce a replacement data structure for mnt_fhstatus at this time, which is used by the new XDR decoders. In addition to documenting that the status value returned by the XDR decoders is always an errno, this new structure will be expanded in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
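A table-driven mapping of the kind described above could be sketched as follows. The numeric status values are the mountstat3 codes listed in RFC 1813; the errno chosen for each entry is an illustrative assumption rather than the kernel's actual table, and the list is deliberately partial. Unrecognized codes fall back to -EACCES, as the message specifies.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Partial, illustrative table: MNT3 status -> negative local errno. */
static const struct {
    uint32_t status;
    int      err;
} mnt3_errtbl[] = {
    { 0,  0 },               /* MNT3_OK */
    { 1,  -EPERM },          /* MNT3ERR_PERM */
    { 2,  -ENOENT },         /* MNT3ERR_NOENT */
    { 5,  -EIO },            /* MNT3ERR_IO */
    { 13, -EACCES },         /* MNT3ERR_ACCES */
    { 20, -ENOTDIR },        /* MNT3ERR_NOTDIR */
    { 22, -EINVAL },         /* MNT3ERR_INVAL */
    { 63, -ENAMETOOLONG },   /* MNT3ERR_NAMETOOLONG */
};

/* Map a wire status to a local errno; unknown codes become -EACCES. */
static int mnt3_status_to_errno(uint32_t status)
{
    size_t i;

    for (i = 0; i < sizeof(mnt3_errtbl) / sizeof(mnt3_errtbl[0]); i++)
        if (mnt3_errtbl[i].status == status)
            return mnt3_errtbl[i].err;
    return -EACCES;
}
```

Keeping the mapping in a table means upper layers only ever see local errno values, never raw wire codes.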
2009-06-17 | NFS: remove unused function in fs/nfs/mount_clnt.c | Chuck Lever
Clean up: remove xdr_encode_dirpath() now that it has been replaced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument | Chuck Lever
Check the length of the supplied dirpath, and see that it fits properly in the RPC buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
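The two checks mentioned above, path length and fit in the buffer, can be sketched for an XDR string (a big-endian length, the bytes, then zero padding to a 4-byte boundary). `encode_dirpath()` here is a hypothetical user-space helper working on a raw byte buffer; the 1024-byte limit is the MNTPATHLEN constant from the MNT protocol definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MNTPATHLEN 1024u   /* maximum dirpath length in the MNT protocol */

/*
 * Encode path as an XDR string into buf. Returns the number of bytes
 * written, or -1 if the path is too long or does not fit in cap bytes.
 */
static int encode_dirpath(uint8_t *buf, size_t cap, const char *path)
{
    size_t len = strlen(path);
    size_t padded = (len + 3) & ~(size_t)3;   /* round up to 4 bytes */

    if (len > MNTPATHLEN)
        return -1;                            /* protocol limit */
    if (cap < 4 || cap - 4 < padded)
        return -1;                            /* would overflow the buffer */
    buf[0] = (uint8_t)(len >> 24);
    buf[1] = (uint8_t)(len >> 16);
    buf[2] = (uint8_t)(len >> 8);
    buf[3] = (uint8_t)len;
    memcpy(buf + 4, path, len);
    memset(buf + 4 + len, 0, padded - len);   /* XDR zero padding */
    return (int)(4 + padded);
}
```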
2009-06-17 | NFS: Clean up MNT program definitions | Chuck Lever
Clean up: Relocate MNT program procedure number definitions to the only file that uses them. Relocate the version number definitions, which are shared, to nfs.h. Remove duplicate program number definitions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | lockd: Don't bother with RPC ping for NSM upcalls | Chuck Lever
Cut NSM upcall RPC traffic in half -- don't do a NULL call first. The cases where a ping would be helpful are rare. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | lockd: Update NSM state from SM_MON replies | Chuck Lever
When rpc.statd starts up in user space at boot time, it attempts to write the latest NSM local state number into /proc/sys/fs/nfs/nsm_local_state. If lockd.ko isn't loaded yet (as is the case in most configurations), that file doesn't exist, thus the kernel's NSM state remains set to its initial value of zero during lockd operation. This is a problem because rpc.statd and lockd use the NSM state number to prevent repeated lock recovery on rebooted hosts. If lockd sends a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state number is received, there is no way for lockd or rpc.statd to distinguish that stale SM_NOTIFY from an actual reboot. Thus lock recovery could be performed after the rebooted host has already started reclaiming locks, and those locks will be lost. We could change /etc/init.d/nfslock so it always modprobes lockd.ko before starting rpc.statd. However, if lockd.ko is ever unloaded and reloaded, we are back at square one, since the NSM state is not preserved across an unload/reload cycle. This may happen frequently on clients that use automounter. A period of NFS inactivity causes lockd.ko to be unloaded, and the kernel loses its NSM state setting. Instead, let's use the fact that rpc.statd plants the local system's NSM state in every SM_MON (and SM_UNMON) reply. lockd performs a synchronous SM_MON upcall to the local rpc.statd _before_ sending its first NLM request to a new remote. This would permit rpc.statd to provide the current NSM state to lockd, even after lockd.ko had been unloaded and reloaded. Note that NLMPROC_LOCK arguments are constructed before the nsm_monitor() call, so we have to rearrange argument construction very slightly to make this all work out. And, the kernel appears to treat NSM state as a u32 (see struct nlm_args and nsm_res). Make nsm_local_state a u32 as well, to ensure we don't get bogus comparison results. 
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available | Chuck Lever
Clear "ret" if the error return from svc_create_xprt(AF_INET6) was -EAFNOSUPPORT. Otherwise, callback start-up will succeed, but nfs_callback_up() will return -EAFNOSUPPORT anyway, and the first NFSv4 mount attempt after a reboot will fail.

Bug introduced by commit f738f517 in 2.6.30-rc1.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: Return error code from nfs_callback_up() to user space | Chuck Lever
If the kernel cannot start the NFSv4 callback service during a mount request, it returns -ENOMEM to user space, resulting in this message:

  mount.nfs4: Cannot allocate memory

Adjust nfs_alloc_client() and nfs_get_client() to pass NFSv4 callback start-up errors back to user space so a less mysterious error message can be displayed by the mount command.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: Do not display the setting of the "intr" mount option | Chuck Lever
The "intr" mount option has been deprecated for a while, but /proc/mounts continues to display "nointr" whether "intr" or "nointr" has been specified for a mount point. Since these options do not have any effect, simply do not display them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | NFS: add support for splice writes | Suresh Jayaraman
Adds support for splice writes. It effectively calls generic_file_splice_write() to do the writes. We need not worry about O_APPEND case as the combination of splice() writes and O_APPEND is disallowed. This patch propagates NFS write errors back to the caller. The number of bytes written via splice are being added to NFSIO_NORMALWRITTENBYTES as these are effectively cached writes. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 | Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31 | Trond Myklebust
2009-06-17 | jbd2: clean up jbd2_journal_try_to_free_buffers() | Hisashi Hifumi
This patch reverts 3f31fddf, which is no longer needed because if a race between freeing buffer and committing transaction functionality occurs and dio gets error, currently dio falls back to buffered IO due to the commit 6ccfa806. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-17 | nfs41: Backchannel: CB_SEQUENCE validation | Ricardo Labiaga
Validates the callback's sessionID, the slot number, and the sequence ID. Increments the slot's sequence.

Detects replays, but simply prints a debug message (if debugging is enabled), since we don't yet implement a duplicate request cache for the backchannel. This should not present a problem, since only idempotent callbacks are currently implemented.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: Be more obvious about the return value]
[nfs41: Backchannel: dprintk in host order]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
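The slot and sequence checks described above can be sketched roughly as below. This is a simplified model under assumptions: the structure and result names are hypothetical, session ID matching is left out, and the rule "expected sequence is the stored value plus one, equal means replay" follows the usual NFSv4.1 slot semantics rather than being quoted from the patch.

```c
#include <stdint.h>

enum seq_result { SEQ_OK, SEQ_REPLAY, SEQ_BAD_SLOT, SEQ_MISORDERED };

/* Hypothetical backchannel slot: remembers the last accepted sequence ID. */
struct slot {
    uint32_t seq_nr;
};

struct slot_table {
    struct slot *slots;
    uint32_t     nslots;
};

/* Validate the slot number and sequence ID of an incoming callback. */
static enum seq_result validate_seq(struct slot_table *t,
                                    uint32_t slotid, uint32_t seqid)
{
    struct slot *s;

    if (slotid >= t->nslots)
        return SEQ_BAD_SLOT;
    s = &t->slots[slotid];
    if (seqid == s->seq_nr + 1) {   /* next in order: accept and advance */
        s->seq_nr = seqid;
        return SEQ_OK;
    }
    if (seqid == s->seq_nr)         /* same as last accepted: a replay */
        return SEQ_REPLAY;
    return SEQ_MISORDERED;
}
```

In this model a replay is only reported to the caller; without a duplicate request cache, the caller can do little more than log it.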
2009-06-17 | nfs41: Backchannel: New find_client_with_session() | Ricardo Labiaga
Finds the 'struct nfs_client' that matches the server's address, major version number, and session ID. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 | nfs41: Backchannel: Add a backchannel slot table to the session | Ricardo Labiaga
Defines a new 'struct nfs4_slot_table' in the 'struct nfs4_session' for use by the backchannel. Initializes, resets, and destroys the backchannel slot table in the same manner the forechannel slot table is initialized, reset, and destroyed. The sequenceid for each slot in the backchannel slot table is initialized to 0, whereas the forechannel slotid's sequenceid is set to 1. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 | nfs41: Backchannel: Refactor nfs4_init_slot_table() | Ricardo Labiaga
Generalize nfs4_init_slot_table() so it can be used to initialize the backchannel slot table in addition to the forechannel slot table. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>