summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2012-01-09udf: Cleanup calling convention of inode_getblk()Jan Kara
inode_getblk() always returned NULL and passed results in its parameters. Make the function return something useful - found block number. Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09ext2: Fix error handling on inode bitmap corruptionJan Kara
When insert_inode_locked() fails in ext2_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since the inode does not have I_NEW set. Fix the problem by informing about filesystem error and jumping to fail: (instead of fail_drop:) which doesn't call unlock_new_inode(). Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09ext3: Fix error handling on inode bitmap corruptionJan Kara
When insert_inode_locked() fails in ext3_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09ext3: replace ll_rw_block with other functionsZheng Liu
ll_rw_block() is deprecated. Thus we replace it with other functions. CC: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-08ext[34]: avoid i_nlink warnings triggered by drop_nlink/inc_nlink kludge in ↵Al Viro
symlink() Both ext3 and ext4 put the half-created symlink inode into the orphan list for a while (see the comment in ext[34]_symlink() for gory details). Then, if everything went fine, they pull it out of the orphan list and bump the link count back to 1. The thing is, inc_nlink() is going to complain about seeing somebody changing i_nlink from 0 to 1. With a good reason, since normally something like that is a bug. Explicit set_nlink(inode, 1) does the same thing as inc_nlink() here, but it does *not* complain - exactly because it should be usable in strange situations like this one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08exofs: oops after late failure in mountAl Viro
We have already set ->s_root, so ->put_super() is going to be called. Freeing ->s_fs_info is a bloody bad idea when it's going to be dereferenced very shortly... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08devpts: fix double-free on mount failureAl Viro
devpts_kill_sb() is called even if devpts_fill_super() fails; we should not do that kfree() in the latter, especially not with ->s_fs_info left pointing to freed object. Double kfree() is a Bad Thing(tm)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
2012-01-08Merge branch 'pm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits) PM / Hibernate: Implement compat_ioctl for /dev/snapshot PM / Freezer: fix return value of freezable_schedule_timeout_killable() PM / shmobile: Allow the A4R domain to be turned off at run time PM / input / touchscreen: Make st1232 use device PM QoS constraints PM / QoS: Introduce dev_pm_qos_add_ancestor_request() PM / shmobile: Remove the stay_on flag from SH7372's PM domains PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode PM: Drop generic_subsys_pm_ops PM / Sleep: Remove forward-only callbacks from AMBA bus type PM / Sleep: Remove forward-only callbacks from platform bus type PM: Run the driver callback directly if the subsystem one is not there PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412. PM / Sleep: Merge internal functions in generic_ops.c PM / Sleep: Simplify generic system suspend callbacks PM / Hibernate: Remove deprecated hibernation snapshot ioctls PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled() ARM: S3C64XX: Implement basic power domain support PM / shmobile: Use common always on power domain governor ... Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused XBT_FORCE_SLEEP bit
2012-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: local functions should be static GFS2: We only need one ACL getting function GFS2: Fix multi-block allocation GFS2: decouple quota allocations from block allocations GFS2: split function rgblk_search GFS2: Fix up "off by one" in the previous patch GFS2: move toward a generic multi-block allocator GFS2: O_(D)SYNC support for fallocate GFS2: remove vestigial al_alloced GFS2: combine gfs2_alloc_block and gfs2_alloc_di GFS2: Add non-try locks back to get_local_rgrp GFS2: f_ra is always valid in dir readahead function GFS2: Fix very unlikley memory leak in ACL xattr code GFS2: More automated code analysis fixes GFS2: Add readahead to sequential directory traversal GFS2: Fix up REQ flags
2012-01-08Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (22 commits) xfs: mark the xfssyncd workqueue as non-reentrant xfs: simplify xfs_qm_detach_gdquots xfs: fix acl count validation in xfs_acl_from_disk() xfs: remove unused XBT_FORCE_SLEEP bit xfs: remove XFS_QMOPT_DQSUSER xfs: kill xfs_qm_idtodq xfs: merge xfs_qm_dqinit_core into the only caller xfs: add a xfs_dqhold helper xfs: simplify xfs_qm_dqattach_grouphint xfs: nest qm_dqfrlist_lock inside the dquot qlock xfs: flatten the dquot lock ordering xfs: implement lazy removal for the dquot freelist xfs: remove XFS_DQ_INACTIVE xfs: cleanup xfs_qm_dqlookup xfs: cleanup dquot locking helpers xfs: remove the sync_mode argument to xfs_qm_dqflush_all xfs: remove xfs_qm_sync xfs: make sure to really flush all dquots in xfs_qm_quotacheck xfs: untangle SYNC_WAIT and SYNC_TRYLOCK meanings for xfs_qm_dqflush xfs: remove the lid_size field in struct log_item_desc ... Fix up trivial conflict in fs/xfs/xfs_sync.c
2012-01-08Merge branch 'for-linus2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits) reiserfs: Properly display mount options in /proc/mounts vfs: prevent remount read-only if pending removes vfs: count unlinked inodes vfs: protect remounting superblock read-only vfs: keep list of mounts for each superblock vfs: switch ->show_options() to struct dentry * vfs: switch ->show_path() to struct dentry * vfs: switch ->show_devname() to struct dentry * vfs: switch ->show_stats to struct dentry * switch security_path_chmod() to struct path * vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb vfs: trim includes a bit switch mnt_namespace ->root to struct mount vfs: take /proc/*/mounts and friends to fs/proc_namespace.c vfs: opencode mntget() mnt_set_mountpoint() vfs: spread struct mount - remaining argument of next_mnt() vfs: move fsnotify junk to struct mount vfs: move mnt_devname vfs: move mnt_list to struct mount vfs: switch pnode.h macros to struct mount * ...
2012-01-08ore: Must support none-PAGE-aligned IOBoaz Harrosh
NFS might send us offsets that are not PAGE aligned. So we must read in the reminder of the first/last pages, in cases we need it for Parity calculations. We only add an sg segments to read the partial page. But we don't mark it as read=true because it is a lock-for-write page. TODO: In some cases (IO spans a single unit) we can just adjust the raid_unit offset/length, but this is left for later Kernels. [Bug in 3.2.0 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-07Merge branch 'driver-core-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07NFSv4: Save the owner/group name string when doing openTrond Myklebust
...so that we can do the uid/gid mapping outside the asynchronous RPC context. This fixes a bug in the current NFSv4 atomic open code where the client isn't able to determine what the true uid/gid fields of the file are, (because the asynchronous nature of the OPEN call denies it the ability to do an upcall) and so fills them with default values, marking the inode as needing revalidation. Unfortunately, in some cases, the VFS will do some additional sanity checks on the file, and may override the server's decision to allow the open because it sees the wrong owner/group fields. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06reiserfs: Properly display mount options in /proc/mountsJan Kara
Make reiserfs properly display mount options in /proc/mounts. CC: reiserfs-devel@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: prevent remount read-only if pending removesMiklos Szeredi
If there are any inodes on the super block that have been unlinked (i_nlink == 0) but have not yet been deleted then prevent the remounting the super block read-only. Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: count unlinked inodesMiklos Szeredi
Add a new counter to the superblock that keeps track of unlinked but not yet deleted inodes. Do not WARN_ON if set_nlink is called with zero count, just do a ratelimited printk. This happens on xfs and probably other filesystems after an unclean shutdown when the filesystem reads inodes which already have zero i_nlink. Reported by Christoph Hellwig. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: protect remounting superblock read-onlyMiklos Szeredi
Currently remouting superblock read-only is racy in a major way. With the per mount read-only infrastructure it is now possible to prevent most races, which this patch attempts. Before starting the remount read-only, iterate through all mounts belonging to the superblock and if none of them have any pending writes, set sb->s_readonly_remount. This indicates that remount is in progress and no further write requests are allowed. If the remount succeeds set MS_RDONLY and reset s_readonly_remount. If the remounting is unsuccessful just reset s_readonly_remount. This can result in transient EROFS errors, despite the fact the remount failed. Unfortunately hodling off writes is difficult as remount itself may touch the filesystem (e.g. through load_nls()) which would deadlock. A later patch deals with delayed writes due to nlink going to zero. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: keep list of mounts for each superblockMiklos Szeredi
Keep track of vfsmounts belonging to a superblock. List is protected by vfsmount_lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: switch ->show_options() to struct dentry *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: switch ->show_path() to struct dentry *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: switch ->show_devname() to struct dentry *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: switch ->show_stats to struct dentry *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06switch security_path_chmod() to struct path *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06vfs: prefer ->dentry->d_sb to ->mnt->mnt_sbAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into ZAl Viro
2012-01-06Merge branch 'pstore' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux * 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore: gracefully handle NULL pstore_info functions pstore: pass reason to backend write callback
2012-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits) net: pack skb_shared_info more efficiently net_sched: red: split red_parms into parms and vars net_sched: sfq: extend limits cnic: Improve error recovery on bnx2x devices cnic: Re-init dev->stats_addr after chip reset net_sched: Bug in netem reordering bna: fix sparse warnings/errors bna: make ethtool_ops and strings const xgmac: cleanups net: make ethtool_ops const vmxnet3" make ethtool ops const xen-netback: make ops structs const virtio_net: Pass gfp flags when allocating rx buffers. ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call igb: reset PHY after recovering from PHY power down igb: add basic runtime PM support igb: Add support for byte queue limits. e1000: cleanup CE4100 MDIO registers access e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove ...
2012-01-06Merge branch 'driver-core-next' into Linux 3.2Greg Kroah-Hartman
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) sched/tracing: Add a new tracepoint for sleeptime sched: Disable scheduler warnings during oopses sched: Fix cgroup movement of waking process sched: Fix cgroup movement of newly created process sched: Fix cgroup movement of forking process sched: Remove cfs bandwidth period check in tg_set_cfs_period() sched: Fix load-balance lock-breaking sched: Replace all_pinned with a generic flags field sched: Only queue remote wakeups when crossing cache boundaries sched: Add missing rcu_dereference() around ->real_parent usage [S390] fix cputime overflow in uptime_proc_show [S390] cputime: add sparse checking and cleanup sched: Mark parent and real_parent as __rcu sched, nohz: Fix missing RCU read lock sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer sched, nohz: Fix the idle cpu check in nohz_idle_balance sched: Use jump_labels for sched_feat sched/accounting: Fix parameter passing in task_group_account_field sched/accounting: Fix user/system tick double accounting sched/accounting: Re-use scheduler statistics for the root cgroup ... Fix up conflicts in - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h usecs_to_cputime64() vs the sparse cleanups - kernel/sched/fair.c, kernel/time/tick-sched.c scheduler changes in multiple branches
2012-01-06fs/9p: iattr_valid flags are kernel internal flags map them to 9p values.Aneesh Kumar K.V
Kernel internal values can change, add protocol values for these constant and use them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-06ore: fix BUG_ON, too few sgs when readingBoaz Harrosh
When reading RAID5 files, in rare cases, we calculated too few sg segments. There should be two extra for the beginning and end partial units. Also "too few sg segments" should not be a BUG_ON there is all the mechanics in place to handle it, as a short read. So just return -ENOMEM and the rest of the code will gracefully split the IO. [Bug in 3.2.0 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-06ore: Fix crash in case of an IO error.Boaz Harrosh
The users of ore_check_io() expect the reported device (In case of error) to be indexed relative to the passed-in ore_components table, and not the logical dev index. This causes a crash inside objlayoutdriver in case of an IO error. [Bug in 3.2.0 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-06ore: FIX breakage when MISC_FILESYSTEMS is not setBoaz Harrosh
As Reported by Randy Dunlap When MISC_FILESYSTEMS is not enabled and NFS4.1 is: fs/built-in.o: In function `objio_alloc_io_state': objio_osd.c:(.text+0xcb525): undefined reference to `ore_get_rw_state' fs/built-in.o: In function `_write_done': objio_osd.c:(.text+0xcb58d): undefined reference to `ore_check_io' fs/built-in.o: In function `_read_done': ... When MISC_FILESYSTEMS, which is more of a GUI thing then anything else, is not selected. exofs/Kconfig is never examined during Kconfig, and it can not do it's magic stuff to automatically select everything needed. We must split exofs/Kconfig in two. The ore one is always included. And the exofs one is left in it's old place in the menu. [Needed for the 3.2.0 Kernel] CC: Stable Tree <stable@kernel.org> Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-06NFS: Remove pNFS bloat from the generic write pathTrond Myklebust
We have no business doing any this in the standard write release path. Get rid of it, and put it in the pNFS layer. Also, while we're at it, get rid of the completely bogus unlock/relock semantics that were present in nfs_writeback_release_full(). It is not only unnecessary, but actually dangerous to release the write lock just in order to take it again in nfs_page_async_flush(). Better just to open code the pgio operations in a pnfs helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06pnfs-obj: Must return layout on IO errorBoaz Harrosh
As mandated by the standard. In case of an IO error, a pNFS objects layout driver must return it's layout. This is because all device errors are reported to the server as part of the layout return buffer. This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR is done, through a bit flag on the pnfs_layoutdriver_type->flags member. The flag is set by the layout driver that wants a layout_return preformed at pnfs_ld_{write,read}_done in case of an error. (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr because this code is never called outside of pnfs.c and pnfs IO paths) Without this patch 3.[0-2] Kernels leak memory and have an annoying WARN_ON after every IO error utilizing the pnfs-obj driver. [This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06pnfs-obj: pNFS errors are communicated on iodata->pnfs_errorBoaz Harrosh
Some time along the way pNFS IO errors were switched to communicate with a special iodata->pnfs_error member instead of the regular RPC members. But objlayout was not switched over. Fix that! Without this fix any IO error is hanged, because IO is not switched to MDS and pages are never cleared or read. [Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05vfs: fix up ENOIOCTLCMD error handlingLinus Torvalds
We're doing some odd things there, which already messes up various users (see the net/socket.c code that this removes), and it was going to add yet more crud to the block layer because of the incorrect error code translation. ENOIOCTLCMD is not an error return that should be returned to user mode from the "ioctl()" system call, but it should *not* be translated as EINVAL ("Invalid argument"). It should be translated as ENOTTY ("Inappropriate ioctl for device"). That EINVAL confusion has apparently so permeated some code that the block layer actually checks for it, which is sad. We continue to do so for now, but add a big comment about how wrong that is, and we should remove it entirely eventually. In the meantime, this tries to keep the changes localized to just the EINVAL -> ENOTTY fix, and removing code that makes it harder to do the right thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-05NFS: Cache state owners after files are closedChuck Lever
Servers have a finite amount of memory to store NFSv4 open and lock owners. Moreover, servers may have a difficult time determining when they can reap their state owner table, thanks to gray areas in the NFSv4 protocol specification. Thus clients should be careful to reuse state owners when possible. Currently Linux is not too careful. When a user has closed all her files on one mount point, the state owner's reference count goes to zero, and it is released. The next OPEN allocates a new one. A workload that serially opens and closes files can run through a large number of open owners this way. When a state owner's reference count goes to zero, slap it onto a free list for that nfs_server, with an expiry time. Garbage collect before looking for a state owner. This makes state owners for active users available for re-use. Now that there can be unused state owners remaining at umount time, purge the state owner free list when a server is destroyed. Also be sure not to reclaim unused state owners during state recovery. This change has benefits for the client as well. For some workloads, this approach drops the number of OPEN_CONFIRM calls from the same as the number of OPEN calls, down to just one. This reduces wire traffic and thus open(2) latency. Before this patch, untarring a kernel source tarball shows the OPEN_CONFIRM call counter steadily increasing through the test. With the patch, the OPEN_CONFIRM count remains at 1 throughout the entire untar. As long as the expiry time is kept short, I don't think garbage collection should be terribly expensive, although it does bounce the clp->cl_lock around a bit. [ At some point we should rationalize the use of the nfs_server ->destroy method. ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Trond: Fixed a garbage collection race and a few efficiency issues] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05fs/9p: We should not allocate a new inode when creating hardlines.Aneesh Kumar K.V
Don't do new_inode_from fid in case of hardlink creation. This ensures that link count for hardlink files get updated properly. Earlier link count was not updated on removing a hardlink with cache mode enabled. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-05fs/9p: v9fs_stat2inode should update suid/sgid bits.Aneesh Kumar K.V
Create a new helper that update the permission bits and use that, instead of opencoding the logic. Reported and bisected by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-059p: Reduce object size with CONFIG_NET_9P_DEBUGJoe Perches
Reduce object size by deduplicating formats. Use vsprintf extension %pV. Rename P9_DPRINTK uses to p9_debug, align arguments. Add function for _p9_debug and macro to add __func__. Add missing "\n"s to p9_debug uses. Remove embedded function names as p9_debug adds it. Remove P9_EPRINTK macro and convert use to pr_<level>. Add and use pr_fmt and pr_<level>. $ size fs/9p/built-in.o* text data bss dec hex filename 62133 984 16000 79117 1350d fs/9p/built-in.o.new 67342 984 16928 85254 14d06 fs/9p/built-in.o.old $ size net/9p/built-in.o* text data bss dec hex filename 88792 4148 22024 114964 1c114 net/9p/built-in.o.new 94072 4148 23232 121452 1da6c net/9p/built-in.o.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-05fs/9p: check schedule_timeout_interruptible return valueJim Garlick
In v9fs_file_do_lock() we need to check return value of schedule_timeout_interruptible() and exit the loop when it returns nonzero, otherwise the loop is not really interruptible and after the signal, the loop is no longer throttled by P9_LOCK_TIMEOUT. Signed-off-by: Jim Garlick <garlick.jim@gmail.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-05NFS: Clean up nfs4_find_state_owners_locked()Chuck Lever
There's no longer a need to check the so_server field in the state owner, because nowadays the RB tree we search for state owners contains owners for that only server. Make nfs4_find_state_owners_locked() use the same tree searching logic as nfs4_insert_state_owner_locked(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05NFSv4: include bitmap in nfsv4 get acl dataAndy Adamson
The NFSv4 bitmap size is unbounded: a server can return an arbitrary sized bitmap in an FATTR4_WORD0_ACL request. Replace using the nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data xdr length to the (cached) acl page data. This is a general solution to commit e5012d1f "NFSv4.1: update nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved. Cc: stable@kernel.org Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05nfs: fix a minor do_div portability issueChris Metcalf
This change modifies filelayout_get_dense_offset() to use the functions in math64.h and thus avoid a 32-bit platform compile error trying to use do_div() on an s64 type. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05NFSv4.1: cleanup comment and debug printkAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05NFSv4.1: change nfs4_free_slot parameters for dynamic slotsAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05NFSv4.1: cleanup init and reset of session slot tablesAndy Adamson
We are either initializing or resetting a session. Initialize or reset the session slot tables accordingly. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>