summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2007-05-08namei.c: remove utterly outdated commentChristoph Hellwig
We don't have a routine called namei() anymore since at least 2.3.x, and the comment is just totally out of sync with the current lookup logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08vfs: remove superflous sb == NULL checksChristoph Hellwig
inode->i_sb is always set, not need to check for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08proc: remove pathetic ->deleted WARN_ONAlexey Dobriyan
WARN_ON(de && de->deleted); is sooo unreliable. Why? proc_lookup remove_proc_entry =========== ================= lock_kernel(); spin_lock(&proc_subdir_lock); [find proc entry] spin_unlock(&proc_subdir_lock); spin_lock(&proc_subdir_lock); [find proc entry] proc_get_inode ============== WARN_ON(de && de->deleted); ... if (!atomic_read(&de->count)) free_proc_entry(de); else de->deleted = 1; So, if you have some strange oops [1], and doesn't see this WARN_ON it means nothing. [1] try_module_get() of module which doesn't exist, two lines below should suffice, or not? Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Fix race between proc_readdir and remove_proc_entryDarrick J. Wong
Fix the following race: proc_readdir remove_proc_entry ============ ================= spin_lock(&proc_subdir_lock); [choose PDE to start filldir from] spin_unlock(&proc_subdir_lock); spin_lock(&proc_subdir_lock); [find PDE] [free PDE, refcount is 0] spin_unlock(&proc_subdir_lock); /* boom */ if (filldir(dirent, de->name, ... [de_put on error path --adobriyan] Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Fix race between proc_get_inode() and remove_proc_entry()Alexey Dobriyan
proc_lookup remove_proc_entry =========== ================= lock_kernel(); spin_lock(&proc_subdir_lock); [find PDE with refcount 0] spin_unlock(&proc_subdir_lock); spin_lock(&proc_subdir_lock); [find PDE with refcount 0] [check refcount and free PDE] spin_unlock(&proc_subdir_lock); proc_get_inode: de_get(de); /* boom */ Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08add filesystem subtype supportMiklos Szeredi
There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the filesystem is fuse based or not. So there's a conflict of interest in how this should be represented in fstab, mtab and /proc/mounts. The current scheme is to encode the real filesystem type in the mount source. So an sshfs mount looks like this: sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,... This url-ish syntax works OK for sshfs and similar filesystems. However for block device based filesystems (ntfs-3g, zfs) it doesn't work, since the kernel expects the mount source to be a real device name. A possibly better scheme would be to encode the real type in the type field as "type.subtype". So fuse mounts would look like this: /dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,... user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,... This patch adds the necessary code to the kernel so that this can be correctly displayed in /proc/mounts. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08oss: strlcpy is smart enoughJean Delvare
strlcpy already accounts for the trailing zero in its length computation, so there is no need to substract one to the buffer size. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08epoll: optimizations and cleanupsDavide Libenzi
Epoll is doing multiple passes over the ready set at the moment, because of the constraints over the f_op->poll() call. Looking at the code again, I noticed that we already hold the epoll semaphore in read, and this (together with other locking conditions that hold while doing an epoll_wait()) can lead to a smarter way [1] to "ship" events to userspace (in a single pass). This is a stress application that can be used to test the new code. It spwans multiple thread and call epoll_wait() and epoll_ctl() from many threads. Stress tested on my dual Opteron 254 w/out any problems. http://www.xmailserver.org/totalmess.c This is not a benchmark, just something that tries to stress and exploit possible problems with the new code. Also, I made a stupid micro-benchmark: http://www.xmailserver.org/epwbench.c [1] Considering that epoll must be thread-safe, there are five ways we can be hit during an epoll_wait() transfer loop (ep_send_events()): 1) The epoll fd going away and calling ep_free This just can't happen, since we did an fget() in sys_epoll_wait 2) An epoll_ctl(EPOLL_CTL_DEL) This can't happen because epoll_ctl() gets ep->sem in write, and we're holding it in read during ep_send_events() 3) An fd stored inside the epoll fd going away This can't happen because in eventpoll_release_file() we get ep->sem in write, and we're holding it in read during ep_send_events() 4) Another epoll_wait() happening on another thread They both can be inside ep_send_events() at the same time, we get (splice) the ready-list under the spinlock, so each one will get its own ready list. Note that an fd cannot be at the same time inside more than one ready list, because ep_poll_callback() will not re-queue it if it sees it already linked: if (ep_is_linked(&epi->rdllink)) goto is_linked; Another case that can happen, is two concurrent epoll_wait(), coming in with a userspace event buffer of size, say, ten. Suppose there are 50 event ready in the list. The first epoll_wait() will "steal" the whole list, while the second, seeing no events, will go to sleep. But at the end of ep_send_events() in the first epoll_wait(), we will re-inject surplus ready fds, and we will trigger the proper wake_up to the second epoll_wait(). 5) ep_poll_callback() hitting us asyncronously This is the tricky part. As I said above, the ep_is_linked() test done inside ep_poll_callback(), will guarantee us that until the item will result linked to a list, ep_poll_callback() will not try to re-queue it again (read, write data on any of its members). When we do a list_del() in ep_send_events(), the item will still satisfy the ep_is_linked() test (whatever data is written in prev/next, it'll never be its own pointer), so ep_poll_callback() will still leave us alone. It's only after the eventual smp_mb()+INIT_LIST_HEAD(&epi->rdllink) that it'll become visible to ep_poll_callback(), but at the point we're already past it. [akpm@osdl.org: 80 cols] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08the scheduled removal of OBSOLETE_OSS optionsAdrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08ext3: dirindex error pointer issuesDmitriy Monakhov
- ext3_dx_find_entry() exit with out setting proper error pointer - do_split() exit with out setting proper error pointer it is realy painful because many callers contain folowing code: de = do_split(handle,dir, &bh, frame, &hinfo, &retval); if (!(de)) return retval; <<< WOW retval wasn't changed by do_split(), so caller failed <<< but return SUCCESS :) - Rearrange do_split() error path. Current error path is realy ugly, all this up and down jump stuff doesn't make code easy to understand. [dmonakhov@sw.ru: fix annoying fake error messages] Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Optimize timespec_trunc()Eric Dumazet
The first thing done by timespec_trunc() is : if (gran <= jiffies_to_usecs(1) * 1000) This should really be a test against a constant known at compile time. Alas, it isnt. jiffies_to_usec() was unilined so C compiler emits a function call and a multiply to compute : a CONSTANT. mov $0x1,%edi mov %rbx,0xffffffffffffffe8(%rbp) mov %r12,0xfffffffffffffff0(%rbp) mov %edx,%ebx mov %rsi,0xffffffffffffffc8(%rbp) mov %rsi,%r12 callq ffffffff80232010 <jiffies_to_usecs> imul $0x3e8,%eax,%eax cmp %ebx,%eax This patch reorders kernel/time.c a bit so that jiffies_to_usecs() is defined before timespec_trunc() so that compiler now generates : cmp $0x3d0900,%edx (HZ=250 on my machine) This gives a better code (timespec_trunc() becoming a leaf function), and shorter kernel size as well. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08init dma masks in pnp_devDavid Brownell
PNP now initializes device dma masks, which prevents oopses when generic dma calls are made using pnp device nodes. This assumes PNP only uses ISA DMA, with 24 bit addresses; and that it's safe to init those masks for all devices (rather than finding out which devices have been assigned DMA channels, and handling only those). Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Adam Belay <abelay@novell.com> Cc: Jaroslav Kysela <perex@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08rcutorture: Mark rcu_torture_init as __initJosh Triplett
The corresponding rcu_torture_cleanup cannot get marked as __exit, because rcu_torture_init uses it to clean up if init fails. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Merge sys_clone()/sys_unshare() nsproxy and namespace handlingBadari Pulavarty
sys_clone() and sys_unshare() both makes copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy_*_ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines. - Removed unnessary !ns checks from copy_*_ns() and added BUG_ON() just incase. - Get rid of all individual unshare_*_ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08exec: fix remove_arg_zeroNick Piggin
Petr Tesarik discovered a problem in remove_arg_zero(). He writes: When a script is loaded, load_script() replaces argv[0] with the name of the interpreter and the filename passed to the exec syscall. However, there is no guarantee that the length of the interpreter name plus the length of the filename is greater than the length of the original argv[0]. If the difference happens to cross a page boundary, setup_arg_pages() will call put_dirty_page() [aka install_arg_page()] with an address outside the VMA. Therefore, remove_arg_zero() must free all pages which would be unused after the argument is removed. So, rewrite the remove_arg_zero function without gotos, with a few comments, and with the commonly used explicit index/offset. This fixes the problem and makes it easier to understand as well. [a.p.zijlstra@chello.nl: add comment] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Cap shmmax at INT_MAX in compat shminfoGuy Streeter
The value of shmmax may be larger than will fit in the struct used by the 32bit compat version of sys_shmctl. This change mirrors what the normal sys_shmctl does when called with the old IPC_INFO command. Signed-off-by: Guy Streeter <streeter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Use stop_machine_run in the Intel RNG driverPrarit Bhargava
Replace call_smp_function with stop_machine_run in the Intel RNG driver. CPU A has done read_lock(&lock) CPU B has done write_lock_irq(&lock) and is waiting for A to release the lock. A third CPU calls call_smp_function and issues the IPI. CPU A takes CPU C's IPI. CPU B is waiting with interrupts disabled and does not see the IPI. CPU C is stuck waiting for CPU B to respond to the IPI. Deadlock. The solution is to use stop_machine_run instead of call_smp_function (call_smp_function should not be called in situations where the CPUs may be suspended). [haruo.tomita@toshiba.co.jp: fix a typo in mod_init()] [haruo.tomita@toshiba.co.jp: fix memory leak] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: "Tomita, Haruo" <haruo.tomita@toshiba.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08IRQ: add __must_check to request_irqMonakhov Dmitriy
This could help to find buggy drivers where request_irq return value wasn't checked. There's just no reason to ignore errors which can and do occur. Anyone who got warning during compilation have to realise what it is't realy safe code. Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08kconfig: centralize the selection of semaphore debugging in lib/Kconfig.debugRobert P. J. Day
Remove the Kconfig selection of semaphore debugging from the ALPHA and FRV Kconfig files, and centralize it in lib/Kconfig.debug. There doesn't seem to be much point in letting individual architectures independently define the same Kconfig option when it can just as easily be put in a single Kconfig file and made dependent on a subset of architectures. that way, at least the option shows up in the same relative location in the menu each time. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: David Howells <dhowells@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08reiserfs: correct misspelled "REISERFS_PROC_INFO" to "CONFIG_REISERFS_PROC_INFO"Robert P. J. Day
Correct the misspelling of the preprocessor check of a Kconfig option to refer to CONFIG_REISERFS_PROC_INFO and not just the incorrect REISERFS_PROC_INFO. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08module: use kreallocPekka Enberg
This converts an open-coded krealloc() to use the shiny new API. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08reiserfs: shrink superblock if no xattrsAlexey Dobriyan
This makes in-core superblock fit into one cacheline here. Before: struct dentry * xattr_root; /* 124 4 */ /* --- cacheline 1 boundary (128 bytes) --- */ struct rw_semaphore xattr_dir_sem; /* 128 12 */ int j_errno; /* 140 4 */ }; /* size: 144, cachelines: 2 */ /* sum members: 142, holes: 1, sum holes: 2 */ /* last cacheline: 16 bytes */ After: int j_errno; /* 124 4 */ /* --- cacheline 1 boundary (128 bytes) --- */ }; /* size: 128, cachelines: 1 */ /* sum members: 126, holes: 1, sum holes: 2 */ Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Apple SMC driver (hardware monitoring and control)Nicolas Boichat
This driver provides support for the Apple System Management Controller, which provides an accelerometer (Apple Sudden Motion Sensor), light sensors, temperature sensors, keyboard backlight control and fan control. Only Intel-based Apple's computers are supported (MacBook Pro, MacBook, MacMini). [bunk@stusta.de: make drivers/hwmon/applesmc.c:backlight_work stati] [khali@linux-fr.org: fix temperature attribute file names] Signed-off-by: Nicolas Boichat <nicolas@boichat.ch> Cc: Jean Delvare <khali@linux-fr.org> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Fix compilation of drivers with -O0Michal Schmidt
It is sometimes useful to compile individual drivers with optimization disabled for easier debugging. Currently drivers which use htonl() and similar functions don't compile with -O0. This patch fixes it. It also removes obsolete and misleading comments. This header is not for userspace, so we don't have to care about strange programs these comments mention. (akpm: -O0 probably isn't a good idea, but this code looks pretty crufty and unuseful) Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08init/do_mounts.c: proper prepare_namespace() prototypeAdrian Bunk
Add a proper protype for prepare_namespace() in include/linux/init.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08fix section mismatch warning in lib/swiotlb.cSam Ravnborg
kbuild spits outs following warning on a defconfig x86_64 build: WARNING: swiotlb.o - Section mismatch: reference to .init.text:swiotlb_init from __ksymtab between '__ksymtab_swiotlb_init' (at offset 0xa0) and '__ksymtab_swiotlb_free_coherent' This warning happens because the function swiotlb_init is marked __init and EXPORT_SYMBOL(). A 'git grep swiotlb_init' showed no users in drivers/ so remove the EXPORT_SYMBOL. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08scripts: kernel-doc whitespace cleanupRandy Dunlap
Whitespace cleanup only: convert some series of spaces to tabs. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08reiserfs: possible null pointer dereference during resizeDmitriy Monakhov
sb_read may return NULL, let's explicitly check it. If so free new bitmap blocks array, after this we may safely exit as it done above during bitmap allocation. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08freevxfs: possible null pointer dereference fixDmitriy Monakhov
sb_read may return NULL, so let's explicitly check it. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08is_power_of_2 in fs/block_dev.cVignesh Babu BM
Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08is_power_of_2 in fs/hfsVignesh Babu BM
Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08is_power_of_2 in fatVignesh Babu BM
Replacing (n & (n-1)) in the context of power of 2 checks with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08drivers/char/hvc_console.c: cleanupsAdrian Bunk
- make needlessly global code static - remove the unused EXPORT_SYMBOL's Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08tty: Clarify documentation of ->write()Alan
The tty driver write method is different to the usual fops device write methods as the buffer is already in kernel space. Clarify the docs since someone writing a driver made that mistake. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08devpts: add fsnotify create eventFlorin Malita
Currently, devpts doesn't generate an fsnotify event upon pts creation because the regular vfs paths aren't involved. Deallocation, on the other hand, correctly generates a nameremove event thanks to the d_delete() invocation in devpts_pty_kill(). This patch adds the missing fsnotify_create() trigger in devpts_pty_new(). Signed-off-by: Florin Malita <fmalita@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08use use SEEK_MAX to validate user lseek argumentsChris Snook
Add SEEK_MAX and use it to validate lseek arguments from userspace. Signed-off-by: Chris Snook <csnook@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08use symbolic constants in generic lseek codeChris Snook
Convert magic numbers to SEEK_* values from fs.h Signed-off-by: Chris Snook <csnook@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Documentation: Ask driver writers to provide PM supportRafael J. Wysocki
Add a paragraph in Documentation/SubmittingDrivers requesting that the basic PM support be provided by new device drivers. Add two new documents in Documentation/power/ giving general instructions on debugging the suspend/resume functionality and testing the suspend and resume support in device drivers. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: David Brownell <david-b@pacbell.net> Cc: Nigel Cunningham <ncunningham@linuxmail.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Fix constant folding and poor optimization in byte swapping codeTrent Piepho
Constant folding does not work for the swabXX() byte swapping functions, and the C versions optimize poorly. Attempting to initialize a global variable to swab16(0x1234) or put something like "case swab32(42):" in a switch statement will not compile. It can work, swab.h just isn't doing it correctly. This patch fixes that. Contrary to the comment in asm-i386/byteorder.h, gcc does not recognize the "C" version of swab16 and turn it into efficient code. gcc can do this, just not with the current code. The simple function: u16 foo(u16 x) { return swab16(x); } Would compile to: movzwl %ax, %eax movl %eax, %edx shrl $8, %eax sall $8, %edx orl %eax, %edx With this patch, it will compile to: rolw $8, %ax I also attempted to document the maze different macros/inline functions that are used to create the final product. Signed-off-by: Trent Piepho <xyzzy@speakeasy.org> Cc: Francois-Rene Rideau <fare@tunes.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08softlockup: s/99/MAX_RT_PRIO/Oleg Nesterov
Don't use hardcoded 99 value, use MAX_RT_PRIO. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08freezer: task->exit_state should be treated as boleanOleg Nesterov
Except for BUG_ON() checks, we should not use EXIT_XXXX defines outside of exit/wait paths. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08ipmi: add pci remove handlingCorey Minyard
Add pci_remove handling to the driver, so it will clean up if the device is hot-removed. Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08ipmi: add new IPMI nmi watchdog handlingCorey Minyard
Convert over to the new NMI handling for getting IPMI watchdog timeouts via an NMI. This add config options to know if there is the ability to receive NMIs and if it has an NMI post processing call. Then it modifies the IPMI watchdog to take advantage of this so that it can know if an NMI comes in. It also adds testing that the IPMI NMI watchdog works. Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08ipmi: allow shared interruptsCorey Minyard
The IPMI driver used enable_irq and disable_irq when it got into situations where it couldn't allocate memory; it did this to avoid having the interrupt just lock the machine when it couldn't get memory to perform the transaction to disable the interrupt. This patch modifies the driver to not use disable_irq and enable_irq. It instead sends the messages to the BMC to perform this operation. It also makes sure interrupts are cleanly disabled when the interface is shut down and cleans up some shutdown things that are no longer necessary. Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08ipmi: add powerpc openfirmware sensingCorey Minyard
Add support for of_platform_driver to the ipmi_si module. When loading the module, the driver will be registered to of_platform. The driver will be probed for all devices with the type ipmi. It's supporting devices with compatible settings ipmi-kcs, ipmi-smic and ipmi-bt. Only ipmi-kcs could be tested. Signed-off-by: Christian Krafft <krafft@de.ibm.com> Acked-by: Heiko J Schick <schihei@de.ibm.com> Signed-off-by: Corey Minyard <minyard@acm.org> Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08mm: shrink parent dentries when shrinking slabAndrew Morton
Teach the dentry slab shrinker to aggressively shrink parent dentries when shrinking the dentry cache. This is done to attempt to improve the situation where the dentry slab cache gets a lot of internal fragmentation due to pages containing directory dentries. It is expected that this change will cause some of those dentries to be reaped earlier, and with less scanning. Needs careful testing. Cc: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08fix quadratic behavior of shrink_dcache_parent()Miklos Szeredi
The time shrink_dcache_parent() takes, grows quadratically with the depth of the tree under 'parent'. This starts to get noticable at about 10,000. These kinds of depths don't occur normally, and filesystems which invoke shrink_dcache_parent() via d_invalidate() seem to have other depth dependent timings, so it's not even easy to expose this problem. However with FUSE it's easy to create a deep tree and d_invalidate() will also get called. This can make a syscall hang for a very long time. This is the original discovery of the problem by Russ Cox: http://article.gmane.org/gmane.comp.file-systems.fuse.devel/3826 The following patch fixes the quadratic behavior, by optionally allowing prune_dcache() to prune ancestors of a dentry in one go, instead of doing it one at a time. Common code in dput() and prune_one_dentry() is extracted into a new helper function d_kill(). shrink_dcache_parent() as well as shrink_dcache_sb() are converted to use the ancestry-pruner option. Only for shrink_dcache_memory() is this behavior not desirable, so it keeps using the old algorithm. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Maneesh Soni <maneesh@in.ibm.com> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08reduce size of task_struct on 64-bit machinesWilliam Cohen
This past week I was playing around with that pahole tool (http://oops.ghostprotocols.net:81/acme/dwarves/) and looking at the size of various struct in the kernel. I was surprised by the size of the task_struct on x86_64, approaching 4K. I looked through the fields in task_struct and found that a number of them were declared as "unsigned long" rather than "unsigned int" despite them appearing okay as 32-bit sized fields. On x86_64 "unsigned long" ends up being 8 bytes in size and forces 8 byte alignment. Is there a reason there a reason they are "unsigned long"? The patch below drops the size of the struct from 3808 bytes (60 64-byte cachelines) to 3760 bytes (59 64-byte cachelines). A couple other fields in the task struct take a signficant amount of space: struct thread_struct thread; 688 struct held_lock held_locks[30]; 1680 CONFIG_LOCKDEP is turned on in the .config [akpm@linux-foundation.org: fix printk warnings] Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08ext2/3/4: fix file date underflow on ext2 3 filesystems on 64 bit systemsMarkus Rechberger
Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079 signed long ranges from -2.147.483.648 to 2.147.483.647 on x86 32bit 10000011110110100100111110111101 .. -2,082,844,739 10000011110110100100111110111101 .. 2,212,122,557 <- this currently gets stored on the disk but when converting it to a 64bit signed long value it loses its sign and becomes positive. Cc: Andreas Dilger <adilger@dilger.ca> Cc: <linux-ext4@vger.kernel.org> Andreas says: This patch is now treating timestamps with the high bit set as negative times (before Jan 1, 1970). This means we lose 1/2 of the possible range of timestamps (lopping off 68 years before unix timestamp overflow - now only 30 years away :-) to handle the extremely rare case of setting timestamps into the distant past. If we are only interested in fixing the underflow case, we could just limit the values to 0 instead of storing negative values. At worst this will skew the timestamp by a few hours for timezones in the far east (files would still show Jan 1, 1970 in "ls -l" output). That said, it seems 32-bit systems (mine at least) allow files to be set into the past (01/01/1907 works fine) so it seems this patch is bringing the x86_64 behaviour into sync with other kernels. On the plus side, we have a patch that is ready to add nanosecond timestamps to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so this extends the maximum date to 2242. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Allow access to /proc/$PID/fd after setuid()Alexey Dobriyan
/proc/$PID/fd has r-x------ permissions, so if process does setuid(), it will not be able to access /proc/*/fd/. This breaks fstatat() emulation in glibc. open("foo", O_RDONLY|O_DIRECTORY) = 4 setuid32(65534) = 0 stat64("/proc/self/fd/4/bar", 0xbfafb298) = -1 EACCES (Permission denied) Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: Kirill Korotaev <dev@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>