linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2008-02-01	[AUDIT] create context if auditing was ever enabled	Eric Paris
	Disabling audit at runtime by auditctl doesn't mean that we can stop allocating contexts for new processes; we don't want to miss them when that sucker is reenabled. (based on work from Al Viro in the RHEL kernel series) Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] clean up audit_receive_msg()	Eric Paris
	generally clean up audit_receive_msg() don't free random memory if selinux_sid_to_string fails for some reason. Move generic auditing to a helper function Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] make audit=0 really stop audit messages	Eric Paris
	Some audit messages (namely configuration changes) are still emitted even if the audit subsystem has been explicitly disabled. This patch turns those messages off as well. Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] break large execve argument logging into smaller messages	Eric Paris
	execve arguments can be quite large. There is no limit on the number of arguments and a 4G limit on the size of an argument. this patch prints those aruguments in bite sized pieces. a userspace size limitation of 8k was discovered so this keeps messages around 7.5k single arguments larger than 7.5k in length are split into multiple records and can be identified as aX[Y]= Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] include audit type in audit message when using printk	Eric Paris
	Currently audit drops the audit type when an audit message goes through printk instead of the audit deamon. This is a minor annoyance in that the audit type is no longer part of the message and the information the audit type conveys needs to be carried in, or derived from the message data. The attached patch includes the type number as part of the printk. Admittedly it isn't the type name that the audit deamon provides but I think this is better than dropping the type completely. Signed-pff-by: John Johansen <jjohansen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] do not panic on exclude messages in audit_log_pid_context()	Eric Paris
	If we fail to get an ab in audit_log_pid_context this may be due to an exclude rule rather than a memory allocation failure. If it was due to a memory allocation failue we would have already paniced and no need to do it again. Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] Add End of Event record	Eric Paris
	This patch adds an end of event record type. It will be sent by the kernel as the last record when a multi-record event is triggered. This will aid realtime analysis programs since they will now reliably know they have the last record to complete an event. The audit daemon filters this and will not write it to disk. Signed-off-by: Steve Grubb <sgrubb redhat com> Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] add session id to audit messages	Eric Paris
	In order to correlate audit records to an individual login add a session id. This is incremented every time a user logs in and is included in almost all messages which currently output the auid. The field is labeled ses= or oses= Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] collect uid, loginuid, and comm in OBJ_PID records	Eric Paris
	Add uid, loginuid, and comm collection to OBJ_PID records. This just gives users a little more information about the task that received a signal. pid is rather meaningless after the fact, and even though comm isn't great we can't collect exe reasonably on this code path for performance reasons. Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[AUDIT] return EINTR not ERESTART*	Eric Paris
	The syscall exit code will change ERESTART* kernel internal return codes to EINTR if it does not restart the syscall. Since we collect the audit info before that point we should fix those in the audit log as well. Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01	[PATCH] get rid of loginuid races	Al Viro
	Keeping loginuid in audit_context is racy and results in messier code. Taken to task_struct, out of the way of ->audit_context changes. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-02-01	[PATCH] switch audit_get_loginuid() to task_struct *	Al Viro
	all callers pass something->audit_context Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-02-01	futex: Add bitset conditional wait/wakeup functionality	Thomas Gleixner
	To allow the implementation of optimized rw-locks in user space, glibc needs a possibility to select waiters for wakeup depending on a bitset mask. This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is expecting an absolute timeout value instead of the relative one, which is used for the FUTEX_WAIT OP. FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is stored in the futex_q structure, which is used to enqueue the waiter into the hashed futex waitqueue. FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup function logically ANDs the bitset with the bitset stored in each waiters futex_q structure. If the result is zero (i.e. none of the set bits in the bitsets is matching), then the waiter is not woken up. If the result is not zero (i.e. one of the set bits in the bitsets is matching), then the waiter is woken. The bitset provided by the caller must be non zero. In case the provided bitset is zero the kernel returns EINVAL. Internaly the new OPs are only extensions to the existing FUTEX_WAIT and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits set into the futex_wait() and futex_wake() functions. Signed-off-by: Thomas Gleixner <tgxl@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01	futex: Remove warn on in return fixup path	Thomas Gleixner
	The WARN_ON() in the fixup return path of futex_lock_pi() can trigger with false positives. The following scenario happens: t1 holds the futex and t2 and t3 are blocked on the kernel side rt_mutex. t1 releases the futex (and the rt_mutex) and assigned t2 to be the next owner of the futex. t2 is interrupted and returns w/o acquiring the rt_mutex, before t1 can release the rtmutex. t1 releases the rtmutex and t3 becomes the pending owner of the rtmutex. t2 notices that it is the designated owner (user space variable) and fails to acquire the rt_mutex via trylock, because it is not allowed to steal the rt_mutex from t3. Now it looks at the rt_mutex pending owner (t3) and assigns the futex and the pi_state to it. During the fixup t4 steals the rtmutex from t3. t2 returns from the fixup and the owner of the rt_mutex has changed from t3 to t4. There is no need to do another round of fixups from t2. The important part (t2 is not returning as the user space visible owner) is done. The further fixups are done, before either t3 or t4 return to user space. For the user space it is not relevant which task (t3 or t4) is the real owner, as long as those are both in the kernel, which is guaranteed by the serialization of the hash bucket lock. Both tasks (which ever returns first to userspace - t4 because it locked the rt_mutex or t3 due to a signal) are going through the lock_futex_pi() return path where the ownership is fixed before the return to user space. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01	tick-sched: add more debug information	Thomas Gleixner
	To allow better diagnosis of tick-sched related, especially NOHZ related problems, we need to know when the last wakeup via an irq happened and when the CPU left the idle state. Add two fields (idle_waketime, idle_exittime) to the tick_sched structure and add them to the timer_list output. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01	timekeeping: update xtime_cache when time(zone) changes	Thomas Gleixner
	xtime_cache needs to be updated whenever xtime and or wall_to_monotic are changed. Otherwise users of xtime_cache might see a stale (and in the case of timezone changes utterly wrong) value until the next update happens. Fixup the obvious places, which miss this update. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01	hrtimer: fix hrtimer_init_sleeper() users	Peter Zijlstra
	this patch: commit 37bb6cb4097e29ffee970065b74499cbf10603a3 Author: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Fri Jan 25 21:08:32 2008 +0100 hrtimer: unlock hrtimer_wakeup Broke hrtimer_init_sleeper() users. It forgot to fix up the futex caller of this function to detect the failed queueing and messed up the do_nanosleep() caller in that it could leak a TASK_INTERRUPTIBLE state. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31	[AUDIT]: Increase skb->truesize in audit_expand	Herbert Xu
	The recent UDP patch exposed this bug in the audit code. It was calling pskb_expand_head without increasing skb->truesize. The caller of pskb_expand_head needs to do so because that function is designed to be called in places where truesize is already fixed and therefore it doesn't update its value. Because the audit system is using it in a place where the truesize has not yet been fixed, it needs to update its value manually. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-01	Ensure that we export __fatal_signal_pending()	Trond Myklebust
	It may be used by the modules nfs.ko and sunrpc.ko Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ Made it a regular export rather than GPL-only - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-01	Merge branch 'task_killable' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc * 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits) Remove commented-out code copied from NFS NFS: Switch from intr mount option to TASK_KILLABLE Add wait_for_completion_killable Add wait_event_killable Add schedule_timeout_killable Use mutex_lock_killable in vfs_readdir Add mutex_lock_killable Use lock_page_killable Add lock_page_killable Add fatal_signal_pending Add TASK_WAKEKILL exit: Use task_is_* signal: Use task_is_* sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL ptrace: Use task_is_* power: Use task_is_* wait: Use TASK_NORMAL proc/base.c: Use task_is_* proc/array.c: Use TASK_REPORT perfmon: Use task_is_* ... Fixed up conflicts in NFS/sunrpc manually..
2008-01-31	debug: turn ignore_loglevel into an early param	Ingo Molnar
	i was debugging early crashes and wondered where all the printks went. The reason: ignore_loglevel_setup() was not called yet ... Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31	sched: remove unused params	Gerald Stralko
	This removes the extra struct task_struct *p parameter in inc_nr_running and dec_nr_running functions. Signed-off by: Jerry Stralko <gerb.stralko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31	sched: let +nice tasks have smaller impact	Peter Zijlstra
	Michel Dänzr has bisected an interactivity problem with plus-reniced tasks back to this commit: 810e95ccd58d91369191aa4ecc9e6d4a10d8d0c8 is first bad commit commit 810e95ccd58d91369191aa4ecc9e6d4a10d8d0c8 Author: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Mon Oct 15 17:00:14 2007 +0200 sched: another wakeup_granularity fix unit mis-match: wakeup_gran was used against a vruntime fix this by assymetrically scaling the vtime of positive reniced tasks. Bisected-by: Michel Dänzer <michel@tungstengraphics.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31	sched: fix high wake up latencies with FAIR_USER_SCHED	Srivatsa Vaddagiri
	The reason why we are getting better wakeup latencies for !FAIR_USER_SCHED is because of this snippet of code in place_entity(): if (!initial) { /* sleeps upto a single latency don't count. / if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se)) ^^^^^^^^^^^^^^^^^^ vruntime -= sysctl_sched_latency; / ensure we never gain time by being placed backwards. */ vruntime = max_vruntime(se->vruntime, vruntime); } NEW_FAIR_SLEEPERS feature gives credit for sleeping only to tasks and not group-level entities. With the patch attached, I could see that wakeup latencies with FAIR_USER_SCHED are restored to the same level as !FAIR_USER_SCHED. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31	Merge branch 'linux-2.6'	Paul Mackerras

2008-01-30	KVM: Disallow fork() and similar games when using a VM	Avi Kivity
	We don't want the meaning of guest userspace changing under our feet. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30	x86/non-x86: percpu, node ids, apic ids x86.git fixup	Mike Travis
	Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	genirq: stackdump after the "Trying to free already-free IRQ" message	Ingo Molnar
	these bugs are harder to find than they seem, a stackdump helps. make it dependent on CONFIG_DEBUG_SHIRQ so that people can turn it off if it annoys them. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: add a simple backtrace test module	Arjan van de Ven
	During the work on the x86 32 and 64 bit backtrace code I found it useful to have a simple test module to test a process and irq context backtrace. Since the existing backtrace code was buggy, I figure it might be useful to have such a test module in the kernel so that maybe we can even detect such bugs earlier.. [ mingo@elte.hu: build fix ] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: make early printk selectable on 64-bit as well	Ingo Molnar
	Enable CONFIG_EMBEDDED to select CONFIG_EARLY_PRINTK on 64-bit as well. saves ~2K: text data bss dec hex filename 7290283 3672091 1907848 12870222 c4624e vmlinux.before 7288373 3671795 1907848 12868016 c459b0 vmlinux.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: kprobes: add kprobes smoke tests that run on boot	Ananth N Mavinakayanahalli
	Here is a quick and naive smoke test for kprobes. This is intended to just verify if some unrelated change broke the *probes subsystem. It is self contained, architecture agnostic and isn't of any great use by itself. This needs to be built in the kernel and runs a basic set of tests to verify if kprobes, jprobes and kretprobes run fine on the kernel. In case of an error, it'll print out a message with a "BUG" prefix. This is a start; we intend to add more tests to this bucket over time. Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions. Tested on x86 (32/64) and powerpc. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30	debug: add the end-of-trace marker and the module list to	Arjan van de Ven
	Unlike oopses, WARN_ON() currently does't print the loaded modules list. This makes it harder to take action on certain bug reports. For example, recently there were a set of WARN_ON()s reported in the mac80211 stack, which were just signalling a driver bug. It takes then anther round trip to the bug reporter (if he responds at all) to find out which driver is at fault. Another issue is that, unlike oopses, WARN_ON() doesn't currently printk the helpful "cut here" line, nor the "end of trace" marker. Now that WARN_ON() is out of line, the size increase due to this is minimal and it's worth adding. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	debug: move WARN_ON() out of line	Arjan van de Ven
	A quick grep shows that there are currently 1145 instances of WARN_ON in the kernel. Currently, WARN_ON is pretty much entirely inlined, which makes it hard to enhance it without growing the size of the kernel (and getting Andrew unhappy). This patch build on top of Olof's patch that introduces __WARN, and places the slowpath out of line. It also uses Ingo's suggestion to not use __FUNCTION__ but to use kallsyms to do the lookup; this saves a ton of extra space since gcc doesn't need to store the function string twice now: 3936367 833603 624736 5394706 525112 vmlinux.before 3917508 833603 624736 5375847 520767 vmlinux-slowpath 15Kb savings... Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Olof Johansson <olof@lixom.net> Acked-by: Matt Meckall <mpm@selenic.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: add /proc/irq/*/spurious to dump the spurious irq debugging state	Andi Kleen
	This is useful to debug problems with interrupt handlers that return sometimes IRQ_NONE. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	genirq: turn irq debugging options into module params	Andi Kleen
	This allows to change them at runtime using sysfs. No need to reboot to set them. I only added aliases (kernel.noirqdebug etc.) so the old options still work. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: compat_sys_ptrace	Roland McGrath
	This adds a generic definition of compat_sys_ptrace that calls compat_arch_ptrace, parallel to sys_ptrace/arch_ptrace. Some machines needing this already define a function by that name. The new generic function is defined only on machines that put #define __ARCH_WANT_COMPAT_SYS_PTRACE into asm/ptrace.h. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: compat_ptrace_request	Roland McGrath
	This adds a compat_ptrace_request that is the analogue of ptrace_request for the things that 32-on-64 ptrace implementations can share in common. So far there are just a couple of requests handled generically. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: ptrace_request peekdata/pokedata	Roland McGrath
	This makes ptrace_request handle {PEEK,POKE}{TEXT,DATA} directly. Every arch_ptrace that could call generic_ptrace_peekdata already has a default case calling ptrace_request, so this keeps things simpler for the arch code. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	spinlock: lockbreak cleanup	Nick Piggin
	The break_lock data structure and code for spinlocks is quite nasty. Not only does it double the size of a spinlock but it changes locking to a potentially less optimal trylock. Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a __raw_spin_is_contended that uses the lock data itself to determine whether there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is not set. Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to decouple it from the spinlock implementation, and make it typesafe (rwlocks do not have any need_lockbreak sites -- why do they even get bloated up with that break_lock then?). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: rename the struct pt_regs members for 32/64-bit consistency	H. Peter Anvin
	We have a lot of code which differs only by the naming of specific members of structures that contain registers. In order to enable additional unifications, this patch drops the e- or r- size prefix from the register names in struct pt_regs, and drops the x- prefixes for segment registers on the 32-bit side. This patch also performs the equivalent renames in some additional places that might be candidates for unification in the future. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	ptrace: generic PTRACE_SINGLEBLOCK	Roland McGrath
	This makes ptrace_request handle PTRACE_SINGLEBLOCK along with PTRACE_CONT et al. The new generic code makes use of the arch_has_block_step macro and generic entry points on machines that define them. [ mingo@elte.hu: bugfix ] Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	ptrace: generic resume	Roland McGrath
	This makes ptrace_request handle all the ptrace requests that wake up the traced task. These do low-level ptrace implementation magic that is not arch-specific and should be kept out of arch code. The implementations on each arch usually do the same thing. The new generic code makes use of the arch_has_single_step macro and generic entry points to handle PTRACE_SINGLESTEP. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: various changes and cleanups to in_p/out_p delay details	Ingo Molnar
	various changes to the in_p/out_p delay details: - add the io_delay=none method - make each method selectable from the kernel config - simplify the delay code a bit by getting rid of an indirect function call - add the /proc/sys/kernel/io_delay_type sysctl - change 'io_delay=standard\|alternate' to io_delay=0x80 and io_delay=0xed - make the io delay config not depend on CONFIG_DEBUG_KERNEL Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: "David P. Reed" <dpreed@reed.com>
2008-01-30	time: track accurate idle time with tick_sched.idle_sleeptime	Venki Pallipadi
	Current idle time in kstat is based on jiffies and is coarse grained. tick_sched.idle_sleeptime is making some attempt to keep track of idle time in a fine grained manner. But, it is not handling the time spent in interrupts fully. Make tick_sched.idle_sleeptime accurate with respect to time spent on handling interrupts and also add tick_sched.idle_lastupdate, which keeps track of last time when idle_sleeptime was updated. This statistics will be crucial for cpufreq-ondemand governor, which can shed some conservative gaurd band that is uses today while setting the frequency. The ondemand changes that uses the exact idle time is coming soon. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	NTP: correct inconsistent ntp interval/tick_length usage	john stultz
	I recently noticed on one of my boxes that when synched with an NTP server, the drift value reported for the system was ~283ppm. While in some cases, clock hardware can be that bad, it struck me as unusual as the system was using the acpi_pm clocksource, which is one of the more trustworthy and accurate clocksources on x86 hardware. I brought up another system and let it sync to the same NTP server, and I noticed a similar 280some ppm drift. In looking at the code, I found that the acpi_pm's constant frequency was being computed correctly at boot-up, however once the system was up, even without the ntp daemon running, the clocksource's frequency was being modified by the clocksource_adjust() function. Digging deeper, I realized that in the code that keeps track of how much the clocksource is skewing from the ntp desired time, we were using different lengths to establish how long an time interval was. The clocksource was being setup with the following interval: NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ While the ntp code was using the tick_length_base value: tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ) /NTP_INTERVAL_FREQ The subtle difference is: (tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC This difference in calculation was causing the clocksource correction code to apply a correction factor to the clocksource so the two intervals were the same, however this results in the actual frequency of the clocksource to be made incorrect. I believe this difference would affect all clocksources, although to differing degrees depending on the clocksource resolution. The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1, so my apologies for the mistake, and for not noticing it until now. The following patch, corrects the clocksource's initialization code so it uses the same interval length as the code in ntp.c. After applying this patch, the drift value for the same system went from ~283ppm to only 2.635ppm. I believe this patch to be good, however it does affect all arches and I've only tested on x86, so some caution is advised. I do think it would be a likely candidate for a stable 2.6.24.x release. Any thoughts or feedback would be appreciated. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	x86: make clockevents more robust	Ingo Molnar
	detect zero event-device multiplicators - they then cause division-by-zero crashes if a clockevent has been initialized incorrectly. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	clocksource: add unregister function to disable unusable clocksources	Thomas Gleixner
	On x86 the PIT might become an unusable clocksource. Add an unregister function to provide a possibilty to remove the PIT from the list of available clock sources. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30	clocksource: make clocksource watchdog cycle through online CPUs	Andi Kleen
	This way it checks if the clocks are synchronized between CPUs too. This might be able to detect slowly drifting TSCs which only go wrong over longer time. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	clocksource.c: use init_timer_deferrable for clocksource_watchdog	Parag Warudkar
	clocksource_watchdog can use a deferrable timer - reduces wakeups from idle per second. Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30	time: fold __get_realtime_clock_ts() into getnstimeofday()	Geert Uytterhoeven
	- getnstimeofday() was just a wrapper around __get_realtime_clock_ts() - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday() - Fix bogus reference to get_realtime_clock_ts(), which never existed Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>