Age | Commit message (Collapse) | Author |
|
The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to. In order to do this tracking one should
1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs)
2. Wait some time.
3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)
To do this tracking, the writable bit is cleared from PTEs when the
soft-dirty bit is. Thus, after this, when the task tries to modify a
page at some virtual address the #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.
Note, that although all the task's address space is marked as r/o after
the soft-dirty bits clear, the #PF-s that occur after that are processed
fast. This is so, since the pages are still mapped to physical memory,
and thus all the kernel does is finds this fact out and puts back
writable, dirty and soft-dirty bits on the PTE.
Another thing to note, is that when mremap moves PTEs they are marked
with soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These bits are always constant (== PAGE_SHIFT) and just occupy space in
the entry. Moreover, in next patch we will need to report one more bit
in the pagemap, but all bits are already busy on it.
That said, describe the pagemap entry that has 6 more free zero bits.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In the next patch the clear-refs-type will be required in
clear_refs_pte_range funciton, so prepare the walk->private to carry
this info.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the implementation of the soft-dirty bit concept that should
help keep track of changes in user memory, which in turn is very-very
required by the checkpoint-restore project (http://criu.org).
To create a dump of an application(s) we save all the information about
it to files, and the biggest part of such dump is the contents of tasks'
memory. However, there are usage scenarios where it's not required to
get _all_ the task memory while creating a dump. For example, when
doing periodical dumps, it's only required to take full memory dump only
at the first step and then take incremental changes of memory. Another
example is live migration. We copy all the memory to the destination
node without stopping all tasks, then stop them, check for what pages
has changed, dump it and the rest of the state, then copy it to the
destination node. This decreases freeze time significantly.
That said, some help from kernel to watch how processes modify the
contents of their memory is required.
The proposal is to track changes with the help of new soft-dirty bit
this way:
1. First do "echo 4 > /proc/$pid/clear_refs".
At that point kernel clears the soft dirty _and_ the writable bits from all
ptes of process $pid. From now on every write to any page will result in #pf
and the subsequent call to pte_mkdirty/pmd_mkdirty, which in turn will set
the soft dirty flag.
2. Then read the /proc/$pid/pagemap2 and check the soft-dirty bit reported there
(the 55'th one). If set, the respective pte was written to since last call
to clear refs.
The soft-dirty bit is the _PAGE_BIT_HIDDEN one. Although it's used by
kmemcheck, the latter one marks kernel pages with it, while the former
bit is put on user pages so they do not conflict to each other.
This patch:
A new clear-refs type will be added in the next patch, so prepare
code for that.
[akpm@linux-foundation.org: don't assume that sizeof(enum clear_refs_types) == sizeof(int)]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There may exist NULL pointer dereference in config_item_name() when one
volume (say Volume A) unmounts while another (say Volume B) mounting.
Volume A Volume B
already Mounted.
Unmounting, call
o2hb_heartbeat_group_drop_item()
-> config_item_put(item)
set reg(A)->item.ci_name to NULL
in function config_item_cleanup().
begin mounting, call
o2hb_region_pin() and tranverse all
regions. When reading
reg(A)->item.ci_name, it causes
NULL pointer dereference.
call o2hb_region_release() and
del reg(A) from list.
So we should skip accessing regions that is going to release when
tranverse o2hb_all_regions.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: joyce <xuejiufei@huawei.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Adjust switch..case syntax at o2net_state_change to meet the kernel coding
standard.
s/printk/pr_info/.
[akpm@linux-foundation.org: revert pr_foo() change]
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Gurudas Pai <gurudas.pai@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Cc: Srinivas Eeeda <srinivas.eeda@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Tao Ma <tm@tao.ma>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix a comment typo in o2quo_hb_still_up()
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Gurudas Pai <gurudas.pai@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Cc: Srinivas Eeeda <srinivas.eeda@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Tao Ma <tm@tao.ma>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
s/o2hb_global_hearbeat_mode_set/o2hb_global_heartbeat_mode_set/ to make
the signature of those routines in a consistent manner with others for
heartbeating.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Gurudas Pai <gurudas.pai@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Cc: Srinivas Eeeda <srinivas.eeda@oracle.com>
Cc: Tao Ma <tm@tao.ma>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Under heavy I/O load, writing the disk heartbeat can be forced to wait for
minutes, and this causes the node to be fenced.
This patch tries to use WRITE_SYNC in submitting the heartbeat bio, so
that writing the heartbeat will have a priority over other requests.
Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Acked-by: Tao Ma <tm@tao.ma>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Srinivas Eeeda <srinivas.eeda@oracle.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Inlined xattr shared free space of inode block with inlined data or data
extent record, so the size of the later two should be adjusted when
inlined xattr is enabled. See ocfs2_xattr_ibody_init(). But this isn't
done well when reflink. For inode with inlined data, its max inlined
data size is adjusted in ocfs2_duplicate_inline_data(), no problem. But
for inode with data extent record, its record count isn't adjusted. Fix
it, or data extent record and inlined xattr may overwrite each other,
then cause data corruption or xattr failure.
One panic caused by this bug in our test environment is the following:
kernel BUG at fs/ocfs2/xattr.c:1435!
invalid opcode: 0000 [#1] SMP
Pid: 10871, comm: multi_reflink_t Not tainted 2.6.39-300.17.1.el5uek #1
RIP: ocfs2_xa_offset_pointer+0x17/0x20 [ocfs2]
RSP: e02b:ffff88007a587948 EFLAGS: 00010283
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00000000000051e4
RDX: ffff880057092060 RSI: 0000000000000f80 RDI: ffff88007a587a68
RBP: ffff88007a587948 R08: 00000000000062f4 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010
R13: ffff88007a587a68 R14: 0000000000000001 R15: ffff88007a587c68
FS: 00007fccff7f06e0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000015cf000 CR3: 000000007aa76000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process multi_reflink_t
Call Trace:
ocfs2_xa_reuse_entry+0x60/0x280 [ocfs2]
ocfs2_xa_prepare_entry+0x17e/0x2a0 [ocfs2]
ocfs2_xa_set+0xcc/0x250 [ocfs2]
ocfs2_xattr_ibody_set+0x98/0x230 [ocfs2]
__ocfs2_xattr_set_handle+0x4f/0x700 [ocfs2]
ocfs2_xattr_set+0x6c6/0x890 [ocfs2]
ocfs2_xattr_user_set+0x46/0x50 [ocfs2]
generic_setxattr+0x70/0x90
__vfs_setxattr_noperm+0x80/0x1a0
vfs_setxattr+0xa9/0xb0
setxattr+0xc3/0x120
sys_fsetxattr+0xa8/0xd0
system_call_fastpath+0x16/0x1b
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
While deleting a file with ocfs2_unlink(), there is a bug in this
function. This bug will result in filesystem read-only.
After calling ocfs2_orphan_add(), the file which will be deleted is
added into orphan dir. If ocfs2_delete_entry() fails, the file still
exists in the parent dir. And this scenario introduces a conflict of
metadata.
If a file is added into orphan dir, when we put inode of the file with
iput(), the inode i_flags is setted (~OCFS2_VALID_FL) in
ocfs2_remove_inode(), and then write back to disk.
But as previously mentioned, the file still exists in the parent dir.
On other nodes, the file can be still accessed. When first read the
file with ocfs2_read_blocks() from disk, It will check and avalidate
inode using ocfs2_validate_inode_block(). So File system will be
readonly because the inode is invalid. In other words, the inode
i_flags has been set (~OCFS2_VALID_FL).
[akpm@linux-foundation.org: cleanups]
[jeff.liu@oracle.com: s/inode_is_unlinkable/ocfs2_inode_is_unlinkable/]
Signed-off-by: Younger Liu <younger.liu@huawei.com>
Signed-off-by: Jensen <shencanquan@huawei.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Younger Liu <younger.liu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In ocfs2_relink_block_group(), we roll back all those changes if notify
intent to modify buffers for metadata update failed even if the relevant
buffer has not yet been modified/got dirty at that point, that are not
quite right because of:
- None buffer has been modified/dirty if failed to call
ocfs2_journal_access_gd() against the previous block group buffer
- Only the previous block group buffer has got dirty if failed to call
ocfs2_journal_access_gd() against the block group buffer
- There is no need to roll back the change for file entry buffer at all
Those problems will not cause anything wrong but unnecessary. This
patch fix them and kill the useless bg_ptr variable as well.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Younger Liu <younger.liu@huawei.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
While adding a file into orphan dir in ocfs2_orphan_add(), it calls
__ocfs2_add_entry() before ocfs2_journal_access_di(). If
ocfs2_journal_access_di() failed, the file is added into orphan dir, and
orphan dir dinode updated, but file dinode has not been updated.
Accordingly, the data is not consistent between file dinode and orphan
dir.
So, need to call ocfs2_journal_access_di() before __ocfs2_add_entry(),
and if ocfs2_journal_access_di() failed, orphan_fe and
orphan_dir_inode->i_nlink need rollback.
This bug was added by 3939fda4 ("Ocfs2: Journaling i_flags and
i_orphaned_slot when adding inode to orphan dir.").
Signed-off-by: Younger Liu <younger.liu@huawei.com>
Acked-by: Jeff Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
blocked list
dlmlock_master() returns DLM_RECOVERING/DLM_MIGRATING/ DLM_FORWAR after
adding lock to blocked list if lockres has the state
DLM_LOCK_RES_RECOVERING/DLM_LOCK_RES_MIGRATING/ DLM_LOCK_RES_IN_PROGRESS.
so it will retry in dlmlock(). And this may cause dlm_thread fall into an
infinite loop
Thread1 dlm_thread
calls dlm_lock->dlmlock_master,
if lockresA is in state
DLM_LOCK_RES_RECOVERING, calls
__dlm_wait_on_lockres() and waits
until others threads clear this
state;
If cannot grant this lock,
adding lock to blocked list,
and return DLM_RECOVERING;
Grant this lock and move it to
grant list;
After a while, retry and
calls list_add_tail(), adding lock
to blocked list again.
Granted and blocked list of this lockres will become the following
conditions:
lock_res->granted.next = dlm_lock->list_head;
lock_res->blocked.next = dlm_lock->list_head;
dlm_lock->list_head.next = dlm_lock_resource->blocked;
When dlm_thread traverses the granted list, it will fall into an endless
loop, checking dlm_lock.list_head, dlm_lock->list_head.next
(i.e.lock_res->blocked), lock_res->blocked.next(i.e.dlm_lock.list_head
again) .....
Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: jensen <shencanquan@huawei.com>
Cc: Jeff Liu <jeff.liu@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Free space checking will be done in ocfs2_xattr_ibody_init(). So remove
here.
[akpm@linux-foundation.org: remove unused local]
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a memory leak in sc_kref_release(). When free struct
o2net_sock_container (sc), we should release sc->sc_page.
Signed-off-by: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ocfs2_calc_extend_credits
While adding extends to a file, the credits are calculated incorrectly
and if the requested clusters is more than one (or more because we used
a conservative limit) then we run out of journal credits and we hit an
assert in journalling code.
The function parameter bits_wanted variable was not used at all.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In ocfs2_remove_btree_range, when calling ocfs2_lock_refcount_tree and
ocfs2_prepare_refcount_change_for_del failed, it goes to out and then
tries to call mutex_unlock without mutex_lock before. And when calling
ocfs2_reserve_blocks_for_rec_trunc failed, it should free ref_tree
before return.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Code cleanup: needs_checkpoint is assigned to but never used. Delete
the variable.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Jeff Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
dlm_begin_reco_handler() returns without putting dlm when dlm recovery
state is DLM_RECO_STATE_FINALIZE.
Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If we use le32_add_cpu to set ocfs2_dinode i_flags, it may lead to the
corresponding flag corrupted. So we should change it to bitwise and/or
operation.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: shencanquan <shencanquan@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
instead of enum
In dlm_request_all_locks, ret is type enum. But o2net_send_message
returns a type int value. Then it will never run into the following
error branch. So we should change the ret type from enum to int.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Below 3 functions have already been declared in dlmcommon.h, so we have
no need to declare them again in dlmrecovery.c:
dlm_complete_recovery_thread
dlm_launch_recovery_thread
dlm_kick_recovery_thread
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The difference between "count" and "len" is that "len" is capped at
4095. Changing it like this makes it match how sysfs_write_file() is
implemented.
This is a static analysis patch. I haven't found any store_attribute()
functions where this change makes a difference.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If we receive new caps from the auth MDS and the non-auth MDS is
revoking the newly issued caps, we should release the caps from
the non-auth MDS. The scenario is filelock's state changes from
SYNC to LOCK. Non-auth MDS revokes Fc cap, the client gets Fc cap
from the auth MDS at the same time.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
If caps are been revoking by the auth MDS, don't consider them as
issued even they are still issued by non-auth MDS. The non-auth
MDS should also be revoking/exporting these caps, the client just
hasn't received the cap revoke/export message.
The race I encountered is: When caps are exporting to new MDS, the
client receives cap import message and cap revoke message from the
new MDS, then receives cap export message from the old MDS. When
the client receives cap revoke message from the new MDS, the revoking
caps are still issued by the old MDS, so the client does nothing.
Later when the cap export message is received, the client removes
the caps issued by the old MDS. (Another way to fix the race is
calling ceph_check_caps() in handle_cap_export())
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
The locking order for pending vmtruncate is wrong, it can lead to
following race:
write wmtruncate work
------------------------ ----------------------
lock i_mutex
check i_truncate_pending check i_truncate_pending
truncate_inode_pages() lock i_mutex (blocked)
copy data to page cache
unlock i_mutex
truncate_inode_pages()
The fix is take i_mutex before calling __ceph_do_pending_vmtruncate()
Fixes: http://tracker.ceph.com/issues/5453
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
when mounting ceph with a dev name that starts with a slash, ceph
would attempt to access the character before that slash. Since we
don't actually own that byte of memory, we would trigger an
invalid access:
[ 43.499934] BUG: unable to handle kernel paging request at ffff880fa3a97fff
[ 43.500984] IP: [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300
[ 43.501491] PGD 743b067 PUD 10283c4067 PMD 10282a6067 PTE 8000000fa3a97060
[ 43.502301] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 43.503006] Dumping ftrace buffer:
[ 43.503596] (ftrace buffer empty)
[ 43.504046] CPU: 0 PID: 10879 Comm: mount Tainted: G W 3.10.0-sasha #1129
[ 43.504851] task: ffff880fa625b000 ti: ffff880fa3412000 task.ti: ffff880fa3412000
[ 43.505608] RIP: 0010:[<ffffffff818f3884>] [<ffffffff818f3884>] parse_mount_options$
[ 43.506552] RSP: 0018:ffff880fa3413d08 EFLAGS: 00010286
[ 43.507133] RAX: ffff880fa3a98000 RBX: ffff880fa3a98000 RCX: 0000000000000000
[ 43.507893] RDX: ffff880fa3a98001 RSI: 000000000000002f RDI: ffff880fa3a98000
[ 43.508610] RBP: ffff880fa3413d58 R08: 0000000000001f99 R09: ffff880fa3fe64c0
[ 43.509426] R10: ffff880fa3413d98 R11: ffff880fa38710d8 R12: ffff880fa3413da0
[ 43.509792] R13: ffff880fa3a97fff R14: 0000000000000000 R15: ffff880fa3413d90
[ 43.509792] FS: 00007fa9c48757e0(0000) GS:ffff880fd2600000(0000) knlGS:000000000000$
[ 43.509792] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 43.509792] CR2: ffff880fa3a97fff CR3: 0000000fa3bb9000 CR4: 00000000000006b0
[ 43.509792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 43.509792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 43.509792] Stack:
[ 43.509792] 0000e5180000000e ffffffff85ca1900 ffff880fa38710d8 ffff880fa3413d98
[ 43.509792] 0000000000000120 0000000000000000 ffff880fa3a98000 0000000000000000
[ 43.509792] ffffffff85cf32a0 0000000000000000 ffff880fa3413dc8 ffffffff818f3c72
[ 43.509792] Call Trace:
[ 43.509792] [<ffffffff818f3c72>] ceph_mount+0xa2/0x390
[ 43.509792] [<ffffffff81226314>] ? pcpu_alloc+0x334/0x3c0
[ 43.509792] [<ffffffff81282f8d>] mount_fs+0x8d/0x1a0
[ 43.509792] [<ffffffff812263d0>] ? __alloc_percpu+0x10/0x20
[ 43.509792] [<ffffffff8129f799>] vfs_kern_mount+0x79/0x100
[ 43.509792] [<ffffffff812a224d>] do_new_mount+0xcd/0x1c0
[ 43.509792] [<ffffffff812a2e8d>] do_mount+0x15d/0x210
[ 43.509792] [<ffffffff81220e55>] ? strndup_user+0x45/0x60
[ 43.509792] [<ffffffff812a2fdd>] SyS_mount+0x9d/0xe0
[ 43.509792] [<ffffffff83fd816c>] tracesys+0xdd/0xe2
[ 43.509792] Code: 4c 8b 5d c0 74 0a 48 8d 50 01 49 89 14 24 eb 17 31 c0 48 83 c9 ff $
[ 43.509792] RIP [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300
[ 43.509792] RSP <ffff880fa3413d08>
[ 43.509792] CR2: ffff880fa3a97fff
[ 43.509792] ---[ end trace 22469cd81e93af51 ]---
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Sage Weil <sage@inktan.com>
|
|
Drop ignored return value. Fix allocation failure case to not leak.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Either in vfs_write or io_submit,it call file_start/end_write.
The different between file_start/end_write and sb_start/end_write is
file_ only handle regular file.But i think in ceph_aio_write,it only
for regular file.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Acked-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
[ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv
[ 1121.231971] 1 lock held by mv/9831:
[ 1121.231973] #0: (&(&ci->i_ceph_lock)->rlock){+.+...},at:[<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph]
[ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215
[ 1121.232000] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 1121.232027] ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8
[ 1121.232045] ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba
[ 1121.232052] 0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68
[ 1121.232056] Call Trace:
[ 1121.232062] [<ffffffff8168348c>] dump_stack+0x19/0x1b
[ 1121.232067] [<ffffffff81070435>] __might_sleep+0xe5/0x110
[ 1121.232071] [<ffffffff816899ba>] down_read+0x2a/0x98
[ 1121.232080] [<ffffffffa02baf70>] ceph_vxattrcb_layout+0x60/0xf0 [ceph]
[ 1121.232088] [<ffffffffa02bbd7f>] ceph_getxattr+0x9f/0x1d0 [ceph]
[ 1121.232093] [<ffffffff81188d28>] vfs_getxattr+0xa8/0xd0
[ 1121.232097] [<ffffffff8118900b>] getxattr+0xab/0x1c0
[ 1121.232100] [<ffffffff811704f2>] ? final_putname+0x22/0x50
[ 1121.232104] [<ffffffff81155f80>] ? kmem_cache_free+0xb0/0x260
[ 1121.232107] [<ffffffff811704f2>] ? final_putname+0x22/0x50
[ 1121.232110] [<ffffffff8109e63d>] ? trace_hardirqs_on+0xd/0x10
[ 1121.232114] [<ffffffff816957a7>] ? sysret_check+0x1b/0x56
[ 1121.232120] [<ffffffff81189c9c>] SyS_fgetxattr+0x6c/0xc0
[ 1121.232125] [<ffffffff81695782>] system_call_fastpath+0x16/0x1b
[ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002
[ 1121.232154] 1 lock held by mv/9831:
[ 1121.232156] #0: (&(&ci->i_ceph_lock)->rlock){+.+...}, at:
[<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph]
I think move the ci->i_ceph_lock down is safe because we can't free
ceph_inode_info at there.
CC: stable@vger.kernel.org # 3.8+
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
We may receive old request reply from the exporter MDS after receiving
the importer MDS' cap import message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
The client can receive truncate request from MDS at any time.
So the page writeback code need to get i_size, truncate_seq and
truncate_size atomically
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
ceph_encode_inode_release() can race with ceph_open() and release
caps wanted by open files. So it should call __ceph_caps_wanted()
to get the wanted caps.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Conflicts:
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/renesas/sh_eth.c
net/ipv4/gre.c
The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.
The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.
Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the total number of ACPI commits is slightly greater than
the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
remains the most active patch submitter.
To me, the most significant change is the addition of offline/online
device operations to the driver core (with the Greg's blessing) and
the related modifications of the ACPI core hotplug code. Next are the
freezer updates from Colin Cross that should make the freezing of
tasks a bit less heavy weight.
We also have a couple of regression fixes, a number of fixes for
issues that have not been identified as regressions, two new drivers
and a bunch of cleanups all over.
Highlights:
- Hotplug changes to support graceful hot-removal failures.
It sometimes is necessary to fail device hot-removal operations
gracefully if they cannot be carried out completely. For example,
if memory from a memory module being hot-removed has been allocated
for the kernel's own use and cannot be moved elsewhere, it's
desirable to fail the hot-removal operation in a graceful way
rather than to crash the kernel, but currenty a success or a kernel
crash are the only possible outcomes of an attempted memory
hot-removal. Needless to say, that is not a very attractive
alternative and it had to be addressed.
However, in order to make it work for memory, I first had to make
it work for CPUs and for this purpose I needed to modify the ACPI
processor driver. It's been split into two parts, a resident one
handling the low-level initialization/cleanup and a modular one
playing the actual driver's role (but it binds to the CPU system
device objects rather than to the ACPI device objects representing
processors). That's been sort of like a live brain surgery on a
patient who's riding a bike.
So this is a little scary, but since we found and fixed a couple of
regressions it caused to happen during the early linux-next testing
(a month ago), nobody has complained.
As a bonus we remove some duplicated ACPI hotplug code, because the
ACPI-based CPU hotplug is now going to use the common ACPI hotplug
code.
- Lighter weight freezing of tasks.
These changes from Colin Cross and Mandeep Singh Baines are
targeted at making the freezing of tasks a bit less heavy weight
operation. They reduce the number of tasks woken up every time
during the freezing, by using the observation that the freezer
simply doesn't need to wake up some of them and wait for them all
to call refrigerator(). The time needed for the freezer to decide
to report a failure is reduced too.
Also reintroduced is the check causing a lockdep warining to
trigger when try_to_freeze() is called with locks held (which is
generally unsafe and shouldn't happen).
- cpufreq updates
First off, a commit from Srivatsa S Bhat fixes a resume regression
introduced during the 3.10 cycle causing some cpufreq sysfs
attributes to return wrong values to user space after resume. The
fix is kind of fresh, but also it's pretty obvious once Srivatsa
has identified the root cause.
Second, we have a new freqdomain_cpus sysfs attribute for the
acpi-cpufreq driver to provide information previously available via
related_cpus. From Lan Tianyu.
Finally, we fix a number of issues, mostly related to the
CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
up some code. The majority of changes from Viresh Kumar with bits
from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
Arnd Bergmann, and Tang Yuantian.
- ACPICA update
A usual bunch of updates from the ACPICA upstream.
During the 3.4 cycle we introduced support for ACPI 5 extended
sleep registers, but they are only supposed to be used if the
HW-reduced mode bit is set in the FADT flags and the code attempted
to use them without checking that bit. That caused suspend/resume
regressions to happen on some systems. Fix from Lv Zheng causes
those registers to be used only if the HW-reduced mode bit is set.
Apart from this some other ACPICA bugs are fixed and code cleanups
are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
Zhang Rui.
- cpuidle updates
New driver for Xilinx Zynq processors is added by Michal Simek.
Multidriver support simplification, addition of some missing
kerneldoc comments and Kconfig-related fixes come from Daniel
Lezcano.
- ACPI power management updates
Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
cleanups and fixes of the ACPI device power state selection
routine.
- ACPI documentation updates
Some previously missing pieces of ACPI documentation are added by
Lv Zheng and Aaron Lu (hopefully, that will help people to
uderstand how the ACPI subsystem works) and one outdated doc is
updated by Hanjun Guo.
- Assorted ACPI updates
We finally nailed down the IA-64 issue that was the reason for
reverting commit 9f29ab11ddbf ("ACPI / scan: do not match drivers
against objects having scan handlers"), so we can fix it and move
the ACPI scan handler check added to the ACPI video driver back to
the core.
A mechanism for adding CMOS RTC address space handlers is
introduced by Lan Tianyu to allow some EC-related breakage to be
fixed on some systems.
A spec-compliant implementation of acpi_os_get_timer() is added by
Mika Westerberg.
The evaluation of _STA is added to do_acpi_find_child() to avoid
situations in which a pointer to a disabled device object is
returned instead of an enabled one with the same _ADR value. From
Jeff Wu.
Intel BayTrail PCH (Platform Controller Hub) support is added to
the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
driver is modified to work around a couple of known BIOS issues.
Changes from Mika Westerberg and Heikki Krogerus.
The EC driver is fixed by Vasiliy Kulikov to use get_user() and
put_user() instead of dereferencing user space pointers blindly.
Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
Kani.
- Assorted power management updates
The "runtime idle" helper routine is changed to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows us to reduce the
overall code bloat a bit (by dropping some code that's not
necessary any more after that modification).
The runtime PM documentation is updated by Alan Stern (to reflect
the "runtime idle" behavior change).
New trace points for PM QoS are added by Sahara
(<keun-o.park@windriver.com>).
PM QoS documentation is updated by Lan Tianyu.
Code cleanups are made and minor issues are addressed by Bernie
Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.
- devfreq updates
New driver for the Exynos5-bus device from Abhilash Kesavan.
Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.
- OMAP power management updates
Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
updates from Andrii Tseglytskyi and Nishanth Menon."
* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
cpufreq: Fix cpufreq regression after suspend/resume
ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
PM / Sleep: Warn about system time after resume with pm_trace
cpufreq: don't leave stale policy pointer in cdbs->cur_policy
acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
cpufreq: make sure frequency transitions are serialized
ACPI: implement acpi_os_get_timer() according the spec
ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
ACPI: Add CMOS RTC Operation Region handler support
ACPI / processor: Drop unused variable from processor_perflib.c
cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
...
|
|
Pull cifs updates from Steve French:
"Various CIFS/SMB2/SMB3 updates for 3.11. Includes bug fixes - SMB3
support should be much more stable with key DFS fix and also signing
possible now (although is more work to do to get SMB3 signing working
well with multiuser).
Mounts using the new SMB 3.02 dialect can now be done (specify
"vers=3.02" on mount) against the most current Microsoft systems.
Also includes a big cleanup of the cifs/smb2/smb3 authentication code
from Jeff which fixes some long standing problems with the way allowed
authentication flavors and signing are configured.
Some followon patches later in the cycle will clean up allocation of
structures for the various security mechanisms depending on what
dialect is chosen (reduces memory usage a little) and to add support
for the secure negotiate fsctl (for smb3) which prevents downgrade
attacks."
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (39 commits)
cifs: fill TRANS2_QUERY_FILE_INFO ByteCount fields
cifs: fix SMB2 signing enablement in cifs_enable_signing
[CIFS] Fix build warning
[CIFS] SMB3 Signing enablement
[CIFS] Do not set DFS flag on SMB2 open
[CIFS] fix static checker warning
cifs: try to handle the MUST SecurityFlags sanely
When server doesn't provide SecurityBuffer on SMB2Negotiate pick default
Handle big endianness in NTLM (ntlmv2) authentication
revalidate directories instiantiated via FIND_* in order to handle DFS referrals
SMB2 FSCTL and IOCTL worker function
Charge at least one credit, if server says that it supports multicredit
Remove typo
Some missing share flags
cifs: using strlcpy instead of strncpy
Update headers to update various SMB3 ioctl definitions
Update cifs version number
Add ability to dipslay SMB3 share flags and capabilities for debugging
Add some missing SMB3 and SMB3.02 flags
Add SMB3.02 dialect support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull pstore update from Tony Luck:
"Fixes for pstore for 3.11 merge window"
* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
efivars: If pstore_register fails, free unneeded pstore buffer
acpi: Eliminate console msg if pstore.backend excludes ERST
pstore: Return unique error if backend registration excluded by kernel param
pstore: Fail to unlink if a driver has not defined pstore_erase
pstore/ram: remove the power of buffer size limitation
pstore/ram: avoid atomic accesses for ioremapped regions
efi, pstore: Cocci spatch "memdup.spatch"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second set of VFS changes from Al Viro:
"Assorted f_pos race fixes, making do_splice_direct() safe to call with
i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
->d_hash/->d_compare calling conventions changes from Linus, misc
stuff all over the place."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
Document ->tmpfile()
ext4: ->tmpfile() support
vfs: export lseek_execute() to modules
lseek_execute() doesn't need an inode passed to it
block_dev: switch to fixed_size_llseek()
cpqphp_sysfs: switch to fixed_size_llseek()
tile-srom: switch to fixed_size_llseek()
proc_powerpc: switch to fixed_size_llseek()
ubi/cdev: switch to fixed_size_llseek()
pci/proc: switch to fixed_size_llseek()
isapnp: switch to fixed_size_llseek()
lpfc: switch to fixed_size_llseek()
locks: give the blocked_hash its own spinlock
locks: add a new "lm_owner_key" lock operation
locks: turn the blocked_list into a hashtable
locks: convert fl_link to a hlist_node
locks: avoid taking global lock if possible when waking up blocked waiters
locks: protect most of the file_lock handling with i_lock
locks: encapsulate the fl_link list handling
locks: make "added" in __posix_lock_file a bool
...
|
|
very similar to ext3 counterpart...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
matter in lseek_execute() to update the current file offset
to the desired offset if it is valid, ceph also does the
simliar things at ceph_llseek().
To reduce the duplications, this patch make lseek_execute()
public accessible so that we can call it directly from the
underlying file systems.
Thanks Dave Chinner for this suggestion.
[AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]
v2->v1:
- Add kernel-doc comments for lseek_execute()
- Call lseek_execute() in ceph->llseek()
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Ted Tso <tytso@mit.edu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Sage Weil <sage@inktank.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Time in range will fail safely if we move to a different cpu with an
extremely large clock skew.
Add time_in_range64() and convert lls to use it.
changelog:
v2
- fixed double call to sched_clock in can_poll_ll
- fixed checkpatchisms
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the big driver core merge for 3.11-rc1
Lots of little things, and larger firmware subsystem updates, all
described in the shortlog. Nice thing here is that we finally get rid
of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had
been always on for a number of kernel releases, now it's just
removed)"
* tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits)
driver core: device.h: fix doc compilation warnings
firmware loader: fix another compile warning with PM_SLEEP unset
build some drivers only when compile-testing
firmware loader: fix compile warning with PM_SLEEP set
kobject: sanitize argument for format string
sysfs_notify is only possible on file attributes
firmware loader: simplify holding module for request_firmware
firmware loader: don't export cache_firmware and uncache_firmware
drivers/base: Use attribute groups to create sysfs memory files
firmware loader: fix compile warning
firmware loader: fix build failure with !CONFIG_FW_LOADER_USER_HELPER
Documentation: Updated broken link in HOWTO
Finally eradicate CONFIG_HOTPLUG
driver core: firmware loader: kill FW_ACTION_NOHOTPLUG requests before suspend
driver core: firmware loader: don't cache FW_ACTION_NOHOTPLUG firmware
Documentation: Tidy up some drivers/base/core.c kerneldoc content.
platform_device: use a macro instead of platform_driver_register
firmware: move EXPORT_SYMBOL annotations
firmware: Avoid deadlock of usermodehelper lock at shutdown
dell_rbu: Select CONFIG_FW_LOADER_USER_HELPER explicitly
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull FS-Cache updates from David Howells:
"This contains a number of fixes for various FS-Cache issues plus some
cleanups. The commits are, in order:
1) Provide a system wait_on_atomic_t() and wake_up_atomic_t() sharing
the bit-wait table (enhancement for #8).
2) Don't put spin_lock() in a while-condition as spin_lock() may have
a do {} while(0) wrapper (cleanup).
3) Symbolically name i_mutex lock classes rather than using numbers
in CacheFiles (cleanup).
4) Don't sleep in page release if __GFP_FS is not set (deadlock vs
ext4).
5) Uninline fscache_object_init() (cleanup for #7).
6) Wrap checks on object state (cleanup for #7).
7) Simplify the object state machine by separating work states from
wait states.
8) Simplify cookie retention by objects (NULL pointer deref fix).
9) Remove unused list_to_page() macro (cleanup).
10) Make the remaining-pages counter in the retrieval op atomic
(assertion failure fix).
11) Don't use spin_is_locked() in assertions (assertion failure fix)"
* tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
FS-Cache: Don't use spin_is_locked() in assertions
FS-Cache: The retrieval remaining-pages counter needs to be atomic_t
cachefiles: remove unused macro list_to_page()
FS-Cache: Simplify cookie retention for fscache_objects, fixing oops
FS-Cache: Fix object state machine to have separate work and wait states
FS-Cache: Wrap checks on object state
FS-Cache: Uninline fscache_object_init()
FS-Cache: Don't sleep in page release if __GFP_FS is not set
CacheFiles: name i_mutex lock class explicitly
fs/fscache: remove spin_lock() from the condition in while()
Add wait_on_atomic_t() and wake_up_atomic_t()
|