summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2013-02-26libceph: update osd request/reply encodingSage Weil
Use the new version of the encoding for osd requests and replies. In the process, update the way we are tracking request ops and reply lengths and results in the struct ceph_osd_request. Update the rbd and fs/ceph users appropriately. The main changes are: - we keep pointers into the request memory for fields we need to update each time the request is sent out over the wire - we keep information about the result in an array in the request struct where the users can easily get at it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-26rbd: pass length, not op for osd completionsAlex Elder
The only thing type-specific osd completion functions do with their osd op parameter is (in some cases) extract the number of bytes transferred from it. In the other cases, the xferred bytes field is not used, and total message data transfer byte count (which may well be zero) is used. Just set the object request transfer count in the main osd request callback function and provide that to the other routines. There is then no longer any need to pass the op pointer to the type-specific completion routines, so drop those parameters. Stop doing anything with the total message data length. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-26rbd: move rbd_osd_trivial_callback()Alex Elder
This function is slightly out of place, probably the result of an errant automatic merge or something. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-26switch vfs_getattr() to struct pathAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-25rbd: eliminate sparse warningsAlex Elder
Fengguang Wu reminded me that there were outstanding sparse reports in the ceph and rbd code. This patch fixes these problems in rbd that lead to those reports: - Convert functions that are never referenced externally to have static scope. - Add a lockdep annotation to rbd_request_fn(), because it releases a lock before acquiring it again. This partially resolves: http://tracker.ceph.com/issues/4184 Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25rbd: normalize dout() callsAlex Elder
Add dout() calls to facilitate tracing of image and object requests. Change a few existing calls so they use __func__ rather than the hard-coded function name. Have calls always add ":" after the name of the function, and prefix pointer values with a consistent tag indicating what it represents. (Note that there remain some older dout() calls that are left untouched by this patch.) Issue a warning if rbd_osd_write_callback() ever gets a short write. This resolves: http://tracker.ceph.com/issues/4235 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25rbd: barriers are hardAlex Elder
Let's go shopping! I'm afraid this may not have gotten it right: 07741308 rbd: add barriers near done flag operations The smp_wmb() should have been done *before* setting the done flag, to ensure all other data was valid before marking the object request done. Switch to use atomic_inc_return() here to set the done flag, which allows us to verify we don't mark something done more than once. Doing this also implies general barriers before and after the call. And although a read memory barrier might have been sufficient before reading the done flag, convert this to a full memory barrier just to put this issue to bed. This resolves: http://tracker.ceph.com/issues/4238 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25rbd: ignore zero-length requestsAlex Elder
The old request code simply ignored zero-length requests. We should still operate that same way to avoid any changes in behavior. We can implement handling for special zero-length requests separately (see http://tracker.ceph.com/issues/4236). Add some assertions based on this new constraint. This resolves: http://tracker.ceph.com/issues/4237 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-22new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22loopdev: ignore negative offset when calculate loop device sizeGuo Chao
Negative offset may cause loop device size larger than backing file size. $ fallocate -l 1M a $ losetup --offset 0xffffffffffff0000 /dev/loop0 a $ blockdev --getsize64 /dev/loop0 1114112 $ ls -l a -rw-r--r-- 1 root root 1048576 Jan 23 12:46 a $ cat /dev/loop0 cat: /dev/loop0: Input/output error It makes no sense to do that. Only apply offset when it's positive. Fix a typo in the comment by the way. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22loopdev: remove an user triggerable oopsGuo Chao
When loopdev is built as module and we pass an invalid parameter, loop_init() will return directly without deregister misc device, which will cause an oops when insert loop module next time because we left some garbage in the misc device list. Test case: sudo modprobe loop max_part=1024 (failed due to invalid parameter) sudo modprobe loop (oops) Clean up nicely to avoid such oops. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22loopdev: move common code into loop_figure_size()Guo Chao
Update block device size in accord with gendisk size and let userspace know the change in loop_figure_size(). This is a clean up to remove common code of loop_figure_size()'s two callers. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22loopdev: update block device size in loop_set_status()Guo Chao
Loop device driver sometimes fails to impose the size limit on the device. Keep issuing following two commands: losetup --offset 7517244416 --sizelimit 3224971264 /dev/loop0 backed_file blockdev --getsize64 /dev/loop0 blockdev reports file size instead of sizelimit several out of 100 times. The problems are: - losetup set up the device in two ioctl: LOOP_SET_FD and LOOP_SET_STATUS64. - LOOP_SET_STATUS64 only update size of gendisk. Block device size will be updated lazily when device comes to use. If udev rushes in between the two ioctl, it will bring in a block device whose size is backing file size. If the device is not released after LOOP_SET_STATUS64 ioctl, blockdev will not see the updated size. Update block size in LOOP_SET_STATUS64 ioctl. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Reported-by: M. Hindess <hindessm@uk.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22loopdev: fix a deadlockGuo Chao
bd_mutex and lo_ctl_mutex can be held in different order. Path #1: blkdev_open blkdev_get __blkdev_get (hold bd_mutex) lo_open (hold lo_ctl_mutex) Path #2: blkdev_ioctl lo_ioctl (hold lo_ctl_mutex) lo_set_capacity (hold bd_mutex) Lockdep does not report it, because path #2 actually holds a subclass of lo_ctl_mutex. This subclass seems creep into the code by mistake. The patch author actually just mentioned it in the changelog, see commit f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see: http://marc.info/?l=linux-kernel&m=123806169129727&w=2 Path #2 hold bd_mutex to call bd_set_size(), I've protected it with i_mutex in a previous patch, so drop bd_mutex at this site. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22drivers/block/swim3.c: fix null pointer dereferenceCong Ding
The use of pointer fs should be after the null check. Signed-off-by: Cong Ding <dinggnu@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-21Merge tag 'driver-core-3.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...
2013-02-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k update from Geert Uytterhoeven. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Sort out !CONFIG_MMU_SUN3 vs. CONFIG_HAS_DMA swim: Add missing spinlock init
2013-02-20Merge branch 'stable/for-jens-3.9' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.9/drivers Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.9 which has bug-fixes that did not make it in v3.8. They all are marked as material for the stable tree as well. There are two bug-fixes for the code that has been in there for some time (that is the Jan's fix and one of mine). And there are two bug-fixes for the persistent grant feature that debuted in v3.8 for xen blk[back|front]end.
2013-02-19Merge branch 'testing' of github.com:ceph/ceph-client into into linux-3.8-cephAlex Elder
2013-02-19libceph: drop return value from page vector copy routinesAlex Elder
The return values provided for ceph_copy_to_page_vector() and ceph_copy_from_page_vector() serve no purpose, so get rid of them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-19rbd: ignore result of ceph_copy_from_page_vector()Alex Elder
The result of ceph_copy_from_page_vector() is simply the length argument it is provided. This is called by rbd_obj_method_sync(), which returns the result if it's non-negative. But we always either ignore or overwrite that return value. So explicitly ignore what's returned by the copy function, and have rbd_obj_method_sync() always return either a negative errno or 0. We also return the result of ceph_copy_from_page_vector() in rbd_obj_read_sync(). There we still want to return the number of bytes transferred, but we can use the value we already have in hand rather than what ceph_copy_from_page_vector() provides. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-19rbd: prevent bytes transferred overflowAlex Elder
In rbd_obj_read_sync(), verify the number of bytes transferred won't exceed what can be represented by a size_t before using it to indicate the number of bytes to copy to the result buffer. (The real motivation for this is to prepare for the next patch.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-19libceph: allow STAT osd operationsAlex Elder
Add support for CEPH_OSD_OP_STAT operations in the osd client and in rbd. This operation sends no data to the osd; everything required is encoded in identity of the target object. The result will be ENOENT if the object doesn't exist. If it does exist and no other error occurs the server returns the size and last modification time of the target object as output data (in little endian format). The size is a 64 bit unsigned and the time is ceph_timespec structure (two unsigned 32-bit integers, representing a seconds and nanoseconds value). This resolves: http://tracker.ceph.com/issues/4007 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-19rbd: add parentheses to object request iterator macrosAlex Elder
The for_each_obj_request*() macros should parenthesize their uses of the ireq parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-19xen-blkback: use balloon pages for persistent grantsRoger Pau Monne
With current persistent grants implementation we are not freeing the persistent grants after we disconnect the device. Since grant map operations change the mfn of the allocated page, and we can no longer pass it to __free_page without setting the mfn to a sane value, use balloon grant pages instead, as the gntdev device does. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: stable@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-02-19xen-blkfront: drop the use of llist_for_each_entry_safeKonrad Rzeszutek Wilk
Replace llist_for_each_entry_safe with a while loop. llist_for_each_entry_safe can trigger a bug in GCC 4.1, so it's best to remove it and use a while loop and do the deletion manually. Specifically this bug can be triggered by hot-unplugging a disk, either by doing xm block-detach or by save/restore cycle. BUG: unable to handle kernel paging request at fffffffffffffff0 IP: [<ffffffffa0047223>] blkif_free+0x63/0x130 [xen_blkfront] The crash call trace is: ... bad_area_nosemaphore+0x13/0x20 do_page_fault+0x25e/0x4b0 page_fault+0x25/0x30 ? blkif_free+0x63/0x130 [xen_blkfront] blkfront_resume+0x46/0xa0 [xen_blkfront] xenbus_dev_resume+0x6c/0x140 pm_op+0x192/0x1b0 device_resume+0x82/0x1e0 dpm_resume+0xc9/0x1a0 dpm_resume_end+0x15/0x30 do_suspend+0x117/0x1e0 When drilling down to the assembler code, on newer GCC it does .L29: cmpq $-16, %r12 #, persistent_gnt check je .L30 #, out of the loop .L25: ... code in the loop testq %r13, %r13 # n je .L29 #, back to the top of the loop cmpq $-16, %r12 #, persistent_gnt check movq 16(%r12), %r13 # <variable>.node.next, n jne .L25 #, back to the top of the loop .L30: While on GCC 4.1, it is: L78: ... code in the loop testq %r13, %r13 # n je .L78 #, back to the top of the loop movq 16(%rbx), %r13 # <variable>.node.next, n jmp .L78 #, back to the top of the loop Which basically means that the exit loop condition instead of being: &(pos)->member != NULL; is: ; which makes the loop unbound. Since xen-blkfront is the only user of the llist_for_each_entry_safe macro remove it from llist.h. Orabug: 16263164 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-02-19xen/blkback: Don't trust the handle from the frontend.Konrad Rzeszutek Wilk
The 'handle' is the device that the request is from. For the life-time of the ring we copy it from a request to a response so that the frontend is not surprised by it. But we do not need it - when we start processing I/Os we have our own 'struct phys_req' which has only most essential information about the request. In fact the 'vbd_translate' ends up over-writing the preq.dev with a value from the backend. This assignment of preq.dev with the 'handle' value is superfluous so lets not do it. Cc: stable@vger.kernel.org Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-02-19xen-blkback: do not leak mode propertyJan Beulich
"be->mode" is obtained from xenbus_read(), which does a kmalloc() for the message body. The short string is never released, so do it along with freeing "be" itself, and make sure the string isn't kept when backend_changed() doesn't complete successfully (which made it desirable to slightly re-structure that function, so that the error cleanup can be done in one place). Reported-by: Olaf Hering <olaf@aepfle.de> CC: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-02-18block: IBM RamSan 70/80 driver fixesPhilip J Kelleher
This patch includes the following driver fixes for the IBM RamSan 70/80 driver: o Changed the creg_ctrl lock from a mutex to a spinlock. o Added a count check for ioctl calls. o Removed unnecessary casting of void pointers. o Made every function static that needed to be. o Added comments to explain things more thoroughly. Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-18libceph: kill ceph_osdc_create_event() "one_shot" parameterAlex Elder
There is only one caller of ceph_osdc_create_event(), and it provides 0 as its "one_shot" argument. Get rid of that argument and just use 0 in its place. Replace the code in handle_watch_notify() that executes if one_shot is nonzero in the event with a BUG_ON() call. While modifying "osd_client.c", give handle_watch_notify() static scope. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: "A couple small fixes for sparc including some THP brown-paper-bag material: 1) During the merging of all the THP support for various architectures, sparc missed adding a HAVE_ARCH_TRANSPARENT_HUGEPAGE to it's Kconfig, oops. 2) Sparc needs to be mindful of hugepages in get_user_pages_fast(). 3) Fix memory leak in SBUS probe, from Cong Ding. 4) The sunvdc virtual disk client driver has a test of the bitmask of vdisk server supported operations which was off by one bit" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sunvdc: Fix off-by-one in generic_request(). sparc64: Fix get_user_pages_fast() wrt. THP. sparc64: Add missing HAVE_ARCH_TRANSPARENT_HUGEPAGE. sparc: kernel/sbus.c: fix memory leakage
2013-02-14sunvdc: Fix off-by-one in generic_request().David S. Miller
The 'operations' bitmap corresponds one-for-one with the operation codes, no adjustment is necessary. Reported-by: Mark Kettenis <mark.kettenis@xs4all.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-14Merge branch 'delete-xt-disk' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux into for-3.9/drivers Paul writes: Please pull the following to get the removal of the original IBM PC-XT hard disk driver from the block layer (drivers/block/xd.c). As near as I can tell, it hasn't seen a run time fix in over a dozen years, and with drive sizes of 10-20MB, and performance of about 128kB/s maximum, it is no surprise that it has been completely unused for well over a decade. The removal was originally posted[1] well over a month ago, and since then, there has been nobody objecting to the removal, aside from someone who had mistakenly confused it with a completely different driver (hd.c)
2013-02-13rbd: add barriers near done flag operationsAlex Elder
Somehow, I missed this little item in Documentation/atomic_ops.txt: *** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! *** Create and use some helper functions that include the proper memory barriers for manipulating the done field. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: turn off interrupts for open/remove lockingAlex Elder
This commit: bc7a62ee5 rbd: prevent open for image being removed added checking for removing rbd before allowing an open, and used the same request spinlock for protecting that and updating the open count as is used for the request queue. However it used the non-irq protected version of the spinlocks. Fix that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13libceph: don't require r_num_pages for bio requestsAlex Elder
There is a check in the completion path for osd requests that ensures the number of pages allocated is enough to hold the amount of incoming data expected. For bio requests coming from rbd the "number of pages" is not really meaningful (although total length would be). So stop requiring that nr_pages be supplied for bio requests. This is done by checking whether the pages pointer is null before checking the value of nr_pages. Note that this value is passed on to the messenger, but there it's only used for debugging--it's never used for validation. While here, change another spot that used r_pages in a debug message inappropriately, and also invalidate the r_con_filling_msg pointer after dropping a reference to it. This resolves: http://tracker.ceph.com/issues/3875 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: don't take extra bio reference for osd clientAlex Elder
Currently, if the OSD client finds an osd request has had a bio list attached to it, it drops a reference to it (or rather, to the first entry on that list) when the request is released. The code that added that reference (i.e., the rbd client) is therefore required to take an extra reference to that first bio structure. The osd client doesn't really do anything with the bio pointer other than transfer it from the osd request structure to outgoing (for writes) and ingoing (for reads) messages. So it really isn't the right place to be taking or dropping references. Furthermore, the rbd client already holds references to all bio structures it passes to the osd client, and holds them until the request is completed. So there's no need for this extra reference whatsoever. So remove the bio_put() call in ceph_osdc_release_request(), as well as its matching bio_get() call in rbd_osd_req_create(). This change could lead to a crash if old libceph.ko was used with new rbd.ko. Add a compatibility check at rbd initialization time to avoid this possibilty. This resolves: http://tracker.ceph.com/issues/3798 and http://tracker.ceph.com/issues/3799 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: prevent open for image being removedAlex Elder
An open request for a mapped rbd image can arrive while removal of that mapping is underway. We need to prevent such an open request from succeeding. (It appears that Maciej Galkiewicz ran into this problem.) Define and use a "removing" flag to indicate a mapping is getting removed. Set it in the remove path after verifying nothing holds the device open. And check it in the open path before allowing the open to proceed. Acquire the rbd device's lock around each of these spots to avoid any races accessing the flags and open_count fields. This addresses: http://tracker.newdream.net/issues/3427 Reported-by: Maciej Galkiewicz <maciejgalkiewicz@ragnarson.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: define flags field, use it for exists flagAlex Elder
Define a new rbd device flags field, manipulated using bit operations. Replace the use of the current "exists" flag with a bit in this new "flags" field. Add a little commentary about the "exists" flag, which does not need to be manipulated atomically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: don't drop watch requests on completionAlex Elder
When we register an osd request to linger, it means that request will stay around (under control of the osd client) until we've unregistered it. We do that for an rbd image's header object, and we keep a pointer to the object request associated with it. Keep a reference to the watch object request for as long as it is registered to linger. Drop it again after we've removed the linger registration. This resolves: http://tracker.ceph.com/issues/3937 (Note: this originally came about because the osd client was issuing a callback more than once. But that behavior will be changing soon, documented in tracker issue 3967.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: decrement obj request count when deletingAlex Elder
Decrement the obj_request_count value when deleting an object request from its image request's list. Rearrange a few lines in the surrounding code. This resolves: http://tracker.ceph.com/issues/3940 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: track object rather than osd request for watchAlex Elder
Switch to keeping track of the object request pointer rather than the osd request used to watch the rbd image header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: unregister linger in watch sync routineAlex Elder
Move the code that unregisters an rbd device's lingering header object watch request into rbd_dev_header_watch_sync(), so it occurs in the same function that originally sets up that request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: get rid of rbd_req_sync_exec()Alex Elder
Get rid rbd_req_sync_exec() because it is no longer used. That eliminates the last use of rbd_req_sync_op(), so get rid of that too. And finally, that leaves rbd_do_request() unreferenced, so get rid of that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: implement sync method with new codeAlex Elder
Reimplement synchronous object method calls using the new request tracking code. Use the name rbd_obj_method_sync() Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: send notify ack asynchronouslyAlex Elder
When we receive notification of a change to an rbd image's header object we need to refresh our information about the image (its size and snapshot context). Once we have refreshed our rbd image we need to acknowledge the notification. This acknowledgement was previously done synchronously, but there's really no need to wait for it to complete. Change it so the caller doesn't wait for the notify acknowledgement request to complete. And change the name to reflect it's no longer synchronous. This resolves: http://tracker.newdream.net/issues/3877 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: get rid of rbd_req_sync_notify_ack()Alex Elder
Get rid rbd_req_sync_notify_ack() because it is no longer used. As a result rbd_simple_req_cb() becomes unreferenced, so get rid of that too. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: use new code for notify ackAlex Elder
Use the new object request tracking mechanism for handling a notify_ack request. Move the callback function below the definition of this so we don't have to do a pre-declaration. This resolves: http://tracker.newdream.net/issues/3754 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: get rid of rbd_req_sync_watch()Alex Elder
Get rid of rbd_req_sync_watch(), because it is no longer used. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13rbd: implement watch/unwatch with new codeAlex Elder
Implement a new function to set up or tear down a watch event for an mapped rbd image header using the new request code. Create a new object request type "nodata" to handle this. And define rbd_osd_trivial_callback() which simply marks a request done. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>