summaryrefslogtreecommitdiff
path: root/fs/udf
AgeCommit message (Collapse)Author
2017-03-02statx: Add a system call to make enhanced file info availableDavid Howells
Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr*() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares*[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS_*_FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-02-27fs: add i_blocksize()Fabian Frederick
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting more appropriate function instead of macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03udf: simplify udf_ioctl()Fabian Frederick
"out" label was only returning error code. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-03udf: fix ioctl errorsFabian Frederick
Currently, lsattr for instance in udf directory gives "udf: Invalid argument While reading flags on ..." This patch returns -ENOIOCTLCMD when command is unknown to have more accurate message like this: "Inappropriate ioctl for device While reading flags on ..." Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-20udf: allow implicit blocksize specification during mountFabian Frederick
udf_fill_super() used udf_parse_options() to flag UDF_FLAG_BLOCKSIZE_SET when blocksize was specified otherwise used 512 bytes (bdev_logical_block_size) and 2048 bytes (UDF_DEFAULT_BLOCKSIZE) IOW both 1024 and 4096 specifications were required or resulted in "mount: wrong fs type, bad option, bad superblock on /dev/loop1" This patch loops through different block values but also updates udf_load_vrs() to return -EINVAL instead of 0 when udf_check_vsd() fails (and uopt->novrs = 0). The later being the reason for the RFC; we have that case when mounting a 4kb blocksize against other values but maybe VRS is not mandatory there ? Tested with 512, 1024, 2048 and 4096 blocksize Reported-by: Jan Kara <jack@suse.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: check partition reference in udf_read_inode()Fabian Frederick
We were checking block number without checking partition. sbi->s_partmaps[iloc->partitionReferenceNum] could lead to bad memory access. See udf_nfs_get_inode() path for instance. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: atomically read inode sizeFabian Frederick
See i_size_read() comments in include/linux/fs.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: merge module informations in super.cFabian Frederick
Move all module attributes at the end of one file like other FS. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: remove next_epos from udf_update_extent_cache()Fabian Frederick
udf_update_extent_cache() is only called from inode_bmap() with 1 for next_epos Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: Factor out trimming of crtimeFabian Frederick
Factor out trimming of crtime field. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: remove empty conditionFabian Frederick
loc & 0x02 is empty since first git version in 2005 in udf_add_extendedattr() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: remove unneeded line breakFabian Frederick
Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: merge bh freeFabian Frederick
Merge all bh free at one place. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: use pointer for kernel_long_ad argumentFabian Frederick
Having struct kernel_long_ad laarr[EXTENT_MERGE_SIZE] in all function arguments could be understood as by-value parameter. Use kernel_long_ad pointer for functions depending on inode_getblk() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-10udf: use __packed instead of __attribute__ ((packed))Fabian Frederick
defined in linux/compiler-gcc.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-05udf: Make stat on symlink report symlink length as st_sizeJan Kara
UDF encodes symlinks in a more complex fashion and thus i_size of a symlink does not match the lenght of a string returned by readlink(2). This confuses some applications (see bug 191241) and may be considered a violation of POSIX. Fix the problem by reading the link into page cache in response to stat(2) call and report the length of the decoded path. Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-03fs/udf: make #ifdef UDF_PREALLOCATE unconditionalSteve Kenton
Signed-off-by: Steve Kenton <skenton@ou.edu> Signed-off-by: Jan Kara <jack@suse.cz>
2017-01-03fs: udf: Replace CURRENT_TIME with current_time()Deepa Dinamani
CURRENT_TIME is not y2038 safe. CURRENT_TIME macro is also not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Logical Volume Integrity format is described to have the same timestamp format for "Recording Date and time" as the other [a,c,m]timestamps. The function udf_time_to_disk_format() does this conversion. Hence the timestamp is passed directly to the function and not truncated. This is as per Arnd's suggestion on the thread. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-01block,fs: untangle fs.h and blk_types.hChristoph Hellwig
Nothing in fs.h should require blk_types.h to be included. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning
2016-10-10Merge remote-tracking branch 'ovl/rename2' into for-linusAl Viro
2016-09-27fs: Replace current_fs_time() with current_time()Deepa Dinamani
current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27fs: rename "rename2" i_op to "rename"Miklos Szeredi
Generated patch: sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2` sed -i "s/\brename2\b/rename/g" `git grep -wl rename2` Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27fs: support RENAME_NOREPLACE for local filesystemsMiklos Szeredi
This is trivial to do: - add flags argument to foo_rename() - check if flags doesn't have any other than RENAME_NOREPLACE - assign foo_rename() to .rename2 instead of .rename Filesystems converted: affs, bfs, exofs, ext2, hfs, hfsplus, jffs2, jfs, logfs, minix, msdos, nilfs2, omfs, reiserfs, sysvfs, ubifs, udf, ufs, vfat. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Boaz Harrosh <ooo@electrozaur.com> Acked-by: Richard Weinberger <richard@nod.at> Acked-by: Bob Copeland <me@bobcopeland.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>
2016-09-22fs: Give dentry to inode_change_ok() instead of inodeJan Kara
inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-19udf: don't bother with full-page write optimisations in adinicb caseAl Viro
... it would get converted to regular if such had been attempted Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-06udf: Remove useless check in udf_adinicb_write_begin()Jan Kara
As Al properly points out, len is guaranteed to be smaller than PAGE_SIZE when we reach udf_adinicb_write_begin() as otherwise we would have converted the file to the normal format. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>
2016-07-26Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...
2016-07-26Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
2016-07-20block: get rid of bio_rw and READAChristoph Hellwig
These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07fs: have ll_rw_block users pass in op and flags separatelyMike Christie
This has ll_rw_block users pass in the operation and flags separately, so ll_rw_block can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-19udf: Use correct partition reference number for metadataAlden Tondettar
UDF/OSTA terminology is confusing. Partition Numbers (PNs) are arbitrary 16-bit values, one for each physical partition in the volume. Partition Reference Numbers (PRNs) are indices into the the Partition Map Table and do not necessarily equal the PN of the mapped partition. The current metadata code mistakenly uses the PN instead of the PRN when mapping metadata blocks to physical/sparable blocks. Windows-created UDF 2.5 discs for some reason use large, arbitrary PNs, resulting in mount failure and KASAN read warnings in udf_read_inode(). For example, a NetBSD UDF 2.5 partition might look like this: PRN PN Type --- -- ---- 0 0 Sparable 1 0 Metadata Since PRN == PN, we are fine. But Windows could gives us: PRN PN Type --- ---- ---- 0 8192 Sparable 1 8192 Metadata So udf_read_inode() will start out by checking the partition length in sbi->s_partmaps[8192], which is obviously out of bounds. Fix this by creating a new field (s_phys_partition_ref) in struct udf_meta_data, referencing whatever physical or sparable map has the same partition number as the metadata partition. [JK: Add comment about s_phys_partition_ref, change its name] Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-05-19udf: Use IS_ERR when loading metadata mirror file entryAlden Tondettar
Currently when udf_get_pblock_meta25() fails to map a block using the primary metadata file, it will attempt to load the mirror file entry by calling udf_find_metadata_inode_efe(). That function will return a ERR_PTR if it fails, but the return value is only checked against NULL. Test the return value using IS_ERR() and change it to NULL if needed. Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-05-19udf: Don't BUG on missing metadata partition descriptorAlden Tondettar
Currently, if a metadata partition map is missing its partition descriptor, then udf_get_pblock_meta25() will BUG() out the first time it is called. This is rather drastic for a corrupted filesystem, so just treat this case as an invalid mapping instead. Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-05-17Merge branch 'work.preadv2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs cleanups from Al Viro: "More cleanups from Christoph" * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: use RWF_SYNC fs: add RWF_DSYNC aand RWF_SYNC ceph: use generic_write_sync fs: simplify the generic_write_sync prototype fs: add IOCB_SYNC and IOCB_DSYNC direct-io: remove the offset argument to dio_complete direct-io: eliminate the offset argument to ->direct_IO xfs: eliminate the pos variable in xfs_file_dio_aio_write filemap: remove the pos argument to generic_file_direct_write filemap: remove pos variables in generic_file_read_iter
2016-05-17Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes from Jan Kara: "A fix for UDF crash on corrupted media and one UDF header fixup" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Export superblock magic to userspace udf: Prevent stack overflow on corrupted filesystem mount
2016-05-17Merge branch 'ovl-fixes' into for-linusAl Viro
Backmerge to resolve a conflict in ovl_lookup_real(); "ovl_lookup_real(): use lookup_one_len_unlocked()" instead, but it was too late in the cycle to rebase.
2016-05-09more trivial ->iterate_shared conversionsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02Merge getxattr prototype change into work.lookupsAl Viro
The rest of work.xattr stuff isn't needed for this branch
2016-05-01fs: simplify the generic_write_sync prototypeChristoph Hellwig
The kiocb already has the new position, so use that. The only interesting case is AIO, where we currently don't bother updating ki_pos. We're about to free the kiocb after we're done, so we might as well update it to make everyone's life simpler. While we're at it also return the bytes written argument passed in if we were successful so that the boilerplate error switch code in the callers can go away. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01fs: add IOCB_SYNC and IOCB_DSYNCChristoph Hellwig
This will allow us to do per-I/O sync file writes, as required by a lot of fileservers or storage targets. XXX: Will need a few additional audits for O_DSYNC Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01direct-io: eliminate the offset argument to ->direct_IOChristoph Hellwig
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually work, so eliminate the superflous argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-28udf: Export superblock magic to userspaceJan Kara
Currently UDF superblock magic doesn't appear in any userspace header files and thus userspace apps have hard time checking for this fs. Let's export the magic to userspace as with any other filesystem. Signed-off-by: Jan Kara <jack@suse.cz>
2016-04-26udf: Prevent stack overflow on corrupted filesystem mountAlden Tondettar
Presently, a corrupted or malicious UDF filesystem containing a very large number (or cycle) of Logical Volume Integrity Descriptor extent indirections may trigger a stack overflow and kernel panic in udf_load_logicalvolint() on mount. Replace the unnecessary recursion in udf_load_logicalvolint() with simple iteration. Set an arbitrary limit of 1000 indirections (which would have almost certainly overflowed the stack without this fix), and treat such cases as if there were no LVID. Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-04-25udf: Fix conversion of 'dstring' fields to UTF8Andrew Gabbasov
Commit 9293fcfbc1812a22ad5ce1b542eb90c1bbe01be1 ("udf: Remove struct ustr as non-needed intermediate storage"), while getting rid of 'struct ustr', does not take any special care of 'dstring' fields and effectively use fixed field length instead of actual string length, encoded in the last byte of the field. Also, commit 484a10f49387e4386bf2708532e75bf78ffea2cb ("udf: Merge linux specific translation into CS0 conversion function") introduced checking of the length of the string being converted, requiring proper alignment to number of bytes constituing each character. The UDF volume identifier is represented as a 32-bytes 'dstring', and needs to be converted from CS0 to UTF8, while mounting UDF filesystem. The changes in mentioned commits can in some cases lead to incorrect handling of volume identifier: - if the actual string in 'dstring' is of maximal length and does not have zero bytes separating it from dstring encoded length in last byte, that last byte may be included in conversion, thus making incorrect resulting string; - if the identifier is encoded with 2-bytes characters (compression code is 16), the length of 31 bytes (32 bytes of field length minus 1 byte of compression code), taken as the string length, is reported as an incorrect (unaligned) length, and the conversion fails, which in its turn leads to volume mounting failure. This patch introduces handling of 'dstring' encoded length field in udf_CS0toUTF8 function, that is used in all and only cases when 'dstring' fields are converted. Currently these cases are processing of Volume Identifier and Volume Set Identifier fields. The function is also renamed to udf_dstrCS0toUTF8 to distinctly indicate that it handles 'dstring' input. Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-04-10don't bother with ->d_inode->i_sb - it's always equal to ->d_sbAl Viro
... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-09udf: Merge linux specific translation into CS0 conversion functionAndrew Gabbasov
Current implementation of udf_translate_to_linux function does not support multi-bytes characters at all: it counts bytes while calculating extension length, when inserting CRC inside the name it doesn't take into account inter-character boundaries and can break into the middle of the character. The most efficient way to properly support multi-bytes characters is merging of translation operations directly into conversion function. This can help to avoid extra passes along the string or parsing the multi-bytes character back into unicode to find out it's length. Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-02-09udf: Remove struct ustr as non-needed intermediate storageAndrew Gabbasov
Although 'struct ustr' tries to structurize the data by combining the string and its length, it doesn't actually make much benefit, since it saves only one parameter, but introduces an extra copying of the whole buffer, serving as an intermediate storage. It looks quite inefficient and not actually needed. This commit gets rid of the struct ustr by changing the parameters of some functions appropriately. Also, it removes using 'dstring' type, since it doesn't make much sense too. Just using the occasion, add a 'const' qualifier to udf_get_filename to make consistent parameters sets. Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-02-09udf: Use separate buffer for copying split namesJan Kara
Code in udf_find_entry() and udf_readdir() used the same buffer for storing filename that was split among blocks and for the resulting filename in utf8. This worked because udf_get_filename() first internally copied the name into a different buffer and only then performed a conversion into the destination buffer. However we want to get rid of intermediate buffers so use separate buffer for converted name and name split between blocks so that we don't have the same source and destination buffer when converting split names. Signed-off-by: Jan Kara <jack@suse.cz>