summaryrefslogtreecommitdiff
path: root/fs/notify
AgeCommit message (Collapse)Author
2019-02-21fanotify: Make waits for fanotify events only killableJan Kara
Making waits for response to fanotify permission events interruptible can result in EINTR returns from open(2) or other syscalls when there's e.g. AV software that's monitoring the file. Orion reports that e.g. bash is complaining like: bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call So for now convert the wait from interruptible to only killable one. That is mostly invisible to userspace. Sadly this breaks hibernation with fanotify permission events pending again but we have to put more thought into how to fix this without regressing userspace visible behavior. Reported-by: Orion Poplawski <orion@nwra.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-18fanotify: Use interruptible wait when waiting for permission eventsJan Kara
When waiting for response to fanotify permission events, we currently use uninterruptible waits. That makes code simple however it can cause lots of processes to end up in uninterruptible sleep with hard reboot being the only alternative in case fanotify listener process stops responding (e.g. due to a bug in its implementation). Uninterruptible sleep also makes system hibernation fail if the listener gets frozen before the process generating fanotify permission event. Fix these problems by using interruptible sleep for waiting for response to fanotify event. This is slightly tricky though - we have to detect when the event got already reported to userspace as in that case we must not free the event. Instead we push the responsibility for freeing the event to the process that will write response to the event. Reported-by: Orion Poplawski <orion@nwra.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-18fanotify: Track permission event stateJan Kara
Track whether permission event got already reported to userspace and whether userspace already answered to the permission event. Protect stores to this field together with updates to ->response field by group->notification_lock. This will allow aborting wait for reply to permission event from userspace. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-18fanotify: Simplify cleaning of access_listJan Kara
Simplify iteration cleaning access_list in fanotify_release(). That will make following changes more obvious. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-18fsnotify: Create function to remove event from notification listJan Kara
Create function to remove event from the notification list. Later it will be used from more places. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-18fanotify: Move locking inside get_one_event()Jan Kara
get_one_event() has a single caller and that just locks notification_lock around the call. Move locking inside get_one_event() as that will make using ->response field for permission event state easier. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-18fanotify: Fold dequeue_event() into process_access_response()Jan Kara
Fold dequeue_event() into process_access_response(). This will make changes to use of ->response field easier. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-14fanotify: Select EXPORTFSJan Kara
Fanotify now uses exportfs_encode_inode_fh() so it needs to select EXPORTFS. Fixes: e9e0c8903009 "fanotify: encode file identifier for FAN_REPORT_FID" Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: report FAN_ONDIR to listener with FAN_REPORT_FIDAmir Goldstein
dirent modification events (create/delete/move) do not carry the child entry name/inode information. Instead, we report FAN_ONDIR for mkdir/rmdir so user can differentiate them from creat/unlink. This is consistent with inotify reporting IN_ISDIR with dirent events and is useful for implementing recursive directory tree watcher. We avoid merging dirent events referring to subdirs with dirent events referring to non subdirs, otherwise, user won't be able to tell from a mask FAN_CREATE|FAN_DELETE|FAN_ONDIR if it describes mkdir+unlink pair or rmdir+create pair of events. For backward compatibility and consistency, do not report FAN_ONDIR to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR to user in FAN_REPORT_FID mode for all event types. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: add support for create/attrib/move/delete eventsAmir Goldstein
Add support for events with data type FSNOTIFY_EVENT_INODE (e.g. create/attrib/move/delete) for inode and filesystem mark types. The "inode" events do not carry enough information (i.e. path) to report event->fd, so we do not allow setting a mask for those events unless group supports reporting fid. The "inode" events are not supported on a mount mark, because they do not carry enough information (i.e. path) to be filtered by mount point. The "dirent" events (create/move/delete) report the fid of the parent directory where events took place without specifying the filename of the child. In the future, fanotify may get support for reporting filename information for those events. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: support events with data type FSNOTIFY_EVENT_INODEAmir Goldstein
When event data type is FSNOTIFY_EVENT_INODE, we don't have a refernece to the mount, so we will not be able to open a file descriptor when user reads the event. However, if the listener has enabled reporting file identifier with the FAN_REPORT_FID init flag, we allow reporting those events and we use an identifier inode to encode fid. The inode to use as identifier when reporting fid depends on the event. For dirent modification events, we report the modified directory inode and we report the "victim" inode otherwise. For example: FS_ATTRIB reports the child inode even if reported on a watched parent. FS_CREATE reports the modified dir inode and not the created inode. [JK: Fixup condition in fanotify_group_event_mask()] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: check FS_ISDIR flag instead of d_is_dir()Amir Goldstein
All fsnotify hooks set the FS_ISDIR flag for events that happen on directory victim inodes except for fsnotify_perm(). Add the missing FS_ISDIR flag in fsnotify_perm() hook and let fanotify_group_event_mask() check the FS_ISDIR flag instead of checking if path argument is a directory. This is needed for fanotify support for event types that do not carry path information. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fsnotify: report FS_ISDIR flag with MOVE_SELF and DELETE_SELF eventsAmir Goldstein
We need to report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events for fanotify, because fanotify API requires the user to explicitly request events on directories by FAN_ONDIR flag. inotify never reported IN_ISDIR with those events. It looks like an oversight, but to avoid the risk of breaking existing inotify programs, mask the FS_ISDIR flag out when reprting those events to inotify backend. We also add the FS_ISDIR flag with FS_ATTRIB event in the case of rename over an empty target directory. inotify did not report IN_ISDIR in this case, but it normally does report IN_ISDIR along with IN_ATTRIB event, so in this case, we do not mask out the FS_ISDIR flag. [JK: Simplify the checks in fsnotify_move()] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: use vfs_get_fsid() helper instead of vfs_statfs()Amir Goldstein
This is a cleanup that doesn't change any logic. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: cache fsid in fsnotify_mark_connectorAmir Goldstein
For FAN_REPORT_FID, we need to encode fid with fsid of the filesystem on every event. To avoid having to call vfs_statfs() on every event to get fsid, we store the fsid in fsnotify_mark_connector on the first time we add a mark and on handle event we use the cached fsid. Subsequent calls to add mark on the same object are expected to pass the same fsid, so the call will fail on cached fsid mismatch. If an event is reported on several mark types (inode, mount, filesystem), all connectors should already have the same fsid, so we use the cached fsid from the first connector. [JK: Simplify code flow around fanotify_get_fid() make fsid argument of fsnotify_add_mark_locked() unconditional] Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: enable FAN_REPORT_FID init flagAmir Goldstein
When setting up an fanotify listener, user may request to get fid information in event instead of an open file descriptor. The fid obtained with event on a watched object contains the file handle returned by name_to_handle_at(2) and fsid returned by statfs(2). Restrict FAN_REPORT_FID to class FAN_CLASS_NOTIF, because we have have no good reason to support reporting fid on permission events. When setting a mark, we need to make sure that the filesystem supports encoding file handles with name_to_handle_at(2) and that statfs(2) encodes a non-zero fsid. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: copy event fid info to userAmir Goldstein
If group requested FAN_REPORT_FID and event has file identifier, copy that information to user reading the event after event metadata. fid information is formatted as struct fanotify_event_info_fid that includes a generic header struct fanotify_event_info_header, so that other info types could be defined in the future using the same header. metadata->event_len includes the length of the fid information. The fid information includes the filesystem's fsid (see statfs(2)) followed by an NFS file handle of the file that could be passed as an argument to open_by_handle_at(2). Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: encode file identifier for FAN_REPORT_FIDAmir Goldstein
When user requests the flag FAN_REPORT_FID in fanotify_init(), a unique file identifier of the event target object will be reported with the event. The file identifier includes the filesystem's fsid (i.e. from statfs(2)) and an NFS file handle of the file (i.e. from name_to_handle_at(2)). The file identifier makes holding the path reference and passing a file descriptor to user redundant, so those are disabled in a group with FAN_REPORT_FID. Encode fid and store it in event for a group with FAN_REPORT_FID. Up to 12 bytes of file handle on 32bit arch (16 bytes on 64bit arch) are stored inline in fanotify_event struct. Larger file handles are stored in an external allocated buffer. On failure to encode fid, we print a warning and queue the event without the fid information. [JK: Fold part of later patched into this one to use exportfs_encode_inode_fh() right away] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07fanotify: open code fill_event_metadata()Amir Goldstein
The helper is quite trivial and open coding it will make it easier to implement copying event fid info to user. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-06fanotify: rename struct fanotify_{,perm_}event_infoAmir Goldstein
struct fanotify_event_info "inherits" from struct fsnotify_event and therefore a more appropriate (and short) name for it is fanotify_event. Same for struct fanotify_perm_event_info, which now "inherits" from struct fanotify_event. We plan to reuse the name struct fanotify_event_info for user visible event info record format. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-06fsnotify: move mask out of struct fsnotify_eventAmir Goldstein
Common fsnotify_event helpers have no need for the mask field. It is only used by backend code, so move the field out of the abstract fsnotify_event struct and into the concrete backend event structs. This change packs struct inotify_event_info better on 64bit machine and will allow us to cram some more fields into struct fanotify_event_info. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-06fsnotify: send all event types to super block marksAmir Goldstein
So far, existence of super block marks was checked only on events with data type FSNOTIFY_EVENT_PATH. Use the super block of the "to_tell" inode to report the events of all event types to super block marks. This change has no effect on current backends. Soon, this will allow fanotify backend to receive all event types on a super block mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-01-02inotify: Fix fd refcount leak in inotify_add_watch().Tetsuo Handa
Commit 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") forgot to call fdput() before bailing out. Fixes: 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") CC: stable@vger.kernel.org Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-12-11fanotify: Use inode_is_open_for_writeNikolay Borisov
Use the aptly named function rather than opencoding i_writecount check. No functional changes. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-12-05fanotify: Make sure to check event_len when copyingKees Cook
As a precaution, make sure we check event_len when copying to userspace. Based on old feedback: https://lkml.kernel.org/r/542D9FE5.3010009@gmx.de Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-15fsnotify/fdinfo: include fdinfo.h for inotify_show_fdinfo()Eric Biggers
inotify_show_fdinfo() is defined in fs/notify/fdinfo.c and declared in fs/notify/fdinfo.h, but the declaration isn't included at the point of the definition. Include the header to enforce that the definition matches the declaration. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-13fanotify: introduce new event mask FAN_OPEN_EXEC_PERMMatthew Bobrowski
A new event mask FAN_OPEN_EXEC_PERM has been defined. This allows users to receive events and grant access to files that are intending to be opened for execution. Events of FAN_OPEN_EXEC_PERM type will be generated when a file has been opened by using either execve(), execveat() or uselib() system calls. This acts in the same manner as previous permission event mask, meaning that an access response is required from the user application in order to permit any further operations on the file. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-13fanotify: introduce new event mask FAN_OPEN_EXECMatthew Bobrowski
A new event mask FAN_OPEN_EXEC has been defined so that users have the ability to receive events specifically when a file has been opened with the intent to be executed. Events of FAN_OPEN_EXEC type will be generated when a file has been opened using either execve(), execveat() or uselib() system calls. The feature is implemented within fsnotify_open() by generating the FAN_OPEN_EXEC event type if __FMODE_EXEC is set within file->f_flags. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-13fanotify: return only user requested event types in event maskMatthew Bobrowski
Modify fanotify_should_send_event() so that it now returns a mask for an event that contains ONLY flags for the event types that have been specifically requested by the user. Flags that may have been included within the event mask, but have not been explicitly requested by the user will not be present in the returned value. As an example, given the situation where a user requests events of type FAN_OPEN. Traditionally, the event mask returned within an event that occurred on a filesystem object that has been marked for monitoring and is opened, will only ever have the FAN_OPEN bit set. With the introduction of the new flags like FAN_OPEN_EXEC, and perhaps any other future event flags, there is a possibility of the returned event mask containing more than a single bit set, despite having only requested the single event type. Prior to these modifications performed to fanotify_should_send_event(), a user would have received a bundled event mask containing flags FAN_OPEN and FAN_OPEN_EXEC in the instance that a file was opened for execution via execve(), for example. This means that a user would receive event types in the returned event mask that have not been requested. This runs the possibility of breaking existing systems and causing other unforeseen issues. To mitigate this possibility, fanotify_should_send_event() has been modified to return the event mask containing ONLY event types explicitly requested by the user. This means that we will NOT report events that the user did no set a mask for, and we will NOT report events that the user has set an ignore mask for. The function name fanotify_should_send_event() has also been updated so that it's more relevant to what it has been designed to do. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-08fanotify: fix handling of events on child sub-directoryAmir Goldstein
When an event is reported on a sub-directory and the parent inode has a mark mask with FS_EVENT_ON_CHILD|FS_ISDIR, the event will be sent to fsnotify() even if the event type is not in the parent mark mask (e.g. FS_OPEN). Further more, if that event happened on a mount or a filesystem with a mount/sb mark that does have that event type in their mask, the "on child" event will be reported on the mount/sb mark. That is not desired, because user will get a duplicate event for the same action. Note that the event reported on the victim inode is never merged with the event reported on the parent inode, because of the check in should_merge(): old_fsn->inode == new_fsn->inode. Fix this by looking for a match of an actual event type (i.e. not just FS_ISDIR) in parent's inode mark mask and by not reporting an "on child" event to group if event type is only found on mount/sb marks. [backport hint: The bug seems to have always been in fanotify, but this patch will only apply cleanly to v4.19.y] Cc: <stable@vger.kernel.org> # v4.19 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-25fsnotify: Fix busy inodes during unmountJan Kara
Detaching of mark connector from fsnotify_put_mark() can race with unmounting of the filesystem like: CPU1 CPU2 fsnotify_put_mark() spin_lock(&conn->lock); ... inode = fsnotify_detach_connector_from_object(conn) spin_unlock(&conn->lock); generic_shutdown_super() fsnotify_unmount_inodes() sees connector detached for inode -> nothing to do evict_inode() barfs on pending inode reference iput(inode); Resulting in "Busy inodes after unmount" message and possible kernel oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode references from detached connectors. Note that the accounting of outstanding inode references in the superblock can cause some cacheline contention on the counter. OTOH it happens only during deletion of the last notification mark from an inode (or during unlinking of watched inode) and that is not too bad. I have measured time to create & delete inotify watch 100000 times from 64 processes in parallel (each process having its own inotify group and its own file on a shared superblock) on a 64 CPU machine. Average and standard deviation of 15 runs look like: Avg Stddev Vanilla 9.817400 0.276165 Fixed 9.710467 0.228294 So there's no statistically significant difference. Fixes: 6b3f05d24d35 ("fsnotify: Detach mark from object list when last reference is dropped") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-08fanotify: support reporting thread id instead of process idAmir Goldstein
In order to identify which thread triggered the event in a multi-threaded program, add the FAN_REPORT_TID flag in fanotify_init to opt-in for reporting the event creator's thread id information. Signed-off-by: nixiaoming <nixiaoming@huawei.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-04fanotify: add BUILD_BUG_ON() to count the bits of fanotify constantsAmir Goldstein
Also define the FANOTIFY_EVENT_FLAGS consisting of the extra flags FAN_ONDIR and FAN_ON_CHILD. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-04fsnotify: convert runtime BUG_ON() to BUILD_BUG_ON()Amir Goldstein
The BUG_ON() statements to verify number of bits in ALL_FSNOTIFY_BITS and ALL_INOTIFY_BITS are converted to build time check of the constant. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-04fanotify: deprecate uapi FAN_ALL_* constantsAmir Goldstein
We do not want to add new bits to the FAN_ALL_* uapi constants because they have been exposed to userspace. If there are programs out there using these constants, those programs could break if re-compiled with modified FAN_ALL_* constants and run on an old kernel. We deprecate the uapi constants FAN_ALL_* and define new FANOTIFY_* constants for internal use to replace them. New feature bits will be added only to the new constants. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-04fanotify: simplify handling of FAN_ONDIRAmir Goldstein
fanotify mark add/remove code jumps through hoops to avoid setting the FS_ISDIR in the commulative object mask. That was just papering over a bug in fsnotify() handling of the FS_ISDIR extra flag. This bug is now fixed, so all the hoops can be removed along with the unneeded internal flag FAN_MARK_ONDIR. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-04fsnotify: generalize handling of extra event flagsAmir Goldstein
FS_EVENT_ON_CHILD gets a special treatment in fsnotify() because it is not a flag specifying an event type, but rather an extra flags that may be reported along with another event and control the handling of the event by the backend. FS_ISDIR is also an "extra flag" and not an "event type" and therefore desrves the same treatment. With inotify/dnotify backends it was never possible to set FS_ISDIR in mark masks, so it did not matter. With fanotify backend, mark adding code jumps through hoops to avoid setting the FS_ISDIR in the commulative object mask. Separate the constant ALL_FSNOTIFY_EVENTS to ALL_FSNOTIFY_FLAGS and ALL_FSNOTIFY_EVENTS, so the latter can be used to test for specific event types. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-27fanotify: store fanotify_init() flags in group's fanotify_dataAmir Goldstein
This averts the need to re-generate flags in fanotify_show_fdinfo() and sets the scene for addition of more upcoming flags without growing new members to the fanotify_data struct. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03fanotify: add API to attach/detach super block markAmir Goldstein
Add another mark type flag FAN_MARK_FILESYSTEM for add/remove/flush of super block mark type. A super block watch gets all events on the filesystem, regardless of the mount from which the mark was added, unless an ignore mask exists on either the inode or the mount where the event was generated. Only one of FAN_MARK_MOUNT and FAN_MARK_FILESYSTEM mark type flags may be provided to fanotify_mark() or no mark type flag for inode mark. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03fsnotify: send path type events to group with super block marksAmir Goldstein
Send events to group if super block mark mask matches the event and unless the same group has an ignore mask on the vfsmount or the inode on which the event occurred. Soon, fanotify backend is going to support super block marks and fanotify backend only supports path type events. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03fsnotify: add super block object typeAmir Goldstein
Add the infrastructure to attach a mark to a super_block struct and detach all attached marks when super block is destroyed. This is going to be used by fanotify backend to setup super block marks. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03fsnotify: fix ignore mask logic in fsnotify()Amir Goldstein
Commit 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group()") acknoledges the use case of ignoring an event on an inode mark, because of an ignore mask on a mount mark of the same group (i.e. I want to get all events on this file, except for the events that came from that mount). This change depends on correctly merging the inode marks and mount marks group lists, so that the mount mark ignore mask would be tested in send_to_group(). Alas, the merging of the lists did not take into account the case where event in question is not in the mask of any of the mount marks. To fix this, completely remove the tests for inode and mount event masks from the lists merging code. Fixes: 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-08-29Merge tag 'for_v4.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc fs fixes from Jan Kara: - make UDF to properly mount media created by Win7 - make isofs to properly refuse devices with large physical block size - fix a Spectre gadget in quotactl(2) - fix a warning in fsnotify code hit by syzkaller * tag 'for_v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix mounting of Win7 created UDF filesystems udf: Remove dead code from udf_find_fileset() fs/quota: Fix spectre gadget in do_quotactl fs/quota: Replace XQM_MAXQUOTAS usage with MAXQUOTAS isofs: reject hardware sector size > 2048 bytes fsnotify: fix false positive warning on inode delete
2018-08-21Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ...
2018-08-20fsnotify: fix false positive warning on inode deleteJan Kara
When inode is getting deleted and someone else holds reference to a mark attached to the inode, we just detach the connector from the inode. In that case fsnotify_put_mark() called from fsnotify_destroy_marks() will decide to recalculate mask for the inode and __fsnotify_recalc_mask() will WARN about invalid connector type: WARNING: CPU: 1 PID: 12015 at fs/notify/mark.c:139 __fsnotify_recalc_mask+0x2d7/0x350 fs/notify/mark.c:139 Actually there's no reason to warn about detached connector in __fsnotify_recalc_mask() so just silently skip updating the mask in such case. Reported-by: syzbot+c34692a51b9a6ca93540@syzkaller.appspotmail.com Fixes: 3ac70bfcde81 ("fsnotify: add helper to get mask from connector") Signed-off-by: Jan Kara <jack@suse.cz>
2018-08-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc things - a few Y2038 fixes - ntfs fixes - arch/sh tweaks - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) mm/hmm.c: remove unused variables align_start and align_end fs/userfaultfd.c: remove redundant pointer uwq mm, vmacache: hash addresses based on pmd mm/list_lru: introduce list_lru_shrink_walk_irq() mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one() mm/list_lru.c: move locking from __list_lru_walk_one() to its caller mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node() mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP mm/sparse: delete old sparse_init and enable new one mm/sparse: add new sparse_init_nid() and sparse_init() mm/sparse: move buffer init/fini to the common place mm/sparse: use the new sparse buffer functions in non-vmemmap mm/sparse: abstract sparse buffer allocations mm/hugetlb.c: don't zero 1GiB bootmem pages mm, page_alloc: double zone's batchsize mm/oom_kill.c: document oom_lock mm/hugetlb: remove gigantic page support for HIGHMEM mm, oom: remove sleep from under oom_lock kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous() mm/cma: remove unsupported gfp_mask parameter from cma_alloc() ...
2018-08-17fs: fsnotify: account fsnotify metadata to kmemcgShakeel Butt
Patch series "Directed kmem charging", v8. The Linux kernel's memory cgroup allows limiting the memory usage of the jobs running on the system to provide isolation between the jobs. All the kernel memory allocated in the context of the job and marked with __GFP_ACCOUNT will also be included in the memory usage and be limited by the job's limit. The kernel memory can only be charged to the memcg of the process in whose context kernel memory was allocated. However there are cases where the allocated kernel memory should be charged to the memcg different from the current processes's memcg. This patch series contains two such concrete use-cases i.e. fsnotify and buffer_head. The fsnotify event objects can consume a lot of system memory for large or unlimited queues if there is either no or slow listener. The events are allocated in the context of the event producer. However they should be charged to the event consumer. Similarly the buffer_head objects can be allocated in a memcg different from the memcg of the page for which buffer_head objects are being allocated. To solve this issue, this patch series introduces mechanism to charge kernel memory to a given memcg. In case of fsnotify events, the memcg of the consumer can be used for charging and for buffer_head, the memcg of the page can be charged. For directed charging, the caller can use the scope API memalloc_[un]use_memcg() to specify the memcg to charge for all the __GFP_ACCOUNT allocations within the scope. This patch (of 2): A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it] Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21signal: Use PIDTYPE_TGID to clearly store where file signals will be sentEric W. Biederman
When f_setown is called a pid and a pid type are stored. Replace the use of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread group. Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now is only for a thread. Update the users of __f_setown to use PIDTYPE_TGID instead of PIDTYPE_PID. For now the code continues to capture task_pid (when task_tgid would really be appropriate), and iterate on PIDTYPE_PID (even when type == PIDTYPE_TGID) out of an abundance of caution to preserve existing behavior. Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID for tgid lookup also be used to avoid taking the tasklist lock. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21pids: Compute task_tgid using signal->leader_pidEric W. Biederman
The cost is the the same and this removes the need to worry about complications that come from de_thread and group_leader changing. __task_pid_nr_ns has been updated to take advantage of this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-06-27inotify: Add flag IN_MASK_CREATE for inotify_add_watch()Henry Wilson
The flag IN_MASK_CREATE is introduced as a flag for inotiy_add_watch() which prevents inotify from modifying any existing watches when invoked. If the pathname specified in the call has a watched inode associated with it and IN_MASK_CREATE is specified, fail with an errno of EEXIST. Use of IN_MASK_CREATE with IN_MASK_ADD is reserved for future use and will return EINVAL. RATIONALE In the current implementation, there is no way to prevent inotify_add_watch() from modifying existing watch descriptors. Even if the caller keeps a record of all watch descriptors collected, this is only sufficient to detect that an existing watch descriptor may have been modified. The assumption that a particular path will map to the same inode over multiple calls to inotify_add_watch() cannot be made as files can be renamed or deleted. It is also not possible to assume that two distinct paths do no map to the same inode, due to hard-links or a dereferenced symbolic link. Further uses of inotify_add_watch() to revert the change may cause other watch descriptors to be modified or created, merely compunding the problem. There is currently no system call such as inotify_modify_watch() to explicity modify a watch descriptor, which would be able to revert unwanted changes. Thus the caller cannot guarantee to be able to revert any changes to existing watch decriptors. Additionally the caller cannot assume that the events that are associated with a watch descriptor are within the set requested, as any future calls to inotify_add_watch() may unintentionally modify a watch descriptor's mask. Thus it cannot currently be guaranteed that a watch descriptor will only generate events which have been requested. The program must filter events which come through its watch descriptor to within its expected range. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Henry Wilson <henry.wilson@acentic.com> Signed-off-by: Jan Kara <jack@suse.cz>