summaryrefslogtreecommitdiff
path: root/security
AgeCommit message (Collapse)Author
2017-09-24Merge branch 'next-general' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull misc security layer update from James Morris: "This is the remaining 'general' change in the security tree for v4.14, following the direct merging of SELinux (+ TOMOYO), AppArmor, and seccomp. That's everything now for the security tree except IMA, which will follow shortly (I've been traveling for the past week with patchy internet)" * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: fix description of values returned by cap_inode_need_killpriv
2017-09-23security: fix description of values returned by cap_inode_need_killprivStefan Berger
cap_inode_need_killpriv returns 1 if security.capability exists and has a value and inode_killpriv() is required, 0 otherwise. Fix the description of the return value to reflect this. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-09-23Merge tag 'apparmor-pr-2017-09-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor updates from John Johansen: "This is the apparmor pull request, similar to SELinux and seccomp. It's the same series that I was sent to James' security tree + one regression fix that was found after the series was sent to James and would have been sent for v4.14-rc2. Features: - in preparation for secid mapping add support for absolute root view based labels - add base infastructure for socket mediation - add mount mediation - add signal mediation minor cleanups and changes: - be defensive, ensure unconfined profiles have dfas initialized - add more debug asserts to apparmorfs - enable policy unpacking to audit different reasons for failure - cleanup conditional check for label in label_print - Redundant condition: prev_ns. in [label.c:1498] Bug Fixes: - fix regression in apparmorfs DAC access permissions - fix build failure on sparc caused by undeclared signals - fix sparse report of incorrect type assignment when freeing label proxies - fix race condition in null profile creation - Fix an error code in aafs_create() - Fix logical error in verify_header() - Fix shadowed local variable in unpack_trans_table()" * tag 'apparmor-pr-2017-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: fix apparmorfs DAC access permissions apparmor: fix build failure on sparc caused by undeclared signals apparmor: fix incorrect type assignment when freeing proxies apparmor: ensure unconfined profiles have dfas initialized apparmor: fix race condition in null profile creation apparmor: move new_null_profile to after profile lookup fns() apparmor: add base infastructure for socket mediation apparmor: add more debug asserts to apparmorfs apparmor: make policy_unpack able to audit different info messages apparmor: add support for absolute root view based labels apparmor: cleanup conditional check for label in label_print apparmor: add mount mediation apparmor: add the ability to mediate signals apparmor: Redundant condition: prev_ns. in [label.c:1498] apparmor: Fix an error code in aafs_create() apparmor: Fix logical error in verify_header() apparmor: Fix shadowed local variable in unpack_trans_table()
2017-09-22apparmor: fix apparmorfs DAC access permissionsJohn Johansen
The DAC access permissions for several apparmorfs files are wrong. .access - needs to be writable by all tasks to perform queries the others in the set only provide a read fn so should be read only. With policy namespace virtualization all apparmor needs to control the permission and visibility checks directly which means DAC access has to be allowed for all user, group, and other. BugLink: http://bugs.launchpad.net/bugs/1713103 Fixes: c97204baf840b ("apparmor: rename apparmor file fns and data to indicate use") Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: fix build failure on sparc caused by undeclared signalsJohn Johansen
In file included from security/apparmor/ipc.c:23:0: security/apparmor/include/sig_names.h:26:3: error: 'SIGSTKFLT' undeclared here (not in a function) [SIGSTKFLT] = 16, /* -, 16, - */ ^ security/apparmor/include/sig_names.h:26:3: error: array index in initializer not of integer type security/apparmor/include/sig_names.h:26:3: note: (near initialization for 'sig_map') security/apparmor/include/sig_names.h:51:3: error: 'SIGUNUSED' undeclared here (not in a function) [SIGUNUSED] = 34, /* -, 31, - */ ^ security/apparmor/include/sig_names.h:51:3: error: array index in initializer not of integer type security/apparmor/include/sig_names.h:51:3: note: (near initialization for 'sig_map') Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: c6bf1adaecaa ("apparmor: add the ability to mediate signals") Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: fix incorrect type assignment when freeing proxiesJohn Johansen
sparse reports poisoning the proxy->label before freeing the struct is resulting in a sparse build warning. ../security/apparmor/label.c:52:30: warning: incorrect type in assignment (different address spaces) ../security/apparmor/label.c:52:30: expected struct aa_label [noderef] <asn:4>*label ../security/apparmor/label.c:52:30: got struct aa_label *<noident> fix with RCU_INIT_POINTER as this is one of those cases where rcu_assign_pointer() is not needed. Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: ensure unconfined profiles have dfas initializedJohn Johansen
Generally unconfined has early bailout tests and does not need the dfas initialized, however if an early bailout test is ever missed it will result in an oops. Be defensive and initialize the unconfined profile to have null dfas (no permission) so if an early bailout test is missed we fail closed (no perms granted) instead of oopsing. Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: fix race condition in null profile creationJohn Johansen
There is a race when null- profile is being created between the initial lookup/creation of the profile and lock/addition of the profile. This could result in multiple version of a profile being added to the list which need to be removed/replaced. Since these are learning profile their is no affect on mediation. Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: move new_null_profile to after profile lookup fns()John Johansen
new_null_profile will need to use some of the profile lookup fns() so move instead of doing forward fn declarations. Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: add base infastructure for socket mediationJohn Johansen
Provide a basic mediation of sockets. This is not a full net mediation but just whether a spcific family of socket can be used by an application, along with setting up some basic infrastructure for network mediation to follow. the user space rule hav the basic form of NETWORK RULE = [ QUALIFIERS ] 'network' [ DOMAIN ] [ TYPE | PROTOCOL ] DOMAIN = ( 'inet' | 'ax25' | 'ipx' | 'appletalk' | 'netrom' | 'bridge' | 'atmpvc' | 'x25' | 'inet6' | 'rose' | 'netbeui' | 'security' | 'key' | 'packet' | 'ash' | 'econet' | 'atmsvc' | 'sna' | 'irda' | 'pppox' | 'wanpipe' | 'bluetooth' | 'netlink' | 'unix' | 'rds' | 'llc' | 'can' | 'tipc' | 'iucv' | 'rxrpc' | 'isdn' | 'phonet' | 'ieee802154' | 'caif' | 'alg' | 'nfc' | 'vsock' | 'mpls' | 'ib' | 'kcm' ) ',' TYPE = ( 'stream' | 'dgram' | 'seqpacket' | 'rdm' | 'raw' | 'packet' ) PROTOCOL = ( 'tcp' | 'udp' | 'icmp' ) eg. network, network inet, Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2017-09-22apparmor: add more debug asserts to apparmorfsJohn Johansen
Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2017-09-22apparmor: make policy_unpack able to audit different info messagesJohn Johansen
Switch unpack auditing to using the generic name field in the audit struct and make it so we can start adding new info messages about why an unpack failed. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2017-09-22apparmor: add support for absolute root view based labelsJohn Johansen
With apparmor policy virtualization based on policy namespace View's we don't generally want/need absolute root based views, however there are cases like debugging and some secid based conversions where using a root based view is important. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2017-09-22apparmor: cleanup conditional check for label in label_printJohn Johansen
Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2017-09-22apparmor: add mount mediationJohn Johansen
Add basic mount mediation. That allows controlling based on basic mount parameters. It does not include special mount parameters for apparmor, super block labeling, or any triggers for apparmor namespace parameter modifications on pivot root. default userspace policy rules have the form of MOUNT RULE = ( MOUNT | REMOUNT | UMOUNT ) MOUNT = [ QUALIFIERS ] 'mount' [ MOUNT CONDITIONS ] [ SOURCE FILEGLOB ] [ '->' MOUNTPOINT FILEGLOB ] REMOUNT = [ QUALIFIERS ] 'remount' [ MOUNT CONDITIONS ] MOUNTPOINT FILEGLOB UMOUNT = [ QUALIFIERS ] 'umount' [ MOUNT CONDITIONS ] MOUNTPOINT FILEGLOB MOUNT CONDITIONS = [ ( 'fstype' | 'vfstype' ) ( '=' | 'in' ) MOUNT FSTYPE EXPRESSION ] [ 'options' ( '=' | 'in' ) MOUNT FLAGS EXPRESSION ] MOUNT FSTYPE EXPRESSION = ( MOUNT FSTYPE LIST | MOUNT EXPRESSION ) MOUNT FSTYPE LIST = Comma separated list of valid filesystem and virtual filesystem types (eg ext4, debugfs, etc) MOUNT FLAGS EXPRESSION = ( MOUNT FLAGS LIST | MOUNT EXPRESSION ) MOUNT FLAGS LIST = Comma separated list of MOUNT FLAGS. MOUNT FLAGS = ( 'ro' | 'rw' | 'nosuid' | 'suid' | 'nodev' | 'dev' | 'noexec' | 'exec' | 'sync' | 'async' | 'remount' | 'mand' | 'nomand' | 'dirsync' | 'noatime' | 'atime' | 'nodiratime' | 'diratime' | 'bind' | 'rbind' | 'move' | 'verbose' | 'silent' | 'loud' | 'acl' | 'noacl' | 'unbindable' | 'runbindable' | 'private' | 'rprivate' | 'slave' | 'rslave' | 'shared' | 'rshared' | 'relatime' | 'norelatime' | 'iversion' | 'noiversion' | 'strictatime' | 'nouser' | 'user' ) MOUNT EXPRESSION = ( ALPHANUMERIC | AARE ) ... PIVOT ROOT RULE = [ QUALIFIERS ] pivot_root [ oldroot=OLD PUT FILEGLOB ] [ NEW ROOT FILEGLOB ] SOURCE FILEGLOB = FILEGLOB MOUNTPOINT FILEGLOB = FILEGLOB eg. mount, mount /dev/foo, mount options=ro /dev/foo -> /mnt/, mount options in (ro,atime) /dev/foo -> /mnt/, mount options=ro options=atime, Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2017-09-22apparmor: add the ability to mediate signalsJohn Johansen
Add signal mediation where the signal can be mediated based on the signal, direction, or the label or the peer/target. The signal perms are verified on a cross check to ensure policy consistency in the case of incremental policy load/replacement. The optimization of skipping the cross check when policy is guaranteed to be consistent (single compile unit) remains to be done. policy rules have the form of SIGNAL_RULE = [ QUALIFIERS ] 'signal' [ SIGNAL ACCESS PERMISSIONS ] [ SIGNAL SET ] [ SIGNAL PEER ] SIGNAL ACCESS PERMISSIONS = SIGNAL ACCESS | SIGNAL ACCESS LIST SIGNAL ACCESS LIST = '(' Comma or space separated list of SIGNAL ACCESS ')' SIGNAL ACCESS = ( 'r' | 'w' | 'rw' | 'read' | 'write' | 'send' | 'receive' ) SIGNAL SET = 'set' '=' '(' SIGNAL LIST ')' SIGNAL LIST = Comma or space separated list of SIGNALS SIGNALS = ( 'hup' | 'int' | 'quit' | 'ill' | 'trap' | 'abrt' | 'bus' | 'fpe' | 'kill' | 'usr1' | 'segv' | 'usr2' | 'pipe' | 'alrm' | 'term' | 'stkflt' | 'chld' | 'cont' | 'stop' | 'stp' | 'ttin' | 'ttou' | 'urg' | 'xcpu' | 'xfsz' | 'vtalrm' | 'prof' | 'winch' | 'io' | 'pwr' | 'sys' | 'emt' | 'exists' | 'rtmin+0' ... 'rtmin+32' ) SIGNAL PEER = 'peer' '=' AARE eg. signal, # allow all signals signal send set=(hup, kill) peer=foo, Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2017-09-22apparmor: Redundant condition: prev_ns. in [label.c:1498]John Johansen
Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: Fix an error code in aafs_create()Dan Carpenter
We accidentally forgot to set the error code on this path. It means we return NULL instead of an error pointer. I looked through a bunch of callers and I don't think it really causes a big issue, but the documentation says we're supposed to return error pointers here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: Fix logical error in verify_header()Christos Gkekas
verify_header() is currently checking whether interface version is less than 5 *and* greater than 7, which always evaluates to false. Instead it should check whether it is less than 5 *or* greater than 7. Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-22apparmor: Fix shadowed local variable in unpack_trans_table()Geert Uytterhoeven
with W=2: security/apparmor/policy_unpack.c: In function ‘unpack_trans_table’: security/apparmor/policy_unpack.c:469: warning: declaration of ‘pos’ shadows a previous local security/apparmor/policy_unpack.c:451: warning: shadowed declaration is here Rename the old "pos" to "saved_pos" to fix this. Fixes: 5379a3312024a8be ("apparmor: support v7 transition format compatible with label_parse") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-09-20selinux: Use kmem_cache for hashtab_nodeKyeongdon Kim
During random test as own device to check slub account, we found some slack memory from hashtab_node(kmalloc-64). By using kzalloc(), middle of test result like below: allocated size 240768 request size 45144 slack size 195624 allocation count 3762 So, we want to use kmem_cache_zalloc() and that can reduce memory size 52byte(slack size/alloc count) per each struct. Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-09-14Merge branch 'work.set_fs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more set_fs removal from Al Viro: "Christoph's 'use kernel_read and friends rather than open-coding set_fs()' series" * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: unexport vfs_readv and vfs_writev fs: unexport vfs_read and vfs_write fs: unexport __vfs_read/__vfs_write lustre: switch to kernel_write gadget/f_mass_storage: stop messing with the address limit mconsole: switch to kernel_read btrfs: switch write_buf to kernel_write net/9p: switch p9_fd_read to kernel_write mm/nommu: switch do_mmap_private to kernel_read serial2002: switch serial2002_tty_write to kernel_{read/write} fs: make the buf argument to __kernel_write a void pointer fs: fix kernel_write prototype fs: fix kernel_read prototype fs: move kernel_read to fs/read_write.c fs: move kernel_write to fs/read_write.c autofs4: switch autofs4_write to __kernel_write ashmem: switch to ->read_iter
2017-09-12Merge tag 'selinux-pr-20170831' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "A relatively quiet period for SELinux, 11 patches with only two/three having any substantive changes. These noteworthy changes include another tweak to the NNP/nosuid handling, per-file labeling for cgroups, and an object class fix for AF_UNIX/SOCK_RAW sockets; the rest of the changes are minor tweaks or administrative updates (Stephen's email update explains the file explosion in the diffstat). Everything passes the selinux-testsuite" [ Also a couple of small patches from the security tree from Tetsuo Handa for Tomoyo and LSM cleanup. The separation of security policy updates wasn't all that clean - Linus ] * tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: constify nf_hook_ops selinux: allow per-file labeling for cgroupfs lsm_audit: update my email address selinux: update my email address MAINTAINERS: update the NetLabel and Labeled Networking information selinux: use GFP_NOWAIT in the AVC kmem_caches selinux: Generalize support for NNP/nosuid SELinux domain transitions selinux: genheaders should fail if too many permissions are defined selinux: update the selinux info in MAINTAINERS credits: update Paul Moore's info selinux: Assign proper class to PF_UNIX/SOCK_RAW sockets tomoyo: Update URLs in Documentation/admin-guide/LSM/tomoyo.rst LSM: Remove security_task_create() hook.
2017-09-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "Life has been busy and I have not gotten half as much done this round as I would have liked. I delayed it so that a minor conflict resolution with the mips tree could spend a little time in linux-next before I sent this pull request. This includes two long delayed user namespace changes from Kirill Tkhai. It also includes a very useful change from Serge Hallyn that allows the security capability attribute to be used inside of user namespaces. The practical effect of this is people can now untar tarballs and install rpms in user namespaces. It had been suggested to generalize this and encode some of the namespace information information in the xattr name. Upon close inspection that makes the things that should be hard easy and the things that should be easy more expensive. Then there is my bugfix/cleanup for signal injection that removes the magic encoding of the siginfo union member from the kernel internal si_code. The mips folks reported the case where I had used FPE_FIXME me is impossible so I have remove FPE_FIXME from mips, while at the same time including a return statement in that case to keep gcc from complaining about unitialized variables. I almost finished the work to get make copy_siginfo_to_user a trivial copy to user. The code is available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3 But I did not have time/energy to get the code posted and reviewed before the merge window opened. I was able to see that the security excuse for just copying fields that we know are initialized doesn't work in practice there are buggy initializations that don't initialize the proper fields in siginfo. So we still sometimes copy unitialized data to userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Introduce v3 namespaced file capabilities mips/signal: In force_fcr31_sig return in the impossible case signal: Remove kernel interal si_code magic fcntl: Don't use ambiguous SIG_POLL si_codes prctl: Allow local CAP_SYS_ADMIN changing exe_file security: Use user_namespace::level to avoid redundant iterations in cap_capable() userns,pidns: Verify the userns for new pid namespaces signal/testing: Don't look for __SI_FAULT in userspace signal/mips: Document a conflict with SI_USER with SIGFPE signal/sparc: Document a conflict with SI_USER with SIGFPE signal/ia64: Document a conflict with SI_USER with SIGFPE signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-07Merge tag 'audit-pr-20170907' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "A small pull request for audit this time, only four patches and only two with any real code changes. Those two changes are the removal of a pointless SELinux AVC initialization audit event and a fix to improve the audit timestamp overhead. The other two patches are comment cleanup and administrative updates, nothing very exciting. Everything passes our tests" * tag 'audit-pr-20170907' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: update the function comments selinux: remove AVC init audit log message audit: update the audit info in MAINTAINERS audit: Reduce overhead using a coarse clock
2017-09-07Merge tag 'secureexec-v4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull secureexec update from Kees Cook: "This series has the ultimate goal of providing a sane stack rlimit when running set*id processes. To do this, the bprm_secureexec LSM hook is collapsed into the bprm_set_creds hook so the secureexec-ness of an exec can be determined early enough to make decisions about rlimits and the resulting memory layouts. Other logic acting on the secureexec-ness of an exec is similarly consolidated. Capabilities needed some special handling, but the refactoring removed other special handling, so that was a wash" * tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Consolidate pdeath_signal clearing exec: Use sane stack rlimit under secureexec exec: Consolidate dumpability logic smack: Remove redundant pdeath_signal clearing exec: Use secureexec for clearing pdeath_signal exec: Use secureexec for setting dumpability LSM: drop bprm_secureexec hook commoncap: Move cap_elevated calculation into bprm_set_creds commoncap: Refactor to remove bprm_secureexec hook smack: Refactor to remove bprm_secureexec hook selinux: Refactor to remove bprm_secureexec hook apparmor: Refactor to remove bprm_secureexec hook binfmt: Introduce secureexec flag exec: Correct comments about "point of no return" exec: Rename bprm->cred_prepared to called_set_creds
2017-09-05selinux: remove AVC init audit log messageRichard Guy Briggs
In the process of normalizing audit log messages, it was noticed that the AVC initialization code registered an audit log KERNEL record that didn't fit the standard format. In the process of attempting to normalize it it was determined that this record was not even necessary. Remove it. Ref: http://marc.info/?l=selinux&m=149614868525826&w=2 See: https://github.com/linux-audit/audit-kernel/issues/48 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-09-04fs: fix kernel_write prototypeChristoph Hellwig
Make the position an in/out argument like all the other read/write helpers and and make the buf argument a void pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-04fs: fix kernel_read prototypeChristoph Hellwig
Use proper ssize_t and size_t types for the return value and count argument, move the offset last and make it an in/out argument like all other read/write helpers, and make the buf argument a void pointer to get rid of lots of casts in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-01Introduce v3 namespaced file capabilitiesSerge E. Hallyn
Root in a non-initial user ns cannot be trusted to write a traditional security.capability xattr. If it were allowed to do so, then any unprivileged user on the host could map his own uid to root in a private namespace, write the xattr, and execute the file with privilege on the host. However supporting file capabilities in a user namespace is very desirable. Not doing so means that any programs designed to run with limited privilege must continue to support other methods of gaining and dropping privilege. For instance a program installer must detect whether file capabilities can be assigned, and assign them if so but set setuid-root otherwise. The program in turn must know how to drop partial capabilities, and do so only if setuid-root. This patch introduces v3 of the security.capability xattr. It builds a vfs_ns_cap_data struct by appending a uid_t rootid to struct vfs_cap_data. This is the absolute uid_t (that is, the uid_t in user namespace which mounted the filesystem, usually init_user_ns) of the root id in whose namespaces the file capabilities may take effect. When a task asks to write a v2 security.capability xattr, if it is privileged with respect to the userns which mounted the filesystem, then nothing should change. Otherwise, the kernel will transparently rewrite the xattr as a v3 with the appropriate rootid. This is done during the execution of setxattr() to catch user-space-initiated capability writes. Subsequently, any task executing the file which has the noted kuid as its root uid, or which is in a descendent user_ns of such a user_ns, will run the file with capabilities. Similarly when asking to read file capabilities, a v3 capability will be presented as v2 if it applies to the caller's namespace. If a task writes a v3 security.capability, then it can provide a uid for the xattr so long as the uid is valid in its own user namespace, and it is privileged with CAP_SETFCAP over its namespace. The kernel will translate that rootid to an absolute uid, and write that to disk. After this, a task in the writer's namespace will not be able to use those capabilities (unless rootid was 0), but a task in a namespace where the given uid is root will. Only a single security.capability xattr may exist at a time for a given file. A task may overwrite an existing xattr so long as it is privileged over the inode. Note this is a departure from previous semantics, which required privilege to remove a security.capability xattr. This check can be re-added if deemed useful. This allows a simple setxattr to work, allows tar/untar to work, and allows us to tar in one namespace and untar in another while preserving the capability, without risking leaking privilege into a parent namespace. Example using tar: $ cp /bin/sleep sleepx $ mkdir b1 b2 $ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1 $ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2 $ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx $ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar $ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx b2/sleepx = cap_sys_admin+ep # /opt/ltp/testcases/bin/getv3xattr b2/sleepx v3 xattr, rootid is 100001 A patch to linux-test-project adding a new set of tests for this functionality is in the nsfscaps branch at github.com/hallyn/ltp Changelog: Nov 02 2016: fix invalid check at refuse_fcap_overwrite() Nov 07 2016: convert rootid from and to fs user_ns (From ebiederm: mar 28 2017) commoncap.c: fix typos - s/v4/v3 get_vfs_caps_from_disk: clarify the fs_ns root access check nsfscaps: change the code split for cap_inode_setxattr() Apr 09 2017: don't return v3 cap for caps owned by current root. return a v2 cap for a true v2 cap in non-init ns Apr 18 2017: . Change the flow of fscap writing to support s_user_ns writing. . Remove refuse_fcap_overwrite(). The value of the previous xattr doesn't matter. Apr 24 2017: . incorporate Eric's incremental diff . move cap_convert_nscap to setxattr and simplify its usage May 8, 2017: . fix leaking dentry refcount in cap_inode_getsecurity Signed-off-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-08-28selinux: constify nf_hook_opsArvind Yadav
nf_hook_ops are not supposed to change at runtime. nf_register_net_hooks and nf_unregister_net_hooks are working with const nf_hook_ops. So mark the non-const nf_hook_ops structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-08-22selinux: allow per-file labeling for cgroupfsAntonio Murdaca
This patch allows genfscon per-file labeling for cgroupfs. For instance, this allows to label the "release_agent" file within each cgroup mount and limit writes to it. Signed-off-by: Antonio Murdaca <amurdaca@redhat.com> [PM: subject line and merge tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-08-17lsm_audit: update my email addressStephen Smalley
Update my email address since epoch.ncsc.mil no longer exists. MAINTAINERS and CREDITS are already correct. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-08-17selinux: update my email addressStephen Smalley
Update my email address since epoch.ncsc.mil no longer exists. MAINTAINERS and CREDITS are already correct. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-08-08selinux: use GFP_NOWAIT in the AVC kmem_cachesMichal Hocko
There is a strange __GFP_NOMEMALLOC usage pattern in SELinux, specifically GFP_ATOMIC | __GFP_NOMEMALLOC which doesn't make much sense. GFP_ATOMIC on its own allows to access memory reserves while __GFP_NOMEMALLOC dictates we cannot use memory reserves. Replace this with the much more sane GFP_NOWAIT in the AVC code as we can tolerate memory allocation failures in that code. Signed-off-by: Michal Hocko <mhocko@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-08-02selinux: Generalize support for NNP/nosuid SELinux domain transitionsStephen Smalley
As systemd ramps up enabling NNP (NoNewPrivileges) for system services, it is increasingly breaking SELinux domain transitions for those services and their descendants. systemd enables NNP not only for services whose unit files explicitly specify NoNewPrivileges=yes but also for services whose unit files specify any of the following options in combination with running without CAP_SYS_ADMIN (e.g. specifying User= or a CapabilityBoundingSet= without CAP_SYS_ADMIN): SystemCallFilter=, SystemCallArchitectures=, RestrictAddressFamilies=, RestrictNamespaces=, PrivateDevices=, ProtectKernelTunables=, ProtectKernelModules=, MemoryDenyWriteExecute=, or RestrictRealtime= as per the systemd.exec(5) man page. The end result is bad for the security of both SELinux-disabled and SELinux-enabled systems. Packagers have to turn off these options in the unit files to preserve SELinux domain transitions. For users who choose to disable SELinux, this means that they miss out on at least having the systemd-supported protections. For users who keep SELinux enabled, they may still be missing out on some protections because it isn't necessarily guaranteed that the SELinux policy for that service provides the same protections in all cases. commit 7b0d0b40cd78 ("selinux: Permit bounded transitions under NO_NEW_PRIVS or NOSUID.") allowed bounded transitions under NNP in order to support limited usage for sandboxing programs. However, defining typebounds for all of the affected service domains is impractical to implement in policy, since typebounds requires us to ensure that each domain is allowed everything all of its descendant domains are allowed, and this has to be repeated for the entire chain of domain transitions. There is no way to clone all allow rules from descendants to their ancestors in policy currently, and doing so would be undesirable even if it were practical, as it requires leaking permissions to objects and operations into ancestor domains that could weaken their own security in order to allow them to the descendants (e.g. if a descendant requires execmem permission, then so do all of its ancestors; if a descendant requires execute permission to a file, then so do all of its ancestors; if a descendant requires read to a symbolic link or temporary file, then so do all of its ancestors...). SELinux domains are intentionally not hierarchical / bounded in this manner normally, and making them so would undermine their protections and least privilege. We have long had a similar tension with SELinux transitions and nosuid mounts, albeit not as severe. Users often have had to choose between retaining nosuid on a mount and allowing SELinux domain transitions on files within those mounts. This likewise leads to unfortunate tradeoffs in security. Decouple NNP/nosuid from SELinux transitions, so that we don't have to make a choice between them. Introduce a nnp_nosuid_transition policy capability that enables transitions under NNP/nosuid to be based on a permission (nnp_transition for NNP; nosuid_transition for nosuid) between the old and new contexts in addition to the current support for bounded transitions. Domain transitions can then be allowed in policy without requiring the parent to be a strict superset of all of its children. With this change, systemd unit files can be left unmodified from upstream. SELinux-disabled and SELinux-enabled users will benefit from retaining any of the systemd-provided protections. SELinux policy will only need to be adapted to enable the new policy capability and to allow the new permissions between domain pairs as appropriate. NB: Allowing nnp_transition between two contexts opens up the potential for the old context to subvert the new context by installing seccomp filters before the execve. Allowing nosuid_transition between two contexts opens up the potential for a context transition to occur on a file from an untrusted filesystem (e.g. removable media or remote filesystem). Use with care. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-08-01smack: Remove redundant pdeath_signal clearingKees Cook
This removes the redundant pdeath_signal clearing in Smack: the check in smack_bprm_committing_creds() matches the check in smack_bprm_set_creds() (which used to be in the now-removed smack_bprm_securexec() hook) and since secureexec is now being checked for clearing pdeath_signal, this is redundant to the common exec code. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
2017-08-01LSM: drop bprm_secureexec hookKees Cook
This removes the bprm_secureexec hook since the logic has been folded into the bprm_set_creds hook for all LSMs now. Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01commoncap: Move cap_elevated calculation into bprm_set_credsKees Cook
Instead of a separate function, open-code the cap_elevated test, which lets us entirely remove bprm->cap_effective (to use the local "effective" variable instead), and more accurately examine euid/egid changes via the existing local "is_setid". The following LTP tests were run to validate the changes: # ./runltp -f syscalls -s cap # ./runltp -f securebits # ./runltp -f cap_bounds # ./runltp -f filecaps All kernel selftests for capabilities and exec continue to pass as well. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Andy Lutomirski <luto@kernel.org>
2017-08-01commoncap: Refactor to remove bprm_secureexec hookKees Cook
The commoncap implementation of the bprm_secureexec hook is the only LSM that depends on the final call to its bprm_set_creds hook (since it may be called for multiple files, it ignores bprm->called_set_creds). As a result, it cannot safely _clear_ bprm->secureexec since other LSMs may have set it. Instead, remove the bprm_secureexec hook by introducing a new flag to bprm specific to commoncap: cap_elevated. This is similar to cap_effective, but that is used for a specific subset of elevated privileges, and exists solely to track state from bprm_set_creds to bprm_secureexec. As such, it will be removed in the next patch. Here, set the new bprm->cap_elevated flag when setuid/setgid has happened from bprm_fill_uid() or fscapabilities have been prepared. This temporarily moves the bprm_secureexec hook to a static inline. The helper will be removed in the next patch; this makes the step easier to review and bisect, since this does not introduce any changes to inputs nor outputs to the "elevated privileges" calculation. The new flag is merged with the bprm->secureexec flag in setup_new_exec() since this marks the end of any further prepare_binprm() calls. Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Acked-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01smack: Refactor to remove bprm_secureexec hookKees Cook
The Smack bprm_secureexec hook can be merged with the bprm_set_creds hook since it's dealing with the same information, and all of the details are finalized during the first call to the bprm_set_creds hook via prepare_binprm() (subsequent calls due to binfmt_script, etc, are ignored via bprm->called_set_creds). Here, the test can just happen at the end of the bprm_set_creds hook, and the bprm_secureexec hook can be dropped. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
2017-08-01selinux: Refactor to remove bprm_secureexec hookKees Cook
The SELinux bprm_secureexec hook can be merged with the bprm_set_creds hook since it's dealing with the same information, and all of the details are finalized during the first call to the bprm_set_creds hook via prepare_binprm() (subsequent calls due to binfmt_script, etc, are ignored via bprm->called_set_creds). Here, the test can just happen at the end of the bprm_set_creds hook, and the bprm_secureexec hook can be dropped. Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Paul Moore <paul@paul-moore.com> Tested-by: Paul Moore <paul@paul-moore.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org>
2017-08-01apparmor: Refactor to remove bprm_secureexec hookKees Cook
The AppArmor bprm_secureexec hook can be merged with the bprm_set_creds hook since it's dealing with the same information, and all of the details are finalized during the first call to the bprm_set_creds hook via prepare_binprm() (subsequent calls due to binfmt_script, etc, are ignored via bprm->called_set_creds). Here, all the comments describe how secureexec is actually calculated during bprm_set_creds, so this actually does it, drops the bprm flag that was being used internally by AppArmor, and drops the bprm_secureexec hook. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Johansen <john.johansen@canonical.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01exec: Rename bprm->cred_prepared to called_set_credsKees Cook
The cred_prepared bprm flag has a misleading name. It has nothing to do with the bprm_prepare_cred hook, and actually tracks if bprm_set_creds has been called. Rename this flag and improve its comment. Cc: David Howells <dhowells@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Johansen <john.johansen@canonical.com> Acked-by: James Morris <james.l.morris@oracle.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Serge Hallyn <serge@hallyn.com>
2017-07-31netfilter: nf_hook_ops structs can be constFlorian Westphal
We no longer place these on a list so they can be const. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-25selinux: Assign proper class to PF_UNIX/SOCK_RAW socketsLuis Ressel
For PF_UNIX, SOCK_RAW is synonymous with SOCK_DGRAM (cf. net/unix/af_unix.c). This is a tad obscure, but libpcap uses it. Signed-off-by: Luis Ressel <aranea@aixah.de> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-07-25sync to Linus v4.13-rc2 for subsystem developers to work againstJames Morris
2017-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-07-20security: Use user_namespace::level to avoid redundant iterations in ↵Kirill Tkhai
cap_capable() When ns->level is not larger then cred->user_ns->level, then ns can't be cred->user_ns's descendant, and there is no a sense to search in parents. So, break the cycle earlier and skip needless iterations. v2: Change comment on suggested by Andy Lutomirski. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-07-19Merge tag 'gcc-plugins-v4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull structure randomization updates from Kees Cook: "Now that IPC and other changes have landed, enable manual markings for randstruct plugin, including the task_struct. This is the rest of what was staged in -next for the gcc-plugins, and comes in three patches, largest first: - mark "easy" structs with __randomize_layout - mark task_struct with an optional anonymous struct to isolate the __randomize_layout section - mark structs to opt _out_ of automated marking (which will come later) And, FWIW, this continues to pass allmodconfig (normal and patched to enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and s390 for me" * tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randstruct: opt-out externally exposed function pointer structs task_struct: Allow randomized layout randstruct: Mark various structs for randomization