Merge branch 'bpf-cgroup-bind-connect'

Andrey Ignatov says: ==================== v2->v3: - rebase due to conflicts - fix ipv6=m build v1->v2: - support expected_attach_type at prog load time so that prog (incl. context accesses and calls to helpers) can be validated with regard to specific attach point it is supposed to be attached to. Later, at attach time, attach type is checked so that it must be same as at load time if it was provided - reworked hooks to rely on expected_attach_type, and reduced number of new prog types from 6 to just 1: BPF_PROG_TYPE_CGROUP_SOCK_ADDR - reused BPF_PROG_TYPE_CGROUP_SOCK for sys_bind post-hooks - add selftests for post-sys_bind hook For our container management we've been using complicated and fragile setup consisting of LD_PRELOAD wrapper intercepting bind and connect calls from all containerized applications. Unfortunately it doesn't work for apps that don't use glibc and changing all applications that run in the datacenter is not possible due to 3rd party code and libraries (despite being open source code) and sheer amount of legacy code that has to be rewritten (we're rewriting what we can in parallel) These applications are written without containers in mind and have builtin assumptions about network services. Like an application X expects to connect localhost:special_port and find service Y in there. To move application X and service Y into two different containers LD_PRELOAD approach is used to help one service connect to another without rewriting them. Moving these two applications into different L2 (netns) or L3 (vrf) network isolation scopes doesn't help to solve the problem, since applications need to see each other like they were running on the host without containers. So if app X and app Y would run in different netns something would need to punch a connectivity hole in those namespaces. That would be real layering violation (with corresponding network debugging pains), since clean l2, l3 abstraction would suddenly support something that breaks through the layers. Instead we used LD_PRELOAD (and now bpf programs) at bind/connect time to help applications discover and connect to each other. All applications are running in init_nens and there are no vrfs. After bind/connect the normal fib/neighbor core networking logic works as it should always do and the whole system is clean from network point of view and can be debugged with standard tools. We also considered resurrecting Hannes's afnetns work, but all hierarchical namespace abstraction don't work due to these builtin networking assumptions inside the apps. To run an application inside cgroup container that was not written with containers in mind we have to make an illusion of running in non-containerized environment. In some cases we remember the port and container id in the post-bind hook in a bpf map and when some other task in a different container is trying to connect to a service we need to know where this service is running. It can be remote and can be local. Both client and service may or may not be written with containers in mind and this sockaddr rewrite is providing connectivity and load balancing feature. BPF+cgroup looks to be the best solution for this problem. Hence we introduce 3 hooks: - at entry into sys_bind and sys_connect to let bpf prog look and modify 'struct sockaddr' provided by user space and fail bind/connect when appropriate - post sys_bind after port is allocated The approach works great and has zero overhead for anyone who doesn't use it and very low overhead when deployed. Different use case for this feature is to do low overhead firewall that doesn't need to inspect all packets and works at bind/connect time. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
author: Daniel Borkmann <daniel@iogearbox.net> 2018-03-31 02:17:57 +0200
committer: Daniel Borkmann <daniel@iogearbox.net> 2018-03-31 02:18:07 +0200
commit: 7828f20e3779e4e85e55371e0e43f5006a15fb41 (patch)
tree: 48f4977b0b8e69bd6432b18556ad9ac7ca7728eb /tools
parent: 807ae7daf5fb9ba9ef688344ae7c0d8cbebd211c (diff)
parent: 1d436885b23bf4474617914d7eb15e039c83ed99 (diff)
12 files changed, 1425 insertions, 50 deletions
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 58060bec999d..9d07465023a2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -136,6 +136,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_DEVICE,
 	BPF_PROG_TYPE_SK_MSG,
 	BPF_PROG_TYPE_RAW_TRACEPOINT,
+	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 };
 
 enum bpf_attach_type {
@@ -147,6 +148,12 @@ enum bpf_attach_type {
 	BPF_SK_SKB_STREAM_VERDICT,
 	BPF_CGROUP_DEVICE,
 	BPF_SK_MSG_VERDICT,
+	BPF_CGROUP_INET4_BIND,
+	BPF_CGROUP_INET6_BIND,
+	BPF_CGROUP_INET4_CONNECT,
+	BPF_CGROUP_INET6_CONNECT,
+	BPF_CGROUP_INET4_POST_BIND,
+	BPF_CGROUP_INET6_POST_BIND,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -296,6 +303,11 @@ union bpf_attr {
 		__u32		prog_flags;
 		char		prog_name[BPF_OBJ_NAME_LEN];
 		__u32		prog_ifindex;	/* ifindex of netdev to prep for */
+		/* For some prog types expected attach type must be known at
+		 * load time to verify attach type specific parts of prog
+		 * (context accesses, allowed helpers, etc).
+		 */
+		__u32		expected_attach_type;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -736,6 +748,13 @@ union bpf_attr {
  *     @flags: reserved for future use
  *     Return: SK_PASS
  *
+ * int bpf_bind(ctx, addr, addr_len)
+ *     Bind socket to address. Only binding to IP is supported, no port can be
+ *     set in addr.
+ *     @ctx: pointer to context of type bpf_sock_addr
+ *     @addr: pointer to struct sockaddr to bind socket to
+ *     @addr_len: length of sockaddr structure
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -801,7 +820,8 @@ union bpf_attr {
 	FN(msg_redirect_map),		\
 	FN(msg_apply_bytes),		\
 	FN(msg_cork_bytes),		\
-	FN(msg_pull_data),
+	FN(msg_pull_data),		\
+	FN(bind),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -929,6 +949,15 @@ struct bpf_sock {
 	__u32 protocol;
 	__u32 mark;
 	__u32 priority;
+	__u32 src_ip4;		/* Allows 1,2,4-byte read.
+				 * Stored in network byte order.
+				 */
+	__u32 src_ip6[4];	/* Allows 1,2,4-byte read.
+				 * Stored in network byte order.
+				 */
+	__u32 src_port;		/* Allows 4-byte read.
+				 * Stored in host byte order
+				 */
 };
 
 #define XDP_PACKET_HEADROOM 256
@@ -1004,6 +1033,26 @@ struct bpf_map_info {
 	__u64 netns_ino;
 } __attribute__((aligned(8)));
 
+/* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
+ * by user and intended to be used by socket (e.g. to bind to, depends on
+ * attach attach type).
+ */
+struct bpf_sock_addr {
+	__u32 user_family;	/* Allows 4-byte read, but no write. */
+	__u32 user_ip4;		/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 user_ip6[4];	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 user_port;	/* Allows 4-byte read and write.
+				 * Stored in network byte order
+				 */
+	__u32 family;		/* Allows 4-byte read, but no write */
+	__u32 type;		/* Allows 4-byte read, but no write */
+	__u32 protocol;		/* Allows 4-byte read, but no write */
+};
+
 /* User bpf_sock_ops struct to access socket values and specify request ops
  * and their replies.
  * Some of this fields are in network (bigendian) byte order and may need
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index e0500055f1a6..acbb3f8b3bec 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -146,26 +146,30 @@ int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
 					  -1);
 }
 
-int bpf_load_program_name(enum bpf_prog_type type, const char *name,
-			  const struct bpf_insn *insns,
-			  size_t insns_cnt, const char *license,
-			  __u32 kern_version, char *log_buf,
-			  size_t log_buf_sz)
+int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
+			   char *log_buf, size_t log_buf_sz)
 {
-	int fd;
 	union bpf_attr attr;
-	__u32 name_len = name ? strlen(name) : 0;
+	__u32 name_len;
+	int fd;
+
+	if (!load_attr)
+		return -EINVAL;
+
+	name_len = load_attr->name ? strlen(load_attr->name) : 0;
 
 	bzero(&attr, sizeof(attr));
-	attr.prog_type = type;
-	attr.insn_cnt = (__u32)insns_cnt;
-	attr.insns = ptr_to_u64(insns);
-	attr.license = ptr_to_u64(license);
+	attr.prog_type = load_attr->prog_type;
+	attr.expected_attach_type = load_attr->expected_attach_type;
+	attr.insn_cnt = (__u32)load_attr->insns_cnt;
+	attr.insns = ptr_to_u64(load_attr->insns);
+	attr.license = ptr_to_u64(load_attr->license);
 	attr.log_buf = ptr_to_u64(NULL);
 	attr.log_size = 0;
 	attr.log_level = 0;
-	attr.kern_version = kern_version;
-	memcpy(attr.prog_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
+	attr.kern_version = load_attr->kern_version;
+	memcpy(attr.prog_name, load_attr->name,
+	       min(name_len, BPF_OBJ_NAME_LEN - 1));
 
 	fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 	if (fd >= 0 || !log_buf || !log_buf_sz)
@@ -184,8 +188,18 @@ int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		     __u32 kern_version, char *log_buf,
 		     size_t log_buf_sz)
 {
-	return bpf_load_program_name(type, NULL, insns, insns_cnt, license,
-				     kern_version, log_buf, log_buf_sz);
+	struct bpf_load_program_attr load_attr;
+
+	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
+	load_attr.prog_type = type;
+	load_attr.expected_attach_type = 0;
+	load_attr.name = NULL;
+	load_attr.insns = insns;
+	load_attr.insns_cnt = insns_cnt;
+	load_attr.license = license;
+	load_attr.kern_version = kern_version;
+
+	return bpf_load_program_xattr(&load_attr, log_buf, log_buf_sz);
 }
 
 int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index ee59342c6f42..39f6a0d64a3b 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -41,13 +41,20 @@ int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
 			  int key_size, int inner_map_fd, int max_entries,
 			  __u32 map_flags);
 
+struct bpf_load_program_attr {
+	enum bpf_prog_type prog_type;
+	enum bpf_attach_type expected_attach_type;
+	const char *name;
+	const struct bpf_insn *insns;
+	size_t insns_cnt;
+	const char *license;
+	__u32 kern_version;
+};
+
 /* Recommend log buffer size */
 #define BPF_LOG_BUF_SIZE (256 * 1024)
-int bpf_load_program_name(enum bpf_prog_type type, const char *name,
-			  const struct bpf_insn *insns,
-			  size_t insns_cnt, const char *license,
-			  __u32 kern_version, char *log_buf,
-			  size_t log_buf_sz);
+int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
+			   char *log_buf, size_t log_buf_sz);
 int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		     size_t insns_cnt, const char *license,
 		     __u32 kern_version, char *log_buf,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 64a8fc384186..5922443063f0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -203,6 +203,8 @@ struct bpf_program {
 	struct bpf_object *obj;
 	void *priv;
 	bpf_program_clear_priv_t clear_priv;
+
+	enum bpf_attach_type expected_attach_type;
 };
 
 struct bpf_map {
@@ -1162,21 +1164,31 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
 }
 
 static int
-load_program(enum bpf_prog_type type, const char *name, struct bpf_insn *insns,
-	     int insns_cnt, char *license, u32 kern_version, int *pfd)
+load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
+	     const char *name, struct bpf_insn *insns, int insns_cnt,
+	     char *license, u32 kern_version, int *pfd)
 {
-	int ret;
+	struct bpf_load_program_attr load_attr;
 	char *log_buf;
+	int ret;
 
-	if (!insns || !insns_cnt)
+	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
+	load_attr.prog_type = type;
+	load_attr.expected_attach_type = expected_attach_type;
+	load_attr.name = name;
+	load_attr.insns = insns;
+	load_attr.insns_cnt = insns_cnt;
+	load_attr.license = license;
+	load_attr.kern_version = kern_version;
+
+	if (!load_attr.insns || !load_attr.insns_cnt)
 		return -EINVAL;
 
 	log_buf = malloc(BPF_LOG_BUF_SIZE);
 	if (!log_buf)
 		pr_warning("Alloc log buffer for bpf loader error, continue without log\n");
 
-	ret = bpf_load_program_name(type, name, insns, insns_cnt, license,
-				    kern_version, log_buf, BPF_LOG_BUF_SIZE);
+	ret = bpf_load_program_xattr(&load_attr, log_buf, BPF_LOG_BUF_SIZE);
 
 	if (ret >= 0) {
 		*pfd = ret;
@@ -1192,18 +1204,18 @@ load_program(enum bpf_prog_type type, const char *name, struct bpf_insn *insns,
 		pr_warning("-- BEGIN DUMP LOG ---\n");
 		pr_warning("\n%s\n", log_buf);
 		pr_warning("-- END LOG --\n");
-	} else if (insns_cnt >= BPF_MAXINSNS) {
-		pr_warning("Program too large (%d insns), at most %d insns\n",
-			   insns_cnt, BPF_MAXINSNS);
+	} else if (load_attr.insns_cnt >= BPF_MAXINSNS) {
+		pr_warning("Program too large (%zu insns), at most %d insns\n",
+			   load_attr.insns_cnt, BPF_MAXINSNS);
 		ret = -LIBBPF_ERRNO__PROG2BIG;
 	} else {
 		/* Wrong program type? */
-		if (type != BPF_PROG_TYPE_KPROBE) {
+		if (load_attr.prog_type != BPF_PROG_TYPE_KPROBE) {
 			int fd;
 
-			fd = bpf_load_program_name(BPF_PROG_TYPE_KPROBE, name,
-						   insns, insns_cnt, license,
-						   kern_version, NULL, 0);
+			load_attr.prog_type = BPF_PROG_TYPE_KPROBE;
+			load_attr.expected_attach_type = 0;
+			fd = bpf_load_program_xattr(&load_attr, NULL, 0);
 			if (fd >= 0) {
 				close(fd);
 				ret = -LIBBPF_ERRNO__PROGTYPE;
@@ -1247,8 +1259,9 @@ bpf_program__load(struct bpf_program *prog,
 			pr_warning("Program '%s' is inconsistent: nr(%d) != 1\n",
 				   prog->section_name, prog->instances.nr);
 		}
-		err = load_program(prog->type, prog->name, prog->insns,
-				   prog->insns_cnt, license, kern_version, &fd);
+		err = load_program(prog->type, prog->expected_attach_type,
+				   prog->name, prog->insns, prog->insns_cnt,
+				   license, kern_version, &fd);
 		if (!err)
 			prog->instances.fds[0] = fd;
 		goto out;
@@ -1276,8 +1289,8 @@ bpf_program__load(struct bpf_program *prog,
 			continue;
 		}
 
-		err = load_program(prog->type, prog->name,
-				   result.new_insn_ptr,
+		err = load_program(prog->type, prog->expected_attach_type,
+				   prog->name, result.new_insn_ptr,
 				   result.new_insn_cnt,
 				   license, kern_version, &fd);
 
@@ -1835,11 +1848,25 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
-#define BPF_PROG_SEC(string, type) { string, sizeof(string) - 1, type }
+static void bpf_program__set_expected_attach_type(struct bpf_program *prog,
+						 enum bpf_attach_type type)
+{
+	prog->expected_attach_type = type;
+}
+
+#define BPF_PROG_SEC_FULL(string, ptype, atype) \
+	{ string, sizeof(string) - 1, ptype, atype }
+
+#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_FULL(string, ptype, 0)
+
+#define BPF_SA_PROG_SEC(string, ptype) \
+	BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, ptype)
+
 static const struct {
 	const char *sec;
 	size_t len;
 	enum bpf_prog_type prog_type;
+	enum bpf_attach_type expected_attach_type;
 } section_names[] = {
 	BPF_PROG_SEC("socket",		BPF_PROG_TYPE_SOCKET_FILTER),
 	BPF_PROG_SEC("kprobe/",		BPF_PROG_TYPE_KPROBE),
@@ -1858,10 +1885,17 @@ static const struct {
 	BPF_PROG_SEC("sockops",		BPF_PROG_TYPE_SOCK_OPS),
 	BPF_PROG_SEC("sk_skb",		BPF_PROG_TYPE_SK_SKB),
 	BPF_PROG_SEC("sk_msg",		BPF_PROG_TYPE_SK_MSG),
+	BPF_SA_PROG_SEC("cgroup/bind4",	BPF_CGROUP_INET4_BIND),
+	BPF_SA_PROG_SEC("cgroup/bind6",	BPF_CGROUP_INET6_BIND),
+	BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT),
+	BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT),
 };
+
 #undef BPF_PROG_SEC
+#undef BPF_PROG_SEC_FULL
+#undef BPF_SA_PROG_SEC
 
-static enum bpf_prog_type bpf_program__guess_type(struct bpf_program *prog)
+static int bpf_program__identify_section(struct bpf_program *prog)
 {
 	int i;
 
@@ -1871,13 +1905,13 @@ static enum bpf_prog_type bpf_program__guess_type(struct bpf_program *prog)
 	for (i = 0; i < ARRAY_SIZE(section_names); i++)
 		if (strncmp(prog->section_name, section_names[i].sec,
 			    section_names[i].len) == 0)
-			return section_names[i].prog_type;
+			return i;
 
 err:
 	pr_warning("failed to guess program type based on section name %s\n",
 		   prog->section_name);
 
-	return BPF_PROG_TYPE_UNSPEC;
+	return -1;
 }
 
 int bpf_map__fd(struct bpf_map *map)
@@ -1977,11 +2011,30 @@ long libbpf_get_error(const void *ptr)
 int bpf_prog_load(const char *file, enum bpf_prog_type type,
 		  struct bpf_object **pobj, int *prog_fd)
 {
+	struct bpf_prog_load_attr attr;
+
+	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
+	attr.file = file;
+	attr.prog_type = type;
+	attr.expected_attach_type = 0;
+
+	return bpf_prog_load_xattr(&attr, pobj, prog_fd);
+}
+
+int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
+			struct bpf_object **pobj, int *prog_fd)
+{
 	struct bpf_program *prog, *first_prog = NULL;
+	enum bpf_attach_type expected_attach_type;
+	enum bpf_prog_type prog_type;
 	struct bpf_object *obj;
+	int section_idx;
 	int err;
 
-	obj = bpf_object__open(file);
+	if (!attr)
+		return -EINVAL;
+
+	obj = bpf_object__open(attr->file);
 	if (IS_ERR(obj))
 		return -ENOENT;
 
@@ -1990,15 +2043,23 @@ int bpf_prog_load(const char *file, enum bpf_prog_type type,
 		 * If type is not specified, try to guess it based on
 		 * section name.
 		 */
-		if (type == BPF_PROG_TYPE_UNSPEC) {
-			type = bpf_program__guess_type(prog);
-			if (type == BPF_PROG_TYPE_UNSPEC) {
+		prog_type = attr->prog_type;
+		expected_attach_type = attr->expected_attach_type;
+		if (prog_type == BPF_PROG_TYPE_UNSPEC) {
+			section_idx = bpf_program__identify_section(prog);
+			if (section_idx < 0) {
 				bpf_object__close(obj);
 				return -EINVAL;
 			}
+			prog_type = section_names[section_idx].prog_type;
+			expected_attach_type =
+				section_names[section_idx].expected_attach_type;
 		}
 
-		bpf_program__set_type(prog, type);
+		bpf_program__set_type(prog, prog_type);
+		bpf_program__set_expected_attach_type(prog,
+						      expected_attach_type);
+
 		if (prog->idx != obj->efile.text_shndx && !first_prog)
 			first_prog = prog;
 	}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index f85906533cdd..a3a62a583f27 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -248,6 +248,14 @@ int bpf_map__pin(struct bpf_map *map, const char *path);
 
 long libbpf_get_error(const void *ptr);
 
+struct bpf_prog_load_attr {
+	const char *file;
+	enum bpf_prog_type prog_type;
+	enum bpf_attach_type expected_attach_type;
+};
+
+int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
+			struct bpf_object **pobj, int *prog_fd);
 int bpf_prog_load(const char *file, enum bpf_prog_type type,
 		  struct bpf_object **pobj, int *prog_fd);
 
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f35fb02bdf56..0a315ddabbf4 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -23,21 +23,23 @@ urandom_read: urandom_read.c
 
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
-	test_align test_verifier_log test_dev_cgroup test_tcpbpf_user
+	test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
+	test_sock test_sock_addr
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
 	sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
 	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
 	sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
-	sockmap_tcp_msg_prog.o
+	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
 	test_libbpf.sh \
 	test_xdp_redirect.sh \
 	test_xdp_meta.sh \
-	test_offload.py
+	test_offload.py \
+	test_sock_addr.sh
 
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open
@@ -51,6 +53,8 @@ $(TEST_GEN_PROGS): $(BPFOBJ)
 $(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/libbpf.a
 
 $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
+$(OUTPUT)/test_sock: cgroup_helpers.c
+$(OUTPUT)/test_sock_addr: cgroup_helpers.c
 
 .PHONY: force
 
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 7cae376d8d0c..d8223d99f96d 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -94,6 +94,8 @@ static int (*bpf_msg_cork_bytes)(void *ctx, int len) =
 	(void *) BPF_FUNC_msg_cork_bytes;
 static int (*bpf_msg_pull_data)(void *ctx, int start, int end, int flags) =
 	(void *) BPF_FUNC_msg_pull_data;
+static int (*bpf_bind)(void *ctx, void *addr, int addr_len) =
+	(void *) BPF_FUNC_bind;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/connect4_prog.c b/tools/testing/selftests/bpf/connect4_prog.c
new file mode 100644
index 000000000000..5a88a681d2ab
--- /dev/null
+++ b/tools/testing/selftests/bpf/connect4_prog.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <string.h>
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <sys/socket.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define SRC_REWRITE_IP4		0x7f000004U
+#define DST_REWRITE_IP4		0x7f000001U
+#define DST_REWRITE_PORT4	4444
+
+int _version SEC("version") = 1;
+
+SEC("cgroup/connect4")
+int connect_v4_prog(struct bpf_sock_addr *ctx)
+{
+	struct sockaddr_in sa;
+
+	/* Rewrite destination. */
+	ctx->user_ip4 = bpf_htonl(DST_REWRITE_IP4);
+	ctx->user_port = bpf_htons(DST_REWRITE_PORT4);
+
+	if (ctx->type == SOCK_DGRAM || ctx->type == SOCK_STREAM) {
+		///* Rewrite source. */
+		memset(&sa, 0, sizeof(sa));
+
+		sa.sin_family = AF_INET;
+		sa.sin_port = bpf_htons(0);
+		sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);
+
+		if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
+			return 0;
+	}
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/connect6_prog.c b/tools/testing/selftests/bpf/connect6_prog.c
new file mode 100644
index 000000000000..8ea3f7d12dee
--- /dev/null
+++ b/tools/testing/selftests/bpf/connect6_prog.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <string.h>
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <sys/socket.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define SRC_REWRITE_IP6_0	0
+#define SRC_REWRITE_IP6_1	0
+#define SRC_REWRITE_IP6_2	0
+#define SRC_REWRITE_IP6_3	6
+
+#define DST_REWRITE_IP6_0	0
+#define DST_REWRITE_IP6_1	0
+#define DST_REWRITE_IP6_2	0
+#define DST_REWRITE_IP6_3	1
+
+#define DST_REWRITE_PORT6	6666
+
+int _version SEC("version") = 1;
+
+SEC("cgroup/connect6")
+int connect_v6_prog(struct bpf_sock_addr *ctx)
+{
+	struct sockaddr_in6 sa;
+
+	/* Rewrite destination. */
+	ctx->user_ip6[0] = bpf_htonl(DST_REWRITE_IP6_0);
+	ctx->user_ip6[1] = bpf_htonl(DST_REWRITE_IP6_1);
+	ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
+	ctx->user_ip6[3] = bpf_htonl(DST_REWRITE_IP6_3);
+
+	ctx->user_port = bpf_htons(DST_REWRITE_PORT6);
+
+	if (ctx->type == SOCK_DGRAM || ctx->type == SOCK_STREAM) {
+		/* Rewrite source. */
+		memset(&sa, 0, sizeof(sa));
+
+		sa.sin6_family = AF_INET6;
+		sa.sin6_port = bpf_htons(0);
+
+		sa.sin6_addr.s6_addr32[0] = bpf_htonl(SRC_REWRITE_IP6_0);
+		sa.sin6_addr.s6_addr32[1] = bpf_htonl(SRC_REWRITE_IP6_1);
+		sa.sin6_addr.s6_addr32[2] = bpf_htonl(SRC_REWRITE_IP6_2);
+		sa.sin6_addr.s6_addr32[3] = bpf_htonl(SRC_REWRITE_IP6_3);
+
+		if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
+			return 0;
+	}
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_sock.c b/tools/testing/selftests/bpf/test_sock.c
new file mode 100644
index 000000000000..73bb20cfb9b7
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sock.c
@@ -0,0 +1,479 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <stdio.h>
+#include <unistd.h>
+
+#include <arpa/inet.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+
+#include <linux/filter.h>
+
+#include <bpf/bpf.h>
+
+#include "cgroup_helpers.h"
+
+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#define CG_PATH		"/foo"
+#define MAX_INSNS	512
+
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+struct sock_test {
+	const char *descr;
+	/* BPF prog properties */
+	struct bpf_insn	insns[MAX_INSNS];
+	enum bpf_attach_type expected_attach_type;
+	enum bpf_attach_type attach_type;
+	/* Socket properties */
+	int domain;
+	int type;
+	/* Endpoint to bind() to */
+	const char *ip;
+	unsigned short port;
+	/* Expected test result */
+	enum {
+		LOAD_REJECT,
+		ATTACH_REJECT,
+		BIND_REJECT,
+		SUCCESS,
+	} result;
+};
+
+static struct sock_test tests[] = {
+	{
+		"bind4 load with invalid access: src_ip6",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, src_ip6[0])),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET4_POST_BIND,
+		BPF_CGROUP_INET4_POST_BIND,
+		0,
+		0,
+		NULL,
+		0,
+		LOAD_REJECT,
+	},
+	{
+		"bind4 load with invalid access: mark",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, mark)),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET4_POST_BIND,
+		BPF_CGROUP_INET4_POST_BIND,
+		0,
+		0,
+		NULL,
+		0,
+		LOAD_REJECT,
+	},
+	{
+		"bind6 load with invalid access: src_ip4",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, src_ip4)),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET6_POST_BIND,
+		BPF_CGROUP_INET6_POST_BIND,
+		0,
+		0,
+		NULL,
+		0,
+		LOAD_REJECT,
+	},
+	{
+		"sock_create load with invalid access: src_port",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, src_port)),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET_SOCK_CREATE,
+		BPF_CGROUP_INET_SOCK_CREATE,
+		0,
+		0,
+		NULL,
+		0,
+		LOAD_REJECT,
+	},
+	{
+		"sock_create load w/o expected_attach_type (compat mode)",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		0,
+		BPF_CGROUP_INET_SOCK_CREATE,
+		AF_INET,
+		SOCK_STREAM,
+		"127.0.0.1",
+		8097,
+		SUCCESS,
+	},
+	{
+		"sock_create load w/ expected_attach_type",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET_SOCK_CREATE,
+		BPF_CGROUP_INET_SOCK_CREATE,
+		AF_INET,
+		SOCK_STREAM,
+		"127.0.0.1",
+		8097,
+		SUCCESS,
+	},
+	{
+		"attach type mismatch bind4 vs bind6",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET4_POST_BIND,
+		BPF_CGROUP_INET6_POST_BIND,
+		0,
+		0,
+		NULL,
+		0,
+		ATTACH_REJECT,
+	},
+	{
+		"attach type mismatch bind6 vs bind4",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET6_POST_BIND,
+		BPF_CGROUP_INET4_POST_BIND,
+		0,
+		0,
+		NULL,
+		0,
+		ATTACH_REJECT,
+	},
+	{
+		"attach type mismatch default vs bind4",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		0,
+		BPF_CGROUP_INET4_POST_BIND,
+		0,
+		0,
+		NULL,
+		0,
+		ATTACH_REJECT,
+	},
+	{
+		"attach type mismatch bind6 vs sock_create",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET6_POST_BIND,
+		BPF_CGROUP_INET_SOCK_CREATE,
+		0,
+		0,
+		NULL,
+		0,
+		ATTACH_REJECT,
+	},
+	{
+		"bind4 reject all",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET4_POST_BIND,
+		BPF_CGROUP_INET4_POST_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		"0.0.0.0",
+		0,
+		BIND_REJECT,
+	},
+	{
+		"bind6 reject all",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET6_POST_BIND,
+		BPF_CGROUP_INET6_POST_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		"::",
+		0,
+		BIND_REJECT,
+	},
+	{
+		"bind6 deny specific IP & port",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+			/* if (ip == expected && port == expected) */
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, src_ip6[3])),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x01000000, 4),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, src_port)),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x2001, 2),
+
+			/* return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_A(1),
+
+			/* else return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET6_POST_BIND,
+		BPF_CGROUP_INET6_POST_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		"::1",
+		8193,
+		BIND_REJECT,
+	},
+	{
+		"bind4 allow specific IP & port",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+			/* if (ip == expected && port == expected) */
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, src_ip4)),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x0100007F, 4),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+				    offsetof(struct bpf_sock, src_port)),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x1002, 2),
+
+			/* return ALLOW; */
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_A(1),
+
+			/* else return DENY; */
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET4_POST_BIND,
+		BPF_CGROUP_INET4_POST_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		"127.0.0.1",
+		4098,
+		SUCCESS,
+	},
+	{
+		"bind4 allow all",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET4_POST_BIND,
+		BPF_CGROUP_INET4_POST_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		"0.0.0.0",
+		0,
+		SUCCESS,
+	},
+	{
+		"bind6 allow all",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		BPF_CGROUP_INET6_POST_BIND,
+		BPF_CGROUP_INET6_POST_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		"::",
+		0,
+		SUCCESS,
+	},
+};
+
+static size_t probe_prog_length(const struct bpf_insn *fp)
+{
+	size_t len;
+
+	for (len = MAX_INSNS - 1; len > 0; --len)
+		if (fp[len].code != 0 || fp[len].imm != 0)
+			break;
+	return len + 1;
+}
+
+static int load_sock_prog(const struct bpf_insn *prog,
+			  enum bpf_attach_type attach_type)
+{
+	struct bpf_load_program_attr attr;
+
+	memset(&attr, 0, sizeof(struct bpf_load_program_attr));
+	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
+	attr.expected_attach_type = attach_type;
+	attr.insns = prog;
+	attr.insns_cnt = probe_prog_length(attr.insns);
+	attr.license = "GPL";
+
+	return bpf_load_program_xattr(&attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
+}
+
+static int attach_sock_prog(int cgfd, int progfd,
+			    enum bpf_attach_type attach_type)
+{
+	return bpf_prog_attach(progfd, cgfd, attach_type, BPF_F_ALLOW_OVERRIDE);
+}
+
+static int bind_sock(int domain, int type, const char *ip, unsigned short port)
+{
+	struct sockaddr_storage addr;
+	struct sockaddr_in6 *addr6;
+	struct sockaddr_in *addr4;
+	int sockfd = -1;
+	socklen_t len;
+	int err = 0;
+
+	sockfd = socket(domain, type, 0);
+	if (sockfd < 0)
+		goto err;
+
+	memset(&addr, 0, sizeof(addr));
+
+	if (domain == AF_INET) {
+		len = sizeof(struct sockaddr_in);
+		addr4 = (struct sockaddr_in *)&addr;
+		addr4->sin_family = domain;
+		addr4->sin_port = htons(port);
+		if (inet_pton(domain, ip, (void *)&addr4->sin_addr) != 1)
+			goto err;
+	} else if (domain == AF_INET6) {
+		len = sizeof(struct sockaddr_in6);
+		addr6 = (struct sockaddr_in6 *)&addr;
+		addr6->sin6_family = domain;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(domain, ip, (void *)&addr6->sin6_addr) != 1)
+			goto err;
+	} else {
+		goto err;
+	}
+
+	if (bind(sockfd, (const struct sockaddr *)&addr, len) == -1)
+		goto err;
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(sockfd);
+	return err;
+}
+
+static int run_test_case(int cgfd, const struct sock_test *test)
+{
+	int progfd = -1;
+	int err = 0;
+
+	printf("Test case: %s .. ", test->descr);
+	progfd = load_sock_prog(test->insns, test->expected_attach_type);
+	if (progfd < 0) {
+		if (test->result == LOAD_REJECT)
+			goto out;
+		else
+			goto err;
+	}
+
+	if (attach_sock_prog(cgfd, progfd, test->attach_type) == -1) {
+		if (test->result == ATTACH_REJECT)
+			goto out;
+		else
+			goto err;
+	}
+
+	if (bind_sock(test->domain, test->type, test->ip, test->port) == -1) {
+		/* sys_bind() may fail for different reasons, errno has to be
+		 * checked to confirm that BPF program rejected it.
+		 */
+		if (test->result == BIND_REJECT && errno == EPERM)
+			goto out;
+		else
+			goto err;
+	}
+
+
+	if (test->result != SUCCESS)
+		goto err;
+
+	goto out;
+err:
+	err = -1;
+out:
+	/* Detaching w/o checking return code: best effort attempt. */
+	if (progfd != -1)
+		bpf_prog_detach(cgfd, test->attach_type);
+	close(progfd);
+	printf("[%s]\n", err ? "FAIL" : "PASS");
+	return err;
+}
+
+static int run_tests(int cgfd)
+{
+	int passes = 0;
+	int fails = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tests); ++i) {
+		if (run_test_case(cgfd, &tests[i]))
+			++fails;
+		else
+			++passes;
+	}
+	printf("Summary: %d PASSED, %d FAILED\n", passes, fails);
+	return fails ? -1 : 0;
+}
+
+int main(int argc, char **argv)
+{
+	int cgfd = -1;
+	int err = 0;
+
+	if (setup_cgroup_environment())
+		goto err;
+
+	cgfd = create_and_get_cgroup(CG_PATH);
+	if (!cgfd)
+		goto err;
+
+	if (join_cgroup(CG_PATH))
+		goto err;
+
+	if (run_tests(cgfd))
+		goto err;
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(cgfd);
+	cleanup_cgroup_environment();
+	return err;
+}
diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
new file mode 100644
index 000000000000..d488f20926e8
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -0,0 +1,588 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <arpa/inet.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+
+#include <linux/filter.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+
+#define CG_PATH	"/foo"
+#define CONNECT4_PROG_PATH	"./connect4_prog.o"
+#define CONNECT6_PROG_PATH	"./connect6_prog.o"
+
+#define SERV4_IP		"192.168.1.254"
+#define SERV4_REWRITE_IP	"127.0.0.1"
+#define SERV4_PORT		4040
+#define SERV4_REWRITE_PORT	4444
+
+#define SERV6_IP		"face:b00c:1234:5678::abcd"
+#define SERV6_REWRITE_IP	"::1"
+#define SERV6_PORT		6060
+#define SERV6_REWRITE_PORT	6666
+
+#define INET_NTOP_BUF	40
+
+typedef int (*load_fn)(enum bpf_attach_type, const char *comment);
+typedef int (*info_fn)(int, struct sockaddr *, socklen_t *);
+
+struct program {
+	enum bpf_attach_type type;
+	load_fn	loadfn;
+	int fd;
+	const char *name;
+	enum bpf_attach_type invalid_type;
+};
+
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+static int mk_sockaddr(int domain, const char *ip, unsigned short port,
+		       struct sockaddr *addr, socklen_t addr_len)
+{
+	struct sockaddr_in6 *addr6;
+	struct sockaddr_in *addr4;
+
+	if (domain != AF_INET && domain != AF_INET6) {
+		log_err("Unsupported address family");
+		return -1;
+	}
+
+	memset(addr, 0, addr_len);
+
+	if (domain == AF_INET) {
+		if (addr_len < sizeof(struct sockaddr_in))
+			return -1;
+		addr4 = (struct sockaddr_in *)addr;
+		addr4->sin_family = domain;
+		addr4->sin_port = htons(port);
+		if (inet_pton(domain, ip, (void *)&addr4->sin_addr) != 1) {
+			log_err("Invalid IPv4: %s", ip);
+			return -1;
+		}
+	} else if (domain == AF_INET6) {
+		if (addr_len < sizeof(struct sockaddr_in6))
+			return -1;
+		addr6 = (struct sockaddr_in6 *)addr;
+		addr6->sin6_family = domain;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(domain, ip, (void *)&addr6->sin6_addr) != 1) {
+			log_err("Invalid IPv6: %s", ip);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int load_insns(enum bpf_attach_type attach_type,
+		      const struct bpf_insn *insns, size_t insns_cnt,
+		      const char *comment)
+{
+	struct bpf_load_program_attr load_attr;
+	int ret;
+
+	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
+	load_attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
+	load_attr.expected_attach_type = attach_type;
+	load_attr.insns = insns;
+	load_attr.insns_cnt = insns_cnt;
+	load_attr.license = "GPL";
+
+	ret = bpf_load_program_xattr(&load_attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
+	if (ret < 0 && comment) {
+		log_err(">>> Loading %s program error.\n"
+			">>> Output from verifier:\n%s\n-------\n",
+			comment, bpf_log_buf);
+	}
+
+	return ret;
+}
+
+/* [1] These testing programs try to read different context fields, including
+ * narrow loads of different sizes from user_ip4 and user_ip6, and write to
+ * those allowed to be overridden.
+ *
+ * [2] BPF_LD_IMM64 & BPF_JMP_REG are used below whenever there is a need to
+ * compare a register with unsigned 32bit integer. BPF_JMP_IMM can't be used
+ * in such cases since it accepts only _signed_ 32bit integer as IMM
+ * argument. Also note that BPF_LD_IMM64 contains 2 instructions what matters
+ * to count jumps properly.
+ */
+
+static int bind4_prog_load(enum bpf_attach_type attach_type,
+			   const char *comment)
+{
+	union {
+		uint8_t u4_addr8[4];
+		uint16_t u4_addr16[2];
+		uint32_t u4_addr32;
+	} ip4;
+	struct sockaddr_in addr4_rw;
+
+	if (inet_pton(AF_INET, SERV4_IP, (void *)&ip4) != 1) {
+		log_err("Invalid IPv4: %s", SERV4_IP);
+		return -1;
+	}
+
+	if (mk_sockaddr(AF_INET, SERV4_REWRITE_IP, SERV4_REWRITE_PORT,
+			(struct sockaddr *)&addr4_rw, sizeof(addr4_rw)) == -1)
+		return -1;
+
+	/* See [1]. */
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET && */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET, 16),
+
+		/*     (sk.type == SOCK_DGRAM || sk.type == SOCK_STREAM) && */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, type)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_DGRAM, 1),
+		BPF_JMP_A(1),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_STREAM, 12),
+
+		/*     1st_byte_of_user_ip4 == expected && */
+		BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, user_ip4)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr8[0], 10),
+
+		/*     1st_half_of_user_ip4 == expected && */
+		BPF_LDX_MEM(BPF_H, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, user_ip4)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip4.u4_addr16[0], 8),
+
+		/*     whole_user_ip4 == expected) { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, user_ip4)),
+		BPF_LD_IMM64(BPF_REG_8, ip4.u4_addr32), /* See [2]. */
+		BPF_JMP_REG(BPF_JNE, BPF_REG_7, BPF_REG_8, 4),
+
+		/*      user_ip4 = addr4_rw.sin_addr */
+		BPF_MOV32_IMM(BPF_REG_7, addr4_rw.sin_addr.s_addr),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_ip4)),
+
+		/*      user_port = addr4_rw.sin_port */
+		BPF_MOV32_IMM(BPF_REG_7, addr4_rw.sin_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(attach_type, insns,
+			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+}
+
+static int bind6_prog_load(enum bpf_attach_type attach_type,
+			   const char *comment)
+{
+	struct sockaddr_in6 addr6_rw;
+	struct in6_addr ip6;
+
+	if (inet_pton(AF_INET6, SERV6_IP, (void *)&ip6) != 1) {
+		log_err("Invalid IPv6: %s", SERV6_IP);
+		return -1;
+	}
+
+	if (mk_sockaddr(AF_INET6, SERV6_REWRITE_IP, SERV6_REWRITE_PORT,
+			(struct sockaddr *)&addr6_rw, sizeof(addr6_rw)) == -1)
+		return -1;
+
+	/* See [1]. */
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET6 && */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET6, 18),
+
+		/*            5th_byte_of_user_ip6 == expected && */
+		BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, user_ip6[1])),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip6.s6_addr[4], 16),
+
+		/*            3rd_half_of_user_ip6 == expected && */
+		BPF_LDX_MEM(BPF_H, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, user_ip6[1])),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, ip6.s6_addr16[2], 14),
+
+		/*            last_word_of_user_ip6 == expected) { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, user_ip6[3])),
+		BPF_LD_IMM64(BPF_REG_8, ip6.s6_addr32[3]),  /* See [2]. */
+		BPF_JMP_REG(BPF_JNE, BPF_REG_7, BPF_REG_8, 10),
+
+
+#define STORE_IPV6_WORD(N)						       \
+		BPF_MOV32_IMM(BPF_REG_7, addr6_rw.sin6_addr.s6_addr32[N]),     \
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,		       \
+			    offsetof(struct bpf_sock_addr, user_ip6[N]))
+
+		/*      user_ip6 = addr6_rw.sin6_addr */
+		STORE_IPV6_WORD(0),
+		STORE_IPV6_WORD(1),
+		STORE_IPV6_WORD(2),
+		STORE_IPV6_WORD(3),
+
+		/*      user_port = addr6_rw.sin6_port */
+		BPF_MOV32_IMM(BPF_REG_7, addr6_rw.sin6_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(attach_type, insns,
+			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+}
+
+static int connect_prog_load_path(const char *path,
+				  enum bpf_attach_type attach_type,
+				  const char *comment)
+{
+	struct bpf_prog_load_attr attr;
+	struct bpf_object *obj;
+	int prog_fd;
+
+	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
+	attr.file = path;
+	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
+	attr.expected_attach_type = attach_type;
+
+	if (bpf_prog_load_xattr(&attr, &obj, &prog_fd)) {
+		if (comment)
+			log_err(">>> Loading %s program at %s error.\n",
+				comment, path);
+		return -1;
+	}
+
+	return prog_fd;
+}
+
+static int connect4_prog_load(enum bpf_attach_type attach_type,
+			      const char *comment)
+{
+	return connect_prog_load_path(CONNECT4_PROG_PATH, attach_type, comment);
+}
+
+static int connect6_prog_load(enum bpf_attach_type attach_type,
+			      const char *comment)
+{
+	return connect_prog_load_path(CONNECT6_PROG_PATH, attach_type, comment);
+}
+
+static void print_ip_port(int sockfd, info_fn fn, const char *fmt)
+{
+	char addr_buf[INET_NTOP_BUF];
+	struct sockaddr_storage addr;
+	struct sockaddr_in6 *addr6;
+	struct sockaddr_in *addr4;
+	socklen_t addr_len;
+	unsigned short port;
+	void *nip;
+
+	addr_len = sizeof(struct sockaddr_storage);
+	memset(&addr, 0, addr_len);
+
+	if (fn(sockfd, (struct sockaddr *)&addr, (socklen_t *)&addr_len) == 0) {
+		if (addr.ss_family == AF_INET) {
+			addr4 = (struct sockaddr_in *)&addr;
+			nip = (void *)&addr4->sin_addr;
+			port = ntohs(addr4->sin_port);
+		} else if (addr.ss_family == AF_INET6) {
+			addr6 = (struct sockaddr_in6 *)&addr;
+			nip = (void *)&addr6->sin6_addr;
+			port = ntohs(addr6->sin6_port);
+		} else {
+			return;
+		}
+		const char *addr_str =
+			inet_ntop(addr.ss_family, nip, addr_buf, INET_NTOP_BUF);
+		printf(fmt, addr_str ? addr_str : "??", port);
+	}
+}
+
+static void print_local_ip_port(int sockfd, const char *fmt)
+{
+	print_ip_port(sockfd, getsockname, fmt);
+}
+
+static void print_remote_ip_port(int sockfd, const char *fmt)
+{
+	print_ip_port(sockfd, getpeername, fmt);
+}
+
+static int start_server(int type, const struct sockaddr_storage *addr,
+			socklen_t addr_len)
+{
+
+	int fd;
+
+	fd = socket(addr->ss_family, type, 0);
+	if (fd == -1) {
+		log_err("Failed to create server socket");
+		goto out;
+	}
+
+	if (bind(fd, (const struct sockaddr *)addr, addr_len) == -1) {
+		log_err("Failed to bind server socket");
+		goto close_out;
+	}
+
+	if (type == SOCK_STREAM) {
+		if (listen(fd, 128) == -1) {
+			log_err("Failed to listen on server socket");
+			goto close_out;
+		}
+	}
+
+	print_local_ip_port(fd, "\t   Actual: bind(%s, %d)\n");
+
+	goto out;
+close_out:
+	close(fd);
+	fd = -1;
+out:
+	return fd;
+}
+
+static int connect_to_server(int type, const struct sockaddr_storage *addr,
+			     socklen_t addr_len)
+{
+	int domain;
+	int fd;
+
+	domain = addr->ss_family;
+
+	if (domain != AF_INET && domain != AF_INET6) {
+		log_err("Unsupported address family");
+		return -1;
+	}
+
+	fd = socket(domain, type, 0);
+	if (fd == -1) {
+		log_err("Failed to creating client socket");
+		return -1;
+	}
+
+	if (connect(fd, (const struct sockaddr *)addr, addr_len) == -1) {
+		log_err("Fail to connect to server");
+		goto err;
+	}
+
+	print_remote_ip_port(fd, "\t   Actual: connect(%s, %d)");
+	print_local_ip_port(fd, " from (%s, %d)\n");
+
+	return 0;
+err:
+	close(fd);
+	return -1;
+}
+
+static void print_test_case_num(int domain, int type)
+{
+	static int test_num;
+
+	printf("Test case #%d (%s/%s):\n", ++test_num,
+	       (domain == AF_INET ? "IPv4" :
+		domain == AF_INET6 ? "IPv6" :
+		"unknown_domain"),
+	       (type == SOCK_STREAM ? "TCP" :
+		type == SOCK_DGRAM ? "UDP" :
+		"unknown_type"));
+}
+
+static int run_test_case(int domain, int type, const char *ip,
+			 unsigned short port)
+{
+	struct sockaddr_storage addr;
+	socklen_t addr_len = sizeof(addr);
+	int servfd = -1;
+	int err = 0;
+
+	print_test_case_num(domain, type);
+
+	if (mk_sockaddr(domain, ip, port, (struct sockaddr *)&addr,
+			addr_len) == -1)
+		return -1;
+
+	printf("\tRequested: bind(%s, %d) ..\n", ip, port);
+	servfd = start_server(type, &addr, addr_len);
+	if (servfd == -1)
+		goto err;
+
+	printf("\tRequested: connect(%s, %d) from (*, *) ..\n", ip, port);
+	if (connect_to_server(type, &addr, addr_len))
+		goto err;
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(servfd);
+	return err;
+}
+
+static void close_progs_fds(struct program *progs, size_t prog_cnt)
+{
+	size_t i;
+
+	for (i = 0; i < prog_cnt; ++i) {
+		close(progs[i].fd);
+		progs[i].fd = -1;
+	}
+}
+
+static int load_and_attach_progs(int cgfd, struct program *progs,
+				 size_t prog_cnt)
+{
+	size_t i;
+
+	for (i = 0; i < prog_cnt; ++i) {
+		printf("Load %s with invalid type (can pollute stderr) ",
+		       progs[i].name);
+		fflush(stdout);
+		progs[i].fd = progs[i].loadfn(progs[i].invalid_type, NULL);
+		if (progs[i].fd != -1) {
+			log_err("Load with invalid type accepted for %s",
+				progs[i].name);
+			goto err;
+		}
+		printf("... REJECTED\n");
+
+		printf("Load %s with valid type", progs[i].name);
+		progs[i].fd = progs[i].loadfn(progs[i].type, progs[i].name);
+		if (progs[i].fd == -1) {
+			log_err("Failed to load program %s", progs[i].name);
+			goto err;
+		}
+		printf(" ... OK\n");
+
+		printf("Attach %s with invalid type", progs[i].name);
+		if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].invalid_type,
+				    BPF_F_ALLOW_OVERRIDE) != -1) {
+			log_err("Attach with invalid type accepted for %s",
+				progs[i].name);
+			goto err;
+		}
+		printf(" ... REJECTED\n");
+
+		printf("Attach %s with valid type", progs[i].name);
+		if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].type,
+				    BPF_F_ALLOW_OVERRIDE) == -1) {
+			log_err("Failed to attach program %s", progs[i].name);
+			goto err;
+		}
+		printf(" ... OK\n");
+	}
+
+	return 0;
+err:
+	close_progs_fds(progs, prog_cnt);
+	return -1;
+}
+
+static int run_domain_test(int domain, int cgfd, struct program *progs,
+			   size_t prog_cnt, const char *ip, unsigned short port)
+{
+	int err = 0;
+
+	if (load_and_attach_progs(cgfd, progs, prog_cnt) == -1)
+		goto err;
+
+	if (run_test_case(domain, SOCK_STREAM, ip, port) == -1)
+		goto err;
+
+	if (run_test_case(domain, SOCK_DGRAM, ip, port) == -1)
+		goto err;
+
+	goto out;
+err:
+	err = -1;
+out:
+	close_progs_fds(progs, prog_cnt);
+	return err;
+}
+
+static int run_test(void)
+{
+	size_t inet6_prog_cnt;
+	size_t inet_prog_cnt;
+	int cgfd = -1;
+	int err = 0;
+
+	struct program inet6_progs[] = {
+		{BPF_CGROUP_INET6_BIND, bind6_prog_load, -1, "bind6",
+		 BPF_CGROUP_INET4_BIND},
+		{BPF_CGROUP_INET6_CONNECT, connect6_prog_load, -1, "connect6",
+		 BPF_CGROUP_INET4_CONNECT},
+	};
+	inet6_prog_cnt = sizeof(inet6_progs) / sizeof(struct program);
+
+	struct program inet_progs[] = {
+		{BPF_CGROUP_INET4_BIND, bind4_prog_load, -1, "bind4",
+		 BPF_CGROUP_INET6_BIND},
+		{BPF_CGROUP_INET4_CONNECT, connect4_prog_load, -1, "connect4",
+		 BPF_CGROUP_INET6_CONNECT},
+	};
+	inet_prog_cnt = sizeof(inet_progs) / sizeof(struct program);
+
+	if (setup_cgroup_environment())
+		goto err;
+
+	cgfd = create_and_get_cgroup(CG_PATH);
+	if (!cgfd)
+		goto err;
+
+	if (join_cgroup(CG_PATH))
+		goto err;
+
+	if (run_domain_test(AF_INET, cgfd, inet_progs, inet_prog_cnt, SERV4_IP,
+			    SERV4_PORT) == -1)
+		goto err;
+
+	if (run_domain_test(AF_INET6, cgfd, inet6_progs, inet6_prog_cnt,
+			    SERV6_IP, SERV6_PORT) == -1)
+		goto err;
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(cgfd);
+	cleanup_cgroup_environment();
+	printf(err ? "### FAIL\n" : "### SUCCESS\n");
+	return err;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc < 2) {
+		fprintf(stderr,
+			"%s has to be run via %s.sh. Skip direct run.\n",
+			argv[0], argv[0]);
+		exit(0);
+	}
+	return run_test();
+}
diff --git a/tools/testing/selftests/bpf/test_sock_addr.sh b/tools/testing/selftests/bpf/test_sock_addr.sh
new file mode 100755
index 000000000000..c6e1dcf992c4
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sock_addr.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+
+set -eu
+
+ping_once()
+{
+	ping -q -c 1 -W 1 ${1%%/*} >/dev/null 2>&1
+}
+
+wait_for_ip()
+{
+	local _i
+	echo -n "Wait for testing IPv4/IPv6 to become available "
+	for _i in $(seq ${MAX_PING_TRIES}); do
+		echo -n "."
+		if ping_once ${TEST_IPv4} && ping_once ${TEST_IPv6}; then
+			echo " OK"
+			return
+		fi
+	done
+	echo 1>&2 "ERROR: Timeout waiting for test IP to become available."
+	exit 1
+}
+
+setup()
+{
+	# Create testing interfaces not to interfere with current environment.
+	ip link add dev ${TEST_IF} type veth peer name ${TEST_IF_PEER}
+	ip link set ${TEST_IF} up
+	ip link set ${TEST_IF_PEER} up
+
+	ip -4 addr add ${TEST_IPv4} dev ${TEST_IF}
+	ip -6 addr add ${TEST_IPv6} dev ${TEST_IF}
+	wait_for_ip
+}
+
+cleanup()
+{
+	ip link del ${TEST_IF} 2>/dev/null || :
+	ip link del ${TEST_IF_PEER} 2>/dev/null || :
+}
+
+main()
+{
+	trap cleanup EXIT 2 3 6 15
+	setup
+	./test_sock_addr setup_done
+}
+
+BASENAME=$(basename $0 .sh)
+TEST_IF="${BASENAME}1"
+TEST_IF_PEER="${BASENAME}2"
+TEST_IPv4="127.0.0.4/8"
+TEST_IPv6="::6/128"
+MAX_PING_TRIES=5
+
+main
author	Daniel Borkmann <daniel@iogearbox.net>	2018-03-31 02:17:57 +0200
committer	Daniel Borkmann <daniel@iogearbox.net>	2018-03-31 02:18:07 +0200
commit	7828f20e3779e4e85e55371e0e43f5006a15fb41 (patch)
tree	48f4977b0b8e69bd6432b18556ad9ac7ca7728eb /tools
parent	807ae7daf5fb9ba9ef688344ae7c0d8cbebd211c (diff)
parent	1d436885b23bf4474617914d7eb15e039c83ed99 (diff)