summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2011-02-05tcp: Add reference to initial CWND ietf draft.David S. Miller
Suggested by Alexander Zimmermann Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04inetpeer: Move ICMP rate limiting state into inet_peer entries.David S. Miller
Like metrics, the ICMP rate limiting bits are cached state about a destination. So move it into the inet_peer entries. If an inet_peer cannot be bound (the reason is memory allocation failure or similar), the policy is to allow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2011-02-03include/net/genetlink.h: Allow genlmsg_cancel to accept a NULL argumentJulia Lawall
nlmsg_cancel can accept NULL as its second argument, so for similarity, this patch extends genlmsg_cancel to be able to accept a NULL second argument as well. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2011-02-02sched: CHOKe flow schedulerstephen hemminger
CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative packet scheduler based on the Random Exponential Drop (RED) algorithm. The core idea is: For every packet arrival: Calculate Qave if (Qave < minth) Queue the new packet else Select randomly a packet from the queue if (both packets from same flow) then Drop both the packets else if (Qave > maxth) Drop packet else Admit packet with proability p (same as RED) See also: Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active queue management scheme for approximating fair bandwidth allocation", Proceeding of INFOCOM'2000, March 2000. Help from: Eric Dumazet <eric.dumazet@gmail.com> Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02tcp: Increase the initial congestion window to 10.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Nandita Dukkipati <nanditad@google.com>
2011-02-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2011-02-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2011-02-03netfilter: xtables: add device group matchPatrick McHardy
Add a new 'devgroup' match to match on the device group of the incoming and outgoing network device of a packet. Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-02netfilter: ipset: fix linking with CONFIG_IPV6=nPatrick McHardy
Add a dummy ip_set_get_ip6_port function that unconditionally returns false for CONFIG_IPV6=n and convert the real function to ipv6_skip_exthdr() to avoid pulling in the ip6_tables module when loading ipset. Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01ipv4: Update some fib_hash centric interface names.David S. Miller
fib_hash_init() --> fib_trie_init() fib_hash_table() --> fib_trie_table() Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-01netfilter: ipset: install ipset related header filesPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01IPVS: Remove unused variablesSimon Horman
These variables are unused as a result of the recent netns work. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Tested-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01netfilter: ecache: always set events bits, filter them laterPablo Neira Ayuso
For the following rule: iptables -I PREROUTING -t raw -j CT --ctevents assured The event delivered looks like the following: [UPDATE] tcp 6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED] Note that the TCP protocol state is not included. For that reason the CT event filtering is not very useful for conntrackd. To resolve this issue, instead of conditionally setting the CT events bits based on the ctmask, we always set them and perform the filtering in the late stage, just before the delivery. Thus, the event delivered looks like the following: [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01netfilter: xtables: "set" match and "SET" target supportJozsef Kadlecsik
The patch adds the combined module of the "SET" target and "set" match to netfilter. Both the previous and the current revisions are supported. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01netfilter: ipset: list:set set type supportJozsef Kadlecsik
The module implements the list:set type support in two flavours: without and with timeout. The sets has two sides: for the userspace, they store the names of other (non list:set type of) sets: one can add, delete and test set names. For the kernel, it forms an ordered union of the member sets: the members sets are tried in order when elements are added, deleted and tested and the process stops at the first success. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01netfilter: ipset: hash:ip set type supportJozsef Kadlecsik
The module implements the hash:ip type support in four flavours: for IPv4 or IPv6, both without and with timeout support. All the hash types are based on the "array hash" or ahash structure and functions as a good compromise between minimal memory footprint and speed. The hashing uses arrays to resolve clashes. The hash table is resized (doubled) when searching becomes too long. Resizing can be triggered by userspace add commands only and those are serialized by the nfnl mutex. During resizing the set is read-locked, so the only possible concurrent operations are the kernel side readers. Those are protected by RCU locking. Because of the four flavours and the other hash types, the functions are implemented in general forms in the ip_set_ahash.h header file and the real functions are generated before compiling by macro expansion. Thus the dereferencing of low-level functions and void pointer arguments could be avoided: the low-level functions are inlined, the function arguments are pointers of type-specific structures. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01netfilter: ipset: bitmap:ip set type supportJozsef Kadlecsik
The module implements the bitmap:ip set type in two flavours, without and with timeout support. In this kind of set one can store IPv4 addresses (or network addresses) from a given range. In order not to waste memory, the timeout version does not rely on the kernel timer for every element to be timed out but on garbage collection. All set types use this mechanism. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01netfilter: ipset: IP set core supportJozsef Kadlecsik
The patch adds the IP set core support to the kernel. The IP set core implements a netlink (nfnetlink) based protocol by which one can create, destroy, flush, rename, swap, list, save, restore sets, and add, delete, test elements from userspace. For simplicity (and backward compatibilty and for not to force ip(6)tables to be linked with a netlink library) reasons a small getsockopt-based protocol is also kept in order to communicate with the ip(6)tables match and target. The netlink protocol passes all u16, etc values in network order with NLA_F_NET_BYTEORDER flag. The protocol enforces the proper use of the NLA_F_NESTED and NLA_F_NET_BYTEORDER flags. For other kernel subsystems (netfilter match and target) the API contains the functions to add, delete and test elements in sets and the required calls to get/put refereces to the sets before those operations can be performed. The set types (which are implemented in independent modules) are stored in a simple RCU protected list. A set type may have variants: for example without timeout or with timeout support, for IPv4 or for IPv6. The sets (i.e. the pointers to the sets) are stored in an array. The sets are identified by their index in the array, which makes possible easy and fast swapping of sets. The array is protected indirectly by the nfnl mutex from nfnetlink. The content of the sets are protected by the rwlock of the set. There are functional differences between the add/del/test functions for the kernel and userspace: - kernel add/del/test: works on the current packet (i.e. one element) - kernel test: may trigger an "add" operation in order to fill out unspecified parts of the element from the packet (like MAC address) - userspace add/del: works on the netlink message and thus possibly on multiple elements from the IPSET_ATTR_ADT container attribute. - userspace add: may trigger resizing of a set Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01netfilter: NFNL_SUBSYS_IPSET id and NLA_PUT_NET* macrosJozsef Kadlecsik
The patch adds the NFNL_SUBSYS_IPSET id and NLA_PUT_NET* macros to the vanilla kernel. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-31ipv4: Consolidate all default route selection implementations.David S. Miller
Both fib_trie and fib_hash have a local implementation of fib_table_select_default(). This is completely unnecessary code duplication. Since we now remember the fib_table and the head of the fib alias list of the default route, we can implement one single generic version of this routine. Looking at the fib_hash implementation you may get the impression that it's possible for there to be multiple top-level routes in the table for the default route. The truth is, it isn't, the insert code will only allow one entry to exist in the zero prefix hash table, because all keys evaluate to zero and all keys in a hash table must be unique. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31ipv4: Remember FIB alias list head and table in lookup results.David S. Miller
This will be used later to implement fib_select_default() in a completely generic manner, instead of the current situation where the default route is re-looked up in the TRIE/HASH table and then the available aliases are analyzed. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2011-01-30net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNTEric W. Biederman
SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1, which unfortunately means the existing infrastructure for compat networking ioctls is insufficient. A trivial compact ioctl implementation would conflict with: SIOCAX25ADDUID SIOCAIPXPRISLT SIOCGETSGCNT_IN6 SIOCGETSGCNT SIOCRSSCAUSE SIOCX25SSUBSCRIP SIOCX25SDTEFACILITIES To make this work I have updated the compat_ioctl decode path to mirror the the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl function into struct proto so I can break out ioctls by which kind of ip socket I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal ipmr_ioctl. This was necessary because unfortunately the struct layout for the SIOCGETSGCNT has unsigned longs in it so changes between 32bit and 64bit kernels. This change was sufficient to run a 32bit ip multicast routing daemon on a 64bit kernel. Reported-by: Bill Fenner <fenner@aristanetworks.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30caif: bugfix - add caif headers for userspace usage.sjur.brandeland@stericsson.com
Add caif_socket.h and if_caif.h to the kernel header files exported for use by userspace. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28ipv4: Attach FIB info to dst_default_metrics when possibleDavid S. Miller
If there are no explicit metrics attached to a route, hook fi->fib_info up to dst_default_metrics. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28ipv4: Allocate fib metrics dynamically.David S. Miller
This is the initial gateway towards super-sharing metrics if they are all set to zero for a route. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 Conflicts: drivers/net/wireless/ath/ath9k/init.c
2011-01-28mac80211: add MCS information to radiotapJohannes Berg
This adds the MCS information we currently get from the drivers into radiotap. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-27net: Pre-COW metrics for TCP.David S. Miller
TCP is going to record metrics for the connection, so pre-COW the route metrics at route cache entry creation time. This avoids several atomic operations that have to occur if we COW the metrics after the entry reaches global visibility. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27Merge branch 'master' of ↵David S. Miller
ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2011-01-27net: fix dev_seq_next()Eric Dumazet
Commit c6d14c84566d (net: Introduce for_each_netdev_rcu() iterator) added a race in dev_seq_next(). The rcu_dereference() call should be done _before_ testing the end of list, or we might return a wrong net_device if a concurrent thread changes net_device list under us. Note : discovered thanks to a sparse warning : net/core/dev.c:3919:9: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27inetpeer: Mark metrics as "new" in fresh inetpeer entries.David S. Miller
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically, all ones) on fresh inetpeer entries. This way code can determine if default metrics have been loaded in from a routing table entry already. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27inetpeer: Add metrics storage to inetpeer entries.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26net: Implement read-only protection and COW'ing of metrics.David S. Miller
Routing metrics are now copy-on-write. Initially a route entry points it's metrics at a read-only location. If a routing table entry exists, it will point there. Else it will point at the all zero metric place-holder called 'dst_default_metrics'. The writeability state of the metrics is stored in the low bits of the metrics pointer, we have two bits left to spare if we want to store more states. For the initial implementation, COW is implemented simply via kmalloc. However future enhancements will change this to place the writable metrics somewhere else, in order to increase sharing. Very likely this "somewhere else" will be the inetpeer cache. Note also that this means that metrics updates may transiently fail if we cannot COW the metrics successfully. But even by itself, this patch should decrease memory usage and increase cache locality especially for routing workloads. In those cases the read-only metric copies stay in place and never get written to. TCP workloads where metrics get updated, and those rare cases where PMTU triggers occur, will take a very slight performance hit. But that hit will be alleviated when the long-term writable metrics move to a more sharable location. Since the metrics storage went from a u32 array of RTAX_MAX entries to what is essentially a pointer, some retooling of the dst_entry layout was necessary. Most importantly, we need to preserve the alignment of the reference count so that it doesn't share cache lines with the read-mostly state, as per Eric Dumazet's alignment assertion checks. The only non-trivial bit here is the move of the 'flags' member into the writeable cacheline. This is OK since we are always accessing the flags around the same moment when we made a modification to the reference count. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2011-01-26Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2011-01-24net: reduce and unify printk level in netdev_fix_features()Michał Mirosław
Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will be used by ethtool and might spam dmesg unnecessarily. This converts the function to use netdev_info() instead of plain printk(). As a side effect, bonding and bridge devices will now log dropped features on every slave device change. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24net: change netdev->features to u32Michał Mirosław
Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24net: RPS: Enable hardware acceleration of RFSBen Hutchings
Allow drivers for multiqueue hardware with flow filter tables to accelerate RFS. The driver must: 1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion IRQs (in queue order). This will provide a mapping from CPUs to the queues for which completions are handled nearest to them. 2. Implement net_device_ops::ndo_rx_flow_steer. This operation adds or replaces a filter steering the given flow to the given RX queue, if possible. 3. Periodically remove filters for which rps_may_expire_flow() returns true. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24lib: cpu_rmap: CPU affinity reverse-mappingBen Hutchings
When initiating I/O on a multiqueue and multi-IRQ device, we may want to select a queue for which the response will be handled on the same or a nearby CPU. This requires a reverse-map of IRQ affinity. Add library functions to support a generic reverse-mapping from CPUs to objects with affinity and the specific case where the objects are IRQs. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24Merge branch 'irq/numa' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
2011-01-24Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/sched/sch_hfsc.c net/sched/sch_htb.c net/sched/sch_tbf.c
2011-01-24Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
2011-01-25Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: RTC: Remove Kconfig symbol for UIE emulation RTC: Properly handle rtc_read_alarm error propagation and fix bug RTC: Propagate error handling via rtc_timer_enqueue properly acpi_pm: Clear pmtmr_ioport if acpi_pm initialization fails rtc: Cleanup removed UIE emulation declaration hrtimers: Notify hrtimer users of switches to NOHZ mode
2011-01-24Merge branch 'BUG_ON' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * 'BUG_ON' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: Remove MAYBE_BUILD_BUG_ON BUILD_BUG_ON: make it handle more cases
2011-01-24Remove MAYBE_BUILD_BUG_ONRusty Russell
Now BUILD_BUG_ON() can handle optimizable constants, we don't need MAYBE_BUILD_BUG_ON any more. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24BUILD_BUG_ON: make it handle more casesRusty Russell
BUILD_BUG_ON used to use the optimizer to do code elimination or fail at link time; it was changed to first the size of a negative array (a nicer compile time error), then (in 8c87df457cb58fe75b9b893007917cf8095660a0) to a bitfield. This forced us to change some non-constant cases to MAYBE_BUILD_BUG_ON(); as Jan points out in that commit, it didn't work as intended anyway. bitfields: needs a literal constant at parse time, and can't be put under "if (__builtin_constant_p(x))" for example. negative array: can handle anything, but if the compiler can't tell it's a constant, silently has no effect. link time: breaks link if the compiler can't determine the value, but the linker output is not usually as informative as a compiler error. If we use the negative-array-size method *and* the link time trick, we get the ability to use BUILD_BUG_ON() under __builtin_constant_p() branches, and maximal ability for the compiler to detect errors at build time. We also document it thoroughly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jan Beulich <JBeulich@novell.com> Acked-by: Hollis Blanchard <hollisb@us.ibm.com>