summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-12-16unix_diag: Write it into kbuildPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Receive queue lenght NLAPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Pending connections IDs NLAPavel Emelyanov
When establishing a unix connection on stream sockets the server end receives an skb with socket in its receive queue. Report who is waiting for these ends to be accepted for listening sockets via NLA. There's a lokcing issue with this -- the unix sk state lock is required to access the peer, and it is taken under the listening sk's queue lock. Strictly speaking the queue lock should be taken inside the state lock, but since in this case these two sockets are different it shouldn't lead to deadlock. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Unix peer inode NLAPavel Emelyanov
Report the peer socket inode ID as NLA. With this it's finally possible to find out the other end of an interesting unix connection. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Unix inode info NLAPavel Emelyanov
Actually, the socket path if it's not anonymous doesn't give a clue to which file the socket is bound to. Even if the path is absolute, it can be unlinked and then new socket can be bound to it. With this NLA it's possible to check which file a particular socket is really bound to. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Unix socket name NLAPavel Emelyanov
Report the sun_path when requested as NLA. With leading '\0' if present but without the leading AF_UNIX bits. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Dumping exact socket corePavel Emelyanov
The socket inode is used as a key for lookup. This is effectively the only really unique ID of a unix socket, but using this for search currently has one problem -- it is O(number of sockets) :( Does it worth fixing this lookup or inventing some other ID for unix sockets? Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Dumping all sockets corePavel Emelyanov
Walk the unix sockets table and fill the core response structure, which includes type, state and inode. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16unix_diag: Basic module skeletonPavel Emelyanov
Includes basic module_init/_exit functionality, dump/get_exact stubs and declares the basic API structures for request and response. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16af_unix: Export stuff required for diag modulePavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sock_diag: Generalize requests cookies managementsPavel Emelyanov
The sk address is used as a cookie between dump/get_exact calls. It will be required for unix socket sdumping, so move it from inet_diag to sock_diag. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sock_diag: Fix module netlink aliasesPavel Emelyanov
I've made a mistake when fixing the sock_/inet_diag aliases :( 1. The sock_diag layer should request the family-based alias, not just the IPPROTO_IP one; 2. The inet_diag layer should request for AF_INET+protocol alias, not just the protocol one. Thus fix this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sock_diag: Move the SOCK_DIAG_BY_FAMILY cmd declarationPavel Emelyanov
It should belong to sock_diag, not inet_diag. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/freescale/fsl_pq_mdio.c net/batman-adv/translation-table.c net/ipv6/route.c
2011-12-15tg3: Break out RSS indir table init and assignmentMatt Carlson
This patch creates a new device member to hold the RSS indirection table and separates out the code that initializes the table from the code that programs the table into device registers. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-15tg3: Use mii_advertise_flowctrlMatt Carlson
This patch replaces tg3's internal tg3_advert_flowctrl_1000T function with mii_advertise_flowctrl provided by the kernel headers. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-15tg3: Add 57766 ASIC rev supportMatt Carlson
This patch adds support for the 57766 ASIC revision. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-15tg3: Make the TX BD DMA limit configurableMatt Carlson
The 57766 ASIC rev will impose a new TX BD DMA limit on the driver. This patch prepares for 57766 support by making the tx BD DMA limit tunable. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-15tg3: Enable EEE support for capable 10/100 devsMatt Carlson
There are some devices in the 57765 ASIC rev that are EEE capable. Unfortunately the EEE setup code only gets executed if the device is gigabit capable. This patch fixes the problem. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-15tcp_memcontrol: fix reversed if conditionDan Carpenter
We should only dereference the pointer if it's valid, not the other way round. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-15Move limit definitions outside CONFIG_INETGlauber Costa
They need to be available for other protocols as well, since they are used in sock.c openly Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14inet: remove rcu protection on tw_netEric Dumazet
commit b099ce2602d806 (net: Batch inet_twsk_purge) added rcu protection on tw_net for no obvious reason. struct net are refcounted anyway since timewait sockets escape from rcu protected sections. tw_net stay valid for the whole timwait lifetime. This also removes a lot of sparse errors. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14net: ping: remove some sparse errorsEric Dumazet
net/ipv4/sysctl_net_ipv4.c:78:6: warning: symbol 'inet_get_ping_group_range_table' was not declared. Should it be static? net/ipv4/sysctl_net_ipv4.c:119:31: warning: incorrect type in argument 2 (different signedness) net/ipv4/sysctl_net_ipv4.c:119:31: expected int *range net/ipv4/sysctl_net_ipv4.c:119:31: got unsigned int *<noident> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14cls_flow: remove one dynamic arrayEric Dumazet
Its better to use a predefined size for this small automatic variable. Removes a sparse error as well : net/sched/cls_flow.c:288:13: error: bad constant expression Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14bnx2x: handle vpd data longer than 128 bytesBarak Witkowski
Signed-off-by: Barak Witkowski <barak@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14vlan: static functionsEric Dumazet
commit 6d4cdf47d2 (vlan: add 802.1q netpoll support) forgot to declare as static some private functions. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14vlan: add rtnl_dereference() annotationsDan Carpenter
The original code generates a Sparse warning: net/8021q/vlan_core.c:336:9: error: incompatible types in comparison expression (different address spaces) It's ok to dereference __rcu pointers here because we are holding the RTNL lock. I've added some calls to rtnl_dereference() to silence the warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14rtnetlink: rtnl_link_register() sanity testEric Dumazet
Before adding a struct rtnl_link_ops into link_ops list, check it doesnt clash with a prior one. Based on a previous patch from Alexander Smirnov Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13ipv6: Check dest prefix length on original route not copied one in ↵David S. Miller
rt6_alloc_cow(). After commit 8e2ec639173f325977818c45011ee176ef2b11f6 ("ipv6: don't use inetpeer to store metrics for routes.") the test in rt6_alloc_cow() for setting the ANYCAST flag is now wrong. 'rt' will always now have a plen of 128, because it is set explicitly to 128 by ip6_rt_copy. So to restore the semantics of the test, check the destination prefix length of 'ort'. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13ipv6: If neigh lookup fails during icmp6 dst allocation, propagate error.David S. Miller
Don't just succeed with a route that has a NULL neighbour attached. This follows the behavior of addrconf_dst_alloc(). Allowing this kind of route to end up with a NULL neigh attached will result in packet drops on output until the route is somehow invalidated, since nothing will meanwhile try to lookup the neigh again. A statistic is bumped for the case where we see a neigh-less route on output, but the resulting packet drop is otherwise silent in nature, and frankly it's a hard error for this to happen and ipv6 should do what ipv4 does which is say something in the kernel logs. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13net: Remove unused neighbour layer ops.David S. Miller
It's simpler to just keep these things out until there is a real user of them, so we can see what the needs actually are, rather than keep these things around as useless overhead. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_en: updated driver version to 2.0Yevgeny Petrilin
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: updated driver version to 1.1Yevgeny Petrilin
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: Modify driver initialization flow to accommodate SRIOV for EthernetJack Morgenstein
1. Added module parameters sr_iov and probe_vf for controlling enablement of SRIOV mode. 2. Increased default max num-qps, num-mpts and log_num_macs to accomodate SRIOV mode 3. Added port_type_array as a module parameter to allow driver startup with ports configured as desired. In SRIOV mode, only ETH is supported, and this array is ignored; otherwise, for the case where the FW supports both port types (ETH and IB), the port_type_array parameter is used. By default, the port_type_array is set to configure both ports as IB. 4. When running in sriov mode, the master needs to initialize the ICM eq table to hold the eq's for itself and also for all the slaves. 5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap. 6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca). VFs obtain their startup information from the PF (master) device via the comm channel. 7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver aborts loading the device. 8. Do not allow setting port type via sysfs when running in SRIOV mode. 9. mlx4_get_ownership: Currently, only one PF is supported by the driver. If the HCA is burned with FW which enables more than one PF, only one of the PFs is allowed to run. The first one up grabs a FW ownership semaphone -- all other PFs will find that semaphore taken, and the driver will not allow them to run. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Liran Liss <liranl@mellanox.co.il> Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: adjust catas operation for SRIOV modeJack Morgenstein
When running in SRIOV mode, driver should not automatically start/stop the mlx4_core upon sensing an HCA internal error -- doing this disables/enables sriov, which will cause the hypervisor to hang if there are running VMs with attached VFs. In addition, on VMs the catas process should not run at all, since the HCA error buffer is not available to VMs in the BARs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: mtts resources units changed to offsetMarcel Apfelbaum
In the previous implementation mtts are managed by: 1. order - log(mtt segments), 'mtt segment' groups several mtts together. 2. first_seg - segment location relative to mtt table. In the current implementation: 1. order - log(mtts) rather than segments 2. offset - mtt index in mtt table Note: The actual mtt allocation is made in segments but it is transparent to callers. Rational: The mtt resource holders are not interested on how the allocation of mtt is done, but rather on how they will use it. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_en: Allow communication between functions on same hostEugenia Emantayev
To enable internal loopback, always fill DMAC in control segment when transmitting the packet, once this is done, the packet is subject for loopback for if the DMAC mathces one of the multicast/unicast addresses registered on the physical port. In receive path if source MAC is our own MAC and we are not in selftest, or not in force LB mode - drop this packet. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4: Ethernet port management modificationsEugenia Emantayev
The physical port is now common to the PF and VFs. The port resources and configuration is managed by the PF, VFs can only influence the MTU of the port, it is set as max among all functions, Each function allocates RX buffers of required size to meet it's MTU enforcement. Port management code was moved to mlx4_core, as the mlx4_en module is virtualization unaware Move handling qp functionality to mlx4_get_eth_qp/mlx4_put_eth_qp including reserve/release range and add/release unicast steering. Let mlx4_register/unregister_mac deal only with MAC (un)registration. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4: Traffic steering management support for SRIOVEugenia Emantayev
Let multicast/unicast attaching flow go through resource tracker. The PF is the one responsible for managing all the steering entries. Define and use module parameter that determines the number of qps per multicast group. Minor changes in function calls according to changed prototype. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_ib: disable SRIOV mode for IB ports (not yet supported)Jack Morgenstein
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: resource tracking for HCA resources used by guestsEli Cohen
The resource tracker is used to track usage of HCA resources by the different guests. Virtual functions (VFs) are attached to guest operating systems but resources are allocated from the same pool and are assigned to VFs. It is essential that hostile/buggy guests not be able to affect the operation of other VFs, possibly attached to other guest OSs since ConnectX firmware is not tolerant to misuse of resources. The resource tracker module associates each resource with a VF and maintains state information for the allocated object. It also defines allowed state transitions and enforces them. Relationships between resources are also referred to. For example, CQs are pointed to by QPs, so it is forbidden to destroy a CQ if a QP refers to it. ICM memory is always accessible through the primary function and hence it is allocated by the owner of the primary function. When a guest dies, an FLR is generated for all the VFs it owns and all the resources it used are freed. The tracked resource types are: QPs, CQs, SRQs, MPTs, MTTs, MACs, RES_EQs, and XRCDNs. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: Add wrapper functions and comm channel and slave event support to EQsJack Morgenstein
Passing async events to slaves: In SRIOV mode, each slave creates its own async EQ, but only the master can register directly with the FW to receive async events. Async events which should be passed to slaves (such as a WQ_ACCESS_ERROR for a QP owned by a slave) are generated at the slave by the master using the GEN_EQE FW command. Wrapper functions: mlx4_MAP_EQ_wrapper Only the master can map an EQ. The slave commands to map their EQs arrive at the master via the comm channel. The master then invokes the wrapper function to do the work (and enter the resource in the tracking database). New events: COMM_CHANNEL and FLR The COMM_CHANNEL event arrives only at the master, and signals that a slave has posted a command on the comm channel. The FLR event is generated by the FW when a guest operating a VF unexpectedly goes down. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: mtt modifications for SRIOVJack Morgenstein
MTTs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: cq modifications for SRIOVJack Morgenstein
CQs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: qp modifications for SRIOVJack Morgenstein
QPs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: srq modifications for SRIOVJack Morgenstein
SRQs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: Added FW commands and their wrappers for supporting SRIOVMarcel Apfelbaum
The following commands are added here: 1. QUERY_FUNC_CAP and its wrapper. This function is used by VFs when they start up to receive configuration information from the PF, such as resource quotas for this VF, which ports should be used (currently two), what protocol is running on the port (currently Ethernet ONLY, or port not active). 2. QUERY_PORT and its wrapper. Previously, this FW command was invoked directly by the ETH driver (en_port.c) using mlx4_cmd_box. Virtualization is now required here (the VF's MAC address must be substituted for the PFs MAC address returned by the FW). We changed the invocation in the ETH driver to use mlx4_QUERY_PORT, and added the wrapper. 3. QUERY_HCA. Used by the VF to determine how the HCA was initialized. For now, we need only the multicast table member entry size (log2_mc_table_entry_sz, in the ConnectX PRM). No wrapper is needed here, because the data may be passed as is to the VF without modification). In this command, we have added a GLOBAL_CAPS field for passing required configuration information from FW to a VF (this field is to allow safely adding new SRIOV capabilities which require support in VF drivers, too). Bits will set here by FW in response to PF-driver configuration commands which will activate as yet undefined new SRIOV features. The VF will test to see that all required capabilities indicated by this field are supported (i.e., if a bit is set and the VF driver does not recognize that bit, it must abort its initialization). Currently, no bits are set. 4. Added a CLOSE_PORT wrapper. The PF context needs to keep track of how many VF contexts have the port open. The PF context will not actually issue the FW close port command until the last port user issues a CLOSE_PORT request. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13net/mlx4_core: Implement the master-slave communication channelYevgeny Petrilin
When SRIOV is enabled, pf and vfs communicate via shared comm channel. The vf gets its side of the comm channel via a VF BAR. Each VF (slave) creates its vHCR (virtual HCA Command Register), Its DMA address is passed to the PF (master) using Communication Channel Register. The same Register is used to notify the master of commands posted by the slaves and for the master to pass events to the slaves, such as command completions and asynchronous events. The vHCR format is identical to the HCR format, except for the 'go' and 't' bits, which are reserved in the vHCR. Posting commands to the vHCR is identical to the way it is done with the HCR, albeit that the function/PF token fields are used instead of the HCR go bit. Specifically: - When the function prepares a new command in the vHCR, it issues the Post_vHCR_cmd communication channel command and toggles the value of the function token; when PF token has an equal value, the command has been accepted and a new command may be posted. - When the PF detects a Post_vHCR_cmd command, it concludes that a new command is available in the vHCR; after processing the command, the PF toggles the PF token to match the function token. When the 'e' bit is not set, the completion of a Post_vHCR_cmd command also indicates the completion the vHCR command. If, however, the 'e' bit is set, the completion of a Post_vHCR_cmd command only indicates that the vHCR command has been accepted for execution by the PF. Function commands are processed by the PF as follows: -DMA (using the ACCESS_MEM command) the vHCR image into a shadow buffer. -Validate that the opcode is non-privileged, and that the opcode- and input-modifiers are legal. -DMA the in-box (if required) into a shadow buffer. -Validate the command: o Resource ranges (e.g., QP ranges). o Partition key. o Ranges of referenced resources (e.g., CQs within QP contexts). -If the 'e' bit is set o complete the Post_vHCR_cmd command -Execute the command on the HCR. -DMA the results to the vHCR out-box (if required). -If the 'e' bit is set o Indicate command completion by generating a completion event using the GEN_EQE command -Otherwise o DMA the command status to the vHCR o Complete the Post_vHCR_cmd command Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrillin <yevgenyp@mellanox.com> Signed-off-by: Liran Liss <liranl@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: Reduce number of PD bits to 17Jack Morgenstein
When SRIOV is enabled on the chip (at FW burning time), the HCA uses only 17 bits for the PD. The remaining 7 high-order bits are ignored. Change the allocator to return only 17 bits for the PD. The MSB 7 bits will be used to encode the slave number for consistency checking later on in the resource tracker. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13mlx4_core: Add "native" argument to mlx4_cmd and its callers (where needed)Jack Morgenstein
For SRIOV, some Hypervisor commands can be executed directly (native = 1). Others should go through the command wrapper flow (for tracking resource usage, for example, or for changing some HCA configurations that slaves need to be notified of). This patch sets the groundwork for this capability -- adding the correct value of "native" in each case. Note that if SRIOV is not activated, this parameter has no effect. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>