Age | Commit message (Collapse) | Author |
|
When we hotremove a memory device, we will free the memory to store struct
page. If the page is hwpoisoned page, we should decrease mce_bad_pages.
[akpm@linux-foundation.org: cleanup ifdefs]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
hwpoisoned may be set when we offline a page by the sysfs interface
/sys/devices/system/memory/soft_offline_page or
/sys/devices/system/memory/hard_offline_page. If we don't clear
this flag when onlining pages, this page can't be freed, and will
not in free list. So we can't offline these pages again. So we
should skip such page when offlining pages.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
warning
When calling remove_memory_block(), the function shows following message
at device_release().
"Device 'memory528' does not have a release() function, it is broken and
must be fixed."
The reason is memory_block's device struct does not have a release()
function.
So the patch registers memory_block_release() to the device's release()
function for suppressing the warning message. Additionally, the patch
moves kfree(mem) into the release function since the release function is
prepared as a means to free a memory_block struct.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Introduce mk_huge_pmd() to simplify the code
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Multiple places do the same check.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Several place need to find the pmd by(mm_struct, address), so introduce a
function to simplify it.
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are duplicated places using release_pte_pages().
And release_all_pte_pages() can be removed.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We don't need custom COMPACTION_BUILD anymore, since we have handy
IS_ENABLED().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We don't need custom NUMA_BUILD anymore, since we have handy
IS_ENABLED().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mem_cgroup_out_of_memory() is only referenced from within file scope, so
it can be marked static.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is useful to diagnose the reason for page allocation failure for
cases where there appear to be several free pages.
Example, with this alloc_pages(GFP_ATOMIC) failure:
swapper/0: page allocation failure: order:0, mode:0x0
...
Mem-info:
Normal per-cpu:
CPU 0: hi: 90, btch: 15 usd: 48
CPU 1: hi: 90, btch: 15 usd: 21
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:84 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:4026 slab_reclaimable:75 slab_unreclaimable:484
mapped:0 shmem:0 pagetables:0 bounce:0
Normal free:16104kB min:2296kB low:2868kB high:3444kB active_anon:0kB
inactive_anon:0kB active_file:0kB inactive_file:336kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:331776kB mlocked:0kB
dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:300kB
slab_unreclaimable:1936kB kernel_stack:328kB pagetables:0kB unstable:0kB
bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Before the patch, it's hard (for me, at least) to say why all these free
chunks weren't considered for allocation:
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB
1*1024kB 1*2048kB 3*4096kB = 16128kB
After the patch, it's obvious that the reason is that all of these are
in the MIGRATE_CMA (C) freelist:
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB
(C) 1*1024kB (C) 1*2048kB (C) 3*4096kB (C) = 16128kB
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is no reason to pass the nr_pages_dirtied argument, because
nr_pages_dirtied value from the caller is unused in
balance_dirty_pages_ratelimited_nr().
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull device tree changes from Grant Likely:
"Here are the DT changes I've got queued up for v3.8. As described
below, there are a lot of bug fixes here and documentation updates but
nothing major:
Bug fixes, little cleanups, and documentation changes. The most
invasive thing here touches a bunch of the arch directories to use a
common build rule for .dtb files. There are no major changes to
functionality here other than a few new helper functions."
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6: (34 commits)
arm64: Fix the dtbs target building
mtd: nand: davinci: fix the binding documentation
rtc: rtc-mv: Add the device tree binding documentation
devicetree/bindings: Move gpio-leds binding into leds directory
of/vendor-prefixes: add Imagination Technologies
microblaze: use new common dtc rule
c6x: use new common dtc rule
openrisc: use new common dtc rule
arm64: Add dtbs target for building all the enabled dtb files
arm64: use new common dtc rule
ARM: dt: change .dtb build rules to build in dts directory
kbuild: centralize .dts->.dtb rule
Fix build when CONFIG_W1_MASTER_GPIO=m b exporting "allnodes"
of/spi: Honour "status=disabled" property of device
of_mdio: Honour "status=disabled" property of device
of_i2c: Honour "status=disabled" property of device
powerpc: Fix fallout from device_node->name constification
of: add 'const' for of_parse_phandle parameter *np
Documentation: correct of_platform_populate() argument list
script: dtc: clean generated files
...
|
|
Pull irqdomain changes from Grant Likely:
"Trivial changes to irqdomain. An update to the documentation and make
one of the error paths not quite so obnoxious."
* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
irqdomain: update documentation
irqdomain: stop screaming about preallocated irqdescs
|
|
Pull EDAC fixes from Borislav Petkov:
- EDAC core error path fix, from Denis Kirjanov.
- Generalization of AMD MCE bank names and some minor error reporting
improvements.
- EDAC core cleanups and simplifications, from Wei Yongjun.
- amd64_edac fixes for sysfs-reported values, from Josh Hunt.
- some heavy amd64_edac error reporting path shaving, leading to
removing a bunch of code.
- amd64_edac error injection method improvements.
- EDAC core cleanups and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
EDAC: Handle error path in edac_mc_sysfs_init() properly
MCE, AMD: Dump error status
MCE, AMD: Report decoded error type first
MCE, AMD: Dump CPU f/m/s triple with the error
MCE, AMD: Remove functional unit references
EDAC: Convert to use simple_open()
EDAC, Calxeda highbank: Convert to use simple_open()
EDAC: Fix mc size reported in sysfs
EDAC: Fix csrow size reported in sysfs
EDAC: Pass mci parent
EDAC: Add memory controller flags
amd64_edac: Fix csrows size and pages computation
amd64_edac: Use DBAM_DIMM macro
amd64_edac: Fix K8 chip select reporting
amd64_edac: Reorganize error reporting path
amd64_edac: Do not check whether error address is valid
amd64_edac: Improve error injection
amd64_edac: Cleanup error injection code
amd64_edac: Small fixlets and cleanups
...
|
|
git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull CMA and DMA-mapping update from Marek Szyprowski:
"Another set of Contiguous Memory Allocator and DMA-mapping framework
updates for v3.8.
This pull request consists only of two patches. The first fixes a
long standing issue with dmapools (the code predates current GIT
history), which forced all allocations to use GFP_ATOMIC flag,
ignoring the flags passed by the caller. The second patch changes CMA
code to correctly use phys_addr_t type what enables support for LPAE
systems."
* 'for-v3.8' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
drivers: cma: represent physical addresses as phys_addr_t
mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls
|
|
Pull clock framework changes from Mike Turquette:
"The common clock framework changes for 3.8 are comprised of lots of
fixes for existing platforms as well as new ports for some ARM
platforms. In addition there are new clk drivers for audio devices
and MFDs."
Fix up trivial conflict in <linux/clk-provider.h> (removal of 'inline'
clashing with return type fixes)
* tag 'clk-for-linus' of git://git.linaro.org/people/mturquette/linux: (51 commits)
MAINTAINERS: bad email address for Mike Turquette
clk: introduce optional disable_unused callback
clk: ux500: fix bit error
clk: clock multiplexers may register out of order
clk: ux500: Initial support for abx500 clock driver
CLK: SPEAr: Remove unused dummy apb_pclk
CLK: SPEAr: Correct index scanning done for clock synths
CLK: SPEAr: Update clock rate table
CLK: SPEAr: Add missing clocks
CLK: SPEAr: Set CLK_SET_RATE_PARENT for few clocks
CLK: SPEAr13xx: fix parent names of multiple clocks
CLK: SPEAr13xx: Fix mux clock names
CLK: SPEAr: Fix dev_id & con_id for multiple clocks
clk: move IM-PD1 clocks to drivers/clk
clk: make ICST driver handle the VCO registers
clk: add GPLv2 headers to the Versatile clock files
clk: mxs: Use a better name for the USB PHY clock
clk: spear: Add stub functions for spear3[0|1|2]0_clk_init()
CLK: clk-twl6040: fix return value check in twl6040_clk_probe()
clk: ux500: Register nomadik keypad clock lookups for u8500
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pinctrl changes from Linus Walleij:
"These are the first and major pinctrl changes for the v3.8 merge
cycle. Some of this is used as merge base for other trees so I better
be early on the trigger.
As can be seen from the diffstat the major changes are:
- A big conversion of the AT91 pinctrl driver and the associated ACKed
platform changes under arch/arm/max-at91 and its device trees. This
has been coordinated with the AT91 maintainers to go in through the
pinctrl tree.
- A larger chunk of changes to the SPEAr drivers and the addition of
the "plgpio" driver for the SPEAr as well.
- The removal of the remnants of the Nomadik driver from the arch/arm
tree and fusion of that into the Nomadik driver and platform data
header files.
- Some local movement in the Marvell MVEBU drivers, these now have
their own subdirectory.
- The addition of a chunk of code to gpiolib under drivers/gpio to
register gpio-to-pin range mappings from the GPIO side of things.
This has been requested by Grant Likely and is now implemented, it
is particularly useful for device tree work.
Then we have incremental updates all over the place, many of these are
cleanups and fixes from Axel Lin who has done a great job of removing
minor mistakes and compilation annoyances."
* tag 'pinctrl-for-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (114 commits)
ARM: mmp: select PINCTRL for ARCH_MMP
pinctrl: Drop selecting PINCONF for MMP2, PXA168 and PXA910
pinctrl: pinctrl-single: Fix error check condition
pinctrl: SPEAr: Update error check for unsigned variables
gpiolib: Fix use after free in gpiochip_add_pin_range
gpiolib: rename pin range arguments
pinctrl: single: support gpio request and free
pinctrl: generic: add input schmitt disable parameter
pinctrl/u300/coh901: stop spawning pinctrl from GPIO
pinctrl/u300/coh901: let the gpio_chip register the range
pinctrl: add function to retrieve range from pin
gpiolib: return any error code from range creation
pinctrl: make range registration defer properly
gpiolib: rename find_pinctrl_*
gpiolib: let gpiochip_add_pin_range() specify offset
ARM: at91: pm9g45: add mmc support
ARM: at91: Animeo IP: add mmc support
ARM: at91: dt: add mmc pinctrl for Atmel reference boards
ARM: at91: dt: at91sam9: add mmc pinctrl support
ARM: at91/dts: add nodes for atmel hsmci controllers for atmel boards
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New driver: DA9055
Added/improved support for new chips in existing drivers: Z650/670,
N550/570, ADS7830, AMD 16h family"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (da9055) Fix chan_mux[DA9055_ADC_ADCIN3] setting
hwmon: DA9055 HWMON driver
hwmon: (coretemp) List TjMax for Z650/670 and N550/570
hwmon: (coretemp) Drop N4xx, N5xx, D4xx, D5xx CPUs from tjmax table
hwmon: (coretemp) Use model table instead of if/else to identify CPU models
hwmon: da9052: Use da9052_reg_update for rmw operations
hwmon: (coretemp) Drop dependency on PCI for TjMax detection on Atom CPUs
hwmon: (ina2xx) use module_i2c_driver to simplify the code
hwmon: (ads7828) add support for ADS7830
hwmon: (ads7828) driver cleanup
x86,AMD: Power driver support for AMD's family 16h processors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
Pull MMC updates from Chris Ball:
"MMC highlights for 3.8:
Core:
- Expose access to the eMMC RPMB ("Replay Protected Memory Block")
area by extending the existing mmc_block ioctl.
- Add SDIO powered-suspend DT properties to the core MMC DT binding.
- Add no-1-8-v DT flag for boards where the SD controller reports
that it supports 1.8V but the board itself has no way to switch to
1.8V.
- More work on switching to 1.8V UHS support using a vqmmc regulator.
- Fix up a case where the slot-gpio helper may fail to reset the host
controller properly if a card was removed during a transfer.
- Fix several cases where a broken device could cause an infinite
loop while we wait for a register to update.
Drivers:
- at91-mci: Remove obsolete driver, atmel-mci handles these devices
now.
- sdhci-dove: Allow using GPIOs for card-detect notifications.
- sdhci-esdhc: Fix for recovering from ADMA errors on broken silicon.
- sdhci-s3c: Add pinctrl support.
- wmt-sdmmc: New driver for WonderMedia SD/MMC controllers."
* tag 'mmc-updates-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (65 commits)
mmc: sdhci: implement the .card_event() method
mmc: extend the slot-gpio card-detection to use host's .card_event() method
mmc: add a card-event host operation
mmc: sdhci-s3c: Fix compilation warning
mmc: sdhci-pci: Enable SDHCI_CAN_DO_HISPD for Ricoh SDHCI controller
mmc: sdhci-dove: allow GPIOs to be used for card detection on Dove
mmc: sdhci-dove: use two-stage initialization for sdhci-pltfm
mmc: sdhci-dove: use devm_clk_get()
mmc: eSDHC: Recover from ADMA errors
mmc: dw_mmc: remove duplicated buswidth code
mmc: dw_mmc: relocate where dw_mci_setup_bus() is called from
mmc: Limit MMC speed to 52MHz if not HS200
mmc: dw_mmc: use devres functions in dw_mmc
mmc: sh_mmcif: remove unneeded clock connection ID
mmc: sh_mobile_sdhi: remove unneeded clock connection ID
mmc: sh_mobile_sdhi: fix clock frequency printing
mmc: Remove redundant null check before kfree in bus.c
mmc: Remove redundant null check before kfree in sdio_bus.c
mmc: sdhci-imx-esdhc: use more devm_* functions
mmc: dt: add no-1-8-v device tree flag
...
|
|
This commit changes the CMA early initialization code to use phys_addr_t
for representing physical addresses instead of unsigned long.
Without this change, among other things, dma_declare_contiguous() simply
discards any memory regions whose address is not representable as unsigned
long.
This is a problem on 32-bit PAE machines where unsigned long is 32-bit
but physical address space is larger.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Cyril Chemparathy <cyril@ti.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
|
|
dmapool always calls dma_alloc_coherent() with GFP_ATOMIC flag,
regardless the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionaly,
on ARM architecture any driver which is using dmapools will sooner or
later trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.
This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.
Reported-by: Soeren Moch <smoch@web.de>
Reported-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Soeren Moch <smoch@web.de>
Cc: stable@vger.kernel.org
|
|
Signed-off-by: Mike Turquette <mturquette@linaro.org>
|
|
Some gate clocks have special needs which must be handled during the
disable-unused clocks sequence. These needs might be driven by software
due to the fact that we're disabling a clock outside of the normal
clk_disable path and a clk's enable_count will not be accurate. On the
other hand a specific hardware programming sequence might need to be
followed for this corner case.
This change is needed for the upcoming OMAP port to the common clock
framework. Specifically, it is undesirable to treat the disable-unused
path identically to the normal clk_disable path since other software
layers are involved. In this case OMAP's clockdomain code throws WARNs
and bails early due to the clock's enable_count being set to zero. A
custom callback mitigates this problem nicely.
Cc: Paul Walmsley <paul@pwsan.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
|
|
|
|
The arch/arm64/Makefile was not passing the right target to the
boot/dts/Makefile.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
|
|
The matrix-keymap module is currently lacking a proper module license,
add one so we don't have this module tainting the entire kernel. This
issue has been present since commit 1932811f426f ("Input: matrix-keymap
- uninline and prepare for device tree support")
Signed-off-by: Florian Fainelli <florian@openwrt.org>
CC: stable@vger.kernel.org # v3.5+
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
1) Netlink socket dumping had several missing verifications and checks.
In particular, address comparisons in the request byte code
interpreter could access past the end of the address in the
inet_request_sock.
Also, address family and address prefix lengths were not validated
properly at all.
This means arbitrary applications can read past the end of certain
kernel data structures.
Fixes from Neal Cardwell.
2) ip_check_defrag() operates in contexts where we're in the process
of, or about to, input the packet into the real protocols
(specifically macvlan and AF_PACKET snooping).
Unfortunately, it does a pskb_may_pull() which can modify the
backing packet data which is not legal if the SKB is shared. It
very much can be shared in this context.
Deal with the possibility that the SKB is segmented by using
skb_copy_bits().
Fix from Johannes Berg based upon a report by Eric Leblond.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
ipv4: ip_check_defrag must not modify skb before unsharing
inet_diag: validate port comparison byte code to prevent unsafe reads
inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
|
|
Since the aemif driver conversion to DT along with
its movement to drivers/ folder is not yet done,
fix NAND binding documentation to have NAND specific
DT details only.
Signed-off-by: Kumar, Anil <anilkumar.v@ti.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
The support was already written, but the binding documentation was
lacking.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
This reverts commits a50915394f1fc02c2861d3b7ce7014788aa5066e and
d7c3b937bdf45f0b844400b7bf6fd3ed50bac604.
This is a revert of a revert of a revert. In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.
It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all. We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.
When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation. That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.
So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake. Let's hope we never revisit
this mistake again - and certainly not this many times ;)
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ip_check_defrag() might be called from af_packet within the
RX path where shared SKBs are used, so it must not modify
the input SKB before it has unshared it for defragmentation.
Use skb_copy_bits() to get the IP header and only pull in
everything later.
The same is true for the other caller in macvlan as it is
called from dev->rx_handler which can also get a shared SKB.
Reported-by: Eric Leblond <eric@regit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
deferred or contended"
This reverts commit 782fd30406ecb9d9b082816abe0c6008fc72a7b0.
We are going to reinstate the __GFP_NO_KSWAPD flag that has been
removed, the removal reverted, and then removed again. Making this
commit a pointless fixup for a problem that was caused by the removal of
__GFP_NO_KSWAPD flag.
The thing is, we really don't want to wake up kswapd for THP allocations
(because they fail quite commonly under any kind of memory pressure,
including when there is tons of memory free), and these patches were
just trying to fix up the underlying bug: the original removal of
__GFP_NO_KSWAPD in commit c654345924f7 ("mm: remove __GFP_NO_KSWAPD")
was simply bogus.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.
Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add logic to check the address family of the user-supplied conditional
and the address family of the connection entry. We now do not do
prefix matching of addresses from different address families (AF_INET
vs AF_INET6), except for the previously existing support for having an
IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
maintains as-is).
This change is needed for two reasons:
(1) The addresses are different lengths, so comparing a 128-bit IPv6
prefix match condition to a 32-bit IPv4 connection address can cause
us to unwittingly walk off the end of the IPv4 address and read
garbage or oops.
(2) The IPv4 and IPv6 address spaces are semantically distinct, so a
simple bit-wise comparison of the prefixes is not meaningful, and
would lead to bogus results (except for the IPv4-mapped IPv6 case,
which this commit maintains).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
operations.
Previously we did not validate the inet_diag_hostcond, address family,
address length, and prefix length. So a malicious user could make the
kernel read beyond the end of the bytecode array by claiming to have a
whole inet_diag_hostcond when the bytecode was not long enough to
contain a whole inet_diag_hostcond of the given address family. Or
they could make the kernel read up to about 27 bytes beyond the end of
a connection address by passing a prefix length that exceeded the
length of addresses of the given family.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.
With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit c702418f8a2f ("mm: vmscan: do not keep kswapd looping forever due
to individual uncompactable zones") removed zone watermark checks from
the compaction code in kswapd but left in the zone congestion clearing,
which now happens unconditionally on higher order reclaim.
This messes up the reclaim throttling logic for zones with
dirty/writeback pages, where zones should only lose their congestion
status when their watermarks have been restored.
Remove the clearing from the zone compaction section entirely. The
preliminary zone check and the reclaim loop in kswapd will clear it if
the zone is considered balanced.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The direct-IO write path already had the i_size checks in mm/filemap.c,
but it turns out the read path did not, and removing the block size
checks in fs/block_dev.c (commit bbec0270bdd8: "blkdev_max_block: make
private to fs/buffer.c") removed the magic "shrink IO to past the end of
the device" code there.
Fix it by truncating the IO to the size of the block device, like the
write path already does.
NOTE! I suspect the write path would be *much* better off doing it this
way in fs/block_dev.c, rather than hidden deep in mm/filemap.c. The
mm/filemap.c code is extremely hard to follow, and has various
conditionals on the target being a block device (ie the flag passed in
to 'generic_write_checks()', along with a conditional update of the
inode timestamp etc).
It is also quite possible that we should treat this whole block device
size as a "s_maxbytes" issue, and try to make the logic even more
generic. However, in the meantime this is the fairly minimal targeted
fix.
Noted by Milan Broz thanks to a regression test for the cryptsetup
reencrypt tool.
Reported-and-tested-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
"Two stragglers:
1) The new code that adds new flushing semantics to GRO can cause SKB
pointer list corruption, manage the lists differently to avoid the
OOPS. Fix from Eric Dumazet.
2) When TCP fast open does a retransmit of data in a SYN-ACK or
similar, we update retransmit state that we shouldn't triggering a
WARN_ON later. Fix from Yuchung Cheng."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: gro: fix possible panic in skb_gro_receive()
tcp: bug fix Fast Open client retransmission
|
|
commit 2e71a6f8084e (net: gro: selective flush of packets) added
a bug for skbs using frag_list. This part of the GRO stack is rarely
used, as it needs skb not using a page fragment for their skb->head.
Most drivers do use a page fragment, but some of them use GFP_KERNEL
allocations for the initial fill of their RX ring buffer.
napi_gro_flush() overwrite skb->prev that was used for these skb to
point to the last skb in frag_list.
Fix this using a separate field in struct napi_gro_cb to point to the
last fragment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If SYN-ACK partially acks SYN-data, the client retransmits the
remaining data by tcp_retransmit_skb(). This increments lost recovery
state variables like tp->retrans_out in Open state. If loss recovery
happens before the retransmission is acked, it triggers the WARN_ON
check in tcp_fastretrans_alert(). For example: the client sends
SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
another 4 data packets and get 3 dupacks.
Since the retransmission is not caused by network drop it should not
update the recovery state variables. Further the server may return a
smaller MSS than the cached MSS used for SYN-data, so the retranmission
needs a loop. Otherwise some data will not be retransmitted until timeout
or other loss recovery events.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extracting a part of the SDHCI card tasklet into a .card_event()
implementation allows SDHCI hosts to use generic card-detection
services, e.g. the GPIO slot function.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Reviewed-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
|
|
The slot-gpio API provides a generic card-detection handler. To support a
wider range of hosts it has to call the host's card-event callback, if
implemented. Also increase the debounce interval to 200ms to match the
SDHCI driver.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Reviewed-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
|
|
Some hosts need to perform additional actions upon card insertion or
ejection. Add a host operation to be called from card detection handlers.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Reviewed-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
|
|
'sc' is used only when CONFIG_PM_RUNTIME is defined. Hence define it
conditionally.
Silences the following warning:
drivers/mmc/host/sdhci-s3c.c: In function ‘sdhci_s3c_notify_change’:
drivers/mmc/host/sdhci-s3c.c:378:20: warning: unused variable ‘sc’ [-Wunused-variable]
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
Pull MMC fixes from Chris Ball:
"Two small regression fixes:
- sdhci-s3c: Fix runtime PM regression against 3.7-rc1
- sh-mmcif: Fix oops against 3.6"
* tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
mmc: sdhci-s3c: fix missing clock for gpio card-detect
|
|
This fixes a regression in 3.7-rc, which has since gone into stable.
Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().
Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e,
alloc_pages_vma() has kept refcount in balance, so now no problem.
Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
uncompactable zones
When a zone meets its high watermark and is compactable in case of
higher order allocations, it contributes to the percentage of the node's
memory that is considered balanced.
This requirement, that a node be only partially balanced, came about
when kswapd was desparately trying to balance tiny zones when all bigger
zones in the node had plenty of free memory. Arguably, the same should
apply to compaction: if a significant part of the node is balanced
enough to run compaction, do not get hung up on that tiny zone that
might never get in shape.
When the compaction logic in kswapd is reached, we know that at least
25% of the node's memory is balanced properly for compaction (see
zone_balanced and pgdat_balanced). Remove the individual zone checks
that restart the kswapd cycle.
Otherwise, we may observe more endless looping in kswapd where the
compaction code loops back to reclaim because of a single zone and
reclaim does nothing because the node is considered balanced overall.
See for example
https://bugzilla.redhat.com/show_bug.cgi?id=866988
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-and-tested-by: Thorsten Leemhuis <fedora@leemhuis.info>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Tested-by: John Ellson <john.ellson@comcast.net>
Tested-by: Zdenek Kabelac <zkabelac@redhat.com>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 0bf380bc70ec ("mm: compaction: check pfn_valid when entering a
new MAX_ORDER_NR_PAGES block during isolation for migration") added a
check for pfn_valid() when isolating pages for migration as the scanner
does not necessarily start pageblock-aligned.
Since commit c89511ab2f8f ("mm: compaction: Restart compaction from near
where it left off"), the free scanner has the same problem. This patch
makes sure that the pfn range passed to isolate_freepages_block() is
within the same block so that pfn_valid() checks are unnecessary.
In answer to Henrik's wondering why others have not reported this:
reproducing this requires a large enough hole with the right aligment to
have compaction walk into a PFN range with no memmap. Size and
alignment depends in the memory model - 4M for FLATMEM and 128M for
SPARSEMEM on x86. It needs a "lucky" machine.
Reported-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|