Commits · e045f044e84ef9d500b4477f8e67875e5cb3fc21 · Linaro / qcomlt / demos / linux

Oct 28, 2019

arm64: dts: imx8mq: Move usdhc clocks assignment to board DT · e045f044

Anson Huang authored 5 years ago


usdhc's clock rate is different according to different devices
connected, so clock rate assignment should be placed in board
DT according to different devices connected on each usdhc port.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Abel Vesa <abel.vesa@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

e045f044

arm64: dts: imx8qxp: Move usdhc clocks assignment to board DT · 3944b454

Anson Huang authored 5 years ago


usdhc's clock rate is different according to different devices
connected, so clock rate assignment should be placed in board
DT according to different devices connected on each usdhc port.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Abel Vesa <abel.vesa@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

3944b454

arm64: dts: fsl: Add device tree for S32V234-EVB · bc66392d

Stoica Cosmin-Stefan authored 5 years ago


Add initial version of device tree for S32V234-EVB, including nodes for the
4 Cortex-A53 cores, AIPS bus with UART modules, ARM architected timer and
Generic Interrupt Controller (GIC).

Keep SoC level separate from board level to let future boards with this SoC
share common properties, while the dts files will keep board-dependent
properties.

Signed-off-by: Stoica Cosmin-Stefan <cosmin.stoica@nxp.com>
Signed-off-by: Mihaela Martinas <Mihaela.Martinas@freescale.com>
Signed-off-by: Dan Nica <dan.nica@nxp.com>
Signed-off-by: Larisa Grigore <Larisa.Grigore@nxp.com>
Signed-off-by: Phu Luu An <phu.luuan@nxp.com>
Signed-off-by: Stefan-Gabriel Mirea <stefan-gabriel.mirea@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

bc66392d

arm64: dts: imx8mm-evk: Assigned clocks for audio plls · e8b395b2

S.j. Wang authored 5 years ago


Assign clocks and clock-rates for audio plls, that audio
drivers can utilize them.

Add dai-tdm-slot-num and dai-tdm-slot-width for sound-wm8524,
that sai driver can generate correct bit clock.

Fixes: 13f3b9fd ("arm64: dts: imx8mm-evk: Enable audio codec wm8524")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

e8b395b2

arm64: dts: zii-ultra: Add node for switch watchdog · 4c997d12

Andrey Smirnov authored 5 years ago


Add I2C node for switch watchdog present on both Zest and RMB3 boards.

Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org,
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

4c997d12

arm64: dts: zii-ultra: Add node for accelerometer · 2600069f

Andrey Smirnov authored 5 years ago


Add I2C node for accelerometer present on both Zest and RMB3 boards.

Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org,
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

2600069f

arm64: dts: zii-ultra: Fix regulator-3p3-main's name · 032c10ae

Andrey Smirnov authored 5 years ago


It's 3V3_MAIN, not 3V3V_MAIN on schematic. Fix it.

Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org,
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

032c10ae

arm64: dts: zii-ultra: Fix regulator-vsd-3v3's vin-supply · 7270a6b6

Andrey Smirnov authored 5 years ago

Regulator-vsd-3v3 is supplied via GEN_3V3 rail which is an output of
an "always on" load switch supplied by 3V3_MAIN. GEN_3V3 is also used
as vin-supply by a number of peripherals, so adding it also allows us
to follow the schematic more closely.

Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org,
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

7270a6b6

arm64: dts: ls1028a: Update #clock-cells of dpclk node · 91035cb0

Wen He authored 5 years ago


Update the property #clock-cells = <1> to #clock-cells = <0> of the
dpclk, since the Display output pixel clock driver provides single
clock output.

Signed-off-by: Wen He <wen.he_1@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

91035cb0

Oct 25, 2019

arm64: dts: imx8mq-evk: VDD_ARM power rail is always ON · 13645b1a

Anson Huang authored 5 years ago


On i.MX8MQ EVK board, VDD_ARM is from a DC-DC converter which
is always ON, the GPIO1_IO13 is ONLY to switch VDD_ARM's voltage
between 0.9V and 1V for CPU DVFS, so VDD_ARM's GPIO regulator
should be always ON to avoid below confusion after kernel boot
up:

imx8mqevk login:
[   31.776619] vdd_arm: disabling

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

13645b1a

arm64: dts: imx8qxp-mek: Enable scu key · e0cb59bd

Anson Huang authored 5 years ago


Enable scu key for i.MX8QXP MEK board.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

e0cb59bd

arm64: dts: imx8qxp: Add scu key node · 49dad0c1

Anson Huang authored 5 years ago


Add scu key node for i.MX8QXP, disabled by default as it
depends on board design.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

49dad0c1

Oct 14, 2019

arm64: dts: imx8mn-ddr4-evk: Move iomuxc node to end of file · caa2ac29

Anson Huang authored 5 years ago


All nodes are better to follow alphabetical sort except iomuxc
which has huge pinctrl data, better to put it at the end of
file.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

caa2ac29

arm64: dts: imx8mq-evk: Adjust nodes following alphabetical sort · 0169002f

Anson Huang authored 5 years ago


Adjust some nodes to make them follow alphabetical sort except
iomuxc node which is put at the end of file because of its huge
pinctrl data.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

0169002f

arm64: dts: enable otg mode for dwc3 usb ip on layerscape · 4bfc5303

Yinbo Zhu authored 5 years ago


layerscape otg function should be supported HNP SRP and ADP protocol
accroing to rm doc, but dwc3 code not realize it and use id pin to
detect who is host or device(0 is host 1 is device) this patch is to
enable OTG mode on ls1028ardb ls1088ardb and ls1046ardb in dts

Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

4bfc5303

arm64: dts: imx8mm-evk: Enable pca6416 on i2c3 bus · c8713352

Anson Huang authored 5 years ago


Enable pca6416 on i.MX8MM EVK board's i2c3 bus.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

c8713352

arm64: dts: imx8mm-evk: Add i2c3 support · 4a79aed9

Anson Huang authored 5 years ago


Enable i2c3 for i.MX8MM EVK board.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

4a79aed9

arm64: dts: imx8mm-evk: Adjust i2c nodes following alphabetical sort · d11ece80

Anson Huang authored 5 years ago


The iomuxc node is being put at end of file because of its huge
pinctrl data. I2C devices should be placed in alphabetical sort.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

d11ece80

arm64: dts: ls1028a: Update the clock providers for the Mali DP500 · 13782597

Wen He authored 5 years ago


In order to maximise performance of the LCD Controller's 64-bit AXI
bus, for any give speed bin of the device, the AXI master interface
clock(ACLK) clock can be up to CPU_frequency/2, which is already
capable of optimal performance. In general, ACLK is always expected
to be equal to CPU_frequency/2. APB slave interface clock(PCLK) and
Main processing clock(PCLK) both are tied to the same clock as ACLK.

This change followed the LS1028A Architecture Specification Manual.

Signed-off-by: Wen He <wen.he_1@nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

13782597

Oct 07, 2019

arm64: dts: mark lx2160a esdhc controllers dma coherent · 62b4359c

Russell King authored 5 years ago


The LX2160A esdhc controllers are setup by the driver to be DMA
coherent, but without marking them as such in DT, Linux thinks they
are not.  This can lead to random sporadic DMA errors, even to the
extent of preventing boot, such as:

mmc0: ADMA error
mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
mmc0: sdhci: Blk size:  0x00000008 | Blk cnt:  0x00000001
mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
mmc0: sdhci: Present:   0x01f50008 | Host ctl: 0x00000038
mmc0: sdhci: Power:     0x00000003 | Blk gap:  0x00000000
mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x000040d8
mmc0: sdhci: Timeout:   0x00000003 | Int stat: 0x00000001
mmc0: sdhci: Int enab:  0x037f108f | Sig enab: 0x037f108b
mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
mmc0: sdhci: Caps:      0x35fa0000 | Caps_1:   0x0000af00
mmc0: sdhci: Cmd:       0x0000333a | Max curr: 0x00000000
mmc0: sdhci: Resp[0]:   0x00000920 | Resp[1]:  0x001d8a33
mmc0: sdhci: Resp[2]:   0x325b5900 | Resp[3]:  0x3f400e00
mmc0: sdhci: Host ctl2: 0x00000000
mmc0: sdhci: ADMA Err:  0x00000009 | ADMA Ptr: 0x000000236d43820c
mmc0: sdhci: ============================================
mmc0: error -5 whilst initialising SD card

These are caused by the device's descriptor fetch hitting speculatively
loaded CPU cache lines that the CPU does not see through the normal,
non-cacheable DMA coherent mapping that it uses for non-coherent
devices.

DT and the device must agree wrt whether the device is DMA coherent or
not.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

62b4359c

Oct 06, 2019

arm64: dts: imx8mn: add ddr pmu node · 2d8e0747

Joakim Zhang authored 5 years ago


Add ddr pmu node for i.MX8MN EVK board.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

2d8e0747

arm64: dts: imx8mq: Enable gpu passive throttling · 9404f2ea

Guido Günther authored 5 years ago


Temperature and hysteresis were picked after the CPU.

Signed-off-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

9404f2ea

arm64: dts: imx8mn: Use "fsl,imx8mm-ocotp" as ocotp's fallback compatible · 2bad8c48

Anson Huang authored 5 years ago


Use "fsl,imx8mm-ocotp" as i.MX8MN ocotp's fallback compatible instead
of "fsl,imx7d-ocotp" to support SoC UID read, as i.MX8MN reuses
i.MX8MM's SoC ID driver.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

2bad8c48

arm64: dts: imx8mm: Remove incorrect fallback compatible for ocotp · b09802a0

Anson Huang authored 5 years ago


Compared to i.MX7D, i.MX8MM has different ocotp layout, so it should
NOT use "fsl,imx7d-ocotp" as ocotp's fallback compatible, remove it.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

b09802a0

arm64: dts: imx8mn: Enable cpu-idle driver · df844a9a

Anson Huang authored 5 years ago


Enable i.MX8MN cpu-idle using generic ARM cpu-idle driver, 2 states
are supported, details as below:

root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
WFI
root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/usage
3098
root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/name
cpu-pd-wait
root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage
3078

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

df844a9a

arm64: dts: imx8mn: Add system counter node · c4a21269

Anson Huang authored 5 years ago


Add i.MX8MN system counter node to enable timer-imx-sysctr
broadcast timer driver.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

c4a21269

arm64: dts: imx8mn: Add "fsl,imx8mq-src" as src's fallback compatible · 23b80c20

Anson Huang authored 5 years ago


i.MX8MN can reuse i.MX8MQ's src driver, add "fsl,imx8mq-src" as
src's fallback compatible to enable it.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

23b80c20

arm64: dts: imx8mn-ddr4-evk: Enable GPIO LED · 4c2435a6

Anson Huang authored 5 years ago


i.MX8MN DDR4 EVK board has a GPIO LED to indicate status,
add support for it.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

4c2435a6

Sep 30, 2019

csky: Move static keyword to the front of declaration · 9af032a3

Krzysztof Wilczynski authored 5 years ago


Move the static keyword to the front of declaration of
csky_pmu_of_device_ids, and resolve the following compiler
warning that can be seen when building with warnings
enabled (W=1):

arch/csky/kernel/perf_event.c:1340:1: warning:
  ‘static’ is not at beginning of declaration [-Wold-style-declaration]

Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Guo Ren <guoren@kernel.org>

9af032a3

csky: entry: Remove unneeded need_resched() loop · a2139d3b

Valentin Schneider authored 5 years ago


Since the enabling and disabling of IRQs within preempt_schedule_irq()
is contained in a need_resched() loop, we don't need the outer arch
code loop.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Guo Ren <guoren@kernel.org>

a2139d3b

csky: Fixup csky_pmu.max_period assignment · 3a09d8e2

Mao Han authored 5 years ago


The csky_pmu.max_period has type u64, and BIT() can only return
32 bits unsigned long on C-SKY. The initialization for max_period
will be incorrect when count_width is bigger than 32.

Use BIT_ULL()

Signed-off-by: Mao Han <han_mao@c-sky.com>
Signed-off-by: Guo Ren <ren_guo@c-sky.com>

3a09d8e2

csky: Fixup add zero_fp fixup perf backtrace panic · 48ede51f

Guo Ren authored 5 years ago


We need set fp zero to let backtrace know the end. The patch fixup perf
callchain panic problem, because backtrace didn't know what is the end
of fp.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Reported-by: Mao Han <han_mao@c-sky.com>

48ede51f

csky: Use generic free_initrd_mem() · fdbdcddc

Mike Rapoport authored 5 years ago


The csky implementation of free_initrd_mem() is an open-coded version of
free_reserved_area() without poisoning.

Remove it and make csky use the generic version of free_initrd_mem().

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Guo Ren <guoren@kernel.org>

fdbdcddc

Sep 26, 2019

powerpc/eeh: Fix eeh eeh_debugfs_break_device() with SRIOV devices · 253c8921

Oliver O'Halloran authored 5 years ago


s/CONFIG_IOV/CONFIG_PCI_IOV/

Whoops.

Fixes: bd6461cc ("powerpc/eeh: Add a eeh_dev_break debugfs interface")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Fixup the #endif comment as well]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190926122502.14826-1-oohall@gmail.com

253c8921

arch/sparc/include/asm/pgtable_64.h: fix build · a22fea94

Andrew Morton authored 5 years ago


A last-minute fixlet which I'd failed to merge at the appropriate time
had the predictable effect.

Fixes: f672e2c217e2d4b2 ("lib: untag user pointers in strn*_user")
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a22fea94

mm: treewide: clarify pgtable_page_{ctor,dtor}() naming · b4ed71f5

Mark Rutland authored 5 years ago

The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
people, and until recently arm64 used these erroneously/pointlessly for
other levels of page table.

To make it incredibly clear that these only apply to the PTE level, and to
align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
to pgtable_pte_page_{ctor,dtor}().

These changes were generated with the following shell script:

----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
    sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
    sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----

... with the documentation re-flowed to remain under 80 columns, and
whitespace fixed up in macros to keep backslashes aligned.

There should be no functional change as a result of this patch.

Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com


Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b4ed71f5

hexagon: drop empty and unused free_initrd_mem · c7cc8d77

Mike Rapoport authored 5 years ago

hexagon never reserves or initializes initrd and the only mention of it is
the empty free_initrd_mem() function.

As we have a generic implementation of free_initrd_mem(), there is no need
to define an empty stub for the hexagon implementation and it can be
dropped.

Link: http://lkml.kernel.org/r/1565858133-25852-1-git-send-email-rppt@linux.ibm.com


Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c7cc8d77

mm: introduce MADV_PAGEOUT · 1a4e58cc

Minchan Kim authored 5 years ago

When a process expects no accesses to a certain memory range for a long
time, it could hint kernel that the pages can be reclaimed instantly but
data should be preserved for future use.  This could reduce workingset
eviction so it ends up increasing performance.

This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall.
MADV_PAGEOUT can be used by a process to mark a memory range as not
expected to be used for a long time so that kernel reclaims *any LRU*
pages instantly.  The hint can help kernel in deciding which pages to
evict proactively.

A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit
intentionally because it's automatically bounded by PMD size.  If PMD
size(e.g., 256) makes some trouble, we could fix it later by limit it to
SWAP_CLUSTER_MAX[1].

- man-page material

MADV_PAGEOUT (since Linux x.x)

Do not expect access in the near future so pages in the specified
regions could be reclaimed instantly regardless of memory pressure.
Thus, access in the range after successful operation could cause
major page fault but never lose the up-to-date contents unlike
MADV_DONTNEED. Pages belonging to a shared mapping are only processed
if a write access is allowed for the calling process.

MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
VM_PFNMAP pages.

[1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/

[minchan@kernel.org: clear PG_active on MADV_PAGEOUT]
  Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com
[akpm@linux-foundation.org: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org


Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1a4e58cc

mm: introduce MADV_COLD · 9c276cc6

Minchan Kim authored 5 years ago

Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.

- Background

The Android terminology used for forking a new process and starting an app
from scratch is a cold start, while resuming an existing app is a hot
start.  While we continually try to improve the performance of cold
starts, hot starts will always be significantly less power hungry as well
as faster so we are trying to make hot start more likely than cold start.

To increase hot start, Android userspace manages the order that apps
should be killed in a process called ActivityManagerService.
ActivityManagerService tracks every Android app or service that the user
could be interacting with at any time and translates that into a ranked
list for lmkd(low memory killer daemon).  They are likely to be killed by
lmkd if the system has to reclaim memory.  In that sense they are similar
to entries in any other cache.  Those apps are kept alive for
opportunistic performance improvements but those performance improvements
will vary based on the memory requirements of individual workloads.

- Problem

Naturally, cached apps were dominant consumers of memory on the system.
However, they were not significant consumers of swap even though they are
good candidate for swap.  Under investigation, swapping out only begins
once the low zone watermark is hit and kswapd wakes up, but the overall
allocation rate in the system might trip lmkd thresholds and cause a
cached process to be killed(we measured performance swapping out vs.
zapping the memory by killing a process.  Unsurprisingly, zapping is 10x
times faster even though we use zram which is much faster than real
storage) so kill from lmkd will often satisfy the high zone watermark,
resulting in very few pages actually being moved to swap.

- Approach

The approach we chose was to use a new interface to allow userspace to
proactively reclaim entire processes by leveraging platform information.
This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
that are known to be cold from userspace and to avoid races with lmkd by
reclaiming apps as soon as they entered the cached state.  Additionally,
it could provide many chances for platform to use much information to
optimize memory efficiency.

To achieve the goal, the patchset introduce two new options for madvise.
One is MADV_COLD which will deactivate activated pages and the other is
MADV_PAGEOUT which will reclaim private pages instantly.  These new
options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
ways to gain some free memory space.  MADV_PAGEOUT is similar to
MADV_DONTNEED in a way that it hints the kernel that memory region is not
currently needed and should be reclaimed immediately; MADV_COLD is similar
to MADV_FREE in a way that it hints the kernel that memory region is not
currently needed and should be reclaimed when memory pressure rises.

This patch (of 5):

When a process expects no accesses to a certain memory range, it could
give a hint to kernel that the pages can be reclaimed when memory pressure
happens but data should be preserved for future use.  This could reduce
workingset eviction so it ends up increasing performance.

This patch introduces the new MADV_COLD hint to madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not expected
to be used in the near future.  The hint can help kernel in deciding which
pages to evict early during memory pressure.

It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves

	active file page -> inactive file LRU
	active anon page -> inacdtive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file
LRU's head because MADV_COLD is a little bit different symantic.
MADV_FREE means it's okay to discard when the memory pressure because the
content of the page is *garbage* so freeing such pages is almost zero
overhead since we don't need to swap out and access afterward causes just
minor fault.  Thus, it would make sense to put those freeable pages in
inactive file LRU to compete other used-once pages.  It makes sense for
implmentaion point of view, too because it's not swapbacked memory any
longer until it would be re-dirtied.  Even, it could give a bonus to make
them be reclaimed on swapless system.  However, MADV_COLD doesn't mean
garbage so reclaiming them requires swap-out/in in the end so it's bigger
cost.  Since we have designed VM LRU aging based on cost-model, anonymous
cold pages would be better to position inactive anon's LRU list, not file
LRU.  Furthermore, it would help to avoid unnecessary scanning if system
doesn't have a swap device.  Let's start simpler way without adding
complexity at this moment.  However, keep in mind, too that it's a caveat
that workloads with a lot of pages cache are likely to ignore MADV_COLD on
anonymous memory because we rarely age anonymous LRU lists.

* man-page material

MADV_COLD (since Linux x.x)

Pages in the specified regions will be treated as less-recently-accessed
compared to pages in the system with similar access frequencies.  In
contrast to MADV_FREE, the contents of the region are preserved regardless
of subsequent writes to pages.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
pages.

[akpm@linux-foundation.org: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org


Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c276cc6

lib: untag user pointers in strn*_user · 903f433f

Andrey Konovalov authored 5 years ago

Patch series "arm64: untag user pointers passed to the kernel", v19.

=== Overview

arm64 has a feature called Top Byte Ignore, which allows to embed pointer
tags into the top byte of each pointer.  Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.

Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:

1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
             tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
	      pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
	      pointers")

This patchset extends tagged pointer support to syscall arguments.

As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).

For non-memory syscalls this is done by untaging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok).  The untagging is done only
when the pointer is being checked, the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when perfoming user memory accesses.

The mmap and mremap (only new_addr) syscalls do not currently accept
tagged addresses.  Architectures may interpret the tag as a background
colour for the corresponding vma.

Other memory syscalls (mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally.  Thus for memory syscalls we untag
pointers completely when they enter the kernel.

=== Other approaches

One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls.  With this approach we would need a
custom wrapper for each ioctl variation, which doesn't seem practical.

An alternative approach to untagging pointers in memory syscalls prologues
is to inspead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there.  Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched.  Thus this approach
would still require changing all find_vma() callers.

=== Testing

The following testing approaches has been taken to find potential issues
with user pointer untagging:

1. Static testing (with sparse [2] and separately with a custom static
   analyzer based on Clang) to track casts of __user pointers to integer
   types to find places where untagging needs to be done.

2. Static testing with grep to find parts of the kernel that call
   find_vma() (and other similar functions) or directly compare against
   vm_start/vm_end fields of vma.

3. Static testing with grep to find parts of the kernel that compare
   user pointers with TASK_SIZE or other similar consts and macros.

4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
   a modified syzkaller version that passes tagged pointers to the kernel.

Based on the results of the testing the requried patches have been added
to the patchset.

=== Notes

This patchset is meant to be merged together with "arm64 relaxed ABI" [3].

This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].

This patchset has been merged into the Pixel 2 & 3 kernel trees and is
now being used to enable testing of Pixel phones with HWASan.

Thanks!

[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292

[3] https://lkml.org/lkml/2019/6/12/745

[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a

This patch (of 11)

This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

strncpy_from_user and strnlen_user accept user addresses as arguments, and
do not go through the same path as copy_from_user and others, so here we
need to handle the case of tagged user addresses separately.

Untag user pointers passed to these functions.

Note, that this patch only temporarily untags the pointers to perform
validity checks, but then uses them as is to perform user memory accesses.

[andreyknvl@google.com: fix sparc4 build]
 Link: http://lkml.kernel.org/r/CAAeHK+yx4a-P0sDrXTUxMvO2H0CJZUFPffBrg_cU7oJOZyC7ew@mail.gmail.com
Link: http://lkml.kernel.org/r/c5a78bcad3e94d6cda71fcaa60a423231ae71e4c.1563904656.git.andreyknvl@google.com


Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

903f433f