- Oct 28, 2019
-
-
Anson Huang authored
usdhc's clock rate is different according to different devices connected, so clock rate assignment should be placed in board DT according to different devices connected on each usdhc port. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Reviewed-by:
Abel Vesa <abel.vesa@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
usdhc's clock rate is different according to different devices connected, so clock rate assignment should be placed in board DT according to different devices connected on each usdhc port. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Reviewed-by:
Abel Vesa <abel.vesa@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Stoica Cosmin-Stefan authored
Add initial version of device tree for S32V234-EVB, including nodes for the 4 Cortex-A53 cores, AIPS bus with UART modules, ARM architected timer and Generic Interrupt Controller (GIC). Keep SoC level separate from board level to let future boards with this SoC share common properties, while the dts files will keep board-dependent properties. Signed-off-by:
Stoica Cosmin-Stefan <cosmin.stoica@nxp.com> Signed-off-by:
Mihaela Martinas <Mihaela.Martinas@freescale.com> Signed-off-by:
Dan Nica <dan.nica@nxp.com> Signed-off-by:
Larisa Grigore <Larisa.Grigore@nxp.com> Signed-off-by:
Phu Luu An <phu.luuan@nxp.com> Signed-off-by:
Stefan-Gabriel Mirea <stefan-gabriel.mirea@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
S.j. Wang authored
Assign clocks and clock-rates for the audio PLLs so that audio drivers can use them. Add dai-tdm-slot-num and dai-tdm-slot-width for sound-wm8524 so that the SAI driver can generate the correct bit clock. Fixes: 13f3b9fd ("arm64: dts: imx8mm-evk: Enable audio codec wm8524") Signed-off-by:
Shengjiu Wang <shengjiu.wang@nxp.com> Reviewed-by:
Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Andrey Smirnov authored
Add I2C node for switch watchdog present on both Zest and RMB3 boards. Signed-off-by:
Andrey Smirnov <andrew.smirnov@gmail.com> Cc: Fabio Estevam <festevam@gmail.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: linux-arm-kernel@lists.infradead.org, Cc: linux-kernel@vger.kernel.org Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Andrey Smirnov authored
Add I2C node for accelerometer present on both Zest and RMB3 boards. Signed-off-by:
Andrey Smirnov <andrew.smirnov@gmail.com> Cc: Fabio Estevam <festevam@gmail.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: linux-arm-kernel@lists.infradead.org, Cc: linux-kernel@vger.kernel.org Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Andrey Smirnov authored
It's 3V3_MAIN, not 3V3V_MAIN on schematic. Fix it. Signed-off-by:
Andrey Smirnov <andrew.smirnov@gmail.com> Cc: Fabio Estevam <festevam@gmail.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: linux-arm-kernel@lists.infradead.org, Cc: linux-kernel@vger.kernel.org Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Andrey Smirnov authored
Regulator-vsd-3v3 is supplied via GEN_3V3 rail which is an output of an "always on" load switch supplied by 3V3_MAIN. GEN_3V3 is also used as vin-supply by a number of peripherals, so adding it also allows us to follow the schematic more closely. Signed-off-by:
Andrey Smirnov <andrew.smirnov@gmail.com> Cc: Fabio Estevam <festevam@gmail.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: linux-arm-kernel@lists.infradead.org, Cc: linux-kernel@vger.kernel.org Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Wen He authored
Update the property #clock-cells = <1> to #clock-cells = <0> of the dpclk, since the Display output pixel clock driver provides single clock output. Signed-off-by:
Wen He <wen.he_1@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
- Oct 25, 2019
-
-
Anson Huang authored
On the i.MX8MQ EVK board, VDD_ARM comes from a DC-DC converter that is always on; GPIO1_IO13 is used only to switch VDD_ARM's voltage between 0.9V and 1V for CPU DVFS. VDD_ARM's GPIO regulator should therefore be always on, to avoid the confusing message below after kernel boot:

    imx8mqevk login: [ 31.776619] vdd_arm: disabling

Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Enable scu key for i.MX8QXP MEK board. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Add scu key node for i.MX8QXP, disabled by default as it depends on board design. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
- Oct 14, 2019
-
-
Anson Huang authored
All nodes should be sorted alphabetically, except iomuxc, which has huge pinctrl data and is better placed at the end of the file. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Adjust some nodes so that they are sorted alphabetically, except the iomuxc node, which is placed at the end of the file because of its huge pinctrl data. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Yinbo Zhu authored
According to the reference manual, the Layerscape OTG function should support the HNP, SRP and ADP protocols, but the dwc3 code does not implement them and instead uses the ID pin to detect whether the port is host or device (0 is host, 1 is device). This patch enables OTG mode on ls1028ardb, ls1088ardb and ls1046ardb in the dts. Signed-off-by:
Yinbo Zhu <yinbo.zhu@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Enable pca6416 on i.MX8MM EVK board's i2c3 bus. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Enable i2c3 for i.MX8MM EVK board. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
The iomuxc node is placed at the end of the file because of its huge pinctrl data. I2C devices should be sorted alphabetically. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Wen He authored
In order to maximise performance of the LCD Controller's 64-bit AXI bus, for any given speed bin of the device, the AXI master interface clock (ACLK) can be up to CPU_frequency/2, which is already capable of optimal performance. In general, ACLK is always expected to be equal to CPU_frequency/2. The APB slave interface clock (PCLK) and the main processing clock (MCLK) are both tied to the same clock as ACLK. This change follows the LS1028A Architecture Specification Manual. Signed-off-by:
Wen He <wen.he_1@nxp.com> Acked-by:
Li Yang <leoyang.li@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
- Oct 07, 2019
-
-
Russell King authored
The LX2160A esdhc controllers are setup by the driver to be DMA coherent, but without marking them as such in DT, Linux thinks they are not. This can lead to random sporadic DMA errors, even to the extent of preventing boot, such as:

    mmc0: ADMA error
    mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
    mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00002202
    mmc0: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000001
    mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
    mmc0: sdhci: Present: 0x01f50008 | Host ctl: 0x00000038
    mmc0: sdhci: Power: 0x00000003 | Blk gap: 0x00000000
    mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x000040d8
    mmc0: sdhci: Timeout: 0x00000003 | Int stat: 0x00000001
    mmc0: sdhci: Int enab: 0x037f108f | Sig enab: 0x037f108b
    mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
    mmc0: sdhci: Caps: 0x35fa0000 | Caps_1: 0x0000af00
    mmc0: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000
    mmc0: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x001d8a33
    mmc0: sdhci: Resp[2]: 0x325b5900 | Resp[3]: 0x3f400e00
    mmc0: sdhci: Host ctl2: 0x00000000
    mmc0: sdhci: ADMA Err: 0x00000009 | ADMA Ptr: 0x000000236d43820c
    mmc0: sdhci: ============================================
    mmc0: error -5 whilst initialising SD card

These are caused by the device's descriptor fetch hitting speculatively loaded CPU cache lines that the CPU does not see through the normal, non-cacheable DMA coherent mapping that it uses for non-coherent devices. DT and the device must agree wrt whether the device is DMA coherent or not. Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Acked-by:
Li Yang <leoyang.li@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
- Oct 06, 2019
-
-
Joakim Zhang authored
Add ddr pmu node for i.MX8MN EVK board. Signed-off-by:
Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Guido Günther authored
Temperature and hysteresis were picked after the CPU. Signed-off-by:
Guido Günther <agx@sigxcpu.org> Reviewed-by:
Lucas Stach <l.stach@pengutronix.de> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Use "fsl,imx8mm-ocotp" as i.MX8MN ocotp's fallback compatible instead of "fsl,imx7d-ocotp" to support SoC UID read, as i.MX8MN reuses i.MX8MM's SoC ID driver. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Compared to i.MX7D, i.MX8MM has different ocotp layout, so it should NOT use "fsl,imx7d-ocotp" as ocotp's fallback compatible, remove it. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Enable i.MX8MN cpu-idle using generic ARM cpu-idle driver, 2 states are supported, details as below:

    root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
    WFI
    root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/usage
    3098
    root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/name
    cpu-pd-wait
    root@imx8mnevk:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage
    3078

Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
Add i.MX8MN system counter node to enable timer-imx-sysctr broadcast timer driver. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
i.MX8MN can reuse i.MX8MQ's src driver, add "fsl,imx8mq-src" as src's fallback compatible to enable it. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
Anson Huang authored
i.MX8MN DDR4 EVK board has a GPIO LED to indicate status, add support for it. Signed-off-by:
Anson Huang <Anson.Huang@nxp.com> Signed-off-by:
Shawn Guo <shawnguo@kernel.org>
-
- Sep 30, 2019
-
-
Krzysztof Wilczynski authored
Move the static keyword to the front of declaration of csky_pmu_of_device_ids, and resolve the following compiler warning that can be seen when building with warnings enabled (W=1): arch/csky/kernel/perf_event.c:1340:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration] Signed-off-by:
Krzysztof Wilczynski <kw@linux.com> Signed-off-by:
Guo Ren <guoren@kernel.org>
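A minimal sketch of the declaration order that the warning above is about. The type, table and function names are hypothetical and are not the ones used in arch/csky/kernel/perf_event.c; the only point illustrated is that the storage-class specifier comes first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-ID table entry. */
struct demo_device_id {
	const char *compatible;
};

/*
 * Writing "const static struct demo_device_id ..." is legal C, but GCC
 * reports it under -Wold-style-declaration. Putting "static" first, as
 * here, avoids the warning.
 */
static const struct demo_device_id demo_ids[] = {
	{ "vendor,demo-pmu" },
	{ NULL },	/* sentinel terminating the table */
};

/* Return 1 if the compatible string is listed in the table, else 0. */
int demo_match(const char *compatible)
{
	const struct demo_device_id *id;

	for (id = demo_ids; id->compatible != NULL; id++) {
		if (strcmp(id->compatible, compatible) == 0)
			return 1;
	}
	return 0;
}
```

The behaviour of the table is identical either way; only the order of the specifiers changes.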
-
Valentin Schneider authored
Since the enabling and disabling of IRQs within preempt_schedule_irq() is contained in a need_resched() loop, we don't need the outer arch code loop. Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com> Signed-off-by:
Guo Ren <guoren@kernel.org>
-
Mao Han authored
csky_pmu.max_period has type u64, but BIT() returns an unsigned long, which is only 32 bits wide on C-SKY. The initialization of max_period will therefore be incorrect when count_width is bigger than 32. Use BIT_ULL() instead. Signed-off-by:
Mao Han <han_mao@c-sky.com> Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
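The width problem described above can be sketched in plain C. The function name max_period_for_width() is hypothetical, and the "minus one" form of the computation is an assumption about how a maximum period is derived from a counter width; only the use of a 64-bit constant for the shift is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel macro: the shifted constant is unsigned long long. */
#define BIT_ULL(nr) (1ULL << (nr))

/*
 * Largest period a counter of the given width can hold (assumed formula).
 * With a 32-bit unsigned long, (1UL << count_width) cannot represent
 * widths above 31; the 64-bit constant keeps the shift valid.
 */
uint64_t max_period_for_width(unsigned int count_width)
{
	assert(count_width < 64);	/* shifting by 64 or more is undefined */
	return BIT_ULL(count_width) - 1;
}
```

For example, max_period_for_width(48) yields 0xFFFFFFFFFFFF, a value that does not fit in a 32-bit unsigned long.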
-
Guo Ren authored
We need to set fp to zero so that the backtrace knows where the frame chain ends. This patch fixes a perf callchain panic that occurred because the backtrace did not know where the fp chain ended. Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Reported-by:
Mao Han <han_mao@c-sky.com>
-
Mike Rapoport authored
The csky implementation of free_initrd_mem() is an open-coded version of free_reserved_area() without poisoning. Remove it and make csky use the generic version of free_initrd_mem(). Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Signed-off-by:
Guo Ren <guoren@kernel.org>
-
- Sep 26, 2019
-
-
Oliver O'Halloran authored
s/CONFIG_IOV/CONFIG_PCI_IOV/ Whoops. Fixes: bd6461cc ("powerpc/eeh: Add a eeh_dev_break debugfs interface") Signed-off-by:
Oliver O'Halloran <oohall@gmail.com> [mpe: Fixup the #endif comment as well] Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190926122502.14826-1-oohall@gmail.com
-
Andrew Morton authored
A last-minute fixlet which I'd failed to merge at the appropriate time had the predictable effect. Fixes: f672e2c217e2d4b2 ("lib: untag user pointers in strn*_user") Cc: Andrey Konovalov <andreyknvl@google.com> Cc: David Miller <davem@davemloft.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Rutland authored
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}().

These changes were generated with the following shell script:

----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
    sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
    sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----

... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned.

There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Rapoport authored
hexagon never reserves or initializes initrd and the only mention of it is the empty free_initrd_mem() function. As we have a generic implementation of free_initrd_mem(), there is no need to define an empty stub for the hexagon implementation and it can be dropped. Link: http://lkml.kernel.org/r/1565858133-25852-1-git-send-email-rppt@linux.ibm.com Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Cc: Richard Kuo <rkuo@codeaurora.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
When a process expects no accesses to a certain memory range for a long time, it can hint to the kernel that the pages can be reclaimed instantly but that the data should be preserved for future use. This could reduce workingset eviction, so it ends up increasing performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that the kernel reclaims *any LRU* pages instantly. The hint can help the kernel in deciding which pages to evict proactively.

A note: it intentionally doesn't apply the SWAP_CLUSTER_MAX LRU page isolation limit because it's automatically bounded by PMD size. If PMD size (e.g., 256) causes trouble, we could fix it later by limiting it to SWAP_CLUSTER_MAX[1].

- man-page material

MADV_PAGEOUT (since Linux x.x)

Do not expect access in the near future, so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after a successful operation could cause a major page fault but never loses the up-to-date contents, unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process.

MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages.

[1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/

[minchan@kernel.org: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org Signed-off-by:
Minchan Kim <minchan@kernel.org> Reported-by:
kbuild test robot <lkp@intel.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
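A minimal userspace sketch of the hint described in this commit, assuming a Linux system. The helper name pageout_roundtrip() is hypothetical, and the fallback value 21 for MADV_PAGEOUT is the one from the generic uapi header, supplied only for older libc headers. The sketch checks the property the man-page material states: the contents survive the hint.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and madvise() */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* value from asm-generic/mman-common.h */
#endif

/*
 * Fill an anonymous private mapping, ask the kernel to page it out, then
 * read it back. Returns 0 if every byte survived, -1 on mapping failure
 * or data mismatch. A failing madvise() (for example EINVAL on kernels
 * without MADV_PAGEOUT) is tolerated because the call is only a hint.
 */
int pageout_roundtrip(size_t len)
{
	unsigned char *buf;
	size_t i;
	int ret = 0;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xA5, len);
	(void)madvise(buf, len, MADV_PAGEOUT);

	/* Touching the range may fault pages back in; data must be intact. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0xA5) {
			ret = -1;
			break;
		}
	}

	munmap(buf, len);
	return ret;
}
```

Whether any page is actually reclaimed depends on the kernel version and on swap being available; the sketch only verifies that the data is preserved.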
-
Minchan Kim authored
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.

- Background

The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster, so we are trying to make hot starts more likely than cold starts.

To increase hot starts, Android userspace manages the order in which apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd (low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements, but those performance improvements will vary based on the memory requirements of individual workloads.

- Problem

Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidates for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed (we measured the performance of swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x faster even though we use zram, which is much faster than real storage), so a kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap.

- Approach

The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for the platform to use much information to optimize memory efficiency.

To achieve the goal, the patchset introduces two new options for madvise. One is MADV_COLD, which will deactivate activated pages, and the other is MADV_PAGEOUT, which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in that it hints to the kernel that the memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in that it hints to the kernel that the memory region is not currently needed and should be reclaimed when memory pressure rises.

This patch (of 5):

When a process expects no accesses to a certain memory range, it could give a hint to the kernel that the pages can be reclaimed when memory pressure happens but that the data should be preserved for future use. This could reduce workingset eviction, so it ends up increasing performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help the kernel in deciding which pages to evict early during memory pressure.

It works for every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves

    active file page -> inactive file LRU
    active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive file LRU's head because MADV_COLD has slightly different semantics. MADV_FREE means it's okay to discard the pages under memory pressure because the content of the page is *garbage*, so freeing such pages has almost zero overhead since we don't need to swap out, and a later access causes just a minor fault. Thus, it would make sense to put those freeable pages in the inactive file LRU to compete with other used-once pages. It makes sense from an implementation point of view, too, because it's not swapbacked memory any longer until it would be re-dirtied. It could even give a bonus by letting them be reclaimed on a swapless system. However, MADV_COLD doesn't mean garbage, so reclaiming them requires swap-out/in in the end, so it has a bigger cost. Since we have designed VM LRU aging based on a cost model, anonymous cold pages would be better positioned on the inactive anon LRU list, not the file LRU. Furthermore, it would help to avoid unnecessary scanning if the system doesn't have a swap device. Let's start the simpler way without adding complexity at this moment. However, keep in mind the caveat that workloads with a lot of page cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists.

* man-page material

MADV_COLD (since Linux x.x)

Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages.

[akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org Signed-off-by:
Minchan Kim <minchan@kernel.org> Reported-by:
kbuild test robot <lkp@intel.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
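The "non-destructive" property claimed for MADV_COLD can be contrasted with MADV_DONTNEED in a short userspace sketch, assuming a Linux system. The helper name first_byte_after_hint() is hypothetical, the 4096-byte length is an arbitrary choice, and the fallback value 20 for MADV_COLD is the one from the generic uapi header, supplied only for older libc headers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and madvise() */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLD
#define MADV_COLD 20		/* value from asm-generic/mman-common.h */
#endif

/*
 * Write 0x5A into a fresh anonymous private mapping, apply the given
 * madvise() advice, and return the first byte as read back afterwards
 * (or -1 if the mapping could not be created). The madvise() result is
 * ignored on purpose: an unsupported hint leaves the data untouched.
 */
int first_byte_after_hint(int advice)
{
	const size_t len = 4096;
	unsigned char *buf;
	int value;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0x5A, len);
	(void)madvise(buf, len, advice);
	value = buf[0];

	munmap(buf, len);
	return value;
}
```

With MADV_COLD the byte written earlier is still there, while MADV_DONTNEED on a private anonymous mapping discards the page so that the next read sees a zero-filled one.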
-
Andrey Konovalov authored
Patch series "arm64: untag user pointers passed to the kernel", v19.

=== Overview

arm64 has a feature called Top Byte Ignore, which allows embedding pointer tags into the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces.

Right now the kernel is already able to handle user faults with tagged pointers, due to these patches:

1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged pointers")

This patchset extends tagged pointer support to syscall arguments.

As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or sbrk() (see the patchset [3] for more details).

For non-memory syscalls this is done by untagging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked; the tag is preserved as the pointer makes its way through the kernel and stays tagged when the kernel dereferences the pointer when performing user memory accesses.

The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.

Other memory syscalls (mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel.

=== Other approaches

One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical.

An alternative approach to untagging pointers in memory syscall prologues is to instead allow tagged pointers to be passed to find_vma() (and other vma related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare or subtract the returned vma start and end fields against the pointer that was being searched. Thus this approach would still require changing all find_vma() callers.

=== Testing

The following testing approaches have been taken to find potential issues with user pointer untagging:

1. Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel.

Based on the results of the testing, the required patches have been added to the patchset.

=== Notes

This patchset is meant to be merged together with "arm64 relaxed ABI" [3].

This patchset is a prerequisite for ARM's memory tagging hardware feature support [4].

This patchset has been merged into the Pixel 2 & 3 kernel trees and is now being used to enable testing of Pixel phones with HWASan.

Thanks!

[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292
[3] https://lkml.org/lkml/2019/6/12/745
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a

This patch (of 11)

This patch is a part of a series that extends kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.

strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately.

Untag user pointers passed to these functions.

Note that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses.

[andreyknvl@google.com: fix sparc4 build] Link: http://lkml.kernel.org/r/CAAeHK+yx4a-P0sDrXTUxMvO2H0CJZUFPffBrg_cU7oJOZyC7ew@mail.gmail.com Link: http://lkml.kernel.org/r/c5a78bcad3e94d6cda71fcaa60a423231ae71e4c.1563904656.git.andreyknvl@google.com Signed-off-by:
Andrey Konovalov <andreyknvl@google.com> Reviewed-by:
Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by:
Khalid Aziz <khalid.aziz@oracle.com> Acked-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
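The "untag only for the check, keep the tag for the access" idea from this commit can be sketched with plain 64-bit integers. All names below (tag_addr, untag_addr, user_range_ok) are hypothetical and this is not the kernel's implementation: the sketch simply clears the top byte, which is an assumption that holds for user addresses but is a simplification of what an in-kernel helper has to handle.

```c
#include <stdint.h>

#define TAG_SHIFT 56
#define TAG_MASK  (0xffULL << TAG_SHIFT)

/* Place an 8-bit tag into the top byte of an address. */
uint64_t tag_addr(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Clear the top byte; sufficient for user addresses in this sketch. */
uint64_t untag_addr(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/*
 * Range check in the spirit of the validity checks described above: the
 * comparison against the limit uses the untagged value, while the caller
 * keeps the original (still tagged) address for the actual access.
 * Returns 1 if [addr, addr + size) lies within [0, limit], else 0.
 */
int user_range_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
	uint64_t untagged = untag_addr(addr);

	return size <= limit && untagged <= limit - size;
}
```

Without the untagging step, an address such as 0x2a00007fdeadbeef would compare as far above any plausible user-space limit and be rejected even though the memory it refers to is valid.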
-