- Jun 22, 2022
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit 09dadb59
Author: Yu Kuai <yukuai3@huawei.com>
Date: Sat May 21 15:37:47 2022 +0800

nbd: fix io hung while disconnecting device

In our tests, "qemu-nbd" triggers an io hang:

  INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
  Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000
  Call Trace:
   <TASK>
   __schedule+0x480/0x1050
   ? _raw_spin_lock_irqsave+0x3e/0xb0
   schedule+0x9c/0x1b0
   blk_mq_freeze_queue_wait+0x9d/0xf0
   ? ipi_rseq+0x70/0x70
   blk_mq_freeze_queue+0x2b/0x40
   nbd_add_socket+0x6b/0x270 [nbd]
   nbd_ioctl+0x383/0x510 [nbd]
   blkdev_ioctl+0x18e/0x3e0
   __x64_sys_ioctl+0xac/0x120
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fd8ff706577
  RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
  RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
  RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
  R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
  R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0

"qemu-nbd -d" will call ioctl 'NBD_DISCONNECT' first; however, the following message was found:

  block nbd0: Send disconnect failed -32

which indicates that something is wrong with the server.

Then, "qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK'; however, the ioctl can't clear requests after commit 2516ab15 ("nbd: only clear the queue on device teardown"). In the meantime, a request can't complete through timeout because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which means such a request will never be completed in this situation.
Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't complete multiple times, switch back to call nbd_clear_sock() in nbd_clear_sock_ioctl(), so that inflight requests can be cleared. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-5-yukuai3@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 2895f183 Author: Yu Kuai <yukuai3@huawei.com> Date: Sat May 21 15:37:46 2022 +0800 nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed Otherwise io will hang, because the request will only be completed if the cmd has the flag 'NBD_CMD_INFLIGHT'. Fixes: 07175cb1 ("nbd: make sure request completion won't concurrent") Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220521073749.3146892-4-yukuai3@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
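The flag handling described in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `Cmd`, `buggy_path` and `fixed_path` are hypothetical names, and a Python lock stands in for the kernel's atomic bit operations. It shows why clearing the in-flight flag on a path that does not complete the request leaves the request stuck.

```python
import threading


class Cmd:
    """Hypothetical stand-in for an nbd command (not the kernel structure)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.inflight = True   # models the NBD_CMD_INFLIGHT bit
        self.completions = 0   # how many times the request was completed

    def test_and_clear_inflight(self):
        """Read the flag and clear it in one step, like test_and_clear_bit()."""
        with self._lock:
            was_set = self.inflight
            self.inflight = False
            return was_set


def buggy_path(cmd, can_complete):
    # Clears the flag before knowing whether the request will be completed.
    if cmd.test_and_clear_inflight() and can_complete:
        cmd.completions += 1


def fixed_path(cmd, can_complete):
    # Leaves the flag untouched unless the request is really completed here.
    if can_complete and cmd.test_and_clear_inflight():
        cmd.completions += 1
```

In the buggy variant, one pass that cannot complete the request consumes the flag, so every later completion attempt is refused. In the fixed variant the request is completed exactly once.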
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit c55b2b98
Author: Yu Kuai <yukuai3@huawei.com>
Date: Sat May 21 15:37:45 2022 +0800

nbd: fix race between nbd_alloc_config() and module removal

When the nbd module is being removed, nbd_alloc_config() may be called concurrently by nbd_genl_connect(). Although try_module_get() will return false, nbd_alloc_config() doesn't handle it.

The race may lead to the leak of nbd_config and its related resources (e.g. recv_workq) and an oops in nbd_read_stat() due to the unload of the nbd module, as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  Oops: 0000 [#1] SMP PTI
  CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: knbd16-recv recv_work [nbd]
  RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
  Call Trace:
   recv_work+0x3b/0xb0 [nbd]
   process_one_work+0x1ed/0x390
   worker_thread+0x4a/0x3d0
   kthread+0x12a/0x150
   ret_from_fork+0x22/0x30

Fix it by checking the return value of try_module_get() in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV), assign nbd->config only when nbd_alloc_config() succeeds, to ensure the value of nbd->config is binary (valid or NULL).

Also add a debug message to check the reference counter of nbd_config during module removal.

Signed-off-by:
Hou Tao <houtao1@huawei.com> Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-3-yukuai3@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit 06c4da89
Author: Yu Kuai <yukuai3@huawei.com>
Date: Sat May 21 15:37:44 2022 +0800

nbd: call genl_unregister_family() first in nbd_cleanup()

Otherwise there may be race between module removal and the handling of netlink command, which can lead to the oops as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000098
  Oops: 0002 [#1] SMP PTI
  CPU: 1 PID: 31299 Comm: nbd-client Tainted: G E 5.14.0-rc4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:down_write+0x1a/0x50
  Call Trace:
   start_creating+0x89/0x130
   debugfs_create_dir+0x1b/0x130
   nbd_start_device+0x13d/0x390 [nbd]
   nbd_genl_connect+0x42f/0x748 [nbd]
   genl_family_rcv_msg_doit.isra.0+0xec/0x150
   genl_rcv_msg+0xe5/0x1e0
   netlink_rcv_skb+0x55/0x100
   genl_rcv+0x29/0x40
   netlink_unicast+0x1a8/0x250
   netlink_sendmsg+0x21b/0x430
   ____sys_sendmsg+0x2a4/0x2d0
   ___sys_sendmsg+0x81/0xc0
   __sys_sendmsg+0x62/0xb0
   __x64_sys_sendmsg+0x1f/0x30
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  Modules linked in: nbd(E-)

Signed-off-by:
Hou Tao <houtao1@huawei.com> Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-2-yukuai3@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 Conflicts: drop changes on uhs and nvme io_uring passthrough commit e2e53086 Author: Christoph Hellwig <hch@lst.de> Date: Tue May 24 14:15:30 2022 +0200 blk-mq: remove the done argument to blk_execute_rq_nowait Let the caller set it together with the end_io_data instead of passing a pointless argument. Note that the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Keith Busch <kbusch@kernel.org> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 32ac5a9b Author: Christoph Hellwig <hch@lst.de> Date: Tue May 24 14:15:29 2022 +0200 blk-mq: avoid a mess of casts for blk_end_sync_rq Instead of trying to cast a __bitwise 32-bit integer to a larger integer and then a pointer, just allocate a struct with the blk_status_t and the completion on the stack and set the end_io_data to that. Use the opportunity to move the code to where it belongs and drop rather confusing comments. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Keith Busch <kbusch@kernel.org> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
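The idea in the commit above, passing one small structure that carries both the completion and the status instead of squeezing the status through a pointer cast, can be sketched in user space. The names `RqWait`, `end_sync_rq` and `execute_rq_sync` are illustrative assumptions, a `threading.Event` stands in for a kernel completion, and a plain integer stands in for `blk_status_t`.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RqWait:
    """Hypothetical wait context: a completion plus a status slot."""
    done: threading.Event = field(default_factory=threading.Event)
    status: int = 0


def end_sync_rq(end_io_data, status):
    """Completion callback: store the status, then wake the waiter."""
    end_io_data.status = status
    end_io_data.done.set()


def execute_rq_sync(device_status):
    """Issue a pretend request, wait for it, and return its final status."""
    wait = RqWait()  # local to the caller, like an on-stack struct
    worker = threading.Thread(target=end_sync_rq, args=(wait, device_status))
    worker.start()   # stands in for the device completing the request
    wait.done.wait()
    worker.join()
    return wait.status
```

Because the callback receives a typed object, no integer-to-pointer conversion is needed to get the status back to the waiter.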
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit ae948fd6 Author: Christoph Hellwig <hch@lst.de> Date: Tue May 24 14:15:28 2022 +0200 blk-mq: remove __blk_execute_rq_nowait We don't want to plug for synchronous execution where we immediately wait for the request. Once that is done not a whole lot of code is shared, so just remove __blk_execute_rq_nowait. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Keith Busch <kbusch@kernel.org> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit 41e46b3c
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date: Fri Jun 3 11:19:05 2022 +0900

block: Fix potential deadlock in blk_ia_range_sysfs_show()

When being read, a sysfs attribute is already protected against removal with the kobject node active reference counter. As a result, in blk_ia_range_sysfs_show(), there is no need to take the queue sysfs lock when reading the value of a range attribute. Using the queue sysfs lock in this function creates a potential deadlock situation with the disk removal, something that lockdep signals with a splat when the device is removed:

  [ 760.703551] Possible unsafe locking scenario:
  [ 760.703551]
  [ 760.703554]        CPU0                    CPU1
  [ 760.703556]        ----                    ----
  [ 760.703558]   lock(&q->sysfs_lock);
  [ 760.703565]                                lock(kn->active#385);
  [ 760.703573]                                lock(&q->sysfs_lock);
  [ 760.703579]   lock(kn->active#385);
  [ 760.703587]
  [ 760.703587]  *** DEADLOCK ***

Solve this by removing the mutex_lock()/mutex_unlock() calls from blk_ia_range_sysfs_show().

Fixes: a2247f19 ("block: Add independent access ranges support")
Cc: stable@vger.kernel.org
Signed-off-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220603021905.1441419-1-damien.lemoal@opensource.wdc.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 22b106e5 Author: Jan Kara <jack@suse.cz> Date: Thu Jun 2 10:12:42 2022 +0200 block: fix bio_clone_blkg_association() to associate with proper blkcg_gq Commit d92c370a ("block: really clone the block cgroup in bio_clone_blkg_association") changed bio_clone_blkg_association() to just clone bio->bi_blkg reference from source to destination bio. This is however wrong if the source and destination bios are against different block devices because struct blkcg_gq is different for each bdev-blkcg pair. This will result in IOs being accounted (and throttled as a result) multiple times against the same device (src bdev) while throttling of the other device (dst bdev) is ignored. In case of BFQ the inconsistency can even result in crashes in bfq_bic_update_cgroup(). Fix the problem by looking up correct blkcg_gq for the cloned bio. Reported-by:
Logan Gunthorpe <logang@deltatee.com> Reported-and-tested-by:
Donald Buczek <buczek@molgen.mpg.de> Fixes: d92c370a ("block: really clone the block cgroup in bio_clone_blkg_association") CC: stable@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
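The point of the commit above is that a blkcg_gq belongs to one (block cgroup, block device) pair, so a cloned bio aimed at a different device needs its own lookup. A minimal model, assuming a plain dictionary as the lookup table and using hypothetical names (`lookup_blkg`, `clone_association_buggy`, `clone_association_fixed`) that do not mirror the kernel's signatures:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlkcgGq:
    """Model of the per (cgroup, device) accounting object."""
    blkcg: str
    bdev: str


@dataclass
class Bio:
    bdev: str
    blkg: Optional[BlkcgGq] = None


def lookup_blkg(table, blkcg, bdev):
    """Return the object for this (cgroup, device) pair, creating it once."""
    key = (blkcg, bdev)
    if key not in table:
        table[key] = BlkcgGq(blkcg, bdev)
    return table[key]


def clone_association_buggy(dst, src, table):
    # Copies the reference as-is, even when dst targets another device.
    dst.blkg = src.blkg


def clone_association_fixed(dst, src, table):
    # Keeps the source cgroup but looks it up against the destination device.
    dst.blkg = lookup_blkg(table, src.blkg.blkcg, dst.bdev)
```

With a source bio on "sda" and a clone on "sdb", the buggy variant leaves the clone accounted against "sda", which matches the double accounting described in the commit message.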
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit ff47dbd1 Author: Damien Le Moal <damien.lemoal@opensource.wdc.com> Date: Thu Jun 2 16:51:59 2022 +0900 block: remove useless BUG_ON() in blk_mq_put_tag() Since the if condition in blk_mq_put_tag() checks that the tag to put is not a reserved one, the BUG_ON() check in the else branch checking if the tag is indeed a reserved one is useless. Remove it. Signed-off-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220602075159.1273366-1-damien.lemoal@opensource.wdc.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
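Why the removed check could never fire can be seen from the branch structure alone. The following is a simplified sketch, assuming that "reserved" means `tag < nr_reserved_tags`; the function and its return values are illustrative, not the kernel code.

```python
def put_tag(tag, nr_reserved_tags, nr_tags):
    """Return which pool the tag goes back to, plus its index in that pool."""
    if tag >= nr_reserved_tags:  # the tag is not a reserved one
        real_tag = tag - nr_reserved_tags
        assert real_tag < nr_tags
        return ("normal", real_tag)
    # Reaching this point already implies tag < nr_reserved_tags, so
    # checking that same condition again here could never fail.
    return ("reserved", tag)
```

The else path is entered only when the first condition is false, which is exactly the condition the dropped check would have tested.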
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit b81c14ca Author: Haisu Wang <haisuwang@tencent.com> Date: Mon May 30 14:40:59 2022 +0800 blk-mq: do not update io_ticks with passthrough requests Flush or passthrough requests are not accounted as normal IO in completion. To reflect iostat for slow IO, io_ticks is updated when stat show is called, based on inflight numbers. This may cause inconsistent io_ticks calculation results. So do not account passthrough requests when checking inflight. Fixes: 86d73312 ("block: update io_ticks when io hang") Signed-off-by:
Haisu Wang <haisuwang@tencent.com> Reviewed-by:
samuelliao <samuelliao@tencent.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220530064059.1120058-1-haisuwang@tencent.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 605f7415 Author: Jens Axboe <axboe@kernel.dk> Date: Sun May 29 07:13:09 2022 -0600 block: make bioset_exit() fully resilient against being called twice Most of bioset_exit() is fine being called twice, as it clears the various allocations etc. when they are freed. The exception is bio_alloc_cache_destroy(), which does not clear ->cache when it has freed it. This isn't necessarily a bug, but can be if a buggy user calls the exit path more than once, or with just a memset() bioset which has never been initialized. dm appears to be one such user. Fixes: be4d234d ("bio: add allocation cache abstraction") Link: https://lore.kernel.org/linux-block/YpK7m+14A+pZKs5k@casper.infradead.org/ Reported-by:
Matthew Wilcox <willy@infradead.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
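The pattern behind the commit above is "clear the pointer after freeing it", which makes a teardown path safe to run twice or on a zeroed object. A minimal model, assuming a hypothetical `BioSet` class whose `frees` counter exists only so the behaviour can be observed:

```python
class BioSet:
    """Toy model of an object with an optional allocation cache."""

    def __init__(self):
        self.cache = None  # models a zeroed, never-initialized ->cache
        self.frees = 0     # counts how often the cache was actually freed

    def init(self):
        self.cache = []

    def exit(self):
        if self.cache is None:
            return         # nothing to free: never set up or already torn down
        self.frees += 1    # stands in for freeing the cache
        self.cache = None  # the step that makes a second call harmless
```

Without the final assignment, a second `exit()` would act on a stale reference, which in C means a double free.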
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit ebd076bf Author: Christoph Hellwig <hch@lst.de> Date: Mon May 23 14:43:02 2022 +0200 block: use bio_queue_enter instead of blk_queue_enter in bio_poll We want to have a valid live gendisk to call ->poll and not just a request_queue, so call the right helper. Fixes: 3e08773c ("block: switch polling to be bio based") Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220523124302.526186-1-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 98d40e76 Author: Hannes Reinecke <hare@suse.de> Date: Tue May 24 07:56:30 2022 +0200 block: document BLK_STS_AGAIN usage BLK_STS_AGAIN should only be used if RQF_NOWAIT is set and the bio would block. So we'd better document that to avoid accidental misuse. Signed-off-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220524055631.85480-2-hare@suse.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 403d5034 Author: Christoph Hellwig <hch@lst.de> Date: Tue May 24 16:39:19 2022 +0200 block: take destination bvec offsets into account in bio_copy_data_iter Apparently bcache can copy into bios that do not just contain fresh pages but can have offsets into the bio_vecs. Restore support for that in bio_copy_data_iter. Fixes: f8b679a0 ("block: rewrite bio_copy_data_iter to use bvec_kmap_local and memcpy_to_bvec") Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220524143919.1155501-1-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
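What "taking destination offsets into account" means can be shown with a segment-by-segment copy in user space. This is a sketch only: each segment is modelled as a hypothetical `(buffer, offset, length)` tuple rather than a real bio_vec, and `copy_data` is an illustrative name.

```python
def copy_data(dst_vecs, src_vecs):
    """Copy bytes from source segments into destination segments.

    Each segment is (buffer, offset, length). Destination buffers must be
    bytearrays. Returns the number of bytes copied.
    """
    di = si = 0        # index of the current destination / source segment
    ddone = sdone = 0  # bytes already consumed in the current segments
    copied = 0
    while di < len(dst_vecs) and si < len(src_vecs):
        dbuf, doff, dlen = dst_vecs[di]
        sbuf, soff, slen = src_vecs[si]
        n = min(dlen - ddone, slen - sdone)
        dstart = doff + ddone  # honours the destination segment's offset
        sstart = soff + sdone
        dbuf[dstart:dstart + n] = sbuf[sstart:sstart + n]
        ddone += n
        sdone += n
        copied += n
        if ddone == dlen:
            di, ddone = di + 1, 0
        if sdone == slen:
            si, sdone = si + 1, 0
    return copied
```

If `doff` were dropped from `dstart`, data would land at the start of the destination page rather than at the segment's offset, which is the kind of regression the commit describes.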
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 8a177a36 Author: Tejun Heo <tj@kernel.org> Date: Fri May 13 20:55:45 2022 -1000 blk-iolatency: Fix inflight count imbalances and IO hangs on offline iolatency needs to track the number of inflight IOs per cgroup. As this tracking can be expensive, it is disabled when no cgroup has iolatency configured for the device. To ensure that the inflight counters stay balanced, iolatency_set_limit() freezes the request_queue while manipulating the enabled counter, which ensures that no IO is in flight and thus all counters are zero. Unfortunately, iolatency_set_limit() isn't the only place where the enabled counter is manipulated. iolatency_pd_offline() can also decrement the counter and trigger disabling. As this disabling happens without freezing the queue, this can easily happen while some IOs are in flight and thus leak the counts. This can be easily demonstrated by turning on iolatency on one empty cgroup while IOs are in flight in other cgroups and then removing the cgroup. Note that iolatency shouldn't have been enabled elsewhere in the system to ensure that removing the cgroup disables iolatency for the whole device. 
The following keeps flipping on and off iolatency on sda:

  echo +io > /sys/fs/cgroup/cgroup.subtree_control

  while true; do
      mkdir -p /sys/fs/cgroup/test
      echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
      sleep 1
      rmdir /sys/fs/cgroup/test
      sleep 1
  done

and there's concurrent fio generating direct rand reads:

  fio --name test --filename=/dev/sda --direct=1 --rw=randread \
      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

  while True:
    for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
        for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
            blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
            pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
            if pd.value_() == 0:
                continue
            iolat = container_of(pd, 'struct iolatency_grp', 'pd')
            inflight = iolat.rq_wait.inflight.counter.value_()
            if inflight:
                print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                      f'{cgroup_path(css.cgroup).decode("utf-8")}')
    time.sleep(1)

The monitoring output looks like the following:

  inflight=1 sda /user.slice
  inflight=1 sda /user.slice
  ...
  inflight=14 sda /user.slice
  inflight=13 sda /user.slice
  inflight=17 sda /user.slice
  inflight=15 sda /user.slice
  inflight=18 sda /user.slice
  inflight=17 sda /user.slice
  inflight=20 sda /user.slice
  inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
  inflight=19 sda /user.slice
  inflight=19 sda /user.slice

If a cgroup with stuck inflight ends up getting throttled, the throttled IOs will never get issued as there's no completion event to wake it up leading to an indefinite hang.

This patch fixes the bug by unifying enable handling into a work item which is automatically kicked off from iolatency_set_min_lat_nsec() which is called from both iolatency_set_limit() and iolatency_pd_offline() paths. 
Punting to a work item is necessary as iolatency_pd_offline() is called under spinlocks while freezing a request_queue requires a sleepable context. This also simplifies the code reducing LOC sans the comments and avoids the unnecessary freezes which were happening whenever a cgroup's latency target is newly set or cleared. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Liu Bo <bo.liu@linux.alibaba.com> Fixes: 8c772a9b ("blk-iolatency: fix IO hang due to negative inflight counter") Cc: stable@vger.kernel.org # v5.0+ Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
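The counter leak described in the commit above can be reproduced with a toy model. It assumes, as the message implies, that both the issue and completion paths skip accounting while tracking is disabled. The class and function names are hypothetical, and "freezing" is modelled simply as letting every in-flight IO complete before the switch is flipped.

```python
class Iolatency:
    """Toy model: in-flight accounting that only runs while enabled."""

    def __init__(self):
        self.enabled = True
        self.inflight = 0

    def issue(self):
        if self.enabled:
            self.inflight += 1

    def complete(self):
        if self.enabled:
            self.inflight -= 1


def disable_without_freeze(n):
    """Disable tracking while n IOs are still in flight."""
    lat = Iolatency()
    for _ in range(n):
        lat.issue()
    lat.enabled = False  # flipped with IOs outstanding
    for _ in range(n):
        lat.complete()   # these completions are no longer counted
    return lat.inflight


def disable_after_freeze(n):
    """Disable tracking only after all in-flight IOs have drained."""
    lat = Iolatency()
    for _ in range(n):
        lat.issue()
    for _ in range(n):
        lat.complete()   # the freeze waits for these to finish
    lat.enabled = False
    return lat.inflight
```

With 19 outstanding IOs, the first function leaves the counter stuck at 19, mirroring the monitoring output quoted in the commit message, while the second returns to zero.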
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 5d05426e Author: Ming Lei <ming.lei@redhat.com> Date: Sun May 22 20:23:50 2022 +0800 blk-mq: don't touch ->tagset in blk_mq_get_sq_hctx blk_mq_run_hw_queues() could be run when there is no queued request and after the queue is cleaned up. At that time the tagset may already be freed, because the tagset lifetime is covered by the driver and it is often freed after blk_cleanup_queue() returns. So figure out the current default hctx from the mapping built into the request queue instead of touching ->tagset, so that a use-after-free on the tagset can be avoided. This should also be faster than retrieving the mapping from the tagset. Cc: "yukuai (C)" <yukuai3@huawei.com> Cc: Jan Kara <jack@suse.cz> Fixes: b6e68ee8 ("blk-mq: Improve performance of non-mq IO schedulers with multiple HW queues") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220522122350.743103-1-ming.lei@redhat.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 537b9f2b Author: Julia Lawall <Julia.Lawall@inria.fr> Date: Sat May 21 13:10:38 2022 +0200 mtip32xx: fix typo in comment Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by:
Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220521111145.81697-28-Julia.Lawall@inria.fr Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 491bf8f2 Author: Xie Yongji <xieyongji@bytedance.com> Date: Tue Mar 22 16:06:39 2022 +0800 nbd: Fix hung on disconnect request if socket is closed before When userspace closes the socket before sending a disconnect request, the following I/O requests will be blocked in wait_for_reconnect() until the dead timeout. This will cause the following disconnect request to also hang on blk_mq_quiesce_queue(). That means we have no way to disconnect an nbd device if there are some I/O requests waiting for reconnecting until the dead timeout. This is not expected. So let's wake up the thread waiting for reconnecting directly when a disconnect request is sent. Reported-by:
Xu Jianhai <zero.xu@bytedance.com> Signed-off-by:
Xie Yongji <xieyongji@bytedance.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220322080639.142-1-xieyongji@bytedance.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit c23d47ab Author: Christoph Hellwig <hch@lst.de> Date: Tue Apr 19 08:33:03 2022 +0200 loop: remove most the top-of-file boilerplate comment from the UAPI header Just leave the SPDX marker and the copyright notice and remove the irrelevant rest. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220419063303.583106-5-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit eb04bb15 Author: Christoph Hellwig <hch@lst.de> Date: Tue Apr 19 08:33:02 2022 +0200 loop: remove most the top-of-file boilerplate comment Remove the irrelevant changelogs and todo notes and just leave the SPDX marker and the copyright notice. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220419063303.583106-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit f21e6e18 Author: Christoph Hellwig <hch@lst.de> Date: Tue Apr 19 08:33:01 2022 +0200 loop: add a SPDX header The copyright statement says: "Redistribution of this file is permitted under the GNU General Public License." and was added by Ted in 1993, at which point GPLv2 only was the default Linux license. Replace it with the usual GPLv2 only SPDX header. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220419063303.583106-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 754d9679 Author: Christoph Hellwig <hch@lst.de> Date: Tue Apr 19 08:33:00 2022 +0200 loop: remove loop.h Merge loop.h into loop.c as all the content is only used there. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220419063303.583106-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit 49c3b926
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date: Wed Apr 20 09:57:18 2022 +0900

block: null_blk: Improve device creation with configfs

Currently, the directory name used to create a nullb device through configfs is not used as the device name, potentially causing headaches for users if devices are already created through the modprobe operation with the nr_devices module parameter not set to 0. E.g. a user can do "mkdir /sys/kernel/config/nullb/nullb0" to create a nullb device even though /dev/nullb0 was already created by modprobe. In this case, the configfs nullb device will be named nullb1, causing confusion for the user.

Simplify this by using the configfs directory name as the nullb device name, always, unless another nullb device is already using the same name. E.g. if modprobe created nullb0, then:

  $ mkdir /sys/kernel/config/nullb/nullb0
  mkdir: cannot create directory '/sys/kernel/config/nullb/nullb0': File exists

will be reported to the user.

To implement this, the function null_find_dev_by_name() is added to check for the existence of a nullb device with the name used for a new configfs device directory. nullb_group_make_item() uses this new function to check if the directory name can be used as the disk name. Finally, null_add_dev() is modified to use the device config item name as the disk name for a new nullb device created using configfs. The naming of devices created through modprobe remains unchanged.

Of note is that it is possible for a user to create through configfs a nullb device with the same name as an existing device. E.g.

  $ mkdir /sys/kernel/config/nullb/null

will successfully create the nullb device named "null" but this block device will however not appear under /dev/ since /dev/null already exists.

Suggested-by:
Joseph Bacik <josef@toxicpanda.com> Signed-off-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-5-damien.lemoal@opensource.wdc.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
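The naming rule from the commit above (use the configfs directory name as the disk name, and refuse it when a nullb device already has that name) can be sketched as a small registry. `NullbRegistry` and its methods are hypothetical; only the lookup-before-create idea is taken from the commit message.

```python
class NullbRegistry:
    """Toy model of the set of existing nullb device names."""

    def __init__(self):
        self._names = set()

    def find_dev_by_name(self, name):
        """Return True if a nullb device with this name already exists."""
        return name in self._names

    def add_from_modprobe(self, index):
        """Devices created at module load keep the nullb<index> naming."""
        name = f"nullb{index}"
        self._names.add(name)
        return name

    def make_item(self, dirname):
        """Create a device from a configfs directory name, or refuse it."""
        if self.find_dev_by_name(dirname):
            raise FileExistsError(dirname)
        self._names.add(dirname)
        return dirname
```

After `add_from_modprobe(0)`, calling `make_item("nullb0")` raises `FileExistsError`, which corresponds to the "File exists" error that mkdir reports in the commit message.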
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit db060f54 Author: Damien Le Moal <damien.lemoal@opensource.wdc.com> Date: Wed Apr 20 09:57:17 2022 +0900 block: null_blk: Cleanup messages Use the pr_fmt() macro to prefix all null_blk pr_xxx() messages with "null_blk:" to clarify which module is printing the messages. Also add a pr_info() message in null_add_dev() to print the name of a newly created disk. Signed-off-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-4-damien.lemoal@opensource.wdc.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit b3a0a73e Author: Damien Le Moal <damien.lemoal@opensource.wdc.com> Date: Wed Apr 20 09:57:16 2022 +0900 block: null_blk: Cleanup device creation and deletion Introduce the null_create_dev() and null_destroy_dev() helper functions to respectively create nullb devices on modprobe and destroy them on rmmod. The null_destroy_dev() helper avoids duplicated code in the null_init() and null_exit() functions for deleting devices. Signed-off-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-3-damien.lemoal@opensource.wdc.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 525323d2 Author: Damien Le Moal <damien.lemoal@opensource.wdc.com> Date: Wed Apr 20 09:57:15 2022 +0900 block: null_blk: Fix code style issues Fix message grammar and code style issues (brackets and indentation) in null_init(). Signed-off-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-2-damien.lemoal@opensource.wdc.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 0000f2f7 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:14 2022 +0200 xen-blkback: use bdev_discard_alignment Use bdev_discard_alignment to calculate the correct discard alignment offset even for partitions instead of just looking at the queue limit. Also switch to use bdev_discard_granularity to get rid of the last direct queue reference in xen_blkbk_discard. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-12-hch@lst.de [axboe: fold in 'q' removal as it's now unused] Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
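Why a partition needs its own alignment calculation, rather than the raw queue limit, can be shown with a little arithmetic. The function below is a sketch modelled on how the partition-aware calculation is understood to work (reduce the partition start modulo the granularity, then measure the distance to the next aligned boundary). It is an assumption for illustration, not a copy of the kernel helper, and `partition_discard_alignment` is a made-up name.

```python
SECTOR_SHIFT = 9  # 512-byte sectors


def partition_discard_alignment(discard_alignment, discard_granularity,
                                start_sector):
    """Byte offset from the partition start to its first aligned boundary.

    discard_alignment and discard_granularity are in bytes and describe the
    whole device; start_sector is the partition's first sector.
    """
    alignment = discard_alignment >> SECTOR_SHIFT
    granularity = discard_granularity >> SECTOR_SHIFT
    if granularity == 0:
        return 0
    offset = start_sector % granularity
    offset = (granularity + alignment - offset) % granularity
    return offset << SECTOR_SHIFT
```

For a device with a 1 MiB discard granularity and zero alignment, a partition starting at sector 2048 is already aligned, so the result is 0. A partition starting at sector 34 gets 1031168 bytes (2014 sectors), a value that cannot be obtained by reading the queue limit alone.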
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 18292faa Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:13 2022 +0200 rnbd-srv: use bdev_discard_alignment Use bdev_discard_alignment to calculate the correct discard alignment offset even for partitions instead of just looking at the queue limit. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Jack Wang <jinpu.wang@ionos.com> Reviewed-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-11-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 4e7f0ece Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:12 2022 +0200 nvme: remove a spurious clear of discard_alignment The nvme driver never sets a discard_alignment, so it also doesn't need to clear it to zero. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-10-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 4418bfd8 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:11 2022 +0200 loop: remove a spurious clear of discard_alignment The loop driver never sets a discard_alignment, so it also doesn't need to clear it to zero. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-9-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit c3f76529 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:10 2022 +0200 dasd: don't set the discard_alignment queue limit The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. Setting it to PAGE_SIZE while the discard granularity is the block size, which is smaller than or equal to PAGE_SIZE, as done by dasd is mostly harmless but also useless. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-8-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 3d50d368 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:09 2022 +0200 raid5: don't set the discard_alignment queue limit The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by raid5 is mostly harmless but also useless. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Song Liu <song@kernel.org> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-7-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 44d58370 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:08 2022 +0200 dm-zoned: don't set the discard_alignment queue limit The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by dm-zoned is mostly harmless but also useless. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-6-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 62952cc5 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:07 2022 +0200 virtio_blk: fix the discard_granularity and discard_alignment queue limits The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. On the other hand, the discard_sector_alignment from the virtio 1.1 specification looks similar to what Linux uses as discard granularity (even if not very well described): "discard_sector_alignment can be used by OS when splitting a request based on alignment." And at least qemu does set it to the discard granularity. So stop setting the discard_alignment and use the virtio discard_sector_alignment to set the discard granularity. Fixes: 1f23816b ("virtio_blk: add discard and write zeroes support") Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit fb749a87 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:06 2022 +0200 null_blk: don't set the discard_alignment queue limit The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by null_blk is mostly harmless but also useless. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 4a04d517 Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:05 2022 +0200 nbd: don't set the discard_alignment queue limit The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by nbd is mostly harmless but also useless. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 07c6e92a Author: Christoph Hellwig <hch@lst.de> Date: Mon Apr 18 06:53:04 2022 +0200 ubd: don't set the discard_alignment queue limit The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by ubd is mostly harmless but also useless. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 0b8d7622 Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Tue Apr 19 08:31:55 2022 +0900 aoe: Avoid flush_scheduled_work() usage Flushing system-wide workqueues is dangerous and will be forbidden. Replace system_wq with local aoe_wq. Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/abb37616-eec9-2794-e21e-7c623085d987@I-love.SAKURA.ne.jp Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-
Ming Lei authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917 commit 8ba816b2 Author: Yu Kuai <yukuai3@huawei.com> Date: Tue Apr 26 10:21:33 2022 +0800 null-blk: save memory footprint for struct nullb_cmd A total of 16 bytes can be saved in two ways: 1) The field 'bio' is only used in bio-based mode, and the field 'rq' is only used in mq mode. Since they are never used at the same time, declare a union for them. 2) The field 'bool fake_timeout' can be placed in the hole after the field 'error'. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220426022133.3999006-1-yukuai3@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Ming Lei <ming.lei@redhat.com>
-