- Dec 03, 2021
-
-
Ming Lei authored
It isn't necessary to call blk_mq_run_dispatch_ops() once for each request that is issued directly; it is enough to call it once when issuing from the whole list. Signed-off-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-5-ming.lei@redhat.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 29, 2021
-
-
Christoph Hellwig authored
Move blk_mq_sched_assign_ioc so that many interfaces from the file can be marked static. Rename the function to ioc_find_get_icq as well and return the icq to simplify the interface. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-8-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This reverts commit 4896c4e64ba5d5d5acdbcf68c5910dd4f6d8fa62. The helper is not needed any more. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-6-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
Currently we look up the ICQ only after the request is allocated. However, BFQ will want to decide how many scheduler tags it allows a given bfq queue (effectively a process) to consume based on cgroup weight. So provide a function blk_mq_sched_get_icq() so that BFQ can look up the ICQ earlier. Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-1-jack@suse.cz Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
The only user of the io_context for IO is BFQ, yet we put the checking and logic of it into the normal IO path. Put the creation into blk_mq_sched_assign_ioc(), and have BFQ use that helper. Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 11, 2021
-
-
Ming Lei authored
blk_mq_sched_bio_merge() is only called from blk-mq.c:blk_attempt_bio_merge(), which is called when the queue usage counter has already been grabbed: 1) blk_mq_get_new_requests() 2) blk_mq_get_request() - the cached request in the current plug owns one queue usage counter. So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(); more importantly, this nested grab causes a hang in blk_mq_freeze_queue_wait(). Cc: Christoph Hellwig <hch@lst.de> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211111085134.345235-2-ming.lei@redhat.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 05, 2021
-
-
Jens Axboe authored
Retain the old logic for the fops based submit, but for our internal blk_mq_submit_bio(), move the queue entering logic into the core function itself. We need to be a bit careful if going into the scheduler, as the scheduler or queue mappings can arbitrarily change before we have entered the queue. Have the bio scheduler mapping do that separately; it's a very cheap operation compared to actually doing merging, locking and lookups. Reviewed-by:
Christoph Hellwig <hch@lst.de> [axboe: update to check merge post submit_bio_checks() doing remap...] Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 30, 2021
-
-
Jean Sacren authored
In the if branch, e is checked. In the else branch, ->dispatch_busy is merely a number and has no effect on !e. We should remove the check of !e since it is always true. Signed-off-by:
Jean Sacren <sakiwit@gmail.com> Link: https://lore.kernel.org/r/20211029202945.3052-1-sakiwit@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
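The reasoning in the commit above can be sketched in a few lines of C. This is a simplified illustration, not the kernel function: `elevator_sketch`, `hctx_sketch`, `pick_insert_path()` and the `run_queue_async` parameter are hypothetical names chosen for the example. It only shows why a `!e` test inside the else branch of `if (e)` cannot change the outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct elevator_sketch { int id; };
struct hctx_sketch { unsigned int dispatch_busy; };

enum insert_path { VIA_SCHEDULER, ISSUE_DIRECTLY, VIA_SW_QUEUE };

static enum insert_path pick_insert_path(const struct elevator_sketch *e,
                                         const struct hctx_sketch *hctx,
                                         bool run_queue_async)
{
    if (e)
        return VIA_SCHEDULER;

    /*
     * We only get here when e == NULL, so an additional "&& !e" in the
     * condition below would always be true. dispatch_busy is just a
     * number and cannot change e.
     */
    assert(e == NULL);
    if (!hctx->dispatch_busy && !run_queue_async)
        return ISSUE_DIRECTLY;

    return VIA_SW_QUEUE;
}
```

Calling `pick_insert_path(NULL, &hctx, false)` with `hctx.dispatch_busy == 0` takes the direct-issue path with or without the redundant test, which is why dropping it is safe.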
-
- Oct 22, 2021
-
-
John Garry authored
We should not reference the queue tagset in blk_mq_sched_tags_teardown() (see function comment) for the blk-mq flags, so use the passed flags instead. This solves a use-after-free, similarly fixed earlier (and since broken again) in commit f0c1c4d2 ("blk-mq: fix use-after-free in blk_mq_exit_sched"). Reported-by:
Linux Kernel Functional Testing <lkft@linaro.org> Reported-by:
Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by:
Anders Roxell <anders.roxell@linaro.org> Fixes: e155b0c2 ("blk-mq: Use shared tags for shared sbitmap support") Signed-off-by:
John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1634890340-15432-1-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 21, 2021
-
-
Pavel Begunkov authored
Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a common if, so we don't check it twice. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/daedc90d4029a5d1d73344771632b1faca3aaf81.1634755800.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 18, 2021
-
-
Jens Axboe authored
These were added as part of early days debugging for blk-mq, and they are not really useful anymore. Rather than spend cycles updating them, just get rid of them. As a bonus, this shrinks the per-cpu software queue size from 256b to 192b. That's a whole cacheline less. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Extract a fast check out of __blk_mq_sched_restart() and inline it for performance reasons. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/894abaa0998e5999f2fe18f271e5efdfc2c32bd2.1633781740.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
Now that shared sbitmap support really means shared tags, rename symbols to match that. Signed-off-by:
John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
Currently we use separate sbitmap pairs and active_queues atomic_t for shared sbitmap support. However, a full set of static requests is used per HW queue, which is quite wasteful, considering that the total number of requests usable at any given time across all HW queues is limited by the shared sbitmap depth. As such, it is considerably more memory efficient in the case of shared sbitmap to allocate a set of static rqs per tag set or request queue, and not per HW queue. So replace the sbitmap pairs and active_queues atomic_t with shared tags per tagset and request queue, which will hold a set of shared static rqs. Since there is now no valid HW queue index to be passed to the blk_mq_ops .init and .exit_request callbacks, pass an invalid index token. This changes the semantics of the APIs, such that the callback would need to validate the HW queue index before using it. Currently no user of shared sbitmap actually uses the HW queue index (as would be expected). Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
Add a function to combine allocating tags and the associated requests, and factor out common patterns to use this new function. Some functions only call blk_mq_alloc_map_and_rqs() now, but more functionality will be added later. Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they are only used in blk-mq.c, and finally rename some functions for conciseness and consistency with other function names: - __blk_mq_alloc_map_and_{request -> rqs}() - blk_mq_alloc_{map_and_requests -> set_map_and_rqs}() Suggested-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-11-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
Put the functionality to update the sched shared sbitmap size in a common function. Since the same formula is always used to resize, and it can be derived from the request queue argument, just pass the request queue pointer. Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/1633429419-228500-10-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
To be more concise and consistent in naming, rename blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs(). Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
Function blk_mq_sched_alloc_tags() does the same as __blk_mq_alloc_map_and_request(), so give it a similar name to be consistent. Similarly rename label err_free_tags -> err_free_map_and_rqs. Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/1633429419-228500-6-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
It is a bit confusing that there is both BLKDEV_MAX_RQ and MAX_SCHED_RQ, as the name BLKDEV_MAX_RQ would imply that it is always the maximum number of requests, which it is not. Rename BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being the default number of requests assigned when allocating a request queue. Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 27, 2021
-
-
John Garry authored
If the blk_mq_sched_alloc_tags() -> blk_mq_alloc_rqs() call fails, then we call blk_mq_sched_free_tags() -> blk_mq_free_rqs(). It is incorrect to do so, as any rqs would have already been freed in the blk_mq_alloc_rqs() call. Fix by calling blk_mq_free_rq_map() only directly. Fixes: 6917ff0b ("blk-mq-sched: refactor scheduler initialization") Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1627378373-148090-1-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 25, 2021
-
-
Jan Kara authored
Lockdep complains about lock inversion between ioc->lock and bfqd->lock: bfqd -> ioc: put_io_context+0x33/0x90 -> ioc->lock grabbed blk_mq_free_request+0x51/0x140 blk_put_request+0xe/0x10 blk_attempt_req_merge+0x1d/0x30 elv_attempt_insert_merge+0x56/0xa0 blk_mq_sched_try_insert_merge+0x4b/0x60 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed blk_mq_sched_insert_requests+0xd6/0x2b0 blk_mq_flush_plug_list+0x154/0x280 blk_finish_plug+0x40/0x60 ext4_writepages+0x696/0x1320 do_writepages+0x1c/0x80 __filemap_fdatawrite_range+0xd7/0x120 sync_file_range+0xac/0xf0 ioc->bfqd: bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed put_io_context_active+0x78/0xb0 -> ioc->lock grabbed exit_io_context+0x48/0x50 do_exit+0x7e9/0xdd0 do_group_exit+0x54/0xc0 To avoid this inversion, we change blk_mq_sched_try_insert_merge() to not free the merged request but rather leave that up to the caller, similarly to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure to free all the merged requests after dropping bfqd->lock. Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Reviewed-by:
Ming Lei <ming.lei@redhat.com> Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210623093634.27879-3-jack@suse.cz Signed-off-by:
Jens Axboe <axboe@kernel.dk>
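The pattern described above, collecting merged requests while the scheduler lock is held and freeing them only after it is dropped, can be sketched as follows. This is a single-threaded model under stated assumptions: the lock is a plain flag, and `req_sketch`, `sched_sketch`, `free_request()` and `insert_requests()` are hypothetical names, not BFQ's or blk-mq's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request and scheduler state; the lock is modelled as a flag. */
struct req_sketch { struct req_sketch *next; int freed; };
struct sched_sketch { int lock_held; int frees_under_lock; };

/* Freeing may take another lock (ioc->lock in the commit), so record
 * whether it ever happens while the scheduler lock is held. */
static void free_request(struct sched_sketch *s, struct req_sketch *rq)
{
    if (s->lock_held)
        s->frees_under_lock++;
    rq->freed = 1;
}

static void insert_requests(struct sched_sketch *s,
                            struct req_sketch **merged, size_t n)
{
    struct req_sketch *free_list = NULL;

    s->lock_held = 1;                 /* take the scheduler lock */
    for (size_t i = 0; i < n; i++) {
        /* The request was merged away: defer the free, do not free here. */
        merged[i]->next = free_list;
        free_list = merged[i];
    }
    s->lock_held = 0;                 /* drop the scheduler lock */

    /* Only now free the merged requests. */
    while (free_list) {
        struct req_sketch *rq = free_list;

        free_list = rq->next;
        free_request(s, rq);
    }
}
```

Because the free path never runs with the scheduler lock held, the "scheduler lock, then ioc lock" ordering from the first trace no longer occurs in this model.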
-
- Jun 18, 2021
-
-
Damien Le Moal authored
The insert_requests and dispatch_request elevator operations are mandatory for the correct execution of an elevator, and all implemented elevators (bfq, kyber and mq-deadline) implement them. As a result, there is no need to check for these operations before calling them when a queue has an elevator set. This simplifies the code in __blk_mq_sched_dispatch_requests() and blk_mq_sched_insert_request(). To prevent badly implemented out-of-tree elevators from crashing the kernel, add a check in elv_register() to verify that these operations are implemented. A small, probably not significant, IOPS improvement of 0.1% is observed with this patch applied (4.117 MIOPS to 4.123 MIOPS, average of 20 fio runs doing 4K random direct reads with psync and 32 jobs). Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210618015922.713999-1-damien.lemoal@wdc.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
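A registration-time check of this kind might look like the sketch below. It is an illustration under assumptions: `elevator_ops_sketch`, `elv_register_sketch()` and the no-op callbacks are hypothetical, the callback signatures are simplified to take no arguments, and the choice of `-EINVAL` as the error code is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified operations table. */
struct elevator_ops_sketch {
    void (*insert_requests)(void);   /* mandatory */
    int  (*dispatch_request)(void);  /* mandatory */
    int  (*bio_merge)(void);         /* optional */
};

/* Reject an elevator that lacks a mandatory operation, so the hot path
 * can call both callbacks without checking them first. */
static int elv_register_sketch(const struct elevator_ops_sketch *ops)
{
    assert(ops != NULL);
    if (!ops->insert_requests || !ops->dispatch_request)
        return -EINVAL;
    return 0;
}

/* No-op callbacks used only to populate the table in examples. */
static void noop_insert(void) { }
static int noop_dispatch(void) { return 0; }
```

Validating once at registration moves the cost out of the per-request path, which is the trade-off the commit describes.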
-
Ming Lei authored
The tagset can't be used after blk_cleanup_queue() has returned because freeing the tagset usually follows blk_cleanup_queue(). Commit d97e594c ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") adds a check on q->tag_set->flags in blk_mq_exit_sched(), and causes a use-after-free. Fix it by using hctx->flags. Reported-by:
<syzbot+77ba3d171a25c56756ea@syzkaller.appspotmail.com> Fixes: d97e594c ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") Cc: John Garry <john.garry@huawei.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Tested-by:
John Garry <john.garry@huawei.com> Reviewed-by:
John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20210609063046.122843-1-ming.lei@redhat.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 03, 2021
-
-
Jan Kara authored
Provided the device driver does not implement dispatch budget accounting (which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls requests from the IO scheduler as long as it is willing to give out any. That defeats scheduling heuristics inside the scheduler by creating the false impression that the device can take more IO when in fact it cannot. For example, with the BFQ IO scheduler on top of a virtio-blk device, setting the blkio cgroup weight has barely any impact on the observed throughput of async IO because __blk_mq_do_dispatch_sched() always sucks out all the IO queued in BFQ. BFQ first submits IO from higher weight cgroups, but when that is all dispatched, it will give out IO of lower weight cgroups as well. And then we have to wait for all this IO to be dispatched to the disk (which means a lot of it actually has to complete) before the IO scheduler is queried again for dispatching more requests. This completely destroys any service differentiation. So grab the request tag for a request pulled out of the IO scheduler already in __blk_mq_do_dispatch_sched() and do not pull any more requests if we cannot get it, because we are unlikely to be able to dispatch it. That way only a single request is going to wait in the dispatch list for some tag to free up. Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz Signed-off-by:
Jens Axboe <axboe@kernel.dk>
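The effect of stopping the pull loop on the first tag allocation failure can be seen in a toy model. The sketch below is not the kernel loop: `dispatch_sim`, `pull_everything()` and `pull_until_no_tag()` are hypothetical, and the scheduler and device are reduced to counters.

```c
#include <assert.h>

/* Toy model: counters instead of real queues and tags. */
struct dispatch_sim {
    int sched_queued;   /* requests still held by the IO scheduler */
    int free_tags;      /* driver tags currently available */
    int dispatched;     /* requests that obtained a tag */
    int waiting;        /* requests stuck in the dispatch list */
};

/* Old behaviour: drain the scheduler regardless of tag availability. */
static void pull_everything(struct dispatch_sim *s)
{
    while (s->sched_queued > 0) {
        s->sched_queued--;
        if (s->free_tags > 0) {
            s->free_tags--;
            s->dispatched++;
        } else {
            s->waiting++;
        }
    }
}

/* New behaviour: stop pulling as soon as a pulled request gets no tag. */
static void pull_until_no_tag(struct dispatch_sim *s)
{
    while (s->sched_queued > 0) {
        s->sched_queued--;
        if (s->free_tags == 0) {
            s->waiting++;   /* exactly one request waits for a tag */
            break;
        }
        s->free_tags--;
        s->dispatched++;
    }
    assert(s->waiting <= 1);
}
```

With 10 queued requests and 4 free tags, the first variant leaves 6 requests outside the scheduler's control, while the second leaves 1 waiting and 5 still in the scheduler, where weight-based ordering can continue to apply.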
-
- May 24, 2021
-
-
John Garry authored
The tags used for an IO scheduler are currently per hctx. As such, when q->nr_hw_queues grows, so does the request queue total IO scheduler tag depth. This may cause problems for SCSI MQ HBAs whose total driver depth is fixed. Ming and Yanhui report higher CPU usage and lower throughput in scenarios where the fixed total driver tag depth is appreciably lower than the total scheduler tag depth: https://lore.kernel.org/linux-block/440dfcfc-1a2c-bd98-1161-cec4d78c6dfc@huawei.com/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76b In that scenario, since the scheduler tag is obtained first, much contention is introduced since a driver tag may not be available after we have got the sched tag. Improve this scenario by introducing request queue-wide tags for when a tagset-wide sbitmap is used. The static sched requests are still allocated per hctx, as requests are initialised per hctx, as in blk_mq_init_request(..., hctx_idx, ...) -> set->ops->init_request(.., hctx_idx, ...). For simplicity of resizing the request queue sbitmap when updating the request queue depth, just init at the max possible size, so we don't need to deal with possibly swapping out the old sbitmap for a new one if we need to grow. Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1620907258-30910-3-git-send-email-john.garry@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 11, 2021
-
-
Omar Sandoval authored
__blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx for the current CPU again and uses that to get the corresponding Kyber context in the passed hctx. However, the thread may be preempted between the two calls to blk_mq_get_ctx(), and the ctx returned the second time may no longer correspond to the passed hctx. This "works" accidentally most of the time, but it can cause us to read garbage if the second ctx came from an hctx with more ctx's than the first one (i.e., if ctx->index_hw[hctx->type] > hctx->nr_ctx). This manifested as this UBSAN array index out of bounds error reported by Jakub: UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9 index 13106 is out of range for type 'long unsigned int [128]' Call Trace: dump_stack+0xa4/0xe5 ubsan_epilogue+0x5/0x40 __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34 queued_spin_lock_slowpath+0x476/0x480 do_raw_spin_lock+0x1c2/0x1d0 kyber_bio_merge+0x112/0x180 blk_mq_submit_bio+0x1f5/0x1100 submit_bio_noacct+0x7b0/0x870 submit_bio+0xc2/0x3a0 btrfs_map_bio+0x4f0/0x9d0 btrfs_submit_data_bio+0x24e/0x310 submit_one_bio+0x7f/0xb0 submit_extent_page+0xc4/0x440 __extent_writepage_io+0x2b8/0x5e0 __extent_writepage+0x28d/0x6e0 extent_write_cache_pages+0x4d7/0x7a0 extent_writepages+0xa2/0x110 do_writepages+0x8f/0x180 __writeback_single_inode+0x99/0x7f0 writeback_sb_inodes+0x34e/0x790 __writeback_inodes_wb+0x9e/0x120 wb_writeback+0x4d2/0x660 wb_workfn+0x64d/0xa10 process_one_work+0x53a/0xa80 worker_thread+0x69/0x5b0 kthread+0x20b/0x240 ret_from_fork+0x1f/0x30 Only Kyber uses the hctx, so fix it by passing the request_queue to ->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can map the queues itself to avoid the mismatch. Fixes: a6088845 ("block: kyber: make kyber more friendly with merging") Reported-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 08, 2021
-
-
Sami Tolvanen authored
list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Sami Tolvanen <samitolvanen@google.com> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Kees Cook <keescook@chromium.org> Tested-by:
Nick Desaulniers <ndesaulniers@google.com> Tested-by:
Nathan Chancellor <nathan@kernel.org> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
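A userspace sketch of the idea follows: the comparison callback is declared with const list pointers, and the caller invokes it through a function pointer of exactly that type, so no cast is needed. The `list_head` layout mirrors the familiar kernel structure, but `item`, `cmp_items()`, `list_is_sorted()` and the helper functions here are hypothetical and only illustrate the matching signatures.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Comparison callback type with const node pointers. */
typedef int (*list_cmp_func_t)(void *priv, const struct list_head *a,
                               const struct list_head *b);

struct item { int key; struct list_head node; };

/* Recover the containing item from its embedded list node. */
static const struct item *item_of(const struct list_head *n)
{
    return (const struct item *)((const char *)n - offsetof(struct item, node));
}

/* Matches list_cmp_func_t exactly, so calling it needs no cast. */
static int cmp_items(void *priv, const struct list_head *a,
                     const struct list_head *b)
{
    int ka = item_of(a)->key, kb = item_of(b)->key;

    (void)priv;
    return (ka > kb) - (ka < kb);
}

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Walk a circular list and check ordering through the typed callback. */
static int list_is_sorted(void *priv, const struct list_head *head,
                          list_cmp_func_t cmp)
{
    const struct list_head *p;

    assert(cmp != NULL);
    for (p = head->next; p->next != head; p = p->next)
        if (cmp(priv, p, p->next) > 0)
            return 0;
    return 1;
}
```

Because the pointer type used at the call site and the function definition agree, an indirect-call type check has nothing to complain about.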
-
- Mar 04, 2021
-
-
Ming Lei authored
SCSI uses a global atomic variable to track queue depth for each LUN/request queue. This doesn't scale well when there are lots of CPU cores and the disk is very fast. It has been observed that IOPS is affected a lot by tracking queue depth via sdev->device_busy in the I/O path. Return budget token from .get_budget callback. The budget token can be passed to driver so that we can replace the atomic variable with sbitmap_queue and alleviate the scaling problems that way. Link: https://lore.kernel.org/r/20210122023317.687987-9-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Tested-by:
Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
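The budget-token idea can be illustrated with a small bitmap allocator: acquiring budget returns the index of the bit that was claimed, and releasing budget clears exactly that bit. This is a single-threaded sketch under assumptions, not the kernel's sbitmap_queue: `budget_map`, `get_budget()` and `put_budget()` are hypothetical, the depth is limited to 64, and no atomic operations are used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device budget map; depth must be <= 64 in this sketch. */
struct budget_map {
    uint64_t bits;        /* bit i set means token i is in use */
    unsigned int depth;   /* maximum number of outstanding tokens */
};

/* Claim the first free bit and return it as the token, or -1 if the
 * device is already at its queue depth. */
static int get_budget(struct budget_map *m)
{
    assert(m->depth <= 64);
    for (unsigned int i = 0; i < m->depth; i++) {
        uint64_t mask = UINT64_C(1) << i;

        if (!(m->bits & mask)) {
            m->bits |= mask;
            return (int)i;
        }
    }
    return -1;
}

/* Release a token previously returned by get_budget(). */
static void put_budget(struct budget_map *m, int token)
{
    if (token >= 0 && (unsigned int)token < m->depth)
        m->bits &= ~(UINT64_C(1) << token);
}
```

Handing the token back to the caller is what lets the release path touch only its own bit instead of a single shared counter.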
-
- Mar 01, 2021
-
-
Jean Delvare authored
Commit a1ce35fa ("block: remove dead elevator code") removed all users of RQF_SORTED. However it is still defined, and there is one reference left to it (which in effect is dead code). Clear it all up. Signed-off-by:
Jean Delvare <jdelvare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <ming.lei@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Feb 22, 2021
-
-
Chaitanya Kulkarni authored
Get rid of the wrapper for trace_block_rq_insert() and call the function directly. Signed-off-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 04, 2020
-
-
Christoph Hellwig authored
The request_queue can trivially be derived from the request. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 09, 2020
-
-
Yufen Yu authored
After commit 923218f6 ("blk-mq: don't allocate driver tag upfront for flush rq"), blk_mq_submit_bio() calls blk_insert_flush() directly to handle a flush request, rather than blk_mq_sched_insert_request(), when an elevator is in use. As a result, every flush request either already has the RQF_FLUSH_SEQ flag set when blk_mq_sched_insert_request() is called, or has been inserted into hctx->dispatch. So, remove the dead code path. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 06, 2020
-
-
Christoph Hellwig authored
Move blk_mq_sched_try_merge to blk-merge.c, which allows marking a lot of the merge infrastructure static there. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 08, 2020
-
-
Baolin Wang authored
Now we usually free hctx->sched_data via e->type->ops.exit_hctx(), and nothing uses the blk_mq_sched_free_hctx_data() function any more. Remove it. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 03, 2020
-
-
John Garry authored
Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support multiple reply queues with single hostwide tags. In addition, these drivers want to use interrupt assignment in pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0], CPU hotplug may cause in-flight IO completion to not be serviced when an interrupt is shut down. That problem is solved in commit bf0beec0 ("blk-mq: drain I/O when all CPUs in a hctx are offline"). However, to take advantage of that blk-mq feature, the HBA HW queues are required to be mapped to that of the blk-mq hctx's; to do that, the HBA HW queues need to be exposed to the upper layer. In making that transition, the per-SCSI command request tags are no longer unique per Scsi host - they are just unique per hctx. As such, the HBA LLDD would have to generate this tag internally, which has a certain performance overhead. However, another problem is that blk-mq assumes the host may accept (Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e0 ("scsi: core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy counter was removed, which would stop the LLDD being sent more than .can_queue commands; however, it should still be ensured that the block layer does not issue more than .can_queue commands to the Scsi host. To solve this problem, introduce a shared sbitmap per blk_mq_tag_set, which may be requested at init time. New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the tagset to indicate whether the shared sbitmap should be used. Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests are still allocated per hctx; the reason for this is that if tags and requests were only allocated for a single hctx - like hctx0 - it may break block drivers which expect a request to be associated with a specific hctx, i.e. not always hctx0. This will introduce extra memory usage. This change is based on work originally from Ming Lei in [1] and from Bart's suggestion in [2].
[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/ [1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/ [2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be Signed-off-by:
John Garry <john.garry@huawei.com> Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used Tested-by:
Douglas Gilbert <dgilbert@interlog.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
John Garry authored
Pass hctx/tagset flags argument down to blk_mq_init_tags() and blk_mq_free_tags() for selective init/free. For now, make it include the alloc policy flag, which can be evaluated when needed (in blk_mq_init_tags()). Signed-off-by:
John Garry <john.garry@huawei.com> Tested-by:
Douglas Gilbert <dgilbert@interlog.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 01, 2020
-
-
Xianting Tian authored
Replace various magic -1 constants for tags with BLK_MQ_NO_TAG. Signed-off-by:
Xianting Tian <tian.xianting@h3c.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
The small blk_mq_attempt_merge() function is only called by __blk_mq_sched_bio_merge(), so just open-code it. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
There is a lot of duplicated code when trying to merge a bio from the plug list and the sw queue. Introduce a new helper to attempt to merge a bio, which simplifies blk_bio_list_merge() and blk_attempt_plug_merge(). Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
Move blk_mq_bio_list_merge() into blk-merge.c and give it a more generic name. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-