  1. Jun 18, 2021
  2. Jun 03, 2021
    • block: Do not pull requests from the scheduler when we cannot dispatch them · 61347154
      Jan Kara authored
      
      Provided the device driver does not implement dispatch budget accounting
      (which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls
      requests from the IO scheduler as long as it is willing to give any out.
      That defeats the scheduling heuristics inside the scheduler by creating
      a false impression that the device can take more IO when it in fact
      cannot.
      
      For example, with the BFQ IO scheduler on top of a virtio-blk device,
      setting the blkio cgroup weight has barely any impact on the observed
      throughput of async IO because __blk_mq_do_dispatch_sched() always sucks
      out all the IO queued in BFQ. BFQ first submits IO from higher-weight
      cgroups but when that is all dispatched, it gives out IO of lower-weight
      cgroups as well. And then we have to wait for all this IO to be
      dispatched to the disk (which means a lot of it actually has to
      complete) before the IO scheduler is queried again for dispatching more
      requests. This completely destroys any service differentiation.
      
      So grab the driver tag for a request already when it is pulled out of
      the IO scheduler in __blk_mq_do_dispatch_sched(), and do not pull any
      more requests if we cannot get the tag, because we are unlikely to be
      able to dispatch them. That way only a single request waits in the
      dispatch list for some tag to free up.
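
      A minimal sketch of the reworked loop, assuming the 5.13-era helpers
      blk_mq_get_driver_tag() and blk_mq_request_bypass_insert() (simplified,
      not the exact kernel code; budget handling and error paths elided):

          static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
          {
                  struct elevator_queue *e = hctx->queue->elevator;
                  LIST_HEAD(rq_list);
                  struct request *rq;

                  while (e->type->ops.has_work(hctx)) {
                          rq = e->type->ops.dispatch_request(hctx);
                          if (!rq)
                                  break;
                          /*
                           * Grab the driver tag as soon as the request
                           * leaves the scheduler.  If no tag is available,
                           * the device cannot take more IO: park this one
                           * request in hctx->dispatch and stop pulling.
                           */
                          if (!blk_mq_get_driver_tag(rq)) {
                                  blk_mq_request_bypass_insert(rq, false, true);
                                  break;
                          }
                          list_add_tail(&rq->queuelist, &rq_list);
                  }
                  return blk_mq_dispatch_rq_list(hctx, &rq_list, 0);
          }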
      
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. May 24, 2021
  4. May 11, 2021
    • kyber: fix out of bounds access when preempted · efed9a33
      Omar Sandoval authored
      
      __blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and
      passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx
      for the current CPU again and uses that to get the corresponding Kyber
      context in the passed hctx. However, the thread may be preempted between
      the two calls to blk_mq_get_ctx(), and the ctx returned the second time
      may no longer correspond to the passed hctx. This "works" accidentally
      most of the time, but it can cause us to read garbage if the second ctx
      came from an hctx with more ctx's than the first one (i.e., if
      ctx->index_hw[hctx->type] > hctx->nr_ctx).
      
      This manifested as the following UBSAN array-index-out-of-bounds error
      reported by Jakub:
      
      UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9
      index 13106 is out of range for type 'long unsigned int [128]'
      Call Trace:
       dump_stack+0xa4/0xe5
       ubsan_epilogue+0x5/0x40
       __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34
       queued_spin_lock_slowpath+0x476/0x480
       do_raw_spin_lock+0x1c2/0x1d0
       kyber_bio_merge+0x112/0x180
       blk_mq_submit_bio+0x1f5/0x1100
       submit_bio_noacct+0x7b0/0x870
       submit_bio+0xc2/0x3a0
       btrfs_map_bio+0x4f0/0x9d0
       btrfs_submit_data_bio+0x24e/0x310
       submit_one_bio+0x7f/0xb0
       submit_extent_page+0xc4/0x440
       __extent_writepage_io+0x2b8/0x5e0
       __extent_writepage+0x28d/0x6e0
       extent_write_cache_pages+0x4d7/0x7a0
       extent_writepages+0xa2/0x110
       do_writepages+0x8f/0x180
       __writeback_single_inode+0x99/0x7f0
       writeback_sb_inodes+0x34e/0x790
       __writeback_inodes_wb+0x9e/0x120
       wb_writeback+0x4d2/0x660
       wb_workfn+0x64d/0xa10
       process_one_work+0x53a/0xa80
       worker_thread+0x69/0x5b0
       kthread+0x20b/0x240
       ret_from_fork+0x1f/0x30
      
      Only Kyber uses the hctx, so fix it by passing the request_queue to
      ->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can
      map the queues itself to avoid the mismatch.
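
      A sketch of the fixed kyber_bio_merge(), deriving a consistent
      ctx/hctx pair itself from the passed request_queue (simplified along
      the lines of the patch; helper names as in that kernel):

          static bool kyber_bio_merge(struct request_queue *q, struct bio *bio,
                                      unsigned int nr_segs)
          {
                  struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
                  /* map ctx to its own hctx so the pair is consistent,
                   * even if we were preempted and migrated CPUs */
                  struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx);
                  struct kyber_hctx_data *khd = hctx->sched_data;
                  struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw[hctx->type]];
                  unsigned int sched_domain = kyber_sched_domain(bio->bi_opf);
                  struct list_head *rq_list = &kcq->rq_list[sched_domain];
                  bool merged;

                  spin_lock(&kcq->lock);
                  merged = blk_bio_list_merge(hctx->queue, rq_list, bio, nr_segs);
                  spin_unlock(&kcq->lock);

                  return merged;
          }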
      
      Fixes: a6088845 ("block: kyber: make kyber more friendly with merging")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. Apr 08, 2021
  6. Mar 04, 2021
  7. Mar 01, 2021
  8. Feb 22, 2021
  9. Dec 04, 2020
  10. Oct 09, 2020
    • blk-mq: get rid of the dead flush handle code path · c7281524
      Yufen Yu authored
      
      After commit 923218f6 ("blk-mq: don't allocate driver tag upfront
      for flush rq"), blk_mq_submit_bio() calls blk_insert_flush() directly
      to handle a flush request, rather than blk_mq_sched_insert_request(),
      even when an elevator is in use.
      
      Then, every flush request either has the RQF_FLUSH_SEQ flag set by the
      time blk_mq_sched_insert_request() is called, or has already been
      inserted into hctx->dispatch. So, remove the dead code path.
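
      For reference, a condensed sketch of where flushes now branch off in
      blk_mq_submit_bio() (simplified; plug handling and the exact helper
      arguments elided):

          if (op_is_flush(bio->bi_opf)) {
                  /* handled by the flush state machine, never via the
                   * elevator, so RQF_FLUSH_SEQ/hctx->dispatch cover it */
                  blk_insert_flush(rq);
                  blk_mq_run_hw_queue(hctx, true);
          } else if (q->elevator) {
                  /* only non-flush requests reach the IO scheduler */
                  blk_mq_sched_insert_request(rq, false, true, true);
          }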
      
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. Oct 06, 2020
  12. Sep 08, 2020
  13. Sep 03, 2020
    • blk-mq: Facilitate a shared sbitmap per tagset · 32bc15af
      John Garry authored
      Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
      multiple reply queues with a single set of hostwide tags.

      In addition, these drivers want to use interrupt assignment in
      pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
      CPU hotplug may cause in-flight IO completions to not be serviced when an
      interrupt is shut down. That problem is solved in commit bf0beec0
      ("blk-mq: drain I/O when all CPUs in a hctx are offline").

      However, to take advantage of that blk-mq feature, the HBA HW queues are
      required to be mapped to the blk-mq hctx's; to do that, the HBA HW
      queues need to be exposed to the upper layer.
      
      In making that transition, the per-SCSI command request tags are no
      longer unique per Scsi host - they are just unique per hctx. As such, the
      HBA LLDD would have to generate this tag internally, which has a certain
      performance overhead.
      
      Another problem is that blk-mq assumes the host may accept
      (Scsi_host.can_queue * #hw queues) commands. In commit 6eb045e0 ("scsi:
      core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host
      busy counter, which had stopped the LLDD from being sent more than
      .can_queue commands, was removed; however, it should still be ensured
      that the block layer does not issue more than .can_queue commands to
      the Scsi host.
      
      To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
      which may be requested at init time.
      
      New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
      tagset to indicate whether the shared sbitmap should be used.
      
      Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and
      requests is still allocated per hctx; the reason for this is that if
      tags and requests were only allocated for a single hctx - like hctx0 -
      it may break block drivers which expect a request to be associated with
      a specific hctx, i.e. not always hctx0. This will introduce extra
      memory usage.
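
      A rough sketch of how a driver opts in and how the per-hctx tags then
      resolve to the shared sbitmap (simplified; field and helper names as
      introduced by this series):

          /* driver side, before blk_mq_alloc_tag_set(): */
          set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;

          /* core side (simplified): every hctx's tags point at the
           * per-tagset bitmaps instead of per-hctx ones */
          if (blk_mq_is_sbitmap_shared(set->flags)) {
                  tags->bitmap_tags = &set->__bitmap_tags;
                  tags->breserved_tags = &set->__breserved_tags;
          }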
      
      This change is based on work originally from Ming Lei in [1] and from
      Bart's suggestion in [2].
      
      [0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
      [1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
      [2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
      
      
      
      Signed-off-by: John Garry <john.garry@huawei.com>
      Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
      Tested-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: Pass flags for tag init/free · 1c0706a7
      John Garry authored
      
      Pass hctx/tagset flags argument down to blk_mq_init_tags() and
      blk_mq_free_tags() for selective init/free.
      
      For now, make it include the alloc policy flag, which can be evaluated
      when needed (in blk_mq_init_tags()).
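
      A sketch of the resulting interfaces (simplified;
      BLK_MQ_FLAG_TO_ALLOC_POLICY() is the existing macro that decodes the
      alloc policy from the flags):

          struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
                                               unsigned int reserved_tags,
                                               int node, unsigned int flags);
          void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);

          /* inside blk_mq_init_tags(), the policy is decoded on demand: */
          int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);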
      
      Signed-off-by: John Garry <john.garry@huawei.com>
      Tested-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. Sep 01, 2020
  15. Aug 17, 2020
    • blk-mq: order adding requests to hctx->dispatch and checking SCHED_RESTART · d7d8535f
      Ming Lei authored
      The SCHED_RESTART code path is relied on to re-run the queue for
      dispatching requests left in hctx->dispatch. Meanwhile, the
      SCHED_RESTART flag is checked when adding requests to hctx->dispatch.
      
      Memory barriers have to be used for ordering the following two pairs of
      operations:

      1) adding requests to hctx->dispatch and checking SCHED_RESTART in
      blk_mq_dispatch_rq_list()

      2) clearing SCHED_RESTART and checking whether there are requests in
      hctx->dispatch in blk_mq_sched_restart().
      
      Without the added memory barriers, either:

      1) blk_mq_sched_restart() may miss requests added to hctx->dispatch
      while blk_mq_dispatch_rq_list() observes SCHED_RESTART and does not
      re-run the queue on the dispatch side,

      or

      2) blk_mq_dispatch_rq_list() still sees SCHED_RESTART and does not
      re-run the queue on the dispatch side, while blk_mq_sched_restart()'s
      check for requests in hctx->dispatch is missed.
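
      A sketch of the barrier pairing this patch adds (simplified; both
      sides use smp_mb() in the actual patch):

          /* dispatch side, blk_mq_dispatch_rq_list() (simplified) */
          spin_lock(&hctx->lock);
          list_splice_tail_init(list, &hctx->dispatch);
          spin_unlock(&hctx->lock);

          /* order: add to ->dispatch before testing SCHED_RESTART;
           * pairs with the barrier in blk_mq_sched_restart() */
          smp_mb();

          if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
                  blk_mq_run_hw_queue(hctx, true);

          /* restart side, blk_mq_sched_restart() (simplified) */
          clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);

          /* order: clear SCHED_RESTART before checking ->dispatch;
           * pairs with the barrier in blk_mq_dispatch_rq_list() */
          smp_mb();

          if (!list_empty_careful(&hctx->dispatch))
                  blk_mq_run_hw_queue(hctx, true);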
      
      An IO hang in the ltp/fs_fill test was reported by the kernel test
      robot:

      	https://lkml.org/lkml/2020/7/26/77

      It turns out to be caused by the above out-of-order operations, and the
      IO hang can't be observed any more after applying this patch.
      
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Jeffery <djeffery@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  16. Jul 31, 2020
  17. Jul 10, 2020
  18. Jun 30, 2020
  19. Apr 29, 2020
  20. Apr 24, 2020
    • block: Limit number of items taken from the I/O scheduler in one go · 28d65729
      Salman Qazi authored
      
      Flushes bypass the I/O scheduler and get added to hctx->dispatch
      in blk_mq_sched_bypass_insert().  This can happen while a kworker is
      running the hctx->run_work work item and is past the point in
      blk_mq_sched_dispatch_requests() where hctx->dispatch is checked.
      
      The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time,
      because the I/O scheduler can feed an arbitrary number of commands.
      
      Since we have only one hctx->run_work, the commands waiting in
      hctx->dispatch will wait an arbitrary length of time for run_work to be
      rerun.
      
      A similar phenomenon exists with dispatches from the software queue.
      
      The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched()
      and blk_mq_do_dispatch_ctx() and, if it is non-empty, return from the
      run_work handler and let it rerun.
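
      The added check, sketched as a fragment (simplified; in the real patch
      both dispatch helpers report "run again" back to their caller):

          /* at the top of each blk_mq_do_dispatch_sched() iteration */
          if (!list_empty_careful(&hctx->dispatch)) {
                  /* flushes (or requeues) bypassed the scheduler into
                   * hctx->dispatch; stop feeding from the scheduler and
                   * let the run_work handler be rerun to service them */
                  busy = true;
                  break;
          }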
      
      Signed-off-by: Salman Qazi <sqazi@google.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. Apr 20, 2020
    • blk-mq: Rerun dispatching in the case of budget contention · a0823421
      Douglas Anderson authored
      
      If ever a thread running blk-mq code tries to get budget and fails it
      immediately stops doing work and assumes that whenever budget is freed
      up that queues will be kicked and whatever work the thread was trying
      to do will be tried again.
      
      One path where budget is freed and queues are kicked in the normal
      case can be seen in scsi_finish_command().  Specifically:
      - scsi_finish_command()
        - scsi_device_unbusy()
          - # Decrement "device_busy", AKA release budget
        - scsi_io_completion()
          - scsi_end_request()
            - blk_mq_run_hw_queues()
      
      The above is all well and good.  The problem comes up when a thread
      claims the budget but then releases it without actually dispatching
      any work.  Since we didn't schedule any work we'll never run the path
      of finishing work / kicking the queues.
      
      This isn't often actually a problem which is why this issue has
      existed for a while and nobody noticed.  Specifically we only get into
      this situation when we unexpectedly found that we weren't going to do
      any work.  Code that later receives new work kicks the queues.  All
      good, right?
      
      The problem shows up, however, if timing is just wrong and we hit a
      race.  To see this race let's think about the case where we only have
      a budget of 1 (only one thread can hold budget).  Now imagine that a
      thread got budget and then decided not to dispatch work.  It's about
      to call put_budget() but then the thread gets context switched out for
      a long, long time.  While in this state, any and all kicks of the
      queue (like when we received new work) will be no-ops because
      nobody can get budget.  Finally the thread holding budget gets to run
      again and returns.  All the normal kicks will have been no-ops and we
      have an I/O stall.
      
      As you can see from the above, you need just the right timing to see
      the race.  To start with, it only happens if we thought we had work,
      actually managed to get the budget, but then it turned out there was
      no work to do.  That's pretty rare to start with.  Even then, there's
      usually a very small amount of time between realizing that there's no
      work and putting the budget.  During this small amount of time new
      work has to come in and the queue kick has to make it all the way to
      trying to get the budget and fail.  It's pretty unlikely.
      
      One case where this race can be hit is illustrated by an example of
      threads running blk_mq_do_dispatch_sched():
      
      * Threads A and B both run has_work() at the same time with the same
        "hctx".  Imagine has_work() is exact.  There's no lock, so it's OK
        if Thread A and B both get back true.
      * Thread B gets interrupted for a long time right after it decides
        that there is work.  Maybe its CPU gets an interrupt and the
        interrupt handler is slow.
      * Thread A runs, get budget, dispatches work.
      * Thread A's work finishes and budget is released.
      * Thread B finally runs again and gets budget.
      * Since Thread A already took care of the work and no new work has
        come in, Thread B will get NULL from dispatch_request().  I believe
        this is specifically why dispatch_request() is allowed to return
        NULL in the first place if has_work() must be exact.
      * Thread B will now be holding the budget and is about to call
        put_budget(), but hasn't called it yet.
      * Thread B gets interrupted for a long time (again).  Dang interrupts.
      * Now Thread C (maybe with a different "hctx" but the same queue)
        comes along and runs blk_mq_do_dispatch_sched().
      * Thread C won't do anything because it can't get budget.
      * Finally Thread B will run again and put the budget without kicking
        any queues.
      
      Even though the example above is with blk_mq_do_dispatch_sched() I
      believe the race is possible any time someone is holding budget but
      doesn't do work.
      
      Unfortunately, the unlikely has become more likely if you happen to be
      using the BFQ I/O scheduler.  BFQ, by design, sometimes returns "true"
      for has_work() but then NULL for dispatch_request() and stays in this
      state for a while (currently up to 9 ms).  Suddenly you only need one
      race to hit, not two races in a row.  With my current setup this is
      easy to reproduce in reboot tests and traces have actually shown that
      we hit a race similar to the one described above.
      
      Note that we only need to fix blk_mq_do_dispatch_sched() and
      blk_mq_do_dispatch_ctx() and not the other places that put budget.  In
      other cases we know that we have work to do on at least one "hctx" and
      code already exists to kick that "hctx"'s queue.  When that work
      finally finishes all the queues will be kicked using the normal flow.
      
      One last note is that (at least in the SCSI case) budget is shared by
      all "hctx"s that have the same queue.  Thus we need to make sure to
      kick the whole queue, not just re-run dispatching on a single "hctx".
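
      A sketch of the fix, assuming the helpers of that kernel
      (blk_mq_put_dispatch_budget(), blk_mq_delay_run_hw_queues(),
      BLK_MQ_BUDGET_DELAY); simplified:

          /* in blk_mq_do_dispatch_sched(), after getting budget: */
          rq = e->type->ops.dispatch_request(hctx);
          if (!rq) {
                  blk_mq_put_dispatch_budget(hctx);
                  /*
                   * Budget is shared by all hctx's of the queue (e.g. the
                   * SCSI device_busy count), so kick the whole queue, not
                   * just this hctx, in case another thread failed to get
                   * budget while we were holding it.  Use a small delay
                   * to avoid a kick-storm while the scheduler holds back
                   * requests (as BFQ does for up to ~9 ms).
                   */
                  blk_mq_delay_run_hw_queues(hctx->queue, BLK_MQ_BUDGET_DELAY);
                  break;
          }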
      
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  22. Mar 12, 2020
    • blk-mq: insert flush request to the front of dispatch queue · cc3200ea
      Ming Lei authored
      
      commit 01e99aec ("blk-mq: insert passthrough request into
      hctx->dispatch directly") may add a flush request to the tail of the
      dispatch queue, depending on the 'add_head' parameter of
      blk_mq_sched_insert_request.
      
      It turns out this causes a performance regression on NCQ controllers,
      because flush is a non-NCQ command which can't be queued while any NCQ
      command is in flight. When a flush rq is added to the front of
      hctx->dispatch, S_SCHED_RESTART makes it easier for extra time to be
      added to the flush rq's latency than when it is added to the tail of
      the dispatch queue; the chance of flush merging is then increased, and
      fewer flush requests may be issued to the controller.
      
      So always insert flush request to the front of dispatch queue just like
      before applying commit 01e99aec ("blk-mq: insert passthrough request
      into hctx->dispatch directly").
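
      In list terms the restored behaviour is plain front insertion (a
      simplified fragment of the bypass-insert path with its at_head
      parameter):

          spin_lock(&hctx->lock);
          if (at_head)
                  list_add(&rq->queuelist, &hctx->dispatch);      /* flush: front */
          else
                  list_add_tail(&rq->queuelist, &hctx->dispatch); /* others: tail */
          spin_unlock(&hctx->lock);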
      
      Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
      Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Fixes: 01e99aec ("blk-mq: insert passthrough request into hctx->dispatch directly")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  23. Feb 25, 2020
    • blk-mq: insert passthrough request into hctx->dispatch directly · 01e99aec
      Ming Lei authored
      
      For some reason, a device may get into a state in which it can't handle
      FS requests, so STS_RESOURCE is always returned and the FS request is
      re-added to hctx->dispatch. However, a passthrough request may be
      required at that time to fix the problem. If the passthrough request is
      added to the scheduler queue, there isn't any chance for blk-mq to
      dispatch it, given that we prioritize requests in hctx->dispatch. Then
      the FS IO request may never be completed, and an IO hang is caused.
      
      So the passthrough request has to be added to hctx->dispatch directly
      to fix the IO hang.

      Fix this issue by inserting passthrough requests into hctx->dispatch
      directly, together with adding FS requests to the tail of
      hctx->dispatch in blk_mq_dispatch_rq_list(). Actually we add FS
      requests to the tail of hctx->dispatch by default; see
      blk_mq_request_bypass_insert().

      This becomes consistent with the original legacy IO request path, in
      which a passthrough request is always added to q->queue_head.
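
      A condensed sketch of the insertion decision after this change
      (simplified; the bypass helper gained an at_head parameter in this
      commit):

          /* in blk_mq_sched_insert_request(), simplified */
          if (blk_rq_is_passthrough(rq)) {
                  /*
                   * Bypass the elevator so the passthrough request is
                   * seen even while FS requests keep being requeued to
                   * hctx->dispatch with STS_RESOURCE.
                   */
                  blk_mq_request_bypass_insert(rq, at_head, false);
                  return;
          }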
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  24. Sep 26, 2019
  25. Jul 03, 2019
  26. Jun 20, 2019
    • block: remove the bi_phys_segments field in struct bio · 14ccb66b
      Christoph Hellwig authored
      
      We only need the number of segments in the blk-mq submission path.
      Remove the field from struct bio and return the segment count from a
      variant of blk_queue_split instead, so that it can be passed as an
      argument to those functions that need the value.
      
      This also means we stop recounting segments except for cloning
      and partial segments.
      
      To keep the number of arguments in this hot path down, remove the
      pointless struct request_queue argument from any of the functions that
      had it and grew a nr_segs argument.
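
      A sketch of the resulting interface (signatures as this commit shapes
      them, simplified):

          /* split an incoming bio and report its segment count */
          void __blk_queue_split(struct request_queue *q, struct bio **bio,
                                 unsigned int *nr_segs);

          /* consumers take the count instead of re-deriving it */
          bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
                                      unsigned int nr_segs);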
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  27. Jun 13, 2019
  28. Jun 07, 2019
    • block: free sched's request pool in blk_cleanup_queue · c3e22192
      Ming Lei authored
      
      In theory, the IO scheduler belongs to the request queue, and the
      request pool of sched tags belongs to the request queue too.

      However, the current tag allocation interfaces are re-used for both
      driver tags and sched tags, and driver tags are definitely host-wide
      and don't belong to any request queue, and the same holds for their
      request pool. So we need the tagset instance for freeing requests of
      sched tags.
      
      Meanwhile, blk_mq_free_tag_set() often follows blk_cleanup_queue() in
      the non-BLK_MQ_F_TAG_SHARED case, which requires the request pool of
      sched tags to be freed before blk_mq_free_tag_set() is called.

      Commit 47cdee29 ("block: move blk_exit_queue into __blk_release_queue")
      moved blk_exit_queue into __blk_release_queue to simplify the fast
      path in generic_make_request(), and thereby causes an oops when freeing
      requests of sched tags in __blk_release_queue().

      Fix the above issue by moving the freeing of the sched tags request
      pool into blk_cleanup_queue(). This is safe because the queue has been
      frozen and there are no in-queue requests at that time. Freeing the
      sched tags themselves has to be kept in the queue's release handler
      because there might be un-completed dispatch activity which might refer
      to sched tags.
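
      A sketch of where the freeing now happens (simplified;
      blk_mq_sched_free_requests() is the helper this change uses, and the
      rest of blk_cleanup_queue() is elided):

          void blk_cleanup_queue(struct request_queue *q)
          {
                  /* queue is frozen: no in-queue requests remain */
                  blk_freeze_queue(q);

                  /* free the sched request pool while the tagset, which
                   * is host-wide and owned by the driver, still exists */
                  mutex_lock(&q->sysfs_lock);
                  if (q->elevator)
                          blk_mq_sched_free_requests(q);
                  mutex_unlock(&q->sysfs_lock);
          }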
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Fixes: 47cdee29 ("block: move blk_exit_queue into __blk_release_queue")
      Tested-by: Yi Zhang <yi.zhang@redhat.com>
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  29. May 04, 2019
    • blk-mq: grab .q_usage_counter when queuing request from plug code path · e87eb301
      Ming Lei authored
      
      Just like aio/io_uring, we need to grab two refcounts for queuing one
      request: one for submission, another for completion.
      
      If the request isn't queued from the plug code path, the refcount
      grabbed in generic_make_request() serves for submission. In theory,
      this refcount should be released after the submission (async queue
      run) is done. blk_freeze_queue() works together with blk_sync_queue()
      to avoid a race between queue cleanup and IO submission; given that
      hctx->run_work is scheduled with the refcount held, async queue-run
      activity is canceled in time, so it is fine not to hold the refcount
      while the queue-run work function dispatches IO.
      
      However, if the request is staged in the plug list and finally queued
      from the plug code path, the refcount on the submission side is
      actually missing. Then we may start to run the queue after the queue
      is removed, because the queue's kobject refcount isn't guaranteed to
      be held in the plug-list-flushing context, and a kernel oops is
      triggered. See the following race:
      
      blk_mq_flush_plug_list():
              blk_mq_sched_insert_requests()
                      insert requests to sw queue or scheduler queue
                      blk_mq_run_hw_queue
      
      Because of concurrent queue runs, all requests inserted above may be
      completed before the above blk_mq_run_hw_queue() is called, so the
      queue can be freed during the above blk_mq_run_hw_queue().

      Fix the issue by grabbing .q_usage_counter before calling
      blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This is
      safe because the queue is certainly alive before inserting the
      request.
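
      A sketch of the fix (simplified; q->q_usage_counter is a percpu_ref,
      and the insertion plus the possible async queue run happen between
      the get and the put):

          /* in blk_mq_sched_insert_requests(), reached from
           * blk_mq_flush_plug_list() (simplified) */
          percpu_ref_get(&q->q_usage_counter);    /* submission ref */

          /* insert requests to the sw queue or scheduler queue,
           * then kick the hardware queue */
          blk_mq_run_hw_queue(hctx, true);

          percpu_ref_put(&q->q_usage_counter);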
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  30. Apr 30, 2019
  31. Apr 05, 2019
    • block: Revert v5.0 blk_mq_request_issue_directly() changes · fd9c40f6
      Bart Van Assche authored
      
      blk_mq_try_issue_directly() can return BLK_STS*_RESOURCE for requests that
      have been queued. If that happens when blk_mq_try_issue_directly() is called
      by the dm-mpath driver then dm-mpath will try to resubmit a request that is
      already queued and a kernel crash follows. Since it is nontrivial to fix
      blk_mq_request_issue_directly(), revert the blk_mq_request_issue_directly()
      changes that went into kernel v5.0.
      
      This patch reverts the following commits:
      * d6a51a97 ("blk-mq: replace and kill blk_mq_request_issue_directly") # v5.0.
      * 5b7a6f12 ("blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests") # v5.0.
      * 7f556a44 ("blk-mq: refactor the code of issue request directly") # v5.0.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: <stable@vger.kernel.org>
      Reported-by: Laurence Oberman <loberman@redhat.com>
      Tested-by: Laurence Oberman <loberman@redhat.com>
      Fixes: 7f556a44 ("blk-mq: refactor the code of issue request directly") # v5.0.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  32. Feb 01, 2019
  33. Dec 17, 2018
    • blk-mq: fix dispatch from sw queue · c16d6b5a
      Ming Lei authored
      
      When a request is added to the rq list of a sw queue (ctx), the rq may
      be from a different type of hctx, especially after multi queue mapping
      is introduced.

      So when dispatching requests from a sw queue via
      blk_mq_flush_busy_ctxs() or blk_mq_dequeue_from_ctx(), a request
      belonging to a different hctx queue type can be dispatched to the
      current hctx when the read or poll queues are enabled.
      
      This patch fixes this issue by introducing per-queue-type lists.
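
      A sketch of the reshaped sw queue context, with one list per hctx type
      sharing the original cacheline (simplified; cf. Jens's note below):

          struct blk_mq_ctx {
                  struct {
                          spinlock_t              lock;
                          /* one list per hctx type: default, read, poll */
                          struct list_head        rq_lists[HCTX_MAX_TYPES];
                  } ____cacheline_aligned_in_smp;
                  /* ... */
          };

          /* dispatch then picks the list matching the hctx's type: */
          struct list_head *list = &ctx->rq_lists[hctx->type];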
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Changed by me to not use separately cacheline-aligned lists; just
      place them all in the same cacheline where we had just the one list
      and lock before.
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>