- Mar 08, 2017
Ming Lei authored
Both q->mq_kobj and the sw queues' kobjects should be initialized only once, instead of once per add_disk() context. This patch also removes the clearing of ctx in blk_mq_init_cpu_queues(), because the percpu allocator already zero-fills the allocated memory. This fixes one issue [1] reported by Omar.

[1] kernel warning when doing unbind/bind on one scsi-mq device

[   19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong.
[   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
[   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[   19.350920] Workqueue: events_unbound async_run_entry_fn
[   19.350920] Call Trace:
[   19.350920]  dump_stack+0x63/0x83
[   19.350920]  kobject_init+0x77/0x90
[   19.350920]  blk_mq_register_dev+0x40/0x130
[   19.350920]  blk_register_queue+0xb6/0x190
[   19.350920]  device_add_disk+0x1ec/0x4b0
[   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
[   19.350920]  async_run_entry_fn+0x48/0x150
[   19.350920]  process_one_work+0x1d0/0x480
[   19.350920]  worker_thread+0x48/0x4e0
[   19.350920]  kthread+0x101/0x140
[   19.350920]  ? process_one_work+0x480/0x480
[   19.350920]  ? kthread_create_on_node+0x60/0x60
[   19.350920]  ret_from_fork+0x2c/0x40

Cc: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- Mar 02, 2017
Jan Kara authored
Commit 6cd18e71 ("block: destroy bdi before blockdev is unregistered") moved bdi unregistration (at that time through bdi_destroy()) from blk_release_queue() to blk_cleanup_queue(), because it needs to happen before the blk_unregister_region() call in del_gendisk() for MD. SCSI, though, will free up the device number from sd_remove(), called through a maze of callbacks from device_del() in __scsi_remove_device(), before blk_cleanup_queue(), and thus races similar to those described in 6cd18e71 can happen for SCSI as well, as reported by Omar [1]. Moving bdi_unregister() to del_gendisk() works for MD and fixes the problem for SCSI, since del_gendisk() gets called from sd_remove() before the device number is freed. This also makes device_add_disk() (calling bdi_register_owner()) more symmetric with del_gendisk().

[1] http://marc.info/?l=linux-block&m=148554717109098&w=2

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
When drivers are called with a request in blk-mq, blk-mq flags the state such that the driver knows if this is the last request in this call chain or not. The driver can then use that information to defer kicking off IO until bd->last is true. However, with blk-mq and scheduling, we need to allocate a driver tag for a request before it can be issued. If we fail to allocate such a tag, we could end up in the situation where the last request issued did not have bd->last == true set. This can then cause a driver hang. This fixes a hang with virtio-blk, which uses bd->last as a hint on whether to kick the queue or not.

Reported-by: Chris Mason <clm@fb.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
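
For illustration, a minimal sketch of how a driver can consume bd->last to batch doorbell kicks; it uses the modern queue_rq() signature, and the demo_* helpers are hypothetical, not virtio-blk's actual code:

    #include <linux/blk-mq.h>

    /* Hypothetical driver hooks, stubbed for the sketch. */
    static void demo_submit(void *hw, struct request *rq) { }
    static void demo_kick_hw(void *hw) { }

    /* Defer the doorbell kick until blk-mq marks the last request in
     * this call chain, so a burst of requests costs one kick. */
    static blk_status_t demo_queue_rq(struct blk_mq_hw_ctx *hctx,
                                      const struct blk_mq_queue_data *bd)
    {
            demo_submit(hctx->driver_data, bd->rq);

            if (bd->last)
                    demo_kick_hw(hctx->driver_data); /* one kick per batch */

            return BLK_STS_OK;
    }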
-
Jens Axboe authored
For legacy scheduling, we always call ioc_exit_icq() with both the ioc and queue lock held. This poses a problem for blk-mq with scheduling, since the queue lock isn't what we use in the scheduler. And since we don't need the queue lock held for ioc exit there, don't grab it and leave any extra locking up to the blk-mq scheduler.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
A driver may wish to take corrective action if queued requests do not complete within a set time.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
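
The commit message does not name the interface; as context, a hedged sketch of the blk-mq timeout hook a driver can use for such corrective action (modern signature; demo_abort() is hypothetical):

    #include <linux/blk-mq.h>

    static void demo_abort(struct request *rq) { } /* hypothetical */

    /* blk-mq invokes .timeout when a queued request exceeds the queue's
     * timeout; the driver can abort/reset and rearm, or complete it. */
    static enum blk_eh_timer_return demo_timeout(struct request *rq)
    {
            demo_abort(rq);                 /* corrective action */
            return BLK_EH_RESET_TIMER;      /* or BLK_EH_DONE when finished */
    }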
-
Keith Busch authored
Drivers can start a freeze, so this provides a way to wait until the queue is actually frozen.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
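
A plausible usage sketch, assuming the helper exported here is blk_mq_freeze_queue_wait(); starting all freezes first and waiting afterwards lets the queues drain in parallel:

    #include <linux/blk-mq.h>
    #include <linux/blkdev.h>

    static void demo_freeze_all(struct request_queue **queues, int count)
    {
            int i;

            for (i = 0; i < count; i++)
                    blk_freeze_queue_start(queues[i]);   /* kick off freezes */

            for (i = 0; i < count; i++)
                    blk_mq_freeze_queue_wait(queues[i]); /* then wait for each */
    }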
-
Omar Sandoval authored
No functional difference, it just makes a little more sense to update the tag map where we actually allocate the tag.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
-
Omar Sandoval authored
Nothing is using it anymore.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
-
Omar Sandoval authored
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike its blk_mq_alloc_request() counterpart. It also crashes because it doesn't update the tags->rqs map. Fix it by making it allocate a scheduler request.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Modified by me to also check at driver tag allocation time if the original request was reserved, so we can be sure to allocate a properly reserved tag at that point in time, too.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Shaohua Li authored
The blk_mq_tags/requests of a specific hardware queue are mostly used by specific CPUs, which might not be in the same NUMA node as the disk. For example, if an NVMe card is in node 0, half of the hardware queues will be used by node 0 and the other half by node 1.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
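
A minimal sketch of the idea, allocating a hardware queue's per-queue data on the node that services it (the wrapper and size are illustrative):

    #include <linux/slab.h>
    #include <linux/blk-mq.h>

    static void *demo_alloc_hctx_data(struct blk_mq_hw_ctx *hctx, size_t size)
    {
            /* hctx->numa_node is derived from the CPUs mapped to this
             * hardware queue, not from the device's home node. */
            return kzalloc_node(size, GFP_KERNEL, hctx->numa_node);
    }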
-
Ingo Molnar authored
But first update the code that uses these facilities with the new header.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
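
The placeholder pattern used throughout this series is tiny; a sketch of what such a header amounts to at this stage:

    /* include/linux/sched/task_stack.h (placeholder stage) */
    #ifndef _LINUX_SCHED_TASK_STACK_H
    #define _LINUX_SCHED_TASK_STACK_H

    /* For now simply map to <linux/sched.h>; the real content moves
     * here in a later, separately bisectable step. */
    #include <linux/sched.h>

    #endif /* _LINUX_SCHED_TASK_STACK_H */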
-
Ingo Molnar authored
We are going to move the signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>. Fix up the affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Add #include <linux/cred.h> dependencies to all .c files that rely on sched.h doing that for them. Note that even if the number of files where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
We are going to split <linux/sched/user.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/user.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
We are going to split <linux/sched/topology.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/topology.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- Feb 28, 2017
Alexey Dobriyan authored
Now that %z is standardized in C99, there is no reason to support %Z. Unlike %L, it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple of ATM drivers. In case anyone didn't notice, lib/vsprintf.o is about half the size of SLUB, which in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
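
For context, the standard form that replaces the legacy %Z, in a small printk example (the struct chosen is arbitrary):

    #include <linux/kernel.h>
    #include <linux/list.h>

    static void demo_print_size(void)
    {
            size_t n = sizeof(struct list_head);

            pr_info("list_head is %zu bytes\n", n); /* C99 %z, was "%Zu" */
    }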
-
Masahiro Yamada authored
Fix typos and add the following to scripts/spelling.txt:

embeded||embedded

Link: http://lkml.kernel.org/r/1481573103-11329-12-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 27, 2017
Christoph Hellwig authored
Similar to the PCI version, just calling into virtio instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- Feb 23, 2017
Omar Sandoval authored
In blk_mq_sched_dispatch_requests(), we call blk_mq_sched_mark_restart() after we dispatch requests left over on our hardware queue dispatch list. This is so we'll go back and dispatch requests from the scheduler. In this case, it's only necessary to restart the hardware queue that we are running; there's no reason to run other hardware queues just because we are using shared tags.

So, split out blk_mq_sched_mark_restart() into two operations, one for just the hardware queue and one for the whole request queue. The core code only needs the hctx variant, but I/O schedulers will want to use both. This also requires adjusting blk_mq_sched_restart_queues() to always check the queue restart flag, not just when using shared tags.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
Commit 50e1dab8 ("blk-mq-sched: fix starvation for multiple hardware queues and shared tags") fixed one starvation issue for shared tags. However, we can still get into a situation where we fail to allocate a tag because all tags are allocated but we don't have any pending requests on any hardware queue. One solution for this would be to restart all queues that share a tag map, but that really sucks. Ideally, we could just block and wait for a tag, but that isn't always possible from blk_mq_dispatch_rq_list().

However, we can still use the struct sbitmap_queue wait queues with a custom callback instead of blocking. This has a few benefits:

1. It avoids iterating over all hardware queues when completing an I/O, which the current restart code has to do.
2. It benefits from the existing rolling wakeup code.
3. It avoids punting to another thread just to have it block.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
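
A hedged sketch of the callback shape, using the 4.10-era wait-queue API (wait_queue_t with a task_list member); the dispatch_wait member on the hctx is an assumption about this patch, not verified code:

    #include <linux/wait.h>
    #include <linux/blk-mq.h>

    /* When a freed tag wakes this entry, re-run the hardware queue
     * instead of having a blocked thread consume the wakeup. */
    static int demo_dispatch_wake(wait_queue_t *wait, unsigned int mode,
                                  int flags, void *key)
    {
            struct blk_mq_hw_ctx *hctx;

            hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);
            list_del(&wait->task_list);
            blk_mq_run_hw_queue(hctx, true);
            return 1;
    }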
-
Scott Bauer authored
During an error on a command (e.g. the user provides the wrong password to unlock a range), we will gracefully terminate the opal session. We want to propagate the original error to userland instead of the result of the session termination, which is almost always a success.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
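
The pattern, in a hedged sketch (helper names hypothetical):

    struct opal_dev;
    int opal_run_step(struct opal_dev *dev);        /* hypothetical */
    void end_opal_session(struct opal_dev *dev);    /* graceful teardown */

    /* Keep and return the step's error; the teardown result (almost
     * always success) must not mask it. */
    static int demo_step_with_cleanup(struct opal_dev *dev)
    {
            int error = opal_run_step(dev);

            if (error)
                    end_opal_session(dev);

            return error;   /* the original failure reaches userland */
    }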
-
Scott Bauer authored
Before we free the opal structure we need to clean up any saved locking ranges that the user had told us to unlock from a suspend.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Tetsuo Handa authored
The IOPRIO_WHO_USER cases in sys_ioprio_set()/sys_ioprio_get() use while_each_thread(), which is unsafe under the RCU lock according to commit 0c740d0a ("introduce for_each_thread() to replace the buggy while_each_thread()"). Use for_each_thread() (via for_each_process_thread()), which is safe under the RCU lock.

Link: http://lkml.kernel.org/r/201702011947.DBD56740.OMVHOLOtSJFFFQ@I-love.SAKURA.ne.jp
Link: http://lkml.kernel.org/r/1486041779-4401-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
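
The safe pattern, sketched; set_task_ioprio() is the real helper in block/ioprio.c, while the surrounding function is illustrative:

    #include <linux/sched.h>
    #include <linux/cred.h>
    #include <linux/ioprio.h>
    #include <linux/rcupdate.h>

    static int demo_ioprio_set_user(kuid_t uid, int ioprio)
    {
            struct task_struct *p, *t;
            int ret = 0;

            rcu_read_lock();
            /* for_each_process_thread() is safe under rcu_read_lock(),
             * unlike the buggy while_each_thread(). */
            for_each_process_thread(p, t) {
                    if (!uid_eq(task_uid(t), uid))
                            continue;
                    ret = set_task_ioprio(t, ioprio);
                    if (ret)
                            break;
            }
            rcu_read_unlock();
            return ret;
    }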
-
- Feb 22, 2017
Jens Axboe authored
The wording in the entries was poor and not understandable even by deities. Kill the selection for the default block scheduler, and impose a policy with sane defaults.

Architected-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
By embedding the function data with the function sequence, we can eliminate the external function-data and state-variable code. This also made some other small cleanups obvious.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
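
A sketch of the embedding (names hypothetical; the real structures live in block/sed-opal.c):

    #include <linux/types.h>

    struct opal_dev;

    /* Each step carries its own data, replacing the parallel data array
     * and the state counter that indexed it. */
    struct demo_step {
            int (*fn)(struct opal_dev *dev, void *data);
            void *data;
    };

    static int demo_execute_steps(struct opal_dev *dev,
                                  const struct demo_step *steps, size_t n)
    {
            size_t i;
            int error;

            for (i = 0; i < n; i++) {
                    error = steps[i].fn(dev, steps[i].data);
                    if (error)
                            return error;
            }
            return 0;
    }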
-
Jon Derrick authored
Add a buffer size check against discovery and response header lengths before we loop over their buffers.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
Add a helper which verifies that the response token is valid and matches the expected value. This merges token_type() and response_get_token().

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
The short atom parser can return an errno from decoding, but it does not currently return the error as a signed value. Convert all of the parsers to ssize_t.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- Feb 21, 2017
Jan Kara authored
Iteration over partitions in del_gendisk() omits part0. Add a bdev_unhash_inode() call for the whole device. Otherwise, if the device number gets reused, the bdev inode will still be associated with the old (stale) bdi.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
Move bdev_unhash_inode() after invalidate_partition(), as invalidate_partition() looks up the bdev and cannot find the right bdev inode after bdev_unhash_inode() has been called. Thus invalidate_partition() would not invalidate the page cache of the previously used bdev. Also use part_devt() when calling bdev_unhash_inode() instead of manually creating the device number.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- Feb 17, 2017
Christoph Hellwig authored
Instead of bloating the containing structure with it all the time, this allocates struct opal_dev dynamically. Additionally, this allows moving the definition of struct opal_dev into sed-opal.c. For this, a new private data field is added to it that is passed to the send/receive callback. After that, a lot of the internals can be made private as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
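
A sketch of the consumer side after the change; the init_opal_dev() and callback shapes are assumptions about the resulting API (see include/linux/sed-opal.h):

    #include <linux/errno.h>
    #include <linux/sed-opal.h>

    /* Hypothetical transport callback: ships a security command to the
     * device, using the caller's private data pointer. */
    static int demo_sec_submit(void *data, u16 spsp, u8 secp, void *buffer,
                               size_t len, bool send)
    {
            return 0;
    }

    struct demo_ctrl {
            struct opal_dev *opal_dev;  /* was an embedded struct opal_dev */
    };

    static int demo_init_security(struct demo_ctrl *ctrl)
    {
            ctrl->opal_dev = init_opal_dev(ctrl, demo_sec_submit);
            return ctrl->opal_dev ? 0 : -ENOMEM;
    }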
-
Christoph Hellwig authored
Not having OPAL or a sub-feature supported is an entirely normal condition for many drives, so don't warn about it. Keep the messages, but tone them down to debug only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
For blk-mq with scheduling, we can potentially end up with ALL driver tags assigned and sitting on the flush queues. If we defer because of an in-flight data request, then we can deadlock if that data request doesn't already have a tag assigned. This fixes a deadlock when running the xfs/297 xfstest, where thousands of syncs can cause the drive queue to stall.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
Usually we don't ask the scheduler for work, if we already have leftovers on the dispatch list. This is done to leave work on the scheduler side for as long as possible, for proper merging. But if we do have work leftover but didn't dispatch anything, then we should ask the scheduler since we could potentially issue requests from that.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
The current request insertion machinery works just fine for directly inserting flushes, so no need to special case this anymore.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
If we are currently out of driver tags, we don't want to add a new flush (without a tag) to the head of the requeue list. We want to add it to the back, behind the others that are potentially also waiting for a tag.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
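
The distinction, in a small sketch over the generic list API (this shows the policy, not the actual blk-mq requeue code):

    #include <linux/list.h>
    #include <linux/blkdev.h>

    /* A flush still waiting for a driver tag must queue behind the
     * other tag waiters, so it goes to the back, not the head. */
    static void demo_requeue(struct request *rq,
                             struct list_head *requeue_list, bool at_head)
    {
            if (at_head)
                    list_add(&rq->queuelist, requeue_list);
            else
                    list_add_tail(&rq->queuelist, requeue_list);
    }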
-
Jens Axboe authored
Currently we're almost there, but if we dispatch nothing, then we still return success.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-