Skip to content
Snippets Groups Projects
  1. Dec 12, 2009
  2. Dec 11, 2009
  3. Dec 10, 2009
    • Christoph Hellwig's avatar
      vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Christoph Hellwig authored
      
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatiblity it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actuall care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define a
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefinining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarKyle McMartin <kyle@mcmartin.ca>
      Acked-by: default avatarUlrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      6b2f3d1f
  4. Nov 30, 2009
  5. Nov 26, 2009
    • Ilya Loginov's avatar
      block: add helpers to run flush_dcache_page() against a bio and a request's pages · 2d4dc890
      Ilya Loginov authored
      
      Mtdblock driver doesn't call flush_dcache_page for pages in request.  So,
      this causes problems on architectures where the icache doesn't fill from
      the dcache or with dcache aliases.  The patch fixes this.
      
      The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
      pointless empty cache-thrashing loops on architectures for which
      flush_dcache_page() is a no-op.  Every architecture was provided with this
      flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
      equal 1 or do nothing otherwise.
      
      See "fix mtd_blkdevs problem with caches on some architectures" discussion
      on LKML for more information.
      
      Signed-off-by: default avatarIlya Loginov <isloginov@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Peter Horton <phorton@bitbox.co.uk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      2d4dc890
  6. Nov 17, 2009
  7. Nov 09, 2009
  8. Nov 06, 2009
  9. Oct 12, 2009
    • Neil Horman's avatar
      net: Generalize socket rx gap / receive queue overflow cmsg · 3b885787
      Neil Horman authored
      
      Create a new socket level option to report number of queue overflows
      
      Recently I augmented the AF_PACKET protocol to report the number of frames lost
      on the socket receive queue between any two enqueued frames.  This value was
      exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
      requested that this feature be generalized so that any datagram oriented socket
      could make use of this option.  As such I've created this patch, It creates a
      new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
      SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
      overflowed between any two given frames.  It also augments the AF_PACKET
      protocol to take advantage of this new feature (as it previously did not touch
      sk->sk_drops, which this patch uses to record the overflow count).  Tested
      successfully by me.
      
      Notes:
      
      1) Unlike my previous patch, this patch simply records the sk_drops value, which
      is not a number of drops between packets, but rather a total number of drops.
      Deltas must be computed in user space.
      
      2) While this patch currently works with datagram oriented protocols, it will
      also be accepted by non-datagram oriented protocols. I'm not sure if thats
      agreeable to everyone, but my argument in favor of doing so is that, for those
      protocols which aren't applicable to this option, sk_drops will always be zero,
      and reporting no drops on a receive queue that isn't used for those
      non-participating protocols seems reasonable to me.  This also saves us having
      to code in a per-protocol opt in mechanism.
      
      3) This applies cleanly to net-next assuming that commit
      97775007 (my af packet cmsg patch) is reverted
      
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3b885787
  10. Sep 28, 2009
  11. Sep 25, 2009
  12. Sep 24, 2009
    • Peter Zijlstra's avatar
      fcntl: add F_[SG]ETOWN_EX · ba0a6c9f
      Peter Zijlstra authored
      
      In order to direct the SIGIO signal to a particular thread of a
      multi-threaded application we cannot, like suggested by the manpage, put a
      TID into the regular fcntl(F_SETOWN) call.  It will still be send to the
      whole process of which that thread is part.
      
      Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.
      
      The need to direct SIGIO comes from self-monitoring profiling such as with
      perf-counters.  Perf-counters uses SIGIO to notify that new sample data is
      available.  If the signal is delivered to the same task that generated the
      new sample it can augment that data by inspecting the task's user-space
      state right after it returns from the kernel.  This is esp.  convenient
      for interpreted or virtual machine driven environments.
      
      Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
      as argument:
      
      struct f_owner_ex {
      	int   type;
      	pid_t pid;
      };
      
      Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.
      
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Tested-by: default avatarstephane eranian <eranian@googlemail.com>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ba0a6c9f
    • Alexey Dobriyan's avatar
      headers: utsname.h redux · 2bcd57ab
      Alexey Dobriyan authored
      
      * remove asm/atomic.h inclusion from linux/utsname.h --
         not needed after kref conversion
       * remove linux/utsname.h inclusion from files which do not need it
      
      NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
      due to some personality stuff it _is_ needed -- cowardly leave ELF-related
      headers and files alone.
      
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2bcd57ab
    • Rusty Russell's avatar
      cpumask: remove arch_send_call_function_ipi · 0748bd01
      Rusty Russell authored
      
      Now everyone is converted to arch_send_call_function_ipi_mask, remove
      the shim and the #defines.
      
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      0748bd01
  13. Sep 22, 2009
  14. Sep 21, 2009
    • Ingo Molnar's avatar
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      
      Suggested-by: default avatarStephane Eranian <eranian@google.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Reviewed-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cdd6c482
    • Joe Perches's avatar
  15. Sep 20, 2009
    • Sam Ravnborg's avatar
      kbuild: use INSTALLKERNEL to select customized installkernel script · caa27b66
      Sam Ravnborg authored
      
      Replace the use of CROSS_COMPILE to select a customized
      installkernel script with the possibility to set INSTALLKERNEL
      to select a custom installkernel script when running make:
      
          make INSTALLKERNEL=arm-installkernel install
      
      With this patch we are now more consistent across
      different architectures - they did not all support use
      of CROSS_COMPILE.
      
      The use of CROSS_COMPILE was a hack as this really belongs
      to gcc/binutils and the installkernel script does not change
      just because we change toolchain.
      
      The use of CROSS_COMPILE caused troubles with an upcoming patch
      that saves CROSS_COMPILE when a kernel is built - it would no
      longer be installable.
      [Thanks to Peter Z. for this hint]
      
      This patch undos what Ian did in commit:
      
        0f8e2d62
        ("use ${CROSS_COMPILE}installkernel in arch/*/boot/install.sh")
      
      The patch has been lightly tested on x86 only - but all changes
      looks obvious.
      
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Acked-by: Mike Frysinger <vapier@gentoo.org> [blackfin]
      Acked-by: Russell King <linux@arm.linux.org.uk> [arm]
      Acked-by: Paul Mundt <lethal@linux-sh.org> [sh]
      Acked-by: "H. Peter Anvin" <hpa@zytor.com> [x86]
      Cc: Ian Campbell <icampbell@arcom.com>
      Cc: Tony Luck <tony.luck@intel.com> [ia64]
      Cc: Fenghua Yu <fenghua.yu@intel.com> [ia64]
      Cc: Hirokazu Takata <takata@linux-m32r.org> [m32r]
      Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Cc: Kyle McMartin <kyle@mcmartin.ca> [parisc]
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc]
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Cc: Thomas Gleixner <tglx@linutronix.de> [x86]
      Cc: Ingo Molnar <mingo@redhat.com> [x86]
      Signed-off-by: default avatarSam Ravnborg <sam@ravnborg.org>
      caa27b66
  16. Sep 09, 2009
  17. Sep 02, 2009
    • David Howells's avatar
      KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] · ee18d64c
      David Howells authored
      
      Add a keyctl to install a process's session keyring onto its parent.  This
      replaces the parent's session keyring.  Because the COW credential code does
      not permit one process to change another process's credentials directly, the
      change is deferred until userspace next starts executing again.  Normally this
      will be after a wait*() syscall.
      
      To support this, three new security hooks have been provided:
      cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
      the blank security creds and key_session_to_parent() - which asks the LSM if
      the process may replace its parent's session keyring.
      
      The replacement may only happen if the process has the same ownership details
      as its parent, and the process has LINK permission on the session keyring, and
      the session keyring is owned by the process, and the LSM permits it.
      
      Note that this requires alteration to each architecture's notify_resume path.
      This has been done for all arches barring blackfin, m68k* and xtensa, all of
      which need assembly alteration to support TIF_NOTIFY_RESUME.  This allows the
      replacement to be performed at the point the parent process resumes userspace
      execution.
      
      This allows the userspace AFS pioctl emulation to fully emulate newpag() and
      the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
      alter the parent process's PAG membership.  However, since kAFS doesn't use
      PAGs per se, but rather dumps the keys into the session keyring, the session
      keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
      the newpag flag.
      
      This can be tested with the following program:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <keyutils.h>
      
      	#define KEYCTL_SESSION_TO_PARENT	18
      
      	#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)
      
      	int main(int argc, char **argv)
      	{
      		key_serial_t keyring, key;
      		long ret;
      
      		keyring = keyctl_join_session_keyring(argv[1]);
      		OSERROR(keyring, "keyctl_join_session_keyring");
      
      		key = add_key("user", "a", "b", 1, keyring);
      		OSERROR(key, "add_key");
      
      		ret = keyctl(KEYCTL_SESSION_TO_PARENT);
      		OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");
      
      		return 0;
      	}
      
      Compiled and linked with -lkeyutils, you should see something like:
      
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	355907932 --alswrv   4043    -1   \_ keyring: _uid.4043
      	[dhowells@andromeda ~]$ /tmp/newpag
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	1055658746 --alswrv   4043  4043   \_ user: a
      	[dhowells@andromeda ~]$ /tmp/newpag hello
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: hello
      	340417692 --alswrv   4043  4043   \_ user: a
      
      Where the test program creates a new session keyring, sticks a user key named
      'a' into it and then installs it on its parent.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      ee18d64c
    • David Howells's avatar
      KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6] · d0420c83
      David Howells authored
      
      Implement TIF_NOTIFY_RESUME for most of those architectures in which isn't yet
      available, and, whilst we're at it, have it call the appropriate tracehook.
      
      After this patch, blackfin, m68k* and xtensa still lack support and need
      alteration of assembly code to make it work.
      
      Resume notification can then be used (by a later patch) to install a new
      session keyring on the parent of a process.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      
      cc: linux-arch@vger.kernel.org
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      d0420c83
  18. Aug 29, 2009
  19. Aug 05, 2009
    • Jan Engelhardt's avatar
      net: implement a SO_DOMAIN getsockoption · 0d6038ee
      Jan Engelhardt authored
      
      This sockopt goes in line with SO_TYPE and SO_PROTOCOL. It makes it
      possible for userspace programs to pass around file descriptors — I
      am referring to arguments-to-functions, but it may even work for the
      fd passing over UNIX sockets — without needing to also pass the
      auxiliary information (PF_INET6/IPPROTO_TCP).
      
      Signed-off-by: default avatarJan Engelhardt <jengelh@medozas.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d6038ee
    • Jan Engelhardt's avatar
      net: implement a SO_PROTOCOL getsockoption · 49c794e9
      Jan Engelhardt authored
      
      Similar to SO_TYPE returning the socket type, SO_PROTOCOL allows to
      retrieve the protocol used with a given socket.
      
      I am not quite sure why we have that-many copies of socket.h, and why
      the values are not the same on all arches either, but for where hex
      numbers dominate, I use 0x1029 for SO_PROTOCOL as that seems to be
      the next free unused number across a bunch of operating systems, or
      so Google results make me want to believe. SO_PROTOCOL for others
      just uses the next free Linux number, 38.
      
      Signed-off-by: default avatarJan Engelhardt <jengelh@medozas.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49c794e9
Loading