path: root/ipc
Age  Commit message  Author  Files  Lines
2009-04-13namespaces: move get_mq() inside #ifdef CONFIG_SYSCTLGeert Uytterhoeven1-1/+1
ipc/mq_sysctl.c:26: warning: 'get_mq' defined but not used Signed-off-by: Geert Uytterhoeven <> Acked-by: Serge Hallyn <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-04-07namespaces: mqueue namespace: adapt sysctlSerge E. Hallyn3-64/+118
Largely inspired from ipc/ipc_sysctl.c. This patch isolates the mqueue sysctl stuff in its own file. [ build fix] Signed-off-by: Cedric Le Goater <> Signed-off-by: Nadia Derbey <> Signed-off-by: Serge E. Hallyn <> Cc: Alexey Dobriyan <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-04-07namespaces: ipc namespaces: implement support for posix msqueuesSerge E. Hallyn4-43/+124
Implement multiple mounts of the mqueue file system, and link it to usage of CLONE_NEWIPC.

Each ipc ns has a corresponding mqueuefs superblock. When a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an internal mount of a new mqueuefs sb linked to the new ipc ns.

When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the mqueuefs superblock.

Posix message queues can be worked with both through the mq_* system calls (see mq_overview(7)), and through the VFS through the mqueue mount. Any usage of mq_open() and friends will work with the acting task's ipc namespace. Any actions through the VFS will work with the mqueuefs in which the file was created. So if a user doesn't remount mqueuefs after unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls /dev/mqueue".

If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns, ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) its superblock will live on until task b umounts the corresponding mqueuefs, and vfs actions will continue to succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to the deceased ipc_ns:1.

To make this happen, we must protect the ipc reference count when
a) a task exits and drops its ipcns->count, since it might be dropping it to 0 and freeing the ipcns
b) a task accesses the ipcns through its mqueuefs interface, since it bumps the ipcns refcount and might race with the last task in the ipcns exiting.

So the kref is changed to an atomic_t so we can use atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns through ns = mqueuefs_sb->s_fs_info is protected by the same lock.

Signed-off-by: Cedric Le Goater <> Signed-off-by: Serge E. Hallyn <> Cc: Alexey Dobriyan <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
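The refcount protection described in this commit can be illustrated in userspace. What follows is a minimal sketch, assuming C11 atomics and a pthread mutex as stand-ins for the kernel's atomic_t and mq_lock. The names ns_sketch, dec_and_lock(), get_ns_sketch() and put_ns_sketch() are hypothetical and are not kernel symbols.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ipc_namespace; only the count matters. */
struct ns_sketch {
    atomic_int count;
};

/* Plays the role of mq_lock in the commit message. */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Userspace analogue of atomic_dec_and_lock(): decrement the count and
 * return true, with the lock held, only if the count reached zero.
 */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
    int old = atomic_load(count);

    assert(old > 0);    /* dropping a reference we do not hold is a bug */

    /* Fast path: not the last reference, so drop it without the lock. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
    }

    /* Slow path: we may be the last holder, so decide under the lock. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;    /* caller tears down, then unlocks */
    pthread_mutex_unlock(lock);
    return false;
}

/* Drop a reference; returns true if this call released the last one. */
static bool put_ns_sketch(struct ns_sketch *ns)
{
    if (!dec_and_lock(&ns->count, &sketch_lock))
        return false;
    /* The kernel would clear sb->s_fs_info here before freeing the ns. */
    pthread_mutex_unlock(&sketch_lock);
    return true;
}

/* Take a reference via the "superblock" path; fails once the ns is dead. */
static bool get_ns_sketch(struct ns_sketch *ns)
{
    bool ok = false;

    pthread_mutex_lock(&sketch_lock);
    if (atomic_load(&ns->count) > 0) {
        atomic_fetch_add(&ns->count, 1);
        ok = true;
    }
    pthread_mutex_unlock(&sketch_lock);
    return ok;
}
```

Because the final decrement and the lookup both happen under the same lock, a lookup can never revive a namespace whose count has already reached zero.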
2009-04-07namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespaceSerge E. Hallyn5-65/+108
Move mqueue vfsmount plus a few tunables into the ipc_namespace struct. The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the posix message queue namespaces and the SYSV ipc namespaces. The sysctl code will be fixed separately in patch 3. After just this patch, making a change to posix mqueue tunables always changes the values in the initial ipc namespace. Signed-off-by: Cedric Le Goater <> Signed-off-by: Serge E. Hallyn <> Cc: Alexey Dobriyan <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-04-02Merge branch 'for-linus' of git:// Linus Torvalds1-1/+1
* 'for-linus' of git://
  Remove two unneeded exports and make two symbols static in fs/mpage.c
  Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
  Trim includes of fdtable.h
  Don't crap into descriptor table in binfmt_som
  Trim includes in binfmt_elf
  Don't mess with descriptor table in load_elf_binary()
  Get rid of indirect include of fs_struct.h
  New helper - current_umask()
  check_unsafe_exec() doesn't care about signal handlers sharing
  New locking/refcounting for fs_struct
  Take fs_struct handling to new file (fs/fs_struct.c)
  Get rid of bumping fs_struct refcount in pivot_root(2)
  Kill unsharing fs_struct in __set_personality()
2009-04-02proc_sysctl: use CONFIG_PROC_SYSCTL around ipc and utsname proc_handlersSerge E. Hallyn1-1/+1
As pointed out by Cedric Le Goater (in response to Alexey's original comment wrt mqns), ipc_sysctl.c and utsname_sysctl.c are using CONFIG_PROC_FS, not CONFIG_PROC_SYSCTL, to determine whether to define the proc_handlers. Change that. Signed-off-by: Serge E. Hallyn <> Cc: Cedric Le Goater <> Acked-by: Alexey Dobriyan <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-04-02ipc: make shm_get_stat() more robustTony Battersby1-2/+4
shm_get_stat() assumes idr_find(&shm_ids(ns).ipcs_idr) returns "struct shmid_kernel *"; all other callers assume that it returns "struct kern_ipc_perm *". This works because "struct kern_ipc_perm" is currently the first member of "struct shmid_kernel", but it would be better to use container_of() to prevent future breakage. Signed-off-by: Tony Battersby <> Cc: Jiri Olsa <> Cc: Jiri Kosina <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
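The container_of() idiom this commit relies on can be shown in a few lines. This is a minimal userspace sketch, assuming a simplified macro without the kernel's type checking. perm_sketch, shm_sketch and shm_from_perm() are hypothetical stand-ins for kern_ipc_perm, shmid_kernel and the lookup in shm_get_stat().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct kern_ipc_perm. */
struct perm_sketch {
    int id;
};

/* Hypothetical stand-in for struct shmid_kernel. */
struct shm_sketch {
    unsigned long nattch;           /* deliberately placed before the perm */
    struct perm_sketch shm_perm;    /* no longer the first member */
};

/* Recover the enclosing segment from a pointer to its embedded perm. */
static struct shm_sketch *shm_from_perm(struct perm_sketch *perm)
{
    assert(perm != NULL);
    return container_of(perm, struct shm_sketch, shm_perm);
}
```

A plain cast from `struct perm_sketch *` to `struct shm_sketch *` would be wrong here because the member is not at offset zero, which is the kind of future breakage the commit message warns about.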
2009-03-31New helper - current_umask()Al Viro1-1/+1
current->fs->umask is what most of fs_struct users are doing. Put that into a helper function. Signed-off-by: Al Viro <>
2009-03-26Merge branch 'bkl-removal' of git:// Linus Torvalds1-0/+2
* 'bkl-removal' of git://
  Rationalize fasync return values
  Move FASYNC bit handling to f_op->fasync()
  Use f_lock to protect f_flags
  Rename struct file->f_ep_lock
2009-03-24Merge branch 'master' into nextJames Morris1-3/+5
2009-03-16Use f_lock to protect f_flagsJonathan Corbet1-0/+2
Traditionally, changes to struct file->f_flags have been done under BKL protection, or with no protection at all. This patch causes all f_flags changes after file open/creation time to be done under protection of f_lock. This allows the removal of some BKL usage and fixes a number of longstanding (if microscopic) races. Reviewed-by: Christoph Hellwig <> Cc: Al Viro <> Signed-off-by: Jonathan Corbet <>
2009-02-10Do not account for the address space used by hugetlbfs using VM_ACCOUNTMel Gorman1-3/+5
When overcommit is disabled, the core VM accounts for pages used by anonymous shared, private mappings and special mappings. It keeps track of VMAs that should be accounted for with VM_ACCOUNT and VMAs that never had a reserve with VM_NORESERVE.

Overcommit for hugetlbfs is much riskier than overcommit for base pages due to contiguity requirements. It avoids overcommitting on both shared and private mappings using reservation counters that are checked and updated during mmap(). This ensures (within limits) that hugepages exist in the future when faults occur; otherwise it would be too easy for applications to be SIGKILLed.

As hugetlbfs makes its own reservations of a different unit to the base page size, VM_ACCOUNT should never be set. Even if the units were correct, we would double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may be set because an application can request no reserves be made for hugetlbfs at the risk of getting killed later.

With commit fc8744adc870a8d4366908221508bb113d8b72ee, VM_NORESERVE and VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This breaks the accounting for both the core VM and hugetlbfs, can trigger an OOM storm when hugepage pools are too small, and can cause lockups and corrupted counters otherwise. This patch brings hugetlbfs more in line with how the core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.

Signed-off-by: Mel Gorman <> Signed-off-by: Linus Torvalds <>
2009-02-06Merge branch 'master' into nextJames Morris6-132/+141
Conflicts: fs/namei.c

Manually merged per:

diff --cc fs/namei.c
index 734f2b5,bbc15c2..0000000
--- a/fs/namei.c
+++ b/fs/namei.c
@@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
  nd->flags |= LOOKUP_CONTINUE;
  err = exec_permission_lite(inode);
  if (err == -EAGAIN)
- err = vfs_permission(nd, MAY_EXEC);
+ err = inode_permission(nd->path.dentry->d_inode,
+ MAY_EXEC);
+ if (!err)
+ err = ima_path_check(&nd->path, MAY_EXEC);
  if (err)
  break;
@@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
  flag &= ~O_TRUNC;
  }
- error = vfs_permission(nd, acc_mode);
+ error = inode_permission(inode, acc_mode);
  if (error)
  return error;
+
- error = ima_path_check(&nd->path,
++ error = ima_path_check(path,
+ acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
+ if (error)
+ return error;
  /*
   * An append-only file must be opened in append mode for writing.
   */

Signed-off-by: James Morris <>
2009-02-06Integrity: IMA file free imbalanceMimi Zohar1-0/+3
The number of calls to ima_path_check()/ima_file_free() should be balanced. An extra call to fput(), indicates the file could have been accessed without first being measured. Although f_count is incremented/decremented in places other than fget/fput, like fget_light/fput_light and get_file, the current task must already hold a file refcnt. The call to __fput() is delayed until the refcnt becomes 0, resulting in ima_file_free() flagging any changes. - add hook to increment opencount for IPC shared memory(SYSV), shmat files, and /dev/zero - moved NULL iint test in opencount_get() Signed-off-by: Mimi Zohar <> Acked-by: Serge Hallyn <> Signed-off-by: James Morris <>
2009-02-05shm: fix shmctl(SHM_INFO) lockup with !CONFIG_SHMEMTony Battersby1-0/+4
shm_get_stat() assumes that the inode is a "struct shmem_inode_info", which is incorrect for !CONFIG_SHMEM (see fs/ramfs/inode.c: ramfs_get_inode() vs. mm/shmem.c: shmem_get_inode()). This bad assumption can cause shmctl(SHM_INFO) to lockup when shm_get_stat() tries to spin_lock(&info->lock). Users of !CONFIG_SHMEM may encounter this lockup simply by invoking the 'ipcs' command. Reported by Jiri Olsa back in February 2008: Signed-off-by: Tony Battersby <> Cc: Jiri Kosina <> Reported-by: Jiri Olsa <> Cc: Hugh Dickins <> Cc: <> [2.6.everything] Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-01-31Stop playing silly games with the VM_ACCOUNT flagLinus Torvalds1-2/+2
The mmap_region() code would temporarily set the VM_ACCOUNT flag for anonymous shared mappings just to inform shmem_zero_setup() that it should enable accounting for the resulting shm object. It would then clear the flag after calling ->mmap (for the /dev/zero case) or doing shmem_zero_setup() (for the MAP_ANON case). This just resulted in vma merge issues, but also made for just unnecessary confusion. Use the already-existing VM_NORESERVE flag for this instead, and let shmem_{zero|file}_setup() just figure it out from that. This also happens to make it obvious that the new DRI2 GEM layer uses a non-reserving backing store for its object allocation - which is quite possibly not intentional. But since I didn't want to change semantics in this patch, I left it alone, and just updated the caller to use the new flag semantics. Signed-off-by: Linus Torvalds <>
2009-01-14[CVE-2009-0029] System call wrappers part 26Heiko Carstens1-11/+11
Signed-off-by: Heiko Carstens <>
2009-01-14[CVE-2009-0029] System call wrappers part 25Heiko Carstens3-11/+12
Signed-off-by: Heiko Carstens <>
2009-01-14[CVE-2009-0029] System call wrappers part 24Heiko Carstens1-6/+6
Signed-off-by: Heiko Carstens <>
2009-01-14[CVE-2009-0029] System call wrapper special casesHeiko Carstens1-1/+8
System calls with an unsigned long long argument can't be converted with the standard wrappers since that would include a cast to long, which in turn means that we would lose the upper 32 bit on 32 bit architectures. Also semctl can't use the standard wrapper since it has a 'union' parameter. So we handle them as special case and add some extra wrappers instead. Signed-off-by: Heiko Carstens <>
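The truncation problem mentioned here is easy to reproduce in portable C. This is a minimal sketch, assuming uint32_t as a stand-in for a 32-bit long. The function names are hypothetical, and splitting the value into two halves is only one common way to carry 64 bits through 32-bit arguments, not necessarily what the special-case wrappers in this patch do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models a wrapper that funnels an argument through a 32-bit "long",
 * as would happen on a 32-bit architecture.
 */
static uint64_t through_32bit_long(uint64_t arg)
{
    uint32_t low = (uint32_t)arg;   /* upper 32 bits are discarded here */

    return (uint64_t)low;
}

/* Passing the value as two 32-bit halves keeps every bit. */
static uint64_t through_two_halves(uint64_t arg)
{
    uint32_t hi = (uint32_t)(arg >> 32);
    uint32_t lo = (uint32_t)arg;
    uint64_t result = ((uint64_t)hi << 32) | lo;

    assert(result == arg);          /* nothing was lost on the way */
    return result;
}
```

Any value above 0xFFFFFFFF comes back changed from the first function and unchanged from the second.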
2009-01-14[CVE-2009-0029] Convert all system calls to return a longHeiko Carstens1-1/+1
Convert all system calls to return a long. This should be a NOP since all converted types should have the same size anyway. With the exception of sys_exit_group which returned void. But that doesn't matter since the system call doesn't return. Signed-off-by: Heiko Carstens <>
2009-01-09Merge git:// Linus Torvalds1-0/+12
* git://
  NOMMU: Support XIP on initramfs
  NOMMU: Teach kobjsize() about VMA regions.
  FLAT: Don't attempt to expand the userspace stack to fill the space allocated
  FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
  NOMMU: Improve procfs output using per-MM VMAs
  NOMMU: Make mmap allocation page trimming behaviour configurable.
  NOMMU: Make VMAs per MM as for MMU-mode linux
  NOMMU: Delete askedalloc and realalloc variables
  NOMMU: Rename ARM's struct vm_region
  NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()
2009-01-08mqueue: fix si_pid value in mqueue do_notify()Sukadev Bhattiprolu1-1/+2
If a process registers for asynchronous notification on a POSIX message queue, it gets a signal and a siginfo_t structure when a message arrives on the message queue. The si_pid in the siginfo_t structure is set to the PID of the process that sent the message to the message queue.

The principle is the following:
. when mq_notify(SIGEV_SIGNAL) is called, the caller registers for notification when a msg arrives. The associated pid structure is stored into inode_info->notify_owner. Let's call this process P1.
. when mq_send() is called by, say, P2, P2 sends a signal to P1 to notify it about msg arrival.

The way .si_pid is set today is not correct, since it doesn't take into account the fact that the process that is sending the message might not be in the same namespace as the notified one.

This patch proposes to set si_pid to the sender's pid as seen in the notify_owner's namespace.

Signed-off-by: Nadia Derbey <> Signed-off-by: Sukadev Bhattiprolu <> Acked-by: Oleg Nesterov <> Cc: Roland McGrath <> Cc: Bastian Blank <> Cc: Pavel Emelyanov <> Cc: Eric W. Biederman <> Acked-by: Serge Hallyn <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-01-08NOMMU: Make VMAs per MM as for MMU-mode linuxDavid Howells1-0/+12
Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

(1) In SYSV SHM where nattch for a segment does not reflect the number of shmat's (and forks) done.
(2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an exec'ing process when VM_EXECUTABLE is specified, regardless of the fact that a VMA might be shared and already have its vm_mm assigned to another process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember the circumstances under which it may be shared and the vm_list_struct structure is discarded as it's no longer required.

This patch makes the following additional changes:

(1) Regions are now allocated with alloc_pages() rather than kmalloc() and with no recourse to __GFP_COMP, so the pages are not composite. Instead, each page has a reference on it held by the region. Anything else that is interested in such a page will have to get a reference on it to retain it. When the pages are released due to unmapping, each page is passed to put_page() and will be freed when the page usage count reaches zero.
(2) Excess pages are trimmed after an allocation as the allocation must be made as a power-of-2 quantity of pages.
(3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may end up with overlapping VMAs within the tree, the VMA struct address is appended to the sort key.
(4) Non-anonymous VMAs are now added to the backing inode's prio list.
(5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of the backing region. The VMA and region structs will be split if necessary.
(6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory segment instead of all the attachments at that address. Multiple shmat()'s return the same address under NOMMU-mode instead of different virtual addresses as under MMU-mode.
(7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
(8) /proc/maps is now the global list of mapped regions, and may list bits that aren't actually mapped anywhere.
(9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount of RAM currently allocated by mmap to hold mappable regions that can't be mapped directly. These are copies of the backing device or file if not anonymous.

These changes make NOMMU mode more similar to MMU mode. The downside is that NOMMU mode requires some extra memory to track things over NOMMU without this patch (VMAs are no longer shared, and there are now region structs).

Signed-off-by: David Howells <> Tested-by: Mike Frysinger <> Acked-by: Paul Mundt <>
2009-01-06ipc/ipc_sysctl.c: move the definition of ipc_auto_callback()akpm@linux-foundation.org1-23/+23
proc_ipcauto_dointvec_minmax() is the only user of ipc_auto_callback(). Since the former is protected by CONFIG_PROC_FS, the latter should be too. Just move its definition down. Signed-off-by: WANG Cong <> Cc: Eric Biederman <> Cc: Nadia Derbey <> Cc: Alexey Dobriyan <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-01-06ipc: do not goto to the next lineDenis V. Lunev1-1/+0
Signed-off-by: Denis V. Lunev <> Reviewed-by: WANG Cong <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-01-06ipc: clean up ipc/shm.cWANG Cong1-10/+5
Use the macro shm_ids(). Remove useless check for a userspace pointer, because copy_to_user() will check it. Some style cleanups. Signed-off-by: WANG Cong <> Cc: Nadia Derbey <> Cc: Pierre Peiffer <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2009-01-05Merge branch 'for-linus' of git:// Linus Torvalds1-1/+0
* 'for-linus' of git://
  inotify: fix type errors in interfaces
  fix breakage in reiserfs_new_inode()
  fix the treatment of jfs special inodes
  vfs: remove duplicate code in get_fs_type()
  add a vfs_fsync helper
  sys_execve and sys_uselib do not call into fsnotify
  zero i_uid/i_gid on inode allocation
  inode->i_op is never NULL
  ntfs: don't NULL i_op
  isofs check for NULL ->i_op in root directory is dead code
  affs: do not zero ->i_op
  kill suid bit only for regular files
  vfs: lseek(fd, 0, SEEK_CUR) race condition
2009-01-05mm: update my addressAlan Cox1-1/+1
Signed-off-by: Alan Cox <> Signed-off-by: Linus Torvalds <>
2009-01-05zero i_uid/i_gid on inode allocationAl Viro1-1/+0
... and don't bother in callers. Don't bother with zeroing i_blocks, while we are at it - it's already been zeroed. i_mode is not worth the effort; it has no common default value. Signed-off-by: Al Viro <>
2009-01-04sanitize audit_mq_open()Al Viro1-12/+11
* don't bother with allocations
* don't do double copy_from_user()
* don't duplicate parts of check for audit_dummy_context()
Signed-off-by: Al Viro <>
2009-01-04sanitize AUDIT_MQ_SENDRECVAl Viro1-24/+30
* logging the original value of *msg_prio in mq_timedreceive(2) is insane - the argument is write-only (i.e. syscall always ignores the original value and only overwrites it).
* merge __audit_mq_timed{send,receive}
* don't do copy_from_user() twice
* don't mess with allocations in auditsc part
* ... and don't bother checking !audit_enabled and !context in there - we'd already checked for audit_dummy_context().
Signed-off-by: Al Viro <>
2009-01-04sanitize audit_mq_notify()Al Viro1-7/+7
* don't copy_from_user() twice
* don't bother with allocations
* don't duplicate parts of audit_dummy_context()
* make it return void
Signed-off-by: Al Viro <>
2009-01-04sanitize audit_mq_getsetattr()Al Viro1-5/+1
* get rid of allocations
* make it return void
* don't duplicate parts of audit_dummy_context()
Signed-off-by: Al Viro <>
2009-01-04sanitize audit_ipc_set_perm()Al Viro1-7/+2
* get rid of allocations
* make it return void
* simplify callers
Signed-off-by: Al Viro <>
2009-01-04sanitize audit_ipc_obj()Al Viro2-9/+4
* get rid of allocations
* make it return void
* simplify callers
Signed-off-by: Al Viro <>
2008-12-04Merge branch 'master' into nextJames Morris1-5/+9
Conflicts: fs/nfsd/nfs4recover.c Manually fixed above to use new creds API functions, e.g. nfs4_save_creds(). Signed-off-by: James Morris <>
2008-11-19sysvipc: fix the ipc structures initializationNadia Derbey1-5/+9
A problem was found while reviewing the code after Bugzilla bug In ipc_addid(), the newly allocated ipc structure is inserted into the ipcs tree (i.e made visible to readers) without locking it. This is not correct since its initialization continues after it has been inserted in the tree. This patch moves the ipc structure lock initialization + locking before the actual insertion. Signed-off-by: Nadia Derbey <> Reported-by: Clement Calmels <> Cc: Manfred Spraul <> Cc: <> [2.6.27.x] Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
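The ordering fix described here (lock the new object before making it visible, finish initialising it, then unlock) can be sketched in userspace. This is a minimal sketch, assuming a pthread mutex per object and a fixed-size array in place of the ipcs tree. obj_sketch, add_locked(), finish_and_unlock() and reader_sees_ready() are hypothetical names, and the table itself is left unsynchronised for brevity, whereas the kernel also protects the ids structure.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SLOTS 8

/* Hypothetical stand-in for struct kern_ipc_perm. */
struct obj_sketch {
    pthread_mutex_t lock;
    int key;
    int ready;      /* set only once initialisation is complete */
};

/* Plays the role of the ipcs tree that readers search. */
static struct obj_sketch *table[TABLE_SLOTS];

/*
 * Initialise and lock the object *before* publishing it, and return with
 * the lock still held so the caller can finish setting it up.
 * Returns the slot index, or -1 if the table is full.
 */
static int add_locked(struct obj_sketch *obj, int key)
{
    pthread_mutex_init(&obj->lock, NULL);
    pthread_mutex_lock(&obj->lock);
    obj->key = key;
    obj->ready = 0;

    for (int i = 0; i < TABLE_SLOTS; i++) {
        if (table[i] == NULL) {
            table[i] = obj;     /* published: readers now block on the lock */
            return i;
        }
    }
    pthread_mutex_unlock(&obj->lock);
    pthread_mutex_destroy(&obj->lock);
    return -1;
}

/* Finish initialisation and let readers in. */
static void finish_and_unlock(struct obj_sketch *obj)
{
    obj->ready = 1;
    pthread_mutex_unlock(&obj->lock);
}

/* Reader: look up a slot and report whether the object is fully set up. */
static int reader_sees_ready(int slot)
{
    struct obj_sketch *obj = table[slot];
    int ready;

    assert(obj != NULL);
    pthread_mutex_lock(&obj->lock);
    ready = obj->ready;
    pthread_mutex_unlock(&obj->lock);
    return ready;
}
```

With this ordering a reader that finds the object can only observe it after finish_and_unlock() has run, so it never sees a half-initialised structure.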
2008-11-14CRED: Pass credentials through dentry_open()David Howells1-4/+7
Pass credentials through dentry_open() so that the COW creds patch can have SELinux's flush_unauthorized_files() pass the appropriate creds back to itself when it opens its null chardev. The security_dentry_open() call also now takes a creds pointer, as does the dentry_open hook in struct security_operations. Signed-off-by: David Howells <> Acked-by: James Morris <> Signed-off-by: James Morris <>
2008-11-14CRED: Wrap current->cred and a few other accessorsDavid Howells2-3/+3
Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <> Acked-by: James Morris <> Acked-by: Serge Hallyn <> Signed-off-by: James Morris <>
2008-11-14CRED: Separate task security context from task_structDavid Howells2-3/+3
Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <> Signed-off-by: David Howells <> Acked-by: James Morris <> Acked-by: Serge Hallyn <> Signed-off-by: James Morris <>
2008-11-14CRED: Wrap task credential accesses in the SYSV IPC subsystemDavid Howells3-10/+19
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <> Reviewed-by: James Morris <> Acked-by: Serge Hallyn <> Signed-off-by: James Morris <>
2008-10-21[PATCH] introduce fmode_t, do annotationsAl Viro1-1/+1
Signed-off-by: Al Viro <>
2008-10-20message queues: increase range limitsJoe Korty1-6/+14
Increase the range of various posix message queue limits. Posix gives the message queue user the ability to 'trade off' the maximum size of messages with the number of possible messages that can be 'in flight'. Linux currently makes this trade off more restrictive than it needs to be. In particular, the maximum message size today can be made no smaller than 8192. This greatly restricts those applications that would like to have the ability to post large numbers of very small messages.

So this patch lowers the limit that the maximum message size can be set to, from 8192 to 128. It also lowers the limit that the maximum number of messages in flight can be set to, from 10 to 1. With these changes the message queue user can make better trade offs between the number of messages and the message size, in order to get everything to fit within the setrlimit(RLIMIT_MSGQUEUE) limit for that particular user.

This patch also applies the values in
/proc/sys/fs/mqueue/msg_max
/proc/sys/fs/mqueue/msgsize_max
as the defaults for the max number of messages allowed and the max message size allowed, respectively, for those applications that do not supply these. Previously, the defaults were hardwired to 10 and 8192, respectively.

[ coding-style fixes] Signed-off-by: Joe Korty <> Cc: Al Viro <> Cc: Manfred Spraul <> Cc: Nadia Derbey <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
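The trade-off between message count and message size can be made concrete with a little arithmetic. This is a minimal sketch, assuming the per-queue accounting formula that getrlimit(2) documents for kernels of this era (maxmsg pointers plus maxmsg message bodies). The function names and the ptr_size parameter are hypothetical. The sysctl caps in msg_max and msgsize_max apply separately and are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes charged against RLIMIT_MSGQUEUE for one queue:
 *   maxmsg * sizeof(struct msg_msg *) + maxmsg * msgsize
 * ptr_size is passed in so the result does not depend on the host.
 */
static unsigned long mq_bytes(unsigned long maxmsg, unsigned long msgsize,
                              unsigned long ptr_size)
{
    return maxmsg * ptr_size + maxmsg * msgsize;
}

/* Largest maxmsg that fits within limit for a given message size. */
static unsigned long mq_max_messages(unsigned long limit,
                                     unsigned long msgsize,
                                     unsigned long ptr_size)
{
    assert(msgsize + ptr_size > 0);     /* avoid dividing by zero */
    return limit / (msgsize + ptr_size);
}
```

Assuming an 819200-byte limit and 8-byte pointers, a queue of 8192-byte messages can hold at most 99 messages, while a queue of 128-byte messages can hold 6023, which is the kind of headroom the lower minimum size makes reachable.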
2008-10-20SHM_LOCKED pages are unevictableLee Schermerhorn1-0/+4
Shmem segments locked into memory via shmctl(SHM_LOCK) should not be kept on the normal LRU, since scanning them is a waste of time and might throw off kswapd's balancing algorithms. Place them on the unevictable LRU list instead.

Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared memory regions as unevictable. Then these pages will be culled off the normal LRU lists during vmscan.

Add new wrapper function to clear the mapping's unevictable state when/if shared memory segment is munlocked.

Add 'scan_mapping_unevictable_page()' to mm/vmscan.c to scan all pages in the shmem segment's mapping [struct address_space] for evictability now that they're no longer locked. If so, move them to the appropriate zone lru list.

Changes depend on [CONFIG_]UNEVICTABLE_LRU.

[ revert shm change] Signed-off-by: Lee Schermerhorn <> Signed-off-by: Rik van Riel <> Signed-off-by: Kosaki Motohiro <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2008-10-16ipc/sem.c: make free_un() staticAdrian Bunk1-1/+1
Signed-off-by: Adrian Bunk <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2008-10-16sysctl: simplify ->strategyAlexey Dobriyan1-5/+4
The name and nlen parameters passed to the ->strategy hook are unused, so remove them. In general a ->strategy hook should know what it's doing, and should not do anything tricky for which, say, a pointer to the original userspace array may be needed (name). Signed-off-by: Alexey Dobriyan <> Acked-by: David S. Miller <> [ networking bits ] Cc: Ralf Baechle <> Cc: David Howells <> Cc: Matt Mackall <> Cc: "Eric W. Biederman" <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2008-07-26[PATCH] kill nameidata passing to permission(), rename to inode_permission()Al Viro1-1/+1
Incidentally, the name that gives hundreds of false positives on grep is not a good idea... Signed-off-by: Al Viro <>
2008-07-26SL*B: drop kmem cache argument from constructorAlexey Dobriyan1-1/+1
The kmem cache passed to a constructor is only needed for constructors that are themselves multiplexers. Nobody uses this "feature", nor does anybody use the passed kmem cache in a non-trivial way, so pass only a pointer to the object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <> Acked-by: Pekka Enberg <> Acked-by: Christoph Lameter <> Cc: Jon Tollefson <> Cc: Nick Piggin <> Cc: Matt Mackall <> [ fix arch/powerpc/mm/hugetlbpage.c] [ fix mm/slab.c] [ fix ubifs] Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2008-07-25ipc: do not use a negative value to re-enable msgmni automatic recomputingNadia Derbey2-18/+74
This patch proposes an alternative to the "magical positive-versus-negative number trick" Andrew complained about last week in This had been introduced with the patches that scale msgmni to the amount of lowmem. With these patches, msgmni has a registered notification routine that recomputes msgmni value upon memory add/remove or ipc namespace creation/removal.

When msgmni is changed from user space (i.e. value written to the proc file), that notification routine is unregistered, and the way to make it registered back is to write a negative value into the proc file. This is the "magical positive-versus-negative number trick".

To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni. This file acts as ON/OFF for msgmni automatic recomputing.

With this patch, the process is the following:
1) kernel boots in "automatic recomputing mode"
   /proc/sys/kernel/msgmni contains the value that has been computed (depends on lowmem)
   /proc/sys/kernel/auto_msgmni contains "1"
2) echo <val> > /proc/sys/kernel/msgmni
   . sets msg_ctlmni to <val>
   . de-activates automatic recomputing (i.e. if, say, some memory is added msgmni won't be recomputed anymore)
   . /proc/sys/kernel/auto_msgmni now contains "0"
3) echo "0" > /proc/sys/kernel/auto_msgmni
   . de-activates msgmni automatic recomputing
   this has the same effect as 2), except that msg_ctlmni stays blocked at its current value
4) echo "1" > /proc/sys/kernel/auto_msgmni
   . recomputes msgmni's value based on the current available memory size and number of ipc namespaces
   . re-activates automatic recomputing for msgmni.

Signed-off-by: Nadia Derbey <> Cc: Solofo Ramangalahy <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
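The ON/OFF behaviour of the two proc files can be modelled as a tiny state machine. This is a minimal sketch, assuming the semantics listed in the commit message. msgmni_sketch and the three functions are hypothetical names, and the `computed` argument stands in for whatever value the kernel derives from lowmem and the number of ipc namespaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two proc files described above. */
struct msgmni_sketch {
    int msgmni;         /* models /proc/sys/kernel/msgmni */
    bool auto_msgmni;   /* models /proc/sys/kernel/auto_msgmni */
};

/* Writing msgmni sets the value and turns automatic recomputing off. */
static void write_msgmni(struct msgmni_sketch *s, int val)
{
    assert(s != NULL);
    s->msgmni = val;
    s->auto_msgmni = false;
}

/*
 * Writing auto_msgmni: "0" freezes msgmni at its current value,
 * "1" recomputes it and re-enables automatic recomputing.
 */
static void write_auto_msgmni(struct msgmni_sketch *s, bool on, int computed)
{
    assert(s != NULL);
    s->auto_msgmni = on;
    if (on)
        s->msgmni = computed;
}

/* A memory or namespace event only takes effect in automatic mode. */
static void on_memory_event(struct msgmni_sketch *s, int computed)
{
    assert(s != NULL);
    if (s->auto_msgmni)
        s->msgmni = computed;
}
```

The point of the separate switch is that no value written to msgmni carries a hidden second meaning; the mode is always readable and writable on its own.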