path: root/arch/sparc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2016-05-31Merge git:// Torvalds8-93/+200
Pull sparc fixes from David Miller: "sparc64 mmu context allocation and trap return bug fixes" * git:// sparc64: Fix return from trap window fill crashes. sparc: Harden signal return frame checks. sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
2016-05-29sparc64: Fix return from trap window fill crashes.David S. Miller3-48/+108
We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. Otherwise we can get an OOPS that looks like this: Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. The userland register window fill handler is: add %sp, STACK_BIAS + 0x00, %g1; \ ldxa [%g1 + %g0] ASI, %l0; \ mov 0x08, %g2; \ mov 0x10, %g3; \ ldxa [%g1 + %g2] ASI, %l1; \ mov 0x18, %g5; \ ldxa [%g1 + %g3] ASI, %l2; \ ldxa [%g1 + %g5] ASI, %l3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %l4; \ ldxa [%g1 + %g2] ASI, %l5; \ ldxa [%g1 + %g3] ASI, %l6; \ ldxa [%g1 + %g5] ASI, %l7; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i0; \ ldxa [%g1 + %g2] ASI, %i1; \ ldxa [%g1 + %g3] ASI, %i2; \ ldxa [%g1 + %g5] ASI, %i3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i4; \ ldxa [%g1 + %g2] ASI, %i5; \ ldxa [%g1 + %g3] ASI, %i6; \ ldxa [%g1 + %g5] ASI, %i7; \ restored; \ retry; nop; nop; nop; nop; \ b,a,pt %xcc, fill_fixup_dax; \ b,a,pt %xcc, fill_fixup_mna; \ b,a,pt %xcc, fill_fixup; And the way this works is that if any of those memory accesses generate an exception, the exception handler can revector to one of those final three branch instructions depending upon which kind of exception the memory access took. In this way, the fault handler doesn't have to know if it was a spill or a fill that it's handling the fault for. It just always branches to the last instruction in the parent trap's handler. For example, for a regular fault, the code goes: winfix_trampoline: rdpr %tpc, %g3 or %g3, 0x7c, %g3 wrpr %g3, %tnpc done All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the trap time program counter, we'll get that final instruction in the trap handler. On return from trap, we have to pull the register window in but we do this by hand instead of just executing a "restore" instruction for several reasons. The largest being that from Niagara and onward we simply don't have enough levels in the trap stack to fully resolve all possible exception cases of a window fault when we are already at trap level 1 (which we enter to get ready to return from the original trap). This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's code branches directly to these to do the window fill by hand if necessary. Now if you look at them, we'll see at the end: ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; And oops, all three cases are handled like a fault. This doesn't work because each of these trap types (data access exception, memory address unaligned, and faults) store their auxiliary info in different registers to pass on to the C handler which does the real work. So in the case where the stack was unaligned, the unaligned trap handler sets up the arg registers one way, and then we branched to the fault handler which expects them setup another way. So the FAULT_TYPE_* value ends up basically being garbage, and randomly would generate the backtrace seen above. Reported-by: Nick Alcock <> Signed-off-by: David S. Miller <>
2016-05-29sparc: Harden signal return frame checks.David S. Miller5-45/+92
All signal frames must be at least 16-byte aligned, because that is the alignment we explicitly create when we build signal return stack frames. All stack pointers must be at least 8-byte aligned. Signed-off-by: David S. Miller <>
2016-05-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-7/+7
git:// Pull perf updates from Ingo Molnar: "Mostly tooling and PMU driver fixes, but also a number of late updates such as the reworking of the call-chain size limiting logic to make call-graph recording more robust, plus tooling side changes for the new 'backwards ring-buffer' extension to the perf ring-buffer" * 'perf-urgent-for-linus' of git:// (34 commits) perf record: Read from backward ring buffer perf record: Rename variable to make code clear perf record: Prevent reading invalid data in record__mmap_read perf evlist: Add API to pause/resume perf trace: Use the ptr->name beautifier as default for "filename" args perf trace: Use the fd->name beautifier as default for "fd" args perf report: Add srcline_from/to branch sort keys perf evsel: Record fd into perf_mmap perf evsel: Add overwrite attribute and check write_backward perf tools: Set buildid dir under symfs when --symfs is provided perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced perf annotate: Sort list of recognised instructions perf annotate: Fix identification of ARM blt and bls instructions perf tools: Fix usage of max_stack sysctl perf callchain: Stop validating callchains by the max_stack sysctl perf trace: Fix exit_group() formatting perf top: Use machine->kptr_restrict_warned perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1 perf machine: Do not bail out if not managing to read ref reloc symbol perf/x86/intel/p4: Trival indentation fix, remove space ...
2016-05-22Merge git:// Torvalds4-15/+11
Pull sparc updates from David Miller: "Some 32-bit kgdb cleanups from Sam Ravnborg, and a hugepage TLB flush overhead fix on 64-bit from Nitin Gupta" * git:// sparc64: Reduce TLB flushes during hugepte changes aeroflex/greth: fix warning about unused variable openprom: fix warning sparc32: drop superfluous cast in calls to __nocache_pa() sparc32: fix build with STRICT_MM_TYPECHECKS sparc32: use proper prototype for trapbase sparc32: drop local prototype in kgdb_32 sparc32: drop hardcoding trap_level in kgdb_trap
2016-05-20exit_thread: accept a task parameter to be exitedJiri Slaby2-8/+8
We need to call exit_thread from copy_process in a fail path. So make it accept task_struct as a parameter. [v2] * s390: exit_thread_runtime_instr doesn't make sense to be called for non-current tasks. * arm: fix the comment in vfp_thread_copy * change 'me' to 'tsk' for task_struct * now we can change only archs that actually have exit_thread [ coding-style fixes] Signed-off-by: Jiri Slaby <> Cc: "David S. Miller" <> Cc: "H. Peter Anvin" <> Cc: "James E.J. Bottomley" <> Cc: Aurelien Jacquiot <> Cc: Benjamin Herrenschmidt <> Cc: Catalin Marinas <> Cc: Chen Liqin <> Cc: Chris Metcalf <> Cc: Chris Zankel <> Cc: David Howells <> Cc: Fenghua Yu <> Cc: Geert Uytterhoeven <> Cc: Guan Xuetao <> Cc: Haavard Skinnemoen <> Cc: Hans-Christian Egtvedt <> Cc: Heiko Carstens <> Cc: Helge Deller <> Cc: Ingo Molnar <> Cc: Ivan Kokshaysky <> Cc: James Hogan <> Cc: Jeff Dike <> Cc: Jesper Nilsson <> Cc: Jiri Slaby <> Cc: Jonas Bonn <> Cc: Koichi Yasutake <> Cc: Lennox Wu <> Cc: Ley Foon Tan <> Cc: Mark Salter <> Cc: Martin Schwidefsky <> Cc: Matt Turner <> Cc: Max Filippov <> Cc: Michael Ellerman <> Cc: Michal Simek <> Cc: Mikael Starvik <> Cc: Paul Mackerras <> Cc: Peter Zijlstra <> Cc: Ralf Baechle <> Cc: Rich Felker <> Cc: Richard Henderson <> Cc: Richard Kuo <> Cc: Richard Weinberger <> Cc: Russell King <> Cc: Steven Miao <> Cc: Thomas Gleixner <> Cc: Tony Luck <> Cc: Vineet Gupta <> Cc: Will Deacon <> Cc: Yoshinori Sato <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2016-05-20sparc32: use proper prototype for trapbaseSam Ravnborg3-5/+3
This killed an extern ... in a .c file. No functional change. Signed-off-by: Sam Ravnborg <> Signed-off-by: David S. Miller <>
2016-05-20sparc32: drop local prototype in kgdb_32Sam Ravnborg1-2/+2
Signed-off-by: Sam Ravnborg <> Signed-off-by: David S. Miller <>
2016-05-20sparc32: drop hardcoding trap_level in kgdb_trapSam Ravnborg2-9/+7
Fix this so we pass the trap_level from the actual trap code like we do in sparc64. Add use on ENTRY(), ENDPROC() in the assembler function too. This fixes a bug where the hardcoded value for trap_level was the sparc64 value. As the generic code does not use the trap_level argument (for sparc32) - this patch does not have any functional impact. Signed-off-by: Sam Ravnborg <> Signed-off-by: David S. Miller <>
2016-05-16perf core: Add a 'nr' field to perf_event_callchain_contextArnaldo Carvalho de Melo1-3/+3
We will use it to count how many addresses are in the entry->ip[] array, excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really return the number of entries specified by the user via the relevant sysctl, kernel.perf_event_max_contexts, or via the per event perf_event_attr.sample_max_stack knob. This way we keep the perf_sample->ip_callchain->nr meaning, that is the number of entries, be it real addresses or PERF_CONTEXT_ entries, while honouring the max_stack knobs, i.e. the end result will be max_stack entries if we have at least that many entries in a given stack trace. Cc: David Ahern <> Cc: Frederic Weisbecker <> Cc: Jiri Olsa <> Cc: Namhyung Kim <> Cc: Peter Zijlstra <> Link: Signed-off-by: Arnaldo Carvalho de Melo <>
2016-05-16perf core: Pass max stack as a perf_callchain_entry contextArnaldo Carvalho de Melo1-7/+7
This makes perf_callchain_{user,kernel}() receive the max stack as context for the perf_callchain_entry, instead of accessing the global sysctl_perf_event_max_stack. Cc: Adrian Hunter <> Cc: Alexander Shishkin <> Cc: Alexei Starovoitov <> Cc: Brendan Gregg <> Cc: David Ahern <> Cc: Frederic Weisbecker <> Cc: He Kuang <> Cc: Jiri Olsa <> Cc: Linus Torvalds <> Cc: Masami Hiramatsu <> Cc: Milian Wolff <> Cc: Namhyung Kim <> Cc: Peter Zijlstra <> Cc: Stephane Eranian <> Cc: Thomas Gleixner <> Cc: Vince Weaver <> Cc: Wang Nan <> Cc: Zefan Li <> Link: Signed-off-by: Arnaldo Carvalho de Melo <>
2016-05-11Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar15-65/+112
Signed-off-by: Ingo Molnar <>
2016-04-27sparc64: Fix bootup regressions on some Kconfig combinations.David S. Miller8-55/+34
The system call tracing bug fix mentioned in the Fixes tag below increased the amount of assembler code in the sequence of assembler files included by head_64.S This caused to total set of code to exceed 0x4000 bytes in size, which overflows the expression in head_64.S that works to place swapper_tsb at address 0x408000. When this is violated, the TSB is not properly aligned, and also the trap table is not aligned properly either. All of this together results in failed boots. So, do two things: 1) Simplify some code by using ba,a instead of ba/nop to get those bytes back. 2) Add a linker script assertion to make sure that if this happens again the build will fail. Fixes: 1a40b95374f6 ("sparc: Fix system call tracing register handling.") Reported-by: Meelis Roos <> Reported-by: Joerg Abraham <> Signed-off-by: David S. Miller <>
2016-04-27perf core: Allow setting up max frame stack depth via sysctlArnaldo Carvalho de Melo1-3/+3
The default remains 127, which is good for most cases, and not even hit most of the time, but then for some cases, as reported by Brendan, 1024+ deep frames are appearing on the radar for things like groovy, ruby. And in some workloads putting a _lower_ cap on this may make sense. One that is per event still needs to be put in place tho. The new file is: # cat /proc/sys/kernel/perf_event_max_stack 127 Chaging it: # echo 256 > /proc/sys/kernel/perf_event_max_stack # cat /proc/sys/kernel/perf_event_max_stack 256 But as soon as there is some event using callchains we get: # echo 512 > /proc/sys/kernel/perf_event_max_stack -bash: echo: write error: Device or resource busy # Because we only allocate the callchain percpu data structures when there is a user, which allows for changing the max easily, its just a matter of having no callchain users at that point. Reported-and-Tested-by: Brendan Gregg <> Reviewed-by: Frederic Weisbecker <> Acked-by: Alexei Starovoitov <> Acked-by: David Ahern <> Cc: Adrian Hunter <> Cc: Alexander Shishkin <> Cc: He Kuang <> Cc: Jiri Olsa <> Cc: Linus Torvalds <> Cc: Masami Hiramatsu <> Cc: Milian Wolff <> Cc: Namhyung Kim <> Cc: Peter Zijlstra <> Cc: Stephane Eranian <> Cc: Thomas Gleixner <> Cc: Vince Weaver <> Cc: Wang Nan <> Cc: Zefan Li <> Link: Signed-off-by: Arnaldo Carvalho de Melo <>
2016-04-21sparc64: recognize and support Sonoma CPU typeKhalid Aziz4-1/+21
Add code to recognize SPARC-Sonoma cpu correctly and update cpu hardware caps and cpu distribution map. SPARC-Sonoma is based upon SPARC-M7 core along with additional PCI functions added on and is reported by firmware as "SPARC-SN". Signed-off-by: Khalid Aziz <> Acked-by: Allen Pais <> Signed-off-by: David S. Miller <>
2016-04-21sparc: Implement and wire up vio_hotplug for vio.Adrian Glaubitz1-0/+9
Signed-off-by: John Paul Adrian Glaubitz <> Acked-by: Sam Ravnborg <> Signed-off-by: David S. Miller <>
2016-04-21sparc: Implement and wire up modalias_show for vio.Adrian Glaubitz1-0/+9
Signed-off-by: John Paul Adrian Glaubitz <> Acked-by: Sam Ravnborg <> Signed-off-by: David S. Miller <>
2016-04-21sparc/pci: Refactor dev_archdata initialization into pci_init_dev_archdataSowmini Varadhan1-8/+21
The function pcibios_add_device() added by commit d0c31e020057 ("sparc/PCI: Fix for panic while enabling SR-IOV") initializes the dev_archdata by doing a memcpy from the PF. This has the problem that it erroneously copies the OF device without explicitly refcounting it. As David Miller pointed out: "Generally speaking we don't really support hot-plug for OF probed devices, but if we did all of the device tree pointers have to be refcounted properly." To fix this error, and also avoid code duplication, this patch creates a new helper function, pci_init_dev_archdata(), that initializes the fields in dev_archdata, and can be invoked by callers after they have taken the needed refcounts Signed-off-by: Sowmini Varadhan <> Tested-by: Babu Moger <> Reviewed-by: Khalid Aziz <> Signed-off-by: David S. Miller <>
2016-03-29sparc: Write up preadv2/pwritev2 syscalls.David S. Miller2-3/+3
Signed-off-by: David S. Miller <>
2016-03-29sparc/PCI: Fix for panic while enabling SR-IOVBabu Moger1-0/+17
We noticed this panic while enabling SR-IOV in sparc. mlx4_core: Mellanox ConnectX core driver v2.2-1 (Jan 1 2015) mlx4_core: Initializing 0007:01:00.0 mlx4_core 0007:01:00.0: Enabling SR-IOV with 5 VFs mlx4_core: Initializing 0007:01:00.1 Unable to handle kernel NULL pointer dereference insmod(10010): Oops [#1] CPU: 391 PID: 10010 Comm: insmod Not tainted 4.1.12-32.el6uek.kdump2.sparc64 #1 TPC: <dma_supported+0x20/0x80> I7: <__mlx4_init_one+0x324/0x500 [mlx4_core]> Call Trace: [00000000104c5ea4] __mlx4_init_one+0x324/0x500 [mlx4_core] [00000000104c613c] mlx4_init_one+0xbc/0x120 [mlx4_core] [0000000000725f14] local_pci_probe+0x34/0xa0 [0000000000726028] pci_call_probe+0xa8/0xe0 [0000000000726310] pci_device_probe+0x50/0x80 [000000000079f700] really_probe+0x140/0x420 [000000000079fa24] driver_probe_device+0x44/0xa0 [000000000079fb5c] __device_attach+0x3c/0x60 [000000000079d85c] bus_for_each_drv+0x5c/0xa0 [000000000079f588] device_attach+0x88/0xc0 [000000000071acd0] pci_bus_add_device+0x30/0x80 [0000000000736090] virtfn_add.clone.1+0x210/0x360 [00000000007364a4] sriov_enable+0x2c4/0x520 [000000000073672c] pci_enable_sriov+0x2c/0x40 [00000000104c2d58] mlx4_enable_sriov+0xf8/0x180 [mlx4_core] [00000000104c49ac] mlx4_load_one+0x42c/0xd40 [mlx4_core] Disabling lock debugging due to kernel taint Caller[00000000104c5ea4]: __mlx4_init_one+0x324/0x500 [mlx4_core] Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core] Caller[0000000000725f14]: local_pci_probe+0x34/0xa0 Caller[0000000000726028]: pci_call_probe+0xa8/0xe0 Caller[0000000000726310]: pci_device_probe+0x50/0x80 Caller[000000000079f700]: really_probe+0x140/0x420 Caller[000000000079fa24]: driver_probe_device+0x44/0xa0 Caller[000000000079fb5c]: __device_attach+0x3c/0x60 Caller[000000000079d85c]: bus_for_each_drv+0x5c/0xa0 Caller[000000000079f588]: device_attach+0x88/0xc0 Caller[000000000071acd0]: pci_bus_add_device+0x30/0x80 Caller[0000000000736090]: virtfn_add.clone.1+0x210/0x360 Caller[00000000007364a4]: sriov_enable+0x2c4/0x520 Caller[000000000073672c]: pci_enable_sriov+0x2c/0x40 Caller[00000000104c2d58]: mlx4_enable_sriov+0xf8/0x180 [mlx4_core] Caller[00000000104c49ac]: mlx4_load_one+0x42c/0xd40 [mlx4_core] Caller[00000000104c5f90]: __mlx4_init_one+0x410/0x500 [mlx4_core] Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core] Caller[0000000000725f14]: local_pci_probe+0x34/0xa0 Caller[0000000000726028]: pci_call_probe+0xa8/0xe0 Caller[0000000000726310]: pci_device_probe+0x50/0x80 Caller[000000000079f700]: really_probe+0x140/0x420 Caller[000000000079fa24]: driver_probe_device+0x44/0xa0 Caller[000000000079fb08]: __driver_attach+0x88/0xa0 Caller[000000000079d90c]: bus_for_each_dev+0x6c/0xa0 Caller[000000000079f29c]: driver_attach+0x1c/0x40 Caller[000000000079e35c]: bus_add_driver+0x17c/0x220 Caller[00000000007a02d4]: driver_register+0x74/0x120 Caller[00000000007263fc]: __pci_register_driver+0x3c/0x60 Caller[00000000104f62bc]: mlx4_init+0x60/0xcc [mlx4_core] Kernel panic - not syncing: Fatal exception Press Stop-A (L1-A) to return to the boot prom ---[ end Kernel panic - not syncing: Fatal exception Details: Here is the call sequence virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported The panic happened at line 760(file arch/sparc/kernel/iommu.c) 758 int dma_supported(struct device *dev, u64 device_mask) 759 { 760 struct iommu *iommu = dev->archdata.iommu; 761 u64 dma_addr_mask = iommu->dma_addr_mask; 762 763 if (device_mask >= (1UL << 32UL)) 764 return 0; 765 766 if ((device_mask & dma_addr_mask) == dma_addr_mask) 767 return 1; 768 769 #ifdef CONFIG_PCI 770 if (dev_is_pci(dev)) 771 return pci64_dma_supported(to_pci_dev(dev), device_mask); 772 #endif 773 774 return 0; 775 } 776 EXPORT_SYMBOL(dma_supported); Same panic happened with Intel ixgbe driver also. SR-IOV code looks for arch specific data while enabling VFs. When VF device is added, driver probe function makes set of calls to initialize the pci device. Because the VF device is added different way than the normal PF device(which happens via of_create_pci_dev for sparc), some of the arch specific initialization does not happen for VF device. That causes panic when archdata is accessed. To fix this, I have used already defined weak function pcibios_setup_device to copy archdata from PF to VF. Also verified the fix. Signed-off-by: Babu Moger <> Signed-off-by: Sowmini Varadhan <> Reviewed-by: Ethan Zhao <> Signed-off-by: David S. Miller <>
2016-03-28Merge git:// Torvalds13-32/+32
Pull sparc fixes from David Miller: "Minor typing cleanup from Joe Perches, and some comment typo fixes from Adam Buchbinder" * git:// sparc: Convert naked unsigned uses to unsigned int sparc: Fix misspellings in comments.
2016-03-25arch, ftrace: for KASAN put hard/soft IRQ entries into separate sectionsAlexander Potapenko1-0/+1
KASAN needs to know whether the allocation happens in an IRQ handler. This lets us strip everything below the IRQ entry point to reduce the number of unique stack traces needed to be stored. Move the definition of __irq_entry to <linux/interrupt.h> so that the users don't need to pull in <linux/ftrace.h>. Also introduce the __softirq_entry macro which is similar to __irq_entry, but puts the corresponding functions to the .softirqentry.text section. Signed-off-by: Alexander Potapenko <> Acked-by: Steven Rostedt <> Cc: Christoph Lameter <> Cc: Pekka Enberg <> Cc: David Rientjes <> Cc: Joonsoo Kim <> Cc: Andrey Konovalov <> Cc: Dmitry Vyukov <> Cc: Andrey Ryabinin <> Cc: Konstantin Serebryany <> Cc: Dmitry Chernenkov <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2016-03-20sparc: Convert naked unsigned uses to unsigned intJoe Perches9-26/+26
Use the more normal kernel definition/declaration style. Done via: $ git ls-files arch/sparc | \ xargs ./scripts/ -f --fix-inplace --types=unspecified_int Signed-off-by: Joe Perches <> Signed-off-by: David S. Miller <>
2016-03-20sparc: Fix misspellings in comments.Adam Buchbinder4-6/+6
Signed-off-by: Adam Buchbinder <> Reviewed-by: Julian Calaby <> Signed-off-by: David S. Miller <>
2016-03-15Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds2-2/+2
git:// Pull cpu hotplug updates from Thomas Gleixner: "This is the first part of the ongoing cpu hotplug rework: - Initial implementation of the state machine - Runs all online and prepare down callbacks on the plugged cpu and not on some random processor - Replaces busy loop waiting with completions - Adds tracepoints so the states can be followed" More detailed commentary on this work from an earlier email: "What's wrong with the current cpu hotplug infrastructure? - Asymmetry The hotplug notifier mechanism is asymmetric versus the bringup and teardown. This is mostly caused by the notifier mechanism. - Largely undocumented dependencies While some notifiers use explicitely defined notifier priorities, we have quite some notifiers which use numerical priorities to express dependencies without any documentation why. - Control processor driven Most of the bringup/teardown of a cpu is driven by a control processor. While it is understandable, that preperatory steps, like idle thread creation, memory allocation for and initialization of essential facilities needs to be done before a cpu can boot, there is no reason why everything else must run on a control processor. Before this patch series, bringup looks like this: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu bring the rest up - All or nothing approach There is no way to do partial bringups. That's something which is really desired because we waste e.g. at boot substantial amount of time just busy waiting that the cpu comes to life. That's stupid as we could very well do preparatory steps and the initial IPI for other cpus and then go back and do the necessary low level synchronization with the freshly booted cpu. - Minimal debuggability Due to the notifier based design, it's impossible to switch between two stages of the bringup/teardown back and forth in order to test the correctness. So in many hotplug notifiers the cancel mechanisms are either not existant or completely untested. - Notifier [un]registering is tedious To [un]register notifiers we need to protect against hotplug at every callsite. There is no mechanism that bringup/teardown callbacks are issued on the online cpus, so every caller needs to do it itself. That also includes error rollback. What's the new design? The base of the new design is a symmetric state machine, where both the control processor and the booting/dying cpu execute a well defined set of states. Each state is symmetric in the end, except for some well defined exceptions, and the bringup/teardown can be stopped and reversed at almost all states. So the bringup of a cpu will look like this in the future: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu bring itself up The synchronization step does not require the control cpu to wait. That mechanism can be done asynchronously via a worker or some other mechanism. The teardown can be made very similar, so that the dying cpu cleans up and brings itself down. Cleanups which need to be done after the cpu is gone, can be scheduled asynchronously as well. There is a long way to this, as we need to refactor the notion when a cpu is available. Today we set the cpu online right after it comes out of the low level bringup, which is not really correct. The proper mechanism is to set it to available, i.e. cpu local threads, like softirqd, hotplug thread etc. can be scheduled on that cpu, and once it finished all booting steps, it's set to online, so general workloads can be scheduled on it. The reverse happens on teardown. First thing to do is to forbid scheduling of general workloads, then teardown all the per cpu resources and finally shut it off completely. This patch series implements the basic infrastructure for this at the core level. This includes the following: - Basic state machine implementation with well defined states, so ordering and prioritization can be expressed. - Interfaces to [un]register state callbacks This invokes the bringup/teardown callback on all online cpus with the proper protection in place and [un]installs the callbacks in the state machine array. For callbacks which have no particular ordering requirement we have a dynamic state space, so that drivers don't have to register an explicit hotplug state. If a callback fails, the code automatically does a rollback to the previous state. - Sysfs interface to drive the state machine to a particular step. This is only partially functional today. Full functionality and therefor testability will be achieved once we converted all existing hotplug notifiers over to the new scheme. - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying processor: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu wait for boot bring itself up Signal completion to control cpu In a previous step of this work we've done a full tree mechanical conversion of all hotplug notifiers to the new scheme. The balance is a net removal of about 4000 lines of code. This is not included in this series, as we decided to take a different approach. Instead of mechanically converting everything over, we will do a proper overhaul of the usage sites one by one so they nicely fit into the symmetric callback scheme. I decided to do that after I looked at the ugliness of some of the converted sites and figured out that their hotplug mechanism is completely buggered anyway. So there is no point to do a mechanical conversion first as we need to go through the usage sites one by one again in order to achieve a full symmetric and testable behaviour" * 'smp-hotplug-for-linus' of git:// (23 commits) cpu/hotplug: Document states better cpu/hotplug: Fix smpboot thread ordering cpu/hotplug: Remove redundant state check cpu/hotplug: Plug death reporting race rcu: Make CPU_DYING_IDLE an explicit call cpu/hotplug: Make wait for dead cpu completion based cpu/hotplug: Let upcoming cpu bring itself fully up arch/hotplug: Call into idle with a proper state cpu/hotplug: Move online calls to hotplugged cpu cpu/hotplug: Create hotplug threads cpu/hotplug: Split out the state walk into functions cpu/hotplug: Unpark smpboot threads from the state machine cpu/hotplug: Move scheduler cpu_online notifier to hotplug core cpu/hotplug: Implement setup/removal interface cpu/hotplug: Make target state writeable cpu/hotplug: Add sysfs state interface cpu/hotplug: Hand in target state to _cpu_up/down cpu/hotplug: Convert the hotplugged cpu work to a state machine cpu/hotplug: Convert to a state machine for the control processor cpu/hotplug: Add tracepoints ...
2016-03-10Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar7-5/+60
Signed-off-by: Ingo Molnar <>
2016-03-01arch/hotplug: Call into idle with a proper stateThomas Gleixner2-2/+2
Let the non boot cpus call into idle with the corresponding hotplug state, so the hotplug core can handle the further bringup. That's a first step to convert the boot side of the hotplugged cpus to do all the synchronization with the other side through the state machine. For now it'll only start the hotplug thread and kick the full bringup of the cpu. Signed-off-by: Thomas Gleixner <> Cc: Cc: Rik van Riel <> Cc: Rafael Wysocki <> Cc: "Srivatsa S. Bhat" <> Cc: Peter Zijlstra <> Cc: Arjan van de Ven <> Cc: Sebastian Siewior <> Cc: Rusty Russell <> Cc: Steven Rostedt <> Cc: Oleg Nesterov <> Cc: Tejun Heo <> Cc: Andrew Morton <> Cc: Paul McKenney <> Cc: Linus Torvalds <> Cc: Paul Turner <> Link: Signed-off-by: Thomas Gleixner <>
2016-03-01Merge git:// Torvalds7-5/+60
Pull sparc fixes from David Miller: 1) System call tracing doesn't handle register contents properly across the trace. From Mike Frysinger. 2) Hook up copy_file_range 3) Build fix for 32-bit with newer tools. 4) New sun4v watchdog driver, from Wim Coekaerts. 5) Set context system call has to allow for servicable faults when we flush the register windows to memory * git:// sparc64: Fix sparc64_set_context stack handling. sparc32: Add -Wa,-Av8 to KBUILD_CFLAGS. Add sun4v_wdt watchdog driver sparc: Fix system call tracing register handling. sparc: Hook up copy_file_range syscall.
2016-03-01sparc64: Fix sparc64_set_context stack handling.David S. Miller1-1/+1
Like a signal return, we should use synchronize_user_stack() rather than flush_user_windows(). Reported-by: Ilya Malakhov <> Signed-off-by: David S. Miller <>
2016-02-29Merge tag 'v4.5-rc6' into locking/core, to pick up fixesIngo Molnar1-1/+1
Signed-off-by: Ingo Molnar <>
2016-02-27mm: ASLR: use get_random_long()Daniel Cashman1-1/+1
Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <> Acked-by: Kees Cook <> Cc: "Theodore Ts'o" <> Cc: Arnd Bergmann <> Cc: Greg Kroah-Hartman <> Cc: Catalin Marinas <> Cc: Will Deacon <> Cc: Ralf Baechle <> Cc: Benjamin Herrenschmidt <> Cc: Paul Mackerras <> Cc: Michael Ellerman <> Cc: David S. Miller <> Cc: Thomas Gleixner <> Cc: Ingo Molnar <> Cc: H. Peter Anvin <> Cc: Al Viro <> Cc: Nick Kralevich <> Cc: Jeff Vander Stoep <> Cc: Mark Salyzyn <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2016-02-09locking/lockdep: Eliminate lockdep_init()Andrey Ryabinin1-8/+0
Lockdep is initialized at compile time now. Get rid of lockdep_init(). Signed-off-by: Andrey Ryabinin <> Signed-off-by: Andrew Morton <> Cc: Linus Torvalds <> Cc: Mike Krinkin <> Cc: Paul E. McKenney <> Cc: Peter Zijlstra <> Cc: Thomas Gleixner <> Cc: Cc: Signed-off-by: Ingo Molnar <>
2016-01-31Add sun4v_wdt watchdog driverwim.coekaerts@oracle.com2-1/+3
This driver adds sparc hypervisor watchdog support. The default timeout is 60 seconds and the range is between 1 and 31536000 seconds. Both watchdog-resolution and watchdog-max-timeout MD properties settings are supported. Signed-off-by: Wim Coekaerts <> Reviewed-by: Julian Calaby <> Reviewed-by: Guenter Roeck <> Signed-off-by: David S. Miller <>
2016-01-21sparc: Fix system call tracing register handling.Mike Frysinger2-0/+53
A system call trace trigger on entry allows the tracing process to inspect and potentially change the traced process's registers. Account for that by reloading the %g1 (syscall number) and %i0-%i5 (syscall argument) values. We need to be careful to revalidate the range of %g1, and reload the system call table entry it corresponds to into %l7. Reported-by: Mike Frysinger <> Signed-off-by: David S. Miller <> Tested-by: Mike Frysinger <>
2016-01-21sparc: Hook up copy_file_range syscall.David S. Miller2-3/+3
Signed-off-by: David S. Miller <>
2016-01-14sparc64: fix incorrect sign extension in sys_sparc64_personalityDmitry V. Levin1-1/+1
The value returned by sys_personality has type "long int". It is saved to a variable of type "int", which is not a problem yet because the type of task_struct->pesonality is "unsigned int". The problem is the sign extension from "int" to "long int" that happens on return from sys_sparc64_personality. For example, a userspace call personality((unsigned) -EINVAL) will result to any subsequent personality call, including absolutely harmless read-only personality(0xffffffff) call, failing with errno set to EINVAL. Signed-off-by: Dmitry V. Levin <> Cc: <> Signed-off-by: David S. Miller <>
2016-01-12Merge git:// Torvalds1-0/+7
Pull networking updates from Davic Miller: 1) Support busy polling generically, for all NAPI drivers. From Eric Dumazet. 2) Add byte/packet counter support to nft_ct, from Floriani Westphal. 3) Add RSS/XPS support to mvneta driver, from Gregory Clement. 4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes Frederic Sowa. 5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai. 6) Add support for VLAN device bridging to mlxsw switch driver, from Ido Schimmel. 7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski. 8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko. 9) Reorganize wireless drivers into per-vendor directories just like we do for ethernet drivers. From Kalle Valo. 10) Provide a way for administrators "destroy" connected sockets via the SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti. 11) Add support to add/remove multicast routes via netlink, from Nikolay Aleksandrov. 12) Make TCP keepalive settings per-namespace, from Nikolay Borisov. 13) Add forwarding and packet duplication facilities to nf_tables, from Pablo Neira Ayuso. 14) Dead route support in MPLS, from Roopa Prabhu. 15) TSO support for thunderx chips, from Sunil Goutham. 16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon. 17) Rationalize, consolidate, and more completely document the checksum offloading facilities in the networking stack. From Tom Herbert. 18) Support aborting an ongoing scan in mac80211/cfg80211, from Vidyullatha Kanchanapally. 19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming. * git:// (1375 commits) net: bnxt: always return values from _bnxt_get_max_rings net: bpf: reject invalid shifts phonet: properly unshare skbs in phonet_rcv() dwc_eth_qos: Fix dma address for multi-fragment skbs phy: remove an unneeded condition mdio: remove an unneed condition mdio_bus: NULL dereference on allocation error net: Fix typo in netdev_intersect_features net: freescale: mac-fec: Fix build error from phy_device API change net: freescale: ucc_geth: Fix build error from phy_device API change bonding: Prevent IPv6 link local address on enslaved devices IB/mlx5: Add flow steering support net/mlx5_core: Export flow steering API net/mlx5_core: Make ipv4/ipv6 location more clear net/mlx5_core: Enable flow steering support for the IB driver net/mlx5_core: Initialize namespaces only when supported by device net/mlx5_core: Set priority attributes net/mlx5_core: Connect flow tables net/mlx5_core: Introduce modify flow table command net/mlx5_core: Managing root flow table ...
2016-01-08Merge branch 'for-linus' into work.miscAl Viro2-17/+20
2016-01-06net: Add eth_platform_get_mac_address() helper.David S. Miller1-0/+7
A repeating pattern in drivers has become to use OF node information and, if not found, platform specific host information to extract the ethernet address for a given device. Currently this is done with a call to of_get_mac_address() and then some ifdef'd stuff for SPARC. Consolidate this into a portable routine, and provide the arch_get_platform_mac_address() weak function hook for all architectures to implement if they want. Signed-off-by: David S. Miller <>
2016-01-04Merge branch 'memdup_user_nul' into work.miscAl Viro6-9/+40
2015-12-31sparc: Wire up mlock2 system call.David S. Miller2-3/+3
Signed-off-by: David S. Miller <>
2015-12-31sparc: Add all necessary direct socket system calls.David S. Miller2-17/+20
The GLIBC folks would like to eliminate socketcall support eventually, and this makes sense regardless so wire them all up. Signed-off-by: David S. Miller <>
2015-12-24sparc64: fix FP corruption in user copy functionsRob Gardner1-0/+13
Short story: Exception handlers used by some copy_to_user() and copy_from_user() functions do not diligently clean up floating point register usage, and this can result in a user process seeing invalid values in floating point registers. This sometimes makes the process fail. Long story: Several cpu-specific (NG4, NG2, U1, U3) memcpy functions use floating point registers and VIS alignaddr/faligndata to accelerate data copying when source and dest addresses don't align well. Linux uses a lazy scheme for saving floating point registers; It is not done upon entering the kernel since it's a very expensive operation. Rather, it is done only when needed. If the kernel ends up not using FP regs during the course of some trap or system call, then it can return to user space without saving or restoring them. The various memcpy functions begin their FP code with VISEntry (or a variation thereof), which saves the FP regs. They conclude their FP code with VISExit (or a variation) which essentially marks the FP regs "clean", ie, they contain no unsaved values. fprs.FPRS_FEF is turned off so that a lazy restore will be triggered when/if the user process accesses floating point regs again. The bug is that the user copy variants of memcpy, copy_from_user() and copy_to_user(), employ an exception handling mechanism to detect faults when accessing user space addresses, and when this handler is invoked, an immediate return from the function is forced, and VISExit is not executed, thus leaving the fprs register in an indeterminate state, but often with fprs.FPRS_FEF set and one or more dirty bits. This results in a return to user space with invalid values in the FP regs, and since fprs.FPRS_FEF is on, no lazy restore occurs. This bug affects copy_to_user() and copy_from_user() for NG4, NG2, U3, and U1. All are fixed by using a new exception handler for those loads and stores that are done during the time between VISEnter and VISExit. n.b. In NG4memcpy, the problematic code can be triggered by a copy size greater than 128 bytes and an unaligned source address. This bug is known to be the cause of random user process memory corruptions while perf is running with the callgraph option (ie, perf record -g). This occurs because perf uses copy_from_user() to read user stacks, and may fault when it follows a stack frame pointer off to an invalid page. Validation checks on the stack address just obscure the underlying problem. Signed-off-by: Rob Gardner <> Signed-off-by: Dave Aldridge <> Signed-off-by: David S. Miller <>
2015-12-24sparc64: Perf should save/restore fault infoRob Gardner1-0/+4
There have been several reports of random processes being killed with a bus error or segfault during userspace stack walking in perf. One of the root causes of this problem is an asynchronous modification to thread_info fault_address and fault_code, which stems from a perf counter interrupt arriving during kernel processing of a "benign" fault, such as a TSB miss. Since perf_callchain_user() invokes copy_from_user() to read user stacks, a fault is not only possible, but probable. Validity checks on the stack address merely cover up the problem and reduce its frequency. The solution here is to save and restore fault_address and fault_code in perf_callchain_user() so that the benign fault handler is not disturbed by a perf interrupt. Signed-off-by: Rob Gardner <> Signed-off-by: Dave Aldridge <> Signed-off-by: David S. Miller <>
2015-12-24sparc64: Ensure perf can access user stacksRob Gardner1-0/+7
When an interrupt (such as a perf counter interrupt) is delivered while executing in user space, the trap entry code puts ASI_AIUS in %asi so that copy_from_user() and copy_to_user() will access the correct memory. But if a perf counter interrupt is delivered while the cpu is already executing in kernel space, then the trap entry code will put ASI_P in %asi, and this will prevent copy_from_user() from reading any useful stack data in either of the perf_callchain_user_X functions, and thus no user callgraph data will be collected for this sample period. An additional problem is that a fault is guaranteed to occur, and though it will be silently covered up, it wastes time and could perturb state. In perf_callchain_user(), we ensure that %asi contains ASI_AIUS because we know for a fact that the subsequent calls to copy_from_user() are intended to read the user's stack. [ Use get_fs()/set_fs() -DaveM ] Signed-off-by: Rob Gardner <> Signed-off-by: Dave Aldridge <> Signed-off-by: David S. Miller <>
2015-12-24sparc64: Don't set %pil in rtrap_nmi too earlyRob Gardner1-1/+7
Commit 28a1f53 delays setting %pil to avoid potential hardirq stack overflow in the common rtrap_irq path. Setting %pil also needs to be delayed in the rtrap_nmi path for the same reason. Signed-off-by: Rob Gardner <> Signed-off-by: Dave Aldridge <> Signed-off-by: David S. Miller <>
2015-12-24sparc64: Add ADI capability to cpu capabilitiesKhalid Aziz1-4/+5
Add ADI (Application Data Integrity) capability to cpu capabilities list. ADI capability allows virtual addresses to be encoded with a tag in bits 63-60. This tag serves as an access control key for the regions of virtual address with ADI enabled and a key set on them. Hypervisor encodes this capability as "adp" in "hwcap-list" property in machine description. Signed-off-by: Khalid Aziz <> Signed-off-by: David S. Miller <>
2015-12-23sparc: Hook up userfaultfd system callMike Kravetz2-3/+3
After hooking up system call, userfaultfd selftest was successful for both 32 and 64 bit version of test. Signed-off-by: Mike Kravetz <> Signed-off-by: David S. Miller <>
2015-12-23new helpers: no_seek_end_llseek{,_size}()Al Viro1-18/+2
Signed-off-by: Al Viro <>
2015-11-23treewide: Remove old email addressPeter Zijlstra1-1/+1
There were still a number of references to my old Red Hat email address in the kernel source. Remove these while keeping the Red Hat copyright notices intact. Signed-off-by: Peter Zijlstra (Intel) <> Cc: Arnaldo Carvalho de Melo <> Cc: Jiri Olsa <> Cc: Linus Torvalds <> Cc: Mike Galbraith <> Cc: Peter Zijlstra <> Cc: Stephane Eranian <> Cc: Thomas Gleixner <> Cc: Vince Weaver <> Signed-off-by: Ingo Molnar <>