Example: increment each element of array a[0:N-1] by 1. On a multicore CPU, use coarse-grained parallelism: #threads = #cores, with each thread handling a block of elements. On a GPU, use fine-grained parallelism: one thread per data element!
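The two decompositions of the increment example can be sketched in plain C++. This is only an illustration under stated assumptions: the function names (`increment_coarse`, `increment_element`, `increment_fine_emulated`, `default_thread_count`) are hypothetical, and the fine-grained variant is a sequential emulation of the "one thread per element" index mapping, not actual GPU code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: pick #threads = #cores, falling back to 1 when the
// core count cannot be determined (hardware_concurrency() may return 0).
unsigned default_thread_count() {
    return std::max(1u, std::thread::hardware_concurrency());
}

// Coarse-grained (multicore CPU style): a few threads, each of which
// increments one contiguous chunk of the array.
void increment_coarse(std::vector<int>& a, unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t n = a.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / num_threads)
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // fewer elements than threads: no work left
        workers.emplace_back([&a, begin, end] {
            for (std::size_t i = begin; i < end; ++i) a[i] += 1;
        });
    }
    for (auto& w : workers) w.join();
}

// Fine-grained (GPU style): the work of ONE logical thread, which touches
// exactly one element. The bounds guard mirrors the usual situation where
// more logical threads are launched than there are elements.
void increment_element(std::vector<int>& a, std::size_t i) {
    if (i < a.size()) a[i] += 1;
}

// Sequential emulation of launching one logical thread per element.
// On a real GPU these calls would run in parallel, not in a loop.
void increment_fine_emulated(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) increment_element(a, i);
}
```

Calling `increment_coarse(a, default_thread_count())` gives the CPU-style decomposition. In the fine-grained version the per-thread work shrinks to a single guarded statement, which is the shape a GPU kernel body would typically take.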
While GPUs excel at highly parallel tasks, they struggle with irregular memory access patterns such as those found in graph or sparse-matrix operations. In these cases, where SIMT (single instruction, multiple threads) execution is not possible and ...