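I'm new to CuPy and CUDA/GPU computing. Can someone explain why (x / y)[i] is faster than x[i] / y[i]?
When using the GPU to accelerate computation, are there any guidelines that would let me quickly determine which operation is faster, so that I can avoid benchmarking every operation?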
# In VSCode Jupyter Notebook
import cupy as cp
from cupyx.profiler import benchmark
x = cp.arange(1_000_000)
y = (cp.arange(1_000_000) + 1) / 2
i = cp.random.randint(2, size=1_000_000) == 0
x, y, i
# Output:
(array([ 0, 1, 2, ..., 999997, 999998, 999999], shape=(1000000,), dtype=int32),
array([5.000000e-01, 1.000000e+00, 1.500000e+00, ..., 4.999990e+05,
4.999995e+05, 5.000000e+05], shape=(1000000,), dtype=float64),
array([ True, False, True, ..., True, False, True], shape=(1000000,), dtype=bool))
def test1(x, y, i):
    return (x / y)[i]

def test2(x, y, i):
    return x[i] / y[i]
print(benchmark(test1, (x, y, i)))
print(benchmark(test2, (x, y, i)))
# Output:
test1: CPU: 175.164 us +/- 61.250 (min: 125.200 / max: 765.100) us GPU-0: 186.001 us +/- 67.314 (min: 134.144 / max: 837.568) us
test2: CPU: 342.364 us +/- 130.840 (min: 223.000 / max: 1277.600) us GPU-0: 368.133 us +/- 136.911 (min: 225.504 / max: 1297.408) us
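To narrow down where the time goes, a sketch like the one below could time the building blocks of test1 and test2 separately. The helper names divide_full, mask_once, and mask_twice are made up for illustration, n_repeat=100 is an arbitrary choice, and the NumPy fallback is only an assumption so the snippet still runs on a machine without a usable GPU. My guess is that boolean-mask indexing is the more expensive step, so doing it twice in test2 costs more than one division over the full array, but I have not confirmed that.

```python
# Sketch: time the building blocks of test1 and test2 separately.
try:
    import cupy as xp
    from cupyx.profiler import benchmark
    xp.arange(1)  # raises here if no usable GPU is present
    HAVE_GPU = True
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch still runs
    HAVE_GPU = False

n = 1_000_000
x = xp.arange(n)
y = (xp.arange(n) + 1) / 2
i = xp.random.randint(2, size=n) == 0

def divide_full(x, y, i):
    # Elementwise division over all n elements (the first step of test1).
    return x / y

def mask_once(x, y, i):
    # A single boolean-mask selection (the second step of test1).
    return x[i]

def mask_twice(x, y, i):
    # Two boolean-mask selections (what test2 does before dividing).
    return x[i], y[i]

# Both orderings should produce the same values.
assert bool(xp.allclose((x / y)[i], x[i] / y[i]))

if HAVE_GPU:
    for f in (divide_full, mask_once, mask_twice):
        print(benchmark(f, (x, y, i), n_repeat=100))
```

If mask_twice comes out at roughly double mask_once, and both are well above divide_full, that would point to the masking step rather than the division as the source of the gap.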