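I ran the following benchmark code on a Google Colab CPU runtime with high RAM enabled. Please point out any mistakes in how I benchmarked (if there are any), and explain why the performance gain from tinygrad is so large.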
import time
import numpy as np
import torch
from tinygrad.tensor import Tensor
# Set the size of the matrices
size = 10000
# Generate a random 10000x10000 matrix with NumPy
np_array = np.random.rand(size, size)
# Generate a random 10000x10000 matrix with PyTorch
torch_tensor = torch.rand(size, size)
# Generate a random 10000x10000 matrix with TinyGrad
tg_tensor = Tensor.rand(size, size)
# Benchmark NumPy
start_np = time.time()
np_result = np_array @ np_array # Matrix multiplication
np_time = time.time() - start_np
print(f"NumPy Time: {np_time:.6f} seconds")
# Benchmark PyTorch
start_torch = time.time()
torch_result = torch_tensor @ torch_tensor # Matrix multiplication
torch_time = time.time() - start_torch
print(f"PyTorch Time: {torch_time:.6f} seconds")
# Benchmark TinyGrad
start_tg = time.time()
tg_result = tg_tensor @ tg_tensor # Matrix multiplication
tg_time = time.time() - start_tg
print(f"TinyGrad Time: {tg_time:.6f} seconds")
- NumPy time: 11.977072 seconds
- PyTorch time: 7.905509 seconds
- TinyGrad time: 0.000607 seconds
Those are the results. After running the code several times, the results were very similar.
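Tinygrad executes operations lazily, so the matrix multiplication has not actually been performed yet; the 0.000607 seconds only measures building the operation. Change the matrix multiplication line to:
or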
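To make the lazy-evaluation point concrete, here is a minimal sketch in plain Python. `LazyMatmul` is a hypothetical toy class, not tinygrad's implementation: creating it only records the operands, and the real work runs when `realize()` is called. In tinygrad itself, evaluation is normally forced with `Tensor.realize()` or by converting the result with `Tensor.numpy()`; that these are the intended replacement lines is an assumption, shown as comments at the end of the block.

```python
import time


class LazyMatmul:
    """Toy stand-in for a lazy tensor op: building it only records the operands."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.value = None  # nothing has been computed yet

    def realize(self):
        # The actual O(n^3) work happens here, the first time it is requested.
        if self.value is None:
            cols = list(zip(*self.b))
            self.value = [
                [sum(x * y for x, y in zip(row, col)) for col in cols]
                for row in self.a
            ]
        return self.value


n = 60
a = [[float(i + j) for j in range(n)] for i in range(n)]

# Timing only the construction, as the original benchmark effectively did.
start = time.perf_counter()
lazy = LazyMatmul(a, a)  # analogous to `tg_tensor @ tg_tensor`
build_time = time.perf_counter() - start

# Timing the forced evaluation, which is where the multiplication really runs.
start = time.perf_counter()
result = lazy.realize()
realize_time = time.perf_counter() - start

print(f"build: {build_time:.6f} s, realize: {realize_time:.6f} s")

# Assumed tinygrad equivalents (not executed here):
#   tg_result = (tg_tensor @ tg_tensor).realize()
#   tg_result = (tg_tensor @ tg_tensor).numpy()
```

With either form inside the timed region, the TinyGrad number should reflect the actual multiplication rather than just the graph construction.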