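I installed CUDA, cuDNN, and TensorFlow exactly as described on tensorflow.org. During the installation I switched my Ubuntu machine to the Nvidia card. After verifying the installation, I switched back to Intel. Now, when I run my code, I get this output in the terminal: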
2019-11-26 19:24:24.781299: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-26 19:24:24.830457: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:24.831899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce MX150 major: 6 minor: 1 memoryClockRate(GHz): 1.5315 pciBusID: 0000:01:00.0
2019-11-26 19:24:24.983890: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-26 19:24:25.001409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-26 19:24:25.009430: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-26 19:24:25.030189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-26 19:24:25.048404: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-26 19:24:25.067131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-26 19:24:25.095875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-26 19:24:25.096289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.099040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.100673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-26 19:24:25.101394: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-26 19:24:25.135574: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1800000000 Hz
2019-11-26 19:24:25.137917: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5619ea60e930 executing computations on platform Host. Devices:
2019-11-26 19:24:25.137997: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-11-26 19:24:25.270223: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.271459: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5619ebdb8290 executing computations on platform CUDA. Devices:
2019-11-26 19:24:25.271500: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce MX150, Compute Capability 6.1
2019-11-26 19:24:25.271757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.272737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce MX150 major: 6 minor: 1 memoryClockRate(GHz): 1.5315 pciBusID: 0000:01:00.0
2019-11-26 19:24:25.272802: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-26 19:24:25.272831: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-26 19:24:25.272858: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-26 19:24:25.272882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-26 19:24:25.272913: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-26 19:24:25.272940: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-26 19:24:25.272966: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-26 19:24:25.273065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.274126: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.275065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-26 19:24:25.275131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-26 19:24:25.277086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-26 19:24:25.277112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-11-26 19:24:25.277124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-11-26 19:24:25.277325: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.278329: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-26 19:24:25.279323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1323 MB memory) -> physical GPU (device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1)
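For reference, this is roughly how I would check from Python whether TensorFlow sees the GPU, independently of the log above. It is only a minimal sketch: `visible_gpus` is a helper name I made up, and it assumes a TensorFlow 2.x install where `tf.config.experimental.list_physical_devices` is available. If TensorFlow cannot be imported, it simply returns an empty list.

```python
# Minimal sketch: list the GPUs that TensorFlow can see.
# `visible_gpus` is a hypothetical helper, not part of TensorFlow.
try:
    import tensorflow as tf
except ImportError:  # keep the sketch importable without TensorFlow
    tf = None


def visible_gpus():
    """Return the names of the physical GPUs TensorFlow reports."""
    if tf is None:
        return []
    # Assumes TF 2.x; this call does not initialize the GPU runtime.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return [gpu.name for gpu in gpus]


if __name__ == "__main__":
    names = visible_gpus()
    if names:
        print("TensorFlow sees these GPUs:", names)
    else:
        print("No GPU visible to TensorFlow (or TensorFlow is not installed).")
```

A non-empty list only shows that the device is visible. It does not prove that cuDNN or the kernels will actually initialize.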
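So is my GPU working properly, or do I need to do something else?
Thanks for the answers, but it looks like I have hit a new problem when trying to run a convolutional layer. The error says: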
2019-11-29 20:59:03.481920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 20:59:06.074691: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-29 20:59:06.171580: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-29 20:59:06.171825: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 418.87.1
2019-11-29 20:59:06.171902: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-29 20:59:06.171993: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 418.87.1
2019-11-29 20:59:06.172777: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node sequential/conv2d/Conv2D}}]]
   32/50000 [.......................] - ETA: 2:43:50
Traceback (most recent call last):
  File "sample3.py", line 54, in
    validation_data=(test_images, test_labels))
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
    ctx=ctx)
  File "/home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node sequential/conv2d/Conv2D (defined at /home/encrypto/venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_1055]

Function call stack:
distributed_function
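As a sanity check on the driver version hint in the log, here is a small sketch that compares the reported driver version (418.87.1) against a minimum. The minimum of 410.48 is my assumption for CUDA 10.0 on Linux, taken from NVIDIA's release notes rather than from anything in this log, and `parse_version` and `driver_is_sufficient` are helper names I made up.

```python
# Minimal sketch: compare a driver version string against an assumed minimum.
# ASSUMPTION: CUDA 10.0 on Linux needs driver >= 410.48 (not stated in the log).
ASSUMED_MIN_DRIVER_FOR_CUDA_10_0 = "410.48"


def parse_version(text):
    """Turn a dotted version such as '418.87.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(installed, minimum=ASSUMED_MIN_DRIVER_FOR_CUDA_10_0):
    """Return True if the installed driver is at least the minimum version."""
    # Tuples compare element by element, so (418, 87, 1) >= (410, 48) holds.
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    reported = "418.87.1"  # value printed by cuda_dnn.cc in the log
    print(reported, "sufficient:", driver_is_sufficient(reported))
```

If that assumed minimum is right, the driver itself is new enough, so the "Possibly insufficient driver version" line may not be the real cause.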
How can I fix this?
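One workaround I have seen suggested for CUDNN_STATUS_NOT_INITIALIZED on GPUs with little memory (the first log reports only 1323 MB for the MX150) is to let TensorFlow allocate GPU memory on demand instead of reserving it up front. I have not confirmed that memory is the cause here, so treat this as a sketch under that assumption. `enable_memory_growth` is a made-up helper name, and it assumes TF 2.x, where `tf.config.experimental.set_memory_growth` exists.

```python
# Minimal sketch: enable on-demand GPU memory allocation in TF 2.x.
# ASSUMPTION: the cuDNN failure is related to GPU memory; the log does not confirm this.
try:
    import tensorflow as tf
except ImportError:  # keep the sketch importable without TensorFlow
    tf = None


def enable_memory_growth():
    """Turn on memory growth for every visible GPU; return how many were configured."""
    if tf is None:
        return 0
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before any op touches the GPU, otherwise TF raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured for memory growth:", enable_memory_growth())
```

This would have to run at the very top of sample3.py, before the model is built or any tensor is created.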