AskOverflow.Dev


mks2192's questions

manoj
Asked: 2024-02-03 21:18:08 +0800 CST

Training an Xception model in Keras: batch size 32 raises an error, but batch size 16 works

  • 5

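The full error log is below. Can you help me? I suspect the following line is the key to the error, but I can't figure it out: OOM when allocating tensor with shape[728,728,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc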

   ResourceExhaustedError                    Traceback (most recent call last)
    Cell In[34], line 7
          2 model_save =  ModelCheckpoint('/kaggle/working/model_weights.keras' , monitor = 'val_loss', save_best_only = True, mode = 'min')
          3 reduce_lr =  ReduceLROnPlateau(monitor='val_loss', factor=0.1,
          4                               patience=4, min_lr=0.0001)
    ----> 7 history = model.fit(train_it, steps_per_epoch= steps_per_epoch, validation_data=val_it,
          8              validation_steps=validation_steps, epochs = epochs, callbacks=[early_stopping, model_save, reduce_lr] )
    
    File /opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
         67     filtered_tb = _process_traceback_frames(e.__traceback__)
         68     # To get the full stack trace, call:
         69     # `tf.debugging.disable_traceback_filtering()`
    ---> 70     raise e.with_traceback(filtered_tb) from None
         71 finally:
         72     del filtered_tb
    
    File /opt/conda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
         50 try:
         51   ctx.ensure_initialized()
    ---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
         53                                       inputs, attrs, num_outputs)
         54 except core._NotOkStatusException as e:
         55   if name is not None:
    
    ResourceExhaustedError: Graph execution error:
    
    Detected at node 'model_1/block6_sepconv2/separable_conv2d' defined at (most recent call last):
        File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
          exec(code, run_globals)
        File "/opt/conda/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
          app.launch_new_instance()
        File "/opt/conda/lib/python3.10/site-packages/traitlets/config/application.py", line 1043, in launch_instance
          app.start()
        File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 728, in start
          self.io_loop.start()
        File "/opt/conda/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 195, in start
          self.asyncio_loop.run_forever()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
          self._run_once()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
          handle._run()
        File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
          self._context.run(self._callback, *self._args)
        File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 513, in dispatch_queue
          await self.process_one()
        File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 502, in process_one
          await dispatch(*args)
        File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 409, in dispatch_shell
          await result
        File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 729, in execute_request
          reply_content = await reply_content
        File "/opt/conda/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 422, in do_execute
          res = shell.run_cell(
        File "/opt/conda/lib/python3.10/site-packages/ipykernel/zmqshell.py", line 540, in run_cell
          return super().run_cell(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3009, in run_cell
          result = self._run_cell(
        File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3064, in _run_cell
          result = runner(coro)
        File "/opt/conda/lib/python3.10/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
          coro.send(None)
        File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3269, in run_cell_async
          has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
        File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3448, in run_ast_nodes
          if await self.run_code(code, result, async_=asy):
        File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
          exec(code_obj, self.user_global_ns, self.user_ns)
        File "/tmp/ipykernel_33/698136834.py", line 7, in <module>
          history = model.fit(train_it, steps_per_epoch= steps_per_epoch, validation_data=val_it,
        File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1685, in fit
          tmp_logs = self.train_function(iterator)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1284, in train_function
          return step_function(self, iterator)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1268, in step_function
          outputs = model.distribute_strategy.run(run_step, args=(data,))
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in run_step
          outputs = model.train_step(data)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1050, in train_step
          y_pred = self(x, training=True)
        File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 558, in __call__
          return super().__call__(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1145, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/functional.py", line 512, in call
          return self._run_internal_graph(inputs, training=training, mask=mask)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/functional.py", line 669, in _run_internal_graph
          outputs = node.layer(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1145, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/keras/layers/convolutional/separable_conv2d.py", line 188, in call
          outputs = tf.compat.v1.nn.separable_conv2d(
    Node: 'model_1/block6_sepconv2/separable_conv2d'
    OOM when allocating tensor with shape[728,728,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model_1/block6_sepconv2/separable_conv2d}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
     [Op:__inference_train_function_39551]
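
For scale, here is a rough sketch of what the OOM line implies. The tensor that failed to allocate, shape[728,728,1,1] of type float, is only about 2 MiB, which suggests the GPU was already nearly full when this request arrived rather than this tensor being large in itself. The second part is an assumption and not taken from the log: it uses a hypothetical 19x19x728 feature map (what Xception's middle flow would produce for a 299x299 input) only to illustrate that activation memory grows linearly with batch size, so going from 16 to 32 doubles it.

```python
# Size of the tensor named in the OOM message: shape [728, 728, 1, 1], type float (float32).
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory footprint in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


failed_alloc = tensor_bytes([728, 728, 1, 1])
print(f"failed allocation: {failed_alloc} bytes ({failed_alloc / 2**20:.2f} MiB)")


# ASSUMPTION (not in the log): a 19x19x728 feature map, used purely to show
# how the memory of a single layer output scales with batch size.
def activation_bytes(batch_size, height=19, width=19, channels=728):
    """Bytes needed for one float32 activation tensor of the assumed shape."""
    return tensor_bytes([batch_size, height, width, channels])


for batch_size in (16, 32):
    print(f"batch {batch_size}: {activation_bytes(batch_size)} bytes per activation tensor")
```

Training keeps many such activation tensors alive for the backward pass, so the per-layer doubling adds up across the whole network. That would be consistent with batch size 16 fitting in GPU memory while 32 does not.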
tensorflow
  • 1 answer
  • 21 Views
