I have a Parquet file that is being written to by a continuous loop like this:
def process_data(self):
    # ... other code ...
    with pq.ParquetWriter(self.destination_file, schema) as writer:
        with tqdm(total=total_rows, desc="Processing nodes") as pbar:
            for i in range(0, total_rows, self.batch_size):
                # ... processing code ...
                # Create a table from the batched data
                batch_table = pa.Table.from_arrays(
                    [
                        pa.array(node_ids),
                        pa.array(mut_positions),
                        pa.array(new_6mers),
                        pa.array(context_embeddings),
                        pa.array(nonmutation_contexts),
                    ],
                    schema=schema
                )
                # Write the batch table
                writer.write_table(batch_table)
                # ...
                pbar.update(len(batch_indices))
This loop was cut short because the computer shut down abruptly mid-run.
Now, when I try to read the file with pq.read_table, I (expectedly) get an error:
pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from 'data/processed/data_with_embeddings.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'data/processed/data_with_embeddings.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
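As a sanity check that this really is a truncated-footer situation (rather than some other corruption), I believe a valid Parquet file should both start and end with the 4-byte magic `PAR1`. Something like the sketch below (the helper name is mine, just for illustration) should show the header magic present but the footer magic missing, which matches the "magic bytes not found in footer" part of the error:

```python
import os

def parquet_footer_present(path):
    """Return True if the file starts AND ends with the Parquet magic b'PAR1'.

    A file left behind by an interrupted ParquetWriter typically still has
    the header magic, but the footer (schema + row-group metadata + trailing
    magic) is only written on close, so the tail check fails.
    """
    if os.path.getsize(path) < 8:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to 4 bytes before EOF
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"
```

In my case I would expect this to return False for the file from the error message above.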
I'm desperately hoping there is some way around this, for example a workaround that loses a few rows (or more) but salvages most of the data. I searched the web, but there doesn't seem to be any information on this, or what does exist is beyond my expertise (which is probably apparent from my use of tags, for which I apologize in advance).
Is there still hope?