Code:
from pyspark.sql import Window
from pyspark.sql.functions import col, lag
# Rename the columns after pivoting
new_column_names = ["Year"] + [col_name.replace(" ", "_") for col_name in pivot_table.columns[1:]]
pivot_table = pivot_table.toDF(*new_column_names)
# Drop the unwanted column first, so it is not picked up by the loop below
pivot_table = pivot_table.drop('Other_purchases_and_operating_expenses')
print(pivot_table.columns)
# Task 4: Calculate percentage change
percentage_cols = pivot_table.columns[1:] # Exclude the "Year" column
window_spec = Window.orderBy("Year")
# Calculate percentage change using a loop
for col_name in percentage_cols:
    pivot_table = pivot_table.withColumn(f"{col_name}_lag", lag(col(col_name)).over(window_spec))
    pivot_table = pivot_table.withColumn(f"{col_name}_change", (col(col_name) - col(f"{col_name}_lag")) / col(f"{col_name}_lag") * 100)
    pivot_table = pivot_table.drop(f"{col_name}_lag")
    pivot_table = pivot_table.drop(f"{col_name}")
The output of the code is:
year column, variablenames column 3
I want the maximum percentage change for each column, and when it occurred in the DataFrame.
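
One possible way to get this, sketched in plain Python rather than as a definitive answer: since the pivoted table has one row per year, it is probably small enough to collect and scan on the driver. The helper `max_change_per_column`, its `by_magnitude` flag, and the sample column names `Sales_change` and `Wages_change` are hypothetical and only for illustration. The sketch assumes every column other than `Year` is a `*_change` column, and it skips nulls (the first year has no previous row, so its change is null).

```python
from typing import Any, Dict, List, Optional, Tuple


def max_change_per_column(
    rows: List[dict],
    year_key: str = "Year",
    by_magnitude: bool = False,
) -> Dict[str, Optional[Tuple[Any, float]]]:
    """Return {column: (year, change)} for the largest change in each column.

    rows is a list of dicts, one per year. None values are ignored.
    With by_magnitude=True the largest absolute change is picked instead,
    so a big drop can win over a small rise.
    """
    result: Dict[str, Optional[Tuple[Any, float]]] = {}
    if not rows:
        return result
    # Assumption: every key except the year is a percentage-change column.
    change_cols = [key for key in rows[0] if key != year_key]
    for name in change_cols:
        candidates = [(r[year_key], r[name]) for r in rows if r[name] is not None]
        if not candidates:
            result[name] = None  # column contains only nulls
            continue
        if by_magnitude:
            best = max(candidates, key=lambda pair: abs(pair[1]))
        else:
            best = max(candidates, key=lambda pair: pair[1])
        result[name] = best
    return result


# Hypothetical sample data in the shape the loop above produces.
sample_rows = [
    {"Year": 2019, "Sales_change": None, "Wages_change": None},
    {"Year": 2020, "Sales_change": 10.0, "Wages_change": -5.0},
    {"Year": 2021, "Sales_change": -20.0, "Wages_change": 2.5},
]
summary = max_change_per_column(sample_rows)
for column, best in summary.items():
    print(column, best)
```

With the DataFrame from the question, the input could be built with `rows = [r.asDict() for r in pivot_table.collect()]`. If "maximum" is meant as the largest swing in either direction, pass `by_magnitude=True`.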