I have the following input:
[1i+2j], [3i+4j], [5i+6j],...
The output should be:
[1i+2j], [0i+0j], [3i+4j], [0i+0j], [5i+6j], [0i+0j],...
I wrote the following code:
void Extract(ComplexFloat *pIn, ComplexFloat *pOut, uint32_t N)
{
    ComplexFloat *pSrc = pIn;
    ComplexFloat *pDst = pOut;
    float32x2_t Zero;
    float32x4_t In, Out;
    float32x2_t HighIn, LowIn;

    Zero = vdup_n_f32(0);

    // Loop over all input elements, two complex values per iteration.
    // Note: if N is odd, the last input element is not processed.
    for (uint32_t n = 0; n < (N >> 1); n++)
    {
        In = vld1q_f32((float *)pSrc);      // load two complex values (4 floats)
        HighIn = vget_high_f32(In);         // second complex value
        LowIn = vget_low_f32(In);           // first complex value

        Out = vcombine_f32(LowIn, Zero);    // [first, 0+0j]
        vst1q_f32((float *)pDst, Out);
        pDst += 2;

        Out = vcombine_f32(HighIn, Zero);   // [second, 0+0j]
        vst1q_f32((float *)pDst, Out);
        pDst += 2;

        pSrc += 2;
    }
}
Can you suggest code with better performance?
Thank you, Zvika
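At the very least, the following approach should produce fewer instructions:
We need to reinterpret each pair of adjacent floats as a single wider 64-bit element, after which vst2 can interleave 64 bits at a time without an explicit zip instruction.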
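The idea above could be sketched roughly as follows. This is not the answerer's original code, only one possible reading of it. It assumes that ComplexFloat is a plain struct of two floats (the question does not show its definition), that the target is AArch64 (where the 64-bit-element vst2q_u64 intrinsic is available), and that the output buffer holds 2*N elements. The name ExtractInterleaved is made up for this example, and the scalar loop is added only to handle an odd N and non-AArch64 builds.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Assumed layout: the question does not show how ComplexFloat is defined. */
typedef struct { float re, im; } ComplexFloat;

/* Hypothetical name. Writes 2*N complex values to pOut:
   each input value followed by 0+0j. */
void ExtractInterleaved(const ComplexFloat *pIn, ComplexFloat *pOut, uint32_t N)
{
    uint32_t n = 0;

#if defined(__aarch64__)
    uint64x2x2_t pair;
    pair.val[1] = vdupq_n_u64(0);   /* all-zero bits are also 0.0f, 0.0f */

    for (; n + 2 <= N; n += 2) {
        /* Load two complex values (4 floats) and view each complex value
           as one 64-bit element. */
        float32x4_t in = vld1q_f32((const float *)(pIn + n));
        pair.val[0] = vreinterpretq_u64_f32(in);

        /* The interleaving store writes val[0][0], val[1][0], val[0][1],
           val[1][1], i.e. [c0, 0, c1, 0], so no explicit zip is needed.
           The pointer cast assumes the usual behavior of NEON stores on
           4-byte-aligned data. */
        vst2q_u64((uint64_t *)(pOut + 2 * n), pair);
    }
#endif

    /* Scalar tail for an odd N; also the whole loop on non-AArch64 builds. */
    for (; n < N; n++) {
        pOut[2 * n] = pIn[n];
        pOut[2 * n + 1].re = 0.0f;
        pOut[2 * n + 1].im = 0.0f;
    }
}
```

Compared with the code in the question, each iteration here issues one load and one interleaving store instead of one load, two combines and two stores. Whether that is actually faster depends on the core, so it is worth measuring on the target hardware.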