The code from the previous post had already gone through a first round of optimization:
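unsigned short srcColor, *destColor;
int i, j;

/* Both bitmaps are indexed with pIDIBDest->nPitch, so they are assumed to
   share the same pitch. i is a pixel index, so pBmp is assumed to be
   addressed in 16-bit pixels with nPitch in pixels. */
for (j = 0; j < pIDIBSrc->cy; ++j)
{
    for (i = 0; i < pIDIBSrc->cx; ++i)
    {
        srcColor  = *(unsigned short *)&pIDIBSrc->pBmp[i + j * pIDIBDest->nPitch];
        destColor = (unsigned short *)&pIDIBDest->pBmp[i + j * pIDIBDest->nPitch];

        /* Halve both RGB565 pixels; 0x7BEF clears the bits that the
           shift moves into the neighbouring channel. */
        srcColor   = (srcColor >> 1) & 0x7BEF;
        *destColor = (*destColor >> 1) & 0x7BEF;

        /* Adding the two halves gives the 50% blend. */
        *destColor = srcColor + *destColor;
    }
}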
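The heart of the loop is a 50% blend done directly on packed pixels. The sketch below isolates it as a hypothetical helper, blend_half_565, assuming the pixels are RGB565 (which is what the 0x7BEF mask suggests; the post does not name the format). Shifting right by one halves every channel, but the low bit of R drops into the top bit of G and the low bit of G into the top bit of B; the mask clears those bits (and bit 15), so the two halves can be added without a carry crossing a channel boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 50% blend of two RGB565 pixels, the same arithmetic
 * as the inner loop. Assumed layout: R bits 15-11, G bits 10-5, B bits 4-0. */
static uint16_t blend_half_565(uint16_t a, uint16_t b)
{
    /* After >> 1, R's low bit sits in bit 10 (top of G) and G's low bit
     * sits in bit 4 (top of B); 0x7BEF clears bits 15, 10 and 4. */
    uint16_t ha = (uint16_t)((a >> 1) & 0x7BEF);
    uint16_t hb = (uint16_t)((b >> 1) & 0x7BEF);

    /* Each halved channel is at most half of its field's maximum,
     * so the sum never carries into the neighbouring field. */
    return (uint16_t)(ha + hb);
}

/* Worked examples of the blend. */
static void blend_half_565_selftest(void)
{
    assert(blend_half_565(0xFFFF, 0x0000) == 0x7BEF); /* white + black */
    assert(blend_half_565(0xF800, 0x001F) == 0x780F); /* red + blue    */
}
```

The price of the trick is that each channel's lowest bit is discarded before the addition, so blending a pixel with itself can come out slightly darker: two full-green pixels (0x07E0) blend to 0x07C0.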
The project wasn't under much time pressure back then, so I had plenty of time to polish and optimize it further.
Take this line, for example:
srcColor = *(unsigned short *)&pIDIBSrc->pBmp[i + j * pIDIBDest->nPitch];
In particular, the expression
j * pIDIBDest->nPitch
is evaluated pIDIBSrc->cx * pIDIBSrc->cy times, once for every pixel, and on a feature phone without a coprocessor, multiplication is very inefficient.
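The fix is to add a step variable that holds the current row offset and is advanced by nPitch at the end of each row. The multiplication that used to run cx * cy times becomes an addition that runs only cy times, once per row.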
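Written as a recurrence, step reproduces the row offset exactly, which is why the multiplication can be dropped. Taking a 240×320 screen purely as an illustrative size (the post does not state one), the row-offset work per blit shrinks from 76 800 multiplications to 320 additions:

```latex
\mathrm{step}_0 = 0, \qquad
\mathrm{step}_{j+1} = \mathrm{step}_j + \mathrm{nPitch}
\quad\Longrightarrow\quad
\mathrm{step}_j = j \cdot \mathrm{nPitch}

c_x c_y \text{ multiplications} \;\longrightarrow\; c_y \text{ additions},
\qquad 240 \times 320 = 76\,800 \;\longrightarrow\; 320
```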
unsigned short srcColor, *destColor;
int i, j;
int step = 0;   /* row offset; always equals j * pIDIBDest->nPitch */

for (j = 0; j < pIDIBSrc->cy; ++j)
{
    for (i = 0; i < pIDIBSrc->cx; ++i)
    {
        srcColor  = *(unsigned short *)&pIDIBSrc->pBmp[i + step];
        destColor = (unsigned short *)&pIDIBDest->pBmp[i + step];

        srcColor   = (srcColor >> 1) & 0x7BEF;
        *destColor = (*destColor >> 1) & 0x7BEF;
        *destColor = srcColor + *destColor;
    }
    /* One addition per row replaces the per-pixel multiplication.
       A single step serves both bitmaps, so they must share a pitch. */
    step += pIDIBDest->nPitch;
}
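To check that the stepped loop computes exactly what the original did, here is a self-contained sketch. The DIB struct and the functions blend_mul, blend_step and blend_versions_agree are hypothetical stand-ins, since the post does not show the real bitmap type; pBmp is assumed to hold RGB565 pixels with nPitch counted in pixels. The test image is given a pitch wider than its width so the padding pixel at the end of each row is exercised too.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap descriptor modelled on the fields used in the post. */
typedef struct {
    int cx;          /* width in pixels              */
    int cy;          /* height in pixels             */
    int nPitch;      /* row stride in pixels         */
    uint16_t *pBmp;  /* RGB565 pixel data (assumed)  */
} DIB;

/* Original form: j * nPitch is evaluated for every pixel. */
static void blend_mul(const DIB *src, DIB *dst)
{
    assert(src->nPitch == dst->nPitch);  /* one pitch for both, as in the post */
    for (int j = 0; j < src->cy; ++j)
        for (int i = 0; i < src->cx; ++i) {
            uint16_t s = (uint16_t)((src->pBmp[i + j * dst->nPitch] >> 1) & 0x7BEF);
            uint16_t *d = &dst->pBmp[i + j * dst->nPitch];
            *d = (uint16_t)(s + ((*d >> 1) & 0x7BEF));
        }
}

/* Stepped form: the row offset advances by one addition per row. */
static void blend_step(const DIB *src, DIB *dst)
{
    assert(src->nPitch == dst->nPitch);
    int step = 0;
    for (int j = 0; j < src->cy; ++j) {
        for (int i = 0; i < src->cx; ++i) {
            uint16_t s = (uint16_t)((src->pBmp[i + step] >> 1) & 0x7BEF);
            uint16_t *d = &dst->pBmp[i + step];
            *d = (uint16_t)(s + ((*d >> 1) & 0x7BEF));
        }
        step += dst->nPitch;  /* replaces j * nPitch */
    }
}

/* Runs both forms on the same 3x2 image with pitch 4 and
 * returns 1 if every pixel, padding included, matches. */
static int blend_versions_agree(void)
{
    uint16_t s[8], d1[8], d2[8];
    for (int k = 0; k < 8; ++k) {
        s[k] = (uint16_t)(k * 0x1234u);
        d1[k] = d2[k] = (uint16_t)(0xFFFFu - k * 0x0F0Fu);
    }
    DIB src = {3, 2, 4, s};
    DIB a = {3, 2, 4, d1};
    DIB b = {3, 2, 4, d2};
    blend_mul(&src, &a);
    blend_step(&src, &b);
    for (int k = 0; k < 8; ++k)
        if (d1[k] != d2[k])
            return 0;
    return 1;
}
```

blend_versions_agree() returns 1 when both forms produce identical output; neither form touches the padding pixel at the end of each row, since the inner loop stops at cx.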
Another step forward :)