Thursday, 24 March 2011


DDRKirby(ISQ) had problems with the HLSL code for "RGBtoHSV" in my original posting on this subject. I've now had a chance to look at this in more detail and believe I know what the problem is. Some versions of the HLSL compiler have difficulty generating correct code for component swizzling under very specific circumstances. A minor refactoring seems to solve the problem:

float3 Hue(float H)
{
  // Piecewise-linear ramps for each channel, clamped to [0,1]
  float R = abs(H * 6 - 3) - 1;
  float G = 2 - abs(H * 6 - 2);
  float B = 2 - abs(H * 6 - 4);
  return saturate(float3(R,G,B));
}

float3 HSVtoRGB(in float3 HSV)
{
  // Blend the pure hue towards white by saturation, then scale by value
  return ((Hue(HSV.x) - 1) * HSV.y + 1) * HSV.z;
}

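float3 RGBtoHSV(in float3 RGB)
{
  float3 HSV = 0;
#if NO_ASM
  // Value is the largest channel; chroma C is the largest minus the smallest
  HSV.z = max(RGB.r, max(RGB.g, RGB.b));
  float M = min(RGB.r, min(RGB.g, RGB.b));
  float C = HSV.z - M;
#else
  // Same result via max4; r is repeated to pad RGB to four components
  float4 RGB4 = RGB.rgbr;
  asm { max4 HSV.z, RGB4 };
  asm { max4 RGB4.w, -RGB4 };
  float C = HSV.z + RGB4.w;
#endif
  if (C != 0)
  {
    float4 RGB0 = float4(RGB, 0);
    float4 Delta = (HSV.z - RGB0) / C;
    Delta.rgb -= Delta.brg;
    Delta.rgb += float3(2,4,6);
    // Keep only the term(s) belonging to the channel(s) equal to the maximum
    Delta.brg = step(HSV.z, RGB) * Delta.brg;
#if NO_ASM
    HSV.x = max(Delta.r, max(Delta.g, Delta.b));
#else
    float4 Delta4 = Delta.rgbr;
    asm { max4 HSV.x, Delta4 };
#endif
    HSV.x = frac(HSV.x / 6);
    // Delta.w is V / C, so its reciprocal is the saturation C / V
    HSV.y = 1 / Delta.w;
  }
  return HSV;
}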

The major change (which produces the same optimized code as I expected from the original version) is highlighted. I'm not saying this will fix everyone's woes, but it certainly corrects the problem with the one situation I came across.
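For anyone who wants to sanity-check the maths away from the GPU, the following is a minimal CPU-side sketch of the NO_ASM path in Python. It is only a port for testing, not the shader itself: the function names (saturate, frac, hue, hsv_to_rgb, rgb_to_hsv) are assumptions chosen for this sketch, and the HLSL intrinsics and swizzles are emulated by hand on plain tuples.

```python
import math

def saturate(x):
    # HLSL saturate(): clamp to [0, 1]
    return min(max(x, 0.0), 1.0)

def frac(x):
    # HLSL frac(): fractional part, x - floor(x)
    return x - math.floor(x)

def hue(h):
    r = abs(h * 6 - 3) - 1
    g = 2 - abs(h * 6 - 2)
    b = 2 - abs(h * 6 - 4)
    return (saturate(r), saturate(g), saturate(b))

def hsv_to_rgb(hsv):
    h, s, v = hsv
    return tuple(((c - 1) * s + 1) * v for c in hue(h))

def rgb_to_hsv(rgb):
    r, g, b = rgb
    v = max(r, g, b)
    c = v - min(r, g, b)
    if c == 0:
        return (0.0, 0.0, v)
    # Delta = (HSV.z - RGB0) / C, with RGB0 = (r, g, b, 0)
    dr, dg, db, dw = ((v - x) / c for x in (r, g, b, 0.0))
    # Delta.rgb -= Delta.brg (all three updated from the old values)
    dr, dg, db = dr - db, dg - dr, db - dg
    # Delta.rgb += float3(2, 4, 6)
    dr, dg, db = dr + 2, dg + 4, db + 6
    # Delta.brg = step(HSV.z, RGB) * Delta.brg
    # i.e. b is kept where r is the maximum, r where g is, g where b is
    db *= float(r >= v)
    dr *= float(g >= v)
    dg *= float(b >= v)
    h = frac(max(dr, dg, db) / 6)
    s = 1 / dw
    return (h, s, v)
```

Round-tripping a handful of colours through rgb_to_hsv and hsv_to_rgb, and comparing with a small tolerance, is a quick way to confirm the refactored swizzle logic behaves as intended before blaming the compiler.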

Saturday, 19 March 2011

Slacker By Nature

Having a blogger motto of "Software engineer by trade; slacker by nature" may not seem like a good advert for myself, but I believe I'm incredibly effective. That's subtly different from being efficient, and vastly different from being a fast coder; I probably wouldn't fare well in demoscene 48-hour coding challenges.

But being a slacker by nature means I'm more likely to look for a simple, path-of-least-resistance solution, which, in many software engineering domains, is the "right" solution. Don't reinvent the wheel ... unless you're looking for a competitive advantage by engineering a racing wheel that is 1% more efficient for 100% extra work. In which case, reimplement your string class and XML parser while you're at it!