scale2x / scale3x shaders


31 replies to this topic

#1 OFFLINE   sebt3

sebt3

    PowerFreak Troll, leave him alone

  • Members
  • PipPipPip
  • 2303 posts
  • Local time: 01:57 PM

Donator

Posted 11 September 2011 - 12:12 PM

Hi there,

I have written these two shaders, but they are _way_ _too_ _slow_ to be of any use :(

Anyway here they are :

scale2x :
Spoiler

scale3x :
Spoiler


I tried to optimize them, but that's the best I can get. They are at least as slow as their CPU counterparts.

PS: I'm posting this mostly as a warning and a starting point for the next person, as I spent some days on it and the results are... not satisfying, to say the least...

- [ PNDS ] - [ Yactfeau ] -

Spoiler

#2 ONLINE   PokeParadox

PokeParadox

    Advanced Member

  • Moderators
  • 2121 posts
  • Local time: 12:57 PM

Donator

Posted 11 September 2011 - 12:15 PM

Awww... How do they perform on the desktop?


Tired of the infamous preorder queue? Donate and help get this queue cleared!
Twitter: @PokeParadox
If you like my work, please consider leaving a rating or feedback or donating, every little "thank you" is appreciated!
My Pandora Apps - My Development Blog - Pirate Games


#3 OFFLINE   sebt3

Posted 11 September 2011 - 12:23 PM

Awww... How do they perform on the desktop?

Not tested, but since I know ePSXe uses a similar shader for scale2x, I guess fast enough.

EDIT: I just tested a 2xSaI shader, as I've been told that conditionals in shaders are bad and 2xSaI can be implemented without a single test... Still _way_ too slow (about the same as scale2x :( )


#4 OFFLINE   B-ZaR

B-ZaR

    A Commando

  • Supporter
  • 2330 posts
  • Local time: 02:57 PM
  • LocationFinland

Posted 12 September 2011 - 06:46 AM

Probably not a big impact (and it might already be optimized by the compiler anyway), but couldn't you write the scale2x conditions as if-else if-else if... instead of if-if-if...? It looks like no two of those conditions will apply at the same time, so it could save you some condition checking. This will probably help even less, but you could also do away with the E variable and just substitute gl_FragColor for it, saving one assignment. Then move the texture2D(s_texture0, v_texCoord[0]); to an else block after the if blocks (both of them), which saves you another useless assignment.

I really don't know if these will have any measurable impact, but something to try at least :)

EDIT:
precision mediump float;
varying vec2 v_texCoord[5];
varying vec2 pos;
uniform sampler2D s_texture0;
uniform vec4 u_param;
void main()
{
    // Fetch D and F first; if they match, the centre texel is used as-is.
    vec4 D = texture2D(s_texture0, v_texCoord[1]);
    vec4 F = texture2D(s_texture0, v_texCoord[2]);
    if (D == F) {
        gl_FragColor = texture2D(s_texture0, v_texCoord[0]);
    } else {
        // Only fetch H and B when the first test did not settle it.
        vec4 H = texture2D(s_texture0, v_texCoord[3]);
        vec4 B = texture2D(s_texture0, v_texCoord[4]);
        if (B == H) {
            gl_FragColor = texture2D(s_texture0, v_texCoord[0]);
        } else {
            // Pick the output according to the quadrant of the source texel.
            vec2 p = fract(pos);
            if (p.x < 0.5 && p.y >= 0.5 && D == B) gl_FragColor = D; // E0
            else if (p.x < 0.5 && p.y < 0.5 && D == H) gl_FragColor = D; // E2
            else if (p.x >= 0.5 && p.y >= 0.5 && B == F) gl_FragColor = F; // E1
            else if (p.x >= 0.5 && p.y < 0.5 && H == F) gl_FragColor = F; // E3
            else gl_FragColor = texture2D(s_texture0, v_texCoord[0]);
        }
    }
}

EDIT2: Let's save some sampling while we're at it (now it's ugly, too!)
EDIT3: Removed extra "else"
"Things should be simple by default, customizable by preference and replaceable by design."
Three Golden Lessons of the Pandora project: 1. Never assume, 2. No news is no news, 3. Something can go wrong

Currently working on: (research mode, studying different programming languages)
On hold: glhck | guihck | warshck | QMLON | EngineWorks | Wars | GameNode | Panorama | PNDManager

Finished: Wars: Commando | Space Rocks!
Spoiler

#5 OFFLINE   Exophase

Exophase

    Advanced Member

  • Members
  • PipPipPip
  • 3975 posts
  • Local time: 08:57 AM
  • LocationCleveland, OH

Donator

Posted 12 September 2011 - 02:35 PM

First thing, I'm curious what performance you're getting on a pure pass-through identity filter. This is being applied to a virtual framebuffer that's constantly being updated, right? It seems that the best method for doing this is using bc-cat (http://processors.wi...-cat_User_Guide) but I don't know if anyone has built it for Pandora.

Unfortunately, I can tell you off the bat that getting 60FPS for 800x480 with this shader is impossible if you need all of the samplers. With 5 texture accesses you're already exceeding the theoretical maximum fill-rate of the SGX530 at 110MHz (you'd be limited around 57FPS, but I'm sure nothing close to that is attainable). That's for the scale2x fragment shader, the scale3x one is obviously much worse.
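For anyone who wants to check the fill-rate figure, here is a minimal C sketch of the arithmetic. It assumes at most one texture sample per clock at 110 MHz, an 800x480 target and 5 samples per output pixel, which are the numbers used in this post; the function names `max_fps` and `print_scale2x_bound` are made up for illustration, and real throughput will be lower than this bound.

```c
#include <stdio.h>

/* Upper bound on frames per second if the GPU can return at most one
 * texture sample per clock (an assumption, taken from the discussion). */
double max_fps(double clock_hz, int width, int height, int samples_per_pixel)
{
    double samples_per_frame = (double)width * height * samples_per_pixel;
    return clock_hz / samples_per_frame;
}

/* Prints the bound for the scale2x case: 5 samples, 800x480, 110 MHz. */
void print_scale2x_bound(void)
{
    printf("scale2x bound: %.1f FPS\n", max_fps(110e6, 800, 480, 5));
}
```

With those inputs the bound comes out at roughly 57 FPS, in line with the figure quoted above. Any shader that needs more samples per pixel, such as scale3x, lowers the bound further.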

The compiler is probably going to turn some conditionals into predicates and keep some as conditionals. SGX can handle real flow control changes with single thread granularity, but we don't really know what the overhead is.. and I have a feeling that making samplers conditional may make them slower. Worth a shot though I guess.

#6 OFFLINE   sebt3

Posted 12 September 2011 - 02:48 PM

I know about bc-cat and I was about to try to get it running (i.e. get that kernel driver first). But I decided to test with the pass-through shader first, and I was really surprised to get zelda3T (my testing pet) working at full speed.
When I'm back from work I'll:
- test B-ZaR's implementation
- provide a 3T test build with switchable shaders (pass-through, scale2x and 2xSaI)
- bench my flip function with both shaders


#7 OFFLINE   FSO

FSO

    Newbie

  • Members
  • Pip
  • 1 posts
  • Local time: 01:57 PM

Posted 12 September 2011 - 10:25 PM

I don't know much about shaders but if branching is a problem you may consider something like this:
vec4 tmp1 = p.x < 0.5 ? D : F;
vec4 tmp2 = p.y < 0.5 ? H : B;
vec4 tmp3 = D == F || H == B ? E : tmp1;
gl_FragColor = tmp1 == tmp2 ? tmp3 : E;
It should be equivalent. Any compiler should be smart enough to produce branch-free code out of this, and it is only 10 instructions total.
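The equivalence claim can be checked on the CPU. The following is a rough C sketch, not shader code: colors are modelled as plain ints instead of vec4, `left` and `lower` stand for `p.x < 0.5` and `p.y < 0.5`, and the branching form follows the if / else-if chain posted earlier in the thread as I read it. All function names are hypothetical. It says nothing about what the shader compiler actually emits.

```c
/* Branching form, following the if / else-if chain from the earlier post. */
int scale2x_branchy(int B, int D, int E, int F, int H, int left, int lower)
{
    if (D == F || B == H) return E;
    if ( left && !lower && D == B) return D;   /* E0 */
    if ( left &&  lower && D == H) return D;   /* E2 */
    if (!left && !lower && B == F) return F;   /* E1 */
    if (!left &&  lower && H == F) return F;   /* E3 */
    return E;
}

/* Select form, transcribed from the four ternaries above. */
int scale2x_select(int B, int D, int E, int F, int H, int left, int lower)
{
    int tmp1 = left  ? D : F;
    int tmp2 = lower ? H : B;
    int tmp3 = (D == F || H == B) ? E : tmp1;
    return tmp1 == tmp2 ? tmp3 : E;
}

/* Compares both forms over every combination of a small palette and all
 * four quadrants; returns the number of mismatches found. */
int count_mismatches(int palette_size)
{
    int bad = 0;
    for (int B = 0; B < palette_size; B++)
    for (int D = 0; D < palette_size; D++)
    for (int E = 0; E < palette_size; E++)
    for (int F = 0; F < palette_size; F++)
    for (int H = 0; H < palette_size; H++)
    for (int q = 0; q < 4; q++) {
        int left = q & 1, lower = (q >> 1) & 1;
        if (scale2x_branchy(B, D, E, F, H, left, lower) !=
            scale2x_select(B, D, E, F, H, left, lower))
            bad++;
    }
    return bad;
}
```

A palette of three values is enough to cover every pattern of equal and unequal neighbours that the four comparisons can distinguish, so `count_mismatches(3)` exercises all the interesting cases.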

#8 OFFLINE   sebt3

Posted 12 September 2011 - 11:53 PM

Thanks FSO, it works nicely ;)
But even with that, my flip function costs (for SNES resolution scaled to fullscreen):
- pass-through: 10-14ms
- 2xSaI: 62-65ms
- scale2x: 160-180ms (FSO's implementation)

Spoiler

So branches are bad in a shader, but so is sampling :D


#9 OFFLINE   Exophase

Posted 13 September 2011 - 12:05 AM

62-65ms is about 17-19 USSE cycles. If you count the number of operations in the shader this isn't very surprising. I can count as many as 14 vec3 operations and 5 samplers. Both USSEs can work on the ALU operations but you can only get one TMU result per cycle. Depending on the datatype those vector operations may take multiple cycles.

Are the vec3s lowp? Its type could make a big difference.
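The cycle count above appears to be a per-pixel figure. A small C sketch of how it could be derived, assuming the 110 MHz clock and 800x480 output mentioned earlier in the thread and assuming the whole measured time is spent shading (the function name `cycles_per_pixel` is hypothetical):

```c
/* Converts a measured frame time into GPU clock cycles per output pixel,
 * assuming the entire frame time is spent in the fragment shader. */
double cycles_per_pixel(double frame_ms, double clock_hz, int width, int height)
{
    double cycles_per_frame = frame_ms / 1000.0 * clock_hz;
    return cycles_per_frame / ((double)width * height);
}
```

For 62 ms and 65 ms this gives roughly 17.8 and 18.6 cycles per pixel, which is consistent with the 17-19 range quoted.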

#10 OFFLINE   sebt3

Posted 13 September 2011 - 12:15 AM

Using vec3 instead of vec4, the scale2x code reaches 125-130ms. Not usable (far from it), but interesting.


Spoiler


#11 OFFLINE   mjohansson

mjohansson

    Advanced Member

  • Supporter
  • 273 posts
  • Local time: 01:57 PM

Posted 27 September 2011 - 02:43 AM

I've seen that x ? y : z type of code before; how does it work? How can it create branch-free code? I'm abusing if/else like crazy, but on normal CPUs it doesn't seem to be very slow. ARM CPUs aren't the out-of-order type, are they? 'Cos that's where you'd need no branching to boost speed?

#12 OFFLINE   B-ZaR

Posted 27 September 2011 - 07:22 AM

?: tertiary operator is a shorthand for if. The return value of the statement depends on the first parameter (before "?"). If it's true, the statement's value is the second parameter (between "?" and ":"). Otherwise it's the third parameter (after ":").
int a = 1 == 2 ? 10 : 20; // a = 20
int b = true || false ? 10 : 20; // b = 10

Branching makes code slower on some platforms because it messes up pipelining (processing multiple sequential instructions partly in parallel) inside the processor. In the case of shaders, which are typically short programs run a great many times, messing up pipelining can affect performance by a visible margin. You shouldn't need to think about this at all when writing normal code.

I'm not quite sure how the tertiary shorthand if operator is optimized to not require branching, however.

(disclaimer: the above may contain errors, which I would like to be pointed out by someone who can explain it better)

#13 OFFLINE   milkshake

milkshake

    Super Advanced Member

  • Members
  • PipPipPip
  • 3096 posts
  • Local time: 12:57 PM
  • LocationRotherham, UK

Posted 27 September 2011 - 07:46 AM

I use that a lot in my PHP stuff when assigning a variable which could be one of two different values.
$a = ($x==$y)? '10'/*if true*/ : '120'/*if false*/;
That's just a random example, but it's pretty much the same as B-ZaR explained. Not sure if this has any effect on the speed of my scripts, but it's quick and neat to write.

You can't use this to replace "elseif" or "switch" statements, however.
Pandora Repo - software for your pandora :)

If you like my site/contributions, consider donating.

#14 OFFLINE   Pickle

Pickle

    Advanced Member

  • Members
  • PipPipPip
  • 1120 posts
  • Local time: 07:57 AM

Posted 27 September 2011 - 01:07 PM

You can't use this to replace "elseif" or "switch" statements, however.


a ? (b ? (c ? d : e) : f) : g;


#15 OFFLINE   milkshake

Posted 27 September 2011 - 01:20 PM


You can't use this to replace "elseif" or "switch" statements, however.


a ? (b ? (c ? d : e) : f) : g;


or maybe you can ;) but I probably wouldn't.

#16 OFFLINE   Exophase

Posted 27 September 2011 - 05:06 PM

In C, ?: is called the "conditional operator".. since it's the only ternary operator (three arguments) it's often just called "the ternary operator."

It isn't a shorthand for if, because it actually evaluates to something, so it can be used as a value inside an expression. Also, unlike if, the branches can't contain compound statements. But like if, only the taken branch will be evaluated. Since there aren't compound statements it's harder to properly sequence side effects, but you can still have them. Therefore, in order to create a branch-free version of this code you have to predicate everything, unless you can determine there are no side effects, in which case you just need a mechanism for picking the result. If you don't have predication in hardware you're probably better off outputting a branch. This is no different from the analysis a compiler would do on an if statement.
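To make the side-effect point concrete, here is a small C sketch. The counters and function names (`eval_b`, `eval_c`, `pick_lazy`, `pick_eager`) are invented for illustration; the only thing relied on is the standard C rule that `?:` evaluates just one of its last two operands.

```c
/* Count how many times each operand expression gets evaluated. */
int calls_b = 0, calls_c = 0;

int eval_b(void) { calls_b++; return 10; }
int eval_c(void) { calls_c++; return 20; }

/* Standard C ?: evaluates only the selected operand. */
int pick_lazy(int cond)
{
    return cond ? eval_b() : eval_c();
}

/* A select-style rewrite evaluates both operands first, then picks one.
 * The result is the same, but the side effects (the counters) differ,
 * which is why a compiler may only do this when it can prove the
 * operands have no observable side effects. */
int pick_eager(int cond)
{
    int b = eval_b();
    int c = eval_c();
    return cond ? b : c;
}
```

After one call to `pick_lazy(1)` only `calls_b` has increased, whereas `pick_eager(1)` bumps both counters while returning the same value.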

From what I understand, OpenCL, which calls it "ternary selection", behaves the same for scalars, so it'd be up to the compiler to generate branches. But for vectors, "a ? b : c" is equivalent to select(c, b, a), which means that both b and c will be evaluated and therefore it will be branch-less. The select function returns a vector where each element is taken from b or c depending on whether that element in a is true or false (determined by the MSB). ISAs that don't have full instruction predication will often at least have select instructions; for instance, SSE and ARM/NEON have them. But even if one isn't there you can do it with just a few instructions, depending on how the selector is stored.
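As an illustration of the "few instructions" case, here is a minimal C sketch of a select built from plain bitwise operations. It assumes the selector is stored as an all-ones or all-zeros mask, which is only one possible representation; `select_u32` and `mask_from_bool` are hypothetical names, not any library's API.

```c
#include <stdint.h>

/* Branch-free select: mask must be all ones (0xFFFFFFFF) to take b,
 * or all zeros to take c. Both b and c are already computed. */
uint32_t select_u32(uint32_t mask, uint32_t b, uint32_t c)
{
    return (b & mask) | (c & ~mask);
}

/* Builds an all-ones / all-zeros mask from a C truth value, relying on
 * unsigned wraparound: 0 - 1 == 0xFFFFFFFF. */
uint32_t mask_from_bool(int cond)
{
    return (uint32_t)0 - (uint32_t)(cond != 0);
}
```

If the selector is stored differently, for example as a sign bit only, an extra step is needed to widen it into a full mask before the AND/OR combination.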

Personally I never use the operator in my C code, I guess I just can't stand the appearance of it.. but nothing is really stopping me or anyone else from making a macro to hide it.

#17 OFFLINE   B-ZaR

Posted 28 September 2011 - 07:01 AM

Yes, ternary. Sorry for the mixup. I have seen it (or variants in other languages) called "shorthand if" though. Wikipedia lists "inline if" as a common name, but also refers to the term "shorthand if", though I do recognize the differences. Maybe the name refers to being a shorter version of the common "if(condition) x = 1; else x = 2;" (insert line breaks and curly braces for your reading pleasure) pattern. Dunno, I've heard the name several times though.

Anyway, thanks for clearing this up :)

#18 OFFLINE   Steven Craft

Steven Craft

    Advanced Member

  • Members
  • PipPipPip
  • 638 posts
  • Local time: 12:57 PM
  • LocationChester, Cheshire, UK

Posted 09 December 2011 - 09:39 PM


You can't use this to replace "elseif" or "switch" statements, however.


a ? (b ? (c ? d : e) : f) : g;


Syntax along these lines can actually be quite handy, for example when setting up a reference (you have to define the reference and point it at something on the same line, you can't define on one line and set up what it points to later) so it is not crazily uncommon to see stuff like:

const MyDataStructure & rData = ( i == 0 ) ? kZeroData : i > 100 ? kLargeData : kNormalData;

Which is basically (in pseudo):

if i == 0: rData = kZeroData
else if i > 100: rData = kLargeData
else rData = kNormalData

Anyway!

Steve
KAMI RETRO for Pandora: now available on the repository || Aliens vs Predator Pandora port: now available on the repository

#19 OFFLINE   doragasu

doragasu

    Advanced Member

  • Members
  • PipPipPip
  • 275 posts
  • Local time: 12:57 PM

Posted 09 March 2012 - 12:56 PM

On some architectures that have instructions for conditional selection or swapping of registers (for example some DSP hardware, and also the SPUs in the PS3), the ternary operator doesn't always need to be compiled as if/else. Just a conditional select and you have avoided a branch. Sometimes, calculating both paths and then selecting the result is even faster than branching and calculating only one.
Unfortunately, I know nothing about GPU shaders, so I don't know if they have conditional select/swap instructions.

#20 OFFLINE   Exophase

Posted 09 March 2012 - 05:35 PM

Since (scalar) ternary can have side effects, it can't automatically be turned into evaluate-both-operands + select. So it's just like if/then/else: there's no reason why a compiler could implement one using selects and not the other.

Shader languages literally have select instructions, so it'd be pretty lame if GPU hardware didn't support it directly.

