Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nv2a: Adjust NaN handling to be similar to HW #913

Closed
wants to merge 1 commit into from

Conversation

abaire
Copy link
Contributor

@abaire abaire commented May 11, 2022

Fixes #365 by forcing treatment of NaN values generated in the vertex shader to follow the same pattern as HW.

Unfortunately this does not address the fact that on HW, +/- NaN * 0 (or very close to 0) should be 0. This difference in behavior is also responsible for #1008.

This also does not address the same issue for +/- inf and the shader version currently used by xemu explicitly notes that operations on inf/NaN's are undefined (which may explain why Otogi textures are not blacked out on M1 mac).

Tests
HW results

@abaire
Copy link
Contributor Author

abaire commented May 11, 2022

Hey @NZJenkins I wasn't sure if you were already looking at a fix, but I started to suspect this might be partly to blame for #801 so I threw together a partial fix to unblock my debugging there.

This is mostly a proof of concept at the moment as it doesn't fully match the HW behavior and because some investigation will need to be done on the performance impact (and possibly some optimization)

" return src;\n"
" }\n"
" ivec4 signs = floatBitsToInt(src);\n"
" vec4 dest;\n"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff can be vectorized, assuming the logic will stay the same:

" return mix(src, mix(vec4(1.0), vec4(0.0), lessThan(signs, ivec4(0))), nans);\n"

@ko81e24wy
Copy link

ko81e24wy commented May 11, 2022

glitch in NGB this PR
may relate to issue #783
2022-05-11 11-48-33 mp4_20220511_115252406
same glitch in DOA2U
xemu_2022_05_11_12_52_47_379

@abaire
Copy link
Contributor Author

abaire commented May 11, 2022

@ko81e24wy where is the nearest save point so I can reproduce that?

@ko81e24wy
Copy link

@ko81e24wy where is the nearest save point so I can reproduce that?

I think you just need to start a new game

@NZJenkins
Copy link

NZJenkins commented May 11, 2022

Hey @NZJenkins I wasn't sure if you were already looking at a fix, but I started to suspect this might be partly to blame for #801 so I threw together a partial fix to unblock my debugging there.

No not looking at a fix.
Did something similar for cxbxr where all outputs are de-NaN-ified, but probably need more tests to determine the correct behaviour.

For texcoord and fog outputs, additional processing happens on the VS output values before they are used in the PS, which may interpret NaN, INF, etc. differently to the mapping that happens when they are passed directly from the VS to the PS.
Not sure how Xemu shaders are set up but that might need to be accounted for here.

Unfortunately this does not address the fact that on HW, +/- NaN * 0 (or very close to 0) should be 0.

The Min test is using "subnormal" float values- it might be interesting to try it with the minimum "normal" values instead, and see if they behave differently to 0.
Apparently it's more complex to support subnormal numbers correctly in hardware, so for a GPU it's likely not implemented
(added test cases - subnormals appear to be treated as 0 abaire/nxdk_pgraph_tests#65)

@abaire
Copy link
Contributor Author

abaire commented May 16, 2022

@ko81e24wy where is the nearest save point so I can reproduce that?

I think you just need to start a new game

There was a stupid mistake in my code that is now fixed, can you test again when you have a chance?

@ko81e24wy
Copy link

I am just going to work and test with a combine version with this PR merged
that white glitch was gone,and I dont know if this PR relate to #783
xemu-2022-05-17-07-42-08

@abaire
Copy link
Contributor Author

abaire commented May 17, 2022

@ko81e24wy cool, thanks for double checking!

I have not tried to debug 783 yet, so if this PR happens to fix the missing lighting that'd be a nice bonus. I'm not exactly sure what that screenshot should look like compared to HW though; are you saying it looks more correct with this PR than before?

@ko81e24wy
Copy link

@ko81e24wy cool, thanks for double checking!

I have not tried to debug 783 yet, so if this PR happens to fix the missing lighting that'd be a nice bonus. I'm not exactly sure what that screenshot should look like compared to HW though; are you saying it looks more correct with this PR than before?

that white issue remind me of that light effect,the last PR just as before with that effect missing.

@mysteria25
Copy link

Was hoping this would fix the black graphics of Inside Pitch 2003 but guess that is a separate issue from this pr since that game still looks like this:

inside pitch 2003

@abaire
Copy link
Contributor Author

abaire commented May 17, 2022

Was hoping this would fix the black graphics of Inside Pitch 2003 but guess that is a separate issue from this pr since that game still looks like this:

@mysteria25 as far as I can see from the compatibility page there is no bug tracking this problem, please file one and it'll get looked into eventually (PRs are definitely not the place to report untracked issues). Grabbed by the Ghoulies has a black texture problem related to texture compression; there are many ways for things to fail so it's entirely possible what you're seeing with that game is something novel that will need a specific fix.

@abaire abaire marked this pull request as ready for review May 23, 2022 03:35
@abaire
Copy link
Contributor Author

abaire commented May 23, 2022

This doesn't entirely match HW behavior on my GTX 1070 but it is much closer and fixes #365 despite the remaining departure around infinite values and zeros.

" return src;\n"
" }\n"
" ivec4 signs = floatBitsToInt(src);\n"
" return mix(src, mix(vec4(1.0), vec4(0.0), lessThan(signs, ivec4(0))), nans);\n"
Copy link
Member

@mborgerson mborgerson May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The sign bit vectorization is neat, nice :)
  • So "negative NaN" (msb set) gets 0 and "positive NaN" gets 1.0?
  • Seems odd because you mentioned 0*NaN should == 0? So under what conditions do you expect it to be 1?(nvm, I see your comment about it) I don't think this it makes sense to generalize this based on sign bit alone here. More about this thought below.
  • It looks like mix(genBType) was added in GL4.5 so this might fail in a 4.0 context. Does it work on your M1 Mac? You could cast the boolean to a float

As far as I can see in the general case, I think this NaN handling is not more correct than just using 0.0 in the event we do get a NaN (which would force all platforms to behave the same at least, but might be unnecessary if we don't have issues cataloged for it). Is there another case in vsh output that this solves for, beyond fog?

To fix this fog issue for Otogi (and now at least 1 other game (Triggerman) that is affected in the same way that I've seen), I do think it's a fair compromise to have specialized fog NaN handling (like we did with Inf previously); and if this is only intended to fix the fog issue, I think we should just stick with the specialized handling for that case.

The fishing game is a different, interesting case that appears to be setting light colors to NaN, so for this second specialized case I think it makes sense to interpret NaN as 1.0 before performing lighting calculations, so lighting is calculated correctly. Perhaps it makes sense to generalize NaNs in the uniform spaces to 1.0 for such cases but I'd have to think more about it. Needs more investigation to understand how hardware is handling that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial response:

As far as I can see in the general case, I think this NaN handling is not more correct than just using 0.0 in the event we do get a NaN (which would force all platforms to behave the same at least, but might be unnecessary if we don't have issues cataloged for it). Is there another case in vsh output that this solves for, beyond fog?

The tester itself is using the diffuse channel and the xemu output does not match HW, if that's what you're asking. I don't know of any game where this makes a difference offhand, but that doesn't mean there isn't one with graphics glitching attributable to this handling.

An example difference with all three cases in one place:

master:
-8_1

This branch:
-8_1

HW:
-8_1

To fix this fog issue for Otogi (and now at least 1 other game (Triggerman) that is affected in the same way that I've seen), I do think it's a fair compromise to have specialized fog NaN handling (like we did with Inf previously); and if this is only intended to fix the fog issue, I think we should just stick with the specialized handling for that case.

Otogi's fog is the only case I know of (I assume someone has already looked into Triggerman and determined that just applying the fix to the fog fixes that game? I don't own that one to test). I guess the issue is balancing the performance impact of doing this check against everything in an attempt to be closer to HW behavior versus the time a contributor might end up spending tracing some glitch back to a FIXME in the shader code.

The fishing game is a different, interesting case that appears to be setting light colors to NaN, so for this second specialized case I think it makes sense to interpret NaN as 1.0 before performing lighting calculations, so lighting is calculated correctly. Perhaps it makes sense to generalize NaNs in the uniform spaces to 1.0 for such cases but I'd have to think more about it. Needs more investigation to understand how hardware is handling that case.

Hopefully @Triticum0 wouldn't mind splitting Lakemasters into a new issue given that it sounds like it's a similar symptom due to a different bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that the shader works on macOS 12.4 (M1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abaire Lakemaster is fixed with this PR
xemu-2022-06-02-20-51-16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1008 is due to a similar cause, INF * 0 should == 0 but is instead NaN (at least on my machine and presumably other nvidia systems since the original report was on a 3060).

In this case the incorrect number causes a chain of errors and catching it at the very end where the outputs are assigned would also lead to incorrect behavior; in the test cases I examined for #1008 the output diffuse should be set to values between 0 and 1 so blindly detecting the NaN and setting to either would do the wrong thing.

@Fabxx
Copy link
Contributor

Fabxx commented Jun 2, 2022

image
current handling causes this bug where the character is highlighted in proximity of lights

@abaire abaire marked this pull request as draft June 3, 2022 02:26
@abaire
Copy link
Contributor Author

abaire commented Jun 3, 2022

current handling causes this bug where the character is highlighted in proximity of lights

Looks like limiting the NaN fix to fog will fix that regression and align with Matt's suggestion, will update and re-push in a bit.

@Fabxx
Copy link
Contributor

Fabxx commented Jun 9, 2022

The highlight problem seems to be fixed with #1045

@ko81e24wy
Copy link

I test the lastest build of this PR,and it crash when I run xemu.
xemu.log

@abaire
Copy link
Contributor Author

abaire commented Jun 29, 2022

I test the lastest build of this PR,and it crash when I run xemu. xemu.log

Bad merge, should be fixed now. Thanks for letting me know!

@Triticum0
Copy link

@abaire could you rebase this pr with the work you did on Match inv_w qualifier to attribute qualifier, Thanks

@Kohryujin
Copy link

Can we get another pull request or rebase or however that works so there is a working download available for this branch for testing? The one from July is expired and no longer available.

@TheHollows
Copy link

Any update on when this will be merged?

@mborgerson
Copy link
Member

@TheHollows

Any update on when this will be merged?

When it's ready

Comment on lines +865 to +871
" bvec4 nans = isnan(src);\n"
" if (!any(nans)) {\n"
" return src;\n"
" }\n"
" ivec4 signs = floatBitsToInt(src);\n"
" vec4 negative = vec4(lessThan(signs, ivec4(0)));"
" return mix(src, mix(vec4(1.0), vec4(0.0), negative), vec4(nans));\n"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" bvec4 nans = isnan(src);\n"
" if (!any(nans)) {\n"
" return src;\n"
" }\n"
" ivec4 signs = floatBitsToInt(src);\n"
" vec4 negative = vec4(lessThan(signs, ivec4(0)));"
" return mix(src, mix(vec4(1.0), vec4(0.0), negative), vec4(nans));\n"
" if (isnan(src.x)) {\n"
" src.x = step(0.0, sign(src.x));\n"
" }\n"
" if (isnan(src.y)) {\n"
" src.y = step(0.0, sign(src.y));\n"
" }\n"
" if (isnan(src.z)) {\n"
" src.z = step(0.0, sign(src.z));\n"
" }\n"
" if (isnan(src.w)) {\n"
" src.w = step(0.0, sign(src.w));\n"
" }\n"
" return src;\n"

Hi @abaire, may a suggest to change the code to what I wrote? I tried your current solution but it didn't fix anything on my PC. It behaves exactly as master. From my tests, my change behaves almost exactly as real hardware. Almost because there's a slight difference in the -NaNQ_NaNQ and -NaNS_NaNS tests cases in the N/A column, but I'm pretty sure that it's safe to ignore.

I tested a lot of games I couldn't find any regressions in any of them. Games that are fixed by this change:

  • Otogi
  • Otogi 2
  • Pro Cast: Sports Fishing Game
  • Trigger Man

HW - PC (AMD RX 6600M )

Attrib_float_0_1
Attrib_float_0_8
Attrib_float_-1_1
Attrib_float_-8_1
Attrib_float_-INF_INF
Attrib_float_-Max_Max
Attrib_float_-MaxSN_MaxSN
Attrib_float_-Min_Min
Attrib_float_-MinN_MinN
Attrib_float_-NaNq_NaNq
Attrib_float_-NaNs_NaNs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're very welcome to take this one over, I've got some other projects going at the moment and don't have much time for xemu. It'd be good to test on an nvidia card as well, I'm not sure if glsl guarantees NaN behavior so it may be manufacturer or device specific.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a gtx 3060 and can test nvidia, just need the build and the test program...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's too bad to heard. I will need to open a new PR then but I will link it to this one. Once is merged we can close both. Thanks for your research and I hope that you are back to xemu in the future.

@medievil1 you can download the build once I open the new PR.

@mborgerson
Copy link
Member

Thank you for the PR! Since this work is continuing in #1780 with original author sign off, I'll close this PR now

@mborgerson mborgerson closed this Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blackened Texture Issues and Documentation