gfortran 11.2 throws IEEE overflow and invalid exceptions on apple silicon arm64 #3148
-
Hello, I recently bought an Apple Silicon Mac with an M1 Pro and I'm having some trouble with gfortran (Homebrew gcc 11.2.0_3 on Monterey 12.3.1). I am running some scientific numerical code that compiles without issue but raises IEEE_OVERFLOW_FLAG and IEEE_INVALID_FLAG exceptions at run time (with NaN or extremely large values in some variables). I do not see this when I compile it on my old Intel machine (same versions of the OS, gfortran/gcc, brew and the Xcode command line tools, all up to date). Curiously, about once in every 20 runs of the compiled binary the error does not occur, and in that case the code produces the same output as on the Intel machine. I'm not exactly an expert on either Fortran or compilers, but is this a known issue, and is there a workaround? The code is a fairly complex construct of about 4000 lines plus a database; if this is something that could be resolved by modifying it, then advice on debugging would be equally appreciated. Many thanks,
PS: I hope I chose the right forum for my question; if not, I would be grateful if you could point me in the right direction.
Replies: 2 comments 2 replies
-
cc @fxcoudert, since you know the most about this area and might know what next steps to suggest. I will say, though, that GCC 12 is coming soon, which is also when we will pick up the latest state of any arm64 patches, so it's possible this has already been fixed.
-
Hello again, I managed to trace the error to a matrix inversion subroutine that solves the problem A.x = b, where A and b are computed by the rest of the program. This routine ('linbcg') is called on each iteration of the loop (as part of another subroutine, 'calvel') with new values for A and b. The output x was not cleared between iterations, only overwritten: since the method uses preconditioning and the changes between iterations are small, the previous value of x was passed in as the initial guess. It seems that some kind of memory corruption occurred after the second iteration, which corrupted x and then threw the code off the rails (see attached screenshots). In this case the problem could be solved easily by clearing the value of x on each iteration, which is probably good programming practice anyway. However, I'm still surprised that this problem only occurred on the Apple chip and not on Intel. So is it a bug or a feature? :D Thanks again for your help,
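For readers who hit the same symptom, the workaround described above can be sketched roughly as follows. This is only an illustration, not the code from the post: `jacobi_solve` is a hypothetical stand-in for an iterative solver such as 'linbcg' that reads `x` as its initial guess, and `sanitize_guess` is an assumed helper name. The post's fix corresponds to an unconditional `x = 0.0_dp` before each call; the variant shown here keeps the warm start when the previous result is finite and resets it only when it contains NaN or Inf.

```fortran
module guess_reset_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical stand-in for an iterative solver such as 'linbcg':
  ! a plain Jacobi iteration that reads x as the initial guess and
  ! overwrites it with the result.
  subroutine jacobi_solve(a, b, x, tol, max_iter)
    real(dp), intent(in)    :: a(:,:), b(:)
    real(dp), intent(inout) :: x(:)
    real(dp), intent(in)    :: tol
    integer,  intent(in)    :: max_iter
    real(dp) :: x_new(size(x))
    integer  :: i, it

    do it = 1, max_iter
      do i = 1, size(x)
        ! Subtract the off-diagonal part of row i, then divide by a(i,i).
        x_new(i) = (b(i) - dot_product(a(i,:), x) + a(i,i)*x(i)) / a(i,i)
      end do
      if (maxval(abs(x_new - x)) < tol) then
        x = x_new
        return
      end if
      x = x_new
    end do
  end subroutine jacobi_solve

  ! Assumed helper: reset the guess when any element is NaN or Inf, so a
  ! bad value from one outer iteration cannot leak into the next solve.
  subroutine sanitize_guess(x)
    real(dp), intent(inout) :: x(:)
    if (.not. all(ieee_is_finite(x))) x = 0.0_dp
  end subroutine sanitize_guess

end module guess_reset_demo
```

In the outer loop one would call `sanitize_guess(x)` (or simply set `x = 0.0_dp`) immediately before each solver call. Either way, the solver then never starts from a value that a previous iteration left in a non-finite state.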