
Using JuMP causes Pardiso crash #88

Open
goulart-paul opened this issue Mar 4, 2022 · 10 comments

@goulart-paul

I am getting a Pardiso crash if I use JuMP at the same time, and it seems to depend on the order of package inclusion. An example is below. If I do using JuMP before using Pardiso, I get a crash; if I reverse the order, there is no crash. This is all from a fresh Julia session, i.e. I haven't used JuMP for anything, just loaded it.

using JuMP     #<---- putting JuMP here causes Pardiso to crash
using Pardiso
#using JuMP    #<---- putting JuMP here instead does not
using SparseArrays, LinearAlgebra   # needed for sparse() and I below

n = 4
A = sparse(I(n)*1.)
b = ones(n)

#solve with Pardiso
ps = MKLPardisoSolver()
set_msglvl!(ps, Pardiso.MESSAGE_LEVEL_ON)
Pardiso.solve(ps, A, b)

The Pardiso output in the case of a crash is something like this:

*** error PARDISO: reordering, symbolic factorization

=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON


Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.001230 s
Time spent in reordering of the initial matrix (reorder)         : 0.000004 s
Time spent in symbolic factorization (symbfct)                   : 0.001095 s
Time spent in allocation of internal data structures (malloc)    : 0.023739 s
Time spent in additional calculations                            : 6.144252 s
Total time spent                                                 : 6.170320 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
             number of equations:           4
             number of non-zeros in A:      4
             number of non-zeros in A (%): 25.000000

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    4
             size of largest supernode:               1
             number of non-zeros in L:                2817782047
             number of non-zeros in U:                1
             number of non-zeros in L+U:              2817782048
             gflop   for the numerical factorization: 0.000000

ERROR: LoadError: Reordering problem.
Stacktrace:
 [1] check_error(ps::MKLPardisoSolver, err::Int32)
   @ Pardiso ~/.julia/packages/Pardiso/3uj3F/src/mkl_pardiso.jl:80
 [2] ccall_pardiso(ps::MKLPardisoSolver, N::Int64, nzval::Vector{Float64}, colptr::Vector{Int64}, rowval::Vector{Int64}, NRHS::Int64, B::Vector{Float64}, X::Vector{Float64})
   @ Pardiso ~/.julia/packages/Pardiso/3uj3F/src/mkl_pardiso.jl:73
 [3] pardiso(ps::MKLPardisoSolver, X::Vector{Float64}, A::SparseMatrixCSC{Float64, Int64}, B::Vector{Float64})
   @ Pardiso ~/.julia/packages/Pardiso/3uj3F/src/Pardiso.jl:346
 [4] solve!(ps::MKLPardisoSolver, X::Vector{Float64}, A::SparseMatrixCSC{Float64, Int64}, B::Vector{Float64}, T::Symbol)
   @ Pardiso ~/.julia/packages/Pardiso/3uj3F/src/Pardiso.jl:260
 [5] solve
   @ ~/.julia/packages/Pardiso/3uj3F/src/Pardiso.jl:222 [inlined]
 [6] solve(ps::MKLPardisoSolver, A::SparseMatrixCSC{Float64, Int64}, B::Vector{Float64})
   @ Pardiso ~/.julia/packages/Pardiso/3uj3F/src/Pardiso.jl:221

Note the huge number of non-zeros in L. In this example it reports "Reordering problem", but I also see "out of memory" errors on occasion when doing something similar. It seems like including JuMP first somehow causes a memory leak in Pardiso, but I don't understand how that is possible.

@KristofferC
Member

Could it be related to #73?

@goulart-paul
Author

If I replace JuMP with MKLSparse (the cause of the issue in #73), then I see similar behaviour. Including MKLSparse first produces a crash, but the other way around does not. The error in the MKLSparse case is "Input inconsistent", though, and the L and U factors don't appear to be corrupt, at least in the sense that they are a reasonable size.

It seems like the root cause may well be the same though.

@KristofferC
Member

It would be good to try to minimize the problem a bit. For example, is it necessary to load the full JuMP package, or is it enough to load one of its dependencies for the crash to occur?
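
Something along these lines might be a starting point (a sketch only; MathOptInterface and MutableArithmetics are named here just as examples of JuMP's direct dependencies, and the relevant one may be different):

# Sketch: check whether loading a single JuMP dependency before Pardiso
# is enough to reproduce the crash. Swap in each direct dependency in turn.
using MathOptInterface   # candidate dependency under test (e.g. also try MutableArithmetics)
using Pardiso, SparseArrays, LinearAlgebra

n = 4
A = sparse(I(n) * 1.)
b = ones(n)
ps = MKLPardisoSolver()
Pardiso.solve(ps, A, b)   # does this still error?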

@goulart-paul
Author

I will look into it and see what I can find. It's pretty clear that there is a type error of some kind when making a ccall to the pardiso library, and I can see that the inputs to that call are different in the two cases. I will try to unwind the problem from there and make a PR if I can fix it.

@goulart-paul
Author

As suggested in #73, the issue appears to be here. Specifically, the call to :MKL_Set_Interface_Layer coming from within MKLSparse.__init__() seems to be the problem.

This causes the same failure:

using MKL_jll
ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), 1)
using Pardiso
using SparseArrays, LinearAlgebra   # for sparse and I

A = sparse(I(1)*1.)
b = ones(1)
ps = MKLPardisoSolver()
Pardiso.solve(ps, A, b)

It's not clear to me how to fix it though. It seems like it should be possible to just set the interface layer back to 0 (presumably the 32-bit interface), but that doesn't work. In other words, this still fails:

using MKL_jll
ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), 1)
ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), 0) #<-- no effect?
using Pardiso
using SparseArrays, LinearAlgebra   # for sparse and I

A = sparse(I(1)*1.)
b = ones(1)
ps = MKLPardisoSolver()
Pardiso.solve(ps, A, b)

It seems as if the setting from the first call to :MKL_Set_Interface_Layer is somehow "sticky", and subsequent calls don't change the layer. If the first call sets the layer to 0, then Pardiso works.

Even if the value were resettable, there is no obvious way of just fetching the current interface layer setting as far as I can tell. Otherwise, a possible solution might just be to store the value, make the ccalls within Pardiso, and then change it back immediately afterwards each time. The MKL documentation here sort of implies that you can just give a bad value and get the current setting as output, i.e. maybe this would work:

mklstate = ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), -1)
<pardiso calls>
ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), mklstate)

That's as far as I could get with this.
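
For reference, a complete version of that save/restore idea might look like the sketch below. It assumes the documented behaviour that an out-of-range argument returns the current setting, and given the stickiness described above the restore step may well have no effect:

using MKL_jll, Pardiso, SparseArrays, LinearAlgebra

# Sketch: save the current MKL interface layer, force layer 0 for the
# Pardiso call, then attempt to restore the previous value afterwards.
function with_lp64_interface(f)
    old = ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), Cint(-1))  # query via invalid value
    ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), Cint(0))         # 32-bit integer interface
    try
        return f()
    finally
        ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), old)         # restore previous setting
    end
end

A = sparse(I(1) * 1.)
b = ones(1)
ps = MKLPardisoSolver()
with_lp64_interface(() -> Pardiso.solve(ps, A, b))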

@KristofferC
Member

I think you need to set the interface layer before calling any other MKL functions.

@goulart-paul
Author

It seems like part of the issue is that Pardiso defaults to :pardiso instead of :pardiso_64 when the module is initializing, specifically this line. On my system I end up with the 32-bit version because LinearAlgebra.BLAS.vendor() === :openblas64, rather than === :mkl.

A fix that seems to work on my system is changing that line to this:

if (LinearAlgebra.BLAS.vendor() in (:mkl, :openblas64) && LinearAlgebra.BlasInt == Int64)
    const MklInt = Int64
    const PARDISO_FUNC = :pardiso_64
else
    const MklInt = Int32
    const PARDISO_FUNC = :pardiso
end

I can make a PR with that change if it makes sense, but I suspect that I may be fixing the problem by accident rather than addressing the root cause.
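
As a quick way to check what that condition evaluates to on a given system (a sketch; BLAS.vendor() is the call available on the Julia versions discussed here):

using LinearAlgebra

# Inspect the values the proposed condition depends on.
@show LinearAlgebra.BLAS.vendor()   # e.g. :openblas64 or :mkl
@show LinearAlgebra.BlasInt         # Int64 on a typical 64-bit build
@show (LinearAlgebra.BLAS.vendor() in (:mkl, :openblas64) && LinearAlgebra.BlasInt == Int64)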

@mipals
Member

mipals commented Feb 27, 2024

Is this still an issue? The reordering problem in #90 seems to be fixed, so maybe it is also fixed here?

@goulart-paul
Author

Is this still an issue? The reordering problem in #90 seems to be fixed, so maybe it is also fixed here?

I'm afraid I can't say either way since MKL is not supported on Mac anymore, so there is no way for me to test. If the examples above work for you now, then I'm happy to close it.

@mipals
Member

mipals commented Mar 6, 2024

I am unfortunately in the same situation (i.e. no MKL on Mac), so I cannot try it out using MKL. I've tried using Panua-Pardiso, for which the example seems to run fine.
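
For completeness, the reproduction with the non-MKL solver would look roughly like this (a sketch; it assumes a licensed project/Panua Pardiso library is installed and configured for Pardiso.jl):

using Pardiso, SparseArrays, LinearAlgebra

n = 4
A = sparse(I(n) * 1.)
b = ones(n)
ps = PardisoSolver()   # project/Panua Pardiso rather than MKLPardisoSolver
Pardiso.solve(ps, A, b)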
