Skip to content
This repository has been archived by the owner on Jan 8, 2019. It is now read-only.

Active directory auth fails on first attempt, succeeds on second #1707

Open
jontaa opened this issue Dec 6, 2015 · 17 comments
Open

Active directory auth fails on first attempt, succeeds on second #1707

jontaa opened this issue Dec 6, 2015 · 17 comments

Comments

@jontaa
Copy link

jontaa commented Dec 6, 2015

We noticed that after enabling AD authentication on our Graylog 1.2.2 the first attempt was always unsuccessful, while the second was always successful.

So I did a packet dump during the logins to try and narrow it down.
This is what happens:

Packet dump started
User try to login 1st time
Login fail
Wait for 20 seconds
Packet dump stopped

New packet dump started
User try to login 2nd time
Login successfull
Wait 20 seconds
Packet dump stopped

When checking the packet dump I can see that after the 1st login, no attempt to communicate with LDAP servers were made.
When 2nd login attempt was done, I could see LDAP communication.
I can successful replicate this issue every time by logging out user and attempt to login again.

What debugs and logs could be useful here? Is this a known issue?

@kroepke
Copy link
Contributor

kroepke commented Dec 7, 2015

We have seen reports of this but were unable to reproduce the issue so far.
Does the failed attempt take 2 seconds to fail or does it fail immediately?

@jontaa
Copy link
Author

jontaa commented Dec 7, 2015

I tested now. I counted to about 3 seconds to fail.
Another observation is that it doesn't matter what you put as password the first time, as long as you put in the correct password the second time.

@kroepke
Copy link
Contributor

kroepke commented Dec 7, 2015

Ok, this sounds like a problem with the connection pool. I'll take a look if I see anything obvious.

@jontaa
Copy link
Author

jontaa commented Dec 7, 2015

Thanks. Let me know if you want me to run any tests on my system, since I can replicate the issue every time.

@kroepke
Copy link
Contributor

kroepke commented Dec 7, 2015

Can you please check what the output in the server log is when you set the log level of org.apache.directory.ldap.client.api.LdapNetworkConnection to DEBUG?

For how to do that please see: http://docs.graylog.org/en/1.2/pages/installation/manual_setup.html?highlight=external#supplying-external-logging-configuration or you could use the REST API browser.

@jontaa
Copy link
Author

jontaa commented Dec 8, 2015

Ok so I edited /opt/graylog/conf/log4j.xml and appended

 <logger name="rg.apache.directory.ldap.client.api.LdapNetworkConnection">
        <level value="debug"/>
    </logger>

Then I did a graylog-ctl restart
I then replicated the issue a few times.
graylog-ldap.txt

@kroepke
Copy link
Contributor

kroepke commented Dec 8, 2015

Thanks!

This looks like the connection simply times out the first time, I wonder what it does all that time.
Can you play with the LDAP connection timeout setting in graylog.conf:

ldap_connection_timeout = 5000 # default is 2000 milliseconds

Increasing it should make the problem go away, apparently the connection to the server takes quite some time. If that's successfull we'll increase the default timeout which should solve the issue for most people.

@jontaa
Copy link
Author

jontaa commented Dec 8, 2015

No difference. Now it takes 5 seconds for it to fail the first time.
Remember that I dumped the traffic on the first attempt, and it didnt even try to connect to any LDAP server until the second attempt.

@kroepke
Copy link
Contributor

kroepke commented Dec 8, 2015

when the first failure to connect happens, is there any line in the server log that say Couldn't connect to LDAP directory?

@kroepke
Copy link
Contributor

kroepke commented Dec 8, 2015

Do you have the same problem when testing the LDAP configuration?
With both "Test connection" and "Test login"? Those parts use the exact same code as the login itself.

@jontaa
Copy link
Author

jontaa commented Dec 8, 2015

So here is a text file which is only the log for a failed attempt
graylog-ldap.txt

Testing "test connection" and "test login" works fine when logged in.

@kroepke
Copy link
Contributor

kroepke commented Dec 8, 2015

I simply cannot reproduce this locally. For me the first request simply succeeds and I cannot even see any way how a second connection is any different from the first.

Is there anything special about your setup? High latency to AD, loadbalancer in front etc?

@jontaa
Copy link
Author

jontaa commented Dec 9, 2015

Yeah this is super weird
So lets see how the work flow is
I go to graylog and I get this login box
image
After a few seconds I get this box
image
Which after a few more seconds changes to this box
image

In the log we can see
2015-12-09_11:09:26.75305 ERROR [LdapConnector] Timed out connecting to LDAP server
2015-12-09_11:09:26.75656 ERROR [LdapUserAuthenticator] LDAP error
2015-12-09_11:09:26.78837 WARN [SessionsResource] Unable to log in user jobn
2015-12-09_11:09:26.81711 DEBUG [LdapNetworkConnection] ------>> Cannot get the connection... Retrying
2015-12-09_11:09:26.84720 DEBUG [LdapNetworkConnection] Interrupted while waiting for connection to establish with server XXXX.local:389

However if we check the packet capture we took on the graylog host
image
So we see traffic to and from the graylog server. Lets now filter on all traffic which has TCP port 389
image
Nothing

If I compare with a packet capture during second and successful login attempt take in the same way
image

Ok so we know that if no connection is made, could it be DNS?

So in our unsuccessfull attempt we see that graylog is resolving DNS in the exact same way as the successful.

@pandaadb
Copy link

pandaadb commented Sep 29, 2016

Hi,

I am not sure if this is relevant at all, but I just ran into a similar issue. I thought I would post my findings as this was really odd to me as well.

What happened was that I configured my LDAP config (userfilter and groupfilter) with something like:

ldap.groupFilter= cn=groups,cn=accounts,dc=test,dc=com

Other properties accordingly - not important.

What happened was that each first attempt returned unauthenticated, while each subsquent attempt does authenticate.

In the backgroup, my configuration file used apache PropertySubstitutor. This particular class uses a Property resolver that reads , signs as separators, hence the group filter went from being a string, to being an array of strings (each , separator is 1 entry). So the configuration just didn't work, which is why the initial authentication failed. (I do not know why every other authentication worked .. it was kind of self healing). I only realised this after doing the config manually in a little test program and realising that it did in fact work.

I thought I'd post this, as this could be why one of you can't reproduce it, while it behaves this strangely for others.

Of course I don't know if that's still relevant or if I'm even making sense in this case :) google lead me here, i thought maybe this could help, if not please ignore me :)

Regards,
Artur

@lliknart
Copy link

Hi,

I have the same issue. Users have to try 2 attempts to log in. Same result as @jontaa

@MichaelW-SD
Copy link

Same here. I just updated to 2.1.1 but unfortunately it still happens.

@Pushpendra-blr
Copy link

Hi Guys, This can be caused by any recent Microsoft update which might have tickled .net frame work settings, in our case we went through series of log collection nothing helped us, but cloning the PC and upgrading . Net frame work to version 4 helped us to fix the problem.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants