Always On VPN – User Tunnel not being established (occasionally)

We have an Always On VPN (AO VPN) solution where some users occasionally fail to establish the User Tunnel. The problem is intermittent: it does not occur on every connection attempt.

The protocol type in the profile settings is Automatic, which means that VpnStrategy tries SSTP, IKEv2, PPTP and then L2TP. The Device Tunnel is established just fine on IKEv2, but the User Tunnel fails with error code 800 after trying all protocols. (On the VPN server, we only permit connections on SSTP and IKEv2.)

Repeated attempts fail the same way, while the Device Tunnel for the same user stays connected and several other users have active User Tunnels. If the protocol type is changed to IKEv2 in the profile settings, the error does not occur, but we need SSTP for the User Tunnel, and for that the protocol type must be set to Automatic.

In the Application log on the client, EventID 20227 is logged with “The user XYZ dialed a connection named ABC which has failed. The error code returned on failure is 800.”
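
If it helps to see how often this happens and for which connection, here is a minimal Python sketch that pulls the user, connection name and error code out of exported event messages. It assumes the message text has exactly the wording quoted above; how the messages are exported from the Application log is left open, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Message wording as quoted in the post for EventID 20227 (assumed to be stable).
PATTERN = re.compile(
    r"The user (?P<user>.+?) dialed a connection named (?P<conn>.+?) "
    r"which has failed\. The error code returned on failure is (?P<code>\d+)\."
)

def parse_failure(message):
    """Return (user, connection, error_code), or None if the text does not match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    return m.group("user"), m.group("conn"), int(m.group("code"))

def count_failures(messages):
    """Count failures per (connection, error_code) pair."""
    counts = Counter()
    for msg in messages:
        parsed = parse_failure(msg)
        if parsed is not None:
            _, conn, code = parsed
            counts[(conn, code)] += 1
    return counts
```

Feeding it the messages from a few affected clients would show whether error 800 is the only code involved and whether it is tied to one connection name.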

The Microsoft documentation on error codes did not help:
https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/troubleshoot-always-on-vpn#error-codes

Has anyone else experienced this issue?

As a workaround, set the NativeProtocolType to IKEv2.

Do you have enough ports for all the connections in RRAS?

I saw something yesterday about Windows Server removing those types of VPN connections via an update, I think. Probably unrelated, but I'll mention it anyway.

Edit: the article was "Microsoft deprecates PPTP and L2TP VPN protocols in Windows Server".

A few thoughts:

  • SSTP runs over HTTPS on TCP port 443. Have you checked what kind of web connectivity the affected users have? e.g. are they behind web proxies, firewalls, etc.?
  • You say this happens occasionally - is there a pattern you can think of? e.g. it always happens with people staying at a specific hotel chain, a specific coffee vendor, homes with a specific ISP, a specific time of day? This could be something as dumb as a hotel chain transparently routing its guest wifi through a web proxy to block bad sites, and screwing up your connection in the process.
  • What is the IP connectivity situation of the remote site? Depending on the ISP, they might be getting an IPv6-address and PLAT instead of the IPv4 you probably expect. Even if they have an IPv4, Windows will prefer IPv6 unless you specifically changed that.
  • Do the device and the user tunnel aim at the same target? Is the target static? e.g. does the entire setup always connect to vpn.robybaggio-corp.com which only resolves to one IP address, or is it possible the device tunnel and the user tunnel might attempt to connect to two different servers?
  • Since SSTP is using SSL/TLS under the hood: Have you made sure the server certificate is valid and can be verified in all cases? This could be something as silly as the client not being able to verify the server certificate because it’s been too long since it saw the CA CRL and the CRL is in LDAP and can’t be accessed without the tunnel.
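
Some of the checks above can be scripted and run from an affected client while the User Tunnel is failing. The following is a rough Python sketch, not a diagnosis tool: it compares what two names resolve to and attempts a verified TLS handshake on port 443. The function names are made up for illustration. Note that Python's default TLS context verifies the certificate chain and host name but does not check CRLs, so it cannot reproduce a CRL problem; it only tells you whether a plain TLS connection to the server gets through.

```python
import socket
import ssl

def resolve_all(hostname, port=443):
    """Return the set of IP addresses (IPv4 and IPv6) the name resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}

def same_targets(addrs_a, addrs_b):
    """True when both address collections contain exactly the same addresses."""
    return set(addrs_a) == set(addrs_b)

def tls_probe(hostname, port=443, timeout=5.0):
    """Attempt a verified TLS handshake; return (ok, detail)."""
    context = ssl.create_default_context()  # verifies chain and host name, no CRL check
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        return False, f"certificate verification failed: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        # Covers DNS failures, timeouts, refused connections and TLS errors.
        return False, f"{type(exc).__name__}: {exc}"
```

Usage would be something like calling resolve_all() on the Device Tunnel name and on the User Tunnel name, passing both results to same_targets(), and then calling tls_probe() on the User Tunnel name. If the probe succeeds while the User Tunnel still fails with 800, the network path and proxies become less likely suspects.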

Basically: I believe your best bet is narrowing down the circumstances. Start keeping track of the circumstances these hiccups happen under, and the more you know about when and where it happens, the easier it will be to determine the root cause.
Even if the result is “it could happen to anyone, at any time”, you’ve narrowed it down to the common settings of your Windows client environment. That’s still a massive reduction of possible sources of error.
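
To make "keeping track" concrete, here is a small Python sketch that computes the failure rate per value of one recorded circumstance. The record fields (network, after_reboot, connected) and the sample entries are purely hypothetical; use whatever circumstances you actually write down.

```python
from collections import defaultdict

def failure_rates(records, field):
    """Return {value: (failures, attempts, rate)} for one field of the records."""
    totals = defaultdict(lambda: [0, 0])  # value -> [failures, attempts]
    for rec in records:
        bucket = totals[rec[field]]
        bucket[1] += 1
        if not rec["connected"]:
            bucket[0] += 1
    return {
        value: (fails, attempts, fails / attempts)
        for value, (fails, attempts) in totals.items()
    }

# Hypothetical hand-kept log of User Tunnel attempts (illustration only).
records = [
    {"network": "home", "after_reboot": True, "connected": True},
    {"network": "home", "after_reboot": False, "connected": False},
    {"network": "hotel", "after_reboot": False, "connected": False},
    {"network": "hotel", "after_reboot": True, "connected": True},
]

for field in ("network", "after_reboot"):
    print(field, failure_rates(records, field))
```

In this made-up data the failure rate is the same on every network but differs sharply by after_reboot, which is the kind of signal that would point away from the network and towards client state.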

I have already tried setting NativeProtocolType to IKEv2, and then the issue does not occur. But we must use SSTP, so changing it to anything other than Automatic is not an option.

The Device Tunnel is not scoped to a 0.0.0.0/0 route.
The User Tunnel is established just fine 95% of the time; it’s an intermittent issue.

Yes, RRAS can accommodate 1024 connections, and only 50 are active.

  • It’s not an issue with the network itself: the issue is intermittent and occurs regardless of which network the user is working from. It also suddenly fixes itself, and a restart helps in particular.
  • I am not able to see any pattern. A user might experience the issue, and then suddenly it works from the same network.
  • The Device and User Tunnels aim at the same target, with different FQDNs but the same IP address. With Wireshark it is possible to see the connection attempts on the VPN server, so network connectivity between client and server is definitely there.
  • The CRL is accessible on the internet. The server certificate is fine, since 95% of connections work just fine.

Richard Hicks has written about a similar issue, but in that case IKEv2 is used, not SSTP (as in our case):

https://directaccess.richardhicks.com/2019/01/07/always-on-vpn-ikev2-connection-failure-error-code-800/

If a restart helps…what’s the uptime of the machines this happens to as reported by systeminfo?

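Thanks to Microsoft’s shittiness, a shutdown is no longer a shutdown: with Fast Startup enabled, shutting down hibernates the kernel session instead of ending it, so only a restart gives you a fresh boot.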
Maybe the issue correlates with de facto session length?
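
If you collect systeminfo output from affected machines, a Python sketch like the one below can turn it into an uptime figure to compare against the failures. It assumes English (en-US) systeminfo output with a "System Boot Time:" line in the form "1/15/2024, 8:03:12 AM"; other locales label and format this line differently, so the date format is a parameter. As far as I know that boot time is not reset by a Fast Startup shutdown, but verify that on one of your own clients.

```python
import re
from datetime import datetime

# Label as printed by en-US systeminfo (assumption; localized systems differ).
BOOT_LINE = re.compile(r"^System Boot Time:\s*(.+)$", re.MULTILINE)
US_FORMAT = "%m/%d/%Y, %I:%M:%S %p"

def boot_time(systeminfo_text, fmt=US_FORMAT):
    """Extract the boot time from systeminfo output, or None if the line is missing."""
    m = BOOT_LINE.search(systeminfo_text)
    if m is None:
        return None
    return datetime.strptime(m.group(1).strip(), fmt)

def uptime_days(systeminfo_text, now, fmt=US_FORMAT):
    """Days between the reported boot time and 'now', or None if unknown."""
    booted = boot_time(systeminfo_text, fmt)
    if booted is None:
        return None
    return (now - booted).total_seconds() / 86400
```

Recording this number next to each failed and each successful connection attempt would show whether failures cluster on machines that have not had a real restart for a long time.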

And have you tried doing manually all the things a reboot would do? Restarting VPN services, restarting network services, disabling and re-enabling network adapters…?