Users unable to log in to terminal server with Webroot installed.




198 replies

Just a quick update - not sure this actually helps anyone but thought I'd share. I've been wondering if this was something that Webroot was doing in real time (perhaps a heuristic trigger?) or something that was just breaking until the system was rebooted. I finally got to test that theory today when this happened to a server that was logged in on the console of the Hyper-V machine. So this time I could actually poke around the server for a bit before the customer had me reboot it.
 
Long story short, I disabled protection and it made no difference. I also tried uninstalling Webroot from the WR console and then forcing an update from the tray icon, which 'usually' uninstalls it pretty quickly from most machines; however, it didn't in this case. I rebooted the server and WR was still on it. It did uninstall some time later when I wasn't looking.
 
Anyway, I did load up Procmon for a few minutes before the reboot, but WRSA.exe touched something like 14 million objects in that time, so I had a giant haystack and didn't even know what kind of needle to look for.
Userlevel 6
Badge +26
All - quick update. We're getting our various test servers set up, and during several previous tests we've seen a pattern related to networking, especially the virtualized aspects, but we haven't isolated it to specific hardware.
 
A pattern has started to emerge. (Nothing concrete, so please do not take this as a fix in any way, just something to start looking into.) We're seeing NIC-related setup issues that are throwing errors. Again, nothing definitive, but we're seeing consistent issues with Broadcom cards vs. Intel cards (Broadcom bad, Intel good) and how they're configured.
 
If you're willing to post your bare-metal setup here or send it to me privately, please do. I'm gathering a lot of information from various partners as well, so we're looking at all the variables, not just the OS and file level, but a bit further down the stack.
 
FYI, policy settings are not really going to affect anything; we've tested every combination, and there's a kernel-level issue that's causing problems at the code level. Rather than uninstalling, I would recommend just disabling WSAB, either by sending down the Unmanaged policy and "shutting down protection" from the system tray icon menu, or by changing your current policy to allow manual shutdown.
 
Full removal kills all historical logs/data. Shutting it down just stops it from running. Same effect, but not as destructive.
I've got dual Intel I350 NICs in each of my Hyper-V machines. These are teamed by Windows, and the Hyper-V virtual switch is bound to this team.
All,

We're seeing the same problems... We've got RDS farms set up with 2012 R2, under both KVM-based and Xen hypervisors, and both are equally affected. It seems to occur less frequently on the smaller setups (fewer users), but that's somewhat subjective. It certainly happens more frequently on the servers with 30-50 users.

Once it occurs, the only resolution is to reboot the server.

All running the latest 9.0.13.58 WR.
All fully patched up, including optional MS KB patches, including the ones specified in this thread.

I've been forced to uninstall WR entirely on several installations/customers. We just can't handle the backlash this is causing, or the loss of productivity. Having to reboot servers daily (or more often in some instances) to resolve an issue like this, compared with how things ran months ago, is just unacceptable.

As another poster commented...  I like everything about Webroot, except its tendency to occasionally take my customer base entirely offline.

I'm sure you're working on it, and *we* appreciate that, but seriously, this is a MAJOR issue.

Thanks.

W.
Our affected customers are largely on IBM hosts, all running virtualized terminal servers on Hyper-V: x3650 Mx (M3, M4, M5) and x3400 M3. It has also hit a ProLiant DL380 G7 Hyper-V host; sorry, I can't tell you the NIC in that one, as it's a retired asset and I don't have that level of info.
 
Affected network cards can be any of the following (the terminal servers are mostly Hyper-V VMs running on these IBM hosts):
 
Broadcom NetXtreme & NetXtreme II
Broadcom BCM5709C NetXtreme II GigE
Broadcom BCM5716C NetXtreme II GigE
Intel I350
Userlevel 1
Can confirm: we're experiencing potentially the same issue with a Lenovo ThinkServer RD430 with an Intel I350 NIC as the Hyper-V host.

Webconsole reports the agent version as 9.0.13.58, but the local "Webroot SecureAnywhere" reports 8.7.28. Not sure which to believe.
 
New sessions are not created. They stall at "Securing remote connection" until they time out (multiple minutes). Restarting fixes it for anywhere from a few hours to a day.
 
Userlevel 2
I would trust the version the local application shows when you open it. Don't trust what's shown in the Add/Remove Programs list, though, as it shows the version that was originally installed.

I was also going to post an update: we have now closed all our internal tickets related to Webroot issues on terminal servers, as they have gone a week without issue since 9.0.13.58 (all on ESXi). Hopefully that gives the Hyper-V users some hope that a resolution can be found for them as well.
Userlevel 1
I had shut down the agent, but now that I've checked the agent's GUI, 9.0.13.58 appears to be correct.
Badge +1
Found this post on Friday after all our TS ran into this problem. I can't believe how long this has been going on. Support says rolling back to 8.0.8.53 and disabling automatic updates is a workaround.
 
 
Still an ongoing problem for us.
 
I have confirmed the following:
 
We are running Intel I350 NICs
Driver: 12.0.150.0
 
There seems to be a pattern with these NICs, if it is NIC-related.
 
Userlevel 1
Happened again this morning to another customer (on a different Server 2008 R2 machine where I had yet to remove Webroot).
 
Hyper-V VM, running on Dell hardware with both Broadcom (built-in) and Intel (add-on) NIC cards.
 
Really looking forward to a permanent fix here guys.
Userlevel 1
Having disabled WR for the past ~24 hours, the issue has not recurred. I'm going to wait another 24 hours, and then push a previous version out and see if we're still stable.
Have you rolled back yet? If so, have you experienced any of the aforementioned issues?
Userlevel 1
From previous posts, I believe some other organizations have had success with version 8.0.8.53, but we have not yet re-installed it. Our impact is limited to a single terminal server at a single client.

Our internal process requires confirmation that the issue has been isolated, hence the delay.

The client has been advised of the risks of running with WR disabled, and accepted them.
Userlevel 4
Badge +7
Hi Everyone,
 
I am seeing a lot of information in this thread and was wondering if someone from Webroot could do a roll-up of what works and what doesn't, so that those of us reading this can move forward with a fix.
My understanding from what I have read on here:
- If you are running Server 2008R2 then this is a Webroot problem and the fix is to roll back to version 8.0.8.53
- If you are running Server 2012R2 then it is a Microsoft problem https://support.microsoft.com/en-us/kb/3179574
Can this please be confirmed?

My affected servers so far are physical, running both Server 2008 R2 and Server 2012 R2. However, I am sure there are other servers having the issue that I am not aware of, given how many clients we support as an MSP. I have asked my techs to provide me with a list of sites experiencing this issue.
Cheers,
Andrew
Userlevel 1
OK, I'm trying this. I've sent the Unmanaged policy to the affected server, but after it refreshed its config, there is no "shutdown protection" option in the system tray.
 
I can open WSAB from the tray and disable:
-Realtime shield
-Web Shield
-Firewall
 
After that, the icon says "protection is disabled" which is good, but the WRSVC service is still running.
 
Is this good enough to "stop it from running" so this issue doesn't happen again?
Userlevel 1
My single client that has been affected has the following configuration:
Windows Server 2012 Hyper-V Host
Lenovo ThinkServer RD430
Intel I350 NIC
 
Terminal Server Guest VM
Windows Server 2012 R2
WR version that seemingly caused the issue: 9.0.13.58
 
Symptoms:
After some time, new RDP sessions cannot be created. They stall at "Securing Remote Connection". Existing RDP sessions seem to be fine, but if those users log off and attempt to log back in, they hit the same issue. Restarting the server fixes it temporarily, but the issue recurs within 24 hours.
 
Resolution:
The only thing I've done so far is to completely kill WR locally on the Terminal Server VM, and uninstall it via the web console. We've been stable for 48+ hours now with no other changes. I've confirmed that we did not yet apply the MS patch previously mentioned.

We will be attempting the rollback version today. I will keep you updated on the progress.
Userlevel 6
Badge +26
All on this thread, I'm compiling what we know at a high level, but there is nothing that "works". There are "workarounds" depending on your setup and specific symptoms.
 
If you're experiencing this on a physical server, work with Microsoft and their patch and update options.
 
For virtual environments:
 
Option 1: Roll back to an older version of WSAB, 8.0.8.53. (It predates 9.0.x, which addresses many of the newer malicious attacks and supports folder/directory whitelisting. The caveat is that the older version may not catch some of the modern attacks, as it's almost a year old, AND if you have folder whitelists in place, that agent version WILL NOT work.)
Option 2: Set up a nightly reboot of the entire server to clear all sessions and reset the Terminal Server. (Kicking the can down the road, but it's helped a number of production servers in the field.)
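For Option 2, one common way to schedule the nightly reboot is a Windows scheduled task that runs `shutdown /r`. Below is a minimal Python sketch, assuming the built-in `schtasks.exe` and `shutdown.exe` tools; the task name `NightlyRdsReboot`, the 04:30 start time, and the helper `build_reboot_task` are hypothetical placeholders, not anything Webroot or Microsoft prescribes. As other posters note, a nightly reboot does not stop the problem for everyone.

```python
import re
import subprocess
import sys


def build_reboot_task(task_name: str = "NightlyRdsReboot",
                      start_time: str = "04:30") -> list[str]:
    """Return schtasks.exe arguments for a daily reboot task (hypothetical helper)."""
    # schtasks expects /ST as 24-hour HH:MM.
    if not re.fullmatch(r"([01]\d|2[0-3]):[0-5]\d", start_time):
        raise ValueError(f"start_time must be HH:MM (24-hour), got {start_time!r}")
    return [
        "schtasks", "/Create",
        "/TN", task_name,
        # Reboot after a 60-second warning; /f closes running apps without prompting.
        "/TR", "shutdown /r /f /t 60",
        "/SC", "DAILY",
        "/ST", start_time,
        "/RU", "SYSTEM",  # run as LocalSystem so nobody needs to be logged on
        "/F",             # overwrite an existing task with the same name
    ]


if __name__ == "__main__":
    args = build_reboot_task()
    if sys.platform == "win32":
        # Needs an elevated (administrator) prompt.
        subprocess.run(args, check=True)
    else:
        print(" ".join(args))
```

Passing the arguments as a list lets `subprocess` quote the `/TR` command (which contains spaces) for you.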
 
If you've worked directly with Support on your particular situation, continue to work with their response as they may have a little more insight on the specific logs reported from your endpoint.
 
 
 
Formal update from Management:
 
Webroot has been actively tracking and monitoring an issue that affects some customers operating Terminal Servers in a virtualized environment. The Winlogon 4005 error occurs when an RDS user is unable to connect to the RDS server, and the event logs display a message indicating that Winlogon has stopped responding.
Despite extensive in-house testing, Webroot was only able to fully replicate this issue within its testing environments on the 4th August 2016. This replication was short lived and has not been able to be reproduced in subsequent testing. Information available from the successful fault replication led to a code fix and release to all customers.
 
The Webroot WSAB Agent release (v9.0.13.50) on the 18th October 2016 was designed to fix a given deadlock condition. This release was followed up by (v9.0.13.58) on the 21st October 2016, again with the aim of reducing the fault cases.
 
It is clear that these releases did not deliver a complete code fix and that some customers are still experiencing the 4005 error, which is preventing normal operation of servers without corrective action on a regular cadence. Webroot apologizes for any inconvenience caused and for the prolonged time taken to reach a permanent fix.
 
We have also been informed that the fault condition may occur more frequently following the releases on the 18th & 21st October. If you have seen such an increase in occurrences following updating to these latest versions we would welcome the opportunity to discuss this further on a direct call with our senior engineering resources. Our Support teams will be able to coordinate a call at a convenient time.
 
Whilst we have seen that the Winlogon 4005 error can occur when the WSAB agent is not installed, Webroot is determined to reach a root cause analysis and fix. Webroot has an active ticket with Microsoft in progress and has assigned the fault the highest priority status and the resources necessary to reach a conclusion. All existing advice provided by our support teams is still valid at this time.
 
A team comprising all key business areas up to SVP level has been convened and will be briefed daily on progress from the core team. The core team will update support regularly as we make progress on addressing this issue. The Director of Product Management has personal responsibility until resolution for this matter.
 
Webroot apologizes for the extended period of time taken to manage this extremely complex fault. We also thank you for your continued patience and understanding in this matter.
 
Kind regards
Product Management – Webroot
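Since the statement ties these outages to Event ID 4005, one quick way to see how often a given server has hit it is to query its event log. Below is a minimal Python sketch, assuming the 4005 events are recorded in the Application log and that `wevtutil`'s text output puts a `Date:` line on each record; `build_query` and `parse_dates` are hypothetical helper names.

```python
import subprocess
import sys

# XPath filter selecting Event ID 4005 records.
QUERY = "*[System[(EventID=4005)]]"


def build_query(max_events: int = 20) -> list[str]:
    """wevtutil arguments: Application log (assumed), newest first, text format."""
    return ["wevtutil", "qe", "Application", f"/q:{QUERY}",
            f"/c:{max_events}", "/rd:true", "/f:text"]


def parse_dates(text: str) -> list[str]:
    """Collect the Date: value from each record in wevtutil text output."""
    dates = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Date:"):
            dates.append(stripped[len("Date:"):].strip())
    return dates


if __name__ == "__main__":
    if sys.platform == "win32":
        out = subprocess.run(build_query(), capture_output=True,
                             text=True, check=True).stdout
        dates = parse_dates(out)
        print(f"{len(dates)} recent Event ID 4005 record(s)")
        for d in dates:
            print(d)
    else:
        print("This sketch only runs on Windows")
```

Running it after each incident gives a rough timeline to compare against Webroot update dates.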
Userlevel 1
@, thanks for the summary and official statement.
 
It should be clarified that the "fix" released recently was developed specifically for ESX platforms, not Hyper-V, and Hyper-V testing is ongoing and hopefully a fix there is forthcoming.
 
 
Option 1 is not really viable in today's computing environment.  I mean, are you going to tell your customers that you're "sort of" providing anti-virus/security protection for them because you're using a year-old AV agent that can't deal with modern attacks?  I seriously doubt that will fly.
 
Option 2 listed above (nightly restart of RDS servers) is technically useful and may mitigate the issue somewhat, but we've seen this issue happen 3 times in a single day, so beware - your mileage may vary. Don't get me wrong, it's better than nothing.
 
Option 3 should be to remove Webroot completely and use another product until Webroot gets this fixed.  That's what we've done, and we haven't seen the issue return since.
Userlevel 6
Badge +26
@ - The fix wasn't specifically for ESXi; we just tested on that platform, which made it appear ESXi-specific on the surface. Unfortunately, recent reports from the field include a mixture of ESXi and Hyper-V, so that's not completely correct. We're treating this as a Microsoft issue that we're obviously affecting or exacerbating. There are well-documented cases of the same problem on RDS and TS environments without WSAB. That's why our dev team is working closely with MS to uncover what's happening.

The options are what they are and were provided by our support team. I'm just passing them on. 😎
We've been with WebRoot for about a month. Since installation we have started experiencing this issue across our customer base. The problem was not there prior to installing Webroot.
 
All affected servers are running Windows 2008 R2 RDS on VMWare 5.5. All tenant RDS servers reboot daily, so I can confirm that this is not making a difference.
 
When the problem occurs, the server will stop accepting new RDS connections. Even the VMWare console is non-responsive - it just sits at a black screen. However, the existing RDS connections do not appear to be affected, and the server can be remotely managed and connected to using SMB, WinRM, WMI, etc.
 
We just had another customer go down a few minutes ago, which has prompted me to write this. This evening we will start the process of removing WebRoot from all of our tenants and switching to another solution. This is unfortunate, as I really was happy with the overall direction of Webroot.
 
For what it's worth, my personal opinion is that this is not a Microsoft problem. We have tenant RDS servers that run other AV/malware programs without locking up like this. Even servers that experience this issue with Webroot can be *fixed* by simply removing it. Again, this is just my opinion, based on the information we all have to work with. My concern, given how old this post is, is that Webroot may be looking for the problem in the wrong place.
@
 
Agreed on all fronts. I'm working on signing up with another product today. My intention is to replace Webroot only on our RDS servers until Webroot gets this straightened out. Of course, the longer that takes, the more exposure I'll have to this other product, but I don't really have a reason to replace WR on the other 700+ nodes we have it on. Still, as several others have pointed out, this is taking a while to fix given how much trouble it's causing. It's not that I expect any software application to be completely free of defects, but a case like this gives us as users some valuable insight into what resources a software vendor maintains for addressing problems when they come up.
 
Just for the sake of record keeping: the issue recurred on two more servers in the last two days. Also, we've had all our RDS servers scheduled to reboot at 4:30am for almost a year, so I'm not a good example of someone that has helped.
Userlevel 6
Badge +26
@ - the issue has been well documented by MS on many forums, and their own KBs list it expressly. WR has shipped numerous releases attempting to address what was found during kernel debug reviews over the past few weeks/months. Additionally, testing every environment described on this forum is nearly impossible. We have limited resources to test every option, so we do the best we can given the complexity of the problem. Finding a "smoking gun" has been nearly impossible, and senior management is fully behind the dev team to figure this out. MS has acknowledged the issue and is working with our dev team, as mentioned in the statement above.

For your reference (Read the KB post, MS clearly acknowledges the issue):  https://support.microsoft.com/en-us/kb/3179574 (KB3179574)

Known issues in this update

Symptoms

After you apply this update on a Remote Desktop Session (RDS) host, some new users cannot connect to an RDP session. Instead, those users see a black screen, and they are eventually disconnected. This issue occurs at unspecified intervals.

Cause

During virtual channel management, a deadlock condition occurs that prevents the RDS service from accepting new connections.

Workaround

Because the RDS service is deadlocked, it cannot be restarted or stopped. However, you can stop the relevant SVCHOST.EXE process in Task Manager and manually start the RDS service to recover. When you do this, all existing users are disconnected and have to reconnect, but new users can now log on. Notice that Group Policy settings can dictate which existing sessions are closed when TermService is stopped. However, the default behavior is for users to remain logged on.

To find the RDS svchost server, follow these steps:

Start Task Manager.
On the Services tab, search on TermService.
Right-click the TermService entry, and then click Go To Details.

Note This highlights the svchost.exe entry on the Details tab.
Right-click svchost.exe, and then click End Task.
Status

Microsoft is researching this problem and will post more information in this article when the information becomes available.
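The Task Manager steps in the KB workaround can also be scripted from an elevated prompt. Below is a minimal Python sketch, assuming the built-in `sc.exe` and `taskkill.exe` tools: it reads the PID of the svchost.exe hosting TermService from `sc queryex TermService`, ends that process, and starts TermService again. `parse_service_pid` and `recovery_commands` are hypothetical helper names. Like the KB procedure, this disconnects all existing sessions.

```python
import re
import subprocess
import sys


def parse_service_pid(sc_output: str) -> int:
    """Extract the PID line from `sc queryex TermService` output."""
    match = re.search(r"^\s*PID\s*:\s*(\d+)", sc_output, re.MULTILINE)
    if not match or int(match.group(1)) == 0:
        # A PID of 0 means the service is not running.
        raise RuntimeError("TermService does not appear to be running")
    return int(match.group(1))


def recovery_commands(pid: int) -> list[list[str]]:
    """Mirror the KB steps: end the hosting svchost.exe, then start TermService."""
    return [
        ["taskkill", "/F", "/PID", str(pid)],  # same as "End Task" on svchost.exe
        ["sc", "start", "TermService"],
    ]


if __name__ == "__main__":
    if sys.platform == "win32":
        out = subprocess.run(["sc", "queryex", "TermService"],
                             capture_output=True, text=True, check=True).stdout
        for cmd in recovery_commands(parse_service_pid(out)):
            subprocess.run(cmd, check=True)
    else:
        print("This sketch only runs on Windows")
```

If `sc start` reports that the service is still stopping, wait a few seconds and run it again.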
@ - well I certainly appreciate any and all efforts that Webroot is putting into resolving the problem. I think what @ was saying though is that the KB you're referring to doesn't seem to apply to many of us. For one, it applies to Server 2012 and Windows 8. I have one RDS server running Server 2012, and it does have that patch installed, however the rest of my RDS servers are 2008 R2. Secondly, it doesn't describe the symptoms we're seeing either, or at least not mine. My users don't see a black screen - their RDP client just sits and spins on Configuring Session for several minutes before timing out. It also doesn't mention the one singular mark that this condition appears to leave in all of our environments every time it happens - Event ID 4005. 
 
I think all he was getting at was that, to at least those of us most impacted anyway, any sense that the only people we feel can fix this (Webroot developers) might be barking up the wrong tree (KB3179574) isn't leaving us all that optimistic about a resolution any time soon.
@  - Yes, that's exactly what I mean.
