Users unable to login to terminal server with Webroot installed.


We are deploying Webroot to our clients and have been running into an issue where users are unable to log in at a certain point. After testing we found it has to do with Webroot being installed on the server, but we can't figure out what is causing the issue and we've had to remove Webroot. This seems to affect only Server 2008 R2 environments.


Userlevel 6
@ - These are all valid points and concerns, so no problem on my part, I work with customers having issues like this all the time. 8-)
 
I read the post thoroughly. I believe these are different and unrelated issues that can be addressed separately.
 
Just so everyone understands, our technology touches a lot of different areas and layers on an endpoint. We are not an AV scanning product only. We have different "shields" that monitor and watch different attack vectors, so oftentimes issues that appear related are not even close to being related. Also, our technology is very different from any traditional AV on the market, so comparing is irrelevant.
 
Our agent is a kernel-level code base that operates below the OS layer. The TS error/issue is happening at this level, not at the application or memory layer. Deadlocks are an issue at the kernel, where we have to run custom debug software to capture logs.
The performance concern you mentioned appears at the OS layer, where our agent may be monitoring actions taken at that layer that appear malicious. There are several options to try to ascertain what's happening with a monitored process. (Identity protection is often a culprit, and the monitored process is reported in the Identity Protection screen on an endpoint under Application Protection, not in any log file, unfortunately. Find the directory where the process is firing and see whether it's set to Protect, Allow, or Deny. If it's not set to Allow, then whitelist the directory it's running from.) This is a common situation with specialty products, like CC drivers or custom plug-ins.
Whitelisting - if it's not showing up there, then locate the log file in Program Data\WRDATA and find the monitored process. It could be a core monitor that just needs whitelisting. If it's a commercial product, open a ticket and have our support push it up to AMR for rules-based whitelisting.
Scanning - we view scanning as a legacy requirement and only one area of detection (a small variable), and while it has not been mentioned on this thread, it comes up all the time. To clarify, we scan below the OS layer as well by looking at the MBR, and we scan files at the byte level. So, given how RDS works, there could be areas to address/review there. Just to be clear, dev is looking at many areas and levels regarding the TS issue, not just one.
 
@ - you're welcome to PM me directly and I'll be more than happy to work with you and/or your team to see what's happening with these performance issues. I would suggest whitelisting any of the known processes that may be being monitored. What you described can easily be viewed as potentially malicious activity when you consider that what PowerShell does is outside what normal applications do. It could be that our agent is monitoring a number of processes that match known suspicious behaviours, and I can explain how determinations are made, how unknowns work, how whitelisting works, and answer any other questions you may have to better understand how our agent works. (If you're working with an SE, I'd suggest reaching out to them for additional help; otherwise you're welcome to reach me directly. I've been forthcoming on my contact info.)
 
scooper@webroot.com - 720-842-3562 (I'm located at our corporate offices in Colorado.)
 
Thanks - Shane
 
 
Userlevel 2
@I think it would be really beneficial if you could open a ticket with webroot support for them to investigate with your setup to see why this would have started happening to you with 9.0.13.75. If that is the case I would have to assume you have something uniquely different than their testbeds that they likely won't be able to address without eyes on it.
As a starting reseller, we are pleased to read this, especially to keep us better informed.
Without information/feedback about this issue we are just "floating" in space.
Knowing that your engineers are committed to coming up with a solid solution, we can inform our customers to hold on and wait a little longer.
 
Thank you!
Userlevel 2
This was the solution presented to us by support, and so far we've not seen a return of the issue. Hope it helps someone else:
 
"It sounds like our latest release from Friday has some fixes in place for this issue. Can you create a new Terminal Server policy with the following settings? You'll first want to copy the Recommended Server Defaults policy, then ensure the following adjustments are made:

Basic Configuration - Favor low disk usage over verbose logging - ON
Scan Schedule - Time - Choose a day and time that fits in with low disk IO activity (i.e. everyday at a specific time or only on weekends)
Scan Schedule - Hide the scan progress window during scheduled scans - ON
Scan Settings - Scan archived files - OFF
Self Protection - Set to Minimum
Real-time Shield - Scan files when written or modified - OFF

Once that has been completed, I would like to have you try it on one of your servers. Again, I apologize for the delay on this ticket and sincerely thank you for your patience. "
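For anyone keeping several policies in line with the settings quoted above, here is a minimal sketch of a checklist in Python. It only compares values you have copied by hand from the policy screen against the list in the support reply. The key names are invented for this sketch and are not Webroot console or API identifiers, and the scan schedule time is left out because it depends on your own low-IO window.

```python
# Settings quoted from the support reply above. The key names are made up
# for this sketch; they are NOT Webroot console or API identifiers.
RECOMMENDED_TS_POLICY = {
    "favor_low_disk_usage_over_verbose_logging": True,
    "hide_scan_progress_window": True,
    "scan_archived_files": False,
    "self_protection_level": "Minimum",
    "scan_files_when_written_or_modified": False,
}


def policy_differences(current: dict) -> dict:
    """Return {setting: (current_value, recommended_value)} for each mismatch."""
    return {
        key: (current.get(key), wanted)
        for key, wanted in RECOMMENDED_TS_POLICY.items()
        if current.get(key) != wanted
    }


if __name__ == "__main__":
    # Hypothetical values copied by hand from a policy screen.
    copied_from_console = dict(RECOMMENDED_TS_POLICY)
    copied_from_console["self_protection_level"] = "Maximum"
    copied_from_console["scan_archived_files"] = True
    for name, (have, want) in policy_differences(copied_from_console).items():
        print(f"{name}: have {have!r}, want {want!r}")
```

An empty result means the copied values match the quoted list; it says nothing about whether the policy has actually been applied to the endpoint.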
Userlevel 6
? - doubling up here from another post, but for those others on this forum, the latest agent build that went out this week, 9.0.13.x, has the TS fix. It was a tough one and difficult to find, but development finally sorted it out. If you haven't updated your TS servers, go for it now. FYI, if you're running Server 2012 R2 and TS, there's an MS KB fix for that one.
 
 
Just a quick update - not sure this actually helps anyone but thought I'd share. I've been wondering if this was something that Webroot was doing in real time (perhaps a heuristic trigger?) or something that was just breaking until the system was rebooted. I finally got to test that theory today when this happened to a server that was logged in on the console of the Hyper-V machine. So this time I could actually poke around the server for a bit before the customer had me reboot it.
 
Long story short, I disabled protection and it made no difference. However, I also attempted to uninstall Webroot from the WR console and then force an update from the tray icon, which 'usually' uninstalls pretty quickly from most machines, however it didn't in this case. I rebooted the server and WR was still on it. It did uninstall some time later when I wasn't looking.
 
Anyway, I did load up procmon for a few minutes prior to the reboot, but WRSA.exe touched something like 14 million objects in that time so I had a giant haystack but didn't even know what kind of needle I should look for.
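One way to shrink a haystack like that is to save the Procmon capture as CSV and count what the process was doing, rather than scrolling through raw events. The sketch below is a rough starting point, not a diagnosis tool. It assumes the export has Procmon's default column names ("Process Name", "Operation", "Result"); if your column selection differs, the names need adjusting. The function names are my own.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(rows: Iterable[dict], process: str = "WRSA.exe",
              top: int = 10) -> Tuple[List[tuple], List[tuple]]:
    """Count operations and non-SUCCESS (operation, result) pairs for one process."""
    operations = Counter()
    failures = Counter()
    for row in rows:
        if row.get("Process Name", "").lower() != process.lower():
            continue
        operation = row.get("Operation", "")
        operations[operation] += 1
        result = row.get("Result", "")
        if result and result != "SUCCESS":
            failures[(operation, result)] += 1
    return operations.most_common(top), failures.most_common(top)


def summarize_file(path: str, process: str = "WRSA.exe", top: int = 10):
    """Same as summarize(), reading rows from a CSV export on disk."""
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return summarize(csv.DictReader(handle), process, top)
```

Calling `summarize_file("capture.csv")` (a hypothetical file name) returns the most frequent operations and the most frequent failing operation/result pairs. Comparing those counts between a healthy server and one in the broken state may at least suggest what kind of needle to look for.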
Userlevel 7
I have requested an update and should be able to share something with you in the next couple of days. Thanks!
Userlevel 7
We understand the concern and frustration and have asked for our product team to comment on the resolution and time to solve.  We are posting their message below:
 
Please accept our apologies for the time it is taking for us to get a full fix into Production.  We understand that this is a serious issue for impacted customers, and we have been making progress towards its resolution.  We have reduced the number of customers seeing this issue, but it has not been resolved for all of you, and we are dedicated to fixing that.
 
There are a handful of steps that we need to take in a specific sequence to address these remaining cases, and we are actively mapping those dependencies and planning the timing for those releases. This work is our engineering team’s top priority.  We have been actively working with a small number of customers to test fixes and drops, and the feedback they’ve provided has directly led to the approach we are taking.
 
Once again, we apologize for the extended time taken to get this right and the inconvenience it has caused.
Userlevel 1
Fantastic. Thank you.
Userlevel 1
Yes. This is just a beta and it still hasn't been released. The latest release copy we have is 9.0.15.65 and the beta is 9.0.17.24.
 
I have not used it on a production box. I will wait for a full release before I do that. The issue is the setting under 'self protection'. If you set it to 'minimum' the problem doesn't occur and all is well. If you set it to 'Maximum' then the random RDP blank screen error can occur at any point. I have not tried the 'medium' setting. The server still works but will not allow RDP users to see the screen, and a reboot is required. It's as if it is a graphical corruption, as the user can actually log in - they just cannot see anything. To be honest you could probably just restart the RDP service, but I have not tried this. The setting itself is there to prevent other programs from modifying Webroot settings, so having it on 'minimum' is probably not that much of a security threat. Just very annoying when you are not aware.
Userlevel 7
@ we implemented a fix in version 9.0.17.24, but there may be something else going on. Please submit a ticket to our Support Team so they can review the logs to make a proper determination.
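With this many build numbers flying around the thread (9.0.13.58, 9.0.15.65, 9.0.17.24), it can help to compare them numerically rather than by eye. A small sketch, assuming the version is the dotted four-part string the agent displays; the function names are my own. Being at or above the build that carries the fix does not prove the issue is gone, as the reply above notes.

```python
def parse_version(text: str) -> tuple:
    """Turn '9.0.17.24' (optionally prefixed with 'v') into (9, 0, 17, 24)."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def has_fix(installed: str, fixed_in: str = "9.0.17.24") -> bool:
    """True if the installed build is at or above the build carrying the fix."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    for build in ("9.0.13.58", "9.0.15.65", "9.0.17.24"):
        print(build, has_fix(build))
```

Comparing as tuples of integers avoids the string-comparison trap where "9.0.9.1" would sort after "9.0.17.24".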
Userlevel 4
Hi Everyone.
 
I had this issue quite a while ago with a lot of our clients and Webroot Support gave me the following fix which has resolved the issue for me on every Terminal Server affected.
Cheers,
Andrew
 
Thank you for contacting Webroot support 

Working closely with our customers, Webroot has identified an issue that manifests itself in the form of existing Terminal Server sessions becoming un-responsive with users no longer being able to log in. The affected Terminal Servers have required restarting in order for normal service to be resumed. For customers who have experienced this, an update to WSA is available now and Webroot support will provide assistance as required. The fix will also be rolled into the next general release of WSA which is forecast to be automatically deployed in October. Forthcoming product bulletins will advise of the exact date. 

Before applying the agent build we have created to address this issue, please ensure that you have applied the below Microsoft patches. These patches were designed specifically to address 4005 errors/RDS connection issues. 

http://support.microsoft.com/kb/3172614 
http://support.microsoft.com/kb/3179574 
http://support.microsoft.com/kb/3197875 
http://support.microsoft.com/kb/3197874 

Before installing the following agent build, please ensure that you have removed the agent currently installed and that C:\Program Data\WRData has been removed (if not, please delete this folder):
http://download.webroot.com/9.0.17.32/WRSASME.EXE 
Please ensure that you reboot the server after applying the above update. If you experience any further issues, please update your support ticket and the escalation team will get back to you promptly. 

Thank you for your patience whilst we have investigated and developed the update. It is deeply appreciated by us all at Webroot. 
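To check a server against the four Microsoft patches listed in the support message above, something like the following Python sketch could be used. It scans a hotfix listing for KB numbers and reports which of the four are absent. Two caveats: which of these KBs apply depends on the OS version, and a rollup that has been superseded by a later one may not appear in the listing even though its fixes are present, so treat a "missing" result as a prompt to look closer. The helper names are my own; `Get-HotFix` is the standard PowerShell cmdlet.

```python
import re
import subprocess

# KB numbers taken from the support message above.
REQUIRED_KBS = ("KB3172614", "KB3179574", "KB3197875", "KB3197874")


def missing_kbs(hotfix_listing: str, required=REQUIRED_KBS) -> list:
    """Return the required KB IDs that do not appear in the listing text."""
    installed = {kb.upper() for kb in re.findall(r"KB\d+", hotfix_listing, re.IGNORECASE)}
    return [kb for kb in required if kb not in installed]


def installed_hotfix_listing() -> str:
    """Windows only: list installed hotfix IDs via PowerShell's Get-HotFix."""
    completed = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout
```

On the server itself, `print(missing_kbs(installed_hotfix_listing()))` would list whichever of the four are not reported as installed.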
Userlevel 1
See AKIM's link above to 4005 issue.
 
The solution shown there worked for me on one 2008r2 server, but not on a second one.
Userlevel 7
I checked with the escalations engineer who's working on this issue. He let me know that it doesn't re-poll the console to get the policies; it just reloads those from the locally stored configuration.

On the plus side, we have finally figured out how to reliably reproduce the issue, so that means the devs can work on properly fixing this one.
Userlevel 6
? - We've been focusing on TS running in virtualized environments. If you're having issues on physical boxes, you may want to look at Microsoft's KBs regarding TS, as that's unusual from what we've learned. They did patch 2012 R2 due to deadlocks at the kernel level, and that patch is focused on virtualized machines as well.
 
If you have a Windows 2012 R2 Terminal Server environment running virtually, we suggest performing the Microsoft updates.
 
 
Userlevel 6
? - just curious, have you applied this specific patch for Server 2012 R2? I know you said all patches were applied, but this was more of a KB and I'm not sure if it's part of the auto patch or not.
 
KB3179574 (https://support.microsoft.com/en-us/kb/3179574)
Hi Shane -
 
I do see it installed on the one 2012r2 server that we've had this problem on, but obviously not on any of the rest since they're running 2008r2. That's a pretty long list of fixes associated with that roll-up so I hate to uninstall it, but obviously I guess that's got to be balanced against this problem.
 
Thanks for the follow up and looking forward to any news about this being resolved for Hyper-V based RDS servers.
 
Hi All,
 
We are having the same issues here, and they have been ongoing for the past month or two.
I have just had it happen 2 days in a row, whereas sometimes it may be a week or more between occurrences.

Windows 2008 R2 Server. We have approx 30 staff logging in via RDP Connection.
Webroot version 9.0.13.58 installed.
We do have patches KB2621440 & KB2667402 installed, and they have been installed for a long period with no issues previously.

All of a sudden users are unable to RDP to the Terminal Server. It seems to be more of an issue first thing in the morning, and hasn't occurred so much during working hours. That's why I was looking at backups/snapshots and other processes which may have been running at that time, until I found this post.
 
You can, however, connect via Hyper-V, but it locks up at the loading policies screen. It connects but fails the login process.
 
The only way I have been able to resolve this issue for an immediate emergency fix, is to instruct Hyper V to Turn Off the virtual server in question. I have tried shutdown command but it fails.
 
Any info on issue would be greatly appreciated. 5:00am phone calls are becoming a bit too much to restart server.
 
 
Joined up specifically to add to the list.
 
We have been experiencing issues with our RDS servers across multiple sites since around mid-September. The servers are all running Windows 2008 R2 and have Webroot installed (they are up to date, with the latest version v9.0.13.58).
 
The issue we run into is that users are unable to connect to the RDS server. On reviewing the Event Logs, we see a heap of Winlogon events with Event ID 4005. The only fix we have found is to reboot the server. On some sites, we are performing this reboot daily, and on occasion twice a day. There doesn't seem to be a defined trigger for the problem that I can find.
 
The servers are running on a mixture of either physical hardware, and also ESX 6.0. Some of the servers have the patches KB2621440 and KB2667402 installed, and some do not. On 1 particular server I have removed the patches, and re-installed them to test the result.
 
I'll keep an eye on this thread to try out any potential fixes and report back as we go.
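Since nobody has found a defined trigger, tallying the 4005 events by day and by hour might show a pattern (several posts mention early mornings). Below is a minimal Python sketch that works on an exported event list. It assumes a CSV with columns named `TimeCreated`, `Id` and `ProviderName`, with `TimeCreated` written as an ISO 8601 timestamp such as 2016-10-24T05:02:11. Those names and that format are assumptions about how you export the log, so adjust them to match your own export.

```python
import csv
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def winlogon_4005_counts(rows: Iterable[dict]) -> Tuple[Counter, Counter]:
    """Count Winlogon Event ID 4005 entries per calendar day and per hour of day."""
    per_day = Counter()
    per_hour = Counter()
    for row in rows:
        if str(row.get("Id", "")).strip() != "4005":
            continue
        if "winlogon" not in row.get("ProviderName", "").lower():
            continue
        # Assumes ISO 8601 timestamps; other formats will raise ValueError.
        stamp = datetime.fromisoformat(row["TimeCreated"].strip())
        per_day[stamp.date().isoformat()] += 1
        per_hour[stamp.hour] += 1
    return per_day, per_hour


def winlogon_4005_counts_from_csv(path: str) -> Tuple[Counter, Counter]:
    """Same as winlogon_4005_counts(), reading rows from a CSV file."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return winlogon_4005_counts(csv.DictReader(handle))
```

If the per-hour counts cluster around a backup window or a scheduled scan, that is worth including in the support ticket.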
 
I have been chasing this issue for some time. We have 2 clients, one on Hyper-V with 2008 terminal servers, and one on ESXi 5.5 with 2008 terminal servers. Both clients are running into the exact same issue described in this thread. We created a policy group using the same settings from page one, but we still get the Winlogon issue about once a week or so. Webroot is current on all of the terminal servers for both clients.
 
Our affected customers are largely on IBM hosts, all running virtualized TSs on Hyper-V: x3650 Mx (M3, M4, M5) and x3400 M3. It has also hit a ProLiant DL380 G7 Hyper-V host; sorry, I can't tell you the NIC in that one, as it's a retired asset and we don't have that level of info.
 
affected network cards can be any of the following (mostly the terminal servers are hyper-v virtual running on these IBM hosts)
 
Broadcom NetXtreme  & NetXtreme II
Broadcom BCM5709C (NetXtreme II GigE)
Broadcom BCM5716C NetXtreme II GigE
Intel I350 
Userlevel 2
I would trust the local application if you open it and it says what version it is. Don't, though, trust what is shown in the Add/Remove Programs list, as it will show the version that was originally installed.

I was also going to post an update: now that it has been a week, we have closed all our internal tickets related to Webroot issues on terminal servers, as they have gone a week without issue since 9.0.13.58 (all on ESXi). Hopefully that will give some hope to the Hyper-V users that a resolution can be found for them as well.
Userlevel 6
All on this thread, I'm compiling what we know at a high level, but there is nothing that "works". There are "workarounds" depending on your setup and specific symptoms.
 
If experiencing on a physical server, then work with Microsoft and their patch and update options.
 
For virtual environments:
 
Option 1: Roll back to an older version of WSAB, 8.0.8.53. (This is pre-9.0.x; 9.0.x addresses many of the newer malicious attacks and supports folder/directory whitelisting. The caveat is that the older version may not catch some of the modern attacks, as it's almost a year old, AND if you have folder whitelists in place, then that agent version WILL NOT work.)
Option 2: Setup a nightly reboot of the entire server to clear all sessions and reset the Terminal Server. (Kicking the can down the road, but it's helped a number of production servers in the field.)
 
If you've worked directly with Support on your particular situation, continue to work with their response as they may have a little more insight on the specific logs reported from your endpoint.
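For Option 2, one common way to set up the nightly reboot is a scheduled task that runs the built-in shutdown command. The Python sketch below only builds and prints the `schtasks` command line so it can be reviewed first; the task name, the 03:00 start time and the 60-second delay are placeholder choices of mine, not values from Webroot. Note that this is a pre-emptive reboot: one poster reports that the shutdown command fails once the server is already in the hung state.

```python
import subprocess


def nightly_reboot_command(task_name: str = "NightlyTSReboot",
                           start_time: str = "03:00",
                           delay_seconds: int = 60) -> list:
    """Build a schtasks command that registers a daily forced reboot."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,                                  # task name (placeholder)
        "/TR", f"shutdown.exe /r /f /t {delay_seconds}",   # restart, force-close apps
        "/SC", "DAILY",
        "/ST", start_time,                                 # 24-hour HH:MM
        "/RU", "SYSTEM",
        "/F",                                              # overwrite an existing task
    ]


if __name__ == "__main__":
    command = nightly_reboot_command()
    print(subprocess.list2cmdline(command))
    # To register the task, run on the server from an elevated prompt:
    # subprocess.run(command, check=True)
```

Pick a start time that falls outside backups and scheduled scans, and warn users who may still have sessions open at that hour.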
 
 
 
Formal update from Management:
 
Webroot has been actively tracking and monitoring an issue that affects some customers operating Terminal Servers in a virtualized environment. The Win Login 4005 error occurs where an RDS user is unable to connect to the RDS server, and the event logs display a message indicating that Win logon has stopped responding.
Despite extensive in-house testing, Webroot was only able to fully replicate this issue within its testing environments on the 4th August 2016. This replication was short lived and has not been able to be reproduced in subsequent testing. Information available from the successful fault replication led to a code fix and release to all customers.
 
The Webroot WSAB Agent release (v9.0.13.50) on the 18th October 2016 was designed to fix a given deadlock condition. This release was followed up by (v9.0.13.58) on the 21st October 2016, again with the aim of reducing the fault cases.
 
It is clear that these releases did not deliver a complete code fix solution and that some customers are still experiencing the 4005 error, and that this is preventing normal operation of Servers without corrective action on a regular cadence. Webroot apologizes for any inconvenience caused and for the prolonged duration in reaching a permanent fix.
 
We have also been informed that the fault condition may occur more frequently following the releases on the 18th & 21st October. If you have seen such an increase in occurrences following updating to these latest versions we would welcome the opportunity to discuss this further on a direct call with our senior engineering resources. Our Support teams will be able to coordinate a call at a convenient time.
 
Whilst we have seen that the Win login 4005 error can occur when the WSAB agent is not installed Webroot is determined to reach a root cause analysis and fix. Webroot has an active ticket with Microsoft in progress and has assigned the fault the highest priority status and appropriate resources necessary to reach a conclusion. All existing advice provided by our support teams is still valid at this time.
 
A team comprising all key business areas up to SVP level has been convened and will be briefed daily on progress from the core team. The core team will update support regularly as we make progress on addressing this issue. The Director of Product Management has personal responsibility until resolution for this matter.
 
Webroot apologizes for the extended period of time taken to manage this extremely complex fault. We also thank you for your continued patience and understanding in this matter.
 
Kind regards
Product Management – Webroot
Userlevel 4
@ Thanks for the informative update! It is unfortunate that this issue has arisen and puts Webroot in such a bad light, I have been using Webroot across hundreds of my clients sites for a few years now and have found it to be a great product. I have complete faith that this issue will be addressed and it will be business as normal.
I look forward to further updates as this progresses.
Crossy
My concern is that this problem is bigger than just RDS servers, but obviously RDS issues grab the spotlight due to their direct impact on the end user. Please read the entire post before dismissing what I am about to say.
 
In the month or so that we have been using Webroot we have been tracking the cause of intermittent application and scripting hangs across all of our tenants. This is happening on both clients and servers - both physical and virtual. This first caught our attention with some of the maintenance scripts we use. Just a few examples:
 
1) powershell.exe may be called to run a simple one line get-command and pipe the output to a text file. Powershell will execute, the output file will be opened, but the process will halt.
 
2) Similarly, cmd.exe gets called to run a simple one-line DOS command and output to a text file. The command will execute, the output file will be created, but the process will halt.
 
3) Random processes used by our Managed Services platform, such as patch management and scripting, will either hang or fail mid-way through execution.
 
What we've observed is that the processes are often being blocked by *something* and cannot be killed by any means (Task Manager, taskkill, PowerShell kill, etc.). In most cases a reboot is necessary to release the process, and then subsequent executions of the same script/task/process work as expected. Just to reiterate: this happens intermittently across ALL tenants, all OSs and environments, and nothing short of a reboot gets everything working again.
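For anyone wanting to collect evidence of hangs like these, a wrapper that runs each maintenance command with a time limit and records whether it finished can help. This is a minimal Python sketch with names of my own choosing, not part of any Managed Services platform. It redirects output to a text file as in the examples above, attempts a kill on timeout, and reports whether the kill actually took effect instead of waiting forever on a process that refuses to die.

```python
import subprocess


def run_with_timeout(command: list, timeout_seconds: float, output_path: str) -> dict:
    """Run a command with output redirected to a file; report hangs instead of blocking."""
    with open(output_path, "w") as out:
        process = subprocess.Popen(command, stdout=out, stderr=subprocess.STDOUT)
        try:
            return_code = process.wait(timeout=timeout_seconds)
            return {"status": "ok", "pid": process.pid, "returncode": return_code}
        except subprocess.TimeoutExpired:
            process.kill()  # best effort; the thread reports some processes ignore this
            try:
                process.wait(timeout=5)
                killed = True
            except subprocess.TimeoutExpired:
                killed = False  # still alive after the kill attempt
            return {"status": "hung", "pid": process.pid, "killed": killed}
```

A result with `"status": "hung"` and `"killed": False` gives you a PID and a timestamp to hand to support, which is more useful than a script that silently never returns.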
 
As we started to focus on the cause of background scripts/tasks/processes intermittently failing, we also noticed a huge spike in random LOB application process hangs. Similar to above, processes that spawn, halt and cannot be killed, or  that just abnormally abort.
 
Taking this into consideration, my opinion is that the "RDS deadlock" is just a symptom of a much bigger problem. Consider what would happen if Webroot was interacting with authentication and/or RDS related processes in the same way we suspect it is interacting with other processes. It's not inconceivable that the problems are related.
 
I understand that this post will not go over well and will likely be dismissed. But if I am wrong, then how do we explain that these symptoms seem to appear with the installation of Webroot and go away with the removal of Webroot? Also, If this is not a Webroot issue, then why does Webroot Support recommend rolling back to an old version of the program that does not (as far as we know) have this problem?
 
So I am clear on my intent, I am not trying to bash Webroot in any way. We have an investment in this solution, and we are all on the same team here. I am simply trying to get someone to consider that the RDS problem *could* just be a symptom, and that focusing on it exclusively may not get us closer to resolution.
 
I appreciate the ongoing effort by Webroot Support to work through this issue, and that they are keeping us in the loop as much as possible.
