Saturday, January 24, 2015

Kerberos: Close Encounters of the Third Kind

It's one of those Saturdays: my goal today was to implement this beast in a production SharePoint 2013 environment consisting of 3 on-premises farms, regionally deployed on 3 continents. I've been working on the test setup for a couple of days now and found a lot of gotchas that will be useful to anyone willing to go for Kerberos and experience a bit of pain :) Only if it goes wrong, though...


There are many blogs out there focusing on the Kerberos implementation itself, and this is the best documentation I've found on the topic (it's for 2010, but things haven't changed much). So I'm not going to go into the details of the setup itself (there are too many possible scenarios); instead I'll focus on the issues I've faced and found tricky to resolve.

The customer's technical requirements are to have all users authenticate with Kerberos rather than NTLM, and to have delegation enabled between the three SP farms and to a variety of back-end systems.
For the purpose of this post, I'm describing the authentication and delegation to SQL and between the farms.



The implementation steps to get Kerberos in place are generally described here:


- A records are created for all web apps on all farms. No CNAMEs.

There is a known issue with some Kerberos clients that attempt to authenticate with Kerberos enabled services that are configured to resolve using DNS CNAMEs instead of A Records. More info here.

- SPNs are created for the web app pool accounts, without port numbers, as we're using a default port. A good catch here is that 443 is also considered a default, so you don't need it in the SPN.
The documentation says you should create SPNs both with and without the port, but that's not needed for the default ports.

Another confusing thing is that you still use the HTTP service class in the SPNs even if your web application uses SSL. The simple reason is that it's the only one that exists; there's no HTTPS service class.

A best practice for setting the SPNs is to use the -s switch rather than -a. That way setspn.exe checks for duplicate SPNs (a major problem) before adding one and refuses to create a duplicate, which saves you some headaches. Cool, isn't it?

- SPNs are created for the SQL Server instances, using the server name and instance name, no port.
The three SQL Servers all run named instances on a custom port, each on its own dedicated box.

- The application pool accounts and the computer accounts for the SP servers are trusted for delegation (to any service). We are not using constrained delegation in this scenario.

- The Authentication Provider settings for all the web apps involved are set to Kerberos (Negotiate).

- IIS settings for the website are verified as correct. Nothing usually needs changing there after you do the Central Admin bit, but it's worth verifying that Kernel-Mode Authentication is not enabled and that the Authentication Providers for Windows Authentication are listed as 1) Negotiate and 2) NTLM.
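The SPN naming rules from the list above can be sketched as a small Python helper. This is only an illustration: `build_http_spn`, `setspn_commands`, the host `portal.contoso.com` and the account `CONTOSO\svc-sp-apppool` are made-up names, not anything from the actual environment.

```python
# Sketch only: the helper names, host and account below are hypothetical.
DEFAULT_PORTS = {80, 443}  # default HTTP and HTTPS ports need no port suffix


def build_http_spn(host, port=443):
    """Return the SPN for a web app; the service class is HTTP even over SSL."""
    if port in DEFAULT_PORTS:
        return f"HTTP/{host}"
    return f"HTTP/{host}:{port}"


def setspn_commands(host, account, port=443):
    """Return setspn command lines using -S, which checks for duplicates first."""
    return [f"setspn -S {build_http_spn(host, port)} {account}"]


if __name__ == "__main__":
    for line in setspn_commands("portal.contoso.com", r"CONTOSO\svc-sp-apppool"):
        print(line)
```

The generated lines are meant to be reviewed and run by a domain admin; the helper itself never touches Active Directory.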

We've done all that and now we expect our farms to start using Kerberos right away. In an ideal world :) 



For the tests, I've used the following toolset in order to be 100% sure that this is working properly.


- First and most important: have auditing enabled for both Success and Failure logon events.

You need to look at Event ID 4624, which indicates that an account logged on successfully.

- Enable Kerberos logging - a must-have during the implementation.

The information you'll find after enabling this is located in the System event log. You'll only see errors being logged, and the information is far from verbose. That's why you need network capturing in order to see the AS and TGS requests and the respective responses and analyze them.

- Install Wireshark or Microsoft Network Monitor (both will do the job, choose the one you're more experienced with). The network capturing part is the one giving the most detailed info.

- KerbTray (part of the Windows Server 2003 Resource Kit Tools) - to view / purge the Kerberos tickets issued.

- klist - a built-in Windows tool that shows more information and is more advanced than KerbTray. No GUI.

- Kerberos Authentication Tester by Michel Barneveld - same purpose as the two above, with a GUI. That's the most user-friendly tool; it displayed the tickets even when the user did not have permissions to see them with klist, for example.

- DelegConfig - an "all in one" tool written by Brian Murphy-Booth when he worked with the Microsoft IIS team. It builds a report for you, helps you set up the SPNs and tests whether the delegation works.



Issue number one - SPNs for the SQL Server are wrong.

The first thing I noticed after we started was that the SPNs for SQL were incorrect.
You'll see this immediately in the System event logs of the WFE servers, as well as in the network capture (if you're running one). Both also show the SPN that is not recognized by the KDC.


In our case it was the MSSQLSvc/<sql server fqdn>:port. This is what was requested and not found.

I used the MS documentation to set those up and found it a bit misleading: it advises using the instance name, rather than the port, for a SQL Server named instance. Wrong!

For a named instance, use:
setspn -A MSSQLSvc/myhost.redmond.microsoft.com:instancename accountname

Then I jumped to that one, which is really written in the context of SCCM, not SharePoint, but it turned out to be the correct one.

The command to register an SPN for a SQL Server named instance is the same as that used when registering an SPN for a default instance except that the port number should match the port used by the named instance.

So, we deleted the current SPNs and re-registered them in the format MSSQLSvc/server.fqdn:port domain\account and the error disappeared. To be sure we've done the correct thing, let's check in SQL:

-- List every connection per database with its login and authentication scheme
-- (auth_scheme is KERBEROS, NTLM or SQL)
SELECT DB_NAME(p.dbid) AS DatabaseName, p.loginame AS LoginName, c.auth_scheme AS AuthMethod
FROM sys.sysprocesses AS p
JOIN sys.dm_exec_connections AS c
ON p.spid = c.session_id
WHERE p.dbid > 0
GROUP BY p.dbid, p.loginame, p.spid, c.auth_scheme

That returns a list of connections to all databases, listing the accounts and the protocol.
It's normal to have connections to the system databases using NTLM, but all the connections to the SharePoint Content and Config databases should be Kerberos if you've configured it correctly. If you see a mix of NTLM and Kerberos connections to one of the Content or Config DBs, an iisreset on the SharePoint server(s) will be needed to kill all the current connections. Or you can just restart the SQL instance.
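If you export the result of that query, the check described above is easy to automate. Here is a minimal Python sketch, assuming the rows arrive as (DatabaseName, LoginName, AuthMethod) tuples; the function name `ntlm_offenders`, the database names and the accounts are made up for illustration.

```python
def ntlm_offenders(rows, system_dbs=("master", "msdb", "model", "tempdb")):
    """Return (database, login) pairs for non-system databases still on NTLM.

    rows: iterable of (DatabaseName, LoginName, AuthMethod) tuples, i.e. the
    three columns returned by the query above.
    """
    offenders = []
    for database, login, auth in rows:
        if database.lower() in system_dbs:
            continue  # NTLM connections to the system databases are normal
        if auth.upper() == "NTLM":
            offenders.append((database, login))
    return offenders


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real farm
    sample = [
        ("WSS_Content", r"CONTOSO\svc-sp-apppool", "KERBEROS"),
        ("SharePoint_Config", r"CONTOSO\svc-sp-farm", "NTLM"),
        ("master", r"CONTOSO\svc-sp-farm", "NTLM"),
    ]
    print(ntlm_offenders(sample))
```

An empty list means every connection to the Content and Config databases is no longer on NTLM; anything else points at connections that predate the SPN fix or at a remaining SPN problem.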




Issue number two - The AppFabric Caching Service needs an SPN, too.

Same error as for the SQL SPN: Principal Unknown. Well, that's expected.

The format of the requested SPN (again, logging, network capturing) is:

AppFabricCachingService/server.FQDN:22233 domain\account.

I haven't seen a request that uses only the NetBIOS name, but it's recommended to create that SPN as well.
Reference: The blog of Sam Betts: SharePoint Escalation Engineer at Microsoft Madrid

So, create this one, too: AppFabricCachingService/server:22233 domain\account

I am assuming you've already configured the managed account to run the AppFabric service.
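The two AppFabric SPNs differ only in the host part, so they can be derived from the FQDN. A small Python sketch, assuming the NetBIOS name equals the first label of the FQDN (that is common but not guaranteed); `appfabric_spns` and the server name are hypothetical.

```python
def appfabric_spns(fqdn, port=22233):
    """Return the FQDN and short-name SPNs for the AppFabric Caching Service.

    Assumption: the NetBIOS name is the first label of the FQDN.
    """
    short_name = fqdn.split(".", 1)[0]
    return [
        f"AppFabricCachingService/{fqdn}:{port}",
        f"AppFabricCachingService/{short_name}:{port}",
    ]


if __name__ == "__main__":
    for spn in appfabric_spns("spwfe01.contoso.com"):
        print(spn)
```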



Issue number three - Delegation tab missing from the Active Directory account properties.

If you can't see the tab, it's because no SPN has been registered for the account. If you don't need an SPN for that account but still want to trust it for delegation in your scenario, you can create a dummy SPN for it.


Issue number four - SharePoint is currently configured to block Intranet calls

Microsoft recommends configuring the RSS Viewer web part to test the implementation:

"Configure the RSS Viewer web part to display RSS feeds in a local and remote web application".

That failed immediately for us. We got the following error on both feeds:



And the following in the ULS logs:

RssWebPart: Exception handed to HandleRuntimeException.HandleException Microsoft.SharePoint.SPException: Access to <the feed URL> is denied. SharePoint is currently configured to block intranet calls.    

The fix (run it in the SharePoint 2013 Management Shell on each of the involved farms):

$farm=get-spfarm
$farm.properties.disableintranetcalls=$false
$farm.properties.disableintranetcallsfromapps=$false
$farm.Update()


Issue number five - The RSS webpart does not support authenticated feeds



Even after fixing the issue with the blocked intranet calls, you still won't be able to see either local or remote RSS feeds through the RSS Viewer web part if you have this in the web.config of your web application (which is the default):

<add key="aspnet:AllowAnonymousImpersonation" value="true" /> 

Change it to false and you'll be able to see at least the local RSS feed.

<add key="aspnet:AllowAnonymousImpersonation" value="false" />


Issue number six - You can see the local RSS feed, but not the one from the remote farm

After tackling the previous two issues around the web part, we still couldn't see our remote feeds. The account I'm using has Full Control rights on both the source and the destination site collections.
If I grab the feed URL, I can open it directly in the browser without any problems.
That led me to believe that the delegation was not working and that we were facing the "double-hop" issue, which occurs when you're using NTLM. The RSS Viewer web part does not work with NTLM.

You need to either allow Anonymous access on the site hosting the feed or, as we do, use Kerberos.

I looked through all the tools and logs, and I could see that the delegation was working. Find Event ID 4624 on the web server hosting the feed and look at the Impersonation Level and the Authentication Package:



If you see results like the above, your Kerberos authentication and delegation are definitely working. The thing that worried me was that the account in this event was the application pool account of the farm requesting the feed, not the personal account I had used to load the page.

This account, of course, does not have access to the site hosting the feed on the destination farm.
So, the ULS logs will show this nice little 401 error on the server requesting the RSS feed:

RssWebPart: Exception handed to HandleRuntimeException.HandleException System.Net.WebException: The remote server returned an error: (401) Unauthorized.
 at Microsoft.SharePoint.WebControls.BaseXmlDataSource.GetXmlDocument()
 at Microsoft.SharePoint.WebPartPages.DataFormWebPart.GetHierarchicalXPathNavigator(IHierarchicalDataSource ds)
 at Microsoft.SharePoint.WebControls.SingleDataSource.GetXPathNavigatorInternal()
 at Microsoft.SharePoint.WebControls.SingleDataSource.GetXPathNavigator()
 at Microsoft.SharePoint.WebPartPages.DataFormWebPart.PrepareAndPerformTransform(Boolean bDeferExecuteTransform)

After a lot of digging I understood that the RSS Viewer web part does not delegate the user's identity when it runs in the context of a Claims web application. It tries to read the feed as the application pool account, and this cannot be changed.

The Microsoft Kerberos document I mentioned at the beginning says something that might apply to the RSS Viewer (it's for 2010, but I don't think it has changed in 2013):

Currently, most of the service applications that are included with SharePoint Server do not allow for outbound claims authentication, but outbound claims is a platform capability that will be taken advantage of in the future. Further, many of the most common line-of-business systems today do not support incoming claims authentication, which means that using outbound claims authentication may not be possible or will require additional development to work correctly.



This also seems to be true:

Important:

It is a requirement to configure your web applications with classic Windows authentication using Kerberos authentication to ensure that the scenarios work as expected. Windows-Claims authentication can be used in some scenarios but may not produce the results detailed in the scenarios below.

The only solution I've found for properly testing this is to create two Classic Mode web applications and configure the RSS feed between them. That worked over Kerberos without any pain.


To double-check that Kerberos delegation works, I switched the source and destination web apps back to NTLM through the Authentication Provider properties in CA. Boom - none of the feeds are displayed anymore: "The RSS webpart does not support authenticated feeds". Very generic :)

There are a lot of possible reasons why your Kerberos implementation might fail. Just some of them are:

- Firewall rules (UDP port 88 on the DC); you can switch Kerberos to TCP or just make sure that port is open.
- A time difference bigger than 5 min between the client requesting the ticket and the KDC; Kerberos always logs times in GMT in its errors, so don't let that confuse you. As long as the clocks are in sync, the time zone doesn't matter.
- Browser security settings (it's recommended that your sites are in the Intranet Zone);
- Proxy settings in the web.config - the proxy needs to be bypassed for local connections;
- Wrong DNS records; even the PTR records could play a vital role with Kerberos.
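The point about time zones in the list above is easy to show in a few lines of Python: the 5-minute tolerance applies to absolute instants, so a client in UTC+2 is fine as long as its clock is correct. This is only a sketch; `within_kerberos_skew` is a made-up helper and the 5-minute value is the tolerance mentioned above.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # tolerance mentioned in the list above


def within_kerberos_skew(client_time, kdc_time, max_skew=MAX_SKEW):
    """Compare two timezone-aware datetimes as absolute instants."""
    return abs(client_time - kdc_time) <= max_skew


if __name__ == "__main__":
    plus_two = timezone(timedelta(hours=2))
    kdc = datetime(2015, 1, 24, 12, 0, tzinfo=timezone.utc)
    # 14:02 local time in UTC+2 is 12:02 UTC: only 2 minutes off
    print(within_kerberos_skew(datetime(2015, 1, 24, 14, 2, tzinfo=plus_two), kdc))
    # Same zone as the KDC, but the clock is 7 minutes fast
    print(within_kerberos_skew(datetime(2015, 1, 24, 12, 7, tzinfo=timezone.utc), kdc))
```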
