Friday, January 30, 2015

Continuous Crawl Intervals

So this new feature in SharePoint 2013 sounds good, right? Ever wondered whether it is really continuous, and how fresh your search results actually are when using it? Here's the answer, in what is maybe my shortest blog post:

$ssa = Get-SPEnterpriseSearchServiceApplication
$ssa.GetProperty("ContinuousCrawlInterval")

The default interval is 15 minutes. You can easily change it to anything else, say 20 minutes in this example:

$ssa = Get-SPEnterpriseSearchServiceApplication
$ssa.SetProperty("ContinuousCrawlInterval",20)

Saturday, January 24, 2015

Kerberos: Close Encounters of the Third Kind

One of those Saturdays: my goal today was to implement this beast in a production SharePoint 2013 environment consisting of 3 on-premises farms, regionally deployed on 3 continents. I've been doing the test setup for a couple of days now and found a lot of gotchas that will be useful to anyone willing to go for Kerberos and experience a bit of pain :) Only if it goes wrong, though...


There are many blogs out there focusing on the Kerberos implementation itself, and this is the best documentation I've found on the topic (it's written for 2010, but things haven't changed much). So I'm not going to go into the details of the setup itself (there are many possible scenarios), but will focus mostly on the issues I've faced and found tricky to resolve.

The customer's technical requirements are to have all users authenticate with Kerberos rather than NTLM, and to have delegation enabled between the three SP farms and to a variety of back-end systems.
For the purpose of this post, I'm describing the authentication and delegation to SQL and between the farms.



The implementation steps to get Kerberos in place are generally described here:


- A records are created for all web applications on all farms. No CNAMEs.

There is a known issue with some Kerberos clients that attempt to authenticate with Kerberos-enabled services configured to resolve using DNS CNAMEs instead of A records. More info here.

- SPNs are created for the web app pool accounts, without port numbers, as we're using a default port. A good catch here: 443 is also considered a default port, so you don't need it in the SPN.
The documentation says you should create SPNs both with and without the port - but that's not true.

Another confusing thing is that you still use the HTTP service class in the SPNs even if you're using SSL for your web application. The reason is simple: it's the only service class that exists, there's no HTTPS...

A best practice for setting the SPNs is using the -s switch rather than -a: this way the setspn.exe tool will check for duplicate SPNs (a major problem), refuse to create duplicates, and save you some headaches (there's a sketch after this list). Cool, isn't it?

- SPNs are created for the SQL Server instances, using the server name and instance name, no port.
The three SQL instances are all named instances running on a custom port, on separate dedicated boxes.

- The application pool accounts and the computer accounts for the SP servers are trusted for delegation (to any service). We are not using constrained delegation in this scenario.

- The Authentication Provider settings for all the web apps involved are set to Kerberos (Negotiate)

- The IIS settings for the website are verified as correct. Usually nothing needs changing there after you do the Central Admin bit, but it's worth verifying that Kernel-Mode Authentication is not enabled and that the Authentication Providers for Windows Authentication are listed as 1) Negotiate, and 2) NTLM.
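
To make the SPN part concrete, here's a minimal sketch of the registrations described above. The web app host name, domain and service account are placeholders, so substitute your own:

# Register the SPN for the web app pool account (HTTP class, no port - we're on a default port)
setspn -s HTTP/portal.contoso.com CONTOSO\svc-webpool
setspn -s HTTP/portal CONTOSO\svc-webpool

# Verify what's registered for the account
setspn -l CONTOSO\svc-webpool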

We've done all that and now we expect our farms to start using Kerberos right away. In an ideal world :) 



For the tests, I've used the following toolset in order to be 100% sure that everything is working properly.


- First and most important: have Auditing enabled for the Success and Failure logon events.

You need to look for Event ID 4624, which indicates that an account logged on successfully.

- Enable Kerberos Logging - a must-have during the implementation.

The information you'll find after enabling this is located in the System event log. You'll only see errors being logged, and the information is far from verbose. That's why you need a network capture in order to see the AS (TGT) and TGS (service ticket) requests and the respective responses and analyze them (a quick setup sketch follows this list).

- Install Wireshark or Microsoft Network Monitor (both will do the job; choose the one you're more experienced with). The network capturing part is the one giving the most detailed info.

- KerbTray (part of the Windows Server 2003 Resource Kit Tools) - to view / purge the Kerberos tickets issued.

- klist - a built-in Windows tool, more informative and more advanced than KerbTray. No GUI.

- Kerberos Authentication Tester by Michel Barneveld - same purpose as the two above, with a GUI. It's the most user-friendly of the tools; it displayed the tickets even when the user did not have permissions to see them with klist, for example.

- DelegConfig - an "all in one" tool written by Brian Murphy-Booth when he worked with the Microsoft IIS team. It builds a report for you, helps you set up the SPNs, and tests whether the delegation works.
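
To tie the first items together, here's a quick sketch of enabling the auditing and the Kerberos logging, plus klist usage. The registry path is the documented one for Kerberos event logging; run this from an elevated prompt, and set LogLevel back to 0 when you're done, as it's noisy:

# Audit success/failure logon events (Event ID 4624 lands in the Security log)
auditpol /set /subcategory:"Logon" /success:enable /failure:enable

# Enable Kerberos event logging (errors will show up in the System log)
New-Item -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters" -Force | Out-Null
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters" -Name LogLevel -PropertyType DWord -Value 1 -Force | Out-Null

# Inspect and purge the cached tickets for the current logon session
klist
klist purge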



Issue number one - SPNs for the SQL Server are wrong.

The first thing I noticed after starting was that the SPNs for SQL were incorrect.
You'll see this immediately in the System event logs of the WFE servers, as well as in the network capture (if you're running one). The capture will also show the SPN that is not recognized by the KDC.


In our case it was the MSSQLSvc/<sql server fqdn>:port. This is what was requested and not found.

I had used the MS documentation to set those up, and I found it a bit misleading where it advises using the instance name rather than a port for a SQL Server named instance. Wrong!

For a named instance, use:
setspn -A MSSQLSvc/myhost.redmond.microsoft.com:instancename accountname

Then I jumped to that one, which is really in the context of SCCM, not SharePoint, but it turned out to be the correct one.

The command to register an SPN for a SQL Server named instance is the same as that used when registering an SPN for a default instance except that the port number should match the port used by the named instance.

So, we deleted the current SPNs and re-registered them in the format MSSQLSvc/server.fqdn:port domain\account, and the error disappeared.
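
A sketch of those two commands, with placeholder server, port and account values (our named instances listen on custom ports, so use the actual port of each instance):

# Remove the SPN registered with the instance name (the wrong one)
setspn -d MSSQLSvc/sql01.contoso.com:SPINSTANCE CONTOSO\svc-sql
# Re-register using the port the named instance actually listens on
setspn -s MSSQLSvc/sql01.contoso.com:2433 CONTOSO\svc-sql

To be sure we've done the correct thing, let's check in SQL: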

SELECT DB_NAME(dbid) AS DatabaseName, loginame AS LoginName, sys.dm_exec_connections.auth_scheme AS AuthMethod
FROM sys.sysprocesses
JOIN sys.dm_exec_connections
  ON sys.sysprocesses.spid = sys.dm_exec_connections.session_id
WHERE dbid > 0
GROUP BY dbid, loginame, spid, sys.dm_exec_connections.auth_scheme

That would return a list of connections to all databases, listing the accounts and the protocol.
It's normal to have connections to the system databases using NTLM, but all the connections to the SharePoint Content and Config databases should be using Kerberos if you've configured it correctly. If you see a mix of NTLM and Kerberos connections to one of the Content or Config DBs, an iisreset on the SharePoint server(s) will be needed to kill all the current connections. Or you can just restart the SQL instance.




Issue number two - The App Fabric Caching Service needs an SPN, too.

Same error as for the SQL SPN:  Principal Unknown. Well, that's expected.

The format of the requested SPN (again, logging, network capturing) is:

AppFabricCachingService/server.FQDN:22233 domain\account.

I haven't seen a request for the NetBIOS-name-only SPN, but it's recommended to create it as well.
Reference: the blog of Sam Betts, SharePoint Escalation Engineer at Microsoft Madrid.

So, create this one, too: AppFabricCachingService/server:22233 domain\account

I am assuming you've already configured the managed account to run the AppFabric service.
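
A sketch of the two registrations, again with placeholder server and account names (22233 is the AppFabric cache port requested in our logs):

setspn -s AppFabricCachingService/spserver.contoso.com:22233 CONTOSO\svc-appfabric
setspn -s AppFabricCachingService/spserver:22233 CONTOSO\svc-appfabric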



Issue number three - Delegation tab missing from the Active Directory account properties.

If you can't see the tab, it's because no SPN has been created for the account. If you don't need an SPN for that account but you still want to trust it for delegation, based on your scenario, you can create a dummy SPN for it.
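
A sketch of the workaround - the service class and name are arbitrary and just need to be unique in the domain; the account is a placeholder:

setspn -s FAKE/delegation-dummy CONTOSO\svc-account

Once the SPN exists, the Delegation tab should appear in the account's properties in Active Directory Users and Computers.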


Issue number four - SharePoint is currently configured to block Intranet calls

Microsoft recommends configuring the RSS Viewer webpart to test the implementation.

"Configure the RSS Viewer web part to display RSS feeds in a local and remote web application".

That immediately failed for us. We got the following error on both feeds:



And the following in the ULS logs:

RssWebPart: Exception handed to HandleRuntimeException.HandleException Microsoft.SharePoint.SPException: Access to <the feed URL> is denied. SharePoint is currently configured to block intranet calls.    

The fix (run it in the SharePoint 2013 Management Shell on each of the involved farms):

$farm = Get-SPFarm
$farm.Properties.disableintranetcalls = $false
$farm.Properties.disableintranetcallsfromapps = $false
$farm.Update()


Issue number five - The RSS webpart does not support authenticated feeds



Even after fixing the issue with the blocked intranet calls, you still won't be able to see either local or remote RSS feeds through the RSS Viewer webpart if you have this in the web.config of your web application (which is the default):

<add key="aspnet:AllowAnonymousImpersonation" value="true" /> 

Change it to false and you'll be able to see at least the local RSS feed:

<add key="aspnet:AllowAnonymousImpersonation" value="false" />
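
If you'd rather not hand-edit web.config on every server, here's a sketch using SPWebConfigModification to push the change farm-wide. The web application URL is a placeholder and this is untested in your environment, so try it on a test farm first:

$webApp = Get-SPWebApplication "https://portal.contoso.com"
$mod = New-Object Microsoft.SharePoint.Administration.SPWebConfigModification
$mod.Path  = "configuration/appSettings"
$mod.Name  = "add[@key='aspnet:AllowAnonymousImpersonation']"
$mod.Owner = "KerberosRssFix"
$mod.Type  = [Microsoft.SharePoint.Administration.SPWebConfigModification+SPWebConfigModificationType]::EnsureChildNode
$mod.Value = "<add key='aspnet:AllowAnonymousImpersonation' value='false' />"
$webApp.WebConfigModifications.Add($mod)
$webApp.Update()
# Apply the modification to web.config on all servers in the farm
$webApp.Parent.ApplyWebConfigModifications()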


Issue number six - You can see the local RSS feed, but not the one from the remote farm

After tackling the previous two issues around the webpart, we still couldn't see our remote feeds. The account I'm using has Full Control rights on both the source and the destination site collections.
If I grab the feed URL, I can open it directly in the browser without any problems.
This led me to believe that the delegation is not working and we're facing the "double-hop" issue, which occurs when you're using NTLM. The RSS Viewer webpart does not work with NTLM.

You need to either allow Anonymous access on the site hosting the feed, or as we do - use Kerberos.

I looked through all the tools and logs, and I can see that the delegation is working: Event ID 4624 on the web server hosting the feed - look at the Impersonation Level and the Authentication Package:



If you see results like the above, then your Kerberos authentication and delegation are definitely working. The thing that worried me was that the account in this event was the application pool account of the farm requesting the feed, not the personal account I had used to load the page.

This account, of course, does not have access to the site hosting the feed on the destination farm.
So the ULS logs show this nice little 401 error on the server requesting the RSS feed:

RssWebPart: Exception handed to HandleRuntimeException.HandleException System.Net.WebException: The remote server returned an error: (401) Unauthorized.
 at Microsoft.SharePoint.WebControls.BaseXmlDataSource.GetXmlDocument()
 at Microsoft.SharePoint.WebPartPages.DataFormWebPart.GetHierarchicalXPathNavigator(IHierarchicalDataSource ds)
 at Microsoft.SharePoint.WebControls.SingleDataSource.GetXPathNavigatorInternal()
 at Microsoft.SharePoint.WebControls.SingleDataSource.GetXPathNavigator()
 at Microsoft.SharePoint.WebPartPages.DataFormWebPart.PrepareAndPerformTransform(Boolean bDeferExecuteTransform)

A lot of digging led me to understand that the RSS Viewer webpart does not work that way when it's in the context of a Claims web application. It tries to read the feed as the application pool account, and this is not changeable.

The Microsoft Kerberos document I mentioned in the beginning says something that might explain the RSS Viewer behaviour (it's for 2010, but I don't think it has changed in 2013):

Currently, most of the service applications that are included with SharePoint Server do not allow for outbound claims authentication, but outbound claims is a platform capability that will be taken advantage of in the future. Further, many of the most common line-of-business systems today do not support incoming claims authentication, which means that using outbound claims authentication may not be possible or will require additional development to work correctly.



This also seems to be true:

Important:

It is a requirement to configure your web applications with classic Windows authentication using Kerberos authentication to ensure that the scenarios work as expected. Windows-Claims authentication can be used in some scenarios but may not produce the results detailed in the scenarios below.

The only solution I've found for properly testing this is to create two Classic Mode web applications and configure the RSS feed between them. That worked over Kerberos without any pain.


To double-check that Kerberos delegation works, I switched the source and destination web apps back to NTLM through the Central Administration Authentication Provider properties for the web apps. Boom - none of the feeds are displayed anymore: "The RSS webpart does not support authenticated feeds". Very generic :)

There are a lot of possible reasons for which your Kerberos implementation might fail. Just some of them are:

- Firewall rules (UDP port 88 on the DC); you can switch Kerberos to TCP or just make sure that port is open.
- A time difference bigger than 5 minutes between the client requesting the ticket and the KDC. Kerberos always uses GMT in the errors logged, but you should ignore that: as long as the time is in sync, the time zone doesn't matter (see the quick check after this list).
- Browser security settings (it's recommended that your sites are in the Intranet Zone);
- Proxy settings in the web.config - needs to be bypassed for local connections;
- Wrong DNS records; even the PTR records can play a vital role with Kerberos.
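
For the time-difference item, a quick check of the offset against the KDC from any member server (the DC name is a placeholder):

w32tm /stripchart /computer:dc01.contoso.com /samples:3 /dataonly
# Force a resync if the offset is anywhere near the 5-minute limit
w32tm /resync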

Wednesday, January 7, 2015

Nintex Analytics causing massive load on SQL

Nintex Analytics is a 3rd-party product running on SharePoint (installed as a solution) which is used by some of our customers that are still on SharePoint 2010. I won't go into details about the product; Joel Oleson already did a great review of it a while ago. Nintex is not developing it anymore and support will be retired on 1st July 2015, so I've decided to share one of my very few real-world experiences with it. Meanwhile, there is an alternative product for SharePoint 2013 - HarePoint Analytics - for the people that used and loved Nintex Analytics.

The environment:

  • SharePoint 2010 Standard - 1 APP, 1 WFE
  • Nintex Workflow running on both servers
  • Nintex Analytics running on the APP server

...All sharing the same SQL box... which was running fine for 3 years.

One wonderful day we started getting complaints from users that their intranet was running slow; the server graphs had been showing 100% CPU on the SQL box for the past few minutes. That trend continued, and the only process eating the CPU was the SQL Server process itself.

I ran a report of the most CPU-expensive queries on the server and found the following:

SELECT @WebCount = COUNT_BIG(DISTINCT w.ObjectId)
 FROM dbo.DimSPObjectsSites o with (readuncommitted)
  INNER JOIN (SELECT DISTINCT ObjectId, ObjectTypeId, EventTypeId FROM dbo.FactAuditData with (readuncommitted) WHERE IntervalId >= @IntervalStart AND IntervalId < @IntervalEnd) f          
   ON  f.ObjectId = o.ObjectId
   AND f.ObjectTypeId = o.ObjectTypeId
   AND f.EventTypeId = 3
  INNER JOIN dbo.DimSPWebs w with (readuncommitted)
   ON w.ObjectId = o.SPWebId
   AND w.WebTemplate = @WebTemplate




That query makes use of the following 3 database indexes:


dbo.FactAuditData.IX_FactAuditData
dbo.DimSPObjectsSites.IX_DimSPObjectsSites
dbo.DimSPWebs.IX_DimSPWebs_WebTemplate


...which are all in the Nintex Analytics content databases. So I started stopping the Nintex Analytics services (which are basically Windows services running on the SharePoint server) one by one. After stopping the Nintex Analytics Data Management Service, the CPU time dropped immediately back to the levels we had been observing before.
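
For reference, a sketch of how the services can be cycled from PowerShell; the display names are from memory and may differ in your installation, so list them first:

# List the Nintex Analytics Windows services and their state
Get-Service -DisplayName "*Nintex Analytics*" | Format-Table Name, DisplayName, Status

# Stop just the Data Management Service (display name is an assumption - check the list above)
Stop-Service -DisplayName "Nintex Analytics Data Management Service" -Force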




I've tried a few other bits and pieces, like reconfiguring the reports in terms of data to retain, purging intervals and so on, but every time I started the Data Management Service, within a minute the SQL Server was getting hammered (100% CPU) again.

I raised this with Nintex Support and they sent me the following SQL query to create a stored procedure:

CREATE PROCEDURE [dbo].[CfgIndexInformation]
AS
BEGIN
SET NOCOUNT ON
DECLARE @indexCounter int,
@maxIndexes int,
@partitioncount bigint,
@schemaname sysname,
@objectname sysname,
@indexname sysname,
@objectid int,
@indexid int,
@partitionnum bigint,
@frag float,
@partitions bigint

DECLARE @work TABLE
(
indexNumber int identity(1,1),
objectId int,
indexId int,
partitionNum bigint,
fragmentation float
)

DECLARE @tables TABLE
(
tableName sysname,
indexName sysname,
fragmentation float
)

INSERT @work (objectId, indexId, partitionNum, fragmentation)
SELECT s.object_id,
s.index_id,
s.partition_number,
s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED') s
WHERE s.avg_fragmentation_in_percent > 10.0 AND s.index_id > 0

SET @maxIndexes = @@ROWCOUNT
SET @indexCounter = 1

WHILE @indexCounter <= @maxIndexes
BEGIN

SELECT @objectid = objectId, 
@indexid = indexId, 
@partitionnum = partitionNum, 
@frag = fragmentation
FROM @work
WHERE indexNumber = @indexCounter

SELECT @objectname = o.name, 
@schemaname = s.name
FROM sys.objects o
INNER JOIN sys.schemas s 
ON s.schema_id = o.schema_id
WHERE o.object_id = @objectid

SELECT @indexname = name 
FROM sys.indexes
WHERE object_id = @objectid 
AND index_id = @indexid

SELECT @partitioncount = count (*) 
FROM sys.partitions
WHERE object_id = @objectid 
AND index_id = @indexid

INSERT @tables
SELECT @schemaname + '.' + @objectname, @indexname, @frag

SET @indexCounter = @indexCounter + 1
END

SELECT * FROM @tables ORDER BY fragmentation DESC, tableName, indexName

END

Then I ran the newly created stored procedure (exec CfgIndexInformation) to get information on index fragmentation. The indexes used by the most expensive query I found earlier were fragmented at 95%, 91% and 66% respectively. I stopped and disabled all the Nintex Analytics services and ran exec dbo.CfgIndexRefresh against all the Nintex Analytics content databases, as per the support team's advice.

That decreased the fragmentation a lot, and we've managed to run Nintex Analytics fine after that. The long-term solution is to schedule the index refresh on a monthly basis to avoid recurrences.
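
A sketch of how that monthly refresh could be scripted; the instance and database names are placeholders, and Invoke-Sqlcmd needs the SQL Server PowerShell module on the box running it:

Import-Module SQLPS -DisableNameChecking
# Run the Nintex-provided index refresh against each Analytics content database
foreach ($db in @("NintexAnalyticsContent1", "NintexAnalyticsContent2")) {
    Invoke-Sqlcmd -ServerInstance "SQL01\INSTANCE" -Database $db -Query "EXEC dbo.CfgIndexRefresh" -QueryTimeout 0
}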

Monday, January 5, 2015

SharePoint Online slowness in document libraries when using Managed Metadata

I've had a very strange complaint from a customer that was set up on Office 365 quite early on; everything worked fine in their SharePoint Online for a few months. They're a very small company with fewer than 50 users. The scenario is the following:

- A couple of subsites under a root site collection in SharePoint Online
- A few document libraries underneath each site
- Some document libraries having 4-5 Managed Metadata columns in the default view
- Those same document libraries take about 30 seconds to load, compared to 1-3 seconds for the rest.
- None of the document libraries exceeds even a thousand documents.
- That issue appeared out of the blue, it was working fine for months.

I did some tests removing the Managed Metadata columns and, prior to that, reviewed the term store to see if something was exceeding a limit or contradicting best practices.

After removing the Managed Metadata columns, the libraries loaded in a tenth of the time. That was very weird, as the columns have just a few choices each - nothing that would put much load on the backend. Anyway, I decided to raise this with Microsoft and they resolved it in a few days. The first engineer just tried to replicate the issue on another Office 365 test tenant, but he couldn't see such high load times. The escalation engineer, however, solved it by working on the backend with their internal tools.

The explanation was the following:

"Using backend tools I was able to look up ECMPermissions table for the specified environment and delete any users that were not found in SPODS (user data base). After this action the issue was solved"

In plain English, that means that if you delete users from an Office 365 tenant, sometimes orphaned records are left in this ECMPermissions table, and they have to be cleaned up manually. I haven't asked whether they plan a hotfix, but after their intervention, Managed Metadata is usable again.

Saturday, January 3, 2015

2015 will be the Office365 year

First of all, Happy New Year everyone! Let it be more successful and more rewarding in every aspect for each and every one of you in personal and professional terms.
 
After reading Joel Oleson's SharePoint and Office 365 2015 Predictions, I can't agree more with the power of Office 365 and its growing ascendancy over the on-prem versions of SharePoint, Exchange and Lync. We can definitely see that Microsoft is pushing it really hard, especially to IT Pros, with the certification track changes. Just a few months ago, you had to learn Windows Server 2012 and sit three exams related to it in order to proceed to the MCSE levels for SharePoint and Exchange.

Today, you have to know Office 365, no matter whether you are an Exchange or SharePoint expert.
Both terms are already becoming too narrow, and an expanded skillset will be required for us to be more competitive in the market and provide even higher added value to the organizations we work with.


I sat the 70-347: Enabling Office 365 Services exam just before Christmas, and I can say it was a close one: I passed with a score of just 775. The reason is that SharePoint is only some 25% of the exam, with all the rest covering other Office 365 features like Exchange and Lync Online. Configuring Exchange Online and Lync Online is definitely not one of my specialties; SharePoint is, though, and next (and final) on the list to MCSE is the 70-332: Advanced Solutions of Microsoft SharePoint Server 2013 exam, which I'll sit this month.


Microsoft seems to have a plan to unite all the IT Pros working with their stack by hosting the Ignite Conference, replacing others like the world-famous SharePoint Conference.
2015 will be interesting, with vNext coming out as well - I am sure it will be a vigorous year.