- 1 WFE running the Query Processing Component
- 2 Application servers running all other Search components
- 1 SQL server (dedicated and only one instance on it)
- All databases and service applicatoins provisioned through PowerShell (AutoSPInstaller)
The user experience looks like this on any query, executed from anywhere (and that just happened to occur out of the blue):
The Search Service Administration page in Central Admin was showing "All Errors".
The event logs on the APP servers were all flooded with EventID 1357 containing this:
A database error occurred. Source: .Net SqlClient Data Provider Code: 229 occurred 0 time(s) Description: Error ordinal: 1 Message: The EXECUTE permission was denied on the object 'proc_MSS_GetStatusChangeRequest', database 'Search_Service_Application_DB', schema 'dbo'., Class: 14, Number: 229, State: 5 at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite)
at Microsoft.Office.Server.Data.SqlSession.ExecuteNonQuery(SqlCommand command)
.... and a lot more errors related to Search as a consequence of this.
That's SP2013 + SP1 + only 2 security updates released in May this year. No updates applied after that and everything was working as a charm for 6 months after we've built it. I was given info that there were some network issues experienced just prior to that issue, but I did not have enough data to correlate both.
I've seen the same behavior a couple of months ago when a SQL box was rebooted manually without any actions on the SharePoint farm like stopping services or IIS at all. Then the Search stopped working even when SQL was operational. The reason was lost service account permissions for the Search service account. The same thing happened now for some (unknown yet) reason - the Search service account losts its permission - SPSearchDBAdmin for the Search Service Application database.
I know that's not the best solution, but the only one I could think of in the limited amount of time available to fix this production system - add the permission to the account manually in SQL Server Management Studio. Go to your Search Service Application database (the one referenced above in the application error from the event log, then go to Security -> Users and select the account you use for running your Search Service Application pool, it is also shown in the event log under the error.
In my case that's a dev system on which I've reproduced the issue for the purpose of this post:
On the Membership section, verify the SPSearchDBAdmin role is selected for the account, if it's not (like it was in our case), add it and that should fix your problem. Might need to restart the Search Service again on all the servers running it - we had to do that in our case to get the Admin Component up and running in addition to the Query Processing Component which started working properly immediately after the database permissions were restored.
We still do not know the root cause of that as the environment was handed over long time ago, initially thought that Search in 2013 is very sensitive to connectivity to the SQL Server as I've seen the issue with lost permissoins twice, and the first time was a reboot of SQL definitely causing it. Then I stopped the SQL server on my dev environment for a while to see if that will happen and it didn't, it must be something else then. And yes, we do use SQL aliases, but we only had one SQL server in that production environment. Hopefully that helps you solve such critical issues quickly.