Workflow History and SQL Error

Posted by Rik Hepworth on Thursday, July 3, 2008

When trying to view an item in a list which has workflows run against it, you get an error:

Some part of your SQL statement is nested too deeply. Rewrite the query or break it up into smaller queries

Problem Background

Trying to explain the exact nature of our configuration in this case would break many people’s heads. This, therefore, is a bit of a simplification.

We have a custom webpart which allows users to log an enquiry. We create an item in a list with the enquiry details, and send an email to the account responsible for dealing with those enquiries. A copy of the list item is created in another list (we’ll leave out the why and wherefore of that for now). The two copies must be kept in sync. More details on that later.

Those enquiries must be closed within 30 minutes. If not, an escalation email is sent. An enquiry is closed if a particular column changes value. To ensure the two lists are kept in sync, when an item is changed a workflow is triggered. If the column we care about has changed we sync up the item in the other list.

The escalation process is a timer. It checks the items and sends emails. It updates a column with the time of the last email sent so we can repeat the process every 30 minutes.

What we found was that the enquiries weren’t being closed for a few days and in that time we could then not access the enquiry item at all via the web interface (although datagrid view still worked!). We saw the error at the top of this post.

The Root of the Matter

This fault is currently with our Microsoft Support team and they are working through it. I do, however, have enough knowledge and understanding of why the fault occurred to explain it, and a few dirty hacks to avoid it.

The reason we can’t access the items is because when SharePoint pulls up the item for edit/view it checks the Workflow History for that item. If there are more than about 200 entries for that item in the Workflow History list, we get the SQL query error and boom! That’s the long and short of it.

The deeper question is why? More importantly, why do we have over 200 workflows running on the item?

Workflow History first. The Workflow History list is a hidden list which does exacly what it says on the tin. Items are created each time a workflow runs. It turns out that items in the Workflow History list have a time-to-live and that time is 60 days. That means that any item in the list will automatically be deleted after 60 days. With roughly a 200 item limit before you hit trouble that means about 3 workflows per list item per day is your maximum.

Personally, I think that is a scalability issue. I can envision a scenario where we might want to run that many workflows by design, perhaps more.

Back to the plot. I suspect you’re sitting there thinking that in our case, having that many workflows run is bad design or a fault. Well, you’re not wrong, although you’re not quite right either.

We knew when we built the workflow that we had to avoid circular references and update the lists as little as possible. There is code to make sure that changes made by the workflow itself are not reflected back, and if the change is not the column we care about then the workflow exists cleanly.

We also knew that because the timer job updates a different column in the list item, that would trigger the change event on the list item, running the workflow. As a result the timer performs a system update on the list item which should not trigger events (and indeed does not when we have used the method elsewhere).

What this means is that we actually have two problems:

  1. The system update method when used in our timer is not working correctly and events on the list item are being triggered. This means that the workflow is running too often.
  2. The issue with Workflow History means that we very quickly hit the 200 item limit and meet our end with the SQL query error.

A Legion of Dirty Hacks

As I write, these issues are with Microsoft Support who are ably working to resolve them. In the meantime, we have made the problem go away with two approaches, both of which I regard as dirty hacks.

The Workflow History Conundrum

Whilst investigating this problem I came across a discussion on the TechNet support forums. Ironically this was coming at the same problem but from a wholly opposite angle, whereby people wanted to keep items in the Workflow History list for longer!

What I found in that list was a post by Fred Morrison containing a PowerShell script. I am re-posting it here for completeness in case the forum disappears, but all credit to Fred for this - I didn’t write it!

# SPAdjustAutoCleanupDays.ps1
# Author: Fred Morrison, Senior Software Engineer, Exostar, LLC
#
# Purpose: Adjust SharePoint Workflow Association AutoCleanupDays value, where necessary
# on all workflow associations for a specified List.
#
# Parameters:
# siteName - The SharePoint Site to look at
# listName - The SharePoint List to look at
# newCleanupDays - The number of days to set the workflow association AutoCleanupDays value to, if not already set.
#
# Example call: SPAdjustAutoCleanupDays http://workflow2/FredsWfTestSite FredsNewTestList 180
#
# following makes it easier to work with SharePoint and also means you have to run this script on the SharePoint server
[void] [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint") | Out-Null
# capture command line arguments
$siteName = $args[0] # ex: http://workflow2/FredsWfTestSite/
$listName = $args[1] # ex: FredsNewTestList
[int] $newCleanupDays = [System.Convert]::ToInt32($args[2]) # ex: 1096
Write-Host $siteName
Write-Host $listName
Write-Host $newCleanupDays
# get a reference to the SPSite object
$wfSite = New-Object -TypeName Microsoft.SharePoint.SPSite $siteName
[Microsoft.SharePoint.SPWeb] $wfWeb = $wfSite.OpenWeb()
Write-Host $wfWeb.ToString()
# get a reference to the SharePoint list we wish to examine
[Microsoft.SharePoint.SPList] $wfList = $wfWeb.Lists[$listName];
Write-Host $wfList.Title
[Microsoft.SharePoint.Workflow.SPWorkflowAssociation] $wfAssociation = $null
[Microsoft.SharePoint.Workflow.SPWorkflowAssociation] $a = $null
[int] $assoCounter = 0
[string] $message = ''
# Look at every workflow association on the SPList and make sure the AutoCleanupDays value is correctly set to the desired value
for( $i=0; $i -lt $wfList.WorkflowAssociations.Count; $i++)
{
$a = $wfList.WorkflowAssociations[$i]
[string] $assocName = $a.Name
Write-Host $a.Name
if ( $a.AutoCleanupDays -ne $newCleanupDays )
{ 
$oldValue = $a.AutoCleanupDays 
$a.AutoCleanupDays = $newCleanupDays
# save the changes 
$wfList.UpdateWorkflowAssociation($a) 
$message = "Workflow association $assocName AutoCleanupDays was changed from $oldValue to $newCleanupDays"
}
else
{
$message = "Workflow association $assocName AutoCleanupDays is already set to $newCleanupDays - no change needed"
}
Write-Host $message
}
Write-Host 'Done'

I simply ran that script on our system, setting the value for newCleanUpDays to 1. I waited a day and voila! All the list items were now accessible. Note that, as repeated in the forum discussion time and again, messing about with this is not a good idea. I simply have no choice right now.

The Timer Incident

It was all very well fixing the Workflow History list, but we really shouldn’t be seeing all those workflows in the first place. For some reason, our method of updating the list item from the timer, whilst being the official approach, triggered the workflow anyway.

To the rescue, a method we found on the blog of Paul Kotlyar. In that post, Paul talks about disabling event firing for the list item to ensure that no events get triggered. Why do I think this is a hack? Because the functionality is not normally found in workflows and timers - the method is part of SPEventReceiverBase.

Where Do We Go From Here?

Right now, I have support cases logged with Microsoft and engineers are working on the matter. We’ve already been via the SQL team, who looked at the original query that triggered the whole shebang, and they have returned an updated query for the SharePoint guys to look at. We also need to get to the bottom of a ‘correct’ way of updating list items without triggering events. As soon as I get a resolution from Microsoft, I will let you know.