Converting RTF to Text in SQL Revisited
If you’ve ever attempted to include content from a Project multi-line text custom field or Issues multi-line fields in an Excel report, you will get a result that looks like this:
<div><p>Vendor is having issues with deliveries to the East Coast.</p></div>
At the 2012 Project Conference, I presented a SQL based solution that converts the RTF content to XML and then returns the text. This technique was also published here in the Project Server Forum.
In practice, the technique worked but had failures in some cases. I found early bugs in SharePoint where the lead or trailing <div> tag was missing. However, this did not explain all failures.
I got an email from Doug Welsby of Microsoft Canada, that the failures were due to text values like which are not standard XML. Basically, an inline (Document Type Definition) DTD is needed to convert a non-standard XML value like to XML compliant   The full list of possible values to convert can be found here: HTML Codes
This technique works on any multi-line Project custom fields or multi-line list columns in SharePoint. Therefore, the Issue Discussion field from SharePoint can also be cleaned with this technique.
The following example illustrates the SQL technique, though you may have to extend the DTD definitions to fit your own data. Use the table to above to add additional ENTITY declarations. The technique also assumes that the multi-line field contains valid HTML. I’ve found no issues with this technique on systems patched to June 2012 CU or later on Project Server 2010.
This technique will work with Project 2007, 2010 and 2013 databases as there are no real differences in how this type of data is stored across releases. This technique does work inside an Office Data Connection.
NOTE: One of the challenges I faced is that I could find no good examples of how to do an inline DTD in SQL. While the technique below works, I’m open to a more elegant way to construct the DTD. Please post any ideas or issues in the comments.
The technique does the following:
- Converts the RTF value to an XML document
- Extracts the text value from the XML
- Trims the blank space from the resulting value
- Replaces any null values with a blank
- First, two variables are declared to hold the XML Header and Footer information. The Header contains the inline DTD declaration for and could be augmented with more declarations. The Footer contains the tags needed to complete the document.
The SQL CONVERT function changes the concatenated string of Header, your multi-line Project custom field and Footer into an XML document. The 3 option in the CONVERT command enables the inline DTD as well as it leaves all internal spaces as is. More details on the CONVERT command can be found here.
The .value method performs and XQuery and returns the string value to SQL. More details on this method can be found here. The method returns the value to an nvarchar(4000) field but you can use nvarchar(max) if you think you may have a truncation issue.
LTRIM removes the leading spaces of the resulting string. You could also do a RTRIM to remove following spaces but I didn’t find this to be an issue. The ISNULL is used to return a blank when a null value is returned. If you don’t do this, you get (blank) in Excel rather than an empty cell. You can fix this in Excel but I found it more efficient to simply fix it in the data source for every report that uses it.
Replace the YourMulti-lineCustomField text below with the name of your custom field. Since the original field is also in the query result set, I tend to name it YourMulti-lineCustomFieldClean so that I can distinguish the two versions.
The ISNULL clause should be repeated for each custom field you are cleaning.
declare @Headxml nvarchar(350) declare @Footxml nvarchar(50) set @Headxml = N'<?xml version="1.0"?> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [<!ENTITY nbsp " "><!ENTITY quot """>]><html><body>' set @Footxml = N'</body></html>' select * ,ISNULL(LTRIM((CONVERT(xml,(@Headxml+[YourMulti-lineCustomField]+@Footxml),3) .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName] ,ISNULL(LTRIM((CONVERT(xml,(@Headxml+[YourMulti-lineCustomField2]+@Footxml),3) .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName2] FROM dbo.MSP_EpmProject_UserView