Converting RTF to Text in SQL Revisited

by Welcome to Marquee Insights

If you’ve ever attempted to include content from a Project multi-line text custom field or Issues multi-line fields in an Excel report, you will get a result that looks like this:

<div><p>Vendor is having issues with deliveries to the East Coast.</p></div>

At the 2012 Project Conference, I presented a SQL based solution that converts the RTF content to XML and then returns the text. This technique was also published here in the Project Server Forum.

In practice, the technique worked but had failures in some cases. I found early bugs in SharePoint where the lead or trailing <div> tag was missing. However, this did not explain all failures.

I got an email from Doug Welsby of Microsoft Canada, that the failures were due to text values like &nbsp; which are not standard XML. Basically, an inline (Document Type Definition) DTD is needed to convert a non-standard XML value like &nbsp; to XML compliant &#160;  The full list of possible values to convert can be found here: HTML Codes

Technique Assumptions

This technique works on any multi-line Project custom fields or multi-line list columns in SharePoint. Therefore, the Issue Discussion field from SharePoint can also be cleaned with this technique.

The following example illustrates the SQL technique, though you may have to extend the DTD definitions to fit your own data. Use the table to above to add additional ENTITY declarations. The technique also assumes that the multi-line field contains valid HTML. I’ve found no issues with this technique on systems patched to June 2012 CU or later on Project Server 2010.

This technique will work with Project 2007, 2010 and 2013 databases as there are no real differences in how this type of data is stored across releases. This technique does work inside an Office Data Connection.

NOTE: One of the challenges I faced is that I could find no good examples of how to do an inline DTD in SQL. While the technique below works, I’m open to a more elegant way to construct the DTD. Please post any ideas or issues in the comments.

The Technique

The technique does the following:

  • Converts the RTF value to an XML document
  • Extracts the text value from the XML
  • Trims the blank space from the resulting value
  • Replaces any null values with a blank
    First, two variables are declared to hold the XML Header and Footer information. The Header contains the inline DTD declaration for &nbsp; and could be augmented with more declarations. The Footer contains the tags needed to complete the document.

The SQL CONVERT function changes the concatenated string of Header, your multi-line Project custom field and Footer into an XML document. The 3 option in the CONVERT command enables the inline DTD as well as it leaves all internal spaces as is. More details on the CONVERT command can be found here.

The .value method performs and XQuery and returns the string value to SQL. More details on this method can be found here. The method returns the value to an nvarchar(4000) field but you can use nvarchar(max) if you think you may have a truncation issue.

LTRIM removes the leading spaces of the resulting string. You could also do a RTRIM to remove following spaces but I didn’t find this to be an issue. The ISNULL is used to return a blank when a null value is returned. If you don’t do this, you get (blank) in Excel rather than an empty cell. You can fix this in Excel but I found it more efficient to simply fix it in the data source for every report that uses it.

Code Sample

Replace the YourMulti-lineCustomField text below with the name of your custom field. Since the original field is also in the query result set, I tend to name it YourMulti-lineCustomFieldClean so that I can distinguish the two versions.

The ISNULL clause should be repeated for each custom field you are cleaning.

    declare @Headxml nvarchar(350)
    declare @Footxml nvarchar(50)
    
    set @Headxml = N'<?xml version="1.0"?>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
    "http://www.w3.org/TR/html4/loose.dtd"
    [<!ENTITY  nbsp "&#xA0;"><!ENTITY quot "&#x22;">]><html><body>'
    
    set @Footxml = N'</body></html>'
    
    select   *      
            ,ISNULL(LTRIM((CONVERT(xml,(@Headxml+[YourMulti-lineCustomField]+@Footxml),3)
            .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName]
    
            ,ISNULL(LTRIM((CONVERT(xml,(@Headxml+[YourMulti-lineCustomField2]+@Footxml),3)
            .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName2]
    FROM  dbo.MSP_EpmProject_UserView

2 Responses to “Converting RTF to Text in SQL Revisited”

November 03, 2012 at 7:10 am, Tony Proctor said:

Very nice, way to go Treb. You always come up with the best solutions for tough problems.

Reply

March 26, 2014 at 11:41 am, Andrew Payze said:

Hi Treb,
Thanks for this information. I am trying to get this to work with PS013 and I am getting an error message

The data types ntext and varchar are incompatible. I am struggling to get this resolved. Any quick ideas?

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.