Converting RTF to Text in SQL Part 3

by Welcome to Marquee Insights

captain-picard-facepalm-meme

Yes, Part 3. This technique was published some time ago and as it gets further “field testing”, new conditions have come to light which require a revision of this technique. Assuming this is the last post on this topic, I need to figure out how to duplicate this logic in PowerBI.

I consider this to be version 15 or so of this query since my first version was written in mid-September of 2010. The changes here are as follows.

First, the DTD definitions have been expanded to include all possible HTML codes. This should prevent a report blowing up when someone decides to use an unexpected symbol somewhere.

Second, I keep seeing random <br> tags in the fields, which ironically is breaking the query. I’ve added a replace statement for that as well.

Lastly, I’ve expanded the field sizes to Max where necessary to accommodate truncation issues.

Sample code is below. I strongly recommend copying into NotePad first so that you don’t accidentally get curly quotes. I’ve also left it as straight text to make it easier to copy.

declare @Headxml nvarchar(3000)
declare @Footxml nvarchar(50)

set @Headxml = N'<?xml version=”1.0″?>
<!DOCTYPE HTML PUBLIC “-//W3C//DTD HTML 4.01 Transitional//EN”
http://www.w3.org/TR/html4/loose.dtd”
[<!ENTITY quot “&#34;”>
<!ENTITY amp “&#38;”>
<!ENTITY lt “&#60;”>
<!ENTITY gt “&#62;”>
<!ENTITY nbsp “&#160;”>
<!ENTITY iexcl “&#161;”>
<!ENTITY cent “&#162;”>
<!ENTITY pound “&#163;”>
<!ENTITY curren “&#164;”>
<!ENTITY yen “&#165;”>
<!ENTITY brvbar “&#166;”>
<!ENTITY sect “&#167;”>
<!ENTITY uml “&#168;”>
<!ENTITY copy “&#169;”>
<!ENTITY ordf “&#170;”>
<!ENTITY laquo “&#171;”>
<!ENTITY not “&#172;”>
<!ENTITY shy “&#173;”>
<!ENTITY reg “&#174;”>
]><html><body>’

set @Footxml = N'</body></html>’

select   *
,ISNULL(LTRIM((CONVERT(xml,(@Headxml + replace([YourMulti-lineCustomField],'<br>’,”) + @Footxml),3).value(N'(/)’,’nvarchar(Max)’))),”)   AS [YourMulti-lineCustomFieldNewName]

,ISNULL(LTRIM((CONVERT(xml,(@Headxml + replace([YourMulti-lineCustomField2],'<br>’,”) + @Footxml),3).value(N'(/)’,’nvarchar(Max)’))),”)   AS [YourMulti-lineCustomFieldNewName2]

FROM  dbo.MSP_EpmProject_UserView

One Response to “Converting RTF to Text in SQL Part 3”

December 06, 2014 at 3:24 am, Round Up Of Recent Microsoft Project Posts | Project Management Buzz said:

[…] Treb recently updated his ‘getting rid of html tags’ document Converting RTF to Text in SQL Part 3 which I know I will be […]

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.