Converting RTF to Text in SQL Part 3
Yes, Part 3. This technique was published some time ago and as it gets further “field testing”, new conditions have come to light which require a revision of this technique. Assuming this is the last post on this topic, I need to figure out how to duplicate this logic in PowerBI.
I consider this to be version 15 or so of this query since my first version was written in mid-September of 2010. The changes here are as follows.
First, the DTD definitions have been expanded to include all possible HTML codes. This should prevent a report blowing up when someone decides to use an unexpected symbol somewhere.
Second, I keep seeing random <br> tags in the fields, which ironically is breaking the query. I’ve added a replace statement for that as well.
Lastly, I’ve expanded the field sizes to Max where necessary to accommodate truncation issues.
Sample code is below. I strongly recommend copying into NotePad first so that you don’t accidentally get curly quotes. I’ve also left it as straight text to make it easier to copy.
declare @Headxml nvarchar(3000)
declare @Footxml nvarchar(50)
set @Headxml = N'<?xml version=”1.0″?>
<!DOCTYPE HTML PUBLIC “-//W3C//DTD HTML 4.01 Transitional//EN”
“http://www.w3.org/TR/html4/loose.dtd”
[<!ENTITY quot “"”>
<!ENTITY amp “&”>
<!ENTITY lt “<”>
<!ENTITY gt “>”>
<!ENTITY nbsp “ ”>
<!ENTITY iexcl “¡”>
<!ENTITY cent “¢”>
<!ENTITY pound “£”>
<!ENTITY curren “¤”>
<!ENTITY yen “¥”>
<!ENTITY brvbar “¦”>
<!ENTITY sect “§”>
<!ENTITY uml “¨”>
<!ENTITY copy “©”>
<!ENTITY ordf “ª”>
<!ENTITY laquo “«”>
<!ENTITY not “¬”>
<!ENTITY shy “­”>
<!ENTITY reg “®”>
]><html><body>’
set @Footxml = N'</body></html>’
select *
,ISNULL(LTRIM((CONVERT(xml,(@Headxml + replace([YourMulti-lineCustomField],'<br>’,”) + @Footxml),3).value(N'(/)’,’nvarchar(Max)’))),”) AS [YourMulti-lineCustomFieldNewName]
,ISNULL(LTRIM((CONVERT(xml,(@Headxml + replace([YourMulti-lineCustomField2],'<br>’,”) + @Footxml),3).value(N'(/)’,’nvarchar(Max)’))),”) AS [YourMulti-lineCustomFieldNewName2]
FROM dbo.MSP_EpmProject_UserView
December 06, 2014 at 3:24 am, Round Up Of Recent Microsoft Project Posts | Project Management Buzz said:
[…] Treb recently updated his ‘getting rid of html tags’ document Converting RTF to Text in SQL Part 3 which I know I will be […]