Possible duplicate:
What is the best way to clean up Word HTML?
PHP to clear Microsoft pasted input
I let clients enter notes in a rich text editor, and I only recently upgraded to CKEditor 3.x, which by default strips MS Word classes, styles, and comments when users paste into the editor. So moving forward I'm all set.
I now need to clean up 5 years of existing records, some of which contain embedded HTML text. I need to go through this text and clean it.
I do not need to strip out all span tags, only those that can be identified as written by Microsoft Word.
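A minimal sketch of that kind of selective filter, for illustration only. It is written in Python with the standard library rather than PHP, and it assumes Word-generated markup can be recognized by four markers: class names starting with `Mso`, style declarations starting with `mso-`, namespaced tags such as `<o:p>`, and `[if ...]` conditional comments. The names `WordMarkupStripper` and `strip_word_markup` are hypothetical, and real Word output may need more rules than these.

```python
from html import escape
from html.parser import HTMLParser


def _clean_attrs(attrs):
    """Drop Word-specific classes and style declarations.

    Returns (kept_attrs, removed_something).
    Assumption: 'Mso*' classes and 'mso-*' declarations come from Word.
    """
    kept, removed = [], False
    for name, value in attrs:
        if name == "class" and value:
            original = value.split()
            classes = [c for c in original if not c.lower().startswith("mso")]
            removed = removed or len(classes) != len(original)
            if not classes:
                continue
            value = " ".join(classes)
        elif name == "style" and value:
            decls = [d.strip() for d in value.split(";") if d.strip()]
            keep = [d for d in decls if not d.lower().startswith("mso-")]
            removed = removed or len(keep) != len(decls)
            if not keep:
                continue
            value = "; ".join(keep)
        kept.append((name, value))
    return kept, removed


def _render(tag, attrs, closing=">"):
    parts = [tag]
    for name, value in attrs:
        # The parser unescapes attribute values, so escape them again.
        parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + closing


class WordMarkupStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_stack = []  # one flag per open <span>: was it kept?

    def handle_starttag(self, tag, attrs):
        if ":" in tag:  # Office namespaced tag such as <o:p>
            return
        cleaned, removed = _clean_attrs(attrs)
        if tag == "span":
            # Unwrap only spans whose attributes were all Word-specific.
            keep = bool(cleaned) or not removed
            self.span_stack.append(keep)
            if not keep:
                return
        self.out.append(_render(tag, cleaned))

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:
            self.out.append(_render(tag, _clean_attrs(attrs)[0], " />"))

    def handle_endtag(self, tag):
        if ":" in tag:
            return
        if tag == "span" and self.span_stack and not self.span_stack.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        # Drop conditional comments; keep ordinary ones.
        if data.strip().lower().startswith(("[if", "[endif")):
            return
        self.out.append(f"<!--{data}-->")


def strip_word_markup(html_text):
    parser = WordMarkupStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Spans that carry non-Word attributes, or no attributes at all, are left in place; only spans whose every attribute was Word-specific are unwrapped, with their text kept.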
I tried using HTMLCleaner, but it does not remove the HTML generated by MS Word. http://word2cleanhtml.com does exactly what I want, but the developers do not currently offer an API for general use (as of July 9, 2012).
I have been searching for such a class on and off for the past few weeks without much luck. Has anyone found a useful class that they would be willing to share?