Cleaning up HTML with PHP to produce clean lines

I have a bunch of HTML data that I write to a PDF file using PHP. In the PDF, I want all of the HTML markup removed and the text cleaned up. For example:

<ul>
    <li>First list item</li>
    <li>Second list item which is quite a bit longer</li>
    <li>List item with apostrophe  's</li>
</ul>

It should become:

First list item
Second list item which is quite a bit longer
List item with apostrophe  's

However, if I just use strip_tags(), I get something like this:

   First list item&#8232;

   Second list item which is quite a bit
longer&#8232;

   List item with apostrophe &rsquo;s &rsquo;s

Also note the indentation of the output.

Any tips on how to properly clean up HTML into nice clean lines, without messy whitespace and odd characters?

Thanks :)

+5
3 answers

You can decode the entities left in the strip_tags() output with html_entity_decode(), or remove them with preg_replace():

$text = strip_tags($html_text);
$content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$text );

To remove the spaces at the beginning of each line, use ltrim():

$content = join("\n", array_map("ltrim", explode("\n", $content )));

To keep the apostrophe, replace &rsquo; before removing the remaining entities:

$text = strip_tags($html_text);
$text = str_replace("&rsquo;","'", $text); 
$content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$text );
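The steps in this answer can be combined into one function. This is only a minimal sketch: the function name htmlToCleanLines() is hypothetical, and dropping the empty lines left behind by the removed tags is an added assumption about the desired output, not something the answer itself does. The arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper: the name and the blank-line filtering are assumptions.
function htmlToCleanLines(string $html): string
{
    $text = strip_tags($html);
    // Keep apostrophes before discarding the remaining entities.
    $text = str_replace('&rsquo;', "'", $text);
    $text = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $text);

    // Trim both ends of every line, then drop the lines that end up empty.
    $lines = array_map('trim', explode("\n", $text));
    $lines = array_filter($lines, fn (string $line): bool => $line !== '');

    return implode("\n", $lines);
}

$html = "<ul>\n    <li>First list item&#8232;</li>\n    <li>Item with apostrophe&rsquo;s</li>\n</ul>";
echo htmlToCleanLines($html), "\n";
```

Note that this variant discards every entity except &rsquo;, so characters such as &amp; or &eacute; would be lost; decoding instead of removing avoids that.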
+3

Strip the tags, then decode the HTML entities:

html_entity_decode( strip_tags( $my_html_code ) );
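One detail worth knowing with this approach: &#8232; decodes to U+2028 (LINE SEPARATOR) and &rsquo; decodes to the typographic apostrophe U+2019, not to an ASCII quote. A small sketch of how that might be handled follows. The function name decodeStrippedHtml(), the explicit flags, and the choice to turn U+2028 into a newline are all assumptions; the "\u{...}" escape requires PHP 7 or later.

```php
<?php
// Hypothetical helper: name, flags and U+2028 handling are assumptions.
function decodeStrippedHtml(string $html): string
{
    // Decode named and numeric entities into UTF-8 characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &#8232; becomes U+2028 (LINE SEPARATOR); treat it as an ordinary line break.
    return str_replace("\u{2028}", "\n", $text);
}

echo decodeStrippedHtml('<li>First&#8232;</li><li>It&rsquo;s second</li>'), "\n";
```

If the PDF font lacks a glyph for U+2019, a further str_replace() to "'" may be needed.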
+3

Use PHP's tidy extension to clean up your HTML. But in your case, I would use the DOMDocument class to extract the data from the HTML.
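A rough sketch of the DOMDocument idea, assuming the text of interest lives in <li> elements as in the question. The function name listItemsToLines() is hypothetical, and the XML-declaration prefix is a common workaround for making loadHTML() treat the input as UTF-8, not something this answer specifies.

```php
<?php
// Hypothetical helper: extracting only <li> elements is an assumption
// based on the example in the question.
function listItemsToLines(string $html): string
{
    $previous = libxml_use_internal_errors(true); // suppress parser warnings on fragments
    $doc = new DOMDocument();
    // The prefix hints to libxml that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lines = [];
    foreach ($doc->getElementsByTagName('li') as $item) {
        // textContent is already entity-decoded; collapse runs of whitespace.
        $lines[] = trim(preg_replace('/\s+/u', ' ', $item->textContent));
    }

    return implode("\n", $lines);
}

echo listItemsToLines("<ul>\n  <li>First  item</li>\n  <li>It&rsquo;s here</li>\n</ul>"), "\n";
```

Because the parser works on elements rather than raw text, the indentation between tags never reaches the output.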

0