Regex for replacing content not inside HTML tags

I have a function that helps link pages on my site by scanning blog entries, news, and other items for specific keywords. It then replaces these keywords with a link to the corresponding page.

My problem is that some occurrences of these words should not be replaced by links. For example, several of my HTML tables have a summary attribute that contains a short summary of the table's contents. A table might look like this:

<table width="500" cellspacing="0" cellpadding="4" border="0" summary="This table contains a list of all car parts in inventory along with their corresponding prices">
...
</table>

My function incorrectly replaces a keyword or phrase like "car parts" inside that attribute with a link. How can I structure my replacement regular expression so that it does NOT replace the keyword in such cases, but DOES replace it when it appears in a paragraph or inside a cell of an HTML table?

Thank you in advance for your help and guidance!

EDIT: just for clarification, I use PHP to render my pages. I currently call str_replace() on the content before it is output as HTML to the page. I want to switch to ereg_replace() so that the content is replaced only if it meets certain conditions (i.e., as described above). Sorry if this caused confusion!

1 answer

Don't use a regex to process HTML. Use PHP's DOM extension instead:

$DOM = new DOMDocument;
$DOM->loadHTML($str); // $str holds your HTML

// Get all table cells
$cells = $DOM->getElementsByTagName('td');

// Replace keywords in the text of each cell

// Get all paragraphs
$paragraphs = $DOM->getElementsByTagName('p');

// Replace keywords in the text of each paragraph

// Etc...
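
To make the "do stuff" step concrete, here is a minimal sketch of one way it could look, assuming PHP 7.4+ with the DOM extension enabled. The helper name linkKeywords, the keyword-to-URL map, and the /parts URL are made up for illustration. It uses DOMXPath rather than getElementsByTagName so that it can pick out only the text nodes inside p and td elements; attribute values such as summary are never visited, so they stay as they are.

```php
<?php
// Hypothetical helper (not from the original answer): wraps each keyword
// occurrence in an <a>, touching only text nodes inside <p> and <td>.
// $links maps keyword => URL, e.g. ['car parts' => '/parts'].
function linkKeywords(string $html, array $links): string
{
    if (!$links) {
        return $html;
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom->loadHTML($html);              // assumption: non-ASCII input may need a charset hint
    libxml_clear_errors();

    // Case-insensitive pattern matching any keyword as a whole word.
    $lookup = array_change_key_case($links, CASE_LOWER);
    $quoted = array_map(fn($k) => preg_quote($k, '/'), array_keys($lookup));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    // Text nodes under <p> or <td>, skipping text that is already inside a link.
    $xpath = new DOMXPath($dom);
    $query = '//p//text()[not(ancestor::a)] | //td//text()[not(ancestor::a)]';
    $nodes = iterator_to_array($xpath->query($query));

    foreach ($nodes as $text) {
        // With DELIM_CAPTURE, odd indexes of $parts hold the matched keywords.
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $dom->createElement('a');
                $a->setAttribute('href', $lookup[strtolower($part)]);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // loadHTML() wraps fragments in <html><body>; return only the body's children.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Usage: the summary attribute is left alone, the cell and paragraph text get links.
$html = '<table summary="List of all car parts in inventory">'
      . '<tr><td>We stock car parts.</td></tr></table>'
      . '<p>Ask about car parts today.</p>';
echo linkKeywords($html, ['car parts' => '/parts']);
```

Because the keywords are matched only against text nodes, there is no need for a regex that tries to tell "inside a tag" from "outside a tag". If one keyword is a prefix of another, listing the longer one first in the map should make it win, since the alternation is tried in order.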