Select adjacent sibling elements without intervening non-whitespace text nodes

Given markup like:

<p>
  <code>foo</code><code>bar</code>
  <code>jim</code> and then <code>jam</code>
</p>

I need to select the first three <code> elements, but not the last. The logic is: "Select all code elements that have a preceding or following sibling element that is also code, unless one or more text nodes with non-whitespace content lie between them."

Given that I am using Nokogiri (which uses libxml2), I can only use XPath 1.0 expressions.

Although an XPath expression (however complex) is preferred, Ruby code that iterates over a Nokogiri document to do the same is also acceptable.

Note that the CSS adjacent sibling selector ignores non-element nodes, so nokodoc.css('code + code') will incorrectly select the last <code> element.

Nokogiri.XML('<r><a/><b/> and <c/></r>').css('* + *').map(&:name)
#=> ["b", "c"]

Edit: Additional test cases for clarity:

<section><ul>
  <li>Go to <code>N</code> and
      then <code>Y</code><code>Y</code><code>Y</code>.
  </li>
  <li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

All of the Y elements above should be selected. None of the N elements should be selected. The content of each <code> is used only to indicate which ones should be selected: you may not use the content to decide whether to select an element.

The context elements in which the <code> elements appear do not matter. They may appear in <li>, in <p>, or in something else.

In short, I want to select every contiguous run of <code> elements, that is, the ones marked Y.

+5
3

//code
     [preceding-sibling::node()[1][self::code]
    or
      preceding-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      preceding-sibling::node()[2][self::code]
    or
     following-sibling::node()[1][self::code]
    or
      following-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      following-sibling::node()[2][self::code]
     ]

XSLT-based verification:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:template match="/">
      <xsl:copy-of select=
       "//code
             [preceding-sibling::node()[1][self::code]
            or
              preceding-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              preceding-sibling::node()[2][self::code]
            or
             following-sibling::node()[1][self::code]
            or
              following-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              following-sibling::node()[2][self::code]
             ]"/>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<section><ul>
      <li>Go to <code>N</code> and
          then <code>Y</code><code>Y</code><code>Y</code>.
      </li>
      <li>If you see <code>N</code> or <code>N</code> then…</li>
    </ul>
    <p>Elsewhere there might be: <code>N</code></p>
    <p><code>N</code> across parents.</p>
    <p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
    <p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

The XPath expression is evaluated and the selected nodes are copied to the output:

<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>

+4

//code[
  (
    following-sibling::node()[1][self::code]
    or (
      following-sibling::node()[1][self::text() and normalize-space() = ""]
      and
      following-sibling::node()[2][self::code]
    )
  )
  or (
    preceding-sibling::node()[1][self::code]
    or (
      preceding-sibling::node()[1][self::text() and normalize-space() = ""]
      and
      preceding-sibling::node()[2][self::code]
    )
  )
]

+3

I think this is what you want:

/p/code[not(preceding-sibling::text()[not(normalize-space(.)="")])]
+1