Using XSLT to process XML that follows a very loose standard (EAD)

I had a hellish week trying to write XSLT that can process XML documents conforming to the (very permissive) EAD standard.

Useful information in an EAD document is hard to locate. Different EAD documents can place the same piece of information in completely different parts of the tree. In addition, within a single EAD document, the same tag can be used many times in different places for different information. See this SO post for an example. This makes it difficult to write a single XSLT file that handles all of these documents properly.

In general terms, the problem can be described as:

  • How do you select a specific EAD node that sits at an unknown location,
  • without accidentally selecting unwanted nodes that have the same name()?
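
To make the problem concrete, here is a small EAD-like fragment. The data is invented purely for illustration and is not taken from a real finding aid. <unittitle> appears once for the collection and once for each component, so an unqualified //unittitle selects all three nodes, while a path anchored on its parents, such as /ead/archdesc/did/unittitle, selects only the collection title.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample data, for illustration only -->
<ead>
  <archdesc level="collection">
    <did>
      <!-- collection-level title and id -->
      <unittitle>Smith Family Papers</unittitle>
      <unitid>MS-001</unitid>
    </did>
    <dsc>
      <c01 level="series">
        <did>
          <!-- same tag, different meaning: title of a series; no unitid here -->
          <unittitle>Correspondence</unittitle>
        </did>
        <c02 level="file">
          <did>
            <!-- same tag again: title of a single file -->
            <unittitle>Letters, 1901</unittitle>
            <unitid>MS-001-A</unitid>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
```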

Eventually I put together the XSLT I needed, and thought it best to post a generic version of the code so that others can benefit from it or improve it.

I would like to tag this question with an EAD tag, but I don't have enough reputation. If someone with enough reputation thinks it would be helpful, please do.

1 answer

First, a brief description of the solution, followed by the code.

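  • An EAD document may or may not contain child records (<cXX> nodes, i.e. <c>, <c01>, <c02>, ...). If it does, each <cXX> becomes its own record in the output, alongside the main record.
  • <cXX> nodes are always found inside <dsc>, so the main record is built from everything except <dsc>, and <dsc> is processed only after the main record has been closed.
  • Each child record is produced by calling apply-templates on the <cXX> node itself, passing along the parent's unittitle and unitid so that the child can be linked to its parent.
  • Wrapper elements are handled by going one level deeper; every remaining node goes to a specific template or to the generic one.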

The (generic) XSLT code follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

<xsl:template match="/ead">
<records>
    <xsl:choose>
        <xsl:when test="//dsc">
            <!-- if there are <cXX> nodes, we'll handle the main record differently.
                 <cXX> nodes are always found in the 'dsc' node, which contains nothing else -->
            <xsl:call-template name="carefully_process"/>
        </xsl:when>
        <xsl:otherwise>
            <record>
                <!-- Just process the existing nodes -->
                <xsl:apply-templates select="*"/>
            </record>
        </xsl:otherwise>
    </xsl:choose>
</records>
</xsl:template>

<xsl:template name="carefully_process">
    <!-- first we'll process all the nodes for the main
         record. Then we'll call the child records -->
    <record>
        <!-- have to be careful not to process //archdesc/dsc yet -->
        <xsl:apply-templates select="*[not(self::archdesc)]"/>
        <xsl:apply-templates select="archdesc/*[not(self::dsc)]"/>

    <!-- Now we can close off the master record, -->
    </record>
    <!-- and process the child records -->
    <xsl:apply-templates select="/ead/archdesc/dsc"/>
</xsl:template>

<xsl:template match="dsc">
    <!-- Start processing the child records (we use for-each to get a good position()) -->
    <xsl:for-each select="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']">
        <xsl:apply-templates select=".">
            <!-- we pass the unittitle and unitid of the master record, so that child
                 records can be linked to it. We pass the position of the child so that
                 a unitid can be created if it doesn't exist -->
            <xsl:with-param name="partitle" select="normalize-space(/ead/archdesc/did/unittitle)"/>
            <xsl:with-param name="parid" select="normalize-space(/ead/archdesc/did/unitid)"/>
            <xsl:with-param name="pos" select="position()"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<!-- process child nodes -->
<xsl:template match="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']" >
<xsl:param name="partitle"/>
<xsl:param name="parid"/>
<xsl:param name="pos"/>
    <!-- start this child record -->
    <record>

        <!-- EAD does not require a unitid, but my code does.
             If it doesn't exist, create it -->
        <xsl:if test="not(./did/unitid)">
            <atom name="unitid">
                <xsl:value-of select="$parid"/><xsl:text>-</xsl:text><xsl:value-of select="$pos"/>
            </atom>
        </xsl:if>

        <!-- get the level of this component -->
        <atom name="eadlevel">
            <xsl:value-of select="concat(translate(substring(@level,1,1),'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ'),substring(@level,2))"/>
        </atom>

        <!-- Do *something* to attach this record to its parent.
             Probably involves $partitle and $parid. For example: -->
        <ref>
            <atom name="unittitle"><xsl:value-of select="$partitle"/></atom>
            <atom name="unitid"><xsl:value-of select="$parid"/></atom>
        </ref>

        <!-- now process all the other nodes -->
        <xsl:apply-templates select="*[not(starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c')]"/>

    <!-- finish this child record -->
    </record>

    <!-- prep the variables we'll need for attaching any child records (<cXX+1>) to this record -->
    <xsl:variable name="this_title" select="normalize-space(did/unittitle)"/>
    <xsl:variable name="this_id">
        <xsl:choose>
            <xsl:when test="did/unitid">
                <xsl:value-of select="did/unitid"/>
            </xsl:when>
            <xsl:otherwise>
                <!-- same fallback id that was written into this record above -->
                <xsl:value-of select="concat($parid, '-', $pos)"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:variable>

    <!-- now process the children of this node -->
    <xsl:for-each select="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']">
        <xsl:apply-templates select=".">
            <xsl:with-param name="partitle" select="$this_title"/>
            <xsl:with-param name="parid" select="$this_id"/>
            <xsl:with-param name="pos" select="position()"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<!-- these are usually just wrappers. Go one level deeper -->
<xsl:template match="descgrp|eadheader|revisiondesc|filedesc|titlestmt|profiledesc|archdesc|archdescgrp|daogrp|langusage|did|frontmatter">
    <xsl:apply-templates select="*"/>
</xsl:template>

<!-- below this point, add templates for processing specific EAD units
     of information. For example, the template might look like

<xsl:template match="titleproper">
    <atom name="titleproper">
        <xsl:value-of select="normalize-space(.)"/>
    </atom>
</xsl:template>
-->

<!-- instead of having a template for each EAD information unit, consider
     a generic template that handles them all the same way. For example:
-->
<xsl:template match="*">
    <atom name="{name()}">
        <xsl:value-of select="normalize-space(.)"/>
    </atom>
</xsl:template>

</xsl:stylesheet>
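
When the same element name carries different meanings in different places, one option is to qualify the match pattern with the element's parents. The following is only a minimal standalone sketch of that idea, not part of the stylesheet above: the <atoms> wrapper and the "collection_title" name are hypothetical, chosen for illustration, while the <atom> output follows the convention used above. It assumes an input in which <unittitle> occurs both under archdesc/did and under a component's <did>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<!-- Hypothetical wrapper so the result has a single root element -->
<xsl:template match="/">
    <atoms>
        <xsl:apply-templates/>
    </atoms>
</xsl:template>

<!-- Matches only the collection-level title: the pattern names its parents -->
<xsl:template match="archdesc/did/unittitle">
    <atom name="collection_title">
        <xsl:value-of select="normalize-space(.)"/>
    </atom>
</xsl:template>

<!-- Matches every other unittitle, e.g. inside a component's <did> -->
<xsl:template match="unittitle">
    <atom name="unittitle">
        <xsl:value-of select="normalize-space(.)"/>
    </atom>
</xsl:template>

<!-- Drop all other text, which the built-in rules would otherwise copy to the output -->
<xsl:template match="text()"/>

</xsl:stylesheet>
```

Both templates match the collection-level <unittitle>, but a multi-step pattern has a higher default priority (0.5) than a bare element name (0), so the more specific template is chosen without an explicit priority attribute. The same reasoning explains why the <cXX> template above, whose pattern has a predicate, takes precedence over the generic match="*" template.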