I am currently analyzing very large XML files (> 40 MB). I have just started developing in Scala, so I searched the net for good libraries and came across Scala Scales, which seems to do very well with large files.
I read
http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.1/0.2/ScalesXmlIntro.html
and
http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.2/0.4.4/PullParsing.html
and then tested the pullXml function to make sure all libraries are imported correctly:
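    import java.io.FileReader
    // Imports as listed in the Scales introduction linked above
    import scales.utils._
    import ScalesUtils._
    import scales.xml._
    import ScalesXml._

    val pull = pullXml(new FileReader("/Users/mycrazyxml/tmp/large.xml"))
    while (pull.hasNext) {
      pull.next match {
        case Left(i: XmlItem) =>   // text, comment, CDATA, ...
          Logger.info("XmlItem: " + i)
        case Left(e: Elem) =>      // start of an element
          Logger.info("Element: " + e)
        case Right(endElem) =>     // end of an element
          Logger.info("Endelement: " + endElem)
      }
    }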
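The goal is to get the data into a db.
E.g. in the XML below, each Enterprise has LocalUnits.
I need every Enterprise together with its LocalUnits.
At the endElement of an Enterprise, that Enterprise and all of its LocalUnits should be available together.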
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE Info SYSTEM "info.dtd">
    <Info>
      <Enterprise>
        <RegNo>12345678</RegNo>
        <Address>
          <StreetInfo>
            <StreetName>Infinite Loop</StreetName>
            <StreetNumber>1</StreetNumber>
          </StreetInfo>
        </Address>
        <EName>
          <Legal>Crazy Company</Legal>
        </EName>
        <SNI>
          <Code>00000</Code>
          <Rank>1</Rank>
        </SNI>
        <LocalUnit>
          <CFARNo>987654321</CFARNo>
          <LUType>1</LUType>
          <LUName>Crazy Company Gym</LUName>
          <LUStatus>1</LUStatus>
          <SNI>
            <Code>46772</Code>
            <Rank>1</Rank>
          </SNI>
          <SNI>
            <Code>68203</Code>
            <Rank>2</Rank>
          </SNI>
          <Address>
            <StreetInfo>
              <StreetName>Infinite Loop</StreetName>
              <StreetNumber>1</StreetNumber>
            </StreetInfo>
          </Address>
        </LocalUnit>
        <LocalUnit>
          <CFARNo>987654322</CFARNo>
          <LUType>1</LUType>
          <LUName>Crazy Company Restaurant</LUName>
          <LUStatus>1</LUStatus>
          <SNI>
            <Code>46772</Code>
            <Rank>1</Rank>
          </SNI>
          <SNI>
            <Code>68203</Code>
            <Rank>2</Rank>
          </SNI>
          <Address>
            <StreetInfo>
              <StreetName>Infinite Loop</StreetName>
              <StreetNumber>1</StreetNumber>
            </StreetInfo>
          </Address>
        </LocalUnit>
      </Enterprise>
      <Enterprise>
        <RegNo>12345671220</RegNo>
        <Address>
          <StreetInfo>
            <StreetName>Cupertino Road</StreetName>
            <StreetNumber>2</StreetNumber>
          </StreetInfo>
        </Address>
        <EName>
          <Legal>Fun Company HQ</Legal>
        </EName>
        <SNI>
          <Code>00000</Code>
          <Rank>1</Rank>
        </SNI>
        <LocalUnit>
          <CFARNo>987654321</CFARNo>
          <LUType>1</LUType>
          <LUName>Fun Company</LUName>
          <LUStatus>1</LUStatus>
          <SNI>
            <Code>46772</Code>
            <Rank>1</Rank>
          </SNI>
          <SNI>
            <Code>68203</Code>
            <Rank>2</Rank>
          </SNI>
          <Address>
            <StreetInfo>
              <StreetName>Cupertino road</StreetName>
              <StreetNumber>2</StreetNumber>
            </StreetInfo>
          </Address>
        </LocalUnit>
      </Enterprise>
    </Info>
How can I do this for the XML above with pullXml?
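To make the goal concrete, here is a minimal sketch of the grouping I have in mind. It does not use the Scales types: `Start`, `Text` and `End` are hypothetical stand-ins for the three cases in the loop above (`Left(e: Elem)`, `Left(i: XmlItem)`, `Right(endElem)`), and `LocalUnit`, `Enterprise` and `GroupEnterprises` are names made up for illustration. Only RegNo, CFARNo and LUName are collected, to keep it short.

```scala
// Hypothetical stand-ins for pull events; these are NOT the Scales types.
sealed trait Event
final case class Start(name: String) extends Event
final case class Text(value: String) extends Event
final case class End(name: String) extends Event

// Illustrative target model, covering only a few of the fields in the XML.
final case class LocalUnit(cfarNo: String, name: String)
final case class Enterprise(regNo: String, localUnits: List[LocalUnit])

object GroupEnterprises {
  /** Folds a flat event stream into one Enterprise per closing Enterprise tag. */
  def group(events: Iterator[Event]): List[Enterprise] = {
    var path = List.empty[String]      // currently open elements, innermost first
    var regNo = ""
    var cfarNo = ""
    var luName = ""
    var units = List.empty[LocalUnit]  // LocalUnits of the Enterprise being read
    val done = List.newBuilder[Enterprise]

    events.foreach {
      case Start(n) =>
        path = n :: path
      case Text(v) =>
        // Use the enclosing element and its parent to decide where the text belongs.
        path match {
          case "RegNo" :: "Enterprise" :: _ => regNo = v
          case "CFARNo" :: "LocalUnit" :: _ => cfarNo = v
          case "LUName" :: "LocalUnit" :: _ => luName = v
          case _ => ()
        }
      case End(n) =>
        path = path.drop(1)
        n match {
          case "LocalUnit" =>
            units = LocalUnit(cfarNo, luName) :: units
          case "Enterprise" =>
            // The Enterprise is complete here; a db write could happen at this point.
            done += Enterprise(regNo, units.reverse)
            units = Nil
          case _ => ()
        }
    }
    done.result()
  }
}
```

The sketch keeps only one Enterprise's worth of state at a time, which is the property that matters for large files. Whether Scales already offers something that does this grouping on top of pullXml is exactly what I would like to know.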