
Prashantha Upadhya -- webMethods Ezine Columnist

Maximize Memory Usage with the Node Iterator



By Prashantha Upadhya

 


Introduction

Processing a large file is a challenge for any developer. When a recent project required parsing a large XML file, I saw two choices: 1) write a custom Java service that reads a chunk of XML into memory and then processes it, or 2) use the webMethods Node Iteration services for parsing. To minimize memory consumption, I took the Node Iterator approach.

By iterating over each node of the XML document as needed -- rather than reading the entire document node tree -- the webMethods built-in Node Iterator services give developers an efficient parsing method for large XML documents.


An Example -- The Purchase Order

Let us consider an XML Purchase Order document with the following structure. The purchase order is fictional, but it serves for the sake of illustration.

<PurchaseOrder>
	.........................
	<OrderItemList>
		<OrderItem>
			<ItemName>Item 1</ItemName>
			<ItemQuantity>100</ItemQuantity>
			<ItemPrice>100.00</ItemPrice>
		</OrderItem>
		<OrderItem>
			<ItemName>Item 2</ItemName>
			<ItemQuantity>200</ItemQuantity>
			<ItemPrice>200</ItemPrice>
		</OrderItem>
		................
		<OrderItem>
			<ItemName>Item 100000</ItemName>
			<ItemQuantity>700</ItemQuantity>
			<ItemPrice>700.00</ItemPrice>
		</OrderItem>
	</OrderItemList>
</PurchaseOrder>

Let’s assume that this purchase order contains millions of items, which makes the XML document very large. The conventional webMethods approach of parsing the whole XML document into memory is likely to end in the well-known Integration Server "Out of Memory" error. In other words, if you run the built-in services pub.web:stringToDocument and pub.web:documentToRecord against the entire document, the parsing will most likely fail, particularly on a low-end server.

Since we are only interested in the large document's OrderItem nodes, parsing the entire document wastes both system resources and time. Node Iteration is a better option because the Node Iterator services provide a cursor that can be placed directly on any node of an XML document. In the XML document above, for example, we can invoke the Node Iteration services and parsing logic directly on the <OrderItem> nodes.
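The cursor idea can be sketched outside of webMethods. The following is a minimal illustration in Python using the standard library's xml.etree.ElementTree.iterparse, not the webMethods API. The function name total_quantity and the cut-down sample document are assumptions made for this example only. It visits the document incrementally and acts only on <OrderItem> nodes.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down version of the purchase order above; the elided parts are omitted.
SAMPLE_PO = b"""<PurchaseOrder>
  <OrderItemList>
    <OrderItem>
      <ItemName>Item 1</ItemName>
      <ItemQuantity>100</ItemQuantity>
      <ItemPrice>100.00</ItemPrice>
    </OrderItem>
    <OrderItem>
      <ItemName>Item 2</ItemName>
      <ItemQuantity>200</ItemQuantity>
      <ItemPrice>200</ItemPrice>
    </OrderItem>
  </OrderItemList>
</PurchaseOrder>"""


def total_quantity(stream):
    """Sum <ItemQuantity> over every <OrderItem>, reading the stream incrementally."""
    total = 0
    # "end" events fire once an element and all of its children have been read.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "OrderItem":
            total += int(elem.findtext("ItemQuantity"))
            elem.clear()  # drop the node's children once it has been processed
    return total


print(total_quantity(io.BytesIO(SAMPLE_PO)))  # prints 300
```

The function accepts any file-like object, so the same code works on an open file as on the in-memory sample used here.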


Using the Node Iteration Built-In Services

webMethods provides two built-in services for iterating over XML document nodes:

  1. pub.web:getNodeIterator
  2. pub.web:getNextNode

If the option is available, load the XML document as a stream rather than as bytes. Stream-based data is read on demand, so system memory is managed more efficiently.

Once loaded (either as a stream or as bytes), generate a Node object by invoking the pub.web:stringToDocument service. Next, feed the Node object to the webMethods built-in service pub.web:getNodeIterator. Be sure to set the criteria input to the name of the node over which you wish to iterate. In our example, the criteria is "OrderItem".

The output of pub.web:getNodeIterator is an Iterator object that walks the OrderItem nodes. Each call to pub.web:getNextNode then places the Node object for the next <OrderItem> node into the pipeline.

Now, with the Node object in the pipeline, proceed with the pub.web:documentToRecord service call on the Node object. This action parses only the current <OrderItem> node (the first one, on the first pass) into an IData object. Continue invoking pub.web:getNextNode, converting each returned node in turn, until the service output is null, indicating that all of the <OrderItem> nodes have been processed.
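The sequence of calls described above can be sketched as a plain loop. This is an analogy in Python, not the webMethods services themselves: get_node_iterator, get_next_node and node_to_record are hypothetical stand-ins for pub.web:getNodeIterator, pub.web:getNextNode and pub.web:documentToRecord, and a dict stands in for the IData object.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE_PO = b"""<PurchaseOrder>
  <OrderItemList>
    <OrderItem>
      <ItemName>Item 1</ItemName>
      <ItemQuantity>100</ItemQuantity>
      <ItemPrice>100.00</ItemPrice>
    </OrderItem>
    <OrderItem>
      <ItemName>Item 2</ItemName>
      <ItemQuantity>200</ItemQuantity>
      <ItemPrice>200</ItemPrice>
    </OrderItem>
  </OrderItemList>
</PurchaseOrder>"""


def get_node_iterator(stream, criteria):
    """Stand-in for getNodeIterator: iterate over elements named `criteria`."""
    return (
        elem
        for _event, elem in ET.iterparse(stream, events=("end",))
        if elem.tag == criteria
    )


def get_next_node(iterator):
    """Stand-in for getNextNode: the next matching node, or None when done."""
    return next(iterator, None)


def node_to_record(node):
    """Stand-in for documentToRecord: convert one node into a flat record."""
    return {child.tag: child.text for child in node}


def process_po(stream):
    """Convert every <OrderItem> node, one node at a time."""
    records = []
    iterator = get_node_iterator(stream, "OrderItem")
    node = get_next_node(iterator)
    while node is not None:  # a null result means every node was seen
        records.append(node_to_record(node))
        node = get_next_node(iterator)
    return records


for record in process_po(io.BytesIO(SAMPLE_PO)):
    print(record)
```

Collecting every record in a list is done here only to make the result easy to inspect. With a very large document, each record would normally be handled and released inside the loop.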

A sample package accompanying this webMethods Ezine article demonstrates the above example. Run the service nodeiteratorexample.process:processPO in Debug Mode to see the iterations in action.


Moving Window Mode

The Node Iterator services may also be invoked in "Moving Window" mode. In Moving Window mode, only the most recent Node object returned by pub.web:getNextNode is resident in the Integration Server memory. All previously iterated Node objects are invalid.

Discarding "old" nodes makes memory management more efficient. The burden is placed on the developer, though, to completely process the Node object before getting the next node with pub.web:getNextNode.

To use Moving Window mode, set the movingWindow input variable of the pub.web:getNodeIterator service to "true". There is a sample service in the WmEDISamples package called Iterator810 to help you better understand the Moving Window technique.
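The consequence of Moving Window mode, that earlier nodes become invalid, can be illustrated with a small Python analogy. This is a sketch under assumptions, not the Integration Server implementation: iter_moving_window is a hypothetical name, and clearing the element is used here to imitate a node being invalidated once the next one is requested.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE_PO = b"""<PurchaseOrder>
  <OrderItemList>
    <OrderItem>
      <ItemName>Item 1</ItemName>
      <ItemQuantity>100</ItemQuantity>
    </OrderItem>
    <OrderItem>
      <ItemName>Item 2</ItemName>
      <ItemQuantity>200</ItemQuantity>
    </OrderItem>
  </OrderItemList>
</PurchaseOrder>"""


def iter_moving_window(stream, criteria):
    """Yield matching nodes one at a time; each is emptied when the next is requested."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == criteria:
            yield elem
            # Execution resumes here only when the caller asks for the next
            # node, so the caller must be finished with `elem` by then.
            elem.clear()


names = []
stale = []  # references kept past their window, to show they go stale
for node in iter_moving_window(io.BytesIO(SAMPLE_PO), "OrderItem"):
    names.append(node.findtext("ItemName"))  # process fully, right away
    stale.append(node)

print(names)                           # prints ['Item 1', 'Item 2']
print(stale[0].findtext("ItemName"))   # prints None
```

The value extracted during the loop survives, while the node reference kept for later is empty. That is the discipline Moving Window mode asks of the developer: take what is needed from a node before moving on.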


A Few Extra Tips

A few more tips for processing large files using the Node Iterator services:

  • Initiate a garbage collection by calling System.gc() in a Java service run by the scheduler. Remember that this affects the performance of the entire JVM.
  • Drop all the unnecessary objects in the pipeline, especially in the loop.
  • Process multiple <OrderItem> nodes into a RecordList. Then, pass the RecordList to a mapping service for processing.
  • Carefully decide on a data validation strategy.
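
The batching tip in the list above (collect several nodes into a RecordList, then hand the list to a mapping service) might look like the following in outline. This is a Python sketch with hypothetical names: batches groups records, map_batch stands in for the mapping service, and the generated records and batch size of 2 are made up for illustration.

```python
from itertools import islice


def batches(records, size):
    """Yield lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def map_batch(batch):
    """Hypothetical mapping step: compute a line total for each record."""
    return [
        {
            "name": record["ItemName"],
            "total": int(record["ItemQuantity"]) * float(record["ItemPrice"]),
        }
        for record in batch
    ]


# Made-up records, produced lazily so that only one batch exists at a time.
records = (
    {"ItemName": f"Item {i}", "ItemQuantity": str(i * 100), "ItemPrice": "2.50"}
    for i in range(1, 6)
)

for batch in batches(records, 2):
    mapped = map_batch(batch)
    print(len(mapped), mapped[0]["name"])
    # Drop what is no longer needed before the next pass, as with the pipeline.
    del batch, mapped
```

A larger batch means fewer calls to the mapping step but more records held in memory at once, so the batch size is a tuning decision.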





Prashantha Upadhya is a Senior Technical Principal consultant at Inventa, a firm specializing in performance management, application development, enterprise and business-to-business integration solutions. Prashantha has over 9 years of experience in the software industry, including B2B integrations. Prior to Inventa, he held positions in bio-medical research labs and has a master's degree in Bio-Medical Engineering.




wMUsers is an independent organization and is not sponsored in any manner by Software AG.


© All Rights Reserved, 2001-2008.