
Ray Moser -- webMethods Ezine Columnist

Using the IS Repository for Data Storage: A Use Case



By Ray Moser

 

Keeping Count

Because the technical requirement stated that sub-documents must be processed in parallel (that is, not one at a time in top-down order), corralling the counts was a daunting challenge.

An atomic unit of work is one that runs independently of other processes. This project's architecture had three such units, and each required its own running count:

  1. Parse the batch XML document into up to 2,500 XML sub-documents -- Each batch XML document contains a field stating how many XML sub-documents the batch file includes. This stated count must equal the actual number of sub-documents found inside.
  2. Validate and envelope the XML sub-documents -- As the validation process loops over the sub-documents, the record's valid count is incremented by one for each sub-document that passes validation, and its invalid count is incremented by one for each sub-document that fails.
  3. Send the valid enveloped sub-documents to an exchange via HTTP -- For each document sent, a counter is incremented by one.

These three counts are then applied as follows:

  1. The stated sub-document count should equal the actual number of sub-documents found in the batch XML document. If the two differ, the discrepancy must be noted.
  2. (number of valid documents) + (number of invalid documents) = total number of sub-documents found in the batch XML document.
  3. (total number of sub-documents sent to the exchange) = (total number of valid sub-documents). When this condition is true, the Flow service that writes the acknowledgement message is triggered.
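The three rules above can be expressed as a small check over one batch's counts. The following is a minimal Java sketch, assuming a hypothetical BatchCounts record whose field names are illustrative and not the article's actual ProcessRecord fields. It also makes one assumption of its own: rule 3 is only evaluated once rule 2 holds, so that "0 sent equals 0 valid" at the very start of processing does not fire the acknowledgement early. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of the counts kept for one batch file.
// Field names are illustrative assumptions, not the article's IData fields.
record BatchCounts(int statedCount, int actualCount,
                   int validCount, int invalidCount, int sentCount) {

    // Rules 1 and 2: returns a note per discrepancy, or an empty list if all agree.
    List<String> discrepancies() {
        List<String> notes = new ArrayList<>();
        // Rule 1: the stated count must equal the number of sub-documents found.
        if (statedCount != actualCount) {
            notes.add("Stated " + statedCount + " sub-documents but found " + actualCount);
        }
        // Rule 2: every sub-document found must be counted as valid or invalid.
        if (validCount + invalidCount != actualCount) {
            notes.add("valid + invalid = " + (validCount + invalidCount)
                      + ", expected " + actualCount);
        }
        return notes;
    }

    // Rule 3: every valid sub-document has been sent.
    // Assumption: validation must be complete (rule 2 satisfied) first.
    boolean readyForAcknowledgement() {
        return validCount + invalidCount == actualCount && sentCount == validCount;
    }
}
```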

To maintain the various counts required in this project, I used Integration Server's internal storage facility: the repository. Using the pub.storage built-in services, I managed the counts under a primary key unique to each batch file: its filename. Intelligent naming conventions made this possible; they will be covered in a future ezine article.

The batch XML document file naming convention for this project was:

MessageType_LocationID_DateTimeStamp
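Because the whole filename doubles as the repository key, it can help to split it back into its parts when reporting. Here is a minimal Java sketch, assuming a hypothetical BatchKey record, that none of the three parts contains an underscore, and that any file extension has already been stripped. The sample values used in testing are made up.

```java
// Hypothetical helper: splits a batch filename of the form
// MessageType_LocationID_DateTimeStamp into its three parts.
// Assumes no part contains an underscore and the extension is already removed.
record BatchKey(String messageType, String locationId, String dateTimeStamp) {

    static BatchKey parse(String filename) {
        // A limit of -1 keeps empty trailing parts so that malformed names are caught.
        String[] parts = filename.split("_", -1);
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException(
                "Not MessageType_LocationID_DateTimeStamp: " + filename);
        }
        return new BatchKey(parts[0], parts[1], parts[2]);
    }

    // Rebuilds the full filename, which is what serves as the repository key.
    String repositoryKey() {
        return messageType + "_" + locationId + "_" + dateTimeStamp;
    }
}
```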


Finding the Proper Course of Action

Thus far, this article has described the mission requirements. The next section will detail how the data was collected.

When I first attacked this problem, I created a repository store that mirrored the acknowledgement message (the previously discussed IData ProcessRecord). At each of the three atomic units of work (parse, validate, and send to the exchange), I invoked pub.storage:get to retrieve the record, incremented the appropriate count by one, and invoked pub.storage:put to update the repository data store. Initially, this solution seemed to work fine. However, when I tested with 10 files each containing 750 sub-documents, the counts did not match: (number of valid documents) + (number of invalid documents) came out less than the total number of sub-documents found in the batch XML document.
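The get, increment, put sequence is a read-modify-write, and a count that comes out short is the classic symptom of such a sequence losing an update. The Java sketch below replays one such interleaving deterministically. It is a generic illustration of the symptom, not a reproduction of the repository's internals: the HashMap is only a stand-in for a data store and is not the pub.storage API.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of a lost update: two workers each read the same count,
// add one, and write it back. The second read happens before the first write,
// so one increment is lost. The Map is a stand-in, not webMethods pub.storage.
class LostUpdateDemo {

    static int replayInterleaving() {
        Map<String, Integer> store = new HashMap<>();
        String key = "validCount";
        store.put(key, 0);

        int readByA = store.get(key);   // worker A reads 0
        int readByB = store.get(key);   // worker B reads 0 before A writes back
        store.put(key, readByA + 1);    // A writes 1
        store.put(key, readByB + 1);    // B writes 1, overwriting A's increment

        return store.get(key);          // 1, although two documents were counted
    }

    public static void main(String[] args) {
        System.out.println("Expected 2, stored " + replayInterleaving());
    }
}
```

Running main prints "Expected 2, stored 1".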

If each Flow held a lock, why the inaccuracies? Because I didn't use the waitLength input variable, which is the time in milliseconds that the service should wait for the repository entry to become available. The single data store I first created was the sole keeper of all the counts. With 7,500 sub-documents (10 files of 750) each touching the store once per unit of work, the store was accessed approximately 22,500 times (7,500 x 3). Because of the nature of the lock, each Flow that called the get/put services exclusively held the entire repository until the data was incremented and the lock was released.

As a potential fix, I considered setting waitLength to a reasonable value, but decided that doing so wouldn't necessarily guarantee accurate results within the allotted timeframe. It might work, I thought, but we didn't have time to test it fully.
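To make the waitLength trade-off concrete, here is a hypothetical Java model of a store whose get locks an entry and whose put releases it. A waitLength-style timeout is applied when acquiring the lock. It mimics the get/put pattern described above using java.util.concurrent.locks.ReentrantLock.tryLock; it is not the webMethods implementation. The sketch shows the concern: a worker whose wait expires gets an exception instead of a value, so unless every caller retries, its increment is simply missing from the total.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: get() locks the entry (waiting up to waitLengthMs),
// put() writes the value and releases that lock. Not webMethods pub.storage.
class LockingStore {
    private final Map<String, Integer> values = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Waits up to waitLengthMs for the entry's lock, then returns the current value.
    // On success the calling thread holds the lock and must call put() to release it.
    int get(String key, long waitLengthMs) throws InterruptedException, TimeoutException {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        if (!lock.tryLock(waitLengthMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Entry " + key + " still locked after " + waitLengthMs + " ms");
        }
        return values.getOrDefault(key, 0);
    }

    // Writes the value and releases the lock taken by get() on the same thread.
    void put(String key, int value) {
        values.put(key, value);
        locks.get(key).unlock();
    }

    // Read-increment-write performed while holding the entry lock.
    void increment(String key, long waitLengthMs) throws InterruptedException, TimeoutException {
        int current = get(key, waitLengthMs);
        put(key, current + 1);
    }

    int peek(String key) {
        return values.getOrDefault(key, 0);
    }
}
```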

Instead, I chose to create a separate repository store for each of the four counts, which gave each count its own distinct process for incrementing. In addition to the main data store, we added separate stores for validCount, inValidCount, SentToExchange, and ErrorRecords. The unique ID (filename) used as the primary key in the main data store also keyed each of the four new stores, which maintained the relationship between them and the original process. Since each process accesses the data stores serially, the possibility of collisions, misreads, overwrites, or similar issues is eliminated. This solution has since been tested satisfactorily with over 250,000 files.
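The split-store layout can be modelled in memory as one map per counter, each keyed by the batch filename. This is a minimal Java sketch under that assumption. Note one deliberate substitution: ConcurrentHashMap.merge performs each increment atomically. It stands in for the serialized access to each repository store and is not the mechanism the article used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the split-store design: one store per counter,
// each keyed by the batch filename. ConcurrentHashMap.merge makes each increment
// atomic; this stands in for the article's separate repository stores.
class SplitCounterStores {
    static final String VALID = "validCount";
    static final String INVALID = "inValidCount";
    static final String SENT = "SentToExchange";
    static final String ERRORS = "ErrorRecords";

    private final Map<String, Map<String, Integer>> stores = new ConcurrentHashMap<>();

    SplitCounterStores() {
        for (String name : new String[] {VALID, INVALID, SENT, ERRORS}) {
            stores.put(name, new ConcurrentHashMap<>());
        }
    }

    // Atomically adds one to the named counter for the given batch file.
    void increment(String storeName, String batchFilename) {
        stores.get(storeName).merge(batchFilename, 1, Integer::sum);
    }

    // Current value of the named counter, or 0 if nothing has been counted yet.
    int count(String storeName, String batchFilename) {
        return stores.get(storeName).getOrDefault(batchFilename, 0);
    }
}
```

Because each counter lives in its own store, a worker updating the valid count never contends with one updating the sent count for the same batch.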


Tying Off the Loose Ends

In designing this solution, one other consideration was how to recover from a potential production problem. To support this, DSP pages were built that list all processed data contained in the Processes data store. Viewed in a browser, the DSPs show the original filename (the repository key) along with all associated counts and date/time stamps. Remember, to obtain all keys for a data store, invoke pub.storage:keys. The DSP loops through the keys and invokes the getProcessData Flow to return the data for each one.
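The DSP's loop (list every key, then fetch each entry) is easy to mirror outside a DSP, for example in a test harness. The Java sketch below is hypothetical. The TreeMap stands in for the Processes store, keySet() stands in for pub.storage:keys, and the local getProcessData method stands in for the article's Flow of the same name. The ProcessData fields are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mirror of the DSP report. The Map stands in for the Processes store,
// keySet() for pub.storage:keys, and getProcessData for the article's Flow service.
class ProcessReport {
    // Illustrative per-batch data; field names are assumptions.
    record ProcessData(int valid, int invalid, int sent, String processedAt) {}

    // TreeMap keeps keys sorted, so report rows come out in filename order.
    private final Map<String, ProcessData> processStore = new TreeMap<>();

    void save(String filename, ProcessData data) {
        processStore.put(filename, data);
    }

    // Stand-in for getProcessData: returns the data stored under one key.
    ProcessData getProcessData(String filename) {
        return processStore.get(filename);
    }

    // Enumerates the keys, fetches each entry, and emits one row per batch file.
    List<String> rows() {
        List<String> rows = new ArrayList<>();
        for (String key : processStore.keySet()) {
            ProcessData d = getProcessData(key);
            rows.add(key + " valid=" + d.valid() + " invalid=" + d.invalid()
                     + " sent=" + d.sent() + " at=" + d.processedAt());
        }
        return rows;
    }
}
```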

Another issue to handle up front is the potential proliferation of data stores. Even though the stores are persisted to disk (WmRepository2 in the server root folder), it makes good sense to develop a system to purge the data. In this project, I decided to keep only the main process store and to delete the other stores once processing of the XML document wraps up. The main data store can be emptied manually, but before removing key/value pairs from it, save the data to disk in a custom XML file in case performance statistics are needed in the future.
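The "save first, then purge" order matters: if writing the archive fails, nothing should already have been removed. Here is a minimal Java sketch of that ordering, assuming the main store can be represented as a filename-to-total map. The XML element and attribute names are hypothetical, chosen only for illustration. Requires Java 11+ for Files.writeString.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical archive-then-purge step. The flat filename -> total layout and the
// <processes>/<process> element names are illustrative assumptions.
class ProcessStoreArchiver {

    // Minimal escaping so keys are safe inside XML attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toXml(Map<String, Integer> store) {
        StringBuilder xml = new StringBuilder("<processes>\n");
        for (Map.Entry<String, Integer> e : store.entrySet()) {
            xml.append("  <process key=\"").append(escape(e.getKey()))
               .append("\" total=\"").append(e.getValue()).append("\"/>\n");
        }
        return xml.append("</processes>\n").toString();
    }

    // Writes the archive first; the store is cleared only if the write succeeded,
    // because an IOException from writeString skips the clear() call.
    static void archiveAndPurge(Map<String, Integer> store, Path target) throws IOException {
        Files.writeString(target, toXml(store));
        store.clear();
    }
}
```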






Ray Moser is a Principal Consultant with Zettaworks LLC, Houston, Texas, and is working in Sydney, Australia on various webMethods projects. He has over 10 years of experience in application architecture and development. He has architected and built several successful integration projects using webMethods Integration Server, Trading Networks, and Enterprise Broker.

Ray can be reached at









wMUsers is an independent organization and is not sponsored in any manner by Software AG.


© All Rights Reserved, 2001-2008.