View Full Version : Reliable document delivery
Hello everyone - Can I have your comments on this issue please? I'm not a broker person - do forgive the newbie questions below.
I have a source Integration Server publishing documents to a broker. The document settings are: 'Storage Type' = 'Guaranteed' and 'Discard' = 'False'. A trigger on a destination Integration Server subscribes to the published documents and invokes a DB interface service to insert them into an external database. In summary, this is the setup:
Source IS -> Broker -> Destination IS -> External Database
What is the best approach to make this setup reliable in the face of system failure? For instance, if the external database goes down, the trigger still receives the document, but the DB interface service returns failure. Does the trigger notify the broker that the document is still unprocessed, or is the document now irretrievably 'gone' from the broker?
If so, do we need to tweak the trigger's 'Max attempts' and 'Retry interval' settings, or the DB interface service's 'Retry on IS Runtime Exception' settings?
What happens if the Destination IS now _also_ goes down - will the retries currently being processed be lost?
Someone mentioned the best approach for reliable delivery is to disable the subscribing trigger as soon as the service it invokes (in this case the DB interface service) starts failing. Is this correct?
Regards and Best Wishes for the New Year!
Sonam
Hi - I ended up taking this approach...
I changed these settings in the trigger:
Retry failure behaviour: 'Suspend and retry later'
Resource monitoring service: <A new DB check service>
The DB check service simply pings the external database to make sure it is up.
Any opinions about this retry process? Is there any time or other limit that kicks in?
srikanth05
02-05-2008, 10:22
I would say the approach you took is perfect. :)
Thanks Srikanth :-) ... yes, it's looking good so far.
We have used this approach for one of our interfaces, which requires guaranteed delivery.
It is working perfectly. But I have some suggestions (you might have already done this). They might help you estimate the trigger property values.
1. What is the average size of the incoming docs?
2. How many docs does the IS push to the database every day?
3. If for some reason the database is down, what is the maximum downtime of the DB?
The answers to the first two questions will give you an estimate of the workload (memory and storage) imposed on the IS by the downtime.
The answer to the last question will give you an estimate for the "Max attempts" and "Retry interval" trigger properties.
You can switch the "Deliver until" trigger property to "Success"; however, if the DB downtime is very long, it might affect your IS. So I would definitely go with "Max attempts reached".
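As a rough illustration of the estimate above, here is a small Java sketch. The figures (20 KB documents, 50,000 documents a day, a two-hour outage, a five-minute retry interval) are made-up placeholders, and it assumes documents arrive evenly through the day and that the retry window is roughly "Max attempts" times "Retry interval". Check both assumptions against your own system before relying on the numbers.

```java
// Back-of-envelope sizing for a trigger whose target database is down.
// All figures are hypothetical placeholders; substitute your own measurements.
class OutageEstimate {

    // Bytes of documents that accumulate while the database is unavailable,
    // assuming documents arrive at an even rate through the day.
    static long backlogBytes(long avgDocBytes, long docsPerDay, long outageMinutes) {
        long docsDuringOutage = docsPerDay * outageMinutes / (24 * 60);
        return docsDuringOutage * avgDocBytes;
    }

    // Smallest attempt count whose total retry window covers the outage,
    // rounded up to a whole number of attempts.
    static long minAttempts(long outageMinutes, long retryIntervalSeconds) {
        long outageSeconds = outageMinutes * 60;
        return (outageSeconds + retryIntervalSeconds - 1) / retryIntervalSeconds;
    }

    public static void main(String[] args) {
        long backlog = backlogBytes(20_000, 50_000, 120);
        long attempts = minAttempts(120, 300);
        System.out.println("Backlog during outage (bytes): " + backlog);
        System.out.println("Attempts needed to span outage: " + attempts);
    }
}
```

If the attempt count that comes out is uncomfortably large, that is a hint that a longer retry interval may be the better knob to turn.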
And you said you changed these settings:
-----------------------------------------------------
I changed these settings in the trigger:
Retry failure behavior: 'Suspend and retry later'
Resource monitoring service: <A new DB check service>
-----------------------------------------------------
Where did you change these settings?
Kerni
srikanth05
02-20-2008, 08:44
The settings Sonam mentioned can be seen in the properties section of the trigger.
Which webMethods IS version are you using?
Hi Kerni - thanks.
As Srikanth said, these are trigger properties, which you can edit in Developer (I am using Developer for Fabric 7). You do have to write a new service to monitor your resource. When the DB goes down, the trigger automatically suspends and starts a new scheduled service that runs my new monitoring service once a minute or so, until the DB is back up again. Until that point, documents queue up in the broker.
I tried the retry parameters earlier, but it's confusing (we have retry in the trigger, as well as service level retry). I think this trigger 'suspend and retry later' process is a better and simpler way.
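To make the 'suspend and retry later' behaviour described above a bit more concrete, here is a toy Java model of it. This is only a sketch and not the Integration Server API: the class, its methods, and the in-memory queue standing in for the broker client queue are all invented for illustration. The point it tries to show is that a document leaves the queue only after it has been delivered, so a failure leaves it in place.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy model of "suspend and retry later". NOT the Integration Server API;
// every name here is hypothetical.
class SuspendingTrigger {
    private final Deque<String> queue = new ArrayDeque<>();   // stands in for the broker client queue
    private final List<String> delivered = new ArrayList<>(); // stands in for rows written to the DB
    private final BooleanSupplier resourceUp;                 // stands in for the resource monitoring service
    private boolean suspended = false;

    SuspendingTrigger(BooleanSupplier resourceUp) { this.resourceUp = resourceUp; }

    void publish(String doc) { queue.addLast(doc); }

    // Process documents until the queue is empty or the resource fails.
    // A document is removed from the queue only after it has been delivered.
    void run() {
        while (!suspended && !queue.isEmpty()) {
            if (resourceUp.getAsBoolean()) {
                delivered.add(queue.peekFirst());
                queue.removeFirst();
            } else {
                suspended = true; // leave the document where it is
            }
        }
    }

    // Called periodically while suspended (about once a minute in the setup above).
    void monitorTick() {
        if (suspended && resourceUp.getAsBoolean()) {
            suspended = false;
            run();
        }
    }

    boolean isSuspended() { return suspended; }
    int pending() { return queue.size(); }
    List<String> delivered() { return delivered; }

    public static void main(String[] args) {
        boolean[] dbUp = {false};
        SuspendingTrigger trigger = new SuspendingTrigger(() -> dbUp[0]);
        trigger.publish("doc-1");
        trigger.publish("doc-2");
        trigger.run();
        System.out.println("suspended=" + trigger.isSuspended() + " pending=" + trigger.pending());
        dbUp[0] = true;
        trigger.monitorTick();
        System.out.println("suspended=" + trigger.isSuspended() + " delivered=" + trigger.delivered());
    }
}
```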
Sonam
Sonam,
That's a good approach. The key, I think, is to have a reliable way to detect a resource exception and raise it. Can you tell me how you achieved that? For example, how did you determine whether the failure was caused by the database going down, the database filling up, a mapping failure, etc.?
Hi Sekay - in our case, we had a 'ping' service available in our resource that checked its availability.
The extent of checking depends totally on you. For example, if the resource in question was a DB connection to an Oracle server, you could run a simple 'SELECT SYSDATE FROM DUAL' statement to verify the DB was up. Further SQL statements could be more comprehensive, for example inserting and then deleting a dummy record. Personally, I would not do the second check because the resource check service does not need to be comprehensive: if the trigger is incorrectly reactivated by the monitoring service and an exception is still thrown, the trigger will just suspend again.
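For anyone who wants to see the simple check spelled out, here is a minimal sketch in plain JDBC. It is an assumption on my part that you would write it this way; inside IS you would more likely call a JDBC adapter service, and the class name, URL, and credentials below are placeholders. Any failure to connect or run the probe query is treated as "resource down".

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal availability probe using plain JDBC. The class name, URL and
// credentials are placeholders, not anything taken from the original setup.
class DbCheck {

    // Returns true only if a connection opens and the probe query returns a row.
    static boolean isAvailable(String jdbcUrl, String user, String password) {
        DriverManager.setLoginTimeout(5); // seconds; keeps the check from hanging on a dead host
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT SYSDATE FROM DUAL")) {
            return rs.next();
        } catch (SQLException e) {
            // Cannot connect or cannot query: report the resource as down.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder Oracle-style URL; needs the vendor driver on the classpath to succeed.
        String url = args.length > 0 ? args[0] : "jdbc:oracle:thin:@//dbhost:1521/SERVICE";
        System.out.println("Database available: " + isAvailable(url, "user", "password"));
    }
}
```

The check is deliberately shallow, in line with the reasoning above: a false positive just means the trigger resumes, fails, and suspends again.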
..........I tried the retry parameters earlier, but it's confusing (we have retry in the trigger, as well as service level retry). I think this trigger 'suspend and retry later' process is a better and simpler way.
Sonam,
The retry count approach also helps you handle service exceptions other than the DB or another legacy system being down.
Actually it is easy. Here is how we implemented it:
Set the trigger properties
- deliver until
- Max attempts
- Retry Interval
Your service:
pub.flow:getRetryCount
>>Try/Catch
>>Try
- your code
>>Catch
- pub.flow:throwExceptionForRetry
- ErrorHandling...
This will ensure guaranteed delivery. The only downside is that if the "Deliver until" property is set to "Successful" and the exception is not resolved in time, it may affect the IS.
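For readers who do not think in flow steps, the retry pattern above can be modelled in a few lines of Java. Treat this as a sketch only: RetryableException stands in for the effect of pub.flow:throwExceptionForRetry, the retryCount argument stands in for what pub.flow:getRetryCount would report, and none of these names are the real IS API. In this toy model maxAttempts counts every invocation including the first, which may not match how the trigger property counts.

```java
// Toy model of trigger-level retry. All names are hypothetical; this is not
// the Integration Server API.
class RetryDemo {

    // Thrown by the service to ask the dispatcher for another attempt.
    static class RetryableException extends Exception {
        RetryableException(String msg, Throwable cause) { super(msg, cause); }
    }

    interface TriggerService {
        void handle(String doc, int retryCount) throws RetryableException;
    }

    // Invokes the service up to maxAttempts times, sleeping between attempts.
    // Returns the number of attempts used, or -1 if all attempts failed
    // (or the wait was interrupted).
    static int deliver(TriggerService svc, String doc, int maxAttempts, long retryIntervalMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                svc.handle(doc, attempt);
                return attempt + 1;
            } catch (RetryableException e) {
                if (attempt + 1 < maxAttempts) {
                    try {
                        Thread.sleep(retryIntervalMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return -1;
                    }
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated service: the "database" is down for the first two attempts.
        TriggerService svc = (doc, retryCount) -> {
            if (retryCount < 2) {
                throw new RetryableException("database unavailable", null);
            }
            System.out.println("Stored " + doc + " after " + retryCount + " retries");
        };
        System.out.println("Attempts used: " + deliver(svc, "doc-1", 5, 10));
    }
}
```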
Kerni
Thanks Kerni -
The 'suspend trigger and retry later' approach kicks in for any service exception (same as the 'retry count' approach).
However, it is superior for one main reason. With the 'retry count' approach, if IS also goes down while it is still retrying deliveries, the documents being retried are lost. With the 'suspend trigger and retry' approach, the undelivered documents are still queued up on the broker itself. Hence documents are never lost even when the delivery endpoint goes down, IS goes down, broker goes down, or any combination of outages: the documents are simply retried until successfully delivered. Also, queueing up the documents on the broker (which has persistent disk-based storage) reduces the load on IS.
Thanks Kerni - the 'suspend trigger and retry later' approach also handles any service exception - it is not specific to database exceptions.
It is better for one main reason: undelivered documents are still queued up on the broker. Hence documents are never lost, whether the delivery endpoint is down, IS goes down, broker goes down, or any combination of such outages - the queued documents are simply retried until successfully delivered. Queueing up the documents on the broker (which has persistent disk based storage) also reduces the load on IS.
With the 'retry count' approach, if IS is retrying, and then IS itself crashes - the documents being retried are lost since they were only in IS memory and had been already removed from the broker client queue.
I have an integration scenario where IS processes a broker queue. Some documents on the queue may have data errors that cause the trigger service to fail. (call these "bad documents".)
If the trigger hits a "bad document", I would like it to move ahead and process other documents in the queue. i.e. "bad documents" should not hold up the processing of "good" documents. However, I would like the "bad documents" to stay in the broker queue, so that they are not lost when IS goes down. Also, I would like to browse/export/delete 'bad documents' at leisure through the MWS broker queue interface.
The documents do _not_ have to be processed in order.
The trigger properties window offers two behaviors to cope with retry failures:
(1) Throw service exception - the problem with this is "bad documents" are removed from the broker client queue.
(2) Suspend and retry later - the problem is that both "good documents" and "bad documents" have their queue frozen. When the bad document is processed again, the queue is again frozen.
Can anyone suggest a suitable solution?