Unable to stop/start IS server


sachdeva01
10-03-2005, 11:16
One of your services hangs occasionally (we see it on the service usage page with a '1' next to it) and we have to restart the server. However, when this happens we are unable to stop the server through the admin page. We are running IS as a Windows 2000 service, and the Windows Services application stops responding as well. The only way out is to log on to the machine and reboot it. Has anyone else faced the same issue?

In my opinion I should be able to shut down/restart the application from the admin page rather than rebooting the server.

griffima
10-03-2005, 11:44
Mick,
"In my opinion I should be able to shut down/restart the application from the admin page rather than rebooting the server." Yes, you should, but you can't. What you have there is a hung, non-responsive thread, and in that state the IS server normally will not close down gracefully.

You should not have to restart the entire server to get it going again, though. You can kill the IS process via the Task Manager or, if you started it from a terminal window, you can close it there as well. If that doesn't work, switch to Unix. :)

The best prevention I have found for this is to find what is causing the hung thread. Generally speaking the IS server tends to be pretty reliable and should not have a lot of hung thread instances. I have found that a lot of hung threads are caused by resource issues, such as hung database connections, adapters etc.

markg
http://darth.homelinux.net

mcarlson
10-03-2005, 12:06
Mick,

Not sure what you mean by "one of your services". A few questions: Are you talking about a Windows NT service? If so, what causes you to believe that the service is hung or is no longer responding? What does your webMethods administrator or lead developer say about what processing the Integration Server is doing when this occurs? What version of webMethods Integration Server are you using? (Improved ability to shut down IS from the Windows Services application was delivered with IS 6.1 several months ago.)

Mark

sachdeva01
10-03-2005, 12:38
Mark,

We are using 6.1 on Windows 2000. We have an FTP flow service to the mainframe that is colliding with a mainframe process, causing the subscriber that uses the service to hang. I believe it has to be a connection issue. We will have to change that process. In the meantime, I was trying to understand whether there is any way to recover (stop/start) the server gracefully.
IS itself is running as an NT service.

mcarlson
10-03-2005, 17:36
The pub.client:ftp service should fail gracefully after its timeout limit is reached. If you specified a timeout (default means wait forever) and that is not happening, you should report it as a probable defect to Support.

One workaround for services (built-in or external) that hang and never return control to the calling Flow is to invoke them on a separate thread and then attempt to collect their results with a pub.sync:wait call.

Since you specify how long to wait for the thread to complete, you can detect services that "hang" and handle them gracefully. Of course, this may leave you with orphaned threads, but at least your entire server is not hung.

It's always better if the thing you are calling (especially if WM wrote it) doesn't just hang, but you don't always control the components you have to call and the spawn-a-thread-and-wait-with-timeout approach has worked well for me in the past.
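For readers who want to see the shape of the spawn-a-thread-and-wait-with-timeout idea, here is a minimal sketch in plain modern Java. It is not webMethods code and does not use pub.sync:wait; the class name TimeoutCall and the method callWithTimeout are made up for illustration, and it assumes a JVM with java.util.concurrent and lambdas available.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a possibly-hanging call on a worker thread
// and gives up waiting after a fixed time.
class TimeoutCall {
    // Daemon threads, so an orphaned (still hung) worker cannot block JVM exit.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeout-call-worker");
        t.setDaemon(true);
        return t;
    });

    // Returns the task's result, or the fallback if the task fails
    // or does not finish within timeoutMillis.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Request an interrupt. A call stuck in blocking I/O may ignore it,
            // in which case the worker thread stays orphaned.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The task itself threw an exception.
            return fallback;
        } catch (InterruptedException e) {
            // The caller was interrupted while waiting; preserve the flag.
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}
```

The caller always gets control back after the timeout, which is the point of the approach; the cost is the orphaned worker thread mentioned above if the underlying call never returns.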

Mark

sachdeva01
10-04-2005, 14:29
Thanks, Mark. Good suggestions. I will try out the timeout property. Seems like something I should have tried before!

Another issue we have been having is that sometimes (once every month or so) flat file processing hangs with the file in the 'working' directory, and the file never moves to the 'complete' folder. The same file processes fine after rebooting the server! Is this again a thread hanging issue? If so, we are thinking of scheduling automatic reboots of the servers every weekend (since we have the luxury of doing so!).

Has anyone else run into the same issue? wM Support didn't have an answer for this one.

mcarlson
10-09-2005, 17:49
Mick,

I wouldn't consider this a "thread hanging" issue.

I've heard of similar issues when the file processing begins while the file is still being written to the working folder. This usually happens only on very large files. I would try to understand what was being processed when this issue occurs.
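One way to guard against picking up a file that is still being written is to check that it has stopped changing before processing it. The sketch below is plain Java, not a webMethods file-polling setting; the class FileStability and the method isStable are hypothetical names, and the check is only a heuristic (a writer that pauses for longer than the quiet period would still fool it).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: treats a file as "finished" when neither its size
// nor its last-modified time changes over a quiet period.
class FileStability {
    static boolean isStable(Path file, long quietMillis)
            throws IOException, InterruptedException {
        long sizeBefore = Files.size(file);
        long modifiedBefore = Files.getLastModifiedTime(file).toMillis();

        // Give a writer that is still active time to append more data.
        Thread.sleep(quietMillis);

        return Files.size(file) == sizeBefore
                && Files.getLastModifiedTime(file).toMillis() == modifiedBefore;
    }
}
```

Where you control the sending side, having it write under a temporary name and rename the file once it is complete is usually more dependable than any such polling check.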

I'm not a fan of scheduling reboots just because you can. Of course, if you are using Windows-based servers, you may have to because of the OS itself.

The only time most of my clients reboot Integration Server is when they have to deploy a new jar file.

Mark

griffima
10-10-2005, 07:37
Mick,
I'll explain what I mean by a hanging thread, which is also called a non-responsive thread. When you use the service usage page in the admin tool, you will see a list of services that have run at some point since the last startup of the IS server. If a service is currently executing, you will see (x) next to it, where x is the number of instances of that service currently running. From time to time, a service may become "hung" or non-responsive. This almost always occurs due to some type of external connection issue. The service requested a resource outside of the JVM and the resource didn't respond correctly, or the connection was abruptly severed, or the moon and the planets didn't align, whatever. The end result is that the IS server no longer has control of that executing service. The only way to clear it at that point is to restart the IS server.

Since the IS server tries to shut down gracefully, it tries to make sure all running services have stopped first. Hence the problem when it no longer has control of a service: it will often hang on shutdown when this has occurred, although, as Mark C says, it is getting better.

There are about a million different things you can do to ease this problem: everything from the way you set up your connection pools and your JDBC and network timeout settings to the way you code your flow services and handle errors.
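As an illustration of what a network timeout buys you, here is a small sketch using the standard java.net.Socket API. It shows the general technique only, not how IS or its adapters are configured; the class BoundedRead, the method readFirstByte, and the TIMED_OUT value are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper: a connect-and-read that cannot block forever.
class BoundedRead {
    static final int TIMED_OUT = -2;

    // Returns the first byte received (0-255), -1 on end of stream,
    // or TIMED_OUT if connecting or reading exceeded its limit.
    static int readFirstByte(String host, int port, int connectMillis, int readMillis)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Bound the time spent establishing the connection.
            socket.connect(new InetSocketAddress(host, port), connectMillis);
            // Bound each blocking read; without this, read() can wait forever
            // on a peer that accepted the connection and then went silent.
            socket.setSoTimeout(readMillis);
            InputStream in = socket.getInputStream();
            return in.read();
        } catch (SocketTimeoutException e) {
            return TIMED_OUT;
        }
    }
}
```

With both limits set, a silent or unreachable peer turns into an error the calling code can handle instead of a thread that never comes back.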

Mark C's suggestion, which is to start a background thread, is a very solid way to do it. That way, if the JVM loses control of the thread, it has not lost control of the service. You will orphan that thread, but it will help with your overall uptime. If it is happening a lot, though, you really need to find the root cause.

Unfortunately, Windows is Windows. Different folks have had various results with it, some good and some not. Our data center folks have a regularly scheduled reboot of all of our Windows servers, which is why I don't use it for webMethods. Solaris and webMethods are a very solid combination, in my experience. In my opinion you should only have to reboot a server when patches or new hardware are applied.

markg
http://darth.homelinux.net