We have a redundant pair of Honeywell servers (Experion DCS system). During normal operation the primary server failed (probably due to overloading of the server CPU) and the backup did not take over, leaving the console stations blank. Below is the sequence of events.
1) SERVER 2A (primary) failed and no data was sent to the clients (console stations).
2) Checked CPU usage and found it on the high side, i.e. 60 to 80% (refer to the attached snapshots below, which show heavy services such as "spoolsv.exe", "Mcshield.exe", "HSCServer_servicehost.exe", etc.).
3) SERVER 2B (secondary) did not take over and continued running as Backup.
4) SERVER 2A was stopped from the "Start/Stop Server" utility, as the Experion Station window was not responding and a manual failover could not be done from it.
5) SERVER 2B continued to run as Backup even though it had spare CPU capacity (usage remained at 20 to 25%) and adequate disk space (30 GB).
6) Restarted SERVER 2A; after 15 to 20 minutes SERVER 2B became Primary and indications were restored.
7) When SERVER 2A was logged on again, the error below was observed, pertaining to "server application dgamngr".
8) Synchronized SERVER 2A, and it kept running as Backup.
9) The "Defragmentation Utility" was run three times on both servers, but some files (related to history and events) remained fragmented.
The following are my main concerns:
1) Why did the services mentioned above consume so much of the CPU?
2) Why did the secondary not take over although both servers were synchronized?
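To collect evidence for concern 1, per-process CPU readings could be logged over time and then summarized. Below is a minimal sketch in Python, assuming the readings have already been exported as (process name, CPU %) pairs. The function name `summarize_cpu`, the 20% threshold, and the sample numbers are all hypothetical and made up for illustration; they are not measurements from this incident.

```python
from collections import defaultdict
from statistics import mean


def summarize_cpu(samples, threshold=20.0):
    """Return {process: (mean CPU %, peak CPU %)} for every process
    whose mean CPU usage is at or above the threshold."""
    by_process = defaultdict(list)
    for name, cpu in samples:
        # Process names are case-insensitive on Windows, so normalize them.
        by_process[name.lower()].append(float(cpu))

    summary = {}
    for name, values in by_process.items():
        avg = mean(values)
        if avg >= threshold:
            summary[name] = (round(avg, 1), max(values))
    return summary


# Illustrative, made-up readings (NOT taken from the actual servers).
samples = [
    ("spoolsv.exe", 30.0),
    ("Mcshield.exe", 25.0),
    ("HSCServer_servicehost.exe", 20.0),
    ("explorer.exe", 1.0),
    ("spoolsv.exe", 34.0),
    ("Mcshield.exe", 15.0),
    ("HSCServer_servicehost.exe", 22.0),
    ("explorer.exe", 2.0),
]

hot = summarize_cpu(samples, threshold=20.0)
for name, (avg, peak) in sorted(hot.items()):
    print(f"{name}: mean {avg}%, peak {peak}%")
```

Running the same summary on readings from both servers, before and during a high-load period, would show whether the same services are consistently responsible.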
Please share your experience here or by email so that we can avoid such failures in the future.
Thanks
Kamran
[email protected]