Servers are critical pieces of infrastructure that should work without issue for years. Eventually, though, they become less efficient and may struggle to keep up with business demands until they stop functioning altogether. When that day comes, the fix may be a simple component repair or a total replacement. If you’re unable to get your server back up and running quickly, you could be looking at expensive downtime while you try to find a replacement part or a new system. To avoid lengthy downtime, stay aware of the state your server is in and of the red flags that may indicate it is on the brink, so you can be prepared for potential system maintenance. Let’s take a look at signs a server is about to fail as well as some server lifecycle basics.
How long is the average life cycle of a server? It varies, but there are several rules of thumb:
- The average lifespan of a server is roughly five years, but the recent trend has been toward shorter life cycles. Aging servers, in addition to having increased risk of failure, are less powerful and less energy-efficient than newer models. Additionally, individual components within servers are likely to fail before the whole thing needs to be replaced.
- Servers used in production on mission-critical apps are only as good as their warranties, which typically run three to five years. If they fail before then, you know the manufacturer will have your back in the form of service and replacement parts, but keeping servers online after that is a gamble you don’t want to take.
- The cost of maintenance on aging machines can be extensive. If you don’t have the cash to replace hardware or if you’re forced to use older infrastructure, the costs can add up. And if you experience an extended failure, the downtime can have a serious effect on your company’s bottom line. That’s why early problem diagnosis is important and can be a key cost saver!
So, what are the red flags that might appear before a server crash takes its toll on your business procedures? Here are three signs to look out for:
1. Temperature troubles: CPU running hot
A server might be in trouble when it starts running at a higher-than-normal temperature. To figure out if your server is running too hot, check with your vendor for baselines; many models come with published specifications for acceptable operating temperatures.
The high temperature itself might not be the real problem, but rather a symptom of what’s actually wrong (e.g., issues with the power supply, memory, etc.). Therefore, you should check the CPU, chipset, and HDD temperatures, and check whether your fans are running properly. If your fans are making more noise than normal, that’s a key indicator that they aren’t running like they used to and could be a factor in your server running hot. Also note that when an overheated server first starts up or reboots, the fans will ramp up to maximum speed for a few moments to help bring the temperature down, so keep an eye on the speed and noise level of your fans as a potential symptom of high system temperatures.
If you can’t immediately determine the cause of excess heat, be diligent and continue researching. Other possible causes of high temperature could include a clogged front intake, blockage of the exhaust or airflow, recent repositioning of the machine, or a dirty heat sink.
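As a rough illustration of comparing readings against a baseline, here is a minimal Python sketch. The limits in `ASSUMED_LIMITS_C` are placeholders rather than vendor figures, and the function name `find_hot_components` and the 5-degree warning margin are assumptions made for this example; the readings would come from whatever monitoring tool you already use.

```python
# Hypothetical limits in degrees Celsius. These are NOT vendor figures;
# replace them with the values from your server's specification sheet.
ASSUMED_LIMITS_C = {"cpu": 85.0, "chipset": 75.0, "hdd": 55.0}


def find_hot_components(readings_c, limits_c=ASSUMED_LIMITS_C, margin_c=5.0):
    """Return (component, reading, limit) for readings at or near their limit.

    A reading is flagged when it is within margin_c degrees of the limit
    or above it. Components without a known limit are skipped.
    """
    flagged = []
    for component, reading in sorted(readings_c.items()):
        limit = limits_c.get(component)
        if limit is None:
            continue  # no baseline for this component, nothing to compare
        if reading >= limit - margin_c:
            flagged.append((component, reading, limit))
    return flagged


if __name__ == "__main__":
    sample = {"cpu": 82.0, "chipset": 60.0, "hdd": 40.0}
    for component, reading, limit in find_hot_components(sample):
        print(f"{component}: {reading} C is close to or above the {limit} C limit")
```

Flagging readings a few degrees before the limit, rather than only at it, leaves time to check fans and airflow before the server is actually overheating.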
2. Constant reboots or random failures
Even a stable server can give out if put under unusually heavy load. Such failures in isolation are usually nothing to worry about, but a mysterious crash for no clear reason on a server with no intensive process running on it is a definite cause for concern. In this case, don’t just reboot and hope for the best. You will need to look into the cause of the crash, taking the following into consideration:
- Pore over event logs to see if you can find any explanation for the odd behavior
- A physical check of the motherboard might be worthwhile to see if any components (such as capacitors in the power supply) are damaged
- Running a memory test and reseating the memory sticks is a good idea
- Check the server’s disk for errors
- Use antivirus/anti-malware software to see if an infection or intrusion might be causing the crashes
- Make sure that the server isn’t being put under undue stress (for example, you can use network monitoring software to alert you of high CPU, memory, or disk utilization)
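The log review in the checklist above can be partly automated. The sketch below counts log lines that match a few failure-related keywords. The keyword lists in `ASSUMED_PATTERNS` and the function name `summarize_log` are assumptions for illustration; real event logs differ by operating system and hardware, so the patterns would need tuning.

```python
import re
from collections import Counter

# Assumed keywords, grouped by the kind of problem they may point to.
# Adjust these to match the messages your own systems actually emit.
ASSUMED_PATTERNS = {
    "memory": re.compile(r"out of memory|oom-killer|machine check", re.IGNORECASE),
    "disk": re.compile(r"i/o error|bad sector", re.IGNORECASE),
    "thermal": re.compile(r"overheat|temperature above", re.IGNORECASE),
}


def summarize_log(lines, patterns=ASSUMED_PATTERNS):
    """Count how many log lines match each category of suspicious keyword."""
    counts = Counter()
    for line in lines:
        for category, pattern in patterns.items():
            if pattern.search(line):
                counts[category] += 1
    return dict(counts)


if __name__ == "__main__":
    sample_lines = [
        "kernel: Out of memory: Killed process 1234",
        "kernel: I/O error, dev sda, sector 2048",
        "sshd: Accepted publickey for admin",
    ]
    print(summarize_log(sample_lines))
```

A summary like this only narrows down where to look; the matching lines still need to be read in context.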
3. Sudden computer and service slowness
“My computer’s running slow!” is undoubtedly one of the most popular help desk ticket subject lines of all time, and the cause could be almost anything. With a server, though, sudden slowness is often the result of deep-seated problems that could put it at risk of failure.
For example, a process with a memory leak may eat up all of your system resources, grinding the system to a halt. A simple software update might fix things in these instances, but your system may slow down or crash for other reasons. Your Linux server might switch a filesystem to read-only if your hard drive is acting up. Or data corruption might be causing applications to fail at random. Over time, tiny problems will start to add up, and if regular maintenance isn’t enough to consistently keep your server in working order, it may be time for a replacement.
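One of these symptoms, a Linux filesystem that has gone read-only, can be checked with a short script. The sketch below parses text in the format of `/proc/mounts` and reports mount points that carry the `ro` option. The set `ASSUMED_WRITABLE_FS` and the function name `read_only_mounts` are assumptions for this example; some mounts are read-only by design, so the filter should reflect what is normal on your host.

```python
# Assumption: on this host, filesystems of these types are expected to be writable.
ASSUMED_WRITABLE_FS = {"ext4", "xfs", "btrfs"}


def read_only_mounts(mounts_text, fs_types=ASSUMED_WRITABLE_FS):
    """Return mount points of the given filesystem types mounted with 'ro'.

    Each line of mounts_text is expected to look like a /proc/mounts entry:
    device, mount point, filesystem type, comma-separated options, dump, pass.
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Compare whole options so "errors=remount-ro" is not mistaken for "ro".
        if fs_type in fs_types and "ro" in options.split(","):
            flagged.append(mount_point)
    return flagged


if __name__ == "__main__":
    sample = (
        "/dev/sda1 / ext4 ro,relatime 0 0\n"
        "/dev/sdb1 /data xfs rw,noatime 0 0\n"
    )
    print(read_only_mounts(sample))
```

On a Linux host, the same function could be fed the contents of `/proc/mounts` directly, for example `read_only_mounts(open("/proc/mounts").read())`.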
Maintaining your server may seem like a headache, but it doesn’t have to be. BerganKDV can help keep track of your server and ensure proper operations so you can focus on running your business. If you are curious about the technology services that BerganKDV provides, we encourage you to contact us to learn more.