
Three tricks for solving industrial control all-in-one computer failures (Part 1) - Fault analysis method

Jun 10, 2024

This article covers the common faults of industrial control all-in-one computers and how to analyze them, from the perspective of operators and equipment maintenance engineers.


Industrial control sites produce all kinds of strange problems, but the stability and reliability issues of almost any product can be diagnosed using the three points described in this article. I hope that after reading it, readers will have a systematic method for diagnosing problems.

Three axes: resources-logic-environment

Any product failure traces back to these three core points. Analyzing them solves the actual problem and also suggests preventive measures from several angles.

1. Resources


Resources generally means hardware resources, that is, the hardware configuration: whether hard specifications such as the motherboard, CPU, memory, hard disk, network bandwidth, and external interfaces meet the requirements of the site.

For an industrial control all-in-one computer deployed on site, the hardware configuration has usually been tested against the software and meets its requirements. Insufficient configuration therefore rarely explains a system that runs normally at first and becomes abnormal after a period of time. In that case, look for the cause in the surrounding hardware, such as accessories, external interfaces, and connecting cables.

Drawing on more than ten years of experience developing and manufacturing industrial products, Lianzhitongda has found that the most frequent causes of failure are the following:

1. Poor interface contact. Interfaces that are plugged and unplugged frequently are the most likely to fail, while interfaces that are left alone rarely cause problems, or fail only after years of use;

2. Cable damage. This is the most common fault on a project site and also the hardest to find;

3. Mechanical hard disk damage. This tends to occur where power is cut improperly and often, or where there is vibration. SSDs hold up better and are generally recommended;

4. Poor memory contact. In dusty or humid environments, memory modules tend to misbehave after a period of use and work normally again once reseated;

5. High CPU and memory usage. Resource usage is another key point to watch;

Under normal circumstances, the CPU and memory usage of formally deployed industrial software stays within a controlled range. CPU usage may still run high occasionally, most often because the environment is overheated. Check the ambient temperature and the host temperature first; if the host is too hot, improve its cooling.

As a general guideline for stable operation of an industrial control all-in-one computer, CPU usage should not exceed 40% (the sustained level you see when watching the monitor, not the momentary peak). Memory usage should be relatively stable and stay below 50% overall, and memory leaks are unacceptable.
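The guideline above can be turned into a rough check on logged usage samples. The following is a minimal sketch, not a monitoring tool: the function name `assess_usage`, the sampling scheme, and the `leak_slope_limit` threshold are assumptions made for illustration, while the 40% and 50% limits come from the text. It averages CPU samples so that brief peaks are ignored, and fits a straight line to the memory samples to flag steady growth.

```python
from statistics import mean

CPU_LIMIT = 40.0  # percent, sustained level (from the guideline above)
MEM_LIMIT = 50.0  # percent, overall ceiling (from the guideline above)


def assess_usage(cpu_samples, mem_samples, leak_slope_limit=0.05):
    """Check usage samples (in percent) against the guideline.

    leak_slope_limit is a hypothetical threshold, in percentage points
    per sample, above which steady memory growth is reported.
    """
    findings = []

    # Average rather than maximum, so momentary peaks do not count.
    if mean(cpu_samples) > CPU_LIMIT:
        findings.append("sustained CPU usage above 40%")

    if max(mem_samples) > MEM_LIMIT:
        findings.append("memory usage above 50%")

    # Least-squares slope of memory usage over the sample index.
    n = len(mem_samples)
    if n >= 2:
        x_mean = (n - 1) / 2
        y_mean = mean(mem_samples)
        num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(mem_samples))
        den = sum((i - x_mean) ** 2 for i in range(n))
        if num / den > leak_slope_limit:
            findings.append("memory usage keeps growing (possible leak)")

    return findings


if __name__ == "__main__":
    cpu = [20, 25, 90, 22, 23]   # one brief peak, low average
    mem = [30, 31, 32, 33, 34]   # grows by one point per sample
    print(assess_usage(cpu, mem))
```

A suitable slope threshold depends on how often samples are taken, so it would have to be tuned on site; a rising trend is only a hint that a leak should be investigated, not proof of one.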

2. Logic


Logic generally means the business logic of the software: whether the software is mature and whether its logic contains defects.

Business logic problems are generally easier to reproduce, and mature industrial software has relatively few of them, because software deployed to an industrial site has normally been strictly tested and verified on site. This article therefore focuses mainly on hardware failures.

Business logic can still run into conflicts. For example, a sensor that raises alarms far more often than normal increases the load on the business logic. Such cases have to be tested and analyzed according to the actual situation.

Logic also includes the network topology, especially bus wiring schemes.

The network topology is the business logic of the overall system. Take the RS485 bus as an example: a single-machine test usually passes, but problems appear once multiple machines (terminals) are connected. The design is fine in theory yet fails in practice. It is therefore necessary to consider the interference the network topology may encounter in actual use and to take the necessary countermeasures.

RS485 is used as an example here to show that this seemingly non-logical fault belongs in the logic category, which makes it easier to analyze and handle. We will cover specific faults of industrial control all-in-one computers in dedicated articles. Follow Lianzhitongda to receive the information and product knowledge we share about industrial control all-in-one computers.

3. Environment


Environment generally means the working environment of the equipment: temperature, humidity, voltage fluctuations, electromagnetic interference, and other external factors that affect stable operation.

Temperature and humidity are the easiest to assess. Voltage fluctuations and electromagnetic interference are harder and generally require professional instruments. That said, Lianzhitongda's engineers, drawing on extensive field experience, also use some simple methods.

How can voltage stability be judged quickly? In general, a multimeter can be used to test whether the power supply is stable. Another kind of voltage instability comes from the power adapter. The adapter's output is low voltage, and a multimeter usually does not measure it accurately, so a simpler approach is to check how hot the adapter runs, which addresses the problem from another angle. Some warmth is normal, but if the surface is obviously hot, the adapter is overloaded, possibly beyond its rating. An overloaded adapter regulates voltage far less effectively, and it should be replaced.
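When multimeter readings of the supply are written down over time, a quick calculation shows how far they stray from the nominal value. This is a minimal sketch under stated assumptions: the function names and the default 10% tolerance are illustrative and do not come from the article or from any standard, so substitute the tolerance given in the equipment's own specification.

```python
def max_relative_deviation(readings, nominal):
    """Largest deviation from the nominal voltage, as a fraction of nominal."""
    return max(abs(v - nominal) for v in readings) / nominal


def supply_is_stable(readings, nominal, tolerance=0.10):
    """True if every reading stays within the tolerance band.

    tolerance=0.10 is an assumed default, not a value from the article.
    """
    return max_relative_deviation(readings, nominal) <= tolerance


if __name__ == "__main__":
    readings = [218.0, 221.5, 225.0, 190.0]  # volts, hypothetical log
    print(supply_is_stable(readings, nominal=220.0))
```

A handheld multimeter samples slowly, so this only catches sags and swells that last long enough to be read; short transients still need the professional instruments mentioned earlier.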

In mainland China, the power grid is currently relatively stable and the supply itself is generally not a problem, but switching large local equipment on and off causes voltage fluctuations on the local grid. For example, if the equipment crashes repeatedly and the crash times coincide with large equipment being switched on or off, that switching is almost certainly the cause. Adding a voltage stabilizer ahead of the equipment's power supply, or a magnetic ring on the power cord, generally solves the problem.
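The coincidence check described above can be done by hand, but with a crash log and a record of when large equipment was switched, a few lines of code make it systematic. The sketch below assumes both logs are available as timestamps; the function name `crashes_near_switching` and the two-minute window are illustrative choices, not values from the article.

```python
from datetime import datetime, timedelta


def crashes_near_switching(crash_times, switch_times, window=timedelta(minutes=2)):
    """Return the crashes that occurred within `window` of a switching event."""
    return [
        crash for crash in crash_times
        if any(abs(crash - switch) <= window for switch in switch_times)
    ]


if __name__ == "__main__":
    # Hypothetical logs for one day.
    crashes = [
        datetime(2024, 6, 1, 8, 0, 30),
        datetime(2024, 6, 1, 12, 1, 0),
        datetime(2024, 6, 1, 15, 30, 0),
    ]
    switches = [
        datetime(2024, 6, 1, 8, 0, 0),
        datetime(2024, 6, 1, 12, 0, 0),
    ]
    matched = crashes_near_switching(crashes, switches)
    print(f"{len(matched)} of {len(crashes)} crashes coincide with switching")
```

If most crashes fall inside the window, switching-induced voltage fluctuation is a strong suspect; if few do, the other causes discussed in this article deserve attention first.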

Electromagnetic interference is difficult for most engineers to diagnose. If it occurs frequently, consider purchasing an electromagnetic radiation detector for testing. Prevention is always the better choice. Pay attention to the following points:

1. Keep the equipment away from high-voltage lines and high-power equipment;

2. Shield the equipment casing;

3. Ground the equipment properly;

These are the general ideas for analyzing and handling on-site faults of industrial control all-in-one computers. Do not panic when a problem appears. Classify it into one of the three categories above and analyze it accordingly, and you can generally reach a fairly accurate conclusion.

This article is the first in a series on fault handling for industrial control all-in-one computers. Later articles will analyze specific problems. Lianzhitongda draws on many years of industrial product experience to share how specific problems are analyzed and handled. Follow us for product knowledge and fault-handling experience related to industrial control all-in-one computers.
