Notes |
(0000474)
ageric (administrator)
2012-06-07 07:33
|
While I'm sure the patch works most of the time, it's really not a stable long-term fix.
If several processes try to write at the same time and one of them fills up the pipe with a half-written command, it's entirely possible that Nagios will read half the command from one process and then continue to read an entirely different command from a different process, leading to one command being missed entirely.
In my opinion, that's worse than not getting all 4K of a service status output line passed onto Nagios.
Fortunately, the upcoming I/O broker may offer a solution. It will let Nagios listen on a Unix domain socket, where much larger payloads are possible and data can be read from multiple sources at once. That removes the need to work out whether a command without a trailing newline is continued by the next line read, or whether that next line is a new command whose sending process happened to be scheduled first by the kernel.
In the meantime, checking for '[' followed by a timestamp would be a good way to ensure that we don't miss any incoming commands, although a carelessly written patch of that kind might make a local DoS attack possible. |
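A minimal sketch of the '[' plus timestamp check suggested above, written in Python for illustration rather than as a patch against the Nagios C source. The function name `split_commands` and the 9 to 10 digit timestamp pattern are assumptions made for this example. Note that plugin output which itself contains something like "[1339054380] " would be split incorrectly, which is one reason such a check needs care.

```python
import re

# Assumed framing: every external command starts with "[<epoch seconds>] ".
COMMAND_START = re.compile(r"\[\d{9,10}\] ")


def split_commands(data):
    """Split raw command-pipe text at each '[<timestamp>] ' marker.

    Anything before the first marker is discarded, because it cannot be
    attributed to a complete command.
    """
    starts = [m.start() for m in COMMAND_START.finditer(data)]
    if not starts:
        return []
    bounds = starts + [len(data)]
    # Pair each start offset with the next one to slice out one command.
    return [data[a:b].rstrip("\n") for a, b in zip(bounds, bounds[1:])]
```

With this approach, a command that lost its trailing newline is still separated from the command that follows it, instead of being glued to it.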
|
(0000664)
estanley (administrator)
2013-01-12 13:02
|
I was able to send a 6k passive service check result to Nagios core 3.4.4 and it was not truncated. The input buffer for commands is 8k (and has been since at least 3.3.1), so that makes sense. If you have further information about how to reproduce this issue, please add it to the ticket. Otherwise, I think this should be closed as not a bug (at least not for the buffer size claimed). |
|
(0000666)
sbaynes (reporter)
2013-01-14 10:59
|
The problem is when multiple requests come in faster than they can be read from the buffer. 2*6k=12k which is > 8k. I guess the second gets split into two writes and if the first of those is then read before the second gets added to the buffer then it goes wrong.
So you will need to send bursts of multiple 6k results to the server to see the problem. Not just a single result.
I have never tried extracting this into a simple setup outside our full setup. I guess I could - but it will take me time to set up some infrastructure. |
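A rough sketch of how such a burst could be generated, assuming a test host and service already exist in the configuration. The command file path, the host and service names, and the helper names `build_result`, `send` and `burst` are all hypothetical and need adjusting locally. Each writer sends one result of about 6k, which is larger than PIPE_BUF on Linux (4096 bytes), so the kernel does not guarantee that concurrent writes stay contiguous.

```python
import os
import time
from multiprocessing import Process

# Hypothetical values: adjust to the local installation.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"
HOST, SERVICE = "testhost", "testservice"


def build_result(payload_size, tag):
    """Build one passive check result with payload_size bytes of plugin output."""
    output = (tag * payload_size)[:payload_size]
    line = "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;%s\n" % (
        int(time.time()), HOST, SERVICE, output)
    return line.encode("ascii")


def send(payload_size, tag):
    """Open the command pipe and issue the whole result in one write() call."""
    fd = os.open(COMMAND_FILE, os.O_WRONLY)
    try:
        os.write(fd, build_result(payload_size, tag))
    finally:
        os.close(fd)


def burst(writers=8, payload_size=6 * 1024):
    """Start several writer processes at once, each with its own filler letter."""
    procs = [Process(target=send, args=(payload_size, chr(ord("A") + i)))
             for i in range(writers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Calling `burst()` on the Nagios host and then looking for results whose output mixes filler letters, or for results that never arrive, should show whether commands are being interleaved or dropped.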
|
(0000673)
estanley (administrator)
2013-02-05 05:40
|
External commands are read from the input buffer very quickly, as quickly as the thread can run in most cases. From there they go into a circular buffer that has a configurable number of slots (4096 by default), each of unlimited size. The only case where there is a delay is when the slots are full, in which case the insert is retried every 250 ms.
I agree with Andreas that the patch provided has ordering problems, so I don't believe it is the correct approach.
I would first ensure that your command_check_interval is short enough that you can empty the circular buffer quickly. If you're receiving a large number of these large responses in short order, please also verify that external_command_buffer_slots is high enough to hold them. If that fails, try increasing the input buffer size, MAX_EXTERNAL_COMMAND_LENGTH in common.h, to see if that solves the issue. If it does, I believe a patch to increase that value, or possibly to make it a configuration parameter, would be a better approach. |
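For reference, the two settings mentioned above live in nagios.cfg. The fragment below is only an illustration: the slot count shown is an arbitrary example, not a recommendation, and -1 asks Nagios to check for external commands as often as possible.

```
# nagios.cfg (illustrative values only)
check_external_commands=1
command_check_interval=-1
external_command_buffer_slots=8192
```

MAX_EXTERNAL_COMMAND_LENGTH is a compile-time constant, so changing it means editing common.h and rebuilding.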
|