Tech/Some ideas on improving php-fpm

From ~esantoro

So a few months ago I jumped ship and moved back to working with containers, kubernetes, and keeping stuff written in php up and running. It's not the first time I've been in charge of keeping php software up, and frankly I'm surprised at how php has been declared "dead" for the last 10-15 years while companies are not only delivering but thriving with it. I could digress more on this, but it's probably a topic for another post.

In this article I want to write down a few ideas I had about improving php-fpm.

I don't write code for a living, but I might try my hand at it in the future (who knows?). It might be a fun experience.

Kubernetes-style probes as first-class features

When running php stuff in kubernetes, you really want to configure the various startup/readiness/liveness probes adequately so that kubernetes can do some basic management for you (e.g. restarting misbehaving containers and/or replacing entire pods that are in an inconsistent state).

One of the most common approaches is to expose some kind of endpoint like /health or /healthcheck so that kubernetes can determine when your application has booted and whether your application thinks it is in a good state.

In handling requests to such endpoints, before returning 200 OK the application is supposed to inspect its internal state. Is the startup process over and successful? Did it manage to reach all the known-in-advance endpoints it'll need later? Are cache servers reachable? Are database connections established and still alive? This kind of stuff.
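
To make this concrete, here is a minimal sketch of what such an endpoint might look like in plain php. The connection parameters and the set of checks are hypothetical; a real application would check whatever it actually depends on:

    <?php
    // Minimal health-check endpoint sketch: inspect internal state
    // before telling the orchestrator we are good to serve traffic.

    function checkDatabase(PDO $db): bool {
        try {
            $db->query('SELECT 1'); // cheap round-trip to verify the connection
            return true;
        } catch (PDOException $e) {
            return false;
        }
    }

    try {
        // hypothetical connection parameters
        $db = new PDO('mysql:host=db.internal;dbname=app', 'user', 'secret',
                      [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
        $healthy = checkDatabase($db); // add cache/endpoint checks as needed
    } catch (PDOException $e) {
        $healthy = false;
    }

    http_response_code($healthy ? 200 : 503);
    header('Content-Type: text/plain');
    echo $healthy ? 'OK' : 'NOT OK';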

Necessary background and terminology

php-fpm is the "fastcgi process-manager".

The two keywords here are "fastcgi" and "process-manager". What this means in layman's terms is that php-fpm can be instructed to spawn pools of child php processes, and then forward http requests to those children via fastcgi.

Fastcgi is basically a glorified version of the old cgi. Whereas in old cgi a new copy of the executable was spawned for each request, in fastcgi you spawn the executable only once and speak with it using a standard protocol (the fastcgi protocol, I mean).

(The fastcgi specification allows for multiplexing multiple requests over a single channel, but I'm not sure this is actually used in php-fpm).

Process-manager means that php-fpm will not only start child processes but also keep an eye on their liveness and start new children should they crash. It will also terminate children if it deems it necessary.

When spawning pools, the process-manager (pm) component of php-fpm can use three main strategies (a sample pool configuration follows after the list):

  • static: it will start a fixed number (pm.max_children) of child processes, and if any of them dies it'll start new ones until the desired number of children is reached again
  • dynamic:
    • it will start children on demand, up to a maximum (pm.max_children)
    • it will start an initial number of child processes (pm.start_servers) at the beginning and start new ones if the number of alive children goes below that
    • it will always try to keep a number of spare/idle children between pm.min_spare_servers and pm.max_spare_servers, ready to serve additional requests
    • when creating additional children, it will create at most pm.max_spawn_rate children at once
  • ondemand: it will start child processes upon receiving requests
    • idle children will be kept around for up to pm.process_idle_timeout before being terminated
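
For reference, a pool using the dynamic strategy might be configured like this (the numbers are made up and should be tuned to the actual workload):

    [www]
    listen = 127.0.0.1:9000
    pm = dynamic
    ; hard cap on the number of children
    pm.max_children = 20
    ; children spawned at startup
    pm.start_servers = 5
    ; keep between 2 and 8 idle children around
    pm.min_spare_servers = 2
    pm.max_spare_servers = 8
    ; spawn at most 4 new children at once
    pm.max_spawn_rate = 4
    ; recycle each child after 500 requests (see the note below)
    pm.max_requests = 500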

When using an http request as a probe in kubernetes, php-fpm will select an available (idle) child from the pool serving requests and forward the probe request to that child.

Note: you can configure php-fpm to terminate and restart a child once it has served pm.max_requests requests. As child php processes keep the loaded code in memory, this can help work around memory leaks. It's not pretty but it works.

The issue

Occasionally applications will misbehave in unexpected ways.

One of the cases I had the pleasure of observing lately is the following:

  • the static strategy is selected
    • whether or not this is the best approach is irrelevant for this discussion
    • it might happen with dynamic as well
  • another service (external to php) is misbehaving and replying slowly
  • php code executed by the child processes depends on the external service to build the response
    • this means that each such request keeps a php-fpm child process busy
  • soon all the child processes in the pool are busy
  • when a probe request arrives from kubernetes, it gets queued like any other request
  • if the external service degradation persists for too long, kubernetes will interpret the delay as a timeout
  • if enough such "timeouts" occur, kubernetes will assume that the container (or the pod) is in a bad state and will restart it

In layman's terms, this means that an external service acting slow can cause downtime in a largely unrelated service. This is a fairly classic scenario, and it's one of those annoying things to debug because it's not immediately obvious (unless you have proper application-level instrumentation and monitoring).
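
For context, the kubernetes side of this scenario is governed by the probe parameters. A liveness probe like the one sketched below (the values are illustrative, and the probe is assumed to hit the web server sitting in front of php-fpm) restarts the container after failureThreshold consecutive failures, where any reply slower than timeoutSeconds counts as a failure:

    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10     # probe every ten seconds
      timeoutSeconds: 2     # replies slower than this count as failures
      failureThreshold: 3   # three consecutive failures trigger a restart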

In general, a web application dragging its feet and replying slowly is still much better than a web application not working at all.

In the case of an increase in real, organic traffic, the "slow" web-application replicas might hold on long enough for auto-scaling to kick in and add more replicas, alleviating the issue for all the people involved, including (full disclosure: conflict of interest) site reliability engineers.

Proposed solution

What I think might be a good idea is the following:

  • Pools of php-fpm children should also listen on an additional tcp port which is dedicated exclusively to probing by kubernetes (or some other orchestration/management tool).
    • Such a port might be called the "health-check port" and is specific to a pool. Each pool might have its own health-check port.
  • HTTP requests received on the health-check port are served with higher priority than normal requests
    • this is under the assumption that health-check requests are "fast" (I'm being purposely vague here) and generally time-bounded (using proper timeouts etc)
    • example: if there are ten regular http requests waiting to be served on the regular port and a request comes in on the health-check port, then this last request skips the queue and gets served as soon as a php child process is available to serve it.
  • Optionally (maybe activated via a boolean parameter): pools of php-fpm children might have a mini-pool consisting of only one process (maybe restarted every pm.max_requests requests) that is exclusively dedicated to requests received on the health-check port.
    • Such a mini-pool might be called the "health-check sub-pool" (not sure about this name). Each pool may have its own health-check sub-pool.

The core idea is that probing should not be obstructed by the "php child processes" resource being saturated. Having saturated child processes cause kubernetes probes to fail might worsen a situation that would otherwise be just temporary.

Such features (health-check port, health-check sub-pool) would be optional, and everything should keep working the same if they're not explicitly configured.

Cost impact

Listening on an extra port is essentially free.

CPU impact:

  • Minimal cpu scheduling overhead (one more process for the scheduler to juggle)
  • No extra cost application-wise: health-check requests have to be served anyway.

Memory impact:

  • if using the health-check sub-pool, there might be an extra instance of the application fully loaded but only serving health-check requests
    • this however gives a stronger guarantee of no interference
  • there is no extra cost otherwise
    • without the sub-pool, the priority handling "strongly mitigates" but does not eliminate the chance of a health-check request timing out.

Effectiveness

Effectiveness largely depends on whether the sub-pool is being used.

Sub-pool not being used:

  • If all the php children are currently busy, the health-check request will still have to wait for a child process to become available.
  • Having health-check requests served with higher priority might help, but the failure scenario is only mitigated, not removed.
  • if no php child becomes available before the probe timeout, php-fpm will appear unhealthy

Sub-pool being used:

  • The effectiveness is higher but so is the operating cost.

Depending on the specifics of the application, using a sub-pool may or may not be worth it.

Risks

Depending on how the application is written and how it works, using the dedicated sub-pool might pose risks similar to those of an application suddenly receiving a request after a "long" time without any traffic.

Example of a false-positive failure scenario:

  1. An application boots
    1. As part of its boot process, it connects to a database
  2. The application receives a health-check request
    1. the health-check handler processes the request, checking things like the liveness of the database connection
    2. as the application has just started, the database connection is okay
    3. other checks succeed as well
    4. the application responds positively to the probe
  3. some time passes by
  4. as the code-path that actually uses the database is never called (the health-check only checks whether the connection is still open), the database connection times out
  5. The application receives a new health-check request
    1. the database connection turns out to be dead, and the health-check fails

This is an example. It can go wrong in other ways, of course.

Mitigations

The application should either be written knowing it'll be serving health-check requests from a dedicated long-living process, or it should avoid the health-check sub-pool approach.
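
As a sketch of the first option: a health-check handler meant to run in a long-lived dedicated process could actively exercise the database connection and transparently reconnect when it has gone stale, instead of assuming that state established at boot is still valid. The function name and connection parameters here are hypothetical:

    <?php
    // Sketch for a long-lived health-check process: actively exercise
    // the connection instead of trusting state established at boot.
    function pingDatabase(?PDO $db): PDO {
        try {
            if ($db !== null) {
                $db->query('SELECT 1'); // forces a real round-trip
                return $db;
            }
        } catch (PDOException $e) {
            // connection went stale: fall through and reconnect
        }
        // hypothetical connection parameters
        return new PDO('mysql:host=db.internal;dbname=app', 'user', 'secret',
                       [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
    }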

Security risks

This change does not pose additional security risks in my opinion.

Having a dedicated endpoint for probes might provide a false sense of security and tempt people to include information in the response body that should be kept private.

But HTTP endpoints for health-checks should not be reachable from the public internet, regardless of which port they are served on.

Unavoidable risks

I've realised there is an implicit underlying assumption in this proposal: that false positives are not due to cpu time being the saturated resource.

If either the physical cpu is saturated or the cpu quotas (think of per-container cpu limits in kubernetes) are exhausted, then there is not much this proposal can fix.

Adoptability impact

Adopting this feature might be as easy as:

  • setting the health-check port via a pool-specific configuration parameter
    • this could be something like pm.health_check_port
  • setting an optional health-check sub-pool size via a configuration parameter
    • this could be something like pm.health_check_pool_size = 1
    • the default value would be 0, meaning the feature is not enabled
    • optionally: one might set a parameter like pm.health_check_pool_max_requests = 200 to control after how many requests the process in the sub-pool should be restarted
    • a hypothetical pool configuration using these parameters is sketched below
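
Putting it all together, a pool opting into both features might look like the sketch below. To be clear, the pm.health_check_* parameters are the hypothetical ones proposed in this post; they do not exist in php-fpm today:

    [www]
    listen = 127.0.0.1:9000
    pm = static
    pm.max_children = 20
    ; --- hypothetical, proposed parameters below ---
    ; extra tcp port dedicated to orchestrator probes
    pm.health_check_port = 9001
    ; one child reserved exclusively for health-check requests
    pm.health_check_pool_size = 1
    ; recycle the dedicated child after 200 requests
    pm.health_check_pool_max_requests = 200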

Final considerations

After writing down these ideas and playing with them a bit in my head, I realised that the prospect of using a sub-pool might be less optimal than I initially thought.

Pondering on that, however, made me come up with the idea of re-using the usual pool but serving requests from the health-check port at higher priority.

I'm overall satisfied with the "proposal", as it can both ease some operational issues and provide the configurability to decide how "deep" one wants to go with it.

Prometheus-style application metrics support

(I have ideas but it's getting late and I don't know if I feel like writing that now)