Greenplum’s PGBouncer and prepared statements

For almost 6 months we were occasionally getting “ERROR: prepared statement “S_1″ already exists” error messages in Greenplum 4.3 which uses a PostgreSQL 8.2 engine under the hood . The frequency of the issue was unpredictable, sometime we ran into it multiple times within a day, sometimes it was not seen for weeks. Usually re-running the same query or job for a second time made it work flawlessly, sometimes it was failing multiple times in a row which was affecting development productivity. Below are the findings of the investigation and a proposed workaround.

PgBouncer

pgbouncer is a PostgreSQL connection pooler. Any client application can be connect to pgbouncer as if it were a PostgreSQL server, and pgbouncer will create a new connection to the actual database server, or it will reuse one of its existing connections from its own pool.
The aim of pgbouncer is to lower the performance impact of opening new connections to PostgreSQL by reusing idle connections.

In order not to compromise transaction semantics for connection pooling, pgbouncer supports several types of pooling when rotating connections:

  • Session pooling: Most polite method. When a client connects, a server connection will be assigned to it for the whole duration the client stays connected. When the client disconnects, the server connection will be put back into the pool. This is the default method.
  • Transaction pooling: A server connection is assigned to a client only during a transaction. When PgBouncer notices that transaction is over, the server connection will be put back into the pool.
  • Statement pooling: Most aggressive method. The server connection will be put back into the pool immediately after a query completes. Multi-statement transactions are disallowed in this mode as they would break.

The following table list various PostgreSQL features and whether they are compatible with PgBouncer pooling modes. Note that “transaction” pooling breaks client expectations of the server by design and can be used only if the application cooperates by not using non-working features.

FeatureSession poolingTransaction pooling
Startup parametersYesYes
SET/RESETYesNever
LISTEN/NOTIFYYesNever
WITHOUT HOLD CURSORYesYes
WITH HOLD CURSORYesNever
Protocol-level prepared plansYesNo
PREPARE / DEALLOCATEYesNever
ON COMMIT DROP temp tablesYesYes
PRESERVE/DELETE ROWS temp tablesYesNever
Cached plan resetYesYes
LOAD statementYesNever
[ source: https://www.pgbouncer.org/features.html ]

Why PgBouncer is useful and needed

PgBouncer is a lightweight connections pooler for PostgreSQL with the 3 reasons why it is used:

  1. It reduces PostgreSQL resource consumption (memory, backends, fork).
  2. It supports online restart/upgrade without dropping client connections.
  3. It allows allows PostgreSQL restart/upgrade without dropping client connections.

PostgreSQL forks a separate process for each client connection. At the same time applications do not always use connection to the database permanently, connections are often idle. Thus PgBouncer creates a pool of database connections and provides these connections to clients when the connection is required. Thus, if you have a lot of client connections to the database, PgBouncer can reduce the number of PostgreSQL backends processes. The response time between database and a client also decreases, because no need to fork a new backend process.

[ source: https://blog.dataegret.com/2015/06/3-reasons-to-use-pgbouncer.html ]

Prepared statements

A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified
statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed.
This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
[ source: https://www.postgresql.org/docs/9.3/sql-prepare.html ]

Server side prepared statements can improve execution speed as

  1. It sends just statement handle (e.g. S_1) instead of full SQL text
  2. It enables use of binary transfer (e.g. binary int4, binary timestamps, etc); the parameters and results are much faster to parse
  3. It enables the reuse server-side execution plan
  4. The client can reuse result set column definition, so it does not have to receive and parse metadata on each execution

[ source: https://jdbc.postgresql.org/documentation/head/server-prepare.html ]

There are two ways to use prepared statements: either you explicitly create an use them with the PREPARE keyword or the JDBC driver can also implicitly create them on the fly

Explicit prepared statements

PREPARE "test_S_1" (text) AS select column_name from table_name where object_name = $1;
EXECUTE "test_S_1"('INPUT1');
EXECUTE "test_S_1"('INPUT2');
SELECT * FROM pg_prepared_statements; -- this query lists all created prepared statements for the given session
only
DEALLOCATE "test_S_1"; -- this deletes the prepared statement with a given name

Implicit prepared statements

The JDBC driver uses server side prepared statements by default when PreparedStatement API is used. In order to get to server-side prepare, you need to execute the query 5 times (that can be configured via prepareThreshold connection property). An internal counter keeps track of how many times the statement has been executed and when it reaches the threshold it will start to use server side prepared statements.

There might be cases when you would want to disable use of server-prepared statements. For instance, if you route connections through a balancer that is incompatible with server-prepared statements, you have little choice.

You can disable usage of server side prepared statements by setting prepareThreshold=0

[ source: https://jdbc.postgresql.org/documentation/head/server-prepare.html ]

prepareThreshold = int:
Determine the number of PreparedStatement executions required before switching over to use server side prepared statements. The default is five, meaning start using server side prepared statements on the fifth execution of the same PreparedStatement object. More information on server side prepared statements is available in the section called “Server Prepared Statements”.
[ source: https://jdbc.postgresql.org/documentation/head/connect.html ]

In some cases, the prepareThreshold parameter might not work properly and doesn’t disable the use of prepared statements:

If the issue persists after setting the parameter, you’ll need to update the PostgreSQL JDBC4 driver used to 9.3 or newer due to this PgSQL bug: http://pgbouncer.projects.pgfoundry.org/doc/faq.html#_disabling_prepared_statements_in_jdbc

[ source: https://confluence.atlassian.com/crowdkb/crowd-crashes-due-to-psqlexception-error-prepared-statement-s_1-does-not-exist-729482785.
html
]

Using PgBouncer and prepared statements together

  • In Transaction pooling mode prepared statements are disabled. As such, the client should not try to use it, either implicitly by the JDBC driver nor explicitly by the SQL developer. Each client individually need to adhere to this server-side limitation (so all DB clients and ETL tools alike, each of them individually). If the client is not aware of this server side limitation and tries to use a prepared statement still, either of the following error messages can occur:
    • ERROR: prepared statement "S_1" does not exist
    • ERROR: prepared statement "S_1" already exists
  • In Session pooling mode, while prepared statements are enabled, it is important that the server clears all previously prepared statements when a given session is re-allocated to a new client. For better understanding, the flow is the following:
    1. A client connects gets an available session from the connection pooler, either a new or a re-used one. For the sake of the example, lets assume that this client uses prepared statements
    2. The client disconnects, the connection pooler puts the session back into the pool
    3. Either the same or a different client initiates a new connection – PgBouncer allocates one from the pool. For the sake of the example, lets assume it is the same session which was used in step 1. The client doesn’t know anything about what was created in the same session by the previous client and (rightfully) assumes it is starting from a clean sheet.
      • To ensure everything is set back to the default state in any given session, PgBouncer will execute whatever is defined in the server_reset_query of the config
      • If the server_reset_query doesn’t clear prepared statements properly, the second client could get an error message in case it tries to create a prepared statement while the previous client already created one with the same name and it still exists: ERROR: prepared statement "S_1" already exists

server_reset_query in PgBouncer’s config

Query sent to server on connection release, before making it available to other clients. The query is supposed to clean any changes made to the database session so that the next client gets the connection in a well-defined state. The default is DISCARD ALL which cleans everything, but that leaves the next client no pre-cached state. It can be made lighter, e.g. DEALLOCATE ALL to just drop prepared statements, if the application does not break when some state is kept around.

When transaction pooling is used, the server_reset_query is not used, as clients must not use any session-based features as each transaction ends up in a different connection and thus gets a different session state.

[ source: https://www.pgbouncer.org/config.html ]

Since PgBouncer is a connection pooler, it will reuse one actual connection to the database server for potentially many client connections. It has to make sure that whatever session state the first client created is reset when the next client gets the connection. Otherwise the first client might do something like SET statement_timeout = '5min', and that would apply to all subsequent clients who happen to be assigned that connection. To avoid that, PgBouncer issues the “server reset query” before handing out the server connection to a new client. As the documentation says:

A good choice for Postgres 8.2 and below is:

server_reset_query = RESET ALL; SET SESSION AUTHORIZATION DEFAULT;

for 8.3 and above its enough to do:

server_reset_query = DISCARD ALL;

[source: https://oraerr.com/database/postgre/what-is-the-purpose-of-pgbouncers-parameter-server_reset_query/]

The main reason this setting exists is that the DISCARD ALL statement, which was invented for this very purpose, was not available before PostgreSQL 8.3.

Testing the server_reset_query and workaround

In the above example, the expectation would have been that the RESET ALL query in the middle clears any prepared statements. As it didn’t do so, the second PREPARE statement failed.

One workaround to fix this is creating a new function which one by one runs a DEALLOCATE command for all prepared statements and add that function call to the server_reset_query:

CREATE OR REPLACE FUNCTION deallocate_all()
 RETURNS void AS
$BODY$
declare
 r record;
BEGIN
 for r in select 'deallocate "' || name || '";' as dealloc from pg_prepared_statements
 loop
 raise notice 'Command: %', r.dealloc;
 execute r.dealloc;
 end loop;
END;
$BODY$
 LANGUAGE plpgsql VOLATILE;
GRANT EXECUTE ON FUNCTION deallocate_all() TO public;

But if it works fine, is it really just a workaround?

Yes, I believe this is just a workaround and mitigation of one given manifestation of the issue we faced with and there might be other potential issues we yet to discover as the root cause is yet to be resolved. That root cause is that we don’t have a server_reset_query which sets everything back to the default state when a given session is re-allocated to a new client. In 8.3 and above, that is what the DISCARD ALL command should do, but in 8.2 what we are currently on, this command doesn’t exists. They have recommended to use RESET ALL, which clearly doesn’t do a good job otherwise there wouldn’t have been a need to create a new command DISCARD ALL in 8.3 and we wouldn’t have faced the above issue at all. The command RESET ALL doesn’t clear prepared statements, that we have discovered already so we created one mitigation for that given issue. What else RESET ALL doesn’t cater for whereas DISCARD ALL would we don’t know for sure until we face an another issue.

This guy/gal sums it up nicely:

As long as DISCARD ALL guarantees that all connections revert to a pristine state before returning to the pool, all behavior remains deterministic, in that the values of all settings within a transaction are controlled only by statements run within that transaction, plus the regular database configuration (per-user, per database, per-cluster, etc.)

Only if server_reset_query is left empty does the system enter what I would call a broken state: the values of settings are nondeterministic, governed by an unpredictable combination of all database activity since the creation of the connection pool.

[source: https://github.com/pgbouncer/pgbouncer/issues/110]

Although he or she is arguing about using server_reset_query for transaction pooling mode as well, his or her point is that if there is no clean sheet given to the new client of a re-used session then we enter into a nondeterministic and unpredictable state. Will the new client’s prepared statements fail? Sometimes they will, sometimes they won’t, depending on what the previous client of the re-used session did within their session.

One thought on “Greenplum’s PGBouncer and prepared statements

  1. Hello my friend! I wish to say that this post is amazing, great
    written and come with approximately all vital infos.

    I would like to see more posts like this .

Leave a Reply

Your email address will not be published. Required fields are marked *