PostgreSQL is a modern, dynamic database with a very large set of features that allow you to solve a wide range of tasks. Using PostgreSQL usually refers to a very critical segment of the IT infrastructure that is associated with the processing and storage of data. Given a special place in the database infrastructure and the degree of criticality of tasks assigned to it, the question of monitoring and proper control over the work of the DBMS. In this regard, PostgreSQL has extensive internal means of collecting and maintaining statistics. Collect statistics gives a fairly detailed picture of what is going on under the hood in the operation of the database. These statistics are stored in special system tables, views, and is constantly updated. Performing regular SQL queries in these tables can receive a variety of data bases, tables, indexes, and other database subsystems.
Below I describe the method and means for monitoring PostgreSQL monitoring system Zabbix. I like this because the monitoring system provides ample opportunities for the implementation of a custom of monitoring a variety of systems and processes.

Monitoring will be based on SQL queries to tables of statistics. Sami requests take the form of an additional configuration file to zabbix agent in which SQL statements are wrapped in the so-called userParameters – user monitoring parameters. User parameter in Zabbix is a great way that you can configure monitoring for unusual things, such things in our case will be the parameters of PostgreSQL. Each user parameter consists of two elements: The key name and team . The key name is a unique name that does not overlap with other key name. The team – is the actual command-action to be performed by the agent zabbix. An expanded version of this command can be transferred to various parameters. In the configuration zabbiksa, it looks like this:
UserParameter = custom.simple.key, / usr / local / bin / simple-script
UserParameter custom.ext.key = [*], / usr / local / bin / ext-script $ 1 to $ 2

Thus, all requests for statistics are PostgreSQL psql client requests wrapped in custom settings. Strengths:

minimum requirements for the configuration of the observed node – in the simplest case, add the config and restart the zabbix agent (complex case involves tailor the rights of access to PostgreSQL);
Configuring connection to PostgreSQL, as well as thresholds for the triggers are executed through macro variables in the web interface – so no need to climb to the triggers and produce a template for the case of different thresholds for different hosts (macros can be assigned to host);
a wide range of data collected (compound time transactions, statistics databases and tables, streaming replication and so forth.);
low-level detection for databases, tables and servers stand-by.

a lot of the observed parameters, maybe someone wants something to be disabled.
depending on the version of PostgreSQL some things will not work. In particular, it concerns the monitoring of replication, as Some features are simply not present in older versions. Written with an eye to version 9.2 and higher.
also some things you must have installed extensions and pg_stat_statements pg_buffercache – if the extension is not set some parameters will not be available for observation.

Monitoring capabilities:
information on the volume of emissions and writes the buffer, check points and the recording time during chekpoyintov – pg_stat_bgwriter
general information on the shared buffers – it requires expansion pg_buffercache . I also want to note that requests for these statistics are resource-intensive, which is reflected in the documentation for the expansion, so depending on your needs, you can either increase the polling interval, or uncheck all.
general information on the service – uptime, response time, cache hit ratio, the average query time.
Information for client connections and run-time query / transaction – pg_stat_activity .
the size of the database and summary statistics for all databases (commit / rollbeki, read / write temporary files) – pg_stat_database
potablichnaya Statistics (read / write, number of service tasks such as vacuum / analyze) – pg_stat_user_tables , pg_statio_user_tables .
Information for streaming replication (server status, number of replicas, log them) – pg_stat_replication
Other pieces (number of rows in the table, the existence of the trigger configuration, WAL logs)

Additionally, it is worth noting that for the collection of statistical data should include the following parameters in the postgresql.conf:
track_activities – includes tracking commands (queries / statements) of all client processes;
track_counts – includes the collection of statistics on tables and indexes, Installation and setup. All you need to adjust is in Github repository.

Clone # git
# CP zabbix-Extensions / Files / postgresql / postgresql.conf /etc/zabbix/zabbix_agentd.d/

Further it should be noted that to satisfy requests from the agent, it is necessary that in pg_hba configuration was determined by the appropriate access – agent should be able to connect to the postgres ustanalivat service to the target database and execute queries. In the simplest case, you must add the following line in pg_hba.conf (for different distribution file location may be different) – Allow connections on behalf of the postgres database mydb with localhost.
host mydb postgres trust

Memorable that after changing pg_hba.conf postgresql service to do reload (pg_ctl reload). However, this is the easiest option and not entirely safe, so if you want to use the password or a more complex access scheme, then again, carefully read the pg_hba and .pgpass . So the configuration file is copied, it is necessary to load it into the basic configuration, make sure that the main file agent configuration, there is a line Include c given by where there are available additional configuration files. Now restart the agent, then we can test the performance of a very simple test – using in square brackets indicates the option to connect to postgres, which will be transferred to the psql client.

# Systemctl restart zabbix-agent.service
# zabbix-Get -s -k [‘- H -p 5432 -U postgres -d mydb’]

If you are properly registered to access, then you will return to the service response time in milliseconds. If an empty string is returned, then there is a problem with access pg_hba. If the returned string ZBX_NOTSUPPORTED – do not load the configuration, check the configuration agent route to Include, and put up the right configuration. When the check command returns the correct answer is download the template and upload it to the web interface and assign to the target host. Download the template can also be from the repository (postgresql-extended-template.xml). After you import the need to go to the Settings tab macros template and customize them. Below is a list and brief description of:

PG_CONNINFO – this is the connection parameters that will be passed to the client psql when you run the query. This is the most important macro as defines the parameters for connecting to postgres service. The string specified by default is more or less universal for all cases, but if you have multiple servers and each server are different from other settings, then hosts in zabbikse can define a macro with the same name and ask him individual value. When you check the macro host takes precedence over macro template.

PG_CONNINFO_STANDBY – this is the connection parameters which will be transferred to the utility psql when prompting for stanby server (definition replication lag).
PG_CACHE_HIT_RATIO – the threshold value for the trigger on the success rate of a cache hit; triggered flip-flop if the percentage of hits will be below the mark;
PG_CHECKPOINTS_REQ_THRESHOLD – the threshold for checkpoints on demand
PG_CONFLICTS_THRESHOLD – the threshold for the conflicts arising in the performance of queries on the servers of standby;
PG_CONN_IDLE_IN_TRANSACTION – the threshold for compounds that opened the transaction and nothing at the same time do not (bad transaction);
PG_CONN_TOTAL_PCT – the threshold for the percentage of open connections to the maximum possible number of connections (if 100%, everything ran Connect);
PG_CONN_WAITING – the threshold for blocked requests are waiting for the completion of other requests;
PG_DATABASE_SIZE_THRESHOLD – the threshold for the size of the database;
PG_DEADLOCKS_THRESHOLD – the threshold for deadlocks (fortunately they are resolved automatically, but their presence is desirable to be aware of, because it is direct evidence of poorly written code);
PG_LONG_QUERY_THRESHOLD – the threshold for query execution time; work will trigger if the queries whose execution time exceeds this mark;
PG_PING_THRESHOLD_MS – the threshold for the response time of the service;
PG_SR_LAG_BYTE – threshold for replication lag in bytes;
PG_SR_LAG_SEC – threshold for replication lag in seconds;
PG_UPTIME_THRESHOLD – threshold uptime, uptime if below then restart the service;

From the text of the triggers should be understood for what these thresholds:
PostgreSQL active transaction to long – the longest recorded transaction or request;
PostgreSQL cache hit ratio too low – too low percentage of hits in the cache;
PostgreSQL deadlock occured – fixed deadlock;
PostgreSQL idle in transaction connections to high – a lot of connections in the state idle in transaction;
PostgreSQL idle transaction to long – the longest recorded transaction in a state idel in transaction;
PostgreSQL number of waiting connections to high – zafiksirovanna request or transaction is in an idle state;
PostgreSQL recovery conflict occured – collision is detected when restoring to respond;
PostgreSQL required checkpoints occurs to frequently – checkpoints occur too often;
PostgreSQL response to long – long response time;
PostgreSQL service not running – service is not running;
PostgreSQL service was restarted – service restarted;
PostgreSQL total number of connections to high – the total number of connections is too high and close to the max_connections;
PostgreSQL waiting transaction to long – fixed too long queries or transactions in the idle state;
PostgreSQL database {#DBNAME} to large – size of the database is too large;
PostgreSQL streaming lag between {HOSTNAME} and {#HOTSTANDBY} to high – lag replication between servers too big.

Terms of low-level detection
PostgreSQL databases discovery – the discovery of existing database with the ability to filter on regular expressions. Upon detection of the size of the graph is added;
PostgreSQL database tables discovery – the discovery of available tables in the observable can filter based on regular expressions. Be careful with the filter, and add only those tables that you really interesting, because This rule generates 21 parameters on each found a table. Upon detection of the size of graphics added, scanning, changing strings and statistics on reading.
PostgreSQL streaming stand-by discovery – discovery connected replicas. Upon detection of a graph is added to the replication lag.

Available graphics, if we talk about graphics, I have tried to group the observed parameters, while not overloading the drawing excessively large number paramterov. So information from pg_stat_user_tables separated by 4 schedule.
PostgreSQL bgwriter – general information about what happens to a buffer (as allocated, and how much is written).
PostgreSQL buffers – general information on the state of shared buffers (buffers as much use as “dirty” buffers).
PostgreSQL checkpoints – information on what is happening checkpoint.
PostgreSQL connections – information on client connection.
PostgreSQL service responce – response time of service and the average execution time of queries.
PostgreSQL summary db stats: block hit / read – read from the cache and disk.
PostgreSQL summary db stats: events – events in the database (deadlocks, conflicts, commits rollbeki).
PostgreSQL summary db stats: temp files – Information on temporary files.
PostgreSQL summary db stats: tuples – general information on the changes in the lines.
PostgreSQL transactions – query execution time.
PostgreSQL uptime – and uptime percentage of hits in the cache.
PostgreSQL write-ahead log – information about the WAL log (recording capacity and number of files).
PostgreSQL: database {#DBNAME} size – information on changing the size of the database.
PostgreSQL table {#TABLENAME} maintenance – maintenance operations table (autovacuum, autoanalyze, vacuum, analyze).
PostgreSQL table {#TABLENAME} read stats – statistics on reading from the cache disk.
PostgreSQL table {#TABLENAME} rows – a change in the lines.
PostgreSQL table {#TABLENAME} scans – information on scanning (sequential / index scans).
PostgreSQL table {#TABLENAME} size – Information on the size of tables and indexes.
PostgreSQL streaming replication lag with {#HOTSTANDBY} – lag size replication replica server.

In conclusion, a little graphic examples:
Here we can see that the database regularly creates temporary files should look for the culprit in the log and review work_mem. Here the events occurring in the database – commits / rollbeki and conflicts / deadlocks – a whole all is well here. Here the state streaming replication with one of the servers – the time lag in seconds and bytes. And the final schedule – the response time of the service and the average query time. That’s all, thank you all for your attention!

(Visited 3,427 times, 1 visits today)
PostgreSQL is a modern, dynamic database with a very large set of features that allow you to solve a wide range of tasks. Using PostgreSQL usually refers to a very critical segment of the IT infrastructure that is associated with the processing and storage of data. Given a special...

Do you want to be notified about new DBA updates, releases, jobs and free tips? Join our email newsletter. It's fast and easy. You will be among the first to know about hot new DBA updates and stuff, it will help you enhance your DBA skills.
We take your privacy very seriously