Environment variables
PGSync uses the dotenv module to read a .env file. You can declare environment variables in a .env file located at the root of your application.
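A .env file equivalent to the manual exports shown below might look like this (the values are placeholders):
PG_USER=kermit
PG_HOST=localhost
PG_PORT=5432
PG_PASSWORD=******
ELASTICSEARCH_HOST=127.0.0.1
ELASTICSEARCH_PORT=9200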
Alternatively, you can set the environment variables manually, e.g.
$ export PG_USER=kermit
$ export PG_HOST=localhost
$ export PG_PORT=5432
$ export PG_PASSWORD=******
$ export ELASTICSEARCH_HOST=127.0.0.1
$ export ELASTICSEARCH_PORT=9200
$ pgsync -c schema.json
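Given a schema config like the one below, PGSync builds the corresponding SQL query and indexes the resulting denormalized JSON documents into Elasticsearch/OpenSearch.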
Schema
{
    "table": "book",
    "columns": [
        "isbn",
        "title",
        "description"
    ],
    "children": [
        {
            "table": "author",
            "columns": [
                "name"
            ]
        }
    ]
}
SQL
SELECT
JSON_BUILD_OBJECT(
'isbn', book_1.isbn,
'title', book_1.title,
'description', book_1.description,
'authors', anon_1.authors
) AS "JSON_BUILD_OBJECT_1",
book_1.id
FROM book AS book_1
LEFT OUTER JOIN
(SELECT
JSON_AGG(anon_2.anon) AS authors,
book_author_1.book_isbn AS book_isbn
FROM book_author AS book_author_1
LEFT OUTER JOIN
(SELECT
author_1.name AS anon,
author_1.id AS id
FROM author AS author_1) AS anon_2 ON anon_2.id = book_author_1.author_id
GROUP BY book_author_1.book_isbn) AS anon_1 ON anon_1.book_isbn = book_1.isbn
JSON
[
    {
        "isbn": "9785811243570",
        "title": "Charlie and the chocolate factory",
        "description": "Willy Wonka’s famous chocolate factory is opening at last!",
        "authors": ["Roald Dahl"]
    },
    {
        "isbn": "9788374950978",
        "title": "Kafka on the Shore",
        "description": "Kafka on the Shore is a 2002 novel by Japanese author Haruki Murakami",
        "authors": ["Haruki Murakami", "Philip Gabriel"]
    },
    {
        "isbn": "9781471331435",
        "title": "1984",
        "description": "1984 was George Orwell’s chilling prophecy about the dystopian future",
        "authors": ["George Orwell"]
    }
]
PGSync supports the following environment variables:
Environment variable | Default | Description |
---|---|---|
SCHEMA | | Path to the application schema config |
CHECKPOINT_PATH | | Path to store the checkpoint file |
QUERY_CHUNK_SIZE | 10000 | Database query chunk size (how many records to fetch at a time) |
POLL_TIMEOUT | 0.1 | Database poll interval (consider reducing this duration to increase throughput) |
REPLICATION_SLOT_CLEANUP_INTERVAL | 180.0 | Replication slot cleanup interval (in secs) |
LOG_INTERVAL | 0.5 | Stdout log interval (in secs) |
NUM_WORKERS | 2 | Number of workers to spawn for handling events |
USE_ASYNC | False | Enable experimental async mode |
POLL_INTERVAL | 0.1 | Database polling interval when running in polling mode |
ELASTICSEARCH_SCHEME | http | Elasticsearch/OpenSearch protocol |
ELASTICSEARCH_HOST | localhost | Elasticsearch/OpenSearch host |
ELASTICSEARCH_PORT | 9200 | Elasticsearch/OpenSearch port |
ELASTICSEARCH_USER | | Elasticsearch/OpenSearch user |
ELASTICSEARCH_PASSWORD | | Elasticsearch/OpenSearch password |
ELASTICSEARCH_TIMEOUT | 10 | Elasticsearch/OpenSearch request timeout; increase this if you are getting read request timeouts |
ELASTICSEARCH_CHUNK_SIZE | 2000 | Elasticsearch/OpenSearch index chunk size (how many documents to index at a time) |
ELASTICSEARCH_MAX_CHUNK_BYTES | 104857600 | Maximum size of an Elasticsearch/OpenSearch request in bytes (default: 100MB) |
ELASTICSEARCH_THREAD_COUNT | 4 | Size of the threadpool to use for Elasticsearch/OpenSearch bulk requests |
ELASTICSEARCH_QUEUE_SIZE | 4 | Size of the task queue between the main thread (producing chunks to send) and the processing threads |
ELASTICSEARCH_VERIFY_CERTS | True | Verify Elasticsearch/OpenSearch SSL certificates |
ELASTICSEARCH_USE_SSL | False | Turn on SSL |
ELASTICSEARCH_SSL_SHOW_WARN | False | Show warnings about SSL certificate verification |
ELASTICSEARCH_CA_CERTS | | Path to CA certs on disk |
ELASTICSEARCH_CLIENT_CERT | | PEM-formatted SSL client certificate |
ELASTICSEARCH_CLIENT_KEY | | PEM-formatted SSL client key |
ELASTICSEARCH_AWS_REGION | | Elasticsearch/OpenSearch AWS region for fully managed services |
ELASTICSEARCH_AWS_HOSTED | False | Whether the Elasticsearch/OpenSearch instance is an AWS fully managed service |
ELASTICSEARCH_STREAMING_BULK | False | Use streaming bulk indexing for Elasticsearch/OpenSearch |
ELASTICSEARCH_MAX_RETRIES | 0 | Maximum number of times a document will be retried when a 429 is received |
ELASTICSEARCH_INITIAL_BACKOFF | 2 | Number of seconds to wait before the first retry |
ELASTICSEARCH_MAX_BACKOFF | 600 | Maximum number of seconds a retry will wait |
ELASTICSEARCH_RAISE_ON_EXCEPTION | True | If False, do not propagate exceptions from calls to Elasticsearch/OpenSearch |
ELASTICSEARCH_RAISE_ON_ERROR | True | Raise a BulkIndexError containing errors (as .errors) from the execution of the last chunk when some occur |
ELASTICSEARCH_API_KEY_ID | | Elasticsearch/OpenSearch API key ID |
ELASTICSEARCH_API_KEY | | Elasticsearch/OpenSearch API key |
PG_HOST | localhost | Postgres database host |
PG_USER | | Postgres database username (superuser) |
PG_PORT | 5432 | Postgres database port |
PG_PASSWORD | | Postgres database user password |
PG_SSLMODE | | Postgres SSL TCP/IP connection mode ('disable', 'allow', 'prefer', 'require', 'verify-ca' or 'verify-full') |
PG_SSLROOTCERT | | Name of a file containing SSL certificate authority (CA) certificate(s) |
REDIS_SCHEME | redis | Redis connection scheme |
REDIS_HOST | localhost | Redis server host |
REDIS_PORT | 6379 | Redis server port |
REDIS_DB | 0 | Redis database |
REDIS_AUTH | | Redis password |
REDIS_USER | | Redis username |
REDIS_READ_CHUNK_SIZE | 1000 | Number of items to read from Redis at a time |
REDIS_WRITE_CHUNK_SIZE | 1000 | Number of items to write to Redis at a time |
REDIS_SOCKET_TIMEOUT | 5 | Redis socket connection timeout |
REDIS_POLL_INTERVAL | 0.01 | Redis poll interval |
NEW_RELIC_ENVIRONMENT | | New Relic environment name |
NEW_RELIC_APP_NAME | | New Relic application name |
NEW_RELIC_LOG_LEVEL | | Sets the level of detail of messages sent to the log file |
NEW_RELIC_LICENSE_KEY | | New Relic license key |
CONSOLE_LOGGING_HANDLER_MIN_LEVEL | | Minimum console logging level (CRITICAL, ERROR, WARNING, INFO or DEBUG) |
CUSTOM_LOGGING | | |
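As an example, you might override a few of these to tune sync throughput before starting PGSync (the values below are illustrative, not recommendations):
$ export QUERY_CHUNK_SIZE=5000
$ export ELASTICSEARCH_CHUNK_SIZE=5000
$ export ELASTICSEARCH_TIMEOUT=30
$ export NUM_WORKERS=4
$ pgsync -c schema.json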