Self Hosted E-Receipts API
Introduction
Welcome to the BlinkReceipt Self Hosted API. This API allows you to take advantage of the full power of our E-Receipts API without any PII leaving your infrastructure.
Prerequisites
- Install Docker
- Contact your account representative for a starter Docker environment file that will be pre-filled with credentials specific to your account
Infrastructure Requirements
Before deploying the application, ensure the following infrastructure is provisioned and accessible to the containers:
- PostgreSQL-compatible database instance
  - Recommended: Amazon RDS (Aurora PostgreSQL)
  - Create a database and a user with read/write access to it
- Redis instance
  - Recommended: Amazon ElastiCache (Redis)
- AmazonMQ for RabbitMQ Instance
  - Recommended: AmazonMQ with RabbitMQ engine
  - Instance Type: mq.t3.micro (minimum)
  - Engine version: 3.11 or later
  - Deployment mode: Single-instance (for development) or Cluster (for production)
  - Network access: Ensure your containers can connect to the broker endpoint
- S3 Bucket for OTA data updates
  - The application must have read access to this bucket.
  - Actual's AWS account requires write access to push OTA updates.
- IAM Role Configuration (AWS)
  - Attach IAM role(s) to your ECS tasks or EC2 instance profiles with the necessary permissions to access the S3 bucket, DB, Redis, and AmazonMQ.
System Overview
The application is designed to run entirely within containers and supports four core roles, all of which are encapsulated in the same Docker image and differentiated by the ROLE environment variable:
API Service (ROLE=API)
- Exposes the main HTTP endpoints.
- Requires connectivity to the database, Redis, and RabbitMQ.
- Should be scaled based on expected request traffic.
Worker Service (ROLE=WORKER)
- Processes background jobs from RabbitMQ (e.g., data processing, extraction tasks).
- Should be scaled depending on job volume and desired latency.
- Requires connectivity to Redis and RabbitMQ.
DB Migration Script (ROLE=MIGRATE_DB)
- Should be run once with every deployment. It creates the tables on first run and applies any subsequent migrations as new versions of the Docker image are released.
- Requires connectivity to the database.
OTA Update Cron (ROLE=UPDATE_CRON)
- Run upon deployment and periodically (e.g., once per day).
- Pulls updated data files from S3 and updates the database accordingly.
- Requires connectivity to the database and S3.
- Note: The updates are designed to be applied as a "hot swap" with minimal downtime, but using a DB like Aurora with read replicas provides additional safety.
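The connectivity requirements listed for each role can be summarized in a small lookup, which may help when writing security-group rules or preflight checks. This is a minimal sketch: the function name `required_dependencies` and the dependency labels are illustrative and are not part of the Docker image.

```python
# Services each role needs to reach, as listed in the role descriptions above.
ROLE_DEPENDENCIES = {
    "API": {"database", "redis", "rabbitmq"},
    "WORKER": {"redis", "rabbitmq"},
    "MIGRATE_DB": {"database"},
    "UPDATE_CRON": {"database", "s3"},
}


def required_dependencies(role: str) -> set[str]:
    """Return the services a container with the given ROLE must be able to reach."""
    if role not in ROLE_DEPENDENCIES:
        raise ValueError(
            f"Unknown ROLE {role!r}; expected one of {sorted(ROLE_DEPENDENCIES)}"
        )
    return ROLE_DEPENDENCIES[role]


if __name__ == "__main__":
    for role in ROLE_DEPENDENCIES:
        print(role, "->", ", ".join(sorted(required_dependencies(role))))
```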
Environment Variables
The application expects a number of environment variables to be set. Many of these will be pre-populated by your account rep, and many RabbitMQ configuration parameters are handled automatically by the Docker image. The variables you are expected to populate are:
| Variable Name | Default | Description |
|---|---|---|
| ROLE | | Role the container runs as: API (HTTP API service), WORKER (extraction job processor), MIGRATE_DB (apply DB migrations), UPDATE_CRON (check for and apply OTA updates) |
| REDIS_HOST | | Redis server hostname |
| REDIS_PORT | 6379 | Redis server port |
| REDIS_USER | | Redis username (optional) |
| REDIS_PASSWORD | | Redis password (optional) |
| REDIS_TLS | false | Whether to use TLS to connect to Redis |
| REDIS_DB | 0 | The number of the Redis database to use |
| DB_HOST | | PostgreSQL server hostname |
| DB_PORT | 5432 | PostgreSQL server port |
| DB_NAME | | PostgreSQL database name |
| DB_USERNAME | | PostgreSQL username |
| DB_PASSWORD | | PostgreSQL password |
| DB_SSL | false | Controls whether the app's PostgreSQL connection uses SSL/TLS encryption |
| AWS_REGION | us-east-1 | AWS region for S3 and other services |
| DB_UPDATE_ROLE_ARN | | The IAM role ARN for accessing the S3 bucket which will contain OTA DB updates |
| DB_UPDATE_S3_BUCKET | | The name of the S3 bucket which will contain OTA DB updates |
| SCRAPE_WORKER_TIMEOUT_MS | 60000 | Timeout in milliseconds for each template to be processed |
| CPP_THREAD_POOL_SIZE | 50 | Maximum number of worker threads for handling concurrent requests |
| CPP_RETRY_MAX_ATTEMPTS | 10 | Maximum number of retry attempts before a request fails |
| RABBITMQ_MESSAGE_EXPIRE_MS | 60000 | Time in milliseconds until an unacknowledged message expires in a queue. This should have the same value as SCRAPE_WORKER_TIMEOUT_MS |
| RABBITMQ_TEMPLATE_BATCH | 5 | The number of templates processed per batch. Increasing this requires more memory for each worker. We estimate that each template in a batch costs ~300 MB of memory |
| RABBITMQ_URL | amqp://localhost:5672 | RabbitMQ connection URL (use amqps:// for TLS) |
| NODE_MEMORY | 1900 | Sets the max-old-space-size value (in MiB) passed via NODE_OPTIONS |
| SCRAPE_QUEUE | scrape_queue | Name of the queue where workers listen for templates to process |
| HTTPS | false | Enables HTTPS support when set to "true"; requires CERT_CN |
| CERT_CN | | The Common Name for the TLS certificate that the app will auto-generate |
| ENABLE_SCAN_API | false | Allows the extraction API to send receipts to the OCR scanning service for specific merchants |
| OCR_SCAN_URL_ON_PREM | | Base URL of the OCR scanning service |
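A few of the constraints in the table are cross-field (the four allowed ROLE values, the matching timeout pair, and HTTPS requiring CERT_CN), so it may be worth checking them before starting a container. The following is a minimal sketch of such a preflight check; `check_env` is a hypothetical helper, not something shipped with the image, and it only covers constraints stated in the table.

```python
import os

VALID_ROLES = {"API", "WORKER", "MIGRATE_DB", "UPDATE_CRON"}


def check_env(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems found in the given environment."""
    problems = []

    role = env.get("ROLE")
    if role not in VALID_ROLES:
        problems.append(f"ROLE must be one of {sorted(VALID_ROLES)}, got {role!r}")

    # Fall back to the defaults listed in the table when a variable is unset.
    worker_timeout = env.get("SCRAPE_WORKER_TIMEOUT_MS", "60000").strip()
    message_expiry = env.get("RABBITMQ_MESSAGE_EXPIRE_MS", "60000").strip()
    if worker_timeout != message_expiry:
        problems.append(
            "RABBITMQ_MESSAGE_EXPIRE_MS should equal SCRAPE_WORKER_TIMEOUT_MS "
            f"({message_expiry} != {worker_timeout})"
        )

    if env.get("HTTPS", "false") == "true" and not env.get("CERT_CN"):
        problems.append("HTTPS is enabled but CERT_CN is not set")

    return problems


if __name__ == "__main__":
    for problem in check_env(dict(os.environ)):
        print("WARNING:", problem)
```

Passing the environment in as a dictionary keeps the check easy to run against a parsed `.env.client` file as well as the live process environment.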
AmazonMQ for RabbitMQ Setup
To set up AmazonMQ for RabbitMQ, follow these steps:
1. Create AmazonMQ Broker
- Navigate to AmazonMQ Console
  - Go to the AmazonMQ console in your AWS account
  - Click "Create broker"
- Configure Broker Settings
  - Engine type: RabbitMQ
  - Engine version: 3.11.x or later (recommended)
  - Deployment mode:
    - Single-instance for development/testing
    - Cluster for production (provides high availability)
  - Instance type: mq.t3.micro (minimum) or larger based on your throughput requirements
- Configuration
  - Broker name: Choose a descriptive name (e.g., ereceipt-extraction-broker)
  - Username and Password: Create credentials for the broker (you'll use these in RABBITMQ_URL)
- Connectivity
  - Virtual Private Cloud (VPC): Select the same VPC where your containers will run
  - Subnet(s): Choose appropriate subnets
  - Security groups: Create or select security groups that allow:
    - Inbound access on port 5671 (AMQP with TLS) or 5672 (AMQP without TLS)
    - Access from your container security groups
2. Configure Environment Variables
Once your AmazonMQ broker is created, you'll get an endpoint URL. Configure your environment variables:
# For TLS connection (recommended for production)
RABBITMQ_URL=amqps://username:password@your-broker-id.mq.us-east-1.amazonaws.com:5671
# For non-TLS connection (development only)
RABBITMQ_URL=amqp://username:password@your-broker-id.mq.us-east-1.amazonaws.com:5672
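Broker passwords often contain characters such as `@`, `:` or `/`, which would break the URL if pasted in verbatim. A minimal sketch of building the value with percent-encoded credentials follows, assuming the application's AMQP client parses RABBITMQ_URL as a standard URI. The helper name `build_rabbitmq_url` and the sample credentials are illustrative only.

```python
from urllib.parse import quote


def build_rabbitmq_url(username: str, password: str, host: str, tls: bool = True) -> str:
    """Build an AMQP URL, percent-encoding the credentials."""
    scheme, port = ("amqps", 5671) if tls else ("amqp", 5672)
    # safe="" makes quote() encode "/" as well, so it cannot be read as a path separator.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


if __name__ == "__main__":
    # Placeholder values; substitute your own broker endpoint and credentials.
    url = build_rabbitmq_url(
        "username", "p@ss/word", "your-broker-id.mq.us-east-1.amazonaws.com"
    )
    print(f"RABBITMQ_URL={url}")
```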
3. Security Group Configuration
Ensure your security groups allow:
- Outbound from your container security group to AmazonMQ security group on port 5671/5672
- Inbound to AmazonMQ security group from your container security group on port 5671/5672
4. Network Connectivity
- If using public subnets: Ensure your AmazonMQ broker has public access enabled
- If using private subnets: Ensure proper routing between your container subnets and AmazonMQ subnets
- For cross-AZ deployment: Consider placing your broker in multiple availability zones for resilience
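Security-group and routing problems usually show up as connection timeouts, so a plain TCP check from inside the container network can confirm the broker is reachable before the application starts. This is a rough sketch using only the Python standard library. It tests TCP reachability only, not TLS or AMQP authentication, and the helper names are hypothetical.

```python
import os
import socket
from urllib.parse import urlsplit

# Usual ports for each scheme, matching the security group rules above.
DEFAULT_PORTS = {"amqps": 5671, "amqp": 5672}


def broker_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from an AMQP URL, falling back to the scheme's usual port."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"Not a usable AMQP URL: scheme={parts.scheme!r}")
    return parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    host, port = broker_endpoint(os.environ.get("RABBITMQ_URL", "amqp://localhost:5672"))
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```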
Running a container
Quick Start
- Download this Docker Compose file
- Make sure your .env.client is in your project folder and all vars are populated
- Decide how you will authenticate to AWS and set env vars / modify docker-compose-ereceipts.yml accordingly:
  - Currently the docker compose file mounts your local ~/.aws folder into the relevant containers so that they can authenticate the same way you do locally
  - It also sets the AWS_PROFILE env var which you should delete if you want it to use your default config, or override with a profile value of your choice
  - If you prefer to authenticate using access token + secret key, then you can remove the ~/.aws mounts from all containers and instead set the appropriate env vars
- Make sure to set your db-related env vars according to whatever values (user, pass, db name) you pass into the postgres container in docker-compose-ereceipts.yml
This should serve as a blueprint for how the service is orchestrated in production.
Testing the API
Once the containers are running, you can make requests using http[s]://localhost:4001 as the base URL. You can find the request and response structures in our API Spec.
Logging
Logs are written to stdout and can be collected or shipped as needed.
Production Deployment
Seeding DB
Make sure to run the DB Migration Script and the OTA Update Cron for each environment upon deployment, so that the DB has the structure and data the app needs to perform extraction.
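Outside of an orchestrator, the two seeding roles can be run as one-off containers. The sketch below builds the corresponding `docker run` commands. It assumes migrations should run before the OTA update (so the tables exist first), and the image reference is a hypothetical placeholder to be replaced with the image supplied for your account. The UPDATE_CRON container also needs AWS credentials, which are not shown here.

```python
import subprocess

# Hypothetical image reference; use the image name and tag supplied for your account.
IMAGE = "your-registry/ereceipts-self-hosted:latest"

# Assumed order: create or migrate tables first, then load the OTA data.
SEED_ORDER = ("MIGRATE_DB", "UPDATE_CRON")


def one_off_command(role: str, env_file: str = ".env.client", image: str = IMAGE) -> list[str]:
    """Build a `docker run` command that runs one role and removes the container afterwards."""
    return ["docker", "run", "--rm", "--env-file", env_file, "-e", f"ROLE={role}", image]


def seed_database(run=subprocess.run) -> None:
    """Run each seeding role in order, stopping at the first failure."""
    for role in SEED_ORDER:
        run(one_off_command(role), check=True)


if __name__ == "__main__":
    # Print the commands rather than executing them; call seed_database() to run them.
    for role in SEED_ORDER:
        print(" ".join(one_off_command(role)))
```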
Optimizing Performance
For optimal performance in a Kubernetes or ECS deployment, we recommend starting with the following configuration:
API
- vCPUs: 1
- RAM: 1 GiB
- Replicas: 6
Workers
- vCPUs: 1.5
- RAM: 2.5 GiB
- Replicas: 50
Worker Env Vars
- RABBITMQ_TEMPLATE_BATCH: 5
- NODE_MEMORY: 1900
Expected throughput is ~1 rpm per worker, and average latency is ~5s. Scaling horizontally will improve throughput while maintaining the same latency characteristics.
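These figures can be turned into a rough sizing estimate. The sketch below assumes that "rpm" means requests per minute, that throughput scales linearly with worker count, and that the ~300 MB per template figure from the environment variable table holds. The function names are illustrative.

```python
import math

RPM_PER_WORKER = 1.0            # "~1 rpm per worker" from the text above
MB_PER_TEMPLATE_IN_BATCH = 300  # "~300 MB" per template in a batch, from the env var table


def workers_needed(target_rpm: float, rpm_per_worker: float = RPM_PER_WORKER) -> int:
    """Estimate the number of worker replicas for a target request rate."""
    return max(1, math.ceil(target_rpm / rpm_per_worker))


def batch_memory_mb(template_batch: int = 5) -> int:
    """Estimate the memory one worker needs for a batch of templates."""
    return template_batch * MB_PER_TEMPLATE_IN_BATCH


if __name__ == "__main__":
    print("workers for 50 rpm:", workers_needed(50))      # 50
    print("batch memory (MB):", batch_memory_mb(5))       # 1500
```

With the defaults, a batch of 5 templates comes to roughly 1500 MB, which fits under the recommended NODE_MEMORY of 1900 and the 2.5 GiB of worker RAM. Raising RABBITMQ_TEMPLATE_BATCH would call for raising both.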
Readiness & Health Checks
For orchestrators such as Kubernetes that support readiness and liveness probes, these are available at GET /readyz and GET /healthz respectively. Both endpoints return status code 200 on success and 500 otherwise.
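Where no orchestrator is polling these endpoints (for example, when running the Docker Compose setup locally), the same checks can be made by hand. This is a minimal sketch using the Python standard library; `probe` is a hypothetical helper and the base URL is the local default mentioned under Testing the API.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 3.0) -> bool:
    """Return True if GET base_url + path answers with HTTP 200."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP 500 responses (raised as HTTPError), refused connections, and timeouts.
        return False


if __name__ == "__main__":
    base = "http://localhost:4001"
    print("ready:", probe(base, "/readyz"))
    print("healthy:", probe(base, "/healthz"))
```

If HTTPS is enabled with the auto-generated certificate, default certificate verification will likely reject the connection and the probe will report False. In that case the client would need to be configured to trust that certificate.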
Debugging
To debug data quality issues (e.g., wrong or missing fields), it is most helpful to provide us with the blinkReceiptId associated with the request, as well as the email that was passed in, so that we can attempt to reproduce the issue.