Self Hosted API
Introduction
Welcome to the BlinkReceipt Self Hosted API. This API allows you to take advantage of the full power of our Scan API without any PII (receipt images or receipt full text) leaving your infrastructure.
Prerequisites
- Install Docker
- Contact your account representative for a starter Docker environment file that will be pre-filled with credentials specific to your account
Environment Variables
The Docker image expects a number of environment variables to be set. Many of these will be pre-populated by your account rep; the rest are summarized here:
| Variable Name | Default | Description |
|---|---|---|
| SCAN_APP_TIMEOUT_MS | 10000 | Timeout in milliseconds for each scan request |
| TMP_RESULTS_DIR | ./tmp_results | Where to store intermediate scan results between frames (absolute path) |
| ENABLE_APM | 1 | Whether to enable New Relic monitoring |
| NEW_RELIC_APP_NAME | | The name of the application within New Relic |
| NEW_RELIC_LICENSE_KEY | | New Relic API key |
| ENABLE_ES_LOGGING | 1 | Whether to enable logging to an ElasticSearch (or OpenSearch) instance |
| LOGGING_HOST | https://analytics.blinkreceipt.com:19202 | Host for an ElasticSearch instance to ship logs to |
| LOGGING_USER | | ElasticSearch user |
| LOGGING_PASS | | ElasticSearch password |
| LOGGING_INDEX | | ElasticSearch index to write logs to |
| ENABLE_RESOURCE_UPDATES | 1 | Whether to enable OTA updates to merchant detection resources |
| QUEUE_CONCURRENCY | 1 | The concurrency of the request queue - recommended to be set equal to the number of vCPUs in the pod |
| ENABLE_HTTPS | 0 | Enables HTTPS support when set to 1, requires CERT_CN |
| CERT_CN | | The Common Name for the TLS certificate that the app will auto-generate |
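A few of the constraints in the table can be checked before starting the container. The following is a minimal sketch, not part of the product: `check_env` is a hypothetical helper, and the rule that `SCAN_APP_TIMEOUT_MS` and `QUEUE_CONCURRENCY` must be positive integers is an assumption inferred from their descriptions rather than a documented requirement.

```python
import posixpath
from typing import Mapping


def check_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems found in the settings."""
    problems = []

    # ENABLE_HTTPS=1 requires CERT_CN (stated in the table).
    if env.get("ENABLE_HTTPS", "0") == "1" and not env.get("CERT_CN"):
        problems.append("ENABLE_HTTPS=1 requires CERT_CN to be set")

    # TMP_RESULTS_DIR is described as an absolute path. Only an explicit
    # override is checked; paths inside the container are POSIX paths.
    tmp_dir = env.get("TMP_RESULTS_DIR")
    if tmp_dir is not None and not posixpath.isabs(tmp_dir):
        problems.append("TMP_RESULTS_DIR should be an absolute path")

    # Assumption: these numeric settings must be positive integers.
    for name in ("SCAN_APP_TIMEOUT_MS", "QUEUE_CONCURRENCY"):
        value = env.get(name)
        if value is not None and not (value.isdecimal() and int(value) > 0):
            problems.append(f"{name} must be a positive integer, got {value!r}")

    return problems
```

Calling `check_env(os.environ)` (or passing a dictionary parsed from your environment file) returns an empty list when none of these checks fail.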
Running a container
The quickest way to get started is to download this Docker Compose file, make sure your .env.client is in your project folder and is populated, and run docker-compose up.
Alternatively you can run via Docker as follows:
docker run --rm -d \
--env-file .env.client \
-p 3000:3000 \
--name blinkreceipt_api \
public.ecr.aws/microblink/blinkreceipt-self-hosted-api:v1.0.11
Testing the API
Once the container is running, you should be able to test it by making requests just like you would for the regular Scan API, but with http[s]://localhost:3000 as the domain. You can find the request structure in Making Requests and the response structure in Response Structure. Note that you will not need to supply an api-key header for the Self Hosted API as it is assumed that requests will be made server-to-server within your infrastructure and the API will not be exposed publicly.
The self hosted API currently only supports scanning US receipts. Passing in a different country code will result in an error.
Logging and Monitoring
Logging
Logs are always written to stdout and can optionally be shipped to an ElasticSearch-compatible instance. We provide such an instance by default, but you are free to provide your own. These logs are guaranteed not to contain any receipt data; they record information about the scan request, timing, error conditions, and the like. They can help diagnose problems in your deployment and troubleshoot individual receipt scans (by providing us with the blinkReceiptId for a given scan).
Monitoring
By default, monitoring via our APM (New Relic) is enabled. It logs transaction information that allows us to view metrics such as throughput, latency, and error rate. No receipt data is shipped to New Relic. You can disable this monitoring via the ENABLE_APM environment variable, or provide your own New Relic API key to record transaction data to your own New Relic organization.
Production Deployment
Optimizing Performance
For optimal performance in a Kubernetes deployment, we recommend the following resource allocation:
- vCPUs: ${QUEUE_CONCURRENCY} + 0.5
- RAM: ${QUEUE_CONCURRENCY} * 0.5 GiB
Expected throughput is ~65 requests per minute (rpm) per vCPU, and average latency is ~2s (higher when scanning image URLs, due to download time). Scaling horizontally will improve throughput while maintaining the same latency characteristics.
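The sizing guidance above can be turned into a rough capacity estimate. This is a back-of-the-envelope sketch only: `pod_resources` and `pods_needed` are hypothetical helper names, and it assumes that each pod sustains about 65 requests per minute per unit of QUEUE_CONCURRENCY, which you should confirm against your own load tests.

```python
import math

RPM_PER_VCPU = 65  # approximate throughput figure quoted in this section


def pod_resources(queue_concurrency: int) -> tuple[float, float]:
    """Return (vCPUs, RAM in GiB) for one pod, using the recommended formulas."""
    return queue_concurrency + 0.5, queue_concurrency * 0.5


def pods_needed(target_rpm: float, queue_concurrency: int) -> int:
    """Estimate how many pods are needed to sustain target_rpm."""
    # Assumption: throughput scales linearly with QUEUE_CONCURRENCY.
    per_pod_rpm = RPM_PER_VCPU * queue_concurrency
    return math.ceil(target_rpm / per_pod_rpm)
```

For example, with QUEUE_CONCURRENCY set to 4, `pod_resources(4)` gives 4.5 vCPUs and 2.0 GiB per pod, and under the stated assumption `pods_needed(1000, 4)` suggests 4 pods for a target of 1000 requests per minute.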
Intermediate Storage
If you plan to use the /scan_image endpoint for multi-frame receipt images (as opposed to /scan_image_urls in which you can pass multiple frames in a single request), you'll need shared storage (e.g. an NFS drive) outside the API containers. This ensures that if container A processes frame 1 and container B processes frame 2 of the same receipt, container B can access the intermediate results from container A. The shared storage should be mounted to each container as a volume and referenced via the TMP_RESULTS_DIR environment variable as an absolute path.
For this use case, we recommend setting up a cron job to clear the temporary results folder periodically. We use an interval of 5 minutes.
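One possible shape for such a cleanup job is sketched below. It is an illustration rather than the job we run: `clear_stale_results` is a hypothetical helper, the layout of files inside the results folder is not assumed beyond "regular files, possibly in subfolders", and removing only files older than the interval (instead of emptying the folder outright) is a conservative choice made here so that scans still in progress keep their intermediate results.

```python
import os
import time


def clear_stale_results(results_dir: str, max_age_s: float = 300.0) -> int:
    """Delete files under results_dir not modified in the last max_age_s seconds.

    Returns the number of files removed. Directories are left in place.
    """
    cutoff = time.time() - max_age_s
    removed = 0
    for root, _dirs, files in os.walk(results_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except FileNotFoundError:
                # Another job or container may have removed the file already.
                pass
    return removed
```

A cron entry could run a small script that calls `clear_stale_results` with the same path the containers receive in TMP_RESULTS_DIR.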
Readiness & Health Checks
For orchestrators such as Kubernetes that support readiness and liveness probes, these are available at GET /readyz and GET /healthz respectively. These endpoints return status code 200 for success and 500 otherwise.
Note that a significant amount of data is loaded upon app startup, so we recommend allowing up to 30 seconds in the readiness probe configuration.
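Outside an orchestrator (for example in a local smoke test or a CI job), the same readiness endpoint can be polled by hand. The sketch below assumes the plain-HTTP setup on localhost:3000 from the docker run example; `http_status` and `wait_until_ready` are hypothetical helpers, and the one-second polling interval is an arbitrary choice.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 2.0) -> int:
    """Return the HTTP status code for GET url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 while the app is still loading data
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def wait_until_ready(
    url: str = "http://localhost:3000/readyz",
    max_wait_s: float = 30.0,
    interval_s: float = 1.0,
    probe: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until it returns 200 or max_wait_s elapses."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe(url) == 200:
            return True
        # Stop if the next attempt would start after the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

`wait_until_ready()` returns True as soon as GET /readyz answers with 200, and False if that has not happened within the 30-second allowance mentioned above. The `probe` parameter exists only so the polling logic can be exercised without a running container.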
Debugging
If scan requests are failing with error codes, you can refer to the Error Handling reference for details.
To debug data quality issues (i.e. wrong or missing fields), it is most helpful to provide us with the blinkReceiptId associated with the scan request, as well as the image(s) that were passed in, so that we can attempt to reproduce the issue.