Originally Published October 20th, 2021

Digital Ocean App Platform is pretty neat. I've been using it for my own application, MailFolio, and love how it combines the "set-it-and-forget-it" nature of tools like Render or Heroku while still being a fairly normal server, so I can add additional boxes to ssh in, run cron jobs, etc.

But it's still really buggy. Deploying on it was fraught with annoyances, a lot of which appear to be undocumented. I was using Docker containers and figured if they worked locally they would work on Digital Ocean App with no issue: this was not the case!

This post exists to explain common issues I've noticed and recount some of my adventures.

Gunicorn says what?

Open up Django, set up a project with django-admin startproject, create a simple container running gunicorn and bam! It won't run:

[2021-09-21:28:42] [2021-09-21:28:42 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2021-09-21:28:42] [2021-09-21:28:42 +0000] [1] [INFO] Listening at: (1)
[2021-09-21:28:42] [2021-09-21:28:42 +0000] [1] [INFO] Using worker: sync
[2021-09-21:28:42] [2021-09-21:28:42 +0000] [3] [INFO] Booting worker with pid: 3
[2021-09-21:28:42] [2021-09-21:28:42 +0000] [4] [INFO] Booting worker with pid: 4
[2021-09-21:28:47] [2021-09-21:28:47 +0000] [4] [ERROR] Exception in worker process
[2021-09-21:28:47] Traceback (most recent call last):
[2021-09-21:28:47]   File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
[2021-09-21:28:47]     worker.init_process()
[2021-09-21:28:47]   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 142, in init_process
[2021-09-21:28:47]     self.run()
[2021-09-21:28:47]   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 125, in run
[2021-09-21:28:47]     self.run_for_one(timeout)
[2021-09-21:28:47]   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 62, in run_for_one
[2021-09-21:28:47]     self.notify()
[2021-09-21:28:47]   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 75, in notify
[2021-09-21:28:47]     self.tmp.notify()
[2021-09-21:28:47]   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/workertmp.py", line 46, in notify
[2021-09-21:28:47]     os.fchmod(self._tmp.fileno(), self.spinner)
[2021-09-21:28:47] PermissionError: [Errno 1] Operation not permitted
[2021-09-21:28:47] [2021-09-21:28:47 +0000] [4] [INFO] Worker exiting (pid: 4)
[2021-09-21:28:47] [2021-09-21:28:47 +0000] [3] [ERROR] Exception in worker process
[2021-09-21:28:47] Traceback (most recent call last):
[2021-09-21:28:47]   File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker

Googling for this reveals very slim results. And just to be clear, I'm not the only one with issues on Digital Ocean here. It's a particularly frustrating issue because Digital Ocean Apps will register a successful deploy, but the gunicorn workers keep dying and reviving as they attempt an operation that is apparently not permitted.

What managed to fix this is:

gunicorn --worker-tmp-dir /dev/shm --workers $NWORKERS --bind :$PORT mailfolio.wsgi

But I'm not 100% sure why this works. I think Gunicorn is trying to write to a protected area, as this is kind of hinted at in the documentation for Docker containers, and adding an extra flag of --worker-tmp-dir /dev/shm fixes it. I found the solution in a long GitHub conversation here.
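To check whether a given directory would trigger the same failure, here is a minimal diagnostic sketch. It only approximates what the traceback shows (an os.fchmod call on a temporary file); the function name can_fchmod and the list of candidate directories are my own for illustration, not part of Gunicorn.

```python
import os
import tempfile


def can_fchmod(directory: str) -> bool:
    """Return True if os.fchmod succeeds on a temp file created in `directory`.

    Hypothetical helper: it mimics the call at the bottom of the traceback,
    os.fchmod(self._tmp.fileno(), ...), without involving Gunicorn at all.
    """
    try:
        with tempfile.TemporaryFile(dir=directory) as tmp:
            os.fchmod(tmp.fileno(), 0o600)
        return True
    except OSError:
        # PermissionError and FileNotFoundError are both subclasses of OSError.
        return False


if __name__ == "__main__":
    # Compare the default temp directory with the memory-backed /dev/shm.
    for candidate in (tempfile.gettempdir(), "/dev/shm"):
        print(candidate, can_fchmod(candidate))
```

Running this inside the deployed container (for example from the console) should show which directories are safe to pass to --worker-tmp-dir, assuming the failure really is tied to the directory's filesystem.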

Environment Variables are Mostly Undocumented(?)

Environment variables are not completely documented so far as I can tell.

In the screenshot below, you'll notice some of the environment variables I include for my dev instance of MailFolio:

[Screenshot: Digital Ocean environment variables]

I'm thankful they automatically match the database URL. The problem is that a very important environment variable is not listed:

Namely, PORT

If you look in their documentation, they sort of imply it's used in the Database URL.

The only reason I found out about this is because it's cited in their Docker container example.

Therefore, I include it in my gunicorn line (also above) when binding to a port. Otherwise, it would have to be hardcoded, which is very bad practice:

gunicorn --worker-tmp-dir /dev/shm --workers $NWORKERS --bind :$PORT mailfolio.wsgi
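The same flags can also live in a Gunicorn config file, which is plain Python. This is a sketch rather than what I actually deploy: the fallback values ("8000" and "2") are assumptions for running locally, and NWORKERS is simply the variable name from the command above.

```python
# gunicorn.conf.py (sketch)
import os

# PORT is provided by App Platform at runtime; "8000" is an assumed local fallback.
port = os.environ.get("PORT", "8000")

# Equivalent to --bind :$PORT
bind = f":{port}"

# Equivalent to --workers $NWORKERS; "2" is an assumed default.
workers = int(os.environ.get("NWORKERS", "2"))

# Equivalent to --worker-tmp-dir /dev/shm
worker_tmp_dir = "/dev/shm"
```

It would then be started with something like gunicorn -c gunicorn.conf.py mailfolio.wsgi. Names Gunicorn doesn't recognize as settings, like port here, are ignored.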

Deploys That Time Out Will Have No Logs

I don't have any screenshots from when these happened, but it was not unusual for a timeout during the build stage to result in empty logs.

Small background on this: Digital Ocean App Platform will let you connect a GitLab repo for an automatic CI/CD setup. This is very similar to how Heroku works. In my case, I use Docker to build a container from a git repo.

These empty logs led to a lot of hair-being-pulled-out as I stared at my computer, wondering "Why did my Digital Ocean App Platform build fail with no logs available?"

This was compounded by how randomly it occurred, making it hard to debug the cause-and-effect chain.

These errors disappeared almost entirely when I started building the Docker container locally.

I think there's a "noisy neighbor" effect going on here: using the "Basic" tier meant a timeout every time, but upgrading to a "Pro" tier meant they happened a lot less frequently. These VPSes are sitting on shared servers, and high workloads on VPSes that share the same physical server may be causing it to slow down and time out.

But logs should still be visible.

Digital Ocean Spaces and ACLs Don't Play Nice with Django

I use Spaces with django-storages to serve static and media assets. It's basically an S3 clone.

But it's an S3 clone with a key difference: different ACLs (access control lists).

Amazon S3 gives very fine-grained control over who can access assets and how.

[Screenshot: Permissions in AWS S3]

Digital Ocean Spaces is either public or private.

[Screenshot: Permissions in Digital Ocean Spaces]

This means S3 web apps that rely on querystrings checked against the access control list to control what they're accessing, when it expires, and under what circumstances (e.g. only allowing your app's frontend to access S3 to prevent abuse) will behave inconsistently. That includes django-storages. I eventually removed the querystring auth in my settings and figure I'll cross that bridge when I come to it.

# settings.py
# ...
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_DEFAULT_ACL = "public-read"
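For context, a fuller version of these settings might look like the sketch below. The AWS_* names are real django-storages settings, but the environment variable names (SPACES_KEY, SPACES_SECRET, SPACES_BUCKET, SPACES_REGION) and the nyc3 default region are assumptions for illustration, not my actual configuration. AWS_QUERYSTRING_AUTH = False is the django-storages setting that turns off signed querystring URLs.

```python
# settings.py (sketch, assuming credentials come from environment variables)
import os

DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Hypothetical environment variable names; use whatever your app defines.
AWS_ACCESS_KEY_ID = os.environ.get("SPACES_KEY", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("SPACES_SECRET", "")
AWS_STORAGE_BUCKET_NAME = os.environ.get("SPACES_BUCKET", "")

# Point the S3 client at Spaces instead of AWS; "nyc3" is an assumed region.
AWS_S3_REGION_NAME = os.environ.get("SPACES_REGION", "nyc3")
AWS_S3_ENDPOINT_URL = f"https://{AWS_S3_REGION_NAME}.digitaloceanspaces.com"

# Everything is uploaded as publicly readable...
AWS_DEFAULT_ACL = "public-read"
# ...so generated URLs don't need signed querystring parameters.
AWS_QUERYSTRING_AUTH = False
```

With querystring auth off, the generated URLs are plain public links, which only makes sense because the ACL is public-read.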

This may be because my setup bundles and uploads the CSS, images, etc. at the build stage instead of while the server is running but, as I explained before, the timeouts made this an untenable solution.

As a side-note: I'm not sure how using storage affects the egress costs as they're somewhat low for Digital Ocean App Platform. C'est la vie.

Do you recommend using Digital Ocean App?

I do!

Despite my annoyances, it's really straightforward to use. Simply point to a container and it will (mostly) run. It's a VPS like any other, so you can ssh in with the handy console and do any checks or changes as needed. All the resources are self-contained in the app, so you're not chasing down hanging resources trying to clean them up.

But keep in mind these gotchas ;)