
Docker Swarm Restoration on geocodes-aws.earthcube.org

Date: 2026-02-18

In February 2026, the main AWS server drive filled up and corrupted the Docker database. The server had to be restored in pieces. This page records the steps taken.

Issue

Docker Swarm failed to start because its Raft WAL (write-ahead log) was corrupted:

Error: manager stopped: can't initialize raft node: irreparable WAL error: wal: max entry size limit exceeded
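Since the root cause was a full drive, it may be worth confirming that space has been freed before attempting the recovery. The following is a minimal sketch, not part of the original procedure: the helper name `disk_use_percent` is hypothetical, and the paths and the use of systemd's `journalctl` are assumptions about this host.

```shell
# Sketch: print how full the filesystem holding a path is, as a bare number.
# `df -P` forces the POSIX output format so the column positions are stable.
disk_use_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Example usage on the server (paths and unit name are assumptions):
#   disk_use_percent /var/lib/docker
#   sudo journalctl -u docker.service --no-pager | grep -i 'wal'
```

If the reported percentage is still close to 100, a freshly initialized swarm is likely to hit the same problem again.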

Resolution Steps

1. SSH into the server

ssh -i ~/.ssh/earthcube.pem earthcube@geocodes-aws.earthcube.org

2. Force leave the corrupted swarm

docker swarm leave --force

3. Initialize a new swarm with the public IP

docker swarm init --advertise-addr 44.227.79.248
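After the init, one way to confirm that the node really is an active manager is to query `docker info`. This is a sketch; the function name `swarm_is_active_manager` is hypothetical and not something the original procedure used.

```shell
# Sketch: succeed only if this node is in an active swarm and can act as a manager.
# LocalNodeState and ControlAvailable are fields of the Swarm section of `docker info`.
swarm_is_active_manager() {
  state=$(docker info --format '{{.Swarm.LocalNodeState}} {{.Swarm.ControlAvailable}}')
  [ "$state" = "active true" ]
}

# Example usage:
#   swarm_is_active_manager && docker node ls
```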

4. Create the overlay network

docker network create \
  --driver overlay \
  portainer_agent_network
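If the procedure has to be re-run, creating a network that already exists fails. A small guard such as the one below makes the step repeatable. It is a sketch under the assumption that a plain overlay network with default options is wanted; `ensure_overlay_network` is a hypothetical helper name.

```shell
# Sketch: create an overlay network only if no network with that name exists yet.
ensure_overlay_network() {
  name=$1
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create --driver overlay "$name"
  fi
}

# Example usage:
#   ensure_overlay_network portainer_agent_network
```

The same helper could be reused for the other networks that the stacks expect.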

5. Deploy the Portainer agent service

docker service create \
  --name portainer_agent \
  --network portainer_agent_network \
  -p 9001:9001/tcp \
  --mode global \
  --constraint 'node.platform.os == linux' \
  --mount type=bind,src=//var/run/docker.sock,dst=/var/run/docker.sock \
  --mount type=bind,src=//var/lib/docker/volumes,dst=/var/lib/docker/volumes \
  --mount type=bind,src=//,dst=/host \
  portainer/agent:2.33.0

6. Verify the service is running

docker service ls
docker service ps portainer_agent
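Rather than re-running the two commands by hand, the check can be polled until the service converges. The sketch below assumes the `Replicas` column has the usual `running/desired` form (for example `1/1`); `replicas_converged` and `wait_for_service` are hypothetical helper names.

```shell
# Sketch: succeed when a "running/desired" string such as "1/1" is fully converged.
replicas_converged() {
  running=${1%%/*}
  desired=${1##*/}
  [ "$desired" -gt 0 ] && [ "$running" -eq "$desired" ]
}

# Sketch: poll `docker service ls` until the named service has all tasks running.
# $1 = service name, $2 = number of attempts (default 30), 2 seconds apart.
wait_for_service() {
  name=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # The name filter matches by prefix, so only the first result is used.
    replicas=$(docker service ls --filter "name=$name" --format '{{.Replicas}}' \
      | awk '{ print $1; exit }')
    if [ -n "$replicas" ] && replicas_converged "$replicas"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}

# Example usage:
#   wait_for_service portainer_agent || docker service ps --no-trunc portainer_agent
```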

Result

  • Swarm reinitialized successfully
  • Portainer agent accessible at 44.227.79.248:9001

Step 2: Rebuild in Scheduler

Install the base and services stacks. You will need to create the networks as listed in the documentation.

Step 3: Clean the Blazegraph data

Stop the services stack.

sudo ls -lh /var/lib/docker/volumes/graph/_data/
sudo rm /var/lib/docker/volumes/graph/_data/blazegraph.jnl
sudo rm /var/lib/docker/volumes/graph/_data/backup.jnl
sudo rm /var/lib/docker/volumes/graph/_data/rules.log

Start the services stack.
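Between stopping the stack and removing the files, it may be worth confirming that no running container still has the volume mounted. The sketch below assumes the volume is named `graph`, as the path `/var/lib/docker/volumes/graph/_data` suggests; `volume_in_use` is a hypothetical helper name.

```shell
# Sketch: succeed if any running container currently mounts the given volume.
volume_in_use() {
  [ -n "$(docker ps --filter "volume=$1" --format '{{.Names}}')" ]
}

# Example usage:
#   if volume_in_use graph; then
#     echo "graph volume is still mounted; stop the services stack first" >&2
#   fi
```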

Step 4: Rebuild in Tenant/Communities Scheduler

Go to the scheduler:

  • Materialize the tenant_create asset.
  • Materialize the tenant_load asset. There may be some errored partitions; those are cruft.
  • At present, the containers are not created automatically. That needs to be done manually.