
What I've been up to

How to follow Mastodon accounts using an RSS feed

I've been really happy with my self-hosted RSS reader, FreshRSS. I've consolidated a lot of the newsletter spam and occasional blog-checking into a more deliberate habit of going to the RSS reader.

This means that I have had less desire to check out what's going on on Mastodon.

I just found out that I can get an RSS feed for Mastodon accounts. So, if I want to follow @charliermarsh, I can just append .rss to the URL of his profile page. (Note: this has to be the page on his original instance.) Charlie's original page is https://hachyderm.io/@charliermarsh, and an RSS feed with all his public posts is available at https://hachyderm.io/@charliermarsh.rss.
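As a quick sanity check from the command line (just a one-liner, nothing specific to Mastodon), you can confirm that the feed exists:

# Should print the feed's <title> element if the RSS feed is there
curl -s https://hachyderm.io/@charliermarsh.rss | grep -m1 "<title>"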

I have started to follow a few Python people for whom I'm not aware of any blog to follow instead. Aside from Charlie Marsh, I also now follow:

I also have an account that I barely use at @KPLauritzen that you can follow.

Via Feditips

Warn about destructive Terraform changes in pull requests

The problem

When automating infrastructure changes through CI/CD pipelines, it can be VERY scary to merge a pull request that changes something in your infrastructure that you are not very familiar with.

Sure, you have tested the change in a test environment. Did you try to make a plan against a dev environment too? Have you tested against all the relevant targets of this change?

Maybe you are reviewing someone else's infrastructure changes. How can you be sure you would catch it if the change actually destroys and recreates all the databases?

I've dealt with too much of this anxiety! AUTOMATE IT AWAY!

With some inspiration from my friend, Lasse Hels, I created this bash script for Azure DevOps Pipelines.

What it does

  • It is assumed to be running as part of a pull request validation pipeline
  • Assuming there is a terraform plan created in the file tfplan, it parses the plan as plaintext and JSON
  • No matter what, the plaintext plan is posted as a comment in the pull request. The comment will be collapsed by default.
  • If it finds any destructive changes in the plan, the comment will have a big scary warning and be marked as "Active". This means someone will have to look at it and resolve it before the pull request can be merged.

The script

#!/bin/bash
set -euo pipefail

# Somehow create your plan as tfplan
PLAN_TEXT=$(terraform show tfplan)
PLAN_JSON=$(terraform show -json tfplan)
HAS_DESTRUCTIVE_CHANGES=$(echo "$PLAN_JSON" | jq -r '.resource_changes[] | select(.change.actions[] | contains("delete"))')

# Conditional alert
DANGER_MESSAGE=""
if [ ! -z "$HAS_DESTRUCTIVE_CHANGES" ]; then
 DANGER_MESSAGE="**DANGER! YOU ARE ABOUT TO DESTROY RESOURCES**"
fi

# Actual comment to be posted
CODE_BLOCK_FENCE='```'
COMMENT=$(cat << EOF
${DANGER_MESSAGE}
<details><summary>Click to expand</summary>

${CODE_BLOCK_FENCE}
${PLAN_TEXT}
${CODE_BLOCK_FENCE}
</details>
EOF
)

# Set comment status to Active for destructive changes, Resolved otherwise
COMMENT_STATUS=2 # Resolved
if [ ! -z "$HAS_DESTRUCTIVE_CHANGES" ]; then
 COMMENT_STATUS=1 # Active
fi

# Build payload for ADO API
JSON_PAYLOAD=$(jq -n \
 --arg content "$COMMENT" \
 --arg status "$COMMENT_STATUS" \
 '{comments: [{content: $content}], status: ($status|tonumber)}'
)

# Call ADO API to make the comment
curl -X POST \
 "$(SYSTEM_COLLECTIONURI)/$(SYSTEM_TEAMPROJECT)/_apis/git/repositories/$(BUILD_REPOSITORY_NAME)/pullrequests/$(SYSTEM_PULLREQUEST_PULLREQUESTID)/threads?api-version=6.0" \
 -H "Authorization: Bearer $(SYSTEM_ACCESSTOKEN)" \
 -H "Content-Type: application/json" \
 -d "$JSON_PAYLOAD"


Test dependency bounds with uv run --resolution

I'm distributing a small Python package at work. A small library with some utilities for doing Machine Learning work. I'm using uv to manage the dependencies and the build process.

Part of my pyproject.toml file looks like this:

[project]
...
requires-python = ">=3.10,<3.14"
dependencies = [
    "pydantic>=2.0,<3",
]

How do I know that my library will work with both pydantic==2.0 and pydantic==2.10 (the current version at the time of writing)? I could just require a much smaller band of possible versions, but I want my library to be useful for as many users as possible. And they might need to use a different version of pydantic for their own projects.

Similarly, I want to make sure my library actually works with the range of allowed Python versions.

I run my tests with uv run pytest. This will use the locked dependencies in the uv.lock file to create a virtual environment and run the tests in that environment.

But, I can use the --resolution flag to test my library with different versions of the dependencies. According to the uv documentation, there are three possible values for the --resolution flag:

  • highest: Resolve the highest compatible version of each package
  • lowest: Resolve the lowest compatible version of each package
  • lowest-direct: Resolve the lowest compatible version of any direct dependencies, and the highest compatible version of any transitive dependencies

I have found that using --resolution lowest is not really that useful, because some transitive dependencies might not specify a version range. Maybe they just require "numpy" without specifying a version. In that case, I will be testing my library against numpy==0.0.1 or whatever the lowest version is. That is not really useful. Instead, I use --resolution lowest-direct to test against the lowest version of the direct dependencies and then just select the highest version of the transitive dependencies.
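To spot-check which versions a given resolution actually picks, I can print the installed version of a direct dependency under each flag (a quick sketch; the pydantic.VERSION attribute is just a convenient example):

uv run --isolated --resolution lowest-direct python -c "import pydantic; print(pydantic.VERSION)"
uv run --isolated --resolution highest python -c "import pydantic; print(pydantic.VERSION)"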

I can also specify the python version to use with the --python flag.

Finally, I can use the --isolated flag to make sure that the tests are run in an isolated virtual environment, not affecting the active venv of my workspace.

Here is the entry in my justfile that runs the tests with different dependency resolutions:

justfile
test_dependency_bounds:
    uv run --python 3.10 --resolution lowest-direct --isolated pytest
    uv run --python 3.13 --resolution highest --isolated pytest

Homelab: Committing my secrets to git

I spent some time tonight setting up and configuring some services on my home kubernetes cluster. See this post for more details on how I set up the cluster. So far it's been a fun experiment to see if I can avoid anything spontaneously catching fire. At work there is a full team of experts dedicated to keeping our cluster running smoothly. At home, there is... me.

Today I managed to get FreshRSS and Atuin Sync running.

I've been using Cursor as a guide, generating the YAML files and asking questions about how Kubernetes works. I think I am a decent user of Kubernetes clusters, but a rank novice as an operator of one.

FreshRSS

I want to try to get away from doomscrolling, and being caught in some algorithmically generated news feed. I'll try FreshRSS for a while at least.

To get started I asked Cursor to generate a deployment, giving it a link to the FreshRSS documentation.

I had to go back and forth a few times to understand how to get a URL to resolve on my home network. The kubernetes cluster is running on the host tyr, so I can ping that from my home network on tyr.local.

Initially I wanted to host FreshRSS at rss.tyr.local, but I didn't figure out how to do that. Instead I hosted it at tyr.local/rss and then added Middleware to strip the /rss path before sending the traffic to the Service.

Complete manifest
---
# deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: freshrss
  namespace: freshrss
  labels:
    app: freshrss
spec:
  replicas: 1
  selector:
    matchLabels:
      app: freshrss
  template:
    metadata:
      labels:
        app: freshrss
    spec:
      containers:
        - name: freshrss
          image: freshrss/freshrss:latest
          ports:
            - containerPort: 80
          env:
            - name: TZ
              value: "Europe/Copenhagen"
            - name: CRON_MIN
              value: "13,43"
          volumeMounts:
            - name: data
              mountPath: /var/www/FreshRSS/data
            - name: extensions
              mountPath: /var/www/FreshRSS/extensions
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: freshrss-data
        - name: extensions
          persistentVolumeClaim:
            claimName: freshrss-extensions

---
# ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: freshrss
  namespace: freshrss
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
    traefik.ingress.kubernetes.io/router.middlewares: "freshrss-strip-prefix@kubernetescrd"
    traefik.ingress.kubernetes.io/router.tls: "false"
spec:
  rules:
    - host: tyr.local
      http:
        paths:
          - path: /rss
            pathType: Prefix
            backend:
              service:
                name: freshrss
                port:
                  number: 80

---
# middleware.yaml

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-prefix
  namespace: freshrss
spec:
  stripPrefix:
    prefixes:
      - /rss
    forceSlash: true

---
# namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: freshrss

---
# pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: freshrss-data
  namespace: freshrss
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: freshrss-extensions
  namespace: freshrss
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi

---
# service.yaml

apiVersion: v1
kind: Service
metadata:
  name: freshrss
  namespace: freshrss
spec:
  selector:
    app: freshrss
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP
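Assuming the manifests above are saved as separate files in a freshrss/ directory (the directory name is just my convention), applying and checking them looks roughly like this:

kubectl apply -f freshrss/
kubectl -n freshrss get pods,svc,ingress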

Atuin

Again, I just asked Cursor to generate the Kubernetes manifests after linking to the self-hosting docs.

One issue I had here was that I want to commit the manifests to git, but Atuin requires a password to communicate with the local Postgres database.

secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: atuin-secrets
type: Opaque
stringData:
  ATUIN_DB_USERNAME: atuin
  ATUIN_DB_PASSWORD: seriously-insecure
  ATUIN_HOST: "127.0.0.1"
  ATUIN_PORT: "8888"
  ATUIN_OPEN_REGISTRATION: "true"
  ATUIN_DB_URI: "postgres://atuin:seriously-insecure@postgres/atuin"
immutable: true

Since both services are local and not exposed to the internet, there is probably no harm in committing something like this.

But I had a discussion about this problem with my smart, very Kubernetes-savvy friend, Lasse. So I wanted to find some solution to this that did not commit any deadly sins, such as committing a secret in plaintext.

Again, I asked Cursor for suggestions, prompting it to not make any changes to my files, just list out options.

Here is the full list of suggestions:

List some ways I can add secrets here without leaking it in git
  1. Using Sealed Secrets:

    # Install sealed-secrets controller
    helm install sealed-secrets sealed-secrets/sealed-secrets
    # Encrypt your secrets
    kubeseal < atuin/secrets.yaml > atuin/sealed-secrets.yaml
    

    This lets you safely commit encrypted secrets that only your cluster can decrypt.

  2. Using External Secrets Operator with a secrets manager:

    apiVersion: external-secrets.io/v1beta1
    kind: ExternalSecret
    metadata:
      name: atuin-secrets
      namespace: atuin
    spec:
      refreshInterval: 1h
      secretStoreRef:
        name: vault-backend
        kind: ClusterSecretStore
      target:
        name: atuin-secrets
      data:
        - secretKey: ATUIN_DB_PASSWORD
          remoteRef:
            key: atuin/db-password
    

    This fetches secrets from services like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault.

  3. Using Kustomize with secret generators:

    # kustomization.yaml
    secretGenerator:
      - name: atuin-secrets
        namespace: atuin
        envs:
          - .env # This file is not committed
    
    This generates secrets from local files that aren't committed.

  4. Using environment variables with a CI/CD pipeline:

    apiVersion: v1
    kind: Secret
    metadata:
      name: atuin-secrets
      namespace: atuin
    stringData:
      ATUIN_DB_PASSWORD: ${ATUIN_DB_PASSWORD}
      ATUIN_DB_URI: "postgres://atuin:${ATUIN_DB_PASSWORD}@postgres/atuin"
    
    Your CI/CD system (like GitHub Actions) replaces the variables before applying.

  5. Using SOPS (Secrets OPerationS):

    # Encrypt secrets
    sops -e -i atuin/secrets.yaml
    
    # Decrypt when needed
    sops -d atuin/secrets.yaml | kubectl apply -f -
    
    This encrypts secrets with age, PGP, or cloud KMS keys.

I chose to use Sealed Secrets. It sounded by far the simplest, and I could sort-of understand what was going on:

  • I install the Sealed Secrets helm chart on my cluster (rough install commands below). This can encrypt and decrypt text using some fancy crypto magic. Basically, a private/public key pair is generated and stored as Kubernetes secrets, and these are used to encrypt my secrets.
  • I install kubeseal on my local machine. It can communicate with the SealedSecret controller running in the cluster.
  • I pass a local (uncommitted) kubernetes Secret to kubeseal, it encrypts it and I get back a SealedSecret.
  • I can then store, apply and commit this SealedSecret. It will get unsealed when applied to my cluster (so my services can use it), but the unsealing only happens inside the cluster. My local manifest file is encrypted.
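Roughly, the installation steps from the first two bullets look like this (a sketch based on the Sealed Secrets docs; adjust the namespace and install method to your setup):

# Install the controller into the cluster
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system

# Install the kubeseal CLI locally (e.g. via Homebrew)
brew install kubeseal

# If the controller name/namespace differ from kubeseal's defaults,
# point kubeseal at them explicitly:
# kubeseal --controller-name sealed-secrets --controller-namespace kube-system < secrets.yaml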

Let's say I want to encrypt this Secret

secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: atuin-secrets
  namespace: atuin
type: Opaque
stringData:
  ATUIN_DB_USERNAME: atuin
  ATUIN_DB_PASSWORD: "123"
  ATUIN_DB_URI: "postgres://atuin:123@postgres/atuin" # Match the username and password here

I can run kubeseal to encrypt:

kubeseal < secrets.yaml > sealed-secrets.yaml

and I get back

sealed-secrets.yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  creationTimestamp: null
  name: atuin-secrets
  namespace: atuin
spec:
  encryptedData:
    ATUIN_DB_PASSWORD: AgBKfphBarMiNX8CIsvjAXqEtRp/Bq+a4y67k/M6bxMm1w/[TRUNCATED FOR SPACE]
    ATUIN_DB_URI: AgCfm2AisGVBlMrOqPvMWOor0e0UXDruZnWVG3klrfSzbtZfrzYF4x[TRUNCATED FOR SPACE]
    ATUIN_DB_USERNAME: AgAt8yDkKRjmvJtB4ecxOOcuEm1Zcoa8pX1UvtvwAAT4M18PN3JK[TRUNCATED FOR SPACE]
  template:
    metadata:
      creationTimestamp: null
      name: atuin-secrets
      namespace: atuin
    type: Opaque

Pretty cool! I have also backed up the Sealed Secrets private key in my 1Password.

kubectl get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key -o yaml > sealed-secrets-master.key

If my cluster suddenly catches fire, I can recreate my deployments in a new cluster by adding the key to that cluster

kubectl apply -f sealed-secrets-master.key
kubectl delete pod -n kube-system -l name=sealed-secrets-controller

Here is the complete manifest

Complete manifest
# config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: atuin-config
  namespace: atuin
data:
  ATUIN_HOST: "0.0.0.0"
  ATUIN_PORT: "8888"
  ATUIN_OPEN_REGISTRATION: "true"

---
# deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: atuin
spec:
  replicas: 1
  strategy:
    type: Recreate # Prevent data corruption by ensuring only one pod runs
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: atuin
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: atuin-secrets
                  key: ATUIN_DB_PASSWORD
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: atuin-secrets
                  key: ATUIN_DB_USERNAME
          lifecycle:
            preStop:
              exec:
                command:
                  [
                    "/usr/local/bin/pg_ctl",
                    "stop",
                    "-D",
                    "/var/lib/postgresql/data",
                    "-w",
                    "-t",
                    "60",
                    "-m",
                    "fast",
                  ]
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 250m
              memory: 600Mi
          volumeMounts:
            - mountPath: /var/lib/postgresql/data/
              name: database
      volumes:
        - name: database
          persistentVolumeClaim:
            claimName: database
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atuin
  namespace: atuin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: atuin
  template:
    metadata:
      labels:
        app: atuin
    spec:
      containers:
        - name: atuin
          image: ghcr.io/atuinsh/atuin:v18.4.0 # Using a specific version as recommended
          args:
            - server
            - start
          env:
            - name: ATUIN_DB_URI
              valueFrom:
                secretKeyRef:
                  name: atuin-secrets
                  key: ATUIN_DB_URI
            - name: ATUIN_HOST
              valueFrom:
                configMapKeyRef:
                  name: atuin-config
                  key: ATUIN_HOST
            - name: ATUIN_PORT
              valueFrom:
                configMapKeyRef:
                  name: atuin-config
                  key: ATUIN_PORT
            - name: ATUIN_OPEN_REGISTRATION
              valueFrom:
                configMapKeyRef:
                  name: atuin-config
                  key: ATUIN_OPEN_REGISTRATION
          ports:
            - containerPort: 8888
          resources:
            limits:
              cpu: 250m
              memory: 1Gi
            requests:
              cpu: 250m
              memory: 1Gi
          volumeMounts:
            - mountPath: /config
              name: atuin-config
      volumes:
        - name: atuin-config
          persistentVolumeClaim:
            claimName: atuin-config

---
# ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: atuin
  namespace: atuin
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
    traefik.ingress.kubernetes.io/router.middlewares: "atuin-strip-prefix@kubernetescrd"
spec:
  rules:
    - host: tyr.local
      http:
        paths:
          - path: /atuin
            pathType: Prefix
            backend:
              service:
                name: atuin
                port:
                  number: 8888

---
# middleware.yaml

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-prefix
  namespace: atuin
spec:
  stripPrefix:
    prefixes:
      - /atuin
    forceSlash: true

---
# namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: atuin

---
# sealed-secrets.yaml

---
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  creationTimestamp: null
  name: atuin-secrets
  namespace: atuin
spec:
  encryptedData:
    ATUIN_DB_PASSWORD: AgBKfphBarMiNX8CIsvjAXqEtRp/Bq+a4y67k/M6bxMm1w/fJUERNqBKaPWqaABfHR4WEk9ePj4CWcVbHb2xVCviX4zYE4pZ9onMvzRGJa2UUl1qRsJGN/ooMRJux+ztfSXJfRzzZxt1QjBlJOmMxG0XjKu0TdahXnI4BMJ2rrBPPmWx9sr4z8YxG8BU/TL8DiJGiD2DtarQWmqSogueGpsOE/9hdeWvW4E7RNlcd7JJ0Hv/nELlhVIUB9fzGoaioDJO6qodYBWNtt2ckyNp3KwoOKXddwRV5tq1ggPKnZOqlHpDgmTaYAFNPXVGIpMNxzUfs+CU0VdT60hx5e3qMbVD86NrnqmbQ38GYc/A7TDrWImSEPjkweLPSTgK5YuQEHJBGYDy9jNNVTMHwfcXkAZkD8swu8+2Whw6No1D2WO2LwewVdTDOynjVhekGk3UF6B2lqIn9TowkIBbZZ6mYYK4VzXRCRXmo2ZiEqDMQK78ejUHdK5m43cZ9M+BEmE3lKzAmgZt+xons/xcisI63pff31urXWZsFylZvnVUnR/l0cp5jmr8KDnMp1WDPf+UyhSlxVvnfAKRyXIGi6jpMQluXVvx/waX4MdqgJMfyn3cQ6tFH4YiZCX6kdNNWjJp5lYxmhRdqWRznCB1vxuWIfXCc9eUT8Kz0Houmw/S8HR11ApNoxopbalC23wdTa9ZXlJdC4bXElfdC8HHwjTcNezDN9mc+4e+WdaKkbuYZljP
    ATUIN_DB_URI: AgCfm2AisGVBlMrOqPvMWOor0e0UXDruZnWVG3klrfSzbtZfrzYF4x+sY7fVLsfUY3RSRF84m13hIJPBxhiO3pFPAs6e6zm5GH7B+8Iem1ijIXWNVW5oc7h/Kas77k1h+TcJTVyZ4gL52oqzZM3cwAX0UdE/enNrvYWoeTsJ0UMbNw3bKZ9Ll0BPfdirdHT8Ve7jMzaDF+d11difPOhyZ7wgK3ykzOGu9G8LbzJ8IwUYYFK/1DETYU76XC/d79tUOwSYxGwf88/r2zjn9ZFA7rnzzEnV7ECR33fSoRJALZMyHMUOp8cxa1rYGPrBRyHhivdhhUnyRgXqAq/oymQo4+cwBHZFSpmtEqafQ8RpuOr2ymRgrxBGfe4n4eLprzY5EUZpFRhgxonb10YL16vg/oAlWObdYkS17ZayQtsfbHBD2udjljQXrjWNIWlT6fXG8JeJth+kFewr9+2c0Rfh9sQJ+F2otBk5x+dbt5xTKppAsAEHIy9lN8/Gbh+U+woCxgP11x+w/HYX9KXDkGHcOiAteYEI7Cf2Eo1TKD7ICVTVfReETWxAzSpKMabltNuM8fuLj6dHakvkQ6PgS537ShhyGofbLQaWTB8AMpwRCIUZme6EkfZuoO2CBt8gCnL3U6geDhHUB4ZGU4g9wPL/FlIqSPaWhafwbjc+PCyXqpOMNHdXtNc7D7bAsWN1Nri3Gk1D4ae0BDTunG/SgX4rlx6zc8kGgmFtJ/cnX//RO40Om2Yf36bdeb3KgDo4Ia49EZDaH7FlRn1cwUax0Gr3Jz4=
    ATUIN_DB_USERNAME: AgAt8yDkKRjmvJtB4ecxOOcuEm1Zcoa8pX1UvtvwAAT4M18PN3JK+6yOyhHuuTwWtWphlQnAjSWx6Bu8usgIxrw9dhBCRxf4pJIaW2VmszUnn1HOtdEFcU6+40PEZ8vJEqCQz/sQoilhZyH06VYecNZFtUHleFAaEFfSGPtxd73lqpjY62fOI8yoGfd/lmXays5vjSx9kUtUVd71FYEOf7P6x+OWlFWsbQ6FepiHygoCXTiCi9umbherpIHWCMZxELja/mNdVZp2wIO+NytedM47LIy2U0FP3b6quPc1H52OK/9AK9TJf/Ke8vUaRDE6TAqv1K0fT5diD4zwERzpNoHKHhnejKj1FOCm6WVcnPHk17zy9Et+kdB+feKpgbeZlolCSJ+JgNWnM2Y3WaovQI4i4yq3ipqQDI1AgY6hHMj1HGNH8gpFjHRy/+UfPd1f4aDO6hGAbL86O2y18VcqD7gESRJ7XVWikJWpU2hIp2FAEpopoqU1QPWyTGvvC46g+gfTARIphn1EzjKymdc4ICb8Viuy/B1oVuwFaD7y9FnNx3tPP4cSuODiG2u6q0j/UTMkAftGqPZUNu3yfkrJHziKUnGc9kuasgAFJKXL2qJuG4VBxNPwTmp2VnJiBysvUb1JTTYd+2uEu4woGmzVfm/9kjkP1rbRk+hAUj5fyW2Nebds9dgD2gXZ2yGOK/S1G0TXnriSQA==
  template:
    metadata:
      creationTimestamp: null
      name: atuin-secrets
      namespace: atuin
    type: Opaque


---
# services.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: atuin
  namespace: atuin
spec:
  type: ClusterIP
  ports:
    - port: 8888
      targetPort: 8888
  selector:
    app: atuin
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: atuin
spec:
  type: ClusterIP
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres

---
# storage.yaml

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database
  namespace: atuin
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: atuin-config
  namespace: atuin
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi

Homelab

Starting the homelab

I found an old mini PC (ASUS Mini PC PN30), left in a drawer from when I thought I needed it to run a Plex media server. With a sudden (unexpected) burst of motivation I decided to run a local kubernetes cluster on it. (In hindsight, I think I might also have been inspired to try self-hosting an RSS reader by this post. I just got distracted, by deciding to self-host using kubernetes).

Making a plan

I asked ChatGPT and Claude for help on how to set up a simple kubernetes setup at home. After some back-and-forth I landed on installing Debian with no graphical desktop environment, and then installing k3s. The choice of k3s was mainly made to limit the resource requirements. The Mini PC is not exactly beefy, with an underpowered CPU and only 4GB RAM (While trying to confirm this number, I found the listing for this on Amazon and it claims that I can upgrade to 8GB RAM. I might do that at some point).

Install Debian

I downloaded an ISO of Debian 12 and made a bootable USB. I connected the Mini PC to a monitor, keyboard, and mouse and booted the Debian installer from the USB stick. I selected the graphical installer and followed the guidance. I did not create a root user, instead letting my own user get sudo privileges. I did not install a desktop environment. I gave it the hostname tyr. I made sure to select SSH, to allow access after I unplug the peripherals.

Install tools

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install -y vim git curl wget htop

I tried accessing the Mini PC over SSH from my desktop.

ssh kasper@tyr.local

This did not work, but using the local IP directly works fine.

I really wanted to use the hostname.local thing. I learned that it is called mDNS, and that I need an mDNS service. I installed Avahi on both my desktop and the Mini PC:

sudo apt-get install avahi-daemon

Install k3s

Now, to install k3s. Following docs at https://k3s.io/.

curl -sfL https://get.k3s.io | sh - 

After a minute, the kubernetes cluster is running and I can query it from tyr

$ sudo k3s kubectl get node 
NAME   STATUS   ROLES                  AGE   VERSION
tyr    Ready    control-plane,master   15h   v1.31.4+k3s1

Next, I want to access it from my desktop. Following the k3s guide I copy /etc/rancher/k3s/k3s.yaml from tyr to ~/.kube/config on my desktop with scp, and edit the server field to point to the IP of tyr. I tried a lot to get tyr.local to resolve instead of the IP, but as far as I can tell, kubectl is not using the mDNS stuff from above. Here is the last chat message (in a long back-and-forth) from o1 on why .local does not work.

A statically compiled binary often does not use the system's usual NSS (Name Service Switch) mechanisms—like /etc/nsswitch.conf and libnss-mdns—for hostname resolution. Instead, it typically performs "pure DNS" lookups.

That explains why:

  • ping tyr.local succeeds, because it honors nsswitch.conf and uses Avahi/mDNS.
  • kubectl fails on tyr.local, because it bypasses your local mDNS setup and tries querying a DNS server that doesn't know about .local names.

ChatGPT suggested some ways to fix it, but the simplest seemed to be to just plug in the IP.
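Plugging in the IP boils down to something like this (a sketch; 192.168.1.50 is a placeholder for tyr's address, and the k3s kubeconfig on tyr is only readable by root):

# On tyr: make a readable copy of the k3s kubeconfig
sudo cat /etc/rancher/k3s/k3s.yaml > /tmp/k3s.yaml

# On the desktop: copy it over and point the server field at tyr's IP
scp kasper@192.168.1.50:/tmp/k3s.yaml ~/.kube/config
sed -i 's|https://127.0.0.1:6443|https://192.168.1.50:6443|' ~/.kube/config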

I made sure to go to my router and reserve the local IP address of tyr, so it does not change after a reboot or something.

And finally, I can run the following from my desktop

$ kubectl get node 
NAME   STATUS   ROLES                  AGE   VERSION
tyr    Ready    control-plane,master   44h   v1.31.4+k3s1

Migrating static site from Jekyll to MkDocs

Intro

I have posted (very) few posts to my blog over the years. Recently, one of the things holding me back from posting is that I can't really build it locally any more. I'm using Github Pages, and although it is probably very easy to use, I just use it so rarely that I don't really know what is going on.

My brief guide says to run bundle install, but I don't have bundle installed. I also don't know what it is. Github tells me to install Jekyll and Ruby. I don't have either of them.

At work I use Python a lot, and I have created a few docs sites with MkDocs, with Material for MkDocs helping out in making everything pretty. I want to use that tool-stack instead. All the content is markdown anyway, so it should not be too bad.

Build locally

I start by cloning https://github.com/KPLauritzen/kplauritzen.github.io and opening it in VSCode.

Let's create a justfile to document how to interact with the repo.

justfile
install:
  uv sync --all-extras
  uv run pre-commit install

lint:
  uv run pre-commit run --all-files

None of this works yet, since there is no Python project, but it is a start.

I set up pyproject.toml with uv init and add some packages with uv add pre-commit mkdocs mkdocs-material.

Now I just need the most basic config for MkDocs and we are ready to serve some HTML!

mkdocs.yml
site_name: KPLauritzen.dk
docs_dir: _posts

I can see my site locally with mkdocs serve

It's terrible, but it works! I add that as a command in the justfile

Slightly prettier

How little effort can I put in to make this tolerable?

  • Add a theme to mkdocs.yml

      theme: material
    
  • Move index.md to _posts/ so we don't start with a 404 error.

That's it, now it actually looks serviceable.

There is A BUNCH of improvements that could be helpful, but that is too much to do for now. I will save some of that for a rainy day.

For now, I will just try to create a github workflow to publish this.

Publish again to Github Pages

I already have a Github Action called jekyll.yml. Let's delete that and make a new one.

I start by stealing the basic outline from mkdocs-material. After that, I follow the guide to uv in Github Actions.

This is the result:

.github/workflows/ci.yml
name: ci 
on:
  push:
    branches:
      - master 
      - main
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - name: Install uv
        uses: astral-sh/setup-uv@v5
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV 
      - uses: actions/cache@v4
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: uv run mkdocs gh-deploy --force

EDIT: After publishing I had some problems with my custom domain, kplauritzen.dk. Every time I ran mkdocs gh-deploy it wanted to deploy to kplauritzen.github.io instead.

I think the solution is to create a CNAME file in _posts/ as that will get picked up during the build. See the docs.
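Something along those lines should be a one-time fix (assuming the custom domain stays kplauritzen.dk):

# The CNAME file gets copied into the built site and tells Github Pages which domain to use
echo "kplauritzen.dk" > _posts/CNAME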

Use databricks profiles to emulate service principals

TL;DR: Edit your ~/.databrickscfg file to create a profile for your service principals.

Problem: The feedback loop when developing a CI/CD pipeline is too slow

I have a CI/CD pipeline that interacts with a Databricks workspace through the Databricks CLI. I usually develop the pipeline locally, testing it against a sandbox Databricks workspace, authenticated as myself.

But when I deploy the pipeline to the CI/CD environment, it runs as a service principal, first against a dev workspace, then against a prod workspace.

There can be some issues that only appear when running as a service principal, like permission errors or differences in workspace configuration. And the feedback loop is too slow: I have to commit, push, wait for the pipeline to run, check the logs, and repeat.

I want to test the pipeline locally, authenticated as a service principal, to catch these issues earlier.

Solution: Use databricks profiles to emulate service principals

Reading about the one million ways to authenticate to an Azure Databricks workspace is enough to give me a headache (seriously, there are too many options). I have previously used environment variables to authenticate as a service principal, keeping the various secrets in an .env file and commenting and un-commenting them as needed. It is a mess, and I'm guaranteed to forget to switch back to my user account at some point.

Instead, I can use databricks profiles to store the different authentication configurations. In ~/.databrickscfg, I can create a profile for each service principal, and switch between them with the --profile flag.

Here is an example of a ~/.databrickscfg file with two Service principal profiles:

.databrickscfg
[DEFAULT]
host  = <SOME_HOST>
token = <SOME_TOKEN>

[project-prod-sp]
host                = 
azure_client_id     = 
azure_client_secret = 
azure_tenant_id     = 

[project-dev-sp]
<same setup as above>

Of course, you should replace the placeholders with the actual values.

To test what workspace and user your profile is using, you can try the following command:

databricks auth describe --profile project-prod-sp

This will also show you where the authentication is coming from (because, as I mentioned above, there are too many ways to authenticate).

Finally, you can run your pipeline locally, using the --profile flag to specify that you want to use the service principal profile:

databricks bundle deploy --profile project-dev-sp

Alternative to using --profile flag

If you still want to use environment variables, you can set the DATABRICKS_CONFIG_PROFILE variable to the profile name you want to use, e.g.:

DATABRICKS_CONFIG_PROFILE=DEFAULT
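For example, in a shell session or a CI job, a sketch could look like this:

# Select the profile once instead of passing --profile to every command
export DATABRICKS_CONFIG_PROFILE=project-dev-sp
databricks auth describe
databricks bundle deploy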

Test kafka clients with Docker

TL;DR: Use pytest-docker to create a test fixture that starts a Kafka container.

Problem: I want to test my Kafka client, but I don't have a Kafka cluster

At work, we need to consume and produce messages to some queue. And one of the tools available already is Kafka.

Before integrating with the existing Kafka cluster, I want to test my client code. I want to ensure that it can consume and produce messages correctly.

I have an existing BaseQueueService class like this:

from abc import ABC, abstractmethod


class BaseQueueService(ABC):
    @abstractmethod
    def publish(self, message: str) -> None:
        pass

    @abstractmethod
    def consume(self) -> str | None:
        pass

with existing implementations for Azure Service Bus and an InMemoryQueue for testing business logic.

So I want to create a KafkaQueueService class that implements this interface. And I want to test it, but I don't have a Kafka cluster available.

Solution: Use docker to start a Kafka container for testing

I can use pytest-docker to create a test fixture that starts a Kafka container. This way, I can test my KafkaQueueService class without needing a Kafka cluster.

This is how I did it:

A docker-compose.yml file to start a Kafka container:

docker-compose.yml
services:
  zookeeper:
    image: 'confluentinc/cp-zookeeper:latest'
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"

  kafka:
    image: 'confluentinc/cp-kafka:latest'
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    ports:
      - "9092:9092"
    expose:
      - "29092"

  init-kafka:
    image: 'confluentinc/cp-kafka:latest'
    depends_on:
      - kafka
    entrypoint: [ '/bin/sh', '-c' ]
    command: |
      "
      # blocks until kafka is reachable
      kafka-topics --bootstrap-server kafka:29092 --list

      echo -e 'Creating kafka topics'
      kafka-topics --bootstrap-server kafka:29092 --create --if-not-exists --topic testtopic --replication-factor 1 --partitions 1
      kafka-topics --bootstrap-server kafka:29092 --create --if-not-exists --topic input_test_topic --replication-factor 1 --partitions 1
      kafka-topics --bootstrap-server kafka:29092 --create --if-not-exists --topic output_test_topic --replication-factor 1 --partitions 1

      echo -e 'Successfully created the following topics:'
      kafka-topics --bootstrap-server kafka:29092 --list
      "

A conftest.py file to create a test fixture that starts the Kafka container:

import pytest


def check_kafka_ready(required_topics, host="localhost", port=9092):
    from confluent_kafka import KafkaException
    from confluent_kafka.admin import AdminClient

    try:
        admin = AdminClient({"bootstrap.servers": f"{host}:{port}"})
        topics = admin.list_topics(timeout=5)
        # Check if all required topics are present
        if all(topic in topics.topics for topic in required_topics):
            return True
        else:
            return False
    except KafkaException:
        return False


@pytest.fixture(scope="session")
def kafka_url(docker_services):
    """Start kafka service and return the url."""
    port = docker_services.port_for("kafka", 9092)
    required_topics = ["testtopic", "input_test_topic", "output_test_topic"]
    docker_services.wait_until_responsive(
        check=lambda: check_kafka_ready(port=port, required_topics=required_topics),
        timeout=30.0,
        pause=0.1,
    )
    return f"localhost:{port}"

And finally, a test file to test the KafkaQueueService class:

@pytest.mark.kafka
def test_kafka_queue_can_publish_and_consume(kafka_url):
    kafka_queue_service = KafkaQueueService(
        broker=kafka_url,
        topic="testtopic",
        group_id="testgroup",
    )
    clear_messages_from_queue(kafka_queue_service)

    unique_message = "hello" + str(uuid.uuid4())
    kafka_queue_service.publish(unique_message)

    received_message = kafka_queue_service.consume()
    assert received_message == unique_message

Now I can test my KafkaQueueService class without needing a Kafka cluster. This even works on my CI/CD pipeline in Azure DevOps.

NOTE: The docker_services fixture starts ALL the docker services in the docker-compose.yml file.
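Since the test is marked with @pytest.mark.kafka, the Kafka-backed tests can be selected or skipped from the command line (assuming the marker is registered in your pytest config):

# Run only the Kafka integration tests
pytest -m kafka

# Or skip them, e.g. when Docker is not available
pytest -m "not kafka"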

Bonus: The passing implementation of KafkaQueueService

This passes the test above (and a few other tests I wrote):

import logging

from confluent_kafka import Consumer, KafkaError, Producer

logger = logging.getLogger(__name__)

class KafkaQueueService(BaseQueueService):
    def __init__(self, broker: str, topic: str, group_id: str):
        # Configuration for the producer and consumer
        self.topic = topic
        self.producer: Producer = Producer({"bootstrap.servers": broker})
        self.consumer: Consumer = Consumer(
            {
                "bootstrap.servers": broker,
                "group.id": group_id,
                "auto.offset.reset": "earliest",
                "enable.partition.eof": "true",
            }
        )
        self.consumer.subscribe([self.topic])

    def publish(self, message: str) -> None:
        """Publish a message to the Kafka topic."""
        logger.debug(f"Publishing message to topic {self.topic}: {message}")

        self.producer.produce(self.topic, message.encode("utf-8"))
        self.producer.flush()

    def consume(self) -> str | None:
        """Consume a single message from the Kafka topic."""
        logger.debug(f"Consuming message from topic {self.topic}")

        # Get the next message
        message = self.consumer.poll(timeout=20)
        if message is None:
            logger.debug("Consumer poll timeout")
            return None
        # No new message
        if message.error() is not None and message.error().code() == KafkaError._PARTITION_EOF:
            logger.debug("No new messages in topic")
            return None
        # Check for errors
        if message.error() is not None:
            raise Exception(f"Consumer error: {message.error()}")
        self.consumer.commit(message, asynchronous=False)
        return message.value().decode("utf-8")

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(topic={self.topic})"

Build docker images on remote Linux VM

TL;DR: Create a Linux VM in the cloud, then create a docker context for it with

docker context create linux-builder --docker "host=ssh://username@remote-ip"

then build your image with

docker buildx build --context linux-builder --platform linux/amd64 -t my-image .

Problem: Building some Docker images on a modern Mac fails

At work, I'm using an M3 Macbook. It's a great machine, but it's not perfect. One issue is that I can't always build Docker images targeting linux/amd64 on it.

Recently, I had an issue where I needed to package a Python application in Docker, and one of the dependencies was pytorch. I suspect that is where my issue was coming from.

Building the image on Mac works fine when running it on the same machine, but when I try to run it on a Linux machine, it fails with the following error:

exec /app/.venv/bin/python: exec format error

This indicated that the Python binary was built for the wrong architecture. Luckily, you can specify the target architecture using the --platform flag when building the image.

docker buildx build --platform linux/amd64 -t my-image .

Unfortunately, this didn't work for me. I suspect that the pytorch dependency was causing the issue. I got the following error:

Cannot install nvidia-cublas-cu12.

Solution: Build the image on a remote Linux VM

To solve this issue, I decided to build the image on a remote x86_64 Linux VM. This way, I can ensure that the image is built for the correct architecture.

I used an Azure Virtual Machine with an Ubuntu 24.04 image. I enabled "Auto-shutdown" at midnight every day to save costs.

After ssh-ing into the VM, I installed docker and ensured the user was added to the docker group.

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker azureuser

Check that the docker daemon is running:

sudo systemctl status docker

Now, back on my local machine, I created a docker context for the remote VM:

docker context create linux-builder --docker "host=ssh://azureuser@remote-ip"

Now, I can build the image using the context:

docker buildx build --context linux-builder --platform linux/amd64 -t my-image .

I can also enable the context for all future commands:

docker context use linux-builder
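To double-check which daemon you are actually talking to, you can inspect the contexts and the daemon info (both standard docker commands):

# The active context is marked with an asterisk
docker context ls

# Should report the remote VM's hostname and OS, not your Mac
docker info --format '{{.Name}} ({{.OperatingSystem}})'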

Caching Docker images in Azure DevOps Pipelines

TL;DR: Go to the bottom of the post to see the full Pipeline template.

The problem

In the Data Science team at DFDS, we are using Azure DevOps Pipelines to build and deploy our models. We are using Docker containers to package our models, and we are using Azure Pipelines for our CI/CD.

For most projects we will build the docker images in:

  1. The pull request: To make sure the docker image can be built and sometimes also to run some tests in the new container.
  2. After merging to main: To build the final image that will be deployed to production.

Step 1 usually happens more than once, as issues with a PR will often require multiple iterations of reviews and fixes. For this reason, it is important that the build time is as short as possible. Long feedback loops are not good for productivity.

So the solution is to cache the docker images between builds. Azure Pipelines even has a Cache task that claims to help with caching docker builds. But the commands listed on that documentation page have never worked for me.

The solution

My brilliant friend Morten Hels came up with a solution that works. I'm taking the liberty of writing it down here, but he is the one who deserves the credit.

Instead of using docker save and docker load to (attempt to) make cached docker layers available, we use docker buildx to build the image from, and save to, a cache.

The commands to run are:

docker buildx create --name builder --driver docker-container --use #1
docker buildx build \                                               
    --cache-from=type=local,src=docker_cache \                      #2
    --cache-to=type=local,dest=docker_cache,mode=max \              #3
    --file Dockerfile \                                             
    --output=type=docker,name=myimage \                             #4
    .
  1. Create a new builder, and use it. This is needed to make the --cache-from and --cache-to options available. I'm using the docker-container driver, but there are other options available. This one is just the easiest to set up, both locally and in a pipeline.
  2. Use the local cache as a source for the build. This will make the build use the cached layers if they are available.
  3. Save the layers that were used in the build to the local cache. This will make the layers available for the next build.
  4. Set the output to be a docker image. This is needed to make the image available for the next step in the pipeline, e.g. pushing it to a registry.

The pipeline template

Here is a complete pipeline template that you can use in your own pipelines.

templates.yaml
parameters:
  - name: docker_image_name
    type: string
    displayName: 'The name of the Docker image to build. Example: klaur-testing.'
  - name: additional_docker_build_args
    type: string
    default: ''
    displayName: 'Additional arguments to pass to the docker build command. Example: --build-arg SOME_ARG=some_value.'
  - name: dockerfile_path
    type: string
    default: 'Dockerfile'
    displayName: 'The path to the Dockerfile to use. Example: Dockerfile.'
  - name: docker_build_context
    type: string
    default: '.'
    displayName: 'The path to the directory to use as the build context. Example: .'

steps:
  - task: Cache@2
    displayName: Cache Docker layers
    inputs:
      key: '"docker" | "$(Agent.OS)" | "${{ parameters.docker_image_name }}" | ${{ parameters.dockerfile_path }}'
      restoreKeys: |
        "docker" | "$(Agent.OS)" | "${{ parameters.docker_image_name }}"
      path: $(Pipeline.Workspace)/docker_cache

  - script: |
      docker buildx create --name builder --driver docker-container --use
      docker buildx build \
        --cache-from=type=local,src=$(Pipeline.Workspace)/docker_cache \
        --cache-to=type=local,dest=$(Pipeline.Workspace)/docker_cache,mode=max \
        --file ${{ parameters.dockerfile_path }} \
        --output=type=docker,name=${{ parameters.docker_image_name }} \
        ${{ parameters.additional_docker_build_args }} ${{ parameters.docker_build_context }}
    displayName: Build Docker image
    env:
      DOCKER_BUILDKIT: 1

If the above yaml is saved in a templates.yaml file, you can use it in your pipeline like this:

azure-pipelines.yml
jobs:
  - job: BuildDockerImage
    steps:
      - template: templates.yaml
        parameters:
          docker_image_name: 'my-image'
          additional_docker_build_args: '--build-arg SOME_ARG=some_value'
          dockerfile_path: 'Dockerfile'
          docker_build_context: '.'
