I updated my AI setup.
Because Gemma 4 works really well, I run it locally for most tasks.
It's not very fast, around 15 tokens/s, but privacy means a lot to me.
I use llamacpp-cli with a custom caveman system prompt and set the reasoning budget manually, mostly to 500 tokens.
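For a rough sense of what that budget costs at this speed, here is a minimal Python sketch. The function name is made up for illustration, and it assumes the model spends the whole budget and that generation stays near 15 tokens/s; prompt processing time is ignored.

```python
def reasoning_wait_seconds(budget_tokens: int, tokens_per_second: float) -> float:
    """Worst-case time spent on reasoning before the visible answer starts.

    Hypothetical helper: assumes the full budget is consumed at a steady rate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return budget_tokens / tokens_per_second


# Numbers from the post: 500-token budget at roughly 15 tokens/s.
wait = reasoning_wait_seconds(500, 15)
print(f"about {wait:.0f} s of thinking at most")  # about 33 s of thinking at most
```

So a 500-token budget caps the thinking phase at roughly half a minute on this setup, which is about the point where waiting still feels tolerable.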
From time to time I use duck.ai if I'm in a hurry.