Edit completion works with Qwen3.6 35B A3B!
Do you miss the old-style Copilot completions? The ones where it inserted grey text at the cursor? There’s an open equivalent called FIM (“fill in the middle”) completion, and the classic model for doing it is Qwen2.5 Coder 7B.
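The FIM trick itself is simple: the prompt carries the text before and after the cursor, marked with special tokens, and the model predicts what belongs in between. With Qwen2.5 Coder the prompt looks roughly like this (a sketch; the helper name is mine, but the `<|fim_*|>` tokens are the ones from the Qwen2.5 Coder vocabulary):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt using Qwen2.5 Coder's
    special tokens. The model generates the text that belongs at
    the cursor, i.e. between prefix and suffix."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Prefix is everything before the cursor, suffix everything after.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```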
But it turns out that Qwen3.6 35B A3B can also do autocompletion! With only 3B active parameters it’s fast, and with 35B total parameters it’s smarter than the smaller models.
So let’s fire it up using llama-server:
llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ4_XS \
--cache-type-k q8_0 --cache-type-v q8_0 --no-mmproj \
--ctx-size 4000
--no-mmproj disables the vision projector, which we don’t need for text completion. --cache-type-k q8_0 --cache-type-v q8_0 quantizes the KV cache to 8 bits, roughly halving its memory footprint with little quality loss. You might also need to grab a smaller quant, depending on your available VRAM.
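Before wiring up an editor, it’s worth smoke-testing the endpoint directly. Here’s a sketch of the request body that llama-server’s OpenAI-compatible /v1/completions route accepts (field names follow the OpenAI completions API; the stop tokens assume a Qwen-style FIM vocabulary):

```python
import json

# Request body for llama-server's OpenAI-compatible /v1/completions
# endpoint. The <|fim_*|> tokens assume a Qwen-style vocabulary.
payload = {
    "prompt": "<|fim_prefix|>def add(a, b):\n    return <|fim_suffix|>\n<|fim_middle|>",
    "max_tokens": 256,
    "temperature": 0.2,
    # Stop tokens help cut off runaway output:
    "stop": ["<|fim_prefix|>", "<|fim_suffix|>"],
}
body = json.dumps(payload)
```

You can POST that body to http://localhost:8080/v1/completions with curl or your HTTP client of choice and eyeball what comes back.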
Then, we can point Zed at it:
{
  "edit_predictions": {
    "provider": "open_ai_compatible_api",
    "open_ai_compatible_api": {
      "api_url": "http://localhost:8080/v1/completions",
      "model": "unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ4_XS",
      "prompt_format": "qwen",
      "max_output_tokens": 256
    }
  }
}
So how good is this? Well, the completions aren’t too bad at all, but Zed doesn’t seem to do much post-processing, so the completions tend to be too long. A lot of this could likely be improved with a proxy that did some pre- and post-processing, and maybe a bit of fine-tuning.
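As a sketch of what such post-processing could look like: a proxy might, for example, cut the completion at the first blank line and cap the number of lines before handing it back to the editor. (The heuristics and names here are my own, not anything Zed actually does.)

```python
def trim_completion(text: str, max_lines: int = 3) -> str:
    """Heuristically shorten an over-eager completion: stop at the
    first blank line and keep at most max_lines lines."""
    kept = []
    for line in text.splitlines():
        if not line.strip():  # a blank line usually ends the local edit
            break
        kept.append(line)
        if len(kept) >= max_lines:
            break
    return "\n".join(kept)

# trim_completion("x = 1\ny = 2\n\ndef unrelated(): ...") keeps only
# the two lines before the blank line.
```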
But this is an actual, working, 100% local autocomplete. And it’s close to being actually good.