• @brucethemoose@lemmy.world
    4 months ago

    Depends which 14B. Arcee’s 14B SuperNova Medius model (which is a Qwen 2.5 with some training distilled from larger models) is really incredible, but old Llama 2-based 13B models are awful.

      • @brucethemoose@lemmy.world
        4 months ago (edited)

        Try a newer quantization as well! Like an IQ4-M depending on the size of your GPU, or even better, a 4.5bpw exl2 with Q6 cache if you can manage to set up TabbyAPI.
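        For the TabbyAPI route, the exl2 model and cache settings go in its `config.yml`. A minimal sketch, assuming key names from TabbyAPI's sample config (verify against your version; `MyModel-4.5bpw-exl2` is a placeholder for whatever exl2 quant folder you download):

        ```shell
        # Write a minimal TabbyAPI config pointing at an exl2 quant
        # with a Q6-quantized KV cache. Key names are assumptions based
        # on TabbyAPI's sample config; check your version's config.yml.
        cat > config.yml <<'EOF'
        model:
          model_dir: models                 # directory holding exl2 model folders
          model_name: MyModel-4.5bpw-exl2   # placeholder: your 4.5bpw exl2 quant
          cache_mode: Q6                    # quantized KV cache (FP16/Q8/Q6/Q4)
        EOF
        ```

        Q6 cache roughly halves KV-cache memory versus FP16 with little quality loss, which is what makes a 14B at ~4.5bpw fit comfortably on a mid-size GPU.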