• andrew_bidlaw
    English
    5 months ago

    As it learns from our data, no wonder it fucks up at regexps. They are the arcane knowledge not accessible to us mere mortals, nor to LLMs.

    • @ryathal@sh.itjust.works
      5 months ago

      If you know even a little about how an LLM works, it’s obvious why regex is basically impossible for it. I suspect Perl has similar problems, but no one is capable of actually validating that.

      • Ignotum
        5 months ago

        What do you mean it’s impossible for it? I know how LLMs work, but I don’t know of any such limitations.

        Write me a regex that matches a letter repeated four times, followed by a 3 or 4 digit number

        Here’s your regex: ([a-zA-Z])\1{3}\d{3,4}
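
A quick way to check the quoted pattern is to run it against a few strings. This is only a sketch: the helper name `matches_whole` and the sample strings are made up for illustration, and it assumes "matches" means the whole string must fit, which the original prompt does not say.

```python
import re

# Pattern quoted from the comment above, unchanged.
PATTERN = re.compile(r"([a-zA-Z])\1{3}\d{3,4}")


def matches_whole(text: str) -> bool:
    """Return True when the entire string is one letter repeated four
    times followed by a 3 or 4 digit number (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None


# Illustrative inputs, not taken from the thread.
samples = ["aaaa123", "ZZZZ9876", "aaab123", "aaaa12", "aaaa12345", "aAaa123"]
for sample in samples:
    print(sample, matches_whole(sample))
```

The backreference `\1` is case-sensitive here, so `aAaa123` is rejected. Without anchoring, `PATTERN.search` would still find a match inside `aaaa12345`, so whether the pattern is "right" depends on how it is applied.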

        • @ryathal@sh.itjust.works
          5 months ago

          LLMs aren’t context aware; they work on statistical probability. A model can replicate things it has seen a lot of, like a tutorial regex, but it can’t build on that to make a more complicated one. Regex in the wild isn’t really standardized, because it’s rarely used to solve common problems. The model has a bunch of random regexes from code it analyzed and will spit out something that looks similar.