This might be happening because of the ‘elegant’ (incredibly hacky) way OpenAI encodes multiple languages into their models. Instead of using all character sets, they use a modulo operator on each character to map all Unicode characters into a small range of values. On the back end, it somehow detects which language is being spoken and uses that character set for the response. Since the last line seems to be the same mathematical expression as what you asked, my guess is that your equation just happened to perfectly match some sentence that would make sense in the weird language.
Do you have a source for that? Seems like an internal detail a corpo wouldn’t publish
Can’t find the exact source (I’m on mobile right now), but the code for the GPT-2 encoder uses a byte-to-Unicode lookup table to shrink the vocab size: https://github.com/openai/gpt-2/blob/master/src/encoder.py
Seriously? Python for massive amounts of data? It’s a nice scripting language, but it’s excruciatingly slow
There are bindings in Java and C++, but Python is the industry standard for AI. The machine-learning libraries are actually written in C++ and expose Python bindings. Python doesn’t tend to slow things down, since machine learning is GPU-bound anyway. There are also library-specific languages that encourage the user to write Pythonic code that can be compiled down to C++.
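As a rough illustration of that “Python only orchestrates” point (not a benchmark, and with the builtin `sum` standing in for a compiled ML kernel):

```python
import time

data = list(range(1_000_000))

# Interpreted Python loop: every iteration goes through the bytecode interpreter.
t0 = time.perf_counter()
total_loop = 0
for x in data:
    total_loop += x
loop_time = time.perf_counter() - t0

# The builtin sum() runs its loop in C; Python just dispatches one call,
# the same way ML libraries dispatch to compiled/GPU kernels.
t0 = time.perf_counter()
total_builtin = sum(data)
builtin_time = time.perf_counter() - t0

assert total_loop == total_builtin
print(f"loop: {loop_time:.3f}s, builtin: {builtin_time:.3f}s")
```

Same result either way; the interpreter overhead only matters if you write the inner loop in Python yourself.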
I suppose it’s conceivable that there’s a bug in converting between different representations of Unicode, but I’m not buying any of this “detected which language is being spoken” nonsense, or the use of character sets. It would just use Unicode.
The modulo idea makes absolutely no sense: LLMs use tokens, not characters, and there are soooooo many tokens. It would make no sense to make those tokens ambiguous.
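For what it’s worth, folding code points with a modulo really would make characters collide, which is exactly the ambiguity problem (a toy demonstration, not anything OpenAI actually does):

```python
# Two unrelated characters whose code points differ by exactly 256.
a = "A"                  # U+0041
b = chr(ord("A") + 256)  # U+0141, "Ł"

# Folding into a byte-sized range with % 256 makes them indistinguishable.
assert a != b
assert ord(a) % 256 == ord(b) % 256
print(a, b, ord(a), ord(b))
```

Any many-to-one mapping like this loses information, so a decoder on the other side couldn’t recover which character was meant.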
I completely agree that it’s a stupid way of doing things, but it is how OpenAI reduced the vocab size of GPT-2 & GPT-3. As far as I know (I have only read the comments in the source code), the conversion is done as a preprocessing step. Here’s the code for GPT-2: https://github.com/openai/gpt-2/blob/master/src/encoder.py I did apparently make a mistake: the vocab reduction is done through a lookup table instead of a simple mod.
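For anyone curious, the relevant bit of the linked encoder.py is the `bytes_to_unicode` lookup table. Paraphrased from memory (comments mine), it looks roughly like this:

```python
def bytes_to_unicode():
    """Map every byte value 0..255 to a distinct printable Unicode character.

    Printable Latin-1 bytes map to themselves; the remaining
    control/whitespace bytes get shifted up past U+0100, so the BPE
    vocab never has to contain unprintable entries.
    """
    # Byte values that are already printable and keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(2 ** 8):
        if b not in bs:
            bs.append(b)
            cs.append(2 ** 8 + n)  # remap into the U+0100.. range
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

table = bytes_to_unicode()
print(len(table), table[ord("A")], table[0])  # 256 distinct printable chars
```

Note it’s a bijection over all 256 byte values, so unlike a modulo it loses nothing and is fully reversible.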
Title mentions speaking Italian
Not a single hand gesture anywhere
I’ve been duped
Looks like Uiua: uiua.org
We are so cooked
You may not understand, but we do.
This secret will remain jealously guarded by the Italic bloodline. ◉‿◉ Lmaooo that made me laugh way too hard!
No brother, we can’t keep this secret until the end
There’s no choice: if the last Italian should ever leave us, then this information too will have to leave humanity
Why not? A bit like the secret of how to prepare pasta
*breaks spaghetti near you*
How about go die in a hole?
We could care less
Taken literally, that implies you do care.
(To mitigate the pedantry: Given it’s a rather dispassionate response in the context of a provocation, it is probably a very weak “care” though. Just because it’s nonzero doesn’t mean it’s significant.)
Well, I couldn’t care less. I misused the phrase on purpose.
Aw shite, I’ve been pedant-baited? GG
Remember, whenever you break one spaghetto you break one heart 💔
That’s not Italian, that’s obviously Unown
I felt that when he said *83h400+93)*38hpfhi0
Damn, wild Glagolitic script found. I didn’t even realise it was in the Unicode standard.
Never go full APL
Well, it certainly doesn’t overflow on 32 bit systems
Which language uses these signs? It truly looks like some kind of alien language
Unown
I found it! It’s the Glagolitic script, used in the 9th century before Cyrillic took over:
ⰀⰁⰂⰃⰄⰅⰆⰇⰈⰉⰊⰋⰌⰍⰎⰏⰐⰑⰒⰓⰔⰕⰖⰗⰘⰙⰚⰛⰜⰝⰞⰟⰠⰡⰢⰣⰤⰥⰦⰧⰨⰩⰪⰫⰬⰭⰮⰰⰱⰲⰳⰴⰵⰶⰷⰸⰹⰺⰻⰼⰽⰾⰿⱀⱁⱂⱃⱄⱅⱆⱇⱈⱉⱊⱋⱌⱍⱎⱏⱐⱑⱒⱓⱔⱕⱖⱗⱘⱙⱚⱛⱜⱝⱞ
Glagolitic script. Oldest known Slavic alphabet according to Wikipedia.
They should revive this script. I like it more than Cyrillic.
I would like to know too! Never saw that writing system before.
I think it’s the Ge’ez script used in Ethiopia.
Doesn’t look like it to me:
ልዩ ጊዜ ነበር። አሁን የሚሆነውን ለማስተዋል የኢንተርኔት አውራጃ ማረጋገጥ ነበር።
Yeah, you are right.
APL?
No, that looks like
⌶⌷⌸⌹⌺⌻⌼⌽⌾⌿⍀⍁⍂⍃⍄⍅⍆⍇⍈⍉⍊⍋⍌⍍⍎⍏⍐⍑⍒⍓⍔⍕⍖⍗⍘⍙⍚⍛⍜⍝⍞⍟⍠⍡⍢⍣⍤⍥⍦⍧⍨⍩⍪⍫⍬⍭⍮⍯⍰⍱⍲⍳⍴⍵⍶⍷⍸⍹⍺
Kind of looks like the writing system of the Georgian language, but I’m not sure
No, this is Glagolitic script, an alternative to Cyrillic. It was mostly used in old Slavic scriptures and was later replaced by Cyrillic and Latin.
Most Slavs themselves don’t know how to read this
It’s a dead script that was not that common in the first place; in Kievan Rus’ it was even used as a form of encryption in the XI–XVI centuries, given how little it had spread. It is also very different from modern Cyrillic. So, saying “most Slavs don’t know how to read it” is a bit of an understatement. No one knows how to read it, apart from some linguists and overzealous Witcher fans.
It was widespread in Croatia until the late middle ages, about XIV-XV century.
No one knows how to read it, apart from some linguists and overzealous Witcher fans.
I could fluently read and write it in high school. Was bored.
Yea, Croatia is the only place it got widely used. Is it some kind of historical elective course in Croatian schools? Been a couple of times in Croatia, never seen Glagolitic in the wild, though. Maybe I wasn’t looking hard enough.
Is it some kind of historical elective course
No, there was a poster showing correspondence with Latin on the wall, somewhere. The symbols are almost 1-1 with modern orthography, so it takes only about a week of practice. And I was really bored.
never seen Glagolitic in the wild
It’s about as distant from modern use as runes are for Germanic speakers, but maybe with different connotations. Decorative nonsense.
But I did submit essays written with that when I wanted to fail with style. :)
I also met a guy in college who used it to keep notes. That guy was also bored.
I guess I’ll just add you guys to the “overzealous Witcher fans” and consider my point valid.
Nah, Georgian is arcs and circles everywhere, like this: ეს ქართული დამწერლობაა.
I don’t think so:
(ქართული) გამარჯობა
Well, then I was wrong
Wow, an alien ion drive formula! Try to get warp drive out of it too!