GPT-4o’s Chinese token-training data is polluted by spam and porn websites
The new tokenizer has 200,000 tokens in total, and about 25% of them are in non-English languages, says Deedy Das, an AI investor at Menlo Ventures.