At the time, most language models scored little better than 25% on MMLU, which is roughly what you would get by picking answers at random on its four-option questions; OpenAI's GPT-3 did best, with a score of 43.9%.