Jake Browning

Gemini, Grok, and the Crowded Field of Language Models

Tourists on a guided program in Europe often come up with a simple complaint about the area, summed up as ABC: Another Bloody Church. Although each is the pride and centerpiece of its city, they begin to run together: they share the same layout, were generally built in the same eras, and rely on the same tropes. Before long, you find yourself longing for some variety.


The world is beginning to feel the weight of ABLM: Another Bloody Language Model. Some of these models are just modifications of ChatGPT or Llama, but we've recently also seen the emergence of new foundation models built from the ground up, such as Google's Gemini and X.AI's Grok. For the most part, the results indicate these systems are comparable to their competitors--Gemini, with its impressive multimodal capacities, compares favorably with GPT-4, whereas Grok seems more similar to GPT-3.5. But for the average user, the differences will not be that marked (especially since Grok seems to have been trained on ChatGPT's outputs).


But a striking feature of the marketing for both Gemini and Grok is that each has some personality--that is, idiosyncratic features that make it stand out. With Gemini, the marketing trailer shows a machine engaging in a real-time interaction with a human, where the machine seems to play games, recommend songs, and otherwise be playful. The responses are short and witty, often with an exclamation point. In the case of Grok, by contrast, the model is specifically designed to be "not woke," instead responding to queries with risque jokes and offering supposedly less progressive takes on politics than other chatbots. In both cases, the models are presented as distinct from the increasingly bland, highly fine-tuned models now typical of the field.


Appearances are, of course, deceptive. The Gemini trailer is nonsense; the interaction wasn't in real time, didn't involve speech, and Gemini's responses were edited. Basically, none of it happened as advertised. This should have been obvious: the Gemini answers were short and to the point, and current models just cannot shut up and get to the point. It turns out Gemini is no different; chatting with it on Bard is exhausting, with simple questions receiving long responses full of filler. And there is no personality; the model is as dull as the others, bordering on technocratic.


In the other direction, Grok's jokes are decidedly unfunny. This is hardly surprising; humor is one of the least understood or definable features of human life, and is often time-, culture-, and even audience-specific. It isn't clear there is a "property" of humor out in the world that we are tracking, so much as "humor" naming all those things that cause people to laugh. And, given the inappropriate things some people laugh at, humor is a bizarre and often awful category. Asking Grok to pick up the nuances of good comedy from reinforcement learning with human feedback is probably a doomed project. What Grok has done instead is recognize that some people think the word "fuck" is funny and that sarcastic responses--"not!", "psych," and "just kidding"--seem to be a simple fallback for any situation. In short, it has my sense of humor from elementary school.


Still, the Gemini and Grok rollouts both highlight that we are entering an era where language models cannot compete by getting much better. Except in isolated domains, like coding, we are increasingly seeing the limits of continued scaling. While expert users might squeeze a slight bump out of Gemini by combining chain-of-thought with clever prompting, the average user isn't going to bother. They will instead look for the cheapest model they enjoy using.


Unable to get noticeably better, companies are starting to appeal to users by making their LLMs quirky--coming up with a gimmick that sets them apart: uncensored, right-wing, funny, playful, or whatever. It isn't clear this is an effective strategy for building a business. But it is clear everyone is starting to realize this is about as good as these models get.


