LLaMA

google-goes-“open-ai”-with-gemma,-a-free,-open-weights-chatbot-family

Google goes “open AI” with Gemma, a free, open-weights chatbot family

Free hallucinations for all —

Gemma chatbots can run locally, and they reportedly outperform Meta’s Llama 2.

The Google Gemma logo

On Wednesday, Google announced a new family of AI language models called Gemma, which are free, open-weights models built on technology similar to the more powerful but closed Gemini models. Unlike Gemini, Gemma models can run locally on a desktop or laptop computer. It’s Google’s first significant open large language model (LLM) release since OpenAI’s ChatGPT started a frenzy for AI chatbots in 2022.

Gemma models come in two sizes: Gemma 2B (2 billion parameters) and Gemma 7B (7 billion parameters), each available in pre-trained and instruction-tuned variants. In AI, parameters are values in a neural network that determine AI model behavior, and weights are a subset of these parameters stored in a file.

Developed by Google DeepMind and other Google AI teams, Gemma pulls from techniques learned during the development of Gemini, which is the family name for Google’s most capable (public-facing) commercial LLMs, including the ones that power its Gemini AI assistant. Google says the name comes from the Latin gemma, which means “precious stone.”

While Gemma is Google’s first major open LLM since the launch of ChatGPT (it has released smaller research models such as FLAN-T5 in the past), it’s not Google’s first contribution to open AI research. The company cites the development of the Transformer architecture, as well as releases like TensorFlow, BERT, T5, and JAX as key contributions, and it would not be controversial to say that those have been important to the field.

A chart of Gemma performance provided by Google. Google says that Gemma outperforms Meta's Llama 2 on several benchmarks.

Enlarge / A chart of Gemma performance provided by Google. Google says that Gemma outperforms Meta’s Llama 2 on several benchmarks.

Owing to lesser capability and high confabulation rates, smaller open-weights LLMs have been more like tech demos until recently, as some larger ones have begun to match GPT-3.5 performance levels. Still, experts see source-available and open-weights AI models as essential steps in ensuring transparency and privacy in chatbots. Google Gemma is not “open source” however, since that term usually refers to a specific type of software license with few restrictions attached.

In reality, Gemma feels like a conspicuous play to match Meta, which has made a big deal out of releasing open-weights models (such as LLaMA and Llama 2) since February of last year. That technique stands in opposition to AI models like OpenAI’s GPT-4 Turbo, which is only available through the ChatGPT application and a cloud API and cannot be run locally. A Reuters report on Gemma focuses on the Meta angle and surmises that Google hopes to attract more developers to its Vertex AI cloud platform.

We have not used Gemma yet; however, Google claims the 7B model outperforms Meta’s Llama 2 7B and 13B models on several benchmarks for math, Python code generation, general knowledge, and commonsense reasoning tasks. It’s available today through Kaggle, a machine-learning community platform, and Hugging Face.

In other news, Google paired the Gemma release with a “Responsible Generative AI Toolkit,” which Google hopes will offer guidance and tools for developing what the company calls “safe and responsible” AI applications.

Google goes “open AI” with Gemma, a free, open-weights chatbot family Read More »

everybody’s-talking-about-mistral,-an-upstart-french-challenger-to-openai

Everybody’s talking about Mistral, an upstart French challenger to OpenAI

A challenger appears —

“Mixture of experts” Mixtral 8x7B helps open-weights AI punch above its weight class.

An illustrated robot holding a French flag.

Enlarge / An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It’s hard to draw a picture of an LLM, so a robot will have to do.

On Monday, Mistral AI announced a new AI language model called Mixtral 8x7B, a “mixture of experts” (MoE) model with open weights that reportedly truly matches OpenAI’s GPT-3.5 in performance—an achievement that has been claimed by others in the past but is being taken seriously by AI heavyweights such as OpenAI’s Andrej Karpathy and Jim Fan. That means we’re closer to having a ChatGPT-3.5-level AI assistant that can run freely and locally on our devices, given the right implementation.

Mistral, based in Paris and founded by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, has seen a rapid rise in the AI space recently. It has been quickly raising venture capital to become a sort of French anti-OpenAI, championing smaller models with eye-catching performance. Most notably, Mistral’s models run locally with open weights that can be downloaded and used with fewer restrictions than closed AI models from OpenAI, Anthropic, or Google. (In this context “weights” are the computer files that represent a trained neural network.)

Mixtral 8x7B can process a 32K token context window and works in French, German, Spanish, Italian, and English. It works much like ChatGPT in that it can assist with compositional tasks, analyze data, troubleshoot software, and write programs. Mistral claims that it outperforms Meta’s much larger LLaMA 2 70B (70 billion parameter) large language model and that it matches or exceeds OpenAI’s GPT-3.5 on certain benchmarks, as seen in the chart below.

A chart of Mixtral 8x7B performance vs. LLaMA 2 70B and GPT-3.5, provided by Mistral.

Enlarge / A chart of Mixtral 8x7B performance vs. LLaMA 2 70B and GPT-3.5, provided by Mistral.

Mistral

The speed at which open-weights AI models have caught up with OpenAI’s top offering a year ago has taken many by surprise. Pietro Schirano, the founder of EverArt, wrote on X, “Just incredible. I am running Mistral 8x7B instruct at 27 tokens per second, completely locally thanks to @LMStudioAI. A model that scores better than GPT-3.5, locally. Imagine where we will be 1 year from now.”

LexicaArt founder Sharif Shameem tweeted, “The Mixtral MoE model genuinely feels like an inflection point — a true GPT-3.5 level model that can run at 30 tokens/sec on an M1. Imagine all the products now possible when inference is 100% free and your data stays on your device.” To which Andrej Karpathy replied, “Agree. It feels like the capability / reasoning power has made major strides, lagging behind is more the UI/UX of the whole thing, maybe some tool use finetuning, maybe some RAG databases, etc.”

Mixture of experts

So what does mixture of experts mean? As this excellent Hugging Face guide explains, it refers to a machine-learning model architecture where a gate network routes input data to different specialized neural network components, known as “experts,” for processing. The advantage of this is that it enables more efficient and scalable model training and inference, as only a subset of experts are activated for each input, reducing the computational load compared to monolithic models with equivalent parameter counts.

In layperson’s terms, a MoE is like having a team of specialized workers (the “experts”) in a factory, where a smart system (the “gate network”) decides which worker is best suited to handle each specific task. This setup makes the whole process more efficient and faster, as each task is done by an expert in that area, and not every worker needs to be involved in every task, unlike in a traditional factory where every worker might have to do a bit of everything.

OpenAI has been rumored to use a MoE system with GPT-4, accounting for some of its performance. In the case of Mixtral 8x7B, the name implies that the model is a mixture of eight 7 billion-parameter neural networks, but as Karpathy pointed out in a tweet, the name is slightly misleading because, “it is not all 7B params that are being 8x’d, only the FeedForward blocks in the Transformer are 8x’d, everything else stays the same. Hence also why total number of params is not 56B but only 46.7B.”

Mixtral is not the first “open” mixture of experts model, but it is notable for its relatively small size in parameter count and performance. It’s out now, available on Hugging Face and BitTorrent under the Apache 2.0 license. People have been running it locally using an app called LM Studio. Also, Mistral began offering beta access to an API for three levels of Mistral models on Monday.

Everybody’s talking about Mistral, an upstart French challenger to OpenAI Read More »