Apple has released an artificial intelligence (AI) model called OpenELM (Open Efficient Language Model), along with its code, weights, data sets, and training processes.
Like Google, Samsung, and Microsoft, Apple is now pursuing generative AI models that run on both desktop and mobile devices. OpenELM marks the debut of a new family of open large language models (LLMs) capable of running on-device without the need for cloud servers.
OpenELM was recently released on Hugging Face and consists of several small models designed to perform text-generation tasks efficiently.
The OpenELM model family consists of eight members: four pre-trained models and four fine-tuned models, with parameter counts ranging from 270 million to 3 billion (3B). By comparison, Microsoft’s Phi-3 model has 3.8 billion parameters (3.8B).
Pre-training is what enables large models to generate continuous, usable text, while fine-tuning allows models to respond to specific user requests with greater relevance. In particular, pre-trained models tend to complete a prompt by appending plausible text to it rather than following the instruction. For example, given the request “teach me how to make bread,” such a model may not provide step-by-step instructions but instead continue the sentence with something like “use a home oven to bake.” Fine-tuning addresses this problem.
OpenELM improves the effectiveness of Transformer language models by adopting a layer-wise scaling strategy, combined with fine-tuning after pre-training on public datasets. As a result, OpenELM’s Transformer layers do not share a single set of parameters; each layer has its own configuration and parameter allocation. This strategy significantly improves the model’s accuracy. For example, at a capacity of about 1 billion parameters (1B), OpenELM’s accuracy is 2.36% higher than that of OLMo, while requiring half as many pre-training tokens.
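The idea behind layer-wise scaling can be sketched as a simple linear interpolation of each Transformer layer’s width settings (feed-forward multiplier and attention-head count) from the first layer to the last, so that parameters are allocated unevenly across depth. The endpoint values below are illustrative assumptions, not Apple’s published hyperparameters:

```python
def layerwise_scaling(num_layers, alpha_min=0.5, alpha_max=4.0,
                      head_min=4, head_max=16):
    """Illustrative layer-wise scaling: linearly interpolate each layer's
    FFN width multiplier and attention-head count across depth.
    All endpoint values are assumed for illustration only."""
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        alpha = alpha_min + (alpha_max - alpha_min) * t
        heads = int(round(head_min + (head_max - head_min) * t))
        configs.append({"layer": i,
                        "ffn_multiplier": round(alpha, 2),
                        "num_heads": heads})
    return configs

# Early layers get narrow FFNs and few heads; later layers get wide ones.
for cfg in layerwise_scaling(num_layers=4):
    print(cfg)
```

In a uniform Transformer every layer would use the same multiplier and head count; varying them per layer is what lets a model of the same total parameter budget spend capacity where it helps most.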
Apple has released the OpenELM model weights under its “Sample Code License,” along with several checkpoints, model performance statistics, and instructions for pre-training, evaluation, fine-tuning, and parameter-efficient tuning. One commenter observed, “It can be said to be very developer-friendly, after all, a large part of the difficulty of deep networks lies in parameter tuning.”
Apple’s Sample Code License does not prohibit commercial use or modification; it only requires that “if you redistribute Apple software in its entirety and without modification, you must retain this notice and the following disclaimer and language on all redistributions.”
This license is not a recognized open source license, and while Apple has not imposed excessive restrictions, it explicitly states that if any derivative work based on OpenELM is found to infringe its rights, Apple reserves the right to assert patent claims.
Apple further stresses that these models “provide no guarantee of safety. Therefore, the models may generate inaccurate, harmful, biased, or offensive output in response to user prompts.”
OpenELM is just the latest in a series of open AI models Apple has released. Last October, Apple quietly released Ferret, an open language model with multimodal capabilities, which quickly gained attention.
In a paper on the model published on arXiv.org, Apple stated that the development of OpenELM was “led by Sachin Mehta, with additional contributions from Mohammad Rastegari and Peter Zatloukal,” and that the family of models aims to “enhance and empower the open research community, fostering future research efforts.”
Apple’s OpenELM models come in four sizes, with parameter counts ranging from 270 million to 3 billion, each in two versions: pre-trained and instruction-tuned.
These models were pre-trained on public datasets totaling roughly 1.8 trillion tokens, drawn from sources including Reddit, Wikipedia, and arXiv.org. The OpenELM models are small enough to run on consumer laptops and even some smartphones. Apple noted in the paper that it benchmarked them on both “a workstation equipped with an Intel i9-13900KF CPU, 64GB DDR5-4000 DRAM, and 24GB VRAM Nvidia RTX 4090 GPU running Ubuntu 22.04” and “an Apple MacBook Pro equipped with an M2 Max system chip, 64 GiB RAM, and running macOS 14.4.1.” Notably, all of the models in the new family use a layer-wise scaling strategy to assign parameters within each layer of the Transformer model.
According to Apple, this approach provides more accurate results while improving computational efficiency.