Standards for Openness in AI Models: The Model Openness Framework
Large language models (LLMs) are playing an increasing role in conversational applications of all types and have dramatically improved the quality of our interactions with those applications. Although LLMs have proved useful in general question-answering services like OpenAI’s ChatGPT and Google’s Gemini, their real value will come from business applications like customer service. Getting there, however, might not be so easy.
When an organization decides to use LLMs in a commercial application, many more considerations come into play beyond the models’ impressive capabilities. A casual end user asking questions of ChatGPT doesn’t have to be concerned with cost, uptime, or scalability, not to mention data security, energy usage, or the biases a model might have inherited from its training data. When organizations start considering how they could use these tools in mission-critical applications, however, these factors become very important.
Some concerns, like cost, data security, and model bias, can potentially be addressed by using one of the freely available, locally hosted models. Unlike with proprietary models such as ChatGPT and Gemini, when a model is hosted locally, user data is not sent to the cloud, where user privacy and enterprise data security could be compromised. But these freely available models come with their own concerns.
In some ways, these models can be thought of as analogous to open-source software. The difference is that while open-source software can be inspected by anyone considering using it, the situation with open models is much more complex. LLMs have many aspects that conventional software does not, including training data, training algorithms, training code, data preprocessing code, and model parameters. The more of these components that can be inspected, the more confident potential adopters can be that a model will meet their application’s requirements.
Because there are so many aspects of openness in LLMs, comparing them is hard. We can’t simply say a model is “open” or “closed” because there are many gradations in between. In addition, since companies want their LLM products to appear open, there is a risk that model descriptions will gloss over aspects that are not, in fact, very open, a practice known as “open-washing.” What is needed is a standard approach to assessing the openness of AI models using well-defined criteria.
This brings us to the Model Openness Framework (MOF) published by the Linux Foundation AI & Data Foundation’s Generative AI Commons Group.
The MOF is described in the recent paper “The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence.” As the paper states, the MOF is “a ranked classification system that rates machine learning models based on their completeness and openness, following principles of open science, open source, open data, and open access.”
The MOF defines three classes of model openness. Class III (Open Model) is the most basic class; it requires the publication of such components as a description of the model architecture, the model parameters, evaluation results, and a technical report. The next class, Class II (Open Tooling), adds further criteria, including the availability of the training code, inference code, and evaluation code. Finally, Class I (Open Science) adds the availability of the training datasets, a research paper, and the data preprocessing code. Each class builds on the one below it. The MOF also designates a set of acceptable open licenses for model components and describes a process for implementing the framework.
Along with the MOF, a complementary tool has been developed to support the assessment of models according to MOF criteria—the Model Openness Tool (MOT). This can be found at https://isitopen.ai/. The MOT asks users 16 questions about a model and provides an assessment of its openness. The website also contains a catalog of hundreds of models that have been evaluated or are in the process of being evaluated.
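To make the MOF’s cumulative class structure and the MOT’s checklist-style assessment more concrete, here is a minimal sketch in Python. It is an illustration only, not the MOT’s actual implementation: the component names are paraphrased from the class descriptions above, the assess_openness helper is hypothetical, and the MOF paper itself defines the complete, authoritative list of components and acceptable licenses.

```python
# Illustrative sketch of the MOF's cumulative class structure, using the
# example components named in this article. Not the MOT's implementation;
# the MOF paper defines the complete, authoritative component list.
MOF_CLASSES = [
    ("Class III (Open Model)", [
        "model architecture description",
        "model parameters",
        "evaluation results",
        "technical report",
    ]),
    ("Class II (Open Tooling)", [
        "training code",
        "inference code",
        "evaluation code",
    ]),
    ("Class I (Open Science)", [
        "training datasets",
        "research paper",
        "data preprocessing code",
    ]),
]


def assess_openness(released_components: set[str]) -> str:
    """Return the highest MOF class whose cumulative requirements are met.

    Hypothetical helper for illustration only; the real MOT is a
    16-question web assessment at https://isitopen.ai/.
    """
    required: list[str] = []
    achieved = "Unclassified"
    # Classes are cumulative: Class II includes Class III's components,
    # and Class I includes everything in Classes III and II.
    for label, components in MOF_CLASSES:
        required.extend(components)
        if all(c in released_components for c in required):
            achieved = label
        else:
            break
    return achieved


# Example: releasing the Class III components plus training code, but not
# the rest of the Class II tooling, still rates as Class III.
print(assess_openness({
    "model architecture description", "model parameters",
    "evaluation results", "technical report", "training code",
}))  # -> Class III (Open Model)
```

Note that this sketch captures only the cumulative structure of the classes; the actual MOT assessment also considers whether each component is released under one of the MOF’s acceptable open licenses.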
The rise of LLMs has marked one of the most dynamic periods in the history of technology development. The potential benefits are enormous, but the technologies have to be carefully evaluated before being used in mission-critical applications. The MOF and the MOT are indispensable tools in this assessment process. Instead of hit-or-miss exploration, potential users of LLMs now have a standard, well-thought-out process for ensuring that the models they adopt are appropriate for their intended uses.
Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interaction Working Group. She can be reached at dahl@conversationaltechnologies.com.