PaLM 2. GPT-4. The list of text-generating AI is growing practically every day.
Most of these models are hidden behind APIs, so researchers can’t see exactly what drives them. But community efforts are increasingly producing open-source AI that is as sophisticated as, if not more sophisticated than, its commercial counterparts.
The latest of these efforts is the Open Language Model, a large-scale language model slated for release sometime in 2024 by the nonprofit Allen Institute for AI (AI2). Open Language Model, OLMo for short, is being developed in collaboration with AMD and the Large Unified Modern Infrastructure (LUMI) consortium, which provides supercomputing power for training and education, as well as Surge AI and MosaicML, which provide data and training code.
“The research and technology communities need access to open language models to advance this science,” Hanna Hajishirzi, senior director of NLP research at AI2, told TechCrunch in an email interview. “With OLMo, we are working to bridge the gap between public and private research capacity and knowledge by developing a competitive language model.”
One might wonder — this reporter included — why AI2 felt the need to develop an open language model when there are already several to choose from (see Bloom, Meta’s LLaMA, etc.). In Hajishirzi’s view, the previous open-source releases, while valuable and even boundary-pushing, have fallen short in several respects.
AI2 sees OLMo as a platform and not just a model – one that allows the research community to take any component created by AI2 and either use it themselves or try to improve it. Everything AI2 makes for OLMo will be openly available, including a public demo, training dataset and API, according to Hajishirzi, and documented with “very limited” exceptions under “appropriate” licensing.
“We are building OLMo to give the AI research community better access to work directly on language models,” said Hajishirzi. “We believe that the wide availability of all aspects of OLMo will allow the research community to leverage what we are creating and work to improve it. Our ultimate goal is to jointly develop the best open language model in the world.”
The other differentiator of OLMo, according to Noah Smith, senior director of NLP research at AI2, is its focus on enabling the model to leverage and understand textbooks and academic papers better than, say, code. There have been other attempts at this, like Meta’s ill-fated Galactica model. But Hajishirzi believes that AI2’s work in academia and the tools it has developed for research, such as Semantic Scholar, will help make OLMo “uniquely suited” for scientific and academic applications.
“We believe OLMo has the potential to be something really special in this space, especially in a landscape where many are capitalizing on interest in generative AI models,” Smith said. “AI2’s unique ability to act as third-party experts gives us the opportunity to work not only with our own world-class expertise but also to collaborate with the strongest minds in the industry. As such, we believe our rigorous, documented approach will set the stage for building the next generation of safe and effective AI technologies.”
It’s certainly a nice sentiment. But what about the thorny ethical and legal issues surrounding the training — and release — of generative AI? The debate revolves around the rights of content owners (and other affected stakeholders), and countless pressing questions have yet to be resolved in court.
To address concerns, the OLMo team plans to work with AI2’s legal team and to-be-appointed external experts, stopping at “checkpoints” in the modeling process to reassess privacy and intellectual property rights issues.
“We hope that by having an open and transparent dialogue about the model and its intended use, we can better understand how to mitigate bias and toxicity, and shed light on open research questions within the community, ultimately resulting in one of the strongest models available,” Smith said.
What about the potential for misuse? Generative models, which are often toxic and biased to begin with, are vulnerable to bad actors aiming to spread disinformation and generate malicious code.
Hajishirzi said that AI2 will leverage a combination of licensing, model design, and selective access to the underlying components to “maximize scientific utility while reducing the risk of malicious exploitation.” To guide the policy, OLMo has an Ethics Review Committee with internal and external consultants (AI2 wouldn’t say exactly who) who provide feedback throughout the model building process.
We’ll see how much of a difference that makes. A lot is still undecided at the moment — including most of the model’s technical specifications. (AI2 said it will have around 70 billion parameters, parameters being the parts of the model learned from training data.) Training is set to begin in the coming months on LUMI’s supercomputer in Finland — the fastest supercomputer in Europe, as of January.
AI2 is inviting collaborators to contribute to — and critique — the model development process. Interested parties can contact the OLMo project organizers here.