
This is an important step for SiloGen, the company's generative AI arm, and its efforts to strengthen European digital sovereignty and democratize access to large language models (LLMs) for all European languages. The model demonstrates the successful application of a novel method for training LLMs for low-resource languages.

Silo AI and TurkuNLP are building a family of multilingual open source LLMs with the aim of strengthening European digital sovereignty and democratizing access to LLMs. Developing base models that are aligned with European values is critical to this effort, ensuring that they are built on data and information that accurately represent the European Union's diverse languages, citizens, organizations, and cultural landscape. This approach not only aligns with European values, but also allows for sovereignty in how downstream applications and value are created.

Proven approach to building powerful LLMs for low-resource languages

The completion of Poro serves as a proof of concept for an innovative approach to building AI models for low-resource languages. Poro outperforms all existing open language models for Finnish, including FinGPT, Mistral, Llama, and the 176-billion-parameter BLUUMI model.

But the team is not primarily looking for a comparison with other LLMs. In a LinkedIn post, Jukka Korpi responded: "I'm not sure it makes sense to directly compare Poro to these models. I think the question should be how these models can be used cost-effectively in industrial use cases. It's about cost, latency, and the ability to customize the model for a specific language and use case. In some cases, industry will also need smaller models to be able to run inference off the grid." This is the Silo AI approach.

Back to the technology. The success was attributed to pairing the low-resource Finnish language with high-resource languages. The team worked to determine the optimal frequency of data reuse for low-resource languages during training, and incorporated translated paired texts between English and Finnish. This strategy, which relies on a cross-lingual signal to improve the model's understanding of the connections between languages, proved critical in achieving superior performance in low-resource languages without compromising performance in English.
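To make the idea concrete, here is a minimal sketch of how a training mixture of this kind could be assembled. It is not Silo AI's actual pipeline: the corpus sizes, the repetition factor, and the `<|en|>` / `<|fi|>` tags are hypothetical values chosen only for illustration. It shows how repeating a small Finnish corpus raises its share of the training tokens, and how a translated sentence pair can be joined into a single training document so the model sees both languages side by side.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    name: str
    tokens: int    # unique tokens available (hypothetical, in billions)
    epochs: float  # how many times the corpus is repeated during training


def mixture_weights(corpora):
    """Return each corpus's share of the total training tokens."""
    effective = {c.name: c.tokens * c.epochs for c in corpora}
    total = sum(effective.values())
    return {name: count / total for name, count in effective.items()}


def format_translation_pair(english, finnish):
    """Join a translated sentence pair into one training document.

    The language tags are an assumption made for this sketch.
    """
    return f"<|en|> {english}\n<|fi|> {finnish}"


# Hypothetical figures: a large English corpus, a small Finnish corpus
# that is repeated several times, and a small set of translated pairs.
corpora = [
    Corpus("english", tokens=500, epochs=1.0),
    Corpus("finnish", tokens=25, epochs=4.0),
    Corpus("en_fi_pairs", tokens=10, epochs=1.0),
]

weights = mixture_weights(corpora)
pair = format_translation_pair("Good morning.", "Hyvää huomenta.")
```

In this sketch the `epochs` value plays the role of the data reuse frequency mentioned above: raising it gives the low-resource language more weight, at the risk of overfitting to repeated text, which is why the optimal value has to be determined experimentally.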

The completion of Poro exemplifies Silo AI's commitment to advancing AI models for low-resource languages. Releasing Poro as an open source model facilitates widespread access and collaborative improvement, particularly for underrepresented European languages. This approach enriches the AI community, provides a valuable resource for research and development, and reflects a conscious effort to increase linguistic diversity in AI applications. This is the first step in SiloGen's efforts to train state-of-the-art LLMs for all official EU languages.

Silo AI is also coming to HANNOVER MESSE: Jukka Korpi will be speaking at the Industrial Transformation Stage on Tuesday.