Hagen, 1 July 2024

1. GWDG

  • LLM service on HPC systems
  • one of four AI service centers in Germany

Target

  • serve LLMs (ChatGPT and open-source models) via an OpenAI-compatible API for all of Germany (see the client sketch after this list)
  • using HPC systems for scalability
  • first prototype: chat-ai.academiccloud.de
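
Since the service exposes an OpenAI-compatible API, a client call could look roughly like the sketch below; the endpoint path, model name, and key handling are assumptions for illustration, not confirmed details.

```python
# Minimal sketch: querying the Chat AI service through its OpenAI-compatible
# API. Base URL path, model identifier, and key handling are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://chat-ai.academiccloud.de/v1",  # assumed endpoint path
    api_key="YOUR_ACADEMICCLOUD_API_KEY",            # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "What does SAIA offer?"}],
)
print(response.choices[0].message.content)
```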

Implementation

In addition

  • SAIA Hub: platform for further services, e.g. Whisper, RAG, …

Basic architecture

2. HAWKI

  • Interface platform with an API connection to ChatGPT
  • Login via LDAP (see the sketch after this list)
  • One session for the whole team (to use the LLM like a colleague)
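
How the LDAP login step might be implemented is sketched below; the server address, DN pattern, and the choice of the ldap3 library are assumptions, since the notes only state that LDAP is used for registration.

```python
# Hedged sketch: checking a user's credentials against an LDAP directory,
# as HAWKI's registration step might do. Server URL and DN pattern are
# assumptions for illustration only.
from ldap3 import Server, Connection, ALL

def ldap_login(username: str, password: str) -> bool:
    server = Server("ldaps://ldap.example-university.de", get_info=ALL)
    user_dn = f"uid={username},ou=people,dc=example-university,dc=de"
    conn = Connection(server, user=user_dn, password=password)
    return conn.bind()  # True only if the directory accepts the credentials

print(ldap_login("jdoe", "secret"))
```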

3. FLEXI

  • FernUni LLM Experimental Infrastructure
  • based on Ollama (see the query sketch after this list)
  • available models: DBRX, Llama 3, LLaVA, Mistral, Mixtral, Gemma, Phi-3, Command R+
  • Code: https://github.com/zesch/flexi
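
Because FLEXI is based on Ollama, querying one of the listed models could look roughly like the sketch below; the host, port, and model tag are assumptions, as is reliance on Ollama's default REST endpoint.

```python
# Minimal sketch: querying a FLEXI-hosted model through Ollama's REST API.
# Host, port, and model tag are assumptions for illustration.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    json={
        "model": "llama3",                   # one of the models listed above
        "prompt": "Explain in one sentence what FLEXI provides.",
        "stream": False,                     # ask for a single JSON response
    },
    timeout=120,
)
print(resp.json()["response"])
```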

4. Conclusion

  • Everybody wants LLMs in some form
  • Open-source models are the best way forward
  • RAG does not work without hallucinations
  • For big models, a centralized solution would be best (GWDG)
  • Focus on small, individual models (e.g. a chatbot for RZ support, …)