Emerging Large Language Model Architecture

Fortan Pireva

May 27, 2024

Emerging Architectures for LLM Applications

This stack is based on in-context learning, the design pattern most developers start with.

The core idea of this design pattern is to use LLMs off the shelf (without any fine-tuning) and to control their behavior through clever prompting and conditioning on private “contextual” data.

The workflow can be divided into three stages:

  • Data preprocessing/embedding - storing private data so it can be retrieved later. Typically, documents are broken into chunks, passed through an embedding model, and stored in a vector database.
  • Prompt construction/retrieval - a compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs, called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
  • Prompt execution/inference - prompts are submitted to a pre-trained LLM for inference, using either proprietary model APIs or open-source/self-trained models. (Logging, caching, and validation can be added at this step.)
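The three stages above can be sketched end to end. This is a toy illustration, not a specific library's API: a bag-of-words counter stands in for a real embedding model, and a plain Python list stands in for a vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts instead of a learned dense vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stage 1: data preprocessing/embedding - chunk documents and store vectors.
documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes 5 to 7 business days.",
]
index = [(doc, embed(doc)) for doc in documents]

# Stage 2: prompt construction/retrieval - fetch the most relevant chunk
# and combine it with a hard-coded template.
query = "How long do refunds take?"
best_doc = max(index, key=lambda item: cosine(embed(query), item[1]))[0]
prompt = f"Context: {best_doc}\nQuestion: {query}\nAnswer:"

# Stage 3: prompt execution/inference - `prompt` would now be submitted to a
# model API (omitted); logging, caching, and validation would hook in here.
print(best_doc)
```

In a production stack, `embed` would be a call to an embedding model, and `index` would be a vector database query, but the data flow is the same.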

What about agents?

The most important components missing from this reference architecture are AI agent frameworks. AutoGPT, described as “an experimental open-source attempt to make GPT-4 fully autonomous,” was the fastest-growing GitHub repo in history this spring, and practically every AI project or startup out there today includes agents in some form.

Most developers we speak with are incredibly excited about the potential of agents. The in-context learning pattern we describe in this post is effective at solving hallucination and data-freshness problems, making it well suited to content-generation tasks. Agents, on the other hand, give AI apps a fundamentally new set of capabilities: to solve complex problems, to act on the outside world, and to learn from experience post-deployment.
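The agent pattern can be sketched as a loop: the model plans the next action, calls a tool to act on the outside world, observes the result, and repeats. Everything below is illustrative, not any particular framework's API; `plan` is a hard-coded stand-in for an LLM call.

```python
def calculator(expression: str) -> str:
    # A single "tool" the agent can invoke to act outside its prompt.
    # eval is for this toy only; it is unsafe for real user input.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def plan(goal: str, observations: list) -> tuple:
    # Stand-in for the LLM deciding the next action from its observations.
    if not observations:
        return ("calculator", "6 * 7")
    return ("finish", observations[-1])

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):
        action, arg = plan(goal, observations)
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))  # act, then observe
    return "gave up"

print(run_agent("What is six times seven?"))  # → 42
```

Real agent frameworks replace `plan` with a model call, add many tools, and persist observations as memory, but the plan-act-observe loop is the core of the pattern.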

Looking ahead

Pre-trained AI models represent the most important architectural change in software since the internet. They make it possible for individual developers to build incredible AI apps, in a matter of days, that surpass supervised machine learning projects that took big teams months to build.