Thousands of businesses are eager to implement generative AI today, whether out of genuine necessity, pressure from shareholders or simply to check a box. Regardless of the motivation, the onus is on the AI engineer to ensure their company is not merely jumping on the bandwagon but maximising time and resources for immediate ROI. It is paramount that engineers understand the best practices for integrating AI into existing systems to stay relevant amid the ongoing revolution while avoiding a rushed and disastrous rollout.
Laying the groundwork for a successful AI deployment
First and foremost, engineers must help decision-makers understand the capabilities and limitations of generative AI technology. While it is true that large language models (LLMs) are great at text summarisation, intent detection and (given proper context) natural language generation, they are not capable of independent reasoning and logic, nor can they serve as sources of authoritative knowledge. In particular, LLMs are susceptible to hallucinations, a phenomenon in which they respond with invented facts that sound true but are inaccurate.
Engineers can identify products or services where AI can positively impact productivity, efficiency or accuracy. Customer experience (CX) improvement is an ideal low-hanging fruit, with possible use cases including private knowledge-based question-answering bots, AI-generated service analytics summaries and AI-driven user journeys (co-pilot mode).
Moreover, engineers must determine the types of interactions to augment with AI. The option with the highest tolerance for latency is asynchronous text-based communications, such as chat, SMS and social media messengers. Voice, video and hybrid modalities are also valid choices, but these communication channels are more sensitive to latency and require specialised expertise for proper integration and implementation.
Selecting an LLM
One of the early decisions when selecting an LLM, or an orchestrated pipeline of multiple LLMs, is the choice between proprietary and open-source models. Proprietary LLMs are easier to integrate through REST APIs and vendor-provided hosting services, which are well documented and offer a broader set of capabilities out of the box. Their disadvantages, however, include a lack of transparency about the datasets used to train the models, privacy and data leakage concerns, vendor lock-in and the cost of fine-tuning these models at scale.
Alternatively, open-source LLMs offer greater transparency about training datasets and permit engineers to fine-tune models more economically. They do require considerable engineering know-how to integrate and operate, although hosting providers like Azure and Hugging Face, along with an emerging class of LLM-focused IaaS providers like banana.dev, can help. There is still a quality gap with best-in-class commercial LLMs, but it continues to close rapidly, with emerging research showing that smaller LLMs fine-tuned for a specific task can outperform general-purpose LLMs on that task.
Other factors, such as service delivery model, output quality, data context size and inference performance, are also vital for LLM selection. There are four options for service delivery: vendor-hosted LLMs, a private VPC on a cloud provider, a private data centre or edge deployment. Optimising a model’s output quality and performance hinges on several variables, but keep in mind that the larger the model, the greater the response latency. Similarly, because of limits on token-per-second generation, engineers must also consider concurrency across the account, which may require compromising quality for speed or vice versa.
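To make the latency and concurrency trade-off concrete, here is a back-of-the-envelope sketch. It assumes a simple model in which an account-wide tokens-per-second budget is split evenly across concurrent requests; real providers apply their own rate-limiting rules, and every number and function name below is hypothetical.

```python
def estimated_latency_s(output_tokens: int, tokens_per_second: float,
                        time_to_first_token_s: float = 0.5) -> float:
    """Rough latency of one reply: fixed startup delay plus generation time."""
    return time_to_first_token_s + output_tokens / tokens_per_second


def per_request_throughput(account_tokens_per_second: float,
                           concurrent_requests: int) -> float:
    """Assumption: the account budget is shared evenly by concurrent requests."""
    return account_tokens_per_second / max(concurrent_requests, 1)


# Hypothetical budget and reply length, for illustration only.
BUDGET_TPS = 200.0
REPLY_TOKENS = 150

for concurrency in (1, 4, 16):
    tps = per_request_throughput(BUDGET_TPS, concurrency)
    latency = estimated_latency_s(REPLY_TOKENS, tps)
    print(f"{concurrency:>2} concurrent: {tps:6.1f} tok/s each, "
          f"about {latency:.2f}s per reply")
```

Under these assumptions, keeping replies fast at higher concurrency means either shortening the replies or moving to a smaller, faster model, which is the quality-for-speed compromise described above.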
Best development practices to avoid disasters
With the preparatory work complete, engineers can now develop their AI-based solutions. A good starting point is to use LLM sandboxes like OpenAI Playground or Llama2.ai, or purpose-built experimentation platforms like Humanloop, to understand prompt engineering and in-context learning. Once the project is in development, favour toolsets that do not lock you into a single vendor ecosystem. LangChain, for instance, is emerging as a de facto standard framework that abstracts implementation details for various LLM interfaces.
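The idea of avoiding vendor lock-in can be sketched without any particular framework. The example below is not LangChain's API; it is a minimal illustration, with hypothetical class and function names, of application code that depends on a small interface rather than on one vendor's SDK. The two "models" are local stand-ins so the sketch runs without network access.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical minimal interface the application depends on."""

    def generate(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for an adapter around one vendor's SDK."""

    def generate(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class ShoutingModel:
    """Stand-in for an adapter around a different vendor or a self-hosted model."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


def summarise(model: TextGenerator, text: str) -> str:
    # Application logic only knows about the interface, not the vendor.
    return model.generate(f"Summarise in one sentence: {text}")


print(summarise(EchoModel(), "quarterly call volumes"))
print(summarise(ShoutingModel(), "quarterly call volumes"))
```

Swapping vendors then means writing one new adapter class rather than touching every call site.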
If the AI application needs to interface with enterprise datasets, engineers should implement a Retrieval Augmented Generation (RAG) pattern. The LlamaIndex framework can simplify interfacing with various data sources in the AI application stack. Furthermore, a vector database, such as Weaviate, Pinecone or Qdrant, or an extension to a more traditional data engine, like pgvector for PostgreSQL, can provide the semantic search capability required for RAG-enabled applications.
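As a rough illustration of the RAG pattern, the sketch below retrieves the most relevant snippet from a tiny in-memory knowledge base and places it in the prompt. It is a toy under stated assumptions: word-count vectors and cosine similarity stand in for a real embedding model and vector database, and the knowledge base, function names and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (a real system would use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical private knowledge base.
KNOWLEDGE_BASE = [
    "Refunds are processed within 5 business days of approval.",
    "Support is available by chat and SMS from 8am to 8pm on weekdays.",
    "Premium plans include a dedicated account manager.",
]

prompt = build_prompt("How long do refunds take?", KNOWLEDGE_BASE, k=1)
print(prompt)
```

Instructing the model to answer only from retrieved context is one common way to reduce, though not eliminate, hallucinated answers.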
From the beginning of development, engineers must build observability, telemetry and user access controls into the code to safeguard the solution. Likewise, engineers should monitor costs while logging interactions securely and compliantly. Other advantageous habits include splitting the delivery pipeline into smaller, more specialised tasks and implementing a human feedback loop to analyse past interactions, which will improve the quality of datasets used for fine-tuning and retrieval augmentation.
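One possible shape for this kind of instrumentation is a thin wrapper around every model call. The sketch below is an assumption-laden illustration, not a production design: the price constant is invented, token counts are a crude whitespace estimate, and `fake_llm` is a stand-in for a real model client. It logs metadata only, not the raw text, as one simple way to limit exposure of sensitive content.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm.telemetry")

# Hypothetical price for illustration; real pricing varies by vendor and model.
PRICE_PER_1K_TOKENS = 0.002


@dataclass
class UsageTracker:
    calls: int = 0
    total_tokens: int = 0
    total_cost: float = 0.0
    feedback: list = field(default_factory=list)

    def tracked(self, llm: Callable[[str], str]) -> Callable[[str], str]:
        """Wrap a model call so latency, token usage and cost are recorded."""
        def wrapper(prompt: str) -> str:
            start = time.perf_counter()
            reply = llm(prompt)
            latency_ms = (time.perf_counter() - start) * 1000
            # Crude estimate: whitespace-separated words, not real tokens.
            tokens = len(prompt.split()) + len(reply.split())
            cost = tokens / 1000 * PRICE_PER_1K_TOKENS
            self.calls += 1
            self.total_tokens += tokens
            self.total_cost += cost
            # Log metadata only; the prompt and reply text are not written out.
            log.info("call=%d tokens=%d cost=%.6f latency_ms=%.1f",
                     self.calls, tokens, cost, latency_ms)
            return reply
        return wrapper

    def record_feedback(self, call_id: int, helpful: bool) -> None:
        """Store a human rating so past interactions can be reviewed later."""
        self.feedback.append({"call": call_id, "helpful": helpful})


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model client."""
    return "echo: " + prompt


tracker = UsageTracker()
ask = tracker.tracked(fake_llm)
ask("hello world")
tracker.record_feedback(call_id=tracker.calls, helpful=True)
```

The collected feedback records are the raw material for the human review loop described above.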
The AI engineer as the voice of reason
With exciting new companies like OpenAI, Anthropic and Cohere, alongside big names like Google and Microsoft, leading innovation in this space, generative AI will continue to excite business leaders and decision-makers alike. AI engineers must act as the voice of reason in their respective companies. Be careful not to squash enthusiasm, but don’t promise the impossible.
Sergey Galchenko, CTO, IntelePeer