Fear and Loathing in Large Language Models
Why LLMs are being discussed so much, and how they will transform enterprises
[The image above is generated by Midjourney. The prompt I used to create the image is listed at the end of this email.]
Confession: the first draft of this week's article was mind-numbingly dull. It was factual. I checked my spelling and grammar twice and still hated every word. Not because it wasn't informative or because my points weren't made well, but frankly, because it wasn't fun to read. Truth be told, it wasn't all that fun to write.
Here's where I show you how the proverbial sausage is made. To break out of a funk, I often use ChatGPT to help refine my content. I typically start with a prompt like the following:
Help me rewrite my content for a business audience:
- Use good U.S. English grammar, spelling, and punctuation.
- Write in the style of Walt Mossberg.
- Clarify points by using analogies.
- Remove all extraneous language.
Do not start writing yet. Do you understand? My next prompt will include the text to be rewritten.
Once ChatGPT confirms, I paste the content I want to refine into my next prompt. I then take the output, tweak and edit it, and fact-check anything ChatGPT adds that I hadn't included. That's how I restarted this article; I hope the prompt above helps you whenever you get stuck.
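If you would rather script that rewrite step than paste into the ChatGPT interface, the same two-message exchange can be reproduced with a few lines of Python. This is a minimal sketch of the idea, not my actual workflow; it assumes the 2023-era openai package and an OPENAI_API_KEY set in your environment.

```python
# A minimal sketch of scripting the rewrite step, not my actual workflow.
# Assumes the 2023-era openai Python package and an OPENAI_API_KEY
# environment variable; swap in whatever model you have access to.
import openai

REWRITE_INSTRUCTIONS = """Help me rewrite my content for a business audience:
- Use good U.S. English grammar, spelling, and punctuation.
- Write in the style of Walt Mossberg.
- Clarify points by using analogies.
- Remove all extraneous language."""

def rewrite(draft: str) -> str:
    """Send the instructions plus the draft and return the rewritten text."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": REWRITE_INSTRUCTIONS},
            {"role": "user", "content": draft},
        ],
    )
    return response["choices"][0]["message"]["content"]

print(rewrite("Paste the draft text you want to refine here."))
```

Now on to version three of The Artificially Intelligent Enterprise: Fear and Loathing in Large Language Models.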
Large Language Models and Why They Matter
Large Language Models (LLMs) are a hot topic today because they are the engines that power modern AI. However, their advanced capabilities are also causing some concern. OpenAI CEO Sam Altman went to Congress last week to talk about regulation of AI and LLMs. He said that, given the power of such tools, we should perhaps require licensing for significantly large language models.
I won't get into the regulatory issues here. However, it is fair to acknowledge that LLMs possess immense power; some liken them to nuclear weapons in their potential to influence the world. And, like nuclear technology, they can deliver significant benefits if used well, including a substantial increase in human productivity.
Enterprises will harness the power of LLMs. They will take already-trained models and build custom models that fit their unique needs so they can reap the benefits of generative AI. While the initial results from training on readily available data may be satisfactory for general use, the greater value for enterprises lies in domain-specific models. These models use a company's exclusive data to unlock deeper insights and make information more accessible to team members and customers.
They will do this by fine-tuning the models. Fine-tuning is a widely used technique in machine learning, and it is essential when working with LLMs. It involves adjusting the parameters of a pre-trained model so it performs well on a specific task, usually one related to natural language processing.
Here is a general process for fine-tuning an LLM (a short code sketch follows the steps below):
Choose a Pre-trained Model: Start with a pre-trained model to fine-tune. Hugging Face has an extensive directory of models that could be fine-tuned. Each model has its strengths and weaknesses, and the choice of the model may depend on the specific task that a company wishes to accomplish.
Prepare Your Data: The next step is to prepare the training data. This data should be relevant to the task you want the model to perform. For example, to fine-tune a model for sentiment analysis, the dataset should contain text along with sentiment labels. Here's where enterprises can leverage their domain expertise and history: many enterprises have years of customer surveys or reviews that can inform the model. That data should be split into a training set used for fine-tuning and a validation set used to evaluate the model's performance. This step is probably best done by humans with domain expertise; once that knowledge transfer happens, the AI can start to do it on its own.
Preprocess Your Data: Data must be preprocessed before feeding it into the model. This usually involves tokenizing the text (breaking it up into individual words or subwords) and may also involve padding or truncating the sequences so they all have the same length. Again, the specific preprocessing steps may vary depending on the model used.
Fine-Tune the Model: Once the data is prepared and preprocessed, it is time to fine-tune the model. This usually involves setting up a training loop in which training data is fed into the model, the loss is computed (how far the model's predictions are from the correct labels), and the model's parameters are updated to reduce the loss.
Evaluate the Model: After fine-tuning, the model's performance is evaluated on the validation set. This will show how well it will likely perform on new, unseen data. It's important to remember that a model that performs well on its training data might not necessarily perform well on new data, a problem known as overfitting.
Hyperparameter Tuning: Depending on the evaluation results, the model may benefit from hyperparameter tuning. This involves adjusting various settings of the training process (such as the learning rate, batch size, number of training epochs, etc.) to improve the model's performance.
Deploy the Model: Once the model's performance is satisfactory, it can be deployed and used to make predictions on new data.
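To make the steps above concrete, here is a minimal sketch of a sentiment-analysis fine-tune using the Hugging Face transformers and datasets libraries. The model name, public IMDB dataset, and hyperparameters are illustrative assumptions, not recommendations; an enterprise would substitute its own labeled surveys or reviews.

```python
# A minimal sketch, assuming the Hugging Face transformers and datasets libraries.
# The model, dataset, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# 1. Choose a pre-trained model from the Hugging Face hub.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# 2. Prepare your data: a public movie-review dataset stands in for a
#    company's own labeled customer feedback.
dataset = load_dataset("imdb")

# 3. Preprocess: tokenize, pad, and truncate so every example has the same length.
def preprocess(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(preprocess, batched=True)
train_set = tokenized["train"].shuffle(seed=42).select(range(2000))  # small slice for the sketch
eval_set = tokenized["test"].shuffle(seed=42).select(range(500))

# 4. Fine-tune: the Trainer handles the training loop, loss, and parameter updates.
args = TrainingArguments(
    output_dir="sentiment-model",
    num_train_epochs=2,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
)
trainer = Trainer(model=model, args=args, train_dataset=train_set, eval_dataset=eval_set)
trainer.train()

# 5. Evaluate on the held-out validation set to check for overfitting.
print(trainer.evaluate())
```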
Training LLMs is computationally intensive and requires significant resources (e.g., high-end GPUs). That's why starting with a pre-trained model that needs only limited fine-tuning can reduce the overall resources required to bring a model to production. In addition, because LLMs are complex and have many parameters, fine-tuning them requires a good understanding of machine learning principles and practices.
Leading LLMs
Today, the leading model is GPT-4, which most people know from using OpenAI's ChatGPT. However, many other powerful models are freely available for enterprises to download and use. Their state-of-the-art performance on various language tasks comes from their billions of parameters; GPT-4, by comparison, is reported to have over a trillion.
BLOOM: BLOOM stands for BigScience Large Open-science Open-access Multilingual Language Model. It is a transformer-based LLM created by over 1,000 AI researchers to provide a free large language model for public use. The model was trained on around 366 billion tokens from March through July 2022 and uses a decoder-only transformer architecture modified from Megatron-LM GPT-2. BLOOM was trained on data from 46 natural languages and 13 programming languages.
Claude: Anthropic revealed Claude in March 2023 as an advanced AI assistant capable of performing diverse NLP tasks, including code writing, summarization, and question answering. Claude comes in two versions: a full, high-performance model and a faster, lower-quality model known as Claude Instant. Notion, DuckDuckGo, and Quora use Claude in some of their customer-facing services.
LaMDA: LaMDA (Language Model for Dialogue Applications) is a family of conversational LLMs developed by Google. Initially developed and introduced as Meena in 2020, the first-generation LaMDA was announced in 2021, and the second generation was released the following year. In June 2022, it gained widespread attention when a Google engineer claimed the chatbot had become sentient, a claim the scientific community largely rejected. In February 2023, Google announced Bard, a conversational AI chatbot powered by LaMDA, to compete with OpenAI's ChatGPT. The model Google launched this month, PaLM 2 (Pathways Language Model 2), has since replaced LaMDA as the model behind Bard.
LLaMA: LLaMA, short for Large Language Model Meta AI, is a foundational LLM developed by Meta. With 65 billion parameters, this model was designed to help researchers advance their work in AI. Smaller models like LLaMA enable researchers who don't have access to large amounts of infrastructure to study these models, further democratizing access in this fast-changing field. LLaMA is available in several sizes (7B, 13B, 33B, and 65B parameters). The model was trained on a large set of unlabeled data, which makes it ideal for fine-tuning on a variety of tasks.
MT-NLG: The Megatron-Turing Natural Language Generation model (MT-NLG) is one of the largest monolithic transformer-based English language models, with 530 billion parameters. This 105-layer, transformer-based model improves upon prior state-of-the-art models in zero-, one-, and few-shot settings. As a result, it shows exceptional accuracy on a broad set of natural language tasks such as completion prediction, reading comprehension, commonsense reasoning, natural language inference, and word sense disambiguation. Training such a large model was made possible by novel parallelism techniques demonstrated on the NVIDIA DGX SuperPOD-based Selene supercomputer.
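Many of the open models above can be downloaded and queried with only a few lines of code. Here is a minimal sketch, assuming the Hugging Face transformers library and the small, publicly hosted bigscience/bloom-560m checkpoint, a scaled-down sibling of BLOOM chosen so the example runs on modest hardware.

```python
# A minimal sketch, assuming the Hugging Face transformers library and the
# small public bigscience/bloom-560m checkpoint; prompt text is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Large language models help enterprises by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```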
Many other models are available, ranging from large complex models to smaller ones that may be better suited for your specific use case. However, I think what will be exciting this year is how businesses use this transformative technology to do incredibly creative and productive things.
Tip of the Week: Training Your Own Personal Chatbot
Tuning your own AI model might be daunting; it requires some technical ability. However, I found a solution to tune a simple AI model, and it’s very turnkey and requires no significant technical skill. I will warn you that it’s not nearly as sophisticated as most models, but you get to create your own chatbot in a few minutes rather than days or weeks, and the investment is meager. I am not sold on Dante for long-term use as it’s new, but it’s an excellent example of where we are headed.
I envision having my own Amazon Alexa or Siri that understands all my data and can provide ultra-customized answers to my queries: not just the weather in my city, but answers to questions about my company's products and strategy.
Since Dante is a hosted solution, you may want to consider whether the data you give it is confidential or has other security concerns.
What I Read this Week
Anatomy of LLM-Based Chatbot Applications: Monolithic vs. Microservice Architectural Patterns - Marie Stephen Leo on Medium
The thing missing from generative AI is the ‘why’ - VentureBeat
OpenAI Tells Congress the U.S. Should Create AI 'Licenses' to Release New Models - Vice
Facebook Builds Language Models for 1,107 Languages - Meta Research has created models that now support more languages. This is huge, especially for companies whose reach is limited not by geography but by language.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models - A research paper by Shunyu Yao et al.
AI is coming to a business near you. But let's sort these problems first - ZDNet
What I Listened to
Some of the more interesting podcasts on AI I listened to this week.
AI Tools for Enterprises and Business Users
These tools are all very much aimed at creating enterprise and business applications, and most require some degree of technical prowess.
10Web AI-Powered WordPress Platform - This is one of the most impressive AI tools for people to maintain their company websites. It’s pretty incredible. I am currently rebuilding a client website there. I will have more feedback soon.
Dante-AI - Create personal chatbots in five minutes and embed them on your website.
Replit - Replit is an online coding platform that provides a comprehensive coding environment for users to develop, share, and collaborate on projects. It allows users to write and execute code directly within their browser, supports over 50 programming languages, and provides features like version control, code collaboration, and web hosting. Replit also features a built-in package manager, code editor, and debugger, allowing integration of third-party tools and libraries.
Label Studio - Label Studio is valuable for providing labeled data to a machine learning backend. Label Studio is an open source data labeling tool that supports multiple projects, users, and data types in one platform. It allows you to perform different kinds of labeling with many data formats. In addition, Label Studio can integrate with machine learning models to supply predictions for labels (pre-labels) or perform continuous active learning.
Microsoft Guidance - Guidance enables you to control modern language models more effectively than traditional prompting or chaining. Guidance programs allow you to interleave generation, prompting, and logical control in a single continuous flow that matches how the language model processes the text.
LangChain - LangChain is a software development framework designed to simplify the creation of applications using large language models (LLMs). As a language model integration framework, LangChain's use cases largely overlap with those of language models in general.
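As a taste of what a framework like LangChain does, here is a minimal sketch, assuming the 2023-era LangChain Python API and an OpenAI API key in your environment; the prompt and product description are purely illustrative.

```python
# A minimal sketch, assuming the 2023-era LangChain Python API and an
# OPENAI_API_KEY set in the environment; the prompt is illustrative only.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="Write a one-paragraph positioning statement for {product}.",
)
chain = LLMChain(llm=OpenAI(temperature=0.7), prompt=prompt)
print(chain.run(product="an enterprise chatbot trained on internal support tickets"))
```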
For every issue of the Artificially Intelligent Enterprise, I include the Midjourney prompt I used to create this edition's image. For those of you who aren't Hunter S. Thompson nerds like me, Ralph Steadman was a frequent collaborator and illustrator for Thompson's books and articles. Here's how I generated the email header for this edition of the Artificially Intelligent Enterprise.
A scene resembling Fear and Loathing in Las Vegas drawn in the style of Ralph Steadman::8. Make the theme of fear and loathing of Artificial Intelligence with robots in the background::10. --ar 16:9