AI In Plain English, For Normal People. PART 4.c Of 7

LLMs: The Second Stage, the Fine-Tuning

Good morning, friends!

We continue with the best guide for understanding AI and not falling behind in the future!

Last week, we saw the first step in the process of creating a large language model. In this one, we will go through:

  • LLMs: The second stage or the fine-tuning of the base model

And as always, at the bottom, you have a selection of the news of the week to stay updated and spark your curiosity, as well as cool tools and lessons that can enhance your life.

That’s damn right!

The initial stage, or training of the base model, uses A LOT of data. However, it's worth noting that sometimes this data, sourced from the Internet, can be of low quality. 

We are not only living in interesting times but also witnessing a modern 'gold rush' for acquiring top-tier GPUs.

The rule of thumb here is simple: the more data and computing power, the better the model's results and accuracy. It's a case of 'the bigger, the better,' where size is a crucial factor. This is why there was little to no buzz about GPT-1 or GPT-2; we simply lacked the computational capacity a few years ago.

In the second stage, they go for quality over quantity.

The Second Stage, or Fine-Tuning

Now we need to specialise the pre-trained model for a specific task. Here we go; we just found the "P" of ChatGPT: it stands for "Pre-trained," the base model we built in the first stage. The 'specific task' is to be a chatbot for 'general purposes'.

How Is This Achieved?

First, the company that creates the model employs individuals to have conversations with it.

These conversations have to be of high quality, created by following labelled instructions such as "be truthful, harmless, and helpful." These guidelines can run to hundreds of pages, by the way, but you get the idea.

The company will specify how the model should behave; in our case, as an honest, harmless, and helpful assistant. This "HHH" framework, popularised by Anthropic, measures how helpful, honest, and harmless a model is.
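To make this less abstract, here is a tiny sketch of what one of these labelled conversations could look like as data. The field names, format, and wording are all invented for illustration; every company uses its own (much richer) format and guidelines.

```python
# A hypothetical example of one human-written fine-tuning conversation.
# Field names and content are invented for illustration; real datasets
# follow company-specific formats and guidelines hundreds of pages long.
training_example = {
    "system": "You are a helpful, honest, and harmless assistant.",
    "conversation": [
        {"role": "user", "content": "Can I mix these two medications?"},
        {
            "role": "assistant",
            # Written by a paid labeller following the HHH guidelines:
            # helpful (gives direction), honest (admits its limits),
            # harmless (defers to a professional on safety).
            "content": (
                "I can't verify drug interactions reliably. Please check "
                "the leaflet and ask a pharmacist or doctor before mixing "
                "medications."
            ),
        },
    ],
}
```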

A model that meets these standards must be aligned with human preferences through human-generated data (Reinforcement Learning from Human Feedback, or RLHF) in order to meet the needs of users and businesses.

Or, to make it simple, the model should match what people want, using feedback from humans to better serve users and businesses.

Here we introduce another term: alignment. AI alignment research focuses on steering AI systems towards human goals, preferences, and moral values. A misaligned AI system still pursues objectives, just not the ones intended by humans. Basically, let's try not to build robots that are going to evaporate humanity.

Human evaluators label the chatbot's responses according to the HHH framework. They have multiple conversations with the model and correct misbehaviours over time. The corrections made to earlier conversations are fed back in as training data, so the model learns what is correct and what isn't for the next conversations.
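A crude sketch of that feedback loop, with invented data: the model drafts several answers, a human picks the best one, and those choices become a training signal. In full RLHF, thousands of such rankings are used to train a separate "reward model" that scores answers automatically; the toy function below only stands in for that idea.

```python
# A toy sketch of human preference labelling (all data invented).
# In real RLHF, many such rankings train a learned reward model,
# which then scores the chatbot's answers during further training.
candidate_answers = [
    "A GPU is a chip designed to run many calculations in parallel, "
    "which makes it ideal for training neural networks.",
    "idk, just google it",
]

# A human evaluator ranks the candidates against the HHH guidelines.
human_ranking = {
    "chosen": candidate_answers[0],
    "rejected": candidate_answers[1],
}

def toy_reward(answer: str) -> float:
    """Stand-in for a learned reward model: 1.0 = preferred, 0.0 = not."""
    return 1.0 if answer == human_ranking["chosen"] else 0.0

assert toy_reward(candidate_answers[0]) > toy_reward(candidate_answers[1])
```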

If you think about it, this is a very complex process. Learning from feedback usually works when there is a clear goal or reward (e.g., "win this chess game"). With language, however, this is very difficult: everything is open to interpretation, and there is no easy way to define the reward criteria.

By the way, this stage is repeated far more often than the first one. The first stage takes around ten days of nonstop computing, and it is super expensive to run that many GPUs (reports mention 600,000 GPUs for the new GPT-5…). Companies redo it every six months or every year. The fine-tuning stage is cheaper, and they can redo it every week.

Now, the model primarily learns from these new conversation documents. The quantity depends on the resources of the company developing the model.

We train the model on these conversation documents. It learns to follow their style, which is mostly a question-and-answer format, like a helpful assistant. If we ask the model any question, it will answer in the style of that high-quality training (the HHH framework).
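Under the hood, each conversation is usually flattened into one long text sequence with special markers, so the model can learn the question-and-answer rhythm. Here is a toy version; the `<|...|>` markers are made up, and each model family defines its own template.

```python
# A toy sketch of turning a conversation into one training string.
# The <|...|> markers are invented; every model family has its own template.
def to_training_text(conversation):
    return "".join(
        f"<|{turn['role']}|>{turn['content']}<|end|>" for turn in conversation
    )

chat = [
    {"role": "user", "content": "What is fine-tuning?"},
    {"role": "assistant", "content": "Teaching a base model to behave like "
                                     "a helpful assistant."},
]
print(to_training_text(chat))
# <|user|>What is fine-tuning?<|end|><|assistant|>Teaching a base model...
```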

The crazy thing is that the model is still able to use the knowledge it acquired in the first stage.

This is a very simplistic way of explaining it; the fine-tuning system is way more complex. The main goal is to have a chatbot "fine-tuned" for general purposes.

Dive into “Attention Is All You Need,” the original Transformer paper (the neural network architecture used in LLMs), if you want to know more about it.

-

Can these chatbots be more specific? Yes.

For example, a chatbot for medical use can be further trained on medical instructions and example diagnoses.
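The recipe is the same as before; only the data changes. A hypothetical medical training pair might look like this (completely invented, for illustration only):

```python
# A hypothetical domain-specific training example (invented, not real
# medical data). Swapping general conversations for examples like this
# is what specialises the chatbot.
medical_example = {
    "instruction": "Summarise the patient's symptoms and suggest a "
                   "preliminary diagnosis for a doctor to review.",
    "input": "Patient reports fever, dry cough, and fatigue for 5 days.",
    "output": "Symptoms (fever, dry cough, fatigue over 5 days) are "
              "consistent with a viral respiratory infection; recommend "
              "clinical evaluation to rule out influenza or COVID-19.",
}
```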

Another amazing thing is that we can experience an 'easy-to-use' version of this process ourselves. GPTs, the custom chatbots you can build inside ChatGPT, are tailored to a specific purpose or use case.

We will see in the upcoming weeks how to create our own GPTs, but first, let’s learn how to take advantage of our friend, ChatGPT!

See you next week.

Thanks for your time.

My News Picks

  • AI vs AI. Will human creativity be surpassed by AI?

Educational

And that’s all. I hope these insights, news, and tools help you prepare for the future!

Have a really nice week.

Stay kind.

Rafa TV