ChatGPT is undeniably one of the most powerful conversational AI models currently available, built on the GPT-3 family of models, whose largest version has an impressive 175 billion parameters. Parameters are the variables that determine how a model performs its task. They are learned through a process called training, during which the model is shown vast amounts of data and gradually adjusts those variables to make better predictions.
In the case of conversational AI like ChatGPT, the model is trained on massive amounts of text data, such as social media posts, news articles, and books. Broadly, the more data a model is trained on, the more parameters it typically needs to capture the nuances of language and generate coherent responses.
The fact that ChatGPT has 175 billion parameters is a testament to the enormous amount of data it has been trained on and the level of complexity it can handle. With this many parameters, ChatGPT is able to generate responses that are often indistinguishable from those of a human, making it an incredibly valuable tool for various industries and applications.
But the next generation of GPT models, GPT-4, has been rumoured to have as many as 100 trillion parameters, though OpenAI has not confirmed its actual size. Whatever the real number turns out to be, the jump in scale will likely bring even more impressive performance and push the boundaries of what is currently possible in the realm of conversational AI.
Understanding parameters in AI models is crucial because they directly impact the quality of the model’s output. More parameters typically lead to better performance, but they also require more computational power and data to train. As AI models become increasingly complex, it’s important to have a basic understanding of parameters so that we can fully appreciate the capabilities of these powerful tools.
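To make "parameter" concrete, here is a minimal Python sketch, nothing like how ChatGPT is actually trained, just the same idea at toy scale: a two-parameter model y = w·x + b whose parameters w and b are learned from data by gradient descent.

```python
# Parameters are the numbers a model learns from data.
# Toy example: learn w and b so that w * x + b fits the data.
# ChatGPT has 175 billion such numbers instead of two.

data = [(1, 3), (2, 5), (3, 7), (4, 9)]  # points on y = 2x + 1

w, b = 0.0, 0.0   # the model's two parameters, starting at arbitrary values
lr = 0.02         # learning rate: a training setting, not a learned parameter

for _ in range(2000):  # training: repeatedly nudge parameters to reduce error
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges close to the true values 2 and 1
```

The only thing "learned" here is the value of w and b; everything else (the learning rate, the number of steps) is chosen by hand.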
1/ A simple analogy to explain what a parameter is: imagine you’re baking a cake. The recipe provides you with a set of instructions.
• How much flour
• How much sugar and milk
• How many eggs
These ingredients and their quantities are the data used to train the model.
But you might need to make adjustments to the recipe to bake your perfect cake.
• Adjust the oven temperature
• Change the baking time
• Add more sugar
These adjustments you make to the recipe are like the parameters in a machine learning algorithm.
• The recipe is the algorithm
• The recipe adjustments are the parameters
It’s how you turn the ‘knobs’ or ‘dials’ to tweak how the model behaves.
Because we all want to bake the perfect cake (or improve the model’s performance).
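To see how counts like 175 billion arise, here is a short sketch of how parameters add up in a neural network. The layer sizes below are made up for illustration; real models stack many more, much wider layers.

```python
# How parameter counts add up: a "dense" layer connecting
# n inputs to m outputs has n * m weights plus m biases,
# and every one of them is a knob that training turns.

def layer_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

# A toy three-layer network (sizes chosen arbitrarily):
sizes = [256, 512, 512, 10]
total = sum(layer_params(a, b) for a, b in zip(sizes, sizes[1:]))
print(total)  # 399370 parameters for this tiny model
```

Scale the same arithmetic up to transformer-sized layers and the billions appear quickly.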
2/ Too many vs too few parameters
A model’s performance is often evaluated based on its ability to make accurate predictions on unseen data.
Think of unseen data as a new chocolate cake recipe handed to you by a friend that you’ve never seen before.
You use your previous experience to adapt to the new recipe. More parameters can allow a model to capture more patterns. But too many can be a problem. Fiddle with the dials too much and the oven gets too hot; you risk burning the cake.
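A toy illustration of the "too many knobs" problem, using two hypothetical models: a lookup table with one parameter per training example scores perfectly on data it has seen but has nothing to say about unseen inputs, while a simple two-parameter line generalises.

```python
# A model with too many parameters can simply memorise.
# "Lookup" model: one stored value per training example.
# Perfect on seen data, useless on anything new.

train = {1: 3, 2: 5, 3: 7}  # points on y = 2x + 1

lookup = dict(train)  # one "parameter" per example

def lookup_model(x):
    return lookup.get(x)  # no answer for unseen inputs

def two_param_model(x):   # w=2, b=1, learned from the same data
    return 2 * x + 1

print(lookup_model(2))      # 5    -- perfect on seen data
print(lookup_model(10))     # None -- fails on unseen data
print(two_param_model(10))  # 21   -- generalises to unseen data
```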
3/ Overfitting vs underfitting
Overfitting happens when a model becomes too complex and learns the training data too closely, memorising its quirks instead of the general patterns.
If a model is trained heavily on legal documents, it may become too specialized in legal jargon and struggle to generate text on sports or fashion.
In cake terms: if you tune your recipe to every quirk of the cakes you’ve already baked, you get mixed signals about what actually makes a good cake.
The results may not be reliable.
On the other hand, you might underfit the data.
Underfitting happens when the model is too simple or not trained enough, so it fails to capture the patterns in the data.
This makes it unable to generalise to new, unseen data.
In the case of our cake-baking analogy, underfitting could happen if you only bake one or two cakes.
With too little data, you might miss important insights that could lead to baking better cakes.
You need to find the right balance between overfitting and underfitting to achieve the best possible result.
You need to bake enough cakes to capture patterns about what makes a delicious cake.
Without overdoing it & getting lost in the weeds by adding that extra gram of sugar.
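That balance can be sketched numerically. Below, three hypothetical models (an underfit constant, a good two-parameter fit, and an overfit memoriser) are scored on training data versus unseen validation data. The data points are made up to follow y = 2x + 1 with a little noise.

```python
# Under- vs overfitting, measured the usual way: compare error on
# training data against error on unseen (validation) data.

train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8)]
valid = [(5, 11.1), (6, 12.9)]  # unseen data: the friend's new recipe

def err(model, data):
    return sum(abs(model(x) - y) for x, y in data) / len(data)

underfit = lambda x: 6.0             # one parameter: a constant guess
good_fit = lambda x: 2 * x + 1       # two parameters: the real trend
memo = {x: y for x, y in train}
overfit = lambda x: memo.get(x, 0.0) # memorises training points only

for name, m in [("underfit", underfit), ("good", good_fit), ("overfit", overfit)]:
    print(name, round(err(m, train), 2), round(err(m, valid), 2))
```

The overfit model has zero training error but the worst validation error; the good fit has small error on both. That gap between training and validation performance is how over- and underfitting show up in practice.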
4/ Current trends
Models are getting smaller.
GPT-3 was 175B parameters.
InstructGPT matched it with just 1.3B parameters; human raters actually preferred the smaller model’s outputs.
Eventually, you’ll be able to shrink and swap language models down to just a few hundred million parameters.
This will enable us to run models at the edge, as @EMostaque puts it.
Removing the boundaries between humans and computers.
Your model will have your own knowledge.
You’ll create your own world.
Each of our worlds/models will interact with one another.
So what’s the core takeaway?
Bigger models do not necessarily mean better outcomes.
It’s about having better data, which leads to better outcomes.
Fine-tuning with human feedback helps align language models with human intent.