Machine Learning and Artificial Intelligence (first step)

Let's understand the amazing "training" of AI models!


How does AI understand my prompt?

Have you ever wondered how these AI tools seem to possess a profound understanding of our questions, surpassing even our closest friends? I had this question in my mind: how exactly do these AI tools take a prompt (user input) and give a result in no time, understanding every crux of the question? How? What happens behind the scenes when the prompt is passed into that search box? How does it even understand a mixture of languages like Hindi and English? And what exactly happens to that "question/prompt"? Is it converted to binary or machine code, like in code compilers and interpreters, or is some other sort of algorithm used?

My curiosity led me to scour the internet to get my "how" and learn the jargon of this AI world, and here I am with some more buzzy words: Large Language Model, Deep Learning, Neural Network, learning models, etc. These are all answers to my "how". So let's dive deep into the world of AI and understand the behind-the-scenes of this movie.

An easy Analogy for AI, ML, NN & DL

[Figure: Artificial Intelligence, Machine Learning, Neural Networks & Deep Learning in an easy analogy.]

  • AI is like a computer or a robot that can think and make decisions, a smart buddy that helps you with tricky problems.

  • Machine learning is teaching a computer to learn things on its own, learning from mistakes: if you touch something hot and it hurts, you learn not to do it again.

  • Neural networks are like a team of tiny robots/computers working together inside the main computer. These tiny robots are like our brain cells. They talk to each other and help the computer understand things better.

  • Deep learning is like having a super big team of those tiny robots inside the computer, organised in many layers.

From a hierarchical point of view, AI is the broadest field, encompassing a wide range of techniques and approaches for creating intelligent systems. Machine learning is a subset of AI; as the name suggests, we teach machines how to learn from enormous amounts of data via specific techniques like supervised learning, unsupervised learning, semi-supervised, self-supervised, reinforcement and transfer learning. Don't worry, we will cover all these buzzy words later in this section. But first, a noob question:

Is our Prompt Compiled or Interpreted?

Easy answer: it's neither compiled nor interpreted. AI models generate responses based on patterns they have learned from vast amounts of text data during their training. Here's a simple workflow:

  1. Prompt: The user provides a prompt to the AI model.

  2. Pattern recognition: The AI model analyzes the text in the prompt and tries to recognize patterns and relationships. It uses what it learned during training to understand the context and meaning of our prompt.

  3. Prediction and generation: Based on the patterns it recognizes, the AI model generates the response, one likely piece of text at a time (a minimal sketch follows this list).
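To make that workflow tangible, here is a minimal sketch using the small open-source GPT-2 model via the Hugging Face transformers package (my assumption; it is not ChatGPT, just the same next-token idea at a much smaller scale):

```python
# Prompt in -> learned patterns -> generated continuation.
# Assumes `pip install transformers torch`; downloads the small GPT-2 model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Machine learning is"
result = generator(prompt, max_new_tokens=20, do_sample=True)

# The model extends the prompt by repeatedly predicting the next token
# based on patterns it picked up during training.
print(result[0]["generated_text"])
```

Notice that nothing is compiled or interpreted here: the prompt is simply turned into tokens and passed through the trained network.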

So the whole catch is in the training. What sort of cool "training" did it receive?

Large Language Model (LLM)

Before going further, let's clear up this buzzy term "LLM"; it will help us later.

LLMs, or Large Language Models, are tools created by humans to assist with tasks involving language. They can be incredibly useful for things like chatbots, virtual assistants, content generation, and more.

These models use complex algorithms and patterns to generate responses. When we ask a question or give it a prompt, the LLM uses its vast knowledge of language to provide a response that it thinks makes sense based on what it learned during training. LLMs are trained on massive amounts of text from the internet, books, articles, and more to learn about how language works and how words and sentences are used.

This is how AI understands Hinglish (Hindi and English): these models are trained on a wide range of data from various sources, including multilingual content, so they recognise the patterns, form relationships across languages and provide appropriate results accordingly.

General Training of the Model

  1. Data collection: A large dataset is collected from many different sources; the model will later iterate over it to recognise patterns, respond and make predictions. DATA IS THE NEW OIL! I can feel it in my heart.

  2. Tokenization: This huge dataset is broken down into smaller and smaller pieces based on words and punctuation marks. Every word and punctuation mark is treated as a token.

  3. Training of the model: The model is trained on how exactly words and tokens relate to each other. It learns which words commonly come one after another, how a sentence is formed, and which sort of word is likely to come next in a given context.

  4. Weights: The model's weights, its trainable numbers, are adjusted so that tokens which are more likely to appear in a given context receive higher scores.

  5. Iteration: It's not a one-shot process; training involves multiple iterations, or epochs. In each epoch the model goes through the dataset again, refining its understanding every time to give the best possible responses.

  6. Validation: To check that everything is going fine during training, some data is set aside as a validation set, and the model's performance is measured on this held-out data (a toy sketch covering steps 1-6 follows this list).

  7. Fine-tuning: Fine-tuning on a specific task or language makes the model more accurate and specialized. Baby, I'm perfect!

  8. Deployment: After testing, the model is finally deployed for users.
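Here is a deliberately tiny, dependency-free Python sketch of steps 1-6: it tokenizes a made-up toy corpus, "trains" by counting which word follows which (those counts standing in for weights), and validates on a held-out sentence. Real LLMs use neural networks, subword tokens and many epochs, but the flow is the same.

```python
# Toy next-word model: tokenize, count follower frequencies, validate.
import re
from collections import Counter, defaultdict

# Step 1. Data collection (a tiny made-up corpus standing in for "the internet")
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
held_out = "the cat sat on the rug"  # kept aside for validation (step 6)

def tokenize(text):
    # Step 2. Tokenization: split into word and punctuation tokens
    return re.findall(r"\w+|[^\w\s]", text.lower())

# Steps 3-4. Training: count which token follows which; the counts act as weights
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = tokenize(sentence)
    for current, nxt in zip(tokens, tokens[1:]):
        next_word_counts[current][nxt] += 1
# Step 5 (iteration) would mean repeating this over the data for many epochs.

def predict_next(word):
    # Pick the highest-weight follower seen during training
    followers = next_word_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

# Step 6. Validation: how often do we guess the next word of the held-out text?
tokens = tokenize(held_out)
pairs = list(zip(tokens, tokens[1:]))
hits = sum(predict_next(cur) == nxt for cur, nxt in pairs)
print(f"validation accuracy: {hits}/{len(pairs)}")
print("after 'the' comes:", predict_next("the"))
```

Swap the counting for a neural network with billions of weights and the toy corpus for a big chunk of the internet, and you have the general shape of LLM training.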

Training a model on such a huge dataset is indeed expensive; running ChatGPT has been estimated to cost around $100,000 per day. As far as scale is concerned, according to reports, training GPT-3, with its 175 billion parameters (the trainable weights inside the model), required a substantial investment in both budget and computing resources. Crazy!


OpenAI

Hold on. So to build on such a model, do we really need to go through this resource-intensive procedure all over again? No, no: here come the APIs, offered by OpenAI, the company behind GPT, among others.

OpenAI and other organizations that develop advanced AI models invest a significant amount of time, resources, and expertise to train these models on massive datasets. Once these models are trained, they can be accessed by developers and businesses through APIs (Application Programming Interfaces).

Developers use these APIs to integrate the capabilities of the pre-trained models into their own applications and workflows. This approach offers several advantages, like efficiency, accessibility, scalability and consistency, leading to rapid development.

Here are some of the top APIs:
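For a taste of what calling such an API looks like in practice, here is a minimal sketch using OpenAI's official openai Python package (v1-style client); the model name is illustrative and you need your own API key in the OPENAI_API_KEY environment variable:

```python
# Minimal sketch of calling a pre-trained model through the OpenAI API.
# Assumes `pip install openai` (v1+ client) and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": "Explain tokenization in one line."}],
)

# The heavy lifting (training) already happened on OpenAI's side;
# we just send a prompt and read back the generated text.
print(response.choices[0].message.content)
```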


Supervised vs Prompt-based Learning Models

I was watching a video by Andrew Ng (founder of DeepLearning.AI) in which he said that a supervised learning project can take up to 7 months (labelling, training and deploying), whereas a newer approach a lot of companies are adopting is prompt-based model building, which is relatively faster with comparable effectiveness. Let's study both approaches in detail:

Key difference:

  • Supervised learning = labelled data

  • Prompt-based model = language understanding

| Aspect | Supervised learning | Prompt-based training (e.g. GPT-3) |
| --- | --- | --- |
| Data | A large dataset is labelled (say, cat vs non-cat images), and the model is trained to recognise cats accordingly. Labelling data is time- and cost-intensive. | Instead of labelling data, prompt-based models like GPT-3 are trained on massive amounts of unlabelled data from the internet. |
| Training | Once the data is labelled, it is fed to a machine learning model such as a neural network, which learns to map inputs to outputs by spotting patterns. | Training involves predicting the next word in a sentence from the previous words across huge amounts of text. This unsupervised setup doesn't require explicit input-output mapping. |
| Effort | Requires fresh data labelling and model training for each new task. | Pre-training is indeed time-consuming, but it's a one-time effort; once trained, the model can be used for many tasks. |
| Flexibility | Task-specific: new labelled data is needed for every task. | Versatile: can perform a wide variety of tasks through prompting alone. |
| Examples | Spam email detection, handwriting recognition, sentiment analysis, language translation, image classification, etc. | Content generation, question answering, language translation, chatbots, code generation, etc. |
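To make the left-hand column concrete, here is a minimal supervised-learning sketch with scikit-learn (assuming it is installed); the tiny labelled dataset is invented purely for illustration:

```python
# Supervised learning in miniature: labelled examples in, a predictive model out.
from sklearn.linear_model import LogisticRegression

# Labelled data: [weight_kg, whisker_length_cm] -> 1 = cat, 0 = not a cat
X = [[4.0, 7.0], [3.5, 6.5], [30.0, 0.5], [25.0, 1.0], [4.5, 8.0], [40.0, 0.2]]
y = [1, 1, 0, 0, 1, 0]

model = LogisticRegression()
model.fit(X, y)  # the model learns the input -> label mapping

# Predict the label of an unseen animal
print(model.predict([[5.0, 7.5]]))  # expected: [1], i.e. cat-like features
```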

So how many kinds of learning models are out there, exactly? Let's take a deeper look at the prominent ones.

Different Learning Models:

| Model | What it learns | Examples |
| --- | --- | --- |
| Supervised learning | Trained on labelled data where each input is mapped to a target output. | Image classification, speech recognition, natural language processing, regression tasks, etc. |
| Generative AI | Aims to generate new data samples that are similar to the training data. | Image and text generation, style transfer, creating synthetic data. |
| Unsupervised learning | Works on unlabelled data, spotting patterns and seeking to uncover hidden relationships. | Clustering similar documents, anomaly detection, feature extraction. (ChatGPT's pre-training is based on this.) |
| Reinforcement learning | Focuses on decision making in an environment; agent-based learning. | Robotics, game playing, recommendation systems, optimizing processes. |

In short:

  • Supervised learning is about learning from labelled data to make predictions or classifications.

  • Generative AI is about creating new data samples that resemble the training data.

  • Unsupervised learning is about finding patterns and structures in unlabelled data.

  • Reinforcement learning is about learning to make sequences of decisions to maximize rewards in dynamic environments.
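For contrast with the supervised sketch above, here is an equally small unsupervised-learning sketch (scikit-learn again, with made-up points): no labels are provided, and the algorithm groups similar points on its own.

```python
# Unsupervised learning in miniature: no labels, structure is found automatically.
from sklearn.cluster import KMeans

# Unlabelled 2-D points forming two rough blobs
X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
     [8.0, 8.2], [7.9, 8.1], [8.2, 7.8]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each point gets a cluster id the model invented by itself
print(kmeans.labels_)
```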

Talking about ChatGPT

ChatGPT builds on GPT (Generative Pre-trained Transformer) and is based on unsupervised learning using a deep learning model called a Transformer. The Transformer architecture is a key component of ChatGPT, allowing it to understand and generate text through unsupervised learning from vast amounts of text data.

Short explanation:

  1. Transformer Architecture: The Transformer architecture is the foundation of models like GPT-3 and ChatGPT. It uses unsupervised learning techniques to pre-train on large text corpora, allowing the model to learn grammar, facts, and language patterns from the text. The Transformer's self-attention mechanism is a crucial aspect of this architecture, enabling it to capture context and relationships between words (see the small sketch after this list).

  2. Unsupervised Learning: ChatGPT, like its predecessors, is primarily trained using unsupervised learning. During pretraining, it learns to predict the next word in a sentence based on context, without explicit supervision. This process involves massive amounts of text data from the internet, allowing the model to gain broad language understanding.
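For the curious, here is a tiny NumPy sketch of the scaled dot-product self-attention idea at the heart of the Transformer. The vectors are random stand-ins for learned embeddings; a real model learns the query/key/value projections and stacks many such layers.

```python
# Scaled dot-product self-attention in miniature (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4, 8                 # 4 tokens, 8-dimensional vectors

Q = rng.normal(size=(seq_len, d))   # queries
K = rng.normal(size=(seq_len, d))   # keys
V = rng.normal(size=(seq_len, d))   # values

scores = Q @ K.T / np.sqrt(d)                   # how strongly each token attends to each other token
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
output = weights @ V                            # context-mixed representation per token

print(weights.round(2))   # each row sums to 1: one attention distribution per token
print(output.shape)       # (4, 8): a context-aware vector for every token
```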

Evolution of GPT (by GPT)

An important term here is "parameters": the trainable variables, or weights, that the model uses to make predictions and recognize patterns in data. These parameters are essential to the model's ability to learn and generate text effectively.
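As a rough illustration of what counts as a parameter, here is a small PyTorch sketch (assuming torch is installed; the layer size is arbitrary):

```python
# "Parameters" are just the trainable numbers inside the model.
import torch.nn as nn

layer = nn.Linear(in_features=768, out_features=768)  # one small layer

n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 768*768 weights + 768 biases = 590,592 parameters
# GPT-3 stacks enough layers like this to reach ~175,000,000,000 parameters.
```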

  1. GPT-1 (June 2018):
  • GPT-1, or simply GPT, was introduced by OpenAI in June 2018.

  • It was the first model in the GPT series and had 117 million parameters.

  • GPT-1 demonstrated the power of unsupervised pre-training on large text corpora for various NLP tasks.

  2. GPT-2 (February 2019):
  • GPT-2, unveiled in February 2019, was a significant leap in terms of scale and capabilities.

  • It had 1.5 billion parameters but was initially deemed too "dangerous" to release publicly due to concerns about generating potentially harmful content.

  • OpenAI released GPT-2 in stages through 2019, starting with scaled-down versions and gradually increasing the model size, with the full model made public in November 2019.

  3. GPT-3 (June 2020):
  • GPT-3, introduced in June 2020, marked a substantial increase in scale, with 175 billion parameters.

  • It demonstrated remarkable language understanding and generation abilities across a wide range of tasks, including translation, text completion, and question-answering.

  4. GPT-3.5 (January 2022)

    • Released in January 2022, it improved text generation quality and how reliably the model follows instructions in prompts.

    • GPT-3.5 is an upgraded version of GPT-3 (reportedly with fewer parameters).

    • GPT-3.5 adds a fine-tuning stage that uses human feedback (RLHF, reinforcement learning from human feedback) to improve accuracy and effectiveness.

  5. GPT-4 (March 2023)

    • GPT-4 is available through the paid chatbot product ChatGPT Plus and OpenAI's API.

    • OpenAI has not officially disclosed GPT-4's size; it is rumoured to have on the order of one trillion parameters.

    • With tools such as browsing it can work with more recent information, whereas GPT-3/3.5's base training data only goes up to 2021.

    • GPT-4 can accept images as input, while GPT-3.5 only accepts text.

    • GPT-4 has a much larger context window (its "short-term memory") than GPT-3.5, reportedly around 64,000 words versus around 8,000 words. Such limits are actually measured in tokens, as the sketch below shows.
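Here is a minimal sketch using OpenAI's tiktoken package (assumed installed) to count the tokens in a prompt; the 8,000-token limit used below is just an illustrative placeholder, not an official figure.

```python
# Counting tokens to see whether a prompt fits in a model's context window.
# Assumes `pip install tiktoken`; the limit below is illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI chat models

prompt = "How does AI understand my prompt? " * 100
n_tokens = len(enc.encode(prompt))

ILLUSTRATIVE_LIMIT = 8_000
print(n_tokens, "tokens;", "fits" if n_tokens <= ILLUSTRATIVE_LIMIT else "too long")
```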



"I genuinely hope you have learned something new in the domain of AI, and now you can further expand your knowledge by GPTing :)"