In-context learning: the magic of AI that "learns" on the spot
Discover how language models learn on the go and why this changes the rules of the game
Have you ever wished for a superintelligent assistant that could learn any task in seconds? Well, it turns out AI can already do something like that. Welcome to the fascinating world of in-context learning (ICL).
If you remember, I recently talked about a fascinating scientific article titled "The Prompt Report: A Systematic Survey of Prompting Techniques". Today we'll look at one of the most fundamental techniques it describes: in-context learning (ICL).
What exactly is in-context learning (ICL)?
ICL refers to the ability of large language models (LLMs) to learn tasks simply by providing examples within the prompt, without needing to update their parameters.
Imagine you have a robot that has never cooked before. Instead of reprogramming it from scratch, you just show it a couple of recipes and, just like that, it can prepare a gourmet feast. That's ICL in action.
For example, imagine you want an LLM to classify movie reviews as positive or negative. You could use ICL like this:
Classify the following movie reviews as positive or negative:
Review: "This movie was incredible, kept me on the edge of my seat."
Classification: Positive
Review: "What a waste of time, the plot was confusing and the acting terrible."
Classification: Negative
Review: "Can't wait to see the sequel, it was spectacular!"
Classification: Positive
Now classify this one:
Review: "The movie had potential, but the execution left much to be desired."
Classification:
Notice how we give an instruction, provide a few examples, and finally ask it to classify the last review.
Interestingly, when I gave this prompt to ChatGPT, its response was:
Review: "The movie had potential, but the execution left much to be desired."
Classification: Negative
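The prompt above can also be assembled programmatically, which is handy once you have many labeled examples. Here's a minimal sketch; the function name `build_icl_prompt` is illustrative, not a library API:

```python
# Minimal sketch: assemble the few-shot sentiment-classification prompt
# from (review, label) pairs. The reviews are the ones from the article.

def build_icl_prompt(examples, query):
    """Turn labeled demonstrations plus a new review into an ICL prompt."""
    lines = ["Classify the following movie reviews as positive or negative:", ""]
    for review, label in examples:
        lines.append(f'Review: "{review}"')
        lines.append(f"Classification: {label}")
        lines.append("")
    lines.append("Now classify this one:")
    lines.append(f'Review: "{query}"')
    lines.append("Classification:")  # left open for the model to complete
    return "\n".join(lines)

examples = [
    ("This movie was incredible, kept me on the edge of my seat.", "Positive"),
    ("What a waste of time, the plot was confusing and the acting terrible.", "Negative"),
]
prompt = build_icl_prompt(
    examples,
    "The movie had potential, but the execution left much to be desired.",
)
print(prompt)
```

The prompt deliberately ends at "Classification:" so the model's most natural continuation is the label itself.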
Key Components of ICL
Examples: These are the cases we provide in the prompt. They can be question-answer pairs, task demonstrations, or any other format.
Instructions: Often, in addition to examples, we include explicit instructions about the task to perform.
Types of ICL Based on Number of Examples
Few-shot prompting: We provide several examples (typically 2 to 10):
Translate the following words to French:
Cat: Chat
Dog: Chien
House: Maison
Sun: Soleil
Now translate: Moon:
One-shot prompting: We provide just one example:
Complete the analogy:
Big is to small as tall is to short.
Now complete:
Fast is to slow as hot is to:
Zero-shot prompting: We don't provide any examples, just instructions:
Generate an advertising slogan for a new organic coffee brand.
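The three variants really differ only in how many demonstrations you include. A small sketch makes that explicit: one builder, with the shot count `k` as a knob (the function name and format are illustrative):

```python
# Sketch: zero-, one-, and few-shot prompts from the same builder,
# varying only how many demonstration pairs (k) are included.

def k_shot_prompt(instruction, demos, query, k):
    parts = [instruction]
    for source, target in demos[:k]:  # k = 0 yields a zero-shot prompt
        parts.append(f"{source}: {target}")
    parts.append(f"Now translate: {query}:")
    return "\n".join(parts)

demos = [("Cat", "Chat"), ("Dog", "Chien"), ("House", "Maison"), ("Sun", "Soleil")]
instruction = "Translate the following words to French:"
few = k_shot_prompt(instruction, demos, "Moon", k=4)   # few-shot
zero = k_shot_prompt(instruction, demos, "Moon", k=0)  # zero-shot
print(few)
print(zero)
```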
Factors Affecting Few-shot Prompting Performance
Have you ever noticed how AI sometimes seems to understand everything and other times doesn't understand anything? Well, these factors are the key to solving this mystery:
Number of examples: More isn't always better. The optimal number varies depending on the task and model. For instance, for simple tasks, 3-5 examples might be sufficient, while for more complex tasks, you might need 10 or more.
Order of examples: Surprisingly, the order can significantly affect performance. Think of it like teaching a child: you start with the basics and gradually increase complexity. This same principle can help our AI learn more effectively.
Label distribution: It's important to maintain a balance in the classes or categories of examples. For instance, if you're classifying sentiments, make sure to include a similar number of positive and negative examples. After all, we don't want our AI becoming a pessimist, do we?
Label quality: Interestingly, some studies suggest that label accuracy might not be critical in all cases. However, for high-precision tasks, it's better to use correctly labeled examples.
Example format: How we present the examples matters. "Q: {question} A: {answer}" is common, but not always optimal. Experiment with different formats to see what works best for your specific task. It's like finding the perfect hairstyle for your AI.
Example similarity: Generally, examples similar to the target task work better, but sometimes diversity can be beneficial. For instance, when translating, using examples from different semantic fields can improve the model's robustness.
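Two of these factors, label balance and example order, are easy to enforce in code. A rough sketch, where review length stands in as a made-up proxy for "difficulty":

```python
from collections import Counter

# Sketch: balancing the label distribution and controlling example order.
# Using text length as a difficulty proxy is a crude illustrative heuristic.

def balance_labels(examples):
    """Keep the same number of examples per label."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    n = min(len(group) for group in by_label.values())
    return [ex for group in by_label.values() for ex in group[:n]]

def order_simple_first(examples):
    """Order demonstrations shortest-first, a rough easy-to-hard ordering."""
    return sorted(examples, key=lambda ex: len(ex[0]))

examples = [
    ("Great!", "Positive"),
    ("Loved every minute of it.", "Positive"),
    ("A masterpiece of modern cinema.", "Positive"),
    ("Terrible.", "Negative"),
    ("Dull plot and wooden acting.", "Negative"),
]
balanced = balance_labels(examples)
print(Counter(label for _, label in balanced))  # two examples per label
print([text for text, _ in order_simple_first(balanced)])
```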
Advanced Few-shot Prompting Techniques
Now let's get to the really exciting part. Think of these techniques as advanced magic tricks in an AI expert's repertoire:
K-nearest neighbor (KNN): dynamically selecting examples
This technique goes beyond static few-shot prompting by selecting examples dynamically for each new task. Here's how it works:
You have a large database of labeled examples (much larger than what would fit in a normal prompt).
For each new task, an algorithm analyzes its content and searches the database for the K most semantically similar examples.
Only these K most relevant examples are included in the prompt along with the new task.
The innovative thing here is that the prompt is constructed uniquely for each task, instead of always using the same examples.
For instance, imagine you have thousands of product reviews in your database. If you want to classify a new review about a smartphone, the system would automatically select other smartphone reviews for context, rather than reviews about cookware or clothing. It's like having an expert who knows exactly which past experiences are most relevant to the current situation.
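The selection step can be sketched in a few lines. Real systems use learned embeddings; here a bag-of-words representation with cosine similarity stands in, and the tiny "database" of product reviews is made up for illustration:

```python
from collections import Counter
import math

# Sketch of KNN example selection: for a new query, retrieve the k most
# similar labeled examples from a database. Bag-of-words cosine similarity
# is a simplification; production systems use semantic embeddings.

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_examples(database, query, k):
    """Return the k labeled examples most similar to the query."""
    q = vectorize(query)
    return sorted(database, key=lambda ex: cosine(vectorize(ex[0]), q),
                  reverse=True)[:k]

database = [
    ("The phone battery lasts two full days.", "Positive"),
    ("This frying pan scratches too easily.", "Negative"),
    ("The phone screen cracked after a week.", "Negative"),
    ("The jacket fits perfectly and looks great.", "Positive"),
]
selected = knn_examples(database, "The phone camera takes blurry photos.", k=2)
print([text for text, _ in selected])  # the two phone reviews rank highest
```

Those selected examples would then be formatted into the prompt alongside the new review, so each prompt is tailored to its query.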
Vote-K: democracy meets AI... prompting goes to the polls!
This is a two-stage method for selecting and labeling useful examples:
First stage: The model proposes unlabeled examples that could be useful for the task.
Second stage: A human annotator labels these examples.
The labeled examples are then used for prompting.
Example: Let's say you want to classify tweets about climate change. In the first stage, the model might propose relevant tweets:
1. "Global temperatures have risen 1.1°C since the pre-industrial era."
2. "Just bought an electric car to reduce my carbon footprint."
3. "Sea levels are rising at an alarming rate."
Then, a human would label these tweets (for example, as "scientific fact," "personal action," "consequence"), and these labeled examples would be used in the final prompt. Think of it as the perfect partnership between human expertise and AI capabilities!
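A much-simplified sketch of that two-stage flow: stage 1 picks a diverse set of unlabeled candidates (here via greedy selection that avoids near-duplicates), and stage 2, the human annotation step, is simulated with a hand-written dict. Both the candidate pool and the labels are made up for illustration:

```python
from collections import Counter
import math

# Simplified Vote-K sketch: select diverse candidates, then attach
# human-provided labels before building the final prompt.

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_diverse(pool, k):
    """Greedy stage 1: repeatedly pick the candidate least similar to
    anything already chosen, so the set covers varied content."""
    chosen = [pool[0]]
    while len(chosen) < k:
        remaining = [c for c in pool if c not in chosen]
        nxt = min(remaining, key=lambda c: max(
            cosine(vectorize(c), vectorize(s)) for s in chosen))
        chosen.append(nxt)
    return chosen

pool = [
    "Global temperatures have risen 1.1°C since the pre-industrial era.",
    "Global temperatures keep rising every single year.",
    "Just bought an electric car to reduce my carbon footprint.",
]
stage1 = select_diverse(pool, k=2)  # skips the near-duplicate tweet
human_labels = {pool[0]: "scientific fact", pool[2]: "personal action"}  # stage 2
labeled = [(tweet, human_labels[tweet]) for tweet in stage1]
print(labeled)
```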
Self-generated in-context learning (SG-ICL): when AI becomes its own teacher
This technique uses the LLM itself to generate examples. Here's how it works:
You ask the model to generate examples for a specific task.
You use these generated examples as context for the final prompt.
Example: Let's say you want the model to write news headlines. First, you ask it to generate examples:
Generate 3 examples of technology news headlines:
To which the model responds:
"New Google AI surpasses humans in medical diagnosis"
"SpaceX successfully launches 60 more Starlink satellites"
"Apple announces foldable iPhone for 2025"
Finally, you give the final instruction:
Now, generate a headline about a breakthrough in renewable energy:
It's like asking AI to create its own homework exercises. Self-learning at its finest!
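The two-step flow can be shown end to end with a stub in place of the model. To be clear, `llm()` below is a hypothetical stand-in that returns canned text so the sketch is runnable; in practice both calls would go to the same real LLM:

```python
# SG-ICL sketch. llm() is a hypothetical stub, NOT a real model call:
# it returns canned completions so the two-step flow can be demonstrated.

def llm(prompt):
    """Stand-in for a real LLM; returns fixed text for this demo."""
    if prompt.startswith("Generate"):
        return ("New Google AI surpasses humans in medical diagnosis\n"
                "SpaceX successfully launches 60 more Starlink satellites\n"
                "Apple announces foldable iPhone for 2025")
    return "Scientists unveil solar panel that works at night"

# Step 1: ask the model to generate its own demonstrations.
demos = llm("Generate 3 examples of technology news headlines:")

# Step 2: reuse those demonstrations as context for the real request.
final_prompt = (f"Here are example technology news headlines:\n{demos}\n\n"
                "Now, generate a headline about a breakthrough in renewable energy:")
headline = llm(final_prompt)
print(headline)
```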
Prompt mining: becoming a digital Indiana Jones in search of prompt patterns
This technique searches for optimal prompt formats by analyzing large bodies of text. The idea is to find natural patterns of questions and answers that occur frequently in language.
Example: After analyzing a large body of text, you might discover that the "Did you know...?" format followed by a fact is very common. You could use this format in your prompts:
Complete these interesting facts:
Did you know that koalas sleep up to 20 hours a day?
Did you know that a hummingbird's heart beats up to 1,260 times per minute?
Did you know that giraffes only need 2 hours of sleep per day?
Now, complete this one: Did you know that elephants...?
It's like being Indiana Jones of the digital era, uncovering hidden patterns in language that lead us to prompting gold!
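At its simplest, mining means counting how often candidate patterns appear in a corpus and adopting the winner as your prompt format. A toy sketch, with a made-up corpus and hand-picked candidate patterns:

```python
import re
from collections import Counter

# Prompt-mining sketch: count candidate question/answer patterns in a
# corpus, then pick the most frequent as the prompt format. The corpus
# and pattern list are invented for illustration.

corpus = """
Did you know that koalas sleep up to 20 hours a day?
Q: What is the capital of France? A: Paris.
Did you know that honey never spoils?
Did you know that octopuses have three hearts?
Fun fact: bananas are berries.
"""

patterns = {
    "Did you know that ...?": r"Did you know that",
    "Q: ... A: ...": r"Q:.*A:",
    "Fun fact: ...": r"Fun fact:",
}
counts = Counter({name: len(re.findall(rx, corpus))
                  for name, rx in patterns.items()})
best, freq = counts.most_common(1)[0]
print(best, freq)  # the "Did you know" pattern occurs most often here
```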
Why is in-context learning important?
ICL is revolutionary because it allows LLMs to be incredibly versatile. We can "reprogram" these models for various tasks without needing to retrain them, simply by adjusting our prompts. This has several important implications:
Flexibility: We can quickly adapt a model to new tasks without retraining.
Efficiency: Saves time and computational resources by avoiding constant retraining (does anyone here retrain their models?).
Personalization: Allows adjusting model behavior for specific user or application needs.
Experimentation: Makes it easy to quickly test different approaches to solving problems.
Imagine having an AI personal assistant that adapts instantly to whatever you need - one minute it's guiding you through complex tax forms, the next it's helping you perfect that signature recipe you've been working on. That's the power of ICL in action. Who needs a one-task specialist when you can have an AI that masters any task on the spot?
Limitations and considerations
Despite its power, ICL also has several limitations:
Context space: Think of it like packing for a trip - just as you can't fit your entire wardrobe in a carry-on, models can only handle a limited number of examples at once. Every token counts!
Consistency: Even the best AI can be unpredictable sometimes. It's like having a brilliant but occasionally forgetful colleague - mostly reliable, but you might need a backup plan for those off days.
Example dependency: Your AI is only as good as the examples you feed it. Just like learning any new skill, the quality of the training matters - you wouldn't learn tennis from watching beginners make mistakes, would you?
The future of ICL
In-context learning is just the tip of the iceberg in the fascinating world of prompting. This technique has opened new possibilities in how we interact with and utilize large language models. As research advances, we're likely to see even more sophisticated techniques that improve ICL's precision and efficiency.
Now that you know how ICL works, what ideas do you have for using it? Whatever it may be, remember: in the world of ICL, the only limit is your imagination. So start experimenting!
What do you think about ICL? Have you tried this technique?
See you next time!
Germán.
Note: This is part of a series of posts about prompting techniques. In upcoming posts in this series, we'll explore other techniques like chain-of-thought and much more.
Hey! I'm Germán, and I write about AI in both English and Spanish. This article was first published in Spanish in my newsletter AprendiendoIA, and I've adapted it for my English-speaking friends at My AI Journey. My mission is simple: helping you understand and leverage AI, regardless of your technical background or preferred language. See you in the next one!