AI Glossary

Every AI term explained in plain English — no jargon, no PhD required.

89 terms defined · Updated March 2026

A
7 terms

AGI (Artificial General Intelligence)

A hypothetical form of AI that can understand, learn, and apply knowledge across any intellectual task a human can do. No AGI system exists today; current AI is narrow, meaning it excels at specific tasks but cannot generalize the way people do.

AI Agent

Software that uses a large language model to plan steps, call tools, and complete multi-step tasks with minimal human oversight. Agents can browse the web, write code, or manage workflows autonomously.

AI Alignment

The research discipline focused on ensuring AI systems behave in ways that match human intentions and values. Alignment work ranges from reward modeling to constitutional AI methods that constrain model outputs.

Anomaly Detection

A machine-learning technique that identifies data points, events, or observations that deviate significantly from expected patterns. Widely used in fraud detection, cybersecurity, and manufacturing quality control.

API (Application Programming Interface)

A set of rules that lets one piece of software talk to another. In AI, an API is how developers send prompts to a model and receive responses, typically over HTTP with JSON payloads.

Attention Mechanism

The core innovation inside Transformers. Instead of reading text one word at a time, attention lets the model weigh every word against every other word simultaneously, capturing long-range relationships in a single pass.
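The weighting idea can be sketched in a few lines of Python. This is a toy, single-query version with made-up two-dimensional vectors, not a real model:

```python
import math

def softmax(xs):
    # Turn raw scores into weights that sum to 1.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score the query against every key (scaled dot product).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention([1.0, 0.0], keys, values)  # attends mostly to the first key
```

In a real Transformer this happens for every token at once, across many "heads" in parallel, which is what captures long-range relationships in a single pass.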

Autoregressive Model

A model that generates output one token at a time, where each new token is conditioned on all previously generated tokens. GPT-series models and most modern LLMs are autoregressive.
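The generate-one-token-at-a-time loop looks like this in miniature. The "model" here is a hypothetical lookup table, standing in for billions of learned parameters:

```python
# Hypothetical toy "model": each token maps to its most likely successor.
NEXT = {"<start>": "the", "the": "cat", "cat": "sat", "sat": "<end>"}

def generate(model, max_tokens=10):
    tokens = ["<start>"]
    for _ in range(max_tokens):
        nxt = model.get(tokens[-1])  # condition on what was generated so far
        if nxt is None or nxt == "<end>":
            break
        tokens.append(nxt)           # each new token joins the context
    return tokens[1:]

print(generate(NEXT))  # ['the', 'cat', 'sat']
```

A real LLM conditions on the entire preceding sequence, not just the last token, but the loop structure is the same.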

B
2 terms

Benchmark

A standardized test used to measure and compare AI model performance. Examples include MMLU for general knowledge, HumanEval for coding, and HellaSwag for commonsense reasoning.

Bias (AI)

Systematic errors in AI outputs caused by skewed training data, flawed labeling, or design choices that favor certain groups over others. Bias can surface as stereotyped language, unequal accuracy across demographics, or exclusionary recommendations.

C
7 terms

Chain-of-Thought (CoT)

A prompting technique that asks the model to show its reasoning step by step before giving a final answer. CoT dramatically improves accuracy on math, logic, and multi-step reasoning tasks.
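In practice the technique is often as simple as appending one sentence to the prompt. A minimal illustration (the question is made up):

```python
question = ("A book costs $12 and a pen costs $3. "
            "What do 2 books and 1 pen cost?")

# Without CoT: the model must jump straight to an answer.
plain_prompt = question

# With CoT: the model is nudged to reason before answering.
cot_prompt = question + "\nLet's think step by step, then state the final answer."
```

The intermediate reasoning gives the model room to work, which is where the accuracy gains on math and logic tasks come from.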

ChatGPT

OpenAI's conversational AI product built on GPT-series models. It popularized the chat-based interface for interacting with LLMs and remains one of the most widely used AI tools globally.

Claude

Anthropic's family of large language models, designed with a focus on safety, honesty, and helpfulness. Claude models support long context windows and are available through API and consumer products.

CLIP (Contrastive Language-Image Pre-training)

An OpenAI model trained to understand the relationship between images and text by learning from millions of image-caption pairs. CLIP enables zero-shot image classification and powers many text-to-image search systems.

Computer Vision

The branch of AI that enables machines to interpret and act on visual information such as photos, videos, and medical scans. Applications range from facial recognition to autonomous vehicles to quality inspection on factory lines.

Constitutional AI

An alignment technique developed by Anthropic where the model critiques and revises its own outputs according to a set of written principles (a "constitution"), reducing the need for human feedback on every example.

Context Window

The maximum number of tokens a model can take into account at once, covering both the prompt and the model's response. A larger context window means the model can reference more prior text. Modern models range from 8K to over 1M tokens.

D
5 terms

Data Augmentation

Techniques for expanding a training dataset by creating modified versions of existing data, such as rotating images, paraphrasing text, or adding noise. Augmentation helps models generalize better without collecting new data.

Deep Learning

A subset of machine learning that uses neural networks with many layers ("deep" architectures) to learn patterns from large amounts of data. Deep learning drives breakthroughs in vision, language, speech, and game-playing AI.

Diffusion Model

A generative model that creates images by starting with random noise and gradually removing it, step by step, until a coherent image emerges. Stable Diffusion, Midjourney, and recent DALL-E versions all use diffusion-based architectures.

Discriminator

In a GAN (Generative Adversarial Network), the discriminator is the model that tries to distinguish real data from fake data produced by the generator. The two models train against each other in a competitive loop.

Distillation

The process of training a smaller, faster model to replicate the behavior of a larger, more capable model. Distillation preserves much of the large model's quality while drastically reducing compute costs and latency.

E
4 terms

Embedding

A numerical representation of text, images, or other data as a dense vector of numbers. Embeddings capture meaning so that semantically similar items end up close together in vector space, enabling search, clustering, and recommendation systems.
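"Close together in vector space" is usually measured with cosine similarity. A sketch with hypothetical 3-dimensional embeddings (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # 1.0 means pointing the same direction; near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors: "king" and "queen" should land near each other.
king   = [0.90, 0.80, 0.10]
queen  = [0.85, 0.82, 0.15]
banana = [0.10, 0.20, 0.90]

assert cosine_similarity(king, queen) > cosine_similarity(king, banana)
```

Semantic search, clustering, and recommendations all reduce to comparisons like this one, just at much larger scale.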

Encoder

The part of a neural network that compresses input data into a compact internal representation. In Transformer architectures, encoder-only models like BERT are used for understanding tasks, while encoder-decoder models like T5 handle translation and summarization.

Ethical AI

An umbrella term for practices, principles, and governance frameworks that aim to make AI development and deployment fair, transparent, accountable, and respectful of human rights.

Evaluation Metric

A quantitative measure used to assess model performance. Common metrics include accuracy, F1 score, perplexity, BLEU (for translation), and human preference ratings (for conversational AI).

F
4 terms

Few-Shot Learning

A technique where a model learns to perform a task from just a handful of examples, typically provided in the prompt. Few-shot learning lets you customize model behavior without fine-tuning or retraining.
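A few-shot prompt is just the task examples written into the prompt itself. A hypothetical sentiment-labeling prompt:

```python
# The examples themselves teach the model the task and the output format.
examples = [
    ("Great movie, I loved it!", "positive"),
    ("Total waste of two hours.", "negative"),
]

prompt = "Label the sentiment of each review.\n\n"
for review, label in examples:
    prompt += f"Review: {review}\nSentiment: {label}\n\n"
# The model completes the pattern for the new, unlabeled example.
prompt += "Review: The best film I have seen this year.\nSentiment:"
```

No weights change; the model infers the pattern from the prompt alone, which is why this works without fine-tuning or retraining.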

Fine-Tuning

The process of taking a pre-trained model and continuing its training on a smaller, task-specific dataset. Fine-tuning adapts general capabilities to specialized domains like medical diagnosis, legal analysis, or brand voice.

Foundation Model

A large AI model trained on broad data at scale that can be adapted to many downstream tasks. GPT-4, Claude, Llama, and Gemini are all foundation models. The term emphasizes their role as a starting point, not a finished product.

Frontier Model

The most capable AI model available at any given time, typically defined by leading performance on benchmarks and emergent abilities not seen in smaller models. Frontier models raise the most urgent safety and policy questions.

G
7 terms

GAN (Generative Adversarial Network)

A framework where two neural networks, a generator and a discriminator, compete against each other. The generator creates synthetic data, and the discriminator tries to tell it apart from real data. GANs were groundbreaking for image generation before diffusion models overtook them.

Gemini

Google DeepMind's family of multimodal AI models, capable of processing text, images, audio, and video. Gemini competes directly with GPT-4 and Claude in the frontier model space.

Generative AI

AI systems that create new content, whether text, images, music, code, or video, rather than just classifying or analyzing existing data. The generative AI wave began with GPT-3 and accelerated with ChatGPT, Midjourney, and Stable Diffusion.

GPT (Generative Pre-trained Transformer)

A family of autoregressive language models created by OpenAI. "Generative" means it produces text; "Pre-trained" means it learned from a massive corpus before being fine-tuned; "Transformer" is the underlying architecture.

Gradient Descent

The core optimization algorithm used to train neural networks. It works by calculating how wrong the model's predictions are, then nudging the model's weights in the direction that reduces that error, step by step.
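The "nudge downhill" loop can be shown with a one-parameter toy problem: find the weight w that minimizes the error (w - 3)², whose answer is w = 3.

```python
def loss(w):
    # How wrong the "model" is; smallest when w == 3.
    return (w - 3) ** 2

def gradient(w):
    # Slope of the loss at w; points uphill.
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)  # step opposite the slope, downhill

assert abs(w - 3) < 1e-6  # w has converged to the minimum
```

Training a real network is the same idea repeated over billions of weights, with the gradients computed by backpropagation.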

Grounding

Connecting a model's outputs to verifiable external sources such as databases, documents, or search results. Grounding reduces hallucination by anchoring responses in real, retrievable information.

Guardrails

Safety mechanisms that constrain AI model behavior, such as content filters, output validators, and policy layers. Guardrails prevent models from generating harmful, off-topic, or policy-violating content.

H
3 terms

Hallucination

When an AI model generates information that sounds plausible but is factually wrong or entirely fabricated. Hallucinations are a fundamental limitation of current LLMs and a major obstacle to trustworthy AI deployment.

Human-in-the-Loop (HITL)

A design pattern where human judgment is integrated into an AI workflow, whether to review outputs, correct errors, or make final decisions. HITL systems balance automation speed with human accountability.

Hyperparameter

A setting chosen before training begins, such as learning rate, batch size, or number of layers, that controls how the model learns. Unlike regular parameters (weights), hyperparameters are not learned from data.

I
2 terms

Inference

The process of running a trained model to produce outputs from new inputs. When you send a prompt to ChatGPT and receive a response, that's inference. It is computationally cheaper than training but still significant at scale.

Instruction Tuning

A fine-tuning method where the model is trained on datasets of instructions paired with desired responses. This teaches the model to follow human directions more reliably and is a key step in making raw LLMs usable as assistants.

J
1 term

JSON Mode

A model configuration that constrains the output to valid JSON format. JSON mode is essential for building reliable AI pipelines where downstream code needs structured, parseable data instead of free-form text.
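Why structured output matters is easiest to see from the consuming side. Assuming a hypothetical model reply produced under JSON mode:

```python
import json

# A hypothetical reply from a model constrained to valid JSON.
model_reply = '{"sentiment": "positive", "confidence": 0.97}'

# Downstream code parses it directly; json.loads raises an error
# on anything that is not valid JSON, so failures are caught early.
data = json.loads(model_reply)
```

Without JSON mode, the same model might wrap the answer in free-form prose ("Sure! The sentiment is..."), which `json.loads` would reject with a `JSONDecodeError`.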

K
2 terms

Knowledge Distillation

A training technique where a smaller "student" model learns to mimic the outputs of a larger "teacher" model. The student captures most of the teacher's capability at a fraction of the size and cost, making deployment more practical.

Knowledge Graph

A structured database of entities and their relationships, often represented as nodes and edges. AI systems use knowledge graphs to answer factual queries, power recommendations, and ground language model outputs in verified facts.

L
4 terms

Large Language Model (LLM)

A neural network trained on massive text datasets that can generate, summarize, translate, and reason about language. LLMs like GPT-4, Claude, and Llama are the engines behind modern conversational AI, coding assistants, and content tools.

Latent Space

The compressed, high-dimensional representation space that a model uses internally. In image generation, the latent space is where the model manipulates abstract features before decoding them into visible pixels.

Llama

Meta's family of open-weight large language models. Llama models can be downloaded, fine-tuned, and deployed locally, making them central to the open-source AI ecosystem.

LoRA (Low-Rank Adaptation)

An efficient fine-tuning technique that trains only a small number of additional parameters instead of updating the entire model. LoRA dramatically reduces the memory and compute needed to customize large models for specific tasks.

M
5 terms

Machine Learning (ML)

The broad field of computer science where systems learn patterns from data rather than following explicit rules. Machine learning encompasses supervised, unsupervised, and reinforcement learning, and is the foundation on which deep learning and modern AI are built.

Midjourney

A generative AI tool that creates images from text descriptions. Known for its distinctive artistic style and high-quality outputs, Midjourney operates through a Discord-based interface and a web platform.

Mixture of Experts (MoE)

An architecture where a model contains multiple specialized sub-networks ("experts") and a routing mechanism that activates only a few experts for each input. MoE allows models to scale to trillions of parameters while keeping inference costs manageable.

Model Collapse

A degradation that occurs when AI models are trained on AI-generated data. Over successive generations, the model's output distribution narrows, loses diversity, and eventually produces repetitive or nonsensical results.

Multimodal

Describes AI systems that can process and generate more than one type of data, such as text and images, text and audio, or text, images, and video simultaneously. GPT-4V, Gemini, and Claude with vision are multimodal models.

N
3 terms

Natural Language Processing (NLP)

The branch of AI focused on enabling machines to understand, interpret, and generate human language. NLP powers chatbots, translation services, sentiment analysis, search engines, and voice assistants.

Neural Network

A computing system inspired by the structure of biological brains, consisting of layers of interconnected nodes ("neurons") that process data by passing signals forward and adjusting connection weights during training.

Next-Token Prediction

The training objective of autoregressive language models: given a sequence of tokens, predict the next one. Despite its simplicity, this objective, applied at scale, produces models capable of reasoning, coding, and creative writing.
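The objective can be demonstrated with a tiny count-based stand-in for a neural network, trained on a made-up eight-word corpus:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# "Training": count which token follows which.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(token):
    # Predict the most frequent successor seen in training.
    return following[token].most_common(1)[0][0]

assert predict_next("the") == "cat"  # "the" was followed by "cat" most often
```

An LLM replaces the count table with a neural network conditioned on the whole preceding sequence, but the objective, predict the next token, is identical.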

O
3 terms

ONNX (Open Neural Network Exchange)

An open-source format for representing machine learning models. ONNX lets you train a model in one framework (like PyTorch) and run it with a different tool or runtime (such as ONNX Runtime), improving portability across frameworks and hardware.

Open Source AI

AI models and tools released with publicly available weights, code, or training data. Open-source AI democratizes access, enables transparency, and allows communities to build on and audit each other's work. Llama and Mistral are prominent examples.

Overfitting

When a model performs well on training data but poorly on new, unseen data because it has memorized specific examples rather than learning generalizable patterns. Overfitting is one of the most common failure modes in machine learning.

P
4 terms

Parameter

A learned value inside a neural network, essentially a number (weight or bias) that the model adjusts during training. Modern LLMs have billions to trillions of parameters; more parameters generally enable greater capability but require more compute.

Perplexity

A metric that measures how well a language model predicts text. Lower perplexity means the model is less "surprised" by the data, indicating better language understanding. It is commonly used to compare language models during development.
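Concretely, perplexity is the exponential of the average negative log-probability the model assigned to each token. A sketch with made-up per-token probabilities:

```python
import math

def perplexity(token_probs):
    # Average negative log-probability, then exponentiate.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

confident = perplexity([0.90, 0.80, 0.95])  # model found the text predictable
surprised = perplexity([0.10, 0.20, 0.05])  # model was "surprised" often

assert confident < surprised  # lower perplexity is better
```

Intuitively, a perplexity of N means the model was, on average, as uncertain as if it were choosing uniformly among N tokens at each step.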

Pre-Training

The initial phase of training where a model learns general patterns from a massive, diverse dataset. Pre-training gives the model broad knowledge and language ability before it is fine-tuned or instruction-tuned for specific tasks.

Prompt Engineering

The practice of crafting effective instructions (prompts) to get better, more reliable outputs from AI models. Techniques include few-shot examples, chain-of-thought, system prompts, and structured output formatting.

Q
2 terms

Quantization

Reducing the precision of a model's numerical weights (e.g., from 32-bit to 4-bit) to shrink file size and speed up inference. Quantization enables large models to run on consumer hardware with modest quality trade-offs.
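The core idea, snapping each weight to the nearest of a small set of levels, can be sketched directly. The weights here are made up, and real schemes (per-channel scales, outlier handling) are more sophisticated:

```python
def quantize(weights, bits):
    # Map each weight to the nearest of 2**bits evenly spaced levels.
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    return [lo + round((w - lo) / step) * step for w in weights]

weights = [0.013, -0.742, 0.395, 0.881, -0.204]
q4 = quantize(weights, bits=4)  # only 16 distinct values remain

# Each value shifts slightly: the "modest quality trade-off".
max_error = max(abs(w - q) for w, q in zip(weights, q4))
```

Four bits per weight instead of 32 is an 8x reduction in memory, which is what lets multi-billion-parameter models fit on a laptop GPU.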

Query

In AI context, a query is the input sent to a model, database, or search system to retrieve or generate information. In RAG systems, the user's question is encoded as a query vector and matched against a knowledge base.

R
5 terms

RAG (Retrieval-Augmented Generation)

A technique that combines a language model with an external knowledge retrieval system. Before generating an answer, the model searches relevant documents and uses that retrieved context to produce more accurate, grounded responses.
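The retrieve-then-generate flow can be sketched end to end. The documents and the word-overlap retriever are toy stand-ins; real systems embed the query and search a vector database:

```python
import re

documents = [
    "Our refund policy allows returns within 30 days.",
    "Shipping takes 3-5 business days within the US.",
]

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, docs):
    # Toy retrieval: pick the document sharing the most words with the question.
    q = words(question)
    return max(docs, key=lambda d: len(q & words(d)))

question = "What is your refund policy for returns?"
context = retrieve(question, documents)

# The retrieved context is spliced into the prompt before generation.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Because the answer is grounded in retrieved text rather than the model's memory, RAG systems can cite sources and stay current without retraining.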

Reasoning Model

A model specifically trained or prompted to perform multi-step logical reasoning before producing an answer. Examples include OpenAI's o1 and o3 series, which use internal chain-of-thought to solve complex problems.

Reinforcement Learning (RL)

A machine learning paradigm where an agent learns by taking actions in an environment and receiving rewards or penalties. RL is how game-playing AIs like AlphaGo learn, and it is a key component of RLHF for language models.

Retrieval

The process of finding and fetching relevant documents or data from a knowledge base, search index, or vector database. In RAG pipelines, retrieval quality directly determines the accuracy of the model's final output.

RLHF (Reinforcement Learning from Human Feedback)

A training technique where human evaluators rank model outputs by quality, and those rankings are used to train a reward model that guides the AI's behavior. RLHF is how models like ChatGPT learn to be helpful, harmless, and honest.

S
4 terms

Self-Supervised Learning

A training approach where the model creates its own labels from raw data, such as predicting masked words in a sentence or the next frame in a video. Self-supervised learning enables training on vast unlabeled datasets and is the backbone of LLM pre-training.

Stable Diffusion

An open-source diffusion model for generating images from text prompts. Unlike DALL-E and Midjourney, Stable Diffusion can be downloaded and run locally, making it a cornerstone of the open-source AI art ecosystem.

Synthetic Data

Data generated by AI models rather than collected from real-world sources. Synthetic data is used to augment training sets, protect privacy, and create examples for rare edge cases that are difficult to gather organically.

System Prompt

A hidden instruction given to a language model at the start of a conversation that defines its role, personality, constraints, and behavior. System prompts shape how the model responds but are not visible to end users.

T
6 terms

Temperature

A setting that controls randomness in a model's output. Low temperature (e.g., 0.1) makes responses more predictable and focused; high temperature (e.g., 1.0) makes them more creative and varied. A temperature of 0 is nearly deterministic.
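Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A sketch with made-up logits for three candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s - max(scaled)) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.1)  # probability piles onto the top token
warm = softmax_with_temperature(logits, 1.0)  # probability spreads out

assert cold[0] > warm[0]
```

At low temperature the top token dominates (near-deterministic output); at high temperature the alternatives get real probability mass, which is where the variety comes from.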

Token

The basic unit of text that a language model processes. A token is roughly 3-4 characters in English; "hamburger" might be split into "ham," "bur," and "ger." Pricing, context limits, and speed are all measured in tokens.

Tool Use (Function Calling)

The ability of an AI model to invoke external tools, APIs, or functions during a conversation. Tool use lets models check the weather, query databases, run code, or interact with other software in real time.

Training Data

The dataset used to teach an AI model patterns and relationships. Training data quality, size, and diversity directly determine model capabilities and biases. Modern LLMs are trained on trillions of tokens from the internet, books, and code.

Transfer Learning

The technique of taking knowledge learned on one task and applying it to a different but related task. Transfer learning is why you can fine-tune a general-purpose LLM on 1,000 medical examples and get a competent medical assistant.

Transformer

The neural network architecture introduced in 2017 that powers nearly all modern language and vision models. Transformers use self-attention to process all tokens in parallel, making them vastly more efficient and capable than earlier sequential architectures like RNNs.

U
2 terms

Underfitting

When a model is too simple to capture the patterns in its training data, resulting in poor performance on both training and test sets. Underfitting often means the model needs more parameters, more training time, or better features.

Unsupervised Learning

A machine learning approach where the model finds patterns in data without labeled examples. Clustering customers by behavior, detecting anomalies in network traffic, and learning word embeddings are all forms of unsupervised learning.

V
3 terms

Vector Database

A database optimized for storing and searching high-dimensional vectors (embeddings). Vector databases like Pinecone, Weaviate, and Chroma power semantic search, RAG systems, and recommendation engines by finding the nearest neighbors in embedding space.

Vision-Language Model (VLM)

A model that can process both images and text, enabling tasks like describing photos, answering questions about diagrams, or extracting data from screenshots. GPT-4V and Claude with vision are prominent VLMs.

ViT (Vision Transformer)

An architecture that applies the Transformer model, originally designed for text, to image recognition. ViT splits images into patches and processes them like tokens, achieving state-of-the-art results in computer vision tasks.

W
2 terms

Weight

A numerical value in a neural network that determines the strength of the connection between neurons. During training, weights are adjusted to minimize prediction errors. A model's "weights" are the complete set of learned parameters.

Word Embedding

A technique that maps words to dense numerical vectors where similar words are positioned close together. Word2Vec and GloVe are classic word embedding methods; modern embeddings from Transformers capture richer contextual meaning.

X
1 term

XAI (Explainable AI)

Methods and techniques that make AI decision-making transparent and interpretable to humans. XAI is critical in healthcare, finance, and legal contexts where users need to understand why a model made a particular prediction or recommendation.

Z
1 term

Zero-Shot Learning

A model's ability to perform a task it was never explicitly trained on, based solely on its general knowledge and the natural-language instruction given at inference time. Strong zero-shot ability is a hallmark of capable foundation models.

Missing a term?

AI vocabulary grows every week. If there is a term you want defined, let us know and we will add it.
