AI Glossary

The A-Z of Artificial Intelligence

A comprehensive, jargon-free reference for every AI term you'll encounter — from attention mechanisms to zero-shot learning.

A

Attention Mechanism

A neural network technique that allows models to focus on the most relevant parts of the input when producing an output. The foundation of the Transformer architecture.

Artificial General Intelligence (AGI)

A hypothetical AI system with the ability to understand, learn, and apply intelligence across any domain — matching or exceeding human-level cognitive abilities.

Autoregressive Model

A model that generates output one token at a time, where each new token is predicted based on all previously generated tokens. GPT models are autoregressive.
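
The decoding loop can be sketched in a few lines. The "model" below is a hypothetical toy that just counts upward; the point is the structure: each prediction conditions on the full history so far.

```python
def generate(model, prompt_tokens, max_new_tokens):
    """Autoregressive decoding: each new token is predicted
    from everything generated so far."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = model(tokens)  # condition on the full history
        tokens.append(next_token)
    return tokens

# toy stand-in for a real model: predicts last token + 1
toy_model = lambda ts: ts[-1] + 1
out = generate(toy_model, [1, 2, 3], max_new_tokens=3)
```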

B

Backpropagation

The algorithm used to train neural networks. It calculates the gradient of the loss function and propagates errors backward through the network to update weights.

BERT

Bidirectional Encoder Representations from Transformers — Google's pre-trained language model that reads text in both directions to understand context. Revolutionized NLP benchmarks.

Bias (in AI)

Systematic errors in model output caused by imbalanced training data or flawed assumptions. Can lead to unfair or discriminatory predictions across different demographic groups.

C

Chain-of-Thought (CoT) Prompting

A prompting strategy where the model is instructed to show its reasoning step-by-step before giving a final answer. Significantly improves performance on math and logic tasks.

Computer Vision (CV)

A field of AI that enables machines to interpret and understand visual information from images, videos, and real-world scenes.

Convolutional Neural Network (CNN)

A deep learning architecture designed for processing grid-structured data like images. Uses convolutional filters to detect features at multiple scales.

D

Deep Learning

A subset of machine learning that uses neural networks with many layers (deep architectures) to learn complex patterns from data.

Diffusion Model

A generative model that learns to create data by reversing a gradual noising process. Powers image generators like DALL-E 3, Stable Diffusion, and Midjourney.

E

Embedding

A dense vector representation of text, images, or other data that captures semantic meaning. Similar items have similar embeddings in the vector space.
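
"Similar items have similar embeddings" is usually measured with cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# toy embeddings: semantically close words point in similar directions
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]
```

Here `cosine_similarity(cat, kitten)` comes out far higher than `cosine_similarity(cat, car)`, which is exactly the property semantic search exploits.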

Epoch

One complete pass through the entire training dataset during model training. Most models require multiple epochs to converge.

F

Few-Shot Learning

A technique where the model is given a few examples of the desired task in the prompt to guide its output. Contrasts with zero-shot (no examples) learning.
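
A few-shot prompt is just labeled examples concatenated ahead of the new input. The `Input:`/`Output:` format below is illustrative; the exact template varies by model:

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: labeled examples, then the new input."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")  # model completes from here
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    [("great movie!", "positive"), ("waste of time", "negative")],
    "loved every minute",
)
```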

Fine-Tuning

The process of taking a pre-trained model and training it further on a smaller, domain-specific dataset to specialize its performance for a particular task.

Foundation Model

A large AI model trained on broad data that can be adapted to many downstream tasks. Examples: GPT-4, Claude, Gemini, LLaMA.

G

GAN (Generative Adversarial Network)

A framework where two neural networks (a generator and a discriminator) compete: one creates fake data, the other detects fakes. This adversarial process produces highly realistic outputs.

GPT (Generative Pre-trained Transformer)

OpenAI's family of large language models that generate text by predicting the next token. GPT-3.5 and GPT-4 are the models behind ChatGPT.

Gradient Descent

An optimization algorithm that iteratively adjusts model parameters in the direction that minimizes the loss function — the core mechanism of neural network training.
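
The update rule is simply "step opposite the gradient." A minimal sketch minimizing a one-dimensional quadratic (real training does the same over millions of parameters, with gradients from backpropagation):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # the core update: x -= learning_rate * gradient
    return x

# minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```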

H

Hallucination

When an AI model generates plausible-sounding but factually incorrect or fabricated information. A major challenge in LLM reliability.

Hyperparameter

Configuration settings (learning rate, batch size, number of layers) that are set before training begins, as opposed to parameters learned during training.

I

Inference

The process of using a trained model to make predictions on new, unseen data. The "production" phase after training.

J

Jailbreak

A prompt technique that bypasses an AI model's safety guardrails to make it produce restricted or unintended output. A key concern in AI safety research.

K

Knowledge Distillation

A model compression technique where a smaller "student" model is trained to replicate the behavior of a larger "teacher" model, preserving performance with fewer parameters.

L

Large Language Model (LLM)

A neural network with billions of parameters trained on massive text corpora to understand and generate human language. Examples: GPT-4, Claude, Gemini, LLaMA.

LoRA (Low-Rank Adaptation)

An efficient fine-tuning method that adds small, trainable matrices to frozen pre-trained weights — enabling adaptation with minimal compute and memory.
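
The idea reduces to y = Wx + BAx: the frozen weight W plus a low-rank update built from two small matrices. A pure-Python sketch with a 2x2 weight and a rank-1 adapter (function names here are illustrative):

```python
def matvec(M, v):
    """Matrix-vector product; M is a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=1.0):
    """y = W x + alpha * B(A x): frozen weight plus trainable rank-r update."""
    return [w + alpha * u
            for w, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 pre-trained weight
A = [[1.0, 1.0]]              # trainable: projects 2-d input down to rank 1
B = [[0.5], [0.5]]            # trainable: projects back up to 2-d
y = lora_forward([2.0, 4.0], W, A, B)
```

Only A and B (4 values here) are trained, while W stays untouched; at full scale this is why LoRA fits on modest hardware.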

M

Machine Learning (ML)

A subset of AI where algorithms learn patterns from data without being explicitly programmed. Includes supervised, unsupervised, and reinforcement learning.

Multimodal

An AI model capable of processing and generating multiple types of data — text, images, audio, and video. GPT-4V and Gemini are multimodal models.

N

Natural Language Processing (NLP)

The field of AI focused on enabling machines to understand, interpret, and generate human language.

Neural Network

A computing system inspired by biological neurons, consisting of interconnected layers of nodes that process information and learn patterns from data.

O

Overfitting

When a model performs well on training data but poorly on unseen data — it has memorized the training set rather than learning generalizable patterns.

P

Prompt Engineering

The practice of designing and optimizing input text (prompts) to elicit desired outputs from LLMs. A core skill in working with AI systems.

Parameter

A value within a neural network (weights and biases) that is learned during training. Modern LLMs have billions of parameters; GPT-4 is widely reported to exceed a trillion, though OpenAI has not disclosed the figure.

Q

Quantization

A technique to reduce model size and speed up inference by converting weights from higher precision (FP32) to lower precision (INT8 or INT4) with minimal quality loss.
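
A simplified sketch of one common scheme, symmetric INT8 quantization: a single scale factor maps floats into the integer range [-127, 127] and back:

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each INT8 weight needs 1 byte instead of 4 for FP32, and the reconstruction error is bounded by the scale. Production schemes add per-channel scales, zero points, and calibration.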

R

RAG (Retrieval-Augmented Generation)

A technique that combines a retrieval system (searching a knowledge base) with a generative model to produce more accurate, grounded answers.
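
The pipeline is: retrieve relevant documents, then stuff them into the prompt as context. The sketch below uses word overlap as a stand-in for the embedding search a real RAG system would run:

```python
def retrieve(query, documents, top_k=1):
    """Toy retrieval: rank documents by word overlap with the query.
    (Real systems use embedding similarity over a vector database.)"""
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query, documents):
    """Prepend retrieved context so the model can ground its answer."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The Transformer architecture was introduced in 2017.",
    "Paris is the capital of France.",
]
prompt = build_prompt("When was the Transformer architecture introduced?", docs)
```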

Reinforcement Learning (RL)

A learning paradigm where an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties. RLHF uses human feedback as the reward signal.

S

Self-Attention

A mechanism where each token in a sequence computes attention scores with every other token, enabling the model to capture long-range dependencies. The core of Transformers.
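
A minimal sketch of scaled dot-product attention in pure Python. For simplicity it uses the same toy vectors as queries, keys, and values; real models derive Q, K, and V from the input via learned projection matrices:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: each query scores every key,
    then takes a weighted average of the values."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(d)])
    return out

# three tokens, 2-d vectors, attending to each other
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = self_attention(x, x, x)
```

Because every token scores every other token directly, distance in the sequence doesn't matter — that is the long-range-dependency advantage over recurrent models.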

Stable Diffusion

An open-source latent diffusion model for text-to-image generation. Runs locally on consumer GPUs, making AI art creation accessible.

T

Transformer

The neural network architecture introduced in "Attention Is All You Need" (2017). Uses self-attention to process sequences in parallel. The backbone of virtually all modern LLMs.

Token / Tokenization

Tokens are the basic units of text that LLMs process — typically words, subwords, or characters. Tokenization is the process of splitting text into these units.
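
Subword tokenizers try to match the longest known piece first, falling back to smaller units for rare words. A greedy sketch with a hand-picked vocabulary (real tokenizers like BPE learn their vocabulary from data):

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization (simplified sketch)."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest substring first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

vocab = {"un", "believ", "able", "token", "ization", " "}
```

So `tokenize("unbelievable", vocab)` splits the word into `["un", "believ", "able"]` — common fragments, even when the whole word was never in the vocabulary.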

Temperature

A parameter controlling randomness in model output. Low temperature (e.g., 0.1) makes output more deterministic and focused; high temperature (1.0 and above) makes it more varied and creative.
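
Mechanically, temperature divides the model's raw scores (logits) before the softmax. A sketch with hypothetical logits for three candidate tokens:

```python
import math

def token_probabilities(logits, temperature):
    """Convert raw logits to sampling probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]  # temperature rescales logits
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # hypothetical next-token logits
cold = token_probabilities(logits, 0.1)        # sharp: mass piles on top token
hot = token_probabilities(logits, 2.0)         # flat: runners-up stay plausible
```

At temperature 0.1 the top token gets essentially all the probability; at 2.0 the distribution flattens, so sampling becomes diverse.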

U

Unsupervised Learning

A type of machine learning where the model learns patterns from unlabeled data without explicit supervision. Used for clustering, anomaly detection, and representation learning.

V

Vector Database

A specialized database that stores and retrieves high-dimensional vectors (embeddings). Core infrastructure for semantic search and RAG systems. Examples: Pinecone, Weaviate, Chroma.

W

Weights

Numerical values in a neural network that determine the strength of connections between neurons. Adjusted during training to minimize the loss function.

X

XAI (Explainable AI)

Methods and techniques that make AI decisions interpretable and transparent to humans — critical for trust, debugging, and regulatory compliance.

Y

YOLO (You Only Look Once)

A family of real-time object detection models that process an entire image in a single forward pass, making them extremely fast for video and edge applications.

Z

Zero-Shot Learning

The ability of a model to perform a task it was never explicitly trained on, by leveraging its general knowledge. No task-specific examples are provided in the prompt.