AI Learning Hub
A growing personal map of AI concepts, buzzwords, and practical knowledge, explained simply with analogies.
This page organizes AI concepts by category so I can learn the language of AI step by step.
Artificial Intelligence
Explanation: AI is the broad field of making computers perform tasks that normally require human intelligence, such as understanding language, recognizing images, making decisions, or solving problems.
Analogy: Like teaching a computer to act as a helpful assistant that can see patterns, answer questions, and make suggestions.
Machine Learning
Explanation: Machine learning is a type of AI where systems learn patterns from data instead of being explicitly programmed with every rule.
Analogy: Instead of giving a child every rule about dogs, you show many dog pictures until they learn what dogs usually look like.
Deep Learning
Explanation: Deep learning is a type of machine learning that uses many layers of artificial neurons to learn complex patterns.
Analogy: Like a team of people passing information through multiple review levels, where each level notices something more advanced.
Data
Explanation: Data is the information AI learns from, such as text, numbers, images, audio, video, transactions, or user behavior.
Analogy: Data is like the ingredients used to cook an AI model. Bad ingredients lead to a bad meal.
Dataset
Explanation: A dataset is a structured collection of examples used to train, test, or evaluate a model.
Analogy: Like a workbook full of practice problems and answers.
Supervised Learning
Explanation: Supervised learning trains a model using examples that already include the correct answers.
Analogy: Like studying with an answer key. You try a problem, compare with the correct answer, and improve.
Unsupervised Learning
Explanation: Unsupervised learning finds patterns or groups in data without being given correct answers.
Analogy: Like sorting a pile of mixed coins by similarity without anyone telling you the coin types.
Reinforcement Learning
Explanation: Reinforcement learning trains an agent to make decisions by rewarding good actions and penalizing bad ones.
Analogy: Like training a dog with treats, or a video game player learning which moves earn more points.
Features
Explanation: Features are the input pieces of information a model uses to make a prediction.
Analogy: For home price prediction, features are things like location, square footage, bedrooms, and school rating.
Labels
Explanation: Labels are the correct answers used during supervised learning.
Analogy: If the model sees an image, the label tells it whether the image is a cat, dog, car, or house.
Classification
Explanation: Classification means predicting a category or class.
Analogy: Sorting emails into spam or not spam.
Regression
Explanation: Regression means predicting a number.
Analogy: Predicting a home price, stock price, temperature, or delivery time.
Training Data
Explanation: Training data is the data used to teach the model.
Analogy: Like practice questions before an exam.
Test Data
Explanation: Test data is separate data used to check how well the model performs on examples it has not seen.
Analogy: Like a final exam with new questions.
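The practice-versus-exam split can be sketched in a few lines of Python. This is only an illustration: the function name `split_train_test`, the 20% test fraction, and the fixed seed are assumptions chosen for the example.

```python
import random

def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples, then hold out a fraction as test data."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled items become test data; the rest are training data.
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(10))
print(len(train), len(test))
```

The key point is that no example lands in both lists, so the test data stays unseen during training.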
Artificial Neuron
Explanation: The smallest building block of a neural network. It receives numbers as inputs, does some math, and passes one output number to the next layer.
Analogy: Like a dimmer switch that receives signals from multiple wires, adds them up, and decides how brightly to glow.
Weights & Bias
Explanation: Adjustable numbers inside neurons. Weights control how much each input matters; bias shifts the output up or down. Training finds the best values.
Analogy: Like a recipe. Weights are ingredient amounts; bias is the oven setting. Training finds the best recipe.
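A minimal sketch of what one neuron computes from its weights and bias, before any activation is applied. The function name `neuron_output` and the sample numbers are made up for illustration.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (no activation applied)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# 1.0 * 0.5 + 2.0 * (-1.0) + 0.25
print(neuron_output([1.0, 2.0], [0.5, -1.0], 0.25))  # -1.25
```

Training would change the values in `weights` and `bias`; the calculation itself stays the same.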
Forward Pass
Explanation: Data flows from input through the network to produce a prediction. No learning happens here; it is just calculation.
Analogy: Like answering a quiz using what you currently know.
Input → Hidden Layers → Output
Activation Function
Explanation: A math function applied to a neuron's weighted sum. It decides how strongly the neuron fires and, because it is usually nonlinear, it lets networks learn complex patterns.
Analogy: Like a bouncer at a club deciding who gets in.
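Two widely used activation functions, ReLU and sigmoid, are sketched below as examples. They are common choices, not the only ones, and this page does not say which one any particular network uses.

```python
import math

def relu(z):
    """Pass positive values through unchanged; block negative values."""
    return max(0.0, z)

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 3))
```

ReLU behaves like a strict bouncer (negative signals get zero), while sigmoid is softer and scales every signal to a value between 0 and 1.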
Loss Function
Explanation: A formula that measures how wrong the model's prediction is compared to the correct answer.
Analogy: Like GPS measuring how far your current route is from the destination.
Prediction vs Actual = Loss
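One common loss function for numeric predictions is mean squared error, shown here as an example. The function name and the sample values are assumptions for illustration; other tasks use other losses.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and correct answers."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("need two non-empty lists of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

# First prediction is off by 2, second is exact: (4 + 0) / 2
print(mean_squared_error([3.0, 5.0], [1.0, 5.0]))  # 2.0
```

A loss of 0 means every prediction matched; larger values mean the model is further from the correct answers.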
Backpropagation
Explanation: Backpropagation works backward through the network to compute how much each weight contributed to the error, so each weight can be adjusted accordingly.
Analogy: Like a coach watching a game replay in reverse to see who made each mistake.
Learning Rate
Explanation: Learning rate controls how big a step the model takes when updating weights.
Analogy: Like adjusting shower temperature. Big turns may overshoot; tiny turns take forever.
Overfitting
Explanation: Overfitting happens when a model memorizes training data but performs poorly on new data.
Analogy: Like a student memorizing exact exam answers but failing when questions are worded differently.
Underfitting
Explanation: Underfitting happens when a model is too simple to learn the real pattern.
Analogy: Like a student who barely studied and cannot answer even the practice questions.
Regularization
Explanation: Regularization helps prevent overfitting by discouraging overly complex models.
Analogy: Like packing only what fits in a carry-on so you bring only what is truly needed.
Gradient Descent
Explanation: Gradient descent is an optimization method that updates model parameters step by step to reduce loss.
Analogy: Like walking downhill in fog by feeling the slope under your feet.
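A minimal sketch of gradient descent on a toy problem. The loss (w - 3)^2, the starting point, and the step counts are assumptions picked so the lowest point (w = 3) is known in advance; real models have many parameters and compute gradients with backpropagation.

```python
def gradient_descent(start, learning_rate, steps):
    """Minimize the toy loss (w - 3)**2 by repeatedly stepping downhill."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)         # slope of the loss at the current w
        w -= learning_rate * gradient  # step against the slope
    return w

print(gradient_descent(start=0.0, learning_rate=0.1, steps=100))  # close to 3
print(gradient_descent(start=0.0, learning_rate=1.1, steps=20))   # overshoots and diverges
```

The second call shows the effect of a learning rate that is too large: each step overshoots the minimum by more than the previous one.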
Generative AI
Explanation: Generative AI creates new content such as text, images, audio, video, code, or summaries based on patterns it learned.
Analogy: Like a creative assistant that studied many examples and can produce something new in the same style.
Foundation Model
Explanation: A foundation model is a large model trained on broad data that can be adapted to many tasks.
Analogy: Like a general-purpose engine that can power many different vehicles.
Multimodal AI
Explanation: Multimodal AI can work with more than one type of input or output, such as text, images, audio, or video.
Analogy: Like a person who can read, listen, see, and speak.
Large Language Model
Explanation: An LLM is an AI model trained on large amounts of text to understand and generate language.
Analogy: Like a very advanced autocomplete system that can write, explain, summarize, and reason with text.
Token
Explanation: Tokens are chunks of text that models process. A token can be a word, part of a word, a punctuation mark, or whitespace.
Analogy: Like cutting a sentence into small Lego blocks before the model reads it.
Context Window
Explanation: The context window is the amount of text, measured in tokens, that the model can consider at one time during a request.
Analogy: Like the number of pages you can keep open on your desk while answering a question.
Attention
Explanation: Attention helps a model decide which parts of the input are most important when generating an output.
Analogy: Like highlighting the most important sentences in a document before answering questions.
Embedding
Explanation: An embedding converts text, images, or other data into a list of numbers that captures meaning.
Analogy: Like giving every idea a GPS coordinate so similar ideas are close together.
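One common way to measure how close two embeddings are is cosine similarity, sketched below. The three-number vectors are invented for the example; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return a value near 1 for vectors pointing the same way, near 0 for unrelated ones."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, not produced by any real model.
toy_embeddings = {
    "car":    [0.9, 0.1, 0.0],
    "auto":   [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
print(cosine_similarity(toy_embeddings["car"], toy_embeddings["auto"]))
print(cosine_similarity(toy_embeddings["car"], toy_embeddings["banana"]))
```

With these made-up vectors, "car" scores much closer to "auto" than to "banana", which is the behavior the GPS analogy describes.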
Hallucination
Explanation: A hallucination is when an AI model gives an answer that sounds confident but is false or unsupported.
Analogy: Like a student guessing confidently instead of admitting they do not know.
Prompt
Explanation: A prompt is the instruction or question given to an AI model.
Analogy: Like giving directions to an employee. Clear directions usually produce better work.
System Prompt
Explanation: A system prompt gives high-level instructions about how the model should behave.
Analogy: Like company policy for an assistant before they start doing tasks.
Few-Shot Prompting
Explanation: Few-shot prompting gives the model a few examples before asking it to perform a similar task.
Analogy: Like showing a trainee three sample invoices before asking them to process a new one.
Chain-of-Thought Style Reasoning
Explanation: Chain-of-thought prompting encourages the model to reason step by step before answering. In user-facing applications, the model's intermediate reasoning is usually summarized rather than shown in full.
Analogy: Like asking someone to solve a math problem carefully instead of blurting out the first answer.
Structured Output
Explanation: Structured output makes the model return data in a specific format, such as JSON that follows a schema.
Analogy: Like asking someone to fill a form instead of writing a free-form paragraph.
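On the receiving side, an application usually parses and checks the structured reply before using it. The sketch below assumes a very simple "schema" (field name mapped to expected Python type); the function name, the sample reply, and the fields are all hypothetical.

```python
import json

def parse_structured_output(raw_text, required_fields):
    """Parse JSON text and check that each required field has the expected type."""
    data = json.loads(raw_text)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in required_fields.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
    return data

# Hypothetical model reply and schema, made up for illustration.
reply = '{"category": "spam", "confidence": 0.93}'
schema = {"category": str, "confidence": float}
print(parse_structured_output(reply, schema))
```

If the reply is free-form text or is missing a field, the function raises an error instead of passing bad data along.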
RAG
Explanation: RAG (retrieval-augmented generation) helps an LLM answer using information retrieved from trusted documents or databases instead of relying only on what it learned during training.
Analogy: Like an open-book exam where the assistant looks up the right page before answering.
Vector Database
Explanation: A vector database stores embeddings and helps find items with similar meaning.
Analogy: Like a library organized by meaning instead of alphabetically.
Semantic Search
Explanation: Semantic search finds results based on meaning, not just exact keywords.
Analogy: Searching for "car repair" can also find "auto mechanic" because the meaning is similar.
Chunking
Explanation: Chunking splits large documents into smaller pieces so they can be embedded, searched, and retrieved.
Analogy: Like cutting a big book into useful index cards.
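A minimal sketch of chunking by word count. Splitting on whitespace and the optional `overlap` parameter (repeating a few words between neighboring chunks) are assumptions made for this example; real pipelines often split by tokens, sentences, or headings instead.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into chunks of up to chunk_size words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks

print(chunk_text("a b c d e", chunk_size=3, overlap=1))  # ['a b c', 'c d e']
```

Each resulting chunk can then be embedded and stored so it can be searched and retrieved on its own.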
Retrieval
Explanation: Retrieval is the process of finding relevant documents or chunks before generating an answer.
Analogy: Like pulling the right folder from a filing cabinet before responding.
AI Agent
Explanation: An AI agent is a system that can use tools, follow goals, make decisions, and take actions across multiple steps.
Analogy: Like an assistant who can not only answer questions but also check your calendar, draft an email, and update a task list.
Tool Calling
Explanation: Tool calling lets an AI model use external tools such as APIs, databases, calculators, search, or file systems.
Analogy: Like a manager asking specialists to perform tasks instead of doing everything alone.
Agent Memory
Explanation: Agent memory stores useful information so an AI system can remember preferences, past work, or ongoing tasks.
Analogy: Like a notebook an assistant keeps so you do not repeat yourself every time.
Workflow Automation
Explanation: Workflow automation connects multiple steps so AI can help complete repeatable processes.
Analogy: Like an assembly line where each station does one part of the work.
Pretraining
Explanation: Pretraining teaches a model broad patterns from large datasets before it is specialized.
Analogy: Like going to school before training for a specific job.
Fine-Tuning
Explanation: Fine-tuning adapts a pretrained model to perform better on a specific task or style of output.
Analogy: Like taking a general doctor and training them further to become a specialist.
Hyperparameters
Explanation: Hyperparameters are settings chosen before training, such as learning rate, batch size, and number of epochs.
Analogy: Like oven temperature and baking time before making a cake.
Epoch
Explanation: An epoch is one full pass through the training dataset.
Analogy: Like reading the entire textbook one time.
Batch Size
Explanation: Batch size is the number of examples processed before the model updates its weights.
Analogy: Like grading 32 homework papers at a time before adjusting how you teach.
Accuracy
Explanation: Accuracy measures the percentage of predictions the model got right.
Analogy: Like a test score showing how many questions were answered correctly.
Precision
Explanation: Precision measures how many predicted positives were actually correct.
Analogy: If a spam filter marks 100 emails as spam, precision asks how many really were spam.
Recall
Explanation: Recall measures how many actual positives the model successfully found.
Analogy: If there were 100 spam emails, recall asks how many the filter caught.
F1 Score
Explanation: F1 score combines precision and recall into one metric (their harmonic mean), so it is high only when both are high.
Analogy: Like judging both how careful and how complete someone is.
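Precision, recall, and F1 can all be computed from three counts, as sketched below. The spam numbers are invented to match the analogies above: the filter flags 100 emails, 80 of them really are spam, and 20 real spam emails slip through.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from raw counts (0.0 when undefined)."""
    flagged = true_pos + false_pos          # everything the model predicted as positive
    actual = true_pos + false_neg           # everything that really was positive
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Made-up counts: 80 spam caught, 20 good emails wrongly flagged, 20 spam missed.
precision, recall, f1 = precision_recall_f1(true_pos=80, false_pos=20, false_neg=20)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.8 0.8 0.8
```

Because F1 is a harmonic mean, a model with very high precision but very low recall (or the reverse) still gets a low F1.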
Benchmark
Explanation: A benchmark is a standard test used to compare models.
Analogy: Like giving every runner the same race course to compare performance fairly.
Human Evaluation
Explanation: Human evaluation uses people to judge model outputs for quality, helpfulness, accuracy, or tone.
Analogy: Like a teacher grading an essay when automatic scoring is not enough.
Inference
Explanation: Inference is when a trained model is used to make a prediction or generate an answer.
Analogy: Training is studying; inference is taking the actual test.
API
Explanation: An API (application programming interface) lets software systems communicate with each other.
Analogy: Like a restaurant menu: you request something from available options, and the kitchen returns it.
Latency
Explanation: Latency is how long it takes for a system to respond.
Analogy: Like waiting time after asking a question.
Throughput
Explanation: Throughput is how many requests a system can handle in a period of time.
Analogy: Like how many cars can pass through a toll booth per minute.
Monitoring
Explanation: Monitoring tracks system behavior such as errors, latency, request volume, cost, and uptime.
Analogy: Like a dashboard in a car showing speed, fuel, and warning lights.
Observability
Explanation: Observability helps teams understand why a system behaves a certain way using logs, metrics, and traces.
Analogy: Monitoring says the car is overheating; observability helps find whether it is the fan, coolant, or engine.
Model Drift
Explanation: Model drift happens when model performance gets worse because real-world data changes over time.
Analogy: Like a map becoming outdated after new roads are built.
Bias
Explanation: Bias happens when an AI system produces unfair or skewed results because of data, design, or usage problems.
Analogy: Like a judge who unknowingly favors one group because past examples were unbalanced.
Explainability
Explanation: Explainability is the ability to understand why a model made a certain prediction or recommendation.
Analogy: Like asking a loan officer to explain why an application was approved or denied.
Privacy
Explanation: Privacy means protecting sensitive user data from misuse, exposure, or unnecessary collection.
Analogy: Like keeping personal documents in a locked cabinet and only sharing what is needed.
Guardrails
Explanation: Guardrails are rules and controls that keep AI systems from producing unsafe, incorrect, or inappropriate outputs.
Analogy: Like lane markings and barriers on a highway.
AI Use Case
Explanation: An AI use case is a specific problem where AI can create value.
Analogy: Like identifying which business task needs a power tool instead of using a hammer everywhere.
ROI
Explanation: ROI (return on investment) measures whether the benefit of an AI solution is worth the cost.
Analogy: Like asking whether buying a machine saves enough labor or time to justify the price.
Human-in-the-Loop
Explanation: Human-in-the-loop means people review, guide, or approve AI outputs instead of letting AI act alone.
Analogy: Like autopilot with a pilot still watching and ready to take control.
AI Workflow
Explanation: An AI workflow is the end-to-end process where AI helps complete a business task.
Analogy: Like a checklist where AI handles some steps and humans handle others.
AI Coding IDE
Explanation: An AI coding IDE is a code editor or development environment with built-in AI help for writing, editing, debugging, and understanding code.
Analogy: Like having a senior developer sitting beside you while you code.
Coding Agent
Explanation: A coding agent can inspect files, understand a task, edit code, run commands, check errors, and iterate toward a solution.
Analogy: Like assigning a junior developer a task and asking them to make code changes, test them, and report back.
OpenCode
Explanation: OpenCode is a terminal-based AI coding agent that can help modify files, run commands, inspect errors, and work inside a project.
Analogy: Like a command-line developer assistant that works directly inside your codebase.
Cursor
Explanation: Cursor is an AI-powered code editor that helps with code generation, refactoring, debugging, and project navigation.
Analogy: Like VS Code with an AI pair programmer built in.
Claude Code
Explanation: Claude Code is an AI coding assistant that can work with codebases, understand files, and help implement changes through natural language instructions.
Analogy: Like asking a thoughtful engineer to inspect your project and carefully make changes.
Gemini CLI
Explanation: Gemini CLI lets you use Gemini from the command line for coding, explanation, generation, and project assistance.
Analogy: Like chatting with Gemini directly from your terminal.
Codex CLI
Explanation: Codex CLI is a command-line coding assistant that can help read, modify, and reason about code using OpenAI models.
Analogy: Like giving a coding task to an AI assistant from your terminal.
GitHub Copilot
Explanation: GitHub Copilot helps developers write code faster by suggesting completions, functions, explanations, and fixes inside the editor.
Analogy: Like autocomplete for programming, but much smarter.
Windsurf
Explanation: Windsurf is an AI-powered development environment focused on helping developers navigate, edit, and build code with AI assistance.
Analogy: Like a coding workspace where AI helps guide the flow of development.
Roo Code
Explanation: Roo Code is an AI coding agent extension that can help plan, edit, and execute coding tasks inside a development environment.
Analogy: Like a project helper that can switch between planning, coding, and debugging modes.
Language Server Protocol
Explanation: LSP is a protocol that gives coding tools smart language features like autocomplete, go-to-definition, error detection, and rename support.
Analogy: Like the grammar and spell-check engine for programming languages.
Pair Programming with AI
Explanation: Pair programming with AI means using an AI assistant as a coding partner to explain, suggest, debug, and improve code.
Analogy: Like coding with another developer who can quickly give suggestions and second opinions.
Prompting Coding Agents
Explanation: Prompting coding agents means giving clear instructions about the goal, scope, constraints, files to touch, and acceptance criteria.
Analogy: Like giving a contractor a precise work order instead of a vague request.
Safe & Scoped Code Changes
Explanation: Safe and scoped code changes mean asking the AI to modify only the needed files and avoid broad refactors or unrelated changes.
Analogy: Like telling a mechanic to fix only the brakes, not rebuild the whole car.
Build Agent vs Chat Agent
Explanation: A chat agent mostly answers questions, while a build agent can actively inspect files, edit code, run commands, and fix issues.
Analogy: A chat agent gives advice; a build agent rolls up its sleeves and works on the project.
AI Application Framework
Explanation: An AI application framework helps developers build apps that use LLMs, tools, memory, retrieval, workflows, and agents without wiring every piece manually.
Analogy: Like a construction toolkit for building AI apps instead of buying every tool separately.
LangChain
Explanation: LangChain is a framework for building LLM-powered applications and agents by connecting models, prompts, tools, memory, and external data sources.
Analogy: Like plumbing that connects the AI model to tools, documents, APIs, and workflows.
LangGraph
Explanation: LangGraph is a framework for building stateful, multi-step AI agents and workflows using graph-based control. It is useful when an agent needs durable execution, branching, memory, or human approval.
Analogy: Like a flowchart engine for AI agents where each node is a step and the graph controls what happens next.
LangSmith
Explanation: LangSmith is used to trace, debug, evaluate, and monitor LLM applications and agents.
Analogy: Like a flight recorder for AI apps that shows what happened at every step.
LlamaIndex
Explanation: LlamaIndex helps connect private or external data to LLM applications. It is commonly used for document ingestion, indexing, retrieval, and RAG workflows.
Analogy: Like a librarian that prepares your documents so an AI assistant can search and use them.
Haystack
Explanation: Haystack is a framework for building search, question-answering, and RAG pipelines using components such as retrievers, readers, generators, and document stores.
Analogy: Like a pipeline factory for search and question-answering systems.
CrewAI
Explanation: CrewAI helps developers build multi-agent systems where different agents have roles, goals, tools, and tasks that work together.
Analogy: Like assigning a team of specialists to solve a project together.
AutoGen
Explanation: AutoGen is a framework for building systems where multiple AI agents can collaborate, chat, use tools, and solve tasks through conversation.
Analogy: Like a group chat of AI specialists discussing and completing work.
Semantic Kernel
Explanation: Semantic Kernel is an SDK for integrating AI models with functions, plugins, memory, and planners, often used in enterprise-style AI applications.
Analogy: Like an orchestration layer that lets AI call business functions in a structured way.
DSPy
Explanation: DSPy is a framework for programming and optimizing language model pipelines, reducing manual prompt tweaking by treating prompts and modules more like trainable components.
Analogy: Like moving from hand-writing every instruction to building a system that can tune its own instructions.
Agno
Explanation: Agno is a framework for building AI agents with tools, memory, knowledge, and multi-agent workflows.
Analogy: Like a lightweight workshop for creating practical AI agents.
MCP
Explanation: MCP, or Model Context Protocol, is a standard way for AI apps and agents to connect to external tools, data, and services through a common interface.
Analogy: Like a universal adapter that lets AI assistants plug into many different systems.
A2A
Explanation: A2A, or Agent-to-Agent communication, refers to protocols and patterns that let AI agents communicate, delegate, and coordinate work with other agents.
Analogy: Like giving AI agents a shared language so they can work as a team.
Agent Orchestration
Explanation: Agent orchestration controls how agents, tools, memory, approvals, and steps are coordinated in a multi-step AI workflow.
Analogy: Like a conductor coordinating musicians so each one plays at the right time.
Agent State
Explanation: Agent state is the information an agent keeps track of while working through a task, such as current step, prior messages, tool results, decisions, and intermediate outputs.
Analogy: Like a project notebook that keeps track of what has already happened and what still needs to happen.
Durable Execution
Explanation: Durable execution means a long-running workflow can survive interruptions, retries, failures, or restarts without losing progress.
Analogy: Like saving your game progress so you can continue after the computer restarts.
Human-in-the-Loop Approval
Explanation: Human-in-the-loop approval means an AI workflow pauses and asks a person to review or approve an important step before continuing.
Analogy: Like a manager approving a payment before it is sent.
Tool Registry
Explanation: A tool registry is a list of tools an AI agent is allowed to use, including their names, descriptions, inputs, and rules.
Analogy: Like a toolbox inventory that tells the agent which tools exist and how to use them.
Framework vs Library
Explanation: A library gives you functions you call when needed; a framework provides a structure and often controls the flow of how your app is built.
Analogy: A library is a box of tools; a framework is a partially built house with rules for where things go.
Low-Code AI Workflow
Explanation: Low-code AI workflow tools help users build AI automations visually or with minimal code, often by connecting prompts, tools, APIs, and triggers.
Analogy: Like building an AI process with Lego blocks instead of writing all the code by hand.
Logistic Regression
Explanation: Logistic regression is a classification algorithm used to predict yes/no or category outcomes, such as spam vs not spam or approved vs denied.
Analogy: Like a decision boundary that says, 'If the score is above this line, choose yes; otherwise choose no.'
Inputs → Probability → Class
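The inputs-to-probability-to-class flow can be sketched as follows. The weights, bias, and the 0.5 threshold are assumptions for illustration; in practice the weights and bias are learned from training data.

```python
import math

def predict_probability(features, weights, bias):
    """Turn a weighted score into a probability between 0 and 1 with the sigmoid function."""
    score = sum(x * w for x, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def predict_class(features, weights, bias, threshold=0.5):
    """Choose class 1 (yes) when the probability reaches the threshold, else class 0 (no)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical single-feature model with hand-picked parameters.
print(predict_probability([2.0], [1.0], 0.0))
print(predict_class([2.0], [1.0], 0.0), predict_class([-2.0], [1.0], 0.0))
```

A score of exactly 0 maps to a probability of 0.5, which is where the decision boundary sits with the default threshold.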
Naive Bayes
Explanation: Naive Bayes is a classification algorithm based on probability. It assumes features are independent of each other given the class, a simplification that makes it fast and useful for text classification.
Analogy: Like guessing the topic of a document by counting clue words.
K-Nearest Neighbors
Explanation: KNN classifies a new example by looking at the most similar nearby examples and choosing the majority label.
Analogy: Like deciding a neighborhood's vibe by looking at the closest houses around it.
New point → nearest neighbors → majority class
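A small sketch of the nearest-neighbors vote, using straight-line (Euclidean) distance. The distance measure, the default `k=3`, and the sample points are assumptions chosen for the example.

```python
import math
from collections import Counter

def knn_classify(point, examples, k=3):
    """Label a point by majority vote among its k closest labeled examples."""
    if not 0 < k <= len(examples):
        raise ValueError("k must be between 1 and the number of examples")
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up labeled points: (coordinates, label).
examples = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
print(knn_classify((0.5, 0.5), examples))  # a
print(knn_classify((5.5, 5.5), examples))  # b
```

With two classes, an odd `k` avoids tied votes.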
Decision Tree
Explanation: A decision tree makes predictions by asking a sequence of yes/no questions until it reaches an answer.
Analogy: Like a flowchart that guides you to a decision.
Random Forest
Explanation: Random forest combines many decision trees and averages or votes across them to make a more reliable prediction.
Analogy: Instead of asking one expert, you ask a panel of experts and go with the majority.
Support Vector Machine
Explanation: SVM finds the best boundary that separates classes with the widest possible margin.
Analogy: Like drawing the cleanest fence between two groups of animals while leaving maximum space on both sides.
Linear Regression
Explanation: Linear regression predicts a number by fitting a straight-line relationship between inputs and output.
Analogy: Like drawing the best straight line through points on a chart.
x → y = mx + b → predicted number
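For a single input, the best-fit line y = mx + b can be found with the standard least-squares formulas, sketched below. The function name and the sample points are made up; the points lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread_x
    b = mean_y - m * mean_x
    return m, b

m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)       # 2.0 1.0
print(m * 5 + b)  # predicted y for x = 5
```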
Support Vector Regression
Explanation: Support Vector Regression predicts numeric values using the same margin-based idea as SVM, but for regression problems.
Analogy: Like drawing a prediction road with a tolerance lane around it.
Lasso Regression
Explanation: Lasso regression predicts numbers while shrinking less useful feature weights toward zero, which can help with feature selection.
Analogy: Like packing only the most useful tools and leaving unnecessary ones behind.
Ridge Regression
Explanation: Ridge regression is linear regression with regularization that keeps weights smaller to reduce overfitting.
Analogy: Like telling the model not to rely too heavily on any one clue.
K-Means Clustering
Explanation: K-means groups data into K clusters by finding center points and assigning examples to the nearest center.
Analogy: Like sorting people into groups based on who stands closest together in a room.
Data points → cluster centers → groups
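The assign-then-recenter loop behind K-means can be sketched as below. Passing in the starting centers by hand and running a fixed number of iterations are simplifications for this example; library implementations usually choose starting centers automatically and stop when the centers stop moving.

```python
import math

def assign_to_centers(points, centers):
    """Group each point with its nearest center (one list of points per center)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, initial_centers, iterations=10):
    """Alternate between assigning points and moving each center to its cluster's mean."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        clusters = assign_to_centers(points, centers)
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else center
            for center, cluster in zip(centers, clusters)  # an empty cluster keeps its old center
        ]
    return centers, assign_to_centers(points, centers)

# Made-up points forming two obvious groups.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers, clusters = k_means(points, initial_centers=[(0, 0), (10, 10)])
print(centers)
print(clusters)
```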
DBSCAN
Explanation: DBSCAN finds clusters based on dense regions of points and can identify outliers as noise.
Analogy: Like spotting crowded groups at a party and ignoring people standing alone.
Gaussian Mixture Model
Explanation: A Gaussian Mixture Model groups data by assuming it comes from a mixture of bell-shaped distributions.
Analogy: Like guessing that a crowd is made of several overlapping groups, each with its own center and spread.
Agglomerative Hierarchical Clustering
Explanation: This clustering method starts with each item as its own group, then repeatedly merges the closest groups into a hierarchy.
Analogy: Like building a family tree of similar items.
Mean Shift Clustering
Explanation: Mean shift finds clusters by moving points toward the densest nearby area.
Analogy: Like people gradually walking toward the busiest spot in a park.
Principal Component Analysis
Explanation: PCA reduces many features into fewer important dimensions while preserving as much variation as possible.
Analogy: Like summarizing a large spreadsheet with a few new combined columns that explain most of the pattern.
Q-Learning
Explanation: Q-learning teaches an agent which action is best in each situation by learning action values from rewards.
Analogy: Like learning the best moves in a game by trying actions and remembering what earns points.
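A sketch of the standard Q-learning update for a single step. The parameter names `alpha` (how much of the new estimate to blend in) and `gamma` (how much future reward counts) follow common convention but do not appear on this page, and the states and actions are hypothetical.

```python
def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Nudge the stored value of (state, action) toward reward + discounted best next value."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old_value = q_table.get((state, action), 0.0)
    new_value = old_value + alpha * (reward + gamma * best_next - old_value)
    q_table[(state, action)] = new_value
    return new_value

# Hypothetical two-action world; unseen (state, action) pairs start at 0.0.
q_table = {}
actions = ["left", "right"]
print(q_learning_update(q_table, "s0", "right", 1.0, "s1", actions, alpha=0.5))  # 0.5
print(q_learning_update(q_table, "s0", "right", 1.0, "s1", actions, alpha=0.5))  # 0.75
```

Repeating the same rewarded move raises its stored value step by step, which is the "remembering what earns points" part of the analogy.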
Markov Decision Process
Explanation: An MDP is a mathematical way to model decision-making where an agent moves between states, takes actions, and receives rewards.
Analogy: Like a board game where each move changes your position and score.
Gradient Boosting
Explanation: Gradient boosting builds many weak models one after another, where each new model focuses on fixing previous mistakes.
Analogy: Like a team of editors where each editor improves the previous draft.
XGBoost
Explanation: XGBoost is a powerful and efficient gradient boosting algorithm widely used for structured/tabular data.
Analogy: Like a highly optimized team of decision trees that correct each other's mistakes.
Feedforward Neural Network
Explanation: A feedforward neural network sends data in one direction from input to output through hidden layers.
Analogy: Like an assembly line where each station transforms the input before passing it forward.
Convolutional Neural Network
Explanation: A CNN is designed to detect patterns in images, such as edges, shapes, textures, and objects.
Analogy: Like scanning a picture with small filters that look for visual clues.
Image → filters → feature maps → prediction
Recurrent Neural Network
Explanation: An RNN processes sequential data by using information from previous steps.
Analogy: Like reading a sentence word by word while remembering what came before.
LSTM
Explanation: LSTM is a type of RNN designed to remember important information for longer periods and forget less useful information.
Analogy: Like a smart notebook that keeps important facts and erases noise.
GRU
Explanation: GRU is a simpler alternative to LSTM that also helps neural networks remember useful sequence information.
Analogy: Like a lighter version of LSTM with fewer gates to manage memory.
Transformer
Explanation: A transformer uses attention to understand relationships between tokens, even when they are far apart in the input.
Analogy: Like reading a paragraph and highlighting which words matter most to each other.
BERT
Explanation: BERT is a transformer-based model designed to understand text by reading context from both directions.
Analogy: Like understanding a missing word by looking at the words before and after it.
GPT
Explanation: GPT is a transformer-based model designed to generate text by predicting the next token.
Analogy: Like a very advanced autocomplete system that continues writing based on context.
Autoencoder
Explanation: An autoencoder learns to compress data into a smaller representation and reconstruct it.
Analogy: Like summarizing a document and then trying to recreate the original from the summary.
Variational Autoencoder
Explanation: A VAE learns a compressed probability-based representation that can generate new examples similar to the training data.
Analogy: Like learning the recipe pattern behind images so it can create new variations.
GAN
Explanation: A GAN has two neural networks: a generator that creates fake examples and a discriminator that tries to detect fakes.
Analogy: Like an art forger and an art detective improving together.
Generator vs Discriminator
Diffusion Model
Explanation: A diffusion model learns to create data by reversing a noise process, gradually turning noise into a clear image or output.
Analogy: Like starting with TV static and slowly cleaning it until a picture appears.
U-Net
Explanation: U-Net is a neural network architecture often used for image segmentation, especially in medical imaging.
Analogy: Like tracing the exact outline of objects in an image.
Vision Transformer
Explanation: A Vision Transformer applies transformer ideas to images by splitting an image into patches and using attention.
Analogy: Like cutting an image into puzzle pieces and deciding which pieces matter most.
Multimodal Model
Explanation: A multimodal model can work with more than one type of data, such as text, images, audio, or video.
Analogy: Like a person who can read, see, listen, and speak.
Encoder-Decoder Model
Explanation: An encoder-decoder model reads an input into a representation, then generates an output from it.
Analogy: Like reading a paragraph in one language and then writing it in another.
Sequence-to-Sequence Model
Explanation: A sequence-to-sequence model converts one sequence into another, such as text in one language to text in another.
Analogy: Like converting a recipe from English to Spanish step by step.
Neural Network Architecture
Explanation: A neural network architecture is the design pattern of a model, including its layers, connections, and how data flows.
Analogy: Like the blueprint of a building.
Pretrained Model
Explanation: A pretrained model has already learned useful patterns from large datasets and can be reused or adapted for new tasks.
Analogy: Like hiring someone who already has general experience before training them for your company.
Transfer Learning
Explanation: Transfer learning reuses knowledge from a pretrained model and adapts it to a new task.
Analogy: Like a doctor specializing in cardiology after already completing general medical training.
Temperature
Explanation: Temperature controls how random or creative the model's response is. Lower temperature makes answers more predictable and focused. Higher temperature makes answers more varied and creative.
Analogy: Like choosing how adventurous a chef should be. Low temperature follows the recipe exactly; high temperature experiments with new flavors.
Low = focused · High = creative
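A minimal sketch of how temperature reshapes next-token probabilities, assuming the common approach of dividing the model's raw scores (logits) by the temperature before applying softmax. The function name and the example scores are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, temperature=0.5)
high = softmax_with_temperature(logits, temperature=2.0)
```

With these made-up scores, the top token gets about 87% of the probability at temperature 0.5 and about 51% at temperature 2.0, so low temperature concentrates choices and high temperature spreads them out.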
Max Tokens
Explanation: Max tokens sets a hard cap on how much text the model can generate in its response. A lower value cuts the response off sooner, while a higher value leaves room for longer responses.
Analogy: Like setting a word limit on an essay. The model must stop once it reaches the allowed length.
Small limit = short answer · Large limit = longer answer
Stop Sequences
Explanation: Stop sequences are specific words, symbols, or text patterns that tell the model when to stop generating. When the model reaches that sequence, the response ends.
Analogy: Like telling someone, "Stop talking when you say 'END'."
Generate text → stop word found → output ends
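A small sketch of the effect of stop sequences, written as post-processing on already generated text. Real systems usually halt generation itself when a stop sequence appears; the helper name `truncate_at_stop` is hypothetical.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty stop strings
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

answer = truncate_at_stop("Paris is the capital. END Extra text", ["END", "###"])
```

Here `answer` keeps only the text before "END"; the stop sequence itself is not included in the output.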
Top P
Explanation: Top P, also called nucleus sampling, limits the model to the smallest set of most likely next tokens whose combined probability reaches P. Lower Top P narrows choices to safer, more common options. Higher Top P allows more variety.
Analogy: Like choosing from only the top few most likely menu items versus considering almost the whole menu.
Low Top P = narrow choices · High Top P = wider choices
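A minimal sketch of the filtering step in nucleus sampling, assuming the next-token probabilities are already available as a dictionary. The function name and the example vocabulary are made up for illustration.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalize to sum to 1

probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
narrow = nucleus_filter(probs, top_p=0.7)   # keeps "the" and "a"
```

The model would then sample only from the tokens that survive the filter.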
Frequency Penalty
Explanation: Frequency penalty reduces the chance that the model repeats words or phrases it has already used many times. Higher frequency penalty discourages repetition.
Analogy: Like a teacher saying, "You already said that word too many times. Use different wording."
Higher penalty = less repeated wording
Presence Penalty
Explanation: Presence penalty encourages the model to introduce new topics or ideas instead of staying too close to what it has already mentioned. It penalizes tokens simply for appearing before, not based on how often they appeared.
Analogy: Like encouraging a speaker to bring up new points instead of circling around the same topic.
Higher penalty = more new ideas
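A sketch of how frequency and presence penalties can be applied to token scores, assuming one common additive formulation: the frequency penalty is multiplied by how many times a token has already appeared, while the presence penalty is a flat deduction once a token has appeared at all. Providers may implement this differently; the function name and example scores are hypothetical.

```python
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appeared in the generated text."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1 if count > 0 else 0
        adjusted[token] = (
            logit
            - frequency_penalty * count  # grows with every repeat
            - presence_penalty * seen    # flat, applied once a token was used
        )
    return adjusted

logits = {"cat": 2.0, "dog": 2.0, "fish": 2.0}
history = ["cat", "cat", "cat", "dog"]
adjusted = apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.4)
```

In this example "cat" (used three times) drops the most, "dog" (used once) drops a little, and "fish" (never used) keeps its original score.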
Turing Test
Explanation: The Turing Test is a way to evaluate whether a machine can respond so naturally that a human judge may not be able to tell whether they are talking to a person or a computer.
Analogy: Like texting with two hidden participants, one human and one machine, and trying to guess which one is which based only on their answers.
Human judge → hidden conversation → human or machine?
Narrow AI
Explanation: Narrow AI is AI designed to perform one specific task or a limited set of tasks. Most AI systems used today are narrow AI, including chatbots, recommendation engines, image recognition systems, and fraud detection models.
Analogy: Like a specialist doctor who is very good in one area, such as heart care, but is not trained to do every type of medical work.
Specific task → specialized AI system
General AI
Explanation: General AI, also called AGI, refers to a theoretical AI system that could understand, learn, and perform many different intellectual tasks at a human level. It would not be limited to one narrow task.
Analogy: Like a person who can learn many different jobs (teacher, engineer, writer, analyst) instead of being trained for only one task.
Many tasks → one flexible intelligence
Superintelligence
Explanation: Superintelligence is a hypothetical form of AI that would surpass human intelligence across many areas, such as reasoning, creativity, science, strategy, and problem-solving.
Analogy: Like comparing a beginner chess player to a grandmaster, except the AI would be far beyond human experts across many fields, not just one game.
Human-level intelligence → beyond-human intelligence
Symbolic AI
Explanation: Symbolic AI is an older approach to artificial intelligence that uses explicit rules, logic, and symbols instead of learning patterns from large amounts of data. It tries to represent knowledge in a structured, rule-based way.
Analogy: Like giving a computer a detailed rulebook and asking it to follow the rules step by step instead of learning from examples.
Rules + logic → AI decision
Bias-Variance Tradeoff
Explanation: The bias-variance tradeoff explains two common ways a model can fail. High bias means the model is too simple and misses important patterns. High variance means the model is too sensitive to training data and may memorize noise instead of learning the real pattern.
Analogy: Like a student preparing for an exam. High bias is a student who oversimplifies everything and misses details. High variance is a student who memorizes exact practice questions but struggles when the exam is worded differently.
High bias = underfitting · High variance = overfitting
Cross-Validation
Explanation: Cross-validation is a technique for evaluating a model by splitting the data multiple ways. The model is trained and tested on different portions of the data to get a more reliable estimate of performance.
Analogy: Like taking several practice exams instead of trusting one test score. If you perform well across many practice exams, your understanding is probably more reliable.
Split data → train/test multiple times → average performance
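A minimal sketch of k-fold cross-validation splits, one common way of splitting the data multiple ways. It only produces index lists; training and scoring a model on each split is left out. The function name is hypothetical, and no shuffling is done here, which real workflows often add.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(n_samples=10, k=3))
```

Every sample lands in the test portion exactly once, so averaging the scores from all folds uses the whole dataset for evaluation.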
Confusion Matrix
Explanation: A confusion matrix is a table that shows how well a classification model performed by counting correct and incorrect predictions. It includes true positives, false positives, true negatives, and false negatives.
Analogy: Like a detailed scoreboard. It does not just say the team won or lost; it shows exactly where the mistakes happened.
Actual vs Predicted → TP / FP / TN / FN
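A small sketch that counts the four confusion-matrix cells for a binary classifier. The labels below are made-up example data, and the function name is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, really positive
        elif p == positive:
            fp += 1   # predicted positive, really negative
        elif a == positive:
            fn += 1   # predicted negative, really positive
        else:
            tn += 1   # predicted negative, really negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
# counts == {"TP": 3, "FP": 1, "TN": 3, "FN": 1}
```

Metrics such as precision (TP / (TP + FP)) and recall (TP / (TP + FN)) are computed from these four counts.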
ROC Curve & AUC
Explanation: An ROC curve plots the true positive rate against the false positive rate at different decision thresholds. AUC summarizes the curve into one score that represents how well the model separates classes.
Analogy: Like testing a smoke alarm at different sensitivity levels. You want it to catch real fires without too many false alarms.
Threshold changes → ROC curve → AUC score
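A sketch of AUC using its ranking interpretation: the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as half. This pairwise version is easy to read but slow on large datasets; the function name and data are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

auc = roc_auc(labels=[0, 0, 1, 1], scores=[0.1, 0.4, 0.35, 0.8])  # 0.75
```

A score of 1.0 means every positive outranks every negative, and 0.5 is what random guessing would give.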
Ensemble Methods
Explanation: Ensemble methods combine multiple models to make a stronger prediction than a single model. Common ensemble approaches include bagging, boosting, and stacking.
Analogy: Like asking several experts for their opinions and combining their answers instead of trusting only one person.
Model 1 + Model 2 + Model 3 → stronger prediction
Dropout
Explanation: Dropout is a regularization technique used during neural network training. It randomly turns off some neurons during each training step so the model does not rely too heavily on any single neuron or pathway.
Analogy: Like making a basketball team practice while randomly benching a few players each time. Everyone learns to contribute instead of depending on one star player.
Training step → randomly disable neurons → stronger generalization
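A minimal sketch of dropout on a list of activations, assuming the widely used "inverted dropout" variant, where surviving values are scaled up during training so that nothing needs to change at inference time. The function name and the `rng` parameter are hypothetical.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Randomly zero out activations during training (inverted dropout)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off outside training
    keep = 1.0 - rate
    # Survivors are divided by the keep probability so the expected value is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded so the example is repeatable
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
```

Roughly half of the values become 0.0 and the rest become 2.0, so the average stays close to the original 1.0.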
Batch Normalization
Explanation: Batch normalization is a technique that normalizes values inside a neural network during training. It helps training become more stable, often faster, and less sensitive to poor starting conditions.
Analogy: Like making sure every ingredient in a recipe is measured consistently before cooking. When the measurements stay stable, the final result becomes easier to control.
Layer values → normalize → more stable training
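A simplified sketch of the normalization step for a single feature across one batch: subtract the batch mean, divide by the batch standard deviation, then apply a scale (`gamma`) and shift (`beta`). In a real layer `gamma` and `beta` are learned and running statistics are kept for inference; here they are plain arguments, and the function name is hypothetical.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch to roughly zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps avoids division by zero when every value in the batch is identical.
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]

normalized = batch_norm([10.0, 20.0, 30.0, 40.0])
```

After this step the values are centered around 0 with a spread close to 1, regardless of the scale they started on.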
Vanishing Gradient
Explanation: Vanishing gradient is a training problem where the error signal becomes very small as it moves backward through many neural network layers. When this happens, earlier layers learn very slowly or almost stop learning.
Analogy: Like passing a message through many people, but each person whispers it more softly. By the time it reaches the first person, the message is almost gone.
Backpropagation → smaller gradients → early layers learn slowly
Exploding Gradient
Explanation: Exploding gradient is a training problem where the error signal becomes extremely large as it moves backward through neural network layers. This can cause weight updates to become unstable and make training fail.
Analogy: Like trying to slightly adjust a shower temperature, but the knob suddenly jumps too far and makes the water scalding hot or freezing cold.
Backpropagation → huge gradients → unstable updates
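A toy calculation that shows why both vanishing and exploding gradients appear in deep networks. It assumes, as a simplification, that every layer multiplies the backward signal by the same factor; real networks have a different factor at each layer. The function name is hypothetical.

```python
def backprop_scale(per_layer_factor, n_layers):
    """Relative size of the gradient after passing backward through n_layers."""
    return per_layer_factor ** n_layers

vanishing = backprop_scale(0.5, 20)  # about 0.00000095: the signal nearly disappears
exploding = backprop_scale(1.5, 20)  # about 3325: the signal blows up
```

A factor only slightly below or above 1 is enough to cause trouble, because the effect compounds with every layer.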
Decoder-only Architecture
Explanation: Decoder-only architecture is the model design used by GPT-style language models. It generates text one token at a time by looking at the previous tokens and predicting what should come next.
Analogy: Like writing a sentence from left to right, choosing the next word based on everything already written.
Previous tokens → predict next token → repeat
Encoder-only Architecture
Explanation: Encoder-only architecture is used by BERT-style models that focus on understanding text rather than generating long responses. These models read the input and create rich representations of its meaning.
Analogy: Like a careful reader who studies a paragraph deeply to understand what it means, but is not mainly trying to write the next paragraph.
Text input → encoded meaning representation
Encoder-Decoder Architecture
Explanation: Encoder-decoder architecture uses one part of the model to read and understand the input, and another part to generate the output. It is common in translation, summarization, and text-to-text tasks.
Analogy: Like one person reading a document carefully and another person writing a clean summary from that understanding.
Encoder reads input → decoder writes output
Masked Language Modeling
Explanation: Masked language modeling is a training method where some words or tokens are hidden, and the model learns to predict the missing pieces from the surrounding context.
Analogy: Like a fill-in-the-blank worksheet where the model learns by guessing the missing words.
The cat sat on the [MASK] → model predicts mat
Causal Language Modeling
Explanation: Causal language modeling trains a model to predict the next token using only the tokens that came before it. This is the core training style for many generative LLMs.
Analogy: Like autocomplete that can only look at what you have already typed, not future words.
Past tokens → next token
Next Token Prediction
Explanation: Next token prediction is the task of predicting the next small piece of text in a sequence. Many generative language models learn by practicing this task at massive scale.
Analogy: Like a very advanced autocomplete system that keeps guessing what should come next.
I like machine → learning
Self-Attention
Explanation: Self-attention lets each token in a sequence look at other tokens in the same sequence to understand which words matter most for meaning.
Analogy: Like reading a sentence and highlighting the earlier words that help explain the current word.
Each token → related tokens
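A stripped-down sketch of self-attention using scaled dot-product scores. As a simplification, each token vector serves directly as its own query, key, and value; real transformers first pass the vectors through learned projection matrices. The function names and the tiny 2-dimensional vectors are made up for illustration.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Return (outputs, weights): each output is a weighted mix of all token vectors."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * value[j] for w, value in zip(weights, vectors)) for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # tokens 0 and 2 are identical
outputs, weights = self_attention(tokens)
```

Token 0 gives more weight to token 2 (which looks like it) than to token 1, which is the "which words matter most" idea in numeric form.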
Multi-Head Attention
Explanation: Multi-head attention runs multiple attention mechanisms in parallel, allowing the model to focus on different relationships in the text at the same time.
Analogy: Like several reviewers reading the same paragraph, where one focuses on grammar, another on meaning, and another on references.
Multiple attention heads → combined understanding
Positional Encoding
Explanation: Positional encoding adds information about token order so a transformer can understand sequence position. Without it, the model would know the words but not their order.
Analogy: Like numbering each word in a sentence so the model knows what came first, second, and third.
Token + position information → transformer input
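A sketch of one well-known positional encoding scheme, the sinusoidal encoding, where even dimensions use sine and odd dimensions use cosine at different frequencies. Many models use other schemes, such as learned position embeddings, so treat this as one example. The function name is hypothetical.

```python
import math

def sinusoidal_position(position, d_model):
    """Return a d_model-length vector that encodes a token's position."""
    encoding = []
    for i in range(d_model):
        # Each pair of dimensions (2k, 2k+1) shares one frequency.
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

first = sinusoidal_position(0, d_model=4)   # [0.0, 1.0, 0.0, 1.0]
second = sinusoidal_position(1, d_model=4)
```

Each position gets a different vector, which is added to the token's embedding so the model can tell "first word" from "second word".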
Top-K Sampling
Explanation: Top-K sampling limits the model to choosing from only the K most likely next tokens. This can reduce strange outputs by ignoring very unlikely choices.
Analogy: Like choosing dinner only from the top 5 most popular menu items instead of considering the entire menu.
All tokens → top K choices → sample next token
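A minimal sketch of Top-K sampling, assuming next-token probabilities are given as a dictionary. The function name, the `rng` parameter, and the example vocabulary are made up for illustration.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Sample one token from only the k most likely candidates."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in ranked]
    weights = [p for _, p in ranked]  # random.choices accepts unnormalized weights
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
rng = random.Random(0)  # seeded so the example is repeatable
samples = [top_k_sample(probs, k=2, rng=rng) for _ in range(50)]
```

With k=2 only "the" and "a" can ever be chosen, so the unlikely "zebra" never appears.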
Beam Search
Explanation: Beam search is a decoding strategy that keeps several likely output paths at the same time and chooses the strongest sequence. It is often used when the goal is a high-probability, structured output.
Analogy: Like exploring several promising roads at once before choosing the best route to the destination.
Keep best paths → compare sequences → choose strongest output
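A toy sketch of beam search over a tiny hand-written probability table. The table, the token names, and the function names are all made up; a real system would get next-token probabilities from a language model.

```python
import math

# Hypothetical next-token probabilities, keyed by the last token of the sequence.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
}

def toy_model(sequence):
    return TABLE.get(sequence[-1], {})

def beam_search(next_token_probs, start, steps, beam_width):
    """Return the best (sequence, log_probability) found with the given beam width."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for sequence, score in beams:
            options = next_token_probs(sequence)
            if not options:
                candidates.append((sequence, score))  # nothing left to add
                continue
            for token, p in options.items():
                candidates.append((sequence + [token], score + math.log(p)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]  # keep only the strongest paths
    return beams[0]

greedy = beam_search(toy_model, "<s>", steps=2, beam_width=1)
wider = beam_search(toy_model, "<s>", steps=2, beam_width=2)
```

With a beam width of 1 the search commits to "a" early and ends with total probability 0.30. With a width of 2 it also keeps "b" alive and finds "b z" at 0.36, which shows why keeping several paths can beat always taking the single best next step.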
Embedding Model
Explanation: An embedding model is a model designed to convert text, images, or other data into numeric vectors that capture meaning. These vectors can then be used for similarity search, recommendations, clustering, and RAG.
Analogy: Like a machine that gives every idea a GPS coordinate so similar ideas end up close together on the map.
Text or data → embedding model → meaning vector
Cosine Similarity
Explanation: Cosine similarity compares the direction of two vectors to estimate how similar their meanings are. It is commonly used to compare embeddings in semantic search and RAG systems.
Analogy: Like checking whether two arrows point in the same direction. The closer their direction, the more similar the ideas.
Vector A ↗ + Vector B ↗ = high similarity
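A short sketch of the cosine similarity formula: the dot product of two vectors divided by the product of their lengths. The function name and the example vectors are made up; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return a value in [-1, 1]: 1 = same direction, 0 = unrelated, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # close to 1.0
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # 0.0
```

Because only direction matters, a vector and a longer copy of it count as maximally similar.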
Maximum Marginal Relevance
Explanation: Maximum Marginal Relevance, or MMR, is a retrieval method that balances relevance and diversity. It tries to return useful results without giving several nearly identical chunks.
Analogy: Like choosing search results that are all helpful, but not all repeating the same paragraph in different words.
Relevant results + diverse results → better retrieval set
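A sketch of MMR selection, assuming similarity scores have already been computed (for example with cosine similarity). `lambda_weight` is a hypothetical parameter name for the relevance versus diversity balance: 1.0 means pure relevance, lower values penalize results that resemble ones already chosen. The scores are made-up example data.

```python
def mmr_select(query_sim, doc_sim, k, lambda_weight=0.5):
    """Pick k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy = similarity to the closest already selected result.
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lambda_weight * query_sim[i] - (1 - lambda_weight) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query_sim = [0.90, 0.88, 0.60]   # relevance of each chunk to the question
doc_sim = [                      # chunks 0 and 1 are near duplicates
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.10],
    [0.10, 0.10, 1.00],
]
balanced = mmr_select(query_sim, doc_sim, k=2, lambda_weight=0.5)        # [0, 2]
relevance_only = mmr_select(query_sim, doc_sim, k=2, lambda_weight=1.0)  # [0, 1]
```

The balanced setting skips chunk 1 because it nearly repeats chunk 0, and picks the less relevant but different chunk 2 instead.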
Hybrid Search
Explanation: Hybrid search combines keyword search and semantic search. This helps systems find exact terms when needed while also understanding meaning when users phrase things differently.
Analogy: Like searching a library by both exact book title and topic meaning at the same time.
Keyword search + semantic search → stronger retrieval
Reranking
Explanation: Reranking is a second-stage retrieval step that reorders candidate results so the most relevant chunks come first when context is passed to the LLM.
Analogy: Like first collecting many possible resumes, then having a recruiter rank the best candidates before sending them to the hiring manager.
Retrieve candidates → rerank → send best context
Self-Query Retrieval
Explanation: Self-query retrieval uses an LLM to convert a natural-language question into a structured search query with filters, such as date, category, source, or metadata constraints.
Analogy: Like asking a librarian to translate your plain-English question into exact database filters.
User question → structured query → filtered retrieval
Parent Document Retriever
Explanation: A parent document retriever searches smaller chunks for accuracy but returns a larger parent document or section for better context. This helps avoid giving the LLM tiny fragments without enough surrounding meaning.
Analogy: Like finding one sentence on an index card, then reading the full page around it to understand the context.
Search small chunk → return larger parent context
Reflection
Explanation: Reflection is an agent pattern where the AI reviews its own output, identifies possible mistakes, and uses that feedback to improve the next attempt.
Analogy: Like proofreading your own essay, noticing weak points, and rewriting it before submitting.
Answer → review → improve → final answer
Planning
Explanation: Planning is when an AI agent breaks a larger goal into smaller steps before acting. This helps the agent organize work instead of jumping straight into action.
Analogy: Like making a travel itinerary before starting a trip, so you know where to go first, second, and third.
Goal → steps → actions
Multi-Agent System
Explanation: A multi-agent system uses more than one AI agent to solve a problem. Agents may collaborate, divide responsibilities, review each other, or compete to improve results.
Analogy: Like a project team where one person researches, another writes, another reviews, and another manages the plan.
Agent A + Agent B + Agent C → coordinated result
Orchestration
Explanation: Orchestration is the coordination of agent steps, tools, memory, and workflows so the right action happens at the right time.
Analogy: Like a music conductor coordinating different instruments so they play together instead of making noise separately.
Plan + tools + memory + steps → coordinated workflow
Reinforcement Learning from Human Feedback
Explanation: Reinforcement Learning from Human Feedback, or RLHF, is a training approach where human preferences are used to guide a model toward responses people consider more helpful, safe, or aligned.
Analogy: Like a coach watching practice and saying which responses were better, so the model learns what people prefer.
Model outputs → human preferences → reward training
Constitutional AI
Explanation: Constitutional AI is an approach where a model is guided by a written set of principles or rules that help shape safer and more helpful behavior.
Analogy: Like giving an assistant a constitution to follow when deciding what is acceptable or not.
Principles → model critique → safer response
Self-Supervised Learning
Explanation: Self-supervised learning is a training approach where the model learns from the structure of the data itself instead of relying on human-provided labels. The system creates its own learning signal from the data.
Analogy: Like making your own quiz from a textbook. You hide part of the material, try to guess it, and learn from whether you were right.
Raw data → create training signal → learn patterns
Contrastive Learning
Explanation: Contrastive learning teaches a model by pulling similar examples closer together in representation space and pushing different examples farther apart.
Analogy: Like organizing photos by putting similar pictures in the same pile and moving unrelated pictures to different piles.
Similar examples closer · different examples farther
Few-Shot Learning
Explanation: Few-shot learning is the ability to learn or adapt from only a small number of examples. In LLMs, it often means giving a few examples in the prompt so the model follows the pattern.
Analogy: Like understanding a new card game after watching only two or three rounds.
Few examples → model adapts pattern
Curriculum Learning
Explanation: Curriculum learning is a training strategy where the model starts with easier examples and gradually moves to harder ones.
Analogy: Like learning math by starting with addition, then multiplication, then algebra, instead of starting with calculus on day one.
Easy examples → medium examples → hard examples
Learning Rate Scheduling
Explanation: Learning rate scheduling changes the learning rate during training. This can help the model learn quickly early on and make smaller, more careful updates later.
Analogy: Like driving faster on an open highway, then slowing down as you approach a parking spot.
Learning rate changes over training time
Warmup Steps
Explanation: Warmup steps gradually increase the learning rate at the beginning of training so the model does not start with updates that are too aggressive.
Analogy: Like warming up before running at full speed to avoid injury.
Small learning rate → gradually increase → normal training
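A sketch that combines warmup steps with a learning rate schedule. It assumes a linear warmup followed by cosine decay, which is one popular combination and only one of many possible schedules. The function name and the step counts are made up for illustration.

```python
import math

def learning_rate(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Ramp up evenly so the first updates are small.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

schedule = [learning_rate(s, base_lr=1e-3, warmup_steps=10, total_steps=100)
            for s in range(101)]
```

In this example the rate climbs from 0.0001 to 0.001 over the first 10 steps, then falls smoothly to about 0 by step 100.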
Gradient Accumulation
Explanation: Gradient accumulation lets training collect gradients over multiple smaller batches before applying one update. This can simulate a larger batch size when memory is limited.
Analogy: Like collecting several small payments before making one larger deposit.
Small batch + small batch + small batch → one update
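A tiny sketch showing that accumulating gradients over equal-sized small batches gives the same result as one large batch. It uses a one-parameter model y = w * x with mean squared error so the gradient can be written by hand; the data and function names are made up for illustration.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error with respect to w for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_gradient(w, micro_batches):
    """Sum scaled gradients from several small batches before one update."""
    total = 0.0
    for batch in micro_batches:
        # Dividing by the number of micro-batches turns the sum into an average.
        total += mse_gradient(w, batch) / len(micro_batches)
    return total

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
micro_batches = [data[:2], data[2:]]        # two small batches of equal size

full = mse_gradient(0.5, data)              # one large batch
accumulated = accumulated_gradient(0.5, micro_batches)
```

The two values match (up to floating-point rounding), which is why accumulation can stand in for a larger batch when memory is tight.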
Mixed Precision Training
Explanation: Mixed precision training uses lower-precision numbers for some calculations to make training faster and use less memory while keeping enough accuracy.
Analogy: Like using shorthand notes to work faster while still preserving the important meaning.
Lower precision math → faster training + less memory
Distillation
Explanation: Distillation is a technique where a smaller model is trained to imitate a larger, more capable model. The goal is to keep much of the quality while making the model cheaper and faster.
Analogy: Like a student who learns from an expert teacher and can then answer most of the same questions faster and at lower cost.
Large teacher model → smaller student model
Quantization
Explanation: Quantization reduces the numerical precision of model weights or calculations so the model uses less memory and can run faster.
Analogy: Like rounding numbers to save space while keeping the answer close enough for practical use.
High precision numbers → lower precision numbers → smaller model
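A minimal sketch of quantization, assuming a simple symmetric 8-bit scheme: every weight is divided by one shared scale and rounded to an integer between -127 and 127. Production methods are usually more refined (per-channel scales, calibration data); the function names and weights here are made up.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_int8(weights)   # [50, -127, 0, 100]
restored = dequantize(quantized, scale)
```

The restored values are close to the originals but not identical: the small weight 0.003 rounds to 0, which is the kind of precision that is traded away for a smaller model.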
Pruning
Explanation: Pruning removes unnecessary weights, neurons, or connections from a model to make it smaller or faster while trying to preserve performance.
Analogy: Like trimming dead branches from a tree so the healthy parts can remain strong.
Large model → remove unnecessary parts → smaller model
Perplexity
Explanation: Perplexity is a metric used to evaluate language models by measuring how surprised the model is by a sequence of text. Lower perplexity usually means the model is better at predicting the text.
Analogy: Like reading a sentence and asking, "How unexpected was the next word?" If the words are easy to predict, the model is less confused.
Lower perplexity = less surprise
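A short sketch of the perplexity calculation: take the probability the model assigned to each actual token, average the negative logs, and exponentiate. The probabilities below are made up; in practice they come from the language model being evaluated.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model gave to the tokens that occurred."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

confident = perplexity([0.9, 0.8, 0.95])   # low: the text was easy to predict
unsure = perplexity([0.2, 0.1, 0.25])      # high: the text was surprising
uniform = perplexity([0.25] * 4)           # about 4
```

A perplexity of about 4 can be read as "the model was as unsure as if it were choosing evenly among 4 options at each step".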
BLEU Score
Explanation: BLEU Score is a metric often used to evaluate machine translation by comparing generated text against one or more reference translations.
Analogy: Like comparing a student's translation to an expert translation and checking how many phrases overlap.
Generated translation vs reference translation → BLEU
ROUGE Score
Explanation: ROUGE Score is a metric often used to evaluate summaries by comparing overlap between a generated summary and a reference summary.
Analogy: Like checking whether a student's summary included the same important points as the teacher's model summary.
Generated summary vs reference summary → ROUGE
A/B Test
Explanation: An A/B test compares two versions of a model, prompt, feature, or user experience with real users to see which performs better.
Analogy: Like showing two store layouts to different customers and measuring which one leads to more purchases.
Version A vs Version B → compare results
Holdout Set
Explanation: A holdout set is data kept separate from training and tuning so it can provide a more honest final check of model performance.
Analogy: Like keeping a sealed final exam that students cannot practice on before test day.
Training/tuning data separate from final holdout data
Validation Set
Explanation: A validation set is data used during model development to tune choices like hyperparameters, thresholds, or model versions without touching the final test set.
Analogy: Like using practice exams to adjust your study strategy before taking the final exam.
Train → validate/tune → final test
LLM-as-Judge
Explanation: LLM-as-Judge means using a language model to evaluate outputs from another model or system. It can help score quality, helpfulness, correctness, tone, or adherence to instructions.
Analogy: Like asking one expert reviewer to grade another assistant's answer using a rubric.
Model output → judge model → score or feedback
Model Registry
Explanation: A model registry is a centralized place to store, version, track, and manage trained models before and after deployment.
Analogy: Like a library catalog for model versions, showing which model is approved, which is experimental, and which is currently in production.
Train model → register version → deploy selected version
Feature Store
Explanation: A feature store is a system for storing, managing, and serving reusable model features consistently for training and production inference.
Analogy: Like a shared pantry where every team uses the same approved ingredients instead of preparing different versions separately.
Raw data → features → training and inference
Canary Deployment
Explanation: Canary deployment gradually releases a new model or feature to a small percentage of users before rolling it out to everyone.
Analogy: Like testing a new recipe with a few customers before adding it to the full restaurant menu.
Small traffic → monitor → full rollout
Blue-Green Deployment
Explanation: Blue-green deployment runs two environments: the current version and the new version. Traffic can switch from one to the other when the new version is ready.
Analogy: Like opening a new checkout lane and moving customers over only after confirming it works smoothly.
Blue environment → switch traffic → Green environment
Continuous Training
Explanation: Continuous training is the process of automatically retraining models as new data becomes available, usually as part of an MLOps pipeline.
Analogy: Like a student who keeps studying new lessons every week instead of relying only on old knowledge.
New data → retrain → evaluate → deploy
Data Drift
Explanation: Data drift happens when the input data seen by a model changes over time compared to the data it was trained on.
Analogy: Like a store that trained its sales forecast on old customer habits, but customer behavior changes after a new competitor opens nearby.
Training data distribution ≠ current input data
Concept Drift
Explanation: Concept drift happens when the relationship between inputs and outputs changes over time, even if the input data still looks similar.
Analogy: Like credit risk changing during an economic downturn: the same income and debt profile may no longer mean the same level of risk.
Same inputs → changed meaning → changed output relationship
Feedback Loop
Explanation: A feedback loop happens when a model's outputs influence future user behavior or future data, which can then affect later model performance.
Analogy: Like a recommendation system showing certain videos, users watching more of them, and the system learning to recommend even more of the same type.
Model output → user behavior → future data → model impact
Model Cards
Explanation: Model cards are documents that describe a model's purpose, intended use, performance, limitations, risks, and responsible-use guidance.
Analogy: Like a nutrition label for an AI model, showing what it is for, what it contains, and what warnings to know before using it.
Model details → performance → limitations → intended use
Algorithmic Fairness
Explanation: Algorithmic fairness is the practice of measuring and reducing unfair outcomes produced by AI systems. It looks at whether different groups are treated consistently and appropriately by a model or decision process.
Analogy: Like checking whether the rules of a game are fair for every team, not just whether the final score looks reasonable.
Model outcomes → compare groups → fairness check
Disparate Impact
Explanation: Disparate impact happens when a system appears neutral but its outcomes disproportionately harm or exclude a protected or sensitive group.
Analogy: Like a rule that applies to everyone on paper but ends up blocking one group much more often in practice.
Neutral rule → uneven outcome across groups
Toxicity Filtering
Explanation: Toxicity filtering detects and blocks harmful, abusive, hateful, or unsafe language before it reaches users or systems.
Analogy: Like a safety screen that catches dangerous language before it enters a classroom discussion.
Input/output text → safety filter → allow or block
Content Moderation
Explanation: Content moderation is the process of reviewing, filtering, or managing user-generated content to enforce safety rules, platform policies, or community standards.
Analogy: Like a moderator keeping a public meeting respectful and removing harmful comments.
User content → policy check → allow, flag, or remove
Interpretability vs Explainability
Explanation: Interpretability focuses on understanding how a model works internally, while explainability focuses on giving understandable reasons for a model's decision or output.
Analogy: Interpretability is opening the engine to see how it works. Explainability is telling the driver why the car slowed down.
Interpretability = inside model · Explainability = understandable reason
LIME
Explanation: LIME is a technique that explains an individual model prediction by approximating the model locally with a simpler, easier-to-understand model.
Analogy: Like zooming into one neighborhood and using a simple local map to explain what is happening there, even if the full city map is complex.
Complex model prediction → local simple explanation
SHAP
Explanation: SHAP is an explanation method based on game theory that estimates how much each feature contributed to a model prediction.
Analogy: Like dividing credit among team members for a win based on how much each person contributed.
Prediction → feature contributions → explanation
Adversarial Attack
Explanation: An adversarial attack uses carefully designed inputs to trick an AI system into making a wrong prediction or unsafe decision.
Analogy: Like creating an optical illusion specifically designed to fool a computer vision model.
Tiny input change → wrong model output
Membership Inference
Explanation: Membership inference is an attack that tries to determine whether a specific person's data or record was included in a model's training data.
Analogy: Like trying to guess whether someone's private file was part of a confidential training folder by observing how the model responds.
Model behavior → infer training membership
Model Inversion
Explanation: Model inversion is an attack that attempts to reconstruct sensitive training data or private attributes by querying or analyzing a model.
Analogy: Like trying to recreate a secret recipe by tasting the final dish many times.
Model access → infer hidden data
Synthetic Data
Explanation: Synthetic data is artificially generated data that imitates real data. It can be used for testing, privacy protection, data balancing, or simulation when real data is limited or sensitive.
Analogy: Like using realistic practice patients in medical training instead of real patient records.
Real patterns → generated data → safer testing/training
LLM Cost Economics
Explanation: LLM cost economics means understanding how token usage, model choice, request volume, caching, and output length affect the cost of running an AI product.
Analogy: Like tracking electricity usage in a factory. The machines may be powerful, but every minute they run has a cost.
Tokens + model price + traffic volume → AI cost
Prompt Caching
Explanation: Prompt caching reuses repeated parts of prompts so the system can reduce cost and latency when the same context is sent many times.
Analogy: Like preparing common ingredients ahead of time instead of chopping them again for every order.
Repeated prompt context → cache hit → lower cost/latency
Token Budgeting
Explanation: Token budgeting is planning how many tokens are used for instructions, user input, retrieved context, and model output so the app stays within cost and context limits.
Analogy: Like budgeting money before shopping so you know how much can go to groceries, gas, and savings.
Prompt + context + answer ≤ token budget
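Code sketch: One possible way to keep a request inside a budget, assuming retrieved chunks arrive ranked from most to least relevant. count_tokens is a crude word-count stand-in (real tokenizers split text differently), and fit_context is a hypothetical helper name.

```python
def count_tokens(text):
    # Crude stand-in: counts whitespace-separated words, not real tokens.
    return len(text.split())


def fit_context(instructions, user_input, chunks, max_output_tokens, budget):
    """Return the leading chunks that fit after reserving room for the
    instructions, the user input, and the model's answer."""
    remaining = (budget - count_tokens(instructions)
                 - count_tokens(user_input) - max_output_tokens)
    if remaining < 0:
        raise ValueError("fixed parts already exceed the token budget")
    kept = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > remaining:
            break  # stop at the first chunk that does not fit, keeping rank order
        kept.append(chunk)
        remaining -= cost
    return kept
```

Design note: the loop stops at the first chunk that does not fit instead of skipping ahead to smaller chunks, so the kept context always follows the relevance ranking.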
Rate Limiting
Explanation: Rate limiting controls how many requests a user, app, or system can make within a certain time period. It protects cost, reliability, and system capacity.
Analogy: Like limiting how many people can enter a store per minute so the store does not become overcrowded.
User requests → limit check → allow or slow down
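Code sketch: A minimal sliding-window limiter, one of several common ways to do rate limiting. The class name is hypothetical, and the current time is passed in as a number of seconds so the behavior is easy to follow and test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per user within the last window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user_id -> recent request times

    def allow(self, user_id, now):
        history = self._history[user_id]
        # Forget requests that have fallen out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False  # over the limit: reject or ask the caller to slow down
        history.append(now)
        return True
```

Usage: with SlidingWindowLimiter(2, 60), a user's third request inside the same minute is refused, while a different user is still allowed.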
Grounding
Explanation: Grounding connects AI outputs to trusted sources, retrieved evidence, or known facts so answers are easier to verify and less likely to be unsupported.
Analogy: Like requiring a student to cite the textbook page instead of giving an answer from memory.
AI answer + supporting source = grounded response
User Feedback Loop
Explanation: A user feedback loop collects user ratings, corrections, comments, or behavior signals and uses them to improve the AI product over time.
Analogy: Like customers reviewing a service so the business can learn what worked, what failed, and what to improve.
User feedback → analysis → product improvement
Shadow Mode
Explanation: Shadow mode runs an AI system in the background without affecting real decisions. Teams can compare its outputs against the current process before trusting it in production.
Analogy: Like a trainee making recommendations that are reviewed silently but not used until they prove reliable.
AI runs silently → compare results → decide rollout
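Code sketch: A small illustration of shadow mode, assuming the current process and the new AI system can both be called as plain functions. handle_request and the log format are hypothetical.

```python
def handle_request(request, current_system, shadow_model, log):
    """Serve the current system's decision; record the shadow model's
    output for later comparison without ever acting on it."""
    decision = current_system(request)
    try:
        shadow = shadow_model(request)
        log.append({"request": request, "live": decision,
                    "shadow": shadow, "agree": shadow == decision})
    except Exception as exc:
        # A failing shadow model must never affect the real decision.
        log.append({"request": request, "live": decision,
                    "shadow_error": repr(exc)})
    return decision
```

Usage: the agreement rate in the log is one signal a team could review before deciding on rollout.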
Confidence Thresholding
Explanation: Confidence thresholding means allowing an AI system to act automatically only when its confidence is high enough. Low-confidence cases can be sent to a human or handled more cautiously.
Analogy: Like asking for manager approval when an employee is unsure instead of letting them make a risky decision alone.
High confidence → automate · Low confidence → review
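Code sketch: The routing rule in a few lines, assuming the model returns a confidence score between 0 and 1. The function name and the 0.9 default are illustrative; a real threshold would be tuned on evaluation data.

```python
def route(prediction, confidence, threshold=0.9):
    """Automate only when confidence meets the threshold; otherwise
    send the case to human review."""
    if confidence >= threshold:
        return ("automate", prediction)
    return ("review", prediction)
```

Usage: route("refund", 0.95) is handled automatically, while route("refund", 0.6) goes to a reviewer.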
GPU
Explanation: A GPU, or graphics processing unit, is a chip designed to perform many calculations in parallel. GPUs are widely used to speed up AI training and inference.
Analogy: Like a factory with thousands of workers doing similar small tasks at the same time instead of one person doing them one by one.
Many parallel calculations → faster AI workloads
TPU
Explanation: A TPU, or tensor processing unit, is a specialized chip developed by Google to accelerate machine learning workloads, especially the tensor operations used in neural networks.
Analogy: Like a custom-built machine designed for one factory job, making that job faster and more efficient than a general-purpose machine.
Tensor operations → TPU acceleration
NPU
Explanation: An NPU, or neural processing unit, is a chip designed to run AI tasks efficiently on devices such as phones, laptops, and edge hardware.
Analogy: Like a small built-in AI helper inside your device that handles AI tasks without needing a large server.
Device AI task → NPU → efficient local processing
CUDA
Explanation: CUDA is NVIDIA's platform and programming model that lets software use NVIDIA GPUs for parallel computing tasks, including AI training and inference.
Analogy: Like a special language that helps your software talk efficiently to NVIDIA GPU workers.
Software → CUDA → NVIDIA GPU compute
vLLM
Explanation: vLLM is a high-throughput inference engine designed to serve large language models efficiently, especially when many users are sending requests.
Analogy: Like a fast restaurant kitchen optimized to serve many orders at once without wasting counter space or staff time.
Many LLM requests → efficient serving engine → faster responses
ONNX
Explanation: ONNX (Open Neural Network Exchange) is an open model format that helps move machine learning models between different frameworks, tools, and runtimes.
Analogy: Like a universal file format for AI models, similar to how PDF helps documents open across many systems.
Model from framework A → ONNX → runtime B
TensorRT
Explanation: TensorRT is NVIDIA's toolkit for optimizing trained deep learning models so they run faster and more efficiently during inference on NVIDIA hardware.
Analogy: Like tuning a race car engine so the same car can run faster on the track.
Trained model → TensorRT optimization → faster inference
LoRA
Explanation: LoRA, or Low-Rank Adaptation, is a parameter-efficient fine-tuning method that trains small adapter weights instead of updating the entire large model.
Analogy: Like adding a small specialized attachment to a machine instead of rebuilding the whole machine.
Large model frozen + small adapters trained
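Code sketch: A quick count of why the adapters are small. For one weight matrix of shape d × k, LoRA trains two thin matrices of shapes d × r and r × k, where the rank r is much smaller than d and k. The function name and the sizes in the usage note are illustrative.

```python
def lora_param_counts(d, k, r):
    """Trainable parameters for one d x k weight matrix:
    full fine-tuning versus a rank-r LoRA adapter."""
    full = d * k            # every weight is updated
    adapter = r * (d + k)   # d x r matrix plus r x k matrix
    return full, adapter
```

Usage: lora_param_counts(4096, 4096, 8) gives 16,777,216 weights for full fine-tuning versus 65,536 adapter weights, which is 1/256 of the original matrix.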
QLoRA
Explanation: QLoRA is a memory-efficient version of LoRA that keeps the frozen base model in a quantized (typically 4-bit) form while training the small LoRA adapters, so large models can be fine-tuned with much less GPU memory.
Analogy: Like compressing a large instruction manual so it can fit on a smaller desk while still allowing you to add notes.
Quantized model + LoRA adapters → memory-efficient fine-tuning
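Code sketch: Back-of-the-envelope arithmetic for the memory saving, counting the base model's weights only. It ignores adapters, activations, optimizer state, and quantization overhead, so real usage is higher. The function name and the 7-billion-parameter example are illustrative.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9
```

Usage: a hypothetical 7-billion-parameter model needs about 14 GB for its weights at 16 bits each, and about 3.5 GB at 4 bits each.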
Emergent Abilities
Explanation: Emergent abilities are capabilities that show up once AI models reach a large enough scale, even though those abilities were weak or absent in smaller models.
Analogy: Like a team suddenly solving problems that no individual member could solve alone once the team becomes large and coordinated enough.
Larger scale → new capability appears
Scaling Laws
Explanation: Scaling laws describe predictable patterns, often close to power laws, in how model performance changes as model size, data size, and compute increase.
Analogy: Like studying how a car's speed changes when you improve the engine, fuel, and road quality together.
More data + larger model + more compute → performance trend
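Code sketch: A toy power-law curve to show the shape of such a trend, using model size only. The function name and every constant here are invented for illustration; they are not fitted to any real model.

```python
def predicted_loss(n_params, a=10.0, alpha=0.1, floor=1.5):
    """Illustrative power law: loss falls as a * N**(-alpha) toward a floor
    that more parameters alone cannot remove."""
    return a * n_params ** (-alpha) + floor
```

Usage: under these made-up constants, each 10x increase in parameters shrinks the part of the loss above the floor by the same factor (about 0.79), so gains keep coming but get smaller in absolute terms.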
Chain-of-Draft
Explanation: Chain-of-Draft is a concise reasoning approach where a model uses short intermediate notes instead of long step-by-step reasoning.
Analogy: Like writing quick scratch notes on the side of a math problem instead of writing a full essay solution.
Short draft notes → final answer
Tree-of-Thoughts
Explanation: Tree-of-Thoughts is a reasoning approach where a model explores multiple possible solution paths before choosing the best one.
Analogy: Like exploring several branches of a decision tree before deciding which route is most promising.
Idea branch A / B / C → evaluate → choose path
GraphRAG
Explanation: GraphRAG is a retrieval approach that uses knowledge graphs or entity relationships to improve how information is found and connected before generating an answer.
Analogy: Like using a relationship map of people, places, and events instead of searching loose documents one by one.
Documents → entities/relationships → graph-based retrieval
Speculative Decoding
Explanation: Speculative decoding speeds up language model generation by using a smaller model to draft several tokens ahead and a larger model to verify them together, keeping the drafted tokens it agrees with.
Analogy: Like a junior assistant drafting ahead while a senior expert quickly checks and approves the correct parts.
Small model drafts → large model verifies → faster output
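Code sketch: A simplified greedy version of the draft-and-verify loop. target_next and draft_next are hypothetical stand-ins for the large and small models: each takes the tokens so far and returns the next token. A real system verifies all drafted tokens in one batched pass of the large model, which is where the speedup comes from; this sketch checks them one by one for clarity.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Generate max_new_tokens tokens that match what target_next alone
    would produce greedily, using draft_next to propose k tokens at a time."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The small model drafts up to k tokens ahead.
        draft = []
        for _ in range(min(k, goal - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2. The large model checks each drafted token in order.
        accepted = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # draft was right: keep it
            else:
                accepted.append(expected)  # draft was wrong: use the fix, stop
                break
        tokens.extend(accepted)
    return tokens[len(prompt):]
```

Design note: because every kept token is one the large model would have chosen itself, the output is the same as running the large model alone; the draft model only changes how much work can be checked at once.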
Mixture of Experts
Explanation: Mixture of Experts is a model architecture that routes inputs to specialized expert sub-networks instead of using the entire model for every token or request.
Analogy: Like sending each question to the right specialist instead of asking the whole company to work on every question.
Input → router → selected experts → output
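Code sketch: A tiny top-k routing example, assuming the router has already produced one score per expert. In a real model the router is a learned layer and the experts are neural sub-networks; here the experts are plain functions and moe_forward is a hypothetical name.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_scores, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs,
    weighted by a softmax over the selected scores."""
    top = sorted(range(len(experts)),
                 key=lambda i: router_scores[i], reverse=True)[:top_k]
    weights = softmax([router_scores[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top
```

Design note: experts outside the top k are never called, which is how the architecture saves compute compared with running the whole model on every input.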
Sparse Attention
Explanation: Sparse attention is an attention method in which each token attends to only a selected subset of tokens instead of every token, reducing computation and memory use.
Analogy: Like reading only the most relevant pages of a book instead of comparing every page to every other page.
Selected token connections → lower attention cost
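Code sketch: One simple sparse pattern, a causal sliding window, where each token attends only to itself and the few tokens just before it. This is only one of several sparse patterns, and the function name is hypothetical.

```python
def sliding_window_mask(n, window):
    """mask[i][j] is True when token i may attend to token j:
    j is token i itself or one of the `window` tokens just before it."""
    return [[0 <= i - j <= window for j in range(n)] for i in range(n)]


def attended_pairs(mask):
    """Number of token pairs the attention step actually has to compute."""
    return sum(sum(row) for row in mask)
```

Usage: for 6 tokens and a window of 2, only 15 of the 36 possible token pairs are kept. The saving grows with length, since the number of kept pairs grows roughly with n × window rather than n × n.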
Flash Attention
Explanation: Flash Attention is an optimized attention implementation that computes the same attention result while cutting down on reads and writes to GPU memory, which reduces memory use and speeds up transformer training or inference.
Analogy: Like reorganizing a messy desk so you can do the same work faster without needing more space.
Attention computation → memory optimization → faster processing