<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Axicov]]></title><description><![CDATA[Axicov]]></description><link>https://blog.axicov.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1751467033385/fd070b6e-ee01-48df-ad85-bb059a25af18.png</url><title>Axicov</title><link>https://blog.axicov.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 29 Apr 2026 23:08:38 GMT</lastBuildDate><atom:link href="https://blog.axicov.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Planning Pattern: Working Principle, Workflow and Subtypes]]></title><description><![CDATA[Before going into the definition and all the technical buzzwords, let’s try to understand from a simple point of view and get the intuition behind the planning pattern.
Think about how you approach a big project—say, organizing a cross-country trip. ...]]></description><link>https://blog.axicov.com/planning-pattern-working-principle-workflow-and-subtypes</link><guid isPermaLink="true">https://blog.axicov.com/planning-pattern-working-principle-workflow-and-subtypes</guid><category><![CDATA[design patterns]]></category><category><![CDATA[workflow]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Abhirup Ghosh]]></dc:creator><pubDate>Tue, 15 Jul 2025 01:55:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752733226558/ceaf7542-926d-4f80-9caa-41535cf1b946.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before going into the definition and all the technical buzzwords, let’s try to understand from a simple point of view and get the <strong>intuition</strong> behind the planning pattern.</p>
<p>Think about how you approach a big project—say, organizing a cross-country trip. You wouldn’t just start booking hotels at random. First, you’d break the trip into steps: decide on destinations, book flights, reserve hotels, plan activities, and so on. This ability to decompose a large goal into manageable chunks and to adapt on the fly is what makes you efficient and resilient.</p>
<p>AI agents face similar challenges. Without a plan, they’re like <strong>tourists wandering aimlessly.</strong> With a planning pattern, they become <strong>strategic travelers</strong>—<em>goal-oriented, adaptable, and efficient</em>.</p>
<hr />
<h1 id="heading-what-is-the-planning-pattern">What Is the Planning Pattern?</h1>
<p>The planning pattern is an <strong>agentic design pattern</strong> where an AI agent <strong>autonomously</strong> breaks down a complex goal into <strong>smaller, actionable subtasks</strong> and dynamically sequences them to achieve the desired outcome. This process is called <strong>“Task Decomposition”</strong>. Think of it as the digital equivalent of making a to-do list but with the added power of <strong>real-time reasoning, adaptation, and self-correction</strong>.</p>
<p>At its core, the planning pattern gives an agent the ability to <em>think ahead</em>, to map out a path rather than react impulsively. Just as a chess player considers several moves in advance, an agent using the planning pattern anticipates challenges, weighs options, and pivots strategies as circumstances change.</p>
<hr />
<h1 id="heading-why-use-the-planning-pattern">Why Use the Planning Pattern?</h1>
<ul>
<li><p><strong>Complexity Handling:</strong> Perfect for tasks where the solution isn’t obvious or requires multiple steps.</p>
</li>
<li><p><strong>Adaptability:</strong> Enables agents to respond to unexpected changes or failures without getting stuck.</p>
</li>
<li><p><strong>Efficiency</strong>: By prioritizing and sequencing tasks intelligently, agents avoid wasted effort.</p>
</li>
<li><p><strong>Scalability</strong>: Supports collaboration between multiple agents, each handling specialized subtasks.</p>
</li>
</ul>
<hr />
<h1 id="heading-working-principle-and-flow">Working Principle And Flow</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752059044445/208cf0f5-73b6-4947-8ea1-d8aaa1ea641f.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Goal Analysis and Context Building</strong></p>
<ul>
<li><p>The agent begins by analyzing the overall objective and any <em>constraints or requirements</em>.</p>
</li>
<li><p>It gathers <em>relevant information</em> and <em>builds context</em>, which may involve <em>querying databases</em>, <em>reviewing documents</em>, or <em>interacting with users</em>.</p>
</li>
</ul>
</li>
<li><p><strong>Strategic Task Decomposition</strong></p>
<ul>
<li>The agent decomposes the primary goal into a <em>hierarchy of subtasks</em>.</li>
</ul>
</li>
<li><p><strong>Dependency Mapping and Sequencing</strong></p>
<ul>
<li><p>The agent identifies which subtasks depend on others and determines the <em>most logical order</em> for execution.</p>
</li>
<li><p>This step <em>prevents wasted effort</em> and <em>ensures prerequisites</em> are satisfied before moving forward.</p>
</li>
</ul>
</li>
<li><p><strong>Single Agent Task Allocation</strong></p>
<ul>
<li><p>The Single Task Agent is responsible for <em>completing each task</em> generated in the previous step.</p>
</li>
<li><p>This agent executes each task using predefined methods like <em>ReAct (Reason + Act)</em> or <em>ReWOO (Reasoning WithOut Observation)</em>.</p>
</li>
<li><p>Once a task is completed, the agent returns a Task Result, which is sent back to the planning loop.</p>
</li>
</ul>
</li>
<li><p><strong>Resource Allocation and Tool Integration</strong></p>
<ul>
<li><p>The agent <em>selects appropriate tools</em> or <em>external resources</em> for each subtask (e.g., APIs, databases, code interpreters).</p>
</li>
<li><p>It orchestrates tool usage dynamically, matching each task to the best available capability.</p>
</li>
</ul>
</li>
<li><p><strong>Execution and Monitoring</strong></p>
<ul>
<li><p>The agent carries out each subtask, monitoring progress and checking for errors or unexpected outcomes.</p>
</li>
<li><p>If a subtask fails or new information arises, the agent can replan, adjust its sequence, or try alternative strategies.</p>
</li>
</ul>
</li>
<li><p><strong>Feedback Loop and Learning</strong></p>
<ul>
<li><p>After each action, the agent evaluates the <em>result against the goal</em>.</p>
</li>
<li><p>It collects <em>performance data, learns from mistakes</em>, and updates its plan to improve future outcomes.</p>
</li>
</ul>
</li>
<li><p><strong>Completion and Output Delivery</strong></p>
<ul>
<li><p>Once all subtasks are completed and the overall objective is met, the agent <em>compiles and delivers the final output.</em></p>
</li>
<li><p>The agent may also format results or trigger subsequent workflows as needed.</p>
</li>
</ul>
</li>
</ol>
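<p>The eight steps above can be condensed into a small plan-execute-monitor loop. The sketch below is purely illustrative: <code>decompose</code> and <code>execute</code> are hypothetical stand-ins for an LLM-backed planner and real tool calls.</p>

```python
# Minimal sketch of the plan -> execute -> monitor -> replan loop.
# decompose() and execute() are hypothetical stand-ins, not a real framework.

def decompose(goal):
    # A real agent would ask an LLM to break the goal into ordered subtasks.
    return [f"{goal}: step {i}" for i in range(1, 4)]

def execute(task):
    # A real agent would call a tool (API, database, code interpreter) here.
    return {"task": task, "ok": True}

def run_agent(goal):
    plan = decompose(goal)            # steps 1-3: analyze, decompose, sequence
    results, i = [], 0
    while i < len(plan):              # steps 4-6: allocate, execute, monitor
        result = execute(plan[i])
        if result["ok"]:
            results.append(result)
            i += 1
        else:                         # step 7: feedback loop -> replan
            plan, results, i = decompose(goal), [], 0
    return results                    # step 8: compile and deliver output

print(len(run_agent("book a trip")))
```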
<hr />
<h1 id="heading-key-advantages"><strong>Key Advantages</strong></h1>
<ul>
<li><p><strong>Enhanced Flexibility:</strong> Planning patterns allow agents to dynamically adapt their actions based on changing goals, inputs, or unexpected obstacles, rather than following rigid, pre-set workflows.</p>
</li>
<li><p><strong>Improved Problem-Solving</strong>: By decomposing complex tasks into manageable subtasks, agents can systematically tackle multifaceted problems that would overwhelm traditional, single-step agents.</p>
</li>
<li><p><strong>Greater Efficiency:</strong> Intelligent sequencing and prioritization of subtasks reduce redundant work and optimize resource allocation, leading to faster and more accurate outcomes.</p>
</li>
<li><p><strong>Resilience and Robustness</strong>: Agents can recover from failures or adapt to new information mid-execution, ensuring progress even when initial plans encounter issues.</p>
</li>
<li><p><strong>Scalability:</strong> Planning patterns support modular workflows, making it easier to scale up to more complex tasks or coordinate multiple specialized agents.</p>
</li>
</ul>
<hr />
<h1 id="heading-exploring-the-various-subtypes">Exploring the various subtypes</h1>
<p>Planning patterns have several subtypes or categories that can be implemented independently or combined for seamless execution. Each category serves a specific purpose, with its own approach to state management and goal states. Let’s dive into the various subtypes of the planning pattern.</p>
<h2 id="heading-1-classical-planning">1) Classical Planning</h2>
<p>Classical planning is a foundational approach in planning design patterns where the objective is to find a sequence of actions (a plan) that transitions an agent from a specific initial state to a goal state, under the assumptions that the world is <strong>static, deterministic, and fully observable</strong>.</p>
<p><strong>Core Assumptions</strong>:</p>
<ul>
<li><p>Known initial state.</p>
</li>
<li><p>Deterministic actions without uncertainty.</p>
</li>
<li><p>Full observability.</p>
</li>
<li><p>No concurrency in actions: they are executed one at a time.</p>
</li>
</ul>
<p><strong>State Representation and State Diagram:</strong></p>
<p>States are typically represented as <strong>sets of logical propositions</strong> (predicates), and <strong>actions/operators</strong> have defined preconditions and effects that modify the state.</p>
<ul>
<li><p><strong>State</strong>: A conjunction of predicates or propositions describing the world at a given time</p>
<p>  (e.g., <em>At(Truck1, Melbourne) ∧ At(Truck2, Sydney)</em>).</p>
</li>
<li><p><strong>Actions/Operators</strong>: Defined by preconditions (what must be true to execute) and effects (how the state changes after execution).</p>
</li>
<li><p><strong>Goal</strong>: A set of predicates that must be satisfied in the final state.</p>
</li>
<li><p><strong>Nodes</strong>: Represent states (sets of predicates).</p>
</li>
<li><p><strong>Edges</strong>: Represent actions that transition from one state to another by applying their effects.</p>
</li>
</ul>
<h3 id="heading-forward-state-space-planning-fssp">Forward State Space Planning (FSSP)</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752063429004/805bd36b-5a2c-4d17-84f5-0ba6db99751a.png" alt class="image--center mx-auto" /></p>
<p>Forward State Space Planning (also known as progression planning):</p>
<ul>
<li><p>Starts at the initial state and applies applicable actions to generate successor states.</p>
</li>
<li><p>Continues expanding nodes (states) by applying actions until a state satisfying the goal is reached.</p>
</li>
<li><p>Common search algorithms: Breadth-First Search, Depth-First Search, A*, etc.</p>
</li>
</ul>
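<p>FSSP can be sketched as a breadth-first search over STRIPS-style states. Everything below (the truck/package predicates and the three operators) is invented for illustration; a real planner would typically use A* with a heuristic rather than plain BFS.</p>

```python
from collections import deque

# Sketch of forward (progression) planning over STRIPS-style states.
# Each action is (preconditions, add effects, delete effects); the
# predicates and operators are invented for illustration.
actions = {
    "load":   ({"at_truck_A", "at_pkg_A"}, {"in_truck"},   {"at_pkg_A"}),
    "drive":  ({"at_truck_A"},             {"at_truck_B"}, {"at_truck_A"}),
    "unload": ({"at_truck_B", "in_truck"}, {"at_pkg_B"},   {"in_truck"}),
}

def fssp(initial, goal):
    frontier = deque([(frozenset(initial), [])])   # BFS over states
    seen = {frozenset(initial)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                          # goal predicates satisfied
            return plan
        for name, (pre, add, dele) in actions.items():
            if pre <= state:                       # preconditions hold
                nxt = frozenset((state - dele) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

print(fssp({"at_truck_A", "at_pkg_A"}, {"at_pkg_B"}))
```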
<h3 id="heading-backward-state-space-planning-bssp">Backward State Space Planning (BSSP)</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752063491741/339142d7-4524-476b-9dff-5a94168519fb.png" alt class="image--center mx-auto" /></p>
<p>Backward State Space Planning (also known as regression planning):</p>
<ul>
<li><p>Starts at the goal state and works backward, identifying which actions could have produced the current (goal) state.</p>
</li>
<li><p>For each action, it regresses the goal through the action to determine the necessary conditions in the previous state.</p>
</li>
<li><p>Continues until a state is found that matches the initial state.</p>
</li>
<li><p>At each step, the planner determines which actions could achieve the current subgoal and what preconditions must be true before those actions.</p>
</li>
</ul>
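<p>A minimal regression-search sketch of BSSP, with invented actions: an action is relevant if it adds part of the current subgoal and deletes none of it, and regressing through it replaces the achieved predicates with the action’s preconditions.</p>

```python
from collections import deque

# Sketch of backward (regression) planning over STRIPS-style actions.
# The "drive"/"deliver" actions are invented for illustration.
actions = {
    "drive":   ({"at_A"}, {"at_B"},      {"at_A"}),  # (pre, add, delete)
    "deliver": ({"at_B"}, {"delivered"}, set()),
}

def bssp(initial, goal, actions):
    frontier = deque([(frozenset(goal), [])])        # search from the goal
    seen = {frozenset(goal)}
    while frontier:
        subgoal, plan = frontier.popleft()
        if subgoal <= initial:                       # initial state satisfies it
            return plan
        for name, (pre, add, dele) in actions.items():
            if (add & subgoal) and not (dele & subgoal):
                prev = frozenset((subgoal - add) | pre)  # regress the subgoal
                if prev not in seen:
                    seen.add(prev)
                    frontier.append((prev, [name] + plan))
    return None

print(bssp({"at_A"}, {"delivered"}, actions))
```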
<h2 id="heading-2-parallel-planning">2) Parallel Planning</h2>
<p>Parallel Planning is an approach where multiple actions are <strong>executed simultaneously,</strong> rather than sequentially, to reach a goal state <strong>more efficiently</strong>. This paradigm is especially valuable in environments where actions <em>do not interfere with each other</em> and can be performed <strong>concurrently</strong>, reducing the overall number of time steps required to achieve the objective.</p>
<p><strong>Flow of Parallel Planning</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752063719689/89653a4c-90f3-4c08-aa56-41dc9c8abd16.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Initial State</strong>: Start with a representation of the world.</p>
</li>
<li><p><strong>Action Selection</strong>: At each time step, identify all possible actions whose preconditions are satisfied and whose effects do not interfere with each other.</p>
</li>
<li><p><strong>Parallel Execution</strong>: Apply the selected set of actions simultaneously, updating the state.</p>
</li>
<li><p><strong>State Transition</strong>: Move to the new state resulting from the combined effects of the parallel actions.</p>
</li>
<li><p><strong>Repeat</strong>: Continue selecting and executing parallel action sets until the goal state is reached.</p>
</li>
<li><p><strong>Plan Output</strong>: The result is a plan where each step may contain multiple actions, reducing the total number of steps compared to sequential planning.</p>
</li>
</ol>
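<p>Steps 2-3 of the flow (selecting non-interfering actions and applying them together) can be sketched as follows. The interference check is a simplified mutex test, and the kettle/toast actions are invented for illustration.</p>

```python
# Sketch: greedily group applicable, non-interfering actions into one
# parallel step. Each action is (preconditions, add effects, delete effects).

def interfere(a, b):
    # Two actions conflict if either deletes what the other needs or adds.
    (pre_a, add_a, del_a), (pre_b, add_b, del_b) = a, b
    return bool(del_a & (pre_b | add_b)) or bool(del_b & (pre_a | add_a))

def parallel_step(state, actions):
    chosen = []
    for act in actions.values():
        pre, add, dele = act
        if pre <= state and not any(interfere(act, c) for c in chosen):
            chosen.append(act)
    for pre, add, dele in chosen:      # apply the whole step's effects at once
        state = (state - dele) | add
    return state, chosen

actions = {
    "boil_water":  ({"kettle"}, {"hot_water"}, set()),
    "toast_bread": ({"bread"},  {"toast"},     set()),
}
state, step = parallel_step({"kettle", "bread"}, actions)
print(sorted(state), len(step))
```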
<h3 id="heading-multi-goal-pursuit">Multi-Goal Pursuit</h3>
<p>Multi-goal pursuit refers to scenarios where an agent or a group of agents <strong>simultaneously</strong> works toward <strong>achieving multiple goals</strong>, which may be <em>independent, overlapping, or even conflicting</em>. In real-world settings, users or agents often pursue several goals concurrently and interleave actions for different goals within the same activity sequence.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li><p><strong>Concurrent and Interleaving</strong>: Actions for different goals may be mixed within a plan, not strictly separated.</p>
</li>
<li><p><strong>Plan Recognition</strong>: Recognizing and managing multiple goals is a challenge, often requiring advanced planning or probabilistic reasoning.</p>
</li>
<li><p><strong>Resource Management</strong>: Agents must allocate resources and prioritize among competing or parallel goals.</p>
</li>
</ul>
<h3 id="heading-synchronous-parallel-planning">Synchronous Parallel Planning</h3>
<p>Synchronous parallel planning is a planning approach where multiple actions are executed at the same time step, but only if they are non-interfering (i.e., their preconditions and effects do not conflict). All agents or sub-plans synchronize at each planning step, and the system waits until all parallel actions are ready to execute before moving to the next step.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li><p><strong>Simultaneous Execution</strong>: Multiple actions occur together, maximizing efficiency when possible.</p>
</li>
<li><p><strong>Synchronization Point</strong>: All actions in a parallel step start and finish together.</p>
</li>
<li><p><strong>Strict Non-Interference</strong>: Only actions that do not conflict can be grouped.</p>
</li>
</ul>
<p><strong>Example Flow:</strong></p>
<ul>
<li><p>Time Step 1: → {Action A, Action B} (executed in parallel)</p>
</li>
<li><p>Time Step 2: → {Action C, Action D} (executed in parallel)</p>
</li>
</ul>
<p><strong>Use Cases:</strong> Robotics (multiple arms working in unison), manufacturing lines, or any system where coordination and timing are critical.</p>
<h3 id="heading-asynchronous-parallel-planning">Asynchronous Parallel Planning</h3>
<p>Asynchronous parallel planning allows actions to be executed in parallel, but without requiring synchronization points. Each action or agent can proceed independently as soon as its preconditions are met, regardless of the state of other actions. This approach is more flexible and can lead to faster completion, especially in distributed or loosely coupled systems.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li><p>Independent Execution: Actions start as soon as possible, not waiting for others.</p>
</li>
<li><p>No Global Synchronization: Agents or sub-plans do not need to align their steps.</p>
</li>
<li><p>Higher Throughput: Can exploit opportunities for concurrency more aggressively.</p>
</li>
</ul>
<p><strong>Example Flow:</strong></p>
<ul>
<li><p>Action A → starts at t=0, completes at t=2</p>
</li>
<li><p>Action B → starts at t=1 (as soon as its preconditions are met), completes at t = 3</p>
</li>
<li><p>Action C → starts at t=2, completes at t=4</p>
</li>
</ul>
<p><strong>Use Cases:</strong> Distributed computing, cloud orchestration, and multi-agent systems with independent tasks.</p>
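<p>The asynchronous flow above maps naturally onto <code>asyncio</code>: each action is a coroutine that starts as soon as its precondition event is set, with no global barrier. The actions, durations, and dependency (B waits only on A) are invented for illustration.</p>

```python
import asyncio

# Sketch: asynchronous parallel execution. Each action runs independently
# and starts the moment its (single) precondition event is set.

async def action(name, duration, wait_for, signal, log):
    if wait_for is not None:
        await wait_for.wait()          # start as soon as preconditions hold
    await asyncio.sleep(duration)      # simulated work
    log.append(name)
    if signal is not None:
        signal.set()                   # unblock dependents immediately

async def main():
    log = []
    a_done = asyncio.Event()
    await asyncio.gather(
        action("A", 0.05, None,   a_done, log),
        action("B", 0.02, a_done, None,   log),  # depends only on A
        action("C", 0.02, None,   None,   log),  # fully independent
    )
    return log

print(asyncio.run(main()))  # C finishes first, then A, then B
```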
<h2 id="heading-3-hierarchical-planning">3) Hierarchical Planning</h2>
<p>Hierarchical planning is a structured approach to solving complex planning problems by organizing tasks and actions into multiple levels of abstraction or hierarchy. This method allows a system to break down a high-level goal into smaller, more manageable subgoals and tasks, which can then be further refined until primitive, executable actions are reached.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752064222137/f1b2521a-2b86-47b2-a5dd-a150d53ed6c9.png" alt class="image--center mx-auto" /></p>
<p><strong>Core Concepts:</strong></p>
<ul>
<li><p><strong>Top-Down Decomposition:</strong> Breaking high-level goals into progressively smaller subgoals and tasks.</p>
</li>
<li><p><strong>Bottom-Up Composition:</strong> Synthesizing lower-level solutions or actions to form higher-level plans.</p>
</li>
<li><p><strong>Multilevel Abstraction:</strong> Planning and reasoning occur at various levels of detail, from abstract strategies to concrete actions.</p>
</li>
</ul>
<h3 id="heading-top-down-decomposition">Top-Down Decomposition</h3>
<p>Definition:<br />This is the primary process in hierarchical planning, where a complex, abstract goal is recursively broken down into subgoals and then into primitive actions.</p>
<p><strong>Flow:</strong></p>
<ul>
<li><p>Start with the main (high-level) goal.</p>
</li>
<li><p>Decompose it into a set of subgoals or tasks.</p>
</li>
<li><p>Further decompose each subgoal until reaching actions that the system can directly execute.</p>
</li>
<li><p>At each level, only relevant details are considered, reducing complexity.</p>
</li>
</ul>
<p><strong>Example</strong>:<br />Goal: "Plan a wedding":<br />→ Subgoals: Book venue, arrange catering, send invitations<br />→ Further subgoals: For "Book venue": shortlist venues, visit venues, finalize booking<br />→ Primitive actions: Call venue, sign contract, make payment</p>
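<p>The wedding example can be sketched as a tiny HTN-style decomposer: compound tasks are recursively expanded through a method table until only primitive actions remain. The method table below is just the example restated as data.</p>

```python
# Sketch of top-down decomposition (HTN style) using the wedding example.
# Tasks in the method table are compound; anything else is primitive.
methods = {
    "plan_wedding":     ["book_venue", "arrange_catering", "send_invitations"],
    "book_venue":       ["shortlist_venues", "visit_venues", "finalize_booking"],
    "finalize_booking": ["call_venue", "sign_contract", "make_payment"],
}

def decompose(task):
    if task not in methods:           # primitive: directly executable
        return [task]
    plan = []
    for sub in methods[task]:         # recursively refine compound tasks
        plan.extend(decompose(sub))
    return plan

print(decompose("plan_wedding"))
```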
<h3 id="heading-bottom-up-composition">Bottom-Up Composition</h3>
<p>Definition:<br />This approach works in the reverse direction, where solutions to lower-level tasks are composed to achieve higher-level goals.</p>
<p><strong>Flow:</strong></p>
<ul>
<li><p>Solve or plan for the most detailed, concrete tasks first.</p>
</li>
<li><p>Aggregate these solutions to form the solution for their parent subgoals.</p>
</li>
<li><p>Continue aggregating upward until the top-level goal is achieved.</p>
</li>
</ul>
<p><strong>Example</strong>:<br />Primitive actions (e.g., call venue, sign contract)<br />→ Compose into "Book venue" subgoal<br />→ Compose all subgoals to complete the "Plan a wedding" goal</p>
<h3 id="heading-multilevel-abstraction">Multilevel Abstraction</h3>
<p>Hierarchical planning operates across multiple levels of abstraction, allowing the planner to focus on different granularities of the problem as needed.</p>
<p><strong>Architecture:</strong></p>
<ul>
<li><p><strong>High-Level Layer</strong>: Abstract goals and strategies (e.g., "organize event")</p>
</li>
<li><p><strong>Mid-Level Layer</strong>: Intermediate subgoals (e.g., "arrange logistics")</p>
</li>
<li><p><strong>Low-Level Layer</strong>: Concrete, executable actions (e.g., "book taxi").</p>
</li>
</ul>
<p><strong>Benefits:</strong></p>
<ul>
<li><p>Reduces computational complexity by narrowing focus at each level.</p>
</li>
<li><p>Supports efficient plan generation, monitoring, and adaptation in dynamic environments.</p>
</li>
</ul>
<h2 id="heading-4-probabilistic-planning">4) Probabilistic Planning</h2>
<p>Probabilistic planning is an approach where an agent must make decisions under <strong>uncertainty</strong>, specifically when actions can have <strong>multiple possible outcomes</strong>, each with a <strong>certain probability.</strong> Unlike classical (deterministic) planning, where the effects of actions are <em>known and predictable</em>, probabilistic planning explicitly models the likelihood of different outcomes, allowing for more robust and realistic decision-making in dynamic environments.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li><p><strong>Uncertainty Modeling</strong>: Actions may lead to different results, each with an associated probability.</p>
</li>
<li><p><strong>Goal</strong>: Maximize the expected reward or minimize the expected cost, rather than guaranteeing a specific outcome.</p>
</li>
<li><p><strong>Continuous Belief Space</strong>: Probabilities make the state space continuous and potentially infinite, increasing complexity.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752066604144/fe8c3c05-dfb8-49fa-a2ec-116563e3a688.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-markov-decision-processes-mdp">Markov Decision Processes (MDP)</h3>
<p>A Markov Decision Process (MDP) is a <strong>mathematical framework</strong> for modeling decision-making in environments where outcomes are partly random and partly under the control of a decision-maker.<br />MDPs provide the <strong>formal foundation</strong> for probabilistic planning with <strong>full observability</strong>, modeling the environment’s uncertainty, and guiding the agent to <em>maximize the expected reward.</em></p>
<p><strong>State Flow in MDP:</strong></p>
<ol>
<li><p>Start at the Initial State</p>
</li>
<li><p>Select Action (based on policy)</p>
</li>
<li><p>Transition to Next State (according to transition probabilities)</p>
</li>
<li><p>Receive Reward</p>
</li>
<li><p>Repeat until the goal or terminal state is reached</p>
</li>
</ol>
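<p>That state flow is exactly what value iteration evaluates offline. Below is a sketch on a made-up two-state MDP: from <code>s0</code>, “go” reaches the terminal goal with probability 0.8 and reward 10, otherwise the agent stays put.</p>

```python
# Sketch: value iteration on a tiny, invented MDP.
# P[s][a] = list of (probability, next_state, reward) outcomes.
P = {
    "s0": {
        "go":   [(0.8, "goal", 10.0), (0.2, "s0", 0.0)],
        "stay": [(1.0, "s0", 0.0)],
    },
    "goal": {},                        # terminal state, no actions
}
gamma = 0.9                            # discount factor

def value_iteration(P, gamma, iters=100):
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        for s, acts in P.items():
            if acts:                   # Bellman optimality backup
                V[s] = max(
                    sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                    for outcomes in acts.values()
                )
    return V

print(round(value_iteration(P, gamma)["s0"], 3))  # converges to 8/0.82 ≈ 9.756
```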
<h3 id="heading-partially-observable-markov-decision-processes-pomdp">Partially Observable Markov Decision Processes (POMDP)</h3>
<p>A Partially Observable Markov Decision Process (POMDP) extends the MDP framework to situations where the <em>agent cannot directly observe the true state of the environment.</em><br />POMDPs model planning under both action <strong>uncertainty</strong> and <strong>partial observability</strong>, making them essential for real-world problems where the agent does not have <strong>perfect information.</strong></p>
<p><strong>State Flow in POMDP:</strong></p>
<ol>
<li><p><strong>Agent Maintains Belief State:</strong> A probability distribution over possible true states.</p>
</li>
<li><p><strong>Select Action</strong>: Based on current belief.</p>
</li>
<li><p><strong>Environment Transitions:</strong> To a new (unknown) state, emits an observation.</p>
</li>
<li><p><strong>Agent Updates Belief:</strong> Using the observation and transition/observation models.</p>
</li>
<li><p><strong>Repeat</strong>: Until the goal or terminal belief is reached.</p>
</li>
</ol>
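<p>Step 4 of the flow, the belief update, is a Bayes filter: predict through the transition model, then weight by the observation likelihood and normalize. The two-state robot below (locations A and B, observations “wall”/“door”) is invented for illustration.</p>

```python
# Sketch: belief update for a tiny, invented POMDP with two hidden states.
# b'(s') ∝ O(obs | s') * sum over s of T(s' | s, a) * b(s)
T = {  # T[action][s][s2] = transition probability
    "move": {"A": {"A": 0.2, "B": 0.8}, "B": {"A": 0.0, "B": 1.0}},
}
O = {  # O[s][obs] = observation probability in state s
    "A": {"wall": 0.9, "door": 0.1},
    "B": {"wall": 0.3, "door": 0.7},
}

def update_belief(belief, action, obs):
    new = {}
    for s2 in belief:
        pred = sum(T[action][s][s2] * belief[s] for s in belief)  # predict
        new[s2] = O[s2][obs] * pred                               # correct
    z = sum(new.values())
    return {s: v / z for s, v in new.items()}                     # normalize

b = update_belief({"A": 0.5, "B": 0.5}, "move", "door")
print({s: round(p, 3) for s, p in b.items()})
```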
<h2 id="heading-5-temporal-planning">5) Temporal Planning</h2>
<p>Temporal planning is an <strong>advanced AI planning paradigm</strong> where actions are not just sequenced, but also <strong>scheduled over time</strong>, taking into account their <em>durations, possible overlaps (concurrency), and complex temporal constraints</em>. Unlike classical planning—where actions are considered <strong>instantaneous</strong> and <strong>strictly sequential</strong>—temporal planning models the real-world scenario where multiple actions may occur <strong>simultaneously</strong>, each with its own start and end times, and where <strong>timing relationships and deadlines</strong> matter.</p>
<p><strong>State Representation and flow</strong></p>
<ul>
<li><p><strong>Timed State</strong>: A state includes not only the current facts about the world but also the current time and the status (active, pending, completed) of all ongoing actions.</p>
</li>
<li><p><strong>Temporal Constraints</strong>: Each action or event may have constraints such as earliest start time, latest finish time, or required intervals between actions.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752232805379/2bb3f266-05a6-4763-b7c2-67647c6e8613.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Initial State</strong>: Define the starting conditions and time.</p>
</li>
<li><p><strong>Action Selection</strong>: Identify which actions can start, considering both logical and temporal preconditions.</p>
</li>
<li><p><strong>Scheduling</strong>: Assign start and end times to actions, checking for overlaps and constraint satisfaction.</p>
</li>
<li><p><strong>State Transition</strong>: Move to the next state, updating time and the status of all actions.</p>
</li>
<li><p><strong>Goal Check</strong>: Repeat until the goal state is achieved within all temporal and resource constraints.</p>
</li>
</ol>
<h3 id="heading-time-windowed-planning">Time Windowed Planning</h3>
<p>Planning where actions or tasks must be performed within <strong>specific time intervals</strong> (time windows).</p>
<ul>
<li><p><strong>Example</strong>: Delivering a package between 10:00 AM and 12:00 PM.</p>
</li>
<li><p><strong>Challenges</strong>: Coordinating multiple actions to fit within overlapping or tight time windows, especially when resources are shared.</p>
</li>
<li><p><strong>Applications</strong>: Logistics, delivery routing, healthcare appointment scheduling.</p>
</li>
</ul>
<h3 id="heading-deadline-based-scheduling">Deadline-Based Scheduling</h3>
<p>Scheduling tasks so that each is completed before a specified deadline.</p>
<ul>
<li><p><strong>Deadline-Driven Prioritization</strong>: Tasks with earlier deadlines are prioritized.</p>
</li>
<li><p><strong>Preemptive Scheduling</strong>: Ongoing tasks may be interrupted to ensure critical deadlines are met.</p>
</li>
<li><p><strong>Applications</strong>: Real-time systems, multimedia streaming, operating system process scheduling, safety-critical automation.</p>
</li>
</ul>
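<p>Deadline-driven prioritization is often implemented as earliest-deadline-first (EDF). The sketch below is a deliberately simplified single-machine, non-preemptive version: tasks run to completion in deadline order, and time still advances past a missed task.</p>

```python
# Sketch: earliest-deadline-first scheduling on one machine.
# Each task is (name, duration, deadline), all in the same time unit.

def edf(tasks):
    schedule, missed, now = [], [], 0
    for name, duration, deadline in sorted(tasks, key=lambda t: t[2]):
        now += duration                # non-preemptive: run to completion
        (schedule if now <= deadline else missed).append(name)
    return schedule, missed

tasks = [("report", 2, 9), ("backup", 3, 4), ("email", 1, 5)]
print(edf(tasks))
```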
<h3 id="heading-resource-constrained-temporal-planning">Resource Constrained Temporal Planning</h3>
<p>Temporal planning where actions require limited resources, and plans must ensure no resource is over-allocated at any time.</p>
<ul>
<li><p><strong>Resource Allocation</strong>: Assigns resources to tasks while considering their availability over time.</p>
</li>
<li><p><strong>Conflict Resolution</strong>: Prevents resource contention and ensures all temporal/resource constraints are met.</p>
</li>
<li><p><strong>Applications</strong>: Manufacturing, project management, multi-robot coordination, cloud computing.</p>
</li>
</ul>
<h2 id="heading-6-reactive-planning">6) Reactive Planning</h2>
<p>Reactive planning is a type of planning pattern where agents <strong>select</strong> and <strong>execute</strong> actions in <strong>real-time</strong>, <em>responding instantly to changes in their environment</em> rather than following a <em>pre-computed, long-term plan</em>. This approach is ideal for <strong>highly dynamic</strong> or <strong>unpredictable settings</strong>, as the agent continuously senses its surroundings and decides the next best action based solely on the current context, often using <strong>predefined stimulus-response</strong> rules or <strong>behavior tables</strong>. Unlike classical planning, which generates a full sequence of actions in advance, reactive planning computes just the <strong>immediate next action</strong>, enabling <strong>rapid adaptation</strong> but often <strong>lacking long-term foresight.</strong></p>
<p><strong>State Representation and flow</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752233611248/161f9614-8c6d-4fe5-9d3c-7f594e05ba05.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Perception</strong>: The agent analyzes the environment and gathers current state data.</p>
</li>
<li><p><strong>Action Selection</strong>: Based on the current perception, the agent uses rules or behavior tables to choose the next action.</p>
</li>
<li><p><strong>Execution</strong>: The chosen action is immediately executed, affecting the environment.</p>
</li>
<li><p><strong>Repeat</strong>: The agent loops back to perception, continuously reacting to new stimuli or changes.</p>
</li>
</ol>
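<p>The perceive/select/execute cycle can be sketched as an ordered behavior table: the first rule whose condition matches the current percept fires. The percept keys and actions below are invented; ordering rules by priority is also the basic idea behind subsumption-style layering, where higher-priority behaviors override lower ones.</p>

```python
# Sketch: a reactive agent as an ordered stimulus -> response table.
# Rules are checked top-down; the first matching condition wins.
rules = [
    (lambda p: p["obstacle"], "turn"),          # highest priority
    (lambda p: p["goal_visible"], "approach"),
    (lambda p: True, "wander"),                 # default behavior
]

def react(percept):
    for condition, action in rules:
        if condition(percept):
            return action

print(react({"obstacle": True,  "goal_visible": True}))   # turn
print(react({"obstacle": False, "goal_visible": True}))   # approach
print(react({"obstacle": False, "goal_visible": False}))  # wander
```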
<h3 id="heading-event-driven-planning">Event-Driven Planning</h3>
<ul>
<li><p>The agent’s behavior is triggered by <strong>specific external</strong> or <strong>internal events</strong> (e.g., obstacle detected, temperature threshold crossed).</p>
</li>
<li><p><strong>Role</strong>: Enables the agent to prioritize and respond to critical events as they occur, rather than following a fixed schedule or sequence.</p>
</li>
</ul>
<h3 id="heading-policy-based-adaptation">Policy-Based Adaptation</h3>
<ul>
<li><p>The agent follows a <strong>set of policies</strong> (mapping from situations to actions) that guide its behavior in different contexts.</p>
</li>
<li><p><strong>Role</strong>: Supports <strong>flexible and adaptive</strong> responses, as the agent can switch policies based on the current state or environment, allowing for more sophisticated and context-aware reactivity.</p>
</li>
</ul>
<h3 id="heading-subsumption-architecture">Subsumption Architecture</h3>
<ul>
<li><p>A layered control system where <strong>higher-level</strong> behaviors can <strong>override</strong> or “<strong>subsume</strong>” lower-level ones.</p>
</li>
<li><p><strong>Role</strong>: Each layer handles a different level of behavior (e.g., obstacle avoidance at the lowest, goal-seeking at a higher level), and the most relevant behavior at any moment takes control. This enables robust, emergent behavior from simple, modular rules.</p>
</li>
</ul>
<h2 id="heading-7-goal-oriented-planning">7) Goal-Oriented Planning</h2>
<p>It is a planning approach where agents <strong>select, pursue, and adapt</strong> their actions to achieve specific objectives, <strong>dynamically</strong> <strong>generating</strong> and <strong>updating plans</strong> based on the current state of the environment and available resources.</p>
<p>This paradigm is exemplified by frameworks like <strong>Goal-Oriented Action Planning (GOAP),</strong> which models the world as a set of states, defines goals as desired outcomes, and uses <strong>planning algorithms</strong> (such as A*) to find optimal action sequences that transition the agent from its current state to the goal state. Each action is associated with <strong>preconditions</strong> (what must be true to execute it) and <strong>effects</strong> (how it changes the world), and the <strong>planner continuously monitors</strong> and <strong>adapts</strong> to changes, replanning if goals or world states shift.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752234064297/00b19bb1-1469-4660-b565-b77b33c36ea5.png" alt class="image--center mx-auto" /></p>
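<p>A GOAP planner can be sketched as a search from the current world state to any state satisfying the goal. The axe/wood/fire actions are a common textbook-style illustration rather than output from a specific engine, and breadth-first search stands in for the A* search a production planner would use.</p>

```python
from collections import deque

# Sketch of GOAP-style planning: the world is a set of facts, each action
# has preconditions and effects (no delete effects, for brevity), and the
# planner searches for a sequence reaching the goal.
actions = {
    "get_axe":   ({"axe_available"}, {"has_axe"}),
    "chop_wood": ({"has_axe"},       {"has_wood"}),
    "make_fire": ({"has_wood"},      {"warm"}),
}

def goap(state, goal):
    frontier = deque([(frozenset(state), [])])
    seen = {frozenset(state)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                     # goal facts satisfied
            return plan
        for name, (pre, effects) in actions.items():
            if pre <= state:
                nxt = frozenset(state | effects)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

print(goap({"axe_available"}, {"warm"}))
```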
<h3 id="heading-single-goal">Single Goal</h3>
<ul>
<li><p>The agent focuses on achieving <strong>one specific objective</strong> at a time.</p>
</li>
<li><p>The planning algorithm generates the <strong>best sequence of actions</strong> to reach that goal, <strong>updating</strong> or <strong>replanning</strong> if the environment changes or the goal is achieved/interrupted.</p>
</li>
<li><p><strong>Example</strong>: In a game, an NPC may have the single goal “find health pack” and will plan all actions around that objective until it is met.</p>
</li>
</ul>
<h3 id="heading-multi-goal">Multi Goal</h3>
<ul>
<li><p>The agent <strong>manages</strong> and <strong>prioritizes multiple goals</strong>, which may be <em>independent, overlapping, or even conflicting.</em></p>
</li>
<li><p>The planner must <strong>decompose</strong>, <strong>sequence</strong>, and sometimes <strong>interleave</strong> actions to pursue several objectives, often optimizing for utility or resource constraints.</p>
</li>
<li><p><strong>Example</strong>: A robot in a warehouse may simultaneously pursue “deliver package,” “recharge battery,” and “avoid obstacles,” dynamically adjusting priorities as conditions change.</p>
</li>
</ul>
<h3 id="heading-conditional-goal-pursuit">Conditional Goal Pursuit</h3>
<ul>
<li><p>The agent’s goals or the path to those goals <strong>change based on conditions</strong> in the <strong>environment</strong> or the <strong>outcomes of previous actions</strong>.</p>
</li>
<li><p>The planner <em>adapts in real time, abandoning, switching, or reprioritizing goals</em> as new information emerges or as utility values change.</p>
</li>
<li><p><strong>Example</strong>: In the GOAP framework, if an agent’s goal becomes impossible or less valuable due to a new situation (e.g., an enemy appears), it will select a new goal and generate a new plan accordingly.</p>
</li>
</ul>
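<p>To make the GOAP loop concrete, here is a minimal sketch in Python. The actions, preconditions, and effects are invented for illustration, and a simple breadth-first search stands in for the A* planner mentioned above:</p>

```python
# Minimal GOAP-style planner (illustrative; actions and goals are invented).
# World state: dict of booleans. Each action has preconditions and effects.

ACTIONS = {
    "find_health_pack": {"pre": {"knows_location": True}, "eff": {"has_health_pack": True}},
    "scout_area":       {"pre": {},                        "eff": {"knows_location": True}},
    "use_health_pack":  {"pre": {"has_health_pack": True}, "eff": {"healthy": True}},
}

def applicable(state, action):
    # An action can run only when all its preconditions hold in the state.
    return all(state.get(k) == v for k, v in ACTIONS[action]["pre"].items())

def apply_action(state, action):
    # Applying an action yields a new state updated with the action's effects.
    new_state = dict(state)
    new_state.update(ACTIONS[action]["eff"])
    return new_state

def plan(state, goal, max_depth=5):
    # Breadth-first search over action sequences (a stand-in for A*).
    frontier = [(state, [])]
    for _ in range(max_depth):
        next_frontier = []
        for s, seq in frontier:
            if all(s.get(k) == v for k, v in goal.items()):
                return seq  # goal reached
            for name in ACTIONS:
                if applicable(s, name):
                    next_frontier.append((apply_action(s, name), seq + [name]))
        frontier = next_frontier
    return None  # no plan within max_depth: the agent would reselect its goal

print(plan({}, {"healthy": True}))
# -> ['scout_area', 'find_health_pack', 'use_health_pack']
```

<p>In a real GOAP system each action would also carry a cost, and A* would use those costs plus a heuristic; replanning amounts to calling <code>plan</code> again whenever the world state or the goal changes.</p>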
<h2 id="heading-8-prompt-chaining">8) Prompt Chaining</h2>
<p>Prompt Chaining (Sequential) Planning is an AI technique where <strong>complex tasks are decomposed</strong> into a <strong>sequence of simpler, manageable subtasks</strong>, each handled by a <strong>dedicated prompt</strong>. The <strong>output</strong> of one prompt becomes the <strong>input</strong> for the next, guiding the AI through a structured, step-by-step reasoning process to achieve a coherent and accurate final result. This approach is especially effective for large language models (LLMs), allowing them to <strong>tackle intricate problems</strong> in a <strong>controlled, transparent, and modular fashion.</strong></p>
<h3 id="heading-linear-chaining">Linear Chaining</h3>
<p>Each prompt follows directly from the previous one in a <strong>strict, unbranched</strong> sequence. The output of step <em>n</em> is always used as the input for step <em>n+1</em>.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfOwRuUx3Fsp09dlBFwTgBRTIZ3klHGymkeEfiQsFyh3SmnFYoEBoRdM46etbFNXjcnWSIwcMbJGwiyK5atSA8WqLql9lRHoYyiKq3R4Tj7vSMaCAyPp6rCJoeoRLYrNCzDLrJhRw?key=PCHX87OUCQhNqNiTsFWfAg" alt /></p>
<h3 id="heading-conditional-chaining">Conditional Chaining</h3>
<p>The next prompt in the sequence is chosen based on the <strong>content</strong> or <strong>evaluation</strong> of the previous output, allowing for branching logic or dynamic adaptation within the chain.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXc8yXpviX5liem1N00oH1Xrtj1tRmHM6gkC5lVDjA_pd7jc8zd95-WsykdxfzVdJZu0SWXHsZl86llvOQv-6yMZzYPWXXk93XNRR9mv1FaB2F5CftpYlWwpQmrNtA6eiKZioCsYvQ?key=PCHX87OUCQhNqNiTsFWfAg" alt /></p>
<p><strong>Use Case</strong>: Useful for tasks that require <strong>decision points</strong>, <strong>error handling</strong>, or <strong>adaptive reasoning</strong>, such as customer support flows (if answer is unclear, ask for clarification; if clear, proceed to next step) or diagnostic processes.</p>
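<p>Both chaining styles reduce to a few lines of orchestration code. In this sketch, <code>call_llm</code> and <code>is_clear</code> are hypothetical placeholders for a real LLM API call and an evaluation step:</p>

```python
# Linear and conditional prompt chaining (sketch). `call_llm` is a
# hypothetical placeholder for any real LLM API call.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM provider here.
    return f"[response to: {prompt}]"

def linear_chain(topic: str) -> str:
    # Strict, unbranched sequence: step n's output is step n+1's input.
    outline = call_llm(f"Write an outline for an article about {topic}.")
    draft = call_llm(f"Expand this outline into a draft:\n{outline}")
    return call_llm(f"Polish this draft for clarity:\n{draft}")

def is_clear(answer: str) -> bool:
    # Placeholder evaluation: a real chain would ask the LLM to judge clarity.
    return len(answer) > 40

def conditional_chain(question: str) -> str:
    # Branching: the next prompt depends on an evaluation of the last output.
    answer = call_llm(f"Answer the customer question: {question}")
    if not is_clear(answer):
        return call_llm(f"Ask the customer a clarifying question about: {question}")
    return answer
```
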
<hr />
<h1 id="heading-limitations-of-planning-pattern">Limitations of Planning Pattern</h1>
<ul>
<li><p><strong>Complexity in Design and Implementation:</strong><br />  Developing adaptive, planning-based AI systems requires significant expertise and effort, especially for large-scale or highly dynamic environments.</p>
</li>
<li><p><strong>Resource Intensive:</strong><br />  These systems often demand substantial computational power, especially for real-time or large-scale applications.</p>
</li>
<li><p><strong>Transparency and Trust:</strong><br />  The decision-making process can become opaque, raising concerns about explainability and trust in automated outcomes.</p>
</li>
<li><p><strong>Ethical and Bias Issues:</strong><br />  Ensuring that planning algorithms are unbiased and ethically sound is a significant challenge, particularly as they are given more autonomy.</p>
</li>
<li><p><strong>Data Dependency:</strong><br />  The effectiveness of planning patterns relies heavily on the quality and completeness of input data.</p>
</li>
</ul>
<hr />
<h1 id="heading-future-scope-and-plans">Future Scope and Plans</h1>
<ul>
<li><p><strong>Greater Autonomy:</strong><br />  As planning patterns mature, AI agents will become increasingly autonomous, capable of making complex decisions with minimal human oversight.</p>
</li>
<li><p><strong>Integration with Multi-Agent Systems:</strong><br />  Future developments will see more collaborative planning among multiple agents, enabling sophisticated teamwork and distributed problem-solving.</p>
</li>
<li><p><strong>Explainable and Trustworthy AI:</strong><br />  Research will focus on making planning decisions more transparent and understandable to users, addressing trust and accountability concerns.</p>
</li>
</ul>
<hr />
<blockquote>
<p>To conclude, planning is one of the most important steps in AI design patterns and workflows. It enables AI agents to outline an overall plan and act on each intermediate result to deliver better output, and it is typically combined with other design patterns, such as the Tool Use and Reflection patterns, to build robust, scalable, end-to-end applications.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Diving deep into RAG (Retrieval Augmented Generation)]]></title><description><![CDATA[The landscape of artificial intelligence is rapidly evolving, and one of the most transformative breakthroughs in recent years is the Retrieval-Augmented Generation (RAG). Traditional large language models (LLMs) have demonstrated impressive abilitie...]]></description><link>https://blog.axicov.com/diving-deep-into-rag</link><guid isPermaLink="true">https://blog.axicov.com/diving-deep-into-rag</guid><category><![CDATA[AI]]></category><category><![CDATA[Retrieval-Augmented Generation]]></category><category><![CDATA[RAG ]]></category><dc:creator><![CDATA[Abhirup Ghosh]]></dc:creator><pubDate>Fri, 04 Jul 2025 13:10:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/nGoCBxiaRO0/upload/e8bf977045143067e45f5bf3efd6f923.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The landscape of artificial intelligence is <strong>rapidly evolving</strong>, and one of the most transformative breakthroughs in recent years is the <strong>Retrieval-Augmented Generation (RAG)</strong>. Traditional <strong>large language models (LLMs)</strong> have demonstrated impressive abilities in generating <strong>fluent</strong> and <strong>contextually relevant</strong> text, but they often falter when it comes to providing <strong>up-to-date, factual, or domain-specific information</strong>. RAG addresses these limitations by combining the generative power of LLMs with the precision of real-time information retrieval from external knowledge sources.</p>
<p>In this article, we’ll explore <strong>what RAG is</strong>, examine its <strong>diverse types</strong>, delve into <strong>real-world applications,</strong> and discuss the <strong>future trends</strong> shaping this exciting field. But before diving deep, let’s start from the beginning and understand some basics first.</p>
<h1 id="heading-how-the-gen-ai-model-works">How Does a GenAI Model Work?</h1>
<p>A basic <strong>GenAI model (Generative AI model)</strong> works by learning patterns from <strong>large datasets</strong> and using that knowledge to generate new content—such as <strong>text</strong>, <strong>images</strong>, or <strong>code</strong>—based on <strong>user prompts</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751619010531/9bc16715-c8c0-44a4-ae6d-30efcb8783e2.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-workflow-explained">The Workflow explained</h2>
<ul>
<li><p><strong>Training on Large Datasets:</strong> The model is trained on vast amounts of data <em>(text, images, etc.)</em>, learning the patterns, <em>language structures</em>, and <em>factual knowledge</em> present in that data.</p>
</li>
<li><p><strong>Prompting</strong>: Users provide a prompt (<em>a question or instruction</em>), and the model generates a response based on what it has learned from its training data.</p>
</li>
<li><p><strong>Content Generation</strong>: The model uses <em>neural networks</em> to predict and generate the next <em>word, sentence, or image segment</em>, creating content that appears <em>original and contextually relevant</em>.</p>
</li>
<li><p><strong>Response to User</strong>: The generated content is returned to the user, typically all at once or in a streaming fashion.</p>
</li>
</ul>
<h2 id="heading-limitations-of-the-basic-genai-model">Limitations of The Basic GenAI Model</h2>
<p>Traditional GenAI models rely solely on <strong>pre-trained data</strong>, which implies that their knowledge is <strong>frozen at the time of training</strong>. This leads to significant <strong>drawbacks</strong>, especially when users expect <strong>real-time, factually accurate responses</strong>.</p>
<p><strong>For instance, consider this user query:</strong></p>
<blockquote>
<p><em>“Who won the 2025 World Test Championship?”</em></p>
</blockquote>
<ul>
<li><p>The model searches its internal training data for relevant information. If the model was last trained on data up to <strong>2024 or early 2025</strong>, it has no actual records or results from the tournament.</p>
</li>
<li><p><strong>Hallucination</strong>: The model generates an answer by <strong><em>guessing</em></strong> based on historical winners (e.g., "India" or "Australia" since they have been <strong>frequent champions</strong>) or using <strong>patterns or popularity</strong>, not the facts from 2025.</p>
</li>
<li><p>It may give a confident answer: <em>“India won the 2025 World Test Championship”,</em> <strong>though actually South Africa won it</strong>.</p>
</li>
<li><p>The model <strong>cannot cite a real, up-to-date source for its answer</strong>, making it impossible for the user to verify the claim.</p>
</li>
</ul>
<hr />
<h1 id="heading-methods-to-improve-llm-output">Methods to Improve LLM Output</h1>
<p>To get <strong>better, more accurate, and contextually relevant</strong> outputs from GenAI models, <strong>three primary approaches</strong> are widely used:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751606718229/fff77954-cca0-4c85-8205-d2cb9ed82750.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-prompt-engineering">Prompt Engineering</h2>
<p><strong>Prompt engineering</strong> is the process of <strong>designing and refining</strong> input prompts to effectively guide generative AI models—especially large language models (LLMs)—to produce <strong>desired, high-quality outputs</strong>. This involves carefully <em>crafting the wording, structure, and context of the prompt</em>, or adding <strong>role-based guidance</strong>, so the AI understands the user’s intent and generates <em>relevant, accurate, and useful responses</em>.</p>
<h3 id="heading-advantages-of-prompt-engineering">Advantages of Prompt Engineering</h3>
<ul>
<li><p><strong>Fast and cost-effective</strong> – No need for model retraining or additional infrastructure.</p>
</li>
<li><p><strong>Flexible</strong> – Works across diverse domains and creative tasks.</p>
</li>
<li><p><strong>Accessible</strong> – Ideal for non-technical users and rapid prototyping.</p>
</li>
</ul>
<h3 id="heading-limitations-of-prompt-engineering">Limitations of Prompt Engineering</h3>
<ul>
<li><p><strong>Dependent on model knowledge</strong> – Can’t access new or domain-specific information not present in the training data.</p>
</li>
<li><p><strong>Trial and error</strong> – May require multiple iterations to get the desired output.</p>
</li>
<li><p><strong>Limited control</strong> – No guarantees of consistent output in complex scenarios.</p>
</li>
</ul>
<h3 id="heading-when-to-use-prompt-engineering">When to Use Prompt Engineering?</h3>
<ul>
<li><p>You want quick improvements in clarity, tone, or structure.</p>
</li>
<li><p>The model already knows the topic you’re working on.</p>
</li>
</ul>
<h2 id="heading-fine-tuning">Fine-Tuning</h2>
<p><strong>Fine-tuning</strong> is the process of training a pre-existing generative AI model on a <strong>specialized, domain-specific dataset</strong> to adapt it for <strong>niche tasks or industries</strong>. Unlike prompt engineering, fine-tuning changes the model’s <strong>internal parameters</strong>, allowing it to deeply learn new information.</p>
<h3 id="heading-advantages-of-fine-tuning">Advantages of Fine-Tuning</h3>
<ul>
<li><p><strong>Deep customization</strong> – The model learns domain-specific <em>vocabulary, patterns, and nuances</em>.</p>
</li>
<li><p><strong>Higher accuracy</strong> – Especially useful for repetitive and predictable tasks.</p>
</li>
<li><p><strong>Improved consistency</strong> – Ideal for production-level tasks in specialized sectors.</p>
</li>
</ul>
<h3 id="heading-limitations-of-fine-tuning">Limitations of Fine-Tuning</h3>
<ul>
<li><p><strong>Resource-intensive</strong> – Requires significant computing power, time, and data engineering.</p>
</li>
<li><p><strong>High maintenance</strong> – Needs re-training as domain knowledge evolves.</p>
</li>
<li><p><strong>Less flexibility</strong> – Not suitable for rapidly changing or broad information domains.</p>
</li>
</ul>
<h3 id="heading-when-should-you-use-fine-tuning">When Should You Use Fine-Tuning?</h3>
<ul>
<li><p>Your use case is <strong>highly specialized</strong> and not covered well by base models.</p>
</li>
<li><p>You need <strong>precise and consistent outputs</strong> (e.g., medical diagnosis support, legal contract classification).</p>
</li>
</ul>
<h2 id="heading-retrieval-augmented-generation-rag">Retrieval-Augmented Generation (RAG)</h2>
<p><strong>Retrieval-Augmented Generation (RAG)</strong> is a powerful <strong>hybrid technique</strong> that enhances language models by integrating them with <strong>external knowledge sources</strong> like <em>databases, document stores, or the web</em>. Unlike basic GenAI models, RAG provides responses that are <strong>factually grounded, up-to-date, and contextually accurate</strong>.</p>
<h3 id="heading-advantages-of-rag">Advantages of RAG</h3>
<ul>
<li><p><strong>Factual accuracy</strong> – Combines model reasoning with real-world, retrieved data.</p>
</li>
<li><p><strong>Reduced hallucinations</strong> – Limits the model's tendency to "make up" facts.</p>
</li>
<li><p><strong>Dynamic knowledge</strong> – No need for retraining when information changes.</p>
</li>
<li><p><strong>Better source attribution</strong> – You can trace where the information came from.</p>
</li>
</ul>
<h3 id="heading-limitations-of-rag">Limitations of RAG</h3>
<ul>
<li><p><strong>Complex integration</strong> – Requires retrieval infrastructure (like vector databases, embeddings, and indexing).</p>
</li>
<li><p><strong>Latency</strong> – Retrieval adds a step before generation, which can increase response time.</p>
</li>
<li><p><strong>Data maintenance</strong> – You need to keep the external knowledge base updated and relevant.</p>
</li>
</ul>
<h3 id="heading-when-should-you-use-rag">When Should You Use RAG?</h3>
<ul>
<li><p>You need <strong>real-time or frequently updated</strong> information.</p>
</li>
<li><p><strong>Accuracy</strong> and <strong>source grounding</strong> are critical (e.g., in enterprise, finance, and healthcare).</p>
</li>
</ul>
<hr />
<h1 id="heading-brief-history-of-rag">Brief History of RAG</h1>
<p>The history of Retrieval-Augmented Generation (RAG) is closely tied to the <strong>evolution of question-answering systems</strong> and the <strong>limitations of traditional large language models</strong> (LLMs).</p>
<ul>
<li><p><strong>Early Roots:</strong><br />  The concept of retrieval in AI dates back to the 1960s and 1970s, with early systems like <strong>SHRDLU</strong> and <strong>Baseball</strong>, which could answer natural language questions by retrieving relevant information from a limited dataset. Over time, search engines like <strong>Ask Jeeves</strong> and later <strong>Google</strong> advanced these retrieval techniques, focusing on indexing and ranking information for user queries.</p>
</li>
<li><p><strong>Rise of LLMs and Their Limits:</strong><br />  The late 2010s saw the emergence of powerful pre-trained models like BERT and GPT, which could generate human-like text but were limited by their static, fixed training data. As generative AI became more popular—especially after the release of GPT-3 and user-friendly interfaces like ChatGPT—researchers recognized a major problem: LLMs could not efficiently incorporate new or updated information without expensive retraining.</p>
</li>
<li><p><strong>Birth of RAG (2020):</strong><br />  In 2020, Meta AI (then Facebook AI Research) introduced the RAG framework in their paper <a target="_blank" href="https://arxiv.org/pdf/2005.11401">"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"</a>. This innovation combined the strengths of generative models with retrieval systems. RAG augmented LLMs with a "<strong>non-parametric memory</strong>"—typically a <strong>dense</strong> <strong>vector index of factual databases</strong> like <strong>Wikipedia</strong>—enabling them to fetch relevant information in real time during the generation process</p>
</li>
</ul>
<hr />
<h1 id="heading-working-of-the-rag-model">Working of the RAG Model</h1>
<p>RAG operates through several key stages, integrating retrieval and generation in a seamless pipeline:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751608973432/28c53880-1620-4529-8005-34ec6b2592ec.png" alt class="image--center mx-auto" /></p>
<p><strong>1. Indexing (Knowledge Base Creation)</strong></p>
<ul>
<li><p><strong>Data ingestion</strong>: Gather documents, databases, PDFs, web pages, or other files.</p>
</li>
<li><p><strong>Chunking/splitting</strong>: Break longer documents into smaller, semantically coherent pieces for efficiency.</p>
</li>
<li><p><strong>Embedding</strong>: Convert each chunk into a high-dimensional vector using embedding models (e.g., SBERT, OpenAI embeddings).</p>
</li>
<li><p><strong>Vector database storage</strong>: Store embeddings and metadata in databases like FAISS, Pinecone.</p>
</li>
</ul>
<p><strong>2. Retrieval</strong></p>
<ul>
<li><p><strong>Query embedding</strong>: The user's prompt is also converted into a vector using the same embedding model.</p>
</li>
<li><p><strong>Similarity search</strong>: A retriever (often Dense Passage Retrieval – DPR) finds the top <em>k</em> closest chunks using techniques like <em>Approximate Nearest Neighbor</em> (ANN) search.</p>
</li>
<li><p><strong>Advanced matching</strong>: Sometimes combined with sparse search or reranking models to improve relevance.</p>
</li>
</ul>
<p><strong>3. Augmentation</strong></p>
<ul>
<li><p><strong>Prompt Construction</strong>: Retrieved passages are concatenated or cross-attended with the original user prompt to create an <em>augmented prompt</em></p>
</li>
<li><p>This ensures the LLM has both the question and fresh, factual context to draw upon.</p>
</li>
</ul>
<p><strong>4. Generation</strong></p>
<ul>
<li><p><strong>Grounded response</strong>: The LLM processes the augmented prompt and generates an answer informed by both its internal knowledge and retrieved data.</p>
</li>
<li><p><strong>Optional reranking</strong>: Response quality may be improved via re-ranking passages or extracting citations</p>
</li>
</ul>
<p><strong>5. (Optional) Knowledge Base Updates</strong></p>
<ul>
<li>To maintain accuracy, the external knowledge base can be updated regularly with new data and refreshed embeddings, ensuring the system always references the latest information</li>
</ul>
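<p>The five stages above can be sketched end to end. The bag-of-words “embeddings” and the three-document “knowledge base” below are toy stand-ins for a real embedding model and vector database:</p>

```python
import math
from collections import Counter

# Toy end-to-end RAG pipeline. Bag-of-words vectors stand in for a real
# embedding model, and the augmented prompt is returned instead of being
# sent to an LLM.

DOCS = [
    "South Africa won the 2025 World Test Championship final.",
    "The 2023 Cricket World Cup was won by Australia.",
    "RAG combines retrieval with text generation.",
]

def embed(text: str) -> Counter:
    # Stand-in embedding: word counts (real systems use dense vectors).
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing: embed every chunk once, up front.
INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(query: str, k: int = 1) -> list:
    # 2. Retrieval: embed the query with the same model, rank by similarity.
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # 3. Augmentation: prepend the retrieved context to the user's question.
    context = "\n".join(retrieve(query))
    # 4. Generation: a real system would now send this prompt to an LLM.
    return f"Context:\n{context}\n\nQuestion: {query}"

print(retrieve("Who won the 2025 World Test Championship?"))
# -> ['South Africa won the 2025 World Test Championship final.']
```

<p>Swapping <code>embed</code> for a real embedding model, <code>INDEX</code> for a vector database like FAISS or Pinecone, and sending the prompt from <code>build_prompt</code> to an LLM turns this toy into the full pipeline described above.</p>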
<hr />
<h2 id="heading-what-is-semantic-search-and-how-is-it-relevant-here">What is Semantic Search, and how is it relevant here?</h2>
<p><strong>Semantic search</strong> improves RAG results for organizations that want to connect <strong>vast external knowledge</strong> sources to their LLM applications. Modern enterprises store vast amounts of information, like <strong><em>manuals, FAQs, research reports, customer service guides, and human resource document repositories</em></strong>, across various systems. Retrieving the right context from all of this at scale is difficult, which in turn lowers the quality of generative output.</p>
<p>Semantic search technologies can scan large databases of disparate information and retrieve data <strong>more accurately</strong>. For example, they can answer questions such as, <em>"How much was spent on machinery repairs last year?”</em> by mapping the question to the relevant documents and returning specific text instead of search results. Developers can then use that answer to provide <strong>more context to the LLM</strong>.</p>
<p><strong>Conventional or keyword search solutions</strong> in RAG produce <strong>limited</strong> results for <strong>knowledge-intensive tasks</strong>. Developers must also deal with <em>word embeddings, document chunking, and other complexities</em> as they manually prepare their data. In contrast, semantic search technologies do all the work of knowledge base preparation, so developers don't have to. They also generate semantically relevant passages and token words ordered by relevance to <strong>maximize the quality of the RAG payload</strong>.</p>
<hr />
<h2 id="heading-why-do-we-need-to-use-an-embedding-model">Why do we need to use an Embedding Model?</h2>
<p>We convert text to vectorized form using embedding models because this process allows AI systems to <strong>understand and compare</strong> the <em>meaning</em> of words, phrases, or documents, rather than just matching exact keywords. Here’s how and why this helps, especially in RAG and semantic search:</p>
<h3 id="heading-why-convert-to-vectors">Why Convert to Vectors?</h3>
<ul>
<li><p><strong>Captures Meaning and Context:</strong><br />  Embedding models transform text into high-dimensional vectors (arrays of numbers) that encode semantic meaning. Words or phrases with similar meanings end up close together in this vector space, even if they use different vocabulary. For example, “car” and “automobile” would have similar vectors, while “car” and “banana” would be far apart.</p>
</li>
<li><p><strong>Enables Semantic Search:</strong><br />  By working with vectors, search systems can retrieve results based on conceptual relevance, not just keyword overlap. This means a query like “canine behavior” can return documents about “dog training,” since their embeddings are semantically close.</p>
</li>
<li><p><strong>Disambiguates Context:</strong><br />  Embeddings help differentiate between words with multiple meanings (like “bank” as a financial institution vs. “bank” of a river) by considering the surrounding context.</p>
</li>
</ul>
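<p>A tiny numerical illustration of the idea: with hand-made 3-dimensional “embeddings” (real models such as SBERT learn hundreds of dimensions from data), cosine similarity places “car” close to “automobile” and far from “banana”:</p>

```python
import math

# Hand-made 3-d "embeddings" for illustration only; real embedding models
# learn hundreds of dimensions from data.
vectors = {
    "car":        [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "banana":     [0.00, 0.10, 0.95],
}

def cosine(a, b):
    # Cosine similarity: near 1.0 for similar directions, near 0 for unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

print(round(cosine(vectors["car"], vectors["automobile"]), 2))  # -> 1.0 (near-synonyms)
print(round(cosine(vectors["car"], vectors["banana"]), 2))      # -> 0.01 (unrelated)
```
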
<hr />
<h1 id="heading-types-of-rag-pipeline">Types of RAG Pipeline</h1>
<p>There are multiple types of Retrieval-Augmented Generation (RAG) models, each designed to address specific challenges or optimize for different use cases. The RAG landscape has evolved from simple, original frameworks to advanced, specialized architectures. Here’s an overview of the main types:</p>
<ul>
<li><p>Naive RAG (Normal RAG that we have discussed so far)</p>
</li>
<li><p>Agentic RAG</p>
</li>
<li><p>Multimodal RAG</p>
</li>
<li><p>Corrective RAG (CRAG)</p>
</li>
<li><p>Golden-Retriever RAG</p>
</li>
</ul>
<h2 id="heading-agentic-rag">Agentic RAG</h2>
<p>So far, we have used the LLM <strong>solely</strong> to generate output from <strong>augmented prompts</strong> built with the vector database. However, <strong>LLMs are far more powerful</strong>, and we can <strong>use them wisely</strong> to make our RAG pipeline <strong>even more efficient.</strong></p>
<p>Agentic RAG is an advanced evolution of Retrieval-Augmented Generation (RAG) that integrates <strong>autonomous AI agents</strong> into the RAG pipeline, transforming the retrieval and generation process from a static, one-shot interaction into a <strong>dynamic, multi-step, and context-aware system</strong>.</p>
<h3 id="heading-workflow">Workflow</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751612045612/0c99fa2e-9d45-41cd-bdf9-cb6ea9fe3f20.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Agentic Orchestration</strong><br />  An <strong>orchestrator agent</strong> interprets user intent, breaks complex questions into sub-tasks, and deploys specialized agents for retrieval, reasoning, validation, and synthesis.</p>
</li>
<li><p><strong>Dynamic &amp; Adaptive Retrieval</strong></p>
<ul>
<li><p><strong>Retrieval agents</strong> perform iterative searches: reformulating queries, switching sources (vector DBs, APIs, web), re-ranking results, and filtering for reliability.</p>
</li>
<li><p>Multiple rounds allow refinement until a satisfactory context is obtained.</p>
</li>
</ul>
</li>
<li><p><strong>Reasoning &amp; Validation</strong></p>
<ul>
<li><p><strong>Reasoner agents</strong> chain thoughts, connect evidence, cross-check data, assess source credibility, and prevent contradictions.</p>
</li>
<li><p>They may trigger additional retrieval loops or tool use (calculators, APIs) for verification.</p>
</li>
</ul>
</li>
<li><p><strong>Tool &amp; Memory Integration</strong></p>
<ul>
<li><p>Agents can use memory (short/long-term) to recall past interactions or document where they’ve already searched.</p>
</li>
<li><p>They invoke external tools in real time—tools like live web search, APIs, or computation modules—enriching responses and ensuring freshness.</p>
</li>
</ul>
</li>
<li><p><strong>Generation &amp; Refinement</strong></p>
<ul>
<li><p><strong>Generation agents</strong> construct the augmented prompt and produce answers.</p>
</li>
<li><p><strong>Refinement agents</strong> evaluate the initial output, rerun retrieval or reasoning if needed, and polish the final response before delivering it.</p>
</li>
</ul>
</li>
</ul>
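<p>The iterative retrieval loop at the heart of Agentic RAG can be sketched as follows. Here <code>search</code>, <code>judge_relevance</code>, and <code>reformulate</code> are hypothetical placeholders for a vector-database query, an evaluator LLM call, and a query-rewriting LLM call:</p>

```python
# Sketch of an agentic retrieval loop: retrieve, evaluate, reformulate, retry.
# All three helpers below are illustrative placeholders, not a real library.

KNOWLEDGE = {"warranty policy": ["Warranty covers parts for 2 years."]}

def search(query: str) -> list:
    # Placeholder retriever: exact-match lookup instead of a vector search.
    return KNOWLEDGE.get(query, [])

def judge_relevance(docs: list) -> bool:
    # Placeholder evaluator: a reasoner agent would score docs with an LLM.
    return len(docs) > 0

def reformulate(query: str) -> str:
    # Placeholder rewriter: an agent would rephrase the query with an LLM.
    return "warranty policy"

def agentic_retrieve(query: str, max_rounds: int = 3) -> list:
    # Iterate until the retrieved context is judged satisfactory.
    for _ in range(max_rounds):
        docs = search(query)
        if judge_relevance(docs):
            return docs
        query = reformulate(query)  # adapt and try again
    return []

print(agentic_retrieve("how long are parts covered?"))
# -> ['Warranty covers parts for 2 years.']
```

<p>The first search misses, so the agent reformulates and retries; a real orchestrator would also switch sources or invoke tools between rounds.</p>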
<h3 id="heading-naive-rag-vs-agentic-rag">Naive RAG vs Agentic RAG</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Naive RAG</strong></td><td><strong>Agentic RAG</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Workflow</strong></td><td>Single-step retrieval → generate</td><td>Multi-step planning, retrieval, and validation loops</td></tr>
<tr>
<td><strong>Decision-making</strong></td><td>Static</td><td>Dynamic orchestration by AI agents</td></tr>
<tr>
<td><strong>Reasoning &amp; validation</strong></td><td>Limited</td><td>Agent-driven reasoning, checks, and corrections</td></tr>
<tr>
<td><strong>Tool access</strong></td><td>Fixed databases</td><td>Web APIs, calculation tools, multi-source retrieval</td></tr>
<tr>
<td><strong>Context &amp; memory</strong></td><td>One-shot context</td><td>Maintains short/long-term context</td></tr>
</tbody>
</table>
</div><h3 id="heading-use-cases">Use Cases</h3>
<ol>
<li><p>Advanced customer support</p>
</li>
<li><p>Healthcare diagnostics</p>
</li>
<li><p>Legal and compliance advisory</p>
</li>
<li><p>Real-time research assistants</p>
</li>
<li><p>Robotics and automation</p>
</li>
</ol>
<hr />
<h2 id="heading-multimodal-rag">Multimodal RAG</h2>
<p><strong>Multimodal RAG (Retrieval-Augmented Generation)</strong> is an advanced AI framework that enables <strong>retrieval and generation across diverse data types</strong>—including text, images, audio, video, and structured data—by embedding all modalities into a <strong>shared vector space</strong> or aligning them through a primary modality for seamless, combined retrieval.</p>
<h3 id="heading-workflow-1">Workflow</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751614027360/4c0c9d32-6668-4928-a43a-537f0eb6e242.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Data Embedding</strong></p>
<ul>
<li><p>Encode various data types (text, images, audio, video) into vectors using multimodal embedding models like CLIP, ALIGN, or audio/text encoders.</p>
</li>
<li><p>Store these embeddings (and metadata) in a multimodal vector database (e.g., FAISS, Weaviate).</p>
</li>
</ul>
</li>
<li><p><strong>Query Embedding &amp; Retrieval</strong></p>
<ul>
<li><p>Convert user queries (whether text, image, or audio) into embeddings using the same models.</p>
</li>
<li><p>Perform a similarity search to retrieve relevant multimodal content (e.g., text passages, matching images, audio clips).</p>
</li>
</ul>
</li>
<li><p><strong>Fusion &amp; Augmentation</strong></p>
<ul>
<li>Align or fuse retrieved multimodal content into a unified context. This may involve cross-modal attention or text grounding of non-text sources.</li>
</ul>
</li>
<li><p><strong>Response Generation</strong></p>
<ul>
<li><p>Feed the fused context into a multimodal LLM (MLLM) or an LLM with modality support (e.g., GPT-4V, LLaVA).</p>
</li>
<li><p>Generate responses that reference or synthesize information across modalities, producing richer and more accurate outputs.</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-naive-rag-vs-multimodal-rag">Naive RAG vs Multimodal RAG</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Naive RAG</strong></td><td><strong>Multimodal RAG</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Input Modalities</strong></td><td>Text only</td><td>Text, images, audio, video, structured data</td></tr>
<tr>
<td><strong>Embedding &amp; Query Storage</strong></td><td>Text embeddings → vector DB</td><td>Multimodal embeddings → shared vector DB</td></tr>
<tr>
<td><strong>Retrieval Process</strong></td><td>Text-based similarity search</td><td>Cross-modal retrieval (e.g., image-query retrieves images + text)</td></tr>
<tr>
<td><strong>Generation Output</strong></td><td>Text-only responses</td><td>Multimodal responses referencing images, charts, and audio descriptions</td></tr>
<tr>
<td><strong>Complexity &amp; Cost</strong></td><td>Low complexity, faster</td><td>Higher complexity, multimodal embedding &amp; fusion required</td></tr>
</tbody>
</table>
</div><h3 id="heading-use-cases-1">Use Cases</h3>
<ol>
<li><p>Medical Diagnostics &amp; Radiology Analysis</p>
</li>
<li><p>E-Commerce &amp; Visual Product Search</p>
</li>
<li><p>Manufacturing &amp; Maintenance Assistance</p>
</li>
<li><p>Business &amp; Financial Data Fusion</p>
</li>
<li><p>Education &amp; Interactive E‑Learning</p>
</li>
<li><p>Customer Service with Multi‑Channel Inputs</p>
</li>
</ol>
<hr />
<h2 id="heading-corrective-rag-crag">Corrective RAG (CRAG)</h2>
<p>CRAG (Corrective Retrieval-Augmented Generation) is an advanced AI framework that builds upon traditional Retrieval-Augmented Generation (RAG) by introducing a robust evaluation and correction mechanism. Its core purpose is to ensure that only accurate, relevant, and high-confidence information is used for generating responses, thereby reducing errors and hallucinations in AI outputs.</p>
<h3 id="heading-workflow-2">Workflow</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751617359871/72be458f-7e0c-4674-9fd2-21932b7ae8ee.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Initial Retrieval</strong></p>
<ul>
<li>The system retrieves a set of documents relevant to the user’s query from a knowledge base, similar to standard RAG.</li>
</ul>
</li>
<li><p><strong>Retrieval Evaluation</strong></p>
<ul>
<li><p>A retrieval evaluator (often a lightweight, fine-tuned model) assesses each retrieved document for relevance and accuracy.</p>
</li>
<li><p>Each document receives a confidence score and is categorized as:</p>
<ul>
<li><p>High Confidence (Correct)</p>
</li>
<li><p>Low Confidence (Incorrect)</p>
</li>
<li><p>Medium/Ambiguous Confidence</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Corrective Actions</strong></p>
<ul>
<li><p><strong>High Confidence:</strong></p>
<ul>
<li>The system refines these documents, extracting only the most relevant information (using techniques like decompose-then-recompose).</li>
</ul>
</li>
<li><p><strong>Low Confidence:</strong></p>
<ul>
<li><p>Unreliable documents are discarded.</p>
</li>
<li><p>The system triggers supplementary retrieval, such as a web search, to find better information.</p>
</li>
</ul>
</li>
<li><p><strong>Medium/Ambiguous Confidence:</strong></p>
<ul>
<li>The system blends refined retrieved documents with additional web search results to ensure robustness.</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Knowledge Refinement</strong></p>
<ul>
<li>All selected information is further filtered and broken down into concise, high-quality knowledge strips, removing noise and focusing on key facts.</li>
</ul>
</li>
<li><p><strong>Generation</strong></p>
<ul>
<li>The refined, corrected knowledge is provided as context to the language model, which then generates the final response.</li>
</ul>
</li>
<li><p><strong>(Optional) Feedback Loop</strong></p>
<ul>
<li>In some implementations, the output can be further validated, and the process iterates if inconsistencies are detected.</li>
</ul>
</li>
</ol>
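<p>The corrective routing above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: every callable (<code>retrieve</code>, <code>evaluate</code>, <code>web_search</code>, <code>refine</code>, <code>generate</code>) is a hypothetical placeholder for a real component such as a vector store, a fine-tuned evaluator, a search API, or an LLM, and the 0.7/0.3 thresholds are illustrative.</p>

```python
def corrective_rag(query, retrieve, evaluate, web_search, refine, generate):
    """Sketch of the CRAG loop: retrieve, score, route, refine, generate.

    All callables are hypothetical placeholders for real components
    (vector store, confidence evaluator, web search API, LLM).
    """
    docs = retrieve(query)                                   # 1. initial retrieval
    scored = [(doc, evaluate(query, doc)) for doc in docs]   # 2. confidence scores

    knowledge = []
    for doc, score in scored:                                # 3. corrective actions
        if score >= 0.7:                                     # high confidence: refine and keep
            knowledge.append(refine(doc))
        elif score <= 0.3:                                   # low confidence: discard, fall back to web
            knowledge.extend(refine(d) for d in web_search(query))
        else:                                                # ambiguous: blend both sources
            knowledge.append(refine(doc))
            knowledge.extend(refine(d) for d in web_search(query))

    return generate(query, knowledge)                        # 4–5. refined context in, answer out
```

<p>The key design point is that routing happens per document, so one weak retrieval does not poison the whole context window.</p>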
<h3 id="heading-naive-rag-vs-corrective-rag-crag">Naive RAG vs Corrective RAG (CRAG)</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Naive RAG</strong></td><td><strong>CRAG (Corrective RAG)</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Hallucination Handling</strong></td><td>May generate false or misleading answers based on unverified data.</td><td>Evaluates and filters retrieved info to minimize hallucinations.</td></tr>
<tr>
<td><strong>Retrieval Failure Recovery</strong></td><td>No fallback mechanism—poor results degrade output.</td><td>Performs additional retrieval (e.g., web search) if initial results are weak or wrong.</td></tr>
<tr>
<td><strong>Noise Filtering</strong></td><td>Passes all retrieved content directly to the LLM, even irrelevant or verbose data.</td><td>Filters and refines content into concise, relevant knowledge strips.</td></tr>
<tr>
<td><strong>Confidence Scoring</strong></td><td>No concept of scoring—assumes all retrievals are equally useful.</td><td>Assigns confidence scores (High, Medium, Low) to determine how content is handled.</td></tr>
<tr>
<td><strong>Output Quality</strong></td><td>Inconsistent—sometimes accurate, sometimes misleading.</td><td>Consistently more accurate and grounded in vetted content.</td></tr>
</tbody>
</table>
</div><h3 id="heading-use-cases-2">Use Cases</h3>
<ol>
<li><p>Healthcare Assistants</p>
</li>
<li><p>Enterprise Knowledge Assistants</p>
</li>
<li><p>Academic Research Tools</p>
</li>
<li><p>Customer Support Bots</p>
</li>
<li><p>Financial Analysis Copilots</p>
</li>
<li><p>Government &amp; Policy Advisory Systems</p>
</li>
</ol>
<hr />
<h2 id="heading-golden-retriever-rag">Golden Retriever RAG</h2>
<p>Golden-Retriever RAG is a <strong>high-fidelity</strong>, <strong>agentic</strong> Retrieval-Augmented Generation (RAG) system specifically designed to excel in <strong>complex, domain-specific environments</strong>—such as <strong>industrial knowledge bases</strong>—where queries often involve <strong>specialized jargon</strong> and <strong>ambiguous context</strong>.</p>
<h3 id="heading-workflow-3">Workflow</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751617502324/58979384-a9dd-4f98-873d-2572206b7111.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Jargon Identification:</strong><br /> The system scans the user’s query for technical terms, abbreviations, or domain-specific language.</p>
</li>
<li><p><strong>Context Clarification:</strong><br /> Each identified term is cross-referenced with a jargon dictionary and contextualized based on the query.</p>
</li>
<li><p><strong>Question Augmentation:</strong><br /> The original question is rewritten or expanded to include clarified definitions and context, making it more precise for retrieval.</p>
</li>
<li><p><strong>Document Retrieval:</strong><br /> The augmented question is used to search the knowledge base, resulting in the retrieval of highly relevant and contextually accurate documents.</p>
</li>
<li><p><strong>Answer Generation:</strong><br /> Retrieved documents are provided as context to the language model, which then generates a precise, well-grounded answer.</p>
</li>
</ol>
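<p>The jargon-identification and question-augmentation steps above can be sketched as a single helper. This is an assumption-laden toy: a real Golden-Retriever system would use an LLM to spot domain terms, whereas here <code>jargon_dict</code> is a hypothetical <code>{term: definition}</code> mapping and matching is plain substring lookup.</p>

```python
def augment_query(query, jargon_dict):
    """Sketch of Golden-Retriever's pre-retrieval step: find known jargon
    in the query and append definitions so retrieval sees full context.

    jargon_dict is a hypothetical {term: definition} dictionary.
    Returns None as a "miss response" when no known term is found.
    """
    # 1. Jargon identification: which dictionary terms appear in the query?
    found = [term for term in jargon_dict if term.lower() in query.lower()]
    if not found:
        return None  # miss response: caller can ask the user to rephrase

    # 2–3. Context clarification and question augmentation
    definitions = "; ".join(f"{t} means {jargon_dict[t]}" for t in found)
    return f"{query} (context: {definitions})"
```

<p>The augmented string, not the raw query, is what gets embedded and sent to the document-retrieval step, which is where the accuracy gain comes from.</p>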
<h3 id="heading-naive-rag-vs-golden-retriever-rag">Naive RAG vs Golden-Retriever RAG</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Naive RAG</strong></td><td><strong>Golden‑Retriever RAG</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Jargon Handling</strong></td><td>Ignores specialized terms or acronyms — retrieval may miss context.</td><td>Identifies and clarifies jargon through a dictionary before retrieval.</td></tr>
<tr>
<td><strong>Question Augmentation</strong></td><td>Uses the original user query as-is.</td><td>Augments queries with jargon definitions and context to resolve ambiguity.</td></tr>
<tr>
<td><strong>Context Awareness</strong></td><td>Lacks disambiguation — may retrieve irrelevant documents.</td><td>Contextual clarification helps retrieval stay on-topic.</td></tr>
<tr>
<td><strong>Fallback Behavior</strong></td><td>No mechanism for missing jargon or misinterpreted queries.</td><td>Returns a "miss response" suggesting improvements if the jargon isn't found.</td></tr>
<tr>
<td><strong>Retrieval Accuracy</strong></td><td>Depends purely on similarity search — may be noisy for domain terms.</td><td>Higher relevance due to the enhanced retrieval query and jargon integration.</td></tr>
</tbody>
</table>
</div><h3 id="heading-use-cases-3">Use Cases</h3>
<ol>
<li><p>Legal Counseling &amp; Compliance</p>
</li>
<li><p>Industrial Knowledge Base Exploration</p>
</li>
<li><p>Education &amp; Training Support</p>
</li>
<li><p>Medical Diagnostics Assistance</p>
</li>
<li><p>Enterprise Research &amp; Decision Support</p>
</li>
</ol>
<hr />
<h1 id="heading-limitations-of-rag-1">Limitations of RAG</h1>
<ol>
<li><p><strong>Quality and Accuracy of Retrieval:</strong><br /> RAG systems depend on the quality of external data sources. If the retrieval system fetches irrelevant, outdated, or inaccurate documents, the generated output will be unreliable—even if the language model itself is strong.</p>
</li>
<li><p><strong>Computational Cost and Complexity:</strong><br /> Running the RAG pipeline requires both a robust retrieval system and a generative model, increasing computational resources and latency compared to standalone LLMs. Real-time retrieval from large datasets can slow down response times and increase infrastructure costs.</p>
</li>
<li><p><strong>Dependency on Data Structure:</strong><br /> RAG’s effectiveness relies on well-organized, accessible, and up-to-date knowledge bases. Poorly structured or incomplete data can degrade performance, and not all organizations have the resources to maintain high-quality databases.</p>
</li>
<li><p><strong>Lack of Iterative Reasoning:</strong><br /> Most RAG systems perform a single retrieval step and cannot iteratively refine their search or reason over multiple steps, which limits their ability to handle complex, multi-hop queries.</p>
</li>
<li><p><strong>Bias and Ethical Risks:</strong><br /> If the underlying data sources are biased or flawed, RAG can amplify these issues, leading to unfair or untrustworthy outputs.</p>
</li>
</ol>
<hr />
<h1 id="heading-future-plans-and-scope-of-improvements">Future Plans and Scope of Improvements</h1>
<ol>
<li><p><strong>Multimodal Integration:</strong><br /> Future RAG systems will increasingly combine text, images, audio, and video, enabling richer and more context-aware outputs for complex real-world tasks.</p>
</li>
<li><p><strong>Continuous Learning and Adaptation:</strong><br /> RAG models will adopt incremental and online learning, updating their knowledge bases and retrieval strategies in real time without requiring full retraining.</p>
</li>
<li><p><strong>Adaptive and Iterative Retrieval:</strong><br /> Advanced RAG will feature adaptive algorithms that refine queries and retrievals based on user intent and feedback, improving precision and relevance, especially in specialized domains.</p>
</li>
<li><p><strong>Bias Mitigation and Ethical AI:</strong><br /> Research focuses on transparent, accountable frameworks to detect and correct biases in both retrieval and generation, ensuring fair and trustworthy outputs.</p>
</li>
<li><p><strong>Enhanced Reasoning and Multi-Hop Capabilities:</strong><br /> Future RAG systems will support multi-step, hierarchical, and multi-hop reasoning, enabling them to answer more complex queries by connecting information across multiple sources.</p>
</li>
</ol>
<hr />
<blockquote>
<p><strong>In conclusion, Retrieval-Augmented Generation is not just enhancing the capabilities of AI—it's reshaping how we access, synthesize, and trust information. As RAG continues to evolve, embracing new modalities and smarter retrieval strategies, it promises to unlock even greater potential for innovation across industries, making AI-driven solutions more accurate, explainable, and impactful than ever before.</strong></p>
</blockquote>
]]></content:encoded></item></channel></rss>