gtstudio/module-ai-knowledge-base

Knowledge base management for Magento 2. Upload documents (PDF, TXT) that AI agents can retrieve as context before answering queries.


Package info

github.com/gabrielgts/module-ai-knowledge-base

Type: magento2-module

pkg:composer/gtstudio/module-ai-knowledge-base

Statistics

Installs: 5

Dependents: 0

Suggesters: 0

Stars: 0

Open Issues: 0

Version 1.0.3 (2026-03-11 03:09 UTC)


Last update: 2026-03-24 18:06:53 UTC


README

Document management for AI agents in Magento 2. Upload files that agents can retrieve as context before answering queries — enabling retrieval-augmented generation (RAG) without a vector database.

Preview (screenshot): uploading a PDF in AiKnowledgeBase and querying an agent that retrieves relevant excerpts

AI Studio Ecosystem

Part of the AI Studio suite for Magento 2. See all modules:

Module | Repository | Description
Gtstudio_AiConnector | module-aiconnector | Core AI provider abstraction
Gtstudio_AiAgents | module-ai-agents | Agent & tool orchestration, cron scheduling, execution log
Gtstudio_AiWidgets | module-ai-widgets | Floating admin chat widget + PageBuilder AI generator
Gtstudio_AiDataQuery | module-ai-data-query | Natural-language store analytics (privacy-first)
Gtstudio_AiKnowledgeBase | (this module) | Document upload & RAG retrieval for agents
Gtstudio_AiDashboard | module-ai-dashboard | AI-powered KPI dashboard with ML insights

What It Does

  • Upload and manage documents (PDF, TXT) in the Magento admin
  • Documents are stored and indexed so that agents can fetch relevant excerpts at query time
  • Integrates with Gtstudio_AiAgents: associate each document with one or more agents

Requirements

  • Magento 2.4.4+
  • PHP 8.1+
  • Gtstudio_AiConnector enabled and configured
  • Gtstudio_AiAgents enabled
  • smalot/pdfparser: ^2.12 (PDF text extraction)

Installation

composer require gtstudio/module-ai-knowledge-base
php bin/magento module:enable Gtstudio_AiKnowledgeBase
php bin/magento setup:upgrade

Usage

Uploading Documents

Navigate to AI Studio → Agents & Tools → Knowledge Base.

Click Add New, fill in:

Field | Description
Title | Human-readable label (auto-populated from PDF metadata on upload)
Upload PDF Document | Upload a PDF file; text and metadata are extracted automatically
Content | Extracted text (editable; used for retrieval)
Tags | Comma-separated keywords (auto-populated from PDF metadata)
Agents | Associate this document with one or more agents
Is Active | Only active entries are searchable by agents
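The table notes that Title and Tags are pre-filled from PDF metadata. The sketch below shows one way such a mapping could work, assuming the metadata arrives as the associative array returned by smalot/pdfparser's `Document::getDetails()` (keys such as `Title` and `Keywords`). The function `mapPdfMetadata()`, the file-name fallback and the comma/semicolon splitting are illustrative assumptions, not the module's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derive admin-form defaults from PDF metadata.
 *
 * @param array<string, mixed> $details  e.g. the result of $pdf->getDetails()
 * @param string               $fileName original upload name, used as a title fallback
 * @return array{title: string, tags: string}
 */
function mapPdfMetadata(array $details, string $fileName): array
{
    // Prefer the embedded Title; otherwise fall back to the file name without extension.
    $title = trim((string) ($details['Title'] ?? ''));
    if ($title === '') {
        $title = pathinfo($fileName, PATHINFO_FILENAME);
    }

    // Keywords may be a single string or a list; normalise to one string first.
    $keywords = $details['Keywords'] ?? '';
    if (is_array($keywords)) {
        $keywords = implode(',', $keywords);
    }

    // Split on commas or semicolons, trim, then drop empty entries and duplicates.
    $parts = preg_split('/[,;]/', (string) $keywords) ?: [];
    $tags = array_values(array_unique(array_filter(
        array_map('trim', $parts),
        fn ($t) => $t !== ''
    )));

    return ['title' => $title, 'tags' => implode(', ', $tags)];
}

// In-memory example; with smalot/pdfparser the array would come from getDetails().
$form = mapPdfMetadata(
    ['Title' => 'Returns Policy', 'Keywords' => 'returns; refunds, shipping'],
    'policy.pdf'
);
echo $form['title'], ' | ', $form['tags'], PHP_EOL;
```

With smalot/pdfparser installed, `$details` would be obtained via `(new \Smalot\PdfParser\Parser())->parseFile($path)->getDetails()`.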

How Retrieval Works

When an agent that has knowledge base documents attached receives a question:

  1. The question is matched against document excerpts using keyword or semantic similarity
  2. Relevant excerpts are prepended to the agent's system prompt as context
  3. The agent responds with awareness of those excerpts

The full document text is never sent to the LLM; only the most relevant excerpts are included, which keeps token usage low.
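The README does not specify how excerpts are scored. As one illustration of the keyword option, the sketch below ranks chunks by how many distinct query terms they share with the question and prepends the best matches to the system prompt. The function names, the three-character token filter, the default of three excerpts and the prompt wording are all assumptions.

```php
<?php
declare(strict_types=1);

/** Lower-case a string and return its distinct word tokens of 3+ characters. */
function tokenizeTerms(string $text): array
{
    preg_match_all('/[a-z0-9]{3,}/', strtolower($text), $m);
    return array_values(array_unique($m[0]));
}

/**
 * Hypothetical keyword retrieval: score = number of distinct query terms in the chunk.
 *
 * @param string[] $chunks
 * @return string[] up to $topK excerpts, best first; chunks with no overlap are skipped
 */
function retrieveExcerpts(string $question, array $chunks, int $topK = 3): array
{
    $queryTerms = tokenizeTerms($question);
    $scored = [];
    foreach ($chunks as $i => $chunk) {
        $score = count(array_intersect($queryTerms, tokenizeTerms($chunk)));
        if ($score > 0) {
            $scored[] = ['score' => $score, 'index' => $i, 'text' => $chunk];
        }
    }
    // Highest score first; ties keep document order.
    usort($scored, fn ($a, $b) => [$b['score'], $a['index']] <=> [$a['score'], $b['index']]);
    return array_column(array_slice($scored, 0, $topK), 'text');
}

/** Prepend the retrieved excerpts to the agent's own system prompt. */
function buildSystemPrompt(string $basePrompt, array $excerpts): string
{
    if ($excerpts === []) {
        return $basePrompt;
    }
    $context = "Use the following knowledge base excerpts when relevant:\n";
    foreach ($excerpts as $n => $excerpt) {
        $context .= sprintf("[%d] %s\n", $n + 1, $excerpt);
    }
    return $context . "\n" . $basePrompt;
}

$chunks = [
    'Orders ship within 2 business days.',
    'Refunds are issued within 14 days of receiving the returned item.',
    'Gift cards cannot be refunded.',
];
$excerpts = retrieveExcerpts('How many days until refunds are issued?', $chunks, 2);
echo buildSystemPrompt('You are a helpful store assistant.', $excerpts);
```

Plain term overlap is deliberately crude: here "refunded" does not match "refunds", which is why keyword matchers often add stemming or stop-word filtering.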

Extensibility

Supporting Additional File Formats

The text extraction pipeline uses a registry (pool) pattern: ExtractorPool maps each MIME type to an extractor class. To support a new format, register a custom extractor for its MIME type:

<!-- etc/di.xml -->
<type name="Gtstudio\AiKnowledgeBase\Model\Extractor\ExtractorPool">
    <arguments>
        <argument name="extractors" xsi:type="array">
            <item name="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                  xsi:type="object">Vendor\Module\Model\Extractor\DocxExtractor</item>
        </argument>
    </arguments>
</type>
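Conceptually, the pool is a map from MIME type to extractor object, populated from the `extractors` argument in di.xml. The sketch below reproduces that lookup in plain PHP so it runs outside Magento; `SimpleExtractorPool`, `EchoPathExtractor` and the exception on unknown types are hypothetical, and the module's `ExtractorPool` may behave differently when no extractor matches.

```php
<?php
declare(strict_types=1);

// Stand-in for Gtstudio\AiKnowledgeBase\Api\ExtractorInterface so the sketch runs on its own.
interface ExtractorInterface
{
    public function extract(string $filePath): string;
}

/** Hypothetical pool: MIME type => extractor, mirroring the "extractors" di.xml argument. */
final class SimpleExtractorPool
{
    /** @param array<string, ExtractorInterface> $extractors */
    public function __construct(private array $extractors = [])
    {
    }

    public function get(string $mimeType): ExtractorInterface
    {
        if (!isset($this->extractors[$mimeType])) {
            throw new InvalidArgumentException("No extractor registered for MIME type '$mimeType'.");
        }
        return $this->extractors[$mimeType];
    }
}

// Trivial extractor used only to demonstrate the lookup.
final class EchoPathExtractor implements ExtractorInterface
{
    public function extract(string $filePath): string
    {
        return "text of $filePath";
    }
}

$pool = new SimpleExtractorPool(['text/plain' => new EchoPathExtractor()]);
echo $pool->get('text/plain')->extract('/tmp/faq.txt'), PHP_EOL;
```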

Implement Gtstudio\AiKnowledgeBase\Api\ExtractorInterface:

namespace Gtstudio\AiKnowledgeBase\Api;

interface ExtractorInterface
{
    /**
     * Extract plain text from the given file path.
     */
    public function extract(string $filePath): string;
}
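For example, an extractor for plain-text (TXT) uploads might look like the following. The interface is re-declared locally so the sketch runs without Magento; `PlainTextExtractor` and its clean-up steps (stripping a UTF-8 byte-order mark, unifying line endings, collapsing spaces and tabs) are assumptions rather than the module's built-in TXT handling.

```php
<?php
declare(strict_types=1);

// Local stand-in for Gtstudio\AiKnowledgeBase\Api\ExtractorInterface.
interface ExtractorInterface
{
    public function extract(string $filePath): string;
}

/** Hypothetical extractor for text/plain uploads. */
final class PlainTextExtractor implements ExtractorInterface
{
    public function extract(string $filePath): string
    {
        $raw = @file_get_contents($filePath);
        if ($raw === false) {
            throw new RuntimeException("Cannot read file: $filePath");
        }
        // Drop a leading UTF-8 byte-order mark.
        $text = preg_replace('/^\xEF\xBB\xBF/', '', $raw);
        // Normalise Windows and old Mac line endings to "\n".
        $text = str_replace(["\r\n", "\r"], "\n", $text);
        // Collapse runs of spaces and tabs into a single space.
        $text = preg_replace('/[ \t]+/', ' ', $text);
        return trim($text);
    }
}

$path = tempnam(sys_get_temp_dir(), 'kb');
file_put_contents($path, "\xEF\xBB\xBFShipping  policy\r\nOrders ship\tdaily.\r\n");
$text = (new PlainTextExtractor())->extract($path);
unlink($path);
echo $text, PHP_EOL;
```

In a real module the class would live under your own namespace (for example `Vendor\Module\Model\Extractor`, as in the di.xml snippet) and implement the module's interface instead of the local stand-in.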

Custom Retrieval Strategy

Override the retrieval service to use a vector database, OpenSearch k-NN, or any other similarity search:

<preference for="Gtstudio\AiKnowledgeBase\Api\RetrievalServiceInterface"
            type="Vendor\Module\Model\VectorRetrievalService"/>
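The README does not document the methods of `RetrievalServiceInterface`, so the sketch below does not implement it. It only illustrates the core of a vector strategy: ranking stored chunks by cosine similarity to a query embedding. `InMemoryVectorRetrieval` is hypothetical, and the embeddings are assumed to be precomputed by the same embedding model; a vector database or OpenSearch k-NN would replace the linear scan shown here.

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory vector retrieval (not the module's RetrievalServiceInterface). */
final class InMemoryVectorRetrieval
{
    /** @param list<array{text: string, vector: float[]}> $chunks precomputed chunk embeddings */
    public function __construct(private array $chunks)
    {
    }

    /** @return string[] up to $topK chunk texts, highest cosine similarity first */
    public function retrieve(array $queryVector, int $topK = 3): array
    {
        $scored = [];
        foreach ($this->chunks as $chunk) {
            $scored[] = [self::cosine($queryVector, $chunk['vector']), $chunk['text']];
        }
        // Sort by similarity, highest first (usort is stable in PHP 8).
        usort($scored, fn ($x, $y) => $y[0] <=> $x[0]);
        return array_column(array_slice($scored, 0, $topK), 1);
    }

    /** Cosine similarity of two equal-length vectors; 0.0 if either has zero length. */
    private static function cosine(array $a, array $b): float
    {
        if (count($a) !== count($b)) {
            throw new InvalidArgumentException('Vectors must have the same dimension.');
        }
        $dot = $normA = $normB = 0.0;
        foreach ($a as $i => $x) {
            $dot += $x * $b[$i];
            $normA += $x * $x;
            $normB += $b[$i] * $b[$i];
        }
        return ($normA > 0.0 && $normB > 0.0) ? $dot / (sqrt($normA) * sqrt($normB)) : 0.0;
    }
}

$retrieval = new InMemoryVectorRetrieval([
    ['text' => 'Shipping times', 'vector' => [1.0, 0.0, 0.0]],
    ['text' => 'Refund policy', 'vector' => [0.0, 1.0, 0.0]],
    ['text' => 'Refunds for gift cards', 'vector' => [0.0, 0.8, 0.6]],
]);
$top = $retrieval->retrieve([0.0, 1.0, 0.1], 2);
print_r($top);
```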

Chunking Strategy

Document chunking (splitting documents into excerpt-sized pieces) can be customised:

<type name="Gtstudio\AiKnowledgeBase\Model\Chunker\TextChunker">
    <arguments>
        <!-- Maximum characters per chunk -->
        <argument name="chunkSize" xsi:type="number">1500</argument>
        <!-- Overlap between consecutive chunks -->
        <argument name="overlap" xsi:type="number">200</argument>
    </arguments>
</type>
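Assuming `TextChunker` uses a plain fixed-size sliding window, the settings above produce chunks of at most 1500 characters whose start positions are 1500 − 200 = 1300 characters apart, so each full chunk's last 200 characters reappear at the start of the next one. The `chunkText()` function below is a hypothetical sketch of that scheme (it counts UTF-8 characters, not bytes); the module's chunker may also respect sentence or paragraph boundaries.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical fixed-size chunker: windows of $chunkSize characters,
 * each starting ($chunkSize - $overlap) characters after the previous one.
 *
 * @return string[]
 */
function chunkText(string $text, int $chunkSize = 1500, int $overlap = 200): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require chunkSize > 0 and 0 <= overlap < chunkSize.');
    }
    // Split into UTF-8 characters so multi-byte text is not cut mid-character.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Text must be valid UTF-8.');
    }
    $length = count($chars);
    $step = $chunkSize - $overlap;
    $chunks = [];
    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = implode('', array_slice($chars, $start, $chunkSize));
        // Stop once a window reaches the end, so the tail is not emitted twice.
        if ($start + $chunkSize >= $length) {
            break;
        }
    }
    return $chunks;
}

// Small sizes for readability: 20-character chunks overlapping by 5.
$chunks = chunkText('The quick brown fox jumps over the lazy dog', 20, 5);
print_r($chunks);
```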

Database Tables

Table | Purpose
gtstudio_ai_knowledge_base | Document metadata (name, description, file path, agent association)
gtstudio_ai_knowledge_base_chunk | Extracted text chunks ready for retrieval

ACL Resources

Resource | Controls
Gtstudio_AiKnowledgeBase::management | Access to the Knowledge Base admin section