WordPress RAG · External Knowledge

A RAG chatbot, built for WordPress.

PressBot now brings retrieval-augmented generation into its WordPress chatbot. Visitors ask plain questions, the bot calls your retriever, and answers come back grounded in transcripts, manuals, courses, policies, or whatever other corpus you already own. The plugin stays lean. Your retriever stays in charge. Source URLs can now flow through as clickable markdown links.

External Knowledge (RAG) currently runs on PressBot’s BYOK (bring-your-own-key) tool workflow. In 1.7.0, the WordPress AI Client route is text-only while Core tool support matures.

Bring your retriever · Encrypted bearer auth · Cited passages · UTF-8 safe streaming

Why WordPress needs RAG

Generic chatbots guess. Your actual knowledge lives somewhere else.

Most WordPress chatbots reach for whatever the underlying model happened to memorise, then sound confident about it. Retrieval-augmented generation flips that — the bot looks up your corpus first, then answers. For teams whose best source material lives in transcripts, manuals, support docs, or research archives, RAG closes the gap.

Generic answers, polished delivery

The model paraphrases something close enough. Visitors who already know the topic spot the gap immediately. Trust leaks out before the conversation ends.

The source material is elsewhere

Transcripts in a vault. Manuals in a doc site. Course archives behind a login. WordPress only ever saw a fraction of what visitors actually need answered.

Vague replies become tickets

When the bot hedges, the visitor opens a ticket. Or they leave. Either way, the answer was right there in your corpus — just not reachable from the chat.

Who needs WordPress RAG

For teams whose library is bigger than their website.

If your real knowledge lives outside the WordPress database, RAG is the right answer. PressBot plugs into whichever retriever already owns your corpus — same chat widget your visitors already trust, now grounded in the material that actually answers their questions.

Support teams

Connect manuals, troubleshooting libraries, policy decks, and how-to docs. Visitors get the actual answer instead of a deflection to the ticket form.

e.g. shipping rules · SLAs · product specs

Course creators

Let learners ask plain-language questions against course transcripts, lessons, and reference material — without exposing the full archive publicly.

e.g. cohort lessons · module notes

Expert publishers

Ground answers in research libraries, interview archives, document collections, and curated knowledge bases — the work you spent years assembling.

e.g. interviews · whitepapers · case files

How WordPress RAG works

Bring your retriever. We bring the chat layer.

PressBot does not ship a vector database, an embedding pipeline, or an ingestion queue. That is deliberate — RAG works cleanly when your service owns the corpus and PressBot owns the WordPress conversation. Four steps, no SDK.

Setup

Configure endpoint

Add an HTTPS retrieval URL, optional encrypted bearer token, default scope, result limit, label, and any custom instructions for the model.

settings: POST https://retriever.example/search
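Before saving the endpoint, it can help to send the same kind of request by hand. The sketch below only builds the request, using the Python standard library. It assumes the bearer token travels in a standard `Authorization: Bearer` header and that the body follows the request example in the developer contract on this page; `build_search_request` is a hypothetical helper, not part of PressBot.

```python
import json
import urllib.request
from typing import Optional


def build_search_request(
    url: str,
    query: str,
    limit: int = 6,
    language: str = "en-US",
    collection: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /search request for a retriever."""
    # The settings screen asks for an HTTPS URL, so this test client insists on one.
    if not url.startswith("https://"):
        raise ValueError("use an HTTPS retrieval URL")
    body = {"query": query, "limit": limit, "language": language}
    if collection is not None:
        body["collection"] = collection  # optional scope
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        # Assumption: bearer auth is sent as a standard Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen(req, timeout=10)` and read the JSON body from the response.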

Runtime

Visitor asks anything

The public chatbot decides when to call the visitor-safe search_knowledge_corpus tool — or to keep talking from on-site context.

tool: search_knowledge_corpus({ query, scope })
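The tool call carries only a query and an optional scope, so something has to fill in the rest of the request. One plausible mapping is sketched below: the scope becomes the `collection` field from the example request, the default scope and result limit come from settings, and the visitor's language is passed through. `EndpointSettings` and `tool_args_to_body` are hypothetical names; this illustrates the idea and is not PressBot's code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class EndpointSettings:
    # Hypothetical mirror of two fields from the settings screen.
    default_scope: Optional[str] = None
    result_limit: int = 6


def tool_args_to_body(
    args: Dict[str, Any], settings: EndpointSettings, visitor_language: str = "en-US"
) -> Dict[str, Any]:
    """Map search_knowledge_corpus({ query, scope }) arguments to a POST /search body."""
    query = str(args.get("query", "")).strip()
    if not query:
        raise ValueError("search_knowledge_corpus needs a non-empty query")
    body: Dict[str, Any] = {
        "query": query,
        "limit": settings.result_limit,
        "language": visitor_language,
    }
    # The model may omit scope; fall back to the configured default, if any.
    scope = args.get("scope") or settings.default_scope
    if scope:
        body["collection"] = scope
    return body
```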

Retrieval

Sources return

Your service returns ranked matches with source titles, URLs, timestamps, paths, or whatever metadata you publish in the response schema.

response: { matches: [{ text, score, source }] }
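A defensive parser keeps malformed matches away from the model. This minimal sketch assumes the `{ matches: [{ text, score, source }] }` shape shown in the developer contract. `Match` and `parse_matches` are hypothetical, and skipping bad entries instead of rejecting the whole response is one design choice among several.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Match:
    text: str
    score: float
    source: Dict[str, Any] = field(default_factory=dict)


def parse_matches(raw: str) -> List[Match]:
    """Parse a { matches: [...] } body, skipping entries that lack text or a numeric score."""
    payload = json.loads(raw)
    items = payload.get("matches", []) if isinstance(payload, dict) else []
    matches: List[Match] = []
    for item in items if isinstance(items, list) else []:
        if not isinstance(item, dict) or not isinstance(item.get("text"), str):
            continue  # no usable passage text
        try:
            score = float(item.get("score", 0.0))
        except (TypeError, ValueError):
            continue  # score must be numeric
        source = item.get("source")
        matches.append(Match(item["text"], score, source if isinstance(source, dict) else {}))
    return matches
```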

Grounding

PressBot answers

Snippets are normalised and handed to the model with instructions to cite the matches and avoid guessing when nothing scores high enough.

visitor sees: cited · grounded · honest
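The grounding step can be pictured as: drop weak matches, keep the retriever's order, cap the count, and number what remains so the model can cite it. If nothing survives, the model is told so instead of being left to guess. The 0.5 threshold, the cap of 6, and the prompt wording below are illustrative assumptions, not PressBot defaults.

```python
from typing import Any, Dict, List

NO_MATCH_NOTE = (
    "No passage in the knowledge corpus matched this question. "
    "Say so plainly instead of guessing."
)


def build_grounding_context(
    matches: List[Dict[str, Any]], min_score: float = 0.5, max_matches: int = 6
) -> str:
    """Turn ranked retriever matches into a numbered, citable context block."""
    # Keep the retriever's order (it owns ranking); only filter and cap.
    strong = [m for m in matches if float(m.get("score") or 0.0) >= min_score]
    strong = strong[:max_matches]
    if not strong:
        return NO_MATCH_NOTE
    lines = ["Answer only from these passages and cite them by number:"]
    for i, m in enumerate(strong, start=1):
        title = (m.get("source") or {}).get("title", "Untitled source")
        lines.append(f"[{i}] {title}: {m['text']}")
    return "\n".join(lines)
```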

A real RAG exchange

Watch the retrieval happen.

A visitor asks a plain question. PressBot decides to call your retriever, surfaces the top matches, and answers with markdown links the visitor can click. Nothing is invented — this is what RAG looks like in production.

  • The tool call is visible in the conversation — no hidden retrieval.
  • Matches are ranked, capped, and cited before the model writes a word.
  • When scores are weak, PressBot says so instead of filling in the gap.
  • Streaming preserves accents and non-Latin characters — UTF-8 safe.

RAG safety & limits

Source-grounded does not mean unbounded.

External Knowledge inherits the same posture as the rest of PressBot — cap everything, log what matters, default to the safer fallback when something looks off. RAG that respects its own perimeter.

Encrypted auth

Optional bearer-token auth is stored encrypted in WordPress. Endpoint URLs are restricted to HTTP/HTTPS. No tokens leak into request logs.

bearer · http/https only · encrypted at rest
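The scheme restriction and the log hygiene described above are easy to picture in code. A sketch, assuming headers are logged as a plain dict; `validate_endpoint` and `redact_for_log` are hypothetical helpers. Encryption at rest happens inside WordPress, so it is not shown here.

```python
from typing import Dict
from urllib.parse import urlsplit


def validate_endpoint(url: str) -> str:
    """Accept only http:// or https:// URLs that name a host."""
    cleaned = url.strip()
    parts = urlsplit(cleaned)
    if parts.scheme.lower() not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported endpoint URL: {url!r}")
    return cleaned


def redact_for_log(headers: Dict[str, str]) -> Dict[str, str]:
    """Copy headers for logging with any Authorization value masked."""
    return {
        name: ("[redacted]" if name.lower() == "authorization" else value)
        for name, value in headers.items()
    }
```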

Bounded retrieval

Queries, scopes, result count, response size, match text, source metadata, and URLs are all capped before the model ever sees them.

query cap · size cap · match cap
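Capping is mostly truncation against fixed budgets before anything reaches the prompt. The limits below are made-up placeholders, not PressBot's real numbers. Text is cut by characters rather than bytes, so a multi-byte UTF-8 character is never split into invalid bytes, and numeric metadata such as timestamps passes through untouched.

```python
from typing import Any, Dict, List

# Illustrative budgets only (assumptions); PressBot's real limits may differ.
MAX_QUERY_CHARS = 500
MAX_RESPONSE_BYTES = 256_000
MAX_MATCHES = 8
MAX_TEXT_CHARS = 1_200
MAX_META_CHARS = 300
MAX_URL_CHARS = 2_048


def cap_query(query: str) -> str:
    """Trim whitespace and cut to a character (not byte) budget."""
    return query.strip()[:MAX_QUERY_CHARS]


def check_response_size(raw_body: bytes) -> bytes:
    """Refuse oversized bodies before spending time parsing them."""
    if len(raw_body) > MAX_RESPONSE_BYTES:
        raise ValueError("retriever response exceeds the size cap")
    return raw_body


def cap_source(source: Dict[str, Any]) -> Dict[str, Any]:
    """Clip string metadata; numbers such as timestamps pass through untouched."""
    capped: Dict[str, Any] = {}
    for key, value in source.items():
        if isinstance(value, str):
            value = value[: MAX_URL_CHARS if key == "url" else MAX_META_CHARS]
        capped[key] = value
    return capped


def cap_matches(matches: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first MAX_MATCHES matches and clip each one's text and metadata."""
    return [
        {
            "text": str(m.get("text", ""))[:MAX_TEXT_CHARS],
            "score": m.get("score"),
            "source": cap_source(m.get("source") or {}),
        }
        for m in matches[:MAX_MATCHES]
    ]
```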

UTF-8 safe streaming

Public chat streams answers chunk-by-chunk while preserving accents, ideographs, and other non-English characters that retrievers often return.

accents · CJK · emoji
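Mangled accents in streamed text usually come from decoding each network chunk on its own, because a chunk boundary can fall in the middle of a multi-byte UTF-8 character. Python's `codecs` incremental decoder holds back the incomplete bytes until the rest arrive. This illustrates the general technique, not PressBot's own streaming code; `decode_stream` and `split_every` are illustrative names.

```python
import codecs
from typing import Iterable, Iterator, List


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text from byte chunks without splitting multi-byte UTF-8 characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # incomplete trailing bytes are buffered
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # raises UnicodeDecodeError if bytes are left over
    if tail:
        yield tail


def split_every(data: bytes, size: int) -> List[bytes]:
    """Cut a byte string into fixed-size chunks, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

`"".join(decode_stream(split_every("Café, 東京 🙂".encode("utf-8"), 3)))` returns the original string, whereas calling `.decode("utf-8")` on each 3-byte chunk separately raises `UnicodeDecodeError` as soon as a chunk ends mid-character.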

Developer contract

One POST. One response shape. That’s it.

PressBot does not prescribe how you index, embed, or rank. We send a query, an optional scope, a result limit, and the visitor’s language. You return ranked matches with whatever metadata you want surfaced.

  • Works with any retriever — Pinecone, Weaviate, pgvector, Elastic, your own.
  • JSON-in, JSON-out — no SDK, no extra dependency on the WordPress side.
  • Source metadata is passed through verbatim — titles, URLs, timestamps, paths, scopes.
  • Returns nothing? PressBot tells the visitor instead of hallucinating an answer.
Request (POST /search; "collection" is the optional scope):
{
  "query": "What does our refund policy say about digital downloads?",
  "limit": 6,
  "language": "en-US",
  "collection": "policies"
}

Response:
{
  "matches": [
    {
      "text": "Digital downloads are refundable within 14 days...",
      "score": 0.94,
      "source": {
        "title": "Refund Policy",
        "url":   "https://example.com/legal/refunds",
        "path":  "docs/refund-policy.md",
        "timestamp_start": 125,
        "timestamp_end":   210,
        "collection": "policies"
      }
    }
  ]
}
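To see the contract end to end, here is a toy retriever that speaks it, using only the Python standard library. It is a sketch under stated assumptions: the corpus is a hard-coded list, "ranking" is plain keyword overlap standing in for the embedding search a real service would run, and bearer-token checking is left out. Swap `search` for your Pinecone, Weaviate, pgvector, or Elastic query; the JSON in and out stays the same.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy in-memory corpus (hypothetical); a real service would query its own index.
CORPUS = [
    {"text": "Digital downloads are refundable within 14 days of purchase.",
     "source": {"title": "Refund Policy", "url": "https://example.com/legal/refunds",
                "path": "docs/refund-policy.md", "collection": "policies"}},
    {"text": "Standard shipping takes 3 to 5 business days.",
     "source": {"title": "Shipping Rules", "path": "docs/shipping.md",
                "collection": "policies"}},
]


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def search(payload: dict) -> dict:
    """Rank by keyword overlap (a stand-in for vector search), honouring limit and scope."""
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    query = tokens(str(payload.get("query", "")))
    limit = max(1, min(int(payload.get("limit", 6)), 20))
    scope = payload.get("collection")
    matches = []
    for doc in CORPUS:
        if scope and doc["source"].get("collection") != scope:
            continue
        overlap = len(query & tokens(doc["text"]))
        if query and overlap:
            matches.append({"text": doc["text"],
                            "score": round(overlap / len(query), 2),
                            "source": doc["source"]})
    matches.sort(key=lambda m: m["score"], reverse=True)
    return {"matches": matches[:limit]}


class SearchHandler(BaseHTTPRequestHandler):
    """Handles POST /search only."""

    def do_POST(self):
        if self.path != "/search":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length") or 0)
        try:
            result = search(json.loads(self.rfile.read(length) or b"{}"))
        except (ValueError, TypeError):
            self.send_error(400, "invalid request body")
            return
        body = json.dumps(result, ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the toy retriever on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

Call `serve()` and POST the request example above to `http://127.0.0.1:8080/search` to try it locally. For anything beyond a local test, put it behind HTTPS and add a bearer-token check.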

Your service

You bring the corpus.

  • Ingestion — pull in transcripts, manuals, PDFs, anything else worth grounding answers in.
  • Embeddings — choose your own model, your own chunking, your own re-rank pass.
  • Ranking — return matches in the order you actually want PressBot to present them.
  • Freshness — you decide when to re-embed and how often the corpus refreshes.
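Chunking is entirely your call. As one common starting point, here is a sketch that groups a timestamped transcript into overlapping windows and keeps the start and end times, which map naturally onto the `timestamp_start` and `timestamp_end` fields in the example response. The `(start, end, text)` segment format, the window of 4 segments, and the overlap of 1 are assumptions.

```python
from typing import Any, Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def chunk_transcript(
    segments: List[Segment], window: int = 4, overlap: int = 1
) -> List[Dict[str, Any]]:
    """Group consecutive transcript segments into overlapping chunks."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    chunks: List[Dict[str, Any]] = []
    step = window - overlap
    for i in range(0, len(segments), step):
        group = segments[i:i + window]
        chunks.append({
            "text": " ".join(text for _, _, text in group),
            "timestamp_start": group[0][0],
            "timestamp_end": group[-1][1],
        })
        if i + window >= len(segments):
            break  # this window already reached the last segment
    return chunks
```

Each chunk would then be embedded and indexed by your own pipeline; the overlap keeps a sentence that straddles a window boundary retrievable from either side.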

PressBot

We handle the chat layer.

  • Public WordPress chatbot — widget, theming, accessibility, the whole front end.
  • Tool routing — the model decides when to call your retriever, when to answer directly.
  • Citations & fallbacks — passages are normalised, capped, and presented to the visitor.
  • Streaming & language — UTF-8 safe chunks, language detection, polite refusal when matches are weak.

Common WordPress RAG questions

Before you wire it up.

Short answers to what most teams ask before they connect a retriever to their WordPress chatbot.

What is a WordPress RAG chatbot?

A WordPress RAG chatbot uses retrieval-augmented generation — instead of answering from whatever the language model happened to memorise, the bot first looks up your own corpus (transcripts, manuals, docs, policies, anything) and grounds the reply in those matches.

PressBot is the WordPress chatbot layer. Your retrieval service is the corpus. The two talk over a simple POST /search contract, and the visitor sees citations they can click. That is RAG, applied to the WordPress chat widget you already trust.

Does this work with Pinecone, Weaviate, pgvector, or my custom retriever?

Yes — any of them. PressBot only speaks the request/response shape shown above. Whatever sits behind your endpoint is your decision. Pinecone, Weaviate, Qdrant, Milvus, pgvector, Elastic, Algolia, your own homegrown ranker — all fine.

The contract is HTTPS + JSON. No SDK, no vendor lock-in.

What happens when the retriever returns nothing relevant?

PressBot tells the visitor it could not find a match in your corpus, and offers to keep the conversation going from on-site context. It will not invent a passage to fill the gap.

You can also tune a minimum score threshold in the endpoint settings if you want stricter behaviour.

Are the citations clickable for visitors?

Yes, whenever your response includes a source.url. PressBot renders the title as a link inline with the answer. If only a path or timestamp is provided, the citation still shows — just as text rather than a link.
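That rule fits in a few lines. The sketch below links the title when `source.url` is an HTTP(S) URL and otherwise builds plain text from the title, path, and a `minutes:seconds` timestamp range. That fallback format, and the `render_citation` name, are assumptions for illustration, not PressBot's exact markup.

```python
from typing import Any, Dict


def format_seconds(seconds: float) -> str:
    """125 -> '2:05' (minutes:seconds)."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def render_citation(source: Dict[str, Any]) -> str:
    """Markdown link when source.url is present, plain-text citation otherwise."""
    title = str(source.get("title") or source.get("path") or "Source")
    title = title.replace("[", "(").replace("]", ")")  # keep link syntax intact
    url = str(source.get("url") or "")
    if url.startswith(("https://", "http://")):
        safe_url = url.replace(" ", "%20").replace("(", "%28").replace(")", "%29")
        return f"[{title}]({safe_url})"
    parts = [title]
    path = source.get("path")
    if path and path != title:
        parts.append(str(path))
    if source.get("timestamp_start") is not None:
        span = format_seconds(source["timestamp_start"])
        if source.get("timestamp_end") is not None:
            span += "-" + format_seconds(source["timestamp_end"])
        parts.append(span)
    return " · ".join(parts)
```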

What about non-English content?

External Knowledge passes the visitor’s language preference along to your retriever (BCP-47 codes like es-ES or ja-JP), so you can filter or re-rank accordingly. The chat streaming pipeline is UTF-8 safe end-to-end — accents, ideographs, and emoji all survive.
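On the retriever side, one simple use of that code is to re-rank so that passages sharing the visitor's primary language subtag (`es` for `es-ES`) come first, without discarding the rest. A sketch assuming each indexed passage carries a `lang` tag of your own; that field is not part of the contract shown above, and `prefer_language` is a hypothetical helper.

```python
from typing import Any, Dict, List


def primary_subtag(tag: str) -> str:
    """'es-ES' -> 'es'; also tolerates WordPress-style 'es_ES' locales. Case-insensitive."""
    return tag.replace("_", "-").split("-")[0].lower()


def prefer_language(passages: List[Dict[str, Any]], visitor_lang: str) -> List[Dict[str, Any]]:
    """Put same-language passages first, keeping the original order within each group."""
    want = primary_subtag(visitor_lang)
    same = [p for p in passages if primary_subtag(p.get("lang") or "") == want]
    other = [p for p in passages if primary_subtag(p.get("lang") or "") != want]
    return same + other
```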

Is WordPress RAG part of the free plan?

The RAG bridge (External Knowledge) ships with PressBot Pro. The free chatbot still answers from your WordPress content; Pro adds the retriever connector, the agent surface, and everything else listed on the Pro page.

Does PressBot store the retrieved passages?

Only inside the conversation transcript that the visitor already sees. We do not mirror your corpus, build a shadow index, or send the matches anywhere outside the model call required to ground the answer.



Plug RAG into your WordPress site.

Bring transcripts, manuals, support docs, or research libraries. Keep the WordPress plugin lean. Hand visitors retrieval-grounded answers, not confident guesses.