
Knowledge Sources

A knowledge source is anything you point a bot at so it can answer questions about that content. FluentBot currently supports seven source types grouped into three families, and you can mix them freely on the same bot.

Source Types

Family | Sub-types | Best for
Website Links | URL, URL List, Crawl, Sitemap | Help centers, docs sites, public marketing pages
Files & Text | File upload (PDF / TXT / Markdown), pasted Text | Internal PDFs, brochures, ad-hoc snippets
WPManageNinja | Fluent Support | Existing Fluent Support customers - answer from support tickets

For the full add-source walkthrough, see Train your bot.

[Screenshot: Add Sources dialog showing Website Links, Files and Text, and WPManageNinja source families]

Website Links

Website Links sources fetch public web content and turn each page into one or more documents. They support scheduled refresh: Never, Daily, Weekly, or Monthly.

Use the bot’s Blocked URLs list to prevent specific pages or paths from being indexed.

URL

Use URL when you only need one public page, such as a landing page, policy page, or single help article.

Form fields:

  • URL - the absolute page URL.
  • Source title - optional display name. Defaults to the page title or URL.
  • Scheduled refresh - how often FluentBot should re-check the page.

Best fit:

  • One page has the exact answer content.
  • You want tight control over what the bot learns.
  • A broader crawl would include low-value pages.

URL List

Use URL List when you already know the exact pages to index. Upload a .txt or .csv file containing one URL per row.

Form fields:

  • File - a TXT or CSV list of public URLs.
  • Source title - optional display name. Defaults to the uploaded filename.
  • Scheduled refresh - how often FluentBot should re-check listed URLs.

Best fit:

  • Curated docs pages.
  • A subset of a large site.
  • A manually reviewed list from another system.
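Before uploading, it can help to clean the list locally. The sketch below is a hypothetical pre-check, not part of FluentBot: it assumes one URL per row, reads only the first column of a CSV, and drops header rows, relative paths, blank lines, and duplicates.

```python
import csv
import io
from urllib.parse import urlparse


def read_url_list(raw: str) -> list[str]:
    """Return unique absolute http(s) URLs from TXT or CSV text, one URL per row.

    Assumption: for CSV input, only the first column of each row is used.
    """
    seen: set[str] = set()
    urls: list[str] = []
    for row in csv.reader(io.StringIO(raw)):
        if not row:
            continue  # blank line
        candidate = row[0].strip()
        parsed = urlparse(candidate)
        # Keep only absolute public-web URLs; header rows and relative paths fail this test.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```

Because the file is read as CSV, a URL whose query string contains a comma would be cut at the comma; quote such URLs or use a TXT file with one plain URL per line.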

Crawl

Use Crawl when you want FluentBot to start from one URL and follow internal links.

Form fields:

  • Start URL - where the crawler begins.
  • Source title - optional display name. Defaults to the URL.
  • Scheduled refresh - how often FluentBot should re-crawl.

What gets crawled:

  • Internal links reachable from the start URL.
  • Pages in the same area of the site, subject to crawler limits and blocked URLs.
  • HTML pages with readable public content.

What is skipped:

  • External domains.
  • Blocked URLs.
  • Login-only pages.
  • Non-HTML assets such as images, videos, and downloads.
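The include and skip rules above can be approximated at the URL level. This is a minimal sketch under stated assumptions, not FluentBot's crawler: the extension list is hypothetical, blocked URLs are treated as plain prefixes, and login-only pages cannot be detected from a URL alone, so they are not handled here.

```python
from urllib.parse import urljoin, urlparse

# Assumption: extensions treated as non-HTML assets (images, video, downloads).
NON_HTML_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp", ".mp4", ".mov", ".zip", ".pdf")


def should_follow(start_url: str, page_url: str, href: str, blocked_prefixes: list[str]) -> bool:
    """Decide whether a link found on page_url stays inside the crawl."""
    url = urljoin(page_url, href)  # resolve relative links against the page they appear on
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # mailto:, tel:, javascript: and similar
    if parsed.netloc.lower() != urlparse(start_url).netloc.lower():
        return False  # external domain
    if parsed.path.lower().endswith(NON_HTML_EXTENSIONS):
        return False  # image, video, or download
    if any(url.startswith(prefix) for prefix in blocked_prefixes):
        return False  # simplified Blocked URLs check: plain prefix match
    return True
```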

Prefer Sitemap over Crawl when a clean sitemap exists. A Sitemap source is faster and avoids accidentally discovering unrelated sections.

[Screenshot: Crawl Website source form with Start URL, Source title, Scheduled refresh, and Add Source button]

Sitemap

Use Sitemap when a site publishes a sitemap.xml that lists the pages you want indexed.

Form fields:

  • Sitemap URL - the absolute URL of the sitemap file.
  • Source title - display name in the sources table.
  • Scheduled refresh - how often FluentBot should re-fetch the sitemap.

Most sites publish sitemaps at one of these locations:

  • https://example.com/sitemap.xml
  • https://example.com/sitemap_index.xml
  • Declared in https://example.com/robots.txt on a line starting with Sitemap:
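To check those locations yourself, a script can read robots.txt and fall back to the common paths. This sketch works on robots.txt text you have already downloaded; sitemap_candidates is an illustrative name, not something FluentBot exposes.

```python
def sitemap_candidates(site: str, robots_txt: str) -> list[str]:
    """List likely sitemap URLs: those declared in robots.txt first, then common defaults."""
    base = site.rstrip("/")
    declared = []
    for line in robots_txt.splitlines():
        # partition splits on the first colon, so the URL's own "https:" stays intact.
        key, sep, value = line.partition(":")
        # robots.txt directive names are matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            declared.append(value.strip())
    defaults = [f"{base}/sitemap.xml", f"{base}/sitemap_index.xml"]
    # Keep the original order and drop duplicates.
    return list(dict.fromkeys(declared + defaults))
```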

If a sitemap index contains multiple sub-sitemaps, point FluentBot at the specific sitemap you want. This avoids indexing unrelated pages.
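To see whether a sitemap URL is an index or a plain list of pages, you can parse it with Python's standard XML library. This sketch assumes the standard sitemaps.org namespace; read_sitemap is a hypothetical helper, not part of FluentBot.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def read_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", sub-sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == f"{SITEMAP_NS}sitemapindex" else "urlset"
    # Both <sitemap> and <url> entries store their address in a <loc> child.
    locs = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    return kind, locs
```

If the result is an index, pick the sub-sitemap for the section you want (for example, a docs sitemap rather than a blog sitemap) and add that URL as the Sitemap source.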

Sitemap quality matters:

  • Stale URLs can fail during indexing.
  • Missing URLs will not be indexed.
  • The wrong sub-sitemap can train the bot on unrelated content.
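Some of these problems can be caught offline before you add the source. The sketch below, a hypothetical helper, flags URLs outside the section you expect and duplicate entries. Detecting stale URLs would require fetching each page, which it does not attempt.

```python
from collections import Counter


def audit_sitemap_urls(urls: list[str], expected_prefix: str) -> dict[str, list[str]]:
    """Flag sitemap entries outside the intended section, plus duplicates."""
    outside = [u for u in urls if not u.startswith(expected_prefix)]
    # Counter keeps first-seen order, so duplicates are reported in sitemap order.
    duplicates = [u for u, n in Counter(urls).items() if n > 1]
    return {"outside_section": outside, "duplicates": duplicates}
```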

Files & Text

Files and Text sources are for content you already have. They do not support scheduled refresh because FluentBot cannot know when your local file or pasted text changes.

[Screenshot: Add Sources dialog with Files and Text selected, showing File and Text source cards]

File

Use File for PDFs, TXT files, and Markdown files.

Supported formats:

  • PDF - text-based PDF documents.
  • TXT - plain text.
  • Markdown - .md files, with structure preserved as text.

Form fields:

  • File picker - drag-and-drop or choose a file.
  • Source title - optional display name. Defaults to the filename.

Files are limited to 20 MB at upload time.
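A quick local check can catch unsupported types and oversized files before upload. This sketch is hypothetical: the exact byte value FluentBot uses for "20 MB" is not stated, so it uses the conservative decimal reading, and it cannot tell whether a PDF contains extractable text.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}
# Assumption: conservative decimal reading of "20 MB"; the exact limit is not documented.
MAX_UPLOAD_BYTES = 20_000_000


def preflight_file(path: Path) -> list[str]:
    """Return reasons a file might be rejected; an empty list means it looks OK."""
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported type {path.suffix or '(none)'}")
    size = path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"{size} bytes exceeds the 20 MB limit")
    return problems
```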

To replace a file, delete the old File source and upload the updated file. FluentBot does not edit uploaded files in place.

If a file extracts poorly, convert it to Markdown or paste the corrected text as a Text source.

Text

Use Text for short or medium content that is easiest to paste directly, such as policies, FAQs, canned instructions, or internal notes.

Form fields:

  • Title - required display name.
  • Content - pasted source text.

Text is usually the fastest way to add a small amount of reliable information. It is also useful when a web page or PDF extracts poorly.

To update Text content, delete the old source and add a new Text source with the corrected version.

Fluent Support

Use Fluent Support when your team already has support tickets in a WordPress site running Fluent Support.

Form fields:

  • Domain - the WordPress site domain.
  • Username - a WordPress user with the required access.
  • Application Password - a WordPress application password.
  • Ticket status - ticket statuses to include.
  • Product - product scope for imported tickets.
  • Tags - optional tag filter.
  • Date filter - optional date scope.
  • Refresh schedule - how often FluentBot should sync matching tickets.

FluentBot fetches matching tickets from the Fluent Support API and adds the resulting ticket content to the bot’s knowledge base.
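FluentBot makes the Fluent Support API calls itself. If a connection fails, it can help to confirm the username and application password outside FluentBot first. WordPress application passwords are sent with HTTP Basic authentication; this sketch builds, but does not send, such a request against core WordPress's /wp-json/wp/v2/users/me endpoint. It does not call the Fluent Support API, and the domain, username, and password are placeholders.

```python
import base64
import urllib.request


def wordpress_request(domain: str, username: str, app_password: str, path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated WordPress REST API request."""
    # Basic authentication: base64 of "username:application password".
    token = base64.b64encode(f"{username}:{app_password}".encode("utf-8")).decode("ascii")
    # Accept either a bare domain or one pasted with a scheme.
    host = domain.removeprefix("https://").removeprefix("http://").strip("/")
    return urllib.request.Request(f"https://{host}{path}", headers={"Authorization": f"Basic {token}"})


# Placeholder credentials; a successful response from this core endpoint suggests they work.
req = wordpress_request("support.example.com", "bot-reader", "abcd efgh ijkl mnop qrst uvwx", "/wp-json/wp/v2/users/me")
```

To actually test the credentials, pass req to urllib.request.urlopen from a machine that can reach your site.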

[Screenshot: Add Sources dialog with WPManageNinja selected and a Fluent Support source card]

How sources become usable

Every source follows the same high-level path: add the source, wait for it to finish processing, then test answers in Playground.

Source vs. document

  • Source - what you add (a URL, a sitemap, a file, a Text paste). Has a status (Indexed / Failed / Partially Indexed / Scraping / Indexing / Queued / Deleting).
  • Document - one indexed unit produced by the source. URL-based sources usually create one document per page. Text creates one document per paste. File sources can create one or more documents because large files are split into chunks.
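FluentBot does not document its chunk size or splitting rules. As a rough mental model only, a splitter might pack paragraphs into fixed-size chunks, so one large file yields several documents. The 2,000-character limit and paragraph-based splitting below are assumptions chosen for illustration.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on blank lines where possible."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any single paragraph longer than the limit.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks
```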

The bot dashboard tracks both source count and indexed document count.

Choosing a source type

Goal | Use
Index every page on a site | Sitemap if there is one, otherwise Crawl
Index one public page | URL
Index a curated list of public pages | URL List
Add a PDF, TXT, or Markdown file | File
Add short internal text | Text
Use existing Fluent Support tickets | Fluent Support

You can mix any combination on one bot.

Status flow

Sources move through status states on the Sources page.

Common states:

  • Queued - waiting to start.
  • Scraping - fetching website content.
  • Indexing - preparing content for answers.
  • Indexed - ready for the bot to use.
  • Partially Indexed - some content worked and some failed.
  • Failed - no usable content was indexed.
  • Deleting - cleanup is running.
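If you script around source processing, the states above can be modeled as an enum, with Indexed, Partially Indexed, and Failed treated as the end of a processing run. This is a hypothetical sketch: get_status stands in for however you read a source's status, since no FluentBot API for this is described here.

```python
import time
from enum import Enum
from typing import Callable


class SourceStatus(Enum):
    QUEUED = "Queued"
    SCRAPING = "Scraping"
    INDEXING = "Indexing"
    INDEXED = "Indexed"
    PARTIALLY_INDEXED = "Partially Indexed"
    FAILED = "Failed"
    DELETING = "Deleting"


# States where the current processing run has ended.
FINISHED = {SourceStatus.INDEXED, SourceStatus.PARTIALLY_INDEXED, SourceStatus.FAILED}


def wait_until_finished(get_status: Callable[[], SourceStatus], interval_s: float = 10.0, timeout_s: float = 600.0) -> SourceStatus:
    """Poll a status getter until the source finishes or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in FINISHED:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"source still {status.value} after {timeout_s}s")
        time.sleep(interval_s)
```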

Open a source row to see details, failures, and retry options.

[Screenshot: Knowledge Base Sources table showing Text and Crawl sources, document counts, indexed status, failed status, filters, sorting, and Add Source button]

Plan limits

The PAGES counter on the Billing page caps the number of indexed documents across the team. A Text source usually counts as one document and each website page usually counts as one, while a large file can count as several because it is split into chunks. When you hit the limit, new indexing is blocked until you delete sources or upgrade.
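Before adding a large batch, a rough estimate can tell you whether it will fit under the cap. This is a back-of-the-envelope sketch: the 2,000-characters-per-chunk figure is an assumption, since FluentBot does not publish how files are split, and long Text sources may also count as more than one document.

```python
import math


def estimate_documents(web_pages: int, text_sources: int, file_sizes_chars: list[int], chars_per_chunk: int = 2000) -> int:
    """Rough estimate: 1 per web page, 1 per Text source, ceil(size / chunk) per file."""
    file_docs = sum(max(1, math.ceil(size / chars_per_chunk)) for size in file_sizes_chars)
    return web_pages + text_sources + file_docs


def fits_in_plan(used: int, cap: int, new_docs: int) -> bool:
    """True if indexing new_docs more documents stays within the PAGES cap."""
    return used + new_docs <= cap
```

For example, 120 website pages, 3 Text sources, and two files of 5,000 and 900 characters give 120 + 3 + 3 + 1 = 127 estimated documents, which would not fit if 400 of a 500-document cap were already used.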

Refresh

Website Links and Fluent Support sources can have a refresh schedule. Files and Text do not change unless you replace them.

On refresh:

  • Website pages are re-fetched.
  • Changed content is updated.
  • New sitemap or crawl pages can be added.
  • Pages that no longer exist or are no longer listed can be dropped from the index.
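FluentBot does not document how it detects changed content. One common approach, shown here only as a sketch, is to store a hash of each page's text and compare it on the next fetch; plan_refresh and its inputs are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(indexed: dict[str, str], fetched: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored page hashes with freshly fetched page text.

    indexed: URL -> hash of the content indexed last time
    fetched: URL -> page text fetched during this refresh
    """
    added = [url for url in fetched if url not in indexed]
    changed = [url for url in fetched if url in indexed and content_hash(fetched[url]) != indexed[url]]
    removed = [url for url in indexed if url not in fetched]
    return {"added": added, "changed": changed, "removed": removed}
```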

See Retrain & Update for refresh and retry guidance.

Blocked URLs

A per-bot deny-list applies across every Website Links source. Add URLs the crawler should never index, such as careers pages, login walls, or marketing-only sections that confuse answers.

Manage from Knowledge Base > Blocked URLs (see Sources).
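This page does not spell out how Blocked URLs entries are matched (exact URL, prefix, or pattern). To predict which pages an entry would cover, this sketch assumes prefix matching on path boundaries and ignores host case, query strings, fragments, and trailing slashes. Those rules are assumptions; check the Sources page after a crawl to confirm the real behavior.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop query/fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def is_blocked(url: str, blocked: list[str]) -> bool:
    """Treat each blocked entry as the exact page or the start of a path."""
    target = normalize(url)
    for entry in blocked:
        rule = normalize(entry)
        # Match on a path boundary so /careers does not also block /careers-faq.
        if target == rule or target.startswith(rule + "/"):
            return True
    return False
```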

[Screenshot: Blocked URLs tab with an Add URL to block button and an empty-state message for URLs excluded from crawl indexing]

What’s next