
Convex Document Search Component

A component for semantic search over documents, often to provide context to LLMs, e.g. for Retrieval-Augmented Generation (RAG).

✨ Key Features

  • Document Upsert: Add or replace documents with automatic text chunking and embedding.
  • Semantic Search: Vector-based search using configurable embedding models.
  • Namespaces: Organize documents into namespaces for per-user search.
  • Custom Filtering: Filter documents with custom indexed fields.
  • Importance Weighting: Weight documents by providing a 0 to 1 "importance" (see the sketch after this list).
  • Chunk Context: Get surrounding chunks for better context.
  • Graceful Migrations: Migrate documents or whole namespaces to new content, models, etc. without disruption.
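
Most of these features appear in the usage examples below; importance weighting is the one that doesn't, so here is a minimal sketch. It assumes upsertDocument accepts an importance field in the 0 to 1 range described above (the exact field name may differ; documentSearch, ctx, and chunks come from the examples further down):

// Hypothetical sketch: bias ranking toward more important documents.
const { documentId } = await documentSearch.upsertDocument(ctx, {
  namespace: "global",
  key: "faq.md",
  chunks,
  importance: 0.9, // assumed field name; 0 = least important, 1 = most important
});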

Found a bug? Feature request? File it here.

Prerequisite: Convex

You'll need an existing Convex project to use the component. Convex is a hosted backend platform, including a database, serverless functions, and a ton more you can learn about here.

Run npm create convex or follow any of the quickstarts to set one up.

Installation

Install the component package:

npm install @convex-dev/document-search

Create a convex.config.ts file in your app's convex/ folder and install the component by calling use:

// convex/convex.config.ts
import { defineApp } from "convex/server";
import documentSearch from "@convex-dev/document-search/convex.config";

const app = defineApp();
app.use(documentSearch);

export default app;

Basic Setup

// convex/documents.ts
import { v } from "convex/values";
import { action, mutation } from "./_generated/server";
import { components, internal } from "./_generated/api";
import { DocumentSearch } from "@convex-dev/document-search";
// Any AI SDK model that supports embeddings will work.
import { openai } from "@ai-sdk/openai";
// action, mutation, v, and internal are used by the examples below.

const documentSearch = new DocumentSearch(components.documentSearch, {
  filterNames: ["category", "documentType", "categoryAndType"],
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  embeddingDimension: 1536,
});

Usage Examples

Document Upload and Chunking

Upload documents with automatic text chunking and embedding:

export const uploadDocument = action({
  args: {
    url: v.string(),
    category: v.string(),
  },
  handler: async (ctx, args) => {
    const { url, category } = args;
    const response = await fetch(url);
    const content = await response.text();
    const chunks = await textSplitter.splitText(content);
    const documentType = response.headers.get("content-type");

    const { documentId } = await documentSearch.upsertDocument(ctx, {
      namespace: "global", // namespace can be any string
      key: url,
      chunks,
      source: { kind: "url", url },
      filterValues: [
        { name: "category", value: category },
        { name: "documentType", value: documentType },
        // To get an AND filter, use a filter with a more complex value.
        { name: "categoryAndType", value: { category, documentType } },
      ],
    });

    return { documentId };
  },
});

Note: The textSplitter here could come from LangChain, Mastra, or your own code. See below for more details.
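
For instance, a sketch of a textSplitter built on LangChain's RecursiveCharacterTextSplitter (the chunk sizes are illustrative, not a recommendation):

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Splits text recursively on paragraph, sentence, and word boundaries.
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // max characters per chunk (illustrative)
  chunkOverlap: 200, // characters shared between adjacent chunks (illustrative)
});

// textSplitter.splitText(content) resolves to a string[], matching `chunks` above.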

File Upload with Storage

Upload files directly via a Convex action, httpAction, or upload URL. See the docs for more details.

export const uploadFile = action({
  args: {
    filename: v.string(),
    mimeType: v.string(),
    bytes: v.bytes(),
    category: v.string(),
  },
  handler: async (ctx, args) => {
    const userId = await getUserId(ctx);
    if (!userId) throw new Error("Unauthorized");

    const { filename, mimeType, bytes, category } = args;
    // Store file in Convex storage
    const storageId = await ctx.storage.store(
      new Blob([bytes], { type: mimeType })
    );

    // Extract and chunk text content
    const textContent = new TextDecoder().decode(bytes);
    const chunks = await textSplitter.splitText(textContent);

    const { documentId } = await documentSearch.upsertDocument(ctx, {
      namespace: userId, // per-user namespace
      key: filename,
      title: filename,
      chunks,
      source: { kind: "_storage", storageId },
      filterValues: [
        { name: "category", value: category },
        { name: "documentType", value: mimeType },
        { name: "categoryAndType", value: { category, documentType: mimeType } },
      ],
    });

    return { documentId, url: await ctx.storage.getUrl(storageId) };
  },
});

Semantic Search

Search across documents with vector similarity. The search call returns:

  • text is the plain text content of the results concatenated together.
  • results is an array of matching chunks with scores and more metadata.
  • sources is an array of the documents that matched the query. Each result has a documentId referencing one of these source documents.

export const searchDocuments = action({
  args: {
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const { results, text, sources } = await documentSearch.search(ctx, {
      namespace: "global",
      query: args.query,
      limit: 10,
    });

    return { results, text, sources };
  },
});
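
Because text is already the concatenated plain-text content of the matching chunks, it can be dropped straight into an LLM prompt for RAG. A minimal sketch using the AI SDK's generateText (the model choice, prompt wording, and the askWithContext name are illustrative):

import { generateText } from "ai";

export const askWithContext = action({
  args: { prompt: v.string() },
  handler: async (ctx, args) => {
    // Retrieve relevant chunks to ground the answer.
    const { text } = await documentSearch.search(ctx, {
      namespace: "global",
      query: args.prompt,
      limit: 10,
    });

    // Pass the retrieved context to the model along with the question.
    const { text: answer } = await generateText({
      model: openai.chat("gpt-4o-mini"),
      prompt: `Answer using only the context below.\n\nContext:\n${text}\n\nQuestion: ${args.prompt}`,
    });

    return answer;
  },
});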

Filtered Search

Search with metadata filters:

export const searchByCategory = action({
  args: {
    query: v.string(),
    category: v.string(),
  },
  handler: async (ctx, args) => {
    const userId = await getUserId(ctx);
    if (!userId) throw new Error("Unauthorized");

    const results = await documentSearch.search(ctx, {
      namespace: userId,
      query: args.query,
      filters: [{ name: "category", value: args.category }],
      limit: 10,
    });

    return results;
  },
});

Search with Context

Get surrounding chunks for better context:

export const searchWithContext = action({
  args: {
    query: v.string(),
    userId: v.string(),
  },
  handler: async (ctx, args) => {
    const results = await documentSearch.search(ctx, {
      namespace: args.userId,
      query: args.query,
      chunkContext: { before: 2, after: 1 }, // Include 2 chunks before, 1 after
      limit: 5,
    });

    return results;
  },
});

Document Management

Delete a document:

export const deleteDocument = mutation({
  args: { documentId: vDocumentId },
  handler: async (ctx, args) => {
    await documentSearch.deleteDocument(ctx, {
      documentId: args.documentId,
    });
  },
});

Asynchronous Document Processing

For large documents, use async processing:

export const chunkerAction = documentSearch.defineChunkerAction(
  async (ctx, args) => {
    // Custom chunking logic for large documents
    // This can be an async iterator if you can't fit it all in memory at once.
    const chunks = await processLargeDocument(args.source);
    return { chunks };
  }
);
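
If even the chunk array is too large to hold in memory, the comment above suggests returning the chunks lazily. A rough sketch of that idea, assuming an async iterable is accepted where chunks is expected (the exact accepted shape may differ, and streamSectionsFrom is a hypothetical helper that yields one section of the source at a time):

export const streamingChunkerAction = documentSearch.defineChunkerAction(
  async (ctx, args) => {
    // Yield chunks one at a time instead of building a large array in memory.
    async function* generateChunks() {
      for await (const section of streamSectionsFrom(args.source)) {
        yield section;
      }
    }
    return { chunks: generateChunks() };
  }
);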

export const uploadLargeDocument = action({
  args: {
    filename: v.string(),
    url: v.string(),
  },
  handler: async (ctx, args) => {
    const userId = await getUserId(ctx);
    if (!userId) throw new Error("Unauthorized");

    const { documentId } = await documentSearch.upsertDocumentAsync(ctx, {
      namespace: userId,
      key: args.filename,
      source: { kind: "url", url: args.url },
      chunkerAction: internal.documents.chunkerAction,
    });

    return { documentId };
  },
});

See more example usage in example.ts.

Run the example with npm i && npm run example.
