Document conversion
Services for converting documents between formats — typically HTML, Office, or Markdown into PDF — exposed as HTTP APIs for easy integration.
Self-hosted
- Gotenberg — containerized HTTP API for document conversion; turns URLs, HTML, Markdown, and 100+ Office formats into PDF using headless Chromium and LibreOffice, plus PDF post-processing (merge, split, encrypt) via QPDF, pdfcpu, and ExifTool; supports S3/MinIO/GCS streaming and webhooks
- WeasyPrint — Python library and CLI that renders HTML + CSS to PDF; strong CSS Paged Media support, no headless browser required
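
Because Gotenberg is driven entirely over HTTP, a conversion is a single multipart upload. The sketch below uses only the Python standard library and assumes a Gotenberg container is reachable at `http://localhost:3000` (its default port); the helper names `build_html_to_pdf_request` and `html_to_pdf` are made up for illustration. The Chromium HTML route expects the uploaded file to be named `index.html`.

```python
import uuid
import urllib.request

# Assumption: a Gotenberg container is listening locally on its default port.
GOTENBERG_URL = "http://localhost:3000"


def build_html_to_pdf_request(
    html: str, base_url: str = GOTENBERG_URL
) -> urllib.request.Request:
    """Build a multipart POST for Gotenberg's Chromium HTML route."""
    boundary = uuid.uuid4().hex
    # One form part named "files"; the route requires the filename index.html.
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n"
        "\r\n"
        f"{html}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/forms/chromium/convert/html",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def html_to_pdf(html: str, out_path: str) -> None:
    """Send the request and write the returned PDF bytes to out_path."""
    with urllib.request.urlopen(build_html_to_pdf_request(html)) as response:
        pdf_bytes = response.read()
    with open(out_path, "wb") as fh:
        fh.write(pdf_bytes)
```

With a container running, `html_to_pdf("<h1>Invoice 42</h1>", "invoice.pdf")` would write the rendered PDF to disk. Splitting request construction from sending keeps the payload easy to inspect without a live server.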
Cloud APIs
- CloudConvert — hosted conversion API supporting 200+ formats including audio, video, images, and documents
- DocRaptor — HTML-to-PDF API based on Prince XML; high-fidelity rendering for invoices and reports
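
Hosted APIs follow the same pattern, but usually with an API key and a job description rather than a direct upload. As a rough sketch, a CloudConvert (API v2) job chains an import task, a convert task, and an export task. The task labels, the helper names `build_job` and `build_job_request`, and the example key are placeholders, and the field names reflect my understanding of the v2 jobs endpoint, so check them against the current API reference before relying on them.

```python
import json
import urllib.request

CLOUDCONVERT_JOBS_URL = "https://api.cloudconvert.com/v2/jobs"


def build_job(source_url: str, output_format: str = "pdf") -> dict:
    """Describe a three-step job: import from a URL, convert, export to a URL.

    The task names ("import-file", ...) are arbitrary labels chosen here;
    later tasks refer to earlier ones through their "input" field.
    """
    return {
        "tasks": {
            "import-file": {"operation": "import/url", "url": source_url},
            "convert-file": {
                "operation": "convert",
                "input": "import-file",
                "output_format": output_format,
            },
            "export-file": {"operation": "export/url", "input": "convert-file"},
        }
    }


def build_job_request(job: dict, api_key: str) -> urllib.request.Request:
    """Wrap the job description in an authenticated JSON POST (not sent here)."""
    return urllib.request.Request(
        CLOUDCONVERT_JOBS_URL,
        data=json.dumps(job).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the resulting request to `urllib.request.urlopen` would submit the job; the response describes the created job, which can then be polled until the export task exposes a download URL.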
To Markdown
- MarkItDown — Python CLI and library from Microsoft that converts PDFs, Office documents (Word/Excel/PowerPoint), images (OCR), audio (transcription), HTML, EPUB, CSV/JSON/XML, and YouTube URLs into Markdown optimized for LLM pipelines; preserves headings, lists, tables, and links; optional OpenAI client for image captions; MIT-licensed
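
As a library, MarkItDown needs only a few lines. The sketch below assumes the `markitdown` package is installed; the `markdown/` output directory and the helpers `to_markdown`, `markdown_path`, and `convert_file` are illustrative names, not part of the library.

```python
from pathlib import Path


def to_markdown(source: str) -> str:
    """Convert one file to Markdown text with MarkItDown."""
    # Imported lazily so this module still loads where markitdown is absent.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(source)
    return result.text_content


def markdown_path(source: str, out_dir: str = "markdown") -> Path:
    """Hypothetical naming rule: reports/q3.docx -> markdown/q3.md."""
    return Path(out_dir) / (Path(source).stem + ".md")


def convert_file(source: str, out_dir: str = "markdown") -> Path:
    """Convert source and save the Markdown next to other converted files."""
    target = markdown_path(source, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(to_markdown(source), encoding="utf-8")
    return target
```

Calling `convert_file("reports/q3.docx")` would write `markdown/q3.md`, ready to be chunked or passed to an LLM prompt.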