Graphify
Graphify is an open-source “skill” aimed at AI coding assistants. It builds a queryable knowledge graph from a repository’s code, documentation, PDFs, and images so assistants can reason about structure and cross-file relationships, not only retrieve text chunks.
What it does
The project combines Tree-sitter static analysis (ASTs, call graphs, docstrings across many languages) with LLM-based semantic extraction from prose and vision models for diagrams. Extracted nodes and edges are merged into a NetworkX graph; Leiden community detection groups related parts without relying on vector embeddings. The pipeline also surfaces high-degree “god” nodes and highlights unexpected cross-file or cross-domain edges.
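To make the “god” node and cross-file edge ideas concrete, here is a minimal sketch. It uses plain Python collections rather than NetworkX to stay dependency-free, and the node names, the node-to-file mapping, and the `top_k` cutoff are all illustrative assumptions, not Graphify's actual schema or thresholds.

```python
from collections import Counter

# Hypothetical extracted graph: node -> source file, plus undirected edges.
NODE_FILE = {
    "Client": "client.py",
    "Request": "models.py",
    "Response": "models.py",
    "send": "client.py",
    "parse_url": "urls.py",
}
EDGES = [
    ("Client", "Request"),
    ("Client", "Response"),
    ("Client", "send"),
    ("send", "parse_url"),
    ("Request", "Response"),
]


def god_nodes(edges, top_k=1):
    """Return the top_k nodes by degree (number of incident edges)."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return [node for node, _ in degree.most_common(top_k)]


def cross_file_edges(edges, node_file):
    """Return edges whose endpoints live in different files."""
    return [(s, d) for s, d in edges if node_file[s] != node_file[d]]
```

In this toy graph, `Client` has the highest degree, and three of the five edges connect nodes defined in different files.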
Outputs are written under graphify-out/, including an interactive graph.html, a machine-readable graph.json, and a GRAPH_REPORT.md audit-style summary, plus a cache/ directory for incremental work.
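As a rough illustration of consuming the machine-readable output, the sketch below counts nodes and edges in a JSON graph file. The layout is an assumption: it presumes a node-link style document with top-level `nodes` and `links` (or `edges`) lists, which may not match what Graphify actually writes. Check a real graph.json before relying on it.

```python
import json
from pathlib import Path


def summarize_graph(path):
    """Count nodes and edges in a node-link style JSON file.

    Assumes top-level "nodes" and "links" (or "edges") lists. This layout
    is a guess, so missing keys fall back to empty lists.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    nodes = data.get("nodes", [])
    edges = data.get("links", data.get("edges", []))
    return {"nodes": len(nodes), "edges": len(edges)}
```

A call such as `summarize_graph("graphify-out/graph.json")` would then return a small dictionary of counts, provided the file follows the assumed layout.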
Install and CLI
- Python: 3.10+ (as stated on the project site).
- PyPI package name: graphifyy; the command-line tool is graphify.
- Typical install: pip install graphifyy && graphify install (per upstream docs).
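Before installing, it can help to confirm the two requirements listed above. The following is a small sketch using only the standard library; the function name and the returned keys are hypothetical, and it checks only that a `graphify` executable is on the PATH, not that it is the right tool or version.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 10), cli_name="graphify"):
    """Report whether the interpreter and CLI requirements appear to be met."""
    return {
        # Tuple comparison: (3, 11, ...) >= (3, 10) is True.
        "python_ok": sys.version_info >= min_version,
        # shutil.which returns None when the command is not on PATH.
        "cli_on_path": shutil.which(cli_name) is not None,
    }
```

Calling `check_prerequisites()` after installation should report `cli_on_path` as true if the install placed the command on the PATH.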
Assistant-oriented slash commands advertised include /graphify, /graphify query, /graphify path, and /graphify explain. Any environment that can run shell commands can invoke graphify.
Models and privacy
Graphify does not bundle an LLM. Semantic extraction uses the API key already configured in your assistant (e.g. Claude or Codex). The project states that only semantic descriptions are sent upstream, not full raw source files.
Pipeline (high level)
Upstream documentation describes stages such as: detect → extract → build graph → cluster (Leiden) → analyze → report → export (HTML, JSON, Obsidian-oriented flows). Supporting pieces include modules for URL ingest, caching, validation, optional watch mode, and an MCP-oriented serve path.
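The stage ordering can be pictured as a simple chain in which each stage receives the previous stage's output. The sketch below is purely illustrative: the stage names follow the list above, but the functions are placeholders that only record that they ran, and none of this reflects Graphify's internal module layout.

```python
# Stage names taken from the upstream description; the bodies are placeholders.
STAGES = ["detect", "extract", "build_graph", "cluster", "analyze", "report", "export"]


def make_stage(name):
    """Build a placeholder stage that only records that it ran."""
    def stage(state):
        return {**state, "completed": state["completed"] + [name]}
    return stage


def run_pipeline(stage_names, state=None):
    """Run stages in order, passing each stage's output to the next."""
    state = state or {"completed": []}
    for name in stage_names:
        state = make_stage(name)(state)
    return state
```

Passing state through as a plain dictionary keeps each stage independent, which is one common way to make individual stages cacheable or skippable. Whether Graphify does this is not stated in the source.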
Reported examples
The site documents illustrative runs (exact numbers are claims from the project, not independently verified here):
- A small httpx-style corpus: on the order of hundreds of nodes and edges, with named “god” nodes such as client and request/response types.
- A larger mixed corpus (code repos, papers, diagrams): roughly 71.5× fewer tokens for queries versus a naive baseline in their scenario, as claimed on the marketing page.
Security posture (as described by the project)
The documentation emphasizes strict handling of URLs (http/https only), size and time limits on downloads, path containment checks, and HTML escaping of labels to reduce SSRF, injection, and XSS risks. The project reports no telemetry; outbound calls are tied to semantic extraction via your configured model API.
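Three of the checks named above (scheme allow-listing, path containment, and label escaping) can be sketched with the Python standard library. This is a generic illustration of those techniques, not Graphify's code: the function names are hypothetical, and the sketch does not cover download size or time limits or any filtering of internal network addresses.

```python
import html
from pathlib import Path
from urllib.parse import urlparse


def is_allowed_url(url):
    """Accept only http/https URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_contained(base_dir, candidate):
    """Return True if candidate resolves to a path inside base_dir."""
    base = Path(base_dir).resolve()
    # Resolving collapses ".." segments, so traversal attempts escape base.
    target = (base / candidate).resolve()
    return target == base or base in target.parents


def safe_label(label):
    """Escape a node label before embedding it in HTML."""
    return html.escape(label, quote=True)
```

For example, `is_allowed_url("file:///etc/passwd")` is rejected because of its scheme, and `is_contained("graphify-out", "../secret.txt")` is rejected because the resolved path falls outside the output directory.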
Sources
- Graphify: product overview, install snippet, pipeline, examples, comparison table, FAQ, and security notes (accessed 2026-04-12)