gpt-researcher Study Notes (1)

张建站

2026/5/30 1:35:04

10 min read

Project: https://github.com/assafelovic/gpt-researcher

It is one of the open-source reference projects listed at https://github.com/datawhalechina/Agent-Learning-Hub. The project was created in May 2023. I asked Claude Code to analyze how it evolved based on the commit history; the result follows.

# GPT-Researcher: Evolution and Key Change Points

Analysed from 2,980 commits spanning May 2023 to May 2026.

---

## Phase 1: Proof of Concept (May–July 2023)

The project began with a single commit on **May 12, 2023**. The initial architecture was minimal:

- A Python script driving a Selenium/Chrome browser to scrape web pages
- DuckDuckGo as the only search backend
- OpenAI (GPT-4) for synthesis
- Three report types: outline_report, resource_report, detailed_report
- A FastAPI server + plain HTML/JS frontend added within the first two weeks

**Key milestone:** WebSocket real-time streaming of agent progress (July 7), which gave the project the live "thinking out loud" feel that became its identity.

---

## Phase 2: Community Bootstrap (July–September 2023)

Community PRs arrived almost immediately after launch. Key additions:

- **Auto-agent selection** (Jul 20): the LLM dynamically picks the right agent role (Finance, Security Analyst, Business Analyst, etc.) instead of the role being hardcoded
- **Docker support** (Jul 14–31): containerized Chrome + app for reproducible deployment
- **LangChain integration** (Aug 18): moved LLM/config to the LangChain abstraction layer
- JS-rendered page scraping support via Selenium (Aug 24)

The project gained ~150 GitHub stars in this phase.

---

## Phase 3: Multi-Agent Architecture (Oct 2023–Sep 2024)

This is where the project's research depth fundamentally changed.

**~Oct 2023:** Introduction of a multi_agents module using **LangGraph**, a hierarchical system where a Master agent orchestrates Writer, Reviewer, Reviser, Editor, and Publisher sub-agents. This enabled long-form, structured research reports.

**~Sep 2024 (PR #875):** The detailed_report type was rebuilt around multi-agent subtopic decomposition: each subtopic gets its own parallel research pass before being merged into a coherent final report.

**~Oct 2024 (PR #941):** Introduction of a **Strategic LLM** concept, a separate, more powerful planning model distinct from the fast execution model, allowing cost/quality tradeoffs.

Other expansions in this phase:

- Azure OpenAI, AWS Bedrock, Ollama support
- Multiple search retrievers: Google, Bing, SearxNG, Arxiv, PubMed
- Vector store integration for hybrid local + web research (langchain_vectorstore report source)
- Configurable embedding providers

---

## Phase 4: Frontend Renaissance (Oct–Dec 2024)

The plain HTML frontend was replaced with a full **Next.js application** (PR #898, Oct 2024):

- TypeScript throughout
- Image carousel in reports (Oct 2024, PR #925/#942)
- Chat-with-research capability (post-report QA)
- Mobile-responsive design
- Source deduplication and favicon display
- Structured logging system with downloadable logs (Dec 2024)

**Language support** was added (Dec 2024, PR #1026): reports can now be generated in any language via config.

---

## Phase 5: Deep Research Agent Ecosystem (Feb–Jun 2025)

**February 22, 2025** is the single biggest inflection point in the project's history.

**Deep Research mode** (PR #1179/#1195) was introduced as a wholly new research paradigm: iterative, recursive search loops guided by a planning LLM that decides when to dig deeper vs. synthesize. This directly competed with OpenAI's Deep Research product.

Simultaneously:

- **Nodriver/Zendriver headless scraper** (Feb 2025): a Chrome-automation library without Selenium, enabling stealthier and faster scraping
- **FireCrawl scraper** (Feb 2025): cloud-based scraping for JS-heavy sites
- **MCP Server** (Mar 29, 2025): GPT-Researcher exposed as a Model Context Protocol server, integrable into Claude, Cursor, etc.
- **React NPM package** (Feb–Mar 2025): embeddable `<GPTResearcher />` component
- **Domain filtering** (Feb 2025): restrict research to specific domains
- **Reasoning model support** (Feb–Mar 2025): o1, o3, Gemini thinking mode, with reasoning_effort config

---

## Phase 6: Provider Proliferation and Observability (Jun–Nov 2025)

The project became a true multi-provider platform:

| Provider added | Date |
|---|---|
| GigaChat | Feb 2025 |
| OpenRouter | Mar 2025 |
| AI/ML API | May 2025 |
| vLLM (local) | May 2025 |
| DashScope (Alibaba) | Jun/Jul 2025 |
| Netmind | Jul 2025 |
| Avian | Feb 2026 |
| Forge | Feb 2026 |
| MiniMax | Mar 2026 |

**LangChain dependency was removed** (Nov 7, 2025, PR #1547 migration/langchain): a major architectural cleanup that reduced startup time and dependency bloat.

**Hallucination evaluation framework** added (Jun 2025) with a judge-based LLM grader.

**PWA support** (Sep 2025): installable as a mobile app.

**New UI redesign** (Apr 2025, Sep 2025): two separate premium UI overhauls, the second introducing a sidebar, preferences modal, and polished report rendering.

**LangSmith observability** (Jan 2026): full tracing of LLM calls.

---

## Phase 7: Modern Ecosystem (2026)

The project shifted focus toward ecosystem integrations and reliability:

- **AG2 partnership** (Mar 2026): AutoGen2/AG2 as a multi-agent pipeline alternative
- **Agent discovery endpoint** (Apr 2026): REST API for external tools to enumerate available agents
- **OpenAlex retriever** (Apr 2026): academic literature search
- **Xquik X/Twitter retriever** (Apr 2026)
- **Smart context compression** (Feb 2026): a fast path skips compression for small document sets
- **URL deduplication** (Feb 2026): prevents re-scraping identical sources
- **Anthropic cost tracking** (May 2026): real usage metadata from API responses
- **Max tokens raised to 200k** (May 2026): supporting modern long-output models
- **Tokentoll CI integration** (May 2026): GitHub Action that analyzes the LLM cost impact of PRs

---

## Summary: Architectural Arc

- **2023-05**: Simple script: 1 search engine + 1 LLM → report
- **2023-07**: Real-time WebSocket streaming
- **2023-08**: LangChain abstraction layer
- **2023-10**: Multi-agent LangGraph pipeline
- **2024-09**: Strategic LLM / planner separation
- **2024-10**: Next.js frontend + images in reports
- **2024-12**: Language support, structured logging
- **2025-02**: Deep Research (iterative recursive loops)
- **2025-03**: MCP server / embeddable React component
- **2025-06**: MCP UI, FireCrawl, nodriver scrapers
- **2025-07**: Hallucination evals, domain filtering
- **2025-11**: LangChain fully removed
- **2026-02**: vLLM, cost tracking, smart compression
- **2026-04**: Agent discovery, academic retrievers
- **2026-05**: 200k token cap, MiniMax, Anthropic cost tracking

The project evolved from a single-researcher curiosity in May 2023 into a production-grade, multi-provider autonomous research platform with 1,780 PRs merged across nearly 200 contributors in just over 3 years.
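
To make the Deep Research idea (recursive search loops steered by a planner) and the URL deduplication step more concrete, here is a minimal sketch. It is not gpt-researcher's actual code or API: `deep_research`, `fake_search`, `fake_plan`, and the `depth`/`breadth` parameters are all hypothetical names chosen for illustration, and the search backend and planning LLM are replaced by in-memory stubs so the example runs offline.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical interfaces (assumptions, not the project's API):
# a search function maps a query to {url: page_text};
# a planner maps (query, findings so far) to follow-up queries, or [] to stop.
SearchFn = Callable[[str], Dict[str, str]]
PlanFn = Callable[[str, List[str]], List[str]]


def deep_research(query: str, search: SearchFn, plan: PlanFn,
                  depth: int = 2, breadth: int = 2,
                  seen_urls: Optional[Set[str]] = None) -> List[str]:
    """Recursively collect findings, skipping URLs that were already visited."""
    if seen_urls is None:
        seen_urls = set()
    findings: List[str] = []
    for url, text in search(query).items():
        if url in seen_urls:  # URL deduplication: never process a source twice
            continue
        seen_urls.add(url)
        findings.append(text)
    if depth <= 1:  # recursion budget exhausted: stop digging
        return findings
    # The planner decides whether to dig deeper (sub-queries) or stop ([]).
    for sub_query in plan(query, findings)[:breadth]:
        findings.extend(
            deep_research(sub_query, search, plan, depth - 1, breadth, seen_urls)
        )
    return findings


# Stub "web" and planner standing in for a real retriever and a planning LLM.
FAKE_WEB = {
    "llm agents": {
        "https://a.example/1": "agents overview",
        "https://b.example/2": "planning",
    },
    "llm agents planning": {
        "https://b.example/2": "planning",  # duplicate URL, will be skipped
        "https://c.example/3": "tree search",
    },
}


def fake_search(query: str) -> Dict[str, str]:
    return FAKE_WEB.get(query, {})


def fake_plan(query: str, findings: List[str]) -> List[str]:
    return [query + " planning"] if findings else []


result = deep_research("llm agents", fake_search, fake_plan, depth=2)
print(result)  # ['agents overview', 'planning', 'tree search']
```

Sharing one `seen_urls` set across all recursion levels is the design choice that matters here: a source found by a follow-up query is ignored if an earlier query already covered it. A real implementation would presumably also run sub-queries concurrently and pass the findings to a synthesis step.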
