Title: Jina AI Reader

URL Source: https://simonwillison.net/2024/Jun/16/jina-ai-reader

Markdown Content:
**[Jina AI Reader](https://jina.ai/reader/)**. Jina AI provide a number of different AI-related platform products, including an excellent [family of embedding models](https://huggingface.co/collections/jinaai/jina-embeddings-v2-65708e3ec4993b8fb968e744), but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.

Add `r.jina.ai` to the front of a URL to get back Markdown of that page, for example [https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/](https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/) - in addition to converting the content to Markdown it also does a decent job of extracting just the content and ignoring the surrounding navigation.

The API is free but rate-limited (presumably by IP) to 20 requests per minute without an API key or 200 requests per minute with a free API key, and you can pay to increase your allowance beyond that.

The Apache 2 licensed source code for the hosted service is [on GitHub](https://github.com/jina-ai/reader) - it's written in TypeScript and [uses Puppeteer](https://github.com/jina-ai/reader/blob/main/backend/functions/src/services/puppeteer.ts) to run [Readability.js](https://github.com/mozilla/readability) and [Turndown](https://github.com/mixmark-io/turndown) against the scraped page.

It can also handle PDFs, which have their contents extracted [using PDF.js](https://github.com/jina-ai/reader/blob/main/backend/functions/src/services/pdf-extract.ts).

There's also a search feature, `s.jina.ai/search+term+goes+here`, which [uses the Brave Search API](https://github.com/jina-ai/reader/blob/main/backend/functions/src/services/brave-search.ts).