HTMLRewriter
Use Bun's HTMLRewriter to transform HTML documents with CSS selectors
HTMLRewriter lets you use CSS selectors to transform HTML documents. It works with Request, Response, as well as string. Bun's implementation is based on Cloudflare's lol-html.
Usage
A common usecase is rewriting URLs in HTML content. Here's an example that rewrites image sources and link URLs to use a CDN domain:
// Replace all images with a rickroll
const rewriter = new HTMLRewriter().on("img", {
element(img) {
// Famous rickroll video thumbnail
img.setAttribute("src", "https://img.youtube.com/vi/dQw4w9WgXcQ/maxresdefault.jpg");
// Wrap the image in a link to the video
img.before('<a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ" target="_blank">', {
html: true,
});
img.after("</a>", { html: true });
// Add some fun alt text
img.setAttribute("alt", "Definitely not a rickroll");
},
});
// An example HTML document
const html = `
<html>
<body>
<img src="/cat.jpg">
<img src="dog.png">
<img src="https://example.com/bird.webp">
</body>
</html>
`;
const result = rewriter.transform(html);
console.log(result);This replaces all images with a thumbnail of Rick Astley and wraps each <img> in a link, producing a diff like this:
<html>
<body>
<img src="/cat.jpg" />
<img src="dog.png" />
<img src="https://example.com/bird.webp" />
<a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ" target="_blank">
<img src="https://img.youtube.com/vi/dQw4w9WgXcQ/maxresdefault.jpg" alt="Definitely not a rickroll" />
</a>
<a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ" target="_blank">
<img src="https://img.youtube.com/vi/dQw4w9WgXcQ/maxresdefault.jpg" alt="Definitely not a rickroll" />
</a>
<a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ" target="_blank">
<img src="https://img.youtube.com/vi/dQw4w9WgXcQ/maxresdefault.jpg" alt="Definitely not a rickroll" />
</a>
</body>
</html>Now every image on the page will be replaced with a thumbnail of Rick Astley, and clicking any image will lead to a very famous video.
Input types
HTMLRewriter can transform HTML from various sources. The input is automatically handled based on its type:
// From Response
rewriter.transform(new Response("<div>content</div>"));
// From string
rewriter.transform("<div>content</div>");
// From ArrayBuffer
rewriter.transform(new TextEncoder().encode("<div>content</div>").buffer);
// From Blob
rewriter.transform(new Blob(["<div>content</div>"]));
// From File
rewriter.transform(Bun.file("index.html"));Note that Cloudflare Workers implementation of HTMLRewriter only supports Response objects.
Element Handlers
The on(selector, handlers) method allows you to register handlers for HTML elements that match a CSS selector. The handlers are called for each matching element during parsing:
rewriter.on("div.content", {
// Handle elements
element(element) {
element.setAttribute("class", "new-content");
element.append("<p>New content</p>", { html: true });
},
// Handle text nodes
text(text) {
text.replace("new text");
},
// Handle comments
comments(comment) {
comment.remove();
},
});The handlers can be asynchronous and return a Promise. Note that async operations will block the transformation until they complete:
rewriter.on("div", {
async element(element) {
await Bun.sleep(1000);
element.setInnerContent("<span>replace</span>", { html: true });
},
});CSS Selector Support
The on() method supports a wide range of CSS selectors:
// Tag selectors
rewriter.on("p", handler);
// Class selectors
rewriter.on("p.red", handler);
// ID selectors
rewriter.on("h1#header", handler);
// Attribute selectors
rewriter.on("p[data-test]", handler); // Has attribute
rewriter.on('p[data-test="one"]', handler); // Exact match
rewriter.on('p[data-test="one" i]', handler); // Case-insensitive
rewriter.on('p[data-test="one" s]', handler); // Case-sensitive
rewriter.on('p[data-test~="two"]', handler); // Word match
rewriter.on('p[data-test^="a"]', handler); // Starts with
rewriter.on('p[data-test$="1"]', handler); // Ends with
rewriter.on('p[data-test*="b"]', handler); // Contains
rewriter.on('p[data-test|="a"]', handler); // Dash-separated
// Combinators
rewriter.on("div span", handler); // Descendant
rewriter.on("div > span", handler); // Direct child
// Pseudo-classes
rewriter.on("p:nth-child(2)", handler);
rewriter.on("p:first-child", handler);
rewriter.on("p:nth-of-type(2)", handler);
rewriter.on("p:first-of-type", handler);
rewriter.on("p:not(:first-child)", handler);
// Universal selector
rewriter.on("*", handler);Element Operations
Elements provide various methods for manipulation. All modification methods return the element instance for chaining:
rewriter.on("div", {
element(el) {
// Attributes
el.setAttribute("class", "new-class").setAttribute("data-id", "123");
const classAttr = el.getAttribute("class"); // "new-class"
const hasId = el.hasAttribute("id"); // boolean
el.removeAttribute("class");
// Content manipulation
el.setInnerContent("New content"); // Escapes HTML by default
el.setInnerContent("<p>HTML content</p>", { html: true }); // Parses HTML
el.setInnerContent(""); // Clear content
// Position manipulation
el.before("Content before").after("Content after").prepend("First child").append("Last child");
// HTML content insertion
el.before("<span>before</span>", { html: true })
.after("<span>after</span>", { html: true })
.prepend("<span>first</span>", { html: true })
.append("<span>last</span>", { html: true });
// Removal
el.remove(); // Remove element and contents
el.removeAndKeepContent(); // Remove only the element tags
// Properties
console.log(el.tagName); // Lowercase tag name
console.log(el.namespaceURI); // Element's namespace URI
console.log(el.selfClosing); // Whether element is self-closing (e.g. <div />)
console.log(el.canHaveContent); // Whether element can contain content (false for void elements like <br>)
console.log(el.removed); // Whether element was removed
// Attributes iteration
for (const [name, value] of el.attributes) {
console.log(name, value);
}
// End tag handling
el.onEndTag(endTag => {
endTag.before("Before end tag");
endTag.after("After end tag");
endTag.remove(); // Remove the end tag
console.log(endTag.name); // Tag name in lowercase
});
},
});Text Operations
Text handlers provide methods for text manipulation. Text chunks represent portions of text content and provide information about their position in the text node:
rewriter.on("p", {
text(text) {
// Content
console.log(text.text); // Text content
console.log(text.lastInTextNode); // Whether this is the last chunk
console.log(text.removed); // Whether text was removed
// Manipulation
text.before("Before text").after("After text").replace("New text").remove();
// HTML content insertion
text
.before("<span>before</span>", { html: true })
.after("<span>after</span>", { html: true })
.replace("<span>replace</span>", { html: true });
},
});Comment Operations
Comment handlers allow comment manipulation with similar methods to text nodes:
rewriter.on("*", {
comments(comment) {
// Content
console.log(comment.text); // Comment text
comment.text = "New comment text"; // Set comment text
console.log(comment.removed); // Whether comment was removed
// Manipulation
comment.before("Before comment").after("After comment").replace("New comment").remove();
// HTML content insertion
comment
.before("<span>before</span>", { html: true })
.after("<span>after</span>", { html: true })
.replace("<span>replace</span>", { html: true });
},
});Document Handlers
The onDocument(handlers) method allows you to handle document-level events. These handlers are called for events that occur at the document level rather than within specific elements:
rewriter.onDocument({
// Handle doctype
doctype(doctype) {
console.log(doctype.name); // "html"
console.log(doctype.publicId); // public identifier if present
console.log(doctype.systemId); // system identifier if present
},
// Handle text nodes
text(text) {
console.log(text.text);
},
// Handle comments
comments(comment) {
console.log(comment.text);
},
// Handle document end
end(end) {
end.append("<!-- Footer -->", { html: true });
},
});Response Handling
When transforming a Response:
- The status code, headers, and other response properties are preserved
- The body is transformed while maintaining streaming capabilities
- Content-encoding (like gzip) is handled automatically
- The original response body is marked as used after transformation
- Headers are cloned to the new response
Error Handling
HTMLRewriter operations can throw errors in several cases:
- Invalid selector syntax in
on()method - Invalid HTML content in transformation methods
- Stream errors when processing Response bodies
- Memory allocation failures
- Invalid input types (e.g., passing Symbol)
- Body already used errors
Errors should be caught and handled appropriately:
try {
const result = rewriter.transform(input);
// Process result
} catch (error) {
console.error("HTMLRewriter error:", error);
}See also
You can also read the Cloudflare documentation, which this API is intended to be compatible with.