@osmn-byhn/htmlparser 🚀

Unify HTML, CSS, and JS into a single, element-centric JSON structure. This library is for developers who need to extract data from HTML while preserving its visual and functional context. Unlike traditional parsers, @osmn-byhn/htmlparser inlines styles from <style> tags into the elements they apply to and resolves inline JavaScript event handlers (such as onclick) into the bodies of the functions they call.

🌟 Why use this?


📦 Installation

npm install @osmn-byhn/htmlparser

or

pnpm add @osmn-byhn/htmlparser

or

yarn add @osmn-byhn/htmlparser

🚀 Quick Start

TypeScript

import { extractUnifiedFromHTML } from "@osmn-byhn/htmlparser";

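// Example markup: a styled button whose onclick handler is defined in a <script> tag.
const html = `
  <html>
    <head>
      <style>.btn { color: red; }</style>
    </head>
    <body>
      <button class="btn" onclick="sayHi()">Click</button>
      <script>function sayHi() { alert("Hi"); }</script>
    </body>
  </html>
`;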

async function main() {
  const result = await extractUnifiedFromHTML(html);
  
  const button = result.body.children[0];
  console.log(button.inlineStyle); // { color: 'red' }
  console.log(button.events.click.function); // "function sayHi() { ... }"
}

main();

JavaScript (ES Modules)

import { extractUnifiedFromHTML } from "@osmn-byhn/htmlparser";

const result = await extractUnifiedFromHTML('<div>Hello</div>');
console.log(result.body);

JavaScript (CommonJS)

const { extractUnifiedFromHTML } = require("@osmn-byhn/htmlparser");

extractUnifiedFromHTML('<div>Hello</div>').then((result) => {
  console.log(result.body);
});

🛠️ Output Structure

The output is a UnifiedExtraction object:

Field      Description
metadata   Stats: totalElements, maxDepth, totalTextNodes, etc.
body       The root UnifiedElement (usually the <body> tag).
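
To make the metadata fields concrete, here is a minimal sketch of how totalElements and maxDepth could be derived from a tree of elements. It does not call the library: TreeNode, TreeStats, and computeStats are hypothetical names used only for illustration, and treating the root as depth 1 is an assumption, since the library may count depth differently.

```typescript
// Hypothetical minimal node shape: only the fields this sketch needs.
interface TreeNode {
  tag: string;
  children: TreeNode[];
  textContent?: string;
}

// Hypothetical stats shape, named after two of the documented metadata fields.
interface TreeStats {
  totalElements: number;
  maxDepth: number;
}

// Walk the tree once, counting every element and tracking the deepest level.
// Assumption: the root element is at depth 1.
function computeStats(root: TreeNode): TreeStats {
  let totalElements = 0;
  let maxDepth = 0;

  const visit = (node: TreeNode, depth: number): void => {
    totalElements += 1;
    maxDepth = Math.max(maxDepth, depth);
    for (const child of node.children) {
      visit(child, depth + 1);
    }
  };

  visit(root, 1);
  return { totalElements, maxDepth };
}

// Hand-built sample tree: body > (div > span), button.
const sampleBody: TreeNode = {
  tag: "body",
  children: [
    { tag: "div", children: [{ tag: "span", children: [], textContent: "Hi" }] },
    { tag: "button", children: [], textContent: "Click" },
  ],
};

const stats = computeStats(sampleBody);
console.log(stats);
```

In practice you would read these values from result.metadata rather than recomputing them; the sketch only shows what the numbers describe.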

The UnifiedElement object:

Every element in the tree has this structure:
{
  "tag": "div",
  "id": "main-container",
  "class": "active primitive",
  "attrs": { "data-custom": "value" },
  "inlineStyle": { 
    "color": "red", 
    "font-size": "16px" 
  },
  "events": {
    "click": {
      "handler": "myFunc()",
      "function": "function myFunc() { ... }"
    }
  },
  "children": [ ... ],
  "textContent": "Hello World"
}
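
The structure above can be traversed like any other tree. The sketch below declares a TypeScript shape that mirrors the JSON and collects every element that has a handler for a given event. The interface and function names (EventBinding, ElementShape, collectWithEvent) are hypothetical and are not exported by the package, and marking most fields as optional is an assumption, since the source does not say which fields are always present.

```typescript
// Hypothetical shape of one entry under "events", mirroring the JSON above.
interface EventBinding {
  handler: string;   // raw attribute value, e.g. "myFunc()"
  function?: string; // resolved function source, when available
}

// Hypothetical shape mirroring the documented element JSON.
// Assumption: every field except "tag" may be absent.
interface ElementShape {
  tag: string;
  id?: string;
  class?: string;
  attrs?: Record<string, string>;
  inlineStyle?: Record<string, string>;
  events?: Record<string, EventBinding>;
  children?: ElementShape[];
  textContent?: string;
}

// Depth-first search that returns, in document order, every element
// with a handler registered for eventName.
function collectWithEvent(
  node: ElementShape,
  eventName: string,
  found: ElementShape[] = []
): ElementShape[] {
  if (node.events?.[eventName]) {
    found.push(node);
  }
  for (const child of node.children ?? []) {
    collectWithEvent(child, eventName, found);
  }
  return found;
}

// Hand-built sample; real input would come from result.body.
const sampleRoot: ElementShape = {
  tag: "body",
  children: [
    {
      tag: "div",
      id: "main-container",
      class: "active primitive",
      children: [
        {
          tag: "button",
          inlineStyle: { color: "red" },
          events: {
            click: { handler: "myFunc()", function: "function myFunc() { ... }" },
          },
          textContent: "Click",
        },
        { tag: "p", textContent: "Hello World" },
      ],
    },
  ],
};

const clickable = collectWithEvent(sampleRoot, "click");
console.log(clickable.map((el) => el.tag));
```

The same traversal works for any event name, so passing "submit" or "change" instead of "click" would list the elements that handle those events.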

🎯 Use Cases


📜 License

MIT © osmn-byhn

Author: Osman Beyhan

