
Antheta.com 2021 - Present

Antheta.com is a work-in-progress web scraper for collecting many kinds of data. You can currently download data in JSON format.
Antheta automatically attempts to scrape several types of data:

  • General data
    • Title, description, favicon
    • Detected technologies (WordPress, jQuery, MailChimp, etc.)
  • Server
    • Cookies
    • Headers
    • Host (IP-address)
    • API endpoints
    • AJAX Requests along with methods, parameters, headers and timestamps
    • Site size
  • Images 
  • Links (in 3 categories: outbound, inbound and navbar links)
    • Outbound links
    • Inbound links (with the same host)
    • Navbar links
  • Socials (social links)
  • Scripts
    • Inline + linked
    • Parses JSON scripts, e.g. schema.org structured data
  • Styles
    • Inline + linked
  • Fonts (Google fonts found on site)
  • Forms (form method, action and all form elements + checks for recaptcha)
  • Contacts (note that the tool is not intended for spam; use it instead to secure your own sites against bots)
    • Email addresses
      • Parses and tries to match multiple different regexes
      • Works with common obfuscators, e.g. Cloudflare (looks at dynamically loaded content rather than the initial HTML)
    • Phone numbers
  • Tables (table header + content)
  • HTML (full page HTML code)
  • IP-addresses (parses IP-addresses in real time), for example:
 {
  "ipAddresses": [
    {
      "ip-address": "177.75.97.128",
      "port": "3128",
      "match": "177.75.97.128",
      "ip": "177.75.97.128:3128",
      "protocol": "socks5",
      "anonymity": "elite",
      "speed": "31ms"
    }
  ]
} 
  • Dynamic content (content that is loaded via JavaScript)
  • __NEXT_DATA__
    • Many websites/apps built with Next.js (a React framework) have a __NEXT_DATA__ script element that contains important page data.
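
As a rough illustration of the __NEXT_DATA__ item above, here is a minimal Python sketch that pulls that element out of a page using only the standard library. It assumes the element is a script tag with id "__NEXT_DATA__" holding JSON, which is how Next.js pages commonly emit it. The names NextDataParser, extract_next_data and sample_html are hypothetical and are not part of Antheta.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting only for the script tag carrying the expected id.
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def payload(self):
        # Returns the decoded JSON, or None when the element was not found.
        raw = "".join(self._chunks).strip()
        return json.loads(raw) if raw else None


def extract_next_data(html):
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return parser.payload()


# Made-up sample page, for illustration only.
sample_html = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"title": "Hello"}}, "page": "/"}
</script>
</body></html>
"""

data = extract_next_data(sample_html)
print(data["page"])  # prints "/"
```

Pages without the element simply yield None, so a scraper could treat this as one optional source among the others listed.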

You can also use CSS selectors to specify the data you wish to scrape. Example: ".container .links a->href" will get the "href" attribute from the matching element, along with other common attributes (class, id, name, html, text, etc.).
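
To make the "selector->attribute" form concrete, the sketch below shows one way such an expression could be split into its CSS part and its attribute part. Only the "->" syntax comes from the example above. The splitting rule (last "->" wins, attribute optional) and the names SelectorSpec and parse_selector are assumptions for illustration, not Antheta's actual parser.

```python
from typing import NamedTuple, Optional


class SelectorSpec(NamedTuple):
    css: str                  # the plain CSS selector
    attribute: Optional[str]  # the requested attribute, if any


def parse_selector(expression: str) -> SelectorSpec:
    # Assumption: the last "->" separates the selector from the attribute name.
    css, arrow, attribute = expression.rpartition("->")
    if not arrow:
        # No "->" present: the whole expression is the selector.
        return SelectorSpec(expression.strip(), None)
    return SelectorSpec(css.strip(), attribute.strip() or None)


spec = parse_selector(".container .links a->href")
print(spec.css)        # prints ".container .links a"
print(spec.attribute)  # prints "href"
```

One caveat of this simple rule: a selector such as ".foo->bar" is also valid CSS (class "foo-" with a child combinator), so a real implementation would probably need a less ambiguous check.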

Read more about the types of data on Antheta.com




Antheta.com will have a full dashboard that anyone can use to specify exactly what they wish to scrape. Here's what the upcoming Antheta Editor currently looks like:


Dark version:

Users will be able to create their own datasets and define their own columns, so a dataset acts somewhat like a database table. Say you want to gather emails from thousands of sites: you can make a dataset for it and later connect to it using our API or simply export the list.


Light version:


