This project is read-only.

The main functionality of the program is the detection of HTML-related SEO violations. It detects most of the violations supported by Microsoft's SEO Toolkit and promotes the same best practices, such as writing a title of 5 to 65 characters or using the h1 tag once per page. An agent-based Web crawler collects HTML pages until it reaches the specified limit of links to follow, then displays the results. For each discovered HTML page, the user can generate a report containing:

  • details about the page: title, description, server ...
  • violations: missing title, multiple <h1> tags ...
  • keywords: one-, two- and three-word keyword combinations with occurrence and density stats
  • internal links and links pointing to other domains
  • Web response headers
  • source code


Each generated report displays the following details if they exist:

  • content length
  • last modification date
  • server
  • Web request duration
  • title
  • description
  • keywords
  • h1 tag


The violation detection algorithm distinguishes between errors and warnings. Fixing errors is necessary; removing warnings is recommended. The following violations are checked:

  • multiple title tags
  • empty title tag
  • title longer than 65 characters
  • title shorter than 5 characters
  • missing title tag
  • title and description with identical content
  • multiple meta description tags
  • empty meta description tag
  • missing meta description tag
  • description longer than 150 characters
  • description shorter than 25 characters
  • multiple h1 tags
  • missing h1 tag
  • empty h1 tag
  • large inline script code (> 2048 characters)
  • large inline CSS definitions (> 1024 characters)
  • missing alt attribute in img tags
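
A few of the checks listed above can be sketched in Python as follows. This is a minimal illustration and not the project's actual implementation: the names `_TagCollector` and `find_violations` are hypothetical, the inline script and CSS size checks are left out, and the sketch does not sort results into errors and warnings because the list does not say which is which.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Hypothetical helper: collects <title>, <h1> and meta description text."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.h1s = []
        self.descriptions = []
        self.images_without_alt = 0
        self._open = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._open = tag
            (self.titles if tag == "title" else self.h1s).append("")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.descriptions.append(attrs.get("content") or "")
        elif tag == "img" and "alt" not in attrs:
            self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open == "title":
            self.titles[-1] += data
        elif self._open == "h1":
            self.h1s[-1] += data


def find_violations(html):
    """Return a list of violation labels found in the given HTML string."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    found = []

    def check(label, values, min_len=None, max_len=None):
        if not values:
            found.append(f"missing {label} tag")
            return
        if len(values) > 1:
            found.append(f"multiple {label} tags")
        text = values[0].strip()
        short = label.replace("meta ", "")
        if not text:
            found.append(f"empty {label} tag")
        elif min_len is not None and len(text) < min_len:
            found.append(f"{short} shorter than {min_len} characters")
        elif max_len is not None and len(text) > max_len:
            found.append(f"{short} longer than {max_len} characters")

    # Length limits are the ones given in the list above.
    check("title", collector.titles, 5, 65)
    check("meta description", collector.descriptions, 25, 150)
    check("h1", collector.h1s)

    if collector.titles and collector.descriptions:
        title = collector.titles[0].strip()
        if title and title == collector.descriptions[0].strip():
            found.append("title and description with identical content")
    if collector.images_without_alt:
        found.append("missing alt attribute in img tags")
    return found
```

Calling `find_violations(page_source)` on the source of a crawled page returns the labels of the violations the sketch detects, in the order the checks run.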


Keyword analysis supports English and French. It uses Bing's translator API for language detection, so a Bing app ID is required. Supporting another language is a matter of adding a stopword list (unless there is an encoding issue). Keyword combinations of one, two and three words are extracted, and the occurrence and density of each combination are calculated. Only combinations that occur at least twice are retained in order to keep the results clean. The density is the occurrence multiplied by the number of words in the combination, divided by the total number of keywords in the page and expressed as a percentage. For example, if a Web page contains a total of 100 keywords where the keyword "key1" occurs 4 times, the keyword "key2" occurs 5 times and the combination "key1 key2" occurs 3 times, then the density of "key1" is 4 % (4 * 1 / 100 * 100), the density of "key2" is 5 % and the density of "key1 key2" is 6 % (3 * 2 / 100 * 100). The user can also export the keywords to an Excel spreadsheet (this runs as an asynchronous task).
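
The density calculation described above can be sketched in Python as follows. This is only an illustration under assumptions: `keyword_stats` is a hypothetical name, the input is assumed to be the page's words in order with stopwords already removed, and a combination is assumed to mean consecutive words.

```python
from collections import Counter


def keyword_stats(keywords, max_words=3, min_occurrence=2):
    """Map each retained combination to (occurrence, density in percent)."""
    total = len(keywords)
    stats = {}
    for n in range(1, max_words + 1):
        # Count every run of n consecutive keywords.
        combos = Counter(
            " ".join(keywords[i:i + n]) for i in range(total - n + 1)
        )
        for combo, occurrence in combos.items():
            # Keep only combinations repeated at least twice.
            if occurrence >= min_occurrence:
                # density = occurrence * words in combination / total keywords
                stats[combo] = (occurrence, 100 * occurrence * n / total)
    return stats
```

For a page of 100 keywords in which "key1 key2" occurs 3 times, the sketch yields a density of 100 * 3 * 2 / 100 = 6 for that combination, matching the worked example.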



During link extraction, internal links are canonicalized in order to keep the results clean. The user can also check whether the page contains broken links. This relatively long-running task runs asynchronously so the UI remains responsive.
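
The text does not spell out which canonicalization rules are applied, so the following Python sketch is only one plausible reading. It assumes canonicalization means resolving relative links against the page URL, lowercasing the host, dropping the fragment and using "/" for an empty path. The name `split_links` is hypothetical.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def split_links(page_url, hrefs):
    """Return (internal, external) link lists; internal ones are canonicalized."""
    site = urlsplit(page_url).netloc.lower()
    internal, external = set(), set()
    for href in hrefs:
        # Resolve relative references such as "../about.html".
        parts = urlsplit(urljoin(page_url, href))
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar links
        if parts.netloc.lower() == site:
            # Assumed rules: lowercase host, no fragment, "/" for an empty path.
            internal.add(
                urlunsplit((parts.scheme, site, parts.path or "/", parts.query, ""))
            )
        else:
            external.add(urlunsplit(parts))
    return sorted(internal), sorted(external)
```

With these rules, several spellings of the same internal address collapse into a single entry, which is what keeps the link list free of duplicates.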



The report lists the Web response headers returned for the page.



In the source code view, the user can locate a character by its line and column numbers and validate the markup against the W3C validator.
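
Locating a character by line and column can be sketched in Python as follows. The function name `char_at` is hypothetical, and 1-based line and column numbers are an assumption; the tool's own numbering convention is not stated.

```python
def char_at(source, line, column):
    """Return the character at a 1-based (line, column) position in source."""
    lines = source.splitlines()
    if not 1 <= line <= len(lines):
        raise IndexError(f"line {line} is out of range")
    text = lines[line - 1]
    if not 1 <= column <= len(text):
        raise IndexError(f"column {column} is out of range on line {line}")
    return text[column - 1]
```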


Last edited Feb 18, 2011 at 9:24 AM by taha123, version 2

