Find the most content-like block

May 20, 2010 at 4:12 AM

Hello! I need to find the block in an HTML page that looks most like the main content. To do that, I have to go through all the nodes and keep only those that contain text. Then I examine that text, count its words, characters and commas, and give the node a score based on this data:
-50 if the class attribute is footer, footnote, navigation, menu, etc. (because these blocks don't contain the main content, for example div class="footer").
-50 if the id attribute is one of the same names (for example div id="footer").
+25 if the class attribute matches /((^|\s)(post|entry|entry[-]?(content|text|body)?|article[-]?(content|text|body)?)(\s|$))/ (div class="post-content", "post", "article", etc.).
+25 for the same match on the id attribute (div id="post-content", "post", "article", etc.).
+1 for each paragraph that contains more than 10 characters.
+1 for each comma in the paragraph.

After that I know which block on the page looks most like the main content, and I need to display it.
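
The scoring and selection described above could be sketched roughly as follows, in Python with only the standard library's html.parser. This is an illustration under assumptions, not a finished solution: the names Block, ContentScorer, attribute_score and find_main_block are made up for the example, paragraph points are assumed to go to the paragraph's parent element, the negative list contains only the four names spelled out above, and the markup is assumed to be reasonably well-formed. The positive pattern is used as written, so only "entry" and "article" accept a "-content" style suffix.

```python
import re
from html.parser import HTMLParser

# Assumption: only the four names listed in the post; the "etc." is not filled in.
NEGATIVE = re.compile(r"(^|\s)(footer|footnote|navigation|menu)(\s|$)", re.I)
POSITIVE = re.compile(
    r"(^|\s)(post|entry|entry-?(content|text|body)?"
    r"|article-?(content|text|body)?)(\s|$)", re.I)

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def attribute_score(value):
    """Return -50 / +25 (or their sum) for one class or id value."""
    if not value:
        return 0
    score = 0
    if NEGATIVE.search(value):
        score -= 50
    if POSITIVE.search(value):
        score += 25
    return score


class Block:
    """One open element with its running score and collected text."""
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.score = (attribute_score(attrs.get("class"))
                      + attribute_score(attrs.get("id")))
        self.text = []


class ContentScorer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # currently open elements
        self.candidates = []   # parents of at least one non-empty paragraph

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(Block(tag, dict(attrs)))

    def handle_data(self, data):
        # Ignore script and style bodies; they are not readable text.
        if self.stack and self.stack[-1].tag in ("script", "style"):
            return
        for block in self.stack:
            block.text.append(data)

    def handle_endtag(self, tag):
        if not any(block.tag == tag for block in self.stack):
            return  # stray closing tag
        # Pop until the matching element is closed, scoring paragraphs on the way.
        while self.stack:
            block = self.stack.pop()
            if block.tag == "p" and self.stack:
                self.score_paragraph(block, self.stack[-1])
            if block.tag == tag:
                break

    def score_paragraph(self, paragraph, parent):
        text = "".join(paragraph.text).strip()
        if not text:
            return
        if len(text) > 10:
            parent.score += 1
        parent.score += text.count(",")
        if parent not in self.candidates:
            self.candidates.append(parent)


def find_main_block(html):
    """Return the highest-scoring candidate block, or None if there is none."""
    scorer = ContentScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.candidates:
        return None
    return max(scorer.candidates, key=lambda block: block.score)


SAMPLE = """
<html><body>
<div id="menu"><p>Home, About, Contact, Archive</p></div>
<div class="post"><p>First paragraph, with some commas, and enough text.</p>
<p>Second paragraph, also long enough.</p></div>
<div class="footer"><p>Copyright, all rights reserved, 2010.</p></div>
</body></html>
"""

best = find_main_block(SAMPLE)
print(best.tag, best.attrs, best.score)
print(" ".join("".join(best.text).split()))
```

Giving the points to the paragraph's parent means a div that wraps several long paragraphs adds up all of their scores, while a menu or footer block starts at -50 and rarely recovers. Word and character counts beyond the 10-character threshold are not used here, because the rules above give no weight for them.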

If anyone is interested in this, please help or give me some code to try so I can keep going )