
About this prototype

This is a prototype showing how classifiers can be used to detect spam blogs (splogs). Right now, two classifiers cooperate to identify spam. If the idea shows potential, we will use clusters of dynamic classifiers for spam identification. URLs are also detected directly; however, more URLs are needed for the training data.

Known Issues

  • Training data is quite old.
  • More training data required.
  • Will mistake most non-English blogs for spam.
  • Doesn't work as well on forums.
  • Can have long response times.
  • URL corpus is too small.

API

Right now there is no quality, stability, or performance guarantee, but feel free to use the API like this: http://www.spamhuh.com/Classify.aspx?testUrl=blog.uclassify.com The response is in XML; see the uclassify API documentation for a better explanation.
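
As a rough illustration of the request above, here is a minimal Python sketch. The endpoint and the testUrl parameter come from the example URL; the helper names (build_classify_url, classify, describe) and the 30-second timeout are assumptions made for this example only. The XML schema is not described on this page, so the sketch only parses the response and lists its elements rather than assuming any particular tag names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and parameter name are taken from the example URL above.
ENDPOINT = "http://www.spamhuh.com/Classify.aspx"


def build_classify_url(test_url):
    """Return the request URL that asks the service to classify test_url."""
    return ENDPOINT + "?" + urlencode({"testUrl": test_url})


def classify(test_url, timeout=30):
    """Fetch the response and parse it as XML.

    The timeout is an assumed value, chosen generously because
    responses can be slow. No response schema is assumed.
    """
    with urlopen(build_classify_url(test_url), timeout=timeout) as response:
        return ET.fromstring(response.read())


def describe(root):
    """List (tag, text) pairs for every element, without assuming a schema."""
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]


if __name__ == "__main__":
    # Only builds and prints the URL; call classify() to contact the service.
    print(build_classify_url("blog.uclassify.com"))
```

Calling describe(classify("blog.uclassify.com")) would list whatever elements the service returns; interpreting them is best done against the uclassify API documentation.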