Shaun Mccran

My digital playground


Stopping form submissions containing HTML code

Are you getting web form spam? I was, mainly from bots or 'copy and paste' farms where users sit and paste bulk code into your forms. Most of it contains links or image tags. Rather than compromise the form's usability for genuine users, I decided to tackle my spam problem with a little more tact than the usual approaches, such as CAPTCHAs.

I'm already cleaning HTML submissions on the server side, but why not make things more informative for my users and tell them up front that their HTML isn't welcome?

I didn't want a complicated regex or pattern-matching plugin; I simply wanted to detect key HTML elements and tell the user that they won't be accepted by the form. This code uses the jQuery plugin for form validation. You can get it here:

First things first, let's create a custom validation rule to detect our HTML elements.

<script type="text/javascript">
    // Custom rule: validation fails if the value contains any disallowed string.
    $.validator.addMethod("findhtml", function(value, element) {
        var commentclean = true,
            disallowed = ['<', 'http'];

        for (var i = 0, len = disallowed.length; i < len; i++) {
            if (value.indexOf(disallowed[i]) !== -1) {
                commentclean = false;
                break;
            }
        }

        // Empty optional fields pass; otherwise the value must be clean.
        return this.optional(element) || commentclean;
    }, "* match the string");
</script>

This creates an array of disallowed strings ('<' and 'http') and loops through them each time the rule is invoked, failing validation on the first match.
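The check itself does not depend on jQuery, so it can be tried on its own. Here is a minimal sketch that mirrors the loop in the rule above. The helper name `containsDisallowed` is my own for illustration; it is not part of the validation plugin.

```javascript
// Hypothetical standalone helper (the name is an assumption, not plugin API).
// Returns true if `value` contains any of the strings in `disallowed`.
function containsDisallowed(value, disallowed) {
    for (var i = 0, len = disallowed.length; i < len; i++) {
        if (value.indexOf(disallowed[i]) !== -1) {
            return true;
        }
    }
    return false;
}

// Same list as the validation rule.
var disallowed = ['<', 'http'];

console.log(containsDisallowed('Hello, please call me back.', disallowed)); // false
console.log(containsDisallowed('<img src="x.png">', disallowed));           // true
console.log(containsDisallowed('see http://example.com', disallowed));      // true
console.log(containsDisallowed('see HTTP://EXAMPLE.COM', disallowed));      // false
```

Note the last case: indexOf is case-sensitive, so an upper-case HTTP slips through. Lower-casing the value first with value.toLowerCase() would likely close that gap, if it matters for your form.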

Secondly, we will use this rule in our validation routine when a user tries to submit the form.

<!--- Form now powered from jQuery --->
<script type="text/javascript">
    $(document).ready(function() {
        $("#form").validate({
            // Rules are keyed by each field's name attribute.
            rules: {
                name: {required: true},
                email: {required: true},
                tel: {required: false},
                message: {required: true, findhtml: true}
            },
            // Messages shown when a rule fails; message has one per rule.
            messages: {
                name: "Please enter your Name",
                email: "Please enter a valid email address",
                tel: "",
                message: {required: "Please enter a message", findhtml: "You cannot enter HTML in this field"}
            }
        });
    });
</script>

The key line here is:

    message: {required: true, findhtml: true}

This invokes our previously created validation rule.

In this way the user is told 'You cannot enter HTML in this field', a friendly validation message that clearly shows WHY the form isn't going to work.

You can see this working on my contact form here:

Comments
I've been struggling to get a better handle on blog spam and HTML comments too. I like this approach, but looking for just the string "http" seems a bit strong, as it prevents someone from simply pasting a link into a message even when the intent is not to put actual HTML in the page. The script picks up the http in the link and stops, even though no actual HTML markup exists in the form. I like your idea of trying to avoid regex, but I can't think of a solid solution that doesn't use regex without getting false positives like this. Maybe only search for strings that start with "<" to ensure it is limited to tags, like <InvalidTag, <html, <div, etc.? Thoughts? BTW, the captcha image wasn't working earlier; it is now :)
# Posted By Jeff Smallwood | 15/07/2013 08:59
Hi Jeff,
It likely is a bit strong, but I thought I'd try to cover the widest range of potential HTML input, and http requests and img tags seem to be the most commonly submitted as spam. Trapping http and < seems to have the most anti-spam impact for the least technical disruption.

You could implement a third-party API or a full regex, but with the level of spam reduction I've seen from this, I think it works well enough :-)
# Posted By Shaun McCran | 16/07/2013 06:48
There are so many paper writing services and programs in our computer but mostly people know just about the basic and we contact to experts when we face some troubles so we should read this post and visit site daily if we want to solve every problems of our computers by our self.
# Posted By James | 24/12/2015 03:37