Tuesday, 17 April 2018

Ultimate HTML color sanitization in PHP

How do you sanitize an HTML color in PHP?

That has become rather difficult, with rgb and hsl colors as well as short and long hex values. It took me about 20 minutes, but this regex is ALMOST perfect.

https://regex101.com/r/A2IjNO/26

In PHP:

$col="#fff";
$col=preg_replace("/[^a-fA-F0-9\#(),.%rgbhsl]/",'',$col); // only leave legal characters!
if(!preg_match("/^(\#[\da-f]{3}|\#[\da-f]{6}|rgba\(((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*,\s*){2}((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*)(,\s*(0\.\d+|1))\)|hsla\(\s*((\d{1,2}|[1-2]\d{2}|3([0-5]\d|60)))\s*,\s*((\d{1,2}|100)\s*%)\s*,\s*((\d{1,2}|100)\s*%)(,\s*(0\.\d+|1))\)|rgb\(((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*,\s*){2}((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*)\)|hsl\(\s*((\d{1,2}|[1-2]\d{2}|3([0-5]\d|60)))\s*,\s*((\d{1,2}|100)\s*%)\s*,\s*((\d{1,2}|100)\s*%)\))$/i",$col))
{
 $col="notacolor";
}
echo "COLOR: $col";
/* Just to make that VERY long regex a little clearer, here is the same
   pattern spread over several lines. The x flag makes PCRE ignore the
   whitespace, and single quotes stop PHP from reading $i as a variable.
if(!preg_match('/
^(
  \#[\da-f]{3}
 |\#[\da-f]{6}
 |rgba\(
         ((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*,\s*){2}
         ((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*)
         (,\s*(0\.\d+|1))
      \)
 |hsla\(
       \s*((\d{1,2}|[1-2]\d{2}|3([0-5]\d|60)))\s*,
       \s*((\d{1,2}|100)\s*%)\s*,
       \s*((\d{1,2}|100)\s*%)
       (,\s*(0\.\d+|1))
     \)
 |rgb\(
         ((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*,\s*){2}
         ((\d{1,2}|1\d\d|2([0-4]\d|5[0-5]))\s*)
      \)
 |hsl\(
       \s*((\d{1,2}|[1-2]\d{2}|3([0-5]\d|60)))\s*,
       \s*((\d{1,2}|100)\s*%)\s*,
       \s*((\d{1,2}|100)\s*%)
     \)
)$/ix',$col))
*/
How do you use it?
The user can input ANY string, and ONLY valid color strings will pass. So all of these will pass:
#fff;
#ff8800
rgb(255,15,0)
rgba(255,15,0,0.5)
hsla(208, 56%, 46%, 1)
hsl(0, 100%, 100%)

If the input doesn't pass, $col is set to the string notacolor, as in the example above.
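As a minimal sketch, the inline snippet could be wrapped in a function. The name sanitize_color is hypothetical and not part of the original code. The pattern is the same set of alternatives, rebuilt from named pieces; the \s* parts are left out because step 1 has already stripped all whitespace before the match runs.

```php
<?php
// Hypothetical wrapper name; the post itself works inline on $col.
function sanitize_color(string $input): string
{
    // Step 1: only leave legal characters (this also removes whitespace).
    $col = preg_replace('/[^a-fA-F0-9#(),.%rgbhsl]/', '', $input) ?? '';

    // Building blocks of the long regex from the post.
    $byte  = '(?:\d{1,2}|1\d\d|2(?:[0-4]\d|5[0-5]))';   // 0 to 255
    $hue   = '(?:\d{1,2}|[12]\d{2}|3(?:[0-5]\d|60))';   // 0 to 360
    $pct   = '(?:\d{1,2}|100)%';                        // 0% to 100%
    $alpha = '(?:0\.\d+|1)';                            // 0.x or 1

    $rgb = "$byte,$byte,$byte";
    $hsl = "$hue,$pct,$pct";

    // Step 2: the whole filtered string must be one legal color.
    $pattern = '/^(?:#[\da-f]{3}|#[\da-f]{6}'
             . "|rgb\($rgb\)|rgba\($rgb,$alpha\)"
             . "|hsl\($hsl\)|hsla\($hsl,$alpha\)"
             . ')$/i';

    return preg_match($pattern, $col) === 1 ? $col : 'notacolor';
}

echo sanitize_color('#fff;'), "\n";                  // #fff
echo sanitize_color('hsla(208, 56%, 46%, 1)'), "\n"; // hsla(208,56%,46%,1)
echo sanitize_color('rgb(256,0,0)'), "\n";           // notacolor
```

Returning the filtered string rather than the raw input means the caller always gets the cleaned-up version.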

HOW IT WORKS

We work in two steps. The first step is to allow ONLY these characters: a-f, A-F, 0-9, the lowercase letters r, g, b, h, s and l, and the symbols # ( ) , . %. We replace anything that is not one of those characters, spaces included, with a "".

Then we match the result with ANY of the legal color patterns. If that doesn't work, then BANG, you get 'notacolor'.

This way, an obvious typo is forgiven: "#000" (with the quotes) still works, but gets cleaned up to #000. Also, things like rgb( 5 , 15 , 255 ), which is legal, will be rewritten as rgb(5,15,255), saving precious bytes.
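To see the clean-up on its own, here is a small sketch that runs only the first step, the character filter, on those two inputs. The variable names are just for illustration.

```php
<?php
// The character filter from step 1, on its own.
$filter = '/[^a-fA-F0-9#(),.%rgbhsl]/';

$quoted = preg_replace($filter, '', '"#000"');
$spaced = preg_replace($filter, '', 'rgb( 5 , 15 , 255 )');

echo $quoted, "\n"; // #000
echo $spaced, "\n"; // rgb(5,15,255)
```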

Now in theory a hacker could still send something like rgba(0,0,0,0.000000000000000......), which would pass as legal. So if you REALLY want to make it safe from overflow-type attacks, checking the length of the string is a good idea too. But that goes without saying; it is always a good idea.
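A length check could look like the sketch below. The limit of 32 and the names MAX_COLOR_LENGTH and color_length_ok are assumptions of mine, not from the original code; pick a limit that suits your own input. It is meant to run on the raw input, before the regex does any work.

```php
<?php
// Hypothetical limit. A color with a short alpha value, such as
// hsla(360,100%,100%,0.5), is 23 characters without spaces.
const MAX_COLOR_LENGTH = 32;

function color_length_ok(string $col): bool
{
    return strlen($col) <= MAX_COLOR_LENGTH;
}

// An alpha value with a thousand digits is legal for the regex, but far too long.
$attack = 'rgba(0,0,0,0.' . str_repeat('0', 1000) . ')';

var_dump(color_length_ok('rgba(0,0,0,0.5)')); // bool(true)
var_dump(color_length_ok($attack));           // bool(false)
```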

You cannot do it with a match alone unless that match is anchored: to an unanchored pattern, "#fff<script alert('oops')></script>" would be a legal string. In our case the filter turns it into #fffscraler(s)scr, which is NOT a color, so it becomes 'notacolor'.
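The difference can be shown with a small sketch. For brevity it only uses the short hex pattern instead of the full regex, and the variable names are made up for the example.

```php
<?php
$raw = "#fff<script alert('oops')></script>";

// An unanchored match only has to find a color SOMEWHERE in the string.
$unanchored = preg_match('/#[\da-f]{3}/i', $raw);

// Step 1 strips the illegal characters ...
$filtered = preg_replace('/[^a-fA-F0-9#(),.%rgbhsl]/', '', $raw);

// ... and an anchored match then rejects what is left over.
$anchored = preg_match('/^#[\da-f]{3}$/i', $filtered);

echo $unanchored, ' ', $filtered, ' ', $anchored, "\n"; // 1 #fffscraler(s)scr 0
```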

NOTE:
At first I thought about using an inverse pattern of the above match to preg_replace anything NOT legal for colors, but that proved to be more difficult than I cared for, and it would probably be quite slow because of the need to negate a | operator. Also, I don't think regex HAS a & operator, so negation might be impossible. Anyway..

Disclaimer:


  • I didn't build it on my own; there were 24 earlier attempts before mine to study and borrow parts from. But this seems to be the first to pass all tests for the above cases. I am just proud to be of use :)
  • The only thing it doesn't validate is named colors like: red
    Which is a valid color code.
    The complete list is here:
    https://www.w3schools.com/tags/ref_colornames.asp
    I just think it defeats the purpose to check this list with a regex.
    It can be done much more transparently with an in_array() check.
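Such an in_array() check could be sketched like this. The list here is a deliberately short, hypothetical subset (the full list is behind the link above), and is_named_color is a made-up helper name.

```php
<?php
// Hypothetical short subset of the named colors; extend it as needed.
$namedColors = ['black', 'white', 'red', 'green', 'blue', 'yellow', 'orange', 'purple'];

function is_named_color(string $col, array $names): bool
{
    // Compare the trimmed, lowercased input strictly against the list.
    return in_array(strtolower(trim($col)), $names, true);
}

var_dump(is_named_color(' Red ', $namedColors));    // bool(true)
var_dump(is_named_color('notacolor', $namedColors)); // bool(false)
```

Run this check on the raw input before the character filter, because the filter would strip most letters from a color name.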

When do we use this?

Not much. The HTML5 color input gives you a nice #555555 value and doesn't even support rgba colors (yet). But the fallback is an ordinary text input, so we need to protect ourselves.
This regex is just for the future, when the color input will support alpha-colors and hsl colors as well.
I am building a framework that I'd like to think I'll use for the coming 10 years.