Writing custom sanitization rules with HtmlSanitizeEx
by Paweł Świątkowski
Sanitizing input in web applications is a common task, and it is usually associated with security. Of course, this is an important part: letting users submit malicious HTML into your page can be deadly. Many frameworks (Ruby on Rails, Hanami, Phoenix) escape HTML out of the box, and you have to explicitly mark a string as raw to render it without escaping.
However, allowing HTML input is frequently a requirement, especially when using rich text editors (RTEs) on the frontend. In this case, security is one side of the problem, but the other, almost equally important, is to let users use only some tags. For example, I usually forbid images, because they have a tendency to break layouts. Disabling image input in the RTE is not enough: sooner or later users will discover that they are able to send arbitrary HTML to the server.
In Elixir, there is a very good package for HTML sanitization, namely HtmlSanitizeEx. It includes a nice set of built-in sanitization rules. For example, its basic_html function allows some “basic HTML” tags to be rendered and filters out everything else.
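A minimal sketch of such a call, written as a standalone script. The version requirement passed to Mix.install and the sample input are assumptions made for illustration:

```elixir
# Fetch the dependency when run as a standalone script (elixir basic.exs).
Mix.install([{:html_sanitize_ex, "~> 1.4"}])

# Sample input mixing allowed formatting tags with a script element.
input = ~s(<p>Hello <b>world</b><script>alert("XSS")</script></p>)

# basic_html/1 applies the built-in "basic HTML" rule set to the string.
clean = HtmlSanitizeEx.basic_html(input)

IO.puts(clean)
```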
This is good for general input sanitization, but does not necessarily match the subset of HTML I allow in my RTE. Fortunately, writing your own rule set (or Scrubber, as the package authors like to call it) is really easy. For example, say I only want to allow a small set of tags, including a. The Scrubber for that is a short module listing the allowed tags and their attributes.
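A sketch of what such a Scrubber might look like, built from the macros in HtmlSanitizeEx.Scrubber.Meta. The module name MyApp.LinkScrubber and the exact set of allowed tags and schemes are hypothetical, chosen only for illustration:

```elixir
Mix.install([{:html_sanitize_ex, "~> 1.4"}])

# Hypothetical module name; the tag list is an illustrative choice.
defmodule MyApp.LinkScrubber do
  # The Meta macros generate the scrubbing functions at compile time.
  require HtmlSanitizeEx.Scrubber.Meta
  alias HtmlSanitizeEx.Scrubber.Meta

  # Drop CDATA sections and HTML comments before anything else.
  Meta.remove_cdata_sections_before_scrub()
  Meta.strip_comments()

  # Allow <a href="...">, but only for the listed URI schemes.
  Meta.allow_tag_with_uri_attributes("a", ["href"], ["http", "https", "mailto"])

  # Allow a couple of formatting tags, with no attributes at all.
  Meta.allow_tag_with_these_attributes("b", [])
  Meta.allow_tag_with_these_attributes("i", [])

  # Anything not listed above is stripped.
  Meta.strip_everything_not_covered()
end
```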
Note that we can additionally restrict the URI schemes we allow.
To use it, we define a sanitize function. We can put it in a common module with the Scrubber.
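One possible layout, sketched with the Scrubber nested inside the module that exposes sanitize. MyApp.Sanitizer is a hypothetical name and the allowed tags are again only illustrative. The call that does the work is HtmlSanitizeEx.Scrubber.scrub/2, which takes the HTML string and the Scrubber module:

```elixir
Mix.install([{:html_sanitize_ex, "~> 1.4"}])

# Hypothetical wrapper module: rules and entry point live side by side.
defmodule MyApp.Sanitizer do
  defmodule Scrubber do
    require HtmlSanitizeEx.Scrubber.Meta
    alias HtmlSanitizeEx.Scrubber.Meta

    Meta.remove_cdata_sections_before_scrub()
    Meta.strip_comments()

    # Links are allowed, restricted to these URI schemes.
    Meta.allow_tag_with_uri_attributes("a", ["href"], ["http", "https", "mailto"])
    Meta.allow_tag_with_these_attributes("b", [])
    Meta.allow_tag_with_these_attributes("i", [])

    Meta.strip_everything_not_covered()
  end

  # Run the custom rule set over a string of user-submitted HTML.
  def sanitize(html) when is_binary(html) do
    HtmlSanitizeEx.Scrubber.scrub(html, __MODULE__.Scrubber)
  end
end
```

Keeping the rules next to the function that applies them means the rest of the application only ever calls MyApp.Sanitizer.sanitize/1 and does not need to know which Scrubber is in use.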
Then we are able to use it (and test it).
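A sketch of such a test as a single-file ExUnit script. It redefines a compact, hypothetical MyApp.Sanitizer so that it runs on its own. The assertions deliberately check only for the presence or absence of substrings rather than the exact output, since the precise serialization is up to the library:

```elixir
Mix.install([{:html_sanitize_ex, "~> 1.4"}])
ExUnit.start()

# Hypothetical sanitizer under test (same shape as described above).
defmodule MyApp.Sanitizer do
  defmodule Scrubber do
    require HtmlSanitizeEx.Scrubber.Meta
    alias HtmlSanitizeEx.Scrubber.Meta

    Meta.remove_cdata_sections_before_scrub()
    Meta.strip_comments()
    Meta.allow_tag_with_uri_attributes("a", ["href"], ["http", "https", "mailto"])
    Meta.allow_tag_with_these_attributes("b", [])
    Meta.strip_everything_not_covered()
  end

  def sanitize(html), do: HtmlSanitizeEx.Scrubber.scrub(html, __MODULE__.Scrubber)
end

defmodule MyApp.SanitizerTest do
  use ExUnit.Case, async: true

  test "drops tags that are not on the allow-list" do
    result = MyApp.Sanitizer.sanitize(~s(<b>bold</b><img src="cat.png">))
    assert result =~ "bold"
    refute result =~ "<img"
  end

  test "keeps links with an allowed scheme" do
    result = MyApp.Sanitizer.sanitize(~s(<a href="https://example.com">site</a>))
    assert result =~ "https://example.com"
  end

  test "removes href values with a forbidden scheme" do
    result = MyApp.Sanitizer.sanitize(~s(<a href="javascript:alert(1)">click</a>))
    refute result =~ "javascript:"
    assert result =~ "click"
  end
end
```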
Testing also reveals one downside: when a URI scheme is filtered out, you might end up with an a tag left without its href. This is how HtmlSanitizeEx currently works, and I can actually accept that. Still, I feel very good that it was so easy to adjust the sanitizer’s settings to my needs.
If you need inspiration for writing your own sanitization rules, check out the built-in ones on GitHub.