Sometimes a website needs to let users submit custom HTML code, known as rich text. For example, when a user writes a forum post, the post may contain pictures, videos, tables, and so on, and all of these rich-text effects are produced by HTML code.
How can we tell safe rich text apart from an XSS attack?
When handling rich text, we have to return to the idea of input checking. The main problem with input checking is that, at the time of the check, the context in which a variable will be output is not yet known. Rich text submitted by a user, however, is semantically a complete piece of HTML code, and on output it is not spliced into the attribute of some tag. This special case can therefore be given special treatment.
In the previous articles, we listed all the places in HTML where a script can be executed. A good XSS filter should be able to recognize every one of those places in the submitted HTML code.
HTML is a structured language and is relatively easy to analyze. An HTML parser such as htmlparser can extract the tags, tag attributes, and events from HTML code. When filtering rich text, events should be strictly prohibited, because displaying rich text should never require dynamic effects such as events. Dangerous tags such as <iframe>, <script>, <base>, and <form> should also be strictly prohibited.
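To make the parsing step concrete, here is a minimal Java sketch that walks a submitted fragment and reports the forbidden tags and event attributes it contains. It is an illustration under stated assumptions, not a production filter: it swaps in the JDK's built-in XML DOM parser in place of an HTML parser, so it assumes the rich text is well-formed XHTML and rejects anything else. The class `RichTextInspector` and its method are hypothetical names. Event handlers are detected by the `on` prefix that all HTML event attributes share.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports forbidden tags and event attributes in a rich-text fragment.
final class RichTextInspector {
    // The tags the text names as strictly prohibited in rich text.
    private static final Set<String> FORBIDDEN_TAGS = Set.of("iframe", "script", "base", "form");

    /** Lists every forbidden tag and event attribute found; an empty list means none. */
    static List<String> findViolations(String fragment) {
        List<String> violations = new ArrayList<>();
        walk(parse(fragment), violations);
        return violations;
    }

    private static void walk(Element element, List<String> violations) {
        String tag = element.getTagName().toLowerCase(Locale.ROOT);
        if (FORBIDDEN_TAGS.contains(tag)) {
            violations.add("forbidden tag <" + tag + ">");
        }
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            String name = attributes.item(i).getNodeName().toLowerCase(Locale.ROOT);
            // Every HTML event handler attribute (onclick, onerror, ...) starts with "on".
            if (name.startsWith("on")) {
                violations.add("event attribute " + name + " on <" + tag + ">");
            }
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) children.item(i), violations);
            }
        }
    }

    // Assumption: input is well-formed XHTML; anything else is rejected, never "repaired".
    private static Element parse(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so no external entities can be pulled in.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fail on errors without printing
            // Wrap the fragment so that it has exactly one root element.
            String wrapped = "<root>" + fragment + "</root>";
            return builder.parse(new InputSource(new StringReader(wrapped))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("rich text is not well-formed", e);
        }
    }
}
```

For example, `findViolations("<p onclick=\"x()\">hi</p><iframe src=\"a\"/>")` returns two entries, one for the `onclick` attribute on `<p>` and one for the `<iframe>` tag. Note that this check only looks for known-bad names, so it is still a blacklist: anything it does not list passes through unnoticed.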
When choosing which tags to allow, use a whitelist rather than a blacklist. For example, allow only <a>, <img>, <div>, and other safe tags. The whitelist principle applies not only to tags but equally to attributes and events. CSS is particularly troublesome to handle in rich-text filtering: if users are allowed to define their own CSS styles, that too can lead to XSS attacks. Therefore, prevent users from customizing CSS and style attributes wherever possible.
If users must be allowed to define their own styles, the CSS can only be filtered in the same way as rich text. This requires a CSS parser that analyzes the styles intelligently and checks whether they contain dangerous code.
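The whitelist principle can be sketched as a sanitizer that rebuilds the HTML from scratch, keeping only whitelisted tags and, for each tag, only whitelisted attributes. This is a minimal sketch, assuming well-formed XHTML input parsed with the JDK's DOM parser; the class `WhitelistSanitizer`, the exact tag and attribute lists, and the rule that `href`/`src` values must start with `http://` or `https://` are illustrative assumptions, not part of any library. Because `style` appears in no attribute list, user-defined CSS is dropped entirely, in line with the advice to avoid custom styles.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WhitelistSanitizer {
    // Hypothetical whitelist: each allowed tag maps to the only attributes it may keep.
    // No "on*" event attribute and no "style" attribute appears anywhere in it.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "a", Set.of("href", "title"), "img", Set.of("src", "alt"),
            "div", Set.of(), "p", Set.of(), "b", Set.of(), "i", Set.of());
    private static final Set<String> URL_ATTRIBUTES = Set.of("href", "src");
    private static final Set<String> VOID_TAGS = Set.of("img");
    // Non-whitelisted tags are unwrapped (their text is kept), except these, which vanish entirely.
    private static final Set<String> DROP_WITH_CONTENT = Set.of("script", "style", "iframe");

    static String sanitize(String fragment) {
        StringBuilder out = new StringBuilder();
        appendChildren(parse(fragment), out);
        return out.toString();
    }

    private static void appendChildren(Node parent, StringBuilder out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(escape(child.getNodeValue())); // text is always escaped
            } else if (type == Node.ELEMENT_NODE) {
                appendElement((Element) child, out);
            } // comments and processing instructions are dropped
        }
    }

    private static void appendElement(Element element, StringBuilder out) {
        String tag = element.getTagName().toLowerCase(Locale.ROOT);
        Set<String> allowedAttributes = ALLOWED.get(tag);
        if (allowedAttributes == null) {
            if (!DROP_WITH_CONTENT.contains(tag)) {
                appendChildren(element, out);
            }
            return;
        }
        out.append('<').append(tag);
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            String name = attributes.item(i).getNodeName().toLowerCase(Locale.ROOT);
            String value = attributes.item(i).getNodeValue();
            boolean urlOk = !URL_ATTRIBUTES.contains(name) || isSafeUrl(value);
            if (allowedAttributes.contains(name) && urlOk) {
                out.append(' ').append(name).append("=\"").append(escape(value)).append('"');
            }
        }
        out.append('>');
        if (!VOID_TAGS.contains(tag)) {
            appendChildren(element, out);
            out.append("</").append(tag).append('>');
        }
    }

    // URL values are whitelisted too: only absolute http(s) URLs survive, so "javascript:" cannot.
    private static boolean isSafeUrl(String url) {
        String u = url.trim().toLowerCase(Locale.ROOT);
        return u.startsWith("http://") || u.startsWith("https://");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#39;");
    }

    // Assumption: input is well-formed XHTML; anything else is rejected, never "repaired".
    private static Element parse(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fail on errors without printing
            String wrapped = "<root>" + fragment + "</root>";
            return builder.parse(new InputSource(new StringReader(wrapped))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("rich text is not well-formed", e);
        }
    }
}
```

For example, `sanitize("<p onclick=\"evil()\">Hi <b>there</b></p>")` returns `<p>Hi <b>there</b></p>`, and `sanitize("<a href=\"javascript:alert(1)\">x</a>")` returns `<a>x</a>`. Rebuilding the output, rather than deleting known-bad parts from the input string, means that anything the code does not explicitly recognize never reaches the page.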
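A real CSS parser is beyond a short example, but the idea of filtering styles the same way as rich text can be sketched with a declaration-level whitelist. The following hypothetical `StyleFilter` is a simplified stand-in for a proper CSS parser: it splits a `style` value into `property: value` declarations, keeps only a handful of assumed-harmless properties, and accepts a value only if it consists of letters, digits, `#`, `.`, `%`, `,`, `-`, and spaces. That character rule rejects `url(...)`, `expression(...)`, comments, quotes, and backslash escapes, at the cost of also rejecting legitimate values such as `rgb(0, 0, 0)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class StyleFilter {
    // Hypothetical whitelist of presentation-only properties.
    private static final Set<String> ALLOWED_PROPERTIES = Set.of(
            "color", "background-color", "font-weight", "font-style", "text-align", "text-decoration");
    // Deliberately narrow: no parentheses, colons, slashes, quotes or backslashes, so
    // url(...), expression(...), comments and CSS escapes can never match.
    private static final Pattern SAFE_VALUE = Pattern.compile("[a-zA-Z0-9#.%, -]+");

    /** Returns only the whitelisted declarations, normalized as "property: value; ...". */
    static String sanitizeStyle(String style) {
        List<String> kept = new ArrayList<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // not a "property: value" pair
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim();
            if (ALLOWED_PROPERTIES.contains(property) && SAFE_VALUE.matcher(value).matches()) {
                kept.add(property + ": " + value);
            }
        }
        return String.join("; ", kept);
    }
}
```

For example, `sanitizeStyle("color: red; background-image: url(javascript:alert(1)); width: expression(alert(1))")` returns `color: red`: the second and third declarations use properties outside the whitelist, and their values would fail the character check anyway.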
Several mature open-source projects implement XSS checking for rich text. AntiSamy is an open-source OWASP project and is regarded as one of the best XSS filters. It was originally written in Java and has since been extended to .NET and other languages:
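```java
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;

Policy policy = Policy.getInstance(POLICY_FILE_LOCATION); // path to an AntiSamy policy XML file
AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(dirtyInput, policy); // dirtyInput: the user-submitted rich text
MyUserDAO.storeUserProfile(cr.getCleanHTML()); // some custom function
```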
In PHP, you can use another widely acclaimed open-source project: HTML Purifier.