Web site XSS using BOM on UTF-8 pages
- June 1, 2006
- Masatoshi Kimura
- Firefox, SeaMonkey, Thunderbird
- Fixed in
- Firefox 1.5.0.4
- SeaMonkey 1.0.2
- Thunderbird 1.5.0.4
Masatoshi Kimura reports that the Unicode Byte-order-Mark (BOM) is
stripped from UTF-8 pages during the conversion to Unicode before
the parser sees the web page. As a result the parser will see and
process script tags that web input sanitizers may miss, because
the tags appear as "scr[BOM]ipt" or similar in the code on the
web site.
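The mismatch between what a sanitizer checks and what the parser receives can be illustrated with a short sketch. This is an illustration under stated assumptions, not code from any real site or from the browser: naive_filter_allows and simulate_affected_decoder are hypothetical names, and the decoder is modelled simply as dropping every U+FEFF character.

```python
BOM = "\ufeff"  # U+FEFF, the byte-order mark (EF BB BF when encoded as UTF-8)


def naive_filter_allows(text: str) -> bool:
    """Hypothetical sanitizer check: allow input unless it contains '<script'."""
    return "<script" not in text.lower()


def simulate_affected_decoder(text: str) -> str:
    """Model the reported behavior: every BOM is dropped before parsing."""
    return text.replace(BOM, "")


# Input with a BOM hidden inside the tag name, i.e. "scr[BOM]ipt".
payload = "<scr" + BOM + "ipt>alert(1)</scr" + BOM + "ipt>"

# The filter sees no "<script" substring and lets the input through...
filter_result = naive_filter_allows(payload)

# ...but after the BOMs are removed, the parser receives a real script tag.
parsed_view = simulate_affected_decoder(payload)
```

Here filter_result is True even though parsed_view begins with an ordinary script tag, which is the gap the report describes.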
Although Firefox 1.5.0.4 and later are fixed and no longer accept such script tags, web sites will continue to be visited by older versions of Firefox and Mozilla browsers. Web sites can protect themselves by explicitly setting the character encoding to something other than UTF-8, or by adding the Unicode byte-order mark to the set of characters handled by the site's input sanitizer.
Sites can protect their users by stripping the BOM from web input or, if appropriate, specifying a character encoding other than UTF-8.
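A minimal sketch of the first workaround, assuming the site's input is already decoded to a Python string: remove every U+FEFF character before any other filtering runs, so the sanitizer inspects the same text an affected browser would parse. The names strip_bom and sanitize are hypothetical, and html.escape stands in for whatever filtering the site actually performs.

```python
import html

BOM = "\ufeff"  # U+FEFF, the Unicode byte-order mark


def strip_bom(text: str) -> str:
    """Remove every BOM character, not only a leading one."""
    return text.replace(BOM, "")


def sanitize(text: str) -> str:
    """Strip BOMs first, then escape HTML metacharacters (assumed policy)."""
    return html.escape(strip_bom(text))


# A tag name split by a BOM is rejoined and then escaped like any other tag.
cleaned = sanitize("<scr" + BOM + "ipt>alert(1)</scr" + BOM + "ipt>")
```

Stripping must happen before the check or escaping step; doing it afterwards would leave the same gap between the sanitizer's view and the parser's view.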