Content Restrictions

Version 0.9.2 - 2007-03-20
Previous: 0.5, 0.6

Introduction

Cross-site scripting (XSS) attacks would always fail if the browser could know for absolute certain which scripts were legitimate and which were malicious. In the absence of affordable and reliable mind-reading technology, and in consideration of the mental fatigue this would undoubtedly induce in web page authors, this specification allows a site designer to explain his state of mind to the user agent by specifying restrictions on the capabilities of his content.

As a real-world example, a webmail system might serve an HTML email and specify that the user agent should not execute any script in the body of that page. This means that, even if the webmail system's content-filtering process failed, the user of a conforming user agent would not be at risk from malicious script in the email.

Goal

This mechanism is primarily intended to aid in the prevention or mitigation of cross-site scripting (XSS) attacks. Sites would define and serve Content Restrictions for pages containing untrusted content which they had filtered. If the filtering failed, the Content Restrictions might still prevent malicious script from executing or doing damage.

Note that this specification is designed to be a backstop to server-side content filtering, not a replacement for it. There is intentionally no defined way for a server to determine the existence of or level of support for this specification in a given user agent. It's about protecting the user and covering the designer's ass, not about allowing him to be lazy.

Restrictions

This specification is intended to be content-agnostic, but the initial implementation will focus on HTML and the exact meaning for HTML or XHTML content is specified as a guide. "all" is the default in all cases.

script

Value     Meaning
all       No restrictions.
header    Only script defined in the document header is allowed. For document types which don't have such a header, this is equivalent to "all".
          HTML: <script> in the <head> element only. No event handlers in the markup.
external  Only external script is allowed.
          HTML: <script src="[url]"> only. No inline script, event handlers or javascript: URLs in markup.
none      No script may execute.

Multiple values of this restriction may be specified.
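As an illustration only, the table above can be read as a lookup from a single "script" value to the kinds of script a user agent would let run in an HTML page. The ScriptSource names and the script_allowed function are hypothetical, not part of this specification. Two points are assumptions rather than anything the table states: that "header" also permits <script src> inside <head>, and that "header" blocks javascript: URLs.

```python
from enum import Enum


class ScriptSource(Enum):
    """Hypothetical classification of where a piece of script came from."""
    HEAD_INLINE = "inline <script> in <head>"
    HEAD_EXTERNAL = "<script src> in <head>"
    BODY_INLINE = "inline <script> in <body>"
    BODY_EXTERNAL = "<script src> in <body>"
    EVENT_HANDLER = "event handler attribute in markup"
    JAVASCRIPT_URL = "javascript: URL in markup"


# Which sources each value of the "script" restriction lets through.
# Assumption: "header" admits both inline and external <script> in <head>,
# and rejects javascript: URLs (the table does not say either way).
ALLOWED = {
    "all": set(ScriptSource),
    "header": {ScriptSource.HEAD_INLINE, ScriptSource.HEAD_EXTERNAL},
    "external": {ScriptSource.HEAD_EXTERNAL, ScriptSource.BODY_EXTERNAL},
    "none": set(),
}


def script_allowed(restriction, source):
    """Return True if script from `source` may run under `restriction`.

    An unrecognised restriction value falls back to "all", the default.
    """
    return source in ALLOWED.get(restriction, ALLOWED["all"])
```

For instance, script_allowed("external", ScriptSource.EVENT_HANDLER) is False, while script_allowed("header", ScriptSource.HEAD_INLINE) is True.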

host

The value of this parameter is a string specifying a domain or IP address (e.g. "example.com" or "127.0.0.1") and optionally a port number (default is ":80"). All requests initiated by the content (either embedded URLs or script) can only be made to the specified IP address or domain and its subdomains, using the standard Same Origin rules. This prevents malicious content phoning home (even using e.g. URL parameters on an <img>) or importing extra unwanted malicious content. Multiple values of host make it possible to access any of the named domains or their subdomains. For IDN domains, the punycode form is specified.

The magic value "this" means the host from which the page was served, or its subdomains. This makes implementing the headers simpler in the case where the user's content is all served from one of a pool of hosts.

This mechanism does not require the browser to permit accesses which would have been blocked anyway (e.g. cross-site XMLHttpRequest).
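A minimal sketch of how a user agent might test a request against the "host" restriction is given below. The function names are hypothetical. The sketch compares only host name and port, which is a simplification of the Same Origin rules (it ignores the scheme), and it does not handle IPv6 literals. It also assumes that an absent "host" restriction means no restriction.

```python
def parse_host(value, default_port=80):
    """Split 'example.com:8080' into ('example.com', 8080).

    The port defaults to 80 when none is given.
    """
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return host.lower(), int(port)
    return value.lower(), default_port


def host_permits(allowed_values, page_host, request_host, request_port=80):
    """Return True if a request to request_host:request_port is permitted.

    allowed_values: the values of every "host" restriction in force.
    page_host: the host the page was served from (used for "this").
    """
    if not allowed_values:
        return True  # assumption: no "host" restriction, nothing to enforce
    request_host = request_host.lower()
    for value in allowed_values:
        if value == "this":
            value = page_host
        host, port = parse_host(value)
        if port != request_port:
            continue
        # The named host itself, or any of its subdomains.
        if request_host == host or request_host.endswith("." + host):
            return True
    return False
```

With allowed_values set to ["example.com"], a request to img.example.com is permitted, while requests to evil.org or notexample.com are not.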

script-host

The value of this parameter is an IP/domain/magic-name string, as above. However, specifying it means that external scripts are only permitted if served from the IP/domain specified (again, using Same Origin rules). HTML: <script src="[url]"> is restricted to the given location only.

It is useful to have both this and the "host" restriction because script loads are the only loads which are currently not restricted at all by Same Origin rules. This allows that hole to be closed on an opt-in basis. The two restrictions interact such that a script load must be permitted by both to be allowed.

The "script" and "script-host" restrictions are implemented at parse time, so that permitted script can dynamically add event handlers to content where script was forbidden, and they will still work.
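The interaction between "host" and "script-host" described above, where a script load must be permitted by both, might be sketched as follows. The helper names are hypothetical, ports are left out for brevity, and an absent restriction is assumed to permit everything.

```python
def _matches(allowed_values, page_host, target_host):
    """True if target_host is one of allowed_values or a subdomain of one."""
    if not allowed_values:
        return True  # assumption: restriction not specified, so no limit
    target_host = target_host.lower()
    for value in allowed_values:
        host = (page_host if value == "this" else value).lower()
        if target_host == host or target_host.endswith("." + host):
            return True
    return False


def script_load_permitted(host_values, script_host_values, page_host, script_host):
    """An external script loads only if BOTH restrictions permit its host."""
    return (_matches(host_values, page_host, script_host)
            and _matches(script_host_values, page_host, script_host))
```

Under host values ["example.com", "cdn.example.net"] and script-host value ["cdn.example.net"], a script from cdn.example.net loads, but one from example.com does not, even though images from example.com would still be allowed.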

cookies

Value  Meaning
all    No restrictions (both write and read allowed).
write  Write access only.
read   Read access only.
none   No access to cookies.

HTML: controls script access to document.cookie. Many sites do all their cookie stuff server-side, so have no need for client-side access to cookies. The value "none" has roughly the same effect as the "httpOnly" cookie header extension, although this is more fine-grained because you can allow access to particular cookies on some pages and not on others.

hierarchy

Value     Meaning
all       No restrictions (both child and parent access allowed).
children  The children of the page are accessible, but not the parent.
          HTML: the frames array is accessible, but not parent or top. This allows sites to sandbox same-domain content inside an <iframe>.
parent    The parent is accessible, but not the children.
          HTML: the opposite of the above.
none      No hierarchy traversal allowed.

The same-origin policy still applies.

Applying to Content

Restrictions are applied to content served over the web by sending an HTTP header of the form given below or, in an HTML or XHTML page, by using <meta http-equiv="Content-Restrictions" content="...">.

The syntax is of the following form:

PolicyHeader  = "Content-Restrictions: " Rules ;
Rules         = [ Version "," ] Rule { "," Rule } ;
Rule          = HostRule | CookieRule | HierarchyRule | ScriptRule ;
HostKey       = "host" | "script-host" ;
HostRule      = HostKey "=" Host ;
                (* Host is a domain (punycode in the IDN case)
                   or an IP address, plus optional port number, or the magic value
                   'this' *)
AllNoneValues = ( "all" | "none" ) ;
CookieRule    = "cookies="   ( AllNoneValues | "write" | "read" ) ;
HierarchyRule = "hierarchy=" ( AllNoneValues | "parent" | "children" ) ; 
ScriptRule    = "script="    ( AllNoneValues | "header" | "external" ) ; 
Version       = "version="   VersionNumber ;
                (* VersionNumber regexp: /^[1-9]\d*$/ *)

Example: Content-Restrictions: script=header,cookies=none,hierarchy=none

New versions of this policy definition may be given a distinguishing version number; this is version 1. Compatible sub-versioning is handled by the fact that if either the name or value is unrecognised, the rule is ignored. If the version number is missing, 1 is assumed.

There may be multiple headers or meta tags. The implementation should combine the restrictions of all the policy strings which have the highest version number which is present and which it understands. The meta tag version is not guaranteed to have an effect on script which comes before it in the document. Scripts loaded by a document use the policy of the parent document.
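One possible reading of the selection step is sketched below: from all the policy strings present, keep those carrying the highest version number the implementation understands. The function names are hypothetical, each string is assumed to be the header value without the "Content-Restrictions: " prefix, and a malformed version is treated as missing, which is an assumption. How the restrictions in the selected strings are then merged is not shown.

```python
import re

# A version rule, if present, must be the first rule in the string.
VERSION_RE = re.compile(r"^version=([1-9]\d*)(?:,|$)")


def policy_version(policy):
    """Return the declared version of a policy string, or 1 if none is given."""
    match = VERSION_RE.match(policy.strip())
    return int(match.group(1)) if match else 1


def select_policies(policies, understood=frozenset({1})):
    """Keep the policy strings with the highest understood version."""
    versioned = [(policy_version(p), p) for p in policies]
    usable = [v for v, _ in versioned if v in understood]
    if not usable:
        return []
    best = max(usable)
    return [p for v, p in versioned if v == best]
```

An implementation that understands only version 1 would skip a "version=2,..." string and combine the remaining version 1 strings. One that understands both would use the version 2 string alone.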

If a chosen string has a parsing error, the remainder of the string is ignored.
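The parsing rules above (unrecognised names or values are ignored, a missing version means 1, and a parsing error discards the rest of the string) could be sketched roughly as follows. This is not a reference implementation: parse_policy and the two tables are hypothetical, the key names follow the section headings ("host", "script-host"), and letting a later duplicate of a rule override an earlier one is an assumption.

```python
import re

# Recognised values for the enumerated restrictions.
KNOWN_VALUES = {
    "script": {"all", "none", "header", "external"},
    "cookies": {"all", "none", "read", "write"},
    "hierarchy": {"all", "none", "parent", "children"},
}
# Restrictions whose value is a host string and which may repeat.
HOST_KEYS = {"host", "script-host"}


def parse_policy(value):
    """Parse the value of one Content-Restrictions header into a dict."""
    policy = {"version": 1}  # version 1 is assumed when none is given
    for index, rule in enumerate(value.split(",")):
        name, sep, val = rule.strip().partition("=")
        if not sep or not name or not val:
            break  # parsing error: the remainder of the string is ignored
        if name == "version":
            # Only valid as the first rule, with a positive integer value.
            if index == 0 and re.fullmatch(r"[1-9]\d*", val):
                policy["version"] = int(val)
                continue
            break
        if name in HOST_KEYS:
            policy.setdefault(name, []).append(val)
        elif val in KNOWN_VALUES.get(name, ()):
            policy[name] = val
        # Otherwise the name or value is unrecognised and the rule is ignored.
    return policy
```

For example, parse_policy("script=none,garbage,cookies=none") keeps the script rule and drops everything from "garbage" onwards, while parse_policy("script=bogus,cookies=read") skips only the unrecognised script value.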

Open issue: what happens if, say, a page is served with one Content-Restrictions header and it includes a JS file with another one? Do you combine the two and take the toughest restriction using the hierarchy? Does that apply to all the script or just that in the included file?

Open issue: do we have a JS interface so pages can detect which restrictions are supported?

Q & A

Why write the spec in terms of "restrictions" rather than "capabilities"?
Backwards-compatibility. Current user agents are fully capable. Any restrictions we can place on content to possibly mitigate XSS are therefore a bonus. Also, if it were in terms of capabilities, you might require UI if the capabilities the page wanted conflicted with the desires of the user. This is a UI-free specification, which is a feature.

Original URL: http://www.gerv.net/security/content-restrictions/