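There are three ways of including scripts in HTML documents:-
- Imported from an external file using the src attribute of a script element.
- Included as the contents of a script element.
- Provided as the values of event handling (intrinsic event) attributes.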
The current HTML version is 4.01. The HTML 4.01 transitional DTD defines a script element with:-
<!ELEMENT SCRIPT - - %Script; -- script statements -->
<!ATTLIST SCRIPT
  charset     %Charset;      #IMPLIED  -- char encoding of linked resource --
  type        %ContentType;  #REQUIRED -- content type of script language --
  language    CDATA          #IMPLIED  -- predefined script language name --
  src         %URI;          #IMPLIED  -- URI for an external script --
  defer       (defer)        #IMPLIED  -- UA may defer execution of script --
  event       CDATA          #IMPLIED  -- reserved for possible future use --
  for         %URI;          #IMPLIED  -- reserved for possible future use --
  >
(The HTML 4.01 strict DTD omits the language
attribute
as it is deprecated in the HTML 4.01 standard.)
Script elements are defined with opening and closing script tags. The two dashes after the element name in the DTD mean that neither the opening nor the closing tag may be omitted, even when the element is importing a javascript file and has no contents.
The charset
attribute can declare the character encoding of an external
javascript file that is being imported using the src
attribute. In all
cases it is preferable that the server sending the file provide
character encoding information in Content-Type
HTTP headers (with the
slight problem that there was no official content-type for use with
scripts; see the type
attribute below).
Javascript itself uses a very limited repertoire of characters but the
content of string literals in non-Latin languages may necessitate an
interest in character encodings with script files. That is not a
problem that I have faced to date so I don't know how it should best
be handled. I am yet to see a charset
attribute used in
a script tag.
The type
attribute is required in HTML 4 but the HTML 4
specification is not very helpful on the subject. It says:-
type = content-type [CI]
This attribute specifies the scripting language of the element's contents and overrides the default scripting language. The scripting language is specified as a content type (e.g., "text/javascript"). Authors must supply a value for this attribute. There is no default value for this attribute.
(The [CI] means that the attribute's value is case insensitive.)
Pursuing the permissible values of content-type through the HTML
specification leads to a list of currently recognised content types
(MIME or Media types). Up until mid 2005 that list did not include
anything related to ECMAScript or javascript. So although the attribute
is required, and so must have a value, there was no standardised
content-type for that value. However, the HTML 4 specification did give
text/javascript as an example (even though it was not a recognised
standard content type), so it is that value that has traditionally been
used with the type attribute when including or importing
ECMAScript/javascript into an HTML page.
The MIME types introduced in 2005 are application/ecmascript,
application/javascript and text/javascript. The last of these,
text/javascript, the value that has traditionally been used, was
officially deprecated and so should be phased out over time. However, at
the point of officially recognising these new MIME types no browsers
existed that would recognise either application/ecmascript or
application/javascript. This means that if either is used as the value
of the type attribute the likelihood is that the script in question will
never be executed.
So for the present, and probably many years to come, text/javascript is
the only viable value for use with the type
attribute when using javascript.
type="text/javascript"
The language
attribute is deprecated (and not allowed under
the strict DTD) and it is unnecessary when the type
attribute
is required, as that attribute will determine the language used.
The language
attribute can be more specific than the
type
attribute because it can also specify the language
version. In almost all respects specifying a language version is not
helpful and even potentially dangerous.
By default a web browser will execute a script using the latest version of the language that it supports. Generally all current (March 2004) browsers support all of the language features specified in ECMA 262 2nd edition (approximately JavaScript 1.3) and most fully support the 3rd edition. Restricting the language features used to those defined in ECMA 262 2nd edition (with additional care in some less used areas) should result in scripts that will happily execute on all current browsers without a need to specify a language version.
Netscape initially attempted to tie the DOM features of their browser to the language version, which would have allowed a specified language version to imply the DOM features supported. That idea was abandoned because other browsers produced by their competitors introduced scripting with near identical languages but significantly different DOMs. DOM support should be determined independently of language version using object/feature detection.
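As a minimal sketch of object/feature detection, the hypothetical supportsBasicDom function below tests for the DOM methods a script intends to use instead of inferring them from a language version or browser name. It takes the document object as a parameter (an assumption made so that it can be exercised outside a browser); in a page it would be called with document.

```javascript
// Hypothetical helper: decides whether the DOM methods a script relies on
// are present by testing for them directly.
function supportsBasicDom(doc) {
    // Check the object first, then each method the script will call;
    // any missing piece makes the whole test false.
    return Boolean(
        doc &&
        doc.getElementById &&
        doc.createElement
    );
}

// Typical use in a page:
// if (supportsBasicDom(document)) { /* run the DOM-dependent code */ }
```

The tests mirror the calls the script will make, so the same code works unchanged in any browser that provides those methods, whatever its name or language version.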
The potential danger with specifying a language version comes with specifying version 1.2. Version 1.2 was an aberration. It deviated significantly from earlier versions of the language in anticipation of changes to the ECMA specification, but those changes were never made. Netscape had to reverse the changes it had made to version 1.2 in version 1.3 in order to conform with what was eventually published as ECMA 262 2nd edition. The only browsers released for which version 1.2 was the default JavaScript version were Netscape 4.00 to 4.05 (and you won't find many of those left in the wild).
The problem is that if you specify version 1.2 in a language
attribute
you may actually get it, with all of its deviant characteristics, but
at the same time most browsers will not exhibit those characteristics.
It is always a bad idea to encourage the same code to be interpreted in
two different ways, and certainly never without fully understanding how
the language versions differ. The specific problem can be avoided by
never specifying the language version as 1.2. The issue can be avoided
by never providing the deprecated language
attribute at all.
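To make "deviant characteristics" concrete: Netscape's JavaScript 1.2 is commonly described as having made == and != compare without type conversion when that version was explicitly requested with language="JavaScript1.2". The sketch below assumes that description and only shows the standard ECMAScript behaviour such code would normally be written against; it does not emulate any historical browser.

```javascript
// Standard ECMAScript: == converts the operands to a common type first.
var looseResult = ("1" == 1);   // true: "1" is converted to the number 1

// A comparison that never converts types, which is how JavaScript 1.2 is
// described as treating ==, matches today's strict equality operator.
var strictResult = ("1" === 1); // false: a string never equals a number

// Code relying on conversion, such as treating "" as zero, would take the
// other branch if == stopped converting types.
function describe(value) {
    return (value == 0) ? "zero-like" : "not zero-like";
}
```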
The SRC attribute specifies the URL of an external javascript file that
is to be imported by the script element. If no file is being imported
by the element (the script is the element's contents) then the
src
attribute is omitted.
The defer
attribute is specified as providing a
"hint" to the browser as to whether it needs to process
the script immediately (as it usually would), or whether it can carry
on parsing the HTML following the script element and leave the
javascript interpreter to process the script in its own time.
If a script uses the document.write
method to insert
content into the HTML being processed then the script element
containing that script must not be deferred as the inserted HTML
could end up at any point in the document (or even be inserted after
the current document has closed, replacing it). If a script is
deferred, additional care must be taken before any part of it, such
as a function it defines, is used by other scripts (such as
intrinsic event handlers).
It is unusual for a script element to have a defer
attribute. And many browsers will not recognise/act upon a
defer
attribute even if one is present.
Leaving the defer
and charset
attributes
aside, the normal formulation for a valid HTML 4 script element that
imports a javascript file is:-
<script type="text/javascript"
src="http://example.com/scriptFile.js"></script>
<!-- or using an example relative URL -->
<script type="text/javascript" src="../scripts/scriptFile.js"></script>
HTML is case insensitive so the tag name and attribute names can be in upper or lower (or mixed) case (current practice tends to prefer lower case).
The attribute values must be quoted because in both cases they include
characters that are forbidden in unquoted attribute values (forbidden
characters would be any character that is not: letters (a-z and A-Z),
digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46),
underscores (ASCII decimal 95), and colons (ASCII decimal 58)). The
quote characters used may be double quotes (") or single quotes
('). Common practice prefers double quotes for HTML attributes.
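The rule for unquoted values is simple enough to express as a regular expression. The sketch below (mayBeUnquoted and formatAttribute are hypothetical names) confirms that the values used in the examples above, such as text/javascript and ../scripts/scriptFile.js, need quotes because they contain slashes.

```javascript
// Characters HTML 4 permits in an unquoted attribute value:
// letters, digits, hyphen, period, underscore and colon.
var UNQUOTED_VALUE = /^[A-Za-z0-9\-._:]+$/;

// True when the value could legally appear without quotes.
function mayBeUnquoted(value) {
    return UNQUOTED_VALUE.test(value);
}

// Writes name=value, adding double quotes only when they are required.
// Assumes the value contains no double quote (escaping is not handled).
function formatAttribute(name, value) {
    return mayBeUnquoted(value) ? name + "=" + value
                                : name + "=\"" + value + "\"";
}
```

In practice always quoting, as common practice suggests, avoids the need for such a check at all.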
Traditionally javascript files are given a two-letter extension of .js.
That extension is not required; any valid URL to a resource that
returns content that can be interpreted as valid javascript source
code will work. In addition, browsers do not appear to be interested
in any Content Type headers sent with the javascript source, which is
probably a good thing as officially recognised content types have only
just (mid 2005) been introduced.
Script that is to be included in an HTML page is placed as the content of a script element, appearing between the opening and closing script tags:-
<script type="text/javascript">
function exampleFunctionDeclaration(n){
    return (n * 4);
}
</script>
The same case sensitivity and attribute value quoting considerations apply to this application of the script tags as applied to their use when importing external script files.
Script elements may not appear in all contexts in an HTML document.
They may be children of the HEAD element because the DTD defines the
content of the HEAD element as including %head.misc; content, which
includes SCRIPT in its definition. Script elements may also appear
within the BODY element in any context that is specified as %flow;,
%inline;, %special; or specifically SCRIPT by the HTML DTDs. This is
because %flow; includes all elements defined as %inline;, which
includes all elements defined as %special;, which includes SCRIPT in
its definition (among others).
Reading the DTDs and looking out for these categories will indicate where script elements are allowed to appear. For example, the (HTML 4.01 transitional) DTD definition for the paragraph element reads:-
<!ELEMENT P - O (%inline;)* -- paragraph -->
The content for the P element is %inline; and %inline; encompasses
SCRIPT. Similarly:-
<!ELEMENT DD - O (%flow;)* -- definition description -->
The DD element has %flow; defining its content, so it is allowed
SCRIPT as its content (or part of it). Whereas:-
<!ELEMENT DL - - (DT|DD)+ -- definition list -->
The DL element is only allowed DT and DD elements as its children.
So a script element cannot appear as a child of a DL element in
valid HTML 4.
The DTD for the particular flavour of HTML being authored is the best guide as to where in a document script elements may appear, but note that the different versions of the DTD differ slightly in terms of the content defined for some elements.
Script elements that have source code as their contents and appear on an HTML page need some special consideration.
When scripting was first introduced the preceding generations of browsers had no concept of what a script element was, and would treat the content of the unrecognised script tags in the way unrecognised tags are normally handled in HTML. The content is treated as HTML source, which, for scripts, meant including it as text in a page. The results did not look good and a mechanism was provided to hide the script element contents from browsers that did not know how to handle script elements.
Javascript has an end of line comment symbol consisting of two slashes
(//). All characters between that symbol and the end of the line are
treated as a comment and ignored. HTML also provides a means of
commenting out parts of its source: an opening comment tag <!-- and a
closing comment tag --> (strictly these are not opening and closing
tags in HTML; it is the pairs of dashes that start and end a comment.
The surrounding <! and > delimit a markup declaration, which is the
only context in which a comment is recognised in HTML).
The trick to hiding javascript source code from
browsers that did not recognise the script element, so it would not
be shown on the page, was to allow script included in an HTML page to
use an additional end of line comment symbol that corresponded with
the <!--
opening comment tag used by HTML. The script
author would then place this tag/comment symbol at the start of the
script source code (on a line of its own, so as not to comment out
any javascript code) and then use the normal javascript end of line
comment symbol to comment out (from the javascript interpreter) an
HTML end of comment tag.
<script type="text/javascript">
<!--
function exampleFunctionDeclaration(n){
    return (n * 4);
}
// -->
</script>
A browser incapable of recognising the script element would treat its content as HTML source and so it would interpret the script within the script element as effectively commented out, thus not displaying it on the page.
When scripting was introduced the practice was necessary and highly recommended, but that was some time ago and browsers and HTML versions have moved on two or three generations. We are now at a point where the oldest browsers in current use are already two generations into HTML versions that formalised script elements. They all know what a script element is and how its contents should be handled. Even browsers that cannot execute scripts know that they are supposed to ignore the content of script elements.
The practice of hiding scripts from "older" browsers has become an anachronism, no longer needed and no longer used by informed javascript authors. It is still often seen because it is recommended in out of date books and in out of date javascript tutorials on the web. And the readers of those books and tutorials continue to use and promote it, not realising that it no longer serves any real purpose.
The existence of this additional comment syntax in javascript included in HTML pages also led to HTML style comments being used extensively in on-page javascript. This was, and is, a very bad idea. Javascript has end of line and multi-line comment syntaxes and they should be used exclusively to comment javascript source code.
When a script is included on an HTML page the HTML parser needs to
decide how much of the page's source text to pass on to the javascript
interpreter and where it should start processing other HTML again.
Officially an HTML parser is required to take the first occurrence of
the character sequence "</" it finds after the opening script tag as
marking the end of the script element. In practice browsers seem to be
a lot more lax and only terminate the script section when they
encounter the character sequence "</script>".
That seems reasonable (if lax) but it does not eliminate all problems.
Suppose that a script includes HTML source in the form of a string
literal, and that source includes a closing script tag, as might be
the case when using document.write
to write a new script
element to the page:-
<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"></script>');
</script>
That is an example simplified to the point of being futile but it
should be obvious that if the HTML parser considers the first
occurrence of "</script>" as terminating the script element the
results will be undesirable.
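The two termination rules can be modelled with a few lines of javascript. This is a deliberately simplified sketch, not how any real parser is implemented: the hypothetical scriptContentEnd function treats "strict" as "the first </ ends the content" and "lax" as "the first </script>, in any case, ends it", and it is applied to the text that follows the opening tag in the document.write example.

```javascript
// Simplified model of where an HTML parser ends a script element's content.
// "strict": the HTML 4 rule, the first "</" ends the content.
// "lax": observed browser practice, the first "</script>" (any case) ends it.
function scriptContentEnd(source, rule) {
    if (rule === "strict") {
        return source.indexOf("</");
    }
    return source.toLowerCase().indexOf("</script>");
}

// The text following <script type="text/javascript"> in the example.
var afterOpenTag =
    "\ndocument.write(\n" +
    "    '<script type=\"text/javascript\" src=\"scriptFile.js\"></script>');\n" +
    "</script>";

// Even under the lax rule the content is cut off inside the string
// literal, so what reaches the javascript interpreter is not valid code.
var end = scriptContentEnd(afterOpenTag, "lax");
var truncated = afterOpenTag.slice(0, end);
```

With a string such as "document.write('<p>x</p>');</script>" the two rules disagree: the strict rule stops at "</p>", the lax rule at "</script>".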
The solution is to do something to make the character sequence within the javascript string of HTML different from the sequence that will be recognised as the closing script tag. This is often done by splitting the string and using a concatenation operation to let the script produce the same output:-
<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"></scr'+'ipt>');
</script>
This conceals the closing script tag from the HTML parser but it is not a good idea because string concatenation is a surprisingly heavyweight operation and the same goal of disrupting the character sequence that the HTML parser will mistake for a closing tag can be achieved by using the javascript escape character to escape any character in the closing script tag:-
<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"></script\>');
</script>
The HTML parser will now not find the character sequence "</script>"
until it encounters the real closing script tag, but the internal
representation of the string is not affected by the use of the escape
character in the javascript source and no additional operations are
needed.
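A quick check that the escape really leaves the string unchanged: the three literals below are written differently in the source but compare as identical at runtime, because \> is not a recognised escape sequence in javascript and so simply stands for >.

```javascript
// Three source spellings of the same closing tag.
var plain = "</script>";             // would end an inline script element early
var concatenated = "</scr" + "ipt>"; // split so the HTML parser never sees it
var escapedGt = "</script\>";        // "\>" is just ">" to the javascript parser

// All three are identical strings once the javascript has been parsed.
var allEqual = (plain === concatenated) && (plain === escapedGt);
```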
However, as I said, it is the character sequence "</" that is
officially to be taken as terminating a script element's contents.
While no current browsers are known to be that strict, it is entirely
realistic that some browsers may exist (or be introduced) that take
the HTML specifications to heart and treat "</" as the end of the
script content.
But HTML validators already tend to take the HTML specification
seriously and will report many mark-up errors as a result of getting
the impression that a script element has terminated sooner than a
browser would think it had.
The above use of the escape character may placate all known browsers but it will not address the requirements of the HTML specification. But they can both be addressed by escaping a different character, specifically the forward slash:-
<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"><\/script>');
</script>
Of course now it is not just the closing script tag that needs to be
escaped but all occurrences of closing tags appearing in string
literals. All occurrences of "</" would need to be escaped to "<\/"
to completely avoid HTML parser and validation problems. Alternatively
the javascript source could be moved to an external file, as then it
is never examined by an HTML parser or considered in HTML validation.
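When the HTML inside such strings is generated rather than typed by hand, the escaping can be done mechanically. The hypothetical helper below (its name and the choice of single-quote delimiters are assumptions for illustration) turns any text into a javascript string literal whose source contains no "</" sequence, while the string value it represents is unchanged.

```javascript
// Hypothetical helper: writes text as a single-quoted javascript string
// literal that is safe to paste into an inline script element.
function toInlineStringLiteral(text) {
    var escaped = text
        .replace(/\\/g, "\\\\")          // backslashes first, so later escapes survive
        .replace(/'/g, "\\'")            // the delimiter quote
        .replace(/\r/g, "\\r")           // line terminators cannot appear raw
        .replace(/\n/g, "\\n")
        .replace(/\u2028/g, "\\u2028")
        .replace(/\u2029/g, "\\u2029")
        .replace(/<\//g, "<\\/");        // "</" becomes "<\/" in the source text
    return "'" + escaped + "'";
}
```

For example, toInlineStringLiteral('<script type="text/javascript" src="scriptFile.js"></script>') returns the source text of the literal used in the last example above.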
Placing javascript source code in external files has several advantages. For those who are required to use a browser that is javascript incapable/disabled it can significantly reduce download time, as those browsers will not bother getting the external file because they have no use for it, whereas scripts included on an HTML page must be downloaded with the page if the HTML is to be used.
External javascript files can also be cached separately from HTML pages so they may need to be downloaded less often even for the users of javascript capable/enabled browsers.
They entirely remove the need to worry about script hiding (no longer needed anyway), escaping HTML closing tags in strings or any other factors relating to the parsing of mark-up languages.
Javascript imported by using the src
attribute of a script element is
used in place of the content for the script element that imported it.
The position of that element in the page defines the
"location" of the script in the document. If the file
executes document.write
then any content written will be
inserted following the script element that imported the file, and any
other elements on the page referenced by that script as it loads will
need to have already been parsed by the HTML parser at that point or
they will not be found in the DOM.
Javascript files imported using the src
attribute of script elements
must contain only javascript source code. They must not contain any
HTML. It is a surprisingly common error for opening and closing script
tags and/or the "hide from older browsers" HTML comment
tags to be included in external script files. In that context they are
at best pointless and, in the case of the script tags, javascript
syntax errors.
Script elements may attempt to both import a file and contain script contents. The idea here is to provide some scripted action in the event that the external file cannot be loaded for some reason. Such a script element may look like:-
<script type="text/javascript" src="../scripts/scriptFile.js">
var externalScriptLoaded = false;
</script>
The intention is that the browser attempts to load the external file
and, only in the event of that attempt failing, executes the contents
of the script element instead (note, however, that HTML 4.01 itself
says that user agents must ignore the element's contents when the src
attribute has a URI value). So, in the example above, if the external
file is loaded and executed the contents of the element would not be
executed. That external file would itself define the
externalScriptLoaded global variable and assign it a value of boolean
true. If the file did not load, the contents would be executed, again
creating the externalScriptLoaded variable, but this time assigning it
a false value. Another script on the page can then read the
externalScriptLoaded variable as a means of determining whether the
external script loaded successfully.
The definition of failing to load an external script is centred
around HTTP. If no connection to the server can be made, or an HTTP
error response, such as 404, is returned, then the external script has
failed to load and the browser can execute the contents of the
element. However, many servers are set up in such a way that they
do not actually return the expected
HTTP error responses,
but instead return an HTML page that is intended to inform the user
of the error. This is fine for humans but from the point of view of
the browser such a response is indistinguishable from a returned
(but erroneous) javascript source file (This is in part because
the browser disregards content-type headers sent with external
javascript files so even if the HTML error reporting page is sent
with a text/html content type the browser will still assume that
it contains javascript source). The browser attempts to
execute the returned HTML source as javascript and fails at the
first inevitable syntax error. But erroring while executing what
the browser thought was an external javascript file does not
result in the execution of the code within the script element.
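Because of this failure mode, a script that reads externalScriptLoaded has to cope with three states rather than two: true, false, or never created at all (when neither the external file's assignment nor the element's contents ran). Reading an undeclared variable directly throws a ReferenceError, so this minimal sketch, assuming the flag is a global variable and therefore a property of the global object (window in browsers), reads it as a property instead. The function name externalScriptStatus is hypothetical.

```javascript
// Hypothetical helper: classifies the externalScriptLoaded flag without
// risking a ReferenceError when the variable was never created.
function externalScriptStatus(globalObject) {
    if (typeof globalObject.externalScriptLoaded == "undefined") {
        // Neither the external file nor the element's contents ran,
        // e.g. an HTML error page was executed as javascript and failed.
        return "unknown";
    }
    return globalObject.externalScriptLoaded ? "loaded" : "fallback";
}

// In a browser the global object is window:
// var status = externalScriptStatus(window);
```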
In practice script elements are rarely used where an external
file is imported and script contents are provided for the element.
If a separate script wanted to verify that an externally imported
script was available it would not need the mechanism demonstrated
in the example above as javascript provides many ways of verifying
the existence of javascript defined entities. So, for example, if
the external script defined a function called functionName, the
availability of that function could be verified as:-
if(typeof functionName == "function"){
    functionName();
}
- and if a function defined in an external file is available then that external file must have been successfully loaded.
The final place where javascript can be included in an HTML document is as the value strings provided for event handling attributes.
The values of event handling attributes will almost certainly need to be quoted because it is nearly impossible to write a javascript statement that only uses the characters allowed in an unquoted attribute value. And quoting can get quite involved in attribute values: the value must be quoted in the HTML source, so whichever type of quote mark is used in the HTML cannot appear within the javascript code provided as the value, because the HTML parser would take it as ending the attribute value. Javascript string literals, on the other hand, allow either double quotes or single quotes as delimiters, and allow the type of quote not used as the delimiter to appear within the string literal unescaped.
So, given a desire to assign the string "don't do
that"
to an element's value property in an onclick event,
because of the single quote appearing in the string itself the attribute
value onclick='this.value = "don't do that";'
will not work because the HTML parser will take the second single quote
as ending the attribute value. It will not work to simply escape the
single quote as onclick='this.value = "don\'t do
that";'
because the HTML parser doesn't know anything about
javascript escapes and still sees the second single quote in the middle
of the javascript string.
In this case either escaping the single quote and reversing the quoting
between the HTML and the javascript
(onclick="this.value = 'don\'t do that';"), or using a javascript hex
escape, which the HTML parser will not see as a quote
(onclick='this.value = "don\x27t do that";'), would solve the problem.
Quoting in event handling attribute values that contain string literals
often needs careful thought.
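The hex escape approach can be generalised. The hypothetical helper below (an illustration, not a standard API) writes any text as a single-quoted javascript string literal in which both quote characters, backslashes and ampersands are replaced by escapes, so the literal can be placed inside a double-quoted event handling attribute without the HTML parser seeing anything that ends the attribute value or starts a character reference.

```javascript
// Hypothetical helper: a javascript string literal containing no quote
// characters apart from its single-quote delimiters.
function toAttributeStringLiteral(text) {
    var escaped = text
        .replace(/\\/g, "\\\\")   // existing backslashes stay literal
        .replace(/'/g, "\\x27")   // single quote as a hex escape
        .replace(/"/g, "\\x22")   // double quote as a hex escape
        .replace(/&/g, "\\x26");  // "&" could start an HTML character reference
    return "'" + escaped + "'";
}

// Builds mark-up for a button whose onclick sets its own value.
// The attribute must be double-quoted, since the literal uses single quotes.
function buttonMarkup(label) {
    return '<input type="button" value="Click"' +
        ' onclick="this.value = ' + toAttributeStringLiteral(label) + ';">';
}
```

For the example above, buttonMarkup("don't do that") produces onclick="this.value = 'don\x27t do that';".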
All else being equal, web browsers seem to all default the scripting
language used with intrinsic events to javascript (ECMAScript, in
whichever implementation is provided) and there is no formal mechanism
for associating a scripting language with individual event handling
attributes (unlike script elements which must be provided with a
type
attribute).
The HTML specification calls for a page wide default scripting language to be set, and that is the only specified way to set the scripting language for intrinsic events.
To this end the HTML specification proposes including a META tag in
the HEAD section of a page:-
<meta http-equiv="Content-Script-Type" content="text/javascript">
This is supposed to assert the default type of script language on a
page (possibly overridden by the (required) type
attributes provided for individual script elements). As a result it
is formally correct to include this tag in HTML 4.01 documents
(or provide a corresponding HTTP header when the page is served).
However, there is no evidence that any current browsers pay any attention to
this META
element at all (or would have any interest in a corresponding
HTTP
header), but then there are not many browsers that can execute any
scripting language but javascript. This entire proposed mechanism
has also been subject to criticism, and many recommend disregarding
it entirely in favour of relying on the tendency of browsers to
default to interpreting intrinsic event code as javascript.
The general idea of a NOSCRIPT
element is to provide a
holder for HTML marked-up content
that will only be displayed when scripting is not enabled/available on
a web browser. At first sight this seems to be a useful idea, and a
contribution towards providing clean degradation in circumstances where
scripts cannot be executed: the element shows content that substitutes
for any content that would otherwise have been provided by a script.
However, SCRIPT
and NOSCRIPT
elements are not
actually directly substitutable in
HTML. That is, you
cannot use a NOSCRIPT
element in all of the contexts in
which you can use a SCRIPT
element and produce valid
HTML as a result.
The HTML DTDs categorise SCRIPT and NOSCRIPT differently: SCRIPT is
an %inline;, %special; and %head.misc; element, so it may appear in
the HEAD of a document (as a child of the HEAD element, via
%head.misc;), or in any context within the BODY that allows %inline;
or %special; content (many, but not all, contexts). The NOSCRIPT
element is categorised as %block;, and as a result it cannot appear in
the HEAD at all, and may only appear in the body in a context that
allows %block; content (%flow; or %block;, but not %inline;). This
means that the one cannot always stand as a direct substitute for the
other in a valid HTML document.
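The difference can be checked mechanically with a toy model of the DTD. The sketch below encodes only the fragments of the HTML 4.01 transitional DTD discussed in this section (the P, DD, DL and HEAD content models, and the %flow;, %inline;, %special;, %block; and %head.misc; categories, with everything else omitted); the data layout and the mayContain function are hypothetical.

```javascript
// Which parameter entities each entity includes (only the parts used here).
var ENTITY_INCLUDES = {
    "flow": ["block", "inline"],
    "inline": ["special"]
};

// Entities whose definitions list each element directly.
var ELEMENT_ENTITIES = {
    "SCRIPT": ["special", "head.misc"],
    "NOSCRIPT": ["block"]
};

// Content models: entity names (lower case) or element names (upper case).
// Simplified: HEAD also holds TITLE and BASE, omitted here.
var CONTENT_MODELS = {
    "P": ["inline"],
    "DD": ["flow"],
    "DL": ["DT", "DD"],
    "HEAD": ["head.misc"]
};

// Expands a content model into every name it allows, following includes.
function expand(names) {
    var seen = {};
    var stack = names.slice();
    while (stack.length) {
        var name = stack.pop();
        if (seen[name]) continue;
        seen[name] = true;
        var more = ENTITY_INCLUDES[name] || [];
        for (var i = 0; i < more.length; i++) stack.push(more[i]);
    }
    return seen;
}

// True when the parent's content model permits the child element.
function mayContain(parent, child) {
    var allowed = expand(CONTENT_MODELS[parent] || []);
    if (allowed[child]) return true;            // named directly, e.g. DL > DD
    var entities = ELEMENT_ENTITIES[child] || [];
    for (var i = 0; i < entities.length; i++) {
        if (allowed[entities[i]]) return true;  // allowed via a category
    }
    return false;
}
```

In this model mayContain("P", "SCRIPT") is true but mayContain("P", "NOSCRIPT") is false, which is exactly the case where one element cannot stand in for the other.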
HTML NOSCRIPT
elements probably seemed like a good idea when they were first introduced.
They were probably even viable at the time because so few browsers were
able to execute javascript that a division between SCRIPT
and NOSCRIPT
could encompass all of the possibilities. The
problem with them now is the diversity of javascript capable web
browsers, with their differing object models and language implementations.
While it remains the case that any browser on which scripting is disabled
or unavailable will use any NOSCRIPT
elements provided in
an
HTML page, it is not the
case that all javascript supporting and enabled browsers will be able
to successfully execute any script specified within (or imported by) a
SCRIPT
element. The browser may lack the features needed
by the script, or just not be sufficiently dynamic to present any
content that the script intends to insert into the document.
Even browser features as seemingly universal as the
document.write
function are not universally supported
(even on modern browsers), and anything even remotely dynamic is bound
to fail somewhere. So instead of having to cope with two certain
outcomes, successful execution and no script execution at all, it is
actually necessary to cope with 3 possible outcomes, adding the
possibility that scripting is supported by the browser but the features
required by any individual script are not available. In that third
case the script fails to provide what it was intended to provide, but
the contents of the NOSCRIPT
elements are not presented
either.
This effectively renders NOSCRIPT
elements next to
useless when it comes to providing clean degradation. They leave
an unbridgeable gap between browsers unwilling or unable to execute
scripts at all and browsers that will fully support any given script.
And whatever content seemed to make sense within those
NOSCRIPT
elements must also make sense in the context
of a javascript capable browser that does not support the features
required by a script.
Recognising a requirement for clean degradation in script design,
and the inability of NOSCRIPT
elements to contribute
towards facilitating it, many recommend never using NOSCRIPT
elements, instead providing content that
works in place of active script support within the
HTML and then
having their scripts remove, or transform by manipulation, that
content only when the browser proves sufficiently supportive for
the script to be viable. This technique allows the design to only
consider two conditions; the browser fully supports the script
and will execute it, or the browser does not support the scripts
so whatever was originally included in the
HTML will be
what the user is exposed to.
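The approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the enhance function, the "staticMenu" id and the buildDynamicMenu callback are hypothetical. The page's HTML holds working fallback content, and the script replaces it only after every feature it needs has been tested, so a browser that fails any test is left with the original HTML.

```javascript
// Hypothetical sketch: replace fallback HTML only when the browser has
// proved able to support the script.
function enhance(doc, fallbackId, buildReplacement) {
    // Feature tests first: if any fail, leave the page exactly as authored.
    if (!doc || !doc.getElementById || !doc.createElement) {
        return false;
    }
    var fallback = doc.getElementById(fallbackId);
    if (!fallback || !fallback.parentNode || !fallback.parentNode.replaceChild) {
        return false;
    }
    // The builder may itself decide the browser is not capable enough.
    var replacement = buildReplacement(doc);
    if (!replacement) {
        return false;
    }
    fallback.parentNode.replaceChild(replacement, fallback);
    return true;
}

// In a page, after the element with id "staticMenu" has been parsed:
// enhance(document, "staticMenu", buildDynamicMenu);
```

Only two outcomes reach the user: the script's version of the content, or the untouched HTML.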