URLs are just one kind of Uniform Resource Identifier (URI). Formally speaking, the URL specification (RFC 1738) is obsolete and has been replaced by the URI specification (RFC 3986). In practical terms, however, it is still useful, since it is much easier to understand than the URI specification.
This piece is just a short cut-and-paste summary of the obsolete RFC 1738 specification.
According to RFC 1738, URLs are written as follows:
<scheme>:<scheme-specific-part>
A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
A scheme refers to an Internet protocol such as HTTP, Telnet or email. This is why one could also write:
<protocol>:<protocol-specific-part>
Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
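The scheme rules above can be sketched in a few lines of Python. This is only an illustration of the RFC 1738 wording quoted here, not a full URL parser; the names `SCHEME_RE` and `split_url` are made up for this example.

```python
import re

# Characters allowed in a scheme name according to the summary above:
# letters "a"-"z", digits, "+", "." and "-". Upper case is accepted too
# (re.IGNORECASE) and folded to lower case afterwards.
SCHEME_RE = re.compile(r"[a-z0-9+.-]+", re.IGNORECASE)


def split_url(url):
    """Split a URL into (scheme, scheme_specific_part) at the first colon."""
    scheme, colon, rest = url.partition(":")
    if not colon or not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"no valid scheme in {url!r}")
    # Treat "HTTP" and "http" as equivalent by normalizing to lower case.
    return scheme.lower(), rest


print(split_url("HTTP://www.example.org/index.html"))
print(split_url("mailto:someone@example.org"))
```

The host and mail address used here are placeholders. How the second element is interpreted is left to the scheme, exactly as the specification says.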
Each scheme (protocol) further defines specific parts, e.g. see HTTP Scheme below.
Do not use the following characters unless you know what you are doing.
Many URL schemes reserve certain characters for a special meaning, e.g. ";", "/", "?", ":", "@", "=" and "&".
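If one of these characters has to appear as plain data inside a URL, the usual approach is to percent-encode it ("%" followed by two hexadecimal digits). A minimal sketch using Python's standard `urllib.parse` module follows; the helper name `encode_component` and the sample string are assumptions made for this illustration.

```python
from urllib.parse import quote, unquote

# The reserved characters listed above.
RESERVED = ";/?:@=&"


def encode_component(text):
    """Percent-encode text so that no reserved character is left as-is."""
    # safe="" tells quote() to escape "/" as well (by default it is kept).
    return quote(text, safe="")


encoded = encode_component("rock&roll/50:50")
print(encoded)           # rock%26roll%2F50%3A50
print(unquote(encoded))  # rock&roll/50:50
```

Encoding a whole URL this way would also escape the separators that are meant to be there, so the function is intended for a single component (for example one value in a search part) before it is put into the URL.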
http      Hypertext Transfer Protocol
ftp       File Transfer Protocol
mailto    Electronic mail address
news      USENET news
nntp      USENET news using NNTP access
telnet    Reference to interactive sessions
file      Host-specific file names
prospero  Prospero Directory Service
gopher    The Gopher protocol
wais      Wide Area Information Servers
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
If :<port> is omitted, the port defaults to 80. No user name or password is allowed. <path> is an HTTP selector, and <searchpart> is a query string. The <path> is optional, as is the <searchpart> and its preceding "?". If neither <path> nor <searchpart> is present, the "/" may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.
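As an illustration, the HTTP form can be taken apart with `urlsplit` from Python's standard library. The sketch below is one possible reading of the rules above, not a reference implementation: the function name `http_parts`, the host `www.example.org`, and the choice to report an omitted path as "/" are all assumptions of this example.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when :<port> is omitted


def http_parts(url):
    """Return host, port, path and searchpart of an http URL as a dict."""
    parts = urlsplit(url)  # urlsplit lower-cases the scheme itself
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url!r}")
    return {
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        # Assumption: an omitted path is reported as "/".
        "path": parts.path or "/",
        "searchpart": parts.query,
    }


print(http_parts("http://www.example.org:8080/docs/index.html?q=url"))
print(http_parts("http://www.example.org"))
```

In the second call neither <path> nor <searchpart> is given, so the port falls back to 80 and the search part is an empty string.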