Safe, Unsafe, Reserved & Unreserved Characters in URLs


Nicolas Lule, Web Developer in Chicago, IL
September 5, 2022

When building websites or web applications, we eventually need to structure our web pages with proper, consistent URL naming conventions.

Standards already define URL syntax and semantics so that we can locate and access resources on the internet. Still, some characters need special care when we apply our naming conventions, to avoid conflicts and security vulnerabilities.

In this blog post, I’ll go through the safe, unsafe, reserved & unreserved characters in Uniform Resource Locators (URLs) and Uniform Resource Identifiers (URIs).

Safe Characters in URLs

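As the name implies, safe characters can be used in URLs without any encoding. Safe characters include decimal digits, lowercase and uppercase letters, and reserved characters when they are used for their reserved purpose. (The other unreserved characters covered below, such as the hyphen and tilde, never need encoding either.)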

  • Decimal digits – “0 – 9”
  • Lowercase letters – “a – z”
  • Uppercase letters – “A – Z”
  • Reserved characters (when used for their reserved purposes)
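To see reserved characters used for their reserved purpose, here is a minimal sketch, assuming Python 3 and its standard `urllib.parse.urlsplit`. The example URL is hypothetical; the point is that ":", "/", "@", "?", "&", "=" and "#" appear unencoded because each one acts as a delimiter.

```python
from urllib.parse import urlsplit

# Hypothetical URL in which reserved characters act as delimiters:
# ":" and "//" after the scheme, "@" before the host, ":" before the port,
# "/" between path segments, "?" before the query, "&" and "=" inside the
# query, and "#" before the fragment.
url = "https://user@example.com:8080/docs/page?q=1&lang=en#section"

parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.username)  # user
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/page
print(parts.query)     # q=1&lang=en
print(parts.fragment)  # section
```

The parser can only split the URL this way because those characters are left unencoded where they serve as delimiters.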

Reserved Characters in URLs

Reserved characters have specific functions in a URL, such as separating the scheme, host, path, query, and fragment. When a reserved character appears as data rather than for its reserved purpose, it must be percent-encoded (for example, “/” becomes “%2F”).

  • Colon – “:”
  • Question mark – “?”
  • Pound – “#”
  • Left square bracket – “[”
  • Right square bracket – “]”
  • ‘At’ symbol – “@”
  • Exclamation mark – “!”
  • Dollar sign – “$”
  • Ampersand – “&”
  • Forward slash – “/”
  • Asterisk – “*”
  • Plus sign – “+”
  • Comma – “,”
  • Semicolon – “;”
  • Equals sign – “=”
  • Apostrophe – “'”
  • Left parenthesis – “(”
  • Right parenthesis – “)”
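A minimal sketch of encoding reserved characters that appear as data, assuming Python 3 and the standard `urllib.parse.quote` and `urlencode` functions. The album name, the query parameters, and `example.com` are hypothetical values chosen for illustration.

```python
from urllib.parse import quote, urlencode

# A hypothetical path segment containing reserved characters as *data*:
# "/" and "?" here must not be read as path or query delimiters.
segment = "AC/DC? Live"
encoded_segment = quote(segment, safe="")  # safe="" so "/" is encoded too
print(encoded_segment)  # AC%2FDC%3F%20Live

# Query values: an unencoded "&" or "=" would split the query string.
query = urlencode({"q": "rock & roll", "sort": "a=z"})
print(query)  # q=rock+%26+roll&sort=a%3Dz

# The delimiters "/" and "?" between the parts stay unencoded.
url = f"https://example.com/albums/{encoded_segment}?{query}"
print(url)
```

Note that `urlencode` follows the HTML form convention and writes spaces as “+”, while `quote` writes them as “%20”.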

Unreserved Characters in URLs

Characters that have no reserved purpose and are allowed in a URL are called unreserved. They include the hyphen, period, underscore, tilde, decimal digits, and uppercase and lowercase letters, and they never need to be percent-encoded.

  • Hyphen – “-”
  • Period – “.”
  • Underscore – “_”
  • Tilde – “~”
  • Decimal digits – “0 – 9”
  • Uppercase letters – “A – Z”
  • Lowercase letters – “a – z”
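The reserved and unreserved lists above match the character sets defined in RFC 3986 (sections 2.2 and 2.3). A minimal sketch of a classifier built from those sets, assuming Python 3.7 or later (from 3.7 on, `quote` treats “~” as unreserved). The `classify` helper is hypothetical, written only for this illustration.

```python
import string
from urllib.parse import quote

# Unreserved characters (RFC 3986, section 2.3).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Reserved characters (RFC 3986, section 2.2), split into its two groups.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def classify(ch: str) -> str:
    """Hypothetical helper: report which group a single character is in."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    # Anything else must always be percent-encoded.
    return "other"


for ch in ["a", "7", "~", "/", "&", " ", "%"]:
    print(repr(ch), classify(ch))

# quote() leaves unreserved characters untouched, even with safe="".
print(quote("Az09-._~", safe=""))  # Az09-._~
```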

Unsafe Characters in URLs

Characters can be unsafe in URLs for different reasons. Some systems and programs modify certain characters, which can break links or introduce security vulnerabilities. For example, the space character is unsafe because in free text a space usually marks the end of a URL, and significant spaces may be added or removed by programs such as word processors. The “#” character is unsafe when it appears as data because it separates a URL from a fragment (anchor) identifier that may follow it, and “%” is unsafe because it introduces percent-encoded sequences. Unsafe characters should always be percent-encoded.

  • Space – “ ”
  • Quotation mark – “"”
  • Less-than sign – “<”
  • Greater-than sign – “>”
  • Percent sign – “%”
  • Left curly bracket – “{”
  • Right curly bracket – “}”
  • Vertical bar – “|”
  • Backslash – “\”
  • Circumflex or caret – “^”
  • Backtick or grave accent – “`”
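A minimal sketch of percent-encoding the unsafe characters listed above, assuming Python 3 and the standard `urllib.parse.quote` and `unquote` functions. The sample string is hypothetical.

```python
from urllib.parse import quote, unquote

# The unsafe characters from the list above.
UNSAFE = [" ", '"', "<", ">", "%", "{", "}", "|", "\\", "^", "`"]

for ch in UNSAFE:
    # safe="" so no character is exempt from encoding.
    print(repr(ch), "->", quote(ch, safe=""))

# A hypothetical path segment containing unsafe characters.
raw = "50% off <today>"
encoded = quote(raw, safe="")
print(encoded)  # 50%25%20off%20%3Ctoday%3E

# Decoding restores the original text.
assert unquote(encoded) == raw
```

Encode raw text exactly once: running `quote` on an already-encoded string turns “%20” into “%2520”, because the “%” itself gets encoded again.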

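Following proper web standards and protocols, including percent-encoding any character that is unsafe or used outside its reserved purpose, is essential to reduce vulnerabilities, keep URLs consistent across the web, and make sure links work as intended.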