Java Characters

The char data type represents letters (both uppercase and lowercase), digits, and other symbols. A character literal is a single symbol enclosed in single quotes.

char lowerCaseLetter = 'a';
char upperCaseLetter = 'Q';
char number = '1';
char space = ' ';
char dollar = '$';

This type can represent characters from most major languages as well as many special and technical symbols. A char is a 16-bit value that holds one code unit of the Unicode UTF-16 format; characters outside this 16-bit range, such as many emoji, need two char values. Unicode is a character encoding standard that assigns a unique number to every character, regardless of language or computer platform. This matters in a global, networked world and for computer systems that must accommodate multiple languages and special characters, because Unicode unifies all of them in a single standard.

Initializing characters with codes

A character can also be created using its hexadecimal code from the Unicode table. The code starts with \u and is followed by exactly four hexadecimal digits.

char ch = '\u0040'; // it represents '@'
System.out.println(ch); // prints @

Although we use a sequence of characters to represent such a code, the code itself represents exactly one single character.

As an example, capital Latin letters have hexadecimal codes from '\u0041' to '\u005A', and small Latin letters have codes from '\u0061' to '\u007A'.
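
Because these letters occupy contiguous code ranges, a range comparison is enough to classify a character. The following is a minimal sketch; the class name LatinLetterRanges and both helper methods are hypothetical names chosen for illustration, not part of the standard library.

```java
class LatinLetterRanges {
    // Hypothetical helper: true if c lies in the range '\u0041'..'\u005A' (A..Z).
    static boolean isUpperLatin(char c) {
        return c >= '\u0041' && c <= '\u005A';
    }

    // Hypothetical helper: true if c lies in the range '\u0061'..'\u007A' (a..z).
    static boolean isLowerLatin(char c) {
        return c >= '\u0061' && c <= '\u007A';
    }

    public static void main(String[] args) {
        System.out.println(isUpperLatin('Q')); // true
        System.out.println(isLowerLatin('Q')); // false
        System.out.println(isLowerLatin('a')); // true
        System.out.println(isUpperLatin('1')); // false
    }
}
```

For general text, the standard methods Character.isUpperCase and Character.isLowerCase are usually the better choice, since they also cover letters outside the basic Latin range.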

The char type has a minimum value of '\u0000' and a maximum value of '\uffff'.

It is also possible to initialize a char with an integer constant in the range from 0 to 65535.

char ch = 64;
System.out.println(ch); // prints @

The decimal number 64 is the same value as the hexadecimal Unicode code '\u0040'.

Any char variable may be considered as an unsigned integer value ranging from 0 to 65535.
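
The correspondence between a char and its numeric code can be observed directly. The sketch below assumes a hypothetical class CharAsNumber with a helper codeOf; Character.MIN_VALUE, Character.MAX_VALUE, and Integer.toHexString are standard library members.

```java
class CharAsNumber {
    // Hypothetical helper: the numeric code of a character.
    static int codeOf(char c) {
        return c; // char widens to int automatically, no cast needed
    }

    public static void main(String[] args) {
        int code = codeOf('@');
        System.out.println(code);                      // 64
        System.out.println(Integer.toHexString(code)); // 40

        char fromCode = (char) 97; // going from int back to char needs a cast
        System.out.println(fromCode);                  // a

        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535
    }
}
```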

Retrieving subsequent characters

You can add (+) and subtract (-) integer numbers to get the next and previous characters in Unicode order.

char ch = 'b';
ch += 1; // Changes value of ch to 'c' by adding 1
ch -= 2; // Changes value of ch to 'a' by subtracting 2

It is also possible to add or subtract two characters.

char ch = 'b';
ch += 'a';
ch -= 'b';
System.out.println(ch); // prints 'a' without quotes

These operations actually work on the underlying Unicode codes of the characters. In the example above, 'b' has code 98 and 'a' has code 97, so the result is 98 + 97 - 98 = 97, which is the code of 'a'.
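
One detail worth seeing in code is that the result of + or - on a char is an int, so it has to be cast back to char, while compound assignments such as += perform that cast implicitly. This is a minimal sketch; the class CharArithmetic and the helpers shift and alphabetIndex are hypothetical names used only for illustration.

```java
class CharArithmetic {
    // Hypothetical helper: the character n positions after c in Unicode order.
    static char shift(char c, int n) {
        return (char) (c + n); // c + n is an int, so a cast back to char is needed
    }

    // Hypothetical helper: zero-based position of a lowercase Latin letter.
    static int alphabetIndex(char letter) {
        return letter - 'a';
    }

    public static void main(String[] args) {
        char ch = 'b';
        // ch = ch + 1;  // would not compile: an int cannot be assigned to a char
        ch += 1;          // compiles: compound assignment casts the result to char
        System.out.println(ch);                 // c
        System.out.println(shift('a', 3));      // d
        System.out.println(alphabetIndex('d')); // 3
        System.out.println('b' + 'a');          // 195, the sum of codes 98 and 97
    }
}
```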

It is possible to use increment (++) and decrement (--) operators in prefix and postfix forms.

char ch = 'A';
ch += 10;
System.out.println(ch);   // prints 'K'
System.out.println(++ch); // prints 'L'
System.out.println(++ch); // prints 'M'
System.out.println(--ch); // prints 'L'
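
The example above uses only the prefix form. The sketch below contrasts it with the postfix form, which yields the old value before changing the variable. The class IncrementForms and its trace method are hypothetical and exist only to make the order of effects visible.

```java
class IncrementForms {
    // Hypothetical helper: records what each expression evaluates to, in order.
    static String trace(char start) {
        char ch = start;
        StringBuilder out = new StringBuilder();
        out.append(ch++); // postfix: appends the old value, then increments
        out.append(ch);   // the incremented value
        out.append(++ch); // prefix: increments first, then appends
        out.append(ch--); // postfix: appends the current value, then decrements
        out.append(ch);   // the decremented value
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace('A')); // ABCCB
    }
}
```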

Escape sequences

Some special characters are written with a backslash \ followed by another symbol. These are known as escape sequences. They stand either for control characters that have no printable symbol, such as a newline, or for symbols that would otherwise be misread inside a literal, such as a quote mark or the backslash itself. Although we write a pair of symbols, a program treats the pair as exactly one character with the corresponding code.

  • '\n' is the newline character;
  • '\t' is the tab character;
  • '\r' is the carriage return character;
  • '\\' is the backslash character itself;
  • '\'' is the single quote mark;
  • '\"' is the double quote mark.

Here are several examples:

System.out.print('\t'); // makes a tab
System.out.print('a');  // prints 'a'
System.out.print('\n'); // goes to the new line
System.out.print('c');  // prints 'c'

This code prints:

  a
c

There is also a character to represent a single space ' '. It is just a regular character, not an escape sequence.
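
To confirm that an escape sequence really is a single character, we can look at its numeric code and at the length of a string that contains it. This is a small sketch; the class EscapeDemo and the helper codeOf are hypothetical names.

```java
class EscapeDemo {
    // Hypothetical helper: the numeric code of a character.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('\n'));    // 10
        System.out.println(codeOf('\t'));    // 9
        System.out.println("a\tb".length()); // 3: the tab counts as one character

        // Inside a string literal, the double quote and the backslash must be escaped.
        System.out.println("He said \"hi\" and typed a \\ and a \'.");
    }
}
```

The last statement prints the text with plain quote marks and a single backslash, since each escape sequence is replaced by the one character it stands for.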

Conclusion

The char type can represent characters from most languages as well as special symbols, thanks to its Unicode support. Characters can be initialized using their hexadecimal Unicode codes, which are written as \u followed by four hexadecimal digits, or using integer numbers. The char type has a minimum value of '\u0000' and a maximum value of '\uffff'. Characters can be manipulated using arithmetic operations to retrieve subsequent or previous characters based on their Unicode order. Escape sequences, which start with a backslash \, represent control characters and symbols that cannot be written directly inside a literal. Understanding the char type and its capabilities is essential for working with textual data in Java.
