In Java, Sign extension will kick in if we are doing the following stuff:
- Widening primitive conversion – Cast from one type to another, which involves increasing the bits of the original type, for example, cast
byte (8 bits)
toint (32 bits)
. - Bitwise right shift by n bit
>> n
.
The sign extension is an operation of increasing the bits while preserves the number’s sign (positive and negative). The Sign extension fills the increased bits with the original of the most significant bit (msb) or the leftmost bit, except one condition – casting an unsigned char
to other types, please scroll down to the 3. Java Multicast
below for explanation.
This article also explains the famous Java multicast problem in the below code. Can you answer it?
byte b = -1; // twos complement char c = (char) (b); // byte -> int -> char (widening and narrowing type) (sign extension) System.out.println((int)c); // char -> int (zero extension), output = 65535 c = (char) (b & 0xff); // byte -> int (sign extension) System.out.println((int) c); // char -> int (zero extension), output = 255
1. Java Sign Extension – Widening Primitive Conversion
For example, we cast a byte
(8 bits) to int
(32 bits).
Positive Number
1.1 Here is a byte 10
in binary format, 8 bits (1 byte).
0000 1010 = 10
If we cast/widen the byte
(8 bits) to an int
(32 bits, 4 bytes), what are the values (1 or 0) for the extra bits?
# byte, 1 byte, 8 bits
0000 1010 = 10
# int, 4 bytes, 32 bits
???? ???? | ???? ???? | ???? ???? | 0000 1010 = 10
The answer is sign extension; it depends on the “original” most significant bit (msb) or the leftmost bit. In the above example, the msb of byte 10
or 0000 1010
is 0
, and the sign extension fills all the extra unknown bits with zero.
{0}000 1010 , {0} = most significant bit or leftmost bit.
After we cast the byte 10
to int
, the result is still 10
.
0000 0000 | 0000 0000 | 0000 0000 | 0000 1010 = 10
byte aByte = 10;
int aByte1 = (int) aByte;
System.out.println(aByte1); // output = 10
Negative Number
1.2 For byte -10
, Java uses two’s complement to represent the negative value.
# two's complement formula
0000 1010 = 10
1111 0101 (invert bits)
1111 0110 (add 1)
1111 0110 = -10
If we cast/widen the byte -10
(8 bits) to int
(32 bits, 4 bytes), what are the values (1 or 0) for the extra bits?
???? ???? | ???? ???? | ???? ???? | 1111 0110 = -10
For byte -10
or 1111 0110
, the most significant bit of 1111 0110
is 1
, and the Sign extension fills all the unknown bits with one.
{1}111 0110 , {1} = most significant bit or leftmost bit.
After we cast the byte -10
to int
, the result is still -10
.
1111 1111 | 1111 1111 | 1111 1111 | 1111 0110
For two’s complement (leftmost 1 is negative, 0 is positive)
# two's complement formula
1111 1111 | 1111 1111 | 1111 1111 | 1111 0110 (this is a negative number)
1111 1111 | 1111 1111 | 1111 1111 | 1111 0101 (sub 1)
0000 0000 | 0000 0000 | 0000 0000 | 0000 1010 (invert bits)
0000 0000 | 0000 0000 | 0000 0000 | 0000 1010 = 10
1111 1111 | 1111 1111 | 1111 1111 | 1111 0110 = -10
byte aByte = -10;
int aByte1 = (int) aByte;
System.out.println(aByte1); // output = -10
The hard part is understand the two’s complement.
2. Java Sign Extension – Bitwise Right Shift >>
The bitwise shift operator >>
, also called arithmetic right shift, it preserves the sign (positive or negative numbers) after bits shift.
For example, here is an integer 10. In Java, int
is 32 bits, 4 bytes.
0000 0000 | 0000 0000 | 0000 0000 | 0000 1010
And we perform a right shift by 3 bits, 10 >> 3
. For integer 10
or {0=msb}0001010
, the most significant bit is zero, and the sign extension fills all the unknown bits with zero.
???0 0000 | 0000 0000 | 0000 0000 | 0000 0001 | 010 >> 3
The answer to 10 >> 3
is 1
.
0000 0000 | 0000 0000 | 0000 0000 | 0000 0001 = 1
System.out.println(10>>3); // output = 1
Further Reading
3. Java Multicast – Sign Extension and Zero Extension.
So far, everything looks simple and straightforward. The real challenge is multicast, and it makes everything complicated.
Review the following Java multicast example (I found this code snippet from a forum if I do not remember wrong, the original is from the Java Puzzle book).
byte b = -1; // twos complement char c = (char) (b); // byte -> int -> char (widening and narrowing type) (sign extension) System.out.println((int)c); // char -> int (zero extension), output = 65535 c = (char) (b & 0xff); // byte -> int (sign extension) System.out.println((int) c); // char -> int (zero extension), output = 255
The above example involved multicast, from byte
-> int
-> char
-> int
, two’s complement, widening primitive conversion, narrowing primitive conversion, sign extension, zero extension, and also bitwise masking. Let break it down one by one.
3.1 Java uses two’s complement to represent negative.
0000 0001 = 1
1111 1110 = (invert bits)
1111 1111 = (add 1)
1111 1111 = -1 in twos complement
byte b = 1111 1111
3.1 In Java byte
is a signed byte, 8 bits; while char
is an unsigned 2 bytes 16 bits. Java unable to cast a byte
to a char
directly (unable to cast a signed type to an unsigned type), so it cast/widen the byte
to an int
first and narrow down to char
.
byte b = -1;
char c = (char) (b); = byte -> int -> char
1111 1111 = byte (8 bits)
1111 1111 | 1111 1111 | 1111 1111 | 1111 1111 = int (32 bits, sign extension)
1111 1111 | 1111 1111 = char (16 bits)
char c = 1111 1111 | 1111 1111
3.2 This (int)c
is interesting; it cast a char
2 bytes to an int
4 bytes. First, we may think the Sign extension will perform and fills the increased bit with the original most significant bit, which is one.
char c = 1111 1111 | 1111 1111
int(c) = ???? ???? | ???? ???? | 1111 1111 | 1111 1111
int(c) = 1111 1111 | 1111 1111 | 1111 1111 | 1111 1111 (sign extension) ??? but why the output is 65535?
The answer is no, in the above case, Java uses zero extension to fill the increased bit with ZERO.
Char -> Any type = Zero Extension
Please remember
char
is an unsigned type, if we cast an unsigned typechar
to any other types,zero extension
is used, not “Sign extension”.
What is Zero Extension?
The zero extension fills all the increased bits with ZERO. This zero extension also applied to the Logical right shift operator
>>>
.
Below is the correct answer to int(c)
.
char c = 1111 1111 | 1111 1111
int(c) = ???? ???? | ???? ???? | 1111 1111 | 1111 1111
int(c) = 0000 0000 | 0000 0000 | 1111 1111 | 1111 1111 (zero extension on unsigned char), the output is 65535.
3.4 Now, we look at the following statement.
c = (char) (b & 0xff); // mask, int -> char
System.out.println((int) c); // char -> int, output: 255, but why?
Rewrite it to
int i = (b & 0xff)
c = (char)i
System.out.println((int) c);
The b & 0xff
will return a int
. Java cast the byte
to int
and performs a bitwise &
with this 0xff
, aka masking.
byte b = 1111 1111
int = 1111 1111 | 1111 1111 | 1111 1111 | 1111 1111 (sign extension)
&
0xff = 0000 0000 | 0000 0000 | 0000 0000 | 1111 1111
int i = 0000 0000 | 0000 0000 | 0000 0000 | 1111 1111
c = (char)i = 0000 0000 | 1111 1111 (narrow down, int -> char)
The last one (int) c
, cast an unsigned char
to int
, zero extension.
c = 0000 0000 | 1111 1111
(int)c = ???? ???? | ???? ???? | 0000 0000 | 1111 1111 (zero extension)
(int)c = 0000 0000 | 0000 0000 | 0000 0000 | 1111 1111
In two’s complement, first leftmost defined signed, 1 is a negative number, 0 is a positive number.
0000 0000 | 0000 0000 | 0000 0000 | 1111 1111
= 1x2^0 +... 1x2^8
1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 = 255
Done. Thanks for reading this long article.
Note
Let me know if you found any typos, errors, or wrong explanation.