Another Python built-in type that's important to understand is the bytes class. It provides an immutable sequence of integer values from 0 through 255, typically representing the encoded bytes of something like a string. One way to create a bytes object is to use what's called a bytes literal. Just as a raw string has an r prefix, a bytes literal has a b prefix before a quoted string. That string can only contain ASCII characters and escaped hexadecimal values, like the hexadecimal c2 and a9 that we see here. And you can see that when you print out a bytes literal, it shows basically the same representation that you used to create it.
# The bytes class provides an immutable sequence
# Values must be integers from 0-255 to represent a byte
bytes_literal = b'Copyright \xc2\xa9'
print('bytes_literal =', bytes_literal)
print('bytes_literal.decode() ->', bytes_literal.decode())
print('bytes_literal.decode("utf-8") ->', bytes_literal.decode('utf-8'))
print('bytes_literal.decode("utf-16") ->', bytes_literal.decode('utf-16'))

str_literal = 'Trademark ®'
bytes_encoded = str_literal.encode()
print('bytes_encoded =', bytes_encoded)
print('bytes_encoded.decode() ->', bytes_encoded.decode())
print('bytes(str_literal) ->', bytes(str_literal, 'utf-8'))

bytes_construct = bytes(str_literal, 'utf-8')
print('bytes_construct.decode() ->', bytes_construct.decode())

bytes_from_hex = bytes.fromhex('54 72 61 64 65 6d 61 72 6b 20 c2 ae')
print('bytes_from_hex.decode() ->', bytes_from_hex.decode())

# A bytes sequence behaves similarly to a string
print('str_literal.count("T") ->', str_literal.count('T'))
print('str_literal.index("T") ->', str_literal.index('T'))

# However, byte values are used instead of string values
print('bytes_encoded.count(0x54) ->', bytes_encoded.count(0x54))
print('bytes_encoded.index(0x54) ->', bytes_encoded.index(0x54))

The output:
bytes_literal = b'Copyright \xc2\xa9'
bytes_literal.decode() -> Copyright ©
bytes_literal.decode("utf-8") -> Copyright ©
bytes_literal.decode("utf-16") -> (in this instance the output shows a set of Asian characters)
bytes_encoded = b'Trademark \xc2\xae'
bytes_encoded.decode() -> Trademark ®
bytes(str_literal) -> b'Trademark \xc2\xae'
bytes_construct.decode() -> Trademark ®
bytes_from_hex.decode() -> Trademark ®
str_literal.count("T") -> 1
str_literal.index("T") -> 0
bytes_encoded.count(0x54) -> 1
bytes_encoded.index(0x54) -> 0
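To make the "immutable sequence of integers from 0 through 255" idea a bit more concrete, here's a small sketch. The variable names are just for illustration and aren't part of the lesson's listing.

```python
# Illustrative sketch only; these names are not from the lesson's listing.
data = b'Copyright \xc2\xa9'

# Indexing a bytes object yields an int, not a one-character string.
first_value = data[0]     # the byte for 'C'
last_value = data[-1]     # the byte 0xa9

# Slicing yields another bytes object.
first_four = data[:4]

# bytes is immutable, so item assignment raises TypeError.
try:
    data[0] = 0x63
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Values outside 0-255 are rejected when building bytes from integers.
try:
    bytes([256])
    range_check_failed = False
except ValueError:
    range_check_failed = True

print(first_value, last_value, first_four)
print(assignment_failed, range_check_failed)
```

If you need a sequence of bytes you can change in place, the built-in bytearray type is the mutable counterpart.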
The bytes literal is literally storing the bytes for each of those characters. The decode method turns those bytes back into a string, and by default it uses UTF-8. So if we call decode on that bytes literal without specifying an encoding, we see that it gets decoded, and that hexadecimal c2 a9 becomes our copyright symbol. You can also decode explicitly with a particular encoding by naming it, such as UTF-8 or UTF-16. Now, if you don't decode with the same encoding that was used to encode, this can cause problems. In UTF-8, again, we get the word Copyright along with the normal copyright symbol. But in UTF-16, the same bytes get paired up into unrelated characters, which here look like some kind of Asian character set, and they don't carry any real meaning.
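Here's a rough sketch of what an encoding mismatch can look like in practice. The choice of Latin-1 and ASCII as the "wrong" codecs is mine, purely for illustration, and the variable names are hypothetical.

```python
# Illustrative sketch; the "wrong" codecs below were picked just for the demo.
text = 'Copyright ©'
encoded = text.encode('utf-8')   # same bytes as b'Copyright \xc2\xa9'

# Matching codec: the round trip recovers the original text.
round_trip = encoded.decode('utf-8')

# Mismatched codec that doesn't fail: Latin-1 maps every byte to one
# character, so the two-byte copyright sequence becomes two characters.
mojibake = encoded.decode('latin-1')

# Mismatched codec that does fail: ASCII can't decode bytes above 0x7f.
try:
    encoded.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# errors='replace' swaps each undecodable byte for U+FFFD instead of raising.
replaced = encoded.decode('ascii', errors='replace')

print(round_trip)
print(mojibake)
print(ascii_failed)
print(replaced)
```

The thing to notice is that a mismatch doesn't always raise an error. Sometimes you just quietly get the wrong characters.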
Another way to go about creating bytes objects is to take a string literal, like this string which has the word Trademark and the registered trademark symbol. Calling the encode method on it creates a bytes object. And we can see that bytes_encoded is a bytes object, displayed with that b prefix before the quoted string. Now that we have encoded those bytes from that string literal, we can decode them again, in the default UTF-8, and we see the trademark symbol. A third way to go about creating a bytes object is to use the bytes constructor.
str_literal = 'Trademark ®'
bytes_encoded = str_literal.encode()
print('bytes_encoded =', bytes_encoded)
print('bytes_encoded.decode() ->', bytes_encoded.decode())
print('bytes(str_literal) ->', bytes(str_literal, 'utf-8'))

The output:
bytes_encoded = b'Trademark \xc2\xae'
bytes_encoded.decode() -> Trademark ®
bytes(str_literal) -> b'Trademark \xc2\xae'
The bytes constructor, when applied to a string literal, also yields a bytes object. So in an assignment to a variable like bytes_construct, you call the bytes constructor with your literal string and then the encoding to use. The result is no different from the other bytes literals or encoded bytes that we've created so far. You can see that if it gets decoded, it again shows the trademark text along with that registered trademark symbol. A fourth approach to creating a bytes object is to take the hexadecimal byte values, put those into a string, and then call the fromhex method of the bytes class with that string of hexadecimal values. It'll create the bytes object for you. And you can see these hexadecimal values equate to that same trademark string, along with that registered trademark symbol.
bytes_construct = bytes(str_literal, 'utf-8')
print('bytes_construct.decode() ->', bytes_construct.decode())

bytes_from_hex = bytes.fromhex('54 72 61 64 65 6d 61 72 6b 20 c2 ae')
print('bytes_from_hex.decode() ->', bytes_from_hex.decode())

The output:
bytes_construct.decode() -> Trademark ®
bytes_from_hex.decode() -> Trademark ®
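As a quick check that these four approaches really do land in the same place, here's a small sketch. The from_* names are hypothetical, just for this comparison, and the hex() call is an extra that the lesson doesn't cover.

```python
# Illustrative sketch; the from_* names are not from the lesson's listing.
str_literal = 'Trademark ®'

from_literal = b'Trademark \xc2\xae'
from_encode = str_literal.encode()              # UTF-8 is the default
from_constructor = bytes(str_literal, 'utf-8')
from_hex = bytes.fromhex('54 72 61 64 65 6d 61 72 6b 20 c2 ae')

# All four hold exactly the same byte values.
all_equal = from_literal == from_encode == from_constructor == from_hex

# bytes.hex() goes the other way, from bytes back to a hex string.
hex_text = from_hex.hex()

# The string has 11 characters, but the symbol takes two bytes in UTF-8.
char_count = len(str_literal)
byte_count = len(from_encode)

print(all_equal)
print(hex_text)
print(char_count, byte_count)
```

Note that the character count and the byte count differ, because the registered trademark symbol is one character but two bytes in UTF-8.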
If you were to examine the methods available on a bytes object, you'd see that they're very similar to those of a string, only they work with byte values instead of string values. So with our string literal, like we saw when we discussed the string class, str, we could count the number of occurrences of a particular string, like capital T, in our trademark string. And you can see there's one of those. You could also use the index method of a string to find its position. In this case, that capital T is right at the beginning of the string, at index position 0.
print('str_literal.count("T") ->', str_literal.count('T'))
print('str_literal.index("T") ->', str_literal.index('T'))

The output:
str_literal.count("T") -> 1
str_literal.index("T") -> 0
And if you notice, in the actual hexadecimal bytes that make up the string, 54 is what represents the capital T. So we can use the bytes that we've encoded to count the occurrences of hexadecimal 54. In addition, we can find the index of byte 54. And you can see there's one of those 54s, or capital Ts, and that it appears in the very first index position, 0. As we go on, we'll see other ways to work with sequences like bytes, strings, lists, and tuples. But for now, this provides a decent overview of the bytes class.
# However, byte values are used instead of string values
print('bytes_encoded.count(0x54) ->', bytes_encoded.count(0x54))
print('bytes_encoded.index(0x54) ->', bytes_encoded.index(0x54))

The output:
bytes_encoded.count(0x54) -> 1
bytes_encoded.index(0x54) -> 0
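One more sketch, going slightly beyond what the lesson shows: count and index on a bytes object accept either a single integer or another bytes object, but not a str. The variable names are illustrative only.

```python
# Illustrative sketch; these names are not from the lesson's listing.
bytes_encoded = 'Trademark ®'.encode()

# An integer byte value and a one-byte bytes object find the same thing.
count_by_int = bytes_encoded.count(0x54)
count_by_bytes = bytes_encoded.count(b'T')

# A multi-byte subsequence works too, such as the two bytes of the symbol.
index_of_symbol = bytes_encoded.index(b'\xc2\xae')

# The lowercase r appears twice in the word Trademark.
count_r = bytes_encoded.count(b'r')

# Passing a str where bytes or an int is expected raises TypeError.
try:
    bytes_encoded.count('T')
    str_rejected = False
except TypeError:
    str_rejected = True

print(count_by_int, count_by_bytes, index_of_symbol, count_r, str_rejected)
```

So if you're searching bytes, keep the search value in bytes as well, or encode your string first.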