If you want to work with text in Python, you need to know about the built-in string class, str. Strings can be created with either single or double quotes, and when you print them it makes no difference: you can see that the single- and double-quoted strings appear exactly the same. One reason to choose one kind of quote over the other is that you want to include that quote character in your string. If you want double quotes as part of your string, enclose the entire string in single quotes. Similarly, if you want a single quote (an apostrophe) inside your string, enclose the entire string in double quotes. Either way, Python prints the inner quote exactly as you typed it.
# Strings of class str are created with quotes
quote_1 = 'single quoted'
quote_2 = "double quoted"
print(quote_1, quote_2)

why_1 = 'She said, "Hello!"'
why_2 = "It's mine!"
print(why_1, why_2)

why_not_1 = "She said, \"Hello!\""
why_not_2 = 'It\'s mine!'
print(why_not_1, why_not_2)

# Special escape sequences exist
new_line = 'line1\nline2\nline3\n'
print(new_line)
tab_char = 'col1\tcol2\tcol3\t'
print(tab_char)
backslash = 'the backslash: \\'
print(backslash)

# Raw strings prevent escape sequence interpretation
raw_new_line = r'line1\nline2\nline3\n'
print(raw_new_line)
raw_tab_char = r'col1\tcol2\tcol3\t'
print(raw_tab_char)
raw_backslash = r'the backslash: \\'
print(raw_backslash)

The output:
single quoted double quoted
She said, "Hello!" It's mine!
She said, "Hello!" It's mine!
line1
line2
line3

col1    col2    col3
the backslash: \
line1\nline2\nline3\n
col1\tcol2\tcol3\t
the backslash: \\
You don't have to switch quote types, though. Any time you want to use a quote inside a string enclosed by the same type of quote, you can escape it by putting a backslash in front of that quote character. So why_not_1 contains two escaped double quotes inside double quotes, and why_not_2 contains an escaped single quote inside single quotes; the effect is the same as switching quote types. There are other special escape sequences besides the quotes: \n for a new line and \t for a tab character. And because the backslash is itself the escape character, you have to escape it, writing \\, to get a single backslash. In the output, \n creates new lines, \t creates tab-separated columns, and \\ prints a single backslash.
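A quick way to convince yourself that an escape sequence is a single character, and that escaping a quote is equivalent to switching the enclosing quotes, is to check lengths and compare strings. This is a small sketch assuming Python 3; the variable names escaped and switched are just illustrative.

```python
# Each escape sequence stands for exactly one character in the finished string
print(len('\n'))   # 1: a single newline character
print(len('\t'))   # 1: a single tab character
print(len('\\'))   # 1: a single backslash

# Escaping a quote produces the same string as switching the enclosing quotes
escaped = "She said, \"Hello!\""
switched = 'She said, "Hello!"'
print(escaped == switched)  # True
```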
Python also has a mechanism to avoid interpreting escape sequences, called raw strings. If you want to print a literal \n, \t, or \\, create the string as a raw string by prefixing it with the letter r (an uppercase R also works). Notice that in the raw strings, the new line, tab, and backslash escape sequences are not interpreted. One of the simplest string operations is concatenation: the plus operator joins two strings together, so sub_text + sub_text is 'double' + 'double', which gives the string doubledouble. You can also do repetition, which looks like multiplication. Here an underscore character is repeated 40 times, which draws a line across the screen.
sub_text = 'double'
print('sub_text =', sub_text)
print(sub_text + sub_text)
print('_' * 40)  # prints 40 underscore characters, drawing a line across the screen

The output:
sub_text = double
doubledouble
________________________________________
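The raw strings and the + and * operators described above can be probed a little further. This sketch, assuming Python 3, shows that a raw string keeps the backslash as an ordinary character, that repetition is the same as repeated concatenation, and that + only joins a str with another str (the width variable is a hypothetical example value).

```python
# Raw strings keep the backslash as an ordinary character
print(r'\n' == '\\n')     # True: both are a backslash followed by 'n'
print(len(r'\n'))         # 2
print(len('\n'))          # 1

# Concatenation and repetition build new strings
sub_text = 'double'
print(sub_text * 2 == sub_text + sub_text)  # True
print(repr('_' * 0))      # '' (repeating zero times gives an empty string)

# + needs a str on both sides; convert numbers with str() first
width = 40  # hypothetical value for illustration
try:
    label = 'width: ' + width      # raises TypeError: str + int
except TypeError:
    label = 'width: ' + str(width)
print(label)              # width: 40
```

Converting explicitly with str() keeps the intent obvious; Python deliberately refuses to guess how a number should be turned into text during concatenation.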
The len function tells you how long a string is: the word double is six characters long. The min function gives the smallest character in the string, compared by Unicode code point; here that is b, the character of double that comes first in the Unicode table. The max function gives the largest, u. To test whether one string is part of another, use in or not in. Recall that quote_1 is 'single quoted' and quote_2 is 'double quoted'. sub_text not in quote_1 is True, because the text double does not appear in 'single quoted'. On the other hand, sub_text, which is 'double', does appear in quote_2, so sub_text in quote_2 returns True.
sub_text = 'double'
print('len(sub_text) returns', len(sub_text))
print('min(sub_text) returns', min(sub_text))
print('max(sub_text) returns', max(sub_text))

The output:
len(sub_text) returns 6
min(sub_text) returns b
max(sub_text) returns u
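The in and not in checks described above aren't shown in the code, so here is a minimal sketch of them, reusing quote_1, quote_2, and sub_text from earlier. It also uses the built-in ord to show the code points that min and max compare, which is why an uppercase letter sorts before any lowercase one.

```python
quote_1 = 'single quoted'
quote_2 = "double quoted"
sub_text = 'double'

# Membership tests with in / not in
print('double' in quote_2)      # True
print(sub_text in quote_2)      # True
print(sub_text not in quote_1)  # True

# min() and max() compare characters by their Unicode code points
print(ord('b'), ord('u'))       # 98 117
print(sorted(sub_text))         # ['b', 'd', 'e', 'l', 'o', 'u']
print(min('Double'))            # D: uppercase 'D' (68) is below 'b' (98)
```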
The count method tells you how many times a substring occurs in a string; there are two e's in why_1. You can also find the index, or position, of a substring with the index method. Be careful with this method, because it raises a ValueError if the substring you're looking for is not part of the string. Since we know there are two e's, we can safely ask for the index of the first one, which is 2. Looking at the string, that e is in the third position but has index 2, because Python uses zero-based indexing: zero, one, two.
why_1 = 'She said, "Hello!"'
print('why_1.count("e") returns', why_1.count('e'))
print('why_1.index("e") returns', why_1.index('e'))

The output:
why_1.count("e") returns 2
why_1.index("e") returns 2
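To see the ValueError warned about above, and one way to avoid it, here is a short sketch assuming Python 3. It catches the exception from index, then shows the alternative of testing with in before calling index; 'X' is just an arbitrary character that doesn't occur in why_1.

```python
why_1 = 'She said, "Hello!"'

message = ''
try:
    why_1.index('X')               # 'X' does not occur in why_1
except ValueError as err:
    message = str(err)
    print('ValueError:', err)      # ValueError: substring not found

# Checking membership first avoids the exception entirely
if 'X' in why_1:
    print(why_1.index('X'))
else:
    print('X not found')
```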
The index method can also take a starting position and an ending position to search between; the ending position itself is excluded from the search. Since we already found an e at index 2, we can start at index 3 to look for the next one and continue to the end of the string, whose length is 18. Searching for an e from index 3 up to 18, we find one at index 12, the e in Hello. A safer alternative to index is find. If the substring you're looking for isn't in the string, find returns -1 instead of raising a ValueError.
why_1 = 'She said, "Hello!"'
why_2 = "It's mine!"
print('why_1.index("e", 3, 18) returns', why_1.index('e', 3, 18))
print('why_1.find("X") returns', why_1.find('X'))

The output:
why_1.index("e", 3, 18) returns 12
why_1.find("X") returns -1
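Two details from the paragraph above are worth checking directly: the end position is exclusive, and the start argument lets you walk through every occurrence. This is a minimal sketch assuming Python 3; the positions list and the loop are illustrative, not part of the str API.

```python
why_1 = 'She said, "Hello!"'
print(len(why_1))                 # 18

# The end index is exclusive: searching [3, 12) stops just before index 12
print(why_1.find('e', 3, 12))     # -1
print(why_1.find('e', 3, 13))     # 12

# Collect every occurrence by restarting the search just past the last hit
positions = []
start = why_1.find('e')
while start != -1:
    positions.append(start)
    start = why_1.find('e', start + 1)
print(positions)                  # [2, 12]
```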
The string class also has a number of methods for checking and changing a string's content and case. startswith and endswith return True if the string begins or ends with the given substring. upper converts the string to upper case, and lower converts it to lower case. As you can see, they return all-upper-case and all-lower-case versions of the string.
why_1 = 'She said, "Hello!"'
why_2 = "It's mine!"
print('why_1.startswith("She") returns', why_1.startswith("She"))
print('why_1.endswith("!\\"") returns', why_1.endswith("!\""))
print('why_1.upper() returns', why_1.upper())
print('why_1.lower() returns', why_1.lower())

The output:
why_1.startswith("She") returns True
why_1.endswith("!\"") returns True
why_1.upper() returns SHE SAID, "HELLO!"
why_1.lower() returns she said, "hello!"
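Because startswith and endswith are case-sensitive, a common pattern is to combine them with lower for a case-insensitive test. This sketch, assuming Python 3, also shows that upper and lower return new strings and leave the original unchanged, and that startswith/endswith accept a tuple of candidate substrings.

```python
why_1 = 'She said, "Hello!"'

# startswith is case-sensitive
print(why_1.startswith('she'))          # False
# Lowercasing first gives a case-insensitive test
print(why_1.lower().startswith('she'))  # True
# upper() and lower() return new strings; why_1 itself is unchanged
print(why_1)                            # She said, "Hello!"
# endswith (like startswith) also accepts a tuple of candidate endings
print(why_1.endswith(('!"', '?"')))     # True
```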
To break up a string on a certain character, such as a comma, use the split method. The string csv is 'a,b,c'. Calling split with a comma returns a list of the substrings between the commas, not including the commas themselves: in this case a list of three elements, the strings 'a', 'b', and 'c'. The counterpart of split is join. If you have a list, a tuple, or some other sequence of strings, calling join on a separator string combines them into a single string. Since we split on a comma, joining the list with that same comma recreates the original string 'a,b,c'.
csv = 'a,b,c'
print('csv.split(",") returns', csv.split(","))
print('",".join(["a", "b", "c"]) returns', ",".join(["a", "b", "c"]))

The output:
csv.split(",") returns ['a', 'b', 'c']
",".join(["a", "b", "c"]) returns a,b,c
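Here is a small sketch of the split/join round trip, assuming Python 3, plus two practical caveats: split keeps any spaces around the separator, and join only accepts strings, so other values must be converted first. The names parts, stripped, and numbered are illustrative.

```python
csv = 'a,b,c'
parts = csv.split(',')
print(parts)                    # ['a', 'b', 'c']
print(','.join(parts) == csv)   # True: join undoes split on the same separator

# split keeps surrounding spaces, so 'a, b, c' yields ' b' and ' c'
print('a, b, c'.split(','))     # ['a', ' b', ' c']
stripped = [p.strip() for p in 'a, b, c'.split(',')]
print(stripped)                 # ['a', 'b', 'c']

# join needs strings: convert other values first
numbered = '-'.join(str(n) for n in [1, 2, 3])
print(numbered)                 # 1-2-3
```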
Finally, strings have a number of inspection methods whose names begin with is. For example, isalpha returns True if the string consists only of alphabetic characters, and isdigit returns True if it consists only of digits such as 0 through 9. The string double is alphabetic, so isalpha returns True; it contains no digits, so isdigit returns False. There's more to say about strings, but much of it, such as slicing, applies generically to other sequences like lists and tuples, so we'll cover that later. For now, that's a good start on understanding strings in Python.
sub_text = 'double'
print('sub_text.isalpha() returns', sub_text.isalpha())
print('sub_text.isdigit() returns', sub_text.isdigit())

The output:
sub_text.isalpha() returns True
sub_text.isdigit() returns False
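The "only" in the descriptions of isalpha and isdigit matters: a single character of the wrong kind makes the whole test fail. This sketch, assuming Python 3, shows a few edge cases, including the empty string, for which both methods return False.

```python
print('double'.isalpha())           # True
print('double2'.isalpha())          # False: '2' is not alphabetic
print('2024'.isdigit())             # True
print('-5'.isdigit())               # False: '-' is not a digit
print(''.isalpha(), ''.isdigit())   # False False: empty strings fail both
```

Because '-5'.isdigit() is False, isdigit alone is not a complete check for whether a string can be converted with int().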