Working with Python Strings
If you choose Python as your go-to language and want to start building simple projects, such as a Discord bot, a Telegram bot, or a small console app, you will almost certainly deal with strings and lists. Even seasoned programmers can miss some concepts or libraries. This article covers the essentials of working with strings and will help you unleash your hidden string-fu power. Let's begin!
Before juggling strings, we should agree on a definition. A Python string (the built-in "str" type) is used to represent human-readable text. Strings are immutable sequences of Unicode code points. In this article, we will not go deep into Unicode characters or Unicode strings; if you want to learn more about Unicode, please refer to this page.
Basic level
Now that we have settled on a definition, let's talk about the basics.
To create a string, enclose the text in quotes: single quotes, double quotes, or even triple quotes!
Let's look at some examples; we will use the print function to output them to the console:
Here is the result:
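A minimal sketch of such examples, with the printed text described in comments (the variable names and text are illustrative, not the original code). Note that print shows the text without the surrounding quotes:

```python
# Three ways to write a string literal
single = 'Hello from single quotes'
double = "Hello from double quotes"
triple = '''Hello from
triple quotes, which can span lines'''

print(single)  # Hello from single quotes
print(double)  # Hello from double quotes
print(triple)  # printed on two lines
```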
Here we go, our first baby strings. Simple as they are, they may raise some questions: When do I use which? What is the difference between them?
To answer this, follow a simple guideline: in real projects, pick one style and use it consistently throughout. Many Pythonistas prefer single quotes; double quotes are just as valid, and if you come from another language, they may feel more natural. Unlike in some other languages (e.g., Perl, PHP, SQL), there is no difference between single and double quotes in Python, so use whichever you like. The same goes for single-character strings like 'a': they are still strings, because Python has no separate character type.
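A quick check of both claims; the sample strings are arbitrary:

```python
# Single and double quotes produce exactly the same str value
print('python' == "python")  # True

# A one-character string is still a str; there is no char type
print(type('a'))             # <class 'str'>
print(len('a'))              # 1
```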
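Note that you can't put the same quote that delimits the string inside it unescaped. This causes a syntax error, because Python reads that quote as the end of the string.
To avoid this, escape the quote with a backslash "\". Python then treats the quote as part of the string rather than as its syntactic end.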
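A small sketch of escaping quotes. The last line also shows a common alternative: wrapping the text in the other kind of quote so no escaping is needed.

```python
# Escaping the delimiting quote with a backslash
quote = 'It\'s a string'
dialog = "She said \"hi\""
# Alternative: use the other quote type as the delimiter
quote2 = "It's a string"

print(quote)            # It's a string
print(dialog)           # She said "hi"
print(quote == quote2)  # True
```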
The backslash is not limited to escaping quotes. As in many other languages, "\" also starts a special character (an escape sequence)!
Here are some escape sequences:
- \n: a newline character
- \b: a backspace character
- \\: the backslash itself
- \t: a tab character
- \N{name}: a Unicode character given by its name or alias
The result is:
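A sketch of these escape sequences in action. The sample text is illustrative, and "\b" is left out because how a terminal renders a backspace varies:

```python
newline = 'first line\nsecond line'     # printed on two lines
tabbed = 'name\tage'                    # a tab between the words
path = 'C:\\Users'                      # printed as C:\Users
alpha = '\N{GREEK SMALL LETTER ALPHA}'  # printed as α

for s in (newline, tabbed, path, alpha):
    print(s)

print(len(path))  # 8: '\\' is stored as a single backslash
```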
If you want backslashes to have no special meaning, use a raw string: add "r" before the opening quote.
In a raw string, "\n" is just a backslash followed by the letter n, with no special meaning. Likewise, r"\\n" keeps both backslashes; they do not cancel each other out.
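Comparing lengths makes the difference between normal and raw strings visible; the sample strings are arbitrary:

```python
normal = 'a\nb'
raw = r'a\nb'

print(len(normal))  # 3: 'a', a newline, 'b'
print(len(raw))     # 4: 'a', a backslash, 'n', 'b'
print(raw)          # a\nb
print(len(r'\\n'))  # 3: two backslashes and 'n'
```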
Let's talk about triple quotes in more detail.
The first thing to notice is that triple quotes are used for multiline strings: you can write the whole text across several lines without the "\n" escape sequence. This makes multiline strings more convenient, readable, and easier to work with. A typical example is an SQL query. Or, if you're more down to earth, my "monkeys" example shows the usage perfectly.
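A sketch of an SQL query stored in a triple-quoted string. The table and column names are hypothetical, and the query is only printed, not executed:

```python
query = """
SELECT name, email
FROM users
WHERE active = 1;
"""

print(query)
# Every line break typed inside the quotes becomes a real newline
print(query.count('\n'))  # 4
```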
By convention, triple quotes are also used for docstrings. If you don't know what they are, docstrings are string literals placed as the first statement of a module, class, function, or method that describe what it does. You can read this description by passing the object to the "help" function or through its "__doc__" attribute.
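A sketch with a function; the function `greet` is a hypothetical example:

```python
def greet(name):
    """Return a friendly greeting for the given name."""
    return f'Hello, {name}!'

print(greet.__doc__)  # Return a friendly greeting for the given name.
help(greet)           # shows the signature together with the docstring
```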
Or a class:
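The same idea for a class and its method; the `Monkey` class is hypothetical:

```python
class Monkey:
    """A monkey that can eat bananas."""

    def eat(self):
        """Eat one banana."""
        return 'Nom!'

print(Monkey.__doc__)      # A monkey that can eat bananas.
print(Monkey.eat.__doc__)  # Eat one banana.
help(Monkey)               # class docstring plus documentation of its members
```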
You may notice some additional documentation inherited from the built-in "object" type.
String functions
Let's talk about manipulating strings. As you might remember, I referred to a string as a string literal.
A string literal is a string written directly (LITERALLY) in your source code.
What if you don't know whether something is a string? How can you check?
You can pass the variable to the "type" function, or check whether it is an instance of the built-in "str" class with "isinstance".
Here, we see that a binary number is not a string, while '1' is indeed a string! Once you have checked, you can be sure that any string method you call is applied to a string, so there are no surprise exceptions.
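A sketch of both checks, assuming the "binary number" was written as a binary integer literal such as 0b1:

```python
value = 0b1   # a binary integer literal (equal to 1)
text = '1'

print(type(value))             # <class 'int'>
print(type(text))              # <class 'str'>
print(isinstance(value, str))  # False
print(isinstance(text, str))   # True
```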
Intermediate
Passing and printing strings is fun, but it's not enough to get work done in the real, wild world.
Fortunately, Python provides plenty of useful built-in functions and methods to work with. Let's be brave and take a look at some of them.
Adding, multiplying, and comparing strings
Strings in Python support some basic operations that you may know from arithmetic.
To stick two strings together, use the plus sign. That is fine for a couple of strings, but avoid it when you need to combine many of them: each "+" creates a new string, so repeated concatenation (for example, in a loop) can be slow.
The proper name for sticking strings together is concatenation.
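A sketch of concatenation with "+", and of collecting many pieces and joining them once with "str.join" instead. The words are illustrative:

```python
first = 'Hello'
second = 'World'
greeting = first + ', ' + second + '!'
print(greeting)  # Hello, World!

# For many pieces, collect them in a list and join once
words = ['strings', 'are', 'fun']
sentence = ' '.join(words)
print(sentence)  # strings are fun
```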
What if you want to add `str1` 25 times? Don't worry; you don't need to do it manually. Just multiply it by 25.
Here we go. "str1" and 24 copies of that, just chilling and questioning your sanity. 🤪
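A sketch of the repetition described above, assuming `str1` holds the text 'str1':

```python
str1 = 'str1'
repeated = str1 * 25
print(repeated)
print(len(repeated))  # 100: 25 copies of a 4-character string

print('-' * 10)       # ---------- (handy for separators)
```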
You can compare strings for equality using "==" or "!=". Python compares the characters one by one until it finds the first mismatch.
You can also compare them using "<", ">", "<=", and ">=". In this case, strings are compared lexicographically. Lexicographic order is almost alphabetical order, except that characters are compared by their Unicode code points; for example, every uppercase ASCII letter comes before every lowercase one.
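A few comparisons that show the code-point rule; the sample words are arbitrary:

```python
print('apple' == 'apple')  # True
print('apple' != 'Apple')  # True: case matters
print('apple' < 'banana')  # True: 'a' (97) comes before 'b' (98)
print('Zebra' < 'apple')   # True: 'Z' (90) is smaller than 'a' (97)
print('app' < 'apple')     # True: a prefix sorts before the longer string
```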
"len" is a must-know function: it returns the length of a string, that is, the number of characters (code points) in it.
As you can see, it counts every character, including invisible ones such as spaces and tabs, and characters outside the English alphabet.
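A sketch of what "len" counts; the sample strings are arbitrary:

```python
print(len('hello'))      # 5
print(len('hi there'))   # 8: the space counts
print(len('tab\there'))  # 8: '\t' is a single character
print(len('привет'))     # 6: each Cyrillic letter counts as one
print(len(''))           # 0
```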
A single index
To get a single character from a string, use square brackets with the index inside. Indexes in Python start at 0. If there is no character at the given index, an IndexError is raised, so check the length before indexing strings of unknown size!
You can use a negative index as well; in this case, Python counts from the end.
Note that the last character is at index -1 and not -0. -0 is 0 in Python, and index 0 is the 1st character.
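A sketch of positive and negative indexing, including the error for an index that is out of range; the word is arbitrary:

```python
word = 'Python'
print(word[0])   # P
print(word[5])   # n
print(word[-1])  # n: the last character
print(word[-6])  # P: counting back from the end

index = 6
if index < len(word):
    print(word[index])
else:
    print('No character at index', index)

try:
    word[6]
except IndexError as error:
    print(error)  # string index out of range
```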
String as an iterable
Strings are iterable, so you can loop over their characters with a for loop, without using indices.
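A sketch of looping over a string. The second loop uses "enumerate", which is not covered above, for the case where you do want the index too:

```python
word = 'abc'
for char in word:
    print(char)         # a, b and c, each on its own line

# enumerate() yields (index, character) pairs
for index, char in enumerate(word):
    print(index, char)  # 0 a, then 1 b, then 2 c
```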
Advanced indexing
You can retrieve a sequence of characters at once using slice syntax.
It is written with square brackets, just like retrieving a single character by index, but takes up to three values.
[A:B:C]: this syntax is similar to the range function, so if you're familiar with it, you shouldn't have any problems. Each part can be omitted.
- A: start index (included)
- B: end index (excluded)
- C: step (defaults to 1)
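A sketch of common slices on an arbitrary sample string:

```python
text = 'Hello, World'
print(text[0:5])   # Hello
print(text[:5])    # Hello: start defaults to the beginning
print(text[7:])    # World: end defaults to the end
print(text[::2])   # Hlo ol: every second character
print(text[::-1])  # dlroW ,olleH
```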
The last example is how you reverse a string in a Pythonic way.
Using "ord" and "chr" functions
If your string is just a single character, you might want to know its Unicode code point.
To do that, use the "ord" function.
As you might guess, the "chr" function converts a code point back into a character.
If you're interested in cryptography and want to start with some simple ciphers, these are the right functions to choose!
Note that the valid range for the "chr" function is from 0 through 1_114_111.
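A sketch of "ord" and "chr" round trips, a one-letter shift of the kind simple ciphers are built on, and what happens just past the valid range:

```python
print(ord('a'))      # 97
print(ord('€'))      # 8364
print(chr(97))       # a
print(chr(8364))     # €

shifted = chr(ord('a') + 3)
print(shifted)       # d: shift a letter by three code points

print(chr(1_114_111) == '\U0010FFFF')  # True: the highest valid code point

try:
    chr(1_114_112)   # one past the valid range
except ValueError as error:
    print('ValueError:', error)
```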
"str.join" to join strings
Do you have many characters but want a beautiful string? "str.join" has got you covered.
Let's combine it with the previous step and use the `chr` function and the random module.
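A sketch that builds a random lowercase "word". The length of 8 and the letter range are arbitrary choices for illustration:

```python
import random

# Pick 8 random code points from 'a' (97) to 'z' (122), both inclusive
letters = [chr(random.randint(ord('a'), ord('z'))) for _ in range(8)]

# Join them with an empty separator
word = ''.join(letters)
print(word)              # a random 8-letter word, different on each run
print(', '.join('abc'))  # a, b, c: the same idea with a separator
```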
You can put any separator inside the quotes; in our case, we used an empty string.
Using "str.join" is a fast, concise, and preferred way to concatenate strings; you should add it to your arsenal.
"str" to convert anything to a string
What if the variable we are working with is not a string? We can easily make one with the "str" function.
Note that you can convert anything to a string.
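A sketch of converting different kinds of values with "str":

```python
print(str(42) + ' apples')  # 42 apples: int to str before concatenating
print(str(3.5))             # 3.5
print(str([1, 2, 3]))       # [1, 2, 3]
print(str(None))            # None
print(str(True))            # True
```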
Substring, finding index
To check whether a string is a substring of another string, use the "in" operator; "not in" is its opposite.
If you're interested in knowing the index of a substring and not just if it's present or not, use the "index" method.
This method returns the index for the first occurrence of a substring or raises a ValueError with a "substring not found" message otherwise.
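A sketch of membership checks and "index"; the sentence is arbitrary:

```python
sentence = 'strings are everywhere'
print('are' in sentence)       # True
print('list' not in sentence)  # True
print(sentence.index('are'))   # 8: position of the first occurrence

try:
    sentence.index('cats')
except ValueError as error:
    print(error)               # substring not found
```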
"capitalize", "casefold", "lower", "swapcase", "title", and "upper"
All these methods are used to modify the case of the string. Let's look at the examples.
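A sketch of each case method on an arbitrary sample, plus a German word where "lower" and "casefold" differ:

```python
text = 'hello WORLD'
print(text.capitalize())  # Hello world
print(text.lower())       # hello world
print(text.upper())       # HELLO WORLD
print(text.swapcase())    # HELLO world
print(text.title())       # Hello World
print(text.casefold())    # hello world

german = 'Straße'
print(german.lower())     # straße
print(german.casefold())  # strasse
```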
All these are self-explanatory just by looking at the result.
Note: "casefold" and "lower" may produce different results even though they serve a similar purpose; "casefold" is more aggressive and is designed for caseless comparison. This is a common pitfall when people compare two strings by lowercasing them, so "casefold" is recommended when dealing with non-ASCII alphabets.
"isalnum", "isalpha", "isascii", "isdecimal", "isdigit", "isidentifier", and "islower"
These methods check whether a string has a certain form, for example, whether all its characters are letters or digits.
For example, "isdigit" checks whether a non-empty string consists only of digits. "isidentifier" checks whether the string could be used as a valid Python identifier, for example, as the name of your function.
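A sketch of each check; the sample strings are arbitrary:

```python
print('abc123'.isalnum())        # True
print('abc'.isalpha())           # True
print('abc1'.isalpha())          # False
print('café'.isascii())          # False: 'é' is not ASCII
print('123'.isdecimal())         # True
print('²'.isdigit())             # True: superscript two is a digit...
print('²'.isdecimal())           # False: ...but not a decimal character
print('my_func'.isidentifier())  # True
print('2fast'.isidentifier())    # False: cannot start with a digit
print('hello'.islower())         # True
print(''.isdigit())              # False: an empty string fails the check
```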
Strips and splits
When working with text, you may want to trim it slightly: annoying spaces, or a useless character that the author of an API decided to add.
To deal with this professionally, you must get familiar with the strip methods.
The "lstrip", "rstrip", and "strip" methods remove characters from the left end, the right end, or both ends of a string. Called without arguments, they remove whitespace; given a string argument, they treat it as a set of characters to remove instead.
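A sketch of the three methods; "repr" is used so the remaining spaces are visible:

```python
padded = '   hello   '
print(repr(padded.strip()))   # 'hello'
print(repr(padded.lstrip()))  # 'hello   '
print(repr(padded.rstrip()))  # '   hello'

# With an argument: a set of characters, removed from the ends only
print('xxhixyx'.strip('xy'))  # hi
print('--a-b--'.strip('-'))   # a-b: the inner '-' stays
```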
What if you want to remove a prefix or suffix from a string, not any character from a given set?
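"removeprefix" and "removesuffix" (available since Python 3.9) come to the rescue. Call them with the prefix or suffix you want to remove, and they return a copy of the string without it; if the string doesn't start or end with it, you get the string back unchanged.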
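A sketch on a hypothetical file name, including the classic pitfall of using "rstrip" for the same job (requires Python 3.9+):

```python
filename = 'test.txt'
print(filename.removesuffix('.txt'))  # test
print(filename.removeprefix('te'))    # st.txt
print(filename.removeprefix('abc'))   # test.txt: nothing to remove

# rstrip treats '.txt' as a set of characters, so it eats the final 't' too
print(filename.rstrip('.txt'))        # tes
```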
What if the substring is in the middle of the string?
Just use the "replace" method. It replaces every occurrence of the first argument with the second; pass an empty string as the second argument to remove the substring entirely.
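A sketch of replacing, removing, and limiting the number of replacements with the optional third argument; the sentence is arbitrary:

```python
text = 'I like cats. Cats like me.'
print(text.replace('cats', 'dogs'))  # I like dogs. Cats like me. (case-sensitive)
print(text.replace('like ', ''))     # I cats. Cats me.
print(text.replace('e', 'E', 1))     # I likE cats. Cats like me.
```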
If you have a large amount of text and want to split it into sentences, the simplest (though not the smartest) method is "split"!
By default, it splits on runs of whitespace; otherwise, it splits on the separator string you provide.
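A sketch of "split" with and without a separator; the sample strings are arbitrary:

```python
print('one two  three'.split())    # ['one', 'two', 'three']
print('a,b,c'.split(','))          # ['a', 'b', 'c']
print('Hi. Bye. Ok.'.split('. '))  # ['Hi', 'Bye', 'Ok.']
print('a-b_c'.split('-_'))         # ['a-b_c']
```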
Note: the result of a split is a list of strings. Also note that there are no splits in the last example: the separator is treated as a whole string, not as a set of characters to choose from.
Upper intermediate level
Making use of string methods while doing Object-Oriented Programming
The world of Python developers is filled with awesome stuff that makes their life easier. One of these things is magic methods.
When working on a class, you may want to give its objects a nice string representation.
Imagine you're working on a class `Entity` that has some fields with some information. Let's look at the example.
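A sketch of such a class. The fields `name` and `level` are hypothetical, since the original fields aren't specified:

```python
class Entity:
    """A game entity with a name and a level (hypothetical fields)."""

    def __init__(self, name, level):
        self.name = name
        self.level = level

unit = Entity('Goblin', 3)
print(unit)  # something like <__main__.Entity object at 0x7f...>
```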
After printing the unit instance, you have no idea what it is.
You should use the "__str__" and "__repr__" magic methods to fix it.
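A sketch of the same hypothetical `Entity` class with both methods added:

```python
class Entity:
    """A game entity with a name and a level (hypothetical fields)."""

    def __init__(self, name, level):
        self.name = name
        self.level = level

    def __repr__(self):
        # Unambiguous, and looks like the code that creates the object
        return f'Entity({self.name!r}, {self.level!r})'

    def __str__(self):
        # Friendly text for end users
        return f'{self.name} (level {self.level})'

unit = Entity('Goblin', 3)
print(unit)        # Goblin (level 3)
print(repr(unit))  # Entity('Goblin', 3)

clone = eval(repr(unit))  # recreate the object from its repr
print(clone.level)        # 3
```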
As you can see, "__repr__" returns a string that shows how the object was created. It's useful for debugging, and if needed, you can evaluate this string (e.g., with the "eval" function) to recreate the object. "__str__" returns a friendly message for the user. Add some docstrings, and voilà, you've leveled up in the eyes of your colleagues.
Formatting strings
You probably noticed that many of the strings I used are not plain string literals. They have an "f" prefix, curly braces "{}", and sometimes an exclamation point "!".
In this section, we will discuss what these symbols mean. Welcome to the world of string formatting!
Let's start with the most fundamental idea. The most modern and widely recommended way to format strings is f-strings (formatted string literals, available since Python 3.6). The "f" prefix works with single, double, or triple quotes.
String formatting consists of 3 pieces:
- the string itself
- placeholders
- values to insert
As you can see, it's pretty simple: "{}" are placeholders, and inside them you place your values (any Python expression works). Don't forget to add "f" before the opening quote.
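A sketch of the three pieces (string, placeholders, values); the names and values are illustrative:

```python
name = 'Ada'
language = 'Python'

message = f'{name} writes {language}.'
print(message)                             # Ada writes Python.
print(f'{name} has {len(name)} letters.')  # Ada has 3 letters.
print(f"{2 + 3} is computed inside the braces")  # 5 is computed inside the braces
```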
Rounding floats is as easy as adding ":.5f"; here, ".5f" means the number is shown with 5 digits after the decimal point.
Formatting dates and times can be challenging at first; take a look at the format codes in the official documentation.
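A sketch of float and date formatting inside f-strings. The fixed date is arbitrary so the output is predictable:

```python
from datetime import datetime

pi = 3.14159265
print(f'{pi:.5f}')  # 3.14159
print(f'{pi:.2f}')  # 3.14

moment = datetime(2024, 3, 9, 14, 5)
print(f'{moment:%Y-%m-%d %H:%M}')  # 2024-03-09 14:05
print(f'{moment:%d %B %Y}')        # day, month name, year (name depends on locale)
```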
You can easily align strings. Add ":" after your variable name, then the character to fill the empty space with, then "<", "^", or ">" to align left, center, or right. Finally, specify the total width to fill (including the original text).
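A sketch of fill, alignment, and width; the brackets only make the padding visible:

```python
word = 'hi'
print(f'[{word:*^10}]')  # [****hi****]: fill '*', centered, width 10
print(f'[{word:>6}]')    # [    hi]: default fill is a space
print(f'[{word:-<6}]')   # [hi----]
```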
Another (and the oldest) way to format a string is to use "%".
It looks like this:
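A sketch of %-formatting with positional values; the values are illustrative:

```python
name = 'Ada'
age = 36
print('%s is %d years old' % (name, age))  # Ada is 36 years old
print('Pi is roughly %.2f' % 3.14159)      # Pi is roughly 3.14
```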
One advantage of the old formatting is that you can nicely pull values out of a dictionary by their keys.
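A sketch with named placeholders; the dictionary is illustrative:

```python
person = {'name': 'Ada', 'language': 'Python'}
line = '%(name)s codes in %(language)s' % person
print(line)  # Ada codes in Python
```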
Last but not least are "!r" and "!s". They are a short, nice way to tell Python which magic method to call when building the string: "!r" calls "__repr__", and "!s" calls "__str__". Use them wisely. For most objects, the default behaves like "!s", so you rarely need to write it. 😊
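For a string, the difference is easy to see: its repr includes the quotes.

```python
name = 'Ada'
print(f'{name!s}')  # Ada
print(f'{name!r}')  # 'Ada'
print(f'{name}')    # Ada
```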
If you want to see all of the options, including padding, alignment, number formatting, date formatting, named placeholders, and more, visit this resource or the official documentation.
Built-in modules
If you want to take your string skills to the next level, you might be interested in the following built-in modules:
"string" module. This module provides some useful constants, for example, the lowercase ASCII letters, the digits, and the punctuation characters.
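A sketch printing three of those constants:

```python
import string

print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789
print(string.punctuation)      # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
```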
"re" module. If you're familiar with Regular Expressions, this module is right up your alley.
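A small taste of "re"; the phone-number-like pattern is an illustrative example, not a real validation rule:

```python
import re

text = 'Call 555-1234 or 555-9876'
phones = re.findall(r'\d{3}-\d{4}', text)
print(phones)                            # ['555-1234', '555-9876']
print(re.sub(r'\d', '#', 'PIN: 1234'))   # PIN: ####
```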
"textwrap" helps you deal with blocks of text. Whether you need to remove extra indentation from a docstring or fit your text into a narrow column, "textwrap" can do it.
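A sketch of "dedent" and "wrap"; the text and the width of 20 are arbitrary:

```python
import textwrap

indented = """
    This text was indented
    inside some code block.
"""
dedented = textwrap.dedent(indented)  # removes the common leading spaces
print(dedented)

long_text = 'Strings are everywhere, and knowing how to handle them pays off.'
lines = textwrap.wrap(long_text, width=20)  # list of lines, each at most 20 chars
for line in lines:
    print(line)
```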
Do you have two almost identical texts and want to find their differences?
"difflib" is the library that efficiently does precisely that.
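A sketch comparing two short lists of lines with "unified_diff"; the fruit lists are illustrative:

```python
import difflib

old = ['apples', 'bananas', 'cherries']
new = ['apples', 'blueberries', 'cherries']

# Removed lines start with '-', added lines with '+'
diff = list(difflib.unified_diff(old, new, fromfile='old', tofile='new', lineterm=''))
for line in diff:
    print(line)
```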
Conclusion
Strings are everywhere, and their concept is fundamental: they are one of the most used data types in Python, and knowing them well pays off regardless of your type of work. 🙂 "Immutable sequences of Unicode code points" is no longer just a fancy way of saying "Python strings" to you, and that's a good sign of progress on your journey. Single-, double-, and triple-quoted strings: you know them all and will track them down if something goes wrong.
Strings have many features that make them powerful and versatile. You've been introduced to some of them in this article; many more are waiting to be explored in the official documentation.
Additionally, remember that strings are immutable: string methods return a new string instead of changing the original, so assign (or return) the result if you want to keep working with it.
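A sketch of what immutability means in practice:

```python
name = 'python'
name.upper()         # returns 'PYTHON' but does not change name
print(name)          # python

name = name.upper()  # reassign to keep the result
print(name)          # PYTHON

try:
    name[0] = 'J'    # strings cannot be modified in place
except TypeError as error:
    print('Strings are immutable:', error)
```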
How about applying your knowledge in some pet projects involving strings? A Caesar cipher, or maybe a Vigenère cipher? Mad Libs? A hangman game?
You can try some of these at Hyperskill!
Hangman - https://hyperskill.org/projects/69
Text Based Adventure Game - https://hyperskill.org/projects/161
Cracking The Caesar Cipher - https://hyperskill.org/projects/365
Good luck!