Working with Python Strings

If Python is your go-to language and you want to start building simple projects, such as a Discord bot, a Telegram bot, or a small console app, you will almost certainly work with strings and lists. Even seasoned programmers can miss some concepts or libraries. This article covers them and helps you unleash your hidden string-fu power. Let's begin!

Before juggling our strings, we should agree on a definition. A Python string is a built-in data type that represents human-readable text. Strings are immutable sequences of Unicode code points. In this article, we will not go into the depths of Unicode characters or Unicode strings. If you want to learn more about Unicode, please refer to this page.

Basic level

Now that we've settled on a definition, let's cover the basics.

To create a string, enclose the text in quotes. These can be single quotes, double quotes, or even triple quotes!

Let's look at some examples; we will use the print function to print them to the console:

name = 'Python'
surname = "Cleese"
snake = """🐍"""
monkeys = '''
🌴🌴🌴🌴🌴🌴🌴🌴
🌴🌴  🙊🙉🙈  🌴🌴
🌴🌴🌴🌴🌴🌴🌴🌴'''
print(name, surname)
print(snake)
print(monkeys)

Here is the result:

Python Cleese
🐍

🌴🌴🌴🌴🌴🌴🌴🌴
🌴🌴  🙊🙉🙈  🌴🌴
🌴🌴🌴🌴🌴🌴🌴🌴

Here we go, our first baby strings. Simple as they are, you might still have some questions. When do I use which? What is the difference between them?

The guidelines are simple. Pick one quote style and use it consistently throughout your project. Many Pythonistas prefer single quotes, so you can choose them as well. Double quotes are just as good; if you come from another language, they may feel more natural. Unlike some other languages (e.g., Perl, PHP, SQL), Python treats single and double quotes identically, so use what you like. The same goes for single-character strings like 'a': they are still strings, because Python has no separate character type.

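Note that you can't use the quote character that delimits a string inside that string without escaping it. This causes an error:

lyrics = 'It's a me! Mario!'
print(lyrics)

Python reports a SyntaxError here: the string ends at the second single quote, and the rest of the line is not valid code.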

To avoid this, escape the quote with a backslash "\". Python then treats the quote as part of the string rather than as its end.

lyrics = 'It\'s a me! Mario!'
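Another option, sketched below, is to delimit the string with the other kind of quotes so that no escaping is needed. The variable names are made up for this illustration.

```python
# Three ways to write text that contains quotes; only the first needs a backslash.
escaped = 'It\'s a me! Mario!'
double_quoted = "It's a me! Mario!"
triple_quoted = '''It's a "me"! Mario!'''  # may contain both kinds of quotes

print(escaped == double_quoted)  # True
print(triple_quoted)             # It's a "me"! Mario!
```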

The backslash is not limited to escaping quotes. As in many other languages, "\" also starts escape sequences that stand for special characters!

Here are some special characters:

  • \n: a newline character
  • \b: a backspace character
  • \\: the backslash itself
  • \t: a tab
  • \uXXXX: the character with the four-digit hexadecimal code point XXXX
  • \N{name}: the character with the given Unicode name or alias

unicode = '\u12FF'
alias = '\N{LATIN CAPITAL LETTER GHA}'
backspace = 'Hello, wooiq\b\b\brld'
new_lines = 'Stop\nDoing\nLike That \\'


print(f'{unicode = !r}')
print(f'{alias = !r}')
print(f'{backspace = !s}')
print(f'{new_lines = !s}')

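The result:

  • The first two lines each show the repr of a single character: the one at code point U+12FF and the one with the given name.
  • In most terminals, the three backspaces move the cursor back over "oiq", so the third line reads "backspace = Hello, world".
  • The last print shows "new_lines = Stop", "Doing", and "Like That \" on three separate lines.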

If you want backslashes to be taken literally, without any special meaning, use raw strings. To make a raw string, add "r" before the opening quote.

raw_string = r'This is a raw string\n Hooray!'
print(raw_string)

In this case, "\n" is just a backslash followed by the letter n, with no special meaning. Even "\\n" is kept as written: two backslashes and an n, not an escaped backslash.
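As a small sketch of where raw strings tend to help, here are a Windows-style path and a regular expression. The path and the sample sentence are invented for the example.

```python
import re

# The same path written with escaped backslashes and as a raw string.
escaped = 'C:\\new\\table.txt'
raw = r'C:\new\table.txt'
print(escaped == raw)         # True

# '\n' is one character (a newline); r'\n' is two (a backslash and an n).
print(len('\n'), len(r'\n'))  # 1 2

# Raw strings keep regular expressions readable.
digits = re.findall(r'\d+', 'Order 66 shipped in 3 days')
print(digits)                 # ['66', '3']
```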

Let's talk about triple quotes in more detail.

The first thing to notice is that triple quotes are used for multiline strings. That is, you can write the whole text without a special new line symbol. This makes multiline strings more convenient, readable, and easy to understand and work with. One of the simplest examples is an SQL query. Or, if you're more down to earth, my "monkeys" example shows the usage perfectly.

It's a convention to use triple quotes for docstrings. If you don't know what they are, they are string literals, placed as the first statement of a module, class, or function, that describe what it does. You can read this description by passing the object to the "help" function or through its "__doc__" attribute.

def my_sum_function(*args, **kwargs) -> str:
   """ This is a unique sum function that I made myself
   :param args: numbers
   :param kwargs: some magic
   :return: result string message
   """
   return f'The result is: {sum(*args, **kwargs)}'


print(my_sum_function(range(100), start=-1))  # The result is: 4949
help(my_sum_function)  # prints the function signature and its docstring
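If you need the docstring as a value rather than printed help, a minimal sketch could look like this. The function "greet" is hypothetical; "inspect.getdoc" is a standard-library helper that returns the docstring with its indentation cleaned up.

```python
import inspect


def greet(name: str) -> str:
    """Return a short greeting for ``name``.

    The indentation of this paragraph is normalized by inspect.getdoc.
    """
    return f'Hello, {name}!'


raw_doc = greet.__doc__            # the docstring as stored on the function
clean_doc = inspect.getdoc(greet)  # same text, common indentation removed

print(clean_doc)
```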

Or a class:

class Utopianium:
   """
   All world problems will disappear, as soon as I complete this class. 💖
   """


help(Utopianium)

In the output, you may notice some additional documentation inherited from the "object" type.

String functions

Let's talk about some manipulations with strings. As you might remember, I referred to a string as a string literal.

String literal means that the string is hard-coded in your code LITERALLY.

string_literal = "I'm a string literal! Fight me if you dare!"

What if you don't know whether something is a string? How can you check?

To do this, you can pass some variable to the "type" function or check if the variable is an instance of the built-in string class.

objs = ['1', None, 0b101010, str, 'Hello, world']
for obj in objs:
   print(f'The type of {obj!r} is:', type(obj))
# The type of '1' is: <class 'str'>
# The type of None is: <class 'NoneType'>
# The type of 42 is: <class 'int'>
# The type of <class 'str'> is: <class 'type'>
# The type of 'Hello, world' is: <class 'str'>

Here, we see that a binary number is not a string, and '1' is a string indeed! Once you've checked the type, you can be sure that the methods you call are really being applied to a string, so there are no surprise exceptions.
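The instance check mentioned above can be sketched with "isinstance". The subclass "Shout" is hypothetical and only shows that "isinstance" also accepts subclasses of str, while comparing "type(...)" directly does not.

```python
objs = ['1', None, 0b101010, str, 'Hello, world']

# Keep only the objects that are strings.
only_strings = [obj for obj in objs if isinstance(obj, str)]
print(only_strings)  # ['1', 'Hello, world']


class Shout(str):
    """A hypothetical str subclass, used only for this illustration."""


loud = Shout('hey')
print(isinstance(loud, str))  # True: subclasses count as strings
print(type(loud) is str)      # False: the exact type is Shout
```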

Intermediate

Passing and printing strings is fun. However, it's not enough to get the work done in the real wild world.

Fortunately, Python provides more than one useful built-in function for us to work with. Let's be brave and take a look at some of them.

Adding, multiplying, and comparing strings

Strings in Python support some basic operators that you may know from arithmetic.

To stick two strings together, use a plus sign. This works well for a couple of strings, but avoid it when you need to combine many of them, for example in a loop; "str.join", covered later in this article, is a better fit.

The proper name for sticking strings together is concatenation.

str1 = 'Hello!'
str2 = 'Goodbye!'
print(str1 + ' ' + str2)  # Hello! Goodbye!

What if you want to add `str1` 25 times? Don't worry; you don't need to do it manually. Just multiply it by 25.

print(str1 * 25)
# Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!Hello!

Here we go. "str1" and 24 copies of that, just chilling and questioning your sanity. 🤪

You can compare strings for equality by using "==" or "!=". By doing that, all individual characters will be compared one by one until the first mismatch.

print(str1 != str2)
# True: the two strings are not equal

You can also compare strings using "<", ">", "<=", or ">=". These comparisons are lexicographic: similar to alphabetical order, except that characters are compared by their Unicode code points, so every uppercase ASCII letter sorts before every lowercase one.
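A short sketch of what comparing by code points means in practice; the word list is invented for the example.

```python
print('apple' < 'banana')  # True: 'a' comes before 'b'
print('Zebra' < 'apple')   # True: ord('Z') is 90, ord('a') is 97
print('abc' < 'abcd')      # True: a prefix is smaller than the longer string

words = ['banana', 'Cherry', 'apple']
by_code_point = sorted(words)                       # uppercase sorts first
case_insensitive = sorted(words, key=str.casefold)  # ignore case when sorting
print(by_code_point)     # ['Cherry', 'apple', 'banana']
print(case_insensitive)  # ['apple', 'banana', 'Cherry']
```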

"len" is a must-know function, it shows the length of a string.

texts = ['Hello, world', '1', '', "J'avais le chat.", 'Straße']
for text in texts:
   print(f'The length of {text!r} is', len(text), 'character(s)')
# The length of 'Hello, world' is 12 character(s)
# The length of '1' is 1 character(s)
# The length of '' is 0 character(s)
# The length of "J'avais le chat." is 16 character(s)
# The length of 'Straße' is 6 character(s)

As you can see, it counts every character, including spaces, punctuation, and non-English letters such as "ß".

A single index

To get a single character from a string, use square brackets with the index position inside. Indexing in Python starts at 0. If there is no character at the given index, an IndexError is raised. So, check the length before indexing strings of unknown size!

string = 'abcdef'
for index in range(len(string)):
   print(f'The character at index {index} is {string[index]}')
# The character at index 0 is a
# The character at index 1 is b
# The character at index 2 is c
# The character at index 3 is d
# The character at index 4 is e
# The character at index 5 is f

You can use a negative index as well. In this case, it goes from the end.

print('abc'[-1])  # 'c'

Note that the last character is at index -1 and not -0. -0 is 0 in Python, and index 0 is the 1st character.
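To illustrate which indices are valid, here is a small sketch. The helper "char_at" is hypothetical; it simply turns the IndexError into None.

```python
def char_at(text, index):
    """Return text[index], or None if the index is out of range (hypothetical helper)."""
    try:
        return text[index]
    except IndexError:
        return None


word = 'abc'  # valid indices are -3..2
print(char_at(word, 2))   # c
print(char_at(word, 3))   # None: past the end
print(char_at(word, -3))  # a
print(char_at(word, -4))  # None: before the start
```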

String as an iterable

A string is iterable, so you can loop over its characters with a for loop, without using indices.

string = 'hyperskill'
for character in string:
   print(f'Looking at the character {character} 👀...')
# Looking at the character h 👀...
# Looking at the character y 👀...
# Looking at the character p 👀...
# Looking at the character e 👀...
# Looking at the character r 👀...
# Looking at the character s 👀...
# Looking at the character k 👀...
# Looking at the character i 👀...
# Looking at the character l 👀...
# Looking at the character l 👀...

Advanced indexing

You can retrieve a sequence of characters at once using a slice syntax.

The usage is as simple as retrieving a single character by an index but slightly more advanced.

The syntax is [A:B:C]. It is similar to the arguments of the range function, so if you're familiar with it, you shouldn't have any problems:

  • A: the start index
  • B: the end index (exclusive)
  • C: the step

string = '0123456789'
print(string[::2])  # 02468, with step 2 we could retrieve all even digits
print(string[1::2])  # 13579, all odd digits
print(string[4:7])  # 456, note that it doesn't include 7
print(string[::-1])  # 9876543210

The last example is how you reverse a string in a Pythonic way.
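A few more slices, as a sketch of how omitted bounds, negative indices, and out-of-range bounds behave. Unlike single-index access, a slice that runs past the end does not raise an error.

```python
string = '0123456789'

first_three = string[:3]    # start omitted: from the beginning
last_three = string[-3:]    # negative start: count from the end
clamped = string[8:100]     # end past the string: silently clamped
backwards = string[7:2:-1]  # negative step: from index 7 down to 3
empty = string[5:5]         # start == end: empty string

print(first_three, last_three, clamped, backwards, repr(empty))
# 012 789 89 76543 ''
```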

Using "ord" and "chr" functions

If your string is just a single character, you might want to know its Unicode code point.

To do that, use the "ord" function.

print(ord('a'))  # 97

As you might guess, the "chr" function does the reverse: it turns a code point back into a character.

print(chr(97 + 25))  # z

If you're interested in cryptography and want to start with some simple ciphers, then it's the right function to choose!

Note that the valid range for the "chr" function is from 0 through 1_114_111.
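As a minimal sketch of the cipher idea, here is a Caesar shift built on "ord" and "chr". The function "caesar_shift" is hypothetical, and it assumes that only ASCII letters should be shifted while everything else is left untouched.

```python
def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if 'a' <= ch <= 'z':
            base = ord('a')
        elif 'A' <= ch <= 'Z':
            base = ord('A')
        else:
            result.append(ch)  # digits, spaces, punctuation stay as they are
            continue
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return ''.join(result)


secret = caesar_shift('Hello, World!', 3)
print(secret)                    # Khoor, Zruog!
print(caesar_shift(secret, -3))  # Hello, World!
```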

"str.join" to join strings

Do you have many characters but want a beautiful string? "str.join" has got you covered.

Let's combine it with the previous step and use the `chr` function and the random module.

import random
random.seed(42)
my_new_password = ''.join(chr(random.randint(33, 126)) for _ in range(52))
print('My new password is:', my_new_password)
# My new password is: followed by 52 random printable ASCII characters

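The string you call "join" on is the separator placed between the items; in our case, it is empty, so the characters are glued together directly. For real passwords, prefer the "secrets" module, because "random" is not designed for security.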

Using "str.join" is a fast, concise, and preferred way to concatenate strings; you should add it to your arsenal.
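Here is a short sketch of "str.join" with different separators; the lists are invented for the example. Note that every item must already be a string, otherwise "join" raises a TypeError.

```python
words = ['spam', 'eggs', 'ham']

csv_line = ','.join(words)      # separator is a comma
sentence = ' and '.join(words)  # separator can be longer than one character
print(csv_line)  # spam,eggs,ham
print(sentence)  # spam and eggs and ham

# Non-string items have to be converted first.
numbers = [1, 2, 3]
dotted = '.'.join(str(n) for n in numbers)
print(dotted)    # 1.2.3
```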

"str" to convert anything to a string

What if the variable we are working with is not a string? We can easily make one with the "str" function.

integer = 123456789
integer_str = str(integer)
print(type(integer))  # <class 'int'>
print(type(integer_str))  # <class 'str'>

Note that you can convert anything to a string.

Substring, finding index

To check if a string is a substring of another string, use the "in" operator. "not in" is its opposite.

print('some' in 'something')  # True

If you're interested in knowing the index of a substring and not just if it's present or not, use the "index" method.

This method returns the index for the first occurrence of a substring or raises a ValueError with a "substring not found" message otherwise.
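A minimal sketch of "index", together with "find", a related str method that returns -1 instead of raising an error:

```python
text = 'something'

position = text.index('thing')
print(position)  # 4

try:
    text.index('xyz')
except ValueError as error:
    print('index failed:', error)

missing = text.find('xyz')  # find reports a miss with -1 instead of an exception
print(missing)   # -1
print('xyz' not in text)  # True
```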

"capitalize", "casefold", "lower", "swapcase", "title", and "upper"

All these methods are used to modify the case of the string. Let's look at the examples.

text = 'Straße sOmEtHinG'
for method in 'capitalize casefold lower swapcase title upper'.split():
   print(f'{method} on {text!r} -> {getattr(text, method)()!r}')
# capitalize on 'Straße sOmEtHinG' -> 'Straße something'
# casefold on 'Straße sOmEtHinG' -> 'strasse something'
# lower on 'Straße sOmEtHinG' -> 'straße something'
# swapcase on 'Straße sOmEtHinG' -> 'sTRASSE SoMeThINg'
# title on 'Straße sOmEtHinG' -> 'Straße Something'
# upper on 'Straße sOmEtHinG' -> 'STRASSE SOMETHING'

All these are self-explanatory just by looking at the result.

Note: you may notice that "casefold" and "lower" can produce different results even though both are meant to lowercase a string. This is a common pitfall when people compare two strings by lowering their case. "casefold" is more aggressive and is recommended for case-insensitive comparison of text with non-ASCII letters.
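The pitfall can be sketched with the same German word used above:

```python
a = 'Straße'
b = 'STRASSE'

lower_equal = a.lower() == b.lower()           # 'straße' vs 'strasse'
casefold_equal = a.casefold() == b.casefold()  # 'strasse' vs 'strasse'

print(lower_equal)     # False
print(casefold_equal)  # True
```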

"isalnum", "isalpha", "isascii", "isdecimal", "isdigit", "isidentifier", and "islower"

These methods check whether a string has a particular form; each returns True or False.

For example, "isdigit" checks whether the string is non-empty and consists only of digit characters. "isidentifier" checks whether the string is a valid Python identifier, for example, a name for your function.

print('1_isbasic'.isidentifier())  # False
print('123'.isdigit())  # True
print('Hello'.istitle())  # True
print('Straße'.isascii())  # False
print('-1e3'.isdecimal())  # False
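The numeric checks are easy to mix up, so here is a small sketch comparing "isdecimal", "isdigit", and "isnumeric" (the last one is another str method, not listed in the heading above). The sample strings are chosen only for illustration.

```python
samples = ['123', '²', '½', '-1', '']

# For each sample: (isdecimal, isdigit, isnumeric)
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for s, flags in results.items():
    print(f'{s!r:>6}: {flags}')
#  '123': (True, True, True)
#    '²': (False, True, True)    superscript two is a digit, but not a decimal
#    '½': (False, False, True)   a fraction is numeric only
#   '-1': (False, False, False)  the minus sign is not a digit
#     '': (False, False, False)  an empty string never passes
```

If the goal is to call "int" on unsigned input afterwards, "isdecimal" is usually the safer of the three checks.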

Strips and splits

When working with some text, you may want to trim it slightly—annoying spaces, a useless character that the author of an API decided to add.

To deal with this professionally, you must get familiar with the strip methods.

Called with no arguments, "lstrip", "rstrip", and "strip" remove everything Python considers whitespace from the left end, the right end, or both ends of the string. You can also pass a string; it is treated as a set of characters, and any of them is removed from the corresponding end(s).

print(' s p a c e s '.strip())  # 's p a c e s'
print(' s p a c e s '.lstrip(' sp'))  # 'a c e s '
print(' s p a c e s '.rstrip(' es'))  # ' s p a c'

What if you want to remove a specific prefix or suffix from a string, not any character from a given set?

"removeprefix" and "removesuffix" (available since Python 3.9) will come to the rescue. Call them with the prefix or suffix you want to drop, and they return a copy of the string without it; if the string doesn't start or end with it, the copy is unchanged.

What if the substring is inside the string?

Just use the "replace" method. It returns a copy in which every occurrence of a substring is replaced with another string; replace it with an empty string to remove it.
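To make the last two answers concrete, here is a small sketch contrasting "rstrip" with "removesuffix" and showing "replace". It assumes Python 3.9 or newer; the file and test names are made up.

```python
filename = 'happy.py'

# rstrip treats '.py' as a set of characters and eats too much.
stripped = filename.rstrip('.py')
no_suffix = filename.removesuffix('.py')
print(stripped)   # ha
print(no_suffix)  # happy

no_prefix = 'test_login'.removeprefix('test_')
unchanged = 'login'.removeprefix('test_')  # no such prefix: returned as is
print(no_prefix, unchanged)  # login login

text = 'one two one'
all_replaced = text.replace('one', '1')   # every occurrence
first_only = text.replace('one', '1', 1)  # optional third argument limits the count
removed = 'a-b-c'.replace('-', '')        # replace with '' to delete
print(all_replaced, '|', first_only, '|', removed)  # 1 two 1 | 1 two one | abc
```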

If you have a large amount of text and want to split it into sentences, the simplest (though not the smartest) tool is the "split" method!

It splits on whitespace by default or on any separator string that you provide.

print('One. Two. Three.'.split())  # ['One.', 'Two.', 'Three.']
print('One. Two, Three.'.split('.'))  # ['One', ' Two, Three', '']
print('One. Two, Three.'.split('.,'))  # ['One. Two, Three.']

Note: the result of a split is a list of strings. Also note that the last example doesn't split anything: the separator is treated as a whole string and not as a set of characters to choose from.
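If you do want to split on any character from a set, one possible approach is "re.split" from the standard library. The sketch below also shows the optional maxsplit argument of "split" and the "splitlines" method; the sample strings are invented.

```python
import re

text = 'One. Two, Three.'

# Split on a dot or a comma, plus any whitespace that follows it.
by_set = re.split(r'[.,]\s*', text)
non_empty = [part for part in by_set if part]
print(by_set)     # ['One', 'Two', 'Three', '']
print(non_empty)  # ['One', 'Two', 'Three']

# Limit the number of splits with the second argument.
key, value = 'name=Monty=Python'.split('=', 1)
print(key, value)  # name Monty=Python

# splitlines understands different line endings.
lines = 'first\nsecond\r\nthird'.splitlines()
print(lines)       # ['first', 'second', 'third']
```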

Upper intermediate level

Making use of string methods while doing Object-Oriented Programming

The world of Python developers is filled with awesome stuff that makes their life easier. One of these things is magic methods.

When working on a class, you may want to give its objects a beautiful string representation.

Imagine you're working on a class `Entity` that has some fields with some information. Let's look at the example.

class Entity:
   def __init__(self, name, age, skillset=None):
       self.name = name
       self.age = age
       self.skillset = skillset




unit = Entity('Morty', 12, ['chill', 'coding', 'cry'])
print(unit)  # <__main__.Entity object at 0x...> (the address varies)

After printing the unit instance, you have no idea what it is.

To fix it, define the "__str__" and "__repr__" magic methods.

class Entity:
   def __init__(self, name, age, skillset=None):
       self.name = name
       self.age = age
       self.skillset = skillset if skillset is not None else []  # keeps __str__ working when no skills are given


   def __str__(self):
       return f'My name is {self.name}. My age is {self.age}. I can {", ".join(self.skillset)}.'


   def __repr__(self):
       return f'Entity(name={self.name!r}, age={self.age!r}, skillset={self.skillset!r})'


unit = Entity('Morty', 12, ['chill', 'code', 'cry'])
print(unit)
# My name is Morty. My age is 12. I can chill, code, cry.
print(repr(unit))
# Entity(name='Morty', age=12, skillset=['chill', 'code', 'cry'])

As you can see, the "__repr__" method returns a string that shows how the object was created. It's useful for debugging, and when the string is valid Python, it can even be evaluated (e.g., using the "eval" function) to recreate the object if needed. The "__str__" method returns a friendly message for the user. Add some docstrings, and voilà, you've leveled up in the eyes of your colleagues.
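To see when each method is used, consider this sketch. The classes "Point" and "OnlyRepr" are hypothetical and exist only for the illustration.

```python
class Point:
    """A hypothetical 2D point, used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x!r}, y={self.y!r})'

    def __str__(self):
        return f'({self.x}, {self.y})'


class OnlyRepr:
    """A hypothetical class that defines __repr__ but not __str__."""

    def __repr__(self):
        return 'OnlyRepr()'


point = Point(1, 2)
as_text = str(point)           # print() and str() use __str__
in_list = str([point, point])  # containers show their items with __repr__
fallback = str(OnlyRepr())     # without __str__, str() falls back to __repr__

print(as_text)   # (1, 2)
print(in_list)   # [Point(x=1, y=2), Point(x=1, y=2)]
print(fallback)  # OnlyRepr()
```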


Formatting strings

You probably noticed that many of the strings I used are not plain string literals. They have an "f" prefix, and they contain curly braces "{}" or an exclamation point "!".

In this section, we will discuss string formatting and what these symbols mean. Welcome to the world of string formatting!

Let's explore the most fundamental idea about string formatting. The most modern, popular, and generally recommended way to format strings is f-strings. "f-string" formatting can be used with single, double, or triple quotes.

String formatting consists of 3 pieces:

  • the string itself
  • placeholders
  • values to insert

color = 'black'
cost = 50
fstring = f"I'm out of a {color} paint. Could you buy me a new can, it costs ${cost}."
print(fstring)
# I'm out of a black paint. Could you buy me a new can, it costs $50.

As you can see, it's pretty simple. "{}" are placeholders. Inside of them, you're placing your values. Don't forget to add "f" before quotes.

Rounding floats is as easy as adding ":.5f"; here, ".5f" means that we will round it to 5 decimal digits.

import math
PI = math.pi
print(f"Let's round the PI number to 5 digits: {PI:.5f}")
# Let's round the PI number to 5 digits: 3.14159

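Formatting dates and times can be challenging at first; take a look at the format codes in the official documentation. Note that "%Z" produces an empty string for a datetime that has no time zone attached, like the one below.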

from datetime import datetime
current_time = datetime.now()
print(f'Current time is: {current_time:%I:%M %p %Z}')
# Current time is: 01:12 AM

You can easily align strings. To do so, add ":" after your variable name. Then add the character you want to fill the space with. Use "<", "^", or ">" to align the text to the left, center, or right. Finally, specify the total width of the field (including the original string).

name = 'Python'
print(f'{name:💔^10}')  # 💔💔Python💔💔
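A few more format specifications, as a sketch of alignment and number formatting. The "|" characters are only there to make the padding visible.

```python
name = 'Python'

left = f'{name:<10}|'    # pad with spaces on the right
right = f'{name:>10}|'   # pad with spaces on the left
center = f'{name:*^10}'  # pad with '*' on both sides
print(left)    # Python    |
print(right)   #     Python|
print(center)  # **Python**

zero_padded = f'{42:05d}'   # width 5, padded with zeros
thousands = f'{1234567:,}'  # comma as a thousands separator
percent = f'{0.256:.1%}'    # multiply by 100 and add a percent sign
print(zero_padded, thousands, percent)  # 00042 1,234,567 25.6%
```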

Another (and the oldest) way to format a string is to use "%".

It looks like this:

string = '%dkg of pure sugar. %d%% of sugar rush.' % (10, 100)
print(string)
# 10kg of pure sugar. 100% of sugar rush.

One of the advantages of the old formatting is that you can nicely unpack dictionaries based on their keys.

string = '%(language)s has %(number)03d quote types.' % {'language': "Python", "number": 2}
print(string)  # Python has 002 quote types.

Last but not least are "!r" and "!s". They are a short and nice way to tell Python how to convert a value before inserting it into the string. "!r" means that "repr" is applied, which invokes the "__repr__" method, and "!s" means that "str" is applied, which invokes the "__str__" method. Use them wisely. Without a conversion, most objects are formatted the same way as with "!s", so writing it is usually unnecessary. 😊
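The difference is easiest to see on a string value. This sketch also includes "!a", which applies the built-in "ascii" function, and the "=" specifier (Python 3.8+) used earlier in this article.

```python
value = 'café'

as_str = f'{value!s}'    # str(value)
as_repr = f'{value!r}'   # repr(value): quotes are included
as_ascii = f'{value!a}'  # ascii(value): non-ASCII characters are escaped
debug = f'{value = }'    # the expression text, then the repr of its value

print(as_str)    # café
print(as_repr)   # 'café'
print(as_ascii)  # 'caf\xe9'
print(debug)     # value = 'café'
```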

If you want to see all the options, including padding, alignment, number formatting, date formatting, named placeholders, and more, you can find everything you need in this resource or in the official docs.

Built-in modules

If you want to take your string skills to another level, you might be interested in the following standard library modules:

"string" module. This module provides some constants, for example, a set of lowercase ASCII characters, digits, and punctuations.

"re" module. If you're familiar with Regular Expressions, this module is right up your alley.

"textwrap" is used to deal with blocks of text. Removing additional spaces from your docstring or fitting your text in a small text block, "textwrap" can do it.

"difflib" is the module to reach for when you have two almost identical texts and want to find their differences.

Conclusion

Strings are everywhere, and they are one of the most used data types in Python, so knowing them well pays off regardless of the kind of work you do. 🙂 "Immutable sequences of Unicode code points" is no longer just a fancy way of saying "Python strings"; it's a sign of how far you've come in your journey. Single-, double-, and triple-quoted strings: you know them all and can track them down if something goes wrong.

Strings have many features that make them powerful and versatile. You've been introduced to some of them in this article; others are waiting to be explored in the official documentation.

Additionally, it's important to remember that strings are immutable: string methods never change the original but return a new string, so you must assign or return the result to keep working with it.
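A minimal sketch of what immutability means in practice:

```python
greeting = 'hello'

# In-place modification is not allowed.
try:
    greeting[0] = 'J'
except TypeError as error:
    print('Cannot modify in place:', error)

# Methods return a new string and leave the original untouched.
shouted = greeting.upper()
print(greeting)  # hello
print(shouted)   # HELLO

# To "change" a string, build a new one and reassign the name.
greeting = 'J' + greeting[1:]
print(greeting)  # Jello
```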

How about applying your knowledge and making some pet projects involving strings? Caesar cipher or maybe a Vigenere cipher? MadLib? A hangman project?

You can try some of these at Hyperskill!

Hangman - https://hyperskill.org/projects/69

Text Based Adventure Game - https://hyperskill.org/projects/161

Cracking The Caesar Cipher - https://hyperskill.org/projects/365

Good luck!
