Python Strings#

Learn to create, format, modify and delete strings in Python. Also, you will be introduced to various string operations and functions.

What is String in Python?#

A string is a built-in sequence type used to handle textual data in Python. Python strings are immutable sequences of Unicode code points, and they are simple to create and use.

Summary#

Data type    Type
String       immutable

A character is simply a symbol. For example, the English alphabet has 26 letters.

Computers do not deal with characters; they deal with numbers (binary). Even though you may see characters on your screen, internally they are stored and manipulated as combinations of 0s and 1s.

This conversion of character to a number is called encoding, and the reverse process is decoding. ASCII and Unicode are some of the popular encodings used.

In Python, a string is a sequence of Unicode characters. Unicode was introduced to include every character in all languages and bring uniformity in encoding. Unicode code points range from \(0_{hex}\) to \(10FFFF_{hex}\). Normally, a code point is referred to by writing “U+” followed by its hexadecimal number. You can learn about Unicode from Python Unicode.
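The mapping between characters and numbers described above can be explored with the built-in ord() and chr() functions, and with str.encode() and bytes.decode() for the byte-level encoding. The following is a minimal sketch; the sample text and variable names are chosen only for illustration.

```python
# Each character corresponds to an integer code point.
code_point = ord('A')                       # 65
character = chr(8364)                       # the euro sign, U+20AC

# The "U+" notation is the code point written in hexadecimal.
label = 'U+{:04X}'.format(ord(character))   # 'U+20AC'

# Encoding turns text into bytes; decoding reverses it.
text = 'caf\u00e9'                          # 'café'
encoded = text.encode('utf-8')              # b'caf\xc3\xa9'
decoded = encoded.decode('utf-8')

print(code_point, character, label)
print(encoded, decoded)
print(len(text), len(encoded))              # 4 characters, 5 bytes
```

Note that the number of characters and the number of bytes differ as soon as the text contains non-ASCII characters.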

How to create a string in Python?#

Strings can be created by enclosing characters inside single quotes or double quotes. Triple quotes can also be used, but they are generally reserved for multiline strings and docstrings.

# Example:

print(999)          # ▶ 999 ∵ integer number
print(type(999))    # ▶ <class 'int'>

print("999")        # ▶ 999 ∵ Whatever we write inside " " becomes a string
print(type("999"))  # ▶ <class 'str'>
999
<class 'int'>
999
<class 'str'>
# Example: defining strings in Python

my_string = 'Hello'  # A string could be a single character or a bunch of texts
print(my_string)     # ▶ Hello

my_string = "Hello"
print(my_string)     # ▶ Hello

my_string = '''Hello'''
print(my_string)     # ▶ Hello

# triple quotes string can extend multiple lines
my_string = """Hello, welcome to
           the world of Python"""
print(my_string)     
Hello
Hello
Hello
Hello, welcome to
           the world of Python
# Multiline String

multiline_string = '''I am a researcher cum teacher and I enjoy teaching.
I didn't find anything as rewarding as empowering people.
That's why I created this repository.'''
print(multiline_string)
I am a researcher cum teacher and I enjoy teaching.
I didn't find anything as rewarding as empowering people.
That's why I created this repository.
# Another way of doing the same thing

multiline_string = """I am a researcher cum teacher and I enjoy teaching.
I didn't find anything as rewarding as empowering people.
That's why I created this repository."""
print(multiline_string)
I am a researcher cum teacher and I enjoy teaching.
I didn't find anything as rewarding as empowering people.
That's why I created this repository.
# Unpacking characters 

language = 'Python'
a,b,c,d,e,f = language # unpacking sequence characters into variables
print(a) # ▶ P
print(b) # ▶ y
print(c) # ▶ t 
print(d) # ▶ h
print(e) # ▶ o
print(f) # ▶ n
print(h) # ▶ NameError: name 'h' is not defined
P
y
t
h
o
n
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-3da34cc7641e> in <module>
      9 print(e) # ▶ o
     10 print(f) # ▶ n
---> 11 print(h) # ▶  # error

NameError: name 'h' is not defined

How to access characters in a string?#

  • In Python, Strings are stored as individual characters in a contiguous memory location.

  • A string can be indexed from both directions: forward (from the start) and backward (from the end).

  • Forward indexing starts with 0,1,2,3,....

  • Backward indexing starts with -1,-2,-3,-4,....

  • Trying to access a character out of index range will raise an IndexError. The index must be an integer. We can’t use floats or other types; this will result in a TypeError.

  • Strings can be indexed with square brackets. Indexing starts from zero in Python.

  • We can access a range of items in a string by using the slicing operator :(colon).

  • And the len() function provides the length of a string

str[0] = 'P' = str[-6] ,
str[1] = 'Y' = str[-5] ,
str[2] = 'T' = str[-4] ,
str[3] = 'H' = str[-3] ,
str[4] = 'O' = str[-2] , # refers to the second last item
str[5] = 'N' = str[-1].  # refers to the last item
language = 'Python'

first_letter = language[0]
print(first_letter)   # ▶ P

second_letter = language[1]
print(second_letter)  # ▶ y

last_index = len(language) - 1  # ∵ 6-1=5
last_letter = language[last_index]
print(last_letter)    # ▶ n
P
y
n
# If we want to start from right end we can use negative indexing. -1 is the last index

language = 'Python'
last_letter = language[-1]
print(last_letter) # ▶ n
second_last = language[-2]
print(second_last) # ▶ o
n
o

If we try to access an index out of the range or use numbers other than an integer, we will get errors.

# Accessing string characters in Python
str = 'PYTHON'
print('str = ', str)               # ▶ str =  PYTHON

# index must be an integer
# print('str[1.50] = ', str[1.5])  # ▶ TypeError: string indices must be integers

# index must be in range
# print('str[15] = ', str[15])     # ▶ IndexError: string index out of range
str =  PYTHON
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-96052f3c0c69> in <module>
      4 
      5 # index must be an integer
----> 6 print('str[1.50] = ', str[1.5])
      7 
      8 # index must be in range

TypeError: string indices must be integers
# Here, we are creating a simple program to retrieve String in reverse as well as normal form.

name="Anukool"
length=len(name)
i=0

for n in range(-1,(-length-1),-1):
    print(name[i],"\t",name[n])
    i+=1
A 	 l
n 	 o
u 	 o
k 	 k
o 	 u
o 	 n
l 	 A
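The two error cases listed above (an index out of range and a non-integer index) can be handled with try/except. The sketch below assumes a hypothetical helper named char_at; it is not part of Python itself.

```python
language = 'Python'

def char_at(text, index):
    """Return text[index], or None when the index is out of range (hypothetical helper)."""
    try:
        return text[index]
    except IndexError:
        return None

print(char_at(language, 2))    # t
print(char_at(language, -1))   # n
print(char_at(language, 15))   # None

# A non-integer index raises TypeError, not IndexError.
try:
    language[1.5]
except TypeError as error:
    print('TypeError:', error)
```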

How to slice a string in Python?#

A slice of a string is a substring, that is, a part of the original string.

Because a string can be indexed from both directions, it can also be sliced using either positive or negative indices.

Slicing can be best visualized by considering the index to be between the elements as shown below.

If we want to access a range, we need the index that will slice the portion from the string.

Syntax of Slice Operator :

str[start : stop : step ]

other syntax of slice:

str[start : stop]  # items start through stop-1

str[start : ]      # items start through the rest of the string

str[ : stop]       # items from the beginning through stop-1

str[ : ]           # a copy of the whole string
s = 'Python'

# 0  1  2  3  4  5  <- Index number: POSITIVE
# P  y  t  h  o  n
#-6 -5 -4 -3 -2 -1  <- Index number: NEGATIVE

# access elements in range with jump/skip
#s[x:y:z] # Start: x Stop:y-1 Jump:z

s[0:5:1]  # ▶ 'Pytho' ∵ Start:0  Stop:5  Jump:1 
'Pytho'
s = '123456789' # Indices run from 0 to 8

print("The string '%s' is %d characters long" %(s, len(s)))  
print('First character of',s,'is',s[0]) 
print('Last character of',s,'is',s[8])
print('Last character of',s,'is',s[len(s)-1]) # [9-1] = [8] is 9
The string '123456789' is 9 characters long
First character of 123456789 is 1
Last character of 123456789 is 9
Last character of 123456789 is 9

Negative indices can be used to start counting from the back

print('First character of',s,'is',s[-len(s)])
print('First character of',s,'is',s[(-9)])
print('Second character of',s,'is',s[(-8)])
print('Last character of',s,'is',s[-1])
First character of 123456789 is 1
First character of 123456789 is 1
Second character of 123456789 is 2
Last character of 123456789 is 9

Finally, a substring (range of characters) can be specified using \(a:b\), which selects the characters at indices \(a,a+1,\ldots,b-1\). Note that the character at index \(b\) is not included.

print("First three characters",s[0:3])
print("Next three characters",s[3:6])
First three characters 123
Next three characters 456

An empty beginning and end of the range denotes the beginning/end of the string:

s = '123456789' # Indices run from 0 to 8
print("First three characters", s[:3])
print("Last three characters", s[-3:])
First three characters 123
Last three characters 789
# Accessing string characters in Python

str = 'PYTHON'
print('str = ', str)

#first character
print('str[0] = ', str[0])

#last character
print('str[-1] = ', str[-1])

#slicing 2nd to 5th character
print('str[1:5] = ', str[1:5])

#slicing 4th to 2nd last character
print('str[3:-1] = ', str[3:-1])
str =  PYTHON
str[0] =  P
str[-1] =  N
str[1:5] =  YTHO
str[3:-1] =  HO
# Example: 

s="Anukool Python"

print(s[6:10])
print(s[-12:-7])
print(s[-1: :-1])  #reversed all string
print(s[2: 10: 2]) #step = 2
print(s[ : : -1])  #reversed all string
print(s[ : 5])     #from 0 to 4
print(s[3 : ])     #from 3 to end of the string
print(s[ : ])      #copy all string
l Py
ukool
nohtyP lookunA
uolP
nohtyP lookunA
Anuko
kool Python
Anukool Python
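Two properties of the slice syntax above are worth seeing in isolation: out-of-range bounds are clipped instead of raising an error, and a negative step walks the string backwards. A short sketch, where is_palindrome is a hypothetical helper introduced only for illustration:

```python
s = 'Python'

# Unlike indexing, slicing never raises IndexError: the bounds are clipped.
tail = s[2:100]        # 'thon'
empty = s[10:20]       # ''

# Every second character, and the whole string reversed.
every_second = s[::2]  # 'Pto'
backwards = s[::-1]    # 'nohtyP'

def is_palindrome(text):
    """True when text reads the same forwards and backwards (hypothetical helper)."""
    return text == text[::-1]

print(tail, repr(empty), every_second, backwards)
print(is_palindrome('level'))   # True
print(is_palindrome(s))         # False
```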


Breaking apart strings#

When processing text, the ability to split strings apart is particularly useful.

  • partition(separator): breaks a string into three parts based on a separator

  • split(): breaks string into words separated by white-space (optionally takes a separator as argument)

  • join(): joins the result of a split using string as separator

s = "one ➡ two ➡ three"
print( s.partition("➡") )
print( s.split() )
print( s.split(" ➡ ") )
print( ";".join( s.split(" ➡ ") ) )
('one ', '➡', ' two ➡ three')
['one', '➡', 'two', '➡', 'three']
['one', 'two', 'three']
one;two;three
"This will split all words into a list".split()
['This', 'will', 'split', 'all', 'words', 'into', 'a', 'list']
' '.join(['This', 'will', 'join', 'all', 'words', 'into', 'a', 'string'])
'This will join all words into a string'
'Happy New Year'.find('ew')
7
'Happy New Year'.replace('Happy','Brilliant')
'Brilliant New Year'
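The splitting methods above take a few optional arguments that are useful in practice. The sketch below uses made-up sample strings to show partition() on a key/value line and the maxsplit argument of split() and rsplit().

```python
line = 'name = Ada Lovelace'

# partition() always returns three parts, even if the separator is missing.
key, sep, value = line.partition('=')
print(key.strip(), '->', value.strip())   # name -> Ada Lovelace
print('no separator'.partition('='))      # ('no separator', '', '')

# maxsplit limits the number of splits; rsplit() counts from the right.
path = 'a/b/c/d'
print(path.split('/', 1))                 # ['a', 'b/c/d']
print(path.rsplit('/', 1))                # ['a/b/c', 'd']

# split() followed by join() with the same separator restores the string.
parts = path.split('/')
print('/'.join(parts) == path)            # True
```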

How to change or delete a string?#

Strings are immutable. This means that elements of a string cannot be changed once they have been assigned. We can simply reassign different strings to the same name.

my_string = 'python'
my_string[5] = 'a'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-9df797f83624> in <module>
      1 my_string = 'python'
----> 2 my_string[5] = 'a'

TypeError: 'str' object does not support item assignment
s='012345'
sX=s[:2]+'X'+s[3:] # this creates a new string with 2 replaced by X
print("creating new string",sX,"OK")

sX=s.replace('2','X') # the same thing
print(sX,"still OK")

s[2] = 'X' # an error!!!
creating new string 01X345 OK
01X345 still OK
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-9e95083e441a> in <module>
      6 print(sX,"still OK")
      7 
----> 8 s[2] = 'X' # an error!!!

TypeError: 'str' object does not support item assignment

We cannot delete or remove individual characters from a string. However, the del keyword can delete the name bound to the string, after which that name can no longer be used.

my_string = 'python'
del my_string[1]  # deleting element of string generates error!
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-4a9f159cd120> in <module>
      1 my_string = 'python'
----> 2 del my_string[1]  # deleting element of string generates error!

TypeError: 'str' object doesn't support item deletion
my_string = 'python'
del my_string # deleting whole string using 'del' keyword can delete it.
my_string
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-23-a04242241123> in <module>
      1 my_string = 'python'
      2 del my_string # deleting whole string using 'del' keyword can delete it.
----> 3 my_string

NameError: name 'my_string' is not defined
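Since a string cannot be edited in place, the usual approach is to build a new string. The sketch below shows two common ways of doing so; the variable names are illustrative only.

```python
my_string = 'python'

# Option 1: slice around the position and concatenate.
changed = my_string[:5] + 'a'    # 'pythoa'

# Option 2: convert to a mutable list, edit it, then join it back.
chars = list(my_string)
chars[0] = 'P'                   # replace the first character
del chars[1]                     # remove 'y'
rebuilt = ''.join(chars)         # 'Pthon'

print(changed, rebuilt)
print(my_string)                 # the original is untouched: python
```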

Python Strings Operations#

There are many operations that can be performed with strings, which makes the string one of the most used data types in Python.

To learn more about the data types available in Python visit: Python Data Types.

Python provides three groups of operators for working with strings:

  • Basic Operators/Concatenation of Two or More Strings.

  • Membership Operators.

  • Relational Operators.

1. Basic Operators for concatenation of two or more strings#

There are two basic string operators: + and *.

The + (concatenation) operator joins two or more strings together.

The * (replication) operator repeats a string a given number of times.

String Concatenation Operator (+)#

Joining of two or more strings into a single one is called concatenation.

# String Concatenation

a = "Hello,"
b= 'World!'
print(a+b)
print(a+" "+b)
Hello,World!
Hello, World!
# String Concatenation

string1='World'
string2='!'
print('Hello,' + " " + string1 + string2)
Hello, World!

Expression               Output
"10" + "50"              "1050"
"hello" + "009"          "hello009"
"hello99" + "world66"    "hello99world66"

Note: Both operands passed for concatenation must be strings; mixing a string with another type, such as an integer, raises a TypeError.

# Example:

print("HelloWorld"+99)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-4c669126467b> in <module>
      1 # Example:
      2 
----> 3 print("HelloWorld"+99)

TypeError: can only concatenate str (not "int") to str
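The TypeError above goes away once the number is converted to a string. A minimal sketch of three ways to do this, with illustrative variable names:

```python
count = 99

# Convert explicitly with str() before using +.
message = 'HelloWorld' + str(count)

# Or let format() perform the conversion.
formatted = 'HelloWorld{}'.format(count)

# For many pieces, join() accepts any iterable of strings.
pieces = ['Hello', 'World', str(count)]
joined = ''.join(pieces)

print(message, formatted, joined)   # HelloWorld99 HelloWorld99 HelloWorld99
```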

Python String Replication Operator (*)#

The replication operator takes two operands: a string and an integer.

The string is repeated the number of times given by the integer.

Expression      Output
"ArcX" * 2      "ArcXArcX"
3 *'5'          "555"
'@'* 5          "@@@@@"

Note: The operands can be written in either order, i.e., int * string or string * int. One operand must be a string and the other an integer; two strings cannot be multiplied.

# Example:

print("HelloWorld" * 5)
print(3 * "Python")
print("Hello World! "*5)  #note the space in between 'Hello' and 'World!'
HelloWorldHelloWorldHelloWorldHelloWorldHelloWorld
PythonPythonPython
Hello World! Hello World! Hello World! Hello World! Hello World! 
# Python String Operations
str1 = 'Hello'
str2 ='World!'

# using +
print('str1 + str2 = ', str1 + str2)

# using *
print('str1 * 3 =', str1 * 3)
str1 + str2 =  HelloWorld!
str1 * 3 = HelloHelloHello

Writing two string literals next to each other also concatenates them. If we want to spread the literals over different lines, we can use parentheses ().

# two string literals together
'Hello ''World!'
'Hello World!'
# using parentheses
s = ('Hello '
     'World')
s
'Hello World'

Iterating Through a string#

We can iterate through a string using a for loop. Here is an example to count the number of ‘l’s in a string.

# Iterating through a string
count = 0
for letter in 'Hello World':
    if(letter == 'l'):
        count += 1
print(count,'letters found')
3 letters found
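The explicit loop above can often be replaced by built-in tools. The sketch below shows the count() method, a generator expression with sum(), and collections.Counter from the standard library; the vowel count is an extra illustration, not part of the original example.

```python
from collections import Counter

text = 'Hello World'

# The loop above is equivalent to the built-in count() method.
count_l = text.count('l')

# A generator expression can count characters matching any condition.
vowels = sum(1 for letter in text if letter.lower() in 'aeiou')

# Counter tallies every character in one pass.
tally = Counter(text)

print(count_l)                   # 3
print(vowels)                    # 3
print(tally['l'], tally['o'])    # 3 2
```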

2. Python String Membership Operators#

Membership operators were already discussed in the Operators section. Let us look at them in the context of strings.

There are two types of Membership operators :

  1. in - returns True if a character or substring is present in the specified string, otherwise False.

  2. not in - returns True if a character or substring is not present in the specified string, otherwise False.

# Example:

str1="HelloWorld"
str2="Hello"
str3="World"
str4="Milan"

print('Example of in operator ::')
print(str2 in str1)
print(str3 in str1)
print(str4 in str1)
print()
print(str2 not in str1)
print(str3 not in str1)
print(str4 not in str1)
Example of in operator ::
True
True
False

False
False
True
>>> 'a' in 'program'
True
>>> 'at' not in 'battle'
False
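A few details of the membership operators are easy to overlook: the test is case-sensitive, and the empty string counts as a substring of every string. A short sketch with illustrative names:

```python
sentence = 'HelloWorld'

# Membership tests are case-sensitive.
exact = 'world' in sentence               # False
folded = 'world' in sentence.lower()      # True

# The empty string is a substring of every string.
has_empty = '' in sentence                # True

# any() checks several candidate substrings at once.
keywords = ['Milan', 'World']
has_keyword = any(word in sentence for word in keywords)   # True

print(exact, folded, has_empty, has_keyword)
```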

3. Python Relational Operators#

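All the comparison (relational) operators (<, >, <=, >=, ==, !=) also apply to strings. Strings are compared character by character using Unicode code points, which for English letters are the same as the ASCII values. The result is similar to dictionary order, except that all uppercase letters come before lowercase ones.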

# Example:

print("HelloWorld"=="HelloWorld")
print("helloWorld">="HelloWorld")
print("H"<"h")
True
True
True

Explanation:

The ASCII value of a is 97, b is 98, c is 99 and so on. The ASCII value of A is 65, B is 66, C is 67 and so on. The comparison between strings is done on the basis of these values.
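The values mentioned above can be inspected with ord(), and the same ordering is what sorted() uses. A minimal sketch; the word list is made up for illustration.

```python
# ord() exposes the code points that drive the comparison.
print(ord('H'), ord('h'))        # 72 104
print('H' < 'h')                 # True, because 72 < 104

# Comparison proceeds character by character; a prefix sorts first.
print('apple' < 'apples')        # True
print('Zebra' < 'apple')         # True: uppercase letters come before lowercase

# sorted() uses the same ordering; a key function gives case-insensitive order.
words = ['banana', 'Cherry', 'apple']
default_order = sorted(words)                  # ['Cherry', 'apple', 'banana']
ignore_case = sorted(words, key=str.lower)     # ['apple', 'banana', 'Cherry']
print(default_order)
print(ignore_case)
```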

The % operator is used to format a string inserting the value that comes after. It relies on the string containing a format specifier that identifies where to insert the value. The most common types of format specifiers are:

  • %s ➡ string

  • %d ➡ Integer

  • %f ➡ Float

  • %o ➡ Octal

  • %x ➡ Hexadecimal

  • %e ➡ exponential

These will be very familiar to anyone who has ever written a C or Java program and follow nearly exactly the same rules as the printf() function.

print("Hello %s" % string1)
print("Actual Number = %d" %19)
print("Float of the number = %f" %19)
print("Octal equivalent of the number = %o" %19)
print("Hexadecimal equivalent of the number = %x" %19)
print("Exponential equivalent of the number = %e" %19)
Hello World
Actual Number = 19
Float of the number = 19.000000
Octal equivalent of the number = 23
Hexadecimal equivalent of the number = 13
Exponential equivalent of the number = 1.900000e+01

When referring to multiple values, parentheses are used. Values are inserted in the order they appear in the parentheses (more on tuples in the next section).

print("Hello %s %s. My name is Bond, you can call me %d" %(string1,string2,99))
Hello World !. My name is Bond, you can call me 99

We can also specify the width of the field and the number of decimal places to be used. For example:

print('Print width 10: |%10s|'%'x')
print('Print width 10: |%-10s|'%'x') # left justified
print("The number pi = %.1f to 1 decimal places"%3.1415)
print("The number pi = %.2f to 2 decimal places"%3.1415)
print("More space pi = %10.2f"%3.1415)
print("Pad pi with 0 = %010.2f"%3.1415) # pad with zeros
Print width 10: |         x|
Print width 10: |x         |
The number pi = 3.1 to 1 decimal places
The number pi = 3.14 to 2 decimal places
More space pi =       3.14
Pad pi with 0 = 0000003.14

Built-in functions to Work with Strings#

Various built-in functions that work with sequences work with strings as well.

Some of the commonly used ones are enumerate() and len(). The enumerate() function returns an enumerate object. It contains the index and value of all the items in the string as pairs. This can be useful for iteration.

Similarly, len() returns the length (number of characters) of the string.

str = 'cold'

# enumerate()
list_enumerate = list(enumerate(str))
print('list(enumerate(str)) = ', list_enumerate)

#character count
print('len(str) = ', len(str))
list(enumerate(str)) =  [(0, 'c'), (1, 'o'), (2, 'l'), (3, 'd')]
len(str) =  4
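In practice enumerate() is mostly used directly in a for loop rather than converted to a list. The sketch below also shows min(), max() and sorted(), which accept strings like any other sequence; the variable names are illustrative.

```python
word = 'cold'

# enumerate() yields (index, character) pairs; start changes the first index.
for position, letter in enumerate(word, start=1):
    print(position, letter)

# Collect the indices where a character occurs.
text = 'Hello World'
positions = [i for i, ch in enumerate(text) if ch == 'o']
print(positions)                             # [4, 7]

# Other sequence built-ins work on strings too.
print(min(word), max(word), sorted(word))    # c o ['c', 'd', 'l', 'o']
```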

Python String Formatting#

Escape Sequence#

If we want to print a text like He said, "What's there?", we can neither use single quotes nor double quotes. This will result in a SyntaxError as the text itself contains both single and double quotes.

print("He said, "What's there?"")
  File "<ipython-input-39-5b2db8c64782>", line 1
    print("He said, "What's there?"")
                     ^
SyntaxError: invalid syntax

One way to get around this problem is to use triple quotes. Alternatively, we can use escape sequences.

An escape sequence starts with a backslash and is interpreted differently. If we use a single quote to represent a string, all the single quotes inside the string must be escaped. Similar is the case with double quotes. Here is how it can be done to represent the above text.

# using triple quotes
print('''He said, "What's there?"''')

# escaping single quotes
print('He said, "What\'s there?"')

# escaping double quotes
print("He said, \"What's there?\"")
He said, "What's there?"
He said, "What's there?"
He said, "What's there?"

Here is a list of all the escape sequences supported by Python.#

Escape Sequence    Description
\newline           Backslash and newline ignored
\\                 Backslash
\'                 Single quote
\"                 Double quote
\a                 ASCII Bell
\b                 ASCII Backspace
\f                 ASCII Formfeed
\n                 ASCII Linefeed
\r                 ASCII Carriage Return
\t                 ASCII Horizontal Tab
\v                 ASCII Vertical Tab
\ooo               Character with octal value ooo
\xHH               Character with hexadecimal value HH

# Escape sequence

print('I hope every one enjoying the python tutorials.\nDo you ?') # '\n' line break
print('Days\tChapters\tTopics')  # '\t' tab space
print('Day 1\tChp 1\tPython Introduction')
print('Day 2\tChp 2\tPython Datatypes')
print('Day 3\tChp 3\tPython Flow Control')
print('Day 4\tChp 4\tPython Functions')
print('Day 5\tChp 5\tPython Files')
print('This is a back slash  symbol (\\)') # To write a back slash
print('In every programming language it starts with \"Hello, World!\"')
I hope every one enjoying the python tutorials.
Do you ?
Days	Chapters	Topics
Day 1	Chp 1	Python Introduction
Day 2	Chp 2	Python Datatypes
Day 3	Chp 3	Python Flow Control
Day 4	Chp 4	Python Functions
Day 5	Chp 5	Python Files
This is a back slash  symbol (\)
In every programming language it starts with "Hello, World!"
# Here are some examples

print("C:\\Python32\\Lib")
#C:\Python32\Lib

print("This is printed\nin two lines")
#This is printed
#in two lines

print("This is \x48\x45\x58 representation")
#This is HEX representation
C:\Python32\Lib
This is printed
in two lines
This is HEX representation
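An escape sequence is written with several characters in the source code but produces a single character in the string. The sketch below checks this with len() and repr(); the sample text is made up.

```python
text = 'a\tb\n'

# Each escape sequence is one character in the resulting string.
print(len(text))        # 4

# repr() shows the escapes instead of interpreting them.
print(repr(text))       # 'a\tb\n'

# Octal and hexadecimal escapes name a character by its value.
octal_a = '\101'
hex_a = '\x41'
print(octal_a, hex_a, chr(65))   # A A A
```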

Raw String to ignore escape sequence#

Sometimes we may wish to ignore the escape sequences inside a string. To do this we can place r or R in front of the string. This will imply that it is a raw string and any escape sequence inside it will be ignored.

print("This is \x61 \ngood example")
This is a 
good example
print(r"This is \x61 \ngood example")
This is \x61 \ngood example
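Raw strings are most often used for Windows-style paths and regular expressions, where backslashes should be kept as they are. A small sketch with a made-up path:

```python
normal = 'C:\new\table'     # \n and \t are interpreted as newline and tab
raw = r'C:\new\table'       # backslashes are kept literally

print(len(normal), len(raw))        # 10 12
print(raw)                          # C:\new\table

# A raw string equals the normal string with every backslash doubled.
print(raw == 'C:\\new\\table')      # True
```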

The format() Method for Formatting Strings#

The format() method that is available with the string object is very versatile and powerful in formatting strings. Format strings contain curly braces {} as placeholders or replacement fields which get replaced.

We can use positional arguments or keyword arguments to specify the order.

# Python string format() method

# default(implicit) order
default_order = "{}, {} and {}".format('Allan','Bill','Cory')
print('\n--- Default Order ---')
print(default_order)

# order using positional argument
positional_order = "{1}, {0} and {2}".format('Allan','Bill','Cory')
print('\n--- Positional Order ---')
print(positional_order)

# order using keyword argument
keyword_order = "{s}, {b} and {j}".format(j='Allan',b='Bill',s='Cory')
print('\n--- Keyword Order ---')
print(keyword_order)
--- Default Order ---
Allan, Bill and Cory

--- Positional Order ---
Bill, Allan and Cory

--- Keyword Order ---
Cory, Bill and Allan

The format() method can have optional format specifications. They are separated from the field name using colon. For example, we can left-justify <, right-justify > or center ^ a string in the given space.

We can also format integers as binary, hexadecimal, etc., and floats can be rounded or displayed in the exponent format. There are many more formatting options; see the Python documentation for all the string formatting available with the format() method.

# formatting integers
"Binary representation of {0} is {0:b}".format(12)
'Binary representation of 12 is 1100'
# formatting floats
"Exponent representation: {0:e}".format(1966.365)
'Exponent representation: 1.966365e+03'
# round off
"One third is: {0:.3f}".format(1/3)
'One third is: 0.333'
# string alignment
"|{:<10}|{:^10}|{:>10}|".format('bread','butter','jam')
'|bread     |  butter  |       jam|'
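A few more format specifications that come up often are sketched below: a thousands separator, percentages, an explicit sign, zero padding, a prefixed hexadecimal value, and a width supplied as an argument. The values are arbitrary examples.

```python
grouped = '{:,}'.format(1234567)          # thousands separator
percent = '{:.1%}'.format(0.256)          # multiply by 100 and add %
signed = '{:+d}'.format(42)               # always show the sign
padded = '{:08.3f}'.format(3.14159)       # width 8, zero padded, 3 decimals
prefixed = '{:#x}'.format(255)            # hexadecimal with 0x prefix

# The width can itself be a replacement field.
centered = '|{:^{width}}|'.format('jam', width=9)

print(grouped)     # 1,234,567
print(percent)     # 25.6%
print(signed)      # +42
print(padded)      # 0003.142
print(prefixed)    # 0xff
print(centered)    # |   jam   |
```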

Old style formatting#

We can even format strings like the old sprintf() style used in C programming language. We use the % operator to accomplish this.

x = 36.3456789
print('The value of x is %3.2f' %x)
The value of x is 36.35
print('The value of x is %3.4f' %x)
The value of x is 36.3457

Common Python String Methods#

There are numerous methods available with the string object. The format() method that we mentioned above is one of them.

Strings can be transformed by a variety of functions that are all methods on a string. That is, they are called by putting the function name with a . after the string. They include:

  • Upper vs lower case: upper(), lower(), capitalize(), title() and swapcase(), along with join(), split(), find(), replace() etc., with mostly the obvious meaning. Note that capitalize makes only the first letter of the string a capital, while title selects upper case for the first letter of every word.

  • Padding strings: center(n), ljust(n) and rjust(n) each place the string into a longer string of length n padded by spaces (centered, left-justified or right-justified respectively). zfill(n) works similarly but pads with leading zeros.

  • Stripping strings: Often we want to remove spaces. This is achieved with the functions strip(), lstrip() and rstrip(), which remove whitespace from both ends, just the left or just the right respectively. An optional argument can be used to list a set of other characters to be removed.

Here is a complete list of all the built-in methods to work with Strings in Python.

# Example:

s="heLLo wORLd!"
print(s.capitalize(),"vs",s.title())

print("upper case: '%s'"%s.upper(),"lower case: '%s'"%s.lower(),"and swapped: '%s'"%s.swapcase())

print('|%s|' % "Hello World".center(30)) # center in 30 characters

print('|%s|'% "     lots of space             ".strip()) # remove leading and trailing whitespace

print("'%s' without leading/trailing d,h,L or ! = |%s|" % (s, s.strip("dhL!")))

print("Hello World".replace("World","Class"))
Hello world! vs Hello World!
upper case: 'HELLO WORLD!' lower case: 'hello world!' and swapped: 'HEllO WorlD!'
|         Hello World          |
|lots of space|
'heLLo wORLd!' without leading/trailing d,h,L or ! = |eLLo wOR|
Hello Class
# capitalize(): Converts the first character of the string to a capital letter and the rest to lowercase

challenge = 'Python Datatypes'
print(challenge.capitalize()) # 'Python datatypes'
Python datatypes
# count(): returns occurrences of substring in string, count(substring, start=.., end=..)

challenge = 'Python Datatypes'
print(challenge.count('y')) # 2
print(challenge.count('y', 6, 14)) # 1
print(challenge.count('ty')) # 1
2
1
1
# endswith(): Checks if a string ends with a specified ending

challenge = 'Python Datatypes'
print(challenge.endswith('es'))   # True
print(challenge.endswith('type')) # False
True
False
# expandtabs(): Replaces tab character with spaces, default tab size is 8. It takes tab size argument

challenge = 'Python\tDatatypes'
print(challenge.expandtabs())   # 'Python  Datatypes'
print(challenge.expandtabs(10)) # 'Python    Datatypes'
Python  Datatypes
Python    Datatypes
# find(): Returns the index of first occurrence of substring

challenge = 'Python Datatypes'
print(challenge.find('y'))  # 1
print(challenge.find('u')) # -1
1
-1
# format()	formats string into nicer output    
first_name = 'Milaan'
last_name = 'Parmar'
job = 'Lecturer'
country = 'Finland'
sentence = 'I am {} {}. I am a {}. I live in {}.'.format(first_name, last_name, job, country)
print(sentence) # I am Milaan Parmar. I am a Lecturer. I live in Finland.
I am Milaan Parmar. I am a Lecturer. I live in Finland.
# index(): Returns the index of substring (raises ValueError if it is not found)

challenge = 'Python Datatypes'
print(challenge.index('y'))  # 1
print(challenge.index('th')) # 2
1
2
# isalnum(): Checks alphanumeric character

challenge = 'PythonDatatypes'
print(challenge.isalnum()) # True

challenge = 'Pyth0nDatatypes'
print(challenge.isalnum()) # True

challenge = 'Python Datatypes'
print(challenge.isalnum()) # False

challenge = 'Python Datatypes 2021'
print(challenge.isalnum()) # False
True
True
False
False
# isalpha(): Checks if all characters are alphabets

challenge = 'PythonDatatypes'
print(challenge.isalpha()) # True

num = '123'
print(num.isalpha())      # False
True
False
# isdigit(): Checks Digit Characters

challenge = 'Ninety'
print(challenge.isdigit()) # False
challenge = '90'
print(challenge.isdigit())   # True
False
True
# isdecimal():Checks decimal characters

num = '30'
print(num.isdecimal()) # True
num = '30.6'
print(num.isdecimal()) # False
True
False
# isidentifier():Checks for valid identifier means it check if a string is a valid variable name

challenge = '2021PythonDatatypes'
print(challenge.isidentifier()) # False, because it starts with a number
challenge = 'Python_Datatypes'
print(challenge.isidentifier()) # True
False
True
# islower():Checks if all alphabets in a string are lowercase

challenge = 'python datatypes'
print(challenge.islower()) # True
challenge = 'Python datatypes'
print(challenge.islower()) # False
True
False
# isupper(): returns if all characters are uppercase characters

challenge = 'python datatypes'
print(challenge.isupper()) #  False
challenge = 'PYTHON DATATYPES'
print(challenge.isupper()) # True
False
True
# isnumeric():Checks numeric characters

num = '90'
print(num.isnumeric())         # True
print('ninety'.isnumeric())    # False
True
False
# join(): Returns a concatenated string

web_tech = ['HTML', 'CSS', 'JavaScript', 'React']
result = '#, '.join(web_tech)
print(result) # 'HTML#, CSS#, JavaScript#, React'
HTML#, CSS#, JavaScript#, React
# strip(): Removes both leading and trailing characters

challenge = ' python datatypes '
print(challenge.strip('y')) # ' python datatypes ' (unchanged, because 'y' is not at either end)
 python datatypes 
# replace(): Replaces substring inside

challenge = 'python datatypes'
print(challenge.replace('datatypes', 'data-types')) # 'python data-types'
python data-types
# split():Splits String from Left

challenge = 'python datatypes'
print(challenge.split()) # ['python', 'datatypes']
['python', 'datatypes']
# title(): Returns a Title Cased String

challenge = 'python datatypes'
print(challenge.title()) # Python Datatypes
Python Datatypes
# swapcase(): Swaps the case of every letter (lowercase becomes uppercase and vice versa)
  
challenge = 'python datatypes'
print(challenge.swapcase())   # PYTHON DATATYPES
challenge = 'Python Datatypes'
print(challenge.swapcase())  # pYTHON dATATYPES
PYTHON DATATYPES
pYTHON dATATYPES
# startswith(): Checks if String Starts with the Specified String

challenge = 'python datatypes'
print(challenge.startswith('python')) # True
challenge = '2 python datatypes'
print(challenge.startswith('two')) # False
True
False
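Because every string method returns a new string, several of the methods above can be combined. The sketch below assumes a hypothetical helper named normalize and made-up input text.

```python
def normalize(raw_text):
    """Trim the ends, collapse inner whitespace and use title case (hypothetical helper)."""
    words = raw_text.strip().split()      # split() without arguments drops extra spaces
    return ' '.join(words).title()

print(normalize('   python    datatypes  '))   # Python Datatypes

# Method calls can also be chained directly.
slug = '  Python Datatypes! '.strip().lower().replace(' ', '-').rstrip('!')
print(slug)                                    # python-datatypes
```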

Inspecting Strings#

There are also lots of ways to inspect or check strings. Examples of a few of these are given here:

  • Checking the start or end of a string: startswith("string") and endswith("string") checks if it starts/ends with the string given as argument

  • Capitalisation: There are boolean counterparts for all forms of capitalisation, such as isupper(), islower() and istitle()

  • Character type: does the string only contain the characters:

    • 0-9: isdecimal(). Note there is also isnumeric() and isdigit() which are effectively the same function except for certain unicode characters

    • a-zA-Z: isalpha() or combined with digits: isalnum()

    • non-control code: isprintable() accepts anything except ‘\n’ and other ASCII control codes

    • \t\n \r (white space characters): isspace()

    • Suitable as variable name: isidentifier()

  • Find elements of string: s.count(w) finds the number of times w occurs in s, while s.find(w) and s.rfind(w) find the first and last position of the string w in s.
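The difference between isdecimal(), isdigit() and isnumeric() mentioned in the list above only shows up for certain Unicode characters. The sketch below compares them on a few sample strings chosen for illustration.

```python
# '7', superscript two, the fraction one half, and a decimal number
samples = ['7', '\u00b2', '\u00bd', '7.5']

results = {}
for text in samples:
    results[text] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(repr(text), *results[text])

# '7'    True  True  True
# '²'    False True  True
# '½'    False False True
# '7.5'  False False False   (the dot is not a digit)
```

For validating ordinary user input such as '30', any of the three gives the same answer.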

# Example:

s="Hello World"
print("The length of '%s' is"%s,len(s),"characters") # len() gives length of the string

s.startswith("Hello") and s.endswith("World") # check start/end

# count strings
print("There are %d 'l's but only %d World in %s" % (s.count('l'),s.count('World'),s))

print('"el" is at index',s.find('el'),"in",s) #index from 0 or -1
The length of 'Hello World' is 11 characters
There are 3 'l's but only 1 World in Hello World
"el" is at index 1 in Hello World

Advanced string processing#

For more advanced string processing there are many libraries available in Python including for example:

  • re for regular expression based searching and splitting of strings

  • html for manipulating HTML format text

  • textwrap for reformatting ASCII text

  • … and many more
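As a brief taste of the three standard-library modules listed above, the sketch below uses re.split(), re.findall(), html.escape() and textwrap.fill(). The sample strings are made up for illustration.

```python
import html
import re
import textwrap

text = 'one, two;three   four'

# re.split() accepts a pattern, so several separators can be handled at once.
words = re.split(r'[,;\s]+', text)
print(words)                                  # ['one', 'two', 'three', 'four']

# re.findall() returns every non-overlapping match.
numbers = re.findall(r'\d+', 'Day 1, Chp 12')
print(numbers)                                # ['1', '12']

# html.escape() makes text safe to embed in HTML.
escaped = html.escape('<b>Tom & Jerry</b>')
print(escaped)                                # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

# textwrap.fill() wraps a long line to a given width.
wrapped = textwrap.fill('Strings are immutable sequences of Unicode characters.', width=30)
print(wrapped)
```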