Python: Strings


A sequence in Python is a positionally ordered collection of objects. Sequences maintain a left-to-right order among the items they contain. Their items are stored and fetched by their relative position.

Strings are used to record textual information as well as arbitrary collections of bytes. Strictly speaking, strings are sequences of one-character strings; other types of sequences include lists and tuples.

Creating a string variable

Since Python is a dynamically typed language, variables need not be declared before they are used; a variable is created the first time a value is assigned to it. Creating a string variable is as simple as assigning a sequence of characters, enclosed in single or double quotes, to a name.

>>> s = 'hello'
>>> s
'hello'
>>> type(s)
<type 'str'>

# double quotes do the same thing
>>> ss = "world"
>>> ss
'world'
>>> type(ss)
<type 'str'>

Note that in Python everything is an object. Here both s and ss are objects of class str. The <type 'str'> display above is what Python 2 prints; Python 3 shows <class 'str'> instead.
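As a small sketch of the point above, the snippet below checks that the quote style is not part of the value and that both spellings produce str objects. The variable names are made up for illustration.

```python
# Single and double quotes build the same kind of object.
single = 'hello'
double = "hello"

# The quote style is not part of the value, so the two compare equal.
same_value = (single == double)

# Both names refer to objects of class str.
both_str = isinstance(single, str) and isinstance(double, str)

# Using the other quote style avoids escaping a quote inside the text.
quoted = "it's"
also_quoted = 'it\'s'

print(same_value, both_str, quoted == also_quoted)
```

Having two quote styles is mostly a convenience: pick whichever one does not clash with the characters inside the string.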

Sequence operations
As sequences, strings support operations that assume a positional ordering among items. For example, if we have a five-character string, we can verify its length with the built-in len function and fetch its components with indexing expressions:

>>> s = 'magic'
>>> len(s)
5
>>> s[0]   # the first item from the left
'm'
>>> s[1]   # the second item from the left
'a'

In Python, indexes are coded as offsets from the front, and so start from 0: the first item is at index 0, the second is at index 1, and so on.
We can also index backward, from the end—positive indexes count from the left, and negative indexes count back from the right:

>>> s[-1]   # the last item
'c'
>>> s[-2]   # the second-to-last item
'i'

Formally, a negative index is simply added to the string’s size, so the following two operations are equivalent (though the first is easier to code and harder to get wrong):

>>> s[-1]        # the last item in s
'c'
>>> s[len(s)-1]  # negative indexing the hard way
'c'

Notice that we can use an arbitrary expression in the square brackets, not just a hard-coded number literal. Anywhere that Python expects a value, we can use a literal, a variable, or any expression. Python’s syntax is completely general this way.
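To make the rule concrete, here is a minimal sketch that checks the equivalence for every valid negative offset of a sample string, and then uses a computed expression as an index. The names pairs, all_equal and middle are illustrative only.

```python
s = 'magic'

# For every valid negative offset -k, s[-k] is the same item as s[len(s) - k].
pairs = [(s[-k], s[len(s) - k]) for k in range(1, len(s) + 1)]
all_equal = all(a == b for a, b in pairs)

# Any expression is allowed inside the brackets, for example the middle offset.
middle = s[len(s) // 2]

print(all_equal, middle)
```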

In addition to array-like positional indexing, Python also supports a more general form of indexing known as slicing, which is a way to extract an entire section (slice) of a string in a single step.

>>> s        # a five-character string
'magic'
>>> s[1:3]   # slice of s from offsets 1 through 2 (not 3)
'ag'

Probably the easiest way to think of slices is that they are a way to extract an entire column from a string in a single step. Their general form, X[I:J], means “give me everything in X from offset I up to but not including offset J.” The result is returned in a new object. The second of the preceding operations, for instance, gives us all the characters in string s from offsets 1 through 2 (that is, 3 – 1) as a new string. The effect is to slice or “parse out” the two characters in the middle.

In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations:

>>> s[1:]   # everything past the first (1:len(s))
'agic'
>>> s       # s itself hasn't changed
'magic'
>>> s[0:4]  # everything but the last
'magi'
>>> s[:4]   # same as s[0:4]
'magi'
>>> s[:-1]  # everything but the last again, but simpler (0:-1)
'magi'
>>> s[:]    # all of s as a top-level copy (0:len(s))
'magic'
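The default bounds can also be checked programmatically. The sketch below compares each shorthand slice with its fully spelled-out form and shows that splitting a string at any offset and joining the halves gives back the original. The variable names are illustrative.

```python
s = 'magic'

# Each shorthand slice equals the version with both bounds written out.
defaults_match = (
    s[1:] == s[1:len(s)]
    and s[:4] == s[0:4]
    and s[:-1] == s[0:len(s) - 1]
    and s[:] == s[0:len(s)]
)

# Because the right bound is exclusive, s[:i] and s[i:] never overlap,
# so joining them rebuilds the original string for every offset i.
splits_rejoin = all(s[:i] + s[i:] == s for i in range(len(s) + 1))

head, tail = s[:2], s[2:]
print(defaults_match, splits_rejoin, head, tail)
```

The exclusive right bound is what makes the split-and-rejoin property hold, which is one practical reason for the "up to but not including" convention.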

Finally, as sequences, strings also support concatenation with the plus sign (joining two strings into a new string) and repetition (making a new string by repeating another):

>>> s + 'ian'  # concatenation
'magician'
>>> s          # s is unchanged
'magic'
>>> s * 3      # repetition
'magicmagicmagic'

Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python (and of object-oriented programming more broadly) called polymorphism: the meaning of an operation depends on the objects being operated on.
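A short sketch of this idea: the same + and * operators are applied to numbers, strings and lists, and each type supplies its own meaning. The last part shows that the operands still have to be compatible; the flag name mixed_ok is just for illustration.

```python
# The same operator, three different meanings.
number_sum = 2 + 3            # numeric addition
text_join = 'mag' + 'ic'      # string concatenation
list_join = [1, 2] + [3]      # list concatenation

# Repetition is polymorphic in the same way.
text_repeat = 'ab' * 3
list_repeat = [0] * 3

# Polymorphism does not mean anything goes: a string and a number
# cannot be added together, and Python raises TypeError.
try:
    result = 'magic' + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(number_sum, text_join, list_join, text_repeat, list_repeat, mixed_ok)
```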

Notice that in the prior examples, we were not changing the original string with any of the operations we ran on it. Every string operation is defined to produce a new string as its result, because strings are immutable in Python—they cannot be changed in-place after they are created.

For example, you can’t change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name. Because Python cleans up old objects as you go, this isn’t as inefficient as it may sound:

>>> s
'magic'
>>> s[0] = 'z'    # immutable objects can't be changed
... error text ...
TypeError: 'str' object does not support item assignment

>>> s = 'z' + s[1:] # but we can run expressions to make new objects
>>> s
'zagic'

Every object in Python is classified as either immutable (unchangeable) or not. In terms of the core types, numbers, strings, and tuples are immutable; lists and dictionaries are mutable (they can be changed in-place freely). Among other things, immutability can be used to guarantee that an object remains constant throughout your program.
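The contrast between immutable strings and mutable lists can be sketched as follows. The approach of converting to a list, editing it, and joining the pieces back together is one common workaround, not the only one; the names letters and rebuilt are illustrative.

```python
s = 'magic'

# Item assignment on a string raises TypeError because str is immutable.
try:
    s[0] = 'l'
    changed_in_place = True
except TypeError:
    changed_in_place = False

# A list is mutable, so the same kind of assignment works on it.
letters = list(s)          # ['m', 'a', 'g', 'i', 'c']
letters[0] = 'l'
letters[1] = 'o'

# Joining the edited list builds a brand-new string; s is untouched.
rebuilt = ''.join(letters)

print(changed_in_place, s, rebuilt)
```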

String-specific Methods
Every string operation we’ve studied so far is really a sequence operation—that is, these operations will work on other sequences in Python as well, including lists and tuples. In addition to generic sequence operations, though, strings also have operations all their own, available as methods—functions attached to the object, which are triggered with a call expression.

The string find method is the basic substring search operation (it returns the offset of the passed-in substring, or −1 if it is not present), and the string replace method performs global searches and replacements:

>>> s.find('ag')          # find the offset of a substring
1
>>> s
'zagic'
>>> s.replace('za','lo')  # replace occurrences of a substring with another
'logic'

Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results. Because strings are immutable, we have to do it this way.
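The behaviour described above can be sketched with a fresh string: find reports an offset or -1, replace substitutes every occurrence, and the original string is left alone. The sample values are made up for illustration.

```python
s = 'magic'

found = s.find('gi')       # offset where 'gi' starts
missing = s.find('xyz')    # -1 because 'xyz' does not occur in s

# replace returns a new string with every occurrence substituted.
replaced = s.replace('mag', 'log')
all_replaced = 'aaa,bbbb'.replace('a', 'x')

# The original string is unchanged by either call.
unchanged = (s == 'magic')

print(found, missing, replaced, all_replaced, unchanged)
```

To keep a result, assign it to a name (possibly the same one), as in s = s.replace('mag', 'log').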

String methods are the first line of text-processing tools in Python. Other methods split a string into substrings on a delimiter (handy as a simple form of parsing), perform case conversions, test the content of the string (digits, letters, and so on), and strip whitespace characters off the ends of the string:

>>> line = 'aaa,bbbb,cc,dd'
>>> line.split(',')          # split on a delimiter into a list of substrings
['aaa', 'bbbb', 'cc', 'dd']
>>> s = 'magic'
>>> s.upper()    # upper- and lower-case conversions
'MAGIC'
>>> s.lower()
'magic'
>>> s.isalpha()  # content tests: isalpha(), isdigit(), etc.
True
>>> s.isdigit()
False
>>> line = 'aaa,bbbb,cc,dd\n'
>>> line.rstrip()  # remove whitespace characters on the right side
'aaa,bbbb,cc,dd'
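These methods are often chained together. As a rough sketch, the snippet below strips the trailing newline from a line, splits it into fields, and then uses case conversion and a content test on the pieces. The second input string is invented for the example.

```python
line = 'aaa,bbbb,cc,dd\n'

# Strip the trailing newline first, then split on the delimiter.
fields = line.rstrip().split(',')

# Case conversion applied to each field produces new strings.
upper_fields = [field.upper() for field in fields]

# Content tests can filter fields, here keeping only all-digit ones.
numeric = [field for field in '10,x7,42'.split(',') if field.isdigit()]

print(fields, upper_fields, numeric)
```

Each call returns a new object, so the calls can be chained left to right without modifying line.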

Strings also support an advanced substitution operation known as formatting, available as both an expression (the original) and a string method call (new in 2.6 and 3.0):

>>> '%s, tricks, %s' % (s,'illusion')     # Formatting expression (all versions)
'magic, tricks, illusion'

>>> '{0},tricks,{1}'.format(s,'illusion') # Formatting method (Python 2.6 and later)
'magic,tricks,illusion'
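The two formatting styles can be compared side by side. In this sketch the same template is written once as an expression and once as a method call, and the numbered fields of format are then used to reorder the arguments. The template text is illustrative.

```python
s = 'magic'

# The % expression fills in the targets from the tuple, left to right.
by_expression = '%s, tricks, %s' % (s, 'illusion')

# The format method fills numbered fields from its arguments.
by_method = '{0}, tricks, {1}'.format(s, 'illusion')

# Numbered fields may appear in any order in the template.
reordered = '{1} uses {0}'.format(s, 'illusion')

print(by_expression, by_method, reordered, sep='\n')
```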

Note: Although sequence operations are generic, methods are not. Some types share some method names, but string method operations generally work only on strings, and nothing else. As a rule of thumb, Python’s toolset is layered: generic operations that span multiple types show up as built-in functions or expressions (e.g., len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()).
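The layering can be seen in a quick sketch: len and indexing work on a string, a list and a tuple alike, while upper exists only on the string. The sample objects are made up for illustration.

```python
text = 'magic'
items = ['m', 'a', 'g']
pair = ('m', 'a')

# Generic operations work across all three sequence types.
lengths = [len(text), len(items), len(pair)]
firsts = [text[0], items[0], pair[0]]

# The upper method is string-specific; lists and tuples do not have it.
has_upper = [hasattr(obj, 'upper') for obj in (text, items, pair)]

# Calling a missing method raises AttributeError.
try:
    items.upper()
    list_upper_failed = False
except AttributeError:
    list_upper_failed = True

print(lengths, firsts, has_upper, list_upper_failed)
```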


About Deepak Devanand

Seeker of knowledge