Language fundamentals#
This chapter will start with a short tutorial to familiarize you with the Python language. You will quickly see the similarities with the programming language you learned during your bachelor's studies. Remember, our goal here is to formalize and name the programming constructs (semantics). As we discussed in Unit 01, using clear semantics is crucial for understanding software documentation and for “asking questions the right way” in search engines.
An entry level tutorial#
Let’s start by following a simple tutorial together.
Tip
You can simply read through the examples and try to remember them. This might work out for those of you with programming experience. For the majority of you, I highly recommend opening an ipython interpreter (or a jupyter notebook) to test the commands yourself as the tutorial goes on. You can open the interpreter on MyBinder, your laptop, or through JupyterHub in OLAT.
Copyright notice: many of these examples and explanations are copy-pasted from the official python tutorial.
Python as a Calculator#
The interpreter acts as a simple calculator: you can type an expression and it will write the value. Expression syntax is straightforward: the operators +, -, * and / work just like in most other languages:
2 + 2
4
50 - 5*6
20
8 / 5 # division always returns a floating point number
1.6
Comments in Python start with the hash character, #, and extend to the end of the physical line. A comment may appear at the start of a line or following whitespace or code:
# this is the first comment
spam = 1 # and this is the second comment
# ... and now a third!
Parentheses () can be used for grouping:
(50 - 5 * 6) / 4
5.0
With Python, the ** operator is used to calculate powers:
5 ** 2
25
The equal sign (=) is used to assign a value to a variable (variable assignment). Afterwards, no result is displayed before the next interactive prompt:
width = 20
height = 5 * 9
width * height
900
Tip
I remember my first programming class very well: the professor wrote i = i + 1 on the blackboard, and I was horrified: how can one write something so obviously wrong?
Many programming instructors recommend against reading out variable assignments as “name equals value” (i.e. from the example above: “i equals i + 1”), because it wrongly associates the = operator with “equals” in spoken language or mathematics.
A much better translation in spoken language would be “i becomes i + 1” or “i is assigned i + 1”. Try to remember this - I will do my best to use this in class as well, but I might forget.
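To make the “becomes” reading concrete, here is a minimal sketch (the variable name i and the starting value 5 are only illustrations): the right-hand side is evaluated first, and the result is then bound to the name on the left.

```python
i = 5
i = i + 1   # read: "i becomes i + 1"; the right-hand side (5 + 1) is evaluated first
print(i)    # 6

i += 1      # augmented assignment: shorthand for i = i + 1
print(i)    # 7
```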
If a variable is not “defined” (assigned a value), trying to use it will give you an error:
n # trying to access an undefined variable raises an error
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[8], line 1
----> 1 n # trying to access an undefined variable raises an error
NameError: name 'n' is not defined
In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:
tax = 12.5 / 100
price = 100.50
price * tax
12.5625
price + _
113.0625
The variable _ should be treated as read-only, and used in the interpreter only.
Strings#
Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result:
'spam eggs'
'spam eggs'
"spam eggs"
'spam eggs'
The double quotes are useful if you need to use a single quote in a string:
"doesn't"
"doesn't"
Alternatively, \ can be used to escape quotes:
'doesn\'t'
"doesn't"
If you do not want characters prefaced by \ to be interpreted as special characters, you can use raw strings by adding an r before the first quote. This is useful for Windows paths:
print('C:\some\name') # here \n means newline!
C:\some
ame
<>:1: SyntaxWarning: invalid escape sequence '\s'
print(r'C:\some\name') # note the r before the quote
C:\some\name
For Windows users
Windows users: remember this trick! Paths to files or folders are used constantly in programming.
Strings can be concatenated (glued together) with the + operator, and repeated with *:
("She's a " + 'witch! ') * 3
"She's a witch! She's a witch! She's a witch! "
Strings can be indexed (subscripted), with the first character having index 0:
word = 'Python'
word[0] # character in position 0
'P'
word[5] # character in position 5
'n'
Indices may also be negative numbers, to start counting from the right:
word[-1] # last character
'n'
word[-2] # second-last character
'o'
In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:
word[0:2] # characters from position 0 (included) to 2 (excluded)
'Py'
word[2:5] # characters from position 2 (included) to 5 (excluded)
'tho'
Note how the start is always included, and the end always excluded. This makes sure that s[:i] + s[i:] is always equal to s:
word[:2] + word[2:]
'Python'
Attempting to use an index that is too large will result in an error:
word[42] # the word only has 6 characters: this will raise an error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[25], line 1
----> 1 word[42] # the word only has 6 characters: this will raise an error
IndexError: string index out of range
However, out of range slice indexes are handled gracefully when used for slicing:
word[4:42]
'on'
word[42:]
''
The built-in function len() returns the length of a string:
s = 'supercalifragilisticexpialidocious'
len(s)
34
Basic data types#
Now that you are more familiar with the basics, let’s start to name things “the right way”. For example: an informal way to describe a programming language is to say that it “does things with stuff”.
This “stuff” is formally called “objects” in python. We will define objects more precisely towards the end of the course, but for now remember one important thing: in python, everything is an object. Yes, everything.
Python objects have a type (synonym: data type). In the previous tutorial, you used exclusively built-in types. Built-in data types are directly available in the interpreter, as opposed to other data types which may be obtained either by importing them (e.g. from collections import OrderedDict) or by creating new data types yourselves.
Asking for the type of an object#
type(1)
int
a = 'Hello'
type(a)
str
Try print(type(a)) instead to see the difference with ipython’s simplified print. What is the type of type, by the way?
Numeric types#
There are three distinct numeric types: integers (int), floating point numbers (float), and complex numbers (complex). We will talk about these in more detail in the numerics chapter.
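As a quick illustration of the three numeric types, here is a minimal sketch (the values are arbitrary) showing how to ask for their type and what happens when they are mixed or converted:

```python
print(type(7), type(7.0), type(7 + 2j))

# Mixing an int and a float gives a float
print(type(7 + 1.5))

# / always returns a float; // is floor division and % the remainder
print(7 / 2, 7 // 2, 7 % 2)   # 3.5 3 1

# int() and float() convert between numeric types (int() truncates towards zero)
print(int(3.99), float(3))    # 3 3.0
```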
Booleans#
There is a built-in boolean data type (bool) useful to test for truth value. Examples:
type(True), type(False)
(bool, bool)
type(a == 'Hello')
bool
3 < 5
True
Note that there are other rules about testing for truth in python: for example, empty containers (such as the empty string) are considered false. This is quite convenient if you want to avoid doing operations on invalid or empty containers:
if '':
    print('This should not happen')
In Python, like in C, any non-zero integer value is true; zero is false:
if 1 and 2:
    print('This will happen')
This will happen
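The truth rules mentioned above can be explored with the built-in bool() function. The following is a small sketch with arbitrary example values (the list name numbers is hypothetical):

```python
# bool() shows the truth value Python assigns to an object
print(bool(''), bool([]), bool({}), bool(0), bool(None))  # all False
print(bool('a'), bool([0]), bool(3.14))                   # all True

numbers = []
if not numbers:
    print('The list is empty, nothing to do')

# "and" returns one of its operands, not necessarily True or False
print(1 and 2)  # 2
print(0 and 2)  # 0
```

Note that [0] is true: what matters for a container is whether it is empty, not what it contains.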
Refer to the docs for an exhaustive list of boolean operations and comparison operators.
Text#
In python (and many other languages) text sequences are named strings (str), which can be of any length:
type('Français, 汉语') # unicode characters are no problem in Python
str
Unlike some languages, there is no special type for characters:
for char in 'string':
    # "char" is also a string of length 1
    print(char, type(char))
s <class 'str'>
t <class 'str'>
r <class 'str'>
i <class 'str'>
n <class 'str'>
g <class 'str'>
Since strings behave like lists in many ways, they are often classified together with the sequence types, as we will see below.
Python strings cannot be changed - they are immutable. Therefore, assigning to an indexed position in the string results in an error:
word = 'Python'
word[0] = 'J'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[38], line 2
1 word = 'Python'
----> 2 word[0] = 'J'
TypeError: 'str' object does not support item assignment
Python objects have methods attached to them. We will learn more about methods later, but here is an example:
word.upper() # the method .upper() converts all letters in a string to upper case
'PYTHON'
"She's a witch!".split(' ') # the .split() method divides strings using a separator
["She's", 'a', 'witch!']
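Two more string methods that pair well with the examples above are .join() and .replace(). This is only a small sketch (the variable names are made up); it also shows that methods return new strings and leave the original untouched, as expected for an immutable type:

```python
sentence = "She's a witch!"
words = sentence.split(' ')      # ["She's", 'a', 'witch!']
rejoined = ' '.join(words)       # glue the pieces back together with a space
print(rejoined == sentence)      # True

# Methods can be chained; each call returns a new string
shout = sentence.upper().replace('WITCH', 'NEWT')
print(shout)                     # SHE'S A NEWT!
print(sentence)                  # She's a witch!
```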
Sequence types - list, tuple, range#
Python knows a number of sequence data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.
squares = [1, 4, 9, 16, 25, 36, 49]
squares
[1, 4, 9, 16, 25, 36, 49]
Lists can be indexed and sliced:
squares[0]
1
squares[-3:]
[25, 36, 49]
squares[0:7:2] # new slicing! From element 0 to 7 in steps of 2
[1, 9, 25, 49]
squares[::-1] # new slicing! All elements in steps of -1, i.e. reverse
[49, 36, 25, 16, 9, 4, 1]
Warning
Lists are not the equivalent of arrays in Matlab. One major difference is that the addition operator concatenates lists together (like strings), instead of adding the numbers elementwise like in Matlab. For example:
squares + [64, 81, 100]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Unlike strings, which are immutable, lists are a mutable type, i.e. it is possible to change their content:
cubes = [1, 8, 27, 65, 125] # something's wrong here
cubes[3] = 64
cubes
[1, 8, 27, 64, 125]
Assignment to slices is also possible, and this can even change the size of the list:
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters[2:5] = ['C', 'D', 'E'] # replace some values
letters
['a', 'b', 'C', 'D', 'E', 'f', 'g']
letters[2:5] = [] # now remove them
letters
['a', 'b', 'f', 'g']
The built-in function len() also applies to lists:
len(letters)
4
It is possible to nest lists (create lists containing other lists), as it is possible to store different objects in lists. For example:
a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n, 3.14]
x
[['a', 'b', 'c'], [1, 2, 3], 3.14]
x[0][1]
'b'
Lists also have methods attached to them (see 5.1 More on lists for the most commonly used). For example:
alphabet = ['c', 'b', 'd']
alphabet.append('a') # add an element to the list
alphabet
['c', 'b', 'd', 'a']
alphabet.sort() # sort it
alphabet
['a', 'b', 'c', 'd']
Other sequence types include: string, tuple, range. Sequence types support a common set of operations and are therefore very similar:
l = [0, 1, 2]
t = (0, 1, 2)
r = range(3)
s = '123'
# Test if elements can be found in the sequence(s)
1 in l, 1 in t, 1 in r, '1' in s
(True, True, True, True)
# Ask for the length
len(l), len(t), len(r), len(s)
(3, 3, 3, 3)
# Addition
print(l + l)
print(t + t)
print(s + s)
[0, 1, 2, 0, 1, 2]
(0, 1, 2, 0, 1, 2)
123123
The addition operator does not work for the range type though. Ranges are a little different from lists or strings:
r = range(2, 13, 2)
r # r is an object of type "range". It does not print all the values, just the interval and steps
range(2, 13, 2)
list(r) # applying list() converts range objects to a list of values
[2, 4, 6, 8, 10, 12]
Ranges are usually used as loop counters or to generate other sequences. Ranges have a strong advantage over lists and tuples: their elements are generated when they are needed, not before. Ranges have therefore a very low memory consumption. See the following:
range(2**100) # no problem
range(0, 1267650600228229401496703205376)
list(range(2**100)) # trying to make a list of values out of it results in an error
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Cell In[62], line 1
----> 1 list(range(2**100)) # trying to make a list of values out of it results in an error
OverflowError: Python int too large to convert to C ssize_t
The OverflowError tells me that the requested list would have more elements than Python is able to index. Such a list would be far too big to fit into memory anyway.
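The low memory consumption of ranges can be checked with sys.getsizeof() from the standard library. This is a rough sketch: the exact numbers depend on the Python implementation, so only comparisons are shown.

```python
import sys

small = range(10)
huge = range(10**6)

# A range object only stores start, stop and step, so (on CPython)
# its size does not depend on the number of elements
print(sys.getsizeof(small), sys.getsizeof(huge))

# The equivalent list has to store a reference to every element
print(sys.getsizeof(list(huge)) > sys.getsizeof(huge))  # True

# Indexing and membership tests still work without building a list
print(huge[-1], 999_999 in huge)  # 999999 True
```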
The “tuple” data type is probably a new concept for you, as tuples are quite specific to python. A tuple behaves almost like a list, but the major difference is that a tuple is immutable:
l[1] = 'ha!' # I can change an element of a list
l
[0, 'ha!', 2]
t[1] = 'ha?' # But I cannot change an element of a tuple
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[64], line 1
----> 1 t[1] = 'ha?' # But I cannot change an element of a tuple
TypeError: 'tuple' object does not support item assignment
It is their immutability which makes tuples useful, but for beginners this is not really obvious at first sight. We will get back to tuples later in the lecture.
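In the meantime, here is a small sketch of where tuples typically show up: grouping a fixed number of values, and as return values of built-in functions. The coordinates below are made-up illustration values.

```python
# A tuple groups a fixed number of values; it can be "unpacked" into names
point = (47.26, 11.39)   # hypothetical (latitude, longitude) pair
lat, lon = point
print(lat, lon)

# Some built-in functions return tuples, e.g. divmod() returns (quotient, remainder)
result = divmod(17, 5)
print(result, type(result))
q, r = result
print(q, r)  # 3 2
```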
Sets#
Sets are an unordered collection of distinct objects:
s1 = {'why', 1, 9}
s2 = {9, 'not'}
s1
{1, 9, 'why'}
# Let's compute the union of these two sets. We use the method ".union()" for this purpose:
s1.union(s2) # 9 was already in the set, however it is not doubled in the union
{1, 9, 'not', 'why'}
Sets are useful for operations such as intersection, union, difference, and symmetric difference between collections of objects. You will not use them much this semester, but remember that they exist.
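The four operations named above can be written with methods (like .union()) or with operators. A minimal sketch with two arbitrary sets:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a & b)  # intersection: elements in both sets
print(a | b)  # union: elements in either set
print(a - b)  # difference: elements in a but not in b
print(a ^ b)  # symmetric difference: elements in exactly one of the sets

# A common use: removing duplicates from a list (the original order is lost)
unique = set([1, 1, 2, 3, 3])
print(unique == {1, 2, 3})  # True
```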
Mapping types - dictionaries#
A mapping object maps values (keys) to arbitrary objects (values): the most frequently used mapping object is called a dictionary. It is a collection of (key, value) pairs:
tel = {'jack': 4098, 'sape': 4139}
tel
{'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel
{'jack': 4098, 'sape': 4139, 'guido': 4127}
del tel['sape']
tel
{'jack': 4098, 'guido': 4127}
Keys can be of any immutable type: e.g. strings and numbers are often used as keys. The keys in a dictionary are all unique (they have to be):
d = {'a':1, 2:'b', 'c':1} # a, 2, and c are keys
d
{'a': 1, 2: 'b', 'c': 1}
You can ask whether a key exists in a dict with the in operator:
2 in d
True
However, the in operator only looks at the keys, not at the values. Here 1 is a value but not a key (to test for a value, you would have to write 1 in d.values() instead):
1 in d
False
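Applied to a dict, the in operator only looks at the keys. The sketch below uses the same dictionary d and shows three standard dict methods (.values(), .get() and .items()) to look at values and (key, value) pairs; the key 'z' and the fallback text are arbitrary.

```python
d = {'a': 1, 2: 'b', 'c': 1}

print(1 in d)             # False: 1 is not a key
print(1 in d.values())    # True: 1 is one of the values

# .get() returns a fallback instead of raising a KeyError for a missing key
print(d.get('z', 'not found'))

# .items() yields (key, value) tuples, which can be unpacked in a loop
for key, value in d.items():
    print(key, '->', value)
```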
Warning
A python dict remembers the order in which the keys have been added to it, but this is only guaranteed by the language since Python 3.7. In Python 3.6 it was an implementation detail of CPython, and in older python versions the order is not preserved, so you should not count on it if your code has to run there.
Dictionaries are (together with lists) the container type you will use the most often.
Note: there are other container types in python, but they are used less often. See Container datatypes in the official documentation.
Can you think of examples of application of a dict? Describe a couple of them!
Semantics parenthesis: “literals”#
Literals are the fixed values of a programming language (“notations”). Some of them are pretty universal, like numbers or strings (9, 3.14, "Hi!", all literals); some are more language specific and belong to the language’s syntax. Curly brackets {} for example are the literal representation of a dict. The literal syntax has been added for convenience only:
d1 = dict(bird='parrot', plant='crocus') # one way to make a dict
d2 = {'bird':'parrot', 'plant':'crocus'} # another way to make a dict
d1 == d2
True
Both {} and dict() are equivalent: using one or the other to construct your containers is a matter of taste, but in practice you will see the literal version more often.
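The same duality between a literal and a constructor exists for the other containers seen in this chapter. A short sketch, with one pitfall worth knowing:

```python
print([1, 2, 3] == list((1, 2, 3)))   # list literal vs. list()
print((1, 2, 3) == tuple([1, 2, 3]))  # tuple literal vs. tuple()
print({1, 2, 3} == set([1, 2, 3]))    # set literal vs. set()

# Careful: empty curly brackets are an empty dict, not an empty set
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
```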
Control flow#
First steps towards programming#
Of course, we can use Python for more complicated tasks than adding two and two together. For instance, we can write an initial sub-sequence of the Fibonacci series as follows:
# Fibonacci series:
# the sum of two previous elements defines the next
a, b = 0, 1
while a < 10:
    print(a)
    a, b = b, a+b
0
1
1
2
3
5
8
This example introduces several new features.
The first line contains a multiple assignment: the variables a and b simultaneously get the new values 0 and 1. On the last line this is used again, demonstrating that the expressions on the right-hand side are all evaluated first before any of the assignments take place. The right-hand side expressions are evaluated from the left to the right.
The while loop executes as long as the condition (here: a < 10) remains true. The standard comparison operators are written the same as in C: < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).
The body of the loop is indented: indentation is Python’s way of grouping statements, and not via brackets or begin .. end statements. Hate it or love it, this is how it is ;-). I learned to like this style a lot. Note that each line within a basic block must be indented by the same amount. Although the indentation could be anything (two spaces, three spaces, tabs…), the recommended way is to use four spaces.
The print() function accepts multiple arguments:
i = 256*256
print('The value of i is', i)
The value of i is 65536
The keyword argument (see definition below) end can be used to avoid the newline after the output, or end the output with a different string:
a, b = 0, 1
while a < 1000:
    print(a, end=',')
    a, b = b, a+b
0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,
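The multiple assignment used in the Fibonacci example relies on the right-hand side being fully evaluated before any name is assigned. A classic way to see this is swapping two variables; the sketch below (with arbitrary values) compares it with the version using a temporary variable:

```python
a, b = 1, 2
a, b = b, a          # swap: the values (b, a) are evaluated before anything is assigned
print(a, b)          # 2 1

# Without multiple assignment, a temporary variable would be needed
x, y = 1, 2
tmp = x
x = y
y = tmp
print(x, y)          # 2 1
```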
The if statement#
Perhaps the most well-known statement type is the if statement:
x = 12
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
More
There can be zero or more elif parts, and the else part is optional. The keyword elif is short for “else if”, and is useful to avoid excessive indentation.
The for statement#
The for loops in python can be quite different from those in other languages: in python, one iterates over sequences, not indexes. This is a feature I very much like for its readability:
words = ['She', 'is', 'a', 'witch']
for w in words:
    print(w)
She
is
a
witch
The equivalent for loop with a counter is considered “unpythonic”, i.e. not elegant.
Unpythonic:
seq = ['This', 'is', 'very', 'unpythonic']
# Do not do this at home!
n = len(seq)
for i in range(n):
    print(seq[i])
This
is
very
unpythonic
Pythonic:
seq[-1] = 'pythonic'
for s in seq:
    print(s)
This
is
very
pythonic
for i in range(xx) is almost never what you want to do in python. If you have several sequences you want to iterate over, then do:
squares = [1, 4, 9, 25]
for s, l in zip(seq, squares):
    print(l, s)
1 This
4 is
9 very
25 pythonic
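If you do need the position of each element, the built-in enumerate() is usually preferred over a manual counter. The sketch below also shows a property of zip() worth knowing: it stops at the shortest sequence. The extra element 36 is only there to illustrate this.

```python
seq = ['This', 'is', 'very', 'pythonic']

# enumerate() yields (index, element) pairs, so no manual counter is needed
for i, s in enumerate(seq):
    print(i, s)

# zip() stops at the shortest sequence: the extra element 36 is silently ignored
pairs = list(zip(seq, [1, 4, 9, 25, 36]))
print(pairs)
```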
The break and continue statements#
The break statement breaks out of the innermost enclosing for or while loop:
for letter in 'Python':
    if letter == 'h':
        break
    print('Current letter:', letter)
Current letter: P
Current letter: y
Current letter: t
The continue statement continues with the next iteration of the loop:
for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found a number", num)
Found an even number 2
Found a number 3
Found an even number 4
Found a number 5
Found an even number 6
Found a number 7
Found an even number 8
Found a number 9
Defining functions#
A first example#
def fib(n):
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
# Now call the function we just defined:
fib(2000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
The def statement introduces a function definition. It must be followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented.
The first statement of the function body can optionally be a string literal; this string literal is the function’s documentation string, or docstring (more about docstrings later: in the meantime, make a habit out of it).
A function definition introduces the function name in the current scope (we will learn about scopes soon). The value of the function name has a type that is recognized by the interpreter as a user-defined function. This value can be assigned to another name which can then also be used as a function. This serves as a general renaming mechanism:
fib
<function __main__.fib(n)>
f = fib
f(100)
0 1 1 2 3 5 8 13 21 34 55 89
Coming from other languages, you might object that fib is not a function but a procedure since it does not return a value. In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it is a built-in name). Writing the value None is normally suppressed by the interpreter if it would be the only value written. You can see it if you really want to by using print():
fib(0) # shows nothing
print(fib(0)) # prints None
None
It is simple to write a function that returns a list of the numbers of the Fibonacci series, instead of printing it:
def fib2(n):  # return Fibonacci series up to n
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result
r = fib2(100) # call it
r # print the result
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
Positional and keyword arguments#
Functions have two types of arguments: positional arguments and keyword arguments.
Keyword arguments are preceded by an identifier (e.g. name=) and are attributed a default value. They are therefore optional:
def f(arg1, arg2, kwarg1=None, kwarg2='Something'):
    """Some function with arguments."""
    print(arg1, arg2, kwarg1, kwarg2)
f(1, 2) # no need to specify them - they are optional and have default values
1 2 None Something
f(1, 2, kwarg1=3.14, kwarg2='Yes') # but you can set them to a new value
f(1, 2, kwarg2='Yes', kwarg1=3.14) # and the order is not important!
1 2 3.14 Yes
1 2 3.14 Yes
Unfortunately, it is also possible to set keyword arguments without naming them, in which case the order matters:
f(1, 2, 'Yes', 'No')
1 2 Yes No
I am not a big fan of this feature because it reduces the clarity of the code. I recommend always using the kwarg= syntax. Others agree with me, and therefore python implemented a syntax to make calls like the above illegal:
# The * before the keyword arguments makes them keyword arguments ONLY
def f(arg1, arg2, *, kwarg1=None, kwarg2='None'):
    print(arg1, arg2, kwarg1, kwarg2)
f(1, 2, 'Yes', 'No') # This now raises an error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[95], line 1
----> 1 f(1, 2, 'Yes', 'No') # This now raises an error
TypeError: f() takes 2 positional arguments but 4 were given
Positional arguments are named like this because their position matters. Unlike keyword arguments, they do not have a default value and are therefore mandatory. Forgetting to set them results in an error:
f(1)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[96], line 1
----> 1 f(1)
TypeError: f() missing 1 required positional argument: 'arg2'
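To round this off, here is a sketch of valid and invalid calls to a function with keyword-only arguments. The function g is a hypothetical variant of f that returns its arguments as a tuple instead of printing them, which makes the result easier to inspect.

```python
def g(arg1, arg2, *, kwarg1=None, kwarg2='Something'):
    """Toy function (hypothetical): return all arguments as a tuple."""
    return arg1, arg2, kwarg1, kwarg2

print(g(1, 2, kwarg2='Yes'))   # (1, 2, None, 'Yes')

# Mandatory arguments can also be passed by name, in which case the order is free
print(g(arg2=2, arg1=1))       # (1, 2, None, 'Something')

# Passing a keyword-only argument by position is an error
try:
    g(1, 2, 'Yes')
except TypeError as err:
    print('TypeError:', err)
```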
Importing modules and functions#
Although python ships with some built-in functions available in the interpreter (e.g. len(), print()), these are far from enough for real world programming. Thankfully, python comes with a mechanism which allows us to access much more functionality:
import math
print(math)
print(math.pi)
<module 'math' from '/home/c707201/mambaforge/envs/scipro/lib/python3.12/lib-dynload/math.cpython-312-x86_64-linux-gnu.so'>
3.141592653589793
math is a module, and it has attributes (e.g. pi) and functions attached to it:
math.sin(math.pi / 4) # compute a sine
0.7071067811865475
math is available in the python standard library: this means that it comes pre-installed together with python itself. Other modules can be installed (like numpy or matplotlib), but we will not need them for now.
Modules often have a thematic grouping, e.g. math, time, multiprocessing. You will learn more about them in the next lecture.
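The import statement comes in a few flavours that you will meet in other people's code. The sketch below shows three of them with the math module; the alias m is an arbitrary choice for illustration.

```python
import math                  # access names through the module: math.sqrt
import math as m             # same module, bound to a shorter alias
from math import pi, sqrt    # bring selected names directly into the current scope

print(math.sqrt(16), m.sqrt(16), sqrt(16))  # 4.0 4.0 4.0
print(m is math)                            # True
print(pi == math.pi)                        # True
```

The first form keeps it obvious where a name comes from, which is why it is often preferred in longer programs.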
Take home points#
in python, everything is an object - we will learn more about them later
for now you can remember that python objects have methods (“services”) attached to them, such as .split() for strings or .append() for lists
all objects have a data type: examples of data types include float, str, dict, list…
you can ask for the type of an object with the built-in function type()
“built-in” means that a function or data type is available at the command prompt without import statement
the “standard library” is not the same as “built-in” (the standard library is the suite of modules which come pre-installed with python)
list and dict are the container data types you will use most often, tuple is often returned by Python itself or libraries.
certain objects are immutable (str, tuple), but others are mutable and can change their state (dict, list)
in python, indentation matters! This is how you define blocks of code. Keep your indentation consistent, with 4 spaces.
in python, one iterates over sequences, not indexes (for i in ... is very rare in python and so is the variable i)
functions are defined with def, and also rely on indentation to define blocks. They can have a return statement
there are two types of arguments in functions: positional (mandatory) and keyword (optional) arguments
the import statement opens a whole new world of possibilities: you can access other standard tools that are not available at the top-level prompt
We learned the basic elements of the python syntax: to become fluent with this new language you will have to get familiar with all of the elements presented above. With time, you might want to get back to this chapter (or to the python reference documentation) to revisit what you have learned. I also highly recommend following the official python tutorial, sections 3 to 5.