Object Oriented Programming: why?#

In the first two OOP units you learned the basic semantics of OOP in python. In this unit we will attempt to provide concrete examples of the use of objects in python (and other OOP languages) and provide some arguments in favor of the use of OOP in your everyday programming tasks.

Introduction#

OOP is a tool that, when used wisely, can help you to structure your programs in a way which might be more readable, easier to maintain and more flexible than purely procedural programs. But why is that so? In this lecture, we will discuss five core concepts of OOP:

  • abstraction

  • encapsulation

  • modularity

  • polymorphism

  • inheritance

Abstraction#

Data abstraction refers to the separation between the abstract properties of an object and its internal representation. By giving a name to things and hiding unnecessary details from the user, objects provide an intuitive interface to concepts which might be very complex internally.

Going back to our examples from the last two units: we used the term “objects” in programming as a surrogate for actual objects in the real world: a cat, a pen, a car… These objects have a state (in OOP: attributes) and perform actions (in OOP: methods). For a pen, the states (attributes) could be: ink_color, ink_volume, point_size, etc. The actions (methods) could be: write(), fill_ink(), etc.
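The pen described above could be sketched as follows. This is only an illustration: the units (microliters, millimeters) and the ink cost of one microliter per character are assumptions made for the example.

```python
class Pen:
    """A pen described by its state (attributes) and its actions (methods)."""

    def __init__(self, ink_color, ink_volume=1000, point_size=0.5):
        self.ink_color = ink_color    # e.g. 'blue'
        self.ink_volume = ink_volume  # remaining ink in microliters (assumed unit)
        self.point_size = point_size  # tip width in millimeters (assumed unit)

    def write(self, text):
        """Write some text and use up a bit of ink."""
        needed = len(text)  # assumption: one microliter per character
        if needed > self.ink_volume:
            raise ValueError('Not enough ink left!')
        self.ink_volume -= needed
        return f'[{self.ink_color}] {text}'

    def fill_ink(self, volume=1000):
        """Refill the pen with the given volume of ink."""
        self.ink_volume += volume


pen = Pen('blue')
print(pen.write('hello!'))  # [blue] hello!
print(pen.ink_volume)       # 994
```

The user of a Pen only needs to know that it can write() and be refilled with fill_ink(); how the ink bookkeeping is done stays inside the class.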

OOP allows you to write programs which feel more natural and intuitive than functions and procedures. If a concept in your program is easily describable in terms of “state” and “actions”, it might be a good candidate for writing a python class.

Let’s take an example based on a widely used object in Python, an instance of the class str (a string):

a = 'hello!'

The state of our object is relatively simple to describe: it is the sequence of characters stored in the object. We have access to this state (we can print its values) but the way these values are stored in memory is abstracted away. We do not care about the details, we just want a string. Now, a string provides many actions:

a.capitalize()
'Hello!'
a.capitalize().istitle()
True
a.split('l')
['he', '', 'o!']

Abstractions should be as simple and well defined as possible. Sometimes there is more than one possible way to provide an abstraction to the user, and it becomes a debate among the developers of a project whether these abstractions are useful or not.

Well defined abstractions can be composed together. A good example is provided by the xarray library: an xarray.Dataset is composed of several xarray.DataArray objects. These xarray.DataArray objects have the function to store data (a numpy.ndarray object) together with coordinates (other numpy.ndarray objects) and attributes (units, name, etc.). The xarray.Dataset objects, in turn, have the task of storing xarray.DataArray objects. This chain of abstractions is possible only if each of these concepts has a clearly defined role: xarray does not mess around with numbers in arrays, numpy does the numerical job behind the scenes. Conversely, numpy does not care whether an array has coordinates or not: xarray does the job of tracking variable descriptions and units.

Encapsulation#

Encapsulation is tied to the concept of abstraction. By hiding the internal implementation of a class behind a defined interface, users of the class do not need to know details about the internals of the class to use it. The implementation of the class can be changed (or internal data can be modified) without having to change the code of the users of the class.
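A minimal sketch of this idea, using two hypothetical classes (TrackV1 and TrackV2 are invented for this illustration): both expose the same interface, add_leg() and total_km(), but store their data differently. The user code in hike() works unchanged with either of them.

```python
class TrackV1:
    """Stores every leg of a hike and sums them on request."""

    def __init__(self):
        self._legs = []  # internal detail: list of distances in km

    def add_leg(self, km):
        self._legs.append(km)

    def total_km(self):
        return sum(self._legs)


class TrackV2:
    """Same interface, different internals: only a running total is kept."""

    def __init__(self):
        self._total = 0  # internal detail: running sum in km

    def add_leg(self, km):
        self._total += km

    def total_km(self):
        return self._total


def hike(track):
    """User code: relies only on add_leg() and total_km()."""
    for km in [3, 5, 2]:
        track.add_leg(km)
    return track.total_km()


print(hike(TrackV1()))  # 10
print(hike(TrackV2()))  # 10
```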

In Python, encapsulation is more difficult to achieve than in other languages like Java. Indeed, Java implements the concept of private methods and attributes, which are hidden from the user by definition. In Python, nothing is hidden from the user. However, developers make use of important conventions to inform the users that a method or attribute is meant to be used by the class alone. Let’s take an xarray DataArray as an example:

import xarray as xr
import numpy as np
da = xr.DataArray([1, 2, 3])
print(dir(da))
['T', '_HANDLED_TYPES', '__abs__', '__add__', '__and__', '__annotations__', '__array__', '__array_priority__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__dask_graph__', '__dask_keys__', '__dask_layers__', '__dask_optimize__', '__dask_postcompute__', '__dask_postpersist__', '__dask_scheduler__', '__dask_tokenize__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmatmul__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__sizeof__', '__slots__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__weakref__', '__xor__', '_all_compat', '_attr_sources', '_binary_op', '_cache', '_calc_assign_results', '_close', '_construct_direct', '_coords', '_copy', '_copy_attrs_from', '_cum_extra_args_docstring', '_dask_finalize', '_from_temp_dataset', '_get_axis_num', '_getitem_coord', '_in_memory', '_indexes', '_inplace_binary_op', '_ipython_key_completions_', '_item_key_to_dict', '_item_sources', '_iter', '_name', '_overwrite_indexes', '_reduce_extra_args_docstring', '_reduce_method', '_reindex_callback', '_replace', '_replace_maybe_drop_dims', '_repr_html_', '_resample', '_result_name', '_setattr_dict', '_title_for_slice', '_to_dataset_split', '_to_dataset_whole', 
'_to_index', '_to_temp_dataset', '_unary_op', '_variable', 'all', 'any', 'argmax', 'argmin', 'argsort', 'as_numpy', 'assign_attrs', 'assign_coords', 'astype', 'attrs', 'bfill', 'broadcast_equals', 'broadcast_like', 'chunk', 'chunks', 'chunksizes', 'clip', 'close', 'coarsen', 'combine_first', 'compute', 'conj', 'conjugate', 'convert_calendar', 'coords', 'copy', 'count', 'cumprod', 'cumsum', 'cumulative', 'cumulative_integrate', 'curvefit', 'data', 'diff', 'differentiate', 'dim_0', 'dims', 'dot', 'drop', 'drop_attrs', 'drop_duplicates', 'drop_encoding', 'drop_indexes', 'drop_isel', 'drop_sel', 'drop_vars', 'dropna', 'dt', 'dtype', 'encoding', 'equals', 'expand_dims', 'ffill', 'fillna', 'from_dict', 'from_iris', 'from_series', 'get_axis_num', 'get_index', 'groupby', 'groupby_bins', 'head', 'identical', 'idxmax', 'idxmin', 'imag', 'indexes', 'integrate', 'interp', 'interp_calendar', 'interp_like', 'interpolate_na', 'isel', 'isin', 'isnull', 'item', 'load', 'loc', 'map_blocks', 'max', 'mean', 'median', 'min', 'name', 'nbytes', 'ndim', 'notnull', 'pad', 'persist', 'pipe', 'plot', 'polyfit', 'prod', 'quantile', 'query', 'rank', 'real', 'reduce', 'reindex', 'reindex_like', 'rename', 'reorder_levels', 'resample', 'reset_coords', 'reset_encoding', 'reset_index', 'roll', 'rolling', 'rolling_exp', 'round', 'searchsorted', 'sel', 'set_close', 'set_index', 'set_xindex', 'shape', 'shift', 'size', 'sizes', 'sortby', 'squeeze', 'stack', 'std', 'str', 'sum', 'swap_dims', 'tail', 'thin', 'to_dask_dataframe', 'to_dataframe', 'to_dataset', 'to_dict', 'to_index', 'to_iris', 'to_masked_array', 'to_netcdf', 'to_numpy', 'to_pandas', 'to_series', 'to_unstacked_dataset', 'to_zarr', 'transpose', 'unify_chunks', 'unstack', 'values', 'var', 'variable', 'weighted', 'where', 'xindexes']

In this (very) long list of methods and attributes, some are public and documented. For example:

da.values
array([1, 2, 3])

Other methods/attributes start with one underscore. This underscore has no special meaning in the Python language other than being a warning to the users, saying as much as: “Don’t use this method or attribute. If you do, do it at your own risk”. For example:

da._in_memory
True

_in_memory is an attribute which is meant for internal use in the class (it is called private). Setting it to another value might have unpredictable consequences, and relying on it for your own code is not recommended: the xarray developers might rename it or change it without notice.
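The same convention can be applied in your own classes. Here is a small sketch with a hypothetical WeatherStation class (invented for this example): the leading underscore marks _readings as internal, while n_readings and add_reading() form the public interface. Note that Python does not prevent access to the underscored attribute, it only signals intent.

```python
class WeatherStation:
    """Hypothetical class illustrating the leading underscore convention."""

    def __init__(self, name):
        self.name = name     # public attribute
        self._readings = []  # leading underscore: meant for internal use only

    def add_reading(self, value):
        """Public method to store a new measurement."""
        self._readings.append(value)

    @property
    def n_readings(self):
        """Public, read-only view on the internal state."""
        return len(self._readings)


station = WeatherStation('Innsbruck')
station.add_reading(12)
station.add_reading(15)
print(station.n_readings)  # 2
print(station._readings)   # [12, 15] (possible, but at your own risk)

# Filtering out the underscored names gives the public interface
public = [n for n in dir(station) if not n.startswith('_')]
print(public)  # ['add_reading', 'n_readings', 'name']
```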

The methods having two leading and trailing underscores have a special meaning in Python and are part of the language specifications. We already encountered __init__ for our class instantiation, and we will talk about some others later in this chapter.

Modularity#

Modularity is a technique to separate different portions of the program (modules) based on some logical boundary. Modularity is a general principle in programming, although object-oriented programming typically makes it more explicit by giving meaningful names and actions to the program’s tools.

Taking the example of xarray.DataArray and numpy.ndarray again: both classes have very clear domains of functionality. The latter shines at doing fast numerical computations on arrays, the former provides an intuitive abstraction to the internal arrays by giving names and coordinates to its axes. Modularity is achieved thanks to the naming and documentation of each object’s tasks and purpose.

Polymorphism#

Polymorphism is the name given to the technique of creating multiple classes that obey the same interface. The “interface” of an object is the set of public attributes and methods it defines.

Objects from different classes can be mixed at runtime if they obey the same interface. In other words, polymorphism originates from the fact that a certain action can have well defined but different meanings depending on the object it applies to.

An example of polymorphism is provided by the addition operation in python:

1 + 1
2
1 + 1.2
2.2
[1, 2] + [3, 4] + [5]
[1, 2, 3, 4, 5]
np.array([1, 2]) + [3, 4]
array([4, 6])

Each of these addition operations performs a different action depending on the objects it is applied to.

OOP relies on polymorphism to provide higher levels of abstraction. In our Cat and Dog example from last week, both classes provided a say_name() method: the internal implementation, however, was different in each case.
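The same idea applies to any set of classes sharing an interface, even without inheritance. A minimal sketch with two hypothetical classes (Square and Rectangle are not part of the lecture material): both provide an area() method, so the function total_area() can mix them freely at runtime.

```python
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def total_area(shapes):
    """Works with any object providing an area() method."""
    return sum(shape.area() for shape in shapes)


print(total_area([Square(2), Rectangle(2, 3)]))  # 10
```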

Many OOP languages (including Python) provide powerful tools for the purpose of polymorphism. One of them is operator overloading:

class ArrayList(list):

    def __repr__(self):
        """Do NOT do this at home!"""
        return 'ArrayList(' + super().__repr__() + ')'

    def __add__(self, other):
        """Do NOT do this at home!"""
        return [a + b for a, b in zip(self, other)]

What did we just do? The class definition (class ArrayList(list)) indicates that we created a subclass of the parent class list, a well known data type in python. Our child class has all the attributes and methods of the original parent class:

a = ArrayList([1, 2, 3])
len(a)
3
a
ArrayList([1, 2, 3])
b = [1, 2, 3]
b
[1, 2, 3]
np.array([1, 2, 3])
array([1, 2, 3])

Now, we defined a method __add__, which allows us to do some python magic: __add__ is the method which is actually called when two objects are added together. This means that the two statements below are totally equivalent:

[1] + [2]
[1, 2]
[1].__add__([2])  # the explicit method call behind the + operator above
[1, 2]

Now, what does that mean for the addition on our ArrayList class? Let’s try and find out:

a + [11, 12, 13]
[12, 14, 16]

We just defined a new way to perform additions on lists! How did this happen? Well, exactly like the example above: the Python interpreter understood that it has to apply the operator + to the two objects a and [11, 12, 13], which translates to a call to a.__add__([11, 12, 13]), which calls our own implementation of the list addition.

This is a very powerful mechanism. A prominent example is provided by numpy: by implementing the __add__ method on ndarray objects, the numpy developers provide a functionality whose implementation is hidden from the user but intuitive at the same time. Numpy arrays not only implement __add__, they also implement __mul__, __truediv__, __repr__, etc.
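To illustrate how several operators can be defined on one class, here is a small sketch of a hypothetical Vector2D class (invented for this example, and of course much simpler than what numpy does). In Python 3, the / operator is implemented by the __truediv__ method.

```python
class Vector2D:
    """Hypothetical 2D vector, for illustration only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # v + w: component-wise addition of two vectors
        return Vector2D(self.x + other.x, self.y + other.y)

    def __mul__(self, factor):
        # v * 2: scaling by a number
        return Vector2D(self.x * factor, self.y * factor)

    def __truediv__(self, factor):
        # v / 2: division by a number
        return Vector2D(self.x / factor, self.y / factor)

    def __repr__(self):
        return f'Vector2D({self.x}, {self.y})'


v = Vector2D(1, 2) + Vector2D(3, 4)
print(v)      # Vector2D(4, 6)
print(v * 2)  # Vector2D(8, 12)
print(v / 2)  # Vector2D(2.0, 3.0)
```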

Exercise 25

What does the __repr__ method do? Can you implement one for our ArrayList class? For example, it could make clear that the __repr__ is that of an ArrayList, not of a list (like numpy arrays).

a
ArrayList([1, 2, 3])
a + [1, 2, 3]
[2, 4, 6]

Operator overloading should be used with care and can be considered an advanced use case of Python classes, in particular when used with inheritance as in the example above. People used to lists in Python will not be happy with your new behavior, so you have to be careful to document what you are doing.

For example, our class above is not finished yet. Indeed, see what happens here:

[11, 12, 13] + a
[11, 12, 13, 1, 2, 3]

Huh? This did not work as expected! The difference from the above example is that our custom list a is now on the right-hand side of the operator. For this behavior there is a special method as well, called __radd__ (for “right-hand side addition”). Let’s define this operator to do the same as on the left-hand side (this makes sense, because addition is commutative anyway):

class ArrayList(list):

    def __add__(self, other):
        """Do NOT do this at home!"""
        return [a + b for a, b in zip(self, other)]

    __radd__ = __add__
a = ArrayList([1, 2, 3])
[11, 12, 13] + a
[12, 14, 16]

This looks better now! But this example illustrates the complexity of the topic, and recommends due caution with operator overloading.
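One way to keep operator overloading predictable is to return the special value NotImplemented for operands your class does not know how to handle. Python then tries the reflected method of the other operand (here __radd__), and raises a TypeError if nothing works. A minimal sketch, with a hypothetical Meters class invented for this example:

```python
class Meters:
    """Hypothetical length in meters, for illustration only."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        # Unknown operand: let Python try the other object or raise TypeError
        return NotImplemented

    # Used for `2 + Meters(1)`, after int.__add__ has returned NotImplemented
    __radd__ = __add__

    def __repr__(self):
        return f'Meters({self.value})'


print(Meters(1) + Meters(2))  # Meters(3)
print(Meters(1) + 2)          # Meters(3)
print(2 + Meters(1))          # Meters(3)

try:
    Meters(1) + 'two'
except TypeError:
    print('Adding a string to Meters is not supported')
```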

Inheritance#

Inheritance is a core OOP mechanism which is very useful to provide abstraction, encapsulation, modularity, and polymorphism to python objects. Let’s take the concrete example of the Pet, Cat and Dog classes from last week. Inheritance provides:

  • abstraction by giving clear names and actions to the concepts of “pet” and their real-world realizations “cats” and “dogs”

  • encapsulation by keeping track of a pet’s weight each time they eat, so that the user does not have to take care of the weight computation or how it is done

  • modularity by merging common code in the pet class while allowing class-specific code in each pet implementation

  • polymorphism by defining a common interface: each pet can “say its name loudly” (.say_name_loudly()), but the actual outcome depends on the class of the caller (cat or dog)
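The four points above can be sketched in a few lines. The code below is an assumption about what such classes could look like, not necessarily the implementation from last week: the names of the pets, their weights and the sounds they make are invented for the example.

```python
class Pet:
    """Parent class holding the code common to all pets."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight  # in kg

    def eat(self, food_kg):
        # Encapsulation: the weight bookkeeping is done in one place
        self.weight += food_kg

    def say_name(self):
        raise NotImplementedError('Implemented by the child classes')

    def say_name_loudly(self):
        # Modularity: common code relying on the class-specific say_name()
        return self.say_name().upper() + '!'


class Cat(Pet):
    def say_name(self):
        return f'Meow, I am {self.name}'


class Dog(Pet):
    def say_name(self):
        return f'Woof, I am {self.name}'


# Polymorphism: same calls, different outcome depending on the class
for pet in [Cat('Grumpy', 4), Dog('Milou', 8)]:
    pet.eat(1)
    print(pet.say_name_loudly(), pet.weight)
# MEOW, I AM GRUMPY! 5
# WOOF, I AM MILOU! 9
```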

Now, how does the concept of class inheritance apply to real-world scientific applications? There are several examples from the scientific libraries you are using yourself:

  • xarray uses object inheritance to provide a single consistent interface to the various data files it can open: netCDF4, HDF5, geotiffs, images… All these file readers comply with common interfaces (parent classes) called WritableCFDataStore or BackendArray.

  • matplotlib’s YAxis and XAxis classes inherit from the general Axis class, which itself inherits from the base Artist class, which is responsible for drawing all kinds of things. Such relationships are sometimes visually summarized in a class diagram like this one.

  • cartopy’s PlateCarree projection is one realization of the more general Projection parent class, which has many subclasses.

Inheritance can also be useful for numerical models. In the glacier model we are developing, we are using inheritance to provide different ways to compute the surface mass-balance of glaciers. Some mass-balance models are very simple (e.g. LinearMassBalance) and others are more complex (e.g. PastMassBalance), but all models inherit from a common MassBalanceModel class which defines their interface, that is, how their methods should be called and the units of the data they compute. This is very useful in the model structure, because the actual user of the mass-balance models (in our case, a glacier dynamics model) does not have to care at all about which mass-balance model is actually providing the data: it just needs the mass-balance, not the details about how it is computed.
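The structure described above could look roughly like the sketch below. It is a simplified illustration and not the actual code of the glacier model: the method name get_annual_mb(), its signature, the units, the default gradient and the UniformMassBalance class are all assumptions made for this example.

```python
from abc import ABC, abstractmethod


class MassBalanceModel(ABC):
    """Common interface for all mass-balance models (assumed signature)."""

    @abstractmethod
    def get_annual_mb(self, height):
        """Annual mass balance (mm w.e. per year) at a given height (m)."""


class LinearMassBalance(MassBalanceModel):
    """Mass balance varying linearly with altitude."""

    def __init__(self, ela_h, grad=3):
        self.ela_h = ela_h  # equilibrium line altitude (m)
        self.grad = grad    # gradient (mm w.e. per year per m), assumed value

    def get_annual_mb(self, height):
        return (height - self.ela_h) * self.grad


class UniformMassBalance(MassBalanceModel):
    """Hypothetical model: same mass balance everywhere."""

    def __init__(self, value):
        self.value = value

    def get_annual_mb(self, height):
        return self.value


def glacier_mean_mb(mb_model, heights):
    """User of the interface: does not care which model it gets."""
    return sum(mb_model.get_annual_mb(h) for h in heights) / len(heights)


heights = [2800, 3000, 3200]
print(glacier_mean_mb(LinearMassBalance(ela_h=3000), heights))  # 0.0
print(glacier_mean_mb(UniformMassBalance(-500), heights))       # -500.0
```

Declaring the parent class with abc means that a child class which forgets to implement get_annual_mb() cannot be instantiated, which helps to enforce the common interface.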

Take home points#

  • Object-oriented programming (OOP) is more of a “paradigm” than a true necessity: it is possible to write very powerful software without OOP. In practice, however, the OOP paradigm and its flexibility are in wide use in the scientific computing world. Even if you do not write OOP programs yourself, you will definitely use OOP programs for your own work.

  • The OOP concepts listed above help to formalize the role of OOP in a program’s structure. I would like you to remember that these terms exist and maybe remember examples of their application.

  • In this lecture we only talked about the advantages of OOP. There are also disadvantages in using OOP, but I decided not to cover this topic here because it requires a certain experience with OOP first. In practice, the disadvantages often boil down to one point: OOP is sometimes “over-used” and tends to over-complicate things which could be kept simple in procedural programming. I recommend that you gain your own experience with OOP and come back to reading about these pitfalls later on.

Concluding remarks#

A “true” OOP lecture could occupy a full semester. Here, we only scratched the surface of a complex topic, and if you want to dig deeper into the programming world you will have to learn these skills yourself or with more advanced lectures. Fortunately, programming is one of the easiest skills to train online, and I hope that my introduction will help you to get started.