Chapter 18. Nick Coghlan

Nick Coghlan is an Australian software developer and systems architect. His past roles include software engineer at Boeing Australia and senior software engineer at Red Hat Asia Pacific, a provider of open source solutions. Nick is a CPython core developer and BDFL-delegate for Python packaging interoperability standards. He is a founding member of the Python Software Foundation (PSF)'s Python Packaging Working Group, and the founder of the PyCon Australia Education Seminar. Over the past 20 years, Nick has contributed to a range of open source systems and software projects.

Mike Driscoll: What made you decide to become a computer programmer?

Nick Coghlan: Originally, I just did programming as a plaything as a kid. We had the good old BASIC programming book for the Apple IIe.

It wasn't until I did IT in my first year of high school that I discovered that computers were actually a thing you could play with as a job. The school that I went to was one of the first in the state to actually have an IT class. So that was pretty much why I then went into computer systems engineering at university.

My initial full-time job out of university was embedded systems programming in C, for a Texas Instruments DSP. From there, I ended up doing a lot more systems control and automation stuff, which looks a lot more like programming than it does embedded software development. So it was just the case that I enjoyed programming, I was good at it, and you can make money from it.

Driscoll: So why did you move into Python?

Coghlan: So the way that I came to Python is actually kind of interesting, because I was originally a C/C++ developer.

My only exposure to Python at university was from a networking lecturer who said, "I'm going to make you all do the assignments in Python, because I'm confident that none of you will know it". I was the guy who then replied, "Can we use a different language instead? I already know Java, and I'd like to use Java".

My lecturer said, "Well, if you really want to use Java then use it, but try Python first". So I tried Python 1.5.2 and it was fun.

Professionally, I was working for a large-scale system integrator here in Australia. For the DSP program I was working on, my test suite was a really rudimentary C program, which was a success if it got to the end without crashing.

We were just having lots of problems with the DSP code not working properly when we got to the next level of integration testing, so a huge number of behavioral bugs were getting through. We decided that we needed to write a better test suite to feed the audio in. It was important to check that we were getting the answers we were expecting from the actual data analysis, not simply that we could talk to the DSP and ask it to do things remotely.

We wanted to check the actual signal processing itself. We also really didn't want to write that in C and C++. Another part of the system had already had Python approved as a language for system control components. So Python wasn't being used for critical path stuff, but just orchestrating all the different bits of the system, and starting them when they were supposed to be started.

There were two main options that we were looking at for doing the automated testing. One option was using Python's unittest module, with SWIG generating the bindings to the C++ drivers that actually talked to the DSP. The alternative was the in-house C/C++ test framework that we used for everything else. We selected Python.

Driscoll: Why did you choose Python?

Coghlan: The thing was that Python had the unittest module to actually organize the testing. Python had SWIG to tie to the C++ driver. We controlled the API of that driver, so making it play nicely with SWIG was straightforward.

Then the last key piece was that Python, in its standard library, had the wave module, to play WAV files out of the PC. So that established a trend for that whole project, which was Australia's High Frequency Modernization Project. Python just ended up kind of proliferating through that project for all of the bits that were testing, mocking and simulating system interfaces for testing purposes.
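The testing approach described above (unittest organizing the tests, the wave module handling the audio files) can be sketched with the standard library alone. This is a minimal sketch, not the project's code: the DSP and its SWIG bindings are replaced by a hypothetical pure-Python function, detect_peak_amplitude, and the sample rate and tone are arbitrary choices. The point it illustrates is checking the analysis result, not merely that the call didn't crash.

```python
import io
import math
import struct
import unittest
import wave

SAMPLE_RATE = 8000  # Hz; arbitrary choice for this sketch


def make_tone_wav(freq_hz, amplitude, seconds=0.1):
    """Return the bytes of a mono 16-bit WAV file containing a sine tone."""
    n_samples = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)
    return buf.getvalue()


def detect_peak_amplitude(wav_bytes):
    """Hypothetical stand-in for the signal processing under test."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return max(abs(s) for s in samples)


class PeakDetectionTest(unittest.TestCase):
    def test_reports_expected_amplitude(self):
        # Feed known audio in and check the *answer* that comes back.
        wav_bytes = make_tone_wav(freq_hz=440, amplitude=10000)
        peak = detect_peak_amplitude(wav_bytes)
        self.assertAlmostEqual(peak, 10000, delta=50)
```

Saved as a module, this runs under python -m unittest; in a real harness, the WAV bytes would be fed to the hardware through the driver bindings instead.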

Driscoll: So I know that another Australian helped to create pywin32. Did you have any involvement in that project?

Coghlan: No, I've only ever been a pywin32 user. There are actually lots of Australians who have historically contributed to the Python community. But because they haven't really been active in PyCon Australia, or anything like that, I've never actually met them!

Driscoll: Well, let's move on. How did you become a core developer for the Python language?

Coghlan: So my short answer to this question is that I became a core developer by arguing with Guido van Rossum!

What actually happened was that I'd been on Usenet since the late 1990s, and so I was very familiar with that whole online discussion format. After I started using Python, I ended up joining the original Python mailing list, and participating in discussions there.

I discovered that Python-Dev was a thing and started lurking on that, originally intending just to listen to what people were talking about. But I ended up participating actively in discussions and posting as well. The first contribution that I can remember actually making came out of discussions on the Python list.

It was very common to use the timeit module to time snippets of code and say, "Oh this is faster than that." At that point, if you wanted to time the snippets between two different versions, you had to find where the timeit module was in a particular version of the standard library.

We said, "Hang on! Python already knows where the timeit module is. Why are we having to tell Python where to find it?" That ended up becoming a patch to add the initial version of the -m switch in Python 2.4. I think Raymond Hettinger reviewed that. The initial version of the switch could only run top-level modules, not packages or submodules. By the time we reached Python 2.7, the -m switch finally worked properly and did all the things you would expect of it.
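With the -m switch, each interpreter locates timeit on its own, so comparing versions is just a matter of invoking each interpreter with -m timeit. A minimal sketch that drives the current interpreter (sys.executable) through subprocess; the interpreters list is a hypothetical placeholder where you would list the paths of the versions you want to compare.

```python
import subprocess
import sys


def time_snippet(python_exe, stmt, setup="pass"):
    """Run `<python_exe> -m timeit` on a snippet and return its one-line report."""
    # -m makes the interpreter find timeit on its own sys.path, so we never
    # need to know where that version's standard library is installed.
    cmd = [python_exe, "-m", "timeit", "-s", setup, stmt]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()


# Hypothetical list of interpreters to compare; here only the current one.
interpreters = [sys.executable]
for exe in interpreters:
    print(exe, "->", time_snippet(exe, "''.join(str(d) for d in range(10))"))
```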

Something else interesting happened in late 2004. After a major crunch period at work, I took a three-month leave of absence. I ended up helping Raymond and Facundo Batista with the initial performance enhancements to the Python decimal module. We were looking at what we could do to make the module faster.

Driscoll: Did you find a way to speed things up?

Coghlan: There was actually an eventual solution several years later, but in those early days, there was lots of benchmarking to say, "How fast can we make this just as a pure Python thing?"

There was a glorious hack that I remember from those days. We made the discovery that in pure Python, if you have a tuple of digits that you would like to turn into a decimal number, then the fastest conversion mechanism that CPython itself offers is to convert all the digits to strings, concatenate the strings, and then use int to convert the concatenated string back to a number.

This is because the string-to-int conversions have been optimized to the point where doing that is faster than doing all the multiplication and addition operations as Python code. In C, of course, you do the arithmetic. Our findings really annoyed the PyPy developers. From their point of view, doing the arithmetic was a lot better, because that was the code their JIT could optimize. So this meant that their decimal module was slower than they liked.
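The two approaches can be compared with timeit. A minimal sketch of our own (not the original decimal module code): one function accumulates the digits arithmetically in a Python loop, the other joins them as strings and lets int() do the parsing in C. Relative timings depend on the interpreter and version (a JIT such as PyPy's may well favor the arithmetic version), so no particular result is claimed here.

```python
import timeit


def digits_to_int_arithmetic(digits):
    """Build the integer with multiply-and-add steps executed as Python bytecode."""
    result = 0
    for d in digits:
        result = result * 10 + d
    return result


def digits_to_int_via_str(digits):
    """Convert each digit to a string, concatenate, and let int() parse it in C."""
    return int("".join(map(str, digits)))


digits = (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9) * 4  # a 60-digit coefficient
assert digits_to_int_arithmetic(digits) == digits_to_int_via_str(digits)

for func in (digits_to_int_arithmetic, digits_to_int_via_str):
    seconds = timeit.timeit(lambda: func(digits), number=10_000)
    print(f"{func.__name__}: {seconds:.3f}s for 10,000 conversions")
```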

I think that I began getting involved in discussions just after Python 2.3 came out. One of the popular pastimes was making fun of the extended slice syntax. You had the reverse smiley, [::-1], to reverse a sequence. This was before reversed() or anything like that existed.

reversed() became a thing because it turned out that getting the arithmetic right for reversing a slice was actually quite tricky. It was really prone to off-by-one errors if you did it manually, so adding reversed() made things easier to read.
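As an illustration of the off-by-one hazard (our own example, not one from the interview): to walk seq[i:j] backwards with a single extended slice, the natural-looking seq[j-1:i-1:-1] silently breaks when i is 0, because the stop index -1 is then interpreted as "the last element". reversed() sidesteps the index arithmetic entirely.

```python
def backwards_by_slice(seq, i, j):
    """Manually reverse seq[i:j] with one extended slice (buggy when i == 0)."""
    return seq[j - 1:i - 1:-1]


def backwards_by_reversed(seq, i, j):
    """Reverse seq[i:j] by slicing normally, then iterating in reverse."""
    return list(reversed(seq[i:j]))


data = [10, 20, 30, 40, 50]
print(backwards_by_slice(data, 1, 4))     # [40, 30, 20]
print(backwards_by_reversed(data, 1, 4))  # [40, 30, 20]
print(backwards_by_slice(data, 0, 3))     # []  (stop index -1 means "last element")
print(backwards_by_reversed(data, 0, 3))  # [30, 20, 10]
```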

Driscoll: What do you think about the long life of Python 2.7? Should people move over to the latest version?

Coghlan: We deliberately set the support period of Python 2.7 such that existing users could make their own decision about when they considered the Python 3 ecosystem to be sufficiently mature for them to switch over.

Folks that had personally felt the pain of Python 2.7's limitations migrated early, so we're now at the point where most of the folks that are still to migrate are either looking for better tools to help them with that process, or are simply planning to sunset affected projects and products along with Python 2.7.

On the tooling front, one of the important use cases for Python 3's type hinting machinery is to allow folks to statically check for Python 3 type correctness errors, even if their automated test coverage is low. This greatly expands the scope of code which can be reliably migrated.
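A minimal sketch of the kind of error this catches, assuming a third-party static checker such as mypy is run over the code (mypy is our example; the interview doesn't name a tool). In Python 2 code, text and bytes were often interchangeable; once annotated, a checker can flag a mismatch even on a code path that no test ever exercises. The function names are hypothetical.

```python
def frame_message(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte big-endian length, as a wire protocol might."""
    return len(payload).to_bytes(4, "big") + payload


def send_greeting(name: str) -> bytes:
    # Correct: encode the text explicitly before treating it as bytes.
    return frame_message(name.encode("utf-8"))


def send_greeting_py2_style(name: str) -> bytes:
    # A static checker reports an incompatible argument type here (str vs bytes)
    # without running anything; at runtime it fails on the bytes + str concatenation.
    return frame_message(name)
```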

Driscoll: What changes would you like to see in future Python releases?

Coghlan: I'd like to see better tools for working with partially structured hierarchical data, but in a way that preserves Python's reputation as executable pseudocode. I'd also like to continue reducing the discrepancies between what can be done with extension modules, and what specifically requires a Python source module.

Finally, I'd like to see better support for protected memory management models, where rather than aiming to serve as a security boundary, we're instead providing memory separation as a way to assist with maintaining the correctness of concurrent code. CPython's subinterpreter feature already provides this to some degree, but that capability currently has a lot of usability challenges, which Eric Snow is looking to address.

Driscoll: Well good! So let's pretend that I want to become a core developer like you. What would I need to do to actually become one?

Coghlan: So one of the most important things is to figure out why you want to become a core developer. You need the answer to that question because there are going to be inevitable frustrations where you ask yourself: "Why the hell am I doing this?!"

If you don't know what your motivations are, then that's going to be a problem! Nobody else can answer the question for you. Having got past that point, the main thing about becoming a core developer is that a lot of it's actually about trust and earning trust.

It's a matter of contributing. As core reviewers, we're basically asking, "Do we want to accept this change and maintain it into the future? Can we give a good answer about why we accepted the change, if we're asked later?"

What we're looking for when nominating new core developers and core reviewers is someone whose judgment we trust. We want them to be able to say, "Yes, this is a suitable change that will, on balance, make life better for future Python users."

Programming language design is a game of trade-offs. If you try to optimize for everything at once, then you end up optimizing for nothing. So there are a lot of things that have emerged over time as the trade-offs that make something Pythonic. It becomes a matter of understanding whether you can decide something on your own, or whether you need to take a problem to Python-Dev for discussion.

Then there is a final level of escalation, when we say, "This proposal is tricky enough and there are enough subtleties here. There is enough potential controversy here that we should escalate this problem to become a full Python Enhancement Proposal and thrash out the details, before doing anything else." It's ultimately a core developer that makes the decision about where in that spectrum a particular change lies.

Driscoll: How does a core developer go about making that decision?

Coghlan: Well, bug fixes are usually pretty straightforward because we know something is wrong. Even with a bug fix though, it's sometimes confusing.

We have three sources of truth: what the reference interpreter does, what the test suite says it does, and what the documentation says it does. When all three of those agree, you know that what you are doing is consistent.

Where things start becoming more of a matter of design judgment is when the interpreter does something, and the test suite and the docs are silent on it. That case just isn't tested, and isn't documented as doing anything in particular. Then the other case is when the documentation says one thing, but the tests and the implementation say something different. In those cases, you have to say, "Well, is the documentation right and it's a bug, or are the docs just wrong?"

Those are the kinds of things that you get to do as a core developer. When you're a contributor, by contrast, you just want to get your ideas in. That's still a question of trust management, but what you're trying to do is persuade reviewers that your change is worth making. So yeah, it's certainly interesting!

You need to understand what becoming a core developer entails, and why it's something you want. In terms of the practical mechanics of the role, there's the Dev Guide that Brett Cannon originally wrote with PSF funding. The Dev Guide has been maintained and enhanced over time, and it explains the difference between being a core developer and being a contributor to CPython.

There are extra responsibilities that come with being a core developer. The role includes working with issues, working with reviewers, understanding the review process, discussing things on the mailing lists and making design decisions. You end up dealing with the inevitable frustrations of actually working on such a big project. The core mentorship mailing list can also be useful, depending on the kind of person you are.

Driscoll: So I've always been interested in Python Enhancement Proposals. Could you describe the process of how they get created and accepted?

Coghlan: Yes, so there are two different flows that the Python Enhancement Proposals (PEPs) can go through.

One flow is when a core developer proposes a change that we know we want to make, but we also know that this change will be big and complex. We know without anybody telling us that this change needs to be a PEP. So in those cases, we'll often just start by writing the PEP and committing the PEP to the PEPs repo.

We will then start the discussion on Python-ideas by saying, "Hey, I've written a new PEP proposing this, and here is why." Discussions basically just start at that level. Core developers manage the PEP process, because we've been through it a few times and we know when a change is big enough to qualify.

For other PEPs, the usual point of genesis is when somebody comes to Python-ideas with a suggestion. This suggestion will have been kicked around as a Python-ideas thread for a bit. People will then have said, "You know what, this actually sounds like it could potentially be a good idea!" The decision is then made to turn the idea into a full PEP and propose the idea that way, rather than just submitting it as an issue on the issue tracker.

That does actually remind me of the third way that PEPs happen. They can come out of discussions on the issue tracker when we definitely know we want to make a change, but there are lots of niggly details. We write a PEP, thrash out the details, and then use that to drive how we implement the idea.

Driscoll: So are these changes just discussed until they eventually get ironed out, and then accepted or rejected?

Coghlan: It depends on the proposal. With some proposals, the change itself is not controversial, but the details just need thrashing out.

Those proposals will usually go through some discussion on Python-ideas and Python-Dev. The decision will then be made to stop thrashing out the idea and start implementing it. The proposal becomes an accepted PEP and eventually goes through to final.

Some proposals are more borderline, and we put a question to Python-Dev about whether they are in fact a good idea. We do actually have a proposal open at the moment around the null coalescing operator. We genuinely don't know if we want to proceed. This PEP would make the language more complex, because it's a cryptic syntax that people would have to learn and understand. That's the main argument against the idea. The argument in favor is, "Well, this is a pattern that comes up fairly often in data manipulation pipelines."
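Here is the pattern in question, written in today's Python. A minimal sketch: the ?? and ?[ ] spellings in the comments follow the style of PEP 505 ("None-aware operators") and are not valid Python; the records and the city_of helper are hypothetical.

```python
from typing import Optional

# Hypothetical records from a data pipeline: fields may be missing (None).
records = [
    {"name": "alice", "address": {"city": "Brisbane"}},
    {"name": "bob", "address": None},
    {"name": None, "address": {"city": None}},
]


def city_of(record: dict) -> Optional[str]:
    # Proposed-style spelling (not valid Python): record["address"]?["city"]
    address = record["address"]
    return address["city"] if address is not None else None


for record in records:
    # Proposed-style spelling (not valid Python): record["name"] ?? "unknown"
    name = record["name"] if record["name"] is not None else "unknown"
    # "or" is the usual shortcut, but it also replaces falsy values such as "" or 0.
    city = city_of(record) or "n/a"
    print(name, city)
```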

So that PEP is still in discussion, until it does get to the point of finally being put to Python-Dev as a yes or no question. Then the decision will be made that yes we definitely want to proceed, or no we don't, unless something changes.

Very occasionally, you do get PEPs that are written specifically to be rejected. In those cases, an idea keeps coming up, but the arguments against it have never been clearly documented anywhere. So someone takes the time to write down the idea and all the reasons for rejecting it, before saying, "Right! I'm posting this as a rejected PEP, to say this is why we don't do this."

Driscoll: That makes me think of some of the new stuff that I've seen in Python 3.5 and 3.6, which was only partially accepted and classed as provisional. Is that slightly different? Does that mean that people have agreed enough that they want to add something, but they may not keep it?

Coghlan: Yes. We got caught a couple of times when we accepted a change and its new API, and immediately put it under our standard backwards compatibility guarantee.

What we ended up doing was painting ourselves into a corner. We were stuck supporting an API that actually wasn't very good for the problem it was aiming to solve. We were getting these suggestions and potential module additions that were clearly beneficial and clearly helpful for users. The problem was that we were not sure we had the API design details right.

We didn't want to put anything under our full standard library backwards compatibility guarantee, so we decided not to include the additions. This approach ended up being bad for everyone, because it kept things out of the standard library that really should have been in there.

We also couldn't use that type of module to help us to improve other parts of the standard library. Honestly, one of the main ways that new building blocks get into the standard library is because we want to use them in other parts of the standard library. So there's a standard library enum type now, because we wanted enum types in things like the socket module.
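Since Python 3.4 (PEP 435), enum has been in the standard library, and the socket module's address-family and socket-type constants are exposed as IntEnum members. A short check of what that buys: the constants still compare equal to plain integers, so existing code keeps working, but they now carry readable names. The Priority class is a hypothetical example of using the same building block in your own code.

```python
import enum
import socket

family = socket.AF_INET
print(type(family).__name__)             # AddressFamily
print(isinstance(family, enum.IntEnum))  # True
print(family.name)                       # AF_INET
print(family == int(family))             # True: still usable wherever an int was expected


# The same building block is available for your own code:
class Priority(enum.IntEnum):
    LOW = 1
    HIGH = 2


print(Priority.HIGH > Priority.LOW, Priority(2).name)  # True HIGH
```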

The provisional PEP, which I think ended up being PEP 411, went through a few iterations. Basically PEP 411 was designed to give us that ability to accept modules that we're pretty confident we're going to keep, but we're not sure we have the API design details right yet.

We leave a module as provisional for a couple of releases, to give ourselves the right to make breaking changes to the API if we mess something up. I think asyncio only just went non-provisional in Python 3.6.

Driscoll: So does leaving a PEP as provisional work well?

Coghlan: Yes, we're actually really happy with how that's worked out. It lets us give people a clear warning that an API is still a bit in flux. Users know that we're still figuring out the details, and if that bothers them, then they shouldn't use that module yet.

There was actually an interesting example recently for Python 3.6 with pathlib. So pathlib had been included as a provisional API and it had lots of interoperability problems with other standard library APIs that were expecting strings.

For Python 3.6, pathlib had hit a crossroads and was either going to get taken out of the standard library again and pushed back to purely being a PyPI module, or the interoperability issues had to be fixed. That was the either/or decision that was before the core development team for Python 3.6.

This decision led to the filesystem path protocol (os.fspath() and the __fspath__() method, from PEP 519) and support for path-like objects, which basically fixed the interoperability problem for pathlib. This means that a lot of standard library APIs now automatically accept path-like objects.
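A minimal sketch of the protocol from PEP 519: any object with an __fspath__() method returning str or bytes is path-like, os.fspath() unwraps it, and standard library functions such as open() accept it directly. ConfigLocation is a hypothetical class for illustration.

```python
import os
import pathlib
import tempfile


class ConfigLocation:
    """Hypothetical path-like object: implements the __fspath__ protocol."""

    def __init__(self, directory, name):
        self.directory = directory
        self.name = name

    def __fspath__(self):
        # Must return str or bytes; os.fspath() and open() call this.
        return os.path.join(self.directory, self.name)


with tempfile.TemporaryDirectory() as tmp:
    loc = ConfigLocation(tmp, "settings.ini")
    with open(loc, "w") as f:            # open() accepts any os.PathLike
        f.write("[main]\n")
    print(isinstance(loc, os.PathLike))  # True: the ABC checks for __fspath__
    print(os.fspath(loc) == os.path.join(tmp, "settings.ini"))  # True
    print(pathlib.Path(tmp, "settings.ini").read_text())        # [main]
```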

Driscoll: Alright, so what is the Python Packaging Authority?

Coghlan: So the Python Packaging Authority's name actually started as a joke by the pip and virtualenv developers. They wanted a name for the development team that covered both projects. So they said, "Let's call ourselves the Python Packaging Authority, because nobody expects the Python Packaging Authority!"

Then, back in 2013, we were starting to actively try to bring more of the tools, like setuptools and distutils, into that space. The Python Packaging User Guide started bringing all that information together, to offer a more coherent and officially recommended way of doing things. We needed a name for that umbrella group too. We decided that the Python Packaging Authority was kind of cool as a name, so we could start bringing in more projects under that umbrella.

Basically, the Python Packaging Authority occupies a role around packaging tools and interoperability standards, that's similar to the role that core developers play in relation to Python as a whole. While there's some overlap between people who are interested in programming language design and people who are interested in software distribution design, there are a lot of people who fall on one side or the other. Those people aren't the least bit interested in the other aspects.

Separating the two types of people means that anyone who cares about both types of design can participate in both subcommunities, but we're not constantly trying to explain the complexities of software distribution to language designers, and vice versa. I think this split has made people a lot happier in general. It's nice to be in a group that you understand. I like packaging, but I like Python too, so I'm kind of torn on which one I'd fall under. I'd probably want to work on both Python and the Python Packaging Authority.

Driscoll: Python is one of the major languages being used in AI and machine learning. Why do you think this is?

Coghlan: AI and machine learning are an interesting mix of exploratory interactive data analysis and heavy-duty number-crunching. CPython's rich C API has led to Python serving as a 'glue' language for interconnecting high performance components written in languages like C, C++, and Fortran.
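One lightweight way to see the "glue" role in action is ctypes, the standard library's foreign function interface. It is a different mechanism from the C API mentioned above (which compiled extension modules and tools like Cython build on), but it shows Python calling straight into compiled C code. A minimal sketch, assuming a POSIX system (Linux or macOS) where the C library's symbols are reachable from the running process.

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to symbols already loaded into the
# process, which includes the C standard library on Linux and macOS.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"high performance"))  # 16
```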

The scientific research community has been using Python that way for more than 20 years (the first version of Numeric was released in 1995). This means that Python offers a unique hybrid of a flexible, yet easy-to-learn and general-purpose computing language, combined with a set of scientific computing libraries, developed for use in high-performance computing environments.

Driscoll: What could be done to make Python a better language for AI and machine learning?

Coghlan: On the ease of use side, there are still a lot of opportunities to make components more readily available to users, either through preconfigured freemium web services (like Google Colaboratory or Microsoft Azure Notebooks), or locally through the Python and Conda packaging toolchains.

On the performance side, there are also a lot of unexplored opportunities to better optimize the CPython interpreter and the Cython static compiler. For example, Cython doesn't currently ship a shared dynamic runtime, so generated modules likely contain a lot of duplicated boilerplate code, which not only makes them larger and slower to compile, but also slower to import at runtime.

Driscoll: So I noticed that you are a fellow blogger. How long have you been writing about Python and what made you decide to become a blogger?

Coghlan: It was probably around Python 3.3 that I started talking about programming stuff on my blog. Mostly, I find writing is a very useful aid to thinking. You're forced to get an idea coherent enough to be readable. So that's mainly the way that I still use the blog now. If there's something in particular about Python that I want to reference later, then I write down my current thoughts.

Driscoll: In your opinion, is Python a good language to actually start learning programming with?

Coghlan: I do recommend Python as a first text-based language. For a lot of people, starting with one of the plug-and-play languages is a good alternative if they want to get the basic concepts down.

Once you want to get into full combinatorial programming, Python's a very good language. Its deliberate language design restrictions reflect that its parser is not very bright: you cannot get it to parse very complicated action-at-a-distance constructs. If you study linguistics, then you realize that the human brain also struggles to parse complicated action-at-a-distance constructs.

So the advantage of Python is that you only need one token of lookahead to understand the context of the thing you're currently looking at. You don't need to keep much in your head to understand what the code is trying to tell you. We try to keep it visible where different names are coming from. I think that makes a surprising amount of difference to how easy it is for people to fit ideas into their brain.

I made a post several years ago about scripting languages and suitable complexity. If you look at a cookbook, or a work instruction guide, then you will find procedural instructions. The outer layer of a cookbook is very much procedural and sequential. Then the subfunctions and the objects are all kind of embedded within that framework. I think Python works well for people because it reflects how we interact with the world.

Driscoll: Could you explain a little more about why Python works so well?

Coghlan: Sure, we do things in sequence. Starting procedurally as your foundation, and then layering all of your other things on top, as you need them, makes a lot of sense.

Object-oriented programming, functional programming and event-based programming are all techniques that we have come up with to manage complexity. Whichever one of them you choose as the fundamental organizing principle for your language then sets the minimum level of complexity for what you do.

It's really interesting to talk to people who teach with robotics and in embodied computing environments. When you teach that way, starting with objects is a good way to go. Embodied computing people have that natural ability to say, "That robot sitting on my desk corresponds to the class 'Robot' in my program." They can do that visual correlation.

I think it's the case that procedural by default really does match the way cookbooks and instructions are written. That is good for lowering barriers to entry but, at the same time, Python is a language that can grow with you. Python has all the tools to do mathematical programming, object-oriented programming and functional programming.
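To make "a language that can grow with you" concrete, here is one small task (summing the squares of the even numbers in a list) written procedurally, then with a class, then functionally. This is a minimal sketch of our own, not from the interview; NumberBag is a hypothetical name, and all three styles give the same result.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Procedural: a recipe of steps executed in order.
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n


# Object-oriented: bundle the data with the behavior that operates on it.
class NumberBag:
    def __init__(self, values):
        self.values = list(values)

    def sum_of_even_squares(self):
        return sum(v * v for v in self.values if v % 2 == 0)


# Functional: compose filter, map, and reduce without mutable state.
functional_total = reduce(lambda acc, sq: acc + sq,
                          map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)),
                          0)

print(total, NumberBag(numbers).sum_of_even_squares(), functional_total)  # 56 56 56
```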

You can use Python based on the kinds of problems that you have. When you start learning more about particular aspects of Python, then you can use that as a launching point to get into languages that specialize in a particular area. So you can use Python to launch into Haskell (functional programming), Java or C#.

Driscoll: So let's pretend that I know all the basics of Python and now I want to enhance my understanding of the language. What should I do?

Coghlan: The important question to ask yourself at this point is how you learn. So for example, for myself, I figured out that I'm very much about needs-based learning.

I don't do well learning things just for the sake of learning them. I learn new programming techniques and new libraries in order to solve a problem. In my case, I find the problem I'm interested in solving and then learn whatever I need to solve it.

In terms of learning more, Allison Kaptur has written some quite good stuff. We've started adding a section to the Dev Guide about diving into internals. One useful trick can be to look at something you use every day, particularly an open source library, and just start digging into the code.

In the standard library documentation, many module pages link directly to the module's source code. Actually going and reading that, and trying to figure out why certain things are done the way they are, can be useful.

That reminds me of another interesting project called Python Tutor. Python Tutor is a code visualizer, or a behavioral visualizer. As you step through the code, it progressively updates a small model of the program's state, explaining what's going on.
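The same digging can be done from an interactive session: the inspect module will locate and print the source of pure-Python standard library code. A minimal sketch, using textwrap.dedent as an arbitrary example; objects implemented in C (such as len) have no Python source, and inspect raises TypeError for them.

```python
import inspect
import textwrap

# Where does this module live on disk?
print(inspect.getsourcefile(textwrap))

# Print the first few lines of a function's implementation.
source = inspect.getsource(textwrap.dedent)
print("\n".join(source.splitlines()[:5]))

# Builtins implemented in C have no Python source to show.
try:
    inspect.getsource(len)
except TypeError as exc:
    print("no Python source:", exc)
```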

One strategy, that I know some people have certainly found useful, is trying to change things, not because they actually want to make a change, but just to learn the mechanics of what's involved.

Driscoll: What are you most excited about in Python today?

Coghlan: I'll give a split answer here, as my professional and personal perspectives on the question are slightly different.

In a lot of ways, Python has done to the Linux ecosystem what the Linux ecosystem did to enterprise organizations in general: become ubiquitous without anyone really bothering to tell executive management about it. This means that everything we've achieved so far has been done primarily through the efforts of the volunteer community contributors, with only occasional and intermittent investments from large commercial and institutional users.

So professionally, the thing that most excites me is the fact that the increase in the use of AI and machine learning techniques in business software development is prompting a lot of organizations to realize that there's more to the world of software development than the current enterprise incumbents of C, C++, Java, and C#.

This has been most clearly visible in recent years through IEEE Spectrum's annual multi-data-source language ranking, where Python started out, in 2014, at the edge of the top five (with C#), but has steadily climbed through those rankings, reaching first place in the 2017 edition of the survey.

Personally, the thing that most excites me is the way we're getting teachers and other educators directly involved in the open source Python community. Prompted by an excellent keynote from James Curran at PyCon Australia 2014, and the Education Track at PyCon UK, I founded the PyCon Australia Education Seminar in 2015, and we've been running that every year since.

A lot of Python user groups also have a specific focus on adult education and offer workshops for folks either looking to improve their computing skills in their current profession, or contemplating a career change into software development.

Driscoll: Thank you, Nick Coghlan.

