Chapter 19. Mike Bayer

Mike Bayer is an American software developer and a senior software engineer at Red Hat, which sells open source software products. His previous positions were at many New York-based internet companies, including MLB.com, and he worked on content management software at Major League Baseball. Mike is the creator of a number of open source programming libraries for Python, such as SQLAlchemy, an SQL toolkit and object-relational mapper. He plays an active role in the Python community by promoting good database software practices. Mike is a regular speaker at PyCon US and smaller conferences in Europe.

Discussion themes: SQLAlchemy, AI, v2.7/v3.x.

Catch up with Mike Bayer here: @zzzeek

Mike Driscoll: What made you become a programmer?

Mike Bayer: I've had an interest in computers since 1980, when I was first exposed to early personal computers. I tried to learn game programming in assembly language for early 8-bit computers, without much success. In high school, I was exposed to data structures and procedural programming with Pascal.

It seemed pretty natural that I'd become a programmer, but as it turned out, I switched majors from computer engineering to music and took several years off from touching computers at all. I had found myself being overly competitive with other programmers that I met on bulletin boards and I didn't like who I was.

I got back into computers strictly because it was the only way that I could eat and pay rent. About that time, the internet became a commercial industry and I immediately got involved in that kind of work.

Once the first internet bubble came along, being a programmer in NYC was suddenly intense and exciting. Everyone wanted you to work for them. The competitive element of programming has in fact created continuous problems for me over the years. I've had to work to minimize that issue.

Driscoll: So how did you get started with Python?

Bayer: Most of my pre-Python career was spent programming in Perl, Java, and a little bit of C. I was really into object-oriented application design and I ended up going through a deep architecture astronaut phase, which was very common with Java programmers in the late 1990s and early 2000s.

I liked the idea of scripting languages, because they allowed you to jump right into a text file. You would have something that could work immediately without the formality, boilerplate and compilation step of Java. So I also spent a lot of time trying to realize OO design in Perl, which was pretty unsatisfying.

Mike Bayer: 'After a few years of refusing to accept significant whitespace, I finally got into Python.'

I became aware that Python might be something that could really strike a balance between those two worlds. After a few years of refusing to accept significant whitespace, I finally got into Python and realized that the language was in fact everything that I was looking for.

Driscoll: What makes Python special to you?

Bayer: What impressed me about Python was the way that everything in your interpreter was a Python object, including all of the modules that you imported.

Nowadays, that whole way of looking at things is second nature to me. But it was new to me when I first learned that I could inspect elements of the program itself as just more data. None of the other languages that I'd been exposed to were anything like that.
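As an illustration of this point (a sketch added for the reader, not part of Bayer's answer), the snippet below treats an imported module and a function as ordinary objects. The `greet` function is a made-up example; `json` and `types` are standard library modules.

```python
import json
import types

# An imported module is an ordinary object with a type and attributes.
print(type(json))  # <class 'module'>
print(isinstance(json, types.ModuleType))


# Functions are objects too: they carry their own name and docstring.
def greet(name):
    """Return a short greeting."""
    return f"Hello, {name}"


print(greet.__name__, "-", greet.__doc__)

# Elements of the program can be examined like any other data.
# Here we list the public callables that the json module exposes.
public_callables = sorted(
    name
    for name, value in vars(json).items()
    if callable(value) and not name.startswith("_")
)
print(public_callables)
```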

Python was so simple to understand, especially after I had spent years never really understanding what Perl's use statement did. I also observed in Python a certain emphasis on consistency and correctness that was uncharacteristic of scripting languages in general.

I predicted that the Python programmers that I'd be working with would be higher quality developers than I'd otherwise been exposed to, since they were attracted to Python! That turned out to be completely true.

Driscoll: So what inspired you to create SQLAlchemy?

Bayer: Well, I had always had the goal of figuring out which programming language I wanted to make my home in. Within that language, I wanted to work up a full suite of tools that I could use for everything. I wanted to be able to strike out independently and build applications for people.

Mike Bayer: 'I wanted to be able to strike out independently and build applications for people.'

At my various jobs, I had always had to create some kind of database abstraction layer that I'd then use in many projects. I was always building little template engines, mini web frameworks and database abstraction layers, in whatever language I was using, which I'd try to standardize for all of my projects.

So when I got into Python, I was unsatisfied with the web framework tools and database abstraction tools that were available at that time. I had also written many template engines and database access tools already, so I had a lot of ideas.

Mike Bayer: 'When I got into Python, I was unsatisfied with the web framework tools and database abstraction tools that were available at that time.'

I first wrote a template engine called Myghty, which was an almost line-for-line port of the Perl template engine HTML::Mason. Myghty was horrible, yet it gained some brief popularity and formed the basis of the first version of the Pylons web framework.

When I set out to write SQLAlchemy, I took a very deep and slow approach, to try to make it amazing. I was still very flawed as a programmer and especially as a Python programmer at that point. Early SQLAlchemy had many awful design choices, but it still shined as something that was truly unique and potentially kind of amazing. The first time that I saw the unit of work do a flush, I was amazed. I realized that this thing might have a deep impact on people.
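For readers unfamiliar with the term, a unit of work collects pending changes to objects and writes them to the database together in a single "flush". The sketch below is a toy illustration of that idea using the standard library's `sqlite3` module. It is added for the reader and is not SQLAlchemy's implementation; `ToySession`, `User`, and the `users` table are hypothetical names. SQLAlchemy's real unit of work also tracks modifications and deletions and orders statements by dependency, none of which is shown here.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    id: Optional[int] = None  # filled in once the row has been written


class ToySession:
    """Hypothetical, heavily simplified unit of work (not SQLAlchemy's Session)."""

    def __init__(self, connection):
        self.connection = connection
        self.new = []  # objects registered but not yet written

    def add(self, obj):
        # Only record the object; no SQL is emitted at this point.
        self.new.append(obj)

    def flush(self):
        # Emit all pending INSERTs together and record the generated keys.
        for obj in self.new:
            cursor = self.connection.execute(
                "INSERT INTO users (name) VALUES (?)", (obj.name,)
            )
            obj.id = cursor.lastrowid
        self.new.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

session = ToySession(conn)
alice, bob = User("alice"), User("bob")
session.add(alice)
session.add(bob)

rows_before = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
session.flush()
rows_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rows_before, rows_after, alice, bob)
```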

Driscoll: So how did Mako come about?

Bayer: Mako was very simply created to replace Myghty and all of its horrible design choices, so that Pylons could have a template engine that wasn't embarrassing.

Mako was meant to be a very capable and solid template engine, which could more or less be left to go on its own once it was complete. While Mako did gain more features over the years, I've considered it to be complete for many years now. I still use Mako, but I'm happy for Jinja2 to be the de facto template engine in Python. Armin Ronacher did, after all, credit Mako's architecture for being a lot of his inspiration for creating Jinja2.

Mike Bayer: 'I still use Mako, but I'm happy for Jinja2 to be the de facto template engine in Python.'

Driscoll: If you could start over with SQLAlchemy, what would you do differently?

Bayer: There were some mistakes that I made, which led to scenarios that ultimately benefited the project immensely. So if I had not made those mistakes, then I'm not sure how things would have turned out.

My issue with competitiveness, which I've mentioned, caused me to have poor interactions very early on with some of the contributors. Chasing away people who had good ideas, and in many cases, saw things much more clearly than I did, was a huge mistake.

I should also have spent more time reading other Python code and getting better at using the correct idiomatic patterns, rather than having to retroactively fix all of the code once I learned new things about Python.

If I could start over with SQLAlchemy, I would do other things differently too. There were a lot of design patterns that were in the 0.1 version that I tried to get rid of by version 0.2 or 0.3. I couldn't remove those patterns totally.

Version 0.1 relied heavily on the implicit association of objects with database connections, both at the core and ORM levels. Today, two of these patterns still exist as bound metadata and connectionless execution. These patterns remain extremely popular, but continue to create subtle confusion, in contrast to the newer patterns that are based on explicitness.

Mike Bayer: 'Had I been starting with what I know today, SQLAlchemy would have been much closer to the mark to begin with.'

There are many other API patterns that have been heavily revised over the years. Had I been starting with what I know today, then SQLAlchemy would have been much closer to the mark to begin with. There would have been no need to go through major API changes in the early releases.

I also should have recognized the need for a good SQL migrations tool early on, although sqlalchemy-migrate did a good job of handling this until I had time to create Alembic migrations.

Driscoll: What have you learned from creating open source projects?

Bayer: Well, for one thing, if your open source project turns out to be popular, then it will never be finished. If your project is linked to some set of constantly changing technology, like Python database APIs, then your work will never be done.

Mike Bayer: 'If your open source project turns out to be popular, then it will never be finished.'

I had no idea that the pace of bug fixing would remain constant for over ten years. I have also learned that to be successful in open source, you do have to have a lot of luck. You must be fortunate enough to be doing a project at the right time. I got into Python much earlier than most of the community and produced my software at the perfect time.

Finally, I've learned a lot about the calculus that you must apply when a user wants some feature, or behavior X. You can't really take them at their word. Often, when users think that they want X, they really want Y. Sometimes they think that they want X, but they haven't thought through the ramifications.

You always have to be very careful about how you go about adding X. At the same time, you don't want the user to be upset if you are denying their feature request. Above all, as the maintainer, you need to be as courteous as possible. This is extremely difficult, because lots of users are pretty disrespectful and entitled. You gain nothing by venting about this though.

Driscoll: We're seeing Python being used a lot in AI and machine learning. Why do you think that Python is such a great language for this?

Bayer: What we're doing in that field is developing our math and algorithms. We're putting the algorithms that we definitely want to keep and optimize into libraries such as scikit-learn. Then we're continuing to iterate and share notes on how we organize and think about the data.

Mike Bayer: 'A high-level scripting language is ideal for AI and machine learning, because we can quickly move things around and try again.'

A high-level scripting language is ideal for AI and machine learning, because we can quickly move things around and try again. The code that we create spends most of its lines on representing the actual math and data structures, not on boilerplate.

A scripting language like Python is even better, because it is strict and consistent. Everyone can understand each other's Python code much better than they could in some other language that has confusing and inconsistent programming paradigms.

The availability of tools like IPython notebook has made it possible to iterate and share our math and algorithms on a whole new level. Python emphasizes the core of the work that we're trying to do and completely minimizes everything else about how we give the computer instructions, which is how it should be. Automate whatever you don't need to be thinking about.

Mike Bayer: 'Automate whatever you don't need to be thinking about.'

Driscoll: How do you think that Python could be a better language for AI and machine learning?

Bayer: Machine learning is a CPU-intensive task, so we need to continue iterating on how to make better use of all of those processor cores, which unfortunately means dealing with the Global Interpreter Lock (GIL). Right now, the only way to do that is to use multiprocessing.

Mike Bayer: 'Python still lacks a decent concurrency paradigm.'

Python still lacks a decent concurrency paradigm that is somewhere between threads, where Python's dynamic contract means that we have a GIL, and processes, which incur complexity and expense regarding how to share data. It might be helpful to have an interpreter concept that acts largely like multiprocessing, but is somehow doing it within a single process space. This concept would use OS-level threads, yet still keep the processes isolated enough that they don't share the same GIL.
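The multiprocessing approach mentioned above can be sketched as follows. This is an illustration added for the reader, not code from the interview. It uses the standard library's `concurrent.futures.ProcessPoolExecutor`, and the prime-counting workload and function names are made up for the example.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=4):
    # Each worker is a separate process with its own interpreter and its
    # own GIL, so the chunks can run on different cores. Arguments and
    # results are pickled and copied between processes, which is the
    # data-sharing cost described in the text.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this module instead of forking.
    print(count_primes_parallel([20_000, 30_000, 40_000, 50_000]))
```

Running the same `count_primes` calls in threads would not spread this pure-Python loop across cores in a GIL-based interpreter, which is why separate processes are used here.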

Driscoll: What advice would you give to someone who is new to programming in general?

Bayer: There is a lot of conventional wisdom in computer programming. You should always put conventional wisdom on trial.

Mike Bayer: 'You should always put conventional wisdom on trial.'

There are rules in programming, such as "don't use mutable global variables," which are actually more like training wheels for beginners. They are good rules that have a lot of truth in them, but none of them apply in every case.

As you progress from being a beginner to being more advanced, you want to be able to think on your own. You also want to gain experience by finding novel and creative ways to solve problems. These ideas might not always work out, but establishing a core practice of always challenging the status quo will hopefully allow you to see a great new solution to a problem one day.

Driscoll: Which language would you recommend to someone who is starting out in programming?

Bayer: I think Python is the best beginner language that I've ever seen. For your first few years of programming, you can just use Python, and you'll probably be doing JavaScript as well, since the browser is unavoidable.

At some point, it's also a great idea to write some kind of scripting language interpreter or compiler. An understanding of how instructions declared at a high level, like a Python function, end up manifesting as instructions run by a CPU, is an essential perspective to have.
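To give a feel for that exercise (a sketch added for the reader, not from the interview), the toy below compiles an arithmetic expression into a flat list of stack-machine instructions and then executes them. It borrows Python's own parser from the standard `ast` module, and the instruction names `PUSH`, `ADD`, `SUB`, and `MUL` are invented for the example. A real interpreter or compiler has many more layers between source text and CPU instructions.

```python
import ast
import operator

# Map syntax-tree operator nodes to made-up opcode names,
# and opcode names to the functions that implement them.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
HANDLERS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def compile_expr(source):
    """Compile an arithmetic expression into a list of stack instructions."""

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            # Post-order walk: operands first, then the operator.
            return emit(node.left) + emit(node.right) + [(OPCODES[type(node.op)], None)]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return emit(ast.parse(source, mode="eval").body)


def run(program):
    """Execute the instructions on a simple stack machine."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(HANDLERS[opcode](left, right))
    return stack.pop()


program = compile_expr("2 + 3 * (4 - 1)")
for instruction in program:
    print(instruction)
print(run(program))  # 11
```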

Driscoll: What about Python today most excites you?

Bayer: I'm excited that Python is becoming the default language that virtually everyone who wants to do thoughtful work with data chooses first, particularly in the field of journalism.

Mike Bayer: 'I look forward to a new crop of journalists who can program Python as well as they can write a headline.'

Journalism is becoming more data-driven and I look forward to a new crop of journalists who can program Python as well as they can write a headline. We need journalists who can produce stories that are based on data from the ground up. This will hopefully lead to more data being available as the demand increases. Imagine if each time we read a story in the Washington Post, there was also an IPython notebook right there, which we could use to analyze the data in the story.

Driscoll: Should people now leave Python 2.7 behind?

Bayer: Moving from Python 2.7 is a problem that will solve itself. I think that people in the data field are definitely starting with the 3.x series now. In the infrastructure world that I work in, we are understandably taking a lot longer to get there, but we will.

Mike Bayer: 'Moving from Python 2.7 is a problem that will solve itself. I think that people in the data field are definitely starting with the 3.x series now.'

Driscoll: What are some changes that you're hoping to see in future Python releases?

Bayer: To be honest, in the future I'd like to see less emphasis on the asyncio system, which I believe is a widely misunderstood API.

New programmers are starting their projects using async for the entire system end-to-end. They are creating buggy and overly complicated applications as a result, which don't perform any better than they would using traditional techniques.

There is definitely a place for asynchronous I/O, but in virtually any real-world application, it should be limited to dealing with interaction with external resources and clients. This should only be when the scale of external data interaction will be very wide and concurrent (e.g. scraping thousands of websites, or waiting for commands from thousands of clients).

The central engines of our applications (those which are interacting with local data and doing our business logic and algorithms) should be written with traditional threading. Asynchronous and synchronous components can talk to each other quite well; however, the programmer needs to understand both paradigms well. The current async culture does not emphasize this at all.
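One way this split could look in practice is sketched below. It is an illustration added for the reader, not code from the interview. The asynchronous edge waits on many simulated clients at once, while the "business logic" stays an ordinary synchronous function that is handed to a worker thread with `asyncio.to_thread` (Python 3.9+). The names `fetch`, `score`, and `handle` are hypothetical, and `asyncio.sleep` stands in for real network I/O.

```python
import asyncio


def score(payload):
    """Synchronous core logic: plain Python, no awaits."""
    return sum(ord(ch) for ch in payload) % 100


async def fetch(client_id):
    """Stand-in for waiting on an external resource."""
    await asyncio.sleep(0.01)  # simulated network latency
    return f"payload-{client_id}"


async def handle(client_id):
    payload = await fetch(client_id)
    # Hand the synchronous function to a worker thread so the event
    # loop stays free to service the other waiting clients.
    return await asyncio.to_thread(score, payload)


async def main(n_clients=50):
    # The async layer exists only to wait on many clients concurrently.
    return await asyncio.gather(*(handle(i) for i in range(n_clients)))


results = asyncio.run(main())
print(len(results), results[:5])
```

The synchronous `score` function knows nothing about the event loop, so it can be tested and reused without any async machinery.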

Driscoll: Thank you, Mike Bayer.
