Your code can sometimes take a long time to execute. Depending on who notices, this may or may not be an issue. If some code takes 30 seconds to do its thing “behind the scenes,” the wait may not be an issue. However, if your user is waiting for your application to respond, and it takes 30 seconds, everyone notices. What you should do to fix this problem depends on what you’re trying to do (and who’s doing the waiting). In this short chapter, we’ll briefly discuss some options, then look at one solution to the issue at hand: what happens if something takes too long?
When you write code that has the potential to make your users wait, you need to think carefully about what it is you are trying to do. Let’s consider some points of view.
Maybe it is the case that waiting for a write is different from waiting for a read, especially as it relates to how your webapp works?
Let’s take another look at the SQL queries in log_request and view_the_log to see how you’re using them.
In the log_request function, we are using an SQL INSERT to add details of the request to our backend database. When log_request is called, it waits while the INSERT is executed by cursor.execute.
Code that waits for something external to complete is referred to as “blocking code,” in that the execution of your program is blocked from continuing until the wait is over. As a general rule, blocking code that takes a noticeable length of time is bad.
The same holds for the view_the_log function, which also waits whenever the SQL SELECT query is executed.
Both functions block. However, look closely at what happens after the call to cursor.execute in both functions. In log_request, the cursor.execute call is the last thing that function does, whereas in view_the_log, the results of cursor.execute are used by the rest of the function.
Let’s consider the implications of this difference.
If you’re reading the title to this page and thinking “Of course they are!”, be assured that (this late in this book) we haven’t lost our marbles.
Yes: an SQL INSERT is different from an SQL SELECT, but, as it relates to your use of both queries in your webapp, it turns out that the INSERT in log_request doesn’t need to block, whereas the SELECT in view_the_log does, which makes the queries very different.
This is a key observation.
If the SELECT in view_the_log doesn’t wait for the data to return from the backend database, the code that follows cursor.execute will likely fail (as it’ll have no data to work with). The view_the_log function must block, as it has to wait for data before proceeding.
When your webapp calls log_request, it wants the function to log the details of the current web request to the database. The calling code doesn’t really care when this happens, just that it does. The log_request function returns no value, nor data; the calling code isn’t waiting for a response. All the calling code cares about is that the web request is logged eventually.
Which raises the question: why does log_request force its callers to wait?
Yes. That’s our madcap idea.
When users of your webapp enter a new search, they couldn’t care less that the request details are logged to some backend database, so let’s not make them wait while your webapp does that work.
Instead, let’s arrange for some other process to do the logging eventually and independently of the webapp’s main function (which is to allow your users to perform searches).
Here’s the plan: you’re going to arrange for the log_request function to execute independently of your main webapp. To do this, you’re going to adjust your webapp’s code so each call to log_request runs concurrently. This will mean that your webapp no longer has to wait for log_request to complete before servicing another request from another user (i.e., no more delays).
If log_request takes an instant, a few seconds, a minute, or even hours to execute, your webapp doesn’t care (and neither does your user). What you care about is that the code eventually executes.
When it comes to arranging for some of your application’s code to run concurrently, Python has a few options. As well as lots of support from third-party modules, the standard library comes with some built-in goodies that can help here.
One of the most well known is the threading library, which provides a high-level interface to the threading implementation provided by the operating system hosting your webapp. To use the library, all you need to do is import the Thread class from the threading module near the top of your program code:
from threading import Thread
Go ahead and add this line of code near the top of your vsearch4web.py file.
For the full list of (and all the details about) Python’s standard library concurrency options, see https://docs.python.org/3/library/concurrency.html.
Now the fun starts.
To create a new thread, you create a Thread object, assigning the name of the function you want the thread to execute to a named argument called target, and providing any arguments as a tuple to another named argument called args. The created Thread object is then assigned to a variable of your choosing.
As an example, let’s assume that you have a function called execute_slowly, which takes three arguments, which we’ll assume are three numbers. The code that invokes execute_slowly has assigned the three values to variables called glacial, plodding, and leaden. Here’s how execute_slowly is invoked normally (i.e., without our worrying about concurrent execution):
execute_slowly(glacial, plodding, leaden)
If execute_slowly takes 30 seconds to do what it has to do, the calling code blocks and waits for 30 seconds before doing anything else. Bummer.
In the big scheme of things, waiting 30 seconds for the execute_slowly function to complete doesn’t sound like the end of the world. But, if your user is sitting and waiting, they’ll be wondering what’s gone wrong.
If your application can continue to run while execute_slowly goes about its business, you can create a Thread to run execute_slowly concurrently. Here’s the normal function call once more, together with the code that turns the function call into a request for threaded execution:
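A minimal sketch of both forms follows. The body of execute_slowly, the calls list, and the three values are assumptions made for illustration; only the Thread line matters here.

```python
from threading import Thread

calls = []  # assumption: records each run of execute_slowly so you can see when it happens


def execute_slowly(glacial, plodding, leaden):
    """Hypothetical stand-in: the real function would take its time here."""
    calls.append(glacial + plodding + leaden)


glacial, plodding, leaden = 1, 2, 3

# The normal (blocking) call would be:
#     execute_slowly(glacial, plodding, leaden)

# The threaded request: name the function with target, pass the arguments as a tuple.
t = Thread(target=execute_slowly, args=(glacial, plodding, leaden))

print(calls)  # still empty: creating the Thread object runs nothing
```

Note that target is given the function’s name without parentheses; writing execute_slowly(...) there would call the function immediately instead of handing it to the thread.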
Granted, this use of Thread looks a little strange, but it’s not really. The key to understanding what’s going on here is to note that the Thread object has been assigned to a variable (t in this example), and that the execute_slowly function has yet to execute.
Assigning the Thread object to t prepares it for execution. To ask Python’s threading technology to run execute_slowly, start the thread like this:
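Here is a minimal sketch of starting a thread. The sleep of 0.2 seconds stands in for the 30-second delay, and the finished list and the final call to join are assumptions added only so the demonstration can show its result.

```python
import time
from threading import Thread

finished = []  # assumption: filled in by the thread when its work is done


def execute_slowly(glacial, plodding, leaden):
    time.sleep(0.2)  # assumption: a short sleep stands in for the 30-second wait
    finished.append(glacial + plodding + leaden)


t = Thread(target=execute_slowly, args=(1, 2, 3))
t.start()                     # returns right away; the function runs in its own thread

still_running = t.is_alive()  # the calling code carries on while the thread sleeps

t.join()                      # demonstration only: wait so the result can be printed
print(still_running, finished)
```

In the webapp there is no need to call join, as the whole point is that the calling code does not wait around.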
At this point, the code that called t.start continues to run. The 30-second wait that results from running execute_slowly has no effect on the calling code, as execute_slowly’s execution is handled by Python’s threading module, not by you. The threading module conspires with the Python interpreter to run execute_slowly eventually.
Your first instinct may be to back out the code you added to run log_request in its own thread (and get yourself back to a known good state). But let’s not panic, and let’s not do that. Instead, let’s take a look at that descriptive paragraph that appeared twice in the traceback messages:
... This typically means that you attempted to use functionality that needed an active HTTP request. Consult the documentation on testing for information about how to avoid this problem. ...
This message is coming from Flask, not from the threading module. We know this because the threading module couldn’t care less about what you use it for, and definitely has no interest in what you’re trying to do with HTTP.
Let’s take another look at the code that schedules the thread for execution, which we know takes 15 seconds to run, as that’s how long log_request takes. While you’re looking at this code, think about what happens during those 15 seconds.
The instant the thread is scheduled for execution, the calling code (the do_search function) continues to execute. The render_template function executes (in the blink of an eye), and then the do_search function ends.
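When do_search ends, Flask treats the web request as finished and tears down the request context that was active while the function ran. The local variables phrase, letters, title, and results go out of scope too, although any object handed to the thread as an argument stays alive for as long as the thread holds on to it. The catch is request: it is passed as an argument to log_request, which tries to use it 15 seconds later, but Flask’s request is only a stand-in for whichever HTTP request is active in the code that touches it. Unfortunately, the new thread has no active request of its own, and the one do_search was handling has long since finished. Bummer.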
Based on what you’ve just learned, it appears the log_request function (when executed within a thread) can no longer “see” the request it was given. This is because the request context that made the request object usable belonged to do_search’s handling of the web request; it is not available inside the new thread, and it is gone once do_search has ended. Specifically, the request object is no longer active, and when log_request goes looking for it, it can’t be found.
So, what can be done? Don’t fret: help is at hand.
There’s really no need for a rewrite.
At first glance, it might appear that you’d need to rewrite log_request to somehow rely less on its arguments... assuming that’s even possible. But it turns out that Flask comes with a decorator that can help here.
The decorator, copy_current_request_context, ensures that the HTTP request that is active when a function is called remains active even when the function is subsequently executed in a thread. To use it, you need to add copy_current_request_context to the list of imports at the top of your webapp’s code.
As with any other decorator, you apply it to an existing function using the usual @ syntax. However, there is a caveat: the function being decorated has to be defined within the function that calls it; the decorated function must be nested inside its caller (as an inner function).
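Here’s a minimal sketch of how the pieces might fit together. It is not the webapp’s actual code: the logged list stands in for the backend database, the short sleep stands in for the slow write, the search is reduced to a one-liner, and the view returns a plain string rather than rendering a template. All of those are assumptions made for illustration.

```python
import time
from threading import Thread

from flask import Flask, copy_current_request_context, request

app = Flask(__name__)
logged = []  # assumption: an in-memory list stands in for the backend database


@app.route('/search4', methods=['POST'])
def do_search() -> str:

    # Nested inside do_search so the decorator can copy the request
    # that is active at the moment this inner function is defined.
    @copy_current_request_context
    def log_request(req, res: str) -> None:
        time.sleep(0.1)  # assumption: stands in for the slow database write
        logged.append((req.form['phrase'], req.form['letters'], res))

    phrase = request.form['phrase']
    letters = request.form['letters']
    # Assumption: a simplified, sorted version of the search keeps the output predictable.
    results = ''.join(sorted(set(letters) & set(phrase)))

    t = Thread(target=log_request, args=(request, results))
    t.start()
    return results  # sent back to the user without waiting for the logging thread
```

Because log_request is decorated inside do_search, a copy of the active request context travels with it into the thread, so the request data is still there when the thread gets around to using it.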
Here are the four questions posed at the start of Chapter 11:
What happens if the database connection fails?
Is our webapp protected from web attacks?
What happens if something takes a long time?
What happens if a function call fails?
Your webapp now handles a number of runtime exceptions, thanks to your use of try/except and some custom exceptions that you can raise and catch as required.
When you know something can go wrong at runtime, fortify your code against any exceptions that might occur. This improves the overall robustness of your application, which is a good thing.
Note that there are other areas where robustness could be improved. You spent a lot of time adding try/except code to view_the_log’s code, which took advantage of the UseDatabase context manager. UseDatabase is also used within log_request, and should probably be protected, too (and doing so is left as a homework exercise for you).
Your webapp is more responsive due to your use of threading to handle a task that has to be performed eventually, but not right away. This is a good design strategy, although you do need to be careful not to go overboard with threads: the threading example in this chapter is very straightforward. However, it is very easy to create threading code that nobody can understand, and which will drive you mad when you have to debug it. Use threads with care.
In answering question 3 (what happens if something takes a long time?), the use of threads stopped your users from having to wait on the database write, but did nothing for the database read. The write itself takes just as long as before; it simply no longer holds up the response. With the read, it is a case of just having to wait for the data to arrive, no matter how long it takes, as the webapp isn’t able to proceed without the data.
To make the database read go faster (assuming it’s actually slow in the first place), you may have to look at using an alternative (faster) database setup. But that’s a worry for another day that we won’t concern ourselves with further in this book.
However, having said that, in the next and last chapter, we do indeed consider performance, but we’ll be doing so while discussing a topic everyone understands, and which we’ve already discussed in this book: looping.