Designing a class for reliable pickle processing

The __init__() method of a class is not actually used to unpickle an object. Instead, pickle bypasses __init__() by creating the object with __new__() and setting the pickled values directly into the object's __dict__. This distinction matters when our class definition includes some processing in __init__(). For example, if __init__() opens external files, creates some part of a GUI, or performs some external update to a database, then none of this will be performed during unpickling.
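
A minimal sketch can make the bypass visible. The Tracker class and its init_calls counter are hypothetical names introduced only for this illustration; the counter lives on the class, so it is not part of the pickled instance state:

```python
import pickle


class Tracker:
    """Hypothetical class that records how often __init__() runs."""

    init_calls = 0  # class-level counter, not part of the instance state

    def __init__(self, name: str) -> None:
        Tracker.init_calls += 1
        self.name = name


original = Tracker("first")
clone = pickle.loads(pickle.dumps(original))

print(Tracker.init_calls)  # counts __init__() calls across both objects
print(clone.name)          # attribute restored from the pickled state
```

If the description above holds, the counter should show a single call even though two Tracker objects exist.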

If we compute a new instance variable during the __init__() processing, there is no real problem. For example, consider a Blackjack Hand object that computes the total of the Card instances when the Hand is created. The ordinary pickle processing will preserve this computed instance variable. It won't be recomputed when the object is unpickled. The previously computed value will simply be unpickled.
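
As a rough illustration, assume a simplified hand whose cards are plain point values rather than Card instances. The TotalHand name is made up for this sketch:

```python
import pickle
from typing import List


class TotalHand:
    """Hypothetical simplified hand; cards are plain point values."""

    def __init__(self, *points: int) -> None:
        self.points: List[int] = list(points)
        # Computed eagerly and stored as an ordinary instance variable.
        self.total = sum(self.points)


h = TotalHand(10, 9)
h2 = pickle.loads(pickle.dumps(h))
print(h2.total)  # the stored value is part of the pickled state
```

The total attribute sits in the object's __dict__ alongside points, so it is saved and restored like any other attribute.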

A class that relies on processing during __init__() has to make special arrangements to be sure that this initial processing will happen properly. There are two things we can do:

  • Avoid eager startup processing in __init__(). Instead, do only the minimal initialization processing. For example, if there are external file operations, these should be deferred until required. If there are any eager summarization computations, they must be redesigned to be done lazily. Similarly, any logging done during initialization must be moved elsewhere, because it will not be executed when unpickling.
  • Define the __getstate__() and __setstate__() methods, which pickle uses to preserve and restore the state. The __setstate__() method can then invoke the same method that __init__() invokes to perform the one-time initialization processing in ordinary Python code.

We'll look at an example where the initial Card instances loaded into a Hand are logged for audit purposes by the __init__() method. Here's a version of Hand that doesn't work properly when unpickling:

import logging

audit_log = logging.getLogger("audit")

class Hand_bad:

    def __init__(self, dealer_card: Card, *cards: Card) -> None:
        self.dealer_card = dealer_card
        self.cards = list(cards)
        # Audit entries written here are skipped when unpickling.
        for c in self.cards:
            audit_log.info("Initial %s", c)

    def append(self, card: Card) -> None:
        self.cards.append(card)
        audit_log.info("Hit %s", card)

    def __str__(self) -> str:
        cards = ", ".join(map(str, self.cards))
        return f"{self.dealer_card} | {cards}"

This has two logging locations: during __init__() and append(). The __init__() processing works nicely for most cases of creating a Hand_bad object. It doesn't work when unpickling to recreate a Hand_bad object. Here's the logging setup to see this problem:

import logging
import sys
audit_log = logging.getLogger("audit")
logging.basicConfig(stream=sys.stderr, level=logging.INFO)

This setup creates the log and ensures that the logging level is appropriate for seeing the audit information. Here's a quick script that builds, pickles, and unpickles a Hand_bad object:

>>> h = Hand_bad(FaceCard("K", "♦"), AceCard("A", "♣"), Card("9", "♥"))
INFO:audit:Initial A♣
INFO:audit:Initial 9♥
>>> data = pickle.dumps(h)
>>> h2 = pickle.loads(data)

When we execute this, we see the log entries written during __init__() processing. These entries are not written when unpickling the Hand_bad object. Any other __init__() processing would also be skipped.

In order to properly write an audit log for unpickling, we could put lazy logging tests throughout this class. For example, we could extend __getattribute__() to write the initial log entries whenever any attribute is requested from this class. This leads to stateful logging and an if statement that is executed every time a Hand object does something. A better solution is to tap into the way state is saved and recovered by pickle:

from typing import Any, Dict

class Hand2:

    def __init__(self, dealer_card: Card, *cards: Card) -> None:
        self.dealer_card = dealer_card
        self.cards = list(cards)
        for c in self.cards:
            audit_log.info("Initial %s", c)

    def append(self, card: Card) -> None:
        self.cards.append(card)
        audit_log.info("Hit %s", card)

    def __str__(self) -> str:
        cards = ", ".join(map(str, self.cards))
        return f"{self.dealer_card} | {cards}"

    def __getstate__(self) -> Dict[str, Any]:
        return vars(self)

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Not very secure -- hard for mypy to detect what's going on.
        self.__dict__.update(state)
        for c in self.cards:
            audit_log.info("Initial (unpickle) %s", c)

The __getstate__() method is used while pickling to gather the current state of the object. This method can return anything. In the case of objects that have internal memoization caches, for example, the cache might not be pickled in order to save time and space. This implementation uses the internal __dict__ without any modification.
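
To illustrate the cache case, here is a small sketch under stated assumptions: Memoized, square(), and _cache are hypothetical names, and the "expensive" computation is a stand-in. The __getstate__() method returns a copy of the instance dictionary with an empty cache, so the live object keeps its own cache intact:

```python
import pickle
from typing import Any, Dict


class Memoized:
    """Hypothetical class with an internal memoization cache."""

    def __init__(self) -> None:
        self._cache: Dict[int, int] = {}

    def square(self, n: int) -> int:
        # Stand-in for an expensive computation worth caching.
        if n not in self._cache:
            self._cache[n] = n * n
        return self._cache[n]

    def __getstate__(self) -> Dict[str, Any]:
        state = vars(self).copy()  # shallow copy; leave the live object alone
        state["_cache"] = {}       # pickle an empty cache instead of the real one
        return state


m = Memoized()
m.square(12)
m2 = pickle.loads(pickle.dumps(m))
print(m._cache, m2._cache)
```

No __setstate__() is needed in this sketch, because the default behavior of merging the returned dictionary into the new object's __dict__ is sufficient.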

The __setstate__() method is used while unpickling to restore the state of the object. This version merges the state into the internal __dict__ and then writes the appropriate logging entries.
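
One way to avoid repeating the logging loop is to have both __init__() and __setstate__() call a shared method. The following self-contained sketch assumes a few things that are not part of the original example: cards are plain strings, Hand3 and _audit_initial() are hypothetical names, and ListHandler is a small helper that collects messages so we can inspect them:

```python
import logging
import pickle
from typing import Any, Dict, List

audit_log = logging.getLogger("audit")
audit_log.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Hypothetical handler that collects log messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


captured = ListHandler()
audit_log.addHandler(captured)


class Hand3:
    def __init__(self, dealer_card: str, *cards: str) -> None:
        self.dealer_card = dealer_card
        self.cards = list(cards)
        self._audit_initial("Initial")

    def _audit_initial(self, label: str) -> None:
        # Shared one-time processing used by both construction paths.
        for c in self.cards:
            audit_log.info("%s %s", label, c)

    def __getstate__(self) -> Dict[str, Any]:
        return vars(self)

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)
        self._audit_initial("Initial (unpickle)")


h = Hand3("K♦", "A♣", "9♥")
h2 = pickle.loads(pickle.dumps(h))
for message in captured.messages:
    print(message)
```

With this arrangement, the audit entries appear once for the original construction and once more, with a distinct label, for the unpickled copy.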

In the next section, we'll take a look at security and the global issue.
