In this chapter, you will learn about:
For most iPhone apps today, developers usually either load data from their own servers or consume data from third-party services. A minority of apps have data stored in the file system and load it to display to users when necessary. Very few apps do not use any kind of network or file IO processing. Therefore, understanding the impact of these types of processing helps you to figure out problems and solve them more easily.
Let's see how long it takes to load an image from file system to memory and from a given server to memory. Of course, the results will vary depending on how quickly the server processes the request, the speed of the network, and how far the server is from the testing machine. However, I want to demonstrate the important idea that loading an image across the network is much slower than loading it from a file and that loading it from a file is much slower than having the image already in memory. I tested the performance based on loading a 50kb image. Here are the results:
File Loading Time: 0.001147 seconds
Network Loading Time: 4.160634 seconds
Loading the image from the file system took about 1 millisecond, while loading from the network took about 4 seconds, a huge difference! The 1 millisecond doesn't seem like much in terms of performance; however, imagine if you needed to load 10-20 images at a time and some of the images were large in size, up to a few hundred kb. The total time for loading all those images would be more than a couple of seconds.
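A measurement like the file-loading one above can be taken with a few lines of code. The following is only a minimal sketch, not the exact test behind the numbers above; the function names SecondsToLoadFile and WriteSampleFile and the file name timing_sample.bin are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Returns the number of seconds it takes to read the file at `path` into memory,
// or -1 if the file cannot be read.
static NSTimeInterval SecondsToLoadFile(NSString *path) {
    NSDate *start = [NSDate date];
    NSData *data = [NSData dataWithContentsOfFile:path];
    NSTimeInterval elapsed = [[NSDate date] timeIntervalSinceDate:start];
    return (data != nil) ? elapsed : -1;
}

// Writes `byteCount` zero bytes to a file in the tmp directory and returns its path.
// The file name is a hypothetical one chosen for this sketch.
static NSString *WriteSampleFile(NSUInteger byteCount) {
    NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"timing_sample.bin"];
    NSMutableData *data = [NSMutableData dataWithLength:byteCount];
    [data writeToFile:path atomically:YES];
    return path;
}
```

Calling SecondsToLoadFile(WriteSampleFile(50 * 1024)) and logging the result with NSLog gives a figure comparable to the "File Loading Time" line. The actual value will differ from device to device.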
There are two main problems with loading from a file or the network.
UITableView, users will have to wait every time they scroll down to see more information.
Thus, because loading from file/network takes much more time than loading and processing data/images in memory, this loading process is usually your performance bottleneck. If your app has to wait for the data from the network, all other processes must wait as well. So testing for file/network loading should always be the first thing you do when you run into performance issues. As you saw in Chapter 2, you can observe the data loading process with System Usage and File Activity. Figure 4–1 shows the UI of those instruments.
Figure 4–1. File Activity instrument
Figure 4–2. List of results from File Activity
Figure 4–1 shows the file activities including loading and writing file/directory and reading the file attributes. Figure 4–2 shows more details about each activity, which helps you to see what kinds of file activities are running most often.
Figure 4–3. System Usage instrument
Figure 4–4. List of results from System Usage
Figures 4–3 and 4–4 show more about System Usage, which is more general and covers more data types. As you can see in Figure 4–4, there are activities with the plist and nib files.
This section will explain many important terms and concepts in caching and will introduce some basic algorithms for caching in general environments such as web and desktop applications. Many of these algorithms can be applied to the iPhone environment; these will be explained in subsequent sections of this chapter.
I will explain the following concepts: cache hit, cache miss, storage cost, retrieval cost, invalidation, replacement policy, and measuring cache. Then I will explain some of the most used caching algorithms: Belady's algorithm, random replacement, first in first out (FIFO), least frequently used (LFU), simple time-based, least recently used (LRU), and adaptive replacement cache (ARC).
Caching is when you store part of a set of data/images in a nearer level. For example, if the original images come from the network, you would store data/images in your file system so that next time your app won't need to go through the network to get the data/images again. Likewise, if the data/images are already on the file system, you would store it in memory so that your app can calculate or display these data/images immediately when necessary.
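The idea of looking in the nearest level first can be sketched as a small lookup function. This is only an illustration under some assumptions: the function name LoadThroughCaches and the block type OriginLoader are hypothetical, `name` is assumed to be a plain file name without path separators, and the loader block stands in for whatever network code your app uses.

```objc
#import <Foundation/Foundation.h>

// Hypothetical block type that fetches data from the original source (for example, the network).
typedef NSData *(^OriginLoader)(NSString *name);

// Looks for `name` in memory first, then in `cacheDirectory`, then at the original source.
// Whatever is found is copied into the nearer levels so that the next lookup is faster.
static NSData *LoadThroughCaches(NSString *name,
                                 NSMutableDictionary *memoryCache,
                                 NSString *cacheDirectory,
                                 OriginLoader loadFromOrigin) {
    // 1. Nearest level: memory
    NSData *data = [memoryCache objectForKey:name];
    if (data != nil) {
        return data;
    }
    // 2. Next level: file system
    NSString *filePath = [cacheDirectory stringByAppendingPathComponent:name];
    data = [NSData dataWithContentsOfFile:filePath];
    if (data == nil) {
        // 3. Farthest level: the original source; keep a copy on disk for next time
        data = loadFromOrigin(name);
        [data writeToFile:filePath atomically:YES];
    }
    if (data != nil) {
        [memoryCache setObject:data forKey:name];
    }
    return data;
}
```

With this shape, the expensive loader block runs only when both the memory and the file level miss.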
A cache hit happens when your app looks for a specific piece of data or image, finds it in the cache, and loads it directly from there. This is a good thing because it doesn't require loading it from the original source. In other words, it saves time.
The cache hit ratio is usually used to determine whether your algorithm is a good or bad one. It is usually combined with how much loading you save whenever a cache hit happens. For example, if you decide to store small images at 4kb per image and your cache hit ratio is around 90%, then you save the cost of loading 3.6kb per request on average. This savings helps you to improve the loading time from both network and file activities. However, if you store big images (200kb per image) and your cache hit ratio is only around 10%, you still save the cost of loading 20kb per request on average. Another perspective to consider is how your users feel about the performance: if users know that they will be receiving a big image and are willing to wait for the network, then it's fine to not cache that image often.
A cache miss happens when your app looks for a specific piece of data or image, can't find it in the cache, and so needs to retrieve it from the original place. This can be a bad thing for your app, for example, when the app is not connected to the Internet. When the app is not connected to the Internet and many cache misses happen, the app may show blank images/data to users. To avoid this, you usually need to cache many images and data when the app is online to show them offline.
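The arithmetic in this example is simply the item size multiplied by the cache hit ratio. As a small sketch (the function name AverageSavingsPerRequestKB is made up for illustration):

```objc
#import <Foundation/Foundation.h>

// Average amount of loading (in kb) avoided per request:
// the item size multiplied by the fraction of requests served from the cache.
static double AverageSavingsPerRequestKB(double itemSizeKB, double cacheHitRatio) {
    return itemSizeKB * cacheHitRatio;
}

// AverageSavingsPerRequestKB(4.0, 0.9)   is 3.6 (small images, high hit ratio)
// AverageSavingsPerRequestKB(200.0, 0.1) is 20  (big images, low hit ratio)
```

So a low hit ratio on large items can still save more loading than a high hit ratio on small ones.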
Retrieval cost is the cost to load image/data from network/file to the next data level. This can be separated into two cases.
Storage cost is also divided into two cases: storing in file and storing in memory.
“There are only two hard problems in Computer Science: cache invalidation and naming things.”
— Phil Karlton
As this statement suggests, cache invalidation is a hard problem. When developers cache images/data, they can become out of date quickly. The problem is how to know if images/data are out of date and when to check for the latest data.
For images, if every image is uniquely named by a URL/name, then when that URL/name changes (e.g. a user changes his avatar), your application will immediately know that the image is out of date and the application can get the new image.
However, if the URL/name of image doesn't change, you can set a specific period of time after which you get a new image, such as 3-7 days. Happily, most web services will take care of this and will change the URL when the image is actually changed.
For data, it's harder. If you cache the data in your file system (by using a database or plain text; I will explain more about this later in this chapter), then you may never know if your data is out of date. If every time you use the data you have to go to the network to check if there is new data in the server, you lose all the benefits of caching data within your file system.
There are a couple of ways to solve this cache invalidation problem.
Usually developers use a combination of the first and third approach because it makes more sense. You can only reduce the size of data so much; after that, you can't reduce it any more. The third approach can be done easily with a refresh button, and the app can reload for new data when the app is open or is closed, which can also be done easily.
Replacement policy is a strategy (usually implemented by a specific algorithm) to determine which piece of data/image will get deleted if necessary. It can also tell you at what time you should check to delete cache to keep a fresh cache base. There are four main things you should always consider when you pick one replacement policy over other policies.
I will cover some very basic and easy-to-implement algorithms like random replacement, first in first out and simple time-based plus more complex algorithms like least recently used and least frequently used.
This is a theoretical algorithm that states that the app should delete the piece of information that will not be needed for the longest time in the future. Why do I say that this is a theoretical algorithm? Because in many real-world cases, you never know if or when that piece of information will be necessary in the future. For example, say the web site changes the image to some new image and the app figures out that it doesn't need to use the old image anymore. Bing! The image gets deleted. Then the web designer changes his mind and restores the old image. Now the app needs to reload the old image. Therefore, most of the time (but not always), Belady's algorithm is not practical and is actually impossible to implement with real code. However, this algorithm is used as a benchmark to compare other algorithms.
There is not much to tell about this algorithm. When you need space, you delete a randomly chosen item from the cache.
Advantage:
Disadvantage:
Implementation:
File:
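// File Random Replacement
NSFileManager *fileManager = [NSFileManager defaultManager];
NSString *filePath = NSTemporaryDirectory();
NSDirectoryEnumerator *fileNames = [fileManager enumeratorAtPath:filePath];
NSString *firstFilePath = nil;
for (NSString *fileName in fileNames) {
    // The enumerator returns names relative to filePath, so build the full path.
    // This simply takes the first file returned, which is arbitrary rather than truly random.
    firstFilePath = [filePath stringByAppendingPathComponent:fileName];
    break;
}
if (firstFilePath != nil) {
    [fileManager removeItemAtPath:firstFilePath error:nil];
}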
Memory: For memory caching, you will usually use a dictionary that binds a unique URL/name to the object. This helps you retrieve the data easily by passing the unique URL/name to the dictionary. In this sample, I use cacheDictionary to store the memory data.
// cacheDictionary is a dictionary to store a map between a name of image and the
// image itself
NSObject *firstObj = nil;
for (NSObject *obj in [cacheDictionary allKeys]) {
    firstObj = obj;
    break;
}
// Passing a nil key raises an exception, so only remove when the cache is not empty
if (firstObj != nil) {
    [cacheDictionary removeObjectForKey:firstObj];
}
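If you want the choice to be genuinely random rather than "whichever key comes first", one option is to pick a random index into the list of keys. The following is a minimal sketch; the function name RemoveRandomEntry is hypothetical, and arc4random_uniform is used as the random source.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Removes one randomly chosen entry from the cache and returns its key,
// or nil if the cache is empty.
static id RemoveRandomEntry(NSMutableDictionary *cacheDictionary) {
    NSArray *keys = [cacheDictionary allKeys];
    if ([keys count] == 0) {
        return nil;
    }
    // arc4random_uniform returns a value in the range [0, count)
    NSUInteger index = arc4random_uniform((uint32_t)[keys count]);
    id key = [keys objectAtIndex:index];
    [cacheDictionary removeObjectForKey:key];
    return key;
}
```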
In short, this algorithm tells you that the first item to come into the cache will be the first one to get deleted.
Advantages:
Disadvantage:
Implementation:
File: You need to base it on the creation date of the files. You will need to find the creation date of all files and get the oldest creation date to delete.
NSFileManager *fileManager = [NSFileManager defaultManager];
NSString *filePath = NSTemporaryDirectory();
NSDirectoryEnumerator *fileNames = [fileManager enumeratorAtPath:filePath];
NSString *smallestDateFilePath = nil;
for (NSString *fileName in fileNames) {
    NSString *uniquePath = [filePath stringByAppendingPathComponent:fileName];
    NSDictionary *attributes = [fileManager attributesOfItemAtPath:uniquePath error:nil];
    NSDate *createdDate = [attributes objectForKey:NSFileCreationDate];
    // I will let you find the smallest createdDate yourself as a small exercise
    // if (createdDate is smallest) {
    //     smallestDateFilePath = uniquePath;
    // }
}
if (smallestDateFilePath != nil) {
    [fileManager removeItemAtPath:smallestDateFilePath error:nil];
}
Memory: It can be built with a dictionary to keep track of all data/images and an array to keep track of the order of the data/images. I use an array cacheOrders to store the list of unique names of images. This array stores data in an ordered manner: the oldest item is at index 0 and the newest item is at the end of the array.
// cacheDictionary is a dictionary to store a map between a name of image and the
// image itself
NSString *firstName = [cacheOrders objectAtIndex:0];
[cacheDictionary removeObjectForKey:firstName];
[cacheOrders removeObjectAtIndex:0];
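Putting the dictionary and the ordered array together gives a small FIFO cache class. This is a minimal sketch under some assumptions: the class name FIFOCache and its method names are made up for illustration, the capacity is assumed to be at least 1, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

@interface FIFOCache : NSObject
- (id)initWithCapacity:(NSUInteger)capacity;
- (void)setObject:(id)object forName:(NSString *)name;
- (id)objectForName:(NSString *)name;
@end

@implementation FIFOCache {
    NSMutableDictionary *_cacheDictionary; // name -> cached object
    NSMutableArray *_cacheOrders;          // names, oldest at index 0
    NSUInteger _capacity;
}

- (id)initWithCapacity:(NSUInteger)capacity {
    self = [super init];
    if (self) {
        _cacheDictionary = [[NSMutableDictionary alloc] init];
        _cacheOrders = [[NSMutableArray alloc] init];
        _capacity = capacity;
    }
    return self;
}

- (void)setObject:(id)object forName:(NSString *)name {
    if ([_cacheDictionary objectForKey:name] == nil) {
        // A new name: if the cache is full, evict the oldest entry first
        if ([_cacheOrders count] > 0 && [_cacheOrders count] >= _capacity) {
            NSString *oldestName = [_cacheOrders objectAtIndex:0];
            [_cacheDictionary removeObjectForKey:oldestName];
            [_cacheOrders removeObjectAtIndex:0];
        }
        [_cacheOrders addObject:name];
    }
    [_cacheDictionary setObject:object forKey:name];
}

- (id)objectForName:(NSString *)name {
    // Reading an item does not change its position in the FIFO order
    return [_cacheDictionary objectForKey:name];
}
@end
```

Note that an item keeps its original position even if it is read many times; only the order of arrival matters.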
This algorithm is based mainly on time. You specify a basic period of time for a cache. After that specific period of time, the app checks how long a cache has existed (the age of the cache). If the cache is older than a specific age (for example, 14 days), the cache is automatically deleted.
Advantages:
Disadvantages:
Implementation: Usually used with file system cache rather than memory cache.
File: You need to base it on the creation date of the files. You will need to find the creation date of all files and extract the files that have creation date older than your specified date. Then you delete those files.
NSFileManager *fileManager = [NSFileManager defaultManager];
NSString *filePath = NSTemporaryDirectory();
NSDirectoryEnumerator *fileNames = [fileManager enumeratorAtPath:filePath];
NSTimeInterval maximumTimeInterval = 7 * 24 * 3600; // 7 days caching
for (NSString *fileName in fileNames) {
    NSString *uniquePath = [filePath stringByAppendingPathComponent:fileName];
    NSDictionary *attributes = [fileManager attributesOfItemAtPath:uniquePath error:nil];
    NSDate *createdDate = [attributes objectForKey:NSFileCreationDate];
    // Age of the file: now minus creation date (positive for files created in the past)
    if ([[NSDate date] timeIntervalSinceDate:createdDate] > maximumTimeInterval) {
        [fileManager removeItemAtPath:uniquePath error:nil];
    }
}
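The age check at the heart of this algorithm can be isolated into a small function, which makes the direction of the date comparison easy to verify. This is only a sketch; the function name IsCacheExpired is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Returns YES when a cache entry created at `createdDate` is older than
// `maximumAge` seconds at time `now`.
static BOOL IsCacheExpired(NSDate *createdDate, NSDate *now, NSTimeInterval maximumAge) {
    // timeIntervalSinceDate: is positive when the receiver is later than the argument,
    // so `now` must be the receiver to get a positive age.
    NSTimeInterval age = [now timeIntervalSinceDate:createdDate];
    return age > maximumAge;
}
```

Passing `now` in as a parameter, instead of calling [NSDate date] inside the function, keeps the check easy to test with fixed dates.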
Next are the more difficult caching algorithms that you probably won't ever need to use. From my own experience, because the file storage can be really big, and you may only need to delete it once per several months, you can just use the simple time-based algorithm. For memory, because the memory is really limited in terms of storage but fast in terms of retrieval, you may need to use FIFO or LFU.
This algorithm is more complex than the algorithms covered up to this point. Imagine that you have a list of items. Whenever the same item is requested, you put that item at the head of the list. When you need to delete an item, you delete an item from the tail of the list. Table 4-1 displays an example array of four items.
As you can see in step 5, because a new item comes in (called 5) and I can only store four items inside the list, I have to delete item 3. But in a later step, item 3 is requested, and I don't have enough space to cache it, so I delete item 1.
Advantage:
Disadvantage:
Implementation: It can be implemented both with file and memory. But as mentioned, unless you need to store lots of data (especially images/videos) in file system and frequently need to access/delete them, you will not need to use this algorithm.
Memory: I'm only showing you one way to implement this algorithm, and it might not be the best way to do it. However, I hope that you get the main idea.
// cacheArrays is an array that stores cached items so that the least
// recently used object is at the end of the array.
// maximumCacheCount (an assumed constant) is the most items the cache may hold.
NSString *requestedItem = @"5";
if ([cacheArrays containsObject:requestedItem]) {
    // If the caller requests an existing item, I move that item to the top of the
    // array
    [cacheArrays removeObject:requestedItem];
    [cacheArrays insertObject:requestedItem atIndex:0];
} else {
    // If the code requests a new item that does not exist, I remove the last item
    // (only if cacheArrays is full) and add that new item at the top of the array.
    if ([cacheArrays count] >= maximumCacheCount) {
        [cacheArrays removeLastObject];
    }
    [cacheArrays insertObject:requestedItem atIndex:0];
}
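To actually hold the cached objects, the ordered list of names can be paired with a dictionary, in the same way as the FIFO memory cache. Here is a minimal sketch under some assumptions: the class name LRUCache and its method names are hypothetical, the capacity is assumed to be at least 1, and the code assumes ARC. It favors clarity over speed, because moving a name to the head of an array is a linear-time operation.

```objc
#import <Foundation/Foundation.h>

@interface LRUCache : NSObject
- (id)initWithCapacity:(NSUInteger)capacity;
- (void)setObject:(id)object forName:(NSString *)name;
- (id)objectForName:(NSString *)name;
@end

@implementation LRUCache {
    NSMutableDictionary *_cacheDictionary; // name -> cached object
    NSMutableArray *_cacheArrays;          // names, most recently used at index 0
    NSUInteger _capacity;
}

- (id)initWithCapacity:(NSUInteger)capacity {
    self = [super init];
    if (self) {
        _cacheDictionary = [[NSMutableDictionary alloc] init];
        _cacheArrays = [[NSMutableArray alloc] init];
        _capacity = capacity;
    }
    return self;
}

// Moves `name` to the head of the list, marking it as the most recently used.
- (void)markAsUsed:(NSString *)name {
    [_cacheArrays removeObject:name];
    [_cacheArrays insertObject:name atIndex:0];
}

- (void)setObject:(id)object forName:(NSString *)name {
    BOOL isNewName = ([_cacheDictionary objectForKey:name] == nil);
    if (isNewName && [_cacheArrays count] > 0 && [_cacheArrays count] >= _capacity) {
        // The cache is full: evict the least recently used item from the tail
        NSString *leastRecentName = [_cacheArrays lastObject];
        [_cacheDictionary removeObjectForKey:leastRecentName];
        [_cacheArrays removeLastObject];
    }
    [_cacheDictionary setObject:object forKey:name];
    [self markAsUsed:name];
}

- (id)objectForName:(NSString *)name {
    id object = [_cacheDictionary objectForKey:name];
    if (object != nil) {
        [self markAsUsed:name]; // a cache hit refreshes the item's position
    }
    return object;
}
@end
```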
This algorithm is a little bit different from the least recently used algorithm. It focuses on how often an item is requested. If an item is requested more often than other items, it should be kept in the cache. Table 4-2 demonstrates the least frequently used algorithm.
As you can see, with the same initial list and the same list of item requests, these two algorithms generate different results at the end.
Advantage:
Disadvantage:
Implementation: It can be implemented both with file and memory. But, as mentioned, unless you need to store lots of data (especially images/videos) in file system and frequently need to access/delete them, you won't need to use this algorithm.
Memory: As a quick exercise, write a small piece of Objective-C code to implement the algorithm. Don't look at the hints below until you have finished the algorithm yourself.
Again, try to do it yourself as an exercise before moving to the next page to see the hints.
HINTS: You should use two dictionaries, one to keep track of the name → object of the cache and the other of the name count of access. Whenever a new request comes in, you increase the count and check if you need to delete the old cache. If you need to delete the old cache, then you should choose the cache name with the smallest count of access.
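If you have tried the exercise and want something to compare against, here is one possible sketch that follows the hints. It is not the only way to do it. The class name LFUCache and its method names are hypothetical, storing an item is counted as its first access, ties between equally used items are broken arbitrarily, the capacity is assumed to be at least 1, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

@interface LFUCache : NSObject
- (id)initWithCapacity:(NSUInteger)capacity;
- (void)setObject:(id)object forName:(NSString *)name;
- (id)objectForName:(NSString *)name;
@end

@implementation LFUCache {
    NSMutableDictionary *_cacheDictionary; // name -> cached object
    NSMutableDictionary *_accessCounts;    // name -> NSNumber count of accesses
    NSUInteger _capacity;
}

- (id)initWithCapacity:(NSUInteger)capacity {
    self = [super init];
    if (self) {
        _cacheDictionary = [[NSMutableDictionary alloc] init];
        _accessCounts = [[NSMutableDictionary alloc] init];
        _capacity = capacity;
    }
    return self;
}

- (void)increaseCountForName:(NSString *)name {
    // A missing entry yields nil, whose unsignedIntegerValue is 0
    NSUInteger count = [[_accessCounts objectForKey:name] unsignedIntegerValue];
    [_accessCounts setObject:[NSNumber numberWithUnsignedInteger:count + 1] forKey:name];
}

// Removes the entry with the smallest access count.
- (void)removeLeastFrequentlyUsed {
    NSString *leastUsedName = nil;
    NSUInteger smallestCount = NSUIntegerMax;
    for (NSString *name in _accessCounts) {
        NSUInteger count = [[_accessCounts objectForKey:name] unsignedIntegerValue];
        if (count < smallestCount) {
            smallestCount = count;
            leastUsedName = name;
        }
    }
    if (leastUsedName != nil) {
        [_cacheDictionary removeObjectForKey:leastUsedName];
        [_accessCounts removeObjectForKey:leastUsedName];
    }
}

- (void)setObject:(id)object forName:(NSString *)name {
    BOOL isNewName = ([_cacheDictionary objectForKey:name] == nil);
    if (isNewName && [_cacheDictionary count] >= _capacity) {
        [self removeLeastFrequentlyUsed];
    }
    [_cacheDictionary setObject:object forKey:name];
    [self increaseCountForName:name];
}

- (id)objectForName:(NSString *)name {
    id object = [_cacheDictionary objectForKey:name];
    if (object != nil) {
        [self increaseCountForName:name];
    }
    return object;
}
@end
```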
Let's try some sample exercises to demonstrate some important points. As stated in the replacement policy, you will have to face four main issues when deciding your caching policy: cache hit ratio, latency, storage constraint, and algorithm complexity. The algorithm complexity depends on your capability; if you are a good developer, it may not have much effect on the decision over other factors.
Imagine that you have two file-caching scenarios related to cache hit ratio and retrieval constraint.
As you can see, the storage cost for both strategies is the same: 30MB. The cache hit ratio of the first strategy gives you 21KB while the other strategy gives you 50KB. In terms of retrieval cost, you already saved more bandwidth and loading time with the second strategy.
Table 4-3 offers a quick summary of the terms you just learned.
Developers should only care about caching a few things: images, videos, data, or sometimes HTML files so they can be loaded quickly in UIWebView.
I separate them out into two main types: either you cache the whole file or you cache data and the relationship between that data. For each kind, I will show you some important features that you need to remember when you store those files.
For files that you don't store inside your bundle, you have to think about where you should store your files. In other words, where should you cache your files so that you can retrieve them and not worry about their lifetime? There are some classic locations where you can store your files: tmp directory, cache directory, documents directory, photo albums (for images/videos only), and application bundle. Each of them has unique characteristics that you need to know and remember.
You can access and store data inside this directory by calling NSTemporaryDirectory();
Advantages:
Disadvantages:
Usage:
You can access and store data inside this directory by using the following code snippet:
+ (NSString *)userCacheDirectory {
    NSArray *paths = NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES);
    return [paths objectAtIndex:0];
}
Advantages:
Disadvantages:
Usage:
You can access and store data inside this directory by using the following code snippet:
+ (NSString *)userDocumentDirectory {
    NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    return [paths objectAtIndex:0];
}
Advantages:
Disadvantages:
Figure 4–5. List of documents inside the document folder
Document directory. It's a good place to download documents or photos from a web service that users always want to keep with them.
This directory belongs to the OS and will be shared between many applications. You are allowed to store photos and videos inside these albums. The main application for users to see all photos and videos in an iOS device is the Photos app, as shown in Figure 4–6.
To store photos and videos inside the Photos app, use the appropriate code snippet.
For images:
UIImageWriteToSavedPhotosAlbum(imageToSave, nil, nil, nil);
Note that imageToSave is a UIImage variable.
For videos:
// I assume that this video will be downloaded from the Internet
NSData *videoData = [self getVideoDataFromNetwork];
NSString *moviePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"video.mp4"];
// Write the data to disk first, because the save function works on a file path
[videoData writeToFile:moviePath atomically:YES];
UISaveVideoAtPathToSavedPhotosAlbum(moviePath, nil, nil, nil);
Advantages:
Disadvantages:
Usage:
Images and videos will be bundled to your app when you develop the app and will be installed with the app, as shown in Figure 4–7.
To get files, you can use the following code snippets:
Images:
UIImage *image = [UIImage imageNamed:@"check_icon.png"];
Other Files:
NSString *path = [[NSBundle mainBundle] bundlePath];
NSString *questionsPath = [path stringByAppendingPathComponent:@"Questions.plist"];
NSArray *questionData = [NSArray arrayWithContentsOfFile:questionsPath];
Disadvantages:
Usage:
For data caching, you may have to handle structured data; it's up to you to decide how and where to store that data. Here are the main ways to store your data.
You can store your structured data in a simple form with the plist or XML format; this can be visualized easily, as shown in Figure 4–8.
You can also see the data in XML format, as shown in Figure 4–9.
You can load and parse the plist file by using the following code:
NSString *path = [[NSBundle mainBundle] bundlePath];
NSString *questionsPath = [path stringByAppendingPathComponent:@"Questions.plist"];
NSArray *questionsData = [NSArray arrayWithContentsOfFile:questionsPath];
Advantages:
Disadvantages:
Usage:
CoreData is a powerful built-in framework provided by Apple to help developers manage and control their data easily. It creates a layer of object-oriented abstraction over the relationships between data. You can create your object relational model using drag and drop. Figure 4–10 shows an example of a CoreData model between questions, answers, and their categories.
CoreData will also generate code automatically. Figures 4–11 and 4–12 show you the code that is generated for the object question in Figure 4–10. CoreData will take care of all actions from storing, retrieving, and managing relationship between objects. There are great books on CoreData if you're interested in more information on using it to store data.
Advantages:
Disadvantages:
Usage:
CoreData usually uses SQLite as its backend storage. However, CoreData can also work with other kinds of stores (for example, a binary or an in-memory store), so it's hard to say if CoreData is more powerful than SQLite. Using SQLite is just like using other SQL database frameworks, with a few minor changes here and there.
Advantages:
Disadvantages:
Usage:
In general, I can't provide any techniques or offer any advice on when you should check the cache. This decision depends on the environment in which you are working; in other words, an iOS environment is different than a web environment. In a web server, you can run a check method in a loop to see if any cache is outdated and delete it. In an iOS environment, this is time- and CPU-consuming.
When you should check cache contents and delete them depends much on the algorithm that you use to cache them. For example, if you mainly use FIFO, LRU, or LFU, you may need to wait until a new request comes in. Or you can check if the cache is full; if it's full, you can delete the old cache based on your algorithm. This approach is usually used for memory caching, where you need a more precise algorithm and have a strict storage constraint.
For file caching, you will usually choose a simple time-based algorithm. File caching does not have a strict limited storage capacity like memory does. With a simple time-based algorithm, you don't need to worry too much about every single file because you may delete a set of files at the same time.
If you expect that you won't need much time to do calculations or to retrieve file information to check the attributes of the cache, you may not need to worry about when you check and delete them at all. Again, here is an important lesson: don't over-optimize. If you think your approach can cause a bottleneck, you can try to benchmark it. If the current approach runs well, it doesn't matter if your app checks and deletes cache right at the beginning, at the end, or the first time the app requests and stores some new cache.
Here are some main approaches:
Here are some specific details about how to do memory caching and the specific problems you may encounter when you do memory caching in iOS environment.
Because there is a strict storage constraint in memory, you can't and shouldn't cache too many images here. If you do, the iOS run-time environment will keep giving you memory warnings until your app is forced to close.
Some people are scared of storing in memory because of the limited memory environment. Not to worry: you do have memory to use; you just need to learn how to utilize and maximize your memory capability to improve your performance. If you're concerned, don't forget to double check if you are using too little or too much memory via the instruments you learned about in Chapter 2, as shown in Figure 4–13.
If you see that you are using too little memory compared to the capability of the memory environment, don't hesitate to use more. The more data and images you can cache in memory, the better performance your app will get. However, if you are using too much memory, be aware that your app may get memory warnings or be forced to close.
For memory caching, you may need to think about whether you want this caching to be available to any class and method within your application or only strictly to some classes and methods. If this is global access, anybody can change the cache whenever they want. However, if you keep the caching data too private and hard to access, the cache becomes useless because some classes or methods may need the data and will just request it from the server again. The actual software engineering term is data encapsulation, which means you should protect your data and only share it with the classes/methods that need it.
Global Access:
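You may need to use a static variable to define a global object. A variable can't be declared inside an @interface block, so keep the static variable in the implementation file and expose it through a class method.
#import <UIKit/UIKit.h>
@interface MyObject : NSObject
// Returns the one dictionary that is shared by the whole app
+ (NSMutableDictionary *)imagesCaching;
@end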
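One common way to give every class access to a single cache dictionary is a class method backed by a static variable. The following is only a sketch of that idea: the class name ImageCacheStore and the method name sharedImagesCaching are hypothetical, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

@interface ImageCacheStore : NSObject
// Returns the one dictionary shared by every caller in the app.
+ (NSMutableDictionary *)sharedImagesCaching;
@end

@implementation ImageCacheStore
+ (NSMutableDictionary *)sharedImagesCaching {
    // The static variable lives for the lifetime of the app and is created only once
    static NSMutableDictionary *imagesCaching = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        imagesCaching = [[NSMutableDictionary alloc] init];
    });
    return imagesCaching;
}
@end
```

Keep in mind that NSMutableDictionary is not thread-safe, so a globally shared dictionary like this one should only be changed from one thread at a time (for example, the main thread).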
Strict Access:
#import <UIKit/UIKit.h>
@interface RootViewController : UITableViewController {
@private
    NSMutableDictionary *imageCaches;
}
- (void)cacheImage:(UIImage *)image withName:(NSString *)uniqueURL;
- (UIImage *)getCacheImageWithName:(NSString *)uniqueURL;
@end
#import "RootViewController.h"
@implementation RootViewController
- (void)cacheImage:(UIImage *)image withName:(NSString *)uniqueURL {
    // main code goes here
}
- (UIImage *)getCacheImageWithName:(NSString *)uniqueURL {
    // main code goes here
    return nil; // placeholder until the lookup is implemented
}
@end
Preloading is when you load the image before you actually need it. The good thing about this is that it saves time when you actually need to display the file/image. The difficult part is that you may need to guess if you need to preload the images. Here's a simple case where you may need to preload images for faster performance (Figure 4–14).
As Figure 4–14 shows, for a view like PageView where the user wants to have a smooth scrolling experience, preloading images into cache is the best way to keep the scrolling smooth. Otherwise, the user either has to wait after scrolling to the view (for multithreading or loading the image after stopping in the view) or has a jerky scrolling experience.
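For a paged view, deciding what to preload can be as simple as "the current page plus its neighbors". The following sketch computes that range of page indexes. The function name PagesToPreload and the pagesAhead parameter are made up for illustration; loading the images for those pages is left to your own code.

```objc
#import <Foundation/Foundation.h>

// Returns the range of page indexes to keep in the memory cache: the current page
// plus up to `pagesAhead` neighbors on each side, clamped to the valid pages.
// Returns an empty range if there are no pages or currentPage is out of bounds.
static NSRange PagesToPreload(NSUInteger currentPage, NSUInteger pageCount, NSUInteger pagesAhead) {
    if (pageCount == 0 || currentPage >= pageCount) {
        return NSMakeRange(0, 0);
    }
    NSUInteger first = (currentPage > pagesAhead) ? currentPage - pagesAhead : 0;
    NSUInteger last = MIN(currentPage + pagesAhead, pageCount - 1);
    return NSMakeRange(first, last - first + 1);
}
```

For example, with 10 pages and one neighbor on each side, page 5 gives the range of pages 4 to 6, while page 0 gives only pages 0 and 1. A larger pagesAhead value makes scrolling smoother at the cost of more memory.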
Another way to load the images into cache is called just in time. This saves the actual bandwidth or CPU-loading process until you are sure that you really need that data/images. The bad part about this is that it may slow down and keep your users waiting for the data. This approach is good if you need really big data/images that will consume a lot of memory and your users are willing to wait for a couple of seconds.
Figure 4–15 is an example where the user knows that he may need to wait for a couple of seconds in order to get the big image view. If you load from file to memory to display, the loading time will be quite fast; therefore you may not need to worry about preloading the images into memory. Another problem is if the image is big, preloading it into memory will cost your app a huge amount of memory.
Based on my tests, you can see that the performance from network is much slower than performance from file and the performance from file is much slower than performance from memory. You should always try to identify the bottleneck with instrument tools before spending too much of your time to optimize some parts. In other words, please avoid over-optimization.
You learned about basic topics in caching such as cache hit, cache miss, storage cost, retrieval cost, replacement policy, and several well-known algorithms. Some of them are really basic but not efficient; some of them can be efficient but cost time and effort to implement. You will probably use the random replacement algorithm for simplicity and LRU or LFU for complex cases.
You also learned about all the possible ways to store cache and the scenarios in which to use each approach. You can store your data in the temporary folder, the cache folder, or the document folder based on the specific purpose of your requirement. You may also choose to use the photos album so you can easily share it with other applications. You also learned the techniques to store/cache data inside your app and the different results, advantages, and disadvantages.
Lastly, you learned about memory caching. Should you allow global access or strict access to your memory cache data? Should you preload data into memory or load them just in time? Both offer benefits and drawbacks that you will need to consider carefully before making your decision.
EXERCISES