We’ll start by defining our TextQuery class. The user will create objects of this class by supplying an ifstream from which to read the input file. This class also provides the query operation, which takes a string and returns a QueryResult representing the lines on which that string appears.
The data members of the class have to take into account the intended sharing with QueryResult objects. The QueryResult class will share the vector representing the input file and the sets that hold the line numbers associated with each word in the input. Hence, our class has two data members: a shared_ptr to a dynamically allocated vector that holds the input file, and a map from string to shared_ptr<set>. The map associates each word in the file with a dynamically allocated set that holds the line numbers on which that word appears.
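The sharing described above can be sketched in isolation. This is a minimal illustration rather than the classes themselves; the function name shared_line_survives and the sample text are hypothetical. It suggests why shared_ptr fits here: a copy of the pointer keeps the vector alive even after the original owner is gone.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: returns the first stored line after the original
// owner of the vector has gone out of scope.
std::string shared_line_survives()
{
    // plays the role of the pointer held by a QueryResult
    std::shared_ptr<std::vector<std::string>> copy;
    {
        // plays the role of the file member of a TextQuery
        auto file = std::make_shared<std::vector<std::string>>();
        file->push_back("first line");
        copy = file;                    // both pointers now share one vector
        assert(file.use_count() == 2);  // two owners while both are in scope
    }
    // the original owner is gone, but the vector is still alive
    assert(copy.use_count() == 1);
    return copy->front();
}
```

Calling shared_line_survives() returns the stored line even though the pointer that created the vector no longer exists.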
To make our code a bit easier to read, we’ll also define a type member (§ 7.3.1, p. 271) to refer to line numbers, which are indices into a vector of strings:
#include <fstream>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

class QueryResult; // declaration needed for return type in the query function
class TextQuery {
public:
    using line_no = std::vector<std::string>::size_type;
    TextQuery(std::ifstream&);
    QueryResult query(const std::string&) const;
private:
    std::shared_ptr<std::vector<std::string>> file; // input file
    // map of each word to the set of the lines in which that word appears
    std::map<std::string,
             std::shared_ptr<std::set<line_no>>> wm;
};
The hardest part about this class is untangling the class names. As usual, for code that will go in a header file, we use std:: when we use a library name (§ 3.1, p. 83). In this case, the repeated use of std:: makes the code a bit hard to read at first. For example,
std::map<std::string, std::shared_ptr<std::set<line_no>>> wm;
is easier to understand when rewritten as
map<string, shared_ptr<set<line_no>>> wm;
TextQuery Constructor
The TextQuery constructor takes an ifstream, which it reads a line at a time:
// read the input file and build the map of words to line numbers
TextQuery::TextQuery(ifstream &is): file(new vector<string>)
{
    string text;
    while (getline(is, text)) {       // for each line in the file
        file->push_back(text);        // remember this line of text
        line_no n = file->size() - 1; // the current line number
        istringstream line(text);     // separate the line into words
        string word;
        while (line >> word) {        // for each word in that line
            // if word isn't already in wm, subscripting adds a new entry
            auto &lines = wm[word];   // lines is a shared_ptr
            if (!lines) // that pointer is null the first time we see word
                lines.reset(new set<line_no>); // allocate a new set
            lines->insert(n);         // insert this line number
        }
    }
}
The constructor initializer allocates a new vector to hold the text from the input file. We use getline to read the file a line at a time and push each line onto the vector. Because file is a shared_ptr, we use the -> operator to dereference file to fetch the push_back member of the vector to which file points.
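As a small illustration of the -> operator on a shared_ptr, the following sketch (with a hypothetical function name, not part of the program above) calls vector members both through -> and through an explicit dereference.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: pushes two lines through a shared_ptr and returns
// how many lines the underlying vector holds.
std::size_t count_lines_via_pointer()
{
    std::shared_ptr<std::vector<std::string>> file(new std::vector<std::string>);
    file->push_back("alpha");   // equivalent to (*file).push_back("alpha")
    (*file).push_back("beta");  // explicit dereference, then member access
    assert(file->size() == (*file).size());  // both forms reach the same vector
    return file->size();
}
```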
Next we use an istringstream (§ 8.3, p. 321) to process each word in the line we just read. The inner while uses the istringstream input operator to read each word from the current line into word. Inside the while, we use the map subscript operator to fetch the shared_ptr<set> associated with word and bind lines to that pointer. Note that lines is a reference, so changes made to lines will be made to the element in wm.
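The word-splitting step can be tried on its own. The sketch below assumes a hypothetical helper named split_words; it uses the same istringstream input operator as the inner while, which skips leading whitespace and reads one whitespace-delimited word at a time.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line of text into whitespace-separated words.
std::vector<std::string> split_words(const std::string &text)
{
    std::istringstream line(text);  // treat the line as an input stream
    std::vector<std::string> words;
    std::string word;
    while (line >> word)            // fails once no words remain
        words.push_back(word);
    assert(line.eof());             // the loop ends only when the line is exhausted
    return words;
}
```

Note that punctuation stays attached to its word, so "fox," and "fox" would be treated as different words by this approach.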
If word wasn’t in the map, the subscript operator adds word to wm (§ 11.3.4, p. 435). The element associated with word is value initialized, which means that lines will be a null pointer if the subscript operator added word to wm. If lines is null, we allocate a new set and call reset so that the shared_ptr to which lines refers points to this newly allocated set.
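The behavior described above can be checked with a short sketch. The function name and the key "hello" are hypothetical, and std::size_t stands in for line_no; the map and set types otherwise mirror those of wm.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical helper: returns true if subscripting a new key yields a null
// shared_ptr, and then allocates a set through the reference.
bool subscript_creates_null_pointer()
{
    std::map<std::string, std::shared_ptr<std::set<std::size_t>>> wm;
    auto &lines = wm["hello"];   // adds "hello" with a value-initialized pointer
    bool was_null = !lines;      // true: a value-initialized shared_ptr is null
    if (!lines)
        lines.reset(new std::set<std::size_t>);  // allocate the set on first sight
    std::size_t line_number = 0;
    lines->insert(line_number);
    // lines is a reference, so the element stored in wm was updated too
    assert(wm.size() == 1 && wm["hello"]->count(line_number) == 1);
    return was_null;
}
```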
Regardless of whether we created a new set, we call insert to add the current line number. Because lines is a reference, the call to insert adds an element to the set in wm. If a given word occurs more than once in the same line, the call to insert does nothing.
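To see why a repeated word on one line is harmless, consider this minimal sketch (the function name is hypothetical). The single-element insert on a set returns a pair whose bool member reports whether an element was actually added.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical helper: inserts the same line number twice and returns the
// resulting size of the set.
std::size_t insert_same_line_twice()
{
    std::set<std::size_t> lines;
    std::size_t n = 3;             // an arbitrary line number
    auto first = lines.insert(n);  // returns pair<iterator, bool>
    auto second = lines.insert(n); // same key: the set is left unchanged
    assert(first.second);          // the first insertion added an element
    assert(!second.second);        // the second insertion did not
    return lines.size();
}
```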
QueryResult Class
The QueryResult class has three data members: a string that is the word whose results it represents; a shared_ptr to the vector containing the input file; and a shared_ptr to the set of line numbers on which this word appears. Its only member function is a constructor that initializes these three members:
class QueryResult {
friend std::ostream& print(std::ostream&, const QueryResult&);
public:
    // same type as TextQuery::line_no; defined here so that the name
    // line_no is in scope inside this class
    using line_no = std::vector<std::string>::size_type;
    QueryResult(std::string s,
                std::shared_ptr<std::set<line_no>> p,
                std::shared_ptr<std::vector<std::string>> f):
        sought(s), lines(p), file(f) { }
private:
    std::string sought; // word this query represents
    std::shared_ptr<std::set<line_no>> lines; // lines it's on
    std::shared_ptr<std::vector<std::string>> file; // input file
};
The constructor’s only job is to store its arguments in the corresponding data members, which it does in the constructor initializer list (§ 7.1.4, p. 265).
query Function
The query function takes a string, which it uses to locate the corresponding set of line numbers in the map. If the string is found, the query function constructs a QueryResult from the given string, the TextQuery file member, and the set that was fetched from wm.
The only question is: What should we return if the given string is not found? In this case, there is no set to return. We’ll solve this problem by defining a local static object that is a shared_ptr to an empty set of line numbers. When the word is not found, we’ll return a copy of this shared_ptr:
QueryResult
TextQuery::query(const string &sought) const
{
    // we'll return a pointer to this set if we don't find sought
    static shared_ptr<set<line_no>> nodata(new set<line_no>);
    // use find and not a subscript to avoid adding words to wm!
    auto loc = wm.find(sought);
    if (loc == wm.end())
        return QueryResult(sought, nodata, file); // not found
    else
        return QueryResult(sought, loc->second, file);
}
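The lookup logic in query can be exercised without the surrounding classes. In the sketch below, the aliases LineSet and WordMap and the function lookup are hypothetical names, and std::size_t stands in for line_no. It shows that find leaves the map unchanged on a miss and that every miss shares the same empty set.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

using LineSet = std::set<std::size_t>;                            // hypothetical alias
using WordMap = std::map<std::string, std::shared_ptr<LineSet>>;  // hypothetical alias

// Hypothetical helper mirroring the lookup in query: returns the set for
// word, or a shared empty set when word is absent. Because wm is const,
// the subscript operator is not available; find works on a const map.
std::shared_ptr<LineSet> lookup(const WordMap &wm, const std::string &word)
{
    // created once, on the first call, and reused on every miss
    static std::shared_ptr<LineSet> nodata(new LineSet);
    assert(nodata->empty());  // callers are expected to leave this set empty
    auto loc = wm.find(word);
    if (loc == wm.end())
        return nodata;        // not found: no entry is added to wm
    return loc->second;       // found: share the existing set
}
```

One consequence of this design is that a miss costs no allocation: each unsuccessful lookup returns a copy of the same shared_ptr.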
The print function prints its given QueryResult object on its given stream:
ostream &print(ostream &os, const QueryResult &qr)
{
    // if the word was found, print the count and all occurrences
    os << qr.sought << " occurs " << qr.lines->size() << " "
       << make_plural(qr.lines->size(), "time", "s") << endl;
    // print each line in which the word appeared
    for (auto num : *qr.lines) // for every element in the set
        // don't confound the user with text lines starting at 0
        os << " (line " << num + 1 << ") "
           << *(qr.file->begin() + num) << endl;
    return os;
}
We use the size of the set to which qr.lines points to report how many matches were found. Because that set is in a shared_ptr, we have to remember to dereference lines. We call make_plural (§ 6.3.2, p. 224) to print time or times, depending on whether that size is equal to 1.
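Because make_plural is defined elsewhere, the sketch below supplies a stand-in written to match the description above; the version in § 6.3.2 may differ in detail. The helper summary is a hypothetical name that builds the first line print writes.

```cpp
#include <cstddef>
#include <string>

// Stand-in for make_plural (an assumption based on the description above):
// returns word unchanged when ctr is 1, otherwise word followed by ending.
std::string make_plural(std::size_t ctr, const std::string &word,
                        const std::string &ending)
{
    return ctr == 1 ? word : word + ending;
}

// Hypothetical helper: builds the summary line that print writes first.
std::string summary(const std::string &sought, std::size_t count)
{
    return sought + " occurs " + std::to_string(count) + " "
           + make_plural(count, "time", "s");
}
```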
In the for we iterate through the set to which lines points. The body of the for prints the line number, adjusted to use human-friendly counting. The numbers in the set are indices of elements in the vector, which are numbered from zero. However, most users think of the first line as line number 1, so we systematically add 1 to the line numbers to convert to this more common notation.
We use the line number to fetch a line from the vector to which file points. Recall that when we add a number to an iterator, we get an iterator that many elements further into the vector (§ 3.4.2, p. 111). Thus, file->begin() + num denotes the element num positions after the start of the vector to which file points, and dereferencing that iterator yields the line itself.
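The iterator arithmetic can be checked against ordinary subscripting. The helper line_at below is a hypothetical name, and the bounds check is an addition for the sketch; print itself relies on the set holding only valid indices.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fetches the line at index num by iterator arithmetic.
std::string line_at(const std::vector<std::string> &file, std::size_t num)
{
    assert(num < file.size());     // begin() + num must stay inside the vector
    return *(file.begin() + num);  // same element as file[num]
}
```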
Note that this function correctly handles the case that the word is not found. In this case, the set will be empty. The first output statement will note that the word occurred 0 times. Because *qr.lines is empty, the for loop won’t be executed.
Exercise 12.30: Define your own versions of the TextQuery and QueryResult classes and execute the runQueries function from § 12.3.1 (p. 486).
Exercise 12.31: What difference(s) would it make if we used a vector instead of a set to hold the line numbers? Which approach is better? Why?
Exercise 12.32: Rewrite the TextQuery and QueryResult classes to use a StrBlob instead of a vector<string> to hold the input file.
Exercise 12.33: In Chapter 15 we’ll extend our query system and will need some additional members in the QueryResult class. Add members named begin and end that return iterators into the set of line numbers returned by a given query, and a member named get_file that returns a shared_ptr to the file in the QueryResult object.