Vladimir B. Balakirsky† and Anahit R. Ghazaryan, State University of Aerospace Instrumentation, St-Petersburg, Russia
We propose a network-type scheme of private information retrieval, presented as a modification of the conventional setup, where the user is replaced with two users, the user-sender and the user-receiver. As a result of communication, the user-receiver becomes informed about the bit located at a certain position of the database, owned by the servers. Each server receives a query from the user-sender that contains information about the position in a hidden form and the server cannot disclose this position. On the basis of the query and the database, each server forms the replica, which is then transmitted to the user-receiver. By combining replicas, the user-receiver decodes the retrieved bit. We present a simple algebraic scheme where the communication complexity and the computational complexity are expressed as functions of the logarithm of the database size. The approaches allow extensions to the one-server scheme, the multi-scheme with noisy replicas of a fixed number of servers, and the authentication of a certain fragment of the database.
privacy; cryptography; networking; encoding; decoding
Private information retrieval schemes are cryptographic protocols designed to safeguard the privacy of database users. They allow clients to retrieve records from public databases while completely hiding the identity of the retrieved records from database owners [1]. One of the earliest references for problems of this sort belongs to Rivest et al. [2]. Formalization of the private retrieval problem, which was studied by many authors, belongs to Chor et al. [3] and a number of surveys are available on the Internet. If the database is owned by one server and information-theoretic privacy is required, then there is no better solution than the trivial one when the server transmits the whole database to the user. However, if constraints on the privacy of the retrieval are relaxed, there are algorithms based on the use of computationally hard problems that have to be solved by the server to discover the data and the use of the so-called one-way hash functions. An implementation of these algorithms is usually time-consuming for the database user. Solutions to the multi-server private information retrieval problem are based on the encoding of the content of the database, as shown by Chor et al. [3]. In particular, Woodruff-Yekhanin [4] proposed solutions that are based on algebraic encoding.
We use the ideas of the Woodruff-Yekhanin scheme and present a simple algebraic solution to a variant of the multi-server private information retrieval problem. Moreover, the database user in our scheme can be split between the user-sender, who sends the query to the servers, and the user-receiver, who decodes the retrieved bit. This network-type organization of information retrieval brings additional possibilities. As the server, which makes the decision associated with the corresponding bits, does not emit energy, its location cannot be discovered. Furthermore, the query is randomized by the user-sender, but parameters of the randomization should not be delivered to the user-receiver, which decodes the retrieved bit without knowledge of the query.
The chapter is organized as follows: In the first section we describe the data processing scheme and discuss its constraints and complexities. In the second section we present algorithmic aspects of the approach, while we discuss their algebraic background in the third sections. Possible extensions of the setup are briefly discussed in the concluding section.
We will consider the data processing scheme containing the user-sender, the user-receiver, and servers. Each server has a binary vector, , which is interpreted as content of the database. The user-sender chooses an index, . The user-receiver wants to know the bit , while servers have to be ignorant about the index
We fix an integer and assume that where is the function specified later, which is determined by the structure of the -ary extension of the Galois field , and design a scheme parameterized by the integers chosen in such a way that and The user-sender sends a query formed as a list of binary matrices, where the matrix is constructed by application of a deterministic function of and a randomly chosen binary matrix that is, for all The -th server computes the replica, expressed as a binary column vector of length by application of a deterministic function of the query and the vector that is, for all The scheme is constructed in such a way that the value of and the binary vector of length , obtained by the concatenation of replicas allows the user-receiver to decode bit
The communication complexity of the scheme is defined as the total number of bits transmitted over the channels user-sender → servers → user-receiver. As and are lengths of the query and the replicas, the communication complexity can be expressed as Comp We are also interested in the quantity since is the number of bits needed to specify the index when no constraints on privacy of the retrieval are included. The computational complexity is understood as the total number of arithmetic operations performed by the user-sender, the servers, and the user-receiver. As it will follow from the description of the scheme, all operations are reduced to multiplications of binary matrices and their number is linear in
The approximation and the approximation to the function given above, bring and Therefore, Comp where The numerical illustration of the line
We assume that and introduce as the set of column-vectors whose components, denoted by belong to the set and for all Let us construct a one-to-one mapping which will be referred to as the -encoding. This can be done by the following lexicographic algorithm: (1) Set for all output the vector having components and the set (2). For all find the minimum index such that where , and set equal to if respectively. Output the vector having components and the set
Let be the -ary extension of the Galois field constructed using a primitive polynomial of degree and let be the primitive element defined as the root of the polynomial that is, As each element can be uniquely expressed by a linear combination of the elements components of the column-vector which specifies the binary representation of the element are determined by the equality
Moreover, implies and vice versa for all and
For any vector let
where .
Let the encoding algorithm be organized on the basis of binary matrices We also introduce the binary matrix , where the vector is constructed in such a way that the 0-th component is equal to 1 or 0, if or respectively. All other components of the vector are 0s. Let the encoding be defined as dependent transformations
for all
Let the -th server run the following algorithm: (1) Represent the matrix by the vector (2) Compute and transmit the binary column-vector to the user-receiver.
The user-receiver uses the received replicas and the binary matrix , determined by the primitive polynomial , and runs the following algorithm: (1) Construct the binary column-vector of length by concatenating column-vectors (2) Compute the binary column-vector and output the decision if is the all-zero vector and otherwise.
The testing program contains the subroutines:
and it should be organized according to a certain scenario of generating the index the matrix and the vector The subroutines above use the matrices and determined by the primitive polynomial
In the following numerical illustration, we assume that and use the hex-decimal notation for binary vectors.
Suppose that is the primitive polynomial, which is used to construct Then the elements of are determined by the vector
We set and consider the retrieval of a bit in the binary vector of length . If the (4,2)-encoding is organized according to the lexicographic algorithm, then
In particular, if then and (0,8,8,0). The columns of the matrices are defined as binary representations of four powers of the elements that is,
Suppose that Then
and
The 4×4 matrices are transmitted to the first server, the second server, and the third server, respectively. Having received the matrix the -th server forms the vector Thus,
Let 10001. As we write Therefore,
The -th server sends the vector to the user-receiver, who concatenates the received vectors and constructs the column-vector The product of the matrix
by the vector is equal to the all-zero column-vector of length 4. Therefore, the user-receiver decides that Rules for constructing the decoding matrix will be specified in the next section.
Let be the vector of the maximum length whose components belong to the set and satisfy the condition where Under additional constraint that is the minimum element of the set the vector is uniquely determined, and it is known as the vector specifying leaders of cyclotomic classes or chords of size We also denote and for all Notice that all coefficients of the polynomial belong to For example, if then and
For all let us introduce the matrix and let bin for all For example, if and then
For a binary column-vector having components let
where if and if One can easily see that bin are binary representations of the values of the polynomial at We also denote
Notice that if and if where
is the polynomial of degree at most having the 0-th coefficient equal to 0. The 0-th coefficient of the polynomial is equal to 1. We force the user-receiver to solve the problem of constructing the bit on the basis of the values which is formulated as finding the solution to the two-hypotheses testing problem: The scheme should be designed in such a way that this problem has a unique solution in the following sense. Suppose that the system consisting of equations uniquely specifies the 0-th coefficient of the polynomial of degree at most The user-receiver outputs this coefficient as bit
Let us extend the notation of the previous subsection and denote
We also introduce as the vector of length whose components are different and belong to the set As the possible vector is Let and bin be the matrices constructed by the row concatenation of the matrices and respectively.
Consider the system of linear equations or, equivalently, the system
where is a binary column-vector of length . These systems have a unique solution for or, equivalently, the matrix bin has rank because there does not exist a polynomial of having the degree at most and the 0-th coefficient equal to 0, which is divisible by the product Let us denote the solution by and let Construct the vector bin and notice that
Therefore, if then the matrix which is used by the user-receiver, can be constructed by the concatenation of the binary identity matrix and the product of the matrix bin() by the inverse matrix for the matrix bin.
Implementation of the proposed simple data processing scheme, where all operations are reduced to matrix multiplications, can meet practical difficulties, as we require many servers to have the identical database, and all servers should simultaneously carry out any update of the database. However, our algorithms are prepared for extensions and used in the situation when the setup with the user-sender, servers, and the user-receiver is modified. Here we mention a few extensions:
1. There is a one-server scheme and the server, having received the query, forms the replicas from which he knows the bit transmitted to the user-receiver (actually, the decoding can be assigned to the server in this case). In other words, there is some information leakage in the retrieval. However, if the database contains approximately equal numbers of 0s and 1s, then the knowledge that the index belongs to the subset of cardinality does not seem to be essential. The problem is to find the encoding algorithm, which does not allow further reduction of the ambiguity about the index by analyzing the query. Algebraic approaches, described for the -server scheme, can be used to attack this problem.
2. Some of replicas in the scheme with servers can be incorrect. The case appears when some servers do not update the content of the database. This situation leads to the problem of decoding bit when at least replicas are generated by servers having identical databases.
3. The user-receiver wants to know whether a certain fragment of the database coincides with the fixed vector or not. In particular, the user-receiver can be interested in the solution to the problem when the fixed vector is the all-zero vector. A more efficient solution than asking the server(s) about all bits of the fragment can be found in this case.
We believe that the investigations in the directions above should start with the understanding of the approaches for the setup that is presented in this chapter.
In this chapter, we present simple algebraic solutions to some multi-server private information retrieval problems. Approaches are general, and they are useful for the development of other private information retrieval schemes.
1. Yekhanin S. A locally decodable codes and private information retrieval schemes. Foundations and Trends in Theoretical Computer Science. 2011;7(1):1–117.
2. Rivest R, Adelman L, Dertouzos M. On database and privacy homomorphism. Foundations of Secure Computation 1978;168–177.
3. Chor B, Goldreich O, Kushilevitz E, Sudan M. Private information retrieval. Proceedings of the thirty sixth Annual Foundations of Computer Science; 1995. p. 41–50. Also in Journal of the ACM 1998;45: 965–81.
4. Woodruff D, Yekhanin S. A geometric approach to information-theoretic private retrieval. Proceedings of the twentieth IEEE Computational Complexity Conference 2005;275–284.
†Deceased