Chapter 18

3D Model Retrieval

18.1. Introduction

The use of three-dimensional (3D) models in multimedia data is expanding rapidly. The advent of 3D scanners, the increasing power and simplicity of 3D creation tools and the arrival of 3D television (3DTV) all make it increasingly easy to create realistic and detailed 3D models. A question facing the designers of virtual worlds is how best to use the huge existing collections of 3D models, and how to retrieve models from these collections efficiently.

The most immediately apparent solution to this problem is one already used to index images (by Google Images, for example): textual collection indexing, whereby each element in a collection is described using one or more keywords. This solution, although it appears simple, presents a number of drawbacks, including the time involved in tagging or labeling a collection and the subjective nature of keywords, which are inevitably linked to the perception and culture of the tagging operator.

The most efficient method, and certainly the most intuitive for the final users of a retrieval system, is known as search by example. This method consists of characterizing 3D models using shape descriptors that are invariant to certain geometric transformations, notably translations, rotations and homothetic transformations (scaling). The user expresses a request in the form of an example of the object to be found in the collection. A metric is then used to compare the descriptors of the request with those of the models in the collection; for this comparison to be meaningful, the metric must express a distance that can be interpreted as a visual distance between two objects. The overall process of describing 3D models with such descriptors and retrieving them by example from a data collection is known as 3D model indexing.

3D models are represented in different ways according to need: point clouds, surface or volumetric representations, etc. Surface representations, which only take account of the external surface of the represented object, are the most widespread. They generally take the form of a polygon mesh, which constitutes a discrete sampling of a continuous surface. Polygon meshes are ideal for submitting a 3D object to a computer graphics card for rendering. In this chapter, we will focus on the representation of 3D models as polygonal, usually triangular, meshes.

We will also limit ourselves to considering the shape of objects. The colors and/or textures associated with the illustrations presented in this chapter are purely decorative, with no relevance or impact on the methods presented.

We will begin by discussing the general principles of 3D indexing before considering global 3D shape descriptors, 2D view descriptors and local 3D shape descriptors. We will then present the principles of 3D shape retrieval in video sequences, followed by methods for evaluating 3D indexing and recognition. Finally, we will discuss certain key applications of indexing and 3D recognition.

18.2. General principles of shape retrieval

The 3D model indexing process may be divided into two stages; before retrieval can be carried out, the collection needs to be prepared.

The first stage consists of indexing the collection of 3D models to search. During this stage, each 3D model is associated with one or more concise geometric descriptors describing the form of the object. These geometric descriptors must be invariant to geometric transformations such as translations, rotations and scaling (homothety). As none of these transformations affects the shape of 3D objects, they should also not affect geometric descriptors. These descriptors are known as 3D model indexes. This first stage is known as the offline stage, as it precedes the search stage. It may be relatively long, as calculations must be carried out for each 3D model in the collection; however, this is not a major drawback as the stage is carried out only once.

The second stage is the search by example itself. The user expresses a request – i.e. the desired 3D object – by presenting an example in one form or another: full 3D model, photograph or drawing of the object, etc., depending on the method used by the retrieval system. The system then calculates one or more geometric descriptors of the sort used in the offline stage. Using a distance, it then compares calculated descriptors with the indexes of each 3D model in the collection. Results are presented in the form of a list of 3D models by order of relevance, from that which most resembles the request (i.e. the 3D model with the minimum distance value) to that which is least like the request (i.e. the model with the maximum distance value). This second stage is known as the online stage, and is carried out for each new user request.

Figure 18.1 shows the two stages of 3D model indexing.

Figure 18.1. The two stages of 3D model indexing. The offline stage is carried out once, whereas the online stage is carried out for each request


During the offline stage, the main difficulty lies in finding or combining efficient geometric descriptors which are invariant to geometric transformations. For the online stage, the most critical aspect is the use of a suitable mathematical distance between 3D objects, one which can be interpreted as a visual distance.

18.3. Global 3D shape descriptors

Global descriptors are based on a shape function which measures the geometric properties of a 3D object as a whole, for example by breaking the object down into several elementary subobjects or by calculating distances between points on the surface of the object.

18.3.1. Distribution of a shape descriptor

Osada et al. [OSA 02] divide 3D mesh model indexing into three steps: selection of a discriminant shape function, efficient construction of the distribution of this function for each model in the collection and calculation of similarity distances between distribution pairs.

The authors tested five shape functions, chosen on the basis of simplicity of calculation:

– A3: the angle formed by three random points on the surface of the 3D model;

– D1: the Euclidean distance between a fixed point of the model (the center of gravity, for example) and a random point on the surface;

– D2: the Euclidean distance between two random points on the surface of the model;

– D3: the area of the triangle formed by three random points on the surface of the model;

– D4: the volume of the tetrahedron formed by four random points on the surface of the model.

These descriptors are invariant to translations and rotations; to obtain invariance to scaling, the authors chose to normalize the distributions. Further testing focused on the third function, the Euclidean distance between two random points on the surface of the model, known as D2. This is the best known and most efficient of these descriptors, showing good results when used for pre-classification before the application of a more precise, but more costly, search method.
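As an illustration of the D2 construction, here is a minimal Python sketch (not the authors' original implementation): point pairs are sampled uniformly over the mesh surface, area-weighted over the triangles, and a normalized distance histogram is built. The function names, pair count, bin count and histogram range are illustrative choices.

```python
import numpy as np

def sample_surface(vertices, faces, n_samples, rng):
    """Sample points uniformly on a triangle mesh (area-weighted)."""
    tri = vertices[faces]                          # (F, 3, 3) triangle corners
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)    # triangle areas
    idx = rng.choice(len(faces), size=n_samples, p=areas / areas.sum())
    r1, r2 = rng.random(n_samples), rng.random(n_samples)
    s = np.sqrt(r1)                                # uniform barycentric coordinates
    w = np.stack([1.0 - s, s * (1.0 - r2), s * r2], axis=1)
    return np.einsum('ij,ijk->ik', w, tri[idx])

def d2_descriptor(vertices, faces, n_pairs=100_000, n_bins=64, seed=0):
    """D2 shape distribution: normalized histogram of Euclidean distances
    between random surface point pairs, as in [OSA 02]."""
    rng = np.random.default_rng(seed)
    p = sample_surface(vertices, faces, n_pairs, rng)
    q = sample_surface(vertices, faces, n_pairs, rng)
    d = np.linalg.norm(p - q, axis=1)
    d /= d.mean()                                  # normalization -> scale invariance
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, 3.0))
    return hist / hist.sum()

# Two models are then compared through their distributions, e.g. with an L1 distance:
# dist = np.abs(d2_descriptor(v1, f1) - d2_descriptor(v2, f2)).sum()
```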

18.3.2. Spherical harmonics

Spherical harmonics constitute a 3D shape descriptor based on the frequency decomposition of a 3D signal. This descriptor emerged from the work of the Leipzig team, who applied a Fourier transform on the sphere S2 [SAU 01], using the spherical harmonic formulas proposed in [HEA 96]. To address the problem of rotation dependence, the Princeton team then proposed applying spherical harmonic decomposition to the spherical functions defined by the intersection of the surface of the 3D object with a set of concentric spheres [KAZ 02].
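The rotation-invariance idea can be illustrated as follows: the energy contained in each spherical-harmonic degree of a spherical function is unchanged by rotation, so these per-degree energies can serve as a descriptor. Below is a minimal sketch, assuming the spherical function (for example, the radial extent of the object in each direction, or an intersection indicator as in [KAZ 02]) has already been sampled on a regular grid; the function name and quadrature scheme are illustrative.

```python
import numpy as np
from scipy.special import sph_harm

def sh_energy_signature(f_grid, l_max=8):
    """Rotation-invariant signature of a spherical function sampled on a
    regular (phi, theta) grid: the energy ||f_l|| in each spherical-harmonic
    degree l.  f_grid has shape (n_phi, n_theta); phi = polar angle in
    [0, pi], theta = azimuth in [0, 2*pi]."""
    n_phi, n_theta = f_grid.shape
    phi = (np.arange(n_phi) + 0.5) * np.pi / n_phi
    theta = (np.arange(n_theta) + 0.5) * 2 * np.pi / n_theta
    T, P = np.meshgrid(theta, phi)                 # both (n_phi, n_theta)
    w = np.sin(P) * (np.pi / n_phi) * (2 * np.pi / n_theta)  # quadrature weights
    energies = []
    for l in range(l_max + 1):
        e = 0.0
        for m in range(-l, l + 1):
            Y = sph_harm(m, l, T, P)               # scipy: azimuth first, polar second
            coeff = np.sum(f_grid * np.conj(Y) * w)  # c_lm = <f, Y_lm>
            e += abs(coeff) ** 2
        energies.append(np.sqrt(e))
    return np.array(energies)                      # one energy per degree l
```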

18.4. 2D view oriented methods

The general idea behind view-oriented methods is that if two objects are similar, their views from all viewpoints will also be similar. Since two objects with similar shapes have similar sets of extracted views, these methods can be used to retrieve 3D objects based on a 2D request, or using a 3D request model from which characteristic views are extracted. Two important issues must be addressed: the choice of views used to represent the 3D object, and the distance used to compare two 3D objects based on these views.

Chen et al. [CHE 03] proposed the light field descriptor method, where a 3D object is characterized by 10 sets of 10 orthographic views taken from the vertices of a dodecahedron centered on the object (opposite vertices produce identical silhouettes, so 10 views suffice per set). These views are characterized by Zernike moments and Fourier coefficients and may then be compared using a Euclidean distance. Filali Ansary et al. [FIL 07] suggested determining the number of views as a function of the complexity of the 3D object using an information criterion; the problem of comparing two objects based on their views is then formalized in a Bayesian framework. Figure 18.2 shows an example of results obtained using this method [FIL 07]. The image in the top left of the figure shows a 2D view of a 3D model of a human being; the figure shows the first 15 results provided by the search engine.

Figure 18.2. Example of results obtained using the 2D view oriented approach developed by the MIIRE team [FIL 07]

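As a simplified stand-in for the view-comparison step (the methods cited above use rotation alignment [CHE 03] or a Bayesian formulation [FIL 07], neither of which is reproduced here), two view sets can be compared by an optimal one-to-one matching of their view descriptors:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def view_set_distance(views_a, views_b):
    """Distance between two objects from their 2D-view descriptors
    (one row per view, e.g. Zernike moments + Fourier coefficients):
    optimal one-to-one matching of views, summing Euclidean distances."""
    cost = cdist(views_a, views_b)               # pairwise view distances
    rows, cols = linear_sum_assignment(cost)     # best view correspondence
    return cost[rows, cols].sum()
```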

18.5. Local 3D shape descriptors

Unlike global descriptors, which tend to describe the shape of 3D objects in a global manner, local descriptors take account of the local geometric characteristics of the shape of these objects. In this section, we will present the main local descriptors used for the indexing and similarity searching of 3D objects.

18.5.1. 3D shape spectrum descriptor

The 3D shape spectrum descriptor (3D-SSD) put forward by Zaharia and Prêteux [ZAH 03] is defined as the distribution of the shape index across all facets of the mesh. The shape index is a well-known differential geometry criterion (see [KOE 90]), defined as follows:

Let p be a point on the surface S. The shape index I_p is defined for any point p by:

[18.1] I_p = \frac{1}{2} - \frac{1}{\pi} \arctan\left( \frac{k_p^1 + k_p^2}{k_p^1 - k_p^2} \right)

where k_p^1 and k_p^2 (k_p^1 ≥ k_p^2) are the principal curvatures of the point p on the surface S. The shape index I_p allows us to group different local geometric forms into families of elementary surfaces such as cylinders, spheres, saddles, etc. While initially used as a global descriptor, the shape spectrum has also been used as a local descriptor to characterize surface details in 3D objects.
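As an illustration, here is a minimal numpy sketch of [18.1] and of the resulting area-weighted spectrum; it assumes per-facet principal curvatures and facet areas have already been estimated on the mesh (curvature estimation is not shown), and the bin count is an arbitrary choice.

```python
import numpy as np

def shape_index(k1, k2, eps=1e-12):
    """Shape index of equation [18.1] from principal curvatures (k1 >= k2).
    Maps local surface types (spherical, cylindrical, saddle-like, ...)
    into [0, 1]; undefined on planar patches where k1 == k2 == 0."""
    k1, k2 = np.maximum(k1, k2), np.minimum(k1, k2)  # enforce k1 >= k2
    return 0.5 - np.arctan2(k1 + k2, k1 - k2 + eps) / np.pi

def shape_spectrum(k1, k2, areas, n_bins=100):
    """3D-SSD: area-weighted histogram of the shape index over all facets."""
    si = shape_index(k1, k2)
    hist, _ = np.histogram(si, bins=n_bins, range=(0.0, 1.0), weights=areas)
    return hist / hist.sum()
```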

18.5.2. 3D shape contexts

Shape contexts were initially introduced for object recognition in 2D images; Körtgen et al. [KÖR 03] extended them to 3D. Their idea was to sample N points, uniformly distributed across the surface of the 3D object, to which distributions characterizing the shape of the object are associated. Each point is associated with a histogram of its distances to the other N – 1 points. To preserve a notion of spatial coherence, the authors suggest partitioning the space into C cells according to three classic types of decomposition; each point is localized by the cell in which it lies. Depending on the type of representation chosen (shell, sector or combined), a normalization of the object pose may or may not be required. Since the shape context descriptor is very large, its size is reduced using vector quantization techniques before comparison.
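A minimal sketch of the shell variant, which requires no pose normalization, follows; the number of shells and the log-spaced radii are illustrative choices, not those of [KÖR 03].

```python
import numpy as np

def shape_context_shells(points, n_shells=16):
    """Shell-type 3D shape context: for each of the N sampled surface
    points, a histogram of its distances to the other N - 1 points,
    binned into concentric shells around the point."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.nan)                   # exclude the point itself
    d_max = np.nanmax(d)
    # Log-spaced shell radii make the descriptor more sensitive nearby.
    edges = np.logspace(np.log10(d_max / 100), np.log10(d_max), n_shells + 1)
    ctx = np.stack([np.histogram(row[~np.isnan(row)], bins=edges)[0]
                    for row in d])
    return ctx / ctx.sum(axis=1, keepdims=True)   # (N, n_shells)
```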

18.5.3. Spin images

Spin images were proposed by Johnson and Hebert [JOH 99] for 3D object recognition. The spin image representation captures the global shape characteristics of an object while remaining robust to occlusion. Spin images are classified as local descriptors, as they may be calculated at local characteristic points. For each characteristic point p of a mesh, a spin image is constructed by projecting the other points of the mesh into a 2D coordinate system attached to p, defined by the tangent plane and the surface normal at p. The size of the spin image is chosen arbitrarily. The similarity between two 3D objects may then be calculated using the correspondence minimizing the sum of the distances between spin images.
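The cylindrical (alpha, beta) coordinates underlying spin images can be sketched as follows; the image size and support radius are arbitrary choices, as noted above, and the function name is illustrative.

```python
import numpy as np

def spin_image(points, p, n, size=16, support=1.0):
    """Spin image at an oriented point (p, n): each surface point x maps to
    (alpha, beta) = (radial distance from the normal line through p,
    signed height above the tangent plane at p), then binned into a 2D
    histogram. Spinning around n discards the rotation about the normal."""
    v = points - p
    beta = v @ n                                   # signed height along the normal
    alpha = np.sqrt(np.maximum(np.sum(v * v, axis=1) - beta**2, 0.0))
    keep = (alpha <= support) & (np.abs(beta) <= support)
    img, _, _ = np.histogram2d(alpha[keep], beta[keep], bins=size,
                               range=[[0.0, support], [-support, support]])
    return img / max(img.sum(), 1.0)               # normalized (size, size) image
```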

18.5.4. Heat kernel signature

The heat kernel signature (HKS) is a descriptor used to analyze the shape of deformable 3D objects. It belongs to the group of spectral analysis methods for 3D shapes, and may be calculated over a local region or the whole surface of a 3D object.

The HKS was introduced by Sun et al. [SUN 09] in 2009 and is based on the heat kernel, a fundamental solution to the heat equation, which itself is based on the concept of heat diffusion across a surface. Given an initial heat distribution f0(p) over a surface, the heat kernel ht(p, q) describes the quantity of heat transferred between p and q after a time t. This description is invariant to isometric transformations. The HKS represents the geometric properties of a surface as a function of time t. For a low value of t, the HKS is considered to be a local descriptor; however, as t increases, the HKS increasingly characterizes the global geometric properties of the shape.

As ht(p, q) is defined over pairs of points and a temporal domain, heat kernels in this form are complex to use as a 3D descriptor. For this reason, Sun et al. [SUN 09] concentrated on the temporal domain alone, considering only ht(p, p). The HKS inherits most properties of heat kernels, under certain conditions.

The heat diffusion equation for a compact Riemannian manifold M (which may include a border) is given by:

[18.2] \Delta f(p, t) = -\frac{\partial f(p, t)}{\partial t}

where ∆ is the Laplace–Beltrami operator and f (p, t) is the heat distribution at a point p at instant t. The solution to this equation may be expressed as follows [SUN 09]:

[18.3] f(p, t) = \int_M h_t(p, q) \, f_0(q) \, dq

The eigendecomposition of the heat kernel is expressed as follows:

[18.4] h_t(p, q) = \sum_{i=0}^{\infty} e^{-\lambda_i t} \, \phi_i(p) \, \phi_i(q)

where λi and φi are the ith eigenvalue and eigenfunction of ∆.

The heat kernel entirely characterizes the surface up to isometry: for any surjective map T: M → N between two Riemannian manifolds M and N, if ht(p, q) = ht(T(p), T(q)) for all p, q and all t > 0, then T is an isometry, and vice versa. For the concise descriptor proposed by Sun et al. [SUN 09], the calculation of the HKS is restricted to the temporal domain:

[18.5] HKS(p, t) = h_t(p, p) = \sum_{i=0}^{\infty} e^{-\lambda_i t} \, \phi_i(p)^2
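Equation [18.5] translates directly into code once a discrete Laplace–Beltrami operator is available. The sketch below assumes a stiffness matrix L and a (lumped) mass matrix A from, e.g., a cotangent discretization produced elsewhere; the matrix names and the eigenpair count k are assumptions of this example.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def heat_kernel_signature(L, A, times, k=100):
    """HKS of equation [18.5]: solve the generalized eigenproblem
    L phi = lambda A phi for the k smallest eigenpairs of the discrete
    Laplace-Beltrami operator, then HKS(p, t) = sum_i exp(-lambda_i t) phi_i(p)^2."""
    # sigma slightly below 0: shift-invert targets the smallest eigenvalues
    # (L is singular, so sigma = 0 itself cannot be factorized).
    lam, phi = eigsh(L, k=k, M=A, sigma=-1e-8, which='LM')
    return (phi ** 2) @ np.exp(-np.outer(lam, np.asarray(times)))  # (n_vertices, n_times)
```

For small values of t each row behaves as a local descriptor, as noted above; sampling the times on a logarithmic scale is a common choice.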

Authors such as El Khoury et al. [ELK 12] have also used the HKS with various modifications: the diffusion distance is used to calculate characteristic points on the 3D model, then the commute-time distance is used to calculate local descriptors in relation to characteristic points.

18.6. Similarity between 3D shapes

In this section, we will present two approaches used to calculate the similarity between 3D shapes: Reeb graphs and the bag-of-words method.

18.6.1. Reeb graphs

Structural approaches consist of representing a 3D object in the form of a graph. This representation allows us to code information linked to the general structure of 3D objects. As a graph characterizes the global structure of a 3D object, it may be used as a powerful tool in searching for and comparing 3D shapes. However, structural representation loses its discriminatory strength when the objects being compared undergo certain topological transformations. In this section, we will present a number of structural approaches, particularly those based on 3D object segmentation and those using Reeb graphs.

Funkhouser et al. [FUN 04] suggest segmenting objects to enable partial retrieval. Gal and Cohen-Or [GAL 06] used salient characteristic points to extract a signature characterizing 3D objects. Points are selected based on the variance of curvature in their neighborhood, and each resulting region is described by a local descriptor. The authors used hashing techniques to accelerate the matching of similar regions.

Based on Morse theory, Reeb graphs [REE 46] capture the topology of 3D objects. This type of graph has been used for 3D retrieval and indexing by a number of authors. In [HIL 01], for example, Hilaga et al. propose the use of multiresolution Reeb graphs to match 3D objects. The multiresolution Reeb graph represents the skeleton and topological outline of a 3D shape at different resolution levels. The similarity between shapes is calculated using a coarse-to-fine strategy that preserves the coherence of the graph structures, thus establishing correspondences between parts of objects. This type of representation can be adapted to the size of the database used: it may be limited to low resolution levels for large databases, or extended to higher resolution levels depending on the desired processing time and level of detail.

DEFINITION 18.1 (REEB GRAPH).– Let f: M → ℝ be a Morse function defined on a compact manifold M. The Reeb graph R(f) is the quotient space M/∼ defined by the equivalence relationship p_1 ∼ p_2 if and only if f(p_1) = f(p_2) and p_1, p_2 belong to the same connected component of f^{-1}(f(p_1)).
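A coarse discrete analog of this definition can be computed on a mesh by binning f values and taking connected components per bin, in the multiresolution spirit of [HIL 01]. This is a simplified sketch under those assumptions, not the algorithm of any of the cited papers.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def discrete_reeb_graph(f, edges, n_levels=32):
    """Coarse discrete Reeb graph of a scalar (Morse-like) function f
    sampled at mesh vertices. f: (n,) values; edges: (E, 2) vertex index
    pairs of the mesh. Nodes = connected components of each level-set bin;
    two nodes are linked when a mesh edge crosses between bins."""
    n = len(f)
    bins = np.minimum((n_levels * (f - f.min()) / (np.ptp(f) + 1e-12)).astype(int),
                      n_levels - 1)
    same = bins[edges[:, 0]] == bins[edges[:, 1]]
    e = edges[same]
    # Keeping only same-bin edges means no component can span two bins.
    adj = coo_matrix((np.ones(len(e)), (e[:, 0], e[:, 1])), shape=(n, n))
    n_nodes, node_of_vertex = connected_components(adj, directed=False)
    # Mesh edges crossing between bins become edges of the Reeb graph.
    cross = edges[~same]
    reeb_edges = {(node_of_vertex[a], node_of_vertex[b]) for a, b in cross}
    return n_nodes, node_of_vertex, reeb_edges
```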

Figure 18.3 shows an example of a Reeb graph obtained for the height function f. Biasotti et al. [BIA 03] propose a graph matching method based on the propagation of matched subgraphs. The method has quadratic complexity and uses extended Reeb graphs (ERGs), which are oriented and carry edge information.

Figure 18.3. Example of a Reeb graph obtained using the Morse height function


Tierny et al. [TIE 09] propose the use of Reeb graphs to extract signatures of 3D objects. A Reeb graph of the input surface is first calculated in order to segment it into charts of controlled topology, known as Reeb charts, each with the topology of either a disk or a ring. For each Reeb chart, a mapping onto a canonical planar domain is defined. The stretch signature of each mapping is calculated, based on an estimation of area distortion; the input surface is then represented by the set of stretch signatures. The authors also propose an application of these signatures to similarity evaluation that is insensitive to pose and to partial transformations, obtained by comparing the signatures of different Reeb charts.

18.6.2. Bag-of-words

Bag-of-words representation is widely used in information retrieval. It was first used to index text documents, then for the representation of 2D images and, more recently, for 3D objects. In bag-of-words–based approaches, a document (text, image or 3D object) is described using a dictionary; in the case of 3D objects, we generally talk about visual words or elementary shapes. These methods involve several stages. First, invariant local descriptors are calculated from a local description of the 3D model. A quantization method is then generally used to group these descriptors into classes; the class representatives form the dictionary of 3D visual words. Finally, a histogram of these words is constructed: a given 3D object is represented by the histogram of occurrences of the visual words it contains. We thus obtain a vector of the same size as the dictionary, whose component i indicates the number of occurrences of the ith dictionary word in the object. The construction of the dictionary is therefore a critical stage in determining the performance of systems using these representations.
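The dictionary and histogram stages might be sketched as follows, with k-means standing in for whichever quantization method is used; the function names and word count are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def build_dictionary(all_local_descriptors, n_words=64):
    """Visual dictionary: cluster the local descriptors of the whole
    collection; the n_words cluster centers are the 'visual words'."""
    words, _ = kmeans2(all_local_descriptors, n_words, minit='++')
    return words

def bag_of_words(local_descriptors, words):
    """Represent one 3D object: assign each local descriptor to its
    nearest visual word and count occurrences (normalized histogram)."""
    idx, _ = vq(local_descriptors, words)          # nearest-word assignment
    hist = np.bincount(idx, minlength=len(words)).astype(float)
    return hist / hist.sum()
```

Two objects are then compared through their word histograms, for instance with an L1 or chi-squared distance.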

Several authors [BRO 11, LAV 12, TAB 12] have used this approach for indexing and establishing correspondences between 3D objects. Bag-of-words–based representation is well suited to searching large databases. Figure 18.4 summarizes the different stages involved in the bag-of-words approach. In addition, Tabia et al. [TAB 11] introduced the notion of the significance of the 3D parts to be matched. The authors initially consider each part as a source of information on a 3D object. Next, each information source is weighted by its significance. Finally, the information sources relating to all parts of the 3D object are combined in a belief function framework to produce a final decision concerning the shape of the object. Bronstein et al. [BRO 09] used distributions of geodesic distances on the object surface.

Figure 18.4. Bag-of-words approach. A set of characteristic points is first extracted from the 3D object. Next, a descriptor is calculated for each characteristic point; each descriptor is then represented by the closest visual word in the shape dictionary. Finally, a histogram is calculated showing the number of occurrences of each visual word.


18.7. Shape recognition in 3D video

In recent years, new research activities have emerged concerning the retrieval of actions and movements and the recognition of facial expressions in 3D video sequences. These activities have been facilitated by the arrival of sensors able to produce very high-resolution 3D sequences. The availability of dynamic model databases, as shown in Table 18.1, has also greatly contributed to the development of research in this domain.

Table 18.1. Dynamic 3D databases (3D+time)

Base                   Context of use
IXMAS [WEI 06]         3D actions
i3DPost [GKA 09]       3D actions and dynamic facial expressions
BU-4DFE [YIN 08]       Dynamic facial expressions
Hi4D-ADSIP [MAT 12]    Dynamic facial expressions

18.7.1. Action recognition in 3D videos

Given a video sequence, the aim is to find 3D models in similar poses or, more generally, sequences or subsequences containing similar actions in a video database. A 3D video is a sequence of frames, each consisting of a mesh (geometry and connectivity), in this case of a human in a given pose. Figure 18.5 shows examples of frames (meshes) taken from a 3D video sequence.

Figure 18.5. Examples of 3D meshes extracted from a 3D video sequence


Huang et al. [HUA 10] consider searches for models with similar shapes and poses in temporal sequences of moving human beings. The aim of the study was to discriminate between different poses of the same object, rather than to distinguish different classes of objects. Static shape descriptors, such as shape distributions, spin images and spherical harmonics, were evaluated and compared using synthetic and real databases; the best performances were obtained using shape distributions. These static descriptors were then extended into temporal descriptors by convolution with a time filter, which takes into account each frame and its neighborhood within a fixed window. The authors showed that the use of this filter improves results when identifying similar frames in 3D videos.
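The temporal extension by convolution can be sketched as a per-dimension filtering of the frame descriptors; the box filter below is an illustrative choice of time window, not necessarily the filter used in [HUA 10].

```python
import numpy as np

def temporal_descriptor(frame_descriptors, window=5):
    """Extend a static per-frame shape descriptor into a temporal one by
    convolving each descriptor dimension with a fixed time window, so a
    frame is described together with its temporal neighborhood."""
    kernel = np.ones(window) / window              # simple box filter in time
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode='same'),
        axis=0, arr=frame_descriptors)             # (n_frames, d) -> (n_frames, d)
```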

Tung et al. [TUN 12] used Reeb graphs to analyze 3D videos. The video consists of meshes of moving people, and each frame is represented by a Reeb graph of its topology. The similarity of two frames is then estimated by matching the nodes of these graphs. The authors proposed a topological dictionary for coding and describing the content of a 3D video. Their model is based on description and topological classification using multiresolution Reeb graphs, and a Markov motion graph to model state changes in topology.

Slama et al. [SLA 13] proposed a new descriptor, known as the extremal human curves (EHC) descriptor, extracted from the mesh of a 3D model. First, the extremities of the body are detected; then the shortest paths along the mesh connecting two extremities are identified, exploiting their invariance to changes in pose and topology. The result is a collection of curves representing the pose of the body. To compare two poses, the authors compare the corresponding curves of two models in a framework based on Riemannian geometry. The curve comparison estimates the deformation and produces a similarity score for models in a video sequence, or in a database of human body models with pose variations.

18.7.2. Facial expression recognition in 3D videos

The analysis of 3D facial expressions has applications in a number of domains, such as user–machine interfaces, facial animation and, more generally, “ambient intelligence”. The problem consists of analyzing the deformations of the frames (meshes) in a sequence of faces, encoded as a deformation vector; a classifier is then used to assign this vector to one of six classes (the six basic facial expressions). In [DRI 12], the authors proposed such a vector, known as the deformation vector field (DVF), together with the Random Forest algorithm for classifying facial expressions.

18.8. Evaluation of the performance of indexing methods

The evaluation of the performance of indexing methods is an important element to take into account in choosing an approach.

3D model indexing and retrieval are two distinct steps, as discussed in section 18.2. Indexing constitutes the offline stage; it is carried out only once, and its performance depends solely on the choice of descriptor(s). This choice affects retrieval performance, i.e. what happens when a user expresses a request and the online stage begins.

Our attention, therefore, must be focused on the online stage. Here, the term “performance” refers to two aspects of any search engine: the request processing time, i.e. the speed at which responses are provided, and, especially, the relevance of the results produced in response to a request. To judge the relevance of results during test phases, a collection of ground truth data is required: the 3D models of the collection are grouped into several classes by experts. Each 3D model in the collection is then used as a request submitted to the search engine, and the relevance of the results is evaluated: a returned model that falls into the same class as the request is judged relevant, while responses outside this class are considered irrelevant.

18.8.1. Statistical evaluation tools

Statistical evaluation tools are available for use in evaluating any search engine. Examples include:

Nearest Neighbor (NN): evaluates the closest neighbor produced by the search engine; it is simply the percentage of requests for which the closest match belongs to the same class as the request.

First Tier (FT) and Second Tier (ST): percentage of models of the same class as the request appearing in the first K results. K depends on the size of the class in question: for the first tier, K is the class size (excluding the request); for the second tier, K is twice this number.

Recall (R) and Precision (P): defined as R = N/Q and P = N/A, where N is the number of relevant results among the first A results returned, and Q is the number of relevant models in the whole collection, i.e. the number of models in the class to which the request belongs.

Recall versus Precision plot: precision is often plotted as a function of recall. The recall versus precision plot is a useful visual tool, offering a global or class-specific view of the performance of a retrieval method.

Other tools are also available, such as ROC curves, E-Measure and Discounted Cumulative Gain. More details are given in [SHI 04], for example.
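For concreteness, here is a minimal sketch computing the NN, FT and ST scores from a precomputed distance matrix and ground-truth class labels; it assumes every class contains at least two models, and the function name is illustrative.

```python
import numpy as np

def nn_ft_st(dist, labels):
    """Nearest Neighbor, First Tier and Second Tier scores, averaged over
    all requests, from an (n, n) distance matrix and class labels."""
    n = len(labels)
    nn = ft = st = 0.0
    for q in range(n):
        order = np.argsort(dist[q])
        order = order[order != q]            # drop the request itself
        rel = labels[order] == labels[q]     # relevance of each ranked result
        c = int(rel.sum())                   # class size minus the request
        nn += rel[0]
        ft += rel[:c].sum() / c              # first tier: K = |class| - 1
        st += rel[:2 * c].sum() / c          # second tier: K = 2(|class| - 1)
    return nn / n, ft / n, st / n

# Precision/recall at a cut-off A for request q:
#   N = rel[:A].sum();  precision = N / A;  recall = N / rel.sum()
```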

18.8.2. Benchmarks

Benchmarks include the work of Princeton University1, presented in the article by Shilane et al. [SHI 04]. This benchmark is made up of 1,814 meshed 3D models from a variety of sources, essentially found on the Internet. The models are grouped into 92 classes, from airplanes to automobiles, including furniture, plants, humanoids, etc. The benchmark also provides standard statistical tools for evaluating and comparing the performance of retrieval methods. Established in 2004–2005, its use is waning; annual international competitions, the Shape Retrieval Contests (SHREC), are now preferred.

The SHREC competition was established in 2006 in the context of a European network of excellence, AIM@SHAPE2, and reflects the scientific community’s growing interest in the retrieval and recognition of 3D models. Calls for proposals are sent out each year, and researchers propose tracks in different domains linked to 3D shape recognition. These have included protein models (2007, 2010), CAD models (2007, 2008), 3D face models (2007, 2008, 2010), partial matching (from 2007 to 2010), 3D retrieval using machine learning (2009, 2010), non-rigid shapes (2010, 2011), architectural models (2010), 3D mesh segmentation (2012), sketch-based 3D shape retrieval (2012), generic 3D model retrieval (every year), etc.

Data collections, performance measurement tools and the results of these contests are increasingly used in articles to present the strengths and weaknesses of new methods. Over the years, there has been growing interest in methods capable of accounting for the partial aspects of 3D models, both in indexed collections and in requests.

18.9. Applications

3D model indexing and recognition techniques have numerous applications, both in purely multimedia and 3D domains and elsewhere. Some of these applications will be explored in this section.

18.9.1. Browsing a collection of 3D models

As we highlighted in section 18.1, the use of simple keywords is not the best way of characterizing a request. 3D model indexing and recognition help users to navigate large collections of data. By specifying the search subject using a drawing, a photo or even a full 3D model, the retrieval system is able to provide the 3D models from the collection that best respond to the request.

Search engines are currently available online, including those developed by Princeton University3 and the MIIRE team at the LIFL4, illustrated in Figure 18.6.

18.9.2. Modeling by example

The most frequent question asked by 3D designers concerns the best way of exploiting the vast collections of existing 3D models. The creation of a new 3D model is a long and costly process. 3D model indexing and recognition can constitute a design assistance tool. A search engine may be used to sort through existing models in their collection, and the results produced may act as a starting point for new creations.

Figure 18.6. Web interface of the MIIRE team’s 3D search engine [FIL 07]


This simple use as a source of inspiration may be extended to a more active reuse of the results turned up by a search engine. Modeling by example, first proposed by Funkhouser et al. [FUN 04], is an approach in which the creation of new 3D models is assisted by the retrieval of existing models. It consists of providing designers with tools to assemble pieces of 3D models in order to create new objects. The indexing used in this application must clearly take account of the partial nature of 3D models in requests and results. To this end, Tierny et al. [TIE 09] propose an approach whereby 3D models are broken down into different parts using Reeb graphs. Figure 18.7 shows an example of a new 3D model created by joining parts of other retrieved models.

18.9.3. Decision-making tools

3D model recognition may also be used in other types of applications.

The automobile industry, for example, produces large numbers of 3D objects, used to precisely model vehicles before production or even full-scale tests. Once the car is on the market, these models are then used to produce illustrated user manuals or technical repair sheets used by technicians in maintaining and repairing the vehicles. 3D model searches using a photo or 3D scan of a mechanical part can assist the technician in finding similar pieces or suitable repair notices for the component. The RNRT SEMANTIC-3D5 project considered the automatic search approach for automobile parts, alongside other issues, in collaboration with Renault.

Figure 18.7. Example modeling. a) Initial request; b) results produced by the search engine; and c) use of parts of results to create a new 3D object


18.9.4. 3D facial recognition

The automatic recognition of human faces from 2D images has undergone major developments in recent years, and several techniques have been proposed. In spite of the results obtained in this domain, robust facial recognition remains very difficult. Current methods are effective when the conditions in which test images are obtained are similar to those used for the images in the learning stage; however, significant variation created by changes in lighting and capture conditions causes serious problems for a large number of current recognition systems. One solution consists of using 3D facial information, which better captures the characteristics of a face in 3D space and provides invariance to lighting and view-capture conditions. Moreover, recent advances in 3D imaging (acquisition tools, modeling software, graphics cards, etc.) have enabled the creation and storage of 3D facial images. Nevertheless, faces cannot be treated as rigid shapes, as they are subject to deformations resulting from facial expressions. Several approaches have recently been proposed to develop algorithms that are robust to such non-rigid transformations [DRI 13, PEI 13].

18.10. Conclusion

In this chapter, we have shown the growing interest in analyzing the shape of 3D objects in recent years, particularly in terms of retrieving objects from large 3D model databases. This is essentially due to the spectacular progress made by 3D sensors, which are now able to produce high-quality static and dynamic 3D models. These models are now used in a variety of applications, such as CAD, special effects for cinema, video games, archeology, bioinformatics, virtual reality and simulation. 3D model retrieval methods draw on various mathematical tools (spectral methods, differential and Riemannian geometry), machine-learning mechanisms and numerous computing aspects (multidimensional data structures, Human-Machine Interfaces (HMIs)). A number of issues remain open, such as 3D object recognition invariant to pose and isometric transformations, and the recognition of partially represented or hidden objects. Red-Green-Blue and Depth (RGB-D) sensors, such as the Microsoft Kinect, open interesting perspectives and new lines of research.

18.11. Bibliography

[BIA 03] BIASOTTI S. et al., “3D shape matching through topological structures”, Discrete Geometry for Computer Imagery, vol. LNCS 2886, Springer-Verlag, pp. 194–203, 2003.

[BRO 09] BRONSTEIN A.M., BRONSTEIN M.M., BRUCKSTEIN A., et al., “Partial similarity of objects, or how to compare a centaur to a horse”, International Journal of Computer Vision, vol. 84, no. 2, pp. 163–183, August 2009.

[BRO 11] BRONSTEIN A.M., BRONSTEIN M.M., GUIBAS L.J., et al., “Shape google: geometric words and expressions for invariant shape retrieval”, ACM Transactions on Graphics, vol. 30, pp. 1–20, 2011.

[CHE 03] CHEN D.-Y., TIAN X.-P., SHEN Y.-T., et al., “On visual similarity based 3D model retrieval”, Computer Graphics Forum, vol. 22, no. 3, pp. 223–232, September 2003.

[DRI 12] DRIRA H., BEN AMOR B., DAOUDI M., et al., “3D dynamic expression recognition based on a novel deformation vector field and random forest”, 21st International Conference on Pattern Recognition, Tsukuba Science City, Japan, November 2012.

[DRI 13] DRIRA H., BEN AMOR B., SRIVASTAVA A., et al., “3D face recognition under expressions, occlusions and pose variations”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2270–2283, September 2013.

[ELK 12] EL KHOURY R., VANDEBORRE J.-P., DAOUDI M., “Indexed heat curves for 3D-model retrieval”, 21st International Conference on Pattern Recognition (ICPR), Tsukuba Science City, Japan, 11–15 November 2012.

[FIL 07] FILALI ANSARY T., DAOUDI M., VANDEBORRE J.-P., “A Bayesian 3D search engine using adaptive views clustering”, IEEE Transactions on Multimedia, vol. 9, no. 1, pp. 78–88, January 2007.

[FUN 04] FUNKHOUSER T., KAZHDAN M., SHILANE P., et al., “Modeling by example”, ACM SIGGRAPH 2004 Papers, pp. 652–663, 2004.

[GAL 06] GAL R., COHEN-OR D., “Salient geometric features for partial shape matching and similarity”, ACM Transactions on Graphics, vol. 25, no. 1, pp. 130–150, January 2006.

[GKA 09] GKALELIS N., KIM H., HILTON A., et al., “The i3DPost multi-view and 3D human action/interaction database”, Proceedings of the 2009 Conference for Visual Media Production (CVMP ’09), IEEE Computer Society, pp. 159–168, 2009.

[HEA 96] HEALY D., JR., ROCKMORE D., KOSTELEC P.J., et al., “FFTs for the 2-sphere – improvements and variations”, The Journal of Fourier Analysis and Applications, vol. 9, pp. 341–385, 1996.

[HIL 01] HILAGA M., SHINAGAWA Y., KOHMURA T., et al., “Topology matching for fully automatic similarity estimation of 3D shapes”, 28th Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’01), pp. 203–212, 2001.

[HUA 10] HUANG P., HILTON A., STARCK J., “Shape similarity for 3D video sequences of people”, International Journal of Computer Vision, vol. 89, no. 2–3, pp. 362–381, September 2010.

[JOH 99] JOHNSON A., HEBERT M., “Using spin images for efficient object recognition in cluttered 3D scenes”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 433–449, May 1999.

[KAZ 02] KAZHDAN M., FUNKHOUSER T., “Harmonic 3D shape matching”, ACM SIGGRAPH 2002 Conference Abstracts and Applications, 2002.

[KOE 90] KOENDERINK J., Solid Shape, Artificial Intelligence, MIT Press, 1990.

[KÖR 03] KÖRTGEN M., NOVOTNI M., KLEIN R., “3D shape matching with 3D shape contexts”, 7th Central European Seminar on Computer Graphics, 2003.

[LAV 12] LAVOUÉ G., “Combination of bag-of-words descriptors for robust partial shape retrieval”, The Visual Computer, vol. 28, no. 9, pp. 931–942, September 2012.

[MAT 12] MATUSZEWSKI B.J., QUAN W., SHARK L.-K., et al., “Hi4D-ADSIP 3-D dynamic facial articulation database”, Image and Vision Computing, vol. 30, no. 10, October 2012.

[OSA 02] OSADA R., FUNKHOUSER T., CHAZELLE B., et al., “Shape distributions”, ACM Transactions on Graphics, vol. 21, no. 4, pp. 807–832, October 2002.

[PEI 13] PEIJIANG L., YUNHONG W., DI H., et al., “Learning the spherical harmonic features for 3-D face recognition”, IEEE Transactions on Image Processing, vol. 22, no. 3, pp. 914–925, 2013.

[REE 46] REEB G., “Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique”, Comptes Rendus Acad. Sciences, vol. 222, pp. 847–849, 1946.

[SAU 01] SAUPE D., VRANIC D.V., “3D model retrieval with spherical harmonics and moments”, Pattern Recognition, 23rd DAGM-Symposium, Munich, Germany, 12–14 September, 2001, Proceedings, Lecture Notes in Computer Science, vol. 2191, Springer, pp. 392–397, 2001.

[SHI 04] SHILANE P., MIN P., KAZHDAN M., et al., “The Princeton shape benchmark”, Shape Modeling International, Genova, Italy, June 2004.

[SLA 13] SLAMA R., WANNOUS H., DAOUDI M., “Extremal human curves: a new human body shape and pose descriptor”, 10th IEEE Conference on Automatic Face and Gesture Recognition, 2013.

[SUN 09] SUN J., OVSJANIKOV M., GUIBAS L., “A concise and provably informative multi-scale signature based on heat diffusion”, Proceedings of the Symposium on Geometry Processing, SGP ’09, 2009.

[TAB 11] TABIA H., DAOUDI M., VANDEBORRE J.-P., et al., “A new 3D-matching method of nonrigid and partially similar models using curve analysis”, IEEE Transaction Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 852–858, April 2011.

[TAB 12] TABIA H., DAOUDI M., COLOT O., et al., “Three-dimensional object retrieval based on vector quantization of invariant descriptors”, SPIE Journal of Electronic Imaging, vol. 21, no. 2, April–June 2012.

[TIE 09] TIERNY J., VANDEBORRE J.-P., DAOUDI M., “Partial 3D shape retrieval by Reeb pattern unfolding”, Computer Graphics Forum - Eurographics Association, vol. 28, no. 1, pp. 41–55, March 2009.

[TUN 12] TUNG T., MATSUYAMA T., “Topology dictionary for 3D video understanding”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, pp. 1645–1657, August 2012.

[WEI 06] WEINLAND D., RONFARD R., BOYER E., “Free viewpoint action recognition using motion history volumes”, Computer Vision and Image Understanding, vol. 104, no. 2, pp. 249–257, November 2006.

[YIN 08] YIN L., CHEN X., SUN Y., et al., “A high-resolution 3D dynamic facial expression database”, Automatic Face and Gesture Recognition, Amsterdam, The Netherlands, September 2008.

[ZAH 03] ZAHARIA T.B., PRÊTEUX F.J., “Descripteurs de forme pour l’indexation de maillages 3D”, Technique et Science Informatiques, vol. 22, no. 9, pp. 1077–1105, 2003.

Chapter written by Jean-Philippe VANDEBORRE, Hedi TABIA and Mohamed DAOUDI.

1 http://shape.cs.princeton.edu/benchmark/.

2 http://www.aimatshape.net/.

3 http://shape.cs.princeton.edu/search.html.

4 http://www-rech.telecom-lille1.eu/3dretrieval/.

5 http://liris.cnrs.fr/semantic-3d/.
