Modeling and Optimization of Parallel and Distributed Embedded Systems
by Sanjay Ranka, Ann Gordon-Ross, Arslan Munir
Contents
Cover
Title Page
Copyright
Dedication
Preface
About This Book
Highlights
Intended Audience
Organization of the Book
Acknowledgment
Part One: Overview
Chapter 1: Introduction
1.1 Embedded Systems Applications
1.2 Characteristics of Embedded Systems Applications
1.3 Embedded Systems—Hardware and Software
1.4 Modeling—An Integral Part of the Embedded Systems Design Flow
1.5 Optimization in Embedded Systems
1.6 Chapter Summary
Chapter 2: Multicore-Based EWSNs—An Example of Parallel and Distributed Embedded Systems
2.1 Multicore Embedded Wireless Sensor Network Architecture
2.2 Multicore Embedded Sensor Node Architecture
2.3 Compute-Intensive Tasks Motivating the Emergence of MCEWSNs
2.4 MCEWSN Application Domains
2.5 Multicore Embedded Sensor Nodes
2.6 Research Challenges and Future Research Directions
2.7 Chapter Summary
Part Two: Modeling
Chapter 3: An Application Metrics Estimation Model for Embedded Wireless Sensor Networks
3.1 Application Metrics Estimation Model
3.2 Experimental Results
3.3 Chapter Summary
Chapter 4: Modeling and Analysis of Fault Detection and Fault Tolerance in Embedded Wireless Sensor Networks
4.1 Related Work
4.2 Fault Diagnosis in WSNs
4.3 Distributed Fault Detection Algorithms
4.4 Fault-Tolerant Markov Models
4.5 Simulation of Distributed Fault Detection Algorithms
4.6 Numerical Results
4.7 Research Challenges and Future Research Directions
4.8 Chapter Summary
Chapter 5: A Queueing Theoretic Approach for Performance Evaluation of Low-Power Multicore-Based Parallel Embedded Systems
5.1 Related Work
5.2 Queueing Network Modeling of Multicore Embedded Architectures
5.3 Queueing Network Model Validation
5.4 Queueing Theoretic Model Insights
5.5 Chapter Summary
Part Three: Optimization
Chapter 6: Optimization Approaches in Distributed Embedded Wireless Sensor Networks
6.1 Architecture-Level Optimizations
6.2 Sensor Node Component-Level Optimizations
6.3 Data Link-Level Medium Access Control Optimizations
6.4 Network-Level Data Dissemination and Routing Protocol Optimizations
6.5 Operating System-Level Optimizations
6.6 Dynamic Optimizations
6.7 Chapter Summary
Chapter 7: High-Performance Energy-Efficient Multicore-Based Parallel Embedded Computing
7.1 Characteristics of Embedded Systems Applications
7.2 Architectural Approaches
7.3 Hardware-Assisted Middleware Approaches
7.4 Software Approaches
7.5 High-Performance Energy-Efficient Multicore Processors
7.6 Challenges and Future Research Directions
7.7 Chapter Summary
Chapter 8: An MDP-Based Dynamic Optimization Methodology for Embedded Wireless Sensor Networks
8.1 Related Work
8.2 MDP-Based Tuning Overview
8.3 Application-Specific Embedded Sensor Node Tuning Formulation as an MDP
8.4 Implementation Guidelines and Complexity
8.5 Model Extensions
8.6 Numerical Results
8.7 Chapter Summary
Chapter 9: Online Algorithms for Dynamic Optimization of Embedded Wireless Sensor Networks
9.1 Related Work
9.2 Dynamic Optimization Methodology
9.3 Experimental Results
9.4 Chapter Summary
Chapter 10: A Lightweight Dynamic Optimization Methodology for Embedded Wireless Sensor Networks
10.1 Related Work
10.2 Dynamic Optimization Methodology
10.3 Algorithms for Dynamic Optimization Methodology
10.4 Experimental Results
10.5 Chapter Summary
Chapter 11: Parallelized Benchmark-Driven Performance Evaluation of Symmetric Multiprocessors and Tiled Multicore Architectures for Parallel Embedded Systems
11.1 Related Work
11.2 Multicore Architectures and Benchmarks
11.3 Parallel Computing Device Metrics
11.4 Results
11.5 Chapter Summary
Chapter 12: High-Performance Optimizations on Tiled Manycore Embedded Systems: A Matrix Multiplication Case Study
12.1 Related Work
12.2 Tiled Manycore Architecture (TMA) Overview
12.3 Parallel Computing Metrics and Matrix Multiplication (MM) Case Study
12.4 Matrix Multiplication Algorithms' Code Snippets for Tilera's TILEPro64
12.5 Performance Optimization on a Manycore Architecture
12.6 Results
12.7 Chapter Summary
Chapter 13: Conclusions
References
Index
End User License Agreement
List of Illustrations
Chapter 2: Multicore-Based EWSNs—An Example of Parallel and Distributed Embedded Systems
Figure 2.1 A heterogeneous multicore embedded wireless sensor network (MCEWSN) architecture
Figure 2.2 Multicore embedded sensor node architecture
Figure 2.3 Omnibus sensor information fusion model for an MCEWSN architecture
Chapter 4: Modeling and Analysis of Fault Detection and Fault Tolerance in Embedded Wireless Sensor Networks
Figure 4.1 Wireless sensor network architecture
Figure 4.2 Byzantine faulty behavior in WSNs
Figure 4.3 Various types of sensor faults [93]: (a) outlier faults; (b) stuck-at faults; (c) noisy faults
Figure 4.4 A non-FT (NFT) sensor node Markov model
Figure 4.5 FT sensor node Markov model [130]
Figure 4.6 WSN cluster Markov model [130]
Figure 4.7 WSN cluster Markov model with three states [130]
Figure 4.8 WSN Markov model [130]
Figure 4.9 The ns-2-based simulation architecture
Figure 4.10 Effectiveness and false positive rate of the Chen algorithm: (a) error detection accuracy for the Chen algorithm; (b) false positive rate of Chen algorithm
Figure 4.11 Effectiveness and false positive rate for the Ding algorithm: (a) error detection accuracy for the Ding algorithm; (b) false positive rate for the Ding algorithm
Figure 4.12 Noise power levels in the Intel Berkeley sample
Figure 4.13 Distribution of constant error occurrences
Figure 4.14 Error detection and false positive rate for the Chen algorithm using real-world data
Figure 4.15 Error detection and false positive rate for the Ding algorithm using real-world data
Figure 4.16 MTTF in days for an NFT and an FT sensor node [130]
Figure 4.17 MTTF in days for an NFT WSN cluster and an FT WSN cluster with […] [130]
Figure 4.18 MTTF in days for an NFT WSN and an FT WSN with […] [130]
Chapter 5: A Queueing Theoretic Approach for Performance Evaluation of Low-Power Multicore-Based Parallel Embedded Systems
Figure 5.1 Queueing network model for the 2P-2L1ID-2L2-1M multicore embedded architecture
Figure 5.2 Queueing network model for the 2P-2L1ID-1L2-1M multicore embedded architecture
Figure 5.3 Queueing network model validation of the response time in ms for mixed workloads for 2P-2L1ID-1L2-1M for a varying number of jobs
Figure 5.4 Flow chart for our queueing network model setup in SHARPE
Figure 5.5 The effects of cache miss rate on response time (ms) for mixed workloads for 2P-2L1ID-2L2-1M for a varying number of jobs: (a) relatively low cache miss rates; (b) relatively high cache miss rates
Figure 5.6 The effects of processor-bound workloads on response time (ms) for 2P-2L1ID-2L2-1M for a varying number of jobs for cache miss rates: L1-I = 0.01, L1-D = 0.13, and L2 = 0.3: (a) processor-bound workloads (processor-to-processor probability […]); (b) processor-bound workloads (processor-to-processor probability […])
Chapter 6: Optimization Approaches in Distributed Embedded Wireless Sensor Networks
Figure 6.1 Embedded wireless sensor network architecture
Figure 6.2 Embedded sensor node architecture with tunable parameters
Figure 6.3 Data aggregation
Figure 6.4 Directed diffusion: (a) interest propagation, (b) initial gradient setup, and (c) data delivery along the reinforced path
Chapter 7: High-Performance Energy-Efficient Multicore-Based Parallel Embedded Computing
Figure 7.1 High-performance energy-efficient parallel embedded computing (HPEPEC) domain
Figure 7.2 Classification of optimization techniques based on embedded application characteristics
Chapter 8: An MDP-Based Dynamic Optimization Methodology for Embedded Wireless Sensor Networks
Figure 8.1 Process diagram for our MDP-based application-oriented dynamic tuning methodology for embedded wireless sensor networks
Figure 8.2 Reward functions: (a) power reward function; (b) throughput reward function; (c) delay reward function
Figure 8.3 Reliability reward functions: (a) linear variation; (b) quadratic variation
Figure 8.4 Symbolic representation of our MDP model with four sensor node states
Figure 8.5 The effects of different discount factors on the expected total discounted reward for a security/defense system.
Figure 8.6 Percentage improvement in expected total discounted reward for […] for a security/defense system as compared to the fixed heuristic policies.
Figure 8.7 The effects of different state transition costs on the expected total discounted reward for a security/defense system.
Figure 8.8 The effects of different reward function weight factors on the expected total discounted reward for a security/defense system.
Figure 8.9 The effects of different discount factors on the expected total discounted reward for a healthcare application.
Figure 8.10 Percentage improvement in expected total discounted reward for […] for a healthcare application as compared to the fixed heuristic policies.
Figure 8.11 The effects of different state transition costs on the expected total discounted reward for a healthcare application.
Figure 8.12 The effects of different reward function weight factors on the expected total discounted reward for a healthcare application.
Figure 8.13 The effects of different discount factors on the expected total discounted reward for an ambient conditions monitoring application.
Figure 8.14 Percentage improvement in expected total discounted reward for […] for an ambient conditions monitoring application as compared to the fixed heuristic policies.
Figure 8.15 The effects of different state transition costs on the expected total discounted reward for an ambient conditions monitoring application.
Figure 8.16 The effects of different reward function weight factors on the expected total discounted reward for an ambient conditions monitoring application.
Chapter 9: Online Algorithms for Dynamic Optimization of Embedded Wireless Sensor Networks
Figure 9.1 Dynamic optimization methodology for distributed EWSNs
Figure 9.2 Lifetime objective function
Figure 9.3 Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for a security/defense system where […]
Figure 9.4 Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for a healthcare application where […]
Figure 9.5 Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for an ambient conditions monitoring application where […]
Figure 9.6 Data memory requirements for exhaustive search, greedy, and simulated annealing algorithms for design space cardinalities of 8, 81, 729, and 46,656
Chapter 10: A Lightweight Dynamic Optimization Methodology for Embedded Wireless Sensor Networks
Figure 10.1 A lightweight dynamic optimization methodology per sensor node for EWSNs
Figure 10.2 Lifetime objective function
Figure 10.3 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a security/defense system where […]
Figure 10.4 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a security/defense system where […]
Figure 10.5 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a healthcare application where […]
Figure 10.6 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a healthcare application where […]
Figure 10.7 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for an ambient conditions monitoring application where […]
Figure 10.8 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for an ambient conditions monitoring application where […]
Figure 10.9 Objective function values normalized to the optimal solution for a varying number of states explored for SA and the greedy algorithms for a security/defense system where […]
Figure 10.10 Objective function values normalized to the optimal solution for a varying number of states explored for SA and greedy algorithms for a security/defense system where […]
Figure 10.11 Objective function values normalized to the optimal solution for a varying number of states explored for SA and greedy algorithms for an ambient conditions monitoring application where […]
Chapter 11: Parallelized Benchmark-Driven Performance Evaluation of Symmetric Multiprocessors and Tiled Multicore Architectures for Parallel Embedded Systems
Figure 11.1 Tilera TILEPro64 processor [352]
Figure 11.2 Performance per watt (MOPS/W) comparison between […] and the TILEPro64 for the information fusion application when […]
Figure 11.3 Performance per watt (MFLOPS/W) comparison between […] and the TILEPro64 for the Gaussian elimination benchmark when […]
Figure 11.4 Performance per watt (MOPS/W) comparison between […] and the TILEPro64 for the integer MM benchmark when […]
Chapter 12: High-Performance Optimizations on Tiled Manycore Embedded Systems: A Matrix Multiplication Case Study
Figure 12.1 Intel's TeraFLOPS research chip (adapted from [365])
Figure 12.2 IBM Cyclops-64 chip (adapted from [364])
Figure 12.3 Tilera's TILEPro64 processor (adapted from [352])
Figure 12.4 Feedback-based optimization workflow
Figure 12.5 The impact of high-performance optimizations on the execution time of MM on the TILEPro64 ([…] denotes corresponds to)
Figure 12.6 The impact of high-performance optimizations on the performance per watt of MM on the TILEPro64 ([…] denotes corresponds to)
List of Tables
Chapter 3: An Application Metrics Estimation Model for Embedded Wireless Sensor Networks
Table 3.1 Crossbow IRIS mote platform hardware specifications
Chapter 4: Modeling and Analysis of Fault Detection and Fault Tolerance in Embedded Wireless Sensor Networks
Table 4.1 Summary of notations used in the DFD algorithms
Table 4.2 Summary of notations used in our Markov models
Table 4.3 Estimated values for a fault detection algorithm's accuracy
Table 4.4 Reliability for an NFT and an FT sensor node
Table 4.5 Percentage MTTF improvement for an FT sensor node as compared to an NFT sensor node
Table 4.6 Reliability for an NFT and an FT WSN cluster when […] ([…])
Table 4.7 Percentage MTTF improvement for an FT WSN cluster as compared to an NFT WSN cluster ([…])
Table 4.8 Iso-MTTF for WSN clusters ([…]). […] denotes the redundant sensor nodes required by an NFT WSN cluster to achieve an MTTF comparable to that of an FT WSN cluster
Table 4.9 Reliability for an NFT WSN and an FT WSN when […] ([…])
Table 4.10 Iso-MTTF for WSNs ([…]). […] denotes the redundant WSN clusters required by an NFT WSN to achieve an MTTF comparable to that of an FT WSN
Chapter 5: A Queueing Theoretic Approach for Performance Evaluation of Low-Power Multicore-Based Parallel Embedded Systems
Table 5.1 Multicore embedded architectures with varying processor cores and cache configurations
Table 5.2 Queueing network model probabilities for SPLASH-2 benchmarks for 2P-2L1ID-2L2-1M
Table 5.3 Queueing network model probabilities for SPLASH-2 benchmarks for 2P-2L1ID-1L2-1M
Table 5.4 Execution time comparison of the SPLASH-2 benchmarks on SESC for multicore architectures
Table 5.5 Dual-core architecture evaluation ([…]) on SESC and […] (our queueing theoretic model) based on the SPLASH-2 benchmarks
Table 5.6 Execution time and speedup comparison of our queueing theoretic models versus SESC. […] denotes the execution time required for simulating an […]-core architecture using […] where […] ([…] denotes our queueing theoretic model)
Table 5.7 Area and power consumption of architectural elements for two-core embedded architectures
Table 5.8 Area and power consumption of architectural elements for four-core embedded architectures
Table 5.9 Area and power consumption for multicore architectures
Chapter 6: Optimization Approaches in Distributed Embedded Wireless Sensor Networks
Table 6.1 EWSN optimizations at different design levels
Chapter 7: High-Performance Energy-Efficient Multicore-Based Parallel Embedded Computing
Table 7.1 Top Green500 and Top500 supercomputers as of November 2014 [233, 234]
Table 7.2 High-performance energy-efficient multicore processors
Table 7.3 High-performance energy-efficient embedded computing (HPEEC) challenges
Chapter 8: An MDP-Based Dynamic Optimization Methodology for Embedded Wireless Sensor Networks
Table 8.1 Summary of MDP notations
Table 8.2 Parameters for wireless sensor node state […] ([…] is specified in volts, […] in MHz, and […] in kHz)
Table 8.3 Minimum and maximum reward function parameter values and application metric weight factors for a security/defense system, health care, and ambient conditions monitoring application
Table 8.4 The effects of different discount factors for a security/defense system
Chapter 9: Online Algorithms for Dynamic Optimization of Embedded Wireless Sensor Networks
Table 9.1 Desirable minimum, desirable maximum, acceptable minimum, and acceptable maximum objective function parameter values for a security/defense system, health care, and an ambient conditions monitoring application
Chapter 10: A Lightweight Dynamic Optimization Methodology for Embedded Wireless Sensor Networks
Table 10.1 Crossbow IRIS mote platform hardware specifications
Table 10.2 Desirable minimum, desirable maximum, acceptable minimum, and acceptable maximum objective function parameter values for a security/defense (defense) system, health care, and an ambient conditions monitoring application
Table 10.3 Percentage improvements attained by one-shot ([…]) over other initial parameter settings for […] and […]
Table 10.4 Greedy algorithms with different parameter arrangements and exploration orders
Table 10.5 Energy consumption for the one-shot and the improvement mode for our dynamic optimization methodology
Chapter 11: Parallelized Benchmark-Driven Performance Evaluation of Symmetric Multiprocessors and Tiled Multicore Architectures for Parallel Embedded Systems
Table 11.1 Parallel computing device metrics for the multicore architectures (Intel's Xeon E5430 refers to the Xeon quad-core chip on […])
Table 11.2 Performance results for the information fusion application for […] when […]
Table 11.3 Performance results for the Gaussian elimination benchmark for […]
Table 11.4 Performance results for the EP benchmark for […]
Table 11.5 Performance results for the MM benchmark for […]
Table 11.6 Performance results for the information fusion application for the TILEPro64 when […]
Table 11.7 Performance results for the Gaussian elimination benchmark for the TILEPro64
Table 11.8 Performance results for the EP benchmark for the TILEPro64
Chapter 12: High-Performance Optimizations on Tiled Manycore Embedded Systems: A Matrix Multiplication Case Study
Table 12.1 Theoretical peak performance and power consumption of Intel's TeraFLOPS research chip [368, 385]
Table 12.2 Performance and performance per watt optimization notations used in our MM case study
Table 12.3 Performance and performance per watt of a blocked (B) and a non-blocked (NB) MM algorithm on a single tile of the TILEPro64 for different matrix sizes
Table 12.4 Performance of the blocked MM algorithm on a single tile of the TILEPro64 for different matrix sizes, block sizes, and sub-block sizes
Table 12.5 Compiler directives-based optimizations with compiler optimization level -O2 for the non-blocked MM algorithm
Table 12.6 Performance of the blocked MM algorithm on a single tile of the TILEPro64 using feedback-based optimizations for different matrix sizes
Table 12.7 Performance per watt of the blocked MM algorithm on a single tile of the TILEPro64 using feedback-based optimizations for different matrix sizes
Table 12.8 Performance of the blocked MM algorithm on a single tile of the TILE64 using feedback-based optimizations for different matrix sizes
Table 12.9 Performance per watt of the blocked MM algorithm on a single tile of the TILE64 using feedback-based optimizations for different matrix sizes
Table 12.10 Performance of the parallelized blocked MM algorithm for a different number of tiles for the TILEPro64 for different matrix sizes, block sizes, and sub-block sizes. […] denotes the speedup using […] tiles
Table 12.11 Performance per watt of the parallelized blocked MM algorithm for a different number of tiles for the TILEPro64 for different matrix sizes, block sizes, and sub-block sizes
Table 12.12 Performance and performance per watt for a parallelized non-blocked (NB) and a parallelized blocked (B) MM algorithm for different number of tiles for the TILEPro64 for different matrix sizes. […] denotes the speedup for the blocked MM algorithm
Table 12.13 Performance of a parallelized blocked Cannon's algorithm for MM for a different number of tiles for the TILEPro64 for different matrix sizes, block sizes, and sub-block sizes
Table 12.14 Performance per watt of a parallelized blocked Cannon's algorithm for MM for a different number of tiles for the TILEPro64 for different matrix sizes, block sizes, and sub-block sizes
Table 12.15 Execution time of the parallelized blocked MM algorithm for a different number of tiles for the TILEPro64 for different matrix sizes
Table 12.16 The impact of high-performance optimizations on the performance per watt of MM on the TILEPro64 ([…] denotes corresponds to)
Table 12.17 Performance per watt of the parallelized blocked MM algorithm for a different number of tiles for the TILEPro64 for different matrix sizes