References

Adve S.V, Gharachorloo K. Shared memory consistency models: A tutorial. IEEE Computer. 1996;29(12):66-76. (December)

Adve S.V, Hill M.D. Weak ordering—a new definition. May 28–31, 1990, Seattle, Wash. Proc. 17th Annual Int’l. Symposium on Computer Architecture (ISCA). 1990:2-14.

Agarwal, A. [1987]. “Analysis of Cache Performance for Operating Systems and Multiprogramming,” Ph.D. thesis, Tech. Rep. No. CSL-TR-87-332, Stanford University, Palo Alto, Calif.

Agarwal A. Limits on interconnection network performance. IEEE Trans. on Parallel and Distributed Systems. 1991;2(4):398-412. (April)

Agarwal A., Pudar S.D. Column-associative caches: A technique for reducing the miss rate of direct-mapped caches. May 16–19, 1993, San Diego, Calif. 20th Annual Int’l. Symposium on Computer Architecture (ISCA), 1993. Also appears in Computer Architecture News. 1993;21(2):179-190. (May)

Agarwal A., Bianchini R., Chaiken D., Johnson K., Kranz D. The MIT Alewife machine: Architecture and performance. Denver, Colo. Int’l. Symposium on Computer Architecture. 1995:2-13. (June)

Agarwal A., Hennessy J.L, Simoni R., Horowitz M.A. An evaluation of directory schemes for cache coherence. Proc. 15th Int’l. Symposium on Computer Architecture (June). 1988:280-289.

Agarwal A., Kubiatowicz J., Kranz D., Lim B.-H, Yeung D., D’Souza G., Parkin M. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro. 1993;13:48-61. (June)

Agerwala, T., and J. Cocke [1987]. High Performance Reduced Instruction Set Processors, IBM Tech. Rep. RC12434, IBM, Armonk, N.Y.

Akeley K., Jermoluk T. High-Performance Polygon Rendering. August 1–5, 1988, Atlanta, Ga. Proc. 15th Annual Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH 1988). 1988:239-246.

Alexander W.G, Wortman D.B. Static and dynamic characteristics of XPL programs. IEEE Computer. 1975;8(11):41-46. (November)

Alles, A. [1995]. “ATM Internetworking,” White Paper (May), Cisco Systems, Inc., San Jose, Calif. (www.cisco.com/warp/public/614/12.html).

Alliant. Alliant FX/Series: Product Summary. Acton, Mass: Alliant Computer Systems Corp., 1987.

Almasi G.S, Gottlieb A. Highly Parallel Computing. Redwood City, Calif.: Benjamin/Cummings, 1989.

Alverson G., Alverson R., Callahan D., Koblenz B., Porterfield A., Smith B. Exploiting heterogeneous parallelism on a multithreaded multiprocessor. November 16–20, 1992, Minneapolis, Minn. Proc. ACM/IEEE Conf. on Supercomputing. 1992:188-197.

Amdahl G.M. Validity of the single processor approach to achieving large scale computing capabilities. April 18–20, 1967, Atlantic City, N.J. Proc. AFIPS Spring Joint Computer Conf. 1967:483-485.

Amdahl G.M, Blaauw G.A, Brooks F.P.Jr. Architecture of the IBM System 360. IBM J. Research and Development. 1964;8(2):87-101. (April)

Amza C., Cox A.L, Dwarkadas S., Keleher P., Lu H., Rajamony R., Yu W., Zwaenepoel W. Treadmarks: Shared memory computing on networks of workstations. IEEE Computer. 1996;29(2):18-28. (February)

Anderson D. You don’t know jack about disks. Queue. 2003;1(4):20-30. (June)

Anderson D., Dykes J., Riedel E. SCSI vs. ATA—More than an interface. March 31–April 2, 2003, San Francisco. Proc. 2nd USENIX Conf. on File and Storage Technology (FAST ’03). 2003.

Anderson D.W, Sparacio F.J, Tomasulo R.M. The IBM 360 Model 91: Processor philosophy and instruction handling. IBM J. Research and Development. 1967;11(1):8-24. (January)

Anderson M.H. Strength (and safety) in numbers (RAID, disk storage technology). Byte. 1990;15(13):337-339. (December)

Anderson T.E, Culler D.E, Patterson D. A case for NOW (networks of workstations). IEEE Micro. 1995;15(1):54-64. (February)

Ang B., Chiou D., Rosenband D., Ehrlich M., Rudolph L., Arvind. StarT-Voyager: A flexible platform for exploring scalable SMP issues. November 7–13, 1998, Orlando, FL. Proc. ACM/IEEE Conf. on Supercomputing. 1998.

Anjan K.V, Pinkston T.M. An efficient, fully-adaptive deadlock recovery scheme: Disha. June 22–24, 1995, Santa Margherita, Italy. Proc. 22nd Annual Int’l. Symposium on Computer Architecture (ISCA). 1995.

Anon. et al. [1985]. A Measure of Transaction Processing Power, Tandem Tech. Rep. TR85.2. Also appears in Datamation 31:7 (April), 112–118, 1985.

Apache Hadoop. http://hadoop.apache.org. 2011.

Archibald J., Baer J.-L. Cache coherence protocols: Evaluation using a multiprocessor simulation model. ACM Trans. on Computer Systems. 1986;4(4):273-298. (November)

Armbrust, M., A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia [2009]. Above the Clouds: A Berkeley View of Cloud Computing, Tech. Rep. UCB/EECS-2009-28, University of California, Berkeley (http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html).

Arpaci R.H, Culler D.E, Krishnamurthy A., Steinberg S.G, Yelick K. Empirical evaluation of the CRAY-T3D: A compiler perspective. June 22–24, 1995, Santa Margherita, Italy. 22nd Annual Int’l. Symposium on Computer Architecture (ISCA). 1995.

Asanovic, K. [1998]. “Vector Microprocessors,” Ph.D. thesis, Computer Science Division, University of California, Berkeley.

Associated Press. Gap Inc. shuts down two Internet stores for major overhaul. USATODAY.com. August 8, 2005.

Atanasoff, J.V. [1940]. Computing Machine for the Solution of Large Systems of Linear Equations, Internal Report, Iowa State University, Ames.

Atkins M. Performance and the i860 Microprocessor. IEEE Micro. 1991;11(5):24-27, 72-78. (September)

Austin T.M, Sohi G. Dynamic dependency analysis of ordinary programs. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:342-351.

Babbay F., Mendelson A. Using value prediction to increase the power of speculative execution hardware. ACM Trans. on Computer Systems. 1998;16(3):234-270. (August)

Baer J.-L., Wang W.-H. On the inclusion property for multi-level cache hierarchies. May 30–June 2, 1988, Honolulu, Hawaii. Proc. 15th Annual Int’l. Symposium on Computer Architecture. 1988:73-80.

Bailey D.H, Barszcz E., Barton J.T, Browning D.S, Carter R.L, Dagum L., Fatoohi R.A, Frederickson P.O, Lasinski T.A, Schreiber R.S, Simon H.D, Venkatakrishnan V., Weeratunga S.K. The NAS parallel benchmarks. Int’l. J. Supercomputing Applications. 1991;5:63-73.

Bakoglu H.B, Grohoski G.F, Thatcher L.E, Kaeli J.A, Moore C.R, Tattle D.P, Male W.E, Hardell W.R, Hicks D.A, Nguyen Phu M., Montoye R.K, Glover W.T, Dhawan S. IBM second-generation RISC processor organization. September 30–October 4, 1989, Rye, N.Y. Proc. IEEE Int’l. Conf. on Computer Design. 1989:138-142.

Balakrishnan H., Padmanabhan V.N, Seshan S., Katz R.H. A comparison of mechanisms for improving TCP performance over wireless links. IEEE/ACM Trans. on Networking. 1997;5(6):756-769. (December)

Ball T., Larus J. Branch prediction for free. June 23–25, 1993, Albuquerque, N.M. Proc. ACM SIGPLAN’93 Conference on Programming Language Design and Implementation (PLDI). 1993:300-313.

Banerjee, U. [1979]. “Speedup of Ordinary Programs,” Ph.D. thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign.

Barham P., Dragovic B., Fraser K., Hand S., Harris T., Ho A., Neugebauer R. Xen and the art of virtualization. October 19–22, 2003, Bolton Landing, N.Y. Proc. of the 19th ACM Symposium on Operating Systems Principles. 2003.

Barroso L.A. Warehouse Scale Computing [keynote address]. June 8–10, 2010, Indianapolis, Ind. Proc. ACM SIGMOD. 2010.

Barroso L.A, Hölzle U. The case for energy-proportional computing. IEEE Computer. 2007;40(12):33-37. (December)

Barroso L.A, Hölzle U. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. San Rafael, Calif.: Morgan & Claypool, 2009.

Barroso L.A, Gharachorloo K., Bugnion E. Memory system characterization of commercial workloads. July 3–14, 1998, Barcelona, Spain. Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA). 1998:3-14.

Barton R.S. A new approach to the functional design of a computer. May 9–11, 1961, Los Angeles, Calif. Proc. Western Joint Computer Conf. 1961:393-396.

Bashe C.J, Buchholz W., Hawkins G.V, Ingram J.L, Rochester N. The architecture of IBM’s early computers. IBM J. Research and Development. 1981;25(5):363-375. (September)

Bashe C.J, Johnson L.R, Palmer J.H, Pugh E.W. IBM’s Early Computers. Cambridge, Mass: MIT Press, 1986.

Baskett F., Keller T.W. An evaluation of the Cray-1 processor. In: Kuck D.J, Lawrie D.H, Sameh A.H., editors. High Speed Computer and Algorithm Organization. San Diego: Academic Press; 1977:71-84.

Baskett F., Jermoluk T., Solomon D. The 4D-MP graphics superworkstation: Computing + graphics = 40 MIPS + 40 MFLOPS and 10,000 lighted polygons per second. February 29–March 4, 1988, San Francisco. Proc. IEEE COMPCON. 1988:468-471.

BBN Laboratories. [1986]. Butterfly Parallel Processor Overview, Tech. Rep. 6148, BBN Laboratories, Cambridge, Mass.

Bell C.G. The mini and micro industries. IEEE Computer. 1984;17(10):14-30. (October)

Bell C.G. Multis: A new class of multiprocessor computers. Science. 1985;228:462-467. (April 26)

Bell C.G. The future of high performance computers in science and engineering. Communications of the ACM. 1989;32(9):1091-1101. (September)

Bell, G., and J. Gray [2001]. Crays, Clusters and Centers, Tech. Rep. MSR-TR-2001-76, Microsoft Research, Redmond, Wash.

Bell C.G, Gray J. What’s next in high performance computing? CACM. 2002;45(2):91-95. (February)

Bell C.G, Newell A. Computer Structures: Readings and Examples. New York: McGraw-Hill, 1971.

Bell C.G, Strecker W.D. Computer structures: What have we learned from the PDP-11? January 19–21, 1976, Tampa, Fla. Third Annual Int’l. Symposium on Computer Architecture (ISCA). 1976:1-14.

Bell C.G, Strecker W.D. Computer structures: What have we learned from the PDP-11? ACM, New York. 25 Years of the International Symposia on Computer Architecture (Selected Papers). 1998:138-151.

Bell C.G, Mudge J.C, McNamara J.E. A DEC View of Computer Engineering. Bedford, Mass: Digital Press, 1978.

Bell C.G, Cady R., McFarland H., DeLagi B., O’Laughlin J., Noonan R., Wulf W. A new architecture for mini-computers: The DEC PDP-11. May 5–May 7, 1970, Atlantic City, N.J. Proc. AFIPS Spring Joint Computer Conf. 1970:657-675.

Benes V.E. Rearrangeable three stage connecting networks. Bell System Technical Journal. 1962;41:1481-1492.

Bertozzi D., Jalabert A., Murali S., Tamhankar R., Stergiou S., Benini L., De Micheli G. NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. IEEE Trans. on Parallel and Distributed Systems. 2005;16(2):113-130. (February)

Bhandarkar D.P. Alpha Architecture and Implementations. Newton, Mass: Digital Press, 1995.

Bhandarkar D.P, Clark D.W. Performance from architecture: Comparing a RISC and a CISC with similar hardware organizations. April 8–11, 1991, Palo Alto, Calif. Proc. Fourth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1991:310-319.

Bhandarkar D.P, Ding J. Performance characterization of the Pentium Pro processor. February 1–February 5, 1997, San Antonio, Tex. Proc. Third Int’l. Symposium on High-Performance Computer Architecture. 1997:288-297.

Bhuyan L.N, Agrawal D.P. Generalized hypercube and hyperbus structures for a computer network. IEEE Trans. on Computers. 1984;32(4):322-333. (April)

Bienia, C., S. Kumar, J. P. Singh, K. Li [2008]. The PARSEC Benchmark Suite: Characterization and Architectural Implications, Tech. Rep. TR-811-08, Princeton University, Princeton, N.J.

Bier, J. [1997]. “The Evolution of DSP Processors,” presentation at University of California, Berkeley, November 14.

Bird S., Phansalkar A., John L.K, Mericas A., Indukuru R. Characterization of performance of SPEC CPU benchmarks on Intel’s Core Microarchitecture based processor. January 21, 2007, Austin, Tex. Proc. 2007 SPEC Benchmark Workshop. 2007.

Birman M., Samuels A., Chu G., Chuk T., Hu L., McLeod J., Barnes J. Developing the WRL3170/3171 SPARC floating-point coprocessors. IEEE Micro. 1990;10(1):55-64.

Blackburn M., Garner R., Hoffman C., Khan A.M, McKinley K.S., Bentzur R., Diwan A., Feinberg D., Frampton D., Guyer S.Z, Hirzel M., Hosking A., Jump M., Lee H., Moss J.E.B, Phansalkar A., Stefanovic D., VanDrunen T., von Dincklage D., Wiedermann B. The DaCapo benchmarks: Java benchmarking development and analysis. October 22–26, 2006. ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). 2006:169-190.

Blaum M., Bruck J., Vardy A. MDS array codes with independent parity symbols. IEEE Trans. on Information Theory. 1996;IT-42:529-542. (March)

Blaum M., Brady J., Bruck J., Menon J. EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures. April 18–21, 1994, Chicago. Proc. 21st Annual Int’l. Symposium on Computer Architecture (ISCA). 1994:245-254.

Blaum M., Brady J., Bruck J., Menon J. EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures. IEEE Trans. on Computers. 1995;44(2):192-202. (February)

Blaum M., Brady J., Bruck J., Menon J., Vardy A. The EVENODD code and its generalization. In: Jin H., Cortes T., Buyya R., editors. High Performance Mass Storage and Parallel I/O: Technologies and Applications. New York: Wiley–IEEE; 2001:187-208.

Bloch E. The engineering design of the Stretch computer. December 1–3, 1959, Boston, Mass. 1959 Proceedings of the Eastern Joint Computer Conf. 1959:48-59.

Boddie J.R. History of DSPs. http://www.lucent.com/micro/dsp/dsphist.html, 2000.

Bolt K.M. Amazon sees sales rise, profit fall. Seattle Post-Intelligencer. 2005. October 25 http://seattlepi.nwsource.com/business/245943_techearns26.html

Bordawekar R., Bondhugula U., Rao R. Believe It or Not!: Multi-core CPUs can Match GPU Performance for a FLOP-Intensive Application! Vienna, Austria, September 11–15, 2010. 19th International Conference on Parallel Architecture and Compilation Techniques (PACT 2010). 2010:537-538.

Borg A., Kessler R.E, Wall D.W. Generation and analysis of very long address traces. May 19–21, 1992, Gold Coast, Australia. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:270-279.

Bouknight W.J, Deneberg S.A, McIntyre D.E., Randall J.M, Sameh A.H, Slotnick D.L. The Illiac IV system. Proc. IEEE. 1972;60(4):369-379. Also appears in Siewiorek D.P., Bell C.G, Newell A. Computer Structures: Principles and Examples. New York: McGraw-Hill, 1982. 306–316

Brady J.T. A theory of productivity in the creative process. IEEE CG&A. 1986:25-34. (May)

Brain M. Inside a Digital Cell Phone. www.howstuffworks.com/inside-cellphone.htm. 2000.

Brandt M., Brooks J., Cahir M., Hewitt T., Lopez-Pineda E., Sandness D. The Benchmarker’s Guide for Cray SV1 Systems. Seattle, Wash: Cray Inc., 2000.

Brent R.P, Kung H.T. A regular layout for parallel adders. IEEE Trans. on Computers. 1982;C-31:260-264.

Brewer E.A, Kuszmaul B.C. How to get good performance from the CM-5 data network. April 26–27, 1994, Cancun, Mexico. Proc. Eighth Int’l. Parallel Processing Symposium. 1994.

Brin S., Page L. The anatomy of a large-scale hypertextual Web search engine. April 14–18, 1998, Brisbane, Queensland, Australia. Proc. 7th Int’l. World Wide Web Conf. 1998:107-117.

Brown A., Patterson D.A. Towards maintainability, availability, and growth benchmarks: A case study of software RAID systems. June 18–23, 2000, San Diego, Calif. Proc. 2000 USENIX Annual Technical Conf. 2000.

Bucher I.Y, Hayes A.H. I/O performance measurement on Cray-1 and CDC 7000 computers. NBS 500-65. Proc. Computer Performance Evaluation Users Group, 16th Meeting. 1980:245-254.

Bucher I.Y. The computational speed of supercomputers. August 29–31, 1983, Minneapolis, Minn. Proc. Int’l. Conf. on Measuring and Modeling of Computer Systems (SIGMETRICS 1983). 1983:151-165.

Buchholz W. Planning a Computer System: Project Stretch. New York: McGraw-Hill, 1962.

Burgess N., Williams T. Choices of operand truncation in the SRT division algorithm. IEEE Trans. on Computers. 1995;44(7):933-938.

Burkhardt III, H., S. Frank, B. Knobe, J. Rothnie [1992]. Overview of the KSR1 Computer System, Tech. Rep. KSR-TR-9202001, Kendall Square Research, Boston, Mass.

Burks A.W, Goldstine H.H, von Neumann J. Preliminary discussion of the logical design of an electronic computing instrument. Report to the U.S. Army Ordnance Department, p. 1. Also appears in: Aspray W., Burks A., editors. Papers of John von Neumann. Cambridge, Mass.: MIT Press, and Los Angeles, Calif.: Tomash Publishers; 1987:97-146.

Calder B., Reinman G., Tullsen D.M. Selective value prediction. May 2–4, 1999, Atlanta, Ga. Proc. 26th Annual Int’l. Symposium on Computer Architecture (ISCA). 1999.

Calder B., Grunwald D., Jones M., Lindsay D., Martin J., Mozer M., Zorn B. Evidence-based static branch prediction using machine learning. ACM Trans. Program. Lang. Syst. 1997;19(1):188-222.

Callahan D., Dongarra J., Levine D. Vectorizing compilers: A test suite and results. November 12–17, 1988, Orlando, Fla. Proc. ACM/IEEE Conf. on Supercomputing. 1988:98-105.

Cantin J.F, Hill M.D. Cache Performance for Selected SPEC CPU2000 Benchmarks. www.jfred.org/cache-data.html. 2001. (June)

Cantin J.F, Hill M.D. Cache Performance for SPEC CPU2000 Benchmarks, Version 3.0. www.cs.wisc.edu/multifacet/misc/spec2000cache-data/index.html. 2003.

Carles S. Amazon reports record Xmas season, top game picks. Gamasutra, December 27 http://www.gamasutra.com/php-bin/news_index.php?story=7630. 2005.

Carter J., Rajamani K. Designing energy-efficient servers and data centers. IEEE Computer. 2010;43(7):76-78. (July)

Case R.P, Padegs A. The architecture of the IBM System/370. Communications of the ACM. 1978;21(1):73-96. Also appears in Siewiorek D.P., Bell C.G, Newell A. Computer Structures: Principles and Examples. New York: McGraw-Hill, 1982. 830–855

Censier L., Feautrier P. A new solution to coherence problems in multicache systems. IEEE Trans. on Computers. 1978;C-27(12):1112-1118. (December)

Chandra R., Devine S., Verghese B., Gupta A., Rosenblum M. Scheduling and page migration for multiprocessor compute servers. October 4–7, 1994, San Jose, Calif. Sixth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1994:12-24.

Chang F., Dean J., Ghemawat S., Hsieh W.C, Wallach D.A, Burrows M., Chandra T., Fikes A., Gruber R.E. Bigtable: A distributed storage system for structured data. November 6–8, 2006, Seattle, Wash. Proc. 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’06). 2006.

Chang J., Meza J., Ranganathan P., Bash C., Shah A. Green server design: Beyond operational energy to sustainability. October 3, 2010, Vancouver, British Columbia. Proc. Workshop on Power Aware Computing and Systems (HotPower ’10). 2010.

Chang P.P, Mahlke S.A, Chen W.Y, Warter N.J, Hwu W.W. IMPACT: An architectural framework for multiple-instruction-issue processors. May 27–30, 1991, Toronto, Canada. 18th Annual Int’l. Symposium on Computer Architecture (ISCA). 1991:266-275.

Charlesworth A.E. An approach to scientific array processing: The architecture design of the AP-120B/FPS-164 family. Computer. 1981;14(9):18-27. (September)

Charlesworth A. Starfire: Extending the SMP envelope. IEEE Micro. 1998;18(1):39-49. (January/February)

Chen P.M, Lee E.K. Striping in a RAID level 5 disk array. May 15–19, 1995, Ottawa, Canada. Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems. 1995:136-145.

Chen P.M, Gibson G.A, Katz R.H, Patterson D.A. An evaluation of redundant arrays of inexpensive disks using an Amdahl 5890. May 22–25, 1990, Boulder, Colo. Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems. 1990.

Chen P.M, Lee E.K, Gibson G.A, Katz R.H, Patterson D.A. RAID: High-performance, reliable secondary storage. ACM Computing Surveys. 1994;26(2):145-188. (June)

Chen S. Large-scale and high-speed multiprocessor system for scientific applications. June 20–22, 1983, Jülich, West Germany. Proc. NATO Advanced Research Workshop on High-Speed Computing. 1983. Also appears in: Hwang K., editor. Superprocessors: Design and Applications. IEEE; 1984:602-609. (August)

Chen T.C. Overlap and parallel processing. In: Stone H., editor. Introduction to Computer Architecture. Chicago: Science Research Associates; 1980:427-486.

Chow, F. C. [1983]. “A Portable Machine-Independent Global Optimizer—Design and Measurements,” Ph.D. thesis, Stanford University, Palo Alto, Calif.

Chrysos G.Z, Emer J.S. Memory dependence prediction using store sets. July 3–14, 1998, Barcelona, Spain. Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA). 1998:142-153.

Clark B., Deshane T., Dow E., Evanchik S., Finlayson M., Herne J., Neefe Matthews J. Xen and the art of repeated research. June 27–July 2, 2004. Proc. USENIX Annual Technical Conf. 2004:135-144.

Clark D.W. Cache performance of the VAX-11/780. ACM Trans. on Computer Systems. 1983;1(1):24-37.

Clark D.W. Pipelining and performance in the VAX 8800 processor. October 5–8, 1987, Palo Alto, Calif. Proc. Second Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1987:173-177.

Clark D.W, Emer J.S. Performance of the VAX-11/780 translation buffer: Simulation and measurement. ACM Trans. on Computer Systems. 1985;3(1):31-62. (February)

Clark D., Levy H. Measurement and analysis of instruction set use in the VAX-11/780. April 26–29, 1982, Austin, Tex. Proc. Ninth Annual Int’l. Symposium on Computer Architecture (ISCA). 1982:9-17.

Clark D., Strecker W.D. Comments on ‘the case for the reduced instruction set computer’. Computer Architecture News. 1980;8(6):34-38. (October)

Clark W.A. The Lincoln TX-2 computer development. February 26–28, 1957, Los Angeles. Proc. Western Joint Computer Conference. 1957:143-145.

Clidaras J., Johnson C., Felderman B. Private communication. 2010.

Climate Savers Computing Initiative. Efficiency Specs. http://www.climatesaverscomputing.org/. 2007.

Clos C. A study of non-blocking switching networks. Bell System Technical Journal. 1953;32:406-424. (March)

Cody W.J, Coonen J.T, Gay D.M, Hanson K., Hough D., Kahan W., Karpinski R., Palmer J., Ris F.N, Stevenson D. A proposed radix- and word-length-independent standard for floating-point arithmetic. IEEE Micro. 1984;4(4):86-100.

Colwell R.P, Steck R. A 0.6 μm BiCMOS processor with dynamic execution. February 15–17, 1995, San Francisco. Proc. of IEEE Int’l. Symposium on Solid State Circuits (ISSCC). 1995:176-177.

Colwell R.P, Nix R.P, O’Donnell J.J., Papworth D.B, Rodman P.K. A VLIW architecture for a trace scheduling compiler. October 5–8, 1987, Palo Alto, Calif. Proc. Second Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1987:180-192.

Comer D. Internetworking with TCP/IP, 2nd ed. Englewood Cliffs, N.J.: Prentice Hall, 1993.

Compaq Computer Corporation. [1999]. Compiler Writer’s Guide for the Alpha 21264, Order Number EC-RJ66A-TE, June, www1.support.compaq.com/alpha-tools/documentation/current/21264_EV67/ec-rj66a-te_comp_writ_gde_for_alpha21264.pdf.

Conti C., Gibson D.H, Pitkowsky S.H. Structural aspects of the System/360 Model 85. Part I. General organization. IBM Systems J. 1968;7(1):2-14.

Coonen J. [1984]. “Contributions to a Proposed Standard for Binary Floating-Point Arithmetic,” Ph.D. thesis, University of California, Berkeley.

Corbett P., English B., Goel A., Grcanac T., Kleiman S., Leong J., Sankar S. Row-diagonal parity for double disk failure correction. March 31–April 2, 2004, San Francisco. Proc. 3rd USENIX Conf. on File and Storage Technology (FAST ’04). 2004.

Crawford J., Gelsinger P. Programming the 80386. Alameda, Calif.: Sybex Books, 1988.

Culler D.E, Singh J.P, Gupta A. Parallel Computer Architecture: A Hardware/Software Approach. San Francisco: Morgan Kaufmann, 1999.

Curnow H.J, Wichmann B.A. A synthetic benchmark. The Computer J. 1976;19(1):43-49.

Cvetanovic Z., Kessler R.E. Performance analysis of the Alpha 21264-based Compaq ES40 system. June 10–14, 2000, Vancouver, Canada. Proc. 27th Annual Int’l. Symposium on Computer Architecture (ISCA). 2000:192-202.

Dally W.J. Performance analysis of k-ary n-cube interconnection networks. IEEE Trans. on Computers. 1990;39(6):775-785. (June)

Dally W.J. Virtual channel flow control. IEEE Trans. on Parallel and Distributed Systems. 1992;3(2):194-205. (March)

Dally W.J. Interconnect limited VLSI architecture. May 24–26, 1999, San Francisco. Proc. of the International Interconnect Technology Conference. 1999.

Dally W.J, Seitz C.L. The torus routing chip. Distributed Computing. 1986;1(4):187-196.

Dally W.J, Towles B. Route packets, not wires: On-chip interconnection networks. June 18–22, 2001, Las Vegas. Proc. 38th Design Automation Conference. 2001.

Dally W.J, Towles B. Principles and Practices of Interconnection Networks. San Francisco: Morgan Kaufmann, 2003.

Darcy J.D, Gay D. FLECKmarks: Measuring floating point performance using a full IEEE compliant arithmetic benchmark. CS 252 class project. Berkeley: University of California. 1996 HTTP.CS.Berkeley.EDU/~darcy/Projects/cs252/

Darley, H. M. et al. [1989]. “Floating Point/Integer Processor with Divide and Square Root Functions,” U.S. Patent 4,878,190, October 31.

Davidson E.S. The design and control of pipelined function generators. January 19–21, 1971, Oaxtepec, Mexico. Proc. IEEE Conf. on Systems, Networks, and Computers. 1971:19-21.

Davidson E.S, Thomas A.T, Shar L.E, Patel J.H. Effective control for pipelined processors. February 25–27, 1975, San Francisco. Proc. IEEE COMPCON. 1975:181-184.

Davie B.S, Peterson L.L, Clark D. Computer Networks: A Systems Approach, 2nd ed. San Francisco: Morgan Kaufmann, 1999.

Dean J. Designs, lessons and advice from building large distributed systems [keynote address]. October 11–14, 2009, Big Sky, Mont. Proc. 3rd ACM SIGOPS Int’l. Workshop on Large-Scale Distributed Systems and Middleware, Co-located with the 22nd ACM Symposium on Operating Systems Principles. 2009.

Dean J., Ghemawat S. MapReduce: Simplified data processing on large clusters. December 6–8, 2004, San Francisco, Calif. In Proc. Operating Systems Design and Implementation (OSDI). 2004:137-150.

Dean J., Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM. 2008;51(1):107-113.

DeCandia G., Hastorun D., Jampani M., Kakulapati G., Lakshman A., Pilchin A., Sivasubramanian S., Vosshall P., Vogels W. Dynamo: Amazon’s highly available key-value store. October 14–17, 2007, Stevenson, Wash. Proc. 21st ACM Symposium on Operating Systems Principles. 2007.

Dehnert J.C, Hsu P.Y.-T., Bratt J.P. Overlapped loop support on the Cydra 5. April 3–6, 1989, Boston, Mass. Proc. Third Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1989:26-39.

Demmel J.W, Li X. Faster numerical algorithms via exception handling. IEEE Trans. on Computers. 1994;43(8):983-992.

Denehy T.E, Bent J., Popovici F.I, Arpaci-Dusseau A.C., Arpaci-Dusseau R.H. Deconstructing storage arrays. October 7–13, 2004, Boston, Mass. Proc. 11th Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2004:59-71.

Desurvire E. Lightwave communications: The fifth generation. Scientific American (International Edition). 1992;266(1):96-103. (January)

Diep T.A, Nelson C., Shen J.P. Performance evaluation of the PowerPC 620 microarchitecture. June 22–24, 1995, Santa Margherita, Italy. Proc. 22nd Annual Int’l. Symposium on Computer Architecture (ISCA). 1995.

Digital Semiconductor. Alpha Architecture Handbook, Version 3. Maynard, Mass: Digital Press, 1996.

Ditzel D.R, McLellan H.R. Branch folding in the CRISP microprocessor: Reducing the branch delay to zero. June 2–5, 1987, Pittsburgh, Penn. Proc. 14th Annual Int’l. Symposium on Computer Architecture (ISCA). 1987:2-7.

Ditzel D.R, Patterson D.A. Retrospective on high-level language computer architecture. May 6–8, 1980, La Baule, France. Proc. Seventh Annual Int’l. Symposium on Computer Architecture (ISCA). 1980:97-104.

Doherty W.J, Kelisky R.P. Managing VM/CMS systems for user effectiveness. IBM Systems J. 1979;18(1):143-166.

Dongarra J.J. A survey of high performance processors. March 3–6, 1986, San Francisco. Proc. IEEE COMPCON. 1986:8-11.

Dongarra J., Sterling T., Simon H., Strohmaier E. High-performance computing: Clusters, constellations, MPPs, and future directions. Computing in Science & Engineering. 2005;7(2):51-59. (March/April)

Douceur J.R, Bolosky W.J. A large scale study of file-system contents. May 1–9, 1999, Atlanta, Ga. Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems. 1999:59-69.

Douglas J. [2005]. “Intel 8xx series and Paxville Xeon-MP microprocessors,” paper presented at Hot Chips 17, August 14–16, 2005, Stanford University, Palo Alto, Calif.

Duato J. A new theory of deadlock-free adaptive routing in wormhole networks. IEEE Trans. on Parallel and Distributed Systems. 1993;4(12):1320-1331. (December)

Duato J., Pinkston T.M. A general theory for deadlock-free adaptive routing using a mixed set of resources. IEEE Trans. on Parallel and Distributed Systems. 2001;12(12):1219-1235. (December)

Duato J., Yalamanchili S., Ni L. Interconnection Networks: An Engineering Approach, 2nd printing. San Francisco: Morgan Kaufmann, 2003.

Duato J., Johnson I., Flich J., Naven F., Garcia P., Nachiondo T. A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks. February 12–16, 2005, San Francisco. Proc. 11th Int’l. Symposium on High-Performance Computer Architecture. 2005.

Duato J., Lysne O., Pang R., Pinkston T.M. Part I: A theory for deadlock-free dynamic reconfiguration of interconnection networks. IEEE Trans. on Parallel and Distributed Systems. 2005;16(5):412-427. (May)

Dubois M., Scheurich C., Briggs F. Synchronization, coherence, and event ordering. IEEE Computer. 1988;21(2):9-21. (February)

Dunigan W., Vetter K., White K., Worley P. Performance evaluation of the Cray X1 distributed shared memory architecture. IEEE Micro. 2005:30-40. (January/February)

Eden A., Mudge T. The YAGS branch prediction scheme. November 30–December 2, 1998, Dallas, Tex. Proc. of the 31st Annual ACM/IEEE Int’l. Symposium on Microarchitecture. 1998:69-80.

Edmondson J.H, Rubinfield P.I, Preston R., Rajagopalan V. Superscalar instruction execution in the 21164 Alpha microprocessor. IEEE Micro. 1995;15(2):33-43.

Eggers, S. [1989]. “Simulation Analysis of Data Sharing in Shared Memory Multiprocessors,” Ph.D. thesis, University of California, Berkeley.

Elder J., Gottlieb A., Kruskal C.K, McAuliffe K.P., Randolph L., Snir M., Teller P., Wilson J. Issues related to MIMD shared-memory computers: The NYU Ultracomputer approach. June 17–19, 1985, Boston, Mass. Proc. 12th Annual Int’l. Symposium on Computer Architecture (ISCA). 1985:126-135.

Ellis J.R. Bulldog: A Compiler for VLIW Architectures. Cambridge, Mass: MIT Press, 1986.

Emer J.S, Clark D.W. A characterization of processor performance in the VAX-11/780. June 5–7, 1984, Ann Arbor, Mich. Proc. 11th Annual Int’l. Symposium on Computer Architecture (ISCA). 1984:301-310.

Enriquez P. What happened to my dial tone? A study of FCC service disruption reports. October 18–20, 2001, Houston, Tex. poster, Richard Tapia Symposium on the Celebration of Diversity in Computing. 2001.

Erlichson A., Nuckolls N., Chesson G., Hennessy J.L. SoftFLASH: Analyzing the performance of clustered distributed virtual shared memory. October 1–5, 1996, Cambridge, Mass. Proc. Seventh Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1996:210-220.

Esmaeilzadeh H., Cao T., Xi Y., Blackburn S.M, McKinley K.S. Looking Back on the Language and Hardware Revolution: Measured Power, Performance, and Scaling. March 5–11, 2011, Newport Beach, Calif. Proc. 16th Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2011.

Evers M., Patel S.J, Chappell R.S, Patt Y.N. An analysis of correlation and predictability: What makes two-level branch predictors work. July 3–14, 1998, Barcelona, Spain. Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA). 1998:52-61.

Fabry R.S. Capability based addressing. Communications of the ACM. 1974;17(7):403-412. (July)

Falsafi B., Wood D.A. Reactive NUMA: A design for unifying S-COMA and CC-NUMA. June 2–4, 1997, Denver, Colo. Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA). 1997:229-240.

Fan X., Weber W., Barroso L.A. Power provisioning for a warehouse-sized computer. June 9–13, 2007, San Diego, Calif. Proc. 34th Annual Int’l. Symposium on Computer Architecture (ISCA). 2007.

Farkas K.I, Jouppi N.P. Complexity/performance trade-offs with non-blocking loads. April 18–21, 1994, Chicago. Proc. 21st Annual Int’l. Symposium on Computer Architecture (ISCA). 1994.

Farkas K.I, Jouppi N.P, Chow P. How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors?. January 22–25, 1995, Raleigh, N.C. Proc. First IEEE Symposium on High-Performance Computer Architecture. 1995:78-89.

Farkas K.I, Chow P., Jouppi N.P, Vranesic Z. Memory-system design considerations for dynamically-scheduled processors. June 2–4, 1997, Denver, Colo. Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA). 1997:133-143.

Fazio D. It’s really much more fun building a supercomputer than it is simply inventing one. February 23–27, 1987, San Francisco. Proc. IEEE COMPCON. 1987:102-105.

Fisher J.A. Trace scheduling: A technique for global microcode compaction. IEEE Trans. on Computers. 1981;30(7):478-490. (July)

Fisher J.A. Very long instruction word architectures and ELI-512. June 5–7, 1982, Stockholm, Sweden. 10th Annual Int’l. Symposium on Computer Architecture (ISCA). 1982:140-150.

Fisher J.A, Freudenberger S.M. Predicting conditional branches from previous runs of a program. October 12–15, 1992, Boston, Mass. Proc. Fifth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1992:85-95.

Fisher J.A, Rau B.R. Journal of Supercomputing. 1993 January (special issue)

Fisher J.A, Ellis J.R, Ruttenberg J.C, Nicolau A. Parallel processing: A smart compiler and a dumb processor. June 17–22, 1984, Montreal, Canada. Proc. SIGPLAN Conf. on Compiler Construction. 1984:11-16.

Fleming P.J, Wallace J.J. How not to lie with statistics: The correct way to summarize benchmark results. Communications of the ACM. 1986;29(3):218-221. (March)

Flynn M.J. Very high-speed computing systems. Proc. IEEE. 1966;54(12):1901-1909. (December)

Forgie J.W. The Lincoln TX-2 input-output system. Institute of Radio Engineers, Los Angeles. Proc. Western Joint Computer Conference. 1957:156-160. (February)

Foster C.C, Riseman E.M. Percolation of code to enhance parallel dispatching and execution. IEEE Trans. on Computers. 1972;C-21(12):1411-1415. (December)

Frank S.J. Tightly coupled multiprocessor systems speed memory access time. Electronics. 1984;57(1):164-169. (January)

Freiman C.V. Statistical analysis of certain binary division algorithms. Proc. IRE. 1961;49(1):91-103.

Friesenborg S.E, Wicks R.J. DASD Expectations: The 3380, 3380-23, and MVS/XA. Gaithersburg, Md.: Tech. Bulletin GG22-9363-02, IBM Washington Systems Center, 1985.

Fuller S.H, Burr W.E. Measurement and evaluation of alternative computer architectures. Computer. 1977;10(10):24-35. (October)

Furber S.B. ARM System Architecture. Harlow, England: Addison-Wesley. 1996. see www.cs.man.ac.uk/amulet/publications/books/ARMsysArch.

Gagliardi U.O. Report of workshop 4—software-related advances in computer hardware. September 17–19, 1973, Monterey, Calif. Proc. Symposium on the High Cost of Software. 1973:99-120.

Gajski D., Kuck D., Lawrie D., Sameh A. CEDAR—a large scale multiprocessor. August, Columbus, Ohio. Proc. Int’l. Conf. on Parallel Processing (ICPP). 1983:524-529.

Gallagher D.M, Chen W.Y, Mahlke S.A, Gyllenhaal J.C, Hwu W.W. Dynamic memory disambiguation using the memory conflict buffer. October 4–7, 1994, San Jose, Calif. Proc. Sixth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1994:183-193.

Galles M. Scalable pipelined interconnect for distributed endpoint routing: The SGI SPIDER chip. August 15–17, 1996, Stanford University, Palo Alto, Calif. Proc. IEEE HOT Interconnects ’96. 1996.

Game M., Booker A. CodePack code compression for PowerPC processors. MicroNews. 1999;5(1). www.chips.ibm.com/micronews/vol5_no1/codepack.html.

Gao Q.S. The Chinese remainder theorem and the prime memory system. May 16–19, 1993, San Diego, Calif. 20th Annual Int’l. Symposium on Computer Architecture (ISCA). 1993 (Computer Architecture News 21:2 (May), 337–340)

Gap. [2005]. “Gap Inc. Reports Third Quarter Earnings,” http://gapinc.com/public/documents/PR_Q405EarningsFeb2306.pdf.

Gap. [2006]. “Gap Inc. Reports Fourth Quarter and Full Year Earnings,” http://gapinc.com/public/documents/Q32005PressRelease_Final22.pdf.

Garner R., Agarwal A., Briggs F., Brown E., Hough D., Joy B., Kleiman S., Muchnick S., Namjoo M., Patterson D., Pendleton J., Tuck R. Scalable processor architecture (SPARC). February 29–March 4, 1988, San Francisco. Proc. IEEE COMPCON. 1988:278-283.

Gebis J., Patterson D. Embracing and extending 20th-century instruction set architectures. IEEE Computer. 2007;40(4):68-75. (April)

Gee J.D, Hill M.D, Pnevmatikatos D.N, Smith A.J. Cache performance of the SPEC92 benchmark suite. IEEE Micro. 1993;13(4):17-27. (August)

Gehringer E.F, Siewiorek D.P, Segall Z. Parallel Processing: The Cm* Experience. Bedford, Mass: Digital Press, 1987.

Gharachorloo K., Gupta A., Hennessy J.L. Hiding memory latency using dynamic scheduling in shared-memory multiprocessors. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992.

Gharachorloo K., Lenoski D., Laudon J., Gibbons P., Gupta A., Hennessy J.L. Memory consistency and event ordering in scalable shared-memory multiprocessors. May 28–31, 1990, Seattle, Wash. Proc. 17th Annual Int’l. Symposium on Computer Architecture (ISCA). 1990:15-26.

Ghemawat S., Gobioff H., Leung S.-T. The Google file system. October 19–22, 2003, Bolton Landing, N.Y. Proc. 19th ACM Symposium on Operating Systems Principles. 2003.

Gibson D.H. Considerations in block-oriented systems design. AFIPS Conf. Proc.. 1967;30:75-80.

Gibson G.A. Redundant Disk Arrays: Reliable, Parallel Secondary Storage, ACM Distinguished Dissertation Series. Cambridge, Mass: MIT Press, 1992.

Gibson J. C. [1970] “The Gibson mix,” Rep. TR. 00.2043, IBM Systems Development Division, Poughkeepsie, N.Y. (research done in 1959).

Gibson J., Kunz R., Ofelt D., Horowitz M., Hennessy J., Heinrich M. FLASH vs. (simulated) FLASH: Closing the simulation loop. November 12–15, 2000, Cambridge, Mass. Proc. Ninth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2000:49-58.

Glass C.J, Ni L.M. The Turn Model for adaptive routing. May 19–21, 1992, Gold Coast, Australia. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992.

Goldberg D. What every computer scientist should know about floating-point arithmetic. Computing Surveys. 1991;23(1):5-48.

Goldberg I.B. 27 bits are not enough for 8-digit accuracy. Communications of the ACM. 1967;10(2):105-106.

Goldstein, S. [1987]. Storage Performance—An Eight Year Outlook, Tech. Rep. TR 03.308-1, IBM Santa Teresa Laboratory, San Jose, Calif.

Goldstine H.H. The Computer: From Pascal to von Neumann. Princeton, N.J.: Princeton University Press, 1972.

González J., González A. Limits of instruction level parallelism with data speculation. June 21–23, 1998, Porto, Portugal. Proc. Vector and Parallel Processing (VECPAR) Conf.. 1998:585-598.

Goodman J.R. Using cache memory to reduce processor memory traffic. June 5–7, 1982, Stockholm, Sweden. Proc. 10th Annual Int’l. Symposium on Computer Architecture (ISCA). 1982:124-131.

Goralski W. SONET: A Guide to Synchronous Optical Network. New York: McGraw-Hill, 1997.

Gosling J.B. Design of Arithmetic Units for Digital Computers. New York: Springer-Verlag, 1980.

Gray J. A census of Tandem system availability between 1985 and 1990. IEEE Trans. on Reliability. 1990;39(4):409-418. (October)

Gray J. The Benchmark Handbook for Database and Transaction Processing Systems, 2nd ed. San Francisco: Morgan Kaufmann, 1993.

Gray J. Sort benchmark home page. 2006. http://sortbenchmark.org/.

Gray J., Reuter A. Transaction Processing: Concepts and Techniques. San Francisco: Morgan Kaufmann, 1993.

Gray J., Siewiorek D.P. High-availability computer systems. Computer. 1991;24(9):39-48. (September)

Gray J., van Ingen C. Empirical Measurements of Disk Failure Rates and Error Rates. Redmond, Wash: MSR-TR-2005-166, Microsoft Research, 2005.

Greenberg A., Jain N., Kandula S., Kim C., Lahiri P., Maltz D., Patel P., Sengupta S. VL2: A Scalable and Flexible Data Center Network. Proc. ACM SIGCOMM. August 17–21, 2009, Barcelona, Spain. 2009.

Grice C., Kanellos M. Cell phone industry at crossroads: Go high or low?. CNET News. 2000, August 31. technews.netscape.com/news/0-1004-201-2518386-0.html?tag=st.ne.1002.tgif.sf.

Groe J.B, Larson L.E. CDMA Mobile Radio Design. Boston: Artech House, 2000.

Gunther K.D. Prevention of deadlocks in packet-switched data transport systems. IEEE Trans. on Communications. 1981;COM–29(4):512-524. (April)

Hagersten E., Koster M. WildFire: A scalable path for SMPs. January 9–12, 1999, Orlando, Fla. Proc. Fifth Int’l. Symposium on High-Performance Computer Architecture. 1999.

Hagersten E., Landin A., Haridi S. DDM—a cache-only memory architecture. IEEE Computer. 1992;25(9):44-54. (September)

Hamacher V.C, Vranesic Z.G, Zaky S.G. Computer Organization, 2nd ed. New York: McGraw-Hill, 1984.

Hamilton J. [2009]. “Data center networks are in my way,” paper presented at the Stanford Clean Slate CTO Summit, October 23, 2009 (http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_CleanSlateCTO2009.pdf).

Hamilton J. [2010]. “Cloud computing economies of scale,” paper presented at the AWS Workshop on Genomics and Cloud Computing, June 8, 2010, Seattle, Wash. (http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_GenomicsCloud20100608.pdf).

Handy J. The Cache Memory Book. Boston: Academic Press, 1993.

Hauck E.A, Dent B.A. Burroughs’ B6500/B7500 stack mechanism. April 30–May 2, 1968, Atlantic City, N.J. Proc. AFIPS Spring Joint Computer Conf.. 1968:245-251.

Heald R., Aingaran K., Amir C., Ang M., Boland M., Das A., Dixit P., Gouldsberry G., Hart J., Horel T., Hsu W.-J, Kaku J., Kim C., Kim S., Klass F., Kwan H., Lo R., McIntyre H., Mehta A., Murata D., Nguyen S., Pai Y.-P, Patel S., Shin K., Tam K., Vishwanthaiah S., Wu J., Yee G., You H. Implementation of third-generation SPARC V9 64-b microprocessor. ISSCC Digest of Technical Papers. 2000:412-413. and slide supplement

Heinrich J. MIPS R4000 User’s Manual. Englewood Cliffs, N.J.: Prentice Hall, 1993.

Henly, M., and B. McNutt [1989]. “DASD I/O Characteristics: A Comparison of MVS to VM,” Tech. Rep. TR 02.1550 (May), IBM General Products Division, San Jose, Calif.

Hennessy J. VLSI processor architecture. IEEE Trans. on Computers. 1984;C-33(11):1221-1246. (December)

Hennessy J. VLSI RISC processors. VLSI Systems Design. 1985;6(10):22-32. (October)

Hennessy J., Jouppi N., Baskett F., Gill J. MIPS: A VLSI processor architecture. In: CMU Conference on VLSI Systems and Computations. Rockville, Md.: Computer Science Press; 1981.

Hewlett-Packard. PA-RISC 2.0 Architecture Reference Manual, 3rd ed. Palo Alto, Calif: Hewlett-Packard, 1994.

Hewlett-Packard. HP’s ‘5NINES:5MINUTES’ Vision Extends Leadership and Redefines High Availability in Mission-Critical Environments. February 10 www.future.enterprisecomputing.hp.com/ia64/news/5nines_vision_pr.html. 1998.

Hill, M. D. [1987]. “Aspects of Cache Memory and Instruction Buffer Performance,” Ph.D. thesis, Tech. Rep. UCB/CSD 87/381, Computer Science Division, University of California, Berkeley.

Hill M.D. A case for direct mapped caches. Computer. 1988;21(12):25-40. (December)

Hill M.D. Multiprocessors should support simple memory consistency models. IEEE Computer. 1998;31(8):28-34. (August)

Hillis W.D. The Connection Machine. Cambridge, Mass: MIT Press, 1985.

Hillis W.D., Steele G.L. Data parallel algorithms. Communications of the ACM. 1986;29(12):1170-1183. (December). http://doi.acm.org/10.1145/7902.7903.

Hinton G., Sager D., Upton M., Boggs D., Carmean D., Kyker A., Roussel P. The microarchitecture of the Pentium 4 processor. Intel Technology Journal. 2001. February

Hintz R.G, Tate D.P. Control data STAR-100 processor design. September 12–14, 1972, San Francisco. Proc. IEEE COMPCON. 1972:1-4.

Hirata H., Kimura K., Nagamine S., Mochizuki Y., Nishimura A., Nakase Y., Nishizawa T. An elementary processor architecture with simultaneous instruction issuing from multiple threads. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:136-145.

Hitachi. SuperH RISC Engine SH7700 Series Programming Manual. Santa Clara, Calif: Hitachi. 1997. see www.halsp.hitachi.com/tech_prod/. and search for title

Ho R., Mai K.W, Horowitz M.A. The future of wires. Proc. of the IEEE. 2001;89(4):490-504. (April)

Hoagland A.S. Digital Magnetic Recording. New York: Wiley, 1963.

Hockney R.W, Jesshope C.R. Parallel Computers 2: Architectures, Programming and Algorithms. Bristol, England: Adam Hilger, Ltd., 1988.

Holland J.H. A universal computer capable of executing an arbitrary number of subprograms simultaneously. Proc. East Joint Computer Conf.. 1959;16:108-113.

Holt R.C. Some deadlock properties of computer systems. ACM Computing Surveys. 1972;4(3):179-196. (September)

Hopkins M. [2000]. “A critical look at IA-64: Massive resources, massive ILP, but can it deliver?” Microprocessor Report, February.

Hord R.M. The Illiac-IV, The First Supercomputer. Rockville, Md: Computer Science Press, 1982.

Horel T., Lauterbach G. UltraSPARC-III: Designing third-generation 64-bit performance. IEEE Micro. 1999;19(3):73-85. (May–June)

Hospodor A.D, Hoagland A.S. The changing nature of disk controllers. Proc. IEEE. 1993;81(4):586-594. (April)

Hölzle U. Brawny cores still beat wimpy cores, most of the time. IEEE Micro. 30(4), 2010. (July/August)

Hristea C., Lenoski D., Keen J. Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks. November 16–21, 1997, San Jose, Calif. Proc. ACM/IEEE Conf. on Supercomputing. 1997.

Hsu P. Designing the TFP microprocessor. IEEE Micro. 1994;14(2):23-33. (April)

Huck J. Introducing the IA-64 Architecture. IEEE Micro. 2000;20(5):12-23. (September–October)

Hughes C.J, Kaul P., Adve S.V, Jain R., Park C., Srinivasan J. Variability in the execution of multimedia applications and implications for architecture. June 30–July 4, 2001, Goteborg, Sweden. Proc. 28th Annual Int’l. Symposium on Computer Architecture (ISCA). 2001:254-265.

Hwang K. Computer Arithmetic: Principles, Architecture, and Design. New York: Wiley, 1979.

Hwang K. Advanced Computer Architecture and Parallel Programming. New York: McGraw-Hill, 1993.

Hwu W.-M., Patt Y. HPSm, a high performance restricted data flow architecture having minimum functionality. June 2–5, 1986, Tokyo. Proc. 13th Annual Int’l. Symposium on Computer Architecture (ISCA). 1986:297-307.

Hwu W.W, Mahlke S.A, Chen W.Y, Chang P.P, Warter N.J, Bringmann R.A, Ouellette R.O, Hank R.E, Kiyohara T., Haab G.E, Holm J.G, Lavery D.M. The superblock: An effective technique for VLIW and superscalar compilation. J. Supercomputing. 1993;7(1–2):229-248. (March)

IBM. The Economic Value of Rapid Response Time. White Plains, N.Y.: GE20-0752-0, IBM, 1982. 11–82

IBM. [1990]. “The IBM RISC System/6000 processor” (collection of papers), IBM J. Research and Development 34:1 (January).

IBM. The PowerPC Architecture. San Francisco: Morgan Kaufmann, 1994.

IBM. Blue Gene. IBM J. Research and Development. 49(2/3), 2005. (special issue)

IEEE. IEEE standard for binary floating-point arithmetic. SIGPLAN Notices. 1985;22(2):9-25.

IEEE. Intel virtualization technology, computer. IEEE Computer Society. 2005;38(5):48-56. (May)

IEEE. 754-2008 Working Group. DRAFT Standard for Floating-Point Arithmetic 754-2008. http://dx.doi.org/10.1109/IEEESTD.2008.4610935. 2006.

Imprimis Product Specification, 97209 Sabre Disk Drive IPI-2 Interface 1.2 GB, Document No. 64402302, Imprimis, Dallas, Tex.

InfiniBand Trade Association. [2001]. InfiniBand Architecture Specifications Release 1.0.a, www.infinibandta.org.

Intel. Using MMX Instructions to Convert RGB to YUV Color Conversion. cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Legacy::irtm_AP548_9996&cntType=IDS_EDITORIAL. 2001.

Internet Retailer. The Gap launches a new site—after two weeks of downtime. Internet® Retailer. 2005, September 28. http://www.internetretailer.com/2005/09/28/the-gap-launches-a-new-site-after-two-weeks-of-downtime.

Jain R. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. New York: Wiley, 1991.

Jantsch A., Tenhunen H., editors. Networks on Chips. The Netherlands: Kluwer Academic Publishers, 2003.

Jimenez D.A, Lin C. Neural methods for dynamic branch prediction. ACM Trans. on Computer Systems. 2002;20(4):369-397. (November)

Johnson M. Superscalar Microprocessor Design. Englewood Cliffs, N.J.: Prentice Hall, 1990.

Jordan H.F. Performance measurements on HEP—a pipelined MIMD computer. June 5–7, 1982, Stockholm, Sweden. Proc. 10th Annual Int’l. Symposium on Computer Architecture (ISCA). 1982:207-212.

Jordan K.E. Performance comparison of large-scale scientific processors: Scalar mainframes, mainframes with vector facilities, and supercomputers. Computer. 1987;20(3):10-23. (March)

Jouppi N.P. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. May 28–31, 1990, Seattle, Wash. Proc. 17th Annual Int’l. Symposium on Computer Architecture (ISCA). 1990:364-373.

Jouppi N.P. Retrospective: Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. ACM, New York. 25 Years of the International Symposia on Computer Architecture (Selected Papers). 1998:71-73.

Jouppi N.P, Wall D.W. Available instruction-level parallelism for super-scalar and superpipelined processors. April 3–6, 1989, Boston. Proc. Third Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1989:272-282.

Jouppi N.P, Wilton S.J.E. Trade-offs in two-level on-chip caching. April 18–21, 1994, Chicago. Proc. 21st Annual Int’l. Symposium on Computer Architecture (ISCA). 1994:34-45.

Kaeli D.R, Emma P.G. Branch history table prediction of moving target branches due to subroutine returns. May 27–30, 1991, Toronto, Canada. Proc. 18th Annual Int’l. Symposium on Computer Architecture (ISCA). 1991:34-42.

Kahan W. [1990]. “On the advantage of the 8087’s stack,” unpublished course notes, Computer Science Division, University of California, Berkeley.

Kahan W. 7094-II system support for numerical analysis. In SHARE Secretarial Distribution SSD-159. University of Toronto: Department of Computer Science; 1968.

Kahaner D.K. Benchmarks for ‘real’ programs. SIAM News. 1988. November

Kahn R.E. Resource-sharing computer communication networks. Proc. IEEE. 1972;60(11):1397-1407. (November)

Kane G. MIPS R2000 RISC Architecture. Englewood Cliffs, N.J.: Prentice Hall, 1986.

Kane G. PA-RISC 2.0 Architecture. Upper Saddle River, N.J: Prentice Hall, 1996.

Kane G., Heinrich J. MIPS RISC Architecture. Englewood Cliffs, N.J: Prentice Hall, 1992.

Katz R.H, Patterson D.A, Gibson G.A. Disk system architectures for high performance computing. Proc. IEEE. 1989;77(12):1842-1858. (December)

Keckler S.W, Dally W.J. Processor coupling: Integrating compile time and runtime scheduling for parallelism. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:202-213.

Keller R.M. Look-ahead processors. ACM Computing Surveys. 1975;7(4):177-195. (December)

Keltcher C.N, McGrath K.J., Ahmed A., Conway P. The AMD Opteron processor for multiprocessor servers. IEEE Micro. 2003;23(2):66-76. (March–April). dx.doi.org/10.1109.MM.2003.119116.

Kembel R. Fibre Channel: A comprehensive introduction. Internet Week. 2000. April

Kermani P., Kleinrock L. Virtual Cut-Through: A New Computer Communication Switching Technique. Computer Networks. 1979;3:267-286. (January)

Kessler R. The Alpha 21264 microprocessor. IEEE Micro. 1999;19(2):24-36. (March/April)

Kilburn T., Edwards D.B.G, Lanigan M.J, Sumner F.H. One-level storage system. IRE Trans. on Electronic Computers. 1962;EC-11:223-235. (April). Also appears in Siewiorek D.P., Bell C.G, Newell A. Computer Structures: Principles and Examples. New York: McGraw-Hill, 1982. 135–148

Killian E. MIPS R4000 technical overview–64 bits/100 MHz or bust. August 26–27, 1991, Stanford University, Palo Alto, Calif. Hot Chips III Symposium Record. 1991:1.6-1.19.

Kim M.Y. Synchronized disk interleaving. IEEE Trans. on Computers. 1986;C-35(11):978-988. (November)

Kissell K.D. MIPS16: High-density for the embedded market. June 15, 1997, Las Vegas, Nev. Proc. Real Time Systems ’97. 1997. see www.sgi.com/MIPS/arch/MIPS16/MIPS16.whitepaper.pdf.

Kitagawa K., Tagaya S., Hagihara Y., Kanoh Y. A hardware overview of SX-6 and SX-7 supercomputer. NEC Research & Development J.. 2003;44(1):2-7. (January)

Knuth D. The Art of Computer Programming, Vol. II, 2nd ed. Reading, Mass: Addison-Wesley, 1981.

Kogge P.M. The Architecture of Pipelined Computers. New York: McGraw-Hill, 1981.

Kohn L., Fu S.-W. A 1,000,000 transistor microprocessor. February 15–17, 1989, New York. Proc. of IEEE Int’l. Symposium on Solid State Circuits (ISSCC). 1989:54-55.

Kohn L., Margulis N. Introducing the Intel i860 64-Bit Microprocessor. IEEE Micro. 1989;9(4):15-30. (July)

Kontothanassis L., Hunt G., Stets R., Hardavellas N., Cierniak M., Parthasarathy S., Meira W., Dwarkadas S., Scott M. VM-based shared memory on low-latency, remote-memory-access networks. June 2–4, 1997, Denver, Colo. Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA). 1997.

Koren I. Computer Arithmetic Algorithms. Englewood Cliffs, N.J: Prentice Hall, 1989.

Kozyrakis C. [2000]. “Vector IRAM: A media-oriented vector processor with embedded DRAM,” paper presented at Hot Chips 12, August 13–15, 2000, Palo Alto, Calif, 13–15.

Kozyrakis C., Patterson D. Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks. November 18–22, 2002, Istanbul, Turkey. Proc. 35th Annual Int’l. Symposium on Microarchitecture (MICRO-35). 2002.

Kroft D. Lockup-free instruction fetch/prefetch cache organization. May 12–14, 1981, Minneapolis, Minn. Proc. Eighth Annual Int’l. Symposium on Computer Architecture (ISCA). 1981:81-87.

Kroft D. Retrospective: Lockup-free instruction fetch/prefetch cache organization. ACM, New York. 25 Years of the International Symposia on Computer Architecture. 1998:20-21. (Selected Papers)

Kuck D., Budnik P.P, Chen S.-C, Lawrie D.H, Towle R.A, Strebendt R.E, Davis E.WJr., Han J., Kraska P.W, Muraoka Y. Measurements of parallelism in ordinary FORTRAN programs. Computer. 1974;7(1):37-46. (January)

Kuhn D.R. Sources of failure in the public switched telephone network. IEEE Computer. 1997;30(4):31-36. (April)

Kumar A. The HP PA-8000 RISC CPU. IEEE Micro. 1997;17(2):27-32. (March/April)

Kunimatsu A., Ide N., Sato T., Endo Y., Murakami H., Kamei T., Hirano M., Ishihara F., Tago H., Oka M., Ohba A., Yutaka T., Okada T., Suzuoki M. Vector unit architecture for emotion synthesis. IEEE Micro. 2000;20(2):40-47. (March–April)

Kunkel S.R, Smith J.E. Optimal pipelining in supercomputers. June 2–5, 1986, Tokyo. Proc. 13th Annual Int’l. Symposium on Computer Architecture (ISCA). 1986:404-414.

Kurose J.F, Ross K.W. Computer Networking: A Top-Down Approach Featuring the Internet. Boston: Addison-Wesley, 2001.

Kuskin J., Ofelt D., Heinrich M., Heinlein J., Simoni R., Gharachorloo K., Chapin J., Nakahira D., Baxter J., Horowitz M., Gupta A., Rosenblum M., Hennessy J.L. The Stanford FLASH multiprocessor. April 18–21, 1994, Chicago. Proc. 21st Annual Int’l. Symposium on Computer Architecture (ISCA). 1994.

Lam M. Software pipelining: An effective scheduling technique for VLIW processors. June 22–24, 1988, Atlanta, Ga. SIGPLAN Conf. on Programming Language Design and Implementation. 1988:318-328.

Lam M.S, Wilson R.P. Limits of control flow on parallelism. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:46-57.

Lam M.S, Rothberg E.E, Wolf M.E. The cache performance and optimizations of blocked algorithms. April 8–11, 1991, Santa Clara, Calif. Proc. Fourth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1991:63-74. (SIGPLAN Notices 26:4 (April))

Lambright D. Experiences in measuring the reliability of a cache-based storage system. October 22, 2000, San Diego, Calif. Proc. of First Workshop on Industrial Experiences with Systems Software (WIESS 2000), Co-Located with the 4th Symposium on Operating Systems Design and Implementation (OSDI). 2000.

Lamport L. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. on Computers. 1979;C-28(9):241-248. (September)

Lang W., Patel J.M, Shankar S. Wimpy node clusters: What about non-wimpy workloads?. June 7, 2010, Indianapolis, Ind. Proc. Sixth International Workshop on Data Management on New Hardware (DaMoN). 2010.

Laprie J.-C. Dependable computing and fault tolerance: Concepts and terminology. June 19–21, 1985, Ann Arbor, Mich. Proc. 15th Annual Int’l. Symposium on Fault-Tolerant Computing. 1985:2-11.

Larson, E. R. [1973] “Findings of fact, conclusions of law, and order for judgment,” File No. 4-67, Civ. 138, Honeywell v. Sperry-Rand and Illinois Scientific Development, U.S. District Court for the State of Minnesota, Fourth Division (October 19).

Laudon J., Lenoski D. The SGI Origin: A ccNUMA highly scalable server. June 2–4, 1997, Denver, Colo. Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA). 1997:241-251.

Laudon J., Gupta A., Horowitz M. Interleaving: A multithreading technique targeting multiprocessors and workstations. October 4–7, 1994, San Jose, Calif. Proc. Sixth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1994:308-318.

Lauterbach G., Horel T. UltraSPARC-III: Designing third generation 64-bit performance. IEEE Micro. 19(3), 1999. (May/June)

Lazowska E.D, Zahorjan J., Graham G.S, Sevcik K.C. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Englewood Cliffs, N.J.: Prentice Hall. 1984. (Although out of print, it is available online at www.cs.washington.edu/homes/lazowska/qsp/.)

Lebeck A.R, Wood D.A. Cache profiling and the SPEC benchmarks: A case study. Computer. 1994;27(10):15-26. (October)

Lee R. Precision architecture. Computer. 1989;22(1):78-91. (January)

Lee W.V., et al. Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. June 19–23, 2010, Saint-Malo, France. Proc. 37th Annual Int’l. Symposium on Computer Architecture (ISCA). 2010.

Leighton F.T. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. San Francisco: Morgan Kaufmann, 1992.

Leiner A.L. System specifications for the DYSEAC. J. ACM. 1954;1(2):57-81. (April)

Leiner A.L, Alexander S.N. System organization of the DYSEAC. IRE Trans. of Electronic Computers. 1954;EC-3(1):1-10. (March)

Leiserson C.E. Fat trees: Universal networks for hardware-efficient supercomputing. IEEE Trans. on Computers. 1985;C-34(10):892-901. (October)

Lenoski D., Laudon J., Gharachorloo K., Gupta A., Hennessy J.L. The Stanford DASH multiprocessor. May 28–31, 1990, Seattle, Wash. Proc. 17th Annual Int’l. Symposium on Computer Architecture (ISCA). 1990:148-159.

Lenoski D., Laudon J., Gharachorloo K., Weber W.-D, Gupta A., Hennessy J.L, Horowitz M.A, Lam M. The Stanford DASH multiprocessor. IEEE Computer. 1992;25(3):63-79. (March)

Levy H., Eckhouse R. Computer Programming and Architecture: The VAX. Boston: Digital Press, 1989.

Li K. IVY: A shared virtual memory system for parallel computing. In Proc. 1988 Int’l. Conf. on Parallel Processing. University Park, Penn: Pennsylvania State University Press; 1988.

Li, S., K. Chen, J. B. Brockman, N. Jouppi [2011]. “Performance Impacts of Non-blocking Caches in Out-of-order Processors,” HP Labs Tech Report HPL-2011-65 (full text available at http://Library.hp.com/techpubs/2011/Hpl-2011-65.html).

Lim K., Ranganathan P., Chang J., Patel C., Mudge T., Reinhardt S. Understanding and designing new system architectures for emerging warehouse-computing environments. June 21–25, 2008, Beijing, China. Proc. 35th Annual Int’l. Symposium on Computer Architecture (ISCA). 2008.

Lincoln N.R. Technology and design trade offs in the creation of a modern supercomputer. IEEE Trans. on Computers. 1982;C-31(5):363-376. (May)

Lindholm T., Yellin F. The Java Virtual Machine Specification, 2nd ed. Reading, Mass: Addison-Wesley. 1999. (also available online at java.sun.com/docs/books/vmspec/).

Lipasti M.H, Shen J.P. Exceeding the dataflow limit via value prediction. December 2–4, 1996, Paris, France. Proc. 29th Int’l. Symposium on Microarchitecture. 1996.

Lipasti M.H, Wilkerson C.B, Shen J.P. Value locality and load value prediction. October 1–5, 1996, Cambridge, Mass. Proc. Seventh Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1996:138-147.

Liptay J.S. Structural aspects of the System/360 Model 85, Part II: The cache. IBM Systems J.. 1968;7(1):15-21.

Lo J., Barroso L., Eggers S., Gharachorloo K., Levy H., Parekh S. An analysis of database workload performance on simultaneous multithreaded processors. July 3–14, 1998, Barcelona, Spain. Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA). 1998:39-50.

Lo J., Eggers S., Emer J., Levy H., Stamm R., Tullsen D. Converting thread-level parallelism into instruction-level parallelism via simultaneous multithreading. ACM Trans. on Computer Systems. 1997;15(2):322-354. (August)

Lovett T., Thakkar S. The Symmetry multiprocessor system. University Park, Penn. Proc. 1988 Int’l. Conf. of Parallel Processing. 1988:303-310.

Lubeck O., Moore J., Mendez R. A benchmark comparison of three supercomputers: Fujitsu VP-200, Hitachi S810/20, and Cray X-MP/2. Computer. 1985;18(12):10-24. (December)

Luk C.-K., Mowry T.C. Automatic compiler-inserted prefetching for pointer-based applications. IEEE Trans. on Computers. 1999;48(2):134-141. (February)

Lunde A. Empirical evaluation of some features of instruction set processor architecture. Communications of the ACM. 1977;20(3):143-152. (March)

Luszczek, P., J. J. Dongarra, D. Koester, R. Rabenseifner, B. Lucas, J. Kepner, J. McCalpin, D. Bailey, D. Takahashi [2005]. “Introduction to the HPC challenge benchmark suite,” Lawrence Berkeley National Laboratory, Paper LBNL-57493 (April 25), repositories.cdlib.org/lbnl/LBNL-57493.

Maberly N.C. Mastering Speed Reading. New York: New American Library, 1966.

Magenheimer D.J, Peters L., Pettis K.W, Zuras D. Integer multiplication and division on the HP precision architecture. IEEE Trans. on Computers. 1988;37(8):980-990.

Mahlke S.A, Chen W.Y, Hwu W.-M, Rau B.R, Schlansker M.S. Sentinel scheduling for VLIW and superscalar processors. October 12–15, 1992, Boston. Proc. Fifth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1992:238-247.

Mahlke S.A, Hank R.E, McCormick J.E., August D.I, Hwu W.W. A comparison of full and partial predicated execution support for ILP processors. June 22–24, 1995, Santa Margherita, Italy. Proc. 22nd Annual Int’l. Symposium on Computer Architecture (ISCA). 1995:138-149.

Major J.B. Are queuing models within the grasp of the unwashed? December 11–15, 1989, Reno, Nev. Proc. Int’l. Conf. on Management and Performance Evaluation of Computer Systems. 1989:831-839.

Markstein P.W. Computation of elementary functions on the IBM RISC System/6000 processor. IBM J. Research and Development. 1990;34(1):111-119.

Mathis H.M, Mercias A.E, McCalpin J.D., Eickemeyer R.J, Kunkel S.R. Characterization of the multithreading (SMT) efficiency in Power5. IBM J. Research and Development. 2005;49(4/5):555-564. (July/September)

McCalpin J. STREAM: Sustainable Memory Bandwidth in High Performance Computers. www.cs.virginia.edu/stream/. 2005.

McCalpin, J., D. Bailey, D. Takahashi [2005]. Introduction to the HPC Challenge Benchmark Suite, Paper LBNL-57493 Lawrence Berkeley National Laboratory, University of California, Berkeley, repositories.cdlib.org/lbnl/LBNL-57493.

McCormick, J., and A. Knies [2002]. “A brief analysis of the SPEC CPU2000 benchmarks on the Intel Itanium 2 processor,” paper presented at Hot Chips 14, August 18–20, 2002, Stanford University, Palo Alto, Calif.

McFarling S. Program optimization for instruction caches. April 3–6, 1989, Boston. Proc. Third Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1989:183-191.

McFarling S. Combining Branch Predictors. Palo Alto, Calif: WRL Technical Note TN-36, Digital Western Research Laboratory, 1993.

McFarling S., Hennessy J. Reducing the cost of branches. June 2–5, 1986, Tokyo. Proc. 13th Annual Int’l. Symposium on Computer Architecture (ISCA). 1986:396-403.

McGhan H., O’Connor M. PicoJava: A direct execution engine for Java bytecode. Computer. 1998;31(10):22-30. (October)

McKeeman W.M. Language directed computer design. November 14–16, 1967, Washington, D.C. Proc. AFIPS Fall Joint Computer Conf. 1967:413-417.

McMahon, F. M. [1986]. “The Livermore FORTRAN Kernels: A Computer Test of Numerical Performance Range,” Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, University of California, Livermore.

McNairy C., Soltis D. Itanium 2 processor microarchitecture. IEEE Micro. 2003;23(2):44-55. (March–April)

Mead C., Conway L. Introduction to VLSI Systems. Reading, Mass: Addison-Wesley, 1980.

Mellor-Crummey J.M, Scott M.L. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. on Computer Systems. 1991;9(1):21-65. (February)

Menabrea L.F. Sketch of the analytical engine invented by Charles Babbage. Bibliothèque Universelle de Genève. 82, 1842. (October)

Menon A., Santos J.R., Turner Y., Janakiraman G., Zwaenepoel W. Diagnosing performance overheads in the Xen virtual machine environment. June 11–12, 2005, Chicago. Proc. First ACM/USENIX Int’l. Conf. on Virtual Execution Environments. 2005:13-23.

Merlin P.M, Schweitzer P.J. Deadlock avoidance in store-and-forward networks. Part I. Store-and-forward deadlock. IEEE Trans. on Communications. 1980;COM-28(3):345-354. (March)

Metcalfe R.M. Computer/network interface design: Lessons from Arpanet and Ethernet. IEEE J. on Selected Areas in Communications. 1993;11(2):173-180. (February)

Metcalfe R.M, Boggs D.R. Ethernet: Distributed packet switching for local computer networks. Communications of the ACM. 1976;19(7):395-404. (July)

Metropolis N., Howlett J., Rota G.C., editors. A History of Computing in the Twentieth Century. New York: Academic Press, 1980.

Meyer R.A, Seawright L.H. A virtual machine time sharing system. IBM Systems J. 1970;9(3):199-218.

Meyers G.J. The evaluation of expressions in a storage-to-storage architecture. Computer Architecture News. 1978;7(3):20-23. (October)

Meyers G.J. Advances in Computer Architecture, 2nd ed. New York: Wiley, 1982.

Micron. Calculating Memory System Power for DDR2. http://download.micron.com/pdf/pubs/designline/dl1Q04.pdf. 2004.

Micron. The Micron® System-Power Calculator. http://www.micron.com/systemcalc. 2006.

MIPS. MIPS16 Application Specific Extension Product Description. www.sgi.com/MIPS/arch/MIPS16/mips16.pdf. 1997.

Miranker G.S, Rubenstein J., Sanguinetti J. Squeezing a Cray-class supercomputer into a single-user package. February 29–March 4, 1988, San Francisco. Proc. IEEE COMPCON. 1988:452-456.

Mitchell D. The Transputer: The time is now. Computer Design (RISC suppl.). 1989:40-41.

Mitsubishi. Mitsubishi 32-Bit Single Chip Microcomputer M32R Family Software Manual. Cypress, Calif: Mitsubishi, 1996.

Miura K., Uchida K. FACOM vector processing system: VP100/200. June 20–22, 1983, Jülich, West Germany. Proc. NATO Advanced Research Workshop on High-Speed Computing. Also appears in Hwang K., editor. Superprocessors: Design and applications, IEEE. 1983:59-73. (August 1984)

Miya E.N. Multiprocessor/distributed processing bibliography. Computer Architecture News. 1985;13(1):27-29.

Montoye R.K, Hokenek E., Runyon S.L. Design of the IBM RISC System/6000 floating-point execution. IBM J. Research and Development. 1990;34(1):59-70.

Moore B., Padegs A., Smith R., Buchholz W. Concepts of the System/370 vector architecture. June 2–5, 1987, Pittsburgh, Penn. 14th Annual Int’l. Symposium on Computer Architecture (ISCA). 1987:282-292.

Moore G.E. Cramming more components onto integrated circuits. Electronics. 1965;38(8):114-117. (April 19)

Morse S., Ravenal B., Mazor S., Pohlman W. Intel microprocessors—8080 to 8086. Computer. 13(10), 1980. (October)

Moshovos A., Sohi G.S. Streamlining inter-operation memory communication via data dependence prediction. December 1–3, 1997, Research Triangle Park, N.C. Proc. 30th Annual Int’l. Symposium on Microarchitecture. 1997:235-245.

Moshovos A., Breach S., Vijaykumar T.N, Sohi G.S. Dynamic speculation and synchronization of data dependences. June 2–4, 1997, Denver, Colo. 24th Annual Int’l. Symposium on Computer Architecture (ISCA). 1997.

Moussouris J., Crudele L., Freitas D., Hansen C., Hudson E., Przybylski S., Riordan T., Rowen C. A CMOS RISC processor with integrated system functions. March 3–6, 1986, San Francisco. Proc. IEEE COMPCON. 1986:191.

Mowry T.C, Lam S., Gupta A. Design and evaluation of a compiler algorithm for prefetching. October 12–15, 1992, Boston (SIGPLAN Notices 27:9, September). Proc. Fifth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1992:62-73.

MSN Money. Amazon Shares Tumble after Rally Fizzles. http://moneycentral.msn.com/content/CNBCTV/Articles/Dispatches/P133695.asp. 2005.

Muchnick S.S. Optimizing compilers for SPARC. Sun Technology. 1988;1(3):64-77. (Summer)

Mueller M., Alves L.C, Fischer W., Fair M.L, Modi I. RAS strategy for IBM S/390 G5 and G6. IBM J. Research and Development. 1999;43(5–6):875-888. (September–November)

Mukherjee S.S, Weaver C., Emer J.S, Reinhardt S.K, Austin T.M. Measuring architectural vulnerability factors. IEEE Micro. 2003;23(6):70-75.

Murphy B., Gent T. Measuring system and software reliability using an automated data collection process. Quality and Reliability Engineering International. 1995;11(5):341-353. (September–October)

Myer T.H, Sutherland I.E. On the design of display processors. Communications of the ACM. 1968;11(6):410-414. (June)

Narayanan D., Thereska E., Donnelly A., Elnikety S., Rowstron A. Migrating server storage to SSDs: Analysis of trade-offs. April 1–3, 2009, Nuremberg, Germany. Proc. 4th ACM European Conf. on Computer Systems. 2009.

National Research Council. The Evolution of Untethered Communications, Computer Science and Telecommunications Board. Washington, D.C.: National Academy Press, 1997.

National Storage Industry Consortium. Tape Roadmap. www.nsic.org. 1998.

Nelson V.P. Fault-tolerant computing: Fundamental concepts. Computer. 1990;23(7):19-25. (July)

Ngai T.-F., Irwin M.J. Regular, area-time efficient carry-lookahead adders. June 4–6, 1985, University of Illinois, Urbana. Proc. Seventh IEEE Symposium on Computer Arithmetic. 1985:9-15.

Nicolau A., Fisher J.A. Measuring the parallelism available for very long instruction word architectures. IEEE Trans. on Computers. 1984;C-33(11):968-976. (November)

Nikhil R.S, Papadopoulos G.M, Arvind. *T: A multithreaded massively parallel architecture. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:156-167.

Noordergraaf L., van der Pas R. Performance experiences on Sun’s WildFire prototype. November 13–19, 1999, Portland, Ore. Proc. ACM/IEEE Conf. on Supercomputing. 1999.

Nyberg C.R, Barclay T., Cvetanovic Z., Gray J., Lomet D. AlphaSort: A RISC machine sort. May 24–27, 1994, Minneapolis, Minn. Proc. ACM SIGMOD. 1994.

Oka M., Suzuoki M. Designing and programming the emotion engine. IEEE Micro. 1999;19(6):20-28. (November–December)

Okada S., Okada S., Matsuda Y., Yamada T., Kobayashi A. System on a chip for digital still camera. IEEE Trans. on Consumer Electronics. 1999;45(3):584-590. (August)

Oliker L., Canning A., Carter J., Shalf J., Ethier S. Scientific computations on modern parallel vector systems. November 6–12, 2004, Pittsburgh, Penn. Proc. ACM/IEEE Conf. on Supercomputing. 2004:10.

Pabst T. Performance Showdown at 133 MHz FSB—The Best Platform for Coppermine. www6.tomshardware.com/mainboard/00q1/000302/. 2000.

Padua D., Wolfe M. Advanced compiler optimizations for supercomputers. Communications of the ACM. 1986;29(12):1184-1201. (December)

Palacharla S., Kessler R.E. Evaluating stream buffers as a secondary cache replacement. April 18–21, 1994, Chicago. Proc. 21st Annual Int’l. Symposium on Computer Architecture (ISCA). 1994:24-33.

Palmer J., Morse S. The 8087 Primer. New York: John Wiley & Sons, 1984.

Pan S.-T., So K., Rahmeh J.T. Improving the accuracy of dynamic branch prediction using branch correlation. October 12–15, 1992, Boston. Proc. Fifth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1992:76-84.

Partridge C. Gigabit Networking. Reading, Mass: Addison-Wesley, 1994.

Patterson D. Reduced instruction set computers. Communications of the ACM. 1985;28(1):8-21. (January)

Patterson D. Latency lags bandwidth. Communications of the ACM. 2004;47(10):71-75. (October)

Patterson D.A, Ditzel D.R. The case for the reduced instruction set computer. Computer Architecture News. 1980;8(6):25-33. (October)

Patterson D.A, Hennessy J.L. Computer Organization and Design: The Hardware/Software Interface, 3rd ed. San Francisco: Morgan Kaufmann, 2004.

Patterson, D. A., G. A. Gibson, and R. H. Katz [1987]. A Case for Redundant Arrays of Inexpensive Disks (RAID), Tech. Rep. UCB/CSD 87/391, University of California, Berkeley. Also appeared in Proc. ACM SIGMOD, June 1–3, 1988, Chicago, 109–116.

Patterson D.A, Garrison P., Hill M., Lioupis D., Nyberg C., Sippel T., Van Dyke K. Architecture of a VLSI instruction cache for a RISC. June 13–16, 1983, Stockholm, Sweden. 10th Annual Int’l. Conf. on Computer Architecture Conf. Proc. 1983:108-116.

Pavan P., Bez R., Olivo P., Zanoni E. Flash memory cells—an overview. Proc. IEEE. 1997;85(8):1248-1271. (August)

Peh L.S, Dally W.J. A delay model and speculative architecture for pipelined routers. January 22–24, 2001, Monterrey, Mexico. Proc. 7th Int’l. Symposium on High-Performance Computer Architecture. 2001.

Peng V., Samudrala S., Gavrielov M. On the implementation of shifters, multipliers, and dividers in VLSI floating point units. May 19–21, 1987, Como, Italy. Proc. 8th IEEE Symposium on Computer Arithmetic. 1987:95-102.

Pfister G.F. In Search of Clusters, 2nd ed. Upper Saddle River, N.J.: Prentice Hall, 1998.

Pfister G.F, Brantley W.C, George D.A, Harvey S.L, Kleinfelder W.J, McAuliffe K.P., Melton E.A, Norton V.A, Weiss J. The IBM research parallel processor prototype (RP3): Introduction and architecture. June 17–19, 1985, Boston, Mass. Proc. 12th Annual Int’l. Symposium on Computer Architecture (ISCA). 1985:764-771.

Pinheiro E., Weber W.D, Barroso L.A. Failure trends in a large disk drive population. February 13–16, 2007, San Jose, Calif. Proc. 5th USENIX Conference on File and Storage Technologies (FAST ’07). 2007.

Pinkston T.M. Deadlock characterization and resolution in interconnection networks. In: Zhou M.C., Fanti M.P., editors. Deadlock Resolution in Computer-Integrated Systems. Boca Raton, FL: CRC Press; 2004:445-492.

Pinkston T.M, Shin J. Trends toward on-chip networked microsystems. Int’l. J. of High Performance Computing and Networking. 2005;3(1):3-18.

Pinkston T.M, Warnakulasuriya S. On deadlocks in interconnection networks. June 2–4, 1997, Denver, Colo. 24th Annual Int’l. Symposium on Computer Architecture (ISCA). 1997.

Pinkston T.M, Benner A., Krause M., Robinson I., Sterling T. InfiniBand: The ‘de facto’ future standard for system and local area networks or just a scalable replacement for PCI buses? Cluster Computing (special issue on communication architecture for clusters). 2003;6(2):95-104. (April)

Postiff M.A, Greene D.A, Tyson G.S, Mudge T.N. The limits of instruction level parallelism in SPEC95 applications. Computer Architecture News. 1999;27(1):31-40. (March)

Przybylski S.A. Cache Design: A Performance-Directed Approach. San Francisco: Morgan Kaufmann, 1990.

Przybylski S.A, Horowitz M., Hennessy J.L. Performance trade-offs in cache design. May 30–June 2, 1988, Honolulu, Hawaii. 15th Annual Int’l. Symposium on Computer Architecture. 1988:290-298.

Puente V., Beivide R., Gregorio J.A, Prellezo J.M, Duato J., Izu C. Adaptive bubble router: A design to improve performance in torus networks. September 21–24, 1999, Aizu-Wakamatsu, Fukushima, Japan. Proc. 28th Int’l. Conference on Parallel Processing. 1999.

Radin G. The 801 minicomputer. March 1–3, 1982, Palo Alto, Calif. Proc. Symposium Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1982:39-47.

Bordawekar R., Bondhugula U., Rao R. Believe it or not!: Multi-core CPUs can match GPU performance for a FLOP-intensive application! September 11–15, 2010, Vienna, Austria. Proc. 19th Int’l. Conf. on Parallel Architecture and Compilation Techniques (PACT 2010). 2010:537-538.

Ramamoorthy C.V, Li H.F. Pipeline architecture. ACM Computing Surveys. 1977;9(1):61-102. (March)

Ranganathan P., Leech P., Irwin D., Chase J. Ensemble-Level Power Management for Dense Blade Servers. June 17–21, 2006, Boston, Mass. Proc. 33rd Annual Int’l. Symposium on Computer Architecture (ISCA). 2006:66-77.

Rau B.R. Iterative modulo scheduling: An algorithm for software pipelining loops. November 30–December 2, 1994, San Jose, Calif. Proc. 27th Annual Int’l. Symposium on Microarchitecture. 1994:63-74.

Rau B.R, Glaeser C.D, Picard R.L. Efficient code generation for horizontal architectures: Compiler techniques and architectural support. April 26–29, 1982, Austin, Tex. Proc. Ninth Annual Int’l. Symposium on Computer Architecture (ISCA). 1982:131-139.

Rau B.R, Yen D.W.L, Yen W., Towle R.A. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. IEEE Computer. 1989;22(1):12-34. (January)

Reddi V.J., Lee B.C, Chilimbi T., Vaid K. Web search using mobile cores: Quantifying and mitigating the price of efficiency. June 19–23, 2010, Saint-Malo, France. Proc. 37th Annual Int’l. Symposium on Computer Architecture (ISCA). 2010.

Redmond K.C, Smith T.M. Project Whirlwind—The History of a Pioneer Computer. Boston: Digital Press, 1980.

Reinhardt S.K, Larus J.R, Wood D.A. Tempest and Typhoon: User-level shared memory. April 18–21, 1994, Chicago. 21st Annual Int’l. Symposium on Computer Architecture (ISCA). 1994:325-336.

Reinman G., Jouppi N.P. Extensions to CACTI. research.compaq.com/wrl/people/jouppi/CACTI.html. 1999.

Rettberg R.D, Crowther W.R, Carvey P.P, Tomlinson R.S. The Monarch parallel processor hardware design. IEEE Computer. 1990;23(4):18-30. (April)

Riemens A., Vissers K.A, Schutten R.J, Sijstermans F.W, Hekstra G.J, La Hei G.D. Trimedia CPU64 application domain and benchmark suite. October 10–13, 1999, Austin, Tex. Proc. IEEE Int’l. Conf. on Computer Design: VLSI in Computers and Processors (ICCD’99). 1999:580-585.

Riseman E.M, Foster C.C. Percolation of code to enhance paralleled dispatching and execution. IEEE Trans. on Computers. 1972;C-21(12):1411-1415. (December)

Robin J., Irvine C. Analysis of the Intel Pentium’s ability to support a secure virtual machine monitor. August 14–17, 2000, Denver, Colo. Proc. USENIX Security Symposium. 2000.

Robinson B., Blount L. The VM/HPO 3880-23 Performance Results. Gaithersburg, Md: IBM Tech. Bulletin GG66-0247-00, IBM Washington Systems Center, 1986.

Ropers, A., H. W. Lollman, J. Wellhausen [1999]. DSPstone: Texas Instruments TMS320C54x, Tech. Rep. IB 315 1999/9-ISS-Version 0.9, Aachen University of Technology, Aachen, Germany (www.ert.rwth-aachen.de/Projekte/Tools/coal/dspstone_c54x/index.html).

Rosenblum M., Herrod S.A, Witchel E., Gupta A. Complete computer simulation: The SimOS approach. IEEE Parallel and Distributed Technology (now called Concurrency). 1995;4(3):34-43.

Rowen C., Johnson M., Ries P. The MIPS R3010 floating-point coprocessor. IEEE Micro. 1988;8(3):53-62. (June)

Russell R.M. The Cray-1 processor system. Communications of the ACM. 1978;21(1):63-72. (January)

Rymarczyk J. Coding guidelines for pipelined processors. March 1–3, 1982, Palo Alto, Calif. Proc. Symposium Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1982:12-19.

Saavedra-Barrera, R. H. [1992]. “CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking,” Ph.D. dissertation, University of California, Berkeley.

Salem K., Garcia-Molina H. Disk striping. February 5–7, 1986, Washington, D.C. Proc. 2nd Int’l. IEEE Conf. on Data Engineering. 1986:249-259.

Saltzer J.H, Reed D.P, Clark D.D. End-to-end arguments in system design. ACM Trans. on Computer Systems. 1984;2(4):277-288. (November)

Samples A. D., and P. N. Hilfinger [1988]. Code Reorganization for Instruction Caches, Tech. Rep. UCB/CSD 88/447, University of California, Berkeley.

Santoro M.R, Bewick G., Horowitz M.A. Rounding algorithms for IEEE multipliers. September 6–8, 1989, Santa Monica, Calif. Proc. Ninth IEEE Symposium on Computer Arithmetic. 1989:176-183.

Satran J., Smith D., Meth K., Sapuntzakis C., Wakeley M., Von Stamwitz P., Haagens R., Zeidner E., Dalle Ore L., Klein Y. iSCSI. IPS Working Group of IETF. 2001. Internet draft www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-07.txt.

Saulsbury A., Wilkinson T., Carter J., Landin A. An argument for Simple COMA. January 22–25, 1995, Raleigh, N.C. Proc. First IEEE Symposium on High-Performance Computer Architectures. 1995:276-285.

Schneck P.B. Superprocessor Architecture. Norwell, Mass: Kluwer Academic Publishers, 1987.

Schroeder B., Gibson G.A. Understanding failures in petascale computers. J. of Physics Conf. Series. 2007;78(1):188-198.

Schroeder B., Pinheiro E., Weber W.-D. DRAM errors in the wild: a large-scale field study. June 15–19, 2009, Seattle, Wash. Proc. Eleventh Int’l. Joint Conf. on Measurement and Modeling of Computer Systems (SIGMETRICS). 2009.

Schurman E., Brutlag J. The user and business impact of server delays. June 22–24, 2009, San Jose, Calif. Proc. Velocity: Web Performance and Operations Conf. 2009.

Schwartz J.T. Ultracomputers. ACM Trans. on Programming Languages and Systems. 1980;2(4):484-521.

Scott N.R. Computer Number Systems and Arithmetic. Englewood Cliffs, N.J.: Prentice Hall, 1985.

Scott S.L. Synchronization and communication in the T3E multiprocessor. October 1–5, 1996, Cambridge, Mass. Seventh Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1996.

Scott S.L, Goodman J. The impact of pipelined channels on k-ary n-cube networks. IEEE Trans. on Parallel and Distributed Systems. 1994;5(1):1-16. (January)

Scott S.L, Thorson G.M. The Cray T3E network: Adaptive routing in a high performance 3D torus. August 15–17, 1996, Stanford University, Palo Alto, Calif. Proc. IEEE HOT Interconnects ’96. 1996:14-156.

Scranton R. A., D. A. Thompson, and D. W. Hunter [1983]. “The Access Time Myth,” Tech. Rep. RC 10197 (45223), IBM, Yorktown Heights, N.Y.

Seagate. [2000]. Seagate Cheetah 73 Family: ST173404LW/LWV/LC/LCV Product Manual, Vol. 1, Seagate, Scotts Valley, Calif. (www.seagate.com/support/disc/manuals/scsi/29478b.pdf).

Seitz C.L. The Cosmic Cube (concurrent computing). Communications of the ACM. 1985;28(1):22-33. (January)

Senior J.M. Optical Fiber Communications: Principles and Practice, 2nd ed. Hertfordshire, U.K.: Prentice Hall, 1993.

Sharangpani H., Arora K. Itanium Processor Microarchitecture. IEEE Micro. 2000;20(5):24-43. (September–October)

Shurkin J. Engines of the Mind: A History of the Computer. New York: W.W. Norton, 1984.

Shustek L. J. [1978]. “Analysis and Performance of Computer Instruction Sets,” Ph.D. dissertation, Stanford University, Palo Alto, Calif.

Silicon Graphics. [1996]. MIPS V Instruction Set (see www.sgi.com/MIPS/arch/ISA5/#MIPSV_indx).

Singh J.P, Hennessy J.L, Gupta A. Scaling parallel programs for multiprocessors: Methodology and examples. Computer. 1993;26(7):22-33. (July)

Sinharoy B., Kalla R.N, Tendler J.M, Eickemeyer R.J, Joyner J.B. POWER5 system microarchitecture. IBM J. Research and Development. 2005;49(4–5):505-521.

Sites, R. [1979]. Instruction Ordering for the CRAY-1 Computer, Tech. Rep. 78-CS-023, Dept. of Computer Science, University of California, San Diego.

Sites R.L., editor. Alpha Architecture Reference Manual. Burlington, Mass: Digital Press, 1992.

Sites R.L, Witek R., editors. Alpha Architecture Reference Manual, 2nd ed. Newton, Mass: Digital Press, 1995.

Skadron K., Clark D.W. Design issues and tradeoffs for write buffers. February 1–5, 1997, San Antonio, Tex. Proc. Third Int’l. Symposium on High-Performance Computer Architecture. 1997:144-155.

Skadron K., Ahuja P.S, Martonosi M., Clark D.W. Branch prediction, instruction-window size, and cache size: Performance tradeoffs and simulation techniques. IEEE Trans. on Computers. 48(11), 1999. (November)

Slater R. Portraits in Silicon. Cambridge, Mass: MIT Press, 1987.

Slotnick D.L, Borck W.C, McReynolds R.C. The Solomon computer. December 4–6, 1962, Philadelphia, Penn. Proc. AFIPS Fall Joint Computer Conf. 1962:97-107.

Smith A.J. Cache memories. Computing Surveys. 1982;14(3):473-530. (September)

Smith A., Lee J. Branch prediction strategies and branch-target buffer design. Computer. 1984;17(1):6-22. (January)

Smith B.J. A pipelined, shared resource MIMD computer. August, Bellaire, Mich. Proc. Int’l. Conf. on Parallel Processing (ICPP). 1978:6-8.

Smith B.J. Architecture and applications of the HEP multiprocessor system. Real-Time Signal Processing IV. 1981;298:241-248. (August)

Smith J.E. A study of branch prediction strategies. May 12–14, 1981, Minneapolis, Minn. Proc. Eighth Annual Int’l. Symposium on Computer Architecture (ISCA). 1981:135-148.

Smith J.E. Decoupled access/execute computer architectures. ACM Trans. on Computer Systems. 1984;2(4):289-308. (November)

Smith J.E. Characterizing computer performance with a single number. Communications of the ACM. 1988;31(10):1202-1206. (October)

Smith J.E. Dynamic instruction scheduling and the Astronautics ZS-1. Computer. 1989;22(7):21-35. (July)

Smith J.E, Goodman J.R. A study of instruction cache organizations and replacement policies. June 5–7, 1982, Stockholm, Sweden. Proc. 10th Annual Int’l. Symposium on Computer Architecture (ISCA). 1982:132-137.

Smith J.E, Pleszkun A.R. Implementing precise interrupts in pipelined processors. IEEE Trans. on Computers. 1988;37(5):562-573. (May) (This paper is based on an earlier paper that appeared in Proc. 12th Annual Int’l. Symposium on Computer Architecture (ISCA), June 17–19, 1985, Boston, Mass.)

Smith J.E, Dermer G.E, Vanderwarn B.D, Klinger S.D, Rozewski C.M, Fowler D.L, Scidmore K.R, Laudon J.P. The ZS-1 central processor. October 5–8, 1987, Palo Alto, Calif. Proc. Second Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1987:199-204.

Smith M.D, Horowitz M., Lam M.S. Efficient superscalar performance through boosting. October 12–15, 1992, Boston. Proc. Fifth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1992:248-259.

Smith M.D, Johnson M., Horowitz M.A. Limits on multiple instruction issue. April 3–6, 1989, Boston. Proc. Third Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1989:290-302.

Smotherman M. A sequencing-based taxonomy of I/O systems and review of historical machines. Computer Architecture News. 1989;17(5):5-15 (September). Reprinted in Hill M.D, Jouppi N.P, Sohi G.S., editors. Computer Architecture Readings. San Francisco: Morgan Kaufmann. 1999:451-461.

Sodani A., Sohi G. Dynamic instruction reuse. June 2–4, 1997, Denver, Colo. Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA). 1997.

Sohi G.S. Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers. IEEE Trans. on Computers. 1990;39(3):349-359. (March)

Sohi G.S, Vajapeyam S. Tradeoffs in instruction format design for horizontal architectures. April 3–6, 1989, Boston. Proc. Third Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1989:15-25.

Soundararajan V., Heinrich M., Verghese B., Gharachorloo K., Gupta A., Hennessy J.L. Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors. July 3–14, 1998, Barcelona, Spain. Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA). 1998:342-355.

SPEC. [1989]. SPEC Benchmark Suite Release 1.0 (October 2).

SPEC. [1994]. SPEC Newsletter (June).

Sporer M., Moss F.H, Mathais C.J. An introduction to the architecture of the Stellar Graphics supercomputer. February 29–March 4, 1988, San Francisco. Proc. IEEE COMPCON. 1988:464.

Spurgeon C. Charles Spurgeon’s Ethernet Web Site. wwwhost.ots.utexas.edu/ethernet/ethernet-home.html. 2001.

Spurgeon C. Charles Spurgeon’s Ethernet Web Site. www.ethermanage.com/ethernet/ethernet.html. 2006.

Stenström P., Joe T., Gupta A. Comparative performance evaluation of cache-coherent NUMA and COMA architectures. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:80-91.

Sterling T. Beowulf PC Cluster Computing with Windows and Beowulf PC Cluster Computing with Linux. Cambridge, Mass: MIT Press, 2001.

Stern N. Who invented the first electronic digital computer? Annals of the History of Computing. 1980;2(4):375-376. (October)

Stevens W.R. TCP/IP Illustrated (three volumes). Reading, Mass: Addison-Wesley, 1994–1996.

Stokes J. Sound and Vision: A Technical Overview of the Emotion Engine. arstechnica.com/reviews/1q00/playstation2/ee-1.html. 2000.

Stone H. High Performance Computers. New York: Addison-Wesley, 1991.

Strauss W. DSP Strategies 2002. www.usadata.com/market_research/spr_05/spr_r127-005.htm. 1998.

Strecker W.D. Cache memories for the PDP-11? January 19–21, 1976, Tampa, Fla. Proc. Third Annual Int’l. Symposium on Computer Architecture (ISCA). 1976:155-158.

Strecker W.D. VAX-11/780: A virtual address extension of the PDP-11 family. June 5–8, 1978, Anaheim, Calif., 47. Proc. AFIPS National Computer Conf. 1978:967-980.

Sugumar R.A, Abraham S.G. Efficient simulation of caches under optimal replacement with applications to miss characterization. May 17–21, 1993, Santa Clara, Calif. Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems. 1993:24-35.

Sun Microsystems. [1989]. The SPARC Architectural Manual, Version 8, Part No. 8001399-09, Sun Microsystems, Santa Clara, Calif.

Sussenguth E. IBM’s ACS-1 Machine. IEEE Computer. 22(11), 1999. (November)

Swan R.J, Fuller S.H, Siewiorek D.P. Cm*—a modular, multi-microprocessor. June 13–16, 1977, Dallas, Tex. Proc. AFIPS National Computing Conf. 1977:637-644.

Swan R.J, Bechtolsheim A., Lai K.W, Ousterhout J.K. The implementation of the Cm* multi-microprocessor. June 13–16, 1977, Dallas, Tex. Proc. AFIPS National Computing Conf. 1977:645-654.

Swartzlander E., editor. Computer Arithmetic. Los Alamitos, Calif: IEEE Computer Society Press, 1990.

Takagi N., Yasuura H., Yajima S. High-speed VLSI multiplication algorithm with a redundant binary addition tree. IEEE Trans. on Computers. 1985;C-34(9):789-796.

Talagala, N. [2000]. “Characterizing Large Storage Systems: Error Behavior and Performance Benchmarks,” Ph.D. dissertation, Computer Science Division, University of California, Berkeley.

Talagala, N., and D. Patterson [1999]. An Analysis of Error Behavior in a Large Storage System, Tech. Report UCB/CSD-99-1042, Computer Science Division, University of California, Berkeley.

Talagala N., Arpaci-Dusseau R., Patterson D. Micro-Benchmark Based Extraction of Local and Global Disk Characteristics. University of California, Berkeley: CSD-99-1063, Computer Science Division, 2000.

Talagala N., Asami S., Patterson D., Futernick R., Hart D. The art of massive storage: A case study of a Web image archive. Computer. 2000. (November)

Tamir Y., Frazier G. Dynamically-allocated multi-queue buffers for VLSI communication switches. IEEE Trans. on Computers. 1992;41(6):725-734. (June)

Tanenbaum A.S. Implications of structured programming for machine architecture. Communications of the ACM. 1978;21(3):237-246. (March)

Tanenbaum A.S. Computer Networks, 2nd ed. Englewood Cliffs, N.J: Prentice Hall, 1988.

Tang C.K. Cache design in the tightly coupled multiprocessor system. June 7–10, 1976, New York. Proc. AFIPS National Computer Conf. 1976:749-753.

Tanqueray D. The Cray X1 and supercomputer road map. December 11–12, 2002, Daresbury Laboratories, Daresbury, Cheshire, U.K. Proc. 13th Daresbury Machine Evaluation Workshop. 2002.

Tarjan, D., S. Thoziyoor, and N. Jouppi [2005]. “HPL Technical Report on CACTI 4.0,” www.hpl.hp.com/techreports/2006/HPL-2006-86.html.

Taylor G.S. Compatible hardware for division and square root. May 18–19, 1981, University of Michigan, Ann Arbor, Mich. Proc. 5th IEEE Symposium on Computer Arithmetic. 1981:127-134.

Taylor G.S. Radix 16 SRT dividers with overlapped quotient selection stages. June 4–6, 1985, University of Illinois, Urbana, Ill. Proc. Seventh IEEE Symposium on Computer Arithmetic. 1985:64-71.

Taylor G., Hilfinger P., Larus J., Patterson D., Zorn B. Evaluation of the SPUR LISP architecture. June 2–5, 1986, Tokyo. Proc. 13th Annual Int’l. Symposium on Computer Architecture (ISCA). 1986.

Taylor M.B, Lee W., Amarasinghe S.P, Agarwal A. Scalar operand networks. IEEE Trans. on Parallel and Distributed Systems. 2005;16(2):145-162. (February)

Tendler J.M, Dodson J.S, Fields J.S. Jr., Le H., Sinharoy B. Power4 system microarchitecture. IBM J. Research and Development. 2002;46(1):5-26.

Texas Instruments. History of Innovation: 1980s. www.ti.com/corp/docs/company/history/1980s.shtml. 2000.

Tezzaron Semiconductor. [2004]. Soft Errors in Electronic Memory, White Paper, Tezzaron Semiconductor, Naperville, Ill. (http://www.tezzaron.com/about/papers/soft_errors_1_1_secure.pdf).

Thacker C.P, McCreight E.M., Lampson B.W, Sproull R.F, Boggs D.R. Alto: A personal computer. In: Siewiorek D.P., Bell C.G, Newell A., editors. Computer Structures: Principles and Examples. New York: McGraw-Hill; 1982:549-572.

Thadhani A.J. Interactive user productivity. IBM Systems J. 1981;20(4):407-423.

Thekkath R., Singh A.P, Singh J.P, John S., Hennessy J.L. An evaluation of a commercial CC-NUMA architecture—the CONVEX Exemplar SPP1200. April 1–7, 1997, Geneva, Switzerland. Proc. 11th Int’l. Parallel Processing Symposium (IPPS). 1997.

Thorlin J.F. Code generation for PIE (parallel instruction execution) computers. April 18–20, 1967, Atlantic City, N.J. Proc. Spring Joint Computer Conf. 1967:27.

Thornton J.E. Parallel operation in the Control Data 6600. October 27–29, 1964, San Francisco. Proc. AFIPS Fall Joint Computer Conf., Part II. 1964;26:33-40.

Thornton J.E. Design of a Computer, the Control Data 6600. Glenview, Ill: Scott, Foresman, 1970.

Tjaden G.S, Flynn M.J. Detection and parallel execution of independent instructions. IEEE Trans. on Computers. 1970;C-19(10):889-895.

Tomasulo R.M. An efficient algorithm for exploiting multiple arithmetic units. IBM J. Research and Development. 1967;11(1):25-33. (January)

Torrellas J., Gupta A., Hennessy J. Characterizing the caching and synchronization performance of a multiprocessor operating system. October 12–15, 1992, Boston (SIGPLAN Notices 27:9 (September)). Proc. Fifth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1992:162-174.

Touma W.R. The Dynamics of the Computer Industry: Modeling the Supply of Workstations and Their Components. Boston: Kluwer Academic, 1993.

Tuck N., Tullsen D. Initial observations of the simultaneous multithreading Pentium 4 processor. September 27–October 1, 2003, New Orleans, La. Proc. 12th Int. Conf. on Parallel Architectures and Compilation Techniques (PACT’03). 2003:26-34.

Tullsen D.M, Eggers S.J, Levy H.M. Simultaneous multithreading: Maximizing on-chip parallelism. June 22–24, 1995, Santa Margherita, Italy. Proc. 22nd Annual Int’l. Symposium on Computer Architecture (ISCA). 1995:392-403.

Tullsen D.M, Eggers S.J, Emer J.S, Levy H.M, Lo J.L, Stamm R.L. Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. May 22–24, 1996, Philadelphia, Penn. Proc. 23rd Annual Int’l. Symposium on Computer Architecture (ISCA). 1996:191-202.

Ungar D., Blau R., Foley P., Samples D., Patterson D. Architecture of SOAR: Smalltalk on a RISC. June 5–7, 1984, Ann Arbor, Mich. Proc. 11th Annual Int’l. Symposium on Computer Architecture (ISCA). 1984:188-197.

Unger S.H. A computer oriented towards spatial problems. Proc. Institute of Radio Engineers. 1958;46(10):1744-1750. (October)

Vahdat A., Al-Fares M., Farrington N., Niranjan Mysore R., Porter G., Radhakrishnan S. Scale-Out Networking in the Data Center. IEEE Micro. 2010;30(4):29-41. (July/August)

Vaidya A.S., Sivasubramaniam A., Das C.R. Performance benefits of virtual channels and adaptive routing: An application-driven study. November 16–21, 1997, San Jose, Calif. Proc. ACM/IEEE Conf. on Supercomputing. 1997.

Vajapeyam, S. [1991]. “Instruction-Level Characterization of the Cray Y-MP Processor,” Ph.D. thesis, Computer Sciences Department, University of Wisconsin-Madison.

van Eijndhoven J.T.J., Sijstermans F.W, Vissers K.A, Pol E.J.D, Tromp M.I.A, Struik P., Bloks R.H.J, van der Wolf P., Pimentel A.D, Vranken H.P.E. Trimedia CPU64 architecture. October 10–13, 1999, Austin, Tex. Proc. IEEE Int’l. Conf. on Computer Design: VLSI in Computers and Processors (ICCD’99). 1999:586-592.

Van Vleck T. The IBM 360/67 and CP/CMS. http://www.multicians.org/thvv/360-67.html. 2005.

von Eicken T., Culler D.E, Goldstein S.C, Schauser K.E. Active Messages: A mechanism for integrated communication and computation. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992.

Waingold E., Taylor M., Srikrishna D., Sarkar V., Lee W., Lee V., Kim J., Frank M., Finch P., Barua R., Babb J., Amarasinghe S., Agarwal A. Baring it all to software: Raw Machines. IEEE Computer. 1997;30:86-93. (September)

Wakerly J. Microcomputer Architecture and Programming. New York: Wiley, 1989.

Wall D.W. Limits of instruction-level parallelism. April 8–11, 1991, Palo Alto, Calif. Proc. Fourth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1991:248-259.

Wall D.W. Limits of Instruction-Level Parallelism. Palo Alto, Calif: Research Rep. 93/6, Western Research Laboratory, Digital Equipment Corp., 1993.

Walrand J. Communication Networks: A First Course. Homewood, Ill: Aksen Associates/Irwin, 1991.

Wang W.-H., Baer J.-L, Levy H.M. Organization and performance of a two-level virtual-real cache hierarchy. May 28–June 1, 1989, Jerusalem. Proc. 16th Annual Int’l. Symposium on Computer Architecture (ISCA). 1989:140-148.

Watanabe T. Architecture and performance of the NEC supercomputer SX system. Parallel Computing. 1987;5:247-255.

Waters F., editor. IBM RT Personal Computer Technology. Austin, Tex: SA 23-1057,0 IBM, 1986.

Watson W.J. The TI ASC—a highly modular and flexible super processor architecture. December 5–7, 1972, Anaheim, Calif. Proc. AFIPS Fall Joint Computer Conf. 1972:221-228.

Weaver D.L, Germond T. The SPARC Architecture Manual, Version 9. Englewood Cliffs, N.J.: Prentice Hall, 1994.

Weicker R.P. Dhrystone: A synthetic systems programming benchmark. Communications of the ACM. 1984;27(10):1013-1030. (October)

Weiss S., Smith J.E. Instruction issue logic for pipelined supercomputers. June 5–7, 1984, Ann Arbor, Mich. Proc. 11th Annual Int’l. Symposium on Computer Architecture (ISCA). 1984:110-118.

Weiss S., Smith J.E. A study of scalar compilation techniques for pipelined supercomputers. October 5–8, 1987, Palo Alto, Calif. Proc. Second Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1987:105-109.

Weiss S., Smith J.E. Power and PowerPC. San Francisco: Morgan Kaufmann, 1994.

Wendel D., Kalla R., Friedrich J., Kahle J., Leenstra J., Lichtenau C., Sinharoy B., Starke W., Zyuban V. The Power7 processor SoC. June 2–4, 2010, Grenoble, France. Proc. Int’l. Conf. on IC Design and Technology. 2010:71-73.

Weste N., Eshraghian K. Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, Mass: Addison-Wesley, 1993.

Wiecek C. A case study of the VAX 11 instruction set usage for compiler execution. March 1–3, 1982, Palo Alto, Calif. Proc. Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1982:177-184.

Wilkes M. Slave memories and dynamic storage allocation. IEEE Trans. Electronic Computers. 1965;EC-14(2):270-271. (April)

Wilkes M.V. Hardware support for memory protection: Capability implementations. March 1–3, 1982, Palo Alto, Calif. Proc. Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1982:107-116.

Wilkes M.V. Memoirs of a Computer Pioneer. Cambridge, Mass: MIT Press, 1985.

Wilkes M.V. Computing Perspectives. San Francisco: Morgan Kaufmann, 1995.

Wilkes M.V, Wheeler D.J, Gill S. The Preparation of Programs for an Electronic Digital Computer. Cambridge, Mass: Addison-Wesley, 1951.

Williams S., Waterman A., Patterson D. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM. 2009;52(4):65-76. (April)

Williams T.E, Horowitz M., Alverson R.L, Yang T.S. A self-timed chip for division. In: Losleben P., editor. Stanford Conference on Advanced Research in VLSI. Cambridge, Mass: MIT Press, 1987.

Wilson A.W. Jr. Hierarchical cache/bus architecture for shared-memory multiprocessors. June 2–5, 1987, Pittsburgh, Penn. Proc. 14th Annual Int’l. Symposium on Computer Architecture (ISCA). 1987:244-252.

Wilson R.P, Lam M.S. Efficient context-sensitive pointer analysis for C programs. June 18–21, 1995, La Jolla, Calif. Proc. ACM SIGPLAN’95 Conf. on Programming Language Design and Implementation. 1995:1-12.

Wolfe A., Shen J.P. A variable instruction stream extension to the VLIW architecture. April 8–11, 1991, Palo Alto, Calif. Proc. Fourth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1991:2-14.

Wood D.A, Hill M.D. Cost-effective parallel computing. IEEE Computer. 1995;28(2):69-72. (February)

Wulf W. Compilers and computer architecture. Computer. 1981;14(7):41-47. (July)

Wulf W., Bell C.G. C.mmp—A multi-mini-processor. December 5–7, 1972, Anaheim, Calif. Proc. AFIPS Fall Joint Computer Conf. 1972:765-777.

Wulf W., Harbison S.P. Reflections in a pool of processors—an experience report on C.mmp/Hydra. June 5–8, 1978, Anaheim, Calif. Proc. AFIPS National Computing Conf. 1978:939-951.

Wulf W.A, McKee S.A. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News. 1995;23(1):20-24. (March)

Wulf W.A, Levin R., Harbison S.P. Hydra/C.mmp: An Experimental Computer System. New York: McGraw-Hill, 1981.

Yamamoto W., Serrano M.J, Talcott A.R, Wood R.C, Nemirovsky M. Performance estimation of multistreamed, superscalar processors. January 4–7, 1994, Maui. Proc. 27th Annual Hawaii Int’l. Conf. on System Sciences. 1994:195-204.

Yang Y., Mason G. Nonblocking broadcast switching networks. IEEE Trans. on Computers. 1991;40(9):1005-1015. (September)

Yeager K. The MIPS R10000 superscalar microprocessor. IEEE Micro. 1996;16(2):28-40. (April)

Yeh T., Patt Y.N. Alternative implementations of two-level adaptive branch prediction. May 19–21, 1992, Gold Coast, Australia. Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA). 1992:124-134.

Yeh T., Patt Y.N. A comparison of dynamic branch predictors that use two levels of branch history. May 16–19, 1993, San Diego, Calif. Proc. 20th Annual Int’l. Symposium on Computer Architecture (ISCA). 1993:257-266.
