Donald Yeung's Publications

Articles in Refereed Symposia, Conferences, and Workshops:

Martin Peckerar, Po-Chun Huang, Rachid Ahmad Jamil, Bruce Jacob, and Donald Yeung. Critical Issues in Advanced ReRAM Development. In Proceedings of the 9th International Symposium on Memory Systems. Alexandria, VA. October 2023.
[pdf]

Daniel Gerzhoy and Donald Yeung. Pipelined CPU-GPU Scheduling to Reduce Main Memory Accesses. In Proceedings of the 7th International Symposium on Memory Systems. Virtual Conference. October-December 2021.
[pdf]

Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald Yeung. Tileable Monolithic ReRAM Memory Design. In Proceedings of the IEEE Symposium on Low-Power and High-Speed Chips and Systems. Tokyo, Japan. April 2020.
[pdf]

Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, and Donald Yeung. Exploiting Nested MIMD-SIMD Parallelism on Heterogeneous Microprocessors. Presented at the High Performance and Embedded Architecture and Compilation Conference. Bologna, Italy. January 2020.

Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Shang Li, Mehdi Asnaashari, Sylvain Dubois, Donald Yeung, and Bruce Jacob. Design for ReRAM-based Main-Memory Architectures. In Proceedings of the 5th International Symposium on Memory Systems. Washington, D.C. September 2019.
[pdf]

Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Shang Li, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald Yeung. Memory Systems Challenges in Realizing Monolithic Computers. In Proceedings of the 4th International Symposium on Memory Systems. National Harbor, MD. October 2018.
[pdf]

Abdel-Hameed A. Badawy and Donald Yeung. Optimizing Locality in Graph Computations using Reuse Distance Profiles. In Proceedings of the 36th International Performance Computing and Communications Conference. San Diego, CA. December 2017.
(IEEE digital library distribution)

Michael Zuzak and Donald Yeung. Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors. In Proceedings of the 10th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with HiPEAC-12. Stockholm, Sweden. January 2017.
Best paper award.
[pdf]

Stephen P. Crago and Donald Yeung. Reducing Data Movement with Approximate Computing Techniques. In Proceedings of the IEEE International Conference on Rebooting Computing. San Diego, CA. October 2016.
[pdf]

Caleb Serafy, Ankur Srivastava, Avram Bar-Cohen, and Donald Yeung. Design Space Exploration of 3D CPUs and Micro-Fluidic Heatsinks with Thermo-Electrical-Physical Co-Optimization. In ASME 2015 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems, and 13th International Conference on Nanochannels, Microchannels, and Minichannels. San Francisco, CA. July 2015.
(ASME digital collection distribution)

Minshu Zhao and Donald Yeung. Studying the Impact of Multicore Processor Scaling on Directory Techniques via Reuse Distance Analysis. In Proceedings of the 21st International Symposium on High Performance Computer Architecture (HPCA-XXI). San Francisco Bay Area, CA. February 2015.
[pdf, gzip'd ps]

Caleb Serafy, Ankur Srivastava, and Donald Yeung. Unlocking the True Potential of 3D CPUs with Micro-Fluidic Cooling. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED'14). La Jolla, California. August 2014.
(ACM digital library distribution)

Caleb Serafy, Bing Shi, Ankur Srivastava, and Donald Yeung. Electro-Thermo Co-Design for Micro-Fluidically Cooled 3D ICs. In Proceedings of the 51st Design Automation Conference, Work-In-Progress Session. San Francisco, California. June 2014.

Caleb Serafy, Ankur Srivastava, Bing Shi, and Donald Yeung. Continued Frequency Scaling in 3D ICs through Micro-fluidic Cooling. In Proceedings of the IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems. Orlando, Florida. May 2014.
(IEEE digital library distribution)

Caleb Serafy, Bing Shi, Ankur Srivastava, and Donald Yeung. High Performance 3D Stacked DRAM Processor Architectures with Micro-Fluidic Cooling. In Proceedings of the IEEE 3D System Integration Conference. San Francisco, CA. October 2013.
(IEEE digital library distribution)

Meng-Ju Wu, Minshu Zhao, and Donald Yeung. Studying Multicore Processor Scaling via Reuse Distance Analysis. In Proceedings of the 40th International Symposium on Computer Architecture (ISCA-XL). Tel-Aviv, Israel. June 2013.
[pdf, gzip'd ps]

Meng-Ju Wu and Donald Yeung. Identifying Optimal Multicore Cache Hierarchies for Loop-based Parallel Programs via Reuse Distance Analysis. In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC-2012). Beijing, China. June 2012.
[pdf]

Meng-Ju Wu and Donald Yeung. Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs. In Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (PACT-XX). Galveston Island, TX. October 2011.
[pdf, gzip'd ps]

Eric Lau, Jason Miller, Inseok Choi, Donald Yeung, Saman Amarasinghe, and Anant Agarwal. Multicore Performance Optimization using Positive Energy Partnerships. In Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism (HotPar '11). Berkeley, CA. May 2011.
[pdf]

Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung. Early Experience with Profiling and Optimizing Distributed Shared Cache Performance on Tilera's Tile Processor. In Proceedings of the 6th International Workshop on Unique Chips and Systems. Atlanta, GA. December 2010.
One of 2 best papers out of 12 papers appearing in the workshop.
[pdf, gzip'd ps]

Wanli Liu and Donald Yeung. Using Aggressor Thread Information to Improve Shared Cache Management for CMPs. In Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT-XVIII). Raleigh, NC. September 2009.
[pdf, gzip'd ps]

Xuanhua Li and Donald Yeung. Exploiting Value Prediction for Fault Tolerance. In Proceedings of the 3rd Workshop on Dependable Architectures (WDA-III). Lake Como, Italy. November 2008.
[pdf, gzip'd ps]

Xuanhua Li and Donald Yeung. Application-Level Correctness and its Impact on Fault Tolerance. In Proceedings of the 13th International Symposium on High-Performance Computer Architecture (HPCA-XIII). Phoenix, AZ. February 2007.
[pdf, ps]

Xuanhua Li and Donald Yeung. Exploiting Soft Computing for Increased Fault Tolerance. In Proceedings of the 2006 Workshop on Architectural Support for Gigascale Integration. Boston, MA. June 2006.
[pdf, ps]

Seungryul Choi and Donald Yeung. Learning-Based SMT Processor Resource Distribution via Hill-Climbing. In Proceedings of the 33rd International Symposium on Computer Architecture (ISCA-XXXIII). Boston, MA. June 2006.
[pdf, gzip'd ps]

Kursad Albayraktaroglu, Aamer Jaleel, Xue Wu, Manoj Franklin, Bruce Jacob, Chau-Wen Tseng, and Donald Yeung. BioBench: A Benchmark Suite of Bioinformatics Applications. In Proceedings of the 2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-V). Austin, TX. March 2005.
[pdf] [benchmark suite download]

Deepak N. Agarwal, Sumitkumar N. Pamnani, Gang Qu, and Donald Yeung. Transferring Performance Gain from Software Prefetching to Energy Reduction. In Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS2004). Vancouver, Canada. May 2004.
[pdf, gzip'd ps]

Dongkeun Kim, Steve Shih-wei Liao, Perry Wang, Juan del Cuvillo, Xinmin Tian, Xiang Zou, Hong Wang, Donald Yeung, Milind Girkar, and John Shen. Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors. In Proceedings of the 2004 International Symposium on Code Generation and Optimization with Special Emphasis on Feedback-Directed and Runtime Optimization (CGO2004). San Jose, CA. March 2004.
[pdf, gzip'd ps]

Deepak Agarwal, Wanli Liu, and Donald Yeung. Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption. Fourth Workshop on Complexity-Effective Design. San Diego, CA. June 2003.
[pdf, gzip'd ps]

Dongkeun Kim and Donald Yeung. Design and Evaluation of Compiler Algorithms for Pre-Execution. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X). San Jose, CA. October 2002.
[pdf, gzip'd ps]

Gautham K. Dorai and Donald Yeung. Transparent Threads: Resource Allocation in SMT Processors for High Single-Thread Performance. In Proceedings of the 11th Annual International Conference on Parallel Architectures and Compilation Techniques (PACT-XI). Charlottesville, VA. September 2002.
[pdf, gzip'd ps]

Nicholas Kohout, Seungryul Choi, Dongkeun Kim, and Donald Yeung. Multi-Chain Prefetching: Effective Exploitation of Inter-Chain Memory Parallelism for Pointer-Chasing Codes. In Proceedings of the 10th Annual International Conference on Parallel Architectures and Compilation Techniques (PACT-X). Barcelona, Spain. September 2001.
[pdf, gzip'd ps]

Abdel-Hameed A. Badawy, Aneesh Aggarwal, Donald Yeung, and Chau-Wen Tseng. Evaluating the Impact of Memory System Performance on Software Prefetching and Locality Optimizations. In Proceedings of the 15th Annual International Conference on Supercomputing (ICS-XV). Sorrento, Italy. June 2001.
[pdf, gzip'd ps]

Nicholas Kohout, Seungryul Choi, and Donald Yeung. Multi-Chain Prefetching: Exploiting Memory Parallelism in Pointer-Chasing Codes. Solving the Memory Wall Problem Workshop. Vancouver, Canada. June 2000.
[pdf, gzip'd ps]

Donald Yeung. The Scalability of Multigrain Systems. In Proceedings of the 13th Annual International Conference on Supercomputing (ICS-XIII). Rhodes, Greece. June 1999.
[pdf, gzip'd ps]

Donald Yeung, Nicholas Kohout, Sujata Ramasubramanian, Ilya Khazanov, and Rishi Kurichh. Vortex: Irregular Data Stream Support for Data-Intensive Applications. Eighth Scalable Shared Memory Multiprocessors Workshop. Atlanta, GA. April 1999.
[abstract]

Andras Moritz, Donald Yeung, and Anant Agarwal. Exploring Optimal Cost-Performance Designs for Raw Microprocessors. In Proceedings of the 6th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM-VI). Napa, California. April 1998.
[pdf, gzip'd ps]

Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA-XXIII). Philadelphia, PA. May 1996.
[pdf, gzip'd ps]

Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Ken Mackenzie, and Donald Yeung. The MIT Alewife Machine: Architecture and Performance. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA-XXII). Santa Margherita, Italy. June 1995.
[pdf, gzip'd ps]

John Kubiatowicz, David Chaiken, Anant Agarwal, Arthur Altman, Jonathan Babb, David Kranz, Beng-Hong Lim, Ken Mackenzie, John Piscitello, and Donald Yeung. The Alewife CMMU: Addressing the Multiprocessor Communications Gap. In Hot Chips: A Symposium on High Performance Chips. Stanford, CA. August 1994.
[pdf, gzip'd ps]

Donald Yeung and Anant Agarwal. Experience with Fine-Grain Synchronization in MIMD Machines for Preconditioned Conjugate Gradient. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP-IV). San Diego, California. May 1993.
[pdf, gzip'd ps]

Invited Conference and Workshop Papers:

Meng-Ju Wu, Minshu Zhao, Mike Badamo, Jeff Casarona, and Donald Yeung. Characterizing Embedded Applications via Multicore Reuse Distance Analysis. In Workshop on Suite of Embedded Applications and Kernels (Poster Session), held in conjunction with DAC-LI. San Francisco, CA. June 2014.

Stephen Crago, Janice Onanian McMahon, Chris Archer, Krste Asanovic, Richard Chaung, Keith Goolsbey, Mary Hall, Christos Kozyrakis, Kunle Olukotun, Una-May O'Reilly, Rick Pancoast, Viktor Prasanna, Rodric Rabbah, Steve Ward, and Donald Yeung. CEARCH: Cognition-Enabled Architecture. In Proceedings of the 10th Annual High Performance Embedded Computing Workshop. Lexington, MA. September 2006.
[pdf]

Articles in Refereed Journals:

Candace Walden, Devesh Singh, Meenatchi Jagasivamani, Shang Li, Luyi Kang, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald Yeung. Monolithically Integrating Non-Volatile Main Memory Over the Last-Level Cache. ACM Transactions on Architecture and Code Optimization. Vol. 18, No. 4, Article 48. July 2021.
(ACM digital library distribution)

Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Shang Li, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald Yeung. Analyzing the Monolithic Integration of a ReRAM-based Main Memory into a CPU's Die. IEEE Micro (Special Issue on Monolithic 3D Architectures). Vol. 39, Issue 6. November/December 2019.
(IEEE digital library distribution)

Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, and Donald Yeung. Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors. ACM Transactions on Architecture and Code Optimization. Vol. 16, No. 4, Article 48. December 2019.
(ACM digital library distribution)

Minshu Zhao and Donald Yeung. Using Multicore Reuse Distance to Study Coherence Directories. ACM Transactions on Computer Systems. Vol. 35, No. 2. Article 4. October 2017.
[pdf]
(ACM digital library distribution)

Abdel-Hameed A. Badawy and Donald Yeung. Guiding Locality Optimizations for Graph Computations via Reuse Distance Analysis. IEEE Computer Architecture Letters. Vol. 16, Issue 2. pp. 119-122. July-December 2017.
(IEEE digital library distribution)

I. Stephen Choi and Donald Yeung. Multi-Cache Resizing via Greedy Coordinate Descent. Journal of Supercomputing. Vol. 73, No. 6. pp. 2402-2429. June 2017.
(Springer digital library distribution)

Michael Badamo, Jeff Casarona, Minshu Zhao, and Donald Yeung. Identifying Power Efficient Multicore Cache Hierarchies via Reuse Distance Analysis. ACM Transactions on Computer Systems. Vol. 34, No. 1. Article 3. pp. 1-30. April 2016. (c) 2016 ACM
[pdf]
(ACM digital library distribution)

Caleb Serafy, Avram Bar-Cohen, Ankur Srivastava, and Donald Yeung. Unlocking the True Potential of 3D CPUs with Micro-Fluidic Cooling. IEEE Transactions on Very Large Scale Integration Systems. Vol. 24, No. 4. pp. 1515-1523. April 2016.
(IEEE digital library distribution)

Meng-Ju Wu and Donald Yeung. Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs. ACM Transactions on Computer Systems. Vol. 31, No. 1. Article 1. pp. 1-37. February 2013. (c) 2013 ACM
[pdf]
(ACM digital library distribution)

Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung. Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor. IEEE Computer Architecture Letters. Vol. 10, Issue 1. July 2011.
[pdf, gzip'd ps]

Wanli Liu and Donald Yeung. Enhancing LTP-Driven Cache Management Using Reuse Distance Information. Journal of Instruction-Level Parallelism. Vol. 11. pp. 1-24. April 2009.
[pdf, gzip'd ps]

Seungryul Choi and Donald Yeung. Hill-Climbing SMT Processor Resource Distribution. ACM Transactions on Computer Systems. Vol. 27, No. 1. Article 1. pp. 1-47. February 2009. (c) 2009 ACM
[pdf]
(ACM digital library distribution)

Xuanhua Li and Donald Yeung. Exploiting Application-Level Correctness for Low-Cost Fault Tolerance. Journal of Instruction-Level Parallelism. Vol. 10. pp. 1-28. September 2008.
[pdf, gzip'd ps]

Sumit Pamnani, Deepak Agarwal, Gang Qu, and Donald Yeung. Low Power System Design with Performance Enhancement Techniques--General Approach and Case Study. Journal of Circuits, Systems, and Computers. Vol. 16, No. 5. pp. 745-767. October 2007.
[pdf]

Dongkeun Kim and Donald Yeung. A Study of Source-Level Compiler Algorithms for Automatic Construction of Pre-Execution Code. ACM Transactions on Computer Systems. Vol. 22, No. 3. pp. 326-379. August 2004. (c) 2004 ACM
[pdf, gzip'd ps]
(ACM digital library distribution)

Abdel-Hameed A. Badawy, Aneesh Aggarwal, Donald Yeung, and Chau-Wen Tseng. The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems. Journal of Instruction-Level Parallelism. Vol. 6. July 2004.
[pdf, gzip'd ps]

Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, and Donald Yeung. A General Framework for Prefetch Scheduling in Linked Data Structures and its Application to Multi-Chain Prefetching. ACM Transactions on Computer Systems. Vol. 22, No. 2. pp. 214-280. May 2004. (c) 2004 ACM
[pdf, gzip'd ps]
(ACM digital library distribution)

Gautham K. Dorai, Donald Yeung, and Seungryul Choi. Optimizing SMT Processors for High Single-Thread Performance. Journal of Instruction-Level Parallelism. Vol. 5. pp. 1-35. April 2003.
[pdf, gzip'd ps]

Andras Moritz, Donald Yeung, and Anant Agarwal. SimpleFit: A Framework for Analyzing Design Tradeoffs in Raw Architectures. IEEE Transactions on Parallel and Distributed Systems. Vol. 12, No. 6. pp. 730-742. June 2001.
[pdf, gzip'd ps]

Donald Yeung, John Kubiatowicz, and Anant Agarwal. Multigrain Shared Memory. ACM Transactions on Computer Systems. Vol. 18, No. 2. pp. 154-196. May 2000. (c) 2000 ACM
[pdf, gzip'd ps]
(ACM digital library distribution)

Anant Agarwal, Ricardo Bianchini, David Chaiken, Frederic T. Chong, Kirk L. Johnson, David Kranz, John D. Kubiatowicz, Beng-Hong Lim, Kenneth Mackenzie, and Donald Yeung. The MIT Alewife Machine. Proceedings of the IEEE. Vol. 87, No. 3. pp. 430-444. March 1999.
[pdf, gzip'd ps]

Anant Agarwal, John Kubiatowicz, David Kranz, Beng-Hong Lim, Donald Yeung, Godfrey D'Souza, and Mike Parkin. Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors. IEEE Micro. pp. 48-61. June 1993.
[pdf, gzip'd ps]

Chapters in Books:

Janice McMahon, Steve Crago, and Donald Yeung. Advanced Microprocessor Architectures. High Performance Embedded Computing Handbook: A Systems Perspective. CRC Press. 2008.

Yan Solihin and Donald Yeung. Data Cache Prefetching. Speculative Execution in High Performance Computer Architectures. CRC Press. 2005.

David Kranz, Beng-Hong Lim, Donald Yeung, and Anant Agarwal. Low-Cost Support for Fine-Grain Synchronization in Multiprocessors. Multithreading: A Summary of the State of the Art. Kluwer Academic Publishers. 1992.
[pdf, gzip'd ps]

Articles in Review:

Technical Reports:

Devesh Singh and Donald Yeung. SRTP: Predicting Store Reuse Time to Improve ReRAM Energy and Endurance. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2022-01. May 2022.

Daniel Gerzhoy and Donald Yeung. Pipelined CPU-GPU Scheduling for Caches. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2021-01. March 2021.
[pdf]

Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Shang Li, Luyi Kang, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald Yeung. Design and Evaluation of Monolithic Computers Implemented Using Crossbar ReRAM. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2019-01. July 2019.
[pdf]

Michael Zuzak and Donald Yeung. Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2016-01. November 2016.
[pdf]

Minshu Zhao and Donald Yeung. Studying Directory Access Patterns via Reuse Distance Analysis and Evaluating Their Impact on Multi-Level Directory Caches. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2014-01. January 2014.
[pdf]

Inseok Choi and Donald Yeung. Symbiotic Cache Resizing for CMPs with Shared LLC. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2013-02. September 2013.
[pdf]

Inseok Choi and Donald Yeung. Multi-Level Cache Resizing. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2012-11. November 2012.
[pdf]

Meng-Ju Wu and Donald Yeung. Understanding Multicore Cache Behavior of Loop-based Parallel Programs via Reuse Distance Analysis. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2012-01. January 2012.
[pdf]

Meng-Ju Wu and Donald Yeung. Memory Performance Analysis for Parallel Programs Using Concurrent Reuse Distance. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2010-10. October 2010.
[pdf]

Meng-Ju Wu and Donald Yeung. Scaling Single-Program Performance on Large-Scale Chip Multiprocessors. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2009-16. November 2009.
[pdf]

Wanli Liu and Donald Yeung. Probabilistic Replacement: Enabling Flexible Use of Shared Caches for CMPs. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2008-13. July 2008.
[pdf]

Wanli Liu and Donald Yeung. Enhancing LTP-Driven Cache Management Using Reuse Distance Information. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2007-33. June 2007.
[pdf]

Xuanhua Li and Donald Yeung. Application-Level Correctness and its Impact on Fault Tolerance. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2006-36. August 2006.
[pdf]

Meng-Ju Wu and Donald Yeung. Parallelization of the SSCA #3 Benchmark on the RAW Processor. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2006-42. August 2006.
[pdf]

Seungryul Choi and Donald Yeung. Hill-Climbing SMT Processor Resource Scheduler. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2005-30. May 2005.
[pdf]

Gautham K. Dorai, Donald Yeung, and Seungryul Choi. Optimizing SMT Processors for High Single-Thread Performance. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2003-07. January 2003.
[pdf, gzip'd ps]

Deepak Agarwal and Donald Yeung. Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2002-64. July 2002.
[pdf, gzip'd ps]

Dongkeun Kim and Donald Yeung. Using Program Slicing to Drive Pre-Execution on Simultaneous Multithreading Processors. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2001-49. June 2001.
[pdf, gzip'd ps]

Aneesh Aggarwal, Abdel-Hameed A. Badawy, Donald Yeung, and Chau-Wen Tseng. Evaluating the Impact of Memory System Performance on Software Prefetching and Locality Optimizations. University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2000-57. July 2000.
[pdf, gzip'd ps]

Nicholas Kohout, Seungryul Choi, and Donald Yeung. Multi-Chain Prefetching: Exploiting Memory Parallelism in Pointer-Chasing Codes. University of Maryland Systems and Computer Architecture Group Technical Report, UMD-SCA-TR-2000-01. June 2000.
[pdf, gzip'd ps]

Donald Yeung. Multigrain Shared Memory. Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. MIT/LCS Technical Report, MIT-LCS-TR-743. February 1998.
[pdf, gzip'd ps]

Donald Yeung, William J. Dally, and Anant Agarwal. How to Choose the Grain Size of a Parallel Computer. MIT/LCS Technical Report, MIT-LCS-TR-739. February 1994.
[pdf, gzip'd ps]

Donald Yeung. An Evaluation of Multiprocessor Support for Fine-Grain Synchronization in Preconditioned Conjugate Gradient. Master's Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. MIT/LCS Technical Report, MIT-LCS-TR-565. February 1993.
[pdf, gzip'd ps]

Unpublished:

Donald Yeung. Scalability of Multicast Communication over Wide-Area Networks. Area Exam, Massachusetts Institute of Technology. April 1996.
[pdf, gzip'd ps]

ACM permission notice:
The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

ACM copyright notice:
Copyright © 2013 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Send written requests for republication to ACM Publications, Copyright & Permissions at the address above or fax +1 (212) 869-0481 or email permissions@acm.org. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Last updated: March 2024 by Donald Yeung (yeung@umd.edu)