Daniel Gerzhoy and Donald Yeung. Pipelined CPU-GPU Scheduling to
Reduce Main Memory Accesses. In Proceedings of the 7th
International Symposium on Memory Systems. Virtual
Conference. October-December, 2021.
[pdf]
Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Mehdi
Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald
Yeung. Tileable Monolithic ReRAM Memory Design.
In Proceedings of the IEEE Symposium on Low-Power and High-Speed
Chips and Systems. Tokyo, Japan. April 2020.
[pdf]
Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, and Donald Yeung. Exploiting Nested MIMD-SIMD Parallelism on Heterogeneous Microprocessors. Presented at the High Performance and Embedded Architecture and Compilation Conference. Bologna, Italy. January 2020.
Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Shang
Li, Mehdi Asnaashari, Sylvain Dubois, Donald Yeung, and Bruce
Jacob. Design for ReRAM-based Main-Memory Architectures.
In Proceedings of the 5th International Symposium on Memory
Systems. Washington, D.C. September 2019.
[pdf]
Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Shang
Li, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald
Yeung. Memory Systems Challenges in Realizing Monolithic
Computers. In Proceedings of the 4th International
Symposium on Memory Systems. National Harbor, MD. October
2018.
[pdf]
Abdel-Hameed A. Badawy and Donald Yeung. Optimizing Locality in
Graph Computations using Reuse Distance Profiles.
In Proceedings of the 36th International Performance Computing
and Communications Conference. San Diego, CA. December 2017.
(IEEE digital library distribution)
Michael Zuzak and Donald Yeung. Exploiting Multi-Loop Parallelism
on Heterogeneous Microprocessors. In Proceedings of the
10th International Workshop on Programmability and Architectures for
Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with
HiPEAC-12. Stockholm, Sweden. January 2017.
Best paper award.
[pdf]
Stephen P. Crago and Donald Yeung. Reducing Data Movement with
Approximate Computing Techniques. In Proceedings of the
IEEE International Conference on Rebooting Computing. San
Diego, CA. October 2016.
[pdf]
Caleb Serafy, Ankur Srivastava, Avram Bar-Cohen, and Donald
Yeung. Design Space Exploration of 3D CPUs and Micro-Fluidic
Heatsinks with Thermo-Electrical-Physical Co-Optimization.
In ASME 2015 International Technical Conference and
Exhibition on Packaging and Integration of Electronic and Photonic
Microsystems, and 13th International Conference on Nanochannels,
Microchannels, and Minichannels. San Francisco, CA. July 2015.
(ASME digital collection distribution)
Minshu Zhao and Donald Yeung. Studying the Impact of Multicore
Processor Scaling on Directory Techniques via Reuse Distance
Analysis. In Proceedings of the 21st International
Symposium on High Performance Computer Architecture
(HPCA-XXI). San Francisco Bay Area, CA. February 2015.
[pdf, gzip'd ps]
Caleb Serafy, Ankur Srivastava, and Donald Yeung. Unlocking the
True Potential of 3D CPUs with Micro-Fluidic Cooling.
In Proceedings of the International Symposium on Low Power
Electronics and Design (ISLPED'14). La Jolla, California.
August 2014.
(ACM digital library distribution)
Caleb Serafy, Bing Shi, Ankur Srivastava, and Donald Yeung. Electro-Thermo Co-Design for Micro-Fluidically Cooled 3D ICs. In Proceedings of the 51st Design Automation Conference, Work-In-Progress Session. San Francisco, California. June 2014.
Caleb Serafy, Ankur Srivastava, Bing Shi, and Donald Yeung.
Continued Frequency Scaling in 3D ICs through Micro-fluidic
Cooling. In Proceedings of the IEEE Intersociety
Conference on Thermal and Thermomechanical Phenomena in Electronic
Systems. Orlando, Florida. May 2014.
(IEEE digital library distribution)
Caleb Serafy, Bing Shi, Ankur Srivastava, and Donald Yeung. High
Performance 3D Stacked DRAM Processor Architectures with Micro-Fluidic
Cooling. In Proceedings of the IEEE 3D System Integration
Conference. San Francisco, CA. October 2013.
(IEEE digital library distribution)
Meng-Ju Wu, Minshu Zhao, and Donald Yeung. Studying Multicore
Processor Scaling via Reuse Distance Analysis.
In Proceedings of the 40th International Symposium on Computer
Architecture (ISCA-XL). Tel-Aviv, Israel. June 2013.
[pdf, gzip'd ps]
Meng-Ju Wu and Donald Yeung. Identifying Optimal Multicore Cache
Hierarchies for Loop-based Parallel Programs via Reuse Distance
Analysis. In Proceedings of the ACM SIGPLAN Workshop on
Memory Systems Performance and Correctness (MSPC-2012).
Beijing, China. June 2012.
[pdf]
Meng-Ju Wu and Donald Yeung. Coherent Profiles: Enabling Efficient
Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel
Programs. In Proceedings of the 20th International
Conference on Parallel Architectures and Compilation Techniques
(PACT-XX). Galveston Island, TX. October 2011.
[pdf, gzip'd ps]
Eric Lau, Jason Miller, Inseok Choi, Donald Yeung, Saman Amarasinghe,
and Anant Agarwal. Multicore Performance Optimization using
Positive Energy Partnerships. In Proceedings of the 3rd
USENIX Workshop on Hot Topics in Parallelism (HotPar '11).
Berkeley, CA. May 2011.
[pdf]
Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung. Early
Experience with Profiling and Optimizing Distributed Shared Cache
Performance on Tilera's Tile Processor. In Proceedings of
the 6th International Workshop on Unique Chips and Systems.
Atlanta, GA. December 2010.
One of 2 best papers out of 12 papers appearing in the workshop.
[pdf, gzip'd ps]
Wanli Liu and Donald Yeung. Using Aggressor Thread Information to
Improve Shared Cache Management for CMPs. In Proceedings of
the 18th International Conference on Parallel Architectures and
Compilation Techniques (PACT-XVIII). Raleigh, NC. September
2009.
[pdf, gzip'd ps]
Xuanhua Li and Donald Yeung. Exploiting Value Prediction for Fault
Tolerance. In Proceedings of the 3rd Workshop on Dependable
Architectures (WDA-III). Lake Como, Italy. November
2008.
[pdf, gzip'd ps]
Xuanhua Li and Donald Yeung. Application-Level Correctness and its
Impact on Fault Tolerance. In Proceedings of the 13th
International Symposium on High-Performance Computer Architecture
(HPCA-XIII). Phoenix, AZ. February 2007.
[pdf, ps]
Xuanhua Li and Donald Yeung. Exploiting Soft Computing for
Increased Fault Tolerance. In Proceedings of the 2006
Workshop on Architectural Support for Gigascale Integration.
Boston, MA. June 2006.
[pdf, ps]
Seungryul Choi and Donald Yeung. Learning-Based SMT Processor
Resource Distribution via Hill-Climbing. In Proceedings of
the 33rd International Symposium on Computer Architecture
(ISCA-XXXIII). Boston, MA. June 2006.
[pdf, gzip'd ps]
Kursad Albayraktaroglu, Aamer Jaleel, Xue Wu, Manoj Franklin, Bruce
Jacob, Chau-Wen Tseng, and Donald Yeung. BioBench: A Benchmark
Suite of Bioinformatics Applications. In Proceedings of the
2005 IEEE International Symposium on Performance Analysis of Systems
and Software (ISPASS-V). Austin, TX. March 2005.
[pdf] [benchmark suite download]
Deepak N. Agarwal, Sumitkumar N. Pamnani, Gang Qu, and Donald Yeung.
Transferring Performance Gain from Software Prefetching to Energy
Reduction. In Proceedings of the 2004 International
Symposium on Circuits and Systems (ISCAS2004). Vancouver,
Canada. May 2004.
[pdf, gzip'd ps]
Dongkeun Kim, Steve Shih-wei Liao, Perry Wang, Juan del Cuvillo,
Xinmin Tian, Xiang Zou, Hong Wang, Donald Yeung, Milind Girkar, and
John Shen. Physical Experimentation with Prefetching Helper
Threads on Intel's Hyper-Threaded Processors. In Proceedings
of the 2004 International Symposium on Code Generation and
Optimization with Special Emphasis on Feedback-Directed and Runtime
Optimization (CGO2004). San Jose, CA. March 2004.
[pdf, gzip'd ps]
Deepak Agarwal, Wanli Liu, and Donald Yeung. Exploiting
Application-Level Information to Reduce Memory Bandwidth
Consumption. Fourth Workshop on Complexity-Effective
Design. San Diego, CA. June 2003.
[pdf, gzip'd ps]
Dongkeun Kim and Donald Yeung. Design and Evaluation of Compiler
Algorithms for Pre-Execution. In Proceedings of the 10th
International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS-X). San Jose, CA.
October 2002.
[pdf, gzip'd ps]
Gautham K. Dorai and Donald Yeung. Transparent Threads: Resource
Allocation in SMT Processors for High Single-Thread Performance.
In Proceedings of the 11th Annual International Conference on
Parallel Architectures and Compilation Techniques (PACT-XI).
Charlottesville, VA. September 2002.
[pdf, gzip'd ps]
Nicholas Kohout, Seungryul Choi, Dongkeun Kim, and Donald Yeung.
Multi-Chain Prefetching: Effective Exploitation of Inter-Chain
Memory Parallelism for Pointer-Chasing Codes. In
Proceedings of the 10th Annual International Conference on
Parallel Architectures and Compilation Techniques (PACT-X).
Barcelona, Spain. September 2001.
[pdf, gzip'd ps]
Abdel-Hameed A. Badawy, Aneesh Aggarwal, Donald Yeung, and Chau-Wen
Tseng. Evaluating the Impact of Memory System Performance on
Software Prefetching and Locality Optimizations. In
Proceedings of the 15th Annual International Conference on
Supercomputing (ICS-XV). Sorrento, Italy. June 2001.
[pdf, gzip'd ps]
Nicholas Kohout, Seungryul Choi, and Donald Yeung. Multi-Chain
Prefetching: Exploiting Memory Parallelism in Pointer-Chasing
Codes. Solving the Memory Wall Problem Workshop.
Vancouver, Canada. June 2000.
[pdf, gzip'd ps]
Donald Yeung. The Scalability of Multigrain Systems. In
Proceedings of the 13th Annual International Conference on
Supercomputing (ICS-XIII). Rhodes, Greece. June 1999.
[pdf, gzip'd ps]
Donald Yeung, Nicholas Kohout, Sujata Ramasubramanian, Ilya Khazanov,
and Rishi Kurichh. Vortex: Irregular Data Stream Support for
Data-Intensive Applications. Eighth Scalable Shared Memory
Multiprocessors Workshop. Atlanta, GA. April 1999.
[abstract]
Andras Moritz, Donald Yeung, and Anant Agarwal. Exploring Optimal
Cost-Performance Designs for Raw Microprocessors. In
Proceedings of the 6th Annual IEEE Symposium on
Field-Programmable Custom Computing Machines (FCCM-VI). Napa,
California. April 1998.
[pdf, gzip'd ps]
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A
Multigrain Shared Memory System. In Proceedings of the 23rd
Annual International Symposium on Computer Architecture
(ISCA-XXIII). Philadelphia, PA. May 1996.
[pdf, gzip'd ps]
Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson,
David Kranz, John Kubiatowicz, Beng-Hong Lim, Ken Mackenzie, and
Donald Yeung. The MIT Alewife Machine: Architecture and
Performance. In Proceedings of the 22nd Annual
International Symposium on Computer Architecture (ISCA-XXII).
Santa Margherita, Italy. June 1995.
[pdf, gzip'd ps]
John Kubiatowicz, David Chaiken, Anant Agarwal, Arthur Altman,
Jonathan Babb, David Kranz, Beng-Hong Lim, Ken Mackenzie, John
Piscitello, and Donald Yeung. The Alewife CMMU: Addressing the
Multiprocessor Communications Gap. In Hot Chips: A
Symposium on High Performance Chips. Stanford, CA. August
1994.
[pdf, gzip'd ps]
Donald Yeung and Anant Agarwal. Experience with Fine-Grain
Synchronization in MIMD Machines for Preconditioned Conjugate
Gradient. In Proceedings of the 4th ACM SIGPLAN Symposium
on Principles and Practice of Parallel Programming (PPoPP-IV).
San Diego, California. May 1993.
[pdf, gzip'd ps]
Meng-Ju Wu, Minshu Zhao, Mike Badamo, Jeff Casarona, and Donald Yeung. Characterizing Embedded Applications via Multicore Reuse Distance Analysis. In Workshop on Suite of Embedded Applications and Kernels (Poster Session), held in conjunction with DAC-LI. San Francisco, CA. June 2014.
Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Luyi Kang, Shang
Li, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald
Yeung. Analyzing the Monolithic Integration of a ReRAM-based
Main Memory into a CPU's Die. IEEE Micro (Special Issue
on Monolithic 3D Architectures). Vol. 39, Issue 6.
November/December 2019.
(IEEE digital library distribution)
Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, and Donald Yeung. Nested
MIMD-SIMD Parallelization for Heterogeneous Microprocessors.
ACM Transactions on Architecture and Code Optimization.
Vol. 16, No. 4, Article 48. December 2019.
(ACM digital library distribution)
Minshu Zhao and Donald Yeung. Using Multicore Reuse Distance to
Study Coherence Directories. ACM Transactions on Computer
Systems. Vol. 35, No. 2. Article 4. October 2017.
[pdf]
(ACM digital library distribution)
Abdel-Hameed A. Badawy and Donald Yeung. Guiding Locality
Optimizations for Graph Computations via Reuse Distance Analysis.
IEEE Computer Architecture Letters. Vol. 16, Issue 2.
pp. 119-122. July-December 2017.
(IEEE digital library distribution)
I. Stephen Choi and Donald Yeung. Multi-Cache Resizing via Greedy
Coordinate Descent. Journal of
Supercomputing. Vol. 73, No. 6. pp. 2402-2429. June
2017.
(Springer digital library distribution)
Michael Badamo, Jeff Casarona, Minshu Zhao, and Donald
Yeung. Identifying Power Efficient Multicore Cache Hierarchies
via Reuse Distance Analysis. ACM Transactions on Computer
Systems. Vol. 34, No. 1. Article 3. pp. 1-30. April 2016. (c) 2016 ACM
[pdf]
(ACM digital library distribution)
Caleb Serafy, Avram Bar-Cohen, Ankur Srivastava, and Donald
Yeung. Unlocking the True Potential of 3D CPUs with
Micro-Fluidic Cooling. IEEE Transactions on Very Large
Scale Integration Systems. Vol. 24, No. 4. pp. 1515-1523.
April 2016.
(IEEE digital library distribution)
Meng-Ju Wu and Donald Yeung. Efficient Reuse Distance Analysis of
Multicore Scaling for Loop-based Parallel Programs. ACM
Transactions on Computer Systems. Vol. 31, No. 1. Article 1.
pp. 1-37. February 2013. (c) 2013 ACM.
[pdf]
(ACM digital library distribution)
Inseok Choi, Minshu Zhao, Xu Yang, and Donald Yeung. Experience
with Improving Distributed Shared Cache Performance on Tilera's Tile
Processor. IEEE Computer Architecture Letters.
Vol. 10, Issue 1. July 2011.
[pdf, gzip'd ps]
Wanli Liu and Donald Yeung. Enhancing LTP-Driven Cache Management
Using Reuse Distance Information. Journal of
Instruction-Level Parallelism. Vol. 11. pp. 1-24. April
2009.
[pdf, gzip'd ps]
Seungryul Choi and Donald Yeung. Hill-Climbing SMT Processor
Resource Distribution. ACM Transactions on Computer
Systems. Vol. 27, No. 1. Article 1. pp. 1-47. February
2009. (c) 2009 ACM
[pdf]
(ACM digital library distribution)
Xuanhua Li and Donald Yeung. Exploiting Application-Level
Correctness for Low-Cost Fault Tolerance. Journal of
Instruction-Level Parallelism. Vol. 10. pp. 1-28. September
2008.
[pdf, gzip'd ps]
Sumit Pamnani, Deepak Agarwal, Gang Qu, and Donald Yeung. Low
Power System Design with Performance Enhancement Techniques--General
Approach and Case Study. Journal of Circuits, Systems, and
Computers. Vol. 16, No. 5. pp. 745-767. October 2007.
[pdf]
Dongkeun Kim and Donald Yeung. A Study of Source-Level Compiler
Algorithms for Automatic Construction of Pre-Execution Code.
ACM Transactions on Computer Systems. Vol. 22, No. 3.
pp. 326-379. August 2004. (c) 2004 ACM
[pdf, gzip'd ps]
(ACM digital library distribution)
Abdel-Hameed A. Badawy, Aneesh Aggarwal, Donald Yeung, and Chau-Wen
Tseng. The Efficacy of Software Prefetching and Locality
Optimizations on Future Memory Systems. Journal of
Instruction-Level Parallelism. Vol. 6. July 2004.
[pdf, gzip'd ps]
Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, and
Donald Yeung. A General Framework for Prefetch Scheduling in
Linked Data Structures and its Application to Multi-Chain
Prefetching. ACM Transactions on Computer Systems.
Vol. 22, No. 2. pp. 214-280. May 2004. (c) 2004 ACM
[pdf, gzip'd ps]
(ACM digital library distribution)
Gautham K. Dorai, Donald Yeung, and Seungryul Choi. Optimizing SMT
Processors for High Single-Thread Performance. Journal of
Instruction-Level Parallelism. Vol. 5. pp. 1-35. April 2003.
[pdf, gzip'd ps]
Andras Moritz, Donald Yeung, and Anant Agarwal. SimpleFit: A
Framework for Analyzing Design Tradeoffs in Raw
Architectures. IEEE Transactions on Parallel and Distributed
Systems. Vol. 12, No. 6. pp. 730-742. June 2001.
[pdf, gzip'd ps]
Donald Yeung, John Kubiatowicz, and Anant Agarwal. Multigrain
Shared Memory. ACM Transactions on Computer
Systems. Vol. 18, No. 2. pp. 154-196. May 2000. (c) 2000 ACM
[pdf, gzip'd ps]
(ACM digital library distribution)
Anant Agarwal, Ricardo Bianchini, David Chaiken, Frederic T. Chong,
Kirk L. Johnson, David Kranz, John D. Kubiatowicz, Beng-Hong Lim,
Kenneth Mackenzie, and Donald Yeung. The MIT Alewife Machine.
Proceedings of the IEEE. Vol. 87, No. 3. pp. 430-444.
March 1999.
[pdf, gzip'd ps]
Anant Agarwal, John Kubiatowicz, David Kranz, Beng-Hong Lim, Donald
Yeung, Godfrey D'Souza, and Mike Parkin. Sparcle: An Evolutionary
Processor Design for Large-Scale Multiprocessors. IEEE
Micro. pp. 48-61. June 1993.
[pdf, gzip'd ps]
Yan Solihin and Donald Yeung. Data Cache Prefetching.
Speculative Execution in High Performance Computer
Architectures. CRC Press. 2005.
David Kranz, Beng-Hong Lim, Donald Yeung, and Anant Agarwal.
Low-Cost Support for Fine-Grain Synchronization in
Multiprocessors. Multithreading: A Summary of the State of
the Art. Kluwer Academic Publishers. 1992.
[pdf, gzip'd ps]
Daniel Gerzhoy and Donald Yeung. Pipelined CPU-GPU Scheduling for
Caches. University of Maryland Institute for Advanced
Computer Studies Technical Report, UMIACS-TR-2021-01. March
2021.
[pdf]
Meenatchi Jagasivamani, Candace Walden, Devesh Singh, Shang Li, Luyi
Kang, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald
Yeung. Design and Evaluation of Monolithic Computers Implemented
Using Crossbar ReRAM. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2019-01. July 2019.
[pdf]
Michael Zuzak and Donald Yeung. Exploiting Multi-Loop Parallelism
on Heterogeneous Microprocessors. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2016-01. November 2016.
[pdf]
Minshu Zhao and Donald Yeung. Studying Directory Access Patterns
via Reuse Distance Analysis and Evaluating Their Impact on
Multi-Level Directory Caches. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2014-01. January 2014.
[pdf]
Inseok Choi and Donald Yeung. Symbiotic Cache Resizing for CMPs
with Shared LLC. University of Maryland Institute for
Advanced Computer Studies Technical Report,
UMIACS-TR-2013-02. September
2013.
[pdf]
Inseok Choi and Donald Yeung. Multi-Level Cache
Resizing. University of Maryland Institute for Advanced
Computer Studies Technical Report, UMIACS-TR-2012-11.
November 2012.
[pdf]
Meng-Ju Wu and Donald Yeung. Understanding Multicore Cache
Behavior of Loop-based Parallel Programs via Reuse Distance
Analysis. University of Maryland Institute for Advanced
Computer Studies Technical Report, UMIACS-TR-2012-01. January
2012.
[pdf]
Meng-Ju Wu and Donald Yeung. Memory Performance Analysis for
Parallel Programs Using Concurrent Reuse Distance.
University of Maryland Institute for Advanced Computer Studies
Technical Report, UMIACS-TR-2010-10. October 2010.
[pdf]
Meng-Ju Wu and Donald Yeung. Scaling Single-Program Performance on
Large-Scale Chip Multiprocessors. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2009-16. November 2009.
[pdf]
Wanli Liu and Donald Yeung. Probabilistic Replacement: Enabling
Flexible Use of Shared Caches for CMPs. University of
Maryland Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2008-13. July 2008.
[pdf]
Wanli Liu and Donald Yeung. Enhancing LTP-Driven Cache Management
Using Reuse Distance Information. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2007-33. June 2007.
[pdf]
Xuanhua Li and Donald Yeung. Application-Level Correctness and its
Impact on Fault Tolerance. University of Maryland Institute
for Advanced Computer Studies Technical Report,
UMIACS-TR-2006-36. August 2006.
[pdf]
Meng-Ju Wu and Donald Yeung. Parallelization of the SSCA #3
Benchmark on the RAW Processor. University of Maryland
Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2006-42. August 2006.
[pdf]
Seungryul Choi and Donald Yeung. Hill-Climbing SMT Processor
Resource Scheduler. University of Maryland Institute for
Advanced Computer Studies Technical Report, UMIACS-TR-2005-30.
May 2005.
[pdf]
Gautham K. Dorai, Donald Yeung, and Seungryul Choi. Optimizing SMT
Processors for High Single-Thread Performance. University of
Maryland Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2003-07. January 2003.
[pdf, gzip'd ps]
Deepak Agarwal and Donald Yeung. Exploiting Application-Level
Information to Reduce Memory Bandwidth Consumption.
University of Maryland Institute for Advanced Computer Studies
Technical Report, UMIACS-TR-2002-64. July 2002.
[pdf, gzip'd ps]
Dongkeun Kim and Donald Yeung. Using Program Slicing to Drive
Pre-Execution on Simultaneous Multithreading Processors.
University of Maryland Institute for Advanced Computer Studies
Technical Report, UMIACS-TR-2001-49. June 2001.
[pdf, gzip'd ps]
Aneesh Aggarwal, Abdel-Hameed A. Badawy, Donald Yeung, and Chau-Wen
Tseng. Evaluating the Impact of Memory System Performance on
Software Prefetching and Locality Optimizations. University
of Maryland Institute for Advanced Computer Studies Technical Report,
UMIACS-TR-2000-57. July 2000.
[pdf, gzip'd ps]
Nicholas Kohout, Seungryul Choi, and Donald Yeung. Multi-Chain
Prefetching: Exploiting Memory Parallelism in Pointer-Chasing
Codes. University of Maryland Systems and Computer
Architecture Group Technical Report, UMD-SCA-TR-2000-01. June
2000.
[pdf, gzip'd ps]
Donald Yeung. Multigrain Shared Memory. Ph.D. thesis,
Department of Electrical Engineering and Computer Science,
Massachusetts Institute of Technology. MIT/LCS Technical
Report, MIT-LCS-TR-743. February 1998.
[pdf, gzip'd ps]
Donald Yeung, William J. Dally, and Anant Agarwal. How to Choose
the Grain Size of a Parallel Computer. MIT/LCS Technical
Report, MIT-LCS-TR-739. February 1994.
[pdf, gzip'd ps]
Donald Yeung. An Evaluation of Multiprocessor Support for
Fine-Grain Synchronization in Preconditioned Conjugate Gradient.
Master's Thesis, Department of Electrical Engineering and Computer
Science, Massachusetts Institute of Technology. MIT/LCS
Technical Report, MIT-LCS-TR-565. February 1993.
[pdf, gzip'd ps]
ACM permission notice:
The documents contained in these directories are included by the
contributing authors as a means to ensure timely dissemination of
scholarly and technical work on a non-commercial basis. Copyright and
all rights therein are maintained by the authors or by other copyright
holders, notwithstanding that they have offered their works here
electronically. It is understood that all persons copying this
information will adhere to the terms and constraints invoked by each
author's copyright. These works may not be reposted without the
explicit permission of the copyright holder.
ACM copyright notice:
Copyright © 2013 by the Association for Computing Machinery,
Inc. (ACM). Permission to make digital or hard copies of portions of
this work for personal or classroom use is granted without fee
provided that the copies are not made or distributed for profit or
commercial advantage and that copies bear this notice and the full
citation on the first page in print or the first screen in digital
media. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy
otherwise, to republish, to post on servers, or to redistribute to
lists, requires prior specific permission and/or a fee. Send written
requests for republication to ACM Publications, Copyright &
Permissions at the address above or fax +1 (212) 869-0481 or email
permissions@acm.org. For other copying of articles that carry a code
at the bottom of the first or last page, copying is permitted provided
that the per-copy fee indicated in the code is paid through the
Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.