The objectives of this module are to discuss the cache coherence problem in multiprocessors and to elaborate on the snoop-based cache coherence protocol.

In the previous module, we pointed out the challenges associated with multiprocessors. The two main challenges are as follows:

1. Parallel and sequential portions of the program. Our programs are going to have both sequential code and parallel code. As the sequential portion increases, the performance of the multiprocessor comes down. Therefore, we need to write parallel programs that harness the full power of the underlying parallel architecture.

2. Communication latency among the processors is going to be a major overhead, and it has to be reduced. This can be done by caching the data in multiple processors.

Caches serve to increase bandwidth and reduce the latency of access, and are useful for both private data and shared data. However, when we cache data in multiple processors, we have the problems of cache coherence and consistency. We shall elaborate on these in detail in this module and the next module.

Multiprocessor Cache Coherence: Symmetric shared-memory machines usually support the caching of both shared and private data. Private data are used by a single processor, while shared data are used by multiple processors, essentially providing communication among the processors through reads and writes of the shared data. When a private data item is cached, its location is migrated to the cache, reducing the average access time as well as the memory bandwidth required. Since no other processor uses the data, the program behavior is identical to that in a uniprocessor. Similarly, when shared data are cached, the shared value may be replicated in multiple caches. In addition to the reduction in access latency and required memory bandwidth, this replication also reduces the contention that may exist for shared data items that are being read by multiple processors simultaneously.

Caching of shared data, however, introduces the cache coherence problem. This is because the shared data can have different values in different caches, and this has to be handled appropriately. Consider two processors A and B that both read location X as 1. Later on, when processor A modifies it to the value 0, processor B still has it as the value 1. Thus, two different processors can have two different values for the same location. This difficulty is generally referred to as the cache coherence problem. Informally, we could say that a memory system is coherent if any read of a data item returns the most recently written value of that data item. This simple definition contains two different aspects of memory system behavior, both of which are critical to writing correct shared-memory programs. The first aspect, called coherence, defines what values can be returned by a read. The second aspect, called consistency, determines when a written value will be returned by a read.

A memory system is coherent if the following hold:

1. A read by a processor P to a location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P.

2. A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses.

3. Writes to the same location are serialized; that is, two writes to the same location by any two processors are seen in the same order by all processors.

The first property simply preserves program order, which is true even in uniprocessors. The second property defines the notion of what it means to have a coherent view of memory. The third property, write serialization, ensures that we do not see the older value after the newer value.
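The stale-value scenario with processors A and B can be illustrated with a small simulation. This is a toy model, not any real hardware: the `Cache` class and its behavior are made up for illustration, modeling private write-back caches with no coherence mechanism at all.

```python
# Toy model of two processors with private caches and NO coherence
# protocol. Class and method names are illustrative only.

class Cache:
    def __init__(self, memory):
        self.memory = memory   # shared backing store
        self.lines = {}        # address -> locally cached value

    def read(self, addr):
        # On a miss, fetch from memory; afterwards, always hit locally.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Update only the local copy; no invalidation is sent to other
        # caches -- which is exactly the coherence problem.
        self.lines[addr] = value

memory = {"X": 1}
cache_A = Cache(memory)
cache_B = Cache(memory)

print(cache_A.read("X"))   # processor A reads X -> 1
print(cache_B.read("X"))   # processor B reads X -> 1

cache_A.write("X", 0)      # A updates X to 0 in its own cache only

print(cache_A.read("X"))   # A sees the new value 0
print(cache_B.read("X"))   # B still sees the stale value 1
```

After A's write, the two caches disagree on the value of X, which is precisely the incoherence described above.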
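As a preview of the snoop-based protocol that this module elaborates on, the sketch below extends the same toy model with a shared bus and write-invalidate snooping. This is a deliberately simplified sketch under assumed names (`Bus`, `SnoopyCache` are made up); real protocols such as MSI or MESI additionally track per-line states.

```python
# Simplified write-invalidate snooping sketch (illustrative names only).

class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, writer):
        # Every cache snoops the bus; all except the writer drop
        # their copy of the line.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)

class SnoopyCache:
    def __init__(self, memory, bus):
        self.memory = memory
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # miss: fetch from memory
        return self.lines[addr]

    def write(self, addr, value):
        # Invalidate the other copies first, then write through to
        # memory so a later miss fetches the up-to-date value.
        self.bus.broadcast_invalidate(addr, writer=self)
        self.lines[addr] = value
        self.memory[addr] = value

memory = {"X": 1}
bus = Bus()
cache_A = SnoopyCache(memory, bus)
cache_B = SnoopyCache(memory, bus)

cache_A.read("X")          # both processors cache X = 1
cache_B.read("X")
cache_A.write("X", 0)      # B's copy is invalidated over the bus
print(cache_B.read("X"))   # B misses, refetches, and sees 0
```

With invalidation on every write, B can no longer observe the stale value: its next read misses and fetches the value A wrote.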