Computer Architecture 101 - Python - Notes - Teachmint (2023)

Page 1 :
The Essentials of Computer Organization and Architecture, Linda Null and Julia Lobur, Jones and Bartlett


Page 2 :
The Essentials of Computer Organization and Architecture, Linda Null, Pennsylvania State University, Julia Lobur, Pennsylvania State University


Page 3:
World Headquarters
Jones and Bartlett Publishers, 40 Tall Pine Drive, Sudbury, MA 01776, 978-443-5000, info@jbpub.com, www.jbpub.com
Jones and Bartlett Publishers Canada, 2406 Nikanna Road, Mississauga, ON L5C 2W6, Canada
Jones and Bartlett Publishers International, Barb House, Barb Mews, London W6 7PA, UK

Copyright © 2003 by Jones and Bartlett Publishers, Inc.
Cover image © David Buffington / Getty Images. Illustrations based on and excerpted from illustrations provided by Julia Lobur.

Library of Congress Cataloging-in-Publication Data
Null, Linda.
The Essentials of Computer Organization and Architecture / Linda Null, Julia Lobur.
p. cm.
ISBN 0-7637-0444-X
1. Computer organization. 2. Computer architecture. I. Lobur, Julia. II. Title.
QA76.9.C643 N85 2003
004.2'2—dc21
2002040576

All rights reserved. No part of the material protected by this copyright notice may be reproduced or used in any form, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without written permission from the copyright owner.

CEO: Clayton Jones; COO: Don W. Jones, Jr.; Executive Vice President and Publisher: Robert W. Holland, Jr.; Vice President of Design and Production: Anne Spencer; Vice President of Manufacturing and Inventory Control: Therese Bräuer; Director of Sales and Marketing: William Kane; Editor-in-Chief, College: J. Michael Stranz; Production Manager: Amy Rose; Senior Marketing Manager: Nathan Schultz; Associate Production Editor: Karen C. Ferreira; Associate Editor: Theresa DiDonato; Production Assistant: Jenny McIsaac; Cover Design: Kristin E. Ohlin; Composition: Northeast Compositors; Text Design: Anne Flanagan; Printing and Binding: Courier Westford; Cover Printing: Jaguar Advanced Graphics

This book was composed in Quark 4.1 on a Macintosh G4. The font families used were Times, Mixage, and Prestige Elite. The first printing was printed on 45# Highland Plus.

Printed in the United States of America
07 06 05 04 03    10 9 8 7 6 5 4 3 2 1


Page 4:
In memory of my father, Merrill Cornell, a pilot and a man of infinite talent and courage, who taught me that when we venture into the unknown, we either find solid ground or learn to fly. —L.M.N.

To the loving memory of my mother, Anna J. Surowski, who did everything possible for her daughters. —J.M.L.


Page 6:
PREFACE

FOR THE STUDENT

This is a book about computer organization and architecture. It focuses on the function and design of the various components required to process information digitally. We present computer systems as a series of layers, starting with low-level hardware and moving to higher-level software, including assemblers and operating systems. These levels constitute a hierarchy of virtual machines. The study of computer organization focuses on this hierarchy and the issues involved in how we partition the levels and how each level is implemented. The study of computer architecture focuses on the interface between hardware and software, and emphasizes the structure and behavior of the system.

Students invariably ask, "If I have a computer science degree, why should I learn about computer hardware? Isn't that for computer engineers? Why do I care what the inside of a computer looks like?" As computer users, we probably do not need to worry about this any more than we need to know what our car looks like under the hood in order to drive it. We can certainly write high-level language programs without understanding how those programs execute; we can use various application packages without understanding how they actually work. But what happens when the program we write needs to be faster and more


Page 7:
efficient, or the application we are using does not do exactly what we want? As computer scientists, we need a basic understanding of the computer system itself in order to solve these problems.

There is a fundamental relationship between computer hardware and the many aspects of programming and software components in computer systems. In order to write good software, it is very important to understand the computer system as a whole. Understanding hardware can help you explain the mysterious bugs that sometimes creep into your programs, such as the infamous segmentation fault or bus error. The level of knowledge about computer organization and architecture that a high-level programmer should have depends on the task that programmer is trying to complete. For example, to write compilers, you must understand the particular hardware for which you are compiling. Some of the ideas used in hardware (such as pipelining) can be adapted to compilation techniques, making the compiler faster and more efficient. To model large, complex, real-world systems, you must understand how floating-point arithmetic should work, as well as how it actually works in practice (which are not necessarily the same thing). To write device drivers for video, disks, or other I/O devices, you need a good understanding of I/O interfaces and of computer architecture in general. If you want to work on embedded systems, which usually have very limited resources, you must understand all of the time, space, and price trade-offs. To do research on, or make recommendations for, hardware systems, networks, or algorithms, you must understand benchmarking and learn how to present performance results correctly. Before buying hardware, you need to understand benchmarking and all of the ways others can manipulate performance results to "prove" that one system is better than another. Regardless of our particular area of expertise, as computer scientists it is imperative that we understand how hardware and software interact.

You may also be wondering why a book with the word essentials in its title is so large. The reason is twofold. First, the subject of computer organization is expansive and growing every day. Second, there is little consensus as to which topics from within this growing sea of information are truly essential and which are simply useful to know. In writing this book, one of our goals was to provide a concise text consistent with the computer architecture curriculum guidelines published jointly by the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). These guidelines cover the topics that experts agree constitute the "essential" core of knowledge relevant to computer organization and architecture, together with topics we believe are essential for continued study in computer science and for professional advancement. The topics we feel will assist you in your continuing computer science studies include operating systems, compilers, database management, and data communications. Other topics are included to help you understand how real systems work in real life.


Page 8:
We hope you find reading this book an enjoyable experience, and that you take the time to delve deeper into some of the material we present. Our intention is that this book will serve as a useful reference long after you have completed your formal course. Although we give you a substantial amount of information, it is only a foundation upon which you can build throughout the remainder of your studies and your career. Successful computer professionals continually expand their knowledge of how computers work. Welcome to the start of your journey.

FOR THE INSTRUCTOR

About the Book

This book is the outgrowth of two computer organization and architecture classes taught at the Pennsylvania State University Harrisburg campus. As the computer science curriculum evolved, we found it necessary not only to modify the material taught in the courses, but also to condense the material from a two-semester sequence into a single three-credit, one-semester course. Many other schools have also recognized the need to compress material in order to accommodate emerging topics. This new course, like this textbook, is aimed primarily at computer science majors, and is intended to address the topics in computer organization and architecture with which computer science majors should be familiar. This book not only integrates the underlying principles in these areas, but it also introduces and motivates the topics, providing the breadth necessary for the major and the depth necessary for continuing study in computing.

Our primary objective in writing this book is to change the way computer organization and architecture are typically taught. A computer science major should leave a computer organization and architecture class not only with an understanding of the important general concepts on which the digital computer is founded, but also with a comprehension of how those concepts apply to the real world. These concepts should transcend vendor-specific terminology and design; in fact, students should be able to take concepts given in the specific and translate them to the generic, and vice versa. In addition, students must develop a firm foundation for further study in the major. For each topic in the curriculum, a student in the major should come away with exposure, familiarity, or mastery. We do not expect students using our textbook to have complete mastery of all the topics presented. We strongly believe, however, that there are certain topics that must be mastered; there are topics with which students must have definite familiarity; and there are certain topics for which a brief introduction and exposure are adequate.

We do not believe that concepts presented in sufficient depth can be learned by studying general principles in isolation. We therefore present the topics as an


Page 9:
integrated set of solutions, not simply a collection of individual pieces of information. We feel that our explanations, examples, exercises, tutorials, and simulators all combine to provide the student with a total learning experience that exposes the inner workings of a modern digital computer at the appropriate level. We have written in an informal style, omitting unnecessary jargon, writing clearly and concisely, and avoiding unnecessary abstraction, in the hope of increasing student enthusiasm. We have also broadened the range of topics typically found in a first-level architecture book to include system software, a brief tour of operating systems, performance issues, alternative architectures, and a concise introduction to networking, as these topics are intimately related to computer hardware. Like most books, we have chosen an architectural model, but it is one that we have designed with simplicity in mind.

Our coverage is also consistent with the latest ACM/IEEE Computing Curricula 2001 (CC-2001). These new guidelines represent the first major revision since the very popular Computing Curricula 1991. CC-2001 represents several major changes from CC-1991, but we are primarily concerned with those that address computer organization and architecture. CC-1991 suggested approximately 59 lecture hours for architecture (defined as both organization and architecture and labeled AR), including the following topics: digital logic, digital systems, machine-level representation of data, assembly-level machine organization, memory system organization and architecture, interfacing and communication, and alternative architectures. The latest release of CC-2001 (available at www.computer.org/education/cc2001/) reduces architecture coverage to 36 core hours, including digital logic and digital systems (6 hours), machine-level representation of data (3 hours), assembly-level machine organization (9 hours), memory system organization and architecture (5 hours), interfacing and communication (3 hours), functional organization (7 hours), and multiprocessing and alternative architectures (3 hours). In addition, CC-2001 suggests including performance enhancements and architectures for networks and distributed systems as part of the architecture and organization module.

We are pleased, after completely revising our course and writing this textbook, that our new material is in direct correlation with the 2001 ACM/IEEE Curriculum Guidelines for computer organization and architecture as follows:

AR1. Digital logic and digital systems (core): Chapters 1 and 3
AR2. Machine-level representation of data (core): Chapter 2
AR3. Assembly-level machine organization (core): Chapters 4, 5, and 6
AR4. Memory system organization and architecture (core): Chapter 6
AR5. Interfacing and communication (core): Chapter 7
AR6. Functional organization (core): Chapters 4 and 5
AR7. Multiprocessing and alternative architectures (core): Chapter 9


Page 10:
AR8. Performance enhancements (optional): Chapters 9 and 10
AR9. Architecture for networks and distributed systems (optional): Chapter 11

Why Another Text?

In our more than 25 years of teaching these courses, we have used many very good textbooks. However, each time we taught the course the content evolved, and we eventually found ourselves writing many more course notes to bridge the gap between the material in the textbook and the material we felt needed to be presented in our classes. We found that our course material was migrating from a computer engineering approach to organization and architecture toward a computer science approach to these topics. When the decision was made to fold the organization class and the architecture class into a single course, we simply could not find a textbook that covered the material we felt was necessary for our majors, written from a computer science point of view, written without machine-specific terminology, and designed to motivate the topics before covering them.

Students, however, must have a solid understanding of basic concepts before they can understand and appreciate the intangible aspects of design. Most organization and architecture textbooks present a similar subset of technical information regarding these basics. We, however, pay particular attention to the level at which the information should be covered, and to presenting that information in the context that is relevant to computer science students. For example, throughout this book, when concrete examples are necessary, we offer examples for personal computers, enterprise systems, and mainframes, as these are the types of systems students are most likely to encounter. We avoid the "PC bias" prevalent in similar books, in the hope that students will appreciate the differences, similarities, and roles that various platforms play in today's automated infrastructures.

Textbooks often forget that motivation is perhaps the most important key to learning. To that end, we include many real-world examples, while attempting to maintain a balance between theory and application. We have also included a number of special features to make the material more accessible to students. Some of these features are listed below:

• Sidebars. These sidebars include interesting information that goes one step beyond the main focus of the chapter, allowing readers to delve further into the material.


Page 11:
• Real-world examples. We have integrated real-world examples throughout the book to give students a better understanding of how technology and techniques are combined for practical purposes.
• Chapter summaries. These sections provide brief yet concise summaries of the main points of each chapter.
• Further reading. These sections list additional sources for readers who wish to investigate any of the topics in more detail, and contain references to definitive papers and books related to the chapter topics.
• Review questions. Each chapter contains a set of review questions designed to ensure that the reader has a firm grasp of the material.
• Chapter exercises. Each chapter has a broad selection of exercises to reinforce the ideas presented. The more challenging exercises are marked with an asterisk.
• Answers to selected exercises. To ensure that students are on the right track, we provide answers to representative questions from each chapter. Questions with answers in the back of the text are marked with a blue diamond.
• Special "Focus On" sections. These sections provide additional information for instructors who may wish to cover certain concepts, such as Kmaps and input/output, in more detail. Additional exercises are provided for these sections as well.
• Appendix. The appendix provides a brief introduction to, or review of, data structures, including topics such as stacks, linked lists, and trees.
• Glossary. An extensive glossary includes brief definitions of all key terms from the chapters.
• Index. A comprehensive index, with numerous cross-references, is provided with this book to make it easier for the reader to locate terms and concepts.

About the Authors

We bring to this book not only more than 25 years of combined teaching experience, but also 20 years of industry experience. Our combined efforts therefore emphasize the underlying principles of computer organization and architecture and how these topics relate in practice. We have included real-life examples to help students appreciate how these fundamental concepts apply to the world of computing.

Linda Null received a Ph.D. in computer science from Iowa State University in 1991, an M.S. in computer science from Iowa State University in 1989, an M.S. in computer science education from Northwest Missouri State University in 1983, an M.S. in mathematics education from Northwest Missouri State University in 1980, and a B.S. in mathematics and English from Northwest Missouri State University in 1977. She has been teaching mathematics and computer science for more than 25 years and is currently the computer science graduate program coordinator at the Pennsylvania State University Harrisburg campus, where she has been a member of the faculty since 1995. Her areas of interest include computer organization and architecture, operating systems, and computer security.


Page 12:
Julia Lobur has been a practitioner in the computer industry for more than 20 years. She has held positions as a systems consultant, a staff programmer/analyst, a systems and network designer, and a software development manager, in addition to part-time teaching duties.

Prerequisites

The typical background necessary for a student using this textbook includes one year of programming experience using a high-level procedural language. Students are also expected to have completed one year of college-level mathematics (calculus or discrete mathematics), as this book assumes and incorporates these mathematical concepts. This book assumes no prior knowledge of computer hardware.

A computer organization and architecture class is customarily a prerequisite for an undergraduate operating systems class (students must know about the memory hierarchy, concurrency, exceptions, and interrupts), compilers (students must know about instruction sets, memory addressing, and binding), networking (students must understand the hardware of a system before attempting to understand the network that ties these components together), and, of course, any advanced architecture class. This text covers the topics necessary for these courses.

General Organization and Coverage

We do not feel the best way to present this material is to "compartmentalize" the various topics; therefore, we have chosen a structured yet integrated approach in which each topic is covered in the context of the entire computer system. As with many popular texts, we take a bottom-up approach, starting with the digital logic level and building up to the application level, with which students should be familiar before starting the class. The text is carefully structured so that the reader understands one level before moving on to the next. By the time the reader reaches the application level, all the necessary concepts in computer organization and architecture have been introduced. Our goal is to allow students to tie the hardware knowledge covered in this book to the concepts learned in their introductory programming classes, resulting in a complete picture of how hardware and software fit together. Ultimately, the extent of hardware understanding has a significant influence on software design and performance. If students can build a firm foundation in hardware fundamentals, this will go a long way toward helping them become better computer scientists.

To address the myriad areas in which a computer professional must be educated, we take a high-level look at computer architecture, providing low-level coverage only when it is necessary for understanding a specific concept. For example, when discussing ISAs, many hardware-dependent issues are introduced in the context


Page 13:
of various case studies to both differentiate and reinforce the issues associated with ISA design. The book is organized as follows:

• Chapter 1 provides a historical overview of computing in general, highlighting the many milestones in the development of computing systems and allowing the reader to visualize how we arrived at the current state of computing. This chapter introduces the necessary terminology, the basic components of a computer system, the various logical levels of a computer system, and the von Neumann computer model. It provides a high-level view of the computer system, as well as the motivation and concepts needed for further study.

• Chapter 2 provides thorough coverage of the various means computers use to represent both numeric and character information. Addition, subtraction, multiplication, and division are covered once the reader has been exposed to number bases and the typical numeric representation techniques, including one's complement, two's complement, and BCD. In addition, EBCDIC, ASCII, and Unicode character representations are addressed. Fixed- and floating-point representations are also introduced. Codes for data recording and for error detection and correction are covered briefly.

• Chapter 3 is a classic presentation of digital logic and how it relates to Boolean algebra. This chapter covers both combinational and sequential logic in sufficient detail to allow the reader to understand the logical makeup of more complicated MSI (medium-scale integration) circuits, such as decoders. More complex circuits, such as buses and memory, are also included. Optimization and Kmaps are included in a special "Focus On" section.

• Chapter 4 illustrates basic computer organization and introduces many fundamental concepts, including the fetch-decode-execute cycle, the data path, clocks and buses, register transfer notation, and, of course, the CPU. A very simple architecture, MARIE, and its ISA are presented to allow the reader to gain a full understanding of the basic architectural organization involved in program execution. MARIE exhibits the classic von Neumann design and includes a program counter, an accumulator, an instruction register, 4096 bytes of memory, and two addressing modes. Assembly language programming is introduced to reinforce the concepts of instruction format, addressing mode, data format, and control that were presented earlier. This is not an assembly language textbook and is not designed to provide a practical course in assembly language programming; the main objective in introducing assembly is to deepen the understanding of computer architecture in general. However, a simulator for MARIE is provided so that assembly language programs can be written, assembled, and run on the MARIE architecture. The two methods of control, hardwired and microprogrammed, are introduced and compared in this chapter. Finally, Intel and MIPS architectures are compared to reinforce the concepts of the chapter.

• Chapter 5 provides a closer look at instruction set architectures, including instruction formats, instruction types, and addressing modes. Instruction-level


Page 14:
pipelining is introduced as well. Real-world ISAs (including Intel, MIPS, and Java) are presented to reinforce the concepts of the chapter.

• Chapter 6 covers the memory hierarchy, including cache and virtual memory. This chapter gives a thorough presentation of direct mapping, associative mapping, and set-associative mapping techniques for cache. It also provides a detailed look at overlays, paging and segmentation, TLBs, and the various algorithms and devices associated with each. A tutorial and simulator for this chapter are available on the book's website.

• Chapter 7 provides a detailed overview of I/O fundamentals, bus communication and protocols, and typical external storage devices, such as magnetic and optical disks, as well as the various formats available for each. DMA, programmed I/O, and interrupts are covered as well. In addition, various techniques for exchanging information between devices are introduced. RAID architectures are covered in detail, and various data compression formats are presented.

• Chapter 8 discusses the various programming tools available (such as compilers and assemblers) and their relationship to the architecture of the machine on which they run. The goal of this chapter is to tie the programmer's view of a computer system to the actual hardware and architecture of the underlying machine. In addition, operating systems are introduced, but only covered in as much detail as applies to the architecture and organization of a system (such as resource use and protection, traps and interrupts, and various other services).

• Chapter 9 covers alternative architectures that have emerged in recent years. RISC, Flynn's taxonomy, parallel processors, instruction-level parallelism, multiprocessors, interconnection networks, shared memory systems, cache coherence, memory models, superscalar machines, neural networks, systolic architectures, dataflow computers, and distributed architectures are covered. Our main objective in this chapter is to help the reader realize that we are not limited to the von Neumann architecture, and to force the reader to consider performance issues, setting the stage for the next chapter.

• Chapter 10 addresses various performance analysis and management issues. The necessary mathematical preliminaries are introduced, followed by a discussion of MIPS, FLOPS, benchmarking, and various optimization issues with which a computer scientist should be familiar, including branch prediction, speculative execution, and loop optimization.

• Chapter 11 focuses on network organization and architecture, including network components and protocols. The OSI model and the TCP/IP suite are introduced in the context of the Internet. This chapter is not intended to be comprehensive; the main objective is to put computer architecture in the correct context relative to network architecture.

An appendix on data structures is provided for those situations in which students may need a brief introduction to, or review of, topics such as stacks, queues, and linked lists.


Page 15:
FIGURE P.1 Prerequisite Relationships Among Chapters

The sequencing of the chapters is such that they can be taught in the given numerical order. However, an instructor can modify the order to better fit a particular curriculum if necessary. Figure P.1 shows the prerequisite relationships that exist between various chapters.

Intended Audience

This book was originally written for an undergraduate class in computer organization and architecture for computer science majors. Although specifically directed at computer science majors, the book does not preclude its use by IS and IT majors. This book contains more than sufficient material for a typical one-semester course (14 weeks, 42 lecture hours); however, the average student cannot master all of the material in the book in a one-semester class. If the instructor


Page 16:
plans to cover all topics in detail, a two-semester sequence would be ideal. The organization is such that an instructor can cover the major topic areas at different levels of depth, depending on the experience and needs of the students. Table P.1 gives the instructor an idea of the amount of time required to cover the topics, and also lists the corresponding level of accomplishment expected for each chapter. It is our intention that this book serve as a useful reference long after the formal course is complete.

Support Materials

A textbook is a fundamental tool in learning, but its effectiveness is greatly enhanced by supplemental materials and exercises, which emphasize the major concepts, provide immediate feedback to the reader, and motivate understanding through repetition. We have therefore created the following support materials for The Essentials of Computer Organization and Architecture:

• Instructor's Manual. This manual contains answers to exercises and sample exam questions. In addition, it provides hints on teaching various concepts and trouble areas frequently encountered by students.
• Lecture slides. These slides contain lecture material appropriate for a one-semester course in computer organization and architecture.
• Figures and tables. For those who wish to prepare their own lecture materials, we provide the figures and tables in downloadable form.

Chapter | One Semester (42 Hours): Lecture Hours | Expected Level | Two Semesters (84 Hours): Lecture Hours | Expected Level
1  | 3 | Mastery     | 3  | Mastery
2  | 6 | Mastery     | 6  | Mastery
3  | 6 | Familiarity | 6  | Mastery
4  | 6 |             | 10 | Mastery
5  | 3 | Familiarity | 8  | Mastery
6  | 5 | Familiarity | 9  | Mastery
7  | 2 | Familiarity | 6  | Mastery
8  | 2 | Exposure    | 7  | Mastery
9  | 3 | Familiarity | 9  | Mastery
10 | 3 | Exposure    | 9  | Mastery
11 | 3 | Exposure    | 11 | Mastery

TABLE P.1 Suggested Class Hours


Page 17:
• Memory tutorial and simulator. This package allows students to apply the concepts of cache and virtual memory.
• MARIE simulator. This package allows students to create and run MARIE programs.
• Tutorial software. Other tutorial software is provided for various concepts in the book.
• The companion website. All related software, slides, and materials can be downloaded from the book's website: http://computerscience.jbpub.com/ECOA

The exercises, sample quiz problems, and solutions have been tested in numerous classes. The Instructor's Manual, which includes teaching hints for the various chapters, as well as answers to the exercises in the book, suggested programming assignments, and sample quiz questions, is available to instructors who adopt the book. (Please contact your Jones and Bartlett Publishers representative at 1-800-832-0034 for access to this area of the website.)

The Instructional Model: MARIE

In a computer organization and architecture book, the choice of architectural model affects the instructor as well as the students. If the model is too complicated, both the instructor and the students tend to get bogged down in details that really have no bearing on the concepts being presented in class. Real architectures, although interesting, often have too many quirks to make them usable in an introductory class. To make things even more complicated, real architectures change from day to day. In addition, it is hard to find a book incorporating a model that matches the local computing platform in a given department, keeping in mind that the platform, too, may change from year to year.

To alleviate these problems, we designed our own simple architecture, MARIE, specifically for pedagogical use. MARIE (Machine Architecture that is Really Intuitive and Easy) allows students to learn the essential concepts of computer organization and architecture, including assembly language, without getting caught up in the unnecessary and confusing details that exist in real architectures. Despite its simplicity, MARIE simulates a working system. The MARIE machine simulator, MarieSim, has a user-friendly GUI that allows students to (1) create and edit source code, (2) assemble source code into machine object code, (3) run machine code, and (4) debug programs.

Specifically, MarieSim has the following features:

• Support for the MARIE assembly language introduced in Chapter 4
• An integrated text editor for program creation and modification
• Hexadecimal machine language object code
• An integrated debugger with single-step mode, breakpoints, pause, resume, and register and memory tracing


Page 18:
• A graphical memory monitor displaying all 4096 addresses in MARIE's memory
• A graphical display of MARIE's registers
• Highlighted instructions during program execution
• User-controlled execution speed
• Status messages
• User-viewable symbol tables
• An interactive assembler that lets the user correct any errors and reassemble automatically, without changing the environment
• Online help
• Optional core dumps, allowing the user to specify the memory range
• Frame sizes that can be modified by the user
• A small learning curve, allowing students to learn the system quickly

MarieSim was written in the Java™ language so that the system would be portable to any platform for which a Java™ Virtual Machine (JVM) is available. Students of Java may wish to look at the simulator's source code, and perhaps even offer improvements or enhancements to its simple functions.

Figure P.2 shows the graphical environment of the MARIE machine simulator. The screen consists of four parts: the menu bar, the central monitor area, the memory monitor, and the message area.

FIGURE P.2 The MarieSim Graphical Environment


Page 19:
Menu options allow the user to control the actions and behavior of the MARIE machine simulator system. These options include loading, starting, stopping, setting breakpoints, and pausing programs that have been written in MARIE assembly language.

The MARIE simulator illustrates the process of assembling, loading, and running, all in one simple environment. Users can see assembly language statements directly from their programs, along with the corresponding machine code (hexadecimal) equivalents. The addresses of these instructions are indicated as well, and users can view any portion of memory at any time. Highlighting is used to indicate the initial loading address of a program, as well as the instruction currently being executed while a program runs. The graphical display of the registers and memory allows the student to see how the instructions cause the values in the registers and memory to change.

If You Find an Error

We have tried to make this book as technically accurate as possible, but even though the manuscript has been through numerous proofreadings, errors have a way of escaping detection. We would greatly appreciate hearing from readers who find errors that need correcting. Your comments and suggestions are always welcome; please send an email to ECOA@jbpub.com.

Credits and Acknowledgments

Few books are entirely the result of the efforts of one or two people, and this one is no exception. We now realize that writing a textbook is a formidable task, possible only through a concerted effort, and it is impossible for us to adequately thank everyone who made this book possible. If, in the following acknowledgments, we inadvertently omit anyone, we humbly apologize.

A special thanks goes to the classes that used preliminary versions of this textbook for their tolerance and diligence in finding errors. Several people read the manuscript in detail and provided useful suggestions. In particular, we would like to thank Mary Creel and Hans Royer. We would also like to acknowledge the reviewers who gave their time and energy to provide detailed comments, including Robert Franks (Central College); Karam Mossaad (The University of Texas at Austin); Michael Schulte (University of Missouri, St. Louis); Peter Smith (CSU Northridge); and Xiaobo Zhou (Wayne State University). A special thanks goes to Karishma Rao for her time and effort in producing a quality memory software module.


Page 20:
The editorial team at Jones and Bartlett has been wonderful to work with, and each member deserves a special thanks, including Amy Rose, Theresa DiDonato, Nathan Schultz, and J. Michael Stranz.

I, Linda Null, would like to personally thank my husband, Tim Wahls, for his patience while living life as a "book widower," for listening and commenting with candor on the book's contents, for doing such an amazing job with all of the cooking, and for putting up with the almost daily sacrifices necessary to write this book. I consider myself amazingly lucky to be married to such a wonderful man. I also express my sincere gratitude to my mentor, Merry McDonald, who taught me the value and joy of learning and teaching, and of doing both with integrity. Finally, I would like to express my deepest gratitude to Julia Lobur, as without her, this book and its accompanying software would not be a reality.

I, Julia Lobur, owe my work on this book to the patience and faithfulness of my partner. She nourished my body through her culinary delights and my spirit through her wisdom. She sacrificed in many ways while working hard toward her own advanced degree. I would also like to express my deepest gratitude to Linda Null: above all, for her unparalleled devotion to the field of computer science education and her dedication to her students, and consequently, for giving me the opportunity to share with her the ineffable experience of textbook authorship.


Page 22:
Contents

CHAPTER 1  Introduction
1.1  Overview
1.2  The Main Components of a Computer
1.3  An Example System: Wading through the Jargon
1.4  Standards Organizations
1.5  Historical Development
     1.5.1  Generation Zero: Mechanical Calculating Machines (1642–1945)
     1.5.2  The First Generation: Vacuum Tube Computers (1945–1953)
     1.5.3  The Second Generation: Transistorized Computers (1954–1965)
     1.5.4  The Third Generation: Integrated Circuit Computers (1965–1980)
     1.5.5  The Fourth Generation: VLSI Computers (1980–????)
1.6  The Computer Level Hierarchy
1.7  The von Neumann Model
1.8  Non-von Neumann Models
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises


Page 23:
CHAPTER 2  Data Representation in Computer Systems
2.1  Introduction
2.2  Positional Numbering Systems
2.3  Decimal to Binary Conversions
     2.3.1  Converting Unsigned Whole Numbers
     2.3.2  Converting Fractions
     2.3.3  Converting between Power-of-Two Radices
2.4  Signed Integer Representation
     2.4.1  Signed Magnitude
     2.4.2  Complement Systems
2.5  Floating-Point Representation
     2.5.1  A Simple Model
     2.5.2  Floating-Point Arithmetic
     2.5.3  Floating-Point Errors
     2.5.4  The IEEE-754 Floating-Point Standard
2.6  Character Codes
     2.6.1  Binary-Coded Decimal
     2.6.2  EBCDIC
     2.6.3  ASCII
     2.6.4  Unicode
2.7  Codes for Data Recording and Transmission
     2.7.1  Non-Return-to-Zero Code
     2.7.2  Non-Return-to-Zero-Invert Encoding
     2.7.3  Phase Modulation (Manchester Coding)
     2.7.4  Frequency Modulation
     2.7.5  Run-Length-Limited Code
2.8  Error Detection and Correction
     2.8.1  Cyclic Redundancy Check
     2.8.2  Hamming Codes
     2.8.3  Reed-Solomon
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises


Page 24:
CHAPTER 3  Boolean Algebra and Digital Logic
3.1  Introduction
3.2  Boolean Algebra
     3.2.1  Boolean Expressions
     3.2.2  Boolean Identities
     3.2.3  Simplification of Boolean Expressions
     3.2.4  Complements
     3.2.5  Representing Boolean Functions
3.3  Logic Gates
     3.3.1  Symbols for Logic Gates
     3.3.2  Universal Gates
     3.3.3  Multiple Input Gates
3.4  Digital Components
     3.4.1  Digital Circuits and Their Relationship to Boolean Algebra
     3.4.2  Integrated Circuits
3.5  Combinational Circuits
     3.5.1  Basic Concepts
     3.5.2  Examples of Typical Combinational Circuits
3.6  Sequential Circuits
     3.6.1  Basic Concepts
     3.6.2  Clocks
     3.6.3  Flip-Flops
     3.6.4  Examples of Sequential Circuits
3.7  Designing Circuits
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Focus on Karnaugh Maps
     3A.1  Introduction
     3A.2  Description of Kmaps and Terminology
     3A.3  Kmap Simplification for Two Variables
     3A.4  Kmap Simplification for Three Variables
     3A.5  Kmap Simplification for Four Variables
     3A.6  Don't Care Conditions
     3A.7  Summary
     Exercises


Page 25:
CHAPTER 4  MARIE: An Introduction to a Simple Computer
4.1  Introduction
     4.1.1  CPU Basics and Organization
     4.1.2  The Bus
     4.1.3  Clocks
     4.1.4  The Input/Output Subsystem
     4.1.5  Memory Organization and Addressing
     4.1.6  Interrupts
4.2  MARIE
     4.2.1  The Architecture
     4.2.2  Registers and Buses
     4.2.3  The Instruction Set Architecture
     4.2.4  Register Transfer Notation
4.3  Instruction Processing
     4.3.1  The Fetch-Decode-Execute Cycle
     4.3.2  Interrupts and I/O
4.4  A Simple Program
4.5  A Discussion on Assemblers
     4.5.1  What Do Assemblers Do?
     4.5.2  Why Use Assembly Language?
4.6  Extending Our Instruction Set
4.7  A Discussion on Decoding: Hardwired vs. Microprogrammed Control
4.8  Real-World Examples of Computer Architectures
     4.8.1  Intel Architectures
     4.8.2  MIPS Architectures
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises

CHAPTER 5  A Closer Look at Instruction Set Architectures
5.1  Introduction
5.2  Instruction Formats
     5.2.1  Design Decisions for Instruction Sets


Page 26:
     5.2.2  Little versus Big Endian
     5.2.3  Internal Storage in the CPU: Stacks versus Registers
     5.2.4  Number of Operands and Instruction Length
5.3  Instruction Types
5.4  Addressing
5.5  Instruction-Level Pipelining
5.6  Real-World Examples of ISAs
     5.6.1  Intel
     5.6.2  MIPS
     5.6.3  Java Virtual Machine
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises

CHAPTER 6  Memory
6.1  Memory
6.2  Types of Memory
6.3  The Memory Hierarchy
     6.3.1  Locality of Reference
6.4  Cache Memory
     6.4.1  Cache Mapping Schemes
     6.4.2  Replacement Policies
     6.4.3  Effective Access Time and Hit Ratio
     6.4.4  When Does Caching Break Down?
6.5  Virtual Memory
     6.5.1  Paging
     6.5.2  Effective Access Time Using Paging
     6.5.3  Putting It All Together: Using Cache, TLBs, and Paging
     6.5.4  Advantages and Disadvantages of Paging and Virtual Memory
     6.5.5  Segmentation
     6.5.6  Paging Combined with Segmentation


Page 27:
6.6  A Real-World Example of Memory Management
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises

CHAPTER 7  Input/Output and Storage Systems
7.1  Introduction
7.2  Amdahl's Law
7.3  I/O Architectures
     7.3.1  I/O Control Methods
     7.3.2  I/O Bus Operation
     7.3.3  Another Look at Interrupt-Driven I/O
7.4  Magnetic Disk Technology
     7.4.1  Hard Disk Drives
     7.4.2  Floppy Disks
7.5  Optical Disks
     7.5.1  CD-ROM
     7.5.2  DVD
     7.5.3  Optical Disk Recording Methods
7.6  Magnetic Tape
7.7  RAID
     7.7.1  RAID Level 0
     7.7.2  RAID Level 1
     7.7.3  RAID Level 2
     7.7.4  RAID Level 3
     7.7.5  RAID Level 4
     7.7.6  RAID Level 5
     7.7.7  RAID Level 6
     7.7.8  Hybrid RAID Systems
7.8  Data Compression
     7.8.1  Statistical Coding
     7.8.2  Ziv-Lempel (LZ) Dictionary Systems
     7.8.3  GIF Compression
     7.8.4  JPEG Compression


Page 28:
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Focus on Selected Disk Storage Implementations
     7A.1  Introduction
     7A.2  Data Transmission Modes
     7A.3  SCSI
     7A.4  Storage Area Networks
     7A.5  Other I/O Connections
     7A.6  Summary
     Exercises

CHAPTER 8  System Software
8.1  Introduction
8.2  Operating Systems
     8.2.1  Operating Systems History
     8.2.2  Operating System Design
     8.2.3  Operating System Services
8.3  Protected Environments
     8.3.1  Virtual Machines
     8.3.2  Subsystems and Partitions
     8.3.3  Protected Environments and the Evolution of Systems Architectures
8.4  Programming Tools
     8.4.1  Assemblers and Assembly
     8.4.2  Link Editors
     8.4.3  Dynamic Link Libraries
     8.4.4  Compilers
     8.4.5  Interpreters
8.5  Java: All of the Above
8.6  Database Software
8.7  Transaction Managers
Chapter Summary
Further Reading


Page 29:
References
Review of Essential Terms and Concepts
Exercises

CHAPTER 9  Alternative Architectures
9.1  Introduction
9.2  RISC Machines
9.3  Flynn's Taxonomy
9.4  Parallel and Multiprocessor Architectures
     9.4.1  Superscalar and VLIW
     9.4.2  Vector Processors
     9.4.3  Interconnection Networks
     9.4.4  Shared Memory Multiprocessors
     9.4.5  Distributed Computing
9.5  Alternative Parallel Processing Approaches
     9.5.1  Dataflow Computing
     9.5.2  Neural Networks
     9.5.3  Systolic Arrays
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises

CHAPTER 10  Performance Measurement and Analysis
10.1  Introduction
10.3  Mathematical Preliminaries
     10.3.1  What Does the Mean Mean?
     10.3.2  Statistics and Semantics
10.4  Benchmarking
     10.4.1  Clock Frequency, MIPS, and FLOPS
     10.4.2  Synthetic Benchmarks: Whetstone, Linpack, and Dhrystone
     10.4.3  Standard Performance Evaluation Corporation Benchmarks
     10.4.4  Transaction Performance Council Benchmarks
     10.4.5  System Simulation


Page 30:
10.5  CPU Performance Optimization
     10.5.1  Branch Optimization
     10.5.2  Using Good Algorithms and Simple Code
10.6  Disk Performance
     10.6.1  Understanding the Problem
     10.6.2  Physical Considerations
     10.6.3  Logical Considerations
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises

CHAPTER 11  Network Organization and Architecture
11.1  Introduction
11.2  Early Commercial Computer Networks
11.3  Early Academic and Scientific Networks: The Roots and Architecture of the Internet
11.4  Network Protocols I: ISO/OSI Protocol Unification
     11.4.1  A Parable
     11.4.2  The OSI Reference Model
11.5  Network Protocols II: TCP/IP Network Architecture
     11.5.1  The IP Layer for Version 4
     11.5.2  The Problem with IP Version 4
     11.5.3  Transmission Control Protocol
     11.5.4  The TCP Protocol in Action
     11.5.5  IP Version 6
11.6  Network Organization
     11.6.1  Physical Transmission Media
     11.6.2  Interface Cards
     11.6.3  Repeaters
     11.6.4  Hubs
     11.6.5  Switches
     11.6.6  Bridges and Gateways
     11.6.7  Routers and Routing
11.7  High-Capacity Digital Links
     11.7.1  The Digital Hierarchy


Page 31:
     11.7.2  ISDN
     11.7.3  Asynchronous Transfer Mode
11.8  A Look at the Internet
     11.8.1  Internet Access
     11.8.2  Internet Expansion
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises

APPENDIX A  Data Structures and the Computer
A.1  Introduction
A.2  Fundamental Structures
     A.2.1  Arrays
     A.2.2  Queues and Linked Lists
     A.2.3  Stacks
A.3  Trees
A.4  Network Graphs
Summary
Further Reading
References
Exercises

Glossary
Answers and Hints for Selected Exercises
Index


Page 32:
"Computing is not about computers anymore. It is about living. . . . We have seen computers move out of giant air-conditioned rooms, into closets, then onto desktops, and now into our laps and pockets. But this is not the end. . . . Like a force of nature, the digital age cannot be denied or stopped. . . . The information superhighway may be mostly hype today, but it is an understatement about tomorrow. It will exist beyond people's wildest predictions. . . . We are not waiting for any invention. It is here. It is now. It is almost genetic in its nature, in that each generation will become more digital than the preceding one."
—Nicholas Negroponte

CHAPTER 1  Introduction

1.1 OVERVIEW

Dr. Negroponte is one of many who see the computer revolution as a force of nature. This force has the potential to carry humanity to its digital destiny, allowing us to conquer problems that have eluded us for centuries, as well as all of the problems that emerge as we solve the original problems. Computers have freed us from the tedium of mundane tasks, unleashing our collective creative potential so that we can, of course, build bigger and better computers.

As we observe the profound scientific and social changes that computers have brought us, it is easy to start feeling overwhelmed by the complexity of it all. This complexity, however, emanates from concepts that are fundamentally very simple. These simple ideas are the ones that have brought us to where we are today, and they are the foundation for the computers of the future. How far into the future they will survive is anyone's guess. But today, they are the foundation for all of computing as we know it.

Computer scientists are usually more concerned with writing complex program algorithms than with designing computer hardware. Of course, if we want our algorithms to be useful, a computer eventually has to run them. Some algorithms are so complicated that they would take too long to run on today's systems. These kinds of algorithms are considered computationally infeasible. Certainly, at the current rate of innovation, some things that are infeasible today could be feasible tomorrow, but it seems that no matter how big or fast computers become, someone will think up a problem that will push the machine to its limit.


Page 33:
To understand why an algorithm is infeasible, or to understand why the implementation of a feasible algorithm is running too slowly, you must be able to see the program from the computer's point of view. You must understand what makes a computer system tick before you can attempt to optimize the programs that it runs. Attempting to optimize a computer system without first understanding it is like trying to tune your car by pouring an elixir into the gas tank: you'll be lucky if it runs at all when you're finished.

Program optimization and system tuning are perhaps the most important motivations for learning how computers work. There are, however, many other reasons. For example, if you want to write compilers, you must understand the hardware environment within which the compiler will function. The best compilers leverage particular hardware features (such as pipelining) for greater speed and efficiency. If you ever need to model large, complex, real-world systems, you need to know how floating-point arithmetic is supposed to work as well as how it actually works in practice. If you wish to design peripheral equipment, or the software that drives peripheral equipment, you must know every detail of how a particular computer deals with its input/output (I/O). If your work involves embedded systems, you need to know that these systems usually have limited resources. Your understanding of time, space, and price trade-offs, as well as I/O architectures, will be essential to your career.

People who conduct research involving hardware systems, networks, or algorithms find benchmarking techniques crucial to their day-to-day work. Technical managers in charge of buying hardware also use benchmarks to help them buy the best system for a given amount of money, keeping in mind the ways in which performance benchmarks can be manipulated to imply results favorable to particular systems.

The preceding examples illustrate the idea that there is a fundamental relationship between computer hardware and many aspects of programming and software components in computer systems. Therefore, regardless of our area of expertise, as computer scientists it is imperative that we understand how hardware and software interact. We must become familiar with how various circuits and components fit together to create working computer systems. We do this through the study of computer organization. Computer organization addresses issues such as control signals (how the computer is controlled), signaling methods, and memory types. It encompasses all physical aspects of computer systems. It helps us to answer the question: How does a computer work?

The study of computer architecture, on the other hand, focuses on the structure and behavior of the computer system and refers to the logical aspects of system implementation as seen by the programmer. Computer architecture includes many elements, such as instruction sets and formats, operation codes, data types, the number and types of registers, addressing modes, main memory access methods, and various I/O mechanisms. The architecture of a system directly affects the logical execution of programs. Studying computer architecture helps us to answer the question: How do I design a computer?
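Earlier in this section we noted that you need to know how floating-point arithmetic is supposed to work as well as how it really works in practice. The short Python sketch below is our own illustration of that gap (it is not part of the original text); it shows the kind of rounding behavior that surprises programmers who assume decimal arithmetic:

```python
# Decimal fractions such as 0.1 usually have no exact binary floating-point
# representation, so arithmetic that is exact on paper picks up rounding error.
total = 0.1 + 0.2
print(total)                    # 0.30000000000000004, not 0.3
print(total == 0.3)             # False

# Repeated accumulation lets the error grow: adding 0.1 ten thousand times
# should give exactly 1000, but the floating-point sum drifts slightly.
accumulator = 0.0
for _ in range(10_000):
    accumulator += 0.1
print(accumulator)              # approximately 1000.0000000001588
```

Chapter 2 examines why this happens when it covers floating-point representation and floating-point errors.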


Page 34:
The computer architecture for a given machine is the combination of its hardware components plus its instruction set architecture (ISA). The ISA is the agreed-upon interface between all of the software that runs on the machine and the hardware that executes it. The ISA allows you to talk to the machine.

The distinction between computer organization and computer architecture is not clear-cut. People in the fields of computer science and computer engineering hold differing opinions as to exactly which concepts pertain to computer organization and which pertain to computer architecture. In fact, neither computer organization nor computer architecture can stand alone. They are interrelated and interdependent. We can truly understand each of them only after we comprehend both of them. Our comprehension of computer organization and architecture ultimately leads to a deeper understanding of computers and computation: the heart and soul of computer science.

The ideas of computer organization and architecture are so closely intertwined that it is impossible to say where hardware issues end and software issues begin. Computer scientists design algorithms that usually are implemented as programs written in some computer language, such as Java or C. But what makes the algorithm run? Another algorithm, of course! And another algorithm runs that algorithm, and so on, until you get down to the machine level, which can be thought of as an algorithm implemented as an electronic device. Thus, modern computers are actually implementations of algorithms that execute other algorithms. This chain of nested algorithms leads us to the following principle:

Principle of Equivalence of Hardware and Software: Anything that can be done with software can also be done with hardware, and anything that can be done with hardware can also be done with software.

This principle tells us that a special-purpose computer can be designed to perform any task, such as word processing, budget analysis, or playing a friendly game of Tetris. Accordingly, programs can be written to carry out the functions of special-purpose computers, such as the embedded systems situated in your car or microwave oven. There are times when a simple embedded system gives us much better performance than a complicated computer program, and there are times when a program is the preferred approach. The Principle of Equivalence of Hardware and Software tells us that we have a choice. Our knowledge of computer organization and architecture will help us to make the best choice.
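As a concrete illustration of the Principle of Equivalence of Hardware and Software, the Python sketch below (our own example, not taken from the text) performs integer multiplication, an operation normally handled by a dedicated multiplier circuit, entirely in software using nothing but shifts and adds:

```python
def multiply_shift_add(multiplicand: int, multiplier: int) -> int:
    """Multiply two non-negative integers using only shifts and adds,
    mirroring what a simple hardware shift-and-add multiplier does."""
    product = 0
    while multiplier > 0:
        if multiplier & 1:            # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1            # shift left = multiply by 2
        multiplier >>= 1              # shift right = examine the next bit
    return product

# The software routine and the built-in (hardware-backed) multiply agree.
print(multiply_shift_add(13, 11))     # 143
print(13 * 11)                        # 143
```

Whether to build the circuit or write the program is exactly the kind of trade-off, involving performance, cost, and flexibility, that the principle says we are free to make.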


Page 35:
1.2 THE MAIN COMPONENTS OF A COMPUTER

We begin our discussion of computer hardware by looking at the components necessary to build a computing system. At the most basic level, a computer is a device consisting of three pieces:

1. A processor to interpret and execute programs
2. A memory to store both data and programs
3. A mechanism for transferring data to and from the outside world

We discuss these three components in detail as they relate to computer hardware in subsequent chapters. Once you understand a computer in terms of its component parts, you should be able to understand what a system is doing at any moment and how, like you, it could change its behavior if you wished. You might even feel you have a few things in common with it. This idea is not as far-fetched as it seems. Consider how a student sitting in a classroom exhibits the three components of a computer: the student's brain is the processor, the notes being taken represent the memory, and the pencil or pen used to take those notes is the I/O mechanism. But keep in mind that your capabilities far exceed those of any computer in the world today, or any that may be built in the foreseeable future.

1.3 AN EXAMPLE SYSTEM: WADING THROUGH THE JARGON

This book introduces you to some of the vocabulary that is particular to computer science. This jargon can be confusing, imprecise, and intimidating, but we believe that with a little explanation we can clear the fog. For the sake of discussion, we present a facsimile computer advertisement (see Figure 1.1). The ad is typical of many in that it bombards the reader with phrases such as "64MB SDRAM," "64-bit PCI sound card," and "32KB L1 cache." Without a grasp of such terminology, it is hard to know whether the advertised system is a wise buy, or even whether it can meet your needs. As we progress through this book, you will learn the concepts behind these terms.

Before we explain the ad, however, we need to discuss something even more basic: the measurement terminology you will encounter throughout your study of computers. Every field seems to have its own way of measuring things, and the field of computing is no exception. For people who work with computers to tell one another how large or how fast something is, they must use the same units of measure. When we talk about the size of something in a computer, we speak in terms of thousands, millions, billions, or trillions of characters; the prefixes for these terms are given on the left side of Figure 1.2. In computer systems, as you will see, powers of 2 are often more important than powers of 10, but powers of 10 are easier for people to reason about, so the prefixes are defined in terms of both. Because 1,000 is close in value to 2^10 (1,024), we can approximate powers of 10 by powers of 2. Prefixes used in system metrics are often applied when the underlying base system is base 2, not base 10.


Page 36:
FIGURE 1.1 A typical computer advertisement

FOR SALE: OBSOLETE COMPUTER - CHEAP! CHEAP! CHEAP!
• Pentium III 667 MHz
• 133 MHz 64MB SDRAM
• 32KB L1 cache, 256KB L2 cache
• 30GB EIDE hard drive (7200 RPM)
• 48X max variable CD-ROM
• 2 USB ports, 1 serial port, 1 parallel port
• 19" monitor, 0.24 mm AG, 1280 × 1024 @ 85 Hz
• Intel 3D AGP graphics card
• 56K PCI voice modem
• 64-bit PCI sound card

FIGURE 1.2 Common prefixes associated with computer organization and architecture

Kilo (K):  1 thousand       = 10^3   ≈ 2^10        Milli (m): 1 thousandth    = 10^-3   ≈ 2^-10
Mega (M):  1 million        = 10^6   ≈ 2^20        Micro (µ): 1 millionth     = 10^-6   ≈ 2^-20
Giga (G):  1 billion        = 10^9   ≈ 2^30        Nano (n):  1 billionth     = 10^-9   ≈ 2^-30
Tera (T):  1 trillion       = 10^12  ≈ 2^40        Pico (p):  1 trillionth    = 10^-12  ≈ 2^-40
Peta (P):  1 quadrillion    = 10^15  ≈ 2^50        Femto (f): 1 quadrillionth = 10^-15  ≈ 2^-50

For example, a kilobyte (1 KB) of memory is typically 1,024 bytes of memory rather than 1,000 bytes of memory. However, a 1 GB disk drive may actually hold 1 billion bytes rather than 2^30 (approximately 1.07 billion) bytes. You should always read the manufacturer's fine print just to make sure you know exactly what 1K, 1KB, or 1GB represents.

When we want to talk about how fast something is, we speak in terms of fractions of a second, usually thousandths, millionths, billionths, or trillionths. The prefixes for these metrics are given on the right-hand side of Figure 1.2. Notice that the fractional prefixes have exponents that are the reciprocals of the prefixes on the left side of the figure. So if someone tells you that an operation requires a microsecond to complete, you should also understand that a million of those operations can take place in one second. When you need to talk about how many of these things happen in a second, you use the prefix mega-; when you need to talk about how quickly the operations themselves are performed, you use the prefix micro-.
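The difference between the two readings of these prefixes is easy to tabulate. The short sketch below (ours) prints both interpretations and applies them to the 30 GB drive from the ad; the exact capacity of any real drive depends on the manufacturer's fine print.

```python
# Powers of 10 versus powers of 2 for the prefixes in Figure 1.2.
for n, name in enumerate(["kilo", "mega", "giga", "tera"], start=1):
    base10 = 10 ** (3 * n)        # the "marketing" (SI) interpretation
    base2 = 2 ** (10 * n)         # the usual memory interpretation
    print(f"{name:>4}: 10^{3*n:<2} = {base10:>16,}   2^{10*n} = {base2:>16,}")

# The ad's "30 GB" drive, read both ways:
print(30 * 10**9, "bytes (powers of 10)")
print(30 * 2**30, "bytes (powers of 2), roughly 7% more")
```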


Page 37:
Now to explain the ad: The microprocessor is the part of the computer that actually executes program instructions; it is the brain of the system. The microprocessor in the ad is a Pentium III, running at 667 MHz. Every computer system contains a clock that keeps the system synchronized. The clock sends electrical pulses simultaneously to all major components, ensuring that data and instructions are where they are supposed to be, when they are supposed to be there. The number of pulses emitted each second by the clock is its frequency. Clock frequencies are measured in cycles per second, or hertz. Because computer system clocks generate millions of pulses per second, they are said to operate in the megahertz (MHz) range; many computers today operate in the gigahertz range, generating billions of pulses per second. And because nothing gets done in a computer system without microprocessor involvement, the clock rate of the microprocessor is crucial to overall system speed. The microprocessor of the system in our ad operates at 667 million cycles per second, so the vendor says it runs at 667 MHz, or, equivalently, that each clock cycle takes about 1.5 nanoseconds. Later in this book you will see that each computer instruction requires a fixed number of cycles to execute. Some instructions require one clock cycle, but most require more than one. The number of instructions per second that a microprocessor can actually execute is therefore proportional to its clock speed. The number of clock cycles required to carry out a particular machine instruction is a function of both the machine's organization and its architecture.

The next thing we see in the ad is "133 MHz 64MB SDRAM." The 133 MHz refers to the speed of the system bus, the group of wires that moves data and instructions to various places within the computer. Like the microprocessor, the speed of the bus is also measured in MHz. Many computers have a special local bus for data that supports very fast transfer speeds (such as those required by video). This local bus is a high-speed pathway that connects memory directly to the processor. Bus speed ultimately sets the upper limit on the information-carrying capacity of the system.

The system in our ad also has a memory capacity of 64 megabytes (MB), or about 64 million characters. Memory capacity determines not only the size of the programs you can run, but also how many programs you can run at the same time without bogging down the system. Your application or operating system manufacturer will usually recommend how much memory you need to run its products. (Sometimes these recommendations can be hilariously conservative, so be careful whom you believe!)

In addition to memory size, our advertised system provides a memory type, SDRAM, short for synchronous dynamic random access memory. SDRAM is much faster than conventional (nonsynchronous) memory because it can synchronize itself with a microprocessor's bus. So far, SDRAM bus synchronization is possible only with buses running at 200 MHz or below; newer memory technologies such as RDRAM (Rambus DRAM) and SLDRAM (SyncLink DRAM) are required for systems running faster buses.
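A quick check of the ad's arithmetic, using only the numbers quoted above; the three-cycle instruction at the end is a made-up example, since real cycle counts vary by instruction and by machine.

```python
# 667 million clock pulses per second => the length of one clock cycle.
clock_hz = 667e6
cycle_ns = 1 / clock_hz * 1e9
print(f"one clock cycle: {cycle_ns:.2f} ns")            # about 1.50 ns

# A hypothetical instruction needing 3 cycles would tie up the processor for:
print(f"a 3-cycle instruction: {3 * cycle_ns:.2f} ns")  # about 4.50 ns
```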


Page 38:
A Look Inside a Computer

Have you ever wondered what the inside of a computer really looks like? The example computer described in this section gives a good overview of the components of a modern PC. However, opening a computer and attempting to find and identify the various pieces can be frustrating, even if you are familiar with the components and their functions. (Photo courtesy of Intel Corporation.)

If you remove the cover of your computer, you will no doubt first notice a big metal box with a fan attached: this is the power supply. You will also see various drives, including a hard drive and perhaps a floppy drive and a CD-ROM or DVD drive. There are many integrated circuits, small black rectangular boxes with legs attached. You will also notice electrical pathways, or buses, in the system. There are printed circuit boards (expansion cards) that plug into sockets on the motherboard, the large board at the bottom of a standard desktop PC, or along one side of a PC configured as a tower or mini-tower. The motherboard is the printed circuit board that connects all the components of the


Page 39 :
computer, including the CPU, RAM, and ROM, as well as an assortment of other essential components. The components on the motherboard tend to be the most difficult to identify. Above, you can see an Intel D850 motherboard with the most important components labeled. The I/O ports at the top of the board allow the computer to communicate with the outside world. The I/O controller hub allows all connected devices to function without conflict. The PCI slots accept expansion cards belonging to various PCI devices. The AGP connector is where the AGP graphics card attaches. There are two banks of RAM and a memory controller hub. There is no processor plugged into this motherboard, but we can see the socket where the CPU would be placed. All computers have an internal battery, as can be seen in the lower left corner. This motherboard has two IDE connector slots and a floppy-drive controller. The power supply plugs into the power connector.

A word of caution about looking inside the case: there are many safety considerations, for both you and your computer, involved in removing the cover, and there are many things you can do to minimize the risks. First and foremost, make sure the computer is turned off. Leaving it plugged in is often preferred, as this provides a path to ground for static electricity. Before opening your computer and touching anything inside, make sure you are properly grounded so static electricity will not damage any components. Many of the edges, both on the cover and on the circuit boards, can be sharp, so take care when handling the various pieces. Trying to jam misaligned cards into sockets can damage both the card and the motherboard, so be careful if you decide to add a new card or remove and reinstall an existing one.

The next line of the ad, "32KB L1 cache, 256KB L2 cache," also describes a type of memory. In Chapter 6 you will learn that no matter how fast a bus is, it still takes "a while" to get data from memory to the processor. To provide even faster access to data, many systems contain a special memory called cache. The system in our ad has two kinds of cache. Level 1 (L1) cache is a small, fast memory built into the microprocessor chip that helps speed up access to frequently used data. Level 2 (L2) cache is a collection of fast, built-in memory chips situated between the microprocessor and main memory. Notice that the cache in our system has a capacity of kilobytes (KB), which is much smaller than main memory. In Chapter 6 you will learn how cache works and why a bigger cache isn't always better.

On the other hand, everyone agrees that the more fixed-disk capacity you have, the better off you are. The advertised system has 30 GB, which is fairly impressive. The storage capacity of a fixed (or hard) disk is not the only thing to consider, however; a large disk isn't much help if it is too slow for its host system. The computer in our ad has a hard drive that rotates at 7200 RPM (revolutions per minute). To the knowledgeable reader, this indicates (but does not state outright) that this is a reasonably fast drive.
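Part of what "7200 RPM" implies can be worked out directly. The sketch below (ours) computes the average rotational delay, the time spent waiting for the desired sector to swing under the read/write head; it deliberately ignores seek time and transfer time.

```python
# Average rotational delay for the advertised 7200 RPM drive: on average,
# the wanted sector is half a revolution away from the read/write head.
rpm = 7200
one_revolution_ms = 60_000 / rpm          # 60,000 ms in a minute
average_delay_ms = one_revolution_ms / 2

print(f"one revolution:           {one_revolution_ms:.2f} ms")  # ~8.33 ms
print(f"average rotational delay: {average_delay_ms:.2f} ms")   # ~4.17 ms
```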


Page 40:
Disk speeds are typically quoted as the number of milliseconds required (on average) to access data on the disk, in addition to the disk's rotational speed. Rotational speed is only one of the factors that determine a disk's overall performance; the way it connects to, or interfaces with, the rest of the system also matters. The advertised system uses a disk interface called EIDE, or enhanced integrated drive electronics. EIDE is a cost-effective hardware interface for mass storage devices. It contains special circuitry that allows it to enhance a computer's connectivity, speed, and memory capability. Most EIDE systems share the main system bus with the processor and memory, so the movement of data to and from the disk also depends on the speed of the system bus.

Whereas the system bus is responsible for all data movement internal to the computer, ports allow data to move to and from devices external to the computer. Our ad speaks of three different ports with the line "2 USB ports, 1 serial port, 1 parallel port." Most desktop computers come with two kinds of data ports: serial ports and parallel ports. Serial ports transfer data by sending a series of electrical pulses across one or two data lines. Parallel ports use at least eight data lines, which are energized simultaneously to transmit data. Our advertised system also comes equipped with a special serial connection called a USB (universal serial bus) port. USB is a popular external bus that supports Plug-and-Play (the ability to configure devices automatically) as well as hot plugging (the ability to add and remove devices while the computer is running).

Some systems augment their main bus with dedicated I/O buses. Peripheral Component Interconnect (PCI) is one such I/O bus that supports the connection of multiple peripheral devices. PCI, developed by Intel Corporation, operates at high speeds and also supports Plug-and-Play. There are two PCI devices mentioned in the ad. The PCI modem allows the computer to connect to the Internet. (We discuss modems in detail in Chapter 11.) The other PCI device is a sound card, which contains the components needed by the system's stereo speakers. You will learn more about the different kinds of I/O, I/O buses, and disk storage in Chapter 7.

After telling us about the ports in the advertised system, the ad gives us some specifications for the monitor by saying "19" monitor, .24mm AG, 1280 × 1024 @ 85 Hz." Monitors have little to do with the speed or efficiency of a computer system, but they have a great deal to do with the user's comfort. The monitor in the ad supports a refresh rate of 85 Hz, meaning the image displayed on the monitor is repainted 85 times a second. If the refresh rate is too slow, the screen may flicker or appear wavy. Eyestrain caused by a flickering screen tires people quickly; some people can even experience headaches after prolonged use. Another source of eyestrain is low resolution. A higher-resolution monitor makes for better viewing and finer graphics. Resolution is determined by the monitor's dot pitch, the distance between a dot (or pixel) and the closest dot of the same color. The smaller the dot pitch, the sharper the image. In this case, we have a dot pitch of 0.24 millimeters (mm)


Page 41:
supported by an AG (aperture grill) display. Aperture grills direct the electron beam that paints the monitor's image onto the phosphor coating inside the monitor's glass. AG monitors produce a crisper image than the older shadow-mask technology. This monitor is also compatible with an AGP (accelerated graphics port) graphics card, a graphics interface designed by Intel specifically for 3D graphics.

In light of the preceding discussion, you may wonder why the monitor's dot pitch cannot simply be made arbitrarily small to give perfect image resolution. The reason is that the refresh rate depends on the dot pitch. Refreshing 100 dots, for example, takes more time than refreshing 50 dots. A smaller dot pitch requires more dots to cover the screen, and the more dots that must be refreshed, the longer each refresh cycle takes. Experts recommend a refresh rate of at least 75 Hz, so the advertised refresh rate of 85 Hz exceeds the recommended minimum by 10 Hz, or roughly 13%.

Although we cannot delve into all of the brand-specific components available, after completing this book you should understand the concepts behind how most computer systems operate. This understanding is important for casual users as well as experienced programmers. As a user, you need to be aware of the strengths and limitations of your computer system so you can make informed decisions about applications and therefore use your system more effectively. As a programmer, you need to understand exactly how your system's hardware works so you can write effective and efficient programs. For example, something as simple as the algorithm your hardware uses to map main memory to cache, or the method used for memory interleaving, can have a tremendous effect on your decision to access array elements in row-major or column-major order, as the sketch below illustrates.
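The row-versus-column point can be demonstrated with a short experiment. The sketch below (ours) assumes NumPy is available and relies on its default row-major storage; the matrix size is arbitrary, and the measured gap will vary with the cache hierarchy of the machine running it.

```python
# Summing a matrix by rows (the order it is stored in) versus by columns.
import time
import numpy as np

a = np.random.rand(5000, 5000)           # stored row-major by default

start = time.perf_counter()
total_rows = sum(a[i, :].sum() for i in range(a.shape[0]))
row_time = time.perf_counter() - start

start = time.perf_counter()
total_cols = sum(a[:, j].sum() for j in range(a.shape[1]))
col_time = time.perf_counter() - start

print(f"row-order sum:    {row_time:.3f} s")
print(f"column-order sum: {col_time:.3f} s")   # usually noticeably slower
```

Both loops compute the same total; only the order in which memory is touched differs, which is exactly the kind of hardware detail the paragraph above points at.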


Page 42:
In the remainder of this chapter we investigate both large and small computers. Large computers include mainframes (enterprise-class servers) and supercomputers; small computers include personal systems, workstations, and handheld devices. We will see that despite their differences in scale, these systems have much in common. We also visit some architectures that lie outside what is now the computing mainstream. We hope the knowledge you gain from this book will serve as a springboard for your continuing studies in the vast and exciting fields of computer organization and architecture.

1.4 STANDARDS ORGANIZATIONS

Suppose you decide you would like to have one of those new 0.28mm dot pitch AG monitors. You figure you can shop around a bit to find the best price. You make a few phone calls, surf the Web, and drive around town until you find the one that gives you the most for your money. From experience, you know you can buy your monitor anywhere and it will probably work fine on your system. You can make this assumption because computer equipment manufacturers have agreed to comply with connectivity and operational specifications established by a number of government and industry organizations.

Some of these standards-setting organizations are ad hoc trade associations or consortia made up of industry leaders. Manufacturers know that by establishing common guidelines for a particular type of equipment, they can market their products to a wider audience than if they adhered to separate, and possibly incompatible, specifications. Other standards organizations are formally chartered and internationally recognized as the definitive authorities in certain areas of electronics and computing. As you continue your studies in computer organization and architecture, you will encounter specifications formulated by these groups, so you should know something about them.

The Institute of Electrical and Electronics Engineers (IEEE) is an organization dedicated to the advancement of the professions of electronic and computer engineering. The IEEE actively promotes the interests of the worldwide engineering community by publishing an array of technical literature. It also sets standards for various computer components, signaling protocols, and data representation, to name only a few areas of its involvement. The IEEE has a democratic, albeit convoluted, procedure for establishing new standards. Its final documents are well respected and usually stand for several years before requiring revision.

The International Telecommunication Union (ITU) is based in Geneva, Switzerland. The ITU was formerly known as the Comité Consultatif International Télégraphique et Téléphonique, or the International Consultative Committee on Telephony and Telegraphy. As its name implies, the ITU concerns itself with the interoperability of telecommunications systems, including telephone, telegraph, and data communication systems. The telecommunications arm of the ITU, the ITU-T, has established a number of standards that you will encounter in the literature. You will see these standards prefixed by ITU-T or by the group's former initials, CCITT.

Many countries, including those of the European Community, have commissioned umbrella organizations to represent their interests in various international groups. The group representing the United States is the American National Standards Institute (ANSI). Great Britain has its British Standards Institution (BSI) in addition to having a voice in CEN (Comité Européen de Normalisation), the European committee for standardization.

The International Organization for Standardization (ISO) is the entity that coordinates worldwide standards development, including the activities of ANSI and BSI, among others. ISO is not an acronym; the name derives from the Greek word isos, meaning "equal." ISO consists of more than 2,800 technical committees, each of which is charged with some global standardization issue. Its interests range from the behavior of photographic film to the pitch of screw threads to the complex world of computer engineering. The proliferation of global trade has been facilitated by ISO; today, ISO touches virtually every aspect of our lives.

Throughout this book, we mention official standards designations where appropriate. Definitive information concerning many of these standards can be


Page 43:
found in excruciating detail on the website of the organization responsible for establishing the standard cited. As an added bonus, many standards contain "normative" and informative references, which provide background information in areas related to the standard.

1.5 HISTORICAL DEVELOPMENT

During their brief lifespan, computers have become the perfect example of modern convenience. Living memory is strained to recall the days of steno pools, carbon paper, and mimeograph machines. It sometimes seems that these magical computing machines were developed instantaneously in the form we now know them. But the path of computer development has been paved with accidental discoveries, commercial coercion, and outlandish whims. And occasionally computers have even improved through the application of sound engineering practices! Despite all the technological twists, turns, and dead ends, computers have evolved at a pace that defies comprehension. We can fully appreciate where we are today only when we have seen where we came from.

In the sections that follow, we divide the evolution of computers into generations, each generation being defined by the technology used to build the machine. We have provided approximate dates for each generation for reference purposes only; you will find little agreement among experts as to the exact starting and ending times of each technological epoch.

How much computing do we really see coming out of the mysterious boxes perched on or beside our desks? Until recently, computers served us only by performing mind-bending mathematical manipulations. No longer the exclusive domain of white-coated scientists, today's computers help us write documents, keep in touch with loved ones across the globe, and do our shopping chores. Modern business computers spend only a minuscule part of their time performing accounting calculations; their principal purpose is to provide users with a bounty of strategic information for competitive advantage. Has the word computer become a misnomer? An anachronism? What, then, should we call them, if not computers?

We cannot present the complete history of computing in a few pages. Entire books have been written on this subject, and even they leave their readers wanting more detail. If we have piqued your interest, we refer you to some of the books cited in the list of references at the end of this chapter.

1.5.1 Generation Zero: Mechanical Calculating Machines (1642–1945)

Prior to the 1500s, a typical European businessperson used an abacus for calculations and recorded the results of his ciphering in Roman numerals. After the decimal numbering system finally replaced Roman numerals, a number of people invented devices to make decimal calculations even faster and more accurate.


Page 44:
Wilhelm Schickard (1592–1635) has been credited with the invention of the first mechanical calculator, the Calculating Clock (exact date unknown). This device was able to add and subtract numbers containing as many as six digits. In 1642, Blaise Pascal (1623–1662) developed a mechanical calculator called the Pascaline to help his father with his tax work. The Pascaline could do addition with carry and subtraction. It was probably the first mechanical adding device actually used for a practical purpose. In fact, the Pascaline was so well conceived that its basic design was still being used at the beginning of the twentieth century, as evidenced by the Lightning Portable Adder in 1908 and the Addometer in 1920. Gottfried Wilhelm von Leibniz (1646–1716), a noted mathematician, invented a calculator known as the Stepped Reckoner that could add, subtract, multiply, and divide. None of these devices could be programmed or had memory; they required manual intervention at each step of their calculations.

Although machines like the Pascaline were used into the twentieth century, new calculator designs began to emerge in the nineteenth century. One of the most ambitious of these new designs was the Difference Engine of Charles Babbage (1791–1871). Some people refer to Babbage as "the father of computing"; he is also credited with inventing the cowcatcher, a device for pushing movable obstructions out of the way of locomotives. Babbage built his Difference Engine in 1822. The Difference Engine got its name because it used a calculating technique called the method of differences. The machine was designed to mechanize the solution of polynomial functions and was actually a calculator, not a computer.

Babbage also designed a general-purpose machine in 1833 called the Analytical Engine. Although Babbage died before he could build it, the Analytical Engine was designed to be more versatile than his earlier Difference Engine. The Analytical Engine would have been able to perform any mathematical operation, and it included many of the components associated with modern computers: an arithmetic processing unit to perform calculations (Babbage referred to this as the mill), a memory (storage), and input and output devices. Babbage also included a conditional branching operation, in which the next instruction to be executed was determined by the result of the previous operation. Ada, Countess of Lovelace and daughter of the poet Lord Byron, suggested that Babbage write a plan for how the machine would calculate numbers. This is regarded as the first computer program, and Ada is considered to be the first computer programmer. It is also rumored that she suggested the use of the binary number system rather than the decimal number system to store data.

Babbage designed the Analytical Engine to use a type of punched card for input and programming. The use of cards to control the behavior of a machine did not originate with Babbage, but with one of his friends, Joseph-Marie Jacquard (1752–1834). In 1801, Jacquard invented a programmable weaving loom that could produce intricate patterns in cloth. Jacquard gave Babbage a tapestry that had been woven on this loom using more than 10,000 punched cards. To Babbage, it seemed only natural that if a loom could be controlled by cards,


Page 45:
then his Analytical Engine could be controlled by cards as well. Ada expressed her delight with this idea, writing, "[The] Analytical Engine weaves algebraic patterns just as the Jacquard loom weaves flowers and leaves."

Punched cards proved to be the most enduring means of providing input to a computer system. Keyed input had to wait until fundamental changes were made in how calculating machines were constructed. In the latter half of the nineteenth century, most machines used wheeled mechanisms, which were difficult to integrate with early keyboards because those were levered devices. But levered devices could easily punch cards, and wheeled devices could easily read them. So a number of devices were invented to encode and then "tabulate" card-punched data. The most important of the late-nineteenth-century tabulating machines was the one invented by Herman Hollerith (1860–1929). Hollerith's machine was used for encoding and compiling data for the 1890 census. This census was completed in record time, boosting Hollerith's finances and the reputation of his invention. Hollerith later founded the company that would become IBM. His 80-column punched card, the Hollerith card, was a staple of automated data processing for more than 50 years.

1.5.2 The First Generation: Vacuum Tube Computers (1945–1953)

Although Babbage is often called the "father of computing," his machines were mechanical, not electrical or electronic. In the 1930s, Konrad Zuse (1910–1995) picked up where Babbage left off, adding electrical technology and other improvements to Babbage's design. Zuse's computer, the Z1, used electromechanical relays instead of Babbage's hand-cranked gears. The Z1 was programmable and had a memory, an arithmetic unit, and a control unit. Because money and resources were scarce in wartime Germany, Zuse used discarded movie film instead of punched cards for input. Although his machine was designed to use vacuum tubes, Zuse, who was building his machine on his own, could not afford the tubes. Thus, the Z1 correctly belongs in the first generation, although it had no tubes. Zuse built the Z1 in his parents' Berlin living room while Germany was at war with most of Europe. Fortunately, he could not convince the Nazis to buy his machine; they did not realize the tactical advantage such a device would give them. Allied bombs destroyed all three of Zuse's first systems, the Z1, Z2, and Z3. Zuse's impressive machines could not be refined until after the war, and they ended up being another "evolutionary dead end" in the history of computing.

Digital computers, as we know them today, are the outcome of work done by a number of people in the 1930s and 1940s. Pascal's basic mechanical calculator was designed and modified simultaneously by many people; the same can be said of the modern electronic computer. Notwithstanding the continual arguments about who was first with what, three people clearly stand out as the inventors of modern computers: John Atanasoff, John Mauchly, and J. Presper Eckert.

John Atanasoff (1904–1995) has been credited with the construction of the first completely electronic computer. The Atanasoff Berry Computer (ABC) was a binary machine built from vacuum tubes. Because this system was built specifically


Page 46:
to solve systems of linear equations, we cannot call it a general-purpose computer. There were, however, some features that the ABC had in common with the general-purpose ENIAC (Electronic Numerical Integrator and Computer), which was invented a few years later. These common features caused considerable controversy as to who should be given credit (and patent rights) for the invention of the electronic digital computer. (The interested reader can find more details on a rather lengthy lawsuit involving Atanasoff and the ABC in Mollenhoff [1988].)

John Mauchly (1907–1980) and J. Presper Eckert (1919–1995) were the two principal inventors of the ENIAC, introduced to the public in 1946. The ENIAC is recognized as the first all-electronic, general-purpose digital computer. This machine used 17,468 vacuum tubes, occupied 1,800 square feet of floor space, weighed 30 tons, and consumed 174 kilowatts of power. It had a memory capacity of about 1,000 bits of information (about 20 10-digit decimal numbers) and used punched cards to store data.

John Mauchly's vision for an electronic calculating machine was born from his lifelong interest in predicting the weather mathematically. While a professor of physics at Ursinus College near Philadelphia, Mauchly engaged dozens of adding machines and student operators to crunch the mounds of data that he believed would reveal mathematical relationships behind weather patterns. He felt that if he could have only a little more computational power, he could reach the goal that seemed just beyond his grasp. Pursuant to the Allied war effort, and with ulterior motives to learn about electronic computation, Mauchly volunteered for a crash course in electrical engineering at the University of Pennsylvania's Moore School of Engineering. Upon completion of this program, Mauchly accepted a teaching position at the Moore School, where he taught a brilliant young student, J. Presper Eckert. Mauchly and Eckert found a mutual interest in building an electronic calculating device. In order to secure the funding they needed to build their machine, they wrote a formal proposal for review by the school. They portrayed their machine as conservatively as they could, billing it as an "automatic calculator." Although they probably knew that computers would be able to function most efficiently using the binary numbering system, Mauchly and Eckert designed their system to use base 10 numbers, in keeping with the appearance of building a huge electronic adding machine. The university rejected Mauchly and Eckert's proposal. Fortunately, the United States Army was more interested.

During World War II, the Army had an insatiable need for calculating the trajectories of its new ballistic armaments. Thousands of human "computers" were engaged around the clock cranking through the arithmetic required for these firing tables. Realizing that an electronic device could shorten ballistic table calculation from days to minutes, the Army funded the ENIAC. And the ENIAC did indeed shorten the time to calculate a trajectory from 20 hours to 30 seconds. Unfortunately, the machine was not ready before the end of the war. But the ENIAC had shown that vacuum tube computers were fast and feasible. During the next decade, vacuum tube systems continued to improve and were commercially successful.


Page 47:
[Photograph courtesy of the U.S. Army, 1946]


Page 48:
What Is a Vacuum Tube?

[Figure: a vacuum tube in cross-section, showing the plate (anode), control grid, cathode, and enclosure.]

The wired world that we know today was born from the invention of a single electronic device called a vacuum tube by Americans and, more accurately, a valve by the British. Vacuum tubes should be called valves because they control the flow of electrons in electrical systems in much the same way that valves control the flow of water in a plumbing system. In fact, some mid-twentieth-century breeds of these electron tubes contain no vacuum at all, but are filled with conductive gases, such as mercury vapor, which can provide desirable electrical behavior.

The electrical phenomenon that makes tubes work was discovered by Thomas A. Edison in 1883 while he was trying to find ways to keep the filaments of his light bulbs from burning out (or oxidizing) within minutes of applying electrical current. Edison reasoned correctly that one way to prevent the filament from oxidizing would be to place it in a vacuum. Edison didn't immediately understand that air not only supports combustion, but is also a good insulator. When he energized the electrodes holding a new tungsten filament, the filament soon became hot and burned out as had the others before it. This time, however, Edison noticed that electricity continued to flow from the warmed negative terminal to the cool positive terminal within the light bulb. In 1911, Owen Willans Richardson analyzed this behavior. He concluded that when a negatively charged filament was heated, electrons "boiled off" of it, just as water molecules can be boiled to create steam. He aptly named this phenomenon thermionic emission.

Thermionic emission, as Edison documented it, was considered by many to be merely an electrical curiosity. But in 1905, a British former assistant to Edison, John A. Fleming, saw Edison's discovery as much more than a novelty. He knew that thermionic emission supported the flow of electrons in only one direction: from the negatively charged cathode to the positively charged anode, also called the plate. He realized that this behavior could rectify alternating current; in other words, it could change alternating current into the direct current essential for the proper operation of telegraph equipment. Fleming used his ideas to invent an electronic valve later called a diode, or rectifier tube. The diode was well suited to converting alternating current into direct current, but the greatest power of the electron tube was yet to be discovered. In


Page 49:
1907, an American named Lee DeForest added a third element, called a control grid. The control grid, when carrying a negative charge, can reduce or prevent the flow of electrons from the cathode to the anode of a diode.

[Figure: with a negative charge on the cathode and the control grid, and a positive charge on the anode, electrons stay near the cathode; with a negative charge on the cathode and positive charges on the control grid and anode, electrons travel from the cathode to the anode.]

When DeForest patented his device, he called it an audion tube; it later became known as a triode. The schematic symbol for the triode is shown at left. A triode can act either as a switch or as an amplifier. Small changes in the charge of the control grid can cause much larger changes in the flow of electrons between the cathode and the anode. Therefore, a weak signal applied to the grid results in a much stronger signal at the plate output. A sufficiently large negative charge applied to the grid stops all electrons from leaving the cathode.

Additional control grids were eventually added to the triode to allow more exact control of the electron flow. Tubes with two grids (four elements) are called tetrodes; tubes with three grids are called pentodes. Triodes and pentodes were the tubes most commonly used in communications and computer applications. Often, two or three triodes or pentodes were combined within one envelope so that they could share a single heater, thereby reducing the power consumption of a particular device. These latter-day devices were called "miniature" tubes because many were about 2 inches (5 cm) high and half an inch (1.5 cm) in diameter. Equivalent full-sized diodes, triodes, and pentodes were only slightly smaller than a household light bulb.

Vacuum tubes were not well suited for building computers. Enormous amounts of electrical power were required to heat the cathodes of these devices. To prevent a meltdown, this heat had to be removed from the system as quickly as possible. Power consumption and heat dissipation could be reduced by running the cathode heaters at lower voltages, but this reduced the already slow switching speed of the tube. Despite their limitations and power consumption, vacuum tube computer


Page 50:
systems, both analog and digital, served their purpose for many years and are the architectural foundation of all modern computer systems. Although decades have passed since the last vacuum tube computer was manufactured, tubes are still used in audio amplifiers. These "high-end" amplifiers are favored by musicians who believe that tubes provide a resonant and pleasing sound unattainable with solid-state devices.

The first generation of vacuum tube technology was not very dependable. In fact, some ENIAC detractors believed that the system would never run because the tubes would burn out faster than they could be replaced. Although system reliability wasn't as bad as the doomsayers predicted, vacuum tube systems often experienced more downtime than uptime.

1.5.3 The Second Generation: Transistorized Computers (1954–1965)

In 1948, three researchers with Bell Laboratories, John Bardeen, Walter Brattain, and William Shockley, invented the transistor. This new technology not only revolutionized devices such as televisions and radios, but also pushed the computer industry into a new generation. Because transistors consume less power than vacuum tubes, are smaller, and work more reliably, the circuitry in computers consequently became smaller and more reliable. Despite using transistors, computers of this generation were still bulky and quite costly; typically, only universities, governments, and large businesses could justify the expense. Nevertheless, a plethora of computer makers emerged in this generation; IBM, Digital Equipment Corporation (DEC), and Univac (now Unisys) dominated the industry. IBM marketed the 7094 for scientific applications and the 1401 for business applications. DEC was busy manufacturing the PDP-1. A company founded (but soon sold) by Mauchly and Eckert built the Univac systems. The most successful Unisys systems of this generation belonged to its 1100 series. Another company, Control Data Corporation (CDC), under the supervision of Seymour Cray, built the CDC 6600, the world's first supercomputer. The $10 million CDC 6600 could perform 10 million instructions per second, used 60-bit words, and had an astounding 128 kilowords of main memory.

What Is a Transistor?

The transistor, short for transfer resistor, is the solid-state version of the triode. There is no solid-state version of the tetrode or pentode. Because electrons are better behaved in a solid medium than in the open void of a vacuum tube, they need no extra controlling grids. Either germanium or silicon can be the basic "solid" used in these solid-state devices. In their pure form, neither of these elements is a good conductor of electricity. But when they are combined with


Page 51:
trace amounts of elements that are their neighbors in the Periodic Chart of the Elements, they conduct electricity in an effective and easily controlled manner.

Boron, aluminum, and gallium sit to the left of silicon and germanium on the Periodic Chart. Because they are to the left of silicon and germanium, they have one less electron in their outer (valence) shell. So if you add a small amount of aluminum to silicon, the silicon ends up with a slight imbalance in its outer electron shell and therefore attracts electrons from any pole that has a negative potential (an excess of electrons). When modified (or doped) in this way, silicon or germanium becomes a P-type material. Similarly, if we add a little phosphorus, arsenic, or antimony to silicon, we end up with extra valence electrons in the silicon crystal. This gives us an N-type material.

[Figure: electron flow through N-type material, and current flow through NPN and PNP transistors, with the emitter, base, and collector labeled.]

A small amount of current will flow through N-type material if we give the loosely bound electrons in it somewhere to go. In other words, if we apply a positive potential to an N-type material, electrons will flow from the negative pole to the positive pole. If the poles are reversed, that is, if we apply a negative potential to the N-type material and a positive potential to the P-type material, no current will flow. This means we can make a solid-state diode from a simple junction of N-type and P-type materials.

The solid-state triode, the transistor, consists of three layers of semiconductor material: either a slice of P-type material sandwiched between two N-type materials, or a slice of N-type material sandwiched between two P-type materials. The former is called an NPN transistor, the latter a PNP transistor. The inner layer of the transistor is called the base; the other two layers are the collector and the emitter. The figure shows how current flows through NPN and PNP transistors. The base of a transistor works just like the control grid of a triode tube: small changes in the current at the base of a transistor result in a large electron flow from the emitter to the collector.

A discrete-component transistor is shown in "TO-50" packaging in the figure at the beginning of this sidebar. Only three wires (leads) connect the base, emitter, and collector of the transistor to the rest of the circuit. Transistors are not only smaller than vacuum tubes, they also run cooler and are much more reliable. Vacuum tube filaments, like light bulb filaments, run hot and eventually burn out. Computers using transistorized components were naturally


Page 52:
smaller and ran cooler than their vacuum tube predecessors. The ultimate miniaturization, however, was achieved not by replacing individual triodes with discrete transistors, but by shrinking entire circuits onto a single piece of silicon.

Several different techniques are used to manufacture integrated circuits. One of the simplest is to create a circuit using computer-aided design software that can print large maps of each of the several silicon layers forming the chip. Each map is used like a photographic negative: light-induced changes in a photoresistive substance on the chip's surface produce the delicate patterns of the circuit when the silicon chip is immersed in a chemical that washes away the exposed areas of the silicon. This technique is called photomicrolithography. After the etching is completed, a layer of N-type or P-type material is deposited on the bumpy surface of the chip. This layer is then treated with a photoresistive substance, exposed to light, and etched as was the layer before it. This process continues until all of the layers have been etched. The resulting peaks and valleys of P and N material form microscopic electronic components, including transistors, that behave just like larger versions fashioned from discrete components, except that they run much faster and consume a small fraction of the power.

[Figure: cross-section of an integrated NPN transistor, showing the emitter, the N, P, and N layers, and the metal contacts.]

1.5.4 The Third Generation: Integrated Circuit Computers (1965–1980)

The real explosion in computing came with the integrated circuit generation. Jack Kilby invented the integrated circuit (IC), or microchip, made of germanium. Six months later, Robert Noyce (who had also been working on integrated circuit design) created a similar device using silicon instead of germanium. This is the silicon chip upon which the computer industry was built. Early ICs allowed dozens of transistors to exist on a single silicon chip that was smaller than a single "discrete component" transistor. Computers became faster, smaller, and cheaper, bringing huge gains in processing power. The IBM System/360 family of computers was among the first commercially available systems to be built entirely of solid-state components. The 360 product line was also IBM's first offering in which all of the machines in the family were compatible, meaning they all used the same assembly language. Users of smaller machines could upgrade to larger systems without rewriting all of their software. This was a revolutionary new concept at the time.

The IC generation also saw the introduction of time-sharing and multiprogramming (the ability for more than one person to use the computer at a time), which in turn required the introduction of new operating systems for these computers. Time-sharing minicomputers, such as DEC's PDP-8 and PDP-11, made computing affordable to smaller businesses and more universities.


Page 53:
[Photograph: a comparison of computer components, clockwise from the top: (1) a vacuum tube, (2) a transistor, (3) a chip containing 3,200 two-input NAND gates, (4) an IC package (the small silver square in the lower left-hand corner is an IC). Courtesy of Linda Null.]

IC technology also allowed for the development of more powerful supercomputers. Seymour Cray took what he had learned while building the CDC 6600 and founded his own company, Cray Research Corporation. This company produced a number of supercomputers, starting with the $8.8 million Cray-1 in 1976. The Cray-1, in stark contrast to the CDC 6600, could execute more than 160 million instructions per second and could support 8 megabytes of memory.

1.5.5 The Fourth Generation: VLSI Computers (1980–????)

In the third generation of electronic evolution, multiple transistors were integrated onto one chip. As manufacturing techniques and chip technologies advanced, increasing numbers of transistors were packed onto one chip. There are now several levels of integration: SSI (small-scale integration), in which there are


Page 54:
10 to 100 components per chip; MSI (medium-scale integration), in which there are 100 to 1,000 components per chip; LSI (large-scale integration), in which there are 1,000 to 10,000 components per chip; and finally, VLSI (very-large-scale integration), in which there are more than 10,000 components per chip. This last level, VLSI, marks the beginning of the fourth generation of computers.

To put these numbers in perspective, consider the ENIAC-on-a-chip project. In 1997, to commemorate the fiftieth anniversary of its first public demonstration, a group of students at the University of Pennsylvania constructed a single-chip equivalent of the ENIAC. The 1,800-square-foot, 30-ton beast that devoured 174 kilowatts of power the minute it was turned on had been reproduced on a chip the size of a thumbnail. This chip contained approximately 174,569 transistors, an order of magnitude fewer than the number of components typically placed on the same amount of silicon in the late 1990s.

VLSI allowed Intel, in 1971, to create the world's first microprocessor, the 4004, which was a fully functional 4-bit system that ran at 108 KHz. Intel also introduced the random access memory (RAM) chip, accommodating four kilobits of memory on a single chip. This allowed computers of the fourth generation to become smaller and faster than their solid-state predecessors.

VLSI technology, and its incredible shrinking circuits, spawned the development of microcomputers. These systems were small enough and inexpensive enough to make computers available and affordable to the general public. The premiere microcomputer was the Altair 8800, released in 1975 by the Micro Instrumentation and Telemetry Systems (MITS) corporation. The Altair 8800 was soon followed by the Apple I and Apple II, and by Commodore's PET and Vic 20. Finally, in 1981, IBM introduced its PC (Personal Computer).

The PC was not IBM's first attempt at an "entry-level" computer system: its Datamaster as well as its 5100 Series desktop computers had failed miserably in the marketplace. Despite these early failures, IBM's John Opel convinced his management to try again. He suggested forming a fairly autonomous "independent business unit" in Boca Raton, Florida, far from IBM's headquarters in Armonk, New York. Opel picked Don Estridge, an energetic and capable engineer, to champion the development of the new system, code-named the Acorn. In light of IBM's earlier failures in the small-systems area, corporate management kept a tight rein on the Acorn's timeline and finances. Opel could get his project off the ground only after promising to deliver it within a year, a seemingly impossible feat.

Estridge knew that the only way to deliver the PC within the wildly optimistic 12-month schedule would be to break with IBM convention and use as many "off-the-shelf" parts as possible. Thus, from the outset, the IBM PC was conceived with an "open" architecture. Although some at IBM may have later regretted the decision to keep the PC architecture as nonproprietary as possible, it was this very openness that allowed IBM to set the standard for the industry. While IBM's competitors were busy suing companies for copying their system designs, PC clones proliferated. Before long, the price of "IBM-compatible" microcomputers came within reach of just about every small business.


Page 55:
Also, thanks to the clone makers, large numbers of these systems soon began finding true "personal use" in people's homes. IBM eventually lost its hold on the microcomputer market, but the genie was out of the bottle. For better or worse, the IBM architecture remains the de facto standard for microcomputing, with each year bringing bigger and faster systems. Today, the average desktop computer has many times the computational power of the mainframes of the 1960s.

Since the 1960s, mainframe computers have seen stunning improvements in price-performance ratios, thanks to VLSI technology. Although the IBM System/360 was an entirely solid-state system, it was still a water-cooled, power-gobbling behemoth. It could perform only about 50,000 instructions per second and supported only 16 megabytes of memory (while usually having kilobytes of physical memory installed). These systems were so costly that only the largest businesses and universities could afford to own or lease one. Today's mainframes, now called "enterprise servers," are still priced in the millions of dollars, but their processing capabilities have grown several thousand times over, passing the billion-instructions-per-second mark in the late 1990s. These systems, often deployed as Web servers, routinely support hundreds of thousands of transactions per minute!

The processing power brought by VLSI to supercomputers defies comprehension. The first supercomputer, the CDC 6600, could perform 10 million instructions per second and had 128 kilowords of main memory. By contrast, the supercomputers of today contain thousands of processors, can address terabytes of memory, and will soon be able to perform quadrillions of instructions per second.

What technology will mark the beginning of the fifth generation? Some say the fifth generation will mark the acceptance of parallel processing and the use of networks and single-user workstations; many people believe we have already crossed into this generation. Others characterize the fifth generation as the generation of neural network, DNA, or optical computing systems. We may not be able to define the fifth generation until we have advanced into the sixth or seventh generation, and whatever those eras will bring.

1.5.6 Moore's Law

So where does it end? How small can we make transistors? How densely can we pack chips? No one can say for sure. Every year, scientists continue to thwart prognosticators' attempts to define the limits of integration. In fact, more than one skeptic raised an eyebrow when, in 1965, Intel founder Gordon Moore stated, "The density of transistors in an integrated circuit will double every year." The current version of this prediction is usually conveyed as "the density of silicon chips doubles every 18 months." This assertion has become known as Moore's Law. Moore intended this postulate to hold for only 10 years; however, advances in chip manufacturing processes have allowed the law to hold for almost 40 years (and many believe it will continue to hold well into the 2010s). Yet, using current technology, Moore's Law cannot hold forever. There are physical and financial limitations that must ultimately come into play.


Page 56:
At the current rate of miniaturization, it would take about 500 years to put the entire solar system on a chip! Clearly, the limit lies somewhere between here and there. Cost may be the ultimate constraint. Rock's Law, proposed by early Intel investor Arthur Rock, is a corollary to Moore's Law: "The cost of capital equipment to build semiconductors will double every four years." Rock's Law arises from the observations of a financier who watched the price of new chip facilities escalate from about $12,000 in 1968 to $12 million in the late 1990s. At that rate, by the year 2035, not only will the size of a memory element be smaller than an atom, but it would also require the entire wealth of the world to build a single chip! So even if we continue to make chips smaller and faster, the ultimate question may be whether we can afford to build them.

Certainly, if Moore's Law is to hold, Rock's Law must fall. It is evident that for both of these things to happen, computers must shift to a radically different technology. Laboratory prototypes based on organic computing, superconducting, molecular physics, and quantum computing have been demonstrated. Quantum computers, which leverage the quirks of quantum mechanics to solve computational problems, are particularly exciting. Not only would quantum systems compute exponentially faster than any previously used method, they would also revolutionize the way in which we define computational problems. Problems that today are considered ludicrously intractable could be well within the grasp of the next generation's students. These students may, in fact, chuckle at our "primitive" systems in the same way that we are tempted to chuckle at the ENIAC.
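The arithmetic behind the 18-month version of Moore's Law is simple enough to sketch; the starting figure of 10,000 components is just the VLSI threshold mentioned in Section 1.5.5, not a measured value.

```python
# What "density doubles every 18 months" implies, starting from the
# 10,000-component VLSI threshold used in Section 1.5.5.
def projected_components(start, years, doubling_period=1.5):
    return start * 2 ** (years / doubling_period)

for years in (3, 6, 15, 30):
    print(f"after {years:2d} years: ~{projected_components(10_000, years):,.0f} components")
```

Rock's Law applies a similar doubling to fabrication cost (every four years), which is why the text notes that both laws cannot hold indefinitely.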


Page 57:
as well as how these layers are implemented and interact with each other.

[FIGURE 1.3 The abstraction levels of modern computing systems. Level 6: User (executable programs); Level 5: High-Level Language (C++, Java, FORTRAN, etc.); Level 4: Assembly Language (assembly code); Level 3: System Software (operating system, library code); Level 2: Machine (instruction set architecture); Level 1: Control (microcode or hardwired); Level 0: Digital Logic (circuits, gates, etc.)]

Figure 1.3 shows the commonly accepted layers representing the abstract virtual machines. Level 6, the User Level, is composed of applications and is the level with which everyone is most familiar. At this level we run programs such as word processors, graphics packages, or games. The lower levels are nearly invisible from the User Level.

Level 5, the High-Level Language Level, consists of languages such as C, C++, FORTRAN, Lisp, Pascal, and Prolog. These languages must be translated (using either a compiler or an interpreter) into a language the machine can understand. Compiled languages are translated into assembly language and then assembled into machine code. (They are translated to the next lower level.) The user at this level sees very little of the lower levels. Even though a programmer must know about data types and the instructions available for those types, she or he need not know how those types are actually implemented.

Level 4, the Assembly Language Level, encompasses some type of assembly language. As mentioned above, compiled higher-level languages


Page 58:
are first translated into assembly, which is then translated directly into machine language. This is a one-to-one translation, meaning that one assembly language instruction is translated into exactly one machine language instruction. By having separate levels, we reduce the semantic gap between a high-level language, such as C++, and the actual machine language (which consists of 0s and 1s).

Level 3, the System Software Level, deals with operating system instructions. This level is responsible for multiprogramming, protecting memory, synchronizing processes, and various other important functions. Often, instructions translated from assembly language to machine language are passed through this level unmodified.

Level 2, the Instruction Set Architecture (ISA), or Machine Level, consists of the machine language recognized by the particular architecture of the computer system. Programs written in a computer's true machine language on a hardwired computer (see below) can be executed directly by the electronic circuits without any interpreters, translators, or compilers. We will study instruction set architectures in depth in Chapters 4 and 5.

Level 1, the Control Level, is where a control unit makes sure that instructions are decoded and executed properly and that data is moved where and when it should be. The control unit interprets the machine instructions passed to it, one at a time, from the level above, causing the required actions to take place. Control units can be designed in one of two ways: they can be hardwired or they can be microprogrammed. In hardwired control units, control signals emanate from blocks of digital logic components. These signals direct all of the data and instruction traffic to appropriate parts of the system. Hardwired control units are typically very fast because they are actually physical components. However, once implemented, they are very difficult to modify for the same reason.

The other option for control is to implement instructions using a microprogram. A microprogram is a program written in a low-level language that is implemented directly by the hardware. Machine instructions produced in Level 2 are fed into this microprogram, which then interprets the instructions by activating hardware suited to execute the original instruction. One machine-level instruction is often translated into several microcode instructions. This is not the one-to-one correlation that exists between assembly language and machine language. Microprograms are popular because they can be modified relatively easily. The disadvantage of microprogramming is, of course, that the additional layer of translation typically results in slower instruction execution.

Level 0, the Digital Logic Level, is where we find the physical components of the computer system: the gates and wires. These are the fundamental building blocks, the implementations of the mathematical logic, that are common to all computer systems. Chapter 3 presents the Digital Logic Level in detail.

1.7 THE VON NEUMANN MODEL

In the earliest electronic computing machines, programming was synonymous with connecting wires to plugs. No layered architecture existed, so programming


Page 59:
a computer was as much of a feat of electrical engineering as it was an exercise in algorithm design. Before their work on the ENIAC was complete, John W. Mauchly and J. Presper Eckert conceived of an easier way to change the behavior of their calculating machine. They reasoned that memory devices, in the form of mercury delay lines, could provide a way to store program instructions. This would forever end the tedium of rewiring the system every time it had a new problem to solve, or an old one to debug. Mauchly and Eckert documented their idea, proposing it as the basis for their next computer, the EDVAC. Unfortunately, because they were involved in the top-secret ENIAC project during World War II, Mauchly and Eckert could not immediately publish their insight. Their ideas nonetheless reached other scientists; one of them was a famous Hungarian mathematician named John von Neumann (pronounced von noy-man). After reading Mauchly and Eckert's proposal for the EDVAC, von Neumann published and publicized the idea. So effective was he in delivering this concept that history has credited him with its invention. All stored-program computers have come to be known as von Neumann systems using the von Neumann architecture, although the credit really belongs to its true inventors, John W. Mauchly and J. Presper Eckert.

Today's version of the stored-program machine architecture satisfies at least the following characteristics:

• It consists of three hardware systems: a central processing unit (CPU) with a control unit, an arithmetic logic unit (ALU), registers (small storage areas), and a program counter; a main memory system, which holds the programs that control the computer's operation; and an I/O system.
• It has the capacity to carry out sequential instruction processing.
• It contains a single path, either physical or logical, between the main memory system and the control unit of the CPU, forcing alternation of instruction and execution cycles. This single path is often referred to as the von Neumann bottleneck.

Figure 1.4 shows how these features work together in modern computer systems. Notice that the system shown in the figure passes all of its I/O through the arithmetic logic unit (actually, it passes through the accumulator, which is part of the ALU). This architecture runs programs in what is known as the von Neumann execution cycle (also called the fetch-decode-execute cycle), which describes how the machine works. One iteration of the cycle is as follows:

1. The control unit fetches the next program instruction from memory, using the program counter to determine where the instruction is located.
2. The instruction is decoded into a language the ALU can understand.
3. Any data operands required to execute the instruction are fetched from memory and placed into registers within the CPU.
4. The ALU executes the instruction and places the results in registers or memory.
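To make the cycle concrete, here is a minimal Python sketch of a fetch-decode-execute loop. The three-instruction "machine language" (LOAD, ADD, HALT), the memory layout, and the single accumulator are all invented for this illustration; they are not taken from the text.

    # A toy machine: instructions occupy the first memory cells, data the rest.
    memory = [("LOAD", 7), ("ADD", 8), ("HALT", 0), 0, 0, 0, 0, 5, 9]
    accumulator = 0
    program_counter = 0

    while True:
        opcode, operand = memory[program_counter]   # 1. fetch, using the program counter
        program_counter += 1
        if opcode == "LOAD":                        # 2. decode the instruction
            accumulator = memory[operand]           # 3. fetch the operand, 4. execute
        elif opcode == "ADD":
            accumulator += memory[operand]
        elif opcode == "HALT":
            break

    print(accumulator)   # prints 14 (5 + 9)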


Page 60:
[FIGURE 1.4 The von Neumann architecture: a central processing unit (program counter, registers, arithmetic logic unit, and control unit) connected to main memory and to the input/output system]

The ideas present in the von Neumann architecture have been extended so that programs and data stored in a slow-access storage medium, such as a hard disk, can be copied into a fast-access, volatile storage medium such as RAM prior to execution. This architecture has also been streamlined into what is now called the system bus model, shown in Figure 1.5. The data bus moves data from main memory to the CPU registers (and vice versa). The address bus holds the address of the data that the data bus is currently accessing. The control bus carries the necessary control signals that specify how the information transfer is to take place.

Over the years, the basic von Neumann architecture has been enhanced with improvements such as the addition of virtual memory and the addition of general-purpose registers. You will learn a great deal about these improvements in the chapters that follow. However, the von Neumann bottleneck continues to baffle engineers looking for ways to build fast systems that are inexpensive and compatible with the vast body of commercially available software.


Page 61:
[FIGURE 1.5 The modified von Neumann architecture, adding a system bus: the CPU (ALU, registers, and control), memory, and I/O connected by a data bus, an address bus, and a control bus]

Engineers who are not limited by the need to maintain compatibility with von Neumann systems are free to use many different models of computing. A number of different subfields fall into the non-von Neumann category, including neural networks (which use ideas from models of the brain as a computing paradigm), genetic algorithms (which exploit ideas from biology and the evolution of DNA), quantum computing (discussed above), and parallel computers. Of these, parallel computing is currently the most popular.

Today, parallel processing solves some of our biggest problems in much the same way that settlers of the Old West solved their biggest problems using parallel oxen. If they were using an ox to move a tree and the ox was not big enough or strong enough, they certainly did not try to raise a bigger ox; they used two oxen. If one computer is not fast enough or powerful enough, instead of trying to develop a faster, more powerful computer, why not use multiple computers? This is precisely what parallel computing does. The first parallel processing systems were built in the late 1960s and had only two processors. The 1970s saw the introduction of supercomputers with as many as 32 processors, and the 1980s brought the first systems with more than 1,000 processors. Finally, in 1999, IBM announced the construction of a supercomputer called Blue Gene. This massively parallel computer contains more than 1 million processors, each with its own dedicated memory. Its first task is to analyze the behavior of protein molecules.

Even parallel computing has its limits, however. As the number of processors increases, so does the overhead of managing how tasks are distributed to those processors. Some parallel processing systems require extra processors just to manage the rest of the processors and the resources assigned to them. No matter how many processors we place in a system, or how many resources we assign to them, somewhere, somehow, a bottleneck is likely to develop. The best we can do to remedy this is to make sure that the slowest parts of the system are the ones that are used the least. This is the idea behind Amdahl's Law. This law states that the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. The underlying premise is that every algorithm has a sequential part that ultimately limits the speedup that a multiprocessor implementation can achieve.
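Amdahl's Law is usually written as a simple formula; the exact expression is not given in this excerpt, so treat the sketch below as an illustration rather than the book's own formulation. If a fraction f of a program's running time benefits from an s-fold enhancement while the rest stays sequential, the overall speedup is 1 / ((1 - f) + f/s):

    def amdahl_speedup(enhanced_fraction, enhancement):
        """Overall speedup when only `enhanced_fraction` of the execution time
        is improved by a factor of `enhancement`; the rest runs as before."""
        return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhancement)

    print(amdahl_speedup(0.9, 10))     # about 5.3x, not 10x
    print(amdahl_speedup(0.9, 1000))   # still below 10x: the sequential 10% dominates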


Page 62:
CHAPTER SUMMARY

In this chapter we have presented a brief overview of computer organization and computer architecture and shown how they differ. We have also introduced some terminology in the context of a fictitious computer advertisement. Much of this terminology will be expanded on in later chapters.

Historically, computers were simply calculating machines. As computers became more sophisticated, they became general-purpose machines, which necessitated viewing each system as a hierarchy of levels instead of one gigantic machine. Each layer in this hierarchy serves a specific purpose, and all levels help minimize the semantic gap between a high-level programming language or application and the gates and wires that make up the physical hardware. Perhaps the single most important development in computing that affects us as programmers is the introduction of the stored-program concept of the von Neumann machine.

FURTHER READING

We encourage you to build on our brief presentation of the history of computers. We think you will find this subject intriguing because it is as much about people as it is about machines. You can read about the "forgotten father of the computer," John Atanasoff, in Mollenhoff (1988). This book documents the odd relationship between Atanasoff and John Mauchly, and recounts the open court battle between two computing giants, Honeywell and Sperry Rand. This trial ultimately gave Atanasoff his proper due.

For a lighter look at computer history, try the book by Rochester and Gantz (1983). Augarten's (1985) illustrated history of computers is a delight to read, and contains hundreds of hard-to-find pictures of early computers and computing devices. For a more scholarly treatment of the subject, see the three-volume historical dictionary by Cortada (1987). Ceruzzi (1998) presents a particularly thoughtful account of the history of computing.

If you are interested in an excellent set of case studies about historical computers, see Blaauw and Brooks (1997). You will also enjoy Chopsky and Leonsis's (1988) chronicle of the development of the IBM PC, and Toole's (1998) biography of Ada, Countess of Lovelace. Polachek's (1997) article conveys a vivid picture of the complexity of calculating ballistic firing tables. After reading this article, you will understand why the military would gladly pay for anything that promised to make the process faster or more accurate. The book by Maxfield and Brown (1997) contains a fascinating look at the origins and history of computing, as well as detailed explanations of how a computer works.


Page 63:
For more information on Moore's Law, we refer the reader to Schaller (1997). For detailed descriptions of early computers, as well as profiles and reminiscences of industry pioneers, you may wish to consult the IEEE Annals of the History of Computing, which is published quarterly. The Computer Museum History Center can be found online at www.computerhistory.org. It contains various exhibits, research, timelines, and collections. Many cities now have computer museums, some of which allow visitors to use some of the older computers.

A wealth of information can be found at the Web sites of the standards-making bodies discussed in this chapter (as well as the sites not discussed in this chapter). The IEEE can be found at www.ieee.org; ANSI at www.ansi.org; the ISO at www.iso.ch; the BSI at www.bsi-global.com; and the ITU-T at www.itu.int. The ISO site offers a vast amount of information and standards reference materials.

The WWW Computer Architecture Home Page at www.cs.wisc.edu/~arch/www/ contains a comprehensive index of computer architecture-related information. Many USENET newsgroups are devoted to these topics as well, including comp.arch and comp.arch.storage.

The entire May/June 2000 issue of MIT Technology Review is devoted to architectures that may be the basis of tomorrow's computers. Reading this issue will be time well spent. In fact, we could say the same of every issue.

REFERENCES

Augarten, Stan. Bit by Bit: An Illustrated History of Computers. London: Unwin Paperbacks, 1985.
Blaauw, G., and Brooks, F. Computer Architecture: Concepts and Evolution. Reading, MA: Addison-Wesley, 1997.
Ceruzzi, Paul E. A History of Modern Computing. Cambridge, MA: MIT Press, 1998.
Chopsky, James, and Leonsis, Ted. Blue Magic: The People, Power, and Politics Behind the IBM Personal Computer. New York: Facts on File Publications, 1988.
Cortada, J. W. Historical Dictionary of Data Processing, Volume 1: Biographies; Volume 2: Organizations; Volume 3: Technology. Westport, CT: Greenwood Press, 1987.
Maguire, Yael, Boyden III, Edward S., and Gershenfeld, Neil. "Toward a Quantum Desktop Computer." IBM Systems Journal 39: 3/4 (June 2000), pp. 823–839.
Maxfield, Clive, and Brown, A. Bebop BYTES Back (An Unconventional Guide to Computers). Madison, AL: Doone Publications, 1997.
McCartney, Scott. ENIAC: The Triumphs and Tragedies of the World's First Computer. New York: Walker and Company, 1999.
Mollenhoff, Clark R. Atanasoff: The Forgotten Father of the Computer. Ames, IA: Iowa State University Press, 1988.
Polachek, Harry. "Before the ENIAC." IEEE Annals of the History of Computing 19: 2 (June 1997), pp. 25–30.
Rochester, J. B., and Gantz, J. The Naked Computer: A Layperson's Almanac of Computer Lore, Wizardry, Personalities, Memorabilia, World Records, Mindblowers and Tomfoolery. New York: William A. Morrow, 1983.
Schaller, R. "Moore's Law: Past, Present, and Future." IEEE Spectrum, June 1997, pp. 52–59.
Tanenbaum, A. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 1999.


Page 64:
Toole, Betty A. Ada, the Enchantress of Numbers: Prophet of the Computer Age. Mill Valley, CA: Strawberry Press, 1998.
Waldrop, M. Mitchell. "Quantum Computing." MIT Technology Review 103: 3 (May/June 2000), pp. 8–25.

REVIEW OF ESSENTIAL TERMS AND CONCEPTS

1. What is the difference between computer organization and computer architecture?
2. What is an ISA?
3. What is the importance of the Principle of Equivalence of Hardware and Software?
4. Name the three basic components of every computer.
5. To what power of 10 does the prefix giga- refer? What is the (approximate) equivalent power of 2?
6. To what power of 10 does the prefix micro- refer? What is the (approximate) equivalent power of 2?
7. What unit is commonly used to measure the speed of a computer clock?
8. Name two types of computer memory.
9. What is the mission of the IEEE?
10. What is the full name of the organization that uses the acronym ISO? Is ISO an acronym?
11. ANSI is the acronym used by which organization?
12. What is the name of the Swiss organization that devotes itself to matters concerning telephony, telecommunications, and data communications?
13. Who is known as the father of computing, and why?
14. What was the significance of the punched card?
15. Name two driving factors in the development of computers.
16. What was it about the transistor that made it such a vast improvement over the vacuum tube?
17. How does an integrated circuit differ from a transistor?
18. Explain the differences between SSI, MSI, LSI, and VLSI.
19. What technology spawned the development of microcomputers? Why?
20. What is meant by an "open architecture"?
21. State Moore's Law. Can it last indefinitely?
22. How is Rock's Law related to Moore's Law?
23. Name and explain the seven commonly accepted layers of the Computer Level Hierarchy. How does this arrangement help us understand computer systems?
24. What was it about the von Neumann architecture that distinguished it from its predecessors?
25. Name the characteristics present in a von Neumann architecture.


Page 65:
26. How does the fetch-decode-execute cycle work?
27. What is meant by parallel computing?
28. What is the underlying premise of Amdahl's Law?

EXERCISES

◆ 1. In what ways are hardware and software different? In what ways are they the same?
  2. a) How many milliseconds (ms) are in 1 second?
     b) How many microseconds (µs) are in 1 second?
     c) How many nanoseconds (ns) are in 1 millisecond?
     d) How many microseconds are in 1 millisecond?
     e) How many nanoseconds are in 1 microsecond?
     f) How many kilobytes (KB) are in 1 gigabyte (GB)?
     g) How many kilobytes are in 1 megabyte (MB)?
     h) How many megabytes are in 1 gigabyte (GB)?
     i) How many bytes are in 20 megabytes?
     j) How many kilobytes are in 2 gigabytes?
◆ 3. By what order of magnitude is something that runs in nanoseconds faster than something that runs in milliseconds?
  4. Pretend you are ready to buy a new computer for personal use. First, take a look at ads from various magazines and newspapers and list any terms you don't quite understand. Look up these terms and give a brief written explanation. Decide which factors are important in your decision as to which computer to buy and list them. After you select the system you would like to buy, identify which terms refer to hardware and which refer to software.
  5. Pick your favorite programming language and write a small program. After compiling the program, see if you can determine the relationship between the source code instructions and the machine language instructions generated by the compiler. If you add one line of source code, how does that affect the machine language program? Try adding different source code instructions, such as an add and then a multiply. How does the size of the machine code file change with the different instructions? Comment on the results.
  6. Respond to the comment mentioned in Section 1.5: If invented today, what name do you think would be given to the computer? Give at least one good reason for your answer.
◆ 7. Suppose a transistor on an integrated circuit chip were 2 microns in size. According to Moore's Law, how large would that transistor be in 2 years? How is Moore's Law relevant to programmers?
  8. What circumstances helped the IBM PC become so successful?
  9. List five applications of personal computers. Is there a limit to the applications of personal computers? Do you envision any radically different and exciting applications in the near future? If so, what?


Page 66:
10. Under the von Neumann architecture, a program and its data are both stored in memory. It is therefore possible for a program, thinking that a memory location holds a piece of data when it actually holds a program instruction, to accidentally (or on purpose) modify itself. What implications does this have for you as a programmer?
11. Read a popular local newspaper and search through the job openings. (You can also check out some of the more popular online career sites.) Which jobs require specific hardware knowledge? Which jobs imply knowledge of computer hardware? Is there any correlation between the required hardware knowledge and the company or its location?


Page 68:
"What would life be without arithmetic, but a scene of horrors?"
—Sydney Smith (1835)

CHAPTER 2
Data Representation in Computer Systems

2.1 INTRODUCTION

The organization of any computer depends considerably on how it represents numbers, characters, and control information. The converse is also true: standards and conventions established over the years have determined certain aspects of computer organization. This chapter describes the various ways in which computers can store and manipulate numbers and characters. The ideas presented in the following sections form the basis for understanding the organization and function of all types of digital systems.

The most basic unit of information in a digital computer is called a bit, which is a contraction of binary digit. In the concrete sense, a bit is nothing more than a state of "on" or "off" (or "high" and "low") within a computer circuit. In 1964, the designers of the IBM System/360 mainframe established a convention of using groups of 8 bits as the basic unit of addressable computer storage. They called this collection of 8 bits a byte.

Computer words consist of two or more adjacent bytes that are sometimes addressed and almost always are manipulated collectively. The word size represents the data size that is handled most efficiently by a particular architecture. Words can be 16 bits, 32 bits, 64 bits, or any other size that makes sense within the context of a computer's organization (including sizes that are not multiples of eight). Eight-bit bytes can be divided into two 4-bit halves called nibbles (or nybbles). Because each bit of a byte has a value within a positional numbering system, the nibble containing the least-valued binary digit is called the low-order nibble, and the other half the high-order nibble.
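As a quick illustration of bytes and nibbles, the following Python lines pull the low-order and high-order nibbles out of one byte; the example value is chosen arbitrarily.

    value = 0b10110100                    # one byte
    low_nibble = value & 0x0F             # low-order nibble:  0b0100
    high_nibble = (value >> 4) & 0x0F     # high-order nibble: 0b1011
    print(bin(high_nibble), bin(low_nibble))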


Page 69:
2.2 POSITIONAL NUMBERING SYSTEMS

At some point during the middle of the sixteenth century, Europe embraced the decimal (or base 10) numbering system that the Arabs and Hindus had been using for nearly a millennium. Today, we take it for granted that the number 243 means two hundreds, plus four tens, plus three units. Notwithstanding the fact that zero means "nothing," virtually everyone knows that there is a substantial difference between having 1 of something and having 10 of something.

The general idea behind positional numbering systems is that a numeric value is represented through increasing powers of a radix (or base). This is often referred to as a weighted numbering system because each position is weighted by a power of the radix. The set of valid numerals for a positional numbering system is equal in size to the radix of that system. For example, there are 10 digits in the decimal system, 0 through 9, and 3 digits for the ternary (base 3) system, 0, 1, and 2. The largest valid number in a radix system is one smaller than the radix, so 8 is not a valid numeral in any radix system smaller than 9. To distinguish among numbers in different radices, we use the radix as a subscript, such as in 33₁₀ to represent the decimal number 33. (In this text, numbers written without a subscript should be assumed to be decimal.) Any decimal integer can be expressed exactly in any other integral base system (see Example 2.1).

EXAMPLE 2.1 Three numbers represented as powers of a radix.

    243.51₁₀ = 2 × 10^2 + 4 × 10^1 + 3 × 10^0 + 5 × 10^−1 + 1 × 10^−2
    212₃ = 2 × 3^2 + 1 × 3^1 + 2 × 3^0 = 23₁₀
    10110₂ = 1 × 2^4 + 0 × 2^3 + 1 × 2^2 + 1 × 2^1 + 0 × 2^0 = 22₁₀

The two most important radices in computer science are binary (base two) and hexadecimal (base 16). Another radix of interest is octal (base 8). The binary system uses only the digits 0 and 1; the octal system, 0 through 7. The hexadecimal system allows the digits 0 through 9, with A, B, C, D, E, and F being used to represent the numbers 10 through 15. Figure 2.1 shows some of the radices.

2.3 DECIMAL TO BINARY CONVERSIONS

Gottfried Leibniz (1646–1716) was the first to generalize the idea of the (positional) decimal system to other bases. Being a deeply spiritual person, Leibniz attributed divine qualities to the binary system. He correlated the fact that any integer could be represented by a series of ones and zeros with the idea that God (1) created the universe out of nothing (0). Until the first binary digital computers were built in the late 1940s, this system remained nothing more than a mathematical curiosity. Today, it lies at the heart of virtually every electronic device that relies on digital controls.
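The positional expansion in Example 2.1 is easy to express in Python; the helper name below is made up for this sketch.

    def from_digits(digits, base):
        """Evaluate a sequence of digit values as a numeral in the given base."""
        value = 0
        for d in digits:
            value = value * base + d   # each step shifts the total one position left
        return value

    assert from_digits([2, 1, 2], 3) == 23        # 212 in base 3 is 23 in base 10
    assert from_digits([1, 0, 1, 1, 0], 2) == 22  # 10110 in base 2 is 22 in base 10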


Page 70:
FIGURE 2.1 Some Numbers to Remember

Powers of 2: 2^−2 = 0.25; 2^−1 = 0.5; 2^0 = 1; 2^1 = 2; 2^2 = 4; 2^3 = 8; 2^4 = 16; 2^5 = 32; 2^6 = 64; 2^7 = 128; 2^8 = 256; 2^9 = 512; 2^10 = 1,024; 2^15 = 32,768; 2^16 = 65,536

    Decimal   4-Bit Binary   Hexadecimal
    0         0000           0
    1         0001           1
    2         0010           2
    3         0011           3
    4         0100           4
    5         0101           5
    6         0110           6
    7         0111           7
    8         1000           8
    9         1001           9
    10        1010           A
    11        1011           B
    12        1100           C
    13        1101           D
    14        1110           E
    15        1111           F

Because of its simplicity, the binary numbering system translates easily into electronic circuitry. It is also easy for humans to understand. Experienced computer professionals can recognize smaller binary numbers (such as those shown in Figure 2.1) at a glance. Converting larger values and fractions, however, usually requires a calculator or pencil and paper. Fortunately, the conversion techniques are easy to master with a little practice. We show a few of the simpler techniques in the sections that follow.

2.3.1 Converting Unsigned Whole Numbers

We begin with the base conversion of unsigned numbers. Conversion of signed numbers (numbers that can be positive or negative) is more complex, and it is important that you first understand the basic technique for conversion before continuing with signed numbers.

Conversion between base systems can be done by using either repeated subtraction or a division-remainder method. The subtraction method is cumbersome and requires a familiarity with the powers of the radix being used. Being the more intuitive of the two methods, however, we will explain it first.

As an example, let's say that we want to convert 104₁₀ to base 3. We know that 3^4 = 81 is the highest power of 3 that is less than 104, so our base 3 number will be 5 digits wide (one for each power of the radix: 0 through 4). We make note that 81 goes once into 104 and subtract, leaving a difference of 23. We know that the next power of 3, 3^3 = 27, is too large to subtract, so we note the zero "placeholder" and look for how many times 3^2 = 9 divides 23. We see that it goes twice and subtract 18. We are left with 5, from which we subtract 3^1 = 3, leaving 2, which is 2 × 3^0. These steps are shown in Example 2.2.


Page 71:
EXAMPLE 2.2 Convert 104₁₀ to base 3 using subtraction.

    104
    −81 = 3^4 × 1
     23
     −0 = 3^3 × 0
     23
    −18 = 3^2 × 2
      5
     −3 = 3^1 × 1
      2
     −2 = 3^0 × 2
      0              104₁₀ = 10212₃

The division-remainder method is faster and easier than the repeated subtraction method. It employs the idea that successive division by the base is equivalent to successive subtraction by powers of the base. The remainders that we get when we sequentially divide by the base end up being the digits of the result, which are read from bottom to top. This method is illustrated in Example 2.3.

EXAMPLE 2.3 Convert 104₁₀ to base 3 using the division-remainder method.

    3 |104    2    3 divides 104 34 times with a remainder of 2
    3 |34     1    3 divides 34 11 times with a remainder of 1
    3 |11     2    3 divides 11 3 times with a remainder of 2
    3 |3      0    3 divides 3 1 time with a remainder of 0
    3 |1      1    3 divides 1 0 times with a remainder of 1
       0

Reading the remainders from bottom to top, we have: 104₁₀ = 10212₃.

This method works with any base, and because of the simplicity of the calculations, it is particularly useful in converting from decimal to binary. Example 2.4 shows such a conversion.
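A small Python sketch of the division-remainder method just described; the function name is invented here.

    def to_base(n, base):
        """Convert a non-negative integer to a digit string in the given base,
        collecting remainders and reading them from bottom to top."""
        if n == 0:
            return "0"
        digits = []
        while n > 0:
            n, remainder = divmod(n, base)
            digits.append("0123456789ABCDEF"[remainder])
        return "".join(reversed(digits))

    assert to_base(104, 3) == "10212"
    assert to_base(147, 2) == "10010011"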


Page 72:
EXAMPLE 2.4 Convert 147₁₀ to binary.

    2 |147    1    2 divides 147 73 times with a remainder of 1
    2 |73     1    2 divides 73 36 times with a remainder of 1
    2 |36     0    2 divides 36 18 times with a remainder of 0
    2 |18     0    2 divides 18 9 times with a remainder of 0
    2 |9      1    2 divides 9 4 times with a remainder of 1
    2 |4      0    2 divides 4 2 times with a remainder of 0
    2 |2      0    2 divides 2 1 time with a remainder of 0
    2 |1      1    2 divides 1 0 times with a remainder of 1
       0

Reading the remainders from bottom to top, we have: 147₁₀ = 10010011₂.

A binary number with N bits can represent unsigned integers from 0 to 2^N − 1. For example, 4 bits can represent the decimal values 0 through 15, while 8 bits can represent the values 0 through 255. The range of values that can be represented by a given number of bits is extremely important when doing arithmetic operations on binary numbers. Consider a situation in which binary numbers are 4 bits in length, and we wish to add 1111₂ (15₁₀) to 1111₂. We know that 15 plus 15 is 30, but 30 cannot be represented using only 4 bits. This is an example of a condition known as overflow, which occurs in unsigned binary representation when the result of an arithmetic operation is outside the range of allowable precision for the given number of bits. We address overflow in more detail when discussing signed numbers in Section 2.4.

2.3.2 Converting Fractions

Fractions in any base system can be approximated in any other base system using negative powers of a radix. Radix points separate the integer part of a number from its fractional part. In the decimal system, the radix point is called a decimal point. Binary fractions have a binary point.

Fractions that contain repeating strings of digits to the right of the radix point in one base may not necessarily have a repeating sequence of digits in another base. For instance, 2/3 is a repeating decimal fraction, but in the ternary system it terminates as 0.2₃ (2 × 3^−1 = 2 × 1/3).

We can convert fractions between different bases using methods analogous to the repeated subtraction and division-remainder methods for converting whole numbers. Example 2.5 shows how we can use repeated subtraction to convert a number from decimal to base 5.


Page 73:
EXAMPLE 2.5 Convert 0.4304₁₀ to base 5.

    0.4304
   −0.4000 = 5^−1 × 2
    0.0304
   −0.0000 = 5^−2 × 0    (a placeholder)
    0.0304
   −0.0240 = 5^−3 × 3
    0.0064
   −0.0064 = 5^−4 × 4
    0.0000

Reading from top to bottom, we find 0.4304₁₀ = 0.2034₅.

Because the division-remainder method works with positive powers of the radix for conversions of whole numbers, it stands to reason that we would use multiplication to convert fractions, because they are expressed in negative powers of the radix. However, instead of looking for remainders, as we did above, we use only the integer part of the product after multiplication by the radix. The answer is read from top to bottom instead of bottom to top. Example 2.6 illustrates the process.

EXAMPLE 2.6 Convert 0.4304₁₀ to base 5.

    0.4304 × 5 = 2.1520    The integer part is 2; omit it from subsequent multiplication.
    0.1520 × 5 = 0.7600    The integer part is 0; we will need it as a placeholder.
    0.7600 × 5 = 3.8000    The integer part is 3; omit it from subsequent multiplication.
    0.8000 × 5 = 4.0000    The fractional part is now zero, so we are done.

Reading from top to bottom, we have 0.4304₁₀ = 0.2034₅.

This example is contrived so that the process stops after a few steps. Often things do not work out quite so evenly, and we end up with repeating fractions. Most computer systems implement specialized rounding algorithms to provide a predictable degree of accuracy.
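A Python sketch of the repeated-multiplication method for fractions; exact Fraction arithmetic is used so that the truncation behaves like the worked examples (the function name is made up).

    from fractions import Fraction

    def frac_to_base(decimal_string, base, places):
        """Convert a fraction in [0, 1) to `places` digits in the given base by
        repeatedly multiplying by the base and keeping the integer parts."""
        frac = Fraction(decimal_string)     # exact arithmetic avoids rounding surprises
        digits = []
        for _ in range(places):
            frac *= base
            digit = int(frac)               # the integer part becomes the next digit
            digits.append("0123456789ABCDEF"[digit])
            frac -= digit
        return "0." + "".join(digits)

    assert frac_to_base("0.4304", 5, 4) == "0.2034"
    assert frac_to_base("0.34375", 2, 4) == "0.0101"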


Page 74:
For the sake of clarity, however, we will simply discard (or truncate) our answer when the desired accuracy has been achieved, as shown in Example 2.7.

EXAMPLE 2.7 Convert 0.34375₁₀ to binary with 4 bits to the right of the binary point.

    0.34375 × 2 = 0.68750    The integer part is 0 (a placeholder).
    0.68750 × 2 = 1.37500    The integer part is 1.
    0.37500 × 2 = 0.75000    The integer part is 0.
    0.75000 × 2 = 1.50000    The integer part is 1. (This is our fourth bit. We will stop here.)

Reading from top to bottom, 0.34375₁₀ = 0.0101₂ to four binary places.

The methods just described can be used to directly convert any number in any base to any other base, say from base 4 to base 3 (as in Example 2.8). However, in most cases, it is faster and more accurate to first convert to base 10 and then to the desired base. One exception to this rule is when you are working between bases that are powers of two, as you will see in the next section.

EXAMPLE 2.8 Convert 3121₄ to base 3.

First, convert to decimal:

    3121₄ = 3 × 4^3 + 1 × 4^2 + 2 × 4^1 + 1 × 4^0
          = 3 × 64 + 1 × 16 + 2 × 4 + 1 = 217₁₀

Then convert to base 3:

    3 |217    1
    3 |72     0
    3 |24     0
    3 |8      2
    3 |2      2
       0

We have 3121₄ = 22001₃.
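The same two-step route through decimal from Example 2.8, sketched in Python; int() handles the first step directly.

    # Base 4 -> decimal -> base 3, mirroring Example 2.8.
    decimal_value = int("3121", 4)     # 217
    digits = []
    n = decimal_value
    while n:
        n, remainder = divmod(n, 3)
        digits.append(str(remainder))
    print("".join(reversed(digits)))   # prints 22001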


Page 75:
2.3.3 Converting Between Power-of-Two Radices

Binary numbers are often expressed in hexadecimal, and sometimes in octal, to improve their readability. Because 16 = 2^4, a group of 4 bits (called a hextet) is easily recognized as a hexadecimal digit. Similarly, with 8 = 2^3, a group of 3 bits (called an octet) is expressible as one octal digit. Using these relationships, we can therefore convert a number from binary to octal or hexadecimal by doing little more than looking at it.

EXAMPLE 2.9 Convert 110010011101₂ to octal and hexadecimal.

    Separate into groups of 3 for the octal conversion:        110 010 011 101
                                                                 6   2   3   5
    110010011101₂ = 6235₈

    Separate into groups of 4 for the hexadecimal conversion:  1100 1001 1101
                                                                  C    9    D
    110010011101₂ = C9D₁₆

If there are too few bits, leading zeros can be added.

2.4 SIGNED INTEGER REPRESENTATION

We have seen how to convert an unsigned integer from one base to another. Signed numbers require that additional issues be addressed. When an integer variable is declared in a program, many programming languages automatically allocate a storage area that includes a sign as the first bit of the storage location. By convention, a "1" in the high-order bit indicates a negative number. The storage location can be as small as an 8-bit byte or as large as several words, depending on the programming language and the computer system. The remaining bits (after the sign bit) are used to represent the number itself.

How this number is represented depends on the method used. There are three commonly used approaches. The most intuitive method, signed magnitude, uses the remaining bits to represent the magnitude of the number. This method and the other two approaches, which both use the concept of complements, are introduced in the following sections.

2.4.1 Signed Magnitude

Up to this point, we have ignored the possibility of binary representations for negative numbers. The set of positive and negative integers is referred to as the set of signed integers. The problem with representing signed integers as binary values is the sign: how should we encode the actual sign of the number? Signed-magnitude representation is one method of solving this problem.


Page 76:
As its name implies, a signed-magnitude number has a sign as its leftmost bit (also referred to as the high-order bit or the most significant bit), while the remaining bits represent the magnitude (or absolute value) of the numeric value. For example, in an 8-bit word, −1 would be represented as 10000001, and +1 as 00000001. In a computer system that uses signed-magnitude representation and 8 bits to store integers, 7 bits can be used for the actual representation of the magnitude of the number. This means that the largest integer an 8-bit word can represent is 2^7 − 1, or 127 (a zero in the high-order bit, followed by 7 ones). The smallest integer is 8 ones, or −127. Therefore, N bits can represent −(2^(N−1) − 1) to 2^(N−1) − 1.

Computers must be able to perform arithmetic calculations on integers that are represented using this notation. Signed-magnitude arithmetic is carried out using essentially the same methods that humans use with pencil and paper, but it can get confusing very quickly. As an example, consider the rules for addition: (1) If the signs are the same, add the magnitudes and use that same sign for the result; (2) If the signs differ, you must determine which operand has the larger magnitude. The sign of the result is the same as the sign of the operand with the larger magnitude, and the magnitude must be obtained by subtracting (not adding) the smaller one from the larger one. If you consider these rules carefully, this is the method you use for signed arithmetic by hand.

We arrange the operands in a certain way based on their signs, perform the calculation without worrying about the signs, and then supply the sign as appropriate when the calculation is complete. When modeling this idea in an 8-bit word, we must be careful to include only 7 bits in the magnitude of the answer, discarding any carries that take place over the high-order bit.

EXAMPLE 2.10 Add 01001111₂ to 00100011₂ using signed-magnitude arithmetic.

    0 1 0 0 1 1 1 1      (79)
  + 0 0 1 0 0 0 1 1    + (35)
    0 1 1 1 0 0 1 0      (114)

The arithmetic proceeds just as in decimal addition, including the carries, until we get to the seventh bit from the right. If there is a carry here, we say that we have an overflow condition and the carry is discarded, resulting in an incorrect sum. There is no overflow in this example.

We find 01001111₂ + 00100011₂ = 01110010₂ in signed-magnitude representation. The sign bits are segregated because they are relevant only after the addition is complete. In this case, we have the sum of two positive numbers, which is positive. Overflow (and thus an erroneous result) in signed numbers occurs when the sign of the result is incorrect.
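A small Python sketch of signed-magnitude encoding as described above (the helper name is invented):

    def to_signed_magnitude(n, bits=8):
        """Encode an integer as a signed-magnitude bit string: the leftmost bit
        is the sign, and the remaining bits hold the magnitude."""
        magnitude = format(abs(n), "b")
        if len(magnitude) > bits - 1:
            raise OverflowError("magnitude needs more than %d bits" % (bits - 1))
        return ("1" if n < 0 else "0") + magnitude.zfill(bits - 1)

    assert to_signed_magnitude(1) == "00000001"
    assert to_signed_magnitude(-1) == "10000001"
    assert to_signed_magnitude(-127) == "11111111"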


Page 77:
In signed magnitude, the sign bit is used only for the sign, so we can't "carry into" it. If there is a carry emanating from the seventh bit, our result will be truncated to 7 bits, giving an incorrect sum. (Example 2.11 illustrates this overflow condition.) Prudent programmers avoid "million-dollar" mistakes by checking for overflow conditions whenever there is the slightest possibility that they could occur. If we did not discard the overflow bit, it would carry into the sign, causing the more outrageous result of the sum of two positive numbers being negative. (Imagine what would happen if the next step in a program were to take the square root or the logarithm of that result!)

EXAMPLE 2.11 Add 01001111₂ to 01100011₂ using signed-magnitude arithmetic.

    The last carry (1) overflows out of the magnitude and is discarded:
    0 1 0 0 1 1 1 1      (79)
  + 0 1 1 0 0 0 1 1    + (99)
    0 0 1 1 0 0 1 0      (50)

We obtain the erroneous result of 79 + 99 = 50.

WHAT IS DOUBLE-DABBLE?

The fastest way to convert a binary number to decimal is a method called double-dabble (or double-dibble). This method builds on the idea that a subsequent power of two is double the previous power of two in a binary number. The calculation starts with the leftmost bit and proceeds to the rightmost bit. The first bit is doubled and added to the next bit. This sum is then doubled and added to the following bit. The process is repeated for each bit until the rightmost bit has been used.

EXAMPLE 1 Convert 10010011₂ to decimal.

Step 1: Write down the binary number, leaving space between the bits.

    1   0   0   1   0   0   1   1

Step 2: Double the high-order bit and copy it under the next bit.

    1 × 2 = 2


Page 78:
Step 3: Add the next bit and double the sum. Copy this result under the next bit.

    2 + 0 = 2;  2 × 2 = 4

Step 4: Repeat Step 3 until you run out of bits.

    4 + 0 = 4;      4 × 2 = 8
    8 + 1 = 9;      9 × 2 = 18
    18 + 0 = 18;    18 × 2 = 36
    36 + 0 = 36;    36 × 2 = 72
    72 + 1 = 73;    73 × 2 = 146
    146 + 1 = 147   ⇐ The answer: 10010011₂ = 147₁₀

When we combine hextet grouping (in reverse) with the double-dabble method, we find that we can convert hexadecimal to decimal with ease.

EXAMPLE 2 Convert 02CA₁₆ to decimal.

First, convert the hexadecimal to binary by grouping into hextets:

    0    2    C    A
    0000 0010 1100 1010

Then apply the double-dabble method to the binary form:

    1 → 2 + 0 = 2 → 4 + 1 = 5 → 10 + 1 = 11 → 22 + 0 = 22 → 44 + 0 = 44 →
    88 + 1 = 89 → 178 + 0 = 178 → 356 + 1 = 357 → 714 + 0 = 714

    02CA₁₆ = 1011001010₂ = 714₁₀
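The double-dabble procedure is the same left-to-right accumulation shown below in Python (the function name is invented):

    def double_dabble(bits):
        """Convert a binary digit string to decimal by repeatedly doubling the
        running total and adding the next bit, working left to right."""
        total = 0
        for bit in bits:
            total = total * 2 + int(bit)
        return total

    assert double_dabble("10010011") == 147            # Example 1
    assert double_dabble("0000001011001010") == 714    # Example 2 (02CA in hexadecimal)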


Page 79:
Like addition, signed-magnitude subtraction is carried out in a manner similar to pencil-and-paper decimal arithmetic, where it is sometimes necessary to borrow from digits in the minuend.

EXAMPLE 2.12 Subtract 01001111₂ from 01100011₂ using signed-magnitude arithmetic.

    0 1 1 0 0 0 1 1      (99)    (borrows are needed in several positions)
  − 0 1 0 0 1 1 1 1    − (79)
    0 0 0 1 0 1 0 0      (20)

We find 01100011₂ − 01001111₂ = 00010100₂ in signed-magnitude representation.

EXAMPLE 2.13 Subtract 01100011₂ (99) from 01001111₂ (79) using signed-magnitude arithmetic.

By inspection, we see that the subtrahend, 01100011, is larger than the minuend, 01001111. With the result obtained in Example 2.12, we know that the difference of these two numbers is 0010100₂. Because the subtrahend is larger than the minuend, all we need to do is change the sign of the difference. So we find 01001111₂ − 01100011₂ = 10010100₂ in signed-magnitude representation.

We know that subtraction is the same as "adding the opposite," which equates to negating the value we wish to subtract and then adding instead (addition is much easier than doing all the borrowing necessary for subtraction, particularly when dealing with binary numbers). Therefore we also need to look at some examples involving both positive and negative numbers. Recall the rules for addition: (1) If the signs are the same, add the magnitudes and use that same sign for the result; (2) If the signs differ, you must determine which operand has the larger magnitude. The sign of the result is the same as the sign of the operand with the larger magnitude, and the magnitude must be obtained by subtracting (not adding) the smaller one from the larger one.

EXAMPLE 2.14 Add 10010011₂ (−19) to 00001101₂ (+13) using signed-magnitude arithmetic.

The first number (the augend) is negative because its sign bit is set to 1. The second number (the addend) is positive. What we are asked to do is in fact a subtraction. First, we determine which of the two numbers is larger in magnitude and use that number for the minuend. Its sign will be the sign of the result.

    0 0 1 0 0 1 1      (19)
  − 0 0 0 1 1 0 1    − (13)
    0 0 0 0 1 1 0      (6)


Page 80:
With the inclusion of the sign bit, we see that 10010011₂ + 00001101₂ = 10000110₂ in signed-magnitude representation.

EXAMPLE 2.15 Subtract 10011000₂ (−24) from 10101011₂ (−43) using signed-magnitude arithmetic.

We can convert the subtraction to an addition by negating −24, which gives us 24, and then adding this to −43, giving us a new problem of −43 + 24. However, we know from the addition rules above that because the signs now differ, we must actually subtract the smaller magnitude from the larger magnitude (or subtract 24 from 43) and make the result negative (since 43 is larger than 24).

    0 1 0 1 0 1 1      (43)
  − 0 0 1 1 0 0 0    − (24)
    0 0 1 0 0 1 1      (19)

Note that we are not concerned with the sign until we have performed the subtraction. We know the answer must be negative. So we end up with 10101011₂ − 10011000₂ = 10010011₂ in signed-magnitude representation.

While reading the preceding examples, you may have noticed how many questions we had to ask ourselves: Which number is larger? Am I subtracting a negative number? How many times do I have to borrow from the minuend? A computer engineered to perform arithmetic in this manner must make just as many decisions (though a whole lot faster). The logic (and circuitry) is further complicated by the fact that signed magnitude has two representations for zero, 10000000 and 00000000 (and mathematically speaking, this simply shouldn't happen!). Simpler methods for representing signed numbers would allow for simpler and less expensive circuits. These simpler methods are based on radix complement systems.

2.4.2 Complement Systems

Number theorists have known for hundreds of years that one decimal number can be subtracted from another by adding the difference of the subtrahend from all nines and adding back a carry. This is called taking the nine's complement of the subtrahend or, more formally, finding the diminished radix complement of the subtrahend. Let's say we wanted to find 167 − 52. Taking the difference of 52 from 999, we have 947. Thus, in nine's complement arithmetic, we have 167 − 52 = 167 + 947 = 1114. The "1" that is carried out of the hundreds column is added back into the ones place, giving us the correct result: 167 − 52 = 115.

This method was commonly called "casting out 9s" and has been extended to binary operations to simplify computer arithmetic. The advantage that complement systems give us over signed magnitude is that there is no need to process sign bits separately, but we can still easily check the sign of a number by looking at its high-order bit.


Page 81:
Another way to envision complement systems is to imagine an odometer on a bicycle. Unlike cars, when you go backward on a bike, the odometer will go backward as well. Assuming an odometer with three digits, if we start at zero and end with 700, we can't be sure whether the bike went forward 700 miles or backward 300 miles! The easiest solution to this dilemma is simply to cut the number space in half and use 001–500 for positive miles and 501–999 for negative miles. We have, effectively, cut down the distance our odometer can measure. But now if it reads 997, we know the bike has backed up 3 miles instead of riding forward 997 miles. The numbers 501–999 represent the radix complements (the second of the two methods introduced below) of the numbers 001–500 and are being used to represent negative distance.

One's Complement

As illustrated above, the diminished radix complement of a number in base 10 is found by subtracting the subtrahend from the radix minus one, which is 9 in decimal. More formally, given a number N in base r having d digits, the diminished radix complement of N is defined to be (r^d − 1) − N. For decimal numbers, r = 10, and the diminished radix is 10 − 1 = 9. For example, the nine's complement of 2468 is 9999 − 2468 = 7531. For an equivalent operation in binary, we subtract from one less than the base (2), which is 1. For example, the one's complement of 0101₂ is 1111₂ − 0101₂ = 1010₂. Although we could tediously borrow and subtract as discussed above, a few experiments will convince you that forming the one's complement of a binary number amounts to nothing more than switching all of the 1s with 0s and vice versa. This sort of bit-flipping is very simple to implement in computer hardware.

It is important to note at this point that although we can find the nine's complement of any decimal number or the one's complement of any binary number, we are most interested in using complement notation to represent negative numbers. We know that performing a subtraction, such as 10 − 7, can also be thought of as "adding the opposite," as in 10 + (−7). Complement notation allows us to simplify subtraction by turning it into addition, but it also gives us a method to represent negative numbers. Because we do not wish to use a special bit to represent the sign (as we did in signed-magnitude representation), we need to remember that if a number is negative, we should convert it to its complement. The result should have a 1 in the leftmost bit position to indicate that the number is negative. If the number is positive, we do not have to convert it to its complement. All positive numbers should have a zero in the leftmost bit position. Example 2.16 illustrates these concepts.

EXAMPLE 2.16 Express 23₁₀ and −9₁₀ in 8-bit binary one's complement form.

    23₁₀ = +(00010111₂) = 00010111₂
    −9₁₀ = −(00001001₂) = 11110110₂
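A quick Python check of the diminished radix complement definition (the function name is invented):

    def diminished_radix_complement(n, radix, digits):
        """(radix**digits - 1) - n, i.e. the nine's complement in base 10
        or the one's complement in base 2."""
        return (radix**digits - 1) - n

    assert diminished_radix_complement(2468, 10, 4) == 7531      # nine's complement
    assert diminished_radix_complement(0b0101, 2, 4) == 0b1010   # one's complement

    # In binary, the same result comes from simply flipping every bit:
    assert 0b0101 ^ 0b1111 == 0b1010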


Page 82:
Suppose we wish to subtract 9 from 23. To carry out a one's complement subtraction, we first express the subtrahend (9) in one's complement, then add it to the minuend (23); we are effectively now adding −9 to 23. The high-order carry bit, 1 or 0, is added to the low-order bit of the sum. (This is called end carry-around and results from using the diminished radix complement.)

EXAMPLE 2.17 Add 23₁₀ to −9₁₀ using one's complement arithmetic.

      0 0 0 1 0 1 1 1      (23)
    + 1 1 1 1 0 1 1 0    + (−9)
    1 0 0 0 0 1 1 0 1
                  + 1    ⇐ the last carry is added back to the sum
      0 0 0 0 1 1 1 0      14₁₀

EXAMPLE 2.18 Add 9₁₀ to −23₁₀ using one's complement arithmetic.

      0 0 0 0 1 0 0 1      (9)
    + 1 1 1 0 1 0 0 0    + (−23)
      1 1 1 1 0 0 0 1      −14₁₀    (the last carry is zero, so we are done)

How do we know that 11110001₂ is actually −14₁₀? We simply need to take the one's complement of this binary number (remembering that it must be negative because the leftmost bit is 1). The one's complement of 11110001₂ is 00001110₂, which is 14.

The primary disadvantage of one's complement is that we still have two representations for zero: 00000000 and 11111111. For this and other reasons, computer engineers long ago stopped using one's complement in favor of the more efficient two's complement representation for binary numbers.

Two's Complement

Two's complement is an example of a radix complement. Given a number N in base r having d digits, the radix complement of N is defined to be r^d − N for N ≠ 0 and 0 for N = 0. The radix complement is often more intuitive than the diminished radix complement. Using our odometer example, the ten's complement of going forward 2 miles is 10^3 − 2 = 998, which we have already agreed indicates a negative (backward) distance. Similarly, in binary, the two's complement of the 4-bit number 0011₂ is 2^4 − 0011₂ = 10000₂ − 0011₂ = 1101₂.

To find the two's complement of a binary number, simply flip the bits and add 1. This simplifies addition and subtraction as well.
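The "flip the bits and add 1" rule, sketched in Python (the function name is invented):

    def twos_complement(pattern, bits):
        """Two's complement of a bit pattern of the given width:
        invert every bit, add 1, and keep only `bits` bits."""
        mask = (1 << bits) - 1
        return ((pattern ^ mask) + 1) & mask

    assert twos_complement(0b0011, 4) == 0b1101          # the 4-bit example above
    assert twos_complement(0b00010111, 8) == 0b11101001  # -23 in 8 bits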


Page 83:
However, because the subtrahend (the number we complement and add) is incremented at the outset, there is no need to worry about the end carry. We simply discard any carries involving the high-order bit. Remember, only negative numbers need to be converted to two's complement notation, as illustrated in Example 2.19.

EXAMPLE 2.19 Express 23₁₀, −23₁₀, and −9₁₀ in 8-bit binary two's complement form.

    23₁₀ = +(00010111₂) = 00010111₂
    −23₁₀ = −(00010111₂) = 11101000₂ + 1 = 11101001₂
    −9₁₀ = −(00001001₂) = 11110110₂ + 1 = 11110111₂

Suppose we are given the two's complement representation of a value and we want to know its decimal equivalent. Positive numbers are easy. For example, to convert the two's complement value of 00010111₂ to decimal, we simply convert this binary number to a decimal number to get 23. However, negative numbers in two's complement require a reverse procedure, similar to converting from decimal to binary. Suppose we are given the two's complement binary value of 11110111₂ and we want to know its decimal equivalent. We know this is a negative number, but we must remember that it is represented using two's complement. We first flip the bits and then add 1 (that is, we find the one's complement and add 1). This results in the following: 00001000₂ + 1 = 00001001₂. This is equivalent to the decimal value 9. However, the original number we started with was negative, so we end up with −9 as the decimal equivalent of 11110111₂.

The following two examples illustrate how to perform addition (and hence subtraction, because we subtract a number by adding its opposite) using two's complement notation.

EXAMPLE 2.20 Add 9₁₀ to −23₁₀ using two's complement arithmetic.

      0 0 0 0 1 0 0 1      (9)
    + 1 1 1 0 1 0 0 1    + (−23)
      1 1 1 1 0 0 1 0      −14₁₀
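A Python sketch that interprets an 8-bit two's complement pattern and repeats the addition in Example 2.20 (names invented):

    BITS = 8
    MASK = (1 << BITS) - 1

    def from_twos_complement(pattern, bits=BITS):
        """Interpret an unsigned bit pattern as a signed two's complement value."""
        return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern

    result = (0b00001001 + 0b11101001) & MASK    # 9 + (-23); drop carries past bit 7
    assert result == 0b11110010
    assert from_twos_complement(result) == -14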


Page 84:
It is left as an exercise for you to verify that 11110010₂ is actually −14₁₀ using two's complement notation.

EXAMPLE 2.21 Find the sum of 23₁₀ and −9₁₀ in binary using two's complement arithmetic.

    1 ← the last carry is discarded
      0 0 0 1 0 1 1 1      (23)
    + 1 1 1 1 0 1 1 1    + (−9)
      0 0 0 0 1 1 1 0      14₁₀

Notice that the discarded carry in Example 2.21 did not cause an erroneous result. An overflow occurs if two positive numbers are added and the result is negative, or if two negative numbers are added and the result is positive. It is not possible to have overflow when using two's complement notation if a positive and a negative number are being added together.

Simple computer circuits can easily detect an overflow condition using a rule that is easy to remember. You will notice in Example 2.21 that the carry going into the sign bit (a 1 is carried from the previous bit position into the sign bit position) is the same as the carry going out of the sign bit (a 1 is carried out and discarded). When these carries are equal, no overflow occurs. When they differ, an overflow indicator is set in the arithmetic logic unit, indicating that the result is incorrect.

A simple rule for detecting an overflow condition: if the carry into the sign bit equals the carry out of the sign bit, no overflow has occurred. If the carry into the sign bit is different from the carry out of the sign bit, overflow (and thus an error) has occurred.

The hard part is getting programmers (or compilers) to consistently check for the overflow condition. Example 2.22 indicates overflow because the carry into the sign bit (a 1 is carried in) is not equal to the carry out of the sign bit (a 0 is carried out).

EXAMPLE 2.22 Find the sum of 126₁₀ and 8₁₀ in binary using two's complement arithmetic.

    0 ← the last carry is discarded
      0 1 1 1 1 1 1 0      (126)
    + 0 0 0 0 1 0 0 0    + (8)
      1 0 0 0 0 1 1 0      (−122???)

A one is carried into the leftmost bit, but a zero is carried out. Because these carries are not equal, an overflow has occurred. (We can easily see that two positive numbers are being added but the result is negative.)
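The carry-in versus carry-out rule is easy to mimic in Python (names invented):

    BITS = 8

    def add_with_overflow_check(a, b, bits=BITS):
        """Add two `bits`-wide two's complement patterns, reporting overflow by
        comparing the carry into the sign bit with the carry out of it."""
        low_mask = (1 << (bits - 1)) - 1
        carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)   # into the sign bit
        total = a + b
        carry_out = total >> bits                                    # out of the sign bit
        return total & ((1 << bits) - 1), carry_in != carry_out

    assert add_with_overflow_check(0b00010111, 0b11110111) == (0b00001110, False)  # 23 + (-9)
    assert add_with_overflow_check(0b01111110, 0b00001000) == (0b10000110, True)   # 126 + 8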


Page 85:
Two's complement is the most popular choice for representing signed numbers. The algorithm for addition and subtraction is quite easy, it has the best representation for 0 (all 0 bits), it is self-inverting, and it is easily extended to larger numbers of bits. The biggest drawback is the asymmetry seen in the range of values that can be represented by N bits. With signed-magnitude numbers, for example, 4 bits allow us to represent the values −7 through +7. However, using two's complement, we can represent the values −8 through +7, which is often confusing to anyone learning about complement representations. To see why +7 is the largest number we can represent using 4-bit two's complement representation, we need only remember that the first bit must be 0. If the remaining bits are all 1s (giving us the largest magnitude possible), we have 0111₂, which is 7. An immediate reaction to this is that the smallest negative number should then be 1111₂, but we can see that 1111₂ is actually −1 (flip the bits, add one, and make the number negative). So how do we represent −8 in two's complement notation using 4 bits? It is represented as 1000₂. We know this is a negative number. If we flip the bits (0111), add 1 (to get 1000, which is 8), and make it negative, we get −8.

MULTIPLICATION AND DIVISION OF INTEGERS

Unless sophisticated algorithms are used, multiplication and division can consume a considerable number of computation cycles before a result is obtained. Here, we describe only the simplest approach to these operations. In real systems, dedicated hardware is used to optimize throughput, sometimes carrying out portions of the calculation in parallel. Curious readers will want to investigate some of these advanced methods in the references cited at the end of this chapter.

The simplest multiplication algorithms used by computers are similar to the traditional pencil-and-paper methods used by humans. The complete multiplication table for binary numbers couldn't be simpler: zero times any number is zero, and one times any number is that number. To perform multiplication, the multiplicand and the multiplier are each placed in a storage area, and a third storage area is needed for the product. Starting with the low-order bit, a pointer is set to each digit of the multiplier. For each digit in the multiplier, the multiplicand is "shifted" one bit to the left. When the multiplier bit is 1, the "shifted" multiplicand is added to a running sum of partial products. Because we shift the multiplicand by one bit for each bit in the multiplier, a product requires double the working space of either the multiplicand or the multiplier.

There are two simple approaches to binary division: we can either iteratively subtract the divisor from the dividend, or we can use the same trial-and-error method of long division that we were taught in grade school. As with multiplication, the most efficient methods used for binary division are beyond the scope of this text and can be found in the references at the end of this chapter.

Regardless of the relative efficiency of the algorithms that are used, division is an operation that can always cause a computer trouble. This is the case particularly when dividing by zero or when two numbers of very different magnitudes are used as operands. When the divisor is much smaller than the dividend, we get a condition known as divide underflow, which the computer sees as the equivalent of division by zero, which is impossible.

Computers make a distinction between integer division and floating-point division. With integer division, the answer comes in two parts: a quotient and a remainder. Floating-point division results in a number that is expressed as a binary fraction. These two types of division are sufficiently different from each other as to warrant giving each its own special circuitry. Floating-point calculations are carried out in dedicated circuits called floating-point units, or FPUs.

EXAMPLE Find the product of 00000110₂ and 00001011₂.

    Multiplicand       Partial products
    0 0 0 0 0 1 1 0    0 0 0 0 0 0 0 0    The low-order bit of the multiplier (1011) is 1:
                                           add the multiplicand, then shift it left.
    0 0 0 0 1 1 0 0    0 0 0 0 0 1 1 0    The next bit is 1: add the shifted multiplicand,
                                           then shift left.
    0 0 0 1 1 0 0 0    0 0 0 1 0 0 1 0    The next bit is 0: do not add, just shift the
                                           multiplicand left.
    0 0 1 1 0 0 0 0    0 0 0 1 0 0 1 0    The last bit is 1: add the multiplicand.
                       0 1 0 0 0 0 1 0    Product: 01000010₂ = 66₁₀
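The shift-and-add procedure from the example, as a Python sketch (the function name is invented):

    def shift_add_multiply(multiplicand, multiplier):
        """Unsigned binary multiplication: for every 1 bit in the multiplier,
        add the (shifted) multiplicand to a running sum of partial products."""
        product = 0
        while multiplier:
            if multiplier & 1:             # the current multiplier bit is 1
                product += multiplicand    # add the shifted multiplicand
            multiplicand <<= 1             # shift the multiplicand left one bit
            multiplier >>= 1               # move to the next multiplier bit
        return product

    assert shift_add_multiply(0b00000110, 0b00001011) == 0b01000010   # 6 x 11 = 66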


Page 86:
When the divisor is much smaller than the dividend, we get a condition known as divide underflow, which the computer sees as the equivalent of division by zero, which is impossible. Computers make a distinction between integer division and floating-point division. With integer division, the answer comes in two parts: a quotient and a remainder. Floating-point division results in a number that is expressed as a binary fraction. These two types of division are different enough from each other that each deserves its own special circuitry. Floating-point calculations are carried out in dedicated circuits called floating-point units, or FPUs.

EXAMPLE Find the product of 00000110₂ and 00001011₂.

    Multiplicand        Partial products   Multiplier
    0 0 0 0 0 1 1 0   + 0 0 0 0 0 0 0 0    1 0 1 1   Add multiplicand and shift left.
    0 0 0 0 1 1 0 0   + 0 0 0 0 0 1 1 0    1 0 1 1   Add multiplicand and shift left.
    0 0 0 1 1 0 0 0   + 0 0 0 1 0 0 1 0    1 0 1 1   Do not add; just shift the multiplicand left.
    0 0 1 1 0 0 0 0   + 0 0 0 1 0 0 1 0    1 0 1 1   Add multiplicand.
                      = 0 1 0 0 0 0 1 0              Product

Returning to two's complement: if the remaining bits are all 1s (giving us the largest possible magnitude), we have 0111₂, which is 7. An immediate reaction is that the smallest negative number should then be 1111₂, but we can see that 1111₂ is actually −1 (flip the bits, add one, and make the number negative). So how do we represent −8 in two's complement notation using 4 bits? It is represented as 1000₂. We know this is a negative number. If we flip the bits (0111), add 1 (to get 1000, which is 8), and make it negative, we get −8.

2.5 FLOATING-POINT REPRESENTATION

If we wanted to build a real computer, we could use any of the integer representations that we have just studied. We would pick one of them and proceed with our design tasks. Our next step would be to decide the word size of our system.


Page 87:
If we want our system to be really economical, we would choose a small word size, say 16 bits. Allowing for the sign bit, the largest integer this system can store is 32,767. So what do we do to accommodate a potential customer who wants to keep a tally of the number of paying spectators at professional sporting events in a given year? Certainly that number is greater than 32,767. No problem. Let's just make the word size larger. Thirty-two bits should do it. Our word is now big enough for almost anything anyone wants to count. But what if this customer also needs to know how much money each spectator spends per minute of play? That number is likely to be a decimal fraction. Now we are really stuck.

The easiest and cheapest approach to this problem is to keep our 16-bit system and say, "Hey, we're building a cheap system here. If you want to do fancy things with it, get yourself a good programmer." Although this position seems outrageously irreverent in the context of today's technology, it was a reality in the early days of each generation of computers. There simply was no floating-point unit in many early mainframes or microcomputers. For many years, clever programming enabled these integer systems to act as if they were, in fact, floating-point systems; this use of integer operations to provide the effect of floating point is known as floating-point emulation.

In scientific notation, numbers are expressed in two parts: a fractional part, called a mantissa, and an exponential part indicating the power of ten to which the mantissa should be raised to obtain the value we need. So to express 32,767 in scientific notation, we could write 3.2767 × 10⁴. Scientific notation simplifies pencil-and-paper calculations that involve very large or very small numbers. It is also the basis for floating-point computation in today's digital computers.

Floating-point numbers consist of a sign bit, an exponent part (a power of 2), and a fractional part called a significand (which is a fancy word for a mantissa). The number of bits used for the exponent and the significand depends on whether we would like to optimize for range (more bits in the exponent) or for precision (more bits in the significand). For the remainder of this section, we will use a 14-bit model with a 5-bit exponent, an 8-bit significand, and a sign bit (see Figure 2.2). More general forms are described in Section 2.5.2.

Let's say that we want to store the decimal number 17 in our model. We know that 17 = 17.0 × 10⁰ = 1.7 × 10¹ = 0.17 × 10². Analogously, in binary, 17₁₀ = 10001₂ × 2⁰ = 1000.1₂ × 2¹ = 100.01₂ × 2² = 10.001₂ × 2³ = 1.0001₂ × 2⁴ = 0.10001₂ × 2⁵.

    1 bit      5 bits      8 bits
    Sign bit   Exponent    Significand

    FIGURE 2.2 Floating-point representation


Page 88:
Using the last form, 0.10001₂ × 2⁵, our fractional part (significand) will be 10001000 and our exponent will be 00101, as shown here:

    0  00101  10001000

Using this form, we can store numbers of much greater magnitude than we could otherwise using a fixed-point representation of 14 bits (a total of 14 binary digits plus a binary, or radix, point). If we want to represent 65536 = 0.1₂ × 2¹⁷ in this model, we have:

    0  10001  10000000

One obvious problem with this model is that we have not provided for negative exponents. If we wanted to store 0.25, we could not, because 0.25 is 2⁻² and the exponent −2 cannot be represented. We could fix the problem by adding a sign bit to the exponent, but it turns out to be more efficient to use a biased exponent, because simpler integer circuits can then be used to compare the values of two floating-point numbers.

The idea behind using a bias value is to convert every integer in the range into a non-negative integer, which is then stored as a binary numeral. The integers in the desired range of exponents are first adjusted by adding this fixed bias value to each exponent. The bias value is a number near the middle of the range of possible values that we select to represent zero. In this case, we could select 16, because it is midway between 0 and 31 (our exponent has 5 bits, allowing 2⁵, or 32, values). Any number greater than 16 in the exponent field represents a positive exponent; values less than 16 indicate negative exponents. This is called an excess-16 representation, because we have to subtract 16 to get the true value of the exponent. Note that exponents of all zeros or all ones are typically reserved for special numbers (such as zero or infinity).

Returning to our example of storing 17, we calculated 17₁₀ = 0.10001₂ × 2⁵. The biased exponent is now 16 + 5 = 21:

    0  10101  10001000

If we wanted to store 0.25 = 0.1₂ × 2⁻¹, we would have:

    0  01111  10000000
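The encoding steps above (normalize to a 0.1xxxxxxx significand, bias the exponent by 16, truncate the significand to 8 bits) are easy to automate. The following Python sketch is illustrative only; it assumes positive values and truncation rather than rounding, and it reproduces the biased encodings of 17 and 0.25 shown above.

    def encode_toy_float(value, exp_bits=5, sig_bits=8, bias=16):
        """Encode a positive value in the text's simple 14-bit model:
        a sign bit, an excess-`bias` exponent, and a fractional significand
        of the form 0.1xxxxxxx, truncated to `sig_bits` bits."""
        assert value > 0
        exponent = 0
        while value >= 1:        # normalize so that 0.5 <= value < 1
            value /= 2
            exponent += 1
        while value < 0.5:
            value *= 2
            exponent -= 1
        significand = int(value * (1 << sig_bits))     # truncate to sig_bits bits
        return f"0 {exponent + bias:0{exp_bits}b} {significand:0{sig_bits}b}"

    print(encode_toy_float(17))    # 0 10101 10001000  (biased exponent 16 + 5 = 21)
    print(encode_toy_float(0.25))  # 0 01111 10000000  (biased exponent 16 - 1 = 15)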


Page 89:
There is still a rather large problem with this system: we do not have a unique representation for each number. All of the following are equivalent (each represents 17):

    0  10101  10001000  =  0  10110  01000100  =  0  10111  00100010  =  0  11000  00010001

Because synonymous forms such as these are not well-suited for digital computers, a convention has been established whereby the leftmost bit of the significand will always be a 1. This is called normalization. The convention has the added advantage that the 1 can be implied, effectively giving an extra bit of precision in the significand.

EXAMPLE 2.23 Express 0.03125₁₀ in normalized floating-point form using excess-16 bias.
0.03125₁₀ = 0.00001₂ × 2⁰ = 0.0001₂ × 2⁻¹ = 0.001₂ × 2⁻² = 0.01₂ × 2⁻³ = 0.1₂ × 2⁻⁴. Applying the bias, the exponent field is 16 − 4 = 12:

    0  01100  10000000

Note that in this example we have not expressed the number using the normalization notation that implies the 1.

2.5.2 Floating-Point Arithmetic

If we wanted to add two decimal numbers expressed in scientific notation, such as 1.5 × 10² + 3.5 × 10³, we would change one of the numbers so that both are expressed with the same power of the base. In our example, 1.5 × 10² + 3.5 × 10³ = 0.15 × 10³ + 3.5 × 10³ = 3.65 × 10³. Floating-point addition and subtraction work the same way, as illustrated below.

EXAMPLE 2.24 Add the following binary numbers as represented in a normalized 14-bit format with a bias of 16.

      0  10010  11001000
    + 0  10000  10011010

We see that the first operand has an exponent of 2 and that the second has an exponent of 0. Aligning the two operands on the binary point gives us:


Page 90:
      11.00100000
    +  0.10011010
      11.10111010

Renormalizing, we retain the larger exponent and truncate the low-order bit. Thus, we have:

    0  10010  11101110

Multiplication and division are carried out using the same rules of exponents that apply to decimal arithmetic, such as 2⁻³ × 2⁴ = 2¹.

EXAMPLE 2.25 Multiply:

      0  10010  11001000   = 0.11001000 × 2²
    × 0  10000  10011010   = 0.10011010 × 2⁰

Multiplying 0.11001000 by 0.10011010 yields the product 0.0111100001010000; multiplying by 2² × 2⁰ = 2² gives 1.11100001010000. Renormalizing and supplying the appropriate exponent, the floating-point product is:

    0  10001  11110000

2.5.3 Floating-Point Errors

When we use pencil and paper to solve a trigonometry problem or to calculate the interest on an investment, we intuitively understand that we are working in the system of real numbers. We know that this system is infinite, because given any pair of real numbers, we can always find another real number that is smaller than one and greater than the other. Unlike the mathematics of our imaginations, computers are finite systems with finite storage. When we call upon our computers to carry out floating-point calculations, we are modeling the infinite system of real numbers within a finite system of integers. What we actually have is an approximation of the real number system. The more bits we use, the better the approximation. However, there is always some element of error, no matter how many bits we use.

Floating-point errors can be blatant, subtle, or unnoticed. The blatant errors, such as numeric overflow or underflow, are the ones that cause programs to crash. Subtle errors can lead to wildly erroneous results that are often hard to detect before they cause real problems. For example, in our simple model we can express normalized numbers in the range −0.11111111₂ × 2¹⁵ through +0.11111111₂ × 2¹⁵. Obviously, we cannot store 2⁻¹⁹ or 2¹²⁸; they simply don't fit. It is not quite so obvious that we cannot accurately store 128.5, which is well within our range.


Page 91:
Converting 128.5 to binary, we have 10000000.1, which is 9 bits wide. Our significand can hold only eight. Typically, the low-order bit is dropped or rounded into the next bit. No matter how we handle it, however, we have introduced an error into our system.

We can compute the relative error in our representation by taking the ratio of the absolute value of the error to the true value of the number. Using our example of 128.5, we find:

    (128.5 − 128) / 128.5 ≈ 0.0039 ≈ 0.39%

If we are not careful, such errors can propagate through a lengthy calculation, causing substantial loss of precision. Figure 2.3 illustrates the error propagation when we iteratively multiply 16.24 by 0.91 using our 14-bit model. Upon converting these numbers to 8-bit binary, we see that we have a substantial error from the outset. As you can see, in six iterations we have more than tripled the error in the product. Continued iterations will produce an error of 100% because the product eventually goes to zero. Although this 14-bit model is so small that it exaggerates the error, all floating-point systems behave the same way. There is always some degree of error involved when representing real numbers in a finite system, no matter how large we make that system. Even the smallest error can have catastrophic results, particularly when computers are used to track physical events, such as in military and medical applications. The challenge to computer scientists is to find efficient algorithms for controlling such errors within the bounds of performance and economics.

FIGURE 2.3 Error propagation in a 14-bit floating-point number. (The figure's table of successive products is only partly recoverable here; it shows 16.24 stored as 16.125, 0.91 stored as 0.11101000₂ = 0.90625, and the relative error in the product growing with each of the six iterations.)
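The behavior summarized in Figure 2.3 can be reproduced with a few lines of code. The sketch below is illustrative only, not the book's program: it assumes truncation (rather than rounding) to an 8-bit significand, so its exact error figures may differ slightly from the figure, but the steady growth of the relative error is the same.

    def truncate_to_significand(value, sig_bits=8):
        """Keep only `sig_bits` significant binary digits of a positive value,
        truncating the rest, mimicking the 14-bit model's 8-bit significand."""
        if value == 0:
            return 0.0
        exponent = 0
        while value >= 1:
            value /= 2
            exponent += 1
        while value < 0.5:
            value *= 2
            exponent -= 1
        truncated = int(value * (1 << sig_bits)) / (1 << sig_bits)
        return truncated * 2 ** exponent

    multiplicand = truncate_to_significand(16.24)   # already in error: stored as 16.125
    multiplier = truncate_to_significand(0.91)      # stored as 0.11101000 = 0.90625
    exact = 16.24
    product = multiplicand
    for i in range(1, 7):
        product = truncate_to_significand(product * multiplier)
        exact *= 0.91
        rel_err = abs(exact - product) / exact
        print(f"iteration {i}: model={product:.4f} exact={exact:.4f} error={rel_err:.2%}")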


Page 92:
2.5.4 The IEEE-754 Floating-Point Standard

The floating-point model we have been using in this section is designed for simplicity and conceptual understanding. We could extend this model to include whatever number of bits we wanted. Until the 1980s, such choices were purely arbitrary, resulting in numerous incompatible representations across systems from different manufacturers. In 1985, the Institute of Electrical and Electronics Engineers (IEEE) published a floating-point standard for both single- and double-precision floating-point numbers. This standard is officially known as IEEE-754 (1985).

The IEEE-754 single-precision standard uses an excess-127 bias over an 8-bit exponent. The significand is 23 bits. With the sign bit included, the total word size is 32 bits. When the exponent is 255, the quantity represented is ± infinity (which has a significand of zero) or "not a number" (which has a nonzero significand). "Not a number," or NaN, is used to represent a value that is not a real number and is often used as an error indicator.

Double-precision numbers use a signed 64-bit word consisting of an 11-bit exponent and a 52-bit significand. The bias is 1023. The range of numbers that can be represented in the IEEE double-precision model is shown in Figure 2.4. NaN is indicated when the exponent is 2047. At a slight cost in performance, most FPUs use only the 64-bit model, so that only one set of specialized circuits needs to be designed and implemented.

The IEEE-754 formats have two representations for zero: when the exponent and the significand are both all zeros, the quantity stored is zero, and it does not matter what value is stored in the sign bit. For this reason, programmers should use caution when comparing a floating-point value to zero.

Virtually every recently designed computer system has adopted the IEEE-754 floating-point model. Unfortunately, by the time the standard came along, many mainframe computer systems had established floating-point formats of their own; most manufacturers now support both their traditional floating-point system and IEEE-754. Prior to 1998, however, IBM systems had been using the same architecture for floating-point arithmetic that the original System/360 used in 1964. One would expect both systems to remain supported for some time, owing to the substantial amount of older software running on them.

FIGURE 2.4 Range of IEEE-754 double-precision numbers. (The figure is a number line showing expressible negative numbers from about −1.0 × 10³⁰⁸ to −1.0 × 10⁻³⁰⁸, zero, and expressible positive numbers from about 1.0 × 10⁻³⁰⁸ to 1.0 × 10³⁰⁸, with negative overflow, negative underflow, positive underflow, and positive overflow regions beyond those bounds.)
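The single-precision field layout just described can be inspected directly from Python using the standard struct module, which can pack a value into the 32-bit IEEE-754 format. The sketch below is illustrative only.

    import struct

    def ieee754_single_fields(x):
        """Pack x as an IEEE-754 single-precision value and pull apart the
        sign (1 bit), the biased exponent (8 bits, bias 127), and the
        23-bit significand (with an implied leading 1 for normalized numbers)."""
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        sign = bits >> 31
        exponent = (bits >> 23) & 0xFF
        significand = bits & 0x7FFFFF
        return sign, exponent, significand

    print(ieee754_single_fields(17.0))          # (0, 131, 524288): 1.0001 x 2^(131-127)
    print(ieee754_single_fields(-0.25))         # (1, 125, 0):      1.0    x 2^(125-127)
    print(ieee754_single_fields(float("inf")))  # (0, 255, 0):      exponent 255 means infinity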


Page 93:
2.6 CHARACTER CODES

We have seen how digital computers use the binary system to represent and manipulate numeric values. We have yet to consider how these internal values can be converted to a form that is meaningful to humans. The manner in which this is done depends on both the coding system used by the computer and how the values are stored and retrieved.

2.6.1 Binary-Coded Decimal

Binary-coded decimal (BCD) is a numeric coding system used primarily in IBM mainframe and midrange systems. As its name implies, BCD encodes each digit of a decimal number into a 4-bit binary form. When stored in an 8-bit byte, the upper nibble is called the zone and the lower part is called the digit. (This convention comes to us from the days of punched cards, where each column of the card could have a "zone punch" in one of the top two rows and a "digit punch" in one of the bottom ten rows.) The high-order nibble in a BCD byte is used to hold the sign, which can have one of three values: an unsigned number is indicated by 1111, a positive number by 1100, and a negative number by 1101. Coding for BCD numbers is shown in Figure 2.5.

    Digit   BCD          Zones
    0       0000         1111   Unsigned
    1       0001         1100   Positive
    2       0010         1101   Negative
    3       0011
    4       0100
    5       0101
    6       0110
    7       0111
    8       1000
    9       1001

    FIGURE 2.5 Binary-coded decimal

As you can see in the figure, six of the possible binary values, 1010 through 1111, are not used as digits. Although it may appear that nearly 40% of our values are going to waste, we gain a considerable advantage in accuracy. For example, the number 0.3 is a repeating fraction when stored in binary. Truncated to an 8-bit fraction, it converts back to 0.296875, giving us an error of


Page 94:
approximately 1.05 percent. In BCD, the number is stored directly as 1111 0011 (we are assuming the decimal point is implied by the data format), giving no error at all.

The digits of BCD numbers occupy only one nibble, so we can save space and simplify computations by placing adjacent digits into adjacent nibbles, leaving one nibble for the sign. This process is known as packing, and numbers stored this way are called packed decimal numbers.

EXAMPLE 2.26 Represent −1265 in 3 bytes using packed BCD.
The BCD coding for the digits 1, 2, 6, and 5 is 0001, 0010, 0110, and 0101. After packing, this string becomes:

    0001 0010 0110 0101

Adding the sign nibble after the low-order digit and padding the high-order nibble with 1111 to fill out 3 bytes, we get:

    1111 0001 0010 0110 0101 1101

2.6.2 EBCDIC

Before the development of the IBM System/360, IBM had used a 6-bit variation of BCD for representing characters and numbers. This code was severely limited in how it could represent and manipulate data; in fact, lowercase letters were not part of its repertoire. The designers of the System/360 needed more information-processing capability as well as a uniform manner in which to store both numbers and data. To maintain compatibility with earlier computers and peripheral equipment, the IBM engineers decided that it would be best to simply expand BCD from 6 bits to 8 bits. Accordingly, this new code was called Extended Binary Coded Decimal Interchange Code (EBCDIC). IBM continues to use EBCDIC on its mainframe and midrange computer systems. The EBCDIC code is shown in Figure 2.6 in zone-digit form. Characters are represented by appending digit bits to zone bits. For example, the character a is 1000 0001 and the digit 3 is 1111 0011 in EBCDIC. Note that the only difference between the uppercase and lowercase characters is in bit position 2, making a conversion between uppercase and lowercase a simple matter of flipping one bit. The zone bits also make it easier for a programmer to test the validity of input data.

2.6.3 ASCII

While IBM was busy building its System/360, other equipment makers were looking for better ways to transmit data between systems. The American Standard Code for Information Interchange (ASCII) is one outcome of those efforts. ASCII is a direct descendant of the coding schemes used for decades by teletype (telex) devices.
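Before turning to ASCII, note that the packing procedure of Example 2.26 can be mirrored in a few lines of Python. This is an illustrative sketch only; it applies the sign codes from Figure 2.5 (1100 positive, 1101 negative) and pads with the 1111 zone to fill whole bytes.

    def packed_bcd(n, num_bytes=3):
        """Pack a signed decimal integer as packed BCD: one nibble per digit,
        a sign nibble (1100 = +, 1101 = -) after the low-order digit, padded
        on the left with the 1111 'unsigned' zone to fill num_bytes bytes."""
        sign = 0b1101 if n < 0 else 0b1100
        nibbles = [int(d) for d in str(abs(n))] + [sign]
        while len(nibbles) < 2 * num_bytes:
            nibbles.insert(0, 0b1111)             # pad high-order nibbles
        return " ".join(f"{nib:04b}" for nib in nibbles)

    print(packed_bcd(-1265))   # 1111 0001 0010 0110 0101 1101, as in Example 2.26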


Page 95:
FIGURE 2.6 The EBCDIC code (values given in binary zone-digit format). (The full chart, which lists each character by its zone and digit bits together with a legend of control-character abbreviations such as NUL, SOH, STX, ETX, DLE, SYN, and EOT, is not legibly reproducible here.)


Page 96:
These devices used a 5-bit (Murray) code that was derived from the Baudot code, which was invented in the 1880s. By the early 1960s, the limitations of 5-bit codes were becoming apparent. The International Organization for Standardization (ISO) devised a 7-bit coding scheme that it called International Alphabet Number 5. In 1967, a derivative of this alphabet became the official standard that we now call ASCII.

As you can see in Figure 2.7, ASCII defines codes for 32 control characters, 10 digits, 52 letters (uppercase and lowercase), 32 special characters (such as $ and #), and the space character. The high-order (eighth) bit was intended to be used for parity.

Parity is the most basic of all error-detection schemes. It is easy to implement in simple devices like teletypes. A parity bit is turned "on" or "off" depending on whether the sum of the other bits in the byte is even or odd. For example, if we decide to use even parity and we are sending an ASCII A, the lower 7 bits are 100 0001. Because the sum of those bits is even, the parity bit would be set off and we would transmit 0100 0001. Similarly, if we are sending an ASCII C, 100 0011, the parity bit would be set on before we send the 8-bit byte 1100 0011. Parity can detect only single-bit errors. We will discuss more sophisticated error-detection methods in Section 2.8.

To allow compatibility with telecommunications equipment, computer manufacturers gravitated toward the ASCII code. As computer hardware became more reliable, however, the need for a parity bit began to fade. In the early 1980s, microcomputer and microcomputer-peripheral makers began using the eighth bit to provide an "extended" character set with values between 128₁₀ and 255₁₀, including symbols used to draw the sides of character boxes and accented letters from foreign languages, such as ñ. No amount of clever trickery, however, can make ASCII a truly international interchange code. Because ASCII and EBCDIC are built around the Latin alphabet, they are limited in their ability to represent the non-Latin alphabets used by the majority of the world's population. As all countries began using computers, each devised codes that would most effectively represent its native language. None of these was necessarily compatible with any other, placing yet another barrier in the way of the emerging global economy.

Before things got too far out of hand, a consortium of industry and public leaders was formed to establish a new international information-exchange code called Unicode; this group is appropriately called the Unicode Consortium. Unicode is a 16-bit alphabet that is downward compatible with ASCII and the Latin-1 character set. It is conformant with the ISO/IEC 10646-1 international alphabet. Because the base coding of Unicode is 16 bits, it has the capacity to encode the majority of characters used in every language of the world. If this were not enough, Unicode also defines an extension mechanism that allows for the coding of an additional million characters. This is sufficient to provide codes for every written language in the history of civilization.
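Returning for a moment to the parity convention described above: it can be stated in a few lines of code. This is a minimal illustrative sketch, assuming even parity over the low seven ASCII bits with the parity bit placed in the high-order (eighth) position.

    def with_even_parity(ch):
        """Return the 8-bit pattern for a 7-bit ASCII character with an even
        parity bit in the high-order (eighth) position."""
        code = ord(ch) & 0x7F
        parity = bin(code).count("1") % 2     # 1 if the number of one-bits is odd
        return (parity << 7) | code

    print(f"{with_even_parity('A'):08b}")   # 01000001  (A = 100 0001, parity bit off)
    print(f"{with_even_parity('C'):08b}")   # 11000011  (C = 100 0011, parity bit on)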


Page 97:
FIGURE 2.7 The ASCII code (values given in decimal). (The full table lists the 128 ASCII characters, including the 32 control characters such as NUL, SOH, STX, ETX, and DEL, the digits, the uppercase and lowercase letters, and the special characters, alongside their decimal values; it is not legibly reproducible here.)


Page 98:
    Character Types   Character Set Description                                Number of Characters   Hexadecimal Values
    Alphabets         Latin, Cyrillic, Greek, etc.                              8192                   0000 to 1FFF
    Symbols           Dingbats, mathematical, etc.                              4096                   2000 to 2FFF
    CJK               Chinese, Japanese, and Korean phonetic symbols
                      and punctuation                                           4096                   3000 to 3FFF
    Han               Unified Chinese, Japanese, and Korean                    40,960                  4000 to DFFF
    Han Expansion     Expansion or spillover from Han                           4096                   E000 to EFFF
    User Defined                                                                4095                   F000 to FFFE

    FIGURE 2.8 Unicode codespace

The Unicode codespace consists of five parts, as shown in Figure 2.8. A fully Unicode-compliant system will also allow the formation of composite characters from individual codes, such as combining ´ and A to form Á. The algorithms used for these composite characters, as well as the Unicode extensions, can be found in the references at the end of this chapter.

Although Unicode has yet to become the exclusive alphabet of American computers, most manufacturers include at least some limited support for it in their systems. Unicode is currently the default character set of the Java programming language. Ultimately, the acceptance of Unicode by all manufacturers will depend on how aggressively they wish to position themselves as international players and on how inexpensively disk drives can be produced to support an alphabet with double the storage requirements of ASCII or EBCDIC.

2.7 CODES FOR DATA RECORDING AND TRANSMISSION

ASCII, EBCDIC, and Unicode are represented unambiguously in computer memories. (Chapter 3 describes how this is done using binary digital devices.) Digital switches, such as those used in memories, are either "off" or "on," with nothing in between. However, when data is written to some sort of recording medium (such as tape or disk), or transmitted over long distances, binary signals can become blurred, particularly where long strings of ones and zeros are involved.


Page 99:
This blurring is partly attributable to timing drift that occurs between senders and receivers. Magnetic media, such as tapes and disks, can also lose synchronization owing to the electrical behavior of the magnetic material from which they are made. Signal transitions between the "high" and "low" states of digital signals help to maintain synchronization in data recording and communications devices. To this end, ASCII, EBCDIC, and Unicode are translated into other codes before being transmitted or recorded. This translation is carried out by control electronics within the data recording and transmission devices. Neither the user nor the host computer is aware that this translation has taken place.

Bytes are sent and received by telecommunications devices using "high" and "low" pulses in the transmission medium (copper wire, for example). Magnetic storage devices record data using changes in magnetic polarity called flux reversals. Certain coding methods are better suited to data communications than to data recording, and new codes are continually being invented to accommodate evolving recording methods and improved transmission and recording media. We will examine a few of the more popular recording and transmission codes to show how some of the challenges in this area have been overcome. For the sake of brevity, we will use the term data encoding to mean the process of converting a simple character code, such as ASCII, to some other code that better lends itself to storage or transmission; encoded data will refer to character codes so encoded.

2.7.1 Non-Return-to-Zero Code

The simplest data encoding method is the non-return-to-zero (NRZ) code. We use this code implicitly when we say that "highs" and "lows" represent ones and zeros: ones are usually high voltage and zeros are low voltage. Typically, high voltage is positive 3 or 5 volts and low voltage is negative 3 or 5 volts. (The reverse is logically equivalent.)

For example, the ASCII code for the English word OK, with even parity, is 11001111 01001011. This pattern in NRZ code is shown in both its signal form and its magnetic flux form in Figure 2.9. Each of the bits occupies an arbitrary slice of time in a transmission medium, or an arbitrary speck of space on a disk. These slices and specks are called bit cells.

As you can see from the figure, there is a long run of ones in the ASCII O. If we were transmitting the longer form of the word OK, OKAY, we would have a long string of zeros as well as a long string of ones: 11001111 01001011 01000001 01011001. Unless the receiver is precisely synchronized with the sender, neither can know the exact duration of the signal for each bit cell. A slow or out-of-phase clock within the receiver might cause the bit sequence for OKAY to be received as 10011 0100101 010001 0101001, which would be translated back to ASCII as <ETX>(), bearing no resemblance to what was sent. (<ETX> is used here to mean the single ASCII End-of-Text character.)


Page 100:
FIGURE 2.9 NRZ encoding of OK as (a) a transmission waveform and (b) a magnetic flux pattern (the direction of the arrows indicates the magnetic polarity). The bit pattern shown is 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 1, with ones at the high level and zeros at the low level.

A little experimentation with this example will demonstrate that if only one bit is lost in NRZ code, the entire message can be reduced to gibberish.

2.7.2 Non-Return-to-Zero-Invert Encoding

The non-return-to-zero-invert (NRZI) method addresses part of the synchronization problem. NRZI provides a transition (either high-to-low or low-to-high) for each binary one, and no transition for binary zero. The NRZI encoding of OK (with even parity) is shown in Figure 2.10.

Although NRZI eliminates the problem of dropping binary ones, we are still faced with the problem of long strings of zeros causing the receiver or reader to drift out of phase, potentially dropping bits along the way. The obvious approach to solving this problem is to inject sufficient transitions into the transmitted waveform to keep the sender and receiver synchronized, while preserving the information content of the message. This is the essential idea behind all coding methods used today for the storage and transmission of data.

FIGURE 2.10 The NRZI encoding of OK.
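The difference between NRZ and NRZI is easy to see in code. The sketch below is illustrative only, with 'H' and 'L' standing for the high and low signal levels; note how NRZI produces a level change exactly once for each 1.

    def nrz(bits):
        """NRZ: a 1 is the high level, a 0 is the low level."""
        return ["H" if b == "1" else "L" for b in bits]

    def nrzi(bits, start="L"):
        """NRZI: a 1 toggles the current level, a 0 leaves it unchanged."""
        level, out = start, []
        for b in bits:
            if b == "1":
                level = "H" if level == "L" else "L"   # a transition marks a 1
            out.append(level)
        return out

    ok = "1100111101001011"          # ASCII 'OK' with even parity, as in the text
    print("".join(nrz(ok)))          # HHLLHHHHLHLLHLHH
    print("".join(nrzi(ok)))         # HLLLHLHLLHHHLLHL (depends on the starting level)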


Page 101:
2.7.3 Phase Modulation (Manchester Coding)

The coding method known commonly as phase modulation (PM), or Manchester coding, deals with the synchronization problem head-on. PM provides a transition for each bit, whether it is a one or a zero. In PM, each binary one is signaled by an "up" transition and each binary zero by a "down" transition. Extra transitions are provided at bit cell boundaries when necessary. The PM coding of the word OK is shown in Figure 2.11.

Phase modulation is often used in data transmission applications such as local area networks. It is inefficient for use in data storage, however. If PM were used for tape and disk, phase modulation would require twice the bit density of NRZ (one flux transition for each half bit cell, as depicted in Figure 2.11b). We have just seen, though, that using NRZ can lead to unacceptably high error rates. We could therefore define a "good" encoding scheme as one that most economically achieves a balance between "excessive" storage volume requirements and "excessive" error rates. A number of codes have been created in trying to find this middle ground.

2.7.4 Frequency Modulation

As used in digital applications, frequency modulation (FM) is similar to phase modulation in that at least one transition is supplied for each bit cell. These synchronizing transitions occur at the beginning of each bit cell. To encode a binary 1, an additional transition is provided in the center of the bit cell. The FM coding for OK is shown in Figure 2.12.

As you can readily see from the figure, FM is only slightly better than PM with respect to its storage requirements. FM, however, lends itself to a coding method called modified frequency modulation (MFM), whereby a transition at the bit cell boundary is provided only between consecutive zeros.

FIGURE 2.11 Phase modulation (Manchester coding) of the word OK as (a) a transmission waveform and (b) a magnetic flux pattern.
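The PM convention of Figure 2.11 can be sketched in code. The representation below (half-bit-cell resolution, 'H'/'L' levels) is an illustrative assumption of this sketch rather than the book's notation; it shows the mid-cell transition in every bit and the extra boundary transitions that arise between like bits.

    def manchester(bits):
        """Phase (Manchester) encoding at half-bit-cell resolution:
        a 1 is sent as low-then-high ('LH'), a 0 as high-then-low ('HL'),
        so every bit cell contains a mid-cell transition."""
        return "".join("LH" if b == "1" else "HL" for b in bits)

    ok = "1100111101001011"           # ASCII 'OK' with even parity
    wave = manchester(ok)
    print(wave)                        # LHLHHLHL...
    # Boundary transitions appear wherever two equal half-cells meet,
    # e.g. between two consecutive ones or two consecutive zeros.
    transitions = sum(1 for a, b in zip(wave, wave[1:]) if a != b)
    print(transitions)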


Page 102:
FIGURE 2.12 The FM coding of OK.

With MFM, then, at least one transition is supplied for every pair of bit cells, as opposed to every cell in PM or FM. With fewer transitions than PM and more transitions than NRZ, MFM is a highly effective code in terms of economy and error control. For many years, MFM was virtually the only coding method used for hard disk storage. The MFM coding for OK is shown in Figure 2.13.

2.7.5 Run-Length-Limited Code

Run-length-limited (RLL) is a coding method in which block character code words, such as ASCII or EBCDIC, are translated into code words specially designed to limit the number of consecutive zeros appearing in the code. An RLL(d, k) code allows a minimum of d and a maximum of k consecutive zeros to appear between any pair of consecutive ones.

Clearly, RLL code words must contain more bits than the original character code. However, because RLL is coded using NRZI on the disk, RLL-coded data actually occupies less space on magnetic media, because fewer flux transitions are involved. The code words employed by RLL are designed to prevent a disk from losing synchronization, as it would if a "flat" binary NRZI code were used.

Although many variants exist, RLL(2, 7) is the predominant code used by magnetic disk systems. It is technically a 16-bit mapping of 8-bit ASCII or EBCDIC characters. However, it is nearly 50% more efficient than MFM in terms of flux reversals. (Proof of this is left as an exercise.)

Theoretically speaking, RLL is a form of data compression called Huffman coding (discussed in Chapter 7), in which the most likely information bit patterns are encoded using the shortest code word bit patterns.

FIGURE 2.13 The MFM coding of OK.


Page 103:
(In our case, the concern is the smallest number of flux reversals.) The theory is based on the assumption that the presence or absence of a 1 in any bit cell is an equally likely event. From this assumption we can infer that the probability is 0.25 of the pattern 10 occurring within any pair of adjacent bit cells. (P(bᵢ = 1) = 1/2; P(bⱼ = 0) = 1/2; therefore P(bᵢbⱼ = 10) = 1/2 × 1/2 = 1/4.) Similarly, the bit pattern 011 has a probability of 0.125 of occurring. Figure 2.14 shows the probability tree for the bit patterns used in RLL(2, 7). Figure 2.15 gives the bit patterns used by RLL(2, 7).

Figure 2.16 compares the MFM coding for OK with its RLL(2, 7) NRZI coding. MFM has 12 flux transitions to 8 transitions for RLL. If the limiting factor in the design of a disk is the number of flux transitions per square millimeter, we can pack 50% more OKs into the same magnetic area using RLL than we could using MFM. For this reason, RLL is used almost exclusively in the manufacture of high-capacity disk drives.

FIGURE 2.14 The probability tree for RLL(2, 7) coding. (The tree covers the source patterns 10, 11, 000, 010, 011, 0010, and 0011; under the equal-likelihood assumption above, their probabilities are 1/4, 1/4, 1/8, 1/8, 1/8, 1/16, and 1/16, respectively.)

FIGURE 2.15 RLL(2, 7) coding. (The table pairs each character bit pattern with its RLL(2, 7) code word; its entries are not legibly recoverable here.)


Page 104:
FIGURE 2.16 MFM (upper) and RLL(2, 7) (lower) coding for OK; the MFM waveform contains 12 flux transitions, the RLL waveform 8.

2.8 ERROR DETECTION AND CORRECTION

No communications channel or storage medium can be completely error-free, regardless of the coding method used. It is a physical impossibility. As transmission rates are increased, bit timing gets tighter. As more bits are packed per square millimeter of storage, flux densities increase. Error rates increase in direct proportion to the number of bits per second transmitted, or the number of bits per square millimeter of magnetic storage.

In Section 2.6.3 we mentioned that a parity bit could be added to an ASCII byte to help determine whether any of the bits had become corrupted during transmission. This method of error detection is limited in its effectiveness: simple parity can detect only an odd number of errors per byte. If two errors occur, we are helpless to detect a problem. In Section 2.7.1 we showed how the 4-byte sequence for the word OKAY could be received as the 3-byte sequence <ETX>(). Alert readers will have noticed that the parity bits for the second sequence are correct, allowing nonsense to pass for good data. If such errors occur in sending financial information or program code, the effects can be disastrous.

As you read the sections that follow, you should keep in mind that just as it is impossible to create an error-free medium, it is also impossible to detect or correct 100% of all errors that could occur in a medium. Error detection and correction is yet another study of the trade-offs that must be made in designing computer systems. A well-constructed error control system is therefore one in which a "reasonable" number of the "reasonably" expected errors can be detected or corrected within the bounds of "reasonable" economics. (Note: the word reasonable is implementation-dependent.)

2.8.1 Cyclic Redundancy Check

Checksums are used in a wide variety of coding systems, from bar codes to International Standard Book Numbers. These are self-checking codes that will quickly indicate whether the preceding digits have been misread.


Page 105:
A cyclic redundancy check (CRC) is a type of checksum used primarily in data communications that determines whether an error has occurred within a large block or stream of information bytes. The larger the block to be checked, the larger the checksum must be to provide adequate protection. Checksums and CRCs are types of systematic error detection schemes, meaning that the error-checking bits are appended to the original information byte. The group of error-checking bits is called a syndrome. The original information byte is unchanged by the addition of the error-checking bits.

The word cyclic in cyclic redundancy check refers to the abstract mathematical theory behind this error control system. Although a discussion of this theory is beyond the scope of this text, we can demonstrate how the method works in order to understand its power to detect transmission errors economically.

Arithmetic Modulo 2

You may be familiar with integer arithmetic taken over a modulus; twelve-hour clock arithmetic is a modulo 12 system used every day to tell time. When we add 2 hours to 11:00, we get 1:00. Arithmetic modulo 2 uses two binary operands with no borrows or carries. The result is likewise binary and is also a member of the modulo 2 system. Because of this closure under addition, and the existence of identity elements, mathematicians say that this modulo 2 system forms an algebraic field. The addition rules are as follows:

    0 + 0 = 0
    0 + 1 = 1
    1 + 0 = 1
    1 + 1 = 0

EXAMPLE 2.27 Find the sum of 1011₂ and 110₂ modulo 2.

      1011
    +  110
      1101 (mod 2)

This sum makes sense only in modulo 2.

Modulo 2 division operates through a series of partial sums using the modulo 2 addition rules. Example 2.28 illustrates the process.


Page 106:
EXAMPLE 2.28 Find the quotient and remainder when 1001011₂ is divided by 1011₂.

    1011 ) 1001011
           1011
           ----
            0010
            001001
              1011
              ----
              0010
              00101

    1. Write the divisor directly below the first bit of the dividend.
    2. Add these numbers modulo 2 (1001 + 1011 = 0010).
    3. Bring down bits of the dividend until the first 1 of the difference aligns with the first 1 of the divisor.
    4. Copy the divisor as in Step 1.
    5. Add as in Step 2 (1001 + 1011 = 0010), and bring down another bit.
    6. 101₂ is not divisible by 1011₂, so this is the remainder.

The quotient is 1010₂.

Arithmetic operations over the modulo 2 field have polynomial equivalents that are analogous to polynomials over the field of integers. We have seen how positional number systems represent numbers in increasing powers of a radix, for example 1011₂ = 1 × 2³ + 0 × 2² + 1 × 2¹ + 1 × 2⁰. By letting X = 2, the binary number 1011₂ becomes shorthand for the polynomial 1 × X³ + 0 × X² + 1 × X¹ + 1 × X⁰. The division performed in Example 2.28 then becomes the polynomial operation:

    (X⁶ + X³ + X + 1) / (X³ + X + 1)

Calculating and Using CRCs

With that lengthy preamble behind us, we can now proceed to show how CRCs are constructed. We will do this by example:

1. Let the information byte I = 1001011₂. (Any number of bytes can be used to form a message block.)
2. The sender and receiver agree upon an arbitrary binary pattern, say P = 1011₂. (Patterns beginning and ending with 1 work best.)


Page 107:
3. Shift I to the left by one less than the number of bits in P, giving a new I = 1001011000₂.
4. Using I as a dividend and P as a divisor, perform the modulo 2 division (as shown in Example 2.28). We ignore the quotient and note that the remainder is 100₂. The remainder is the actual CRC checksum.
5. Add the remainder to I, giving the message M = 1001011100₂.
6. M is decoded and checked by the message receiver using the reverse process. Only now P divides M exactly:

             1010100
    1011 ) 1001011100
           1011
           ----
            001001
              1011
              ----
              001011
                1011
                ----
                0000

A remainder other than zero indicates that an error has occurred in the transmission of M. This method works best when a large prime polynomial is used. Four standard polynomials are widely used for this purpose:

• CRC-CCITT (ITU-T): X¹⁶ + X¹² + X⁵ + 1
• CRC-12: X¹² + X¹¹ + X³ + X² + X + 1
• CRC-16 (ANSI): X¹⁶ + X¹⁵ + X² + 1
• CRC-32: X³² + X²⁶ + X²³ + X²² + X¹⁶ + X¹² + X¹¹ + X¹⁰ + X⁸ + X⁷ + X⁵ + X⁴ + X² + X + 1

CRC-CCITT, CRC-12, and CRC-16 operate over pairs of bytes; CRC-32 uses four bytes, which is appropriate for systems operating on 32-bit words. CRCs using these polynomials have been shown to detect over 99.8% of all single-bit errors.

CRCs can be implemented effectively using lookup tables instead of calculating the remainder with each byte. The remainder generated by each possible input bit pattern can be "burned" directly into communications and storage electronics. The remainder can then be retrieved with a one-cycle lookup, as compared to a 16- or 32-cycle division operation. Clearly, the trade-off is one of speed versus the cost of more complex control circuitry.
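The whole procedure, modulo 2 division included, fits in a few lines of code. The sketch below is illustrative only (it works on Python integers rather than hardware shift registers) and reproduces the running example: I = 1001011, P = 1011, CRC = 100, M = 1001011100.

    def mod2_remainder(dividend, divisor):
        """Remainder of modulo 2 (XOR) division of one bit pattern by another."""
        dlen = divisor.bit_length()
        while dividend.bit_length() >= dlen:
            shift = dividend.bit_length() - dlen
            dividend ^= divisor << shift        # modulo 2 subtraction is XOR
        return dividend

    def crc_append(info, pattern):
        """Shift the information bits left by len(pattern) - 1, divide modulo 2,
        and append the remainder, as in steps 1-5 of the text."""
        shifted = info << (pattern.bit_length() - 1)
        return shifted | mod2_remainder(shifted, pattern)

    I, P = 0b1001011, 0b1011
    M = crc_append(I, P)
    print(bin(mod2_remainder(I << 3, P)))    # 0b100        -- the CRC checksum
    print(bin(M))                            # 0b1001011100 -- the transmitted message
    print(mod2_remainder(M, P))              # 0            -- the receiver's check passes
    print(mod2_remainder(M ^ 0b10000, P))    # non-zero     -- a corrupted bit is caught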


Page 108:
2.8.2 Hamming Codes

Data communications channels are simultaneously more error-prone and more tolerant of errors than disk systems. In data communications, it is sufficient to have only the ability to detect errors: if a communications device determines that a message contains an erroneous bit, all it has to do is request retransmission. Storage systems and memory do not have this luxury. A disk can sometimes be the sole repository of a financial transaction or other collection of nonreproducible real-time data. Storage devices and memory must therefore have the ability not only to detect but also to correct a reasonable number of errors.

Error-recovery coding has been studied intensively over the past century; one of the most effective and oldest of these codes is the Hamming code. Hamming codes are an adaptation of the concept of parity, whereby error detection and correction capabilities increase in proportion to the number of parity bits added to an information word. Hamming codes are used in situations where random errors are likely to occur. With random errors, we assume that each bit failure has a fixed probability of occurrence independent of other bit failures. Computer memory commonly experiences such errors, so in the discussion that follows we present Hamming codes in the context of detecting and correcting memory bit errors.

We mentioned that Hamming codes use parity bits, also called check bits or redundant bits. The memory word itself consists of m bits, but r redundant bits are added to allow for error detection and/or correction. Thus, the final word, called a code word, is an n-bit unit containing m data bits and r check bits. There exists a unique code word consisting of n = m + r bits for each data word, as follows:

    | m bits | r bits |

The number of bit positions in which two code words differ is called the Hamming distance of those two code words. For example, consider the following two code words:

    1 0 0 0 1 0 0 1
    1 0 1 1 0 0 0 1
        *  *  *

We see that they differ in 3 bit positions (marked with asterisks), so the Hamming distance of these two code words is 3. (Please note that we have not yet discussed how to create code words; we do that shortly.)

The Hamming distance between two code words is important in the context of error detection. If two code words are a Hamming distance d apart, d single-bit errors are required to convert one code word into the other, which implies that this type of error would not be detected.


Page 109:
Therefore, if we wish to create a code that guarantees the detection of all single-bit errors (an error in only one bit), all pairs of code words must have a Hamming distance of at least 2. If an n-bit word is not recognized as a legal code word, it is interpreted as an error.

Given an algorithm for computing check bits, it is possible to construct a complete list of legal code words. The smallest Hamming distance found among all pairs of code words in a code is called the minimum Hamming distance for the code, written D(min); it determines the code's error detecting and correcting capabilities. Stated succinctly, for any code word X to be interpreted as a different valid code word Y, at least D(min) errors must occur in X. So, to detect k (or fewer) single-bit errors, the code must have a Hamming distance of D(min) = k + 1. Hamming codes can always detect D(min) − 1 errors and correct ⌊(D(min) − 1)/2⌋ errors.¹ Accordingly, the Hamming distance of a code must be at least 2k + 1 in order for it to be able to correct k errors.

Code words are constructed from information words using r parity bits. Before we continue the discussion of error detection and correction, let's consider a simple example. The most common error detection scheme uses a single parity bit appended to the data (recall the earlier discussion of ASCII character representation). A single-bit error in any bit of the code word produces the wrong parity (here we assume even parity).

EXAMPLE 2.29 With 2 data bits, we have a total of four possible words. We list here each data word, its corresponding parity bit, and the resulting code word:

    Data Word   Parity Bit   Code Word
    00          0            000
    01          1            011
    10          1            101
    11          0            110

The resulting code words have 3 bits. However, using 3 bits allows for 8 different bit patterns, as follows (valid code words are marked with an asterisk):

    000*   001    010    011*
    100    101*   110*   111

¹ The ⌊ ⌋ brackets denote the integer floor function, which is the largest integer that is smaller than or equal to the enclosed quantity. For example, ⌊8.3⌋ = 8 and ⌊8.9⌋ = 8.
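Hamming distance is simply the number of positions in which two equal-length bit patterns differ, so it can be computed by XORing the patterns and counting one-bits. A minimal illustrative Python sketch, applied to the example patterns above:

    def hamming_distance(x, y):
        """Number of bit positions in which the two patterns differ."""
        return bin(x ^ y).count("1")

    print(hamming_distance(0b10001001, 0b10110001))   # 3, as in the text's example

    # Minimum Hamming distance of the 3-bit even-parity code above:
    code = [0b000, 0b011, 0b101, 0b110]
    dmin = min(hamming_distance(a, b) for a in code for b in code if a != b)
    print(dmin)                                       # 2: detects single-bit errors only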


Page 110:
If the code word 001 is encountered, it is invalid and thus indicates that an error has occurred somewhere in the code word. For example, suppose the correct code word to be stored in memory is 011, but an error produces 001. This error can be detected, but it cannot be corrected: it is impossible to determine exactly how many bits have been flipped and exactly which ones are in error. Error-correcting codes require more than a single parity bit, as we see in the following discussion.

What happens in the above example if a valid code word is subject to two bit errors? For example, suppose the code word 011 is changed into 000. This error is not detected. If you examine the code in the example above, you will see that D(min) is 2, which implies that this code is guaranteed to detect only single-bit errors.

We have already stated that the error detecting and correcting capabilities of a code depend on D(min), and, from an error detection point of view, we saw this relationship exhibited in Example 2.29. Error correction requires the code to contain additional redundant bits to ensure a minimum Hamming distance D(min) = 2k + 1 if the code is to detect and correct k errors. This Hamming distance guarantees that all legal code words are far enough apart that, even with k changes, the resulting invalid code word is closer to one unique valid code word than to any other. This is important because the method used in error correction is to change the invalid code word into the valid code word that differs from it in the fewest bits. This idea is illustrated in Example 2.30.

EXAMPLE 2.30 Suppose we have the following code (do not worry at this time about how this code was generated; we address that issue shortly):

    0 0 0 0 0
    0 1 0 1 1
    1 0 1 1 0
    1 1 1 0 1

First, let's determine D(min). By examining all possible pairs of code words, we discover that the minimum Hamming distance D(min) = 3. Thus, this code can detect up to two errors and correct one single-bit error. How is correction handled? Suppose we read the invalid code word 10000. There must be at least one error, because 10000 does not match any of the valid code words. We now determine the Hamming distance between the observed code word and each legal code word: it differs in 1 bit from the first code word, 4 from the second, 2 from the third, and 3 from the last, resulting in a difference vector of [1, 4, 2, 3]. To make the correction, we automatically correct to the legal code word closest to the observed word, resulting in a correction to 00000. Note that this "correction" is not necessarily correct!
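The correction procedure of Example 2.30 (pick the unique valid code word nearest to what was read, and give up when there is a tie) can be sketched as follows; this is illustrative only, using the code word list from the example.

    def hamming_distance(x, y):
        return bin(x ^ y).count("1")

    def correct(word, codebook):
        """Return the unique nearest legal code word, or None when the nearest
        distance is shared by more than one code word (correction impossible)."""
        distances = {cw: hamming_distance(word, cw) for cw in codebook}
        best = min(distances.values())
        nearest = [cw for cw, d in distances.items() if d == best]
        return nearest[0] if len(nearest) == 1 else None

    codebook = [0b00000, 0b01011, 0b10110, 0b11101]    # Example 2.30's code
    print(f"{correct(0b10000, codebook):05b}")          # 00000 (difference vector 1, 4, 2, 3)
    print(correct(0b11000, codebook))                   # None  (difference vector 2, 3, 3, 2)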


Page 111:
We are assuming that the minimum number of possible errors has occurred, namely one. It is possible, however, that the original code word was supposed to be 10110 and was changed to 10000 when two errors occurred.

Suppose two errors really did occur. For example, suppose we read the invalid code word 11000. If we calculate the distance vector, we get [2, 3, 3, 2]; there is no "closest" code word, and we are unable to make the correction. The minimum Hamming distance of three permits the correction of one error only and, as this example shows, cannot ensure correctness if more than one error occurs.

In our discussion so far we have introduced the concepts of Hamming distance and minimum Hamming distance, but we have not given any details on how the codes themselves are generated. Many methods are used for code generation; perhaps one of the most intuitive is the Hamming algorithm for code design, which we now present. Before explaining the actual steps of the algorithm, we provide some background material.

Suppose we wish to design a code with words consisting of m data bits and r check bits that allows single-bit errors to be corrected. This implies that there are 2^m legal code words, each with a unique combination of check bits. Because we are focused on single-bit errors, let's examine the set of invalid code words that are a distance of 1 from each legal code word. Each valid code word has n bits, and an error could occur in any of these n positions; thus, each valid code word has n invalid code words at a distance of 1. Therefore, if we are concerned with each legal code word and each invalid code word consisting of one error, we have n + 1 bit patterns associated with each code word (1 legal word and n illegal words). Because each code word consists of n bits, where n = m + r, there are 2^n total bit patterns possible. This results in the following inequality:

    (n + 1) × 2^m ≤ 2^n

where n + 1 is the number of bit patterns per code word, 2^m is the number of legal code words, and 2^n is the total number of bit patterns possible. Because n = m + r, we can rewrite the inequality as:

    (m + r + 1) × 2^m ≤ 2^(m+r)

or

    (m + r + 1) ≤ 2^r

This inequality is important because it specifies the lower limit on the number of check bits required (we always use as few check bits as possible) to construct a code with m data bits and r check bits that corrects all single-bit errors.

Suppose we have data words of length m = 4. Then (4 + r + 1) ≤ 2^r, which implies that r must be greater than or equal to 3.
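The inequality (m + r + 1) ≤ 2^r is easy to evaluate directly. The small illustrative sketch below finds the smallest r for a few data-word sizes.

    def min_check_bits(m):
        """Smallest r satisfying (m + r + 1) <= 2**r for single-error correction."""
        r = 1
        while (m + r + 1) > 2 ** r:
            r += 1
        return r

    for m in (4, 8, 32, 64):
        print(m, min_check_bits(m))   # 4 -> 3, 8 -> 4, 32 -> 6, 64 -> 7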


Page 112:
We choose r = 3. This means that to build a code with data words of 4 bits that will correct single-bit errors, we must add 3 check bits.

The Hamming algorithm provides a straightforward method for designing codes to correct single-bit errors. To construct error-correcting codes for any size memory word, we follow these steps:

1. Determine the number of check bits, r, necessary for the code, and then number the n = m + r bits from right to left, starting with 1 (not 0).
2. Each bit whose bit number is a power of 2 is a parity bit; the others are data bits.
3. Assign parity bits to check bit positions as follows: bit b is checked by those parity bits b₁, b₂, ..., bⱼ such that b₁ + b₂ + ... + bⱼ = b (where "+" indicates the modulo 2 sum).

We now present an example to illustrate these steps and the actual process of error correction.

EXAMPLE 2.31 Using the Hamming code just described and even parity, encode the 8-bit ASCII character K. (The high-order bit will be zero.) Induce a single-bit error, and then indicate how to locate the error.

We first determine the code word for K.

Step 1: Determine the number of necessary check bits, add these bits to the data bits, and number all n bits. Because m = 8, we have (8 + r + 1) ≤ 2^r, which implies that r must be greater than or equal to 4. We choose r = 4.

Step 2: Number the n = 12 bits from right to left, starting with 1:

    12  11  10   9  [8]   7   6   5  [4]   3  [2]  [1]

The parity bits are marked by brackets (positions 1, 2, 4, and 8, the powers of 2).

Step 3: Assign parity bits to check the various bit positions. To carry out this step, we first write all bit positions as sums of those numbers that are powers of 2:

    1 = 1         5 = 1 + 4         9 = 1 + 8
    2 = 2         6 = 2 + 4        10 = 2 + 8
    3 = 1 + 2     7 = 1 + 2 + 4    11 = 1 + 2 + 8
    4 = 4         8 = 8            12 = 4 + 8

The number 1 contributes to the sums for positions 1, 3, 5, 7, 9, and 11, so this parity bit will reflect the parity of the bits in these positions. Similarly, 2 contributes to positions 2, 3, 6, 7, 10, and 11, so the parity bit in position 2 reflects the parity of this set of bits. Bit 4 provides parity for positions 4, 5, 6, 7, and 12, and bit 8 provides parity for positions 8, 9, 10, 11, and 12.


Page 113:
and 12. If we write the data bits of K into the non-parity positions and then fill in the parity bits, we get the following codeword as a result:

 0  1  0  0  1  1  0  1  0  1  1  0
12 11 10  9  8  7  6  5  4  3  2  1

Therefore, the codeword for K is 010011010110.

We now introduce an error in bit position 9, resulting in the codeword 010111010110. If we use the parity bits to check the various sets of bits, we find the following:

Bit 1 checks 1, 3, 5, 7, 9, and 11: with even parity, this produces an error.
Bit 2 checks 2, 3, 6, 7, 10, and 11: OK.
Bit 4 checks 4, 5, 6, 7, and 12: OK.
Bit 8 checks 8, 9, 10, 11, and 12: this produces an error.

Parity bits 1 and 8 show errors. These two parity bits both check bits 9 and 11, so the single-bit error must be in either bit 9 or bit 11. However, since bit 2 checks bit 11 and indicates that no error has occurred in its subset of bits, the error must be in bit 9. (We know this because we created the error; note, however, that even if we had no idea where the error was, this method would allow us to determine the position of the error and correct it by simply flipping the bit.)

Because of the way the parity bits are positioned, an even easier method of detecting and correcting the error bit is to add the positions of the parity bits that indicate an error. We found that parity bits 1 and 8 produced an error, and 1 + 8 = 9, which is exactly where the error occurred. In the next chapter, you will see how easy it is to implement a Hamming code using simple binary circuits. Because of its simplicity, Hamming code protection can be added inexpensively and with minimal impact on performance.

Fixed magnetic disk drives have error rates on the order of 1 bit in 100 million. The single-bit error-correcting Hamming code we have just studied will easily correct this type of error. However, Hamming codes are useless in situations where several adjacent bits are likely to be damaged. These kinds of errors are called burst errors. Because of their exposure to rough handling and environmental stress, burst errors are common with removable media such as magnetic tapes and compact discs.
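To make the bookkeeping concrete, here is a small illustrative sketch (ours, not from the text) in Python. It builds the 12-bit even-parity Hamming codeword for an 8-bit character and locates a single-bit error by summing the positions of the failing parity bits; the helper names are our own:

    def hamming_encode_8(data_bits):
        """data_bits: list of 8 bits, most significant first (ASCII 'K' = 01001011)."""
        code = [0] * 13                                  # positions 1..12; index 0 unused
        data_positions = [12, 11, 10, 9, 7, 6, 5, 3]     # non-power-of-2 positions, left to right
        for pos, bit in zip(data_positions, data_bits):
            code[pos] = bit
        for p in (1, 2, 4, 8):                           # even parity over every position containing p
            covered = [i for i in range(1, 13) if i & p]
            code[p] = sum(code[i] for i in covered if i != p) % 2
        return code

    def locate_error(code):
        """Return 0 if all parity checks pass, else the position of the single flipped bit."""
        return sum(p for p in (1, 2, 4, 8)
                   if sum(code[i] for i in range(1, 13) if i & p) % 2 != 0)

    k = [0, 1, 0, 0, 1, 0, 1, 1]                         # ASCII 'K'
    cw = hamming_encode_8(k)
    print(''.join(str(cw[i]) for i in range(12, 0, -1))) # 010011010110
    cw[9] ^= 1                                           # induce an error at bit 9
    print(locate_error(cw))                              # 9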


Page 114:
Reed-Solomon coding operates at the block level, unlike a Hamming code, which operates at the bit level. A Reed-Solomon (RS) code can be thought of as a CRC that operates over entire characters instead of over a few bits. Like CRCs, RS codes are systematic: the parity bytes are appended to a block of information bytes. RS(n, k) codes are defined using the following parameters:

• s = the number of bits in a character (or "symbol")
• k = the number of s-bit characters comprising the data block
• n = the number of bits in the codeword

An RS(n, k) code can correct (n − k)/2 errors in the k information bytes.

The popular RS(255, 223) code therefore uses 223 8-bit information bytes and 32 syndrome bytes to form 255-byte codewords. It will correct as many as 16 erroneous bytes in the information block.

The generator polynomial of a Reed-Solomon code is given by a polynomial defined over an abstract mathematical structure called a Galois field. (A lucid discussion of Galois mathematics would take us too far afield. See the references at the end of the chapter.) The Reed-Solomon generator polynomial is:

g(x) = (x − a^i)(x − a^(i+1)) . . . (x − a^(i+2t))

where t = n − k, x is an entire byte (or symbol), and g(x) operates over the field GF(2^s). (Note: this polynomial expands over the Galois field, which is considerably different from the integer fields used in ordinary algebra.)

The n-byte RS codeword is computed using the equation:

c(x) = g(x) × i(x)

where i(x) is the information block.

Despite the daunting algebra behind them, Reed-Solomon error-correction algorithms lend themselves well to implementation in computer hardware. They are implemented in the high-performance disk drives of mainframe computers as well as in the compact discs used for music and data storage. These implementations are described in Chapter 7.

CHAPTER SUMMARY

This chapter has presented the fundamentals of data representation and numerical operations in digital computers. You should master the techniques described for base conversion and memorize the smaller binary and hexadecimal numbers. This knowledge will benefit you as you study the rest of this book. Your knowledge of hexadecimal coding will be useful if you are ever required to read a core (memory) dump after a system crash or if you do any serious work in the field of data communications.

You have also seen that computations with floating-point numbers can produce significant errors, and that small errors can compound through iterative processes. There are various numerical techniques that can be used to control such errors. These techniques merit detailed study, but they are beyond the scope of this book.

You have learned that most computers use ASCII or EBCDIC to represent characters. There is usually little value in memorizing either of these codes in their entirety, but if you work with them frequently, you will find yourself learning a number of "key values" from which you can compute most of the others that you need.


Page 115:
Unicode is the standard character set used by Java and recent versions of Windows. It is likely to replace EBCDIC and ASCII as the basic method of character representation in computer systems; however, the older codes will be with us for the foreseeable future, owing to their economy and ubiquity.

Your knowledge of how bytes are stored on disks and tape will help you to understand many of the issues and problems relating to data storage. Your familiarity with error-control methods will aid you in your study of both data storage and data communications. You will learn more about data storage in Chapter 7. Chapter 11 presents topics relating to data communications.

Error-detecting and error-correcting codes are used in virtually all facets of computing technology. Should the need arise, your understanding of the various error-control methods will help you to make informed choices among the many options available. The method that you choose will depend on many factors, including computational overhead and the capacity of the storage and transmission media available to you.

FURTHER READING

A brief overview of early mathematics in Western civilization can be found in Bunt (1988). A delightful and thorough discussion of the evolution of number systems and computer arithmetic is presented by Knuth (1998) in Volume 2 of his series, The Art of Computer Programming. (Every computer scientist should own a set of the Knuth books.) A definitive account of floating-point arithmetic can be found in Goldberg (1991). Schwarz et al. (1999) describe how the IBM System/390 performs floating-point operations in both the older form and the IEEE standard. Soderquist and Leeser (1996) provide an excellent and detailed discussion of the problems involved in floating-point division and square roots.

Details of the Unicode standard are given in the Unicode Consortium's The Unicode Standard, Version 3.0 (2000). The website of the International Organization for Standardization can be found at www.iso.ch. You will be amazed at the span of influence of this group. A similar trove of information can be found at the website of the American National Standards Institute: www.ansi.org.

The best information pertinent to data encoding for data storage can be found in electrical engineering books. They offer fascinating insights into the behavior of physical media and how this behavior is harnessed by various coding methods. We found the book by Mee and Daniel (1988) particularly helpful.

After you have mastered the ideas presented in Chapter 3, you will enjoy reading Arazi's (1988) book. This well-written book shows how error detection and correction are achieved using simple digital circuits. The appendix of this book gives a remarkably lucid discussion of the Galois field arithmetic used in Reed-Solomon codes.


Page 116:
If you prefer a rigorous and exhaustive study of error-correction theory, Pretzel's (1992) book is an excellent place to start. The text is accessible, well written, and complete.

Detailed discussions of Galois fields can be found in the (inexpensive!) books by Artin (1998) and Warner (1990). Warner's much larger book is a clearly written and comprehensive introduction to the concepts of abstract algebra. A study of abstract algebra will be useful should you delve into the study of mathematical cryptography, a fast-growing area of interest in computer science.

REFERENCES

Arazi, Benjamin. A Commonsense Approach to the Theory of Error Correcting Codes. Cambridge, MA: The MIT Press, 1988.
Artin, Emil. Galois Theory. New York: Dover Publications, 1998.
Bunt, Lucas N. H., Jones, Phillip S., & Bedient, Jack D. The Historical Roots of Elementary Mathematics. New York: Dover Publications, 1988.
Goldberg, David. "What Every Computer Scientist Should Know About Floating-Point Arithmetic." ACM Computing Surveys 23:1, March 1991, pp. 5–47.
Knuth, Donald E. The Art of Computer Programming, 3rd ed. Reading, MA: Addison-Wesley, 1998.
Mee, C. Denis, & Daniel, Eric D. Magnetic Recording, Volume II: Computer Data Storage. New York: McGraw-Hill, 1988.
Pretzel, Oliver. Error-Correcting Codes and Finite Fields. New York: Oxford University Press, 1992.
Schwarz, Eric M., Smith, Ronald M., & Krygowski, Christopher A. "The S/390 G5 Floating-Point Unit Supporting Hex and Binary Architectures." Proceedings of the 14th IEEE Symposium on Computer Arithmetic, 1999, pp. 258–265.
Soderquist, Peter, & Leeser, Miriam. "Area and Performance Tradeoffs in Floating-Point Divide and Square-Root Implementations." ACM Computing Surveys 28:3, September 1996, pp. 518–564.
The Unicode Consortium. The Unicode Standard, Version 3.0. Reading, MA: Addison-Wesley, 2000.
Warner, Seth. Modern Algebra. New York: Dover Publications, 1990.

REVIEW OF ESSENTIAL TERMS AND CONCEPTS

1. The word bit is a contraction of which two words?
2. Explain how the terms bit, byte, nibble, and word are related.
3. Why are binary and decimal called positional numbering systems?
4. What is a radix?
5. How many of the "numbers to remember" (in all bases) from Figure 2.1 can you remember?
6. What does overflow mean in the context of unsigned numbers?
7. Name the three ways in which signed integers can be represented in digital computers, and explain the differences.
8. Which of the three integer representations is used most often in digital computer systems?


Page 117:
9. How are complement systems like the odometer on a bicycle?
10. Do you think that double-dabble is an easier method than the other binary-to-decimal conversion methods explained in this chapter? Why?
11. With reference to the previous question, what are the drawbacks of the other two conversion methods?
12. What is overflow, and how can it be detected? How does overflow in unsigned numbers differ from overflow in signed numbers?
13. If a computer is capable only of manipulating and storing integers, what difficulties present themselves? How are these difficulties overcome?
14. What are the three component parts of a floating-point number?
15. What is a biased exponent, and what efficiencies can it provide?
16. What is normalization, and why is it necessary?
17. Why is there always some degree of error in floating-point arithmetic when it is performed by a binary digital computer?
18. How many bits long is a double-precision number under the IEEE-754 floating-point standard?
19. What is EBCDIC, and how is it related to BCD?
20. What is ASCII, and how did it originate?
21. How many bits does a Unicode character require?
22. Why was Unicode created?
23. Why is non-return-to-zero coding avoided as a method for writing data to a magnetic disk?
26. How do cyclic redundancy checks work?
27. What is systematic error detection?
28. What is a Hamming code?
29. What is a Hamming distance, and why is it important? What do we mean by minimum Hamming distance?
30. How is the number of redundant bits necessary for a code related to the number of data bits?
31. What is a burst error?
32. Name an error-detection method that can compensate for burst errors.

EXERCISES

1. Perform the following base conversions using subtraction or division-remainder:
   a) 458₁₀ = ______₃
   b) 677₁₀ = ______₅


Page 118:
   c) 1518₁₀ = ______₇
   d) 4401₁₀ = ______₉
2. Perform the following base conversions using subtraction or division-remainder:
   a) 588₁₀ = ______₃
   b) 2254₁₀ = ______₅
   c) 652₁₀ = ______₇
   d) 3104₁₀ = ______₉
3. Convert the following decimal fractions to binary with a maximum of six places to the right of the binary point:
   a) 26.78125
   b) 194.03125
   c) 298.796875
   d) 16.1240234375
4. Convert the following decimal fractions to binary with a maximum of six places to the right of the binary point:
   a) 25.84375
   b) 57.55
   c) 80.90625
   d) 84.874023
5. Represent the following decimal numbers in binary using 8-bit signed magnitude, one's complement, and two's complement:
   a) 77
   b) −42
   c) 119
   d) −107
6. Using a 3-bit "word," list all of the possible signed binary numbers and their decimal equivalents that are representable in:
   a) Signed magnitude
   b) One's complement
   c) Two's complement
7. Using a 4-bit "word," list all of the possible signed binary numbers and their decimal equivalents that are representable in:
   a) Signed magnitude
   b) One's complement
   c) Two's complement
8. From the results of the previous two questions, generalize the range of values (in decimal) that can be represented in any given number of bits x using:
   a) Signed magnitude
   b) One's complement
   c) Two's complement
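The division-remainder conversions in exercises 1 and 2 are easy to mechanize. The sketch below (our illustration, not part of the text) converts a decimal integer to another base by repeatedly dividing and collecting remainders:

    def to_base(n, base):
        """Convert a non-negative decimal integer to the given base via division-remainder."""
        if n == 0:
            return "0"
        digits = []
        while n > 0:
            digits.append(str(n % base))   # the remainder is the next digit, low order first
            n //= base
        return "".join(reversed(digits))

    print(to_base(458, 3))    # 121222
    print(to_base(677, 5))    # 10202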


Page 119:
9. Given a (very) small computer that has a word size of 6 bits, what are the smallest negative numbers and the largest positive numbers that this computer can represent in each of the following representations?
   a) One's complement
   b) Two's complement
10. You have stumbled on an unknown civilization while sailing around the world. The people, who call themselves Zebronians, do math using 40 separate characters (probably because there are 40 stripes on a zebra). They would very much like to use computers, but they would need a computer that could represent all 40 characters. You are a computer designer and you decide to help them. You decide the best thing to do is use BCZ, Binary-Coded Zebronian (which is just like BCD, except that it encodes Zebronian rather than decimal). How many bits will you need to represent each character if you want to use the minimum number of bits?
11. Perform the following binary multiplications:
    a) …
    b) …
    c) 11010 × 1100
12. Perform the following binary multiplications:
    a) 1011 × 101
    b) 10011 × 1011
    c) 11010 × 1011
13. Perform the following binary divisions:
    a) 101101 ÷ 101
    b) 10000001 ÷ 101
    c) 1001010010 ÷ 1011
14. Perform the following binary divisions:
    a) 11111101 ÷ 1011
    b) 1010101 ÷ 1001
    c) 1001111100 ÷ 1100
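For exercises like 6 through 9, it can help to enumerate the representations exhaustively. The following sketch (ours, not the book's) lists every 3-bit pattern with its value under signed magnitude, one's complement, and two's complement:

    def interpretations(bits):
        """Return (signed magnitude, one's complement, two's complement) values of a bit string."""
        n = len(bits)
        unsigned = int(bits, 2)
        if bits[0] == '0':                       # a leading 0 means the same value in all three systems
            return unsigned, unsigned, unsigned
        magnitude = int(bits[1:], 2)
        sign_mag = -magnitude
        ones_comp = -((2 ** n - 1) - unsigned)   # one's complement "negative zero" prints as 0
        twos_comp = unsigned - 2 ** n
        return sign_mag, ones_comp, twos_comp

    for i in range(8):
        b = format(i, '03b')
        print(b, interpretations(b))
    # e.g. 111 -> (-3, 0, -1)   and   100 -> (0, -3, -4)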


Page 120:
15. Use the double-dabble method to convert 10212₃ directly to decimal. (Hint: you have to change the multiplier.)
16. Using signed-magnitude representation, complete the following operations:
    0 + (−0) =
    (−0) + 0 =
    0 + 0 =
    (−0) + (−0) =
17. Suppose a computer uses 4-bit one's complement numbers. Ignoring overflow, what value will be stored in the variable j after the following pseudocode routine terminates?

    0 → j    // Store 0 in j.
    -3 → k   // Store -3 in k.
    while k ≠ 0
        j = j + 1
        k = k - 1
    end while

18. If the floating-point number storage on a certain system has a sign bit, a 3-bit exponent, and a 4-bit significand:
    a) What is the largest positive number and the smallest negative number that can be stored on this system if the storage is normalized? (Assume there are no implied bits, no bias, exponents use two's complement notation, and exponents of all 0s and all 1s are allowed.)
    b) What bias should be used in the exponent if we prefer all exponents to be non-negative? Why would you choose this bias?
19. Using the model from the previous question, including your chosen bias, add the following floating-point numbers and express your answer using the same notation as the addend and augend:

    0 1 1 1 1 0 0 0
    0 1 0 1 1 0 0 1

    Calculate the relative error, if any, in your answer to the previous question.
20. Assume we are using the simple model for floating-point representation as given in this book (the representation uses a 14-bit format: 5 bits for the exponent with a bias of 16, a normalized 8-bit mantissa, and a single sign bit for the number):
    a) Show how the computer would represent the numbers 100.0 and 0.25 using this floating-point format.
    b) Show how the computer would add the two floating-point numbers in part a by changing one of the numbers so they are both expressed using the same power of 2.
    c) Show how the computer would represent the sum in part b using the given floating-point representation. What decimal value for the sum is the computer actually storing? Explain.


Page 121:
21. What causes divide underflow, and what can be done about it?
22. Why do we usually store floating-point numbers in normalized form? What is the advantage of using a bias as opposed to adding a sign bit to the exponent?
23. Using the given values of a, b, and c, and the simple model for floating-point representation described in the text (the representation uses a 14-bit format: 5 bits for the exponent with a bias of 16, a normalized 8-bit mantissa, and a single sign bit for the number), perform the following calculations, paying close attention to the order of operations. What can you say about the algebraic properties of floating-point arithmetic in our finite model? Do you think this algebraic anomaly holds under multiplication as well as addition?
    b + (a + c) =
    (b + a) + c =
24. a) Given that the ASCII code for A is 1000001, what is the ASCII code for J?
    b) Given that the EBCDIC code for A is 1100 0001, what is the EBCDIC code for J?
25. Assume a 24-bit word on a computer. In these 24 bits, we wish to represent the value 295.
    a) If our computer uses even parity, how would the computer represent the decimal value 295?
    b) If our computer uses 8-bit ASCII and even parity, how would the computer represent the string 295?
    c) If our computer uses packed BCD, how would the computer represent the number +295?
26. Decode the following ASCII message, assuming 7-bit ASCII characters and no parity:
    1001010 1001111 1001000 1001110 0100000 1000100 1000101
27. Why would a system designer wish to make Unicode the default character set for a new system? What reason(s) could you give for not using Unicode as a default?
28. Write the 7-bit ASCII code for the character 4 using the following encodings:
    a) Non-return-to-zero
    b) Non-return-to-zero-invert
    c) Manchester code
    d) Frequency modulation
    e) Modified frequency modulation
    f) Run-length-limited
    (Assume 1 is "high" and 0 is "low.")
29. Why is NRZ coding seldom used for writing data to magnetic media?
30. … List all of the legal codewords in this code. What is the Hamming distance of your code?
31. Are the error-correcting Hamming codes systematic? Explain.


Page 122:
32. Compute the Hamming distance of the following code:
    0011010010111100
    …
    0010010110101101
    …
34. Suppose we want an error-correcting code that will allow all single-bit errors to be corrected for 10-bit memory words.
    a) How many parity bits are necessary?
    b) Assuming we are using the Hamming algorithm presented in this chapter to design our error-correcting code, find the codeword to represent the 10-bit information word 1001100110.
35. Suppose we are working with an error-correcting code that will allow all single-bit errors to be corrected for memory words of length 7. We have already calculated that we need 4 check bits, and the length of all codewords will be 11. Codewords are created according to the Hamming algorithm presented in the text. We now receive the following codeword:
    10101011110
    Assuming even parity, is this a legal codeword? If not, according to our error-correcting code, where is the error?
36. Repeat Exercise 35 using the following codeword:
    01111010101
37. Name two ways in which Reed-Solomon coding differs from Hamming coding.
38. When would you choose a CRC code over a Hamming code? A Hamming code over a CRC?
39. Find the quotients and remainders for the following division problems modulo 2:
    a) 1010111₂ ÷ 1101₂
    b) 1011111₂ ÷ 11101₂
    c) 1011001101₂ ÷ 10101₂
    d) 111010111₂ ÷ 10111₂


Page 123:
40. Find the quotients and remainders for the following division problems modulo 2:
    a) 1111010₂ ÷ 1011₂
    b) 1010101₂ ÷ 1100₂
    c) 1101101011₂ ÷ 10101₂
    d) 1111101011₂ ÷ 101101₂
41. Using the CRC polynomial 1011, compute the CRC code word for the information word 1011001. Check the division performed at the receiver.
42. Using the CRC polynomial 1101, compute the CRC code word for the information word 01001101. Check the division performed at the receiver.
*43. Pick an architecture (such as 80486, Pentium, Pentium IV, SPARC, Alpha, or MIPS). Do research to find out how your architecture approaches the concepts introduced in this chapter. For example, what representation does it use for negative values? What character codes does it support?
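Modulo 2 division, as used in exercises 39 through 42, can be carried out with exclusive-OR subtraction. Here is a brief sketch of the idea in Python (our illustration, not the book's); it returns the quotient and remainder of a modulo 2 division on bit strings:

    def mod2_divide(dividend, divisor):
        """Modulo 2 (XOR-based) long division on bit strings; returns (quotient, remainder)."""
        rem = 0
        quotient = []
        dlen = len(divisor)
        dnum = int(divisor, 2)
        for bit in dividend:
            rem = (rem << 1) | int(bit)
            if rem >> (dlen - 1):          # leading bit set: the divisor "goes into" the remainder
                rem ^= dnum                # subtract (XOR) the divisor
                quotient.append('1')
            else:
                quotient.append('0')
        return ''.join(quotient).lstrip('0') or '0', format(rem, 'b').zfill(dlen - 1)

    print(mod2_divide('1010111', '1101'))   # exercise 39(a): quotient 1101, remainder 110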


Page 124:
"I always liked that word, Boolean."
—Claude Shannon

CHAPTER 3
Boolean Algebra and Digital Logic

3.1 INTRODUCTION

George Boole lived in England during the time Abraham Lincoln was getting involved in politics in the United States. Boole was a mathematician and logician who developed ways of expressing logical processes using algebraic symbols, thus creating a branch of mathematics known as symbolic logic, or Boolean algebra. It wasn't until years later that Boolean algebra was applied to computing by John Vincent Atanasoff. He was attempting to build a machine based on the same technology used by Pascal and Babbage, and he wanted to use this machine to solve linear algebraic equations. After struggling with repeated failures, Atanasoff was so frustrated that he decided to take a drive. He was living in Ames, Iowa, at the time, but found himself 200 miles away in Illinois before he suddenly realized how far he had driven.

Atanasoff had not intended to drive that far, but since he was in Illinois, where he could legally buy a drink in a tavern, he sat down, ordered a scotch, and realized he had driven quite a distance to get a drink. (Atanasoff reassured the author that it was not the drink that led to the following revelations; in fact, he left the drink untouched on the table.) Exercising his physics and mathematics backgrounds and focusing on the failures of his previous computing machine, he made four critical breakthroughs necessary in the machine's new design:

• He would use electricity instead of mechanical movements (vacuum tubes would allow him to do this).
• Because he was using electricity, he would use base 2 numbers instead of base 10 (this correlated directly with switches that were either "on" or "off"), resulting in a digital, rather than an analog, machine.


Page 125:
• He would use capacitors (condensers) for memory because they store electrical charges, with a regenerative process to avoid power leakage.
• Computations would be done by what Atanasoff termed "direct logical action" (which is essentially equivalent to Boolean algebra) and not by enumeration, as all previous computing machines had done.

It should be noted that, at the time, Atanasoff did not recognize the application of Boolean algebra to his problem and that he developed his own direct logical action by trial and error. He was unaware that in 1938, Claude Shannon had proved that two-valued Boolean algebra could describe the operation of two-valued electrical switching circuits. Today, we see the importance of Boolean algebra's application in the design of modern computing systems. It is for this reason that we include a chapter on Boolean logic and its relationship to digital computers.

This chapter contains a brief introduction to the basics of logic design. It provides minimal coverage of Boolean algebra and this algebra's relationship to logic gates and basic digital circuits. You may already be familiar with the basic Boolean operators from a previous programming class. It is a fair question, then, to ask why you must study this material in more detail. The relationship between Boolean logic and the actual physical components of any computer system is very strong, as you will see in this chapter. As a computer scientist, you may never have to design digital circuits or other physical components; in fact, this chapter will not prepare you to design such items. Rather, it provides sufficient background for you to understand the basic motivation underlying computer design and implementation. Understanding how Boolean logic affects the design of various computer system components will allow you, from a programming perspective, to use any computer system more effectively. For the interested reader, there are many resources listed at the end of the chapter to allow further investigation into these topics.

3.2 BOOLEAN ALGEBRA

Boolean algebra is an algebra for the manipulation of objects that can take on only two values, typically true and false, although it can be any pair of values. Because computers are built as collections of switches that are either "on" or "off," Boolean algebra is a very natural way to represent digital information. In reality, digital circuits use high and low voltages, but for our level of understanding, 0 and 1 will suffice. It is common to interpret the digital value 0 as false and the digital value 1 as true.

3.2.1 Boolean Expressions

In addition to binary objects, Boolean algebra also has operations that can be performed on these objects, or variables. Combining the variables and operators yields Boolean expressions. A Boolean function typically has one or more input values and yields a result, based on these input values, in the range {0,1}. Three common Boolean operators are AND, OR, and NOT. To better understand these operators, we need a mechanism that allows us to examine their behaviors.


Page 126:
TABLE 3.1  The truth table for AND        TABLE 3.2  The truth table for OR

x  y  |  xy                               x  y  |  x + y
0  0  |  0                                0  0  |  0
0  1  |  0                                0  1  |  1
1  0  |  0                                1  0  |  1
1  1  |  1                                1  1  |  1

A Boolean operator can be completely described using a table that lists the inputs, all possible values for these inputs, and the resulting values of the operation for all possible combinations of these inputs. This table is called a truth table. A truth table shows the relationship, in tabular form, between the input values and the result of a specific Boolean operator or function on the input variables. Let's look at the Boolean operators AND, OR, and NOT to see how each is represented, using both Boolean algebra and truth tables.

The logical operator AND is typically represented by either a dot or no symbol at all. For example, the Boolean expression xy is equivalent to the expression x · y and is read "x and y." The expression xy is often referred to as a Boolean product. The behavior of this operator is characterized by the truth table shown in Table 3.1. The result of the expression xy is 1 only when both inputs are 1, and 0 otherwise. Each row in the table represents a different Boolean expression, and all possible combinations of values for x and y are represented by the rows in the table.

The Boolean operator OR is typically represented by a plus sign. Therefore, the expression x + y is read "x or y." The result of x + y is 0 only when both of its input values are 0. The expression x + y is often referred to as a Boolean sum. The truth table for OR is shown in Table 3.2.

The remaining logical operator, NOT, is typically represented by either an overscore or a prime. Therefore, both x̄ and x' are read as "NOT x." The truth table for NOT is shown in Table 3.3.

TABLE 3.3  The truth table for NOT

x  |  x'
0  |  1
1  |  0

We now understand that Boolean algebra deals with binary variables and logical operations on those variables. Combining these two concepts, we can examine Boolean expressions composed of Boolean variables and multiple logic operators. For example, the Boolean function:

F(x, y, z) = x + y'z

is represented by a Boolean expression involving the three Boolean variables x, y, and z and the logical operators OR, NOT, and AND.
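Truth tables like these are easy to generate programmatically. The following short sketch (our own illustration in Python, not part of the text) prints the truth tables for AND, OR, and NOT:

    from itertools import product

    print("x y | x AND y  x OR y")
    for x, y in product((0, 1), repeat=2):
        print(x, y, "|", x & y, "       ", x | y)

    print("x | NOT x")
    for x in (0, 1):
        print(x, "|", 1 - x)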


Page 127:
How do we know which operator to apply first? The rules of precedence for Boolean operators give NOT top priority, followed by AND, and then OR. For our function F above, we would negate y first, then perform the AND of y' and z, and finally OR this result with x.

We can also use a truth table to represent this expression. It is often helpful, when creating a truth table for a more complex function such as this, to build the table representing different pieces of the function, one column at a time, until the final function can be evaluated. The truth table for our function F is shown in Table 3.4.

TABLE 3.4  The truth table for F(x, y, z) = x + y'z

x  y  z  |  y'  y'z  |  x + y'z = F
0  0  0  |  1   0    |  0
0  0  1  |  1   1    |  1
0  1  0  |  0   0    |  0
0  1  1  |  0   0    |  0
1  0  0  |  1   0    |  1
1  0  1  |  1   1    |  1
1  1  0  |  0   0    |  1
1  1  1  |  0   0    |  1

The last column in the truth table indicates the values of the function for all possible combinations of x, y, and z. We note that the actual truth table for our function F consists of only the first three columns and the last column. The middle columns (shaded in the original) show the intermediate steps necessary to arrive at our final answer. Creating truth tables in this manner makes it easier to evaluate the function for all possible combinations of the input values.

3.2.2 Boolean Identities

Recall from algebra that an expression such as 2x + 6x is not in its simplest form; it can be reduced (represented by fewer or simpler terms) to 8x. Boolean expressions can also be simplified, but we need new identities, or laws, that apply to Boolean algebra instead of regular algebra. These identities, which apply to single Boolean variables as well as Boolean expressions, are listed in Table 3.5. Note that each relationship (with the exception of the last one) has both an AND (or product) form and an OR (or sum) form. This is known as the duality principle.

The Identity Law states that any Boolean variable ANDed with 1 or ORed with 0 simply results in the original variable. (1 is the identity element for AND; 0 is the identity element for OR.) The Null Law states that any Boolean variable ANDed with 0 is 0, and a variable ORed with 1 is always 1. The Idempotent Law states that ANDing or ORing a variable with itself produces the original variable. The Inverse Law states that ANDing or ORing a variable with its complement produces the identity for that given operation.
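As a quick check of Table 3.4, here is a small Python sketch (ours) that evaluates F(x, y, z) = x + y'z for every input combination:

    from itertools import product

    def F(x, y, z):
        # NOT has highest precedence, then AND, then OR: x + (y'z)
        return x | ((1 - y) & z)

    for x, y, z in product((0, 1), repeat=3):
        print(x, y, z, "->", F(x, y, z))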


Page 128:
TABLE 3.5  Basic identities of Boolean algebra

Identity Name               AND Form                    OR Form
Identity Law                1x = x                      0 + x = x
Null (or Dominance) Law     0x = 0                      1 + x = 1
Idempotent Law              xx = x                      x + x = x
Inverse Law                 xx' = 0                     x + x' = 1
Commutative Law             xy = yx                     x + y = y + x
Associative Law             (xy)z = x(yz)               (x + y) + z = x + (y + z)
Distributive Law            x(y + z) = xy + xz          x + yz = (x + y)(x + z)
Absorption Law              x(x + y) = x                x + xy = x
DeMorgan's Law              (xy)' = x' + y'             (x + y)' = x'y'
Double Complement Law       (x')' = x

You should recognize the Commutative Law and the Associative Law from algebra. Boolean variables can be reordered (commuted) and regrouped (associated) without affecting the final result. The Distributive Law shows how OR distributes over AND and vice versa.

The Absorption Law and DeMorgan's Law are not so obvious, but we can prove these identities by creating a truth table for the various expressions: if the right-hand side is equal to the left-hand side, the expressions represent the same function and result in identical truth tables. Table 3.6 depicts the truth table for both the left-hand side and the right-hand side of DeMorgan's Law for AND. It is left as an exercise to prove the validity of the remaining laws, in particular the OR form of DeMorgan's Law and both forms of the Absorption Law.

The Double Complement Law formalizes the idea of the double negative, which evokes rebuke from high school teachers. The Double Complement Law can be useful in digital circuits as well as in your life. For example, let x be the amount of money you have (assume a positive amount). If you have no money, you have x'. When an untrustworthy acquaintance asks to borrow some money, you can truthfully say that you don't have no money. That is, x = (x')' even if you just got paid.

One of the most common errors that beginners make when working with Boolean logic is to assume the following:

(xy)' = x'y'

Please note that this is not a valid equality! DeMorgan's Law clearly indicates that the above statement is incorrect; however, it is a very easy mistake to make, and one that should be avoided.

TABLE 3.6  Truth tables for the AND form of DeMorgan's Law

x  y  |  xy  (xy)'  |  x'  y'  x' + y'
0  0  |  0   1      |  1   1   1
0  1  |  0   1      |  1   0   1
1  0  |  0   1      |  0   1   1
1  1  |  1   0      |  0   0   0
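Identities such as DeMorgan's Law can be verified by exhaustively comparing both sides for every input combination, exactly as Table 3.6 does. A small Python sketch (ours):

    from itertools import product

    def NOT(a): return 1 - a

    demorgan_and = all(NOT(x & y) == (NOT(x) | NOT(y)) for x, y in product((0, 1), repeat=2))
    demorgan_or  = all(NOT(x | y) == (NOT(x) & NOT(y)) for x, y in product((0, 1), repeat=2))
    absorption   = all((x & (x | y)) == x and (x | (x & y)) == x for x, y in product((0, 1), repeat=2))
    print(demorgan_and, demorgan_or, absorption)   # True True True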


Page 129:
3.2.3 Simplifying Boolean Expressions

The algebraic identities we studied in algebra class allow us to reduce algebraic expressions (such as 10x + 2y − x + 3y) to their simplest forms (9x + 5y). The Boolean identities can be used to simplify Boolean expressions in a similar fashion. We apply these identities in the following examples.

EXAMPLE 3.1  Suppose we have the function F(x, y) = xy + xy. Using the OR form of the Idempotent Law and treating the expression xy as a Boolean variable, we simplify the original expression to xy. Therefore, F(x, y) = xy + xy = xy.

EXAMPLE 3.2  Given the function F(x, y, z) = x'yz + x'yz' + xz, we simplify as follows:

F(x, y, z) = x'yz + x'yz' + xz
           = x'y(z + z') + xz        (Distributive)
           = x'y(1) + xz             (Inverse)
           = x'y + xz                (Identity)

Sometimes the simplification is reasonably straightforward, as in the preceding examples. However, using the identities can be tricky, as we see in this next example.

EXAMPLE 3.3  Given the function F(x, y, z) = xy + x'z + yz, we simplify as follows:

  xy + x'z + yz
= xy + x'z + yz(1)                   (Identity)
= xy + x'z + yz(x + x')              (Inverse)
= xy + x'z + (yz)x + (yz)x'          (Distributive)
= xy + x'z + x(yz) + x'(zy)          (Commutative)
= xy + x'z + (xy)z + (x'z)y          (Associative)
= xy + (xy)z + x'z + (x'z)y          (Commutative)
= xy(1 + z) + x'z(1 + y)             (Distributive)
= xy(1) + x'z(1)                     (Null)
= xy + x'z                           (Identity)

Example 3.3 illustrates what is commonly known as the Consensus Theorem. How did we know to insert additional terms to simplify the function? Unfortunately, there is no defined set of rules for using these identities to minimize a Boolean expression; it is simply something that comes with experience. There are other methods that can be used to simplify Boolean expressions; we mention these later in this section.
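Brute-force checking is a handy safety net for hand simplifications like Example 3.3. This Python sketch (ours) confirms that xy + x'z + yz and xy + x'z agree on all eight input combinations:

    from itertools import product

    lhs = lambda x, y, z: (x & y) | ((1 - x) & z) | (y & z)
    rhs = lambda x, y, z: (x & y) | ((1 - x) & z)

    print(all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=3)))   # True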


Page 130:
We can also use these identities to prove Boolean equalities. Suppose we want to prove that (x + y)(x' + y) = y. The proof is given in Table 3.7.

TABLE 3.7  Example using identities

Proof                                     Identity Name
(x + y)(x' + y) = xx' + xy + yx' + yy     Distributive Law
                = 0 + xy + yx' + yy       Inverse Law
                = 0 + xy + yx' + y        Idempotent Law
                = xy + yx' + y            Identity Law
                = y(x + x') + y           Distributive Law (and Commutative Law)
                = y(1) + y                Inverse Law
                = y + y                   Identity Law
                = y                       Idempotent Law

To prove the equality of two Boolean expressions, you can also create the truth tables for each and compare them. If the truth tables are identical, the expressions are equal. We leave it as an exercise to find the truth tables for the equality proven in Table 3.7.

3.2.4 Complements

As you saw in Example 3.1, the Boolean identities can be applied to Boolean expressions, not simply Boolean variables (we treated xy as a Boolean variable and then applied the Idempotent Law). The same is true for the Boolean operators. The most common Boolean operator applied to more complex Boolean expressions is the NOT operator, resulting in the complement of the expression. Later we will see that there is a one-to-one correspondence between a Boolean function and its physical implementation using electronic circuits. Quite often, it is cheaper and less complicated to implement the complement of a function rather than the function itself. If we implement the complement, we must invert the final output to yield the original function; this is accomplished with one simple NOT operation. Therefore, complements are quite useful.

To find the complement of a Boolean function, we use DeMorgan's Law. The OR form of this law states that (x + y)' = x'y'. We can easily extend this to three or more variables as follows. Given the function

F(x, y, z) = x + y + z

its complement is F'(x, y, z) = (x + y + z)'. Let w = (x + y). Then

F'(x, y, z) = (w + z)' = w'z'

Now, applying DeMorgan's Law again, we get:

w'z' = (x + y)'z' = x'y'z' = F'(x, y, z)
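The DeMorgan-based complement can again be checked exhaustively. In this short Python sketch (ours), the complement of F(x, y, z) = x + y + z is compared with x'y'z' over all inputs:

    from itertools import product

    F          = lambda x, y, z: x | y | z
    complement = lambda x, y, z: (1 - x) & (1 - y) & (1 - z)

    print(all((1 - F(*v)) == complement(*v) for v in product((0, 1), repeat=3)))   # True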


Page 131:
Therefore, if F(x, y, z) = x + y + z, then F'(x, y, z) = x'y'z'. Applying the principle of duality, we see that (xyz)' = x' + y' + z'.

It appears that to find the complement of a Boolean expression, we simply replace each variable by its complement (x is replaced by x') and interchange ANDs and ORs. In fact, this is exactly what DeMorgan's Law tells us to do. For example, the complement of x' + yz' is x(y' + z). We have to add the parentheses to ensure the correct precedence. You can verify that this simple rule for finding the complement of a Boolean expression is correct by examining the truth tables for both the expression and its complement. The complement of any expression, when represented as a truth table, should have 0s for the output everywhere the original function has 1s, and 1s where the original function has 0s. Table 3.8 depicts the truth tables for F(x, y, z) = x' + yz' and its complement, F'(x, y, z) = x(y' + z). The shaded columns in the original indicate the final results for F and F'.

TABLE 3.8  Truth table representations for a function and its complement

x  y  z  |  yz'  x' + yz' = F  |  y' + z  x(y' + z) = F'
0  0  0  |  0    1             |  1       0
0  0  1  |  0    1             |  1       0
0  1  0  |  1    1             |  0       0
0  1  1  |  0    1             |  1       0
1  0  0  |  0    0             |  1       1
1  0  1  |  0    0             |  1       1
1  1  0  |  1    1             |  0       0
1  1  1  |  0    0             |  1       1

3.2.5 Representing Boolean Functions

We have seen that there are many different ways to represent a given Boolean function. For example, we can use a truth table, or we can use one of many different Boolean expressions. In fact, there are an infinite number of Boolean expressions that are logically equivalent to one another. Two expressions that can be represented by the same truth table are considered logically equivalent. See Example 3.4.

EXAMPLE 3.4  Suppose F(x, y, z) = x + xy'. We can also express F as F(x, y, z) = x + x + xy' because the Idempotent Law tells us these two expressions are the same. We can also express F as F(x, y, z) = x(1 + y') using the Distributive Law.

To help eliminate potential confusion, logic designers specify a Boolean function using a canonical, or standardized, form. For any given Boolean function, there exists a unique standardized form. However, there are different "standards" that designers use. The two most common are the sum-of-products form and the product-of-sums form.


Page 132:
The sum-of-products form requires that the expression be a collection of ANDed variables (or product terms) that are ORed together. The function F1(x, y, z) = xy + yz' + xyz is in sum-of-products form. The function F2(x, y, z) = xy' + x(y + z') is not in sum-of-products form. We apply the Distributive Law to distribute the x variable in F2, resulting in the expression xy' + xy + xz', which is now in sum-of-products form.

Boolean expressions stated in product-of-sums form consist of ORed variables (sum terms) that are ANDed together. The function F1(x, y, z) = (x + y)(x + z')(y + z')(y + z) is in product-of-sums form. The product-of-sums form is often preferred when the Boolean expression evaluates true in more cases than it evaluates false. This is not the case with the function F1, so the sum-of-products form is appropriate. Also, the sum-of-products form is usually easier to work with and to simplify, so we use this form exclusively in the sections that follow.

Any Boolean expression can be represented in sum-of-products form. Because any Boolean expression can also be represented as a truth table, we conclude that any truth table can also be represented in sum-of-products form. It is a simple matter to convert a truth table into sum-of-products form, as indicated in the following example.

EXAMPLE 3.5  Consider a simple majority function. This is a function that, when given three inputs, outputs a 0 if less than half of its inputs are 1, and a 1 if at least half of its inputs are 1. Table 3.9 depicts the truth table for this majority function on three variables.

TABLE 3.9  The truth table for the majority function

x  y  z  |  F
0  0  0  |  0
0  0  1  |  0
0  1  0  |  0
0  1  1  |  1
1  0  0  |  0
1  0  1  |  1
1  1  0  |  1
1  1  1  |  1

To convert the truth table to sum-of-products form, we start by looking at the problem in reverse. If we want the expression x + y to equal 1, then either x or y (or both) must be equal to 1. If xy + yz = 1, then either xy = 1 or yz = 1 (or both). Using this logic in reverse and applying it to Example 3.5, we see that the function must output a 1 when x = 0, y = 1, and z = 1. The product term that satisfies this is x'yz (clearly this is equal to 1 when x = 0, y = 1, and z = 1). The second occurrence of an output value of 1 is when x = 1, y = 0, and z = 1. The product term to guarantee an output of 1 is xy'z. The third product term we need is xyz', and the last is xyz. In summary, to generate a sum-of-products expression from the truth table for any Boolean expression, you must generate a product term of the input variables corresponding to each row where the value of the output variable in that row is 1. For each product term, you must complement any variables that are 0 for that row.
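The recipe just stated — one product term per row whose output is 1, complementing any variable that is 0 in that row — is mechanical enough to code directly. A small Python sketch (ours) that derives the sum-of-products form of the majority function from its truth table:

    from itertools import product

    def majority(x, y, z):
        return 1 if x + y + z >= 2 else 0

    terms = []
    for x, y, z in product((0, 1), repeat=3):
        if majority(x, y, z) == 1:
            term = ''.join(name if bit else name + "'"   # complement variables that are 0
                           for name, bit in zip("xyz", (x, y, z)))
            terms.append(term)

    print(" + ".join(terms))   # x'yz + xy'z + xyz' + xyz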


Page 133:
Our majority function can be expressed in sum-of-products form as F(x, y, z) = x'yz + xy'z + xyz' + xyz. Please note that this expression may not be in simplest form; we are only guaranteeing a standard form. The sum-of-products and product-of-sums standard forms are equivalent ways of expressing a Boolean function. One form can be converted to the other through an application of Boolean identities. Whether using sum-of-products or product-of-sums, the expression must eventually be converted to its simplest form, which means reducing the expression to the minimum number of terms. Why must the expressions be simplified? A one-to-one correspondence exists between a Boolean expression and its implementation using electrical circuits, as we shall see in the next section. Unnecessary product terms in the expression lead to unnecessary components in the physical circuit, which in turn yield a suboptimal circuit.

3.3 LOGIC GATES

The logical operators AND, OR, and NOT that we have discussed have so far been represented in an abstract sense using truth tables and Boolean expressions. The actual physical components, or digital circuits, such as those that perform arithmetic operations or make choices in a computer, are constructed from a number of primitive elements called gates. Gates implement each of the basic logic functions we have discussed. These gates are the basic building blocks for digital design. Formally, a gate is a small, electronic device that computes various functions of two-valued signals. More simply stated, a gate implements a simple Boolean function. To physically implement each gate requires from one to six or more transistors (described in Chapter 1), depending on the technology being used. To summarize, the basic physical component of a computer is the transistor; the basic logic element is the gate.

3.3.1 Symbols for Logic Gates

We initially examine the three simplest gates. These correspond to the logical operators AND, OR, and NOT. We have discussed the functional behavior of each of these Boolean operators. Figure 3.1 depicts the graphical representation of the gate that corresponds to each operator.

FIGURE 3.1  The three basic gates: the AND gate (output xy), the OR gate (output x + y), and the NOT gate (output x')


Page 134:
Note the circle at the output of the NOT gate. Typically, this circle represents the complement operation.

Another common gate is the exclusive-OR (XOR) gate, represented by the Boolean expression x ⊕ y. XOR is false if both of the input values are equal and true otherwise. Figure 3.2 illustrates the truth table for XOR as well as the logic diagram that specifies its behavior.

FIGURE 3.2  a) The truth table for XOR; b) the logic symbol for XOR

x  y  |  x XOR y
0  0  |  0
0  1  |  1
1  0  |  1
1  1  |  0

3.3.2 Universal Gates

Two other common gates are NAND and NOR, which produce complementary output to AND and OR, respectively. Each gate has two different logic symbols that can be used for gate representation. (It is left as an exercise to prove that the symbols are logically equivalent. Hint: use DeMorgan's Law.) Figures 3.3 and 3.4 depict the logic diagrams for NAND and NOR along with the truth tables to explain the functional behavior of each gate.

FIGURE 3.3  The truth table and logic symbols for NAND: x NAND y = (xy)' = x' + y'

x  y  |  x NAND y
0  0  |  1
0  1  |  1
1  0  |  1
1  1  |  0

FIGURE 3.4  The truth table and logic symbols for NOR: x NOR y = (x + y)' = x'y'

x  y  |  x NOR y
0  0  |  1
0  1  |  0
1  0  |  0
1  1  |  0
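These gates map directly onto tiny functions. A Python sketch (ours) of XOR, NAND, and NOR built from the three basic operators, which reproduces the truth tables above:

    from itertools import product

    def NOT(a):      return 1 - a
    def AND(a, b):   return a & b
    def OR(a, b):    return a | b
    def XOR(a, b):   return OR(AND(a, NOT(b)), AND(NOT(a), b))
    def NAND(a, b):  return NOT(AND(a, b))
    def NOR(a, b):   return NOT(OR(a, b))

    for a, b in product((0, 1), repeat=2):
        print(a, b, XOR(a, b), NAND(a, b), NOR(a, b))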


Page 135:
FIGURE 3.5  Three circuits constructed using only NAND gates: an AND gate (((xy)')' = xy), an OR gate ((x'y')' = x + y), and a NOT gate ((xx)' = x')

The NAND gate is commonly referred to as a universal gate, because any electronic circuit can be constructed using only NAND gates. To prove this, Figure 3.5 depicts an AND gate, an OR gate, and a NOT gate using only NAND gates. Why not simply use the AND, OR, and NOT gates we already know exist? There are two reasons for investigating the use of NAND gates only to build any given circuit. First, NAND gates are cheaper to build than the other gates. Second, complex integrated circuits (which are discussed in the following sections) are often much easier to build using the same building block (i.e., several NAND gates) rather than a collection of the basic building blocks (i.e., a combination of AND, OR, and NOT gates).

Please note that the duality principle applies to universality as well. One can build any circuit using only NOR gates. NAND and NOR gates are related in much the same way as the sum-of-products form and the product-of-sums form presented earlier. One would use NAND to implement an expression in sum-of-products form and NOR to implement an expression in product-of-sums form.

3.3.3 Multiple Input Gates

In our examples thus far, all gates have accepted only two inputs. Gates are not limited to two input values, however. There are many variations in the number and types of inputs and outputs allowed for various gates. For example, we can represent the expression x + y + z using one OR gate with three inputs, as in Figure 3.6. Figure 3.7 represents the expression xy'z.

FIGURE 3.6  A three-input OR gate representing x + y + z
FIGURE 3.7  A three-input AND gate representing xy'z
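Universality is easy to demonstrate in code. This Python sketch (ours) builds NOT, AND, and OR from a single NAND primitive and verifies them exhaustively, mirroring Figure 3.5:

    from itertools import product

    def NAND(a, b):   return 1 - (a & b)

    def NOT_(a):      return NAND(a, a)                     # (xx)' = x'
    def AND_(a, b):   return NAND(NAND(a, b), NAND(a, b))   # ((xy)')' = xy
    def OR_(a, b):    return NAND(NAND(a, a), NAND(b, b))   # (x'y')' = x + y

    print(all(AND_(a, b) == (a & b) and OR_(a, b) == (a | b) and NOT_(a) == 1 - a
              for a, b in product((0, 1), repeat=2)))   # True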


Page 136:
FIGURE 3.8  An AND gate with two inputs and two outputs, Q and its complement Q'

We shall see later in this chapter that it is sometimes useful to depict the output of a gate as Q along with its complement Q', as shown in Figure 3.8. Note that Q always represents the actual output.

3.4 DIGITAL COMPONENTS

Upon opening a computer and looking inside, one would realize that there is a lot to know about all of the digital components that make up the system. Every computer is built using collections of gates that are all connected by way of wires acting as signal pathways. These collections of gates are often quite standard, resulting in a set of building blocks that can be used to build the entire computer system. Perhaps surprisingly, these building blocks are all constructed using the basic AND, OR, and NOT operations. In the next few sections, we discuss digital circuits, their relationship to Boolean algebra, the standard building blocks, and examples of the two different categories, combinational logic and sequential logic, into which these building blocks can be placed.

3.4.1 Digital Circuits and Their Relationship to Boolean Algebra

What is the connection between Boolean functions and digital circuits? We have seen that a simple Boolean operation (such as AND or OR) can be represented by a simple logic gate. More complex Boolean expressions can be represented as combinations of AND, OR, and NOT gates, resulting in a logic diagram that describes the entire expression. This logic diagram represents the physical implementation of the given expression, or the actual digital circuit. Consider the function F(x, y, z) = x + y'z (which we looked at earlier). Figure 3.9 represents a logic diagram that implements this function. We can build logic diagrams (which in turn lead to digital circuits) for any Boolean expression.

FIGURE 3.9  A logic diagram for F(x, y, z) = x + y'z


Page 137:
Boolean algebra allows us to analyze and design digital circuits. Because of the relationship between Boolean algebra and logic diagrams, we simplify our circuit by simplifying our Boolean expression. Digital circuits are implemented with gates, but gates and logic diagrams are not the most convenient forms for representing digital circuits during the design phase. Boolean expressions are much better to use during this phase because they are easier to manipulate and simplify.

The complexity of the expression representing a Boolean function has a direct impact on the complexity of the resulting digital circuit: the more complex the expression, the more complex the resulting circuit. We should point out that we do not typically simplify our circuits using Boolean identities; we have already seen that this can sometimes be quite difficult and time consuming. Instead, designers use a more automated method to do this. This method involves the use of Karnaugh maps (or Kmaps). The interested reader is referred to the focus section following this chapter to learn how Kmaps help to simplify digital circuits.

3.4.2 Integrated Circuits

Collections of gates are used by the actual hardware of a computer to create larger modules, which, in turn, are used to implement various functions. The number of gates required to create these "building blocks" depends on the technology being used. Because circuit technology is beyond the scope of this text, the reader is referred to the reading list at the end of this chapter for more information on circuit technology.

Typically, gates are not sold individually; they are sold in units called integrated circuits (ICs). A chip (a small silicon semiconductor crystal) is a small electronic device consisting of the necessary electronic components (transistors, resistors, and capacitors) to implement various gates. As described in Chapter 1, components are etched directly on the chip, allowing them to be smaller and to require less power for operation than their discrete counterparts. This chip is then mounted in a ceramic or plastic container with external pins. The necessary connections are soldered from the chip to the external pins to form an IC. The first integrated circuits contained very few transistors. As we learned in Chapter 1, the first ICs were called SSI chips and contained up to 100 electronic components per chip. We now have ULSI (ultra large scale integration) with more than 1 million electronic components per chip. Figure 3.10 illustrates a simple SSI IC.

3.5 COMBINATIONAL CIRCUITS

Digital logic chips are combined to give us useful circuits. These logic circuits can be categorized as either combinational logic or sequential logic. This section introduces combinational logic. Sequential logic is covered in Section 3.6.


Page 138:
FIGURE 3.10  A simple SSI integrated circuit: a 14-pin package with pin 14 at +5 volts DC, pin 7 at ground, and a notch marking pin orientation

3.5.1 Basic Concepts

Combinational logic is used to build circuits that contain basic Boolean operators, inputs, and outputs. The key concept in recognizing a combinational circuit is that an output is always based entirely on the given inputs. Thus, the output of a combinational circuit is a function of its inputs, and the output is uniquely determined by the values of the inputs at any given moment. A given combinational circuit may have several outputs. If so, each output represents a different Boolean function.

3.5.2 Examples of Typical Combinational Circuits

Let's begin with a very simple combinational circuit called a half-adder. Consider the problem of adding two binary digits together. There are only three things to remember: 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, and 1 + 1 = 10. We know the behavior this circuit exhibits, and we can formalize this behavior using a truth table. We need to specify two outputs, not just one, because we have a sum and a carry to address. The truth table for a half-adder is shown in Table 3.10.

TABLE 3.10  The truth table for a half-adder

x  y  |  Sum  Carry
0  0  |  0    0
0  1  |  1    0
1  0  |  1    0
1  1  |  0    1

A closer look reveals that Sum is actually an XOR. The Carry output is equivalent to that of an AND gate. We can combine an XOR gate and an AND gate, resulting in the logic diagram for a half-adder shown in Figure 3.11.

FIGURE 3.11  The logic diagram for a half-adder circuit

The half-adder is a very simple circuit and not really very useful because it can add only two bits together. However, we can extend this adder to a circuit that allows the addition of larger binary numbers. Consider how you add base 10 numbers: you add the rightmost column, write down the ones digit, and carry the tens digit. Then you add that carry to the current column, and continue in a similar fashion. We can add binary numbers in the same way. However, we need a circuit that allows for three inputs (x, y, and carry-in) and two outputs (sum and carry-out).
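The Sum = XOR, Carry = AND observation translates directly into code. A minimal Python sketch (ours) of the half-adder:

    def half_adder(x, y):
        """Add two bits: Sum is the XOR of the inputs, Carry is the AND."""
        return x ^ y, x & y          # (sum, carry)

    for x in (0, 1):
        for y in (0, 1):
            print(x, y, half_adder(x, y))   # reproduces Table 3.10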


Page 139:
Figure 3.12 illustrates the truth table and the corresponding logic diagram for a full adder. Note that this full adder is composed of two half-adders and an OR gate.

Given this full adder, you may be wondering how this circuit can add binary numbers, since it is capable of adding only three bits. The answer is: it can't. However, we can build an adder capable of adding two 16-bit words, for example, by replicating the above circuit 16 times, feeding the carry-out of one circuit into the carry-in of the circuit immediately to its left. Figure 3.13 illustrates this idea. This type of circuit is called a ripple-carry adder because of the sequential generation of carries that "ripple" through the adder stages. Note that instead of drawing all the gates that constitute a full adder, we use a black-box approach to depict our adder. A black-box approach allows us to ignore the details of the actual gates. We are concerned only with the inputs and outputs of the circuit. This is typically done with most circuits, including decoders, multiplexers, and adders, as we shall see very soon.

Because this adder is very slow, it is not normally implemented. However, it is easy to understand and should give you some idea of how the addition of larger binary numbers can be achieved. Modifications made to adder designs have resulted in the carry-lookahead adder, the carry-select adder, and the carry-save adder, as well as others. Each attempts to shorten the delay required to add two binary numbers.
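Here is a short Python sketch (ours) of a full adder built from two half-adders plus an OR gate, chained into a ripple-carry adder (shown with 4-bit words here; 16 stages would work the same way):

    def half_adder(x, y):
        return x ^ y, x & y                      # (sum, carry)

    def full_adder(x, y, carry_in):
        s1, c1 = half_adder(x, y)
        s2, c2 = half_adder(s1, carry_in)
        return s2, c1 | c2                       # (sum, carry_out)

    def ripple_carry_add(a_bits, b_bits):
        """Add two equal-length bit lists (least significant bit first)."""
        carry, result = 0, []
        for a, b in zip(a_bits, b_bits):
            s, carry = full_adder(a, b, carry)
            result.append(s)
        return result, carry

    # 0101 + 0011 = 1000 (5 + 3 = 8); bits are listed least significant first.
    print(ripple_carry_add([1, 0, 1, 0], [1, 1, 0, 0]))   # ([0, 0, 0, 1], 0)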


Page 140:
FIGURE 3.12  a) The truth table for a full adder; b) the logic diagram for a full adder

Inputs               Outputs
x  y  Carry In  |  Sum  Carry Out
0  0  0         |  0    0
0  0  1         |  1    0
0  1  0         |  1    0
0  1  1         |  0    1
1  0  0         |  1    0
1  0  1         |  0    1
1  1  0         |  0    1
1  1  1         |  1    1

FIGURE 3.13  The logic diagram for a ripple-carry adder: a chain of full adders (FA), with the carry-out of each stage feeding the carry-in of the next

These newer adders actually achieve speeds 40% to 90% faster than the ripple-carry adder by performing additions in parallel and reducing the maximum carry path.

Adders are very important circuits: a computer would not be very useful if it could not add numbers. An equally important operation that all computers use frequently is decoding binary information from a set of n inputs to a maximum of 2^n outputs. A decoder uses the inputs and their respective values to select one specific output line. What do we mean by "select an output line"? It simply means that one unique output line is asserted, or set to 1, while the other output lines are set to zero. Decoders are normally defined by the number of inputs and the number of outputs. For example, a decoder that has 3 inputs and 8 outputs is called a 3-to-8 decoder.

We mentioned that the computer uses decoders frequently. You can probably name many operations a computer must be able to perform, but you may have a hard time coming up with an example of decoding. If so, it is because you are not familiar with how a computer accesses memory.

All memory addresses in a computer are specified as binary numbers. When a memory address is referenced (whether for reading or writing), the computer



Page 141:
must first determine the actual address. This is done by using a decoder. The following example should clear up any questions you may have about how a decoder works and what it is used for.

EXAMPLE 3.6 A 3-to-8 Decoder Circuit

Imagine memory consisting of 8 chips, each containing 8K bytes. Let's assume chip 0 contains memory addresses 0 through 8191, chip 1 contains memory addresses 8192 through 16,383, and so on. We have a total of 8K × 8, or 64K (65,536) addresses available. We will not write down all 64K addresses as binary numbers; however, writing a few addresses in binary form (as we illustrate in the following paragraphs) will illustrate why a decoder is necessary. Given 64 = 2^6 and 1K = 2^10, then 64K = 2^6 × 2^10 = 2^16, which indicates we need 16 bits to represent each address. If you have trouble understanding this, start with a smaller number of addresses. For example, if you have 4 addresses (addresses 0, 1, 2, and 3), the binary equivalents of these addresses are 00, 01, 10, and 11, requiring two bits. We know that 2^2 = 4. Now consider eight addresses. We have to be able to count from 0 to 7 in binary. How many bits does that require? The answer is 3. You can either write them all down, or you can recognize that 8 = 2^3. The exponent tells us the minimum number of bits necessary to represent the addresses. All addresses on chip 0 have the form 000xxxxxxxxxxxxx. Because chip 0 contains the addresses 0 through 8191, the binary representation of these addresses is in the range 0000000000000000 to 0001111111111111. Likewise, all addresses on chip 1 have the form 001xxxxxxxxxxxxx, and so on for the remaining chips. The leftmost 3 bits determine on which chip the address is actually located. We need 16 bits to represent the entire address, but on each chip we have only 2^13 addresses. Therefore, we need only 13 bits to uniquely identify an address on a given chip; the rightmost 13 bits give us this information. When a computer is given an address, it must first determine which chip to use; then it must find the actual address on that specific chip. In our example, the computer would use the leftmost 3 bits to pick the chip and then find the address on that chip using the remaining 13 bits. These 3 high-order bits are actually used as the inputs to a decoder so that the computer can determine which chip to turn on for reading or writing. If the first 3 bits are 000, chip 0 should be enabled. If the first 3 bits are 111, chip 7 should be enabled. Which chip would be enabled if the first 3 bits were 010? It would be chip 2. Each output of the decoder drives the select line of exactly one chip, so one chip, and only one, is turned on as addresses are decoded. Figure 3.14 illustrates the physical components in a decoder and the symbol often used to represent one. We will see how a decoder is used in memory in Section 3.6.

Another common combinational circuit is a multiplexer. This circuit selects binary information from one of many input lines and directs it to a single output line. Selection of a particular input line is controlled by a set of select variables, or control lines.
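A short Python sketch, not from the text, may make the decoding step concrete; the function name is illustrative, and the constants follow Example 3.6 (8 chips of 8K bytes each, addressed with 16 bits).

def decoder(select_bits):
    # n-to-2^n decoder: exactly one of the 2^n output lines is set to 1
    index = 0
    for bit in select_bits:                  # most significant select bit first
        index = (index << 1) | bit
    outputs = [0] * (2 ** len(select_bits))
    outputs[index] = 1
    return outputs

address = 0b0100000000000101                 # a 16-bit memory address
high3 = [(address >> shift) & 1 for shift in (15, 14, 13)]   # leftmost 3 bits
print(decoder(high3))     # [0, 0, 1, 0, 0, 0, 0, 0] -> chip 2 is enabled
print(address & 0x1FFF)   # 5 -> the 13-bit offset of the word within that chip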


Page 142:
FIGURE 3.14 a) A look inside a decoder, b) A decoder symbol (n inputs, 2^n outputs)

At any given time, only one input (the selected one) is routed through the circuit to the output line. All other inputs are "cut off." If the values on the control lines change, the input that is actually routed through changes as well. Figure 3.15 illustrates the physical components in a multiplexer and the symbol often used to represent one. Can you think of some situations that require multiplexers? Time-sharing computers multiplex the input from user terminals. Modem pools multiplex the modem lines entering the computer.

FIGURE 3.15 a) A look inside a multiplexer (input lines I0 through I3, control lines S1 and S0), b) A multiplexer symbol
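As a sketch of this routing behavior (not from the text; the function name is mine), a 2^n-to-1 multiplexer can be modeled in Python as follows:

def multiplexer(inputs, select_bits):
    # route exactly one of the input lines to the single output line
    index = 0
    for bit in select_bits:        # S1, S0, ... with the most significant first
        index = (index << 1) | bit
    return inputs[index]

# With the control lines set to (1, 0), input line I2 is routed to the output.
print(multiplexer([0, 1, 1, 0], (1, 0)))    # prints 1, the value on I2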


Page 143:
Another useful set of combinational circuits to study includes a parity generator and a parity checker (recall that we studied parity in Chapter 2). A parity generator is a circuit that creates the necessary parity bit to add to a word; a parity checker verifies that proper parity (odd or even) is present in the word, detecting an error if the parity bit is incorrect. Both are typically built using XOR functions. Assuming we are using odd parity, the truth table for a parity generator for a 3-bit word is given in Table 3.11. The truth table for a parity checker to be used on a 4-bit word with 3 information bits and 1 parity bit is given in Table 3.12. The parity checker outputs a 1 if an error is detected and a 0 otherwise. We leave it as an exercise to draw the corresponding logic diagrams for both the parity generator and the parity checker.

There are far too many combinational circuits for us to be able to cover them all in this brief chapter. Comparators, shifters, and programmable logic devices are all valuable circuits and actually quite easy to understand. The interested reader is referred to the references at the end of this chapter for more information on combinational circuits. However, before we finish the topic of combinational logic, there is one more combinational circuit we need to introduce. We have covered all of the components necessary to build an arithmetic logic unit (ALU). Figure 3.16 illustrates a very simple ALU with four basic operations (AND, OR, NOT, and addition) carried out on two machine words of 2 bits each. The control lines, f0 and f1, determine which operation is to be performed by the CPU.

TABLE 3.11 Parity generator (odd parity, 3-bit word)

  x   y   z   Parity Bit
  0   0   0       1
  0   0   1       0
  0   1   0       0
  0   1   1       1
  1   0   0       0
  1   0   1       1
  1   1   0       1
  1   1   1       0

TABLE 3.12 Parity checker (odd parity; 3 information bits x, y, z plus parity bit P)

  x   y   z   P   Error Detected?
  0   0   0   0         1
  0   0   0   1         0
  0   0   1   0         0
  0   0   1   1         1
  0   1   0   0         0
  0   1   0   1         1
  0   1   1   0         1
  0   1   1   1         0
  1   0   0   0         0
  1   0   0   1         1
  1   0   1   0         1
  1   0   1   1         0
  1   1   0   0         1
  1   1   0   1         0
  1   1   1   0         0
  1   1   1   1         1
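Because both circuits are XOR-based, each reduces to a single line of Python; this sketch (not from the text; the function names are mine) reproduces Tables 3.11 and 3.12 for odd parity.

def parity_generator(x, y, z):
    # odd parity: the generated bit makes the total number of 1s odd (Table 3.11)
    return 1 - (x ^ y ^ z)

def parity_checker(x, y, z, p):
    # outputs 1 when the 4-bit word does not have odd parity (Table 3.12)
    return 1 - (x ^ y ^ z ^ p)

word = (1, 0, 1)
p = parity_generator(*word)            # p == 1, giving three 1s in total
print(parity_checker(*word, p))        # 0 -> parity is correct
print(parity_checker(1, 1, 1, p))      # 1 -> error detected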


Page 144:
FIGURE 3.16 A simple two-bit ALU

The signal 00 is used for addition (A + B); 01 is used for NOT A; 10 is used for A OR B; and 11 is used for A AND B. The input lines A0 and A1 indicate 2 bits of one word, while B0 and B1 indicate the second word. C0 and C1 represent the output lines.

3.6 SEQUENTIAL CIRCUITS

In the previous section we studied combinational logic. We have approached our study of Boolean functions by examining the variables, the values for those variables, and the function outputs that depend solely on the values of the inputs to the functions. If we change an input value, this has a direct and immediate impact on the value of the output. The major weakness of combinational circuits is that there is no concept of storage; they are memoryless. This presents us with a bit of a dilemma. We know that computers must have a way to remember values. Consider the much simpler digital circuit needed for a soda machine.
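A minimal sketch of this ALU's behavior in Python (not from the text), assuming the control-line encoding just listed; the two-bit operands and results are held in ordinary integers and masked to 2 bits.

def simple_alu(f, a, b):
    # f is the 2-bit control value; a and b are 2-bit operands (0 through 3)
    if f == 0b00:
        result = a + b        # addition (a carry beyond 2 bits is lost here)
    elif f == 0b01:
        result = ~a           # NOT A
    elif f == 0b10:
        result = a | b        # A OR B
    else:
        result = a & b        # A AND B
    return result & 0b11      # keep only the two output lines, C1 and C0

print(simple_alu(0b00, 0b01, 0b10))   # 1 + 2 = 3  -> 0b11
print(simple_alu(0b11, 0b01, 0b11))   # 01 AND 11  -> 0b01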


Page 145:
When you put money in a soda machine, the machine remembers how much you have put in at any given instant. Without this ability to remember, it would be very difficult to use. A soda machine cannot be built using only combinational circuits. To understand how a soda machine works, and ultimately how a computer works, we must study sequential logic.

3.6.1 Basic Concepts

The output of a sequential circuit is a function of both its current inputs and its past inputs; thus, the output depends on past inputs. To remember previous inputs, sequential circuits must have some sort of storage element. We typically refer to this storage element as a flip-flop. The state of this flip-flop is a function of the previous inputs to the circuit. Therefore, pending output depends on both the current inputs and the current state of the circuit. In the same way that combinational circuits are generalizations of gates, sequential circuits are generalizations of flip-flops.

3.6.2 Clocks

The fact that a sequential circuit uses past inputs to determine present outputs indicates that we must have an ordering of events. Some sequential circuits are asynchronous, which means they become active the moment any input value changes. Synchronous sequential circuits use clocks to order events. A clock is a circuit that emits a series of pulses with a precise pulse width and a precise interval between consecutive pulses. This interval is called the clock cycle time. Clock speed is generally measured in megahertz (MHz), or millions of pulses per second, with common frequencies ranging from one to several hundred MHz. A sequential circuit uses the clock to decide when to update the state of the circuit (when do "present" inputs become "past" inputs?). This means that inputs to the circuit can only affect the storage element at given, discrete instances of time. In this chapter we examine synchronous sequential circuits because they are easier to understand than their asynchronous counterparts. From this point on, when we refer to "sequential circuit," we are implying "synchronous sequential circuit." Most sequential circuits are edge-triggered (as opposed to level-triggered). This means they are allowed to change their states on either the rising or falling edge of the clock signal, as seen in Figure 3.17 (a clock signal indicating discrete instances of time).


Page 146:
3.6.3 Flip-Flops

A level-triggered circuit is allowed to change state whenever the clock signal is either high or low. Many people use the terms latch and flip-flop interchangeably. Technically, a latch is level-triggered, whereas a flip-flop is edge-triggered. In this book, we use the term flip-flop. In order to "remember" a past state, sequential circuits rely on a concept called feedback. This simply means the output of a circuit is fed back as an input to the same circuit. A very simple feedback circuit uses two NOT gates, as shown in Figure 3.18. In this figure, if Q is 0, it will always be 0. If Q is 1, it will always be 1. This is not a very interesting or useful circuit, but it allows you to see how feedback works. A more useful feedback circuit is composed of two NOR gates, resulting in the most basic memory unit, called an SR flip-flop. SR stands for "set/reset." The logic diagram for the SR flip-flop is given in Figure 3.19. We can describe any flip-flop by using a characteristic table, which indicates what the next state should be based on the inputs and the current state, Q. The notation Q(t) represents the current state, and Q(t + 1) indicates the next state, or the state the flip-flop should enter after the clock has been pulsed. Figure 3.20 shows the actual implementation of the SR sequential circuit and its characteristic table. An SR flip-flop exhibits interesting behavior. There are three inputs: S, R, and the current output Q(t). We create the truth table shown in Table 3.13 to illustrate how this circuit works. For example, if S is 0 and R is 0, and the current state, Q(t), is 0, then the next state, Q(t + 1), is also 0. If S is 0 and R is 0, and Q(t) is 1, then Q(t + 1) is 1. In actuality, inputs of (0,0) on (S,R) leave the state unchanged when the clock is pulsed. Following a similar argument, we can see that inputs (S,R) = (0,1) force the next state, Q(t + 1), to 0 regardless of the current state (thus forcing a reset on the circuit output). When (S,R) = (1,0), the circuit output is set to 1.

FIGURE 3.18 Example of simple feedback
FIGURE 3.19 SR flip-flop logic diagram
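The behavior summarized in Table 3.13 can be captured directly in a small Python sketch (not from the text; the function name is mine):

def sr_flip_flop(s, r, q):
    # return the next state Q(t+1) for an SR flip-flop with current state q
    if s == 0 and r == 0:
        return q          # no change
    if s == 0 and r == 1:
        return 0          # reset to 0
    if s == 1 and r == 0:
        return 1          # set to 1
    raise ValueError("S = R = 1 is not allowed for an SR flip-flop")

q = 0
for s, r in [(1, 0), (0, 0), (0, 1)]:    # set, hold, reset on successive pulses
    q = sr_flip_flop(s, r, q)
    print(q)                              # prints 1, 1, 0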


Page 147:
FIGURE 3.20 a) The actual SR flip-flop, b) The characteristic table for the SR flip-flop

  S   R | Q(t+1)
  0   0 | Q(t)  (no change)
  0   1 | 0     (reset to 0)
  1   0 | 1     (set to 1)
  1   1 | undefined

TABLE 3.13 Truth table for the SR flip-flop

  S   R   Current State Q(t) | Next State Q(t+1)
  0   0          0           |        0
  0   0          1           |        1
  0   1          0           |        0
  0   1          1           |        0
  1   0          0           |        1
  1   0          1           |        1
  1   1          0           |    undefined
  1   1          1           |    undefined

There is one oddity with this particular flip-flop. What happens if both S and R are set to 1 at the same time? This forces both Q and its complement Q̄ to 1, but how can Q = 1 = Q̄? This results in an unstable circuit; therefore, this combination of inputs is not allowed in an SR flip-flop. We can modify the SR flip-flop so that this illegal state never arises, as shown in Figure 3.21; the result is a JK flip-flop. JK flip-flops are named in honor of Jack Kilby, the Texas Instruments engineer who invented the integrated circuit in 1958. Another modification of the SR flip-flop is the D (data) flip-flop. The D flip-flop is a true representation of physical computer memory. This sequential circuit stores one bit of information. If a 1 is asserted on the input line D and the clock is pulsed, the output line Q becomes a 1. If a 0 is asserted on the input line and the clock is pulsed, the output becomes 0. Recall that the output Q represents the current state of the circuit. Therefore, an output value of 1 means the circuit is currently "storing" a value of 1. Figure 3.22 illustrates the D flip-flop, lists its characteristic table, and reveals that the D flip-flop is actually a modified SR flip-flop.
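Continuing the same sketch style (not from the text; the function names are mine), the JK and D characteristic tables of Figures 3.21 and 3.22 translate directly into code:

def jk_flip_flop(j, k, q):
    # JK characteristic table: hold, reset, set, or toggle the current state q
    if j == 0 and k == 0:
        return q          # no change
    if j == 0 and k == 1:
        return 0          # reset to 0
    if j == 1 and k == 0:
        return 1          # set to 1
    return 1 - q          # J = K = 1 complements (toggles) the state

def d_flip_flop(d, q):
    # D characteristic table: the next state is simply the value on line D
    return d

q = 0
q = jk_flip_flop(1, 1, q)     # toggle -> 1
q = jk_flip_flop(0, 0, q)     # hold   -> 1
print(q, d_flip_flop(0, q))   # 1 0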


Page 148:
FIGURE 3.21 a) A JK flip-flop, b) The JK characteristic table, c) A JK flip-flop as a modified SR flip-flop

  J   K | Q(t+1)
  0   0 | Q(t)  (no change)
  0   1 | 0     (reset to 0)
  1   0 | 1     (set to 1)
  1   1 | Q̄(t)  (complement the current state)

FIGURE 3.22 a) A D flip-flop, b) The D characteristic table, c) A D flip-flop as a modified SR flip-flop

  D | Q(t+1)
  0 |   0
  1 |   1

3.6.4 Examples of Sequential Circuits

Latches and flip-flops are used to implement more complex sequential circuits. Registers, counters, memories, and shift registers all require the use of storage and are therefore implemented using sequential logic. Our first example of a sequential circuit is a simple 4-bit register implemented using four D flip-flops. (To implement registers for larger words, we would simply need to add flip-flops.) There are four input lines, four output lines, and a clock signal line. The clock is very important from a timing standpoint; the registers must all accept their new input values and change their storage elements at the same time. Remember that a synchronous sequential circuit cannot change state unless the clock pulses. The same clock signal is tied to all four D flip-flops, so they change in unison. Figure 3.23 depicts the logic diagram for our 4-bit register, as well as a block diagram for the register. In reality, the hardware has additional lines for power and ground, as well as a clear line (which allows the entire register to be reset to zero). However, in this text, we are willing to leave these concepts to the computer engineers and focus on the actual digital logic present in these circuits. Another useful sequential circuit is a binary counter, which goes through a predetermined sequence of states as the clock pulses. In a straightforward binary counter, these states reflect the binary counting sequence.
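A minimal sketch of the 4-bit register just described (not from the text; the class name is mine): on each clock pulse, all four D flip-flops latch their inputs in unison, and the stored value is held until the next pulse.

class FourBitRegister:
    def __init__(self):
        self.q = [0, 0, 0, 0]          # one stored bit per D flip-flop

    def clock(self, inputs):
        # on the clock pulse, every flip-flop latches its input at the same time
        self.q = list(inputs)

reg = FourBitRegister()
reg.clock([1, 0, 1, 1])
print(reg.q)        # [1, 0, 1, 1], held until the next clock pulse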


Page 149:
FIGURE 3.23 a) A 4-bit register, b) A block diagram for a 4-bit register

If we begin counting in binary: 0000, 0001, 0010, 0011, . . . , we can see that as the numbers increase, the low-order bit is complemented each time. Whenever it changes state from 1 to 0, the bit to its left is complemented. Each of the other bits changes state from 0 to 1 when all bits to its right are equal to 1. Because of this concept of complementing states, our binary counter is best implemented using JK flip-flops (recall that when J and K are both equal to 1, the flip-flop complements the present state). Instead of independent inputs to each flip-flop, there is a count enable line that runs to each flip-flop. The circuit counts only when the clock pulses and this count enable line is set to 1. If count enable is set to 0 and the clock pulses, the circuit does not change state. You should examine Figure 3.24 very carefully, tracing the circuit with various inputs to make sure you understand how it produces the binary numbers 0000 through 1111. You should also check to see what state the circuit enters if the current state is 1111 and the clock is pulsed. We have seen a simple register and a binary counter. We are now ready to examine a very simple memory circuit. The memory represented in Figure 3.25 holds four 3-bit words (commonly denoted as a 4 × 3 memory). Each column of the circuit represents one 3-bit word. Notice that the flip-flops storing the bits for each word are synchronized via the clock signal, so a read or write operation always reads or writes a complete word. The inputs In0, In1, and In2 are the lines used to store, or write, a 3-bit word to memory.
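Before moving on to the memory circuit, here is a sketch (not from the text; the function name is mine) of the binary counter of Figure 3.24, mirroring the J = K = 1 toggle behavior of its JK flip-flops: each stage toggles only when the count enable line and all lower-order bits are 1.

def counter_step(bits, count_enable=1):
    # advance a 4-bit counter [B0..B3] (B0 is the low-order bit) by one clock pulse
    if not count_enable:
        return bits                        # the clock pulses, but nothing changes
    new_bits = list(bits)
    carry = 1                              # the enable signal into stage B0
    for i in range(4):
        if carry:                          # J = K = 1 for this stage: toggle it
            new_bits[i] = 1 - bits[i]
        carry = carry & bits[i]            # the next stage toggles only if all lower bits were 1
    return new_bits

state = [1, 1, 1, 1]                       # current state 1111
print(counter_step(state))                 # [0, 0, 0, 0]: the counter wraps around to 0000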


Page 150:
FIGURE 3.24 A 4-bit synchronous counter using JK flip-flops

The S0 and S1 lines are the address lines used to select which word in memory is being referenced. (Note that S0 and S1 are the input lines to a 2-to-4 decoder that is responsible for selecting the correct memory word.) The three output lines (Out0, Out1, and Out2) are used when reading words from memory. You should notice another control line as well: the write enable control line, which indicates whether we are reading or writing. Note that in this chip, we have separated the input and output lines for ease of understanding. In practice, the input lines and output lines are the same lines. To summarize our discussion of this memory circuit, these are the steps necessary to write a word to memory:

1. An address is asserted on the address lines S0 and S1.
2. WE (write enable) is set to high.
3. The decoder using S0 and S1 enables only one AND gate, selecting a given word in memory.
4. The line selected in Step 3, combined with the clock and WE, selects only one word.
5. The write gate enabled in Step 4 drives the clock for the selected word.
6. When the clock pulses, the word on the input lines is loaded into the D flip-flops.

We leave it as an exercise to create a similar list of the steps necessary to read a word from this memory.
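The write steps above, and the corresponding read, can be sketched as follows (not from the text; the class and method names are mine), with the 2-bit address selecting one of the four words:

class Memory4x3:
    def __init__(self):
        self.words = [[0, 0, 0] for _ in range(4)]   # four 3-bit words of D flip-flops

    def clock(self, s1, s0, write_enable, inputs=None):
        word = (s1 << 1) | s0                  # the 2-to-4 decoder selects one word
        if write_enable:
            self.words[word] = list(inputs)    # load the input lines on the clock pulse
            return None
        return self.words[word]                # read: drive the selected word onto the outputs

mem = Memory4x3()
mem.clock(1, 0, write_enable=1, inputs=[1, 0, 1])   # write 101 into word 2
print(mem.clock(1, 0, write_enable=0))              # [1, 0, 1]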


Page 151:
FIGURE 3.25 A 4 × 3 memory

Another interesting exercise is to analyze this circuit and determine what additional components would be necessary to extend the memory from, say, a 4 × 3 memory to an 8 × 3 memory or a 4 × 8 memory.

3.7 CIRCUIT DESIGN

In the preceding sections, we introduced many of the different components used in computer systems. We have, by no means, provided enough detail to allow you to start designing circuits or systems. Digital logic design requires someone not only familiar with digital logic, but also well versed in digital analysis (analyzing the relationship between inputs and outputs), digital synthesis (starting with a truth table and determining the logic diagram to implement the given logical function), and the use of CAD (computer-aided design) software. Recall from our previous discussions that great care needs to be taken when designing circuits to ensure that they are minimized. A circuit designer faces many problems, including finding


Page 152:
efficient Boolean functions, using the fewest number of gates, using an inexpensive combination of gates, organizing the gates of a circuit board to use the smallest surface area and minimal power requirements, and attempting to do all of this using a standard set of modules for implementation. Add to this the many problems we have not discussed, such as signal propagation, fan out, timing issues, and external interfacing, and you can see that digital circuit design is quite complicated. Up to this point, we have discussed how to design registers, counters, memory, and various other basic digital components. Given these components, a circuit designer can implement any given algorithm in hardware (recall the Principle of Equivalence of Hardware and Software from Chapter 1). When you write a program, you are specifying a sequence of Boolean expressions. Typically, it is much easier to write a program than it is to design the hardware necessary to implement the algorithm. However, there are situations in which the hardware implementation is better (for example, in a real-time system, the hardware implementation is faster, and faster is definitely better). However, there are also cases in which a software implementation is better. It is often desirable to replace a large number of digital components with a single programmed microcomputer chip, resulting in an embedded system. Your microwave oven and your car most likely contain embedded systems. This is done to replace additional hardware that could present mechanical problems. Programming these embedded systems requires design software that can read input variables and send output signals to perform such tasks as turning a light on or off, emitting a beep, sounding an alarm, or opening a door, which in turn requires an understanding of how Boolean functions behave.

CHAPTER SUMMARY

The primary goal of this chapter is to familiarize you with the basic concepts involved in logic design and to give you a general understanding of the basic circuit configurations used to construct computer systems. This level of familiarity will not enable you to design these components; rather, it gives you a much better understanding of the architectural concepts discussed in the following chapters. Any Boolean function can be represented as a truth table, which can then be transformed into a logic diagram, indicating the components necessary to implement the digital circuit for that function. Thus, truth tables provide us with a means to express the characteristics of Boolean functions as well as logic circuits. In practice, these simple logic circuits are combined to create components such as adders, ALUs, decoders, multiplexers, registers, and memory. There is a one-to-one correspondence between a Boolean function and its digital representation. Boolean identities can be used to reduce Boolean expressions in order to minimize both combinational and sequential circuits. Minimization is extremely important in circuit design. From the point of view of a chip designer,


Page 153:
the two most important factors are speed and cost: minimizing the circuits helps to both lower the cost and increase performance. Digital logic is divided into two categories: combinational logic and sequential logic. Combinational logic devices, such as adders, decoders, and multiplexers, produce outputs based strictly on the current inputs. The AND, OR, and NOT gates are the building blocks for combinational logic circuits, although universal gates such as NAND and NOR could also be used. Sequential logic devices, such as registers, counters, and memory, produce outputs based on the combination of current inputs and the current state of the circuit. These circuits are built using SR, D, and JK flip-flops. These logic circuits are the basic components needed for computer systems; understanding how they work deepens your understanding of how a computer really works. If you are interested in learning more about Kmaps, there is a special section that focuses on Kmaps located at the end of this chapter, after the exercises.

FURTHER READING

Most computer organization and architecture books have a brief discussion of digital logic and Boolean algebra. The books by Stallings (2000) and Patterson and Hennessy (1997) contain good synopses of digital logic. Mano (1993) presents a good discussion of using Kmaps for circuit simplification (as discussed in the focus section of this chapter) and programmable logic devices, as well as an introduction to the various circuit technologies. For more detailed information on digital logic, see the books by Wakerly (2000), Katz (1994), or Hayes (1993); Gregg (1998) covers Boolean algebra, digital circuits, and the logic of sets. Maxfield's book (1995) is an absolute joy to read, containing informative and sophisticated concepts on Boolean logic, as well as a trove of interesting and enlightening trivia (including a wonderful seafood gumbo recipe!). For a very straightforward and easy-to-read book on gates and flip-flops (as well as a terrific explanation of what computers are and how they work), see Petzold (1989). Davidson (1979) presents a method of decomposing circuits based on NAND gates (of interest because NAND is a universal gate). If you are interested in designing some circuits, there is a nice simulator available free of charge. The set of tools is called the Chipmunk System. It performs a wide variety of applications, including electronic circuit simulation, graphics editing, and curve plotting. It contains four main tools, but for circuit simulation, Log is the program you need. The Diglog portion of Log allows you to create and test digital circuits. If you are interested in downloading the program and running it on your machine, the general Chipmunk distribution can be found at www.cs.berkeley.edu/~lazzaro/chipmunk/. The distribution is available for a wide variety of platforms (including PCs and Unix machines).


Page 154:
REFERENCES

Davidson, E. S. "An Algorithm for NAND Decomposition under Network Constraints," IEEE Transactions on Computing: C-18, 1098, 1979.
Gregg, John. Ones and Zeros: Understanding Boolean Algebra, Digital Circuits, and the Logic of Sets. New York: IEEE Press, 1998.
Hayes, J. P. Digital Logic Design. Reading, MA: Addison-Wesley, 1993.
Katz, R. H. Contemporary Logic Design. Redwood City, CA: Benjamin Cummings, 1994.
Mano, Morris M. Computer System Architecture, 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1993.
Maxfield, Clive. Bebop to the Boolean Boogie. Solana Beach, CA: High Text Publications, 1995.
Patterson, D. A., and Hennessy, J. L. Computer Organization and Design: The Hardware/Software Interface, 2nd ed. San Mateo, CA: Morgan Kaufmann, 1997.
Petzold, Charles. Code: The Hidden Language of Computer Hardware and Software. Redmond, WA: Microsoft Press, 1989.
Stallings, W. Computer Organization and Architecture, 5th ed. New York: Macmillan Publishing Company, 2000.
Tanenbaum, Andrew. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 1999.
Wakerly, J. F. Digital Design Principles and Practices. Upper Saddle River, NJ: Prentice Hall, 2000.

REVIEW OF ESSENTIAL TERMS AND CONCEPTS

1. Why is an understanding of Boolean algebra important to computer scientists?
2. Which Boolean operation is referred to as a Boolean product?
3. Which Boolean operation is referred to as a Boolean sum?
4. Create truth tables for the Boolean operators OR, AND, and NOT.
5. What is the Boolean duality principle?
6. Why is it important for Boolean expressions to be minimized in the design of digital circuits?
7. What is the relationship between transistors and gates?
8. Name the four basic logic gates.
9. What are the two universal gates described in this chapter? Why are these universal gates important?
10. Describe the basic construction of a digital logic chip.
11. Describe the operation of a ripple-carry adder. Why are ripple-carry adders not used in most computers today?
12. What do we call a circuit that takes several inputs and their respective values to select one specific output line? Name one important application for these devices.
13. What type of circuit selects binary information from one of many input lines and directs it to a single output line?


Page 155:
14. How are sequential circuits different from combinational circuits?
15. What is the basic element of a sequential circuit?
16. What do we mean when we say that a sequential circuit is edge-triggered rather than level-triggered?
17. What is feedback?
18. How is a JK flip-flop related to an SR flip-flop?
19. Why are JK flip-flops often preferred to SR flip-flops?
20. Which flip-flop gives a true representation of computer memory?

EXERCISES

◆ 1. Construct a truth table for the following:
   a) xyz + (x̄ȳz̄)
   ◆ b) x(yz + xȳ)
2. Construct a truth table for the following:
   ◆ a) xyz + xȳz̄ + x̄ȳz̄
   b) (x + y)(x̄ + z)(x + z̄)
◆ 3. Using DeMorgan's Law, write an expression for the complement of F if F(x,y,z) = x(ȳ + z).
4. Using DeMorgan's Law, write an expression for the complement of F if F(x,y,z) = xy + x̄z + yz.
◆ 5. Using DeMorgan's Law, write an expression for the complement of F if F(w,x,y,z) = xyz̄(ȳz̄ + x̄) + (w̄yz + x̄).
6. Use Boolean identities to prove the following:
   a) The absorption laws
   b) DeMorgan's laws
◆ 7. Is the following distributive law valid or not? Prove your answer.
   x XOR (y AND z) = (x XOR y) AND (x XOR z)
8. Prove that x = xy + xȳ
   a) Using truth tables
   b) Using Boolean identities
9. Prove that xz = (x + y)(x + ȳ)(x̄ + z)
   a) Using truth tables
   ◆ b) Using Boolean identities


Page 157:
19. The truth table for a Boolean expression is shown below. Write the Boolean expression in sum-of-products form.

  x   y   z | F
  0   0   0 | 1
  0   0   1 | 0
  0   1   0 | 0
  0   1   1 | 1
  1   0   0 | 0
  1   0   1 | 0
  1   1   0 | 1
  1   1   1 | 0

20. Draw the truth table and rewrite the following expression as the complemented sum of two products: xz + yz + xy

21. Given the Boolean function F(x,y,z) = x̄y + xyz̄
   ◆ a) Derive an algebraic expression for the complement of F. Express it in sum-of-products form.
   b) Show that FF̄ = 0.
   c) Show that F + F̄ = 1.

22. Given the function F(x,y,z) = x̄yz + x̄ȳz + xyz
   a) List the truth table for F.
   b) Draw the logic diagram using the original Boolean expression.
   c) Simplify the expression using Boolean algebra and identities.
   d) List the truth table for your answer in Part c.
   e) Draw the logic diagram for the simplified expression in Part c.

23. Construct the XOR operator using only AND, OR, and NOT gates.

*24. Construct the XOR operator using only NAND gates. Hint: x XOR y = ((x̄y)′(xȳ)′)′

25. Design a circuit with three inputs (x, y, and z) representing the bits in a binary number, and three outputs (a, b, and c) also representing bits in a binary number. When the input is 0, 1, 2, or 3, the binary output should be one less than the input. When the binary input is 4, 5, 6, or 7, the binary output should be one greater than the input. Show your truth table, all computations for simplification, and the final circuit.

26. Draw the combinational circuit that directly implements the following Boolean expression: F(x,y,z) = xz + (xy + z)

◆ 27. Draw the combinational circuit that directly implements the following Boolean expression: F(x,y,z) = (xy XOR (y + z)′) + xz


Page 158:
28. Find the truth table that describes the following circuit:
    (circuit diagram with inputs X, Y, and Z and output F)

◆ 29. Find the truth table that describes the following circuit:
    (circuit diagram with inputs X, Y, and Z and output F)

30. Find the truth table that describes the following circuit:
    (circuit diagram with inputs X, Y, and Z and output F)

31. Draw circuits to implement the parity generator and parity checker shown in Tables 3.11 and 3.12, respectively.

32. Draw a half-adder using only NAND gates.

33. Draw a full adder using only NAND gates.

34. Tyrone Shoelaces has invested a huge amount of money in the stock market and doesn't trust just anyone to give him buying and selling information. Before he will buy a certain stock, he must get input from three sources. His first source is Pain


Page 159:
Webster, a famous stockbroker. His second source is Meg A. Cash, a self-made millionaire in the stock market, and his third source is Madame LaZora, a world-famous psychic. After several months of receiving advice from all three, he has come to the following conclusions: a) Buy if both Pain and Meg say yes and the psychic says no. b) Buy if the psychic says yes. c) Don't buy otherwise. Construct a truth table and find the minimized Boolean function to implement the logic telling Tyrone when to buy.

◆ *35. A very small company has hired you to install a security system. The brand of system you install is priced by the number of bits encoded on the proximity cards that allow access to certain locations in a facility. Naturally, this small company wants to use as few bits as possible (spending as little money as possible) while still satisfying all of its security needs. The first thing you need to do is determine how many bits each card requires. Next, you have to program the card readers in each secured location so that they respond appropriately to a scanned card. This company has four types of employees and five areas that it wishes to restrict to certain employees. The employees and their restrictions are as follows: a) The Big Boss needs access to the executive lounge and the executive washroom. b) The Big Boss's secretary needs access to the supply closet, the employee lounge, and the executive lounge. c) Computer room employees need access to the server room and the employee lounge. d) The janitor needs access to all areas in the workplace. Determine how each class of employee will be encoded on the cards, and construct logic diagrams for the card readers in each of the five restricted areas.

36. How many 256 × 8 RAM chips are needed to provide a memory capacity of 4096 bytes?
   a) How many bits will each memory address contain?
   b) How many address lines must go to each chip?
   c) How many lines must be decoded for the chip select inputs? Specify the size of the decoder.

◆ *37. Investigate the operation of the following circuit. Assume an initial state of 0000. Trace the outputs (the Qs) as the clock ticks, and determine the purpose of the circuit. You must show the trace to complete your answer.
    (Circuit diagram: four JK flip-flops connected in series, each with inputs J, C, and K and outputs Q and Q̄, driven by a common clock.)

38. Describe how each of the following circuits works and indicate typical inputs and outputs. Also provide a carefully labeled black box diagram for each.


Page 160:
a) Decoder
◆ b) Multiplexer

39. Complete the truth table for the following sequential circuit:
    (circuit diagram: input X, a JK flip-flop with output A, and a D flip-flop with output B)

  Current State        Next State
  A   B   X            A   B
  0   0   0
  0   0   1
  0   1   0
  0   1   1
  1   0   0
  1   0   1
  1   1   0
  1   1   1

◆ 40. Complete the truth table for the following sequential circuit:
    (circuit diagram: inputs X, Y, and Z feeding a full adder, and a D flip-flop, with outputs A and B)

  Current State        Next State
  A   B   X            A   B
  0   0   0
  0   0   1
  0   1   0
  0   1   1
  1   0   0
  1   0   1
  1   1   0
  1   1   1

41. Complete the truth table for the following sequential circuit:
    (circuit diagram: input X, a JK flip-flop with output A, and a D flip-flop with output B)

  Current State        Next State
  A   B   X            A   B
  0   0   0
  0   0   1
  0   1   0
  0   1   1
  1   0   0
  1   0   1
  1   1   0
  1   1   1

42. A sequential circuit has one flip-flop; two inputs, X and Y; and one output, S. It consists of a full adder circuit connected to a JK flip-flop, as shown below. Complete the characteristic table for this sequential circuit by filling in the Next State and Output columns.


Page 161:
(Circuit diagram for Exercise 42: a full adder with inputs X, Y, and Z, sum output S, and carry output C, connected to a clocked JK flip-flop whose output is Q(t).)

  Current State   Inputs      Next State    Output
  Q(t)            X   Y       Q(t+1)        S
  0               0   0
  0               0   1
  0               1   0
  0               1   1
  1               0   0
  1               0   1
  1               1   0
  1               1   1

◆ *43. A Mux-Not flip-flop (MN flip-flop) behaves as follows: If M = 1, the flip-flop complements the current state. If M = 0, the next state of the flip-flop is equal to the value of N.
   a) Derive the characteristic table for the flip-flop.
   b) Show how a JK flip-flop can be converted to an MN flip-flop by adding gate(s) and inverter(s).

◆ 44. List the steps necessary to read a word from memory in the 4 × 3 memory circuit shown in Figure 3.25.

FOCUS ON KARNAUGH MAPS

3A.1 INTRODUCTION

In this chapter, we focused on Boolean expressions and their relationship to digital circuits. Minimizing these circuits helps reduce the number of components in the actual physical implementation. Having fewer components allows the circuitry to operate faster. Reducing Boolean expressions can be done using Boolean identities; however, using identities can be very difficult because no rules are given on how or when to use the identities, and there is no well-defined set of steps to follow. In one respect, minimizing Boolean expressions is very much like taking an exam: you know when you are on the right track, but getting there can sometimes be


Page 162:
frustrating and slow. In this appendix, we introduce a systematic approach for reducing Boolean expressions.

3A.2 DESCRIPTION OF KMAPS AND TERMINOLOGY

Karnaugh maps, or Kmaps, are a graphical way to represent Boolean functions. A map is simply a table used to enumerate the values of a given Boolean expression for different input values. The rows and columns correspond to the possible values of the function's inputs. Each cell represents the output of the function for those possible inputs. If a product term includes all of the variables exactly once, either complemented or not complemented, this product term is called a minterm. For example, if there are two input values, x and y, there are four minterms, x̄ȳ, x̄y, xȳ, and xy, which represent all of the possible input combinations for the function. If the input variables are x, y, and z, then there are eight minterms: x̄ȳz̄, x̄ȳz, x̄yz̄, x̄yz, xȳz̄, xȳz, xyz̄, and xyz. As an example, consider the Boolean function F(x,y) = xy + x̄y. The possible inputs for x and y are shown in Figure 3A.1. The minterm x̄ȳ represents the input pair (0,0). Similarly, the minterm x̄y represents (0,1), the minterm xȳ represents (1,0), and xy represents (1,1). The minterms for three variables, along with the input values they represent, are shown in Figure 3A.2.

FIGURE 3A.1 Minterms for two variables

  Minterm | x   y
  x̄ȳ      | 0   0
  x̄y      | 0   1
  xȳ      | 1   0
  xy      | 1   1

FIGURE 3A.2 Minterms for three variables

  Minterm | x   y   z
  x̄ȳz̄     | 0   0   0
  x̄ȳz     | 0   0   1
  x̄yz̄     | 0   1   0
  x̄yz     | 0   1   1
  xȳz̄     | 1   0   0
  xȳz     | 1   0   1
  xyz̄     | 1   1   0
  xyz     | 1   1   1


Page 163:
A Kmap is a table with a cell for each minterm, which means it has a cell for each line of the truth table for the function. Consider the function F(x,y) = xy and its truth table, as seen in Example 3A.1.

EXAMPLE 3A.1 F(x,y) = xy

  x   y | xy
  0   0 | 0
  0   1 | 0
  1   0 | 0
  1   1 | 1

The corresponding Kmap is:

         y=0   y=1
  x=0     0     0
  x=1     0     1

Notice that the only cell in the map with a value of one occurs when x = 1 and y = 1, the same values for which xy = 1. Let's look at another example, F(x,y) = x + y.

EXAMPLE 3A.2 F(x,y) = x + y

  x   y | x + y
  0   0 |   0
  0   1 |   1
  1   0 |   1
  1   1 |   1

         y=0   y=1
  x=0     0     1
  x=1     1     1

Three of the minterms in Example 3A.2 have a value of 1, exactly the minterms for which the input to the function gives us a 1 for the output. To assign 1s in the Kmap, we simply place 1s where we find corresponding 1s in the truth table. We can express the function F(x,y) = x + y as the logical OR of all minterms for which the minterm has a value of 1. Then F(x,y) can be represented by the expression x̄y + xȳ + xy. Obviously, this expression is not minimized (we already know this function is simply x + y). We can minimize using Boolean identities:


Page 164:
  F(x,y) = x̄y + xȳ + xy
         = x̄y + xy + xȳ + xy     (remember, xy + xy = xy)
         = y(x̄ + x) + x(ȳ + y)
         = y + x
         = x + y

How did we know to add in the extra xy term? Algebraic simplification using Boolean identities can be very tricky. This is where Kmaps can help.

3A.3 KMAP SIMPLIFICATION FOR TWO VARIABLES

In the previous reduction for the function F(x,y), the goal was to group terms so we could factor out variables. We added the xy to give us a term to combine with the x̄y. This allowed us to factor out the y, leaving x̄ + x, which reduces to 1. However, if we use Kmap simplification, we won't have to worry about which terms to add or which Boolean identity to use; the map takes care of that for us. Let's look at the Kmap for F(x,y) = x + y again in Figure 3A.3. To use this map to reduce a Boolean function, we simply need to group ones. This grouping is very similar to how we grouped terms when we reduced using Boolean identities, except we must follow specific rules. First, we group only ones. Second, we can group ones in the Kmap if the ones are in the same row or in the same column, but they cannot be on the diagonal (that is, they must be adjacent cells). Third, we can group ones if the total number in the group is a power of 2. The fourth rule specifies that we must make the groups as large as possible. As a fifth and final rule, all ones must be in a group (even if some are in a group of one). Let's examine some correct and incorrect groupings, as shown in Figures 3A.4 through 3A.7. Notice in Figures 3A.6(b) and 3A.7(b) that one 1 belongs to two groups. This is the map equivalent of adding the term xy to the Boolean function, as we did when we were performing simplification using identities; the xy term in the map will be used twice in the simplification procedure. To simplify using Kmaps, first create the groups as specified by the rules above. Once you have found all the groups, examine each group and discard the variable that differs within each group. For example, Figure 3A.7(b) shows the correct grouping for F(x,y) = x + y. Let's begin with the group represented by the second row (where x = 1). The two minterms are xȳ and xy. This group represents the logical OR of these two terms, or xȳ + xy.

FIGURE 3A.3 Kmap for F(x,y) = x + y

         y=0   y=1
  x=0     0     1
  x=1     1     1
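A quick way to double-check a reduction like this one is to enumerate the truth table; the following few lines of Python (a sketch, not part of the text) confirm that x̄y + xȳ + xy equals x + y for every input:

from itertools import product

for x, y in product((0, 1), repeat=2):
    original = ((1 - x) & y) | (x & (1 - y)) | (x & y)   # x̄y + xȳ + xy
    simplified = x | y                                    # x + y
    assert original == simplified
print("x̄y + xȳ + xy reduces to x + y for all inputs")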


Page 165:
These terms differ in y, so y is discarded, leaving only x. (We can see that if we used Boolean identities, this would reduce to the same value. The Kmap allows us to take a shortcut, helping us to automatically discard the correct variable.) The second group represents x̄y + xy. These differ in x, so x is discarded, leaving y. If we OR the results of the first group and the second group, we have x + y, which is the correct reduction of the original function, F.

FIGURE 3A.4 Groups contain only 1s (a: incorrect, b: correct)
FIGURE 3A.5 Groups cannot be diagonal (a: incorrect, b: correct)
FIGURE 3A.6 Groups must be powers of 2 (a: incorrect, b: correct)
FIGURE 3A.7 Groups must be as large as possible (a: incorrect, b: correct)

3A.4 KMAP SIMPLIFICATION FOR THREE VARIABLES

Kmaps can be applied to expressions of more than two variables. In this focus section, we show three-variable and four-variable Kmaps. These can be extended for situations that have five or more variables. We refer you to Maxfield (1995) in the "Further Reading" section of this chapter for a thorough and enjoyable coverage of Kmaps. You already know how to set up Kmaps for expressions involving two variables. We simply extend this idea to three variables, as indicated in Figure 3A.8.

FIGURE 3A.8 Minterms and Kmap format for three variables

         yz=00   yz=01   yz=11   yz=10
  x=0     x̄ȳz̄     x̄ȳz     x̄yz     x̄yz̄
  x=1     xȳz̄     xȳz     xyz     xyz̄

The first difference you should notice is that two variables, y and z, are grouped together in the table. The second difference is that the column numbering is not sequential. Instead of labeling the columns as 00, 01, 10, 11 (the normal binary progression), we label them 00, 01, 11, 10. The input values to the Kmap must be ordered so that each minterm differs in only one variable from each neighbor. By using this ordering (for example, 01 followed by 11), the


Page 166:
corresponding minterms, x̄ȳz and x̄yz, differ only in the variable y. Remember, to reduce, we need to discard the variable that is different, so we must make sure that each group of two minterms differs in only one variable. The largest groups we found in our two-variable examples were composed of two 1s. It is possible to have groups of four or even eight 1s, depending on the function. Let's look at a couple of examples of map simplification for expressions of three variables.

EXAMPLE 3A.3 F(x,y,z) = x̄ȳz + x̄yz + xȳz + xyz

         yz=00   yz=01   yz=11   yz=10
  x=0      0       1       1       0
  x=1      0       1       1       0

We again follow the rules for making groups. You should see that you can make groups of two in several ways. However, the rules stipulate that we must create the largest groups whose sizes are powers of two. There is one group of four (the four 1s in the yz = 01 and yz = 11 columns), so we group those together. It is not necessary to create additional groups of two; the fewer groups you have, the fewer terms there will be. Remember, we want to simplify the expression, and all we have to do is guarantee that every 1 is in some group. How, exactly, do we simplify when we have a group of four 1s? Two 1s in a group allowed us to discard one variable. Four 1s in a group allow us to discard two variables: the two variables in which the four terms differ. In the group of four in this example, we have the minterms x̄ȳz, x̄yz, xȳz, and xyz. These all have z in common, but the x and y variables differ. So we discard x and y, leaving us with F(x,y,z) = z as the final reduction. To see how this simplification parallels simplification using Boolean identities, consider the same reduction using identities. Note that the function is originally represented as the logical OR of the minterms with a value of 1.

  F(x,y,z) = x̄ȳz + x̄yz + xȳz + xyz
           = x̄(ȳz + yz) + x(ȳz + yz)
           = (x̄ + x)(ȳz + yz)
           = ȳz + yz
           = (ȳ + y)z
           = z
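The same kind of check works for three variables; this sketch (not from the text) evaluates the four minterms of Example 3A.3 and confirms that the function equals z for every input:

from itertools import product

def f(x, y, z):
    # x̄ȳz + x̄yz + xȳz + xyz
    return ((1-x) & (1-y) & z) | ((1-x) & y & z) | (x & (1-y) & z) | (x & y & z)

print(all(f(x, y, z) == z for x, y, z in product((0, 1), repeat=3)))   # True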


Page 167:
The final result using Boolean identities is exactly the same as the result using map simplification. Sometimes the grouping process can be a little tricky. Let's look at an example that requires a little more scrutiny.

EXAMPLE 3A.4 F(x,y,z) = x̄ȳz̄ + x̄ȳz + x̄yz + x̄yz̄ + xȳz̄ + xyz̄

         yz=00   yz=01   yz=11   yz=10
  x=0      1       1       1       1
  x=1      1       0       0       1

This is a tricky problem for two reasons: we have overlapping groups, and we have a group that "wraps around." The leftmost 1s in the first column can be grouped with the rightmost 1s in the last column, because the first and last columns are logically adjacent (imagine the map as being drawn on a cylinder). The top and bottom rows of a Kmap are also logically adjacent, which becomes apparent when we look at four-variable maps in the next section. The correct groupings are the entire first row (a group of four) and the wraparound group formed by the four 1s in the first and last columns. The first group reduces to x̄ (this is the only term the four minterms have in common), and the second group reduces to z̄, so the final minimized function is F(x,y,z) = x̄ + z̄.

EXAMPLE 3A.5 A Kmap with all 1s

Suppose we have the following Kmap:

         yz=00   yz=01   yz=11   yz=10
  x=0      1       1       1       1
  x=1      1       1       1       1

The largest group of 1s we can find is a group of eight, which puts all of the 1s in the same group. How do we simplify this? We follow the same rules we have been following. Remember, groups of two allow us to discard one variable, and groups of four allow us to discard two variables; therefore, groups of eight should allow us to discard three variables. But that's all the variables we have! If we discard all the variables, we are left with F(x,y,z) = 1. If you examine the truth table for this function, you will see that we do indeed have a correct simplification.


Page 169:
EXAMPLE 3A.7 Choosing groups

         yz=00   yz=01   yz=11   yz=10
  wx=00    1       0       1       0
  wx=01    1       0       1       1
  wx=11    1       0       0       0
  wx=10    1       0       0       0

The first column is clearly one group. Also, the terms w̄x̄yz and w̄xyz must be grouped together. However, we have a choice as to how to group the term w̄xyz̄: it can be grouped either with w̄xyz or with w̄xȳz̄ (as a wraparound). Grouping it with w̄xyz, the map simplifies to F(w,x,y,z) = F1 = ȳz̄ + w̄yz + w̄xy. Grouping it instead with w̄xȳz̄, the map simplifies to F(w,x,y,z) = F2 = ȳz̄ + w̄yz + w̄xz̄. The last terms are different; F1 and F2, however, are equivalent. We leave it up to you to produce the truth tables for F1 and F2 to check for equality. They both have the same number of terms and variables as well. If we follow the rules, Kmap minimization results in a minimized function (and thus a minimal circuit), but these minimized functions need not be unique in representation. Before we move on to the next section, here are the rules for Kmap simplification.

1. Groups can contain only 1s; no 0s.
2. Only 1s in adjacent cells can be grouped; diagonal grouping is not allowed.
3. The number of 1s in a group must be a power of 2.
4. The groups must be as large as possible while still following all of the rules.
5. All 1s must belong to a group, even if it is a group of one.
6. Overlapping groups are allowed.
7. Wraparound is allowed.
8. Use the fewest number of groups possible.


Page 170:
Using these rules, let's complete one more example for a function of four variables. Example 3A.8 shows several applications of the various rules.

EXAMPLE 3A.8 F(w,x,y,z) = w̄x̄ȳz̄ + w̄x̄yz + w̄xȳz + w̄xyz + wxȳz + wxyz + wx̄yz

         yz=00   yz=01   yz=11   yz=10
  wx=00    1       0       1       0
  wx=01    0       1       1       0
  wx=11    0       1       1       0
  wx=10    0       0       1       0

In this example, we have a group containing only one element. Notice that there is no way to group this term with any others if we follow the rules. The function represented by this Kmap simplifies to F(w,x,y,z) = yz + xz + w̄x̄ȳz̄.

If you are given a function that is not written as a sum of minterms, you can still use Kmaps to help minimize the function. However, you have to use a procedure that is somewhat the reverse of what we have been doing to set up the Kmap before reduction can occur. Example 3A.9 illustrates this procedure.

EXAMPLE 3A.9 A function not represented as a sum of minterms

Suppose you are given the function F(w,x,y,z) = w̄xy + w̄x̄yz + w̄x̄yz̄. The last two terms are minterms, and we can easily place 1s in the appropriate positions in the Kmap. However, the term w̄xy is not a minterm. Suppose this term were the result of a grouping you had performed on a Kmap. The variable that was discarded was z, which means this term is equivalent to the two terms w̄xyz̄ + w̄xyz. You can now use these two terms in the Kmap, because they are both minterms. We now get the following Kmap:

         yz=00   yz=01   yz=11   yz=10
  wx=00    0       0       1       1
  wx=01    0       0       1       1
  wx=11    0       0       0       0
  wx=10    0       0       0       0


Page 171:
So we know that the function F(w,x,y,z) = w̄xy + w̄x̄yz + w̄x̄yz̄ simplifies to F(w,x,y,z) = w̄y.

3A.6 DON'T CARE CONDITIONS

There are certain situations where a function may not be completely specified, meaning there may be some inputs that are undefined for the function. For example, consider a function with 4 inputs that act as bits to count, in binary, from 0 to 10 (decimal). We use the bit combinations 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, and 1010. However, we do not use the combinations 1011, 1100, 1101, 1110, and 1111. These latter inputs are invalid, which means that if we look at the truth table, these values wouldn't be either 0 or 1; they should not be in the truth table at all. We can use these don't care inputs to our advantage when simplifying Kmaps. Because they are input values that should not matter (and should never occur), we can let them have values of either 0 or 1, depending on which helps us the most. The basic idea is to set these don't care values in such a way that they either contribute to making a larger group or don't contribute at all. Example 3A.10 illustrates this concept.

EXAMPLE 3A.10 Don't care conditions

Don't care values are typically indicated with an "X" in the appropriate cell. The following Kmap shows how to use these values to help with minimization. We treat the don't care values in the first row as 1s to help form a group of four. The don't care values in rows 01 and 11 are treated as 0s. This reduces to F1(w,x,y,z) = w̄x̄ + yz.

         yz=00   yz=01   yz=11   yz=10
  wx=00    X       1       1       X
  wx=01    0       X       1       0
  wx=11    0       X       1       0
  wx=10    0       0       1       0

There is another way to group these values:

         yz=00   yz=01   yz=11   yz=10
  wx=00    X       1       1       X
  wx=01    0       X       1       0
  wx=11    0       X       1       0
  wx=10    0       0       1       0


Page 172:
Using the above grouping, we end up with a simplification of F2(w,x,y,z) = w̄z + yz. Notice that in this case, F1 and F2 are not equal. However, if you create the truth tables for both functions, you should see that they are equal except for those input values "we don't care about."

This section has presented only a brief introduction to Kmaps and map simplification. Using Boolean identities for reduction is tedious and can be very difficult. Kmaps, on the other hand, provide a precise set of steps to follow to find the minimal representation of a function, and thus the minimal circuit that represents that function.

EXERCISES

1. Write a simplified expression for the Boolean function defined by each of the following Kmaps:

◆ a)
         yz=00   yz=01   yz=11   yz=10
  x=0      0       1       1       0
  x=1      1       0       0       1

◆ b)
         yz=00   yz=01   yz=11   yz=10
  x=0      0       1       1       1
  x=1      1       0       0       0

c)
         yz=00   yz=01   yz=11   yz=10
  x=0      1       1       1       0
  x=1      1       1       1       1

2. Create the Kmaps and then simplify for the following functions:
   a) F(x,y,z) = x̄ȳz̄ + x̄yz + x̄yz̄
   b) F(x,y,z) = x̄ȳz̄ + x̄yz̄ + xȳz̄ + xyz̄
   c) F(x,y,z) = ȳz̄ + ȳz + xyz̄


Page 174:
6. Write a simplified expression for the Boolean function defined by each of the following Kmaps:

◆ a)
         yz=00   yz=01   yz=11   yz=10
  x=0      1       1       0       X
  x=1      1       1       1       1

b) (a four-variable Kmap over w, x, y, and z containing several 1s and don't care conditions)


Page 176:
"When you wish to produce a result by means of an instrument, do not allow yourself to complicate it."
—Leonardo da Vinci

CHAPTER 4
MARIE: An Introduction to a Simple Computer

4.1 INTRODUCTION

Designing a computer nowadays is a job for a computer engineer with plenty of training. It is impossible in an introductory textbook such as this (and in an introductory course in computer organization and architecture) to present everything necessary to design and build a working computer such as those we can buy today. However, in this chapter, we first look at a very simple computer called MARIE: a Machine Architecture that is Really Intuitive and Easy. We then provide brief overviews of Intel and MIPS machines, two popular architectures reflecting the CISC and RISC design philosophies. The objective of this chapter is to give you an understanding of how a computer functions. We have, therefore, kept the architecture as uncomplicated as possible, following the advice in the opening quote by Leonardo da Vinci.

4.1.1 CPU Basics and Organization

From our studies in Chapter 2 (data representation) we know that a computer must manipulate binary-coded data. We also know from Chapter 3 that memory is used to store both data and program instructions (also in binary). Somehow, the program must be executed and the data must be processed correctly. The central processing unit (CPU) is responsible for fetching program instructions, decoding each instruction that is fetched, and performing the indicated sequence of operations on the correct data. To understand how computers work, you must first become familiar with their various components and the interaction among them. To introduce the simple architecture in the next section, we first


Page 177:
the microarchitecture that exists at the control level of modern computers.

All computers have a central processing unit. This unit can be divided into two pieces. The first is the datapath, which is a network of storage units (registers) and arithmetic and logic units (for performing various operations on data) connected by buses (capable of moving data from one place to another), where the timing is controlled by clocks. The second CPU component is the control unit, a module responsible for sequencing operations and making sure the correct data is where it needs to be at the correct time. Together, these components perform the tasks of the CPU: fetching instructions, decoding them, and finally executing the indicated sequence of operations. The performance of a machine is directly affected by the design of the datapath and the control unit. Therefore, we cover these CPU components in detail in the following sections.

The Registers

Registers are used in computer systems as places to store a wide variety of data, such as addresses, program counters, or data needed for program execution. Simply put, a register is a hardware device that stores binary data. Registers are located on the processor so information can be accessed very quickly. We saw in Chapter 3 that D flip-flops can be used to implement registers. A D flip-flop is equivalent to a 1-bit register, so a collection of D flip-flops is necessary to store multi-bit values. For example, to build a 16-bit register, we need to connect 16 D flip-flops together. We saw in our binary counter figure in Chapter 3 that these collections of flip-flops must be clocked to work in unison. At each clock tick, input enters the register and cannot be changed (and is therefore stored) until the clock ticks again.

Data processing on a computer is usually done on fixed-size binary words that are stored in registers. Therefore, most computers have registers of a certain size. Common sizes include 16, 32, and 64 bits. The number of registers in a machine varies from architecture to architecture, but is typically a power of 2, with 16 and 32 being the most common. Registers contain data, addresses, or control information. Some registers are specified as "special purpose" and may contain only data, only addresses, or only control information. Other registers are more generic and may hold data, addresses, and control information at various times.

Information is written to registers, read from registers, and transferred from register to register. Registers are not addressed in the same way memory is addressed (recall that each memory word has a unique binary address beginning with location 0). Registers are addressed and manipulated by the control unit itself.
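To make the register behavior described above concrete (a clocked storage element of fixed width that captures its input on a tick and holds it until the next tick), here is a minimal Python sketch. The class name and interface are our own, invented purely for illustration; they are not part of MARIE or of the text.

class Register:
    """A fixed-width storage element: the value presented at the input
    is captured on a clock tick and held until the next tick."""

    def __init__(self, width=16):
        self.width = width
        self.mask = (1 << width) - 1   # e.g., 0xFFFF for 16 bits
        self.value = 0                 # currently stored contents
        self.next_value = 0            # input waiting for the next tick

    def set_input(self, data):
        # Present new data at the register's input lines.
        self.next_value = data & self.mask

    def tick(self):
        # On the clock tick, the input is latched and becomes the output.
        self.value = self.next_value

# A 16-bit register, conceptually built from 16 one-bit storage elements:
r = Register(16)
r.set_input(0x1104)
r.tick()
print(hex(r.value))   # 0x1104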


Page 178:
In modern computer systems there are many types of specialized registers: registers to store information, registers to shift values, registers to compare values, and registers that count. There are "scratchpad" registers that store temporary values, index registers to control program loops, stack pointer registers to manage stacks of information for processes, status registers to hold the status or mode of operation (such as overflow, carry, or zero conditions), and general-purpose registers, which are the registers available to the programmer. Most computers have register sets, and each set is used in a specific way. For example, the Pentium architecture has a data register set and an address register set. Certain architectures have very large register sets that can be used in quite novel ways to speed up instruction execution. (We discuss this topic when we cover advanced architectures in Chapter 9.)

The ALU

The arithmetic logic unit (ALU) carries out the logic operations (such as comparisons) and arithmetic operations (such as addition or multiplication) required during program execution. You saw an example of a simple ALU in Chapter 3. Generally, an ALU has two data inputs and one data output. Operations performed in the ALU often affect bits in the status register (bits are set to indicate actions such as whether an overflow has occurred). The ALU knows which operations to perform because it is controlled by signals from the control unit.

The Control Unit

The control unit is the "policeman" or "traffic manager" of the CPU. It monitors the execution of all instructions and the transfer of all information. The control unit extracts instructions from memory, decodes these instructions, making sure data is in the right place at the right time, tells the ALU which registers to use, services interrupts, and turns on the correct circuitry in the ALU for the execution of the desired operation. The control unit uses a program counter register to find the next instruction for execution and a status register to keep track of overflows, carries, borrows, and the like. Section 4.7 covers the control unit in more detail.

4.1.2 The Bus

The CPU communicates with the other components via a bus. A bus is a set of wires that acts as a shared but common datapath to connect multiple subsystems within the system. It consists of multiple lines, allowing the parallel movement of bits. Buses are low cost but very versatile, and they make it easy to connect new devices to each other and to the system. At any one time, only one device (be it a register, the ALU, memory, or some other component) may use the bus. However, this sharing often results in a communications bottleneck. The speed of the bus is affected by its length as well as by the number of devices sharing it. Quite often, devices are divided into master and slave categories: a master device is one that initiates actions, and a slave is one that responds to requests by a master. A bus can be point-to-point, connecting two specific components (as seen in Figure 4.1a), or it can be a common pathway that connects a number of devices, requiring these devices to share the bus (referred to as a multidrop bus and shown in Figure 4.1b).
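Returning to the ALU described above (two data inputs, one output, and side effects on status-register bits), here is a small illustrative Python sketch. The operation names and the particular set of flags are our own assumptions, chosen only to mirror the description; this is not the ALU of Chapter 3.

def alu(op, a, b, width=16):
    """A toy two-input ALU: returns (result, flags), where the flags
    mimic status-register bits (zero, negative, carry)."""
    mask = (1 << width) - 1
    if op == "add":
        raw = a + b
    elif op == "sub":
        raw = a - b
    elif op == "and":
        raw = a & b
    else:
        raise ValueError("unsupported operation")
    result = raw & mask
    flags = {
        "zero": result == 0,
        "negative": bool(result & (1 << (width - 1))),  # sign bit of the result
        "carry": raw != result,                          # raw result did not fit in the word
    }
    return result, flags

print(alu("add", 0x0023, 0xFFE9))   # wraps around in 16 bits; result 0x000C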


Page 179:
[FIGURE 4.1: a) Point-to-point buses; b) A multidrop bus. Devices shown include a serial port, modem, control unit, ALU, two computers, a file server, a CPU, disk controllers and disks, memory, a printer, and a monitor.]


Page 180:
[FIGURE 4.2: The components of a typical bus: address bus, data bus, control bus, and power lines connecting the CPU, main memory, and the I/O devices of the I/O subsystem.]

Because of this sharing, a bus protocol (set of usage rules) is very important. Figure 4.2 shows a typical bus consisting of data lines, address lines, control lines, and power lines. The lines of a bus dedicated to moving data are often called the data bus. These data lines contain the actual information that must be moved from one location to another. Control lines indicate which device has permission to use the bus and for what purpose (reading or writing from memory or from an I/O device, for example). Control lines also transfer acknowledgments for bus requests, interrupts, and clock synchronization signals. Address lines indicate the location (in memory, for example) that the data should be read from or written to. The power lines provide the electrical power necessary. Typical bus transactions include sending an address (for a read or a write), transferring data from memory to a register (a memory read), and transferring data to memory from a register (a memory write). In addition, buses are used for I/O reads and writes from peripheral devices. Each type of transfer occurs within a bus cycle, the time between two ticks of the bus clock.

Due to the different types of information buses transport and the various devices that use the buses, buses themselves have been divided into different types. Processor-memory buses are short, high-speed buses that are closely matched to the machine's memory system to maximize the bandwidth (transfer of data), and they are usually very design specific. I/O buses are typically longer than processor-memory buses and support many types of devices with varying bandwidths. These buses are compatible with many different architectures. A backplane bus (Figure 4.3) is actually built into the chassis of the machine and connects the processor, the I/O devices, and the memory (so all devices share one bus). Many computers have a hierarchy of buses, so it is not uncommon to have two buses (for example, a processor-memory bus and an I/O bus) or more in the same system. High-performance systems often use all three types of buses.


Page 181:
[FIGURE 4.3: A backplane bus, with a system bus and interface cards.]

Personal computers have their own terminology when it comes to buses. PCs have an internal bus (called the system bus) that connects the CPU, memory, and all other internal components. External buses (sometimes called expansion buses) connect external devices, peripherals, expansion slots, and I/O ports to the rest of the computer. Most PCs also have local buses, data buses that connect a peripheral device directly to the CPU. These are very high-speed buses and can be used to connect only a limited number of similar devices. Expansion buses are slower but allow for more generic connectivity. Chapter 7 deals with these topics in great detail.

Buses are physically little more than bunches of wires, but they have specific standards for connectors, timing, and signaling specifications, and exact protocols for their use. Synchronous buses are clocked, and things happen only at the clock ticks (a sequence of events is controlled by the clock). Every device is synchronized by the rate at which the clock ticks, that is, the clock rate. The bus cycle time mentioned earlier is the reciprocal of the bus clock rate. For example, if the bus clock rate is 133 MHz, then the length of the bus cycle is 1/133,000,000, or 7.52 ns. Because the clock controls the bus transactions, any clock skew has the potential to cause problems, which implies that the bus should be kept as short as possible so the clock drift cannot become too large. In addition, the bus cycle time must not be shorter than the length of time it takes information to traverse the bus. The length of the bus, therefore, imposes restrictions on both the bus clock rate and the bus cycle time.

With asynchronous buses, control lines coordinate the operations, and a complex handshaking protocol must be used to enforce timing. To read a word of data from memory, for example, the protocol would require steps similar to the following:

1. ReqREAD: This bus control line is activated and the data memory address is put on the appropriate bus lines at the same time.
2. ReadyDATA: This control line is asserted when the memory system has put the required data on the data lines for the bus.
3. ACK: This control line is used to indicate that the ReqREAD or the ReadyDATA has been acknowledged.
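The relationship between bus clock rate and bus cycle time is simply a reciprocal, as the 133 MHz example above shows. A quick Python sketch (the function name is ours, purely illustrative):

def bus_cycle_time(clock_rate_hz):
    """Bus cycle time is the reciprocal of the bus clock rate."""
    return 1.0 / clock_rate_hz

ns = 1e9  # nanoseconds per second
print(round(bus_cycle_time(133_000_000) * ns, 2))  # 7.52 ns
print(round(bus_cycle_time(800_000_000) * ns, 2))  # 1.25 ns (see Section 4.1.3)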


Page 182:
Using a protocol instead of the clock to coordinate transactions means that asynchronous buses scale better with technology and can support a wider variety of devices.

To use a bus, a device must reserve it, because only one device can use the bus at a time. As mentioned earlier, bus masters are devices that are allowed to initiate transfer of information (control the bus), whereas bus slaves are modules that are activated by a master and respond to requests to read and write data (so only masters can reserve the bus). Both follow a communications protocol to use the bus, working within very specific timing requirements. In a very simple system (such as the one we present in the next section), the processor is the only device allowed to become a bus master. This is good in terms of avoiding chaos, but bad because the processor is now involved in every transaction that uses the bus.

In systems with more than one master device, bus arbitration is required. Bus arbitration schemes must provide priority to certain master devices while making sure lower-priority devices are not starved out. Bus arbitration schemes fall into four categories:

1. Daisy chain arbitration: This scheme uses a "grant bus" control line that is passed down the bus from the highest-priority device to the lowest-priority device. (Low-priority devices can be "starved" and never allowed to use the bus.) This scheme is simple but not fair.
2. Centralized parallel arbitration: Each device has a request control line to the bus, and a centralized arbiter selects who gets the bus. Bottlenecks can result using this type of arbitration.
3. Distributed arbitration using self-selection: This scheme is similar to centralized arbitration, but instead of a central authority selecting who gets the bus, the devices themselves determine who has the highest priority and should get the bus.
4. Distributed arbitration using collision detection: Each device is allowed to make a request for the bus. If the bus detects any collisions (multiple simultaneous requests), the device must make another request. (Ethernet uses this type of arbitration.)

Chapter 7 contains more detailed information on buses and their protocols.

4.1.3 Clocks

Every computer contains an internal clock that regulates how quickly instructions can be executed. The clock also synchronizes all of the components in the system. As the clock ticks, it sets the pace for everything that happens in the system, much like a metronome or a symphony conductor. The CPU uses this clock to regulate its progress, checking the otherwise unpredictable speed of the digital logic gates. The CPU requires a fixed number of clock ticks to execute each instruction. Therefore, instruction performance is often measured in clock cycles, the time between clock ticks, instead of seconds. The clock frequency (sometimes called the clock rate or clock speed) is measured in MHz, as we saw in Chapter 1, where 1 MHz is equal to 1 million cycles per second (so 1 hertz is 1 cycle per second).
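As a rough illustration of the first arbitration scheme in the list above, here is a small Python sketch of daisy chain arbitration; the data layout (a list of request flags ordered by priority) is our own, invented only for illustration.

def daisy_chain_grant(requests):
    """Daisy chain arbitration: the grant line is passed from the
    highest-priority device (index 0) down the chain, so the first
    requesting device in priority order wins the bus."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device
    return None  # no device requested the bus

# Devices 1 and 3 request the bus; device 1 (higher priority) is granted it.
print(daisy_chain_grant([False, True, False, True]))  # -> 1
# Device 3 can be starved if device 1 keeps requesting, which is why
# this scheme is simple but not fair.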


Page 183:
The clock cycle time (or clock period) is simply the reciprocal of the clock frequency. For example, an 800 MHz machine has a clock cycle time of 1/800,000,000, or 1.25 ns. If a machine has a 2 ns cycle time, then it is a 500 MHz machine.

Most machines are synchronous: there is a master clock signal that ticks (changing from 0 to 1 to 0 and so on) at regular intervals. Registers must wait for the clock to tick before new data can be loaded. It seems reasonable to assume that if we speed up the clock, the machine will run faster. However, there are limits on how short we can make the clock cycles. When the clock ticks and new data is loaded into the registers, the register outputs are likely to change. These changed output values must propagate through all of the circuits in the machine until they reach the input of the next set of registers, where they are stored. The clock cycle must be long enough to allow these changes to reach the next set of registers. If the clock cycle is too short, we could end up with some values not reaching the registers. This would result in an inconsistent state in our machine, which is definitely something we must avoid. Therefore, the minimum clock cycle time must be at least as great as the maximum propagation delay of the circuit, from each set of register outputs to register inputs.

What if we "shorten" the distance between registers to reduce the propagation delay? We could do this by adding registers between the output registers and the corresponding input registers. But remember that registers cannot change values until the clock ticks, so we have, in effect, increased the number of clock cycles. For example, an instruction that would require 2 clock cycles might now require 3 or 4 (or more, depending on where we place the additional registers). Most machine instructions require 1 or 2 clock cycles, but some can take 35 or more. We present the following formula to relate seconds to cycles:

CPU time = seconds/program = (instructions/program) x (average cycles/instruction) x (seconds/cycle)

It is important to note that the architecture of a machine has a large effect on its performance. Two machines with the same clock speed do not necessarily execute instructions in the same number of cycles. For example, a multiply operation on an older Intel 286 machine required 20 clock cycles, but on a newer Pentium a multiply operation can be done in 1 clock cycle, which implies the newer machine would be 20 times faster than the 286, even if they both had the same internal system clock. In general, multiplication requires more time than addition, floating-point operations require more cycles than integer ones, and accessing memory takes longer than accessing registers.

Generally, when we mention the term clock, we are referring to the system clock, or the master clock that regulates the CPU and other components. However, some buses also have their own clocks. Bus clocks are usually slower than CPU clocks, causing bottleneck problems.

System components have defined performance bounds, indicating the maximum time required for the components to perform their functions. Manufacturers guarantee that their components will run within these bounds under the most extreme circumstances.
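The CPU time formula above can be evaluated directly. A minimal Python sketch; the workload numbers are made up purely for illustration:

def cpu_time(instruction_count, avg_cycles_per_instruction, clock_rate_hz):
    """CPU time = (instructions/program) x (cycles/instruction) x (seconds/cycle)."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * avg_cycles_per_instruction * seconds_per_cycle

# A hypothetical program: 10 million instructions, 2 cycles each on average,
# on an 800 MHz machine (1.25 ns per cycle).
print(cpu_time(10_000_000, 2, 800_000_000))  # ~0.025 seconds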


Page 184:
When we connect all of the components serially, with one component having to complete its task before another can function properly, it is important to know these performance bounds so we can synchronize the components correctly. However, many people push the bounds of certain system components in an attempt to improve system performance. Overclocking is one method people use to achieve this goal. Although many components are potential candidates, one of the most popular components to overclock is the CPU. The basic idea is to run the CPU at clock and/or bus speeds above the upper bound specified by the manufacturer. Although this can increase system performance, one must be careful not to create system timing faults or, worse yet, overheat the CPU. The system bus can also be overclocked, which results in overclocking the various components that communicate via the bus. Overclocking the system bus can provide considerable performance improvements, but it can also damage the components that use the bus or cause them to perform unreliably.

4.1.4 The Input/Output Subsystem

Input and output (I/O) devices allow us to communicate with the computer system. I/O is the transfer of data between main memory and various I/O peripherals. Input devices such as keyboards, mice, card readers, scanners, voice recognition systems, and touch screens allow us to enter data into the computer. Output devices such as monitors, printers, plotters, and speakers allow us to get information from the computer.

These devices are not connected directly to the CPU. Instead, there is an interface that handles the data transfers. This interface converts the system bus signals to and from a format that is acceptable to the given device. The CPU communicates with these external devices via input/output registers. This exchange of data is performed in two ways. In memory-mapped I/O, the registers in the interface appear in the computer's memory map, and there is no real difference between accessing memory and accessing an I/O device. Clearly, this is advantageous from the perspective of speed, but it uses up memory space in the system. With instruction-based I/O, the CPU has specialized instructions that perform the input and output. Although this does not use memory space, it requires specific I/O instructions, which implies it can be used only by CPUs that can execute these specific instructions. Interrupts play a very important part in I/O, because they are an efficient way to notify the CPU that input or output is available for use.

4.1.5 Memory Organization and Addressing

We saw an example of a small memory in Chapter 3. However, we have not yet discussed in detail how memory is laid out and how it is addressed. It is important that you have a good understanding of these concepts before we continue. You can envision memory as a matrix of bits. Each row, implemented by a register, has a length typically equivalent to the word size of the machine. Each


Page 185:
register (more commonly referred to as a memory location) has a unique address; memory addresses usually start at zero and progress upward. Figure 4.4 illustrates this concept.

[FIGURE 4.4: a) N 8-bit memory locations; b) M 16-bit memory locations.]

An address is almost always represented by an unsigned integer. Recall from Chapter 2 that 4 bits is a nibble and 8 bits is a byte. Normally, memory is byte-addressable, which means that each individual byte has a unique address. Some machines may have a word size that is larger than a single byte. For example, a computer might handle 32-bit words (meaning it processes 32 bits at a time through various instructions) but still employ a byte-addressable architecture. In this situation, when a word uses multiple bytes, the byte with the lowest address determines the address of the entire word. It is also possible for a computer to be word-addressable, which means each word (not necessarily each byte) has its own address, but most machines today are byte-addressable (even though they have 32-bit or larger words). A memory address is typically stored in a single machine word.

If all this talk about machines being byte-addressable with differing word sizes has you somewhat confused, the following analogy may help. Memory is similar to a street full of apartment buildings. Each building (word) has multiple apartments (bytes), and each apartment has its own address. All of the apartments are numbered (addressed) sequentially, from 0 to the total number of apartments in the complex. The buildings themselves serve to group the apartments. In computers, words do the same thing. Words are the basic unit of size used in many instructions. For example, you can read a word from or write a word to memory, even on a byte-addressable machine.

If an architecture is byte-addressable, and the instruction set architecture word is larger than 1 byte, the issue of alignment must be addressed. For example, if we wish to read a 32-bit word on a byte-addressable machine, we must make sure that (1) the word was stored on a natural alignment boundary and (2) the access starts on that boundary. This is accomplished, in the case of 32-bit words, by requiring the address to be a multiple of 4. Some architectures allow unaligned accesses, where the desired address does not have to start on a natural boundary.

Memory is built from random access memory (RAM) chips. (We cover memory in detail in Chapter 6.) Memory is often referred to using the notation L x W (length x width).
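The 32-bit alignment rule just described reduces to a simple divisibility test. A minimal Python sketch (the helper names are ours, for illustration):

WORD_SIZE_BYTES = 4  # a 32-bit word on a byte-addressable machine

def is_aligned(byte_address):
    """A naturally aligned 32-bit word starts at an address that is a multiple of 4."""
    return byte_address % WORD_SIZE_BYTES == 0

def word_boundary(byte_address):
    """Lowest byte address of the aligned word containing this byte."""
    return byte_address - (byte_address % WORD_SIZE_BYTES)

print(is_aligned(0x100))           # True
print(is_aligned(0x102))           # False: an unaligned 32-bit access
print(hex(word_boundary(0x102)))   # 0x100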


Page 186:
For example, 4M x 16 means the memory is 4M long (it has 4M = 2^2 x 2^20 = 2^22 words) and is 16 bits wide (each word is 16 bits). The width (the second number of the pair) represents the word size. To address this memory (assuming word addressing), we must be able to uniquely identify 2^22 different items, which means we need 2^22 different addresses. Because addresses are unsigned binary numbers, we need to count from 0 to (2^22 - 1) in binary. How many bits does this require? Well, to count from 0 to 3 in binary (for a total of 4 items), we need 2 bits. To count from 0 to 7 in binary (for a total of 8 items), we need 3 bits. To count from 0 to 15 in binary (for a total of 16 items), we need 4 bits. Do you see a pattern emerging here? Can you fill in the missing value for Table 4.1? The correct answer is 5 bits. In general, if a computer has 2^N addressable units of memory, it requires N bits to uniquely address each of them.

TABLE 4.1 Calculating the required number of address bits

Total Items:             2     4     8     16    32
Total as a Power of 2:   2^1   2^2   2^3   2^4   2^5
Number of Bits:          1     2     3     4     ??

Main memory is usually larger than one RAM chip. Consequently, these chips are combined into a single memory module to give the desired memory size. For example, suppose you need to build a 32K x 16 memory and all you have are 2K x 8 RAM chips. You could connect 16 rows and 2 columns of chips together, as shown in Figure 4.5. Each row of chips addresses 2K words (assuming the machine is word-addressable), but it requires two chips to handle the full width. Addresses for this memory must have 15 bits (there are 32K = 2^5 x 2^10 = 2^15 words to access). But each chip pair (each row) requires only 11 address lines (each chip pair holds only 2^11 words). In this situation, a decoder is needed to decode the leftmost 4 bits of the address to determine which chip pair holds the desired address. Once the proper chip pair has been located, the remaining 11 bits are input to another decoder to find the exact address within the chip pair.

A single shared memory module causes sequentialization of access.

[FIGURE 4.5: Memory as a collection of RAM chips: 16 rows, each made up of a pair of 2K x 8 chips.]
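The address-bit arithmetic and the two-level chip decoding described above can be sketched in a few lines of Python. The numbers mirror the 32K x 16 example; the function names are ours, for illustration only:

import math

def address_bits(num_items):
    """If a machine has 2^N addressable units, N bits are needed to address them."""
    return math.ceil(math.log2(num_items))

print(address_bits(4 * 2**20))   # 22 bits for 4M words
print(address_bits(32 * 2**10))  # 15 bits for 32K words

def split_address(addr, bits_per_chip=11):
    """For the 32K x 16 memory built from 2K x 8 chips: the leftmost 4 bits
    select the chip pair (row), the remaining 11 bits select the word in it."""
    row = addr >> bits_per_chip                   # decoder #1: which chip pair
    offset = addr & ((1 << bits_per_chip) - 1)    # decoder #2: word within the pair
    return row, offset

print(split_address(0b0101_01010101010))  # (5, 682)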


Page 187:
Memory interleaving, which splits memory across multiple memory modules (or banks), can be used to help relieve this. With low-order interleaving, the low-order bits of the address are used to select the bank; in high-order interleaving, the high-order bits of the address are used. High-order interleaving, the more intuitive organization, distributes the addresses so that each module contains consecutive addresses, as we see with the 32 addresses in Figure 4.6. Low-order interleaved memory places consecutive words of memory in different memory modules. Figure 4.7 shows low-order interleaving on 32 addresses. Because consecutive addresses then fall in different modules, accesses to them can proceed in parallel (that is, they can overlap).

[FIGURE 4.6: High-order memory interleaving. With eight modules, module 1 holds addresses 0 through 3, module 2 holds 4 through 7, and so on, up to module 8, which holds 28 through 31.]

[FIGURE 4.7: Low-order memory interleaving. With eight modules, module 1 holds addresses 0, 8, 16, 24; module 2 holds 1, 9, 17, 25; and so on, up to module 8, which holds 7, 15, 23, 31.]

The key concepts to focus on are: (1) memory addresses are unsigned binary values (although we often view them as hexadecimal values because it is easier), and (2) the number of items to be addressed determines the number of bits required in the address. Although we could always use more bits for the address than required, that is seldom done, because minimization is an important concept in computer design.

With this, we have introduced the basic hardware information required for a solid understanding of computer architecture: the CPU, buses, the control unit, registers, clocks, I/O, and memory. However, there is one more concept we need to cover that deals with how these components interact with the processor.
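Before moving on, note that the two interleaving schemes above map an address to a module number in opposite ways. A small Python sketch for the 32-address, 8-module example (function names are ours; modules are numbered from 0 here rather than from 1 as in the figures):

MODULES = 8
ADDRESSES = 32
WORDS_PER_MODULE = ADDRESSES // MODULES  # 4

def high_order_module(addr):
    """High-order interleaving: the high bits pick the module,
    so each module holds consecutive addresses."""
    return addr // WORDS_PER_MODULE

def low_order_module(addr):
    """Low-order interleaving: the low bits pick the module,
    so consecutive addresses land in different modules."""
    return addr % MODULES

print([high_order_module(a) for a in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print([low_order_module(a) for a in range(8)])   # [0, 1, 2, 3, 4, 5, 6, 7]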


Page 188:
4.1.6 Interrupts

Interrupts are events that alter (or interrupt) the normal flow of execution in the system. An interrupt can be triggered for a variety of reasons, including:

• I/O requests
• Arithmetic errors (e.g., division by zero)
• Arithmetic underflow or overflow
• Hardware malfunction (e.g., memory parity error)
• User-defined break points (such as when debugging a program)
• Page faults (covered in detail in Chapter 6)
• Invalid instructions (often the result of a bad pointer)
• Miscellaneous

The actions performed for each of these types of interrupts (called interrupt handling) are very different. Telling the CPU that an I/O request has finished is much different from terminating a program because of division by zero. But both actions are handled by interrupts because they require a change in the normal flow of the program's execution.

An interrupt can be initiated by the user or the system, can be maskable (disabled or ignored) or nonmaskable (a high-priority interrupt that cannot be disabled and must be acknowledged), can occur within or between instructions, may be synchronous (occurring at the same place every time a program is executed) or asynchronous (occurring unexpectedly), and can result in the program terminating or continuing execution once the interrupt is handled. Interrupts are covered in more detail in Section 4.3.2 and in Chapter 7.

Now that we have given a brief overview of the components necessary for a computer system to function, we proceed to introduce a simple, but functional, architecture to illustrate these concepts.

4.2 MARIE

MARIE, a Machine Architecture that is Really Intuitive and Easy, is a simple architecture consisting of memory (to store programs and data) and a CPU (consisting of an ALU and several registers). It has all the functional components necessary to be a real working computer. MARIE will help illustrate the concepts in this and the preceding three chapters. We describe MARIE's architecture in the following sections.

4.2.1 The Architecture

MARIE has the following characteristics:

• Binary, two's complement
• Stored program, fixed word length
• Word (but not byte) addressable
• 4K words of main memory (this implies 12 bits per address)


Page 189:
• 16-bit data (words have 16 bits)
• 16-bit instructions: 4 bits for the opcode and 12 bits for the address
• A 16-bit accumulator (AC)
• A 16-bit instruction register (IR)
• A 16-bit memory buffer register (MBR)
• A 12-bit program counter (PC)
• A 12-bit memory address register (MAR)
• An 8-bit input register
• An 8-bit output register

Figure 4.8 shows MARIE's architecture. Before we continue, we need to stress one important point about memory. In Chapter 3, we presented a simple memory built using D flip-flops. We emphasize again that each memory location has a unique address (represented in binary) and each location can hold a value. These notions of the address versus what is actually stored at that address tend to be confusing. To help avoid confusion, just visualize a post office. There are post office boxes with various "addresses" or numbers. Inside the post office box, there is mail. To get the mail, the number of the post office box must be known. The same is true for data or instructions that need to be fetched from memory. The contents of any memory address are manipulated by specifying the address of that memory location. We shall see that there are many different ways to specify this address.

[FIGURE 4.8: MARIE's architecture: the ALU, AC, InREG, OutREG, MAR, MBR, PC, IR, and control unit in the CPU, connected to main memory (addresses 0 through 4K-1).]
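MARIE's register set and 4K-word memory are easy to mirror in a few lines of Python. The following is our own illustrative model of the machine's storage, not code from the text; the same idea reappears in later sketches in this section.

class MarieState:
    """A bare-bones model of MARIE's storage: 4K 16-bit words of memory
    plus the registers listed above. Purely illustrative."""

    MEMORY_WORDS = 4096        # 4K words -> 12-bit addresses
    WORD_MASK = 0xFFFF         # 16-bit words
    ADDR_MASK = 0x0FFF         # 12-bit addresses

    def __init__(self):
        self.memory = [0] * self.MEMORY_WORDS
        self.AC = 0      # 16-bit accumulator
        self.IR = 0      # 16-bit instruction register
        self.MBR = 0     # 16-bit memory buffer register
        self.PC = 0      # 12-bit program counter
        self.MAR = 0     # 12-bit memory address register
        self.InREG = 0   # 8-bit input register
        self.OutREG = 0  # 8-bit output register

m = MarieState()
m.memory[0x104] = 0x0023   # a data word, as in the sample program later in the chapter
print(hex(m.memory[0x104]))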


Page 190:
4.2.2 Registers and Buses

Registers are storage locations within the CPU (as illustrated in Figure 4.8). The ALU (arithmetic logic unit) portion of the CPU performs all of the processing (arithmetic operations, logic decisions, and so on). The registers are used for very specific purposes when programs execute: they hold values for temporary storage, data that is being manipulated in some way, or results of simple calculations. Many times, registers are referenced implicitly in an instruction, as we see when we describe MARIE's instruction set in Section 4.2.3. In MARIE, there are seven registers, as follows:

• AC: The accumulator, which holds data values. This is a general-purpose register, and it holds data that the CPU needs to process. Most computers today have multiple general-purpose registers.
• MAR: The memory address register, which holds the memory address of the data being referenced.
• MBR: The memory buffer register, which holds either the data just read from memory or the data ready to be written to memory.
• PC: The program counter, which holds the address of the next instruction to be executed in the program.
• IR: The instruction register, which holds the next instruction to be executed.
• InREG: The input register, which holds data from the input device.
• OutREG: The output register, which holds data for the output device.

The MAR, MBR, PC, and IR hold very specific information and cannot be used for anything other than their stated purposes. For example, we could not store an arbitrary data value from memory in the PC. We must use the MBR or the AC to store this arbitrary value. In addition, there is a status or flag register that holds information indicating various conditions, such as an overflow in the ALU. However, for the sake of clarity, we do not include that register explicitly in any figures.

MARIE is a very simple computer with a limited register set. Modern CPUs have multiple general-purpose registers, often called user-visible registers, that perform functions similar to those of the AC. Today's computers also have additional registers; for example, some computers have registers that shift data values and others that, if taken as a set, can be treated as a list of values.

MARIE cannot transfer data or instructions into or out of registers without a bus. In MARIE, we assume a common bus scheme. Each device connected to the bus has a number, and before the device can use the bus, it must be set to that identifying number. We also have some pathways to speed up execution. We have a communication path between the MAR and memory (the MAR provides the inputs to the memory address lines so the CPU knows where in memory to read or write), and a separate path from the MBR to the AC. There is also a special path from the MBR to the ALU to allow the data in the MBR to be used in arithmetic operations. Information can also flow from the AC through the ALU and back into the AC without being put on the common bus. The advantage gained by using these additional pathways is that information can be put on the common


Page 191:
bus in the same clock cycle in which data is put on these other pathways, allowing these events to take place in parallel. Figure 4.9 shows the datapath (the path that information follows) in MARIE.

[FIGURE 4.9: The datapath in MARIE: main memory, MAR, PC, MBR, ALU, AC, InREG, OutREG, and IR connected by a 16-bit common bus, with each bus device identified by a number from 0 through 7.]

4.2.3 The Instruction Set Architecture

MARIE has a very simple, yet powerful, instruction set. The instruction set architecture (ISA) of a machine specifies the instructions that the computer can perform and the format for each instruction. The ISA is essentially an interface between the software and the hardware. Some ISAs include hundreds of instructions. We mentioned previously that each instruction for MARIE consists of 16 bits. The most significant 4 bits, bits 12 through 15, make up the opcode that specifies the instruction to be executed (which allows for a total of 16 instructions). The least significant 12 bits, bits 0 through 11, form an address, which allows for a maximum memory address of 2^12 - 1. The instruction format for MARIE is shown in Figure 4.10.


Page 192:
[FIGURE 4.10: MARIE's instruction format: bits 15 through 12 hold the opcode, bits 11 through 0 hold the address.]

Most ISAs consist of instructions for processing data, moving data, and controlling the execution sequence of the program. MARIE's instruction set consists of the instructions shown in Table 4.2.

TABLE 4.2 MARIE's instruction set

Bin    Hex   Instruction   Meaning
0001   1     Load X        Load the contents of address X into the AC.
0010   2     Store X       Store the contents of the AC at address X.
0011   3     Add X         Add the contents of address X to the AC and store the result in the AC.
0100   4     Subt X        Subtract the contents of address X from the AC and store the result in the AC.
0101   5     Input         Input a value from the keyboard into the AC.
0110   6     Output        Output the value in the AC to the display.
0111   7     Halt          Terminate the program.
1000   8     Skipcond      Skip the next instruction on condition.
1001   9     Jump X        Load the value of X into the PC.

The Load instruction allows us to move data from memory into the CPU (via the MBR and the AC). All data (which includes anything that is not an instruction) from memory must move first into the MBR and then into either the AC or the ALU; there are no other options in this architecture. Notice that the Load instruction does not have to name the AC as the final destination; this register is implicit in the instruction. Other instructions reference the AC register in a similar fashion. The Store instruction allows us to move data from the CPU back to memory. The Add and Subt instructions, respectively, add and subtract the data value found at address X to or from the value in the AC. The data located at address X is copied into the MBR, where it is held until the arithmetic operation is executed. Input and Output allow MARIE to communicate with the outside world.

Input and output are complicated operations. In modern computers, input and output are done using ASCII bytes. This means that if you type the number 32 on the keyboard as input, it is actually read in as the ASCII character "3" followed by "2". These two characters must be converted to the numeric value 32 before they are stored in the AC. Because we are focusing on how a computer works, we will assume that a value input from the keyboard is "automatically" converted correctly. We are glossing over a very important concept: how does the computer know whether an input/output value is to be treated as numeric or ASCII, if everything that is input or output is really ASCII?
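Table 4.2 maps directly onto a small lookup table, and packing an instruction is just a shift and a mask. The following Python sketch is our own illustration (the helper names are invented), but the resulting words match the hexadecimal machine code used later in the chapter:

OPCODES = {
    "Load": 0x1, "Store": 0x2, "Add": 0x3, "Subt": 0x4,
    "Input": 0x5, "Output": 0x6, "Halt": 0x7, "Skipcond": 0x8, "Jump": 0x9,
}

def assemble_instruction(mnemonic, address=0):
    """Pack a 4-bit opcode (bits 15-12) and a 12-bit address (bits 11-0)."""
    return (OPCODES[mnemonic] << 12) | (address & 0x0FFF)

print(hex(assemble_instruction("Load", 0x104)))   # 0x1104
print(hex(assemble_instruction("Add", 0x105)))    # 0x3105
print(hex(assemble_instruction("Halt")))          # 0x7000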


Page 193:
The answer is that the computer knows, through the context of how the value is used, how to interpret it. In MARIE, we assume numeric input and output only. We also allow values to be input as decimal and assume there is a "magic conversion" to the actual binary values that are stored. In reality, these are issues that must be addressed if a computer is to work properly.

The Halt command causes the current program execution to terminate. The Skipcond instruction allows us to perform conditional branching (as is done with "while" loops or "if" statements). When the Skipcond instruction is executed, the value stored in the AC must be inspected. Two of the address bits (assume we always use the two address bits closest to the opcode field, bits 10 and 11) specify the condition to be tested. If the two address bits are 00, this translates to "skip if the AC is negative." If the two address bits are 01 (bit eleven is 0 and bit ten is 1), this translates to "skip if the AC is equal to 0." Finally, if the two address bits are 10 (or 2), this translates to "skip if the AC is greater than 0." By "skip" we simply mean jump over the next instruction. This is accomplished by incrementing the PC by 1, essentially ignoring the following instruction, which is never fetched. The Jump instruction, an unconditional branch, also affects the PC. This instruction causes the contents of the PC to be replaced with the value of X, which is the address of the next instruction to fetch.

We wish to keep both the architecture and the instruction set as simple as possible, while still conveying the information necessary to understand how a computer works. Therefore, we have omitted several useful instructions. However, you will see shortly that this instruction set is still quite powerful. Once you gain familiarity with how the machine works, we will extend the instruction set to make programming easier.

Let's examine the instruction format used in MARIE. Suppose we have the following 16-bit instruction:

opcode        address
0001          0000 0000 0011
(bits 15-12)  (bits 11-0)

The leftmost 4 bits indicate the opcode, or the instruction to be executed. 0001 is binary for 1, which represents the Load instruction. The remaining 12 bits indicate the address of the value we are loading, which is address 3 in main memory. This instruction causes the data value found at main memory address 3 to be copied into the AC. Consider another instruction:

opcode        address
0011          0000 0000 1101
(bits 15-12)  (bits 11-0)

The leftmost four bits, 0011, are equal to 3, which is the Add instruction. The address bits indicate address 00D in hexadecimal (or 13 decimal). We go to main memory, get the data value at address 00D, and add this value to the AC. The value in the AC would then change to reflect this sum.
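Decoding such a word is the reverse of packing it: shift out the opcode and mask off the address. A short illustrative Python sketch (the helper names are ours) applied to the two example instructions above:

MNEMONICS = {1: "Load", 2: "Store", 3: "Add", 4: "Subt", 5: "Input",
             6: "Output", 7: "Halt", 8: "Skipcond", 9: "Jump"}

def decode(word):
    """Split a 16-bit MARIE instruction into its opcode and 12-bit address."""
    opcode = word >> 12          # bits 15-12
    address = word & 0x0FFF      # bits 11-0
    return MNEMONICS[opcode], address

print(decode(0b0001_0000_0000_0011))  # ('Load', 3)
print(decode(0b0011_0000_0000_1101))  # ('Add', 13), i.e., Add 00D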


Page 194:
Here is one more example:

opcode        address
1000          1000 0000 0000
(bits 15-12)  (bits 11-0)

The opcode for this instruction represents the Skipcond instruction. Bits ten and eleven (read left to right, or bit eleven followed by bit ten) are 10, indicating a value of 2. This implies a "skip if the AC is greater than 0." If the value in the AC is less than or equal to zero, this instruction is ignored and we simply go on to the next instruction. If the value in the AC is greater than zero, this instruction causes the PC to be incremented by 1, thus causing the instruction immediately following this one in the program to be skipped (keep this in mind as you read the following section on the instruction cycle).

These examples bring up an interesting point. We will be writing programs using this limited instruction set. Would you rather write a program using the commands Load, Add, and Halt, or their binary equivalents 0001, 0011, and 0111? Most people prefer to use the instruction name, or mnemonic, for the instruction, instead of the binary value of the instruction. Our binary instructions are called machine instructions. The corresponding mnemonic instructions are what we refer to as assembly language instructions. There is a one-to-one correspondence between assembly language and machine instructions. When we write a program in assembly language (that is, using the instructions listed in Table 4.2), we need an assembler to convert it to its binary equivalent. We discuss assemblers in Section 4.5.

4.2.4 Register Transfer Notation

We have seen that digital systems consist of many components, including arithmetic logic units, registers, memory, decoders, and control units. These units are interconnected by buses to allow information to flow through the system. The instruction set presented for MARIE in the preceding sections constitutes a set of machine-level instructions used by these components to execute a program. Each instruction appears to be very simple; however, if you examine what actually happens at the component level, each instruction involves multiple operations. For example, the Load instruction loads the contents of the given memory location into the AC register. But, if we observe what is happening at the component level, we see that multiple "mini-instructions" are being executed. First, the address of the instruction must be loaded into the MAR. Then the data at that memory location must be loaded into the MBR. Then the MBR must be loaded into the AC. These mini-instructions are called micro-operations, and they specify the elementary operations that can be performed on data stored in registers.

The symbolic notation used to describe the behavior of micro-operations is called register transfer notation (RTN) or register transfer language (RTL). We use the notation M[X] to indicate the actual data stored at location X in memory, and ← to indicate a transfer of information. In reality, a transfer from one register to another always involves a transfer onto the bus from the source register, and then a transfer off the bus into the destination register. However, as a matter of


Page 195:
clarity, we do not include these bus transfers, assuming that you understand that the bus must be used for data transfer. We now present the register transfer notation for each of the instructions in the ISA for MARIE.

Load X

Recall that this instruction loads the contents of memory location X into the AC. However, the address X must first be placed into the MAR. Then the data at location M[MAR] (or address X) is moved into the MBR. Finally, this data is placed in the AC.

MAR ← X
MBR ← M[MAR]
AC ← MBR

Because the IR must use the bus to copy the value of X into the MAR before the data at location X can be placed into the MBR, this operation requires two bus cycles. Therefore, these two operations are on separate lines to indicate that they cannot occur during the same cycle. However, because we have a special connection between the MBR and the AC, the transfer of the data from the MBR to the AC can occur immediately after the data is put into the MBR, without waiting for the bus.

Store X

This instruction stores the contents of the AC in memory location X:

MAR ← X
MBR ← AC
M[MAR] ← MBR

Add X

The data value stored at address X is added to the AC. This can be accomplished as follows:

MAR ← X
MBR ← M[MAR]
AC ← AC + MBR

Subt X

Similar to Add, this instruction subtracts the value stored at address X from the accumulator and places the result back in the AC:

MAR ← X
MBR ← M[MAR]
AC ← AC - MBR
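The RTN above translates almost line for line into code. Below is a self-contained, illustrative Python sketch (our own model, using a simple dictionary of registers) of the micro-operations for Load X and Add X:

def make_state():
    # A minimal stand-in for MARIE's storage: registers plus 4K words of memory.
    return {"AC": 0, "MAR": 0, "MBR": 0, "memory": [0] * 4096}

def load(state, x):
    state["MAR"] = x                                   # MAR <- X
    state["MBR"] = state["memory"][state["MAR"]]       # MBR <- M[MAR]
    state["AC"] = state["MBR"]                         # AC  <- MBR

def add(state, x):
    state["MAR"] = x                                   # MAR <- X
    state["MBR"] = state["memory"][state["MAR"]]       # MBR <- M[MAR]
    state["AC"] = (state["AC"] + state["MBR"]) & 0xFFFF  # AC <- AC + MBR (16-bit wraparound)

s = make_state()
s["memory"][0x104] = 0x0023   # 35
s["memory"][0x105] = 0xFFE9   # -23 in 16-bit two's complement
load(s, 0x104)
add(s, 0x105)
print(hex(s["AC"]))           # 0xc (decimal 12)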


Page 196:
Input

Any input from the input device is first routed into the InREG. Then the data is transferred into the AC.

AC ← InREG

Output

This instruction causes the contents of the AC to be placed into the OutREG, where it is eventually sent to the output device.

OutREG ← AC

Halt

No operations are performed on registers; the machine simply ceases execution.

Skipcond

Recall that this instruction uses the bits in positions 10 and 11 of the address field to determine what comparison to perform on the AC. Depending on this bit combination, the AC is checked to see whether it is negative, equal to zero, or greater than zero. If the given condition is true, then the next instruction is skipped. This is performed by incrementing the PC register by 1.

If IR[11-10] = 00 then          {if bits 10 and 11 in the IR are both 0}
    If AC < 0 then PC ← PC + 1
else if IR[11-10] = 01 then     {if bit 11 = 0 and bit 10 = 1}
    If AC = 0 then PC ← PC + 1
else if IR[11-10] = 10 then     {if bit 11 = 1 and bit 10 = 0}
    If AC > 0 then PC ← PC + 1

If the bits in positions ten and eleven are both ones, an error condition results. However, an additional condition could also be defined using these bit values.

Jump X

This instruction causes an unconditional branch to the given address, X. Therefore, to execute this instruction, X must be loaded into the PC.

PC ← X

In reality, the lower, or least significant, 12 bits of the instruction register (or IR[11-0]) reflect the value of X. So this transfer is more accurately represented as:

PC ← IR[11-0]

However, we feel the notation PC ← X is easier to understand and relate to the actual instructions, so we use it instead.

Register transfer notation is a symbolic means of expressing what is happening in the system when a given instruction is executing.
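The Skipcond RTN above can be expressed as a tiny Python function; the helper name and return convention are ours, for illustration. The AC argument is assumed to be the signed interpretation of the 16-bit accumulator value.

def skipcond(ac, ir):
    """Return True if the PC should be incremented (i.e., the next
    instruction skipped), based on bits 11-10 of the IR and the AC."""
    condition = (ir >> 10) & 0b11     # IR[11-10]
    if condition == 0b00:
        return ac < 0                 # skip if AC is negative
    if condition == 0b01:
        return ac == 0                # skip if AC equals 0
    if condition == 0b10:
        return ac > 0                 # skip if AC is greater than 0
    raise ValueError("IR[11-10] = 11 is an error condition in MARIE")

print(skipcond(-5, 0b1000_0000_0000_0000))  # True  (condition 00: AC < 0)
print(skipcond(7,  0b1000_1000_0000_0000))  # True  (condition 10: AC > 0)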


Page 197:
RTN is sensitive to the datapath, in that if several micro-operations must share the bus, they must be executed in sequence, one after the other.

4.3 INSTRUCTION PROCESSING

Now that we have a basic language with which to communicate ideas to our computer, we need to discuss exactly how a specific program is executed. All computers follow a basic machine cycle: the fetch, decode, and execute cycle.

4.3.1 The Fetch-Decode-Execute Cycle

The fetch-decode-execute cycle represents the steps that a computer follows to run a program. The CPU fetches an instruction (transfers it from main memory to the instruction register), decodes it (determines the opcode and fetches any data necessary to carry out the instruction), and executes it (performs the operations indicated by the instruction). Notice that a large part of this cycle is spent copying data from one location to another. When a program is initially loaded, the address of the first instruction must be placed in the PC. The steps in this cycle, which take place in specific clock cycles, are listed below. Note that steps 1 and 2 make up the fetch phase, step 3 makes up the decode phase, and step 4 is the execute phase.

1. Copy the contents of the PC to the MAR: MAR ← PC.
2. Go to main memory and fetch the instruction found at the address in the MAR, placing this instruction in the IR; increment the PC by 1 (the PC now points to the next instruction in the program): IR ← M[MAR] and then PC ← PC + 1. (Note: Because MARIE is word-addressable, the PC is incremented by one, which results in the next word's address occupying the PC. If MARIE were byte-addressable, the PC would need to be incremented by 2 to point to the address of the next instruction, because each instruction would require two bytes. On a byte-addressable machine with 32-bit words, the PC would need to be incremented by 4.)
3. Copy the rightmost 12 bits of the IR into the MAR; decode the leftmost four bits to determine the opcode: MAR ← IR[11-0], and decode IR[15-12].
4. If necessary, use the address in the MAR to go to memory for data, placing the data in the MBR (and possibly the AC), and then execute the instruction: MBR ← M[MAR] and execute the actual instruction.

This cycle is illustrated in the flowchart in Figure 4.11.

Note that computers today, even with their large instruction sets, long instructions, and huge memories, can perform millions of these fetch-decode-execute cycles in the blink of an eye.

4.3.2 Interrupts and I/O

Chapter 7 is dedicated to input and output. However, we will discuss some basic I/O concepts at this point, to make sure you understand the entire process of executing a program.
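Returning to the four steps listed above, here is a self-contained, illustrative Python sketch of the fetch-decode-execute cycle, covering just enough of MARIE's instruction set (Load, Store, Add, Halt) to run the add-two-numbers program presented in Section 4.4. The names and structure are our own sketch, not the text's.

def run(memory, pc=0x100):
    """A toy fetch-decode-execute loop for a subset of MARIE."""
    ac = 0
    while True:
        # Fetch: MAR <- PC; IR <- M[MAR]; PC <- PC + 1
        mar = pc
        ir = memory[mar]
        pc = (pc + 1) & 0x0FFF
        # Decode: opcode from bits 15-12, address from bits 11-0
        opcode, mar = ir >> 12, ir & 0x0FFF
        # Execute (with an operand fetch where needed: MBR <- M[MAR])
        if opcode == 0x1:                      # Load X
            ac = memory[mar]
        elif opcode == 0x2:                    # Store X
            memory[mar] = ac
        elif opcode == 0x3:                    # Add X
            ac = (ac + memory[mar]) & 0xFFFF
        elif opcode == 0x7:                    # Halt
            return ac
        else:
            raise ValueError(f"opcode {opcode:#x} not implemented in this sketch")

# The add-two-numbers program of Section 4.4, loaded starting at address 100 (hex).
mem = [0] * 4096
mem[0x100:0x107] = [0x1104, 0x3105, 0x2106, 0x7000, 0x0023, 0xFFE9, 0x0000]
run(mem)
print(hex(mem[0x106]))   # 0xc, i.e., decimal 12 stored at address 106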


Page 198:
[FIGURE 4.11: The fetch-decode-execute cycle. Copy the PC to the MAR; copy the memory contents at address MAR to the IR and increment the PC by 1; decode the instruction and place IR bits [11-0] in the MAR; if the instruction requires an operand, copy the memory contents at address MAR into the MBR; execute the instruction.]

MARIE has two registers to accommodate input and output. The input register holds data being transferred from an input device into the computer; the output register holds information ready to be sent to an output device. The timing used by these two registers is very important. For example, if you are entering input from the keyboard and typing very quickly, the computer must be able to read each character that is put into the input register. If another character is entered into that register before the computer has a chance to process the current character, the current character is lost. It is more likely, because the processor is very fast and keyboard input is very slow, that the processor might read the same character from the input register multiple times. We must avoid both of these situations.

MARIE addresses these problems by using interrupt-driven I/O. (A detailed discussion of the various types of I/O can be found in Chapter 7.) When the CPU executes an input or output instruction, the appropriate I/O device is notified.


Page 199:
The CPU continues with other useful work until the device is ready. At that time, the device sends an interrupt signal to the CPU. The CPU then processes the interrupt, after which it continues with the normal fetch-decode-execute cycle. This process requires the following:

• A signal (interrupt) from the I/O device to the CPU indicating that input or output is complete
• Some means of allowing the CPU to deviate from the usual fetch-decode-execute cycle to "acknowledge" this interrupt

The method most computers use to process an interrupt is to check to see whether an interrupt is pending at the beginning of each fetch-decode-execute cycle. If so, the interrupt is processed, after which the machine's execution cycle continues. If no interrupt is present, processing continues as normal. The path of execution is illustrated in the flowchart in Figure 4.12.

[FIGURE 4.12: The instruction cycle modified to check, before each fetch, whether an interrupt was issued; if so, the interrupt is processed before normal processing continues.]

Typically, the input or output device sends an interrupt by using a special register, the status or flag register. A special bit is set to indicate that an interrupt has occurred. For example, as soon as input is entered from the keyboard, this bit is set. The CPU checks this bit at the beginning of every machine cycle. When it is set, the CPU processes an interrupt. When it is not set, the CPU performs a normal fetch-decode-execute cycle, processing instructions of the program it is currently executing.

When the CPU finds the interrupt bit set, it executes an interrupt routine that is determined by the type of interrupt that has occurred. Input/output interrupts are not the only types of interrupts that can occur when a program is executing. Have you ever typed Ctrl-break or Ctrl-C to stop a program? This is another example of an interrupt.
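The "check for a pending interrupt at the top of each cycle" policy described above can be sketched as a thin wrapper around the machine cycle. The following Python is purely illustrative; the flag variable and handler are stand-ins, not MARIE's actual mechanism.

interrupt_pending = False   # stand-in for the interrupt bit in the status/flag register

def handle_interrupt():
    global interrupt_pending
    print("servicing interrupt (e.g., reading the input register)")
    interrupt_pending = False

def machine_cycle(step):
    # Check the interrupt bit before the normal fetch-decode-execute cycle.
    if interrupt_pending:
        handle_interrupt()
    print(f"fetch-decode-execute step {step}")

for step in range(3):
    if step == 1:
        interrupt_pending = True   # simulate a keyboard interrupt arriving
    machine_cycle(step)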


Page 200:
There are external interrupts generated by an external event (such as input/output or power failure), internal interrupts generated by some exception condition in the program (such as division by zero, stack overflow, or protection violations), and software interrupts generated by the execution of an instruction in the program (such as one that requires a program to change from running at one level, such as the user level, to another level, such as the kernel level). Regardless of which type of interrupt has been invoked, the interrupt handling process is the same. After the CPU recognizes an interrupt request, the address of the interrupt service routine is determined (usually by hardware) and the routine (much like a procedure) is executed. The CPU switches from running the program to running a specialized procedure to deal with the interrupt. The normal fetch-decode-execute cycle is run on the instructions in the interrupt service routine until that code has been run to completion. The CPU then returns to the program it was running before the interrupt occurred. The CPU must return to the exact point at which it was running in the original program. Therefore, when the CPU switches to the interrupt service routine, it must save the contents of the PC, the contents of all other registers in the CPU, and any status conditions that exist for the original program. When the interrupt service routine is finished, the CPU restores the exact same environment in which the original program was running, and then begins fetching, decoding, and executing instructions for the original program.

4.4 A SIMPLE PROGRAM

We now present a simple program written for MARIE. In Section 4.6, we present several additional examples to illustrate the power of this minimal architecture. It can even be used to run programs with procedures, various looping constructs, and different selection options.

Our first program adds two numbers together (both of which are found in main memory), storing the sum in memory. (We forgo input/output for now.) Table 4.3 lists an assembly language program to do this, along with its corresponding machine language program. The list of instructions under the Instruction column constitutes the actual assembly language program.

TABLE 4.3 A program to add two numbers

Hex Address   Instruction   Binary Contents of Memory Address   Hex Contents of Memory
100           Load 104      0001000100000100                    1104
101           Add 105       0011000100000101                    3105
102           Store 106     0010000100000110                    2106
103           Halt          0111000000000000                    7000
104           0023          0000000000100011                    0023
105           FFE9          1111111111101001                    FFE9
106           0000          0000000000000000                    0000

We know that the fetch-decode-execute cycle starts by fetching the first instruction of the program, which it finds by loading the PC with the address of the first instruction when the program is loaded for execution. For simplicity, let's assume our programs in MARIE are always loaded starting at address 100 (in hexadecimal). It is usually easiest for


Page 201:
humans to read hexadecimal as opposed to binary, so the actual contents of memory are also displayed in hexadecimal.

This program loads 0023 (hexadecimal), or decimal value 35, into the AC. It then adds the hexadecimal value FFE9 (decimal -23) that it finds at address 105. This results in a value of 12 in the AC. The Store instruction stores this value at memory location 106. When the program is done, the binary contents of location 106 change to 0000000000001100, which is hexadecimal 000C, or decimal 12.

Figure 4.13 shows a trace of this program as it executes: part (a) traces the Load, part (b) the Add, and part (c) the Store, which places the sum in the proper memory location. The step "decode IR[15-12]" simply means the instruction must be decoded to determine what is to be done. This decoding can be done in software (using a microprogram) or in hardware (using hardwired circuits). These two concepts are covered in more detail in Section 4.7.

Note that there is a one-to-one correspondence between the assembly language and the machine language instructions. This makes it easy to convert assembly language into machine code. Using the instruction tables given in this chapter, you should be able to hand assemble any of our example programs. For this reason, we look only at assembly language code from this point on. But before we present more programming examples, a discussion of the assembly process is in order.

4.5 A DISCUSSION ON ASSEMBLERS

In the program of Table 4.3, it is a simple matter to convert, for example, the assembly language instruction Load 104 into the machine language instruction 1104 (in hexadecimal). But why bother with this conversion at all? Why not just write in machine code? Although it is very efficient for computers to see these instructions as binary numbers, it is difficult for human beings to understand and program in sequences of 0s and 1s. We prefer words and symbols over long numbers, so it seems a natural solution to devise a program that does this simple conversion for us. This program is called an assembler.

4.5.1 What Do Assemblers Do?

An assembler's job is to convert assembly language (which uses mnemonics) into machine language (which consists entirely of binary values, or strings of zeros and ones). Assemblers take a programmer's assembly language program, which is really a symbolic representation of the binary numbers, and convert it into binary instructions, or the machine code equivalent. The assembler reads a source file (the assembly program) and produces an object file (the machine code).

Substituting simple alphanumeric names for the opcodes makes programming much easier. We can also substitute labels (simple names) to identify or name particular memory addresses, making the task of writing assembly programs even simpler.
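The data words in this program are 16-bit two's complement values. A quick Python check (the helper name is ours, for illustration) that 0023 and FFE9 really are 35 and -23, and that their 16-bit sum is the 12 stored at location 106:

def to_signed16(word):
    """Interpret a 16-bit word as a two's complement value."""
    return word - 0x10000 if word & 0x8000 else word

print(to_signed16(0x0023))                       # 35
print(to_signed16(0xFFE9))                       # -23
print(to_signed16((0x0023 + 0xFFE9) & 0xFFFF))   # 12, stored at address 106 as 000C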


Page 202:
a) Load 104

Step               RTN                    PC    IR    MAR   MBR   AC
(initial values)                          100   ----  ----  ----  ----
Fetch              MAR ← PC               100   ----  100   ----  ----
                   IR ← M[MAR]            100   1104  100   ----  ----
                   PC ← PC + 1            101   1104  100   ----  ----
Decode             MAR ← IR[11-0]         101   1104  104   ----  ----
                   (Decode IR[15-12])     101   1104  104   ----  ----
Get Operand        MBR ← M[MAR]           101   1104  104   0023  ----
Execute            AC ← MBR               101   1104  104   0023  0023

b) Add 105

Step               RTN                    PC    IR    MAR   MBR   AC
(initial values)                          101   1104  104   0023  0023
Fetch              MAR ← PC               101   1104  101   0023  0023
                   IR ← M[MAR]            101   3105  101   0023  0023
                   PC ← PC + 1            102   3105  101   0023  0023
Decode             MAR ← IR[11-0]         102   3105  105   0023  0023
                   (Decode IR[15-12])     102   3105  105   0023  0023
Get Operand        MBR ← M[MAR]           102   3105  105   FFE9  0023
Execute            AC ← AC + MBR          102   3105  105   FFE9  000C

c) Store 106

Step               RTN                    PC    IR    MAR   MBR   AC
(initial values)                          102   3105  105   FFE9  000C
Fetch              MAR ← PC               102   3105  102   FFE9  000C
                   IR ← M[MAR]            102   2106  102   FFE9  000C
                   PC ← PC + 1            103   2106  102   FFE9  000C
Decode             MAR ← IR[11-0]         103   2106  106   FFE9  000C
                   (Decode IR[15-12])     103   2106  106   FFE9  000C
Get Operand        (not necessary)        103   2106  106   FFE9  000C
Execute            MBR ← AC               103   2106  106   000C  000C
                   M[MAR] ← MBR           103   2106  106   000C  000C

FIGURE 4.13   A trace of the program to add two numbers
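The trace in Figure 4.13 can be reproduced with a few lines of Python. The sketch below is not MARIE's real simulator; it is a minimal model that assumes a dictionary for memory and handles only the four opcodes this program uses, printing the registers after each instruction much as the figure does.

# Minimal MARIE sketch covering only Load, Add, Store, and Halt (Table 4.3 program).
memory = {0x100: 0x1104, 0x101: 0x3105, 0x102: 0x2106, 0x103: 0x7000,
          0x104: 0x0023, 0x105: 0xFFE9, 0x106: 0x0000}

PC, AC = 0x100, 0
running = True
while running:
    MAR = PC                              # fetch: MAR <- PC
    IR = memory[MAR]                      #        IR  <- M[MAR]
    PC = (PC + 1) & 0xFFF                 #        PC  <- PC + 1
    opcode, MAR = IR >> 12, IR & 0xFFF    # decode IR[15-12] and IR[11-0]
    if opcode == 0x1:                     # Load X:  AC <- M[X]
        MBR = memory[MAR]; AC = MBR
    elif opcode == 0x3:                   # Add X:   AC <- AC + M[X]
        MBR = memory[MAR]; AC = (AC + MBR) & 0xFFFF
    elif opcode == 0x2:                   # Store X: M[X] <- AC
        MBR = AC; memory[MAR] = MBR
    elif opcode == 0x7:                   # Halt
        running = False
    print(f"IR={IR:04X}  PC={PC:03X}  MAR={MAR:03X}  AC={AC:04X}")

print(f"M[106] = {memory[0x106]:04X}")    # 000C, i.e. decimal 12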


Page 203:
     Address   Instruction
     100       Load   X
     101       Add    Y
     102       Store  Z
     103       Halt
X,   104       0023
Y,   105       FFE9
Z,   106       0000

TABLE 4.4   An example using labels

Using labels to mark memory addresses makes it unnecessary to know the exact physical address of an instruction's operands; Table 4.4 illustrates this concept. When the address field of an instruction is a label instead of an actual physical address, the assembler must still translate it into a real address in main memory. Most assembly languages allow labels, and assemblers often specify formatting rules for their instructions, including those with labels. For example, a label might be limited to three characters and might be required to appear as the first field of the statement. MARIE requires that labels be followed by a comma.

Labels are convenient for programmers, but they mean more work for the assembler. It must make two passes through the program to do the translation; that is, it reads the program twice, from top to bottom each time. On the first pass, the assembler builds a set of correspondences called a symbol table. For the example above, it builds a table with three symbols: X, Y, and Z. Because the assembler works through the code from top to bottom, it cannot translate an entire assembly language instruction into machine code in one pass; it does not know where the data portion of the instruction is located if it is given only a label. After it has built the symbol table, however, it can make a second pass and "fill in the blanks."

The first pass of the assembler creates the following symbol table for the program above:

     X   104
     Y   105
     Z   106

It also begins translating the instructions. After the first pass, the translated instructions are incomplete:

     1   X
     3   Y
     2   Z
     7   0   0   0

On the second pass, the assembler uses the symbol table to fill in the addresses and create the corresponding machine language instructions. It now knows that X is located at address 104 and substitutes 104 for X; a similar substitution for Y and Z yields:

     1   1   0   4
     3   1   0   5
     2   1   0   6
     7   0   0   0
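A minimal sketch of the two-pass process just described is given below in Python (illustrative only, not MARIE's actual assembler): it handles just the four mnemonics used in Table 4.4, assumes the program is loaded at address 100 (hexadecimal), and treats any field that is not a known mnemonic as a literal hexadecimal data word.

# Illustrative two-pass assembler sketch for the Table 4.4 program.
OPCODES = {"Load": 0x1, "Add": 0x3, "Store": 0x2, "Halt": 0x7}

source = [
    (None, "Load",  "X"),
    (None, "Add",   "Y"),
    (None, "Store", "Z"),
    (None, "Halt",  None),
    ("X",  "0023",  None),
    ("Y",  "FFE9",  None),
    ("Z",  "0000",  None),
]

# Pass 1: assign an address to each line and record every label in the symbol table.
symbols = {}
for offset, (label, _, _) in enumerate(source):
    if label is not None:
        symbols[label] = 0x100 + offset

# Pass 2: fill in the blanks using the symbol table and emit the machine words.
for offset, (label, field, operand) in enumerate(source):
    if field in OPCODES:
        address = symbols[operand] if operand else 0
        word = (OPCODES[field] << 12) | address
    else:
        word = int(field, 16)              # a literal data word such as FFE9
    print(f"{0x100 + offset:03X}  {word:04X}")

The output is the completed machine code shown above: 1104, 3105, 2106, 7000, followed by the three data words.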


Page 204:
Because most people are uncomfortable reading hexadecimal, most assembly languages allow the data values stored in memory to be specified as binary, hexadecimal, or decimal. Typically, some sort of assembler directive (an instruction specifically for the assembler that is not translated into machine code) is given to specify which base should be used to interpret the value. We use DEC for decimal and HEX for hexadecimal in MARIE's assembly language. For example, we can rewrite the program of Table 4.4 as shown in Table 4.5.

     Address   Instruction
     100       Load   X
     101       Add    Y
     102       Store  Z
     103       Halt
X,   104       DEC    35
Y,   105       DEC    -23
Z,   106       HEX    0000

TABLE 4.5   An example using directives for constants

Instead of requiring the actual binary data value (written in hexadecimal), we specify a decimal value using the DEC directive. The assembler recognizes this directive and converts the value accordingly before storing it in memory. Again, directives are not translated into machine language; they simply instruct the assembler in some way.

Another kind of directive common to virtually every programming language is the comment delimiter. Comment delimiters are special characters that tell the assembler (or compiler) to ignore all text following the special character. MARIE's comment delimiter is a forward slash ("/"), which causes all text between the delimiter and the end of the line to be ignored.

4.5.2 Why Use Assembly Language?

Our main objective in introducing MARIE's assembly language is to give you an idea of how the language relates to the architecture. Understanding how to program in assembly goes a long way toward understanding the architecture (and vice versa). Not only do you learn basic computer architecture, but you also learn exactly how the processor works and gain significant insight into the particular architecture on which you are programming. There are many other situations where assembly language programming is useful.


Page 205:
174, , Chapter 4 / MARIE: Introduction to a Simple Computer, , Most programmers agree that 10% of a program's code uses about 90% of the CPU time. In time critical applications we often need to optimize this 10% of the code. Normally, the compiler handles this optimization for us. The compiler takes a high-level language (such as C++) and turns it into assembly language (which is then converted to machine code). Compilers have been around for a long time, and in most cases they do a great job. From time to time, however, programmers must bypass some of the restrictions found in high-level languages ​​and manipulate the assembly code themselves. By doing this, programmers can make the program more efficient in terms of time (and space). This hybrid approach (most of the program is written in high-level language, with some rewritten in assembly language) allows the programmer to have the best of both worlds. Are there situations where entire programs must be written in assembly language? If overall program size or response time are critical, assembly language often becomes the language of choice. This is because compilers tend to obscure information about the cost (in time) of various operations, and programmers often find it difficult to judge exactly how their compiled programs will execute. Assembly language brings the programmer closer to the architecture and therefore to tighter control. Assembly language may be really necessary if the programmer wants to perform certain operations that are not available in a high level language. A perfect example, in terms of responsive performance and critical space design, is found in embedded systems. They are systems in which the computer is integrated with a device that is not normally a computer. Embedded systems must be reactive and are often found in time-constrained environments. These systems are designed to execute a single instruction or a very specific set of instructions. Most likely, you use some kind of embedded system every day. Consumer electronics (such as cameras, camcorders, mobile phones, PDAs, and interactive games), consumer products (such as washing machines, microwave ovens, and washing machines), automobiles (in particular, motor and anti-lock control, brakes), medical instruments (such as such as CAT scanners and heart rate monitors) and industry (for process controllers and avionics) are just a few examples of where we find embedded systems. Software for an embedded system is critical. An embedded software program must work within very specific response parameters and is limited in the amount of space it can consume. These are perfect applications for assembly language programming., , 4.6, , EXTENDING OUR INSTRUCTION SET, While MARIE's instruction set is enough to write any program we want, there are some instructions we can add to make the assembly programming much simpler. We have 4 bits allocated for the opcode, which means we can have 16 unique instructions and we're only using 9 of them. We added the instructions in Table 4.6 to expand our instruction set.


Page 206:
Instruction
Number (hex)   Instruction   Meaning
0              JnS X         Store the PC at address X and jump to X + 1.
A              Clear         Put all zeros in the AC.
B              AddI X        Add indirect: Go to address X. Use the value at X as the actual address of the data operand to add to the AC.
C              JumpI X       Jump indirect: Go to address X. Use the value at X as the actual address of the location to jump to.

TABLE 4.6   MARIE's extended instruction set

The JnS (jump-and-store) instruction allows us to store a pointer to a return instruction and then proceed to set the PC to a different instruction. This enables us to call procedures and other subroutines, and then return to the calling point in our code once the subroutine has finished. The Clear instruction moves all zeros into the accumulator, saving the machine cycles that would otherwise be spent loading a 0 operand from memory. The AddI instruction (as well as the JumpI instruction) uses a different addressing mode. All of the previous instructions assume that the value in the data portion of the instruction is the direct address of the operand required by the instruction. AddI uses the indirect addressing mode instead. (We present more on addressing modes in Chapter 5.) Rather than using the value found at location X as the actual address, we use the value found at X as a pointer to a new memory location that contains the data we wish to use in the instruction. For example, given the instruction AddI 400, we would go to location 400, and, assuming we found the value 240 stored there, we would go to location 240 to get the actual operand for the instruction. We have essentially allowed for pointers in our language. Returning to our discussion of register transfer notation, the new instructions are represented as follows:

JnS X
    MBR ← PC
    MAR ← X
    M[MAR] ← MBR
    MBR ← X
    AC ← 1
    AC ← AC + MBR
    PC ← AC


Page 207:
Clear
    AC ← 0

AddI X
    MAR ← X
    MBR ← M[MAR]
    MAR ← MBR
    MBR ← M[MAR]
    AC ← AC + MBR

JumpI X
    MAR ← X
    MBR ← M[MAR]
    PC ← MBR

Table 4.7 summarizes MARIE's entire instruction set. Let's look at some examples using the full instruction set.

EXAMPLE 4.1   Here is an example using a loop to add five numbers:

      Address   Instruction     Comments
      100       Load  Addr      /Load address of first number to be added
      101       Store Next      /Store this address as our Next pointer
      102       Load  Num       /Load the number of items to be added
      103       Subt  One       /Decrement
      104       Store Ctr       /Store this value in Ctr to control looping
      105       Clear           /Clear AC
Loop, 106       Load  Sum       /Load the Sum into AC
      107       AddI  Next      /Add the value pointed to by location Next
      108       Store Sum       /Store this sum
      109       Load  Next      /Load Next
      10A       Add   One       /Increment by one to point to the next address
      10B       Store Next      /Store in our Next pointer
      10C       Load  Ctr       /Load the loop control variable
      10D       Subt  One       /Subtract one from the loop control variable
      10E       Store Ctr       /Store this new value in the loop control variable
      10F       Skipcond 00     /If the control variable < 0, skip the next instruction
      110       Jump  Loop      /Otherwise, go to Loop
      111       Halt            /Terminate program
Addr, 112       HEX   118       /Numbers to be added start at location 118
Next, 113       HEX   0         /A pointer to the next number to add
Num,  114       DEC   5         /The number of values to add
Sum,  115       DEC   0         /The sum
Ctr,  116       HEX   0         /The loop control variable
One,  117       DEC   1         /Used to increment and decrement by 1
      118       DEC   10        /The values to be added together
      119       DEC   15
      11A       DEC   20
      11B       DEC   25
      11C       DEC   30
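The difference between direct and indirect addressing can be sketched in a few lines of Python, with a dictionary standing in for memory. The addresses and values below are illustrative (they are not taken from Example 4.1), and the last three lines only mimic the bookkeeping that JnS performs.

# Sketch of direct vs. indirect operand access, using a dictionary as memory.
memory = {0x400: 0x240, 0x240: 7, 0x300: 5}
AC = 0

# Add 300 (direct): the address field names the operand's location.
AC += memory[0x300]                  # AC = 5

# AddI 400 (indirect): the word at 400 (here 240) is a pointer to the operand.
AC += memory[memory[0x400]]          # AC = 5 + 7 = 12

# JnS 500: save the address of the instruction after the JnS at location 500,
# then continue execution at 501; this is how MARIE records where to return.
PC, X = 0x110, 0x500
memory[X] = PC                       # save the return address at X
PC = X + 1                           # jump to X + 1 (the subroutine body)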


Page 208:
Opcode   Instruction   RTN
0000     JnS X         MBR ← PC
                       MAR ← X
                       M[MAR] ← MBR
                       MBR ← X
                       AC ← 1
                       AC ← AC + MBR
                       PC ← AC
0001     Load X        MAR ← X
                       MBR ← M[MAR]
                       AC ← MBR
0010     Store X       MAR ← X
                       MBR ← AC
                       M[MAR] ← MBR
0011     Add X         MAR ← X
                       MBR ← M[MAR]
                       AC ← AC + MBR
0100     Subt X        MAR ← X
                       MBR ← M[MAR]
                       AC ← AC - MBR
0101     Input         AC ← InREG
0110     Output        OutREG ← AC
0111     Halt
1000     Skipcond      If IR[11-10] = 00 then
                           If AC < 0 then PC ← PC + 1
                       Else if IR[11-10] = 01 then
                           If AC = 0 then PC ← PC + 1
                       Else if IR[11-10] = 10 then
                           If AC > 0 then PC ← PC + 1
1001     Jump X        PC ← IR[11-0]
1010     Clear         AC ← 0
1011     AddI X        MAR ← X
                       MBR ← M[MAR]
                       MAR ← MBR
                       MBR ← M[MAR]
                       AC ← AC + MBR
1100     JumpI X       MAR ← X
                       MBR ← M[MAR]
                       PC ← MBR

TABLE 4.7   MARIE's full instruction set
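Because every MARIE instruction has the same layout, decoding a fetched word is just a matter of splitting off bit fields. The Python sketch below illustrates that split (the helper name decode and the dictionary are ours, not part of MARIE); the opcode values come from Table 4.7.

# Sketch: splitting a 16-bit MARIE word into the fields used in Table 4.7.
MNEMONICS = {0x0: "JnS", 0x1: "Load", 0x2: "Store", 0x3: "Add", 0x4: "Subt",
             0x5: "Input", 0x6: "Output", 0x7: "Halt", 0x8: "Skipcond",
             0x9: "Jump", 0xA: "Clear", 0xB: "AddI", 0xC: "JumpI"}

def decode(word):
    opcode = (word >> 12) & 0xF       # IR[15-12]
    address = word & 0xFFF            # IR[11-0]
    return MNEMONICS[opcode], address

mnemonic, address = decode(0x1104)
print(mnemonic, f"{address:03X}")     # Load 104

mnemonic, address = decode(0x8800)
print(mnemonic, f"{address:03X}")     # Skipcond 800 (bits 11-10 = 10, "skip if AC > 0")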


Page 209:
Although the comments are reasonably self-explanatory, let's walk through Example 4.1. Recall that the symbol table stores [label, location] pairs. The instruction Load Addr becomes Load 112, because Addr is located at physical memory address 112. The value 118 (the value stored at Addr) is then stored in Next. This is the pointer that allows us to "step through" the five values we are adding, which are located at addresses 118, 119, 11A, 11B, and 11C. The variable Ctr keeps track of how many iterations of the loop we have performed. Because we check whether Ctr is negative to terminate the loop, we start by subtracting one from Ctr. The sum (with an initial value of 0) is then loaded into the AC. The loop begins, using Next as the address of the data we wish to add to the AC. The Skipcond statement terminates the loop when Ctr is negative, by skipping over the unconditional branch back to the top of the loop. The program then terminates when the Halt statement is executed.

Example 4.2 shows how you can use the Skipcond and Jump instructions to perform selection. Although this example illustrates an if/else construct, you can easily modify it to perform an if/then construct, or even a case (or switch) construct.

EXAMPLE 4.2   This example illustrates the use of an if/else construct to allow for selection. In particular, it implements the following:

    if X = Y then
        X := X × 2
    else
        Y := Y − X;

        Address   Instruction     Comments
If,     100       Load  X         /Load the first value
        101       Subt  Y         /Subtract the value of Y and store the result in AC
        102       Skipcond 01     /If AC = 0, skip the next instruction
        103       Jump  Else      /Jump to the Else part if AC is not equal to 0
Then,   104       Load  X         /Reload X so it can be doubled
        105       Add   X         /Double X
        106       Store X         /Store the new value
        107       Jump  Endif     /Skip over the Else part to the end of the If
Else,   108       Load  Y         /Start the Else part by loading Y
        109       Subt  X         /Subtract X from Y
        10A       Store Y         /Store Y − X in Y
Endif,  10B       Halt            /Terminate program (it doesn't do much!)
X,      10C       Dec   12        /The value of X
Y,      10D       Dec   20        /The value of Y
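The Skipcond test used at address 10F of Example 4.1 and address 102 of Example 4.2 mirrors the RTN in Table 4.7. The Python sketch below (an illustration, not MARIE's implementation) takes the two condition bits IR[11-10] directly and decides whether the PC should skip the next instruction.

# Sketch of Skipcond's condition test.
def skipcond(condition_bits, AC, PC):
    """condition_bits is IR[11-10]: 00 -> AC < 0, 01 -> AC = 0, 10 -> AC > 0."""
    if (condition_bits == 0b00 and AC < 0) or \
       (condition_bits == 0b01 and AC == 0) or \
       (condition_bits == 0b10 and AC > 0):
        return PC + 1            # skip the next instruction
    return PC

# In Example 4.2, "Skipcond 01" tests AC = 0, so the Jump at 103 is skipped
# only when X = Y:
print(hex(skipcond(0b01, AC=0, PC=0x103)))   # 0x104 -> skipped
print(hex(skipcond(0b01, AC=5, PC=0x103)))   # 0x103 -> not skipped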


Page 210:
Example 4.3 demonstrates how JnS and JumpI are used to enable subroutines. This program includes an END statement, another example of an assembler directive. This statement tells the assembler where the program ends. Other potential directives include statements letting the assembler know where to find the first program instruction, how to set up memory, and whether blocks of code are procedures.

EXAMPLE 4.3   This example illustrates the use of a simple subroutine to double any number, and can be coded as follows:

       100   Load  X       /Load the first number to be doubled
       101   Store Temp    /Use Temp as a parameter to pass the value to Subr
       102   JnS   Subr    /Store the return address and jump to the procedure
       103   Store X       /Store the first number, doubled
       104   Load  Y       /Load the second number to be doubled
       105   Store Temp    /Use Temp as a parameter to pass the value to Subr
       106   JnS   Subr    /Store the return address and jump to the procedure
       107   Store Y       /Store the second number, doubled
       108   Halt          /End program
X,     109   Dec   20
Y,     10A   Dec   48
Temp,  10B   Dec   0
Subr,  10C   Hex   0       /Store the return address here
       10D   Clear         /Clear the AC, which was modified by JnS
       10E   Load  Temp    /The actual subroutine that doubles numbers
       10F   Add   Temp    /AC now holds double the value of Temp
       110   JumpI Subr    /Return to the calling code
       END

Using MARIE's simple instruction set, you should be able to implement any high-level programming language construct, such as loop and while statements. These are left as exercises at the end of the chapter.

4.7 A DISCUSSION ON DECODING: HARDWIRED VERSUS MICROPROGRAMMED CONTROL

How does the control unit really function? We have done a bit of hand waving and simply assumed that everything works as described, with a basic understanding that, for each instruction, the control unit causes the CPU to execute a sequence of steps correctly. In reality, there must be control signals that assert lines on various digital components to make things happen as described (recall the various digital components from Chapter 3).


Page 211:
For example, when we execute an Add instruction in MARIE assembly language, we assume that the addition takes place because the control signals for the ALU are set to "add" and the result is placed in the AC. The ALU has various control lines that determine which operation to perform. The question we need to answer is, "How do these control lines actually become asserted?" One of two approaches can be used to ensure that the control lines are set properly. The first approach is to physically connect all of the control lines to the actual machine instructions. The instructions are divided into fields, and the different bits in the instruction are combined through various digital logic components to drive the control lines. This is called hardwired control, and is illustrated in Figure 4.14. The control unit is implemented in hardware (with simple NAND gates, flip-flops, and counters, for example). We need a special digital circuit that uses, as inputs, the bits from the opcode field of our instructions, bits from the flag (or status) register, signals from the bus, and signals from the clock. It should produce, as outputs, the control signals that drive the various components of the computer. For example, a 4-to-16 decoder could be used to decode the opcode. By using the contents of the IR and the status of the ALU, this unit controls the registers, the ALU operations, all shifters, and bus access.

[FIGURE 4.14   Hardwired control unit. The instruction register feeds an instruction decoder; together with the clock input, system bus input (such as interrupts), and status/flag register inputs, these drive the control unit (a combinational circuit), which produces the control signals that go to the registers, the bus, and the ALU.]


Page 212:
The advantage of hardwired control is that it is very fast. The disadvantage is that the instruction set and the control logic are tied together directly by special circuits that are complex and difficult to design or modify. If someone designs a hardwired computer and later decides to extend the instruction set (as we did with MARIE), the physical components of the computer must be changed. This is prohibitively expensive, because not only must new chips be fabricated, but the old ones must be located and replaced as well.

The other approach, called microprogramming, uses software for control, and is illustrated in Figure 4.15. All machine instructions are input into a special program, the microprogram, which converts each instruction into the appropriate control signals. The microprogram is essentially an interpreter, written in microcode, that is stored in firmware (ROM, PROM, or EPROM), often referred to as the control store. This program converts machine instructions of zeros and ones into control signals.

[FIGURE 4.15   Microprogrammed control. The instruction register and the status/flag registers select a specific instruction; microinstruction address generation, driven by the clock, indexes the control store (microprogram memory); the microinstruction is buffered and decoded, and the subroutine executed for the given microinstruction produces the control signals.]
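To make the idea of "one subroutine per machine instruction" concrete, here is a toy Python sketch of a microprogram acting as an interpreter. It is not real microcode: the routine names and the printed register-transfer steps are our own stand-ins for the control signals a real control store would assert.

# Toy illustration of a microprogram as an interpreter.
# Each machine opcode selects a "microroutine" that lists the register-transfer
# steps (i.e., the control signals asserted in sequence) for that instruction.
def load_routine(x):
    return [f"MAR <- {x:03X}", "MBR <- M[MAR]", "AC <- MBR"]

def add_routine(x):
    return [f"MAR <- {x:03X}", "MBR <- M[MAR]", "AC <- AC + MBR"]

def store_routine(x):
    return [f"MAR <- {x:03X}", "MBR <- AC", "M[MAR] <- MBR"]

CONTROL_STORE = {0x1: load_routine, 0x3: add_routine, 0x2: store_routine}

def interpret(word):
    opcode, address = word >> 12, word & 0xFFF
    for step in CONTROL_STORE[opcode](address):   # one microroutine per instruction
        print(step)

interpret(0x3105)   # prints the three steps that carry out "Add 105"

Extending the instruction set then amounts to adding another entry to the table rather than changing any hardware, which is exactly the flexibility described here.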


Page 213:
182, , Chapter 4 / MARIE: Introduction to a Simple Computer, , Essentially, there is a subroutine in this program for each machine instruction., The advantage of this approach is that if the instruction set requires modification, the microprogram is simply updated no changes to the actual hardware are needed to match. Microprogramming is flexible, simple in design, and lends itself to very powerful instruction sets. Microprogramming allows for convenient hardware/software tradeoffs: if what you want isn't implemented in hardware (for example, your machine doesn't have a multiply instruction), it can be implemented in microcode. The disadvantage of this approach is that all instructions must go through an additional level of interpretation, which slows down the execution of the program. In addition to this time cost, there is a cost associated with the development itself, as proper tools are needed. We discuss wired control vs. firmware in more detail in Chapter 9. It's important to note that whether we're using wired control or firmware control, timing is critical. The control unit is responsible for the actual timing signals that direct all data transfers and actions. These signals are generated in sequence with a simple binary counter. For example, the timing signals of an architecture may include T1, T2, T3, T4, T5, T6, T7, and T8. These signals control when actions can occur. A fetch for an instruction can only occur when T1 is on, while a fetch for an operand can only occur when T4 is on. We know that registers can change state only when the clock strikes, but they are also limited to changing along with a given time signal. We saw a memory example in Chapter 3 that included a write enable control, row. This control line can be connected with a timing signal to ensure that the memory is only changed during specified intervals. computer architecture would be easy to understand without being completely overwhelming. While MARIE's architecture and assembly language are powerful enough to solve any problem that might arise in a modern architecture using a high-level language like C++, Ada, or Java, you probably won't be too happy with the inefficiency of the assembly language. architecture or with the difficulty it would have. the program is for writing and debugging! MARIE's performance could be significantly improved if more CPU storage were added by adding more registers. Making things easy for the programmer is another matter. For example, suppose a MARIE programmer wants to use parameterized procedures. Although MARIE supports subroutines (programs can branch to various sections of code, execute the code, and then return), MARIE has no mechanism to support parameter passing. Programs can be written without parameters, but we know that using them not only makes the program more efficient (particularly in the area of ​​reusability), but also makes the program easier to write and debug. you need a stack, a data structure that contains a list of items that can be accessed from a single end. A stack of dishes in the kitchen cupboard is analogous to a stack: you put the dishes on top and


Page 214:
4.8 / Real World Examples of Computer Architectures, , 183, , remove the top plates (usually). For this reason, stacks are often called last-in, first-out structures. (See Appendix A at the end of this book for a brief overview of the various data structures.) We can emulate a stack using certain parts of main memory by restricting how the data is accessed. For example, if we assume memory locations 0000, through 00FF are used as a stack and treat 0000 as the top, then it should be pushed (added) onto the stack from the top and removed (removed) from the stack. . It must be done from above. If we push the value 2 onto the stack, it will be pushed to location 0000. If we push the value 6, it will be pushed to location 0001. If we perform a pop operation, the 6 will be removed. In a stack, the pointer keeps track of where items should be pushed or popped. MARIE shares many features with modern architectures, but is not an accurate representation of them. In the next two sections, we present two contemporary computing architectures to better illustrate the features of modern architectures that, in an attempt to follow Leonardo da Vinci's advice, were excluded from MARIE. We started with Intel architecture (the x86 and Pentium families) and then moved to MIPS architecture. We chose these architectures because, while similar in some ways, they are based on fundamentally different philosophies. Each member of the x86 family of Intel architectures is known as a CISC (complex instruction set computer) machine, while the Pentium family and MIPS architectures are examples of RISC (reduced instruction set computer) machines. large number of instructions, of variable length, with complex designs. Many of these instructions are quite complicated and perform multiple operations when a single instruction is executed (for example, it is possible to loop using a single assembly language instruction). The basic problem with CISC machines is that a small subset of complex CISC instructions slows down systems considerably. The designers decided to go back to a less complicated architecture and incorporate a small (but complete) set of instructions that would execute extremely fast. This meant that it would be the responsibility of the compiler to produce efficient code for the ISA. Machines that use this philosophy are called RISC machines. RISC is a misnomer. It is true that the number of instructions is reduced. However, the main goal of RISC machines is to simplify the instructions so that they can be executed faster. Each instruction performs only one operation, they are all the same size, have only a few different designs, and all arithmetic operations must be performed between registers (data in memory cannot be used as operands). Virtually all new instruction sets (for any architecture) since 1982 are RISC, or some kind of combination of CISC and RISC. We cover CISC and RISC in detail in Chapter 9., 4.8.1, Intel Architectures, Intel Corporation has produced many different architectures, some of which may be familiar to you. Intel's first popular chip, the 8086, was introduced in 1979 and used in the IBM PC. It handled 16-bit data and worked with 20-bit addresses, so it could address a million bytes of memory. (A close cousin


Page 215:
184, , Chapter 4 / MARIE: An introduction to a simple computer, from the 8086, the 8-bit 8088 was used in many PCs to reduce costs). On the 8086, the CPU was divided into two parts: the execution unit, which included the general and ALU registers, and the bus interface unit, which included the instruction, queue, segment registers, and instruction pointer. ), BX (the base register used to extend addressing), CX (the count register), and DX (the data register). Each of these records was divided into two parts: the more significant half was designated the "high" half (indicated by AH, BH, CH, and DH), and the less significant half was designated the "low" half (indicated by AL, BL, CL and DL). Several 8086 instructions required the use of a specific register, but the registers could also be used for other purposes. The 8086 also had three pointer registers: the stack pointer (SP), which was used as the stack offset; the base pointer (BP), which was used to refer the sent parameters, to the stack; and the instruction pointer (IP), which contained the address of the next instruction (similar to MARIE's PC). There were also two index registers: the SI (source index) register, used as the source pointer for string operations, and the DI (destination index) register, used as the destination pointer for string operations. The 8086 also had a status flag registration. The individual bits of this register indicated various conditions, such as overflow, parity, transport interruption, etc. An 8086 assembly language program has been divided into different segments, special blocks, or areas to hold specific types of information. There was a code, a segment (to hold the program), a data segment (to hold the program data), and a stack segment (to hold the program stack). To access information from any of these segments, it was necessary to specify the offset of that element from the start of the corresponding segment. Therefore, segment pointers were needed to store segment addresses. These registers included the code, segment register (CS), data segment register (DS), and stack segment register (SS). There was also a fourth segment register, called the extra segment (ES) register, which was used in some string operations to handle memory addressing. was the value in the segment register and yyy was the offset. In 1980, Intel introduced the 8087, which added floating point instructions to the 8086 machine set, as well as an 80-bit stack. Many new chips using essentially the same ISA as the 8086 were introduced, including the 80286 in 1982 (which could handle 16 million bytes) and the 80386 in 1985 (which could handle up to 4 billion bytes of memory). The 80386 was a 32-bit chip, the first in a family of chips commonly known as IA-32 (for Intel architecture, 32-bit). When Intel moved from the 16-bit 80286 to the 32-bit 80386, designers wanted these architectures to be backward compatible, which means that programs written for an older, less powerful processor should run on the newer, more powerful processors. powerful and faster. For example, programs that run on the 80286 must also run on the 80386. Therefore, Intel has kept the same basic architecture and register sets. (New features were added with each successive model, so future compatibility is not guaranteed.) The naming convention used in the 80386 for the registers, which went from 16-bit to 32-bit, included an "E" prefix (which stood for "extended"). So instead of AX, BX, CX, and DX, the records became EAX, EBX, ECX, and EDX. East


Page 216:
4.8 / Real World Examples of Computer Architectures, 16-bit, , 8-bit, , 8-bit, , AH, , AL, , 185, , AX, EAX, 32-bit, , FIGURE 4.16, , The EAX register, divided into parts, , the same convention was used for all other records. However, the programmer can still access the original registers, AX, AL, and AH, for example, using the original names. Figure 4.16 illustrates how this worked, using the AX register as an example. Both the 80386 and 80486 were 32-bit machines, with 32-bit data buses. The 80486 added high-speed cache (see Chapter 6 for more details on cache and memory), which significantly improved performance. The Pentium series (Intel renamed numbers like 80486 to "Pentium" because it couldn't register the numbers) began with the Pentium processor, which had 32-bit registers and a 64-bit data bus and employed a superscalar design. This means that the CPU had multiple ALUs and could issue more than one instruction per clock cycle (that is, execute instructions in parallel). The Pentium Pro added branch prediction, while the Pentium II added MMX technology (which most agree wasn't a huge success) to handle multimedia. The Pentium III added more support for 3D graphics (using floating point instructions). Historically, Intel has used a classic CISC approach across its entire series of processors. The newer Pentium II and III used a combined approach, employing CISC architectures with RISC cores that could translate instructions from CISC to RISC. Intel was in line with the current trend moving away from CISC and towards RISC. Intel's seventh generation CPU family featured the Intel Pentium 4 (P4) processor. This processor differs from its predecessors in several ways, many of which are beyond the scope of this text. Suffice it to say that the Pentium 4 processor is clocked at 1.4 GHz (and higher), uses no fewer than 42 million transistors for the CPU, and implements something called the "Netburst" microarchitecture. (Until then, all Pentium family processors were based on the same microarchitecture, a term used to describe the architecture below the instruction set.) This new microarchitecture is made up of several innovative technologies, including a hyper-pipeline (we covered pipelines in Chapter 5), a 400 MHz (and faster) system bus, and many improvements to cache and floating-point operations. This made the P4 an extremely useful processor for multimedia applications. The introduction of the Itanium processor in 2001 marked Intel's first 64-bit chip (IA-64). Itanium includes a register-based programming language and a very rich instruction set. It also employs a hardware emulator to maintain backwards compatibility with the IA-32/x86 instruction sets. This processor has 4 integer units, 2 floating point units, a significant amount of cache memory at 4 different levels (we


Page 217:
study cache levels in Chapter 6), 128 floating-point registers, 128 integer registers, and various miscellaneous registers for handling the efficient loading of instructions when branching. The Itanium can address up to 16GB of main memory.

The assembly language of an architecture reveals significant information about that architecture. To compare MARIE's architecture to an Intel architecture, let's return to Example 4.1, the MARIE program that used a loop to add five numbers, and rewrite it in x86 assembly language, as shown in Example 4.4. Note the addition of a data segment directive and a code segment directive.

EXAMPLE 4.4   A program, written to run on a Pentium, that uses a loop to add five numbers.

         .DATA
Num1     EQU 10               ; Num1 is initialized to 10
         EQU 15               ; Each word after Num1 is initialized
         EQU 20
         EQU 25
         EQU 30
Num      DB 5                 ; Initialize the loop counter
Sum      DB 0                 ; Initialize the Sum

         .CODE
         LEA EBX, Num1        ; Load the address of Num1 into EBX
         MOV ECX, Num         ; Set the loop counter
         MOV EAX, 0           ; Initialize the sum
         MOV EDI, 0           ; Initialize the offset (which number to add)
Start:   ADD EAX, [EBX+EDI*4] ; Add the EDIth number to EAX
         INC EDI              ; Increment the offset by 1
         DEC ECX              ; Decrement the loop counter by 1
         JG  Start            ; If the counter is greater than 0, return to Start
         MOV Sum, EAX         ; Store the result in Sum

We can make the above program easier to read (which also makes it look less like MARIE's assembly language) by using the LOOP statement. Syntactically, the LOOP statement resembles a jump statement, in that it requires a label. The loop above can be rewritten as follows:

         MOV  ECX, Num             ; Set the counter
Start:   ADD  EAX, [EBX + EDI * 4]
         INC  EDI
         LOOP Start
         MOV  Sum, EAX

The LOOP statement in x86 assembly is similar to the do...while construct in C, C++, or Java. The difference is that there is no explicit loop variable: the ECX register is assumed to hold the loop counter.


Page 218:
After the LOOP instruction executes, the processor decrements ECX by one and then tests it to see whether it equals zero. If it is not zero, control jumps to Start; if it is zero, the loop terminates. The LOOP statement is an example of the kind of instruction that can be added to make the programmer's job easier, but that isn't necessary for getting the job done.

4.8.2 MIPS Architectures

The MIPS family of CPUs has been one of the most successful and flexible designs of its class. The MIPS R3000, R4000, R5000, R8000, and R10000 are some of the many registered trademarks belonging to MIPS Technologies, Inc. MIPS chips are used in embedded systems, in addition to computers (such as Silicon Graphics machines) and various computerized toys (Nintendo and Sony use MIPS CPUs in many of their products). Cisco, a very successful manufacturer of Internet routers, uses MIPS CPUs as well. There are two instruction set architectures: MIPS32 (for the 32-bit architecture) and MIPS64 (for the 64-bit architecture). Our discussion in this section focuses on MIPS32.

It is important to note that MIPS Technologies made a decision similar to Intel's: as the ISA evolved, backward compatibility was maintained. And, like Intel, each new version of the ISA included operations and instructions to improve efficiency and handle floating-point values. The newer MIPS32 and MIPS64 architectures take advantage of significant advances in VLSI technology and CPU organization, and the end result is significant cost and performance benefits over traditional architectures.

Like IA-32 and IA-64, the MIPS ISA incorporates a rich set of instructions, including arithmetic, logical, comparison, data transfer, branch, jump, shift, and multimedia instructions. MIPS is a load/store architecture, which means that all instructions (other than the load and store instructions) must use registers as operands (memory operands are not allowed). MIPS32 has 168 32-bit instructions, but many of them are similar. For example, there are six different add instructions, all of which add numbers, but they vary in the operands and registers used. This idea of having multiple instructions for the same operation is common in assembly language instruction sets. Another common MIPS instruction is NOP (no-op), which does nothing except consume time (NOPs are used in pipelining, as we will see in Chapter 5).

The CPU in the MIPS32 architecture has thirty-two 32-bit general-purpose registers, numbered r0 through r31. (Two of these have special functions: r0 is hardwired to the value 0, and r31 is the default register for use with certain instructions, which means it does not have to be specified in the instruction itself.) In MIPS assembly, these general-purpose registers are designated $0, $1, ..., $31. Register 1 is reserved, and registers 26 and 27 are used by the operating system kernel. Registers 28, 29, and 30 are pointer registers. The remaining registers can be referred to by number, using the naming convention shown in Table 4.8. For example, you can refer to register 8 as $8 or as $t0. There are also two special-purpose registers, HI and LO, which hold the results of certain integer operations, and of course there is a PC (program counter) register, giving a total of three special-purpose registers.


Page 219:
Naming Convention   Register Number   Value Placed in Register
$v0–$v1             2–3               Results, expressions
$a0–$a3             4–7               Arguments
$t0–$t7             8–15              Temporary values
$s0–$s7             16–23             Saved values
$t8–$t9             24–25             More temporary values

TABLE 4.8   MIPS32 register naming convention

MIPS32 has thirty-two 32-bit floating-point registers that can be used in single-precision floating-point operations (with double-precision values stored in even-odd pairs of these registers). There are four special-purpose floating-point control registers for use by the floating-point unit.

Let's continue our comparison by writing the program of Examples 4.1 and 4.4 in MIPS32 assembly language.

EXAMPLE 4.5

            . . .
            .data
                                   # $t0 = sum
                                   # $t1 = loop counter Ctr
Value:      .word 10, 15, 20, 25, 30
Sum = 0
Ctr = 5
            .text
            .global main           # declare main as a global variable
main:       lw   $t0, Sum          # Initialize the register holding the sum to zero
            lw   $t1, Ctr          # Copy the value of Ctr into a register
            la   $t2, Value        # $t2 is a pointer to the current value
while:      blez $t1, end_while    # Done with the loop if the counter <= 0
            lw   $t3, 0($t2)       # Load the value at offset 0 from the pointer
            add  $t0, $t0, $t3     # Add the value to the sum
            addi $t2, $t2, 4       # Go to the next data value
            sub  $t1, $t1, 1       # Decrement Ctr
            b    while             # Return to the top of the loop
end_while:  la   $t4, Sum          # Load the address of Sum into a register
            sw   $t0, 0($t4)       # Write the sum to memory location Sum
            . . .

This is similar to the Intel code in that the loop counter is copied into a register, decremented during each iteration of the loop, and then checked to see whether it is less than or equal to zero. The register names may look formidable, but they are actually quite easy to work with once you understand the naming conventions.
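For comparison with Examples 4.1, 4.4, and 4.5, the same computation is shown below in ordinary Python (a high-level restatement, of course, not something any of the three assembly languages executes directly). It makes the structure shared by all three versions explicit: an index that walks through the five values and a counter that controls the loop.

# The loop from Examples 4.1, 4.4, and 4.5, written at a high level.
values = [10, 15, 20, 25, 30]

total = 0
index = 0                 # plays the role of Next (MARIE) / EDI (x86) / $t2 (MIPS)
counter = len(values)     # plays the role of Ctr (MARIE) / ECX (x86) / $t1 (MIPS)
while counter > 0:
    total += values[index]
    index += 1
    counter -= 1

print(total)              # 100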


Page 220:
Chapter Summary, , 189, , If you are interested in writing MIPS programs but do not have a MIPS machine, there are several simulators you can use. The most popular is SPIM, a stand-alone simulator for running MIPS R2000/R3000 assembly language programs. SPIM provides a simple debugger and implements almost the entire MIPS assembly instruction set. The SPIM package includes the source code and a complete set of documentation. It is available for various versions of Unix (including Linux), Windows (PC), and Windows (DOS), as well as Macintosh. For more information, see the references at the end of this chapter. If you look at Examples 4.1, 4.4, and 4.5, you'll see that the instructions are quite similar. The records are referenced differently and have different names, but the underlying operations are basically the same. Some assembly languages ​​have larger instruction sets, allowing the programmer more options for coding multiple algorithms. But, as we saw with MARIE, you don't absolutely need a huge set of instructions to do the job of how computers actually work. This simple architecture was combined with an ISA and an assembly language, with an emphasis on the relationship between these two, which allowed us to write programs for MARIE. The CPU is the main component of any computer. It consists of a data path (registers and an ALU connected by a bus) and a control unit responsible for sequencing operations and moving data and creating the timing signals. All components use these timing signals to work in unison. The input/output subsystem accommodates data input to the computer and feedback to the user. MARIE is a very simple architecture, designed specifically to illustrate the concepts in this chapter, without getting bogged down in too many technical details. MARIE has 4K 16-bit words in main memory, uses 16-bit instructions, and has seven registers. There is only one general purpose register, AC. The instructions for MARIE use 4 bits for the opcode and 12 bits for an address. The register transfer notation was introduced as a symbolic way to examine what each instruction at the register level does. The fetch-decode-execute cycle consists of the steps a computer takes to execute a program. An instruction is obtained, then it is decoded, the required operands are obtained, and finally the instruction is executed. Interrupts are processed at the beginning of this cycle, returning to the normal get, decode, and execute state when the interrupt handler completes. A machine language is a list of binary numbers that represent executable machine instructions, while an assembly language program uses symbolic instructions to represent the numerical data from which the machine language program is derived. Assembly language is a programming language, but it does not offer a wide variety of data types or instructions to the programmer. Assembly language programs represent a low-level programming method., , T


Page 221:
190, , Chapter 4 / MARIE: Introduction to a Simple Computer, , You will probably agree that programming in MARIE's assembly language is, to say the least, quite tedious. We have seen that most branches must be done explicitly by the programmer, using branch and branch instructions. It's also a big step from this assembly language to a high level language like C++ or Ada. However, the assembler is one step in the process of turning the source code into something that can be understood by the machine. We didn't introduce assembly language with the expectation that it would escape and become an assembly language programmer. Rather, this introduction should serve to give you a better understanding of the architecture of the machine and how instructions and architectures are related. The assembly language should also provide a basic idea of ​​what goes on behind the scenes in high-level C++, Java, or Ada programs. Although assembly language programs are easier to write for x86 and MIPS than for MARIE, they are all more difficult to write and debug than high-level language programs. in detail) for two reasons. First, it's interesting to compare the various architectures, starting with a very simple architecture and continuing to much more complex and complex architectures. You should focus on the differences as much as the similarities. Second, although the Intel and MIPS assembly languages ​​looked different from the MARIE assembly language, they are actually quite comparable. Instructions access memory and registers, and there are instructions for moving data, performing arithmetic and logic operations, and branching. Intel and MIPS sets also have more records than MARIE. Other than the number of instructions and the number of registers, the languages ​​work almost identically. FURTHER READING A MARIE assembly simulator is available on the first page of this book. This simulator builds and runs your MARIE programs. For more detailed information on the organization of CPUs and ISAs, see the books by Tanenbaum (1999) and Stallings (2000). Mano (1991) contains instructive examples of microprogrammed architectures. Wilkes (1958) is an excellent reference on microprogramming. Jones' book takes a straightforward and simple approach to assembly language programming, and all three books are very comprehensive. If you are interested in other assembly languages, see Struble (1975) for IBM's assembler, Gill, Corwin, and Logar (1987) for Motorola, and Sun Microsystems (1992) for SPARC. For a gentle introduction to embedded systems, try Williams (2000). If you are interested in MIPS programming, Patterson and Hennessy (1997) give a very good presentation, and their book has a separate appendix with useful information. Donovan (1972) also has good coverage of the MIPS environment. Kane and Heinrich (1992) is the definitive text on the MIPS instruction set and


Page 222:
References, , 191, , Assembly language programming on MIPS machines. The MIPS home page also has a lot of information. For more information on Intel architectures, see Alpert and Avnon (1993), Brey (2003), and Dulon (1998). Perhaps one of the best books on the subject of Pentium architecture is Shanley (1998). The Motorola, UltraSparc, and Alpha architectures are discussed in Circello (1995), Horel and Lauterbach (1999), and McLellan (1995), respectively. For a more general introduction to advanced architectures, see Tabak (1991). and various other downloads. Waldron (1999) is an excellent introduction to RISC assembly language programming, as well as MIPS., , REFERENCES, Abel, Peter. IBM PC Programming and Assembly Language, 5th Ed. Upper Saddle River, NJ: Prentice Hall, 2001., Alpert, D. and Avnon, D. "Architecture of the Pentium Microprocessor," IEEE Micro 13:3, Apr 1993, pp. 11–21., Brey, B Intel 8086/8088, 80186/80188, 80286, 80386, 80486 Microprocessors Pentium and Pentium Pro, Pentium II, Pentium III, and Pentium IV Processors: Architecture, Programming, and Interface, 6th Ed. Englewood Cliffs, NJ: Prentice Hall, 2003., Circello, J., Edgington, G., McCarthy, D., Gay, J., Schimke, D., Sullivan, S., Duerden, R., Hinds, C. , Marquette, D., Sood, L., Crouch, A., and Chow, D. “The Superscalar Architecture of the, MC68060,” IEEE Micro 15:2, Apr 1995, pp. 10–21., Dandamudi, S. P. Introduction to Assembly Language Programming: From 8086 to Pentium, Processors, New York: Springer Verlag, 1998., Donovan. J. J. Systems Programming, New York: McGraw-Hill, 1972., Dulon, C. “The IA-64 Architecture at Work”, COMPUTER 31:7, Jul 1998, pp. 24–32., Gill, A., Corwin, E., and Logar, A. Assembly Language Programming for the 68000, Upper Saddle, River, NJ: Prentice Hall, 1987., Goodman, J., and Miller, K. A Programmer's View of Computer Architecture, Philadelphia: Saunders College Publishing, 1993., Horel, T. and Lauterbach, G. "UltraSPARC III: Designing Third Generation 64-Bit Performance", IEEE Micro 19:3, May/June 1999, pp. 73–85., Jones, W. Assembly Language for the IBM PC Family, 2nd ed. El Granada, CA: Scott Jones, Inc., 1997., Kane, G., & Heinrich, J., MIPS RISC Architecture, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1992. Hand, Morris. Digital Design, 2nd ed., Upper Saddle River, NJ: Prentice Hall, 1991. McLellan, E. “The Alpha AXP Architecture and 21164 Alpha Microprocessor,” IEEE Micro 15:2, Apr 1995, pp. 33–43. , MIPS home page: www.mips.com, Patterson, D.A., & Hennessy, J.L. Computer organization and design: the hardware/software interface, 2nd ed. San Mateo, CA: Morgan Kaufmann, 1997., Samaras, W.A., Cherukuri, N. & Venkataraman, S. “The IA-64 Itanium Processor Cartridge”, IEEE Micro 21:1, Jan/Feb 2001, pp. –89.


Page 223:
192, , Chapter 4 / MARIE: An Introduction to a Simple Computer, Shanley, T. Pentium Pro and Pentium II System Architecture. Reading, MA: Addison-Wesley, 1998., SPARC International, Inc., The SPARC Architecture Manual: Version 9, Upper Saddle River, NJ:, Prentice Hall, 1994., SPIM homepage: www.cs.wisc. edu/~larus/spim.html, Stallings, W. Computer Organization and Architecture, 5th ed. New York: Macmillan Publishing, Company, 2000., Struble, G. W., Programación en lenguaje ensamblador: The IBM System/360 and 370, 2nd ed. Reading, MA: Addison Wesley, 1975., Tabak, D. Advanced Microprocessors, New York, NY: McGraw-Hill, 1991., Tanenbaum, Andrew. Structured Organization of Computers, 4th ed. Upper Saddle River, New Jersey: Prentice, Hall, 1999., Waldron, John. Introducción al lenguaje RISC assembly, Reading, MA: Addison Wesley, 1999., Wilkes, M.V., Renwick, W. and Wheeler, D.J. "The Design of the Control Unit of an Electronic, Digital Computer", Proceedings of IEEE, 105, Part B, No. 20, 1958, pp. 121–128, 1958., Williams, Al. Microcontroller Projects with Basic Stamps, Gilroy, CA: R&D Books, 2000., , REVIEW OF ESSENTIAL TERMS AND CONCEPTS, 1. What is the function of a CPU?, 2. What is a data path used for?, 3. What is the purpose does the control unit serve?, 4. Where are the registers located and what are the different types?, 5. How does the ALU know which function to perform?, 6. Why is a bus often a communication bottleneck?, 7. What is the difference between a point-to-point bus and a multipoint bus?, 8. Why is a bus protocol important?, 9. Explain the differences between data buses, address buses, and control buses., 10. The what is a bus cycle?, 11. Name three different types of buses and where would you find them., 12. What is the difference between synchronous and non-synchronous buses?, 13. What are the four types of bus arbitration?, 14. Explain a difference between the clock cycles and the clock frequency. Do system clocks and bus clocks differ?, 16. What is the function of an I/O interface?, 17. Explain the difference between memory-mapped I/O and instruction-based I/O., 18. What is the difference between a byte and a word? What distinguishes each?, 19. Explain the difference between byte-addressable and word-addressable., 20. Why is address alignment important?, 21. List and explain the two types of memory interleaving and the differences between they.


Page 224:
Exercises, , 193, , 22. Describe how an interrupt works and name four different types., 23. How is a maskable interrupt different from a non-maskable one?, 24. Why, if MARIE has 4K words of main memory, addresses must have 12 bits?, 25. Explain the functions of all registers in MARIE., 26. What is an opcode?, 27. Explain how each instruction works in MARIE., 28. How is it different? a machine language from an assembly language? Is the conversion, 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., one to one (an instruction is the same as a machine instruction)?, What is the meaning of RTN?, Is a micro-operation the same as a machine instruction?, How is a micro-operation different from an assembly language instruction regular?, Explain the steps of the get-decode-execute cycle., How does interrupt-controlled I/O work?, Explain how an assembler works, including how it generates the symbol table, what it does with the source code, and the object code, and how it deals with tags., What is an embedded system? How is it different from an ordinary computer? Provide a trace (similar to Figure 4.13) for Example 4.1. Explain the difference between wired control and microprogrammed control. What is a battery? Why is it important for programming?, Compare CISC machines with RISC machines., How is the Intel architecture different from MIPS?, Name four Intel processors and four MIPS processors., , EXERCISES, 1. What are the functions CPU mains?, 2. Explain what the CPU must do when an interrupt occurs. Include in your answer the method the CPU uses to detect an interrupt, how it is handled, and what happens when the interrupt is serviced., ◆ 3. How many bits would you need to address 2M memory? , a) Is the memory addressable by byte?, , ◆, , b) Is the memory addressable by word?, , 4. How many bits are needed to address a main memory of 4M ⫻ 16 if, a) The memory Is main memory byte addressable?, b) Is main memory word addressable?, 5. How many bits are needed to address a main memory of 1 M ⫻ 8 if, a) Is main memory byte addressable?, b) Is main memory addressable by addressable word?


Page 225:
194, , Chapter 4 / MARIE: An introduction to a simple computer, ◆, , 6. Suppose a 2M ⫻ 16 main memory is built using 256 KB ⫻ 8 RAM chips and, , the memory is word-addressable., ◆, , a) How many RAM chips are needed?, , ◆, , b) How many RAM chips are there per memory word?, , ◆, , c) How many address bits are needed for each RAM chip ?, , ◆, , d ) How many banks will this memory have?, , ◆, , e) How many address bits are needed for the entire memory?, , ◆, , f) If high-order interleaving is used, where address 14 (is E in hexadecimal) ) be, , located?, g) Repeat exercise 6f for low-order interleaving., 7. Redo exercise 6 assuming a memory of 16M ⫻ 16 built using 512K ⫻ 8 RAM chips ., 8. A digital computer has a memory unit with 24 bits per word. The instruction set consists of 150 different operations. All instructions have an opcode part and an address part (allowing only one address). Each instruction is stored in a memory word. a) How many bits are needed for the opcode? b) How many bits are left for the address part of the instruction? c) What is the maximum memory size allowed? d) What is the largest unsigned binary number that can be accommodated in a word of memory? 9. Assume a 220-byte memory: addressable?, , ◆, , b) What are the lowest and highest addresses if the memory is word-addressable, , assuming a 16-bit word?, c) What are the lowest and highest addresses if the memory is word addressable? , assuming a 32-bit word?, 10. Given a 2048-byte memory consisting of multiple 64-byte chips ⫻ 8 RAM and, assuming byte-addressable memory, which of the seven diagrams below indicates the correct form of use the address bits? Explain your answer., 10-bit address, a., , 2-bit chip select, 8-bit address on chip, 64-bit address, , b., , 16-bit chip select, 48-bit address on chip, 11-bit address, , c., , 6-bit for chip select, 5-bit for on-chip address


Page 226:
Exercises, , 195, , 6-bit address, d., , 1-bit for chip select, 5-bit for on-chip address, 11-bit address, , e., , 5-bit for chip select, , 6-bit for on-chip address, 10-bit address, , f., , 4-bit for chip select, 6-bit for on-chip address, 64-bit address, , g., , 8-bit for chip select, 56-bit for address on the chip , , 11. Explain the steps of the search-decode-execute cycle. Your explanation should include, ◆, , what happens in the various registers., 12. Explain why, in MARIE, MAR is only 12 bits wide while AC is 16 bits wide. program (manual mounting)., Label, , S2,, , S1,, , ◆, , Hex Address, , Instruction, , 100, , Load To, , 101, , Add One, , 102, , Skip S1, , 103 , , Add One, , 104, , Store To, , 105, , Stop, , 106, , Add To, , 107, , Skip S2, , To,, , 108, , HEX 0023, , One,, , 109 , , HEX 0001, , 14. What is the content of the symbol table of the previous program?, 15. Given the instruction defined for MARIE in this chapter:, a) Decode the following instructions in machine language MARIE (write the set ◆ , , equivalent in bly):, i) 0010000000000111, ii) 10010000000001011, iii) 0011000000001001


Page 227:
196, , Chapter 4 / MARIE: Introduction to a Simple Computer, b) Write the following code segment in assembly language for MARIE:, if X > 1 then Y := X + X;, X := 0;, endif; , Y := Y + 1;, c) What are the potential problems (perhaps more than one) with the following piece of assembly language code (implementing a subroutine) written to execute, MARIE? The subroutine assumes that the parameter to be passed is in AC and must double this value. The main part of the program includes a sample call to the subroutine. You can assume that this fragment is part of a larger program, Main, Load, Jump, Sret,, , X, Sub1, Store X, , . 🇧🇷 ., Sub1, Add, Jump, , X, Sret, , 16. Write a MARIE program to evaluate the expression A ⫻ B + C ⫻ D., 17. Write the following MARIE assembly language code segment:, X := 1 ;, while X < 10 do, X := X + 1;, endwhile;, 18. Write the following code segment in MARIE assembly language:, Sum := 0;, for X := 1 to 10 do , Sum := Sum + X;, 19. Write a MARIE program using a loop that multiplies two positive numbers using, , repeated addition. For example, for multiples 3 ⫻ 6, the program would add 3 six times, or 3 + 3 + 3 + 3 + 3 + 3. 20. Write a MARIE subroutine to subtract two numbers. 21. More registers appear to be a good thing, in terms of reducing the total number of memory accesses a program might require. Give an arithmetic example to support this statement. First, determine the number of memory accesses required using MARIE and the two registers to store the memory data values ​​(AC and MBR). Then do the same arithmetic for a processor that has more than three registers to store data values ​​in memory. 22. MARIE stores the return address for a subroutine in memory at a location designated by the jump-and-store instruction. On some architectures, this address is stored in a


Page 228:
register, and in many it is stored on a stack. Which of these methods would best handle recursion? Explain your answer.
23. Provide a trace (similar to the one in Figure 4.13) for Example 4.2.
24. Provide a trace (similar to the one in Figure 4.13) for Example 4.3.
25. Suppose we add the following instruction to MARIE's ISA:

    IncSZ Operand

    This instruction increments the value with effective address "Operand", and if this newly incremented value is equal to 0, the program counter is incremented by 1. Basically, we are incrementing the operand, and if this new value is equal to 0, we skip the next instruction. Show how this instruction would be written using RTN.
26. Would you recommend a synchronous bus or an asynchronous bus for use between the CPU and memory? Explain your answer.
*27. Pick an architecture (other than those covered in this chapter). Do research to find out how your architecture deals with the concepts introduced in this chapter, just as was done for Intel and MIPS.

TRUE or FALSE

_____ 1. If a computer uses hardwired control, the microprogram determines the instruction set for the machine. This instruction set can never be changed unless the architecture is redesigned.
_____ 2. A branch instruction changes the flow of information by changing the PC.
_____ 3. Registers are storage locations within the CPU itself.
_____ 4. A two-pass assembler typically creates a symbol table during the first pass and finishes the complete translation from assembly language to machine instructions on the second pass.
_____ 5. The MAR, MBR, PC, and IR registers in MARIE can be used to hold arbitrary data values.
_____ 6. MARIE has a common bus scheme, which means a number of entities share the bus.
_____ 7. An assembler is a program that accepts a symbolic language program and produces the binary machine language equivalent, resulting in a one-to-one correspondence between the source assembly language program and the machine language object program.
_____ 8. If a computer uses microprogrammed control, the microprogram determines the instruction set for the machine.


Page 230:
"Every program has at least one bug and can be shortened by at least one instruction, from which, by induction, one can deduce that every program can be reduced to one instruction that doesn't work." (Anonymous)

CHAPTER 5: A Closer Look at Instruction Set Architectures

5.1 INTRODUCTION

We saw in Chapter 4 that machine instructions consist of opcodes and operands. The opcodes specify the operations to be executed; the operands specify register or memory locations of data. Why, when we have languages such as C++, Java, and Ada available, should we be concerned with machine instructions? When programming in a high-level language, we frequently have little awareness of the topics discussed in Chapter 4 (or in this chapter), because high-level languages hide the details of the architecture from the programmer. Employers often prefer to hire people with assembly language skills, not because they need an assembly language programmer, but because they need someone who can understand computer architecture in order to write more efficient and more effective programs.

In this chapter, we expand on the topics presented in the preceding chapter, the objective being to provide you with a more detailed look at machine instruction sets. We look at different instruction types and operand types, and at how instructions access data in memory. You will see that the variations in instruction sets are integral to distinguishing different computer architectures. Understanding how instruction sets are designed and how they function can help you understand the more intricate details of the architecture of the machine itself.

5.2 INSTRUCTION FORMATS

We know that a machine instruction has an opcode and zero or more operands. In Chapter 4 we saw that MARIE had an instruction length of 16 bits and could have, at most,


Page 231:
one operand. Encoding an instruction set can be done in a variety of ways. Architectures are differentiated from one another by the number of bits allowed per instruction (16, 32, and 64 are the most common), by the number of operands allowed per instruction, and by the types of instructions and data each can process. More specifically, instruction sets are differentiated by the following features:
• Operand storage in the CPU (data can be stored in a stack structure or in registers)
• Number of explicit operands per instruction (zero, one, two, and three being the most common)
• Operand location (instructions can be classified as register-to-register, register-to-memory, or memory-to-memory, which simply refer to the combinations of operands allowed per instruction)
• Operations (including not only the types of operations but also which instructions can access memory and which cannot)
• Type and size of operands (operands can be addresses, numbers, or even characters)

5.2.1 Design Decisions for Instruction Sets

When a computer architecture is in the design phase, the instruction set format must be determined before many other decisions can be made. Selecting this format is often quite difficult because the instruction set must match the architecture, and the architecture, if well designed, could last for decades. Decisions made during the design phase have long-lasting ramifications.

Instruction set architectures (ISAs) are measured by several different factors, including: (1) the amount of space a program requires; (2) the complexity of the instruction set, in terms of the amount of decoding necessary to execute an instruction and the complexity of the tasks performed by the instructions; (3) the length of the instructions; and (4) the total number of instructions. Things to consider when designing an instruction set include:
• Short instructions are typically better because they take up less space in memory and can be fetched quickly. However, this limits the number of instructions, because there must be enough bits in the instruction to specify the number of instructions we need. Shorter instructions also have tighter limits on the size and number of operands.
• Fixed-length instructions are easier to decode but waste space.
• Memory organization affects instruction format. If memory has, for example, 16-bit or 32-bit words and is not byte-addressable, it is difficult to access a single character. For this reason, even machines that have 16-, 32-, or 64-bit words are often byte-addressable, meaning every byte has a unique address even though words are longer than 1 byte.
• A fixed-length instruction does not necessarily imply a fixed number of operands. We could design an ISA with a fixed overall instruction length but allow the number of bits in the operand field to vary as required. (This is called an expanding opcode and is covered in more detail in Section 5.2.5.)


Page 232:
• There are many different types of addressing modes. In Chapter 4, MARIE used two addressing modes, direct and indirect; however, we will see in this chapter that there is a large variety of addressing modes.
• If words consist of multiple bytes, in what order should these bytes be stored on a byte-addressable machine? Should the least significant byte be stored at the highest or lowest byte address? This little versus big endian debate is discussed in the following section.
• How many registers should the architecture contain and how should these registers be organized? How should operands be stored in the CPU?

The little versus big endian debate, expanding opcodes, and CPU register organization are examined further in the following sections. In the process of discussing these topics, we also touch on the other design issues listed above.

5.2.2 Little versus Big Endian

The term endian refers to a computer architecture's "byte order," or the way the computer stores the bytes of a multibyte data item. Virtually all computer architectures today are byte-addressable and must, therefore, have a standard for storing information that requires more than a single byte. Some machines store a two-byte integer, for example, with the least significant byte first (at the lower address), followed by the most significant byte. Therefore, a byte at a lower address has lower significance. These machines are called little endian machines. Other machines store this same two-byte integer with its most significant byte first, followed by its least significant byte. These are called big endian machines because they store the most significant bytes at the lower addresses. Most UNIX machines are big endian, whereas most PCs are little endian machines. Most newer RISC architectures are also big endian.

The two terms, little endian and big endian, come from Gulliver's Travels: the Lilliputians were divided into those who ate their eggs by opening the "big" end (big endians) and those who ate their eggs by opening the "little" end (little endians). For example, Intel has always done things the "little endian" way, whereas Motorola has always done things the "big endian" way. (It is also worth noting that some CPUs can handle both little and big endian.)

For example, consider an integer requiring 4 bytes:

    Byte 3   Byte 2   Byte 1   Byte 0

On a little endian machine, this is arranged in memory as follows:

    Base Address + 0 = Byte 0
    Base Address + 1 = Byte 1
    Base Address + 2 = Byte 2
    Base Address + 3 = Byte 3


Page 233:
On a big endian machine, this same integer would be stored as:

    Base Address + 0 = Byte 3
    Base Address + 1 = Byte 2
    Base Address + 2 = Byte 1
    Base Address + 3 = Byte 0

Let's assume that on a byte-addressable machine, the 32-bit hex value 12345678 is stored at address 0. Each digit requires a nibble, so one byte holds two digits. This hex value is stored in memory as shown in Figure 5.1, where the shaded cells represent the actual contents of memory.

[FIGURE 5.1: The hex value 12345678 stored in both big and little endian format. Big endian stores the bytes 12, 34, 56, 78 at addresses 0 through 3; little endian stores 78, 56, 34, 12.]

There are advantages and disadvantages to each method, although one method is not necessarily better than the other. Big endian is more natural to most people and thus makes it easier to read hex dumps. By having the high-order byte come first, you can always test whether the number is positive or negative by looking at the byte at offset zero. (Compare this to little endian, where you must know how long the number is and then skip over bytes to find the one containing the sign information.) Big endian machines store integers and strings in the same order and are faster in certain string operations. Most bitmapped graphics are mapped with a "most significant bit on the left" scheme, which means the architecture itself can handle working with graphical elements larger than one byte. This is a performance limitation for little endian computers, because they must continually reverse the byte order when working with large graphical objects. When decoding compressed data encoded with such schemes as Huffman and LZW (discussed in Chapter 7), the actual codeword can be used as an index into a lookup table if it is stored in big endian (this is also true for encoding).

However, big endian also has disadvantages. Converting a 32-bit integer address to a 16-bit integer address requires a big endian machine to perform addition. High-precision arithmetic on little endian machines is faster and easier. Most big endian architectures also do not allow words to be written on non-word address boundaries (for example, a 2- or 4-byte word must always start at an even-numbered byte address), which wastes space. Little endian architectures, such as Intel, allow odd-address reads and writes, which makes programming on these machines much easier. If a programmer writes an instruction to read a value of the wrong word size, on a big endian machine it is always read as an incorrect value; on a little endian machine, it can sometimes result in the correct data being read. (Note that Intel finally added an instruction to reverse the byte order within registers.)
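The two byte orderings above are easy to see from a high-level language. The short Python sketch below is illustrative only (Python is not part of the original text): it packs the 32-bit value 12345678 (hex) both ways and prints the bytes in increasing-address order, reproducing Figure 5.1.

    import struct
    import sys

    value = 0x12345678

    # struct.pack returns the bytes in the order they would appear at
    # increasing memory addresses.
    little = struct.pack("<I", value)   # little endian: least significant byte first
    big    = struct.pack(">I", value)   # big endian: most significant byte first

    print(little.hex())   # 78563412
    print(big.hex())      # 12345678

    # sys.byteorder reports the convention of the machine running this script;
    # it is 'little' on most PCs.
    print(sys.byteorder)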


Page 234:
Computer networks are big endian, which means that when little endian computers are going to pass integers over the network (network device addresses, for example), they need to convert them to network byte order. Likewise, when they receive integer values over the network, they need to convert them back to their own native representation.

Although you may not be familiar with this little versus big endian debate, it is important to many current software applications. Any program that writes data to, or reads data from, a file must be aware of the byte ordering on the particular machine. For example, the Windows BMP graphics format was developed on a little endian machine, so to view BMPs on a big endian machine, the application used to view them must first reverse the byte order. Designers of popular software are well aware of these byte-ordering issues. For example, Adobe Photoshop uses big endian, GIF is little endian, JPEG is big endian, MacPaint is big endian, PC Paintbrush is little endian, RTF by Microsoft is little endian, and Sun raster files are big endian. Some applications support both formats: Microsoft WAV and AVI files, TIFF files, and XWD (X Windows Dump) files support both, typically by encoding an identifier into the file.

5.2.3 Internal Storage in the CPU: Stacks versus Registers

Once byte ordering in memory is determined, the hardware designer must make some decisions on how the CPU should store data. This is the most basic means to differentiate ISAs. There are three choices:
1. A stack architecture
2. An accumulator architecture
3. A general purpose register (GPR) architecture

Stack architectures use a stack to execute instructions, and the operands are (implicitly) found on top of the stack. Even though stack-based machines have good code density and a simple model for evaluation of expressions, a stack cannot be accessed randomly, which makes it difficult to generate efficient code. Accumulator architectures such as MARIE, with one operand implicitly in the accumulator, minimize the internal complexity of the machine and allow for very short instructions. But because the accumulator is only temporary storage, memory traffic is very high. General purpose register architectures, which use sets of general purpose registers, are the most widely accepted models for machine architectures today. These register sets are faster than memory and easy for compilers to deal with, and they can be used very effectively and efficiently. In addition, hardware prices have decreased significantly, making it possible to add a large number of registers at minimal cost. If memory access is fast, a stack-based design may be a good idea; if memory is slow, it is often better to use registers. These are the reasons why most computers over the past 10 years have been general-register based. However, because all operands must be named, using registers results in longer instructions, causing longer fetch and decode times. (A very important goal for ISA designers is short instructions.) Designers who choose an


Page 235:
ISA must decide which will work best in a particular environment and examine the trade-offs carefully.

The general purpose architecture can be broken into three classifications, depending on where the operands are located. Memory-memory architectures may have two or three operands in memory, allowing an instruction to perform an operation without requiring any operand to be in a register. Register-memory architectures require a mix, where at least one operand is in a register and one is in memory. Load-store architectures require data to be moved into registers before any operations on that data are performed. Intel and Motorola are examples of register-memory architectures; Digital Equipment's VAX architecture allows memory-memory operations; and SPARC, MIPS, Alpha, and PowerPC are all load-store machines.

Given that most architectures today are GPR-based, we now examine two major instruction set characteristics that divide general purpose register architectures. Those two characteristics are the number of operands and how the operands are addressed. In Section 5.2.4 we look at the instruction length and the number of operands an instruction can have. (Two or three operands are the most common for GPR architectures, and we compare these to zero- and one-operand architectures.) We then investigate instruction types. Finally, in Section 5.4 we investigate the various addressing modes available.

5.2.4 Number of Operands and Instruction Length

The traditional method for describing a computer architecture is to specify the maximum number of operands, or addresses, contained in each instruction. This has a direct impact on the length of the instruction itself. MARIE uses a fixed-length instruction with a 4-bit opcode and a 12-bit operand. Instructions on current architectures can be formatted in two ways:
• Fixed length: wastes space but is fast and results in better performance when instruction-level pipelining is used, as we will see in Section 5.5.
• Variable length: more complex to decode but saves storage space.

Typically, the real-life compromise involves using two or three instruction lengths, which provides bit patterns that are easily distinguishable and simple to decode. The instruction length must also be compared to the word length on the machine. If the instruction length is exactly equal to the word length, the instructions align perfectly when stored in main memory. Instructions always need to be word aligned for addressing reasons. Therefore, instructions that are half, quarter, double, or triple the actual word size can waste space. Variable-length instructions are clearly not all the same length and need to be word aligned as well, which also results in wasted space.

The most common instruction formats include zero, one, two, or three operands. We saw in Chapter 4 that some instructions for MARIE have no operands, whereas others have one operand. Arithmetic and logic operations typically have two operands, but can be executed with one operand (as we saw in MARIE) if the accumulator is implicit. We can extend this idea to three operands


Page 236:
if we consider the final destination to be a third operand. We can also use a stack, which allows us to have zero-operand instructions. The following are some common instruction formats:
• OPCODE only (zero addresses)
• OPCODE + 1 address (usually a memory address)
• OPCODE + 2 addresses (usually registers, or one register and one memory address)
• OPCODE + 3 addresses (usually registers, or combinations of registers and memory)

All architectures have a limit on the maximum number of operands allowed per instruction. For example, in MARIE, the maximum was one, although some instructions had no operands (Halt and Skipcond). We mentioned that zero-, one-, two-, and three-operand instructions are the most common. One-, two-, and even three-operand instructions are reasonably easy to understand; an entire ISA built on zero-operand instructions can, at first, be somewhat confusing.

Machine instructions that have no operands must use a stack (the last-in, first-out data structure described in detail in Appendix A, where all insertions and deletions are made at the top) to perform those operations that logically require one or two operands (such as an Add). Instead of using general purpose registers, a stack-based architecture stores the operands on the top of the stack, making the top element accessible to the CPU. (Note that one of the most important data structures in machine architectures is the stack. Not only does this structure provide an efficient means of storing intermediate data values during complex calculations, but it also provides an efficient method for passing parameters during procedure calls, as well as a means to save local block structure and define the scope of variables and subroutines.)

In architectures based on stacks, most instructions consist of opcodes only; however, there are special instructions (those that add elements to and remove elements from the stack) that have just one operand. Stack architectures need a push instruction and a pop instruction, each of which is allowed one operand. Push X places the data value found at memory location X onto the stack; Pop X removes the top element of the stack and stores it at location X. Only certain instructions are allowed to access memory; all others must use the stack for any operands required during execution.

For operations requiring two operands, the top two elements of the stack are used. For example, if we execute an Add instruction, the CPU adds the top two elements of the stack, popping them both and then pushing the sum onto the top of the stack. For noncommutative operations such as subtraction, the top stack element is subtracted from the next-to-the-top element, both are popped, and the result is pushed onto the top of the stack.

This stack organization is very effective for evaluating long arithmetic expressions written in reverse Polish notation (RPN). This representation places the operator after the operands in what is known as postfix notation (as compared to infix notation, which places the operator between operands, and prefix notation, which places the operator before the operands). For example:


Page 237:
    X + Y   is in infix notation
    + X Y   is in prefix notation
    X Y +   is in postfix notation

All arithmetic expressions can be written using any of these representations. However, postfix representation combined with a stack of registers is the most efficient means to evaluate arithmetic expressions. In fact, some electronic calculators (such as Hewlett-Packard's) require the user to enter expressions in postfix notation. With a little practice on these calculators, it is possible to rapidly evaluate long expressions containing many nested parentheses without ever stopping to think about how the terms are grouped.

Consider the following expression:

    (X + Y) × (W - Z) + 2

Written in RPN, this becomes:

    X Y + W Z - × 2 +

Notice that the need for parentheses to preserve precedence is eliminated when using RPN.

To illustrate the concepts of zero, one, two, and three operands, let's write a simple program to evaluate an arithmetic expression, using each of these formats.

EXAMPLE 5.1 Suppose we wish to evaluate the following expression:

    Z = (X × Y) + (W × U)

Typically, when three operands are allowed, at least one operand must be a register, and the first operand is normally the destination. Using three-address instructions, the code to evaluate the expression for Z is written as follows:

    Mult  R1, X, Y
    Mult  R2, W, U
    Add   Z, R2, R1

When two-address instructions are used, normally one address specifies a register (two-address instructions seldom allow both operands to be memory addresses). The other operand can be either a register or a memory address. Using two-address instructions, our code becomes:

    Load  R1, X
    Mult  R1, Y
    Load  R2, W
    Mult  R2, U
    Add   R1, R2
    Store Z, R1
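Returning to the RPN expression (X + Y) × (W - Z) + 2 shown above, the following Python sketch is illustrative only (it is not part of the original text): it evaluates a postfix token list with an explicit stack, exactly the way a stack machine or an RPN calculator would. The variable values X = 5, Y = 3, W = 10, Z = 4 are arbitrary assumptions chosen for the example.

    def eval_rpn(tokens, env):
        # Evaluate a postfix expression using a stack.
        stack = []
        ops = {"+": lambda a, b: a + b,
               "-": lambda a, b: a - b,
               "*": lambda a, b: a * b}
        for tok in tokens:
            if tok in ops:
                b = stack.pop()          # top of stack is the second operand
                a = stack.pop()
                stack.append(ops[tok](a, b))
            elif tok in env:
                stack.append(env[tok])   # a variable: push its value
            else:
                stack.append(int(tok))   # a literal constant
        return stack.pop()

    env = {"X": 5, "Y": 3, "W": 10, "Z": 4}
    print(eval_rpn("X Y + W Z - * 2 +".split(), env))   # (5+3)*(10-4)+2 = 50

Note that subtraction pops the top element (Z) and subtracts it from the next-to-the-top element (W), matching the rule for noncommutative operations described above.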


Page 238:
Note that for the two- and three-address programs above, it is important to know whether the first operand is the source or the destination. In those instructions, we assume it is the destination. (This tends to be a point of confusion for programmers who must switch between Intel assembly language and Motorola assembly language: Intel assembly specifies the first operand as the destination, whereas in Motorola assembly the first operand is the source.)

Using one-address instructions (as in MARIE), we must assume a register (normally the accumulator) is implied as the destination for the result of the instruction. To evaluate Z, our code now becomes:

    Load  X
    Mult  Y
    Store Temp
    Load  W
    Mult  U
    Add   Temp
    Store Z

Note that as we reduce the number of operands allowed per instruction, the number of instructions required to execute the desired code increases. This is an example of a typical space/time trade-off in architecture design: shorter instructions but longer programs.

What does this program look like on a stack-based machine with zero-address instructions? Stack-based architectures use no operands for instructions such as Add, Subt, Mult, or Divide. We need a stack and two operations on that stack: Pop and Push. Operations that communicate with the stack must have an address field to specify the operand to be pushed onto or popped off the stack (all other operations are zero-address). Push places the operand on the top of the stack; Pop removes the stack top and places it in the operand. This architecture results in the longest program to evaluate our equation. Assuming arithmetic operations use the two operands from the stack top, pop them, and push the result of the operation, our code is as follows:

    Push  X
    Push  Y
    Mult
    Push  W
    Push  U
    Mult
    Add
    Store Z
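To make the zero-address idea concrete, the following Python sketch simulates a toy stack machine running the program above. It is illustrative only: the memory values chosen for X, Y, W, and U are arbitrary assumptions, and the instruction names simply mirror the listing above rather than any real ISA.

    def run_stack_machine(program, memory):
        stack = []
        for op, *arg in program:
            if op == "Push":
                stack.append(memory[arg[0]])   # push the value at the named location
            elif op == "Store":
                memory[arg[0]] = stack.pop()   # pop the stack top into memory
            elif op == "Mult":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == "Add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
        return memory

    memory = {"X": 2, "Y": 3, "W": 4, "U": 5, "Z": 0}
    program = [("Push", "X"), ("Push", "Y"), ("Mult",),
               ("Push", "W"), ("Push", "U"), ("Mult",),
               ("Add",), ("Store", "Z")]
    print(run_stack_machine(program, memory)["Z"])   # 2*3 + 4*5 = 26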


Page 239:
Instruction length is certainly affected by the opcode length and by the number of operands allowed in the instruction. If the opcode length is fixed, decoding is much easier. However, to provide for flexibility and backward compatibility, opcodes can have variable length. Variable-length opcodes present the same problems as variable versus constant length instructions. A compromise used by many designers is expanding opcodes.

5.2.5 Expanding Opcodes

Expanding opcodes represent a compromise between the need for a rich set of opcodes and the desire to have short opcodes, and thus short instructions. The idea is to make some opcodes short, but have a means to provide longer ones when needed. When the opcode is short, a lot of bits are left over to hold operands (which means we could have two or three operands per instruction). When you don't need any space for operands (for an instruction such as Halt, or because the machine uses a stack), all the bits can be used for the opcode, which allows for many unique instructions. In between, there are longer opcodes with fewer operands as well as shorter opcodes with more operands.

Consider a machine with 16-bit instructions and 16 registers. Because we now have a register set instead of one simple accumulator (as in MARIE), we need to use 4 bits to specify a unique register. We could encode 16 instructions, each with 3 register operands (which implies that any data to be operated on must first be loaded into a register), or use 4 bits for the opcode and 12 bits for a memory address (as in MARIE, assuming a memory of size 4K). Any memory reference requires 12 bits, leaving only 4 bits for other purposes. However, if all data in memory is first loaded into a register in this register set, the instruction can select a particular data element using only 4 bits (assuming 16 registers). These two choices are illustrated in Figure 5.2.

[FIGURE 5.2: Two possibilities for a 16-bit instruction format. One uses an opcode with three register addresses (Address 1, Address 2, Address 3); the other uses an opcode with a single memory address (Address 1).]

Suppose we wish to encode the following instructions:
• 15 instructions with 3 addresses
• 14 instructions with 2 addresses


Page 240:
• 31 instructions with 1 address
• 16 instructions with 0 addresses

Can we encode this instruction set in 16 bits? The answer is yes, as long as we use expanding opcodes. The encoding is as follows:

    0000 R1 R2 R3
      ...                 15 3-address codes
    1110 R1 R2 R3

    1111 0000 R1 R2
      ...                 14 2-address codes
    1111 1101 R1 R2

    1111 1110 0000 R1
      ...                 31 1-address codes
    1111 1111 1110 R1

    1111 1111 1111 0000
      ...                 16 0-address codes
    1111 1111 1111 1111

This expanding opcode scheme makes decoding more complex. Instead of simply looking at a bit pattern and deciding which instruction it is, we need to decode the instruction something like this:

    if (leftmost four bits != 1111) {
        Execute appropriate three-address instruction
    } else if (leftmost seven bits != 1111 111) {
        Execute appropriate two-address instruction
    } else if (leftmost twelve bits != 1111 1111 1111) {
        Execute appropriate one-address instruction
    } else {
        Execute appropriate zero-address instruction
    }

At each stage, one spare code is used to indicate that we should now look at more bits. This is another example of the types of trade-offs hardware designers continually face: here, we trade opcode space for operand space.
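The decoding logic above can be expressed directly in code. The Python sketch below is illustrative only; the 4-bit register fields are taken from the discussion above, and the function merely classifies a 16-bit word under this particular expanding-opcode scheme rather than implementing any real machine.

    def decode(instr):
        # Classify a 16-bit instruction word under the expanding-opcode scheme
        # above and split out its opcode and 4-bit register operands.
        assert 0 <= instr <= 0xFFFF
        if (instr >> 12) != 0b1111:                  # leftmost 4 bits
            return ("3-address", instr >> 12,
                    ((instr >> 8) & 0xF, (instr >> 4) & 0xF, instr & 0xF))
        if (instr >> 9) != 0b1111111:                # leftmost 7 bits
            return ("2-address", instr >> 8, ((instr >> 4) & 0xF, instr & 0xF))
        if (instr >> 4) != 0b111111111111:           # leftmost 12 bits
            return ("1-address", instr >> 4, (instr & 0xF,))
        return ("0-address", instr, ())

    print(decode(0b0000_0001_0010_0011))   # a 3-address instruction (opcode 0000)
    print(decode(0b1111_0000_0001_0010))   # a 2-address instruction (opcode 1111 0000)
    print(decode(0b1111_1110_0000_0001))   # a 1-address instruction (opcode 1111 1110 0000)
    print(decode(0b1111_1111_1111_0000))   # a 0-address instruction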


Page 241:
5.3 INSTRUCTION TYPES

Most computer instructions operate on data; however, there are some that do not. Computer manufacturers regularly group instructions into the following categories:
• Data movement
• Arithmetic
• Boolean
• Bit manipulation (shift and rotate)
• I/O
• Transfer of control
• Special purpose

Data movement instructions are the most frequently used instructions. Data is moved from memory into registers, from registers to registers, and from registers to memory, and many machines provide different instructions depending on the source and destination. For example, there may be a MOVE instruction that always requires two register operands, whereas another MOVE instruction allows one register and one memory operand. Some architectures, such as RISC, limit the instructions that can move data to and from memory, in an attempt to speed up execution. Many machines have variations of load, store, and move instructions to handle data of different sizes. For example, there may be a LOADB instruction for dealing with bytes and a LOADW instruction for handling words.

Arithmetic operations include those instructions that use integers and floating point numbers. Many instruction sets provide different arithmetic instructions for various data sizes. As with data movement instructions, there are sometimes different instructions for providing various combinations of register and memory accesses in different addressing modes.

Boolean logic instructions perform Boolean operations. Typically there are instructions for performing AND and NOT, and often OR and XOR, operations.

Bit manipulation instructions are used for setting and resetting individual bits (or sometimes groups of bits) within a given data word. These include both arithmetic and logical shift instructions and rotate instructions, each to the left and to the right. Logical shift instructions simply shift bits to either the left or the right by a specified amount, shifting in zeros from the opposite end. Arithmetic shift instructions, commonly used to multiply or divide by 2, do not shift the leftmost bit, because this bit represents the sign of the number. On an arithmetic right shift, the sign bit is replicated to fill in the vacated bit positions. On an arithmetic left shift, values are shifted left and zeros are shifted in, but the sign bit is never shifted. Rotate instructions are simply shift instructions that shift in the bits that are shifted out. For example, on a rotate left of 1 bit, the leftmost bit is shifted out and rotated around to become the rightmost bit.
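The difference between logical shifts, arithmetic shifts, and rotates is easy to demonstrate on a small word. The Python sketch below is illustrative only: an 8-bit word size is assumed for compactness (a real machine performs these operations in hardware on its own word size).

    WIDTH = 8
    MASK = (1 << WIDTH) - 1

    def logical_shift_right(x, n):
        return (x & MASK) >> n                      # zeros enter from the left

    def arithmetic_shift_right(x, n):
        sign = x & (1 << (WIDTH - 1))
        result = (x & MASK) >> n
        if sign:                                    # replicate the sign bit
            result |= (MASK << (WIDTH - n)) & MASK
        return result

    def rotate_left(x, n):
        x &= MASK                                   # bits shifted out re-enter on the right
        return ((x << n) | (x >> (WIDTH - n))) & MASK

    x = 0b1001_0110                                 # -106 as an 8-bit two's complement value
    print(format(logical_shift_right(x, 2), "08b"))     # 00100101
    print(format(arithmetic_shift_right(x, 2), "08b"))  # 11100101 (sign preserved)
    print(format(rotate_left(x, 1), "08b"))             # 00101101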


Page 242:
I/O instructions vary greatly from architecture to architecture. The basic schemes for handling I/O are programmed I/O, interrupt-driven I/O, and DMA. These are covered in more detail in Chapter 7.

Control instructions include branches, skips, and procedure calls. Branching can be unconditional or conditional. Skip instructions are basically branch instructions with implied addresses. Because no operand is required, skip instructions often use bits of the address field to specify different situations (recall the Skipcond instruction used by MARIE). Procedure calls are special branch instructions that automatically save the return address. Different machines use different methods to save this address. Some store the address at a specific location in memory, others store it in a register, and still others push the return address on a stack. We have already seen that stacks can be used for other purposes as well.

Special purpose instructions include those used for string processing, high-level language support, protection, flag control, and cache management. Most architectures provide instructions for string processing, including string manipulation and searching.

5.4 ADDRESSING

Although addressing is an instruction design issue and is technically part of the instruction format, there are so many issues involved with addressing that it merits its own section. We now present the two most important of these addressing issues: the types of data that can be addressed and the various addressing modes. We cover only the fundamental addressing modes; more specialized modes are built using the basic modes in this section.

5.4.1 Data Types

There must be hardware support for a particular data type if an instruction is to reference that type. In Chapter 2 we discussed data types, including numbers and characters. Numeric data consists of integers and floating point values. Integers can be signed or unsigned and can be declared in various lengths. For example, in C++ integers can be short (16 bits), int (the word size of the given architecture), or long (32 bits). Floating point numbers have lengths of 32, 64, or 128 bits. It is not uncommon for ISAs to specify special instructions to deal with numeric data of varying lengths, as we have seen earlier. For example, there might be one MOVE for 16-bit integers and a different MOVE for 32-bit integers.

Nonnumeric data types consist of strings, Booleans, and pointers. String instructions typically include operations such as copy, move, search, or modify. Boolean operations include AND, OR, XOR, and NOT. Pointers are actually addresses in memory. Even though they are, in reality, numeric in nature, pointers are treated differently than integers and floating point numbers. MARIE allows


Page 243:
for this data type through the use of the indirect addressing mode. The operands in instructions using this mode are actually pointers. In an instruction using a pointer, the operand is essentially an address and is treated as such.

5.4.2 Addressing Modes

The 12 bits in the operand field of a MARIE instruction can be interpreted in two different ways: the 12 bits represent either the memory address of the operand or a pointer to a physical memory address. These 12 bits can be interpreted in many other ways, thus providing us with several different addressing modes. Addressing modes allow us to specify where the instruction operands are located. An addressing mode can specify a constant, a register, or a location in memory. Certain modes allow shorter addresses, and some allow us to determine the location of the actual operand, often called the effective address of the operand, dynamically. We now investigate the most basic addressing modes.

Immediate Addressing

Immediate addressing is so named because the value to be referenced immediately follows the operation code in the instruction. That is to say, the data to be operated on is part of the instruction. For example, if the addressing mode of the operand is immediate and the instruction is Load 008, the numeric value 8 is loaded into the AC. The 12 bits of the operand field do not specify an address; they specify the actual operand the instruction requires. Immediate addressing is very fast because the value to be loaded is included in the instruction. However, because the value to be loaded is fixed at compile time, it is not very flexible.

Direct Addressing

Direct addressing is so named because the value to be referenced is obtained by specifying its memory address directly in the instruction. For example, if the addressing mode of the operand is direct and the instruction is Load 008, the data value found at memory address 008 is loaded into the AC. Direct addressing is typically quite fast because, although the value to be loaded is not included in the instruction, it is quickly accessible. It is also much more flexible than immediate addressing because the value to be loaded is whatever is found at the given address, which may be variable.

Register Addressing

In register addressing, a register, instead of memory, is used to specify the operand. This is very similar to direct addressing, except that instead of a memory address, the address field contains a register reference. The contents of that register are used as the operand.

Indirect Addressing

Indirect addressing is a powerful addressing mode that provides an exceptional level of flexibility. In this mode, the bits in the address field specify a memory


Page 244:
address that is to be used as a pointer. The effective address of the operand is found by going to this memory address. For example, if the addressing mode of the operand is indirect and the instruction is Load 008, the data value found at memory address 008 is actually the effective address of the desired operand. Suppose we find the value 2A0 stored at location 008. 2A0 is the "real" address of the value we want. The value found at location 2A0 is then loaded into the AC.

In a variation on this scheme, the operand bits specify a register instead of a memory address. This mode, known as register indirect addressing, works exactly the same way as indirect addressing mode, except that it uses a register instead of a memory address to point to the data. For example, if the instruction is Load R1 and we are using register indirect addressing mode, we would find the effective address of the desired operand in R1.

Indexed and Based Addressing

In indexed addressing mode, an index register (either explicitly or implicitly designated) is used to store an index (or offset), which is added to the operand, resulting in the effective address of the data. For example, if the operand X of the instruction Load X is to be addressed using indexed addressing, assuming R1 is the index register and holds the value 1, the effective address of the operand is actually X + 1. Based addressing mode is similar, except that a base address register, rather than an index register, is used. In theory, the difference between these two modes is in how they are used, not in how the operands are computed. An index register holds an index that is used as an offset relative to the address given in the address field of the instruction. A base register holds a base address, where the address field represents a displacement from this base. These two addressing modes are quite useful for accessing array elements as well as characters in strings. In fact, most assembly languages provide special index registers that are implied in many string operations. Depending on the instruction set design, general purpose registers may also be used in this mode. We have already seen how this works in Section 5.2.4.

Additional Addressing Modes

Many variations on the above schemes exist. For example, some machines have indexed indirect addressing, which uses both indirect and indexed addressing at the same time. There is also base/offset addressing, which adds an offset to a specific base register and then adds this to the specified operand, resulting in the effective address of the actual operand to be used in the instruction. There are also auto-increment and auto-decrement modes. These modes automatically increment or decrement the register being used, thus reducing the code size, which can be extremely important in applications such as embedded systems. Self-relative addressing computes the address of the operand as an offset from the current instruction. Additional modes exist; however, familiarity with the immediate, direct,


Page 245:
register, indirect, indexed, and stack addressing modes goes a long way toward helping you understand any addressing mode you may encounter.

Let's look at an example to illustrate these various modes. Suppose we have the instruction Load 800 and the memory contents and register R1 shown in Figure 5.3.

[FIGURE 5.3: The contents of memory when Load 800 is executed. Memory address 800 holds 900, address 900 holds 1000, address 1000 holds 500, address 1100 holds 600, address 1600 holds 700, and register R1 holds 800.]

Applying the various addressing modes to the operand field containing the 800, and assuming R1 is implied in the indexed addressing mode, the value actually loaded into the AC is shown in Table 5.1.

    Mode        Value Loaded into AC
    Immediate   800
    Direct      900
    Indirect    1000
    Indexed     700

    TABLE 5.1 Results of using various addressing modes on the memory in Figure 5.3

The instruction Load R1, using register addressing mode, loads an 800 into the accumulator; using register indirect addressing mode, it loads a 900 into the accumulator.

We summarize the addressing modes in Table 5.2.

    Addressing Mode      To Find the Operand
    Immediate            Operand value present in the instruction
    Direct               Effective address of the operand given in the address field
    Register             Operand value located in a register
    Indirect             Address field points to the address of the actual operand
    Register indirect    Register contains the address of the actual operand
    Indexed or based     Effective address of the operand generated by adding a value in the address field to the contents of a register
    Stack                Operand located on the stack

    TABLE 5.2 A summary of the basic addressing modes

The various addressing modes allow us to specify a much larger range of locations than if we were limited to using only one or two modes. As always, there are trade-offs: we sacrifice simplicity in address calculation and limited memory references for flexibility and increased address range.
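The lookups in Table 5.1 can be mimicked with a small Python sketch (illustrative only; the dictionary below simply encodes the memory contents of Figure 5.3, and the operand field value 800 comes from the instruction Load 800).

    # Memory contents of Figure 5.3 (addresses and values in decimal) and R1 = 800.
    memory = {800: 900, 900: 1000, 1000: 500, 1100: 600, 1600: 700}
    R1 = 800
    operand_field = 800                                     # the instruction is "Load 800"

    modes = {
        "immediate": operand_field,                         # the operand field IS the value
        "direct":    memory[operand_field],                 # value stored at address 800
        "indirect":  memory[memory[operand_field]],         # address 800 holds a pointer
        "indexed":   memory[operand_field + R1],            # effective address 800 + R1
    }
    for mode, loaded in modes.items():
        print(f"{mode:9s} -> AC = {loaded}")
    # immediate -> 800, direct -> 900, indirect -> 1000, indexed -> 700 (Table 5.1)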


Page 246:
5.5 INSTRUCTION-LEVEL PIPELINING

By now you should be reasonably familiar with the fetch-decode-execute cycle presented in Chapter 4. Conceptually, each pulse of the computer's clock is used to control one step in the sequence, but sometimes additional pulses can be used to control smaller details within a single step. Some CPUs break the fetch-decode-execute cycle down into smaller steps, where some of these smaller steps can be performed in parallel. This overlapping speeds up execution. This method, used by all current CPUs, is known as pipelining. Suppose, for example, that the fetch-decode-execute cycle is broken into the following "mini-steps":
1. Fetch instruction
2. Decode opcode
3. Calculate effective address of operands
4. Fetch operands
5. Execute instruction
6. Store result

Pipelining is analogous to an automobile assembly line. Each step in a computer pipeline completes a part of an instruction. Like the automobile assembly line, different steps are completing different parts of different instructions in parallel. Each of the steps is called a pipeline stage. The stages are connected to form a pipe. Instructions enter at one end, progress through the various stages, and exit at the other end. The goal is to balance the time taken by each pipeline stage (i.e., each stage should take more or less the same time as every other stage). If the stages are not balanced in time, after a while the faster stages will be waiting on the slower ones. To see a real-life example of this imbalance, consider the stages of doing laundry. If you have only one washer and one dryer, you usually end up waiting on the dryer. If you consider washing as the first stage and drying as the next, you can see that the longer drying stage causes clothes to pile up between the two stages. If you add folding clothes as a third stage, you soon realize that this stage would constantly be waiting on the other, slower stages.

Figure 5.4 provides an illustration of computer pipelining with overlapping stages. We see each clock cycle and each stage for each instruction (where S1 represents the fetch, S2 the decode, S3 the calculate-address step, S4 the operand fetch, S5 the execute step, and S6 the store step).
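The overlap shown in Figure 5.4 can be generated programmatically. The Python sketch below is illustrative only (it is not part of the original text): it prints which stage each instruction occupies on each clock cycle for an assumed six-stage pipeline, which is exactly the pattern of Figure 5.4 for four instructions.

    def pipeline_chart(n_instructions, n_stages=6):
        # Instruction i enters stage S1 on clock cycle i + 1 and advances
        # one stage per cycle; blank cells mean the instruction is not in the pipe.
        total_cycles = n_stages + n_instructions - 1
        for i in range(n_instructions):
            row = []
            for cycle in range(1, total_cycles + 1):
                stage = cycle - i
                row.append(f"S{stage}" if 1 <= stage <= n_stages else "  ")
            print(f"I{i + 1}: " + " ".join(row))

    pipeline_chart(4)
    # I1: S1 S2 S3 S4 S5 S6
    # I2:    S1 S2 S3 S4 S5 S6
    # I3:       S1 S2 S3 S4 S5 S6
    # I4:          S1 S2 S3 S4 S5 S6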


Page 247:
[FIGURE 5.4: Four instructions going through a six-stage pipeline. Instruction 1 occupies stages S1 through S6 during clock cycles 1 through 6; instruction 2 begins S1 in cycle 2, instruction 3 in cycle 3, and instruction 4 in cycle 4, finishing S6 in cycle 9.]

We see from Figure 5.4 that once instruction 1 has been fetched and is in the process of being decoded, we can start the fetch on instruction 2. When instruction 1 is fetching operands and instruction 2 is being decoded, we can start the fetch on instruction 3. Notice that these events can occur in parallel, very much like an automobile assembly line.

Suppose we have a k-stage pipeline. Assume the clock cycle time is tp; that is, it takes tp time per stage. Assume also that we have n instructions (often called tasks) to process. Task 1 (T1) requires k × tp time to complete. The remaining n - 1 tasks emerge from the pipeline one per cycle, which implies a total time for these tasks of (n - 1)tp. Therefore, to complete n tasks using a k-stage pipeline requires:

    (k × tp) + (n - 1)tp = (k + n - 1)tp

or k + (n - 1) clock cycles.

Let's calculate the speedup we gain using a pipeline. Without a pipeline, the time required is n × tn cycles, where tn = k × tp. Therefore, the speedup (time without a pipeline divided by the time using a pipeline) is:

    speedup S = (n × tn) / ((k + n - 1) × tp)

If we take the limit of this as n approaches infinity, we see that (k + n - 1) approaches n, which results in a theoretical speedup of:

    speedup = (k × tp) / tp = k

The theoretical speedup, k, is the number of stages in the pipeline.
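The cycle counts and the speedup formula can be checked with a few lines of Python (illustrative only). Note how, for the six-stage pipeline of Figure 5.4, four instructions take 9 cycles (k + n - 1), and how the speedup approaches k as n grows.

    def pipeline_cycles(k, n):
        # Clock cycles to run n instructions through a k-stage pipeline.
        return k + (n - 1)

    def speedup(k, n):
        # Non-pipelined time (n * k cycles) divided by pipelined time.
        return (n * k) / pipeline_cycles(k, n)

    k = 6                                   # the six stages used in Figure 5.4
    for n in (4, 100, 10_000):
        print(n, pipeline_cycles(k, n), round(speedup(k, n), 2))
    # 4 instructions -> 9 cycles, speedup 2.67; 10,000 instructions -> speedup approaching 6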


Page 248:
Let's look at an example. Suppose we have a four-stage pipeline, where:
• S1 = fetch instruction
• S2 = decode and calculate effective address
• S3 = fetch operand
• S4 = execute instruction and store results

We must also assume the architecture provides a means to fetch data and instructions in parallel. This can be done with separate instruction and data paths; however, most memory systems do not allow this. Instead, they provide the operand in cache, which, in most cases, allows the instruction and the operand to be fetched simultaneously. Suppose, also, that instruction I3 is a conditional branch that alters the execution sequence (so that instead of I4 running next, control is transferred to I8). This results in the pipeline behavior shown in Figure 5.5.

[FIGURE 5.5: Example of an instruction pipeline with a conditional branch. Instructions 1 through 3 proceed through stages S1 to S4 in successive time periods; instructions 4 through 6 are fetched but discarded once the branch (instruction 3) executes; instruction 8 is fetched in time period 6 and completes in time period 9.]

Note that I4, I5, and I6 are fetched and proceed through various stages, but after the execution of I3 (the branch), I4, I5, and I6 are no longer needed. Only after time period 6, when the branch has been executed, can the next instruction to be executed (I8) be fetched, after which the pipe refills. From time periods 6 through 9, only one instruction has executed. In a perfect world, for each time period after the pipe originally fills, one instruction should flow out of the pipeline. However, we see in this example that this is not necessarily true.

Please note that not all instructions must go through each stage of the pipe. If an instruction has no operand, there is no need for stage 3. To simplify the pipelining hardware and timing, all instructions proceed through all stages, whether necessary or not.

From our calculation of speedup, it would seem that the more stages that exist in the pipeline, the faster everything will run. This is true up to a point. There is a fixed overhead involved in moving data from memory to registers. The amount of control logic for the pipeline also increases in size proportionally to the number of stages, thus slowing down total execution. In addition, there are several conditions that result in pipeline conflicts, which keep us from reaching the goal of executing one instruction per clock cycle. These include:
• Resource conflicts
• Data dependencies
• Conditional branch statements

Resource conflicts are a major concern in instruction-level parallelism. For example, if one instruction is storing a value to memory while another is being fetched from


Page 249:
memory, both need access to memory. Typically this is resolved by allowing the instruction executing to continue, while forcing the instruction fetch to wait. Certain conflicts can also be resolved by providing two separate pathways: one for data coming from memory and one for instructions coming from memory.

Data dependencies arise when the result of one instruction, not yet available, is to be used as an operand to a following instruction. There are several ways to handle these types of pipeline conflicts. Special hardware can be added to detect instructions whose source operands are destinations for instructions further up the pipeline. This hardware can insert a brief delay (typically a no-op instruction that does nothing) into the pipeline, allowing enough time to pass to resolve the conflict. Specialized hardware can also be used to detect these conflicts and route data through special paths that exist between various stages of the pipeline. This reduces the time necessary for the instruction to access the required operand. Some architectures address this problem by letting the compiler resolve the conflict. Compilers have been designed to reorder instructions, resulting in a delay of loading any conflicting data but having no effect on the program logic or output.

Branch instructions allow us to alter the flow of execution in a program, which, in terms of pipelining, causes major problems. If instructions are fetched one per clock cycle, several can be fetched and even decoded before a preceding instruction, indicating a branch, is executed. Conditional branching is particularly difficult to deal with. Many architectures offer branch prediction, using logic to make the best guess as to which instructions will be needed next (essentially, they are predicting the outcome of a conditional branch). Compilers also attempt to resolve branching problems by rearranging the machine code to cause a delayed branch. An attempt is made to reorder and insert useful instructions, but if that is not possible, no-op instructions are inserted to keep the pipeline full. Another approach used by some machines with a conditional branch is to start fetching on both paths of the branch and save them until the branch is actually executed, at which time the "true" execution path will be known.

In an effort to squeeze even more performance out of the chip, modern CPUs employ superscalar design (introduced in Chapter 4), which is one step beyond pipelining. Superscalar chips have multiple ALUs and issue more than one instruction in each clock cycle. The clock cycles per instruction can actually go below one. But the logic to keep track of hazards becomes even more complex; more logic is needed to schedule operations than to do them. Even with complex logic, it is hard to schedule parallel operations "on the fly."

The limits of dynamic scheduling have led machine designers to consider a very different architecture, explicitly parallel instruction computing (EPIC), exemplified by the Itanium architecture discussed in Chapter 4. EPIC machines have very large instructions (recall that the instructions for the Itanium are 128 bits), which specify several operations to be done in parallel. Because of the parallelism inherent in the design, the EPIC instruction set is heavily compiler dependent (which means a user needs a sophisticated compiler to take advantage of the parallelism in order to gain significant performance benefits).
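As a rough illustration of the no-op approach described above, the Python sketch below inserts NOPs whenever an instruction reads a register written by one of the immediately preceding instructions. Everything about it is a simplifying assumption made for this sketch only: the two-instruction latency, the (destination, sources) tuple format, and the register names are all invented, and real hazard-detection hardware and compiler schedulers are considerably more sophisticated.

    def insert_nops(program, latency=2):
        # program: list of (destination, sources) tuples, in program order.
        # Insert NOPs so no instruction reads a register written by one of the
        # previous `latency` scheduled instructions.
        scheduled = []
        for dest, sources in program:
            recent = [d for d, _ in scheduled[-latency:] if d is not None]
            while any(src in recent for src in sources):
                scheduled.append((None, ()))                 # a NOP
                recent = [d for d, _ in scheduled[-latency:] if d is not None]
            scheduled.append((dest, sources))
        return scheduled

    # R1 = M[X]; R2 = R1 + R3 -- the second instruction needs R1 immediately.
    program = [("R1", ("X",)), ("R2", ("R1", "R3"))]
    for dest, srcs in insert_nops(program):
        print("NOP" if dest is None else f"{dest} <- {srcs}")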


Page 250:
The burden of scheduling operations is shifted from the processor to the compiler, and much more time can be spent developing a good schedule and analyzing potential pipeline conflicts.

To reduce the pipelining problems caused by conditional branches, the IA-64 introduced predicated instructions. Comparison instructions set predicate bits, much as they set condition codes on the x86 machine (except that there are 64 predicate bits). Each operation specifies a predicate bit; it is executed only if the predicate bit equals 1. In practice, all operations are performed, but the result is stored in the register file only if the predicate bit equals 1. The result is that more instructions are executed, but we don't have to stall the pipeline waiting for a condition.

There are several levels of parallelism, varying from the simple to the more complex. All computers exploit parallelism to some degree. Instructions use words as operands (where words are typically 16, 32, or 64 bits in length), rather than acting on single bits at a time. More advanced types of parallelism require more specific and complex hardware and operating system support. Although an in-depth study of parallelism is beyond the scope of this text, we would like to take a brief look at what we consider the two extremes of parallelism: program-level parallelism (PLP) and instruction-level parallelism (ILP).

PLP actually allows parts of a program to run on more than one computer. This may sound simple, but it requires coding the algorithm correctly so that this parallelism is possible, in addition to providing careful synchronization between the various modules.

ILP involves the use of techniques to allow the execution of overlapping instructions. Essentially, we want to allow more than one instruction within a single program to execute concurrently. There are two kinds of ILP. The first type decomposes an instruction into stages and overlaps these stages; this is exactly what pipelining does. The second kind of ILP allows individual instructions to overlap (that is, the processor itself can execute instructions at the same time).

In addition to pipelining, superscalar, superpipelined, and very long instruction word (VLIW) architectures exhibit ILP. Superscalar architectures (as you may recall from Chapter 4) perform multiple operations at the same time by employing parallel pipelines. Examples of superscalar architectures include IBM's PowerPC, Sun's UltraSparc, and DEC's Alpha. Superpipelined architectures combine superscalar concepts with pipelining by dividing the pipeline stages into smaller pieces. The IA-64 architecture exhibits a VLIW architecture, which means each instruction can specify multiple scalar operations (the compiler puts multiple operations into a single instruction). Superscalar and VLIW machines fetch and execute more than one instruction per cycle.

5.6 REAL-WORLD EXAMPLES OF ISAs

In this section we look at the ISAs of two real-world architectures, Intel and MIPS, and examine how each addresses the issues presented in this chapter: instruction formats, instruction types, number of operands,


Page 251:
addressing, and pipelining. We will also introduce the Java Virtual Machine to illustrate how software can create an ISA abstraction that completely hides the actual ISA of the machine.

5.6.1 Intel

Intel uses a little endian, two-address, variable-length instruction architecture. Intel processors use a register-memory architecture, which means all instructions can operate on a memory location, but the other operand must be a register. This ISA allows variable-length operations, operating on data with lengths of 1, 2, or 4 bytes.

The 8086 through the 80486 are single-stage pipeline architectures. The architects reasoned that if one pipeline was good, two would be better. The Pentium had two parallel five-stage pipelines, called the U pipe and the V pipe, to execute instructions. Stages for these pipelines include prefetch, instruction decode, address generation, execute, and write back. To be effective, these pipelines must be kept filled, which requires instructions that can be issued in parallel. It is the compiler's responsibility to make sure this parallelism happens.

The Pentium II increased the number of pipeline stages to 12, the last of which is retirement. Most of the new stages were added to address Intel's MMX technology, an extension to the architecture that handles multimedia data. The Pentium III increased the stages to 14, and the Pentium IV to 24. Additional stages (beyond those presented in this chapter) included stages for determining the length of the instruction, stages for creating micro-operations, and stages to "commit" the instruction (make sure it executes and the results become permanent). The Itanium contains only a 10-stage instruction pipeline.

Intel processors allow for the basic addressing modes presented in this chapter, in addition to many combinations of those modes. The 8086 provided 17 different ways to access memory, most of which were variants of the basic modes. Surprisingly, the IA-64 is extremely limited in its memory addressing modes. It has only one: register indirect (with optional post-increment). This seems extremely limiting, but it follows the RISC philosophy. Addresses are computed and stored in general purpose registers. The more complex addressing modes require specialized hardware; by limiting the number of addressing modes, the IA-64 architecture minimizes the need for this specialized hardware.

5.6.2 MIPS

The MIPS architecture is a little endian, word-addressable, three-address, fixed-length ISA. It is a load-store architecture, which means only the load and store instructions can access memory. All other instructions must use registers for operands, which implies that this ISA needs a large register set. MIPS is


Page 252:
also limited to fixed-length operations (those that operate on data with the same number of bytes).

Some MIPS processors (such as the R2000 and R3000) have five-stage pipelines. The R4000 and R4400 have 8-stage superpipelines. The R10000 is quite interesting in that the number of stages in the pipeline depends on the functional unit through which the instruction must pass: there are five stages for integer instructions, six for load/store instructions, and seven for floating-point instructions. Both the MIPS 5000 and 10000 are superscalar.

MIPS has a straightforward ISA with five basic types of instructions: simple arithmetic (add, XOR, NAND, shift), data movement (load, store, move), control (branch, jump), multicycle (multiply, divide), and miscellaneous instructions (save PC, save register on condition). MIPS programmers can use immediate, register, direct, register indirect, base, and indexed addressing modes. However, the ISA itself provides for only one of these (base addressing); the remaining modes are provided by the assembler. MIPS64 has two additional addressing modes for use in embedded-system optimizations.

The MIPS instructions discussed in Chapter 4 had up to four fields: an opcode, two operand addresses, and one result address. Essentially three instruction formats are available: the I type (immediate), the R type (register), and the J type (jump). R-type instructions have a 6-bit opcode, a 5-bit source register, a second 5-bit source register, a 5-bit destination register, a 5-bit shift amount, and a 6-bit function field. I-type instructions have a 6-bit opcode, a 5-bit source register, a 5-bit destination register or branch condition, and a 16-bit immediate value, branch displacement, or address displacement. J-type instructions have a 6-bit opcode and a 26-bit target address.
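As an illustration of the R-type field layout just described, here is a short Python sketch (not from the original text) that splits a 32-bit word into those fields. The example encoding assumes the conventional MIPS register numbering ($t0 = register 8, $t1 = 9, $t2 = 10), which is not given in the text.

    def decode_r_type(word):
        # Split a 32-bit MIPS R-type instruction into its fields:
        # 6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function.
        return {
            "opcode": (word >> 26) & 0x3F,
            "rs":     (word >> 21) & 0x1F,   # first source register
            "rt":     (word >> 16) & 0x1F,   # second source register
            "rd":     (word >> 11) & 0x1F,   # destination register
            "shamt":  (word >> 6)  & 0x1F,   # shift amount
            "funct":  word & 0x3F,           # selects the ALU operation
        }

    # add $t2, $t0, $t1 is conventionally encoded as 0x01095020.
    print(decode_r_type(0x01095020))
    # {'opcode': 0, 'rs': 8, 'rt': 9, 'rd': 10, 'shamt': 0, 'funct': 32}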


Page 253:
a Java program is compiled. These bytecodes then become input for the JVM. The JVM can be compared to a giant switch (or case) statement, analyzing one bytecode instruction at a time. Each bytecode instruction causes a jump to a specific block of code that implements the given instruction.

This differs significantly from other high-level languages with which you may be familiar. For example, when you compile a C++ program, the object code produced is for that particular architecture. (Compiling a C++ program results in an assembly language program that is translated to machine code.) If you want to run your C++ program on a different platform, you must recompile it for the target architecture. Compiled languages are translated by the compiler into executable files of binary machine code; once this code has been generated, it can be run only on the target architecture. Compiled languages typically exhibit excellent performance and give very good access to the operating system. Examples of compiled languages include C, C++, Ada, FORTRAN, and COBOL.

Some languages, such as LISP, PHP, Perl, Python, Tcl, and most BASIC languages, are interpreted. The source must be reinterpreted each time the program is run. The trade-off for the platform independence of interpreted languages is slower performance, often by a factor of 100 times. (We will have more to say on this subject in Chapter 8.)

There are also languages that are a bit of both (compiled and interpreted). These are often called P-code languages. The source code written in these languages is compiled into an intermediate form, called P-code, and the P-code is then interpreted. P-code languages typically execute from 5 to 10 times more slowly than compiled languages. Python, Perl, and Java are actually P-code languages, even though they are often referred to as interpreted languages.

Figure 5.6 presents an overview of the Java programming environment: a Java source file (.java) is compiled by javac, the Java compiler, into program class files (.class) containing the actual bytecodes; at run time the JVM — its class loader, the Java API files, and its execution engine — runs those bytecodes.
FIGURE 5.6 The Java programming environment
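Python itself is one of the P-code languages just mentioned, so the same idea can be seen from a Python prompt. The short sketch below is our own illustration (not part of the original text): it compiles a small function and disassembles the resulting bytecode with the standard dis module, much as javap -c does for a Java class in the example that follows.

import dis

def maximum(a, b):
    # Same logic as the Java Maximum example below: return the larger of two values.
    if a > b:
        c = a
    else:
        c = b
    return c

# Show the stack-based bytecode that the CPython virtual machine interprets.
dis.dis(maximum)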


Page 254:
Perhaps more interesting than Java's platform independence, particularly in relation to the topics covered in this chapter, is the fact that Java's bytecode is a stack-based language, partially composed of zero-address instructions. Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode itself indicates whether it is followed by operands and, if so, the form the operands take. Many of these instructions require zero operands.

Java uses two's complement to represent signed integers but does not allow unsigned integers. Characters are encoded using 16-bit Unicode. Java has four registers, which provide access to five different regions of main memory. All references to memory are based on offsets from these registers; pointers and absolute memory addresses are never used. Because the JVM is a stack machine, no general-purpose registers are provided. This lack of general-purpose registers is detrimental to performance, as more memory references are generated: we are trading performance for portability.

Let's take a look at a short Java program and its corresponding bytecode. Example 5.2 shows a Java program to find the maximum of two numbers.

EXAMPLE 5.2 Here is a Java program to find the maximum of two numbers.

public class Maximum {
  public static void main (String[] Args) {
    int X, Y, Z;
    X = Integer.parseInt(Args[0]);
    Y = Integer.parseInt(Args[1]);
    Z = Max(X, Y);
    System.out.println(Z);
  }
  public static int Max (int A, int B) {
    int C;
    if (A > B) C = A;
    else C = B;
    return C;
  }
}

After we compile this program (using javac), we can disassemble it to examine the bytecode by issuing the following command:

javap -c Maximum

You should see the following:

Compiled from Maximum.java
public class Maximum extends java.lang.Object {
  public Maximum();


Page 256:
The call Z = Max(X, Y) in main compiles to the following bytecode:

14 iload_1
15 iload_2
16 invokestatic #3 <Method int Max(int, int)>
19 istore_3

It should be fairly obvious that Java bytecode is stack based. For example, the iadd instruction pops two integers from the stack, adds them, and then pushes the result back onto the stack. There is no such thing as "add r0, r1, r2" or "add AC, X". The iload_1 (integer load) instruction also uses the stack, pushing the value in slot 1 onto the stack (slot 1 in main holds X, so X is pushed onto the stack). Y is pushed onto the stack by instruction 15. The invokestatic instruction actually performs the call to the Max method. When the method has completed, the istore_3 instruction pops the top of the stack and stores it in Z.

We explore the Java language and the JVM in more detail in Chapter 8.

CHAPTER SUMMARY

The instruction set architecture includes the memory model (word size and how the address space is split), registers, data types, instruction formats, addressing, and instruction types. Even though most computers today have general-purpose register sets and specify operands by combinations of memory and register locations, instructions vary in size, type, format, and the number of operands allowed, and instructions also have strict requirements for the locations of those operands. Operands can be located on the stack, in registers, in memory, or in a combination of the three.

Many design decisions must be made for an ISA. Larger instruction sets require longer instructions, which means a longer fetch and decode time. Fixed-length instructions are easier to decode but can waste space; expanding opcodes offer one compromise between large instruction sets and short instructions. Perhaps the most interesting debate is that of little versus big endian byte ordering.

There are three choices for internal storage in the CPU: stacks, an accumulator, or general-purpose registers. Each has its advantages and disadvantages, which must be weighed in the context of the proposed architecture's applications. The internal storage scheme has a direct impact on the instruction format, particularly the number of operands the instruction is allowed to reference. Stack architectures use zero operands, which fits well with RPN notation.

Instructions fall into the following categories: data movement, arithmetic, Boolean, bit manipulation, I/O, transfer of control, and special-purpose instructions.
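To make the stack-based execution model concrete, here is a small sketch of how a JVM-like interpreter might execute the four bytecodes above. This is our own simplified illustration, written in Python; the opcode names mirror the JVM's, but the dispatch loop is only a cartoon of the "giant switch statement" described earlier, not the real JVM.

def run(bytecode, local_vars):
    # Interpret a tiny subset of JVM-style stack instructions.
    stack = []
    for op, *args in bytecode:
        if op == "iload":            # push a local variable slot onto the stack
            stack.append(local_vars[args[0]])
        elif op == "iadd":           # pop two ints, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "invokestatic":   # pop the arguments, push the call's result
            func, argc = args
            call_args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(func(*call_args))
        elif op == "istore":         # pop the top of the stack into a local slot
            local_vars[args[0]] = stack.pop()
    return local_vars

slots = {1: 25, 2: 17, 3: None}      # slot 1 = X, slot 2 = Y, slot 3 = Z
program = [("iload", 1), ("iload", 2),
           ("invokestatic", max, 2), ("istore", 3)]
print(run(program, slots)[3])        # prints 25, that is, Z = Max(X, Y)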


Page 257:
Some ISAs have many instructions in each category, others have very few in each category, and many have a mix of each. Advances in memory technology, resulting in larger memories, have prompted the need for alternative addressing modes. The various addressing modes introduced include immediate, direct, indirect, register, indexed, and stack. Having these different modes provides flexibility and convenience for the programmer without changing the fundamental operations of the CPU.

Instruction-level pipelining is one example of instruction-level parallelism. It is a common but complex technique that can speed up the fetch-decode-execute cycle. With pipelining we can overlap the execution of instructions, thus executing multiple instructions in parallel. However, we also saw that the amount of parallelism can be limited by conflicts in the pipeline. Whereas pipelining performs different stages of multiple instructions at the same time, superscalar architectures allow us to perform multiple operations at the same time. Superpipelining, a combination of superscalar and pipelining, was also briefly introduced, as was VLIW. There are many types of parallelism, but at the level of computer organization and architecture we are mainly concerned with ILP.

Intel and MIPS have interesting ISAs, as we saw in this chapter as well as in Chapter 4. However, the Java Virtual Machine is a unique ISA, because the ISA is built in software, allowing Java programs to run on any machine that supports the JVM. Chapter 8 covers the JVM in great detail.

FURTHER READING

Instruction sets, addressing, and instruction formats are covered in detail in almost every computer architecture book. The book by Patterson and Hennessy (1997) provides excellent coverage of these areas. Many books, such as Brey (2003), Messmer (1993), Abel (2001), and Jones (2001), are devoted to the Intel x86 architecture. For those interested in the Motorola 68000 series, we suggest Wray and Greenfield (1994) or Miller (1992). Kaeli and Emma (1991) provide an interesting overview of how branching affects pipeline performance. For a good history of pipelining, see Rau and Fisher (1993). To get a better idea of the limitations and problems of pipelining, see Wall (1993).

A number of historical instruction set architectures are also worth mentioning. Atanasoff's ABC computer (Burks and Burks [1988]), von Neumann's EDVAC, and Mauchly and Eckert's UNIVAC (see Stern [1981] for information on both) had very simple instruction set architectures but required very low-level programming. The Intel 8080 (a one-address machine) was the predecessor of the 80x86 family of chips introduced in Chapter 4. See Brey (2003) for a complete and readable introduction to the Intel family of processors. Hauck and Dent (1968) provide good coverage of the Burroughs zero-address machine. Struble (1975) has a good presentation of IBM's 360 family. Brunner (1991) gives details of DEC's VAX systems, which incorporated two-address architectures with more sophisticated instruction sets. SPARC International (1994)


Page 258:
provides an excellent overview of the SPARC architecture. Meyer and Downing (1991), Lindholm and Yellin, and Venner provide very interesting coverage of the JVM.

REFERENCES

Abel, Peter. IBM PC Assembly Language and Programming, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2001.
Brey, B. The Intel Microprocessors 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium, and Pentium Pro Processor, Pentium II, Pentium III, and Pentium IV: Architecture, Programming, and Interfacing, 6th ed. Englewood Cliffs, NJ: Prentice Hall, 2003.
Brunner, R. A. VAX Architecture Reference Manual, 2nd ed. Herndon, VA: Digital Press, 1991.
Burks, Alice, and Burks, Arthur. The First Electronic Computer: The Atanasoff Story. Ann Arbor, MI: University of Michigan Press, 1988.
Hauck, E. A., and Dent, B. A. "Burroughs B6500/B7500 Stack Mechanism." AFIPS Proceedings, SJCC (1968), vol. 32, pp. 245-251.
Jones, William. Assembly Language Programming for the IBM PC Family, 3rd ed. El Granada, CA: Scott/Jones Publishing, 2001.
Kaeli, D., and Emma, P. "Branch History Table Prediction of Moving Target Branches Due to Subroutine Returns." Proceedings of the 18th Annual International Symposium on Computer Architecture, May 1991.
Lindholm, Tim, and Yellin, Frank. The Java Virtual Machine Specification. Online at java.sun.com/docs/books/vmspec/html/VMSpecTOC.cod.html.
Messmer, H. The Indispensable PC Hardware Book. Reading, MA: Addison-Wesley, 1993.
Meyer, J., and Downing, T. Java Virtual Machine. Sebastopol, CA: O'Reilly & Associates, 1991.
Miller, M. A. The 68000 Microprocessor Family: Architecture, Programming, and Applications, 2nd ed. Columbus, OH: Charles E. Merrill, 1992.
Patterson, D. A., and Hennessy, J. L. Computer Organization and Design: The Hardware/Software Interface, 2nd ed. San Mateo, CA: Morgan Kaufmann, 1997.
Rau, B. Ramakrishna, and Fisher, Joseph A. "Instruction-Level Parallel Processing: History, Overview, and Perspective." Journal of Supercomputing 7(1), Jan. 1993, pp. 9-50.
Sohi, G. "Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers." IEEE Transactions on Computers, March 1990.
SPARC International, Inc. The SPARC Architecture Manual: Version 9. Upper Saddle River, NJ: Prentice Hall, 1994.
Stallings, W. Computer Organization and Architecture, 5th ed. New York, NY: Macmillan Publishing Company, 2000.
Stern, Nancy. From ENIAC to UNIVAC: An Appraisal of the Eckert-Mauchly Computers. Herndon, VA: Digital Press, 1981.
Struble, G. W. Assembly Language Programming: The IBM System/360 and 370, 2nd ed. Reading, MA: Addison-Wesley, 1975.
Tanenbaum, Andrew. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 1999.
Venner, Bill. Inside the Java Virtual Machine. Online at www.artima.com.
Wall, David W. Limits of Instruction-Level Parallelism. DEC-WRL Research Report 93/6, November 1993.


Page 259:
Wray, W. C., and Greenfield, J. D. Using Microprocessors and Microcomputers: The Motorola Family. Englewood Cliffs, NJ: Prentice Hall, 1994.

REVIEW OF ESSENTIAL TERMS AND CONCEPTS

1. Explain the difference between register-to-register, register-to-memory, and memory-to-memory instructions.
2. Several design decisions exist with regard to instruction sets. Name four and explain.
3. What is an expanding opcode?
4. If a byte-addressable machine with 32-bit words stores the hex value 98765432, indicate how this value would be stored on a little endian machine and on a big endian machine. Why does "endian-ness" matter?
5. We can design stack architectures, accumulator architectures, or general-purpose register architectures. Explain the differences between these choices and give some situations where one might be better than another.
6. How are memory-memory, register-memory, and load-store architectures different?
7. What are the pros and cons of fixed-length and variable-length instructions? Which is currently more popular?
8. How does an architecture based on zero operands get data values from memory?
9. Which is likely to be longer (have more instructions): a program written for a zero-address architecture, a program written for a one-address architecture, or a program written for a two-address architecture? Why?
10. Why can stack architectures represent arithmetic expressions in reverse Polish notation?
11. Name the seven types of instructions and explain each one.
12. What is an addressing mode?
13. Give examples of immediate, direct, register, indirect, register indirect, and indexed addressing.
14. How does indexed addressing differ from based addressing?
15. Why do we need so many different addressing modes?
16. Explain the concept behind pipelining.
17. What is the theoretical speedup for a 4-stage pipeline with a 20 ns clock cycle if it is processing 100 tasks?
18. What are the pipeline conflicts that can cause a slowdown in the pipeline?
19. What are the two types of ILP, and how do they differ?
20. Explain superscalar, superpipelining, and VLIW architectures.


Page 260:
21. List several ways in which the Intel and MIPS ISAs differ. Name several ways in which they are the same.
22. Explain what Java bytecodes are.
23. Give an example of a current stack-based architecture and a current GPR-based architecture. How do they differ?

EXERCISES

1. Assume you have a machine that uses 32-bit integers and you are storing the hex value 1234 at address 0:
◆ a) Show how this is stored on a big endian machine.
◆ b) Show how this is stored on a little endian machine.
c) If you wanted to increase the hex value to 123456, which byte assignment would be more efficient, big or little endian? Explain your answer.
2. Show how the following values would be stored by machines with 32-bit words, using little endian and then big endian format. Assume each value starts at address 10₁₆. Draw a diagram of memory for each, placing the appropriate values in the correct (and labeled) memory locations.
3. Suppose the first two bytes of a 2M × 16 main memory have the following hex values:
• Byte 0 is FE
• Byte 1 is 01
If these bytes hold a 16-bit two's complement integer, what is its actual decimal value if:
◆ a) memory is big endian?
◆ b) memory is little endian?
4. What kinds of problems do you think endian-ness can cause if you wished to transfer data from a big endian machine to a little endian machine? Explain.
◆ 5. The Institute for Population Studies monitors the population of the United States. In 2000, this institute wrote a program to create files of the numbers representing the populations of the various states, as well as the total population of the U.S. This program, which runs on a Motorola processor, projects the population based on various rules, such as the average number of births and deaths per year. The institute runs the program and then ships the output files to state agencies so the data values can be used as input into various applications. However, one Pennsylvania agency, running all Intel machines, encountered difficulties, as indicated by the following problem:


Page 261:
when the 32-bit unsigned integer 1D2F37E8₁₆ (representing the overall U.S. population prediction for 2003) is used as input and the agency's program simply outputs that input value, the population prediction it reports for 2003 is far too large. Can you help this Pennsylvania agency by explaining what might be going wrong?
Why is this not a good idea on a stack machine?
◆ 7. A computer has 32-bit instructions and 12-bit addresses. Suppose there are 250 two-address instructions. How many one-address instructions can be formulated? Explain your answer.
8. Convert the following expressions from infix notation to reverse Polish (postfix) notation. (A stack-based evaluation sketch appears after exercise 12.)
◆ a) X × Y + W × Z + V × U
b) W × X + W × (U × V + Z)
c) (W × (X + Y × (U × V))) / (U × (X + Y))
9. Convert the following expressions from reverse Polish notation to infix notation.
a) W X Y Z − + ×
b) U V W X Y Z + × + × +
c) X Y Z + V W − × Z + +
10. a) Write the following expression in postfix (reverse Polish) notation. Remember the rules of precedence for arithmetic operators!
X = (A − B + C × (D × E − F)) / (G + H × K)
b) Write a program to evaluate the above arithmetic statement using a stack-organized computer with zero-address instructions (so only pop and push can access memory).
11. a) In a computer instruction format, the instruction length is 11 bits and the size of an address field is 4 bits. Is it possible to have
5 two-address instructions
45 one-address instructions
32 zero-address instructions
using the format? Justify your answer.
b) Assume that a computer architect has already designed 6 two-address and 24 zero-address instructions using the instruction format given above. What is the maximum number of one-address instructions that can be added to the instruction set?
12. What is the difference between using direct and indirect addressing? Give an example.
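The postfix expressions in exercises 8-10 can be checked mechanically. The snippet below is our own added illustration (the variable values are invented): it evaluates a reverse Polish expression with an explicit stack, which is exactly what a zero-address machine does with push and pop.

def eval_postfix(tokens, env):
    # Evaluate a reverse Polish (postfix) expression using a stack.
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()            # the right operand is on top of the stack
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(env[tok])     # push the variable's value
    return stack.pop()

# Example (not from the exercises): (X + Y) * Z in postfix is  X Y + Z *
values = {"X": 3, "Y": 7, "Z": 5}
print(eval_postfix("X Y + Z *".split(), values))   # (3 + 7) * 5 = 50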


Page 262:
◆ 13. Suppose we have the instruction Load 1000. Given that memory and register R1 contain the values below:

Memory
Address    Contents
1000       1400
1100       400
1200       1000
1300       1100
1400       1300

R1 = 200

Assuming R1 is implied in the indexed addressing mode, determine the actual value loaded into the accumulator and fill in the table below:

Mode        Value Loaded into AC
Immediate
Direct
Indirect
Indexed

14. Suppose we have the instruction Load 500. Given that memory and register R1 contain the values below:

Memory
Address    Contents
100        600
400        300
500        100
600        500
700        800

R1 = 200

Assuming R1 is implied in the indexed addressing mode, determine the actual value loaded into the accumulator and fill in the table below:

Mode        Value Loaded into AC
Immediate
Direct
Indirect
Indexed
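As a cross-check for exercises 13 and 14, the sketch below (our addition, using a made-up memory map rather than the exercise values, so it does not give the answers away) shows how each addressing mode resolves the operand of a Load instruction.

memory = {20: 30, 30: 40, 40: 50, 50: 60}   # hypothetical address -> contents
R1 = 20                                     # hypothetical index register

def load(operand, mode):
    # Return the value placed in the AC by "Load operand" under each mode.
    if mode == "immediate":    # the operand itself is the value
        return operand
    if mode == "direct":       # the operand is the address of the value
        return memory[operand]
    if mode == "indirect":     # the operand is the address of the value's address
        return memory[memory[operand]]
    if mode == "indexed":      # effective address = operand + R1
        return memory[operand + R1]
    raise ValueError("unknown addressing mode: " + mode)

for m in ("immediate", "direct", "indirect", "indexed"):
    print(m, load(20, m))      # 20, 30, 40, 50 respectively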


Page 263:
15. A nonpipelined system takes 200 ns to process a task. The same task can be processed in a five-segment pipeline with a clock cycle of 40 ns. Determine the speedup ratio of the pipeline for 200 tasks. What is the maximum speedup that could be achieved with the pipelined unit over the nonpipelined unit? (A sketch of this calculation appears after the exercises.)
◆ 16. A nonpipelined system takes 100 ns to process a task. The same task can be processed in a 5-stage pipeline with a clock cycle of 20 ns. Determine the speedup ratio of the pipeline for 100 tasks. What is the theoretical speedup that could be achieved with the pipelined system over a nonpipelined system?
17. Write code to implement the expression A = (B + C) × (D + E) on 3-, 2-, 1-, and 0-address machines. In accordance with programming language practice, computing the expression should not change the values of its operands.
18. A digital computer has a memory unit with 24 bits per word. The instruction set consists of 150 different operations. All instructions have an operation code part (opcode) and an address part (allowing for only one address). Each instruction is stored in one word of memory.
◆ a) How many bits are needed for the opcode?
◆ b) How many bits are left for the address part of the instruction?
◆ c) What is the maximum allowable size of memory?
◆ d) What is the largest unsigned binary number that can be accommodated in one word of memory?
19. The memory unit of a computer has 256K words of 32 bits each. The computer has an instruction format with 4 fields: an opcode field; a mode field to specify 1 of 7 addressing modes; a register address field to specify 1 of 60 registers; and a memory address field. Assume an instruction is 32 bits long. Answer the following:
a) How large must the mode field be?
b) How large must the register field be?
c) How large must the address field be?
d) How large is the opcode field?
20. Suppose an instruction takes four cycles to execute on a nonpipelined CPU: one cycle to fetch the instruction, one cycle to decode the instruction, one cycle to perform the ALU operation, and one cycle to store the result. On a CPU with a 4-stage pipeline, that instruction still takes four cycles to execute, so how can we say the pipeline speeds up the execution of the program?
*21. Pick an architecture (other than those covered in this chapter). Do research to find out how your architecture deals with the concepts introduced in this chapter, just as was done for Intel, MIPS, and Java.

True or False.
1. Most computers are generally classified into one of three types of CPU organization: (1) general register organization; (2) single accumulator organization; or (3) stack organization.
2. The advantage of zero-address instruction computers is that they have short programs; the disadvantage is that the instructions require many bits, making them very long.
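The speedup questions in exercises 15, 16, and 20 (and review question 17) all rest on the same relationship: a k-stage pipeline with cycle time t_p needs (k + n − 1) cycles to complete n tasks, while the nonpipelined unit needs n × t_n. The sketch below is our own illustration with invented numbers, not the exercise values.

def pipeline_speedup(n_tasks, k_stages, t_nonpipelined, t_cycle):
    # Speedup of a k-stage pipeline over a nonpipelined unit for n tasks.
    time_without_pipeline = n_tasks * t_nonpipelined
    time_with_pipeline = (k_stages + n_tasks - 1) * t_cycle
    return time_without_pipeline / time_with_pipeline

# Hypothetical unit: 5 stages, 10 ns per stage, 50 ns without pipelining.
print(pipeline_speedup(100, 5, 50, 10))   # about 4.81 for 100 tasks
print(50 / 10)                            # 5.0 = theoretical maximum speedup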


Page 264:
RAM /abr./: Rarely adequate memory, because the more memory a computer has, the faster it can generate error messages.

CHAPTER 6
Memory

6.1 INTRODUCTION

Most computers are built using the Von Neumann model, which is centered on memory. The programs that perform the processing are stored in memory. We examined a small 4 × 3-bit memory in Chapter 3, and we learned how to address memory in Chapters 4 and 5. We know memory is logically structured as a linear array of locations, with addresses from 0 to the maximum memory size the processor can address. In this chapter we examine the various types of memory and how each is part of the memory hierarchy system. We then look at cache memory (a special high-speed memory) and at virtual memory, implemented through paging, a method that makes the best possible use of memory.

6.2 MEMORY TYPES

A common question many people ask is "why are there so many different types of computer memory?" The answer is that new technologies continue to be introduced in an attempt to match the improvements in CPU design: memory speed has to keep pace with the CPU, or the memory becomes a bottleneck. Although we have seen many improvements in CPUs over the past few years, improving main memory to keep pace with the CPU is actually not as critical because of the use of cache memory. Cache memory is a small, high-speed (and thus high-cost) type of memory that serves as a buffer for frequently accessed data. The additional expense of using very fast technologies for all of memory cannot always be justified, because slower memories can often be "hidden" by


Page 265:
high-performance cache systems. However, before we discuss cache memory, we will explain the various memory technologies.

Even though a large number of memory technologies exist, there are only two basic types of memory: RAM (random access memory) and ROM (read-only memory). RAM is somewhat of a misnomer; a more appropriate name is read-write memory. RAM is the memory to which computer specifications refer: if you buy a computer with 128 megabytes of memory, it has 128MB of RAM. RAM is also the "main memory" we have continually referred to throughout this book. Often called primary memory, RAM is used to store programs and data that the computer needs when executing programs, but RAM is volatile and loses this information once the power is turned off. There are two general types of chips used to build the bulk of RAM in today's computers: SRAM and DRAM (static and dynamic random access memory).

Dynamic RAM is constructed of tiny capacitors that leak electricity, so DRAM requires a recharge every few milliseconds to maintain its data. Static RAM technology, in contrast, holds its contents as long as power is available. SRAM consists of circuits similar to the D flip-flops we studied in Chapter 3. SRAM is faster and much more expensive than DRAM; however, designers use DRAM because it is much denser (it can store many bits per chip), uses less power, and generates less heat than SRAM. For these reasons, both technologies are often used in combination: DRAM for main memory and SRAM for cache. The basic operation of all DRAM is the same, but there are many flavors, including Multibank DRAM (MDRAM), Fast-Page Mode (FPM) DRAM, Extended Data Out (EDO) DRAM, Burst EDO DRAM (BEDO DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Synchronous-Link (SL) DRAM, Double Data Rate (DDR) SDRAM, and Direct Rambus (DR) DRAM. The different types of SRAM include asynchronous SRAM, synchronous SRAM, and pipeline burst SRAM. For more information about these types of memory, refer to the references listed at the end of the chapter.

In addition to RAM, most computers contain a small amount of ROM (read-only memory) that stores critical information necessary to operate the system, such as the program needed to boot the computer. ROM is not volatile and always retains its data. This type of memory is also used in embedded systems or any system where the programming does not need to change. Many appliances, toys, and most automobiles use ROM chips to maintain information when the power is shut off. ROMs are also extensively used in calculators and in peripheral devices such as laser printers, which store their fonts in ROM. There are five basic types of ROM: ROM, PROM, EPROM, EEPROM, and flash memory. PROM (programmable read-only memory) is a variation on ROM: PROMs can be programmed by the user with the appropriate equipment. Whereas ROMs are hardwired, PROMs have fuses that can be blown to program the chip. Once programmed, the data and instructions in a PROM cannot be changed. EPROM (erasable PROM) can be reprogrammed (erasing an EPROM requires a special tool that emits ultraviolet light). To reprogram an EPROM, the entire chip must first be erased. EEPROM (electrically erasable PROM) removes many of the disadvantages of EPROM: no


Page 266:
special tools are required to erase it (erasing is performed by applying an electric field), and you can erase only portions of the chip, one byte at a time. Flash memory is essentially EEPROM with the added benefit that data can be written or erased in blocks, removing the one-byte-at-a-time limitation. This makes flash memory faster than EEPROM.

6.3 THE MEMORY HIERARCHY

One of the most important considerations in understanding the performance capabilities of a modern processor is the memory hierarchy. Unfortunately, as we have seen, not all memory is created equal, and some types are far less efficient and thus cheaper than others. To deal with this disparity, today's computer systems use a combination of memory types to provide the best performance at the best cost. This approach is called hierarchical memory. As a rule, the faster memory is, the more expensive it is per bit of storage. By using a hierarchy of memories, each with different access speeds and storage capacities, a computer system can exhibit performance above what would be possible without a combination of the various types. The basic types that normally constitute the hierarchical memory system include registers, cache, main memory, and secondary memory.

Today's computers each have a small amount of very high-speed memory, called a cache, where data from frequently used memory locations may be temporarily stored. This cache is connected to a much larger main memory, which is typically a medium-speed memory. This memory is complemented by a very large secondary memory, composed of a hard disk and various removable media. By using such a hierarchical scheme, one can improve the effective access speed of the memory, using only a small number of fast (and expensive) chips. This allows designers to create a computer with acceptable performance at a reasonable cost.

We classify memory based on its "distance" from the processor, with distance measured by the number of machine cycles required for access. The closer memory is to the processor, the faster it should be. As memory gets farther from the main processor, we can afford longer access times. Thus, slower technologies are used for these memories, and faster technologies are used for memories closer to the CPU. The better the technology, the faster and more expensive the memory becomes. Thus, faster memories tend to be smaller than slower ones, due to cost.

The following terminology is used when referring to this memory hierarchy:
• Hit: The requested data resides in a given level of memory (typically, we are concerned with the hit rate only for upper levels of memory).
• Miss: The requested data is not found in the given level of memory.
• Hit rate: The percentage of memory accesses found in a given level of memory.
• Miss rate: The percentage of memory accesses not found in a given level of memory. Note: miss rate = 1 − hit rate.
• Hit time: The time required to access the requested information in a given level of memory.


Page 267:
• Miss penalty: The time required to process a miss, which includes replacing a block in an upper level of memory, plus the additional time to deliver the requested data to the processor. (The time to process a miss is typically significantly larger than the time to process a hit.)

The memory hierarchy is illustrated in Figure 6.1. It is drawn as a pyramid to help indicate the relative sizes of these various memories: memories closer to the top tend to be smaller in size, but these smaller memories have better performance and thus a higher cost (per bit) than memories found lower in the pyramid. The numbers given to the left of the pyramid indicate typical access times.

FIGURE 6.1 The memory hierarchy (a pyramid with registers and cache at the top — the fastest and most expensive levels, with access times of a few nanoseconds — followed by main memory, fixed rigid disk, and, at the bottom, optical disk jukeboxes and magnetic tape in robotic libraries, the least expensive levels, whose access can take minutes if a volume must be mounted)

For any given piece of data, the processor sends its request to the fastest, smallest partition of memory (typically cache, because registers tend to be more special purpose). If the data is found in cache, it can be loaded quickly into the CPU. If it is not resident in cache, the request is forwarded to the next lower level of the hierarchy and the search process begins again. If the data is found at this level, the whole block in which the data resides is transferred into cache. If the data is not found at that level, the request is forwarded to the next lower level, and so on. The key idea is that when the lower (slower, larger, and cheaper) levels of the hierarchy respond to a request from higher levels for the content of location X, they also transmit, at the same time, the data located at addresses X + 1, X + 2, ..., thus returning an entire block of data to the higher-level memory. The hope is that this extra data will be referenced in the near future, which, in most cases, it is. The memory hierarchy is functional because programs tend to exhibit a property known as locality, which often allows the processor to access the data returned for addresses X + 1, X + 2, and so on. Thus, although there is one miss to, say, cache for X, there may be several cache hits on the block just retrieved because of locality.


Page 268:
6.3.1 Locality of Reference

In practice, processors tend to access memory in a very patterned way. For example, in the absence of branches, the PC in MARIE is incremented by one after each instruction fetch. Thus, if memory location X is accessed at time t, there is a high probability that memory location X + 1 will also be accessed in the near future. This clustering of memory references into groups is an example of locality of reference. This locality can be exploited by implementing the memory as a hierarchy: when a miss is processed, instead of simply transferring the requested data to a higher level, the entire block containing that data is transferred. Because of locality of reference, it is likely that the additional data in the block will be needed in the near future, and if so, this data can be loaded quickly from the faster memory.

There are three basic forms of locality:
• Temporal locality: Recently accessed items tend to be accessed again in the near future.
• Spatial locality: Accesses tend to be clustered in the address space (for example, in arrays or loops).
• Sequential locality: Instructions tend to be accessed sequentially.

The locality principle provides the opportunity for a system to use a small amount of very fast memory to effectively accelerate the majority of memory accesses. Typically, only a small amount of the entire memory space is being accessed at any given time, and the values in that space are accessed repeatedly. Therefore, we can copy those values from a slower memory to a smaller but faster memory that resides higher in the hierarchy. This results in a memory system that can store a large amount of information in a large but low-cost memory, yet provide nearly the same access speeds that would result from using very fast but expensive memory.

6.4 CACHE MEMORY

A computer processor is very fast and is constantly reading information from memory, which means it often has to wait for the information to arrive, because memory access times are slower than processor speed. A cache memory is a small, temporary, but fast memory that the processor uses for information it is likely to need again in the very near future.

Noncomputer examples of caching are all around us, and keeping them in mind will help you understand computer memory caching. Think of a homeowner with a very large tool chest in the garage. Suppose you are this homeowner and have a home improvement project to work on in the basement. You know this project will require drills, wrenches, hammers, a tape measure, several types of saws, and many different types and sizes of screwdrivers. The first thing you want to do is measure and then cut some wood. You run out to the garage, grab


Page 269:
the tape measure from the huge tool chest, go down to the basement, measure the wood, run back out to the garage, put away the tape measure, grab the saw, and return to the basement with the saw to cut the wood. Now you decide to screw some pieces of wood together. So you run to the garage, grab the drill set, go back down to the basement, drill the holes for the screws, go back to the garage, put away the drill set, grab a wrench, go back to the basement, find out the wrench is the wrong size, go back to the tool chest in the garage, grab another wrench, run back downstairs... Wait! Would you really work this way? No! Being a reasonable person, you think to yourself, "If I need one wrench, I will probably need another one of a different size soon anyway, so why not just grab the whole set of wrenches?" Taking this one step further, you reason, "Once I am done with one certain tool, there is a good chance I will need another soon, so why not just pack up a small toolbox and take it to the basement?" This way, you keep the tools you need close at hand, so access is faster. You have just cached some tools for easy access and quick use! The tools you are less likely to use remain stored in a location that is farther away and requires more time to access. This is all that cache memory does: it stores data that has been accessed, and data that might be accessed by the CPU, in a faster, closer memory.

Another cache analogy is found in grocery shopping. You seldom, if ever, go to the grocery store to buy one single item. You buy any items you require immediately in addition to items you will most likely use in the future. The grocery store is analogous to main memory, and your home is the cache.

As another example, consider how many of us carry around an entire phone book. Most of us have a small address book instead. We enter the names and numbers of the people we tend to call most frequently; looking up a number in our address book is much faster than finding the phone book, locating the name, and then getting the number. We tend to keep the address book close at hand, whereas the phone book is probably located in our home, stuck in a drawer or on a bookshelf somewhere. The phone book is something we do not use frequently, so we can afford to store it a little farther away. Comparing the size of our address book to the phone book, we see that the address book "memory" is much smaller than that of a phone book. But the probability is very high that when we make a call, it is to someone in our address book.

Students doing research offer another common example of caching. Suppose you are writing a paper on quantum computing. Would you go to the library, check out one book, go home, get the necessary information from that book, go back to the library, check out another book, return home, and so on? No, you would go to the library and check out all the books you might need and take them home with you. The library is analogous to main memory, and your home is, again, similar to the cache.

And as a last example, consider how one of your authors uses her office. Any materials she does not need (or has not used for a period of more than six months) get filed away in a large set of filing cabinets. However, frequently used "data" remain piled on her desk, close at hand and easy (sometimes) to find. If she needs something from a file, she more than likely pulls the entire file, not simply one or two papers from the folder. The entire file is then added to the pile on her


Page 270:
desk. The filing cabinets are her "main memory," and her desk (with its many messy-looking piles) is the cache. Cache memory works on the same basic principle as these examples: it copies frequently used data into the cache rather than requiring an access to main memory to retrieve the data. The cache can be as unorganized as your author's desk or as organized as your address book. Either way, however, the data must be accessible (locatable). The cache in a computer differs from our real-life examples in one important way: the computer really has no way to know, a priori, what data is most likely to be accessed, so it uses the locality principle and transfers a block from main memory into the cache whenever it has to make a main memory access. If the probability of using something else in that block is high, then transferring the entire block saves on access time. The cache location for this new block depends on two things: the cache mapping policy (discussed in the next section) and the cache size (which affects whether there is room for the new block).

The size of cache memory can vary enormously. A typical personal computer's level 2 (L2) cache is 256K or 512K. Level 1 (L1) cache is smaller, typically 8K or 16K. L1 cache resides on the processor, whereas L2 cache resides between the CPU and main memory. L1 cache is, therefore, faster than L2 cache. The relationship between L1 and L2 cache can be illustrated using our grocery store example: if the store is main memory, you could consider your refrigerator the L2 cache and the kitchen table the L1 cache.

The purpose of cache is to speed up memory accesses by storing recently used data closer to the CPU, instead of storing it in main memory. Although cache is not as large as main memory, it is considerably faster. Whereas main memory is typically composed of DRAM with, say, a 60 ns access time, cache is typically composed of SRAM, providing faster access with a much shorter cycle time than DRAM (a typical cache access time is 10 ns). Cache does not need to be very large to perform well. A general rule of thumb is to make cache small enough so that the overall average cost per bit is close to that of main memory, but large enough to be beneficial. Because this fast memory is quite expensive, it is not feasible to use the technology found in cache memory to build all of main memory.

What makes cache "special"? Cache is not accessed by address; it is accessed by content. For this reason, cache is sometimes called content addressable memory, or CAM. Under most cache mapping schemes, the cache entries must be checked or searched to see if the value being requested is stored in cache. To simplify this process of locating the desired data, various cache mapping algorithms are used.

6.4.1 Cache Mapping Schemes

For cache to be functional, it must store useful data. However, this data becomes useless if the CPU can't find it. When accessing data or instructions, the CPU first generates a main memory address. If the data has been copied to cache, the address of the data in cache is not the same as the main memory address. For example, data located at main memory address 2E3 might be stored in the very


Page 271:
first cache location. How, then, does the CPU locate data when it has been copied into cache? The CPU uses a specific mapping scheme that "converts" the main memory address into a cache location.

This address conversion is done by giving special significance to the bits in the main memory address. We first divide the bits into distinct groups we call fields. Depending on the mapping scheme, we may have two or three fields. How we use these fields depends on the particular mapping scheme being used. The mapping scheme determines where the data is placed when it is originally copied into cache and also provides a method for the CPU to find previously copied data when searching cache.

Before we discuss these mapping schemes, it is important to understand how data is copied into cache. Main memory and cache are both divided into blocks of the same size (the size of these blocks varies). When a memory address is generated, cache is searched first to see if the required word exists there. When the requested word is not found in cache, the entire main memory block in which the word resides is loaded into cache. As previously mentioned, this scheme is successful because of the principle of locality: if a word was just referenced, there is a good chance that words in the same general vicinity will soon be referenced as well. Therefore, one missed word often results in several found words later. For example, when you are in the basement and you need tools for the first time, you have a "miss" and must go to the garage. If you gather up a set of tools that you might need and return to the basement, you hope to have several "hits" while working on your home improvement project and not have to make many more trips to the garage. Because accessing a cached word (a tool already in the basement) is faster than accessing a word in main memory (going to the garage yet again!), the cache speeds up the overall access time.

So, how do we use the fields in the main memory address? One field of the main memory address points us to a location in cache in which the data resides if it is resident in cache (this is called a cache hit), or where it is to be placed if it is not resident (this is called a cache miss). (This is slightly different for associative mapped cache, which we discuss shortly.) The cache block referenced is then checked to see if it is valid. This is done by associating a valid bit with each cache block. A valid bit of 0 means the cache block is not valid (we have a cache miss) and we must access main memory. A valid bit of 1 means it is valid (we may have a cache hit, but we need to complete one more step before we know for sure). We then compare the tag in the cache block to the tag field of our address. (The tag is a special group of bits derived from the main memory address that is stored with its corresponding block in cache.) If the tags are the same, then we have found the desired cache block (we have a cache hit). At this point we need to locate the desired word in the block; this can be done using a different portion of the main memory address called the word field. All cache mapping schemes require a word field; however, the remaining fields are determined by the mapping scheme. We discuss the three main cache mapping schemes on the next page.
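The valid-bit check and tag comparison just described can be written out in a few lines. The sketch below is our own illustration (the cache contents and field widths are invented, not taken from the text's examples).

def cache_lookup(cache, block_index, tag):
    # Return (hit, data) after checking the valid bit and comparing tags.
    entry = cache[block_index]
    if entry["valid"] and entry["tag"] == tag:
        return True, entry["data"]      # cache hit: this is the block we want
    return False, None                  # cache miss: the block must come from main memory

# A tiny 4-block cache; only block 0 currently holds valid data.
cache = [{"valid": True,  "tag": 0b0000001, "data": ["A", "B", "C"]},
         {"valid": False, "tag": 0,         "data": None},
         {"valid": False, "tag": 0,         "data": None},
         {"valid": False, "tag": 0,         "data": None}]

print(cache_lookup(cache, 0, 0b0000001))   # (True, ['A', 'B', 'C'])
print(cache_lookup(cache, 0, 0b0000000))   # (False, None) -- the tags differ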


Page 272:
Direct Mapped Cache

Direct mapped cache assigns cache mappings using a modular approach. Because there are more main memory blocks than there are cache blocks, it should be clear that main memory blocks compete for cache locations. Direct mapping maps main memory block X to cache block Y = X mod N, where N is the total number of blocks in cache. For example, if the cache contains 10 blocks, then main memory block 0 maps to cache block 0, main memory block 1 maps to cache block 1, ..., main memory block 9 maps to cache block 9, and main memory block 10 maps to cache block 0. This is illustrated in Figure 6.2.

FIGURE 6.2 Direct mapping of main memory blocks to cache blocks (main memory blocks 0 through 9 map to cache blocks 0 through 9; blocks 10, 11, ... then wrap around to cache blocks 0, 1, ...)

Thus, main memory blocks 0 and 10 (and 20, 30, and so on) all compete for cache block 0. How, then, does the CPU know which block actually resides in cache block 0 at any given time? The answer is that each block is copied into cache and identified


Page 273:
by the tag previously described. If we take a closer look at cache, we see that it stores more than just the data copied from main memory, as indicated in Figure 6.3. In this figure, there are two valid cache blocks. Block 0 contains multiple words from main memory, identified using the tag 00000000. Block 1 contains words identified using the tag 11110101. The other two cache blocks are not valid.

Block    Tag         Data                  Valid
0        00000000    words A, B, C, ...    1
1        11110101    words L, M, N, ...    1
2        --------                          0
3        --------                          0
FIGURE 6.3 A closer look at cache

To perform direct mapping, the binary main memory address is partitioned into the fields shown in Figure 6.4.

Tag | Block | Word   (the bits in the main memory address)
FIGURE 6.4 The format of a main memory address using direct mapping

The size of each field depends on the physical characteristics of main memory and cache. The word field (sometimes called the offset field) uniquely identifies a word within a specific block; therefore, it must contain the appropriate number of bits to do this. The same is true of the block field: it must select a unique block of cache. The tag field is whatever is left over. When a block of main memory is copied to cache, this tag is stored with the block and uniquely identifies this block. The total of all three fields must, of course, add up to the number of bits in a main memory address.

Consider, for example, a system with a main memory of 2¹⁴ words, a cache of 16 = 2⁴ blocks, and blocks of 8 = 2³ words. From this we determine that memory has 2¹⁴ / 2³ = 2¹¹ blocks. We know that each main memory address requires 14 bits. Of this 14-bit address, the rightmost 3 bits reflect the word field (we need 3 bits to uniquely identify one of the 8 words in a block). We need 4 bits to select a specific block in cache, so the block field consists of the middle 4 bits. The remaining 7 bits make up the tag field. The fields with sizes are illustrated in Figure 6.5.

As previously mentioned, the tag for each block is stored with that block in the cache. In this example, because main memory blocks 0 and 16 both map to cache block 0, the tag field would allow the system to differentiate between block 0 and block 16.


Page 274:
7 bits   4 bits   3 bits
Tag      Block    Word
(14 bits in all)
FIGURE 6.5 The main memory address format for our example

The binary addresses in block 0 differ from those in block 16 in the leftmost 7 bits, so the tags are different and unique.

To see how these addresses differ, let's look at a smaller, simpler example. Suppose we have a system using direct mapping with 16 words of main memory divided into 8 blocks (so each block has 2 words). Assume the cache is 4 blocks in size (for a total of 8 words). Table 6.1 shows how the main memory blocks map to cache.

Main Memory                    Maps To    Cache
Block 0 (addresses 0, 1)       ->         Block 0
Block 1 (addresses 2, 3)       ->         Block 1
Block 2 (addresses 4, 5)       ->         Block 2
Block 3 (addresses 6, 7)       ->         Block 3
Block 4 (addresses 8, 9)       ->         Block 0
Block 5 (addresses 10, 11)     ->         Block 1
Block 6 (addresses 12, 13)     ->         Block 2
Block 7 (addresses 14, 15)     ->         Block 3
TABLE 6.1 An example of main memory mapped to cache

We know the following:
• A main memory address has 4 bits (because there are 2⁴ or 16 words in main memory).
• This 4-bit main memory address is divided into three fields: the word field is 1 bit (we need only 1 bit to differentiate between the two words in a block); the block field is 2 bits (we have 4 blocks in cache and need 2 bits to uniquely identify each one); and the tag field is 1 bit (all that is left over).

The main memory address is divided into the fields shown in Figure 6.6.

1 bit   2 bits   1 bit
Tag     Block    Word
(4 bits in all)
FIGURE 6.6 The main memory address format for a 16-word memory
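Carving an address into these fields is just shifting and masking, so it can be verified with a few lines of code. The sketch below is our own illustration: it splits an address into tag, block, and word fields for any direct-mapped configuration, shown here with the 16-word example above (1-bit tag, 2-bit block, 1-bit word) and with the 14-bit example (7-bit tag, 4-bit block, 3-bit word).

def split_address(addr, block_bits, word_bits):
    # Split a main memory address into (tag, block, word) for direct mapping.
    word = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag = addr >> (word_bits + block_bits)
    return tag, block, word

# 16-word example: address 9 = 1001 in binary -> tag 1, block 00, word 1.
print(split_address(9, block_bits=2, word_bits=1))     # (1, 0, 1)

# 14-bit example: address 128 is the first word of main memory block 16;
# it maps to cache block 0 with tag 0000001.
print(split_address(128, block_bits=4, word_bits=3))   # (1, 0, 0)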


Page 275:
Suppose we generate the main memory address 9. We can see from the mapping listing above that address 9 is in main memory block 4 and should map to cache block 0 (which means the contents of main memory block 4 should be copied into cache block 0). The computer, however, uses the actual main memory address to determine the cache mapping block. This address, in binary, is divided into the fields shown in Figure 6.7.

1 bit    2 bits    1 bit
1        0 0       1
(tag)    (block)   (word)
FIGURE 6.7 Main memory address 9 = 1001₂ divided into fields

When the CPU generates this address, it first takes the block field bits 00 and uses these to direct it to the proper block in cache. 00 indicates that cache block 0 should be checked. If the cache block is valid, it then compares the tag field value of 1 (in the main memory address) to the tag associated with cache block 0. If the cache tag is 1, then block 4 currently resides in cache block 0. If the tag is 0, then block 0 from main memory is located in cache block 0. (To see this, compare main memory address 9 = 1001₂, which is in block 4, to main memory address 1 = 0001₂, which is in block 0. These two addresses differ only in the leftmost bit, which is the bit used as the tag by the cache.) Assuming the tags match, meaning that block 4 of main memory (with addresses 8 and 9) resides in cache block 0, the word field value of 1 is used to select one of the two words residing in the block. Because the bit is 1, we select the word with offset 1, which results in retrieving the data copied from main memory address 9.

Let's do one more example in this context. Suppose the CPU now generates address 4 = 0100₂. The middle two bits (10) direct the search to cache block 2. If the block is valid, the leftmost tag bit (0) would be compared to the tag bit stored with the cache block. If they match, the first word in that block (with offset 0) would be returned to the CPU. To make sure you understand this process, perform a similar exercise with main memory address 12 = 1100₂.

Let's move on to a larger example. Suppose we have a system using 15-bit main memory addresses and 64 blocks of cache. If each block contains 8 words, we know that the 15-bit main memory address is divided into a 3-bit word field, a 6-bit block field, and a 6-bit tag field. If the CPU generates the main memory address

1028 = 000010 (TAG)  000000 (BLOCK)  100 (WORD)

it


Page 276:
would look in block 0 of cache, and if it finds a tag of 000010, the word at offset 4 in this block would be returned to the CPU.

Fully Associative Cache

Direct mapped cache is not as expensive as other caches because the mapping scheme does not require any searching. Each main memory block has a specific location to which it maps in cache; when a main memory address is converted to a cache address, the CPU knows exactly where to look in the cache for that memory block by simply examining the bits in the block field. This is similar to your address book: the pages often have an alphabetical index, so if you are looking for "Joe Smith," you would look under the "s" tab.

Instead of specifying a unique location for each main memory block, we can look at the opposite extreme: allowing a main memory block to be placed anywhere in cache. The only way to find a block mapped this way is to search all of cache. (This is similar to your author's desk!) This requires the entire cache to be built from associative memory so it can be searched in parallel. That is, a single search must compare the requested tag to all the tags in cache to determine whether the desired data block is present in cache. Associative memory requires special hardware to allow associative searching and is, thus, quite expensive.

Using associative mapping, the main memory address is partitioned into two pieces, the tag and the word. For example, using our previous memory configuration with 2¹⁴ words, a cache with 16 blocks, and blocks of 8 words, we see from Figure 6.8 that the word field is still 3 bits, but now the tag field is 11 bits. This tag must be stored with every block in cache. When the cache is searched for a specific main memory block, the tag field of the main memory address is compared to all the valid tag fields in cache; if a match is found, the block is found. (Remember, the tag uniquely identifies a main memory block.) If there is no match, we have a cache miss and the block must be transferred from main memory.

11 bits   3 bits
Tag       Word
(14 bits in all)
FIGURE 6.8 The main memory address format for associative mapping

With direct mapping, if a block already occupies the cache location where a new block must be placed, the block currently in cache is removed (it is written back to main memory if it has been modified, or simply overwritten if it has not been changed). With fully associative mapping, when cache is full, we need a replacement algorithm to decide which block we wish to throw out of cache (we call this our victim block). A simple first-in, first-out algorithm would work, as


Page 277:
would a least recently used algorithm. There are many replacement algorithms that can be used; these are discussed shortly.

Set Associative Cache

Owing to its speed and complexity, associative cache is very expensive. Although direct mapping is inexpensive, it is very restrictive. To see how direct mapping limits cache usage, suppose we are running a program on the architecture described in our previous examples. Suppose the program uses block 0, then block 16, then 0, then 16, and so on, as it executes instructions. Blocks 0 and 16 both map to the same location, which means the program would repeatedly throw out 0 to bring in 16, then throw out 16 to bring in 0, even though there are additional blocks in cache that are not being used. Fully associative cache remedies this problem by allowing a block from main memory to be placed anywhere. However, it requires a larger tag to be stored with the block (which results in a larger cache), and it requires special hardware for searching all blocks in cache simultaneously (which implies a more expensive cache). We need a scheme somewhere in the middle. (A short simulation of this contention appears after Figure 6.10.)

The third mapping scheme we introduce is N-way set associative cache mapping, a combination of these two approaches. This scheme is similar to direct mapped cache in that we use the address to map the block to a certain cache location. The important difference is that instead of mapping to a single cache block, an address maps to a set of several cache blocks. All sets in cache must be the same size, although this size can vary from one cache to another. For example, in a 2-way set associative cache, there are two cache blocks per set, as shown in Figure 6.9. In this figure, we see that set 0 contains two blocks, one that is valid and holds the data A, B, C, ..., and another that is not valid. The same is true for set 1. Set 2 and set 3 can also hold two blocks, but currently only the second block in each of these sets is valid. In an 8-way set associative cache, there are 8 cache blocks per set. Direct mapped cache is a special case of N-way set associative cache mapping where the set size is one.

       --- Set Block 0 ---                  --- Set Block 1 ---
Set    Tag        Data               Valid  Tag        Data            Valid
0      00000000   Words A, B, C, ...   1    --------   ------------      0
1      11110101   Words L, M, N, ...   1    --------   ------------      0
2      --------   ------------         0    10111011   P, Q, R, ...      1
3      --------   ------------         0    11111100   T, U, V, ...      1
FIGURE 6.9 A 2-way set associative cache

In set-associative cache mapping, the main memory address is partitioned into three pieces: the tag field, the set field, and the word field. The tag and word fields assume the same roles as before; the set field indicates into which cache set the main memory block maps. Suppose we are using 2-way set associative mapping with a main memory of 2¹⁴ words and a cache with 16 blocks, where each block contains 8 words. If cache consists of a total of 16 blocks and each set has 2 blocks, then there are 8 sets in cache. Therefore, the set field is 3 bits, the word field is 3 bits, and the tag field is 8 bits, as illustrated in Figure 6.10.

8 bits   3 bits   3 bits
Tag      Set      Word
(14 bits in all)
FIGURE 6.10 The main memory address format for set associative mapping
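A quick simulation makes the contention argument concrete. The sketch below is our own illustration, using the hypothetical 16-block cache from the examples above and the alternating block 0 / block 16 reference pattern: the direct-mapped cache misses on every reference, while a 2-way set associative cache (with LRU replacement inside each set) misses only on the first use of each block.

def misses_direct_mapped(block_refs, num_cache_blocks):
    # Count misses in a direct-mapped cache (block X may live only in slot X mod N).
    cache = [None] * num_cache_blocks
    misses = 0
    for b in block_refs:
        slot = b % num_cache_blocks
        if cache[slot] != b:
            cache[slot] = b
            misses += 1
    return misses

def misses_two_way(block_refs, num_sets):
    # Count misses in a 2-way set associative cache with LRU within each set.
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for b in block_refs:
        s = sets[b % num_sets]
        if b in s:
            s.remove(b)        # refresh the LRU ordering
        else:
            misses += 1
            if len(s) == 2:
                s.pop(0)       # evict the least recently used block in the set
        s.append(b)
    return misses

refs = [0, 16, 0, 16, 0, 16]            # the alternating pattern described above
print(misses_direct_mapped(refs, 16))   # 6 -- every reference misses (thrashing)
print(misses_two_way(refs, 8))          # 2 -- only the first use of each block misses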


Page 278:
FIGURE 6.10 Main memory address format for set associative mapping: 8-bit tag field, 3-bit set field, 3-bit word field (14 bits total)

Therefore, the set field is 3 bits, the word field is 3 bits, and the tag field is 8 bits, as illustrated in Figure 6.10.

6.4.2 Replacement Policies

In a direct mapped cache, if there is contention for a cache block there is only one possible action: the existing block is kicked out of the cache to make room for the new block. This process is called replacement. With direct mapping there is no need for a replacement policy, because the location for each new block is predetermined. However, with fully associative cache and set associative cache, we need a replacement algorithm to determine the "victim" block to be removed from the cache. When using fully associative cache, there are K possible cache locations (where K is the number of blocks in the cache) to which a given main memory block might map. With N-way set associative mapping, a block can map to any of N different blocks within a given set. How do we determine which block in the cache should be replaced? The algorithm for determining replacement is called the replacement policy.

There are several popular replacement policies. One that is not practical, but that can be used as a benchmark by which to measure all others, is the optimal algorithm. We like to keep values in the cache that will be needed again soon, and discard blocks that won't be needed again, or that won't be needed for some time. An algorithm that could look into the future to determine the precise blocks to keep or evict based on these two criteria would be best, and this is what the optimal algorithm does: we replace the block that will not be used for the longest period of time in the future. For example, if the choice of victim block is between block 0 and block 1, and block 0 will be used again in 5 seconds while block 1 will not be used again for 10 seconds, we would discard block 1. From a practical standpoint we can't look into the future, but we can run a program and then rerun it, so that we effectively do know the future, and then apply the optimal algorithm on the second run. The optimal algorithm guarantees the lowest possible miss rate. Because we cannot see the future on every program we run, the optimal algorithm is used only as a metric for determining how good or bad another algorithm is; the closer an algorithm performs to the optimal algorithm, the better.

We need algorithms that approximate the optimal algorithm as closely as possible, and we have several options. For example, we can consider temporal locality: we might assume that any value that has not been used recently is unlikely to be needed again soon.
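Because the reference string is known after the fact, the optimal policy can be simulated offline, which is exactly how it is used as a benchmark. The following is a minimal sketch in Python (illustrative only, not from the text): on each miss with a full cache, it evicts the resident block whose next use lies farthest in the future.

```python
# Sketch: simulating the optimal replacement policy on a known reference string.
# A real system cannot do this, which is why the optimal policy serves only as
# a benchmark for practical policies such as LRU and FIFO.

def optimal_misses(refs, capacity):
    cache, misses = set(), 0
    for i, block in enumerate(refs):
        if block in cache:
            continue                       # hit: nothing to do
        misses += 1
        if len(cache) == capacity:
            # evict the resident block whose next use is farthest in the future
            def next_use(b):
                future = refs[i + 1:]
                return future.index(b) if b in future else float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses

refs = [0, 16, 0, 16, 1, 2, 0, 16]
print(optimal_misses(refs, 2))   # -> 5, the minimum possible for this string
```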


Page 279:
We can keep track of the last time each block was accessed (assign a timestamp to the block) and select as the victim the block that has been used least recently. This is the least recently used (LRU) algorithm. Unfortunately, LRU requires the system to keep a history of accesses for every cache block, which requires significant space and slows down the operation of the cache. There are ways to approximate LRU, but they are beyond the scope of this book. (See the references at the end of the chapter for more information.)

First in, first out (FIFO) is another popular approach. With this algorithm, the block that has been in the cache the longest (regardless of how recently it was used) is selected as the victim to be removed from the cache. Another approach is to select a victim at random. The problem with LRU and FIFO is that there are degenerate referencing situations in which they can be made to thrash (constantly throw out a block, then bring it back, then throw it out, then bring it back, repeatedly). Some people argue that random replacement, although it sometimes throws out data that will be needed soon, never thrashes. Unfortunately, it is difficult to have truly random replacement, and it can decrease average performance. The algorithm selected often depends on how the system will be used. No single (practical) algorithm is best for all scenarios; for that reason, designers use algorithms that perform well under a wide variety of circumstances.

6.4.3 Effective Access Time and Hit Ratio

The performance of a hierarchical memory is measured by its effective access time (EAT), the mean time per access. EAT is a weighted average that uses the hit ratio and the relative access times of the successive levels of the hierarchy. For example, suppose the cache access time is 10 ns, main memory access time is 200 ns, and the cache hit rate is 99%. The average time for the processor to access an item in this two-level memory would then be:

EAT = 0.99(10 ns) + 0.01(200 ns) = 9.9 ns + 2 ns = 11.9 ns

What, exactly, does this mean? If we look at the access times over a long period of time, this system performs as if it had a single large memory with an 11.9 ns access time. A 99% cache hit rate allows the system to perform very well, even though most of the memory is built using slower technology with a 200 ns access time.

The formula for calculating the effective access time for a two-level memory is:

EAT = H × AccessC + (1 − H) × AccessMM

where H = cache hit rate, AccessC = cache access time, and AccessMM = main memory access time. This formula can be extended to apply to three- or even four-level memories, as we will see shortly.
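The EAT formula is a one-line computation. Here is a minimal sketch in Python (names are illustrative), using the numbers from the example above:

```python
# Sketch: effective access time (EAT) for a two-level memory, using the
# formula EAT = H * AccessC + (1 - H) * AccessMM from the text.

def effective_access_time(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    return hit_rate * cache_ns + (1 - hit_rate) * memory_ns

# 10 ns cache, 200 ns main memory, 99% hit rate
print(effective_access_time(0.99, 10, 200))   # -> 11.9 (ns)
```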


Page 280:
6.4.4 When Does Caching Break Down?

When programs exhibit locality, caching works quite well. However, if programs exhibit bad locality, caching breaks down and the performance of the memory hierarchy is poor. In particular, object-oriented programming can cause programs to exhibit less than optimal locality. Another example of bad locality can be seen with two-dimensional array access. Arrays are typically stored in row-major order. Suppose, for purposes of this example, that one row fits exactly in one cache block and that the cache can hold all but one row of the array. If a program accesses the array one row at a time, the first access to each row results in a miss, but once the block is transferred into the cache, all subsequent accesses to that row are hits. So a 5 × 4 array would result in 5 misses and 15 hits over 20 accesses (assuming we access every element of the array). If a program accesses the array in column-major order, the first access to a column results in a miss, after which an entire row is transferred in; however, the second access in that column results in another miss. The data being transferred in for each row is not used, because the array is being accessed by column. Because the cache is not large enough, this produces 20 misses on 20 accesses. (A short simulation of this example appears at the end of this discussion.) A third example would be a program that loops through a linear array that does not fit in the cache. There is a significant decrease in locality when memory is used in this manner.

6.4.5 Cache Write Policies

In addition to determining which victim to select for replacement, designers must also decide what to do with so-called dirty cache blocks, or blocks that have been modified. When the processor writes to main memory, the data may be written to the cache instead, under the assumption that the processor will probably read it again soon. If a cache block is modified, the cache write policy determines when the actual main memory block is updated to match the cache block. There are two basic write policies:

• Write-through: A write-through policy updates both the cache and the main memory simultaneously on every write. This is slower than write-back, but ensures that the cache is consistent with the main system memory. The obvious disadvantage is that every write now requires a main memory access. A write-through policy means every write to the cache is also a write to main memory, which slows down the system (if all accesses were writes, this would essentially reduce the system to the speed of main memory). However, in real applications the majority of accesses are reads, so this slowdown is negligible.

• Write-back: A write-back policy (also called copyback) only updates the block in main memory when the cache block is selected as a victim and must be removed from the cache. This is normally faster than write-through because time is not wasted writing information to memory on every write to the cache. Memory traffic is also reduced. The disadvantage is that main memory and the cache may not hold the same value at a given instant, and if a process terminates (crashes) before the write to main memory is done, the data in the cache may be lost.
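Returning to the array-traversal example of Section 6.4.4, the short simulation below (Python; assumptions as stated there: one row per cache block, a cache that holds all but one row, plus an LRU-style eviction assumption of my own) counts the misses for both traversal orders and reproduces the 5-versus-20 figures.

```python
# Sketch: counting cache misses for row-major vs. column-major traversal of a
# 5 x 4 array. One row fills exactly one cache block; the cache holds all but
# one row; least recently used rows are evicted. Illustrative only.
from collections import OrderedDict

ROWS, COLS = 5, 4
CAPACITY = ROWS - 1            # cache holds all but one row, as in the text

def count_misses(order: str) -> int:
    cached = OrderedDict()     # rows currently resident, in LRU order
    misses = 0
    if order == "row":
        walk = [(r, c) for r in range(ROWS) for c in range(COLS)]
    else:                      # column-major traversal
        walk = [(r, c) for c in range(COLS) for r in range(ROWS)]
    for r, _ in walk:
        if r in cached:
            cached.move_to_end(r)          # hit: block holding row r is resident
        else:
            misses += 1                    # miss: load the block holding row r
            if len(cached) == CAPACITY:
                cached.popitem(last=False) # evict the least recently used row
            cached[r] = True
    return misses

print(count_misses("row"), count_misses("column"))   # -> 5 20
```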


Page 281:
The performance of a cache can be improved by increasing the hit ratio: a better mapping algorithm can yield up to about a 20% increase, better strategies for writes potentially a 15% increase, better replacement algorithms up to a 10% increase, and better coding practices, as we saw in the earlier example of row versus column-major access, up to a 30% increase in hit rate. Simply increasing the size of the cache may improve the hit rate by roughly 1–4%, but this is not guaranteed; what we really want is not just a bigger cache but a faster one.

6.5 Virtual Memory

The cache sits near the top of our memory hierarchy. Another important concept inherent in the hierarchy is virtual memory. The purpose of virtual memory is to use the hard disk as an extension of RAM, thus increasing the available address space a process can use. Most personal computers have a relatively small amount (often less than 512 MB) of main memory. This is usually not enough memory to hold multiple applications open at once, such as a word processing application, an e-mail program, and a graphics program, in addition to the operating system itself. Using virtual memory, your computer addresses more main memory than it actually has, and it uses the hard drive to hold the excess. This area on the hard drive is called a page file, because it holds chunks of main memory on the hard drive. The easiest way to think about virtual memory is to conceptualize it as an imaginary memory location in which all addressing issues are handled by the operating system.

The most common way to implement virtual memory is to divide memory into fixed-size blocks and divide programs into blocks of the same size. Usually, pieces of the program are brought into memory as needed. It is not necessary to store contiguous pieces of the program in contiguous pieces of main memory. Because pieces of the program can be stored out of order, program addresses, once generated by the CPU, must be translated to main memory addresses. Remember, in caching, a main memory address had to be transformed into a cache location; the same is true when using virtual memory: every virtual address must be translated into a physical address. How is this accomplished? Before we continue with an explanation of how virtual memory works, let's define some frequently used terms for virtual memory implemented through paging:

• Virtual address: The logical or program address that the process uses. Whenever the CPU generates an address, it is always in terms of virtual address space.
• Physical address: The real address in physical memory.
• Mapping: The mechanism by which virtual addresses are translated into physical ones (very similar to cache mapping).
• Page frames: The equal-size chunks or blocks into which main memory (physical memory) is divided.
• Pages: The chunks or blocks into which virtual memory (the logical address space) is divided, each equal in size to a page frame. Virtual pages are stored on disk until needed.


Page 282:
• Paging: The process of copying a virtual page from disk into a page frame in main memory.
• Fragmentation: Memory that becomes unusable.
• Page fault: An event that occurs when a requested page is not in main memory and must be copied into memory from disk.

Because main memory and virtual memory are divided into equal-size pages, pieces of the process address space can be moved into main memory but need not be stored contiguously. As stated earlier, we need not have the entire process in main memory at once; virtual memory allows a program to run when only specific pieces are present in memory. The parts not currently being used are stored in the page file on disk.

Virtual memory can be implemented with different techniques, including paging, segmentation, or a combination of both, but paging is the most popular. (This topic is covered in great detail within the study of operating systems.) The success of paging, like that of cache, depends heavily on the locality principle. When data is needed that does not reside in main memory, the entire page in which the data resides is copied from disk to main memory, in the hope that other data on the same page will be useful as the program continues to execute.

6.5.1 Paging

The basic idea behind paging is quite simple: Allocate physical memory to processes in fixed-size chunks (page frames) and keep track of where the various pages of the process reside by recording the information in a page table. Every process has its own page table, which typically resides in main memory, and the page table stores the physical location of each virtual page of the process. The page table has N rows, where N is the number of virtual pages in the process. If there are pages of the process currently not in main memory, the page table indicates this by setting a valid bit to 0; if the page is in main memory, the valid bit is set to 1. Therefore, each page table entry has two fields: a valid bit and a frame number.

Additional fields are often added to relay more information. For example, a dirty bit (or modify bit) could be added to indicate whether the page has been changed. This makes returning the page to disk more efficient, because if it has not been modified, it does not need to be rewritten to disk. Another bit (the usage bit) can be added to indicate page usage. This bit is set to 1 whenever the page is accessed. After a certain time period, the usage bit is set to 0. If the page is referenced again, the usage bit is set back to 1. However, if the bit remains 0, it indicates that the page has not been used for a period of time, and the system might benefit by sending this page out to disk. By doing so, the system frees up this page's location for another page that the process eventually needs (we discuss this in more detail when we introduce replacement algorithms).
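A page table entry as just described is only a handful of bits. Here is one possible representation as a minimal sketch in Python (the class and field names are illustrative, not from the text):

```python
# Sketch: a page table entry with a frame number, a valid bit, and the
# optional dirty and usage bits described above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PageTableEntry:
    frame: Optional[int] = None   # physical frame number (meaningful only if valid)
    valid: bool = False           # True if the page is resident in main memory
    dirty: bool = False           # True if the page was modified since being loaded
    used: bool = False            # set on each access, periodically cleared

# A page table is then simply one entry per virtual page of the process:
page_table = [PageTableEntry() for _ in range(8)]
page_table[0] = PageTableEntry(frame=2, valid=True)
```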


Page 283:
Virtual memory pages are the same size as physical memory page frames. Process memory is divided into these fixed-size pages, and this leads to potential internal fragmentation when the last page is copied into memory: the process may not actually need the entire page frame, but no other process may use it. Therefore, the unused memory in this last frame is effectively wasted. It may happen that the process itself requires less than one page in its entirety, yet it must occupy an entire page frame when copied into memory. Internal fragmentation is unusable space within a given partition (in this case, a page) of memory.

Now that you understand what paging is, we will discuss how it works. When a process generates a virtual address, the operating system must dynamically translate this virtual address into the physical address in memory at which the data actually resides. (For purposes of simplicity, let's assume we have no cache memory for the moment.) For example, from a program's point of view, we see the final byte of a 10-byte program as address 9, assuming 1-byte instructions, byte addressing, and a starting address of 0. However, when actually loaded into memory, the logical address 9 (perhaps a reference to the label X in an assembly language program) may actually reside in physical memory location 1239, implying the program was loaded starting at physical address 1230. There must be an easy way to convert the logical, or virtual, address 9 to the physical address 1239.

To accomplish this address translation, a virtual address is divided into two fields: a page field and an offset field, the latter representing the offset within the page where the requested data is located. This address translation process is similar to the process used when dividing main memory addresses into fields for the cache mapping algorithms. And similar to cache blocks, page sizes are usually powers of 2; this simplifies the extraction of page numbers and offsets from virtual addresses.

To access data at a given virtual address, the system performs the following steps (a code sketch of this procedure appears after Figure 6.11 below):

1. Extract the page number from the virtual address.
2. Extract the offset from the virtual address.
3. Translate the page number into a physical page frame number by accessing the page table.
   A. Look up the page number in the page table (using the virtual page number as an index).
   B. Check the valid bit for that page.
      1. If the valid bit = 0, the system generates a page fault and the operating system must intervene to:
         a. Locate the desired page on disk.
         b. Find a free page frame (this may necessitate removing a "victim" page from memory and copying it back to disk if memory is full).
         c. Copy the desired page into the free page frame in main memory.
         d. Update the page table. (The virtual page just brought in must have its frame number and valid bit in the page table modified. If there was a "victim" page, its valid bit must be set to zero.)
         e. Resume execution of the process causing the page fault, continuing to Step B2.



Page 284:
      2. If the valid bit = 1, the page is in memory.
         a. Replace the virtual page number with the actual frame number.
         b. Access the data at the offset within the physical page frame by adding the offset to the frame number for the given virtual page.

Note that if a process has free frames in main memory when a page fault occurs, the newly retrieved page can be placed in any of those free frames. However, if the memory allocated to the process is full, a victim page must be selected. The replacement algorithms used to select a victim are quite similar to those used in cache: FIFO, random, and LRU are all possible replacement algorithms for selecting a victim page. (For more information on replacement algorithms, see the references at the end of this chapter.)

Let's look at an example. Suppose we have a virtual address space of 2^8 words for a given process (this means the program generates addresses in the range 0 to 255 decimal, which is 00 to FF in hex), and physical memory consisting of 4 page frames (and no cache). Assume also that pages are 32 words in length. Virtual addresses contain 8 bits, and physical addresses contain 7 bits (4 frames of 32 words each is 128 words, or 2^7). Suppose, also, that some pages of the process have already been brought into main memory. Figure 6.11 illustrates the current state of the system.

Each virtual address is 8 bits long and is divided into 2 fields: the page field is 3 bits, indicating that there are 2^8 / 2^5 = 2^3 pages of virtual memory. Each page is 2^5 = 32 words in length, so we need 5 bits for the page offset. Therefore, an 8-bit virtual address has the format shown in Figure 6.12.

FIGURE 6.11 Current state using paging and the associated page table (virtual pages 0, 3, 4, and 7 are valid and reside in page frames 2, 0, 1, and 3, respectively)
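The numbered procedure above can be captured in a few lines. The following is a minimal sketch in Python (function and variable names are illustrative, not from the text) that applies the page table of Figure 6.11 to an 8-bit virtual address; the page-fault case is only a stub, since handling it requires the operating system. It anticipates the translation of virtual address 13 that is worked out by hand below.

```python
# Sketch: virtual-to-physical translation for the example system
# (8-bit virtual addresses, 3-bit page field, 5-bit offset, 4 page frames).

OFFSET_BITS = 5                 # 32-word pages
PAGE_SIZE = 1 << OFFSET_BITS

# (frame, valid) pairs for virtual pages 0..7, mirroring Figure 6.11
page_table = [(2, True), (None, False), (None, False), (0, True),
              (1, True), (None, False), (None, False), (3, True)]

def translate(virtual_address: int) -> int:
    page = virtual_address >> OFFSET_BITS          # step 1: extract page number
    offset = virtual_address & (PAGE_SIZE - 1)     # step 2: extract offset
    frame, valid = page_table[page]                # step 3: page table lookup
    if not valid:
        raise RuntimeError(f"page fault on virtual page {page}")
    return frame * PAGE_SIZE + offset              # frame number + offset

print(translate(13))    # virtual address 13 -> physical address 77
```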


Page 285:
FIGURE 6.12 Format for an 8-bit virtual address with a 2^5 = 32-word page size: 3-bit page field, 5-bit offset field

Suppose the system now generates the virtual address 13 (decimal) = 0D (hex) = 00001101 (binary). Dividing this binary address into its page and offset fields (see Figure 6.13), we see that the page field is 000 and the offset field is 01101. To continue the translation process, we use the page field value of 000 as an index into the page table. Going to entry 0 of the page table, we see that virtual page 0 maps to physical page frame 2 = 10 (binary). Therefore, the translated physical address becomes page frame 2, offset 13. Note that a physical address has only 7 bits (2 for the frame, because there are 4 frames, and 5 for the offset). Written in binary, using the two fields, this becomes 1001101, or address 4D (hex) = 77 (decimal), as shown in Figure 6.14. We could also find this address another way: each page has 32 words; we know the virtual address we want is on virtual page 0, which maps to physical page frame 2; frame 2 begins at address 64; an offset of 13 gives us address 77.

Let's look at a complete example in a real (but small) system (again, with no cache). Suppose a program is 16 bytes long, has access to an 8-byte memory that uses byte addressing (this means each byte, or word, has its own address), and a page is 2 words (bytes) in length. As the program executes, it generates the following address reference string (addresses are given in decimal): 0, 1, 2, 3, 6, 7, 10, 11. (This address reference string indicates that address 0 is referenced first, then address 1, then address 2, and so on.) Initially, memory contains no pages for this program. When address 0 is needed, both address 0 and address 1 (in page 0) are copied into page frame 2 in main memory (frames 0 and 1 might be occupied by another process and therefore unavailable).

FIGURE 6.13 Format for virtual address 00001101 (binary) = 13 (decimal): page field 000, offset field 01101
FIGURE 6.14 Format for physical address 1001101 (binary) = 77 (decimal): frame field 10, offset field 01101


Page 286:
This is an example of a page fault, because the desired page of the program had to be fetched from disk. When address 1 is referenced, the data already exists in memory (so we have a page hit). When address 2 is referenced, this causes another page fault, and page 1 of the program is copied to frame 0 in memory. This continues, and after these addresses are referenced and the pages copied from disk to main memory, the state of the system is as shown in Figure 6.15a. We see that program address 0, which contains the data value "A", currently resides in memory location 4 = 100 (binary).

FIGURE 6.15 A small memory example: (a) the program address space (16 bytes, values A through P) and main memory (8 bytes, currently holding C, D, G, H, A, B, K, L in frames 0 through 3); (b) the page table, in which pages 0, 1, 3, and 5 are valid and map to frames 2, 0, 1, and 3, respectively; (c) virtual address 10 = 1010 (binary) divided into a 3-bit page field and a 1-bit offset; (d) the resulting 3-bit physical address 110 (binary)


Page 287:
Therefore, the CPU must translate the virtual address 0 into the physical address 4, and it uses the translation scheme described above to do this. Note that main memory addresses contain 3 bits (there are 8 bytes in memory), but virtual (program) addresses must be 4 bits (because there are 16 bytes in the virtual address space). Therefore, the translation must also reduce a 4-bit address to a 3-bit address.

Figure 6.15b shows the page table for this process after the given pages have been accessed. We can see that pages 0, 1, 3, and 5 of the process are valid, and therefore reside in memory. Pages 2, 4, 6, and 7 are not valid and would each cause a page fault if referenced.

Let's take a closer look at the translation process. Suppose the CPU now generates program, or virtual, address 10 = 1010 (binary) for a second time. We see in Figure 6.15a that the data at this location, "K", resides in main memory address 6 = 110 (binary). However, the computer must perform a specific translation process to find the data. To accomplish this, the virtual address 1010 is divided into a page field and an offset field. The page field is 3 bits long because there are 8 pages in the program. This leaves 1 bit for the offset, which is correct because there are only 2 words on each page. This field division is illustrated in Figure 6.15c.

Once the computer sees these fields, it is a simple matter to convert to the physical address. The page field value of 101 (binary) is used as an index into the page table. Because 101 (binary) = 5, we use 5 as the offset into the page table (Figure 6.15b) and see that virtual page 5 maps to physical frame 3. We now replace the 5 = 101 (binary) with 3 = 11 (binary), but keep the same offset. The new physical address is 110 (binary), as shown in Figure 6.15d. This process correctly translates virtual addresses to physical addresses and reduces the number of bits from four to three, as required.

Now that we have worked through a small example, we are ready for a larger, more realistic one. Suppose we have a virtual address space of 8K words, a physical memory size of 4K words that uses byte addressing, a page size of 1K words, and a word size of one byte (there is no cache on this system either, but we are getting closer to understanding how memory works and will eventually use both paging and cache in our examples).

A virtual address has a total of 13 bits (8K = 2^13), with 3 bits used for the page field (there are 2^13 / 2^10 = 2^3 virtual pages) and 10 bits used for the offset (each page holds 2^10 bytes). A physical memory address has only 12 bits (4K = 2^12), with the first 2 bits as the frame field (there are 2^2 page frames in main memory) and the remaining 10 bits as the offset within the page. The formats for the virtual address and the physical address are shown in Figure 6.16a.

For purposes of this example, assume we have the page table shown in Figure 6.16b. Figure 6.16c shows a table of the main memory address at which each virtual page begins (in base 10), which is useful for illustrating the translations. Figure 6.16d shows how virtual address 5459 (decimal) = 1010101010011 (binary) is divided into its page and offset fields and how it is converted to the physical address 1363 (decimal) = 010101010011 (binary).


Page 288:
FIGURE 6.16 A larger memory example. (a) Address formats: virtual address space 8K = 2^13 (3-bit page field, 10-bit offset); physical memory 4K = 2^12 (2-bit frame field, 10-bit offset); page size 1K = 2^10. (b) The page table: pages 1, 2, 5, and 6 are valid and map to frames 3, 0, 1, and 2, respectively. (c) The starting main memory address of each virtual page: pages 0 through 7 begin at addresses 0, 1024, 2048, 3072, 4096, 5120, 6144, and 7168. (d) Virtual address 5459 = 101 0101010011 translates to physical address 1363 = 01 0101010011. (e) Virtual address 2050 = 010 0000000010 (page 2) translates to physical address 2 = 00 0000000010. (f) Virtual address 4100 = 100 0000000100 (page 4) generates a page fault.


Page 289:
The page field 101 of the virtual address is replaced by frame number 01, because page 5 maps to frame 1 (as indicated in the page table). Figure 6.16e illustrates how virtual address 2050 (decimal) is translated to physical address 2. Figure 6.16f shows virtual address 4100 (decimal) generating a page fault; page 4 = 100 (binary) is invalid in the page table.

It is worth mentioning that selecting an appropriate page size is very difficult. The larger the page size, the smaller the page table, which saves space in main memory. However, if the page is too large, the internal fragmentation becomes worse. Larger page sizes also mean fewer actual transfers from disk to main memory, because the chunks being transferred are larger. However, if they are too large, the principle of locality begins to break down and we waste resources by transferring data that may not be needed.

6.5.2 Effective Access Time Using Paging

Earlier in this chapter we introduced the notion of effective access time; we also need to address EAT when using virtual memory. There is a time penalty associated with virtual memory: for each memory access the processor generates, there must now be two physical memory accesses, one to reference the page table and one to reference the actual data we wish to access. It is easy to see how this affects the effective access time. Suppose a main memory access requires 200 ns and the page fault rate is 1% (99% of the time we find the page we need in memory). Assume it costs us 10 ms to access a page that is not in memory (this 10 ms includes the time necessary to transfer the page into memory, update the page table, and access the data). The effective access time for a memory access is now:

EAT = 0.99(200 ns + 200 ns) + 0.01(10 ms) = 100,396 ns

Even if 100% of the pages were in main memory, the effective access time would be:

EAT = 1.00(200 ns + 200 ns) = 400 ns

which is double the memory access time. Accessing the page table costs us an additional memory access, because the page table itself is stored in main memory.
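The paging EAT calculation above is another one-liner. Here is a minimal sketch in Python (names are illustrative) that reproduces both numbers in the example:

```python
# Sketch: effective access time with paging. Every reference costs two memory
# accesses (page table + data); a page fault additionally costs the full
# disk service time, as in the example above.

def paging_eat(memory_ns: float, fault_rate: float, fault_service_ns: float) -> float:
    return (1 - fault_rate) * (2 * memory_ns) + fault_rate * fault_service_ns

# 200 ns memory, 1% page faults, 10 ms (10,000,000 ns) fault service time
print(paging_eat(200, 0.01, 10_000_000))   # -> 100396.0 ns
print(paging_eat(200, 0.0, 10_000_000))    # -> 400.0 ns with no page faults
```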


Page 290:
We can speed up page table lookups by storing the most recently used page table entries in a special page table cache called a translation look-aside buffer (TLB). Each TLB entry consists of a virtual page number and its corresponding frame number. Table 6.2 shows a possible state of the TLB for the page table of the previous example. The TLB is typically implemented as associative cache, and virtual page/frame pairs can be mapped anywhere within it.

TABLE 6.2 Current state of the TLB for Figure 6.16: virtual pages 5, 2, 1, and 6 map to physical frames 1, 0, 3, and 2, respectively

The steps required for an address lookup when using a TLB are as follows (see Figure 6.17; a code sketch of these steps appears at the end of this discussion):

1. Extract the page number from the virtual address.
2. Extract the offset from the virtual address.
3. Search for the virtual page number in the TLB.
4. If the (virtual page #, page frame #) pair is found in the TLB, add the offset to the physical frame number and access the memory location.
5. If there is a TLB miss, go to the page table to get the necessary frame number. If the page is in memory, use the corresponding frame number and add the offset to yield the physical address.
6. If the page is not in main memory, generate a page fault and restart the access when the page fault is complete.

6.5.3 Putting It All Together: Using Cache, TLB, and Paging

Because the TLB is essentially a cache, putting all of these concepts together can be confusing. A walkthrough of the entire process will help you understand the big picture. When the CPU generates an address, it is an address relative to the program itself, that is, a virtual address. This virtual address must be converted into a physical address before the data retrieval can proceed. There are two ways to do this: (1) use the TLB to find the frame by locating a recently cached (page, frame) pair; or (2) on a TLB miss, use the page table to find the corresponding frame in main memory (the TLB is usually updated at this point as well). This frame number is then combined with the offset given in the virtual address to produce the physical address.

At this point, the virtual address has been converted into a physical address, but the data at that address has not yet been retrieved. There are two possibilities for retrieving the data: (1) search the cache to see whether the data resides there; or (2) on a cache miss, go to the actual main memory location to retrieve the data (the cache is usually updated at this point as well). Figure 6.18 illustrates the process of using a TLB, paging, and cache together.

6.5.4 Advantages and Disadvantages of Paging and Virtual Memory

In Section 6.5.2 we discussed how virtual memory implemented through paging adds an extra memory reference when accessing data. This time penalty is partially alleviated by using a TLB to cache page table entries. However, even with a high TLB hit rate, this process still incurs translation overhead. Another disadvantage of virtual memory and paging is the extra resource consumption (the memory overhead for storing page tables). In extreme cases (very large programs), the page tables may take up a significant portion of physical memory. One solution offered for this latter problem is to page the page tables, which can get very confusing indeed!
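Returning to the TLB lookup steps listed above, the sketch below (Python; the dictionary-based TLB and the page-fault stub are simplifications of my own) walks a virtual address through the TLB and then the page table, using the data of Table 6.2 and Figure 6.16. A real TLB holds a fixed number of entries and is searched in parallel by associative hardware; the dictionary here only models the lookup behavior.

```python
# Sketch: address translation with a TLB in front of the page table,
# for the larger example (1K-word pages, 13-bit virtual addresses).

OFFSET_BITS = 10
PAGE_SIZE = 1 << OFFSET_BITS

tlb = {5: 1, 2: 0, 1: 3, 6: 2}          # Table 6.2: virtual page -> frame
page_table = {1: 3, 2: 0, 5: 1, 6: 2}   # valid entries of Figure 6.16b

def translate_with_tlb(virtual_address: int) -> int:
    page = virtual_address >> OFFSET_BITS          # steps 1-2: split the address
    offset = virtual_address & (PAGE_SIZE - 1)
    if page in tlb:                                # steps 3-4: TLB hit
        frame = tlb[page]
    elif page in page_table:                       # step 5: TLB miss, page table hit
        frame = page_table[page]
        tlb[page] = frame                          # refresh the TLB
    else:                                          # step 6: page fault
        raise RuntimeError(f"page fault on virtual page {page}")
    return frame * PAGE_SIZE + offset

print(translate_with_tlb(5459))   # -> 1363, as in Figure 6.16d
print(translate_with_tlb(2050))   # -> 2, as in Figure 6.16e
```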


Page 291:
FIGURE 6.17 Using the TLB (flowchart: the CPU's virtual address is split into page and offset; a TLB hit supplies the frame directly; a TLB miss goes to the page table, which either supplies the frame and updates the TLB or, on a page fault, requires operating system intervention to load the page from secondary memory and update the page table)


Page 292:
FIGURE 6.18 Putting it all together: the TLB, page table, cache, and main memory (flowchart: the CPU generates a virtual address; the TLB or, failing that, the page table supplies the frame; a page fault brings the page in from disk, possibly writing a victim page back; the resulting physical address is then looked up in the cache, and main memory is accessed on a cache miss)

Virtual memory and paging also require special hardware and operating system support. The benefits of using virtual memory must outweigh these disadvantages for it to be useful in computer systems. But what are the advantages of virtual memory and paging? It is quite simple: programs are no longer restricted by the amount of physical memory that is available. Virtual memory permits us to run individual programs whose virtual address space is larger than physical memory. (In effect, this allows one process to share physical memory with itself.) This makes it much easier to write programs, because the programmer no longer has to worry about the physical address space limitations. Because each program requires less physical memory, virtual memory also permits us to run more programs at the same time. This allows us to share the machine among processes whose total address space sizes exceed the physical memory size, resulting in an increase in CPU utilization and system throughput.


Page 293:
The fixed size of frames and pages simplifies both allocation and placement from the perspective of the operating system. Paging also allows the operating system to specify protection ("this page belongs to User X and you can't access it") and sharing ("this page belongs to User X but you can read it") on a per-page basis.

6.5.5 Segmentation

Although it is the most common method, paging is not the only way to implement virtual memory. A second method employed by some systems is segmentation. Instead of dividing the virtual address space into equal, fixed-size pages and the physical address space into equal-size page frames, the virtual address space is divided into logical, variable-length units, or segments. Physical memory isn't really divided or partitioned into anything. When a segment needs to be copied into physical memory, the operating system looks for a chunk of free memory large enough to store the entire segment. Each segment has a base address, indicating where it is located in memory, and a bounds limit, indicating its size. Each program, consisting of multiple segments, now has an associated segment table instead of a page table. This segment table is simply a collection of base/bounds pairs for each segment.

Memory accesses are translated by providing a segment number and an offset within the segment. Error checking is performed to make sure the offset is within the allowable bound. If it is, then the base value for that segment (found in the segment table) is added to the offset, yielding the actual physical address. (A code sketch of this check-and-add translation appears at the end of this section.) Because paging is based on a fixed-size block and segmentation is based on a logical block, protection and sharing are easier using segmentation. For example, the virtual address space might be divided into a code segment, a data segment, a stack segment, and a symbol table segment, each of a different size. It is much easier to say "I want to share all of my data, so make my data segment accessible to everyone" than it is to say "OK, in which pages does my data reside, and now that I have found those four pages, let's make three of the pages accessible, but only half of that fourth page accessible."

Like paging, segmentation suffers from fragmentation. Paging creates internal fragmentation because a frame can be allocated to a process that doesn't need the entire frame. Segmentation, on the other hand, suffers from external fragmentation. As segments are allocated and deallocated, the free chunks that reside in memory become broken up. Eventually, there are many small chunks, but none large enough to store an entire segment. The difference between external and internal fragmentation is that, with external fragmentation, enough total memory space may exist to allocate to a process, but this space is not contiguous: it exists as a large number of small, unusable holes. With internal fragmentation, the memory simply isn't available because the system has over-allocated memory to a process that doesn't need it. To combat external fragmentation, systems use some sort of garbage collection. This process simply shuffles occupied chunks of memory to coalesce the smaller, fragmented chunks into larger, usable chunks. If you have ever defragmented a disk drive, you have witnessed a similar process: collecting the many small free spaces on the disk and creating fewer, larger ones.
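Returning to the base/bounds translation described earlier in this section, the check-then-add logic is short enough to sketch directly. The segment table below is an invented example (Python; illustrative only); only the bounds check and the base-plus-offset calculation reflect the text.

```python
# Sketch: address translation under segmentation. Each segment table entry is
# a (base, bounds) pair; the offset is checked against the bounds before
# being added to the base.

segment_table = [
    (1200, 500),    # segment 0: base 1200, bounds (size) 500
    (4000, 1000),   # segment 1
    (9000, 250),    # segment 2
]

def translate_segmented(segment: int, offset: int) -> int:
    base, bounds = segment_table[segment]
    if offset >= bounds:                  # error check: offset must be within bounds
        raise RuntimeError("segmentation violation: offset out of bounds")
    return base + offset                  # base + offset = physical address

print(translate_segmented(1, 42))   # -> 4042
```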


Page 294:
6.5.6 Paging Combined with Segmentation

Paging is not the same as segmentation. Paging is based on a purely physical value: the program and main memory are divided up into chunks of the same physical size. Segmentation, on the other hand, allows for logical portions of the program to be divided into variable-sized partitions. With segmentation, the user is aware of the segment sizes and boundaries; with paging, the user is unaware of the partitioning. Paging is easier to manage: allocation, freeing, swapping, and relocating are easy when everything is the same size. However, pages are typically smaller than segments, which means more overhead (in terms of resources to both track and transfer pages). Paging eliminates external fragmentation, whereas segmentation eliminates internal fragmentation. Segmentation has the ability to support sharing and protection, both of which are very difficult to do with paging.

Paging and segmentation both have their advantages; however, a system does not have to use one or the other: these two approaches can be combined, in an effort to get the best of both worlds. In a combined approach, the virtual address space is divided into segments of variable length, and the segments are divided into fixed-size pages. Main memory is divided into frames of the same size. Each segment has a page table, which means every program has multiple page tables. The logical address is divided into three fields. The first field is the segment field, which points the system to the appropriate page table. The second field is the page number, which is used as an index into that page table. The third field is the offset within the page. Combined segmentation and paging is very advantageous because it allows for segmentation from the user's point of view and paging from the system's point of view.

6.6 A Real-World Example of Memory Management

The Intel Pentium offers a representative real-world example of memory management, so here is a brief overview of how this processor deals with memory. The Pentium architecture allows for 32-bit virtual addresses and 32-bit physical addresses. It uses either 4KB or 4MB page sizes when paging is used. Paging and segmentation can be applied in different combinations, including unsegmented, unpaged memory; unsegmented, paged memory; segmented, unpaged memory; and segmented, paged memory. The Pentium has two caches, L1 and L2, both using a 32-byte block size. L1 is next to the processor, whereas L2 is between the processor and memory. The L1 cache is actually two caches; the Pentium (like many other machines) separates the L1 cache into a cache used to hold instructions (called the I-cache) and a cache used to hold data (called the D-cache). Both L1 caches use a single LRU bit for dealing with block replacement.


Page 295:
FIGURE 6.19 The Pentium memory hierarchy: the CPU is served by split L1 I- and D-caches (8 or 16 KB each, 2-way set associative, single-bit LRU, 32-byte line size), each with its own TLB (the I-cache TLB is 4-way set associative with 32 entries, the D-cache TLB is 4-way set associative with 64 entries), backed by a unified L2 cache (512 KB or 1 MB), main memory (up to 8 GB), and virtual memory

Each L1 cache has a TLB: the D-cache TLB has 64 entries and the I-cache TLB has only 32 entries. Both TLBs are 4-way set associative and use a pseudo-LRU replacement policy. The L1 D-cache and I-cache both use 2-way set associative mapping. The L2 cache can be from 512 KB (for earlier models) up to 1 MB (in later models). The L2 cache, like both L1 caches, uses 2-way set associative mapping.

To manage accesses to memory, the Pentium data cache and L2 cache use the MESI cache coherency protocol. Each cache line has two bits that store one of the following MESI states: (1) M: modified (the cache differs from main memory); (2) E: exclusive (the cache has not been modified and is the same as memory); (3) S: shared (this line/block may be shared with another cache line/block); and (4) I: invalid (the line/block is not in the cache). Figure 6.19 presents an overview of the Pentium memory hierarchy.

We have given only a brief and basic overview of the Pentium and its approach to memory management. If you are interested in more details, please check the "Further Reading" section.

CHAPTER SUMMARY

Computer memory is organized as a hierarchy, with larger memories being cheaper but slower, and smaller memories being faster but more expensive. In a typical memory hierarchy, we find a cache, main memory, and secondary memory (usually a disk drive).


Page 296:
The principle of locality helps bridge the gap between successive layers of this hierarchy, and the programmer gets the impression of a very fast and very large memory, without being concerned about the details of transfers among the various levels of the hierarchy.

A cache acts as a buffer to hold the most frequently used blocks of main memory and is close to the CPU. One goal of the memory hierarchy is for the processor to see an effective access time very close to the access time of the cache. Achieving this goal depends on the behavioral properties of the programs being executed, the size and organization of the cache, and the cache replacement policy. Processor references that are found in the cache are called cache hits; if they are not found, they are cache misses. On a miss, the missing data is fetched from main memory, and the entire block containing the data is loaded into the cache.

The organization of the cache determines the method the CPU uses to search the cache for different memory addresses. The cache can be organized in different ways: direct mapped, fully associative, or set associative. Direct mapped cache needs no replacement algorithm; however, fully associative and set associative caches must use FIFO, LRU, or some other placement policy to determine the block to remove from the cache to make room for a new block, if the cache is full. LRU gives very good performance but is very difficult to implement.

Another goal of the memory hierarchy is to extend main memory by using the hard disk itself, a method called virtual memory. Virtual memory allows us to run programs whose virtual address space is larger than physical memory. It also allows more processes to run concurrently. The disadvantages of virtual memory implemented with paging include extra resource consumption (the memory overhead for storing page tables) and extra memory accesses (to access the page table), unless a TLB is used to cache the most recently used virtual/physical address pairs. Virtual memory also incurs a translation penalty to convert the virtual address to a physical one, as well as a penalty for processing a page fault should the requested page currently reside on disk instead of in main memory.

The relationship between virtual memory and main memory is very similar to the relationship between main memory and cache. Because of this similarity, the concepts of cache memory and the TLB are often confused; in reality, the TLB is a cache. It is important to realize that virtual addresses must be translated to physical ones first, before anything else can be done, and this is what the TLB does. Although cache and paged memory appear to be very similar, the objectives are different: cache improves the effective access time to main memory, whereas paging extends the size of main memory.

FURTHER READING

Mano (1991) has a good explanation of RAM. Stallings (2000) also provides a very good explanation of RAM. Hamacher, Vranesic, and Zaky (2002) contains an extensive discussion of cache. For good coverage of virtual memory, see Stallings (2001), Tanenbaum (1999), or Tanenbaum and Woodhull (1997). For more information on memory management in general, see Flynn and McHoes


Page 297:
(1991), Stallings (2001), Tanenbaum and Woodhull (1997), or Silberschatz, Galvin, and Gagne (2001). Hennessy and Patterson (1996) discuss issues involved in determining cache performance. For an online tutorial on memory technologies, see www.kingston.com/king/mg0.htm. George Mason University also has a set of workbenches on various computer topics; the workbench for virtual memory is located at cne.gmu.edu/workbenches/vmsim/vmsim.html.

REFERENCES

Davis, W. Operating Systems, A Systematic View, 4th ed. Redwood City, CA: Benjamin/Cummings, 1992.
Flynn, I. M., and McHoes, A. M. Understanding Operating Systems. Pacific Grove, CA: Brooks/Cole, 1991.
Hamacher, V. C., Vranesic, Z. G., and Zaky, S. G. Computer Organization, 5th ed. New York: McGraw-Hill, 2002.
Hennessy, J. L., and Patterson, D. A. Computer Architecture: A Quantitative Approach, 2nd ed. San Francisco: Morgan Kaufmann, 1996.
Mano, Morris. Digital Design, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 1991.
Silberschatz, A., Galvin, P., and Gagne, G. Operating System Concepts, 6th ed. Reading, MA: Addison-Wesley, 2001.
Stallings, W. Computer Organization and Architecture, 5th ed. New York: Macmillan Publishing Company, 2000.
Stallings, W. Operating Systems, 4th ed. New York: Macmillan Publishing Company, 2001.
Tanenbaum, A. Structured Computer Organization, 4th ed. Englewood Cliffs, NJ: Prentice Hall, 1999.
Tanenbaum, A., and Woodhull, S. Operating Systems, Design and Implementation, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1997.

REVIEW OF ESSENTIAL TERMS AND CONCEPTS

1. Which is faster, SRAM or DRAM?
2. What are the advantages of using DRAM for main memory?
3. Name three different applications where ROMs are often used.
4. Explain the concept of a memory hierarchy. Why did the authors choose to represent it as a pyramid?
5. Explain the concept of locality of reference and state its importance to memory systems.
6. What are the three forms of locality?
7. Give two noncomputer examples of the concept of cache.
8. Which of L1 or L2 cache is faster? Which is smaller? Why is it smaller?
9. Cache is accessed by its ________, whereas main memory is accessed by its _______.


Page 298:
10. What are the three fields in a direct mapped cache address? How are they used to access a word located in cache?
11. How does associative memory differ from regular memory? Which is more expensive and why?
12. Explain how fully associative cache is different from direct mapped cache.
13. Explain how set associative cache combines the ideas of direct and fully associative cache.
14. Direct mapped cache is a special case of set associative cache where the set size is 1. So fully associative cache is a special case of set associative cache where the set size is ___.
15. What are the three fields in a set associative cache address, and how are they used to access a location in cache?
16. Explain the four cache replacement policies presented in this chapter.
17. Why is the optimal cache replacement policy important?
18. What is the worst-case cache behavior that can develop using LRU and FIFO cache replacement policies?
19. What, exactly, is effective access time (EAT)?
20. Explain how to derive an effective access time formula.
21. When does caching break down?
22. What is a dirty block?
23. Describe the advantages and disadvantages of the two cache write policies.
24. What is the difference between a virtual memory address and a physical memory address? Which is larger? Why?
25. What is the objective of paging?
26. Discuss the pros and cons of paging.
27. What is a page fault?
28. What causes internal fragmentation?
29. What are the components (fields) of a virtual address?
30. What is a TLB, and how does it improve EAT?
31. What are the advantages and disadvantages of virtual memory?
32. When would a system ever need to page its page table?
33. What causes external fragmentation, and how can it be fixed?

EXERCISES

1. Suppose a computer using direct mapped cache has 2^20 words of main memory and a cache of 32 blocks, where each cache block contains 16 words.
   a) How many blocks of main memory are there?


Page 299:
   b) What is the format of a memory address as seen by the cache; that is, what are the sizes of the tag, block, and word fields?
   c) To which cache block will the memory reference 0DB63 (hex) map?
2. Suppose a computer using direct mapped cache has 2^32 words of main memory and a cache of 1024 blocks, where each cache block contains 32 words.
   a) How many blocks of main memory are there?
   b) What is the format of a memory address as seen by the cache; that is, what are the sizes of the tag, block, and word fields?
   c) To which cache block will the memory reference 000063FA (hex) map?
3. Suppose a computer using fully associative cache has 2^16 words of main memory and a cache of 64 blocks, where each cache block contains 32 words.
   a) How many blocks of main memory are there?
   b) What is the format of a memory address as seen by the cache; that is, what are the sizes of the tag and word fields?
   c) To which cache block will the memory reference F8C9 (hex) map?
4. Suppose a computer using fully associative cache has 2^24 words of main memory and a cache where each cache block contains 64 words.
   a) How many blocks of main memory are there?
   b) What is the format of a memory address as seen by the cache; that is, what are the sizes of the tag and word fields?
   c) To which cache block will the memory reference 01D872 (hex) map?
5. Suppose that a system's main memory has 128M words. Blocks are 64 words in length, and the cache consists of 32K blocks. Show the format of a main memory address assuming a 2-way set associative cache mapping scheme. Be sure to include the fields as well as their sizes.
6. A 2-way set associative cache consists of four sets. Main memory contains 2K blocks of eight words each.
   a) Show the main memory address format that allows us to map addresses from main memory to cache. Be sure to include the fields as well as their sizes.
   b) Compute the hit ratio for a program that loops 3 times from locations 8 to 51 in main memory. You may leave the hit ratio in terms of fractions.
7. Suppose a computer using set associative cache has 2^16 words of main memory and a cache of 32 blocks, where each cache block contains 8 words.
   a) If this cache is 2-way set associative, what is the format of a memory address as seen by the cache; that is, what are the sizes of the tag, set, and word fields?
   b) If this cache is 4-way set associative, what is the format of a memory address as seen by the cache?
8. Suppose a computer using set associative cache has 2^21 words of main memory and a cache of 64 blocks, where each cache block contains 4 words.
   a) If this cache is 2-way set associative, what is the format of a memory address as seen by the cache; that is, what are the sizes of the tag, set, and word fields?


Page 300:
   b) If this cache is 4-way set associative, what is the format of a memory address as seen by the cache?
9. Suppose we have a computer that uses a memory address word size of 8 bits. This computer has a 16-byte cache with 4 bytes per block. The computer accesses a number of memory locations throughout the course of running a program. Suppose this computer uses direct mapped cache. The format of a memory address as seen by the cache is: tag, 4 bits; block, 2 bits; word, 2 bits. The system accesses memory addresses (in hex) in this exact order: 6E, B9, 17, E0, 4E, 4F, 50, 91, A8, A9, AB, AD, 93, and 94. The memory addresses of the first four accesses have been loaded into the cache blocks as shown below. (The tags are shown in binary, and the cache "contents" are simply the addresses stored at those cache locations.)

   Block 0: tag 1110, contents (addresses) E0, E1, E2, E3
   Block 1: tag 0001, contents (addresses) 14, 15, 16, 17
   Block 2: tag 1011, contents (addresses) B8, B9, BA, BB
   Block 3: tag 0110, contents (addresses) 6C, 6D, 6E, 6F

   a) What is the hit ratio for the entire memory reference sequence given above?
   b) What memory blocks will be in the cache after the last address has been accessed?
10. A direct mapped cache consists of eight blocks. Main memory contains 4K blocks of eight words each. Access time for the cache is 22 ns, and the time required to fill a cache slot from main memory is 300 ns. (This time allows us to determine that the block is missing and bring it into cache.) Assume a request is always started in parallel to both cache and main memory (so if it is not found in cache, we do not have to add this cache search time to the memory access). If a block is missing from cache, the entire block is brought into the cache and the access is restarted. Initially, the cache is empty.
   a) Show the main memory address format that allows us to map addresses from main memory to cache. Be sure to include the fields as well as their sizes.
   b) Compute the hit ratio for a program that loops 4 times from locations 0 to 67 (decimal) in memory.
   c) Compute the effective access time for this program.


Page 301:
11. Consider a byte-addressable computer with 24-bit addresses, a cache capable of storing a total of 64KB of data, and blocks of 32 bytes. Show the format of a 24-bit memory address for:
   a) direct mapped
   b) associative
   c) 4-way set associative
12. Suppose a process page table contains the entries shown below. Using the format shown in Figure 6.15a, indicate where the process pages are located in memory.

   Frame: 1, 0, 3, 2    Valid bit: 1, 0, 1, 1, 0, 0, 1, 0

13. Suppose a process page table contains the entries shown below. Using the format shown in Figure 6.15a, indicate where the process pages are located in memory.

   Frame: 3, 2, 0, 1    Valid bit: 0, 1, 0, 0, 1, 1, 0, 1

14. You have a virtual memory system with a two-entry TLB, a 2-way set associative cache, and a page table for a process P. Assume cache blocks of 8 words and a page size of 16 words. In the system below, main memory is divided into blocks, where each block is represented by a letter. Two blocks equal one frame.


Page 302:
The figure accompanying this exercise shows the current contents of the two-entry TLB, the 2-way set associative cache (two sets, each holding two tagged blocks), the page table for process P (pages 0, 1, 3, and 4 valid, mapped to frames 3, 0, 2, and 1, respectively), main memory (8 blocks, two blocks per frame, currently holding C, D, I, J, G, H, A, B), and the virtual memory for process P (16 blocks holding A through P).

Given the state of the system described above, answer the following questions:
   a) How many bits are in a virtual address for process P? Explain.
   b) How many bits are in a physical address? Explain.
   c) Show the address format for virtual address 18 (decimal) (specify field names and sizes) that would be used by the system to translate to a physical address, and then translate this virtual address into the corresponding physical address. (Hint: Convert 18 to its binary equivalent and divide it into the appropriate fields.) Explain how these fields are used to translate to the corresponding physical address.
15. Given a virtual memory system with a TLB, a cache, and a page table, assume the following:
   • A TLB hit requires 5 ns.
   • A cache hit requires 12 ns.
   • A memory reference requires 25 ns.
   • A disk reference requires 200 ms (this includes updating the page table, the cache, and the TLB).
   • The TLB hit ratio is 90%.
   • The cache hit rate is 98%.
   • The page fault rate is 0.001%.
   • On a TLB or cache miss, the time required for access includes a TLB and/or cache update, but the access is not restarted.
   • On a page fault, the page is fetched from disk, all updates are performed, but the access is restarted.
   • All references are sequential (no overlap, nothing done in parallel).


Page 303:
For each of the following, indicate whether or not it is possible. If it is possible, specify the time required for accessing the requested data.
   a) TLB hit, cache hit
   b) TLB miss, page table hit, cache hit
   c) TLB miss, page table hit, cache miss
   d) TLB miss, page table miss, cache hit
   e) TLB miss, page table miss
   Write down the equation to calculate the effective access time.
16. A system implements a paged virtual address space for each process using a one-level page table. The maximum size of virtual address space is 16 MB. The page table for the running process includes the following valid entries (the → notation indicates that a virtual page maps to the given page frame; that is, the page is located in that frame):
   Virtual page 2 → page frame 4
   Virtual page 4 → page frame 9
   Virtual page 1 → page frame 2
   Virtual page 3 → page frame 16
   Virtual page 0 → page frame 1
   The page size is 1024 bytes and the maximum physical memory size of the machine is 2 MB.
   a) How many bits are required for each virtual address?
   b) How many bits are required for each physical address?
   c) What is the maximum number of entries in a page table?
   d) To which physical address will the virtual address 1524 (decimal) translate?
   e) Which virtual address will translate to physical address 1024 (decimal)?
17. a) If you are a computer builder trying to make your system as price-competitive as possible, what features and organization would you select for its memory hierarchy?
    b) If you are a computer buyer trying to get the best performance from a system, what features would you look for in its memory hierarchy?
18. Consider a system that has multiple processors, where each processor has its own cache, but main memory is shared among all processors.
    a) Which cache write policy would you use?
    b) The cache coherency problem: With regard to the system just described, what problems are caused if a processor has a copy of memory block A in its cache and a second processor, also having a copy of A in its cache, then updates main memory block A? Can you think of a way (perhaps more than one) to prevent this situation, or to lessen its effects?
19. Pick a specific architecture (other than the one covered in this chapter). Do research to find out how your architecture approaches the concepts introduced in this chapter, as was done for Intel's Pentium.


Page 304:
“Who is General Failure and why is he reading my disk?”
—Anonymous

CHAPTER 7
Input/Output and Storage Systems

7.1 INTRODUCTION
A computer is of no use without some means of getting data into it and information out of it. Having a computer that does not do this effectively or efficiently is little better than having no computer at all. When processing time exceeds user "think time," users will complain that the computer is "slow." This slowness can sometimes have a substantial impact on productivity, measured in hard currency. More often than not, the root cause of the problem is not in the processor or the memory, but in how the system processes its input and output (I/O).

I/O is about more than just file storage and retrieval. A poorly functioning I/O system can have a ripple effect, dragging down the entire computer system. In the preceding chapter we described virtual memory, that is, how systems page blocks of memory out to disk to make room for more user processes in main memory. If the disk system is sluggish, process execution slows down, causing backlogs in the CPU and disk queues. The easy solution to the problem is to simply throw more resources at the system: buy more main storage, buy a faster processor. If we are in a particularly draconian mood, we could simply limit the number of concurrent processes! Such measures are wasteful, if not plain irresponsible. If we truly understand what is happening in a computer system, we can make the best use of the available resources, adding costly features only when absolutely necessary. The goal of this chapter is to give you an overview of the ways in which I/O and storage capacities can be optimized, enabling you to make informed storage choices. Our greatest hope is that you will be able to use this information as a springboard for further study, and perhaps even innovation.


Page 305:
7.2 AMDAHL'S LAW
Each time a certain microprocessor company announces its latest and greatest CPU, headlines sprout all over the world proclaiming this latest leap forward in technology. Cyberphiles around the world agree that such advances are laudable and worthy of fanfare. However, when similar advances are made in I/O technology, the story may run on page 67 of some obscure trade magazine. Beneath the media clamor, it is easy to lose sight of the integrated nature of computer systems. A 40% speedup for one component will certainly not make the entire system 40% faster, despite media implications to the contrary.

In 1967, George Amdahl recognized the interrelationship of all components with the overall efficiency of a computer system. He quantified his observations in a formula, now known as Amdahl's Law. In essence, Amdahl's Law states that the overall speedup of a computer system depends on both the speedup of a particular component and how much that component is used by the system. In symbols:

    S = 1 / ((1 − f) + f/k)

where
S is the overall speedup;
f is the fraction of work performed by the faster component; and
k is the speedup of the new component.

Let's say that most of your daytime processes spend 70% of their time running in the CPU and 30% waiting for service from the disk. Suppose also that someone is trying to sell you a processor upgrade that is 50% faster than the one you have and costs $10,000. The day before, someone called you offering a set of disk drives for $7,000. These new drives promise two and a half times the performance of your existing drives. You know that your system's performance is starting to degrade, so you have to do something. Which option would you choose to yield the better performance improvement for the least amount of money?

For the processor option we have f = 0.70 and k = 1.5, so:

    S = 1 / ((1 − 0.7) + 0.7/1.5) ≈ 1.30

So we see an overall speedup of 130% with the new processor, for $10,000. For the disk option we have f = 0.30 and k = 2.5, so:

    S = 1 / ((1 − 0.3) + 0.3/2.5) ≈ 1.22

Upgrading the disk drives gives us a speedup of 122% for $7,000.
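These figures are easy to reproduce. The following Python snippet is a minimal sketch (not from the text) that evaluates Amdahl's Law for both upgrade options and derives the cost per percentage point of improvement discussed just below; the prices and workload fractions are the ones assumed in the example above.

    def amdahl_speedup(f, k):
        """Amdahl's Law: overall speedup when a component that performs
        fraction f of the work is made k times faster."""
        return 1.0 / ((1.0 - f) + f / k)

    # Workload from the example: 70% CPU time, 30% disk time.
    cpu_speedup = amdahl_speedup(f=0.70, k=1.5)     # ~1.30
    disk_speedup = amdahl_speedup(f=0.30, k=2.5)    # ~1.22

    # Cost per percentage point of improvement (prices from the example).
    cpu_cost_per_pct = 10_000 / ((cpu_speedup - 1) * 100)    # ~$329 (the text rounds to about $333)
    disk_cost_per_pct = 7_000 / ((disk_speedup - 1) * 100)   # ~$319 (the text says about $318)

    print(f"CPU upgrade:  speedup {cpu_speedup:.2f}, about ${cpu_cost_per_pct:.0f} per 1%")
    print(f"Disk upgrade: speedup {disk_speedup:.2f}, about ${disk_cost_per_pct:.0f} per 1%")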


Page 306:
All things being equal, it is a tough call. Each 1% of performance improvement resulting from the processor upgrade costs about $333. Each 1% of improvement from the disk upgrade costs about $318, in dollars spent per percentage point of performance gained. Certainly, other factors would influence your decision. For example, if your disks are nearing the end of their expected life, or if you are running out of disk space, you might consider the disk upgrade even if it were to cost more than the processor upgrade. Either way, you need to know your options. The sections that follow will help you gain an understanding of general I/O architecture, with special emphasis on disk I/O. Disk I/O follows closely behind the CPU and memory in determining the overall effectiveness of a computer system.

7.3 I/O ARCHITECTURES
We will define input/output as a subsystem of components that moves coded data between external devices and a host system, consisting of a CPU and main memory. I/O subsystems include, but are not limited to:
• Blocks of main memory that are devoted to I/O functions
• Buses that provide the means of moving data into and out of the system
• Control modules in the host and in the peripheral devices
• Interfaces to external components such as keyboards and disks
• Cabling or communications links between the host system and its peripherals

Figure 7.1 shows how all of these components can fit together to form an integrated I/O subsystem. The I/O modules take care of moving data between main memory and a particular device interface. Interfaces are designed specifically to communicate with certain types of devices, such as keyboards, disks, or printers. Interfaces handle the details of making sure that devices are ready for the next batch of data, or that the host is ready to receive the next batch of data coming in from the peripheral device.

The exact form and meaning of the signals exchanged between a sender and a receiver is called a protocol. Protocols comprise command signals, such as "Printer reset"; status signals, such as "Tape ready"; or data-passing signals, such as "Here are the bytes you requested." In most data-exchanging protocols, the receiver must acknowledge the commands and data sent to it, or indicate that it is ready to receive data. This kind of protocol exchange is called a handshake.

External devices that handle large blocks of data (such as printers and disk and tape drives) are often equipped with buffer memory. Buffers allow the host system to send large quantities of data to peripheral devices in the fastest manner possible, without having to wait until slow mechanical devices have actually written the data. Dedicated memory on disk drives is usually of the fast cache variety, whereas printers are usually provided with slower RAM.


Page 307:
[FIGURE 7.1 A model I/O configuration: the CPU and main memory sit on the motherboard's memory bus; I/O modules on the I/O bus connect, through device adapter circuit boards (adapter cards) and cables, to device interfaces with control electronics and buffer memory for peripherals such as a monitor, keyboard, printer, and disk.]

Device control circuits take data to or from on-board buffers and ensure that it gets to where it is going. In the case of writing to disk, this involves making sure that the disk is positioned properly so that the data is written to a particular location. For printers, these circuits move the print head or laser beam to the next character position, fire the print head, feed the paper, and so on.

Disk and tape are forms of durable storage, so called because data recorded on them lasts longer than it would in volatile main memory. However, no storage method is permanent. The expected life of data on these media is approximately five years for magnetic media and as much as 100 years for optical media.

Computer systems employ a number of general I/O control methods. These methods are programmed I/O, interrupt-driven I/O, direct memory access, and channel-attached I/O. Although one method is not necessarily better than another, the manner in which a computer handles its I/O greatly influences overall system design and performance. The goal is to know when the I/O method employed by a particular computer architecture is appropriate to the manner in which the system will be used.

Programmed I/O. Systems using programmed I/O devote at least one register to the exclusive use of each I/O device. The CPU continually monitors each register, waiting for data to arrive. This is called polling. Thus, programmed I/O is sometimes referred to as polled I/O.


Page 308:
Once the CPU detects a "data ready" condition, it acts according to instructions programmed for that particular register. The benefit of using this approach is that we have programmatic control over the behavior of each device. Program changes can make adjustments to the number and types of devices in the system, as well as to their polling priorities and intervals. Constant register polling, however, is a problem. The CPU is in a continual "busy wait" loop until it starts servicing an I/O request. It does not do any useful work until there is I/O to process. Owing to these limitations, programmed I/O is best suited for special-purpose systems such as automated teller machines and systems that control or monitor environmental events.

Interrupt-driven I/O. Interrupt-driven I/O can be thought of as the converse of programmed I/O. Instead of the CPU continually asking its attached devices whether they have any input, the devices tell the CPU when they have data to send. The CPU proceeds with other tasks until a device requesting service interrupts it. Interrupts are usually signaled with a bit in the CPU flag register called an interrupt flag. Once the interrupt flag is set, the operating system interrupts whatever program is currently executing, saving that program's state and variable information. The system then fetches the address vector that points to the address of the I/O service routine. After the CPU has completed servicing the I/O, it restores the information it saved from the program that was running when the interrupt occurred, and program execution resumes.

Interrupt-driven I/O is similar to programmed I/O in that the service routines can be modified to accommodate hardware changes. Because the vectors for the various types of hardware are usually kept in the same locations in systems running the same type and level of operating system, these vectors are easily changed to point to vendor-specific code. For example, if someone comes up with a new type of disk drive that is not yet supported by a popular operating system, the drive's manufacturer can update the disk I/O vector to point to code particular to that drive. Unfortunately, some of the early DOS-based virus writers also used this idea. They would replace the DOS I/O vectors with pointers to their own nefarious code, eradicating many systems in the process. Many of today's popular operating systems employ interrupt-driven I/O. Fortunately, these operating systems have mechanisms in place to safeguard against this kind of vector manipulation.

Direct memory access. With both programmed I/O and interrupt-driven I/O, the CPU moves data to and from the I/O device. During I/O, the CPU runs instructions similar to the following pseudocode:

    WHILE More-input AND NOT Error
        ADD 1 TO Byte-count
        IF Byte-count > Total-bytes-to-be-transferred THEN
            EXIT
        ENDIF
        Place byte in destination buffer
        Raise byte-ready signal
        Initialize timer
        REPEAT
            WAIT
        UNTIL Byte-acknowledged, Timeout, OR Error
    ENDWHILE


Page 309:
Clearly, these instructions are simple enough to be programmed in a dedicated chip. That is the idea behind direct memory access (DMA). When a system uses DMA, the CPU offloads the execution of these tedious I/O instructions. To effect the transfer, the CPU provides the DMA controller with the location of the bytes to be transferred, the number of bytes to be transferred, and the destination device or memory address. This communication usually takes place through special I/O registers on the CPU. A sample DMA configuration is shown in Figure 7.2.

Once the proper values are placed in memory, the CPU signals the DMA subsystem and proceeds with its next task, while the DMA takes care of the details of the I/O. Once the I/O is complete (or ends in error), the DMA subsystem signals the CPU by sending it another interrupt. As you can see in Figure 7.2, the DMA controller and the CPU share the memory bus. Only one of them at a time can have control of the bus, that is, be the bus master. Generally, I/O takes priority over CPU memory fetches of program instructions and data because many I/O devices operate within tight timing parameters. If they detect no activity within a specified period, they time out and abort the I/O process.

[FIGURE 7.2 A sample DMA configuration: the CPU and a DMA circuit share the main memory bus; the DMA circuit holds the memory address, the number of bytes, and the device number for a transfer, and connects through device interfaces to peripherals such as a printer and a disk.]
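To make the CPU/DMA handoff concrete, here is a small Python sketch (an illustration, not code from the text) that models the sequence just described: the CPU loads the transfer parameters into the DMA controller's registers, signals it to start, and later receives a completion interrupt. The class and method names are invented for this example.

    class DMAController:
        """Toy model of a DMA controller as described above."""
        def __init__(self, memory):
            self.memory = memory          # shared main memory (a bytearray)
            self.src_address = 0          # location of the bytes to transfer
            self.byte_count = 0           # number of bytes to transfer
            self.device = None            # destination device

        def program(self, src_address, byte_count, device):
            # The CPU writes these values through special I/O registers.
            self.src_address = src_address
            self.byte_count = byte_count
            self.device = device

        def start(self, interrupt_cpu):
            # The DMA controller steals memory cycles to move the data,
            # then raises an interrupt when the transfer is done.
            data = self.memory[self.src_address:self.src_address + self.byte_count]
            self.device.write(bytes(data))
            interrupt_cpu("DMA transfer complete")

    class Printer:
        def write(self, data):
            print(f"[printer] received {len(data)} bytes")

    memory = bytearray(b"Hello, I/O world!" + bytes(47))
    dma = DMAController(memory)
    dma.program(src_address=0, byte_count=17, device=Printer())
    dma.start(interrupt_cpu=lambda msg: print(f"[cpu] interrupt: {msg}"))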


Page 310:
To avoid device timeouts, the DMA uses memory cycles that would otherwise be used by the CPU. This is called cycle stealing. Fortunately, I/O tends to create bursty traffic on the bus: data is sent in blocks, or clusters. The CPU should be granted access to the bus between bursts, though this access may not be of long enough duration to spare the system from accusations of "crawling while doing I/O."

Channel I/O. Programmed I/O transfers data one byte at a time. Interrupt-driven I/O can handle data one byte at a time or in small blocks, depending on the type of device participating in the I/O. Slower devices such as keyboards generate more interrupts per number of bytes transferred than disks or printers. DMA methods are all block-oriented, interrupting the CPU only after completion (or failure) of transferring a group of bytes. After the DMA signals the I/O completion, the CPU may give it the address of the next block of memory to be read from or written to. In the event of failure, the CPU is solely responsible for taking appropriate action. Thus, DMA I/O requires only a little less CPU participation than does interrupt-driven I/O. Such overhead is fine for small, single-user systems; however, it does not scale well to large, multi-user systems such as mainframe computers. Most mainframes use an intelligent type of DMA interface known as an I/O channel.

With channel I/O, one or more I/O processors (IOPs) control various I/O pathways called channel paths. Channel paths for "slow" devices such as terminals and printers can be combined (multiplexed), allowing management of several of these devices through a single controller. On IBM mainframes, a multiplexed channel path is called a multiplexor channel. Channels for disk drives and other "fast" devices are called selector channels.

Unlike DMA circuits, IOPs have the ability to execute programs that include arithmetic-logic and branching instructions. Figure 7.3 shows a simplified channel I/O configuration. IOPs execute programs that are placed in main system memory by the host processor. These programs, consisting of a series of channel command words (CCWs), include not only the actual transfer instructions but also commands that control the I/O devices. These commands include such things as device initializations, printer page ejects, and tape rewind commands, to name a few. Once the I/O program has been placed in memory, the host issues a start subchannel command (SSCH), informing the IOP of the location in memory where the program can be found. After the IOP has completed its work, it places completion information in memory and sends an interrupt to the CPU. The CPU then obtains the completion information and takes action appropriate to the return codes.

The principal distinction between stand-alone DMA and channel I/O lies in the intelligence of the IOP. The IOP negotiates protocols, issues device commands, translates storage coding to memory coding, and can transfer entire files or groups of files independent of the host CPU. The host has only to create the program instructions for the I/O operation and tell the IOP where to find them.
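The division of labor between host and IOP can be sketched in a few lines of Python. This is only an illustration of the idea of channel command words described above; the command names and classes are invented for the example and are not IBM's actual CCW format.

    from dataclasses import dataclass

    @dataclass
    class ChannelCommandWord:
        command: str        # e.g. "INITIALIZE", "TRANSFER", "REWIND"
        device: str         # target device on the channel path
        operand: object = None

    class IOProcessor:
        """Toy IOP: executes a channel program placed in main memory by the host."""
        def run(self, channel_program, interrupt_cpu):
            for ccw in channel_program:
                # A real IOP would negotiate protocols and drive the device here.
                print(f"[iop] {ccw.command} on {ccw.device} ({ccw.operand})")
            interrupt_cpu("channel program complete, return code 0")

    # The host builds the program, then does the equivalent of issuing SSCH,
    # pointing the IOP at the program's location in memory.
    program = [
        ChannelCommandWord("INITIALIZE", "tape0"),
        ChannelCommandWord("TRANSFER", "tape0", operand="disk file MYFILE"),
        ChannelCommandWord("REWIND", "tape0"),
    ]
    IOProcessor().run(program, interrupt_cpu=lambda msg: print(f"[cpu] {msg}"))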


Page 311:
[FIGURE 7.3 A channel I/O configuration: the CPU and main memory sit on the memory bus; an I/O bridge connects it to an I/O bus on which several I/O processors (IOPs) manage disks, tapes, a printer, a terminal controller, and a local area network.]

Like stand-alone DMA, an IOP must steal memory cycles from the CPU. Unlike stand-alone DMA, channel I/O systems are equipped with separate I/O buses, which help to isolate the host from the I/O operation. When copying a file from disk to tape, for example, the IOP uses the system memory bus only to fetch its instructions from main memory. The remainder of the transfer is effected using only the I/O bus. Owing to its intelligence and bus isolation, channel I/O is used in high-throughput transaction-processing environments, where its cost and complexity can be justified.

In Chapter 1, we introduced computer bus architecture using the schematic shown in Figure 7.4. The important ideas conveyed by this diagram are:
• A system bus is a resource shared among many components of a computer system.
• Access to this shared resource must be controlled. This is why a control bus is required.

From our discussions in the preceding sections, it is evident that the memory bus and the I/O bus can be separate entities. In fact, it is often a good idea to separate them. One good reason for having memory on its own bus is that memory transfers can be synchronous, using some multiple of the CPU's clock cycles to retrieve data from main memory. In a properly functioning system, there is never an issue of the memory going offline or sustaining the same kinds of errors that affect peripheral equipment, such as a printer running out of paper.


Page 312:
[FIGURE 7.4 A high-level view of a system bus: the CPU (ALU, registers, and control), memory, and I/O connected by a data bus, an address bus, and a control bus.]

I/O buses, on the other hand, cannot operate synchronously. They must take into account the fact that I/O devices may not always be ready to process an I/O transfer. I/O control circuits placed on the I/O bus and within the I/O devices negotiate with each other to determine the moment when each device may use the bus. Because these handshakes take place every time the bus is accessed, I/O buses are called asynchronous. We often distinguish synchronous from asynchronous transfers by saying that a synchronous transfer requires the sender and the receiver to share a common clock for timing. But asynchronous bus protocols also require a clock for bit timing and to delineate signal transitions. This idea will become clearer after we look at an example.

Consider, once again, the configuration shown in Figure 7.2. For the sake of clarity, we did not separate the data, address, and control lines. The connection between the DMA circuit and the device interface circuits is represented more accurately in Figure 7.5, which shows the individual component buses.

[FIGURE 7.5 A DMA configuration showing the separate address, data, and control lines.]


Page 313:
[FIGURE 7.6 A disk controller interface with its connections to the I/O bus: n-bit data and address lines feed an address decoder, an I/O controller with a cache, and the disk controller; the control lines include Request, Ready, Write/Read, Clock (bus), Reset, and Error.]

Figure 7.6 gives the details of how the disk interface connects to all three buses. The address and data buses consist of a number of individual conductors, each of which carries one bit of information. The number of data lines determines the width of the bus. A data bus with eight data lines carries one byte at a time. The address bus has a sufficient number of conductors to uniquely identify each device on the bus.

The group of control lines shown in Figure 7.6 is the minimum that we need for our illustrative purpose. Real I/O buses typically have more than a dozen control lines. (The original IBM PC bus had more than 20!) Control lines coordinate the activities of the bus and its attached devices. To write data to the disk drive, our example bus executes the following sequence of operations:
1. The DMA circuit places the address of the disk controller on the address lines and raises (asserts) the Request and Write signals.
2. With the Request signal raised, decoder circuits in the controller examine the address lines.
3. Upon seeing its own address, the decoder enables the disk control circuits. If the disk is available for writing data, the controller asserts a signal on the Ready line. At this point, the handshake between the DMA and the controller is complete. With the Ready signal raised, no other device may use the bus.
4. The DMA circuits then place the data on the lines and lower the Request signal.
5. When the disk controller sees the Request signal drop, it transfers the byte from the data lines to the disk buffer and then lowers its Ready signal.
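The write handshake enumerated above can be modeled as a short sequence of signal changes. The following Python sketch is illustrative only (the signal and class names are made up); it steps through the five operations so that the ordering of Request, Ready, and the data lines is explicit.

    class Bus:
        """Toy shared bus: named control lines plus address/data values."""
        def __init__(self):
            self.signals = {"Request": 0, "Ready": 0, "Write": 0}
            self.address = None
            self.data = None

    def dma_write_byte(bus, controller_address, byte):
        # 1. DMA places the controller's address on the address lines and
        #    raises Request and Write.
        bus.address = controller_address
        bus.signals["Request"] = 1
        bus.signals["Write"] = 1

        # 2-3. The addressed controller decodes its address and, if the disk
        #      is available, raises Ready (handshake complete).
        bus.signals["Ready"] = 1

        # 4. DMA places the data on the data lines and lowers Request.
        bus.data = byte
        bus.signals["Request"] = 0

        # 5. Seeing Request drop, the controller copies the byte to its
        #    buffer and lowers Ready, freeing the bus.
        disk_buffer.append(bus.data)
        bus.signals["Ready"] = 0

    disk_buffer = []
    dma_write_byte(Bus(), controller_address=0x1F, byte=0x41)
    print(disk_buffer)   # [65]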


Page 314:
[FIGURE 7.7 A bus timing diagram for the disk write operation: intervals t0 through t10 show the behavior of the Request, Address, Write/Read, Ready, and Data lines and the (bus) Clock.]

To make this process clearer and more precise, engineers describe bus operation with timing diagrams. The timing diagram for our disk write operation is shown in Figure 7.7. The vertical lines, labeled t0 through t10, mark the duration of the various signals. In a real timing diagram, the time intervals would be assigned exact durations, usually on the order of 50 nanoseconds. Signals on the bus can change only during a clock cycle transition. Notice that the signals shown in the diagram do not rise and fall instantaneously. This reflects the physical reality of the bus: a short period must be allowed for the signal level to settle down. This settling time, although small, contributes to a large delay over long I/O transfers.

Many real I/O buses, unlike our example, do not have separate address and data lines. Owing to the asynchronous nature of an I/O bus, the data lines can also be used to carry the device address. All we need to do is add another control line that indicates whether the signals on the data lines represent an address or data. This approach is in contrast to a memory bus, where the address and data must be available at the same time.

Up to this point, we have assumed that the peripheral equipment sits idle on the bus until a command to do otherwise appears on the lines. In small computer systems, this "speak only when spoken to" approach is not very useful. It implies that all system activity originates in the CPU, when in fact activity originates with the user.


Page 315:
BYTES, DATA, AND INFORMATION . . . FOR THE RECORD

"digerati need not be illiterate."
— Bill Walsh, Lapsing into a Comma, Contemporary Books, 2000

Many people use the word information as a synonym for data, and data as a synonym for bytes. In fact, we often use data interchangeably with bytes in this text for ease of reading, hoping that the context makes the meaning clear. We ask only that you keep in mind the true meanings of these words.

In its most literal sense, the word data is plural. It comes from the Latin singular datum. Thus, to refer to more than one datum, one properly uses the word data. It is, in fact, easy on our ears when someone says, "Recent mortality data indicate that people are living longer now than they did a century ago." But we cannot explain why we wince when someone says something like "A page fault occurs when data are swapped from memory to disk." When we use data to refer to something stored in a computer system, we are really conceptualizing data as an "undifferentiated mass," in the same sense in which we think of air and water. Air and water consist of discrete elements called molecules; likewise, a mass of data consists of discrete elements called data. No person fluent in English would say that she breathes airs or bathes in waters. Therefore, it seems reasonable to say ". . . data is swapped from memory to disk." Most scholarly sources (including the American Heritage Dictionary) now recognize data as a singular collective noun when it is used in this way.

Strictly speaking, computer storage media do not store data. They store bit patterns called bytes. For example, if you were to use a binary sector editor to examine the contents of a disk, you might see the pattern 01000100. So what knowledge have you gained from seeing it? For all you know, this bit pattern could be the binary code of a program, part of an operating system structure, a photograph, or even someone's bank balance. If you know for a fact that the bits represent some


Page 316:
numeric quantity (as opposed to program code or an image file, for example) and that it is stored in two's complement binary, you can safely say that it is the decimal number 68. But you still don't have a datum. Before you can have a datum, someone must ascribe some context to this number. Is it a person's age or height? Is it the model number of a can opener? If you learn that 01000100 comes from a file that contains the temperature output of an automated weather station, then you have yourself a datum. The file on the disk can then be correctly called a data file.

By now you have probably surmised that the weather data is expressed in degrees Fahrenheit, because no place on Earth has ever reached 68° Celsius. But you still don't have information. The datum is meaningless: Is it the current temperature in Amsterdam? Is it the temperature that was recorded at 2 a.m. three years ago in Miami? The datum 68 becomes information only when it has meaning to a human being.

Another Latin plural noun that has recently been recognized in singular usage is the word media. Formerly, educated people used this word only when they wished to refer to more than one medium. Newspapers are one kind of communication medium. Television is another. Collectively, they are media. But now some publishers accept the singular usage, as in, "Right now, the media is gathering on Capitol Hill." Artists can paint using a watercolor medium or an oil paint medium. Computer data can be recorded on an electronic medium such as tape or disk. Collectively, these are electronic media. But you will rarely find a practitioner who intentionally uses the term correctly. It is much more common to encounter statements like "Volume 2 ejected. Please place new media into the tape drive." In this context, it is debatable whether most people would even understand the directive ". . . place a new medium into the tape drive."

Semantic arguments such as these are symptomatic of the kinds of problems computer professionals face when they try to express human ideas in digital form, and vice versa. There is bound to be something lost in the translation, and we learn to accept that. There are, however, limits beyond which some of us are unwilling to go. Those limits are sometimes called "quality."

In order to communicate with the CPU, the user must have a way to get its attention. To this end, small systems employ interrupt-driven I/O. Figure 7.8 shows how such a system might be configured. Everything is the same as in our earlier example, except that the peripherals are now provided with a way to communicate with the CPU. Every peripheral device in the system has access to an interrupt request line. The interrupt control chip has an input for each interrupt line. Whenever an interrupt line is asserted, the controller decodes the interrupt and raises the Interrupt (INT) input on the CPU. When the CPU is ready to service the interrupt, it asserts the Interrupt Acknowledge (INTA) signal. Once the interrupt controller receives this acknowledgment, it can lower its INT signal. System designers must, of course, decide which devices should take precedence over the others when more than one device raises an interrupt at the same time. This design decision is hardwired into the controller. Each system that uses the same operating system and interrupt controller connects high-priority devices (such as a keyboard) to the same interrupt request line.
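The priority decision just described (which device is serviced first when several raise interrupts at once) is easy to illustrate. The sketch below is a toy model in Python, not the behavior of any particular interrupt controller; the priority ordering is an assumption made for the example.

    # Toy interrupt controller: lower number = higher priority (an assumption
    # for this example; real controllers hardwire their own ordering).
    PRIORITY = {"keyboard": 0, "mouse": 1, "disk": 2, "printer": 3}

    def resolve_interrupt(pending):
        """Return the device to service first from a set of pending requests."""
        return min(pending, key=lambda dev: PRIORITY[dev])

    pending_requests = {"printer", "disk", "keyboard"}
    print(resolve_interrupt(pending_requests))   # the keyboard is serviced first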


Page 317:
[FIGURE 7.8 An I/O subsystem using interrupts: keyboard, mouse, disk, and printer adapters on the I/O bus feed an interrupt controller, which raises INT to the CPU and receives INTA in return.]

The number of interrupt request lines is limited on every system, and in some cases interrupts can be shared. Shared interrupts cause no problems when it is clear that no two devices will need the same interrupt at the same time. For example, a scanner and a printer can usually coexist peacefully using the same interrupt. This is not always the case with serial mice and modems, which, unknown to the person installing them, may use the same interrupt, causing bizarre behavior in both.

7.4 MAGNETIC DISK TECHNOLOGY
Before the advent of disk drive technology, sequential media such as punched cards and magnetic or paper tape were the only kinds of durable storage available. If the data needed were written at the end of a tape reel, the entire volume had to be read, one record at a time. Slow readers and small system memories made this process excruciatingly slow. Tape and cards were not only slow, they also degraded rather quickly owing to the physical and environmental stresses to which they were exposed. Paper tape often stretched and broke. Open-reel magnetic tape not only stretched, but was also subject to mishandling by operators. Cards could be torn, lost, and warped.

Against this technological backdrop, it is easy to see how IBM fundamentally changed the world of computing in 1956 when it deployed the first commercial disk-based computer, called the Random Access Method of Accounting and Control, or RAMAC for short. By today's standards, the disk in that first machine was incomprehensibly huge and slow. Each disk platter was 24 inches in diameter and held only 50,000 7-bit characters of data on each surface. Fifty two-sided platters were mounted on a spindle that was housed in a showy glass enclosure about the size of a small garden shed. Total storage capacity per spindle was a mere 5 million characters, and it took a full second, on average, to access data on the disk. The drive weighed more than a ton and cost millions of dollars to lease. (One could not buy IBM equipment at the time.)


Page 318:
By contrast, in early 2000 IBM began marketing a high-capacity disk drive for use in laptop computers and digital cameras. These disks are 1 inch in diameter, hold 1 gigabyte (GB) of data, and provide an average access time of 15 milliseconds. The drive weighs less than 30 grams and retails for less than $300.

Disk drives are called random (sometimes direct) access devices because each unit of storage, the sector, has a unique address that can be accessed independently of the sectors around it. As shown in Figure 7.9, sectors are divisions of concentric circles called tracks. On most systems, every track contains exactly the same number of sectors, and each sector contains the same number of bytes. Data is therefore written more densely near the center of the disk than at its outer edge. Some manufacturers pack more bytes onto their disks by making all sectors approximately the same size, placing more sectors on the outer tracks than on the inner tracks. This is called zoned-bit recording. Zoned-bit recording is rarely used because it requires more sophisticated drive control electronics than traditional systems.

Disk tracks are numbered consecutively, starting with track 0 at the outermost edge of the disk. Sectors, however, may not be in consecutive order around the perimeter of a track. They are sometimes "skipped over" to give the drive circuitry time to process the contents of one sector before reading the next sector. This is called interleaving.

[FIGURE 7.9 Disk sectors, showing the intersector and intertrack gaps and the logical sector format: a header with synchronizing information, the data, and a trailer with the error-correcting code (ECC).]


Page 319:
Interleaving varies with the speed of disk rotation as well as with the speed of the disk circuitry and its buffers. Most of today's fixed disk drives read disks one track at a time, not one sector at a time, so interleaving is becoming less common.

A magnetic disk drive consists of one or more metal or glass disks, called platters, to which a thin film of magnetizable material is bonded. The disk platters are stacked on a spindle, which is turned by a motor located within the drive housing. The disks can spin as fast as 15,000 revolutions per minute (rpm), the most common speeds being 5,400 rpm and 7,200 rpm. Read/write heads are typically mounted on a rotating actuator arm that is positioned in its proper place by magnetic fields induced in coils surrounding the axis of the actuator arm (see Figure 7.10). When the actuator is energized, the entire comb of read/write heads moves toward or away from the center of the disk.

Despite continual improvements in magnetic disk technology, it is still impossible to mass-produce a completely error-free medium. Although the probability of error is small, errors must nevertheless be expected. Two mechanisms are used to reduce errors on the surface of the disk: special coding of the data itself and error-correcting algorithms. (This special coding and some error-correcting codes were discussed in Chapter 2.) These tasks are handled by circuits built into the disk controller hardware. Other circuits in the disk controller take care of head positioning and disk timing.


Page 320:
In a stack of disk platters, all of the tracks directly above and below one another form a cylinder. A comb of read/write heads accesses one cylinder at a time. Cylinders describe circular areas on each disk. Typically, there is one read/write head per usable surface of the disk. (Older disks, especially removable disks, did not use the top surface of the top platter or the bottom surface of the bottom platter.)

Fixed disk heads never touch the surface of the disk. Instead, they float above the disk surface on a cushion of air only a few microns thick. When the drive is powered down, the heads retreat to a safe place. This is called parking the heads. If a read/write head were to touch the surface of the disk, the disk would become unusable. This condition is known as a head crash. Head crashes were common during the early years of disk storage. The electronic and mechanical components of first-generation disk drives were costly relative to the price of the disk platters, so to provide the most storage for the least money, computer manufacturers made disk drives with removable disks called disk packs. Removing and replacing the packs, however, allowed airborne impurities such as dust and moisture into the drive housing. Consequently, large head-to-disk clearances were required to prevent these impurities from causing head crashes. (Despite the large head-to-disk clearances, frequent crashes persisted, with some companies experiencing more downtime than uptime.) The price paid for the large head-to-disk clearance was substantially lower data density: the greater the distance between the head and the disk, the stronger the magnetic flux on the disk must be for the data to be readable. Stronger magnetic charges require more magnetic particles to participate in each flux transition, resulting in lower data densities for the drive.

Eventually, reductions in the cost of controller circuitry and mechanical components permitted the widespread use of sealed disk units. IBM invented this technology, which was developed under the code name "Winchester." Winchester soon became a generic term for any sealed disk unit. Today, since removable-pack drives are no longer manufactured, the distinction is no longer necessary. Sealed drives permit tighter head-to-disk clearances, higher data densities, and faster rotational speeds. These factors determine the performance characteristics of a hard disk.

Seek time is the time it takes for the disk arm to position itself over the required track. Seek time does not include the time that it takes for the head to read the disk directory. The disk directory maps logical file information, for example my_story.doc, to a physical sector address, such as cylinder 7, surface 3, sector 72. Some high-performance disk drives practically eliminate seek time by providing a read/write head for every track on every usable surface of the disk. With no movable arms in the system, the only delays in accessing data are caused by rotational delay.

Rotational delay is the time it takes for the required sector to position itself under a read/write head. The sum of the rotational delay and the seek time is known as the access time. If we add to the access time the time that it takes to actually read the data from the disk, we get a quantity known as transfer time, which, of course, varies depending on how much data is read. Latency is a direct function of rotational speed. It is a measure of the amount of time it takes for


Page 321:
the desired sector to move beneath the read/write head after the disk arm has positioned itself over the desired track. Latency is usually cited as an average and is calculated as:

    (60 seconds / disk rotation speed) × (1000 ms / 1 second) × 1/2

To help you appreciate how all of this terminology fits together, we have provided a typical disk specification as Figure 7.11.

Because the disk directory must be read before every data read or write operation, the location of the directory can have a significant effect on the overall performance of the disk drive. The outermost tracks have the lowest bit density per unit area, so they are less prone to bit errors than the innermost tracks. To ensure the best reliability, the disk directory can be placed at the outermost track, track 0. This means that for every access, the arm has to swing out to track 0 and then back to the required data track. Performance therefore suffers from the wide arc that the access arm must travel on each access. Improvements in recording technology and error-correcting algorithms permit the directory to be placed in the location that gives the best performance: at the middle track. This substantially reduces arm movement, giving the best possible throughput. Many modern systems, but not all, take advantage of center-track directory placement.

Directory placement is one aspect of a disk's logical organization. The logical organization of a disk is a function of the operating system that uses it. A major component of this logical organization is the way in which sectors are mapped. Fixed disks contain so many sectors that keeping track of each one is infeasible. Consider the disk described in our data sheet. Each track contains 132 sectors. There are 3196 tracks per surface and 5 surfaces on the disk, which means there are a total of 2,109,360 sectors on the disk. An allocation table listing the status of each sector (the status being recorded in 1 byte) would therefore consume more than 2 megabytes of disk space. Not only is this a lot of disk space spent on overhead, but reading this data structure would consume an inordinate amount of time whenever we need to check the status of a sector. (This is a frequently performed task.) For this reason, operating systems address sectors in groups, called blocks or clusters, to make file management simpler. The number of sectors per block determines the size of the allocation table. The smaller the size of the allocation block, the less wasted space there is when a file does not fill the entire block; however, smaller block sizes make the allocation tables larger and slower. We will look more closely at the relationship between directories and file allocation structures in our discussion of floppy disks in the next section.
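The latency formula and the sector-count arithmetic above are easy to check. Here is a small, illustrative Python sketch that uses the figures from the specification in Figure 7.11 to compute the average rotational latency, the average access time, and the size of a one-byte-per-sector allocation table.

    # Figures taken from the sample disk specification (Figure 7.11).
    rpm = 4464
    avg_seek_ms = 14.0
    sectors_per_track = 132
    tracks_per_surface = 3196
    surfaces = 5

    # Average rotational latency: half a revolution, expressed in milliseconds.
    avg_latency_ms = (60.0 / rpm) * 1000 / 2          # ~6.72 ms
    avg_access_ms = avg_seek_ms + avg_latency_ms      # ~20.72 ms

    # Total sectors, and the size of an allocation table that keeps
    # one status byte per sector.
    total_sectors = sectors_per_track * tracks_per_surface * surfaces   # 2,109,360
    table_bytes = total_sectors                       # a little over 2 MB

    print(f"Average latency: {avg_latency_ms:.2f} ms")
    print(f"Average access time: {avg_access_ms:.2f} ms")
    print(f"Total sectors: {total_sectors:,}; allocation table ≈ {table_bytes / 2**20:.1f} MiB")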


Page 322:
One final comment about the disk specification shown in Figure 7.11: you can see that it also includes estimates of disk reliability under the heading "Reliability and Maintainability." According to the manufacturer, this particular disk drive is designed to operate for five years and tolerate being stopped and started 50,000 times. Under the same heading, a mean time to failure (MTTF) figure of 300,000 hours is given. Surely this figure cannot be taken to mean that the expected life of the drive is 300,000 hours; that is just over 34 years if the drive runs continuously.

FIGURE 7.11 A typical disk specification, as provided by disk drive manufacturers:

CONFIGURATION
  Formatted capacity: 1340 MB
  Integrated controller: SCSI
  Encoding method: RLL 1,7
  Buffer size: 64 KB
  Platters: 3
  Data surfaces: 5
  Tracks per surface: 3196
  Track density: 5080 tpi
  Recording density: 92.2 Kbpi
  Bytes per sector: 512
  Sectors per track: 132

PERFORMANCE
  Seek times: track to track 4.5 ms; average 14 ms
  Average latency: 6.72 ms
  Rotational speed (+/- 0.20%): 4464 rpm
  Controller overhead: < 200 µs
  Data transfer rate: to/from media 6.0 MB/s; to/from host 11.1 MB/s
  Start time (0 to drive ready): 5 sec

RELIABILITY AND MAINTAINABILITY
  MTTF: 300,000 hours
  Start/stop cycles: 50,000
  Design life: 5 years (minimum)
  Data errors (non-recoverable): < 1 per 10^13 bits read

PHYSICAL
  Height: 12.5 mm; Length: 100 mm; Width: 70 mm; Weight: 170 g
  Temperature: operating 5 °C to 55 °C; non-operating/storage 40 °C to 71 °C
  Relative humidity: 5% to 95%
  Acoustic noise: 33 dBA, idle

POWER REQUIREMENTS (+5 VDC, +5% to -10%)
  Spin-up: 1000 mA, 5000 mW
  Idle: 190 mA, 950 mW
  Standby: 50 mA, 250 mW
  Sleep: 6 mA, 30 mW

The specification states that the drive is designed to last only five years. This apparent anomaly owes its existence to the statistical quality-control methods commonly used in the manufacturing industry. Unless the drive is manufactured under a government contract, the exact method used for calculating the MTTF is at the discretion of the manufacturer. Usually the process involves taking random samples from production lines and running the drives under less-than-ideal conditions for a set number of hours, typically more than 100. The number of failures is then


Page 323:
plotted against probability curves to obtain the resulting MTTF figure. In other words, the Design Life figure is far more believable and understandable.

Flexible disks are organized in much the same way as hard disks. They are often called floppy disks because the magnetic coating of the disk resides on a flexible Mylar substrate. The data densities and rotational speeds (300 or 360 rpm) of floppy disks are limited by the fact that floppies cannot be sealed in the same way as rigid disks. Furthermore, floppy disk read/write heads must touch the magnetic surface of the disk. Friction from the read/write heads causes abrasion of the magnetic coating, with some particles adhering to the read/write heads. Periodically, the heads must be cleaned to remove the particles resulting from this abrasion. An index hole in the disk provides a reference point; the disk drive uses this hole to determine the location of the first sector, which is at the outermost edge of the disk.

Floppy disks are more uniform than fixed disks in their organization and operation. Consider, for example, the 3.5-inch, 1.44 MB DOS/Windows diskette. Each sector of the floppy contains 512 data bytes. There are 18 sectors per track and 80 tracks per side. Sector 0 is the boot sector of the disk. If the disk is bootable, this sector contains information that enables the system to start from the floppy disk instead of its fixed disk.

Immediately following the boot sector are two identical copies of the file allocation table (FAT). On standard 1.44 MB diskettes, each FAT is nine sectors long. Also on 1.44 MB diskettes, a cluster (the smallest addressable unit of storage) consists of one sector. The root directory of the disk occupies 14 sectors starting at sector 19. Each root directory entry occupies 32 bytes, within which it stores a file name, the file attributes (archive, hidden, system, and so forth), the file's timestamp, the file size, and its starting cluster (sector) number. The starting cluster number points to an entry in the FAT that allows us to follow the chain of sectors spanned by the file, if it occupies more than one cluster.

A FAT is a simple table structure that keeps track of each cluster on the disk, with bit patterns indicating whether the cluster is free, reserved, occupied by data, or bad. Because a 1.44 MB disk contains 18 × 80 × 2 = 2880 sectors, each FAT entry needs 12 bits just to point to a cluster. In fact, each FAT entry on a floppy disk is 16 bits wide, so the organization is known as FAT16. If a file spans more than one cluster, the FAT entry for the file's first cluster also contains a pointer to the next FAT entry for the file. If a FAT entry is for the last cluster of the file, the "next FAT entry" pointer contains an end-of-file marker. FAT's linked-list organization permits files to be stored on any set of free sectors, regardless of whether they are contiguous.
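As a quick check of the geometry just described, the sketch below (Python, purely illustrative) multiplies out the capacity of a 1.44 MB floppy and the number of bits a FAT entry needs in order to address all of its clusters.

    import math

    bytes_per_sector = 512
    sectors_per_track = 18
    tracks_per_side = 80
    sides = 2

    total_sectors = sectors_per_track * tracks_per_side * sides      # 2880
    capacity_bytes = total_sectors * bytes_per_sector                # 1,474,560
    print(capacity_bytes, capacity_bytes / 1024, "KB")               # 1440 KB, i.e. "1.44 MB"

    # With one cluster per sector, a FAT entry must be able to name any
    # of the 2880 clusters:
    bits_needed = math.ceil(math.log2(total_sectors))                # 12 bits
    print(bits_needed, "bits needed; FAT16 entries are 16 bits wide")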


Page 324:
To make this point clearer, consider the FAT entries given in Figure 7.12. As stated above, the FAT contains one entry for each cluster on the disk.

    FAT index:     120   121    122    123   124    125   126   127
    FAT contents:   97   124  <EOF>   1258   126  <BAD>   122   577

FIGURE 7.12 A file allocation table

Say that our file occupies four clusters starting with cluster 121. When we read this file, the following events take place:
1. The disk directory is read to find the starting cluster (121). The first cluster is read to retrieve the first portion of the file.
2. To find the rest of the file, the FAT entry in location 121 is read. It gives both the next data cluster of the file and the location of the next FAT entry (124).
3. Cluster 124 and the FAT entry for cluster 124 are read. The FAT entry points to the next data at cluster 126.
4. Data cluster 126 and FAT entry 126 are read. The FAT entry points to the next data at cluster 122.
5. Data cluster 122 and FAT entry 122 are read. Upon seeing the <EOF> marker where the next data cluster would be indicated, the system knows it has obtained the last cluster of the file.

It doesn't take much thought to see the opportunities for performance improvement in the organization of FAT disks. This is why FAT is not used in high-performance, large-scale systems. FAT is still very useful for floppy disks, for two reasons. First, performance is not a major concern for floppies. Second, floppies come in standard capacities, unlike fixed disks, for which capacity increases are practically a daily event. Thus the simple FAT structures are not likely to cause the kinds of problems encountered with FAT16 when fixed disk capacities began to exceed 32 megabytes. With 16-bit cluster pointers, a 33 MB disk must have a cluster size of at least 1 KB. As fixed disk capacities increase, FAT16 clusters get larger, wasting a great deal of disk space when small files do not occupy full clusters. Disks larger than 2 GB require cluster sizes of 64 KB!
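The chain-following procedure in the numbered steps above can be expressed directly in code. The following Python sketch is illustrative only; it uses the FAT entries from Figure 7.12 and a starting cluster of 121, and represents the <EOF> and <BAD> markers with sentinel strings.

    # FAT entries from Figure 7.12 (index -> contents).
    FAT = {120: 97, 121: 124, 122: "<EOF>", 123: 1258,
           124: 126, 125: "<BAD>", 126: 122, 127: 577}

    def follow_chain(fat, start_cluster):
        """Return the list of clusters occupied by a file, in order."""
        chain = [start_cluster]
        entry = fat[start_cluster]
        while entry != "<EOF>":
            chain.append(entry)       # next data cluster of the file
            entry = fat[entry]        # ...and the FAT entry that follows it
        return chain

    print(follow_chain(FAT, 121))     # [121, 124, 126, 122]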


Page 325:
A number of technologies have emerged to provide higher data densities on diskettes. The most popular among these are Zip drives, introduced by Iomega Corporation, and several magneto-optical designs that combine the rewritable properties of magnetic storage with the precise read/write head positioning afforded by laser technology. For purposes of high-volume, long-term data storage, however, diskettes are rapidly being made obsolete by the arrival of inexpensive optical storage methods.

7.5 OPTICAL DISKS
Optical disks provide large storage capacities at a cost that is competitive with tape. Optical disks come in a number of formats, the most popular being the ubiquitous CD-ROM (compact disc read-only memory), which can hold more than 0.5 GB of data. CD-ROMs are read-only media, which makes them ideal for software and data distribution. CD-R (CD recordable), CD-RW (CD rewritable), and WORM (write once, read many) discs are optical storage devices often used for long-term data archiving and high-volume data output. CD-R and WORM offer unlimited quantities of tamper-resistant storage for documents and data. For long-term archival storage of data, some computer systems send output directly to optical storage rather than to paper or microfiche. This is called computer output laser disc (COLD). Robotic storage libraries called optical jukeboxes provide direct access to myriad optical disks. Jukeboxes can store dozens to hundreds of disks, for total capacities of 50 GB to 1,200 GB and beyond. Proponents of optical storage claim that optical disks, unlike magnetic media, can be stored for 100 years without noticeable degradation. (Who could dispute this claim?)

7.5.1 CD-ROM
CD-ROMs are polycarbonate (plastic) discs 120 millimeters (4.8 inches) in diameter, to which a reflective aluminum film is applied. The aluminum film is sealed with a protective acrylic coating to prevent abrasion and corrosion. The aluminum layer reflects light emitted by a green laser diode situated beneath the disc. The reflected light passes through a prism, which diverts the light into a photodetector. The photodetector converts pulses of light into electrical signals, which it sends to decoder electronics in the drive (see Figure 7.13).

Compact discs are written from the center to the outer edge using a single spiral track of bumps in the polycarbonate substrate. These bumps are called pits because they look like pits when viewed from the top surface of the CD. The flat areas between the pits are called lands. Pits measure 0.5 microns wide and are between 0.83 and 3.56 microns long. (The edges of the pits correspond to binary 1s.) The bump formed by the underside of a pit is as high as one-quarter of the wavelength of the light produced by the green laser diode.

[FIGURE 7.13 The inside of a CD-ROM drive: the disc is turned by a spindle motor; a laser mounted on a sled (driven by a sled motor) shines through a lens, and the reflected light passes through a prism to a photodetector.]


Page 326:
This means that the bump interferes with the reflection of the laser beam in such a way that the light reflected from the bump exactly cancels out the incident laser light. The result is pulses of light and dark, which are interpreted by the drive circuitry as binary digits. If you could "unwind" a CD-ROM or audio CD track and lay it on the ground, the string of pits and lands would stretch nearly 5 miles (8 km). (Being only 0.5 microns wide, less than half the thickness of a human hair, it would be barely visible to the naked eye.) Although a CD has only one track, a complete 360-degree revolution of pits and lands is referred to as a track in most optical disc literature. Unlike magnetic storage, the tracks at the center of the disc have the same bit density as the tracks at its outer edge.

CD-ROMs were designed for storing music and other sequential audio signals. Data storage applications were an afterthought, as you can see from the data sector format in Figure 7.15. Data is stored in 2352-byte blocks called sectors, which lie along the length of the track. Sectors are made up of 98 588-bit primitive units called channel frames. As shown in Figure 7.16, channel frames consist of synchronizing information, a header, and 33 17-bit symbols of payload. The 17-bit symbols are encoded using an RLL(2, 10) code called EFM (eight-to-fourteen modulation). The electronics in the drive read and interpret (demodulate) the channel frames to create another data structure called a small frame. Small frames are 33 bytes wide, 24 bytes of which are occupied by user data; the remaining bytes carry subchannel information and error correction. There are eight subchannels, named P, Q, R, S, T, U, V, and W. All except P (which denotes starting and stopping times) and Q (which contains control information) have meaning only for audio applications.

[FIGURE 7.14 The CD track spiral and a close-up of the track, showing pits and lands; pits are 0.5 µm wide, and the track pitch is 1.6 µm.]


Page 327:
[FIGURE 7.15 CD data sector formats. Mode 0: 12-byte sync, 4-byte header, 2336 bytes of zeros. Mode 1: 12-byte sync, 4-byte header, 2048 bytes of user data, 4-byte CRC, 8 bytes of zeros, 276 bytes of Reed-Solomon error detection and correction. Mode 2: 12-byte sync, 4-byte header, 2336 bytes of user data. The 4-byte header holds the minutes, seconds, and frames of the sector address plus a 1-byte mode field.]

Most compact discs operate at constant linear velocity (CLV), which means that the rate at which sectors pass over the laser remains constant regardless of whether those sectors are at the beginning or at the end of the disc. The constant velocity is achieved by spinning the disc more slowly when accessing the outermost tracks than when accessing the innermost ones. A sector is addressed by the number of minutes and seconds of track that lie between it and the beginning (the center) of the disc. These "minutes and seconds" are calibrated under the assumption that the CD player processes 75 sectors per second. Computer CD-ROM drives are much faster than that, with speeds up to 44 times (44X) that of audio CDs (with faster speeds sure to follow). To locate a particular sector, the sled moves perpendicular to the disc track, taking its best guess as to where the sector may be. After an arbitrary sector is read, the head follows the track to the desired sector.

Sectors can have one of three different formats, depending on which mode is used to record the data. There are three different modes. Modes 0 and 2, intended for music recording, have no error correction capabilities. Mode 1, intended for data recording, sports two levels of error detection and correction. These formats are shown in Figure 7.15. The total capacity of a CD recorded in Mode 1 is 650 MB. Modes 0 and 2 can hold 742 MB, but they cannot be used reliably for data recording.

Audio CDs have their music recorded in sessions, which, when viewed from below, give the appearance of broad concentric rings.
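Because sectors are addressed by minutes and seconds of track at 75 sectors per second, converting a time address into a sector index is a one-line calculation. The sketch below is illustrative only (the helper name and sample address are made up), and it ignores any lead-in offset a real drive would apply.

    SECTORS_PER_SECOND = 75   # audio-CD playback rate assumed by the addressing scheme

    def sector_index(minutes, seconds, frames=0):
        """Sector number for a 'minutes:seconds(:frames)' track address."""
        return (minutes * 60 + seconds) * SECTORS_PER_SECOND + frames

    # Example: a sector 2 minutes and 30 seconds into the track.
    print(sector_index(2, 30))        # 11250

    # A 74-minute disc at 75 sectors per second holds:
    print(sector_index(74, 0))        # 333000 sectors (x 2048 bytes ≈ 650 MB in Mode 1)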


Page 328:
[FIGURE 7.16 Physical and logical CD formats: a 588-bit channel frame (27 bits of sync plus 33 17-bit symbols) is demodulated into a 33-byte small frame consisting of 24 payload bytes, 1 subchannel byte (P through W), and 8 bytes of Cross-Interleaved Reed-Solomon Code (CIRC). Ninety-eight small frames make up one sector, and their 98 subchannel bytes form eight 98-bit subchannels; subchannel Q carries sync, control, and mode bits, 72 bits of Q data, and a 16-bit CRC.]

When CDs began to be used for data storage, the idea of a music "recording session" was extended (without modification) to include data recording sessions. There can be as many as 99 sessions on a CD. Sessions are delimited by a 4500-sector (1-minute) lead-in that contains the index of the data contained within the session, and by a 6750- or 2250-sector lead-out (or runout) at the end. (The first session on the disc has 6750 lead-out sectors. Subsequent sessions have the shorter lead-out.) On CD-ROMs, the lead-out sectors are used to store directory information pertaining to the data contained within the session.

7.5.2 DVD
Digital versatile discs, or DVDs (formerly called digital video discs), can be thought of as quad-density CDs. DVDs rotate at about three times the speed of CDs.


Page 329:
DVD pits are approximately half the size of CD pits (0.4 to 2.13 microns), and the track pitch is 0.74 microns. Like CDs, they come in recordable and non-recordable varieties. Unlike CDs, DVDs can be single-sided or double-sided, and single-layer or double-layer. Single-layer and double-layer 120-millimeter DVDs can accommodate 4.7 GB and 8.54 GB of data, respectively. The 2048-byte DVD sector supports the same three data modes as CDs. With their higher data density and improved access time, one can expect DVDs eventually to replace CDs for long-term data storage and distribution.

7.5.3 Optical Disc Recording Methods
Several technologies are used to enable writing to CDs and DVDs. The least expensive, and most pervasive, method uses a heat-sensitive dye. The dye is sandwiched between the polycarbonate substrate and the reflective coating of the CD. When struck by light emitted from a writing laser, this dye creates a pit in the polycarbonate substrate. The pit affects the optical properties of the reflective layer.

Rewritable optical media, such as CD-RW, replace the dye and reflective coating layers of a CD-R disc with a metallic alloy that includes such exotic elements as indium, tellurium, antimony, and silver. In its unaltered state, this metallic coating is reflective to the laser light. When heated by a laser to about 500°C, the alloy undergoes a molecular change that makes it less reflective. (Chemists and physicists call this a phase change.) The coating reverts to its original reflective state when heated to only 200°C, allowing the data to be changed as often as desired. (Industry experts have cautioned that phase-change CD recording may be reliable for "only" 1,000 cycles.)

Lower-power lasers are used to read the data. Higher-power lasers permit different and more durable recording methods. Three of these methods are:
• Ablative: A high-powered laser melts a pit in a reflective metal layer sandwiched between the protective layers of the disc.
• Bimetallic alloy: Two metallic layers are sandwiched between the protective coatings on the disc surfaces. Laser light fuses the two metallic layers together, causing a reflectance change in the lower metallic layer. Manufacturers of bimetallic alloy WORM discs claim that this medium will maintain its integrity for 100 years.
• Bubble-forming (blistering): A single layer of thermally sensitive material is sandwiched between two plastic layers. When hit by high-powered laser light, bubbles form in the material, causing a change in reflectance.

Multisession discs written using these methods may not be readable in some CD-ROM drives. The incompatibility arises from the expectation that a CD-ROM would be recorded (or pressed) in a single session. CD-Rs and CD-RWs, on the other hand, are most useful when they can be written to incrementally, like floppy disks. The original CD-ROM specification, ISO 9660, assumed single-session recording and makes no provision for more than 99 sessions on a disc. Aware that the restrictions of ISO 9660 were


Page 330:
inhibiting wider use of their products, a group of leading CD-R/CD-RW manufacturers formed a consortium to address the problem. The result of their efforts is the Universal Disc Format (UDF) Specification, which allows an unlimited number of recording sessions per disc. The key to this new format is the idea of replacing the table of contents associated with each session by a floating table of contents. This floating table of contents, called a virtual allocation table (VAT), is written to the lead-out following the last sector of user data written to the disc. As data is added to what was recorded in a previous session, the VAT is rewritten at the end of the new data. This process continues until the VAT reaches the last usable sectors on the disc.

7.6 MAGNETIC TAPE

Magnetic tape is the oldest and cheapest of all mass storage devices. First-generation magnetic tapes were made of the same material used by analog tape recorders: a cellulose acetate film one-half inch (1.25 cm) wide, coated on one side with a magnetic oxide. Twelve hundred feet of this material was wound onto a reel, which could then be hand-threaded onto a tape drive. These tape drives were about the size of a small refrigerator. Early tapes had capacities under 11 MB and required nearly half an hour to read or write the entire reel.

Data was written to the tape one byte at a time, creating one track for each bit. An additional track was added for parity, making the tape nine tracks wide, as shown in Figure 7.17. Nine-track tape used phase modulation coding with odd parity. The parity was odd to ensure that at least one "opposite" flux transition took place during the long runs of zeros (nulls) characteristic of database records.

FIGURE 7.17 A nine-track tape format

Tape technology has steadily packed more bytes onto each linear inch of tape. Higher-density tapes are not only cheaper to buy and store, but they


Page 331:
also allow for faster backups. This means that if a system must be taken offline while its files are being copied, downtime is reduced. Further savings can be realized when the data is compressed before being written to the tape. (See Section 7.8.)

The price paid for all of these innovative tape technologies is that a plethora of standards and proprietary techniques have emerged. Cartridges of various sizes and capacities have replaced nine-track open-reel tapes. Thin-film coatings similar to those found on digital recording media have replaced oxide coatings. Tapes support various track densities and employ serpentine or helical scan recording methods.

Serpentine recording methods place bits on the tape in series. Instead of the bytes being perpendicular to the edges of the tape, as in the nine-track format, they are written "lengthwise," with each byte aligning parallel to the edge of the tape. A stream of data is written along the length of the tape until the end is reached; the tape then reverses and the next track is written beneath the first one (see Figure 7.18). This process continues until the capacity of the tape is reached. Digital linear tape (DLT) and quarter-inch cartridge (QIC) systems use serpentine recording with 50 or more tracks per tape.

Digital audio tape (DAT) and 8mm tape systems use helical scan recording. In serpentine systems, the tape passes straight across a fixed magnetic head, much as in an audio tape recorder. DAT systems instead pass the tape over a tilted rotating drum (capstan), which has two read heads and two write heads, as shown in Figure 7.19. (During write operations, the read heads verify the integrity of the data shortly after it has been written.) The capstan spins at 2,000 RPM in the direction opposite to the tape motion. (This arrangement is similar to the mechanism used by a VCR.) The two sets of read/write heads write data at 40-degree angles to each other. The data written by the two heads overlaps, thus increasing the recording density. Helical scan systems tend to be slower, and their tapes are subject to more wear, than serpentine systems with their simpler tape paths.

FIGURE 7.18 Serpentine recording


Page 332:
FIGURE 7.19 Helical scan recording: a. The read/write heads on the capstan; b. Pattern of data written on the tape

Tape storage has been a staple of mainframe environments from the beginning. Tapes appear to offer "infinite" storage at bargain prices. They remain the primary means of making file and system backups on large systems. Although the medium itself is inexpensive, cataloging and handling costs can be substantial, especially when a tape library consists of thousands of tape volumes. Recognizing this problem, several vendors have produced a variety of robotic devices that can catalog, fetch, and load tapes in seconds. Robotic tape libraries, also known as tape silos, can be found in many large data centers. The largest robotic tape library systems have capacities in the hundreds of terabytes and can load a cartridge at user request in less than half a minute.

7.7 RAID

In the 30 years following the introduction of IBM's RAMAC computer, only the largest computers were equipped with disk storage systems. Early disk drives were enormously costly and occupied a large amount of floor space in proportion to their storage capacity. They also required a strictly controlled environment: too much heat would damage control circuitry, and low humidity caused static buildup that could scramble the magnetic flux polarizations on disk surfaces. Head crashes, and other irrecoverable failures, took an incalculable toll on business, scientific, and academic productivity. A head crash toward the end of the business day meant that all data entry had to be redone to the point of the last backup, usually the night before.

Clearly, this situation was unacceptable and promised only to get worse as everyone became increasingly reliant on electronic data storage. A permanent remedy was a long time coming. After all, weren't the drives already as reliable as we could make them? It turns out that making the drives more reliable was only part of the solution.


Page 333:
In their 1988 paper "A Case for Redundant Arrays of Inexpensive Disks," David Patterson, Garth Gibson, and Randy Katz of the University of California at Berkeley coined the acronym RAID. They showed how mainframe disk systems could achieve reliability and performance improvements by using a number of small "inexpensive" disks (such as those used by microcomputers) in place of the single large expensive disks (SLEDs) typical of large systems. Because "inexpensive" is relative and can be misleading, the accepted meaning of the acronym is now Redundant Array of Independent Disks.

In their paper, Patterson, Gibson, and Katz defined five types (called levels) of RAID, each having different performance and reliability characteristics. These original levels were numbered 1 through 5. Definitions for RAID levels 0 and 6 were recognized later. Various vendors have invented other levels, which may in the future also become standards; these are usually combinations of the generally accepted RAID levels. In this section, we briefly examine each of the seven RAID levels as well as a few hybrid systems that combine different RAID levels to meet particular performance or reliability objectives.

7.7.1 RAID Level 0

RAID Level 0, or RAID-0, places blocks of data across multiple disk surfaces so that one record occupies sectors on several disk surfaces, as shown in Figure 7.20. This method is also called drive spanning, block interleave data striping, or disk striping. (Striping is simply the segmentation of logically sequential data so that the segments are written across multiple physical devices. These segments can be as small as a single bit, as in RAID-0, or blocks of a specified size.)

Because it offers no redundancy, of all RAID configurations RAID-0 offers the best performance, particularly if separate controllers and caches are used for each disk. RAID-0 is also very inexpensive. The problem with RAID-0 lies in the fact that the overall reliability of the system is only a fraction of what would be expected with a single disk.

FIGURE 7.20 A record written using RAID-0: block interleave data striping with no redundancy
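As a rough illustration of what the figure depicts, the following Python sketch deals a record out in five-character strips, round-robin, across a set of drives. The record and the five-character strip size follow Figure 7.20; the four-drive layout and the function name are arbitrary choices made for this sketch, not details given in the text.

    def stripe(record: str, num_drives: int, strip_size: int = 5):
        """Split a record into strips and deal them round-robin across the drives."""
        drives = [[] for _ in range(num_drives)]
        strips = [record[i:i + strip_size] for i in range(0, len(record), strip_size)]
        for n, s in enumerate(strips):
            drives[n % num_drives].append(s)   # strip n goes to drive n mod num_drives
        return drives

    record = ("WEATHER REPORT FOR 14 NOVEMBER: PARTLY CLOUDY "
              "WITH PERIODS OF RAIN. SUNRISE 0608.")
    for d, strips in enumerate(stripe(record, num_drives=4)):
        print(f"Drive {d}: {strips}")

Reading the record back is simply a matter of cycling over the drives in the same order, which is why separate controllers and caches per drive pay off so handsomely here.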


Page 334:
Specifically, if the array consists of five drives, each with a design life of 50,000 hours (about six years), the entire system has a projected life of 50,000 / 5 = 10,000 hours (about 14 months). As the number of disks increases, the probability of failure rises until it approaches a certainty. RAID-0 offers no fault tolerance, because there is no redundancy. Therefore, the only advantage offered by RAID-0 is in performance; its lack of reliability is downright scary. RAID-0 is recommended for non-critical data (or data that changes infrequently and is backed up regularly) that requires high-speed, low-cost reads and writes, and it is used in applications such as video or image editing.

7.7.2 RAID Level 1

RAID Level 1, or RAID-1 (also known as disk mirroring), gives the best failure protection of all RAID schemes. Each time data is written, it is duplicated onto a second set of drives called a mirror set, or shadow set (as shown in Figure 7.21). This arrangement offers acceptable performance, particularly when the mirror drives are synchronized 180° out of rotation with the primary drives. Although performance on writes is slower than that of RAID-0 (because the data has to be written twice), reads are much faster, because the system can read from whichever disk arm happens to be closer to the target sector. This cuts rotational latency in half on reads. RAID-1 is best suited for transaction-oriented, high-availability environments and other applications requiring high fault tolerance, such as accounting or payroll.

FIGURE 7.21 RAID-1: disk mirroring

7.7.3 RAID Level 2

The main problem with RAID-1 is that it is expensive: you need twice as many drives to store a given amount of data. A better way might be to devote one or more drives to storing information about the data on the other drives. RAID-2 defines one such method. RAID-2 takes the idea of data striping to the extreme. Instead of writing data in blocks of arbitrary size, RAID-2 writes one bit per strip (as shown in Figure 7.22). This requires a minimum of eight surfaces just to accommodate the data. Additional drives are used for the error-correcting information generated by a Hamming code.


Page 335:
FIGURE 7.22 RAID-2: bit interleave data striping with a Hamming code

The number of Hamming code drives needed to correct single-bit errors is proportional to the log of the number of data drives they protect. If any one of the drives in the array fails, the Hamming code words can be used to reconstruct the failed drive. (Of course, the Hamming drive can be reconstructed using the data drives.)

Because one bit is written per drive, the entire RAID-2 disk set behaves as though it were one large data disk. The total amount of storage available is the sum of the storage capacities of the data drives. All of the drives, including the Hamming drives, must be exactly synchronized in their rotation; otherwise the data becomes scrambled and the Hamming drives do no good. Hamming code generation is time-consuming; thus RAID-2 is too slow for most commercial implementations. In fact, most of today's hard drives have CRC error correction built in. RAID-2, however, forms the theoretical bridge between RAID-1 and RAID-3, both of which are used in the real world.

7.7.4 RAID Level 3

Like RAID-2, RAID-3 stripes (interleaves) data one bit at a time across all of the data drives. Unlike RAID-2, however, RAID-3 uses only one drive to hold a simple parity bit, as shown in Figure 7.23. The parity calculation can be done quickly in hardware using an exclusive OR (XOR) operation on each data bit (shown as bn) as follows (for even parity):

Parity = b0 XOR b1 XOR b2 XOR b3 XOR b4 XOR b5 XOR b6 XOR b7

Equivalently,

Parity = b0 + b1 + b2 + b3 + b4 + b5 + b6 + b7 (mod 2).
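The parity equation maps directly onto a few lines of Python. In this sketch the byte values spell "WEATHER " in ASCII, echoing Figure 7.23; the function name is ours. The parity byte is simply the XOR of one byte taken from each data drive, and XORing the surviving bytes with the parity byte recovers a lost one — the rebuild operation described next.

    from functools import reduce

    def parity(data_bytes):
        """Even parity: XOR of corresponding bytes from each data drive."""
        return reduce(lambda a, b: a ^ b, data_bytes, 0)

    # One byte position taken from each of eight hypothetical data drives ("WEATHER ").
    data = [0b01010111, 0b01000101, 0b01000001, 0b01010100,
            0b01001000, 0b01000101, 0b01010010, 0b00100000]

    p = parity(data)                      # parity byte stored on the parity drive
    lost = data[5]                        # pretend drive 5 fails
    survivors = data[:5] + data[6:]
    rebuilt = parity(survivors) ^ p       # XOR of survivors and parity restores it
    assert rebuilt == lost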


Page 336:
FIGURE 7.23 RAID-3: bit interleave data striping with a parity disk

A failed drive can be reconstructed using the same calculation. For example, assume that drive number 6 fails and is replaced. The data on the other seven data drives and the parity drive are used to reconstruct it as follows:

b6 = b0 XOR b1 XOR b2 XOR b3 XOR b4 XOR b5 XOR Parity XOR b7

Although RAID-3 requires the same drive synchronization as RAID-2, it is more economical than either RAID-1 or RAID-2 because it uses only one drive for data protection. RAID-3 has been used in some commercial systems over the years, but it is not well suited for transaction-oriented applications. RAID-3 is most useful in environments where large blocks of data are read or written, such as image or video processing.

7.7.5 RAID Level 4

RAID-4 is another "theoretical" RAID level (like RAID-2). RAID-4 would offer poor performance if it were implemented as Patterson et al. describe. A RAID-4 array, like RAID-3, consists of a group of data disks and a parity disk. Instead of writing data one bit at a time across all of the drives, RAID-4 writes data in strips of uniform size, creating a stripe across all of the drives, as described in RAID-0. The bits in the data strips are XORed together to create the parity strip.

You could think of RAID-4 as being RAID-0 with parity. However, adding parity results in a substantial performance penalty owing to contention with the parity disk. For example, suppose we want to write to Strip 3 of a stripe spanning five drives (four data, one parity), as shown in Figure 7.24. First we must read the data currently occupying Strip 3 as well as the parity strip. The old data is XORed with the new data and the old parity to give the new parity. Then the data strip is written along with the updated parity.
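A sketch of that read-modify-write, with invented drive contents and names, is given below; the key line is new parity = old data XOR new data XOR old parity.

    def small_write(data, parity, strip_index, new_value):
        """RAID-4 style read-modify-write: return the updated data list and parity."""
        old_value = data[strip_index]                # read the old data strip
        new_parity = old_value ^ new_value ^ parity  # read old parity, compute new parity
        data = data[:strip_index] + [new_value] + data[strip_index + 1:]
        return data, new_parity

    # Four hypothetical one-byte data strips and their parity strip.
    data = [0x11, 0x22, 0x33, 0x44]
    parity = 0x11 ^ 0x22 ^ 0x33 ^ 0x44

    data, parity = small_write(data, parity, strip_index=2, new_value=0x5A)
    assert parity == data[0] ^ data[1] ^ data[2] ^ data[3]   # parity is still consistent

Notice that only two drives are read and two are written; the other data drives are never touched, which is exactly why all of those writes end up queuing behind the single parity disk.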


Page 337:
FIGURE 7.24 RAID-4: block interleave data striping with one parity disk (Parity 1–4 = Strip 1 XOR Strip 2 XOR Strip 3 XOR Strip 4)

Imagine what happens if there are write requests waiting while we are updating the bits in the parity block, say, one write request for Strip 1 and one for Strip 4. If we were using RAID-0 or RAID-1, both of these pending requests could have been serviced concurrently with the write to Strip 3. Thus, the parity drive becomes a bottleneck, robbing the system of all the potential performance gains offered by multiple-drive systems.

Some writers have suggested that RAID-4 performance could be improved if the size of the strip were optimized. Again, this might be fine for applications (such as voice or video processing) where the data records are of uniform size. However, most database applications involve records of widely varying sizes, making it impossible to find an "optimal" size for any substantial number of records in the database. Because of its expected poor performance, there are no commercial implementations of RAID-4.

7.7.6 RAID Level 5

Most people would agree that RAID-4 offers adequate protection against single-drive failure. The bottleneck caused by the parity drive, however, makes RAID-4 unsuitable for environments that require high transaction throughput. Certainly, throughput would be better if we could effect some sort of load balancing, writing parity to several disks instead of just one. This is what RAID-5 is all about. RAID-5 is RAID-4 with the parity disks spread throughout the entire array, as shown in Figure 7.25.

Because some requests can be serviced concurrently, RAID-5 provides the best read throughput of all of the parity models and gives acceptable throughput on write operations. For example, in Figure 7.25 the array could service a write to Drive 4, Strip 6, concurrently with a write to Drive 1, Strip 7, because these requests involve different sets of disk arms for both parity and data. However, RAID-5 requires the most complex disk controller of all of the RAID levels.

Compared with other RAID systems, RAID-5 offers the best protection for the least cost. As such, it has been a commercial success, having the largest installed base of any of the RAID systems. Recommended applications include file and application servers, email and news servers, database servers, and Web servers.
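The parity rotation itself can be captured in one line. The sketch below reproduces the placement pattern of Figure 7.25, in which the parity strip moves one drive to the left on each successive stripe; the function name and the particular rotation rule are our own choices, and real controllers may use any of several equivalent schemes.

    def raid5_layout(stripe: int, num_drives: int):
        """Return (parity_drive, data_drives) for one stripe: parity steps one
        drive to the left on each successive stripe, as in Figure 7.25."""
        parity_drive = (num_drives - 1 - stripe) % num_drives
        data_drives = [d for d in range(num_drives) if d != parity_drive]
        return parity_drive, data_drives

    # With four drives, no single drive ends up serving every parity write.
    for stripe in range(4):
        p, d = raid5_layout(stripe, num_drives=4)
        print(f"Stripe {stripe}: parity on drive {p}, data on drives {d}")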


Page 338:
FIGURE 7.25 RAID-5: block interleave data striping with distributed parity (Parity 1–3 = Strip 1 XOR Strip 2 XOR Strip 3)

7.7.7 RAID Level 6

Most of the RAID systems just discussed can tolerate at most one drive failure at a time. The trouble is that drive failures in large systems tend to come in clusters. There are two reasons for this. First, disk drives manufactured at approximately the same time reach the end of their expected useful lives at approximately the same time. So if you are told that your new disk drives have a useful life of about six years, you can expect problems in year six, possibly concurrent failures.

Second, drive failures are often caused by a catastrophic event such as a power surge. A power surge hits all of the drives at the same time, the weakest ones failing first, followed closely by the next weakest, and so on. Sequential drive failures like these can extend over days or weeks. If they happen to occur within the mean time to repair (MTTR), including the time for the service call and the repair itself, a second drive could fail before the first one is replaced, rendering the whole array unusable.

Systems that require high availability must be able to tolerate more than one concurrent drive failure, particularly if the MTTR is a large number. If an array can be designed to survive the concurrent failure of two drives, we effectively double the MTTR. RAID-1 offers this kind of survivability; in fact, as long as a disk and its mirror aren't both lost, a RAID-1 array can survive the loss of half of its disks.

RAID-6 provides an economical answer to the problem of multiple drive failures. It does this by using two sets of error-correction strips for every rank (or horizontal row) of drives. A second level of protection is added with the use of Reed-Solomon error-correcting codes in addition to parity. Having two error-detecting strips per stripe does increase storage costs: if unprotected data could be stored on N drives, adding the protection of RAID-6 requires N + 2 drives. Because of the two-dimensional parity, RAID-6 offers very poor write performance. A RAID-6 configuration is shown in Figure 7.26.

Until recently, there were no commercial implementations of RAID-6. There are two reasons for this. First, there is a considerable overhead penalty involved in generating the Reed-Solomon code. Second, it takes twice as many read/write operations to


Page 339:
FIGURE 7.26 RAID-6: block interleave data striping with dual error protection (P = parity, Q = Reed-Solomon)

update the error-correcting codes residing on the disk. IBM was the first (and, at this writing, the only) company to bring RAID-6 to market with its RAMAC RVA 2 Turbo disk array. The RVA 2 Turbo array eliminates the RAID-6 write penalty by keeping running "logs" of disk strips within cache memory on the disk controller. The log data permits the array to handle data one track at a time, calculating all parity and error codes for the entire track before it is written to the disk. Data is never rewritten to the same track that it occupied prior to the update. Instead, the formerly occupied strip is marked as free space once the updated strip has been written elsewhere.

In some cases, it makes sense to balance high availability with economics. For example, we might want to use RAID-1 to protect the drives that contain our operating system files, whereas RAID-5 is sufficient for data files. RAID-0 would be good enough for "scratch" files used only temporarily during long processing runs, and it could potentially reduce the execution time of those runs owing to the faster disk access.

Sometimes RAID schemes can be combined to form a "new" kind of RAID. RAID-10 is one such system; it combines the striping of RAID-0 with the mirroring of RAID-1. Although extremely expensive, RAID-10 gives the best possible read performance while providing the best possible availability. Despite its cost, a few RAID-10 systems have been brought to market with some success.

However, many people have a natural tendency to think that a higher number of something always indicates something better than a lower number of something. For this reason, an industry association called the RAID Advisory Board (RAB) has recently reorganized and renamed the RAID systems just presented.


Page 340:
We have elected to retain the "Berkeley" nomenclature in this book because it is more widely recognized.

7.8 DATA COMPRESSION

Huge new drives fill up quickly with all the things we wish we could have put on the old drives. Before long, we find ourselves shopping for another set of new drives. Few individuals or enterprises have access to unlimited resources, so we must make optimal use of what we have. One way to do this is to make our data more compact by compressing it before we write it to disk. (In fact, we could even use some kind of compression to make room for a parity or mirror set, adding RAID to our system for "free"!)

Data compression can do more than just save space. It can also save time and help to optimize resources. For example, if compression and decompression are done in the I/O processor, less time is required to move the data to and from the storage subsystem, freeing the I/O bus for other work. The advantages of data compression when sending information over communication lines are obvious: less time to transmit and less storage on the host. Although a detailed study is beyond the scope of this book (see the references section for some resources), you should understand a few basic data compression concepts to complete your understanding of I/O and data storage.

Whether data compression is implemented in software or hardware, we are usually concerned with how fast a compression algorithm executes and with how much smaller a file becomes after the compression algorithm is applied. The compression factor (sometimes called the compression ratio) is a statistic that can be calculated quickly and is understandable by virtually anyone. There are several different methods used to compute a compression factor. We will use the following:

Compression factor = (1 − compressed size / uncompressed size) × 100%

For example, suppose we start with a 100KB file and apply some kind of compression to it. After the algorithm terminates, the file is 40KB in size. We can say that the algorithm achieved a compression factor of (1 − 40/100) × 100% = 60% for this particular file. An exhaustive statistical study would have to be undertaken before we could infer that the algorithm would always produce 60% file compression. We can, however, determine an expected compression ratio for particular messages or message types once we have a little theoretical background.

The study of data compression techniques is a branch of a larger field of study called information theory. Information theory concerns itself with the way in which information is stored and coded. It was born in the late 1940s through the work of Claude Shannon, a scientist at Bell Laboratories. Shannon established


Page 341:
a series of information metrics, the most fundamental of which is entropy. Entropy is the measure of the information content in a message. Messages with higher entropy carry more information than messages with lower entropy. This definition implies that a message with lower information content would compress to a smaller size than a message with higher information content.

Determining the entropy of a message requires that we first determine the frequency of each symbol within the message. It is easiest to think of symbol frequencies in terms of probabilities. For example, in the famous program output statement HELLO WORLD!, the probability of the letter L appearing is 3/12, or 1/4. In symbols, we have P(L) = 0.25. To map this probability to bits in a code word, we use the base-2 logarithm of the probability. Specifically, the minimum number of bits required to encode the letter L is −log2 P(L), or 2.

The entropy of a message is the weighted average of the number of bits required to encode each of the symbols in the message. If a symbol x appears in a message with probability P(x), then the entropy, H, of the symbol x is:

H = −P(x) × log2 P(x)

The average entropy over the entire message is the sum of the weighted probabilities of all n symbols of the message:

Σ (i = 1 to n) −P(xi) × log2 P(xi)

Entropy establishes a lower bound on the number of bits required to encode a message. Specifically, if we multiply the number of characters in the entire message by the weighted entropy, we get the theoretical minimum number of bits needed to encode the message without loss of information. Bits in addition to this lower bound add no information; they are therefore redundant. The objective of data compression is to remove redundancy while preserving information content. We can quantify the average redundancy for each character contained in a coded message of length n containing code words of length li by the formula:

Σ (i = 1 to n) P(xi) × li − Σ (i = 1 to n) −P(xi) × log2 P(xi)

This formula is most useful when we are comparing the effectiveness of one coding scheme with another for a given message. The code producing the message with the least amount of redundancy is the better code in terms of data compression. (Of course, we must also consider such things as speed and computational complexity, as well as the specifics of the application, before we can say that one method is better than another.)
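As a quick check of these formulas (the names in this sketch are ours), the average entropy of HELLO WORLD! can be computed directly; it comes out to roughly 3.02 bits per symbol, the figure used in the next paragraph.

    import math
    from collections import Counter

    def average_entropy(message: str) -> float:
        """Weighted average of -P(x) * log2 P(x) over the symbols of the message."""
        counts = Counter(message)
        n = len(message)
        return sum(-(c / n) * math.log2(c / n) for c in counts.values())

    msg = "HELLO WORLD!"
    h = average_entropy(msg)
    print(f"Entropy: about {h:.3f} bits per symbol")       # about 3.022
    print(f"Lower bound: about {h * len(msg):.1f} bits")    # about 36 bits in all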


Page 342:
Finding the entropy and redundancy for a text message is a straightforward application of the formula above. With a fixed-length code, such as ASCII or EBCDIC, the left-hand sum above is exactly the length of the code, usually 8 bits. In our HELLO WORLD! example (using the right-hand sum), we find the average symbol entropy is about 3.022. This means that if we reached the theoretical entropy lower bound, we could encode the entire message using only 3.022 bits per character × 12 characters = 36.26, or 37, bits. The 8-bit ASCII version of the message therefore carries 96 − 37, or 59, redundant bits.

Statistical coding assigns codes to symbols according to these probabilities in order to produce the compressed message. In general, any statistical compression application is a relatively slow and I/O-intensive process, requiring two passes to read the file before it is compressed and written. Two passes over the file are needed because the first pass is used to tally the number of occurrences of each symbol. These counts are used to calculate the probabilities of each distinct symbol in the source message. Values are assigned to each symbol in the source message according to the calculated probabilities. The newly assigned values are then written to a file along with the information required to decode the file. If the encoded file, together with the table of values needed to decode it, is smaller than the original file, we say that data compression has taken place.

Huffman coding and arithmetic coding are two fundamental statistical data compression methods. Variants of these methods can be found in a large number of popular data compression programs. We examine each of these in the following sections, starting with Huffman coding.

Huffman Coding

Suppose that, after determining the probabilities of each of the symbols in the source message, we create a variable-length code that assigns the most frequently used symbols to the shortest code words. If the code words are shorter than the information words, it stands to reason that the resulting compressed message will be shorter as well. David A. Huffman formalized this idea in a paper published in 1952. Interestingly, one form of Huffman coding, Morse code, has been around since the early 1800s.

As you can see in Figure 7.27, the shorter codes represent the letters used most frequently in the English language. These frequencies clearly cannot apply to every message. A notable exception would be a telegram from Uncle Zachary, vacationing in Zanzibar, asking for a few quid so he could quaff a quart of quinine! Thus, the most accurate statistical model for each message would be individualized for that message. To accurately assign code words, Huffman's algorithm builds a binary tree using the probabilities of the symbols found in the source message.
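A compact sketch of that tree-building procedure, using Python's heapq module, is given below. It is illustrative only: the helper names are invented, and because ties among equal frequencies are broken arbitrarily, the particular code assignments depend on the tie-breaking even though any valid assignment gives the same total encoded length.

    import heapq
    from collections import Counter

    def huffman_codes(message: str) -> dict:
        """Build a Huffman code by repeatedly merging the two lowest-frequency trees."""
        heap = [(count, i, {symbol: ""})
                for i, (symbol, count) in enumerate(Counter(message).items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)     # the two smallest-frequency trees
            f2, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}        # left branch labeled 0
            merged.update({s: "1" + c for s, c in right.items()})  # right branch labeled 1
            heapq.heappush(heap, (f1 + f2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    codes = huffman_codes("HELLO WORLD!")
    encoded = "".join(codes[ch] for ch in "HELLO WORLD!")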


Page 343:
FIGURE 7.27 The international Morse code

TABLE 7.1 Letter frequencies: A=5, C=1, D=1, E=10, F=1, G=10, H=8, I=7, L=5, M=1, N=3, O=4, P=8, R=4, S=3, T=10, U=2, Y=6, <ws>=21

A traversal of the tree gives the bit pattern assignments for each symbol in the message. We illustrate the process using a simple nursery rhyme. For clarity, we render the rhyme in all capitals with no punctuation, as follows:

HIGGLETY PIGGLETY POP
THE DOG HAS EATEN THE MOP
THE PIGS IN A HURRY THE CATS IN A FLURRY
HIGGLETY PIGGLETY POP

We start by tabulating all occurrences of each letter in the rhyme. We use the abbreviation <ws> (white space) for the space characters between each word as well as the newline characters (see Table 7.1). Each letter and its frequency are paired to form a node of a tree. The collection of these trees (a forest) is placed in a line ordered by frequency, like this:


Page 344:
C=1, D=1, F=1, M=1, U=2, N=3, S=3, O=4, R=4, A=5, L=5, Y=6, I=7, H=8, P=8, E=10, G=10, T=10, <ws>=21

We begin building the binary tree by joining the two nodes with the lowest frequencies. Because we have a four-way tie for the smallest, we arbitrarily select the two leftmost nodes, C and D. The sum of the combined frequencies of these two nodes is 2. We create a parent node labeled with this sum and place it back into the forest in the location determined by the label on the parent node, as shown:

F=1, M=1, (C,D)=2, U=2, N=3, S=3, O=4, R=4, A=5, L=5, Y=6, I=7, H=8, P=8, E=10, G=10, T=10, <ws>=21

We repeat the process for the nodes that now have the lowest frequencies, F and M:

(F,M)=2, (C,D)=2, U=2, N=3, S=3, O=4, R=4, A=5, L=5, Y=6, I=7, H=8, P=8, E=10, G=10, T=10, <ws>=21

The two smallest nodes are now the parents of F, M and of C, D. Together they sum to a frequency of 4, which belongs in the fourth position from the left:

U=2, N=3, S=3, (F,M,C,D)=4, O=4, R=4, A=5, L=5, Y=6, I=7, H=8, P=8, E=10, G=10, T=10, <ws>=21

The leftmost nodes now add up to 5. Their parent is moved to its new position in the forest, as shown:

S=3, (F,M,C,D)=4, O=4, R=4, (U,N)=5, A=5, L=5, Y=6, I=7, H=8, P=8, E=10, G=10, T=10, <ws>=21


Page 345:
The two smallest nodes now sum to 7. We create a parent node and move the subtree to the middle of the forest, next to the other node with a frequency of 7:

O=4, R=4, (U,N)=5, A=5, L=5, Y=6, (S,F,M,C,D)=7, I=7, H=8, P=8, E=10, G=10, T=10, <ws>=21

The leftmost pair combines to create a parent node with a frequency of 8. It is placed back into the forest, as shown:

(U,N)=5, A=5, L=5, Y=6, (S,F,M,C,D)=7, I=7, (O,R)=8, H=8, P=8, E=10, G=10, T=10, <ws>=21

After several more iterations, the completed tree looks like this:

[completed Huffman tree: root frequency 110, splitting into subtrees of frequency 42 and 68, with the letters and their frequencies at the leaves]

This tree establishes the framework for assigning a Huffman value to each symbol in the message. We start by labeling each right branch with a binary 1 and


Page 346:
each left branch with a binary 0. The result of this step is shown below. (The frequency nodes have been removed for clarity.)

[the same tree with each left branch labeled 0 and each right branch labeled 1]

All that we need to do now is traverse the tree from its root to each leaf node, keeping track of the binary digits encountered along the way. The completed coding scheme is shown in Table 7.2. As you can see, the symbols with the highest frequencies end up with the fewest bits in their codes. The entropy of this message is approximately 3.82 bits per symbol. The theoretical lower bound compression for this message is therefore 110 symbols × 3.82 bits = 420.2, or 421, bits. This Huffman code renders the message in 426 bits, or about 1% more than is theoretically necessary.

TABLE 7.2 The coding scheme
<ws> = 01, T = 000, L = 0010, Y = 0011, I = 1001, H = 1011, P = 1100, E = 1110, G = 1111, S = 10000, O = 10100, R = 10101, A = 11011, U = 110100, N = 110101, F = 1000100, M = 1000101, C = 1000110, D = 1000111

Arithmetic Coding

Huffman coding cannot always achieve theoretically optimal compression because it is restricted to using an integral number of bits in the resulting code. In the nursery rhyme in the previous section, the entropy of the symbol S is approximately 1.58. An optimal code would use 1.58 bits to encode each occurrence of S.


Page 347:
TABLE 7.3 Probability interval mapping for HELLO WORLD!
D: 1/12, [0.0 ... 0.083); E: 1/12, [0.083 ... 0.167); H: 1/12, [0.167 ... 0.250); L: 3/12, [0.250 ... 0.500); O: 2/12, [0.500 ... 0.667); R: 1/12, [0.667 ... 0.750); W: 1/12, [0.750 ... 0.833); <space>: 1/12, [0.833 ... 0.917); !: 1/12, [0.917 ... 1.0)

With Huffman coding, we are restricted to using at least 2 bits for this purpose. This lack of precision cost us a total of 5 redundant bits in the end. Not terribly bad, but it seems we could do better.

Huffman coding falls short of optimality because it is trying to map probabilities, which are elements of the set of real numbers, to elements of a small subset of the set of integers. We are bound to have trouble! So why not create some sort of real-to-real mapping to achieve data compression? In 1963, Norman Abramson conceived of such a mapping, which was subsequently published by Peter Elias. This method of real-to-real data compression is called arithmetic coding.

Conceptually, arithmetic coding partitions the real number line in the interval between 0 and 1 using the probabilities of the symbols in the message. Symbols that are used more frequently get a larger chunk of the interval.

Returning to our favorite program output, HELLO WORLD!, we see that there are 12 characters in this imperative statement. The lowest probability among the symbols is 1/12, and all other probabilities are a multiple of 1/12. Thus, we divide the 0 to 1 interval into 12 parts. Each of the symbols except L and O is assigned 1/12 of the interval; L and O get 3/12 and 2/12, respectively. Our probability-to-interval mapping is shown in Table 7.3.

We encode a message by successively dividing the current interval (starting with 0.0 to 1.0) in proportion to the interval assigned to each symbol. For example, if the "current interval" is 1/8 and the letter L gets 1/4 of the current interval, as shown above, then to encode the L we multiply 1/8 by 1/4, giving 1/32 as the new current interval. If the next character is another L, 1/32 is multiplied by 1/4, yielding 1/128 for the current interval. We proceed in this manner until the entire message is encoded. The process becomes clear after studying the pseudocode below. A trace of the pseudocode for HELLO WORLD! is given in Figure 7.28.

ALGORITHM Arith_Code (Message)
  HiVal ← 1.0                 /* Upper limit of interval. */
  LoVal ← 0.0                 /* Lower limit of interval. */
  WHILE (more characters to process)
    Char ← next message character
    Interval ← HiVal − LoVal
    CharHiVal ← upper interval limit for Char
    CharLoVal ← lower interval limit for Char
    HiVal ← LoVal + Interval × CharHiVal
    LoVal ← LoVal + Interval × CharLoVal
  ENDWHILE
  OUTPUT (LoVal)
END Arith_Code
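A direct Python transcription of this pseudocode is sketched below. The interval table is the rounded version from Table 7.3; the function and variable names are ours. Running it on HELLO WORLD! produces a value near 0.1763511392, consistent with the trace in Figure 7.28.

    # Interval assignments follow Table 7.3 (rounded to three decimals, as in the text).
    INTERVALS = {
        "D": (0.0,   0.083), "E": (0.083, 0.167), "H": (0.167, 0.250),
        "L": (0.250, 0.500), "O": (0.500, 0.667), "R": (0.667, 0.750),
        "W": (0.750, 0.833), " ": (0.833, 0.917), "!": (0.917, 1.0),
    }

    def arith_code(message: str) -> float:
        lo_val, hi_val = 0.0, 1.0
        for char in message:
            interval = hi_val - lo_val
            char_lo, char_hi = INTERVALS[char]
            hi_val = lo_val + interval * char_hi   # both updates use the old lo_val
            lo_val = lo_val + interval * char_lo
        return lo_val

    print(arith_code("HELLO WORLD!"))   # about 0.17635113..., as traced in Figure 7.28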


Page 348:
FIGURE 7.28 The arithmetic coding of HELLO WORLD! (the final value of LoVal, approximately 0.176351139227, encodes the entire message)

FIGURE 7.29 A decoding trace for HELLO WORLD!

The HELLO WORLD! message is decoded using the same process in reverse, as shown in the pseudocode that follows. A trace of the pseudocode is given in Figure 7.29.


Page 349:
ALGORITHM Arith_Decode (CodedMsg)
  Finished ← FALSE
  WHILE NOT Finished
    FoundChar ← FALSE
    WHILE NOT FoundChar                  /* We could do this search much more     */
                                         /* efficiently in a real implementation. */
      PossibleChar ← next symbol from the code table
      CharHiVal ← upper interval limit for PossibleChar
      CharLoVal ← lower interval limit for PossibleChar
      IF CodedMsg < CharHiVal AND CodedMsg > CharLoVal THEN
        FoundChar ← TRUE
      ENDIF
    ENDWHILE                             /* We now have a character whose interval */
                                         /* surrounds the current message value.   */
    OUTPUT (matching character)
    Interval ← CharHiVal − CharLoVal
    CodedMsgInterval ← CodedMsg − CharLoVal
    CodedMsg ← CodedMsgInterval / Interval
    IF CodedMsg = 0.0 THEN
      Finished ← TRUE
    ENDIF
  ENDWHILE
END Arith_Decode

You may have noticed that neither of the arithmetic coding/decoding algorithms above contains any error checking. We have omitted it for the sake of clarity. Real implementations must guard against floating-point overflow in addition to making sure that the number of bits in the result is sufficient for the entropy of the information. Differences in floating-point representations can also cause the algorithm to miss the zero condition when the message is decoded. In fact, an end-of-message character is usually inserted at the end of the message during the coding process to prevent such problems during decoding.

7.8.2 Ziv-Lempel (LZ) Dictionary Systems

Although arithmetic coding can produce nearly optimal compression, it is even slower than Huffman coding because of the floating-point operations that must be performed during both the encoding and decoding processes. If speed is our first concern, we might wish to consider other compression methods, even if it means that we can't achieve a perfect code. We would certainly gain considerable speed if we could avoid a double pass over the input message. That is what dictionary methods are all about.

Jacob Ziv and Abraham Lempel pioneered the idea of building a dictionary during the process of reading information and writing encoded bytes. The output of dictionary-based algorithms contains either literals or pointers to information that has previously been placed in the dictionary. Where there is substantial "local" redundancy in the data, such as long strings of spaces or zeros, dictionary


Page 350:
techniques work exceptionally well. Although referred to as LZ dictionary systems, the name "Ziv-Lempel" is preferred to "Lempel-Ziv" when spelling out the authors' names.

Ziv and Lempel published their first algorithm in 1977. This algorithm, known as LZ77, uses a text window in conjunction with a lookahead buffer. The lookahead buffer contains the information to be encoded. The text window serves as the dictionary. If any characters inside the lookahead buffer can be found in the dictionary, the location and length of the text in the window are written to the output. If the text cannot be found, the unencoded symbol is written with a flag indicating that the symbol should be used as a literal.

There are many variants of LZ77, all of which build on one basic idea. We will explain this basic version with an example, using another nursery rhyme. We have replaced all spaces with underscores for clarity:

STAR_LIGHT_STAR_BRIGHT_
FIRST_STAR_I_SEE_TONIGHT_
I_WISH_I_MAY_I_WISH_I_MIGHT_
GET_THE_WISH_I_WISH_TONIGHT

For illustrative purposes, we will use a 32-byte text window and a 16-byte lookahead buffer. (In practice, these two areas usually span several kilobytes.) The text is first read into the lookahead buffer. Having nothing in the text window to match, the S is placed in the text window and a triple is output that consists of:

1. The offset to the text in the text window
2. The length of the string that was matched
3. The first symbol in the lookahead buffer that follows the matched phrase

Text window: S | Lookahead buffer: STAR_LIGHT_STAR_ | Output: 0,0,S

In the example above, there is no match for S in the text window, so the offset and string length are both zeros. The next character in the lookahead buffer also has no match, so it too is written as a literal with offset and length of zero.

Text window: ST | Lookahead buffer: TAR_LIGHT_STAR_B | Output: 0,0,S 0,0,T

We continue writing literals until a T appears as the first character of the lookahead buffer. This matches the T that is in position 1 of the text window. The character following the T in the lookahead buffer is an underscore, which is the third item in the triple that is written to the output.


Page 351:
Text window: STAR_LIGHT | Lookahead buffer: T_STAR_BRIGHT_FI | Output: 0,0,S 0,0,T 0,0,A 0,0,R 0,0,_ 0,0,L 0,0,I 0,0,G 0,0,H 1,1,_

The lookahead buffer now shifts by two characters. STAR_ is now at the beginning of the lookahead buffer. It has a match at the first character position (position 0) of the text window. We write 0, 5, B, because B is the character following STAR_ in the buffer.

Text window: STAR_LIGHT_ | Lookahead buffer: STAR_BRIGHT_FIRS | Output: ... 1,1,_ 0,5,B

We shift the lookahead buffer by six characters and look for a match on the R. We find one at position 3 of the text, so we write 3, 1, I.

Text window: STAR_LIGHT_STAR_B | Lookahead buffer: RIGHT_FIRST_STAR | Output: ... 0,5,B 3,1,I

GHT_ is now at the beginning of the buffer. It matches four characters of the text, starting at position 7. We write 7, 4, F.

Text window: STAR_LIGHT_STAR_BRI | Lookahead buffer: GHT_FIRST_STAR_I | Output: ... 0,5,B 3,1,I 7,4,F

After a few more iterations, the text window is nearly full:

Text window: STAR_LIGHT_STAR_BRIGHT_FIRST_ | Lookahead buffer: STAR_I_SEE_TONIG | Output: ... 0,5,B 3,1,I 7,4,F 6,1,R 0,2,_

After the STAR_ is matched with the characters at position 0 of the text, the six characters STAR_I leave the buffer and enter the text window. To accommodate all six characters, the text window must slide three characters to the right after the STAR_I is processed.

Text window: STAR_LIGHT_STAR_BRIGHT_FIRST_ | Lookahead buffer: STAR_I_SEE_TONIG | Output: ... 6,1,R 0,2,_ 0,5,I


Page 352:
After writing the code for STAR_I and sliding the window, _S is at the beginning of the buffer. These characters match the text at position 7.

Text window: R_LIGHT_STAR_BRIGHT_FIRST_STAR_I | Lookahead buffer: _SEE_TONIGHT_I_W | Output: ... 0,2,_ 0,5,I 7,2,E

Continuing in this manner, we reach the end of the text. The last characters to be processed are IGHT. They match the text at position 4. Because there are no characters after IGHT in the buffer, the last triple written is flagged with an end-of-file character, <EOF>.

Text window: _I_MIGHT_GET_THE_WISH_I_WISH_TON | Lookahead buffer: IGHT | Output: ... 4,1,E 9,8,W 4,4,T 0,0,O 0,0,N ... 4,4,<EOF>

In all, 36 triples are written to the output in this example. Using a 32-byte text window, the index needs only 5 bits to point to any text character in the window. Because the lookahead buffer is 16 bytes wide, the longest string that we can match is 16 bytes, so we need a maximum of 4 bits to store the length. Using 5 bits for the index, 4 bits for the string length, and 7 bits for each ASCII character, each triple requires 16 bits, or 2 bytes. The rhyme contains 103 characters, which would have occupied 103 uncompressed bytes on disk. The compressed message requires only 72 bytes, giving us a compression factor of (1 − 72/103) × 100% = 30%.

It stands to reason that if we make the text window larger, we increase the likelihood of finding matches with the characters in the lookahead buffer. For example, the string _TONIGHT appears at the forty-first position of the rhyme and again at position 96. Because there are 48 characters between the two occurrences of _TONIGHT, the first occurrence cannot be used as a dictionary entry for the second one if we use a 32-character text window. Enlarging the text window to 64 bytes allows the first _TONIGHT to be used to encode the second, and it would add only one bit to each coded triple. In this example, however, the 64-byte text window reduces the output by only two triples: from 36 to 34. Because the text window now requires 7 bits for the index, each triple would consist of 17 bits. The compressed message would then occupy a total of 17 × 34 = 578 bits, or about 73 bytes. So the larger text window actually costs us a few bits in this example.

A degenerate condition occurs when there are no matches whatsoever between the text and the buffer during the compression process. For instance, if we had used a 36-character string consisting of all the letters of the alphabet and the digits 0 through 9, ABC . . . XYZ012 . . . 9, we would have had no matches at all in our example. The output of the algorithm would have been 36 triples of the form 0,0,?.
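For concreteness, a much-simplified sketch of the triple-producing loop is given below. It keeps the 32-character window and 16-character lookahead of the example, but it ignores the end-of-file flag and the bit-level packing discussed above, so its output is illustrative rather than a reproduction of the exact 36 triples derived in the text; all names are invented.

    def lz77_triples(text: str, window_size: int = 32, lookahead_size: int = 16):
        """Yield (offset, length, next_char) triples for a simplified LZ77."""
        pos = 0
        while pos < len(text):
            window_start = max(0, pos - window_size)
            window = text[window_start:pos]
            best_offset, best_length = 0, 0
            # Find the longest prefix of the lookahead buffer that occurs in the window.
            max_len = min(lookahead_size - 1, len(text) - pos - 1)
            for length in range(max_len, 0, -1):
                idx = window.find(text[pos:pos + length])
                if idx != -1:
                    best_offset, best_length = idx, length
                    break
            next_char = text[pos + best_length]
            yield best_offset, best_length, next_char
            pos += best_length + 1

    rhyme = ("STAR_LIGHT_STAR_BRIGHT_FIRST_STAR_I_SEE_TONIGHT_"
             "I_WISH_I_MAY_I_WISH_I_MIGHT_GET_THE_WISH_I_WISH_TONIGHT")
    triples = list(lz77_triples(rhyme))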


Page 353:
We would have ended up with an output three times larger than the original string, or 200% expansion. Fortunately, exceptional cases like the one just cited rarely happen in practice.

Variants of LZ77 can be found in a number of popular compression utilities, including the ubiquitous PKZIP. IBM's RAMAC RVA 2 Turbo disk array implements LZ77 directly in the disk control circuitry. This compression occurs at hardware speeds, making it completely transparent to users.

Dictionary-based compression has been an active area of research since Ziv and Lempel published their algorithm in 1977. One year later, Ziv and Lempel improved on their own work when they published their second dictionary-based algorithm, now known as LZ78. LZ78 differs from LZ77 in that it removes the limitation of the fixed-size text window. Instead, it creates a special tree data structure called a trie, which is populated by tokens as they are read from the input. (Each interior node of the tree can have as many children as it needs.) Instead of writing characters to the disk as in LZ77, LZ78 writes pointers to the tokens in the trie. The entire trie is written to disk following the encoded message, and it is read first, before the message is decoded. (See Appendix A for more information about tries.)

7.8.3 GIF Compression

Efficiently managing the trie of tokens is the greatest challenge for LZ78 implementations. If the dictionary gets too large, the pointers can become larger than the original data. A number of solutions to this problem have been found, one of which has been the source of acrimonious debate and legal action. In 1984, Terry Welch, an employee of the Sperry Computer Corporation (now Unisys), published a paper describing an effective algorithm for managing an LZ78-style dictionary. His solution, which involves controlling the sizes of the tokens used in the trie, is called LZW data compression, for Lempel-Ziv-Welch.

LZW compression is the fundamental algorithm behind the graphics interchange format, GIF (pronounced "jiff"), developed by CompuServe engineers and popularized by the World Wide Web. Because Welch devised his algorithm as part of his official duties at Sperry, Unisys exercised its right to patent it. It has subsequently requested small royalties each time a GIF is used by service providers or high-volume users. LZW is not specific to GIF; it is also used in the TIFF image format, other compression programs (including Unix Compress), various software applications (such as PostScript and PDF), and hardware devices (most notably modems). Not surprisingly, Unisys's royalty demand was not well received in the Web community, and some sites were awash with promises to boycott GIFs forever. Cooler heads simply went to work producing better (or at least different) algorithms, one of which is PNG, Portable Network Graphics.

Royalty disputes were not the only reason for PNG to come to be, but they certainly accelerated its development. In a matter of months in 1995, PNG went from a draft to an internationally accepted standard. Remarkably, the PNG specification has had only two minor revisions as of 2002.


Page 354:
PNG offers several improvements over GIF, including:

• User-selectable compression modes: "Faster" or "Better" on a scale of 0 to 3, respectively
• Improved compression ratios over GIF, typically 5% to 25% better
• Error detection provided by a 32-bit CRC (ISO 3309/ITU-T V.42)
• Faster initial presentation in progressive display mode
• An open international standard, freely available and sanctioned by the World Wide Web Consortium (W3C) as well as many other organizations and businesses

PNG uses two levels of compression: First, information is reduced using Huffman coding. The Huffman code is then followed by LZ77 compression using a 32KB text window.

GIF can do one thing that PNG cannot: support multiple images in the same file, giving the illusion of animation (albeit stiffly). To correct this limitation, the Internet community produced the Multiple-image Network Graphics algorithm (or MNG, pronounced "ming"). MNG is an extension of PNG that allows multiple images to be compressed into one file. These files can be of any type, such as gray-scale, true color, or even JPEGs (see the next section). MNG version 1.0 was released in January 2001, with refinements and enhancements to follow. With PNG and MNG both freely available (with source code!) over the Internet, one is inclined to think that it is only a matter of time before the GIF issue becomes moot.

7.8.4 JPEG Compression

When we see a graphic image such as a photograph on a printed page or a computer screen, what we are really looking at is a collection of tiny dots called pixels, or picture elements. Pixels are particularly noticeable in low-image-quality media such as newspapers and comic books. When pixels are small and packed closely together, our eyes perceive a "good quality" image. "Good quality," being a subjective measure, starts at about 300 pixels per inch (120 pixels/cm). On the high end, most people would agree that an image of 1600 pixels per inch (640 pixels/cm) is "good," if not excellent.

Pixels contain the binary coding for the image in a form that can be interpreted by display and printer hardware. Pixels can be coded using any number of bits. If, for example, we are producing a black-and-white line drawing, we can do so using one bit per pixel. The bit is either black (pixel = 0) or white (pixel = 1). If we decide that we would rather have a grayscale image, we need to think about how many shades of gray will suffice. If we want eight shades of gray, we need three bits per pixel: black would be 000, white 111, and anything in between would be some shade of gray.

Color pixels are produced by a combination of red, green, and blue components. If we want to render an image using eight different shades each of red, green, and blue, we must use three bits for each color component. Hence, we need nine bits per pixel, giving 2^9 (512) different colors. Black would still be all


Page 355:
zeros: R = 000, G = 000, B = 000; white would still be all ones: R = 111, G = 111, B = 111. "Pure" green would be R = 000, G = 111, B = 000. R = 011, G = 000, B = 101 would give us some shade of purple, and yellow would be produced by R = 111, G = 111, B = 000. The more bits we use to represent each color, the closer we get to the "true color" that we see around us. Many computer systems approximate true color using eight bits per color—red, green, and blue—giving 256 different shades of each. These 24-bit pixels can display about 16 million different colors.

Let's say we want to store a 4 in × 6 in (10 cm × 15 cm) photographic image in such a way that it gives us "pretty good" quality when it is viewed or printed. Using 24 bits (3 bytes) per pixel at 300 pixels per inch, we need 300 × 300 × 6 × 4 × 3 = 6.48MB to store the image. If this 4 × 6-inch photo is part of a sales brochure posted on the Web, we risk losing customers with dial-up modems once they realize that 20 minutes have passed and they still have not finished downloading the brochure. At 1600 pixels per inch, storage balloons to just under 1.5GB, which is practically impossible to download and store.

JPEG is a compression algorithm designed specifically to address this problem. Fortunately, photographic images contain a considerable amount of redundant information. Moreover, some of the information having high theoretical entropy is often of no consequence to the integrity of the image. With these ideas in mind, the ISO and ITU together commissioned a group to formulate an international image compression standard. This group is called the Joint Photographic Experts Group, or JPEG, pronounced "jay-peg." The first JPEG standard, 10918-1, was finalized in 1992. Major revisions and enhancements to this standard were begun in 1997. The new standard is called JPEG2000 and was finalized in December 2000.

JPEG is a collection of algorithms that provides excellent compression at the expense of some image information loss. Up to this point, we have been describing lossless data compression: the data restored from the compressed sequence is precisely the same as it was before compression, barring any computational or media errors. Sometimes we can achieve much better compression if a little loss of information can be tolerated. Photographic images lend themselves particularly well to lossy data compression because of the human eye's ability to compensate for minor imperfections in graphical images. Of course, some images carry real information content and should be subjected to lossy compression only after "quality" has been carefully defined. Medical diagnostic images such as x-rays and electrocardiograms fall into this class. Family album and sales brochure photographs, however, are the kinds of images that can lose considerable "information" while retaining their illusion of visual "quality."

One of the most important features of JPEG is that the user can control the amount of information loss by supplying parameters prior to compressing the image. Even at 100% fidelity, JPEG produces remarkable compression. At 75%, the "lost" information is barely noticeable and the image file is a small fraction of its original size. Figure 7.30 shows a gray-scale image that has been compressed using different quality parameters. (The original 7.14KB bitmap was used as input with the stated quality parameters.)
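The storage arithmetic above is easy to verify; the one-function sketch below (the names are ours) recomputes the 6.48MB figure for the 4 × 6 inch example.

    def raw_image_bytes(width_in, height_in, ppi, bytes_per_pixel=3):
        """Uncompressed size of a width x height inch image sampled at ppi pixels/inch."""
        return int(width_in * ppi) * int(height_in * ppi) * bytes_per_pixel

    print(raw_image_bytes(6, 4, 300) / 1_000_000)   # 6.48 (MB), as in the text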


Page 356:
FIGURE 7.30  JPEG compression of a 7.14 KB bitmap file using different quantization (quality) levels

As you can see, the loss introduced by JPEG becomes problematic only when the lower quality factors are used. You will also notice how the image takes on the appearance of a crossword puzzle at its highest compression. The reason for this becomes clear once you understand how JPEG works.

When compressing color images, the first thing JPEG does is to convert the RGB components to the domain of luminance and chrominance, where luminance is the brightness of the color and chrominance is the color itself. The human eye is less sensitive to chrominance than to luminance, so the resulting code is constructed so that the luminance component is least likely to be lost in the subsequent compression steps. Grayscale images do not require this step.

The image is next divided into square blocks of eight pixels on each side. These 64-pixel blocks are converted from the spatial domain (x, y) to the frequency domain (i, j) using a discrete cosine transform (DCT) as follows:

    DCT(i, j) = (1/4) × C(i) × C(j) × Σ(x=0..7) Σ(y=0..7) pixel(x, y) × cos[(2x + 1)iπ / 16] × cos[(2y + 1)jπ / 16]

where

    C(a) = 1/√2 if a = 0, and C(a) = 1 otherwise.

The result of this transform is an 8 × 8 matrix of integers ranging from −1024 to 1023. The element at i = 0, j = 0 is called the DC coefficient, and it is a weighted average of the values of the 64 pixels in the original block.
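To make the transform concrete, the following Python sketch evaluates the DCT formula above directly. It is a naive, unoptimized rendering for illustration only; production JPEG codecs use fast, factored forms of the DCT.

```python
import math

def dct_8x8(pixel):
    """Naive 8 x 8 forward DCT, computed directly from the formula in the text.
    `pixel` is an 8 x 8 list of pixel sample values."""
    def C(a):
        return 1 / math.sqrt(2) if a == 0 else 1.0

    dct = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (pixel[x][y]
                          * math.cos((2 * x + 1) * i * math.pi / 16)
                          * math.cos((2 * y + 1) * j * math.pi / 16))
            dct[i][j] = 0.25 * C(i) * C(j) * s
    return dct

# A flat block (all samples equal) transforms to a single DC coefficient,
# with every AC coefficient essentially zero.
block = [[100] * 8 for _ in range(8)]
result = dct_8x8(block)
print(round(result[0][0]))   # DC coefficient: 800 for this block
print(round(result[3][5]))   # a typical AC coefficient: 0
```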


Page 357:
The other 63 values are called AC coefficients. Owing to the behavior of the cosine function (cos 0 = 1), the resulting frequency matrix (the (i, j) matrix) has a concentration of low numbers and zeros toward the lower-right corner, while the largest values gather toward the upper-left corner of the matrix. This pattern lends itself well to many different compression methods, but we're not quite ready for that step yet.

Before the frequency matrix is compressed, each value in the matrix is divided by its corresponding element in a quantization matrix. The purpose of the quantization step is to reduce the 11-bit output of the DCT to an 8-bit value. This is the lossy step in JPEG, and its degree is selectable by the user. The JPEG specification gives several quantization matrices, any of which may be used at the discretion of the implementer. All of these default matrices ensure that the frequency matrix elements containing the most information (those toward the upper-left corner) lose the least amount of information during the quantization step.

After the quantization step, the frequency matrix is sparse (it contains more zero than nonzero entries) in the lower-right corner. Large blocks of identical values can be compressed easily using run-length coding. Run-length coding is a simple compression method in which, instead of coding XXXXX, we code 5,X to indicate a run of five Xs. When we store 5,X instead of XXXXX, we save three bytes, not including any delimiters that the method might require. Clearly, the most effective way of doing this is to arrange things so that we get as many adjacent zero values as possible. JPEG achieves this by performing a zigzag scan of the frequency matrix. The result of this step is a one-dimensional array (a vector) that usually contains a long run of zeros. Figure 7.31 illustrates how the zigzag scan works. Each of the AC coefficients in the vector is compressed using run-length coding. The DC coefficient is coded as the arithmetic difference between its original value and the DC coefficient of the previous block, if there was one. The resulting values are then compressed using either Huffman or arithmetic coding; Huffman coding is the preferred method owing to a number of patents on the arithmetic algorithms. Figure 7.32 summarizes the steps of the JPEG algorithm just described. Decompression is achieved by reversing this process.

JPEG2000 offers a number of improvements over the 1997 JPEG standard. The underlying mathematics are more sophisticated, which permits greater flexibility with regard to quantization parameters and the incorporation of multiple images into one JPEG file.

FIGURE 7.31  A zigzag scan of a JPEG frequency matrix: the 8 × 8 input matrix is read into a 64-element output vector in the order (0,0), (0,1), (1,0), (2,0), (1,1), . . . , (6,7), (7,6), (7,7)
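The quantization, zigzag, and run-length steps just described can be sketched in a few lines of Python. This is an illustration of the mechanics, not the JPEG reference implementation; in particular, the uniform quantization matrix used in the example is made up.

```python
def quantize(freq, qmatrix):
    """Divide each frequency-matrix entry by the matching quantizer entry."""
    return [[round(freq[i][j] / qmatrix[i][j]) for j in range(8)] for i in range(8)]

def zigzag(matrix):
    """Read an 8 x 8 matrix along anti-diagonals, as in Figure 7.31."""
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))
    return [matrix[i][j] for i, j in order]

def run_length_encode(vector):
    """Encode runs as (count, value) pairs, e.g. five zeros -> (5, 0)."""
    runs, count = [], 1
    for prev, cur in zip(vector, vector[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((count, prev))
            count = 1
    runs.append((count, vector[-1]))
    return runs

# Example: a frequency matrix whose energy sits in the upper-left corner.
freq = [[0] * 8 for _ in range(8)]
freq[0][0], freq[0][1], freq[1][0], freq[1][1] = 800, -48, 32, 16
q16 = [[16] * 8 for _ in range(8)]          # illustrative uniform quantizer
vec = zigzag(quantize(freq, q16))
print(run_length_encode(vec))               # ends with a long zero run: (59, 0)
```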


Page 358:
FIGURE 7.32  The JPEG compression algorithm: pixel block → discrete cosine transform → frequency matrix → quantization (driven by the user's quality parameter and a quantization matrix) → quantized matrix → zigzag scan → 1 × 64 vector → run-length encoding → 1 × n vector (n < 64) → Huffman or arithmetic encoding → compressed image file

One of the most notable features of JPEG2000 is its ability to allow user-defined regions of interest. A region of interest is an area of an image that serves as its focal point and would not be subjected to the same degree of lossy compression as the rest of the image. If, say, you had a photograph of a friend standing on the shore of a lake, you would tell JPEG2000 that your friend is a region of interest. The lake and trees in the background could be compressed heavily before losing definition. Your friend's image, however, would remain clear, if not enhanced, against a lower-quality background.

JPEG2000 replaces the discrete cosine transform of JPEG with a wavelet transform. (Wavelets are a different way of sampling and encoding an image, or any set of signals.) Quantization in JPEG2000 uses a sine function rather than the simple division of the earlier version. These more sophisticated mathematical manipulations require considerably more processor resources than JPEG, causing noticeable performance degradation. Until the performance issues are resolved, JPEG2000 will be used only where the value of its unique features outweighs the cost of the increased computational power it demands. (Equivalently, JPEG2000 will be used once the falling cost of computing power makes its slower performance irrelevant.)


Page 359:
CHAPTER SUMMARY

This chapter has given you a broad overview of many aspects of computer input/output and storage systems. You have learned that different classes of machines require different I/O architectures. Large systems store and access data in ways that are fundamentally different from the methods used by smaller computers. This chapter has illustrated how data is stored on a variety of media, including magnetic tape, disk, and optical media. Your understanding of how magnetic disks operate will be particularly useful to you when analyzing disk performance in the context of programming, system design, or troubleshooting. Our discussion of RAID systems should help you understand how RAID can provide both improved performance and increased availability for the systems upon which we all depend.

You have also seen a few of the ways in which data can be compressed. Data compression can help economize on disk and tape usage as well as reduce transmission time in data communications. An understanding of the details of these compression methods will help you to select the best method for a particular application. Our brief introduction to the ideas of information theory may help to prepare you for further work in computer science.

We hope that, throughout our discussions, you have gained an appreciation for the tradeoffs that are involved in virtually every system decision. You have seen how we must often make choices between "better" and "faster," and between "faster" and "cheaper," in so many of the areas we have just studied. As you assume leadership on systems projects, you must be certain that your customers understand these tradeoffs as well. Often you need the tact of a diplomat to thoroughly convince your clients that there is no such thing as a free lunch.

FURTHER READING

You can learn more about Amdahl's Law by reading his original paper (Amdahl, 1967). Hennessy and Patterson (1996) provide additional coverage of Amdahl's Law. Rosch (1997) contains a wealth of detail relevant to many of the topics described in this chapter, although its primary focus is on small computer systems. It is well organized, and its style is clear and readable. Rosch (1997) also presents a good overview of CD storage technology. More comprehensive coverage, including the physics, mathematics, and electrical engineering fundamentals of CD-ROM, can be found in Stan (1998) and Williams (1994). Patterson, Gibson, and Katz (1988) provide the seminal paper on the RAID architecture.

IBM Corporation hosts what is by far the best website for detailed technical information. IBM is unique in making prodigious amounts of excellent documentation available to all interested parties. Their home page can be found at


Page 360:
www.ibm.com. IBM also has a number of sites dedicated to specific areas of interest, including storage systems (www.storage.ibm.com) and its server product lines (www.ibm.com/eservers). IBM's research and development pages contain the latest information relevant to emerging technologies (www.research.ibm.com). High-quality scholarly research journals can be found through this site at www.research.ibm.com/journal.

The often-cited theoretical treatment of data compression is Lelewer and Hirschberg (1987); a more complete treatment, with source code, can be found in Nelson and Gailly (1996). With their clear and informal writing style, Nelson and Gailly make learning the arcane art of data compression a truly enjoyable experience. A wealth of information relevant to data compression can also be found on the Web: any good search engine will direct you to hundreds of links when you search on any of the key data compression terms introduced in this chapter. If you want to delve deeper into this heady area, a good place to start is Vetterli and Kovačević (1995). This book also contains a comprehensive overview of image compression, including JPEG and, of course, the wavelet theory behind JPEG2000.

As yet, few books can be found that describe Fibre Channel, SANs, or HIPPI. Clark (1999) and Thornburgh (1999) both provide good discussions of this topic. An industry consensus group called the National Committee for Information Technology Standards (NCITS, formerly the X3 Accredited Standards Committee, Information Technology) maintains a comprehensive Web page at www.t11.org, where you can find the latest drafts of SCSI-3. HIPPI specifications can be found on the HIPPI website at www.hippi.org. Rosch (1997) contains a wealth of information about SCSI and other buses and architectures, and how they are implemented in small computer systems.

REFERENCES

Amdahl, George M. "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities." Proceedings of the AFIPS 1967 Spring Joint Computer Conference, Vol. 30 (Atlantic City, NJ, April 1967), pp. 483–485.
Clark, Tom. Designing Storage Area Networks: A Practical Guide to Implementing Fibre Channel SANs. Reading, MA: Addison-Wesley Longman, 1999.
Hennessy, John L., and Patterson, David A. Computer Architecture: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann Publishers, 1996.
Lelewer, Debra A., and Hirschberg, Daniel S. "Data Compression." ACM Computing Surveys 19:3, 1987, pp. 261–297.
Lesser, M. L., and Haanstra, J. W. "The Random Access Memory Accounting Machine: I. System Organization of the IBM 305." IBM Journal of Research and Development 1:1, January 1957. Reprinted in Vol. 44, No. 1/2, January/March 2000, pp. 6–15.


Page 361:
Nelson, Mark, and Gailly, Jean-Loup. The Data Compression Book, 2nd ed. New York: M&T Books, 1996.
Noyes, T., and Dickinson, W. E. "The Random Access Memory Accounting Machine: II. System Organization of the IBM 305." IBM Journal of Research and Development 1:1, January 1957. Reprinted in Vol. 44, No. 1/2, January/March 2000, pp. 16–19.
Patterson, David A., Gibson, Garth, and Katz, Randy. "A Case for Redundant Arrays of Inexpensive Disks (RAID)." Proceedings of the ACM SIGMOD Conference on the Management of Data, June 1988, pp. 109–116.
Rosch, Winn L. The Winn L. Rosch Hardware Bible. Indianapolis: Sams Publishing, 1997.
Stan, Sorin G. The CD-ROM Drive: A Brief System Description. Boston: Kluwer Academic Publishers, 1998.
Thornburgh, Ralph H. Fibre Channel for Mass Storage. (Hewlett-Packard Professional Books series.) Upper Saddle River, NJ: Prentice Hall PTR, 1999.
Vetterli, Martin, and Kovačević, Jelena. Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice Hall PTR, 1995.
Welch, Terry. "A Technique for High-Performance Data Compression." IEEE Computer 17:6, June 1984, pp. 8–19.
Williams, E. W. The CD-ROM and Optical Recording Systems. New York: Oxford University Press, 1994.
Ziv, J., and Lempel, A. "A Universal Algorithm for Sequential Data Compression." IEEE Transactions on Information Theory 23:3, May 1977, pp. 337–343.
Ziv, J., and Lempel, A. "Compression of Individual Sequences via Variable-Rate Coding." IEEE Transactions on Information Theory 24:5, September 1978, pp. 530–536.

REVIEW OF ESSENTIAL TERMS AND CONCEPTS

3. What is a protocol, and why is it important in I/O bus technology?
4. Name three types of durable storage.
5. Explain how programmed I/O differs from interrupt-driven I/O.
6. What is polling?
7. How are address vectors used in interrupt-driven I/O?
8. How does direct memory access (DMA) work?
9. What is a bus master?
10. Why does DMA require cycle stealing?
11. What does it mean when someone refers to I/O as bursty?
12. What is the difference between channel I/O and interrupt-driven I/O?


Page 362:
14. What is multiplexing?
15. What distinguishes an asynchronous bus from a synchronous bus?
16. What is settling time, and what can be done about it?
17. Why are magnetic disks called direct access devices?
18. Explain the relationship between disks, tracks, sectors, and clusters.
19. What are the major physical components of a hard disk drive?
21. What is seek time?
22. What is the sum of rotational delay and seek time called?
23. What is a file allocation table (FAT), and where is it found on a floppy disk?
24. By what order of magnitude does a hard disk rotate faster than a floppy disk?
25. What is the name given to robotic optical disk library devices?
26. … paper or microfiche?
27. Magnetic disks store bytes by changing the polarity of a magnetic medium. How do optical disks store bytes?
28. How is the format of a CD that stores music different from the format of a CD that stores data? How are the formats alike?
29. Why are CDs particularly useful for long-term data storage?
30. Do CDs that store data use recording sessions?
31. How is it that DVDs can store so much more data than ordinary CDs?
32. Name three methods for recording WORM disks.
33. Why is magnetic tape a popular storage medium?
34. Explain how serpentine recording differs from helical scan recording.
35. What are two tape formats that use serpentine recording?
36. Which RAID levels offer the best performance?
37. Which RAID levels offer the best economy while providing adequate redundancy?
38. Which RAID level uses a mirror (shadow) set?
39. What are hybrid RAID systems?
40. Who was the founder of the science of information theory?
41. What is information entropy, and how does it relate to information redundancy?
42. Name an advantage and a disadvantage of statistical coding.
44. Into which class of data compression algorithms does the LZ77 compression algorithm fall?


Page 363:
EXERCISES

1. Your friend has just bought a new personal computer. She tells you that her new system runs at 1 GHz, which makes it three times faster than her old 300 MHz system. What would you tell her?
2. Suppose the daytime processing load consists of 60% CPU activity and 40% disk activity. Your customers are complaining that the system is slow. After doing some research, you learn that you can upgrade your disks for $8,000 to make them 2.5 times as fast as they are currently. You have also learned that you can upgrade your CPU to make it 1.4 times as fast for $5,000.
   a) Which would you choose to yield the best performance improvement for the least amount of money?
   b) Which option would you choose if you don't care about the money, but want a faster system?
   c) What is the break-even point for the upgrades? That is, what price for both upgrades would make their cost and performance improvement equal?
3. Name the types of I/O architectures. Where is each of these typically used, and why is it used there?
4. A CPU with interrupt-driven I/O is busy servicing a disk request. While the CPU is midway through the disk-service routine, another I/O interrupt occurs.
   a) What happens next?
   b) Is it a problem?
   c) If not, why not? If so, what can be done about it?
5. Why are I/O buses provided with clock signals?
6. If an address bus needs to be able to address eight devices, how many conductors will be required? What if each of those devices also needs to be able to talk back to the I/O control device?
7. We pointed out that I/O buses do not need separate address lines. Construct a timing diagram similar to Figure 7.7 that describes the handshake between an I/O controller and a disk controller for a write operation. (Hint: You will need to add a control signal.)
8. If each interval shown in Figure 7.7 is 50 nanoseconds, how long would it take to transfer 10 bytes of data? Devise a bus protocol, using as many control lines as you need, that will reduce the time required for this transfer to take place.
9. What happens if the address lines are eliminated and the data bus is used for addressing instead? (Hint: An additional control line may be needed.)
10. Define the terms seek time, rotational delay, and transfer time. Explain their relationship.
11. Why do you think the term random access device is something of a misnomer for disk drives?


Page 364:
12. Why do various systems place disk directories in different track locations on the disk? What are the advantages of using each location that you cited?
13. Verify the average latency rate cited in the disk specification of Figure 7.11. Why is the calculation divided by 2?
15. … a data transfer rate of 6.0 MB per second when reading from the disk and 11.1 MB per second when writing to the disk. Why are these numbers different?
16. Do you trust disk drive MTTF figures? Explain.
17. Suppose a disk drive has the following characteristics:
   • 4 surfaces
   • 1024 tracks per surface
   • 128 sectors per track
   • 512 bytes/sector
   • Track-to-track seek time of 5 milliseconds
   • Rotational speed of 5000 RPM
   a) What is the capacity of the drive?
   b) What is the access time?
18. Suppose a disk drive has the following characteristics:
   • 5 surfaces
   • 1024 tracks per surface
   • 256 sectors per track
   • 512 bytes/sector
   • Track-to-track seek time of 8 milliseconds
   • Rotational speed of 7500 RPM
   a) What is the capacity of the drive?
   b) What is the access time?
   c) Is this disk faster than the one described in question 17? Explain.
19. What are the advantages and disadvantages of having a small number of sectors per disk cluster?
20. Suggest some ways in which the performance of a 1.44 MB floppy disk could be improved.
21. What is the maximum number of root directory entries on a 1.44 MB floppy disk? Why?
22. How does the organization of an optical disk differ from the organization of a magnetic disk?
23. Discuss the difference between how DLT and DAT record data. Why would you say that one is better than the other?


Page 365:
24. How do the error correction requirements of an optical document storage system differ from the error correction requirements of the same information stored in textual form? What are the advantages of having different levels of error correction for optical storage devices?
25. You have a need to archive a large amount of data. You are trying to decide whether to use tape or optical storage methods. What characteristics of the data, and of how it will be used, will influence your decision?
26. A certain high-performance computer system has been functioning as an e-commerce Web server. This system supports $10,000 per hour in gross business volume. The net income per hour is estimated at $1,200; in other words, if the system goes down, the company loses $1,200 every hour until repairs are made. Furthermore, any data on the failed drive would be lost. Some of this data could be retrieved from the previous night's backups, but the rest would be gone forever. Conceivably, a poorly timed disk crash could cost your company thousands of dollars in immediate lost revenue and untold thousands more in permanently lost business. The fact that this system uses no RAID of any kind is bothersome to you.
   Although your primary concern is data integrity and system availability, others in your group are obsessed with system performance. They feel that more revenue would be lost in the long run if the system slowed down after RAID was installed. They have stated specifically that a system with RAID running at half the speed of the current system would reduce the gross dollar-per-hour revenue to $5,000 per hour.
   In all, 80% of the system's e-commerce activity involves a database transaction. The database transactions consist of 60% reads and 40% writes. The average disk access time is 20 ms. The disks on this system are nearly full and nearing the end of their expected useful life, so new ones must be ordered soon. You feel that now is a good time to try to install RAID, even though you may need to buy additional disks.
   The disks appropriate for your system cost $2,000 per 10-gigabyte spindle. The average access time of these new disks is 15 ms, with an MTTF of 20,000 hours and an MTTR of 4 hours. You have projected that you will need 60 gigabytes of storage to accommodate the existing data as well as the expected data growth over the next 5 years. (All of the disks will be replaced.)
   a) Are the people who are against adding RAID to the system correct in their claim that 50% slower disks will reduce revenue to $5,000 per hour? Justify your answer.
   b) What would be the average disk access time on your system if you decide to use RAID-1?
   c) What would be the average disk access time on your system using a RAID-5 array with two sets of four disks if 25% of the database transactions must wait behind one transaction for a disk to become free?
   d) Which configuration has a better cost justification, RAID-1 or RAID-5? Explain your answer.


Page 366:
27. a) Which of the RAID systems described in this chapter cannot tolerate a single disk failure?
    b) Which can tolerate more than one simultaneous disk failure?
28. Compute the compression factors for each of the JPEG images in Figure 7.30.
29. Create a Huffman tree and assign Huffman codes for the "Star Bright" rhyme used in Section 7.8.2. Use <ws> for whitespace instead of underscores.
30. Complete the LZ77 data compression illustrated in Section 7.8.2.
31. JPEG is a poor choice for compressing line drawings, such as the one shown in Figure 7.30. Why do you think this is the case? What other compression methods could you suggest? Give justification for your choice(s).
32. a) Name an advantage of Huffman coding over LZ77.
    b) Name an advantage of LZ77 coding over Huffman coding.
    c) Which is better?
33. State one feature of PNG that you could use to convince someone that PNG is a better algorithm than GIF.

… I/O systems you will encounter throughout your career. A general understanding of these systems will help you to decide which methods are best suited for which applications. More importantly, you will learn that modern storage systems are becoming systems in their own right, with architectural models that differ from the internal architecture of a host computer system. Before we delve into these complex architectures, we begin with an introduction to the modes of data transmission.

7A.2  DATA TRANSMISSION MODES

Data can be conveyed from one point to another by sending it one bit at a time or by sending one byte at a time. These are called, respectively, the serial and parallel modes of communication. Each transmission mode establishes a particular communication protocol between the host and the device interface. We discuss a few of the more important protocols used in storage systems in the sections that follow. Many of these ideas extend into the arena of data communications (see Chapter 11).


Page 367:
7A.2.1  Parallel Data Transmission

Parallel communication systems operate in a manner analogous to the operation of a host memory bus. They require at least eight data lines (one for each bit) and one line for synchronization, sometimes called a strobe. Parallel connections are effective only over relatively short distances, depending on the signal frequency and the quality of the cable. Over longer distances, the signals in the cable begin to weaken, owing to the internal resistance of the conductors. The loss of signal strength over time or distance is called attenuation. The problems associated with attenuation become clear through an example.

Figure 7A.1 shows a simplified timing diagram for a parallel printer interface. The lines marked nStrobe and nAck are strobe and acknowledgement signals that are asserted when they carry a low voltage. The Busy and Data signals are asserted when they carry a high voltage; in other words, Busy and Data are positive logic signals, whereas nStrobe and nAck are negative logic signals. Arbitrary reference times are listed across the top of the diagram, t0 through t6. The difference between two consecutive times, Δt, determines the speed of the bus. Typically, Δt will range between 1 and 5 ms.

Figure 7A.1 illustrates the handshake that takes place between a printer interface circuit (on a host) and the host interface of a parallel printer. The process starts when a bit is placed on each of the eight data lines. Next, the busy line is checked to see that it is low. Once the busy line is low, the strobe signal is asserted so that the printer will know there is data on the data lines. As soon as the printer detects the strobe, it reads the data lines while raising the busy signal to prevent the host from placing more data on the data lines.

FIGURE 7A.1  A simplified timing diagram for a parallel printer, showing the signals nStrobe, Busy, nAck, and Data over the reference times t0 through t6. The Data signal represents eight separate lines; each of these lines can be high or low (signal 1 or 0). The signals on these lines are meaningless (shaded in the diagram) before the nStrobe signal is asserted and after nAck is asserted.
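The handshake just described can be summarized as an ordered sequence of events. The following Python sketch is a toy walkthrough of that sequence (signal names follow Figure 7A.1); it is not device-driver code, and the voltage and timing details are deliberately abstracted away.

```python
def parallel_byte_handshake(byte: int):
    """Yield, in order, the handshake events for sending one byte to a
    Centronics-style parallel printer, as described in the text."""
    yield "host: wait for Busy to go low"
    yield f"host: place 0x{byte:02X} on the eight data lines"
    yield "host: assert nStrobe (active low) to signal valid data"
    yield "printer: raise Busy so the host holds off further data"
    yield "printer: read (latch) the eight data lines"
    yield "printer: lower Busy and assert nAck (active low)"
    yield "host: see nAck asserted -- byte transferred, repeat for the next byte"

for step in parallel_byte_handshake(0x41):   # send the character 'A'
    print(step)
```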


Page 368:
Once the printer has read the data lines, it lowers the busy signal and asserts the acknowledgement signal, nAck, to let the host know that the data has been received. Notice that although the data signals are acknowledged, there is no guarantee of their correctness: both the host and the printer assume that the signals received are the same as the signals that were sent. Over short distances, this is a fairly safe assumption. Over longer distances, it may not be.

Let's say that the bus operates with voltages of plus or minus 5 volts. Anything between 0 and positive 5 volts is considered "high," and anything between 0 and negative 5 volts is considered "low." The host places voltages of plus and minus 5 volts on the data lines for each 1 and 0, respectively, of the data byte. It then asserts the strobe line by setting it to minus 5 volts. In a "mild" case of attenuation, the printer may be slow to detect the nStrobe signal, or the host may be slow to detect the nAck signal. This kind of sluggishness is hardly noticeable when printers are involved, but it would be excruciatingly slow on a parallel disk interface, where we normally expect an instantaneous response.

With a very long cable, we could end up with entirely different voltages at the printer end. By the time the signals arrive, "high" could be positive 1 volt and "low" could be negative 3 volts. If 1 volt is not sufficiently above the threshold voltage for a logic 1, we could end up with a 0 where a 1 should be, scrambling the output in the process. Also, over long distances, it is possible for the strobe signal to arrive at the printer before the data bits do. The printer then prints whatever is on the data lines at the moment it detects the assertion of nStrobe. (The extreme case is when a text character is mistaken for a control character. This can cause remarkably odd printer behavior and the death of many trees.)

7A.2.2  Serial Data Transmission

Parallel data transmission moves one byte at a time along a data bus, with one data line required for each bit and the data lines activated by pulses on a separate strobe line. Serial data transmission differs from parallel data transmission in that only one conductor is used for sending data, one bit at a time, as pulses on a single data line. Other conductors can be provided for special signals, as defined in particular protocols. RS-232-C is one such serial protocol that requires separate signaling lines; the data, however, is sent over only one line (see Chapter 11). Serial storage interfaces incorporate these special signals into protocol frames exchanged along the data path. We will examine a few serial storage protocols later in this section.

Serial transfer methods can also be used for time-sensitive isochronous data transfers. Isochronous protocols are used with real-time data such as voice and video signals. Because voice and video are intended for consumption by the human senses, an occasional transmission error is barely noticeable. The approximate nature of the data permits less rigorous error control; consequently, the data can flow with minimal protocol-induced latency from its source to its destination.


Page 369:
7A.3  SCSI

The Small Computer System Interface, SCSI (pronounced "scuzzy"), was invented in 1981 by a then-fledgling disk drive manufacturer, Shugart Associates, together with NCR Corporation, which was at that time also a major player in the small computer market. The interface was originally called SASI, for Shugart Associates Standard Interface. It was so well designed that it became an ANSI standard in 1986. The ANSI committees dubbed the new interface SCSI, believing it would be better to refer to the interface in more general terms.

The original standard SCSI interface (now called SCSI-1) defined a command set, a transport protocol, and the physical connections required to link an unprecedented number of drives (seven of them) to a CPU, at the then-unprecedented speed of 5 megabytes per second (MBps). The innovative idea was to place intelligence in the interface itself, so that it would be more or less self-managing. This freed the CPU to work on computational tasks rather than I/O tasks. In the early 1980s, most small computer systems ran at clock rates between 2 MHz and 8.44 MHz, so the performance of the SCSI bus was nothing short of dazzling.

SCSI is now in its third generation, aptly named SCSI-3. SCSI-3 is more than an interface standard; it is an architecture, officially called the SCSI-3 Architecture Model (SAM). SCSI-3 defines a layered system with protocols for communication between the layers. This architecture includes the "classic" parallel SCSI interface, as well as three serial interfaces and one hybrid interface. We have more to say about SAM in Section 7A.3.2.

7A.3.1  "Classic" Parallel SCSI

When you hear someone claim, "My system has been screaming ever since I upgraded to SCSI," the speaker is probably referring to a SCSI-2 or SCSI-3 parallel disk system. In the 1980s, such claims would have been quite a boast, owing to the difficulty of connecting and configuring the first generation of SCSI devices. Today, not only are transfer rates a couple of orders of magnitude higher, but intelligence has been built into SCSI devices that all but eliminates the problems experienced by early SCSI users.

Parallel SCSI-3 disk drives support a range of speeds from 10 MBps (for backward compatibility with SCSI-2) to 80 MBps for the Wide, Fast, and Ultra implementations of the newer SCSI-3 devices. One of the many beauties of SCSI-3 is that a single SCSI bus can support this range of device speeds with no rewiring or replacement of drives. (But no one will give you any performance guarantees.) Some representative SCSI characteristics are shown in Table 7A.1.

Much of the flexibility and robustness of the SCSI-3 parallel architecture can be attributed to the fact that SCSI devices can communicate among themselves. The CPU communicates only with its SCSI host adapter, issuing I/O commands as needed. After that, the CPU goes about its business while the adapter sees to the details of managing the input or output operation. Figure 7A.2 shows this organization for a SCSI-2 system.


Page 370:
TABLE 7A.1  Summary of some SCSI capabilities
  SCSI-1:              50-conductor cable;  5 MBps theoretical maximum;  8 devices
  Fast SCSI:           50-conductor cable;  10 MBps theoretical maximum;  8 devices
  Fast and Wide SCSI:  2 × 68-conductor cables;  40 MBps theoretical maximum;  32 devices
  Ultra SCSI:          2 × 68-conductor, or 50- and 68-conductor, cables;  80 MBps theoretical maximum;  16 devices

FIGURE 7A.2  A SCSI-2 configuration: the CPU connects to a SCSI host adapter (device 7), which shares a 50-conductor cable, ending in a terminator, with SCSI disk controllers at devices 0, 1, and 2.

"Fast" parallel SCSI-2 and SCSI-3 cables have 50 conductors. Eight of these are used for data, and 11 are used for various types of control; the remaining conductors serve the electrical interface. The device selection (SEL) signal is placed on the data bus at the beginning of a transfer or command. Because there are only eight data lines, a maximum of seven devices (besides the host adapter) can be supported. "Fast and Wide" SCSI cables have 16-bit data buses, allowing twice as many devices to be supported with (presumably) twice the throughput. Some Fast and Wide SCSI systems use two 68-conductor cables, which can support twice the throughput and twice the number of devices of systems using a single 68-conductor cable. Table 7A.2 shows the pin assignments for a 50-conductor SCSI cable.

Parallel SCSI devices communicate with one another and with the host adapter using an asynchronous protocol that runs in eight phases.


Page 371:
TABLE 7A.2  D-type SCSI connector pin assignments for a 50-conductor cable. The pins carry grounds, terminator power, 12 V or 5 V power (logic and motor), data bits 0 through 7 plus a parity bit, and the control signals nATTENTION, SYNC, nBUSY, nACKNOWLEDGE, nRESET, nMESSAGE, nSELECT, nC/D, nREQUEST, and nI/O. A leading lowercase n indicates a negative-logic signal, which is asserted when the line is negated (low).

Strict timings are defined for each phase: if a phase does not complete within a certain number of milliseconds (depending on the bus speed), it is considered an error and the protocol restarts from the beginning of the current phase. The device that initiates the transfer is called the initiator, and the device that services the request is called the target. The eight phases of the SCSI protocol are described below; Figure 7A.3 illustrates them in a state diagram.

• Bus Free: The "bus busy" (BSY) signal line is interrogated to see whether the bus is in use before proceeding to the next phase; the BSY signal is lowered after a data transfer is complete.
• Arbitration: The initiator gains control of the bus by placing its device ID on the bus and raising the busy signal. If two devices do this simultaneously, the one with the higher device ID gains control of the bus. The loser waits for another Bus Free state.
• Selection: The ID of the target device is placed on the data bus, the select signal (SEL) is raised, and the BSY signal is lowered. When the target sees its own device ID on the bus with SEL raised and BSY and I/O lowered, it raises the BSY signal and stores the initiator's ID for later use. The initiator knows that the target is ready when it sees BSY asserted, and it responds by lowering the SEL signal.
• Command: Once the target detects that the initiator has negated the SEL signal, it indicates that it is ready for a command by asserting the "ready for a command" condition on the command/data (C/D) line and requesting the command by raising the REQ signal. When the initiator detects that C/D and REQ are asserted, it places the first command byte on the data bus and asserts the ACK signal. The target reads the command byte and acknowledges it using the ACK signal. Subsequent command bytes, if any, are exchanged using REQ/ACK handshakes until the entire command has been transferred. At this point, the initiator and target may release the bus so that other devices can use it while the disk positions itself under the read/write head. This disconnection permits greater concurrency, but it creates more overhead, because control of the bus must be renegotiated before the data can be transferred to the initiator.


Page 372:
FIGURE 7A.3  A state diagram of the parallel SCSI phases: Bus Free, Arbitration, Selection, Command, Data, Status, Message, and Reselection (dotted lines in the original figure show error conditions).

• Data: After the target has received the entire command, it places the bus in "data" mode by lowering the C/D signal. Depending on whether the transfer is an output from initiator to target (as in a disk write) or an input from target to initiator (as in a disk read), the input/output line is negated or asserted, respectively. The bytes are then placed on the bus and transferred using the same REQ/ACK handshake that was used during the command phase.
• Status: After all of the data has been transferred, the target puts the bus back into "command" mode by raising the C/D signal. It then asserts the REQ signal and waits for an acknowledgement from the initiator indicating that the initiator is free and ready to accept a command.
• Message: The target places the "command complete" code on the data lines and asserts the message line, MSG. When the initiator sees the "command complete" message, it drops all of its signals on the bus, thereby returning the bus to the Bus Free state.
• Reselection: In the event that a transfer was interrupted (such as when the bus is released while waiting for a disk or tape to position itself to satisfy a request), control of the bus is renegotiated through an Arbitration phase as described above. The initiator knows that it has been reselected when it sees the SEL and I/O lines asserted along with the proper initiator and target IDs on the data lines. The protocol then resumes with the Data phase.
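The arbitration rule described above (every contender presents its ID, and the highest ID wins the bus) is simple enough to capture in a few lines. The Python sketch below models only that decision rule, not the BSY/SEL signal timing, and the device IDs in the example are made up.

```python
def arbitrate(requesting_ids):
    """Return the SCSI ID that wins arbitration (the highest ID contending),
    or None if no device is requesting the bus."""
    return max(requesting_ids) if requesting_ids else None

print(arbitrate({2, 5}))     # 5 wins; device 2 waits for the next Bus Free phase
print(arbitrate({7, 3, 0}))  # 7 wins
```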


Page 373:
Synchronous SCSI data transfers work much the same way as the asynchronous method just described. The main difference between the two is that handshaking is not required between the transmission of each data byte. Instead, a minimum transfer period is negotiated between the initiator and the target, and data is exchanged for the duration of the negotiated period. A REQ/ACK handshake takes place before the next block of data is sent.

It is easy to see why timing is so critical to the effectiveness of SCSI. Upper timeout limits prevent the interface from hanging when a device error occurs. If this were not the case, leaving a floppy disk out of its drive could prevent access to a fixed disk, because the bus could be marked "busy" forever (or at least until the system was rebooted). Signal attenuation over long cable runs can cause timeouts, making the entire system slow and unreliable. Serial interfaces are much more tolerant of timing variability.

7A.3.2  The SCSI-3 Architecture Model

SCSI has evolved from a monolithic system consisting of one protocol, one signaling method, and one connector definition into a layered interface specification that separates physical connections from transport protocols and interface commands. The new specification, called the SCSI-3 Architecture Model (SAM), defines these layers and how they interact with a command-level host architecture, called the SCSI-3 Common Access Method (CAM), to perform I/O operations on virtually any type of device that can be connected to a computer system. The layers communicate with one another using protocol service requests, indications, responses, and confirmations. Loosely coupled protocol stacks such as these allow great flexibility in the choice of hardware, software, and interface media; technical improvements in one layer should not affect the functioning of the other layers.

SAM's flexibility has opened a new world of speed and adaptability for disk storage systems. Figure 7A.4 shows how the components of SAM fit together. Although the architecture retains compatibility with parallel SCSI protocols and interfaces, the largest and fastest computer systems now use serial methods. The SAM serial protocols are Serial Storage Architecture (SSA), Serial Bus (also known as IEEE 1394 or FireWire), and Fibre Channel (FC). Although all of the serial protocols support a mapping of parallel SCSI onto serial operation, the Generic Packet Protocol (GPP) is the most chameleon-like in this regard. Because of the speeds of the SCSI-3 buses and the diversity of systems that it can interconnect, the word "small" in "Small Computer System Interface" has become something of a misnomer, with SCSI variants now used in everything from the smallest personal computers to the largest mainframe systems.

Each of the SCSI-3 serial protocols has its own protocol stack, conformant with the SCSI-3 Common Access Method at the top and with clearly defined transport protocols and physical interface systems at the bottom. The serial protocols send data in packets (or frames). These packets consist of a group of bytes containing identifying information (the packet header), a group of data bytes (called the payload), and some sort of trailer that marks the end of the packet.


Page 374:
FIGURE 7A.4  The SCSI-3 Architecture Model (SAM). The SCSI-3 primary commands sit at the top; beneath them are the transport protocols: the SCSI Interlocked Protocol (SIP) over the SCSI Parallel Interface (SPI-2, SPI-3, SPI-4, also known as Ultra2, Ultra3, and Ultra4), the Generic Packet Protocol (GPP) over data network interfaces (e.g., TCP/IP, ATM), the Fibre Channel Protocol (FCP, FCP-2) over Fibre Channel (FC-PH), the SSA SCSI-3 Protocol (SSA-S3P) over SSA-TL2 and SSA-PH1 or SSA-PH2, and the Serial Bus Protocol-2 (SBP-2) over IEEE 1394 (PHY); the physical interconnects, including the SCSI Parallel Interface (SPI) and Fast-20 (Ultra), lie at the bottom.

Error-detection coding is also included in the packet trailer in many of the SAM protocols. We will look at a few of the more interesting serial SAM protocols in the sections that follow.

IEEE 1394

IEEE 1394 got its start at Apple Computer, which saw the need for a bus that was faster and more reliable than the parallel SCSI systems that dominated in the late 1980s. This interface, which Apple called FireWire, now offers bus speeds of 40 MBps, with higher speeds expected in the near future. IEEE 1394 is more than a storage interface: it is a peer-to-peer storage network. Devices are equipped with sufficient intelligence to allow them to communicate with one another as well as with the host controller. This communication includes negotiation of transfer rates and control of the bus. These functions are distributed over all of the layers of the IEEE 1394 protocol, as shown in Figure 7A.5.

IEEE 1394 not only provides faster data transfer than early parallel SCSI, it does so using a much thinner cable, with only six conductors: four for data and control, plus two for power. The smaller cable is cheaper and much easier to manage than 50-conductor SCSI-1 or SCSI-2 cables. In addition, IEEE 1394 cables can extend approximately 15 feet (4.5 meters) between devices, and as many as 63 devices can be daisy-chained on one bus.

The IEEE 1394 connector is modular, similar in style to Game Boy connectors. The entire system is self-configuring, which permits devices to be easily connected and removed while the system is running. Hot-plugging comes at a price, however.


Page 375:
FIGURE 7A.5  The IEEE 1394 protocol stack. SCSI-3 primary commands and DMA drivers and controllers sit above a transaction layer (asynchronous and isochronous transfers), a link layer (loop control and the packet transmitter/receiver), and a physical layer (arbitration, resynchronization, wiring, and signal levels), with serial bus management (bus manager, isochronous resource manager, and node controller) alongside.

The polling required to keep track of the devices connected to the interface places overhead on the system, which ultimately limits its performance. Furthermore, if a connection is busy processing an isochronous data stream, it may not immediately recognize a device that is plugged in during the transfer. An example IEEE 1394 configuration, a tree of devices, is shown in Figure 7A.6. For data I/O purposes, this tree structure is of limited usefulness.

Because of its support for isochronous data transfer, IEEE 1394 has gained wide acceptance in consumer electronics. It is also poised to supplant the IEEE 488 General Purpose Interface Bus (GPIB) in laboratory data acquisition applications. Because of its focus on real-time data handling, however, it is unlikely that IEEE 1394 will seriously contend to replace SCSI as a high-capacity data storage interface.

Serial Storage Architecture (SSA)

Serial Storage Architecture, on the other hand, was designed to become a contender in the storage interface arena. In the early 1990s, IBM was among a number of computer manufacturers looking for a fast, reliable alternative to parallel SCSI for use in mainframe disk storage systems. IBM's engineers settled on a serial bus that offered compact cabling and low attenuation over long cable runs. It had to provide increased performance while protecting compatibility with SCSI-2 protocols.


Page 376:
FIGURE 7A.6  An IEEE 1394 tree configuration loaded with consumer electronics: a CPU at the root, with a CD reader/writer, a CD reader, magnetic disks, a camcorder, a VCR, a DVD player, and a MIDI keyboard attached.

By late 1992, SSA had been refined sufficiently for IBM to propose it as an ANSI standard. This standard was approved in late 1996. The SSA design supports multiple disk drives and multiple hosts in a loop configuration, as shown in Figure 7A.7. A four-conductor cable, consisting of two twisted pairs of copper wire (or four strands of fiber-optic cable), permits signals to travel in opposite directions around the loop. Because of this redundancy, one drive or host adapter can fail and the remaining drives will still be accessible. The dual-loop topology of the SSA architecture also allows the base throughput of 40 MBps to be doubled to 80 MBps. If all of the nodes are functioning normally, devices can communicate with one another in full-duplex mode (data travels in both directions on the loop at the same time).

SSA devices can manage some of their own I/O. For example, in Figure 7A.7, host adapter A could be reading from disk 0 while host adapter B writes to disk 3 and disk 1 sends data to a tape drive, all with no degradation in transfer rate attributable to the bus itself.


Page 377:
FIGURE 7A.7  A Serial Storage Architecture (SSA) configuration: two CPUs with SSA host adapters (A and B) share a loop of SSA node ports connecting disk drives 0 through 3, a tape drive, and a printer, each through its own device controller.

IBM calls this idea spatial reuse, because no part of the system has to wait for the bus if there is a clear path between the source and the destination. Owing to its elegance, speed, and reliability, SSA was well on its way to becoming the dominant interconnection method for large computer systems . . . until Fibre Channel came along.

Fibre Channel

In 1991, engineers at the CERN laboratory (Conseil Européen pour la Recherche Nucléaire, or European Organization for Nuclear Research) in Geneva, Switzerland, decided to create a system for carrying network traffic over fiber-optic media. They named this system Fibre Channel, using the European spelling of fiber. The following year, Hewlett-Packard, IBM, and Sun Microsystems formed a consortium to adapt Fibre Channel to disk interface systems.


Page 378:
This group has grown to become the Fibre Channel Association (FCA), which is working with ANSI to produce a refined and robust specification for high-speed storage device interfaces. Although it was originally intended to define fiber-optic interfaces, the Fibre Channel protocols can also be used over coaxial copper and twisted-pair media.

Fibre Channel storage systems can take any of three topologies: switched, point-to-point, or loop. The loop topology, called Fibre Channel Arbitrated Loop (FC-AL), is the most widely used, and the least expensive, of the three. The Fibre Channel topologies are shown in Figure 7A.8. FC-AL provides packet transmission at 100 MBps in one direction, with a theoretical maximum of 127 devices on the loop; 60, however, is considered the practical limit. Notice that Figure 7A.8 shows two versions of FC-AL, one with (c) and one without (b) a simple switching device called a hub. FC-AL hubs are equipped with port bypass switches that engage whenever one of the FC-AL disks fails. Without some kind of port bypass capability, the entire loop becomes inoperable if even one disk becomes unusable. (Compare this with SSA.) Adding a hub to the configuration therefore provides failover protection. Because the hub itself can become a single point of failure (although hubs rarely do fail), redundant hubs are provided for installations requiring high system availability.

Switched Fibre Channel configurations place virtually no practical limit on the number of devices connected to the interface (up to 2²⁴). Each path between the switch and a node can support a 100 MBps connection, so two disks can be transferring data to each other at 100 MBps while the CPU transfers data to yet another disk at 100 MBps, and so on.

FIGURE 7A.8  Fibre Channel topologies: (a) point-to-point, (b) a basic loop, (c) a loop with a hub, and (d) a switched (star) configuration, with each link running at 100 MBps.
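The practical difference between the loop and switched topologies is aggregate bandwidth. The following back-of-the-envelope Python sketch uses the 100 MBps per-link figure quoted above; it ignores protocol overhead and assumes that every concurrent transfer in a fabric gets its own switch path.

```python
LINK_MBPS = 100   # per-link Fibre Channel rate quoted in the text

def loop_aggregate(concurrent_pairs: int) -> int:
    """An arbitrated loop is a shared medium: one pair talks at a time."""
    return LINK_MBPS

def fabric_aggregate(concurrent_pairs: int) -> int:
    """A switched fabric gives each concurrent pair its own full-speed path."""
    return LINK_MBPS * concurrent_pairs

print(loop_aggregate(3))    # 100 MBps, no matter how many pairs want to talk
print(fabric_aggregate(3))  # 300 MBps: e.g., disk-to-disk, disk-to-disk, CPU-to-disk
```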


Page 379:
Unsurprisingly, switched Fibre Channel configurations are more expensive than loop configurations because of the more sophisticated switching components, which must be redundant to ensure continuous operation.

Fibre Channel is something of a marriage between data communications networks and storage interfaces. It has a protocol stack that fits both the SCSI-3 Architecture Model and internationally accepted network protocol stacks. This protocol stack is shown in Figure 7A.9. Because of its upper-level protocol mappings, a Fibre Channel storage configuration does not necessarily require a direct connection to a CPU: Fibre Channel protocol packets can be encapsulated within a network transmission packet or passed directly as SCSI commands. The FC-4 layer handles the details.

The FC-2 layer builds the protocol packet (or frame), which contains data and higher-level commands, or lower-level responses and data. This packet, shown in Figure 7A.10, has a fixed length of 2148 bytes, of which 36 bytes are used for framing, routing, and error control.

The FC-AL loop initializes itself when it is powered up. At that time, the participating devices announce themselves, negotiate device (or port) numbers, and select a master device. Data transmission takes place through packet exchanges. FC-AL is a point-to-point protocol, somewhat like SCSI in this respect: only two nodes, an initiator and a responder, can use the bus at any given time. When an initiator wants to use the bus, it places a special signal called ARB(x) on the loop, meaning that device x wishes to arbitrate for control of the loop. If no other device has control of the loop, each node in the loop forwards the ARB(x) to its next upstream neighbor until the packet eventually returns to the initiator. When the initiator sees its own ARB(x) unchanged on the loop, it knows that it has gained control.

FIGURE 7A.9  The Fibre Channel protocol stack: upper-layer mappings (network protocols such as the Internet and LANs, multicast, and so on), FC-2 (framing, frame flow control, and loop arbitration), FC-1 (encode/decode and transmission protocols), and FC-0 (the physical interface: signal levels, timings, media specifications, and so on). FC-2 through FC-0 constitute FC-PH, the ANSI X3.230 Fibre Channel physical and signaling interface.
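Some quick arithmetic on the frame layout just described (2148 bytes in all, 36 bytes of framing, routing, and error-control overhead, leaving up to 2112 bytes of payload, as detailed in Figure 7A.10) shows how little of the raw bandwidth the framing costs. The figures below ignore inter-frame signaling, so they are a best-case estimate.

```python
FRAME_BYTES    = 2148                              # total frame size from the text
OVERHEAD_BYTES = 36                                # framing, routing, error control
PAYLOAD_BYTES  = FRAME_BYTES - OVERHEAD_BYTES      # 2112 (2048 data + 64 optional header)

efficiency = PAYLOAD_BYTES / FRAME_BYTES
print(f"payload per frame: {PAYLOAD_BYTES} bytes")
print(f"best-case framing efficiency: {efficiency:.1%}")   # about 98.3%

# On a 100 MBps FC-AL link, roughly this much of the raw rate carries payload:
print(f"{100 * efficiency:.1f} MBps of payload per 100 MBps of raw bandwidth")
```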


Page 380:
FIGURE 7A.10  The Fibre Channel protocol frame: a 4-byte start-of-frame flag, a 24-byte frame header, up to 2112 bytes of data (a 64-byte optional header plus a 2048-byte data payload), a 4-byte cyclic redundancy check for error control, and a 4-byte end-of-frame flag. (The optional header is not part of the Fibre Channel specification proper; it is used to provide compatibility when interworking with other protocols.) The frame header carries, among other fields, a 1-byte packet type (command, response, data, and so on), a 3-byte destination address or device ID, a sequence count, sequence and exchange identifiers assigned by the originator and responder, and a 4-byte relative-offset field for buffer-to-buffer transfers.

If another device already has control of the loop, the ARB(x) is changed to ARB(F0) before it returns to the initiator, and the initiator tries again later. If two devices attempt to gain control of the loop at the same time, the one with the higher node number wins, and the other tries again later.

The initiator takes control of the loop by opening a connection with a responder. It does this by sending an OPN(yy) command (for full-duplex operation) or an OPN(yx) command (for half-duplex operation). Upon receiving the OPN command, the responder enters the "ready" state and notifies the initiator by sending it the "receiver ready" (R_RDY) command. Once the data transfer is complete, the initiator issues a "close" (CLS) command to relinquish control of the loop.

The details of the data transfer protocol depend on the class of service used on the loop or fabric. Some classes require that packets be acknowledged (for maximum accuracy) and some do not (for maximum speed). To date, five classes of service have been defined for Fibre Channel data transfers, although not all of them have been implemented in real products. Furthermore, some classes of service can be mixed if sufficient bandwidth is available; some implementations, for example, allow Class 2 and Class 3 frames to be transmitted whenever the loop or channel is not being used for Class 1 traffic. Table 7A.3 summarizes the classes of service presently defined for Fibre Channel. Table 7A.4 summarizes the main features of IEEE 1394, SSA, and FC-AL.
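The ARB/OPN/CLS exchange described above can be summarized as a short sequence of events. The Python sketch below is a toy model only: it ignores timing, encoding, and the fairness refinements of the real protocol, and the node numbers in the example are made up.

```python
def fcal_exchange(initiator, responder, loop_busy=False):
    """Return the ordered events of one FC-AL open/close exchange, as described
    in the text: ARB circulates, the winner opens the responder, data frames
    flow, and CLS releases the loop."""
    events = [f"node {initiator}: place ARB({initiator}) on the loop"]
    if loop_busy:
        events.append(f"loop owner: replace it with ARB(F0); node {initiator} retries later")
        return events
    events += [
        f"node {initiator}: ARB({initiator}) returned unchanged, loop control won",
        f"node {initiator}: send OPN({responder},{responder}) to open a full-duplex circuit",
        f"node {responder}: reply R_RDY (receiver ready)",
        "both nodes: exchange data frames",
        f"node {initiator}: send CLS to relinquish control of the loop",
    ]
    return events

print("\n".join(fcal_exchange(3, 5)))
print("\n".join(fcal_exchange(4, 1, loop_busy=True)))
```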


Page 381:
Class 1: A dedicated connection with acknowledged packet delivery. Many vendors do not support it because of the complexity of its connection management.
Class 2: Similar to Class 1, except that no dedicated connection is required. Packets may be delivered out of sequence when they are routed over different paths in the network. Class 2 is suitable for installations with low-traffic, infrequent bursts.
Class 3: Connectionless, unacknowledged delivery. Packet delivery and sequencing are managed by upper-layer protocols. On small networks with high bandwidth, delivery is usually reliable. Class 3 is well suited to FC-AL because of the transient routes negotiated by the protocol.
Class 4: Virtual circuits carved out of the total bandwidth of the network. For example, a 100 MBps network could support one 75 MBps connection and one 25 MBps connection. Each of these virtual circuits could carry a different class of service. As of 2002, no commercial Class 4 products had been brought to market.
Class 6: Multicast from a single source with acknowledged delivery. Useful for streaming video or audio. To prevent the broadcasting node from being flooded (as would happen if Class 3 connections were used for broadcast), a separate node is placed in the network to manage the broadcast acknowledgements. As of 2002, no Class 6 implementations had been brought to market.

TABLE 7A.3  Fibre Channel classes of service

TABLE 7A.4  Some features and speeds of the SCSI-3 Architecture Model serial interfaces
  IEEE 1394:  max. cable length between devices 15 ft (4.5 m);  max. data rate 40 MBps;  max. 63 devices per controller
  SSA:        copper 66 ft (20 m), fiber 0.4 mi (680 m);  40 MBps;  129 devices per controller
  FC-AL:      copper 165 ft (50 m), fiber 6 mi (10 km);  25 or 100 MBps;  127 devices per controller

7A.4  STORAGE AREA NETWORKS

Advances in Fibre Channel technology have enabled the construction of dedicated networks built specifically for storage access and management. These networks are called storage area networks (SANs). SANs logically extend local storage buses over a network, making collections of storage devices accessible to all computing platforms: small, medium, and large. The storage devices can be co-located with the hosts, or they can be miles away, serving as "hot" backups for a primary processing site.


Page 382:
SANs provide faster and more agile access to large amounts of storage than the network-attached storage (NAS) model can offer. In a typical NAS system, all file access must pass through a particular file server, incurring all of the protocol overhead and traffic congestion associated with the network. Disk access protocols (SCSI-3 architecture model commands) are embedded within the network packets, giving two layers of protocol overhead and two iterations of packet assembly/disassembly. SANs, sometimes referred to as "the network behind the network," are isolated from ordinary network traffic. Fiber Channel storage networks (whether switched or FC-AL) are potentially much faster than NAS systems because they have only one protocol stack to traverse; they bypass the traditional file server, which speeds up network traffic. The NAS and SAN configurations are compared in Figures 7A.11 and 7A.12.

Because SANs are independent of any particular network protocol (such as Ethernet) or proprietary host attachment, they can be accessed through SAM upper-level protocols by any platform that can be configured to recognize the SAN's storage devices. Additionally, storage management is greatly simplified because all of the storage is on a single SAN (as opposed to multiple file servers and disk arrays). Data can be stored at remote locations via electronic transfer or tape backup without interfering with host or network operations. Because of their speed, flexibility, and robustness, SANs are becoming the first choice for providing highly available, multi-terabyte storage to large user communities.

FIGURE 7A.11 Network-attached storage (NAS): clients, file servers, a disk drive, a tape drive, and a mainframe all attached to the same local area network.


Page 383:
FIGURE 7A.12 A storage area network (SAN): clients and file servers sit on the local area network, while the disk drives, tape drives, and mainframe storage are reached through a separate storage area network.

7A.5 OTHER I/O CONNECTIONS

A number of I/O architectures lie outside the domain of the SCSI-3 architecture model, but they can interact with it to some extent. The most popular of these is the AT Attachment, which is used in most low-end computers. Others, designed for computing architectures beyond the Intel paradigm, have found wide application across various types of platforms. We describe some of the more popular I/O connections in the sections that follow.

7A.5.1 XT to ATA

Early IBM PCs supported an 8-bit bus called the PC/XT bus. This bus was later accepted by the IEEE and renamed the Industry Standard Architecture (ISA) bus. It originally operated at 2.38 MBps and required two cycles to access a 16-bit memory address because of its narrow width. Since the XT ran at 4.77 MHz, the XT bus provided adequate performance. With the introduction of the PC/AT and its faster 80286 processor, it became apparent that an 8-bit bus would no longer be adequate. The immediate solution was to widen the bus to 16 data lines, increase its clock rate to 8 MHz, and call it the "AT bus." It was not long, however, before the new AT bus became a serious system bottleneck, as microprocessor speeds began to exceed 25 MHz. Various solutions to this problem have been brought to market over the years. The most enduring of these is an incarnation of the AT bus known, in its several variations, as AT Attachment, ATAPI, Fast ATA, and EIDE. The last of these abbreviations stands for Enhanced Integrated Drive Electronics, so named because much of the control function that would normally be placed on a disk drive interface board has been moved into the control circuitry of the drive itself.
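As a quick sanity check on the 2.38 MBps figure quoted above for the XT bus, the arithmetic below assumes one byte is transferred every two bus clock cycles at 4.77 MHz; that cycles-per-transfer assumption is ours, included only to show how the quoted numbers fit together.

    # Rough check of the XT bus data rate quoted above.
    # Assumption (ours): one 8-bit transfer completes every two bus clock cycles.
    clock_hz = 4.77e6                  # XT bus clock rate
    cycles_per_byte = 2                # assumed cycles per byte transferred
    bytes_per_second = clock_hz / cycles_per_byte
    print(bytes_per_second)            # 2385000.0 bytes/s, roughly the 2.38 MBps quoted above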


Page 384:
AT Attachment provides compatibility with 16-bit AT interface cards while allowing 32-bit interfaces for disk drives and other devices. External devices cannot be connected directly to an AT Attachment bus, and the number of internal devices is limited to four. Depending on whether programmed I/O or DMA I/O is used, the AT Attachment bus can support transfer rates of 22 MBps or 16.7 MBps, with a theoretical maximum of 66 MBps. At these speeds, ATA offers one of the most favorable cost-performance ratios for small-system buses on the market today.

7A.5.2 Peripheral Component Interconnect: PCI

There was concern, however, that the aging AT bus would limit the overall performance of small systems. Fearing that the AT bus had reached the end of its useful life, Intel sponsored an industry group charged with creating a faster and more flexible I/O bus for small systems. The result of their efforts is the Peripheral Component Interconnect (PCI). The PCI bus is an extension of the system data bus, supplanting any other I/O bus in the system. PCI runs at speeds of up to 66 MHz with the full width of a CPU word. The theoretical maximum transfer rate for a 32-bit CPU is therefore 264 MBps (66 MHz × (32 bits ÷ 8 bits/byte) = 264 MBps). For a 64-bit bus running at 66 MHz, the maximum transfer rate is 528 MBps. Although PCI connects to the system bus, it can negotiate bus speeds and data transfers autonomously, without CPU intervention. PCI is fast and flexible. Versions of PCI are used in small home computers as well as in large, high-performance systems that support data acquisition and scientific research.

7A.5.3 A Serial Interface: USB

The Universal Serial Bus (USB) is not really a bus. It is a serial peripheral interface that connects to a microcomputer expansion bus like any other expansion card. Now in its second revision, USB 2.0 is poised to surpass AT Attachment in terms of price-performance and ease of use, making it attractive for home systems. The designers of USB 2.0 claim that their product is as easy to use as "plugging a phone into a wall socket." USB requires an adapter card on the host called a root hub. The root hub connects to one or more external multiport hubs that can connect directly to a wide variety of peripheral devices, including video cameras and telephones. Up to 127 devices can be supported through a single root hub. Most of the objections to USB 1.1 concerned its slow speed of 12 Mbps. At that data rate, USB 1.1 worked well with slow devices such as printers, keyboards, and mice, but it was of little use for disks or for isochronous data transmission. The main improvement offered by USB 2.0 is a theoretical maximum data rate of 480 Mbps, well beyond the needs of most desktop computers today. One of the great advantages of USB is its low power consumption, which makes it a good choice for laptops and portable systems.
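The PCI bandwidth arithmetic above generalizes to any synchronous parallel bus: the peak rate is simply the clock rate multiplied by the bus width in bytes. The short helper below is ours, included only to reproduce the 264 MBps and 528 MBps figures from the text.

    # Peak transfer rate of a synchronous parallel bus: clock rate x width in bytes.
    def peak_rate_mbps(clock_mhz, width_bits):
        return clock_mhz * (width_bits / 8)

    print(peak_rate_mbps(66, 32))      # 264.0 MBps: 32-bit PCI at 66 MHz, as in the text
    print(peak_rate_mbps(66, 64))      # 528.0 MBps: 64-bit PCI at 66 MHz

Real buses fall short of these peaks, of course, because of arbitration, wait states, and protocol overhead.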


Page 385:
7A.5.4 High Performance Peripheral Interface: HIPPI

The High Performance Peripheral Interface (HIPPI) sits at the other end of the bandwidth spectrum. HIPPI is the ANSI standard for interconnecting mainframes and supercomputers at gigabit speeds. The ANSI X3T11 Technical Committee issued the first set of HIPPI specifications in 1990. With the proper hardware, HIPPI can be used as a high-capacity storage interface as well as a LAN backbone protocol. HIPPI currently has a maximum speed of 100 MBps, and two 100 MBps connections can be combined to form a 200 MBps connection. Work is underway to produce a 6.4-gigabit standard for HIPPI, which would provide 1.6 gigabytes per second of bandwidth in full-duplex mode. Without repeaters, HIPPI can span about 50 meters (150 feet) over copper. HIPPI fiber optic connections can span a maximum of 6 miles (10 km) without repeaters, depending on the type of fiber used. Designed as an interconnect for massively parallel computers, HIPPI can interface with many other buses and protocols, including PCI and SAM.

SCSI-2, ATA, IDE, PCI, USB, and IEEE 1394 are suitable for small systems, while HIPPI and some of the SCSI-3 protocols were designed for large, high-capacity systems. The SCSI-3 architecture model has redefined high-speed interfaces, and aspects of it overlap with the field of data communications as computers and storage systems continue to become more interconnected. Fiber Channel is one of the fastest interface protocols currently used for server farms, but other protocols are on the horizon. An industry is beginning to grow around the concept of "managed storage," in which short-term and long-term disk storage management is handled by a third party for enterprise clients. This area of outsourced services can be expected to keep growing, bringing with it many new ideas, protocols, and architectures.

EXERCISES

1. Which of the interfaces described in this section would you expect to find in a large data center or server farm? What would be the problem with using one of the other architectures in the data center environment?

2. How many SCSI devices can be active after the arbitration phase has completed?


Page 386:
3. Suppose that during an asynchronous parallel SCSI data transfer someone removes a diskette from the drive that is the intended destination of the transfer. How would the initiator know that the error occurred if the failure took place during each of the following phases?
• Bus Free
• Selection
• Command
• Data
• Status
• Message
• Reselection
a) During which phases would it be possible for good data to be written to the diskette if the data transfer is a "write" operation?
b) If the transfer is a "read," at what point would the system have good data in the buffer? Would the system recognize this data?

4. Your manager has decided that the performance of your file server can be improved by replacing the old SCSI-2 host adapter with a Fast and Wide SCSI-3 adapter. He also decides that the old SCSI-2 drives will be replaced by Fast and Wide SCSI-3 drives that are much larger than the old ones. Once all of the files on the old SCSI-2 drives have been moved to the SCSI-3 drives, you reformat the old drives so that they can be used again somewhere. Upon learning that you have done this, your manager tells you to leave the old SCSI-2 drives in the server, because he knows that SCSI-2 is backward compatible with SCSI-3. Being a good employee, you comply with this demand. A few days later, however, you are not surprised when your manager expresses disappointment that the SCSI-3 upgrade does not seem to be delivering the performance improvement he expected. What happened? How can you fix it?

◆ 5. You have just upgraded your system to a Fast and Wide SCSI interface. The system contains a floppy disk, a CD-ROM, and five 8-gigabyte fixed disks. What is the device number of the host adapter? Why?

6. How does SCSI-2 differ from the principles of the SCSI-3 architecture model?

7. What benefits does the SCSI-3 architecture model offer to manufacturers of computer equipment and peripherals?

8. Suppose you want to put together a video conferencing system by connecting a number of computers and video cameras. Which interface model would you choose? Would the protocol packet used to transfer the video be identical to the protocol packet used for data transmission? What protocol information would be in one packet and not the other?

9. How would an SSA bus configuration recover from a single disk failure? Suppose another node fails before the first one can be repaired. How would the system recover?

10. A task force has been assigned to install automated controls in a chemical plant. Hundreds of sensors will be placed in tanks, vats, and hoppers throughout the factory campus. All of the data from the sensors will be fed to a group of computers powerful enough for the plant's managers and supervisors to control and monitor the various processes in progress.


Page 387:
What kind of interface would you use between the sensors and the computers? If all of the computers are to have access to all of the sensor inputs, would you use the same type of connection to link the computers to one another? Which I/O control model would you use?

11. One of your engineers is proposing changes to the bus architecture of the systems your company builds. She states that if the bus is modified to support network protocols directly, the systems will not need network interface cards. She also claims that she can eliminate the SAN and connect the client computers directly to the disk array. Would you oppose this approach? Explain.


Page 388:
CHAPTER 8 System Software

"A program is a spell cast over a computer, turning input into error messages."

8.1 INTRODUCTION

You may at some point find yourself in a position where you are forced to buy "suboptimal" computer hardware because a particular system is the only one that runs a particular software product needed by your employer. While you may be tempted to view this as an insult to your better judgment, you have to recognize that a complete system requires both software and hardware. Software is the window through which users view a system. If the software cannot provide services in the way users expect, they will regard the entire system as inadequate, regardless of the quality of its hardware.

In Chapter 1, we introduced a computer organization consisting of six machine levels, with each level above the gate level providing an abstraction for the layer below it. In Chapter 4, we discussed assemblers and the relationship of assembly language to the architecture. In this chapter, we study the software found at the third level and relate these ideas to software at the fourth and fifth levels. The collection of software at these three levels runs below the application programs and just above the instruction set architecture level. These are the software components, the "machines," with which your application source code interacts. Programs at these levels work together to grant access to the hardware resources that carry out the commands contained in application programs. But to look at a computer system as a single thread running from application source code all the way down to the gate level is to limit our understanding of what a computer system is. We would be ignoring the rich set of services provided at each level. Although our computer system model places only the operating system at the "system software" level, the study of system software often includes


Page 389:
compilers and other utilities, as well as a category of complex programs sometimes called middleware. In general terms, middleware is a broad classification for software that provides services above the operating system layer but below the application program layer. You may recall that in Chapter 1 we discussed the semantic gap that exists between hardware and high-level languages and applications. We know that this semantic gap must not be perceptible to the user, and middleware is the software that provides the necessary invisibility. Because the operating system is the foundation for all system software, virtually all system software interacts with the operating system to some degree. We begin with a brief introduction to the inner workings of operating systems and then move on to the higher layers of software.

8.2 OPERATING SYSTEMS

Originally, the role of the operating system was to help applications interact with the computer hardware. Operating systems provide a necessary set of functions that allow software packages to control the computer's hardware. Without an operating system, each program you run would need its own driver for the video card, the sound card, the hard drive, and so on. Over the years, however, users' expectations of operating systems have changed considerably. They assume that an operating system will make it easy to manage the system and its resources. This expectation has given rise to "drag and drop" file management as well as "plug and play" device management. From the programmer's point of view, the operating system hides the details of the system's lower architectural levels, allowing greater focus on high-level problem solving. We have seen that it is difficult to program at the machine level or at the assembly language level. The operating system works with numerous software components, creating a friendlier environment in which system resources are used effectively and efficiently and where programming in machine code is not required. The operating system not only provides this interface to the programmer, but it also acts as a layer between application software and the actual hardware of the machine. Whether viewed through the eyes of the user or through the lines of code of an application, the operating system is essentially a virtual machine that provides an interface between hardware and software. It deals with real devices and real hardware so that application programs and users do not have to.

The operating system itself is little more than an ordinary piece of software. It differs from most other programs in that it is loaded when the computer is booted and is then executed directly by the processor. The operating system must have control of the processor (as well as other resources), because one of its many tasks is to schedule the processes that use the CPU. It relinquishes control of the CPU to various application programs while they are running, and it relies on the processor to regain control when the application either no longer needs the CPU or gives up the CPU while it waits for other resources.

As we have mentioned, the operating system is an important interface to the underlying hardware, both for users and for application programs. In addition to


Page 390:
its interface function, the operating system has three main tasks. Process management is perhaps the most interesting of the three; the other two are managing system resources and protecting those resources from misbehaving processes. Before we discuss these functions, let's look at a brief history of operating system development to see how it tracks the evolution of computer hardware.

8.2.1 Operating Systems History

Today's operating systems provide plenty of graphical tools to help novice and experienced users alike, but that was not always the case. A generation ago, computing resources were so precious that every machine cycle had to do useful work. Because of the extremely high cost of computer hardware, computer time was allocated with great care. In those days, if you wanted to use a computer, the first step was to sign up for time on the machine. When your time came, you fed your deck of punched cards into the machine yourself, operating it in single-user, interactive mode. Before loading your program, however, you first had to load the compiler: the initial set of cards in the input deck included the bootstrap loader, which caused the rest of the cards to be loaded. At that point you could compile your program. If there was an error in your code, you had to find it quickly, repunch the offending card (or cards), and feed the deck into the computer again in another attempt to compile your program. If you could not locate the problem quickly, you had to sign up for more machine time and try again later. If your program compiled, the next step was to link the object code with library code files to create the executable file that would actually be run. This was a terrible waste of expensive computer time and human time.

In an effort to make the hardware usable by more people, batch processing was introduced. With batch processing, professional operators combined decks of cards into batches, or bundles, with the appropriate instructions to allow them to be processed with minimal intervention. These batches were usually programs of similar types; for example, there might be a batch of FORTRAN programs followed by a batch of COBOL programs. This allowed the operator to set up the machine for FORTRAN programs, read and execute them all, and then switch over to COBOL. A program called the resident monitor allowed programs to be processed without human interaction (other than placing the decks of cards into the card reader).

Monitors were the forerunners of modern operating systems. Their function was simple: the monitor started a job, gave control of the computer to the job, and, when the job was done, took back control of the machine. Work originally done by people was now being done by the computer, increasing efficiency and utilization. However, as your authors can attest, the turnaround time for batch jobs was very long. (We remember dropping off decks of assembly language cards for processing at the data center and being thrilled when we had to wait less than 24 hours to get our results back!) Batch processing also made debugging difficult or, more accurately, very slow. An infinite loop in a program could wreak havoc in a system.


Page 391:
Timers were eventually added to monitors to prevent one process from hogging the system. Monitors had a severe limitation, however, in that they offered no protection: without it, a batch job could affect pending jobs. (For example, a "bad" job might read too many cards, causing the next program to fail.) It was even possible for a batch job to affect the monitor code! To correct this problem, computer systems were fitted with specialized hardware that allowed the computer to operate in either monitor mode or user mode. Programs ran in user mode, switching to monitor mode when certain system calls were needed.

Increases in CPU performance made punched-card batch processing less and less efficient. Card readers simply could not keep the CPU busy. Magnetic tape offered a way to feed jobs to the CPU more quickly. Card readers and printers were connected to smaller computers, which were used to read the decks of cards onto tape; one tape could contain several jobs. This allowed the mainframe CPU to switch continuously among jobs without waiting for the card reader. A similar procedure was followed for output: the output was written to tape, and the tape was then removed and mounted on a smaller computer that did the actual printing. The monitor had to check periodically whether an I/O operation was needed, so timers were added to jobs to allow brief interruptions in which the monitor could send pending I/O to the tape drives. This allowed CPU computation and I/O to take place in parallel. This process, prevalent from the late 1960s through the late 1970s, became known as Simultaneous Peripheral Operation OnLine, or SPOOLing, and it is the simplest form of multiprogramming. The word has stuck in the computer lexicon, but its contemporary meaning refers to printed output that is written to disk before being sent to the printer.

Multiprogramming systems extended the idea of spooling and batch processing to allow several running programs to be in memory at the same time. This is achieved by cycling through processes, allowing each one to use the CPU for a specific period of time. Monitors were able to handle multiprogramming to a certain extent. They could start jobs, spool operations, perform I/O, switch between user jobs, and provide some protection between jobs. It should be clear, however, that the monitor's job was becoming more complex, requiring more elaborate software. It was at this point that monitors evolved into the software we now know as operating systems.

Although operating systems relieved programmers (and operators) of a significant amount of work, users wanted closer interaction with computers. In particular, the concept of batch jobs was unappealing. Wouldn't it be nice if users could submit their own jobs interactively and get immediate feedback? Timesharing systems allowed exactly that. Terminals were connected to systems that allowed access by several concurrent users, and batch processing was soon outmoded as timesharing (also known as timeslicing) made interactive programming practical. In a timesharing system, the CPU switches among user sessions very, very quickly, giving each user a small slice of processor time. This procedure of switching between processes is called context switching. The operating system


Page 392:
performs these context switches quickly, in essence giving each user a personal virtual machine. Timesharing permits many users to share the same CPU, and by extending the idea, a system can allow many users to share a single application. Large interactive systems, such as airline reservation systems, serve thousands of simultaneous users. As with timesharing systems, users of large interactive systems are unaware of the other users on the system.

The introduction of multiprogramming and timesharing required more complex operating system software. During a context switch, all pertinent information about the currently executing process must be saved, so that when the process is scheduled to use the CPU again, it can be restored to the exact state in which it was interrupted. This requires the operating system to know all of the details of the hardware. Recall from Chapter 6 that virtual memory and paging are used in today's systems, so page tables and other information associated with virtual memory must be saved during a context switch. CPU registers must also be saved when a context switch occurs, because they contain the current state of the executing process. These context switches are not cheap in terms of resources or time, so to be worthwhile, the operating system must handle them quickly and efficiently.

It is interesting to note the close correlation between architectural advances and the evolution of operating systems. First-generation computers used vacuum tubes and relays and were quite slow. There was no real need for an operating system, because the machines could not handle multiple simultaneous tasks; human operators performed the necessary task management. Second-generation computers were built with transistors, which brought an increase in CPU speed and capacity. Although CPU capacity had increased, CPU time was still expensive and had to be fully utilized. Batch processing was introduced as a means of keeping the CPU busy, and monitors helped with the processing, provided minimal protection, and handled interrupts. The third generation of computers was marked by the use of integrated circuits, which once again increased speed. Spooling alone could not keep the CPU busy, so timesharing was introduced. Virtual memory and multiprogramming required a more sophisticated monitor, which evolved into what we now call an operating system. Fourth-generation technology, VLSI, allowed the personal computing market to flourish; networked operating systems and distributed systems are an outgrowth of this technology.

Vendors often produced one or more operating systems specific to a given hardware platform. Operating systems from the same vendor designed for different platforms could vary radically, both in how they operated and in the services they provided. It was not uncommon for a vendor to introduce a new operating system when a new computer model was released. IBM put an end to this practice in the mid-1960s when it introduced the 360 series of computers. Although each computer in the 360 family differed greatly in performance and intended audience, all of them ran the same basic operating system, OS/360.


Page 393:
Unix is another operating system that exemplifies the idea of one operating system spanning many hardware platforms. Ken Thompson, of AT&T's Bell Laboratories, began working on Unix in 1969, and he originally wrote it in assembly language. Because assembly languages are hardware specific, any code written for one platform must be rewritten and reassembled for a different platform. Thompson was put off by the thought of rewriting his Unix code for different machines, so in an effort to save himself future work, he created a new interpreted high-level language called B. It turned out that B was too slow to support operating system activities. Dennis Ritchie subsequently joined Thompson to develop the C programming language, releasing the first C compiler in 1973. Thompson and Ritchie rewrote the Unix operating system in C, forever dispelling the belief that operating systems must be written in assembly language. Because it was written in a high-level language and could be compiled for different platforms, Unix was highly portable. This significant departure from tradition has allowed Unix to become extremely popular, and although it made its way into the market slowly, it is now the operating system of choice for millions of users. The hardware neutrality exhibited by Unix allows users to choose the best hardware for their applications, instead of being limited to a specific platform. There are literally hundreds of different versions of Unix available today, including Solaris from Sun, AIX from IBM, HP-UX from Hewlett-Packard, and Linux for PCs and servers.

Operating systems have also been developed for more specialized environments, including real-time, multiprocessor, and distributed/networked systems. Real-time systems are used for process control in factories, assembly lines, robotics, and complex physical systems such as the space station, to name only a few. Real-time systems have severe timing constraints; if specific deadlines are not met, physical damage or other undesirable effects on people or property can occur. Because these systems must respond to external events, correct process scheduling is critical. Imagine a system controlling a nuclear power plant that could not respond quickly enough to an alarm signaling critically high temperatures in the core! In hard real-time systems (where missing a deadline can have potentially fatal results), there can be no errors. In soft real-time systems, meeting deadlines is desirable, but missing them does not lead to catastrophic results. QNX is an excellent example of a real-time operating system (RTOS) designed to meet strict scheduling requirements; it is also suitable for embedded systems, because it is powerful yet has a small footprint (it requires very little memory) and tends to be very secure and reliable.

Multiprocessor systems present their own set of challenges, because they have more than one processor that must be scheduled. The way in which the operating system assigns processes to processors is an important design consideration. Typically, in a multiprocessing environment, the CPUs cooperate with one another to solve problems, working in parallel to achieve a common goal. Coordinating the activities of the processors requires that they have some means of communicating with


Page 394:
one another. The system's timing requirements determine whether the processors are designed using tightly coupled or loosely coupled communication methods. Tightly coupled multiprocessors share a single centralized memory, which requires the operating system to synchronize processes very carefully to ensure protection. This type of coupling is typically used for multiprocessor systems consisting of 16 or fewer processors. Symmetric multiprocessors (SMPs) are a popular form of tightly coupled architecture. These systems have multiple processors that share memory and I/O devices; all of the processors perform the same functions, with the processing load distributed among them.

Loosely coupled multiprocessors have physically distributed memory and are also known as distributed systems. Distributed systems can be viewed in two different ways. A distributed collection of workstations on a LAN, each with its own operating system, is usually referred to as a networked system. Such systems were motivated by the need for multiple computers to share resources. A network operating system includes the necessary provisions, such as remote command execution, remote file access, and remote login, for attaching machines to the network. User processes also have the ability to communicate over the network with processes on other machines. Network file systems are one of the most important applications of networked systems; they allow multiple machines to share one logical file system, even though the machines may be located in different geographic locations and may have different architectures and unrelated operating systems. Synchronization among these systems is an important issue, but communication is even more important, because that communication may take place over vast network distances.

Although networked systems may be distributed over geographic areas, they are not considered true distributed systems. A truly distributed system differs from a network of workstations in one significant way: a distributed operating system runs concurrently on all of the machines, presenting to the user the image of a single machine. In a networked system, by contrast, the user is aware that different machines exist. Transparency is therefore an important issue in distributed systems. The user should not be required to use different names for files simply because they reside in different locations, to issue different commands on different machines, or to perform any other interaction that depends solely on the location of the machine.

For the most part, operating systems for multiprocessors need not differ significantly from those for uniprocessor systems. Scheduling is one of the main differences, however, because multiple CPUs must be kept busy. If scheduling is not done properly, the inherent advantages of multiprocessor parallelism are not fully realized; in particular, if the operating system does not provide the proper tools to exploit parallelism, performance will suffer. Real-time systems, as we mentioned, require specially designed operating systems. Real-time systems, as well as embedded systems, need an operating system of minimal size that makes minimal use of resources. Wireless networks, which combine the compactness of embedded systems with issues characteristic of networked systems, have also prompted innovations in operating system design.


Page 395:
Personal Computer Operating Systems

Operating systems for personal computers serve a different purpose from those for larger systems. Whereas larger systems aim to provide excellent performance and hardware utilization (while still making the system easy to use), operating systems for personal computers have one major goal: to make the system easy to use. When Intel introduced the 8080 microprocessor in 1974, the company asked Gary Kildall to write an operating system for it. Kildall built a controller for a floppy disk, connected the disk to the 8080, and wrote the operating system software to control the system. Kildall called this disk-based operating system CP/M (Control Program for Microcomputers). The BIOS (Basic Input/Output System) made it possible to port CP/M easily to different types of PCs, because it provided the necessary interactions with the input and output devices. Because the I/O devices are the components most likely to vary from system to system, packaging the interfaces to these devices into one module meant that the rest of the operating system could stay the same for different machines; only the BIOS had to be changed.

Intel wrongly assumed that disk-based machines had a bleak future. Deciding not to use this new operating system, Intel gave Kildall the rights to CP/M. In 1980, IBM needed an operating system for the IBM PC. Although IBM approached Kildall first, the deal ultimately went to Microsoft, which bought a disk-based operating system called QDOS (Quick and Dirty Operating System) from the Seattle Computer Products Company for $15,000. The software was renamed MS-DOS, and the rest is history.

The operating systems of the first personal computers operated through commands typed at the keyboard. Alan Kay, inventor of the GUI (graphical user interface), and Doug Engelbart, inventor of the mouse, both of the Xerox Palo Alto Research Center, forever changed the face of operating systems when their ideas were incorporated into them. Through their efforts, command prompts were replaced by windows, icons, and pull-down menus. Microsoft popularized these ideas (but did not invent them) through its series of Windows operating systems: Windows 1.x, 2.x, 3.x, 95, 98, ME, NT, 2000, and XP. The Macintosh graphical operating system, MacOS, which preceded the Windows GUI by several years, has also gone through a number of versions. Unix is gaining popularity in the personal computer world through Linux and OpenBSD. There are many other disk operating systems (such as DR DOS, PC DOS, and OS/2), but none is as popular as Windows and the many variants of Unix.

8.2.2 Operating System Design

The most important piece of software used by a computer is its operating system, so special attention should be paid to its design. The operating system controls the basic functions of the computer, including memory and I/O management, not to mention the look and feel of the interface. An operating system differs from most other software in that it is event driven, meaning that it performs tasks in response to commands, application programs, I/O devices, and interrupts.


Page 396:
Four main factors drive operating system design: performance, power, cost, and compatibility. By now you should have an idea of what an operating system is, but there are many differing opinions about what an operating system should be, as evidenced by the various operating systems available today. Most operating systems have similar interfaces but vary greatly in how tasks are carried out. Some operating systems are minimalistic in design, choosing to cover only the most basic functions, while others try to include every conceivable feature. Some have superior interfaces but fall short in other areas, while others are superior in memory management and I/O but lag in the area of ease of use. No single operating system is superior in every way.

Two components are crucial in operating system design: the kernel and the system programs. The kernel is the core of the operating system. It is used by the process manager, the scheduler, the resource manager, and the I/O manager. The kernel is responsible for scheduling, synchronization, protection and security, memory management, and interrupt handling. It has primary control of the system hardware, including interrupts, control registers, status words, and timers. It loads all device drivers, provides common utilities, and coordinates all I/O activity. The kernel must know the specifics of the hardware in order to combine all of these pieces into a working system.

The two extremes of kernel design are microkernel architectures and monolithic kernels. Microkernels provide rudimentary operating system functionality and rely on other modules to perform specific tasks, thereby moving many typical operating system services into user space. This allows many services to be restarted or reconfigured without restarting the entire operating system. Microkernels provide security, because services running at the user level have restricted access to system resources. Microkernels can also be customized and ported to other hardware more easily than monolithic kernels. However, the additional communication required between the kernel and the other modules often results in a slower and less efficient system. The key features of the microkernel design are its smaller size, its ease of portability, and the array of services that run in a layer above the kernel instead of inside the kernel itself. Microkernel development was significantly encouraged by the growth of SMP and other multiprocessor systems. Examples of microkernel operating systems include Windows 2000, Mach, and QNX.

Monolithic kernels provide all of their essential functionality through a single process; consequently, they are significantly larger than microkernels. Typically targeted at specific hardware, monolithic kernels interact directly with that hardware, so they can be optimized more easily than microkernel operating systems. For this same reason, monolithic kernels are not easily portable. Examples of monolithic kernel operating systems include Linux, MacOS, and DOS.

Because an operating system consumes resources in addition to managing them, designers must consider the overall size of the finished product. For example, Sun Microsystems' Solaris requires 8 MB of disk space for a full installation; Windows 2000 requires about twice that amount. These statistics attest to the explosion of operating system functionality over the past two decades: MS-DOS 1.0 fit comfortably on a single 100KB floppy disk.
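The difference between the two kernel styles can be caricatured in a few lines of Python. This is only a toy sketch of the design idea (a direct in-kernel call versus a message passed to a service living outside the kernel); the function and variable names are ours and do not correspond to any real operating system.

    # Toy contrast between monolithic and microkernel designs (illustration only).

    # Monolithic style: the file service is part of the kernel and is called directly.
    def kernel_read_file(path):
        return f"[kernel] read {path}"            # one large kernel, one direct call

    # Microkernel style: the kernel only delivers messages; the file service runs
    # as a separate module that could be restarted without restarting the kernel.
    message_queue = []

    def kernel_send(message):
        message_queue.append(message)             # the microkernel's job: pass the message

    def file_service():
        while message_queue:                      # a user-space service drains its queue
            request = message_queue.pop(0)
            print(f"[file service] read {request['path']}")

    print(kernel_read_file("/etc/passwd"))                 # monolithic path
    kernel_send({"op": "read", "path": "/etc/passwd"})     # microkernel path
    file_service()

The extra message hop in the second path is the communication overhead the text mentions; the payoff is that the service sits outside the kernel and can be replaced or restarted on its own.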


Page 397:
8.2.3 Operating System Services

Throughout the preceding discussion of operating system architecture, we mentioned some of the most important services that operating systems provide. The operating system oversees all critical system management tasks, including memory management, process management, protection, and the interaction with I/O devices. In its role as an interface, the operating system determines how the user interacts with the computer, serving as a buffer between the user and the hardware. Each of these functions is an important factor in determining overall system performance and usability. In fact, we are sometimes willing to accept reduced performance if the system is easy to use. Nowhere is this trade-off more apparent than in the area of graphical user interfaces.

The Human Interface

The operating system provides a layer of abstraction between the user and the hardware of the machine. Neither users nor applications see the hardware directly, because the operating system provides an interface that hides the details of the bare machine. Operating systems provide three basic interfaces, each presenting a different view to a particular set of individuals. Hardware developers are interested in the operating system as an interface to the hardware. Application developers view the operating system as an interface to various application programs and services. Ordinary users are most interested in the graphical user interface (GUI), which is the interface most people associate with the term.

Operating system user interfaces can be divided into two general categories: command line interfaces and graphical user interfaces (GUIs). Command line interfaces provide a prompt at which the user enters various commands, including commands for copying files, deleting files, displaying a directory listing, and manipulating the directory structure. Command line interfaces require the user to know the syntax of the system, which is often too complicated for the average user. However, for those who have mastered a particular command vocabulary, tasks are accomplished more efficiently with direct commands than through a graphical interface. GUIs, on the other hand, provide a more accessible interface for the casual user. GUIs consist of windows placed on desktops, and they include features such as icons and other graphical representations of files that are manipulated with a mouse. Examples of command line interfaces include Unix shells and DOS. Examples of GUIs include the various flavors of Microsoft Windows and MacOS. The decreasing cost of hardware, particularly of processors and memory, has made it practical to add GUIs to many other operating systems. Of particular interest is the generic X Window System supplied with many Unix operating systems.

The user interface is a program, or a small set of programs, that constitutes the display manager. This module is normally separate from the core operating system functions found in the kernel of the operating system. Most modern operating systems create an overall operating system package with modules for interfacing, file handling, and other applications that are tightly bound to the kernel. The way these modules are linked together is one defining characteristic of today's operating systems.
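A command line interface is, at bottom, a loop that reads a command, parses it, and dispatches it. The fragment below is only a toy illustration of that idea; the command names mirror the examples in the text, and the implementation is ours, not that of any real shell.

    # Toy command-line interface: read a command, parse it, dispatch it.
    import os
    import shutil

    def dispatch(line):
        parts = line.split()
        if not parts:
            return
        cmd, args = parts[0], parts[1:]
        if cmd == "copy" and len(args) == 2:
            shutil.copy(args[0], args[1])             # copy a file
        elif cmd == "del" and len(args) == 1:
            os.remove(args[0])                        # delete a file
        elif cmd == "dir":
            print("\n".join(os.listdir(args[0] if args else ".")))  # directory listing
        else:
            print(f"unknown or malformed command: {line}")

    dispatch("dir")                                   # list the current directory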


Page 398:
Process Management

Process management rests at the heart of operating system services. It includes everything from creating processes (setting up the appropriate structures to store information about each one), to scheduling processes' use of the various resources, to deleting processes and cleaning up after they terminate. The operating system keeps track of each process, its status (which includes the values of variables, the contents of the CPU registers, and the actual state (running, ready, or waiting) of the process), the resources it is using, and the resources it requires. The operating system maintains a close watch on the activities of each process to prevent synchronization problems, which arise when concurrent processes have access to shared resources. Such activities must be monitored carefully to avoid inconsistencies in the data and accidental interference.

At any given time, the kernel is managing a collection of processes, consisting of user processes and system processes. Most processes are independent of one another. However, if they need to interact to achieve a common goal, they rely on the operating system to facilitate their interprocess communication tasks.

Process scheduling is a large part of the operating system's normal routine. First, the operating system must determine which processes to admit to the system (often called long-term scheduling). Then it must determine which process will be granted the CPU at any given instant (short-term scheduling). To perform short-term scheduling, the operating system maintains a list of ready processes, so that it can differentiate between processes that are waiting for resources and those that are ready to be scheduled and run. If a running process needs I/O or another resource, it voluntarily relinquishes the CPU, it is placed in a waiting queue, and another process is scheduled to run. This sequence of events constitutes a context switch.

During a context switch, all pertinent information about the currently executing process is saved, so that when that process resumes execution, it can be restored to the exact state in which it was interrupted. The information saved during a context switch includes the contents of all of the CPU registers, the page tables, and other information associated with virtual memory. Once this information has been safely stored, a previously interrupted process (the one about to use the CPU) is restored to its exact state prior to its interruption. (New processes, of course, have no previous state to restore.)

A process can relinquish the CPU in two ways. In nonpreemptive scheduling, a process voluntarily gives up the CPU (perhaps because it needs a resource that is not yet available). However, if the system is set up with timeslicing, the operating system can take the process out of the running state and put it into a waiting state; this is called preemptive scheduling, because the process is interrupted and the CPU is taken away. Preemption also occurs when processes are scheduled and interrupted according to priority.
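The bookkeeping performed during a context switch can be made concrete with a small sketch. The fields and function below are a simplified illustration of the idea described above (saving the outgoing process's registers and memory-management state, then restoring the incoming process's state); they are not the data structures of any particular operating system.

    # Simplified sketch of the state saved and restored by a context switch.
    from dataclasses import dataclass, field

    @dataclass
    class ProcessControlBlock:
        pid: int
        state: str = "ready"                            # running, ready, or waiting
        registers: dict = field(default_factory=dict)   # saved CPU register contents
        page_table: dict = field(default_factory=dict)  # virtual-memory information

    def context_switch(cpu_registers, outgoing, incoming):
        outgoing.registers = dict(cpu_registers)        # save the interrupted process's state
        outgoing.state = "ready"
        cpu_registers.clear()
        cpu_registers.update(incoming.registers)        # restore the next process's state
        incoming.state = "running"

    cpu = {"pc": 0x1000, "sp": 0x8000}                  # pretend register file
    p1 = ProcessControlBlock(pid=1, state="running", registers=dict(cpu))
    p2 = ProcessControlBlock(pid=2, registers={"pc": 0x2000, "sp": 0x9000})
    context_switch(cpu, p1, p2)
    print(p1.state, p2.state, hex(cpu["pc"]))           # ready running 0x2000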


Page 399:
For example, if a low-priority job is running and a high-priority job needs the CPU, the low-priority job is placed in the ready queue (a context switch is performed), allowing the high-priority job to run immediately.

The operating system's main task in process scheduling is to determine which process should be next in line for the CPU. Factors affecting scheduling decisions include CPU utilization, throughput, turnaround time, waiting time, and response time. Short-term scheduling can be done in a number of ways; the approaches include first-come, first-served (FCFS), shortest job first (SJF), round robin, and priority scheduling. In first-come, first-served scheduling, processes are allocated processor resources in the order in which they are requested, and control of the CPU is relinquished when the executing process terminates. FCFS scheduling is a nonpreemptive algorithm that has the advantage of being easy to implement. However, it is unsuitable for systems that support multiple users, because there is a high variance in the average time a process must wait to use the CPU. In addition, a single process could hog the CPU, causing inordinate delays in the execution of the other pending processes.

In shortest job first scheduling, the job with the shortest expected execution time runs ahead of the others. SJF is a provably optimal scheduling algorithm; its main problem is that there is no way of knowing in advance exactly how long a job is going to run. Systems that employ shortest job first apply heuristics to make "estimates" of job run time, but these heuristics are far from perfect. Shortest job first can be nonpreemptive or preemptive.

In round robin scheduling, each process is allocated a certain slice of CPU time. If the process is still running when its time slice expires, it is swapped out through a context switch, and the next process waiting in the queue is given its own slice of CPU time. Round robin scheduling is used extensively in timesharing systems. When the scheduler employs sufficiently small time slices, users are unaware that they are sharing the resources of the system. However, the time slices should not be so small that the context switch time is large by comparison.

Priority scheduling associates a priority with each process. When the short-term scheduler selects a process from the ready queue, the process with the highest priority is chosen. FCFS gives equal priority to all processes, and SJF gives priority to the shortest job. The main problem with priority scheduling is the potential for indefinite blocking, or starvation. Can you imagine how frustrating it would be to try to run a large job on a busy system when users continually submit shorter jobs that run before yours? Folklore has it that when a mainframe at a large university was shut down, a job was found in the ready queue that had been trying to run for several years!

Some operating systems offer a combination of scheduling approaches. For example, a system might use a preemptive, priority-based, first-come, first-served algorithm. Highly complex operating systems that support enterprise-class systems allow some degree of user control over the length of the time slice, the number of concurrent tasks permitted, and the assignment of priorities to different classes of work.
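The effect of the scheduling policy on waiting time is easy to see numerically. The short simulation below is illustrative only: it assumes all jobs arrive at time zero and that their CPU bursts are known exactly (the very assumption the text notes SJF cannot make in practice).

    # Average waiting time under FCFS and SJF, assuming all jobs arrive at time 0.
    def average_wait(burst_times):
        total_wait, elapsed = 0, 0
        for burst in burst_times:
            total_wait += elapsed                # each job waits for all the jobs ahead of it
            elapsed += burst
        return total_wait / len(burst_times)

    jobs = [24, 3, 3]                            # CPU bursts in arbitrary time units
    print("FCFS:", average_wait(jobs))           # 17.0 : jobs run in arrival order
    print("SJF: ", average_wait(sorted(jobs)))   # 3.0  : shortest job runs first

For this same workload, a round robin policy with a time slice of 4 units would give an average wait of about 5.7 units, falling between the two extremes.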


Page 400:
Multitasking (allowing multiple processes to run concurrently) and multithreading (allowing a process to be subdivided into different threads of control) provide interesting challenges for CPU scheduling. A thread is the smallest schedulable unit in a system. Threads share the same execution environment as their parent process, including the CPU registers and the page table. Because of this, context switches among threads incur less overhead and can therefore occur much faster than a context switch between processes. Depending on the degree of concurrency required, it is possible to have one process with a single thread, one process with multiple threads, multiple single-threaded processes, or multiple multithreaded processes. An operating system that supports multithreading must be able to handle all of these combinations.

Resource Management

In addition to process management, the operating system manages the resources of the system. Because these resources are relatively expensive, it is preferable to allow them to be shared. For example, multiple processes can share one processor, multiple programs can share physical memory, and multiple users and files can share one disk. Three resources are of major concern to the operating system: the CPU, memory, and I/O. Access to the CPU is controlled by the scheduler. Access to memory and to I/O requires a different set of controls and functions.

Recall from Chapter 6 that most modern systems have some form of virtual memory that extends RAM. This implies that parts of several programs may coexist in memory, and each process must have a page table. Originally, before operating systems were designed to handle virtual memory, the programmer implemented virtual memory using the overlay technique. If a program was too large to fit into memory, the programmer divided it into pieces, loading only the data and instructions needed to run at a given moment. If new data or instructions were needed, it was up to the programmer (with some help from the compiler) to make sure the correct pieces were in memory. The programmer was responsible for managing memory. Operating systems have since taken over that task: the operating system translates virtual addresses to physical addresses, transfers pages to and from disk, and maintains the memory page tables. The operating system also determines main memory allocation and keeps track of free frames. As it deallocates memory space, the operating system performs "garbage collection," the process of merging small portions of free memory into larger, more usable chunks.

In addition to sharing a single finite memory, processes also share I/O devices. Most input and output is done at the request of an application. The operating system provides the services required to allow input and output to occur. Applications could handle their own I/O without using the operating system, but, in addition to duplicating effort, this introduces protection and access issues. If several different processes try to use the same I/O device simultaneously, the requests must be mediated, and it falls to the operating system to perform this task. The operating system provides a generic interface to I/O through various system calls.
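The claim that threads share their parent process's memory can be demonstrated with Python's standard threading module. The example below is illustrative only; it shows several threads updating one shared variable (with a lock to avoid the kind of inconsistency mentioned above), not how any particular operating system schedules them.

    # Threads in one process share memory: all of these threads update one counter.
    import threading

    counter = 0
    lock = threading.Lock()                  # shared data needs synchronization

    def worker(increments):
        global counter
        for _ in range(increments):
            with lock:                       # prevent interleaved, lost updates
                counter += 1

    threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)                           # 4000: every thread saw the same variable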


Page 401:
These calls allow an application to request an I/O service through the operating system, which in turn calls upon device drivers, the software that implements a standard set of functions relevant to particular I/O devices.

The operating system also manages disk files. It takes care of file creation, file deletion, directory creation, and directory deletion, and it provides support for the primitives that manipulate files and directories and their mapping onto secondary storage devices. Although the I/O device drivers take care of many of the specific details, the operating system coordinates the device driver activities that support I/O system functions.

The operating system must also ensure that everything works correctly, fairly, and efficiently. Sharing resources, however, creates a multitude of exposures, such as the potential for unauthorized access to or modification of data. Therefore, the operating system also serves as a resource protector, making sure that "bad guys" and buggy software don't ruin things for everyone else. Concurrent processes must be protected from one another, and operating system processes must be protected from all user processes. Without this protection, a user program could wipe out the operating system code that handles, say, interrupts. Multiuser systems require additional security services to protect both shared resources (such as memory and I/O devices) and nonshared resources (such as personal files). Memory protection safeguards against a bug in one user's program affecting other programs or a malicious program taking control of the entire system. CPU protection makes sure that user programs do not get stuck in infinite loops, consuming CPU cycles needed by other jobs.

The operating system provides security services in a variety of ways. First, active processes are limited to execution within their own memory space. All requests for I/O or other resources pass through the operating system, which then processes the request. The operating system executes most commands in user mode and some in kernel mode; in this way, resources are protected against unauthorized use. The operating system also provides facilities to control user access, typically through login names and passwords. Stronger protection can be effected by restricting processes to a single subsystem or partition.

8.3 PROTECTED ENVIRONMENTS

In their role as resource managers and protectors, operating systems must guard against processes running out of control in the system. Process execution must be isolated from the operating system and from other processes. Access to shared resources must be controlled and mediated to avoid conflicts. Various forms of protective


Page 402:
barriers can be erected in a system. In this section, we discuss three of them: virtual machines, subsystems, and partitions.

8.3.1 Virtual Machines

On the timesharing systems of the 1960s and 1970s, many users had to share a single system's resources, such as card readers, printers, and processor cycles. The hardware of that time could not support the solution that was in the minds of many computer scientists: in a better world, each user process would have its own machine, an imaginary machine that would coexist peacefully with many other imaginary machines inside the real machine. Eventually, hardware and operating systems evolved to the point where it became possible to deliver such "virtual machines" on general-purpose timesharing computers.

Virtual machines are, conceptually, quite simple. The real hardware of the real computer is under the exclusive control of a control program (or kernel). The control program creates the virtual machines, which run beneath it subject to the same constraints as any program running in user space, and it presents each virtual machine with an image resembling the hardware of the real machine. Each virtual machine then "sees" an environment consisting of a CPU, registers, I/O devices, and (virtual) memory, as though these resources were dedicated to its exclusive use. Virtual machines are thus imaginary machines that reflect the resources of complete systems.

As illustrated in Figure 8.1, a user program running within the confines of a virtual machine can access any system resource that has been defined for it. When a program invokes a system service to write data to a disk, for example, it executes the same call as it would if it were running on the real machine. The virtual machine receives the I/O request and passes it along to the control program for execution on the real hardware.

It is entirely possible for a virtual machine to run an operating system that differs from the kernel's operating system. It is also possible for each virtual machine to run an operating system different from the operating systems run by the other virtual machines in the system. In fact, this is often the case. If you have ever opened an "MS-DOS" prompt on a Microsoft Windows system (95 through XP), you have created an instance of a virtual machine environment. The control program for these versions of Windows is called the Windows Virtual Machine Manager (VMM). VMM is a 32-bit protected-mode subsystem (see the next section) that creates, runs, monitors, and terminates virtual machines. VMM is loaded into memory at boot time. When it is invoked through the command prompt, VMM creates an "MS-DOS" machine running under a virtual image of a 16-bit Intel 8086/8088 processor. Although the real system has many more registers (which are 32 bits wide), tasks running within the DOS environment see only the limited number of 16-bit registers characteristic of an 8086/8088 processor.


Page 403:
FIGURE 8.1 Images of a virtual machine running under a control program: each virtual machine sees its own CPU (program counter, ALU, registers, control unit), memory, and I/O devices, all provided by the virtual machine control program running on the real CPU, main memory, and I/O devices.

The VMM control program translates (or, in virtual machine terminology, "thunks") the 16-bit instructions into 32-bit instructions before they are executed on the real system processor. To service hardware interrupts, VMM loads a defined set of virtual device drivers (VxDs) each time Windows starts. A VxD can simulate external hardware, or it can simulate a programming interface accessed through privileged instructions. VMM works with 32-bit protected-mode dynamic link libraries (explained in Section 8.4.3), allowing virtual devices to intercept interrupts and faults.


Page 404:
In this way, VMM controls an application's access to system hardware and software. Of course, virtual machines use virtual memory, which must coexist with the memory of the operating system and of the other virtual machines running on the system. A diagram of the Windows 95 memory address allocation is shown in Figure 8.2. Each process is given between 1MB and 1GB of private address space. This private address space is inaccessible to other processes. If an unauthorized process tries to use the protected memory of another process or of the operating system, a protection fault occurs (rudely announced by way of a blue screen message). The shared memory region allows data and program code to be shared between processes. The upper region contains the system virtual machine components, as well as the DLLs accessible to all processes. The lowest region is not addressable, which serves as a way of detecting pointer errors. When modern systems support virtual machines, they can better provide the protection, security, and manageability required by large, enterprise-class computers. Virtual machines also provide compatibility across a wide range of hardware platforms. One such machine, the Java Virtual Machine, is described in Section 8.5.

FIGURE 8.2 The Windows 95 memory map: regions for 32-bit Windows processes, 32-bit Windows applications, 16-bit heap space, and DOS virtual machines sit above a non-addressable region at address 0 (boundaries at 1GB, 4MB, and 1MB).


Page 405:
8.3.2 Subsystems and Partitions

The Windows VMM is a subsystem that starts as soon as Windows is booted. Windows also starts other special-purpose subsystems for file management, I/O, and configuration management. Subsystems establish logically distinct environments that can be individually configured and managed. Subsystems run on top of the operating system kernel, which gives them access to critical system resources, such as the CPU scheduler, that must be shared among several subsystems. Each subsystem must be defined within the context of the controlling system. These definitions include descriptions of resources, such as disk files, input and output queues, and various other hardware components, such as terminal sessions and printers. The resources defined for a subsystem are not always directly visible to the underlying kernel, but are seen through the subsystem for which they are defined. Resources defined for a subsystem may or may not be shareable among peer subsystems. Figure 8.3 is a conceptual rendering of the relationship of subsystems to other system resources. Subsystems help manage the activities of large and highly complex computer systems. Because each subsystem is its own discrete, controllable entity, system administrators can start and stop each subsystem individually without disturbing the kernel or any other running subsystem. Each subsystem can also be tuned individually by reallocating system resources, such as adding or removing disk space or memory.

FIGURE 8.3 A single resource can be configured into multiple subsystems: a control system coordinates Subsystem 1, Subsystem 2, ..., Subsystem n, each of which is given access to one or more disks.


Page 406:
Moreover, if a process goes out of control within a subsystem, or if the subsystem itself fails, usually only the subsystem in which the process is running is affected. Thus, subsystems not only make systems more manageable, but also make them more robust. In very large computer systems, subsystems do not go far enough in segmenting the machine and its resources. Sometimes a more sophisticated barrier is required to facilitate security and resource management. In these cases, a system can be divided into logical partitions, sometimes called LPARs, as illustrated in Figure 8.4. LPARs create distinct machines within one physical system, with nothing implicitly shared between them. The resources of one partition are no more accessible to another partition than if the partitions were running on physically separate systems. For example, if a system has two partitions, A and B, partition A can read a file belonging to partition B only if both partitions agree to establish a mutually shared resource, such as a pipe or a message queue. Generally speaking, files can be copied between partitions only through a file transfer protocol or a utility written for this purpose by the system vendor.

FIGURE 8.4 Logical partitions and their control system: resources cannot be easily shared between partitions (Partition 1, Partition 2, ..., Partition n, each with its own disks, under a partition controller).


Page 407:
Logical partitions are especially useful for creating "sandbox" environments for training users or testing new programs. Sandbox environments get their name from the idea that anyone using them is free to "play around" to his or her heart's content, as long as the playing takes place within the confines of the sandbox. Sandbox environments place strict limits on the accessibility of system resources. Processes running in one partition can never access, intentionally or inadvertently, data or processes residing in other partitions. Partitions therefore raise the level of security in a system by isolating resources from processes that are not entitled to use them. Both subsystems and partitions can be thought of as mini-models of the layered architecture of a computer system. In a partitioned environment, the layers would look like separate layer cakes standing side by side, each extending from the hardware layer to the application layer. Subsystems, on the other hand, are not quite so distinct from one another, with most of their differences occurring at the system software level. Until fairly recently, subsystems and logical partitions were considered artifacts of "old technology" mainframe systems. Throughout the 1990s, smaller machines were widely believed to be more cost-effective than mainframe systems. The "client-server" paradigm was held to be easier to use and more responsive to dynamic business conditions. Application development for small systems was faster, and programming talent could be recruited quickly. Office automation programs, such as word processing and calendar management, found much more comfortable homes in collaborative network environments supported by small file servers. Print servers drove network-enabled laser printers that produced clean, sharp output on plain paper faster than mainframe line printers could produce smudgy output on special forms. Desktop and small server platforms delivered raw computing power and convenience at a fraction of the cost of equivalent raw mainframe computing power. Raw computing power, however, is only one part of an enterprise computing system. File servers were needed to safeguard vital business records, and application servers hosted the programs that perform essential business management functions. When businesses became Internet-enabled, e-mail and Web servers were added to the network. If any of the servers became overloaded with activity, the simple solution was to add another server to spread the load. By the late 1990s, large companies were running huge server farms that supported hundreds of individual servers within secure, environmentally controlled facilities.


Page 408:
Server farms soon became voracious consumers of manpower, with each server occasionally demanding considerable attention. The contents of each server had to be backed up to tape, and the tapes were later rotated offsite for safekeeping. Every server was a potential point of failure, and diagnosing problems and applying software patches became daily tasks. Before long, it became clear that the smaller, cheaper systems weren't quite the bargains they were once thought to be. This is particularly true for companies supporting hundreds of small server systems. All of the major enterprise computer manufacturers now offer a server consolidation product, and different vendors take different approaches to the problem. One of the most interesting is the idea of creating logical partitions that contain multiple virtual machines on a single, very large computer. The many advantages of server consolidation include:

• Managing one large system is easier than managing many smaller systems.
• A single large system consumes less electricity than a group of smaller systems with equivalent computing power.
• With less electrical power consumption, less heat is generated, which saves on air conditioning.
• Larger systems can provide greater protection against failure. (Replacement disks and processors are often included with the systems.)
• A single system is easier to back up and recover.
• A single system occupies less floor space, reducing real estate costs.
• Software licensing fees are often lower for one large system than for a large number of small ones.
• Less manpower is needed to apply user program and system software updates to one system than to many.

Large system vendors such as IBM, Unisys, and Hewlett-Packard (to name only a few) were quick to seize the opportunities presented by server consolidation. IBM's mainframe and midrange lines were redesigned as eSeries servers, and the System/390 mainframe was reincarnated as the zSeries server. zSeries servers can support up to 32 logical partitions. Each partition running IBM's virtual machine (VM) operating system can define thousands of virtual Linux systems. Figure 8.5 shows the configuration of a zSeries/Linux model. Each virtual Linux system is just as capable of supporting business applications and e-commerce activities as a stand-alone Linux system, but without the administrative overhead. A server farm the size of a football field can thus be replaced by a single zSeries "box" only slightly larger than a household refrigerator. The server consolidation movement can be said to epitomize the evolution of the operating system. By harnessing the evolving capabilities of the machine, system designers continue to make their systems easier to manage, even as the systems become increasingly powerful.


Page 409:
FIGURE 8.5 Linux machines in logical partitions of an IBM zSeries server: a control partition, a zOS partition containing several subsystems, and VM partitions each hosting multiple virtual Linux machines.

8.4 PROGRAMMING TOOLS

The operating system and its collection of applications provide an interface between the user who writes programs and the system that runs them. Other utilities, or programming tools, are needed to carry out the more mechanical aspects of software creation. We discuss these in the sections below.

8.4.1 Assemblers and Assembly

In our layered system architecture, the layer that sits directly below the Operating System layer is the Assembly Language layer. In Chapter 4, we presented a hypothetical, simple machine architecture that we called MARIE. The architecture is so simple, in fact, that no real machine would ever use it. For one thing, the continual need to fetch operands from memory would make the system very slow. Real systems minimize memory fetches by providing a sufficient number of addressable on-chip registers. Furthermore, the instruction set architecture of any real system would be much richer than MARIE's: many microprocessors have over a thousand different instructions in their repertoires. Although the machine we presented is quite different from a real machine, the assembly process we described is not. Virtually every assembler in use today makes two passes over the source code. The first pass assembles as much of the code as it can while building a symbol table; the second pass completes the binary instructions, filling in address values retrieved from the symbol table built during the first pass.


Page 410:
The end result of most assemblers is a stream of relocatable binary instructions. Binary code is relocatable when the addresses of the operands are expressed relative to wherever the operating system chooses to load the program in memory, leaving the operating system free to load the code anywhere it wants. Take, for example, the following MARIE code from Table 4.5:

      Load  x
      Add   y
      Store z
      Halt
x,    DEC   35
y,    DEC   -23
z,    HEX   0000

The output of the assembler might look something like this:

1+004
3+005
2+006
7000
0023
FFE9
0000

The "+" sign in our example is not to be taken literally. It signals the program loader (a component of the operating system) that the 004 in the first instruction is relative to the starting address of the program. Consider what happens if the loader places the program in memory at address 250h. The memory image would appear as shown in Table 8.1. If the loader decided that memory at address 400h was a better place for the program, the memory image would look like Table 8.2. In contrast to relocatable code, absolute code is executable binary code that must always be loaded at one particular location in memory.

      Address   Memory Contents
      250       1254
      251       3255
      252       2256
      253       7000
      254       0023
      255       FFE9
      256       0000

TABLE 8.1 Program memory if the program is loaded at address 250h
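As a sketch of the two-pass process described above, the following Python fragment (a toy of our own, not MARIE's actual assembler) assembles this program using the Chapter 4 opcodes 1 (Load), 2 (Store), 3 (Add), and 7 (Halt). Pass 1 builds the symbol table and pass 2 emits the 16-bit words; the "+" relocation flag is omitted here and handled in the next sketch.

OPCODES = {"Load": 0x1, "Store": 0x2, "Add": 0x3, "Halt": 0x7}

def assemble(source):
    # Pass 1: assign an address to every statement and record labels.
    symtab, address = {}, 0
    for label, mnemonic, operand in source:
        if label:
            symtab[label] = address
        address += 1

    # Pass 2: emit one 16-bit word per statement, filling in operand
    # addresses from the symbol table built during pass 1.
    words = []
    for label, mnemonic, operand in source:
        if mnemonic in OPCODES:
            addr = symtab[operand] if operand else 0
            words.append((OPCODES[mnemonic] << 12) | addr)
        elif mnemonic == "DEC":              # decimal data directive
            words.append(int(operand) & 0xFFFF)
        elif mnemonic == "HEX":              # hexadecimal data directive
            words.append(int(operand, 16) & 0xFFFF)
    return words

program = [                 # (label, mnemonic, operand) for the code above
    (None, "Load",  "x"),
    (None, "Add",   "y"),
    (None, "Store", "z"),
    (None, "Halt",  None),
    ("x",  "DEC",   "35"),
    ("y",  "DEC",   "-23"),
    ("z",  "HEX",   "0000"),
]
print([f"{word:04X}" for word in assemble(program)])
# ['1004', '3005', '2006', '7000', '0023', 'FFE9', '0000']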


Page 411:
      Address   Memory Contents
      400       1404
      401       3405
      402       2406
      403       7000
      404       0023
      405       FFE9
      406       0000

TABLE 8.2 Program memory if the program is loaded at address 400h

Non-relocatable code is used for specific purposes on some computer systems. Usually these applications involve explicit control of attached devices or manipulation of system software, in which particular software routines can always be found at clearly defined locations. Plainly, binary machine instructions cannot be written with "+" signs to distinguish relocatable from non-relocatable code. The specific manner in which the distinction is made depends on the design of the operating system that will run the code. One of the simplest ways to make the distinction is to use different file types (extensions). The MS-DOS operating system uses a .COM extension (for COMmand file) for non-relocatable code and an .EXE extension (for EXEcutable file) for relocatable code. COM files are always loaded at address 100h. EXE files can be loaded anywhere, and they don't even have to occupy contiguous memory space. Relocatable code can also be distinguished from non-relocatable code by prepending prefix, or preamble, information to the executable binary that lets the loader know its options while it reads the program file from disk. When relocatable code is loaded into memory, special registers usually provide the base address of the program. All addresses in the program are then treated as offsets from the base address stored in the register. In Table 8.1, where we showed the loader placing the code at address 0250h, a real system would simply store 0250 in the program's base address register and execute the program without modification, as in Table 8.3, where the address of each operand becomes an effective address only after it has been augmented by the 0250 stored in the base address register.

      Address   Memory Contents
      250       1004
      251       3005
      252       2006
      253       7000
      254       0023
      255       FFE9
      256       0000

TABLE 8.3 Program memory if the program is loaded at address 250h using a base address register
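The two relocation styles of Tables 8.1 through 8.3 can be sketched in a few lines of Python (illustrative only; real loaders work on real object-file formats). The first function patches every "+"-marked operand at load time; the second leaves the instructions untouched and lets a base register supply the correction at run time.

relocatable = [(0x1, 0x004, True),   # Load  x   ('+' means the operand is relocatable)
               (0x3, 0x005, True),   # Add   y
               (0x2, 0x006, True),   # Store z
               (0x7, 0x000, False),  # Halt      (no address operand)
               (0x0, 0x023, False),  # x  DEC 35
               (0xF, 0xFE9, False),  # y  DEC -23
               (0x0, 0x000, False)]  # z  HEX 0000

def load_and_patch(image, base):
    # Style 1 (Tables 8.1 and 8.2): the loader rewrites every flagged operand.
    return [(op << 12) | ((addr + base) if reloc else addr)
            for op, addr, reloc in image]

def effective_address(operand, base_register):
    # Style 2 (Table 8.3): the hardware adds the base register to every
    # operand reference as the program runs.
    return operand + base_register

print([f"{w:04X}" for w in load_and_patch(relocatable, 0x250)])
# ['1254', '3255', '2256', '7000', '0023', 'FFE9', '0000']
print(f"{effective_address(0x004, 0x250):04X}")   # 0254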


Page 412:
Notice that all of this hinges on when program addresses are bound to physical addresses. The binding of instructions and data to memory addresses can take place at compile time, at load time, or at run time. Absolute code is an example of compile-time binding, in which instruction and data references are bound to physical addresses when the program is compiled. Compile-time binding works only if the memory load location of a process image is known in advance; if the starting location of the process image changes, the code must be recompiled. If the memory load location of the process image is not known at compile time, relocatable code is generated, which can be bound at load time or at run time. Load-time binding adds the starting address of the process image to each reference as the binary module is loaded into memory. However, the process image cannot be moved during execution, because the starting address of the process must remain the same. Run-time binding (or late binding) delays binding until the process is actually running. This allows the process image to be moved from one memory location to another as it executes. Run-time binding requires special hardware support for address mapping, that is, for translating logical process addresses into physical addresses. A special base register stores the starting address of the program, and this address is added to each reference generated by the CPU. If the process image is moved, the base register is simply updated to reflect the new starting address of the process. Additional virtual memory hardware is needed to perform these translations quickly.

Linking is the process of matching the external symbols of a program with all of the exported symbols from other files, producing a single binary file with no unresolved external symbols. As shown in Figure 8.6, the principal job of a link editor (or linker) is to combine related program files into a unified loadable module. (The file extensions in the figure are characteristic of a DOS/Windows environment.) The constituent binary files may be written entirely by the user, or they may be combined with standard system routines, depending on the needs of the application. Moreover, the binary input to the linker can be produced by any compiler. Among other things, this permits various sections of a program to be written in different languages: part of a program might be written in C++ for ease of coding, while another part might be written in assembly language to speed up execution of a particularly slow section of the code. As with assembly, most link editors require two passes to produce a complete load module comprising all of the external input modules. During its first pass, the linker produces a global external symbol table containing the name of each external module and its starting address relative to the beginning of the entire linked module.
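Here is a hedged sketch of the two-pass link-editing idea in Python, using an invented, dictionary-based "object module" format (MyProg, ProcA, and ProcB echo Figure 8.6). Pass 1 builds the global external symbol table; pass 2 patches the CALL references.

modules = {
    "MyProg": {"code": ["CALL ProcA", "CALL ProcB", "RET"],
               "externs": {0: "ProcA", 1: "ProcB"}},   # word index -> symbol
    "ProcA":  {"code": ["...A...", "RET"], "externs": {}},
    "ProcB":  {"code": ["...B...", "RET"], "externs": {}},
}

# Pass 1: assign each module a starting offset within the load module and
# record it in the global external symbol table.
symbol_table, offset = {}, 0
for name, mod in modules.items():
    symbol_table[name] = offset
    offset += len(mod["code"])

# Pass 2: copy the code, replacing every external reference with the offset
# of the module it names.
load_module = []
for name, mod in modules.items():
    for i, word in enumerate(mod["code"]):
        target = mod["externs"].get(i)
        load_module.append(f"CALL {symbol_table[target]:04X}" if target else word)

print(symbol_table)   # {'MyProg': 0, 'ProcA': 3, 'ProcB': 5}
print(load_module)    # the unified, loadable module with resolved references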


Page 413:
FIGURE 8.6 Linking and loading binary modules: the linker combines MyProg.obj (with its CALLs to ProcA, ProcB, and ProcC) with ProcA.obj, ProcB.obj, and ProcC.obj to produce MyProg.exe, which the loader then places in main memory.

During the second pass, all references between the (previously separate and external) modules are replaced with offsets to those modules taken from the global symbol table. Platform-dependent code can also be added during the linker's second pass, producing a unified, loadable binary program file.

It is not always necessary to link-edit all of the procedures used by a program before creating an executable module. With the proper syntax in the source program, certain external modules can be linked at run time. These external modules are called dynamic link libraries (DLLs), because the linking is done only when the program or module is first invoked. The dynamic linking process is shown schematically in Figure 8.7. As each procedure is loaded, its address is placed in a cross-reference table within the main program module. This approach has many advantages. First, if an external module is used repeatedly by several programs, static linking would require each of those programs to include a copy of the module's binary code. It is clearly a waste of disk space to keep multiple copies of the same code around, so we save space by linking dynamically.


Page 414:
FIGURE 8.7 Dynamic linking with load-time address resolution: as the loader binds each library (ProcA.DLL, ProcB.DLL, ProcC.DLL), it fills in the procedure's load address (e.g., 6A4, 8B2) in the cross-reference table inside MyProg.exe.

The second advantage of dynamic linking is that if the code in one of the external modules changes, the modules that were linked against it do not need to be relinked to preserve the integrity of the program. Moreover, keeping track of which modules use which external modules can be difficult, perhaps impossible, for large systems. Third, dynamic linking provides the means by which third parties can create common libraries whose presence can be assumed by anyone writing programs for a particular system. In other words, if you are writing a program for a particular brand of operating system, you can assume that certain specific libraries will be available on every computer running that operating system. You need not concern yourself with the operating system's version number, patch level, or anything else that is subject to frequent change. So long as the library is never deleted, it can be used for dynamic linking. Dynamic linking can take place either when a program is loaded or when the program first calls an unbound procedure while it is running. Dynamic linking at load time causes delays in program startup.
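Dynamic linking is easy to demonstrate from Python with the ctypes module, which asks the operating system's own loader to bind a shared library at run time. The sketch below assumes a Unix-like system where the C math library can be located; on other platforms the library name, and whether find_library can locate it, will differ.

import ctypes
import ctypes.util

libm_name = ctypes.util.find_library("m")   # e.g. "libm.so.6" on Linux
libm = ctypes.CDLL(libm_name)               # the OS loader binds it here, at run time

libm.cos.restype = ctypes.c_double          # declare the C signature for cos()
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))                        # 1.0, computed by the shared library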


Page 415:
Instead of simply reading the program's binary code from disk and running it, the operating system not only loads the main program, but also loads the binaries for all of the modules that the program uses. The loader supplies the load addresses of each module to the main program before the first statement of the program is executed. The time lag between the moment the user invokes the program and the moment it actually begins executing may be unacceptable for some applications. On the other hand, run-time dynamic linking does not incur the startup penalties of load-time linking, because a module is bound only if it is actually called. This saves a considerable amount of work when relatively few of a program's modules are ever invoked. However, some users object to the perceived erratic response times that occur when a running program stops frequently to load library routines. A further disadvantage of dynamic linking is that the writer of the calling program has no control over the library routine itself. Hence, if the authors of the library code decide to change its functionality, they can do so without the knowledge or consent of the people who use the library. And, as anyone who has written commercial programs can tell you, even the smallest changes to these library routines can cause ripple effects throughout an entire system. These effects can be disruptive and very difficult to trace back to their source. Fortunately, such surprises are rare, so dynamic linking remains the preferred way of distributing commercial binary code across entire classes of operating systems.

Despite the dominance of high-level languages, there remain tasks that they cannot do, or cannot do well, and for these assembly language is still used. First, assembly language gives the programmer direct access to the underlying machine architecture. Programs used to control and/or communicate with peripheral devices are typically written in assembly language because they can use the special instructions that assembly provides and that are not usually available in high-level languages. A programmer doesn't have to rely on operating system services to control a communications port, for example. Using assembly language, you can make the machine do anything, even those things for which no operating system services are provided. In particular, programmers often use assembly language to take advantage of specialized hardware, because compilers for high-level languages aren't designed to deal with uncommon or infrequently used devices. Also, well-written assembly code is blazingly fast: each primitive instruction can be honed to produce the most timely and effective action on the system. These advantages, however, are not sufficient reason to use assembly language for general application development. The fact is that programming in assembly language is difficult and error-prone. It is even more difficult to maintain than it is to write, especially if the maintenance programmer is not the original author of the program. Most importantly, assembly languages are not portable across machine architectures. For these reasons, most general-purpose system software contains very few, if any, assembly instructions. Assembly code is used only when it is absolutely necessary to do so.


Page 416:
Today, virtually all system programs and applications are written in higher-level languages almost exclusively. Of course, "higher-level" is a relative term, and one subject to misunderstanding. One accepted taxonomy of programming languages begins by calling binary machine code a "first-generation" language (1GL). Programmers of this 1GL entered program instructions directly into the machine using toggle switches on the system console! More "privileged" users wrote their binary instructions on slips of paper or punched cards. Programming productivity soared when the first assemblers were written in the early 1950s. These "second-generation" languages (2GLs) eliminated the errors introduced when instructions were translated into machine code by hand. The next leap in productivity came with the introduction of compiled symbolic languages, or "third-generation" languages (3GLs), in the late 1950s. FORTRAN (FORmula TRANslation) was the first of these, pioneered by John Backus and his IBM team in 1957. In the years that followed, a veritable alphabet soup of 3GLs spread through the programming community. Sometimes their names are snappy acronyms, such as COBOL, SNOBOL, and COOL. Sometimes they are named after people, such as Pascal and Ada. And it is not uncommon for 3GLs to be called whatever their designers want to call them, such as C, C++, and Java. Fourth- and fifth-generation languages raise the level of abstraction further still: the programmer describes the problem and the language system solves it. Some fourth- and fifth-generation languages are so user-friendly that end users can easily carry out programming tasks that once required a trained professional programmer. The key idea is that the user simply tells the computer what to do, not how to do it; the compiler figures out the rest. By making things simple for the user, these state-of-the-art languages place substantial overhead on computer systems. Ultimately, all instructions must be pushed down through the language hierarchy, because the digital hardware that actually does the work can execute only binary instructions. In Chapter 4, we pointed out that there is a one-to-one correspondence between assembly language statements and the binary code that the machine actually runs. In compiled languages, this becomes a one-to-many relationship. For example, allowing for variable storage definitions, the high-level language statement x = 3*y would require at least 12 statements in MARIE assembly language. The ratio of source code instructions to binary machine instructions becomes smaller in proportion to the sophistication of the source language: the "higher" the language, the more machine instructions each program line typically generates. This relationship is shown in the programming language hierarchy of Figure 8.8. The science of compiler writing has continued to improve since the first compilers were written in the late 1950s. Through its achievements in compiler construction, the science of software engineering has demonstrated its ability to convert seemingly intractable problems into routine programming tasks. The intractability of the problem lies in bridging the semantic gap between statements that make sense to people and statements that make sense to machines. Most compilers effect this transformation using a six-phase process, as shown in Figure 8.9. The first step in code compilation, called lexical analysis, aims to extract meaningful language primitives, or tokens, from the stream of textual source code. These tokens consist of language-specific reserved words (for example, if and else), Boolean and mathematical operators, literals (such as 12.27), and programmer-defined variables.
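The lexical-analysis phase is easy to sketch. The little Python tokenizer below (our illustration, not a production scanner) pulls reserved words, operators, literals, and programmer-defined names out of a source string and starts a skeleton symbol table for the identifiers it meets.

import re

TOKEN_SPEC = [
    ("NUMBER",   r"\d+(?:\.\d+)?"),            # literals such as 12.27
    ("RESERVED", r"\b(?:if|else|while)\b"),    # language reserved words
    ("IDENT",    r"[A-Za-z_]\w*"),             # programmer-defined names
    ("OP",       r"[+\-*/=<>]"),               # operators
    ("SKIP",     r"\s+"),                      # whitespace is discarded
]
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    symbol_table, tokens = {}, []
    for match in SCANNER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "IDENT":
            # Record each identifier with placeholders for type and location.
            symbol_table.setdefault(text, {"type": None, "location": match.start()})
        tokens.append((kind, text))
    return tokens, symbol_table

toks, syms = tokenize("A = B + 6")
print(toks)   # [('IDENT','A'), ('OP','='), ('IDENT','B'), ('OP','+'), ('NUMBER','6')]
print(syms)   # skeleton symbol-table entries for A and B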


Page 417:
FIGURE 8.8 A programming language hierarchy: fifth-generation languages (natural language), fourth-generation languages (SQL, LISP, etc.), third-generation languages (Java, C, Pascal, FORTRAN, COBOL, etc.), second-generation languages (assembly code), and first-generation languages (binary machine code), arranged by ease of human understanding versus number of instructions.

While the lexical analyzer is building the token stream, it is also laying the groundwork for the symbol table. At this point, the symbol table most likely contains user-defined tokens (variables and procedure names), along with annotations as to their location and data type. Lexical errors occur when characters or constructs foreign to the language are discovered in the source code.

FIGURE 8.9 The six phases of program compilation: source code (A = B + 6) passes through the lexical analyzer (producing a token stream), the syntax analyzer (producing a parse tree), and the semantic analyzer (which consults the symbol table), then through the intermediate code generator (A := B + 6.0), the code optimizer (LOAD 0A2, ADD 0A6, STORE 0A0), and the code generator (10A2, 30A6, 20A0).


Page 418:
The programmer-defined variable 1DaysPay, for example, would produce a lexical error in most languages, because variable names ordinarily may not begin with a digit. If no lexical errors are found, the compiler proceeds to analyze the syntax of the token stream. An inorder traversal of the resulting parse tree usually yields the expression just parsed. Consider, for example, the following program statement:

monthPrincipal = payment - (outstandingBalance * interestRate)

Figure 8.10 shows a correct syntax tree for this statement. The syntax analyzer checks the symbol table for the presence of the programmer-defined variables that populate the tree. If the parser encounters a variable for which no description exists in the symbol table, it issues an error message. The parser also detects illegal constructions such as A = B + C = D. What the parser does not do, however, is check that the = or + operators are valid for the variables A, B, C, and D. The semantic analyzer does this in the next phase. It takes the parse tree as input and checks it for appropriate data types, using information from the symbol table. The semantic analyzer also performs any appropriate data type promotions, such as changing an integer to a floating-point value or variable, if such promotions are allowed by the rules of the language. Once the compiler has completed its analysis functions, it begins its synthesis phase, starting from the syntax tree produced during semantic analysis. The first step in code synthesis is to create pseudo-assembly code from the syntax tree. This code is often called three-address code, because it supports statements such as A = B + C, which most assembly languages do not. This intermediate code allows compilers to be portable to many different kinds of computers. Once all of the tokenizing, tree building, and semantic analysis has been done, it becomes a relatively easy task to write a three-address code translator that produces output for many different instruction sets. Most ISAs use two-address code, so addressing mode differences must be resolved during the translation process. (Recall that the MARIE instruction set is a one-address architecture.)

FIGURE 8.10 A syntax tree for the statement monthPrincipal = payment - (outstandingBalance * interestRate), with = at the root, monthPrincipal and - as its children, and * combining outstandingBalance and interestRate.
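The step from syntax tree to intermediate code can also be sketched briefly. The Python fragment below (an illustration, with an invented tuple form for the tree and invented temporary names) walks the tree of Figure 8.10 and emits three-address code of the A = B + C flavor just described.

tree = ("=", "monthPrincipal",
        ("-", "payment",
              ("*", "outstandingBalance", "interestRate")))

code, temp_count = [], 0

def emit(node):
    # Post-order walk: generate code for the children first, then for the
    # operator, returning the name that holds the node's value.
    global temp_count
    if isinstance(node, str):                # a leaf: a variable or literal
        return node
    op, left, right = node
    if op == "=":
        code.append((":=", emit(right), None, left))
        return left
    a, b = emit(left), emit(right)
    temp_count += 1
    result = f"t{temp_count}"                # a fresh temporary variable
    code.append((op, a, b, result))
    return result

emit(tree)
for op, a, b, dest in code:
    print(f"{dest} := {a} {op if op != ':=' else ''} {b if b else ''}".rstrip())
# t1 := outstandingBalance * interestRate
# t2 := payment - t1
# monthPrincipal := t2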


Page 419:
The final phase of the compiler, however, usually does more than simply translate intermediate code into assembly instructions. Good compilers make some attempt at code optimization, which can take into account different memory and register organizations as well as supply the most powerful instructions needed to carry out the task. Code optimization also involves removing unnecessary temporary variables, collapsing repeated expressions into single expressions, and flagging dead (unreachable) code. The final result is a binary module suitable for linking and running on the target system.

8.4.5 Interpreters

Like compiled languages, interpreted languages also have a one-to-many relationship between the source code statements and the executable machine instructions. However, unlike compilers, which read the entire source file before producing a binary stream, interpreters process one source statement at a time. With so much work being done "on the fly," interpreters are typically much slower than compilers. At least five of the six steps required of compilers must also be carried out by interpreters, and these steps are carried out in "real time." This approach affords little opportunity for code optimization. Furthermore, error detection in interpreters is often limited to language syntax and variable type checking. Very few interpreters, for example, detect possible illegal arithmetic operations before they happen, or warn the programmer before the bounds of an array are exceeded. Some interpreted environments supply their own program editors that check syntax as statements are typed: if a user types "esle" instead of "else," the editor immediately issues a remark to that effect. Other interpreted environments allow the use of general-purpose text editors, delaying the syntax check until execution time. The latter approach is particularly risky when it is used for business-critical application programs. If the application happens to execute a branch of code that has not been checked for correct syntax, the program crashes, leaving the hapless user staring at an odd-looking system prompt, with his or her files perhaps only partially updated. Despite the sluggish execution speed and delayed error checking, there are good reasons for using an interpreted language. Chief among them is that interpreted languages permit source-level debugging, making them ideal for beginning programmers and end users alike. This is why, in 1964, two Dartmouth professors, John G. Kemeny and Thomas E. Kurtz, invented BASIC, the Beginner's All-purpose Symbolic Instruction Code. At that time, students' first programming experiences consisted of punching FORTRAN instructions onto 80-column cards. The cards were then run through a mainframe compiler, which often had a turnaround time measured in hours; sometimes days would pass before a clean compile and run could be achieved. In a drastic departure from batch-mode compilation, BASIC allowed students to type program statements during an interactive terminal session.
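The "one statement at a time" flavor of interpretation can be seen in a toy Python interpreter such as the one below (nothing like a real BASIC system, and it leans on Python's own eval for expressions). Notice how the divide-by-zero in the third statement is not discovered until that statement is actually reached.

def interpret(program):
    variables = {}
    for lineno, stmt in enumerate(program, start=1):
        name, _, expr = stmt.partition("=")
        try:
            # Each statement is analyzed and executed on the fly: no whole-
            # program optimization, and errors surface only when reached.
            variables[name.strip()] = eval(expr, {}, variables)
        except Exception as err:
            print(f"Runtime error at line {lineno}: {err}")
            break
    return variables

print(interpret(["i = 0", "j = i + 1", "k = j / 0"]))
# i and j are assigned; the divide-by-zero is not detected until line 3 runs.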


Page 420:
The BASIC interpreter, which ran continuously on the mainframe, provided immediate feedback to students. They could quickly correct syntax and logic errors, creating a more positive and effective learning experience. For the same reasons, BASIC was the language of choice on the earliest personal computer systems. Many first-time computer buyers were not experienced programmers, so they needed a language that would make it easy to teach themselves programming. BASIC was ideal for this purpose. Moreover, on a single-user personal system, very few people cared that BASIC interpretation was so much slower than running a compiled language.

8.5 JAVA: ALL OF THE ABOVE

In the early 1990s, James Gosling and his team at Sun Microsystems set out to create a programming language that would run on any computing platform. The mantra was to create a "write once, run anywhere" computing language. In 1995, Sun released the first version of the Java programming language. Owing to its portability and open specifications, Java has become enormously popular. Java code can run on virtually any computing platform, from the smallest handheld devices to the largest mainframes. Java's timing could not have been better: it is a cross-platform language that became implementable at the dawn of large-scale, Internet-based commerce, the perfect model for cross-platform computing. Although Java and some of its features were introduced briefly in Chapter 5, we now examine it in more detail. If you have ever studied the Java programming language, you know that the output of the Java compiler is a binary class file. This class file is executed by a Java Virtual Machine (JVM), which resembles a real machine in many respects. It has private memory areas addressable only by processes running within the machine, and it has its own bona fide instruction set architecture. This ISA is stack-based, to keep the machine simple and portable to virtually any computing platform. Of course, a Java Virtual Machine is not a real machine: it is a layer of software that sits between the operating system and the application program, which takes the form of a binary class file. Class files include the variables of a program as well as the methods (procedures) that manipulate those variables. Figure 8.11 illustrates how the JVM is a miniature computing machine with its own memory and method areas. Notice that the heap, the method area, and the "native method interface" area are shared among all of the threads of execution running within the machine. Deallocation of Java heap memory is (indelicately) called garbage collection, which the JVM (rather than the operating system) performs automatically. The Java native method area provides workspace for non-Java binaries, such as compiled C++ or assembly language modules. The JVM method area contains the binary code needed to run each application thread in the JVM; this is where the class variable data structures and the program instructions required by the class reside.



Page 421:
FIGURE 8.11 The Java Virtual Machine: a shared method area, heap, and native method area (the interface for programs written in other languages), plus, for each running thread (Thread 1 through Thread n), its own program counter and a stack of method frames.

The executable instructions of a Java program are stored in an intermediate form called bytecode, also introduced in Chapter 5. Java method bytecode is executed within processes called threads. The JVM automatically spawns several threads of its own, the program's main thread being one of them. Only one method can be active at a time in each thread, and programs may spawn additional threads to provide concurrency. Whenever a thread invokes a method, it creates a memory frame for that method. Part of this frame is used for the method's local variables, and another part for its private stack. Every Java class contains a kind of symbol table called a constant pool, an array that holds information about the data type of each of the class's variables and the initial value of each variable, as well as access flags for the variable (for example, whether it is public or private to the class). The constant pool also contains several structures other than those defined by the programmer. This is why Sun Microsystems refers to the entries in the constant pool (the elements of the array) as attributes. Among the attributes of every Java class are housekeeping items such as the name of the Java source file, part of its inheritance hierarchy, and pointers to other internal JVM data structures.


Page 422:
public class Simple {
    public static void main (String[] args) {
        int i = 0;
        double j = 0;
        while (i < 10) {
            i = i + 1;
            j = j + 1.0;
        } // while
    } // main()
} // Simple()

FIGURE 8.12 A simple Java program

To illustrate how the JVM executes method bytecode, consider the Java program shown in Figure 8.12. Java requires that the source code for this class be stored in a text file named Simple.java. The Java compiler reads Simple.java and does all of the things that other compilers do. Its output is a binary bytecode file named Simple.class. The Simple.class file can be run by any JVM of the same or later version than that of the compiler that created the class. These steps are shown in Figure 8.13. At execution time, a Java Virtual Machine must be running on the host system. When the JVM loads a class file, the first thing it does is verify the integrity of the bytecode by checking the format of the class file, checking the format of the bytecode instructions, and making sure that no illegal references are made. After this preliminary verification completes successfully, the loader performs a number of run-time checks as it places the bytecode in memory. Once all of the verification steps have been completed, the loader invokes the bytecode interpreter. This interpreter has six phases, in which it will:

1. Perform a link edit of the bytecode instructions by asking the loader to supply all referenced classes and system binaries, if they are not already loaded.
2. Create and initialize the main memory frame and its local variables.

FIGURE 8.13 Compiling and running a Java class: Java source code (a .java file) passes through the Java compiler to produce a .class file (bytecode), which the Java Virtual Machine's bytecode verifier, class loader, and bytecode interpreter then process.


Page 423:
3. Create and start the threads of execution.
4. While the threads are executing, manage heap storage by deallocating unused storage.
5. As each thread dies, deallocate its resources.
6. Upon program completion, kill any remaining threads and terminate the JVM.

Figure 8.14 shows a hexadecimal image of the bytecode for Simple.class. The address of each byte can be found by adding the value in the first (shaded) column to the column offset in the first (shaded) row. For convenience, we have translated the bytecode into characters wherever the binary value has a meaningful 7-bit ASCII value. You can see the name of the source file, Simple.java, beginning at address 06Dh. The name of the class begins at 080h. Readers familiar with Java will know that the Simple class is also known as the this class, and that its superclass is java.lang.Object, whose name begins at address 089h. Notice that our class file begins with the hexadecimal number CAFEBABE: it is the magic number that signals the start of a class file (and yes, it is politically incorrect!). A string of bytes indicating the language version of the class file follows the magic number. If this version number is greater than the version the interpreting JVM can support, the JVM shuts down. The executable bytecode begins at address 0E6h. The hexadecimal digits 16 at address 0E5h tell the interpreter that the executable method's bytecode is 22 bytes long. As in assembly languages, each executable bytecode has a corresponding mnemonic. Java currently defines 204 different bytecode instructions, so only one byte is needed to express the full complement of opcodes.

FIGURE 8.14 A hexadecimal and character dump of the binary image of Simple.class (the hexadecimal rows are omitted here); recognizable strings include <init>, ()V, Code, LineNumberTable, main, ([Ljava/lang/String;)V, SourceFile, Simple.java, Simple, and java/lang/Object.


Page 424:
These small opcodes help keep classes small, making them fast to load and easy to convert to binary instructions on the host system. Some bytecodes have their operands built into the mnemonic itself, so that no additional operand bytes are necessary: the iconst_5 mnemonic, for example, pushes the integer 5 onto the stack. Pushing larger constants onto the stack requires two bytecodes, the first for the operation and the second for the operand. As we mentioned earlier, the local variables for each class are kept in an array. Characteristically, the first few elements of this array are the most active, so there are distinct bytecodes for addressing these initial local array elements. Access to other positions in the array requires a 2-byte instruction: one byte for the opcode and a second for the offset of the array element. With that said, let's look at the bytecode for the main() method of Simple.class. We have extracted the bytecode from Figure 8.14 and listed it in Figure 8.15, along with mnemonics and some commentary. The leftmost column gives the relative address of each instruction; the thread-specific program counter uses this relative address to control the flow of the program. We now trace the execution of this bytecode so you can see how it works. When the interpreter begins executing this code, the PC is initially set to zero and the iconst_0 instruction is executed. This is the implementation of the int i = 0; declaration in the third line of the Simple.java source code. The PC advances and each initialization statement is executed in turn until the goto instruction at offset 4 is encountered. This instruction adds 11 decimal to the program counter, so its value becomes 0Fh, which points to the iload_1 instruction. At this point, the JVM has assigned initial values to i and j, and it now proceeds to check the initial condition of the while loop to see whether the loop body should be executed at all. To do so, it pushes the value of i (from the local variable array) onto the stack and then pushes the comparison value 0Ah. Notice that this is a small code optimization the compiler has performed for us. By default, Java stores integer values in 32 bits, occupying 4 bytes. The compiler, however, is smart enough to see that the decimal constant 10 is small enough to store in one byte, so it wrote code to push a single byte rather than four onto the stack. The comparison instruction, if_icmplt, pops i and 0Ah off the stack and compares their values (the lt at the end of the mnemonic stands for the less-than condition). If i is less than 10, 0Bh is subtracted from the PC, giving 7, which is the starting address of the loop body. When the statements within the loop body are done, execution branches back to the conditional processing at address 0Fh. Once the loop condition becomes false, the interpreter returns control to the operating system after doing some cleanup. Notice the bytes near the end of the binary class file image in Figure 8.14: they form the beginning of the line number table that associates program counter values with particular lines in the source program.


Page 425:
Offset (PC)   Bytecode    Mnemonic    Meaning

Variable initialization
0             03          iconst_0    Push the integer constant 0 onto the stack.
1             3C          istore_1    Pop the integer off the top of the stack and store it in the local variable array at position 1.
2             0E          dconst_0    Push the double-precision constant 0 onto the stack.
3             49          dstore_2    Pop the double off the top of the stack and store it in the local variable array at position 2.
4             A7 00 0B    goto        Skip ahead by the number of bytes given in the next two bytes; this adds 0B (11 decimal) to the program counter.

Loop body
7             1B          iload_1     Push the integer value from position 1 of the local variable array onto the stack.
8             04          iconst_1    Push the integer constant 1.
9             60          iadd        Add the two integer operands at the top of the stack (the sum is pushed).
A             3C          istore_1    Pop the integer off the top of the stack and store it in the local variable array at position 1.
B             28          dload_2     Push the double value from position 2 of the local variable array onto the stack.
C             0F          dconst_1    Push the double-precision constant 1 onto the stack.
D             63          dadd        Add the two double-precision values at the top of the stack.
E             49          dstore_2    Pop the double sum off the top of the stack and store it in position 2 of the local variable array.

Loop condition
F             1B          iload_1     Push the integer from position 1 of the local variable array.
10            10 0A       bipush      Push the next byte value (10 decimal) onto the stack.
12            A1 FF F5    if_icmplt   Compare the two integer values at the top of the stack for the "less than" condition. If true (i < 10), add the following value to the program counter (FFF5 = -11 decimal).
15            B1          return      Otherwise, return.

FIGURE 8.15 The annotated bytecode of the main() method of Simple.class
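Readers without a JVM at hand can see an analogous listing by letting Python disassemble the same loop: CPython compiles source code to bytecode for its own stack-based virtual machine, and the standard dis module prints offsets, mnemonics, and operands much as Figure 8.15 does. (The exact opcodes vary between Python versions.)

import dis

def simple():
    i = 0
    j = 0.0
    while i < 10:
        i = i + 1
        j = j + 1.0

# Prints offsets, mnemonics such as LOAD_CONST, STORE_FAST, COMPARE_OP, and
# jump instructions -- the CPython analogue of the JVM listing above.
dis.dis(simple)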


Page 426:
FIGURE 8.16 A cross-reference of program counter values (00, 02, 04, 07, 0B, 0F, 15) to the source lines of Simple.java

The two bytes beginning at address 106h tell the JVM that the line number table below them contains seven entries. By filling in a few details, we can build the cross-reference shown in Figure 8.16. Notice that if the program were to fail when PC = 9, for example, the offending source program line would be line 6: judging from the generated bytecode, source line 7 does not begin until the PC is greater than or equal to 0Bh but less than 0Fh. Because the JVM does all of the work of loading, verifying, and interpreting its own bytecode, its performance cannot possibly match the performance of a compiled language. This is true even when acceleration software such as Java's Just-In-Time (JIT) compiler is used. The trade-off, however, is that class files can be created and stored on one platform and executed on a completely different platform. For example, we can write and compile a Java program on an Alpha RISC server, and it will run just as well on CISC Pentium-class clients that download the bytecode of the class file. This "write once, run anywhere" paradigm is of enormous benefit to enterprises with disparate and geographically separate systems. Java applets (bytecode that runs in browsers) are essential to Web-based transactions and e-commerce. Ultimately, all that is required of the user is a (reasonably) up-to-date browser. Given its portability and relative ease of use, the Java language and its virtual machine environment form an ideal middleware platform.

8.6 DATABASE SOFTWARE

The most valuable asset of most enterprises is not their buildings or their equipment, but their data. Regardless of the nature of the enterprise, whether it is a private business, an educational institution, or a government agency, the definitive record of its history and current state is imprinted in its data. If the data is inconsistent with the state of the enterprise, or if the data is inconsistent with itself, its usefulness is questionable and problems are certain to arise.


Page 427:
Any computer system that supports an enterprise is host to interrelated application programs. These programs update data to reflect changes in the state of the enterprise. Groups of interrelated programs are often referred to as application systems, because they work together as an integrated whole: few components are of much use standing on their own. Application system components share the same data, and usually, but not necessarily, share the same computing environment. Today's application systems span many platforms: desktop microcomputers, file servers, and mainframes. With Web-based cooperative computing becoming all the rage, sometimes we neither know nor care where the application is even running. Although each platform brings its unique benefits and challenges to the science of data management, the fundamental concepts of database management software have been unchanged for more than three decades. Early application systems recorded data on magnetic tape or punched cards. Because of their sequential nature, tape and punched card updates had to be run in batch mode for the sake of efficiency. Once data became directly accessible on magnetic disk, the system architecture no longer forced updates against flat files to be batched. Old habits are hard to break, however, and programs are expensive to rewrite; consequently, flat file processing persisted for years after most card readers had become museum pieces. In flat file systems, each application program is free to define whatever data objects it needs. For this reason, it is difficult to enforce a consistent view across the system. Suppose, for example, that we have an accounts receivable system, that is, an application system that keeps track of who owes us how much money and for how long it has been owed. The program that produces the monthly invoices might post monthly transactions to a six-digit field (or data element) called CUST_OWE, while another program in the same system refers to the same quantity as CUST_BAL and defines it to be five digits wide. It is nearly certain that at some point information will be lost and confusion will reign. At some point during the month, after a few thousand dollars have "gone missing," the debuggers will finally figure out that CUST_OWE is the same data element as CUST_BAL and that the problem was caused by a field truncation or overflow condition. Database management systems (DBMSs) were created to prevent these predicaments. They enforce order and consistency upon file-based application systems. With database systems, programmers are no longer free to describe and access a data element in any manner they please. There is one, and only one, definition of the data elements in a database management system. This definition is the system's database schema. On some systems, a distinction is made between the programmer's view of the database, its logical schema, and the computer system's view of the database, called its physical schema. The database management system integrates the physical and logical views of the database. Application programs use the logical schema presented by the database management system to read and update data within the physical schema, under the control of the database management system and the operating system. Figure 8.17 illustrates this relationship.


Page 428:
FIGURE 8.17 The relationship of a database management system to other system components: application programs access the data files through the database manager, which in turn works through the operating system and the hardware.

The individual data elements defined by a database schema are organized into logical structures called records, which are grouped together into files. Related files collectively form the database. Database architects are mindful of application requirements as well as performance when they create logical and physical schemas. The general idea is to minimize redundancy and wasted space while maintaining a desired level of performance, usually measured in terms of application response time. A banking system, for example, would not place a customer's name and address in every canceled check record in the database. That information would be kept in a master account file that uses an account number as its key field. Each canceled check record would then need to carry only the account number along with the information particular to the check itself. Database management systems vary widely in how their data is physically organized. Virtually every database vendor has invented proprietary methods for managing and indexing files. Most systems use a variant of the B+ tree data structure. (See Appendix A for more details.)
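The "one and only one definition" idea is easy to see with any SQL database. The sketch below uses Python's built-in sqlite3 module and an invented account table: the schema, its CHECK constraint, and its index are defined once, and every application that opens the database sees the same CUST_BAL-style definition.

import sqlite3

db = sqlite3.connect(":memory:")      # an in-memory database for illustration
db.execute("""
    CREATE TABLE account (
        acct_no   INTEGER PRIMARY KEY,                 -- key field
        name      TEXT NOT NULL,
        cust_bal  NUMERIC NOT NULL DEFAULT 0
                  CHECK (cust_bal BETWEEN -999999 AND 999999)  -- one shared rule
    )""")

# A second index speeds lookups by name, at the cost of extra update work.
db.execute("CREATE INDEX idx_account_name ON account(name)")

db.execute("INSERT INTO account (acct_no, name, cust_bal) VALUES (?, ?, ?)",
           (1001, "J. Smith", 300))
print(db.execute("SELECT name, cust_bal FROM account WHERE acct_no = 1001").fetchone())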


Page 429:
In Chapter 7, we studied the organization of files on disk. We learned that most disk systems read data in blocks, with the smallest addressable unit being a sector, and that most large systems read an entire track at a time. As an index structure becomes very deep, the likelihood increases that more than one read operation will be needed to traverse the index tree. So how do we organize the tree to keep disk I/O as infrequent as possible? Is it better to build very large internal index nodes so that more record values can be spanned per node? That would reduce the number of nodes per level and perhaps allow an entire tree level to be read in a single operation. Or is it better to keep internal node sizes small so that more layers of the index can be read in a single read operation? The answers can be found only in the context of the particular system on which the database is running. An optimal answer may even depend on the data itself. For example, if the keys are sparse, that is, if many possible key values go unused, we might choose one index organization scheme; with densely populated index structures, we might choose another. Regardless of the implementation, database tuning is a nontrivial task that requires an understanding of the database management software, the storage system architecture, and the details of the data population managed by the system.

A database usually contains more than one index. In a customer database, for example, it would be useful to retrieve records by customer account number as well as by customer name. Each index, of course, adds overhead to the system, both in space (to store the index) and in time (because every index must be updated when records are added or deleted). The trick is to define enough indexes to allow fast record retrieval in most circumstances, but not so many that the system is burdened with an excessive amount of index maintenance.

A database management system must do more than provide fast and easy access to large amounts of data; it must also ensure that the integrity of the database is preserved at all times. This means that a database management system must allow users to define and manage rules, or constraints, placed on certain critical data elements. Sometimes these constraints are simple rules such as "The customer number cannot be null." More complex rules dictate which particular users may see which data items and how files with interrelated data elements are to be updated. Defining and enforcing data integrity and security constraints is critical to the usefulness of any database management system.

Another core component of a database management system is its transaction manager. A transaction manager handles updates to data objects so that the database is always in a consistent state. Formally, a transaction manager controls changes to the state of data so that each transaction has the following properties:

• Atomicity: All related updates take place within the bounds of the transaction, or no updates are made at all.


Page 430:
• Consistency: All updates comply with the constraints placed on all data elements.
• Isolation: No transaction can interfere with the activities or updates of another transaction.
• Durability: Successful transactions are written to "durable" media (for example, magnetic disk) as soon as possible.

These four properties are known as the ACID properties of transaction management. The importance of the ACID properties is easily understood through an example. Say you have mailed your monthly credit card payment and, shortly after sending it, you go to a nearby store to make another purchase with your card. Suppose also that at the very moment the sales clerk is swiping your plastic through a card reader, a clerk at the bank is entering your payment into the bank's database. Figure 8.18 illustrates one way in which a central computer system might process these transactions. In the figure, the accounting clerk finishes the payment update before the sales clerk finishes the purchase, leaving you with an outstanding balance of $300. The transactions could just as easily take place as shown in Figure 8.19, where the sales clerk finishes first and the account ends up with a balance of $0.00: you have just gotten your purchase for free! Although that outcome would probably make you happy, it is just as likely that you would end up paying your bill twice (or arguing with the bank until the records were corrected). The situation just described is called a race condition, because the final state of the database depends not on the correctness of the updates but on which transaction happens to finish last.

Transaction managers prevent race conditions through their enforcement of atomicity and isolation. They do this by placing various types of locks on data records. In the example of Figure 8.18, the accounting clerk should be granted an "exclusive" lock on your credit card record; the lock is released only after the updated balance has been written back to disk.

FIGURE 8.18 A transaction scenario (both clerks read a $200 balance; the accounting clerk's $200 payment is written first and then overwritten by the sales clerk's $100 purchase, leaving a $300 balance)
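The following minimal Python sketch (not from the text) replays the lost-update race of Figures 8.18 and 8.19: both clerks read the same $200 balance, and the final value depends only on whose write happens to land last. The dictionary record and the dollar amounts follow the text's example.

account = {"balance": 200}

def run(first_writer, second_writer):
    sales_copy = account["balance"]            # sales clerk reads $200
    accounting_copy = account["balance"]       # accounting clerk also reads $200
    updates = {
        "sales": sales_copy + 100,             # new purchase
        "accounting": accounting_copy - 200,   # payment posted
    }
    account["balance"] = updates[first_writer]
    account["balance"] = updates[second_writer]   # overwrites the first write
    return account["balance"]

print(run("accounting", "sales"))   # 300  (Figure 8.18: the payment is lost)
account["balance"] = 200
print(run("sales", "accounting"))   # 0    (Figure 8.19: the purchase is free)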


Page 431:
FIGURE 8.19 Another transaction scenario (both clerks again read the $200 balance; this time the sales clerk's $300 write lands first and is overwritten by the accounting clerk's $0 write)

While the accounting clerk's transaction holds the lock, the sales clerk receives a message that the system is busy. Once the payment update is complete, the transaction manager releases the accounting clerk's lock and immediately places one for the sales clerk. The corrected sequence of transactions is shown in Figure 8.20.

There is some risk in this approach. Any time an entity is locked in a complex system, there is a potential for deadlock. Systems can manage their locks intelligently to reduce the risk of deadlock, but every step taken to prevent or detect deadlock places additional overhead on the system.

FIGURE 8.20 An isolated, atomic transaction (the accounting clerk locks the record, reads $200, and writes $0; the sales clerk waits for the lock, then reads $0 and writes $100)
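A minimal Python sketch (not from the text) of the locked, isolated sequence in Figure 8.20: each clerk's transaction acquires an exclusive lock on the record, so the second transaction reads the balance only after the first one's write has completed. A threading.Lock stands in for the transaction manager's record lock, and the sleep values are arbitrary.

import threading, time

account = {"balance": 200}
record_lock = threading.Lock()

def transaction(delta, think_time):
    with record_lock:                          # exclusive lock on the record
        balance = account["balance"]           # the read happens inside the lock
        time.sleep(think_time)                 # simulate work between read and write
        account["balance"] = balance + delta   # write, then release the lock

accounting = threading.Thread(target=transaction, args=(-200, 0.1))
sales = threading.Thread(target=transaction, args=(+100, 0.1))

accounting.start()         # the accounting clerk gets the lock first, as in Figure 8.20
time.sleep(0.01)
sales.start()              # the sales clerk must wait for the lock to be released
accounting.join(); sales.join()

print(account["balance"])  # 100: $200 minus the $200 payment plus the $100 purchase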


Page 432:
With too much lock management, transaction performance suffers. In general, deadlock prevention and detection take a back seat to performance considerations: deadlock situations are rare, whereas performance is a factor in every transaction.

Another impediment to performance is data logging. During record updates (which include record deletions), database transaction managers write images of the transaction to a log file. Each update therefore requires at least two writes: one to the primary file and one to the log file. The log file is important because it helps the system preserve transaction integrity when a transaction must be aborted because of an error. If, for example, the database management system captures an image of the record being updated before the update is applied, that "before" image can be quickly written back to disk, erasing all subsequent updates to the record. On some systems, both "before" and "after" images are captured, making error recovery relatively easy. Prudent system administrators keep these log files for years in protected tape libraries.

Log files are especially important tools for data backup and recovery. Some databases are simply too large to be copied to tape or optical disk every night; it would take too long. Instead, full backups of the database files are taken only once or twice a week, while the log files are backed up every night. If a disaster occurs sometime between these full backups, the log files from the intervening days are used for forward recovery, rebuilding each day's transactions as if they had been rekeyed by the users against the image of the complete database taken days earlier.

The database access controls just discussed (security, index management, and lock management) consume enormous system resources. In fact, this overhead was so great on early systems that some people argued, successfully, for keeping their file-based systems because their host computers could not bear the database management load. Even with today's extremely powerful systems, performance can suffer severely if the database system is not properly tuned and maintained. High-volume transaction environments are often staffed by systems programmers and database analysts whose sole job is to keep the system running at peak performance.

8.7 TRANSACTION MANAGERS

One way to give a database management system less work to do is to shift some of its functions to other system components. Transaction management is one component that is often separated from the core data management functions of a database management system. Standalone transaction managers typically also incorporate load balancing and other optimizations that are not suitable for inclusion in core database software, improving the efficiency of the entire system. Transaction managers are particularly useful when business transactions span two or more separate databases.
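The following minimal Python sketch (not from the text) illustrates before-image logging as just described: every update writes the record's prior state to a log before touching the primary file, so an aborted transaction can be rolled back. The in-memory dictionaries are illustrative stand-ins for the primary file and the log file.

primary_file = {"10042": {"balance": 200}}
log_file = []                                         # a list of before-images

def update(key, new_balance):
    log_file.append((key, dict(primary_file[key])))   # write the before-image first
    primary_file[key]["balance"] = new_balance        # then update the primary file

def rollback():
    # Restore before-images in reverse order, erasing the transaction's updates.
    while log_file:
        key, before_image = log_file.pop()
        primary_file[key] = before_image

update("10042", 0)
update("10042", 100)
print(primary_file["10042"])    # {'balance': 100}
rollback()
print(primary_file["10042"])    # {'balance': 200}, back to the original state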


Page 433:
Neither of the participating databases can be responsible for the integrity of its peer databases, but an external transaction manager can keep them all in sync.

One of the earliest and most successful transaction managers is IBM's Customer Information Control System (CICS). CICS has been around for more than three decades, having been introduced in 1968. CICS is noteworthy because it was the first system to integrate transaction processing (TP), database management, and communications management into a single suite of applications. Even so, the CICS components were (and still are) loosely coupled so that each component can be tuned and managed as a separate entity. The CICS communications management component controls the interactions, called conversations, between dumb terminals and a host system. Freed from the burdens of protocol management, the database and application programs do their jobs more efficiently. CICS was also one of the first application systems to employ remote procedure calls in a client-server environment. In its contemporary incarnation, CICS can manage transaction processing between thousands of Internet users and large host systems. Yet even today CICS closely resembles its 1960s architecture, an architecture that has become the paradigm for virtually every transaction processing system invented since.

The modern CICS architecture is shown schematically in Figure 8.21. As you can see in the diagram, a program called a transaction processing monitor (TP monitor) is the pivotal component of the system. It accepts input from the telecommunications manager and authenticates the transaction against data files that contain lists of which users are authorized for which transactions. Sometimes this security information includes location restrictions, defining, for example, which locations may run particular transactions (an intranet versus the Internet, say). Once the transaction is authenticated, the TP monitor starts the application program requested by the user. When the application needs data, the TP monitor issues a request to the database management software. It does all of this while maintaining atomicity and isolation among many concurrent application processes.

Although all of these components can reside on the same system, there is no reason to keep them together. Some distributed architectures dedicate groups of small servers to running TP monitors; these systems are physically distinct from the systems that hold the database management software. Nor do the systems running the TP monitors need to be of the same class as the systems running the database software. For example, communications management might be handled by Sun Unix RISC systems while the database software runs on a Unisys ES/7000 under the Windows Datacenter operating system, with transactions entered through desktop or mobile personal computers. This configuration is known as a 3-tiered architecture, with each platform representing one of the tiers; the general case is an n-tiered, or multitiered, architecture. With the advent of Web computing and e-commerce, tiered TP architectures are becoming increasingly popular. Many vendors, including Microsoft, Netscape, Sybase, SAP AG, and IBM with CICS, have been successful in supporting n-tier transaction systems.
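The following minimal Python sketch (not from the text) mimics the TP-monitor flow just described: authenticate the user against security data, then start the requested application program. The user names, transaction names, and handler function are invented assumptions and are not part of CICS.

authorized = {                       # security data: who may run which transaction
    "clerk01": {"POST_PAYMENT"},
    "teller07": {"POST_PAYMENT", "OPEN_ACCOUNT"},
}

def post_payment(args):
    return f"payment of {args['amount']} posted to account {args['account']}"

applications = {"POST_PAYMENT": post_payment}    # application programs by name

def tp_monitor(user, transaction, args):
    if transaction not in authorized.get(user, set()):
        return "rejected: user not authorized for this transaction"
    return applications[transaction](args)       # start the application program

print(tp_monitor("clerk01", "POST_PAYMENT", {"account": "10042", "amount": 100}))
print(tp_monitor("clerk01", "OPEN_ACCOUNT", {}))   # rejected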


Page 434:
FIGURE 8.21 The CICS architecture (browser sessions reach the system over the Internet or an intranet through the telecommunications administrator, a port listener daemon; the transaction processing monitor checks the security data, starts the application programs, and passes their data requests to the database manager, which accesses the application data)

Of course, it is impossible to say which architecture is "best" for a given enterprise; each has its own advantages and disadvantages. The prudent system architect weighs all of the cost and reliability factors when designing a TP system before deciding which architecture makes the most sense for a particular environment.

CHAPTER SUMMARY

This chapter has described the mutual dependence of computer hardware and software. System software works in concert with system hardware to create a functional and efficient system. System software, including operating systems and application software, is an interface between the user and the hardware, allowing the low-level architecture of a computer to be treated abstractly. This gives users an environment in which they can concentrate on solving problems rather than on operating the system.


Page 435:
404, , Chapter 8 / System Software, , The interaction and deep interdependence between hardware and software is most evident in the design of the operating system. In their historical development, operating systems started with an "open purchase" approach, then moved to an operator-driven batch approach, and then evolved to support interactive multiprogramming and distributed computing. Modern operating systems provide a user interface as well as a variety of services including memory management, process management, general resource management, scheduling, and protection. Knowledge of operating system concepts is essential for all computer professionals. Virtually all system activities are tied to operating system services. When the operating system fails, the entire system fails. However, you should realize that not all computers have or need operating systems. This is particularly true in the case of embedded systems. Your car's computer or microwave is so simple it doesn't need an operating system. However, for computers that go beyond the simple task of running a single program, operating systems are essential for efficiency and ease of use. Operating systems are one of many examples of large software systems. Their study provides valuable lessons applicable to software development in general. For these and many other reasons, we sincerely encourage further exploration of operating system design and development. Assemblers and compilers provide the means by which human-readable computer languages ​​are translated into a binary format suitable for execution on a computer. Interpreters also produce binary code, but the code is generally not as fast or efficient as that generated by an assembler. The Java programming language produces code that is interpreted by a virtual machine located between its bytecode and the operating system. Java code runs slower than binary programs, but is portable to a wide variety of platforms. The ACID properties of a database system ensure that the data is always in a consistent state. Building large and reliable systems is a major challenge facing computing today. By now, you understand that a computer system is much more than hardware and programs. Enterprise-class systems are aggregations of interdependent processes, each with its own purpose. The failure or poor performance of any of these processes will have a detrimental effect on the entire system, if only on the perception of its users. As you continue with your career and education, you will study many of the topics in this chapter in more detail. If you are a system administrator or system programmer, you will master these ideas as they are applied in the context of a specific operating environment. No matter how clever we are at writing our programs, there is little we can do to compensate for the performance of any of the system components our programs depend on. We invite you to dive into Chapter 10, where we take a closer look at system performance issues. FURTHER READING The most interesting material in the area of ​​systems software is that which accompanies certain vendor products. In fact, you can often judge the quality of a supplier


Page 436:
References, , 405, , product for the quality and care with which the documentation is prepared. A visit to a vendor's website can sometimes reward you with a first-rate presentation of the theoretical foundation of their products. Two of the best vendor sites on this topic are IBM and Sun: www.software.ibm.com and www.java.sun.com. If you persevere, you will undoubtedly find others. Hall's (1994) book on client-server systems provides an excellent introduction to client-server theory. Explore a number of products that were popular when the book was written. Stallings (2001), Tanenbaum (1997), and Silberschatz, Galvin, and Gagne (2001) provide excellent coverage of the operating system concepts presented in this chapter, as well as more advanced topics. Stallings includes detailed examples of various operating systems and their relationship to actual machine hardware. An illuminating account of the development of OS/360 can be found in Brooks (1995). Gorsline's book on assembly (1988) offers one of the best treatments of how assemblers work. It also dives into the details of macro linking and assembly. Aho, Sethi and Ullman (1986) wrote the "definitive" compiler book. Often called "The Dragon Book" because of its cover illustration, it stayed in print for nearly two decades due to its clear and comprehensive coverage of compiler theory. Every serious computer scientist should have this book on hand. Sun Microsystems is the premier source for everything related to the Java language. Addison-Wesley publishes a series of books detailing the finer points of Java. Lindholm and Yellin's The Java Virtual Machine Specification (1999) is one of the books in the series. It will provide some of the details of creating class files that we have covered in this introductory material. Lindholm and Yellin's book also includes a complete list of Java bytecode instructions with their binary equivalents. A careful study of this work is sure to give you a new perspective on language. Although somewhat dated, Gray and Reuter's (1993) book on transaction processing is comprehensive and easy to read. This will give you a good foundation for further study in this area. A highly regarded and comprehensive treatment of database theory and applications can be found in Silberschatz, Korth, and Sudarshan (2001)., REFERENCES, Aho, Alfred V., Sethi, Ravi, and Ullman, Jeffrey D. Techniques and tools., Reading, MA: Addison-Wesley, 1986., Brooks, Fred. The mythical man-month. Reading, MA: Addison-Wesley, 1995., Gorsline, George W. Assembly and Assemblers: The Motorola MC68000 Family. Englewood, Cliffs, NJ: Prentice Hall, 1988, Gray, Jim and Reuter, Andreas. Transaction Processing: Concepts and Techniques. San Mateo, CA:, Morgan Kaufmann, 1993., Hall, Carl. Technical Fundamentals of Client/Server Systems. New York: Wiley, 1994., Lindholm, Tim and Yellin, Frank. The Java Virtual Machine Specification, 2nd ed. Reading, MA: Addison-Wesley, 1999.


Page 437:
406, , Chapter 8 / System Software, Silberschatz, Abraham, Galvin, Peter and Gagne, Greg. Operating System Concepts, 6th ed. Reading, MA: Addison-Wesley, 2001., Silberschatz, Abraham, Korth, Henry F., and Sudarshan, S. Database Systems Concepts, 4th ed., Boston, MA: McGraw-Hill, 2001., Stallings, W. Operating Systems, 4th ed. New York: Macmillan Publishing Company, 2001., Tanenbaum, Andrew and Woodhull, Albert. Operating Systems, Design and Implementation, 2nd, ed. Englewood Cliffs, NJ: Prentice Hall, 1997., , REVIEW OF ESSENTIAL TERMS AND CONCEPTS, 1. What was the primary goal of early operating systems compared to the goals of today's systems? 2. What were the improvements in computing operations generated by resident monitors? 3. Regarding printer output, how was the word spool derived? 4. Describe how multiprogramming systems differ from multiprogramming systems. time sharing., 5. What is the most critical factor in the operation of real-time systems?, 6. Can multiprocessor systems be classified by the way they communicate?, , How are they classified in this chapter ?, 7. How does a distributed operating system differ from a network operating system?, 8. What does transparency mean?, 9. Describe the two divergent philosophies of operating system kernel design., 10. Which are the benefits and drawbacks of a GUI operating system interface?, 11. With or Is long-term process scheduling different from short-term process scheduling?, 12. What is preventive scheduling?, 13. What method of programming processes is more useful in a timesharing environment?, 14. What is process scheduling? method that has proven to be ideal?, 15. Describe the steps involved in executing a context switch., 16. Besides process management, what are the other two important functions of an operating system?, 17. What is an overlay? Why are overlays no longer needed in large computing systems? 18. The operating system and a user program have two different perceptions of a virtual machine. Explain how they differ. 19. What is the difference between a subsystem and a logical partition? 20. Name some advantages of server consolidation. Is server consolidation a good idea for all companies? 21. Describe the programming language hierarchy. Why is a triangle a suitable symbol to represent this hierarchy? 22. How is absolute code different from relocatable code?


Page 438:
Exercises, , 407, , 23. What is a link editor used for? How is it different from a dynamic link library? for your, 27., 28., 29., 30., 31., 32., 33., 34., 35., portability in different hardware environments? edited., Java compilers produce _________ which is interpreted at runtime., What is the magic number that identifies a Java class file?, How does a logical database schema differ from a database schema? of physical data?, What data structure is most commonly used to index databases?, Why are database reorganizations necessary?, Explain the ACID properties of a database system, What Is it a race condition? The data in the database records has two purposes. What are they?, What services do transaction managers provide?, , EXERCISES, 1. What are the limitations of a computer without an operating system?, , How would a user load and run a program?, 2. Microkernels try to provide the smallest possible kernel by putting much of the operating system support in additional modules. What do you think are the minimum services that the kernel should provide? 3. If you were writing code for a real-time operating system, what constraints would you like to put on the system? between multiprogramming and multiprocessing? Multiprogramming and multithreading?, ◆ 5. Under what circumstances is it desirable to assemble groups of processes and programs into subsystems running on a large computer? What would be the advantages of creating logical partitions on this system?, 6. What would be the advantages of using subsystems and logical partitions on the same machine? 🇧🇷 Why is relocatable code preferred? 8. Suppose relocatable program code did not exist. How would the memory paging process become more complex? in a pass over the source file? How would code written for a one-step assembler be different from code written for a two-step assembler?


Page 439:
408, , Chapter 8 / System Software, 11. Why should assembly language be avoided for general application development?, , Under what circumstances is assembly language preferred or necessary?, 12. Under what circumstances Would you argue in favor of using assembly language, , code to develop an application program?, 13. What are the advantages of using a compiled language over an interpreted one?, , Under what circumstances would you choose to use an interpreted language?, 14 Discuss the following questions regarding compilers: a) Which phase of a compiler would give you a syntax error, b) Which phase would complain about undefined variables, c) If you tried to add an integer to a string, which compiler phase would issue the error message? ?, 15. Why is the execution environment of a Java class called a virtual machine? How does this virtual machine compare to a real machine running code written in C? .We declare that only one method at a time can be active on each thread running in the JVM. Why do you think that is so? 18. The Java bytecode for accessing a class's array of local variables is at most two bytes long. One byte is used for the opcode, the other indicates the offset in the array. How many variables can be contained in the local variables array? What do you think happens when that number is exceeded?, ◆, , 19. Java is called an interpreted language, but Java is a compiled language that produces, , a binary output stream. Explain how this language can be compiled and interpreted. 20. We claim that the performance of a Java program running on the JVM cannot be equal to that of a regular compiled language. Explain why this occurs., 21. Answer the following regarding database processing:, ◆, , a) What is a race condition? Give an example., , ◆, , b) How can race conditions be avoided?, , ◆, , c) What are the risks of preventing race conditions?, , 22. In what way can architectures of data processing are n-level transactions? superior to individual tiered architectures? What usually costs more? 23. To improve performance, your company decided to replicate your product database across multiple servers so that not all transactions go through a single system. What types of problems should be considered? 24. We said that the deadlock risk is always present whenever a system resource is blocked. Describe one way that a deadlock can occur.


Page 440:
Exercises, , 409, , *25. Look for various command line interfaces (such as Unix, MS-DOS, and VMS) and different Windows interfaces (such as any Microsoft Windows, macOS, and KDE products). a) Consider some of the main commands, such as getting a directory listing, removing a file, or changing directory. Explain how each of these commands is implemented in the various operating systems you studied. List and explain some of the commands that are easier to use in a GUI than in a command line interface. c) What type of interface do you prefer? Because?


Page 442:
Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction, and skillful execution; it represents the wise choice of many alternatives. —William A. Foster

It would appear that we have reached the limits of what it is possible to achieve with computer technology, although one should be careful with such statements, as they tend to sound pretty silly in 5 years. —John von Neumann, 1949

CHAPTER 9
Alternative Architectures

9.1 INTRODUCTION

Our previous chapters have taken excursions into the background of computing technology. The presentations have focused squarely on uniprocessor systems, from the perspective of the computer professional. By now you should understand the functions of the various hardware components and be able to see how each contributes to the overall performance of a system. This understanding is vital not only to hardware design, but also to the efficient implementation of algorithms. Most people become familiar with computer hardware through their experiences with personal computers and workstations, which leaves one important area of computer architecture untouched: that of alternative architectures. The focus of this chapter, therefore, is to introduce some of the architectures that transcend the classical von Neumann approach.

This chapter discusses RISC machines, architectures that exploit instruction-level parallelism, and multiprocessing architectures, with a brief summary of parallel processing. We begin with the notorious RISC versus CISC debate, to give you an idea of the differences between these two ISAs and their relative advantages and disadvantages. We then provide a taxonomy by which the various architectures can be classified, with an overview of how parallel architectures fit into the classification. Next we consider topics relevant to instruction-level parallel architectures, emphasizing superscalar architectures and reintroducing EPIC (explicitly parallel instruction computing) and VLIW (very long instruction word) designs. Finally, we provide a brief introduction to multiprocessor systems and some alternative approaches to parallelism.

Computer hardware designers began to reevaluate various architectural principles in the early 1980s. The first target of this reevaluation was the instruction set architec-


Page 443:
412, , Chapter 9 / Alternative architectures, , ture. The designers wondered why a machine needed such a large set of complex instructions when only about 20% of the instructions were used most of the time. This question led to the development of RISC machines, which we first introduced in Chapters 4 and 5, and to which we now devote an entire section of this chapter. The spread of RISC designs has led to a unique marriage of CISC and RISC. Many architectures now use RISC kernels to implement CISC architectures. Chapters 4 and 5 describe how new architectures like VLIW, EPIC, and multiprocessors are taking over a large percentage of the hardware market. The invention of architectures that take advantage of instruction-level parallelism has led to techniques that accurately predict the outcome of forks in program code before the code is executed. Prefetch instructions based on these predictions greatly increased the performance of the computer. In addition to predicting the next instruction to be fetched, high degrees of instruction-level parallelism gave rise to ideas such as speculative execution, in which the processor guesses the value of a result before it has actually been computed. Alternative architectures also include multiprocessor systems. For these architectures, we go back to the lesson we learned from our ancestors and the friendly ox. If we use an ox to uproot a tree and the tree is too big, we don't try to create a bigger ox. Instead, we use two oxen. Multiprocessing architectures are analogous to oxen. We need them if we want to move the stumps of intractable problems. However, multiprocessor systems present us with unique challenges, particularly with respect to cache coherence and memory consistency. We note that while some of these alternative architectures are becoming established, their actual progress depends on their incremental cost. Currently, the relationship between the performance provided by advanced systems and their cost is not linear, with the cost far outweighing the performance gains in most situations. This makes it prohibitively expensive to integrate these architectures into core applications. However, alternative architectures have their place in the market. Highly numerical science and engineering applications require machines that outperform standard single-processor systems. For computers in this league, cost is generally not an issue. As you read this chapter, keep in mind the previous generations of computers, introduced in Chapter 1. Many people believe that we have entered a new generation based on these alternative architectures, particularly in the area of ​​parallel processing., 9.2, RISC MACHINES, Introduction to RISC architectures in the context of instruction set design in Chapters 4 and 5. Recall that RISC machines are so named because they originally offered a smaller instruction set compared to CISC machines. As RISC machines have developed, the term "reduced" has become something of a misnomer, and it is even more so now. The original idea was to provide a minimal set of instructions that could perform all the essential operations: move data, ALU operations, and branches. Only explicit load and store instructions allowed access to memory.


Page 444:
9.2 / RISC Machines, , 413, , Complex instruction set designs were motivated by the high cost of memory. Having more complexity in each instruction meant that programs could be smaller, taking up less storage space. CISC ISAs use variable-length instructions, which keep simple instructions short and allow for longer, more complicated instructions. Also, CISC architectures include a large number of instructions that directly access memory. So what we have at this point is a dense, powerful, variable-length instruction set that results in a variable number of clock cycles per instruction. Some complex instructions, particularly those that access memory, require hundreds of cycles. In certain circumstances, computer designers have found it necessary to slow down the system clock (by increasing the interval between clock ticks) to allow sufficient time for instructions to complete. All this translates into a longer execution time. Human languages ​​exhibit some of the qualities of RISC and CISC and serve as a good analogy for understanding the differences between the two. Suppose you have a Chinese pen pal. Suppose each of you is fluent in both English and Chinese. They both want to keep the cost of their correspondence to a minimum, although they both enjoy sharing long letters. You can choose between using expensive airmail paper, which will save you a considerable amount of postage, or using plain paper and paying more for stamps. A third alternative is to put more information on each page written. Compared with the Chinese language, English is simple but extensive. Chinese characters are more complex than English words, and what may require 200, English letters may only require 20 Chinese characters. Chinese correspondence requires fewer symbols, which saves paper and postage. However, reading and writing Chinese requires more effort because each symbol contains more information. The English words are analogous to RISC instructions, while the Chinese symbols are analogous to CISC instructions. For most English speakers, “processing” the letter in English would take less time, but would also require more physical resources. Although RISC is considered by many sources to be a revolutionary new design, its seeds were planted in the mid-1970s through the work of IBM's John Cocke. Cocke, began building his experimental Model 801 mainframe in 1975. Initially, this system received little attention and details of it were not released until many years later. Meanwhile, David Patterson and David Ditzel published their acclaimed "Case for a Reduced Instruction Set Computer" in 1980. This paper spawned a radically new way of thinking about computer architecture and introduced the acronyms CISC and RISC into scientific lexicon. computer. The new architecture proposed by Patterson and Ditzel advocated simple instructions, all the same size. Each statement would do less work, but the time required to execute the statement would be constant and predictable. Support for RISC machines came through programming observations on CISC machines. These studies revealed that data movement instructions accounted for approximately 45% of all instructions, ALU operations (including arithmetic, comparison, and logic) accounted for 25%, and branching (or flow control) accounted for 30%. %. Although there were many complex instructions, few were used. This discovery, combined with the advent of cheaper and more abundant products,


Page 445:
memory, and the development of VLSI technology, led to a different type of architecture. Cheaper memory meant that programs could use more storage. Longer programs consisting of simple, predictable instructions could replace shorter programs consisting of complicated, variable-length instructions. Simple instructions would allow the use of shorter clock cycles. In addition, having fewer instructions would mean fewer transistors were needed on the chip. Fewer transistors mean cheaper manufacturing costs and more room on the chip available for other uses. Instruction predictability, coupled with advances in VLSI, allowed various performance-enhancing tricks, such as pipelining, to be implemented in hardware. CISC does not provide this diverse range of performance-improvement opportunities.

We can quantify the differences between RISC and CISC using the basic computer performance equation:

time/program = time/cycle × cycles/instruction × instructions/program

Computer performance, as measured by program execution time, is directly proportional to clock cycle time, the number of clock cycles per instruction, and the number of instructions in the program. Shortening the clock cycle, where possible, results in improved performance for RISC as well as CISC. Otherwise, CISC machines increase performance by reducing the number of instructions per program, while RISC machines minimize the number of cycles per instruction. Yet both architectures can produce identical results in about the same amount of time. At the gate level, both systems perform an equivalent amount of work. So what is happening between the program level and the gate level?

CISC machines rely on microcode to tackle instruction complexity. Microcode tells the processor how to execute each instruction. For performance reasons, microcode is compact and efficient, and it certainly must be correct. Microcode efficiency is limited, however, by variable-length instructions, which slow the decoding process, and by a varying number of clock cycles per instruction, which makes it difficult to implement instruction pipelines. Moreover, microcode interprets each instruction as it is fetched from memory. This additional translation process takes time. The more complex the instruction set, the more time it takes to look up the instruction and engage the hardware suitable to execute it.

RISC architectures take a different approach. Most RISC instructions execute in one clock cycle. To accomplish this speedup, microprogrammed control is replaced by hardwired control, which is faster at executing instructions. This makes it easier to do instruction pipelining, but more difficult to deal with complexity at the hardware level. In RISC systems, the complexity removed from the instruction set is pushed up a level into the domain of the compiler.

To illustrate, let's look at an instruction. Suppose we want to compute the product 5 × 10. The code on a CISC machine might look like this:

mov ax, 10
mov bx, 5
mul bx, ax
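Before moving on to the RISC version of this code, here is a minimal Python sketch (not from the text) that simply evaluates the performance equation given above. The instruction counts, CPI values, and clock rates are invented for illustration and do not describe any real machine.

def cpu_time(instructions, cycles_per_instruction, clock_rate_hz):
    # time/program = (instructions/program) x (cycles/instruction) x (seconds/cycle)
    return instructions * cycles_per_instruction / clock_rate_hz

# A hypothetical CISC-style machine: fewer instructions, more cycles each, slower clock.
cisc = cpu_time(instructions=50_000, cycles_per_instruction=5.0, clock_rate_hz=200e6)
# A hypothetical RISC-style machine: more instructions, close to 1 cycle each, faster clock.
risc = cpu_time(instructions=90_000, cycles_per_instruction=1.2, clock_rate_hz=500e6)

print(f"CISC-style: {cisc * 1e6:.1f} microseconds")   # 1250.0
print(f"RISC-style: {risc * 1e6:.1f} microseconds")   # 216.0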


Page 446:
A minimalist RISC ISA has no multiplication instruction, so on a RISC machine our multiplication problem would look like this:

mov ax, 0
mov bx, 10
mov cx, 5
Begin: add ax, bx
loop Begin    ;loops cx times

The CISC code, though shorter, requires more clock cycles to execute. Suppose that on each architecture, register-to-register moves, adds, and loops each take one clock cycle, and that a multiplication operation takes 30 clock cycles. (This is not an unrealistic number; a multiplication on an Intel 8088 requires 133 clock cycles for two 16-bit numbers.) Comparing the two code fragments, we have:

CISC instructions: total clock cycles = (2 moves × 1 clock cycle) + (1 mul × 30 clock cycles) = 32 clock cycles

RISC instructions: total clock cycles = (3 moves × 1 clock cycle) + (5 adds × 1 clock cycle) + (5 loops × 1 clock cycle) = 13 clock cycles

Add to this the fact that RISC clock cycles are typically shorter than CISC clock cycles, and it should be clear that even though there are more instructions, the actual execution time is less for RISC than for CISC. This is the main inspiration behind the RISC design.

We have mentioned that reducing instruction complexity results in simpler chips. Transistors formerly employed in the execution of CISC instructions are used instead for pipelines, cache, and registers. Of these three, registers offer the greatest potential for improved performance, so it makes sense to increase the number of registers and to use them in innovative ways. One such innovation is the use of register window sets. Although not as widely accepted as the other innovations associated with RISC architectures, register windowing is an interesting idea and is briefly introduced here.

High-level languages depend on modularization for efficiency. Procedure calls and parameter passing are natural side effects of using these modules. Calling a procedure is not a trivial task. It involves saving a return address, preserving register values, passing parameters (either by pushing them on a stack or by using registers), branching to the subroutine, and executing the subroutine. Upon subroutine completion, modified parameter values must be saved, and previous register values must be restored before execution resumes in the calling program. Saving registers, passing parameters, and restoring registers involve considerable effort and resources.


Page 447:
With RISC chips supporting hundreds of registers, the save-and-restore sequence can be reduced to little more than a change of register environments. To fully understand this concept, picture all of the registers as being divided into sets. When a program is executing in one environment, only one particular set of registers is visible. If the program switches to a different environment (say, a procedure is called), the visible set of registers for the new environment changes. For example, while the main program is running, perhaps only registers 0 through 9 are visible; when a certain procedure is called, perhaps registers 10 through 19 become visible instead. Typical values for real RISC architectures include 16 register sets (or windows) of 32 registers each. The CPU is restricted to operating in only one window at any given time, so from the programmer's perspective there are only 32 registers available.

Register windows, by themselves, do not necessarily help with procedure calls or parameter passing. However, if the windows are overlapped carefully, passing parameters from one module to another becomes a simple matter of shifting from one register set to another, with the two sets overlapping in exactly those registers that must be shared. This is accomplished by dividing the register window set into distinct partitions: global registers (common to all windows), local registers (local to the current window), input registers (which overlap with the preceding window's output registers), and output registers (which overlap with the next window's input registers). When the CPU switches from one procedure to the next, it switches to a different register window, but the overlapping windows allow parameters to be "passed" simply by changing from the output registers of the calling module to the input registers of the called module. A current window pointer (CWP) points to the register window set to be used at any given time.

Consider a scenario in which Procedure One is calling Procedure Two. Of the 32 registers in each set, assume 8 are global, 8 are local, 8 are input, and 8 are output. When Procedure One calls Procedure Two, any parameters that need to be passed are placed in the output register set of Procedure One. Once Procedure Two begins execution, these registers become the input register set of Procedure Two. This process is illustrated in Figure 9.1.

One more important fact to note regarding register windows on RISC machines is the circular nature of the register set. For programs having a high degree of nesting, it is possible to exhaust the supply of registers. When this happens, main memory takes over, storing the lowest numbered windows, which contain values from the oldest procedure activations. The highest numbered register locations (the most recent activations) then wrap around onto the lowest numbered registers. As procedures return, the nesting level decreases, and register values from memory are restored in the order in which they were saved.

In addition to simple, fixed-length instructions, efficient pipelining in RISC machines has given these architectures an enormous increase in speed. Simpler instructions have freed up chip real estate, resulting not only in more usable space but also in chips that are easier and less time-consuming to design and manufacture.
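The following minimal Python sketch (not from the text) models the overlapping register windows just described, using the partition sizes from the text (8 global, 8 local, 8 input, 8 output) and the CWP values shown in Figure 9.1. Treating the CWP as a physical register index is an illustrative assumption.

WINDOW_STEP = 16      # local + output registers consumed per call level

def window(cwp):
    """Return the physical register numbers visible at current window pointer cwp."""
    return {
        "globals": list(range(0, 8)),                 # shared by all windows
        "inputs":  list(range(cwp, cwp + 8)),         # overlap the caller's outputs
        "locals":  list(range(cwp + 8, cwp + 16)),
        "outputs": list(range(cwp + 16, cwp + 24)),   # overlap the callee's inputs
    }

caller = window(cwp=8)                    # Procedure One
callee = window(cwp=8 + WINDOW_STEP)      # Procedure Two, after the call advances the CWP

# Parameters "pass" with no copying: the same physical registers are simply renamed
# from outputs (in the caller) to inputs (in the callee).
print(caller["outputs"])                       # [24, 25, 26, 27, 28, 29, 30, 31]
print(callee["inputs"])                        # [24, 25, 26, 27, 28, 29, 30, 31]
print(caller["outputs"] == callee["inputs"])   # True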


Page 448:
FIGURE 9.1 Overlapping register windows (Procedure One, at CWP = 8, sees eight globals, eight inputs, eight locals, and eight outputs; its output registers overlap the input registers of Procedure Two, at CWP = 24)

You should be aware that it is becoming increasingly difficult to categorize today's processors as RISC or CISC. The lines separating these architectures have blurred, and some current architectures use both approaches. If you examine some of the more recent chip manuals, you will see that today's RISC machines have more sophisticated and complex instructions than some CISC machines. The RISC PowerPC, for example, has a larger instruction set than the CISC Pentium. As VLSI technology continues to make transistors smaller and cheaper, the size of the instruction set is becoming less of an issue in the CISC versus RISC debate, whereas register usage and the load/store architecture remain distinguishing factors. With that caveat, we cautiously offer Table 9.1 as a summary of the classic differences between RISC and CISC.

As we have mentioned, although many sources praise the revolutionary innovations of RISC design, many of the ideas used in RISC machines (including simple instructions) appeared on mainframes of the 1960s and 1970s. Many so-called new designs are not really new, but simply recycled. Innovation does not necessarily mean inventing a new wheel; it may be a simple case of figuring out the best way to use a wheel that already exists. This is a lesson that will serve you well in your career in the computing field.

9.3 FLYNN'S TAXONOMY

Over the years, several attempts have been made to classify computer architectures. Although none of them is perfect, the most widely accepted taxonomy today is the one proposed by Michael Flynn in 1972. Flynn's taxonomy considers two factors: the number of instruction streams and the number of data streams flowing into the processor.


Page 449:
TABLE 9.1 The characteristics of RISC machines versus CISC machines
RISC: Multiple register sets, often consisting of more than 256 registers | CISC: Single register set, typically 6 to 16 registers total
RISC: Three register operands allowed per instruction (e.g., add R1, R2, R3) | CISC: One or two register operands allowed per instruction (e.g., add R1, R2)
RISC: Parameter passing through efficient on-chip register windows | CISC: Parameter passing through inefficient off-chip memory
RISC: Single-cycle instructions (except for load and store) | CISC: Multiple-cycle instructions
RISC: Hardwired control | CISC: Microprogrammed control
RISC: Highly pipelined | CISC: Less pipelined
RISC: Simple instructions that are few in number | CISC: Many complex instructions
RISC: Fixed-length instructions | CISC: Variable-length instructions
RISC: Complexity in the compiler | CISC: Complexity in the microcode
RISC: Only load and store instructions can access memory | CISC: Many instructions can access memory
RISC: Few addressing modes | CISC: Many addressing modes

A machine can have one or many data streams and one or many processors working on that data. This gives us four possible combinations: SISD (single instruction stream, single data stream), SIMD (single instruction stream, multiple data streams), MISD (multiple instruction streams, single data stream), and MIMD (multiple instruction streams, multiple data streams).

Uniprocessors are SISD machines. SIMD machines, which have a single point of control, execute the same instruction simultaneously on multiple data values. The SIMD category includes array processors, vector processors, and systolic arrays. MISD machines have multiple instruction streams operating on the same data stream. MIMD machines, which employ multiple control points, have independent instruction streams and independent data streams. Most of today's multiprocessors and parallel systems are MIMD machines. SIMD computers are simpler to design than MIMD machines, but they are also considerably less flexible: all of the processors in an SIMD machine must execute the same instruction simultaneously. If you think about it, something as simple as conditional branching can quickly become very expensive.


Page 450:
9.3 / Flynn's Taxonomy, , 419, , Flynn's taxonomy falls short in several areas. For one, there seem to be very, very few applications (if any) for MISD machines. Second, Flynn assumed that the parallelism was homogeneous. A collection of processors can be homogeneous or heterogeneous. A machine might have four separate floating points, adders, two multipliers, and a single integer unit. So this machine could perform seven operations in parallel, but it doesn't fit easily into Flynn's classification system. Another problem with this taxonomy is with the MIMD category. A multiprocessor architecture falls into this category without regard to how the processors are connected or how they view memory. There have been various attempts to refine the MIMD category. Suggested changes include subdivision, MIMD to differentiate systems that share memory from those that do not, as well as categorizing processors based on whether they are bus-based or switched. Shared memory systems are those in which all processors have access to global memory and communicate through shared variables, just as processes on a single processor do. If multiple processors do not share memory, each processor must own a piece of memory. Consequently, all processors must communicate using message passing, which can be costly and inefficient. The problem some people have with memory usage as a determining factor in classifying hardware is that shared memory and message passing are actually programming models, not hardware models. Therefore, they more properly belong to the domain of system software. The two main paradigms of parallel architecture, SMP (symmetric multiprocessors) and MPP (massively parallel processors), are MIMD architectures, but they differ in the way they use memory. SMP machines like a two-processor Intel PC and the 256-processor Origin3000 share memory, while MPP processors like the nCube, CM5, and Cray T3E do not. These specific MPP machines often house thousands of CPUs in a single large cabinet connected to hundreds of gigabytes of memory. The price of these systems can reach millions of dollars. Originally, the term MPP described tightly coupled SIMD multiprocessors such as the Connection Machine and Goodyear's MPP. Today, however, the term MPP is used to refer to parallel architectures that have multiple autonomous nodes with private memories, all of which have the ability to communicate over a network. An easy way to differentiate between SMP and MPP (as currently defined) is as follows: MPP = many processors + distributed memory + network communication, and SMP = few processors + shared memory + memory communication, distributed computing is another example. of the MIMD architecture. Distributed computing is generally defined as a collection of networked computers that work collaboratively to solve a problem. However, such collaboration can occur in many different ways.
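To make the shared-memory versus message-passing distinction concrete, here is a minimal Python sketch (not from the text): the multiprocessing module stands in for an SMP's shared global memory on one hand and an MPP's private memories connected by a network on the other. The worker functions and values are invented for illustration.

from multiprocessing import Process, Queue, Value

def smp_worker(shared_total, n):
    # SMP style: every processor updates the same shared memory location.
    with shared_total.get_lock():
        shared_total.value += n

def mpp_worker(queue, n):
    # MPP style: each node owns its data and sends results over a "network".
    queue.put(n * n)

if __name__ == "__main__":
    total = Value("i", 0)          # shared memory visible to all workers
    procs = [Process(target=smp_worker, args=(total, i)) for i in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print("shared-memory sum:", total.value)          # 0 + 1 + 2 + 3 = 6

    q = Queue()                    # message channel between private memories
    procs = [Process(target=mpp_worker, args=(q, i)) for i in range(4)]
    for p in procs: p.start()
    results = [q.get() for _ in procs]
    for p in procs: p.join()
    print("message-passing results:", sorted(results))   # [0, 1, 4, 9]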


Page 451:
420, , Chapter 9 / Alternative Architectures, , A Workstation Network (NOW) is a collection of distributed workstations, running in parallel only while the nodes are not used as normal workstations. NOWs typically consist of heterogeneous systems, with different processors and software, communicating over the Internet. Individual users must establish the proper connection to the network before joining parallel computing. A Cluster of Workstations (COW) is a collection similar to NOW, but requires a single entity to be in charge. The nodes usually have common software and a user who can access one node can usually access all nodes. A dedicated cluster parallel computer (DCPC) is a collection of workstations assembled specifically to work on a given parallel computation. The workstations have common software and file systems, are managed by a single entity, communicate over the Internet, and are not used as workstations. A stack of PCs (POPC) is a cluster of dedicated heterogeneous hardware used to build a parallel system. While a DCPC has relatively few, but expensive and fast components, a POPC uses a large number of slow but relatively cheap nodes. The BEOWULF project, presented in 1994 by Thomas Sterling and Donald Becker of the Goddard Space Flight Center, is a POPC architecture that successfully combined multiple hardware platforms with purpose-built software, resulting in an architecture that has the appearance of a parallel machine. unified. . Nodes in a BEOWULF network are always connected via a private network. If you have an old Sun SPARC, some 486 machines, a DEC Alpha (or just a big collection of dusty Intel machines!), your own personal but extremely powerful parallel computer. Flynn's taxonomy has recently been extended to include SPMD (single program, multiple data) architectures. An SPMD consists of multiprocessors, each with its own set of data and program memory. The same program runs on each processor, with synchronization at multiple global control points. Although each processor loads the same program, each can execute different instructions. For example, a program might have code similar to: If myNodeNum = 1 do this, else do that. same program. SPMD is actually a programming paradigm used in MIMD machines and it differs from SIMD in that the processors can do different things at the same time. Supercomputers often use an SPMD design. add one more feature, which is whether the architecture is instruction-based or data-based. The classical von Neumann architecture is instruction oriented. All processor activities are determined by a sequence of program code. Program instructions, act on data. Data-driven, or data-flow, architectures do the exact opposite. The characteristics of the data determine the sequence of event processing. We will explore this idea in more detail in Section 9.5. With the addition of data flow computers and some refinements to the MIMD classification, we get the taxonomy shown in Figure 9.2. you can wish


Page 452:
9.4 / Parallel and multiprocessor architectures, , 421, , Architecture, , Instruction flow, , SISD, , SIMD, , MISD, , Data flow, , MIMD, , Shared memory, , SPMD, , Distributed memory, , MPP, , Supercomputers, , Distributed systems, , FIGURE 9.2 A taxonomy of computer architectures, , refer to it as you read the following sections. We start at the left branch of the tree, with topics relevant to SIMD and MIMD architectures. 🇧🇷 Miniaturization technology has resulted in improved circuitry and more on a chip. Clocks got faster, leading to CPUs in the gigahertz range. However, we know that there are physical barriers that control how much performance can be improved from a single processor. Heat and electromagnetic interference limit the density of the chip's transistor. Even if (when?) these issues are fixed, processor speeds will always be limited by the speed of light. In addition to these physical limitations, there are also financial limitations. At some point, the cost of making an ever-faster processor will exceed the price anyone is willing to pay. Ultimately, we won't have a viable way to improve processor performance except to spread the computational load across multiple processors. For these reasons, parallelism is becoming increasingly popular. However, it is important to note that not all applications can benefit from parallelism. For example, multithreading parallelism adds costs (such as processes, synchronization, and other aspects of process management). if an application


Page 453:
is not compatible with a parallel solution, it is generally not cost effective to port it to a parallel multiprocessing architecture.

Implemented correctly, parallelism results in higher throughput, better fault tolerance, and a more attractive price/performance ratio. Although parallelism can result in significant speedup, that speedup can never be perfect. Given n processors running in parallel, perfect speedup would imply that a computational task could be completed in 1/n of the time, yielding an n-fold increase in power (or a run time reduced by a factor of n). We need only recall Amdahl's Law to understand why perfect speedup is not possible. If two processing components run at two different speeds, the slower speed dominates. This law also governs the speedup attainable through the use of parallel processors on a problem: no matter how well an application parallelizes, there will always be a small amount of work that must be done serially by a single processor, and the additional processors can do nothing but wait until that serial processing is complete. The underlying premise is that every algorithm has a sequential part that ultimately limits the speedup achievable through a multiprocessor implementation. The greater the sequential portion, the less cost effective it is to employ a parallel multiprocessing architecture.

Using multiple processors on a single task is only one of many different types of parallelism. In earlier chapters we introduced a few of these, including pipelining, VLIW, and ILP, giving motivations for each particular type. Other parallel architectures deal with multiple (or parallel) data; examples include SIMD machines such as vector, neural, and systolic processors. Many architectures allow for multiple or parallel processes, a characteristic of all MIMD machines. It is important to note that "parallel" can have many different meanings, and it is equally important to be able to differentiate among them. We begin this section with a discussion of examples of instruction-level parallel architectures and then move on to SIMD and MIMD architectures. The last section introduces alternative (less mainstream) parallel processing approaches, including systolic arrays, neural networks, and dataflow computing.
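Returning to Amdahl's Law as described above, the following minimal Python sketch (not from the text) computes the speedup limit: serial_fraction is the portion of the work that must run serially, and the 10% figure and processor counts are illustrative assumptions.

def amdahl_speedup(n_processors, serial_fraction):
    # The serial work still takes serial_fraction of the original time; only the
    # remaining (1 - serial_fraction) is divided among the n processors.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

for n in (2, 8, 64, 1024):
    print(n, "processors:", round(amdahl_speedup(n, 0.10), 2), "x speedup")
# Even with 1024 processors, a 10% serial portion caps the speedup at just under 10x.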


Page 454:
So what exactly is a superscalar processor? We know that the Pentium processor is superscalar, but we have not yet discussed what that really means. Superscalar is a design methodology that allows multiple instructions to be executed simultaneously in each clock cycle. Although superscalar differs from pipelining in several ways that will be discussed shortly, the net effect is the same. The way in which superscalar designs achieve speedup is similar to the idea of adding another lane to a busy single-lane highway. Additional "hardware" is required, but in the end, more cars (instructions) can get from point A to point B in the same amount of time.

The superscalar components analogous to our additional highway lanes are called execution units. Execution units consist of floating-point adders and multipliers, integer adders and multipliers, and other specialized components. Although the units may work independently, it is important that the architecture have a sufficient number of these specialized units to process several instructions in parallel. It is not uncommon for execution units to be duplicated; for example, a system could have a pair of identical floating-point units. Execution units are often pipelined, which provides even better performance.

A critical component of this architecture is a specialized instruction fetch unit, which can retrieve multiple instructions simultaneously from memory. This unit, in turn, passes the instructions to a complex decoding unit that determines whether the instructions are independent (and can thus be executed simultaneously) or whether a dependency of some sort exists (in which case not all of the instructions can be executed at the same time).

As an example, consider the IBM RS/6000. This processor had an instruction fetch unit and two processors, each containing a 6-stage floating-point unit and a 4-stage integer unit. The instruction fetch unit was set up with a 2-stage pipeline, where the first stage fetched packets of four instructions each, and the second stage delivered the instructions to the appropriate processing unit.

Superscalar design includes superpipelining, simultaneous fetching of multiple instructions, a complex decoding unit capable of determining instruction dependencies and dynamically combining instructions so that no dependencies are violated, and sufficient resources for the parallel execution of multiple instructions. We note that although this type of parallelism requires very specific hardware, a superscalar architecture also requires a sophisticated compiler to schedule operations that make the best use of machine resources.

Whereas superscalar processors rely on both the hardware (to arbitrate dependencies) and the compiler (to generate approximate schedules), VLIW processors rely entirely on the compiler. VLIW processors pack independent instructions into one long instruction, which, in turn, tells the execution units what to do. Many argue that because the compiler has a better overall picture of the dependencies in the code, this approach results in better performance. However, the compiler cannot have an overall picture of the run-time code, so it is compelled to be conservative in its scheduling.
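Whether it is superscalar hardware arbitrating dependencies at run time or a VLIW compiler arbitrating them at compile time, the core question is the same: are two instructions independent? The following is a rough sketch of that test over a hypothetical instruction model (the Instr class and the register names are illustrative, not any real machine's encoding or decoder logic):

from dataclasses import dataclass

@dataclass
class Instr:
    op: str
    dest: str
    srcs: tuple

def independent(a: Instr, b: Instr) -> bool:
    # Two instructions may issue together only if no data hazard links them.
    raw = a.dest in b.srcs      # read-after-write: b needs a's result
    war = b.dest in a.srcs      # write-after-read: b overwrites a's input
    waw = a.dest == b.dest      # write-after-write: both write the same register
    return not (raw or war or waw)

i1 = Instr("add", "r1", ("r2", "r3"))
i2 = Instr("add", "r4", ("r5", "r6"))
i3 = Instr("mul", "r7", ("r1", "r4"))
print(independent(i1, i2))  # True: disjoint registers, so both can execute at once
print(independent(i1, i3))  # False: i3 reads r1, which i1 writes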


Page 455:
Because a VLIW compiler creates very long instructions, it also arbitrates all dependencies. The long instruction words, which are fixed at compile time, typically contain four to eight instructions. Because the instructions are fixed, any modification that could affect their scheduling (such as a change in memory latency) requires that the code be recompiled, which can cause a host of problems for software vendors. VLIW proponents point out that this technology simplifies the hardware by moving complexity to the compiler. Proponents of superscalar designs counter that VLIW can, in turn, lead to significant increases in the amount of code generated. For example, when program control fields are not used, memory space and bandwidth are wasted. In fact, a typical FORTRAN program explodes to double, and sometimes triple, its normal size when compiled on a VLIW machine.

Intel's Itanium IA-64 is one example of a VLIW processor. Recall that the IA-64 uses an EPIC style of VLIW processor. An EPIC architecture holds some advantages over an ordinary VLIW processor. Like VLIW, EPIC bundles its instructions for delivery to the various execution units. Unlike VLIW, however, these bundles need not be the same length. A special delimiter indicates where one bundle ends and another begins. The instruction words are prefetched by the hardware, which identifies and then schedules the bundles in independent groups for parallel execution. This is an attempt to overcome the limitations introduced by the compiler's lack of complete knowledge of the run-time code. Instructions within a bundle can be executed in parallel with no concern for dependencies, and thus no concern for ordering. By most people's definition, EPIC is really VLIW. Although Intel might argue otherwise, and die-hard architects would mention the minor differences noted above (as well as a few others), EPIC is in reality an enhanced version of VLIW.

9.4.2 Vector Processors

Often referred to as supercomputers, vector processors are specialized, heavily pipelined processors that perform efficient operations on entire vectors and matrices at once. This class of processor is suited for applications that can benefit from a high degree of parallelism, such as weather forecasting, medical diagnosis, and image processing.

To understand vector processing, one must first understand vector arithmetic. A vector is a fixed-length, one-dimensional array of values, or an ordered series of scalar quantities. Various arithmetic operations are defined over vectors, including addition, subtraction, and multiplication. Vector computers are heavily pipelined so that arithmetic operations can be overlapped. Each instruction specifies a set of operations to be carried out on an entire vector. For example, let's say we want to add vectors V1 and V2 and place the results in V3. On a traditional processor, our code would include the following loop:

for i = 0 to VectorLength
    V3[i] = V1[i] + V2[i];


Page 456:
However, on a vector processor this code becomes:

LDV   V1, R1       ;load vector V1 into vector register R1
LDV   V2, R2       ;load vector V2 into vector register R2
ADDV  R3, R1, R2   ;add vector registers R1 and R2, placing the result in R3
STV   R3, V3       ;store vector register R3 into vector V3

Vector registers are specialized registers that can hold several vector elements at one time. The register contents are sent one element at a time to a vector pipeline, and the output from the pipeline is sent back to the vector registers one element at a time. These registers are, therefore, FIFO queues capable of holding many values. Vector processors generally have several of these registers. The instruction set for a vector processor contains instructions for loading these registers, performing operations on the elements within the registers, and storing the vector data back to memory.

Vector processors are often divided into two categories according to how the instructions access their operands. Register-register vector processors require all operations to use registers as source and destination operands. Memory-memory vector processors allow operands from memory to be routed directly to the arithmetic unit, with the results of the operation streamed back to memory. Register-register processors are at a disadvantage in that long vectors must be broken into fixed-length segments small enough to fit into the registers. Memory-memory machines, on the other hand, have a longer startup time owing to memory latency. (The startup time is the time between initializing the instruction and the first result emerging from the pipeline.) Once the pipeline is full, however, this disadvantage disappears.

Vector instructions are efficient for two reasons. First, the machine fetches significantly fewer instructions, which means less decoding, less control unit overhead, and less memory bandwidth usage. Second, the processor knows it will have a continuous source of data and can begin prefetching corresponding pairs of values. If interleaved memory is used, one pair can arrive per clock cycle. The most famous vector processors are the Cray series of supercomputers. Their basic architecture has changed little over the past 25 years.
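To give a loose, high-level sense of the whole-vector style of operation shown above, the sketch below contrasts an element-at-a-time loop with a single whole-array statement. NumPy is used purely as an analogy (it is a software library, not how a vector processor is actually programmed), and the array contents are arbitrary:

import numpy as np

V1 = np.array([1.0, 2.0, 3.0, 4.0])
V2 = np.array([10.0, 20.0, 30.0, 40.0])

# Scalar-processor style: one element handled per loop iteration.
V3_loop = np.empty_like(V1)
for i in range(len(V1)):
    V3_loop[i] = V1[i] + V2[i]

# Vector style: one statement specifies the operation over the entire vector,
# analogous to ADDV operating on whole vector registers.
V3_vec = V1 + V2

assert (V3_loop == V3_vec).all()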


Page 457:
9.4.3 Interconnection Networks

In parallel MIMD systems, communication is essential for synchronized processing and data sharing. The manner in which messages are passed among system components determines the overall system design. The two choices are to use a shared memory model or an interconnection network model. Shared memory systems have one large memory that is accessed identically by all processors. In interconnected systems, each processor has its own memory, but processors are allowed to access other processors' memories via the network. Both, of course, have their strengths and weaknesses.

Interconnection networks are often categorized according to topology, routing strategy, and switching technique. The network topology, the manner in which the components are interconnected, is a major determining factor in the overhead cost of message passing. Message passing efficiency is limited by:

• Bandwidth: the information-carrying capacity of the network.
• Message latency: the time required for the first bit of a message to reach its destination.
• Transport latency: the time the message spends in the network.
• Overhead: message-processing activities in the sender and receiver.

Accordingly, network designs attempt to minimize both the number of messages required and the distances over which they must travel.

Interconnection networks can be either static or dynamic. Dynamic networks allow the path between two entities (either two processors or a processor and a memory) to change from one communication to the next, whereas static networks do not. Interconnection networks can also be blocking or nonblocking. Nonblocking networks allow new connections in the presence of other simultaneous connections, whereas blocking networks do not.

Static interconnection networks are used mainly for message passing and include a variety of types, many of which may be familiar to you. Processors are typically interconnected using static networks, whereas processor-memory pairs usually employ dynamic networks.

Fully connected networks are those in which all components are connected to all other components. They are very expensive to build, and as new entities are added, they become difficult to manage. Star-connected networks have a central hub through which all messages must pass. Although a hub can be a central bottleneck, it provides excellent connectivity. Linear array or ring networks allow any entity to communicate directly with its two neighbors, but any other communication has to go through multiple entities to arrive at its destination. (The ring is just a variation of a linear array in which the two end entities are directly connected.) A mesh network links each entity to four or six neighbors (depending on whether it is two-dimensional or three-dimensional). Extensions of this network include those that wrap around, similar to how a linear network can wrap around to form a ring. Tree networks arrange entities in noncyclic tree structures, which have the potential for communication bottlenecks at the roots. Hypercube networks are multidimensional extensions of mesh networks in which each dimension has two processors. (Hypercubes typically connect processors, not processor-memory groups.) Two-dimensional hypercubes consist of pairs of processors that are connected by a direct link if, and only if, the binary representation of their labels differs in exactly one bit position. In an n-dimensional hypercube, each processor is directly connected to n other processors. It is interesting to note that the total number of bit positions in which two hypercube labels differ is called their Hamming distance, which is also the term used to indicate the number of communication links in the shortest path between two processors. Figure 9.3 illustrates the various types of static networks.
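The hypercube connection rule just described is easy to check computationally. Here is a small sketch (node labels are plain integers, and the dimension n = 3 is arbitrary):

def hamming_distance(a: int, b: int) -> int:
    # Number of bit positions in which the two labels differ.
    return bin(a ^ b).count("1")

def directly_connected(a: int, b: int) -> bool:
    # Hypercube nodes are neighbors iff their labels differ in exactly one bit.
    return hamming_distance(a, b) == 1

n = 3  # a three-dimensional hypercube has 2**3 = 8 processors
neighbors_of_0 = [node for node in range(2 ** n) if directly_connected(0, node)]
print(neighbors_of_0)                  # [1, 2, 4] -> labels 001, 010, 100
print(hamming_distance(0b000, 0b111))  # 3 = links on the shortest path from 000 to 111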


Page 458:
FIGURE 9.3 Static network topologies: a. fully connected, b. star, c. linear and ring, d. mesh and mesh ring, e. tree, f. three-dimensional hypercube

Dynamic networks allow for dynamic configuration of the network in one of two ways: either by using a bus or by using a switch that can alter the routes through the network. Bus-based networks, illustrated in Figure 9.4, are the simplest and most efficient when cost is a concern and the number of entities is moderate. Clearly, the main disadvantage is the bottleneck that can result from bus contention as the number of entities grows larger. Parallel buses can alleviate this problem, but their cost is considerable.

FIGURE 9.4 A bus-based network


Page 459:
FIGURE 9.5 A crossbar network (memory modules MM1 through MM4 connected to CPU1 through CPU4 by open and closed switches)

Switching networks use switches to dynamically alter routing. There are two types of switches: crossbar switches and 2 × 2 switches. Crossbar switches are simply switches that are either open or closed. Any entity can be connected to any other entity by closing the switch (making a connection) between them. Networks consisting of crossbar switches are fully connected, because any entity can communicate directly with any other entity, and simultaneous communications between different processor/memory pairs are allowed. (A given processor can, however, have at most one connection at a time.) Closing one switch does not block other transfers, so the crossbar network is a nonblocking network. However, if there is a single switch at each crosspoint, n entities require n² switches. In reality, many multiprocessors require multiple switches at each crosspoint, so managing the many switches quickly becomes difficult and costly. Crossbar switches are therefore practical only in high-speed multiprocessor vector computers. A crossbar switch configuration is shown in Figure 9.5; the blue switches indicate closed switches. A processor can be connected to only one memory at a time, so there is at most one closed switch per column.

The second type of switch is the 2 × 2 switch. It is similar to a crossbar switch, except that it is capable of routing its inputs to different destinations, whereas the crossbar simply opens or closes the communication channel. A 2 × 2 interchange switch has two inputs and two outputs. At any given moment, a 2 × 2 switch can be in one of four states: through, cross, upper broadcast, and lower broadcast, as shown in Figure 9.6.

FIGURE 9.6 States of the 2 × 2 interchange switch: a. through, b. cross, c. upper broadcast, d. lower broadcast
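A minimal sketch of the four states of the 2 × 2 interchange switch described above (the message labels "A" and "B" are arbitrary placeholders):

def switch_2x2(state, top_in, bottom_in):
    # Returns the (top_out, bottom_out) pair produced in the given state.
    if state == "through":          # top -> top, bottom -> bottom
        return top_in, bottom_in
    if state == "cross":            # top -> bottom, bottom -> top
        return bottom_in, top_in
    if state == "upper broadcast":  # top input copied to both outputs
        return top_in, top_in
    if state == "lower broadcast":  # bottom input copied to both outputs
        return bottom_in, bottom_in
    raise ValueError("unknown switch state: " + state)

print(switch_2x2("through", "A", "B"))  # ('A', 'B')
print(switch_2x2("cross", "A", "B"))    # ('B', 'A')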


Page 460:
In the through state, the top input is routed to the top output and the bottom input is routed to the bottom output. In the cross state, the top input is routed to the bottom output and the bottom input is routed to the top output. In upper broadcast, the upper input is broadcast to both the upper and lower outputs. In lower broadcast, the lower input is broadcast to both the upper and lower outputs. The through and cross states are the ones relevant to interconnection networks.

The most advanced class of networks, multistage interconnection networks, is built using 2 × 2 switches. The idea is to incorporate stages of switches, typically with processors on one side and memories on the other, with a series of switching elements as the interior nodes. These switches dynamically configure themselves to allow a path from any given processor to any given memory. The number of switches and the number of stages contribute to the path length of each communication channel. A slight delay may occur as the switch determines the configuration required to pass a message from the specified source to the desired destination. These multistage networks are often called shuffle networks, referring to the pattern of the connections between the switches.

Many topologies have been suggested for multistage switching networks; they can be used in tightly coupled systems to control processor-to-memory communication. A switch can be in only one state at a time, so conflicts can clearly occur. For example, consider one simple topology for these networks, the Omega network shown in Figure 9.7. It is possible for CPU 00 to communicate with memory module 00 if both switch 1A and switch 2A are set to through. At the same time, however, it is impossible for CPU 10 to communicate with memory module 01: to do this, both switch 1A and switch 2A would have to be set to cross. This Omega network is, therefore, a blocking network. Nonblocking multistage networks can be built by adding more switches and more stages. In general, an Omega network of n nodes requires log₂n stages with n/2 switches per stage.

FIGURE 9.7 A two-stage Omega network (CPUs 00 through 11 connected to memory modules 00 through 11 via switches 1A, 1B, 2A, and 2B)
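The blocking behavior of the Omega network example can be sketched directly. The wiring below (which stage-1 output feeds which stage-2 switch) is an assumption chosen to be consistent with the example in the text; the switch and module names follow Figure 9.7:

# Each route is expressed as the (switch, state) settings it needs; two routes
# block each other if they need the same switch in different states.

# CPUs 00 and 10 feed switch 1A (top, bottom inputs); CPUs 01 and 11 feed 1B.
STAGE1_INPUT = {"00": ("1A", "top"), "10": ("1A", "bottom"),
                "01": ("1B", "top"), "11": ("1B", "bottom")}
# Assumed inter-stage wiring: 1A's top output reaches 2A, its bottom output 2B,
# and likewise 1B feeds the bottom inputs of 2A and 2B.
STAGE2_INPUT = {("1A", "top"): ("2A", "top"), ("1A", "bottom"): ("2B", "top"),
                ("1B", "top"): ("2A", "bottom"), ("1B", "bottom"): ("2B", "bottom")}
# Memory modules 00 and 01 hang off switch 2A; 10 and 11 hang off switch 2B.
STAGE2_OUTPUT = {"00": ("2A", "top"), "01": ("2A", "bottom"),
                 "10": ("2B", "top"), "11": ("2B", "bottom")}

def state_needed(in_port, out_port):
    # A 2 x 2 switch passes same-side traffic in "through" and swaps sides in "cross".
    return "through" if in_port == out_port else "cross"

def route(cpu, memory):
    # Return the switch settings needed to connect cpu to memory.
    sw1, in1 = STAGE1_INPUT[cpu]
    sw2_target, out2 = STAGE2_OUTPUT[memory]
    for out1 in ("top", "bottom"):
        sw2, in2 = STAGE2_INPUT[(sw1, out1)]
        if sw2 == sw2_target:
            return {sw1: state_needed(in1, out1), sw2: state_needed(in2, out2)}
    raise ValueError("unreachable destination")

def conflict(route_a, route_b):
    # Routes block each other if some switch would have to be in two states at once.
    return any(sw in route_b and route_b[sw] != st for sw, st in route_a.items())

r1 = route("00", "00")
r2 = route("10", "01")
print(r1)                # {'1A': 'through', '2A': 'through'}
print(r2)                # {'1A': 'cross', '2A': 'cross'}
print(conflict(r1, r2))  # True: the two requests cannot be satisfied at the same time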


Page 461:
O