Showing posts with label x86. Show all posts
Showing posts with label x86. Show all posts

Chapter 1: Introduction

Welcome to Programming

I love programming. I enjoy the challenge to not only make a working program, but to do so with style. Programming is like poetry. It conveys a message, not only to the computer, but to those who modify and use your program. With a program, you build your own world with your own rules. You create your world according to your conception of both the problem and the solution. Masterful programmers create worlds with programs that are clear and succinct, much like a poem or essay.

One of the greatest programmers, Donald Knuth, describes programming not as telling a computer how to do something, but telling a person how they would instruct a computer to do something. The point is that programs are meant to be read by people, not just computers. Your programs will be modified and updated by others long after you move on to other projects. Thus, programming is not as much about communicating to a computer as it is communicating to those who come after you. A programmer is a problem-solver, a poet, and an instructor all at once. Your goal is to solve the problem at hand, doing so with balance and taste, and teach your solution to future programmers. I hope that this book can teach at least some of the poetry and magic that makes computing exciting.

Most introductory books on programming frustrate me to no end. At the end of them you can still ask "how does the computer really work?" and not have a good answer. They tend to pass over topics that are difficult even though they are important. I will take you through the difficult issues because that is the only way to move on to masterful programming. My goal is to take you from knowing nothing about programming to understanding how to think, write, and learn like a programmer. You won't know everything, but you will have a background for how everything fits together. At the end of this book, you should be able to do the following:

  • Understand how a program works and interacts with other programs

  • Read other people's programs and learn how they work

  • Learn new programming languages quickly

  • Learn advanced concepts in computer science quickly

I will not teach you everything. Computer science is a massive field, especially when you combine the theory with the practice of computer programming. However, I will attempt to get you started on the foundations so you can easily go wherever you want afterwards.

There is somewhat of a chicken and egg problem in teaching programming, especially assembly language. There is a lot to learn - it is almost too much to learn almost at all at once. However, each piece depends on all the others, which makes learning it a piece at a time difficult. Therefore, you must be patient with yourself and the computer while learning to program. If you don't understand something the first time, reread it. If you still don't understand it, it is sometimes best to take it by faith and come back to it later. Often after more exposure to programming the ideas will make more sense. Don't get discouraged. It's a long climb, but very worthwhile.

At the end of each chapter are three sets of review exercises. The first set is more or less regurgitation - they check to see if can you give back what you learned in the chapter. The second set contains application questions - they check to see if you can apply what you learned to solve problems. The final set is to see if you are capable of broadening your horizons. Some of these questions may not be answerable until later in the book, but they give you some things to think about. Other questions require some research into outside sources to discover the answer. Still others require you to simply analyze your options and explain a best solution. Many of the questions don't have right or wrong answers, but that doesn't mean they are unimportant. Learning the issues involved in programming, learning how to research answers, and learning how to look ahead are all a major part of a programmer's work.

If you have problems that you just can't get past, there is a mailing list for this book where readers can discuss and get help with what they are reading. The address is <pgubook-readers@nongnu.org>. This mailing list is open for any type of question or discussion along the lines of this book. You can subscribe to this list by going to http://mail.nongnu.org/mailman/listinfo/pgubook-readers.

If you are thinking of using this book for a class on computer programming but do not have access to Linux computers for your students, I highly suggest you try to find help from the K-12 Linux Project. Their website is at http://www.k12linux.org/ and they have a helpful and responsive mailing list available.

Your Tools

This book teaches assembly language for x86 processors and the GNU/Linux operating system. Therefore we will be giving all of the examples using the GNU/Linux standard GCC tool set. If you are not familiar with GNU/Linux and the GCC tool set, they will be described shortly. If you are new to Linux, you should check out the guide available at http://rute.sourceforge.net/ [1] What I intend to show you is more about programming in general than using a specific tool set on a specific platform, but standardizing on one makes the task much easier.

Those new to Linux should also try to get involved in their local GNU/Linux User's Group. User's Group members are usually very helpful for new people, and will help you from everything from installing Linux to learning to use it most efficiently. A listing of GNU/Linux User's Groups is available at http://www.linux.org/groups/

All of these programs have been tested using Red Hat Linux 8.0, and should work with any other GNU/Linux distribution, too. [2] They will not work with non-Linux operating systems such as BSD or other systems. However, all of the skills learned in this book should be easily transferable to any other system.

If you do not have access to a GNU/Linux machine, you can look for a hosting provider who offers a Linux shell account, which is a command-line only interface to a Linux machine. There are many low-cost shell account providers, but you have to make sure that they match the requirements above (i.e. - Linux on x86). Someone at your local GNU/Linux User's Group may be able to give you one as well. Shell accounts only require that you already have an Internet connection and a telnet program. If you use Windows®, you already have a telnet client - just click on start, then run, then type in telnet. However, it is usually better to download PuTTY from http://www.chiart.greenend.co.uk/~sgtatham/putty/ because Windows' telnet has some weird problems. There are a lot of options for the Macintosh, too. NiftyTelnet is my favorite.

If you don't have GNU/Linux and can't find a shell account service, then you can download Knoppix from http://www.knoppix.org/ Knoppix is a GNU/Linux distribution that boots from CD so that you don't have to actually install it. Once you are done using it, you just reboot and remove the CD and you are back to your regular operating system.

So what is GNU/Linux? GNU/Linux is an operating system modeled after UNIX®. The GNU part comes from the GNU Project (http://www.gnu.org/) [3], which includes most of the programs you will run, including the GCC tool set that we will use to program with. The GCC tool set contains all of the programs necessary to create programs in various computer languages.

Linux is the name of the kernel. The kernel is the core part of an operating system that keeps track of everything. The kernel is both a fence and a gate. As a gate, it allows programs to access hardware in a uniform way. Without the kernel, you would have to write programs to deal with every device model ever made. The kernel handles all device-specific interactions so you don't have to. It also handles file access and interaction between processes. For example, when you type, your typing goes through several programs before it hits your editor. First, the kernel is what handles your hardware, so it is the first to receive notice about the keypress. The keyboard sends in scancodes to the kernel, which then converts them to the actual letters, numbers, and symbols they represent. If you are using a windowing system (like Microsoft Windows® or the X Window System), then the windowing system reads the keypress from the kernel, and delivers it to whatever program is currently in focus on the user's display.

Example 1-1: How the computer processes keyboard sigals
Start example
Keyboard -> Kernel -> Windowing system -> Application program
End example

The kernel also controls the flow of information between programs. The kernel is a program's gate to the world around it. Every time that data moves between processes, the kernel controls the messaging. In our keyboard example above, the kernel would have to be involved for the windowing system to communicate the keypress to the application program.

As a fence, the kernel prevents programs from accidentally overwriting each other's data and from accessing files and devices that they don't have permission to. It limits the amount of damage a poorly-written program can do to other running programs.

In our case, the kernel is Linux. Now, the kernel all by itself won't do anything. You can't even boot up a computer with just a kernel. Think of the kernel as the water pipes for a house. Without the pipes, the faucets won't work, but the pipes are pretty useless if there are no faucets. Together, the user applications (from the GNU project and other places) and the kernel (Linux) make up the entire operating system, GNU/Linux.

For the most part, this book will be using the computer's low-level assembly language. There are essentially three kinds of languages:

Machine Language

  • This is what the computer actually sees and deals with. Every command the computer sees is given as a number or sequence of numbers.

Assembly Language

  • This is the same as machine language, except the command numbers have been replaced by letter sequences which are easier to memorize. Other small things are done to make it easier as well.

High-Level Language

  • High-level languages are there to make programming easier. Assembly language requires you to work with the machine itself. High-level languages allow you to describe the program in a more natural language. A single command in a high-level language usually is equivalent to several commands in an assembly language.

In this book we will learn assembly language, although we will cover a bit of high-level languages. Hopefully by learning assembly language, your understanding of how programming and computers work will put you a step ahead.

[1]This is quite a large document. You certainly don't need to know everything to get started with this book. You simply need to know how to navigate from the command line and how to use an editor like pico, emacs, or vi (or others).

[2]By "GNU/Linux distribution", I mean an x86 GNU/Linux distribution. GNU/Linux dis- tributions for the Power Macintosh, the Alpha processor, or other processors will not work with this book.

[3]The GNU Project is a project by the Free Software Foundation to produce a complete, free operating system.

Programming from the Ground Up

Jonathan Bartlett
Edited by Dominick Bruno, Jr.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in Appendix H. In addition, you are granted full rights to use the code examples for any purpose without even having to credit the authors.

To receive a copy of this book in electronic form, please visit the website http://savannah.nongnu.org/projects/pgubook/ This site contains the instructions for downloading a transparent copy of this book as defined by the GNU Free Documentation License.

All trademarks are property of their respective owners.

ISBN 0-9752838-4-7

Published by Bartlett Publishing in Broken Arrow, Oklahom

Library of Congress Control Number: 2004091465

Bartlett Publishing Cataloging-in-Publication Data

Bartlett, Jonathan, 1977-
Programming from the ground up / Jonathan Bartlett ; edited by Dominick
Bruno.
p. cm.
Includes index.

ISBN 0-9752838-4-7


1. Linux. 2. Operating systems (Computers) 3. Computer programming. I. Bruno, Dominick. II. Title.

QA76.76.O63 2004

005.268—dc22 2004091465

This book can be purchased at http://www.bartlettpublishing.com/

This book is not a reference book, it is an introductory book. It is therefore not suitable by itself to learn how to professionally program in x86 assembly language, as some details have been left out to make the learning process smoother. The point of the book is to help the student understand how assembly language and computer programming works, not to be a reference to the subject. Reference information about a particular processor can be obtained by contacting the company which makes it.

Appendix I: Personal Dedication

There are so many people I could thank. I will name here but a few of the people who have brought me to where I am today. The many family members, Sunday School teachers, youth pastors, school teachers, friends, and other relationships that God has brought into my life to lead me, help me, and teach me are too many to count. This book is dedicated to you all.

There are some people, however, that I would like to thank specifically.

First of all, I want to thank the members of the Vineyard Christian Fellowship Church in Champaign, Illinois for everything that you have done to help me and my family in our times of crisis. It's been a long time since I've seen or heard from any of you, but I think about you always. You have been such a blessing to me, my wife, and Daniel, and I could never thank you enough for showing us Christ's love when we needed it most. I thank God every time I think of you - I thank Him for bringing you all to us in our deepest times of need. Even out in the middle of Illinois with no friends of family, God showed that He was still watching after us. Thank you for being His hands on Earth. Specifically, I'd like to thank Joe and Rhonda, Pam and Dell, and Herschel and Vicki. There were many, many others, too - so many people helped us that it would be impossible to list them all.

I also want to thank my parents, who gave me the example of perserverance and strength in hard times. Your example has helped me be a good father to my children, and a good husband to my wife.

I also want to thank my wife, who even from when we first started dating encouraged me to seek God in everything. Thank you for your support in writing this book, and more importantly, for your support in being obedient to God.

I also want to thanks the Little Light House school. My entire family is continually blessed by the help you give to our son.

I also want to thank Joe and D.A. Thank you for taking a chance on me in ministry. Being able to be a part of God's ministry again has helped me in so many ways.

You all have given me the strength I needed to write this book over the last few years. Without your support, I would have been too overwhelmed by personal crises to even think about anything more than getting through a day, much less putting this book together. You have all been a great blessing to me, and I will keep you in my prayers always.

Appendix H: GNU Free Documentation License

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

  • A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.

  • B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five).

  • C. State on the Title Page the name of the publisher of the Modified Version, as the publisher.

  • D. Preserve all the copyright notices of the Document.

  • E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.

  • F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.

  • G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.

  • H. Include an unaltered copy of this License.

  • I. Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.

  • J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.

  • K. In any section entitled "Acknowledgements" or "Dedications", preserve the section's title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.

  • L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.

  • M. Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version.

  • N. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and dispbibute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document , on account of their being thus compiled, if they are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

Addendum

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

Copyright © YEAR YOUR NAME.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. A copy of the license is included in the section entitled "GNU Free Documentation License".

If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being LIST"; likewise for Back-Cover Texts.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Appendix G: Document History

  • 12/17/2002 - Version 0.5 - Initial posting of book under GNU FDL

  • 07/18/2003 - Version 0.6 - Added ASCII appendix, finished the discussion of the CPU in the Memory chapter, reworked exercises into a new format, corrected several errors. Thanks to Harald Korneliussen for the many suggestions and the ASCII table.

  • 01/11/2004 - Version 0.7 - Added C translation appendix, added the beginnings of an appendix of x86 instructions, added the beginnings of a GDB appendix, finished out the files chapter, finished out the counting chapter, added a records chapter, created a source file of common linux definitions, corrected several errors, and lots of other fixes

  • 01/22/2004 - Version 0.8 - Finished GDB appendix, mostly finished w/appendix of x86 instructions, added section on planning programs, added lots of review questions, and got everything to a completed, initial draft state.

  • 01/29/2004 - Version 0.9 - Lots of editting of all chapters. Made code more consistent and made explanations clearer. Added some illustrations.

  • 01/31/2004 - Version 1.0 - Rewrote chapter 9. Added full index. Lots of minor corrections.

  • 04/18/2004 - Version 1.1 - Lots of minor updates based on reader comments. Made cleared distinction between dynamic and shared libraries.

Appendix F: Using the GDB Debugger

Overview

By the time you read this appendix, you will likely have written at least one program with an error in it. In assembly language, even minor errors usually have results such as the whole program crashing with a segmentation fault error. In most programming languages, you can simply print out the values in your variables as you go along, and use that output to find out where you went wrong. In assembly language, calling output functions is not so easy. Therefore, to aid in determining the source of errors, you must use a source debugger.

A debugger is a program that helps you find bugs by stepping through the program one step at a time, letting you examine memory and register contents along the way. A source debugger is a debugger that allows you to tie the debugging operation directly to the source code of a program. This means that the debugger allows you to look at the source code as you typed it in - complete with symbols, labels, and comments.

The debugger we will be looking at is GDB - the GNU Debugger. This application is present on almost all GNU/Linux distributions. It can debug programs in multiple programming languages, including assembly language.

An Example Debugging Session

The best way to explain how a debugger works is by using it. The program we will be using the debugger on is the maximum program used in Chapter 3. Let's say that you entered the program perfectly, except that you left out the line:

 incl %edi

When you run the program, it just goes in an infinite loop - it never exits. To determine the cause, you need to run the program under GDB. However, to do this, you need to have the assembler include debugging information in the executable. All you need to do to enable this is to add the --gstabs option to the as command. So, you would assemble it like this:

as --gstabs maximum.s -o maximum.o

Linking would be the same as normal. "stabs" is the debugging format used by GDB. Now, to run the program under the debugger, you would type in gdb ./maximum. Be sure that the source files are in the current directory. The output should look similar to this:

GNU gdb Red Hat Linux (5.2.1-4)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are welcome to change it and/or
distribute copies of it under certain conditions. Type
"show copying" to see the conditions. There is
absolutely no warranty for GDB. Type "show warranty"
for details.
This GDB was configured as "i386-redhat-linux"...
(gdb)

Depending on which version of GDB you are running, this output may vary slightly. At this point, the program is loaded, but is not running yet. The debugger is waiting your command. To run your program, just type in run. This will not return, because the program is running in an infinite loop. To stop the program, hit control-c. The screen will then say this:

Starting program: /home/johnnyb/maximum

Program received signal SIGINT, Interrupt.
start_loop () at maximum.s:34
34 movl data_items(,%edi,4), %eax
Current language: auto; currently asm
(gdb)

This tells you that the program was interrupted by the SIGINT signal (from your control-c), and was within the section labelled start_loop, and was executing on line 34 when it stopped. It gives you the code that it is about to execute.

Depending on exactly when you hit control-c, it may have stopped on a different line or a different instruction than the example.

One of the best ways to find bugs in a program is to follow the flow of the program to see where it is branching incorrectly. To follow the flow of this program, keep on entering stepi (for "step instruction"), which will cause the computer to execute one instruction at a time. If you do this several times, your output will look something like this:

(gdb) stepi
35 cmpl %ebx, %eax
(gdb) stepi
36 jle start_loop
(gdb) stepi
32 cmpl $0, %eax
(gdb) stepi
33 je loop_exit
(gdb) stepi
34 movl data_items(,%edi,4), %eax
(gdb) stepi
35 cmpl %ebx, %eax
(gdb) stepi
36 jle start_loop
(gdb) step
32 cmpl $0, %eax

As you can tell, it has looped. In general, this is good, since we wrote it to loop. However, the problem is that it is never stopping. Therefore, to find out what the problem is, let's look at the point in our code where we should be exitting the loop:

cmpl  $0, %eax
je loop_exit

Basically, it is checking to see if %eax hits zero. If so, it should exit the loop. There are several things to check here. First of all, you may have left this piece out altogether. It is not uncommon for a programmer to forget to include a way to exit a loop. However, this is not the case here. Second, you should make sure that loop_exit actually is outside the loop. If we put the label in the wrong place, strange things would happen. However, again, this is not the case.

Neither of those potential problems are the culprit. So, the next option is that perhaps %eax has the wrong value. There are two ways to check the contents of register in GDB. The first one is the command info register. This will display the contents of all registers in hexadecimal. However, we are only interested in %eax at this point. To just display %eax we can do print/$eax to print it in hexadecimal, or do print/d $eax to print it in decimal. Notice that in GDB, registers are prefixed with dollar signs rather than percent signs. Your screen should have this on it:

(gdb) print/d $eax
$1 = 3
(gdb)

This means that the result of your first inquiry is 3. Every inquiry you make will be assigned a number prefixed with a dollar sign. Now, if you look back into the code, you will find that 3 is the first number in the list of numbers to search through. If you step through the loop a few more times, you will find that in every loop iteration %eax has the number 3. This is not what should be happening. %eax should go to the next value in the list in every iteration.

Okay, now we know that %eax is being loaded with the same value over and over again. Let's search to see where %eax is being loaded from. The line of code is this:

 movl data_items(,%edi,4), %eax

So, step until this line of code is ready to execute. Now, this code depends on two values - data_items and %edi. data_items is a symbol, and therefore constant. It's a good idea to check your source code to make sure the label is in front of the right data, but in our case it is. Therefore, we need to look at %edi. So, we need to print it out. It will look like this:

(gdb) print/d $edi
$2 = 0
(gdb)

This indicates that %edi is set to zero, which is why it keeps on loading the first element of the array. This should cause you to ask yourself two questions - what is the purpose of %edi, and how should its value be changed? To answer the first question, we just need to look in the comments. %edi is holding the current index of data_items. Since our search is a sequential search through the list of numbers in data_items, it would make sense that %edi should be incremented with every loop iteration.

Scanning the code, there is no code which alters %edi at all. Therefore, we should add a line to increment %edi at the beginning of every loop iteration. This happens to be exactly the line we tossed out at the beginning. Assembling, linking, and running the program again will show that it now works correctly.

Hopefully this exercise provided some insight into using GDB to help you find errors in your programs.

Breakpoints and Other GDB Features

The program we entered in the last section had an infinite loop, and could be easily stopped using control-c. Other programs may simply abort or finish with errors. In these cases, control-c doesn't help, because by the time you press control-c, the program is already finished. To fix this, you need to set breakpoints. A breakpoint is a place in the source code that you have marked to indicate to the debugger that it should stop the program when it hits that point.

To set breakpoints you have to set them up before you run the program. Before issuing the run command, you can set up breakpoints using the break command. For example, to break on line 27, issue the command break 27. Then, when the program crosses line 27, it will stop running, and print out the current line and instruction. You can then step through the program from that point and examine registers and memory. To look at the lines and line numbers of your program, you can simply use the command 1. This will print out your program with line numbers a screen at a time.

When dealing with functions, you can also break on the function names. For example, in the factorial program in Chapter 4, we could set a breakpoint for the factorial function by typing in break factorial. This will cause the debugger to break immediately after the function call and the function setup (it skips the pushing of %ebp and the copying of %esp).

When stepping through code, you often don't want to have to step through every instruction of every function. Well-tested functions are usually a waste of time to step through except on rare occasion. Therefore, if you use the nexti command instead of the stepi command, GDB will wait until completion of the function before going on. Otherwise, with stepi, GDB would step you through every instruction within every called function.


Warning

One problem that GDB has is with handling interrupts. Often times GDB will miss the instruction that immediately follows an interrupt. The instruction is actually executed, but GDB doesn't step through it. This should not be a problem - just be aware that it may happen.


GDB Quick-Reference

This quick-reference table is copyright 2002 Robert M. Dondero, Jr., and is used by permission in this book. Parameters listed in brackets are optional.

Table F-1: Common GDB Debugging Commands

Miscellaneous

quit

Exit GDB

help [cmd]

Print description of debugger command cmd. Without cmd, prints a list of topics.

directory [dir1] [dir2] ...

Add directories dir1, dir2, etc. to the list of directories searched for source files.

Running the Program

run [arg1] [arg2] ...

Run the program with command line arguments arg1, arg2, etc.

set args arg1 [arg2] ...

Set the program's command-line arguments to arg1, arg2, etc.

show args

Print the program's command-line arguments.

Using Breakpoints

info breakpoints

Print a list of all breakpoints and their numbers (breakpoint numbers are used for other breakpoint commands).

break linenum

Set a breakpoint at line number linenum.

break *addr

Set a breakpoint at memory address addr.

break fn

Set a breakpoint at the beginning of function fn.

condition bpnum expr

Break at breakpoint bpnum only if expression expr is non-zero.

command [bpnum] cmd1 [cmd2] ...

Execute commands cmd1, cmd2, etc. whenever breakpoint bpnum (or the current breakpoint) is hit.

Continue

Continue executing the program.

Kill

Stop executing the program.

delete [bpnum1] [bpnum2] ...

Delete breakpoints bpnuml, bpnum2, etc., or all breakpoints if none specified.

clear *addr

Clear the breakpoint at memory address addr.

clear [fn]

Clear the breakpoint at function fn, or the current breakpoint.

clear linenum

Clear the breakpoint at line number linenum.

disable [bpnum1] [bpnum2] ...

Disable breakpoints bpnum1, bpnum2, etc., or all breakpoints if none specified.

enable [bpnum1] [bpnum2] ...

Enable breakpoints bpnum1, bpnum2, etc., or all breakpoints if none specified.

Stepping through the Program

nexti

"Step over" the next instruction (doesn't follow function calls).

stepi

"Step into" the next instruction (follows function calls).

finish

"Step out" of the current function.

Examining Registers and Memory

info registers

Print the contents of all registers.

print/f $reg

Print the contents of register reg using format f. The format can be x (hexadecimal), u (unsigned decimal), o (octal), a(address), c (character), or f (floating point).

x/rsf addr

Print the contents of memory address addr using repeat count r, size s, and format f. Repeat count defaults to 1 if not specified. Size can be b (byte), h (halfword), w (word), or g (double word). Size defaults to word if not specified. Format is the same as for print, with the additions of s (string) and i (instruction).

info display

Shows a numbered list of expressions set up to display automatically at each break.

display/f $reg

At each break, print the contents of register reg using format f.

display/si addr

At each break, print the contents of memory address addr using size s (same options as for the x command).

display/ss addr

At each break, print the string of size s that begins in memory address addr.

undisplay displaynum

Remove displaynum from the display list.

Examining the Call Stack

where

Print the call stack.

backtrace

Print the call stack.

frame

Print the top of the call stack.

up

Move the context toward the bottom of the call stack.

down

Move the context toward the top of the call stack.



Appendix E: C Idioms in Assembly Language

This appendix is for C programmers learning assembly language. It is meant to give a general idea about how C constructs can be implemented in assembly language.

If Statement

In C, an if statement consists of three parts - the condition, the true branch, and the false branch. However, since assembly language is not a block structured language, you have to work a little to implement the block-like nature of C. For example, look at the following C code:

if (a == b)
{
/* True Branch Code Here */
}
else
{
/* False Branch Code Here */
}

/* At This Point, Reconverge */

In assembly language, this can be rendered as:

 #Move a and b into registers for comparison
movl a, %eax
movl b, %ebx

#Compare
cmpl %eax, %ebx

#If True, go to true branch
je true_branch
false_branch: #This label is unnecessary,
#only here for documentation
#False Branch Code Here

#Jump to recovergence point
jmp reconverge


true_branch:
#True Branch Code Here


reconverge:
#Both branches recoverge to this point

As you can see, since assembly language is linear, the blocks have to jump around each other. Recovergence is handled by the programmer, not the system.

A case statement is written just like a sequence of if statements.

Function Call

A function call in assembly language simply requires pushing the arguments to the function onto the stack in reverse order, and issuing a call instruction. After calling, the arguments are then popped back off of the stack. For example, consider the C code:

 printf("The number is %d", 88);

In assembly language, this would be rendered as:

 .section .data
text_string:
.ascii "The number is %d\0"
.section .text
pushl $88
pushl $text_string
call printf
popl %eax
popl %eax #%eax is just a dummy variable,
#nothing is actually being done
#with the value. You can also
#directly re-adjust %esp to the
#proper location.

Variables and Assignment

Global and static variables are declared using .data or .bss entries. Local variables are declared by reserving space on the stack at the beginning of the function. This space is given back at the end of the function.

Interestingly, global variables are accessed differently than local variables in assembly language. Global variables are accessed using direct addressing, while local variables are accessed using base pointer addressing. For example, consider the following C code:

int my_global_var;

int foo()
{
int my_local_var;

my_local_var = 1;
my_global_var = 2;

return 0;
}

This would be rendered in assembly language as:

 .section .data
.lcomm my_global_var, 4

.type foo, @function
foo:
pushl %ebp #Save old base pointer
movl %esp, $ebp #make stack pointer base pointer
subl $4, %esp #Make room for my_local_var
.equ my_local_var, -4 #Can now use my_local_var to
#find the local variable


movl $1, my_local_var(%ebp)
movl $2, my_global_var

movl %ebp, %esp #Clean up function and return
popl %ebp
ret

What may not be obvious is that accessing the global variable takes fewer machine cycles than accessing the local variable. However, that may not matter because the stack is more likely to be in physical memory (instead of swap) than the global variable is.

Also note that in the C programming language, after the compiler loads a value into a register, that value will likely stay in that register until that register is needed for something else. It may also move registers. For example, if you have a variable foo, it may start on the stack, but the compiler will eventually move it into registers for processing. If there aren't many variables in use, the value may simply stay in the register until it is needed again. Otherwise, when that register is needed for something else, the value, if it's changed, is copied back to its corresponding memory location. In C, you can use the keyword volatile to make sure all modifications and references to the variable are done to the memory location itself, rather than a register copy of it, in case other processes, threads, or hardware may be modifying the value while your function is running.

Loops

Loops work a lot like if statements in assembly language - the blocks are formed by jumping around. In C, a while loop consists of a loop body, and a test to determine whether or not it is time to exit the loop. A for loop is exactly the same, with optional initialization and counter-increment sections. These can simply be moved around to make a while loop.

In C, a while loop looks like this:

while(a < b)
{
/* Do stuff here */
}

/* Finished Looping */

This can be rendered in assembly language like this:

loop_begin:
movl a, %eax
movl b, %ebx
cmpl %eax, %ebx
jge loop_end

loop_body:
#Do stuff here

jmp loop_begin

loop_end:
#Finished looping

The x86 assembly language has some direct support for looping as well. The %ecx register can be used as a counter that ends with zero. The loop instruction will decrement %ecx and jump to a specified address unless %ecx is zero. For example, if you wanted to execute a statement 100 times, you would do this in C:

 for(i=0; i < 100; i++)
{
/* Do process here */
}

In assembly language it would be written like this:

loop_initialize:
movl $100, %ecx
loop_begin:
#
#Do Process Here
#

#Decrement %ecx and loops if not zero
loop loop_begin

rest_of_program:
#Continues on to here

One thing to notice is that the loop instruction requires you to be counting backwards to zero. If you need to count forwards or use another ending number, you should use the loop form which does not include the loop instruction.

For really tight loops of character string operations, there is also the rep instruction, but we will leave learning about that as an exercise to the reader.


Structs

Structs are simply descriptions of memory blocks. For example, in C you can say:

struct person {
char firstname[40];
char lastname[40];
int age;
};

This doesn't do anything by itself, except give you ways of intelligently using 84 bytes of data. You can do basically the same thing using .equ directives in assembly language. Like this:

 .equ PERSON_SIZE, 84
.equ PERSON_FIRSTNAME_OFFSET, 0
.equ PERSON_LASTNAME_OFFSET, 40
.equ PERSON_AGE_OFFSET, 80

When you declare a variable of this type, all you are doing is reserving 84 bytes of space. So, if you have this in C:

void foo()
{
struct person p;

/* Do stuff here */
}

In assembly language you would have:

foo:
#Standard header beginning
pushl %ebp
movl %esp, %ebp

#Reserve our local variable
subl $PERSON_SIZE, %esp
#This is the variable's offset from %ebp
.equ P_VAR, 0 - PERSON_SIZE

#Do Stuff Here

#Standard function ending
movl %ebp, %esp
popl %ebp
ret

To access structure members, you just have to use base pointer addressing with the offsets defined above. For example, in C you could set the person's age like this:

 p.age  =  30;

In assembly language it would look like this:

 movl $30, P_VAR + PERSON_AGE_OFFSET(%ebp)

Pointers

Pointers are very easy. Remember, pointers are simply the address that a value resides at. Let's start by taking a look at global variables. For example:

int global_data = 30;

In assembly language, this would be:

 .section .data
global_data:
.long 30

Taking the address of this data in C:

 a = &global_data;

Taking the address of this data in assembly language:

 movl $global_data, %eax

You see, with assembly language, you are almost always accessing memory through pointers. That's what direct addressing is. To get the pointer itself, you just have to go with immediate mode addressing.

Local variables are a little more difficult, but not much. Here is how you take the address of a local variable in C:

void foo()
{
int a;
int *b;

a = 30;

b = &a;

*b = 44;
}

The same code in assembly language:

foo:
#Standard opening
pushl %ebp
movl %esp, %ebp

#Reserve two words of memory
subl $8, $esp
.equ A_VAR, -4
.equ B_VAR, -8

#a = 30
movl $30, A_VAR(%ebp)

#b = &a
movl $A_VAR, B_VAR(%ebp)
addl %ebp, B_VAR(%ebp)

#*b = 30
movl B_VAR(%ebp), %eax
movl $30, (%eax)

#Standard closing
movl %ebp, %esp
popl %ebp
ret

As you can see, to take the address of a local variable, the address has to be computed the same way the computer computes the addresses in base pointer addressing. There is an easier way - the processor provides the instruction leal, which stands for "load effective address". This lets the computer compute the address, and then load it wherever you want. So, we could just say:

 #b = &a
leal A_VAR(%ebp), %eax
movl %eax, B_VAR(%ebp)

It's the same number of lines, but a little cleaner. Then, to use this value, you simply have to move it to a general-purpose register and use indirect addressing, as shown in the example above.

Getting GCC to Help

One of the nice things about GCC is its ability to spit out assembly language code. To convert a C language file to assembly, you can simply do:

gcc -S file.c

The output will be in file.s. It's not the most readable output - most of the variable names have been removed and replaced either with numeric stack locations or references to automatically-generated labels. To start with, you probably want to turn off optimizations with -O0 so that the assembly language output will follow your source code better.

Something else you might notice is that GCC reserves more stack space for local variables than we do, and then AND's %esp [1] This is to increase memory and cache efficiency by double-word aligning variables.

Finally, at the end of functions, we usually do the following instructions to clean up the stack before issuing a ret instruction:

 movl %ebp, %esp
popl %ebp

However, GCC output will usually just include the instruction leave. This instruction is simply the combination of the above two instructions. We do not use leave in this text because we want to be clear about exactly what is happening at the processor level.

I encourage you to take a C program you have written and compile it to assembly language and trace the logic. Then, add in optimizations and try again. See how the compiler chose to rearrange your program to be more optimized, and try to figure out why it chose the arrangement and instructions it did.

[1]Note that different versions of GCC do this differently.




Appendix D: Table of ASCII Codes

To use this table, simply find the character or escape that you want the code for, and add the number on the left and the top.

Table D-1: Table of ASCII codes in decimal

+0

+1

+2

+3

+4

+5

+6

+7

0

NUL

SOH

STX

ETX

EOT

ENQ

ACK

BEL

8

BS

HT

LF

VT

FF

CR

SO

SI

16

DLE

DC1

DC2

DC3

DC4

NAK

SYN

ETB

24

CAN

EM

SUB

ESC

FS

GS

RS

US

32

!

"

#

$

%

&

'

40

(

)

*

+

,

-

.

/

48

0

1

2

3

4

5

6

7

56

8

9

:

;

<

=

>

?

64

@

A

B

C

D

E

F

G

72

H

I

J

K

L

M

N

O

80

P

Q

R

S

T

U

V

W

88

X

Y

Z

[

\

]

^

_

96

'

a

b

c

d

e

f

g

104

h

i

j

k

l

m

n

o

112

p

q

r

s

t

u

v

w

120

x

y

z

{

|

}

~

DEL

ASCII is actually being phased out in favor of an international standard known as Unicode, which allows you to display any character from any known writing system in the world. As you may have noticed, ASCII only has support for English characters. Unicode is much more complicated, however, because it requires more than one byte to encode a single character. There are several different methods for encoding Unicode characters. The most common is UTF-8 and UTF-32. UTF-8 is somewhat backwards-compatible with ASCII (it is stored the same for English characters, but expands into multiple byte for international characters). UTF-32 simply requires four bytes for each character rather than one. Windows® uses UTF-16, which is a variable-length encoding which requires at least 2 bytes per character, so it is not backwards-compatible with ASCII.

A good tutorial on internationalization issues, fonts, and Unicode is available in a great Article by Joe Spolsky, called "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)", available online at http://www.joelonsoftware.com/articles/Unicode.html

Appendix C: Important System Calls

These are some of the more important system calls to use when dealing with Linux. For most cases, however, it is best to use library functions rather than direct system calls, because the system calls were designed to be minimalistic while the library functions were designed to be easy to program with. For information about the Linux C library, see the manual at http://www.gnu.org/software/libc/manual/

Remember that %eax holds the system call numbers, and that the return values and error codes are also stored in %eax.

Table C-1: Important Linux System Calls

%eax

Name

%ebx

%ecx

%edx

Notes

1

exit

return value (int)

Exits the program

3

read

file descriptor

buffer start

buffer size (int)

Reads into the given buffer

4

write

file descriptor

buffer start

buffer size (int)

Writes the buffer to the file descriptor

5

open

null-terminate file name

option list

permission mode

Opens the given file. Returns the file descriptor or an error number.

6

close

file descriptor

Closes the give file descriptor

12

chdir

null-terminated directory name

Changes the current directory of your program.

19

lseek

file descriptor

offset

mode

Repositions where you are in the given file. The mode (called the "whence") should be 0 for absolute positioning, and 1 for relative positioning.

20

getpid

Returns the process ID of the current process.

39

mkdir

null-terminated directory name

permission mode

Creates the given directory. Assumes all directories leading up to it already exist.

40

rmdir

null-terminated directory name

Removes the given directory.

41

dup

file descriptor

Returns a new file descriptor pat works just like the existing file descriptor.

42

pipe

pipe array

Creates two file descriptors, where writing on one produces data to read on the other and vice-versa. %ebx is a pointer to two words of storage to hold the file descriptors.

45

brk

new system break

Sets the system break (i.e. -the end of the data section). If the system break is 0, it simply returns the current system break.

54

ioctl

file descriptor

request

arguments

This is used to set parameters on device files. Its actual usage varies based on the type of file or device your descriptor references.

A more complete listing of system calls, along with additional information is available at http://www.lxhp.in-berlin.de/lhpsyscal.html You can also get more information about a system call by typing in man 2 SYSCALLNAME which will return you the information about the system call from section 2 of the UNIX manual. However, this refers to the usage of the system call from the C programming language, and may or may not be directly helpful.

For information on how system calls are implemented on Linux, see the Linux Kernel 2.4 Internals section on how system calls are implemented at http://www.faqs.org/docs/kernel_2_4/lki-2.html#ss2.11