CS 107: Programming Paradigms
3.9 (86 ratings)
29,720 students enrolled
Published 3/2010
Price: Free
  • 1 Article
  • 57 Supplemental Resources

Programming Paradigms (CS107) introduces several programming languages, including C, Assembly, C++, Concurrent Programming, Scheme, and Python. The class aims to teach students how to write code for each of these individual languages and to understand the programming paradigms behind these languages. Start learning the programming paradigms now.

Curriculum For This Course
85 Lectures
1 Lecture 02:18
CS107 Handout 01, Spring 2008, April 2, 2008. CS107 Course Information. Instructor: Jerry Cain. Prereqs: The prerequisite for the class is programming and problem solving at the
Lectures and Assignments
84 Lectures 00:00


Topics: Administrative Details, Exams - Time limit, Conflicts, Course Grade Breakdown, Assignment Details - Submission, Grading, Late Days, Course Email, Newsgroup, Facebook/Twitter, Mailing List, Course Prerequisites, Languages and Paradigms Taught - C++ vs. Pure C, Procedural Paradigm vs. Object-Oriented Paradigm, Assembly, Concurrent Programming Overview, Example of Data Sharing Issues with Concurrent Programming, Scheme, Functional Paradigm Overview, Python Overview, Benefits and Common Uses


Instructor (Jerry Cain): For 240, now there won’t be.

Student: I can sit in [inaudible].

Instructor (Jerry Cain): Oh, that’s fine. Hey, everyone, welcome, we are live on television. So this is cs107. My name is Jerry Cain. I have two handouts for you today. They’re the type of handouts that you expect to get on the first day of a course. As far as pertinent information about myself, my life synopsis is on the board right here. There’s my email address, those are my office hours, and that is my office upstairs, more or less directly above this room. By far, the best way to get in touch with me is email via that address. Certainly if you have a question that you need to ask me specifically, then email is great. I’m very good about checking email. I’m the type of person that just clicks on send and receive every three seconds, so I know when your email comes in and I’m always very good about responding. The office hours are early in the mornings on Mondays and Wednesdays. I know that’s not the best time in the world, but for various reasons that’s the only time I can really have them. In general, if I’m in my office you are more than welcome to stop by, and you’re always welcome to call my office to see if I’m in during some other time in the week. Okay. I’ll try to make it a point to just casually be around the office as much as possible, but Monday and Wednesday, 9:00 to 10:30, is really the official time when I’m around. Okay. I have a good number of staff members; I can have them just wave, you don’t need to stand up, just wave, you can wave a little bit more enthusiastically, thank you. Okay. That actually only represents about a third of what the staff will be. Some people have a conflict today so they’re not here. And I actually only figured out last night how large this class is. We expected 130 to 140 students, and I just checked at 10:55 that we have 241 signed up for the course, which is the largest that cs107 has ever been. It’s really weird having this many people come and take an interest and listen when you talk.

I feel a little bit like Barack Obama. I’m delighted that this many people are taking it, and so I certainly expect to have on the order of 15 to 18 TAs and undergraduate section leaders working, and they’re very pertinent to the course. I almost require – in fact, it is the case this quarter – that every single person I’m gonna hire has taken 107 either as an undergraduate or as a first-year graduate student, so they are great resources for questions and their office hours. With very little exception, they’ve all done all the various assignments that you’ll be doing this quarter, so they already know what your questions are gonna be. Okay. As far as their office hours are concerned, they will have office hours and they’ll rotate through a grid of evening hours at the main computer cluster where you’ll be expected, or at least invited, to do all of your online programming work. Okay. Those will start next week. You’re not gonna have an assignment until this Friday, and I make that one as easy as possible as far as the coding is concerned because you have to get used to Unix and this new development environment. A lot of you have probably never been there before, so I go soft on you on assignment one so you can actually not be intimidated by the whole Unix development process. We do have a discussion section. We had scheduled one for tomorrow – I’m not even gonna say the time because I don’t want to confuse people – but it turned out that it was terrible for three of my four head TAs so I’m in the process of rescheduling that. It’s not gonna be on Thursday, it’s probably gonna be on Tuesdays. That’s what I’m shooting for now. At the time I had to take these handouts to press I just didn’t have a time yet, but I will announce it, certainly in lecture on Friday and probably prior to that via the emailing list.
You’re not required to go to discussion section. It is gonna be televised just like the lectures are, and there will always be section handouts and section solutions, so even if you just don’t want to bother watching it, not that I’m recommending that, you certainly have the resources to figure out what they went over during that section, because I’ll be very good about putting up section handouts and section solutions so you know exactly what topics they were discussing. Okay. Whatever the time is next week, it’ll probably be Tuesday sometime in the afternoon.

If you go to one section all quarter, and go to one live section all quarter, next week’s first section is probably the one to go to if you’ve never really dealt with Unix before. They’ll invest some energy coming up with some slides and some examples to show you what it’s like to open an [inaudible] in Emacs and to actually code there, and then to use a command line to compile the program and see how you actually compile and execute and debug programs in a Unix world. Okay. And it’s a little weird at first. You kind of hate it for three or four days, then you start to like it because it’s really lightweight and very fast. So that’s another reason why I make the first assignment less about coding and more about just getting used to Unix. Okay. Not next week, but the week after, when we start having discussion sections again, and having more of them, that’ll adopt a more traditional format where we bring section problems and we talk and everybody raises their hand and asks questions and things like that. Okay. As far as reading material is concerned, in the handout I specify three books that I think are really great books. Two of them are on C++; they’re actually quite advanced books. There’s also one book on a language called Scheme, which I’ll talk about in a second. These are by no means required, and I don’t even want to say that I recommend them, in the sense that for those who are really fastidious about going and buying every book they think is gonna help them, I’m not really even saying that you should go buy them. I just want them to be on your radar so that if, after you get through the C and C++ segment of the course, you can see yourself doing more serious C++ programming later on, then you might want to go and get these books. But don’t spend money on books yet, because this class has more or less been taught with the same type of structure over the past 15 years, so we’ve compiled pages and pages of handouts. This is just the tip of the iceberg.
Anything you’re gonna be responsible for is covered either in lecture or in the handouts. I’m not gonna have any external reading from textbooks, so just worry about the handouts. The handouts are very polished. I think they’re in really good shape. You’ll average one to two handouts every single lecture and probably Friday and Monday, you’re gonna get four or five each day. As far as the exams are concerned – let me talk a little bit about that. Midterm, there is one, and it is Wednesday, May 7. Okay. I schedule it for 7:00 to 10:00 P.M. People roll their eyes when they see a three-hour window set aside for a midterm.

I’m not gonna write the midterm with the intent of you needing every single minute of those three hours. I genuinely make an effort to make the exam so that it can be completed, patiently, in 60 to 90 minutes, okay, and occasionally, I’m off. When that happens, you have this buffer of an extra hour and a half to really work through it. Okay. I just give people three hours with the intent of giving the illusion of an infinite amount of time so that they don’t feel that they’re pressured to get through it. That may sound like I’m being really nice, but I’m actually being slightly cruel by giving you three hours, because if you do badly on it, then you have one less reason to blame me. You actually have to work through the problems and you really have to know the material, and you can’t blame time pressure as being a factor as to why the exam didn’t go well. So that’s why I want to give people as much time as I can reasonably give them without going into 2:00 A.M., 3:00 A.M. Okay. I recognize that because this is scheduled outside of lecture, many of you may have classes, orchestra meets at night, people have various activities they have to do at night. I don’t want to displace you from those just because I decided to have my exam outside of lecture, so if you can’t make that time, that’s fine, just let me know soon and plan on taking the exam during some three-hour window that fits between 10:00 A.M. and 6:00 P.M. Okay. So for most people that works. If that’s not gonna work out, then let me know now. I’ll make sure you take the exam sometime, but I’ll be around, more or less, or TAs will be, more or less all day Wednesday to make sure that you can take an exam when you need to. I’ll accommodate any reasonable requests. As far as the final is concerned, I offer the final exam twice. Our dedicated slot is Monday, June 9 and the official time is 8:30 A.M.
Now, because we’re on TV and because Stanford students – there’s just not that many times to offer classes, a lot of people have conflicts, a lot of people are taking two classes, sometimes even three classes Monday, Wednesday, Friday at 11:00, and if that’s the case, then you have two or three finals on Monday, June 9th at 8:30. If you want to take all three then, you’re more than welcome to, but I assume you’re not gonna want to do that so if you can’t make this time, it is fine, provided you can make an alternate slot of 3:30 P.M. that day. If you cannot make either of those times, please let me know now so I can take care of that.

As far as the breakdown of assignments versus exams, this is how it all pieces together in the end. You have assignments, you have the midterm, you have the final. The assignments contribute 40 percent of your grade. The midterm is the least of the three and the rest is the final. Okay. A pretty sizable chunk is dedicated to the assignments. People, in general, do very well on the assignments. They might struggle to figure out what kind of things we’re expecting on assignment one, but eventually, people figure out what I’m expecting and they get the programs to work and they write nice clean code, so we see a lot of A’s and A-s on the programs. We just do. And I imagine 80 percent of you will be getting A’s and A-s on some of the later programs because you’ll just start figuring things out. And that’s not to say that grades will be much lower prior to that, but just toward the end everybody seems to be doing very well on the assignments. There’s probably gonna be six or seven assignments. One of them will not be handed in; it’ll be a written problem set where the deadline will, more or less, be at the time you take the midterm, because it’ll be a series of written problems that serve as practice for the midterm. You’re certainly responsible for that, but all the other assignments take a programming assignment approach where you actually code up stuff and you electronically submit it, and one of the section leaders or the TAs grades your work and emails you back feedback. I don’t need any paper copies at all. Everything is dealt with electronically. We’re gonna really put a lot of effort into trying to turn the assignments around very, very quickly. In order to do that, I do allow you to turn work in late. I have the same type of late day policy that 106b does.

I actually give you the ability to grant yourself little extensions in 24-hour units. I give you five of those little days. And if you want, you can use one or two or all five late days on any particular assignment, but after five days, whether using free late days or extra ones that actually cost you a little bit, I need all the assignments within five days of the deadline so that I know that the TAs can start grading them and crank out grades and crank out feedback and get them back to you. Does that all make sense to everybody? Okay. As far as the midterm and final go, I actually try to make this – I don’t want to say that I make it easy. I think, of the two exams, the final is a little bit more difficult, because I think the midterm actually deals with the more difficult material, so I go soft on you here knowing that I’m gonna revisit some of that material on the final. Okay. Now, when I say “soft,” I don’t mean easy, I just mean softer, comparatively, than I would normally go. But nonetheless, that’s why I have the breakdown of 25 to 35 – really, all the material is equally represented on the final exam. I’m trying to think what else is going on. Oh, of course, as far as resources for keeping in touch with us, if you have a question specifically for me, then by all means email that address right there, but if you’re asking a question about an assignment or lecture material that you know or suspect any section leader or TA would be able to answer, then you’re gonna benefit by sending mail to this address: cs107@cs.stanford.edu. That’s precisely the address that you would assume would be attached to a staff emailing list for this class. When you do that, you’re sending mail to an account that all 15 or 18 or 19 of us poll on a fairly regular basis, and so you’re that much more likely to get a quick response. Maybe not so much on a Saturday morning, but certainly on a weeknight when everybody’s coding and deadlines are approaching.
We’re very good about polling that and getting responses out as quickly as possible. Okay. If you don’t hear back from this within, like say, 12 or 24 hours – let’s say 12 hours on a weekday and 24 hours on the weekend – then it’s okay to email me directly and say, “Hey, I’m not getting my answers, can you answer this question for me?” But be patient with that. We’re actually very good – very often, we get back to people within minutes. A lot of times the TAs are sitting in front of the computer right next to you in the Terman cluster where you’re working, you just don’t know what they look like, so they respond to your question very, very quickly. I’m also gonna try to experiment – I have no idea what I’m planning to do with this, but I have a Facebook group. It’s for cs107 – the URL is in the handout. You do have to be a member of Facebook.

I’m not trying to evangelize Facebook just because I work there as well, but if you have a Facebook account already and want to go join this Facebook group, just visit the URL that’s posted in the handout or just do a search for cs107 and I’m sure it’ll come up. I also have a Twitter account for 107. If you want to follow cs107 – I think this is an interesting idea because I could send little announcements, these very lightweight, noninvasive announcements, to everybody who’s paying attention to cs107 on Twitter saying, “Webpage has been updated” or “exam cancelled,” or something like that. Okay. And you can all follow what’s going on there. I don’t want you to think that you have to pay attention to these. Anything that’s truly important about assignment deadlines or things that need clarification, I will actually use the mailing list and post to it. That will be the official forum for things that really matter, but these are intended to be lighthearted. As far as that mailing list is concerned, I have 241 people signed up for the course already. That means that most of you, if not all of you, are automatically enrolled in the mailing list for the course, so when I send mail to cs107-spr0708-students@list.stanford.edu, you automatically get that email provided you’re registered for the class. If you haven’t registered for the class yet, let me pester you a little bit to go and sign up for the class even if you’re not sure if you’re gonna take it, because I probably send more announcements in the first week of the course than I do anytime later on, because just a lot has to happen and we have to ramp up a lot in the first few days, so I’m certainly gonna send an email out as soon as I find out when section time is. A lot of you who have signed up for the course may not realize this, but sometimes you can inadvertently set some privacy settings so that your email is not put on that list – do you know what I’m talking about? Okay.
So go to Stanfordu.com or u.stanford.com, I forget what it is, or wdu, where you just go make sure that your email is public just to the Stanford community so that you can get on that emailing list. If you have a problem with that, I understand that you don’t want your email address advertised, but then just make it a point to email me and say that you’re not on that list so I can put your email address on some sublist that I always include with the announcements. Does that make sense to everybody? Let me just talk a little bit about the syllabus. Let’s make sure there’s nothing in here. Is there anyone in the room who has not taken 106b or 106x? A couple people, I’m assuming, okay. I’m just taking a pulse on this. I don’t necessarily require 106x or 106b, specifically, but I do kind of expect that you’ve done some C++ programming, that the notion of a linked list or a hash table is familiar to you, a binary search tree, a function pointer, that all of those things that are covered in our 106 courses are familiar to you. If that isn’t the case or you don’t know C++ all that well, then I’m a little worried because the first half of 107 is all about advanced C and C++, so if you don’t have the basics down or you haven’t been exposed to that stuff yet, it makes the first two weeks of the course, which is normally actually quite fun because you’re learning all about the inside, under-the-hood machinery of the C language, pretty miserable unless you actually anticipate spending a little bit more time getting up and running. So if you don’t know C or C++ and you’ve never written an assignment for a course in C or C++, then talk to me after lecture so I can kind of gauge as to whether 106b or 107 is better for you. Okay.

Here is, in a nutshell, a reduced version of what’s presented in handout two. This is the syllabus. I have several languages that I’m gonna put on the board and one concept. C, Assembly, C++, Concurrent programming – that’s not a language, that’s just a paradigm – Scheme, and then, it is official, I’m actually gonna cover Python from now on in 107. People look at this and they go, “Wow, I’m gonna be able to legitimately put all of these languages on my resume,” in a separate list at the bottom, and it feels really good. It isn’t so much about that. Certainly we want to give you some mileage in some very relevant languages that are very good for both research and for industry, but the real intellectual value in learning all of these languages is to really study the paradigms that they represent, and I’ll explain what that means in a second. I think a lot of C++ programmers really program in C and just incidentally use the objects and the classes that are available to them, okay, which is a perfectly reasonable way to program. Most people learned how to program in C, at least the people I know in the industry know C very, very well, and in spite of the fact that there’s 50 million newer languages that are better in many regards, they still stick with what they know, and that’s why C and C++ are still such popular languages. There’s nothing wrong with programming in C if you know it very well and you write clean, readable code, it’s just more difficult. I don’t really care so much about teaching you how to program in assembly. I use it as a vehicle for showing you how C and C++ programs compile to .o files, to object files and binaries that become executables, and to show you how a line like i = 7 or j++ or foo(x, y) as a function call, how that all translates to assembly code. Okay. Does that make sense? You know when you write C++ code that when you execute the program, it’s not C++ anymore, it’s assembly code.
It’s all zeros and ones eventually. I want to give you some insight as to how C is translated to assembly code, how all these variables and your functions and your objects and all of that, eventually, get boiled down and hashed down to a collection of zeros and ones, and I want to do a little bit of the same thing for C++. It turns out that, while C++ and C represent different paradigms, they both really compile to zeros and ones, and after you get enough experience with this assembly and the manual compilation process that we’re gonna learn about, and how to look at C code and figure out what the assembly code would look like, you’re really gonna see that C++ and C almost look like the same language as far as the zeros and ones are concerned. I’m gonna be able to do something like **&**P arrow ***=7 and you’re gonna know exactly what it means. Okay.

So it takes a little bit of work and it’s almost laughable how arbitrary you can be with formulas, but if it compiles, it means something, so when it runs it actually does something. It’s probably not good if you have a lot of asterisks and ampersands, but nonetheless, you can have some idea as to why it’s crashing, not just that it is crashing. Okay. I spend a good amount of energy talking about concurrent programming. We actually, at the moment, do that type of programming in C, but all the programs you’ve written in the past two quarters, if you’ve just taken the 106A and 106b courses here or 106x, all the programs you’ve written at Stanford prior to 107 have been sequential programs. That means that whether it’s object oriented or procedurally oriented, you have this outline of steps that happen one after another. Nothing is done in parallel or pipelined or done side-by-side; everything happens in one clean stream of instructions. Okay. Well, what concurrent programming is about is, within a single program, trying to get two functions to seemingly run simultaneously. If you can get two functions to seemingly run simultaneously then you can extend that and get 10 functions to run simultaneously or 20 functions to run simultaneously, or seemingly simultaneously. I say it that way because, technically, they don’t run at the same time. When I go over assembly code, and I think you can intuit enough about what assembly code is like, but if you have this one function, okay, my hand is a function, this hand is another function, okay, and you concern yourself with the execution of one of them, then when I do this you can just think about it reading the code and executing it for side effects. Does that make sense to people?
When you deal with concurrent programming you have two or more functions, just two right here because I only have two hands, to do this, and they both seemingly run at the same time, but what really happens is it’s like watching two movies at the same time where – because there’s only one processor on those machines, it doesn’t really run like this, it runs like this and switches back and forth between the two functions, but it happens so fast that you just can’t see the difference. Okay. It’s more than 24 times per second, it’s, like, a million. There are a lot of situations where concurrent programming is not very useful, but there are several situations, particularly networking, whenever that’s involved, where concurrent programming is actually very useful.
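
The back-and-forth switching described above can be sketched in Python, one of the languages on the syllabus. This is only a simulation under stated assumptions: the names `count_up` and `run_round_robin` are hypothetical, generators stand in for the two functions, and a simple round-robin loop stands in for whatever a real scheduler does. No real threads are involved, which keeps the interleaving repeatable.

```python
from collections import deque


def count_up(name, steps):
    """A 'function' that does its work one small step at a time (hypothetical)."""
    for i in range(1, steps + 1):
        # Each yield marks a point where the single processor may switch away.
        yield f"{name} step {i}"


def run_round_robin(tasks):
    """Run every task one step at a time, switching to the next task after each step."""
    trace = []
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))  # execute exactly one step of this task
            queue.append(task)        # not finished yet: go to the back of the line
        except StopIteration:
            pass                      # finished: drop it from the queue
    return trace


trace = run_round_robin([count_up("left hand", 3), count_up("right hand", 2)])
for line in trace:
    print(line)
```

Only one step ever runs at a time, yet the printed trace alternates between the two functions, which is the sense in which they "seemingly" run simultaneously.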

There are some problems that come up when you deal with concurrent programming that you might not think about. The example I always go over the first day of class is one that uses two Wells Fargo ATM machines. Okay. Think about you have a Wells Fargo checking account – you may not have to think about it because you probably do – so just imagine your checking account is in danger because two people are using ATM machines, and you have $100 in it, and you share your PIN with your best friend, and you go up to neighboring ATM machines and you make as much progress as possible to withdraw that $100, and then you both, on the count of three, press okay to try and get $200 collected. Does that make sense? That is a sensible example because both of those machines are basically very simple computers, okay, that ultimately need to access the same master account balance to confirm that $100 is available, and in this transactional – transactional makes sense both in terms of money and also in the sense of concurrent programming – you have to make sure that the $100, being identified as the account balance, is maintained with some atomic flavor so that if you have two people trying to withdraw $100 at the same time, only one person gets away with it. That $100 account balance is the shared resource that two different processes have access to. Does that make sense to people? Okay. So there have to be directives that are put in place to make sure that the account balance check and the withdrawal are basically done either not at all or in full, so that it really does function, truly, as a transaction, both in the finance and the concurrent programming sense. Okay. As far as Scheme and Python are concerned, once we get through concurrent programming we really switch gears and we start looking at this language called Scheme. You may not have heard of this.
If you haven’t heard of this, you may have heard of a language called LISP, which it’s certainly related to. This is a representative of what is called the functional paradigm. Okay. There’s two things about Scheme and functional languages – purely functional languages that are interesting in contrast to C and C++. When you program using the functional paradigm you always rely on the return value of a function to move forward. Okay. And you program without side effects. Now, that’s a very weird thing to hear as an expression when you’ve only coded for a year, but when you code in C and C++, it’s very often all about side effects.
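
The ATM scenario from the concurrent programming discussion can be sketched as follows. It is a minimal illustration, not how a bank does it: the `Account` class and the `atm` function are hypothetical names, and a `threading.Lock` is one assumed way to make the balance check and the withdrawal happen as a single all-or-nothing step.

```python
import threading


class Account:
    """A shared account balance guarded by a lock (hypothetical example class)."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The check and the update sit inside one locked region, so no other
        # thread can slip in between "is the money there?" and "take the money".
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


account = Account(100)
results = []


def atm(amount):
    """One ATM attempting a withdrawal and recording whether it succeeded."""
    results.append(account.withdraw(amount))


# Two ATMs press "okay" at roughly the same moment.
machines = [threading.Thread(target=atm, args=(100,)) for _ in range(2)]
for machine in machines:
    machine.start()
for machine in machines:
    machine.join()

print(sorted(results), account.balance)
```

Whichever thread reaches the lock first gets the $100; the other sees a balance of zero and is refused. Without the lock, both threads could pass the balance check before either one subtracts.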

The return value doesn’t always tell you very much. It’s a number or it’s a Boolean, but when you pass in a data structure by reference to a function and you update it so that when the function returns, the original data structure has been changed, right, does that make sense? That’s programming by side effects. Well, the idea with Scheme and particularly the functional paradigm is that you don’t program with side effects. You don’t have that at all. You always synthesize the results or partial results that become larger partial results that eventually become the result that you’re interested in, and only then are you allowed to print it to the screen to see what the answer to the problem was. Okay. It’s very difficult to explain Scheme if you’ve never seen it before in a five-minute segment of a full introduction, but when you get there, we have tons of examples of the paradigm. It’s a very fun, neat little language to work in. This language called Python – I’m suspecting that most people have heard of it even if they’ve never seen it. It seems to be the rage language at a lot of significant companies in the Bay Area. They’re very smart people at these companies, so when they use a language and they like it, there’s usually a very good reason for them liking it. You’ve probably heard of a language called Perl. Okay. It’s not a very pretty language. You can just think about, in some sense, Python being a more modern, object-oriented version of Perl. Okay. Now, if you don’t know Perl and you don’t know Python it doesn’t mean anything to you, but just understand that it’s this sexy little language that’s been around for probably 16, 17 years and that’s really established itself as a popular language since the year 2000, 2001. I know a lot of people who work at Google that program in Python on a daily basis. There’s a subset of us at Facebook that program in Python every day. It actually has a lot of good libraries for dealing with web programming.
Web programming can seem really boring to a lot of people because it just seems like it’s just HTML and web pages and things like that. Real web programming is more sophisticated than that. You dynamically generate web pages based on content and databases and things like that, and Python, being a scripting language, which means it’s interpreted and you can type in stuff as you go, and it recognizes and reads and executes the stuff as you type, it’s very good for that type of thing, and if all goes well, meaning I have time to develop this assignment idea I have, you’re gonna write a little miniature dynamic web server in Python for your final project. It won’t be that sophisticated. You’re not gonna write all of Apache, but you are gonna probably write some little thing where you really do have a web server behind the scenes making decisions about how to construct an HTML page and serve that over to a client. So it’ll be an opportunity to learn Python, to learn with libraries, to see that as a language, because it’s fairly young, it has the advantage of not bothering to include C and C++’s and Java’s mistakes. It says, “No, I’ll leave that part out and I’ll go with this more interesting, well-formed core.” It has great libraries, it has object orientation, you can program procedurally if you want to and just program like you’re in C, but using Python syntax.
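
To give a rough feel for what "a web server behind the scenes making decisions about how to construct an HTML page" can look like, here is a minimal sketch built on Python's standard `http.server` module. It is not the course's assignment: `build_page`, `DynamicHandler`, `make_server`, and the port number 8000 are all assumptions made for illustration.

```python
from datetime import datetime
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_page(path, now):
    """Construct an HTML page from the requested path and the current time."""
    return (
        "<html><body>"
        f"<h1>You asked for {escape(path)}</h1>"
        f"<p>Generated at {now:%Y-%m-%d %H:%M:%S}</p>"
        "</body></html>"
    )


class DynamicHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a page generated on the spot."""

    def do_GET(self):
        body = build_page(self.path, datetime.now()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create (but do not start) a server bound to localhost on the given port."""
    return HTTPServer(("localhost", port), DynamicHandler)
```

Calling `make_server().serve_forever()` and visiting `http://localhost:8000/anything` in a browser would show a page whose content depends on the path and the time of the request. Keeping `build_page` separate from the handler means the page logic can be exercised without opening a socket.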

There are even functional programming aspects in the language, so even though the syntax is different from Scheme, conceptually, you can use Scheme-like ideas in Python coding if you want to. Okay. I’ll be able to illustrate the client-server paradigm and how it’s different from traditional programming. That’s not so much a Python thing, but Python is a good vehicle for learning that stuff. There are a few other paradigms that aren’t represented here, but I think I really cover all the ones that you’re likely to see for the next 15 years if you’re a coder. Okay. There are a couple of the languages that I may briefly mention the last night that are just fun, but they all have some overlap with some languages represented right here. Okay. You guys are good? Okay. So I don’t like starting in on any real material when I only have 10 minutes left, so I’m actually gonna let you go, but recognize that Friday I’m gonna have tons of handouts for you, I’m gonna have an assignment, okay, we’re gonna dive right into the low-level pointer stuff of C and C++. Okay. So have a good week.
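
The contrast the lecture draws between programming by side effects and the functional style can be illustrated in Python, since it supports both. This is a small sketch with hypothetical function names (`scale_in_place`, `scaled`, `sum_of_squares`), not an example taken from the course handouts.

```python
from functools import reduce


def scale_in_place(numbers, factor):
    """Side-effect style: change the caller's list; nothing useful is returned."""
    for i in range(len(numbers)):
        numbers[i] *= factor


def scaled(numbers, factor):
    """Functional style: leave the input untouched and return a new list."""
    return [n * factor for n in numbers]


def sum_of_squares(numbers):
    """Build the answer from partial results using map and reduce."""
    squares = map(lambda n: n * n, numbers)
    return reduce(lambda total, square: total + square, squares, 0)


data = [1, 2, 3]
scale_in_place(data, 10)       # data itself has now been modified

original = [1, 2, 3]
result = scaled(original, 10)  # original is unchanged; the answer is the return value
total = sum_of_squares(original)

print(data, original, result, total)
```

In the first call the caller learns what happened only by looking at `data` afterwards. In the other two, everything of interest comes back as a return value, which is the habit the functional paradigm enforces.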



Lecture 1 Programming Paradigms

Topics: C/C++ Data Types - Interpretations, Sizes, Bits - How Bytes are Broken Up into Bits, Breaking Up a Character's Decimal Value into its Underlying Bit Structure, Shorts - Interpreting Data that Consists of More Than One Byte, Representations of Negative Numbers, The Sign Bit, Two's Complement Addition, Converting Between Chars and Shorts, How the Bit Representation is Transferred, Converting Between Ints and Shorts, Sign Extending During Conversion, Floats, Converting Between Integers and Floats



Instructor (Jerry Cain):Hi, everyone. Welcome. I have four super handouts for you today. If you haven't gotten them yet, feel free to just sit down. We're gonna probably make it a point because there's so many people in the class to just hand them out while I start lecturing. This way we don't have this big bottleneck of people trying to get in by 11:00. The four handouts are posted to the web page. The mailing lists were created last night. And I just looked at it this morning, and there were 245 email addresses on it. So it looks like it's working. I haven't sent anything to the email list yet, but I will just contrive a message later this afternoon, and send it to everybody. And if you don't get that by Monday morning, when I make an announcement saying, "If you didn't get that message let me know," then I'll investigate as to why you're not on it. SCPD students, I'm not sure that you're actually on the mailing list yet. That system runs a little bit differently, and usually they push your email address onto the mailing list a little bit later. I'm not sure why, but – so if you don't get an email over the course of the weekend, then just let me know. And I'll see what I can do to fix it. I'll also post announcements to the web page so that you at least can get them. What I want to do is I want to start talking about the low-level memory mechanics, so that you understand how data – things as simple as Booleans and integers and floating-point numbers and structs and classes – are all represented in memory. It's very interesting, I think, to understand how everything ultimately gets represented as a collection of zeros and ones. And how it's faithfully interpreted every single time to be that capital A, or that number seven, or pi, or some struct that represents a fraction, things like that. And we'll just become much, much better C and C++ programmers as a result of just understanding things at this low of a level.

So, for the moment, C and C++ are the same language to me. So let's just talk about this. Let me just put a little grid up here of all the data types that you've probably dealt with. You've probably dealt with bool. You've probably dealt with char. I'm sure you have. You may not have dealt with the short, but I'll put it up there anyway. You've certainly dealt with the int. You've certainly – well, maybe not dealt with the long, but let's just pretend you have. You've probably seen floats. You've probably seen doubles. And that'll be enough for me, enough fodder for the first ten minutes here. These three things – I'm sorry. These three things right there are certainly related. They're all intended to represent scalar numbers. Obviously, this represents a true or a false. This represents in our world one of 256 characters. We usually only pay attention to about 75 of them, but nonetheless, there are 256 characters that could be represented here. These are all numeric. These take a stab at trying to represent arbitrarily precise numbers, okay? The character is usually one byte in memory. At least it is in all C and C++ compilers that I know of. This is typically two bytes. The int can actually be anywhere between two and four bytes, but we're going to pretend in this class that it's always four bytes, okay? The long, for the time being, is four bytes of memory. There is another data type, which isn't really common enough to deserve to be put on the blackboard, called the long long, which gives you eight bytes of memory to represent really, really large decimal numbers. They'll come up later on, but I'll talk about them if I ever need to. The float is four bytes. It somehow tries to take four bytes of memory and represent an arbitrarily precise number to the degree that it can, given that it's using a finite amount of memory to represent a number that requires an infinite amount of precision, sometimes. 
And a double, I've seen them ten and twelve bytes on some systems, but we're just gonna assume that they're eight bytes. Now, that's the most boring spreadsheet you could possibly introduce a class with, but my motivation is that I want to uncover what the byte is all about, and how four bytes can represent a large range of numbers, how eight bytes can represent a very large set of numbers, and actually do a pretty good job at representing numbers precisely enough for our purposes.
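The sizes in that grid are the lecture's working assumptions, and real compilers can differ (on many 64-bit Linux systems, for example, long is eight bytes). As a minimal sketch, the hypothetical helper `print_type_sizes` below reports what your own compiler uses; the name is made up for illustration.

```cpp
#include <iostream>

// Print the size in bytes of each type from the lecture's grid.
// The numbers depend on the compiler and platform; only sizeof(char) == 1
// is fixed by the language.
void print_type_sizes() {
    std::cout << "bool:      " << sizeof(bool) << '\n'
              << "char:      " << sizeof(char) << '\n'
              << "short:     " << sizeof(short) << '\n'
              << "int:       " << sizeof(int) << '\n'
              << "long:      " << sizeof(long) << '\n'
              << "long long: " << sizeof(long long) << '\n'
              << "float:     " << sizeof(float) << '\n'
              << "double:    " << sizeof(double) << '\n';
}
```

Calling `print_type_sizes()` from `main` is a quick way to check which of the lecture's assumptions hold on a given machine.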

So forget about bytes for the moment. Now, I'll go back to the bool in a second, because it's kind of out of character as to how much memory it takes. But I'm interested, at the moment, in what is less commonly called the binary digit, but you've heard it called the bit. And EE students and those who enjoy electronics think of the binary digit in terms of transistors and voltages, high and low voltages. Computer scientists don't need to view it that way. They just need to recognize that a bit is a very small unit of memory that can distinguish between two different values. EEs would say high-voltage, low-voltage. We don't. We actually just assume that a single bit can store a zero or a one. Technically, a Boolean could just be mapped to a single bit in memory. It turns out it's not practical to do that. But if you really wanted to use a single bit to represent a Boolean variable, you could engineer your compiler to do that, okay? Bits are more interesting when they're taken in groups. If I put down eight bits here – I'm not even going to commit to a zero or a one, but I'm gonna draw this. This isn't zero over one as a fraction, this is me drawing eight bits – let me just draw one over here so I have some room – and put a little box around each one of them in this binary search way, okay? And I have this big exploded picture of what we'll draw several times to represent a single byte of memory. Now, the most interesting thing to take away from this drawing is that this little box right here can adopt one of two values, independently of whatever value this box chooses to adopt, etc. In fact, there are eight independent choices, one for each of the bits. I'm assuming that makes sense to everybody, okay? That means that this, as a grouping – a byte of memory with its eight bits that can independently take on zeros and ones – can distinguish between two to the eighth, or 256 different values. Okay, and that's why the ASCII table is as big as it is, okay? 
65 through 65 plus 25 represents the alphabet. I forget where lowercase A starts. But every single character that's ever printed to the screen or printed to a file is backed by some number. I know you know that. When you look in memory to see how the capital A is represented, you would actually see a one right there – I'm sorry, I forget where it is actually – a one right there and a one right there. I'll draw it out and explain why that's the case. Because capital A is backed by the number 65, we don't put things down in decimal in memory. We put them down in base two. Okay? Because that's what – that's the easiest thing to represent in a bit-oriented system, okay? That make sense to people? Okay. So if I say that the capital A is equal to 65, you have to stop thinking about it as 65. You have to think it about it as some sum of perfect powers of two. So it isn't 64 – it isn't 65 rather, it's actually 64+1. One is two to the zero. A two is two to the first. There's none of that. Four is two to the second. Eight is two to the third. Sixteen is four. Thirty-two is five. Sixty-four is six. This is actually two to the sixth plus two to the zeroth. Make sense? Okay. As far as the representation in a box like this, if you went down and actually examined all the transistors, okay? The eight that are laid side-by-side in a single character, but byte of memory, it would look like this.
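To make the board drawing concrete, here is a small sketch that prints the eight bits of a byte. The helper names `bit_is_set` and `print_byte` are invented for this example, and the result for capital A assumes an ASCII system where 'A' is 65.

```cpp
#include <iostream>

// True if the bit at `position` (0 = rightmost) is a one in `value`.
constexpr bool bit_is_set(unsigned value, int position) {
    return ((value >> position) & 1u) != 0;
}

// Print the low eight bits of `value`, leftmost (two to the seventh) first.
void print_byte(unsigned value) {
    for (int position = 7; position >= 0; --position) {
        std::cout << (bit_is_set(value, position) ? '1' : '0');
    }
    std::cout << '\n';
}
```

On an ASCII system, `print_byte('A')` shows ones only in positions six and zero, matching 64 + 1.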

And in order to recover the actual decimal equivalent, you really do do – you really do the power series expansion, where you say there's a contribution of two to the sixth because it's in the sixth – counting from zero from the right, the sixth position from the end of the byte. This contributes to the zero, if you can look at it as having contributions of two to the first, and two to the third, and two to the seventh that are weighted by a zero as opposed to a one, okay? That make sense to people? Okay. So that's good enough for characters. Let's graduate to shorts. Some people are very clever when they use shorts. A lot of times they'll – if you know that you're going to store a lot of numbers, and they're all going to be small, they'll go with an array of shorts, or a vector of shorts, knowing that there really will be some memory savings later on. The short, being two bytes, just means that two neighboring bytes in memory would be laid down. Those are the two bytes at the moment – would be laid down, and the two to the sixteenth different patterns that are available to be placed in that space. It can distinguish between two the sixteenth different values. That make sense to people? Okay. So I'll just make this up. I'll put lots of zeros over here, except I'll put one right there. Did I put too many? Yes. I did. And this is a wide bit. Okay. So as far as the number that that represents – I should emphasize that technically, you can map that pattern to any number you want to, as long as you do it consistently. But you want to make the computer hardware easy to interpret. This place right here means that there's a contribution of two to the zeroth, or one. There's a contribution of two to the first, contribution of two to the second. So there's a two and a four that are being added together. Two to the zeroth, two to the seventh, two to the eighth, two to the ninth. Okay, so there actually is a contribution of two to the ninth, which is 512. 
So this really is the number that's represented by this thing. It would be 512, 516, 518, 519 would have that bit pattern down there, okay? Does that make sense to people? If I have another one – oops, I don't want that there. I have one zero, followed by all ones and all ones, okay? I know that if this had been a one right there, then that would have been a contribution of two to the fifteenth. Does that make sense to people? Okay. Zero followed by all ones in binary is like zero being followed by all nines, in some sense, in decimal. It's one less than some perfect number that has a lot of zeros at the end, okay? Does that make sense? So think about you have a binary odometer on your car, and you want to take a mile off, okay, because you're at, let's say, one followed by 15 zeros. If you back it up, you expect all of these to be demoted not to nine, but to one. So, as far as a representation is concerned, it's one less than two to the fifteenth. Makes sense? And that number is two to the fifteenth minus one, which I'm not going to figure out what it is. Okay? But you get the gist of what's going on here?
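The two 16-bit patterns from the board can be written down directly with C++14 binary literals. This is only a sketch; the constant names are made up, and `std::uint16_t` is used so that every bit counts toward magnitude.

```cpp
#include <cstdint>

// Bits 9, 2, 1 and 0 are set: 512 + 4 + 2 + 1.
constexpr std::uint16_t board_pattern = 0b0000'0010'0000'0111;

// A zero followed by fifteen ones: one less than two to the fifteenth.
constexpr std::uint16_t zero_then_ones = 0b0111'1111'1111'1111;

// The same values written as sums of powers of two.
constexpr unsigned board_sum = (1u << 9) + (1u << 2) + (1u << 1) + (1u << 0);
constexpr unsigned odometer = (1u << 15) - 1u;
```

The second constant is the number the lecture leaves uncomputed: two to the fifteenth minus one is 32767.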

Okay. So that's enough. There is a little bit to be said about this bit right here. If I wanted to represent the numbers zero through two to the sixteen minus one, I could do that. Okay, that's two to the sixteenth different values. I don't want to say that negative numbers are as common as positive numbers, but they're not so uncommon that we don't want to have a contribution of the mapping to include negative numbers. So what usually happens is that this bit, right there, had nothing to do with magnitude. Okay, it's all about sign, whether or not you want a zero there because it's positive, or a one for negative. And that's usually what zero and one mean when they're inside a sign bit. Makes sense? Okay. So if I write down this and I have, let's say, four zeros followed by zero, one, one, one, okay? That's a seven. If I put all zeros there, it happens to be a seven that hogged a little bit more memory, okay? It was a seven character initially, and now it's a seven short. I could argue that this would be the best way to represent negative seven. And you can look at it and you can recover the magnitude based on what's been drawn. And then just say – look all the way to the left – and say that one is basically the equivalent of a minus sign. That would be a fine way to do it if you wanted to go to the effort of actually always looking at the left-most bit to figure out whether it's negative or not. The reason it's not represented this way is because we want addition and subtraction to actually follow very simple rules, okay? Now, let me just be quite obtuse about how you do binary addition. Not because it's difficult, but because it's interesting in framing the argument as to why we adopt a different representation for negative numbers. Let's just deal with a four-bit quantity, okay? And I add a one to it. Okay. 
Binary addition is like decimal addition with more carries because you just don't have as many digits to absorb magnitude, so one plus one is two, but you write that down as a zero and you carry the one. You do this. And that's how seven plus one becomes eight. Okay. I imagine everybody followed that.
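The carry rule described here can be sketched as a four-bit ripple adder. The function name `ripple_add4` is hypothetical; it works one bit position at a time and drops any carry out of the top bit, as a four-bit register would.

```cpp
// Add two 4-bit values one bit at a time, carrying as in decimal addition.
constexpr unsigned ripple_add4(unsigned a, unsigned b) {
    unsigned result = 0;
    unsigned carry = 0;
    for (int position = 0; position < 4; ++position) {
        unsigned bit_a = (a >> position) & 1u;
        unsigned bit_b = (b >> position) & 1u;
        unsigned sum = bit_a + bit_b + carry;  // 0, 1, 2 or 3
        result |= (sum & 1u) << position;      // the digit written down
        carry = sum >> 1;                      // the digit carried left
    }
    return result;  // a carry out of the fourth bit has nowhere to go
}
```

With this, 0111 plus 0001 gives 1000, which is the seven-plus-one example from the board.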

However, I want the computer hardware and its support for addition and subtraction to be as simple and obvious as possible. So what I'd like to do is have the number for positive seven, when added to the representation for negative seven, to very gracefully become all zeros. Does that make sense? Well, if I use the same rules, one – I'm sorry, zeros followed by zero, zero, one, one, one. This is four of them. This is eight of them. And I want to add that to seven zeros followed by four zeros, zero, one, one, one. Let's put a four in there. Let's put an eight in there. If I followed the same rules – and think about – I mean it's not like – the hardware is what's propagating electrons around and voltages around to emulate addition for us. If we want to follow the same rules, we would say, "Okay. Well, that's naught two. Carry the two. That's three. Carry the one. That's that." Let me just make sure I don't mess this up. Seven plus seven is fourteen, so it would be that right there. Okay. And then you'd have 11 zeros followed by a one. If I really just followed the algorithm I did up there blindly, that's how I would arrive at zero. Okay. And that's obviously not right. If I were to argue what representation that is, if this is negative seven, then this has to be negative fourteen. That's not what 7 plus negative 7 is. Okay. So that means that this right here, as negative number, has to be engineered in such a way that when you add this to this using normal binary ripple add pattern, okay, that you somehow get 16 zeros right here, okay? It's easier to not worry about how to get all zeros here. It's actually easier to figure out how to get all ones here. So if I put down four, five, six, seven, eight. One, two, three, four, let's mix it up. Let's put the number 15 down. 
And I want to figure out what number – or what bit pattern, to put right here to get all ones right here, then you know you would put a bit pattern right there that has a one where there's a zero right here and a zero where there's a one up here, okay? This right here is basically one mile away from turning over the odometer, okay? Does that make sense? Okay. So what I want to do is I want to recognize this as 15 and wonder whether or not this is really close to negative 15.

And the answer is, yes, it is because if I invert – take this right here and I invert all of the bits, if I add one to that number right there, do you understand I get this domino effect of all of these ones becoming zeros? I do get a one at the end, but it doesn't count because there's no room for it. It overflows the two bytes, okay? So this right here would give me this one that I don't have space to draw because I'm out of memory, all the way down. So what I can do is rather than just changing this right here to be a sign bit, I can take the forward number, the positive 15, invert all the numbers, and add one to it, okay? Does that make sense? And that's how I get a representation for negative 15. This right here, this approach – it's what's called ones' complement – it's not used because it screws up addition. This notation for inventing the representation of the negative is what's called two's complement, okay? It's not like you have to memorize that. I'm just saying it. And this is how you've got all zeros. This is positive 15. This is negative 15. That is zero right there, okay? Does that make sense? The neat thing about two's complement is that if you have a negative number, and you want to figure out what negative of negative 15 is, you can follow the same rules. Invert all the bits – there, one – and add one to it, okay? And that's how you get positive 15 back. So this is nice symmetry going on with the system, okay? Make sense? Okay. Now, why am I focusing on this? Because you have to recognize that certainly in the world of characters and shorts, which is all we've discussed so far, that every single pattern corresponds to some real value in our world, okay? Characters, it's one of 256 values. We can fill in the Ascii table with ampersands and As and periods and colons and things like that, and have some unique integer be interpreted as a character every single time. 
As long as it's consistent and it always maps to the same exact pattern, then it's a valid mapping. As far as shorts are concerned, I could have used all 16 bits to represent magnitude. I'm not going to do that because I want there to be just as many negative numbers represented as positive numbers.
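The invert-and-add-one rule is easy to check on 16-bit patterns. In this sketch `negate16` is an invented name, and unsigned 16-bit values stand in for the raw bit patterns so that the overflow out of the top bit is simply discarded.

```cpp
#include <cstdint>

// Two's complement negation of a 16-bit pattern: invert every bit, add one.
constexpr std::uint16_t negate16(std::uint16_t pattern) {
    return static_cast<std::uint16_t>((pattern ^ 0xFFFFu) + 1u);
}

// Add two 16-bit patterns and keep only the low 16 bits of the result.
constexpr std::uint16_t add16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(static_cast<unsigned>(a) + b);
}
```

Here `negate16(15)` is the pattern 0xFFF1, adding it to 15 with `add16` gives all zeros, and negating it again gives 15 back, which is the symmetry the lecture points out.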

So I do, in fact, dedicate all of the bits from that line to the right to magnitude, okay, and I use the left one – the left-most bit to basically communicate whether the number is negative or not, okay? That means that since there are two to the sixteenth different patterns available to a two-byte figure, that the short can distinguish between that many values. Rather than having it represent zero through two to the sixteenth minus one, I actually have it represent negative two to the fifteenth – I'm sorry, negative two to the fourteenth – I'm sorry, negative two to the fifteenth through two to the fifteenth minus one. Does that make sense? And everything's centered around zero. So I have just as many representations for negative numbers as I have for positive numbers. Okay? Makes sense? Okay. So let's start doing some really simple code, not because it's code you'd normally write. Sometimes you do, not very often. But just to understand what happens when you do something like this. I have a char variable, CH, and I set it equal to capital A. And then I have an int variable called – actually, let me make it a short – S, and I set it equal to CH. You don't need a cast for that. What you're really doing is you're just setting S equal to whatever number is backing CH. There's a question right there?


Instructor (Jerry Cain):All right. It shouldn't have been. Oh, I just – I'm sorry, I inverted the bit pattern, and then I said you would add one to this, and I just didn't change the bit in the drawing. Where'd you go? I just saw – okay. So I didn't add one to this yet. But in the conversation at the time I thought it was clear. Okay. Does this make sense to people? Okay. There's – certainly it's gonna compile and it's going to execute. And based on the other seven boards I've drawn in, you should have some idea as to what's gonna happen in response to this line. printf is the equivalent of a cout statement, but it's in pure C. And if I want to print out a short – actually, let me just cout. cout << S << endl. In response to that, I expect it to print out the number 65. So to the console I would expect that to be printed. Why is that the case? Because the declaration of CH doesn't put a capital A there, it puts the integer value that backs it there, which I will draw as decimal. You know that it's really ones and zeros. And so when time comes for you to assign to S, what happens in order to achieve the effect of the assignment, it will copy this bit pattern. And this is what it really does electronically. It just replicates whatever bit pattern is right here onto that space right there. It smears these bits onto this little byte like it's peanut butter. And puts a 65 down there. And all the extra space is just padded, in this case, with zeros. So that's how 65 goes from a one-byte representation to a two-byte representation. Does that make sense? I simplified this a little bit. When I put a 65 down here and a smear of 65 in there, I happen to know that the left-most bit is a zero there. It's a positive number so that shouldn't surprise you, okay? Does that make sense to people? What happens if I do the opposite direction? I do this int – I'm sorry, we're not at ints yet. 
Short – completely new program – S is equal to, I'll say 67, and I do this. It compiles. There's no casts needed. As far as how the 67 is laid down, it's zero, one, zero, zero, zero, zero, one, one. It's two more than 65, obviously. It has an extra byte of all zeros. And this is S.

So when CH gets laid down in memory, and S is assigned to it, two bytes of information, 16 bits, cannot somehow be wedged economically into an eight-bit pattern. So what C and C++, and many programming languages for that matter, do is they simply punt on the stuff they don't have room for. And they assume that if you're assigning a large number to a smaller one, and the smaller one can only accommodate a certain range of values in the first place, that you're interested in the minutia of the smaller bits. Does that make sense to people? So what happens is it replicates this right here, and it punts on this. And this is how, when you do this right here, okay, you go ahead and you print out a C. Make sense to everybody? Okay. Now, I've kind of evaded the whole negative number thing, but negative values don't work too well with characters because unsigned chars – most characters are unsigned. So you actually do get all positive values with the representations. You know enough about shorts to know that they're two-byte figures – I've already told you that longs and ints, at least in our world, are four bytes. They're just a four-byte equivalent of a short. So let me deal with this example. I go ahead and I do a short, S is equal to, I'll just say – let me write it this way. No, I'll just write it as two to the tenth plus two to the third plus two to the zero. That is, of course, not real C. But I'm just writing it because I want to be clear about what the bit pattern for that number is. So just think about whatever number that adds up to as being stored in S. Okay. This is two to the eighth, two to the ninth. One, zero, zero, preceded by all zeros. Lots of zeros. One, zero, zero, one. If I take an int, i, and I set it equal to S, the same argument that I made in the char to short assignment can be taken here. And this is how – and this is somehow less surprising because both of them represent integers.
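The two board programs, char to short and short back to char, can be sketched as follows. The function names are hypothetical, the expected values assume ASCII, and 67 is small enough to fit in a char, so nothing is actually lost in the narrowing here.

```cpp
#include <iostream>

// Widening: the byte behind the char lands in the low byte of the short.
constexpr short char_to_short(char ch) { return ch; }

// Narrowing: only the low byte of the short is kept.
constexpr char short_to_char(short s) { return static_cast<char>(s); }

void demo_char_short() {
    char ch = 'A';
    short s = ch;                  // no cast needed, as in the lecture
    std::cout << s << std::endl;   // the number backing 'A' (65 in ASCII)

    short t = 67;
    char c = static_cast<char>(t); // cast only to make the narrowing visible
    std::cout << c << std::endl;   // the character backed by 67 ('C' in ASCII)
}
```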

This is all zeros. All zeros. Lots of zeros followed by one, zero, zero, zero, zero, zero, zero, one, zero, zero, one. And that's why. You just have a lot more space to represent the same small number, okay? Trick question. If I set int i equal to – I have 32 bits available to me to represent pretty big numbers, so I'm gonna do this. Two to the twenty-third plus two to the twenty-first plus two to the fifteenth plus, let's say, seven. Okay? And I'm being quite deliberate in my power of two representation of these numbers because seven always means that at the bottom, okay? Two to the fifteenth means there's a one right there. Two to the – actually, let me change this to two to the fourteenth. Make this a zero, one. Two to the twenty-first – although this is two to the twenty-fourth. Two to the twenty-third, followed by zeros. All zeros right there. So that's more or less what the bit pattern for that, as a four-byte integer, would look like. I go ahead and I set short S equal to i. You could argue that wow, that number's so big it's not gonna fit in the short, okay? And so you might argue that well maybe we should try and come as close as possible and make S the biggest number it can be so it can try really hard to look like this number. And that's not going to happen. It's gonna do the simplest thing. Remember this is implemented electronically, and every single example over there has more or less been realized by just doing a bit pattern copy, okay? If you're writing this way, you probably know that you're going and taking a four-byte quantity and using it to initialize a two-byte quantity. So lay that down, this is S. And all it does is say, "You know what, I have no patience for you right there. You're out. I'm just gonna copy this down." Okay? And so I do this followed by lots of zeros, followed by lots more zeros, followed by one, one, one. And I print out S. I'm gonna get the number that is two to the fourteenth plus seven. Does that make sense to people?
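The "copy the low two bytes and punt on the rest" model can be checked with a mask. In this sketch `low16` and `board_value` are invented names; the mask makes the dropped bits explicit rather than relying on the implicit conversion.

```cpp
#include <cstdint>

// The four-byte value from the board: 2^23 + 2^21 + 2^14 + 7.
constexpr std::uint32_t board_value = (1u << 23) + (1u << 21) + (1u << 14) + 7u;

// Keep only the low 16 bits, which is what survives in a two-byte short.
constexpr std::uint16_t low16(std::uint32_t value) {
    return static_cast<std::uint16_t>(value & 0xFFFFu);
}
```

The surviving pattern is two to the fourteenth plus seven, or 16391, because both higher powers sit entirely in the upper two bytes.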

Okay. So let me go back and do one more example before I move on to floating-point. Oh, yeah?

Student:Initially you had two to the fifteenth?

Instructor (Jerry Cain):Right. I'd say it's – it's actually confusing as to what happens. It certainly is. I actually don't know what happens when what is a magnitude bit actually becomes a sign bit. I have to say I certainly should know what happens. I just don't, which is why I gracefully said, "Oh, I have an idea. Let me just change this to two to the fourteenth." I'll actually run this experiment after lecture and I'll just mail the class this as part of this email that everyone is getting today, okay? Yep?
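For the case left open here, the bit-copy model makes a concrete prediction. Before C++20 the result of assigning an out-of-range int to a short is implementation-defined, so this sketch avoids that conversion and instead reads the surviving 16-bit pattern as a two's complement number by hand. The helper names are hypothetical.

```cpp
#include <cstdint>

// Keep only the low 16 bits of a 32-bit value.
constexpr std::uint16_t low16(std::uint32_t value) {
    return static_cast<std::uint16_t>(value & 0xFFFFu);
}

// Read a 16-bit pattern as a two's complement number: if the leftmost bit
// is set, the value is the pattern minus two to the sixteenth.
constexpr std::int32_t as_signed16(std::uint16_t pattern) {
    return pattern >= 0x8000u
        ? static_cast<std::int32_t>(pattern) - 0x10000
        : static_cast<std::int32_t>(pattern);
}

// The original board value, with two to the fifteenth instead of the fourteenth.
constexpr std::uint32_t original = (1u << 23) + (1u << 21) + (1u << 15) + 7u;
```

Under that model the old magnitude bit lands in the sign position, so the pattern 0x8007 reads as negative: two to the fifteenth plus seven, minus two to the sixteenth, which is -32761. Compilers on two's complement hardware typically produce the same result for a plain `short s = i;`, and C++20 requires it.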

Student:[Inaudible] the other way around [inaudible] a sign short [inaudible]?

Instructor (Jerry Cain):Well, it will always preserve sign, and I'm gonna – that's the very example I'm gonna do right now, okay? Suppose I did this. Short S is equal to negative one. Totally reasonable. Do any of you have any idea what the bit pattern for that would look like? And you can only answer if you didn't know the answer prior to 11:00 a.m. today. Okay. I want to be able to add one to negative one and get all zeros, okay? Does that make sense? So the representation for this is actually all ones. In fact, anytime you see all ones in a multi-byte figure, it means it's trying to represent negative one. Why? Because when I add positive one to that, it causes this domino effect and makes all those ones zeros. Does that make sense? So to answer your question, int i equal to S, logically I'm supposed to get int to be this very spacious representation of negative one. It actually does use the bit pattern copy approach. It copies these. I've just copied all the magnitude, okay? And by what I put down there, it's either one or – I'm sorry. It's either a very large number or it's negative one. We're told that it's negative one right there, okay? What happens is that when you assign this to that right there, it doesn't just place zeros right there because then all of the sudden it would be destroying the sign bit. It would be putting the zero in the sign bit right there. Make sense? So what it really does, and it actually did this over here, but it was just more obvious, is it takes whatever this is in the original figure and replicates that all the way through. If these would have otherwise been all zeros, and I want to be able to let this one continue a domino effect when you add a positive number to a negative number, you technically do what's called "sign extend" the figure with all of these extra ones. So now you have something that has twice as many dominos that fall over when you add positive one to it, okay? Does that make sense?
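Sign extension can be sketched on raw patterns: copy the 16 bits into the low half, then fill the upper half with copies of the old leftmost bit. The name `sign_extend16` is made up for this example.

```cpp
#include <cstdint>

// Widen a 16-bit pattern to 32 bits by replicating its leftmost bit.
constexpr std::uint32_t sign_extend16(std::uint16_t pattern) {
    return (pattern & 0x8000u) != 0
        ? (0xFFFF0000u | pattern)               // negative: pad with ones
        : static_cast<std::uint32_t>(pattern);  // non-negative: pad with zeros
}
```

So the all-ones short becomes an all-ones int, still negative one, while a positive value such as 65 is padded with zeros exactly as in the earlier char-to-short example.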

Okay. So there you have that. As far as characters, shorts, ints, and longs, they're all really very similar in that they have some binary representation in the back representing them. They happen to map to real numbers, for ints, longs, and shorts. They happen to pixelate on the screen as letters of the alphabet, for characters, even though they're really numbers, very small numbers in memory, okay? But the overarching point, and I don't want you to – I actually don't want you to remember – memorize too much of this. Like, if you know what – if you know that seven is one, one, one, and you know that all ones is negative one, that's fine. I just want you to understand the concept: with integers I have four bytes. I have 32 bits. That means I have two to the thirty-second different patterns available to me to map to whatever subset of a full integer range I want. The easiest thing to do is to just go from negative two to the thirty-first through two to the thirty-first minus one, okay? There's zero in the middle. That's why it breaks the symmetry a little bit. When I go and start concerning myself with floats – I – you're probably more used to doubles, but this is just a smaller version of doubles. I have four bytes available to me to represent floating-point numbers, integers with decimal parts following them, in any way I want to. This isn't the way it really works, but let me just invent an idea here. Pretend that this is how it works. We're not drawing any boxes yet. I could do, let's say I have a sign bit. I'll represent that up here as a plus or minus. And if I have 32 bits, you'd, by default, be thinking about bits and contributions of two to the thirtieth, two to the twenty-ninth, all the way down through some contribution of two to the zero. And I'm just describing all the things that can adopt zeros or ones to represent some number, okay? But I want floats to be able to have fractional parts.

So I'll be moving in the fractional direction, and say, "You know what? Why don't I sacrifice two to the thirtieth, and let one bit actually be a contribution of two to the negative first?" I'm just making this up. Well, I'm not making it up. This is the way I've done it the last seven times I taught this. But I'm moving toward what will really be the representation for floating-point numbers. If I happen to have 32 bits right here. And I lay down this right here. That's not the number seven – I'm sorry, that's not the number 15 anymore. Now, it's number seven point five. Does that make sense? Okay, well floats aren't very useful if now all you have are integers and half-integers. So what I'm gonna do is I'm gonna stop drawing these things above it because I have to keep erasing them. Let's just assume that rather than the last bit being a contribution of two to the negative first, let me let that be a contribution of negative two to the negative first, and that let that be a contribution of two to the negative two. Now I can go down to quarter fractions. Does that make sense? Well, what I could do is I could make this right here a contribution of two to the zero, two to the negative one, two to the negative two, three four, five, six, seven, eight, two to the negative nine. And if I wanted to represent Pi – I'm not going to draw it on the board because I'm not really sure what it is, although I know that this part would be one, one – then I would use the remaining nine bits that are available to me, okay, to do as good a job using contributions of two to the negative first, and two to the negative third, and two the negative seventh to come as close as possible to point one four one five whatever it is, okay? Does that make sense to – I'm assuming? It is an interesting point to remember that because you're using a finite amount of memory, you're not going to do a perfect job representing all numbers in the infinite, and infinitely dense, real number domain, okay? 
But you just assume that there are enough bits dedicated to fractional parts that you can come close enough without it really impacting what you're trying to do, okay? You only print it out to four decimal places, or something that just looks like it's perfect, okay? Does that make sense? It turns out if I do it that way, then addition works fine. So I add two contributions of point five and it ripples to give me a one and I carry a one. It just works exactly the same way. Does that make sense? Okay. It turns out that this is not the way it's represented, but it is technically a reasonable way to do it. And when they came up with the standard for representing floating-point numbers, they could have gone this way. They just elected not to.
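The scheme described above is usually called fixed-point. What follows is only a minimal sketch of it, assuming nine fractional bits as on the board; the names kFracBits, to_fixed, from_fixed and add_fixed are made up for illustration and do not come from the lecture.

```cpp
#include <cmath>
#include <cstdint>

// Assumed layout: the low 9 bits are contributions of 2^-1 ... 2^-9,
// the bits above them hold the integer part.
constexpr int kFracBits = 9;

// Round a non-negative real number to the nearest representable pattern.
std::uint32_t to_fixed(double value) {
    return static_cast<std::uint32_t>(std::lround(value * (1 << kFracBits)));
}

// Interpret a bit pattern under the same protocol.
double from_fixed(std::uint32_t bits) {
    return static_cast<double>(bits) / (1 << kFracBits);
}

// Addition is plain integer addition: carries ripple across the binary point.
std::uint32_t add_fixed(std::uint32_t a, std::uint32_t b) {
    return a + b;
}
```

Under these assumptions, adding the patterns for 0.5 and 0.5 yields the pattern for 1.0, and pi can only be approximated to within about 2^-10 because only nine fractional bits are available.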

So what I'm gonna do now is I'm gonna show you what it really does look like. It's a very weird thing. But remember that they can interpret a 32-bit pattern any way they want to, as long as the protocol is clear, and it's done exactly the same way every single time. So for the twentieth time today, I'm gonna draw a four-byte figure. I'm gonna leave it open as a four-byte rectangle because I'm not gonna subdivide it into bytes perfectly. I'm going to make this a sign bit because I do want to represent – I want negative numbers and positive numbers that are floating-point to have an equal shot at being represented, okay? That's one of the 32 bits. Does that make sense? The next eight bits are actually taken to be a magnitude only – I say it that way. I should just call it an unsigned integer – from here to there, okay? And the remaining 23 bits talk about contributions of two to the negative one, and two to the negative two, and two to the negative three. Okay, this right here, I'm gonna abbreviate as EXP. And this right here, I'm just gonna abbreviate as dot XXX XX, okay? The – what – this figure and how it's subdivided is trying to represent this as a number. Negative one to the – I'll abbreviate this as S – to the S right there. One point XXX XX times two to the one twenty-eight – I'm sorry, hold on a second. EXP minus one twenty-seven, okay? It's a little weird to kind of figure out how the top box matches to the bottom one. What this means is that these 23 bits somehow take a shot at representing point zero, perfectly as it turns out, to something that's as close to point nine, nine bar as you could possibly get with 23 bits of information. When these are all ones, it's not one. It's basically one minus two to the negative twenty-third. Does that make sense to everyone? Okay. That is added to one to become the factor that multiplies some perfect power of two. Okay? This right here ranges between two to the eighth – I'm sorry, 255 and zero. Does that make sense?

When it's 255 and it's all ones, it means the exponent is very, very large. Does that make sense? When it's all zeros, it means the exponent is really small. So the exponent, the way I've drawn this here, can range from 128 all the way down to negative 127. Makes sense? That means this right here can actually scale the number that's being represented to be huge, in the two to the one twenty-eight domain, or very small, two to the negative one twenty-seventh, okay? The number of atoms in the world down to the size of an atom, okay? You may think this is a weird thing to multiply it by, but this power-of-two thing right there really means the number is being represented in the power of two domain. You may question whether or not any number I can think of can be represented by this thing right here. And then once you come up with a representation, you just dissect it and figure out how to lay down a bit pattern in a 32-bit figure. Let me just put the number seven point zero right there. Well, how do I know that that can be represented right here? Seven point zero is not seven point zero. It's seven point zero times two to the zeroth, okay? There's no way to go and layer that seven point zero over this one point XXX and figure out what XXX should be. XXX is bound between zero and point nine bar. But I can really write it this way. Three point five times two to the first, or rather one point seven five times two to the second. So as long as I can do a plus or minus on the exponent, I can divide and multiply this by two to squash this into the one to one point nine bar range. And just make sure that – I have to give up if this becomes larger than 128 or less than negative 127. But you're dealing with, then, absurdly large numbers, or absurdly small numbers. But doubles love the absurdity because they have space for that accurate of a fraction, okay? Does that make sense to people?
Okay, so this right here happens to be the way that floating-point numbers are actually represented in memory. If you had the means, and you will in a few weeks, to go down and look at the bit patterns for a float, you would be able to pull the bit patterns out, actually write them down, do the conversion right here, and figure out what it would print at. It would be a little tedious, but you certainly could do it. And you'd understand what the protocol for coming from representation to floating-point number would be, okay? Let me hit the last ten minutes and talk about what happens when you assign an integer to a float, or a float to an integer, okay? I'm gonna get a little crazy on you on the code, all right. But you'll be able to take it.
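The "pull the bit patterns out and do the conversion" exercise can be sketched in a few lines. This is a sketch only, assuming a 32-bit float in the sign / EXP / dot-XXX layout drawn above; FloatFields, split and rebuild are hypothetical names. One caveat the lecture's simplified formula glosses over: IEEE 754 reserves EXP values of 0 and 255 for special cases, so rebuild below is only meant for EXP between 1 and 254.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// The three pieces of the four-byte figure.
struct FloatFields {
    std::uint32_t sign;      // 1 bit: S
    std::uint32_t exp;       // 8 bits: EXP, an unsigned integer
    std::uint32_t fraction;  // 23 bits: the .XXXXX part
};

// Copy the raw bit pattern out of a float and cut it into fields.
FloatFields split(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Apply (-1)^S * 1.XXXXX * 2^(EXP - 127); intended for EXP in 1..254 only.
double rebuild(const FloatFields& p) {
    double significand = 1.0 + p.fraction / 8388608.0;  // 8388608 is 2^23
    double value = std::ldexp(significand, static_cast<int>(p.exp) - 127);
    return p.sign ? -value : value;
}
```

For seven point zero, written as one point seven five times two to the second, split gives S = 0, EXP = 129 and a fraction field that encodes point seven five.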

I have this int i is equal to 35 – actually, let me choose a smaller number. Let me do just five is fine. And then I do this. Float F is equal to i. Now you know that this as a 32-bit pattern had lots of zeros, followed by zero, one, zero, one at the end, four plus one, okay? Makes sense? When I do this, if I print out F, don't let all this talk about bits and representation confuse the matter. When you print out F there, it's going to print the number five, okay? The interesting thing here is that the representation of five as a decimal number is very, very different than the representation of five using this protocol right here. So every time – not that you shouldn't do it – but every time you assign an int to a float, or a float to an int, it actually has to evaluate what number the original bit pattern corresponds to. And then it has to invent a new bit pattern that it can lay down in a float variable. Does that make sense? This five is not – the five isn't five so much as it is one point two five times two to the second. Okay, as far as this is concerned right here. So that five, when it's really interpreted to be a five point zero, it's really taken to be a one point two five – is that right? Yeah. – Times two to the second. So we have to choose EXP to be 129 and we have to choose XXX to be point two five. That means when you lay down a bit pattern for five point zero, you expect a one to be right there. And you expect one – one, zero, zero, zero, zero, zero, zero, one to be laid down right there, 128 plus one. Does that make sense to people? You gotta nod your head, or else I don't know. Okay. This is very different – and this is where things start to get wacky – and this is what one oh seven's all about. If I do this right here, int i is equal to 37. And then I do this, float F is equal to asterisk – you're already scared. Float, star, ampersand of i. I'm gonna be very technical in the way I describe this, but I want you to get it. 
The example above the double line, it evaluates i, discovers that it represents five, so it knows how to initialize F. Does that make sense? This right here isn't an operation. It doesn't evaluate i at all. All it does is it evaluates the location of i. Does that make sense? So when the 37, with its ones and zeros represented right there, this is where i is in memory. The ampersand of i represents that arrow right there, okay? Since i is of type int, ampersand of i is of type int star, the raw exposed address of a variable. That's four bytes that happens to be storing something we understand to be an int. And then we seduce it, momentarily, into thinking that it's a float star, okay?

Now, this doesn't cause bits to move around, saying, "Oh, I have to pretend I'm something else." That would be i reacting to an operation against the address of i, okay? All the furniture in the house stays exactly the same, okay? All the ones and zeros assume their original position. They don't assume, they stay in their original position. It doesn't tell i to move at all. But the type system of this line says, "Oh, you know what? Oh, look. I'm pointing to a float star. Isn't that interesting? Now, I'm gonna dereference it." And whatever bit pattern happened to be there corresponds to some float. We have no idea what it is, except I do know that it's not going to be thirty-seven point zero, okay. Does that make sense? In fact it's small enough that all the bits for the number 37 are gonna be down here, right, leaving all of these zeros to the left of it, okay? So if I say stop and look at this four-byte figure through a new set of glasses, this is going to be all zeros, which means that the overall number is gonna be weighted by two to the negative one twenty-seven. Makes sense? There's gonna be some contribution of one point XXX, but this is nothing compared to the weight of a two to the negative one twenty-seven. So as a result of this right here, and this assignment, if I print out F after this, it's just gonna be some ridiculously small number because the bits for 37 happen to occupy positions in the floating-point format that contribute two to the negative twenty-third, and two to the negative twentieth, and things like that, okay? Does that make sense to people? Okay. So this is representative of the type of things that we're gonna be doing for the next week and a half. A lot of the examples up front are going to seem contrived and meaningless. I don't want to say that they're meaningless. They're certainly contrived because I just want you to get an understanding of how memory is manipulated at the processor level.
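The two assignments can be put side by side as a small sketch. It assumes a 4-byte int and a 4-byte IEEE 754 float; convert and reinterpret are illustrative names. The second function uses memcpy as a well-defined stand-in for the lecture's asterisk, float star, ampersand of i, because modern C++ compilers are allowed to treat that cast-and-dereference as undefined behavior.

```cpp
#include <cstring>

// Value conversion: evaluates the int, then invents a new float bit pattern.
float convert(int i) {
    return static_cast<float>(i);
}

// Reinterpretation: the 32 bits stay exactly where they are and are merely
// read through float glasses. No bits are rearranged.
float reinterpret(int i) {
    static_assert(sizeof(float) == sizeof(int), "assumes 4-byte int and float");
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}
```

With these assumptions convert(5) is five point zero, while reinterpret(37) is a positive number on the order of 5e-44: the EXP field is all zeros, which IEEE 754 treats as a denormalized value, so the result is even smaller than the two to the negative one twenty-seven estimate suggests.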

Ultimately, come next Wednesday, we're gonna be able to write real code that leverages off of this understanding of how bits are laid down, and how ints versus floats versus doubles are all represented, okay? I have two minutes. I want to try one more example. I just want to introduce you to yet one more complexity of the C and C++ type system, and all this cast business. Let me do this. Let me do float F is equal to seven point zero. And let me do this short S is equal to asterisk, short star, ampersand of F. Looks very similar to this, except there's the one interesting part that's being introduced to this problem, is that the figures are different sizes, okay? Here I laid down F. It stores the number seven point zero in there. And that's the bit pattern for it, okay? The second line says, "I don't care what F is. I trust that it's normally interpreted as a float, and that's why I know that this arrow is of type float, star." Oh, let's pretend – no, it isn't any more. You're actually pointing – that arrow we just evaluated? It wasn't pointing to a float. We were wrong. It's actually pointing to a two byte short. So all of the sudden, it only sees this far, okay? It's got twenty-forty vision, and this right here, this arrow, gets dereferenced. And as far as the initialization of S is concerned, it assumes that this is a short. It assumes that this is a short so it can achieve the effect of the assignment by just replicating this bit pattern right there, okay? And so it gets that. Okay, and whatever bit pattern that happens to correspond to in the short integer domain, is what it is. So when we print it out, it's going to print something. Seven point zero means that there's probably gonna be some non-zero bits right here. So it's actually going to be a fairly – it's gonna have ones in the upper half of the representation. So S is gonna be non-zero. I'm pretty sure of that, okay? Does that make sense to people?
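The float-to-short example can be tried out with a short sketch, assuming a 2-byte short and a 4-byte IEEE 754 float. The name first_two_bytes is hypothetical, and memcpy again stands in for asterisk, short star, ampersand of F so that the behavior is well defined.

```cpp
#include <cstring>

// Copy the two bytes at the lowest address of the float into a short,
// which is what dereferencing the reinterpreted address amounts to.
short first_two_bytes(float f) {
    static_assert(sizeof(short) == 2 && sizeof(float) == 4,
                  "assumes 2-byte short and 4-byte float");
    short s;
    std::memcpy(&s, &f, sizeof s);
    return s;
}
```

Seven point zero has the bit pattern 0x40E00000. On a big-endian machine the two bytes at the lowest address are 0x40 and 0xE0, so the short is non-zero, as predicted above. On a little-endian machine (most Linux PCs) the lowest two bytes are the low-order zeros of the fraction field, so the very same code yields 0.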

Okay. That's a good place to leave. Come Monday, we'll start talking about – we'll talk a little bit about doubles, not much. Talk about structs, pointers, all of that stuff, and eventually start to write real code. Okay.


Lecture 2 Programming Paradigms

My intent here is to provide a gentle introduction to some of the container classes
defined by the STL. My personal experience is that the pair, the vector and the map
are used more often than the other STL classes, so I’ll frame my discussion around them.
After reading through this, you might bookmark two publicly which explain (or at least
document) all of the various components of the STL.
Handout 1 Introducing the STL
8 pages

This handout covers some of the basics of using the Leland UNIX environment to edit,
compile, and submit your programs:
 Logging in, class directory, starter files
 Editing your source files
 Compiling your program
 Submitting your program
Handout 2 UNIX Basics
4 pages

To begin, we are going to take a glimpse into the inner workings of a computer. The
goal is that by seeing the basics of how the computer and compiler cooperate, you will
better understand how language features work.
Handout 3 Computer Architecture
4 pages

The Ins and Outs of C Arrays
Handout 4 Arrays the Full Story
7 pages

Programming Assignment 1 RSG Instructions
8 pages

Topics: Converting Between Types of Different Sizes and Bit Representations Using Pointers, Little Endian vs. Big Endian, Structs: How the Data of a Struct is Stored, Accessing the Data of a Struct, Arrays, Pointer Arithmetic on Arrays, Result of Casting Arrays to Different Types, Layout in Memory of Structs, Dynamically Allocated Strings in C vs. Arrays of Characters, Modifying Internal Data of Structs Using strcpy, Character Arrays and cout, Generic Functions in C Using Memory and Pointers



Instructor (Jerry Cain):Hey everyone. I have one handout for you today. You don't have it in your hands because we're gonna pass it around. I just really have the one. I'm gonna have T.A.s pass it around during the lecture. I sent out an email on Saturday. Did anyone not get that email? Probably a handful. Oh, wow, nobody – everybody got it. That's great. Okay, you did not get it – the email? Okay. Well, if you can, like, email me directly after class so I can figure out what the deal is. The reason I say that is because I just realized over the weekend – this is totally my fault – that the section room for tomorrow, Skilling 183, seats 48 people. And I'm telling everybody to go to it. So I have some – I have to figure something out there. Normally that's going to be fine because most of you will just watch it on TV and get in the habit of doing that. But tomorrow I kinda want you to go. So I'm working on one of two solutions. I'm either gonna try and get a bigger room just for tomorrow for 4:15, or I'm gonna get a huge room at 3:15, still have the 4:15 section, and try and push everyone to go to the earlier one if they have any flexibility. So I will make a decision as to what – based on what rooms are available to me later today. And I will send out an email, which is why I'm asking about the email. Okay? So definitely stay tuned for a CS107 email after today. When I left you –what? Question right there.

Student:What was the time again?

Instructor (Jerry Cain):Tuesday at 4:15 are the normal sections. I may have one – may is the operative verb there – I may have one tomorrow at 3:15, just tomorrow, to accommodate the sheer number of people I expect to show up. Okay? When I left you last time, I was doing little asterisk and ampersand tricks. Let me do another one. I have a double D, and I set it equal to 3.1416. And as a result, I actually get a fairly large figure in memory that's populated with pi. We'll decorate this with a D variable. And if I go ahead and do the following, char CH is equal to asterisk, char star, ampersand of D – there was a little bit of confusion about this. Because of that ampersand operation right there, the actual bit pattern that resides in the eight-byte figure that we're calling D, the bit pattern is actually irrelevant. This is an expression on the address of D. It doesn't have to look inside the eight boxes to figure out what's important here. It has to evaluate that address because that's there. It's seduced into thinking that it's actually storing the address – I'm sorry, that it's an address of a single character. So when it dereferences this right here, it goes and it embraces that single byte right there. Whatever bit pattern happened to reside there before is now pretending to be a character for the lifetime of this statement right here. Does that make sense to people? So if this happened to be – I don't know what it is – but suppose it's this bit pattern right there, okay? This variable called CH will get that very bit pattern. And if I go ahead and do cout << ch << endl, whatever this corresponds to gets printed to the console. Okay. That make sense to people? Okay, let me do one more little tricky thing here. If I declare a short, and I set it equal to 45, I get a two-byte figure that has a 45 in it. It's stored in binary, but I'm just gonna write 45 because it's easier to look at that. And I do this; this is where you get into a little bit of danger. 
Double star – I'm sorry, double D is equal to asterisk of double star ampersand of S. Most of what I've said prior applies here as well. This is one scenario where things are a little bit mysterious. This right here evaluates to the ampersand of S. That's the address associated with that arrow, the number associated with that arrow. And this is a brute force reinterpretation of that address. So what's gonna happen is it's gonna say oh wow, that arrow now – it never pointed to a short. It actually points to an eight-byte double. So it will go, and not only include those two bytes right there, but the six bytes that follow it. Okay? Whatever bit pattern happens to reside there, provided there's no memory crash – and I'll explain why that could happen in a second. As long as it gets away with it, it's gonna go and embrace all eight bytes of those – eight bytes of information there, interpret it as an eight-byte double, and then assign it to this thing called D. So if this is a 45, followed by that as a byte pattern, then D would get 45 with this as a byte pattern. And when I print that out, it's gonna print out whatever number happens to be associated with that representation. Does that make sense to people? Okay. Now, I could do these for days, okay? And show you every little combination between asterisks and ampersands and double asterisks and whatnot. I want to move on and start talking about arrays and structs first because I think you'll just get more practice there. We'll start learning some more material. There was a question right here?

Student:Yeah. Are these examples gonna behave differently on little endian?

Instructor (Jerry Cain):Certainly, yeah. The – well, as far as the bit copying is concerned, no. This ampersand, right here, is always the address of the lowest byte. But as far as how the – endianness has more to do with interpretation and placement of bytes relative to one another. As far as – there were these phrases in the handout that I kind of de-emphasized, but they're there. And since you're asking, I'll talk about it. Little endian. If I were to write down a two-byte short like that, and if I were to store the number one in that short, you would just say, "Oh, well the one just goes right here, and it's preceded by 15 zeros." Does that make sense to people? Okay. That is true on about half of the systems in existence at the moment. This right here – not only is it a representation for positive one as a two-byte short – it happens to be stored in what's called big endian format. And the best way to remember that – it's kind of arbitrary as to what big versus little means in this context. I just remember it as the lowest byte stores the bits that correspond to the largest contributions of magnitude. Does that make sense? Okay. And since one is such a small number, that's why you have all these zeros over here. On some machines, in particular the Linux machines that you're probably working on, it would store this with the bytes in the reverse order. Okay? It would actually have the one right there, preceded by seven zeros, followed by eight zeros right there. And it's just when it goes and interprets it as a two-byte short, it actually assumes that these are bits zero through seven. And these are bits eight through fifteen. Does that make sense? So if you actually were to – and you could do this if you wanted to – if you were to copy a two-byte short from a Linux machine to a Solaris machine, and just do it on a byte copy level, you wouldn't get the same numbers on different machines. Okay, you would get one on a big endian machine here. 
You get 256 – I'm sorry – yeah, 256 on a little endian machine. Does that make sense to people?
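The one-versus-256 effect can be reproduced on a single machine with a small sketch. The helper names is_little_endian and swap_bytes are illustrative, not from the handout.

```cpp
#include <cstdint>
#include <cstring>

// True when the byte at the lowest address holds the least significant bits.
bool is_little_endian() {
    std::uint16_t one = 1;
    unsigned char lowest;
    std::memcpy(&lowest, &one, 1);
    return lowest == 1;
}

// Swap the two bytes of a 16-bit value. This is the net effect of a raw
// byte copy between two machines that disagree on byte order.
std::uint16_t swap_bytes(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}
```

Swapping the bytes of 1 gives 256 and swapping again gives 1 back, which is why a byte-level copy between machines of opposite byte order changes the number.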

Okay. For the most part you don't have to worry about endianness at all. There's one aspect in assignment two that happens to deal with it, but I more or less insulate you from it. Okay? Just something to be sensitive to. What I want to talk about now are structs, how they work. How arrays work. How arrays of structs, structs with arrays inside all work. Given what we know already – let's kill this. Let me go ahead and declare this as a struct right here. Very simple. Struct fraction int num int denom. That right there, that's the C way – I'll assume we're in C++ for the moment. That's enough to declare fraction, standalone, as a new type. If I do this, fraction – let's say pi as a variable name. As a result of that, I obviously get enough memory to store a fraction. In 106B and 106X, and maybe 106A as well, you drew them as the somewhat loose rectangles around two boxes. Okay, I want to be a little bit more structured than that. I want to recognize that the amount of memory that's set aside for the struct fraction, not surprisingly, is eight bytes. Okay? It's basically the sum of its parts, and it actually packs all of those bytes as tightly as possible. I'm gonna draw this as eight bytes. I'm gonna emphasize the fact that it's really four bytes stacked on top of four more bytes. The address of the entire struct is always coincident with the address of the first field. So looking at this, and assuming that that's a picture of one of these things, you know that this is the num field. In this case, it'll be pi dot num because that's the way I declared it right there. And stacked on top of that, four bytes above the base address of the entire thing, would be pi dot denom. Okay? So when I draw that arrow right there, unless they give you some context, you don't know whether it's an int star pointing to the address of the num field – I'm sorry, storing the address of the num field or the address of the entire struct. Okay? 
When I do this, not surprisingly, that places a 22 in the lower four bytes of the entire figure. The more technically accurate way of saying it is that it actually stores a 22 to the field that's at an offset of zero from the base address of the entire struct. Okay? That's why 22 gets placed there. When I do this and store a seven somewhere – because it recognizes, based on this definition, which it certainly sees before it sees this line right here, it knows that denom is stacked on top of num. Because num is a four-byte integer, denom is four bytes above the base address of the entire thing. And that's how it knows where to put the 29 zeros followed by three ones for the seven. Okay? If I go ahead and do this – this is where things get crazy – if I go ahead and do this, ampersand of pi dot denom. I'll do that. You don't technically need it, but that makes it clear what you're taking the address of. I have an int star, right, unless I do this. Okay? And now I have the address of a fraction. So what happens is just on the fly, it stops thinking about this address right there as a stand-alone integer, or pointing to a stand-alone integer. Now it has this picture, all of a sudden, at the moment that it's addressing this eight-byte picture that overlays that space right there. So when I go ahead – let me actually draw this a little bit more accurately. I go ahead and do fraction, asterisk, and I do this, num is equal to 12. The arrow comes after something that at that moment is assumed to be the address of an entire fraction struct. So the arrow travels to that struct. There's not much traveling to do because it's already there. And then it goes inside and identifies the num field as the place that should receive the 12. Where is that num field? It is right here. Does that make sense to people? Okay. So if I go ahead and I just print out cout << pi dot denom, behind pi dot denom's back, it was changed from a seven to a twelve. Okay? 
If I do the same exact thing, fraction, star, ampersand of pi dot denom, arrow denom is equal to 33, it's going to – it's not going to be concerned about the fact that I really don't own the space above what is truly pi dot denom. The way the mechanics work takes the base address of this. Oh look, it's a fraction star now. Go four bytes beyond that to find out where the 33 belongs. It's gonna smear down the four-byte representation of 33 in this space right here. And there's no legal way to get to it and print it out, but if I did this again to the right of a cout statement, it would print out a 33. Does that make sense to people? Okay. Good. Let me do one more thing. Actually, let's not. Let's go on to arrays. Int array. Very different from Java. We didn't talk about arrays in CS106B and 106X as much as we will in 107. Question in the back?

Student:Yeah. Sir, I don't know if I caught you correctly, but did you say that when you set the fraction and then ampersand, p dot denom, the num to 12, does that mean when you access pi dot num – or sorry, pi dot denom that it's gonna be 12 and [inaudible].

Instructor (Jerry Cain):That is correct. Right. You happen to invade pi dot denom's space using some quirky syntax. Okay? Just because you happen to know that pi dot denom resides above pi dot num, just because you reinterpret the address to be associated with a different data type, if you happen to operate on space that overlays the original pi dot denom, then you're affecting what really is pi dot denom's value.
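The mechanics of that overlay can be sketched without the cast. This is a sketch under the assumption of 4-byte ints; store_int and denom_after_overlay_write are hypothetical helpers. The idea is that a field store is "base address plus the field's offset", so the store through the reinterpreted address is imitated here with offsetof and memcpy, which keeps the behavior well defined.

```cpp
#include <cstddef>
#include <cstring>

struct fraction {
    int num;
    int denom;
};

// Store an int at base + offset: a stand-in for what
// ((fraction *)base)->field = value does mechanically.
void store_int(void* base, std::size_t offset, int value) {
    std::memcpy(static_cast<char*>(base) + offset, &value, sizeof value);
}

// Replays the first trick: pretend &pi.denom is the base address of a
// fraction and write that imaginary fraction's num field (offset zero).
int denom_after_overlay_write() {
    fraction pi;
    pi.num = 22;
    pi.denom = 7;
    store_int(&pi.denom, offsetof(fraction, num), 12);
    return pi.denom;  // the 7 has been overwritten behind its back
}
```

The follow-up write to the denom field of the imaginary fraction is not reproduced: it lands in the four bytes past the end of pi, memory the program does not own.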

Student:So is there a way to access the denominator – or the denom of the four-byte representation within the four-byte representation?

Instructor (Jerry Cain):Within the –

Student:How – basically, how can you access that 33 without doing another [inaudible].

Instructor (Jerry Cain):You actually can't. I'm sorry, you certainly could use this expression again to do so.

Student:You can't access it any other way?

Instructor (Jerry Cain):If you wanted to – I mean this will be more clear after I do the array example, but if I wanted to – since you're asking – pi dot denom, fraction star. Okay? Do you understand that that's the address of the top two thirds of the drawing? What I could do is I could do something like this. That's a little sneaky. That'll be clear after I do the formal array example that I'm covering up right now. But that's effectively a dereference to go and get to the denom field. Okay? I wouldn’t even have to do this necessarily. I could just do address of pi of one dot num is equal to, or print that out or something like that. Okay? I mean this will become more clear after I talk about the array business a little bit.

Student:Because what's at the one index is actually a fraction?

Instructor (Jerry Cain):It literally is eight bytes beyond what's at the zero index. Okay? You don't get that; don't worry. I'll start with a simpler example right here. This right here, you know this already. It allocates 40 bytes of memory, okay, for the ten integers that are being set aside and under the jurisdiction of this key word called "array." So I'm going to draw it this way, and do a module of five – spit it out. You know that this is the zero index, and you assign to it using array of zero. This is array of nine. So when I go ahead and do something like this, array of zero is equal to 44. I put a 44 there. You know this. If I do array of nine is equal to 100, a 100 goes there. What you may not recognize is that array itself is synonymous with the address of the zeroth entry. Okay? Let me write that down as, like, a little – like a little theorem. In the context of that declaration right there, array is completely synonymous with ampersand of array of zero. Okay. That's why when you pass an array to a helper function, or any function whatsoever, you're not passing the entire array, you're just identifying the location of the zeroth entry, and from that you can access anything legitimately beyond it, as long as you know how long the array is. Okay? If I go ahead and do this, let's create some tension.

45, and obviously zero, one, two, three, four, five, that's nothing new. If I go ahead and mess up, and I don't understand for loops, and I don't understand arrays well enough to not make this mistake yet. If I go ahead and I assign the number one to array of ten, it's consistent with the offsetting that's done relative to the base address of the entire thing. This right here assigns a 44 to the int that is at zero ints forward of the base address. Go ahead nine quantums of integers to find out where the 100 should go. Go ahead five. Java's a different story, but in C and C++, there's no bounds checking done at all on raw arrays, and that's exactly what this thing is right here. So, when I do this, it really says, oh. Well, it's interested in that address right there. This ten is interpreted to be ten times the size of an integer, which is four, for a 40-byte offset from the base address right here. So it goes right there, and it leaps forward forty bytes to the base address of what it has no choice but to assume is an integer space. So it's going to go down. Whether it's going to cause problems or not is a different story. It'll try and place a one right there. Okay? If I do this, array of 25 is equal to 25, then somewhere over here, a 25 is laid down. Okay? It actually even tolerates negative numbers. It's that brute force – I don't want to put zero. That's not very useful, 77. It would march back one, two, three, four places to figure out where to place this 77, and that's how memory as a side effect would be updated by these bogus little statements right there. Okay? Does that make sense? Question in the back?

Student:[Inaudible] make the assignment of the right 10 if it's going to do that anyway?

Instructor (Jerry Cain):I'm not sure what you mean. Say it again.

Student:Say an array 10, like, you initialize it 10 by spaces, but like, what's the point of initializing it if it's just going to do what's – basically do what you want when you get inside of it.

Instructor (Jerry Cain):That is true, actually. This right here is really just documentation for how much space is being allocated. And then you're supposed to write code – I'm not saying this is good code. I'm just saying it's code. Okay? You're supposed to write code that's consistent with the amount of space that you legally have. But this, this, and this just work because there's no bounds checking. It doesn't look arbitrarily far backwards to figure out whether or not it's an in-range index. So when it gets away with this, and it compiles, and it runs, it's just gonna put a one where it assumes that the eleventh entry would be, or the twenty-sixth entry, or the negative fourth entry. Okay? Does that make sense to people? Does that make sense? Okay. Yep?
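The offsets involved can be computed without touching any out-of-range memory. The sketch below is illustrative only: byte_offset shows the arithmetic applied to an index, and store_checked is a hypothetical helper showing the kind of range test that raw arrays do not perform on their own.

```cpp
#include <cstddef>

// Byte offset from the base address that array[index] refers to when the
// elements are ints. No range check is involved in this arithmetic.
std::ptrdiff_t byte_offset(std::ptrdiff_t index) {
    return index * static_cast<std::ptrdiff_t>(sizeof(int));
}

// Hypothetical helper: refuses the store instead of writing outside the
// space that was legally set aside.
bool store_checked(int* array, std::ptrdiff_t length,
                   std::ptrdiff_t index, int value) {
    if (index < 0 || index >= length) return false;
    array[index] = value;
    return true;
}
```

With 4-byte ints, index 10 maps to a 40-byte offset, index 25 to 100 bytes, and index negative four to 16 bytes before the base address. store_checked accepts index 9 on a ten-element array and rejects 10, 25 and negative four.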

Student:[Inaudible] the memory?

Instructor (Jerry Cain):It doesn't in C and C++, not at all. All it is is an instruction for that one declaration as to how much – how many variables, more or less to – I'm sorry, how many ints to set aside space for. But once you do that, like, there's no – the length of the array is that. But the length of the memory figure, it's not exposed to you. So there's no way to recover it. That's why you always pass around the length with a raw array in C and C++. Okay? You used vectors more than you did raw arrays in 106B, but we're gonna be more C programmers than C++ programmers for the next few weeks, so we don't have vectors because we don't have classes. And we don't have templates. So we actually have to take this approach right here. Okay? Yep?
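A minimal sketch of the pass-the-length convention follows; the function name sum is chosen here for illustration. The helper receives only the address of the zeroth entry, so the caller has to supply the length separately.

```cpp
#include <cstddef>

// values is just the address of element zero; nothing about the array's
// length travels with it, so length must be passed alongside.
int sum(const int* values, std::size_t length) {
    int total = 0;
    for (std::size_t i = 0; i < length; ++i) {
        total += values[i];
    }
    return total;
}
```

A caller with int array[10] would write sum(array, 10). Inside the scope of the declaration, sizeof(array) / sizeof(array[0]) still evaluates to 10, but inside sum that information is gone because values is an ordinary pointer.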

Student:So you use the address of pi and the X sets it as an array there. Is it gonna know that the size of each element is a fraction?

Instructor (Jerry Cain):Yep. That's – it uses the data typing of whatever pi is right there, and because ampersand of pi is a fraction star, it knows that if you automatically – if you just all of a sudden start treating it as the base address of an array, even if it is really only an array of length one, it's gonna deal with the default offset of eight because that's how many bytes are in a fraction. Okay? So this will become a little bit more clear after I put a few more little theorems over there. Okay? When you do this right here – let's say it this way. Array of K, where K is an arbitrary integer, it is – the address of that thing is completely synonymous -- and you did not see this all that much, if at all in 106. It is synonymous with this right here. Okay. So the first line isn't array so much as it is array plus zero on the left hand side. Okay? This right here, given that example, array is of type int star. There's no storage for array. It's not like the address, the base address of the array is stored anywhere that you can manipulate. But this is of type int star. If this is assumed to be an integer, which it is in this example, then you're not doing normal arithmetic here. You're doing what's called pointer arithmetic. And it knows that you're not going to be dealing with arbitrary bytes inside an array. You're only supposed to be concerned with the boundaries that separate the space where one int ends and another one begins. So whenever this is understood to be a pointer right here, this number isn't added verbatim. It is automatically scaled by the size of the figure being addressed. And it knows what the figure is based on the type system. In this case, it knows that it's pointed to an int. That's why there's – those are four bytes. That's why all these rectangles are seemingly four bytes wide. In this example up here, ampersand of pi evaluates to fraction star.
So when I start treating it like it's an array even though it's not, I have no choice but to rely on this rule right here to figure out where the oneth, counting from zero, fraction would be, starting at this address. Okay? And that's why it advanced eight bytes beyond the base of that entire drawing to figure out where to start dealing with things. Okay? Does that sit well with everybody? Yes? Yep?
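The rule that the address of array of K is the same as array plus K, with K scaled by the element size, can be checked with a short sketch. The helper names below are made up for illustration, and the `struct fraction` layout (two ints) is an assumption about the struct used on the board.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of the lecture's fraction: two ints side by side. */
struct fraction {
    int num;
    int denom;
};

/* Byte distance from base to base + k when the elements are ints.
   The + k is pointer arithmetic, so it advances k * sizeof(int) bytes. */
ptrdiff_t int_offset_bytes(const int *base, ptrdiff_t k)
{
    assert(base != NULL);
    return (const char *)(base + k) - (const char *)base;
}

/* Same idea for fractions: + k advances k * sizeof(struct fraction) bytes. */
ptrdiff_t fraction_offset_bytes(const struct fraction *base, ptrdiff_t k)
{
    assert(base != NULL);
    return (const char *)(base + k) - (const char *)base;
}
```

With `int array[10]`, the expressions `&array[3]` and `array + 3` compare equal, and `int_offset_bytes(array, 3)` comes out as `3 * sizeof(int)`. The sketch only uses in-range values of k; the out-of-range indices discussed in the lecture are undefined behavior in standard C.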

Student:Is there any way to get access [inaudible].

Instructor (Jerry Cain):There is. You can actually use some casting tricks. I'll do that in a second, okay? I will do that, like, probably in two or three minutes, but I'll do a really good example for that question. Okay? What I hope is a good example. I should say it that way.

Student:Permission for the array [inaudible].

Instructor (Jerry Cain):That's correct.


Instructor (Jerry Cain):That is correct. So just because I write a one here and I have code that actually tries to do it, doesn't mean that while it's running, it's gonna succeed. If it succeeds, it does place the bit pattern for one there. It might also crash. Okay? Or it might actually succeed. But this space right here? We'll see this very shortly. This space right here and this space right here is gonna be associated with other local variables that happen to be declared above this and below this. There's no impact here because this is the only declaration. But if I were to declare int I right there, and double D right there, the model we're going to use – and this is the model that's really used – is gonna pack all local variables into a little thing called an activation record. That's just fancy terminology for the block of memory that's set aside for all local variables in a function. So if you touch this right here, you're really touching some other local variable. You're touching the one over here that was declared after the array. This is the way it kind of works out. Okay? Does that make sense? Okay. There's a couple more rules I want to talk about here. When you dereference this right here, if you put an asterisk in front of this ampersand, they kind of negate one another. So this is synonymous with array of zero. The extension of that for this line is that if I put an asterisk in front of this, the pointer arithmetic is done first so it computes the address of the integer you're interested in, and then the asterisk actually brings you into that rectangle. It is synonymous with that right there. So that's why when you do something like array of negative four, you're really doing this. Oops. Pointer arithmetic brings you not four bytes before, but 16 bytes before that address.

You dereference it to actually sit in, and find yourself in a rectangle that's capable of receiving the 77. Does that make sense to everybody? Okay. That's great. So in a second I will start mixing arrays and structs, but to get to your point with regards to how do you access the internals if you want to do it. You rarely want to do it, although there are – actually it turns out that there are features of assignment two that rely on this type of knowledge. I'm not encouraging you to write this code, but I don't see the disadvantage of understanding it. If I go ahead and declare, let's say, an int array – I'll keep it small – five. Oops. I get this right here, one, two three, four. If I go ahead and set array of three equal to – let me do 128. Uninitialized, left uninitialized, left uninitialized, left uninitialized. I actually put a 128 there, and I'm drawing in the right half of the box because that's really where the bits will be updated. Everything to the left of the 128, right here, will be all zeros. And this will be one followed by seven zeros. I'm just emphasizing the fact that the 128 happens to fit in the lower of the two bytes. Okay? If I do this, the data type of that is int star, right, unless I do this. Okay? Now, ARR is brainwashed momentarily into thinking that it addresses a short. And there, incidentally, is space for ten shorts there. Okay? The way ARR, or the way the result of that expression sees it, that's short of zero, short of one, short of two, short of three, short of four, short of five, short of six. Make sense? Okay? This is kind of what you were getting at, I'm assuming. Zero, one, zero, two, four, six. It's gonna write a two in that byte right there. Okay? So when I go ahead and I cout << ARR of three << ENDL, you are not printing out a 128. You're actually printing out 512 plus 128. Everyone know where the – where I'm recovering that 512 value from? Okay. 
If this is the number two, and I multiply it by two eight times to get into that position right there, that's really two to the ninth plus two to the seventh. Okay? And so it's going to print out whatever that number is right here. Okay? I can go arbitrarily nuts with all of this casting. If I wanted to take the address of ARR of one, and I want to cast that to be a char star, I want to add eight to that, and I want to cast that to be a short star. And I want to find the third short after that and set it equal to 100. I think I have the patience to go through with this and show you what's going on right here. ARR of one is that box right there, so the ampersand is that right there. Pretend just for me that you're a char star so I can do something funky and add eight to you, but have it mean eight times the size of char: plus two, plus four, plus six, plus eight. So that's the address of this right here. Okay? You're a char star. No, you're not. You're a short star. Okay? Pretend you’re the base address of an array. I don't care how long of an array it is. Just go three shorts forward of that short star that's right there to figure out where to write a 100. This is the zeroth one. This is the oneth one, the twoth one. The third one. This is where the 100 would go. Okay? Don’t write code like this; just understand it. Okay? This make sense to people now? Okay. Let me start blending structs and fractions to get more interesting examples. We're – come Wednesday, we're gonna be able to do meaningful stuff with this knowledge. Right now it's all gibberish, and it seems like it's just contrived code. It is certainly contrived code because the examples need to be small and focused.
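The idea of writing a short into the middle of an int array can be tried out with the sketch below. It is an illustration under stated assumptions, not the code from the board: the helper name `write_short_slot` is made up, and it uses `memcpy` instead of the `(short *)` cast because reading or writing an int through a short pointer is not well defined in standard C. Which half of the int the short lands in, and therefore the number you see afterwards, depends on the sizes of short and int and on the machine's byte order, so it may not match the figure worked out in the lecture.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: overwrites the idx-th short-sized slot of an int
   array with value. It has the same effect as ((short *)arr)[idx] = value
   on typical machines, but goes through memcpy to stay well defined. */
void write_short_slot(int *arr, size_t idx, short value)
{
    assert(arr != NULL);
    memcpy((char *)arr + idx * sizeof(short), &value, sizeof value);
}
```

Assuming 2-byte shorts and 4-byte ints, after `arr[3] = 128;` and `write_short_slot(arr, 6, 2);` the slot written is the lower-addressed half of `arr[3]`. On a big-endian machine that half holds the high-order bits, so the 128 survives and `arr[3]` reads as 2 * 65536 + 128. On a little-endian machine that half holds the low-order bits, so the 128 is overwritten and `arr[3]` reads as 2.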

But once we understand how to deal with memory – and that's what we're really doing with all of these examples – you'll be able to take the understanding of memory and write meaningful generic code in C. In C we don't have templates. That's how we dealt with generics in C++. In C we have to leverage off of the fact that we know the size of everything, and we know that bit patterns represent values to be able to write a generic binary search, or a generic linear search, or a generic swap function, or a generic vector, or things like that. Okay? And that's what Wednesday and Friday, and probably next Monday are going to be all about. Did you get this example? Okay. You'll – if you don't get it yet, section handout come next Tuesday, not tomorrow, will deal with more of this stuff. Okay? Let me go on to structs with arrays inside of them. I'm gonna need two boards. That's why I'm erasing so much. I should just erase with the chalk. Here's the struct definition I want to deal with. Struct student. Okay? I have a field inside. I want to store an exposed character pointer. You're not used to doing this because you had a string class in C++. We actually don't have those in pure C. They're always represented as character arrays, okay, where the characters in the string are laid out side by side. Rather than there being a period at the end, there's what's called a "null character," the backslash zero character that's at the end. This one's gonna happen to reside as a string outside the struct. All I want to do is I want to store the address of the zeroth character of the entire name. That's different from this, SUID of eight. I want to store the individual digits of a seven-digit SUID in an array that's wedged inside the struct. This will become clear from a picture in a second. And then at the bottom I just want a normal integer num units. And there we have our definition. Okay? What does a picture of one of these things look like? It looks like this right here.
There's my char star. There's my static character array of length eight. And there's my num units field. So this is a sixteen-byte struct, okay? You're not used to looking at these things this way, but in memory diagrams, at least usually – at least for the next day, you read left to right, bottom to top because you're always worried about the lower addresses. The address of the entire struct is coincident with the address of that char star. Okay? So to see this arrow, you don't actually know whether or not it's a student star or a char star star. Yes, we'll be dealing with double pointers. Okay? If I go ahead and declare four of these things in an array, student, pupils of four, then I get four of those things laid out side-by-side. The way of laying down the elements, the base elements of an array, is the same whether you're dealing with Booleans or ints or doubles or structs. So actually you're gonna have four of these things. Draw those right there to make it clear that we're – that's the zeroth, the oneth, the twoth, the third. Okay. So I have all 64 bytes of memory for my packed array of four items. Each item is a struct, and the same skeleton, or the same view of memory, overlays each of the four quantum elements.
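The struct and the array of four described above might be written as in the sketch below. The field spellings (`name`, `suid`, `numUnits`) and the helper `pupil_offset_bytes` are assumptions made for illustration. The sixteen-byte total quoted in the lecture assumes 4-byte pointers and 4-byte ints; on a typical 64-bit machine the pointer is eight bytes and the struct is usually larger, so the sketch measures with `sizeof` rather than hard-coding 16.

```c
#include <assert.h>
#include <stddef.h>

struct student {
    char *name;     /* address of a string that lives outside the struct */
    char suid[8];   /* seven digits plus '\0', stored inside the struct */
    int numUnits;
};

/* Hypothetical helper: byte distance from the start of the array to the
   start of element k. Elements are packed side by side, so this is
   k * sizeof(struct student). */
ptrdiff_t pupil_offset_bytes(const struct student *pupils, size_t k)
{
    assert(pupils != NULL);
    return (const char *)&pupils[k] - (const char *)pupils;
}
```

Because `name` is the first field, `offsetof(struct student, name)` is zero, which is the sense in which the address of the whole struct coincides with the address of the char star.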

So when I do this, pupils of zero dot num units equals 21, you know that 21 goes somewhere, and this isn't too bad. You know it's gonna go in the space that's dedicated to the num units field of the very first student struct. Okay? If I do this, pupils of two dot, let's say, name is equal to – there's this function I want to talk about – strdup Atom. S-T-R-D-U-P, strdup is actually shorthand for string duplicate. Okay? So what this does as a function is it dynamically allocates just enough space to store the string – in this case Atom – and then it actually writes down Atom in that space, and as a function returns the address of the capital A. Okay? These four things right here are all local variables – I'm sorry, the entire array is a local variable. It resides in a part of memory called the stack. I'm assuming you've heard of the word "stack" before. You may not have heard it talked about in the case of memory. But the dynamically allocated string – and this is dynamically allocated – that is drawn from a part of memory called the "heap." Logically, we assume that it's five bytes. It actually makes space for that backslash zero. Okay? And then the address of that new figure right there, after it's been initialized with whatever this string logically is, gets returned, and it's dropped in the name field of the third, counting from zero, okay, struct. So this gets placed right there. Okay? That's very different than this type of setup, where pupils of three dot name is equal to pupils of zero dot SUID plus, let's say, plus six. There's a lot going on in that line. Let's just look at the right hand side, pupils of zero dot SUID. Pupils of zero SUID of zero is that right there. Pupils SUID of four is right there, but I don't have any array index dereference going on there. I have just the raw array name right there. That's synonymous with that arrow right there. 
In spite of the fact that this is a big, nasty expression that evaluates to a pointer, when I add six to it, it's doing pointer arithmetic against a char star. Okay? So the six is effectively, even though it doesn't matter, it's scaled by the size of a character, which is one. And so the overall right-hand side expression is the address of that character right there. Does that make sense to people? Okay. The tail of that arrow is assigned right there. All I did was I assigned an actual value to the name field of the very last student in that record. It happens to be the address of something that resides inside the entire figure. Okay? If I do this, pupils of one – oops, messed up. Str – not dup – cpy of pupils of one dot SUID four zero four one five XX. Right there. Strcpy is like strdup, except it doesn't actually allocate any memory. It assumes the address where you should copy the string is identified by the first argument. So what this does is beneath the surface – in strdup as well, but specifically in strcpy – there is some little for loop that keeps on copying characters one after another until it finds a backslash zero, and it copies that as well. In fact, strdup, after it calls C's equivalent of operator new, which is called malloc, actually calls strcpy. This right here, on this one-by-one basis would write a four right there. And then a zero, and then a four, and then a one, a five, an X, an X, and a backslash zero would be written right there, and then it would return. And it's completely useful because of its side effect of copying characters around. You have to make sure that the address you pass in there actually points to character space that's really under your jurisdiction because it's gonna try and write characters to whatever address that's specified there. You better make sure it's a good address. Okay? This one right here, strcpy again, of pupils of three dot name, one, two, three, four, five, six.
– and that's enough; don't worry about the fact that that's not really a Stanford ID. It was 70 years ago, I'm sure. Pupils of three dot name. That evaluates to whatever this evaluates to. That means that location right there, okay, is what's identified as the place where characters should be written. Okay? It follows exactly the same recipe that the first call of strcpy did. It's not as if this byte and this byte are auto-declared behind the scenes as things that can always store characters. And it's not as if that's turned off right here. As far as strcpy is concerned, it sees this as a base address of an arbitrarily long character sequence space where characters can be written. And so what's going to happen is it's going to write the digit character one right there. It's going to write the digit character two right there.

It's gonna do exactly the same thing right there with a three, a four, a five, and a six. It's gonna write a backslash zero – oops, not there – in the left-most byte of that name field right there. Okay? Does that make sense? And then strcpy's, like, I did my job. I'm awesome. I'm gonna return back to the main function. So when you come back, if you want to print out the number of units this student was taking, it's a lot. Okay? It is three times two to the 24th plus four times two to the 16th plus five times two to the eighth plus six. They'd have to petition to do that. Okay? If I go ahead and I print out this string right here, and I actually pass this in, if I do cout << pupils of three dot name, it actually would print one, two, three, four, five, six, and that's it. Okay? That's because it just receives the address of something it trusts to be a character followed by probably another one followed by yet another one. It just crawls over consecutive bytes of memory until it incidentally finds one with a zero in it. Does that make sense to everybody? Okay. Again, you will not be writing code like this, okay, but you should be able to understand, at least believe that the drawing I'm putting here is consistent with the code. You had a question?

Student:Yeah. [Inaudible].

Instructor (Jerry Cain):Yeah, that pupils of three dot name. So this right here, the name field – it's not ampersand of name. So I don't pass the address of this box. I have pupils of three dot name evaluate itself. Okay? So if this is the number 1,000 in here, it's because the address of that box right there is really a 1,000. And that's what's passed to strcpy. So it starts copying characters to address 1,000, and then 1,001, 1,002, etcetera. Does that make sense? Okay? Question in the back? No. Okay, you guys are good. One other thing I – yep, right there.

Student:Look at variables in the string. What happens [inaudible]?

Instructor (Jerry Cain):If I – this right here, before this block ends, unless I want to pretend that atom is a helium balloon, and I want it to fly off and never be recovered, I would have to free it before this code block. And I could just do that by passing pupils of two dot name to free. Okay, free is the C equivalent of this delete thing you're familiar with. Okay?
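The well-behaved parts of the pupils example, including the free just mentioned, can be collected into one sketch. Everything named here is an assumption for illustration: the field spellings, the function `fill_pupils`, and the helper `duplicate_string`, which stands in for strdup by doing what the lecture says strdup does (malloc, then strcpy). The overflowing strcpy into pupils of three dot name is deliberately left out because it writes outside the memory the program owns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct student {
    char *name;
    char suid[8];
    int numUnits;
};

/* Hypothetical stand-in for strdup: allocate just enough heap space for
   the characters plus the terminating '\0', then copy them over.
   Returns NULL if the allocation fails. */
char *duplicate_string(const char *src)
{
    assert(src != NULL);
    char *copy = malloc(strlen(src) + 1);
    if (copy != NULL) {
        strcpy(copy, src);
    }
    return copy;
}

/* Performs the in-bounds assignments from the board on a 4-element array. */
void fill_pupils(struct student pupils[4])
{
    pupils[0].numUnits = 21;
    pupils[2].name = duplicate_string("Atom");  /* 5 bytes on the heap */
    strcpy(pupils[1].suid, "40415xx");          /* 7 chars + '\0' fit in suid[8] */
    pupils[3].name = pupils[0].suid + 6;        /* points inside the array itself */
}
```

After calling `fill_pupils(pupils)`, the caller owns the heap string and releases it with `free(pupils[2].name)` once it is no longer needed. Nothing needs to be freed for `pupils[3].name`, because that pointer refers to bytes inside the array rather than to heap memory.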

Student:It doesn't really tell us [inaudible]?

Instructor (Jerry Cain):Nope, not at all. That's Java. That's not C++. Okay. One more line: pupils of 7 dot SUID of 12 – let's not do that; let's do 11 – is equal to the character A. Just because there are structs involved doesn't mean that it intimidates the executable. It will go ahead and it will do the manual pointer arithmetic to find out where the seventh student would reside if this address, if the array actually existed there. So I would go to not the zeroth, the oneth, second, or third. I better go to the fourth, the fifth, the sixth, the seventh. There's a gesture, a little phantom halo, around the space that we're identifying, or pretending, is pupils of seven. Then I jump to its SUID field. That would reside and begin right here. As if I legitimately had space for eight characters right there. It even double whammies the system and goes beyond that array boundary. This is four – I'm sorry, this is zero. This is four. This is eight, nine, ten, eleven. It would write this scattered A, 65, in that one little byte over there in memory. Would it succeed while it's running? If it crashes, no. If it doesn't, yes. Okay? That's just the way it will work out. Okay? Does that make sense to people? If I were to go ahead, and I were to print in this state right here, if I were to print just the address itself, all I know is that the other three bytes are uninitialized. If I print this entire number, okay, all I can tell is that it would be less than two to the 24th. That's all I know because I zeroed out the really large contribution to the overall thing. Does that make sense? Okay. If I were to print out this right there, if I were to pass that char star, it has no idea that it's the address of a character that happens to be in a larger string that starts before it, so if I were to pass that address to cout <<, it would print one XX, and that's it. Does that make sense? Okay. Just to make that clear to everybody. No boards.
Here is – let's just say it's a dynamically allocated string that has Colleen backslash zero in it. Okay? If I pass that address to cout, it prints out the entire name Colleen. If I use my pointer tricks to pass that to cout <<, it still begins a null terminated string. It happens to not mean as much to us, but it will still print it out textually. It would print out L-L-E-E-N. Okay? If I pass that right there, it would print out E-N. If I pass that, it would just print nothing because it doesn’t actually print the backslash zero. So that's basically this weird representation of the empty string. Okay? Does that make sense to people? Okay. Very good. Okay. You guys are doing okay? Good. What I wanna do now, is I wanna start talking about how to write generics in C. We have enough experience with this memory business so that I can actually write a real function in C that leverages off of this stuff. Let me just write a function I know you've seen before, and it's actually charmingly simple for us to go out because this is all very difficult compared to what I'm about to write. Actually, this board's better. I want to just write a really simple function, and use advanced memory terminology to describe what happens. Void swap int star – actually, you probably haven't seen this version before if you've used references in the past.

What happens is that you've probably declared two integers, int X is equal to seven; int Y is equal to 117. And I'm concerned with the call to swap, where I pass in the locations of my X and Y variables. C – and I'm writing up here. C function right here has no templates. That's relevant. It also has no references. Okay? So there are few meanings – fewer meanings of the ampersand symbol in C. What I'm doing here is I'm assuming I own X and Y as little jewel boxes, and I pass the addresses of those to the swap function so it knows, at least, where to go to move byte patterns around. That's effectively what's done by the swap when you think about it in memory terms. Okay? Does that make sense? So this is a function I haven't written yet, but I know that this thing called AP and BP – the P is there just to remind myself that it's a pointer – this points to the X box and the Y box that has a 117 in it. So what I want to do is I want to exchange the one – I'm sorry, the seven and the 117. The way I do this is I declare a temp variable, and set it equal to what I get by traveling from the AP pointer to the space it addresses. So I get temp right there. How is it initialized? It's not set to this number.
The asterisk says please hop forward once to find the place that should be copied. The bit pattern for that seven is replicated right there. Because temp and the space addressed by AP are both ints, the bit patterns mean the same thing in both contexts. Then I do this. A little bit more involved, but you understand, certainly, what's going to happen. You may not – if I wrote a more difficult version of this type of function, you might not get it. But what happens here is the space addressed by AP – not this space right here, but the space addressed by it – is identified as the L value, or the recipient of whatever the right-hand side evaluates to. The right-hand side evaluates not to BP, but to what it addresses. Okay?

So this 117 is replicated right there. The four-byte representation of 117 is replicated in the space addressed by AP, and then finally I do this. BP addresses whatever was stored here previously. And that's how I get a seven right there. Let's get a better seven. Okay? Now, what I did there, algorithmically, had very little to do with ints. The only fact about ints that was involved was that the figures being rotated and swapped are four bytes. Okay? Make X and Y floats. Make this float star and float star, and make that a float. The pictures can even stay the same in terms of the drawings – in terms of the sizes. They're still four bytes, and as long as I exchange all these things, okay, then I'm going to effectively achieve the swap, even though I don't necessarily care that they were floats versus integers. Okay? If I pass in double stars, or char stars, or bool stars, or struct student stars, the same rules apply. Okay? Bit pattern swapping is what it really is. Okay? You know enough about generics from CS106B, 106X to know that we would probably use references – because references are prettier, right – from this point forward. And we would also templatize it if we wanted the same block of code that we write to be used in different type scenarios. We have neither one of those in pure C. But there are several situations where you do benefit by actually going the extra mile and making the code you write generic. Okay? Well, it's not pretty. Turns out it's actually kind of – it's something of a hack to write a generic function in C, but it is the way it's done. And once you understand memory really well, you stop thinking of it as a hack, and you start to see it as very, very beautiful. Okay? As the way it actually works – because you understand what's happening on your behalf when you swap these two figures – and you just specify the addresses.
Or you linear search this array, and the algorithm for linear search is the same whether or not ints or strings or struct students are involved. Binary search the same way. Merge sort, quick sort, all those things you learned about, and templatize, in 106B still can be done in languages older than C++ using this information about memory that we've learned over the last two lectures. Okay? So come Wednesday, I will go generic on you with this function right here. And frame it in terms of generic pointers and generic byte swappers. Okay? Have a good night.


Lecture 3 Programming Paradigms


In writing C and C++ programs to run under Unix, there are several concepts and tools that turn out to be quite useful. The most obvious difference, if you are coming from a PC or Macintosh programming background, is that the tools are separate entities, not components in a tightly coupled environment like Metrowerks CodeWarrior or Microsoft Visual C++. The appendix at the end of the handout gives a summary of some basic UNIX and EMACS commands. The most important tools in this domain are the editor, the compiler, the linker, the make utility, and the debugger. There are a variety of choices as far as "which compiler" or "which editor", but the choice is usually one of personal preference. The choice of editor, however, is almost a religious issue. EMACS integrates well with the other tools, has a nice graphical interface, and is almost an operating system unto itself, so we will encourage its use.

Handout 5 UNIX Development
18 pages

Topics: Creating a Generic Swap Function for Data Types of Arbitrary Size, Void* Type for Generic Pointers, Implementation of Swap Function Using memcpy, Client Interface to Generic Swap Function, Pros and Cons of C Generics vs. C++ Generics, Errors Resulting from Improper Use of C Generic Swap Function that Compile, Swapping Pointers, Pitfalls when Swapping Pointers Using Generics, Implementing a Generic Linear Search, Implementing a Generic Linear Search, Using Casts and Pointer Arithmetic, Comparing Memory Blocks Using memcmp or a Comparison Function



Instructor (Jerry Cain):Hey, hey everyone. Welcome. You made it through a week of 107. I have two handouts for you today, although I really only have one. I have one fresh handout, and I also have hard copies of the discussion session handout from yesterday, which I know not everybody can go. So if you need a hard copy of that and don’t want to print it out yourself, then come grab a copy before you leave. When I left you last time, I had gotten through the implementation of swap that was specific to ints. So I want to make a few points about that and then go generic on you by implementing the C version of what we would do in C++ using templates.

This is more or less the code I wrote for you last time. The idea being that AP and BP actually address in some mysterious space. They know the address of it, but they don’t know what the source of it is. Whether it’s the heap or subfunction call or whatever, but algorithmically what happens is that the two integers in those boxes are effectively exchanged. Now the 106A or 106B way of saying this, in spite of the fact that it uses pointers, is that it actually rotates the integers.

The 107 spin on this, which I think is more helpful for the code we are going to write in a second, is that, oh, I don’t really care that they’re integers as long as I exchange the representations for those things – the 4-byte representations for these two integers. Then when we go back to the code that calls this, it will notice that two of its integer variables, whether they are embedded inside arrays or structs or they are two stand-alone integers – their bit patterns will have been exchanged, their representations will have been exchanged so that when they look at those they’ll be each other’s integers. Okay, does that make sense to people? Yes? No? Okay, it did not or did? We got a nod.

The reason I say this is because the implementation here – and I’m going to frame this with a 107 bent on it. This declares a 4-byte figure, and this assignment replicates the four bytes held by this box in that box right there. It knows that we are dealing with a 4-byte figure because this and this and temp are all typed in a way that’s related to an integer. Okay? This does the same thing, takes a bit pattern right here and replicates it in the space addressed by AP, and then finally remembers what used to be pointed to by AP and puts that in what’s pointed to by BP. Okay, so it’s really a bit pattern rotation.

There is an implicit knowledge of the number of bytes that are being moved around because ints are just understood even in compile-time to be 4-byte figures. If I want to use this function to swap doubles, I am not going to be able to do it. If I want to be able to swap two structs or two classes, I am not going to be able to do it.

What I want to do is I want to write a version of swap that can exchange – I’m gonna design it to exchange two arbitrarily sized figures. I’m sorry – the two figures themselves will be the same size, but I don’t want to constrain it to be 4 bytes. So what I want to do is I want to write this as a function void swap, rather than accepting an int * and requiring that I get the address of an integer or something that’s posing as an integer. I want to be able to pass in an arbitrary address here, and I don’t want to constrain it to point to any one type.

The way you do that in Pure C and even in C++ technically, but in Pure C is to write down a generic pointer type. That is a type void *. Now, that doesn’t mean that it points to nothing; it just means that it points to something that doesn’t have any type information about it. Okay? And I’ll put down VP1. The second argument will be VP2. The set-up here is that VP1 and VP2 are addressing some things that begin at the addresses that are stored there.

Now I draw them as L’s as opposed to rectangles because I don’t know how wide they are. Does that make sense to people? Okay. And it’s really a generic address. It’s just an arbitrary location in memory. There may be a character, there may be a short, there may be a Boolean and there may be an unsigned long. There may be a struct Fraction or a struct student. We just don’t know.

Let me make the mistake of closing this off and showing you what problems we run into. If you try to do this – I’m not trying to be funny here, but if you try to do something like this – your heart is in the right place, but this is just plagued with issues. There is one quite clear problem with this and there is one slightly more subtle problem with this. You cannot declare a variable called temp to be a type of void.

Okay, that’s just a return type for functions. That just states that there is nothing to be returned. You can pass in void as the lone argument to a function or a method to say that we’re not expecting anything, or you can use void in the context of void * to mean generic pointer. You cannot declare temp to be a void, okay?

The more subtle problem here is that you are not allowed to dereference a void *. And you may be like, “Well, why not?” And the answer is it doesn’t know how many bytes to go out and embrace as part of the dereferencing process, okay. “Do I go out and do I deal with a 1-byte figure, a 2-byte figure, or a 4-byte figure?” There is no type information about this, so it doesn’t know whether it is 4, 16 or 128 bytes.

Does that make sense, people? It has no size information about the thing being addressed at all. So the official thing to do, recognizing that we still want to rotate bit patterns, is to expect a third argument. Int called size where size is supposed to be explicitly stated as the number of bytes making up the figures being swapped.

So at least it has more information than it had before. It actually doesn’t really care whether they’re 4-byte integers or 4-byte floats or a struct with two shorts inside. As long as I exchange the 4-byte bit patterns, I am effectively swapping the values. This is how you do it – char buffer[size]. Our version of GCC and G++ allows you to declare arrays with a size that depends on a parameter. So this might seem weird that I’m declaring a character buffer, but it isn’t really a character buffer in the C-string sense. I’m really just setting aside enough space to hold size bytes so it can function as temp does in that block of code up there.

I don’t care to interpret buffer as a string. I just want it to be this little storage unit where I can copy something. Remember how last time I went over this function called strcpy that knew how to copy bytes from one location to another location and it kept on copying until it found a '\0' and it copied the '\0' as well? There is a more generic version of that that is not dedicated to characters.

There is a function called memcpy. What that’s designed to do – it’s like strcpy, except it doesn’t pay attention to '\0', so you have to explicitly tell it how many bytes to copy to its memory location. If I write buffer there and I write VP1 there, that’s an instruction to keep copying bytes, you can think about it copying bytes one by one; one byte after another to the space addressed by this right here. Okay?

This is the source of those bytes. It doesn’t care about zero bytes. You may be copying 20 bytes of zeros; this is why you need the size parameter to be passed in so you know how many bytes should be copied. So before I finish this, let me just give you a sense as to what is happening here. Suppose this is in fact an 8-byte figure and this is the bit pattern that is right there. This declares something that is as wide as that. It's not drawn to scale but I will just emphasize the fact that it is all characters.

This right here says, “Please copy stuff from that address into this address right here,” and it just does it byte by byte. It doesn't matter that they're not really characters. They are just bit patterns that are taken or digested; one byte at a time and the full bit pattern that’s right there is replicated in that perfectly sized space. Does that make sense?

Only in Java, it does; it doesn’t in C++, it’s only one byte. Yeah. The memcpy right here basically does the equivalent of that first line up there. It just took two lines here. Then what I can do is I can do a memcpy into the space addressed by VP1 from the space that is addressed by VP2 and copy the same number of bytes. That takes that right there and, as a bit pattern, replicates it over this space.

And then finally, I do this – copy to the space VP2, the stuff that was stored in buffer, and I get that then. So it achieves the same byte pattern rotation that you see in that very type specific version up there; it just does it generically. Okay, does that sit well with everybody? Now you may look at this and say, “It’s kind of ugly.” It is kind of ugly. There are actually a lot of problems with this.
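Putting the three memcpy calls together, a sketch of the generic swap might look like the following. It assumes a compiler that accepts a variable-length array for buffer (C99 or the GCC extension mentioned above); swap_demo is a hypothetical helper I added to show an 8-byte figure being exchanged.

```c
#include <assert.h>
#include <string.h>

/* Generic swap: exchanges the size-byte bit patterns at vp1 and vp2. */
void swap(void *vp1, void *vp2, int size)
{
    char buffer[size];           /* scratch space, not a C string */
    memcpy(buffer, vp1, size);   /* buffer <- bytes at vp1 */
    memcpy(vp1, vp2, size);      /* vp1    <- bytes at vp2 */
    memcpy(vp2, buffer, size);   /* vp2    <- saved bytes  */
}

/* Hypothetical demo: swap two 8-byte figures, zero bytes included. */
void swap_demo(void)
{
    char a[8] = {1, 0, 2, 0, 3, 0, 4, 0};
    char b[8] = {9, 9, 9, 9, 0, 0, 0, 0};
    char old_a[8], old_b[8];
    memcpy(old_a, a, 8);
    memcpy(old_b, b, 8);

    swap(a, b, 8);

    /* memcpy does not stop at zero bytes, so all 8 bytes moved. */
    assert(memcmp(a, old_b, 8) == 0);
    assert(memcmp(b, old_a, 8) == 0);
}
```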

This right here, that declaration of an array, is supported by our version of a compiler. True ANSI C that’s compatible with all compilers actually doesn’t allow you to put anything other than a constant inside. I don’t mind if you do this. You can use the compiler that you have. But the real implementation would probably dynamically allocate a block that is that number of bytes, move it, use it as a temp even though you are copying to the heap as opposed to the stack, and then you get to free it at the end. So most of the energy is invested in the dynamic allocation and de-allocation of a buffer or a temp space. Okay? Do you guys understand this function right here?

Well you wouldn’t – you would call malloc, which is like operator new from C++ and you would – I’ll talk about malloc when we get there, but it’s just the C equivalent of operator new. I just like this version better because it’s a little cleaner and I want to talk about memcpy more than I want to talk about malloc. The thing about this – you say, “Okay, well, that’s great. I guess I have to deal with the void *s but it’s not that bad.” The problem is that lots and lots of things can be disguised as void *s.
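For compilers that reject the array declaration, a sketch of the heap-based alternative could look like this. The name swap_heap, the size_t parameter, and the int return code for allocation failure are my own assumptions for illustration; the lecture only says the buffer would be dynamically allocated and freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Generic swap using a heap buffer instead of a variable-length array.
   Returns 0 on success, -1 if the temporary buffer could not be
   allocated (return convention is an assumption of this sketch). */
int swap_heap(void *vp1, void *vp2, size_t size)
{
    void *buffer = malloc(size);   /* temp space lives on the heap */
    if (buffer == NULL)
        return -1;
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
    free(buffer);                  /* give the temp space back */
    return 0;
}

/* Hypothetical demo: swap two structs made of two shorts each. */
struct pair { short first; short second; };

void swap_heap_demo(void)
{
    struct pair p = {1, 2}, q = {3, 4};
    if (swap_heap(&p, &q, sizeof(struct pair)) == 0) {
        assert(p.first == 3 && p.second == 4);
        assert(q.first == 1 && q.second == 2);
    }
}
```

Most of the work here is the allocation and de-allocation, which is why the array version reads more cleanly.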

Let me make the proper call here. If I go ahead and declare int x = 17 and int y = 37, and I do this: address of x, address of y, and I pass in – you could pass in the number four, but that’s not a cross-platform solution; you want sizeof(int). That would actually return four. That’s the way the client has to interact with this generic function right here. Identify where those two ints are; the swap implementation doesn’t care that they’re ints, it just cares that they are 4-bytes wide, so it does the right number of byte rotations as far as these three calls. Does that make sense to people? Okay. The problem comes if you try to do something like this (double) – actually, this is not a great example.

Let me just do this: double d = pi, e = e. Just pretend that that makes sense. And I want to make the call for this. You do this. And the same code works. Let me frame some plusses of this right here. The same code gets used for both of those calls right there, okay? It emphasizes the fact that it’s this generic byte rotator. Think about what you’d have to do in C++. I know you probably know you’d use templates in C++.
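The two client calls can be sketched as follows. The swap body is repeated so the block stands alone, the literals for pi and e are approximations I typed in, and swap_calls_demo is a hypothetical wrapper.

```c
#include <assert.h>
#include <string.h>

void swap(void *vp1, void *vp2, int size)
{
    char buffer[size];           /* relies on variable-length arrays */
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
}

/* Hypothetical demo: one compiled swap serves both calls. */
void swap_calls_demo(void)
{
    int x = 17, y = 37;
    swap(&x, &y, sizeof(int));        /* sizeof, not a hard-coded 4 */
    assert(x == 37 && y == 17);

    double d = 3.14159, e = 2.71828;  /* stand-ins for pi and e */
    swap(&d, &e, sizeof(double));
    assert(d == 2.71828 && e == 3.14159);
}
```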

The one perk of this over templates is that just this code gets compiled and the same assembly code that corresponds to this right here gets used for both calls. When you deal with templates, there are many plusses of templates, but a call to swap of int or swap of double in a template setting actually expands to two independent versions of the same code and compiles them in the int-specific domain or the double-specific domain.

Do you understand what I mean when I say that? Okay. That’s not a tragedy if you’re only calling swap twice but if you call swap in a very large code base, you call swap fifty different ways with fifty different data types you get fifty different copies of the same code in your executable.

Okay, one is set up to deal with chars, one’s set up to deal with the shorts, one’s set up to deal with the ints, etcetera. This is very lean and economical in the way that it deals with the swapping process. The problems – I actually say, “I’m not trying to illustrate this as the best solution; this is just what C has to offer.” The problem is there are so many mistakes that can be made when you are dealing with a generic function like this.

Swap is pretty easy in the grand scheme of things but we’ll see in a second, that it’s actually easy to get the call wrong and for the compiler to tell you nothing at all, because it’s very easy to be a void *. Okay? You can pass in the address of a float, the address of a double and pass in 32 right here. It’s not going to work very nicely when you actually run it, but it will compile. Does that make sense to people?

So these void *s, particularly the cast we’ve been dealing with for the past two lectures, they kind of sedate the compiler enough so that it doesn’t complain when it otherwise would have complained. Okay? That might be great to actually feel like you might be making progress towards your goal, but you really do want the compiler to edit and coach you as much as possible.

So to the extent that you use generics in C and cast in C, you are basically telling the compiler not to do as much work for you and you are just risking more when you actually run the program. Okay? You wouldn’t make this call, but just pretend you did. Suppose I do an int right here. i is equal to 44, and I do this, short s is equal to five, and logically, what I want to do is, I just say, “For various reasons, I need them to be different sizes but now I need the 5 and the 44 to logically exchange positions.” And you do this. And you just pass in the smaller of the two sizes.

The memory set up here is i is that wide, it has a 44 inside. S has a 5 inside. The VP1 and the VP2 that accept these addresses. Even though this is really an int and that is really a short, VP1 and VP2 don’t have that. It’s like they don’t have their typed contact lenses on or something. Okay, they just have the address itself and the only reason they know to access those two bytes and in this case, those two bytes is because we explicitly tell it how wide the figure is right there. Okay?

Now algorithmically follow the recipe right here. What is going to happen is it’s going to take this 5 and write the bit copy for the 5 in the left half of ( i ) right there. It is going to take whatever happens to reside right there; those two bytes, and replicate it down there. Okay. On a big-endian system, it’s going to put this 5 on the upper half of ( i ) and not clobber the 44, okay? And it’s going to take all of the zeros that used to be here and put it right there. So as a result of this call right here, ( i ) would take on a value of 5 times 2 to the 16th plus 44 and ( s ) would become zero. Does that make sense to people? Okay.


It would be a little bit different on a little-endian system. It actually would kind of get it right on a little-endian system, okay? But it would be a complete miracle that it is.
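Here is a sketch of that mismatched call, assuming a 4-byte int and a 2-byte short as in the lecture. The helpers is_little_endian and mismatched_swap are hypothetical names I introduced so the outcome can be inspected on either kind of machine.

```c
#include <string.h>

void swap(void *vp1, void *vp2, int size)
{
    char buffer[size];           /* relies on variable-length arrays */
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
}

/* Hypothetical helper: 1 if the lowest-addressed byte of an int holds
   its least significant bits. */
int is_little_endian(void)
{
    int probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Compiles without complaint: both addresses become void *s.
   Only the first sizeof(short) bytes of i are touched. */
void mismatched_swap(int *i_out, short *s_out)
{
    int i = 44;
    short s = 5;
    swap(&i, &s, sizeof(short));
    *i_out = i;
    *s_out = s;
}
```

Under those size assumptions, a big-endian machine leaves i as 5 * 65536 + 44 and s as 0, while a little-endian machine happens to leave i as 5 and s as 44, because the first two bytes of i are the ones that hold the 44.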

This isn’t a – it’s actually – it depends on what you call a disaster. This will survive compilation because this is a generic address. This is a generic address and this is effectively an int. So the compiler says, “Are you calling the functions properly?” Yes, I’m getting two addresses and I’m getting a number. Okay, and then it just runs this code, it exchanges the bytes according to its own little recipe inside and then whatever side effect is achieved by that rotation of bytes, is what you see when you go and you print i and s out.

Absolutely, this recipe, because size of short was passed in. It doesn’t even see the 44 and the other two bytes. It’s just not in its jurisdiction. Okay.

Oh, that was a mistake. Sorry. This should have been double. This was intended to be a valid call. They are all valid calls actually, but only some of them work.

I just chose buffer, it doesn’t have to be that. Yeah, I just chose it to emphasize the fact that it really is just this generic store of bytes. I could have called it temp; I just didn’t.

Yeah, with very few exceptions, everything that’s legal C code is legal C++ code. Just some things about type casting are a little bit different, that’s it.

Yeah, absolutely. It would actually be worse. If I were to put the word int right here all it does is, it gives the swap function a wider pair of arms to go and grab 4-byte figures instead. So this as a byte pattern would be exchanged with that space as a byte pattern, okay? If it didn’t crash and it ran, ( s ) would just take on whatever happened to be the left two bytes in the ( i ) figure.

Not using this right here. In that situation, you would have to write either an int short specific version of swap or you would have to allow for the possibility that you pass in two different sizes, one for each of the two void *s. It would be complicated. It is probably the case that you would not make any of these mistakes.

If you are dealing with atomic figures like doubles and ints and shorts, you are just probably not going to make a mistake like that. Okay, it’s a little troubling that the language doesn’t actually enforce the rules you want it to, but in this case, it really doesn’t amount to much. There is an example where I think it does but it’s gets to be more complicated and that’s what I’m going to do next.

Well, you could, but then once you cast something, it forces it to evaluate it and so you wouldn’t be passing in the address of s. You’d be passing in the address of the constant 5, and that doesn’t make sense. Okay, you actually have to have storage associated with an address, so it has to correspond to the addresses of some variable.

You could if you wanted to, set like int ss = s and then do a swap between ( i ) and ( ss ) and then after it was over, then set s = ss to whatever it turned out to be. You could do it that way. That’s a weird band-aid to overcome or to use, when it’s really the function that’s the problem. You would just want to write an int specific version of swap if you really needed to do this. Okay? Let me deal with data types that are already pointers.

You could. There is usually not very much reason to use const here. Usually you only use const – I know you are seeing a lot of const(s) in Assignment One. Const is only generally used when there is some sharing of information going on. This function owns its own copy of size. Does that make sense?

Oh, I see what you are saying. No, this still has to be evaluated, it would have to actually be a constant; like a 40 or an 80 or something like that.

It doesn’t have to do with the fact whether it is changeable. This is the fact that it’s an expression that evaluates to an int, but it is not actually an int. I don’t want to confuse matters. I think this is fine because we have a compiler that happens to like it. Some compilers might not, but this is just an easier way to write this function. Still dealing with this code right here –

Well, we can’t – if we do that, then we are trying to dereference a void *. You want to identify the address of the house with all of the bytes in it. Okay and that’s why you just let VP1 evaluate itself. If you try to dereference it, even if you can’t dereference it, if you dereference it, then you actually lose access to the address and so you don’t actually get access to all of the bytes that are there. Some compilers do let you dereference void *s, but I actually set up the warning so that you can’t. Okay? Any other questions at all? Okay, yep. Go ahead.

Well, if you set the warnings properly it doesn’t let you; it’s a compiler error. I’m sorry; it’s at least a warning in G++ by default. It just assumes that it is a 4-byte figure and it just deals with them as longs. But I want you to assume that void *s can’t be dereferenced. Let me write this block of code right here; there are so many ways this can be messed up. I have a char * called husband and I set it equal to strdup of “Fred”. I have a char * called wife equal to strdup of “Wilma”. I’m calling strdup here because I want independent copies of these strings to exist on behalf of this little snippet of code I’m drawing right here.

Let me draw the state of memory right here. I have this thing called husband, I’ll just put an “h” there to mean husband. I have this variable called wife, which just gets a “w” here, and then in the heap over here, I get space for Fred\0, I get space for Wilma\0 and this points to that as a result of second strdup call, that points to the first one as a result of the first strdup call. And what I want to do is I want to exchange, just for a day, I want Fred to do all of Wilma’s work and Wilma to do all of Fred’s work. So we can have [inaudible] point to it as if it really existed.

So what I want to do is I basically want to exchange the two strings. I want the husband variable to be associated with the Wilma string and I want the wife variable to be associated with the Fred string. The correct way to call this, it’s actually quite confusing. I want to call swap. A very reasonable question to ask here is whether you need an ampersand, because you say, “Okay, husband and wife are already pointers.”

I’m going to write it the right way. I’m going to put address of husband, I’m going to put address of wife and I’m going to put size of char *. Now don’t think too hard until I say a few more things. I actually want to exchange the two things that are held by the wife and the husband variables. Does that make sense?

When I wanted to exchange two ints, I passed int *s to swap. Okay? If I want to exchange two char *s or actually, I want to exchange things that are that many bytes wide. If I want to exchange char *s, I have to pass in the address of char *s to swap. Okay, that way it swaps these things right there. Okay, does that make sense? So this gets associated with VP1 in the swap implementation.

This right here gets associated with VP2 in the swap implementation. The size of char * is the size of this thing right here. That means I get a buffer of characters that is four bytes wide. Even though VP1 and VP2 recognize these addresses as generic and as void *s, I know that they are really char * *s. Okay?

The way this works is that this implementation forgets about the fact that there are these things over here and it forgets that these things really are char *s, it just rotates the bytes. So what happens is that this as a pattern, identifies the – I’m sorry, this as an address identifies that pattern as something that should be replicated, so it copies the address pattern right here and it happens to be interpreted that way.

Do you understand what I mean when I say that? This material is replicated right there. I draw it as a pointer, not because this knows it’s a pointer because we know it’s a pointer. Okay. This right here – I’m sorry, this material right there is replicated right there. And actually, technically that still points to Wilma. Okay?

And then finally this is updated to actually point to that right there. Okay, now it’s a lot of arrows that are moving around but the “h” – the tails inside husband and wife are actually exchanged. Nothing happens to the capital F; nothing happens to capital W and all of the characters after them, they stay put. I just have Fred and Wilma as husband and wife exchange names.
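The correct call can be sketched like this. To keep the block in standard C, dup_string is a hypothetical stand-in for strdup, and swap_names_demo is a wrapper I added; the key line is the swap call with both ampersands and sizeof(char *).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

void swap(void *vp1, void *vp2, int size)
{
    char buffer[size];           /* relies on variable-length arrays */
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
}

/* Hypothetical stand-in for strdup: heap copy of a C string. */
char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

void swap_names_demo(void)
{
    char *husband = dup_string("Fred");
    char *wife = dup_string("Wilma");
    if (husband == NULL || wife == NULL) {
        free(husband);
        free(wife);
        return;
    }
    char *fred_chars = husband;    /* remember where "Fred" lives */

    /* Pass char **s: the pointers themselves are exchanged. */
    swap(&husband, &wife, sizeof(char *));

    assert(strcmp(husband, "Wilma") == 0);
    assert(wife == fred_chars);    /* the characters never moved */
    free(husband);
    free(wife);
}
```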

Okay and there is some confusion in the matter because C-strings are just not as elegant as C++ strings and Java strings; they’re very manual and exposed character arrays. But you have to exchange the char *s. Okay? The problem with this is that if you forget to do that right there, it will still compile and it will still execute. And it will actually still run, and it will do something. It will not crash. I promise you, okay?

Let me redraw everything. I won’t be so careful with the drawings of Fred and Wilma. Here’s Fred\0, here’s Wilma\0 in memory, here’s husband, here’s wife, with an “h” and a “w.” There’s that and there’s that. So I redrew it, its set-up and forget about the ampersands being there. What actually happens now – 4 gets passed to swap. So it's going to be rotating 4-byte figures. Okay? But I kind of mess up a little bit.

Even though I am passing in a char * here and a char * there and I’m passing in size of char * there, this address gets stored in VP1. This address gets stored in VP2. Without the ampersand, it doesn't back up one level. Okay? It actually gives you the address; husband and wife actually evaluate to the tails of those pointers right there. So whatever the address of capital F and capital W are here, are stored there and there. They're evaluated and passed directly to the VP1 and VP2.

So swap is like okay, I got two addresses and I’m supposed to swap 4-byte figures. It goes and it actually copies Wilm there, and it leaves the a alone, and so those two strings would become – it would actually change the character strings, without changing husband and wife itself, to Wilm and Freda, okay? Does that make sense to people? Do you understand why it won’t crash? It's actually accessing, even though it's not the material we wanted, the material that’s under the jurisdiction of the code block. Okay, does that make sense?
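A sketch of the call with the ampersands forgotten follows. The lecture's machine has 4-byte pointers, so sizeof(char *) is 4 there; I pass the literal 4 so the sketch behaves the same way on a machine with wider pointers. dup_string is again a hypothetical stand-in for strdup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

void swap(void *vp1, void *vp2, int size)
{
    char buffer[size];           /* relies on variable-length arrays */
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
}

/* Hypothetical stand-in for strdup: heap copy of a C string. */
char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

void forgot_ampersands_demo(void)
{
    char *husband = dup_string("Fred");
    char *wife = dup_string("Wilma");
    if (husband == NULL || wife == NULL) {
        free(husband);
        free(wife);
        return;
    }

    /* No ampersands: the first 4 characters are exchanged, not the
       pointers. The literal 4 mirrors the lecture's sizeof(char *). */
    swap(husband, wife, 4);

    assert(strcmp(husband, "Wilm") == 0);   /* old '\0' still ends it */
    assert(strcmp(wife, "Freda") == 0);     /* the 'a' stayed put */
    free(husband);
    free(wife);
}
```

Both pointers still address the same heap blocks as before; only the characters inside those blocks changed, which is why nothing crashes.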

If I were to do this, it would not care why you would do that, but think about it: the compiler is like, "Okay, I’m happy with that." “Yeah, I have two void *s coming in,” one happens to be a char *. One happens to be a char **. What would happen is that the address that is stored in wife would be placed as the first four bytes of the fred string. Does that make sense to people? This right here would be replicated right there.

I can tell you, you can print it out, it's going to be something; it's not going to be a pretty string, but it's going to be a string that is no larger than four characters. It may be smaller because there may be a zero byte involved in the address. They would be random characters question marks, diamonds – whatever you see when you open one of those binary files accidentally. Okay? All those little numbers that don't happen to be letters of the alphabet or numbers or periods or what have you.

This right here, wilm right there, this is the problem. I’m sorry, the Fred that used to be there would actually be exchanged with this pointer so you would lay down Fred as a byte pattern in this thing that is going to normally be interpreted as an address. Okay? So that means that whatever 4-byte figure that corresponds to, if you pass wife to cout [inaudible], it's going to jump to the fred address in memory, which you certainly do not own. It will probably crash, because that address is not inside the stack or the heap. But if it doesn't crash, it's just going to print out random characters that happen to reside at fred, interpreted as an address. Does that make sense? Okay.

I'm not encouraging you to write code like this, and even if it works, I'm not trying to get you to write it in as complicated a manner as possible. I'm just trying to communicate the things – when you write code and you kind of mess up on the type system, it’s not a tragedy because the compiler will usually tell you there is a problem, unless you're dealing with generics, right here. Okay? And then it says, “Okay, I’m just going to trust you because you told me that it was just a pointer. And I can't argue with just a pointer when they are pointers.” So you have to be uber careful about how you code when you're dealing with generics. Okay, very powerful, also very dangerous. Okay?

Because of the asymmetry, right here let me draw this a little bit more cleanly. When a husband points to fred\0, that’s 4 bytes, what I just underlined right there, right? When I pass an ampersand of w, this points to wilma\0. This is a 4-byte figure, but it turns out, that doesn't matter. Because the addresses I passed to swap are, this right there and that right there. That means that these four bytes will be exchanged with those four bytes. Okay?

And just to make it clear how ludicrous it is, that means that fred as a bit pattern will be placed f r e d, the ASCII value for f times 2 to the 24th, the ASCII value for r times 2 to the 16th; all be assembled and interpreted later on as a regular pointer, okay? Whatever this is, whatever the bit pattern is, it logically can be set up to point to capital W, although it is going to be interpreted not as a pointer but as four side-by-side characters. Does that make sense?

Okay. There are all types of mistakes that can happen here, you can include both ampersands and get it right. If you put a double * there, it actually works because all pointers are 4-bytes, at least on our systems. That doesn't mean it's the right way to do it. If you want you can put size of double *****, and it will work. Okay? But you really want to be clear about what you understand to really being exchanged. If you really know you are exchanging char *s by identifying two char **s, you should for clarity's sake, not just because you can get away with it, you should put size of char *, right there. Okay, does that sit well with everybody? Okay, good.

I want to graduate to a new example. Let me once again write a really simple function for you. int – I'm just calling it lsearch. I pass in an int I'm going to call key, an int array, and an int size, and I want this to just be a linear search from front to back of the array for the first instance of key in that array, and I want it to return the index of it or -1 if it can't be found.

So algorithmically, this is very 106A. What I want to do is this. I want to be prepared to exhaustively move over everything, but if along the way I happen to find array of i matching this key, I want to go ahead and return what ( i ) turned out to be, okay? If I get this far because I've exhaustively searched and found nothing to match, at the bottom, I return -1.
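A minimal sketch of that int-specific search is below. I named it lsearch_int so it cannot collide with the lsearch that some C libraries declare; that name and the demo helper are my assumptions.

```c
#include <assert.h>

/* Linear search, front to back. Returns the index of the first
   element equal to key, or -1 if no element matches. */
int lsearch_int(int key, int array[], int size)
{
    for (int i = 0; i < size; i++) {
        if (array[i] == key)    /* pointer arithmetic, an implied
                                   dereference, and a 4-byte compare */
            return i;
    }
    return -1;                  /* searched everything, found nothing */
}

/* Hypothetical demo helper, for illustration only. */
void lsearch_int_demo(void)
{
    int numbers[] = {4, 2, 3, 7, 11, 6};
    assert(lsearch_int(7, numbers, 6) == 3);
    assert(lsearch_int(5, numbers, 6) == -1);
}
```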

Now, I know that you think that that's all reasonable code, and you wish that all examples were like that, but they're not. The only reason I'm putting this up here is because I want to frame the implementation with the new vocabulary that's now accessible to us because of what we talked about for the last three days. This for loop, that right there, that right there, that's the same whether it’s int specific or generic.

There's a remarkable amount of stuff going on in that line right there. There's pointer arithmetic. There is basically an implied asterisk, that comes with the square brackets. Does that make sense? There is the double equals that actually does a bitwise comparison of the two 4-byte figures to figure out whether they are equal. Okay, does that make sense?

So if I want to go generic here and I don't want to engineer it to just deal with ints, that means that I have to pass in more information, more variables than I'm actually passing in right here. When that ( i ) is placed right there, let’s say it evaluates to three. It knows to go from the base address of the array, plus three times the size of int, right. Okay? And that's how it identifies the base address of the thing that should be compared to key on that particular iteration.

If I make this a void *, then all of a sudden, I lose the implicit pointer arithmetic that comes with array notation. In fact, you can't use array notation on a void * for the same reasons you can't dereference it. Okay, there is no size information that accompanies it. So this is what – and I also lose the ability to compare two integers. When I know they are integers, it’s enough to just look at the bit patterns in the space that we call ints.

When we don't know what they are, we don't necessarily know how to compare them. Maybe double equals works, okay? Probably not necessarily – certainly, it won't with strings. So if I want to write the generic version of this, I will have a better time doing it if I frame it in terms of a generic blob of memory. This is going to be the array that is linearly searched in a generic manner.

In order for me to advance from ( i = 0) to (i = 1) and know where the 1th element begins, I'm going to have to pass in some size information about how big the elements are, so that I can manually compute what the addresses are. Does that make sense to people? Okay. I also am going to have to pass in a comparison function so that I know how to compare the key to the material that resides in what is just taken to be the ( i ) entry in the array. Okay, I can’t use double equals very easily.

So what I want to do is I want to write a function that returns the address within the array of the matching element. Just a generic picture right here. Okay? Maybe it's the case that this is the key and I have no idea what it is except that it's as wide as these boxes are right here, okay? I'm going to specify the key by address. I'm going to specify the array by address. I’m going to tell myself how many figures are in here.

I'm also going to tell myself how wide these individual boxes are, so I know how many times to for loop, and how far in memory to advance with each iteration, okay? I also have to be able to compare this pointer to that pointer somehow, or not the pointers themselves, but the material that’s at those pointers, okay, or at those addresses so that I can decide whether this matches this and I should return that value.

If on the next iteration it doesn't match, I have to be able to compare this value to that value. Or rather, the things that are at those addresses to see whether or not, there is a match. I have to rely on a comparison function to do that for me. Okay? So this is the prototype for the function I want to write: void *lsearch, taking void *key and void *base. I’ll call it base because the documentation for functions like this, actually calls it base. It just means the base of the array, okay?

I want to pass in int n. That's the number of elements that the client knows is in the array to be searched. Then int elemSize, and that’s all I'm passing for the moment. I have to pass one more thing, but I'll do it in a second, okay? I basically want to do the same thing up there, but I want to be able to be prepared to return an address as opposed to an integer, okay. And I have to implement this thing generically.

What I want to do is I want to be prepared to loop this many times. I don't think we're going to argue with that. Okay, with each iteration, what I have to do is I have to compute the ith address or the address of the ith element. This is how you do this. This is actually something new. I want to set a void *. I'll call it elemAddr. That's going to be a variable that is bound to the tail of that or the tail of that or the tail of any one of these things, depending on how far I get, okay? Can I do this?

Your heart is in the right place if you try to do that. You are trying to jump forward i elements and you just want the compiler to know how big those things are, okay. It doesn’t know how big they are. So you say, “Okay, well, I will tell it explicitly how much to scale each offset by,” so at least numerically, that is correct, okay?

Take whatever the base address is and march forward this many quantum elements where the elements are just basically identified by size, okay? This is still pointer arithmetic, okay, and it’s against a void *, so the compiler doesn’t care, or most compilers don’t care, that you know that numerically this should work out.

I’m trying to manually synthesize the address of the ith element but from an address standpoint it’s like, “No, I don’t care whether you are being smart over here. You are telling me to do pointer arithmetic against a typeless pointer so I don’t know how to interpret this and I’m not just going to assume that you are doing normal math here.” So the trick is to do this. It’s totally a hack, but it’s a hack that is used every day in generic C programming. I want to take base and I want to cast it to be a char * and after I do that add i times the elem size. It is called the void * hack, at least in 107 circles it is. That’s one full expression.

What I am saying is seduce this base, or whatever number it evaluates to, to think that it’s pointing to these 1-byte characters, okay? Then if I do pointer arithmetic against a char *, then pointer math and regular math are exactly the same thing. Does that make sense to people? So this right here would end up giving you the delta between the beginning of the array and the element that is of interest to you for that iteration. Does that make sense? Okay. This overall thing is of type char *, but when it is assigned to a void *, that’s a fine direction to go in; it’s going from more specific to less specific and it doesn’t mind that. Okay? Make sense? Yes/No? Does not make sense.

You understand how this is the quantum number of bytes between the beginning of the array and the element of interest? You understand that that offset applies to the base address of the entire array itself. The only reason I am doing this is to kind of get the compiler to work with me. It won’t let you do anything like pointer arithmetic against a void *. You’re even correcting, by scaling up by elem size; you are actually doing some of its work for it. Up here, this i right there is implicitly multiplied by size of int for you. Does that make sense? You have to nod or shake your head. It does not make sense.

Okay, this array is the base address; array of i is basically equivalent to * of (array + i), because this is a pointer and that’s an integer. It multiplies this behind the scenes by size of int. Okay? It will not do that in a void * setting because it doesn’t know what the implicit multiplication factor should be. So what I am doing here is I’m brute force doing the pointer math for the compiler, okay? I’m saying, “I’m dealing with void *s, I don’t expect you to do very much for me on void *s so let me just cast it to be a char * so that I can do normal math against a pointer.”

Some people cast these to be unsigned longs. I just happen to see char * more often than I see unsigned long, but they are both 4-byte figures where you can do normal math on them. It’s incidentally normal math with char *s because characters are 1-byte wide, so the scaling factor is just 1, okay?
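
The cast described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name elemAddress is hypothetical, and the parameter names follow the base, i, and elem size used in the lecture.

```c
#include <stddef.h>

/* Hypothetical helper: address of element i in an array whose
   elements are each elemSize bytes wide. */
void *elemAddress(void *base, int i, int elemSize)
{
    /* Casting to char * makes "+ n" advance exactly n bytes,
       because sizeof(char) is 1. */
    return (char *)base + (size_t)i * (size_t)elemSize;
}
```

For an int array called array, elemAddress(array, 2, sizeof(int)) should evaluate to the same address as &array[2].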

This is the number of bytes between the front of the array and the element that you are interested in. You assign it to elem address so that on this iteration, elem address, something like that or something like this can be passed as a second argument where this is the first argument to some comparison function. Okay, and it comes back with a yes or a no as to whether they match. Does that make sense?

Now if I write this this way, then the best I can do, if I don’t pass in the comparison function, is a generic memory comparison of the elem-sized bytes that reside at the two addresses to be compared. You could do this; this is one line. You could do this: if it’s the case that memcmp of key, elem address, and elem size double-equals zero, you can go ahead and return elem address.

Now, the one thing you have not seen before, I’m assuming, is this memcmp function. It’s like string comparison but it’s not dealing with characters specifically. It compares this many bytes at that address to this many bytes at that address and if they are a dead match, it returns zero. It would otherwise return a positive number or a negative number depending on whether or not the first non-matching bytes differ in a negative direction or a positive direction but we’re only interested in a yes or a no.

So that’s why we do these double equals right here. Does that make sense? If you wanted to, I don’t recommend it but you could do – if you hated double equals comparing integers, you could pass the address of your two integers here, passing size of int right there and it would do exactly the same thing that i double equals j does. Okay?

If you really just want to compare memory patterns and see if it’s a dead match, then you can use this right here. This is going to work for Booleans, for shorts, for characters, for longs, for ints, for doubles and floats because everything resides directly in the primary rectangle. Okay? It will not work very well for character pointers or for C-strings. It will not work very well for structs that have pointers inside. Okay?
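
A minimal sketch of that distinction, assuming a hypothetical helper called sameBytes that wraps the memcmp test. It reports a match for two ints holding the same value, but for two char * variables it compares the pointer values themselves, not the characters they point to.

```c
#include <string.h>

/* Hypothetical helper: 1 when the n bytes at a and b are identical,
   0 otherwise. */
int sameBytes(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

With int i = 7 and int j = 7, sameBytes(&i, &j, sizeof(int)) is 1. With two separate character arrays that both hold "Eb" and a char * pointing at each, sameBytes on the addresses of the two pointers is 0 even though strcmp on the strings returns 0, because the two pointers hold different addresses.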

Those pointers point to the material that actually should be involved in the comparison. Does that make sense? So this is something that could work if you didn’t want to deal with function pointers, but you really should deal with function pointers. So let me just write this a second time. Then you go ahead and return null if things don’t work out. Okay, you give it “n” opportunities to find a match and if it fails you just return zero as a sentinel, saying I couldn’t find anything, okay?
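
Put together, the memcmp-based version described above might look like the following sketch. The name lsearchBytes is hypothetical (chosen so it does not clash with any lsearch a system header may declare); the parameters follow the key, base, n, and elem size from the lecture.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name. Linear search that compares raw bytes only. */
void *lsearchBytes(void *key, void *base, int n, int elemSize)
{
    for (int i = 0; i < n; i++) {
        /* The char * cast lets the offset be counted in bytes. */
        void *elemAddr = (char *)base + i * elemSize;
        if (memcmp(key, elemAddr, elemSize) == 0)
            return elemAddr;   /* address of the first match */
    }
    return NULL;               /* sentinel: no match in n tries */
}
```

As the lecture notes, this only suits element types whose entire value lives directly in the array slot.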

Before I let you go, let me write the prototype for the function that we are going to write at the beginning of Friday: void *lsearch; the l is still for linear search. I want to pass in void *key. I want to pass in void *base. I want to pass in int n. I want to pass in int elemSize, and then I want to pass in the address of some function that is capable of comparing the elements that I know them to be: int (*cmpfn). I’m used to asterisks there. You don’t actually need them if you have a parenthesis, but I like the asterisk there to remind me that it’s a function pointer. And it takes two arguments. It takes a void * and another void *, and that’s the entire prototype of that function.

Okay, now of course that is all supposed to be one line. That means the fifth parameter to any call to lsearch absolutely has to be a function that takes two void *s, say void *s like this and this, okay? And somehow translates that to a zero, positive 1 or negative 1, okay? It has to basically have the same prototype that memcmp has, okay? When we write this next time, I am going to go through the implementation.

It really just more or less replaces memcmp with cmpfn right there and it does what we want. Okay, but it is interesting to see how you use it as a client and search an array of integers using the generic version. How you search an array of C-strings using the generic versions. It’s just very complicated to understand the first time you see it, okay? That’s what we’ll focus on on Friday.


Lecture 4 Programming Paradigms

Programming Assignment 2 Six Degrees Instructions
10 pages

Programming Assignment 2 Six Degrees FAQ
1 page

Topics: Generic Lsearch - Prototype, Comparison Function, Implementation, Casting Void*S to Char*S to Compute Byte Offsets, Client Use of Generic Lsearch, Example of a Comparison Function for Integers, More Complicated Data Types and Lsearch- Example Using C-Strings, Comparison Function for Two C-Strings, With Arguments that Represent Char**S, Comparison Functions Where the Key is a Different Type than the Second Argument, Using a Pointer to a Struct as a Key in Order to Access Additional Data in a Comparison Function, Functions Vs. Methods, C Data Structures - Implementing a Non-Generic Stack of Integers, C Stack Interface, Implementation, Preallocating Memory, Client Use of C Stack, State of Internal Memory of the Stack, Growth of Memory when Stack Becomes Too Large, Implementation of Stacknew, Asserts



Instructor (Jerry Cain):Hey, everyone, welcome. I don't have any handouts for you today. You're all crankin' on Assignment 1, which was intended to be very short through Sunday night. The first real assignment went out Wednesday. That's due next Thursday evening. And at least until the mid-term, I'm gonna establish this Wednesday to next Thursday schedule with all the assignments so there's some reliability as to how the workload ebbs and flows.

When I left you last time, I was probably about 60 percent of the way through my lsearch implementation. I'm trying to go from type specific to generic, but I'm trying to do that in the C language.

So this is what I wrote last time.


void *lsearch


And let's see if I can write the parameters out a little bit more neatly this time.

void *key


void *base

int n is the length of that array

int elemSize is the size of the elements

And that's technically all the information that lsearch needs in order to figure out where the boundaries are between neighboring elements.

The fifth parameter is the one I want to focus on for the next 20 minutes. It has to have this as a prototype. I like the asterisk; we don't need it right there but I like to keep it there, and then I take two void *'s. I don't need to provide parameter names here because I'm not implementing this function here.

The basic algorithm for a linear search from front to back, that doesn't change. It's just the fact that we're trying to present an implementation that doesn't care about any specific one data type.

So I want to do this: for (int i = 0; i < n; i++). With each iteration, I want to manually compute the address of the i'th element. I can certainly do that in terms of base; elemSize is the quantum distance to move with each hop, and then i obviously tells me which element I'm interested in. Internally, I want to do this:


void *elemAddr


This is the thing that's going to be compared against that key right there to figure out whether or not we have a match.


This is equal to numerically:


base plus i times elemSize.


But we mentioned last time that this is strictly pointer arithmetic against a typeless pointer. No, I'm sorry, the pointer has a data type; it's of type void *, so it doesn't know what it's pointing to. Several people have suggested, or asked, why they just didn't make it default to normal math when this is a void *. The specification of C just said I don't want to allow pointer arithmetic by default against a void *, because there was a clear rule for what pointer arithmetic means when this is strongly typed.

When it's weakly typed with a void *, very generic, I'm just pointing to anything and I have no idea what, you can't do this. The hack, and it really is a hack, but it's a well-known hack, is to seduce base into behaving like a character pointer just long enough to actually get a number out of this expression.

Drag the base here, say you're pointing to characters. Do technically pointer arithmetic against the character pointer. This, as an expression, is an integer. It's technically multiplied by size of char, but that's one. So this ends up being a char * that happens to point to the boundary between the (i - 1)th and the i'th element. I assign it to a void *. You do not have to cast the overall thing to a void * if you don't want to because this is a more general pointer; it's willing to take on any pointer type.


After you do this, you use the comparison function written by the client that knows how to compare the things that reside at these addresses: pass in key, pass in elemAddr. And if that comes back with a match of zero, then go ahead and return (I want to return the pointer), so go ahead and return elemAddr. This ends the entire for loop. And if I get to the end and I have nothing to return, I'll just return null as a sentinel that nothing worked out.
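
The implementation just walked through can be sketched as follows. The name genericLsearch is hypothetical, and charCmp is a stand-in client callback for a char array that is included only so the sketch can be exercised; neither name comes from the lecture.

```c
#include <stddef.h>

/* Hypothetical name. Generic linear search driven by a client callback. */
void *genericLsearch(void *key, void *base, int n, int elemSize,
                     int (*cmpfn)(void *, void *))
{
    for (int i = 0; i < n; i++) {
        /* The char * cast makes "+ i * elemSize" count bytes. */
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(key, elemAddr) == 0)
            return elemAddr;   /* address of the first match */
    }
    return NULL;               /* sentinel: nothing matched */
}

/* Stand-in callback: both arguments are really char *. */
int charCmp(void *vp1, void *vp2)
{
    return *(char *)vp1 - *(char *)vp2;
}
```

The key is always passed as the first argument to the callback and the computed element address as the second, which matters once the two are allowed to be different types.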

This replaces the double-equals that sits in between two integers from the integer version we wrote in the middle of last lecture. Double-equals between two strong types that are atomic, it knows how to do comparison. In almost all cases, it just does a bitwise comparison and can come out with a -1, or a +1, or a 0.

When you go generic on the C compiler, you have to say that I know how to compare the elements because I know, even though this is generic code, I know what type of array I'm searching. This code does not. So you have to pass in a little bit of a callback, or a hook, to tell the implementation how it should be comparing your elements.

This is easy to understand. It's easy to just look at this and understand what's going on because you know what linear search is; that's not the hard part. The hard part is getting all the pointer math correct in the char * trick, and actually aligning these things up properly, and invoking the comparison function properly.

Using this as a client is at least as difficult as understanding this code. If I want to just think in terms of the int domain, and I have the following: int array is equal to (this is a shorthand way of initializing an array), int size is equal to, I'll just hard code it as 6. If I want to search for the number seven, I actually have to do the following: int number = 7. I have to set aside space for the key that I'm interested in because I have to pass the address of that thing as the very first argument to the lsearch call. Does that make sense?

You know that this is laid out as an array of length 6. The 7 resides there. I'm passing that and that width and 6 to the lsearch routine. I'm hoping that it returns that right there. I want to find the place where the very first seven in the array resides.

int *found = lsearch (this is how I would call lsearch): & of number; array (no & is needed, there's an implicit & because it's really & of array of zero); pass in 6; pass in size of int. That at compile time evaluates to 4, at least in our world. And then I have to provide a comparison function. I want to write a comparison function that's dedicated to comparing integers.

So I'm gonna write that right now: intCmp. Now, this would have to be declared as a function before I call this code right here, even though I'm happening to implement it afterwards. If it's the case that found equals-equals null, then you're sad; otherwise you're happy. Does that make sense to people?

Let me write the comparison function so we can understand why it has to take the form that it does. intCmp, if it's gonna actually compile and match this as a prototype, absolutely has to take two void *'s and return an integer. That is the class of function that's accepted as the fifth parameter right there.

You may ask, well, can I just write a comparison function that takes two int *’s and returns an int? And the answer is you could, but you'd have to cast it to be this type right here. It turns out that these are all pointers of the same size. You would pass them then as int *'s, and they would be absorbed as void *'s. But a much better thing to do is to write the comparison function to match that prototype exactly. The implementation of that is a little clunky, but it doesn't surprise you; it just has a lot of asterisks involved.

void *elem1

void *elem2

Let's write the seven right here; this is the thing called number. On an arbitrary iteration, it may pass this int as the first argument to the comparison function. This right here is being invoked right there. It's gonna pass in the address of that one isolated seven right there every single time. The second parameter's gonna get that, and if it fails then that, and if it fails then that, etc., until it runs out of space.

I have to return a -1, or a +1, or a 0 depending on whether they match or not. Also, because I am writing this function specifically to make this call, this constrains the prototype to take two void *'s, but I know that they're really int *'s. So because I'm writing that code as a client, I can reinterpret the void *'s to be the int *'s that they really are. So I will do this: int *ip1, int *ip2, and I will just set them equal to elem1 and elem2. It turns out in a pure C compiler you do not need to do a cast there; it just understands that the cast is implicit. So I have these local variables, ip1 and ip2, that not only point to this and that right there, but they actually understand them to be four-byte quantities to be interpreted as integers. Does that make sense?

So all I have to do is return *ip1 - *ip2. I'm relaxing a little bit on the return type; I'm letting zero mean a match. And of course, if that difference is zero then they're the same number. -1 and +1, I could constrain it to be that. I just want it to really be a negative number or a positive number to reflect the delta between the two. Does that make sense to people?

So if you understand this, great. If you understand this, even better. I'm sure most of you understand this even if it's the first time you've seen this type of code. Once we actually understand how all of this stuff works you're gonna be very, very happy. It's a little hard to understand the very first time you see it.
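
The integer comparison function described above can be sketched like this. The name intCmp and the parameter names elem1 and elem2 are this sketch's renderings of what is written on the board, so treat them as assumptions.

```c
/* Client-written callback for ints; its signature matches the fifth
   parameter of lsearch: int (*)(void *, void *). */
int intCmp(void *elem1, void *elem2)
{
    int *ip1 = elem1;   /* C converts void * to int * implicitly */
    int *ip2 = elem2;
    /* 0 on a match, otherwise negative or positive. The subtraction
       can overflow for values of very large magnitude. */
    return *ip1 - *ip2;
}
```

With the lsearch from the lecture in scope, the client call would then read something like int *found = lsearch(&number, array, 6, sizeof(int), intCmp); followed by a check of found against NULL.
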
But you have to recognize that this is not exactly the most elegant way to support generics; it's just the best that C, with its specification that was more or less defined 35 years ago, can actually do. All the other languages you've ever heard of are so much younger that they've learned from C's mistakes and they have better solutions for supporting generics. There are some pluses to this. It's very fast. You only use one copy of the code, ever, to do all of your lsearching. The template approach, it's more type safe. You get more information at compile time, but you get code bloat because you've got one instance of that lsearch algorithm for every single data type you ever searched for. Does that make sense?

It’s easier to get this right because you’re not dealing with atomic types that are themselves pointers. We have integers right here. This gets a lot more complicated when you start dealing with the problem of lsearching an array of C-strings. Okay. So you’re going to have an array of char *’s, and you’re gonna have to search for a particular char * to see whether or not you have a match.

These are the int boards. I want to deal with this setup right here. I have a small array 1, 2, 3, 4. And let’s say I have an array of C-strings. Let’s just assume that it’s initialized this way. And I have an array of five little notes there. And I want to search the array using lsearch for an E-flat. So here’s my key that I’m searching for; it points to an E-flat. I should emphasize that these are really character arrays that are null-terminated. Same thing with this. This right here is a character, character, character, character. This is a character array. That means that’s a char *; char *, char *, char *. The address of the array right there, the arrow I’ve just drawn in, is technically of type char * *. How is lsearch gonna absorb that? It’s going to absorb it as a void *.
The only way it’s gonna be able to compute that address, and that address, and that address as part of the linear search is because we’re also gonna communicate the size of a char *, so it can manually compute the addresses of all those boundaries. The comparison function that needs to be written in order to do this has to be willing to accept addresses of that type and that type right there.

This is where things can get confusing because you can kinda drift back and say, well, everything’s a pointer so why does it matter that I pass in this as opposed to this? You’re gonna see, when we write the comparison function, that the number of hops from the tail of the arrow that’s passed in really matters. If lsearch passes this type of pointer to your comparison function then you really are two hops away from the actual characters that are compared to one another. Does that make sense to people? You may ask, well, why doesn’t lsearch just pass these pointers in to the comparison function? The answer is that lsearch has no idea that those things are really pointers. The only thing it knows is that they happen to be four bytes of information. Make sense?

Let me declare this: char *notes[] is equal to, I’ll write them as string constants, A-flat, F-sharp, B, then G-flat, and then an isolated D. I can talk about how these things are stored in a little bit. They’re not in the heap; they’re actually global variables that happen to be constant. It’s like normal global variables, except they happen to be character arrays that reside up there, and these are replaced at load time with the base addresses of the A, the F, and the D.

char *favoriteNote, as if I have a favorite note, it is E-flat.

Let me be very clear about this picture again – actually, let me draw it again; favorite note points to E-flat. The actual array happens to be of length 5. It points to strings A-flat, F-sharp, B, G-flat, and D. It’s a cleaner picture. I want to search the array for my favorite note E-flat. The way you have to do this:

char **found

Now, that alone is enough of a headache: to understand why it’s a char * * as opposed to a char *. But we’ll get to that in a second. This is the way you would call lsearch. I have to pass in the address of my favorite note (I don’t have to but I’m going to, I’ll explain why in a second), & favorite note. I’m gonna pass in notes; that’s the name of the array. Think about the data type of notes. Notes is the & of the zeroth element. The zeroth element is a char *; it holds the base address of that capital A. Notes is synonymous with that value right there. So even though it’s being absorbed by lsearch as a void *, because it was written generically so that’s the best it can do, we know that it’s really a char * *.

I have five of these notes; pass in size of char *. Why char * as opposed to char * *? Because I’m interested in the actual width of these boxes so that lsearch can actually compute the boundaries between elements. And then, finally, with capital S and capital C, I’m just contriving the name of some function called StrCmp. There are actually two versions: StrCmp, the one I’m writing, and strcmp, the one that’s built into the C library, but the one that’s in the C library doesn’t have capital letters there.

The reason I’m passing in the & here is because I want the true data type of the key that’s held by lsearch to really be of the same type as all the pointers that are computed manually as part of the lsearch implementation. Does that make sense? If I know that this, and that, and that, and that, and that are all really char * *’s, it just makes life a little bit easier, if you have some symmetry inside code that’s otherwise very complicated, to make sure that the key that’s being compared against those five arrows is of the same data type. It doesn’t have to be. I’ll get to that in a second. But I’m writing it this way. So I pass in the & of favorite note. So I get the & of the tail of that arrow that points to E-flat, two hops away from the capital E.
I have to write the StrCmp function. Even though lsearch returns a void *, it either returns a null, which we can check just like we did up there, or it returns one of these five arrows. Now, because E-flat isn’t in there it’s gonna return null. But if I had asked for the matching pointer to a G-flat, it would return that. I know that they’re really char *’s in here. Lsearch doesn’t but I do. So when I know it’s returning this type of pointer, I know that it’s truly of type pointer to char *, or char * *. Make sense?


Student:[Inaudible] the same as [inaudible] char * *?

Instructor (Jerry Cain):Yeah. The question is, is the size of char * the same as the size of char * *? The answer is, yes, because all pointers, at least in our world, are four bytes, and they’re always the same size in any given system, and any given executable. You asked, I’m sure, just to be clear that they’re both the same size, but you really do want this for readability purposes to be the true data type that’s held inside the boxes of the array.

I could, if I wanted to, put 17 *’s there, and it would still work. That doesn’t mean the code is the best way we could write it. Does that make sense?


Instructor (Jerry Cain):You don’t have to. Right now, just for symmetry purposes, I’m making sure that the key and the addresses of all the elements in the array are of the same true type. I’ll explain how we didn’t have to bother with the & right here. You only can get away with that if you really understand what’s going on. And I’m just not assuming that that’s the case in the first 15 minutes of the example. But after I write it the one way, I’ll explain how we could have gotten rid of that &. Okay. Let me get rid of these asterisks.

Okay. So I have to write StrCmp. I’m gonna do it over on this board.


int StrCmp takes two void *’s. I’ll come up with better names this time: void *vp1 and void *vp2. The first one is always gonna be that address right there because that’s what I passed in, & of favorite note. I know it’s actually of type char * *. On an arbitrary iteration, it might pass in the address of that right there.

So now that I’ve caused the implementation of lsearch, that right there, to just momentarily jump back to my type-safe code, the signature isn’t type safe, but the code that’s inside can become type-safe if I actually cast things properly. So I’m gonna go ahead and do this:


char *s1 (for string 1) is equal to *(char **)vp1, and char *s2 is equal to *(char **)vp2


Now, why does that look the way it does? I’m casting vp1 and vp2 to be of the type that I know that they really are, this type and that type right there. They’re two hops away from bona fide characters. After I do that, I dereference them once so that this as a value, and maybe this as a value, are sitting in local variables called s1 and s2.

The reason I like that is because there is a built-in function as part of the C library that is completely in tune with the fact that the notion of a string is supported as character arrays that happen to be null-terminated, and that we pass those strings around by the address of the first character. This is the address of the capital E. This is the address of the capital G right there.

What I can do is I can pass the buck to this built-in function, strcmp, with a lower case s and a lower case c: s1, s2. It takes two char *’s, and it knows how to do the brute force comparison of characters one after another; as long as they match, it continues. If it ever finds two characters that don’t match then it knows that it can’t return 0; it just returns the difference between the two ASCII values of the non-matching characters. That even applies if you hit a backslash 0 in one before you hit a backslash 0 in the other one. The delta is still what’s returned. Does that make sense to people?
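
The StrCmp callback described above can be sketched as follows. It assumes both arguments are really char * * values, as set up in the lecture; StrCmp is the lecture's contrived name and strcmp is the standard C library function.

```c
#include <string.h>

/* Callback for an array of C-strings. Both arguments are really
   char **, so each is two hops away from actual characters. */
int StrCmp(void *vp1, void *vp2)
{
    char *s1 = *(char **)vp1;   /* cast, then one hop to reach the char * */
    char *s2 = *(char **)vp2;
    return strcmp(s1, s2);      /* 0 exactly when the strings match */
}
```

Given char *notes[] holding five strings and char *favoriteNote, the call StrCmp(&favoriteNote, &notes[3]) compares the key string against the fourth note.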

Student:[Inaudible] char *?

Instructor (Jerry Cain):Why didn’t I just cast it to be a char *?


Instructor (Jerry Cain):That’s actually the question everybody asks right at this minute, the last 18 times I’ve taught the lecture.

So this right here is saying, is recognizing, I’m recognizing that the vp1 that’s being passed to me is really two hops away from actual characters. So that’s why the double * is really the right thing there. And then I want to get two values that are just one hop away from the real data because that’s what the built-in strcmp wants. strcmp, just like my intCmp function, returns 0, -1 or +1, so it happens to return the value that I’m interested in.

So you’re questioning why a double * here and then dereference once when I might be able to just get rid of those two things and put char *? Is that what you’re asking?


Instructor (Jerry Cain):Okay. The problem is that the * in front of the open paren, as opposed to the other two *’s on each line, really is an instruction to hop forward once in memory and do a dereference. If I pass this as vp2, I say that you’re not pointing to a generic pointer, you’re actually pointing to a char *. That’s what the char * * cast does. And then when I dereference it once I do that. Given the way I’ve set up the call right there, if I were to do this, this would take this right here and it would assume that the actual material right there are actual characters. Does that make sense?

Student:Actually, I understand that [inaudible].

Instructor (Jerry Cain):This right here?


Instructor (Jerry Cain):Well, that’s actually the part that does the hop and goes from here to there right there. You’re just dereferencing a pointer. Does that make sense?


Instructor (Jerry Cain):Was there another question flying up somewhere?

Student:Why [inaudible] char * * [inaudible]?

Instructor (Jerry Cain):Well, what’s the alternative?

Student:Just like referencing the [inaudible].

Instructor (Jerry Cain):You actually could do that. That’s where it confuses matters a little bit. But a void * you can’t dereference because it doesn’t know what it’s pointing to. A void * * knows that it’s pointing to a void *. Does that make sense to people?

I actually want to bring it into the char * domain as quickly as possible because then I really know sooner than later that I’m actually dealing with strings. Otherwise, I’m just leveraging off of my understanding of memory in a way that might not be clear to the person reading the code. Other questions?


Now, somebody asked about this right here. My implementation of lsearch up here, it’s very careful to pass in key as the first of the two parameters to every call of the comparison function. Does that make sense?

Somebody asked what happens if I forget the & there. Well, my callback function still interprets whatever pointer is passed in as a char * *, so rather than this being passed as the first argument to the comparison function every time, if I pass this in, it would still do a dereference after it casts this to be a char * *. So that would mean momentarily it’s pretending that the E, the b, the backslash 0, and the mystery character that’s right there actually represent a char *, and then it would pass that to strcmp. That would not be good because it would jump to the E-flat mystery address in memory and just assume that there are really characters there.

However, not that I like this for this example, but if you know what you’re doing and you want to pass this in right here, you just don’t want to deal with the overhead of a dereference when you know you don’t need to, you could pass this in. And you could recognize that the first argument that’s being passed in is actually one hop away from the characters, and the second one is actually two hops away from the characters.
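
A sketch of that asymmetric variant, assuming the key is passed without the & (so it arrives as a char *) while the elements still arrive as char * *. The name StrCmpKeyOneHop is hypothetical.

```c
#include <string.h>

/* Hypothetical name. The key is one hop from its characters;
   the array element is two hops away. */
int StrCmpKeyOneHop(void *vp1, void *vp2)
{
    char *s1 = (char *)vp1;     /* key: already the address of the chars */
    char *s2 = *(char **)vp2;   /* element: dereference once */
    return strcmp(s1, s2);
}
```

This only works because the search always passes the key as the first argument and the element address as the second; swapping the roles would misread memory.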


Instructor (Jerry Cain):Well, I can say the way I wrote it first is the way it’s typically written. Because of the symmetry, I think coders, I don’t know if they like to see it, I think they’re just in the habit of only dealing with comparison functions that really deal with the same incoming data type. And that’s not the case if one’s a char * for real and one’s a char * * for real.

So it is more common for you to put an & right there, and to do this just so that the first line and the second line kinda have the same structure.

Now, for Assignment 2 search certainly comes up. As opposed to all of these examples, you know that there’s some sorted flavor to the arrays that you’re searching there. If you haven’t read Assignment 2, again, I’ll try to be as generic as possible in my description. But you basically have the opportunity to binary search as opposed to linear search for Assignment 2.

There’s a built in function called bsearch. It turns out that there’s a built-in function called lsearch, as well. It’s not technically standard, but almost all compilers provide it, at least on UNIX systems. I’m gonna want you to use the generic bsearch algorithm, which has more or less the same prototype as lsearch right here, that’s why I chose the prototype the way I did there, and it just does a generic binary search. You can implement it again yourself. If you already did then don’t go back and call bsearch that’s built-in. But I’d actually prefer you to use the built-in just so you learn how to use it.

This is the prototype for that built-in: void * is the return type. It’s called bsearch, for, naturally, binary search. It takes a void * called key, it takes a void * called base, it takes an int, I think it’s called len for length. I actually like n better, though; n always means size of an array. int elemSize, and then it takes the comparison function that takes two void *’s. The algorithm, in many ways the pointer mechanics are exactly the same as they are up there; the only part that’s different is that it kinda does this binary search to figure out what index to probe next. It assumes that the data is in sorted order.
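
A small sketch of calling the built-in on a sorted int array. One detail differs from the board version: the declaration in the standard header stdlib.h uses const void * for the pointers and size_t for the two sizes, so the callback here takes const void * arguments. The names intCmpConst and findSorted are hypothetical.

```c
#include <stdlib.h>

/* Callback with the exact signature the standard bsearch expects. */
int intCmpConst(const void *vp1, const void *vp2)
{
    int a = *(const int *)vp1;
    int b = *(const int *)vp2;
    return (a > b) - (a < b);   /* -1, 0, or +1 without overflow */
}

/* Hypothetical wrapper: address of target in a sorted array, or NULL. */
int *findSorted(int *sorted, size_t n, int target)
{
    return bsearch(&target, sorted, n, sizeof(int), intCmpConst);
}
```

The array must already be in the order the callback defines; otherwise the binary search may probe the wrong half and miss an element that is present.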

Now, I am going to say this, and you have to recognize it even though it doesn’t sound very deep and insightful, it is. If you want to use this bsearch function for Assignment 2, and you want to do it as elegantly as possible, you have to recognize, kind of in sync with what I did over here when I erased the & right here, that you can pass in the address of anything you want to, provided the comparison function knows that the first argument is gonna be the address of that something. Does that make sense to people?

With the & I pass in a char * *, without it I pass in a char *. I could have constructed a record and put four pieces of information in there, passed in the & of it, and then I could have cast the address that comes in as the first argument to be the address of that type of struct. The reason I’m saying that is because you’re gonna want to do exactly that for Assignment 2. You’re gonna need more than one piece of information to be available to the implementation of what you pass in right here.
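The idea of passing the address of a record as the key can be sketched as follows. Everything here is a made-up illustration rather than the Assignment 2 design: the searchKey struct, the blob of NUL-terminated names, and the array of int offsets into that blob are all assumptions. The point is only that the comparison function treats its first argument as the address of the key record and its second as the address of an array element.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical key record: the name being searched for, plus the base
   address that the offsets in the array are relative to. */
typedef struct {
    const char *name;
    const char *blob;
} searchKey;

/* bsearch passes the key as the first argument and the address of an
   array element as the second, so the two need not share a type. */
int CompareKeyToOffset(const void *keyAddr, const void *elemAddr)
{
    const searchKey *key = keyAddr;
    int offset = *(const int *)elemAddr;
    return strcmp(key->name, key->blob + offset);
}

/* Returns the index of name within offsets, or -1 if it is not there.
   The offsets must be ordered so that the names they refer to are sorted. */
int FindName(const char *name, const char *blob, const int *offsets, size_t n)
{
    searchKey key = { name, blob };
    const int *found = bsearch(&key, offsets, n, sizeof(int), CompareKeyToOffset);
    return found == NULL ? -1 : (int)(found - offsets);
}
```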

As far as this is concerned, I’ve never said this in lecture before, but I’m glad I’m remembering right now, it has to truly be an actual function. CS106b and 106x, I don’t want to say they’re careless about, but they’re just not concerned about it at the time. They use the word function everywhere for any block of code that takes parameters. When I say function, I’m talking about this object-oriented-less unit, which is just some block of code that gets called as a function that has no object or class declaration around it.

When I’m talking about the type of member functions or functions that are inside classes, I don’t refer to them as functions, I refer to them as methods. The difference between a function and a method: they look very similar, except that methods actually have the address of the relevant object lying around via this invisible parameter called this.

The type of function that gets passed right here has to be either a global function that has nothing to do with the class or it has to be a method inside a class that’s declared as static. Which means that it does not have any this pointer passed around on your behalf behind the scenes.

I’ll probably send them an email just about that one point. Because if there are two or three problems that everybody has with Assignment 2, one of them is related to this thing right here. Do you guys know about the this pointer from 106b and 106x? I think they actually used this even more in 106a, when they talked about Java, and it seems to come up more there.

C++ methods, those member functions that are defined in classes, normally pass around the address of the receiving object via an invisible parameter called this. And if you need to, you don’t very often have to, but if you need to you can actually refer to the keyword this inside the implementation of any method, and it just evaluates to the address of the object that’s being manipulated. That’s what makes a method different than a regular function. Regular functions have nothing to do with objects so there’s no invisible this pointer being passed around. You have to pass one of those object-oriented-less normal functions, or the name of one, as the fifth parameter to bsearch.

Student:Why is it that the comp function [inaudible] behind before.

Instructor (Jerry Cain):This right here?


Instructor (Jerry Cain):Because these parentheses were here, it’s clear syntactically that it has to be a function pointer. And until about four years ago the asterisk inside was always required, and now it’s just not. Because the lexers and the [inaudible] know how to just decide if this is a function pointer type.

I like the pointer there, for various reasons, just because that’s how I used them for the first 17 years I coded in C. And then someone went and changed it on me, and I’m like, I don’t care, I want to use it the old way. That’s a very C way of looking at it, too. There’s nothing modern about C, so you shouldn’t adopt any of its modernisms. Any other questions at all?
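One reading of this exchange is that a comparison-function parameter can be written with or without the asterisk. The sketch below declares the same parameter both ways; in C a parameter of function type is adjusted to a pointer to function, so both forms accept the same arguments. CountEqual, CountEqualNoStar, and CompareInts are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Parameter declared with the explicit asterisk. */
int CountEqual(const int *base, size_t n, int key,
               int (*cmp)(const void *, const void *))
{
    size_t i;
    int count = 0;
    for (i = 0; i < n; i++)
        if (cmp(&key, &base[i]) == 0)
            count++;
    return count;
}

/* Same parameter without the asterisk. A parameter of function type is
   adjusted to a pointer to function, so this has the same type as above. */
int CountEqualNoStar(const int *base, size_t n, int key,
                     int cmp(const void *, const void *))
{
    return CountEqual(base, n, key, cmp);
}

/* A plain top-level function that can be passed by name to either one. */
int CompareInts(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
```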

There are a billion little generic algorithms I could write, but I don’t want to focus on these. You now have all the material I think you need to really make progress this weekend on Assignment 2 if you want to. Assignment 2 is definitely a jump up from Assignment 1. Assignment 1 is intended to be all about UNIX, and just whenever you had time to get to it just to learn the UNIX that’s necessary and then code up 20 lines of code to get RSG running. This is the one that really has some real C-isms that are required for the first half of the program. The second half, where you do the search, that’s very C++-ish. Queues and stacks and all that kind of stuff, you’ve seen that before.

What I want to do now is I want to transition from generic algorithms to generic data structures. And you probably have more practice with generics and templates in C++ with the vector, and the queue, and the map, and the stack, and all of those things. I think more often than not, people program in C++ as if it’s C that happens to have objects, and they use the vector and they use the map. They don’t use the ones from 106, they use the ones from the actual built-in STL library. A lot of people code procedurally, and write C functions, and they happen to incidentally use the vector and the map as data structures.

What I want to do is I want to write the same exact thing, support the same type of functionality in some C generics, recognizing that we don’t have references, and we don’t have templates, we don’t even have classes. So we have to do the best we can to imitate the modern functionality that’s offered by C++ and Java, and their templates, using C that has none of it.

So what I want to do is I want to slow down a little bit, and I want to implement a stack data structure. I want to make it int specific just so we have a clear idea as to how the generic should be implemented. But I’m just gonna go up front and say, we’re gonna just implement everything in terms of int ’s so there’s no void * business yet.

Just as in C++, you’ll normally be very aggressive about separating behavior and implementation using the dot-h and the dot-cc scheme. But if you’re in pure C you don’t use dot-cc as an extension, you use dot-c so you know that the file contains pure C code as opposed to C++ code.

So what I want to write here is a stack.h file, and this is how I’m gonna do it. There’s several ways to do it in C, but I want to imitate the way you’re used to it from C++ as much as possible.

There’s no class keyword in C, but there is the struct. We’re gonna use that. There’s no const, there’s no public, and there’s no private. Our compiler actually supports const, but there’s certainly no public and there’s certainly no private. So what I want to do is I want to come as close to the definition of a class right here as possible using just C syntax. And this is how you do that:


typedef struct { (The typedef keyword is required in C; it’s not required in C++.)


And then I want to do the following:


int *elems;


int logicalLen;


int allocLen;



And that is it. I want to call this thing a stack.

Now, in the dot-h file, when I define the struct right there, technically all three fields are exposed so they’re implicitly public. Documentation above the dot-h, at least in Assignment 3 when we start doing this type of stuff, it’s very clear that we’re just exposing these three fields for convenience so people can actually declare stacks as local variables, and the compiler knows that they’re 12 bytes tall but that you should not manipulate these three things at all.

You should just rely on the functions, not methods, but functions right here to manipulate them. And just take this, except for your ability to declare the stack and that you know that it has three fields inside, and think of any struct as a black box where you just aren’t allowed to manipulate the 12 bytes that are inside.

I want to write a constructor function. I want to write this destructor, or disposal function, and then I want to write an is empty function, a pop function, a push function, things like that. So here’s the prototype of the first thing I’m interested in:


void StackNew(stack *s);


All I’m gonna do is I’m gonna pass in or expect the address of some stack that’s already been allocated.

We were talking about the this pointer before. You know how when you call a constructor in a class it has access to that this pointer, it’s because it’s passed in as like the -1’th parameter, or this invisible parameter before everything else. All we’re doing is we’re being very explicit about the fact that the address of the receiving structure is being passed in as the zeroth argument. We have to because that’s what C allows us to do.

I also have this function StackDispose. I want to identify the address of the stack structure that should be disposed. This is gonna be a dynamically allocated array that’s not perfectly sized. So I want to keep track of how much space I have and how much of it I’m using. I also want these methods. Let’s forget about the is empty and the depth, let’s just do it with the real functions.


void StackPush(stack *s, int value);


What stack am I pushing onto? The one identified by address right there. What integer’s getting pushed? This one. And actually we’ll go with an int right here, int StackPop(stack *s). Which stack am I popping off of? The one that’s identified by address right there. I just want to be concerned with those things right here.
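Pulling the pieces above together, a sketch of the interface file might look like the following. The field names elems, logicalLen, and allocLen are an assumption based on how they are spoken in the lecture, and the include guard is an addition. The 12-byte figure mentioned above assumes a 4-byte pointer and 4-byte ints; on a typical 64-bit machine the struct is larger.

```c
/* stack.h: a sketch of the int-specific interface described above. */
#ifndef STACK_H
#define STACK_H

typedef struct {
    int *elems;      /* dynamically allocated array of ints */
    int logicalLen;  /* number of ints currently on the stack */
    int allocLen;    /* number of ints the array has room for */
} stack;

void StackNew(stack *s);              /* plays the role of a constructor */
void StackDispose(stack *s);          /* plays the role of a destructor */
void StackPush(stack *s, int value);  /* s is the explicit "this" */
int StackPop(stack *s);

#endif
```

Because the struct is fully defined in the header, a client can declare a stack as a local variable and pass its address to each function.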

I don’t know that I’m gonna be able to implement very much because I only have about nine minutes left, but I can certainly, without code, just with pictures that serve as pseudocode, give you some sense as to how things are gonna work.

The allocation of a stack, when you do this, conceptually all I want to happen is for me to get space for one of these things right here. That means that this, as a picture, is gonna be set aside. And you know, based on what we’ve talked about in the past, that it’s 12 bytes if the elems field is at the bottom, and that the two integers are stacked on top of it. But as far as the declaration is concerned, it doesn’t actually clear these out, or zero them out like Java does, it just inherits whatever bits happen to reside in the 12 bytes that are overlaid by this new variable.

It’s when I call StackNew that I pass in the address of this. Why does that work, and why do we know it can work? Because we identify the location of my question-mark-holding stack and pass it into a block of code that we’re gonna allow to actually manipulate these three fields. And I’m going to logically do the following:

I’m gonna take the raw space that’s set up this way. I’m gonna set its length to be zero. I’m gonna make space for four elements. And I’m gonna store the address of a dynamically allocated array of integers, where these question marks are right here, and initialize the thing that way. That means that that number can be used not only to store the effective depth of the stack, but it can also identify the index where I’d like to place the next integer to be pushed.

So because I’m pre-allocating space for four elements, that means that this function is gonna be able to run very, very quickly for the very first calls. And it’s only when I actually push a fifth element, when I detect that allocated space has been saturated, that I have to recover and panic a little bit and say, oh, I have to put these things somewhere else. Then it’ll actually go and allocate another array that’s twice as big, and move everything over, and then carry on as if the array were of length 8 instead of 4 all along.

You’ve done this very type of thing algorithmically. At least you’ve seen it with the C++ implementation of templates, and at least just these type of data structures from 106b and 106x. I’m just doing this because I want to be able to start talking about the same implementation with int ’s. Using 107 terminology we’re gonna be dealing with arrays. You can imagine that when we go generic this is still gonna be an array, just like the arrays passed to lsearch and bsearch are, but we’re gonna have to manually compute the insertion index to house the next push call, or to accommodate the next push call, and do the same thing for pop.

I do this [inaudible] for (int i = 0; i < 5; i++). I want to go ahead and I want to do a StackPush. Which stack? The one at that address, and I just want to pass in i. Just draw the picture as to how everything’s updated. And then right here, rather than dealing with the pop problem, which is actually not any more difficult than the push problem, I just want to go ahead and StackDispose(&s).

So from a picture standpoint, the very first iteration of this thing is gonna push a zero onto the stack, it’s gonna push it at that index. So I’m gonna put a zero right there, and put a 1 right there. It’s that 1 that more or less marks the implicit boundary between what’s in use and what’s not in use. Make sense?

Next several iterations succeed in sending that to 2 after there’s a 1 there, and 3 to put a 2 there. It makes this a 4, puts a 3 right there. It detects that now as the boundary between what’s in use and what’s not in use. You could reallocate right here if you wanted to. I wouldn’t do it yet, I would only do it when you absolutely need to on the very fifth iteration of this thing. So what has to happen is that on the very last iteration here I have to do that little panic thing, where I say, I don’t have space for the 4. So I have to go and allocate space for everything.

So I’m gonna use this doubling strategy, where I’ve gotta set aside space for eight elements. I copy over all the data that was already here. I free this. Get this to point to this space as if it were the original figure I’ve allocated. Forget about that smaller house, I’ve moved into a bigger house and I hated the older house. And then I can finally put down the 4 and make this a 5 and make this an 8. So that’s the generic algorithm we’re gonna be following for this code right here.
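The "move into a bigger house" step described above can be sketched as a small helper. This version does the allocate, copy, free, and repoint steps by hand with malloc, memcpy, and free. The helper name StackGrowManually and the field names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *elems;
    int logicalLen;
    int allocLen;
} stack;

/* Hypothetical helper: moves the elements into an array twice as big. */
void StackGrowManually(stack *s)
{
    int *bigger = malloc(2 * s->allocLen * sizeof(int));
    assert(bigger != NULL);
    /* Copy over only the part that is in use. */
    memcpy(bigger, s->elems, s->logicalLen * sizeof(int));
    free(s->elems);       /* give the smaller array back to the heap */
    s->elems = bigger;    /* point at the bigger array from now on */
    s->allocLen *= 2;
}
```

After the helper runs on a full stack of four, the fifth value can be written at index logicalLen and the length bumped to 5, which matches the picture of a 5 and an 8 in the struct.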

Now, I do have a few minutes. Let me implement StackNew and StackDispose. And then I’ll come back and I’ll deal with StackPush at the beginning of Monday. I just want to go ahead and put the dot-h here and have its dot-c file right to its right.

I want to implement stack new. So take a stack address, just like that, recognize that s is a local variable. It has to make the assumption that it’s pointing to this 12-byte figure of question marks that is that tall right there.

So what I want to do is I want to go in. I want to do s->logicalLen = 0. I want to do s->allocLen = 4, and then I want to do the following: I want to do s->elems = (this is a function you have not seen before) malloc(4 * sizeof(int)). Now, if I tell you that this is dynamic memory allocation you’re not gonna be surprised by that because the word alloc, the substring alloc, comes up in the function. This is C’s earlier solution to the operator new solution. We don’t have new and delete in pure C, we have this raw memory allocator called malloc.

Operator new takes a count and an implicit data type, because you actually say new int[4] or new double[20]. You don’t do that in C. Not that we should be impressed with the idea, but the way malloc works is it expects one argument to be the raw number of bytes that you need for whatever array or whatever structure you’re building. And if I want space for four integers, that’s certainly in sync with this line, where I’m saying I’m allocating four of them. You have to do four times the size of the figure; it goes and searches for a blob in the heap that’s 16 bytes wide, and it returns the address of it.

There is some value in actually doing this. You’ve seen the assert function in the assignment starter code, or Assignment 1. There is this function called assert. It’s actually not a function, it’s actually a macro. There’s this thing you can invoke called assert in the code, which takes a boolean value, takes something that functions as a test. It effectively becomes a no-op if this test is true, but if this is false assert actually ends the program and tells you what line the program ended at. It tells you the name of the file containing the assert that broke and its line number.

The idea here is that you want to assert the truth of some condition, or assert that some condition is being met, before you carry forward. Because if I don’t put this here and malloc failed, it couldn’t find 16 bytes (that wouldn’t happen, but just assume it could) and it returned null, you don’t want to allow the program to run for 44 more seconds for it to crash because you dereferenced a null pointer somewhere. You just don’t want to dereference a null pointer because it’s not a legitimate pointer, it’s a sentinel meaning failure. So you don’t want to dereference failure because that amounts to more failure.

And then you’ll get something, while the program is running, called a seg fault or a bus error. And I’m sure some of you have seen them. Those are things you don’t want to see. You’d rather see an assert, where it tells you what line number was the problem as opposed to a seg fault, which just says, I’m a seg fault, and notice your program stops.

This can be stripped out very easily so that it doesn’t exist in production code. You actually don’t want this failing if a customer is using this code because then it makes it clear that your code broke as opposed to their code. This can be very easily stripped out at compile time without actually changing the code.
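A minimal sketch of the constructor as described, with the assert in place right after the allocation. The field names are assumed from the spoken description.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *elems;
    int logicalLen;
    int allocLen;
} stack;

void StackNew(stack *s)
{
    s->logicalLen = 0;                    /* the stack starts out empty */
    s->allocLen = 4;                      /* room for four ints up front */
    s->elems = malloc(4 * sizeof(int));   /* malloc wants a raw byte count */
    assert(s->elems != NULL);             /* stop here, not at a later crash */
}
```

The stripping-out mentioned above is done in standard C by defining the NDEBUG macro before <assert.h> is included, for instance with a compiler flag such as -DNDEBUG; the assert lines then expand to nothing without any change to the source.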

So when we come back on Monday I’ll finish the rest of these three and then we will go generic on you.


Lecture 5 Programming Paradigms

Handout 6 Memory
8 pages

Topics: Integer Stack Implementation - Constructor and Destructor, Stackpush Implementation, Reallocation of Memory when Stack Grows Too Big Using Realloc, How Memory is Copied Using Realloc, Stackpop Implementation, Reimplementing the Stack Interface as a Generic Data Structure, Generic Implementation of StackNew, Generic Implementation of Stackpush Using Memcpy, Stackgrow Implementation, Static (Internal) Functions, Generic Stackpop Implementation Using Memcpy, Where it is the Responsibility of the Caller to Allocate the Memory Where the Popped Element is Stored.



Instructor (Jerry Cain):Hey, everyone. Welcome. I have one very short handout for you today. It is the handout that has two problems that we’ll be going over during tomorrow afternoon’s discussion section. Remember, we don’t have it at 3:15; that was just last week to accommodate what I assumed would be a large audience. It’s at 4:15. It’s actually permanently in the room down the hall, in Gates B03. So I’m not teaching the section, Samaya is, who I think is here, who is handing out the handout. So look for that guy tomorrow at 4:15.

Both of these are old exam problems, so they’re certainly good problems to understand the answers to. And if you’re not gonna watch the section or attend, just make sure you at least read the handouts. You may very well be able to do the problems yourself, and if so then that’s fine, but if not, make sure you watch the discussion section at some point.

When I left you on Friday, I was a quarter of the way, a third of the way through the implementation of an int-specific stack. So there’s nothing technically cs107 level about this, except for the fact that we’re being careful to write it in pure C, as opposed to C++.

Now, if you remember the details from Friday, about the interface file, the functions are, I mean, I’ll talk about those in a second. We actually expose the full struct. There are no classes, and there’s no public, and no private. So everything is implicitly public in the dot-h. That’s a little weird. C++, when you learned about classes and objects, it was all about encapsulation and privacy of whatever could be private. We can’t technically do that in C; although, we can certainly write tons and tons of documentation saying just pay attention to the type, use these functions to manipulate it, and pretend that these are invisible to you.

You more or less operate as if this is a constructor, a destructor, and methods, but they happen to come in pure, top-level function form, where you pass in the structure being manipulated. So StackNew is supposed to take the struct, one of these things, addressed by s, and take it from a 12-byte block of question marks to be logically representing a stack of depth zero. This is supposed to kill a stack at that address. This is supposed to increase its depth. This is supposed to pop something off.

I wrote StackNew and StackDispose. I certainly wrote StackNew, I’m forgetting about StackDispose, but I’ll write them really quickly right now.

This is the dot c file:


stack.c


void StackNew(stack *s)




I add these three things:



s->logLength = 0 (that’s because the stack is empty)


I’m going to make space for four elements, and then I’m going to go ahead and allocate space for four elements using C’s raw dynamic memory allocator, called malloc. It doesn’t take anything related to a data type. It doesn’t know that anything related to a data type is coming in as an argument. You have to pass in the raw number of bytes that are needed for your four integers. And that’s how you do this. All that malloc feels is an incoming 16. It goes to the heap and finds a figure that’s that big, puts a little halo around it saying it’s in use, and then returns the base address of it.

I told you to get in the habit of using the assert macro, that was mentioned a little bit in Assignment 1, just to confirm that elems != NULL. If malloc fails for some reason, it rarely will fail because it runs out of memory; it’s more likely to fail because you called free on something you shouldn’t have called free on. So you’ve messed up the whole memory allocator behind the scenes.

That’s a good thing to do right there because it very clearly tells you where there’s a problem, as opposed to it just seg faulting or bus erroring or crashing in some anonymous way, and you have no idea how to back trace to figure out where the problem started.

As far as StackDispose is concerned, this is trivial. Although, there’s one thing I want to say about it, actually, two things I can think of. I want to go ahead and do the opposite of malloc. This corresponds to operator delete from C++. I want to free s->elems so that it knows that whatever figure is identified by that address right there should be donated back to the heap. That’s just what I mean when I talk about free right here. Some people question whether or not I should go and actually free s itself. They ask whether or not that should happen. The answer is no.

Nowhere during StackNew did you actually allocate space for a stack. You assumed that space for the stack had been set aside already, and that the address of it had been identified to the StackNew function. So you don’t even know that it was dynamically allocated. In fact, the sample code I wrote last time declared a stack called s as a local variable. So it isn’t dynamically allocated at all. So you definitely in this case do not want to do this.

The other thing I want to mention is that regardless of whether or not the stack, at the moment this thing is called, is of depth zero or of depth 4500, the actual ints that are held inside the dynamically allocated rectangle of bytes, there’s no reason to zero them out. And there’s certainly no freeing needs held by those integers. I say that because just imagine this not being an int stack but imagine it being a char * stack, where I’m storing dynamically allocated C strings, Freds and Wilmas and things like that.

If I did have that, in the case of the char *, I would have to for loop over all the strings that are still held by the stack at the time it’s being disposed and make sure I properly dispose of all of those strings. I don’t have any of that with the int specific version. But the reason I’m saying this is it’s going to have to accommodate that very scenario when we go generic on this thing and just deal with blobs as opposed to integers.
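The contrast between the two disposal cases can be sketched as below. The int version frees only the array and never s itself. The char * variant is a made-up type, stringStack, added purely for comparison; it assumes the stack owns every string it still holds, which the lecture implies but does not spell out in code.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *elems;
    int logLength;
    int allocLength;
} stack;

/* int version: only the array itself was allocated by StackNew. */
void StackDispose(stack *s)
{
    free(s->elems);
    /* no free(s): StackNew never allocated the struct itself */
}

/* Hypothetical char * variant, for contrast only. */
typedef struct {
    char **elems;
    int logLength;
    int allocLength;
} stringStack;

void StringStackDispose(stringStack *s)
{
    int i;
    for (i = 0; i < s->logLength; i++)
        free(s->elems[i]);   /* each string still held by the stack */
    free(s->elems);          /* then the array of pointers */
}
```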

The most interesting of the four functions is the StackPush. And it’s interesting not because of the algorithm to put an integer at the end of an array, but the algorithm that’s in place to manage the extension of what’s been allocated to be that much bigger because you’ve saturated what’s already there.

I chose an initial allocated length of four, so I’m certainly able to push four integers onto the stack and not meet any resistance whatsoever. But if I try to push a fifth onto the stack, it has to react and say, I don’t have space for five integers, I better go and allocate space for some more, copy over whatever’s been pushed, and dispose of the old array to make it look like I had space for 8 or 16 or 1,024 or whatever.

So the implementation, assuming I do have enough space, would be this simple: StackPush. Pushing onto what stack? The one that’s addressed by this variable called s. What number am I pushing on? The one that comes in via the value parameter.

Let me leave some space here for what I’ll inline as the reallocation part. But if there’s enough memory, where down here I can assume that there’s definitely enough memory, I can do this:


s->elems[s->logLength] = value;


Think about the scenario where the stack is empty. The logLength happens to be zero, which happens to be the index where you should insert the next element.

Once I do this, for the next time, I have to go ahead and say, you know what, the logLength just increased by one. Not only does that tell me how deep the stack is, it also tells me the insertion index for the very next push call. It isn’t that simple. Eighty percent of the code that gets written here has to deal with the one-in-two-to-the-n scenarios where you actually are out of space.

If it is the case that s->logLength == s->allocLength, then as an implementation you’re unhappy because you’re dealing with a stack and a value where the value has no home in the stack at the moment.

I’m trying to push on a 7. Let’s say that s points to this right here, and the logical length and the allocated length are both 4, and I’ve pushed these four numbers on there. The client doesn’t have to know that there’s a temporary emergency. So right here I’m just gonna react by saying, you know what, that 4 wasn’t big enough, I want to go ahead and I want to, without writing code yet, I want to basically reallocate this array. Now, I say reallocate because there really is a function related to that word in the standard C library.

In C++ you have to go ahead and allocate an array that’s bigger. You don’t have to use a doubling strategy; I’m just using that as a heuristic to allocate something that’s twice as big. You have to manually copy everything over, and then you have to dispose of this after setting elems equal to the new figure.

It turns out C is more in touch with exposed memory than C++ is. So rather than calling malloc yourself, you can call a function that’s like malloc except it takes a value, that’s been previously handed back to you by malloc, and says please resize this previously issued dynamically allocated memory block. That is this:

Let me write just one line right here. I want to take allocLength and I want to double it. I could do plus equals 10; I could do plus equals 4. I happen to use a doubling strategy. And then what I want to do is I want to call this function. Let me deallocate these so I have space to write code.


s->elems equals this function called realloc


There’s no equivalent of this in C++. I’ll explain it in a few weeks why there isn’t. But realloc actually tries to take the pointer that’s passed in and it deals with a couple of scenarios. It sees whether or not the dynamically allocated figure can be resized in place, because the memory that comes after it in the heap isn’t in use. There’s no reason to lift up a block of memory and replicate it somewhere else, to resize it if the part after the currently allocated block can just be extended very easily in constant time. The second argument to realloc is a raw number of bytes. So it would be:

s->allocLength * sizeof(int)

I see a lot of people forgetting to pass in or forgetting to scale this by sizeof(int). Even though they know to do it with malloc, they forget with realloc for some reason. It takes this parameter right here, it assumes it’s pointing to a dynamically allocated block, and it just takes care of all the details to make that block this big. If it can resize it in place, all it does is it records that the block has been extended to include more space and it returns the same exact address.

So that deals with the scenario where this is what you have, this is what you want, and it turns out that this is not in use so it can just do it. So it took this address in and it returns the same exact address.

Well, you may question why does it return an address at all? In this case it doesn’t need to. However, it may be the case that you pass that in and you want to double the size or make it bigger, and that space is in use. So it actually does a lot of work for you there. It says, okay, well, I can’t use this so I have to just go, and it really calls malloc somewhere else on your behalf. Just assume that’s twice as big as this. Whatever byte pattern happens to reside here is replicated right there. The remaining part of the figure isn’t initialized to anything, because it’s uninitialized just like most C variables are. It actually frees this for you and it returns this address.

So under the covers, beneath the hood of this realloc function, it just figures out how to take this array and logically resize it, and preserves all the meaningful content that’s in there. If it has to move it, it does free this. It turns out realloc actually defaults to a malloc call if you pass in null here. So technically you don’t need the malloc call ever. That actually turns out to be convenient when you have iterative processes that have to keep resizing a node, and you don’t want to special-case it and call malloc the very first time; you can just call realloc every single time, where the first parameter is null on the very first iteration.
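The "realloc every single time" idea can be sketched with a pointer that starts out as NULL. AppendInt is a hypothetical helper, and growing by one element per call is chosen only to keep the example short; it is not the doubling strategy used for the stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: appends value to a growing int array that starts
   out as NULL, resizing by one element on every call for simplicity.
   Returns the (possibly moved) array. */
int *AppendInt(int *array, int count, int value)
{
    /* On the first call array is NULL, so realloc acts like malloc. */
    int *resized = realloc(array, (count + 1) * sizeof(int));
    assert(resized != NULL);
    resized[count] = value;
    return resized;
}
```

A caller would start with a NULL pointer and a count of zero, then reassign the pointer from the return value on every call, with no special case for the first iteration.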

It’s very easy to forget to catch the return value. If you forget to do that then s->elems still holds the original address, which may have changed, and you now after this call may be referring to dead memory that’s been donated back to the heap manager. So it’s very important to catch it like this. If realloc fails it’ll return null. So I’m gonna put an assert right here.
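Assembled from the pieces described so far, a sketch of the whole push function might read as follows: the saturation test, the doubling, the realloc call whose return value is caught, the assert, and then the ordinary insertion.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *elems;
    int logLength;
    int allocLength;
} stack;

void StackPush(stack *s, int value)
{
    if (s->logLength == s->allocLength) {
        /* Out of room: double the allocation, scaling by sizeof(int). */
        s->allocLength *= 2;
        s->elems = realloc(s->elems, s->allocLength * sizeof(int));
        assert(s->elems != NULL);
    }
    /* logLength is both the depth and the next insertion index. */
    s->elems[s->logLength] = value;
    s->logLength++;
}
```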

The one thing about realloc that’s neat is that if you don’t want to just end the program because it failed, it actually, if it returns null it won’t free the original block, it would only return null if it actually had to move it. Actually, that’s not true. If it can’t meet the reallocation request it’ll just leave the old memory alone and return null. In theory, you don’t have to assert. And in the program here you could just check for null, and say, okay, well, maybe I won’t resize it, maybe I’ll just print a nice little error message saying, I actually cannot extend the stack at this time, my apologies, please do something else.
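The non-asserting alternative can be sketched by catching the result of realloc in a temporary first, so that the old block is not lost when the request fails. StackTryPush and its bool return value are assumptions made for illustration; the lecture only describes the idea.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int *elems;
    int logLength;
    int allocLength;
} stack;

/* Hypothetical variant that reports failure instead of asserting. */
bool StackTryPush(stack *s, int value)
{
    if (s->logLength == s->allocLength) {
        int *resized = realloc(s->elems, 2 * s->allocLength * sizeof(int));
        if (resized == NULL)
            return false;      /* old block is untouched; stack is unchanged */
        s->elems = resized;
        s->allocLength *= 2;
    }
    s->elems[s->logLength] = value;
    s->logLength++;
    return true;
}
```

Assigning the result straight to s->elems, as the asserting version does, would overwrite the only copy of the old address when realloc returns null, so the temporary matters once failure is meant to be recoverable.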

We’ll learn a lot about how malloc and realloc and free all work together. They’re implemented in the same file. They’re implemented in the same file because even though the actual blob of memory doesn’t expose its size to you, like in a dot length field like in Java, somehow free knows how big it is. Well, there’s cataloging going on behind the scenes so it knows how much memory to donate back to the heap every time you call free. But I don’t want to focus on that.

This right here, it doesn’t get called very often. This doubling strategy is popular because it only gets invoked as a code block one out of every 2^n calls once you get beyond 4. Because of the doubling strategy [inaudible] only comes up once every power of 2; 512 wasn’t big enough, okay, well, maybe 1,024 will be. And you have a lot of time before you need a second reallocation request.
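How rarely the doubling branch fires can be checked with a small simulation. The function `countGrowths` is a hypothetical name; it only models the bookkeeping (start with room for 4, double when full) and allocates nothing.

```c
#include <assert.h>

/* Counts how many times a stack that starts with room for 4 elements,
 * and doubles whenever it is full, has to grow to absorb `pushes` pushes. */
int countGrowths(int pushes)
{
    int allocLength = 4;
    int growths = 0;

    for (int logLength = 0; logLength < pushes; logLength++) {
        if (logLength == allocLength) {   /* full: the next push must grow */
            allocLength *= 2;
            growths++;
        }
    }
    return growths;
}
```

Under these assumptions, 1,000 pushes trigger only 8 growths (at 4, 8, 16, ..., 512), which is the "once every power of 2" behavior described above.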

There are a couple of subtleties as to how this is copied. In this example right here, all of these little chicken scratch figures right there, they correspond to byte patterns for integers. So when I replicate them right there, I trust that the interpretation of those integers right there will be the same right there. So realloc is really moving my integers for me.

If these happen to be not four integers but four char *’s, that means the byte patterns that were here would have actually been interpreted as addresses. When it replicates this byte pattern down here, it turns and it replicates the addresses verbatim. This would point to that; that will point to that; this will point to that; this will point to that. Do you understand what I mean when I do that?

When I dispose of this it doesn’t go in and free the pointers here. It doesn’t even know there are pointers in the first place so how could it free them? So as this goes away, and these all point to where other pointers used to be pointing, you don’t lose access to your character strings.
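A short sketch of the char * case: realloc moves the pointer values verbatim and never looks at what they point to. The helper name `growNames` is an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: grows an array of char * to newCount slots.
 * Only the addresses stored in the array are copied; the character
 * strings they refer to stay exactly where they were. */
char **growNames(char **names, size_t newCount)
{
    char **moved = realloc(names, newCount * sizeof(char *));
    assert(moved != NULL);
    return moved;
}
```

After the call, each surviving slot holds the same address as before, so the strings are still reachable even if the array itself moved.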

This is certainly the most involved of all four functions.

Student:If you could explain the assertion again.

Instructor (Jerry Cain):All assert is, for the moment just pretend that it’s a function. It takes a boolean. Its implementation is to do absolutely nothing and return immediately if it gets true, and if it gets false its implementation is to end the program (strictly, it calls abort) after it prints an error message saying an assert failed on line such-and-such in file stack.c.

Student:So actually [inaudible].

Instructor (Jerry Cain):s->elems != NULL. You want to assert the truth of some condition that needs to be met in order for you to move forward. If realloc fails it will return NULL. You want to make sure that did not happen. That’s why there is a not equals there as opposed to a double equals.

This isn’t technically a function. We’ll rely on that knowledge later on. It’s technically what’s called a macro. It’s like a #define that takes arguments. But, nonetheless, just pretend for the moment that it works like a function.

Student:If the realloc has to move the array, is it O(n) time?

Instructor (Jerry Cain):Yeah. The question is whether realloc is time consuming if it actually has to move the array. It is. It’s even more time consuming than O(n). It involves not only the size of the figure being copied, but the amount of time it takes to search the heap for a figure that big. So it really can, in theory, be O(m), where m is the size of the heap.

Let me just quickly, just for completeness, write stackPop. It turns out it’s not very difficult. The most complicated part about it is making sure that the stack isn’t empty.

This continues on this board right here. I have this thing that returns an int, stackPop. I’m popping off of this stack right here. No other arguments. Assert that s->logLength is greater than 0. If you get here then you’re happy because it has something you can pop off. So go ahead and do the following:


s->logLength--, and then return s->elems[s->logLength]


The delta by one and the array access are in the opposite order here. I think for pretty obvious reasons. You’re effectively control z’ing or command z’ing the stack to pop off what was most recently pushed on. So if this came most recently, you want to say I didn’t even do that. Oh, and by the way, there’s the element that I put there by mistake. That’s what I mean.
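Putting the int-specific pieces together, a minimal sketch of the whole int stack might look like this. It follows the board code as described (initial allocation of 4, doubling, assert on failure); anything not spelled out in the lecture, such as the exact layout of the struct, is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *elems;
    int logLength;     /* how many ints are stored */
    int allocLength;   /* how many ints fit before we must grow */
} stack;

void StackNew(stack *s)
{
    s->logLength = 0;
    s->allocLength = 4;
    s->elems = malloc(4 * sizeof(int));
    assert(s->elems != NULL);
}

void stackDispose(stack *s)
{
    free(s->elems);
}

void stackPush(stack *s, int value)
{
    if (s->logLength == s->allocLength) {
        s->allocLength *= 2;
        s->elems = realloc(s->elems, s->allocLength * sizeof(int));
        assert(s->elems != NULL);
    }
    s->elems[s->logLength] = value;   /* write first ... */
    s->logLength++;                   /* ... then bump the length */
}

int stackPop(stack *s)
{
    assert(s->logLength > 0);
    s->logLength--;                   /* bump down first ... */
    return s->elems[s->logLength];    /* ... then read */
}
```

Push writes and then increments; pop decrements and then reads, which is the opposite order mentioned above.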

You could say you see this as demanding a reallocation request and going from 100 percent to a new 100 percent, where the old 100 percent was actually 50 percent. You may ask whether or not if you fall below a 50 percent threshold that you should reallocate and say, oh, I’m being a memory hog for no good reason and donate it back to the heap. You could if you want to.

My understanding of the implementation of realloc, because it wants to execute as quickly as possible, it ignores any request to shrink an array. As long as it meets the size request, it doesn’t actually care if it allocates a little bit more. So it’s like, oh, the size is 100 bytes and you just want 50. Okay, then only use 50 but I’m gonna keep 100 here because it’s faster to do that.

With regard to the stackPop prototype right there, it sounds like a dumb observation, but it’ll become clear in a second when we go generic. This particular stackPop elects to return the value being popped off. I can do that very easily here because I know that the return value is a four-byte figure. If this were a double stack I would just make that leftmost int up there double and it would just return an eight-byte figure.

When I go generic, and I stop dealing with int *’s and I start dealing with void *’s, I’m gonna have to make the return type either void *, which means I return a dynamically allocated copy of the element (for reasons that’ll become clear in a little bit I’m not gonna like that), or I have to return void and return the value by reference by passing in an int * or a void * right here. That will become more clear once I actually write the code for it. But we take a lot of advantage of the fact that we know we’re returning a four-byte figure here, so the return type can be expressed quite explicitly as an int.

So now what I want to do is I want to start over. And I want to implement all of this stuff very generically. And I want to recognize that we’re trying to handle ints, and bools, and doubles, and struct fractions, and actually the most complicated part of it is char *’s because they have dynamically allocated memory associated with them.

Let me redraw stack.h. And we are going completely generic right here. Most of the boilerplate is the same. typedef struct, and I don’t want to commit to a data type, so it’s void *elems. Now, think about what I lost. I lost my ability to do pointer arithmetic without some char * casting. I also lost intimate knowledge about how big the elements themselves are. So I can’t assume sizeof(int) anymore because it may not be that; it probably won’t be. So I’m gonna require the prototype of StackNew to change, to not only pass in the address of the stack being initialized, but please tell me how big the elements that I’m storing are so that I can store it inside the struct.

The logical length versus the allocated length and the need to store that, that doesn’t change. I still want to maintain information about how many of these mystery elements I have. I also want to keep track of how much I am capable of storing, given my current allocation. And there’s gonna be one more field that we store later on. So I’ll leave that piece of suspense for the next 15 minutes and we’ll come back to it.

The prototype of the functions change a little bit. StackNew, same first argument, but now I take an elemSize.


void stackDispose(stack *s). That doesn’t need to change. It’s mostly the same, although we’ll have something to say about what happens when it’s storing char *’s.

void stackPush takes a stack *s, and then I’m gonna pass in a void * called elemAddr.

I can’t commit to a pointer type that’s any more specific right there ’cause it may not be an int *, it may not be a char **, it may not be a struct fraction *. It just needs to be some address that I trust, because the information that I am holding says it is pointing to an s->elemSize figure. So there’s that.

This is the part that freaks people out. This is what I’m going to elect to do, and I’ll talk about the alternative in a second. When I pop an element off the stack, I want to identify which stack I’m popping off of, and I also want to supply an address. This is me supplying an address. I’m actually gonna identify a place where one of my client elements, that was previously pushed on, should be laid down. It’s like I’m descending a little basket from a helicopter so somebody can lay an integer or a double or something in it so I can reel it back up to me. So void * elemAddr. And that’s all I’m gonna concern myself with.

Student:Where is the struct named stack?

Instructor (Jerry Cain):I’m sorry, I just forgot it. It’s right there.
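Collected in one place, the generic interface described so far could be sketched as the following header. The field order inside the struct and the header guard name are assumptions; the function names and parameter lists follow the prototypes just discussed.

```c
#ifndef STACK_SKETCH_H   /* hypothetical guard name */
#define STACK_SKETCH_H

typedef struct {
    void *elems;       /* raw storage; the stack never knows the real type */
    int elemSize;      /* bytes per element, supplied by the client */
    int logLength;     /* number of elements currently stored */
    int allocLength;   /* number of elements the storage can hold */
} stack;

void StackNew(stack *s, int elemSize);
void stackDispose(stack *s);
void stackPush(stack *s, void *elemAddr);   /* copies elemSize bytes in */
void stackPop(stack *s, void *elemAddr);    /* copies elemSize bytes out */

#endif
```

Both stackPush and stackPop take an address rather than a value because the stack can only move bytes; it cannot name the element type.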

So 70 percent of the code I’m gonna write is gonna be the same. It’s gonna be slightly twisted to deal with generics as opposed to ints. But rather than using assignment, which works perfectly well when you’re taking one space for an int and assigning it to another int, we’re gonna have to rely on memcpy and things like that.

StackNew is not hard.



void StackNew(stack *s, int elemSize) (Same kind of stuff)

s->logLength = 0

s->allocLength = 4

s->elemSize = elemSize

s->elems = malloc(4 * elemSize)


I don’t have a hard data type so I can’t use size of here, but I have no reason to. I have been told how big the elements are via the second parameter. So four times elemSize, just use the parameter as opposed to the field inside the struct.

I can do a few things to make my life easier. s->elems better not be equal to NULL or else I don’t want to continue, and the assert will make sure of that. I also could benefit by doing this: assert that s->elemSize is greater than zero. This one is not as important because it takes a lot of work to pass in a negative value for elemSize. But, nonetheless, it’s not a bad thing to put there because if you try to allocate -40 bytes it’s not gonna work. Does that implementation make sense?

I think even if it makes sense, there’s some value in seeing a picture. It takes this stack right there, with its four fields inside at the moment, and let’s say I want to go ahead and allocate a stack to store doubles. That means the elemSize field would have an eight right there. I’d set aside space for four of them. The logical length is zero. I’m just making sure this is consistent with the way I’ve done this. That’s right. And then I would set this to point to the new block. As far as the stack knows, it has no idea doubles are involved, it just knows that it’s a 32-byte wide figure. And it has all of the information it needs to compute the boundary between zero and one, one and two, and two and three.

As far as stackDispose is concerned, stack *s, this is incomplete, but it’s complete for the moment given what I’ve talked about. I just want to go ahead and I want to free s-> elems, and not concern myself yet with the fact that the actual material stored inside the blob might be really complicated. Just for the moment think ints, and doubles, and plain characters. But the fact that I’m leaving space there [inaudible] that this will change soon.
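As a sketch, the generic StackNew and the (for now incomplete) stackDispose described above could be written like this. The struct layout is an assumption; the asserts are the ones the lecture suggests.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
} stack;

void StackNew(stack *s, int elemSize)
{
    assert(elemSize > 0);               /* a negative size makes no sense */
    s->elemSize = elemSize;
    s->logLength = 0;
    s->allocLength = 4;
    s->elems = malloc(4 * elemSize);    /* raw bytes: no sizeof(type) available */
    assert(s->elems != NULL);
}

/* Frees only the blob itself. Anything the stored elements point to
 * (for example strings behind char * elements) is not touched. */
void stackDispose(stack *s)
{
    free(s->elems);
}
```

For a stack of doubles, `StackNew(&s, sizeof(double))` sets aside 4 * 8 = 32 bytes on typical platforms, matching the picture above.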


Let me write stackPush right here.


void stackPush(stack *s, void *elemAddr)


I’m gonna simplify the implementation here a little bit by writing a helper function. I could have written the helper function on the int specific version but just didn’t.

Up front, if it’s the case that s->logLength == s->allocLength, then I want to cope by calling this function called stackGrow. And you know what stackGrow’s intentions are. And I’m just gonna assume that stackGrow takes care of the reallocation part. I’ll write it in a second, that’s not the hard part. Once I get this far, whether or not stackGrow was involved, I want to somehow take the s->elemSize bytes that are sitting right there and write them into the next slot in memory that I know is there for me, because if it weren’t, stackGrow would have been called.

So what has to happen? Let me refer to this picture. Let’s just assume that this picture is good enough, because I obviously have enough space to accommodate this new element. Let’s say that I have three elements. So that these have been filled up with interesting material. And I somehow, in the context of that code over there, have a pointer to some other eight-byte figure that needs to be replicated in.

This is gonna be the second argument to memcpy. This right there is gonna be the third argument. The only complexity is computing and figuring out what the first argument is. It has to be that. It doesn’t need to know that it’s a double *, or a pointer to a struct with two ints inside, or whatever. It just needs to actually get the raw address and replicate a byte pattern. No matter what the byte pattern is, as long as it’s the same in both places, you’ve replicated the value.

So the hardest part about it is doing this:


void *target = (char *) s->elems + s->logLength * s->elemSize


This is why I demanded all of this stuff to be passed to my constructor function so it was available to me to do the manual pointer arithmetic later on.

I don’t think I messed this up. LogLength is right there for the same reasons it was in between the square brackets for the int specific implementation of this.

Then what I want to do is I want to go ahead and memcpy into the target space whatever’s at elemAddr. How many bytes? This many, s->elemSize. Then I can’t forget this, this was present in the other implementation as well. I have to note the fact that the logical length just increased by one. So that’s how I manage that.

Do you guys see just the bytes moving on your behalf in response to this function?

There’s a question back there.

Student:The char * after void target equals?

Instructor (Jerry Cain):This right here, remember that s->elems, unless I made a mistake, is typed to be a void *. You can’t do pointer arithmetic on a void *, so there are actually two tricks. I opt for this one because I’m just more familiar with it: you can cast it to be a char * so that pointer arithmetic defaults to regular arithmetic. This offset is still the offset, it just happens to involve multiplication. It is itself implicitly multiplied by sizeof(char), which as far as multiplication is concerned is a no-op.

I have also seen people, I think I mentioned this before, cast that to be an unsigned long, so that it really is just plain math, and then they cast it back to be a void *. Pointers and unsigned longs are supposed to be the same size on 32-bit systems. A long is supposed to be the size of a word on the register set. So I’ve seen that as well. You can do whatever you want. All of my examples use the char * version so that’s why I use that one.
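The address computation is the heart of the generic push, so here it is in isolation. This sketch deliberately leaves growth out: the name `pushNoGrow` and its "there must be room" assert are assumptions used to keep the example small.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
} stack;

/* Copies one element into the next free slot. Assumes there is room;
 * reallocation is left out of this sketch on purpose. */
void pushNoGrow(stack *s, void *elemAddr)
{
    assert(s->logLength < s->allocLength);

    /* Cast to char * so that "+ k" really means "+ k bytes". */
    void *target = (char *) s->elems + s->logLength * s->elemSize;

    memcpy(target, elemAddr, s->elemSize);   /* replicate the byte pattern */
    s->logLength++;
}
```

The char * cast is the portable choice. If the integer-cast route is preferred, `uintptr_t` from `<stdint.h>` is the type intended for holding a pointer value, rather than relying on unsigned long being pointer sized.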

The stackGrow thing. The reason I want to do that is not because of the mem copying or the void *, I just want to explain what a student asked two seconds before class started. You’re kind of already getting the idea that the word static has like 85 meanings in C and C++. Well, here’s one more.

When you see the word static decorating the prototype of a C or a C++ function, not a method in a class, just a regular function, such as static void stackGrow, which takes a stack *s, what that means is that it is considered to be a private function that should not be advertised outside this file. So in many ways it means private in the C++ sense. The technical explanation is that static marks this function for what’s called internal linkage.

You know how you’re generating all these dot o files, when you type make all this stuff appears in your directory, some of them are dot o files. I’ll show you a tool later on where you can actually look at what’s exported and used internally by those dot o files.

StackPush, and stackNew, and stackDispose are all marked as global functions, and that the symbols, or the names of those functions, should be exported and accessible from other dot o files, or made available to other dot o files. Something like this is marked as what’s called local or internal. And even though the function name exists, it can’t be called from other files. That may seem like it was a silly waste of time, but it really is not.

Because you can imagine in a code base of say one million files, it’s not outlandish believe it or not. Think Microsoft Office, the whole thing, probably has on the order of hundreds of thousands of files, maybe tens of thousands, I don’t know what it is, more than a few. You can imagine a lot of people defining their own little swap functions. And if they’re not marked as internal functions at the time that everything is linked together to build Word or Excel or whatever, the linker is gonna freak out and say, which version of swap do I call? I can’t tell. Because it has 70 million of them. But if all 70 million are marked as private, or static, then there’s none of those collisions going on at the time the application’s built.
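A tiny illustration of the linkage point: the two functions below live in the same file, but only one of them is visible to the linker from other files. The names `swap` and `sortPair` are hypothetical.

```c
#include <assert.h>

/* static gives swap internal linkage: the name exists only inside this
 * file, so any other .c file can define its own swap without a clash. */
static void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* No static: external linkage. Other files can call sortPair, and the
 * linker would complain if two files both defined a non-static sortPair. */
void sortPair(int *lo, int *hi)
{
    if (*lo > *hi) {
        swap(lo, hi);
    }
}
```

On many Unix toolchains the `nm` tool lists a .o file’s symbols and distinguishes local from global ones, which is one way to see this difference.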

This is responsible for doing that reallocation. Assume it’s only being called if it understands that this is being met as a precondition. So it can internally just do this:


s->allocLength *= 2

s->elems = realloc(s->elems, s->allocLength * s->elemSize)


That’s the cleaner way to write it. It makes this focus on the interesting part that’s hard to get, and kind of puts this aside as uninteresting.
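With the helper factored out, stackPush reads as two steps: make room if needed, then copy bytes. A minimal sketch, assuming the struct layout used earlier and the doubling strategy from the int version:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
} stack;

/* Private to this file (internal linkage). Precondition: the stack is full. */
static void stackGrow(stack *s)
{
    s->allocLength *= 2;
    s->elems = realloc(s->elems, s->allocLength * s->elemSize);  /* bytes, not elements */
    assert(s->elems != NULL);
}

void stackPush(stack *s, void *elemAddr)
{
    if (s->logLength == s->allocLength) {
        stackGrow(s);
    }
    void *target = (char *) s->elems + s->logLength * s->elemSize;
    memcpy(target, elemAddr, s->elemSize);
    s->logLength++;
}
```

Forgetting the `* s->elemSize` in the realloc call still compiles and often appears to run, which is exactly the kind of mistake the raw-bytes view makes easy.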

Algorithmically, the function that’s most different from the integer version actually is this StackPop call. This is how I want to implement it:


void stackPop(stack *s, void *elemAddr)


I’m popping from this stack right here. I’m placing the element that’s leaving the stack and coming back to me at this address. There’s no stack shrink function to worry about. So what I want to do is declare this void *, not called target but called source, = (char *) s->elems + (s->logLength - 1) * s->elemSize. I forgot to do the minus minus beforehand so I recovered by doing a minus 1 right there.

This is where we’re drawing a byte pattern from in the big blob behind the scenes. We’re drawing byte patterns from there so we can replicate them into elemAddr. elemAddr is the first argument to memcpy. I’m copying from that address right there. How many bytes? This many, s->elemSize. And then do what I should have done earlier, s->logLength--. This really should have been the first line, and I shouldn’t have the -1 there, but this is still correct.

Do you guys get what’s going on here? Do you understand the helicopter basket analogy? We’re actually identifying a space where it’s safe to write exactly one element so that when the function returns I can go, wow, that’s been filled up with an interesting element to me. And I can go ahead and print it out or add it to something or replace the seventh character or whatever I want to do with it.
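Here is the generic pop as a sketch, written with the decrement first (the ordering the lecture says it should have used) so that no "minus 1" is needed. The struct layout is the same assumption as before.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
} stack;

/* Copies the top element into the space the caller supplied
 * (the "helicopter basket") and shrinks the logical length by one. */
void stackPop(stack *s, void *elemAddr)
{
    assert(s->logLength > 0);
    s->logLength--;

    void *source = (char *) s->elems + s->logLength * s->elemSize;
    memcpy(elemAddr, source, s->elemSize);   /* destination first, then source */
}
```

The caller owns the destination, typically a local variable of the element type, and passes its address.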

This used to be int. If I wanted to I could have punted on this right here and just passed in one argument. And I could have returned a void * that pointed to a dynamically allocated element that’s elemSize bytes wide. And I just would have copied not into elemAddr, but into the result of the malloc call. With very few exceptions, malloc and strdup and realloc being them, you usually don’t like a function to dynamically allocate space for the caller and then make it the responsibility of the person who called the function to free it.

There’s this asymmetry of responsibility, and you try to get in the habit as much as possible of making any function that allocates memory be the thing that deallocates it as well. There’s just some symmetry there and it’s just easier to maintain dynamically allocated memory responsibilities.

It’s not wrong it’s just more difficult to maintain. It actually clogs up the heap with lots and lots of little void *’s pointing to s-> elemSize bytes, as opposed to just dealing with the one central figure that’s held by the stack, and then locally defined variables of type int and double that are passed in as int * and double *’s recognized as void *’s. But because we laid down the right number of bytes, as long as everything is consistent we get back ints and doubles and things like that.
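For contrast, the alternative design discussed above (pop returns a freshly allocated copy) might look like the sketch below. The name `stackPopCopy` is hypothetical; the lecture rejects this design, and the code is shown only to make the ownership asymmetry visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
} stack;

/* Alternative pop: allocates elemSize bytes, copies the top element
 * into them, and returns the block. The CALLER must free the result,
 * which is the asymmetry of responsibility described above. */
void *stackPopCopy(stack *s)
{
    assert(s->logLength > 0);
    s->logLength--;

    void *copy = malloc(s->elemSize);
    assert(copy != NULL);
    memcpy(copy, (char *) s->elems + s->logLength * s->elemSize, s->elemSize);
    return copy;
}
```

Every pop now costs a heap allocation and hands the client one more block to remember to free.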

Student:In this case do you have to with our elemAddr [inaudible] right?

Instructor (Jerry Cain):Yeah. I’m not actually changing elemAddr, I’m just changing what’s at elemAddr. So let me actually make a sample call to this.

Suppose I have a stack s. And I called StackNew with & of s, and I pass in size of int. That means that I have this thing, that’s my stack and I’m supposed to just take it as a black box and not really deal with it. But I know behind the scenes that it has a 4 there, and maybe my stack has 14 elements in it, and it can accommodate 16 elements. And that it points to this thing that has not 2 or 4 or 8, but 16 elements in it. And I’m like, you know what, I did all this work, and I pushed on 14 elements right there, but I’d like to now pop off the top element.

When I do this stackPop, I have to pass in that. I have to pass in the address of an integer. So the stackPop call has a variable called elemAddr that points to my variable called top, and it relies on this variable to figure out where to write the element at position 13. It happens to reside right there. The memcpy says where do I write it? I write it at that address right there. You would replicate this byte pattern in the top space, this returns, and I print out top and that corresponds to the number 7,300 or something like that.

If I really wanted to change this void *, I don’t change the void * anywhere, I just use it as sort of a social security number or an ID number on the integer. If I really wanted to change this you’d have to pass in a void * *. There are scenarios where that actually turns out to be what you need, but this is not one of them.

Student:For this pop, let’s say that this function doesn’t work, that it’s popping the wrong stuff. How do we test to find out if we pop [inaudible]. Like, for instance, let’s say you wanted to pop something that was an integer, and you popped [inaudible], and then what if there was something wrong with the resizing, can we set that in that we just pop to like null or something and then –

Instructor (Jerry Cain):You could. If you really want to exercise the implementation of the stack to make sure it’s doing everything correctly, you’ll write these things, you may have heard this, what are called unit tests. Which are usually implemented – there’s two different types of unit tests, there’s those that are written on the dot c end, which know about how everything works, and then those that are written as a client to make sure that it’s behaving like you think it should. If that were the case, and you wanted to protect against that, you could write all these very simple tests to make sure that things are working.

Because in unit tests, as a client you might for loop over the integers one through a million. You might actually set the initial allocation length not to be four, but to be one, so that you get as many reallocations as possible. You could for loop and push the number one through a million on top. In a different function, you could actually pop everything off until it’s empty and make sure that they are descending from one million down to one. And if it fails, then you know that there’s something wrong. If it succeeds, you never know that something is completely working, but you have pretty good evidence that it’s probably pretty close if it isn’t.

You can change the implementation to kind of make sure that all of the parts that are really risky, like the reallocation, and the --, and the memcpy calls, are working. One thing you could do, and Assignment 3 takes this approach a little bit: right here I use this doubling strategy. If I really wanted to test the reallocation business, I could actually do a ++ instead of a times two on the allocation size and reallocate every single time. Not because you want it to work that way, but because you could just make sure that algorithmically everything else works regardless of how you resize them.
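The client-side test described above can be sketched as follows: push 1 through count, pop everything, and check that the values come back in descending order. The initial allocation is set to 1 so the growth path runs as often as possible; the constant `kInitialAllocation` and the function `roundTripSucceeds` are names invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
} stack;

/* Testing assumption: start with room for 1 element instead of 4. */
enum { kInitialAllocation = 1 };

void StackNew(stack *s, int elemSize)
{
    assert(elemSize > 0);
    s->elemSize = elemSize;
    s->logLength = 0;
    s->allocLength = kInitialAllocation;
    s->elems = malloc(s->allocLength * elemSize);
    assert(s->elems != NULL);
}

void stackDispose(stack *s) { free(s->elems); }

static void stackGrow(stack *s)
{
    s->allocLength *= 2;
    s->elems = realloc(s->elems, s->allocLength * s->elemSize);
    assert(s->elems != NULL);
}

void stackPush(stack *s, void *elemAddr)
{
    if (s->logLength == s->allocLength) stackGrow(s);
    memcpy((char *) s->elems + s->logLength * s->elemSize, elemAddr, s->elemSize);
    s->logLength++;
}

void stackPop(stack *s, void *elemAddr)
{
    assert(s->logLength > 0);
    s->logLength--;
    memcpy(elemAddr, (char *) s->elems + s->logLength * s->elemSize, s->elemSize);
}

/* Pushes 1..count, then pops and expects count..1. Returns 1 if every
 * popped value matched and the stack ended up empty, 0 otherwise. */
int roundTripSucceeds(int count)
{
    stack s;
    int ok = 1;

    StackNew(&s, sizeof(int));
    for (int i = 1; i <= count; i++) stackPush(&s, &i);
    for (int expected = count; expected >= 1; expected--) {
        int top = 0;
        stackPop(&s, &top);
        if (top != expected) ok = 0;
    }
    if (s.logLength != 0) ok = 0;
    stackDispose(&s);
    return ok;
}
```

A passing run is evidence rather than proof, as noted above, but it does exercise growth, the pointer arithmetic, and both memcpy calls.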

Now, another answer to your question, if I didn’t take that approach with the unit test answer, is that when you’re dealing with generics in C, and void *’s, and memcpy, and manual casts, you have to be that much more of an expert in the language to make sure that you don’t make mistakes.

It’s very easy. Think about it this way, I know you’re not gonna believe this yet because you haven’t coded in C that much, but Assignment 3 you will. I know how most of you program. You write the entire program and then you compile it. And you have 5,500 errors. And you get rid of them one by one, and it takes you like three days, and then it finally compiles. And you run it, and even if it doesn’t quite run the way you want it to it rarely crashes. Most people don’t even know what the word crash means until they get to 107. You will next week, trust me.

As far as C++ is concerned, because it’s so much more strongly typed than C is, it is possible for you to write an entire program, to have it compile, because compilation does so much type checking for you in a C++ program that uses templates, it’s quite possible that once it compiles that it actually works as expected. It doesn’t happen very often but it’s certainly possible.

In a C program, where you’re using void *’s, and char *’s, and memcpy, and memmove, and bsearch, and all of these other functions you’re gonna need to use for Assignment 3, it’s actually very easy to get it to compile. Because it’s like, oh, void *, I can take that, yep, that’s fine. It just does it on all the variables, and so it compiles. And you’re like, good, it wasn’t three days, it was one day. And then you run it and it crashes because you did not deal with the raw exposed pointers properly. So that’s what makes the first assignment in pure C, dealing with void *’s, certainly difficult.

But there’s an argument that can be made to say that it’s very hard to get all this stuff right every single time you program. I’ve already told you that right here I forget this s-> elemSize probably one out of every two times I teach this example. That’s because it’s very unnatural compared to the C++ way of allocating arrays to think in terms of raw bytes. And you think in terms of sizes of the figures, you see the pictures in your head as to how big things should be, so you just remember that number right there but you forget about this. If you forget this right here, it compiles, it runs. If you’re dealing with integers then your array is one-fourth as big as it needs to be to store all the integers you want there.

You will learn this and feel it at 11:59 a week from Thursday. All these types of errors, because it’s gonna compile and it’s gonna run very often. But occasionally it’s gonna crash and you’re not gonna know why, and you’re gonna say, oh, it’s the compiler. It’s not, it’s your code.

Okay. So we will talk more on Wednesday.


Lecture 6: Programming Paradigms

Handout 7: Stack Implementation
52 pages

Section Assignment 1
4 pages

Section Assignment 1 Solutions
2 pages

Topics: Problems with Ownership of Memory, How Default Implementation of Stackdispose Does Not Free Dynamically Allocated Data, Adding a Free Function to the Stack Implementation, Rewriting Stackdispose to Incorporate It, Writing a Free Function for a Stack of C-Strings, Pitfalls When Writing Such Functions, C Library Functions for Assignment 3 - Memmove (Memcpy That Can Copy Using Two Regions That Overlap), Example of Rotate Function, C Qsort Function, Global Layout of Memory - Stack Segment, Heap Segment, How the Heap Manager Allocates And Frees Memory on the Heap, Underlying Linked List of Free Node Information



Instructor (Jerry Cain):Hey, everyone. Welcome. I have one handout for you today. It is Assignment 3. It is due next Thursday evening. You’ll get Assignment 4 next Wednesday. It’ll be due the following Thursday evening. You’ll get a written problem set a few Wednesdays from now. It won’t need to be turned in. I’ll talk about that more when I actually give it out, but it’ll provide a collection of written problems that you’ll be responsible for for the midterm come the following Wednesday night.

When I left you on Monday, I had really just gotten through what, at the moment, was the full implementation of a generic stack. I’ve actually made parts of it easier than it really needs to be because we focused on storing ints, and doubles, and characters, and Booleans.

What I wanna do now is put off the implementation for about ten or 15 minutes, look at that again, and think about how we would use it to store a stack – I’m sorry, use a stack – a generic stack right there – to store a collection of strings and print them out in reverse order.

Now, the nonsense code I’m gonna put on the board is effectively gonna just print out the reverse of an array, but it’s really in place to illustrate the mechanics of using those four functions when you’re storing C strings. That’s gonna become very important come Assignment 4 to manipulate C strings, so that’s why I want to do this.

So just imagine this right here being your main function. I don’t care about argc and argv, but I do care about declaring one of these things, and I’ll emphasize the fact that it is a string stack. What I wanna do is push on deep copies of these strings right here, const char *. I’ll just say letters is equal to... actually, you know what? Let me not call them letters. We’ll just say friends, and I’ll set it equal to this array: Al, Bob, Carl, and that’ll be enough.

So what I wanna do is I wanna declare one of these stacks. This picture I get right here, according to that typedef over there, isn’t very sophisticated. It’s 16 bytes of nothing, or nothing meaningful, okay. But I rely on StackNew, with ampersand of the string stack, where I pass in sizeof(char *), and all of a sudden now it’s getting a little complicated.

I’m gonna ask the stack to basically keep track of the addresses of dynamically allocated C strings. So I’ve just got this picture. This points to a 16-byte blob. The depth of the stack is zero, but I have space for four elements. This four right here is really sizeof(char *).

What I wanna do is for loop from i equal to zero, i less than three, because I wanna go ahead and make a copy of each one of those strings. I do that: char *, I’ll call it copy, is equal to strdup of friends of i, okay. This is an important enough variable that I actually wanna draw it, copy. On the very first iteration, when i is equal to zero, it is set to point to a deep copy of Al backslash zero in the heap.

What I wanna do, and I think this is the best way to phrase it, is that when you push an element onto the stack, you transfer ownership from you to the stack. The way you do that, based on this right here, is for me to do stackPush. Which stack? The string stack that I’ve just initialized.

There’s some drama and controversy over what the next argument should be. Since I am storing char*, that means that these things – even though the stack doesn’t know it – that they have to hold as material these four byte character pointers. That means that I have to pass in the address of a char* so it knows to go to that address and copy the four bytes into the stack. Does that make sense to people?

Okay. Because I’m putting the ampersand here, it knows to go and replicate that material right there, so that it points there, increments this to a one. On the very next iteration, I reuse i, and actually destroy and re-declare Copy to no longer point to Al, but as it’s re-declared and reinitialized it’s set to point up to Bob.

This is Copy on the second iteration. I pass that in. It replicates the material that’s in there because it has the address of that size of char* figure so that it could replicate that address right here, and so it’s almost like this as a for loop blows up three balloons, okay, with Al, Bob, and Carl’s name on it, and then knowingly memcpys the end of the string, or the tail of the string, and actually copies it to the stack behind the scenes. Do you understand what I mean when I say that? Okay. So, I’m really transferring ownership of these three dynamically allocated copies over to the stack.

What I wanna do now is I wanna go ahead and I wanna print these things out. I really wanna ask for ownership back, so I’m going to do this char*, name, and then a for, int i is equal to zero, i less than three, i++.

What I wanna do is I wanna ask for the stack to pop off – string stack – and I want it to place the most recently pushed kite string back into the space that’s right here, okay. So I want it to – this would have been set up to point to Carl. I want it to replicate this space in my local variable called Name so this ends up pointing to Carl, and the logical length of the entire stack is decremented to two. Okay, does that make sense?

The way I do that is like this. Then I can do this. This is the equivalent in C of cout. [Inaudible] where this is the string percent s. This is a placeholder for the string to be printed, and then I can go ahead and free name. That means that it basically passes the end of the kite string – or the end of the balloon string – to the deallocator, so it goes to the Carl address, or the Al address, or the Bob address, and actually donates that number back to the heap. I wanna be clean about it. I do stack dispose at the end, and I just pass an ampersand of string stack, and that is that, okay.

Now, these ampersands right here typically don’t surprise people because I’m dealing with a direct allocation on the stack frame of this function of a thing called string stack, and I have to pass the location of it around, okay.

These ampersands right there very often surprise people. If you do not put them there, for many of the same reasons we saw in previous examples, it would still compile and run, but if you provide this address to stack push, then you’re going to get it right. But if you actually don’t include the ampersand right there, and you pass in that value right there, it’s going to go ahead and replicate BOBY – I’m sorry – BOB backslash zero – as if it’s an address, and copy that into the stack frame, okay. That’s not what you want. Okay, does that make sense? Okay, very good.
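The role of that ampersand can be seen without the stack at all. What follows is a minimal sketch, assuming a stack slot is nothing more than sizeof(char*) raw bytes; StoreElem, LoadElem, and RoundTrip are hypothetical helper names that do with memcpy what stack push and stack pop are described as doing, and malloc plus strcpy stands in for strdup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies elemSize bytes from elemAddr into the slot, as a push would. */
void StoreElem(void *slot, const void *elemAddr, size_t elemSize) {
    memcpy(slot, elemAddr, elemSize);
}

/* Copies elemSize bytes from the slot into client space, as a pop would. */
void LoadElem(const void *slot, void *elemAddr, size_t elemSize) {
    memcpy(elemAddr, slot, elemSize);
}

/* Stores a heap string's address in a raw slot and reads it back. */
char *RoundTrip(const char *text) {
    char *copy = malloc(strlen(text) + 1);   /* stands in for strdup */
    assert(copy != NULL);
    strcpy(copy, text);

    char slot[sizeof(char *)];               /* one element's worth of bytes */
    StoreElem(slot, &copy, sizeof(char *));  /* &copy: replicate the pointer itself */
    /* Passing copy instead of &copy would replicate the characters
       'B', 'o', 'b', '\0' into the slot as if they were an address. */

    char *name = NULL;
    LoadElem(slot, &name, sizeof(char *));
    assert(name == copy);                    /* same heap block comes back */
    return name;                             /* caller owns it and must free it */
}
```

The pointer that comes back is the very address that went in, which is what it means to hand ownership to the container and ask for it back.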

Now, the problem comes – suppose I go ahead and I comment all of this out, or I set this equal to i less than two, or something. Suppose, at the time, I actually call stack dispose. The stack actually has some material that it still owns.

I’ve been very symmetric in the way that I allocate build up, bring down, and then dispose of, but the stack shouldn’t be obligated to be empty, or the client shouldn’t be forced to pop everything off the stack before they call dispose. Stack dispose should be able to say, “Okay, I seem to have retained ownership of some elements that were pushed onto me. I would like to be able to dispose of these things on behalf of the client before I go and clean up, and donate this blob of memory back to the heap.” Okay.

Many times there’s nothing to be done at all. When this thing stores ints, or longs, or doubles, or characters, you don’t have to go in and zero them out. That’s really not that useful. You do have to be careful about donating back any dynamically allocated resources, or maybe any open files. That’s less common, but dynamic memory allocation is certainly more common.

If these things really are owned by the stack at the time it’s disposed of, then the stack dispose function has to figure out how to actually pass these things right there to free just like we do right there. Does that make sense to people? Okay.

It’s actually very difficult to do that because the implementation of stack dispose doesn’t actually know that these things are pointers. It just knows, at best, that they’re four-byte figures. It is capable of computing these addresses, and so if the depth of the stack is three, so that those three arrows are arrows that point to elements it’s holding for the client. It could pass those three arrows to some disposal function, okay.


Now, this isn’t always going to be a simple pointer. This might be a struct with three pointers inside, okay. It might be itself a pointer to a pointer to a struct that has three pointers inside. So, you wanna have some very general framework for being able to free whatever’s at those three arrows if, in fact, there are any freeing needs.

So, what I wanna do here is I want to upgrade this right here to not take two arguments, but to take three arguments, and this is what it’s going to look like. This is the upgraded stack new function. Stack new is going to do that. It’s gonna take elemsize, and it’s also gonna take this, void. Free function takes a void* and doesn’t return anything.

The idea here is that I want to pass to the constructor function information about how to destroy any elements that it holds for me when I call stack dispose, okay. Those three arrows – if you’re dealing with a stack of depth three, and it’s storing strings – it’s prepared to pass those three things in sequence. At least, that’s what I wanna write code for – those three things in sequence to this function that you write and pass the name of to your stack new function, okay. And it will invoke this function for every single element it holds for you.

Since we write this, we can accept the void*s right there. We interpret them, in this case, to be the char**s we know them to be, dereference them, and pass them to free, okay. Does that make sense to people? Yes? No? Okay.

So, I have to rewrite a few things. I have to actually store the free function as a field inside the struct. That means that this is a picture. We’ll actually have this fifth field that points to a block of code that knows how to free things for me, okay.

We have to also handle the scenario where we’re storing ints or doubles in the stack, and there actually is no freeing to be done on behalf of those things. So, the client, when they call this new function, they’re supposed to pass on the first two arguments as they always have.

If you’re storing these base types that have no freeing needs, I expect the client to pass in a NULL here, and that will be checked for in stack dispose. If you’re storing things like char*s, or pointers to structs, or even direct structs that have pointers inside that are pointing to dynamically allocated memory, then you have a meaningful free function placed right here, okay. Does that make sense? Okay.

Let me rewrite stack new. I’m sorry, not stack new. This is easy. Let me rewrite stack dispose. Stack* s – understand that now I’m getting the pointer to one of these five field structures where the fifth field is actually either some null pointer, or a pointer to a legitimate freeing function.

Before I go ahead and dispose of the elems blob, I better check to see whether any – to see whether or not anything complicated is residing within the logLength elements that are still inside. If it is the case that s arrow free function – that’s the name I gave to that field up there – I don’t want to be too clever the way I do that, so let me say if it’s not equal to null, then that means I have to apply this free function as a block of code against those three arrows, or all of the arrows that come up, the manually computed addresses of all the things that reside behind the scenes.

You could do this. For int i is equal to zero, i less than s arrow logLength, i++, I would just do s arrow free function of char*, s arrow elems, plus i times s arrow elemsize. Okay. I think people don’t usually argue with that because it’s something they’ve seen before, so they kinda trust it. It’s not difficult to get that part right when you know you’re storing a free function. The part that’s difficult to get right is right in the free function itself.
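Putting the pieces described so far together, the five-field struct and the upgraded dispose might look roughly like this. It is a sketch rather than the code on the board: the field names elems, elemSize, logLength, and allocLength follow the lecture's vocabulary, while the initial allocation of four elements, the doubling strategy, and the CountFree instrumentation function are assumptions made so the example is complete.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *elems;               /* raw blob holding the elements */
    int elemSize;              /* bytes per element */
    int logLength;             /* elements currently in use */
    int allocLength;           /* elements there is room for */
    void (*freefn)(void *);    /* fifth field: NULL or a real free function */
} stack;

void StackNew(stack *s, int elemSize, void (*freefn)(void *)) {
    assert(elemSize > 0);
    s->elemSize = elemSize;
    s->logLength = 0;
    s->allocLength = 4;        /* assumed starting capacity */
    s->freefn = freefn;
    s->elems = malloc(s->allocLength * elemSize);
    assert(s->elems != NULL);
}

void StackDispose(stack *s) {
    if (s->freefn != NULL) {
        /* Only the first logLength slots hold anything meaningful. */
        for (int i = 0; i < s->logLength; i++) {
            s->freefn((char *)s->elems + i * s->elemSize);
        }
    }
    free(s->elems);
}

void StackPush(stack *s, const void *elemAddr) {
    if (s->logLength == s->allocLength) {
        s->allocLength *= 2;   /* assumed doubling strategy */
        s->elems = realloc(s->elems, s->allocLength * s->elemSize);
        assert(s->elems != NULL);
    }
    memcpy((char *)s->elems + s->logLength * s->elemSize, elemAddr, s->elemSize);
    s->logLength++;
}

void StackPop(stack *s, void *elemAddr) {
    assert(s->logLength > 0);
    s->logLength--;
    /* No freefn call here: ownership of the element goes back to the client. */
    memcpy(elemAddr, (char *)s->elems + s->logLength * s->elemSize, s->elemSize);
}

/* Hypothetical instrumentation: counts how many elements dispose visits. */
int disposedCount = 0;
void CountFree(void *elem) {
    (void)elem;
    disposedCount++;
}
```

With CountFree installed, pushing three elements, popping one, and then disposing leaves disposedCount at two, which shows the loop running up to logLength and not allocLength.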

Let’s revisit this example now that we know that when we store strings, we actually have to potentially set up the stack, or set up the stack to potentially delete elements for us. That means when we declare a stack called string stack, and we call stack new with size of char*, and I wanna pass in some function called string free – I’m just contriving the name. I do know that it has to match that right there as a prototype. It has to take a void* and return nothing.

I have the responsibility of actually writing the string free function. Well, I have to set up string free to actually accept the void* knowing that it’s really a char**. Does that make sense to people? Okay.

Let’s revisit this picture. These are the types of things that are gonna be passed to my free function. I need to reinterpret that as something that’s at least dereferenceable, okay, and then hop into the actual rectangle, or the box, and take that number and pass it to the free function, okay.

So, as an aside, prior to doing this you would implement string free to take the void* – oops, let’s give it a name – Elem, or VP, or whatever you wanna do – and because I’m writing this specifically for the char* stack case, I would just do this. This is really an asterisk. It just came out badly. Okay, that’s a good one.

Okay, now what happens if I forget this, and I forget that? Think about what actually gets passed to free. If I don’t cast this to at least be a double pointer – and I’m actually casting it to be a char double pointer because I know it’s really two hops away as an arrow to characters – and dereference – this dereference is really what matters. It’s the thing that takes me from this little fence post right there to the box that’s addressed by the fence post, okay.

If I leave that that way, it will pass this address to free. It will pass that address to the free function. It will pass that address to the free function, and that’s bad because the first one actually can be passed to free. You shouldn’t be doing that, though, because you didn’t allocate that block. These two addresses should certainly not be passed to free because they weren’t directly handed back to anyone via a call to malloc or realloc, okay.

You don’t own these copies of the pointers, but you know that they’re char*s, and you’re just telling the stack to actually dispose of those things for you because even though the stack isn’t empty, you don’t need the stack anymore. That’s why it’s imperative that these things really be there, okay.
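The free function for the char* case is tiny, and the cast plus dereference is the whole story. The sketch below uses the name StringFree from the discussion; FreeAllStrings is a hypothetical helper that walks an array of pointer-sized slots the way stack dispose walks its elements, added here only so the function can be exercised on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free function for a stack of char *. The void * handed in is the address
   of a slot holding a char *, so it is really a char **: cast, hop once
   into the box, and pass what is found there to free. */
void StringFree(void *elem) {
    free(*(char **)elem);
}

/* Applies StringFree to each of n slots and returns how many were released.
   Each slot is cleared afterward so no dangling pointer is left behind. */
int FreeAllStrings(char **slots, int n) {
    for (int i = 0; i < n; i++) {
        StringFree(&slots[i]);   /* the slot's address, not the string's */
        slots[i] = NULL;
    }
    return n;
}
```

Writing free(elem) instead would hand free the address of the slot itself, which sits inside the stack's elems blob and, except for slot zero, was never returned by malloc or realloc.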

If you’re storing a stack of ints, or a stack of longs, or a stack of even struct fractions where there’s no pointers inside, you would just pass in null there instead, and that would be special-cased away right there, okay. Does that make sense to people? Okay.

If I’m storing ints, or longs, or even struct fractions – which, when we dealt with struct fractions, just had two direct ints inside, if there’s no dynamic memory allocation involved in the things I’m storing – even if they’re structs – I would pass in null. The constant right here is a sentinel meaning there’s no free function needs, okay, and it’s special-cased, and would be observed to be null right here, and circumvent this for loop, okay.

Just to make sure, let me ask some questions. Some of them were easy, but I wanna make sure you believe the answers. You understand why I for loop up to logLength and not allocLength, right? I have no business asking the stack to free things beyond the boundary between what’s in use and what’s not in use, okay. I have no reason to trust that anything meaningful is in that extra space. In fact, I know it isn’t.

How come I don’t free the element using this function right here just before stack pop exits? Do you understand what I mean? Like, if the stack is saying, “Oh, they want an element back. I better return this.” Why isn’t the stack applying the free function to the top element before it returns it? The way I framed it there it kinda sounds like an idiotic question. But you’re not so much – you’re not really transferring a copy of the string back. You’re transferring ownership of the original string back to the client. You’re taking this pointer and using memcpy to replicate it in client-supplied space.

So, if you do that and they have an alias to a pointer that you have an alias for, and then you apply the free function to it, you’re killing the string one instruction after you actually give it back to the client, okay. Does that make sense? Okay, that’s great.

Even if all the code makes sense, just be sensitive to the fact that the compiler does not help you out as much as we’d like. So, you really have to be very thoughtful about the placement of the ampersands, and the double asterisk versus the single asterisk, and whether or not you have to dereference a char** cast as opposed to not dereferencing a plain char* cast, okay.

The number of hops really matters, okay. And you always want to interpret the void*s to be the types of addresses you really know them to be, okay, because if you don’t the compiler is gonna let you do whatever you wanna do. And – well – I mean, in theory at compile time you can get away with a lot, but at run time you never get away with anything, okay. Does that make sense to people? Yeah?

Student:So, the free function will understand it’s not just one character we’re looking at; it’s the whole string and the element?

Instructor (Jerry Cain):Well, that’s not so much – that has nothing to do with string so much as it has to do with malloc and free, but the addresses that reside in this space right here, they were created using this function called strdup. I think the code’s still up on the board right there, and strdup actually relies on malloc to allocate the memory.

Behind the scenes, even though it’s unexposed to us, it actually keeps track of exactly how much memory is part of that blob. So, as long as you pass the leading address to it, it goes to a file where everything is kept track of, and it looks – it compares that address to something in a symbol table, or something else to recover the actual allocation size so it knows exactly how much to donate back to the heap. Does that make sense? Okay.

Any other – the question way in the back.

Student:In the first line of int main, should that be a double char* [inaudible]?

Instructor (Jerry Cain):[Inaudible] it absolutely should be. This should be a double star. I meant to do this. Sorry. That’s the way it is always in my sample code. I just forgot to do it here. Sorry about that. Was there another question that flew up over here? Yeah.


Instructor (Jerry Cain):But I did, actually. This – there’s a malloc that happens as part of that strdup. So, every single string has to be either freed by the stack itself as part of stack dispose, or if it comes back to me because I asked for it via stack pop, and after I’m done with it I have to properly dispose of it. So there has to be a one-to-one correspondence between every call to strdup and every call to free.

In an ideal world where you only dispose of empty stacks, all the free calls would come in the client side. But if you ever dispose of a stack that still holds on to its elements – or is holding on to a subset of the elements – then the number of free calls is going to be distributed between the stack and the client. Does that make sense, okay. That’s good. Okay.

So, what Assignment 3 is all about – it actually turns out that Assignment 3 used to be a very, very difficult assignment, but I started doing this in lecture about, like, I don’t know, like three and a half years ago. And then all of a sudden, things became much more clear when it came assignment time because the implementation you write for the first half of Assignment 3 is really just an extension of this.

Rather than actually just dealing with push and pop as operations – those are the dynamic operations we’re memcpy’ing those on – I want you to generalize it so that you can insert anywhere, and you’re familiar with that from a data structure standpoint using the capital V vector from 106 or the lowercase v vector from the STL that you used in Assignment 1 and 2, okay. Does that make sense?

There are two things that I just wanna mention before I formally put this material to bed, and I wanna talk a little bit about the implementation of malloc, and free, and realloc and how they work. It’s actually very interesting, I think. But I have to talk about two other functions that come up during the implementation of Assignment 3. I should just talk about them.

Let’s – this is okay – I wanna write a function that’s very similar to swap. I wanna write a function called rotate. This actually imitates a function that’s provided as part of the STL library, but I’m gonna write it in pure C, and I’m gonna pass in three void*s. Void* – I’ll call it front – void* – I’ll call it middle – and void* – I’ll call it end. And what I wanna do – I’ll use this board to draw a picture – is I want front, middle, and end to actually be sorted pointers that point to various boundaries inside an entire array.

So let’s assume I have an array of 50 integers, okay, and for whatever reason, I want to move the front four all the way to the back, okay. Does that make sense to people? So, think of it as, like, a bookshelf with 50 books on it, and for whatever reason the front four are out of alphabetical order, and you’ve decided to take them out as a chunk, and move them to the back, but in the process you have to slide 46 books forward. Now drop the word “book” and use the word “int” and that’s what I wanna do.

The intent of this rotate function is if it’s given the absolute opening address of the entire figure, it’s given the midpoint that separates front from back – even though they’re not equal sized – and then end is actually passed the end iterator in the C++ sense, but it’s really the address of the first byte that has nothing to do with the array.

I can manually compute the number of bytes that’s right there. I can manually compute the number of bytes that’s right there from these three void*s. This implementation needs to know nothing at all about the fact that that happens to be 50 ints over in that drawing. It could’ve been 200 characters, or it could’ve been 100 shorts, and it still should do the byte rotation in exactly the same way, okay.

The slight complication here that did not come up in the swap implementation is that this right here has to be written to temporary space. You’re familiar with that idea from the swap implementation. Then this right here has to be memcpy’d – although that won’t be the function we’ll use. We’ll see in a second why. This has to be memcpy’d from right here to right here. Does that make sense? Okay.

The problem with that is that – unlike all the other examples we’ve dealt with – the source region – this space, and this space right there – actually overlap, or can potentially overlap. Does that make sense to people?

The implementation of memcpy is brute force. It carries things four bytes at a time, and then at the end does whatever mod tricks it needs to copy off an odd number of bytes, but it assumes they’re not overlapping, okay. When they’re overlapping, that brute force approach might not work.

Suppose I wanna copy these first five characters – and this is meaningless, and this is meaningless. What memcpy would do – if I wanted to copy these five characters to right there – memcpy is actually quite careless – and it doesn’t do exactly this, but it does more or less this – where it would actually copy the A right there, and then copy the B right there, and then copy the C right there, and then copy the D right there, and copy the E right there, except that you’ve trounced on the C, D, and E before you had a chance to copy it. Does that make sense to people?

Now, memcpy could figure it out. It could actually check the start address – I’m sorry – the target address and the source address – and it could copy either from the front to the back or the back to the front, whichever direction is needed to ensure that it doesn’t actually trounce over data before it’s copied. Memcpy said, “I don’t wanna bother with that. I wanna run as quickly as possible, and I want the client to take responsibility of only calling memcpy when, in fact, he or she knows that there’s no overlap.”

If they don’t know if there’s gonna be overlap they have to use a version of the function that’s slightly less efficient, but does the error checking for them. Does that make sense? It has the same exact prototype – that shouldn’t surprise you – but it’s not called memcpy. It’s called memmove.
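The A through E picture translates directly into a small example. This is a sketch with a hypothetical helper name, ShiftRight, that slides a run of bytes forward inside the same buffer; because the source and destination ranges overlap whenever the offset is smaller than the count, memmove is the call to use.

```c
#include <assert.h>
#include <string.h>

/* Copies the first count bytes of buf to buf + offset. The two ranges
   overlap whenever offset < count, so memmove is used: it is defined to
   behave as if the source were first copied to a temporary area. The
   caller must guarantee buf has at least offset + count bytes. */
void ShiftRight(char *buf, size_t count, size_t offset) {
    memmove(buf + offset, buf, count);
}
```

Starting from an eight-byte buffer holding "ABCDE", a call of ShiftRight(buf, 5, 2) leaves "ABABCDE": all five letters arrive intact two positions later. The same call written with memcpy has undefined behavior for overlapping ranges.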

I don’t know why they use “move” as opposed to “copy.” Probably because it’s supposed to imply that it’s actually shifting somewhere in – if you’re dealing with overlapping ranges that you’re really just moving bytes in a direction just by a little amount as opposed to really relocating them, okay.

So, the implementation of this has to be sensitive to that. What I wanna do is I wanna compute a few values. I wanna do int, front, size is equal to char*, middle minus char* front. Now, you look at that, and that looks a little weird. I am subtracting one void* from another. For the same reasons C does not allow pointer arithmetic – I’m sorry – pointer addition on void*s, it doesn’t allow pointer subtraction on void*s either. Pointer subtraction you didn’t deal with too much in 106B or 106X, but it is a defined, legal operation.

What it’s supposed to do when you subtract one pointer from another – you may think that it returns the number of bytes that sits in between them. That’s not true. If they’re strongly typed to be int*s for instance – if you do pointer subtraction between two int*s it’s supposed to return the number of ints that fit in between the two addresses, and that’s consistent with the way that pointer addition works, okay. Does that make sense to people?

So, what I’m doing here is I’m – it’s the same hack. I’m casting both of these things to be char*s so that pointer subtraction becomes regular subtraction, and I’m given the physical number of bytes that reside between that and that right there. Does that make sense? Okay.
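The difference between typed subtraction and the char* hack is easy to check. Here is a small sketch with two hypothetical helpers, IntsBetween and BytesBetween; both pointers passed to either one are assumed to point into the same array, which is what C requires for pointer subtraction to be defined.

```c
#include <assert.h>
#include <stddef.h>

/* Typed subtraction: the result is measured in ints, not bytes. */
ptrdiff_t IntsBetween(const int *from, const int *to) {
    return to - from;
}

/* Generic subtraction: cast both addresses to char * so the result is
   the physical number of bytes between them. */
ptrdiff_t BytesBetween(const void *from, const void *to) {
    return (const char *)to - (const char *)from;
}
```

For an int array arr, IntsBetween(arr, arr + 4) is 4, while BytesBetween(arr, arr + 4) is 4 * sizeof(int), which is 16 on the four-byte-int architecture assumed in lecture.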


I wanna do the same thing with end and middle so at least I know how big the two regions are. What I can do now is I can declare a raw buffer, just like I did with the generic swap function, char buffer, and I can allocate it to be of size front size. Again, this isn’t ANSI standard, but it works on our compiler, and it’s so much nicer that I’m just gonna do it.

So now I have a character buffer that’s just as big as this thing is here, in terms of bytes. So maybe that’s set aside right here because I said those were ints – if I draw it even close to scale – it will be of size 16 right there. And then I use memcpy.

I actually prefer to use memcpy, and I want you to use memcpy if you know you have the option to. I wanna memcpy into this buffer from front, front size so that if this as a figure happens to reside there, it’s replicated right there. And if these are four ints, then this will eventually be able to stand in as four ints when it’s placed in integer space.

Then I wanna take this and I wanna slide it down. When you call memcpy your heart’s in the right place, but you’re dealing with two overlapping figures so you wanna call – not memcpy – but memmove with the same signature, okay. That would be this – memmove. I wanna copy to front from middle, and I wanna copy back size, okay. Does that make sense to people?

Now, if mid is very close to the end, and it’s beyond the 50 percent point, then it turns out that memmove didn’t buy us very much because there’s no overlapping figure – I’m sorry – there’s no overlapping between the source region and the destination region.

But you can’t – in a general sense – actually anticipate that, okay. You could actually look at how much closer – whether or not middle is closer to front or end – and then call one of two versions that only called memcpy, but then all you’re doing is the error checking that memmove would do for you. So I’d rather just deal with one implementation, okay. Make sense? Okay.

The only complexity of the last line is getting this right here into the last front size bytes. I don’t actually have a target pointer for this, but what I’ll do is I’ll memcpy – that’s okay because I’m copying from the buffer – I’ll do char*, end minus front size. If from here to here is front size, then from there to there is front size. I wanna copy from the buffer, and the size of the buffer is front size, okay. Does that make sense?

So, you only call memmove if you have to, okay, because you know that there’s a very reasonable chance that the two – the source region and the destination region – will be overlapping. But I want you to call memcpy if you know you’re able to because it’s more efficient, and when you’re starting to write systems code like this you actually do have to think a little bit about – more about – efficiency than you did in 106b.

People argue, “Well, why don’t I just call memmove all time because then I can just forget.” You’re right. You could, but I’d rather you not. So I actually wanna pretend that memmove actually blows the computer up if the two regions don’t overlap, okay, because I want you to call memcpy if you know you can, okay. Does that make sense to people? Okay.
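Collected in one place, the rotate function described above might read as follows. One substitution is made in this sketch: the temporary buffer comes from malloc and is released with free, instead of the variable-sized char buffer on the stack frame that the lecture notes is not ANSI standard. The early return for an empty front or back region is also an addition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

void rotate(void *front, void *middle, void *end) {
    /* char * casts turn pointer subtraction into plain byte counts. */
    size_t frontSize = (size_t)((char *)middle - (char *)front);
    size_t backSize = (size_t)((char *)end - (char *)middle);
    if (frontSize == 0 || backSize == 0) return;   /* nothing to rotate */

    char *buffer = malloc(frontSize);   /* swapped in for the stack buffer */
    assert(buffer != NULL);

    memcpy(buffer, front, frontSize);    /* buffer never overlaps the array */
    memmove(front, middle, backSize);    /* source and target may overlap */
    memcpy((char *)end - frontSize, buffer, frontSize);  /* fill the tail */

    free(buffer);
}
```

Calling rotate(a, a + 2, a + 6) on an int array holding 1 through 6 leaves 3, 4, 5, 6, 1, 2. The function never learns that the bytes are ints.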

The only other function I have to mention briefly – and I actually have plenty of time to do it so I’ll just – but I just wanna go over the prototype of the generic quick sort function that exists in the C language, okay.

When you’re sorting, there’s no key. There’s just the array, the element size, the number of elements, and then the comparison function is still relevant. You know how bsearch takes five arguments? Well, qsort only takes four. It punts on the key. Everything internally is a key compared to everything else, and it just uses the comparison function to guide it in doing all these generic swaps behind the scenes.

I do want you to use qsort. I just wanna put the implementation up on the board so that I can say I’ve formally covered it, but you’re all familiar with just sorting in general. Quick sort happens to be a very fast sorting algorithm. It has this as a prototype. qsort takes a void* called base. It takes an int called n – or size, actually – an int called elemsize, and then it takes a comparison function that knows how to compare two generic addresses. And there’s that. That’s the prototype for it.
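For reference, the standard library declares it as void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *)), so the two counts are size_t and not int. A minimal usage sketch follows; IntCmp and SortInts are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison function: each void * is really the address of an int.
   Returns negative, zero, or positive without risking overflow from
   subtracting the two values. */
int IntCmp(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts n ints in ascending order using the generic library routine. */
void SortInts(int *array, size_t n) {
    qsort(array, n, sizeof(int), IntCmp);
}
```

The same IntCmp can be handed to bsearch afterward, which is why the two functions share the comparison-function shape.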

If you want more detail – this is even relevant in some degree to Assignment 2 – if you happen to be stuck on using bsearch and you just wanna use some more details, you can – at the command prompt – type in man qsort, and you will get more information about qsort than you care to get. But nonetheless – at the top – will remind you what the prototype is, and what all the arguments are named so you know which ints correspond to which.

You can do man on bsearch – man is short for manual so it just basically gives you a textual documentation of that function. Also, memcpy, memmove – and I think there’s some other ones – malloc, realloc, free. These are the types of functions that are exercised aggressively by Assignment 3 so you’ll definitely want some resource to go to if it’s 5 a.m., and you’re working on it, and there’s nobody around, okay. Does that sit well with everybody? Okay.

So you have plenty of time to do Assignment 3. It has turned out to be – it is – I won’t say it’s easy by any stretch of the imagination because I don’t have – there’s no advantage to me saying that, but it is actually a little bit less work than Assignment 2. And people always start on it with a lot of enthusiasm because they think there’s a lot of work to be done – and there actually is – but there’s a clear list – a to do list of things – there’s like 13 functions that have to be implemented, and I provide all kinds of unit tests to exercise all of these things, okay.

And you will just make very piecemeal progress on a nightly basis, okay, so that you can get it done in one or two nights if you actually know what’s going on, okay. So just give yourself a little bit of a buffer in case you have some gotchas that come up while you’re implementing. But it is – usually surprises people that it’s not as much coding as Assignment 2 is, okay.

What I wanna do now is I have ten minutes. I just wanna give you a little preamble to the more – the more involved lecture I’m gonna give on Friday about the implementation of malloc, and realloc, and free. Let me draw what – for the first time of many – is gonna be my generic drawing of all the memory.

Here’s RAM, and since we’re dealing with a – since we’re dealing with an architecture where longs and pointers are four bytes, that means that pointers can distinguish between two to the thirty-second different addresses. That means that the lowest address in memory is zero – which is that null that you’re starting to fear a little bit – and then the highest address is two to the thirty-second minus one, okay.

Whenever you call functions, and the function call forces the allocation of lots of local variables, those local variable – or I’m sorry – the memory for those local variables is drawn from a subset of all RAM called the stack. I’m gonna draw that up here. I drew it a little bit bigger than I needed to, but here it is. The stack segment is what this thing is called.

It doesn’t necessarily use all of the stack, but for reasons that will become clear – and there’s even a little bit of intuition, I think, as to why it might be called a stack – when you call main you get all of its local variables, and they’re alive and they’re active, okay.

When main calls a helper function it’s not like main’s functions – main’s variables – go away. They’re just temporarily disabled, and you don’t have access to them – at least not via the normal variable names, right. So main calls helper, which calls helper helper, which calls helper helper helper, and you have all of these variables that are active – I’m sorry – that are allocated, but only the ones on the bottom most function are actually alive and accessible via their variable names.

When helper helper helper returns, you return back to the local where helper helper has local variables, and you can access those, okay. So basically, when a function calls another function, the first function’s variables are suspended until whatever happens in response to the function call actually ends, okay. And it may itself call several helper functions, and just go through lots of – a big code tree of function calls before it actually returns a value, or not, okay.

What happens is that, initially, that much space is set aside from the stack segment to just hold the main’s variables – whatever main’s local variables are. And when main calls something, this threshold is lowered to there to just make sure that not only is there space for main’s variables set aside, but also for the helper function’s variables, okay.

And it goes down and up, down and up, every time it goes down it’s because some function was called, and every time it goes up it’s because some function returned, okay. And the same argument can be made for methods in C++.

It’s called a stack because the most recently called function is the one that is invited to return before any other ones unless it calls some other function, okay. That’s why it’s called a stack.

We’ll get to this – we’ll probably spend two or three lectures talking about not only how this thing’s formatted, and how variables are laid out, but also how assembly code actually manipulates the information in here. And assembly code even actually decrements and increments this boundary for us in response to function call and return.

What I wanna focus on is this other block of memory called the heap segment. The fact that the heap and the stack are data structures from cs106b is actually irrelevant here – mostly irrelevant, okay.

Heap in this world doesn’t mean like a priority queue back in data structures. It really means blob of arbitrary bytes that this is the lowest address, this is the highest address of the heap segment as opposed to this segment right here, which is completely managed by the hardware – by the assembly code, which actually happens to be down here, okay – this right there, this boundary, and that address, and that address is admitted to software – software that implements what we call the heap manager, okay.

And the heap manager is software – it’s code. The implementation for malloc, realloc, and free are all written in the same file, and they basically manage this memory right here, okay.

So, every time you call malloc of 40, it goes, and virtually finds a block of size 40 in this, and seemingly returns the lead address of it, okay. If you haven’t freed 40 yet, and you go and allocate space – or malloc a request for 80, it might draw it from here – it’s a little bit more organized than this. It doesn’t just draw it from a random location, but malloc would return that.

If you realloc this pointer right here, and you ask for it not to be 40 anymore, but to be 100 because of the way it’s drawn it actually might just extend this to be a blank 100, and not touch the bytes that are up top – up front. If you reallocate this, and you ask for 8,000, and it doesn’t happen to have 8,000 minus 80 bytes between that boundary and that boundary, it might go and allocate a blob that’s 8,000 bytes, copy this over, and copy it right – and then free this right here.

So this really is a sandbox of bytes, and whether they’re integer arrays, or char* arrays, or struct fraction arrays, it’s all immaterial to malloc. It only allocates things in terms of numBytes requests, okay. Does that sit well with everybody? Okay.
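As a minimal sketch of the point that malloc and realloc deal only in byte counts, the helper below allocates an int array, grows it with realloc, and checks that the original elements survived the move. The function name `survives_realloc` is made up for illustration; only `malloc`, `realloc`, and `free` come from the C library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical demo: returns 1 if the first `keep` ints survive a realloc
   to `bigger` ints, 0 if an allocation failed or the data changed. */
int survives_realloc(size_t keep, size_t bigger) {
    assert(keep > 0 && bigger >= keep);

    int *arr = malloc(keep * sizeof(int));   /* the request is in bytes */
    if (arr == NULL) return 0;
    for (size_t i = 0; i < keep; i++) arr[i] = (int)i;

    /* realloc may extend in place or copy to a new block; either way the
       old contents are preserved and the old pointer must not be reused. */
    int *grown = realloc(arr, bigger * sizeof(int));
    if (grown == NULL) { free(arr); return 0; }

    int ok = 1;
    for (size_t i = 0; i < keep; i++)
        if (grown[i] != (int)i) ok = 0;

    free(grown);
    return ok;
}
```

Whether the block is extended in place or moved is up to the implementation, which is why the code only ever uses the pointer that realloc returns.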

So what I wanna do is I wanna be a little bit more organized in the way I demonstrate how malloc, and realloc, and free work because it isn’t that random or pell-mell about allocating bytes. It does have some normative process that’s followed behind the scenes so it can be as efficient as possible on your behalf. Malloc, and free, and realloc are actually called often enough – either directly or via things like operator new, and operator delete, or strdup, or your constructors, or whatever – that it wants to make them run as quickly as possible.

So what I’m gonna do is I’m gonna explode this picture to this board over here – actually, this one’s better – but I’m gonna emphasize the fact that it is one big linear array of bytes. And so, rather than drawing it as a tall rectangle, I’m gonna draw it as a very wide rectangle, and make it clear that this address right there is that one right there, and this address right there is that right there, okay. Does that make sense to people?

So I go ahead and I declare void* A is equal to malloc of 40. This isn’t exactly what happens, but it’s pretty close. It will usually search from the beginning of the heap, and look for the smallest – I’m sorry – the first free block of memory that’s able to accommodate this size request.

And initially, since the entire heap is available – what’ll happen is it’ll do whatever accounting behind the scenes as is necessary to clip off the first 40 bytes, record that it’s in use somehow – we’ll talk about that in more detail on Friday – and return the address of that right there. Does that make sense? Okay.

Next line is this: malloc of 60. It’ll take this naïve approach, and that’s what it does very often. It just scans from the beginning of the heap, and finds the first open block that’s able to meet this request. It sees that that’s in use. It’s able to quickly hop here – we’ll see why on Friday – and say, “Okay, this entire thing is free. That’s certainly bigger than 60. So I will do that, and then return that address.” Okay.

Punting on realloc for a second, if I go ahead and call free on A it will go in because this address – this arrow right there – the tail of it is held in the A variable.

It will go and remove the halo around this memory – it doesn’t clear out the bit patterns because the bit patterns are supposed to stop mattering – and then donates it back. So if I do this, void* C is equal to malloc, and I’ll [inaudible] notches about it, and I say 44, it will look at this block right here. It will record it, and note that it’s a 40 byte free block that could’ve been used had this number been less than or equal to 40, but since it isn’t it has to hop over and consider this right there. So it will clip this off and return that address and bind it to C.

If on the very next line I do this – some implementations actually carry on from where the last search ended, but the way I’m talking about it, it always searches from the beginning, okay. This time this block is big enough to meet that size request so it might clip this off, record that it’s 20 bytes wide, and return that pointer again, okay. Does that make sense?

Entirely software managed with very little exception, okay – and I say exception because the operating system and what’s called the loader have to advertise to the implementation what the boundaries of the heap segment are – but everything else is really framed in terms of this raw memory allocator, okay.
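The walkthrough above can be sketched as a toy first-fit allocator that tracks offsets into an imaginary heap rather than real memory. Everything here is an assumption made for illustration: the 200-byte heap size, the `Block` bookkeeping array, and the names `toy_malloc` and `toy_free`. The final request of 20 bytes is inferred from the remark that the clipped block is 20 bytes wide.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SIZE 200    /* assumed toy heap size */
#define MAX_BLOCKS 16    /* assumed cap on bookkeeping entries */

typedef struct { size_t start, size; int used; } Block;

/* Address-ordered list of blocks; initially one free block spans the heap. */
static Block blocks[MAX_BLOCKS] = { {0, HEAP_SIZE, 0} };
static int nblocks = 1;

/* First fit: scan from the front, take the first free block that is big
   enough, and split off the leftover as a new free block.
   Returns the block's offset, or -1 if nothing fits. */
long toy_malloc(size_t n) {
    assert(n > 0);
    for (int i = 0; i < nblocks; i++) {
        if (blocks[i].used || blocks[i].size < n) continue;
        if (blocks[i].size > n && nblocks < MAX_BLOCKS) {
            /* Shift later entries right to make room for the leftover. */
            for (int j = nblocks; j > i + 1; j--) blocks[j] = blocks[j - 1];
            blocks[i + 1] = (Block){ blocks[i].start + n, blocks[i].size - n, 0 };
            blocks[i].size = n;
            nblocks++;
        }
        blocks[i].used = 1;
        return (long)blocks[i].start;
    }
    return -1;
}

/* Mark the block at `offset` free again; its bytes are left untouched. */
void toy_free(long offset) {
    for (int i = 0; i < nblocks; i++)
        if (blocks[i].used && (long)blocks[i].start == offset) blocks[i].used = 0;
}
```

Calling `toy_malloc(40)`, `toy_malloc(60)`, `toy_free(0)`, `toy_malloc(44)`, `toy_malloc(20)` in that order yields offsets 0, 40, 100, and 0: the 44-byte request skips the freed 40-byte block, and the 20-byte request reuses its front.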

As far as realloc is concerned, if I pass this address to realloc, and I ask it to become bigger, it’ll have to do a reallocation request, and put it somewhere else – probably right there, the way we’ve been talking about it. If I didn’t ask for that to be realloced, but I asked for this to be realloced – to go to 88 – it would just extend it in place.

What’s gonna happen – and we’ll be more detailed about this come Friday – is that there really is a little bit of a data structure that overlays the entire heap segment, okay. It is manually managed using lots of void* business.

In a nutshell – let me actually, in the final ten seconds here – let’s say that this is the heap again – and just to emphasize what’s been allocated and what hasn’t been, let’s say that this has been set aside. Let’s say this has been set aside, and let’s say that this has been set aside. The data structure that’s more or less used by the heap manager overlays a linked list of what are called free nodes, okay.

And it always keeps the address of the very first free node, and because you’re not using this as a client, the heap manager uses it as a variably sized node that – right in the first eight or four bytes – keeps track of how big that node is. Does that make sense to people?

So, it might have subdivided that, and to the left of that line might have the size of that node, and to the right of that line might actually have a pointer to that right there, okay. Now it’s not like there’s ints and doubles typing any of the material over here. The heap manager has to do this very generically so it constantly is casting addresses to be void*s and void**s, okay, in order to actually jump through this thing called the free list behind the scenes to figure out which node best accommodates the next malloc request, okay. When you free this node right here it has to be threaded back into the free list, okay. Does that make sense?
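One way to picture the overlay is a small struct written into the first bytes of each unused block. This is only a sketch: the `FreeNode` layout (size first, then the link) and the helper names are assumptions for illustration, and a real heap manager would do the same thing with raw casts over its own segment.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of a free node: the size, then the link to the next one. */
typedef struct FreeNode {
    size_t size;             /* bytes in this free block */
    struct FreeNode *next;   /* next free block, or NULL at the end */
} FreeNode;

/* Overlay a free node on the unused bytes at `where` and thread it onto
   the front of the list. Returns the new head. */
FreeNode *push_free(FreeNode *head, void *where, size_t size) {
    assert(where != NULL);
    FreeNode *node = (FreeNode *)where;   /* reinterpret the raw bytes */
    node->size = size;
    node->next = head;
    return node;
}

/* Walk the list and total up the free bytes. */
size_t total_free(const FreeNode *head) {
    size_t total = 0;
    for (; head != NULL; head = head->next) total += head->size;
    return total;
}
```

Because the bookkeeping lives inside the free blocks themselves, the list costs no extra memory; the space is only borrowed while the client is not using it.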

I’ll be a little bit more detailed come Friday as opposed to this hand wavy after 11:50 comment, okay. But I want you to understand all this. Okay, see you on Friday.


Lecture 7: Programming Paradigms

Programming Assignment 3: Vector Instructions
17 pages

Programming Assignment 3: Vector FAQ
2 pages

Topics: Heap Management - How Information about Allocations are Stored in the Heap, Result of Freeing Memory Improperly, Actual Sizes of Heap Allocations - Nearest Power of 2, Management of Free Blocks on the Heap by Storing Addresses in the Blocks of Free Memory, Algorithms for Choosing Which Free Block to Allocate, How the Heap's Free List Can Be Updated When Memory is Freed, How Adjacent Free Blocks Are Combined To Avoid Fragmentation, Compacting the Heap By Using Handles, Stack Segment Layout, Allocation of Local Variables on the Stack by Decrementing the Stack Pointer, Activation Records and State of the Stack Pointer During Nested Function Calls, Assembly Code and the Code Segment, RAM, Registers, and the ALU, Example That Demonstrates How an Arithmetic Expression is Translated Into Register Operations



Instructor (Jerry Cain):Hey, everyone. Welcome. We actually have some handouts for you today. We just decided to hand them out after you all sat down. So you’ll be getting three handouts and they should be posted to the website, as my TA Dan Wilson is gonna do that sometime before 11:00 a.m. – before noon. Well let’s see, last time, I had just given you a little bit of an introduction as to how the heap is managed by software that’s included in C libraries that are linked against all of your own code. Every time something like six degrees, or RSG or vector test, or whatever, is created – to remind you from where we were last time, we’re gonna start talking about all the various segments in memory. Each application behaves as if it owns all of memory. We’ll give more insight into that in a few lectures. I’m specifically interested in the segment that, in practice, is usually – I mean very approximately, but usually right around here. When an application is loaded into memory, the lowest address and the highest address of the heap are advertised to a library of code that’s responsible for implementing malloc, free and realloc. Okay? Now this is a general sandbox of bytes. Malloc just hands out addresses to the internals. It actually records, behind the scenes, how big each figure actually is, so that when free is called and passed one of those three pointers there, it knows exactly how much memory to donate back to the heap.

I’ll explain how that works, in a second. Because this is managed by software, malloc, free and realloc, the person implementing that can use whatever heuristics they want to make it run as quickly and efficiently and, obviously, as correctly as possible. And so we’re gonna talk about some of those heuristics and how that works. Now you’ve always been under the impression that, when you do this – I’ll just call it ARR is equal to malloc – and you do something like 40 times the size of, oops, the size of int. And actually, I’ll go ahead and strongly type this pointer, to not be a void star but to be an int star. You’re under the impression that you get actually 160 bytes back.

I can tell you, right now, that you do not – you certainly have more than 160 bytes set aside on your behalf. If this is the heap – I won’t draw all the other nodes that have been handed out, but at some point, it discovers a node that is large enough to accommodate that request right there. We think of it as perfectly sized to be 160 bytes. This address is ostensibly handed back to you and when you pass this either to realloc or to free, it actually can confirm internally – or no, I’m sorry, it can’t confirm, but it just assumes that the pointer that’s handed to free or realloc is one that has been previously handed back by either a call to malloc or realloc. Okay?

So that is some legitimate pointer that has been handed back. The way memory is set aside on your behalf is that it actually allocates more than that number. Okay? For a variety of reasons. But what it will normally do before it hands back an address to you, if you ask for 160 bytes, it’ll usually set aside 164 bytes or 168 bytes. Why? Because it’ll actually include space at the beginning of either 4 or 8, or actually 16 or 32, whatever it decides, a little header on the full node, where it can actually lay down a little bit of information about how big the node is. Does that make sense to people?

So if this really is 160 bytes and this is 4, it might, among other things, write down a 164 inside that 4-byte figure. Okay? If it’s 8 bytes, it can actually write more than just the size. When you get a pointer back, you actually don’t get a pointer to the head of the entire node; you get a pointer that’s 4 or 8 bytes inset from the beginning. Does that make sense? You have access, seemingly, to all of that space right there. When you pass the pointer back to free – let’s forget about realloc for the minute – for a moment – free says, “Oh, I’m just assuming that this a pointer that I handed back earlier. If I really handed this back, then I know that I put down, as a little footprint, how big this node was. So I’m gonna cast this pointer to be an int star or a long star, or a long long star, or whatever it wants to. So that I can gracefully back up 4 or 8 bytes, interpret those 4 or 8 bytes, in a way that I know I laid down information before.” And say, “Oh look, there’s a number 164.” That means, from this point to that point right there, should somehow be threaded back into the heap. Does that make sense to people?
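The header trick can be imitated on top of the real allocator. The wrapper below is a sketch under stated assumptions: it uses a `size_t` header rather than the 4-byte figure in the example, and the names `hdr_malloc`, `hdr_node_size`, and `hdr_free` are invented here. It is not how any particular C library lays out its nodes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: reserve a size_t header in front of the payload.
   The payload is aligned for size_t, which is enough for ints and pointers
   but not necessarily for every type. */
void *hdr_malloc(size_t n) {
    size_t *node = malloc(sizeof(size_t) + n);
    if (node == NULL) return NULL;
    *node = sizeof(size_t) + n;   /* footprint: size of the whole node */
    return node + 1;              /* the client gets the inset address */
}

/* Back up one header and read the size that was laid down earlier. */
size_t hdr_node_size(void *payload) {
    assert(payload != NULL);
    return *((size_t *)payload - 1);
}

/* Free the whole node, starting from its true beginning. */
void hdr_free(void *payload) {
    if (payload != NULL) free((size_t *)payload - 1);
}
```

With an 8-byte `size_t`, `hdr_malloc(160)` records 168 in the header, mirroring the 164 or 168 figures mentioned above. Passing any address that did not come from `hdr_malloc` makes `hdr_node_size` read whatever bytes happen to sit in front of it.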

Okay. Given that right there, here are a couple of problems that I want you to understand why they don’t work. Array is equal to malloc 100 times the size of int. So this is a legitimately allocated block of 100 ints. You know that you’re gonna get either 104 bytes or 108 bytes, just a little bit more – I’m sorry 404 or 408 – and then, you go ahead and you populate the array and you realize that you don’t need all the space. Maybe you only need to use 60 of the integers and you’ve recorded that in some other variable. Okay? And so you think you’re being a good memory citizen and you do this: free of ARR plus 60. Now the code wouldn’t get written that way, it would probably be framed in terms of some local variable, say “num ints in use” or something like that, or “effective length,” and you might because you’re more sensitive to memory allocation and you might want to donate back what you’re not using, might think that that should work. Well, if this is the 400 byte figure that you’ve logically gotten, and you’ve been handed that address and there’s a pre-node header of meaningful information there, meaningful to malloc and realloc, and you go ahead and hand back that address to free, depending on the implementation it may be very naïve and just assume, without error-checking, that it was something that was handed back via malloc or realloc before. It might do some internal checking, but malloc, free and realloc are supposed to be implemented as quickly as possible and not do the error checking for you because they’re assuming that you’re really good at this C and C++ programming thing. So they’re not gonna interfere with execution by doing all of this error checking every single time free and malloc get called.
So if you were to do this, right here, and it doesn’t do any integrity checks on the number that it gets, it will blindly back up 4 or 8 bytes, whatever happens to reside in the two integers, or the one integer at index 58 and 59, would be interpreted as one of these things right here. And if it happens to store in the place where 164 is stored, right there, if it happens to store the number 29,000, it will go from this point forward 29,000 bytes and just blindly follow whatever algorithm it does to integrate a 29,000 byte block back into the heap. Okay?

Does that make sense? Now you’re not gonna be sure what impact that has on the heap and what the data structures look like. I’ll give you a sense in a second. But the bottom line, the takeaway point here, is that you can’t do that and now you have some insight as to why. Okay?
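For contrast, the supported way to give back the unused tail of a heap array is to ask realloc for a smaller size using the original pointer. The helper name `shrink_ints` is hypothetical; the behavior of `realloc` is the standard one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: keep only the first `in_use` ints of a malloc'd
   array. Returns the block to use from now on. */
int *shrink_ints(int *arr, size_t in_use) {
    assert(arr != NULL);
    if (in_use == 0) return arr;   /* realloc with size 0 is not portable */

    int *smaller = realloc(arr, in_use * sizeof(int));
    /* If realloc fails, the original block is still valid and unchanged. */
    return smaller != NULL ? smaller : arr;
}
```

The pointer handed to `realloc` is the one malloc returned, so the allocator finds its header where it expects it. An address in the middle of the block, such as `arr + 60`, never qualifies.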

If you do this, int array 100, and you statically allocate your array, and you don’t involve the heap at all, and you use it and because you’re new to C programming and you think that you have to free the array, if it doesn’t do any error-checking on the address, it’s not even obligated to do any error-checking to confirm that it’s in the heap segment in the first place, it would go to the base address of your static array, back up 4 or 8 bytes, whatever figure and bit pattern happens to reside there would be interpreted as the size of some node, okay, that should be incorporated into the free list data structure. Okay, I’m sorry, I shouldn’t say free list because you don’t know what that is yet.

Incorporated into the collection of free nodes that the heap can consider for future – in response to future call to malloc and realloc. Okay? Is this sitting well with everybody? Okay? The best implementations – or well, best is up for debate – but the fastest implementation is: Don’t do error checking. They’ll use heuristics to make sure they run as quickly as possible. Some implementations, I’ve never seen one, but I’ve read that some implementations do actually basically keep track of a set of void stars that have been handed back.

And it’ll do a very quick check of the void star, that it gets passed to it, against – and make sure it’s a member of the set of void stars that have been handed out. And if it’s in debug mode, it might give an error if it’s not present. If it’s not in debug mode, it may ignore it and say, “I’m not going to free this thing because it will only cause problems later on.” Okay? Does that make sense to people? Yes, no? Got a nod. Okay.

Now, as far as this 160 is concerned, that is not exactly a – that’s not exactly a perfect power of 2. Implementations that I’ve seen, if this is the entire heap, I’ve seen this as a heuristic. When you pass in a numbytes figure to malloc, let’s say that it’s 6, it can quickly recognize whether or not the number that’s supplied is less than, say, 2 to the 3rd or 2 to the 5th, or 2 to the 7th, and basically categorize it and throw it in a size bucket as to whether it’s small, medium or large. I’ve seen some implementations actually divide the heap up, so that anything less than or equal to, say 2 to the 3rd equal to 8 bytes, is allocated from right there. Anything less than or equal to 2 to the 3rd, I’m sorry, 2 to the, like, 6th equal to 64, might be allocated from this segment. And it would always give out a block that is, in fact, exactly 64 bytes or 8 bytes long. Okay? In other words, it won’t try to actually perfectly size everything because that takes a lot of work. If it can take some normative approach as to how it clips off individual segments within each sub segment, it might actually have a much easier time keeping everything clean and organized. Okay? As long as it allocates – if you ask for 160 bytes, if it goes ahead and allocates 192 bytes, or 256 bytes, you actually don’t mind all that much because at least you’re given – you’re certainly given enough memory to meet the 160 request. Okay? Does that make sense? Yeah?
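A rough sketch of the size-bucket idea, assuming buckets are plain powers of two with 8 bytes as the smallest. The function name `bucket_size` and the minimum are assumptions; real allocators pick their own size classes, and 192 in the example above is one that is not a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a request up to the next power of two, starting from an assumed
   smallest bucket of 8 bytes. If n is too large to round up without
   overflow, the result is smaller than n and the caller should treat
   that as failure. */
size_t bucket_size(size_t n) {
    size_t size = 8;
    while (size < n && size <= SIZE_MAX / 2) size *= 2;
    assert((size & (size - 1)) == 0);   /* always a power of two */
    return size;
}
```

Under this scheme a 6-byte request lands in the 8-byte bucket and a 160-byte request lands in the 256-byte bucket, trading some wasted space for simpler bookkeeping.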

Student:That doesn’t necessarily mean that, if you have an array of 160 bytes, you [inaudible]

Instructor (Jerry Cain):Right, you’re not supposed to rely on implementation details because the implementation of malloc, free, and realloc on, say, you know, one flavor of Linux may actually be different than it is on another flavor of Linux. I mean, probably not. I’d say all of those compilers are probably GNU backed, so it’s like GCC. But like, for instance, CodeWarrior vs. Xcode on the Macintosh. They are implemented mostly, I mean primarily, by two different sets of engineers. And one may use one heuristic for how to allocate stuff and those who wrote Xcode may have gone with a different approach.

So you certainly can’t assume you have that memory. You’re just supposed to assume that you’ve got the 160 bytes and that was it. Okay? Do you understand, now, a little bit why running over the boundaries of an array can cause problems? Sometimes, it doesn’t. The most common overrun, actually, is at the end of the array. So you’ll do something like i less than or equal to 10 as opposed to i less than 10. You’ll write one space too far. Consider the less common but certainly not unrealistic situation where you actually visit all your elements from top to bottom, you go one element too far, and you actually access and write to array of negative 1. You know where that resides. It happens to overlay the space where malloc and realloc actually put down information about how big the node is. So if you, for whatever reason, go and zero out from 100 down through zero, well then you actually go 1 too far then you zero out the 1 byte – the 4 bytes where malloc is really relying on meaningful information to be preserved. And it will completely toy with the implementation of malloc and realloc in its ability to do the job properly. Okay? Does that make sense? Okay.
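The top-to-bottom overrun is easy to avoid by letting the loop counter stop before the index would go negative. The sketch below shows one safe shape for such a loop; the name `zero_backwards` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Zero arr[n-1] down to arr[0] without ever touching arr[-1] or arr[n]. */
void zero_backwards(int *arr, size_t n) {
    assert(arr != NULL || n == 0);
    for (size_t i = n; i > 0; i--) {
        arr[i - 1] = 0;   /* index runs from n-1 down to 0 */
    }
    /* A loop that starts at index n or continues to index -1 writes outside
       the array. On a heap block, arr[-1] may overlap the allocator's header. */
}
```

Counting with an unsigned `i` from `n` down to 1 and indexing with `i - 1` sidesteps both the off-by-one at the top and the step past zero at the bottom.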

What else did I want to talk about? As far as how it keeps track of all of the different portions within the heap, that are available to be handed back to the client, in response to malloc, realloc calls, let me go with this picture. Let’s assume that this is the entire heap and I’m writing it to emphasize that it’s a stream of bytes. And a snapshot of any one moment – this is allocated out, right here, and this is allocated out, and let’s say that this is allocated out, as well. Each of these will probably have a pre-node header at the front. Okay? With meaningful information – they can actually keep more than just the size of the node.

In fact, some implementations – I have seen this, will not only keep track of how big this node is, but it’ll actually also keep track of whether or not the node after it is free or not. So it can kind of simplify or optimize the implementation of realloc. So it can keep a pointer to this right here. Does that make sense to people? Okay? And it can go right here and see what – how big it is and whether or not it can accommodate the stretch that is an option in response to a call to realloc. As far as all these blank nodes are concerned, it wants to be able to quickly scan the heap for unused blocks whenever malloc and realloc are called. What will typically happen is that it will interpret these nodes right here as variably sized nodes. You’re familiar with that from Assignment Two and it’ll overlay a linked list of blobs over these 3 regions right here. So the beginning of the heap is right there, right at the beginning, so it knows where the next pointer is. It’ll use the first 4 bytes of the unused blob to store the address of the next blob. Okay? It’ll store the address of the next blob right there, and then maybe, it’ll put null there or maybe it’ll cycle back to the front and use something of a circular linked list approach. Every single time you call malloc or free, it obviously wants to traverse this linked list and come up with some node that’s big enough to meet the request. More often than not, I see a heuristic in place that just, basically, selects the first node that it can find that actually meets the allocation request. So if this type of thing isn’t in use and it’s not segmented down into sub segments, and it’s just one big heap, it might start here. And if it’s just looking for a figure big enough to accommodate 8 bytes, maybe this one will work. Okay?

If it needs 64 bytes and this is only 32, but this is 128, it will say, “You’re not big enough, but you are.” Does that make sense? So it would approach this first fit heuristic in searching from the beginning. There are other heuristics that are in place. Sometimes, they do aggressively search the entire heap’s free list. That’s what this thing is called. That’s what I used earlier. And it’ll actually do an exhaustive search because it wants to find the best fit. It wants to find the node that is closest in size to the actual size that’s needed by the call to malloc so that as little memory as possible is left over in some free node. Does that make sense? Okay?

I’ve seen – I’ve read about, although I’ve never seen, that sometimes they use a worst fit strategy. Which means they basically scan the entire heap for the biggest node and use that one with the idea that the part that’s left over will be still fairly big. And we’re not likely to get little clips of 4 and 8 bytes which are gonna be more or less useless for most malloc calls. Does that make sense?
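The three selection heuristics differ only in which qualifying free block they pick. As a sketch, the functions below choose among an array of free block sizes; modeling the free list as a plain array and the function names are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Each function returns the index of the chosen free block, or -1 if no
   block is large enough. */

/* First fit: the first block, scanning from the front, that is big enough. */
int first_fit(const size_t *sizes, int count, size_t need) {
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        if (sizes[i] >= need) return i;
    return -1;
}

/* Best fit: the smallest block that is still big enough. */
int best_fit(const size_t *sizes, int count, size_t need) {
    assert(count >= 0);
    int best = -1;
    for (int i = 0; i < count; i++)
        if (sizes[i] >= need && (best < 0 || sizes[i] < sizes[best])) best = i;
    return best;
}

/* Worst fit: the largest block, so the leftover piece stays large. */
int worst_fit(const size_t *sizes, int count, size_t need) {
    assert(count >= 0);
    int worst = -1;
    for (int i = 0; i < count; i++)
        if (sizes[i] >= need && (worst < 0 || sizes[i] > sizes[worst])) worst = i;
    return worst;
}
```

With free blocks of 32, 128, and 64 bytes and a request for 64, first fit and worst fit both pick the 128-byte block while best fit picks the 64-byte one. First fit can stop early; the other two always scan the whole list.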

I have seen heuristics where they actually remember where they left off, at the end of malloc. And the next call to malloc or realloc – I’m sorry, the next call to malloc will actually continue from that point. So that, all parts of the heap are equally visited during an executable that runs for more than a few seconds. Okay? So you don’t actually get lots of complexity over here and then this unused heap over here. Okay? Does that sit well with everybody?
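The remember-where-you-left-off heuristic is often called next fit. A minimal sketch, again over an array of free block sizes: the `rover` variable and the function name are assumptions, and the sketch only selects a block without updating sizes after an allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Index where the next search starts; persists between calls. */
static int rover = 0;

/* Next fit: like first fit, but resume scanning just past the block chosen
   last time and wrap around at the end. Returns an index or -1. */
int next_fit(const size_t *sizes, int count, size_t need) {
    assert(count >= 0);
    for (int step = 0; step < count; step++) {
        int i = (rover + step) % count;
        if (sizes[i] >= need) {
            rover = (i + 1) % count;   /* remember where we left off */
            return i;
        }
    }
    return -1;
}
```

Repeated small requests rotate through the whole list instead of always carving up the blocks at the front.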

There are all types of things that can be done here. Extra meta-information can be stored here, and there, and there, about what comes afterwards, so it can simplify the implementation of free and realloc. If I go ahead and free this node right here, when it’s actually freed and donated back to the heap, none of this is changed, but the first 4 bytes are actually set to thread to that right there. And this right here would be updated to point to there instead. Does that make sense to people? Okay? It wouldn’t actually go out and clear any information because it just doesn’t want to bother doing that. It could, but it’s just time consuming. And this could, in theory, be 1 megabyte and why go through and zero out 1 megabyte of information, when the client’s not supposed to touch it again? Okay?

Let me just erase this, not because it would be cleared out, but just so it looks like a free node, like all the other ones. Some implementations would actually prefer not two so-and-so size nodes together but one big node, since they can’t see many disadvantages to having one very large node vs. two side-by-side smaller nodes. Does that make sense?

So some of them will go to the effort of actually coalescing nodes so that the free list is simpler and they have a little bit more flexibility as to how they chop things up. Okay? Make sense? I have seen, and this is kind of funny, I have seen some implementations of free actually just record the address as something that should be freed, ultimately. But it actually doesn’t commit to the free call until the very next malloc call or, in some cases, until it actually does need to start donating memory back to the free list because it can’t otherwise meet the request of malloc and free alloc – malloc and realloc. Does that make sense?
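Coalescing can be sketched over an address-ordered array of spans: whenever two neighbouring spans are both free and touch each other, the right one is absorbed into the left. The `Span` struct and the function name are assumptions for illustration; a real heap manager would do the same thing by following its node headers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t start, size; int used; } Span;

/* Merge neighbouring free spans in an address-ordered array, in place.
   Returns the number of spans that remain. */
int coalesce(Span *spans, int count) {
    assert(count >= 0);
    int out = 0;
    for (int i = 0; i < count; i++) {
        if (out > 0 && !spans[out - 1].used && !spans[i].used &&
            spans[out - 1].start + spans[out - 1].size == spans[i].start) {
            spans[out - 1].size += spans[i].size;   /* absorb into the left neighbour */
        } else {
            spans[out++] = spans[i];                /* keep as a separate span */
        }
    }
    return out;
}
```

Two adjacent free blocks of 40 and 20 bytes become one 60-byte block, which can then satisfy a request that neither could meet alone.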

So I’m speaking in, like, run-on paragraph form here about all the things that can be done. That’s because it’s written in software and people, those that design these things are typically very good systems programmers. They can adopt whatever heuristic they want to, to make the thing run, certainly correctly but as efficiently and elegantly as possible. Okay? And if, ultimately, this all starts out as one big random blob of bytes, the way the heap is set up is that the address of the entire free list is right there and the first 4 bytes have a null inside of it. Okay?

And that basically means that there’s no node following this one. It would also probably have some information about how big the entire heap is. Okay? Because that’s just the type of information that’s maintained on behalf of all nodes at all times, so it just happens to be the size of the heap when things start out. Okay?

Now, consider this problem right here. Here’s the heap again and let’s say the entire thing is 200 bytes wide. This is free and it’s 40 bytes. This is allocated, and let’s say it is 20 bytes, not drawn to scale. Let’s say that this is 100 bytes wide and it is free. Let’s say that this is in use; it is 40 bytes wide. And is that gonna work out? No, let’s make this a little bit smaller. Let’s make this 80; make this 20 and so this, right here, is unused. And of course, the heap is much bigger than this and the way it’s been chopped down into used blocks and unused blocks, is a little bit more complicated than this. But you certainly understand what I mean, when I say that there are 160 free bytes in my heap of, that’s normally of size 200. Okay? If I go ahead and make a malloc request for 40, it could use this because it’s the best fit or the first fit. It could use this because it’s the worst fit. Okay? It can use whatever it wants to as long as it gets the job done and you’re not blocked by a faulty implementation. If I ask for 100 bytes, it’s not gonna work out, right? Because, even though the sum of these free nodes is a whopping 160 bytes, they’re not uniformly aggregated in a way that you need when you malloc 100 bytes or 160 bytes. Okay? I mean, you really have to assume these things are gonna be used as an array of ints or structs, or whatever. And you can’t expect the client to really absorb a collection of pointers and figure out how to maintain an array or overlay an array over multiple blocks. So you may ask yourself, “Well, can I actually slide this over and slide this over even further, to come up with this type of scenario where this is at the front and this is right next to it? And then I really do get 160 bytes that are free – 20 and 20, so that if I ask for 100 bytes, I can actually use it?” This process, it does exist. It won’t exist in this problem for – in this scenario for reasons I’ll explain in a second. 
But what I’m really doing here is I’m just basically compacting the heap like a trash compactor compacts trash. Okay? Try and get it as close to the front as possible, okay because you know that, what remains after everything’s been sifted to the front, is one very large block. Does that make sense? Okay? That’s all fine and dandy except for the fact that you, very likely, I’m sure it’s the case, have handed that address and that address out to the client. And it’s going to resent it if you move the data out of its way. Okay? For you to bring these over here means that the client is still pointing there and right there and now, they have no idea where their data went. Okay? That’s a problematic implementation.
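The fragmentation problem in the 200-byte example comes down to the difference between total free bytes and the largest single free block. A small sketch, with the free blocks modeled as an array of sizes and both function names invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes across all free blocks. */
size_t sum_free(const size_t *free_sizes, int count) {
    assert(count >= 0);
    size_t total = 0;
    for (int i = 0; i < count; i++) total += free_sizes[i];
    return total;
}

/* A request succeeds only if one contiguous free block can hold it. */
int can_allocate(const size_t *free_sizes, int count, size_t need) {
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        if (free_sizes[i] >= need) return 1;
    return 0;
}
```

With free blocks of 40, 80, and 40 bytes, `sum_free` reports 160 but a 100-byte request still fails. After compaction the same 160 bytes form one block and the request succeeds.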

So there are some benefits of actually compacting this heap. Okay? Now I haven’t seen any modern systems do this, but I know that the Macintosh did this about 12 years ago, when I was doing more systems work on the Macintosh. Oops, you know that’s fun. Recognizing that there are some advantages to being able to compact the heap to create fewer large nodes as opposed to lots of little fragmented nodes, they might clip off some part of the heap. This is used in a traditional manner by the heap manager. It’s directly dealt with by malloc, free and realloc. But they might clip this part off and, even though it looks small on the board, it would in theory be a pretty large fraction of the entire heap, which is, you know, megabytes or – megabytes or gigabytes in size. It would manage this via handles. So rather than handing out a direct pointer into the heap, like we have up there, and we saw that interfered with heap compaction, what some operating systems did, in addition to malloc and free and realloc, would actually hand out what are called handles, which are not single pointers directly to data, but pointers that are two hops away as opposed to one hop away from the data. And you may say, “Well, why would they want to do that?” Well, they would not only hand out double pointers, but they would actually maintain a list of single pointers. If you ask for 80 bytes via a handle as opposed to a pointer, then it could maintain a table of master pointers and hand out the address of that to the client. Okay? So the client is now two hops away from his data, but the advantage is that this is owned by the heap manager and, if it wants to do compaction on this upper fourth of the picture, it can do it because it can also update these without affecting your void double stars. Does that make sense? Okay?

I saw that used in Mac OS 7.6 and Mac OS, like 8 – not actually 8 never existed, but like all of the 7 series of Macintosh Operating System, probably from 1994 and ’95, when I was doing some consulting work that involved this. Very clever idea, the problem is that thing is compacted in the background, in a low-priority thread. Okay? Or it becomes higher priority if there’s a demand for this and there’s no space for it. You can’t actually have the heap compaction going on simultaneously to an actual double dereference request because you can’t actually have data sliding in the moment while you’re trying to access it.

So the paradigm, or the idiom rather, that people would use for this is, if you really wanted to get memory to something that’s compacted and managed more aggressively by the heap manager, you would do something like this. Void star star handle and it would be some function like new handle. That’s what it was in Mac OS 7.6 days. You might ask for 40 bytes. Okay? Recognizing that the 40 bytes might be sliding around in a low-priority thread in the background, to keep that thing as compact as possible, you wouldn’t bother with it when you knew that you were gonna certainly read to but even read from those 40 bytes. That you actually, somehow, had to tell the operating system to stop moving things around, just long enough for me to read and/or write. Okay? So you would do something like this, handle lock, which basically puts safety pins on all the blocks in that upper fourth of the diagram and you say, “You can move everything else around, but you better not change the pointer that’s maintained in this table, that’s addressed by my handle. Because I’m gonna be annoyed if you do because I’m about to manipulate it right here.” And if you’re a good programmer, when you’re done, you unlock, you call handle unlock, so that flexibility has been restored for the heap manager to start moving things around, including the block that’s addressed right there. Does that make sense? So there’s a hint of, like, concurrency there. We’ll talk about that more in a few weeks. But that’s how heap compaction, as an idea, can be supported by a heap manager, okay, without interfering with your ability to actually get to the data. Okay? You never really lose track of it because you’re always two hops away as opposed to one hop away. Okay? Does that make sense? Okay, very good. I’m trying to think what else. I think that’s enough. Just kind of a hodgepodge of ideas. I’m not gonna have you implement any of this stuff, okay? 
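The two-hop idea can be sketched with a small pool and a table of master pointers. This is a toy model, not the classic Mac OS Memory Manager: the names `toy_new_handle`, `toy_dispose_handle`, and `toy_compact`, the pool size, and the table size are all assumptions, and locking is left out because the sketch is single-threaded and compacts only when asked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_BYTES 128    /* assumed size of the handle-managed region */
#define MAX_HANDLES 8     /* assumed size of the master pointer table */

static char pool[POOL_BYTES];        /* region managed via handles */
static void *master[MAX_HANDLES];    /* master pointers, owned by the manager */
static size_t sizes[MAX_HANDLES];    /* size of each block */
static int live[MAX_HANDLES];        /* 1 while the block is still in use */
static size_t pool_top = 0;          /* first unused byte of the pool */
static int handle_count = 0;

/* Carve n bytes off the top of the pool and return the address of the
   master pointer, so the client is two hops away from its data. */
void **toy_new_handle(size_t n) {
    if (handle_count == MAX_HANDLES || pool_top + n > POOL_BYTES) return NULL;
    int h = handle_count++;
    master[h] = pool + pool_top;
    sizes[h] = n;
    live[h] = 1;
    pool_top += n;
    return &master[h];
}

/* Mark the block dead; its bytes are reclaimed at the next compaction. */
void toy_dispose_handle(void **handle) {
    assert(handle >= master && handle < master + handle_count);
    live[handle - master] = 0;
}

/* Slide live blocks toward the front of the pool and update the master
   pointers. Blocks are stored in handle order, so each one only ever
   moves to a lower address. Client handles stay valid throughout. */
void toy_compact(void) {
    size_t dst = 0;
    for (int h = 0; h < handle_count; h++) {
        if (!live[h]) continue;
        memmove(pool + dst, master[h], sizes[h]);
        master[h] = pool + dst;
        dst += sizes[h];
    }
    pool_top = dst;
}
```

A client that holds `void **b` and always reaches its data through `*b` keeps working after `toy_compact`, because only the master pointer changed. A client that cached `*b` in a plain pointer would be left pointing at stale bytes, which is the situation the lock and unlock calls guard against when compaction runs in the background.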
But we’ve actually – I’ll try and dig up a section problem, not for this week but for the week after, where people on the mid-term implemented a little miniature version of malloc, where you actually had to understand all of this stuff. Okay, now I can tell you right now that I’m not gonna do that because that problem didn’t go very well when I saw it given. But it’s certainly a suitable section problem and something that you can do in, like, a less time-pressured environment. Okay? Make sense? Okay. So the heap is software managed. It is managed entirely by malloc, realloc and free.

What I want to do now is start talking a little bit about the stack segment. Let me get some clear board space here. The first 20 minutes of this are actually very easy to take. It’s Monday that’ll be a little bit more intense. When I talk about the stack segment, I drew it last time; it’s typically at a higher address, as a segment. This entire rectangle is set aside as the stack segment, but you’re not always using all of it at any one moment. In fact, you’re using very little of it when a program begins because there are so few active functions. Okay? Usually, the portion that’s in use is roughly – it’s very rough – proportional to the number of active functions, to the stack depth of all the functions that are currently executing. Okay? I will draw more elaborate pictures within this in a little bit. But let me just invent a function, and I don’t care about code so much as I just care about the local variable set. Let’s say that the main function – let’s not worry about any parameters – declares an int called “a.” Let’s say that it declares a short array called “b” of length 4, and let’s say that it declares a double called “c.” Okay?
Actually, I don’t want to call this “main.” Let’s keep it nice and simple; let’s just call it the “a function.” All that the implementation of “a” does is call “b” – I’m not concerned about passing parameters – and then call “c” afterwards, and then it returns. We’ll blow this off for a second. When you call the “a” function, obviously space needs to be set aside for this int and those 4 shorts and that single double there: one 4-byte figure, four 2-byte figures and one 8-byte figure. Not surprisingly, it’s somewhat organized in the way that it clips off memory. It doesn’t take them from the heap; it actually draws the memory from the stack. All of the memory that’s needed for these four variables right here – I’m sorry, three variables, and technically, like, six because there are four of them right here – is packed as aggressively as possible, and there’s even some ordering scheme that’s in place. Because “a” is declared first, “a” gets a 4-byte rectangle, and because “b’s” declaration is right afterwards, it’s allocated right below it in memory. Okay? This, being an 8-byte double, would have a rectangle that’s twice as tall, and I’ll just do that to emphasize the fact that it’s two [inaudible] as opposed to one. Okay? Does that make sense?

When you call this “a” function, what happens is – I don’t want to draw this entire thing again, so just let this little picture right here be abbreviated by the hexagon. Okay? This thing is what’s called an activation record, or stack frame, okay, for this “a” function. Let’s assume that “a” is about to be called. When “a” is called, this pointer that’s internal to the stack segment, the one that separates the space that’s in use from the space that’s not in use, is decremented by size of hexagon. Okay? Do you understand what I mean when I say that?

And that means that the memory right here is interpreted according to this picture, right here. It’s like this picture right here overlays those 20 bytes that the stack pointer was just decremented by. Okay? Notice it doesn’t lose track of all the previously declared variables that are part of functions above “a” in the stack trace. And when “a” calls “b,” and calls “c,” it’ll decrement the stack pointer even a little more. Does that make sense? Okay. Let me implement “b” – I’ll keep these simple. Void “b” declares an int called “x,” a char star called “y” and let’s say a char star array called “z” of length 2. The stack frame would look like this. “X” would be above “y’s” 4-byte pointer, and then this is an array of size 2. So this is a total of 16 bytes, there. I’ll abbreviate it with a triangle. Okay? And let’s say that “b” actually calls “c” as well, and “c” has a very simple local variable set: I’ll say a double array called “m” of length 3, and I’ll just do a stand-alone int called “n,” and it doesn’t do anything that involves any other function calls. The activation record for this would be big: there are three doubles; there’s a stand-alone int below them; this is all of “m” of zero through 2. And I’m gonna abbreviate this with a circle. Okay?

So the very first time “a” gets called, you know just the work flow will bring you from “a” to “b” to “c”, back to “b”, back to “a”, which will then call “c”. Okay? So as far as the stack is concerned – let me just draw a really narrow stack, okay, to emphasize the fact that the full width is being used to help accommodate the stack frames right here, when “a” is called, whatever’s above here, we have no idea. They’re obviously gonna correspond to local variables; they’re preserving their values; there’s just no easy way to access them, unless you happen to have pointers passing those parameters. Okay?

When you call “a,” this thing is decremented by size of hexagon. Oops. I needed to remind myself what a hexagon looked like. That’s that, and then this pointer right here is kind of the base or the entry point that grants the “a” function access to all of its local variables. It knows that the last variable, “c,” is at an offset of zero from this pointer right here. Okay. 8 bytes above that is where the second-to-last variable that was declared can be accessed, etc.

This pointer is actually stored in the hardware; we’ll see that on Monday. But this is what’s called the stack pointer, and it always keeps track of and points to the most recently called function’s activation record. When “a” calls “b,” it decrements this even further. I’m gonna stop drawing the arrow, and the triangle activation record is laid right below that. Okay? Access to this just isn’t available to the “b” function because “b” doesn’t even know about it. It’s incidental that it happens to be above it, but “b” can certainly access all the variables that are part of the triangle activation record. Since “b” calls “c,” it’s gonna lay a circle below that. When it calls “c,” that circle will be alive and in use, and the stack pointer will be pointing right there until “c” is done. When “c” exits, it simply raises the stack pointer back to where it was before “c” was called. Whatever information has been written there, okay, actually stays there. Okay? But it’s supposed to be out of view; it can’t be accessed. Okay? So – I’m sorry, “c” returns and then “b” returns and we come back to that situation, and the drawing there technically corresponds to the moment where we’ve just returned from “b” but we have yet to call “c.” Right? And so when “a” calls “c,” it follows the same formula. It has no idea that “b” has been called recently. And the formula here is that it just overlays a circle over the space where the triangle was before. Okay? So it actually layers over whatever information was legally interpreted and owned by the “b” function. But it doesn’t know that, and it’s not really gonna be the case that “c” has any idea how to interpret “b’s” data. It doesn’t even necessarily know that “b” was called recently. All it’s really doing is inheriting a random set of bits that it’s supposed to initialize to something meaningful if it’s going to do anything useful with them. Okay? Does that make sense?
Do you now see why this thing is called a stack? I’m assuming you do. Okay? Before you can actually get back to “b’s” variables, if “b” has called “c,” “c’s” stack frame has to be popped off the stack before you can come back to “b.” Okay? Now we’re gonna be a little less shape-oriented in a little bit and start talking about assembly code and how it manipulates the stack. So we’re gonna suspend our discussion of the stack for, like, half a lecture, okay? But now, I want to start focusing a little bit on assembly code. Okay? And how that actually guides execution. We’re ultimately going to be hand compiling little snippets of C and C++ code to a mock assembly language. It’s not an industrial thing like MIPS or x86 or something like that. That would be more syntax than concepts. So we have this very simple syntax for emulating the ideas of an assembly language. And I want to start talking about that now.

There is a segment down here. It usually doesn’t have to be that big because the heap and the stack – the stack is actually never this big when it starts out; it can be expanded. The heap is usually set aside to be very big because any huge memory resources are typically dynamically allocated while the program is running. This right here, I’m gonna refer to as the “code segment.” Just like the heap and the stack, it stores a pattern of zeros and ones, but all of the information in the code segment corresponds to the assembly code that was compiled from your C and C++ programs. Does that make sense? Okay. So I’ll give you an idea of what these things look like in a little bit.

Let me just talk about, at least at the CS107 level, what a computer processor looks like. So I’ve already drawn all of RAM a couple of times; I’ll do it one more time. There it is; I don’t have to break it down, just stack and heap and code. That’s all of RAM.
In any modern computer, there’s usually – now it’s very often the case that there’s more than one of these things, but we’re just gonna assume a uniprocessor, okay, where this is relatively slow memory. You’re not used to hearing of RAM, which this is, as slow. But it’s slow compared to the memory that’s set aside in the form of a register set. I’m drawing it so that it looks like it’s one-fourth the size of RAM; it’s not; it’s much smaller. In our world, we’re gonna actually assume that all processors have 32 of these registers, where a register is just a general purpose 4-byte figure that we happen to have really, really fast access to. Okay? Does that make sense to people? I’m going to call this R1; I’m going to call this R2; I’m going to call this R3, and I’m going to draw a 3 instead of a 2, etc. And those are going to be the names for these registers. The registers themselves are electronically in touch with all of RAM. There’s a reason for that. Okay? And when I’m doing this, I’m just drawing this big network of silicon wires that are laid down microscopically – okay, not microscopically, but you know what I mean – so that every single register can, technically, draw information from and flush information to RAM. Okay? The 16 or the 32 registers right here are also electronically in touch with more electronics. It’s always drawn this way; I’m not really sure why. What’s called the “arithmetic logic unit,” or the ALU, is the piece of electronics that’s responsible for emulating what we understand to be addition and multiplication, and left shifting and right shifting of bits, and masking. All of the things that can be done very easily on 4-byte figures. Okay? Does that make sense to people? So we can support plus and minus, and times, and div, and mod, and double less than and double greater than, and double ampersand and double vertical bar, and all of these things because of this ALU right here. Okay?
It is electronically in touch with the register set. There are some architectures that use a slightly different scheme right here. I’m going with a processor architecture where the ALU is just in touch with these things right here, and it’s not directly in touch with general RAM. The implication of that is that all meaningful mathematical operations have to actually be done using registers. Does that make sense?

Now you may think that that’s kind of a liability, that you’d have to actually take something from memory and load it into a register in order to add 1 to it. Okay? Well, that actually is what happens. The alternative is for you to get this right here to be electronically in touch with all of RAM, and that would be either prohibitively expensive or prohibitively slow. Okay? So they can optimize on just load and store between registers and general RAM, and then, once they get something into the register set, that’s where they do all the interesting stuff.

So the typical idiom that is followed for most statements – anything mathematical – think i plus plus, or i plus 10 or something like that, is to load the variables from wherever they are, either in the stack or the heap, okay; load them into registers; do the mathematics; put the result in some other register, and then flush the result that’s stored in the register, out to where it really belongs in memory. Okay?

Just assume, without committing to assembly code right here, that that’s the space that happens to be set aside for your favorite variable, i. And it has a 7 inside of it. Okay? Let’s say that somewhere, probably close by, there is a 10. Okay? And you’re curious as to what happens on your behalf at the memory level in response to that right there. Well, truly, what happens is the j plus i compiles to assembly code and the assembly code is the recipe that knows how to load j and i into register set; do the addition and then flush the result back out to the same space that j occupies. Okay?

But just in terms of actually seeing how the 7 and 10 move around, what would probably happen is the 7 would be loaded into R1. The 10 would be loaded into R2. You can actually add the 10 and the 7 and store the result in R3 because these two registers are in touch with the electronics that are capable of doing that addition for you. And after you synthesize the result right here, you can flush it back out to j. So given what I’ve shown you so far, it’s not a useless metaphor for you to just think about assembly code instructions as byte shovelers, okay, to and from RAM, that also do very atomic, simple mathematics. Okay? Plus, minus, things like that. Okay? The components of assembly code that really are different are usually implementation. I think the EE aspects of computer architecture are very difficult, probably because I’ve never really studied it; I’m a computer scientist. Okay? But also, the part that’s interesting to us is how something that is stated, usually quite clearly, in C or C++ code actually gets translated to all of these assembly code instructions, such that the assembly code, which is in touch with memory, actually executes and imitates the functionality of the C and C++ code. Okay? If this is my C++ code, okay, then I need this to become a 17. How does the assembly code actually do that for me? How do I have this one-to-one translation, or one-to-many translation, between individual C and C++ statements, okay, and the assembly code instructions that have to be done in sequence in order to emulate that, right there? Does that make sense? Now, given these pictures over here, you shouldn’t be surprised that j doesn’t move as a variable while that statement is running. Okay? So it’s always gonna be associated with the same address in RAM. Okay? That’s good. You don’t want it flying around when you try to write a 17 back to it. Okay?
You may ask why they wouldn’t – what are the disadvantages of trying to get this to be in touch with general memory? I mean, if we have to have this complex matrix between the register set and this right here, okay, then why not just have this complex matrix of wires between general RAM and the ALU? It just makes the implementation of the hardware that much more complicated. There’s no way of getting around at least a small set of operations between RAM and the register set. Okay? At the very least, you have to have load and store instructions to move 4-, 2- and 1-byte quantities between RAM and the register set. Does that make sense? You have to have at least those. If you actually try to support addition between two arbitrary addresses, it can technically be done. It might make for a clock cycle that’s actually on the order of seconds. Okay? But it technically can be done. But hardware designers obviously want to be able to do any form of atomic operation at the hardware level, and they also want it to run as quickly as possible. So given that this right here would have to correspond to more than a few actions: load i into a register; load j into a register; do the addition; and write things back. There were, like, basically four verbs in that sentence, four actions that needed to happen in order to emulate that right there. What I’m loosely saying is this is a statement that will compile to four assembly code instructions: two loads, an addition and a store. If it tried to optimize and do all of this in one assembly code instruction, it could do it. Okay? It would actually probably require that this be in touch with that right there, but the clock cycle would have to get longer, because whenever you have a more complicated hardware implementation, it’s generally the case that the clock cycle time goes up.
It may go up – if it goes up by more than a factor of 4, then you’d actually prefer the really simple, load; do the work here and then flush out. Because in the end, you’re really worried about correctness but also speed. Okay? And if you have this very robust implementation, where everything can be done directly in memory, but the clock cycle is on the order of, like, milliseconds as opposed to nanoseconds – or not nanoseconds, microseconds, okay, then you actually prefer to go with the simpler idea, this load store architecture, where you always load things into registers, manipulate them in a meaningful, mathematical way and then flush the result out to memory. Okay? So I actually don’t have any time. I have 25 seconds; I can’t do very much, then. What I will do on Monday is I will start seemingly inventing assembly code instructions. And I will come up with syntax for actually loading into a register a 4-byte figure from general memory, doing the opposite, taking something that’s in a register and then flushing it out. And then, talking about what forms of addition and subtraction and multiplication are supported between two different registers. Okay? We’ll get –


Lecture 8: Programming Paradigms

Computer architecture
A simplified picture with the major features of a computer identified. The CPU is where all the work gets done, the memory is where all the code and data is stored. The path connecting the two is known as the "bus."
Handout 8: Computer Architecture
5 pages

Section Assignment 2
3 pages

Section Assignment 2 Solutions
6 pages

Handout 9: Simple Code Generation
10 pages

Topics: How a Code Snippet is Translated into Assembly Instructions, Store, Load, and ALU Operations, Assembly Optimizations for 4-Byte Addresses, Context-Insensitive Code Translation, Overriding the 4-Byte Default in Assembly Instructions, Translating a For Loop into Assembly, Using Branch Instructions and the PC Register, Pointer/Array Arithmetic in Assembly, Unconditional Branch Instructions (Jumps), How a 4-Byte Assembly Instruction is Encoded in Memory



Instructor (Jerry Cain):Hey, everyone, we’re online. I don’t have any handouts for you today. I started to introduce the next segment of the course as being the part where we actually cover computer architecture and assembly language. We’re gonna spend a lot of time trying to figure out exactly how our C and C++ code – although we’ll only deal with snippets in any one example – actually compiles to assembly code. We’ll talk a little bit about the memory model, talk a little bit about how function call and return works, and also expose you to, not a real assembly language, but at least a little mock one that we’ve invented for CS107 purposes, to kind of show you all of the little gears that are in place to get addition and multiplication and division and function call and return and pointers, and all those things, to actually work at the hardware level. Okay? Now, when I left you last time I had started talking about the stack segment. And if you remember, probably about halfway through Friday’s lecture I had hexagons and circles and triangles as placeholders for basically the skeletal figures that overlaid memory, and showed you how all the local variables in a particular function were actually packed together in what was called an activation record. In many ways, the assembly code that we’re going to be writing is really just these 4-byte instructions. They’re ultimately 0’s and 1’s, but they’re interpreted by the hardware to access variables within the triangles and the hexagons, okay, and pull them into registers, maybe add one to them, or add ten to them, or pass them to some helper function, just to get the assembly code to imitate exactly what your C and C++ code was written to do. Okay?

So here’s the stack segment. Let me just contrive this one little block of code. All right, int i, I have int j, and then I do i is equal to ten. And then I do j is equal to i plus seven, and then I’ll do j plus plus. Let’s just assume, not surprisingly, that i and j are packed together as some 8-byte activation record and they reside somewhere in the stack segment. I’m just gonna draw it randomly right here. Okay? And that address right there, I’m going to assume, is stored in one of those 32 registers that I named last time; I’m just going to go with the first one and call it R1. So R1 is a general purpose 4-byte bit pattern storer that happens to store the base address of the activation record for this code snippet. Okay? Now, you ultimately know that a ten has to be placed there, and that an 18 will eventually go there. Okay? Does that make sense to people, I’m assuming? Okay. But I’m actually focused not so much on the numbers themselves as on the assembly code that does that relative to this address right here. So I’m going to make the assumption that the base address of the two variables that are relevant here is stored in a special dedicated register. I’m going to call it R1 for now; I’ll change the name in a little bit. And I’m just gonna illustrate by example what assembly code instructions look like to actually put a ten right there, and then to pull it into a register, add seven to it and flush it back out to the space for j. Okay?

In order to get a ten into that space right there, we don’t actually deal with that address specifically; we just deal with an address that’s at an offset of positive four from the address stored right there. Okay? The notation for doing this: this one line would compile, in our little mock assembly language, to that. And that’s our first assembly code instruction. Okay? That capital M, it’s like it’s the name of all of RAM. You think about all of RAM as this big array of bytes; M is the name of that thing, R1 is the base address, four is an offset. This right there identifies the base address of the 4-byte figure in all of RAM that should get a ten. Does that make sense to people? Okay. So this is an example of a store operation. And it’s called a store operation because it actually updates some region in the stack segment with some new value. The very next few instructions are in place to actually take care of j is equal to i plus seven. I don’t want to bank on the fact – or rely on the fact – that I know, because of this small example, that i is equal to ten. I wanna do the most robust thing, which is to actually go and fetch the value of i, pull it into a register where you’re allowed to do addition, add a seven to it, and then flush the result out to M of R1. Does that sit well with everybody? Okay.

So don’t bank on the fact that there happens to be a ten right there. An optimizing compiler might take advantage of that, but we’re not writing an optimizing compiler; we’re just trying to brute force our way through the translation process right here. I wanna do this: into R2, another general purpose register, I wanna load the 4-byte bit pattern that happens to reside right here. Now, this is an example of a load operation. Okay. Let’s make it so that that really looks like a load. R2 contains the 4-byte bit pattern for what’s residing in i right there, so this corresponds to the understanding that we’re going to operate on i’s value. Into R3, another general purpose register, I’m going to take whatever R2 has as a bit pattern and add the immediate constant seven to it; this is an ALU operation. It is very often the case that on the right-hand side of an ALU operation there are either two registers, or there’s a single register and a constant, and that wherever the result is to be stored is identified on the left-hand side. Okay. That probably makes sense to people. This bit pattern is the thing that has to be replicated in the space that’s really set aside for j, and we know where j is because we have R1 storing the base address of it. So right here, I have yet another store operation. So two of the registers, R2 and R3, were piecemeal updated with a ten right there and a 17 right there, and then the 17 is flushed right there. Okay. Does that make sense to people?

This load-ALU-store sequence is actually very common, and exists more or less on behalf of any kind of assignment-oriented statement in C or C++. Okay? The very next line really expands to j is equal to j plus one, so it has a structure that’s very similar to this. I can reuse registers. I can even discard a temporary value that’s in R2 if, for whatever reason, I want to. I don’t have to write to R3 right there, and then I can do M of R1 is equal to the new value that ended up in R2. Okay? I didn’t have to be that efficient about the use of registers, but it’s not a problem if I don’t need the old value of j and I just want to deal with the incremented value. It’s fine to write to R2 and overwrite its old value. So this has the same exact structure as that right there; it just happens to be dealing with one variable in RAM as opposed to two, and this is how this goes from a 17 to an 18, and R2 would also have an 18 inside of it. So a few things to say about this: by default, all of the load and store and ALU operations deal with 4-byte quantities. Okay? It’s clearly the case that pointers and ints are probably by far the most common atomic types you deal with in programming, at least in C and C++, so the hardware is set up to optimize, by default, dealing with 4-byte figures. Okay. That doesn’t mean we can’t deal with isolated characters and shorts, or that we can’t deal with doubles and structs as a whole. But I’m more interested in being 4-byte oriented with all of my instructions here. Okay.

A lot of you may be questioning why I bother to do this. You could say, “Well, why don’t I just set R3 equal to ten plus seven?” Every single one of these blocks – this corresponds to that statement, these three correspond to that statement, these three correspond to that statement. I want every single block of assembly code instructions to be emitted or rendered in this context-insensitive manner. And the reason I’m saying that is because this code is going to work out and do the right thing even if I change this line right here. Do you understand what I mean when I say that? Okay. If I were to hard code in an M of R1 is equal to 17, then that would stop being correct if I changed this line, which was supposed to be completely unrelated to it, to, like, a 12 or a 100. Okay? Does that sit well with everybody? Okay. The same thing with this. You could argue, “Well, why don’t I just go in and do M of R1 ++?” Our assembly code instruction set doesn’t allow ALU-like operations to be performed on arbitrary memory addresses. You always have to go through the load, do an ALU operation on just register values, okay, and then flush the result back out to memory. It more or less makes for a simpler assembly language, and it also makes for a faster clock speed when things are that simple. Okay. Make sense to people?

Let’s deal with an example that doesn’t always deal with 4-byte quantities. Let me go ahead and do int i, char ch – actually, I don’t want to do that – [inaudible]. Let’s do short S1, short S2. Okay. The way I’ve declared these, we would be dealing with an activation record that looked like this: i’s right there; this is where S1 is gonna be, this is where S2 is gonna be. Okay? That looks a little weird, but S2 is the last of the three declarations, and so it’s gonna be at the lowest base address according to my model. Okay. I’m always gonna give you variable declarations that work out nicely and that are an even multiple of 4 bytes, so that the pictures are always fully rectangular, with no, like, chunks or nips out of the corners. If I do this, that’s more or less the same as that instruction up there, with a different number. This would translate to M of R1 plus four, assuming that the register R1 has been set to point to that right there. R1 plus four, memory [inaudible] reference, is equal to 200. Okay. That means that, as a 4-byte figure, the bit pattern for 200 is laid down. Now, 200 is a pretty small number, so in a big-endian world, we’d expect one byte of zero followed by another byte of zero, followed by a third byte of zero, followed by one byte that happens to have all of the 200 in it. Does that sit well with everybody? Okay. This line has to somehow update S1 – let’s actually write it – to be equal to, logically, 200. But it’s only supposed to update two bytes of memory. Does that make sense? Well, if you wrote this, your heart would be in the right place, but there are two fairly relevant errors going on here. First of all, you can’t do a load and a store in one single operation. Okay. Because that would require assembly code instructions to somehow encode in four bytes the source memory address and the destination memory address, and that’s difficult to do.
So what we wanna do is we want to evaluate i as a standalone expression, and just do this: R2 is equal to M of R1 plus four, again I’m being context insensitive about it.

That means that R2 is gonna have that as a bit pattern inside of it. And then what I want to happen is I somehow want to update these two bytes right here with the 200, but I only want those two bytes to be copied. That’s consistent with what we know happens at the C and C++ level. Okay. Now, if I do this, that won’t do what we want, because the assembly code that I’m writing right there has absolutely no memory, as it’s executing, of what C or C++ code was in place to generate it. And the way I’ve written it right there, it’s just like so many other operations I’ve put up here. This would be taken as an instruction to update the four bytes from this address through this address plus three, okay, with new information. And it would update that and that and that and that with four new bytes. Does that make sense to people? I don’t want that to happen. This instruction right here would put a zero and a zero and a zero and a 200 right there, and then all of a sudden, S1 would be initialized to zero, and i would become some very, very large number. Okay? That’s because I’m mistakenly dealing with a 4-byte transfer here and I don’t want that. In our world, you override the 4-byte rule by actually putting a little dot-2 right there. It’s as if all of these other instructions have an implicit dot-4, but we don’t bother writing dot-4 because dot-4 is just the default. But when you only wanna move around a single half-word, which is two bytes, or a single byte, you’d put dot-2 or dot-1 there. Okay? That’s an instruction that says, when we’re sourcing from a register, just take the lower half of it and update the two bytes that begin at this address right here. Okay? And that updates this with zero and 200, and that’s how S1 becomes 200, and that’s how i is left alone, with its 200 as a 4-byte quantity. Okay? Does that make sense to people? Okay.
If I do this, S2 is equal to S1 plus one, it’s very similar to the j++ up there, but we have a lot of dot-2’s that are in place to make sure that only two bytes are moved around at any one moment.

This right here would have to load S1 into a register. This is how I would do that: R2 is equal to M – whoops – M of R1 plus two. But the way I’ve written it right there, it will copy four bytes as opposed to two bytes, unless I do this. What that does is forget about the old value and set R2. When it pulls two bytes into a register, it lays these two bytes in the lower two bytes of the entire register, and then it sign extends it just like it would in C and C++. So it’s padded with two bytes of zeros right there. Okay. Now this plus one is just traditional plus one. R3 is equal to R2 plus one; that’s what takes this from 200 to 201. And then when I flush the result out to this other space that has just two bytes associated with it, I do it this way: M of R1 equals, dot-2, the result that’s stored in R3. Okay. So I understand that these examples aren’t all that riveting from an algorithmic standpoint, I’m really just trying to illustrate our assembly code language as operations on general memory where the general memory is framed as an array of bytes, but everything is taken in 4-byte chunks, by default anyway. Okay. Yep, go ahead.


Is there a reason why you used a third register in that last –

Instructor (Jerry Cain):

As opposed to there, no.


– second one?

Instructor (Jerry Cain):I actually – I’m inconsistent with my use of registers, in terms of, like, the conserving. I just conserve them if I anticipate having a lot of them in the example. I just did it there, I should have put R2. I thought of that as I was writing it. Okay. Any other questions at all? Yeah?


What is the [inaudible]?

Instructor (Jerry Cain):M of R1 identifies this space right there; that corresponds to S2. Okay. So that’s receiving some 2-byte bit pattern because of that dot-2 right there. Does that make sense? R3 stores the incremented value, or the result of the plus one operation that’s right there. So a full evaluation of the right-hand side ultimately made it into R3. I only have room for the bottom two bytes, which is why I have that dot-2 right there. Okay. So this is how I get zero, 201 right there. Okay.
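
The dot-2 discussion above can be sketched in C as a tiny simulation. This is only a sketch under stated assumptions: the helper names `store`, `load`, and `run_frame` are made up for illustration, memory is modeled as a big-endian byte array (which matches the "zero, zero, zero, 200" picture in the lecture), and the frame layout of s2 at offset 0, s1 at offset 2, and i at offset 4 is inferred from the offsets used in the instructions.

```c
#include <stdint.h>

/* Hypothetical helper: copy the low nbytes of a register into memory,
   most significant byte first (big-endian is an assumption). */
static void store(uint8_t *mem, int addr, uint32_t reg, int nbytes) {
    for (int k = 0; k < nbytes; k++)
        mem[addr + k] = (uint8_t)(reg >> (8 * (nbytes - 1 - k)));
}

/* Hypothetical helper: load nbytes into the low end of a register and
   sign extend when fewer than four bytes are loaded. */
static uint32_t load(const uint8_t *mem, int addr, int nbytes) {
    uint32_t reg = 0;
    for (int k = 0; k < nbytes; k++)
        reg = (reg << 8) | mem[addr + k];
    if (nbytes < 4 && (mem[addr] & 0x80))
        reg |= ~(uint32_t)0 << (8 * nbytes);
    return reg;
}

/* Runs "i = 200; s1 = i; s2 = s1 + 1;" where the store to s1 moves
   nbytes bytes. Pass 2 for the dot-2 store, 4 for the mistaken default. */
static void run_frame(int nbytes, uint32_t *s1, uint32_t *s2, uint32_t *i) {
    uint8_t frame[8] = {0};            /* R1 is the address of frame[0] */
    store(frame, 4, 200, 4);           /* M[R1 + 4] = 200               */
    uint32_t r2 = load(frame, 4, 4);   /* R2 = M[R1 + 4]                */
    store(frame, 2, r2, nbytes);       /* M[R1 + 2] =.2 R2 (or 4 bytes) */
    r2 = load(frame, 2, 2);            /* R2 =.2 M[R1 + 2]              */
    uint32_t r3 = r2 + 1;              /* R3 = R2 + 1                   */
    store(frame, 0, r3, 2);            /* M[R1] =.2 R3                  */
    *s1 = load(frame, 2, 2);
    *s2 = load(frame, 0, 2);
    *i  = load(frame, 4, 4);
}
```

Calling `run_frame(2, ...)` leaves i at 200 with s1 at 200 and s2 at 201. Calling `run_frame(4, ...)` zeroes s1 and overwrites the top half of i, which is the "very, very large number" failure described above.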

Now there are other things that go on, but as far as – interestingly enough, except for function call and return, you’ve seen almost all the mechanics of the assembly code language that I want to teach you. Okay.

Let me do this, let me deal with an array of length ten – no, actually that’s too big. An array of length four is big enough, and then I have a standalone integer. And I just wanna go ahead and figure out what this for loop will translate to. And with each iteration, I just wanna update some value in the array to be equal to zero. After it’s all over I wanna do i minus minus.

Just to say something silly right here, do you understand how this code is simple enough that it’s just executed sequentially? The assembly code is executed in sequence as well. First clock cycle it updates memory according to that rule right there; next clock cycle it does that and then that and then that and that and that and that. And over the accumulation of six clock cycles, we’ve effectively realized these three C statements. Memory is updated in a way that’s consistent with the way these three lines have been written. Okay.

There’s some looping going on there, clearly. If there’s looping in a language, then you shouldn’t be surprised that there are at least directives in it in the assembly code language to jump back an arbitrary distance to start over some loop again. Okay. Or with the case of, like, if statements and switch statements; they’re based on the result of some test, you actually jump forward four instructions or 12 instructions to the point it starts executing the else clause or some particular case statement. Okay. Does that make sense to everybody? Okay.

So as I write this, I should write assembly code, and you’re familiar with how that and that would be executed, maybe to some degree that, as well, although you haven’t seen arrays in the context of assembly language, but the test is new to us. The i minus minus is not, but the actual looping certainly is.

You know how there are six different relational operators that can be set in between any two integers? Less than, less than or equal to, greater than, greater than or equal to, double equals, not equals to. It’s typically the case in most assembly languages that they have branch instructions that are guided by the same six types of relational operators. Okay.

Let me just write the code for this. Let me draw the picture; the way this would be laid out is that I’d have 16 bytes with four more bytes below it, that’s i, this is array of zero through three, but I’ll emphasize that this is the zeroth, the oneth, the twoth, and the threeth. Okay. And assume that R1 points to the base address of the entire figure. The actual hardware will make sure that the base address of the currently relevant activation record – that that address is stored in some register; I just happen to be calling it R1 at the moment. Okay.

So we come here. We assume that R1, as a register, stores the address of that picture up there. And the first thing that happens is that this thing gets executed exactly once because it’s in the [inaudible] portion of the for loop, obviously. So what happens is that right up front, M of R1 is set equal to zero. And that gets a zero right there. Okay.

We next execute code on behalf of this to decide whether we’re going inside the body of the for loop, or we’re circumventing it because the test failed. Okay. These are the assembly code instructions that would be expanded on behalf of this test right there. I’ll put a double line right there to mean that there’s some new C statement that we’re dealing with.

I would have to load into R2 the value that’s at M of R1. Okay. Again, I’m starting to generate code in a context insensitive manner, I don’t want to assume that there’s a zero in there. In fact, you’ll see in a little bit that there won't always be a zero in M of R1. So I wanna load that into a register, and then I have this branch instruction, b, and I’m gonna leave a blank and a blank right there. We’re gonna fill this in with some abbreviation for not equals or greater than or equal to or whatever. We’ll decide what it is in a second. It takes three arguments. It takes – whoops – the first register in the comparison or a constant, the second register in the comparison or a constant, in this case it’s four; and then it takes as a third argument what’s called a target address, the place to jump to if the branch instruction passes. Okay. Now we jump forward several instructions – I’m sorry, let’s say one C instruction here if this test fails. But if I take the logical inversion of this, and I jump forward when that inverted test passes, okay, then I’m circumventing the for loop. Does that make sense to people? This is where code for the array of i equals zero will be placed. When this test fails, this test needs to pass so that I jump forward a certain number of assembly code instructions to the part that actually executes that right there. Okay. That means that I want to branch, I want to circumvent the normal pattern of advancing to the next assembly code instruction if this as a number is greater than or equal to four. Okay? And so the abbreviations for these branch instructions shouldn’t surprise you; they are: branch on equal, branch on not equal, branch on less than, branch on less than or equal to, branch on greater than, branch on greater than or equal to. Okay. So if R2, which stores the current value of i, is greater than or equal to four, we know the loop is over, so we wanna jump forward as if the loop never existed before. Okay. We have to fill this part in.
All I can tell you right now is that the address here is gonna be framed as some offset relative to a special register I call PC. Now, PC is really gonna be like the 27th or the 29th or the 31st of the 32 registers, we just give a better name for it. PC stands for program counter, and it stores the assembly code instruction of the currently executing instruction. Okay. I’m sorry, it stores the address of the currently executing instruction. We never know what PC really is, but if this is PC, this is PC plus four, PC plus eight, PC plus 12, PC plus 16, et cetera. Does that set well with everybody? And with each clock cycle, as part of each clock cycle, it by default just updates PC to be four larger than what it was before because all of our assembly code instructions are 4-bytes wide. Okay. And by default, it always just advances to the next one unless some jump instruction or some branch instruction like that tells us to do otherwise. Okay. I’m leaving a question mark right here because I just don’t know how many lines array of i equals zero is gonna translate to. Okay. All I can tell you right now is that the offset is gonna be positive, and that it’s going to be some multiple of four. So something like plus 16 or plus 12 or plus 24, we have no idea what yet. Okay.

If this branch instruction fails, it’s because this test passed which means I’d have to just fall right to the next line of C code, which means I’d have to fall to the next line of assembly code, which implements this right here. Okay. Now you know enough about pointer math, this is easy pointer math. But this has to translate at the assembly code level to something that finds the address of the ith figure in the array and writes a zero there. There’s implicit scaling of i times size of integer here; does that make sense? The scaling over here has to be explicit. Okay. If you just write assembly code by hand, which you can do, and you’re just really thinking in C and C++ terms while you’re doing it, you have to make sure that you assign to an offset of plus zero or plus four or plus eight relative to the base address of the array when you’re writing a zero. If you’re trying to emulate a for loop inside assembly code, then you have to make sure you take care of the pointer math explicitly. So what I wanna do is I want to reload the value of i because I am being context insensitive about how I use variable values. It’s a zero, a one, a two, or a three; we know that. But then what I wanna do, is I wanna take R3 and I wanna multiply it by four. Now that four is there because ints are actually four bytes. I can't write size of int here like I encourage you to, not in this example but whenever you have to manually deal with type sizes in C and C++ code, because this has nothing to do ultimately with C or C++. Okay. It happens to be imitating the execution of a C++ program, but all size and type information at the assembly code level is completely gone. It just has to be the case that the code is written in a way that’s consistent with what this is intended to do. Okay. R3 has the value of i, R4 has the value of i scaled by four, so now R4 has the distance in terms of bytes from the base address of array to where the zero has to be placed. Does that make sense?
What is the value of array? We know that it’s ampersand of array of zero, right? Array of zero resides right here. Okay. So the ampersand of array of zero, which is synonymous with just plain old array, it is synonymous with R1 plus 4. So I’m gonna do this. This stores the offset; this stores the base address of the entire array. R6 is equal to R4 plus R5 – whoops – R6 has the address within the array that should get a zero on this particular iteration. Okay.

Every single iteration – R5 gets the same value every single time, but R4 certainly does not. So that means that M of R6 is equal to – I’m sorry, M of R1 – oh, no, I’m sorry, that’s right. M of R6 is equal to zero; R6 holds the address of the place in the stack activation record that should get the zero and that’s why a zero goes there. These five lines right there, it’s complicated, but they actually are in place to emulate that line right there. If we were writing a CS107 assembly language compiler, this line would translate to these five lines or something that’s equivalent to it. Okay. Then what happens next unconditionally, is that we jump back up here to i plus plus, and we execute this. That just does this – R2 is equal to M of R1; R2 is equal to R2 plus 1; M of R1 is equal to R2. And then we know that we execute the test again. So what happens here? Rather than actually writing the code for the test again, you go back and you reuse the same code you wrote for it the first time. So this is an unconditional branch. Okay. We don’t use the word branch, we just use jmp because they didn’t have room for the u – jmp, and you actually jump to a hard-coded address, but we always frame it in terms of the current PC value. So it’s not PC itself, it’s PC minus some value. I wanna jump back to this line right here; the line that loads i, compares it to four and decides what it’s gonna do. Okay? This minus has to be scaled by four, but I wanna jump back one, two, three, four, five, six, seven, eight, nine, ten instructions. Does that make sense to people? So this would be a PC minus 40. Okay. Right here, I’m out of room, but this is where I would continue. This is where the code for i minus minus would go, and it would be assembly code that’s emitted for i minus minus that has nothing to do with this for loop.
And it doesn’t bank on the fact or the understanding that i would be equal to four at the time it gets there, it would just do the R2 is equal to M of R1; R2 is equal to R2 minus one, and then flush it back out to i. The reason I’m writing that there and the reason I have this here is because I wanna make it clear that this is the place where we should be jumping forward to at the assembly code level when that test passes. Okay? When this test fails, I just do the implicit update of PC to PC plus four. You don’t have to write anything for that; that just happens by default. If you wanna override what gets used as the new PC value, you have to set PC right here to be PC plus one, two, three, four, five, six, seven, eight, nine, ten instructions. Okay. Times four, so this question mark would become a 40. Okay? Do you guys understand why those numbers are 40?

You may think that this minus value right there and that plus value there have to be exactly the same, that’s not always the case. Okay. It just depends on how complicated the test is, but sometimes this right here, which is the part that evaluates the i, this could be something that’s arbitrarily complicated, like i plus 24 plus j or something like that. So it might actually be a lot of code that would have to be accounted for in this jump right here. Okay. But this plus 40 deals with an offset from this to the line after the unconditional jump back; it may be a smaller value. Okay. So they’re not always the same number. Okay. I think you understand, even if the assembly code is weird for you at the moment, you understand that it really is this brute force translation of this right here to code in a language that just thinks about moving 4-byte quantities around by default. Okay. Short action branch instructions, it has some unconditional branches and some conditional branches. There are gonna be a few more things in place, but really you’ve seen like 70 percent of the language already. Okay. You guys get what’s going on here? Okay. Very good. Questions? Yeah, right there.
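
The whole translation can be mirrored in C with gotos standing in for the branch and jump instructions. Treat this as a sketch, not the course's official answer: the function name `run_for_loop` is invented for illustration, the registers are ordinary local variables, and the frame is modeled as 20 raw bytes with i at offset 0 and the array at offset 4, as in the picture described above. The mnemonics in the comments (BGE, JMP) are shorthand for the branch and jump described in the lecture.

```c
#include <stdint.h>
#include <string.h>

/* Emulates: for (i = 0; i < 4; i++) array[i] = 0;  i--;
   Copies the final array into out and returns the final value of i. */
static int32_t run_for_loop(int32_t out[4]) {
    unsigned char mem[20];            /* frame: i at +0, array at +4..+19 */
    unsigned char *r1 = mem;          /* R1 holds the frame's base address */
    unsigned char *r5, *r6;
    int32_t r2, r3, r4;
    const int32_t zero = 0;

    memset(mem, 0xFF, sizeof mem);    /* garbage, so the zeros are visible */

    r2 = 0;
    memcpy(r1, &r2, 4);               /* M[R1] = 0                         */
test:
    memcpy(&r2, r1, 4);               /* R2 = M[R1]                        */
    if (r2 >= 4) goto after_loop;     /* BGE R2, 4, PC + 40                */
    memcpy(&r3, r1, 4);               /* R3 = M[R1]   (reload i)           */
    r4 = r3 * 4;                      /* R4 = R3 * 4  (explicit scaling)   */
    r5 = r1 + 4;                      /* R5 = R1 + 4  (base of array)      */
    r6 = r5 + r4;                     /* R6 = R4 + R5                      */
    memcpy(r6, &zero, 4);             /* M[R6] = 0                         */
    memcpy(&r2, r1, 4);               /* R2 = M[R1]   (i++)                */
    r2 = r2 + 1;                      /* R2 = R2 + 1                       */
    memcpy(r1, &r2, 4);               /* M[R1] = R2                        */
    goto test;                        /* JMP PC - 40                       */
after_loop:
    memcpy(&r2, r1, 4);               /* R2 = M[R1]   (i--)                */
    r2 = r2 - 1;                      /* R2 = R2 - 1                       */
    memcpy(r1, &r2, 4);               /* M[R1] = R2                        */

    memcpy(out, mem + 4, 16);
    return r2;
}
```

Counting from the load at `test` to the `goto test` line gives ten instructions in each direction, which is where the plus 40 and minus 40 come from when every instruction is four bytes wide.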

Student:If you wanted to, instead of jumping back ten lines, could you just jump back nine because the top line is R2 equals MR1 and you [inaudible] right before the jump set MR1 equals R2?

Instructor (Jerry Cain):You could. It actually – it would certainly be correct in the sense that it would do the right thing. That would just be taking advantage of the fact that the register happens to have the right value, but I just wanna be consistent. I’m not going to enforce this – I’m not gonna police the matter to the point where I actually yell at you about it, but I’d actually like you to just get in the habit of generating code in this context insensitive manner. And this load right here, this one has to do with the fact that there’s an i present in the test; does that make sense? This i – I’m sorry, this R2 being set to that right there, okay, just happens to be associated with this i plus plus right here. And as it turns out, actually – I’m sorry, this right here has the right value. It’s – I don’t want to say it’s a coincidence, because it’s not a coincidence, it’s a for loop and it’s the traditional idiom with the for loop. But I’d rather you just be fastidious about just generating it and, like, basically going brain dead about what you’ve generated code for before because then you know you’re always right, okay, regardless of whether or not this changes.

You want the code that’s emitted on behalf of this right here and that right there to be the same every single time, or at least allow it to work if it’s the same every single time, regardless of what you do right here. Okay. Yeah?

Student:If all of the translations of every single translation translates into 32 bits –

Instructor (Jerry Cain):That’s right. We’re dealing with assembly code language where all instructions are four bytes wide. Okay. Thirty-two bits, all of them, yeah. Okay?

Does that make sense to people? Okay. Let me explain a little bit how 32 bits are usually just enough for you to encode an instruction, just to understand what encoding is like. When a 4-byte figure is understood to be an assembly code instruction, it’s typically subdivided into little packets or little sub-packets. Here is a 4-byte instruction, and I’ll just draw very loose boundaries here for the bytes. We’re used to instructions like this: R1 is equal to M of R2 – I’m sorry, M of R2 plus four. Something like R1 is equal to a constant is not unusual either. Maybe you see something like this: R3 is equal to R6 times R10. And then something like M of R1 minus 20 is equal to let’s say R19. A load, a direct immediate constant load, an ALU operation, and a store; these are little bit more elaborate than you’d see in practice, but nonetheless we have to be able to support them.

This type of instruction and that and that, they’re certainly different from one another. This is a load, this is an ALU operation, this is a store. You understand what I mean when I say that. I even argue that this right here is technically a different type of instruction than this one because this is framed in terms of a register and an arbitrary memory address, and this is framed in terms of the constant. Does that make sense?

Let’s say that I’ve decided that my assembly code language has let’s say – let me – 59 different types of instructions. I have to somehow encode in this four bytes right here, which of the 59 instructions we’re actually dealing with. Does that make sense? Well, 59, unfortunately, isn’t a perfect power of two, so if I really want to be able to distinguish between 59 patterns, I have to be able to distinguish between, it turns out, 64 different patterns, okay? So I might set aside the first six bits of all 32 bits of – of all 32 of them to be the part that the hardware looks at to figure out what type of instructions should be executing. Maybe it’s the case that this corresponds to – in this space right here would be called an operation code, or an op code; maybe it’s the case that all zeros means this type of load instruction. Maybe this right here is the op code for an immediate constant load into a register. Maybe this right here, being a multiplication, corresponds to that type right there, and then that right there might be let’s say all ones. Does that make sense to people?

When the hardware looks at an assembly code instruction, it doesn’t actually see these. This is an assembly language instruction that makes sense to us, but it’s actually expressed in the hardware as a machine code, which is 32 zeros and ones. It would actually have to look at the first six during the first few percents of the clock cycle, okay, to figure out how to interpret the remaining 26 bits. Does that make sense to people? Okay. Maybe this is the type of instruction that allows any one of 32 registers to be updated; it allows any one of 32 registers to be the base right there, and then it allows all of the other bits to express this as a signed constant. Do you understand what I mean when I say that?

Okay. There are 32 possibilities for this, there are 32 possibilities for that, and there’s however many possibilities are allowed based on how much room we have left to encode that, we would need five bits to encode which register gets the update, five bits which determines the base address and the memory offset. And then maybe it’s the case that all the remaining bits, in theory, hardware – people who actually still believe would laugh at this, but in theory, the remaining whatever 16 bits could be used to express a signed offset from this right here. Okay. And I draw this subdivision of five and five and 16 right there. That subdivision’s only relevant when the first six bits happen to contain all zeros. Does that make sense to people?

For this right here, all I would need to do is set aside five bits. If it read zero, zero, zero, zero, zero, one right there, then it would say, “Oh, you know what? The hardware is implemented in such a way that when there’s all zeros followed by a one right here, that the next five bits tell me which of 32 registers I’m assigning to, and all of the remaining bits, all 21 of them, can be expressed as a signed integer that actually gets put into that space.” Okay. So the subdivision scheme from bit seven forward, and how it’s interpreted, depends on what the op code says.
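
Here is a rough C sketch of the two layouts just described. The field widths (6-bit op code, 5-bit register numbers, 16-bit and 21-bit signed constants) come from the lecture; everything else is an assumption made for illustration, including the function names, the placement of the op code in the most significant bits, and the choice of op code 0 for the load and op code 1 for the constant load.

```c
#include <stdint.h>

/* Load, R[dest] = M[R[base] + offset]:
   | 6-bit op code = 0 | 5-bit dest | 5-bit base | 16-bit signed offset | */
static uint32_t encode_load(unsigned dest, unsigned base, int offset) {
    return ((uint32_t)0 << 26)
         | ((uint32_t)(dest & 31u) << 21)
         | ((uint32_t)(base & 31u) << 16)
         | ((uint32_t)offset & 0xFFFFu);
}

/* Constant load, R[dest] = value:
   | 6-bit op code = 1 | 5-bit dest | 21-bit signed constant | */
static uint32_t encode_const(unsigned dest, int32_t value) {
    return ((uint32_t)1 << 26)
         | ((uint32_t)(dest & 31u) << 21)
         | ((uint32_t)value & 0x1FFFFFu);
}

/* The first six bits are examined first; they decide how to read the rest. */
static unsigned opcode_of(uint32_t word) { return word >> 26; }
static unsigned dest_of(uint32_t word)   { return (word >> 21) & 31u; }
static unsigned base_of(uint32_t word)   { return (word >> 16) & 31u; }

/* Sign-extend the low 16 bits (meaningful only when the op code is 0). */
static int32_t load_offset(uint32_t word) {
    int32_t v = (int32_t)(word & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Sign-extend the low 21 bits (meaningful only when the op code is 1). */
static int32_t const_value(uint32_t word) {
    int32_t v = (int32_t)(word & 0x1FFFFFu);
    return v >= 0x100000 ? v - 0x200000 : v;
}
```

Under these assumptions, R1 = M[R2 + 4] would be `encode_load(1, 2, 4)`, and a decoder would call `opcode_of` before deciding whether `load_offset` or `const_value` applies to the remaining 26 bits.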

Now ultimately what happens at the EE level is that these are all taken as instructions as to how to propagate signals through the hardware so that the signals at the beginning of the clock cycle look like the ones and the 17’s and the ten’s and the 200’s that we’ve dealt with in the prior examples. Okay. That’s the extent of my understanding of EE, the way I just said that. Okay. But you get the principles of what I’m trying to say, right?

Okay. What else did I wanna say? There’s absolutely no requirement that all op codes be exactly the same size. You all did – a lot of you did the – I’m sorry, there’s this one assignment that we use in 106X called the Huffman encoding assignment. Those who have heard about it know about it. 106B doesn’t do Huffman encoding, do they? Okay. Well, that’s an example where they actually have variable length encodings. Here we have a constant length encoding for all op codes. It could be the case that one of the op codes could just be this right there. Okay. And then some other op codes would have to be longer, so maybe this is another op code. It would only require that its first three bits don’t happen to coincide with something that could be interpreted as a 3-bit op code. Does that make sense to people? That may seem like a silly thing to do, except that this might be the type of instruction that benefits from having lots and lots of bits set aside for some unsigned integer offset. Okay. Or it might be the one that’s most popular so it wants the most flexibility in how it expresses its arguments. Okay. I don’t wanna say that this is not common; I think it actually is common. It certainly comes up in MIPS, which is the assembly language that EEs study in EE 108B, I think it is now, and CS majors currently have to as well. Okay. I’m just gonna assume for all of our examples that this is the case, that we have constant op code lengths just because it’s simpler to just rely on that type of information. Okay? Does that make sense to everybody? Okay. So there are a couple things I can do in the final six minutes. Before I start talking about function call and return, which I’ll get to on Wednesday, I should talk a little bit about structs, but more importantly, I should talk about pointers and casting. That’s the part that makes C hard, but somehow compiles.
And when it compiles, it means it compiles to something like this, but in – not in CS107 assembly language, but like in x86 or whatever, or, like, MIPS or whatever the target language happens to be.

Let me do one example here that’s framed in terms of the struct fraction we dealt with in lecture three or four. Struct fraction: int num, int denom, and that’s it. And I declare a struct fraction called pi, and I do this: pi dot num is equal to 22. The way I told you that structs were laid out in the third or fourth lecture, that’s actually true on virtually any architecture that I know of: that the struct is packed, that the first field is at the lowest address and everything is stacked on top of it. And if I declare this one variable so that this is logically functioning as my pi variable, then R1, in our world at the moment, stores the base address of the one variable that’s there. But the one variable actually knows how it’s decomposed into smaller atomic types. Okay. When I do this, not surprisingly, this actually translates to M of R1 is equal to 22. If I do this, pi dot denom is equal to seven, I get M of R1 plus four is equal to seven. So that’s easy, except for the fact that you’re dealing with structs and you have to understand that the assignments have sort of forgotten about the structs and just are really updating individual integers inside the struct. The part that’s interesting, transitioning to slightly more scary stuff, is if I do this ampersand of pi dot – let me rewrite this. Ampersand of pi dot denom, if I write it that way and then I cast it to be a struct fraction star, I’ll abbreviate there, and then I do this. Forget about the assembly code for a second, you know how memory is supposed to be updated; it’s a little weird to see this type of thing, but it’s irrelevant that it’s weird because it’s legal C code, and it’s supposed to compile to something. This says, “Identify the l-value of pi dot denom. Where is it located?” Okay. Stop pretending it’s a standalone int and think that it’s the base address of an entire fraction, go to its ghost denom field and put a 451 there, okay?

The order at which things are kind of realized here, is it discovers this, it evaluates that address, it casts it to think that that right there is the base address of not a standalone int inside a struct, but the base address of an entire struct fraction, and a 451 needs to be placed there. Okay. What assembly code instruction, there’s only one of them that’s needed, what’s the assembly code instruction that would need to be in place in order for that 451 to be placed there? It just translates to this. Now you may think I’m cheating there and I’m actually doing work, but this right here is understood to be an offset of four from R1; the address is four beyond that base address. Just because I cast it to be a struct fraction star doesn’t change the value of the address, it just has a different idea as to what is at that address. It’s like the compiler puts on a different set of glasses all of a sudden when it’s looking at this one address right here, and it knows that at the denom field that you’d get by de-referencing this, “Oh, it’s a struct fraction because it says so,” it’s an offset of four beyond what it was already an offset from, R1, and then it puts a 451 there. So all the energy that’s in place to compile this, either by hand or by code if you wanna write your own compiler, it actually can discover that this is really referring to an integer that’s presumably at an offset of eight from R1, which is where pi originally lives. Okay. Does that sit well with everybody? So the cast – when you put a cast in place, at least in pure C code, there’s no assembly code instruction that gets generated as a result of the cast operation. All the cast does, is it allows the compiler to generate code that it otherwise wouldn’t have been able to generate code for because it’s like taking a little permission slip to behave differently, okay? Does that make sense?
It wants to trust that there really is some legal interpretation associated with the code that it will emit by seeing that struct fraction cast right there. Okay?
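
The cast example can be tried out in C with one change that is not in the lecture: the sketch below declares two adjacent fractions so that the write past the end of pi lands in memory the program actually owns. The function name `cast_demo` and the array `frame` are invented for illustration, `frame[0]` plays the role of pi, and the result is read back with `memcpy` so the check does not depend on how an optimizer treats the aliased write. Strictly speaking, the C standard does not promise that this kind of cast behaves well, which is part of the lecture's point about the cast being a permission slip.

```c
#include <stddef.h>
#include <string.h>

struct fraction {
    int num;
    int denom;
};

/* Performs ((struct fraction *)&pi.denom)->denom = 451, reports how many
   bytes past the base of pi the write landed, and returns the int found
   at that location afterwards. */
static int cast_demo(size_t *byte_offset) {
    struct fraction frame[2] = {{0, 0}, {0, 0}};   /* frame[0] acts as pi */

    frame[0].num = 22;      /* M[R1] = 22     */
    frame[0].denom = 7;     /* M[R1 + 4] = 7  */

    /* Same address as &pi.denom; only the compiler's view of it changes. */
    struct fraction *alias = (struct fraction *)&frame[0].denom;
    alias->denom = 451;     /* M[R1 + 8] = 451, when int is four bytes */

    *byte_offset = (size_t)((char *)&alias->denom - (char *)frame);

    int landed;
    memcpy(&landed, (char *)frame + *byte_offset, sizeof landed);
    return landed;
}
```

With 4-byte ints the reported offset is 8: four bytes to reach denom inside pi, plus four more to reach the denom field of the imagined fraction that starts there. No instruction is spent on the cast itself.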

These are the types of things that exist in the assembly code that’s generated by your IMDB get methods and get cast class. C++, it turns out, is not that much more than C in terms of compilation. Okay? You’re doing all these void star and char star casts inside assignment three; either you have or you’re going to very shortly. The same type of thing happens as it is compiled to either x86 or SPARC assembly, which is one of the two things you’re dealing with if you’re on either of the pods or the [inaudible], okay? Does that make sense to people? Okay. I can get arbitrarily complicated with all of these casts. You know I’m the type of person that will be arbitrarily complicated with them, so you’ll see some examples of these in a section handout for next Tuesday, and also in the problem set assignment that will go out not this Wednesday, but next Wednesday. It’ll be your final assignment before the mid-term, okay? I want you to just master this pointer stuff. And believe it or not, you get a lot of mastery by actually coding with it, but if you’re forced to draw pictures and generate code and make sure that the code you write down as assembly code logically matches what mine in the answer key does, it actually resolves any remaining mysteries that might be in place in spite of the fact that you got assignment two and assignment three working. Okay? There are still some mysteries that are in place for a lot of people and this usually resolves those mysteries. Okay. Come Wednesday I’ll do one more intense example with a few more casts; I’ll probably bring in an old exam problem and show you how fun they can be. And then we’ll move on to starting to understand how function call and return works at the assembly code level, how we introduce the hexagons and the circles to the stack frame and how we get rid of them. Okay. I will see you all on Wednesday.


Lecture 9: Programming Paradigms

Topics: More Detail about Activation Records - Layout of Memory During a Function Call, How the Return Address of a Function is Stored on the Stack, Example Showing How an Activation Record is Constructed on the Stack, Setting Up Function Parameters on the Stack, Using the Call Instruction to Jump to the Function, Cleaning Up at the End of a Function and Using the RET Instruction and the Saved Return Address to Return to the Original Function, General Layout of an Activation Record, Who Sets Up Each Part of the Activation Record, Assembly Code Translation of the Factorial Function, How Recursion Translates into Assembly, Why Registers Need to be Reloaded After Other Functions Are Called, Animation Demonstrating the Assembly Execution for the Factorial Function



Instructor (Jerry Cain):Everyone, welcome. I have a good slew of handouts for you today. Sorry about the lines on the photocopies. It's just the photocopier, it's not me putting this really annoying background image behind all the text. It's Chapters 3 and 4 of this little computer architecture series of handouts that I'm giving you. I'm gonna spend today and easily the rest of Friday talking about all this stuff and I'll go through plenty of examples.

You also get your fourth assignment today. I'm gonna make it due next Thursday evening. I will tell you now that this is the one that surprises everybody a little bit. It's certainly doable, but there are aspects of using the vector and hash set that always take a few people by surprise, particularly how you store dynamically allocated C strings in these vectors and hash sets. If you've just made one key mistake, then you can actually waste an hour or two trying to figure out why it's not working even though it's compiling.

So be sensitive to the fact that you might just want to read through the handout, and maybe get the first 25 percent of the assignment done because immediately you have to start dealing with these C strings, and once you figure out how to store C strings in these things, you do much, much better and it goes much more smoothly after that. Now I'm gonna do something this week that I won't do very often, but I'm gonna have another discussion section this Friday at 2:15.

Now I'm just inventing the time and I know I can't check with everybody in the class as to whether it's convenient or not, so I just had to schedule it. We're gonna videotape it. We're gonna put it online. The discussion section just happens to come at a kind of crappy time in the assignment cycle. It's like two days before the assignment's due. It's not a disaster for the assignments that we've had so far, but I really want to show you some examples using the vector and the hash set in a structured discussion section, and I want to do it more than two days before the assignment deadline.

So this Friday at 2:15, if you wanna attend live, by all means do it. Skilling 191. It's just this week. We will have discussion section next Tuesday as well. I just have this one I wanna insert into the sequence just so you have a little more practice before you really tackle the assignment this weekend, which is when most of you will start.

Okay. I wanna continue with this code generation thing. I wanna get the piece of chalk that I'm gonna use, and I wanna talk a little bit more realistically about what activation records look like. I have kind of blown off the parameters, and where they reside in these activation records. I've only dealt with local variables, but all the functions I've invented have had no parameters passed in. I did that because I just wanted to simplify things.

So there's a couple of confessions I have to make about how I slightly misled you on Monday just to make things easier. If you have this as a function prototype, and you'll be able to infer structure based on this small example, I think. I'll just call it foo, and I'll pass in an int called bar and an int star called baz, and internally I'll declare some variables. A char array, I'll call it snink. I have no idea what these words are; I'm just making them up. And then let's say a short star called Y.

And I don't care about the code at the moment because I just want to show you what the activation record will look like. Now obviously it shouldn't surprise you that this and this and this and this are all packed somewhere close to each other in memory. And in the example I gave you on Monday, I only had stuff like this. The way these are laid out relative to one another does not change. I would have a character array of 4 bytes, this would be this thing called snink, which is a word I've never used before, but here it is.

The four characters would be packed in a static array that resides inside the activation record. Below that, I don't have a short, I have an address to a short that is called Y. Now you may ask what about these things? These things are certainly close. Let me just draw the picture and explain why it looks the way it does. There is this reserved four byte figure that sits right there that has nothing to do with parameters or local variables.

But on top of that there's gonna be bar and on top of that there's gonna be this thing called baz. Okay. And this right here is the full 20 byte activation record that makes up – or that accompanies any call to the foo function right here. Now I put bar below baz. Is that arbitrary? The answer is it is not arbitrary. Now I can't really gracefully explain why this goes below that and why – basically parameters are laid down from high to low address from right to left.

In other words, the zeroth parameter here is always below all of the other ones. And the first is stacked on top of that and the second is stacked on top of that. Give me 40 minutes of more lecture and I'll be able to explain why that has to be the case in a language like C and C++. These right here are actually stacked in the order they appear. All of these appear at lower addresses than these. This right here is something I'll be able to discuss a little bit more in about 15 minutes.

That is the space that sits in between parameters and local variables. It actually has information about the function that called us. Obviously, foo is invoked from the main function or from some other function or maybe even foo itself if it turns out to be recursive. We're gonna need to lay down a little piece of popcorn right there about where in the code base we actually found this call to foo. Okay. And when the function exits, it relies on this value right there, which I am also going to call a saved PC.

It's what the PC value would have been had some function call not interrupted the stream of instructions. Do you understand what I mean when I say that? Okay. But don't worry about how that's manipulated yet. Just understand that it's there and I'll be more sensitive to using it in a few minutes. So that is the activation record layout for an arbitrary function. What I want to do now is talk about how something like that is constructed because a function like foo is called.

Now I think foo is kind of a weird looking function, but I'm gonna go with it. Int main – I'm not gonna concern myself with the parameters – actually I lied. I will. Int argc, char star star argv, and I'm gonna declare one local variable, I = 4, and I'm gonna call foo of I and ampersand of I. And then I'll just return zero at the end. Recall back to last Friday when I termed everything in terms of triangles and hexagons. I want to be a little bit more scientific than that.

And I wanna explain how we go from this, where this is argc and this is argv and this is set to something like 2, and this is set up to point to an array of char stars. Okay. I wanna figure out how we go from that, actually to technically this – I'll call it the saved PC here. When I generate code for this right here, eventually I have to generate code for the function call, but first you actually generate code to allocate space for all the local variables.

Only main's implementation knows how many local variables are needed in order to accomplish what it's trying to accomplish. So by protocol, the very first thing a function does – a C function does, is it makes space for its local variables. Main is called with a partial activation record. It has to complete the full activation record by doing something like this. That's SP = SP - 4. Now remember on Monday I was using R1 to always track the base address of the activation record? Okay.

Well, R1 is actually supposed to be a general-purpose register, but there's actually a dedicated register. We just call it SP. It's short for stack pointer. Okay? And that is always the thing that is truly pointing to the lowest address in the stack that's relevant to execution. When main gets called SP is pointing right there. The allocation or the creation of this variable right here actually compiles to an instruction to demote this value by four more bytes.

Why four? Because I only have one four-byte figure right here. Okay. And then this becomes the boundary between what's being used in the stack segment and what's not in use. Does that sit well with everybody? Okay. Then it carries on and it does this one initialization right here. The next thing that would happen is that it would just do M of SP = 4. It was R1 on Monday; today it's SP. That takes care of the assignment right there. Okay.

And then I have to actually prepare to call the foo function and wait for it to return before I go ahead and return zero. So the instructions that are executed on behalf of this foo call have to do what somebody did for the main function. They have to set aside space for the parameters. So main has to build a partial activation record for that thing right there. Okay.

It can tell because it sees the prototype how many bytes are gonna be above that saved PC or that return link. Because I have four bytes, and four bytes making up the upper half of the activation record, the first thing that would happen is that you would do something like SP = SP - 8. That would bring this down to there. Okay. This part right there is still the activation record for main, but I've just built 40 percent of the activation record that needs to be in place for foo to run. Does that make sense to people? Okay.

I have to pass in I and I also have to pass in the address of I. I have to make sure that the current value of I is placed right there. The address of I is placed right there, and then I have to transfer control over to the code that emulates whatever foo is supposed to be doing. Okay? Does that make sense to people? Yes? No? Okay.

So in R1 I'm gonna do M of SP + 8. In R2, I'm gonna put the actual result of SP + 8. This address right there is relevant to both of those lines. What's placed in R2 is the actual value of this thing. What's placed in R1 is the contents of this thing right here. Okay. The reason I want to do that is because I want to lay down at M of SP and M of SP + 4 the actual values that need to be communicated to the foo function.

SP is the lower of the parameters. That's the thing that's supposed to get R1. This is supposed to get what I've laid down in R2. Okay. So what happens there is I have effectively done this. I've copied a 4 – actually, a 4 went there in response to that line right there. I copy a 4 right there. I effectively laid down that address in the second of the two boxes. Does that make sense?

Do you understand this part below the arc? It is basically 40 percent of that thing right up there. Yes? No? Okay. After I've set up the parameters, I actually transfer control to the foo function by using this assembly code instruction, the call. That's basically a jump instruction that says jump to the very first assembly code instruction that's associated with the foo function, execute that, and then somehow figure out once you're done executing that to jump back to this right here.

Whatever this is right here, it's gonna actually do this – actually it's gonna do SP = SP + 8. I'll explain why it's that in a second. But do you understand that if I weren't jumping to the foo function that this would be the next thing that gets executed? Does that make sense to people? Okay. This address is actually what gets laid down in this little saved PC thing. Okay. And it's automatically placed there by the call instruction.

At the time that the call instruction executes, it has a clear idea of what PC is so it knows what PC + 4 is. It actually on our behalf decrements SP by four more bytes and lays down that saved PC right there. So when foo is done executing it has information in its activation record about where to jump back to. Okay. This is basically at the electronic level or the hardware level, basically a piece of popcorn to remember where you were walking before you took a turn. Okay.

Does that make sense to everybody? Okay. This transfers control over to the foo function. Well all I'm gonna do as part of foo, is I'm gonna do something like let's say Y = short star snink + 2 and then I will do *Y = 50. And then I will return. Okay. What foo has to do is it has to complete its activation record by decrementing the stack pointer even further to accommodate and make space for those two variables right there.

So what happens is that this – let me actually write this somewhere else. What foo does, it makes space for these right here and that right there. What happens is SP is set equal to SP – 8. Why is it minus 8? Because it can tell while it's being compiled that that's how many bytes are needed to extend the activation record just enough to complete the full activation record. So it does this, brings the stack pointer down to there, leaves it uninitialized. Okay. You know that because C doesn't initialize local variables, so the assembly code won't either.

And then it carries on and compiles code for this, and it accesses this variable and that variable relative to the value that's in the SP register. So the first thing that happens here is the value of Y, the contents of this thing right here, has to be updated to include the address of that thing right there. Okay. Does that make sense to people? Forget about the cast, I have to evaluate what snink + 2 is. That's really & of snink + 2. Okay.

So what I can do, in R1, I can set that equal to SP + 6. Why that? SP points to the base of the full activation record at this point, has to go four bytes beyond to basically circumvent Y, and go two bytes forward because it knows that snink is a char star so pointer [inaudible] the same thing, and it wants to store that address at this moment into a register so that it can be assigned to M of SP.

So R1 stores that value, M of SP identifies this space right there. That's what we want to get this value because that's the space that overlays the variable called Y. Okay. Make sense? As far as this is concerned, what has to happen is I have to get a 50 somewhere in memory. Where in memory? Well it's at whatever address happens to be stored in the Y variable.

So what would happen here – that separates the variable declaration. There's that. Into a local register, I would actually reload Y, R1 stores the address where a 50 should be written, but I only want to write it as a 2-byte figure because it's typed to point to a 2-byte figure. Does that make sense? Okay. So I would do M of R1 =.2 50. Okay. Does that make sense to people? This is done. Okay. So now, it has to exit and basically return somehow to the main function that's right here. Okay.

You understand why that SP = SP - 8 sits right there? That was to make space for the two local variables, to basically make use of eight more bytes in the local stack segment for its local variable set. Well, it has to promote SP back to where it was before it entered this function. This is basically the equivalent of – I don't want to say it's the equivalent of malloc; it's the equivalent of allocating space for variables, and we have to deallocate that space. Okay.

So SP = SP + 8 would be the last thing that's done right here. That leaves SP to be pointing right there. Do you understand that SP is addressing the very 4-byte figure that has the little piece of popcorn in it? Okay. Does that make sense to people? So this final instruction, RET for return, is understood to be an assembly code instruction that pulls this value out, places it – basically populates the true PC register with what this is, brings the SP register up four bytes and then carries on as if it were executing at PC + 4 all along. Does that sit well with everybody? Okay.

So there's that. As far as main is concerned, this is the next instruction that gets executed after this return executes. Okay. It jumps back, and it puts the address of that in the PC register. This SP is equal to SP + 8. Actually deallocates the space that's set aside for these two parameters that it put down there in preparation for the foo call. Does that make sense? Okay. So that's what that is, and then what I do, is I use another register. There's not too many dedicated registers with special names, but this is one more of them.

We have PC, we have SP, we also have this thing called RV which is this four byte register dedicated to communicating return values between caller and callee functions. Okay. Does that make sense to people? Okay. I'm returning a zero to whoever called main, so you have to think of RV as this little cubbyhole where return information is placed so that once it jumps back to whatever function called main, it knows to look in RV immediately, and pull that value out to take it as a return value.

The metaphor I usually use here is that think about you're trying to leave money in a locker at an airport. Okay. You wanna put the money in the locker, and immediately walk away when you know the person who's supposed to get the money is the next person looking at it. Does that make sense? Okay. We didn't have a return value for this foo function. I'll write another function that's a little bit simpler than this that really just makes use of a meaningful return value. This is also a meaningful return value, we usually just blow it off because we're not concerned about who's calling main.

Let me just draw a general activation record. Here are all the params. Here's the saved PC or the return link, and here are all the locals. These are allocated and initialized by the person calling the function. These are allocated and initialized by the actual function that's being called, the callee. So the entire picture is the activation record that has to be built in order for the code inside a function to execute and have access to all the variables.

It may seem a little weird that this part, and technically everything through that right there, is set up and initialized by the person calling the function, and the rest of it is set up and initialized by the function itself. Why is there this separation of responsibility in actually building the entire thing? Why can't the caller build the entire thing? Do you understand what I mean when I say that? Why couldn't main set aside space for all the variables?

Why couldn't foo actually set aside space for all the variables? The reason is that the caller has to be involved in setting at least this portion up because only it knows how to actually put meaningful parameter values in there. Does that make sense? Who else is going to put the four and the ampersand of I in there other than the caller. Make sense to people? This right here can't be set up by the caller because the caller has no idea how many local variables are involved in the implementation of, in this case, foo. Okay. Does that make sense?

So the separation of responsibility really is the most practical and sensible thing to do. The caller knows exactly how many parameters there are, can look at the prototype to tell, and it knows how to initialize it. That's why the top half is set up by the caller. The bottom half, the caller doesn't even know how many local variables there are much less how to manipulate them.

So you have to rely on the callee function, this is foo in this case, to go ahead and complete the picture by decrementing SP to make space for the local variables. Now I put this right here. When I say call foo right there, it's really a jump instruction that's associated with the address of this first instruction right here. RET is an instruction to basically jump back to whatever address happens to occupy the saved link. Okay. Sit well with everybody? Okay, very good.

Let me write a little bit more of a practical function just so you understand how this return register and call and return all work in a case of a recursive function. Okay. So let's write a function that's a little more familiar to us. I want to write this function called factorial which is framed in terms of a single parameter called N, and I'm not gonna have any local variables. If it's the case that N double equals zero, I just want to go ahead and return 1.

Otherwise, I want to go ahead and return N times whatever the recursive call to factorial of N – 1 returns. I'm not concerning myself with N being negative. I don't care about the error checking. I want to concern myself with how this translates to assembly code in our little language. Okay. Let me erase this. Yep?

Student:Okay, in the first [inaudible] of foo, you substituted [inaudible], I mean are you insinuating that you can substitute [inaudible]?

Instructor (Jerry Cain):In other words, do one variable at a time?


Instructor (Jerry Cain):Yeah, you could do that. It wouldn't be incorrect. Compilers when they're generating this code, they can look at the full set of local variables that are declared at the top, and it can compute the sum of all the sizes, and reduce the allocation of all of them to one assembly code instruction instead of several.

Student:So it's more efficient, right?

Instructor (Jerry Cain):Well, yes, technically. I don't want to say that it's – that's not a top priority, but real compilers would just use that right there because they could very quickly add up all the variable sizes, and just know that they're gonna be packed together, and just do this on one little swoop. Okay. Now as far as this factorial function is concerned, I want this to label the first assembly code instruction that has anything to do with this as a function.

It has to assume this as a picture. It has to assume that some value of N has already been passed in. Okay. That whoever called it, laid down a call to the fact function, and as a result of that, laid down some saved PC so that it knows where to jump back to after factorial computes its answer. Does that sit well with everybody? We can momentarily forget about the fact that function call and return are involved because the first few statements right here are just normal little C code. Okay.

What I want to do up front is I want to load the value of N into a register. SP + 4 is the address of N in this case. Okay. And I want to pull the value into a register called R1 because I want to conditionally branch around this return statement if some test passes. Okay. Well, I want to branch – I'll leave those open for the moment, depending on whether R1 and zero basically mismatch. If it's equal to zero, I just want to fall to the return statement and scram, but if they're not equal, which is why I will put BNE right there.

If they're not equal, then I'm doing a little transition of this, then I want this to not execute and it to come down here. So all I'll do is I'll put PC +, and I'll leave this open because I don't know how many assembly code instructions I'm jumping forward yet until I execute the code for that. Okay. If this test fails, it's because this test passes and I want to basically populate RV with a 1 and then call return. So I would do this return value is equal to 1 and then I would return. Okay. Does that make sense to people?

If this doesn't happen – I'm sorry, if this test fails, I better jump beyond the return statement, which means that this should be a 12. Okay. And the next instruction I'm gonna draw has to start on the computation of this thing right here. Now I have no reason to load N into a register because I'm only going to potentially clobber that register when I call this function recursively.

So what I need to do is I have to prepare the value of N – 1, push that onto the stack frame in preparation for a recursive call to factorial, let the recursive call do what it needs to do, assume it puts an answer in the RV cubbyhole, and then use that answer immediately to multiply it by what my local value of N is to figure out what RV should now be populated with. So I will do this, R1 = M of SP + 4, that loads N into R1. I will compute what N - 1 is.

And now I have all the information I need to make this recursive call. So I'm going to set up the partial activation record. I'm gonna do SP = SP - 4. Okay. I'm gonna write to SP, this value of R1. So what just happened here? I decremented – this is where SP is pointing right now, I decremented to a point right there. That right there is the original activation record. I'm in the process of building the activation record for the next call. I lay down a 3 right there, because that's what R1 has at the moment. It had a 4, but I just pulled it down to a 3. Okay. Make sense?

And then I call on the factorial function. In response to that, this is decremented four more bytes, the address of this instruction is placed in the saved PC, but execution jumps back to follow this recipe all over again, but on behalf of a second activation record. Does that sit well with everybody? Yes? No? Okay.

So it's as if the primary call here – you just can think of it as suspending, it's not really what's happening, but access to this activation record is temporarily suspended while the recursive call deals with these addresses downward. You just have to assume by protocol, that the recursive call is the person you're in cooperation with and the recursive call is placing money in the RV locker, okay, so that when you pick up execution right here, you know that the RV register has something meaningful. Okay.

Well, when you get right here, you have to clean up space for the local parameters. The return statement that brings us back here, gets rid of this, and then that SP is equal to SP + 4, brings this arrow back up here, and that was the picture that was in place before we involved any type of recursive function call or any function call at all. Okay.

We know that RV has – it would be six if factorial was really doing its job. So I'll actually put that. It has a six in it, if we just take the leap of faith, even at the assembly code level, that the recursion is working, and then once I clean these up right here, I have to reload N, so R1 now has a 4 in it. I can do this. That emulates that multiplication right there. Okay.

RV has the return value of this. R1 has the value of that. I have no local parameters to clean up. RV has the value that I need to communicate back to whoever called me, so I will just return. Okay. Does that make sense to people? Question right there?


Instructor (Jerry Cain):The PC + 12, where did I put that? I'm drawing a blank. Oh, this right here? Normally, if there are no branch instructions, you know how PC is the address of the currently executing instruction, and each instruction is 4 bytes wide. So by default with each clock cycle, the PC value is updated to be 4 bytes larger than it was before. So normally it executes things in order.

If this is PC, this is PC + 4, PC + 8, PC + 12. If this branch instruction actually passes, then I don't want to fall to this. You specify as the third argument the actual address of the instruction you should jump to relative to the current PC value. Okay. Now in some real systems, PC's already been promoted by four, so this could be PC + 8 on some other systems, but in our little world I'm just assuming that PC retains its value during the full execution of this thing right here. Okay. And it's just identifying that right there. Were there other questions? Yeah?

Student:Would it be all right to do something less arbitrary –

Instructor (Jerry Cain):Yeah, then it's a little different. I'm gonna just constrain all our examples to deal with things that are exactly 4 bytes. I'm sorry, 4, 2 or 1 bytes; things that can fit into an RV register. MIPS as an architecture, I know that one best of all of them, there are two return value registers and it usually only uses one of them unless you're returning like a double or a long long, which is an 8 byte integer, and it would use both of them.

If you're returning a struct with 12 or more bytes in it, then what it'll do, is it'll actually place the address of a temporary struct somewhere in memory, and just assume that the caller knows that a struct is being returned and will take the RV register and dereference it to go to the thing that's actually a struct. Does that make sense? So they figured it out. I'm just not gonna deal with that particular stuff because it's more minutia to worry about. Yeah?

Student:[Inaudible] I didn't quite get that.

Instructor (Jerry Cain):That's emulating that multiplication right there. At the C level you know you have to spin on the return value from the recursive call, and multiply it by the current local value of N. Right?


Instructor (Jerry Cain):So this right here just stores the current value of N, and RV right there stores the result of the recursive call. Okay. Does that make sense? Now I have the first piece of evidence as to why I always want you to reload variables. Do you understand that right there I absolutely had, absolutely had to reload the value of N. Okay. You may say, well I did it right there. Okay.

And even if I didn't do that right there, I did this with R2, I'd absolutely have to reload the value of N because I have no idea how complex and how motivated this as a recursive function, or any function at all, is to actually clobber all of the register values. All of the function calls are using the same exact register set. Does that make sense? Okay. So that's why I want you to get in the habit of reloading all your variables as you advance from one C statement to the next one. Okay. Does this make sense to people? Yeah. Question right there?

Student:Why do [inaudible] right after the return statement above?

Instructor (Jerry Cain):Because – this right here, for the same reason basically. I want all code emission to be context insensitive and I don't want it to leverage off of things that happened in prior C statements. Real compilers would do that. I just don't want us to do it. I'm not adamant about it, but I just think that there's a nice clean formula about always generating things from scratch because if this code changes because the C code statement prior to it changed, this code still does the right thing for the statement that it's being emitted for. Does that make sense? Okay.

So I don't want to kill this example yet. I want to run through an animation, which is why I have my computer here just so you see how the stack grows and shrinks in response to assembly code instructions actually running. Yep?

Student:[Inaudible] why, for example, isn't there a PC - 12? Or not 12, PC – [inaudible]

Instructor (Jerry Cain):Oh, I see what you're saying. When the program is actually loaded, this actually would be replaced by PC - 28 or whatever it is. These are just placeholders at the moment because these are easier to deal with. Think of these as like go-to labels. That's really what they function as at the assembly code level. A real assembler or linker would in fact go through all the dot O files, and replace these things right here with the actual PC-relative addresses. It usually will defer it until link time because it wants to be able to do the same replacement of these things with PC-relative addresses between functions that are actually in different dot O files. Okay.

And I'll talk a little bit more about that on Monday, and probably Wednesday, about how all the dot O's are basically assembled into an a.out file or a six-degrees or what have you. Okay. Okay. So give me two seconds to do this. Okay. We should be good to go. Create some mood lighting. Let's see how well this translates to the screen. Okay. So this is more or less – it's an exact replica except for spacing of the function I just wrote on the board. Okay. And, oh, that's not showing up. That's not good.

Can I get it to show up? It shows up over there. That's great. Oh, it's really up there. That stinks. That's never happened before. Hold on a second. Maybe it'll just readjust to the screen size. Stalling, stalling, stalling, stalling. It has no input so something's weird going on. Okay. Actually, I have an idea. Yeah, but that's still not really very good. Let me do – let me just run it in place. Okay, you guys can all see this right here. Let's make sure the full picture can be seen. What's up?

Okay, there's that. It's actually pretty close. I don't think it zooms very well when it's driving both this computer and that monitor. Okay. That's actually not bad. The return type of fact is int. Okay. It's being clipped up by that mushroom that's on the screen up there. So this right here is the code – the C code obviously. This right here is the assembly code I more or less just wrote on the board. Okay. This is the animation, and I'll try and do this this way. Let me do next slide. Yeah, this'll work. There's that. Okay.

So the original call is the factorial of 3 and it's just gonna follow these instructions one by one and use the accumulation of the stack to just compute what 3 factorial is supposed to be. Now it checks immediately. It loads N. It checks to see whether or not N = 0. It's not, so it actually does branch forward and it prepares for the recursive call. Does this make sense to people? Okay. It has to assemble the recursive value, has to make space on the stack frame for that new – the recursive value of N, it has to write to there.

So you see how we're basically in the process of building the activation record for the recursive call, and then we have to call factorial and some things are updated. The little circle that's been populated in the saved PC portion actually has, as a piece of popcorn, a pointer to the next instruction, so it knows where to carry forward when the recursive call returns. Does that make sense? Okay. Question?

Student:So we aren't required to store PC?

Instructor (Jerry Cain):No, it actually happens – the call instruction does that for us. Okay. Do this – comes back up here – the original call doesn't forget where it should carry forward when the recursive call returns because we have that piece of popcorn in the stack frame. Okay. So this is exactly the same thing where N = 2 and N = 3. Notice that both recursive calls have as their saved PC the instruction after the recursive call. That should make sense. They all need to know where to carry forward from after the call returns.

It just happens to be the same place both times. It does go through one more recursive call, lays down a zero, one more, and then finally it comes to an invocation. It's following the recipe for a fourth time. The first three times it suspended execution, but this time this branch instruction is going to fail, so it's going to go and actually populate RV – actually I can't – this right here has been updated with a 1, and so it's gonna go ahead and return. The second-to-last recursive call is like, oh, I'm active again, I'm going to continue carrying on from where I left off before.

I immediately look into the RV register and see that RV, 1 x 1, stays a 1, but as things unwind, they all carry on from here: the 1 in RV becomes a 2 – this isn't working very well – the RV becomes a 6, and finally I return to whoever made the original call, which is probably – or which may not be – factorial. Does that make sense to people? Yep?

Student:Those arrows that were pointing down onto the screen, were those –

Instructor (Jerry Cain):They were the saved PC registers. The saved PC blocks. You remember the 4 bytes that existed between parameters and locals? Well, there is a real number that's placed there. It happens to be the address of the instruction that comes after the call statement, so it knows where to jump back to. It's almost as if it pretends that it wishes the call thing were just one assembly code instruction, but since it isn't, and it jumps away only to come back later on, it needs to know how to come back. Okay. Okay. So does that sit well with everybody?
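
A minimal C++ sketch of the function being traced, assuming the `fact` on the board is the usual recursive factorial. The comments that map each step to RV and the saved PC paraphrase the walk-through above; `check_fact_trace` is a hypothetical helper added only to record the values RV takes on as the calls unwind.

```cpp
#include <cassert>

// Recursive factorial, assumed to match the `fact` being traced.
// Each call gets its own activation record: the parameter n sits above
// the saved PC, which the call instruction writes on our behalf.
int fact(int n) {
    if (n == 0) {
        return 1;            // base case: RV is populated with a 1
    }
    // Recursive call: the caller lays down n - 1, then the call saves the
    // address of the next instruction so execution can resume right here.
    int rest = fact(n - 1);
    return n * rest;         // RV becomes n * RV as the calls unwind
}

// Hypothetical self-check: fact(3) makes four invocations (n = 3, 2, 1, 0)
// and RV goes 1, 1, 2, 6 on the way back up.
void check_fact_trace() {
    assert(fact(0) == 1);
    assert(fact(1) == 1);
    assert(fact(2) == 2);
    assert(fact(3) == 6);
}
```

Calling `fact(3)` suspends three times before the base case fires, which is the "fourth time through the recipe" in the animation.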

So that was a breeze through that, but I have a little bit of stuff to talk about come Friday. The one thing I will tell you is that C++, if you code in pure C++, you're probably programming in the object-oriented paradigm, so you're object and data focused as opposed to function and verb focused. In C, which we're dealing with right now, you always think about the function names, the data is incidental, you always pass the data in as parameters as opposed to having the data being messaged by method calls.

We're gonna see on Friday, and this will be an aside, I won't actually test you on this part on the mid-term, but you'll see that C++ method calls and C function calls really translate to the same type of assembly code. Everything is emulated by a function call and return, a call and return thing, at the assembly code level, and it's really just an adaptation of either object orientation or imperative procedural orientation to assembly code to get it to run and emulate either C code or C++ code. Okay. And that's what we'll focus on on Friday. Okay. Have a good night and I will see you on Friday.

[End of Audio]


Lecture 10: Programming Paradigms

Handout 10: Function Call and Return
6 pages

Handout 11: Code Generation Examples
12 pages

Handout 12: Factorial Trace
39 pages

Programming Assignment 4: RSS Instructions
4 pages

Programming Assignment 4: RSS FAQ
1 page

Topics: Moving from C Code Generation to C++ Code Generation: Basic Swap Example, Code Generation for the Pointer Swap Function, Code Generation for the C++ Version Of Swap Using References, Which Are Treated as Automatically Dereferenced Pointers, Local Variables Declared as References, Difference Between References and Pointers, Effect Of Declaring a Class on Memory in the Stack, Class Methods, Which Take a "This" Pointer as An invisible First Parameter, Effect Of the "This" Pointer on the Activation Record for a Class Method, Static Class Methods (Standalone Functions), Which Don't Contain a "This" Pointer, Compilation and Linking - #Define and the Preprocessor



Instructor (Jerry Cain):Hey, everyone. Welcome. I actually have one handout for you today, and Ryan just walked in with it. So he’s going to be passing it around to the people who are in the room. Remember, we have a special discussion section today, at – I’m forgetting what time it is. What did I say? 1:15 p.m.? I’d check – whatever I said on Wednesday. What did I say? 2:15 p.m., okay. I don’t know why I’m forgetting: 2:15 p.m. to 3:05 p.m., in Skilling 191. We’re having a special discussion section, just this one week, because I want some example problems to sit in front of you, and for you to think about them, more than two days before the next assignment deadline. Which is why I’m having it today.

It’s going to be videotaped. I’m gonna call SCPD after the lecture today, and try and make sure that they post it by 5:00 p.m. today, so it’s around for the weekend, online, so you can watch it. But I’ve already, I think, pretty well advertised Assignment 4 to be kind of the biggest surprise in terms of workload and difficulty for most students. So that’s why I’m kind of advertising it now, to be something you want to start on sooner rather than later. Because if you start next Thursday, it’s very likely you will not finish on time. So just try and take a stab at it a little bit ahead of time, this time.

When I left you last time, I had shown you two examples of how function call and return works, in general, but specifically in our assembly language. So what I want to do is, I want to do one specific example, actually even simpler than the last example I did. Because I want to make a transition from C code generation to C++ code generation, and show you just how similar the two languages ultimately end up looking when they compile to assembly code.

So let’s deal with this scenario: a void function. I just want to say foo, and I’ll concern myself with the allocation of two local variables. I’ll set x equal to 11; I’ll set y equal to 17. And then, I’m going to call this swap function. This time, I’m interested in writing swap to show you what the assembly code looks like. And I think you’ll be surprised to see how the pointer version of swap and the reference version of swap look exactly the same. But the way I’m writing it, right now, this is pure C code. Actually, you can call it C++ code, for that matter. And then, that’s all I want to do.

I just want you to understand how code for this would be generated. I don’t have any parameters, whatsoever, so I would expect the stack pointer, right there, to point to the saved PC associated with whatever function happens to be calling foo, right here. There’s a call foo instruction in someone else’s assembly stream, and that saved PC is the address of the instruction that comes directly after that. That’s the trail of popcorn I was talking about, on Wednesday.

The first thing that’ll happen is that this thing will actually make space for its two locals. I actually like getting in the habit of kind of taking care of my deallocation call right away. That make sense to people? Okay? Well, as soon as I write that, I just want to remember, just so I don’t forget to put it on the page. I like to put the balancing SP change at the bottom of the stream. Now, I’ll concern myself with this, right here. This brings this down, to introduce those 8 bytes. X is above y, so this instruction right there translates to M of SP plus 4 is equal to 11; then M of SP is equal to 17, and then virtually all of the rest is associated with the work that’s necessary to set up a function call, and react to its return. Okay?

So what I want to do here is I want to evaluate ampersand of x, and ampersand of y. The fact that there’s an 11 here – oops – an 11 there and a 17, is really immaterial to the evaluation of these two things, right here. This, given that setup right there, is synonymous with SP plus 4; this is synonymous with SP. Okay? I want to put those values in registers: R1 is equal to SP – I’m sorry, that’s right – SP; R2 is equal to SP plus 4. This is ampersand of y; this is ampersand of x. I want to further decrement the stack pointer by 8. The fact that this is minus 8 is just a coincidence with that being minus 8. There happen to be two 4-byte parameters being passed here; there were two 4-byte parameters – two 4-byte locals, local to foo.

Once I’ve done that, I’ve decremented this by 8 more bytes. That marks the bottom of the foo activation record. I’m now building a partial activation record for the swap function. I want to lay this as a figure, right there. This, as a figure, right there. This first parameter, the 0th parameter, actually goes at the bottom. So I want to do this: M of SP, which just changed, is equal to R2; M of SP plus 4 is equal to R1. Okay? That, basically, gets this right here to point to that, and this to point to that. So when we go ahead and call this swap function, and we’re just inferring its prototype to take two int stars, it just sees this. Technically, it has the addresses, the locations of the two ints it’s going to be dealing with, but it doesn’t, technically, know that they’re right above it on the stack frame. It actually just has the addresses of their little houses that just happen to be just down the block. Okay?

This, right here, all it has to do is basically clean up the parameters. When the call to the swap function happens, SP is pointing right there. It can, by protocol, assume that that’s where the saved PC is left – I’m sorry, the stack pointer is left, when it jumps back to this instruction, because the return at the end of swap is responsible for bringing it all the way back up here, just by protocol. Does that make sense? Okay.

So all we need to do: fill in the space. This balances that; that balances that. You could coalesce these into one statement, if you wanted to. I just don’t see a compelling reason to. And then, since there’s nothing being done with the new values of x and y, I can just return. Okay? Does that make sense to people? As far as the code for swap is concerned, this is void swap. I’ll write it to be int star specific: int star ap, int star bp. Int temp is equal to asterisk ap. There’s actually a little bit of work here, at the assembly code level. I know you’ve seen this implementation, probably 40 times, in the last two quarters, but there’s some value in actually seeing it, one final time, and for me to generate assembly code for it.
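
For reference, here is a sketch of the C-style source being compiled, with the instruction stream for foo reconstructed from the spoken description in a trailing comment. The body of swap is assumed to be the standard three-line version; the `M[...]` notation and the exact spelling of each instruction are an approximation of the board, not a copy of the handout, and the `assert` inside foo is an added check.

```cpp
#include <cassert>

// Pointer version of swap (assumed to be the familiar three-line body).
void swap(int* ap, int* bp) {
    int temp = *ap;
    *ap = *bp;
    *bp = temp;
}

void foo() {
    int x = 11;
    int y = 17;
    swap(&x, &y);
    assert(x == 17 && y == 11);   // not on the board; added as a check
}

// Instruction stream described for foo (a reconstruction):
//
//   SP = SP - 8        ; make space for the locals x and y
//   M[SP + 4] = 11     ; x sits above y
//   M[SP] = 17         ; y
//   R1 = SP            ; &y
//   R2 = SP + 4        ; &x
//   SP = SP - 8        ; make space for swap's two parameters
//   M[SP] = R2         ; 0th parameter (&x) goes at the bottom
//   M[SP + 4] = R1     ; 1st parameter (&y) goes above it
//   CALL <swap>
//   SP = SP + 8        ; clean up the parameters
//   SP = SP + 8        ; balance the first SP = SP - 8
//   RET
```

The two `SP = SP + 8` lines are the ones that could be coalesced into a single `SP = SP + 16`.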

It starts executing with the assumption that SP is pointing to a saved PC. In this particular example, it happens to be that address, right there, that’s the saved PC. These are internally referred to as ap and bp. It’s not like the words ap and bp are written on the hardware, but the code is generated in such a way that ap is always synonymous, at the moment, with SP plus 4, and bp is synonymous with SP plus 8. Okay? The moment the stack pointer points right there, I have to make space for my one local variable. That happens because I do this. I’ll make it clear that this is associated with the label swap that we actually call or jump to. That brings this down here. This is locally referred to as temp in the code; this is the space that corresponds to temp in the activation record. The offsets of ap and bp actually shift, a little bit. Now, they’re at addresses SP plus 8 and SP plus 12.

I have to rely on these things pointing to real data. I don’t need to know where that data is to write the correct assembly code. I need to evaluate ap and then, what ap points to. So as part of initializing that, right there, I have to load into a register M of SP plus 8. Do you understand that it lifts this bit pattern, right there, and places it in a register, so I can basically do a cascade of dereferences? Does that make sense to people? Okay? Now that I have ap in the register, I can do this, and now I have the very integer that is addressed by this ap thing, okay, sitting inside R2. You may ask whether or not you can do something like this. Conceptually, of course you can do that, except you’re not going to find an architecture that supports a double cascade in a single assembly code instruction. Okay? So that’s why they’re broken up over several lines.

So let’s assume that this points to that, right there, that’s the integer that has to be loaded into temp. That’s only going to happen because I do this. Oops. And that ends this line, right here. Does that make sense to people? Yeah, go ahead.

Student:When you say R2 equals M of R1 –

Instructor (Jerry Cain):Yep.

Student:Why does it not just copy the address in, say, ap and actually [inaudible]

Instructor (Jerry Cain):Well, the memory of – you basically go inside an array of bytes that’s base addressed, right there. Okay? So you think about the entire array of bytes that’s in RAM as being indexed from position zero. And so when you put that number inside, it actually goes to that address. We know that SP plus 8, this right here, we know that the 4-byte bit pattern that resides there is actually the raw address of some integer. The assembly code doesn’t know that, but we do. So we’re asking it to shovel around a 4-byte figure and put it into this temp variable. Right? This loads that address. This address right here, let’s say it’s address 4,000. I just loaded a 4,000 into R1. Maybe it’s the case that the 17 is in address 4,000. Okay, that would make sense. So by effectively doing M of R1, in this particular case, I’m doing M of 4,000 because that’s what R1 evaluates to. Okay? And by default, we always do it with 4-byte transfers. Okay? So this would get a 17 to play the role of happy face, and would put a 17 right there. Does that make sense? Okay.

So now, what has to happen is something similar. I have to do this right here. But it’s actually a little different than this line right here because I have to find the L value by actually following a pointer. The L value, being temp, is directly on the stack frame, in this case. Right? The space where the new value is being written is actually one hop off of the stack frame. Okay? Let me evaluate asterisk bp first because it’s very similar. I will do R1 is equal to M of SP plus 12. That loads just the value of bp because that’s what’s stored as a variable on the stack frame. Okay? In R2, I’ll do M of R1, but R1 has a different value this time. Maybe it is the flat emoticon, okay, and that gets loaded into the R2 register. That’s what actually has to be written over this smiley face, right there. Make sense?

So in order for that to happen, I have to load this again: R3 is equal to M of SP plus 8. And I don’t want to load M of R3 because I don’t care about the old value. I actually want to set M of R3 equal to R2. Okay? Does that sit well with everybody? Just making sure I did that right. Yeah, ap right there. Okay? Making sense?

The last line is actually very similar to the first one. I know that temp is sitting right at the base of the entire activation record. Now, what I have to do is I have to load the address associated with bp into R2. That’s stored at M of SP plus 12. And then I have to do M of R2 – I’m sorry, yeah – M of R2 is equal to R1. That realizes this, right here. So in those three blocks, I’ve achieved the algorithm that is the rotation of these three – of these two byte patterns, actually, through a third.

Right before I clean this up here, I have to do an SP is equal to SP plus 4. Internally, I have to bring the stack pointer up to the place where it was before I executed any code for this function. Okay? That means that SP is now pointing to the saved PC. Which is perfect because, in response to this return instruction, our return actually knows, just procedurally, to go ahead and pull that value out, place it into the real PC register and simultaneously, or just after that, bring this back up to there. Okay? That’s exactly where the stack pointer was pointing before that call statement. Does that make sense? Okay.
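
The whole sequence can be replayed on a toy model of the machine. The sketch below is an illustration, not course code: `Machine`, `run_swap`, and `run_foo` are hypothetical names, memory is modeled as an array of 4-byte words, the saved PC is a placeholder value, and the actual jump performed by call and return is not modeled. It also assumes the last block begins by loading temp into R1 with `R1 = M[SP]`, which the transcript implies but does not spell out.

```cpp
#include <cassert>
#include <vector>

// Toy model: RAM as 4-byte words, so byte address A lives in slot A / 4.
struct Machine {
    std::vector<int> mem;          // one int per 4-byte word
    int sp;                        // stack pointer, a byte address
    int r1 = 0, r2 = 0, r3 = 0;    // general-purpose registers

    explicit Machine(int bytes) : mem(bytes / 4, 0), sp(bytes) {}
    int& M(int addr) { return mem[addr / 4]; }   // M[addr], 4-byte transfer
};

// swap's body, one statement per instruction. On entry SP points at the
// saved PC, with ap at SP + 4 and bp at SP + 8.
void run_swap(Machine& m) {
    m.sp = m.sp - 4;           // SP = SP - 4      ; space for temp
    m.r1 = m.M(m.sp + 8);      // R1 = M[SP + 8]   ; load ap
    m.r2 = m.M(m.r1);          // R2 = M[R1]       ; load *ap
    m.M(m.sp) = m.r2;          // M[SP] = R2       ; temp = *ap
    m.r1 = m.M(m.sp + 12);     // R1 = M[SP + 12]  ; load bp
    m.r2 = m.M(m.r1);          // R2 = M[R1]       ; load *bp
    m.r3 = m.M(m.sp + 8);      // R3 = M[SP + 8]   ; load ap again
    m.M(m.r3) = m.r2;          // M[R3] = R2       ; *ap = *bp
    m.r1 = m.M(m.sp);          // R1 = M[SP]       ; load temp (assumed step)
    m.r2 = m.M(m.sp + 12);     // R2 = M[SP + 12]  ; load bp
    m.M(m.r2) = m.r1;          // M[R2] = R1       ; *bp = temp
    m.sp = m.sp + 4;           // SP = SP + 4      ; release temp
    m.sp = m.sp + 4;           // RET: pop the saved PC (jump not modeled)
}

// foo's side of the protocol. Returns {x, y} as they stand after the call.
std::vector<int> run_foo() {
    Machine m(64);
    m.sp = m.sp - 8;           // SP = SP - 8      ; locals x and y
    m.M(m.sp + 4) = 11;        // x
    m.M(m.sp) = 17;            // y
    const int x_addr = m.sp + 4;
    const int y_addr = m.sp;
    m.r1 = m.sp;               // R1 = SP          ; &y
    m.r2 = m.sp + 4;           // R2 = SP + 4      ; &x
    m.sp = m.sp - 8;           // SP = SP - 8      ; room for two parameters
    m.M(m.sp) = m.r2;          // 0th parameter, &x, at the bottom
    m.M(m.sp + 4) = m.r1;      // 1st parameter, &y, above it
    m.sp = m.sp - 4;           // CALL: push the saved PC...
    m.M(m.sp) = 0;             // ...modeled here as a placeholder value
    run_swap(m);
    m.sp = m.sp + 8;           // clean up the parameters
    std::vector<int> result = {m.M(x_addr), m.M(y_addr)};
    m.sp = m.sp + 8;           // deallocate the locals
    assert(m.sp == 64);        // every SP change has been balanced
    return result;
}
```

Running `run_foo()` leaves 17 where x was and 11 where y was, and the stack pointer ends up back where it started.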

So I went through that exhaustively because pointers are still mysterious for some people, and understandably, because they’re difficult. So if you start to understand them, at the hardware level, you have some idea as to what the pointers are trying to achieve. They really are raw locations of figures being moved around, or at least, manipulated. Okay?

What I want to do now is, I want to show you what the activation record and the function call and return mechanisms would be for the C++ version of swap. This is very close. I’ll write it again and leave it as an open question, for about two minutes, as to what the activation record must look like. I’ll just do an int ampersand a, int ampersand b. So just pretend we’re dealing with pure C, except that we’ve integrated the reference feature from C++. Okay? Int temp is equal to – what did I see over there? – a, and a is equal to b, and then b is equal to temp. Algorithmically, it looks similar, and it even sort of has the same side effect. But you may question how things are actually working behind the scenes, in order to swap things by reference, as opposed to by address.

Let me draw the activation record for you. This is the thing that’s referred to as a; this is the thing that’s referred to as b. This is always below that. That’s just the rule for parameters to a function or a method call. There’s a saved PC, right here, pointing to the instruction that comes right after the call to the swap function. And then, ultimately, the very first instruction does an SP is equal to SP minus 4. So this is the entire activation record that is in place before swap actually does anything in terms of moving bytes around. Okay? This a and this b, just because we refer to them as if they’re direct integers doesn’t mean that, behind the scenes, they have to actually be real integers. The way pointer – I’m sorry – the way references work is that they are basically just automatically dereferenced pointers. Okay?

So without writing assembly code for this, when I do this, and I do this, just because I’m passing in x and y – that’s the way you’re familiar with the swap function from 106B and 106X – just because you’re not putting the ampersands in front of those xs and ys, doesn’t mean that the compiler is blocked from passing their addresses.

This, right here, has to be an L value; this has to be an L value, as well. It means it actually has to name a space in memory, okay, that can be involved in an update or an exchange by reference. When this, right here, is compiled to CS107 assembly language, it actually does exactly the same thing that that does, right there. C++ would say, “Oh, this is x and y, but I’m not supposed to evaluate x and y because I can tell, from the prototype, that they’re being evaluated – they’re being passed by reference.” So the way it does it is, on your behalf, it just secretly says, “Okay, they really need this x and this y to be updated in response to the swap call.” That’s only gonna happen if this, as a function, knows where x and y are. Okay?

So the illusion is that a and b, as stand-alone integers, are second names for what was originally referred to as x and y. Behind the scenes, what happens is that this lays down a pointer and this lays down a pointer. You don’t have to use the word pointer to understand references; you can just say it’s magic and somehow, it actually does update the original values. It lays down the address of these things. The assembly code that is generated for this function, right here, understands, even though you don’t necessarily know this. It understands that it’s passing around pointers wherever references are expected. And so it automatically does the dereferencing of the raw pointers that implement those references, for you. The assembly code for this function is exactly the same as this, like, down to the instruction, down to the offsets; everything is exactly the same. Okay? Do you understand how that’s possible? Just because you don’t put ampersands in there, doesn’t leave the compiler clueless because it knows, from the prototype, that it’s expecting references. It’s just this second flavor of actually passing things around by address. You’re just not required to deal with the ampersands and the asterisks. Okay?
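
Side by side, the two flavors look like this. The names `swap_ptr`, `swap_ref`, `same_storage`, and `swap_demo` are hypothetical, chosen so both versions can live in one file; the claim about identical assembly is the lecture's claim for the CS107 code generation scheme, and the sketch only checks the observable behavior.

```cpp
#include <cassert>

// Pointer version: the caller writes the ampersands, the callee the asterisks.
void swap_ptr(int* ap, int* bp) {
    int temp = *ap;
    *ap = *bp;
    *bp = temp;
}

// Reference version: same algorithm. The compiler passes the addresses of
// the arguments and dereferences them on our behalf.
void swap_ref(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}

// True if the parameter `a` names the very same storage as the caller's
// variable, i.e. what was laid down for the reference is that address.
bool same_storage(int& a, const int* caller_addr) {
    return &a == caller_addr;
}

void swap_demo() {
    int x = 11;
    int y = 17;
    swap_ptr(&x, &y);                 // explicit addresses
    assert(x == 17 && y == 11);
    swap_ref(x, y);                   // no ampersands, same effect
    assert(x == 11 && y == 17);
    assert(same_storage(x, &x));      // a really is a second name for x
}
```

Taking the address of a reference parameter gives back the address of the original variable, which is the "second name" illusion made visible.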

It knows because of the way you declared these, that every time you refer to a and b in the code, that you’re really referring to whatever is accessible from the address that’s stored inside the space for a. Okay? Does that sit well with everybody? Okay, that’s good. So people freaked out a little bit on Assignment 1, I think – or Assignment 2 – when they saw local variables declared as references. Like, you’re used to the whole parameter being a reference, as if this data type and that data type, you’re only allowed to put them in parameter lists. That’s just not true.

If you do this right here, then that’s just traditional allocation of a second variable that happens to be initialized to the x variable – I’m sorry – to whatever x evaluates to. Okay? And that’s just normal. If you want to do this, you can. Okay? And this isn’t a very compelling example because ints are small, and there’s no algorithm attached to this code. So you’re not necessarily clear why you would do that. The way this is set up is that x really is set aside as an integer, with a 17; y is really set aside with its own copy of the 17. But z is declared – you drew them in 106B and 106X, this way. And the picture is totally relevant to the actual compilation measures because it really does store the address of y inside the space that’s set aside for z. Okay? Does that make sense? If I were to do this, you would totally draw that without the shaded box. Right? This and this, from an assembly code standpoint, are precisely the same. At the C and C++ language level, you’re required to put the explicit asterisk in here; you’re not required to there. Okay?
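
A small sketch of the local-variable case, assuming the declarations on the board were along the lines of `int x = 17; int y = x; int& z = y;` followed by the pointer equivalent. The function name and the extra writes through `z` and `p` are additions for illustration.

```cpp
#include <cassert>

// Returns y after writing through the reference and then through the pointer.
int local_reference_demo() {
    int x = 17;
    int y = x;       // y gets its own copy of the 17
    int& z = y;      // z is a second name for y; behind the scenes, an address
    int* p = &y;     // the same picture drawn with an explicit pointer

    assert(&z == &y);    // the reference and y name the same storage
    assert(p == &y);

    z = 21;          // automatically dereferenced: this updates y
    *p = *p + 1;     // explicit asterisk: this also updates y
    assert(x == 17); // x was only ever copied from, never aliased
    return y;
}
```

Both writes land in `y`; the only difference at the language level is whether the asterisk has to be written out.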

You may say, “Well, why would I ever use actual pointers, if references just become pointers?” Well, references are convenient, in the sense that they give the illusion of second names for original figures that are declared elsewhere. The thing is, with references, you can’t rebind the references to new L values. Do you understand what I mean when I say that? It’s a technical way of saying you can’t reassociate z with x, as opposed to y. Once it’s bound as a reference to another space, that’s in place forever. So you don’t have the flexibility to change where the arrow points to. You do have that ability with raw pointers. So you could not very easily build a linked list, if you just had references. Does that make sense to people? Okay. So that’s why the pointer still has utility, even in C++. Okay?

You saw a method where – I think it was, like, get last player, or something like that. There was some method, in one of the classes for Assignments 1 and 2, that returned a reference. If you were, like, “I’ve never seen that before; what does that mean?” All it’s doing is it’s returning a pointer behind the scenes, okay, and you don’t have to deal with it that way. You can just actually assume that it’s returning a real string, as opposed to a string star, or an int as opposed to an int star, and just deal with it like that. But behind the scenes, it’s really referring to an int or a string that resides elsewhere. Does that make sense to people? Yes, no? Okay. So even though in C – when you – in C++, you start dealing with references, it’s not like the compilation efforts that are in place, and the assembly code that’s generated, are fundamentally different in the way they support function call and return, and just code execution. Okay? It just has a different language semantic at the C++ level that happens to be realized, similarly, at the assembly code level. Okay?
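
The rebinding point and the return-by-reference point can both be seen in a few lines. Everything named here (`Node`, `sum`, `larger`, `rebinding_demo`) is hypothetical and stands in for the assignment code the lecture alludes to, which is not shown in the transcript.

```cpp
#include <cassert>

struct Node {        // hypothetical linked-list cell
    int value;
    Node* next;      // has to be a pointer: it gets re-aimed as the list changes
};

// Walks the list by repeatedly re-aiming `cur`, which a reference cannot do.
int sum(const Node* head) {
    int total = 0;
    for (const Node* cur = head; cur != nullptr; cur = cur->next) {
        total += cur->value;
    }
    return total;
}

// Returns a reference: behind the scenes an address comes back, so the
// caller can read or update the original int in place.
int& larger(int& a, int& b) {
    return (a > b) ? a : b;
}

void rebinding_demo() {
    int x = 1;
    int y = 2;
    int& z = y;
    z = x;                        // NOT a rebind: copies x's value into y
    assert(y == 1 && &z == &y);   // z still names y
    int* p = &y;
    p = &x;                       // a raw pointer can be re-aimed
    assert(p == &x);
}
```

Assigning through `larger(m, n)` changes whichever of the two caller variables holds the bigger value, with no asterisks in sight.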

References are the one addition to C++, I’m sorry, there’s a few – but there are – a lot of people, when they program in C++, they actually program in C, where they just happen to use objects and references. They don’t use inheritance; they don’t necessarily use templates all that often, although most of the time, they do. But that’s not an object-oriented issue. They don’t use inheritance; they don’t use a lot of the clever little C++isms that happen to be in there. They really just code in C, with references and objects. And they just think of references as more convenient pointers, less confusing. And they think of objects as structs that happen to have methods defined inside. That’s a reasonable analogy I’m throwing by you, I’m assuming? Okay?

Well, when you program as an OO purist, you’re not supposed to think of objects as structs, you’re supposed to think of them as these alive entities that actually are self-maintaining, and you communicate with them through the series of instructions that you know they’ll respond to because you read the .h file. Okay? Turns out that structs and classes – just looking at this from an “under the hood” perspective – structs and classes are laid down in memory virtually the same way. Not even virtually, exactly the same way. Okay? In C++, structs can have methods. The structs – I’m not misusing a word there – structs can have constructors, destructors, and methods, as can classes. Classes aren’t required to have any methods. The only difference between structs and classes, in C++, is that the default access modifier for classes is private, and the default access modifier for structs is public. Does that make sense to people? Okay? So the compiler actually views them as more or less the same thing. It’s just there’s a little bit of, basically, a switch statement at the beginning, where it says, “Was it declared as a struct or a class?” And then, it says, “Okay, it’s either private or public, by default.” Okay?
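
A minimal sketch of the struct/class comparison, using two hypothetical types `PairS` and `PairC` with the same fields. The only language-level difference exercised is the default access; the `static_assert` checks that both occupy the same number of bytes.

```cpp
#include <cassert>

struct PairS {                  // members are public by default
    int first;
    int second;
    int sum() const { return first + second; }   // structs can have methods
};

class PairC {                   // members are private by default
    int first;
    int second;
public:
    PairC(int a, int b) : first(a), second(b) {}
    int sum() const { return first + second; }
};

// Same fields in the same order: the two types have the same footprint.
static_assert(sizeof(PairS) == sizeof(PairC), "struct and class sizes match");

void access_demo() {
    PairS s{1, 2};
    s.first = 10;               // fine: public by default
    assert(s.sum() == 12);

    PairC c(1, 2);
    // c.first = 10;            // would not compile: private by default
    assert(c.sum() == 3);
}
```

Swapping the keywords and adding explicit `public:` or `private:` labels would make the two declarations interchangeable.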

When you do something like this: class, I’m gonna say – I’ll just do binky, and I’ll worry about the public section in a second. Let’s not worry about constructors or destructors. They’re interesting, but they’re just complicated, so I want to blow them off for a minute. And then, privately, let’s say that I have an int called winky, a char star called blinky, and let’s say I have a wedged character array of size 8, called slinky. And let’s just get all these semicolons to look like semicolons. And that is it. And those are the only data fields I push inside.

Every single time you declare one of these binky records, or binky objects, you do this. It shouldn’t be that alarming that you really get a blob of memory that packs all the memory needed for these three fields into one rectangle. And it uses exactly the same layout rules because, of all the fields, winky is declared before whatever I called it, blinky. And then, on top of that, is an array of 8 characters called slinky, and this entire thing is the binky type. It’s laid out that way because if you look at this, and think of it as a struct, the layout rules are exactly the same. Winky is stacked at offset zero; this is stacked on top of that, and this resides on top. This is an exposed pointer, which is why there are 4 bytes for that rectangle. This is an array that’s wedged inside the struct/class, which is why it looks like that for the upper 50 percent of it.

When you do something like this – forget about constructors, let’s just invent a method, right here. Let’s just assume that I have this method, int – and some other thing – donkey, where I pass in – let’s say a [inaudible] x and an int y. Okay? And let me actually declare another method, char star – I’m running out of consonants – minky, and all I want to do is I want to pass in, I’ll say, an int star called z. And I will inline the implementation of this thing to do this: int y – I don’t want to do that – int w is equal to asterisk z. And then, I want to return slinky plus whatever I get by calling this donkey method.

I’m not going to implement all the code for this, but I will do this: winky, winky, and that’s gonna be good enough. It’s a very short method. You don’t normally inline it in the .h; I’m just putting all the code on one board. Okay? Just look at this from an implementation standpoint, and let me ask something I know you know the answer to. Z right there is explicitly passed in as a parameter; w is some local variable, okay, that’s clearly gonna be part of the activation record for this minky thing. Why does it have access to winky right there, or slinky right there? Because you know that it has the address of the object that’s being operated on. Does that make sense?

The this pointer is always passed around as the negative 1th argument, or the 0th argument, where everything is slid over to make space for the this pointer. It’s always implicitly passed on your behalf. Okay? Do you know how, for like VectorNew, and VectorDispose, and VectorNth, VectorAppend, you always pass in the address of the relevant struct as the explicit 0th parameter? Okay? Well, they just don’t bother being explicit about it in C++ because they know that if you’re using that type of notation, you’re really dealing with object orientation. What really happens, on your behalf, is that it basically calls the minky function, not with one variable, but with two. It actually passes the address of something that has to be a binky class, or a binky struct, as the 0th argument.

So whenever you see this in C++, what’s really happening is that it’s doing this: it’s calling the minky function, passing in the ampersand of b, and the ampersand of n. Okay? That happens to be an elaborately named function. And I’m just going with the namespacing to make it clear that minky is defined inside binky. Okay? And I’m writing it this way because even though we don’t call it using this structure, right here, this is precisely how it’s called at the assembly code level. Okay?
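
Here is one way to write the board example out, next to free functions that spell out the hidden parameter. Several things are assumptions: the type of donkey's first parameter is inaudible in the recording and is taken to be `int`; donkey's body is never given, so a trivial placeholder is used; the constructor, the `base` accessor, and the friend functions `binky_minky` and `binky_donkey` are hypothetical additions that exist only so the sketch compiles and can be checked.

```cpp
#include <cassert>

class binky {
public:
    explicit binky(int w) : winky(w), blinky(nullptr), slinky{} {}

    // Placeholder body; the lecture never implements donkey.
    int donkey(int x, int y) { return (x == y) ? 1 : 0; }

    char* minky(int* z) {
        int w = *z;
        (void)w;                                // w is not used further
        return slinky + donkey(winky, winky);   // really this->slinky, this->donkey
    }

    const char* base() const { return slinky; } // hypothetical accessor

    // Free-function spellings with the receiver as an explicit 0th argument.
    friend int binky_donkey(binky* self, int x, int y);
    friend char* binky_minky(binky* self, int* z);

private:
    int winky;
    char* blinky;
    char slinky[8];
};

int binky_donkey(binky* self, int x, int y) {
    (void)self;                                 // placeholder body ignores it
    return (x == y) ? 1 : 0;
}

char* binky_minky(binky* self, int* z) {
    int w = *z;
    (void)w;
    // The receiver is propagated to the nested call, as described above.
    return self->slinky + binky_donkey(self, self->winky, self->winky);
}

void this_pointer_demo() {
    binky b(5);
    int n = 17;
    // b.minky(&n) behaves like binky_minky(&b, &n).
    assert(b.minky(&n) == binky_minky(&b, &n));
}
```

The method form and the two-argument function form return the same address inside `b`, which is the sense in which `b.minky(&n)` is "really" a call that passes `&b` and `&n`.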

There’s certainly an address of the first assembly code instruction that has anything to do with the allocation of this w variable. There’s going to be an SP is equal to SP minus 4, at the top of the assembly code that gets emitted on behalf of this. People believe that, I’m assuming? Yes, no? Okay? The reason that C++ compilers can just be augmented, at least the code emission part of it can be augmented, to really accommodate some parts of C++ fairly easily, references and traditional method calls against an object, is that, whenever it sees this, it says, “Okay, I know they mean this because they’re being all OO about it. But they’re really calling a block of code, okay, associated with the minky function inside the binky class, and I have to not only pass in ampersand of n as an explicit argument, but before that I have to splice in the address of the receiving object.” Does that make sense? Okay.

So the activation record that exists on behalf of any normal method inside a class, it always has one more parameter than you think. But it still is gonna have a saved PC – I’ll write it right there – and it’s gonna have two parameters on top of it. This right there, this would be the thing that is locally referred to as z. Okay? Below that, it would make space for this variable of type int called w. This right here might point to something like that; it certainly would point to something like that in this scenario, right here. Does that make sense to people?

Because n – I’m not using a pure stack frame picture here – but because n is declared with a 17 right there, this would obviously be declared and initialized that way. Okay? Make sense? This would point to some assembly code instruction associated with what comes after that call, right there. So there’s a little bit more to keep track of, but you’re fine as long as you just understand that k-argument methods – k just being some number – really are just like k plus 1 argument functions. Okay? When we’re thinking in terms of a paradigm, we don’t actually wonder about how C and C++ are similar. But when we have to write assembly code for something, we say, “Okay, I want to use the same code emission scheme for both C structs, with functions, and C++ objects, with methods.” You can use exactly the same scheme, the same memory model, for structs and classes, and for function call and return, by just looking at k-argument methods as if they’re k plus 1 argument functions, knowing that the 0th argument always becomes the address of the relevant object. Okay? Does that make sense to people?

When this, ultimately, calls this function right here, you understand that it’s really equivalent to that. We just don’t bother putting in the this pointer. Does that make sense? So internally, when I set up the function call, or the assembly code to actually jump to the binky colon colon donkey method, I actually have to make space for 12 bytes of parameters: 8 bytes for two copies of winky, and also 4 bytes for the this pointer. Make sense? And because there’s nothing in front of this method call, it just knows to propagate whatever value this is, and replicate it down here for the second method call that comes within the first one. Does that sit well with everybody? Okay?
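The idea that a k-argument method behaves like a k plus 1 argument function can be sketched in plain C. The board code is not reproduced in the transcript, so the struct fields, the bodies of both functions, and the names `binky_minky`, `binky_donkey`, and `demo_method_call` below are assumptions chosen only to mirror the discussion; the C++ reference parameter is modeled as a pointer.

```c
/* Hypothetical stand-in for the binky class on the board. */
struct binky {
    int winky;
    int slinky;
};

/* Roughly what a compiler arranges for binky::donkey(int, int):
 * the receiver's address travels as an extra leading argument.
 * The body is invented for illustration. */
static int binky_donkey(struct binky *this_, int a, int b)
{
    return this_->slinky + a + b;
}

/* binky::minky(int& z): one declared parameter, two at the machine level. */
static int binky_minky(struct binky *this_, int *z)
{
    int w = *z;             /* local variable in the activation record */
    this_->winky += w;      /* a bare "winky" really means this_->winky */
    /* A bare donkey(winky, winky) call forwards the same receiver address. */
    return binky_donkey(this_, this_->winky, this_->winky);
}

int demo_method_call(void)
{
    struct binky b = { 1, 2 };
    int n = 17;
    /* C++ source: b.minky(n);  equivalent call: binky::minky(&b, &n) */
    return binky_minky(&b, &n);
}
```

In this sketch the receiver pointer is passed explicitly both times, which is the bookkeeping the C++ compiler does implicitly.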

Had I had a variable of type binky reference here – I can’t change the picture, but I’m just improvising this one point. If I had done something like this: binky reference d, and done this, then the address of whatever binky object d refers to would have to be the thing that’s laid down in the this portion of the activation record for the donkey method. Okay? That make sense to everybody? Yes, no. Got a nod; it doesn’t tell me. Okay. So even though you think you’re data-centric when you program in C++, and you’re verb-, function-, or procedure-centric when you’re programming in C, the compilers really see them as just different syntaxes for exactly the same thing. Okay? And they ultimately become this stream of assembly code instructions that just get executed in sequence, with occasional jumps and returns back. And it just promises, because the compiler makes sure that it can meet the promise, to emulate what the procedural C or the object-oriented C++ programmer intended to do. Okay? Make sense? Yeah.

Student:[Inaudible] use static?

Instructor (Jerry Cain):That’s completely different. We don’t see static methods too much in 106 and 107. You know how the word static seems to have, like, 75 different meanings? Well, it has a 76th meaning, whenever the word static decorates the definition of a method inside a class. Okay? I can go over this. Suppose I have a class called fraction, okay, and publicly, I have a constructor fraction where I pass in an int m and an int denom and I might even default the denominator to be 1. So you can actually initialize fractions to pure integers. And I have this whole stream of methods. But then, I also have this one method inside, called reduce, which is intended to take a fraction that’s stored as 8 4ths and bring it down to 2 over 1. Does that make sense to people?

Well, as part of that, typically what happens is that you’ll write some function: int greatest common divisor, int x and int y, and it’s still put inside the fraction class because it might only be relevant to your fraction class. You actually pass in these two parameters, and it really just behaves as a function, because it doesn’t need the this pointer in order to compute the greatest divisor that goes into x and y. So a lot of times, what you’ll do is you’ll mark it as static. And what that means, in the context of a C++ class, is that you don’t need an instance of the class to actually invoke it. You can actually just invoke it as a stand-alone function. In fact, it really is a regular function that happens to be scoped inside the definition of a class, as if the class has a namespace around it. I didn’t mean to put public here; I meant to put private.

This is how similar static methods and functions really are. Remember Assignment 2? Some of you had that headache of trying to pass in a method pointer to bsearch, when it actually had to be a function pointer, and you’re all like, “What’s the difference? They’re all the same to me.” Static methods, because they don’t have this invisible this pointer being passed around, really are internally recognized as regular functions. So if I wanted to define my actor compare function, not as a function but as a static method inside, I could have done that. I could have passed the address of a static function to bsearch. Does that make sense? Static method, I meant. Okay? Good? Okay.

So static, I don’t want to say you should avoid it. There are certainly places where static is a good thing to do. But it interferes with your ability to use inheritance when you’re dealing with C++. You don’t get the right thing, in terms of inheritance, when you’re dealing with static methods, which is why you don’t see them as often. And normally, when things are written as methods inside a class like this, it’s because they are actions against objects, as opposed to actions on behalf of the class, like this would be, right here. Okay? Any other questions? Okay.

So next Tuesday, when we pick up on the normal discussion section cycle, you will get a section handout, where you’ll have some elaborate – I don’t want to say elaborate – some meaty examples on C and C++ code generation. I’m only gonna test you on C code generation, with no objects and no references, on the mid-term, although it’ll be fair game on the final.
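Going back to the static method discussion, the distinction can be sketched in C, where the receiver is always explicit. This is only an illustration under assumptions: the names `fraction_gcd`, `fraction_reduce`, `compare_ints`, and `contains_int` are invented here, and the sketch uses C functions to stand in for C++ methods. A function that needs no receiver has an ordinary signature, which is why something like it can be handed to `bsearch`.

```c
#include <stdlib.h>

struct fraction {
    int num;
    int denom;
};

/* Plays the role of the static method: no receiver argument at all. */
int fraction_gcd(int x, int y)
{
    while (y != 0) {
        int t = x % y;
        x = y;
        y = t;
    }
    return x;
}

/* Plays the role of a regular method: the receiver is the explicit 0th argument. */
void fraction_reduce(struct fraction *this_)
{
    int g = fraction_gcd(this_->num, this_->denom);
    if (g != 0) {
        this_->num /= g;
        this_->denom /= g;
    }
}

/* A receiver-free comparison function has exactly the shape bsearch expects. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if key occurs in the sorted array, 0 otherwise. */
int contains_int(const int *sorted, size_t n, int key)
{
    return bsearch(&key, sorted, n, sizeof(int), compare_ints) != NULL;
}
```

A function shaped like `fraction_reduce` could not be passed to `bsearch` directly, because its first parameter is reserved for the object being operated on.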

Next Wednesday, I will give out Assignment 5; it will not be a programming assignment. It’ll be a written problem set, where you’re not required to hand it in. You’re just required to do it, and make sure that your answers sync up with the answers that I give out in the key. And it’ll be totally testable material. You’re just required to effectively do it by 6:59pm, on Wednesday, May 7th. Okay? Does that make sense? So you have a full week to do this one assignment, which means you’ll all do it on Tuesday, May 6th. But you’ll have a week to actually kind of recover from the Assignment 4 experience, and learn all the pointer stuff. Okay? This make sense?

What I want to do is I want to start talking, a little bit, about how compilation and linking works. We completely disguise compilation and linking with this one word called make. It’s this magic verb like, “Just do it, and make it work for me.” And somehow, it actually usually does do that for us. But when you compile C code – C++ code as well, but C code I’ll focus on – it normally invokes the preprocessor, which is dedicated to dealing with pound defines and pound includes. Then it invokes what is truly called the compiler. That’s what’s responsible for taking your dot c files, and your dot cc files, and generating all those dot o files that happen to appear in your directory after you type make. Make sense? And then, after compilation, there’s this one thing you don’t think about, because CodeWarrior, and Xcode, and Visual Studio C++, they all make compilation look like it’s the full effort to create an executable. But once all those dot o files are created, and one of the dot o files has code for the main function inside, there is this step called linking, which is responsible for taking a list of dot o files; stacking them on top of one another; and making sure that any function call that could possibly ever happen during execution is accounted for, because there’s code for that function in one of the dot o files.

And then, it just wraps an a dot out, or six degrees, or my executable, or whatever you want to call it, around the bundle of all of those dot o files. Does that make sense? Okay. Somebody had mentioned, the other day, that a line like this, why wouldn’t it be replaced by a call of, like, PC plus 480, or PC minus 1760, to actually jump to the actual PC-relative address where the swap function starts? That actually will happen in response to the linking phase, because everything’s supposed to be available to build the executable file. And so if it knows where everything is, in every single dot o file, and it knows how they’re being stacked up one on top of another, it can replace things like this with their PC-relative addresses. Maybe, at the time that things are linked together, it knows that the swap function happens to be the one defined right above this. So when it calls swap, it actually calls PC minus, let’s say, 96, or something like that. Okay? Make sense? Okay. Let me get some clean board space, and just spend five minutes, that’s all I’ll be able to do, really, talking about the preprocessor.

And I’m doing – okay, five minutes. Whenever you’re dealing with pure C, if you wanted to declare global constants, traditionally, until recently, you only had pound defines as the option for defining nice, meaningful top-level names for these magic numbers, or these magic strings. So something like this: pound define – I’ll just do k width, let’s say it’s 480 pixels, so I write down 480. And then I have something like this, k height 720. And then, I have some pound includes; I’ll talk about those later. And then, there’s all this code right here, where usually, it’s the case that some part of it is framed in terms of that and/or that. So maybe it’s the case that you have a printf statement with width is percent d backslash n – oops, right there – and then, you feed it the value of k width. Maybe a more meaningful thing might be something like int area is equal to k width times k height. And it exists in your code somewhere.

When a dot c file is fed to GCC or G++, there’s a component of either GCC or G++ called the preprocessor that doesn’t do very much in terms of compilation. It doesn’t do compilation as you know it. It actually reads the dot c file that’s fed to it from top to bottom, left to right, like we would read a page. And every time it sees a pound define, it really only pays attention to these words. What it’ll normally do is, it’ll read in a line. And if there’s nothing that has a hash at the beginning of the line, so it doesn’t involve any pound defines or pound includes, it’ll just say, “Okay, I just read this one line, and I don’t care about it. So I’m just gonna publish it again as part of my output.” When it reads something like this, it internally has something like a hash set, although it’s probably – I don’t want to say it’s strong, eh, but it’s probably strongly typed.

It associates this as a key with this string of text; it doesn’t even really know that it’s a number. Okay? It just says, “Everything between the white space that comes right after width, and the white space that comes after the first token, that’s associated as a text stream with that identifier, right there.” Okay? It does the same thing for that, right there, and as the preprocessor continues to read, every time it finds either that or that, in your code – the one exception being within a string constant – but everywhere else, it says, “Okay. Well, when I see that, they really meant that number right there.” So it lays down 480, as if you typed it there explicitly. Does that make sense? Okay? You know the alternative would be to put lots of 480s all over the code base, without using a pound define, or lots of 720s around the code base, without using a pound define. But then you change the dimensions of your rectangle, in your graphics program, or whatever. And then, you have to go through and search and replace all the 480s, to make them 520s or whatever. It makes some sense to consolidate that 480 to a pound define, if the pound define is the only thing that is available to you. But the pound define is really very little more than pure token, search, and replace. Okay? Does that make sense to everybody? So the responsibility of the preprocessor is to read into the dot C file, and to actually output the same dot C file, where all the pound defines and other things, like pound includes, have been stripped out. And the side effects that come with pound defines have been implanted into the output. So this, right here, would be missing these two lines. Anything right here, that doesn’t involve those two pound defines, would be output verbatim. This one line would be output verbatim, except that would have a 480 replaced, right there. Okay? And this, right here, would have [inaudible], and that right there, it would probably have a 480 times 720. 
Although, the compiler might actually do the multiplication for you, if it’s a very good compiler. Okay? Does that make sense? Actually, the – I’m sorry – the preprocessor would not do that, though. This would just become 480 times 720. Okay? And the fact that it happens to be text – it is really text, at the moment – it’s just that, when it’s fed to the next phase of compilation, it’ll go ahead, and it’ll chew on those things, and recognize that they really are numbers. And that’s when it might do the multiplication. Yeah?
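The board example described above can be written out roughly as follows. The constant names and values come from the lecture; the function names `rectangle_area` and `print_width` and the exact format string are assumptions added so the fragment compiles on its own.

```c
#include <stdio.h>

#define kWidth 480
#define kHeight 720

int rectangle_area(void)
{
    /* After preprocessing this line reads: int area = 480 * 720;
     * Any multiplication happens later, in the compiler proper. */
    int area = kWidth * kHeight;
    return area;
}

void print_width(void)
{
    /* After preprocessing: printf("Width is %d\n", 480); */
    printf("Width is %d\n", kWidth);
}
```

With GCC, running `gcc -E` on a file like this stops after preprocessing and prints the substituted text, which is one way to check what the compiler actually receives.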

Student:[Inaudible] define k width and then you declare a variable called k width underscore second, or whatever?

Instructor (Jerry Cain):Yeah, it won’t do that. It has to be recognized as a stand-alone token. So if I did this, for instance, like k width underscore 2, something like that, it’ll not do sub-token replacement for you. It also won’t do it within string constants, either. Okay? Okay. I’m not done with the preprocessor yet, but I can certainly answer your question. Go ahead.
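A small sketch of the two exceptions just mentioned: replacement only applies to whole tokens, and never inside string constants. The variable `kWidth_2`, the string `label`, and the two helper functions are hypothetical names used only for this illustration.

```c
#include <string.h>

#define kWidth 480

/* kWidth_2 is a different token, so the preprocessor leaves it alone. */
int kWidth_2 = 7;

/* The text inside a string constant is not touched either. */
const char *label = "kWidth";

int width_plus_extra(void)
{
    /* After preprocessing: return 480 + kWidth_2; */
    return kWidth + kWidth_2;
}

size_t label_length(void)
{
    /* Still the six characters k, W, i, d, t, h. */
    return strlen(label);
}
```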

Student:[Inaudible] verbatim? [Inaudible] printouts?

Instructor (Jerry Cain):Yeah.

Student:What does the preprocessor do with that line?

Instructor (Jerry Cain):The preprocessor would, basically, output this line right here, except that the k width would have been spliced out and replaced with the 480. Okay? It’s supposed to be functionally equivalent. It’s just allowing for the consolidation of magic numbers and constants. So that’s what we want to use the pound defines for: to just consolidate all of this information at the top. Okay? Yep?

Student:[Inaudible] preprocessor [inaudible]

Instructor (Jerry Cain):It actually does not know the 480’s a number yet. They just are incidentally digit characters, and that’s it. But if I had done this, like that right there, the preprocessor would be, like, “Okay. I’m just gonna take that 6 character string, and put it right there, and right there.” And it’s only later on, during real compilation, post preprocessor, that it’ll be, like, “Hey, you can’t do that.” Okay? Does that make sense? Okay. And that wouldn’t happen. Okay. So come Monday, I will finish up the preprocessor. Pound defines are easy; pound includes are not. There’s actually some work involved with that. And then, I’ll talk about compilation and linking. Okay. I will see you on

[End of Audio]


Lecture 11: Programming Paradigms

Section Assignment 3
8 pages

Section Assignment 3 Solutions
4 pages

Topics: Preprocessing Commands - #Define as a Glorified Find and Replace, Preprocessing Macros - Preprocessor Commands With Arguments, Example of Macro Usage in the Vector assignment to Calculate the Address of the Nth Element, Assert Macro Implementation, How Asserts are Stripped Out for the Final Product Using #Ifdef and #Define, C Macro Drawbacks When Given More Complex Arguments, #Include as a Search and Replace Operation, < > Vs. " ", Output to the Compiler After Preprocessing, How to View the Output of the Preprocessor Using Gcc, How to Avoid Circular #Include Loops Using #Ifndef, #Define, and #Endif, Visual Representation of the Result of Preprocessing (The Translation Unit) that Is Passed to the Compiler to Create a .O File, An Example Illustrating the Preprocessing and Compilation of a Simple Program



Instructor (Jerry Cain):Here we go. Hey, everyone, welcome. I actually have two handouts for you today. They’re posted online, and we’ll distribute them in lecture right now, as I’m starting. One of them is tomorrow’s section handout. Its focus is pretty much on the assembly code generation I was talking about last Monday, Wednesday, and Friday.

The second of the two handouts is Assignment 5. I was gonna hand it out on Wednesday. Then I said, “You know what? I’ll hand it out a little bit earlier,” because I just want to like afford everyone the flexibility to work on these problems when they have time.

You don’t have to hand anything in for Assignment 5. It is a written problem set. There are no programming exercises whatsoever. It’s just lots and lots of practice with this code generation stuff that we’re doing in section tomorrow, but you’ll also certainly see C code generation on the midterm next Wednesday evening.

So the only deadline I’m really imposing on Assignment 5 is that you actually do the problems, make sure your answers are consistent with mine. And I say it that way because it doesn’t have to be exact on an instruction by instruction basis, but you have to just make sure that your code dereferences things the right number of times and loads things the right number of times in order to feel comfortable with that material because it definitely will appear on the mid-term I give next Wednesday evening at 7:00 p.m., okay.

I don’t know where the mid-term is gonna be yet. I’ll probably figure that out in the next two days. This Wednesday I will certainly give out a practice mid-term and a practice solution so that you have some fodder to play with over the course of the next week. But all of the section handout problems, if there were 20 problems on those section handouts, 19 of them came from old practice midterms.

So definitely make sure you understand those. You’re welcome to bring in any lecture notes, any of your assignment printouts, whatever you want to bring in. You can bring in textbooks. I don’t see the value of it since everything that you’re really responsible for has been covered either in lecture or in handouts. So that’s that.

I know Assignment 4 is due this Thursday. I think people have started it, and they are true believers when I say that it is probably the most difficult of the four you’ve seen all quarter. So start that soon, even if for no other reason than just doing a small component of it tonight so you know what you’re up against with this Thursday deadline, okay.

When I left you last time, I had just started talking about the C preprocessor. I want to talk about preprocessing versus compilation versus linking. You’re used to, from at least 106 memories, it all being the same thing. You clicked command all or you did a drop down and you clicked build, and all of a sudden, this double clickable app was created.

That’s because it does these three things in sequence behind the scenes, and it doesn’t very clearly advertise whether or not something in preprocessing or compilation or linking broke down. You don’t necessarily know the difference.

So I want to focus on the differences and tell you what each phase is responsible for. And when I left you last time, I had just introduced the notion of a pound define, and I advertised it quite clearly as something that was no more sophisticated than glorified search and replace of this token with that text right there. So if I do this, K height – whoops, and I say that this is 80, then anywhere K width and K height appear beyond these two lines, it actually substitutes this for that and this for that.

The only exception is it won’t do that in string constants, but it’ll even do it in future pound defines. So if I were to do something like this, K perimeter, and I equated it with two times, open paren, K width plus K height, close paren, then this would not only substitute anything down here, but it really would replace that with a 40, and this right there would be replaced with an 80. So by the time you got around to the definition of K perimeter, it would see this not as this token stream, but as two times open paren, 40 + 80, close paren, okay.
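The K perimeter example can be sketched like this, using the 40 and 80 from this lecture. The function name `perimeter` is an assumption added so there is something to call.

```c
#define kWidth 40
#define kHeight 80
#define kPerimeter 2 * (kWidth + kHeight)

int perimeter(void)
{
    /* After preprocessing: return 2 * (40 + 80);
     * The arithmetic itself is left to the compiler. */
    return kPerimeter;
}
```

One detail worth hedging: a standard C preprocessor stores the replacement text of `kPerimeter` as written and expands `kWidth` and `kHeight` when `kPerimeter` is actually used, rather than at the point of definition. The text that reaches the compiler is the same either way.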

It doesn’t evaluate it. It doesn’t even recognize that they’re integers. It just looks at it as plain text, but the substitution of this there and this there is exactly what you do want. Pound defines are really nothing more than glorified search and replace. We use them in C, pure C, to consolidate what would otherwise be magic numbers and magic string constants, and attach meaningful names to them. What you may not know about pound defines is that you can define an extension to the pound define, and you can actually pass arguments to pound defines as if they’re functions.

They’re not called functions. They’re called macros, so I could do something like this: pound define MAX of A, B. As long as there’s no space between that paren and the final character of the token right there, it’s clearly understood to be a little bit more than just a pound define constant. It’s a pound define expression that’s parameterized on whatever A and B adopt in context when they’re used later on, okay.

So if you want some quick and dirty way to find the larger of two numbers, you could substitute it with this. And just to be clear about order of operations and evaluation of everything, we usually see an intense number of parentheses put around these things, just so that there’s absolutely no ambiguity as to how things should be evaluated if this thing is just plopped in context somewhere later on, okay. Order of operations might otherwise confuse things. You’ll actually see an example of that in a second.

Anywhere you see this later on, you wouldn’t type it in this way, but just pretend that you did. If I, for whatever reason, needed it to tell me that 40 was, in fact, greater than 10, when I see this in code later on, it really will go and find the max symbol, and it will – every place that there was an A, it will place a 10.

Every place there was a B, it will place a 40. So this would, during preprocessing, be replaced by 10 greater than 40, if true, 10, otherwise 40. And even though that’s an obtuse way of identifying that 40 is greater than 10, that is the textual substitution you would get in response to that, okay.
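Here is a minimal sketch of the macro and its expansion. `MAX` follows the description in the lecture; `BAD_MAX` and the three helper functions are hypothetical additions that show why the heavy parenthesization matters.

```c
/* Fully parenthesized, as described in the lecture. */
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Hypothetical version without parentheses, for comparison only. */
#define BAD_MAX(a, b) a > b ? a : b

int larger_of_10_and_40(void)
{
    /* Expands to: (((10) > (40)) ? (10) : (40)) */
    return MAX(10, 40);
}

int doubled_max(void)
{
    /* Expands to: 2 * (((3) > (1)) ? (3) : (1)) */
    return 2 * MAX(3, 1);
}

int doubled_bad_max(void)
{
    /* Expands to: 2 * 3 > 1 ? 3 : 1, which groups as (2 * 3) > 1 ? 3 : 1 */
    return 2 * BAD_MAX(3, 1);
}
```

The unparenthesized version silently changes meaning when it is dropped into a larger expression, because the surrounding operators bind more tightly than the comparison and the conditional.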

So it’s like a pound define. It’s this quick and dirty way to inline functionality that’s otherwise complicated with something that’s a little bit more readable. You could, of course, go with a function, but you already know from the assembly code you saw last week regarding function call and return, that a lot of time is spent setting up parameters, writing the parameters there, jumping to the function, and then after it’s all over, jumping back and cleaning up the parameters.

It’s not that much work. It may be ten assembly code instructions, but this is the type of thing that would expand to like three or four assembly code instructions. So the entire effort of determining a maximum number using just traditional function call and return would spend 70 percent of its time, or something about that percentage, just calling and returning from the function. Do you understand what I mean when I say that?

Okay, using this pound define thing, this is this very efficient way of jamming in an expansion of this every place MAX with two parameters is actually used. Now it doesn’t actually require that A and B be integers. I mean, of course, we know to look at them, that they should be integers, but if I were to do this, get rid of that.

If I were to be senseless and do something like this, this would eventually cause problems. But as far as preprocessing is concerned, all it would do here is templatized search and replace.

Every place there’s an A there, you’d see a 40.2, and every place there’s a B, you’d see a hello as a string constant. And only during compilation, when it reads the expansion of this as if we typed it in that way, will it say, “You know what? I don’t like you comparing doubles to char stars, okay, using a greater than sign.” Okay, so you would get an error eventually, but you wouldn’t get it via the preprocessor. Do you understand what I mean when I say that? Yep.

Student:Is there a good or bad style to do something like that?

Instructor (Jerry Cain):Well, in something like this?


Instructor (Jerry Cain):I actually don’t see the problem with this as long as you have been doing it for more than a few days. I mean this is – I’ll show you an example of two pound define macros that we used in Assignment 3, one of which you didn’t even know you were using, and the other one is in my solution, okay.

This is obviously a hack just to introduce a point: that preprocessing is still just text search and replace, and that the problems it leads to later on might be tracked and flagged in compilation, or they might be flagged when you get a (inaudible) at 4:00 in the morning. Okay, you just don’t know. There was a question right here.

Student:Do you receive from this (inaudible)?

Instructor (Jerry Cain):The question is do you receive anything. It doesn’t receive in the sense of return value, but this is an expression that evaluates to either the result of evaluating A or evaluating B. So this one, before I crossed it out, this would evaluate to the number 40. So if I wanted to, I could do this, all caps, of like, let’s say Fibonacci of 100 and factorial of 4,000, and I’m curious as to which one’s bigger.

There’s actually a problem with that that I’ll outline in a second, but that would really bind max to the larger of those two values, okay. It’s interesting that this is something – there’s something about that call that I don’t like, but I’ll explain that in a second. Let me just show you some reasonable uses of pound defines. We’ll be more central.
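The reservation hinted at here is not spelled out in this part of the lecture. One well-known hazard with a macro like this, offered only as a likely candidate, is that an argument expression appears more than once in the expansion and so can be evaluated more than once. The sketch below counts evaluations; `expensive`, `calls`, and `count_calls` are hypothetical names standing in for something costly like the Fibonacci and factorial calls.

```c
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

static int calls = 0;

/* Stand-in for a costly computation; it just counts how often it runs. */
static int expensive(int x)
{
    calls++;
    return x;
}

int count_calls(void)
{
    int larger;
    calls = 0;
    /* Expands to:
     * (((expensive(3)) > (expensive(5))) ? (expensive(3)) : (expensive(5)))
     * Both arguments run for the comparison, then the winner runs again. */
    larger = MAX(expensive(3), expensive(5));
    (void)larger;
    return calls;
}
```

A true function would evaluate each argument exactly once before the call, so the extra work only shows up with the macro form.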

Do you know how in Assignment 3 there were some situations where you wanted the assert condition to be greater than or equal to zero and less than logical length, and in other ones, you wanted it to be less than or equal to logical length? And depending on how aggressively you reuse and call vector nth yourself, there may have been situations where you were blocked out by the assert statement that sat at the top of the implementation of vector nth.

Vector append or vector insert, the logical length is a completely reasonable parameter to accept, but if you called vector nth using that value and you had the right assert statement inside, it would actually block you out and error out and end the program. Do you understand what I mean when I say that?

Well, what I did, rather than writing a function that computed the address of the nth element in a blob of memory, I wrote it as a pound define macro. I just did this. Pound define, I called it nth elem address, and I framed it in terms of base and elem size and index, okay. And I equated that all in the same line. You can actually do that and it’ll allow you to continue the definition on the next line. I equated it with this like that, okay.

I could have written it as a function. The reason I wrote it as a separate thing altogether is because I wanted something that did the pointer arithmetic for me without the asserts. I wanted the control to go and get the millionth element, even if I were dealing with an array of length two, but I would actually call this from within vector nth after I’ve done the assert. Does that make sense to people?

And so this way I had this quick and dirty way of actually doing this type of pointer arithmetic just once, studying it and saying, “Okay. This needs to be careful code because it’s the type of code that can go wrong if you’re not careful about it.” Make sure that this is doing exactly what I want it to do, and then call this everywhere. I see a lot of people do the pointer arithmetic like seven or eight times in vector dot c, okay. And if you’re cutting and pasting it, that’s not great, okay.

If you’re cutting and pasting and you got it right the first time, it’s probably okay, but I’d much rather see people consolidate this to either a helper function, or now that we know it, a little macro that jams this calculation in the code for me even though it looks like a function, okay. Does that make sense?

There’s no asserting going on here whatsoever, so I can get the asserts right, and rather than calling vector nth everywhere, I can just call nth elem address, okay, wherever I would otherwise call vector nth internally. So I never have to worry about whether or not the off by one nature of what vector nth allows in terms of incoming values will block me out accidentally, okay. Make sense?

Now the thing about this is this looks like a function call. There is really no type checking done on these things right here, so this only works post preprocessor time if this gets specified to be a pointer, and these are things that can be multiplied together and ultimately be treated as an integer offset, okay.

You usually do get that right, but it’s not as good as a true function in that regard, because the preprocessor doesn’t do error checking at all, but it does push the expansion to the compilation phase, where it does do error checking, okay. You usually don’t like the separation of code generation, or I’m sorry, let’s say C code generation, from the actual type checking, but you just deal with it with the pound defines. Question over there?

Student:Well, with the (inaudible) equal to or as integers?

Instructor (Jerry Cain):You could certainly. When I used this, I implemented void star vector nth, which took a vector star v and an int; I think it was called position. And as it turns out, it was two lines long. I had the assert position greater than or equal to zero. I had the assert – I actually had these on one line, but I’m just making it clear – position is less than v arrow log length, spell it right, log length. And then right at the end, I said return nth elem address, where I passed in v arrow elems, v arrow elem size, and position.

So in response to this macro call right there, it’s not really a call. That’s kind of the wrong word. It’s just the placement of a macro so that it expands during preprocessor time to that as if we typed it in ourselves that way, okay. And as an expression, it evaluates presumably to the right address, so that’s what gets returned, okay. That make sense? Okay.
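The macro and the short function described above might look roughly like this. The actual board code and the assignment's struct layout are not shown in the transcript, so the spellings `NthElemAddr`, `elems`, `elemSize`, and `logLength`, and the reduced three-field `vector` struct, are assumptions.

```c
#include <assert.h>

/* Pointer arithmetic only, with no bounds checking. The backslash
 * continues the definition onto the next line. */
#define NthElemAddr(base, elemSize, index) \
    ((char *)(base) + (index) * (elemSize))

/* Hypothetical, reduced version of the assignment's vector struct. */
typedef struct {
    void *elems;     /* start of the element storage */
    int elemSize;    /* size of one element in bytes */
    int logLength;   /* number of elements in use */
} vector;

void *VectorNth(const vector *v, int position)
{
    assert(position >= 0);
    assert(position < v->logLength);
    /* The char * produced by the macro converts to void * without a cast. */
    return NthElemAddr(v->elems, v->elemSize, position);
}
```

Code that legitimately needs the address one past the last element, such as an append or insert, can use `NthElemAddr` directly and skip the asserts in `VectorNth`.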

There are some drawbacks to this. It’s quite clear that that’s a macro because I put it right there. What you may not have known is that these right there are also macros, okay, and I’ll show you what they look like in a second. They’re a little weirder, but nonetheless, they are in fact macros, and that’s how they can be stripped out using some compilation flags so that they’re not present in the final executable that you ship as a product. Somebody had a question, yeah.

Student:Quickly, that’s not your data void star, but the top was char star (inaudible)?

Instructor (Jerry Cain):Actually, in pure C it’s not a problem. Actually, in none of the languages is it a problem, because remember void star is like the all accepting pointer. So what you’re doing when you assign something of type char star, which is what this becomes, and you return and funnel it through a void star, is what’s called upcasting. You’re just going from a more specifically typed pointer to something more generic, and it just knows that there’s no danger in that direction.

It’s when you downcast and you say, “I have this generic pointer, but now I’m claiming that it was really this very specific pointer all along,” but you really do need a cast in many situations, certainly, if you have references involved, okay. Other questions? Okay.

So the problem with this that I did outline is that you don’t get type checking at all during the preprocessing phase. There are other problems associated with this, but let me talk about what assert looks like. You’ve seen – I imagine 80 to 90 percent of you have actually seen an assert statement fail, and you’ve seen what happens when the condition that’s passed to assert isn’t met.

The assert dot H file defines assert. It doesn’t define it as a function. It looks like a function call, but it really is this. Define assert, and I’ll just put C, O, N, D, and it’s equated with this. It actually evaluates cond, okay. And if the condition is true, you know that it just returns in the functional sense, although it really is not a function call.

When this passes, it just basically evaluates to a no op and doesn’t do anything, and just continues to the line after the assert. What it does is it needs to have at least one statement in the – sort of the if region of this ternary thing. So it just casts zero to be a void just to say, “Okay, don’t do anything with this zero. Don’t allow it to be assigned. Just has to be present to sit in between the question mark and the colon.”

This right here is some elaborate thing that prints out to standard error some string that involves the filename and the line number of this assert in the original file, followed by an exit. Actually, it doesn’t have this right there, okay.

So you may not understand the syntax and how everything is exactly relevant to the implementation of assert, but you know that this looks harmless and this looks pretty drastic, okay. So whenever you put assert position is greater than zero in your code, what you’re really asking the preprocessor to do for you is say, “Yeah, take this assert position greater than zero, greater than or equal to zero, and replace it with position greater than or equal to zero, oh, awesome. Don’t do anything, otherwise end my program and tell me what line this thing failed at.”

Does that make sense? Okay. The actual full definition is this. If def NDEBUG, that’s kind of like a pound define, but it’s an if question about the presence of a pound define. If that’s the case, then pound define assert of condition to just be a no op – whoops, else rather, we’ll do this. So this is the thing you’re using in Assignment 3, and this is the way it’s – this is really turned on.

If you pass or you define a pound define constant prior to this called NDEBUG for no debugging, then it replaces all of your assert calls with this harmless statement right there. Okay, so it technically is one statement, and this zero compiles to just one line of assembly that’s optimized down to zero lines of assembly. But that’s how the asserts go away when you compile it a different way so that there’s no danger of asserts actually failing on your behalf in production code. Does that make sense to people? Okay.

There are some problems with the definition of assert, not really. I actually want to go back and revisit this function right here, and in particular, that right there, that particular use of max, and start to show you the drawbacks of the preprocessor. And this is actually related to why I prefer static const globals as opposed to pound define constants because I’m trying to like get you away from the preprocessor to the degree you can.

This right here, it is so literal about a textual search and replace that it will call one of these things once, and the other one, the larger of the two might quite arguably be the more time consuming one. It will call it twice. Why is that the case?

Because this right here, because of that pound define definition for max over there, that one expands to this is equal to – this is the case that Fibonacci of 100 is greater than factorial of 4,000. If so, then return Fibonacci of 100, else return factorial of 4,000. Okay, and then there would be that right there.

Okay, that’s how literal the text and replace is, text, search, and replace is. And so you actually get the imprint of this is a very time consuming function. This isn’t quite as bad because it’s a linear recursion. But if you turn it up to Fibonacci 100 is greater than this right here, it’s gonna take not only a long time, but twice as long as a long time, okay, because of the second call right here. Make sense?

So it doesn’t actually cache results internally, or it’s not clever at all. It assumes that you really meant to type it this way because of the way you framed the definition of the pound define, okay. There are even – even so, even if it’s kind of stupid from an efficiency standpoint, at least it’s correct. Clever C programmers at one point go through this phase where they try to do as much in a single statement as possible, and so they might want to figure out the larger of two variables, and simultaneously increment the two variables, so they’ll do something like this.

Oh, yeah. I want to know the larger between M and N, but I also want to increment both of them at the same time, okay. It will actually commit to a ++ on the smaller one just once, and will commit to a ++ on the larger one twice because of the way it expands it. This would be replaced at preprocessor time with this. Is M greater than N? Oh, and by the way, increment them.

Whoops, oh, it is, okay. Well, then return the value and then increment it, otherwise, return the other element and increment it. So you certainly see that ++ is being levied a total of three times, okay. That make sense? It’ll return one more than the true larger value, and it’ll also promote the larger value twice as opposed to once, okay.

Now you could argue that these are moronic examples because people wouldn’t do this in practice, but you could also argue that the language should be sophisticated enough that it just doesn’t allow people to do these types of things because if it does happen, maybe it happens one day out of 300, once a year, but you could very easily spend four to eight hours just trying to figure out why this one little line isn’t working properly, okay.

When those types of things are allowed to happen, you have to somewhat blame the language. You certainly can blame the language as opposed to the programmer if other languages wouldn’t have allowed something like this to happen, okay. So as we get to be better programmers, we’ll start to be more opinionated about how good the languages themselves are, and how they allow us to quickly get to a final product, and making it as easy in the process as possible.

Okay, C is really working against you in a lot of ways, okay. It was invented in like the late 60s, early 70s. It came into fashion. The spirit of programming then was let me do whatever I want, man, and so you can get down in the hardware. And it wasn’t as problematic then because think about how small code bases were in 1965. You can’t even think about that because I wasn’t even born yet, much less you.

But you’re dealing with programs, except for operating systems. Unix was being written in the late 60s and early 70s, maybe a little bit earlier than that, but most programs were like Pong and maybe like miniature golf with like the most ridiculous paddle – club and ball that you can imagine, just really, really simple programs that had to fit in 64K of memory or 16K of memory. There just couldn’t be that many programs. That just means programs were more manageable then.

Now you’re dealing with code bases. I can’t even imagine how many lines of code exist behind Google walls, behind Microsoft walls. We’re talking millions, tens of millions of probably lines of code, probably more than that. I have no idea, okay, but like we’re a magnitude like where the exponent is six or seven, okay, very, very large. If you’re weighing that much code, you don’t want to have to say, “God, this,” you don’t want to have to look for a problem like that and do a binary search on 10 million files to figure out what the problem is.

You want it to be very, very likely that you’d get something right the very first time you type it, and that is unlikely in C++. You’re all learning that right now, okay. Does this make sense to people? Okay.

There are other aspects of the preprocessor I should talk about. I think I’ve hit on everything with regard to pound define. There’s also the pound include. When you do this, pound include, I’ll do assert dot H. I’ll do one above it, include – let’s do STDIO dot H. That’s for print F and scan F and things like that. You know about assert dot H, and then you also do this, and you saw things like how to include gen lib and simp IO dot H in CS106. I don’t know whether anyone ever answered the angle bracket versus the double quotes thing, whether you just say, “Oh, I have no idea, but I’ll just do it because it works if I do it that way.”

Whenever you use angle brackets or less than and greater than signs to delineate the name of the dot H file, it’s taken by the preprocessor to mean, oh, that’s a system header file, so that actually should be with the compiler, so I should look one place by default for those files. But when it’s in double quotes, it assumes that it is a client written dot H file, so it looks in the actual working directory by default.

There are flags you can pass to GCC via the Make system to tell it other places where the pound include files might live, but by default this means in slash usr slash bin slash include, and slash usr slash include, which you’ve never looked at before, but they exist. This means, at least in our world, just looking in the current working directory where you’re compiling, and that’s probably where they are, okay. Make sense?

Another thing you might not know about these things is just like pound defines in many ways these are instructions to search and replace this line with something else. This one’s easier to deal with because you have a sense of what vector dot H looks like. What this does, when the preprocessor folds over that line and says, “Oh, pound include vector dot H in double quotes. Let me go find it. Oh, I found it.” It removes that line right there, and it replaces it with the full contents of the vector dot H file. Does that make sense to people?

And so the stream of text that it builds for you as part of preprocessing, the output of preprocessing, it’s what’s called a translation unit where all the pound defines and all the pound includes have been stripped out. It creates the text that’s actually fed to the compiler on behalf of this line right here. It would replace it with the contents of vector dot H as if you’d typed it in by hand there, okay. Does that make sense?

Now you say, “Well, why don’t I just type in all the prototypes every single time at the top?” You want to consolidate all the prototypes to one file so that everyone agrees consistently on how all those functions should be called. But if you wanted to, you could just get rid of this, and if you’re only gonna use one or two of the functions, you can manually prototype them right there. And as long as it’s consistent with the real prototypes that exist in the dot H file, it wouldn’t cause any problems, okay.

The pound include process is recursive. So if you pound include a file that itself has pound includes, it will keep on doing that until it just bottoms out, okay. It does basically this recursive depth-first search. It’s like random sentence generator without any random numbers, okay, where it builds a full stream of text built out of all the pound include files until it just has one stream of non pound include and non pound define oriented text that gets fed to the compiler, okay. Does that make sense? Okay.

So there’s that. If you want to experiment and you want to see what the product of just preprocessing is, what happens when just the pound include and the pound defines are stripped out, go create like a three line file with two pound define constants, and just pound include a dot H file that you write yourself. Don’t pound include any system headers because then the output is really, really long.

But if you want to do this, GCC, you’re used to seeing something like GCC, the name of a file dash C, like let’s say vector dot C or something like that. You haven’t typed that in yourself, but you see that published to the screen every time you type make, for Assignments 3 and 4. Well, dash C means compile, but don’t try to build an executable. There’s actually something a little more drastic, dash E.

What that means is run the preprocessor and output the result of preprocessing, but don’t go further than that. So that means if you look at this file, you’ll have some sense as to what it should look like before. You certainly know what it looks like before preprocessing. All of the components that make up this file and vector dot H and anything that vector dot H pound includes will be spliced in sequence to build one big translation unit, okay, with all the prototypes and all the implementations that are in vector dot H, vector dot C rather, to the compiler itself, okay. Make sense?

Okay, as far – what happens if vector dot H pound include – oh, I’m sorry. You know hashset dot H pound includes vector dot H. Suppose I were airheaded and I said, “Oh, I want – you know, I think that vector dot H should also pound include hashset dot H.” You could if the preprocessor weren’t very smart, and you also didn’t have the power to prevent this. You could get circular inclusions. Oh, I better include that. Well, I have to include that. Oh, I better include that. It just could go back and forth forever.

The preprocessor solved this problem a while ago. We’re not the first people to accidentally do that, but you’ve also seen things like this. If not def, something like vector dot H, they’d go ahead and define it, and then list all the prototypes that come in vector dot H, and then mark the end of the region. The very first time that vector dot H gets pound included, or presumably this is the contents of that vector dot H file, as the preprocessor folds over it, it looks in this and goes, “Oh, have I not seen this little token before?”

And if it hasn’t, it’s like, “Okay, well, then I guess this is safe to do.” It’ll come down here and define exactly the same thing. You don’t have to associate anything with this key right here. It’s just basically like a valueless key in a hashset behind the scenes, but as long as it’s defined, then if for whatever reason this pound includes either itself directly or something that would pound include vector dot H, the second time the preprocessor tries to digest it as part of the generation of the translation unit.

It’ll come here and say, “Oh, is this not defined?” No, actually it is defined for reasons that may not be clear to me, but I defined it earlier apparently, so it’ll circumvent all this and put an end to the vicious cycle, okay. Make sense? Question over there?

Student:Yeah, just a question. The reason why you don’t want to include CPP files for that very reason?

Instructor (Jerry Cain):No, actually that’s a slightly different reason. All the dot H files, they declare prototypes, but nothing in dot H files ever emits – has any code emitted on its behalf. Like you declare structs, but it doesn’t actually generate code in response to that. You’re not supposed to declare storage for anything in dot H files except occasionally a very clever way of declaring a shared global variable, okay.

But the dot C files and the dot CC files, they actually define global variables and global functions and class methods and things like that, things that really do translate to zeros and ones in the form of machine code, but we view them as like M of R 1 is equal to R 3 plus 12 or something like that, okay. But dot H files are supposed to be just about definitions that have no code generation associated with them so that you can read them multiple times.

Like how many files are there for Assignment 4 and every single one of them probably pound includes vector dot H, right? If they all pound included vector dot C, then they would all be defining vector new and vector dispose, and so when time came to build RSS news search as an executable, you’d have like three or four implementations of the same function. Does that make sense?

Declaring the prototype for a function is very different than actually defining the function. One has code emission associated with it, the compilation actually generates code on behalf of the implementation. It doesn’t do anything on behalf of the prototypes, okay.


Instructor (Jerry Cain):Yeah, absolutely. You’re not required to do this. You just try to choose tokens that are very, very, very unlikely to come up anywhere else, okay. I mean this might be what you choose every time you have a vector dot H file, but presumably, you only have one vector dot H file, which means you’d only have one token defined like this. And when you really use normal pound defines, you just avoid the leading underscores and the trailing underscores, okay. Does that all make sense? Okay.

So if you get a chance, it takes you all of 15 seconds to do this. Just type in by hand GCC space dash capital E, and then the name of some dot C file in the directory where you happen to be, okay. And you’ll just see like tons and tons of stuff, but toward the end, you’ll see familiar code. You’ll see the vector dot C code you wrote at the end of it, but at the top, all the prototypes and any of the dot – the stuff inside the dot H files that happen to be pound included by vector dot H, okay, and also by vector dot C for that matter. Question in the back?

Student:Yes. So you said that that’s the way they handle circular inclusion.

Instructor (Jerry Cain):That’s one of the ways. That’s the ANSI standard way of doing so, yes.

Student:So my question was if that was not included, what did you say?

Instructor (Jerry Cain):Most preprocessors are smart enough that they don’t want to commit to circular recursion just because you’re not telling it to not do that. Most of them are very smart and they just keep track of it. And I think by protocol it understands that there’s no value in ever pound including something twice, but earlier implementations of preprocessors weren’t interested in solving every single problem that might come up.

It wasn’t – I don’t want to say it’s an edge case. It’s probably a very common case, but in theory, you don’t want to just assume that the preprocessor does the right thing, so you just want to make sure it couldn’t possibly fail you or infinitely recurse and loop forever, even if you’re using like some dummy implementation of the preprocessor, okay.

Some compilers have their own versions of this. I’ve seen – ten years ago I saw a preprocessor directive called pragma, and it had this optional word over here called once. That was just a more condensed version of trying to do exactly the same thing here without having to invent these names. This doesn’t exist, and certainly not ANSI standard, and it used to exist in CodeWarrior and I don’t even see it in CodeWarrior anymore.

But different preprocessors can do whatever they want to to extend the standard preprocessor directives. You should just concern yourself with pound define, and if you want if not defs and if defs, and the elses, but really just worry about pound define and pound include. And if you know what those are doing at preprocessor time, then you’re certainly walking away with a good amount of information, okay.

So there’s that. Let me draw some pictures so you’ll have something to write down. So this is vector dot C, and it has this as a code base in it. And it has this file, this file, and this file pound included at the front of it. Let’s just say that this is A dot H, and B dot H, and C dot H. I know that’s small, but you can just name them anything you want to, okay.

You know that it’ll go and find the contents of A dot H and B dot H and C dot H, and as part of preprocessing, what it’ll do if the contents of A dot H happens to be that, and the contents of B dot H happens to be this, and the contents of C dot H happens to be this, it really will build a stream of text that’s consistent with all these stacked emoticons. This is the stream of text it would build in memory, and the noseless smiley face would be at the bottom. And that stream of text would be passed on to the true compilation phase, okay.

Everything that resides in here is still supposed to be legal C, it was just spread among multiple files at this level, so that things like prototypes and struct definitions and class definitions and pound define macros and constants could all be consolidated to one place. You’re familiar with that concept, right, defined once, used from everywhere, okay.

Well, if you let it go further, it will now compile, okay, where it will take this stream of text as if you typed it in character by character this way and compile it and emit assembly code on your behalf, and as long as there are no errors, it’ll build the dot O file, okay. As soon as it finds one error, it’ll say, “Oop, an error.” And you know, you probably remember the C++ compilers from Xcode and from Visual Studio C++. When it gives you an error, it gives you a lot of them, and it goes on for pages and pages and pages.

You can suppress it. You can tell it to stop after one error if you want to, but just assuming that everything compiles cleanly, this by default would generate a vector dot O file, okay. And you’ve seen these dot O files pop up in your directories. This would have all these assembly code statements. If it were compiling to CS107 assembly, you might see things like M of R 1 is equal to SP, things like that, the things that actually emulate the implementations of all of the functions that happen to exist in this translation unit, okay. Does that make sense everybody? Okay.

So what I want to do is I want to talk about compilation and linking kind of simultaneously. And I’m just gonna go through one. I don’t want to say it’s an easy example. It’s actually quite sophisticated, but it’s a short program and I can just talk about what happens, and then talk about what happens when you just stop – when you start to remove pound include statements, okay.

Now I am being GCC specific in my discussion of compilation. I’m just doing so because GCC will probably become the most important compiler to you, at least at Stanford, if you’re programming in C++, okay. Let me just give you a sense as to what the dot O file would look like in response to this dot C file. Let me just write this file called main dot C. It’s gonna be a full program. It’s not gonna do anything, but it’s gonna be legal C code, and it’s gonna call some functions.

I am going to pound include STDIO dot H. The only thing that’s relevant is that it defines the printf function, okay. I’m also gonna pound include STDLIB dot H with the L right there. This is gonna define malloc and free. It also defines realloc, but I’m not gonna call realloc. And I’m also gonna pound include assert dot H, not N, H, and this is the program.

Int main, int arg C, char star arg V, it’s an array. And I’m just gonna do this. It’s like four or five lines. Void star memory is equal to malloc of 400. I’m going to assert that memory is not equal to null. I’m going to print F (inaudible) because if I’ve gotten this far, then I know that I got real memory, and I’m gonna celebrate by printing it.

So this is in place just to demonstrate exactly what compilation does. Now pretend we’re in a world where there are no other architectures beyond the mock CS107 architecture we discussed last week, okay. So on the CS107 chip, and I feed this to GCC in accordance to the way that the make files that you’re dealing with actually would call it. It’s going to run it through the preprocessor. You know that these three things would be recursively replaced to whatever extent it’s needed to build one big stream of text, which at the end has this right here, okay.

This right here corresponds to that in this emoticon drawing over here, okay. I don’t have to generate the full assembly codes for this, but the interesting parts are gonna be this. This is the full dot O file that’s generated as the compiler digests the expansion of this to a translation unit. Preprocessing takes this and builds a long stream of text without pound include and pound defines, and that’s fed to the GCC compiler that actually generates that O code for you.

You certainly should expect there to be a call to malloc, okay. You would actually see some lines right here like SP is equal to SP minus four. M of SP is equal to 400. Those things should be familiar to you based on what we talked about last week. I’ll move over to the right, okay.

You would expect to see a call to printf. You would expect to see a call to free. You would expect to see RV is equal to 0. You would expect to see a return at the end. Those are gestures to the interesting parts of this program from a compilation standpoint, okay. Why isn’t there a call to the assert function? Because I included preprocessing in the discussion, and that right there doesn’t define an assert function. It declares or defines a way to take this right here and replace it with an expression that doesn’t involve an assert function, okay.

There would actually be a – based on the way I wrote it before, I didn’t preserve it. Remember how I called F print F before? That’s the file star version, or basically the IF stream version of print F. There would be a call to F print F in here as well because of the way I defined assert. Does that make sense? Okay.

So there’s that. This is a clean working program. It’s not very interesting. It does lots of business and has the weirdest way of deciding whether to print yay or not, but nonetheless, it would compile and it would run. It doesn’t even (inaudible) from memory because I’m very careful to free it down here, okay.

Compilation generates this dot O file. If I don’t include a flag inside the make file or with the GCC call, it’ll actually try to continue and build an executable. By default, it’s named A dot Out. If I just use GCC right here, if I want to suppress linking in the creation of an executable and just stop at the creation of a dot O file, I would pass it – I wouldn’t call GCC, but I would call GCC dash C. It means stop after compilation, okay. And you’ve seen the dash C’s fly by with all the GCC calls that are generated from make. Make sense? Okay.

If I don’t include this, then it will try to build an executable. By default, it would create something called A dot Out unless you actually use the dash O flag to specify what name should be given to the product. And if I say my prog for my program, then it won’t use A, its default, (inaudible). It’ll actually name it my prog, okay. The only requirement that’s needed past compilation, this is compilation, the generation of this.

When it tries to create an executable, you’re technically in what’s called the link phase where it tries to bundle all the dot O files that are relevant to each other. In this case, there’s only one dot O file, at least exposed to us. And it tries to build an executable. The only requirement that really – you really need is you need there to be a main function so it knows how to enter the program. You have to have a definition for every single function that could potentially be called from anywhere, okay. And you can only define all the functions – each function can only be defined once, okay.

There’s not many link errors that can happen when you’re trying to create an executable, okay. Does that make sense? Now by default it actually links against some libraries that are held behind the scenes that provide the implementations of print F and F print F and malloc and free and realloc and all of those, okay. Does that make sense? Okay.

So there’s that. This is compilation. This is linking right here. I’ll let you say so. And if I type in dot slash my prog, it’ll run this thing, print yay, and we’ll have a working program here.

What I want to do, I only have a minute, so I’ll just kind of give you like a little teaser as to what we should – what we’ll see on Wednesday. I want to kind of tinker with what happens if I forget to pound include STDIO dot H. All that, that just confuses matters a little bit with regard to the definition of print F. Does that make sense? Okay.

Then I’ll say what happens if I forget to pound include STDLIB dot H and I don’t have explicit prototypes for malloc and free visible? They’re not included in the translation unit, so they’re not around during compilation, okay. What kind of impact does that have on the ability to build A dot out or my prog? And the most interesting of the three is what happens if I accidentally exclude the definition of the assert macro, so that it’s not visible during compilation. Does that make sense? Okay.

Well, I have negative ten seconds, so I’ll let you go. I will talk about those three things. I’ll reproduce this on Wednesday and we’ll spend the first half an hour talking about it, okay.

[End of Audio]


Lecture 12: Programming Paradigms

Section Assignment 4
2 pages

Section 4 Assignment Solutions
3 pages

Programming Assignment 5: Raw Memory Instructions
6 pages

Programming Assignment 5: Raw Memory Solutions
8 pages

Topics: Review of Compilation Process of a Simple Program Into a .O File, Effect of Commenting Out a C Standard Library .H File on the Resulting Translation Unit, How Gcc Infers a Prototype When None Is Found and the .O File Remains the Same, How the Gcc Linker Is Able to Link Standard Library Files Without a #Include, The (Similar) Result When the .H File with Malloc's Prototype Is Not Included, How Commenting Out Assert.H Creates Different Results, Failing In the Linker Since Assert Is a Macro, As Opposed to a Function In the Standard Libraries, Effect of Calling Strlen with the Wrong Number of Arguments on the Compilation/Linking Process, Effect of Calling Memcmp with too Few Arguments on the Compilation/Linking Process, How C++ Disambiguates Between Different Function Prototypes to Avoid the Problems Posed By the Previous Two Examples, Debugging Information - Seg Faults (Usually Dereferencing a Bad Pointer) Vs. Bus Errors (Dereferencing Data that Isn't Correctly Aligned), Debugging Example Where Overflowing an Integer Array Leads to an Infinite Loop, Similar Example with a Short Array that Works Differently on Big-Endian Systems Vs. Little-Endian Systems, Example Where an Array Overflow Overwrites the Saved PC and Leads to an Infinite Loop



Instructor (Jerry Cain):There we go. Everyone, welcome. I have two handouts for you today. One is the practice mid-term and the other is its cousin, the practice solution. I haven’t written our mid-term yet, but I’ll get to that this weekend. I am typically very good about imitating the structure of the practice mid-term but I never promise that I will. If I come up with some new idea that I think is fair to test the material, then I go for it. Okay, but all of those problems on the practice mid-term are drawn from previous mid-terms over the past three years. It’s a little longer, just a hair longer than you’re likely to see.

Typically it’s two, three or four questions, depending on how intense any one question is. This one has five on it with some short answers. My recommendation, if you can do this – it’s difficult to, but if you can take the practice mid-term during some three hour block and just write it out in full and then grade your own work, that’s a much better way than just saying, looking at the mid-term looking at the solution saying, yeah, that’s exactly what I would have done because it isn’t.

Okay, so try and write out the solutions, and then you’ll have to invest some energy in deciding what types of errors are significant and what types of errors are not. I can tell you right now that we’re very concerned with the asterisks and the ampersands and the arrows and the dots and the casts. Ultimately, the questions are really just hopefully interesting back-story and vehicles for us testing all of that stuff. I don’t care about the for loops, and I don’t care about the plus-pluses unless it’s pointer arithmetic.

Okay, I really care about the asterisks and the char ** casts and things like that, so don’t think that they’re going to be viewed as minor errors. They’re not, because a char * cast is very different from a char ** cast. You might as well cast it to be a struct ***. It’s just as wrong if it’s not correct, so concern yourself with those things.

The mid-term is a week from tonight. It is in Hewlett 200. That’s probably one of the largest lecture halls on campus. It’s not in this building, obviously, it’s across the street, so there’s that. If you can’t make it that night, you can take it any time on Wednesday during some three-hour block that fits in between 9:00 a.m. and 5:00 p.m. You can’t start it during lecture hour. If it’s best for you to skip lecture hour, that’s fine. If at all you can make it that night, it just makes everybody’s life easier, because otherwise somebody has to be around to make sure we can answer your questions. We will be around, there’s no doubt about it, but don’t take it earlier just because it’s convenient for you. Please try to take it at night if you at all can.

There’s something else. I updated the website yesterday, but I didn’t make an announcement because another website wasn’t updated yet. Immediately after lecture on Friday, a friend and colleague of mine (colleague is such a snotty word), a co-worker of mine at Facebook, is giving a technology talk in, as it turns out, the same room where your mid-term is, so you can practice and get a feel for the room on Friday by going at noon.

I saw him give the talk about three months ago, and at the time I was watching him give it I said, this is exactly the type of material, the type of talk, that helps motivate the beginning four weeks of CS107. He won’t talk specifically in terms of realloc and malloc and free and vectors and hashsets, but he talks about the system that’s in place to store the billions and billions of photos that they have. They need laser-fast access to any particular photo that might need to be served up. That’s an incredibly difficult infrastructure problem, and it relies on a very clever indexing scheme in order to go and find a photo somewhere on one of the (I can’t say the numbers) n servers where photos reside, and it has to do that quickly. So it’s actually concerned with things like minimizing the number of disk reads, and very little of it is beyond the scope of what you’ve learned in the first few weeks of 107. We’re also giving you pizza and soda, so if you feel like you’re sacrificing lunch, you’re not really, because we’re gonna be giving you food. If you’re at all interested in going to that, please RSVP by visiting the 107 website, which leads you to the ACM website. Just RSVP so that we know to get you some pizza. It’s not televised, unfortunately, so you actually have to attend something live if you want to go see this, and we can’t televise it, I think, for obvious reasons. Okay, so there’s that.

When I left you on Monday (today’s Wednesday), I was five minutes into an example that I want to use to illustrate how compilation and linking work, and I’m gonna frame it specifically in terms of GCC. This was the program I wrote. int main; I really don’t care about those arguments, but I do care about this: void *mem is equal to malloc of 400. I want to assert, in the style that we’re used to, that mem is not equal to NULL. I want to print yay, I want to dispose of the memory and I want to return zero. I’m only allowed to call malloc and free without running into any issues if I go ahead and #include stdlib.h (there’s a d right there). I can only call printf cleanly if I #include stdio.h right there, and the assert macro is available to me via the assert.h header file. This is the entire program, save for the implementations of malloc and printf and free, which reside in the standard C libraries. If I compile this and I just do GCC on whatever this file is named, it can be made to generate two things. It can be made to generate a .o file and it can be further made to generate an executable. Based on the way this is set up right here, you wouldn’t be surprised that there’s a call to malloc instruction somewhere inside there, that there’s a call to printf, that there is a call to the free function, that eventually there’s an RV is equal to zero, and there’s a return. There’s actually an SP is equal to SP - 4 up top, and there’s the corresponding SP is equal to SP + 4 right there.

Those things should make sense to you because you know what happens in response to local variable declarations and return values and function calls, okay. That make sense to people? If I were to compile this, even though it’s not a very interesting program, it generates this assembly code. This is just compilation. Then there is linking by default, even if you just have one .o file. What linking really is, is the blending and merging of this .o file and any other ones you have (in this case we only have one) with all the standard libraries and the standard .o files that come with the compiler. The standard libraries have the .o code for printf and malloc and free and things like that. It basically does what is necessary and stacks all of them on top of one another to build one big executable. It usually splices out everything that’s not touched and not needed, but it includes the code for any function that might come up during execution, okay. It has to be able to jump anywhere possible while a program is running. Because this is set up the way it is, it’ll compile cleanly, it will run, and it’ll print yay as a side effect without really making too much noise. It’ll allocate a buffer of 400 bytes, make sure that it succeeded, and then free it and return zero. It’s just this very quick program to print yay, but also to exercise malloc and free and confirm that those things are working, okay. What I want to do is see what happens if I comment out this #include right there, just to give you some insight as to how little impact the .h file has on compilation and linking.
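The program described above can be reconstructed roughly as follows. This is a sketch, not the board code: the lecture writes the body directly in main, and the wrapper name `yay_demo` and the exact string "Yay!\n" are assumptions made so the fragment can be compiled and called on its own.

```c
#include <assert.h>  /* supplies the assert macro */
#include <stdio.h>   /* supplies the prototype for printf */
#include <stdlib.h>  /* supplies the prototypes for malloc and free */

/* Hypothetical wrapper; in the lecture this body is main itself. */
int yay_demo(void)
{
    void *mem = malloc(400);  /* ask for a 400-byte buffer */
    assert(mem != NULL);      /* stop here if the allocation failed */
    printf("Yay!\n");         /* the only visible side effect */
    free(mem);                /* dispose of the buffer */
    return 0;
}
```

Compiling with `gcc -c` stops after producing the .o file; compiling without `-c` goes on to link against the standard libraries and produce an executable.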

If you were to say that this would generate compiler errors because it doesn’t know about this printf function, you would be right for some compilers, okay. As a result of commenting that line out there, you know enough to know that the translation unit that’s built during pre-processing won’t make any mention whatsoever of printf, okay. So when the time comes to actually check whether or not printf is being called correctly, it doesn’t have the information it normally has, okay. Many compilers would be like, whoa, what are you doing? I’ve never heard of this function, I don’t like calling functions that I don’t see prototypes for, and some might issue an error; GCC does not. We’ll talk about printf a little bit more in a little bit, but if during compilation it sees what appears to be a function call, what it’ll do is just infer the prototype based on the call. Does that make sense to people? Okay, it sees that a single string is being passed in here. Compilation would issue a warning saying no prototype for printf found, but it doesn’t block you from compiling; it still generates a .o file. We are calling it correctly as it turns out, okay. As long as you pass in at least one string, printf is able to do the job, and there are no placeholders in here, so it’s not gonna have any problems executing. By default, when GCC infers a prototype, it always infers a return type of int. It turns out that is the real return type of printf. We normally ignore it, but the return value is the number of characters that were printed. This particular call would return the length of the string, but I’m not concerned about the actual return value. I’m concerned about the return type; it happens to mesh up with what’s inferred by the GCC compiler. Does that make sense to people?
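For reference, the C standard specifies that printf returns an int: the number of characters written, or a negative value on an output error. The short sketch below makes that return value visible; the helper name `print_yay` and the string "Yay!\n" are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: hands back whatever printf returns for this call. */
int print_yay(void)
{
    /* "Yay!\n" is five characters: 'Y', 'a', 'y', '!', '\n'. */
    return printf("Yay!\n");
}
```

What matters for the argument in the lecture is only the return type: an inferred return type of int happens to agree with the real one.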

Now, if several other printf calls were to appear after this one, they would all have to actually take exactly one argument, because it’s inferred a prototype. It’s actually slightly different from the one that really is in place, okay. Does that make sense to people? So you may say, okay, it’s gonna compile here. What is the .o file going to look like? It’s going to look exactly the same. The .h file just consolidates data structure definitions and prototypes. There’s no code emitted on its behalf; all it does is kind of train the compiler to know what’s syntactically okay and what is syntactically not okay, okay. Makes sense? As it infers this prototype, there’s one slight hiccup in the form of a warning during compilation, but it still does SP is equal to SP - 4, it still copies to M of SP the address of this capital Y, it still calls printf, and it technically expects RV to be populated with the return value, although this happens to ignore it, okay. You generate this, and I’d say more than half of the students assume that when you link to try to build this executable, somehow it’s gonna break down because it doesn’t know that printf should somehow be included. Do you understand why people might think that? Yes, yes. Every time you try to build an executable using the GCC system, technically you’re using GCC, but you’re using a cousin of it called LD, which is just for load. I’m sorry, for link. It always goes and searches against the standard libraries, whether or not there were warnings during compilation. printf sits as a code block in all of those standard libraries, so it happens to be present during the link phase, even though we never saw the prototype for it.

The presence of a #include has nothing to do with, and makes no guarantee about, whether or not the implementations of any functions declared in that header are available at link time. If something is defined in the standard library set, it’s always available, whether or not we behave with the prototypes. Okay, does that make sense to people? Okay, if I bring this back and I comment this one out, then it doesn’t have official prototypes for malloc or free. So as it comes down here, it doesn’t even see that line. It expands this to a bunch of prototypes and data definitions, and expands this to at least the definition of assert. It’s fine with that. It looks at this and says, well, I have no idea what malloc is. We’re calling this like a function, so I’m gonna assume it’s a function; I’m inferring its prototype to be something that takes an int and returns an int. It does not actually look at how it’s being used in the assignment statement to decide what the return type is. So it’ll issue two warnings in response to this line. It’s inferring the prototype, and then you’re assigning to a pointer what’s supposed to be a plain old integer, okay. There are no problems with this line because it sees the definition of assert. printf is fine. It looks at this right here and it doesn’t like this line either, because it hasn’t seen a prototype. It infers one; it issues a warning saying it’s inferring a prototype for free. It assumes it takes a void *, and it assumes it returns an int, which is not true; but we’re not looking for a return value, so it’s not a problem, okay, and then it comes down here.

So this would also generate the same exact .o file. It would issue three warnings, okay, two for missing prototypes and one for incompatibility between the L value and the assigned value; but it would create this, and by the time we link it, it has completely lost any memory of that. There’s no record in here that some .h wasn’t around and that there were warnings. It’s just that there’s a little bit of risk in what’s happening here, but it generated it because this, as a .o file, is consistent with the way you wrote the code here, okay. Making sense? Okay, so it goes on and links, and when we run this, it runs beautifully. If I comment these two lines out, then I get a total of four warnings, but it still generates the same .o and it generates the a.out file, which still runs. Yeah?

Student:I understand why you said it runs beautifully. It somehow understands [inaudible] right there.

Instructor (Jerry Cain):It does run beautifully; the question is that he doesn’t understand why it runs beautifully. All a .h file does is tell the compiler what the prototypes are. There’s no information in .h files about where the code lives. The link phase is responsible for going and finding the standard libraries. That’s where malloc and free and printf are defined. It’s not as if there are hooks that are planted in the .o file because of what the #includes used to be. As long as this and that and that call are in there, and they will be, whether it’s the result of a clean compilation or one with three or four warnings, it’ll still have these. So when we link against the standard library set, it’ll try to patch these three calls against the same symbols that exist in the standard .o files, and it’ll create the executable. Does that make sense? Okay.
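One way to see that a header contributes only declarations is to write the declarations by hand and leave the headers out. Implicit declarations like the ones described in the lecture are rejected or warned about by current compilers, so this sketch uses explicit hand-written prototypes instead; the helper name `alloc_print_free` is an assumption for illustration.

```c
#include <stddef.h>  /* only for size_t and NULL; no <stdio.h>, no <stdlib.h> */

/* Hand-written declarations standing in for the omitted headers. */
int printf(const char *format, ...);
void *malloc(size_t size);
void free(void *ptr);

/* Hypothetical helper: returns 1 if the allocation succeeded, 0 otherwise. */
int alloc_print_free(void)
{
    void *mem = malloc(400);
    if (mem == NULL)
        return 0;
    printf("Yay!\n");
    free(mem);
    return 1;
}
```

The resulting .o file only records calls to the names printf, malloc and free. The linker resolves those names against the standard libraries, which it searches regardless of which headers were included.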


Instructor (Jerry Cain):That’s a different problem, but that isn’t the case here, okay. I haven’t defined my own printf or my own free, okay. If I comment these two things out, then I get all the warnings that I’ve talked about so far. If I bring them back, I have a clean compilation again. If I comment this one out, we have a completely different scenario. It doesn’t see any mention of how assert is introduced as a symbol to the compilation unit. It comes down here and says, I know what malloc is. It’s a function that takes a size_t and returns a void *, okay.

It comes here and it’s like, I don’t know what assert is, so it looks at it. If I hadn’t told you what assert was two days ago, or I guess, yeah, two days ago, you would say, oh, that must be a function that takes a boolean and returns void. Okay, but we know that it really isn’t that, because we know how it’s officially defined in assert.h. But because this has been commented out, it looks at this, and it’s not like the word assert is hard coded into the compiler. It actually assumes that this is a function call. A call to assert would now appear in the .o file. Does that make sense to people; yes, no?

Okay, we compile fine, we compile fine, we compile fine, and it generates this. The link phase would actually fail, and the reason it would fail is this: there’s code for printf and malloc and free in the standard library set, because they are real functions, but assert doesn’t exist anywhere in the standard library set, okay. assert was just this quick and dirty macro that was defined in assert.h. Okay, make sense? Okay, so there’s that, and if I bring it back, everything is obviously restored. The prototypes, and I think this is a fairly astute way of putting this, the prototypes are really in place so that caller and callee have complete agreement on how everything above the saved PC in the activation record is set up. Does that make sense to people? The prototype only mentions parameters. The parameters are housed in the space in an activation record above the saved PC. Everything below the saved PC is the callee’s business. But when I call printf and I jump to the printf code, we have to make sure that the callee and the caller are in agreement about how information was stacked in the top portion of the activation record. It’s just this trust, where if I’m calling printf and I lay down a string constant right here, and I say, take over, printf has to be able to go, oh, I just assume that there’s a string above the saved PC. I hope so, because I’m treating it as such, and if there isn’t, there are problems, but if there is, it just prints it. Does that make sense? Okay.
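The reason no function named assert exists for the linker to find is that the preprocessor replaces every use of the macro before the compiler proper runs. The sketch below shows the general shape such a macro can take. `MY_ASSERT` and `checked_half` are hypothetical names, and this is not the actual definition in any particular assert.h.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro in the spirit of assert: a passing condition costs
   nothing, a failing one reports where it happened and aborts. */
#define MY_ASSERT(cond)                                                     \
    ((cond) ? (void)0                                                       \
            : (fprintf(stderr, "Assertion failed: %s, file %s, line %d\n",  \
                       #cond, __FILE__, __LINE__),                          \
               abort()))

/* Hypothetical user of the macro: halves an even number. */
int checked_half(int n)
{
    MY_ASSERT(n % 2 == 0);  /* expands in place; no call to a function named MY_ASSERT */
    return n / 2;
}
```

With the macro definition in scope, the .o file contains calls to fprintf and abort only. Without it, the same line would look like a call to an unknown function, and the link would fail because no such function exists in the libraries.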

If I were to do this, let me write another block of code. This is actually a really weird looking program, but this is typical of the type of thing you see in 107. int main; I actually don’t care about argc and argv. I’m gonna declare an int, I’m just gonna call it num, and I’m gonna set it equal to 15. Actually, you know what? I lied. I’m gonna set it equal to 65, and I’m gonna do this: int length is equal to (no laughing) strlen of a char * cast of ampersand num, comma num, okay. So I’m calling strlen in this completely bogus way. I want a printf of length equals %d\n. I’ll just put length down and I will return zero, and let’s for the moment not put any prototypes up there. Okay, suppose I completely punt on the #include section. It compiles this, it’s like, that’s fine. Oh, I don’t like that: I’m seeing a function call I haven’t seen a prototype for, but because of the way I’m calling it, I’m going to assume it takes a char * followed by an integer, okay, and I’m gonna assume it returns an int, because I always do that for prototypes that I infer information about. The assignment works fine. It prints out whatever length happens to be bound to, and then returns. So if I were to compile this, it would only issue one warning. Does that make sense? Now that call is totally messed up. I don’t know how often you’ve had to call strlen; you’ve called memcpy and strcpy and all that. This normally takes just one argument, which is a string, and returns the number of visible characters before the backslash zero. That make sense? Okay.

So the way I’m calling this, this is where num resides in memory. I put a 65 there. This is where length resides, okay. I haven’t initialized it yet, but it calls strlen before I can initialize length. So if that’s the activation record and it preps for this call right here, and it’s inferring the prototype, it goes, okay, I don’t know what we’re doing, but I’m gonna do an SP is equal to SP minus eight. I’m gonna put the address of num right there. It has to be a char *, so it thinks that there are characters right there, four of them, okay. Make sense? It puts a 65 right there. It leaves SP pointing right there and then it calls strlen. When this generates a .o file, all it has inside that’s of interest to the linker is the call to strlen. You may think that during the link phase it’s gonna freak out because it’s gonna somehow know that strlen only takes one argument. That is not true. There’s no direct information inside the .o files about how many parameters something is supposed to take. You can look at the .o file and you can see SP is equal to SP minus eight versus SP is equal to SP minus four versus SP is equal to SP minus 16. You have some sense of how many parameters there are, but not really, because it might be one 16-byte struct or four 4-byte integers, okay. When it does the linking it just looks for the name. It doesn’t do any checking whatsoever on argument type, I’m sorry, on parameter type.

So the fact that this signature is zonked and messed up is irrelevant to the link phase. All that it looks for during the link phase is something that responds to the name strlen, and that’s exactly what happens. So when this executes and it jumps to strlen, strlen inherits this picture. This is where our saved PC is set up. The real implementation of strlen was written and compiled to only have one char * variable. Does that make sense to people? So its assembly code only goes this high. It may actually decrement this by some other multiple of four bytes for its local variables, okay. It does a for loop inside, really, is all it does, and then returns to here. But do you understand why the 65 is more or less ignored? Does that make sense to people? Okay, so as it turns out, this will compile with one warning. If I want to turn off the warning, I can do this. I can actually manually prototype it at the top. That’s the type of thing that comes into the translation unit as a result of preprocessing anyway, so if I want to suppress the warning there, I can just manually prototype it, because obviously I think that’s the prototype, since that’s the way I’m calling it. Then I can just do that, okay.

A lot of times you will see people only include the prototype manually. The alternative is to #include this big .h file that slows down compilation, and if you’re concerned about that, if you have to remake a system 75 times a day from scratch and the number of #includes actually impacts compilation time, a lot of times you’ll just manually prototype things instead of #including everything. It’s a little risky, because less egregious, but technically incorrect, versions of this mistake could actually happen and cause code to compile that probably shouldn’t. But this will now compile cleanly, because I said this line is perfectly good as is. When it runs, it calls strlen, which only thinks about everything below that little arc right there, okay. It actually treats that as a char *. It has no choice but to do that; we even coached it to think that it’s a char * for the call. It’s gonna return, it’s gonna bind length to some value, and it’s gonna print it out. Any idea what’s gonna be printed out by this program? Yeah?


Instructor (Jerry Cain):It would be zero on basically most systems; I’m gonna say most, many systems. It would actually print out one on some other systems.


Instructor (Jerry Cain):I know, it does and it actually doesn’t have anything to do with the nul character. I’m assuming that the four byte integer that really resides here is stored this way. Does that make sense?

If that’s the way it’s set up, then there really is a single byte of zero all the way to the left, and so as far as this argument is concerned, it actually thinks it’s pointed to the empty string in this little static space on the stack, okay. Make sense? That is big-endian, though. In a little-endian system, these would be reversed, right? And so 65 happens to be a capital A. It doesn’t even care that it’s 65, so much as that it’s non-zero. So on little-endian systems, this would actually print out length is equal to one, okay. Does that make sense?

The interesting part is that this is a complete hack. I manually prototyped it right there. Try not to do that, because it allows things like this to happen. It turns out it doesn’t cause any problems. I’m sorry, it doesn’t cause any run-time problems; it’ll actually execute properly, because this happens to point to either 0 0 0 65 or 65 0 0 0, and both of those can be interpreted, from the beginning of that sequence, as some C string. One happens to be the empty string; the other one happens to be the capital A followed by a backslash zero, okay.
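The byte-order effect can be reproduced without the bogus two-argument call, since the real strlen ignores the extra argument anyway. The sketch below passes the address of an int to the ordinary one-argument strlen. The helper names are assumptions, and the trick is only safe for values such as 65 that contain at least one zero byte, so that strlen stops inside the integer.

```c
#include <string.h>

/* Returns 1 when the least significant byte of an int is stored first. */
int is_little_endian(void)
{
    int probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Treats the bytes of num as a C string, as the lecture's call does.
   Only safe when num contains a zero byte (true for 65). */
int length_of_int_bytes(int num)
{
    return (int)strlen((char *)&num);
}
```

On a little-endian machine the bytes of 65 start with 65 ('A') followed by zeros, so the length is 1. On a big-endian machine the first byte is zero, so the length is 0.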

It will just have some response, even if it’s a little weird, okay. Make sense? Yeah?

Student:[Inaudible] the second parameter?

Instructor (Jerry Cain):The caller does not. The caller actually places it there, but remember that the implementation of strlen was really compiled with the normal prototype, not this one. So it only reaches four bytes above, at most four bytes above, the saved PC, which is why I drew this arc right here, okay. Does that make sense?

This still sits there; it’s still popped off the stack when strlen returns, because the caller did an SP is equal to SP minus eight. It just assumes that the 65 was integral to the implementation, or to the execution, of strlen, and it pops it off afterwards, okay. Does that make sense?

That wasn’t my exam question; Julie Zelenski gave it like 12 years ago. I couldn’t believe it when I read it. I was like, wow, it was really hard. I don’t know whether a couple of people got it right or not, but I thought it was very interesting, and that’s why I wanted to put it in lecture.

Let me give you the opposite scenario here, though. Suppose I do this: int memcmp of void *v1. I just mess up, and for whatever reason I think that somehow memcmp only needs one void *. Then I have some block of code that declares an int called n that is equal to 17, and I do int m is equal to memcmp of ampersand n, and that’s it. That’s all I care about.

Okay, it’s not a very interesting block of code; I just want to see what happens if we call a function with only one argument when it really expects three, okay. You may not remember the real prototype for memcmp. It’s used incidentally in assignment four, but it’s kind of like strcmp, except that you explicitly pass in the number of bytes that should be compared. That’s why zeros are meaningless to a memory compare. The actual prototype is this, and this is the way memcmp was compiled behind the scenes: void *v1, void *v2, int size. That’s the real prototype. The code here would declare n, stack it right there with the 17, and it would put an m there that has no value yet. When it calls memcmp according to that prototype, there’s a saved PC right there, the address of n is right there, and this is compiled with the idea that only the four bytes above the saved PC are actually relevant to the implementation of memcmp. Does that make sense to people? When we transfer execution to the real memcmp, which has a completely different idea of what the parameter list looks like, it inherits that as a saved PC and it’s like, wow, I have 12 bytes of information above the saved PC. So it just accesses them, okay. This overlays the space for v1. This uninitialized value overlays the space that’s used for v2, and it happens to inherit 17 for the size, and it executes according to those three values. I’m not saying it’s sensible, I’m just telling you what happens, okay.
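For comparison, here is what a correct three-argument call looks like, together with the difference from strcmp that the lecture mentions. In the standard library the declaration is `int memcmp(const void *, const void *, size_t)`. The helper names and the two sample buffers are assumptions for illustration.

```c
#include <string.h>

/* Hypothetical helper: 1 if the first n bytes match, 0 otherwise. */
int same_bytes(const void *v1, const void *v2, size_t n)
{
    return memcmp(v1, v2, n) == 0;
}

/* Two 6-byte buffers that agree up to an embedded '\0' and differ after it. */
static const char first[6]  = { 'a', 'b', '\0', 'c', 'd', '\0' };
static const char second[6] = { 'a', 'b', '\0', 'c', 'e', '\0' };

/* strcmp stops at the first '\0', so it sees "ab" twice. */
int strings_equal(void) { return strcmp(first, second) == 0; }

/* memcmp compares all six bytes and notices the 'd' versus 'e'. */
int buffers_equal(void) { return same_bytes(first, second, sizeof first); }
```

Because the byte count is explicit, zero bytes carry no special meaning for memcmp. That is also why a garbage count or a garbage second pointer, as in the one-argument call, sends it reading wherever those values happen to point.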

So because I did that, and because this is the only thing that’s part of my main function, it would compile, it would execute, and it probably would crash, okay. That’s probably the case because this is uninitialized, so it’s very unlikely that the random four-byte address inside of that pointer is something that’s really part of the stack, or the heap, or the code segment, okay. Does that make sense? If it happens to be that, then it would run somehow, okay, but it probably would not. You guys get what’s going on here? Okay. With C as a language, it’s very easy to get something to compile. I mean, you may not believe that, but coming from C++ land and 106B, maybe you do realize it now. You’re certainly seeing things compile in assignment three and assignment four that are wrong, okay. They just crash, so they don’t work. If it were a fully strongly typed system where there was no idea of an exposed pointer or void * or casting, you’d have to actually get a lot more things right before things compiled. Does that make sense? You’re used to templates from C++, and you don’t use any generics with void *s in C++. I’m sorry, you just usually don’t.

Even though it’s a pain to get things to compile in C++, it’s rarely the case that you crash, okay. To the extent that you use templates, template containers, in C++, you’re that much less likely to deal with pointers, and so you just don’t see as many crashes when you run a C++ program, okay. You do see a lot of crashes with C programs; we all believe that now, okay. It’s almost like with C++ as a language, the compiler is this hyper-persnickety wedding planner, where everything has to be in place before it’ll let the wedding happen, okay. Does that make sense? With a language like C, it’s like, it’ll all work out, and that’s what the wedding planner is saying. It’s like, yeah, it’s a void *, it’s a void *, yeah, as long as you know what’s going on, I trust it’ll all execute, and if it doesn’t, well then, that’s your problem, okay. That’s really how the C compiler is viewing it. This is definitely a C compiler exploit right here. You couldn’t do something like this in the pure C++ extensions of the C language, okay. Does that make sense? Do you know how you can overload functions in C++? You can actually define a function with the same name as long as it has different parameter types. You can even have the same number of parameters; as long as it can disambiguate between two different versions based on the signatures and the data types of the call, it’ll let you define one. You can’t do that in pure C. If I have memcmp as a function name, then I can only use it once.

What C++ does is very clever. When it compiles a function, it actually doesn’t tag the first instruction of the function with just the name of the function. It uses the name of the function and the data types, in the order they’re presented on the argument list, to assemble a more complicated function symbol. So something like this would be a call to memcmp. I’ll write it this way. This is the way it would be set up in pure C. In C++, it has to be able to disambiguate between multiple versions of memcmp with different signatures. So it actually does this, and I’m making this part up, but something along those lines, okay. Does that make sense? So if you were to compile this with a C++ compiler and you were to compile that implementation with a C++ compiler, this would be a call to memcmp_void_p, whereas this would be tagged with memcmp_void_p_void_p_int. Does that make sense? You might as well call the functions x and y as far as the C++ compiler is concerned, okay. So a call from a C++ .o file to this would lead to linking problems, okay. Does that make sense to people; yes, no? So C++ is a little safer in that regard as well.
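The effect of folding parameter types into the symbol can be imitated with a toy routine that builds names in the made-up style used in the lecture. Everything here is an assumption for illustration: `toy_mangle`, `symbols_link` and the `name_type_type` scheme are invented and do not correspond to any real compiler's mangling rules.

```c
#include <stdio.h>
#include <string.h>

/* Builds name_type1_type2... into out. The scheme is invented, like the
   one in the lecture; real C++ compilers use different encodings. */
void toy_mangle(char *out, size_t cap, const char *name,
                const char *types[], int count)
{
    size_t used = (size_t)snprintf(out, cap, "%s", name);
    for (int i = 0; i < count && used < cap; i++)
        used += (size_t)snprintf(out + used, cap - used, "_%s", types[i]);
}

/* Returns 1 if the symbol the caller asks for matches the symbol the
   implementation was tagged with, which is all a linker compares. */
int symbols_link(void)
{
    const char *call_types[] = { "void_p" };
    const char *impl_types[] = { "void_p", "void_p", "int" };
    char call_sym[64], impl_sym[64];

    toy_mangle(call_sym, sizeof call_sym, "memcmp", call_types, 1);
    toy_mangle(impl_sym, sizeof impl_sym, "memcmp", impl_types, 3);
    return strcmp(call_sym, impl_sym) == 0;
}
```

In pure C both sides would use the bare name memcmp and the link would succeed. With types folded into the name, the one-argument call and the three-argument definition no longer share a symbol, so the mismatch surfaces at link time rather than at run time.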

Okay, so there’s that. I have a few things I want to do. I’m gonna spend today and Friday easing up a little bit. I’ve actually formally covered everything that I’m gonna cover on the mid-term. In fact, I’m not even gonna test you on this stuff on the mid-term. I’m just trying to give you a very big picture of the entire effort that goes into building a C or C++ executable. I want to go back a little bit and talk about debugging and, in particular, give you some lighthearted examples as to why programs crash the way they do. It’s one thing to say they crash; that’s not very interesting or insightful. Yeah, it crashed, well of course it did, it’s C, but why did it crash? What happened at run-time to actually prompt the crash, or the segmentation fault, or the bus error? I know you may have no idea what the difference between those two is. I’ll tell you right now what they are, but I just want to show you why programs run the way they do when there are little bugs in there, okay. I mean, if something survives compilation and survives linking, why it runs and how it can still go astray.

Let me quickly talk about the two very harsh alerts that are thrown when your program crashes. You’re used to seeing segmentation faults and you’re used to seeing bus error, okay. You’ve probably seen seg faults more recently in assignments three and assignment four more. I wouldn’t be surprised if you saw a lot of bus errors in assignment two, okay. This right here always comes into plays when you dereference a bad pointer. That turns out to be the case with this as well, but there’s different reasons in each scenario. If you ever try to dereference the nul pointer, if you really try to do this, well that wouldn’t compile, because you can’t dereference a void star, but conceptually, if you were to actually try and jump to the nul address to discover an integer or a car star, it issues a segmentation fault, okay. The reason that’s the case is because for the 12th time this quarter, I’m drawing all of ram and I’m drawing the stack up here and I’m drawing the heap right here. Stack, heap, here’s the code segment. There’s also the data segment is usually actually done here, but I’ll draw it up here because there’s room. The nul address corresponds to that. Do you understand that the four bytes at address zero, one, two and three, they’re not part of any segment, okay. The operating system can tell that. It’s like, okay, you’re dereferencing the nul pointer. I’m not mapping the zero address to your stack or your heap or your code segment. So I know this can’t possibly be right, because you’re not dealing with an address that should be under your jurisdiction. It’s not the address of a local variable; it wasn’t an address that was handed back from malic, so why are you dereferencing it? I’m gonna scream at you in the form of a seg pull, okay, and that’s what a segmentation fault is. It’s your fault for not mapping to a segment, okay. This is a little bit different. 
Bus errors are actually generated when you dereference an address that really is somewhere in one of the four segments, but the hardware can tell, or it thinks it can tell, that the address couldn’t possibly correspond to what you think it is. Say you have an arbitrary address here, void star vp is equal to whatever, and then you do this. If vp really is an address that resides somewhere in one of the four segments, you will not get a segmentation fault, because you are hitting a segment.

Because they want to make things simpler, compilers adhere to a restriction that’s imposed by the hardware and the operating system: that all integers begin at addresses that are a multiple of four, and that all shorts begin at even addresses. There’s no restriction on characters, but basically, just to keep things clean and to kind of optimize the hardware, it always assumes that all figures other than shorts and bytes reside at addresses that are a multiple of four. I don’t know whether you questioned why I had this random padding every once in a while in the data images from Assignment 2. Like I said, okay, there’s a two-byte short that follows a backslash zero, unless the length of the actor’s name is even, in which case there are two backslash zeros. That’s because I knew you wanted to dereference some pointer in there as a short star, and if it happened to be an odd address, even though the two bytes that are there really do pack in a short, the hardware will be like, whoa, bus error, I don’t like that. I don’t like you dereferencing odd addresses and thinking that there are shorts there, because I know that the compiler would never normally put a short at an odd base address. Does that make sense to people? Okay. If I do this, and let’s say that vp is equally likely to be any address that’s inside the stack or the heap or the data segment or the code segment, then this would throw a bus error with 50 percent probability. Does that make sense? Okay, if it doesn’t throw a bus error, then it really does write a two-byte seven somewhere.

If I were to do this, then that would throw a bus error if vp was really part of some segment somewhere, but vp wasn’t a multiple of four. The address 2002? No, it wouldn’t put an int there, right. So it’s not gonna let you start laying down a four-byte integer at what, to it, looks like an odd address, even though it’s not really odd. Address 2000, it’s fine. Addresses 2004 and 2008, they’re great; 2002? No. 2001? Doubly no, okay. None of those intervening addresses could house, or be the base address of, an integer. It’s like a block where all the house addresses have to be a multiple of four. Okay, or one of those snobby neighborhoods where everything’s like 100 or 200, okay, the addresses on the houses. Does that make sense to people? Okay, so when you see a bus error, I’m sorry, when you see a seg fault, it’s almost always because you have some NULL pointer. In theory it can be any address off of a segment, but it’s gonna be either dereferencing NULL, or a four or an eight or some very small address relative to the NULL pointer, okay. Bus errors I see less often. It usually only happens when you’re dealing with manually packed data, like we saw in Assignment 2, and you’re trying to rehydrate two-byte and four-byte figures from arbitrary addresses internal to some data image, okay.
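As a small illustration of the alignment rule just described, here is a minimal sketch in C. The function names `is_aligned` and `read_u16_any_address` are hypothetical and do not come from the lecture or from Assignment 2. The first function checks whether an address is a multiple of a given size. The second shows one common way to pull a two-byte value out of packed data at any address without casting to a short star, by copying the bytes with `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the address p is a multiple of `alignment` bytes, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Reads a two-byte value from any address, even an odd one.
 * memcpy moves the bytes one at a time, so no misaligned two-byte
 * load is ever issued, which is what would trigger the bus error. */
uint16_t read_u16_any_address(const void *p)
{
    uint16_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

With helpers like these, code that walks a packed data image can either verify an address before dereferencing it as a short star or sidestep the question by copying.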

Okay, so now that you have that, you have some more information, and when you see bus errors and seg faults, at least you have some idea of what’s happening. Let me throw some code up on the board, and I want you to understand why this program does what it does. Here is the entire program. I’m not gonna concern myself with pound includes; just assume anything that needs to be pound-included is. I’m gonna declare an int i right there. I’m gonna declare an int array of length four below it, and then I’m going to deliberately mess up. i is equal to zero, i less than, is it less than or less than or equal to, I don’t know, I’m gonna include more, that’s probably safer, and I’m going to do array of i is equal to zero, okay. You see the bug, okay; you see that it’s overflowing the bounds of the array. What you may not be sensitive to, and this is something you can only understand after you see a mock memory model, which is actually really close to the real memory model of most function call and return mechanisms, is that this code executes with this image in mind. Here is the saved PC of whoever called main. It’s actually a function called start that calls main, and start’s responsibility is to parse the command line, build the argv array of strings, count the number of strings that are there, and put that number in argc. It actually puts a NULL after the very last string, so in case you want to ignore argc you can. But then this is that array of length four.

This is i, and let me just be obtuse about it and trace through this, even though you know exactly how it’s gonna run. It’s gonna set this equal to zero, it’s gonna pass the test, it’s gonna put a zero right there, it’s gonna come around and promote that to a one. It’s gonna pass the test, so it actually lays a zero right there. It succeeds in making this two and then three, zeroing the slots right there along the way. After it writes that zero, it promotes this to a four, okay, and you’re like, okay, something’s gonna happen and it’s probably not good. It comes over here, it passes this test, so it says okay, I guess, and it’s not even gonna say I guess, it just does it, it’s not suspicious. It comes over here and it writes a zero to something that’s four integers above the base address of array. So where does it write that zero? Over the four. So it just does that, it comes back up here and it’s like, wow, that’s weird, I thought I saw a four here before, but I guess it was a zero, so I’m gonna promote it to a one and I’m just gonna write a zero over here. Wow, it’s a zero already, what a coincidence. And it’s gonna keep on doing this, okay; it’s gonna basically start right there, go up here, and keep on cycling. How long? Forever, okay, and that’s because of the way that everything happens to be packed in memory: this buffer overflow, technically that’s what it is, really does damage. It kills data. In this case it happens to get a program to run forever, so this is a slightly more complicated version of while (true), okay. Does that make sense to people? Okay.
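The trace above can be reproduced safely with a small simulation. This is only a sketch: instead of really overrunning a local array (which is undefined behavior in C, and depends on how a given compiler lays out the frame), it models the lecture’s picture with one five-slot array in which slots 0 to 3 play the role of array and slot 4 plays the role of i. The function name `run_overflowing_loop` and the iteration cap are assumptions added for illustration.

```c
#include <stddef.h>

/* Models the lecture's frame: slots 0..3 are array[0..3], slot 4 is i.
 * Returns how many loop bodies ran before the loop ended or the
 * safety cap max_iterations was reached. */
size_t run_overflowing_loop(int use_buggy_test, size_t max_iterations)
{
    int frame[5];
    int *array = &frame[0];
    int *i = &frame[4];                    /* i sits directly above the array */
    int limit = use_buggy_test ? 4 : 3;    /* i <= 4 versus i <= 3 */
    size_t iterations = 0;

    for (*i = 0; *i <= limit; (*i)++) {
        if (iterations == max_iterations)
            break;                         /* cap so the demo terminates */
        array[*i] = 0;                     /* when *i == 4 this zeroes i itself */
        iterations++;
    }
    return iterations;
}
```

With the correct test the body runs four times. With the buggy test it runs until the cap, because every fifth write resets i to zero.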

There are other variations on that; let me just do a couple of other things here. I have to change just one line here, so I’ll give you a second to recover. Suppose I just do this, short array, so the stack frame picture actually changes a little bit. Now the stack picture looks like this: i is still this big fat int, but now there are four shorts packed here, okay. Makes sense? On some systems this is gonna work fine, fine being a relative term and meaning not badly, and on some systems it’s gonna run forever, okay, for very much the same reasons, except there’s a little bit of a size difference thing here that we have to worry about. This is set to zero, lays a zero down there; this is set to one, lays a zero down there; this is set to a big fat two, lays a zero there; three, puts a zero right there.
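Whether this short-array variant misbehaves depends on which two bytes of i the stray two-byte write lands on, and that in turn depends on byte order. The sketch below models the frame as twelve raw bytes (eight for the four shorts, four for i) and stores i explicitly in either big-endian or little-endian order, so the result does not depend on the machine running it. All names here, including `run_short_array_loop`, are hypothetical.

```c
#include <stddef.h>

enum { BIG_ENDIAN_MODEL = 0, LITTLE_ENDIAN_MODEL = 1 };

/* Stores a four-byte value at `at` in the requested byte order. */
static void store_int(unsigned char *at, unsigned value, int order)
{
    for (int b = 0; b < 4; b++) {
        int shift = (order == LITTLE_ENDIAN_MODEL) ? 8 * b : 8 * (3 - b);
        at[b] = (unsigned char)(value >> shift);
    }
}

/* Loads a four-byte value from `at` in the requested byte order. */
static unsigned load_int(const unsigned char *at, int order)
{
    unsigned value = 0;
    for (int b = 0; b < 4; b++) {
        int shift = (order == LITTLE_ENDIAN_MODEL) ? 8 * b : 8 * (3 - b);
        value |= (unsigned)at[b] << shift;
    }
    return value;
}

/* Bytes 0..7 model short array[4]; bytes 8..11 model int i.
 * Runs "for (i = 0; i <= 4; i++) array[i] = 0;" on that model and
 * returns the number of loop bodies executed (capped). */
size_t run_short_array_loop(int order, size_t max_iterations)
{
    unsigned char frame[12] = {0};
    unsigned char *i_bytes = frame + 8;
    size_t iterations = 0;

    store_int(i_bytes, 0, order);
    while (load_int(i_bytes, order) <= 4 && iterations < max_iterations) {
        unsigned i = load_int(i_bytes, order);
        frame[2 * i] = 0;                  /* array[i] = 0 is a two-byte store */
        frame[2 * i + 1] = 0;
        store_int(i_bytes, load_int(i_bytes, order) + 1, order);
        iterations++;
    }
    return iterations;
}
```

In the big-endian model the loop runs five times and stops, since the overflowing write only covers bytes that were already zero. In the little-endian model the same write erases the 4, so the loop runs until the cap.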

Then this thing is promoted to a four. I’ll put it right there; we’ll say it’s a big-endian system, okay, so it’s really laid out like that. So if it’s a big-endian system, when it overwrites the bounds of the array, all it’s doing is overlaying zeros where zeros already were. I’m not saying it’s correct, but you actually don’t see the problem, and it just runs and it takes 20 percent longer than it should have, but you don’t notice things that fast, so you don’t really care, and it runs and it returns and you think all is great. And so you move from the [inaudible] and it’s 11:59 and you’re like, oh, I better test it on the pods. So you bring it over here and you’re like, wow, it runs forever. Why? Okay. That’s because the four was over here on the little-endian systems, and the pods run Linux on x86 machines; they’re little-endian. It overwrites this four, zero, the two-byte little-endian representation of a four, with zero, zero, which is zero in both big-endian and little-endian, and then goes through the same confused cycle that the int array version did. Does that make sense to people? Okay, so there’s that.

I have one minute, let me just give you one other example. I have one minute and 20 seconds, I can do it. I actually gave this about five years ago on a midterm. I thought it was so clever and they didn’t. So I had this as a function, void foo, and I was curious as to what happens when you call foo. I did this: int array of size four, int i, and then I did this. Don’t question why I’m doing it; i less than or equal to four. The error is the same, the for loop issue is exactly the same, but I do this, and given that the array is not initialized, that’s a weird thing to do to an uninitialized array slot, okay. Notice that the array is above i this time, okay, so I’m back with all integers.

There’s my i variable, here are the four integers that are part of a larger array, okay, and so it does this. All it does here, as this goes from zero up to four, is demote each slot by four, whatever happens to be in there, and because it’s allowed to go one iteration too many, right, whatever happens to be here is also decremented by four. Now, we know what happens to be there: if that’s the only set of local variables that I declare, this is the saved PC, okay. That make sense to people? So without really knowing it, somebody took the saved PC and decremented it by four, numerically by four. What that means is that this, as a pointer, which used to point to the instruction after the call to foo, something that lets this continue, somebody said I’m gonna make you unhappy and put you right there. The impact is that this thing returns and the caller wakes up, and execution carries forth from where the saved PC says it should carry forth from. Somebody moved the piece of popcorn back four feet, and so it says, oh, I have to call foo again, and it does, and it returns, and it’s like, oh, I have to call foo again. Okay, that’s exactly what happens: it keeps putting the address of this thing down here, but because of this buffer overflow, it keeps decrementing the saved PC by four bytes, which means it marches it back one instruction, so that’s how you get this more interesting version of infinite loop. It kind of is infinite recursion. Right at the end of the foo call, really, really toward the end of the foo call, it calls foo again, okay. Does that make sense?
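The saved PC effect can be modeled without touching a real return address. In the sketch below, which is a toy model and not how any particular compiler or machine lays things out, each caller instruction is assumed to be four bytes wide, instruction number 2 is the call to foo, and foo’s frame is a six-slot array holding i, the four array slots, and the saved PC. The names `foo_model` and `count_calls_to_foo` are hypothetical.

```c
#include <stddef.h>

/* Toy model: instruction k of the caller lives at address 4 * k,
 * and instruction CALL_INDEX is "call foo". */
enum { INSTRUCTION_BYTES = 4, CALL_INDEX = 2, CALLER_LENGTH = 5 };

/* foo's frame in the lecture's picture: slot 0 is i, slots 1..4 are
 * array[0..3], and slot 5 is the saved PC. */
static void foo_model(int frame[6], int buggy)
{
    int *i = &frame[0];
    int *array = &frame[1];          /* array[4] is the same slot as frame[5] */
    int limit = buggy ? 4 : 3;       /* i <= 4 versus i <= 3 */

    for (*i = 0; *i <= limit; (*i)++)
        array[*i] -= 4;
}

/* Runs the caller from its first instruction and returns how many
 * times foo was called (capped at max_calls). */
size_t count_calls_to_foo(int buggy, size_t max_calls)
{
    size_t calls = 0;
    int pc = 0;

    while (pc < CALLER_LENGTH * INSTRUCTION_BYTES) {
        if (pc == CALL_INDEX * INSTRUCTION_BYTES) {
            if (calls == max_calls)
                break;                          /* cap so the demo terminates */
            int frame[6] = {0};
            frame[5] = pc + INSTRUCTION_BYTES;  /* saved PC: instruction after the call */
            foo_model(frame, buggy);
            calls++;
            pc = frame[5];                      /* resume wherever the saved PC points */
        } else {
            pc += INSTRUCTION_BYTES;
        }
    }
    return calls;
}
```

With the correct loop bound foo is called once. With the off-by-one bound the saved PC is pulled back onto the call instruction each time, so the call repeats until the cap.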

Okay, so come Friday I will talk to you about printf and a couple of other things, and that’s probably it. It’ll probably be a nice, easy lecture on Friday. Have a good night.

[End of Audio]


Lecture 13: Programming Paradigms

Topics: Example in Which Writing Past the End of Array Causes the Return Address of the Function to be Overwritten, Leading to An Infinite Loop, Example in Which Data Is Incorrectly Shared between Two Different Functions, But Can Still be Printed Out Due to the Structure of the Activation Record (Channelling), How Printf's Prototype Uses "...", Which Allows It to Take A Variable Number of Arguments, Why Parameters Are Pushed Onto the Stack From Right to Left, in the Context of Printf Crawling Up the Stack And Functions With A Variable Number of Arguments, Justification For Structs' Fields being Laid Out Sequentially in Memory, in Terms of Casting between Different Structs With Similar Internal Structures, Sequential Programming Vs. Concurrent Programming, Example of Many Different Processes Running in Separate Virtual Address Spaces, Each Mapped to Physical Addresses by A Central Memory Management Unit, How Concurrent Programming (Multiprocessing) Allows Multiple Processes to Seemingly Run At the Same Time, How Multithreading Allows Multiple Functions to Run 'Simultaneously' Within One Process (E.G. the Office Assistant in Microsoft Office Or Downloading Songs in iTunes), Real-World Situation that Can be Modeled Using Threads (10 Ticket Agents Simultaneously Selling 150 Tickets)



Instructor (Jerry Cain): We’re on. Hey, everyone. Welcome. I don’t have any handouts for you today. I actually want to finish up the implementation section of the course. I think we’ll get through it today. I’ll make it a point to finish it today. Come Monday, we’re gonna start talking about multithreading, and I’ll even preface that a little bit today, provided I don’t run out of time. Remember that your midterm is Wednesday evening over in Hewlett 200, the largest room on this end of campus. It’s 7:00 to 10:00 p.m. It’s open note, open book. You can bring printouts of your programs, whatever you need. Just plan on taking the exam in the three-hour period. It’s designed to be more than enough time for the midterm. Okay. There’s also the technology talk right after this class, right here over in Hewlett 201. So you can first visit Hewlett 200 to see what the room is like for the exam. But in 201, there’ll be a technology talk from 12:00 to 1:00. Okay.

I left you last time with this example, but I got several questions about how it worked, which probably means that I rushed through it in the last five minutes of class, which is probably true. We had something like this, where I declared an int array of length four, and int i to serve as a for loop index, and then I’m just gonna go and do this. I don’t care that the array hasn’t been initialized. I want to go ahead and do this right here. And then just return. What you probably do remember from Wednesday is that, given our memory model, this would prompt the program to run forever. Why is that the case? Based on this local variable set, we’re dealing with this as an activation record. One, two, three, this is the array. As far as that for loop is concerned, it’s just one too small. This is the i variable. It goes through and it demotes all of these variables by four. What values were they before? We have no idea. But it will certainly take whatever values happen to reside there and demote them by four.
Unfortunately, because the test is wrong, it goes up and it demotes this numerically by four, as well. Now that four isn’t so much a negative four delta as it is a negative one instruction delta, because you know that this is the saved PC in our model. This was supposed to be pointing somewhere in the code segment, to the line right after the call to the foo function. This is planted down there in response to that assembly code statement. Does that sit well with everybody? Okay.

When you do this, you inadvertently tell the saved PC that it wasn’t pointing to this instruction, but that it should back up four bytes, which happens to be this nice round number as far as assembly code instructions are concerned, and to stop pointing there and to point there instead. So when this function returns, it jumps back to this right here, and it executes the call, having no memory whatsoever that it called foo like 14 or 15 assembly code instructions ago. Okay. Does that make sense to people? So this is this very well disguised form of recursion. You won’t be able to emulate this on the Solaris boxes or on the pods or the myths, because their memory model is a little bit more sophisticated than ours. They don’t put the saved PC right next to this right here. But nonetheless, that is probably the fifth example of infinite recursion we’ve seen in the last two days.

Let me show you another program. No infinite recursion in this one. I want to write a simple main program, int main. I don’t care about the parameters. I want to do this. I want to call, let’s just say, “declare and init array.” Now, I’m not declaring any local variables whatsoever, so you should already be a little bit suspicious as to what that type of function could do. But imagine a CS 106A programmer in week four, learning C instead of Java, and they just don’t have arrays and parameter passing down. So they have this right here. And then afterwards, they have this thing called “print array.” And that is it. Now, unless there are global variables involved, there’s no legitimate way to communicate information between those function calls. Even though we don’t like globals, that would still be a legitimate way. But suppose there are no globals, either. The programmer doesn’t think they need globals, because they do this: void declare and init array.

And they just declare an array of length 100. They declare the for loop variable i. They’re not going to overrun the bounds this time. They’re gonna get it right. And they’re gonna set array of i equal to i. So that’s a beautiful little piece of code, but it does very little outside of the scope of the declare and init array function. But for whatever reason, they’ve decided that they want to declare this array, locally update it to be a counting array, and then leave. Okay. Then they come back and they want to print out that array, and this is not that uncommon when we taught in C. They think that as long as they name the array the exact same thing, all of a sudden there’s a relationship set up between that array and this one, as if the word array has now been reserved throughout the entire program. And they do this, and then they do this: for i is equal to zero. They get the for loop bounds right again; that’s not the issue. Print out percent D backslash N, array of i. And they come in during office hours because there’s some part of the program beyond “print array” that’s not working properly. But then a TA looks at this, or I’ll look at this and say, “Oh, this is wrong right here.” And they’re like, “No, that’s working fine. It’s actually down here.” Well, it may be that something’s wrong down here as well, but clearly there’s something wrong going on right here. However, they make the argument, “Well, it’s working.” And the answer is, it is working as far as they’re concerned, because that manages to print out zero through 99, inclusive. You all probably have some sense as to why that’s happening. Try explaining that to someone who’s never programmed before. This right here builds an activation record of 404 bytes, goes down, and lays down the counting array in the top 100 of the 101 slots. Returns.

It calls a function over there that has exactly the same image for its activation record. So it goes down and embraces the same 404 bytes. This is me embracing 404 bytes. Okay. And it happens to print over the footprint of this function right here. It’s not like when this thing returned it cleared out all of those bits and said, we’ve got to scramble this so it doesn’t look like a counting array. It doesn’t take the time to do that. So basically, what happens is, if this is the top of the activation record and this is the bottom of the activation record, SP descends once, this is filled up with happy information, SP comes back up after the first call returns, then comes back down to the same exact point, and the happy face is still there. We print over it, and it happens to be revisiting the same exact memory with the same exact activation record, so it prints everything out exactly the same. Makes sense? Now, there are some advanced uses of this, where it really does help to do this. A lot of times, not in sequential code like we’re used to, but about 11 years ago, I had to rely on this little feature when I was writing a driver for a sound card. You had to actually execute little snippets of code with each hardware interrupt. And so a lot of times, you had relatively little to do in one little heartbeat of an interrupt, and you had a lot to do in the next one. In fact, you had so much to do that you weren’t sure you were going to have time for it consistently. So a lot of times, you would prepare stuff ahead of time and put it in exactly the right space, so you knew where it was the next time without having to actually generate it. Do you understand what I mean when I say that? Okay. So you can almost think of this as this abusive, perverse way of dealing with a global variable. You’re setting up parameters for code that needs to run later on. This example wouldn’t exactly work that way. This isn’t a great example of that, but at least it highlights the feature.
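Reading an uninitialized local array in real C is undefined behavior, so the effect described above cannot be demonstrated portably with actual stack frames. The sketch below therefore models it instead: a static array stands in for the stack, an index stands in for SP, and both functions carve out the same 101-slot frame (slot 0 for i, slots 1 to 100 for the array). Every name in it is hypothetical, and `print_array_model` copies values into a caller-supplied buffer rather than printing so the result is easy to inspect.

```c
#include <stddef.h>

enum { STACK_SLOTS = 256, FRAME_SLOTS = 101, ARRAY_LENGTH = 100 };

static int stack_model[STACK_SLOTS];
static size_t sp_model = STACK_SLOTS;     /* the model stack grows downward */

/* Decrements the model SP by one frame and returns the frame's base. */
static int *enter_frame(void)
{
    sp_model -= FRAME_SLOTS;
    return &stack_model[sp_model];
}

/* Restores the model SP. Note that nothing in the frame is wiped. */
static void leave_frame(void)
{
    sp_model += FRAME_SLOTS;
}

void declare_and_init_array_model(void)
{
    int *frame = enter_frame();
    int *array = frame + 1;               /* slot 0 is i, slots 1..100 are array */
    for (frame[0] = 0; frame[0] < ARRAY_LENGTH; frame[0]++)
        array[frame[0]] = frame[0];
    leave_frame();
}

void print_array_model(int out[ARRAY_LENGTH])
{
    int *frame = enter_frame();
    int *array = frame + 1;               /* same layout, same addresses, never set here */
    for (frame[0] = 0; frame[0] < ARRAY_LENGTH; frame[0]++)
        out[frame[0]] = array[frame[0]];
    leave_frame();
}
```

Calling the first function and then the second yields 0 through 99 in `out`, purely because the second frame lands on the footprint the first one left behind.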

This thing is called “channeling.” That’s exactly what this is when really advanced C++ programmers take advantage of this type of knowledge as to how memory is laid out. As far as the problem at hand is concerned, you feel helpless. You try to explain, for instance, that if you’ve just put a call to printf right there, its activation record has nothing to do with these activation records, so it goes in and garbles all of the sacred data from zero to 99. And their response is, well, just don’t do that. And they comment it out and they think they’ve restored the program, but they really have not. Does this make sense to people?

Now, I want to revisit this printf thing, actually. I don’t know how many of you know the prototype for printf. You really are just learning it, basically, on the fly, in context, when you’re dealing with Assignment 4 and you’re just seeing lots of printfs in the starter code, and you just kind of understand that there’s a one-to-one mapping between placeholders, these percent D things or percent G things, and the number of additional parameters. Well, you know sample calls are things like this: printf hello. And that’s in contrast to cout << “hello” << endl, with hello as a string constant and an endl at the end. When you have something like this: percent D plus percent D equals percent D backslash N, and you want to fill it in with four and four and eight, if you wanted to do it that way, you certainly could. But the interesting thing from a programming language standpoint is that printf seems to be taking either one argument or four arguments, or really, it takes anything between one and, in theory, an infinite number of arguments. It’s supposed to be the case that those things line up, in terms of placement and data type, with the additional parameters. What kind of prototype exists in the language to accommodate that kind of thing? Well, we always need the control string.

That’s either the template or the verbatim string that has to be posted to the console. So the first argument to this thing is either a char star or, if the language supports const, a const char star. So I call it “control.” But then, there’s no data type that really has to be set in stone. There doesn’t even have to be a second argument, much less a data type attached to it. I could put a string there if this is a percent S, a float there if this is a percent G, things like that. The prototype for that is dot, dot, dot. And so forth. Whatever they want to type in. And the compiler is actually quite promiscuous in what it will accept as arguments two, three, and four, provided that dot, dot, dot is there. If you don’t want to insert anything, it’s fine. If you want to insert 55 things, it’s great. The compiler is not obligated to do any kind of type checking between this and what this evaluates to. There’s nothing implicit in the prototype right there. It’s just a char star, a free-form char star, and then whatever you want to pass in. You want to pass in structs? Great. Pointers to structs? You can do that. GCC, for quite some time, has had an extension to the C spec that it’s implementing, where it does try to do type checking between this and that right there. If you mismatch, it’s doing a little bit more work at compile time than it’s really obligated to do. It wants to make sure that this printf call works out. So if you were to put percent S, percent S, percent S, and put four, four, eight there, most compilers would just let you do it and run, and it would just lead to really bad things. But GCC will, in fact, flag it and say, you didn’t mean to do that. Okay. Does that make sense? This return type, I mentioned this on Wednesday, this return type is the number of placeholders that are successfully bound to. So as long as everything goes well, it would be zero for this call and three for that call. It’s very unusual for printf to fail.
Scanf, the read in equivalent of printf, can fail more often.

But if, for whatever reason, something goes badly, this would return negative one. Do you know how ifstreams set the fail bit to true, so that when you call the dot fail method inside C++, it’ll basically evaluate to true, and that’s the way you break out of a file reading program? Well, you’re relying on the read-in equivalent of printf’s return value, on “scanf” or “fscanf” returning negative one when it’s at the end of the file. The reason I’m bringing this up is because, based on what we know and the way we’ve adopted a memory model, I now can defend why we push parameters on the stack from right to left, why the zeroth parameter is always at the bottom and the first parameter is always above that. Let’s just consider that call right there. The prototype during compilation just says that either of these calls is legitimate, but when it actually compiles that second printf there, it really does go and count parameters, and figures out exactly how many bytes to decrement the stack pointer by for that particular call. So the way that the stack frame would be set up is that this would be the saved PC that’s set up by the call to printf. Above it would be a pointer to the string, percent D plus percent D equals percent D backslash N. And this would have a four, this would have a four, and this would have an eight. The activation record for the first call would just have this many bytes, and would have a pointer to the hello string. So the portion of the activation record above the saved PC is completely influenced, not surprisingly, by the number of parameters that are pushed onto it. Now, when we actually jump to printf, and this is where the SP is left, it doesn’t have any clear information about what resides above the one char star that’s guaranteed to be there. Does that make sense to people?

So really all that happens, I’m gonna draw this arc again, like I did before, is that it knows about that much. There are special directives in the implementation of printf that allow it to manually crawl up the stack. But the number of arguments, and the interpretation of the four-byte figures that reside there, it could only figure that stuff out by really analyzing and crawling over this control string character by character. This is almost like the roadmap to what resides above it in the stack frame. Does that make sense? So the printf function really does need to get to the control string, and it reads it character by character, and every time it reads a percent D, let’s say it reads a percent D right at the front, it says, oh, the four bytes above the control string must be an integer. And then it sees another percent D along the way. It says, above that there must be some other integer. And this is how it discovers the stuff that should fill in the control string. So you understand that, otherwise, if it didn’t have this right here, this would truly be a big series of question marks. It still kind of is a series of question marks. If this is the wrong roadmap, if I do percent D plus percent D equals percent D and I pass in three strings, these things will be laid down as char stars. That’s what the caller would do. And then printf would interpret them as four-byte integers. So whatever addresses happen to be stored there would be taken as integers, and it would just fill those three things in that way. Makes sense? This is consistent with the way we push parameters onto the stack, from right to left, last argument first, then the second-to-last argument below that, etc., so that the zeroth argument is at the bottom.
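The idea of using the control string as a roadmap can be tried out with the standard `<stdarg.h>` facilities, which are the portable way for a C function to reach its unnamed arguments. The function below is a hypothetical stand-in for printf, not its real implementation: it only understands percent D, sums the integers it finds instead of printing them, and reports how many placeholders it saw.

```c
#include <stdarg.h>

/* Walks `control` character by character. Every "%d" means "the next
 * unnamed argument is an int". Returns the sum of those ints and
 * stores the number of placeholders seen in *count. */
int sum_ints_by_control(int *count, const char *control, ...)
{
    va_list args;
    int sum = 0;

    *count = 0;
    va_start(args, control);              /* start just past the last named parameter */
    for (const char *p = control; *p != '\0'; p++) {
        if (p[0] == '%' && p[1] == 'd') {
            sum += va_arg(args, int);     /* trust the roadmap: read the next int */
            (*count)++;
            p++;                          /* skip the 'd' */
        }
    }
    va_end(args);
    return sum;
}
```

A call such as `sum_ints_by_control(&n, "%d + %d = %d\n", 4, 4, 8)` reads three integers. Nothing stops a caller from passing arguments that do not match the control string, and in that case the function misreads them, exactly as described for printf.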

Imagine the scenario where the saved PC is addressed by the stack pointer, but you have the mystery number of bytes below the control string. That question mark region would be of height zero for the printf hello call, and of height 12 for the printf four plus four is equal to eight call. It would have no consistent, reliable way of actually going and finding the roadmap as to how to interpret the rest of the activation record. Does that sit well with everybody? So as long as you understand that, then at least you have a defense for why, because of the dot, dot, dot, the C spec more or less has, I guess it doesn’t have to, but it just made sense for compilers to implement this right-to-left parameter pushing strategy. Because they want to support that dot, dot, dot in the language. C++ has to do the same thing, because it’s backwards compatible with C. Java just recently introduced the ellipsis, I think it was either Java 1.5 or Java 1.6, I’m not sure, but very recently they introduced the ellipsis. And so I just know, without actually reading anything about it, that they have to push their parameters on the stack in exactly the same way. Pascal, old school, wasn’t old school for me when I was in college, but it’s old school for everybody here. I didn’t learn Pascal. I learned C first, but when I read about Pascal, it doesn’t have this ellipsis option. You have to specify the number of arguments. It happens to push the arguments on in the opposite order. And it doesn’t cause any problems. They had the flexibility to do it in either order because they never had to deal with this question mark region in a stack frame. Does that make sense? But as far as structs are concerned, you may ask, and this is a hard point to make; I’m gonna try and do it. Struct base, let’s say I have an int code and I have, let’s say this. Int code, and I’m just gonna do that right there.

It’s unusual for you to have a struct around one data type, but I’m gonna do it anyway. And then I have struct type one, which has an int code and, let’s say, several other fields. Maybe it’s the case that the code inside a type one struct is always supposed to be one. So just take this as a series of requirements of the example. If I have struct type two, with an int code at the front, I might require that all instances of type two actually have a two at the front. These are really esoteric examples. I’m just making this up. I haven’t done this in past quarters. But this right here is a data structure where I guarantee, whenever I give you a pointer to a struct base, that there is one of two values sitting right there. It’s almost like a little opcode in an assembly instruction, in a sense: you use it to figure out how to interpret what resides above the one or two in memory. You could cast that pointer to a struct base, or maybe it is typed to be a struct base, which means that you know that there’s some kind of opcode or type code sitting there. And then based on the result here, you can cast this arrow to be either a type one star or a type two star to figure out how the rest of the information is fleshed out. A complicated example, but there are various structs like this. I don’t think I’ve exposed the code for all the networking in the URL connection stuff. If I did, you would have hated Assignment 4 even more, because you would have thought you were responsible for it. But there are lots of structs that are afforded by GCC and G++ to help manage networking, and I tried to insulate you from that. Old school networking deals with four-byte representations of IP addresses. They realized about 15, 20 years ago that they were going to run out of IP addresses pretty soon, so there’s actually a 16-byte universal version of IP addresses.
It’s standard, but it’s not really widely adopted yet. There are two different structs associated with the two different protocols. There’s the four-byte version, IP version four, and there’s IP version six. The IP version four struct has been around for some 25, 30 years. Those things aren’t going to go away. So when they designed the IP version six struct, they had to make sure that the first half of it had exactly the same structure as the IP version four struct, and then they extended it with all this extra information. Does that make sense? So you always know that you’re getting a pointer in networking code.
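The type code pattern from the struct base example can be sketched as follows. This is an illustration under stated assumptions, not the lecture’s board code and not the real networking structs: the field names `length`, `width`, and `height` and the function `measure` are invented, and each larger struct embeds a `struct base` as its first member (rather than repeating `int code`) so that converting a pointer to the whole struct into a pointer to its first member is explicitly sanctioned by the C standard. The memory layout is the same either way, with the code at offset zero.

```c
struct base {
    int code;                 /* 1 means type_one, 2 means type_two */
};

struct type_one {
    struct base header;       /* must come first: the code sits at offset zero */
    int length;               /* hypothetical payload */
};

struct type_two {
    struct base header;       /* same leading layout as type_one */
    int width;                /* hypothetical payload */
    int height;
};

/* Inspects the code at the front of the record, then casts to the
 * matching struct to interpret the rest. Returns -1 for unknown codes. */
int measure(const struct base *record)
{
    switch (record->code) {
    case 1:
        return ((const struct type_one *)record)->length;
    case 2: {
        const struct type_two *two = (const struct type_two *)record;
        return two->width * two->height;
    }
    default:
        return -1;
    }
}
```

Because the code is always the first thing at the base address, `measure` never needs to know the full size or shape of the record before deciding how to read it.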

You always know you’re getting a pointer to one of those two structs, and you analyze the first few bytes to figure out whether or not you have an IP four struct or an IP six struct. Does that make sense? The reason I’m mentioning this is because now it kind of gives you some sense as to why the first field in a struct or a class actually has to be at the lowest address. It doesn’t have to be, but that’s just the way they did it, because they had these types of things in mind. If this and this and this were always the last thing, or the thing at the top of the struct in the activation record, it would always be at a mystery distance from the base address of the entire thing. Does that make sense? Now you could just argue that you could make the code the last thing in a struct so you could do it the other way. It just makes sense to the people who designed the spec or designed the compilers. I don’t think it’s part of the specification, although maybe it is, that the first field is always at offset zero, and you can exploit that knowledge to do clever things with C and C++ structs. Does that make sense? So I’m cranking on time. So what I will do now is I will give you a little bit of a heads-up on how we’re gonna transition things. Everything so far in C and C++ has been sequential. And I’m talking in Java it’s – actually, not in Java necessarily, but all of the C++ and C you’ve done, in all of 106B and 106X, and for example, 107, has either been strictly or seemingly sequential. And you know what sequential means now, because you’re waiting for the sequence of RSS news feeds to all load over a five-minute period while you test. So you know what sequence means more than you did a week ago. I want to introduce the idea of how two functions can seemingly run at the same time. That’s going to be a huge win in the context of Assignment 4. You’ll all be delighted that you’re going to revisit Assignment 4 before Assignment 6.
And you’re going to make it run oh so much faster by using this thing called threading. What I want to do is I want to talk about how two applications – and I’ll just give you very high level stuff – can seemingly run on the same processor simultaneously, and then use ideas from that to defend how two functions within a process can seemingly run simultaneously. I’ll frame this this way. Let’s just think about one of the pods. This is the virtual address space, virtual meaning the illusion of a full address space that Make has while it’s running. You don’t think about Make as an application, but it certainly is.

It’s designed to read in the data file that you call a Makefile to figure out how to invoke GCC and G++ and the linker and Purify and all of those things, to build an instrumented executable. This has a stack segment associated with it. All the local variables of the thing that implements Make go there. There is the heap. There is the code segment. While Make is running, it’s probably the case that GCC, as an executable, is running several times, but we’ll just talk about the snapshot or time slice where just one GCC is running. GCC is an executable. Your first C compiler was probably written in C – not written in C, it was written in assembly, but then it kept bootstrapping on the original compiler to build up more and more sophisticated compilers. So the C compiler was written in C, I’m sure of it. It also thinks its stack is there and its heap is there and its code segment, where all the assembly code stuff resides, is right there. I can tell you right now that they’re not both all in the same place. They do not share the stack and they do not share the heap and they do not share the code. This virtual picture is in place so that Make can just operate thinking it owns all of memory. And it leaves it to the smoke and mirrors that the OS manages to map these to real addresses and map this to real addresses and map that to real addresses. It just wants to be insulated from that. Maybe it’s the case that you have, I don’t know, Firefox up and running on one of the Linux boxes, and it has the same picture. And then you have some other application, like Clock or whatever you have, up there and it has the same exact picture. Those are four virtual address spaces that all seem to be active at the same time. I’m gonna call this process one, I’m gonna call this process two, call this process three, and I’ll call this process four.

When I draw these little bands of segments right here – segments is what they’re called – they’re all assuming that they have as much room to stretch out to make the stack as big as necessary, the heap as big as necessary, to meet the demands of the program. But on a single processor machine, there is only one address space. This is the real deal right there. That’s physical memory. And this has to somehow host all the memory that’s meaningful to GCC and Make and Clock and Firefox, or I should say the processes that are running according to the code that’s stored in those executables. Does that sit well with everybody? Well, it turns out that this and that right there, they may have the same virtual address, but they can’t really be the same physical address. The space has to be truly owned, or the values there in virtual space have to be truly owned by this process I’ve called number one. So what the operating system will do is it will invoke what’s called the memory management unit to basically build a table of – let’s say that this is address 4,000. It’ll actually build a table of process and something related to an address. And it’ll actually map it to a real address somewhere in memory. Does that make sense? The idea, I think, probably makes sense to people. So that this right here, it thinks it’s storing something in address 4,000. Let’s say address 600,000 is right there. Any request to manipulate or deal with address 4,000 is somehow translated to a request to deal with the real address at address 600,000. Any type of load or store or access of this right here has to somehow be proxied or managed by this daemon process behind the scenes, this thing that just runs in the background all the time, to actually map virtual addresses to physical addresses.
And it knows that this address 4,000 is the one that process one owns, so it just has this little map of information – I’ve drawn it as a map of pairs to physical addresses – so it knows exactly where to look on behalf of this process.
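The map of (process, virtual address) pairs to physical addresses can be sketched in C as a small lookup table. This is a simplified model under stated assumptions, not how a real memory management unit is built: the names `struct mapping` and `translate` are invented for illustration, and each entry is assumed to cover a contiguous block of addresses rather than a single one.

```c
#include <stddef.h>

/* One entry of a hypothetical translation table: a block of virtual
   addresses owned by one process, mapped to a block of physical memory. */
struct mapping {
    int pid;
    unsigned long virtualBase;
    unsigned long physicalBase;
    unsigned long length;
};

/* Returns 1 and stores the physical address in *paddr if (pid, vaddr)
   is mapped by some entry in the table; returns 0 otherwise. */
int translate(const struct mapping *table, size_t n, int pid,
              unsigned long vaddr, unsigned long *paddr)
{
    for (size_t i = 0; i < n; i++) {
        const struct mapping *m = &table[i];
        if (m->pid == pid && vaddr >= m->virtualBase &&
            vaddr - m->virtualBase < m->length) {
            *paddr = m->physicalBase + (vaddr - m->virtualBase);
            return 1;
        }
    }
    return 0;
}
```

With an entry `{1, 4000, 600000, 4000}`, a request from process 1 for virtual address 4,000 resolves to physical address 600,000, while the same virtual address requested by a different process resolves through that process's own entry or not at all.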

When it stores something at address 4,000, it updates the four bytes that reside at address 600,000. Now, it doesn’t really chop these things down at the four-byte level. Normally, what will happen is it’ll allocate things – I don’t mean allocate in a malloc sense – it’ll just associate very large blocks in virtual address space with equally large blocks in physical address space. So this might be some 1K or 8K block of memory. And if it’s ever used, then it’s sure to map the same size block or adopt the same size block in physical memory so that address 4,000 through, let’s say, address 8,000, would map to whatever this is through whatever this is plus 4,000. So there’s some contiguous nature going on between all the things in this virtual address space and what it maps to in physical space. Does that make sense? A lot of work. It’s a very difficult thing to implement, certainly, the first time. I’ve never implemented one of these things. Maybe it’s even more difficult than I’m giving the impression it is. But it’s doing all this stuff in the background, it’s threads, it’s all kinds of stuff that makes OS and systems code interesting, but difficult. So we’ve solved the memory problem. In theory, you can run 40 applications. Usually it’s the case that the stack segment is never really that big. It’s initially slotted off to be fairly small for most applications. Because unless you’re going to call Fibonacci of a billion, it’s probably not going to have a call depth greater than 50 or 60. In fact, when you have a stack depth of 50 or 60, it usually means a lot of things are happening. The distance down from main to the subhelper to the 59th power function. I don’t want to say that’s unusual – maybe 100 is normal, maybe 200 is normal; 2 billion is not. So you know that most activation records are on the order of, let’s say that they’re 1K. It might be the case that you set aside 64K for the virtual stack or the stack in virtual space. 
You have two to the thirty-second different addresses. 64K is a ridiculously small amount of that.

It’s like that. So it definitely has space for it. You could be more aggressive about the way you use the heap. You could allocate megs and megs of memory there. It’s still going to be a relatively small portion of memory when you’re talking about two to the thirty-second different addresses. Makes sense? So there are smoke and mirrors in place so that every single application can run at the same time and not have its address space, or what it thinks is its address space, clobbered by other processes. That’s managed pretty well by the OS – not pretty well – ostensibly perfectly by this memory management. You can share address spaces across applications, but you have to use advanced Unix directives to do that. The part that is not clear, and this is going to become more clear, hopefully, next week and the Monday after it, is how the applications seemingly run at the same time, when there’s really only one register set, one processor digesting instructions at a time. I did this the first day of class, but it totally makes sense to do it again here. Forget about Firefox and Clock, let’s just deal with Make and GCC, which is what you’ve really been doing. And think about Make and GCC actually running seemingly simultaneously.

You look at them both running – and my hands are sifting over the assembly code instructions. And they’re both seemingly running at the same time. That’s not what’s happening. What really happens is that Make makes a little bit of progress, and then GCC makes a little bit of progress, Make makes a little bit, and this just all happens in this interleaved fashion, so fast that you don’t see any one lagging behind the other one. It’s like watching two movies at the same time, where not much is happening. So you can actually follow both movies fairly well, as long as it’s clear that both of them are actually running. The argument for two hands scales perfectly well to three hands and five hands and ten hands and 50 hands, as long as the processor has the bandwidth to actually switch between all of the processes fairly quickly. That make sense? Now on a dual processor machine or a four processor machine or a multi-core machine, it can actually really run two processes or four processes at the same time, but you can always run more processes than there are processors on any sophisticated system. If something’s running a dishwasher, then it probably can’t deal with threading, but if it’s actually running some real program, it probably is dealing with a real processor, and the OS can actually dispatch and switch between processes fairly quickly. Makes sense? The reason I bring this up is because that, as a concept, is going to translate, I think, somewhat nicely to the notion of threading. This is multiprocessing. Several processes are seemingly running at the same time, and each process has its own heap and its own stack and its own code segment, in its own virtual space.
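The interleaving of Make and GCC described above can be mimicked with a toy round-robin loop. This is only a sketch of the idea, not how an operating system scheduler is written: the function name `roundRobin`, the notion of a fixed `quantum` of work per turn, and the one-letter tags are all assumptions made for illustration.

```c
#include <stddef.h>

/* Run n tasks in round-robin order, doing quantum units of work per turn.
   remaining[i] is the work left for task i; names[i] is a one-letter tag.
   Records which task ran on each turn in trace (a C string, traceSize must
   be at least 1) and returns the total number of turns taken. */
size_t roundRobin(int *remaining, const char *names, size_t n,
                  int quantum, char *trace, size_t traceSize)
{
    size_t turns = 0;
    int active = 1;
    while (active) {
        active = 0;
        for (size_t i = 0; i < n; i++) {
            if (remaining[i] <= 0) continue;      /* this task has finished */
            remaining[i] -= quantum;              /* a little bit of progress */
            if (turns + 1 < traceSize) trace[turns] = names[i];
            turns++;
            active = 1;
        }
    }
    trace[turns < traceSize ? turns : traceSize - 1] = '\0';
    return turns;
}
```

With work amounts of 5 and 3, tags "MG", and a quantum of 2, the recorded trace alternates between the two tasks until the shorter one finishes, then the longer one takes the remaining turn.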

Slightly different, but certainly related, is the idea of two functions in the same process: one code segment, one heap segment, technically one stack segment. We’re curious as to whether or not it’s possible for two functions to seemingly run at the same time inside a single process. You know; you’ve seen this before. Microsoft Office, like you’re typing and then while you’re typing, all of a sudden in the background, some little paperclip comes up and says, I think you’re trying to write a letter. And that happens in the background, and that’s because something in the event handlers that actually catch your keystrokes has done enough synthesis of the string to decide that it looks like the header of a letter. And so it spawns off this other function that doesn’t – it’s not really supposed to interfere with your typing, and from a computational standpoint, it doesn’t. From an actual mood standpoint, it does, because you actually go down and look at it. But that is an example of a thread that is spawned off in reaction to an event, or something like that. That makes sense?

iTunes, you buy an album of 13 songs – I’m hip; I buy music online – and you check the actual download screen and three songs at a time are actually downloading. Not really. It’s really doing – and iTunes is another hand over here. And it’s pulling things in in little time slices, but it happens so fast and the time slices are so small compared to what we can detect, that it looks like all three songs are being downloaded simultaneously. There’s one process going on there. It’s not like it spawns off a different executable called the “paperclip executable.” It’s just some function to bring up that little widget. When you’re downloading music, there’s one process that’s doing it, it’s iTunes. And internally, it has some function related to the downloading of a single song that at any one moment it’s allowing to run, seemingly, three times – sorry, three at the same time. Does that make sense to people? So just imagine the scenario where there are two songs downloading at the same time. In that case, they both would be following the same assembly code block. They have the same recipe for downloading a song, for the most part. In that case, the stack segment itself could be subdivided into smaller substacks, where the stack frame for song one could be right there, and the stack frame for the downloading of song two is in this completely unrelated space – not unrelated. It’s in the same segment, but far enough away that you’re not going to have one stack run over the top of another one. Do you understand what I mean? So when the process has the processor, it also subdivides its time to switch back and forth between the two threads – that’s what you call these things. And so basically, it goes you have time, you have time, you have time, hope you’re downloading songs. And it keeps doing that until one or both of them ends. Does that make sense? It shares the heap. So they all share the same call to malloc and they all draw their memory from the same memory pool.

There’s only one copy of the code. You only need one copy of the code. It’s read only. It’s fine for this stack and this stack to both be guided by the same set of assembly code instructions, as long as each one has some kind of thread identifier. If the word thread is bothering you, just take the phrase “thread of discussion” or “discussion thread” and just translate it to “function thread.” Does that make sense? You may ask what types of scenarios actually require threading and which ones really don’t require threading. Let me just go over this very simple example where you really would expect threading to be in place to model the real world situation. I will revisit this example on Monday. I have this really simple program, int main – and there’s no threading whatsoever. I have this for loop, int num agents. When I call this num agents, I’m actually thinking about ticket agents who answer the telephone at United or some other airline that still hasn’t declared bankruptcy yet. Num agents is equal to ten, and let’s assume that the job of this program is to simulate the sale of 100 tickets for the only flight that happens to be flying anymore. You might do it this way. You might intrinsically hard code the 100 in there by calling some function “sell tickets,” where you pass in i and you pass in ten.

And I don’t need to tell you what sell tickets – I’ll write code for sell tickets in a second. But the idea here is that we have to sell 100 tickets. This is the number of agents right here. That number is the same as there. In fact, let’s say that there are 150 seats on the flight, so we don’t mix up the two tens. The idea of “sell tickets” is that it’s supposed to be in place to simulate the requirement that some ticket agent actually sells 15 tickets before his or her job is done. As long as these run in sequence, eventually you’ll get to the point where you actually sell 150 tickets in this really crude but simple simulation. The problem here is that in the real world, it’s just fine for all ten of these agents to be answering telephones simultaneously. It’s not – I don’t want to start introducing thread functions yet, but I just want to leave you with the idea that we’re going to be able to get that function right there to run in ten different threads. In other words, I’m going to spawn off ten different threads. All of them are going to be following the same exact recipe, where each one has to sell 15 tickets, and when a particular thread exits, we just know because of the way we code it up that 15 tickets have been sold. I’m not going to write code for this yet, but this is, I think, a fairly good analogy. Imagine a horserace or a dog race. Sequentially, if you wanted all ten dogs to get to the finish line, you could just let one go, and when it gets to the finish line, let the next one go. And just do it that way, and take 15 times as long or ten times as long as you really need to. Or you can line them all up in ten gates and lift the gates at once. And some will be faster than the other ones, but eventually, they’re all going to get across the finish line. Like you’re basically pipelining and taking advantage of the fact that things can happen more or less at the same time. 
Now the difference is that they don’t run like this every time, and they’re not respecting time slices.

They actually are really independent agents. But we’re going to try and simulate that idea as much as possible with this example when we introduce the thread library next time. So this is a great place to stop because I’m at the end of a segment in the course, and to try and introduce threading in five minutes would be pretty difficult. I will see you on Monday and we’ll talk more about threading then.

[End of Audio]


Lecture 14: Programming Paradigms

Topics: Transitioning from Sequential Programming to Concurrent Programming in the Ticket Sale Example, Problems with the Sequential Model, Threading Interface, Rewriting the Ticket Example to Use It, Adding a Randomized Threadsleep Call to the Threads to Make the Time Slices Used by the Different Threads Less Uniform, Sample Output of Our Ticket Threads, How a Thread Can be Interrupted in the Middle of a Nonatomic Operation, How Multithreading Can Drastically Speed Up the RSS News Reader by Allowing Some Feeds to be Loaded While the Other Feeds Are Blocked, Allowing Each of the Ticket Threads to Access A Global Pool of Tickets, Rather than Allocating 15 Tickets to Each Agent, How This Can Lead to Problems When Threads Attempt to Access the Shared Data Simultaneously, How We Can Prevent This from Happening By Enclosing the Critical Region within A Semaphore, the Semaphorewait And Semaphoresignal Functions, Modifying the Selltickets Function to Use the Semaphore to Protect the Shared Data, How Changing the initial Value of the Semaphore Can Create Deadlock or Allow Too Many Threads to Access the Shared Data At Once



Instructor (Jerry Cain):Here we go. He’s obviously ready. Welcome. I have three handouts for you today. I put them up on the board because they’re not in sequence. I went back and used the No. 4 Assignment 5 that I gave out last week, a week ago today. So people who are watching on TV make sure you go back and get the solution to handout 19. I think that’s what it is. Let me just check, yes, it is indeed. So there’s all of these handouts in the last week that have corresponding solution sets, so make sure that you actually download the solutions as well so you can compare your answers to mine and make sure they’re in sync with one another. You know you have a midterm on Wednesday evening, it’s 7:00 to 10:00 in Hewlett 200. It’s a huge room over there; plenty of space as well as the other room, so it’ll be a great place to hang out for three hours. I want to be clear that I am certainly covering code generation; you’ve seen that on the sample exams that I gave out last week. I am not gonna include code generation for C++ features, no true references and no object orientation, no methods. So pointers, that’s fine, anything related to asterisk, that’s real C and I’ve emphasized that in the early part of the code generation, but references and methods, the code generation for that is not testable on the midterm. You will certainly see it on the final. I always know exactly what type of questions I put on the final for that stuff, but you will not see it this Wednesday. Okay? I also promise to not have any preprocessor or linker or compiler stuff. I went through that kind of as a transition from C, figuring out how to build executables from the C language, but I’m not gonna test that material because I have tested it in the past and it just never goes well because it’s so esoteric and we don’t have any kind of real problems to exercise the material. So people just didn’t do well, so I just stopped testing it. 
When I left you last time, I had written this particularly simple program that was supposed to model the selling of 150 airline tickets on a single flight. So let me repeat that and point out why it’s problematic and how we’re gonna move away from it.

I went ahead and did something like this. I wrote it a little bit differently last time, but I’ll write it like this, num tickets is equal to 150. All I want to do, in this brute force for loop, I’ll write agent is equal to one, agent less than or equal to num agents, agent ++ – I went ahead and I called this function, called sell tickets, and I’m gonna frame it in terms of the agent ID and num tickets div num agents, so that it’s parameterized in terms of these two values right here, and [inaudible] each ticket agent knows that he or she has to sell that many tickets as part of his or her function call and that’s it. Then return zero to satisfy the compiler. Okay. We’re not gonna allow anything to go wrong in this simple program. The implementation of sell tickets – it’s not gonna be rocket science. I’m gonna write it a little bit differently than I would traditionally write it because I’m paying it forward to the way we’re gonna change the example in a second. Void sell tickets, int agent ID, int num tickets to sell, and even though it’s a little weird, let me not use a for loop, let me use a while loop. While it’s the case that num tickets to sell is greater than zero – go ahead and do printf agent percent d sells a ticket. Agent ID, and then do num tickets to sell minus minus, finally, my arm’s tired, but I’m gonna keep on writing, printf agent percent d all done. Agent ID – this arm’s gonna be bigger than the other one by the end of lecture. Okay. There we go. From a code standpoint, it’s moronically simple. I’m not trying to revisit for loops and while loops. What I’m more interested in doing is figuring out why this, as a simulation, is really not all that good in the sense that it’s not really modeling what would truly happen.
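Written out, the sequential program dictated above might look roughly like this. It is a reconstruction from the spoken description, not the handout code: the exact message strings are guesses, the loop is wrapped in a helper named `RunSequentialSale` instead of `main`, and the functions return the number of lines printed, which is an addition made here so the behavior is easy to check.

```c
#include <stdio.h>

/* One agent sells numTicketsToSell tickets, one per loop iteration.
   Returns the number of lines printed (an addition for checking). */
int SellTickets(int agentID, int numTicketsToSell)
{
    int lines = 0;
    while (numTicketsToSell > 0) {
        printf("Agent %d sells a ticket\n", agentID);
        numTicketsToSell--;
        lines++;
    }
    printf("Agent %d: All done!\n", agentID);
    return lines + 1;
}

/* Agents run strictly one after another, as in the lecture's for loop.
   Returns the total number of lines printed across all agents. */
int RunSequentialSale(int numAgents, int numTickets)
{
    int totalLines = 0;
    for (int agent = 1; agent <= numAgents; agent++)
        totalLines += SellTickets(agent, numTickets / numAgents);
    return totalLines;
}
```

Calling `RunSequentialSale(10, 150)` prints every line for agent 1 before any line for agent 2, which is exactly the ordering the lecture goes on to criticize.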

The way this is set up, and I’m speaking as if this is new material, but it’s not, it’s clearly gonna be sequential and ticket agent one is gonna sell all of his or her 15 tickets before anything happens with ticket agent two. So you know that the print out of this would have 160 lines, 16 lines per ticket agent – okay, I’m sorry, yeah, 16 lines per ticket agent, but they would be sorted, all ticket agent number one followed by an all done comment, that’s the 16th. Okay. Does that make sense to people? Okay. I’m sorry, 165 because there’s 15 agents going on here – no, I’m sorry, there’s 10 agents so that’s 160. I’m just confused, but everything’s gonna be all about agent number one before it’s agent number two, before it’s agent number three, etcetera. I really don’t like that. Okay? This is a fairly compelling example where if we’re really trying to model the simulation of an actual airline ticketing room, you want to see all of these ticket agents running simultaneously and working, not competing, but working collaboratively to sell all 150 tickets at the same time. Now, what I’m gonna do is I’m gonna repeat the for loop, int agent is equal to one, agent less than or equal to num agents, agent ++. Here is how you set up the actual dogs at the racetrack. What I want to do is I want to create a name that’s unique to a particular ticket agent. I’m gonna do that by declaring this in-place buffer – you haven’t seen this function, it’s not the emphasis, but I might as well show it to you cause it’s in the handout. I’m gonna printf, not to the console, but to a character buffer, there’s a function called sprintf to do that. Rather than actually echoing the characters to the screen, I echo the characters in place to a character array and make sure it turns out to be a C string.

The place that functions as that console of sorts begins at the name address, okay, and as long as I don’t print more than 32 characters I won’t overrun the boundaries of this thing. This is the structure of what gets printed and then I will fill in agent. So for all intents and purposes, on the zeroth iteration or the first iteration of this thing, after that sprintf call is made, name contains – it goes from garbage to the C string, agent one thread. The reason I do that is because I want to call this function called thread new and the name actually serves as the name of the thread and it’s also helpful for debugging purposes should you be passing true to init thread package. And then you go ahead and you pass the address of the function that you’d like to execute in a single thread of execution. Okay. This right here is just an arbitrary function pointer. All of the arguments have to be four-byte arguments, that’s just a constraint of the system, so you can pass ints and floats and pointers, but you can’t throw in structs or characters or shorts, it’ll confuse the system. You have to tell it how many arguments are expected of this particular function. We’re gonna have scenarios where the thread functions don’t need to take any arguments. We’re gonna have a scenario like we have right here where sell tickets takes two of them. Beyond the two, you pass in the numbers that are of interest, so agent and num tickets divided by num agents. Now, this does not actually prompt sell tickets to start executing. All it does is it lines it up at a gate.
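The step of printing into a character buffer instead of the console can be sketched on its own. The helper name `MakeAgentName` is made up for illustration, and the sketch uses `snprintf`, the bounded relative of the `sprintf` call mentioned in the lecture, so that the buffer size is enforced rather than assumed. The call to the course's thread package is deliberately left out, since its exact prototype is not shown here.

```c
#include <stdio.h>

/* Format a per-agent thread name into buf as a C string.
   Returns the length of the full name, not counting the terminating '\0';
   a value >= bufSize means the name was truncated to fit. */
int MakeAgentName(char *buf, size_t bufSize, int agent)
{
    return snprintf(buf, bufSize, "Agent %d Thread", agent);
}
```

With a 32-character buffer, as in the lecture, the name for agent 1 becomes "Agent 1 Thread" and the buffer goes from garbage to a properly terminated string.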


Instructor (Jerry Cain):The prototype of thread new just requires a thread name right here. We have to do it because it’s [inaudible] but that’s a lame answer. You really do want something available to you to identify a particular thread that, for instance, is failing you during the debugging process, and if you have 15 of these things, does that make sense? I’m sorry, you have 10 of these things and then one of them is failing or one of them is never exiting, which is actually a common thing we’ll see in multi-threading. You want to be able to know which of the threads is not exiting so you can go and look at the particular implementation of that function. Okay.

Okay. So, procedurally, what happens up front is we say we’re using threads, and then you lay down gates one through 10, all of these things that are gonna follow the sell tickets recipe, that’s what they need to follow in order to run from the starting gate to the finish line, and then this basically sounds the bell, fires the gun. Okay. This function blocks until all of these threads actually finish and run to completion, and then when all 10 threads are finished, this returns and it passes on to what will be the end of the entire function.
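The same shape can be tried with POSIX threads, which is a different library from the course's thread package and is swapped in here only because it is widely available. One difference to keep in mind: `pthread_create` starts each thread right away rather than lining it up at a gate, and the loop of `pthread_join` calls plays the role of the call that blocks until every thread has finished. The names `struct agentArgs`, `SellTicketsThread`, and `RunThreadedSale` are invented for this sketch.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_AGENTS 10

struct agentArgs {
    int agentID;
    int numTicketsToSell;
    int sold;               /* written only by this agent's own thread */
};

/* Thread body: the same recipe every agent follows. */
static void *SellTicketsThread(void *p)
{
    struct agentArgs *args = p;
    while (args->numTicketsToSell > 0) {
        printf("Agent %d sells a ticket\n", args->agentID);
        args->numTicketsToSell--;
        args->sold++;
    }
    printf("Agent %d: All done!\n", args->agentID);
    return NULL;
}

/* Spawns one thread per agent, waits for all of them, and returns the
   total number of tickets sold, or -1 if a thread could not be created. */
int RunThreadedSale(int numTickets)
{
    pthread_t threads[NUM_AGENTS];
    struct agentArgs args[NUM_AGENTS];
    int started = 0, total = 0, failed = 0;

    for (int i = 0; i < NUM_AGENTS; i++) {
        args[i].agentID = i + 1;
        args[i].numTicketsToSell = numTickets / NUM_AGENTS;
        args[i].sold = 0;
        if (pthread_create(&threads[i], NULL, SellTicketsThread, &args[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {   /* block until every thread is done */
        pthread_join(threads[i], NULL);
        total += args[i].sold;
    }
    return failed ? -1 : total;
}
```

Each thread touches only its own `agentArgs` slot, so no shared data needs protecting in this version. On many systems the program has to be compiled with the `-pthread` flag.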

Student:If I wanted to do the same process somewhere else, do init thread package and run all threads go within the scope of that function?

Instructor (Jerry Cain):It actually does not have to be in main, it’s just most conveniently put there.

Student:[Inaudible] as well when [inaudible] everything out there?

Instructor (Jerry Cain):What has to happen is that before you call run all threads you have to call init thread package exactly once and you have to set up all the threads, whether it’s directly in main or through sub functions, to set up all of the dogs. Okay. Does that make sense? This actually fires the gun and tells all the threads to start running. It turns out that as threads execute, as part of their implementation, they can themselves call thread new. Does that make sense? The threads themselves can spawn their own child threads, okay, grandchildren threads, whatever you want. The ridiculous metaphor I have is that somehow while a dog is in the race it gives birth to three new babies and throws them back to the beginning of the gate, okay, and says, “Please run,” and sometimes there’s interesting concurrency issues that can come up with that type of thing, but I’ll actually get to that with a more advanced example probably Wednesday or Friday.


Instructor (Jerry Cain):This just basically says now we’re in thread mode. Okay. Once this has been called all threads that are ever created in the process, even if they’re children threads, just start executing immediately. Okay. So as far as this is concerned, I want to just invent a function right here. If random chance, 0.1, I want to call a thread sleep function and I’ll just pass in, say, 1,000. Okay. That thread sleep basically says that as part of execution if a thread is running and it executes the thread sleep function, it pulls itself off the processor for at least, in this case, a second; the number that’s passed as an argument is expressed in milliseconds. So this means that every time it flips a biased coin and it comes up heads, with probability of 10 percent or .1, rather, it’ll force it to halt. Now, that’s not the only way a thread will halt, but this is a way for you to programmatically tell a thread to stop running. Let’s forget about thread sleep. I shouldn’t have talked about that yet. What actually happens is that when you spawn off two or more threads, even technically one child thread, but two or more is when it’s interesting, run all threads establishes something of a heartbeat.
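The biased coin flip can be sketched with the standard library alone. The lecture says the function is invented on the spot, so the version below is just one plausible way to write it, using `rand`; the sleeping itself is left out because it belongs to the course's thread package.

```c
#include <stdlib.h>

/* Returns 1 with probability roughly p and 0 otherwise.
   rand() is not seeded here, so the sequence of results repeats
   from run to run unless the caller calls srand first. */
int RandomChance(double p)
{
    double r = rand() / (RAND_MAX + 1.0);   /* r lies in [0, 1) */
    return r < p;
}
```

A thread that checks `RandomChance(0.1)` on each trip through its loop would, on average, choose to sleep about one time in ten.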

In between the tap of every single finger some different function, usually in a round robin fashion, but not necessarily, gets the time slice that exists in between the two finger taps. Does that make sense? So it’s, like, agent one runs, agent two runs, agent three runs, agent four runs, in that round robin manner, okay, and however much progress they happen to make in that time slice is the progress they make. Now, if I don’t introduce any randomization it’s probably the case that on a real system it would execute the same exact number of assembly code instructions. Okay. Or very close to it so everyone would make exactly the same amount of partial progress with each time slice. Okay. Does that make sense? To make it a little bit more real world, we introduce some [inaudible] process here where things all of a sudden get a little bit random and maybe it’s the case that ticket agent one sells two tickets in his or her time slice and then gets pulled off the processor instead of actually being allowed to sell two more tickets. Maybe ticket agent two comes next and sells four tickets; maybe ticket agent three comes next and sells four tickets because this coin flip never comes up heads. Does that make sense to people? Yep.

Student:[Inaudible], which threads [inaudible]?

Instructor (Jerry Cain):Well, the one that’s executing. The one that actually calls it. Okay. I mean, it’s only written once, but the 10 dogs up there are actually each following this recipe, they each have their own little pointer into the assembly code that this compiled to. Okay. And if they happen to jump into this function right here then the thread that is actually stuck inside that function is pulled off the processor and it’s even pulled off what is called the ready queue and put on this thing called the blocked queue until this number of milliseconds, in terms of time, elapses. Does that make sense? Yep.

Student:[Inaudible] or run all threads, is there any control over how many [inaudible] cycles each one will get?

Instructor (Jerry Cain):Not in our system. Some very sophisticated thread libraries, I don’t want to say very sophisticated, a lot of thread libraries, they don’t always give you control over the amount of time that’s in a time slice. They do want that to be somewhat regular because they don’t want you to have unpredictable results, certainly not during the development process. Even though ours does not, some thread libraries, particularly the one in Java, which everybody will be the most familiar with by the end of next year when you take 108 and learn about the thread library there, let you attach priorities. There’s only three degrees of priorities in Java. I’m sorry, that’s not true – there’s 10 levels of priorities in Java where you can assign a priority of one to 10 and then, usually, it’s the case that they’re all sorted and all the threads with priority 10 execute and run to completion before anything with priority 9 is given time. Okay. But we don’t have any of that here. We really just want to think of all threads as equally likely to get the processor for a particular time slice in this round robin manner unless there are other things in place that actually block it from being able to make progress.

Okay. So think about this line as basically really not blocking at all or blocking for an arbitrary amount of time. Okay. What I want you to imagine here is what type of printout you might actually get in response to this thread implementation. You might get three printfs, agent one sells a ticket, it may happen like that. Maybe it sells three, maybe agent two comes next, maybe agent three sells five because that’s how much the time slice allows, maybe five is actually the most you’d actually see as the number of tickets that are sold. But you understand what I’m getting at here. Does that make sense? It would just keep on cycling through all of them. Maybe it is the case after 130 or so lines that, for whatever reason, agent seven all done gets printed, and then maybe it’s the case that agent eight sells a ticket, agent eight all done, and then eventually maybe it’s the case, for whatever reason, agent four is the last one to sell a ticket, and this is just representative of the type of output you might see from this. Now, there’s nothing interesting about this from a simulation standpoint because there’s really no compelling reason, from a performance standpoint, to use threading here except that you’re trying to emulate the real world a little bit more. There are situations where you want to go with threading for performance reasons; this isn’t one of them. I’m just trying to illustrate the thread package. Yep.

Student:If you didn’t do the thread sleep, would you see agent one sells a ticket, agent one sells a ticket, agent one sells a ticket or is it agent one sells a ticket, agent two sells a ticket?

Instructor (Jerry Cain):You would actually see – the thread library has no notion of what a while loop is, so it’s not like it detects that you jump back and uses that as a signal to pull it off the processor. Let’s say that the typical time slice is 100 milliseconds; however many tickets can get sold in 100 milliseconds is how many would be published. Sometimes it’s gonna be partial, maybe you’re gonna be halfway through the implementation of a call to printf, right, when it gets pulled off the processor, and then when it gets the processor back it continues through the partial execution of printf to complete it, return, and decrement the num tickets to sell count. Okay. Does that make sense?

Student:There’s no way to [inaudible] two processes that you want to run parallel, but you want them to switch back and forth faster than 100 milliseconds?

Instructor (Jerry Cain):You can. Well, [inaudible] you certainly do not have that. I have to think that there’s some thread packages out there that do allow you to control the time slice. It’s usually not that high priority. Usually you don’t introduce threading into a program to have control over the time slicing, you really just do it to have concurrency in the first place, let the thread manager figure out which thread is gonna make the most progress. In an ideal world, we actually don’t want to pull agent one off the processor at all if agents two through 10 are on extremely long phone conversations, and that doesn’t happen here, but in a real simulation, you might want agent one to keep on selling tickets if all the others are blocked from something else. Okay. This is just in place to illustrate the thread new and the run all the threads and the initial [inaudible] concept. Okay. Yep.

Student:[Inaudible] doesn’t [inaudible], it just sort of pulls that one off the queue and it keeps running?

Instructor (Jerry Cain):That’s right. And you have to think that this is actually being called by 10 different threads in this setup. It’s only written once. It’s, basically, like 10 copies of the same book, but it’s not even that. It’s actually 10 copies of the same webpage and the webpage itself is hosted on one machine. Okay. Does that make sense? There’s one copy of the code; 10 independent threads are following the same recipe. Okay.

Student:Well, and [inaudible] run threads directly after any initial thread package?

Instructor (Jerry Cain):Well, it has to be called – oh, I see what you’re saying. In other words, to set this up and maybe call it right there –


Instructor (Jerry Cain):– with the idea that these actually run immediately, I know what you’re doing. In our system, it wouldn’t work because this is a function that blocks until all threads have been completed. So what would happen is you would call init thread package, you would call run all threads, but there wouldn’t be any, so it would return, and then it would go on and spawn these threads that aren’t allowed to run because run all threads has already returned.


Instructor (Jerry Cain):Okay. This is just idiomatic. Do this, set up at least one thread to run to make sure that all the work that needs to get done gets done, but in this concurrent manner as opposed to this sequential manner, and then call that just to fire the gun. Okay. Yep.

Student:What happens to all sell tickets as that whole thread contacts [inaudible] thread [inaudible] command?

Instructor (Jerry Cain):It’s not inside a thread?


Instructor (Jerry Cain):It could work. The way that the thread library works, the main thread is also a thread, so as part of sequential execution, it’s really not sequential. It happens to still be in a thread, it just happens to be the main thread as opposed to one of these child threads that’s spawned off by what was reachable from main. Now, I have to say I’ve never tested that because I’ve never given an assignment or done an example where I exercised the edges of the thread library; I’ve just kind of gone with the way it was designed to be just so I can make progress. But you could try it when the assignment goes out and see what happens with it. You’ll never see a meaningful example from me that actually will realize that. Yep.

Student:You said the thread doesn’t know if it’s in a while loop; does it know if it’s in an instruction, like, does it know that [inaudible] and then the time went off, is it –

Instructor (Jerry Cain):Right. Right. That’s certainly the most interesting part of today’s lecture. Right now, you know enough about code generation, I’m hoping, because you’re gonna be tested on it in two days. What we say is that it’s not an atomic operation. It looks like it’s atomic because it’s written in one statement right here, but what happens is that this really corresponds to, just as it would for a local variable, probably three assembly code instructions. Does that make sense to people? So it’s gonna basically load num tickets to sell into a register, decrement it by one, and flush it back out. This is going to be a complexity that we start to solve in the last 20 minutes of lecture right here. So when it gets swapped off a processor, it could be right before the first instruction of the three that this compiles to. It could actually finish right after the third of the three, or it could be pulled off the processor 33 percent or 66.7 percent of the way through the code block that this compiles to. Does that make sense?

You feel every single stall time in sequence. If you use threading, and this is the best example of all of the ones that I’m gonna have, I think threading is really important. If you use threading and you spawn off 12 download from BBC server threads, all of them make enough progress, all of them try to open the connection, and because that’s considered a block at the kernel level, the thread is pulled off the processor. It’s a much more harsh version of thread sleep. But it sleeps for a meaningful reason, because it really can’t make any progress, and the thread realizes that and the thread manager realizes that, so it pulls it off the processor while it’s waiting for the connection to be established. Does that make sense? Well, imagine that all happening with 12 threads; all of those dead times that are associated with the network connection, they all align and overlap and pipeline in this way that really saves us a lot of time.
Does that make sense? Okay. Yep. Go ahead.

Student:The last lecture you mentioned about downloading strings in class; I was thinking if you only have download capacity, you only have so much speed you can download, so what size files can you download in a certain amount of time, how can you download more than that?

Instructor (Jerry Cain):You’re actually not. As far as the downloading is concerned, if you’re dealing with a uniprocessor, and you’re dealing with one processor with one RAM and one core, then you’re dealing, primarily, only with the ability to index one article at a time as the text comes through. So you’re right, you don’t save time for the actual pulling of the text and parsing of it, and updating your hash sets, but you really save the time with your network connections, and that’s what the huge win is. Okay. Does that make sense? Okay.

So what I want to do here is I want to complicate this problem a little bit, but complicate it in a meaningful way. In a real world simulation, it might be the case that you have two ticket agents, and you have to sell 10 more tickets, and if somebody’s stuck on the phone because they want to buy the ticket or not, the other ticket agent should be able to sell all nine or 10 tickets while the other is blocked with some time consuming customer. So what I’d rather do is, rather than actually instructing each particular thread to sell a pre-determined number of tickets, I’d rather grant them all access to the same shared integer, the master variable, that stores the number of remaining tickets and do something like this; int agent and int star num tickets, and I’ll put a P there. I’ll close it off for the moment. I’ll change this up here in a second. What I want to do is I want each agent to know what their badge number is, but I also want them to be able to go back to the main function and find the master copy of the number of tickets that are remaining. This is basically the equivalent of the one master copy of your checking account balance that every single ATM machine in the world is supposed to have atomic transactional access to. Does that make sense to people? Okay. Here is the main thread and its stack frame. All 10 other stack frames for the 10 other executing threads all have pointers to that one 150 inside, and that’s how they kind of keep tabs on how many tickets there are remaining to sell. Okay. Does that make sense? Now, the problem, and this is actually not even the full problem, but I’ll simplify the problem to make it seem like it’s easily solved, is that I, as ticket agent one, might come through and I might commit to that test and say, “Oh, wow, there is, in fact, one ticket left, that’s greater than zero, so I’m gonna commit to selling it.”

Make sense? And then boo-hoo, it gets swapped off the processor right after the curly [inaudible], but before anything associated with the num tickets minus minus. Okay. Make sense? So it gets swapped off the processor and thread number two comes in and executes the same test. “Oh, look, there’s one ticket left, I’m gonna sell it,” and it comes in and it commits to trying to sell it, but it gets swapped off the processor. Same thing for thread three, thread four; it could be this diabolical situation where everybody is really excited to sell the one remaining ticket. They don’t go back and recheck the test after they get the processor back, that’s not what is programmatically encoded, so they’re all gonna try and decrement this shared global, and so the one could, potentially, go down to negative nine. I don’t think this is why airlines overbook flights, okay? But you can understand the type of concurrency problem that exists here. They’re all depending on the same shared piece of data, and if they’re not careful in the way they manage the shared data, and if a thread is partway through the execution and it makes decisions based on information that will become immediately stale if it’s pulled off the processor, then the integrity of the global data can actually be mucked with and be compromised. So, at the very least, we want this all the way through that to, more or less, be executed in full. Okay. So basically what that top bracket and the bottom bracket do is kind of mark that thing right there as what’s called a critical region. It’s, like, once I enter this region, no one else is supposed to be in here while I’m doing surgery on that global variable. Does that make sense? Now, there’s nothing in the code that actually says, “Please, other threads, don’t come in here because I am”; there have to be some directives that are put in place to block out other threads.
This is the situation where you’re really glad that the bathroom door locks because if you’re in there, you don’t want them to have the privilege of just walking in because they’re running in their own little thread. You actually have to have a directive in place, this thing called a lock, I’m gonna frame it as a binary lock, I think for obvious reasons, because you only want one person in the bathroom or in the critical region, right here, at any one moment. Okay.

So what I want to do is I want to talk about the most common concurrency tool that’s in place to actually help delineate what is considered to be a critical region. It involves me introducing another variable type. I want to introduce something called a semaphore, and I’m gonna call it lock and I’m gonna set it equal to semaphore new. It takes two arguments; the first one I don’t care about, the second one is gonna be some integer. Now, I’m just introducing semaphore like it’s a word that you’re all familiar with. I know you probably know what semaphore means in a general sense, but in a programming sense what a semaphore really functions as is a non-negative integer, at least in our library it’s considered to be a non-negative integer, that as a data type has functionality that supports atomic plus plus and atomic minus minus. This, basically, sets this glorified integer equal to one, okay; the minus minus and the plus plus against this lock come in the form of two different functions. There’s a function called semaphore wait which, in this case, would take the lock variable. There’s also another function called semaphore signal, which also takes the semaphore. Now, those are functions that, behind the scenes, emulate minus minus and plus plus, but they just figure out, using special hardware or special instructions of the assembly code language, how to actually take the integer that’s wrapped by the semaphore, in this case, what’s initially a one, and provide atomic minus minus. Okay. So, in other words, this right here would be decremented to zero if this were called against it. This would promote it back up to one. The reason that wait is the verb here is because we’re gonna generalize a little bit. Think about the semaphore as tracking a resource.
In this case, there’s exactly one person allowed in the bathroom or there’s one person allowed into the critical region, okay, which is why that’s a one in the first place, and you acquire that resource or you wait for that resource to be available, and when you don’t need it anymore, you signal it or you somehow release the lock. There’s one key point that I forgot to make, which is that because the semaphore integers in our world are never allowed to go from non-negative to negative, there’s one special scenario that’s handled by semaphore wait. If semaphore wait is passed a semaphore that, at the moment it analyzes it, is surrounding a zero, it doesn’t decrement it to negative one; it’s not allowed to do that. That’s just the definition of what a semaphore is. If it detects that it’s a zero, it actually does what is called block, and it blocks on that semaphore.

It actually pulls itself off the processor because it knows that it’s obviously waiting, presumably for some other thread to signal that thing before it could ever pass through that semaphore wait [inaudible]. Does that make sense to people? Okay. Basically, if I’m jiggling the door for the bathroom, like we always do at restaurants to wonder whether somebody is really in there or not, okay, before you can really pass in there, you need someone else to release the lock, some other thread or some other agent, in the form of a semaphore signal call, before you really can go and open that door, and then you can lock it yourself. What I want to do is pass in three arguments to sell tickets. The reason I want to do that is because I want to tell the ticket agent what his or her ID is. I want to pass in the address of the shared resource, but I also want to pass in this thing I called lock. Now, the semaphore type is actually a pointer to an incomplete type. It’s not copied; it actually shares some kind of struct behind the scenes that tracks the integer inside of it. And then the prototype of this would be changed to take a semaphore, I’ll call it lock, and this is the implementation I want to go with. I’m gonna simplify it a little bit. I’m gonna say while true, I’m gonna semaphore wait for the lock. As a thread, I have no business following that pointer and looking at its value and comparing it, using it in any sense, even comparing it to zero, because as I advance through the execution, I can’t trust that that comparison is actually meaningful if, at any point during progression, I get swapped off the processor and other threads can go and muck with that shared variable. Does that make sense to people? Okay. So what I want to do is I want to wait on the locked bathroom door, and if I happen to be the one that first detects that it’s unlocked, I can go in and, in this atomic manner, actually do a decrement.
So as I detect that it’s been promoted from zero to one, I actually take it from one down to zero and actually pass through this semaphore wait call, then I can do this. If num tickets P is double equal to zero then I want to break, otherwise, I want to do this – I want to printf what I have right here to say that I sold a ticket, and then I want to semaphore signal the lock. The one thing I want to do here is that if, as a thread, I acquire the lock and I notice that there are no more tickets to be sold, when I break out I don’t want to forever hold the lock on the bathroom. Okay. If you can programmatically unlock the door from afar, you’re no longer in the critical region, but you still somehow manage to unlock the bathroom door. Now, there’s a couple of points I can make about this just to let it rest for you, because this is probably where I’m gonna leave things until Wednesday. I initialize the semaphore to one up there.

That basically functions as a true. It basically says that the resource is available to exactly one thread, and the first thread to get here actually does manage to, in an atomic way, take the one and do a minus minus on it down to zero. Because it actually committed to the minus minus, it returns – it executes this. It takes the zero back to a one. It may come back around and take the one back down to a zero, but it’s always, like, lock, unlock, lock, unlock, lock, and maybe it actually gets swapped off the processor right here. That would normally be dangerous except that it’s leaving the semaphore in a state where it surrounds a zero. Okay. So if some other threads get the processor, and they certainly will, then they come here and they, basically, are blocked by a zero semaphore. Does that make sense? Okay. Imagine a scenario where I accidentally – and this is actually the type of thing you have to be careful about because it’s so easy to type a zero versus a one when you’re typing a lot of them. If I do that right there, this creates a situation that you really have to be worried about when you’re dealing with concurrency and threads, which is that if I accidentally lock the bathroom door before anyone comes to the party, everybody’s gonna be blocked and no one is in a position to actually unlock it. At least not the way I’ve coded things up right here. Does that make sense? If I make the mistake of putting a zero up there, then every single thread will get this far and they’re all gonna be thinking that someone else is gonna be [inaudible] that semaphore, so all 10 of them are pulled off the processor and everybody’s just waiting. That is the case because of that one little bug that I put up there. Okay. Make sense? If I have the opposite error and I do that right there, from a programmatic standpoint, if it’s gonna be two, it might as well be 10. If you’re gonna let two people in the bathroom why not let all 10?
If you’re gonna actually let two people go into the critical region and muck with global data at the same time, then you have the potential for having two threads deal with a shared global variable in a way that they really can’t trust each other. Does that make sense to people?


Instructor (Jerry Cain):Okay. So there’s that. So the real answer here is that this, in this particular case, should be a one. Now, we will see situations where a zero is the right value. Okay. We will see situations where two or five or eight or 20 or 64 are the right values, but for this one scenario where I’m using a semaphore to basically limit access to what’s clearly identified as a critical region, okay, that is the common pattern for using a semaphore. Okay. Question right there?

Student:Do you have two signal locks?

Instructor (Jerry Cain):Two signal locks, oh, this one right here? This is the one that actually is there whenever I actually do do a decrement. Because I can break out of the loop right here – if I break out of the loop, I circumvent this final call right here, but other threads may be blocked on the semaphore right here. All they need to do is to verify, as well, that there are no tickets left, but you still have to allow them to progress as they get there so they as threads can also exit. Okay. Does that make sense? Okay. Yep.

Student:Is there something stronger than a semaphore that actually won’t let the thread get pulled if you have something time sensitive?

Instructor (Jerry Cain):Actually, just priorities is really it. Even then, it’s probably up to the thread manager as to whether or not – what would probably happen is a really sophisticated thread manager might actually know behind the scenes, before it even grants the thread the processor, that there’s only one thread with that priority. So it might actually have – and I don’t know that this is the case and I’m just speaking in terms of implementation details – it might say, “Okay, that’s the only one of that high priority, so unless we see a spawn of a thread of equal priority or higher priority, we’re just gonna let it run until it actually blocks itself,” in which case, we don’t have any choice. I don’t know that many systems do that. It’s technically possible to do it. Okay. So we’ll have more examples come Wednesday, but I just wanted to make sure that you all got this. Have a good night.

[End of Audio]


Lecture 15: Programming Paradigms

Handout 13: Thread Package Docs
5 pages

Handout 14: Concurrency Examples
16 pages

Topics: Review of Semaphore Syntax, SemaphoreSignal and SemaphoreWait, Semaphore Usage in the Multithreaded SellTickets Function (Protecting a Critical Region), Example of a Race Condition Where Two Ticket Agents Sell the Same Ticket, How the Stack and Various Registers are Saved When the Currently Running Thread Is Swapped, Another Example Using Semaphores that Models the Internet, Implementations of a Reader and Writer Thread, Potential Dangers When the Two Threads Run Without Protection, Using a FullBuffers Semaphore and an EmptyBuffers Semaphore To Ensure that Neither Thread Outpaces the Other, Different Semaphore Patterns - Binary Lock Vs. Rendezvous, Effect of Changing the Starting Values of the EmptyBuffers and FullBuffers Semaphores, How To Detect Deadlock, Changes in the Thread Synchronization When Using Multiple Readers and Writers, Dining Philosopher Problem - Modeling Each Philosopher as a Thread, How Deadlock Can Result, How the Deadlock Can be Eliminated by Limiting the Number of Philosophers that Can Eat at Once



Instructor (Jerry Cain):Hello, welcome. I don’t have any handouts for you today. You have plenty of handouts from Monday that we still have to spend the next few lectures on. You’re not getting an assignment today, so you have this grace period from tonight at 10:00 p.m. until Friday, where you have no responsibility for 107 whatsoever. Remember, the exam is this evening at 7:00. It’s in Hewlett 200, which is this huge auditorium in the building across the – beyond the fountain from Gates. I’m going to send an email out after lecture, but just in case SCPD students are watching this before the TV exam tonight, I’m planning on posting the exam as a handout at 7:01 p.m. tonight, and then remote students just download it, self-administer, call in if they have questions and then fax it in when they’re done. They don’t need a proctor. I don’t need any of that business. I just assume people are well suited to just sit by themselves and take an exam without somebody of authority hanging over their shoulder. And then, SCPD students actually have the option to take it tomorrow morning as well, and I actually prefer that SCPD students take it then, because if there’s a disaster during the exam tonight, people in the room can be dealt with immediately, whereas it’s very difficult to probably get that information outward. So, I actually prefer SCPD students to take it tomorrow and then fax it in sometime before 5:00 tomorrow so we can grade them.

Okay, I’m going to try to get the graded exams back to you and available by Sunday evening. I can’t promise that. Okay, I actually have not dealt with a class this large in a long time, so we’re dealing with – I know it looks like it’s this cozy little family here, but it’s not. It’s actually 230 some people and it’s been a while since I have had to manage a grading effort that involved that many people. It’s also complicated by the fact that I am out of town this weekend, so my CAs are grading it and it might be difficult for them to get you the exams back by Sunday evening, but we’ll do our best to make sure that that happens, okay? When I left you last time, I had focused specifically on the first multi-threaded example where we had to introduce this notion of a semaphore in order to control access to what we called a critical region. So, if you remember last time, the threaded function, the one that’s the recipe that ten different dogs follow while they’re trying to get their work done, looked like this. Sell tickets took an int, agent, it took an int star called num tickets P, and then it also took this semaphore I call lock. Just to review, since this was kind of a fleeting comment in the last ten minutes of Monday’s lecture, the semaphore is more or less like a – basically let’s say a synchronized counter variable that is always greater than or equal to zero, okay? And so if I construct a semaphore around the number one, and I levy a semaphore wait call against the semaphore, this as a function figures out how to atomically reduce this one to a zero. Okay, and so this is basically equivalent to the minus minus, but it does the minus minus in such a way that it actually fully commits to the demotion of the number to one that’s one lower than it, okay? Semaphore signal on the same exact semaphore would actually bring this back up to a one, okay?
If this were followed by a semaphore wait call followed by another semaphore wait call, then something more interesting happens, where this one right here decrements the one down to a zero. This one right here would have a very hard time.

Because semaphores, at least in our library – this isn’t the case in all systems – are not allowed to go negative. So, when you do a semaphore wait against a zero variable, then this thread actually says oh, I can’t decrement that, at least not now. I need somebody else in another thread to actually plus plus this so that I can actually pass through a minus minus without making the number negative. Does that make sense to people? Okay, so programmatically, the implementation of semaphore wait is in touch with the thread library, and so when it’s attached to a zero behind the scenes, it immediately says ok, I can’t make any progress right now. It pulls itself off the processor. It records itself as something that’s called blocked, and it puts itself in this queue of threads that are not allowed to make progress until some other thread signals a semaphore they’re waiting on, okay? Does that sit well with everybody? Okay, this is not constrained to go between one and zero. This can be set to either be zero or one or five or ten. The only example we’ve seen so far is where the semaphore that’s coming in is initialized to surround the one because we really want it to function not so much as a one as we want it to function as a true, and it’s basically a light switch that goes on and off, on and off, and on and off. And we use semaphore wait and semaphore signal against that semaphore to protect access to the num tickets variable that we have the address of.

So, in a nutshell, while it’s the case that true is true, I want this right here to be marked as a critical region. What that means is that I want to be able to do surgery on that num tickets variable without anyone else bothering it, okay? And the way you do that is to do a semaphore. I’ll spell it out here. Semaphore wait on the lock, you do the check to see whether or not num tickets P is equal to zero, and if so, you don’t do the surgery. Okay, in the end, somebody else has done it a hundred times already and there is no reason to do it again. Otherwise, you want to go through and do this, that’s true surgery on what functions as a global variable, at least from the perspective of all the threads running this, and then you want to release the lock. You might do some printing here, okay? You might sleep for a little bit. Down here there was an extra semaphore signal call against the lock to accommodate the scenario where the person who breaks out of the while loop does so after securing the lock, so they actually have to release the lock kind of as they go outside the bathroom window, as in the analogy I used on Monday, okay?

This is considered to be – I’m sorry, this right here is considered to be a critical region. It’s supposed to be something where, when a thread is inside there, it cannot have any other threads during their time slices mucking around with this type of stuff, okay? As an arbitrary thread actually gets here, given that this thing was initialized to one, and because every single semaphore wait call is balanced by a semaphore signal, it’s going to toggle up and down between zero and one. When a thread gets here, there is one of two scenarios. It’s staring at a one and so it actually successfully does the minus minus and is allowed to pass in here and do the work, or the thread blocks on this. You may say, well, how would that happen? A thread could potentially block on this if there’s a zero, if another thread saw a one here, decremented it to a zero, and made partial progress through this, but the time slice ended before it got to the signal call. Does that make sense to people? Yes? No? Okay, just because a thread owns a lock doesn’t mean that it can’t be pulled off the processor. It might acquire the lock; get two-thirds of the way through this final instruction here, okay? And then be pulled off the processor so that other threads can actually say, oh maybe I can make some progress, but if they get this far, they’re still seeing a zero because the thread that owns the lock hasn’t released it yet.

Okay, so as other threads hit this semaphore wait call and it’s surrounding a zero, they are pulled off the processor. That is kind of what you want. If they can’t do any meaningful work, you want the thread manager to say you can’t do any meaningful work. I’m not going to let you even use your full time slice, and eventually it’ll get back to the only thread that can do work, which in particular would be the one that owns the lock right here, okay? Does that make sense? Okay, so there’s that right there. Typically, try to keep the critical regions as small as possible. If you’re going to lock down access to code, you don’t want to make it arbitrarily long. That’s basically like saying I want to do all the work in the world, so I’m just going to acquire a lock and I’m going to run this triple for loop. Okay, you only do that if you have to because it’s a critical region. This printf in particular, if it’s just logging information, it might not be imperative that you actually print to the console while you hold the lock, so you could release the lock and let other people make some progress, and then without holding the lock, just go ahead and print to the screen, okay? Does that make sense? Okay, there are a couple of other things. Somebody asked a very good question at the end of lecture on Monday and I think I want to go over it. Some people were concerned with the case where num tickets p minus minus takes a one down to a zero, I don’t mean the lock, I mean the actual number of tickets, and they thought that was the one problem we were worrying about.
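One way to picture the advice about small critical regions: take a snapshot of the shared value while holding the lock, release, and only then do the logging. The sketch below is a Python illustration under that assumption. The name `sell_with_logging` and the `log` list are hypothetical, and appending to a list stands in for printing to the console.

```python
import threading

def sell_with_logging(num_agents, num_tickets):
    """Return (tickets_left, log); log entries are (agent_id, remaining) pairs."""
    tickets = [num_tickets]
    lock = threading.Semaphore(1)
    log = []                               # stand-in for console output

    def agent(agent_id):
        while True:
            lock.acquire()
            if tickets[0] == 0:
                lock.release()
                break
            tickets[0] -= 1
            remaining = tickets[0]         # snapshot taken while the lock is held
            lock.release()                 # critical region ends here
            # Logging happens outside the lock, so other agents can proceed.
            log.append((agent_id, remaining))

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tickets[0], log
```

The log lines may come out in a different order than the sales happened, which is the price of not holding the lock while logging.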

The answer is that’s not the case, and I can actually tell you a little bit more about what happens as threads get swapped off the processor and where all of their data gets stored, and show you that if the number of tickets is originally 100, there is as much of a race condition, without the semaphore wait and semaphore signal calls, in the minus minus bringing the 100 down to a 99 as there is in bringing a one down to a zero. So, this is what would happen, and this is going to be the most important line to concern ourselves with. Forget about the fact that there are ten or 15 ticket agents. Just think about the scenario where there’s two. Okay, it is subdivided and this is the main stack frame. Okay, it’s the thing that sets up the two threads and calls run all threads. Let’s say that this is ticket agent one and this is ticket agent two. Okay, these little like tornadoes are actually stack frames, okay, that each of the threads has. So, each of these things right here have their own activation records for their own call to sell tickets, okay? Declared somewhere in main is the number of tickets variable that we’ll set equal to a hundred. That make sense? Each of these stack frames stores a pointer to that one hundred, okay? Imagine the scenario where the semaphore wait and semaphore signal are not there. This is how the two ticket agents could each sell one ticket even though it’s only globally reported as a single sale, as opposed to the two tickets that were really sold. Coming down from here, la la la la, you see that it’s not equal to zero, so you go in and try to sell a ticket. You know that this type of instruction actually expands to quite a number of assembly code instructions. Okay, does that make sense to people? So, in a local register set, this R1 may be set to point to that right there, and R2 may be set to 100, because you do R2 is equal to M of R1.

Okay, then maybe you go in and you do a minus minus on R2, and bring it down to a 99, but now you’re swapped off the processor. So, you’ve actually committed to the sale of a ticket, right? But you’re swapped off the processor. What happens is that the entire register state, all 32 registers, including the PC and the SP and the RV registers, the binary state of all those registers, is actually copied to the bottom of the little stack frame of the thread that’s being swapped out. Okay, and it’s that image that I just drew that is used to restore the register set for that thread when it gets the processor back, okay? Embedded inside this image is the 99 that’s going to be used to flush back to this space right here. Does that make sense? Okay, but this has just left the processor and so this thing does exactly the same thing with the register set as if it owns it. It sets R1 equal to that right there. The 100 still resides there because we didn’t successfully flush back. We didn’t get to the point where we actually update it to be the decremented value. So, R2 gets set equal to 100, then gets set equal to a 99. This one is prepared to flush a 99 back to the global space, and when this thing gets the processor back, it’s going to flush a 99 back to the same space. So, this 99 is designed to overwrite a 100. So is this one, but unless you have semaphore wait and semaphore signal in place the way we do right here, one of the 99s is going to overwrite a 100 and one of the 99s is going to overwrite the other thread’s 99. Does that make sense to people the way I’m saying that?
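The interleaving just described can be replayed deterministically in a single thread. The sketch below is a Python illustration, not actual assembly: the local variables `agent1_r2` and `agent2_r2` play the role of each thread's saved copy of register R2, and the dictionary `memory` plays the role of the shared integer. All of these names are invented for the example.

```python
def lost_update_demo():
    """Replay the load / decrement / store interleaving without any locking."""
    memory = {"num_tickets": 100}        # the one shared integer

    # Agent 1 loads and decrements in its registers, then is swapped out.
    agent1_r2 = memory["num_tickets"]    # R2 = M[R1]   -> 100
    agent1_r2 -= 1                       # R2 = R2 - 1  -> 99, not yet stored

    # Agent 2 gets the processor and still sees 100 in memory.
    agent2_r2 = memory["num_tickets"]    # R2 = M[R1]   -> 100
    agent2_r2 -= 1                       # R2 = R2 - 1  -> 99, not yet stored

    # Each agent eventually flushes its own 99 back to the shared integer.
    memory["num_tickets"] = agent1_r2    # 99 overwrites 100
    memory["num_tickets"] = agent2_r2    # 99 overwrites the other agent's 99

    return memory["num_tickets"]
```

Two tickets were sold, but the function returns 99 rather than 98: one decrement has been lost, and it has nothing to do with the count being close to zero.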

Yes? No? You gotta nod your head, okay. If you put that there, and you put that there to balance it, and basically unlock the door, then only one thread is allowed to go through and actually pull the global value into the local register, decrement it locally, and then flush it back out to the global integer, the thing that functions as a global integer, because everybody has a pointer to the same integer, before any other thread is allowed to do any part of that, okay? So that is what I mean when I say that this is more or less committed to atomically, okay? Now, that is the overarching principle that is in place when you have threads, and in particular, you have threads accessing shared information, okay? This is the programmatic equivalent of the two Wells Fargo ATM machines where me and my best friend try to take out the last remaining $100.00 in my account at the same time, thinking we’re going to get $200.00. Okay, make sense? Okay, if I were to initialize this semaphore to zero, then I would actually block all threads from entering this critical region right here. Okay, so I’d get deadlock. If I initialize the semaphore not to one but to two, that’s as bad in principle as initializing it to ten, because you don’t want any more than one thread in this region at any one time, okay. Does that sit well with everybody? Okay, good.
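The effect of the initial value can be checked with a small Python sketch. It counts how many back-to-back waits get through when nobody ever signals. The helper name `admitted_without_signal` is made up, and the non-blocking `acquire(blocking=False)` is a Python convenience used here only so the demo returns instead of hanging. It is not assumed to exist in the course's thread package.

```python
import threading

def admitted_without_signal(initial_value, attempts):
    """Count how many of `attempts` waits succeed when no signal is ever sent."""
    sem = threading.Semaphore(initial_value)
    admitted = 0
    for _ in range(attempts):
        # A non-blocking wait returns False where a real wait would block.
        if sem.acquire(blocking=False):
            admitted += 1
    return admitted
```

With an initial value of 0 nobody gets in, with 1 exactly one caller gets in, and with 2 a second caller is let into the region alongside the first, which is exactly what a critical region must rule out.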

Okay, so let me move on and give you another example using threads and a different way to use semaphores. The handout actually uses global variables more than I like to, but in this next example, I am going to use globals just so the code matches up a little bit more cleanly with the handout version. I actually like it better if you declare all of the shared variables in main and pass addresses to them, to the threads, because at least everything has a scope to it, whereas with globals, it’s a free for all, and for two and a half quarters, we have been saying globals are awful. Oh, except for when it’s convenient, okay, and I don’t like that. But this next one, I want to frame it in terms of globals. I’m trying to model right now the Internet, where in all the world there’s only one server that serves up all the web pages and you have the only other computer with the only browser in the world. Okay, I know you know enough about the HTML server process. You may not know all the mechanics at the low level, but fundamentally, you know that you request a web page from another server. It serves up text in the form of HTML normally. It could be XML, but normally it’s HTML, and as the HTML comes over, it does a partial and eventually a full rendering of the HTML in your browser page. That make sense?

You know and you’ve felt this before, where a page is loaded like 70 percent but it’s not quite done yet. You see the progress bar at the bottom, the bottom right, where it’s like three-fourths of the way through and you know there’s more to come. That’s usually because the server has only delivered 75 percent of the content, and so this thing has to block. In much the same way that the threads up there block, this has to block and stop its rendering process until it gets more data from the server. Does that make sense? So, just use that as a guiding principle for this example. I’m going to insult the entire Internet and I’m going to reduce it to a character buffer of size eight, okay? And what I want to do is I want to write a program that simulates the writing and the reading process, and I’m just going to reduce the server to something that has to populate that buffer as a ring buffer. In other words, it’s going to write 40 bytes of information, but it’s going to cycle through the same array five times, and I’m going to write another thread that consumes the information by cycling through the same array five times and digesting all the characters that are written there. Does that make sense to people? Okay, so the main function – I don’t care about those. I have to init all threads, I’m sorry, that’s not right. It’s the hybrid of two functions. I want to init thread package of [inaudible], meaning I don’t care about the debug information, and then I want to do this. I want to call thread new twice. I’m going to give them both names. This one is going to be called Writer. This one’s going to be called Reader. Okay, I’m going to call the function writer and I’m going to call the function reader. I only have one instance of each one and neither one of them takes any arguments. Okay, it doesn’t need to take arguments if you use global variables. Then, I do this. Run all threads. This is somewhat pathetic, but well intentioned – it is trying to emulate the fact that the server and the client as computers are both running at the same time.

Okay, programmatically I want the writer thread to cycle through and write 40 characters to the Internet. Okay, I want this reader thread to consume the 40 characters that are written in the Internet. Okay, this is what the writer function looks like at the moment. For int i is equal to zero, i less than 40, i plus plus, da da da, what I want to do is with each iteration, I want to call some function. I’ll assume it’s thread safe. Prepare random char, and then I want to write to buffer of i mod eight, whatever, let’s give that variable a name, c, whatever c happened to become bound to. And so, as a function in isolation, I think you can look at this and understand that it’s going to write down random characters, in this loop, over the buffer five times. Okay, I write this with hopes that it writes data down before the reader consumes it but it doesn’t go so far that it clobbers data that has yet to be read. Does that make sense to people? Okay, let me write the reader, which has the same exact structure: i less than 40, i plus plus, there you have that. What I want to do is I want to basically do this and then I want to basically like, you know, process char, where I don’t care about the details of what process does. This is the consumption line. This is the thing that takes a meaningful piece of data in shared space and brings it into local space, so it kind of owns it. It can do whatever it wants to with it. Okay, let me draw the Internet. That was easy. There you have it. Now, you know, without concurrency, you know exactly how you want writer and reader to behave, so that everything is preserved and the data integrity is respected, and that the reader processes all the character data in the order that the writer writes it.

So, think about the scenario where the writer gets to run first, and in its first several time slices, it writes those three characters down. Okay, and internally it has a variable i that is associated with that index, so that’s where it’ll carry on next time. Okay, but it writes those three characters down. And then the reader gets a time slice, and for whatever reason, process char is a little bit more time consuming. It actually has to like open a file or a network connection or whatever it has to do, just pretend that it actually is slower – its [inaudible] is slower, so it only really consumes that A. It doesn’t really remove the A, but it just consumes it so it doesn’t matter what’s there anymore. Okay, so this is where the writer will pick up and this is where the reader gets swapped off, okay? I think it’s pretty clear that if the writer is able to make more progress per time slice than the reader, then there’s the danger that this might happen, and that on the very next iteration, it gets far enough in its time slice that it overwrites data that the reader has yet to deal with. Does that make sense? Okay, you can’t have that, obviously. Now, clearly, I’m simplifying things here, but the idea that someone is providing content and someone else is digesting it, that’s not an unfamiliar one with large systems. The other problem is also in theory possible. Just because I spawn off and set up writer to be the first thing that runs and this to be the second thing that runs, it might be the case that the reader gets the processor first, in which case it will be digesting information that has never even been written or created. Does that make sense?
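To make the overwrite hazard concrete, here is a single-threaded Python model of an unsynchronized writer that runs for a while before the reader gets any time at all. It is a sketch under that scheduling assumption. The function name `unsynchronized_overrun` and the `unread` bookkeeping list are hypothetical and exist only to count the damage.

```python
def unsynchronized_overrun(buffer_size, writes_before_reader_runs):
    """Count slots overwritten before the reader ever had a chance to read them."""
    buffer = [None] * buffer_size
    unread = [False] * buffer_size       # True once a slot holds unconsumed data
    clobbered = 0
    for i in range(writes_before_reader_runs):
        slot = i % buffer_size           # ring buffer indexing, as in buffer[i % 8]
        if unread[slot]:
            clobbered += 1               # this write destroys data nobody consumed
        buffer[slot] = chr(ord("a") + i % 26)
        unread[slot] = True
    return clobbered
```

Eight writes into an eight-slot buffer are harmless, but the ninth and tenth land on slots zero and one again and destroy two characters the reader never saw.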

So, what I want to do. I want to create Internet so I can put some more global variables here. I have to make sure that the writer never gets so far ahead that it’s clobbering data that has yet to be consumed. I also have to make sure that the reader never catches up to or passes the writer and consumes information that programmatically isn’t really there. Does that make sense? Okay, so what I could do is I could introduce two global integers and have semaphores that lock them down, but I’m actually going to use semaphores a little bit differently. I’m going to declare two semaphores here. I’m going to call one empty buffers and I’m going to call one full buffers. And I’m going to let them actually manage integers that are always, almost always, but we’re going to pretend always, in sync with the number of slots that can be written to and the number of slots that can be read from. Okay, I also want to enforce that the writer is always just a little bit ahead of the reader in terms of thread progress, and that the reader can catch up to the writer but can’t pass them, and that the writer can’t get so far ahead of the reader that he actually is more than a cycle ahead of him. That make sense?

Okay, so what I want to do, I’m not going to do the semaphore new call. I’m just going to say that this is going to be initialized to eight as a semaphore. That is not the syntax for it, but that’s conceptually what I want to happen. Okay, I want to mention up front that there are absolutely no full buffers whatsoever. Okay, make sense? I’m going to change this function right here to do this. Now, this is a slightly different pattern with the semaphores, but I think it’s really fun. Before I go ahead and write to this buffer, I better make sure that I’m allowed to do that, okay? What I’m going to do is I’m going to semaphore wait on empty buffers. Now, initially, empty buffers is equal to eight, which is consistent with the fact that we don’t care if the writer makes a lot of initial progress. Okay, but if for whatever reason the writer makes so much more progress than the reader that he gets really far ahead, this eight will have been demoted to a seven, to a six, to a four, to a two, to a one, and it really is just about to clobber data that has yet to be consumed. It will be waiting on something that will have been demoted so many times that it’s actually zero, okay? So, it will be a victim of its own aggression and it will be blocked out and be pulled off the processor, so that the reader can actually do some work. Okay, the balance here is a semaphore signal call, but it’s not against the same semaphore. After you write something down, you want to communicate to the reader that there is one more piece of information that it’s allowed to consume. So, I’m going to wait for something to be empty. I’m going to change it from empty to full and I’m going to signal the full buffers semaphore, okay? The pattern over here is somewhat symmetric.

Let me rewrite it, in that I want to do the same thing, semaphore wait, but I want to wait for there to be a full buffer. When I know that there’s at least one and I get past that semaphore wait call, I can consume the character that is in global space and pull it down, and then after I bring it to local space, I can immediately tell the writer that it’s okay to write there if they’re waiting, and then process char, passing the c. Okay, there’s that. Whoops, so it’s like each thread has a little buzzer. Each of them is twittering the other as far as when they’re allowed to proceed to read or write information. Does that make sense? This right here is sending a little buzz that allows that to execute and return with much more likelihood. This right here is really communicating to the thread at that point and promoting empty buffers so that the writer can actually write down more data, if it was previously blocked. Does that make sense? Okay, so think about what happens now. Empty buffers is eight, full buffers is zero. That means the writer has all of this free space to write to. It’s going to have a very easy time passing through the semaphore wait call initially. Full buffers is zero. The reader thread is bumming because the very first thing it has to do is semaphore wait on something that is set to zero. So, imagine the scenario where the reader actually gets the processor first. It’s going to execute this much. It’s going to declare i, it’s going to set it to zero. It’s going to pass the test. It’s going to come down here and it’s going to be immediately blocked at this line right here because it’s going to be waiting on something that is in fact zero.
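Putting the writer and the reader together, the two-semaphore handshake might look like the following in Python. This is a sketch, not the handout's C code: `threading.Semaphore` replaces the course package, a deterministic character sequence replaces the random character preparation, and the names `run_internet`, `written`, and `consumed` are invented so that the result can be checked.

```python
import threading

def run_internet(total_chars=40, buffer_size=8):
    """Run one writer and one reader over a ring buffer; return both sequences."""
    buffer = [None] * buffer_size
    empty_buffers = threading.Semaphore(buffer_size)   # slots the writer may fill
    full_buffers = threading.Semaphore(0)               # slots the reader may consume
    written, consumed = [], []

    def writer():
        for i in range(total_chars):
            c = chr(ord("a") + i % 26)       # stand-in for preparing a random char
            empty_buffers.acquire()          # wait until some slot is empty
            buffer[i % buffer_size] = c
            full_buffers.release()           # tell the reader one more slot is full
            written.append(c)

    def reader():
        for i in range(total_chars):
            full_buffers.acquire()           # wait until some slot is full
            c = buffer[i % buffer_size]      # pull the char into local space
            empty_buffers.release()          # tell the writer the slot is free again
            consumed.append(c)               # stand-in for processing the char

    # The reader is started first on purpose: it simply blocks until data exists.
    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written, consumed
```

Whatever order the scheduler picks, the reader should see every character exactly once and in the order it was written.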

Okay, so the reader thread is actually being blocked right up front, just like we want it to be. Okay, the other scenario is that the writer thread is really fast and very efficient, and it actually cycles through this thing eight times and then it hits a wall. Okay, so it prepared the character before it actually went to bother waiting on the lock, but when it blocks here, it’s because it’s been a processor hog and it’s actually done a lot of work, whereas the reader hasn’t really been able to do much at all. Okay, or at least comparatively. That make sense, people? Okay, so in the ticket agents example, where it uses a semaphore wait and a balancing semaphore signal on exactly the same semaphore, and it brackets this thing called a critical region, that semaphore is being used as a binary lock. Okay, binary meaning it’s toggling between zero and one, true and false, however we want to think about it. That’s not the pattern that’s being used here. We certainly have thread communication. We use the semaphore for rudimentary thread communication, okay, but right here what’s happening is we’re actually using these as basically two telephone calls. Okay, between the two threads, okay, this one calls this one whenever it can make more progress. This one calls this one whenever the writer can make more progress.

That is a pattern; it’s what’s called a rendezvous pattern. Like I’m syncing up with you, that kind of thing, okay? There are more complicated examples of this. This is what’s called binary rendezvous, which really just means one thread to one thread communication. This basically says that this type of semaphore wait call means I cannot make any progress until some other thread makes some required amount of progress in order for me to move forward. This thing does the same thing on behalf of this semaphore. This says that I have to wait for some other thread to make enough progress in order for me to pass, okay, or else the work I will be doing will be meaningless, okay? Make sense? Okay, so what I want to do is I want to just experiment. What happens if I make that a four? It doesn’t change the correctness of the code, or it depends on how you define correctness, but you will not get deadlock, okay? And you will not have any data integrity issues. All you’re constraining is that the writer and reader stay within a tighter delta of one another than they would have been able to otherwise. When it was an eight, it allowed some more degrees of freedom. It allowed the writer to go much further ahead if that’s just the way thread scheduling worked out. When I made it a four, it just means that the writer can be no more than half of an Internet ahead of the reader, okay? Does that make sense? If I do that right there, make it a one, I’m really pushing the limits of what’s useful from a threading standpoint. If I’m going to do that, I might as well just write the reading and writing in the same function and have it alternate between read and write, but if I really let these two threads run with those two initial values, all that’s going to mean – this is my W finger, this is my read finger. It just means it’s going to run like this. Okay, does that make sense to people? And really, if it tries to like run forward two slots, it’ll be blocked by a semaphore wait call.
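The experiment with a four or a one in place of the eight can be tried in Python. The sketch below reruns the handshake with a configurable initial value for the empty-buffers semaphore and records the largest lead the writer was ever observed to have. The names `run_with_limit`, `progress`, and `max_lead` are hypothetical, and the lead is measured from the writer's side only, so it is an observation rather than an exact trace.

```python
import threading

def run_with_limit(empty_init, total=40, buffer_size=8):
    """Return (consumed, max_lead) for a given initial empty-buffers value."""
    buffer = [None] * buffer_size
    empty_buffers = threading.Semaphore(empty_init)
    full_buffers = threading.Semaphore(0)
    progress = {"written": 0, "read": 0}   # each key is updated by one thread only
    max_lead = [0]
    consumed = []

    def writer():
        for i in range(total):
            empty_buffers.acquire()
            buffer[i % buffer_size] = i
            progress["written"] += 1
            lead = progress["written"] - progress["read"]
            max_lead[0] = max(max_lead[0], lead)
            full_buffers.release()

    def reader():
        for i in range(total):
            full_buffers.acquire()
            consumed.append(buffer[i % buffer_size])
            progress["read"] += 1          # counted before the slot is handed back
            empty_buffers.release()

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, max_lead[0]
```

For any initial value from one to eight the data comes through intact. What changes is how far ahead the writer is allowed to run, which never exceeds the initial value.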

Okay? If I do this, I have a different form of deadlock, but deadlock is deadlock. I have a reader saying I can’t do anything because I have no place to read from. The writer says, well, I can’t do anything because I have no place to write to. Okay, so you would have deadlock. You look at that and you say I would never do that. Yes, you would. When you’re writing down all of the semaphore values, maybe you have like 20 semaphores in a real program, it’s very easy for you to cut and paste a zero in a place where you really wanted a one or a four or an eight, okay? So, if you have deadlock and you’ve never had that before, maybe you think you have because you’ve been in some while true loop, but that’s not the same thing. There you really are making progress, you just don’t see it. With threads, if you have deadlock, everything seemingly stops. You get nothing published to the console at all. It doesn’t return. You don’t get your command line prompt back, so things just hang, and then you go, okay, that probably means that two threads are waiting on each other.
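The both-zero mistake can be demonstrated without actually hanging the program. In the Python sketch below, each thread waits with a timeout and reports whether it ever got through. The timeout exists purely so the demo returns. A real deadlock has no timeout and simply never comes back. The name `both_zero_deadlocks` is invented for this example.

```python
import threading

def both_zero_deadlocks(timeout=0.2):
    """Show that with both semaphores at zero, neither thread can take a step."""
    empty_buffers = threading.Semaphore(0)   # the mistake: should have been 8
    full_buffers = threading.Semaphore(0)
    results = {}

    def writer():
        # False means the wait gave up; without the timeout it would block forever.
        results["writer"] = empty_buffers.acquire(timeout=timeout)

    def reader():
        results["reader"] = full_buffers.acquire(timeout=timeout)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both entries come back `False`: the writer is waiting for a signal only the reader can send, and the reader is waiting for a signal only the writer can send.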

Okay, or that nobody released a lock or something like that. Okay, if I do this, just think about whether that’s damaging or not. You may think that initially [inaudible] should be more than full buffers. Let me do this. You could say, well, I just want to kind of constrain full buffers plus empty buffers to always be eight. Okay? But if you do that, that actually allows the reader thread to get one hop ahead of the writer. Okay, so that’s a kind of contrived example, but nonetheless that’s exactly what it will be permitted to do. It doesn’t mean it would actually happen, but it means programmatically it’s possible. Another scenario is when I get this one right, but I do something like that. Okay, you may think that you’re limiting things because you have semaphores in both directions, but that thing has to be between one and eight for it to be programmatically correct. To put a 16 means that the writer is allowed to make two loops, to gain two laps or two loops on the reader thread, and that’s not allowed. It’s supposed to be at most eight slots ahead of, not 16 slots ahead of, the reader, okay? Now, I had one and four and eight there before, and my argument is that it should be the eight. Okay, if you have multiple options as to what you can initialize your semaphores to be, you always err on the side – although err is not the right word – you always kind of move toward the decision that grants the thread manager the most flexibility as to how he schedules threads, okay? And it also improves the likelihood that every single thread will be able to use all of its time slice. Okay, to the extent that you artificially constrain the threads, if you were to make that eight a one again and you get this again, it probably means that each thread is being hiccupped and pulled off a thread prematurely, I mean pulled off the processor prematurely. Does that make sense? Okay, and so you usually try to maximize throughput and you choose your semaphore values accordingly. Okay, does that sit well with everybody? Yeah?
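The two bad initializations in this passage can be checked with a small single-threaded model in Python. It assumes the worst-case schedule for each side: the writer runs alone until its wait would block, and separately the reader runs alone from the very start. The function `hazards` and its two counters are hypothetical names, and plain integers stand in for the semaphore values.

```python
def hazards(empty_init, full_init, buffer_size=8):
    """Return (clobbered, stale_reads) under worst-case scheduling."""
    # Scenario A: the writer runs alone until empty buffers reaches zero.
    unread = [False] * buffer_size
    clobbered = 0
    empty = empty_init
    i = 0
    while empty > 0:
        empty -= 1                       # a wait that does not block
        slot = i % buffer_size
        if unread[slot]:
            clobbered += 1               # overwrote data that was never consumed
        unread[slot] = True
        i += 1

    # Scenario B: the reader runs alone before the writer has written anything.
    stale_reads = 0
    full = full_init
    while full > 0:
        full -= 1                        # a wait that does not block
        stale_reads += 1                 # the slot it reads was never written
    return clobbered, stale_reads
```

Starting at eight and zero permits neither problem. Seven and one lets the reader take one hop past the writer, and sixteen and zero lets the writer lap the buffer and destroy eight unread characters.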

Student:[Inaudible] semaphore wait?

Instructor (Jerry Cain):Which one is this? This is on the semaphore you mean?

Student:So, you called your semaphores, so now [inaudible] semaphore wait, so –

Instructor (Jerry Cain):That’s actually not. You mean you call semaphore here before you wrap around and wait on empty buffers?


Instructor (Jerry Cain):Well, this one never waits on full buffers. That one does.

Student:[Inaudible] signal before you call semaphore wait?

Instructor (Jerry Cain):Which semaphore wait are you talking about? Oh, I see what you’re saying. In other words, if the reader doesn’t get the processor, what happens here is that the writer just happens to go first. It brings an eight down to a seven and it promotes a zero up to a one. But that’s okay because it really is one slot ahead of where the reader is. The reader hasn’t even started yet.

Okay, and so if this makes, let’s say, four full iterations and it brings empty buffers down to four and full buffers up to four, that’s fine, because if it gets swapped off the processor here, this thing just discovers the world that it’s born into – it says wow, there are four characters I can read right away. Okay, and so it doesn’t matter that it hasn’t blocked – it hasn’t called wait yet.

If it calls wait, it only means wait if full buffers is zero. Otherwise, it just means decrement. Does that make sense? The words wait and signal, I think, were really adopted with the binary lock metaphor in mind. I also hear, when the thing is really a lock, acquire and release as the verbs. Some versions of thread libraries actually define a lock type with things like lock acquire and lock release, which are really just wrappers for semaphore wait and semaphore signal with the understanding that they’re protecting ones and zeroes, okay? But it’s not like this thing has to wait on that before this thing is allowed to signal it. Okay, sometimes you arrive at the bathroom and the door is open already. Okay, it just happens. Make sense to people?
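Both points from this answer fit in a few lines of Python. The sketch first shows a signal arriving before any wait, and then a tiny lock type written as a wrapper around a semaphore. The names `signal_before_wait` and `SemLock` are made up for illustration and are not taken from any particular thread library.

```python
import threading

def signal_before_wait():
    """Signal first, then wait: the wait just decrements and does not block."""
    full_buffers = threading.Semaphore(0)
    full_buffers.release()                        # signal: 0 -> 1, nobody waiting yet
    return full_buffers.acquire(blocking=False)   # wait: 1 -> 0, returns True

class SemLock:
    """A lock expressed as a semaphore that only ever holds one or zero."""

    def __init__(self):
        self._sem = threading.Semaphore(1)        # 1 means the door is open

    def acquire(self):
        self._sem.acquire()                       # semaphore wait under another name

    def release(self):
        self._sem.release()                       # semaphore signal under another name
```

A `SemLock` is used by bracketing the critical region with `acquire()` and `release()`, the same way the ticket example brackets its region with wait and signal on a semaphore initialized to one.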