1 00:00:00,390 --> 00:00:05,260 Hello and welcome to Chapter Ten of Python for Informatics, the chapter on tuples. 2 00:00:05,260 --> 00:00:09,960 I'm Charles Severance. I'm your lecturer and I'm the author of the textbook. 3 00:00:09,960 --> 00:00:13,910 As always, this material is copyright Creative Commons Attribution, 4 00:00:13,910 --> 00:00:18,260 including the video lectures, the slides, and the book. 5 00:00:19,600 --> 00:00:24,440 So tuples are the third kind of collection that we've talked about. 6 00:00:24,440 --> 00:00:25,610 We've talked about lists and 7 00:00:25,610 --> 00:00:26,910 we've talked about dictionaries. 8 00:00:26,910 --> 00:00:29,810 And in the dictionary lecture, we kind of alluded to tuples. 9 00:00:32,170 --> 00:00:34,250 We don't have to talk too much about tuples, it's really 10 00:00:34,250 --> 00:00:37,238 shortening the lecture by telling you that they're a lot like lists. 11 00:00:37,238 --> 00:00:42,890 They're a, non-change, they're a non-changeable list. 12 00:00:44,890 --> 00:00:49,140 And, and the syntax of them is 13 00:00:49,140 --> 00:00:51,490 pretty much the same as a list except 14 00:00:51,490 --> 00:00:55,570 that we use parentheses instead of square brackets. 15 00:00:55,570 --> 00:00:57,560 Okay? And so 16 00:00:57,560 --> 00:00:59,960 like here is a three-tuple, a tuple 17 00:00:59,960 --> 00:01:02,580 with three items in it, Glenn, Sally, and Joseph. 18 00:01:02,580 --> 00:01:06,300 They are numbered zero, zero, one, and two. 19 00:01:06,300 --> 00:01:09,810 So the second thing is one. So x sub two is indeed Joseph. 20 00:01:11,670 --> 00:01:17,490 You know, we can pass them in as sequences to things like max or min or sum. 21 00:01:18,820 --> 00:01:22,120 And so the maximum of 1, 9, 2 is 9. 22 00:01:22,120 --> 00:01:23,570 And we can loop through them. 23 00:01:23,570 --> 00:01:26,350 So here is y, it's a tuple. 24 00:01:26,350 --> 00:01:32,570 It's 1, 9, 2. 
And iteration is going to go through the three values, right? 25 00:01:32,570 --> 00:01:35,950 And so it's going to print out 1, 9, 2, which runs the indented code once for 26 00:01:35,950 --> 00:01:38,540 each of the values inside the tuple. 27 00:01:38,540 --> 00:01:41,360 And so in this respect they're very much like lists. 28 00:01:42,390 --> 00:01:47,210 But they're also different than lists in some real valuable ways. 29 00:01:47,210 --> 00:01:49,970 Tuples are immutable, and so if you recall 30 00:01:49,970 --> 00:01:52,900 when we talked about lists, we compared them to strings, because 31 00:01:52,900 --> 00:01:55,720 both lists and strings are a sequence of elements where 32 00:01:55,720 --> 00:01:58,082 the first one is zero, one, two, and et cetera. 33 00:01:59,330 --> 00:02:01,230 But if we, if we look at a string for 34 00:02:01,230 --> 00:02:05,300 example, and we have a three-character string A, B, C. 35 00:02:05,300 --> 00:02:08,910 And we want to change the third character, y sub two, to D, 36 00:02:08,910 --> 00:02:12,490 it complains and says, no, you can't do that. 37 00:02:12,490 --> 00:02:15,850 But you can do it on a list, so if we have a list 9, 8, 7, and we 38 00:02:15,850 --> 00:02:18,260 say x sub two is 6, which is the third 39 00:02:18,260 --> 00:02:21,650 item, then the third item changes from 7 to 6. 40 00:02:21,650 --> 00:02:22,590 Okay? 41 00:02:22,590 --> 00:02:28,380 So this is mutable. This is not mutable. 42 00:02:31,370 --> 00:02:35,650 And tuples are also like not, are not mutable. 43 00:02:35,650 --> 00:02:41,830 They're like strings, they're sort of like lists, in terms of what they can store. 44 00:02:41,830 --> 00:02:44,730 But they're like strings in the fact that they can't be changed. 45 00:02:44,730 --> 00:02:49,090 So here we create a three-tuple, a three-item tuple, and we try to change 46 00:02:49,090 --> 00:02:55,690 the third thing from 3 to 0 and it says you can't do that, not mutable. 
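A minimal sketch of the behavior described so far, written for Python 3. The variable names and the starting tuple (5, 4, 3) are assumptions for illustration; the lecture only says that the third item is changed from 3 to 0.

```python
# Indexing, built-in functions, and looping work on tuples just as on lists.
x = ('Glenn', 'Sally', 'Joseph')
print(x[2])            # the third item lives at index 2

y = (1, 9, 2)
print(max(y))
for item in y:         # the indented body runs once per value
    print(item)

# Lists are mutable: item assignment works.
nums = [9, 8, 7]
nums[2] = 6
print(nums)

# Strings are immutable: item assignment raises TypeError.
s = 'ABC'
try:
    s[2] = 'D'
except TypeError as err:
    print('str:', err)

# Tuples are immutable in the same way.
z = (5, 4, 3)
try:
    z[2] = 0
except TypeError as err:
    print('tuple:', err)
```

The try/except blocks are only there so the sketch runs to the end; typed at the interactive prompt, the two failing assignments would each show a traceback instead.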
47 00:02:55,690 --> 00:02:56,310 Okay? 48 00:02:56,310 --> 00:03:00,460 So, so it's kind of like lists in the kind of data that we store them, we 49 00:03:00,460 --> 00:03:02,590 store in them, and it's kind of like strings in that 50 00:03:02,590 --> 00:03:04,716 you can't change them once you create them. 51 00:03:04,716 --> 00:03:08,770 So this parentheses, this constant, is the moment of creation. Once 52 00:03:08,770 --> 00:03:11,670 you've put the things in, you can't fiddle around with them. 53 00:03:12,870 --> 00:03:14,680 There's a bunch of other things you can't do with tuples. 54 00:03:14,680 --> 00:03:17,570 You think why am I even, why even use tuples? 55 00:03:17,570 --> 00:03:19,050 We'll get to that in a second. 56 00:03:19,050 --> 00:03:21,360 So here is a three-tuple 57 00:03:21,360 --> 00:03:23,490 with the numbers 3, 2, 1. 58 00:03:23,490 --> 00:03:27,480 You can't sort it, because if you sorted it, that would change it. 59 00:03:27,480 --> 00:03:28,750 You can't add to it. 60 00:03:28,750 --> 00:03:32,440 You can't append the value 5 to the end of it, because that would change it. 61 00:03:32,440 --> 00:03:35,950 And you can't reverse it. So, none of these are allowed. 62 00:03:37,530 --> 00:03:42,510 Those are things you can do with lists, but you can't do with tuples. 63 00:03:42,510 --> 00:03:44,310 And you can read the documentation, but we 64 00:03:44,310 --> 00:03:46,540 can also use that built-in dir function, that really 65 00:03:46,540 --> 00:03:50,130 awesome dir function, where we make a list and we say hey, Python, 66 00:03:50,130 --> 00:03:53,200 what will you let me do with lists? 67 00:03:53,200 --> 00:03:55,930 Well, you can append, count, extend, index, 68 00:03:55,930 --> 00:03:59,470 insert, sort, reverse, remove, pop. Lots of things. 69 00:03:59,470 --> 00:04:02,980 Now we make a tuple and say hey, Python, what can we do with tuple? 
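The restrictions listed above, and the dir comparison the lecture is in the middle of, can be checked with a small Python 3 sketch. The helper public_methods is a hypothetical name introduced here; it just filters the dir output down to names that do not start with an underscore.

```python
def public_methods(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    names = []
    for name in dir(obj):
        if not name.startswith('_'):
            names.append(name)
    return names

t = (3, 2, 1)

# Tuples have no sort, append, or reverse methods, so each lookup fails.
for name in ('sort', 'append', 'reverse'):
    try:
        getattr(t, name)
    except AttributeError as err:
        print(err)

print(public_methods(list()))    # includes append, sort, reverse, and more
print(public_methods(tuple()))   # only count and index
```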
70 00:04:02,980 --> 00:04:05,150 Well, you can do a count or an index, 71 00:04:05,150 --> 00:04:07,120 which means you can't do all these other things. 72 00:04:07,120 --> 00:04:09,980 So this is sort of a, a very much a reduction. 73 00:04:12,570 --> 00:04:15,130 because everything you can do with tuples, you can do with lists. 74 00:04:15,130 --> 00:04:17,990 But not everything you can do with lists, you can do with tuples. 75 00:04:17,990 --> 00:04:19,350 So why? 76 00:04:19,350 --> 00:04:22,650 Why did I just waste all this time introducing tuples? 77 00:04:22,650 --> 00:04:24,370 All they are is have parentheses. 78 00:04:25,750 --> 00:04:26,410 What good are they? 79 00:04:26,410 --> 00:04:29,160 Well, it turns out that they're much more efficient. 80 00:04:29,160 --> 00:04:30,980 Because Python doesn't have to deal with 81 00:04:30,980 --> 00:04:34,100 the fact that we, as programmers, might change them, 82 00:04:34,100 --> 00:04:37,790 Python can make them quicker, they can use less memory, 83 00:04:37,790 --> 00:04:41,610 all kinds of things that save a lot of processing time in Python. 84 00:04:44,120 --> 00:04:45,230 So when would you use a tuple? 85 00:04:45,230 --> 00:04:46,420 Well, in particular, if you're going to 86 00:04:46,420 --> 00:04:50,110 create some list that you're never changing, we prefer to use tuples. 87 00:04:50,110 --> 00:04:52,140 And there's a lot of situations in 88 00:04:52,140 --> 00:04:55,820 programming where we create what we think of as a temporary variable. 89 00:04:55,820 --> 00:04:59,590 And if we're going to use, create it, use it, and throw it 90 00:04:59,590 --> 00:05:04,525 away without ever modifying it, we prefer tuples in those kinds of situations. 91 00:05:04,525 --> 00:05:08,920 Okay? So we prefer tuples when we create things that are just temporary. 92 00:05:08,920 --> 00:05:10,840 It's the fact that they're temporary variables. 93 00:05:10,840 --> 00:05:14,430 They're like temporary lists. 
Because they're efficient. 94 00:05:14,430 --> 00:05:15,680 They're quick to make and they're quick to 95 00:05:15,680 --> 00:05:17,410 get rid of, and they're quick to go through. 96 00:05:20,120 --> 00:05:22,184 Now, another really neat thing about Python that I 97 00:05:22,184 --> 00:05:24,740 really like is that is the fact that you can 98 00:05:24,740 --> 00:05:28,380 do sort of two assignments in one by putting a tuple on 99 00:05:28,380 --> 00:05:31,400 both the left and the right-hand side of your assignment statement. 100 00:05:31,400 --> 00:05:33,230 So if we think about an assignment statement, 101 00:05:33,230 --> 00:05:35,070 I like to think of it as having a direction. 102 00:05:35,070 --> 00:05:37,210 It says, these things go there. 103 00:05:37,210 --> 00:05:42,260 Well, in Python, you can actually send two things at the same time. 104 00:05:42,260 --> 00:05:45,140 The 4 goes into the x and the fred goes into the y. 105 00:05:45,140 --> 00:05:47,310 This is a tuple. This is a tuple. 106 00:05:47,310 --> 00:05:50,420 You, you cannot have constants on this left-hand side. 107 00:05:50,420 --> 00:05:52,810 You can have variables or constants on the, or expressions 108 00:05:52,810 --> 00:05:56,780 on the right-hand side, but this must be two variables. 109 00:05:56,780 --> 00:06:02,220 Similarly, in this, the 99 goes into a and the 98 goes into b. 110 00:06:02,220 --> 00:06:04,040 Now, it turns out that you can 111 00:06:04,040 --> 00:06:07,920 syntactically eliminate the parentheses if you really want. 112 00:06:07,920 --> 00:06:10,170 And so this leads to a prettier syntax, 113 00:06:10,170 --> 00:06:10,420 I think. 114 00:06:10,420 --> 00:06:14,810 It's the exact same thing with or without parentheses, where we basically just say, 115 00:06:14,810 --> 00:06:20,530 hey, come back, a and b are assigned to the tuple 99, 98. 
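Here is a short sketch of tuple assignment in Python 3, using the values mentioned above. The compile call is an addition of this sketch, not something from the lecture; it is used only to show that a constant on the left-hand side is rejected as a syntax error without stopping the script.

```python
# A tuple on both sides: 4 goes into x, 'fred' goes into y.
(x, y) = (4, 'fred')
print(x, y)

# The same idea without parentheses.
a, b = 99, 98
print(a, b)

# A constant on the left-hand side is not allowed.
bad = "(4, y) = (1, 2)"
try:
    compile(bad, '<sketch>', 'exec')
    rejected = False
except SyntaxError as err:
    rejected = True
    print('SyntaxError:', err.msg)
```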
116 00:06:20,530 --> 00:06:23,010 And so you can eliminate the parentheses as long 117 00:06:23,010 --> 00:06:25,570 as it's very clear what's going on in the tuple. 118 00:06:25,570 --> 00:06:27,310 And so this, this might be a little disquieting 119 00:06:27,310 --> 00:06:29,670 when you first see it, but it's just a 120 00:06:29,670 --> 00:06:35,280 tuple with no parentheses and the 99 goes to the a and the 98 goes to the b. 121 00:06:35,280 --> 00:06:37,900 Now, it turns out we already did this. 122 00:06:37,900 --> 00:06:42,980 I sort of blew by this in the previous lecture in dictionaries 123 00:06:42,980 --> 00:06:45,045 because it allows us to go through the 124 00:06:45,045 --> 00:06:48,350 dictionary's keys and values with two iteration variables. 125 00:06:50,630 --> 00:06:53,270 And so, if you remember, here is a dictionary. 126 00:06:53,270 --> 00:07:01,868 We put two items into it and and we can call d.items and get a list 127 00:07:01,868 --> 00:07:06,480 of tuples, a list of two-tuples. 128 00:07:06,480 --> 00:07:10,025 Two-tuples are a quick way of saying a tuple with two things in it. 129 00:07:10,025 --> 00:07:14,510 It's a two-element list that consists each element is a two-tuple. 130 00:07:14,510 --> 00:07:17,084 And it's the key and the value, 131 00:07:17,084 --> 00:07:21,220 key and the value. And so if we just print this out, it's a list. 132 00:07:21,220 --> 00:07:26,500 So then when we put this on a for loop, it 133 00:07:26,500 --> 00:07:32,180 is a list, but the things inside the list are each a tuple. 134 00:07:33,480 --> 00:07:35,770 Each thing inside the list is a tuple. 135 00:07:36,830 --> 00:07:40,700 So, when this iteration variable goes to there, 136 00:07:40,700 --> 00:07:41,410 it is like 137 00:07:41,410 --> 00:07:45,490 this tuple is being assigned into k,v. Which means the key, 138 00:07:45,490 --> 00:07:49,000 key goes into k and the value goes into v. 
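A minimal sketch of the loop described above, in Python 3. The key names are assumptions (the second one is a guess at what was on the slide). One difference from the lecture: in Python 3, d.items() returns a view object rather than a list, so the sketch wraps it in list() to show the two-tuples.

```python
d = dict()
d['csev'] = 2
d['cwen'] = 4          # assumed key name

# Each item is a (key, value) two-tuple.
tups = list(d.items())
print(tups)

# Each two-tuple is unpacked into k and v on every pass through the loop.
for (k, v) in d.items():
    print(k, v)
```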
139 00:07:49,000 --> 00:07:52,090 The name I picked for k and v don't matter, do not matter. 140 00:07:53,090 --> 00:07:55,150 It's just, it's just the first the first one and the second one. 141 00:07:56,860 --> 00:08:01,920 So k go, k and v point here. Then the next time through the loop, k 142 00:08:01,920 --> 00:08:06,104 and v point here. And so that's 143 00:08:06,104 --> 00:08:11,000 how csev 2 and Chen Wen 4 happen. 144 00:08:11,000 --> 00:08:14,960 And so this is really a tuple assignment or a tuple iterating 145 00:08:14,960 --> 00:08:20,790 through a list of tuple iteration variable or a pair of iteration variables 146 00:08:20,790 --> 00:08:23,170 walking through the list, okay? 147 00:08:24,970 --> 00:08:28,730 We don't do this a lot, and it's really quite, it's most heavily 148 00:08:28,730 --> 00:08:32,400 used for this situation where you're going through a dictionary and you want to see 149 00:08:32,400 --> 00:08:34,060 both the keys and the values. 150 00:08:34,060 --> 00:08:37,530 And then you use this method inside of dictionary called d.items. 151 00:08:38,850 --> 00:08:40,790 Another thing that's cool about tuples 152 00:08:40,790 --> 00:08:47,786 are that they're comparable. So less than, greater than, equals. 153 00:08:47,786 --> 00:08:55,020 And so, they look, they first compare the first, leftmost, thing, then 154 00:08:55,020 --> 00:08:57,500 if that matches, they go to the second one, and then if that one matches, 155 00:08:57,500 --> 00:08:59,020 they go to the third one. 156 00:08:59,020 --> 00:09:02,638 And so if we're asking, is (0, 1, 2) less than (5, 1, 2). 157 00:09:02,638 --> 00:09:03,810 And the answer is True. 158 00:09:03,810 --> 00:09:07,850 And it only looks at the 0 and the 5, and that's less than, so away we go. 159 00:09:08,880 --> 00:09:13,110 If we ask is (0, 1, 2000000) less than (0, 3, 4)? 160 00:09:13,110 --> 00:09:16,700 Well, 0 and 0 match, so it goes to the second one. 
161 00:09:16,700 --> 00:09:19,650 1 and 3, well they don't match and they're less than, 162 00:09:19,650 --> 00:09:22,830 so 1 is less than 3, so it 163 00:09:22,830 --> 00:09:25,530 so it's True and it doesn't even look at these numbers because it doesn't have to. 164 00:09:25,530 --> 00:09:26,000 Right? 165 00:09:26,000 --> 00:09:28,280 In this one, it doesn't look at those numbers. 166 00:09:28,280 --> 00:09:34,700 And now if we say, come here, is Jones, Sally less than Jones, Fred? 167 00:09:34,700 --> 00:09:37,800 Well, it compares this and they're equal. 168 00:09:37,800 --> 00:09:41,760 So then it has to look to the second one: is Sally less than Fred? 169 00:09:41,760 --> 00:09:44,730 Well, no, because S is not less than F. 170 00:09:44,730 --> 00:09:50,700 And so that answer is False. Is Jones, Sally 171 00:09:50,700 --> 00:09:54,680 greater than Adams, Sam? Well Jones is greater than Adams, so it 172 00:09:54,680 --> 00:09:57,950 never looks at these variables, and that turns out to be True. 173 00:09:59,360 --> 00:10:01,950 So these are comparable. 174 00:10:01,950 --> 00:10:05,220 Which means we can use the less than, less than or equal to, 175 00:10:05,220 --> 00:10:09,320 greater than or equal to, equal to, or not equal to. 176 00:10:09,320 --> 00:10:12,980 So we can use these operators on whole tuples. 177 00:10:12,980 --> 00:10:15,470 Now this turns out to be quite nice, 178 00:10:15,470 --> 00:10:20,230 because things that can be compared can also be sorted. 179 00:10:21,370 --> 00:10:22,840 Okay? 180 00:10:22,840 --> 00:10:23,735 So here is 181 00:10:23,735 --> 00:10:28,310 [COUGH] a, b, and c. a maps to 10. 182 00:10:28,310 --> 00:10:30,440 b maps to 1. c maps to 22. 183 00:10:30,440 --> 00:10:32,630 If I look at d.items, 184 00:10:32,630 --> 00:10:36,770 I get back a list of two-tuples, three two-tuples. 185 00:10:36,770 --> 00:10:40,690 They are not sorted because dictionaries 186 00:10:40,690 --> 00:10:42,720 aren't sorted. 
a maps to 10, 187 00:10:42,720 --> 00:10:44,190 c maps to 22, and b maps to 1. 188 00:10:44,190 --> 00:10:49,400 The order that these come out in is not something that we can control. 189 00:10:49,400 --> 00:10:52,480 But if we put these items into a variable, 190 00:10:52,480 --> 00:10:56,830 call it t, t is the list of tuples basically, 191 00:10:56,830 --> 00:11:00,790 and then we tell it to sort, it can do comparisons between all these. 192 00:11:02,820 --> 00:11:05,670 And it can sort them and now they're sorted 193 00:11:05,670 --> 00:11:08,380 in key order: a, b, c. 194 00:11:08,380 --> 00:11:10,130 Now you'll never get any keys that match 195 00:11:10,130 --> 00:11:12,150 so it never looks at the second one, right? 196 00:11:12,150 --> 00:11:15,840 Because there's one and only one key a or b or c. 197 00:11:15,840 --> 00:11:20,148 The value 10 never gets looked at. So this ends up sorted by keys. 198 00:11:20,148 --> 00:11:27,860 Sort by keys. Okay, so this is the way to sort by keys. 199 00:11:29,280 --> 00:11:30,750 We take a dictionary. 200 00:11:30,750 --> 00:11:35,730 We get back a list of tuples, key-value tuples, then we sort that dictionary. 201 00:11:35,730 --> 00:11:40,120 I mean, sort that list of key-value tuples. And then, it's sorted by key. 202 00:11:40,120 --> 00:11:41,590 Okay? So that's one sort. 203 00:11:43,370 --> 00:11:46,130 There is a built-in function in Python 204 00:11:48,998 --> 00:11:53,270 called sorted, which takes as a parameter a list, 205 00:11:53,270 --> 00:11:56,140 and gives you back a sorted version of that list. 206 00:11:56,140 --> 00:12:00,090 So we can collapse these operations by saying, oh, 207 00:12:00,090 --> 00:12:04,230 well d sub items is this list of tuples non-sorted. 208 00:12:04,230 --> 00:12:09,030 But sorted of d sub items is that same list of tuples, but then sorted. 209 00:12:09,030 --> 00:12:14,090 So immediately in one step we have 210 00:12:14,090 --> 00:12:16,500 a, b, and c properly sorted. 
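The comparisons and the sort by key walked through above can be sketched as follows in Python 3. The dictionary is written in a, c, b order as an assumption, to mimic the unsorted output in the lecture; Python 3.7 and later keep insertion order, and items() returns a view, so the sketch copies it into a list before calling sort.

```python
# Tuples compare left to right and stop at the first position that differs.
print((0, 1, 2) < (5, 1, 2))
print((0, 1, 2000000) < (0, 3, 4))
print(('Jones', 'Sally') < ('Jones', 'Fred'))
print(('Jones', 'Sally') > ('Adams', 'Sam'))

d = {'a': 10, 'c': 22, 'b': 1}

# Two steps: copy the items into a list, then sort the list in place.
t = list(d.items())
t.sort()
print(t)

# One step: sorted() returns a new sorted list and leaves d untouched.
by_key = sorted(d.items())
print(by_key)
```

Because dictionary keys are unique, the sort never has to fall through to the values to break a tie.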
211 00:12:16,500 --> 00:12:19,580 And we can combine into all this into one nice little for 212 00:12:19,580 --> 00:12:23,990 statement, where we say for k, v in sorted of d sub items. 213 00:12:23,990 --> 00:12:29,250 So this is now going to first sort the key-value pairs by key. 214 00:12:29,250 --> 00:12:32,040 And then k, v is going to run through them, so k's 215 00:12:32,040 --> 00:12:36,160 going to be a, 10. Then it's going to, k's going to be b, 216 00:12:36,160 --> 00:12:38,120 v is going to be 1. k is going to be c, 217 00:12:38,120 --> 00:12:39,930 v is going to be 22. 218 00:12:39,930 --> 00:12:44,590 So now we've printed these things out in alphabetical key order. 219 00:12:44,590 --> 00:12:45,450 Okay? 220 00:12:45,450 --> 00:12:48,350 So by adding sorted to d.items, that means that 221 00:12:48,350 --> 00:12:52,130 this loop is going to run in key-sorted order. 222 00:12:54,150 --> 00:12:55,090 Key-sorted order. 223 00:12:56,300 --> 00:13:00,810 And that's because sorted takes a list and then returns, as a, 224 00:13:00,810 --> 00:13:04,679 takes a list as unsorted list as input and returns a sorted list. 225 00:13:07,440 --> 00:13:07,940 Okay? 226 00:13:09,820 --> 00:13:15,070 Now, if we are doing something like our common problem of what's the most common word, 227 00:13:15,070 --> 00:13:19,070 what if we want to say, what's the five most common words? 228 00:13:19,070 --> 00:13:21,590 In that case, we probably want to sort in 229 00:13:21,590 --> 00:13:25,500 descending order by the values, not the key. 230 00:13:26,550 --> 00:13:27,050 Okay? 231 00:13:28,830 --> 00:13:30,890 So we want sort by the values instead of the key. 232 00:13:32,220 --> 00:13:36,710 So this is a situation where we're going to create a temporary variable. 233 00:13:36,710 --> 00:13:39,550 So here's how we're going to do it. 
234 00:13:39,550 --> 00:13:43,970 Here is our dictionary with a, 10 and we want to sort now by 235 00:13:43,970 --> 00:13:47,722 the values, we want to, you know, maybe see the most common or sort by the values. 236 00:13:47,722 --> 00:13:50,735 And so we are going to make a temporary list 237 00:13:50,735 --> 00:13:54,019 and then we are going to loop through the items. 238 00:13:54,019 --> 00:13:58,763 So, so this is going to just loop through them and it's 239 00:13:58,763 --> 00:14:01,810 going to loop through them in non-sorted order and we are 240 00:14:01,810 --> 00:14:07,990 going to add using the append operation to this little list that we are making. 241 00:14:07,990 --> 00:14:14,190 But we're going to add a tuple that is value, comma, key. 242 00:14:14,190 --> 00:14:19,300 So if we make the value first and the key second in this tuple. 243 00:14:19,300 --> 00:14:22,760 So this syntax here, this parentheses v comma k. 244 00:14:22,760 --> 00:14:26,950 that means make a two-tuple with values from the v and 245 00:14:26,950 --> 00:14:28,330 k variable. 246 00:14:28,330 --> 00:14:33,200 And, append a list. So you're going to end up with a list of two-tuples. 247 00:14:34,680 --> 00:14:39,720 So if we, if we take a look when we're all done with this, each of these is a tuple. 248 00:14:39,720 --> 00:14:45,930 10, a gets appended; 22, c gets appended, and it was simply the opposite order. 249 00:14:45,930 --> 00:14:50,840 The, the tuple, each of the tuples now has the value first and the key second. 250 00:14:50,840 --> 00:14:52,040 Value first, key second. 251 00:14:52,040 --> 00:14:54,400 Value first, key second. 252 00:14:54,400 --> 00:14:57,100 So this is a bit of temporary data that 253 00:14:57,100 --> 00:15:00,890 we've created, this is a bit of temporary data that we've created. 254 00:15:00,890 --> 00:15:04,350 Then what we do is we call the sort method. 255 00:15:04,350 --> 00:15:09,900 Sort, take this list. Lists are mutable. 
The individual tuples can't be changed, but 256 00:15:09,900 --> 00:15:13,670 the order of the tuples can be changed because they are in a list. 257 00:15:13,670 --> 00:15:17,650 tmp.sort, and then we're going to say reverse equals True 258 00:15:17,650 --> 00:15:21,360 so you sort from the highest down to the lowest. Okay? 259 00:15:21,360 --> 00:15:24,420 And now, tmp has been sorted 260 00:15:24,420 --> 00:15:26,350 and now it is in a new order. 261 00:15:26,350 --> 00:15:30,480 22, 10, 1 is what caused it to be sorted. 262 00:15:30,480 --> 00:15:34,470 So we know that the biggest value is 22, the key 263 00:15:34,470 --> 00:15:37,950 of c. Next biggest is 10 with a key of a. 264 00:15:37,950 --> 00:15:42,300 And the smallest is a key of 1, a value of 1 with a key of b. 265 00:15:42,300 --> 00:15:43,750 So the trick here is 266 00:15:43,750 --> 00:15:47,400 if we want to sort in some other way, we just construct 267 00:15:47,400 --> 00:15:50,210 a list where we put it in the order that we want it sorted. 268 00:15:50,210 --> 00:15:51,540 And this is more important now. 269 00:15:51,540 --> 00:15:57,230 The value is more important than the key. Now if we had another, 270 00:15:57,230 --> 00:16:02,130 like a 22, f, it would sort first on the 22. 271 00:16:02,130 --> 00:16:06,030 And then it would, it would sort the f, 1 after the c, 1. 272 00:16:06,030 --> 00:16:06,330 Right? 273 00:16:06,330 --> 00:16:07,850 So we don't have any duplicates. 274 00:16:07,850 --> 00:16:10,090 But we could have the we could have the key of 275 00:16:10,090 --> 00:16:13,323 c to 22 and we could have f also be 22. 276 00:16:15,000 --> 00:16:18,790 Okay, so, take some time on this, get this one right. 277 00:16:19,230 --> 00:16:22,640 So now I want to show you a program 278 00:16:22,640 --> 00:16:25,990 that is going to show you the ten most common words. 
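Before the word-count program, here is a sketch of the value-first sort just described, in Python 3. The variable name c for the dictionary and the extra key 'f' in the tie example are assumptions for illustration.

```python
c = {'a': 10, 'b': 1, 'c': 22}

# Build a temporary list of (value, key) tuples.
tmp = list()
for k, v in c.items():
    tmp.append((v, k))      # value first, key second
print(tmp)

# The list is mutable, so it can be reordered even though each tuple cannot change.
tmp.sort(reverse=True)      # highest value first
print(tmp)

# Ties on the value fall through to the key. With reverse=True the key
# comparison is reversed as well, so 'f' comes out ahead of 'c' here.
tie = sorted([(22, 'c'), (22, 'f')], reverse=True)
print(tie)
```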
279 00:16:25,990 --> 00:16:28,180 We did a, a loop before 280 00:16:30,940 --> 00:16:32,810 where we did the 281 00:16:34,310 --> 00:16:37,950 most common word by doing a maximum loop at the end by looking 282 00:16:37,950 --> 00:16:42,070 through all of the counts in a dictionary and then picking the maximum. 283 00:16:42,070 --> 00:16:44,140 But what if you wanted the top ten? 284 00:16:44,140 --> 00:16:45,810 Right, but that, that you don't want to write 285 00:16:45,810 --> 00:16:47,700 a loop for that, so we're going to use sorting. 286 00:16:47,700 --> 00:16:51,272 So here's what we're going to do. We're going to open a file. 287 00:16:51,272 --> 00:16:55,120 We're going to create a empty counts dictionary. 288 00:16:55,120 --> 00:16:56,034 Then we're going to 289 00:16:56,034 --> 00:17:01,401 write a for loop that reads each line for line in fhand. 290 00:17:01,401 --> 00:17:04,577 Then I'm going to split each line into 291 00:17:04,577 --> 00:17:08,622 words, based on the spaces, using the dot split. 292 00:17:08,622 --> 00:17:12,750 Then I'm going to loop through each word in each line 293 00:17:12,750 --> 00:17:17,716 and use our histogram or dictionary pattern where 294 00:17:17,716 --> 00:17:20,558 I say counts sub word equals counts dot get 295 00:17:20,558 --> 00:17:22,739 word comma zero. That basically says 296 00:17:22,739 --> 00:17:24,654 go look in counts. 297 00:17:24,654 --> 00:17:27,402 If the word key exists, give me back 298 00:17:27,402 --> 00:17:30,244 the value that's in that, otherwise give me zero. 299 00:17:30,244 --> 00:17:31,192 So this both 300 00:17:31,192 --> 00:17:35,271 creates the new entries and updates old entries. 301 00:17:35,271 --> 00:17:38,078 All in one nice simple statement. 302 00:17:38,078 --> 00:17:41,787 So at the end of this bit of code right here 303 00:17:41,787 --> 00:17:47,787 we are going to have counts with keyword word-count pairs. 304 00:17:47,787 --> 00:17:50,847 Okay? 
So, this is something we've done before. 305 00:17:50,847 --> 00:17:54,732 It's just dictionaries, reading, splitting. 306 00:17:54,732 --> 00:17:58,903 And then this pattern of how to accumulate in a dictionary. 307 00:17:58,903 --> 00:18:02,521 Then what we're going to do is we are going to make a new list 308 00:18:02,521 --> 00:18:07,083 called l-s-t and now we're doing this key-value in the items. 309 00:18:07,083 --> 00:18:12,003 So this is going to go through the key-value pairs in this list, which is the 310 00:18:12,003 --> 00:18:15,400 key-value pairs from the dictionary. Right? 311 00:18:15,700 --> 00:18:18,984 But then we are going to create this temporary list 312 00:18:18,984 --> 00:18:22,474 of tuples that are val, comma, key. 313 00:18:22,474 --> 00:18:28,455 So val is like 20, the; 14, hello; 314 00:18:29,455 --> 00:18:32,864 and that's what the list is going to look like, right? 315 00:18:32,864 --> 00:18:33,909 It's going to be tuples, 316 00:18:33,909 --> 00:18:37,004 but it's going to be the value and then the 317 00:18:37,004 --> 00:18:39,006 key, rather than the key and the value. 318 00:18:39,006 --> 00:18:46,852 This one here is key, value; this one here, l-s-t, is value, key. 319 00:18:46,852 --> 00:18:50,982 Now that we have a list that's value, comma, key, 320 00:18:52,982 --> 00:18:55,995 we are just going to sort it because now it's going to sort based on the first 321 00:18:55,995 --> 00:18:59,032 thing in that tuple and we're going to reverse it 322 00:18:59,032 --> 00:19:01,798 so the biggest values are near the top. 323 00:19:01,798 --> 00:19:04,737 And so when we're all done this is going to be 324 00:19:04,737 --> 00:19:08,602 a list, except it's going to be sorted based on the value. 325 00:19:08,602 --> 00:19:10,507 So that's just one step to sort it. 
326 00:19:10,507 --> 00:19:14,667 So this is a good example of how we sort of go through some work, we get a data 327 00:19:14,667 --> 00:19:16,922 structure, a list, the way we want it and 328 00:19:16,922 --> 00:19:19,417 now we can sort of leverage the built-in sort. 329 00:19:19,417 --> 00:19:23,432 We had to prepare a list so we could use the built-in sort. 330 00:19:23,432 --> 00:19:25,252 We could do this by hand, but it'd be very difficult. 331 00:19:25,252 --> 00:19:26,845 But it's easier to say I think 332 00:19:26,845 --> 00:19:29,000 I'll make a list, and then I'll sort it. 333 00:19:29,000 --> 00:19:30,005 Okay? 334 00:19:30,005 --> 00:19:32,165 So I, you know, I made two lists basically. 335 00:19:32,165 --> 00:19:36,637 I made the original one, then I made this one just for the purpose of sorting. 336 00:19:36,637 --> 00:19:39,661 And now what I am going to do to print out the top ten 337 00:19:39,661 --> 00:19:43,876 is I am going to write a for loop val, key. 338 00:19:43,876 --> 00:19:47,920 Remember, this list l-s-t is value-key. 339 00:19:47,920 --> 00:19:52,430 And I'm going to say for val, key in lst, using list slicing, 340 00:19:53,850 --> 00:19:56,340 starting at zero, up to but not including 341 00:19:56,340 --> 00:19:59,340 ten, which is indeed is the first ten items. 342 00:20:00,430 --> 00:20:06,940 Now I'm going to print out key, value, so it's going to print out the, 22; 343 00:20:06,940 --> 00:20:11,550 fred, 16; and so I'm going to first print the first ten. 344 00:20:11,550 --> 00:20:15,590 So, this list is in val-key order, the tuples are val-key order. 345 00:20:15,590 --> 00:20:18,950 And so I'm going to print it out in key-val just so that I print out in a way 346 00:20:18,950 --> 00:20:21,090 that makes the most sense. 347 00:20:21,090 --> 00:20:24,410 And so, this is a simple way to do a 348 00:20:24,410 --> 00:20:27,330 simple histogram of the occurrence of words in a file. 
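Putting the steps above together, the program might look roughly like this in Python 3. This is a sketch rather than the exact code from the slide: the function name top_words, its parameter n, and the in-memory sample text (used in place of a real file so the example runs on its own) are all assumptions.

```python
import io

def top_words(fhand, n=10):
    """Return up to n (count, word) tuples, most common first."""
    counts = dict()
    for line in fhand:
        words = line.split()
        for word in words:
            # Creates the entry at 0 if missing, then adds one.
            counts[word] = counts.get(word, 0) + 1

    # Flip each (key, value) pair into (value, key) so sort uses the count.
    lst = list()
    for key, val in counts.items():
        lst.append((val, key))
    lst.sort(reverse=True)

    # Slice from zero up to but not including n.
    return lst[:n]

# Hypothetical sample text standing in for an opened file.
sample = io.StringIO("the cat and the dog\nthe dog and the bird\n")
for val, key in top_words(sample):
    print(key, val)         # printed key first, then count
```

To run it against a real file, pass the handle returned by open() instead of the StringIO object.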
349 00:20:29,470 --> 00:20:34,268 So again, you should know this, you should know every line. 350 00:20:34,268 --> 00:20:37,730 You should know every line. 351 00:20:37,730 --> 00:20:40,830 Go back, review a couple times, but you should know, 352 00:20:40,830 --> 00:20:42,700 you should know the meaning of every line of this. 353 00:20:42,700 --> 00:20:46,490 And if you do, that's really good. So 354 00:20:49,520 --> 00:20:53,910 as you become more powerful and capable inside Python, you will 355 00:20:53,910 --> 00:20:57,750 realize that there are sometimes even shorter ways of doing things. 356 00:20:57,750 --> 00:21:01,560 Now, what I'm showing you here is not that different than what was 357 00:21:01,560 --> 00:21:06,450 on the previous page, it's just really dense, but you have to concentrate. 358 00:21:06,450 --> 00:21:09,510 So if, I want you to understand what's on that previous page. 359 00:21:09,510 --> 00:21:11,200 If you don't understand this, don't feel bad. 360 00:21:11,200 --> 00:21:14,600 I am going to explain it to you but don't feel bad if you don't get it. 361 00:21:14,600 --> 00:21:14,780 Okay? 362 00:21:14,780 --> 00:21:16,060 So I'm just going to explain it. 363 00:21:17,530 --> 00:21:22,090 If it doesn't feel right to you, go back and look at the previous page. 364 00:21:22,090 --> 00:21:22,890 Okay. 365 00:21:22,890 --> 00:21:27,050 So here we go. I am going to have a dictionary. 366 00:21:27,050 --> 00:21:33,270 And then I'm going to print, in one line, sorted by value. 367 00:21:33,270 --> 00:21:37,110 So we'll start from the inside out. 368 00:21:37,110 --> 00:21:40,610 So this is a thing called list comprehension. 369 00:21:40,610 --> 00:21:44,300 It looks like a list constant because we start with square brackets. 
370 00:21:44,300 --> 00:21:50,020 But this is a Python syntax that says construct dynamically 371 00:21:50,020 --> 00:21:55,760 a list of tuples v, comma, k and I would like 372 00:21:55,760 --> 00:22:01,990 you to loop through the items with k and v taking on the successive values. 373 00:22:01,990 --> 00:22:05,910 So this is creating that reversed list where value and key 374 00:22:05,910 --> 00:22:09,580 are the order of the items in each tuple. 375 00:22:09,580 --> 00:22:11,740 And it's going to do that, so this is going to expand. 376 00:22:11,740 --> 00:22:14,080 It's sort of like, it goes [SOUND] expands this, 377 00:22:14,080 --> 00:22:17,330 it makes a temporary list, right now. 378 00:22:17,330 --> 00:22:21,450 Now if you look on the previous slide we call that thing l-s-t. 379 00:22:21,450 --> 00:22:27,300 But here we don't even call it l-s-t, and then once we have the list of tuples 380 00:22:27,300 --> 00:22:33,990 in value-key order, then we simply take and pass that into sorted. 381 00:22:33,990 --> 00:22:35,940 This is a function call, 382 00:22:35,940 --> 00:22:40,870 the sorted function. And then, now I'm not reversing it, but the print statement prints 383 00:22:40,870 --> 00:22:47,460 it out in ascending order of the value: 1, 10, 22. 384 00:22:47,460 --> 00:22:50,490 Okay? So this you can, you can make these more 385 00:22:50,490 --> 00:22:54,230 dense once you're a little more comfortable with what's going on. 386 00:22:54,230 --> 00:22:58,160 It's sometimes easier to construct something that seems to have 387 00:22:58,160 --> 00:23:00,050 steps, where you can put, you know, you can put 388 00:23:00,050 --> 00:23:01,150 a debug print here, 389 00:23:01,150 --> 00:23:03,100 you can put a debug print here, you can do a debug print here, 390 00:23:03,100 --> 00:23:06,250 and you kind of see what's going on, right? 
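A sketch of the dense version next to the step-by-step version, in Python 3. The variable names result, lst, and stepwise are assumptions added so the two outputs can be compared.

```python
c = {'a': 10, 'b': 1, 'c': 22}

# Dense version: the list comprehension builds the (value, key) tuples
# and sorted() orders them in ascending order of value.
result = sorted([(v, k) for k, v in c.items()])
print(result)

# Step-by-step version, with room for a debug print between the steps.
lst = []
for k, v in c.items():
    lst.append((v, k))
print(lst)                  # debug print: unsorted (value, key) tuples
stepwise = sorted(lst)
print(stepwise)
```

Both versions produce the same list; the second one is simply easier to inspect while it is being written.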
391 00:23:06,250 --> 00:23:08,430 Whereas once you really understand this 392 00:23:08,430 --> 00:23:11,730 you can, you can write some more dense Python. 393 00:23:11,730 --> 00:23:14,520 When you understand this, it's okay. 394 00:23:14,520 --> 00:23:15,210 Right? 395 00:23:15,210 --> 00:23:17,750 So I'm not saying you're supposed to understand this, but I just want 396 00:23:17,750 --> 00:23:21,160 to point out that it's possible to do this in a tighter fashion. 397 00:23:22,610 --> 00:23:23,110 So, 398 00:23:25,130 --> 00:23:29,210 tuples are like lists except that you can't change them. 399 00:23:29,210 --> 00:23:29,610 Right? 400 00:23:29,610 --> 00:23:34,610 You can change lists. And now you can compare them, you can sort them. 401 00:23:34,610 --> 00:23:38,740 You can sort lists of tuples. You can't sort within the tuple itself. 402 00:23:38,740 --> 00:23:43,940 We can put a tuple of values on the left-hand side of the assignment statement, we can 403 00:23:45,050 --> 00:23:49,290 use sorted, and we played with sorting dictionaries by key and value. 404 00:23:49,290 --> 00:23:49,790 So, 405 00:23:51,650 --> 00:23:53,680 that's kind of the end of this lecture. 406 00:23:53,680 --> 00:23:57,480 And so at this point I just want to kind of congratulate 407 00:23:57,480 --> 00:24:02,380 you on making it through the first ten chapters of the book. 408 00:24:02,380 --> 00:24:05,190 So I'll, I'll drink a cup of tea to you. 409 00:24:05,190 --> 00:24:08,213 Here's your cup of tea, here's my toast to you, 410 00:24:08,213 --> 00:24:11,340 in my Slytherin cup. 411 00:24:11,340 --> 00:24:16,160 And so it's time for a graduation ceremony.
412 00:24:16,160 --> 00:24:16,805 So, I'll give a 413 00:24:16,805 --> 00:24:22,150 a little graduation speech here with my graduation hat on and this is my 414 00:24:22,150 --> 00:24:27,070 this is my Slytherin wand and so, so the reason I am congratulating you at 415 00:24:27,070 --> 00:24:32,030 the end of this chapter is that at this point, you kind of 416 00:24:34,170 --> 00:24:39,010 know almost, you know all the fundamentals of programming. 417 00:24:39,010 --> 00:24:43,910 Programming really comes down to what's called algorithms and data structures. 418 00:24:43,910 --> 00:24:50,450 Sometimes we solve a problem by a clever series of steps that we put together. 419 00:24:50,450 --> 00:24:54,230 And sometimes we solve a problem by creating a clever data structure. 420 00:24:55,620 --> 00:24:59,240 And so the first few chapters were about algorithms. Steps, 421 00:24:59,240 --> 00:25:04,280 loops, functions, very procedural. How you sort of create these threads of 422 00:25:04,280 --> 00:25:09,040 stepping and do things a bunch of times or skip around or whatever. 423 00:25:09,040 --> 00:25:11,040 And in the last three chapters that we've covered 424 00:25:11,040 --> 00:25:15,080 we're talking about data structures. And programming 425 00:25:15,080 --> 00:25:18,510 power comes when you combine algorithms and data structures. 426 00:25:19,870 --> 00:25:24,590 Now in the next chapters, starting with Chapter Eleven, regular expressions, 427 00:25:24,590 --> 00:25:29,490 we're going to learn sort of more clever ways of doing the same thing. 428 00:25:29,490 --> 00:25:31,730 So you kind of know how to do a lot of stuff now. 429 00:25:31,730 --> 00:25:35,390 From this point forward, you'll see, oh, boy, that's more clever. 430 00:25:35,390 --> 00:25:38,150 Or we'll use a database. Oh, that's more clever. 431 00:25:38,150 --> 00:25:43,710 But it's not fundamentally different.
And so that's why it's important for you 432 00:25:43,710 --> 00:25:49,690 to understand, before you leave this moment, to understand everything 433 00:25:49,690 --> 00:25:50,746 that we've covered so far. 434 00:25:50,746 --> 00:25:56,170 Loops, functions, strings, files, 435 00:25:56,170 --> 00:26:01,620 tuples, lists, dictionaries, because they are kind of the foundation and everything 436 00:26:01,620 --> 00:26:06,841 else will just kind of be a subtle refinement/improvement. 437 00:26:06,841 --> 00:26:09,730 So once you understand that you've kind of begun, you've 438 00:26:09,730 --> 00:26:13,140 become a basic programmer and I like, I like poof! 439 00:26:13,140 --> 00:26:15,120 Like I, I like 440 00:26:15,120 --> 00:26:19,680 magically asperio you and turn you into Pythonio, something like that. 441 00:26:19,680 --> 00:26:19,980 Okay. 442 00:26:19,980 --> 00:26:22,770 Enough with the Harry Potter reference. 443 00:26:22,770 --> 00:26:25,750 Thank you for spending all this time with me. 444 00:26:25,750 --> 00:26:27,910 If you've gotten this far, I really appreciate it. 445 00:26:29,720 --> 00:26:32,450 And of course it's really just the beginning but 446 00:26:32,450 --> 00:26:35,550 I hope that it has been a good beginning. 447 00:26:35,550 --> 00:26:35,890 Thank you.