1 00:00:00,160 --> 00:00:04,530 Hello, and welcome to Chapter Eight: Python Lists. 2 00:00:04,530 --> 00:00:08,400 So now we're sort of going to start taking care of business. 3 00:00:08,400 --> 00:00:10,530 We are doing, make lists and 4 00:00:10,530 --> 00:00:13,280 dictionaries and tuples and really start manipulating this data, 5 00:00:13,280 --> 00:00:16,290 and doing real data analysis, starting the, 6 00:00:16,290 --> 00:00:18,260 laying the proper work for real data analysis. 7 00:00:18,260 --> 00:00:21,950 As always, these lectures, audio, video, slides, 8 00:00:21,950 --> 00:00:25,740 and even book are copyright Creative Commons Attribution. 9 00:00:25,740 --> 00:00:31,030 So, lists, dictionaries, and tuples, the next real three big topics we're going to 10 00:00:31,030 --> 00:00:36,270 talk about, are collections. And we've been doing lists already, right? 11 00:00:37,340 --> 00:00:41,060 We've been doing lists when we were doing for loops. 12 00:00:41,060 --> 00:00:44,000 A list in Python is something that has square brackets. 13 00:00:44,000 --> 00:00:45,420 This is a constant list. 14 00:00:46,550 --> 00:00:48,410 Now, when I first talked to you 15 00:00:48,410 --> 00:00:50,530 about variables, I sort of oversimplified things. 16 00:00:50,530 --> 00:00:50,900 I said 17 00:00:50,900 --> 00:00:54,160 if you put like x equals two, and then put 18 00:00:54,160 --> 00:00:57,540 x equals four, the two and the four overwrite each other. 19 00:00:57,540 --> 00:01:01,890 A collection is where you can put a bunch of things in the same variable. 20 00:01:01,890 --> 00:01:04,130 Now, I have to have a way to find those things. 21 00:01:05,570 --> 00:01:08,820 But it allows us to put multiple things in 22 00:01:08,820 --> 00:01:11,810 more, more things, more than one thing in the variable. 23 00:01:11,810 --> 00:01:15,330 So, here we have friends, that has three strings, Joseph, Glenn, and Sally. 
24 00:01:15,330 --> 00:01:15,970 And we have carryon 25 00:01:15,970 --> 00:01:20,000 that has socks, shirt, and perfume. So that's the basic idea. 26 00:01:20,000 --> 00:01:21,680 So what's not a collection? 27 00:01:21,680 --> 00:01:23,440 Well, simple variables. 28 00:01:23,440 --> 00:01:26,610 Simple variables are not collections, just like this example. 29 00:01:26,610 --> 00:01:30,190 I say x equals 2, x equals 4, and print x, 30 00:01:30,190 --> 00:01:33,430 and the 4's in there and the 2 is somehow gone. 31 00:01:33,430 --> 00:01:35,570 It was there for a moment, and then it's gone. 32 00:01:36,740 --> 00:01:38,470 And so that's a normal variable. 33 00:01:38,470 --> 00:01:41,490 They're not collections. You can't put more than one thing in it. 34 00:01:41,490 --> 00:01:44,220 But when you put more than one thing in it, then you 35 00:01:44,220 --> 00:01:46,530 have to have a way to find the things that are in there. 36 00:01:46,530 --> 00:01:47,320 We'll, we'll get to that. 37 00:01:49,260 --> 00:01:51,880 So, we've been using list constants for the last couple 38 00:01:51,880 --> 00:01:55,120 of chapters just because we have to use list constants. 39 00:01:55,120 --> 00:01:59,040 You know, so we used, in the for loop chapter, we did lists of numbers. 40 00:02:00,520 --> 00:02:05,000 We have done lists of strings, that's strings, red, yellow, and blue. 41 00:02:06,460 --> 00:02:11,230 And you don't have to necessarily, you don't necessarily 42 00:02:11,230 --> 00:02:13,540 have to have things all of the same type. 43 00:02:13,540 --> 00:02:17,680 This is a three-item list, that has a string red, 44 00:02:17,680 --> 00:02:22,800 the number integer 24, and 98.6, which is a floating point number. 45 00:02:22,800 --> 00:02:25,810 And here's an interesting thing, just as a side note. 46 00:02:25,810 --> 00:02:28,040 This shows that floating point numbers are 47 00:02:28,040 --> 00:02:32,040 not always perfectly represented inside of the computer. 
48 00:02:32,040 --> 00:02:34,590 It's sort of an artifact of how they work. 49 00:02:34,590 --> 00:02:36,880 And this is an example of 98.6 is really 98 point 50 00:02:36,880 --> 00:02:38,980 na, na, na, na, na. 51 00:02:38,980 --> 00:02:41,260 So, but, don't, when you see something like that, don't freak out. 52 00:02:41,260 --> 00:02:43,710 Floating point numbers are the ones that show this behavior. 53 00:02:44,760 --> 00:02:48,340 So, interestingly, you can always, although we won't put a lot of energy into 54 00:02:48,340 --> 00:02:52,930 this, you can also have an element of a list be a list itself. 55 00:02:52,930 --> 00:02:55,630 So this is an outer list that's got three elements. 56 00:02:55,630 --> 00:02:57,710 1, 7, and then 57 00:02:57,710 --> 00:02:59,860 a list that's 5 and 6. 58 00:02:59,860 --> 00:03:04,470 So, if you look at the length of this, there are three things in it. 59 00:03:04,470 --> 00:03:05,850 Not four, three. 60 00:03:05,850 --> 00:03:08,520 Because the outer list has 1, 2, 3 things in it. 61 00:03:08,520 --> 00:03:12,480 And an empty list is bracket, bracket. 62 00:03:12,480 --> 00:03:13,340 Okay? 63 00:03:13,340 --> 00:03:17,180 Like I said, we have been going through lists all along. 64 00:03:17,180 --> 00:03:19,660 We have iteration variables for i in. 65 00:03:19,660 --> 00:03:22,205 This is a list. We've been using it all along. 66 00:03:22,205 --> 00:03:27,270 Similarly, we've been using lists in definite loops, are a 67 00:03:27,270 --> 00:03:30,340 great way to go through lists, for friend in friends, there we have 68 00:03:30,340 --> 00:03:34,402 goes through three times, out come three lines, with the 69 00:03:34,402 --> 00:03:38,520 variable friend advancing through the three successive items in the list. 70 00:03:38,520 --> 00:03:40,380 And away we go. 71 00:03:40,380 --> 00:03:44,116 So, again, lists are not completely foreign to us. 
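The list constants described so far can be sketched in a few lines. This is only an illustration: the variable names and most of the values are assumptions, and only the shapes of the lists (mixed types, a list inside a list, the empty list, a loop over friends) follow the lecture.

```python
# Illustrative values; the names below are assumptions, not the slide's.
numbers = [1, 24, 76]                 # a list of integers
colors = ["red", "yellow", "blue"]    # a list of strings
mixed = ["red", 24, 98.6]             # element types may differ
nested = [1, 7, [5, 6]]               # an element can itself be a list
empty = []                            # the empty list: bracket, bracket

# The inner list counts as one element of the outer list.
nested_length = len(nested)
print(nested_length)

# A definite loop visits each item in order.
friends = ["Joseph", "Glenn", "Sally"]
for friend in friends:
    print(friend)
```

The length of the nested list comes out as three because len counts only the elements of the outer list.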
72 00:03:44,116 --> 00:03:45,541 Now, 73 00:03:45,541 --> 00:03:52,520 just like in a string, we can use the index operator, 74 00:03:52,520 --> 00:03:56,990 the square bracket operator, and we can look up items in the list. 75 00:03:56,990 --> 00:03:59,300 Sub one, friends, sub one. 76 00:04:00,330 --> 00:04:03,780 Not surprisingly, using the European elevator rule, 77 00:04:06,090 --> 00:04:09,130 the first item in a list is sub zero, the second 78 00:04:09,130 --> 00:04:11,570 item is sub one and the third one is sub two. 79 00:04:11,570 --> 00:04:15,150 So here when I print friends sub one I get Glenn. 80 00:04:15,150 --> 00:04:18,420 Which is the second element. Just like strings. 81 00:04:18,420 --> 00:04:20,630 So once you kind of know it for strings, lists 82 00:04:20,630 --> 00:04:22,590 and the rest of these things make a lot more sense. 83 00:04:22,590 --> 00:04:26,060 Just, remember that we're in Europe, and things start with zero. 84 00:04:27,760 --> 00:04:31,813 Some things in these data items that we work with are not mutable. 85 00:04:31,813 --> 00:04:34,423 So for example, strings, when we ask for a lower case 86 00:04:34,423 --> 00:04:37,247 version of a string, we're given a copy of that string. 87 00:04:37,247 --> 00:04:41,547 And that's because strings are not mutable, and we can see this 88 00:04:41,547 --> 00:04:46,550 by doing something like saying fruit sub 0 equals lowercase b. 89 00:04:46,550 --> 00:04:49,620 Now you'd think that that would just change this 90 00:04:49,620 --> 00:04:53,652 to be a lower case b, but it doesn't, okay? 91 00:04:53,652 --> 00:04:57,340 It says string object does not support item assignment 92 00:04:57,340 --> 00:05:00,420 which means that you're not allowed to reassign. 93 00:05:00,420 --> 00:05:03,200 You can make a new string and put different things in 94 00:05:03,200 --> 00:05:06,820 that new string, but once the strings are made, they're not changeable. 
95 00:05:06,820 --> 00:05:12,220 And that's why when we call fruit.lower, we get a copy of it in lower case. 96 00:05:12,220 --> 00:05:14,860 And so x is a copy of the original string, but 97 00:05:14,860 --> 00:05:18,150 the original string, once we assign it into fruit, is unchanged. 98 00:05:18,150 --> 00:05:19,080 It can't be changed. 99 00:05:20,340 --> 00:05:22,380 Lists, on the other hand, can be changed, and we 100 00:05:22,380 --> 00:05:23,470 can change them in the middle. 101 00:05:23,470 --> 00:05:26,230 This is one of the things we like about them. 102 00:05:26,230 --> 00:05:29,320 So here we have a list: 2, 14, 26, 41, and 63. 103 00:05:29,320 --> 00:05:31,130 Then we say lotto sub two. 104 00:05:31,130 --> 00:05:33,670 Of course, that's going to be the third item. 105 00:05:33,670 --> 00:05:35,690 Lotto sub two is equal to 28. 106 00:05:35,690 --> 00:05:38,380 Then we print it and we see the new number there. 107 00:05:38,380 --> 00:05:41,190 So all this is saying is that we can change them, right? 108 00:05:41,190 --> 00:05:44,640 Strings no, and lists yes. 109 00:05:44,640 --> 00:05:47,540 You can change lists, but you can't change strings. 110 00:05:49,230 --> 00:05:52,480 So the len function, we've used it for several 111 00:05:52,480 --> 00:05:55,540 things, we can say you know, use, len is 112 00:05:55,540 --> 00:05:58,270 used for, for strings and it's used for lists as well. 113 00:05:58,270 --> 00:06:01,040 So the same function knows when its 114 00:06:01,040 --> 00:06:03,070 parameter is a string. And when its parameter is a string, 115 00:06:03,070 --> 00:06:05,030 it gives us the number of characters in the string. 116 00:06:05,030 --> 00:06:07,390 And when it is a list, it gives us 117 00:06:07,390 --> 00:06:10,640 the number of elements in the list. 118 00:06:10,640 --> 00:06:14,310 And just because one of them is a string, it's still one element from the point 119 00:06:14,310 --> 00:06:15,950 of view of this list. 
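A short sketch of the difference between strings and lists described above. The string "Banana", the greeting, and the four-item list are assumed values chosen for illustration; the lotto numbers are the ones read out in the lecture.

```python
fruit = "Banana"              # assumed example string
try:
    fruit[0] = "b"            # strings do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

x = fruit.lower()             # lower() hands back a new string
print(fruit, x)               # the original string is unchanged

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                 # lists can be changed in place
print(lotto)

greet = "Hello Bob"           # assumed example string
items = [1, 2, "joe", 99]     # assumed example list; the string is one element
print(len(greet))             # number of characters
print(len(items))             # number of elements
```

The try/except is there only so the sketch keeps running after the failed assignment; typed interactively, the assignment simply stops with a traceback.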
120 00:06:15,950 --> 00:06:20,925 So it has one, two, three, four - four items in the list, okay? 121 00:06:24,870 --> 00:06:27,580 So, the range function is a special function. 122 00:06:27,580 --> 00:06:30,140 It's probably about time to talk about the range function. 123 00:06:31,350 --> 00:06:34,350 The range function is a function that generates a list, that 124 00:06:34,350 --> 00:06:37,210 produces a list and gives it back to us. 125 00:06:37,210 --> 00:06:38,870 And so you give the range function a 126 00:06:38,870 --> 00:06:42,170 parameter, how many items you want, and the range 127 00:06:42,170 --> 00:06:46,150 function creates and gives us back a list that 128 00:06:46,150 --> 00:06:49,960 is four numbers starting at zero, which is zero 129 00:06:49,960 --> 00:06:53,970 up to, but not including three. Sound familiar? 130 00:06:53,970 --> 00:06:54,390 Yeah. 131 00:06:54,390 --> 00:06:58,460 Zero up to but not, I mean zero up to, but not including four. 132 00:06:58,460 --> 00:07:04,630 And, and so the same thing is true here. So, we can combine the len and the range 133 00:07:04,630 --> 00:07:10,071 to say, you know, to say okay, well len friends, that's three 134 00:07:10,071 --> 00:07:15,400 items, and range len friends is 0, 1, 2. And it also 135 00:07:15,400 --> 00:07:22,620 corresponds exactly to these items. So we can actually use this 136 00:07:22,620 --> 00:07:30,940 to construct loops to go through a list. We already have a basic for loop, right? 137 00:07:30,940 --> 00:07:34,290 We basically have a for loop that is our, 138 00:07:34,290 --> 00:07:38,670 that, that said that for each friend in friends. 139 00:07:38,670 --> 00:07:41,220 And out comes, Happy New Year, Glenn and Joseph. 140 00:07:41,220 --> 00:07:45,070 If we also want to know where, what position we're at as 141 00:07:45,070 --> 00:07:50,040 the loop progresses, we can rewrite the exact same loop a different way. 
142 00:07:50,040 --> 00:07:52,950 And make i be our iteration variable. 143 00:07:52,950 --> 00:07:59,250 And say i in range(len(friends)), that turns this into zero, one, two. 144 00:07:59,250 --> 00:08:01,530 And then i goes zero, one, two. 145 00:08:01,530 --> 00:08:03,280 And then, we can in the loop, look up the 146 00:08:03,280 --> 00:08:06,540 particular friend that is the particular one we are interested in, 147 00:08:06,540 --> 00:08:10,670 using the index operator, friend sub i. 148 00:08:10,670 --> 00:08:12,280 And then print Happy New Year. 149 00:08:12,280 --> 00:08:13,660 So these two loops, 150 00:08:15,830 --> 00:08:20,335 these two loops are equivalent. These, oop, not that one. 151 00:08:20,335 --> 00:08:25,460 [SOUND] This loop and this loop. This loop is 152 00:08:25,460 --> 00:08:30,720 preferred, unless you happen to need this value i, which tells you where you're at. 153 00:08:30,720 --> 00:08:32,490 In case maybe you're going to change something, you're 154 00:08:32,490 --> 00:08:34,760 going to look through something and then change it. 155 00:08:34,760 --> 00:08:39,070 So, but, but, for what I've written here, they're exactly equivalent. 156 00:08:39,070 --> 00:08:41,070 Prefer the simpler one, unless you need 157 00:08:41,070 --> 00:08:44,370 the more complex one. They both produce the same kind of output. 158 00:08:46,170 --> 00:08:50,090 We can concatenate lists, much like we concatenate strings, with plus. 159 00:08:53,300 --> 00:08:59,560 And you can think of the Python operator's looking to its right and to its left and 160 00:08:59,560 --> 00:09:02,270 saying oh, those are both lists, I know what 161 00:09:02,270 --> 00:09:04,560 to do with lists, I'm going to put those together. 162 00:09:04,560 --> 00:09:08,200 And so that produces a two, three-long lists become a six-long 163 00:09:08,200 --> 00:09:12,100 list with the first one followed by the second one concatenated. 
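The two equivalent loops discussed above might look like the sketch below. One assumption to flag: the lecture describes range as returning a list, which matches Python 2; in Python 3 range returns a lazy range object, so list() is used here just to display it.

```python
friends = ["Joseph", "Glenn", "Sally"]

# Python 3: wrap range() in list() to see the numbers it produces.
positions = list(range(len(friends)))
print(positions)

# Simpler form: iterate over the items directly.
for friend in friends:
    print("Happy New Year:", friend)

# Index form: useful when the position i is also needed.
for i in range(len(friends)):
    friend = friends[i]
    print("Happy New Year:", friend)
```

Both loops print the same three lines; the index form only earns its extra complexity when i itself is used, for example to change an element in place.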
164 00:09:12,100 --> 00:09:15,710 It didn't hurt the original, a. c is a new list, basically. 165 00:09:19,040 --> 00:09:22,530 We can also slice lists. Feels a lot like strings, right? 166 00:09:22,530 --> 00:09:24,030 Everything's kind of like strings. 167 00:09:24,030 --> 00:09:28,330 For loops like strings, concatenation like strings, and now slicing like strings. 168 00:09:28,330 --> 00:09:30,020 And it is exactly the same. 169 00:09:32,300 --> 00:09:37,810 So one up to, but not including. Just remember, up to, but not including. 170 00:09:37,810 --> 00:09:41,830 The second parameter, is up to but not including, so that starts at the sub one, 171 00:09:41,830 --> 00:09:47,950 which is the second one up to but not including 3, the third one, so. 172 00:09:47,950 --> 00:09:50,910 This is 1, 2, and 3 so that's 41 comma 12. 173 00:09:50,910 --> 00:09:55,320 Starting at the first one, up to but not including the third one. 174 00:09:58,650 --> 00:10:01,570 We can similarly eliminate the first one, 175 00:10:01,570 --> 00:10:04,410 so that's up to but not including the fourth one. 176 00:10:04,410 --> 00:10:08,590 Starting at zero, one, two, three, but not including four. 177 00:10:08,590 --> 00:10:13,651 So that's this one. If we go three to the end, and again, 178 00:10:13,651 --> 00:10:21,020 remember that they're starting at 0, so 3 to the end is 0, 1, 2, 3 to the end. 179 00:10:21,020 --> 00:10:23,540 The number 3 doesn't matter. So that's 3, 74, 15. 180 00:10:23,540 --> 00:10:24,290 And the 181 00:10:25,710 --> 00:10:29,300 whole thing, that's the whole thing, so these two things are the same. 182 00:10:29,300 --> 00:10:33,100 So slicing works like strings, starting and up 183 00:10:33,100 --> 00:10:34,760 to but not including is the second parameter. 184 00:10:36,400 --> 00:10:38,570 There are some methods, and you can 185 00:10:38,570 --> 00:10:43,020 read about these online in the Python documentation. 
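Concatenation and slicing, as just described, can be tried out with the sketch below. The values of a and b are assumptions (any two three-item lists would do). The list t is reconstructed from the numbers read aloud in the lecture, with 9 and 12 filled in so that the total matches the 154 quoted later; treat those two values as an inference.

```python
a = [1, 2, 3]                 # assumed values
b = [4, 5, 6]                 # assumed values
c = a + b                     # a new six-item list
print(c)
print(a)                      # the original list is untouched

t = [9, 41, 12, 3, 74, 15]    # reconstructed from the lecture's numbers
print(t[1:3])                 # position 1 up to but not including 3
print(t[:4])                  # start up to but not including 4
print(t[3:])                  # position 3 to the end
print(t[:])                   # the whole list
```

As with strings, the second number in a slice is always "up to but not including".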
186 00:10:43,020 --> 00:10:44,820 We can use the built-in dir function. 187 00:10:44,820 --> 00:10:48,140 It doesn't have a lot of use in sort of how 188 00:10:48,140 --> 00:10:50,590 we run, when we're running programs but it's kind of useful. 189 00:10:50,590 --> 00:10:51,890 I like it when I'm typing 190 00:10:51,890 --> 00:10:54,440 interactively. Like, what can this thing do? 191 00:10:54,440 --> 00:10:58,120 So I make a list, list is a unique type, and 192 00:10:58,120 --> 00:11:00,340 I say, with dir I say what can we do with it? 193 00:11:00,340 --> 00:11:04,170 Well, we can append, we can count, extend, index, insert, pop, remove, reverse 194 00:11:04,170 --> 00:11:08,300 and sort. And then you can sort of read up on all these things. 195 00:11:08,300 --> 00:11:13,889 I'll show you just a couple. We can build a list with the append. 196 00:11:14,900 --> 00:11:16,100 So this syntax here, 197 00:11:16,100 --> 00:11:19,270 stuff equals list, that's called a constructor 198 00:11:19,270 --> 00:11:21,060 which says give me an empty list. 199 00:11:22,440 --> 00:11:26,280 You could also say bracket, bracket for an empty list. 200 00:11:26,280 --> 00:11:30,060 Whatever, you gotta make an empty list and then you call the append. 201 00:11:30,060 --> 00:11:33,210 Remember that lists are mutable, so it's okay to change it. 202 00:11:33,210 --> 00:11:35,530 So we're saying, okay, we started with an empty list. 203 00:11:35,530 --> 00:11:38,210 Now append to the end of that, the word book. 204 00:11:38,210 --> 00:11:39,910 And then append to that, 99. 205 00:11:39,910 --> 00:11:44,040 Wait a sec. 206 00:11:44,040 --> 00:11:44,860 That's a mistake. 207 00:11:49,110 --> 00:11:52,350 That's a mistake. So I have to fix this mistake. 208 00:11:52,350 --> 00:11:55,440 So watch me fix the mistake. Poof. 209 00:11:57,830 --> 00:12:00,680 Now my thing is magically fixed. Isn't that amazing. 210 00:12:00,680 --> 00:12:03,960 I have magic powers when it comes to slide fixing. 
211 00:12:03,960 --> 00:12:07,370 I just snap my fingers and the slides are fixed. 212 00:12:07,370 --> 00:12:07,900 So here we go. 213 00:12:07,900 --> 00:12:10,220 We append the 99, and we print it out. 214 00:12:10,220 --> 00:12:13,920 And it's got book and 99, emphasizing the fact that they don't 215 00:12:13,920 --> 00:12:16,780 have to be the exact same kind of thing in a list. 216 00:12:16,780 --> 00:12:20,450 Then later we append cookie and then it's book, 99, cookie. 217 00:12:20,450 --> 00:12:22,910 Okay? So this append, we won't do it in line 218 00:12:22,910 --> 00:12:25,730 like this so often, we'll tend to do it in a loop as we're building up a 219 00:12:25,730 --> 00:12:27,370 list, but that's the way you start with 220 00:12:27,370 --> 00:12:30,630 an empty list and then [SOUND] programmatically grow it. 221 00:12:33,350 --> 00:12:38,410 We can ask, much like we do in a string, we can ask if an item is in a list. 222 00:12:38,410 --> 00:12:41,280 So here is a list called some, with these numbers in it. 223 00:12:41,280 --> 00:12:42,910 It's got five numbers in it. 224 00:12:42,910 --> 00:12:45,980 Is nine in some? True, yes it is. 225 00:12:45,980 --> 00:12:48,780 Is 15 in some? False. 226 00:12:48,780 --> 00:12:55,300 Is 20 not in, that's a leg, a legal syntax, that is legal syntax. 227 00:12:55,300 --> 00:12:58,280 Is 20 not in some, yes it's not there, okay? 228 00:12:58,280 --> 00:13:02,910 They don't modify the list, don't modify the list, they're just asking questions. 229 00:13:02,910 --> 00:13:06,260 These are logical operations often used in if statements or 230 00:13:06,260 --> 00:13:10,330 while, some kind of a logic that you might be building. 231 00:13:12,050 --> 00:13:14,990 Okay, so lists have order. 232 00:13:14,990 --> 00:13:17,130 So when we were appending them, the first thing went 233 00:13:17,130 --> 00:13:20,730 in first, the second thing went in second, et cetera, et cetera. 
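Building a list with append and then asking membership questions might look like this sketch. The five numbers in some are assumed values, picked only so that 9 is present while 15 and 20 are not, as in the lecture.

```python
stuff = list()            # constructor; stuff = [] does the same thing
stuff.append("book")
stuff.append(99)
print(stuff)
stuff.append("cookie")
print(stuff)

some = [1, 9, 21, 10, 16]     # assumed values
print(9 in some)
print(15 in some)
print(20 not in some)
```

The in and not in checks only ask a question; some is the same list afterwards.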
234 00:13:20,730 --> 00:13:23,380 And we can also tell the list to sort itself. 235 00:13:23,380 --> 00:13:25,650 So one of the things that we can do with a list, 236 00:13:25,650 --> 00:13:28,780 now we're starting to see some power here, is say, sort yourself. 237 00:13:28,780 --> 00:13:30,186 This is a list of strings. 238 00:13:30,186 --> 00:13:33,105 It can sort numbers, it can sort lots of things. 239 00:13:33,105 --> 00:13:38,550 friends.sort, that says hey there, dear friends, sort yourself. 240 00:13:38,550 --> 00:13:40,080 This makes a change. 241 00:13:42,540 --> 00:13:44,670 It alters the list, and puts it, in 242 00:13:44,670 --> 00:13:48,010 this case, in alphabetical order, Glenn, Joseph, and Sally. 243 00:13:48,010 --> 00:13:51,780 It is mutated, it was, it's, it's been modified, and so 244 00:13:51,780 --> 00:13:54,660 friends sub one is now Joseph because that's the second one. 245 00:13:54,660 --> 00:13:55,850 Okay? 246 00:13:55,850 --> 00:14:00,000 So the sort method says sort yourself now, 247 00:14:00,000 --> 00:14:03,680 sort yourself, and it sorts and then it stays sorted. 248 00:14:06,720 --> 00:14:10,590 So [COUGH] 249 00:14:10,590 --> 00:14:13,260 you're going to be kind of ticked about this particular slide. 250 00:14:13,260 --> 00:14:16,790 Because there's a whole bunch of built-in functions that help with lists. 251 00:14:16,790 --> 00:14:22,260 And, there's max, there's min, there's len, various things. 252 00:14:22,260 --> 00:14:24,520 And so we could, all those loops that I told you how to 253 00:14:24,520 --> 00:14:29,646 do, I was just showing you that stuff because I thought it was important. 254 00:14:29,646 --> 00:14:31,854 This is the simplest way to go through and 255 00:14:31,854 --> 00:14:35,230 find the largest, smallest, and sum, et cetera. 256 00:14:35,230 --> 00:14:36,860 So here's a list of numbers. 257 00:14:38,150 --> 00:14:39,560 We can say how many are there. 258 00:14:39,560 --> 00:14:43,060 That's the count. 
We can say what's the largest, it's 74. 259 00:14:43,060 --> 00:14:45,960 What's the smallest, that'd be 3. 260 00:14:45,960 --> 00:14:49,080 What is the sum of the running total of them all? 154. 261 00:14:49,080 --> 00:14:52,310 If you remember from a few lectures ago, these are the same numbers. 262 00:14:52,310 --> 00:14:56,880 And what is the average, which is, sum of them over the length of them, 263 00:14:56,880 --> 00:14:58,120 Okay? 264 00:14:58,120 --> 00:15:00,960 So this makes a lot more sense and if you had a list of numbers 265 00:15:00,960 --> 00:15:04,506 like this, you would simply say what's the max, you wouldn't write a max loop. 266 00:15:04,506 --> 00:15:06,945 I just did that to kind of demonstrate how loops work. 267 00:15:06,945 --> 00:15:09,590 [COUGH] Demonstrate how loops work. 268 00:15:09,590 --> 00:15:12,360 So here is a way that you can sort 269 00:15:12,360 --> 00:15:16,580 of change those kind of programs that we wrote. 270 00:15:16,580 --> 00:15:19,780 So there's two ways to write a summing program. 271 00:15:19,780 --> 00:15:22,100 Let's just say instead of the data being 272 00:15:22,100 --> 00:15:26,370 in a list, we're going to write a while loop that's going to read a 273 00:15:26,370 --> 00:15:31,250 set of numbers until we say done, and then compute the average of those numbers. 274 00:15:31,250 --> 00:15:32,728 Okay, so let's say this is our problem. 275 00:15:32,728 --> 00:15:38,220 Read a list of numbers, wait till the word done comes in, and then average them. 276 00:15:38,220 --> 00:15:40,450 So here's a little program that does that. 277 00:15:40,450 --> 00:15:43,250 We create total equals zero, count equals zero. 278 00:15:43,250 --> 00:15:46,120 Make a infinite loop with while True. 279 00:15:46,120 --> 00:15:47,520 And then we ask 280 00:15:47,520 --> 00:15:48,810 to enter a number. 
281 00:15:48,810 --> 00:15:51,750 We get a string back from this, remember raw_input always 282 00:15:51,750 --> 00:15:56,790 gives us strings back, and then if it's done, we're going to break. 283 00:15:56,790 --> 00:15:59,770 This is the version of the if that does not require an indent. 284 00:15:59,770 --> 00:16:01,570 We just put the break up there. 285 00:16:01,570 --> 00:16:04,080 And so that gets us out of the loop when the time is right. 286 00:16:04,080 --> 00:16:06,020 So when the time is right over here. 287 00:16:06,020 --> 00:16:09,810 And then, we convert the value to float. 288 00:16:09,810 --> 00:16:12,830 We use a float to convert the input to a floating point number. 289 00:16:12,830 --> 00:16:15,130 And then we do our accumulation pattern, 290 00:16:15,130 --> 00:16:18,110 total equals total plus value, count equals count plus one. 291 00:16:18,110 --> 00:16:19,070 So this is going to run. 292 00:16:19,070 --> 00:16:21,230 These numbers are going to go up and up and up and up. 293 00:16:21,230 --> 00:16:22,880 And then we're going to break out of it, 294 00:16:22,880 --> 00:16:25,980 calculate the average, and then print the average. 295 00:16:25,980 --> 00:16:29,850 Because that's a floating point number, so now the average is a floating point number. 296 00:16:29,850 --> 00:16:31,070 So that's one way to do it. 297 00:16:31,070 --> 00:16:31,390 Right? 298 00:16:31,390 --> 00:16:34,570 That would be one way to write a program 299 00:16:34,570 --> 00:16:37,990 that does an average, is keep a running average 300 00:16:37,990 --> 00:16:38,999 as you're reading the numbers. 301 00:16:40,060 --> 00:16:44,080 But there's another way to do it, that would exact, work exactly 302 00:16:44,080 --> 00:16:47,508 the same way, and this is when you can start using lists. 303 00:16:47,508 --> 00:16:51,560 So you come in, you say I'm going to make a list 304 00:16:51,560 --> 00:16:56,810 of numbers, just a mnemonic name, numlist, is an empty list. 
305 00:16:56,810 --> 00:17:02,070 Then I create another infinite loop that's going to read for enter a number. 306 00:17:02,070 --> 00:17:03,460 And if it's done, break. 307 00:17:03,460 --> 00:17:08,650 That gets us out of it. Convert the value to an int. 308 00:17:08,650 --> 00:17:12,400 Convert the value to a float, the input value to a float. 309 00:17:12,400 --> 00:17:14,440 And then append it to the list. 310 00:17:14,440 --> 00:17:16,580 So now the list is going to grow, each time 311 00:17:16,580 --> 00:17:18,820 we read a number the list is going to grow. 312 00:17:18,820 --> 00:17:21,420 However many times we add the number is 313 00:17:21,420 --> 00:17:23,410 how many things are going to be in the list. 314 00:17:23,410 --> 00:17:25,730 So in this case, when we're at this point and we 315 00:17:25,730 --> 00:17:28,540 type done, there will be three numbers in the list, because we 316 00:17:28,540 --> 00:17:32,560 will have run append three times. We'll have appended 3, 9, and 5. 317 00:17:32,560 --> 00:17:37,160 We'll have them sitting in a list. And we will have exited the loop. 318 00:17:37,160 --> 00:17:39,360 So now you say, oh add up all the numbers in 319 00:17:39,360 --> 00:17:42,720 that list, and then divide it by the length of the list. 320 00:17:42,720 --> 00:17:43,960 And print the average. 321 00:17:43,960 --> 00:17:47,290 So these two programs are basically equivalent. 322 00:17:47,290 --> 00:17:48,620 The only time that they might not be 323 00:17:48,620 --> 00:17:54,120 equivalent was if there was ten million numbers. 324 00:17:54,120 --> 00:17:59,260 This would use up 40 megabytes of your memory, which 325 00:17:59,260 --> 00:18:01,230 is actually not a lot of memory on some computers. 326 00:18:01,230 --> 00:18:05,180 But if memory mattered, this does store all those numbers. 327 00:18:05,180 --> 00:18:07,680 This one actually just runs the calculation. 
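The two averaging programs compared above can be sketched side by side. To keep the sketch runnable without a keyboard, the typed lines are passed in as a list of strings; that list and the function names average_running and average_with_list are assumptions made for illustration. In the lecture the strings come from raw_input, which is called input in Python 3.

```python
def average_running(entries):
    """Accumulator pattern: keep only a running total and a count."""
    total = 0.0
    count = 0
    for inp in entries:
        if inp == "done":
            break
        value = float(inp)
        total = total + value
        count = count + 1
    return total / count


def average_with_list(entries):
    """List pattern: store every value, then compute at the end."""
    numlist = []
    for inp in entries:
        if inp == "done":
            break
        value = float(inp)
        numlist.append(value)
    return sum(numlist) / len(numlist)


typed = ["3", "9", "5", "done"]   # stands in for what the user types
print(average_running(typed))
print(average_with_list(typed))
```

Neither version guards against the case where done is entered before any number, which would divide by zero. The list version keeps every value in memory, which is what makes later steps such as max, min, or sort possible.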
328 00:18:07,680 --> 00:18:11,660 So if there's a really large number of numbers, this would make a difference, 329 00:18:11,660 --> 00:18:15,660 because the list is growing and keeping them all, summing them all at the end. 330 00:18:15,660 --> 00:18:17,350 This is actually storing very little data. 331 00:18:18,430 --> 00:18:20,600 But for reasonably sized numbers, 332 00:18:20,600 --> 00:18:24,120 like thousands or even hundreds of thousands of numbers, these 333 00:18:24,120 --> 00:18:28,960 two approaches are kind of equivalent. And then sometimes you actually 334 00:18:28,960 --> 00:18:32,070 want to accumulate something a little more complex than this, you want to 335 00:18:32,070 --> 00:18:35,320 sort them or look for the maximum and look for something else. 336 00:18:35,320 --> 00:18:37,430 Who knows what, but the notion of make a 337 00:18:37,430 --> 00:18:39,830 list and then append something to the list 338 00:18:39,830 --> 00:18:42,380 each time through the iteration, and then do something with 339 00:18:42,380 --> 00:18:45,410 the list at the end is a rather powerful pattern. 340 00:18:45,410 --> 00:18:48,720 So this is also a powerful pattern, this is accumulator 341 00:18:48,720 --> 00:18:51,900 pattern where we just have the variables accumulating in the loop. 342 00:18:51,900 --> 00:18:55,040 This one is one where we accumulate the data in 343 00:18:55,040 --> 00:18:58,170 the loop and then do the computations all at the end. 344 00:18:58,170 --> 00:19:02,050 The, certain situations will make use of these different techniques. 345 00:19:03,130 --> 00:19:09,020 Okay. So, connecting strings and lists. 346 00:19:09,020 --> 00:19:11,830 So there's a method, a capability 347 00:19:11,830 --> 00:19:16,190 of strings that is really powerful when it comes to tearing data apart. 348 00:19:18,880 --> 00:19:23,110 It's called the split. So here is a string 349 00:19:23,110 --> 00:19:26,858 with three words and it has blanks in between here. 
350 00:19:26,858 --> 00:19:33,720 And abc.split says parse this string, 351 00:19:33,720 --> 00:19:38,690 look for the blanks, break the string into pieces, and give me back a 352 00:19:38,690 --> 00:19:43,920 list with one item for each of the words in the list as 353 00:19:43,920 --> 00:19:47,200 defined by the spaces. Okay? 354 00:19:47,200 --> 00:19:53,150 So, it takes, breaks it into three pieces and gives us that back in a list. 355 00:19:53,150 --> 00:19:55,870 This is very powerful. Okay? 356 00:19:55,870 --> 00:19:58,340 So we're going to split it and we get back a list. 357 00:19:58,340 --> 00:20:04,180 There are three words, and the first word, stuff sub zero, is With. 358 00:20:04,180 --> 00:20:06,200 So there's a lot of parsing going on here. 359 00:20:06,200 --> 00:20:09,180 We could do this with for loops and a lot of other things. 360 00:20:09,180 --> 00:20:11,240 There would be a lot of work in this split. 361 00:20:11,240 --> 00:20:14,180 Given that this is a really common task, it's really 362 00:20:14,180 --> 00:20:17,970 great that this has been put into Python for us. 363 00:20:17,970 --> 00:20:19,350 Okay? 364 00:20:19,350 --> 00:20:22,850 So split breaks a string into parts and produces a list of strings. 365 00:20:22,850 --> 00:20:25,630 We think of these as words, we can access a 366 00:20:25,630 --> 00:20:28,040 particular word or we can loop through all the words. 367 00:20:28,040 --> 00:20:31,050 So here we have stuff again and now we have a, a for loop 368 00:20:32,050 --> 00:20:35,070 for each of the, that's going to go through each of the three words. 369 00:20:35,070 --> 00:20:36,370 And then it's going to run three times. 370 00:20:36,370 --> 00:20:37,410 Now chances are good we're going to do 371 00:20:37,410 --> 00:20:39,600 something different other than just print them out. 
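The basic split pattern described above, as a small sketch. The exact string is an assumption; the lecture only says it has three words and that the first one is With.

```python
abc = "With three words"      # assumed example string
stuff = abc.split()           # break on whitespace, get a list of strings
print(stuff)
print(len(stuff))
print(stuff[0])

# Loop through the words without searching for spaces by hand.
for w in stuff:
    print(w)
```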
372 00:20:39,600 --> 00:20:44,450 But you see how that you quickly can take a split followed by a for, and then write 373 00:20:44,450 --> 00:20:45,720 a loop that's going to go through each of the 374 00:20:45,720 --> 00:20:48,360 words, without working too hard to find the spaces. 375 00:20:48,360 --> 00:20:52,574 You let Python do all the hard work of finding the spaces. 376 00:20:52,574 --> 00:20:53,375 Okay? 377 00:20:53,375 --> 00:20:56,350 So let's take a look at a couple of samples. 378 00:20:58,130 --> 00:21:00,480 Just a couple of things to teach you a little more about split. 379 00:21:01,510 --> 00:21:05,570 Split looks at many spaces as equal to one space. 380 00:21:07,500 --> 00:21:10,810 So, if you split a lot blank, blank, blank of spaces, it's 381 00:21:10,810 --> 00:21:14,480 still just throws away all the spaces and gives us four words. 382 00:21:15,750 --> 00:21:20,480 One, two, three, four and throws away all the spaces, 383 00:21:20,480 --> 00:21:21,900 because it assumes that's what we want done. 384 00:21:21,900 --> 00:21:22,535 So that's nice. 385 00:21:22,535 --> 00:21:26,916 You can also have split, you can also have split, 386 00:21:26,916 --> 00:21:30,310 split on some other character. Sometimes you'll be getting data 387 00:21:30,310 --> 00:21:33,090 and they'll have used a semicolon, or a comma, or 388 00:21:33,090 --> 00:21:36,000 a colon, or a tab character, who knows what they've 389 00:21:36,000 --> 00:21:39,400 used, and your job is to dig that data out. 390 00:21:39,400 --> 00:21:42,900 So you can split, based on the different character. 391 00:21:42,900 --> 00:21:47,070 Here, if we're splitting normally with, with this is a normal split. 392 00:21:47,070 --> 00:21:49,800 It's not going to see the semicolons, it's looking for a space. 393 00:21:49,800 --> 00:21:52,880 And so all we get back is one 394 00:21:52,880 --> 00:21:55,220 item in the string, with the semicolons. 
395 00:21:55,220 --> 00:21:58,520 But, if we switch, and we pass semicolon 396 00:21:58,520 --> 00:22:01,080 as a parameter, in as a parameter to split, 397 00:22:01,080 --> 00:22:03,090 then it will know to split it based on 398 00:22:03,090 --> 00:22:06,450 semicolons, and gives us first, second, and third back. 399 00:22:07,520 --> 00:22:07,820 Okay? 400 00:22:07,820 --> 00:22:09,940 And then it gives us three words. 401 00:22:09,940 --> 00:22:13,640 So you can split either on spaces, or you 402 00:22:13,640 --> 00:22:17,490 can split on a character other than a space. 403 00:22:17,490 --> 00:22:18,040 Okay? 404 00:22:18,040 --> 00:22:20,400 [COUGH] 405 00:22:20,400 --> 00:22:25,230 So, let's take a look at how we might turn this into some of our common assignments 406 00:22:25,230 --> 00:22:32,420 that we have in this chapter, where we're going to read some of the mailbox data. Okay? 407 00:22:33,420 --> 00:22:36,720 So, here we go with a little program. 408 00:22:36,720 --> 00:22:41,170 First three lines, we write these a lot. Open the file. 409 00:22:41,170 --> 00:22:43,090 Write a for loop to loop through each 410 00:22:43,090 --> 00:22:44,870 line in the file. 411 00:22:44,870 --> 00:22:48,100 Then we're going to strip off the white space at the end of the line. 412 00:22:48,100 --> 00:22:50,990 One, two, three. Do those all the time. 413 00:22:50,990 --> 00:22:54,990 And we're looking for lines, if you look at the whole file, 414 00:22:54,990 --> 00:22:58,170 we're looking for lines that start with from, followed by a space. 415 00:22:58,170 --> 00:23:00,420 So if the line does not start with from 416 00:23:00,420 --> 00:23:03,700 followed by a space, that's a space right there, continue. 417 00:23:03,700 --> 00:23:08,460 So that's a way to skip all the lines that don't look like this. 418 00:23:08,460 --> 00:23:12,490 There're thousands of lines in this file and just a few that look like this. Okay? 
419 00:23:12,490 --> 00:23:17,110 So we're going to look and we're going to try 420 00:23:17,110 --> 00:23:22,790 to find what day of the week this thing happened on. 421 00:23:22,790 --> 00:23:27,700 So we're throwing away all the other lines with this little bit of code. 422 00:23:27,700 --> 00:23:32,820 Then what we do is we take the line, which is all of this text, and then we split it. 423 00:23:34,110 --> 00:23:38,270 And we know that the day of the week is words sub two. 424 00:23:38,270 --> 00:23:43,080 So this is words sub zero, this is words sub one, and this is words sub two. 425 00:23:43,080 --> 00:23:46,480 So this is words sub zero, sub one, and sub two. 426 00:23:46,480 --> 00:23:48,550 And so, all we have to do is print out the sub two 427 00:23:48,550 --> 00:23:53,740 and we get, we throw away all the lines except the from lines. 428 00:23:53,740 --> 00:23:56,650 We split them and take the 429 00:23:56,650 --> 00:23:59,330 third word or words sub two and we 430 00:23:59,330 --> 00:24:02,260 can quickly create something that's extracting 431 00:24:02,260 --> 00:24:04,060 the day of the week out of these. 432 00:24:06,030 --> 00:24:07,400 Okay? 433 00:24:07,400 --> 00:24:11,890 So it's, it's, I mean, it's quick, because split does the tricky work. 434 00:24:11,890 --> 00:24:15,140 If you go back to the strings chapter, you saw that 435 00:24:15,140 --> 00:24:16,910 we did a lot of work to get this to happen. 436 00:24:17,950 --> 00:24:21,040 So here's even another tricky pattern. 437 00:24:21,040 --> 00:24:26,510 So let's say we want to do what we did at the end of Chapter Six, 438 00:24:26,510 --> 00:24:28,120 the string chapter. 439 00:24:28,120 --> 00:24:30,870 Let's say we wanted to get back this little bit of data. 440 00:24:32,130 --> 00:24:33,330 Okay? 441 00:24:33,330 --> 00:24:37,310 So, we can look at this and say, okay, let's split this. 
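The day-of-the-week program walked through above can be sketched roughly like this. To keep it self-contained, a short list of made-up lines stands in for the mailbox file (the addresses and the list named lines are assumptions for illustration); in the lecture the lines come from an opened file instead.

```python
# Made-up sample lines standing in for the mailbox file
lines = [
    'From alice@example.com Sat Jan  5 09:14:16 2008\n',
    'Return-Path: <postmaster@example.com>\n',
    'From: alice@example.com\n',
    'From bob@example.org Fri Jan  4 18:10:48 2008\n',
]

days = []
for line in lines:
    line = line.rstrip()              # strip whitespace at the end of the line
    if not line.startswith('From '):  # note the space after From
        continue                      # skip every line that is not a From line
    words = line.split()
    days.append(words[2])             # words[2] is the day of the week
    print(words[2])
```

With these sample lines the loop prints Sat and then Fri; the line starting with From: is skipped because a colon, not a space, follows From.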
442 00:24:37,310 --> 00:24:42,420 And this will be zero, one, and two, and three, and four, and five, and six. 443 00:24:42,420 --> 00:24:44,530 We're splitting it based on spaces. 444 00:24:44,530 --> 00:24:50,106 Then the email address is words sub one, right? 445 00:24:51,106 --> 00:24:54,666 So that email address is this little bit of stuff 446 00:24:54,666 --> 00:24:58,780 because it's in between spaces, right? So that's what we pull out. 447 00:24:58,780 --> 00:25:02,355 The email address is words sub one. 448 00:25:02,355 --> 00:25:04,512 We've got that. 449 00:25:04,512 --> 00:25:07,730 So that's sitting in this email address variable. 450 00:25:07,730 --> 00:25:10,000 Then we really, all we want, we don't really want the whole thing, 451 00:25:10,000 --> 00:25:11,960 we just want the part after the 452 00:25:11,960 --> 00:25:14,470 at sign, and we can do a lookup for the, oop. 453 00:25:14,470 --> 00:25:16,290 We can do a lookup of the at sign. 454 00:25:17,490 --> 00:25:22,145 But you can also then do a second, come back, come back. 455 00:25:22,145 --> 00:25:25,300 [SOUND] There we come. 456 00:25:25,300 --> 00:25:29,110 You can also do a second split. Okay? 457 00:25:29,110 --> 00:25:31,260 So we're taking this variable here, email, 458 00:25:31,260 --> 00:25:33,980 which is merely this little part right here. 459 00:25:33,980 --> 00:25:36,840 And we are splitting it again, except this 460 00:25:36,840 --> 00:25:38,400 time we're splitting it based on an at sign. 461 00:25:38,400 --> 00:25:42,640 Which means it's going to bust it right here, and find 462 00:25:42,640 --> 00:25:44,140 us two pieces. 463 00:25:44,140 --> 00:25:49,730 So pieces now is a list where the sub zero item is the 464 00:25:49,730 --> 00:25:56,280 person's name and sub one item is the host that their mail address is held from. 465 00:25:56,280 --> 00:26:00,540 Okay? 
And so then all we need is pieces 466 00:26:00,540 --> 00:26:06,380 sub one, and pieces sub one is this guy right here. 467 00:26:07,900 --> 00:26:10,750 So that's pieces sub one, and so we pulled it out. 468 00:26:10,750 --> 00:26:13,470 So if you go back to how we did it before, we were 469 00:26:13,470 --> 00:26:17,100 doing searching, we were searching some more, and then we were taking slices. 470 00:26:17,100 --> 00:26:19,380 This is a little more elegant, okay? 471 00:26:19,380 --> 00:26:21,110 Because really, we split it and then we split it, 472 00:26:21,110 --> 00:26:23,080 and we knew what piece we were looking at. 473 00:26:23,080 --> 00:26:27,250 So this is what I call the Double Split Pattern, where you split a string 474 00:26:27,250 --> 00:26:30,630 into a list, then you take a thing out, and then you split it again. 475 00:26:31,710 --> 00:26:33,020 Depending on what data you're looking for. 476 00:26:33,020 --> 00:26:35,376 This is just a technique, it's not the only technique. 477 00:26:35,376 --> 00:26:40,480 Okay, so that's lists. 478 00:26:40,480 --> 00:26:42,040 We talked about the concept of a 479 00:26:42,040 --> 00:26:44,540 collection where lists have multiple things in them. 480 00:26:44,540 --> 00:26:47,350 Definite loops, again, we've seen these things. 481 00:26:47,350 --> 00:26:49,600 Lists look a lot like strings 482 00:26:49,600 --> 00:26:53,100 except the elements are more powerful and they're mutable. 483 00:26:53,100 --> 00:26:59,070 We still use the bracket operator and we redid the max, min, and sum. 484 00:26:59,070 --> 00:27:02,382 Except we did it in, like, one line rather than a whole loop. 485 00:27:02,382 --> 00:27:06,110 And something we're going to play with a lot is using split to parse strings, 486 00:27:06,110 --> 00:27:08,630 the single split, and then the double split 487 00:27:08,630 --> 00:27:11,130 is the natural extension of the single split. 
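To make the Double Split Pattern concrete, here is a minimal sketch. The sample line and its address are made up for illustration; the variable names words, email, and pieces follow the ones spoken in the lecture.

```python
# Made-up From line in the same shape as the mailbox data
line = 'From alice@example.com Sat Jan  5 09:14:16 2008'

# First split: break the line on whitespace into seven words (0 through 6)
words = line.split()
email = words[1]           # 'alice@example.com'

# Second split: break the address at the at sign into two pieces
pieces = email.split('@')  # ['alice', 'example.com']
print(pieces[1])           # example.com
```

Passing '@' to split plays the same role as passing a semicolon earlier: it names the character to split on instead of whitespace.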
488 00:27:11,130 --> 00:27:14,780 So, see you in the next lecture, looking forward to talking about dictionaries.