# How to clean data and split variables 00:00:00,210 --> 00:00:12,270 This is a big data set and it has all the problems that you'll often encounter in this type of data. It's got blank cells, there's inconsistent data, entry cells that aren't in a format you can use.   00:00:12,270 --> 00:00:29,670 When you're presented with data like this you need to be patient and thorough in cleaning it up, but Excel can do a lot of the heavy lifting for you. In this walkthrough we are going to get rid of some cell contents and split other cells into parts. First we need to replace some data. 00:00:29,670 --> 00:01:04,320 In this data set missing values are recorded as .a but we want them to be blank cells instead. We're going to use Excel's 'Find and Replace' tool to change the .a to blank cells for the attractively named variables A009 to X047D. You have probably used 'Find and Replace' in Microsoft Word and this is similar. If you work with data regularly you're going to be using this function most days, for example for placed words such as 'dissatisfied' here in column E with a number. 00:01:04,320 --> 00:01:37,084 After watching this walkthrough you should be able to do this. Select all the columns for the variables labeled A009 to X 047D and in Windows you type Contrl+F - that's F for find and it brings up the 'Find and Replace' box. Click on 'Replace', type ,a into the 'Find' box and leave the replace box empty. This replaces all the cells that contain .a with a blank. Click 'Replace all' and close and there you are. As you can see it's worked. 00:01:38,665 --> 00:02:06,578 If you're using a Mac, you will type Command+F - F for find, or click on the little magnifying glass in the top right. Select 'Replace' and a similar box pops up like this one. Second, we're going to split the variable X025A in two two variables that represent the code for the classification which is a number and a description which is text.   00:02:06,578 --> 00:02:28,866 We're going to do this by extracting data from the left and right hand ends of the cell. X025A is made up of a number, a space, a colon after the space, a space after the colon and some descriptive text with a variable length. So we want to extract the number and the text. First of all, add two column headers.   00:02:28,866 --> 00:02:45,890 We're going to call them 'Education 1' and 'Education 2'. In Education 1's top row type the formula =LEFT and open brackets and you can see for this the arguments are text and a number of characters.   00:02:47,433 --> 00:03:24,800 Click on the cell that has the source text, comma, 1 - because you want only the leftmost character, and close the bracket. You can guess that this will extract the first character of the cell from the left which is the number we need. There we are. Now fill down. But extracting text from the right hand end of this cell is going to be trickier because as you can see the text strings are of a variable length. So we're going to create one formula that will look for the colon, ignore the space after it, and extract all the text to the right until the end of the cell. 00:03:27,080 --> 00:04:09,740 So type =RIGHT, open bracket, click on the cell, add a comma as before then, LEN(O2), close brackets - which returns the length of the data in a cell, then -FIND, open another bracket and the arguments for this are the colon, as I explained you - put that in quotes, O2, which is the data and 1, then close brackets, +2. Make sure you close all the brackets. Now this is going to count from the right and select all the text starting two characters after the colon. 00:04:11,933 --> 00:04:47,120 What we just did might make your head hurt a bit so if you're going to do your own version, test whether you have written the correct formula by thinking about a concrete example. So in this case, if the string was ten characters long and the colon was the third character, we would want characters 5, 6,7, 8, 9 and 10. These would be the first six characters counting back from the right. In this case we've got it right, well, that's a relief! This formula will find a colon and extract all the data from the right of the space to the end, fill down and Ha ha! we have split the cell into two useful components.