Stat 579

Homework: week 8

Due in class, Tuesday/Thursday Oct 16/18/22.

This week's homework deals with crime by type in each state over the last 40 years. The data can be found at the center for disaster assistance.

  1. Data Collection, Cleaning & Processing
    1. Pick three states from the table on the right-hand side of the website above.
    2. Using some software (you can try R for this, but Excel will do fine as well), clean the data so that you can load it into R. Check that all data types are correct (e.g. numbers are numbers, not text, ...). Use the following names for the different types of crimes:
      Violent, Property, Murder, Forcible.Rape, Robbery, Aggravated.assault, Burglary, 
      Larceny.Theft, Vehicle.Theft
    3. Introduce a variable called 'State' and set it to the appropriate two-letter abbreviation for each state's data.
    4. Combine the data from all three states (hint: rbind). Calculate crime indices per 100,000 population.
    5. In R, reshape the data so that you can create a chart as shown below (make use of facet_wrap and set the parameter scales="free"):
  2. Processing Description
    In a paragraph, describe the steps that you had to take for the first part of the question. Give an estimate of how much time you spent on processing and cleaning the data. Which parts could you speed up (and by how much), if you had more 'practice' (i.e. if you had to do the same thing for all 50 states)?
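
Steps 1.3 and 1.4 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the sample solution: the three data frames, the state abbreviations, and every number in them are made up, only two crime types are included to keep it short, and the cleaned data is assumed to hold raw counts next to a Population column.

```r
# Hypothetical cleaned data for three states (all values invented).
ia <- data.frame(Year = c(1980, 1981), Population = c(1000000, 2000000),
                 Murder = c(50, 80), Robbery = c(1000, 3000))
mn <- data.frame(Year = c(1980, 1981), Population = c(4000000, 4000000),
                 Murder = c(100, 120), Robbery = c(4000, 4400))
wi <- data.frame(Year = c(1980, 1981), Population = c(5000000, 5000000),
                 Murder = c(150, 125), Robbery = c(2500, 3000))

# Step 1.3: add the two-letter state abbreviation to each data set.
ia$State <- "IA"
mn$State <- "MN"
wi$State <- "WI"

# Step 1.4: stack the three data sets; rbind needs matching column names.
crime <- rbind(ia, mn, wi)

# Convert counts to rates per 100,000 population.
# Each crime column is divided row by row by Population.
crime_types <- c("Murder", "Robbery")
rates <- crime
rates[crime_types] <- crime[crime_types] / crime$Population * 1e5

str(rates)
```

With the full data, crime_types would list all nine crime names given in 1.2.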


Submit a file of the cleaned and processed data, and an R script for the work in 1.5. Embed the write-up for question 2 in the R script in the form of comments.
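
For the reshaping and plotting in 1.5, one possible approach is sketched below. The `rates` data frame and its values are invented for illustration, base R's reshape() is used here as one option among several for going from wide to long format, and the plot is only drawn if the ggplot2 package is installed.

```r
# Hypothetical rates per 100,000 in wide format (all values invented).
rates <- data.frame(
  State   = rep(c("IA", "MN"), each = 3),
  Year    = rep(c(1980, 1990, 2000), times = 2),
  Murder  = c(2.2, 1.9, 1.6, 2.6, 2.7, 3.1),
  Robbery = c(55, 40, 37, 99, 93, 78)
)
crime_types <- c("Murder", "Robbery")

# Wide to long: one row per State, Year and crime type.
# 'Type' holds the crime name, 'Rate' holds the value.
long <- reshape(rates, direction = "long",
                varying = list(crime_types), v.names = "Rate",
                timevar = "Type", times = crime_types,
                idvar = c("State", "Year"))

# One panel per crime type; scales = "free" lets each panel
# choose its own axis ranges.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Year, y = Rate, colour = State)) +
    geom_line() +
    facet_wrap(~ Type, scales = "free")
  print(p)
}
```

Free scales matter here because the crime types differ by orders of magnitude, so a shared y-axis would flatten the rarer ones.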

Sample Solution: