For this project, you are asked to identify both the topic and the data yourselves. You will work in teams of 2-5.
The project has a written component with several stages. The first due date is next week; the final write-up is due the week before the Thanksgiving break. The oral part of the project consists of a presentation in class. We will have project presentations during the last week of classes and finals week.
The topic should be within the realm of general knowledge and interest. You may revisit one of the topics from class, but take care to explore VERY different aspects of the topic, and make sure to also get additional or different sources for your data.
There are several parts to this project. In the first, you'll need to identify a suitable topic for your project. In the second, work out which data you will use and how. These two steps are linked - you might have great ideas for topics, but if you cannot find suitable data, it might not be advisable to proceed.
Next, discuss the questions you aim to answer, and download/incorporate the data necessary to answer them. In the last part, you'll try to answer your questions using the techniques discussed in class.
Some questions to think about:
- Are there other sources of data necessary to make valid cross-comparisons?
- Are there other sources of data that might be useful?
- What do you want to learn from this data?
- What data do you need to answer those questions?
- What data is available?
- What is your strategy for selecting data?
- How will you structure the data?
- What are the keys/id variables?
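For the last two questions, one quick check is whether a candidate key really identifies each record uniquely. The following is a minimal sketch in plain Python, not a prescribed method: the records, the column names (`state`, `year`, `value`), and the helper `duplicate_keys` are all made up for illustration.

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Return the key values that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical records; the column names are illustrative only.
rows = [
    {"state": "IA", "year": 2013, "value": 10},
    {"state": "IA", "year": 2014, "value": 12},
    {"state": "MN", "year": 2013, "value": 7},
    {"state": "MN", "year": 2013, "value": 8},
]

# 'state' alone repeats, so it cannot serve as a key.
print(duplicate_keys(rows, ["state"]))
# Even (state, year) has one repeated combination that needs a closer look.
print(duplicate_keys(rows, ["state", "year"]))
```

An empty result means the chosen columns identify every row; anything else points to duplicates or to a missing id variable.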
Due dates:
- Nov 13/17: one-paragraph description of the planned project; include potential data sources (web link, electronic data file and source); list of team members & team name
- Dec 4/8: (3+3*team size) pages of write-up
- Dec 9/11/18: in-class presentations
- Dec 19: final written project due.
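The write-up length of (3+3*team size) pages can be worked out per team size. A small sketch, assuming the formula is applied literally and that teams have 2-5 members as stated at the top; the function name `writeup_pages` is hypothetical.

```python
def writeup_pages(team_size):
    """Page count for the write-up: 3 + 3 * team size."""
    if not 2 <= team_size <= 5:
        raise ValueError("teams have 2-5 members")
    return 3 + 3 * team_size

# Print the page count for every allowed team size.
for size in range(2, 6):
    print(size, "members:", writeup_pages(size), "pages")
```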
Hints for writing a good project and presentation
Pick a topic of general interest - try it out on your family or friends (outside your area of study). If their eyes glaze over after 1 minute, it's not a topic of general interest.
Use current data - if you've found an exciting data set that is a couple of years old, try to come up with a current source and build your own new data. Nobody likes news from last year or left-over analyses.
Show a good variety of different plots - seeing the same type of plot over and over is very tiring for your audience, so try to mix it up. Come up with at least three different types of charts. This will also inspire you to broaden your analysis!
Don't show graphics or tables that you have not made yourself. Reproduce a graphic with your own data, if you like. Since you're supposed to demonstrate that you have mastered the data, you should be able to. Showing off somebody else's work casts a bad light on your skills.
Each figure or table must have a caption. A good caption consists of 2-3 major pieces: a) a description of the construction of the table or graphic and the data source/data used (what is it? and how did you do it?),
b) the main implication of the graphic or table (why did you do it?), and, if at all possible, c) a secondary finding that might lead to the next question of interest.
Avoid orphaned graphics - i.e. mention each figure in your write-up and refer to it (by number).
Structure your report - each report has to consist of a minimum of 4 sections:
- Introduction (1/2 - 1 page): Motivate your project - why is it exciting? Why should people read it?
Mention your data source.
If you go into a topic that a general audience is not familiar with, explain all the NECESSARY parts that allow your audience to understand the work. Don't go excessively beyond the page limit. If you need more than 1/2 page, it's a good sign that you should switch topics.
Summarize your main finding.
Outline your data analysis and the structure of the rest of the paper.
- Data Section: (1/2 - 1 page) Describe the data and its source, summarize the main variables, and, if necessary, give definitions of the variables you used in the analysis.
Describe any kind of cleaning, filtering, subsetting, or any other changes you made to the original data.
- Main Section: this is where you describe the findings from your data exploration. If you were looking at different aspects, you might need to use extra structuring elements.
- Conclusions/Future Work: (1/2 page) summarize your findings - don't just list them, but try to come up with a cohesive statement. If you started your Intro by posing a question, try to answer it at this point.
For future work, come up with at least two good ideas for how to extend the project or your analysis in a meaningful way.
While code is very important for this class, it does not have a place in the report (except for an electronic(!) appendix, which you should submit together with your project write-up).
No Models! While this is a Statistical Methods class, we need to level the playing field for everybody. Only use the statistical methods emphasized in class, i.e. demonstrate computational proficiency in accessing, organizing and re-structuring data; impress me with your graphical exploration skills. Don't use any statistical models for this project beyond a linear regression of Y on X, if you absolutely have to.
Respond to all the feedback you get! Make the effort to think about the questions and suggestions you got when you presented, and react to them.
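For reference, a linear regression of Y on X fits a line y = intercept + slope * x by least squares. The sketch below is only an illustration in plain Python with made-up numbers; the helper name `fit_line` is hypothetical, and nothing here implies that a regression is expected in your project.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```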
Proofread your paper! At a minimum, run a spellchecker over it. Typos don't make the grader happy :)
Grading rubric: written project
Overall grade breakdown:
- Introduction: 10
- Questions and findings: 60
- Conclusion: 10
- Presentation: 15
- Reproducibility: 10
The grading rubric that I'll use for the written project is available as a pdf.
Grading rubric: oral presentation
More details later.
Some great projects in the past: