Instructor: Professor Rick Wash
Email: wash@msu.edu
Meets: Mondays, 5:00-7:50pm in CAS 025
Office Hours: TBD
This class is a doctoral-level methods class where we learn a number of statistical techniques for analyzing the quantitative data that arise in research on media and information. The majority of this class will focus on interpretation of statistical results: given data with certain properties, how do you interpret the results of statistical calculations to understand what you can learn from various types of analysis? We will also focus on using statistics to answer "how much" questions rather than simple yes/no questions.
Topics will include:
Depending upon time and student interest, we may also cover:
The course is structured with a single 170-minute class period per week. Each class period will have one or more topics (specified in the schedule). A mix of lecture, discussion, and in-class small group exercises will be used to learn the topic for that day. Attendance and participation in the class are required; please notify the professor ahead of time if you cannot make it to class. If you miss a class, you are responsible for learning the material covered that day on your own; many of the ideas in the class build on each other, so regular attendance is important if you want to understand the more advanced topics in the class.
Bring your laptop computer (if you have one) to class if at all possible. We will be using it for in-class exercises.
For this class, all of the examples will be calculated using the R statistical software. R is a free, open-source statistics package that is widely used and supports a wide variety of statistical analyses. As the instructor, I will support student use of R by helping students figure out how to use it. However, I do not require R; if you are more comfortable with a different statistics package such as SPSS, feel free to use it, but know that I will not help you figure out how to use it to do the analysis in this class. Using a different statistics package is not an excuse for failing to calculate important values or for not understanding what the underlying calculation is. For example, most packages calculate a “pseudo-R^2” for logistic regression, but there are a number of different pseudo-R^2 calculations, and they all mean different things. Which one does your package calculate?
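To make the pseudo-R^2 point concrete, here is a minimal sketch of three commonly reported variants (McFadden, Cox-Snell, and Nagelkerke), each computed from the same pair of log-likelihoods. It is written in Python rather than R only to keep it free of dependencies; the arithmetic is the same in any package. The function name pseudo_r2 and the example log-likelihood values are made up for illustration and do not come from any course dataset.

```python
import math


def pseudo_r2(ll_null, ll_model, n):
    """Return three pseudo-R^2 values for a logistic regression.

    ll_null:  log-likelihood of the intercept-only model
    ll_model: log-likelihood of the fitted model
    n:        number of observations
    """
    # McFadden: proportional improvement in log-likelihood over the null model.
    mcfadden = 1.0 - ll_model / ll_null
    # Cox-Snell: based on the likelihood ratio; its maximum is below 1.
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    # Nagelkerke: Cox-Snell rescaled by its maximum so that it can reach 1.
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    nagelkerke = cox_snell / max_cox_snell
    return {
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }


# Hypothetical log-likelihoods, for illustration only.
result = pseudo_r2(ll_null=-68.0, ll_model=-50.0, n=100)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The same fitted model yields three different numbers, which is why any report should state which pseudo-R^2 it uses.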
There are 7 total assignments: 3 data analysis homeworks and 4 research assignments. Each of these counts equally toward the final grade (12.5% each), except that the final paper counts twice (25%).
Grading will be done on a check plus / check / check minus / fail system (corresponding roughly to 4.0, 3.5, 3.0, and 2.0). A rubric is available. For PhD students, I see grades more as an evaluation of work than as an incentive to work harder. You get out of the class what you put into it, and it is your choice what you learn. Sometimes a B makes sense. I won’t force you to learn anything, but I will honestly evaluate your work and give you a grade accordingly. I will provide some short feedback on all of the assignments; for the topics that you really want to learn more about, please feel free to ask me additional questions or to ask for more feedback.
In this class, we will be using the books
Additionally, students will be required to read a few papers that will be assigned periodically throughout the semester (particularly in the first couple of weeks).
The schedule contains the list of what parts of the books should be read when and links to any additional readings.
There is also an optional reading: Signal: Understanding What Matters in a World of Noise by Stephen Few. ISBN 1938377052. You’ll learn more about data analysis from an afternoon reading this book than you will from this course.
There are a number of public datasets available online. We will use a couple of these in class and for the assignments, and any of them can be used for the project if you like.
Randall Munroe did a large survey involving naming colors; given a color, what name do you give it? He did some interesting analysis, but it may be worth re-analyzing: http://blog.xkcd.com/2010/05/03/color-survey-results/
Sean Lahman is a baseball analyst who has put together a large dataset of baseball player and team statistics that is fascinating and fun to look through: http://baseball1.com/statistics/
The National Broadband Map has made a bunch of location / broadband information available for analysis. http://www.broadbandmap.gov/data-download (If you use this, talk to Bob LaRose and Johannes Bauer; they have experience with this dataset)
Reddit, the social sharing / bookmarking website, has made a lot of its historical data available, originally for the purpose of improving the recommendation system. It also has social science value: http://www.reddit.com/r/redditdev/comments/dtg4j/want_to_help_reddit_build_a_recommender_a_public/
ICPSR at the University of Michigan makes a LOT of social science data publicly available. For example, one interesting dataset is the longitudinal General Social Survey, conducted regularly since 1972: http://dx.doi.org/10.3886/ICPSR31521.v1
The US Census makes a lot of good data available: http://census.gov
The Pew Internet and American Life project makes some good datasets available: http://www.pewinternet.org/Data-Tools/Download-Data/Data-Sets.aspx
There is a great list of public datasets here: https://bitly.com/bundles/hmason/1
Reddit has a subreddit dedicated to links to interesting datasets: http://www.reddit.com/r/datasets
Results and betting lines for Soccer matches: http://www.football-data.co.uk/downloadm.php and http://www.football-data.co.uk/englandm.php
A large list of datasets here: http://wiki.urbanhogfarm.com/index.php/Category:Dataset
Stanford maintains a large collection of datasets for network analysis: http://snap.stanford.edu/data/index.html
Kaggle has a large list of public datasets: https://www.kaggle.com/datasets
If you are just learning R and are familiar with SPSS, here are some references (courtesy of Emilee Rader) that may help:
General statistics help is available online in many forums. One of the best is Cross Validated, run by Stack Exchange: http://stats.stackexchange.com/
Expectations: I expect all students to be familiar with the documents related to this class on this website, and to be aware of all assignments and responsibilities. Students are also responsible for knowing all announcements in class and over email.
Assignments: Assignments can be submitted via email to the instructor. Late assignments will lose half a letter grade for each day they are late; in other words, turn in your assignment on time. Of course, if negotiated in advance, reasonable exceptions may be granted by the professor.
Academic Dishonesty: Michigan State University and the Media and Information Studies program both have policies about academic dishonesty. Basically, make sure that everything you turn in with your name on it is your own original work, and don’t cheat or lie. If it feels like cheating, it probably is; if you are unsure, please ask. Students caught cheating or plagiarizing will receive a 0 for the assignment and be reported to the university. Working together with other students in this class and other classes, however, is encouraged, as long as what you turn in remains your own work.
For classes like this one, which involve complex thinking and have no single right answer, I strongly encourage you to work together and ask each other for help. Often when you have a problem or are confused, the best place to go for help is your colleagues who are working on similar issues. Also, the Internet is a fantastic source of information when you are stuck. Use these resources copiously. However, make sure that you personally write and understand all of the work that you turn in. Directly copying text that you don’t understand from the Internet or from others is academically dishonest.
Accommodations for Disabilities: Michigan State University is committed to providing equal opportunity for participation in all programs, services and activities. Requests for accommodations by persons with disabilities may be made by contacting the Resource Center for Persons with Disabilities at 517-884-RCPD or on the web at http://rcpd.msu.edu. Once your eligibility for an accommodation has been determined, you will be issued a verified individual services accommodation (“VISA”) form. Please present this form to me at the start of the term and/or two weeks prior to the accommodation date (test, project, etc.). Requests received after this date will be honored whenever possible.
Religious Holidays: You may make up course work missed to observe a major religious holiday only if you make arrangements in advance with the instructor.
Required Activity: To make up course work missed to participate in a university-sanctioned event, you must provide the instructor with adequate advance notice and written authorization from a university administrator.