Journalism 772 and 472: Computer-Assisted Reporting

Fall 2012

 

DATA ANALYSIS PROJECT

 

We have talked in class about the process of identifying story ideas and leads in databases. You will now have a chance to try this on your own by analyzing a database.  You will be provided with a database for this project, or you may ask your instructor for permission to analyze a database you obtain through your data acquisition project.  After you have begun work on this assignment, you will turn in a brief status memo on your efforts. At the end of the semester, you will turn in a story memo of 1,000 to 1,200 words (about five pages, double-spaced). This story memo should pitch a story idea derived primarily from your analysis of the database, including a discussion of what you found that was newsworthy when you analyzed the data, how the data could be used to develop the story, and what additional steps would be needed to complete the reporting of this story.  The story memo should be written as if you were trying to get the attention of a very busy editor and convince that editor to give you the opportunity to pursue a story you believe is suggested by your analysis.

 

Key dates

 

  • Session 17 (Monday, Nov. 5):  The assignment will be introduced. It is essential that you read the database documentation before you start analyzing the data.

  • Session 26 (Wednesday, Dec. 5): A status memo (one page) is due informing me of the following:  the question or questions you hope to answer with the database; why this would be of interest to an audience of readers, viewers or listeners (depending on the medium); what you have discovered in your initial analysis; and what you plan to do to complete the assignment.  You should attach screen captures of two or three key queries that you are using in your analysis.

  • Session 27 (Monday, Dec. 10): Students will be asked to make a very brief informal presentation about their projects.

  • Wednesday, Dec. 12, Room 2107, 10 a.m.: The final project report is due.  This is a story memo of 1,000 to 1,200 words (double-spaced), along with supporting materials (such as key clips), a log of your discovery process, screen captures of key queries (see details below) and any relevant documentation (in the case of data you did not get from your instructor).  This story memo should be written as if you were trying to get the attention of a very busy editor and convince that editor to give you the opportunity to pursue a story you believe is suggested by your analysis.  The memo should include what you found to be newsworthy when you analyzed the data, how the data could be used in a story, and what additional steps would be needed to complete the reporting for that story.   In addition to turning in paper copies of your story memo and supplementary items (discovery log, relevant clips, etc.), you will be required to bring an electronic copy of your story memo to class.  This will be uploaded into a College of Journalism “assessments” portfolio on the college Web site.  More information about this requirement is provided in a separate information sheet posted on the class Web site.

  • Students who wish to submit their final work before Dec. 12 may make arrangements with the instructor in advance to turn in a paper copy of their story memo and supplementary items and to upload their story memo to the online assessments portfolio.

 

 

The final story memo should include the following:

  • A strong statement at the top:  Write this as if I am your editor and you are competing for my attention with reporters pitching other story ideas.  That is, don’t start by saying, “I sat down and launched Access and looked at the database and sorted it 10 different ways…” Tell me right up front what you found that was interesting or what you found that suggests a dynamite story.  Tell me how your most newsworthy findings relate to what you found in your research about prior uses of similar data for stories elsewhere. Do not pitch your findings for more than they are worth, however, or make assertions not supported by your work.
  • Pros and cons of the data:  Discuss both the strengths and limitations of the data.
  • A summary of the results of your analysis:  Describe results that are germane to the main thrust of the story you are pitching and, if appropriate, relevant results that suggest other interesting avenues to explore.  You may want to include graphics, tables or charts that will make it easier for me to grasp what you are trying to communicate. Indicate whether your findings are new or whether they present a local angle for similar findings reported elsewhere.
  • How you would verify your findings: Identify the specific steps you would take and the types of information you would use to determine whether the results of your analysis are accurate and significant.  If your analysis suggests fault or issues of accountability on the part of public or private figures, indicate whom you would need to contact for response to ensure that your reporting was fair and accurate.
  • Bringing it home:  Indicate the specific steps you would need to take to finish reporting the story.  Include the people you would contact and, if relevant, the places you would visit. Indicate how you would make the story real to readers.

 

The supplementary items you submit should include the following:

  • If you did not get the database from me, provide an electronic copy of the database and append the database documentation to your project memo.
  • If not already included in your memo, append any statistics or other materials you used to verify the integrity of your database or to benchmark your findings. If you wish, append copies of one or two key clips you found in your research.
  • Append a discovery log to your project memo.  This would indicate how you arrived at the major insights you reached during this project. It could be a modified version of the project log described below (that is: the project log is something you keep for yourself as you go; the discovery log is what you submit to me).  The discovery log also allows you a place for some of the salient details of your investigation that would otherwise clutter up your final project memo.
  • Append or include in your discovery log either screen captures of the query grids or a copy of the SQL commands for the queries behind your key findings. We will talk ahead of time about how to do each of these things. (Computers in Room 2107 are equipped with MWSnap3, a program you can use to cut and paste part of a screen as an image into a Word document. MWSnap3 is also available as freeware on the Web.)

 

A few words of advice:

1). Keep a project log of all your work on this assignment. This sort of audit trail would enable you to recreate and confirm your findings if you were working on a project in a newsroom. You can select parts of this log for the “discovery log” you will submit with this assignment.

2). Do not change any of the original data.  Make any fixes by adding new fields.

3). Steady work on this project will be better than trying to do it all in a rush at the end.

4). Use the IRE Resource Center (if you are an IRE member), Lexis-Nexis and other databases to find stories or even academic research related to your topic.

5). If you are an IRE member, you can check to see if IRE has tip sheets or Uplink articles on the data you are using.  (IRE has a searchable database of Uplink articles; I have paper copies of some key Uplink articles related to your databases). This will help you determine if there are established caveats and opportunities related to the database.
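
As an illustration of advice item 2 above (fix data in a new field, never in the original), here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Access so that it can be run anywhere; the table name `donors`, the field `city_clean` and the sample rows are all invented for this example, and Access SQL syntax may differ slightly.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donors (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO donors VALUES (?, ?)",
    [("Smith", " baltimore "), ("Jones", "BALTIMORE"), ("Lee", "Towson")],
)

# Add a new field to hold the cleaned value; the original city field is untouched.
conn.execute("ALTER TABLE donors ADD COLUMN city_clean TEXT")

# Strip extra spaces and make the case uniform, writing only to the new field.
conn.execute("UPDATE donors SET city_clean = UPPER(TRIM(city))")

rows = conn.execute(
    "SELECT city, city_clean FROM donors ORDER BY name"
).fetchall()
distinct_clean = conn.execute(
    "SELECT COUNT(DISTINCT city_clean) FROM donors"
).fetchone()[0]

for original, cleaned in rows:
    print(repr(original), "->", repr(cleaned))
```

Because the original field is still there, anyone reading your log can compare the raw and cleaned values side by side and see exactly what you changed.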

 

The following criteria will be used to grade your project:

  • Accuracy:  This is paramount.  Mistakes in logic or computation that result in erroneous numbers, examples or other details in your memo will count against you.  Check your work carefully and make sure you understand what the data represent or measure.
  • Meeting the deadline:  Projects turned in late will not be accepted.  To waive the deadline, you must get approval from me in writing first.
  • Following the assignment guidelines: This includes submitting all required elements.
  • Understanding of the database:  Your memo should show that you understand both the possible uses and the limitations of the database. Provide specific cautions about possible dirty data, your assessment of alternate interpretations of your findings, and the steps you took to make sure the database was reliable.
  • News judgment and context: The top of your memo is especially important.  You should have developed a sense, from your analysis and the contextual material you researched, of the news value, novelty and significance of the story you are proposing.
  • Presentation and writing: The elements of your final memo should be integrated in a way that makes it easy to read and comprehend.  If you were using this memo to persuade an editor to OK this project, you would have to put it together in a compelling way.  That’s your goal here.  Do not just toss together a bunch of facts and lists that are hard to follow.  Show that you have a strong story idea based on solid work and a good feel for its value in a newsroom.

 


A few words about working with data in the newsroom:

 

After you get a new data set, there are some good operating procedures you would follow. They help protect you and your work from disaster and error, and they improve your chances of finding interesting leads related to the structure and completeness of the database itself.

  1. In the newsroom, you would start by preserving the original database and working on a copy.  However, in this case, you have been provided with a copy.
  2. You would read all documentation thoroughly.
  3. You would make sure the record count matches the documentation. In the case of this assignment, for which I have cut a small slice of the data for you, the record count you get will not match the documentation (for example, the record count provided by NICAR may be for the entire U.S., but you may have only a few states).
  4. You would do some queries to get results you can match against a few things you know to be true. This includes checking such things as names, addresses, amounts or dates against other sources of information (paper records, clips, etc.).
  5. You would check for duplicate records. One way to do this is with a “select distinct” query, which I can show you how to do in class. The number of records obtained in such a query should match the number in the database if there are no duplicate records.  If there are duplicate records, a “group-by” query on all fields with a count of records (selecting only groups with a count of two or more) will show you the duplicates.
  6. You would check for duplicates in any fields that are supposed to contain unique identifiers or unique codes, especially in the fields that will be used to join tables.  You can do this by grouping on such a field and doing a count of records to make sure there is only one of each ID or, in a lookup table, only one of each code.
  7. You would check to see how complete the database is – that is, whether there are important fields for which data are missing (blanks or null values).
  8. You would check the database for coherence. Are there fields showing ages, dates or amounts that could not logically be correct? One way to do this is to do a query that sorts each field to see if the range of values is appropriate. You can add up the values in numeric fields to see if the totals make sense and do group-by queries with counts to see if those make sense. Look over text entries – cities, states, company names, codes – for consistency in spelling.
  9. You would check the design of the database to see if fields that are supposed to contain dates are date-type fields (as opposed to text fields), and see if fields that are supposed to contain numbers are numeric-type fields. Check ZIP code fields to make sure they are text fields (which they should be).
  10. You might find it necessary to do some data “cleanup” – converting text strings to numbers or dates where appropriate, removing any extra spaces that appear before and after text entries in some fields, making the case (upper or lower) uniform. If any of that seems necessary in the case of the data for class analysis, we can talk about that in class.
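
Several of the checks above (steps 5 through 8) come down to a handful of short queries. The sketch below shows one possible form of each. It runs the SQL through Python's built-in sqlite3 module so it is self-contained; the table `inspections`, its fields and its rows are hypothetical, made up for this example, and the same queries in Access may need small syntax changes.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (id TEXT, city TEXT, fine REAL)")
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [
        ("A1", "Baltimore", 250.0),
        ("A2", "Annapolis", 100.0),
        ("A2", "Annapolis", 100.0),  # exact duplicate record
        ("A3", "Baltimore", None),   # missing fine
        ("A3", "Towson", -50.0),     # reused ID and an implausible amount
    ],
)


def one(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Step 5: compare the total record count with the count of distinct records.
total_rows = one("SELECT COUNT(*) FROM inspections")
distinct_rows = one(
    "SELECT COUNT(*) FROM (SELECT DISTINCT id, city, fine FROM inspections)"
)

# Step 5: group on all fields and keep only groups with two or more records.
duplicate_records = conn.execute(
    "SELECT id, city, fine, COUNT(*) FROM inspections "
    "GROUP BY id, city, fine HAVING COUNT(*) >= 2"
).fetchall()

# Step 6: group on the field that should be unique and look for repeats.
repeated_ids = conn.execute(
    "SELECT id, COUNT(*) FROM inspections "
    "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
).fetchall()

# Step 7: count records where an important field is missing.
missing_fines = one("SELECT COUNT(*) FROM inspections WHERE fine IS NULL")

# Step 8: check the range of a numeric field for values that make no sense.
min_fine, max_fine = conn.execute(
    "SELECT MIN(fine), MAX(fine) FROM inspections"
).fetchone()

print("records:", total_rows, "distinct:", distinct_rows)
print("duplicate records:", duplicate_records)
print("repeated IDs:", repeated_ids)
print("missing fines:", missing_fines)
print("fine range:", min_fine, "to", max_fine)
```

In this made-up table, the gap between the total and distinct counts signals a duplicate record, the repeated IDs would cause trouble in any join, and the negative minimum is the kind of value you would want to ask the agency about before reporting a total.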