Dissertation Journal: Defended, Edited, Submitted, Accepted

It’s been about a year and a half since my last post about my dissertation.  Two weeks ago, I defended my dissertation, NON-RESPONSE BIAS ON WEB-BASED SURVEYS AS INFLUENCED BY THE DIGITAL DIVIDE AND PARTICIPATION GAP.  I’ve included the abstract below if you’re interested in its content, but I’ll focus here on some of the process.

I originally intended to write a lot more in this blog about my dissertation-writing process but my posts eventually petered out as I got further and further behind schedule.  After a while, I refused to write about it not only because I had nothing new or interesting to say but more importantly because I was simply ashamed to even bring up the topic.  I don’t know why I stopped writing.  It took me about three years longer to finish this than it should have taken and I can’t help but wonder how different my life and career would be if I had finished in a timely manner.  I’m not sure why I avoided working on this for so long but I know that all of the obstacles were internal and emotional.  And I can’t tell you that I had any miraculous breakthroughs that let me finally finish except for the fact that I was almost out of time.

The defense itself turned out almost exactly as expected.  My committee requested only very, very minor edits that required the addition of only a few sentences.  I had set aside the two days immediately following my defense to make edits and the final submission but I only needed a few hours to make those edits and a second round of (minor typographical) edits requested by my graduate school.  They’ve accepted the document and forwarded it on to ProQuest for permanent archival so I think I’m just waiting for a few random bits of paperwork to work their way through the systems before everything is completely, totally, and finally done.

I’m not sure what my next steps will be.  I worry that the data are too old – Internet access and use data collected in 2010 – to be publishable.  Of course, I have many ideas about how to conduct further data analysis and push this particular set of ideas further but I don’t think that anyone can be surprised that I’m a little bit burnt out on these specific ideas right now.  I’ll be sure to write more here if I do any further work with this study.

I think that my colleagues, family, and friends are surprised that I’m not more celebratory about finishing my doctorate.  The dissertation itself – conceptualization, collection of data, analysis, and writing – was pretty easy for me and it doesn’t feel much different from other studies I’ve completed.  But the emotional drain of living with this immense self-imposed and emotionally puzzling weight for so long was so soul-sucking that I’m more relieved than happy or excited to finally be done.  I’ll try to learn to celebrate later but for now I’m enjoying just living without the shame and embarrassment I’ve hidden from everyone for several years.

Now that I’m done, I can begin to chip away at my large backlog of video games.  I’ve tackled the problem of non-response bias on a Web-based survey but now I’m going to save humanity from aliens.


Higher education scholars, policy makers, and administrators know little about the experiences of undergraduate students who matriculate with minimal experience with technology. It is often assumed that all students, particularly traditionally aged students, have significant experience with, knowledge of, and comfort with technology. Although that assumption is correct for many students, it is false for others. Despite the enormous increase in the use of Web-based assessment surveys and the increasing importance of accurate assessment and accountability data, those efforts may not be collecting adequate and accurate data about and from all students.

This study explores the non-response bias of first-year undergraduate students on a self-administered Web-based survey. First, data were collected with a supplemental survey added to the Beginning College Survey of Student Engagement (BCSSE). K-means clustering was used with this newly constructed Internet Access and Use survey to classify students according to their Internet access and use experiences. Second, demographic data from BCSSE and the Internet access and use data were included in a logistic regression predicting response to the subsequent National Survey of Student Engagement (NSSE).

The Internet Access and Use instrument proved to be a viable way to classify students along lines of their previous Internet access and use experiences. However, that classification played no meaningful role in predicting whether students had completed NSSE. Indeed, despite its statistical significance, the final logistic regression model provided little meaningful predictive power.

Generalizing the results of this study to all Web-based surveys of undergraduate college students with random or census sampling indicates that those surveys may not introduce significant non-response bias for students who have had less access to the Internet. This is particularly important since that population is already vulnerable in many ways as being disproportionately composed of first-generation students, underrepresented minority students, and students with lower socioeconomic statuses. This reassures assessment professionals and all higher education stakeholders that cost- and labor-efficient Web-based surveys are capable of collecting data that do not omit the voices of these students.

Inserting Unique Survey IDs into Multipage Paper Surveys

I still believe in paper surveys.  I believe that their immediacy and accessibility make them very well-suited for some situations.  Although I value technology-based surveys (e.g., Web-based, tablet-based) I definitely believe that there are times when paper surveys are superior.

You can imagine that I was very happy when my new employer approved the purchase of (a) a printer with an automatic duplex scanner and (b) an installation of Remark Office OMR 8.  These two tools together will allow us to conduct paper surveys with some level of ease, automation, and accuracy.  I’m particularly happy that this will allow us to break free from the tyranny of Scantron by allowing us to create customized survey instruments that don’t rely on generic Scantron answer forms.

Now that I am learning how to use Remark Office OMR 8 I am figuring out all of those little things that I was previously able to count on other people to do, often without even knowing that it was being done.  Most recently, I had to figure out how to add unique survey IDs on a multipage survey.  Let me break it down for you:

I have a survey that is six pages long.  On each page, I have the page number and I can tell Remark Office where that page number is so I don’t have to worry about keeping pages in order.  But I also need some way to link all of those pages together when I am scanning multiple surveys so the correct six pages are grouped together in the resulting data file.  Hence I need to add a unique survey ID to each page of each survey.  Adding page numbers is easy but how do I add survey IDs?

I had to do this for my dissertation instrument but that was a one-page instrument so it was a simpler process.  The multipage process took me a few hours to figure out and here is what I have settled on for now:

  1. Create the survey instrument.  I did this in Microsoft Publisher because it was the desktop publishing tool I had at hand.  I suppose you could use Word or something similar but it won’t give you nearly as much control over the layout.
  2. Print or save the survey as a pdf.
  3. Use that pdf to create another pdf with multiple copies of the survey instrument.  Right now, this is the clunkiest part of this process as I haven’t yet figured out how to directly print multiple copies of the instrument as a pdf.  Instead, I have to save multiple copies and merge them together.  It’s not entirely horrible because each merge doubles the number of copies, so it quickly becomes easy to make a single pdf file with many, many copies of the survey instrument.
  4. Create a simple Excel spreadsheet with the sequence of survey IDs.  My survey instrument has six pages so I end up with one column of numbers where each number is repeated six times before being incremented to the next one.  This spreadsheet is used in a mailmerge so I suppose this could easily be done as a comma-separated file or in some other program that produces similar output.  It’s important that the number of survey IDs match the number of surveys in your pdf.
  5. Create a simple Word document whose only text is a merge field that will insert the survey IDs into the document.
  6. Merge the Word document and save or print the resulting file as another pdf.  You now have two pdf files with the same number of pages; one has survey instruments and the other has survey IDs.
  7. Use pdftk to add the survey ID pdf as a background to the survey instrument pdf.  pdftk is a simple command line tool that lets you manipulate pdfs.  It’s freely available for many platforms, including Windows.  I used the “multibackground” parameter to essentially merge these two pdfs into one, adding the survey IDs to the survey instruments.  I got lucky in that my survey IDs were well-aligned with my survey instrument but you might have to modify one or both of your documents to get the survey ID to end up where you want it.
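
Steps 4 and 7 above can be sketched in a few lines of Python.  This is a hypothetical sketch, not my actual workflow: the file names, the number of surveys, and the column header are all assumptions, and the pdftk command is shown as a comment rather than invoked.

```python
import csv

# Sketch of step 4: build the mail-merge spreadsheet as a CSV instead of
# an Excel file.  Assumes 50 surveys of 6 pages each; every survey ID is
# repeated once per page so the merged ID pdf lines up page-for-page
# with the pdf of survey instruments.
NUM_SURVEYS = 50
PAGES_PER_SURVEY = 6

with open("survey_ids.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["SurveyID"])  # column header referenced by the merge field
    for survey_id in range(1, NUM_SURVEYS + 1):
        for _ in range(PAGES_PER_SURVEY):
            writer.writerow([survey_id])

# Step 7 is then a single pdftk call (file names are placeholders):
#   pdftk instruments.pdf multibackground survey_ids.pdf output stamped.pdf
```

The key invariant is that the CSV has exactly (number of surveys × pages per survey) ID rows so the two pdfs have the same page count before the multibackground merge.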

Now that I have unique survey IDs for each survey and page numbers on each page, I can feed the surveys into the scanner in any order I want and everything will work!  I just have to ensure that they’re all right-side up because I don’t know how well Remark Office OMR 8 can detect and correct for upside-down instruments (it’s a feature of the software but I’ll have to test it; if this were a real concern I’d be looking into possible solutions such as cutting off or rounding one of the corners, but I’ll be working with small enough batches that it will be easier just to flip through the completed instruments).
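
Why any scanning order works can be sketched in a few lines of Python.  This is purely illustrative – Remark Office OMR does this grouping internally – and the scanned-page data structure is a made-up stand-in for what the software reads off each sheet.

```python
# Each tuple is (survey_id, page_number) as read from one scanned sheet,
# in whatever jumbled order the pages went through the scanner.
scanned_pages = [(2, 4), (1, 1), (2, 1), (1, 4), (2, 2), (1, 2),
                 (1, 3), (2, 3), (1, 5), (2, 5), (1, 6), (2, 6)]

# Group pages by survey ID, then sort each group by page number so the
# six pages of every survey end up together and in order.
surveys = {}
for survey_id, page in scanned_pages:
    surveys.setdefault(survey_id, []).append(page)
for pages in surveys.values():
    pages.sort()

print(surveys)  # {2: [1, 2, 3, 4, 5, 6], 1: [1, 2, 3, 4, 5, 6]}
```

With both identifiers printed on every sheet, scan order carries no information the software needs.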

Item Non-response and Survey Abandonment SPSS Syntax

I don’t often write about what I do in my day-to-day job.  But I’ve recently spent quite a bit of time working on survey item non-response and survey abandonment and I want to save you some time if you’re working on those issues, too.

One of the projects on which I’ve worked over the last couple of years is the development of an updated version of the National Survey of Student Engagement (NSSE) survey instrument. We’ve done a lot – a LOT – of work on this.  As part of this work we’ve pilot tested the draft versions of the new survey.  Some of the many things we’ve analyzed in the pilot data are item non-response and survey abandonment.  I worked on this last year with the first pilot and when I worked on this again with this year’s pilot I got smarter.  Specifically, I wrote an Excel macro that generates the SPSS syntax necessary to analyze item non-response and survey abandonment.

As described in the Excel file, this macro takes a list of survey variable names and creates SPSS syntax that will add several new variables to your SPSS file:

  • An “Abandoned” variable indicating the last question the respondent answered if he or she abandoned the survey. If the respondent didn’t abandon the survey, this variable will be left empty (“SYSMIS”).
  • For every variable, a “SkippedItem__” variable indicating if the survey item was answered, skipped, or left blank because the survey was abandoned.
  • A “SkippedItems” variable indicating the total number of questions the respondent skipped.
  • A “SkippedPercentage” variable indicating the percentage of questions the respondent skipped.
  • An “AbandonedPercentage” variable indicating the percentage of questions the respondent did not answer because he or she abandoned the survey.

I created this macro because there were several versions of the pilot instrument.  Because you have to “work backward” through each question to identify respondents who abandoned the survey, each version of the instrument required a different set of SPSS syntax because each version had a different set of survey questions.  So it was much easier for me to write a program that generates the appropriate syntax than to do it by hand multiple times.  Laziness is a virtue.
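
The “work backward” logic the generated syntax encodes can be sketched in Python rather than SPSS.  This is a simplified illustration, not a translation of the macro’s output: the function name and the list-of-answers data layout are my own, with None standing in for a blank response.

```python
def classify_nonresponse(answers):
    """answers: one respondent's responses in questionnaire order; None = blank."""
    n = len(answers)
    answered = [i for i, a in enumerate(answers) if a is not None]
    last = answered[-1] if answered else -1  # index of last answered item

    # Trailing blanks after the last answered item mean the survey was
    # abandoned; blanks before that point are skipped items.
    abandoned = last < n - 1
    skipped = sum(1 for a in answers[: last + 1] if a is None)
    abandoned_count = n - 1 - last if abandoned else 0

    return {
        # Last question answered (1-based) if abandoned, else None (SYSMIS)
        "Abandoned": last + 1 if abandoned and last >= 0 else None,
        "SkippedItems": skipped,
        "SkippedPercentage": 100.0 * skipped / n,
        "AbandonedPercentage": 100.0 * abandoned_count / n,
    }

# A respondent who answered items 1 and 3, skipped item 2, and quit
# after item 3 of a five-item survey:
result = classify_nonresponse(["a", None, "c", None, None])
print(result)
# {'Abandoned': 3, 'SkippedItems': 1, 'SkippedPercentage': 20.0,
#  'AbandonedPercentage': 40.0}
```

The backward walk is the crux: a blank can only be classified as “skipped” or “abandoned” once you know where the last answered question falls, which is why each instrument version (with its different question list) needed its own generated syntax.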

Warning: This macro generates a lot of syntax.  The sample input has only four variables but it creates code with 105 lines (including blank lines and comments).  The surveys with which I was working had 130-160 variables and I worked with 11 different versions of the survey instrument.  In the end, I had an SPSS syntax file with tens of thousands of lines of code.  The SPSS syntax editor got very grumpy and slow, probably because of the large number of DO IF conditionals and the syntax highlighting it applies to those blocks of code.  I ended up working mostly in Notepad as I was troubleshooting the syntax and pasting the resulting text into the SPSS syntax editor only when I was ready to run it.  The good news is that the syntax is actually very straightforward and arithmetically simple so it ran fairly quickly.

I know that this fills a very, very small niche.  But maybe someone will find this helpful or useful.  I spent a few days working on this so there’s no reason why someone else should have to redo this work.

Warning 2: I used this macro again a few years later and noticed that it’s set up to only deal with numeric data. If you have any string data then you’ll need to modify it accordingly.