
Mistaken Goal: Where Higher Education & Technology Meet


"...technology is not something that happens to us. It is something we create. We must not confuse a tool with a goal. We must, therefore, be sure that technology serves the fundamental purposes of higher education." Stanley N. Katz in "In Information Technology, Don't Mistake a Tool for a Goal"

Dissertation Journal: Scanning Survey Instruments

Last week I finally began scanning survey instruments to turn pencil and pen marks into data. Using the setup in the photo, I’ve scanned most of the surveys. The scanner I’m using is quite old but it’s a beast that still scans very quickly. It even scans surveys printed on longer sheets of paper and on both sides; it’s what is used to scan paper versions of NSSE and BCSSE. The fine folks in IU’s Center for Survey Research (CSR) are experts in this entire process: they know the hardware, the software, and how it’s all used, and they’re guiding me with patience and professionalism.

Verifying the surveys – systematically double-checking responses the software thinks are missing and resolving ambiguous situations – takes much longer than scanning them. I can only spend two afternoons a week scanning or verifying but I hope to have this set of surveys done so I can send data back to participating institutions very soon.

I’m still worried that there may not be much variance in the data. It sure seems like that as I glance at each survey while I’m batching, scanning, and verifying them. But that might be selection bias so I just have to wait until I have data in a usable format.

It was also interesting to learn that one of the steps of the whole scanning-surveys-to-import-data process doesn’t work like I thought it did. In fact, the step – “monitoring” – doesn’t take place at all. I thought that there was a step after verifying where some survey instruments (1 out of every 10) were compared to the extracted data. Apparently CSR no longer monitors scans as the scanning process is so accurate that as long as the verification process is carried out accurately the data are accurate. In fact, monitoring not only makes the entire process longer but it may even introduce more error than it reduces. I thought that monitoring was an important part of the process but I trust the experts in CSR and their guidance.

Finally, I really need to get back to working on my dissertation proposal. I’m having trouble getting myself into a productive routine. I hoped that my time would be more open now that I’m done with coursework but it seems like I have even more demands on my time. I’m pulling back from some things (I recently resigned from the ResNet Applied Research Group and I stepped down from my regional NASPA leadership position, for example) but I still feel it necessary to continue some non-dissertation-related activities. I need to find a better balance. Or move to a planet with longer days and more hours to get things done.

Coverage and Prominence of U.S. College and University Wikipedia Articles

A colleague and I are presenting a paper at ASHE in a few months discussing the content of Wikipedia college and university articles.  The most common comment the reviewers made of our paper proposal was that we did not quite answer the “So what?” question.  In other words, we didn’t quite convince them that our topic is important and interesting.  Part of the answer lies in convincing you that U.S. college and university Wikipedia articles are (a) very common and (b) very popular.

First, let’s see how common U.S. college and university Wikipedia articles are.  To do this, I need to figure out how many institutions have a Wikipedia article.  I randomly selected 10% (732 units) of the 2008 IPEDS universe, a listing of every Title-IV-participating institution (i.e. virtually every accredited institution in the United States and its territories).  I then checked to see if these units have Wikipedia articles.  Broken down by sector and control and ignoring the handful of system offices and unclassified institutions pulled into the sample, here is what I found:

Table 1: Coverage of Wikipedia Articles

                          Less than 2-year   2-year    4-year    All
Public                    20.69%             87.16%    100.00%   82.04%
Private not-for-profit     9.09%             31.25%     91.28%   81.91%
Private for-profit        13.75%             40.21%     85.96%   35.03%
All                       14.50%             62.61%     92.26%   61.47%

Considering that most people in the U.S. think of 4-year institutions when they think of “college” or “university,” Table 1 shows us that it’s fair to say that college and university Wikipedia articles are very common.  Not only are they ubiquitous for public 4-year institutions, they’re very common for private 4-year institutions and community colleges.  The primary types of institutions for which they are uncommon are private 2-year institutions and all types of less than 2-year institutions, institutions typically associated with specialized technical training and usually omitted when talking about colleges and universities.
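
For anyone who wants to run a similar tally, here is a minimal Python sketch of the two steps behind Table 1: drawing a 10% random sample and computing the share of sampled institutions with an article in each control-by-level cell. It assumes a simple random sample, and the field names (`control`, `level`, `has_article`) and the toy records are made up for illustration; they are not the actual IPEDS variables or my data.

```python
import random
from collections import defaultdict

def draw_sample(universe, fraction=0.10, seed=0):
    """Return a simple random sample of roughly `fraction` of the universe."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    k = round(len(universe) * fraction)
    return rng.sample(universe, k)

def coverage_by_cell(sample):
    """Percent of sampled institutions with an article, per (control, level) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [with article, total]
    for inst in sample:
        cell = (inst["control"], inst["level"])
        counts[cell][0] += inst["has_article"]  # True counts as 1
        counts[cell][1] += 1
    return {cell: 100.0 * have / total for cell, (have, total) in counts.items()}

# Made-up records, only to show the shape of the input.
toy = [
    {"control": "Public", "level": "4-year", "has_article": True},
    {"control": "Public", "level": "4-year", "has_article": True},
    {"control": "Private for-profit", "level": "2-year", "has_article": False},
    {"control": "Private for-profit", "level": "2-year", "has_article": True},
]
print(coverage_by_cell(toy))
```

The "All" row and column would come from the same function with the cell key reduced to just the level or just the control.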

Next, we need to figure out the popularity of U.S. college and university Wikipedia articles.  In this context, I am defining “popular” by examining where the top three search engines – Google, Yahoo!, and Bing – place U.S. college and university Wikipedia articles.  To do this, I selected a random sample of these Wikipedia articles; the sample is also stratified, including 12 articles from each major quality classification assigned by WikiProject Universities (Featured, Good/A, B, C, Start, and Stub).

Table 2: Search Engine Placement

                                   Google   Yahoo!   Bing
Average placement                  6.9      2.3      2.3
Percentage first unofficial link   79%      96%      96%

As shown in Table 2, when you search for these institutions in each of the three leading search engines, Wikipedia articles are not only among the very first results but they’re usually the first result that isn’t controlled by the institution.  Google seemed to struggle with providing accurate results for the institutions that do not have unique names (e.g. Southwestern College, Sierra College), listing several other similarly-named institutions above the Wikipedia article.  Yahoo! and Bing did not have this problem, almost always listing the Wikipedia article immediately after the institution’s official website or immediately after the institution’s official website and the official athletics website (of course, Yahoo! and Bing provided the same results since they use the same search technology).
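
The two rows of Table 2 can be computed from per-search records along the following lines. This is only a sketch under my own assumptions about how the results are recorded: `rank` is the 1-based position of the Wikipedia article and `official_above` is the number of institution-controlled links listed above it. Both field names and the toy numbers are hypothetical.

```python
def summarize_placement(results):
    """Return (average rank, percent of searches where the article is the
    first link not controlled by the institution)."""
    avg_rank = sum(r["rank"] for r in results) / len(results)
    # The article is the first unofficial link when every link above it is official.
    first_unofficial = sum(r["rank"] == r["official_above"] + 1 for r in results)
    return avg_rank, 100.0 * first_unofficial / len(results)

# Made-up search results for one engine.
toy_results = [
    {"rank": 2, "official_above": 1},
    {"rank": 3, "official_above": 2},  # official site plus athletics site above
    {"rank": 7, "official_above": 1},  # similarly named institutions in between
    {"rank": 2, "official_above": 1},
]
print(summarize_placement(toy_results))
```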

Based on a random sample of the accredited colleges and universities in the United States, Wikipedia has articles for the majority of institutions.  This is particularly true when considering 2- and 4-year institutions, especially public ones.  Further, those Wikipedia articles are placed very highly in search results, usually immediately following the institution’s official website.  Not only are U.S. college and university Wikipedia articles very common, they’re extremely popular.

(The data are available here:

A few of the spreadsheets are rather large for Google spreadsheets so they’re a bit sluggish.  Sorry!)

Dissertation Journal: From Surveys to Data

Completed surveys from two of the eleven institutions participating in the first wave of data collection have arrived.  Now I’m working with my colleagues in IU’s Center for Survey Research (CSR) to transform these from a stack of completed surveys into an SPSS data file.  One of my colleagues in CSR likened this process to alchemy and I think he’s right!

One of the final steps in creating my survey instrument was to send it down to CSR for them to review and reformat it so their scanners can read it.  The main part of that process involved setting up their scanning program to read this instrument.  Not only did they have to indicate where to look for responses but also what the responses mean (e.g. a mark in this specific area is response number 3 to question 1).  This also involves telling the program how to record the responses (e.g. response number 3 to question 1 generates a value of “4” for the “compuse” variable).  As can be surmised from the previous example, this setup process also includes naming and defining all of the variables that will eventually end up in the SPSS file.
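
To make the idea of a scanning template concrete, here is a toy Python sketch of that mapping. It is not how CSR's scanning software works internally; it only illustrates the lookup from "a mark in this area" to "this value for this variable." The one entry taken from the example above is response 3 to question 1 producing 4 for `compuse`; the other entries are invented.

```python
# Hypothetical template: (question, response position) -> (variable name, recorded value).
TEMPLATE = {
    (1, 1): ("compuse", 2),  # invented value
    (1, 2): ("compuse", 3),  # invented value
    (1, 3): ("compuse", 4),  # the example from the text
}

def code_marks(marks, template=TEMPLATE):
    """Turn detected marks [(question, position), ...] into {variable: value}.

    A question with no mark, or with more than one mark, is left as None
    so that a person can check it during verification.
    """
    record = {var: None for var, _ in template.values()}
    seen = {}
    for question, position in marks:
        seen.setdefault(question, []).append(position)
    for question, positions in seen.items():
        if len(positions) == 1:  # exactly one clear mark
            var, value = template[(question, positions[0])]
            record[var] = value
    return record

print(code_marks([(1, 3)]))          # a single clear mark
print(code_marks([(1, 2), (1, 3)]))  # ambiguous: left for manual review
```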

Just as interesting and important as the automated processes are the manual processes that must be created, documented, and enacted.  Most of these are quality assurance or error checking processes.  For example, after a batch of surveys is scanned someone must manually review the places where the program is unsure (e.g. a large checkmark that spans multiple response boxes) or the response was too faint for the scanner to properly record (all “missing” values are checked to ensure they are actually missing and not a scanning error).  There are also a few points in the process where results are manually double-checked to provide quality assurance.

When the instruments are scanned, the data are inserted into a database.  Then the data have to be extracted from the database and inserted into an SPSS file.  Once the SPSS template is created (and checked and double-checked), inserting the data is fairly trivial.  It can get a bit tricky, however, if you’re merging in data from other sources.  In this instance, we’re merging the results from this survey with the results from these students’ BCSSE surveys but I’ll do that on the back end using SPSS instead of doing it on the front end with a database query; that will make it easier for me to merge these data into the institution-specific data files we return to participating institutions.  It’s also something I can do myself which gives me more control over and understanding of things (I don’t touch the database; that is all CSR).
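
I do the back-end merge in SPSS, but the underlying idea is just matching records on a shared identifier. Purely as an illustration, here is a small Python sketch of that kind of merge; the `student_id` key, the `bcsse_item` variable, and the toy rows are all hypothetical.

```python
def merge_on_id(survey_rows, bcsse_rows, key="student_id"):
    """Keep every survey row and add the BCSSE fields where the ID matches."""
    bcsse_by_id = {row[key]: row for row in bcsse_rows}
    merged = []
    for row in survey_rows:
        match = bcsse_by_id.get(row[key], {})  # empty if no BCSSE record exists
        merged.append({**match, **row})        # survey values win on name clashes
    return merged

# Made-up rows from the two sources.
survey = [{"student_id": 1, "compuse": 4}, {"student_id": 2, "compuse": 2}]
bcsse = [{"student_id": 1, "bcsse_item": 3}]
print(merge_on_id(survey, bcsse))
```

Students without a matching BCSSE record keep only their survey variables, which mirrors leaving those fields missing in the merged file.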

There are a lot of small details not described in the above overview and I’m really enjoying learning about this entire process.  It’s nice that my survey is a relatively small one: ~1600 one-page instruments.  That allows me to be very hands-on which (a) ensures that I understand the whole process and (b) saves me money because I don’t have to pay someone else to do these things.

There are still some unanswered questions, mostly those surrounding what to return to participating institutions and when to do so.  I wish I had an answer to some of those questions but I don’t.  Part of this is caused by the fact that I spend almost all of my time working on NSSE or occasionally FSSE where data are collected and reports generated at pre-determined and coordinated times.  BCSSE, on the other hand, uses a rolling schedule where we generate reports and return data to institutions as we receive their data.  That might not sound like a big difference but it’s not just a different process but a different mindset, one I had not fully anticipated or appreciated.

Finally, it’s tremendously exciting to see data at last!  We’re going through several test runs to ensure everything is set up properly and I understand how everything works.  I’ve been able to glance at a handful of surveys during testing but it only became real to me when I received the first (test) SPSS file with MY data from MY survey instrument.  It sounds silly to admit that a screenful of numbers is exciting and even exhilarating but it’s true.  I have quite a ways to go but through the haze I’ve glimpsed the light at the end of the tunnel reflecting off something shiny in the far, far distance.

http://www.indiana.edu/~csr/