Burton: Computing in the Social Sciences and Humanities


Computer Environments for Content Analysis:
Reconceptualizing the Roles of Humans and Computers


Dictionary-Based Programs: The Legacy of the General Inquirer

The General Inquirer was among the first computer programs for content analysis. As recently as the mid-1990s the General Inquirer remained the most often discussed computer program in the communication literature, dominating discussions of computer-assisted content analysis in methodology textbooks and monographs. Developed and refined by Stone and associates in the early 1960s, by 1966 the General Inquirer had already been applied to a wide variety of texts. 1 However, the General Inquirer has been used only by a small number of researchers. It fell into disuse not because better programs became available but because content analysts failed to embrace computer-assisted methods. In fact, computer techniques remain the exception rather than the rule in published content analyses. Because the General Inquirer gave birth to, and continues to shape, researchers' conceptions of computational content analysis, it is worth reviewing some limitations of the General Inquirer, limitations that are too often seen as limitations of computational content analysis in general rather than merely as limitations of a specific computer program.

      The General Inquirer is a dictionary-based program in that it assigns words to various researcher-defined categories. For example, the word money is assigned by one of the General Inquirer dictionaries (the Lasswell Value Dictionary) to the category "wealth." An occurrence of the word money in a text is an indicator of a concern with wealth, along with words such as resource, industry, and economy, which are also assigned to the "wealth" category. 2 The General Inquirer has been described as a theme-based program 3 or, somewhat less kindly but no less accurately, a "single-word-out-of-context" program. 4 The General Inquirer removes words from their linguistic context and treats them as indicators of the themes, concerns, or values that the dictionary was designed to assess. The General Inquirer washes out grammatical features of language to assess strictly lexical features, and it cannot readily assess relationships between words across sentences and paragraphs. In short, it is linguistically unsophisticated, and it limits users to the use of words (or phrases) as the unit of analysis.
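      The single-word-out-of-context logic can be illustrated with a minimal Python sketch. The dictionary below is a hypothetical toy built from the "wealth" words mentioned above plus one invented "power" entry. It is not the Lasswell Value Dictionary, and the function name is illustrative. The second call shows the limitation in action: a sentence denying any interest in the economy still registers as a concern with wealth.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary modeled on the "wealth" example in the text;
# the real Lasswell Value Dictionary is far larger and organized differently.
DICTIONARY = {
    "money": "wealth",
    "resource": "wealth",
    "industry": "wealth",
    "economy": "wealth",
    "war": "power",
}

def category_counts(text, dictionary=DICTIONARY):
    """Tally category hits word by word, ignoring all surrounding context."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(dictionary[w] for w in words if w in dictionary)

print(category_counts("The economy needs money, not war."))
# Counter({'wealth': 2, 'power': 1})
print(category_counts("He has no money and no interest in the economy."))
# Counter({'wealth': 2})  -- the negation is invisible to the tally
```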

      Of course, dictionary-based programs are appropriate for many research questions. Unfortunately, the General Inquirer leaves a great deal to be desired in terms of letting users create their own dictionaries or customize existing ones. The General Inquirer is customizable in principle, but in practice most users apply one of the two dictionaries currently packaged with the General Inquirer: the Lasswell Value Dictionary or the Harvard IV Psychosocial Dictionary. The program is also difficult to use because it retains many vestiges of its origins on early IBM mainframe computers. And in recent years its developers have seemed more interested in demonstrating the validity of the dictionaries across a variety of texts than in improving the program itself.

      Despite these limitations, the many accomplishments of General Inquirer developers and users deserve recognition. The rigor with which many General Inquirer users have pursued their research programs remains underappreciated and all too rarely matched by other content analysts. For example, the last two books published by General Inquirer users, although now more than ten years old, are exemplars of analytical clarity and methodological sophistication. 5 General Inquirer users were among the first to demonstrate that computers do not necessarily discourage careful scholarship in textual analyses (although some skeptics of quantitative approaches may remain unconvinced). Unfortunately, the rigor evident in much General Inquirer research is in the service of dictionary-based procedures, which are appropriate only for a subset of content analytic research questions.

      A better dictionary-based program has not come along to replace the General Inquirer. Indeed, there seems to be no market for one. The limitations of dictionary-based programs and the inability or unwillingness of researchers to think beyond these limitations left a legacy of disuse and misunderstanding of computer techniques for content analysis. Dictionary-based computer procedures contribute to the somewhat accurate belief that content analysts are obsessed with counting rather than understanding texts, that they are fascinated with texts but perversely uninterested in language per se. Computer techniques typically receive limited and cursory treatment in textbook discussions of content analysis. Computers are seen as adjuncts to content analysis proper; in fact, computational content analysis often is seen as an entirely separable subspecialty of sorts. Certainly, computers are not yet integrated into mainstream content analysis research programs.

Content in Context: The Linguistic Turn in Computational Content Analysis

In recent years there has been a growing movement to devise content analytic computer programs that are sensitive to issues of language and meaning and that permit researchers to more readily adapt the program to specific research questions and text genres. These two developments in content analytic computing—linguistic sophistication and program flexibility—are closely related. They represent a movement toward seeing computers not as surrogate researchers but rather as tools for gleaning information from texts without sacrificing context. They address a litany of complaints about the difficulty of constructing content analytic research designs that capture linguistic and relational aspects of texts. 6 Recent advances in computer programs for qualitative data analysis and, especially, clause-based content analysis suggest that researchers need no longer sacrifice context to exploit the power of computers in content analysis.

      There is a growing use of, and literature on, computer tools for qualitative research. Programs such as NUD*IST, NVivo, and HyperRESEARCH enable researchers to code texts online and to create, display, and explore relationships between segments of coded text. 7 Although these programs are not designed for content analysis per se, they impressively demonstrate the computer's ability to support context-sensitive text analysis.

      Roberts and Popping review programs that facilitate what they call clause-based text analyses. As Roberts and Popping note, theme-based programs such as the General Inquirer capture only one relevant type of text variable—frequency of occurrence of various themes—whereas clause-based programs allow researchers to examine the relationships between themes in texts. This is accomplished by using the computer to help generate and analyze relationally encoded texts. 8

      Computer-Assisted Evaluative Text Analysis (CETA) is an innovative clause-based program that facilitates the implementation of Osgood's evaluative assertion analysis as slightly reformulated by CETA developers. 9 CETA facilitates the parsing of texts into nuclear sentences that predicate something positive or negative about meaning objects (e.g., people, institutions, concepts, events) or about the relationship between meaning objects. The goal is not only to count how frequently various meaning objects are mentioned but also to determine the qualities (positive and negative) and other meaning objects with which a meaning object is associated (explicitly and implicitly) across the entire text under study. The Map Extraction, Comparison and Analysis (MECA) program also facilitates the discovery and analysis of relationships between words and phrases across entire texts. 10 Unlike theme-based programs, CETA and MECA capture rather than filter out relationships between text elements.

      Another promising clause-based program, Program for Linguistic Content Analysis (PLCA), facilitates Roberts's method of Linguistic Content Analysis by helping researchers discover and code basic linguistic patterns underlying the surface features of a text. 11 PLCA uses a dictionary-based approach in that frequently occurring verbs and nouns are assigned numerical codes, but it preserves the grammatical context of the coded words. Eltinge and Roberts used PLCA to determine the extent to which science is portrayed as a process of inquiry (as opposed to an accumulation of facts) in high school biology textbooks. They assigned words to various categories such that the textbook sentence "The diagram in Fig. 48-4 shows the results of a cross involving two characteristics" was rewritten in PLCA as "THE FIGURE REVEALS/SHOWS an EVENT's RESULT," where the words in uppercase were from the PLCA dictionary created by the researchers. 12 The rewritten sentence was meant to show a somewhat deeper, more basic linguistic structure. Thus, PLCA can be used to determine, say, how often a person (e.g., a student or scientist) rather than a disembodied noun (e.g., figure, laboratory, science) functions as an agent in textbook sentences.
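      A rough Python sketch of the idea, assuming coders have already parsed each sentence into a subject, verb, and object: dictionary words are rewritten as uppercase category codes while the grammatical roles are preserved, so agents can then be counted by category. The codes, word lists, and function names below are hypothetical and do not reproduce PLCA's actual numeric coding scheme.

```python
# Hypothetical researcher-built dictionary in the spirit of PLCA: surface words
# are rewritten as uppercase category codes. All codes here are illustrative.
NOUN_CODES = {
    "student": "PERSON", "scientist": "PERSON",
    "diagram": "FIGURE", "figure": "FIGURE",
    "laboratory": "PLACE", "science": "FIELD",
}
VERB_CODES = {
    "shows": "REVEALS/SHOWS", "reveals": "REVEALS/SHOWS",
    "tests": "INVESTIGATES", "observes": "INVESTIGATES",
}

def recode(clause):
    """Rewrite a coder-parsed (subject, verb, object) clause as codes.

    Words missing from the dictionary are left as-is (lowercase), following
    the convention that uppercase marks dictionary entries.
    """
    subj, verb, obj = clause
    return (NOUN_CODES.get(subj, subj),
            VERB_CODES.get(verb, verb),
            NOUN_CODES.get(obj, obj))

def share_of_person_agents(clauses):
    """Fraction of clauses whose grammatical subject is coded PERSON."""
    coded = [recode(c) for c in clauses]
    if not coded:
        return 0.0
    return sum(1 for subj, _, _ in coded if subj == "PERSON") / len(coded)

clauses = [
    ("diagram", "shows", "results"),
    ("scientist", "tests", "hypothesis"),
    ("student", "observes", "cells"),
    ("science", "reveals", "facts"),
]
print(recode(clauses[0]))               # ('FIGURE', 'REVEALS/SHOWS', 'results')
print(share_of_person_agents(clauses))  # 0.5
```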

      Franzosi has developed a computer program to facilitate text coding using the principles of semantic text grammar. 13 Unlike the PLCA approach, Franzosi's approach does not aim to rewrite sentences as more basic sentences per se but to filter out linguistic complexity deemed irrelevant for the research purpose at hand. The program helps coders identify and record basic linguistic information—subjects, actions, objects, and modifiers—that is retained in its grammatical context. The coded text is then entered into a relational database management system for analysis. Shapiro and Markoff use a similar computer-supported approach in an extraordinarily detailed study of public documents from eighteenth-century France. 14
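      To suggest why a relational database suits this kind of coding, the Python sketch below stores coded subject-action-object records with a modifier field in an in-memory SQLite table and then asks a relational question. The schema, column names, and records are invented for illustration; Franzosi's program and its data structures are not reproduced here.

```python
import sqlite3

def build_db(events):
    """Store coded (subject, action, object, modifier) records in a table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE event (id INTEGER PRIMARY KEY, subject TEXT, "
        "action TEXT, object TEXT, modifier TEXT)"
    )
    con.executemany(
        "INSERT INTO event (subject, action, object, modifier) VALUES (?, ?, ?, ?)",
        events,
    )
    return con

def targets_of(con, subject):
    """Which objects did a given subject act upon, and how often?"""
    return con.execute(
        "SELECT object, COUNT(*) FROM event WHERE subject = ? "
        "GROUP BY object ORDER BY object",
        (subject,),
    ).fetchall()

# Invented records for illustration only.
events = [
    ("workers", "strike against", "employer", "in March"),
    ("police", "arrest", "workers", "in March"),
    ("workers", "occupy", "factory", "in September"),
    ("workers", "petition", "employer", None),
]
con = build_db(events)
print(targets_of(con, "workers"))  # [('employer', 2), ('factory', 1)]
```

      Because each record keeps its grammatical roles, the same table can answer the reverse question (who acted upon workers?) without recoding.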

      These programs for qualitative and content analytic research have in common a responsiveness to issues of language and meaning and a flexibility that permits researchers to readily tailor the program to their specific needs (although these programs impose significant limitations, discussed later in this chapter). These programs also have in common features designed to support human coders, and in this respect they differ markedly from most dictionary-based programs.

From Computerized to Computer-Supported Content Analysis

Many new programs for content analysis, including most of those discussed in the preceding section, provide online support for human coders. CETA prompts coders in three early stages of analysis: defining scoring options, parsing text, and assigning numeric values to text features. MECA also offers interactive guidance to help coders choose coding options and assign numeric values. PLCA gives coders feedback by displaying reconstructed clauses immediately after their component parts are coded. The coder can choose to recode the clause if the reconstruction is inadequate. Deffner describes a simple but helpful program that prompts coders to assign numeric scores to text passages. 15

      In providing online support for human coders, these programs do much to redress another unfortunate legacy of dictionary-based programs such as the General Inquirer: the notion that computers can or should somehow automate content analysis. Adopting the terminology and enthusiasm of early research in artificial intelligence, developers of some early computer programs for content analysis spoke of automating the process, and some more recent authors continue to speak of "computerized content analysis," an unfortunate phrase that overstates the possibilities of computational content analysis and cultivates the misleading but nonetheless commonly asserted distinction between computational and manual content analysis. The phrases "computer-assisted" and "computer-aided content analysis" are preferable and have achieved limited usage. At the risk of being deemed overly sensitive to semantic niceties, I recommend the label "computer-supported content analysis" 16 because a focus on computer support or scaffolding for various tasks has proven quite useful in designing systems for human-computer interaction. Perhaps the most important feature of many of the recently developed programs for content analysis is their ability to support human coding. The developers of these programs recognize that for the foreseeable future humans will have to code texts if the coding is to be sensitive to the complexities of language and meaning. Programs such as CETA, MECA, and PLCA are designed to support rather than supplant human coders and therefore may help integrate computer techniques with mainstream content analysis.

      Despite the conceptual and technological improvements manifested in these programs, they have two important limitations. First, although they are more readily customizable than their forebears, these programs remain appropriate only for certain kinds of analysis, such as linguistic content analysis (PLCA) or evaluative assertion analysis (CETA). These areas of analysis are rich, and these programs can accommodate a fairly diverse range of research questions and materials, but these programs remain most useful to researchers who share the developers' theoretical perspectives. Second, these programs are limited to basic coding units no larger than sentences. They allow analysis of relationships across sentences, but coding must occur at the level of sentence, clause, or word. It is possible to aggregate sentence-, clause-, or word-level data, but for the researcher who wants to collect data only on larger coding units (e.g., paragraphs or entire texts), the coding of smaller units as an intermediary step will add unnecessary costs and complexity to the project.

      On one hand, content analysis programs designed to advance a particular theoretical approach—and that, as a result, necessarily support only a subset of all potentially useful coding tasks—are welcome in that they facilitate theoretical content analyses and in doing so address the long-standing (and largely justifiable) complaint that content analysis is too often atheoretical. Certainly, it is heartening to see content analysts use computer tools to develop more sophisticated content analytic research programs. On the other hand, the pluralistic nature of communication research and theory militates against the widespread adoption of any program tailored to a single theory. Indeed, the clause-based computer programs discussed in this chapter have not been widely adopted by content analysts, few of whom share the developers' particular theoretical commitments. Moreover, it is unreasonable to expect researchers to develop theory-specific programs in adequate numbers because many researchers lack the computer skills and resources needed to develop such programs.

Computer Environments for Content Analysis

There is a need, then, for a computer tool that can facilitate a wide range of human coding tasks and thereby permit computers to be brought to bear from a wide range of theoretical frameworks on a wide range of materials, both textual and visual. Programs such as CETA, MECA, and PLCA suggest that computers show great promise in supporting human coding tasks. 17 Deffner was perhaps the first to show how researchers might exploit the interactive capabilities of computers to facilitate human coding protocols. 18 Franzosi claims that many human coding tasks are best done online, and he outlines the desirable features and likely benefits of a system for computer-supported coding: The system should display the material to be coded, prompt coders to enter data (numeric or textual) in specific fields, provide an online code book and help function, and allow coders some choice in how to proceed through the coding protocol. 19 Franzosi claims that such a system would provide data that are richer, more reliable, and less costly to collect than the data collected in traditional manual coding procedures. More specifically, data errors would be reduced by eliminating the paper shuffling typical of manual procedures, providing online checks to confirm that entered values fall within a specified range and are logically consistent with values entered in other coding fields, and eliminating the need to keypunch data from paper forms (because coders enter the data directly into the computer). Data errors and other coding problems could also be detected more quickly because researchers would have immediate access to the data. Immediate access also offers researchers more flexibility to change coding protocols as unforeseen problems (and opportunities) arise and makes it possible to analyze and publish research results more quickly. Costs would be lower because online coding can be completed more quickly than manual coding.
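      The online range and consistency checks Franzosi describes can be sketched in a few lines of Python. The protocol below (field names, bounds, and the single cross-field rule) is a hypothetical example for coding news stories, not Franzosi's specification; a real system would run such checks as each field is entered.

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    low: int
    high: int

# Hypothetical coding protocol for news stories; fields and bounds are invented.
PROTOCOL = [
    Field("n_sources_quoted", 0, 50),
    Field("n_female_sources", 0, 50),
    Field("story_tone", -2, 2),
]

def check_record(record):
    """List problems an online form could flag as soon as a coder enters data."""
    problems = []
    for spec in PROTOCOL:
        value = record.get(spec.name)
        if value is None:
            problems.append(f"{spec.name}: missing")
        elif not spec.low <= value <= spec.high:
            problems.append(f"{spec.name}: {value} outside {spec.low}..{spec.high}")
    # Cross-field consistency: a subset count cannot exceed its total.
    if (record.get("n_female_sources") or 0) > (record.get("n_sources_quoted") or 0):
        problems.append("n_female_sources exceeds n_sources_quoted")
    return problems

print(check_record({"n_sources_quoted": 3, "n_female_sources": 1, "story_tone": 0}))
# []
print(check_record({"n_sources_quoted": 3, "n_female_sources": 5, "story_tone": 4}))
# ['story_tone: 4 outside -2..2', 'n_female_sources exceeds n_sources_quoted']
```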

      In short, Franzosi recommends that content analysts adopt a computer system similar to those used in computer-assisted telephone interviewing (CATI). CATI systems enjoy widespread use in survey research data collection because of their ability to facilitate reliable and cost-effective data collection. Franzosi suggests that similar benefits regarding data quality and research costs can be realized by content analysts who go online. Franzosi claims that his procedure of coding texts according to the principles of semantic text grammars is particularly (and perhaps uniquely) appropriate for implementation on a computer, but there is no reason why a computer system could not accommodate a wide variety of coding tasks (although such a system admittedly seems unnecessary for dictionary-based coding because most dictionary-based procedures can be readily handled by computer programs that need little or no human coding). There seems to be much to recommend a computer system that would permit researchers to create coding protocols (or select from a variety of customizable coding templates) to be completed online by coders who have access to online support. Such a system would provide the data collection and project management benefits typical of CATI systems to many content analysts, not just to those with extensive computer skills and resources, or to those whose theoretical preferences are already manifested in theory-specific software such as CETA or PLCA. A more general system for computer-supported content analysis would recognize and encourage the pluralism inherent in communication research.

      In addition, such a system could be implemented on a network, thereby facilitating research projects that involve research collaborators and coders at distant sites.

      Finally, such a system would facilitate the content analytic study of images, from drawings to photographs to television footage. Content analysts have in recent years moved away from an exclusive focus on texts to the role of images in our symbolic environment, but to date little attention has been paid to the need for a computer tool to facilitate the study of images. Evans reviews the rapidly growing field of video information retrieval, suggesting how social scientists can exploit the many new systems that aim to index and retrieve video content. 20 These systems have been designed primarily to help corporations manage digital video repositories, but as Evans notes, these systems hold great promise in supporting content analysis of film and television.

      The computer tool suggested here would be applicable to linguistic as well as nonlinguistic phenomena. Indeed, it could provide interactive support for assigning numeric values to artifacts from almost any communicative act.

      Such a tool would be worthwhile simply because it provides for content analysts the data quality and cost-effectiveness benefits typical of CATI systems. But it can also be argued that computer-supported content analysis can increase the sophistication of content analytic research design and play an important role in advancing content analytic theory. Franzosi suggests that the computer is an ideal environment in which to implement sophisticated content analytic research designs, 21 but to date there is little empirical evidence that shows under what conditions, or even whether, computer-supported coding is more effective than traditional paper-and-pencil methods (indeed, there is a paucity of research on the effectiveness of content analytic procedures in general). We need to know more about what coding tasks are best accomplished in a computer environment and why. Still, it seems likely that a computer system to support human coding can facilitate research designs that are more sophisticated than those possible using paper-and-pencil methods.

      The CATI-like system recommended here is not necessarily preferable to theory-specific content analysis programs. In fact, theory-specific programs seem more desirable in that they more readily support and implement theory-driven research designs. The system recommended here is meant only to provide a way for many researchers, with disparate research interests, to move quickly into a computer environment. As discussed later in this chapter, this general system may even encourage the development of more sophisticated, theory-specific programs.

Magic in Computer-Supported Content Analysis: Artificial Intelligence

Almost 50 years ago, Berelson, a seminal figure in content analysis, warned, "Content analysis, as a method, has no magical qualities—you rarely get out of it more than you put in, and sometimes you get less. In the last analysis there is no substitute for a good idea." 22 This warning, or something like it, is often echoed in treatises on content analysis and is especially common in textbook discussions of computer-supported content analysis. In his influential 1969 textbook, Holsti quotes Berelson approvingly and adds, "Development of computer content analysis programs detracts nothing from the wisdom of [Berelson's] assertion." 23

      In contrast, I argue that the time has come to expect some computer magic in content analysis. Though certainly valid as a warning about unsophisticated research designs, sentiments like Berelson's also promote a misleading separation between our theories and research designs and the tools with which they are implemented, especially when these sentiments are ritually repeated in contemporary textbook discussions of computer-supported content analysis. It is no longer tenable to presume that computers cannot help content analysts discover important patterns in their data, patterns that researchers neither intended to investigate nor would have discovered without computer tools. Given recent advances in artificial intelligence, it is no longer tenable to presume that productive insights must be the exclusive province of the content analyst rather than his or her computer.

      Social scientists interested in computer modeling and simulation have adopted artificial intelligence techniques with promising results, and artificial intelligence techniques have already led to several promising advances in computer programs for content analysis. Both CETA and MECA feature inference engines that can identify implicit relationships between text elements. For example, if one clause of a newspaper editorial notes that "Politician X supports Senate bill Y" and a later clause (even many paragraphs away) opines that "Senate bill Y may result in increased unemployment," CETA will make the logical inference that the source of the statements (in this case the editorial writer) implies that "Politician X may cause an increase in unemployment." Furthermore, because CETA implements basic principles of evaluative assertion analysis, it probably will "know" (or certainly be told by coders) that the word unemployment is evaluated negatively by most people. Thus, CETA can also generate the inference that the source suggests that "Politician X may cause something bad to happen." 24
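      The inference in this example can be approximated by a small chaining rule, sketched here in Python. This is a deliberate simplification: every relation is treated as a transitive "may cause" link, and evaluations are supplied by the coder as -1 or +1. CETA's actual inference engine implements evaluative assertion analysis far more carefully, and the data format and function name below are assumptions.

```python
# Coder-parsed assertions: (source, subject, relation, object).
ASSERTIONS = [
    ("editorial", "Politician X", "supports", "Senate bill Y"),
    ("editorial", "Senate bill Y", "may result in", "increased unemployment"),
]
# Coder-supplied evaluations of meaning objects: -1 negative, +1 positive.
EVALUATION = {"increased unemployment": -1}

def infer(assertions, evaluation):
    """Chain A->B and B->C from the same source into an implied A->C,
    then attach the evaluation of C when one is known."""
    inferred = []
    for src1, a, _, b in assertions:
        for src2, b2, _, c in assertions:
            if src1 == src2 and b == b2 and a != c:
                inferred.append((src1, a, "may cause", c))
                sign = evaluation.get(c, 0)
                if sign:
                    outcome = "something bad" if sign < 0 else "something good"
                    inferred.append((src1, a, "may cause", outcome))
    return inferred

for claim in infer(ASSERTIONS, EVALUATION):
    print(claim)
# ('editorial', 'Politician X', 'may cause', 'increased unemployment')
# ('editorial', 'Politician X', 'may cause', 'something bad')
```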

      Developers of programs for qualitative data analysis recently have adopted several promising artificial intelligence techniques. HyperRESEARCH supports the use of production rules similar to those typical of expert system software to help researchers discover and create relationships between coded text segments and to formulate and test hypotheses about these relationships. 25 Analysis of Qualitative Data (AQUAD) also supports users in formulating and testing hypotheses about textual data. AQUAD is written in the logic programming language Prolog and allows users to select from several Prolog procedures for hypothesis testing or to create customized procedures. 26 Although programs for qualitative data analysis typically provide only minimal support for quantitative analysis, content analysts are well advised to look into these programs to get a sense of how artificial intelligence techniques are already being brought to bear, albeit in a limited way, on problems of text analysis.
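      The expert-system style of reasoning mentioned here can be illustrated with a tiny forward-chaining rule engine over coded cases, written in Python. The rule format, code names, and cases are hypothetical; HyperRESEARCH's actual rule syntax and AQUAD's Prolog procedures are not reproduced.

```python
# Codes a human coder applied to segments of each case (invented data).
CASES = {
    "interview_01": {"job_loss", "blames_government"},
    "interview_02": {"job_loss", "blames_self"},
    "interview_03": {"blames_government"},
}
# Hypothetical production rules: IF all antecedent codes present THEN add consequent.
RULES = [
    ({"job_loss", "blames_government"}, "politicized_grievance"),
    ({"job_loss", "blames_self"}, "privatized_grievance"),
    ({"politicized_grievance"}, "candidate_for_followup"),
]

def apply_rules(cases, rules):
    """Forward-chain over copies of the code sets until no rule fires."""
    result = {name: set(codes) for name, codes in cases.items()}
    changed = True
    while changed:
        changed = False
        for codes in result.values():
            for antecedent, consequent in rules:
                if antecedent <= codes and consequent not in codes:
                    codes.add(consequent)
                    changed = True
    return result

def cases_with(cases, code):
    """Cases that support a hypothesis stated as 'code occurs'."""
    return sorted(name for name, codes in cases.items() if code in codes)

coded = apply_rules(CASES, RULES)
print(cases_with(coded, "candidate_for_followup"))  # ['interview_01']
```

      The third rule fires only because the first one did, which is the kind of chained inference that distinguishes production rules from a simple code search.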

      As promising as artificial intelligence is for content analysis, it probably will be many years before a system is developed that can fully automate a clause-based content analytic procedure of even minimal sophistication. Even the most advanced programs for natural language processing of news stories, for example, are effective only in a narrow range of tasks within a restricted range of texts. Fan found no viable artificial intelligence applications that would reliably identify and code passages in wire service texts that suggested a positive, negative, or neutral stance toward specific political issues such as the presence of U.S. troops in Lebanon. 27 Fan devised a "successive filtration" method using iterative computer procedures, customized for each topical issue and revised after each iteration, that successively remove irrelevant text, eventually leaving only the material crucial to the coding task, which is then accomplished in the last iteration. In doing so, Fan cleverly mimics a sophisticated natural language parsing routine, with impressive results. Fan's method illustrates the power of combining natural language text processing with human coding. 28 This combination exploits the computer's ability to parse large amounts of text and the researcher's ability to devise and apply meaning-sensitive coding protocols. As Fan's dilemma and solution suggest, both artificial and human intelligence are desirable in content analysis.
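      The general shape of successive filtration can be sketched in Python as a chain of passes, each discarding more irrelevant text, followed by a final coding pass. Everything concrete here is an assumption: the unit is the sentence, and the filters are simple keyword lists invented for the Lebanon example. Fan's actual filters were customized per issue and revised between iterations, and they were more elaborate than this.

```python
import re

def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def mentions(words):
    """Return a predicate that is true if a sentence contains any of the words."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.I)
    return lambda sentence: bool(pattern.search(sentence))

# Hypothetical filters for one issue; each pass discards more irrelevant text.
STAGES = [
    mentions(["Lebanon", "Beirut"]),            # pass 1: on topic
    mentions(["troops", "Marines", "forces"]),  # pass 2: on the specific issue
]
PRO = mentions(["should stay", "must remain"])
CON = mentions(["withdraw", "pull out", "bring home"])

def filter_and_code(text):
    kept = split_sentences(text)
    for stage in STAGES:
        kept = [s for s in kept if stage(s)]
    coded = []
    for s in kept:  # final pass: code the stance of what survived
        pro, con = PRO(s), CON(s)
        stance = "pro" if pro and not con else "con" if con and not pro else "neutral"
        coded.append((s, stance))
    return coded

story = ("Beirut was calm today. Officials said the Marines should stay in Lebanon. "
         "Critics urged Congress to bring home the troops from Lebanon. "
         "The budget debate continued.")
for sentence, stance in filter_and_code(story):
    print(stance, "|", sentence)
# pro | Officials said the Marines should stay in Lebanon.
# con | Critics urged Congress to bring home the troops from Lebanon.
```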

      Fan's work also illustrates the usefulness of developing intelligent computer applications tailored to specific content analytic problems and theories. Indeed, problem- and theory-specific programs may be all that are viable, at least for the foreseeable future. Like the field of communication, artificial intelligence is a diverse field that offers several (sometimes incompatible) approaches to text analysis. Given the constraints of current artificial intelligence techniques, any intelligent program for content analysis probably will use only a handful of available techniques, be applicable only to a restricted range of texts, and support only a subset of content analytic research designs. This reality conflicts with traditional views (and hopes) about the role of computers in content analysis. The developers and users of the General Inquirer have long suggested that the General Inquirer can handle all or most of the computer needs of content analysts. The limitations of the General Inquirer often are noted with admirable insight, 29 but it is seldom suggested that alternative computer approaches are desirable or even viable. Even the name of the program, the General Inquirer, boasts of its allegedly wide applicability. Thus, the notion that there must be numerous computer approaches to accommodate the wide variety of content analytic concerns is perhaps puzzling to researchers who are accustomed to routine popular news accounts of great advances in computing.

      The idea that content analysts may have to develop the computer techniques most appropriate for their research may even be distressing to many. Certainly, only a handful of content analysts have developed computer tools to support their work. Forty years ago, the resources and skills needed to develop computer tools for content analysis were immense; the General Inquirer was a heroic achievement given the limitations of computers in the early 1960s. Fortunately, in recent years there has been rapid growth in the availability of software to support the development of intelligent systems. The success of commercial expert systems development tools is but one example of this trend. Thus, content analysts need not necessarily be programmers to exploit intelligent systems, but of course programming skills will be helpful (as will interdisciplinary initiatives with computer scientists; there have been far too few such initiatives). Although they may have to acquire a new set of skills, content analysts need not wait for others to provide sophisticated computer tools for content analysis.

      In fact, it is crucial that researchers not wait for others to develop intelligent computer tools for content analysis. Researchers should begin to develop computer tools that embody and extend specific aspects of their own expertise in understanding texts and images. They might develop techniques to automate some—although probably not all—of the coding tasks relevant to certain problems. Even some of the more sophisticated coding tasks can be automated or at least supported by artificial intelligence techniques. Just as research in artificial intelligence has advanced our understanding of human cognition, content analysts who attempt to build their expertise into intelligent systems may in doing so learn much about the processes and possibilities of content analysis.

      It should be noted that the recommendation made in the preceding section to develop a single computer system to support a wide variety of human coding tasks may seem incompatible with a call to develop theory- and problem-specific programs and to automate as much of the coding process as possible. However, a general system may provide an environment in which to implement intelligent functions, either to support human coders or to automate a part of the coding protocol. In this sense, the general system recommended earlier is designed to take full advantage of, and in fact encourage, developments in artificial intelligence that may ultimately, albeit slowly, lead to fully automated content analyses. In the long term, it seems plausible—indeed, likely—that even very sophisticated content analytic projects can be automated. A general computer environment for human coding is recommended here as an interim step between predominantly manual and predominantly computerized coding.

      Ultimately, we can expect a system that will monitor, process, and code texts and images with little human intervention. This system may be able to retrieve and manage great quantities of material and may actively identify opportunities for content analysis, devise and test content analytic hypotheses, and even learn as it does so. In other words, it is now feasible to begin working toward a kind of magic in content analysis. Furthermore, this magic is appropriate and even necessary if content analysts are to take full advantage of opportunities afforded by the emerging era of electronic databases and interactive media.

Content Analysis in the Era of Electronic Media

Communication researchers must strive to exploit for scholarly purposes the torrent of electronic information flowing from news outlets and other sources worldwide. They must also learn to mine more effectively the increasing number of online databases. The abundance of electronic texts from a wide variety of sources, created for a wide variety of audiences, makes it easier than ever before to test content analytic hypotheses. Computers enable content analysts to more effectively locate, manage, and process texts, of course, but computers may also help improve content analytic practice and theory by facilitating less costly and more sophisticated research designs.

      The growth of digital image repositories and the increasing sophistication of systems for storing and retrieving these images give content analysts an opportunity to turn their attention to image analysis. Images have long been said to fall within the purview of content analysis, and they are increasingly salient in contemporary media. Even so, content analysts have done little to document and explain the role of images in our symbolic environment. Computer tools for assigning numeric values to image features are now feasible. As noted earlier, a general computer system for content analysis could support human coding of images. In addition, some basic image analysis tasks can already be automated. Systems now emerging in the commercial marketplace detect objects in images, parse video into its constituent shots and scenes, and use face and speech recognition techniques to identify people in video sequences. 30 Even the seemingly intractable problems of automating moving-image analysis are being solved quickly.
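
      The following sketch illustrates one of these basic tasks. It detects abrupt shot changes ("cuts") by comparing the grayscale histograms of consecutive frames, a common baseline technique in video indexing. It is not the method used by any commercial system or by the work cited above. For simplicity, frames are assumed to arrive as non-empty flat lists of 0-255 pixel intensities. The eight-bin histogram and the 0.5 threshold are arbitrary choices that would need tuning on real material.

```python
def histogram(frame: list[int], bins: int = 8) -> list[float]:
    """Normalized grayscale histogram of a non-empty frame of 0-255 pixel intensities."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def histogram_distance(h1: list[float], h2: list[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for non-overlapping ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Return indices i at which a new shot is judged to begin (a cut between frames i-1 and i)."""
    hists = [histogram(frame) for frame in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


if __name__ == "__main__":
    dark, bright = [20] * 100, [230] * 100  # two synthetic 100-pixel "frames"
    print(detect_cuts([dark, dark, bright, bright, dark]))  # [2, 4]
```

      Because this approach compares only adjacent frames, it catches hard cuts but tends to miss gradual transitions such as dissolves and fades. That limitation is one reason production systems rely on more elaborate methods.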

      The nature of media content itself will change in the emerging era of electronic and interactive media, and content analysts must develop new theories and technologies to understand this change. We are entering an era in which content metadata will be common. Bauer and Scharl demonstrate how social scientists can exploit HTML tags to facilitate analysis of Web page content (a technique most Internet search engines also use). 31 Looking beyond HTML, we can expect online texts to be marked up increasingly in XML, which lets Web developers define their own tags carrying rich data about the structure and content of a text. Emerging standards for digital video, such as MPEG-7, will enable video producers to attach content descriptions and other data to video files. In short, many of the texts and images distributed and consumed around the world will soon carry content metadata, and content analysts should prepare now to exploit them.
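
      The sketch below uses only Python's standard html.parser module to show the general idea that markup can tell an analyst where a word appears. It tallies words separately for each enclosing tag, so that a term in a title or headline can be distinguished from the same term in body text. It also collects the page's meta name/content pairs as ready-made metadata. This sketch does not reproduce Bauer and Scharl's procedure, and the class name TagWordCounter is hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagWordCounter(HTMLParser):
    """Tally word frequencies separately for each enclosing HTML tag and
    collect <meta name=... content=...> pairs as page-level metadata."""

    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []            # currently open tags
        self.counts: dict[str, Counter] = {}  # tag -> word frequencies
        self.meta: dict[str, str] = {}        # meta name -> content

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") and attributes.get("content"):
                self.meta[attributes["name"].lower()] = attributes["content"]
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        words = [w.strip(".,;:!?\"'()").lower() for w in data.split()]
        words = [w for w in words if w]
        if words:
            tag = self.stack[-1] if self.stack else "(none)"
            self.counts.setdefault(tag, Counter()).update(words)


if __name__ == "__main__":
    page = ("<html><head><title>Budget news</title>"
            '<meta name="keywords" content="economy, budget"></head>'
            "<body><h1>Economy slows</h1>"
            "<p>The economy slowed as money grew tight.</p></body></html>")
    parser = TagWordCounter()
    parser.feed(page)
    parser.close()
    print(parser.counts["h1"], parser.meta)
```

      The same idea should carry over to XML. There, the tags most relevant to a study could be read with a standard XML parser rather than inferred from presentation markup.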

      The flexible, interactive structure of hypertext and hypermedia requires that content analysts develop new conceptions of media content (e.g., where is the "content" of a hypermedia document that may offer tens of thousands of paths through the material?). A discussion of content analytic frameworks for interactive media is beyond the scope of this chapter, 32 but it seems clear that new computer tools will be needed if content analysts are to devise and implement sophisticated procedures for assessing interactive media.
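
      A little arithmetic makes the parenthetical question concrete. The Python sketch below counts the distinct reading paths through a hypertext modeled as a directed acyclic graph of links. The layered site it builds is hypothetical: a home page links to ten pages, each of those links to all ten pages of the next layer, and so on for four layers. That site has only 41 pages, yet it offers 10^4 = 10,000 distinct paths from the home page to a final page, and each additional layer multiplies the count by ten. The acyclic assumption matters. A real site with back links offers unboundedly many paths unless path length is capped.

```python
from functools import lru_cache


def count_paths(links: dict[str, list[str]], start: str) -> int:
    """Count distinct reading paths from `start` to any page with no outgoing links.

    Assumes the link graph is acyclic.
    """
    @lru_cache(maxsize=None)
    def paths_from(page: str) -> int:
        targets = links.get(page, [])
        if not targets:
            return 1  # a page with no outgoing links ends one path
        return sum(paths_from(target) for target in targets)

    return paths_from(start)


def layered_site(layers: int, width: int) -> dict[str, list[str]]:
    """Hypothetical site: 'home' links to every page in layer 1, and every
    page in layer i links to every page in layer i + 1."""
    links = {"home": [f"1-{j}" for j in range(width)]}
    for i in range(1, layers):
        for j in range(width):
            links[f"{i}-{j}"] = [f"{i + 1}-{k}" for k in range(width)]
    return links


if __name__ == "__main__":
    site = layered_site(layers=4, width=10)
    pages = {"home"} | {t for targets in site.values() for t in targets}
    print(len(pages), count_paths(site, "home"))  # 41 10000
```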


Currently, many content analysts use information retrieval and database management programs to help locate and manage material to be coded, and many exploit the increasingly sophisticated data analysis and display features of statistical packages to assist in exploring and understanding the data that result from coding. But computers are still seen as adjuncts, at best, in the research process. Most content analysts seem unaware of the possibility of computers serving as full partners in formulating research hypotheses, creating and implementing coding protocols, and analyzing and modeling data. In fact, many continue to doubt the ability of computers to support these tasks, as evidenced by the wariness often expressed in textbooks regarding what is seen as undue reliance on computers in content analysis. This wariness was justifiable in the era of dictionary-based programs, and it reflects an admirable concern that content analysts continue to strive for sophistication. But it also poses a danger that content analysts will overlook relevant innovations in the fields of information retrieval, human-computer interaction, and artificial intelligence and therefore fail to develop tools to encourage sophisticated content analytic research in the emerging information era.

      In this chapter I have assiduously avoided arguments about the usefulness of content analysis, computer-supported or otherwise, as a method of understanding communication processes. I have asserted only that computers can make content analysis more sophisticated, and I leave the reader to judge the sophistication of current content analytic approaches. I am more concerned with challenging the notion that computers have little to offer content analysts or may even hinder the content analyst who relies on them. It seems clear that computers can help in many ways, from reducing research costs to improving data quality. Certainly, we need no longer worry that computers are too stupid to help, that they are merely calculating machines that cultivate naive reductionism among the content analysts who use them. Computers are getting smarter at a rate far exceeding any improvements in human thinking skills. Much of the expertise content analysts bring to bear on research problems can be built into computer systems, enabling content analysts to refine, enhance, and extend their thinking.

      The emerging approaches to computer-supported content analysis reviewed in this chapter are not panaceas. All the computer programs discussed here remain so limited that they have found only a handful of users. These programs must be further refined, and new, more sophisticated programs must be created. Still, the programs discussed here suggest a new and invaluable conceptual framework in which computers are seen not as adjuncts to content analytic research but as tools that can be integrated with human expertise throughout the research process. 33 In the near future, it may be unnecessary to distinguish computer-supported from noncomputational content analysis. We might once again speak simply of "content analysis," because computers will be both ubiquitous and transparent in the research process. Our methodology textbooks (or whatever hypermedia products replace them) will seldom need to reify what is already a somewhat arbitrary distinction between computers and humans. This new conceptual framework can be expected to encourage, and is in fact a necessary condition for, content analysis research programs that are responsive to the complexities of language and meaning and capable of assessing twenty-first-century communication media and practices.


      1. Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie, The General Inquirer: A Computer Approach to Content Analysis (Cambridge, Mass.: MIT Press, 1966).

      2. To be more specific, the words money, resource, industry, and economy are assigned to the category "wealth-other," which is one of three wealth-related categories (along with "wealth-participant" and "wealth-transaction"). The three categories can be collapsed into a single "wealth" category, designated "wealth-total." See Robert Philip Weber, Basic Content Analysis, 2d ed. (Newbury Park, Calif.: Sage, 1990).

      3. Roel Popping, Computer-Assisted Text Analysis (Thousand Oaks, Calif.: Sage, 2000).

      4. Klaus Krippendorff, Content Analysis: An Introduction to Its Methodology (Beverly Hills, Calif.: Sage, 1980), 126.

      5. J. Zvi Namenwirth and Robert Philip Weber, Dynamics of Culture (Boston: Allen & Unwin, 1987); Weber, Basic Content Analysis.

      6. See Hans Mathias Kepplinger, "Content Analysis and Reception Analysis," American Behavioral Scientist 33 (November/December 1989): 175-82; Carl W. Roberts, "Other Than Counting Words: A Linguistic Approach to Content Analysis," Social Forces 68 (September 1989): 147-77; Krippendorff, Content Analysis.

      7. Sharlene Hesse-Biber, Paul Dupuis, and T. Scott Kinder, "HyperRESEARCH: A Computer Program for the Analysis of Qualitative Data with an Emphasis on Hypothesis Testing and Multimedia Analysis," Qualitative Sociology 14 (Winter 1991): 289-306; Lyn Richards, Using NVivo in Qualitative Research (Thousand Oaks, Calif.: Sage, 1999). For overviews of software for qualitative data analysis, see Melina Alexa and Cornelia Zuell, A Review of Software for Text Analysis (Mannheim, Germany: Zentrum für Umfragen, Methoden und Analysen, 1999); Nigel G. Fielding and Raymond M. Lee, Computer Analysis and Qualitative Research (Thousand Oaks, Calif.: Sage, 1998).

      8. Carl W. Roberts and Roel Popping, "Computer-Supported Content Analysis: Some Recent Developments," Social Science Computer Review 11 (Fall 1993): 283-91, quote on 284.

      9. Jan J. Cuilenburg, Jan Kleinnijenhuis, and Jan A. de Ridder, "Artificial Intelligence and Content Analysis: Problems of and Strategies for Computer Text Analysis," Quality and Quantity 22 (Spring 1988): 65-97.

      10. Kathleen Carley and Michael Palmquist, "Extracting, Representing, and Analyzing Mental Models," Social Forces 70 (March 1992): 601-36.

      11. Roberts, "Other Than Counting Words."

      12. Ibid., 69.

      13. Roberto Franzosi, "From Words to Numbers: A Generalized and Linguistics-Based Coding Procedure for Collecting Textual Data," Sociological Methodology (Fall 1988): 263-98.

      14. Gilbert Shapiro and John Markoff, Revolutionary Demands: A Content Analysis of the Cahiers de Doléances of 1789 (Stanford, Calif.: Stanford University Press, 1998).

      15. Gerhard Deffner, "Microcomputers as Aids in Gottschalk-Gleser Ratings," Psychiatry Research 18 (Spring 1986): 151-59.

      16. Roberts and Popping, "Computer-Supported Content Analysis," 283 (emphasis added).

      17. Many computer programs for qualitative data analysis offer substantial and even ingenious support for qualitative coding tasks, but these programs offer little support for quantitative coding. Although these programs may be appropriated for some quantitative coding tasks, they are not widely applicable to the needs of content analysts.

      18. Deffner, "Microcomputers as Aids in Gottschalk-Gleser Ratings."

      19. Roberto Franzosi, "Strategies for the Prevention, Detection, and Correction of Measurement Error in Data Collected from Textual Sources," Sociological Methods and Research 18 (May 1990): 442-72.

      20. William Evans, "Teaching Computers to Watch Television: Content-Based Image Retrieval for Content Analysis," Social Science Computer Review 18 (Fall 2000): 246-57.

      21. Roberto Franzosi, "Computer-Assisted Content Analysis of Newspapers: Can We Make an Expensive Research Tool More Effective?" Quality and Quantity 29 (May 1995): 157-72.

      22. Bernard Berelson, "Content Analysis," in Handbook of Social Psychology, ed. G. Lindzey (Reading, Mass.: Addison-Wesley, 1954), 488-518, quote on 518.

      23. Oli R. Holsti, Content Analysis for the Social Sciences and Humanities (Reading, Mass.: Addison-Wesley, 1969), 194.

      24. Cuilenburg et al., "Artificial Intelligence and Content Analysis."

      25. Hesse-Biber et al., "HyperRESEARCH."

      26. Gunter L. Huber and Carlos Marcelo Garcia, "Computer Assistance for Testing Hypotheses about Qualitative Data: The Software Package AQUAD 3.0," Qualitative Sociology 14 (Fall 1991): 289-306.

      27. David P. Fan, Predictions of Public Opinion from the Mass Media: Computer Content Analysis and Mathematical Modeling (Westport, Conn.: Greenwood, 1988).

      28. Unlike many other systems discussed in this chapter, Fan's system provides no online support for human coding. Rather, a human intervenes after each iteration (or "filtration") to review search results and to specify search criteria for the next iteration.

      29. See Philip J. Stone, "Thematic Text Analysis: New Directions for Analyzing Text Content," in Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts, ed. C. W. Roberts (Mahwah, N.J.: Erlbaum, 1997), 35-54; Weber, Basic Content Analysis.

      30. Evans, "Teaching Computers to Watch Television."

      31. Christian Bauer and Arno Scharl, "Quantitative Evaluation of Web Site Content and Structure," Internet Research 10 (Spring 2000): 31-41.

      32. But see William Evans, "Content Analysis in an Era of Interactive News: Assessing 21st Century Symbolic Environments," in The Electronic Grapevine: Rumor, Reputation, and Reporting in the New On-Line Environment, ed. D. Borden and K. Harvey (Mahwah, N.J.: Erlbaum, 1998), 161-71.

      33. Gilbert Shapiro, "The Future of Coders: Human Judgments in a World of Sophisticated Software," in Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts, ed. C. W. Roberts (Mahwah, N.J.: Erlbaum, 1997), 225-38.





    The content of this electronic work is intended for personal, noncommercial use only. You may not reproduce, publish, distribute, transmit, participate in the transfer or sale of, modify, create derivative works from, display, or in any way exploit this electronic work in whole or in part without the written permission of the Board of Trustees of the University of Illinois.

© 2012 by the Board of Trustees of the University of Illinois
All rights reserved