Getting a handle on large research projects

I am drawn to big projects. Small projects are easier to manage: if I focus on a single well-defined question, and answer it using a narrowly circumscribed set of source material, I can go from idea to article in less than a year, before my thoughts and notes and sources become too unwieldy to handle with just one brain. Done well, such small projects can be great contributions, and they do wonders for one’s c.v. But sometimes our curiosity, our data, and the good of the field compel us to be more ambitious: to take on questions too complex and datasets too vast for our unaided minds. Such a venture requires better infrastructure, scaffolding built around one’s brain, lest the project collapse under its own weight and leave us retracing our steps to recapture past insights that have disappeared in the rubble. Here are some ideas and advice from my own experience wrangling big projects.

I was faced with the problem of scale in my first book, The Formation of Islamic Hermeneutics, and I am facing it again in my present work on contemporary Qur’anic hermeneutics, which I anticipate will occupy me until retirement. A smaller project like an article might start with a hypothesis, a projected narrative, and a tentative outline into which I can incorporate relevant notes as I dig through my sources; I know from the outset what notes I will need, and where they are likely to fit into the final product. Such a task requires only a word processor and some time. But when I begin with nothing but curiosity, facing a landscape of material that no one has ever mapped before, I can’t (and shouldn’t) begin with a projected outline already in mind. How do I proceed? Do I just read and read until a hypothesis starts to emerge out of the mist—and then, once I know where I’m going, start all over and reread everything so that I can take the notes I need and fit them into my outline? By then I will have forgotten most of what I’ve read. I keep stumbling upon old notes about things I don’t remember ever reading.

That is why some systematic method of organizing source materials is essential. For me, those materials are mostly books, manuscripts, and articles. I still use shelves full of books, and filing cabinets full of articles, in which the key sentence of every paragraph, as well as anything else potentially useful for my project, is underlined in pencil. My guiding principle in reading is that I should never have to reread anything; scanning the underlined words should be all I ever need to do again, even ten years from now. And everything I read (or think I should read) gets entered in a master bibliography, a Word file with comments indicating when I read or skimmed each work, where I filed it, how I might use it someday, and a summary of my impressions and the main points. I also change the font color in the bibliography to dark blue when I have verified the citation information, so that I never have to recheck it. All that may add 25% to my reading time, but that moment when I have just finished reading something, while the whole picture is still fresh in my mind, is a moment I cannot afford to waste; I want to capture it in a place that I can always come back to: not in notes for my current project but in my master bibliography.

A screenshot from my master bibliography

You probably have your own fancier methods to make this more efficient: software for citations and note-taking, annotated PDF libraries, and so on. For me the value of visual markup on the printed page is so great (I can review a previously marked-up article in two minutes) that PDFs are no substitute, but I do keep them on my hard drive for the occasional full-text search, when I can’t remember where I read something. Your own system will be dictated by your own mental habits. But have a system! That extra investment of time will be richly rewarded in the long run.

Often, though, I don’t need notes about a whole book or article; I need notes about specific ideas, sentences, or words. The legal theory works that I studied for my first book often quoted earlier legal theorists whose works are now lost, and I needed a way to gather and organize those references, not by their source but by the person referred to and by the topic of the reference. What I needed was, in effect, old-fashioned 3-by-5 index cards: thousands of them, one for every reference to a prior theorist’s view. Josef van Ess produced his massive Theologie und Gesellschaft in this way, gleaning citations of early theologians from later works and copying them onto thousands of index cards; the cards were produced in one sequence, as he and his students worked page by page through the extant literature, but they were filed under the name of the person quoted, so that when he wrote the book he had all the cards about each figure already gathered in one place.

I needed something similar, but since I was starting several decades later I was able to create my version of index cards electronically in a database (Microsoft Access), using a data structure and input forms that I had to design myself. This required a lot of time up front, during my graduate years, to learn this powerful software and adapt it to the particular requirements of my sources. But I have never regretted taking that time to set up some solid research infrastructure. I still use that very database today, for my current project, because it allows me to take notes as I read, before I have any idea what my final outline will be, and then go back and review those notes in any order I want: chronologically, by topic, by school of thought, by person, and so on. When it came time to write my first book I was able instantly to organize the note cards in my database into whatever sequence my outline demanded and just work through them, synthesizing the notes into my narrative and inserting references into the footnotes as I went. The book ended up with nearly 1,500 footnotes containing some 7,000 page references. I could not have been that thorough if I had had to rely on the underlining in my sources, the summaries in my bibliography, or a massive and constantly evolving outline full of notes.

General notes on ʿAbd al-Jabbār and his works, teachers, and students in my Access database. Notecards about ʿAbd al-Jabbār’s views on specific topics are under the References tabs.

I am not recommending that you create your own database template in Access. Doubtless there are better templates and simpler programs readily available today. But if you have a large project, taking the time to set up a note-taking infrastructure tailored to your own data and methods will not only make your writing easier but may also enable you to accomplish something that would simply have been impossible otherwise.
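For readers who want to see the index-card idea in concrete form, here is a minimal Python sketch. It is not a reconstruction of the Access database described above: the NoteCard fields (source, page, person, topic, school, year, text), the helper names file_by and in_outline_order, and the sample cards are all hypothetical. Cards are entered in reading order, as van Ess’s were, and then regrouped under any field or sorted into whatever sequence an outline demands.

```python
from collections import defaultdict
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class NoteCard:
    # Hypothetical fields; the actual Access schema is not described in the text.
    source: str  # the work in which the reference was found
    page: str
    person: str  # the earlier figure whose view is reported
    topic: str
    school: str
    year: int    # a date associated with the person cited
    text: str


def file_by(cards, field):
    """Group cards under the value of one field, keeping entry order within each group."""
    groups = defaultdict(list)
    for card in cards:
        groups[getattr(card, field)].append(card)
    return dict(groups)


def in_outline_order(cards, *fields):
    """Return a new list of cards sorted by one or more fields, e.g. topic, then year."""
    return sorted(cards, key=attrgetter(*fields))


# Hypothetical cards, entered in the order the sources were read.
cards = [
    NoteCard("Work A", "14", "Theorist X", "consensus", "School 1", 850, "Reports X's view."),
    NoteCard("Work A", "20", "Theorist Y", "analogy", "School 2", 900, "Quotes Y."),
    NoteCard("Work B", "3", "Theorist X", "analogy", "School 1", 850, "Paraphrases X."),
]

if __name__ == "__main__":
    for person, group in file_by(cards, "person").items():
        print(person, [(c.source, c.page) for c in group])
    for card in in_outline_order(cards, "topic", "year"):
        print(card.topic, card.year, card.person)
```

Because filing and sorting are just queries over the same records, nothing has to be re-entered when the outline changes, which is the property the database is credited with here.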

The writing process itself also requires some extra infrastructure when you are attempting a large and complex book. I like to keep trying out different outlines as my understanding, my hypotheses, my narratives, and my questions evolve. Periodically rewriting a one-page abstract of the whole book is a good exercise, because it forces you to identify an overarching narrative. But serious writing should not start too early in the research. It might seem safe to start with Chapter One, before you know what Chapter Six will say, if your overarching narrative is simply chronological; but such a narrative seldom makes a great book. A historical narrative needs some connecting threads, including one dazzling bright red thread to connect the whole thing: the thread you can reduce the book to in an abstract or a job interview. That narrative thread needs to be good, and you need to be convinced of it. Drafting when inspiration strikes can help you to find your narrative, but I would suggest that you not start writing in earnest, with carefully tuned language and detailed footnotes, until you know you have found your big story and are ready to craft every paragraph in light of that story. Even then you will continue to refine your narrative and terminology as you write, so you will inevitably need to revise and reword almost everything, probably multiple times, once your first draft is complete. (When I was writing my first book it became a running joke: I would come home from work and announce “I finished the book!” and my son would retort “Again?”) With an article, one or two passes to tighten things up might be sufficient, but when a book is written over the course of several years, substantial rewriting is just part of the process. As projects get bigger they require exponentially greater effort and more robust infrastructure.

If writing is going to take years, then you need a way to keep track of which parts of your manuscript are just an inspired draft and which are ready for print. My own practice is first to draft a section rapidly in Microsoft Word’s default black font color. When I go back to write in earnest, I use a color code: dark blue for wording and references that I have checked against the primary sources; light blue for wording I have documented adequately but indirectly through secondary sources; green for wording I am confident about and that does not require any documentation; light green for things I am willing to publish but wish I could find better support for; orange for things I still need to check against my sources; red for ideas I am still not at all sure of; and brown for notes to myself about things I still need to do.

If I subsequently reword a sentence for the sake of style or argument, I remove the blue coloring of the words I change until I have checked the new wording directly against the original sources. Nothing gets published until every letter is either blue or green. I have seen far too many books with mistaken references that lead to irrelevant or nonexistent pages—perhaps because the reference was based on notes and never checked, or because the sentence the reference originally supported was rewritten or removed—as well as books where the wording looks plausible but upon investigation turns out to be insufficiently precise to accurately represent the technical language of the underlying sources. Avoiding those mistakes—which will be recognized only by that handful of scholars whose respect I crave the most—cannot be left to the vagaries of editing, much less to some ambitious plan to recheck everything one last time: that will never happen, and it would be a tragic waste of time anyway because it is unnecessary. You just need a way to know which words and sentences in your manuscript are really justified and ready for print, and which were just rough ideas when you typed them in three years ago. With such a system in place, you can feel free to write large chunks when inspiration strikes, working from memory or rough notes without verifying every word right then and there, because you know you won’t forget to revisit the parts in black, orange, or red; and once something is in blue you will never need to worry “Am I really sure of this?” And what scholar does not have such worries?
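The rule that nothing is published until every letter is blue or green can be stated as a simple check. The Python sketch below assumes the manuscript has already been split into runs of text, each carrying one color name; extracting those runs from a Word file is not shown, and the Run type and the function names are hypothetical. It also encodes the rule that rewording a blue passage sends it back to black until it has been rechecked.

```python
from dataclasses import dataclass

# Meanings paraphrased from the color code described above.
COLOR_CODE = {
    "black": "rapid first draft",
    "dark blue": "checked against the primary sources",
    "light blue": "documented indirectly through secondary sources",
    "green": "confident; needs no documentation",
    "light green": "publishable, though better support is wanted",
    "orange": "still to be checked against the sources",
    "red": "idea not yet certain",
    "brown": "note to self",
}
PUBLISHABLE = {"dark blue", "light blue", "green", "light green"}
BLUE = {"dark blue", "light blue"}


@dataclass(frozen=True)
class Run:
    text: str
    color: str


def reword(run, new_text):
    """Rewording blue (verified) text drops it back to black until rechecked.

    The text only specifies that blue is removed, so other colors are kept as-is.
    """
    color = "black" if run.color in BLUE else run.color
    return Run(new_text, color)


def blocking_runs(runs):
    """Runs that keep the manuscript from going to print (any color not blue or green)."""
    return [run for run in runs if run.color not in PUBLISHABLE]


def ready_for_print(runs):
    return not blocking_runs(runs)


if __name__ == "__main__":
    draft = [Run("A checked claim.", "dark blue"), Run("A hunch.", "red")]
    for run in blocking_runs(draft):
        print(f"{run.color}: {run.text} ({COLOR_CODE.get(run.color, 'unknown color')})")
```

Any color name outside the publishable set, including a misspelled one, is treated as blocking, which errs on the side of rechecking.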

Finally, in the digital age your well-planned infrastructure will eventually open up new possibilities that might not even exist when you start your project. Over the last few years new data visualization tools, many of them freely available online, have given me a whole new way to look at all the notes in my clunky old database. If I tell it to spit out my four thousand notecards as a CSV file (a plain-text spreadsheet) and feed that into the VOSviewer tool, I can generate a network diagram of the main people in my database, showing who cites whom most often. Not surprisingly, they cluster into colored groups that fall almost perfectly along school lines:

Network diagram of people who cite each other in my database of notes.
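The counting behind a who-cites-whom diagram is simple enough to sketch in Python. This assumes an exported CSV with one row per notecard and hypothetical columns named citing_person and cited_person. VOSviewer builds its networks from its own supported input formats, so the edge list written here is a generic source,target,weight table for illustration, not a VOSviewer file.

```python
import csv
import io
from collections import Counter


def citation_counts(csv_text, citing="citing_person", cited="cited_person"):
    """Count how often each person cites each other person, one CSV row per notecard."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source, target = row[citing].strip(), row[cited].strip()
        if source and target and source != target:  # skip blanks and self-citations
            counts[(source, target)] += 1
    return counts


def write_edge_list(counts, out):
    """Write a generic source,target,weight table, most frequent pairs first."""
    writer = csv.writer(out)
    writer.writerow(["source", "target", "weight"])
    for (source, target), weight in counts.most_common():
        writer.writerow([source, target, weight])


# Hypothetical export: each row records who cites whom on which topic.
SAMPLE = """citing_person,cited_person,topic
Theorist A,Theorist B,analogy
Theorist A,Theorist B,consensus
Theorist C,Theorist A,analogy
"""

if __name__ == "__main__":
    buffer = io.StringIO()
    write_edge_list(citation_counts(SAMPLE), buffer)
    print(buffer.getvalue())
```

The weights are what a network tool turns into line thickness and clustering; the grouping along school lines comes from the tool’s layout and clustering, not from this tally.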

Or instead I can generate a map of the main topics dealt with in my notes, with the most common terms clustered into several coherent subject areas:

Heatmap of the most common topics in my database of notes.

Since my database has dates for each figure, I can also diagram the ebb and flow of discussions on these subjects over time, using the online RAWGraphs tool:

Streamgraph of incidence over time of five topics in my database of notes.
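A streamgraph needs a count for every topic in every period, including periods in which a topic never comes up. Here is a minimal Python sketch of that tabulation, assuming each note has already been reduced to a (year, topic) pair and using an arbitrary 50-year period width; the function name topic_incidence is hypothetical. The resulting rows form a plain table that can be loaded into a tool such as RAWGraphs, where the columns are mapped to the chart.

```python
import csv
import sys
from collections import Counter


def topic_incidence(notes, width=50):
    """Return (period_start, topic, count) rows, with zeros for empty periods.

    notes: iterable of (year, topic) pairs; width: length of each period in years.
    """
    counts = Counter()
    topics = set()
    for year, topic in notes:
        period = (year // width) * width  # e.g. 870 -> 850 when width is 50
        counts[(period, topic)] += 1
        topics.add(topic)
    if not counts:
        return []
    starts = [period for period, _ in counts]
    periods = range(min(starts), max(starts) + width, width)
    # Counter returns 0 for missing keys, which fills the gaps a streamgraph needs.
    return [(p, t, counts[(p, t)]) for p in periods for t in sorted(topics)]


if __name__ == "__main__":
    notes = [(830, "analogy"), (870, "analogy"), (880, "consensus"), (990, "consensus")]
    writer = csv.writer(sys.stdout)
    writer.writerow(["period", "topic", "count"])
    writer.writerows(topic_incidence(notes))
```

The period width is a design choice: narrow periods show short-lived debates but make the streams noisy, while wide ones smooth them into broad trends.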

All this is no more than eye candy now, because my book on those topics was published years ago; but for my next book I will make use of such visualizations at least for conference presentations, and they may even help me to see overall trends in my notes that I could not have noticed while recording them one by one. Good digital infrastructure doesn’t just help you organize your notes and do your writing; if you structure your data well, it will be ready for all kinds of new tricks and techniques that are coming down the pike. Your institution’s library or Digital Humanities center can help you think about how best to structure and store your data (i.e., your notes and digital source texts) so that they will remain adaptable and useful as new technologies come along.

So far I have spoken only of organizing your sources, notes, and writing, but some of the new techniques on the horizon promise to help you explore, find new sources, and choose what to read next, and some may even suggest hypotheses about the contents of your sources. My current project on global intellectual networks in contemporary Qur’anic hermeneutics, which is likely to keep me busy for the next twenty years, is so big and ambitious that it might be foolhardy were it not for some new text-mining techniques that promise to help. My biggest recent investment in infrastructure has been to work with a data scientist and software developer to create a program that maps out the contents of the books on Qur’anic hermeneutics that I have been bringing home from Indonesia. The software lets me navigate around the topics that appear on the map or “termscape,” and helps me to identify which works are most relevant to the topics I want to read about first. The video below shows how it works. That software project has soaked up countless hours without yielding any publishable product, but in the long run it may end up making possible something that I could never otherwise have accomplished: finding the main threads and connections across a vast literature that is growing so fast that I can’t even keep up with what is being published, much less read it all. If so, those many hours will have been a game-changing investment.


And that is the main point I want to offer: big research projects are exponentially harder than small ones, but if done well they are also of exponentially greater value to the field. Some of us should pursue them at least some of the time—even if our institutions do not always recognize or reward all the investment in scholarly infrastructure that they require. If you have enough job security (or are idealistic enough) to take the risk of spending the extra time to set up and maintain the infrastructure required by a large and complex project, then I think you will find your investment richly rewarded in the long run. That infrastructure work will slow you down a lot at first, but it will make your writing easier and better in the end, and most importantly it will enable you to tackle difficult and important projects that otherwise would be too big to manage with just one brain.
