BibTeX Done Right

Online bibliographies and automation for better citations


Unlike their peers in disciplines such as the humanities, computer science students usually receive no special training in the art of correct citation. There are several reasons for this. One is that our work usually does not require heavy text work with direct quotations and precise references to text passages. Furthermore, LaTeX lets us use the powerful BibTeX tool, which creates beautiful bibliographies from structured text files, the .bib bibliography files. Apart from filling your .bib files with bibliography entries, the only thing required is to place \cite{paper-key} commands at suitable positions in the LaTeX document.

For example, a .bib file could contain the following entry:

@inproceedings{steinhoefel.haehnle-22,
  author    = {Dominic Steinh{\"{o}}fel and
               Reiner H{\"{a}}hnle},
  editor    = {Maurice H. ter Beek and
               Annabelle McIver and
               Jos{\'{e}} N. Oliveira},
  title     = {Abstract Execution},
  booktitle = {Formal Methods - The Next 30 Years - Third World Congress, {FM} 2019,
               Porto, Portugal, October 7-11, 2019, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {11800},
  pages     = {319--336},
  publisher = {Springer},
  year      = {2019},
  url       = {https://doi.org/10.1007/978-3-030-30942-8\_20},
  doi       = {10.1007/978-3-030-30942-8\_20},
}

You can use this by adding \cite{steinhoefel.haehnle-22} to your .tex file, as in

Abstract Execution~\cite{steinhoefel.haehnle-22} is a framework for proving the
correctness of statement-level program transformations.

Unfortunately, in my personal experience, many computer science students still produce bibliographies of poor quality due to, e.g., missing fields, wrong entry types, and wrong field contents in .bib files. For example, it would be wrong to omit the title field in the above BibTeX snippet, or to use an @article type instead of @inproceedings. It is not strictly wrong to omit the digital object identifier (DOI) or the URL, but it is good practice to include this information to make it easier for readers and tools to locate and correctly identify a paper. Even so, the DOI is not included in the .bib file supplied by the publisher of the paper referenced above.

There’s plenty of information on the Internet on how to use BibTeX in your LaTeX documents (e.g., on Overleaf, Wikipedia, and university websites). I warmly recommend reading these resources. However, I think that such recommendations are insufficient: students are already busy enough writing their thesis, conducting experiments, and so on, and you don’t get credit points for teaching yourself to use BibTeX correctly.

This article does not aim to be yet another guide on BibTeX requirements or how to use BibTeX in your paper. Instead, I want to show how online resources and tools can help to get your bibliography correct automatically. This is what you’d expect from a computer scientist, right? 😊

I will expound the following suggestions, the last of which is optional:

  1. Use online bibliographies (I recommend DBLP) to obtain the correct BibTeX entry for a published article.
  2. Follow my recommendations for referencing online resources.
  3. Check your file for problems using BibLatex-Check.
  4. (Optional) Format your BibTeX file using BibTool.

Online Bibliographies

OK, I lied: I won’t talk about online bibliographies here, but only about the single one I use regularly: The DBLP database.

Whenever you want to cite an article, first look for it on DBLP.

In my experience, the BibTeX snippets from DBLP are the most correct and complete ones you can get for computer science publications on the Internet. Using DBLP to obtain BibTeX code is easy:

  1. Visit https://dblp.org/.
  2. Enter parts of the paper’s title and authors’ names until you have sufficiently narrowed the search results.
  3. Hover on the “export record” icon and choose “BibTeX.”
  4. Click on “download as a .bib file” or copy-paste the displayed BibTeX code.

The screen cast below demonstrates how to obtain the reference information for the famous “Dragon Book” by Aho et al.

Extracting BibTeX code from DBLP.
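
If you already know the DBLP key of a paper, the export step can also be scripted. The following is a minimal Python sketch, not an official DBLP client: it assumes that the .bib URL always follows the pattern of the biburl field that DBLP writes into its own entries (https://dblp.org/rec/<key>.bib), and the function names are made up for illustration.

```python
from urllib.request import urlopen

# Assumption: pattern taken from the "biburl" field of DBLP's exported entries.
DBLP_RECORD_URL = "https://dblp.org/rec/{key}.bib"


def dblp_bib_url(citation_key: str) -> str:
    """Turn a DBLP key such as 'DBLP:conf/fm/SteinhofelH19' into its .bib URL."""
    record = citation_key.strip()
    if record.startswith("DBLP:"):
        record = record[len("DBLP:"):]
    return DBLP_RECORD_URL.format(key=record.strip("/"))


def fetch_dblp_bibtex(citation_key: str, timeout: float = 10.0) -> str:
    """Download the BibTeX record as text (requires network access)."""
    with urlopen(dblp_bib_url(citation_key), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL works offline; fetching needs a network connection.
print(dblp_bib_url("DBLP:conf/fm/SteinhofelH19"))
```

You still have to look up the key on the website once; the sketch only saves the clicking for entries you want to refresh later.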

If you do not find a publication on DBLP, you might look for it on other pages, such as Google Scholar, CiteSeerX, or Semantic Scholar. However, the results from these websites are usually much worse than those from DBLP. For example, the citation exported for the Dragon Book by Semantic Scholar has the @inproceedings type instead of @book, which is plain wrong. You can also use online tools that support the manual creation of BibTeX files by displaying the possible fields and flagging the required ones for each BibTeX entry type. If you want to cite a web page, please read on!

Citing Online Resources

Web pages are not “published” in the usual sense. They are not peer-reviewed, usually come without a permanent version identifier (i.e., they can be updated), and there is no BibTeX entry type specifically for web pages (however, there is a suitable BibLaTeX entry type). If possible, you should not cite websites at all. If you want to refer to a software project’s repository, for example, it is better to use a footnote than a citation. However, there are valid reasons for citing web pages. Consider the following sentence from one of my papers:

“When Companies Become Prisoners of Legacy Systems:” This title of a Wall Street Journal article [41] well describes the predominant perception of legacy software systems in academia.

It’s perfectly legitimate to use a citation for this online article of the Wall Street Journal. Here’s how to do this using BibTeX.

Online Resources in BibTeX

Since BibTeX does not provide us with an entry type for online resources, we have to resort to the @misc type. This entry type does not have any required fields, and offers the optional fields author, title, howpublished, month, year, and note. I strongly recommend using at least the following fields for web pages:

  • title: The title of the web page. Each web page should have a title, so it should always be possible to extract this information.
  • howpublished: The URL of the online resource.
  • note: Here, you enter the date when you last accessed this resource. This permits at least some “versioning” information for updatable resources.

Additionally, you should provide the author, month, and year fields if this information is available, which it is for the Wall Street Journal article.

The final entry for our article would be

@Misc{		  schneider-13,
  Author	= {Schneider, Adam},
  Title		= {{When Companies Become Prisoners of Legacy Systems}},
  Month		= oct,
  Year		= 2013,
  HowPublished	= {\url{https://deloitte.wsj.com/articles/when-companies-become-prisoners-of-legacy-systems-1380600092}},
  Note		= {Accessed: 2022-11-03}
}

BibTeX is case-insensitive with respect to entry types and field names; it does not matter whether you write @Misc or @misc. The above snippet was processed by BibTool, which normalized the entry type and field names to “upper camel case.” The astute reader will furthermore have spotted that I surrounded the title with two pairs of curly braces. This preserves the title case in the LaTeX output. I talk about this in the section on formatting.

Online Resources in BibLaTeX

If you use BibLaTeX, you can (and should) use the @online entry type:

@Online{	  schneider-13,
  Author	= {Schneider, Adam},
  Title		= {{When Companies Become Prisoners of Legacy Systems}},
  Month		= oct,
  Year		= 2013,
  URL		= {https://deloitte.wsj.com/articles/when-companies-become-prisoners-of-legacy-systems-1380600092},
  URLDate	= {2022-11-03}
}
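
If you cite web pages regularly, the two templates above are easy to generate. Here is a small Python sketch under my own assumptions: the function name web_entry and its parameters are hypothetical, and the generated @misc variant relies on the \url command, so your document needs the url or hyperref package.

```python
from datetime import date


def web_entry(key, title, url, accessed, author=None, month=None, year=None,
              biblatex=False):
    """Build an @misc (BibTeX) or @online (BibLaTeX) entry for a web page."""
    fields = []
    if author:
        fields.append(("author", "{" + author + "}"))
    # Double braces preserve the title case in the rendered bibliography.
    fields.append(("title", "{{" + title + "}}"))
    if month:
        fields.append(("month", month))  # three-letter macro such as oct, unbraced
    if year:
        fields.append(("year", str(year)))
    if biblatex:
        fields.append(("url", "{" + url + "}"))
        fields.append(("urldate", "{" + accessed.isoformat() + "}"))
    else:
        fields.append(("howpublished", "{\\url{" + url + "}}"))
        fields.append(("note", "{Accessed: " + accessed.isoformat() + "}"))
    entry_type = "online" if biblatex else "misc"
    width = max(len(name) for name, _ in fields)
    body = ",\n".join(f"  {name.ljust(width)} = {value}" for name, value in fields)
    return f"@{entry_type}{{{key},\n{body}\n}}"


print(web_entry(
    "schneider-13",
    "When Companies Become Prisoners of Legacy Systems",
    "https://deloitte.wsj.com/articles/when-companies-become-prisoners-of-legacy-systems-1380600092",
    date(2022, 11, 3),
    author="Schneider, Adam",
    month="oct",
    year=2013,
))
```

Passing biblatex=True switches to the @online variant with url and urldate fields.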

Checking BibTeX Files for Problems

If you obtained all your BibTeX entries from DBLP and came up with perfect entries for your online references, you can skip this point. However, we also use linters, integration tests, etc. for our code, even when we have carefully tried to avoid any formatting or functional errors. I think that we should apply the same care to our bibliographies!

To automatically check your BibTeX files for errors, I recommend the BibLatex-Check tool. To use it, follow these steps:

  1. Clone the BibLatex-Check repository:

    git clone https://github.com/rindPHI/BibLatex-Check
    

    The URL points to my fork of BibLatex-Check. I did not create the tool myself; in the fork, I fixed the option to show the analysis results nicely rendered in your default web browser.

  2. Call BibLatex-Check:

    python3 BibLatex-Check/biblatex_check.py -b path/to/my.bib
    

    This requires Python (version 3) on your system. If you don’t have Python, get it! It’s a great prototyping language 😃

    You can also ask BibLatex-Check to show you a rendering of the results in your web browser:

    python3 BibLatex-Check/biblatex_check.py -b path/to/my.bib -o result.html -v
    
  3. Fix any errors 😎

Here’s a screen cast of me exercising steps 1 and 2:

Checking BibTeX code with BibLatex-Check.
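
To give an idea of the kind of problem such a linter reports, here is a deliberately tiny Python sketch that flags missing required fields. It is not how BibLatex-Check works internally, and it is no replacement for it: the field table covers only four standard BibTeX entry types, the function name missing_fields is made up, and the regular expressions assume one field per line.

```python
import re

# Required fields of a few standard BibTeX entry types.
# A tuple with several names means "at least one of these".
REQUIRED = {
    "article": [("author",), ("title",), ("journal",), ("year",)],
    "book": [("author", "editor"), ("title",), ("publisher",), ("year",)],
    "inproceedings": [("author",), ("title",), ("booktitle",), ("year",)],
    "misc": [],
}

ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_NAME = re.compile(r"^\s*([A-Za-z][\w-]*)\s*=", re.MULTILINE)


def missing_fields(bibtex: str) -> dict:
    """Map each entry key to the required fields it lacks (simplified)."""
    problems = {}
    starts = list(ENTRY_START.finditer(bibtex))
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex)
        body = bibtex[match.end():end]
        entry_type, key = match.group(1).lower(), match.group(2)
        present = {name.lower() for name in FIELD_NAME.findall(body)}
        missing = ["/".join(alternatives)
                   for alternatives in REQUIRED.get(entry_type, [])
                   if not present & set(alternatives)]
        if missing:
            problems[key] = missing
    return problems


sample = """@inproceedings{steinhoefel.haehnle-19,
  author    = {Dominic Steinh{\\"{o}}fel and Reiner H{\\"{a}}hnle},
  booktitle = {Formal Methods - The Next 30 Years},
  year      = {2019}
}
"""
print(missing_fields(sample))
```

The sample entry lacks its title, which is exactly the kind of omission mentioned at the beginning of this article.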

Formatting Your BibTeX

I marked this part as optional because (1) you don’t need to refactor your .bib files for correct bibliographies, since the generated entries in the rendered file will look just the same; and (2) preserving “Title Case” in bibliography entries is more controversial than I thought. Still, I recommend following the suggestions in this section! You will be rewarded with nice, crisp, and uniform-looking BibTeX code and titles in Title Case. The world as it should be.

Automatically Refactoring BibTeX

Maybe you’ve heard the term refactoring before: Changing your code such that it still provides the same functionality, but is easier for humans to understand or maintain. That’s the first of my two formatting suggestions: Use BibTool to automatically clean up your BibTeX and bring it into a uniform shape. BibTool is available on GitHub, but is also bundled with many LaTeX distributions, so you may not need to install it manually. Simply type bibtool -h into a terminal window and see if you get an error or a help text.

Assuming you have BibTool installed, you can use my reformatBibliography.sh script to format your bibliography in-place. I recommend using a version control system like Git and committing your changes to the .bib file beforehand, since the script overwrites the existing file.

To install the script on your (Linux/UNIX/macOS) system, follow these steps:

  1. Change into a directory in your $PATH, e.g., a local ~/bin directory:

    cd ~/bin

  2. Download the sources:

    wget https://gist.github.com/rindPHI/bff52e25d70c8acd0b81eab84fad8fca/archive/0c96e4328309fb85ef3ef0c84dd813aef2583c73.zip -O reformatBibliography.zip

  3. Unzip and delete the archive:

    unzip reformatBibliography.zip
    rm reformatBibliography.zip

  4. Make the script executable:

    chmod +x reformatBibliography.sh

Using the script is simple: Call reformatBibliography.sh my.bib to reformat a file my.bib. For example, assume my.bib consists of the DBLP entry for my Abstract Execution paper, which already served us as an example above:

@inproceedings{DBLP:conf/fm/SteinhofelH19,
  author    = {Dominic Steinh{\"{o}}fel and
               Reiner H{\"{a}}hnle},
  editor    = {Maurice H. ter Beek and
               Annabelle McIver and
               Jos{\'{e}} N. Oliveira},
  title     = {Abstract Execution},
  booktitle = {Formal Methods - The Next 30 Years - Third World Congress, {FM} 2019,
               Porto, Portugal, October 7-11, 2019, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {11800},
  pages     = {319--336},
  publisher = {Springer},
  year      = {2019},
  url       = {https://doi.org/10.1007/978-3-030-30942-8\_20},
  doi       = {10.1007/978-3-030-30942-8\_20},
  timestamp = {Sat, 12 Oct 2019 12:51:42 +0200},
  biburl    = {https://dblp.org/rec/conf/fm/SteinhofelH19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Running reformatBibliography.sh my.bib changes the content of my.bib to the following entry:

@InProceedings{	  steinhoefel.haehnle-19,
  Author	= {Dominic Steinh{\"{o}}fel and Reiner H{\"{a}}hnle},
  Editor	= {Maurice H. ter Beek and Annabelle McIver and Jos{\'{e}} N.
		  Oliveira},
  Title		= {Abstract Execution},
  BookTitle	= {Formal Methods - The Next 30 Years - Third World Congress,
		  {FM} 2019, Porto, Portugal, October 7-11, 2019,
		  Proceedings},
  Series	= {Lecture Notes in Computer Science},
  Volume	= {11800},
  Pages		= {319--336},
  Publisher	= {Springer},
  Year		= {2019},
  URL		= {https://doi.org/10.1007/978-3-030-30942-8\_20},
  DOI		= {10.1007/978-3-030-30942-8\_20},
  timestamp	= {Sat, 12 Oct 2019 12:51:42 +0200},
  biburl	= {https://dblp.org/rec/conf/fm/SteinhofelH19.bib},
  bibsource	= {dblp computer science bibliography, https://dblp.org}
}

Note that BibTool changed the key of the entry. I personally prefer uniform keys and don’t want to figure them out myself. If you want to deactivate this behavior, delete -f "{%-2n(author) # %-2n(editor)}-%2d(year)" from reformatBibliography.sh.

If you add a paper with the same authors published in the same year to a bibliography that has already been processed by BibTool, the key of the existing paper(s) with that particular authors-year combination might be changed. For example, if I simply duplicate the already existing entry in my.bib, one of the entries will receive the key steinhoefel.haehnle-19*1. I am quite sure that this issue can be prevented by adding new entries to the end of the .bib file. It won’t hurt, however, to check whether the citations in your rendered document are still referring to the correct publications if you receive any ...*1 entries.
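
To make the key scheme and the ...*1 suffix more tangible, here is a rough Python imitation of the key shapes shown above. It is only my sketch and not BibTool’s actual algorithm: the functions base_key and assign_keys are hypothetical, the umlaut table is an assumption that merely reproduces the keys in this article, and the editor fallback of the format string is ignored.

```python
import re
import unicodedata
from collections import Counter

# Assumption: German umlauts are spelled out, as in "steinhoefel.haehnle".
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def base_key(last_names, year):
    """First two last names, lower-cased and dot-separated, plus a 2-digit year."""
    names = []
    for name in last_names[:2]:
        plain = unicodedata.normalize("NFC", name).lower().translate(UMLAUTS)
        names.append(re.sub(r"[^a-z]", "", plain))
    return ".".join(names) + "-" + str(year)[-2:]


def assign_keys(entries):
    """Assign keys in file order; repeated keys get the suffixes *1, *2, ..."""
    seen = Counter()
    keys = []
    for last_names, year in entries:
        key = base_key(last_names, year)
        suffix = f"*{seen[key]}" if seen[key] else ""
        seen[key] += 1
        keys.append(key + suffix)
    return keys


paper = (["Steinhöfel", "Hähnle"], 2019)
print(assign_keys([paper, paper]))
```

In this imitation, the suffix depends purely on the order of the entries, which illustrates why the position of a new entry in the .bib file can matter for the keys.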

Preserving Title Case and Removing Superfluous Fields

Now, this is the controversial part 😃 If your title field is only enclosed by single braces (or quotation marks) as in Title = {Abstract Execution}, many bibliography styles will render the title as “Abstract execution.” The second word appears in lower case; the title case formatting is not preserved. I don’t like this: Titles should be in title case, full stop. However, some people think that you should not fiddle with bib entries. I honestly don’t understand why this should be the case; in the actual papers, the titles are also in Title Case, right? Well, I always change the title fields to look like Title = {{Abstract Execution}}, i.e., enclosed by double braces, which preserves the case.

I generally remove fields like timestamp, biburl, or bibsource added by DBLP. They won’t appear in the rendered document, and only clutter my BibTeX file. Also, I usually change URL to x-URL, which effectively comments the URL out (you could also remove the field, but I’m always a little reluctant to do so). The reason for this is that in bibliography styles which consider the DOI field, the link to the doi.org page would appear twice otherwise, which does not make sense in my opinion.

Thus, the “Abstract Execution” entry in my BibTeX file finally has the following shape (here with URL removed, and not commented out):

@InProceedings{	  steinhoefel.haehnle-19,
  Author	= {Dominic Steinh{\"{o}}fel and Reiner H{\"{a}}hnle},
  Editor	= {Maurice H. ter Beek and Annabelle McIver and Jos{\'{e}} N.
		  Oliveira},
  Title		= {{Abstract Execution}},
  BookTitle	= {Formal Methods - The Next 30 Years - Third World Congress,
		  {FM} 2019, Porto, Portugal, October 7-11, 2019,
		  Proceedings},
  Series	= {Lecture Notes in Computer Science},
  Volume	= {11800},
  Pages		= {319--336},
  Publisher	= {Springer},
  Year		= {2019},
  DOI		= {10.1007/978-3-030-30942-8\_20},
}
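
I do these last edits by hand, but they can be scripted as well. The following Python sketch is one possible way, under simplifying assumptions: every affected field fits on a single line (as in the DBLP example above), the function name clean_entry is made up, and the URL is only dropped when the entry also has a DOI field.

```python
import re

# Fields added by DBLP that never show up in the rendered bibliography.
DROPPED_FIELDS = {"timestamp", "biburl", "bibsource"}

FIELD_START = re.compile(r"\s*([A-Za-z][\w-]*)\s*=")
SINGLE_BRACED = re.compile(r"=\s*\{(?!\{)(.*)\}(,?)\s*$")


def clean_entry(bibtex: str) -> str:
    """Drop clutter fields and wrap single-braced titles in double braces."""
    has_doi = re.search(r"^\s*doi\s*=", bibtex, re.IGNORECASE | re.MULTILINE)
    kept = []
    for line in bibtex.splitlines():
        field = FIELD_START.match(line)
        name = field.group(1).lower() if field else None
        if name in DROPPED_FIELDS:
            continue
        if name == "url" and has_doi:
            continue  # the DOI already yields the doi.org link
        if name == "title":
            # {Abstract Execution} becomes {{Abstract Execution}}
            line = SINGLE_BRACED.sub(r"= {{\1}}\2", line)
        kept.append(line)
    return "\n".join(kept)


entry = """@InProceedings{steinhoefel.haehnle-19,
  Title     = {Abstract Execution},
  URL       = {https://doi.org/10.1007/978-3-030-30942-8\\_20},
  DOI       = {10.1007/978-3-030-30942-8\\_20},
  timestamp = {Sat, 12 Oct 2019 12:51:42 +0200},
  biburl    = {https://dblp.org/rec/conf/fm/SteinhofelH19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}"""
print(clean_entry(entry))
```

As with the formatting script, commit your .bib file before letting anything like this overwrite it, and check the result with the linter afterwards.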

That’s it! If you get used to

  1. Using DBLP
  2. Citing online sources correctly
  3. Checking your BibTeX for errors using an automatic linter

you will obtain a nice and correct bibliography in your paper or thesis almost every time! Additionally, if you

  1. Use BibTool, remove superfluous fields, and wrap titles (in Title Case) in double braces,

your BibTeX code will look nice and uniform, including the keys, titles will appear in tidy Title Case, and readers of your bibliography will be happy 😊 Enjoy!

Dominic Steinhöfel
Postdoctoral Researcher in Computer Science

I’m a PostDoc @ CISPA (Saarbrücken, Germany). My research interests center around program verification.