Lumen

Eager to know more, about the world, about the intelligence, and about myself.

More Elegant Citation Keys

I write my academic notes in Markdown in Obsidian and manage my literature in Zotero. The two are connected by the Zotero Integration plugin for Obsidian, which is great and meets most of my daily needs. I insert references into the text as citation keys, and when I later move to LaTeX, I only need to import my BibTeX file for the citations embedded in the Markdown to work there too.

The main purpose of a citation key is to refer to an article in the literature library with a unique string. To keep keys from colliding, a lot of information about the paper is usually packed into the key. Unlike a randomly generated ID, such a key also carries some semantic information, most commonly the author's name, the publication year, and the paper title. An example is LearningFineGrainedBimanualManipulationZhao2023, which refers to the paper "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware", published in 2023, whose first author is Tony Z. Zhao.

However, such citation keys have one problem: they are too long. Because it is a single unbreakable word, LearningFineGrainedBimanualManipulationZhao2023 often spills onto an extra line when embedded in Markdown text, which already looks bad. When several articles are cited in one sentence, the problem gets worse. For example, here are three articles I cited together:

\cite{bousmalisRoboCatSelfImprovingFoundation2023,brohanRT2VisionLanguageActionModels,reedGeneralistAgent2022}

This greatly disrupts the look of the notes, because the sentences before and after the citation no longer fit in my field of attention at the same time. The run-together titles also take extra effort to split into words, which hurts readability. Zotero Integration does offer other ways to cite, including formatted citations as they would appear in a paper, but those do not carry over well to the later move to LaTeX. So after trying them out, I decided to stay with citation keys. I want a key that is readable enough for me to roughly guess which article it refers to, short enough not to get in the way, and unlikely to collide with another key.

After reading the documentation of Better BibTeX's Citation Key Generator, I found that it provides a rich set of tools and that there are good ways to customize citation keys. Here is the rule I finally chose.

Title.skipWords(true).abbr(3).substring(1,8) + shortyear + postfix

Breaking it down, we start with the title. skipWords ignores common function words (of, from, a) and removes symbols, which gives us a relatively clean title. Then abbr(3) takes the first three letters of each word, and substring(1,8) keeps only the first eight characters of the result. Finally, shortyear appends a two-digit year.
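
As a rough illustration, here is a minimal Python sketch of what this rule does to a title. It is not Better BibTeX's actual implementation: the function name short_key is hypothetical, the skip-word list is a short stand-in I made up (the plugin has its own, longer list), and I am assuming that punctuation is simply deleted so that a hyphenated word counts as one word.

```python
import re
from typing import Optional

# Hypothetical, abbreviated skip-word list (an assumption for this sketch;
# Better BibTeX uses its own, longer list).
SKIP_WORDS = {"a", "an", "the", "of", "from", "for", "to", "in", "on", "and", "with"}


def short_key(title: str, year: Optional[int] = None) -> str:
    """Approximate: title, skip words, abbr(3), substring(1,8), then shortyear."""
    # Delete punctuation, so "Fine-Grained" becomes the single word "FineGrained".
    cleaned = re.sub(r"[^\w\s]", "", title)
    # Drop common function words.
    words = [w for w in cleaned.split() if w.lower() not in SKIP_WORDS]
    # Keep the first three letters of each word and join them.
    abbreviated = "".join(w[:3] for w in words)
    # Two-digit year, or nothing if the item has no year.
    short_year = f"{year % 100:02d}" if year is not None else ""
    # Keep the first eight characters, then append the year.
    return abbreviated[:8] + short_year


print(short_key("Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware", 2023))
# LeaFinBi23
print(short_key("A Generalist Agent", 2022))
# GenAge22
```

An item without a year simply gets no year suffix, which matches what happens to one of my three example keys further down.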

In fact, duplicates are already very unlikely at this point. After all, I don't read that many articles each year, and every new year opens a fresh namespace. But the documentation also provides an additional function, postfix, to guarantee uniqueness. Here is its description:

a pseudo-function that sets the citekey disambiguation postfix using an sprintf-js format spec for when a key is generated that already exists. Does not add any text to the citekey otherwise. You must include exactly one of the placeholders %(n)s (number), %(a)s (alpha, lowercase) or %(A)s (alpha, uppercase). For the rest of the disambiguator you can use things like padding and extra text as sprintf-js allows. With start set to 1 the disambiguator is always included, even if there is no need for it when no duplicates exist. The default format is %(a)s. 

Simply put, it checks whether there are duplicate keys in the literature library. If there are none, it does nothing. If there are duplicates, it adds a suffix according to certain rules to ensure uniqueness. This theoretically guarantees the uniqueness of the citation key.
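
The idea can be sketched in a few lines of Python. This is only a simplified model of the default %(a)s behavior as I understand it from the description above, not the plugin's code: the function name disambiguate is hypothetical, the library is modeled as a plain set of existing keys, and the sketch gives up after 26 collisions.

```python
from string import ascii_lowercase


def disambiguate(key: str, existing: set) -> str:
    """Return key unchanged if it is unused, else append a, b, c, ... until unique."""
    if key not in existing:
        # No duplicate in the library: nothing is added to the key.
        return key
    for letter in ascii_lowercase:
        candidate = key + letter
        if candidate not in existing:
            return candidate
    # Simplification of this sketch: only single-letter suffixes are tried.
    raise ValueError(f"no free single-letter suffix for {key!r}")


library = {"GenAge22"}
print(disambiguate("RobSelFo23", library))
# RobSelFo23
print(disambiguate("GenAge22", library))
# GenAge22a
```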

For the three articles cited earlier, the keys generated under these rules look like this:

\cite{RobSelFo23,RT2VisMo,GenAge22}

It's much more concise, and I can still roughly guess which articles the keys refer to. A bit of tinkering solved the problem nicely.
