How to manage BibLaTeX across time and cultures

I wrote a paper in English using LaTeX. For my quite interesting topic, some essential references did not exist in English. That might sound simple: just list the originals, plus some translation and cross-referencing work to get the necessary information. It isn’t that simple.

This howto is for LaTeX authors with references which are less common in computing/mathematics but otherwise unremarkable, particularly: non-latin scripts, latinisations, non-English references, rare scripts and ancient documents. My sources had all of these at once, giving me the following situation:

Category	Range in my references
Language non-latin scripts	Chinese, Sanskrit, Spanish, Arabic
Language latinisations	Chinese, Sanskrit, Arabic
Eras	ancient (3400BCE), less ancient (900CE), modern historical (1898)
Right-to-left	Arabic
Dates	precise, approximate, and ranges

“Ancient” here refers to the reference text, not the subject of the text. My paper has references analysing Neanderthal cultures, but luckily Neanderthals did not publish books (as far as we know), so there were no awkward dates from deep time.

These considerations were entirely outwith my experience, and, as shipped by default and used in the UK, all anglocentric computer systems struggle with them. LaTeX relies on the 50-year-old latin-centric TeX engine underneath, on top of which the modern LuaTeX project has implemented Unicode language support. Besides these, there are anglo-normative problems. The arrangement and ordering of people’s names often differs from English. And sometimes the defaults are just strange, for example automatically cutting down a long list of authors into “and others” without asking (rude!).

The way LaTeX and BibLaTeX work is that the environment is set up in the .tex file, and then conventions in the .bib file match what the environment is looking for.

Following are excerpts from my .bib and .tex files. Your paper may have different requirements, and these are a mixture of mandatory correct usage of BibLaTeX, plus the occasional useful convention I invented.

Authors and titles

For authors and titles with non-latin characters, always use this form in your biblatex file:

       author                = {张伟},
       shortauthor           = {Zhang Wei},
       nameaddon             = {Zhang Wei},

The two identical English approximations are used differently by BibLaTeX: shortauthor is rendered in the main text, e.g.:

       Leith residents built a big wall[Zhang Wei 2025]

but for the same entry nameaddon is rendered in the bibliography (by the author macro redefined in the preamble at the end of this howto), e.g.:

       Zhang Wei 张伟. Building the Great Wall of Corstophine.

This really matters when things get more complicated, as you’ll see.
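
As a minimal sketch of how these three fields sit in a complete entry, the following self-contained file could be compiled with lualatex, then biber, then lualatex again. The entry key zhang2025, the file name authors-demo.bib and the date are made up for illustration, and it assumes the Source Han Serif SC font used in the preamble at the end is installed. Two further assumptions of mine, not part of the convention above: shortauthor is double-braced here so that BibLaTeX treats "Zhang Wei" as one unit rather than parsing it as a given name plus a family name, and nameaddon will only appear in the bibliography once the author macro is redefined as in the preamble at the end, since the standard styles do not print that field.

```latex
% Sketch only: hypothetical key, file name and date.
\begin{filecontents}[overwrite]{authors-demo.bib}
@book{zhang2025,
  author      = {张伟},
  shortauthor = {{Zhang Wei}},
  nameaddon   = {Zhang Wei},
  title       = {Building the Great Wall of Corstophine},
  date        = {2025},
}
\end{filecontents}

\documentclass{article}
\usepackage{fontspec}
\usepackage{luatexja-fontspec}
\setmainjfont{Source Han Serif SC} % assumed installed, as in the full preamble
\usepackage[backend=biber, style=authoryear]{biblatex}
\addbibresource{authors-demo.bib}

\begin{document}
Leith residents built a big wall \parencite{zhang2025}.

\printbibliography
\end{document}
```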

Latinisation

Where there are latinised versions of Chinese/Arabic/Sanskrit author names, they must appear in the nameaddon field. This wasn’t a problem in the example above because the only name used is the English approximation. However, there is often a latinised version that retains features of the original which English cannot express. For example:

       author                  = {鲁迅}, 
       shortauthor             = {Lu Xun},   <-- widely used English approximation
       nameaddon               = {Lǔ Xùn},   <-- latinised equivalent (in this case pinyin)

Even though the script is latinised, English-language systems often still need a specific font installed, because of the accents used to mark particular sounds or grammatical features.

There are maybe 100 or so latinisation systems for encoding non-latin languages. Here are some examples:

Script/Language	Romanization System
Chinese	Pinyin
Arabic	Latin-i harakat
Japanese	Hepburn romanization (Hebon-shiki)
Sanskrit	IAST (International Alphabet of Sanskrit Transliteration)
Korean	Revised Romanization of Korean
Russian	BGN/PCGN romanization
Thai	RTGS (Royal Thai General System)
Serbian	Gaj’s Latin alphabet

There is a similar but slightly different trick for handling latinisations in titles:

       title                 = {كتاب الحاوي في الطب},
       titleaddon            = {Kitāb al-Ḥāwī fī al-ṭibb},  <-- Latin-i harakat

Which latinisation for references?

Latinisation systems exist for people who prefer to use latin scripts for ease and speed, and in many cases as a response to ubiquitous Western-derived computer technology. Some of these systems are rapidly evolving, for example Chinese which now has Shuangpin, or double-pinyin.

Some language families have a large number of different latinisation systems. Arabic has three main systems: DIN, ALA-LC and Hans Wehr (a less offensive everyday term covering them all is “Latin-i harakat”). Japanese has both Hebon-shiki (called Hepburn in English) and Kunrei-shiki, Korean has two latinisations, Mandang has three N’ko latinisations, and so on. Which should be used in references? Without specialist advice there will always be uncertainty, so the reliable choice is to always include the original script as well.

Translations

There is a problem in the Arabic title given above: while titleaddon contains the official latinised script, it still lacks an English translation. It’s great to have the latin script so you know what to search for if you don’t read Arabic characters. (You can still copy/paste the Arabic, and that can be essential, but even in 2026 some computer systems don’t handle Arabic very well.) So if there is an English translation of a title or an author, it is helpful to add it.

In this case the translated title should be in the note field, as follows:

       note                  = {Translated as: The Comprehensive Book of Medicine}

This isn’t just for Arabic; the same is true for Chinese. Chinese is a great example of the difference: pinyin latin equivalents are often supplied, but pinyin is not a translation. There are often many ways to translate a given title into English. Where translations are few, partial or obscure, the English translation of the title/author may be so misleading to readers that you are better off using the original. Even if you have no knowledge of the language and don’t read the script, a search engine is more likely to find information about a rarely-translated author if you use their native Chinese/Arabic/etc. name.

Right-to-left scripts

For left-to-right scripts (the default for Chinese/Japanese/Korean (CJK) and latin-based languages) the above conventions will work. They can seem to work for Arabic too, but there is still a problem because the script is written right-to-left. Biber detects the Arabic text and switches to right-to-left so that the Arabic script is correct. Unfortunately it also switches all other text in the reference, including latin characters, whether English or latinisation. So an Arabic reference containing a latin field, as they normally do, will have its latin fields rendered like this:

      'Medicine of Book Comprehensive The'     or

      'enicideM fo kooB eviseneherpmoC ehT'

depending on context. To fix this, set the default language in the preamble to a left-to-right language such as ‘british’ (as in this bibliography) or ‘chinese’. Then preserve the Arabic text in the reference exactly as it is written by wrapping it in \textarabic with double braces, like this:

       title                 = {\textarabic{{كتاب الحاوي في الطب}}},
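
To show the two halves of the fix working together, here is a minimal sketch of a complete file, assuming LuaLaTeX from a recent TeX distribution, the Amiri font from the preamble at the end, and a compile sequence of lualatex, biber, lualatex. The entry key hawi_demo, the file name rtl-demo.bib and the placeholder author are made up for illustration. It is a cut-down version of the full setup shown later, not a replacement for it, and I have not tested it against every package version.

```latex
% Sketch only: hypothetical key, file name and author.
\begin{filecontents}[overwrite]{rtl-demo.bib}
@book{hawi_demo,
  author     = {{Placeholder Author}},
  title      = {\textarabic{{كتاب الحاوي في الطب}}},
  titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb},
  note       = {Translated as: The Comprehensive Book of Medicine},
}
\end{filecontents}

\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}   % left-to-right default for the whole document
\setotherlanguage{arabic}   % provides \textarabic{...}
\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\usepackage{csquotes}
\usepackage[backend=biber]{biblatex}
% Keep locale strings and text direction British even for Arabic content
\DeclareLanguageMapping{arabic}{british}
\addbibresource{rtl-demo.bib}

\begin{document}
A comprehensive medical text \cite{hawi_demo}.

\printbibliography
\end{document}
```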

Preservation

Preservation with double braces is useful elsewhere too. Another common problem is that many latinised scripts contain special characters requiring double braces like this Arabic example:

      publisher   = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}
                        ^          ^        ^
                        \-----------\--------\--- breaks biblatex without {{double}}

The double braces preserve the string exactly as written. Another example, which applies even to ordinary English BibLaTeX, is the default assumption that an author is a given name followed by a family name:

      author = {Dundee Museum}

will render as ‘Museum, Dundee’, unless you say

      author = {{Dundee Museum}}
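
The difference is easy to see side by side. The following is a small sketch with two made-up entries (the keys, titles, date and the file name corporate-demo.bib are hypothetical) that differ only in the bracing of the author field. Compile with pdflatex or lualatex, then biber, then the engine again, and compare how the two names come out in the bibliography.

```latex
% Sketch only: hypothetical keys, titles and file name.
\begin{filecontents}[overwrite]{corporate-demo.bib}
@misc{museum_split,
  author = {Dundee Museum},
  title  = {Author parsed as given name plus family name},
  date   = {2024},
}
@misc{museum_whole,
  author = {{Dundee Museum}},
  title  = {Author kept as a single unit},
  date   = {2024},
}
\end{filecontents}

\documentclass{article}
\usepackage[backend=biber, style=authoryear]{biblatex}
\addbibresource{corporate-demo.bib}

\begin{document}
\nocite{*} % list both entries without citing them in the text
\printbibliography
\end{document}
```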

Dates

Dates use the EDTF (ISO 8601-2) standard. BibLaTeX handles BCE/CE dates correctly, and also leaves out the era label where it would be pointless or distracting (in this setup, for years after the dateeraauto threshold). The full syntax for dates is in the BibLaTeX user manual.
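
As an illustration of the kinds of date in my table (precise, approximate, ranges, and BCE), here is a small sketch using the date options from my preamble. The entries, keys and the file name dates-demo.bib are invented for the example. The one point that is easy to get wrong is that ISO 8601-2 uses astronomical year numbering, so year 0000 is 1 BCE and 3400 BCE is written as -3399. Compile with pdflatex or lualatex, then biber, then the engine again.

```latex
% Sketch only: hypothetical entries and file name.
% A trailing ~ marks a date as approximate; a slash separates the
% two ends of a range; negative years are astronomical years.
\begin{filecontents}[overwrite]{dates-demo.bib}
@misc{date_precise,
  author = {{Example Source A}},
  title  = {A precise modern historical date},
  date   = {1898-03-14},
}
@misc{date_approx,
  author = {{Example Source B}},
  title  = {An approximate date, around 900 CE},
  date   = {0900~},
}
@misc{date_ancient,
  author = {{Example Source C}},
  title  = {An ancient date, 3400 BCE},
  date   = {-3399},
}
@misc{date_range,
  author = {{Example Source D}},
  title  = {An approximate range, about 600 BCE to 500 BCE},
  date   = {-0599~/-0499~},
}
\end{filecontents}

\documentclass{article}
\usepackage[
  backend=biber,
  style=authoryear,
  datecirca=true,   % print "circa" for approximate dates
  dateera=secular,  % use BCE/CE
  dateeraauto=1600  % add the era label to years before 1600
]{biblatex}
\addbibresource{dates-demo.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}
```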

Long lists of authors

For very long lists of authors (such as [Meisner2024] in this bibliography), include all authors separated by ‘and’, rather than saying ‘and others’. BibLaTeX has been set up in the preamble with maxcitenames, mincitenames and maxbibnames so that it will render “et al.” in the text, but render all names in the bibliography.
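
A cut-down sketch of just this behaviour, using the same four option values as my preamble, might look as follows. The four authors, the key, the journal and the file name longlist-demo.bib are placeholders. Compile with pdflatex or lualatex, then biber, then the engine again.

```latex
% Sketch only: hypothetical authors, key, journal and file name.
\begin{filecontents}[overwrite]{longlist-demo.bib}
@article{manyauthors,
  author       = {Alpha, Ann and Beta, Bob and Gamma, Gus and Delta, Dee},
  title        = {A paper with many authors},
  journaltitle = {Journal of Examples},
  date         = {2024},
}
\end{filecontents}

\documentclass{article}
\usepackage[
  backend=biber,
  style=authoryear,
  maxcitenames=2,   % more than two names in a citation gets truncated...
  mincitenames=1,   % ...to the first name plus "et al."
  maxbibnames=99,   % but the bibliography lists everybody
  uniquelist=false  % do not add names back to disambiguate
]{biblatex}
\addbibresource{longlist-demo.bib}

\begin{document}
Many hands made this result \parencite{manyauthors}.

\printbibliography
\end{document}
```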

Full example

Here is a fictional example in full, handling both Chinese and right-to-left Arabic, with translations in latin script and correct use of -addon and note fields.

       author      = {冷開泰},                                            <-- name
       shortauthor = {Leng Kaitai},                                       <-- English approximation
       nameaddon   = {Lěng Kǎitài},                                       <-- correctly latinised
       title       = {\textarabic{{كتاب الحاوي في الطب}}},                <-- right-to-left preserved
       titleaddon  = {Kitāb al-Ḥāwī fī al-ṭibb},                          <-- latinised
       note        = {Translated as: The Comprehensive Book of Medicine}, <-- English translation
       publisher   = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}               <-- latinised, with unsafe quotes

NB ‘authoraddon’ is not a valid field name, even though it would seem the logical counterpart to titleaddon. The field is called nameaddon.

Entire real-world references look like this:

@book{sushruta_samhita_1907,
  title         = {The {Suśruta-Saṃhitā}},
  titleaddon    = {\textsanskrit{सुश्रुतसंहिता}},
  author        = {Suśruta (composite work)},
  nameaddon     = {\textsanskrit{सुश्रुत}},
  translator    = {Bhishagratna, Kaviraj Kunja Lal},
  date          = {1907},
  origdate      = {-0599~/-0499~},
  publisher     = {Calcutta},
  url           = {https://wellcomecollection.org/works/vnqskk8w/items?canvas=98&manifest=2},
  note          = {English translation of the original Sanskrit text (circa 600 BCE--500
                   BCE), including discussion on transmissibility. The
                   \href{https://www.wisdomlib.org/hinduism/book/sushruta-samhita-volume-2-nidanasthana/d/doc142863.html}
                   {Wisdom Library translation} appears to be similar.},
  keywords = {ancient},
}

LaTeX setup for this bibliography

%%
%%  Font setup for Latin and Chinese
%%

% The following Chinese font support requires an exact font name match. E.g. on my Linux
% I install the package adobe-source-han-serif-cn-fonts, then check the name with 'fc-list | grep "Han Serif"'.
% For reference, on my system 'fc-list | grep Han' gives 136 lines, because
% I installed all the Adobe CJK fonts (Chinese, Japanese, Korean) as recommended by CJK
% experts.

% The order of package loading really matters because some of the references use bidi
% (bi-directional) text to display the relevant Arabic, for which there is no
% translation to a Western Language. bidi was a retrofit onto latex and is a bit sensitive.
% If bidi wasn't needed, packages could be loaded in any order. These problems are steadily
% reducing as lualatex is developed. Lualatex is really quite an impressive redevelopment.

% Maths comes first in a bidi world
\usepackage{amsmath}
\usepackage{amssymb}

% Fonts next for bidi ordering reasons
\usepackage{fontspec}
\usepackage{luatexja-fontspec} % CJK handling (not just ja). No equivalent needed for other languages.

% Not needed at all except for bidi. It "stabilises arrays for bidi" according to experts.
% I don't understand but it did make errors go away.
\usepackage{array}

% Polyglossia is needed to do language-aware hyphenating, date formats, quote style etc
% in at least the csquotes and biblatex packages. Replacement for the older babel package.
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{arabic}
\setotherlanguage{sanskrit} 

% no \setotherlanguage above for chinese, because luatexja handles this and we don't want
% polyglossia and luatexja to get into a fight about who captures the incoming CJK unicode.
% This potential clash is a recurring theme in this preamble.

\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\newfontfamily\arabicfonttt[Script=Arabic]{Amiri}
\newfontfamily\chinesefont{Source Han Serif SC}[
  Renderer=Harfbuzz,
  Script=CJK,
  AutoFakeSlant=0.2,  % CJK doesn't have italics, so this does a bit of judicious tilting
  AutoFakeBold=2 
]

\ltjsetparameter{jacharrange={-1}} % Prevents luatexja from being too aggressive and
% seizing all Unicode text that might be CJK, including parts of biblatex references
% that are merely adjacent to CJK text. 
\setmainjfont{Source Han Serif SC}[
  Index=2,
  Renderer=Harfbuzz,
  AutoFakeSlant=0.2,
  CharacterWidth=Full,  % Forces better mapping of CJK punctuation
  BoldFont={* Bold}     % Explicitly point to the bold weight so Biber knows it exists
]
% The Index=2 above is about mandating which version of a font to pick inside a TrueType collection.
% In this case, 2 is Simplified Chinese. Latex sometimes gets confused and picks (say)
% the Japanese version, so we are explicit. 0=Japanese, 1=Korean, 2=SC, 3=TC.

% Now repeat the above for the sans family (called 'gothic' in CJK typography). This is a
% trick, the point being that if latex wants a Han sans (gothic) font, it will now get the
% serif font instead. Reduces errors and the result seems good. Need to check with a CJK expert.
\setsansjfont{Source Han Serif SC}[
  Index=2,
  Renderer=Harfbuzz,
  AutoFakeSlant=0.2,
  CharacterWidth=Full,
  BoldFont={* Bold}  
]
% The above three commands (\ltjsetparameter, \setmainjfont, \setsansjfont) collectively
% avoid hundreds of warnings about missing fonts, and emit a better quality result.

\newfontfamily\devanagarifont[
  Script=Devanagari,
  HyphenChar=None, % Explicitly disable hyphenation, otherwise bibtex warns it can't load hyphenation rules
  ItalicFont={Noto Serif Devanagari},         % This script doesn't have italics or bold so we map them.
  BoldItalicFont={Noto Serif Devanagari Bold} % Unlike CJK where we fake slant. Also stops biblatex warnings.
]{Noto Serif Devanagari}

\setmainfont{TeX Gyre Pagella}
\newfontfamily\bigquotefont{TeX Gyre Cursor} % used for big block quotes

% The Gyre project (https://www.gust.org.pl/projects/e-foundry/tex-gyre/index_html)
% explains it all, but basically these are TeX and OpenType font families similar to
% the well-known commercial fonts with similar names, with significantly more functionality.
% On my Linux I installed the package tex-gyre-fonts .

% Disable all CJK small caps attempts. CJK doesn't have smallcaps in the fonts,
% but biber's default is smallcaps for authors. It generates a warning when it can't
% and this avoids large numbers of warnings.
\let\scshape\upshape
\let\textsc\textup

%%
%% Referencing and quoting setup
%%

\usepackage{authblk}

\usepackage[
  backend=biber,
  style=authoryear,  % alternatives include numeric, apa, etc.
  doi=true,
  url=true,
  isbn=false,
  datecirca=true,    % prints "circa" for dates marked approximate with a trailing tilde (~), e.g. {0900~}
  dateera=secular,   % handles BCE and CE by printing "BCE/CE"
  dateeraauto=1600,  % adds CE/BCE to anything before this year CE
  backref=true,      % great idea, but sometimes gets confused with the preview feature
  dateabbrev=false,
  language=auto,     % will change language according to langid, if present in an entry
  autolang=other,    % Use polyglossia/babel environments (but I am unsure why I need to set it)
  maxcitenames=2,    % Keep citations short: (Zhang et al., 2024)
  mincitenames=1,
  maxbibnames=99,    % List all authors in the bibliography (who doesn't? rude!)
  uniquelist=false   % Prevents BibLaTeX from adding names to disambiguate
]{biblatex}


% let long DOIs and URLs in bibliography break, avoiding overfull and other errors,
% and looks nicer.
\setcounter{biburllcpenalty}{7000}
\setcounter{biburlucpenalty}{8000}
\setcounter{biburlnumpenalty}{9000}

% Forces a gap between bib entries that PDF viewers can recognise as a boundary when doing
% a mouseover preview in the main text. Also just makes a bibliography look nicer.
\setlength{\bibitemsep}{1.5\itemsep}

% These mappings make sure biblatex doesn't start translating locale specific things
% like date formats or 'Appendix', 'Bibliography' etc. Other languages are merely content.
% 'british' is equivalent to the modern en_GB locale standard. This also means that the 
% default is left-to-right even in an entry containing arabic text enclosed in \textarabic{}.
% This also suppresses error messages from biblatex about 'Language not supported'.
\DeclareLanguageMapping{arabic}{british} 
\DeclareLanguageMapping{chinese}{british}
\DeclareLanguageMapping{sanskrit}{british}

% Force always printing 'nameaddon' before the author name. I use nameaddon exclusively for latin versions of
% Chinese (etc) names, so the effect is to render the real, untranslated name in the
% references. See notes at top of biblatex file for details of translation in 'note' field,
% and the special case of right-to-left arabic script in author names. This macro has
% completely replaced the authoryear macro, so it also made dates vanish until I added them back here.
% When forcing printing of nameaddon here, we must use the nameaddon field not shortauthor.
% The trailing % signs stop line ends from leaking stray spaces into the output.
\AtBeginDocument{%
  \renewbibmacro*{author}{%
    \printfield{nameaddon}%
    \setunit{\addspace}%
    \printnames{author}%
    \setunit{\addspace}%
    % This macro handles the label (derived from the 'date' field)
    % while respecting BCE/CE and circa formatting according to EDTF (ISO8601-2) dates.
    % Modern lualatex tries to be standards-compliant.
    \usebibmacro{date+extradate}%
  }%
}

% Make biblatex do quotation handling as expected for a paper
\usepackage{csquotes}

% The following macro seems to approximate the style I see in academic papers.
\DeclareCiteCommand{\parencite}
  [\mkbibbrackets]  % replaces parentheses with square brackets
  {\usebibmacro{prenote}}
  {\usebibmacro{citeindex}%
   \usebibmacro{cite}}
  {\multicitedelim}
  {\usebibmacro{postnote}}

\addbibresource{discovering-epidemiology.bibtex}
