How to manage BibLaTeX across time and cultures
I wrote a paper in English using LaTeX. For my quite interesting
topic some essential references did not exist in English. That might sound simple
— just list the originals, plus some translation/cross-referencing work to get the necessary information! It isn’t that
simple.
This howto is for LaTeX authors with unusual references, especially non-latin scripts, latinisations,
non-English references, rare scripts and ancient documents. In some of my sources I had all of these at once,
giving me this situation:
| Category | Range in my references |
|---|---|
| Languages | Chinese, Sanskrit, Spanish, Arabic |
| Non-latin scripts | Chinese, Sanskrit (Devanagari), Arabic |
| Latinisations | Chinese, Sanskrit, Arabic |
| Eras | ancient (3400 BCE), less ancient (900 CE), modern historical (1898) |
| Right-to-left scripts | Arabic |
| Dates | precise, approximate, and ranges |
“Ancient” here refers to the reference text, not the subject of the text. My paper has references analysing
Neanderthal cultures, but luckily Neanderthals did not publish books (as far as we know), so there were no
awkward dates from deep time.
These considerations were entirely outside my experience, and computer systems generally struggle with them.
LaTeX relies on the nearly 50-year-old TeX engine underneath, and the LuaTeX project has added modern
language support, including Unicode input and OpenType fonts.
LaTeX and BibLaTeX work by setting up the environment in the .tex file; conventions in the .bib file
then match what the environment is looking for.
What follows are excerpts from my .bib and .tex files. Your paper may have different requirements.
Authors and titles
For author names in non-latin characters, always use this form in your .bib file:
```bibtex
author = {张伟},
shortauthor = {Zhang Wei},
nameaddon = {Zhang Wei},
```
The two identical English approximations are used differently by BibLaTeX: shortauthor is rendered in citations in the main text, e.g.:
Leith residents built a big wall [Zhang Wei 2025]
but for the same entry, nameaddon is rendered in the bibliography (via the author macro in the setup at the end of this page), e.g.:
```
Zhang Wei 张伟. Building the Great Wall of Corstorphine.
```
This really matters when things get more complicated, as you'll see.
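A quick way to check the split between shortauthor and nameaddon on your own system is a throwaway LuaLaTeX test document along these lines. This is a sketch under some assumptions, not the setup used for this paper: the entry key zhang_wall_2025 is hypothetical, the Source Han Serif SC font from the setup at the end of this page is assumed to be installed (you may need the Index option described there), and the entry is written to a temporary .bib file with the filecontents* environment built into LaTeX since 2019. The author macro is a cut-down version of the one in that setup.

```latex
% Sketch of a test document. Build with: lualatex, biber, lualatex
% Write the (hypothetical) entry to <jobname>.bib before the document starts.
\begin{filecontents*}[overwrite]{\jobname.bib}
@book{zhang_wall_2025,
  author      = {张伟},
  shortauthor = {Zhang Wei},
  nameaddon   = {Zhang Wei},
  title       = {Building the Great Wall of Corstorphine},
  date        = {2025},
}
\end{filecontents*}
\documentclass{article}
\usepackage{fontspec}
\usepackage{luatexja-fontspec}       % lets LuaLaTeX typeset the CJK characters
\setmainjfont{Source Han Serif SC}   % assumption: font installed under this name
\usepackage[backend=biber, style=authoryear]{biblatex}
\addbibresource{\jobname.bib}
% Print the latinised nameaddon, then the original-script author, then the date.
% The trailing % signs stop line ends from adding stray spaces to the output.
\AtBeginDocument{%
  \renewbibmacro*{author}{%
    \printfield{nameaddon}%
    \setunit{\addspace}%
    \printnames{author}%
    \setunit{\addspace}%
    \usebibmacro{date+extradate}}}
\begin{document}
Leith residents built a big wall \parencite{zhang_wall_2025}.
\printbibliography
\end{document}
```

The citation should take its name from shortauthor, while the bibliography entry should start with the nameaddon text followed by 张伟; the exact punctuation depends on the style.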
Latinisation
Where there are latinised versions of Chinese/Arabic/Sanskrit author names, they must go in the
nameaddon field. This wasn't a problem in the example above because the only latin-script name used is the
English approximation. However, there is often a latinised version that retains features of the original which
English cannot express. For example:
```bibtex
author = {鲁迅},
shortauthor = {Lu Xun},
nameaddon = {Lǔ Xùn},
```
Here shortauthor holds the widely used English approximation, and nameaddon the latinised equivalent (in this case pinyin).
Even though these systems use the latin script, they often need a specific font to be installed, because
their accents mark specific sounds or grammatical features (the tone marks in Lǔ Xùn, for example).
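To find out whether a font actually has the accented characters a latinisation needs, a small test with the e-TeX primitive \iffontchar can help. This is a sketch assuming LuaLaTeX and fontspec: the font under test is the main font from the setup at the end of this page, and the two characters checked (ǔ, U+01D4, as in Lǔ Xùn, and Ḥ, U+1E24, as in al-Ḥāwī) are only examples, so substitute the characters your own references use. The \checkglyph command is a hypothetical helper defined here, not part of any package.

```latex
% Sketch: report whether the current font has glyphs for some example diacritics.
% Compile with lualatex.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{TeX Gyre Pagella} % font under test (replace with your own)
% Hypothetical helper: #1 = character code, #2 = description to print.
\newcommand{\checkglyph}[2]{%
  \iffontchar\font#1\relax
    #2: present\par
  \else
    #2: MISSING\par
  \fi}
\begin{document}
\checkglyph{"01D4}{u with caron (pinyin third tone)}
\checkglyph{"1E24}{H with dot below (Arabic latinisation)}
\end{document}
```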
There are perhaps a hundred or so latinisation systems for writing non-latin languages. Here are some examples:
| Script/Language | Romanization System |
|---|---|
| Chinese | Pinyin |
| Arabic | Latin-i harakat |
| Japanese | Hepburn romanization (Hebon-shiki) |
| Sanskrit | IAST (International Alphabet of Sanskrit Transliteration) |
| Korean | Revised Romanization of Korean |
| Russian | BGN/PCGN romanization |
| Thai | RTGS (Royal Thai General System) |
| Serbian | Gaj’s Latin alphabet |
There is a similar but slightly different trick for handling latinisations in titles:
```bibtex
title = {كتاب الحاوي في الطب},
titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb},
```
Here titleaddon holds the Latin-i harakat latinisation.
Which latinisation for references?
Latinisation systems exist for people who prefer latin scripts for ease and speed, and in many cases as
a response to ubiquitous Western-derived computer technology. Some of these systems are evolving rapidly;
Chinese, for example, now also has Shuangpin (double pinyin).
Some languages have a large number of different latinisation systems. Arabic has three main systems:
DIN, ALA-LC and Hans Wehr (a less offensive everyday term covering them all being “Latin-i harakat”). Japanese
has both Hebon-shiki (called Hepburn in English) and Kunrei-shiki, Korean has two latinisations,
Manding has three N’ko latinisations, and so on. Which should be used in references? Without specialist advice
there will always be uncertainty, so the reliable choice is to always include the original script as well.
Translations
There is a problem with the Arabic title given above: while titleaddon contains the standard
latinisation, it still lacks an English translation. The latin script is valuable because it tells you
what to search for if you don’t read Arabic characters (you can still copy and paste Arabic, which can be
essential, but even in 2026 some computer systems don’t handle Arabic very well). So if there is
an English translation of a title or an author, it is helpful to add it.
In this case the translated title should be in the note field, as follows:
```bibtex
note = {Translated as: The Comprehensive Book of Medicine}
```
This isn’t just for Arabic; the same is true for Chinese, which is a great example of the difference:
pinyin latin equivalents are often supplied, but pinyin is not a translation. There are often many ways to
translate a given title into English. Where translations are few, partial or obscure, the English translation of
the title or author may be so misleading that readers are better off with the original. Even if you have no
knowledge of the language and don’t read the script, a search engine is more likely to find information about
a rarely translated author if you use their native Chinese/Arabic/etc. name.
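Combining the author, title and translation conventions for a Chinese source might look like the sketch below. It is illustrative only: the entry key luxun_madman_1918 is hypothetical, the work (Lu Xun’s 1918 short story 狂人日记) is chosen simply because Lu Xun appears above, and publication details are left out. The story has also appeared in English as “Diary of a Madman”, which is exactly why the note records one translation rather than pretending there is only one. The entry is wrapped in LaTeX’s filecontents* environment so it can be pasted at the very top of a test document, before \documentclass.

```latex
% Writes a throwaway .bib file named after the current document.
\begin{filecontents*}[overwrite]{\jobname.bib}
@misc{luxun_madman_1918,
  author      = {鲁迅},
  shortauthor = {Lu Xun},
  nameaddon   = {Lǔ Xùn},
  title       = {狂人日记},
  titleaddon  = {Kuángrén Rìjì},
  note        = {Translated as: A Madman's Diary},
  date        = {1918},
}
\end{filecontents*}
```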
Right-to-left scripts
For left-to-right scripts (the default for CJK and latin languages) the conventions above work.
They may seem to work for Arabic too, but there is still a problem because Arabic is written
right-to-left. When Arabic text is detected, the direction switches to right-to-left so that the Arabic
script is correct; unfortunately the switch also applies to all other text in the reference, including
latin characters, whether English or latinisation. So an Arabic reference containing
a latin field, as they normally do, will have its latin fields rendered like this:
```
'Medicine of Book Comprehensive The'
or
'enicideM fo kooB eviseneherpmoC ehT'
```
depending on context. To fix this, set the default language in the preamble to a left-to-right
language such as ‘british’ (as in this bibliography) or ‘chinese’. Then preserve
the Arabic text in the reference exactly as it is written by wrapping it in \textarabic{} inside
the field’s own braces, so that it sits inside a second level of braces:
```bibtex
title = {\textarabic{كتاب الحاوي في الطب}},
```
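The preamble side of this fix can be tried in isolation with a small test document like the one below. It is a sketch assuming LuaLaTeX, polyglossia and the Amiri and TeX Gyre Pagella fonts used in the setup at the end of this page; the entry key alrazi_hawi is hypothetical, and the entry omits author and date for brevity.

```latex
% Sketch: an Arabic title inside a left-to-right English document.
% Build with: lualatex, biber, lualatex
\begin{filecontents*}[overwrite]{\jobname.bib}
@book{alrazi_hawi,
  title      = {\textarabic{كتاب الحاوي في الطب}},
  titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb},
  note       = {Translated as: The Comprehensive Book of Medicine},
}
\end{filecontents*}
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}  % left-to-right default for everything
\setotherlanguage{arabic}  % provides \textarabic for right-to-left runs only
\newfontfamily\arabicfont[Script=Arabic]{Amiri} % assumption: Amiri installed
\setmainfont{TeX Gyre Pagella} % assumption: covers the latinisation diacritics
\usepackage{csquotes}
\usepackage[backend=biber, style=authoryear]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

Only the text inside \textarabic should run right-to-left; the titleaddon and note should stay in normal left-to-right order.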
Preservation
Preservation with double braces is useful elsewhere too. Another common problem is that many latinisations
contain special characters that need double braces, as in this Arabic example:
```bibtex
publisher = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}
```
Without the double braces, the ’ and ‘ characters (marking hamza and ʿayn) break BibLaTeX.
The double braces preserve the string exactly as written. Another example, which applies to ordinary English
references too, is BibLaTeX’s default assumption that an author is a person with given and family names:
```bibtex
author = {Dundee Museum}
```
will render as ‘Museum, Dundee’ in the bibliography, unless you write
```bibtex
author = {{Dundee Museum}}
```
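To see the difference between the two forms, a test document along these lines prints both versions one after the other. The entry keys and titles are placeholders. In the first entry the name is parsed as a person (given name Dundee, family name Museum); in the second the inner braces make it a single name printed as written.

```latex
% Sketch: compare a parsed and a protected corporate author.
% Build with: lualatex, biber, lualatex
\begin{filecontents*}[overwrite]{\jobname.bib}
@misc{museum_parsed,
  author = {Dundee Museum},
  title  = {Placeholder title one},
}
@misc{museum_protected,
  author = {{Dundee Museum}},
  title  = {Placeholder title two},
}
\end{filecontents*}
\documentclass{article}
\usepackage[backend=biber, style=authoryear]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}         % include both entries without citing them
\printbibliography % compare how the two author names are printed
\end{document}
```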
Dates
Dates use the EDTF (ISO 8601-2) standard. BibLaTeX handles BCE/CE dates correctly, but also omits
era labels where they would be pointless or distracting (controlled by the dateeraauto option in the setup
below). The full date syntax is in the BibLaTeX user
manual.
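A sketch of a test document covering the kinds of date in my opening table: precise, approximate, ranges and BCE. The entry keys and titles are placeholders, and the option values mirror the setup at the end of this page, plus dateuncertain, which that setup does not use. Note that EDTF uses astronomical year numbering, in which year 0000 is 1 BCE, so -0599 is 600 BCE and -3399 is 3400 BCE. With style=authoryear, how much of a precise date appears in the bibliography also depends on the style’s mergedate option.

```latex
% Sketch: EDTF date forms in biblatex. Build with: lualatex, biber, lualatex
\begin{filecontents*}[overwrite]{\jobname.bib}
@misc{date_precise,  title = {Precise day},          date = {1898-04-21}}
@misc{date_circa,    title = {Approximate year},     date = {0900~}}
@misc{date_range,    title = {Range of years},       date = {1898/1901}}
@misc{date_bce,      title = {Approximate year BCE}, date = {-3399~}}
@misc{date_bcerange, title = {Approximate BCE range}, date = {-0599~/-0499~}}
@misc{date_unsure,   title = {Uncertain year},       date = {1898?}}
\end{filecontents*}
\documentclass{article}
\usepackage[backend=biber, style=authoryear,
  datecirca=true,     % print "circa" for dates marked with ~
  dateuncertain=true, % print a question mark for dates marked with ?
  dateera=secular,    % BCE/CE rather than BC/AD
  dateeraauto=1600    % add the era label to years before 1600
]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```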
Long lists of authors
For very long lists of authors (such as [Meisner2024] in this bibliography) include all authors
separated by ‘and’, rather than truncating the list with ‘and others’. BibLaTeX has been set up in the
preamble with maxcitenames/mincitenames and maxbibnames so that it renders “et al.” in the
text, but lists all names in the bibliography.
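Listing every author matters because BibLaTeX treats a name list ending in ‘and others’ as truncated, so “et al.” then appears in the bibliography as well. A sketch of a test, with placeholder authors, title and journal: with these options the citation should shorten to the first author plus “et al.”, while the bibliography lists all five names.

```latex
% Sketch: short citations, full author lists. Build with: lualatex, biber, lualatex
\begin{filecontents*}[overwrite]{\jobname.bib}
@article{many_authors,
  author       = {Alpha, Ann and Beta, Ben and Gamma, Gill and Delta, Dan
                  and Epsilon, Eve},
  title        = {A Placeholder Paper with Five Authors},
  journaltitle = {Placeholder Journal},
  date         = {2024},
}
\end{filecontents*}
\documentclass{article}
\usepackage[backend=biber, style=authoryear,
  maxcitenames=2, mincitenames=1, % citations: first author + et al.
  maxbibnames=99,                 % bibliography: every author
  uniquelist=false
]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
As shown by \textcite{many_authors}.
\printbibliography
\end{document}
```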
Full example
Here is a fictional example in full, handling both Chinese and right-to-left Arabic, with
latinisations, an English translation and correct use of the -addon and note fields.
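```bibtex
author = {冷開泰},
shortauthor = {Leng Kaitai},
nameaddon = {Lěng Kāitài},
title = {\textarabic{كتاب الحاوي في الطب}},
titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb},
note = {Translated as: The Comprehensive Book of Medicine},
publisher = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}}
```
| Field | Contents |
|---|---|
| author | name in the original script |
| shortauthor | English approximation |
| nameaddon | correctly latinised name |
| title | Arabic title, with right-to-left text preserved |
| titleaddon | latinised title |
| note | English translation |
| publisher | latinised, with unsafe quote characters protected by double braces |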
NB there is no ‘authoraddon’ field, even though the parallel with titleaddon suggests there
should be; nameaddon plays that role.
A complete real-world reference looks like this:
```bibtex
@book{sushruta_samhita_1907,
  title = {The {Suśruta-Saṃhitā}},
  titleaddon = {\textsanskrit{सुश्रुतसंहिता}},
  author = {{Suśruta (composite work)}},
  nameaddon = {\textsanskrit{सुश्रुत}},
  translator = {Bhishagratna, Kaviraj Kunja Lal},
  date = {1907},
  origdate = {-0599~/-0499~},
  location = {Calcutta},
  url = {https://wellcomecollection.org/works/vnqskk8w/items?canvas=98&manifest=2},
  note = {English translation of the original Sanskrit text (circa 600 BCE--500
    BCE), including discussion on transmissibility. The
    \href{https://www.wisdomlib.org/hinduism/book/sushruta-samhita-volume-2-nidanasthana/d/doc142863.html}{Wisdom Library translation}
    appears to be similar.},
  keywords = {ancient},
}
```
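The keywords field is not printed, but it can drive filtering. One possible use (an assumption, not necessarily how this paper’s bibliography is arranged) is to print the ancient sources as a separate list, using the keyword and notkeyword filters of \printbibliography:

```latex
% In the document body. Assumes the biblatex setup below and that ancient
% sources carry keywords = {ancient}, like sushruta_samhita_1907.
\printbibliography[keyword=ancient, title={Ancient sources}]
\printbibliography[notkeyword=ancient, title={Modern sources}]
```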
LaTeX setup for this bibliography
```latex
%%
%% Font setup for Latin and Chinese
%%
% The following Chinese font support requires an exact font name match. E.g. on my Linux
% I installed the package adobe-source-han-serif-cn-fonts, then ran 'fc-list | grep "Han Serif"'.
% For reference, on my system 'fc-list | grep Han' gives 136 lines, because
% I installed all the Adobe CJK fonts (Chinese, Japanese, Korean) as recommended by CJK
% experts.
% The order of package loading really matters because some of the references use bidi
% (bi-directional) text to display the relevant Arabic, for which there is no
% translation to a Western language. bidi was a retrofit onto LaTeX and is a bit sensitive.
% If bidi wasn't needed, packages could be loaded in any order. These problems are steadily
% reducing as LuaLaTeX is developed. LuaLaTeX is really quite an impressive redevelopment.
% Maths comes first in a bidi world
\usepackage{amsmath}
\usepackage{amssymb}
% Fonts next for bidi ordering reasons
\usepackage{fontspec}
\usepackage{luatexja-fontspec} % CJK handling (not just ja). No equivalent needed for other languages.
% Not needed at all except for bidi. It "stabilises arrays for bidi" according to experts.
% I don't understand why, but it did make errors go away.
\usepackage{array}
% Polyglossia provides language-aware hyphenation, date formats, quote styles etc.,
% used by at least the csquotes and biblatex packages. It replaces the older babel package.
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{arabic}
\setotherlanguage{sanskrit}
% There is no \setotherlanguage above for Chinese, because luatexja handles it and we don't
% want polyglossia and luatexja to fight over who captures the incoming CJK Unicode.
% This potential clash is a recurring theme in this preamble.
\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\newfontfamily\arabicfonttt[Script=Arabic]{Amiri}
\newfontfamily\chinesefont{Source Han Serif SC}[
  Renderer=Harfbuzz,
  Script=CJK,
  AutoFakeSlant=0.2, % CJK doesn't have italics, so this does a bit of judicious tilting
  AutoFakeBold=2
]
\ltjsetparameter{jacharrange={-1}} % Prevents luatexja from being too aggressive and
% seizing all Unicode text that might be CJK, including parts of biblatex references
% that are merely adjacent to CJK text.
\setmainjfont{Source Han Serif SC}[
  Index=2,
  Renderer=Harfbuzz,
  AutoFakeSlant=0.2,
  CharacterWidth=Full, % Forces better mapping of CJK punctuation
  BoldFont={* Bold} % Explicitly point to the bold weight so fontspec/luatexja can find it
]
% The Index=2 above selects which font to use inside a font collection (.ttc) file.
% In this case, 2 is Simplified Chinese. LaTeX sometimes gets confused and picks (say)
% the Japanese version, so we are explicit. 0=Japanese, 1=Korean, 2=SC, 3=TC.
% Now repeat the above for the sans font (called "gothic" in Japanese typography), but
% still pointing at the serif font. This is a trick: if LaTeX wants a Han sans font, it
% gets the serif one instead. Reduces errors and the result seems good. Need to check
% with a CJK expert.
\setsansjfont{Source Han Serif SC}[
  Index=2,
  Renderer=Harfbuzz,
  AutoFakeSlant=0.2,
  CharacterWidth=Full,
  BoldFont={* Bold}
]
% The above three commands (\ltjsetparameter, \setmainjfont, \setsansjfont) collectively
% avoid hundreds of warnings about missing fonts, and emit a better quality result.
\newfontfamily\devanagarifont[
  Script=Devanagari,
  HyphenChar=None, % Explicitly disable hyphenation, otherwise there are warnings that hyphenation rules can't be loaded
  ItalicFont={Noto Serif Devanagari}, % This font has no italics, so map italic to upright
  BoldItalicFont={Noto Serif Devanagari Bold} % and bold italic to bold. Unlike CJK, where we fake slant. Also stops biblatex warnings.
]{Noto Serif Devanagari}
\setmainfont{TeX Gyre Pagella}
\newfontfamily\bigquotefont{TeX Gyre Cursor} % used for big block quotes
% The Gyre project (https://www.gust.org.pl/projects/e-foundry/tex-gyre/index_html)
% explains it all, but basically these are TeX and OpenType font families similar to
% the well-known commercial fonts with similar names, with significantly more functionality.
% On my Linux I installed the package tex-gyre-fonts.
% Disable small caps. CJK fonts don't have small caps, but the bibliography style may ask
% for them (e.g. for author names), which generates a warning each time; this avoids large
% numbers of warnings. Note that it disables small caps everywhere, not only for CJK text.
\let\scshape\upshape
\let\textsc\textup
%%
%% Referencing and quoting setup
%%
\usepackage{authblk}
\usepackage[
  backend=biber,
  style=authoryear, % alternatives include numeric, apa, etc.
  doi=true,
  url=true,
  isbn=false,
  datecirca=true, % prints "circa" for approximate dates, marked with ~ in EDTF (e.g. 1907~)
  dateera=secular, % prints eras as "BCE/CE"
  dateeraauto=1600, % adds the era label to any year before 1600 CE
  backref=true, % great idea, but sometimes gets confused with the preview feature
  dateabbrev=false,
  language=auto, % switches language according to langid, if present in an entry
  autolang=other, % wraps entries that have a langid in polyglossia's otherlanguage environment
  maxcitenames=2, % Keep citations short: (Zhang et al., 2024)
  mincitenames=1,
  maxbibnames=99, % List all authors in the bibliography (who doesn't? rude!)
  uniquelist=false % Prevents BibLaTeX from adding names to disambiguate
]{biblatex}
% Let long DOIs and URLs in the bibliography break, avoiding overfull boxes and other
% errors; it also looks nicer.
\setcounter{biburllcpenalty}{7000}
\setcounter{biburlucpenalty}{8000}
\setcounter{biburlnumpenalty}{9000}
% Forces a gap between bib entries that PDF viewers can recognise as a boundary when doing
% a mouseover preview in the main text. Also just makes a bibliography look nicer.
\setlength{\bibitemsep}{1.5\itemsep}
% These mappings make sure biblatex doesn't start translating locale-specific things
% like date formats or 'Appendix', 'Bibliography' etc. Other languages are merely content.
% 'british' is equivalent to the modern en_GB locale standard. This also means that the
% default is left-to-right even in an entry containing Arabic text enclosed in \textarabic{}.
% This also suppresses error messages from biblatex about 'Language not supported'.
\DeclareLanguageMapping{arabic}{british}
\DeclareLanguageMapping{chinese}{british}
\DeclareLanguageMapping{sanskrit}{british}
% Force printing 'nameaddon' before the author name. I use nameaddon exclusively for latin
% versions of Chinese (etc.) names, so the effect is to render the latinised name followed by
% the real, untranslated name in the references. See notes at top of the biblatex file for
% details of translation in the 'note' field, and the special case of right-to-left Arabic
% script in author names. This macro completely replaces the authoryear 'author' macro, so it
% also made dates vanish until I added them back here.
% To force nameaddon to print here, we must use the nameaddon field, not shortauthor.
% The trailing % signs stop line ends from adding stray spaces to the bibliography.
\AtBeginDocument{%
  \renewbibmacro*{author}{%
    \printfield{nameaddon}%
    \setunit{\addspace}%
    \printnames{author}%
    \setunit{\addspace}%
    % This macro handles the label (derived from the 'date' field)
    % while respecting BCE/CE and circa formatting of EDTF (ISO 8601-2) dates,
    % which Biber parses.
    \usebibmacro{date+extradate}%
  }%
}
% Make biblatex do quotation handling as expected for a paper
\usepackage{csquotes}
% The following macro seems to approximate the style I see in academic papers.
\DeclareCiteCommand{\parencite}
  [\mkbibbrackets] % replaces parentheses with square brackets
  {\usebibmacro{prenote}}
  {\usebibmacro{citeindex}%
   \usebibmacro{cite}}
  {\multicitedelim}
  {\usebibmacro{postnote}}
\addbibresource{discovering-epidemiology.bibtex}
```
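For completeness, here is a sketch of how a preamble like this fits into a full document and how it is built. The file name paper.tex and the document class are assumptions. The \href in the Suśruta entry’s note needs the hyperref package, which the excerpt above does not load; I assume it is loaded after everything else, as hyperref generally prefers.

```latex
% Sketch of a complete document around the preamble above.
\documentclass{article}
% ... the preamble above goes here ...
\usepackage{hyperref} % assumption: needed for \href in the Suśruta note; load it last
\begin{document}
See the Suśruta-Saṃhitā \parencite{sushruta_samhita_1907}.
\printbibliography
\end{document}
% Build sequence, run from a shell (LuaLaTeX, then Biber, then LuaLaTeX twice
% so that labels and back-references settle):
%   lualatex paper
%   biber paper
%   lualatex paper
%   lualatex paper
```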