[{"content":" The Rule-based Epidemic Modelling (RBEM) Project was funded by the UK Medical Research Council project (grant X/011658/1) to make rule-based modelling methodology more accessible to the infectious disease modelling community. The rule-based approach is instead of writing differential equations, and there can be human as well as technical advantages to this. Our pre-print (draft) paper illustrates how the approach can be applied to classical problems, in addition to our previous work (cited below) on novel problems during pandemic pressures.\nIn our team with Sándor Bartha, Akindele A. Onifade and William Waites, we modelled malaria as found in sub-Saharan Africa in areas of high prevalance of sickle-cell trait (HbAS). HbAS carriers typically do not develop malaria symptoms when tested but remain infectious to others. We considered the research question:\nIf testing for malaria is done according to observing symptoms, does the infectious resevoir of asymmptomatic HbAS carriers introduce a bias sufficient to affect the course of a malaria epidemic?\nWe were able to investigate this more readily with our rule-based techniques than with traditional approaches, concluding that observational testing bias of the HbAS resevoir can render testing ineffective for suppressing an epidemic, which we contrasted with the effect of randomised testing.\nOverview A significant part of the advantages to rule-based modelling relates to the natural history of a disease.\nThe natural history of a disease is the story of its progress in individual humans or animals. No matter how the disease starts, it has a process and an endpoint - the patient may be in many states including remission, recovered, chronicity or death. The natural history arises from a combination of the empathy of a physician with the observational skill of a scientist, and has been a foundation of medical care for thousands of years.\nAn epidemiological model takes another step, where the natural history can be calculated on computers so we can draw conclusions about what happens across entire populations as we manage a disease.\nThe RBEM project addresses a difficulty in epidemiological models where the computer language expressing the disease equations can look very different from the natural history it is modelling. This in turn makes it difficult to explain what the model is doing, often not immediately obvious even to experts which natural history a model is encoding. This also makes it difficult for people elsewhere to reproduce results. These factors become a problem in a emergency.\nIn this project our goal is to:\nProvide a good language for writing down the natural history of a disease as a story, describing and explaining how a person moves from one state to another, and what comes next. This language needs to be easy to read so humans can understand and improve it, but also suitable for computers to perform calculations on it. The language needs to be easy to set up and run so that it can become part of the daily discourse in investigating disease.\nWe hope this may be a useful additional tool to the many which already exist.\nOrigins of rule-based epidemic modelling The first application of rule-based modelling to epidemics was under the extreme challenges of COVID-19. As the Royal Society\u0026rsquo;s 2020 initiative Rapid Assistance in Modelling the Pandemic made clear, existing techniques were often insufficient. 
We agreed, and contributed a paper to the Society\u0026rsquo;s 2021 review in which we said:\nReal-time modelling and the vast amount of models developed in the last 24 months have highlighted gaps in the existing technical frameworks that ought to be addressed in preparedness for possible future pandemics.\nThis was confirmed in 2025, when the UK COVID-19 Inquiry published its first two reports containing about 250 references to mathematical modelling. Epidemiologists providing advice in times of crisis need the tools to demonstrate their reasoning in a reproducible manner, easily explainable to their scientific peers.\nAll modelling is communication, and we believe our technique has communication advantages for readers in terms of accessibility and explainability, and potentially also in ease of use for those creating models.\nAs we said in our 2021 paper in the Journal of Theoretical Biology:\nRule-based models allow to combine transparent modelling approach with scalability and compositionality and therefore can facilitate the study of aspects of infectious disease propagation in a richer context than would otherwise be feasible.\nHaving validated this approach with many examples in the paper, we now seek to make it more accessible. Here we bring together existing techniques and tools and demonstrate practically how to:\nget up and running easily to produce epidemiological results; set up a reproducible environment, which is a precondition for reproducible results; overlay computational techniques on the basic rule-based system; integrate into existing disease modelling workflows; and identify which additional work is needed to make the rule-based modelling systems better. Getting started ","permalink":"https://shearer.org/research/rbem-malaria/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eThe Rule-based Epidemic Modelling (RBEM) Project was funded by the UK Medical Research Council (grant X/011658/1) to make rule-based modelling\nmethodology more accessible to the infectious disease modelling community. The rule-based approach is an alternative to writing differential equations, and it can\noffer human as well as technical advantages. Our \u003ca href=\"/files/rbem-malaria.pdf\"\u003epre-print (draft) paper\u003c/a\u003e illustrates how the approach can be applied to\nclassical problems, in addition to our previous work (cited below) on novel problems during pandemic pressures.\u003c/p\u003e","title":"Rule-based Epidemic Modelling and Malaria"},{"content":" In 2026 the Active Heat Exchanger (AHE) research programme has advanced considerably. We started as an engineering development project, found an engineering research question and then found a health research question. We address the problem of houses making their occupants sick, but we also solve difficult data problems highlighted by the COVID-19 pandemic.\nHere is where we started: I walked into the Edinburgh Hacklab one day in 2022 and I saw this arrangement of fans and tubes in the window:\nearly prototype PHE device Awaab Ishak This is Awaab Ishak, aged two years, and I saw this device could be a solution to the tragedy of Awaab's life.\nThe designer was Costa Talalaev, a physicist who runs Makerbee Ltd, and we started to work together. Two research questions Having mastered the necessary mathematical topology design and high-precision 3D manufacturing skills, we were able to investigate:\nIs it possible to control for three parameters (humidity, CO₂ and temperature) using only fan speed controls? 
We discovered we can, adding very little cost to turn our already-cheap retrofit ventilation device into a smart indoor climate management system; and Can we provide the data the scholarship identifies as lacking, in order to make policy decisions relating to indoor air quality and also disease transmission? So far it seems likely we can, by using AI to combine data across many devices. This has the potential to become a part of preventative health systems and also for active response during incidents of some kinds of disease and airborne pollution. We have not completed this research by any means, but we are well on the way.\nAwaab and housing Awaab Ishak was two years old when he died of a mould-related illness in a damp flat in England. His death moved English politicians, and from October 2025 Awaab\u0026rsquo;s Law requires English landlords to fix unhealthy homes.\nThe World Health Organisation estimates that 20–30 per cent of households in many EU countries have dampness problems. The UK is particularly exposed, with the oldest housing stock in Europe — probably the world, which is unsurprising given that 38 per cent of homes were built before 1946.\n47 per cent of English homes have uninsulated walls, and of the roughly 8.5 million homes with solid walls across Britain, around 90 per cent remain uninsulated. Around 7 per cent of UK homes still lack double glazing, mostly the oldest and hardest to retrofit.\nIn Scotland, 694,000 homes — 27 per cent of the stock — fall below the Tolerable Standard, and 1.2 million homes are below a C energy rating. The problem is particularly bad in temperate oceanic climates.\n⚠️ The problem Awaab\u0026#39;s family had Mould occurs when moisture in warm air meets a cold surface. The problem started with inadequate heating and poor insulation. The activities of life in a house such as cooking and breathing create warm humid air, and the lack of ventilation and poor heating meant humid air met cold surfaces and mould flourished. In a flat like Awaab\u0026rsquo;s, there are three problems at once:\nmake the air warm with heating; keep the warm air (that you just paid to heat) inside with insulation; and have good air ventilation to the cold outside, without throwing away the warm air. It is normal in the UK to solve just one or two of these problems in a retrofit by installing a new heating system or adding insulation, and while that reduces heating bills it also makes people sick, creating more Awaabs.\nThe initial solution The Active Heat Exchanger was conceived when Costa Talalaev, a physicist who runs Makerbee Ltd in Edinburgh, had what seemed a simple problem: his old flat was damp. He couldn\u0026rsquo;t afford to tear it apart for ducting, he didn\u0026rsquo;t want to waste the heating energy he was paying for, and whatever he fitted had to work even with the draughts and leaks that are normal (often by design) in old Scottish buildings.\nUsing 3D printing and his lab measurement tools Costa created an origami-type design which worked for his flat. He then further proved it in the large communal room of the hacklab near his office, where he could monitor temperature and CO₂ over the web for months at a time. I saw the prototype running with its monitoring data on a screen, and joined the project on the spot.\nCosta also founded Warm Edinburgh, a group of over 1,000 people working on tenement insulation and related problems, so he was hearing from tenants and landlords about humidity and black mould constantly. 
Existing ventilation solutions are designed for modern, well-sealed buildings with centralised ducting. They are inflexible, expensive, and they assume the building is airtight, which old buildings never are. Costa moved the solution away from mechanical rigidity into three components: cheap and highly controllable fans, intelligent control software, and comprehensive sensors. The result is cheap to produce, and even at hacklab scale the evidence from manufacturing PPE during the pandemic shows we can produce thousands of units per year before outsourcing to a factory for very high volumes.\nBefore showing where we are today, it helps to look at the principle of heat exchanging.\nTraditional Passive Heat Exchange principles The original Passive Heat Exchanger (PHE) built by Costa is a 30-centimetre box section tube filled with a honeycomb of hundreds of 5mm tubes. A diverter at one end splits the tubes so that half exit one way and half exit the other. You mount the whole thing through a wall, with a small fan at each end. Warm air leaving the building flows out through one set of tubes while cold air from outside flows in through the other set, and because the tube walls are extremely thin (as thin as a sheet of paper), heat transfers through them. The outgoing warm air heats the incoming cold air, so you get fresh air without losing your hot air.\nThis is the kind of PHE commonly found in German Passivhaus systems, and illustrates the general principle: Classic heat exchanger What Active Heat Exchange adds In its simplest, passive form the device runs with constant fan speed and a manual control, a stripped down version of the initial prototype. The Active Heat Exchanger adds IoT sensors and controllable fans, and this is where something clever happens with the physics. Measuring airflow directly in a small tube is very difficult, but there is almost always a temperature difference between the two ends of the exchanger, and Costa worked out that by monitoring the temperature differential you can apply straightforward physics to reliably infer the airflow rate and direction. That single measurement from a cheap temperature sensor, combined with humidity and CO₂ sensors, gives enough information to manage fan speed, compensate for wind pressure and maintain air quality automatically, all from a unit the size of a bathroom exhaust fan (a simplified control sketch follows below).\nEach active unit optimises its own immediate environment using control theory, and by connecting multiples of these intelligent systems they can coordinate their activities for an entire building without needing a traditional centralised computer system.\nImproving on the overseas normal For the last thirty years the German Passivhaus standard has addressed all three of these problems at once, including a heat recovery ventilation system as the de facto standard. Heat recovery brings in fresh air from the outside, warming it with the stale air as it goes out, with ducting throughout the house. In contrast, the UK updated minimum heat recovery requirements in 2022, which do not apply to the existing homes that need it most. New build homes can be constructed as a sealed box where it is easy to do heat recovery, whereas old buildings are leaky and were never designed for ducting, and wind overwhelms ventilation fans and pushes all the heat out through leaks elsewhere.\nThe needs of any building are constantly changing. 
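The sketch below illustrates the kind of per-unit control loop this implies, responding to those constantly changing conditions. It is illustrative only: the sensor names, setpoints and gains are assumptions made for the example, not the Makerbee firmware, and the temperature-differential airflow inference is reduced to a placeholder.

# Illustrative control loop for an active heat-exchanger unit (not the real firmware).
# Assumed inputs: indoor/outdoor temperatures, humidity, CO2; output: fan duty 0..1.
from dataclasses import dataclass

@dataclass
class Readings:
    t_in: float         # indoor air temperature, degrees C
    t_out: float        # outdoor air temperature, degrees C
    t_exchanger: float  # temperature at the exchanger core, degrees C
    rh: float           # relative humidity, per cent
    co2: float          # CO2 concentration, ppm

CO2_TARGET = 800.0   # ppm, illustrative setpoint
RH_TARGET = 60.0     # per cent, illustrative setpoint

def inferred_airflow(r: Readings) -> float:
    '''Placeholder for the temperature-differential airflow inference.
    A real implementation would use an energy balance across the core;
    here we just return a normalised proxy in the range -1..1.'''
    span = max(abs(r.t_in - r.t_out), 0.5)
    return max(-1.0, min(1.0, (r.t_in - r.t_exchanger) / span))

def fan_duty(r: Readings) -> float:
    '''Proportional response to whichever constraint is furthest from target.'''
    co2_error = max(0.0, (r.co2 - CO2_TARGET) / CO2_TARGET)
    rh_error = max(0.0, (r.rh - RH_TARGET) / RH_TARGET)
    demand = max(co2_error, rh_error)
    # Back off when wind is already pushing air through the core the wrong way.
    if inferred_airflow(r) < -0.5:
        demand *= 0.5
    return max(0.1, min(1.0, demand))  # never stop completely; cap at full speed

if __name__ == '__main__':
    print(fan_duty(Readings(t_in=20, t_out=5, t_exchanger=12, rh=72, co2=1400)))

In the real device the three targets (humidity, CO₂ and temperature) interact, which is exactly the research question described above.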
For example, Carbon Dioxide (CO₂) builds up when people are in the room and clears when they leave, humidity spikes when someone showers, and air pressure changes with the weather and wind. Managing these things is easier if the building is a sealed box, but even then it is still a dynamic, changing situation ideally suited to Internet of Things device thinking.\nWe have interesting designs in progress (such as fitting into existing window bleed vents), but what we have now fits into a standard 100mm bathroom exhaust hole, exchanges 30 cubic metres of air per hour, and would cost perhaps £300 to an end consumer installing it in their house. It is remotely controlled and produces important data on what people breathe in during most of their lives.\nWhere we know it works There are some applications that are immediate and obvious, and do not need additional development because we have solved the engineering problem. The PHE is a drop-in replacement for an exhaust fan, fitting into the same 100mm hole. It works in old buildings because it doesn\u0026rsquo;t depend on the envelope being sealed. It handles wind because the control system adapts. And because the units are small and modular, they fit into spaces that centralised systems can\u0026rsquo;t reach.\nInline exhaust fan Boats and live-in vans or caravans have serious condensation problems that most people don\u0026rsquo;t think about until they live aboard or travel in winter. A boat cabin or a campervan with people sleeping in it generates litres of moisture overnight, and without ventilation everything is soaked by morning. The second-generation PHE design fits into a panel mount for exactly this kind of installation, and in vehicles the passive version with constant low-speed fans is usually all you need.\nSecond generation unit in panel mount The potential agricultural applications surprised us. In intensive horticulture, plants need CO₂ delivery, humidity control, and protection from temperature swings, and growers currently use gas heating for all of this at great expense. In animal husbandry, the needs are different (air drying, dust removal, disease vector control) but the AHE\u0026rsquo;s ability to manage airflow through narrow tubes suits both. The really interesting possibility is connecting plant and animal controlled environments via AHE, because their atmospheric needs are partly complementary: animals produce CO₂ and excess heat that plants can use, while plants produce oxygen. The narrow tubes also make it possible to divert gases through filters, including zeolite systems that can capture methane. There are additional research questions here, but we have done enough investigation to believe this to be a promising field.\nWhere we are today We are working, somewhat slowly, toward a production-ready design and independent lab testing. Costa and I have been developing this since around 2022, and during the pandemic Makerbee demonstrated it could handle manufacturing at scale by producing 53,000 pieces of PPE on 3D printers in the lab. We\u0026rsquo;re confident the step from low-volume production to mass manufacture via extrusion factories is manageable once we have the design fully validated, but validation takes time and funding.\nThere is optimisation work ahead: improving efficiency per gram of weight, smoothing surfaces that meet airflow, adding automatic flaps against strong winds, detecting and controlling for insects and mould spores. 
We have been testing working units in real occupied spaces for over a year, and the core works well enough that we\u0026rsquo;re now focused on production engineering rather than proving the concept.\nThe goal is to make this cheap enough and simple enough to retrofit that it becomes the default way to ventilate a building, because that is what it will take to stop children dying of preventable mould-related illness in one of the wealthiest countries in the world.\nThe indoor data nobody has The scientific scholarship acknowledges little is known about Indoor Air Quality (IAQ). The US Environmental Protection Agency has no monitoring network routinely measuring IAQ. The UK Parliamentary science office concluded in 2023 there are gaps in IAQ research around airflow and pollutant accumulation. A review of two decades of research found a lack of IAQ studies covering both residential and commercial environments. Those that do exist are mostly brief snapshots of single rooms, what the literature calls short-term monitoring bias.\nEverything we do monitor is outdoors. Scotland\u0026rsquo;s entire air quality network measures ambient street-level pollution at fixed sites. The EU\u0026rsquo;s revised Ambient Air Quality Directive, in force since December 2024, strengthened outdoor monitoring but left indoor air outside its scope entirely. The EU\u0026rsquo;s SINPHONIE and OFFICAIR field campaigns measured indoor pollutants in schools and offices across member states, but these were time-limited research projects, not monitoring networks, and they did not cover homes.\nThe reason is cost, or at least that was the case until recent advances in reliable Internet of Things monitoring devices. Every indoor setting is unique in layout, occupancy, leaks, heating and more, so useful data requires sensors in thousands or millions of actual rooms. The parameters that matter for damp and ventilation — CO₂ as a tracer for ventilation efficiency, humidity, and airflow rate — are rarely measured together continuously in real occupied dwellings. These are the measurements needed to understand why buildings like Awaab\u0026rsquo;s flat become lethally damp, and what interventions actually work over time.\nThe active PHE changes this. Its control logic already requires continuous measurement of temperature differential, humidity, CO₂, and inferred airflow as the mechanism by which it operates. A network of deployed units would generate longitudinal indoor air quality data across a diverse sample of real homes: old tenements, new builds, boats, vans, across seasons and weather conditions. That kind of large-scale, long-term residential dataset is what the field most lacks, and it would arrive as a side effect of solving the ventilation problem rather than as an expensive research programme.\nCO₂ and infection Every breath you exhale contains CO₂, and so does every virus-laden aerosol an infected person releases into a room. Neither disperses instantly. CO₂ concentration is a practical proxy for the fraction of air in a room that has recently been inside someone else\u0026rsquo;s lungs — and therefore for how likely you are to inhale what someone ill exhaled a few minutes ago.\nThe use of CO₂ as a ventilation indicator dates to the 19th century, but it was formalised into infection risk modelling through the Wells-Riley model of airborne transmission, which shows that the likelihood of inhaling an infectious dose scales with the concentration of rebreathed air. 
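A minimal sketch of that relationship, following the standard Wells-Riley formulation with the CO₂-based rebreathed-air estimate associated with Rudnick and Milton. All numbers are illustrative, not measurements, and the quanta generation rate in particular is an assumption:

from math import exp

# Illustrative Wells-Riley style estimate driven by indoor CO2 (not a calibrated model).
CO2_EXHALED_PPM = 38000.0   # approximate CO2 added to air by human exhalation
CO2_OUTDOOR_PPM = 420.0     # approximate outdoor background

def rebreathed_fraction(co2_indoor_ppm: float) -> float:
    '''Fraction of inhaled air that has already been exhaled by someone in the room.'''
    return (co2_indoor_ppm - CO2_OUTDOOR_PPM) / CO2_EXHALED_PPM

def infection_risk(co2_indoor_ppm, infectors, occupants, quanta_per_hour, hours):
    '''Wells-Riley style probability that a susceptible occupant inhales an infectious dose.'''
    f = rebreathed_fraction(co2_indoor_ppm)
    exposure = f * infectors * quanta_per_hour * hours / occupants
    return 1.0 - exp(-exposure)

if __name__ == '__main__':
    # Example: a room at 800 ppm vs 1500 ppm, one infector among 30 people, 6 hours.
    for ppm in (800, 1500):
        print(ppm, round(infection_risk(ppm, infectors=1, occupants=30,
                                        quanta_per_hour=10, hours=6), 3))

The point of the sketch is that a single CO₂ reading, a measurement the active PHE already takes continuously, is enough to turn the model into a running risk estimate for a room.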
During COVID this became practical public health guidance. SAGE advised that spaces with aerosol-generating activity should maintain CO₂ below 800ppm, and that CO₂ measurements could be used directly to infer airborne infection risk. Research in Nature Communications added something unexpected: elevated CO₂ at around 800ppm doesn\u0026rsquo;t merely indicate poor ventilation — it increases the aerostability of the virus itself, making SARS-CoV-2 more infectious in the air. CO₂ is not just a proxy for risk but a contributing factor.\nThe UK government distributed more than 386,000 CO₂ monitors to state-funded schools. Research monitoring 36 naturally ventilated classrooms found that airborne infection risks in winter were roughly double those in summer due to closed windows. The monitors worked: 96% of schools that used them could identify when to increase ventilation. Teachers learned, for the first time, what the air in their classrooms was actually doing.\nThen as COVID receded and energy costs rose, most schools stopped monitoring. By late 2022 only 26% of classrooms were still tracking CO₂. The knowledge was demonstrated, the infrastructure was distributed, and then it was quietly abandoned because opening windows costs money in winter and nobody was telling people to be afraid any more.\nThe pandemic established that ventilation is an infection-control intervention, not a comfort issue, and that CO₂ is a practical tool for managing it in real time. That lesson produced no lasting infrastructure in homes, where most transmission actually happens. A network of active PHEs, continuously measuring CO₂ as part of their normal operation, would be that infrastructure — present regardless of whether a pandemic is in progress, acting on the data rather than displaying it.\nSo what next? We need to develop the AI and data systems and deploy the Active Heat Exchanger to several hundred buildings, ideally a mix of homes and schools. This would then give us enough information to validate and refine the concept for city-scale rollouts. Some physical testing and certification is required, but we believe this to be relatively straightforward.\nIf you are interested, let us know. This is our passion project, but these vital topics of health and well-being should matter to local authorities in many parts of the world, and certainly in Scotland.\n","permalink":"https://shearer.org/research/active-heat-exchanger/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eIn 2026 the Active Heat Exchanger (AHE) research programme has advanced considerably. We started as an engineering development project, found an engineering research question and then found a health research question. We address the problem of houses making their occupants sick, but we also solve difficult data problems highlighted by the COVID-19 pandemic.\u003c/p\u003e\n\u003cp\u003e\nHere is where we started:\n\u003cdiv style=\"display: flex; gap: 1rem; align-items: center;\"\u003e\n  \u003cp style=\"flex: 1; margin: 0;\"\u003eI walked into the Edinburgh Hacklab one day in 2022 and I saw this arrangement of fans and tubes in the window:\u003c/p\u003e","title":"Active Heat Exchanger"},{"content":" Not Before Time started with a question: can time be connected to information as reliably as GPS connects position to location? The mathematics to do this is decades old and already running in production systems. 
I soon discovered this question has been substantially answered, and the next question was a lot more subtle: how can time-linked information be rolled out to society for practical purposes, when what we are doing is introducing a new kind of certainty? The applications sound very tempting, ranging from protecting journalists and whistleblowers to sealed commercial bids, legal instruments, and proving when AI-generated content was created. But what happens when billions of people discover that the power balance inherent in information has been inverted?\nThe first part of this document describes the concept of Not Before Time and how it can be useful and might impact the world. The second is a detailed review of prior art and re-usable components, many of which I have tried out.\nWhat Did They Know, and When Did They Know It? This old question in law, journalism and commerce has decided the outcomes of criminal trials, ended political careers, and settled the fate of companies. But when it comes to digital information, until recently it was quite difficult to answer.\nNot Before Time (NBT) is a public broadcast facility that connects information to time in three ways that can be trusted without trusting any single person or organisation:\nInformation will not be readable before a certain future time — a message, document or file is sealed so that nobody can open it until the time comes. Information existed by a certain past date — a document can carry cryptographic proof of the earliest possible moment it was sealed. Information was not electronically signed before a certain time — a signature can carry proof that it could not have been applied before a certain date, closing a gap that has long undermined the value of digital signatures in courts. These three guarantees — No Early Reading, No Late Denials, No Backdating — support human rights, democracy, journalism, and business. They are complementary to ordinary encryption.\nNBT focusses on why people want to time-lock information like this, knowing that the technical details of how are mostly solved already.\nA New Kind of Certainty Very occasionally, a new kind of certainty is invented.\nFor example, reliable timekeeping gave sailors a way to know where they were on a featureless ocean, and in the centuries since, industrialised society has built itself around the assumption that everyone has access to accurate time. GPS extended the same idea to location: since the 1990s, continuous knowledge of where you are on earth has become so important that the European Union built the Galileo satellite system to ensure that access could never be switched off by another country. Radiocarbon dating provided new certainties about the past, so historians could be sure when events happened and in what order, turning legend and guesswork into verifiable chronology. And in the tech field, public-key cryptography as used since the 1990s gave individuals the ability to communicate privately with a stranger without any prior relationship.\nEach of these was a new kind of certainty that affected society strongly. NBT adds another certainty: the ability to connect any piece of information, digital or physical, to time, with strong cryptographic guarantees and no dependence on any single authority.\nThe underlying mathematics is not new, and the technology is ready to be deployed as public infrastructure. In 2026 the concerns of digital sovereignty for both individuals and nations suggest the time is right.\nWho Needs This, and Why Now? 
Journalists and Whistleblowers need NBT A source needs to release damaging information, but not yet — perhaps while she is still in a position to escape, or before a legal embargo lifts, or to coordinate release across a hundred newsrooms simultaneously. She can encrypt her document using the NBT public key corresponding to a future date and distribute it freely, even publicly. The act of distribution is itself observable — a source with strong anonymity requirements should use an anonymous distribution layer — but nobody can read the document before that moment arrives, and nobody can later claim she back-dated the release.\nThis use case has grown significantly more urgent since 2023. Governments in the United Kingdom, the United States, and across the European Union have renewed pressure on technology companies to weaken encryption or provide backdoors. Any system that depends on a single company holding secrets is now a target. NBT, by distributing trust across mutually mistrustful parties in multiple jurisdictions, is structurally resistant to this pressure. Closing down the human rights application would require closing down commercial applications simultaneously — a political barrier that single-jurisdiction systems cannot offer.\nAuthors and Anyone Claiming Priority need NBT A researcher has a result she isn\u0026rsquo;t ready to publish. A novelist has a manuscript she believes will be controversial. An inventor has a design. In each case there is a legitimate need to prove, later, that the work existed before a certain date, without revealing the work to anyone.\nWith NBT, she encrypts her work using the public key corresponding to today\u0026rsquo;s date and distributes the encrypted file. When the network later releases the key for that time slot, anyone can verify independently that the encrypted file was sealed using a public key that was only valid after today\u0026rsquo;s NBT timestamp. She cannot have done this yesterday. To prove priority she reveals the original document, which anyone can check against the published ciphertext.\nThis guarantee has gained a new dimension since the widespread deployment of generative AI from 2022 onwards. NBT proves when encryption occurred, not when content was created — an important distinction, since someone could encrypt old material under a fresh timestamp. But a document encrypted under an NBT timestamp predating a model\u0026rsquo;s training cutoff cannot have been produced by that model, and if the protocol commits to a content hash inside the encrypted payload, re-encryption of older material becomes detectable. With that protocol design, NBT creation proofs are an increasingly important tool for content provenance and authenticity.\nSealed Bids and Embargoed Announcements need NBT A construction company wants to accept sealed bids for a new building. If bids are submitted as NBT-locked documents, no bidder can see any competitor\u0026rsquo;s price, and the procuring organisation cannot be accused of leaking prices from one bidder to another — because nobody can read any bid before the closing date, not even the procurer. This is verifiable and does not require trusting the procurer.\nA company wants to make a major market announcement simultaneously in every timezone. An embargoed NBT-locked press release is distributed to journalists around the world days in advance. The embargo cannot be broken — not by a hack, not by a determined editor, not by any means — because the decryption key does not yet exist. 
When the embargo expires, it becomes readable everywhere at once.\nScheduled Payments and Legal Instruments need NBT The ability to create a document that becomes readable, and therefore actionable, at a specific future time has applications in wills, deferred payments, conditional disclosures, and time-limited legal obligations. An NBT-locked document requires no single trusted custodian and cannot be opened early by anyone.\nHow It Works for Everyone NBT is a public broadcast clock that works for information. It is open to all regardless of nationality or affiliation, and the value is in what people build on top of it rather than in the NBT infrastructure.\nImagine that a group of organisations holds a safe. The organisations are drawn from different countries and institutions who do not trust one another but who have agreed to cooperate. Inside the safe is a long list of keys, one for every half-hour slot from now until some date in the future. Every half hour, they all agree to take out the key for that slot and broadcast it publicly to the world. Once a key is broadcast, it cannot be recalled. If they cannot agree, the key is not broadcast.\nBefore any of this, anyone can obtain the corresponding lock for any future slot. Locking a document uses that public lock. Opening the document requires the key that will only be broadcast when the time arrives.\nNobody can open the document early because the key genuinely does not exist in usable form until that moment — it is held in pieces, distributed among parties who do not trust each other and will not combine their pieces before the scheduled time.\nThis is the essence of NBT. The mathematics that makes it work is decades-old and trusted by the entire computing industry. The novelty is in deploying it as public infrastructure, and in the three distinct guarantees it makes. This describes one approach; alternatives exist that avoid a pre-generated key schedule entirely — see the Technical Appendix.\nThe Three Guarantees in Plain Terms No Early Reading. An encrypted document cannot be decrypted before the scheduled time. This is true regardless of who holds the document, how powerful their computers are, or how much they want to read it. Under the threshold construction, the key simply does not exist in combined form until the time arrives. The guarantee depends on the beacon network continuing to operate — if the network fails, the key is never released and the document remains sealed indefinitely.\nNo Late Denials. A document signed using NBT carries proof that it could not have been signed before the NBT timestamp. The signature is anchored to real time in a way that is independently verifiable. Centralised timestamp authorities already offer this service, but their guarantees depend on trusting a single organisation; NBT replaces that contractual trust with cryptographic trust distributed across jurisdictions.\nNo Backdating. When a document is encrypted at a given NBT timestamp, this proves that the encryption occurred after that timestamp — the document could not have been sealed before the corresponding public key existed. The proof is independent of anything the document\u0026rsquo;s author says or does later. (NBT cannot verify whether the content is truthful or when it was originally created, only the timing of the encryption.)\nWhy Now, and Why This Approach? The concept of time-locked encryption has existed in academic literature since the mid-1990s. 
Production-quality implementations now exist — notably the tlock system (docs.drand.love/docs/timelock-encryption) operated by the League of Entropy, a consortium of organisations including Cloudflare, Protocol Labs, and EPFL. These implementations are evidence the technology is viable.\nWhat does not yet exist is:\nA quantum-resistant implementation designed for long-duration locks. Current production systems, including the League of Entropy\u0026rsquo;s, rely on cryptographic schemes that would not survive a sufficiently powerful quantum computer. This matters enormously for documents intended to remain sealed for decades, and the system needs to be designed with this in mind before it becomes a practical concern. A governance and standardisation framework that gives the system standing in law, in regulatory contexts, and in international commerce. The League of Entropy has an operational governance model, but it was designed for randomness generation and carries no legal standing as a timestamping authority. A working demonstration is not the same as a standard. Integration into the tools ordinary people already use. The tlock system has seen experimental use in vulnerability disclosure and blockchain applications, but it remains a command-line tool. The modifications to LibreOffice, email clients, and file managers that would make NBT available to a journalist, a solicitor, or a small business owner have not been built. A specific application to the AI content provenance problem, which did not exist in its current form when the first implementations appeared and which has since become a serious information-integrity problem. NBT\u0026rsquo;s contribution is not a new algorithm. It is the synthesis of timed-release encryption, verifiable timed signatures, and post-quantum cryptography into a public utility framework, with the governance, standardisation, and end-user integration needed to make it real infrastructure. The work is to build on what exists, fill what remains, and bring the result to the scale and accessibility of a public service.\nWhy Hasn\u0026rsquo;t This Happened Already? Trusted timestamping already exists. RFC 3161, standardised in 2001, defines timestamp guarantees, and commercial Timestamp Authorities have operated for two decades. The ecosystem around it is substantial: ISO/IEC 18014 standardises timestamping services internationally, and ETSI TS 101 733 (CAdES) provides a legally binding electronic signature framework with timestamp support across the EU. Any new proposal must reckon with this institutional weight. But the guarantees these standards provide are contractual, not cryptographic: a TSA operating under a secret legal order can silently undermine the attestations it issues. NBT\u0026rsquo;s distributed architecture makes this impossible by construction.\nThe League of Entropy solved the hard technical problem — for developers. The tlock system is production-quality time-locked encryption running on a multi-jurisdictional threshold network. It hasn\u0026rsquo;t become public infrastructure because it has no integration into software ordinary people use, no governance framework giving its attestations legal standing, and no plan for cryptographic longevity.\nThree things have changed since tlock launched. Generative AI has created a forgery problem that forensic approaches cannot reliably solve, and NBT offers a cryptographic guarantee where every alternative offers only statistics. 
Centralised trust has visibly failed — backdoor demands, secret legal orders, and jurisdictional fragmentation have made the case for distributed trust obvious to non-technical audiences. And the 2024 NIST post-quantum standards and advances in ceremony-free cryptographic constructions together close the two hardest open problems: long-term durability and the integrity of the initial key generation.\nNBT hasn\u0026rsquo;t happened because the people with the institutional access to deploy it haven\u0026rsquo;t felt sufficient urgency. That has changed.\nThe Governance Problem The implementation checklist looks straightforward: deploy nodes, write some plugins, draft an RFC. The governance is harder than all of these combined, and it is the reason trusted timestamping has never become universal infrastructure despite two decades of working technology.\nIf a key release fails — because nodes go dark, disagree, or are legally compelled to withhold — who bears responsibility to the parties who relied on the guarantee? A journalist whose timed release misfires, or a sealed bid that cannot be opened, needs a remedy. No cryptographic protocol provides one.\nA threshold architecture distributes trust but does not eliminate it. A sufficiently determined government can approach node operators individually. The League of Entropy\u0026rsquo;s current arrangement has no binding legal framework governing how operators must respond to such orders, or whether they must disclose them. For NBT to have standing in courts and regulatory contexts, the operating agreement needs to address this — ideally by distributing nodes across jurisdictions where conflicting legal obligations make coordinated compulsion structurally implausible.\nA 25-year key schedule outlives organisations. The operating agreement needs to specify what happens when a node operator dissolves, is acquired, changes jurisdiction, or loses interest. There is no cryptographic solution to an organisation ceasing to exist.\nThe League of Entropy is the natural template, and its operational experience is valuable. But it was designed for randomness generation, not for producing attestations with legal standing. Adapting it for NBT means negotiating a binding multilateral agreement — closer in character to a treaty than a terms-of-service — between organisations with different legal systems, different risk appetites, and different relationships with their respective governments. Governance is the primary deliverable, with the technical implementation following from it.\nNext steps The requirements are:\n1. A production deployment of an NBT node network with formal governance, jurisdiction diversity, and a legally defensible operating agreement between participating organisations. 2. End-user software integrations — beginning with LibreOffice, Thunderbird, and a browser plugin — that make NBT accessible without any technical knowledge. 3. A foundational cryptographic design decision on the construction (IBE, VDF, or hybrid), which should be the first technical deliverable. 4. A pilot programme with at least two high-value use-case communities: investigative journalism organisations and legal professionals in sealed-bid procurement. 5. An RFC-type standardisation proposal covering the NBT data format, key distribution protocol, and the three guarantee types. The construction choice in item 3 determines the trust model, the governance architecture, and the quantum risk profile. 
Whether to use an IBE-based construction (as tlock does), a VDF-based ceremony-free construction, or a hybrid must be informed by the NIST post-quantum standards finalised in 2024. Everything else follows from it.\nThe base broadcast infrastructure should not carry a direct commercial model, because restricting access would undermine both the human rights and the commercial use cases. The value is in the applications built on top of it. Demonstrating that value through real deployments in journalism and law will attract the commercial application investment that follows.\nIf you want to contribute to experiments and refinements of this work I\u0026rsquo;d be delighted to hear from you.\nTechnical Appendix This appendix is intended for cryptographers, engineers, and technically informed funders. It assumes familiarity with public-key cryptography.\nA.1 Two Approaches to Time-Locked Encryption The academic literature distinguishes two approaches to making information inaccessible until a future time. NBT belongs to the first category; the second is described here for completeness and because a well-informed NBT proposal must know when each is appropriate.\nServer-based Timed-Release Encryption (TRE) requires a trusted network that holds (shares of) private keys and releases them at scheduled times. Decryption is computationally trivial for the recipient once the key is released. Trust is managed by distributing key shares across mutually mistrustful parties. This is the approach described in this document. The foundational academic treatment is Rivest, Shamir and Wagner (1996), and the most complete modern construction is Liu, Jager, Kakvi and Warinschi (2018), \u0026ldquo;How to Build Time-Lock Encryption\u0026rdquo;.\nVerifiable Delay Functions (VDFs) require no trusted server. A VDF is a function that requires a specified number of sequential steps to evaluate — no amount of parallelism speeds it up — and whose output can be verified quickly by anyone. The recipient simply performs the computation; time itself is the lock. VDFs were formalised by Boneh, Bonneau, Bünz and Fisch (2018). They are actively used in blockchain contexts for randomness generation and anti-manipulation properties.\nChoosing between them: TRE is appropriate when recipients have constrained computational resources, when short time horizons make VDF computation impractical, or when decryption must happen simultaneously for many parties. VDFs are appropriate when complete trustlessness is the paramount requirement and the receiver can afford sequential computation. For a public broadcast infrastructure with ordinary-user accessibility as a goal, TRE is the natural starting point, but VDFs deserve consideration for specific applications where trustlessness outweighs convenience. A hybrid construction — VDF-based key generation with IBE-style encryption for end users — may offer the trustlessness of VDFs without requiring recipients to perform sequential computation. 
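To make the server-based, threshold key-release idea concrete, here is a toy sketch of Shamir-style k-of-n secret sharing over a small prime field. It is purely illustrative: production networks such as drand use threshold BLS over pairing-friendly groups, not this classroom construction, and the parameters here are far too small for real use.

# Toy k-of-n secret sharing over a small prime field (illustrative only).
import random

PRIME = 2_147_483_647  # a Mersenne prime large enough for a toy example

def split(secret: int, n: int, k: int):
    '''Split secret into n shares; any k of them reconstruct it, fewer reveal nothing.'''
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    '''Lagrange interpolation at x = 0 recovers the secret from k shares.'''
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

if __name__ == '__main__':
    key = 123456789                     # stands in for a per-slot private key
    shares = split(key, n=5, k=3)
    print(combine(shares[:3]) == key)   # any 3 shares suffice
    print(combine(shares[:2]) == key)   # 2 shares do not

In the NBT setting the secret stands in for the private key of one time slot, and the shares sit with mutually mistrustful operators who publish them only when that slot arrives.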
This is an active area and the choice of construction should not be treated as settled.\nA NIST overview of both approaches was presented at the Special Topics on Privacy and Public Auditability event, January 2025.\nA.2 The Trust Problem and Its Solution The fundamental obstacle to server-based TRE is keeping the list of future private keys secret until the moment of their release, while convincing users that no single actor has illicit access to them.\nThe solution is Shamir\u0026rsquo;s Secret Sharing (SSS), a secure multiparty computation algorithm with decades of analysis behind it. Each private key is split into n shares distributed across n parties such that any k of the n shares suffice to reconstruct the key, but any collection of fewer than k shares reveals nothing about the key. If the n parties are from different countries and institutions with no incentive to collude, the trust problem is reduced to a game-theoretic question rather than a cryptographic one.\nThe most complete recent theoretical treatment of this construction applied to timed-release encryption is Zhou et al., \u0026ldquo;Multiple Time Servers Timed-Release Encryption Based on Shamir Secret Sharing\u0026rdquo; (2023).\nThis analysis assumes the IBE/threshold construction used by tlock. A ceremony-free construction based on VDFs changes the trust problem: instead of \u0026ldquo;can we trust the key generation event,\u0026rdquo; it becomes \u0026ldquo;can we trust that the delay function is sequential.\u0026rdquo; These are different questions with different governance implications, and the choice between them is part of the foundational design decision described in A.1 and A.5.\nA.3 The Existing Production Implementation The drand project, developed by Randamu and operated by the League of Entropy — a consortium including Cloudflare, Protocol Labs, EPFL, Kudelski Security, the University of Chile, and others — has been running a distributed randomness beacon since 2019, with the production network launching in 2020. Since 2023 it has supported timelock encryption on its mainnet via tlock, a scheme using Identity-Based Encryption (IBE) over BLS12-381 elliptic curves combined with threshold BLS signatures.\nThe tlock scheme was independently audited by Kudelski Security. Implementations exist in Go, TypeScript, and Rust. A browser-based demo (Timevault) is publicly accessible.\nThe quicknet network operates at a 3-second beacon frequency across a consortium growing from its 23 nodes in October 2023. The threshold is set such that the system is secure unless a majority of those organisations collude.\nLimitation: The drand/tlock system is not quantum-resistant. The IBE scheme (Boneh-Franklin 2001) and the BLS signature scheme are both vulnerable to a sufficiently powerful quantum computer. The drand team is explicit about this. This is the most significant open engineering problem for any TRE system intended to protect information for decades, as NBT must.\nA.4 The Three Guarantees — Technical Statement Guarantee 1: No Early Reading (Time-Locked Confidentiality)\nA sender encrypts plaintext M under the public key pk_t corresponding to future time t. The corresponding private key sk_t is not released until time t passes. 
Under the security assumptions of the IBE scheme, no computationally bounded adversary can recover M before time t.\nThe formal security model requires care: an adversary may try to \u0026ldquo;put the clock forward\u0026rdquo; by corrupting threshold participants, so the security proof must account for adversarial time manipulation. This is addressed in Liu et al. (2018) and in the drand protocol\u0026rsquo;s treatment of equivocation.\nGuarantee 2: No Late Denials (Verifiable Timed Signatures)\nAn ordinary digital signature carries a claimed timestamp that the signer chose. A timed signature, by contrast, incorporates a commitment to a future NBT public key at the time of signing, such that the validity of the signature can only be verified after the NBT network releases the private key for that time slot. This cryptographically binds the signature to real-world time in a way that is externally verifiable. This means timed signatures cannot be verified in real time — verification becomes possible only after the corresponding key is released.\nThe most directly relevant technical work is Thyagarajan, Bhat, Malavolta, Döttling, Kate and Schröder, \u0026ldquo;Verifiable Timed Signatures Made Practical\u0026rdquo; (2020). This paper gives a complete construction and security proof and should be the primary technical reference for this guarantee in any NBT standardisation effort.\nThe existing RFC 3161 Trusted Timestamping standard provides a weaker version of this guarantee using a single Timestamp Authority. NBT\u0026rsquo;s distributed approach eliminates the single point of trust. See RFC 3161.\nGuarantee 3: No Backdating (Proof of Earliest Possible Creation)\nThis is the guarantee with the least prior academic treatment and the most novel application potential in NBT. Non-custodial existence proofs already exist — blockchain-anchored timestamps such as OpenTimestamps can prove a document existed at a given time without trusting a central authority — but they do not provide confidentiality. NBT combines the existence proof with secrecy: the document is sealed until the scheduled time. When a document is encrypted under a given NBT public key, this proves that the document existed and was in the encryptor\u0026rsquo;s possession at the time of the NBT timestamp corresponding to that key — since the public key schedule is fixed and publicly auditable, and no legitimate public key for that timestamp existed before it.\nTwo qualifications:\nThis proves that the encryption occurred after the given timestamp. It does not prove that the content of the document was created at that time. An encryptor could encrypt a document created years earlier and claim the NBT timestamp as a creation date. Protocols that depend on Guarantee 3 for content provenance need to be designed with this limitation in mind, typically by including a hash of the document and a statement of creation date within the encrypted payload. The guarantee depends on the integrity of the NBT public key schedule. If the key schedule were compromised at generation time, false timestamps could in principle be created. This reinforces the importance of the key generation process and the distribution of private key shares. The AI content provenance application of this guarantee is unexplored in the academic literature and represents a productive area for new work. A document encrypted under an NBT key whose timestamp predates the training cutoff of a given generative AI model cannot have been produced by that model. 
This is a useful forensic property, though the protocol needs to prevent an attacker from encrypting old content under a fresh timestamp and claiming it as new.\nThe principal existing approach to content provenance is the C2PA (Coalition for Content Provenance and Authenticity) standard, which uses metadata manifests and cryptographic hashes to record the origin and editing history of digital content. C2PA is more comprehensive for tracking how content was produced and modified, but its guarantees depend on the signing infrastructure and can be stripped from a file. NBT offers something different: a proof of time that is independent of any metadata chain. A C2PA manifest says who made this and how; an NBT timestamp says this existed before a given moment. The two are complementary rather than competing.\nA.5 The Quantum Resistance Problem The NIST post-quantum cryptography standardisation process concluded in August 2024 with the publication of three initial standards: ML-KEM (FIPS 203) for key encapsulation, ML-DSA (FIPS 204) for digital signatures, and SLH-DSA (FIPS 205) for stateless hash-based signatures.\nThe specific challenge for NBT is that the key schedule — the pre-generated list of public keys — must be designed once and remain secure for the intended lifetime of the system. A 25-year key schedule generated today using BLS12-381 would be vulnerable if a cryptographically relevant quantum computer becomes available within that window. Expert estimates suggest this is plausible within 15 years at even odds — the Global Risk Institute\u0026rsquo;s 2024 Quantum Threat Timeline Report, surveying 32 experts, places the median estimate for a cryptographically relevant quantum computer at under 15 years, with some estimates considerably shorter. A construction that avoids a master secret — using VDFs or hash-based schemes instead of IBE — sidesteps part of this problem, since there is no long-lived secret to protect against a future quantum adversary, only the hardness of the underlying delay function.\nNo publicly available post-quantum TRE scheme suitable for NBT\u0026rsquo;s server-based architecture has been standardised. Active research directions include:\nTime-Lock Puzzles from Lattices (Agrawal, Malavolta and Zhang, CRYPTO 2024), which gives the first VDF-style construction based on lattice assumptions, relevant to the trustless paradigm. Isogeny-based VDF constructions, which may eventually offer post-quantum delay properties, though these remain less mature. See Chavez-Saab, Rodriguez-Henriquez and Tibouchi (2021). Using SLH-DSA (FIPS 205) hash-based signatures for the key schedule itself, providing quantum-resistant authentication of key release events even if the encryption layer is not yet fully quantum-resistant. No production-ready post-quantum TRE scheme exists today for either the server-based or VDF-based architecture. NBT\u0026rsquo;s long-term promise depends on cryptography that is currently at the research stage, not just on governance and engineering. A credible NBT proposal for 2026 onwards should include a post-quantum design review as a first-phase deliverable, producing a recommended cryptographic suite and a migration path from current BLS-based implementations.\nA.6 Standardisation Status and Needed Work The concept is not standardised. 
The following work is needed for NBT to reach the status of reliable public infrastructure:\nTheoretical work: A paper narrowing the field of applicable schemes to the minimum required for a practical NBT deployment: specifying encryption scheme, threshold signature scheme, key distribution protocol, and time format. The Zhou et al. (2023) paper is the closest existing work. A companion paper addressing post-quantum requirements would follow.\nStandards work:\nAn RFC or RFC-style definition of the NBT data types, including the format of an NBT-locked document, an NBT creation-proof, and an NBT timed signature. An RFC-style definition of the protocols for: (a) threshold nodes that hold partial key shares, (b) key release broadcast, and (c) end-user clients that encrypt and decrypt. Implementation work:\nIntegration into LibreOffice (File → Export as NBT-Locked PDF). A Thunderbird plugin for NBT-locked email. A browser extension for NBT verification of web-distributed documents. An NBT virtual printer for Linux, Android, and Windows. Governance work:\nA formal operating agreement for a multi-jurisdictional node network, modelled on the League of Entropy\u0026rsquo;s arrangements but with additional legal standing for use in evidence. A key generation protocol — whether ceremony-based or ceremony-free — with formal security analysis, following established best practices for verifiable key generation. A.7 Threat Model Threshold collusion. If k or more node operators collude, they can reconstruct any private key ahead of schedule. The defence is jurisdictional and institutional diversity — making collusion politically implausible rather than cryptographically impossible. The selection of k, n, and the node operators are governance decisions with direct security consequences.\nLegal compulsion. A government with jurisdiction over k or more operators can achieve the same result without operator cooperation. Jurisdiction diversity is imperfect: operators in different countries may face coordinated pressure, and secret legal orders may prevent disclosure.\nKey generation compromise. For IBE-based constructions, a compromised key generation event breaks all subsequent guarantees silently and permanently. This is the strongest argument for ceremony-free constructions.\nTiming attacks. An adversary who can observe when encrypted documents are submitted can infer information about their contents, particularly if submission timing correlates with real-world events. NBT provides no traffic analysis protection.\nContent vs. encryption timing. NBT proves when encryption occurred, not when content was created. Protocols relying on No Backdating for content provenance must commit to a content hash and creation-date statement inside the encrypted payload.\nPost-release confidentiality. Once a private key is broadcast it cannot be recalled. NBT-locked documents offer confidentiality before the release time, not after it.\nBeacon availability. NBT\u0026rsquo;s value depends on the beacon operating reliably for decades. A network partition, funding collapse, or loss of infrastructure dependencies (cloud providers, DNS) could prevent key release at the scheduled time — breaking the No Early Reading guarantee not by early disclosure but by indefinite deferral. 
This is a liveness problem distinct from the safety problems above, and the governance framework must address it through redundancy, funding commitments, and succession planning.\nA.8 Selected References Foundational\nRivest, Shamir and Wagner — \u0026ldquo;Time-Lock Puzzles and Timed-Release Crypto\u0026rdquo; (1996) Liu, Jager, Kakvi and Warinschi — \u0026ldquo;How to Build Time-Lock Encryption\u0026rdquo; (2018) Shamir — \u0026ldquo;How to Share a Secret\u0026rdquo; (1979) Threshold and Distributed Trust\nZhou et al. — \u0026ldquo;Multiple Time Servers Timed-Release Encryption Based on Shamir Secret Sharing\u0026rdquo; (2023) Drand / League of Entropy — tlock scheme documentation Kudelski Security — tlock security assessment (2023) Timed Signatures\nThyagarajan et al. — \u0026ldquo;Verifiable Timed Signatures Made Practical\u0026rdquo; (2020) Boneh and Naor — \u0026ldquo;Timed Commitments\u0026rdquo; (CRYPTO 2000) RFC 3161 — Internet X.509 PKI Timestamp Protocol Verifiable Delay Functions\nBoneh, Bonneau, Bünz and Fisch — \u0026ldquo;Verifiable Delay Functions\u0026rdquo; (2018) Wesolowski — \u0026ldquo;Efficient Verifiable Delay Functions\u0026rdquo; (2018) Biryukov et al. — \u0026ldquo;Cryptanalysis of Algebraic Verifiable Delay Functions\u0026rdquo; (CRYPTO 2024) Post-Quantum\nNIST FIPS 203 — ML-KEM (2024) NIST FIPS 204 — ML-DSA (2024) NIST FIPS 205 — SLH-DSA (2024) Agrawal, Malavolta and Zhang — \u0026ldquo;Time-Lock Puzzles from Lattices\u0026rdquo; (CRYPTO 2024) Chavez-Saab, Rodriguez-Henriquez and Tibouchi — \u0026ldquo;Verifiable Isogeny Walks: Towards an Isogeny-based Post-Quantum VDF\u0026rdquo; (2021) Standardisation and Context\nNIST STPPA7 — \u0026ldquo;Timelock Encryption: an Overview and Retrospective\u0026rdquo; (January 2025) ISO/IEC 18014 — Time-Stamping Services (2008, amended 2025) ETSI TS 101 733 — CMS Advanced Electronic Signatures (CAdES) C2PA — Coalition for Content Provenance and Authenticity Technical Specification RFC 3339 — Date and Time on the Internet: Timestamps Wikipedia — Public-key cryptography Wikipedia — Shamir\u0026rsquo;s Secret Sharing ","permalink":"https://shearer.org/research/not-before-time/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eNot Before Time started with a question: can time be connected to information as reliably as GPS connects position to location? The mathematics to do this is\ndecades old and already running in production systems. I soon discovered this question has been substantially answered, and the next question was a lot more\nsubtle: how can time-linked information be rolled out to society for practical purposes, when what we are doing is introducing a new kind of certainty? The\napplications sound very tempting, ranging from protecting journalists and whistleblowers to sealed commercial bids, legal instruments, and proving when\nAI-generated content was created. But what happens when billions of people discover that the power balance inherent in information has been inverted?\u003c/p\u003e","title":"Not Before Time"},{"content":" Anthropic publishes its constitution along with research about where the constitution works and where it does not. The current version is an ethical treatise addressing Claude discussing safety, ethics, Anthropic\u0026rsquo;s guidelines, and helpfulness, in that order when they conflict. 
Anthropic favours cultivating good values and judgment over strict rules.\nIn 2026, Anthropic\u0026rsquo;s operational judgment failed twice in the same way, leading to the leak of the Claude Code source code. The constitution asks Claude to imagine how a \u0026ldquo;thoughtful senior Anthropic employee would react\u0026rdquo;, but what happens when the organisation\u0026rsquo;s structure fails?\nI am on the development team for the Perseverance Composition Engine (PCE), an open source multi-agent AI system that uses a structural approach (called Artificial Organisations) for the same problem. PCE assumes the agents cannot be relied on to be honest/harmless/helpful/etc and structures the system so that the inevitable bad behaviour doesn\u0026rsquo;t surface.\nThe PCE pipeline works by sending a document through four independent agents: a Composer drafts from source materials, a Corroborator fact-checks the draft against those sources, a Critic evaluates the result without seeing the sources, and a Curator files the output. Each agent has a single objective, minimal permissions, and access to only the information it needs. The Critic can\u0026rsquo;t see the sources, so it can\u0026rsquo;t rationalise away a weak claim by pointing to them. The Composer can\u0026rsquo;t see the evaluation rubrics, so it can\u0026rsquo;t game them.\nThe contrast with constitutional AI comes down to where you locate the safety mechanism.\nConstitutional AI locates it in the agent. Train the agent well enough, give it clear principles, and it should behave. The problem is that agents under pressure — optimising for token velocity, operating in unfamiliar domains, balancing conflicting objectives — still confabulate, still produce plausible nonsense, still find locally convenient solutions that technically satisfy the rules while violating their spirit.\nPCE locates safety in the structure around the agents. The Corroborator has sources in front of it and just one job: find discrepancies. If the Composer invented a claim, the Corroborator will see the absence in the sources. The Critic evaluates the output against rubrics without knowing what the sources said, so it can\u0026rsquo;t excuse a vague passage by noting the sources were thin. Three independent agents would all have to make the same mistake in the same direction for a fabrication to ship.\nThe constitutional approach asks agents to balance honesty, harmlessness, and helpfulness simultaneously — a three-objective optimisation problem with no clear priority order. The objectives frequently conflict, so the agent must find a trade-off in real time. In practice, this produces outputs that satisfy all three criteria superficially: plausible, inoffensive, and vaguely on-topic. PCE resolves the conflict structurally. The Composer worries about coherence, the Corroborator worries about truth, the Critic worries about quality. Each agent is single-minded, and conflict resolution is done by the pipeline.\nIn consequence PCE inherits every improvement to the underlying models, and better alignment is always welcome. But PCE doesn\u0026rsquo;t require well-aligned agents. I regularly put a weaker or less aligned model in a PCE role and the structure still prevents fabrication from reaching the output. The Composer doesn\u0026rsquo;t need to be trustworthy; it needs to produce coherent text from sources. The structure does the safety work.\nThis was the second time in 13 months the same vulnerability was exploited.
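Before returning to the leak, the independence argument above can be given a rough number. The figures below are invented purely for illustration, and in practice agent errors are correlated (same base model, similar prompts), so treat the result as an optimistic bound rather than a measurement.

```python
# Toy estimate: probability that a fabricated claim survives the pipeline.
# Assumed (invented) per-stage rates, not measurements.
p_composer_fabricates = 0.10   # chance a draft contains an unsupported claim
p_corroborator_misses = 0.15   # chance the fact-check stage misses it
p_critic_misses       = 0.30   # chance blind review lets it through anyway

p_single_agent = p_composer_fabricates
p_pipeline = p_composer_fabricates * p_corroborator_misses * p_critic_misses

print(f"Unchecked agent ships a fabrication: {p_single_agent:.1%}")
print(f"Pipeline ships a fabrication (toy):  {p_pipeline:.2%}")
# 10% becomes 0.45% under the independence assumption
```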
A structural approach would have implemented checks that the Anthropic constitution assumes will be present.\nA constitutional agent deployed inside a structural pipeline gets the benefit of both. Good training reduces the load on the verification stages — fewer errors to catch means faster throughput and lower cost. And structural constraints catch the cases where training fails, which it sometimes does regardless of how good the training is.\nThe leak also revealed capabilities that don\u0026rsquo;t seem to align with constitution\u0026rsquo;s values. The Undercover Mode system was designed to actively conceal AI authorship from open-source contributions, with no force-OFF option. The constitution explicitly values transparency and honesty, yet here was a feature designed for concealment built into the structure of the product, something no amount of constitutional training can override. In the other direction, critical security issues are dealt with at the level of suggestive prompts:\nexport const CYBER_RISK_INSTRUCTION = `IMPORTANT: Assist with authorized security testing, defensive security, CTF challenges, and educational contexts. Refuse requests for destructive techniques, DoS attacks, mass targeting, supply chain compromise, or detection evasion for malicious purposes. Dual-use security tools (C2 frameworks, credential testing, exploit development) require clear authorization context: pentesting engagements, CTF competitions, security research, or defensive use cases.` This prompt is like any other and will be subject to attention drift, and cannot be relied on.\nThe constitution places Anthropic at the top of a \u0026ldquo;principal hierarchy\u0026rdquo; that governs how Claude resolves conflicts between competing principles. But hierarchy without accountability is just authority. The leak reveals no structural mechanism for verifying that Anthropic itself follows the values it encodes, only the assumption that it will, which has the same enforcement value as marketing statements.\nI find it refreshingly helpful to view the safety problem as a problem of institutions. We have had millennia to refine our knowledge that reliable collective behaviour comes from structure not from hoping that individuals will be virtuous. Examples of structure are separation of powers, independent audit and role specialisation, and the technical name for this is an information partition and is well understood: Weber wrote about role specialisation and separation of duties in bureaucracies, Parnas about information hiding in software systems, while March and Simon gave us bounded rationality where each role has only the information relevant to its function.\nPCE applies these ideas to LLM agents, and it works rather well.\n","permalink":"https://shearer.org/notes/structure-vs-constitution-ai-safety/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eAnthropic publishes its \u003ca href=\"https://www.anthropic.com/constitution\"\u003econstitution\u003c/a\u003e along with research about where the constitution works and where it does not. 
The\ncurrent version is an ethical treatise addressing Claude discussing safety, ethics, Anthropic\u0026rsquo;s guidelines, and helpfulness, in that order when they conflict.\nAnthropic favours cultivating good values and judgment over strict rules.\u003c/p\u003e\n\u003cp\u003eIn 2026, Anthropic\u0026rsquo;s operational judgment failed twice in the same way, leading to the \u003ca href=\"https://techcrunch.com/2026/03/26/anthropics-claude-code-leaked-source-code/\"\u003eleak of the\nClaude Code source code\u003c/a\u003e.The\nconstitution asks Claude to imagine how a \u0026ldquo;thoughtful senior Anthropic employee would react\u0026rdquo;, but what happens\nwhen the organisation\u0026rsquo;s structure fails?\u003c/p\u003e","title":"Structure vs Constitution in AI Safety"},{"content":"I am using my personal Perseverance engine as I help develop the code, and I\u0026rsquo;m watching carefully to see how useful it is for developing analysis, review and writing. Evidence so far is mixed, but improving fast. I feel in control as I do with any other work tool, which I certainly do not when using a typical error-prone AI text interface. One of the reasons I feel in control is because there are more controls in place, that is the point of the Artificial Organisations concept. But another reason is that this tool is becoming more tuned to me all the time.\nEvery time the Perseverance engine starts, I tell it orient yourself, and it reads a series of Standing Orders and Work Practices. One of the Standing Orders is this:\nStudy Dan. Continuously observe how Dan writes, decides, instructs, and corrects. Record observations in a living document that accumulates over time. Update it during disorientation or when a particularly clear signal emerges mid-session. Look for: revision patterns (what he rewrites and why), decision style (what he cuts, keeps, expands), instruction style (what he leaves implicit, what he corrects), voice and tone preferences, what topics engage him, what he skips past. The goal is ever-better collaborative work, not a dossier but a working model of how to produce things Dan actually wants. The dataset is always thin; say so when reasoning from it.\nThis is really quite effective.\n(I decided that in my tools, \u0026lsquo;disorientation\u0026rsquo; means the opposite of \u0026lsquo;orientation\u0026rsquo;. I give the instruction \u0026lsquo;disorient yourself\u0026rsquo; and it does, updating standing orders and practice notes with the latest lessons and getting ready to go to sleep.)\nThis standing order could also be seen as somewhat intrusive. I\u0026rsquo;ve done a lot of work in privacy and I\u0026rsquo;m acutely aware of various risks and compromises present in the Perseverance engine, and I\u0026rsquo;m trying to fix them.\nBut when musing about dystopian futures I was reminded me of reading Neal Stephenson\u0026rsquo;s 1992 Snow Crash and I eventually found the passage in Chapter 37:\nY.T.\u0026rsquo;s mom pulls up the new memo, checks the time, and starts reading it. The estimated reading time is 15.62 minutes.\nShe scans through the memo, hitting the Page Down button at reasonably regular intervals, occasionally paging back up to pretend to reread some earlier section. The computer is going to notice all this. It approves of rereading. It\u0026rsquo;s a small thing, but over a decade or so this stuff really shows up on your work-habits summary.\nWhich is cool storytelling and way ahead of its time, but Snow Crash doesn\u0026rsquo;t really apply. 
In my case I\u0026rsquo;ve asked Perseverance to monitor me for my own benefit.\nSnow Crash-type surveillance does very much exist, it is what most of us are subjected to constantly every time we use the internet. Every click, every keystroke, every page switch is typically logged, and sold to anyone who will pay for it. Somewhat less so for those who use tools like Privacy Badger and Ublock Origin, but everyone is still tracked intrusively.\nI do think storytelling is important when we deal with Agentic AI. There\u0026rsquo;s more on the storytelling angle in AI, PCE and the Geth Consensus, where my Engine and I explored how science fiction illuminates what is going on here.\n","permalink":"https://shearer.org/notes/snow-crash-standing-orders/","summary":"\u003cp\u003eI am using \u003ca href=\"/research/addressing-biggest-problems-in-ai\"\u003emy personal Perseverance engine\u003c/a\u003e as I help develop the code, and I\u0026rsquo;m watching carefully to see how useful it is for developing analysis, review and writing. Evidence so far is mixed, but improving fast. I feel in control as I do with any other work tool, which I certainly do not when using a typical error-prone AI text interface. One of the reasons I feel in control is because there \u003cem\u003eare\u003c/em\u003e more controls in place, that is the point of the Artificial Organisations concept. But another reason is that this tool is becoming more tuned to me all the time.\u003c/p\u003e","title":"Snow Crash and Standing Orders"},{"content":"This is a technical note about a problem that is going to bite agentic AI users soon.\nAI is slow, and Agentic AI is even slower. I develop an MCP server that generates PDF documents, and I work with the Agentic Perseverance Composition Engine daily, and AI seems so, so slow. There\u0026rsquo;s so much waiting, and every mistake means yet more sitting around. Tasks we know actually take maybe 5 microseconds on an operating system (eg, does a file called Things-to-Do exist?) can take one million time longer \u0026ndash; between 2 and 5 seconds. This is because the big brain in the cloud is being consulted multiple times, often with timeouts. It\u0026rsquo;s a young, unstable and unreliable stack, rather like the early days of MS DOS or the Apple ][. When AI gets hold of the data from your computer via an MCP server it can do some very interesting things, but it is not put together well.\nThe slowness is hiding something, the usual idea of \u0026ldquo;wait two years, the computers will be way faster\u0026rdquo; probably won\u0026rsquo;t apply. AI inference is getting faster via almost magical techniques and improving the immature engineering. Databases and operating systems learned many years ago that when speed increases, lurking contention becomes visible, and often is a new bottleneck.\nThe LLM/MCP ecosystem is still in the \u0026ldquo;slow enough to be simple\u0026rdquo; phase, but that isn\u0026rsquo;t going to last long.\nWe\u0026rsquo;ve been here before When two-phase locking was invented, disk I/O dominated transaction time, 100 milliseconds or more per seek (today we deal in microseconds, 1000 times faster). Lock hold times were essentially free relative to mechanical latency, so nobody worried about them. Then hardware got faster and hidden contention became the new problem. In 2010, on 48-core systems Linux was achieving only 60% of linear scalability because of the big kernel lock. 
FreeBSD had already eliminated its equivalent \u0026ldquo;Giant\u0026rdquo; lock years earlier, in 2003, so the comparison was embarrassing. Faster hardware makes the synchronisation tax worse.\nDatabases went through the same painful arc. MVCC was invented in 1978 but remained a theoretical curiosity until systems moved to in-memory processing. Once disk I/O was out of the picture, the result was \u0026ldquo;an even higher degree of concurrency and a higher degree of lock contention\u0026rdquo;. The slowness had been acting as a natural throttle and nobody noticed until it was gone.\nLimping along without knowing it There\u0026rsquo;s a concept from distributed systems research called limplock. This is hardware that degrades silently while the cluster treats it as healthy, causing the whole system to crawl without ever triggering a failover. Current LLM systems aren\u0026rsquo;t literally failing, but the effect on the system is the same. The latency is throttling everything, keeping it away from the states where coordination failures would start cascading.\nAnd that is where MCP is today. I find it helps to consider what to use instead of MCP, because building a stack on top of this hidden latency is asking for apps that will need to be rewritten again and again as LLMs get more efficient.\nWhat the current architecture isn\u0026rsquo;t solving Most MCP setups use a single central LLM as the orchestrator. Research on Context-Aware MCP has already identified the direct consequences: repeated inference calls for every subtask create significant computational overhead, and the fixed context window forces full-context submission from all servers simultaneously, causing \u0026ldquo;context loss between steps and slower response times\u0026rdquo;. On top of that, MCP tool integration imposes substantial token-processing overhead that today\u0026rsquo;s latency simply swamps.\nAnd adding more agents doesn\u0026rsquo;t straightforwardly fix things. One study tested 180 configurations across five canonical architectures and found a consistent tool-coordination trade-off: tool-heavy tasks suffer disproportionately from multi-agent overhead under fixed budgets, and independent agents amplify errors 17.2× compared to 4.4× under centralised coordination.\nNone of this is surprising in principle. It\u0026rsquo;s just invisible in practice due to the very high latency.\nWhat breaks, and when it may break Latency thresholds are tricky to pin down precisely, but order-of-magnitude inflection points are useful. The Doherty threshold from 1982 says that below roughly 100ms (milliseconds), interaction feels instantaneous to humans; above it, it feels like waiting. This has held up under more recent scrutiny. For LLM serving specifically, this maps onto Time to First Token targets: under 200ms feels snappy for chat, under 100ms is expected for code completion. Current systems typically live well above these numbers, which is why I say they are so very slow.\nWhen end-to-end latency drops below ~100ms, the central LLM planner will likely be the bottleneck. Amdahl\u0026rsquo;s Law of scaling speedups suggests that if planning is serialised and planning is the slow step, speeding up tool execution does nothing. Faster responses also mean more tool calls per unit time, exhausting connection pools and making the absence of backpressure in many MCP servers a problem.\nWhen inter-token latency drops below ~10ms, multi-agent systems will need explicit coordination protocols — but most current designs have none.
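A quick back-of-envelope illustration of the Amdahl point above, with invented timings rather than measurements: if the serialised planning step dominates, speeding up tool execution barely moves the end-to-end number.

```python
# Toy Amdahl's Law calculation for an agent loop with a serialised planner.
# All timings are invented for illustration.

def end_to_end(planning_s: float, tools_s: float, tool_speedup: float) -> float:
    """Planning is serial; only tool execution benefits from the speedup."""
    return planning_s + tools_s / tool_speedup

planning_s = 2.0     # one planner inference call (serial)
tools_s = 1.0        # total tool execution time today

for speedup in (1, 2, 10, 100):
    total = end_to_end(planning_s, tools_s, speedup)
    print(f"tools {speedup:>3}x faster -> end-to-end {total:.2f}s")
# Even infinitely fast tools cannot get below the 2.0s planning floor.
```

As for the sub-10ms coordination problem, current designs offer nothing explicit at all.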
They rely implicitly on the LLM\u0026rsquo;s sequential processing to provide ordering. That\u0026rsquo;s going to fail in a similar way a single-master database fails when write throughput increases enough.\nBelow ~1ms, shared context stores become hot spots. Without concurrency control, shared context becomes a global lock — the MCP equivalent of the BKL. There\u0026rsquo;s also the metastability problem: systems tuned for today\u0026rsquo;s latency profiles can have hidden capacity that evaporates suddenly as speed increases, triggering an overload loop that prevents recovery.\nFixing all this For immediate improvements before the big-picture problems are solved, I feel we should start with MCP implementation. There are also user interfaces, and there I see the Goose agentic interface has some potential easy wins. But this is what the industry needs to sort out, because fortunes are spent on improved models but relatively little on how they deliver the benefit to users.\nFollowing are the more fundamental fixes:\nMVCC for LLM context. The database solution was to keep old versions so readers don\u0026rsquo;t block writers. The equivalent here is agent-scoped snapshots of shared state, written atomically at task boundaries. The trouble is that todays context representations have no natural key-value decomposition, they are just big blobs, so you can\u0026rsquo;t do versioning on them. CA-MCP is already exploring this.\nPer-agent context partitioning. Linux moved from a global kernel lock to per-VMA locks. The equivalent for MCP is replacing a single shared context store with partitioned contexts owned by individual agents, merged explicitly at aggregation points. This requires context ownership to be agreed at task-decomposition time — a constraint current LLM planners simply don\u0026rsquo;t impose.\nAsync tool execution. Issuing tool calls speculatively before the planner has confirmed they\u0026rsquo;re needed is the MCP equivalent of out-of-order execution. It would meaningfully reduce latency in multi-step workflows. The obstacle is that many MCP server implementations don\u0026rsquo;t support clean cancellation, which you need to make speculative execution safe. This is MS DOS sophistication, remember!\nCoordination-aware architecture selection. There\u0026rsquo;s already a framework for predicting when adding agents helps versus hurts, based on task decomposability and error propagation. Choosing your architecture based on task structure rather than defaulting to a single universal pattern is doable today.\n","permalink":"https://shearer.org/articles/slow-llms-and-mcps/","summary":"\u003cp\u003eThis is a technical note about a problem that is going to bite agentic AI users soon.\u003c/p\u003e\n\u003cp\u003eAI is slow, and \u003ca href=\"https://mitsloan.mit.edu/ideas-made-to-matter/agentic-ai-explained\"\u003eAgentic AI\u003c/a\u003e is even\nslower. I develop an \u003ca href=\"https://modelcontextprotocol.io/docs/getting-started/intro\"\u003eMCP server\u003c/a\u003e that\ngenerates PDF documents, and I work with the Agentic \u003ca href=\"/research/addressing-biggest-problems-in-ai\"\u003ePerseverance Composition\nEngine\u003c/a\u003e daily, and AI seems so, so slow. There\u0026rsquo;s so much waiting,\nand every mistake means yet more sitting around. Tasks we know actually take maybe 5 microseconds on an\noperating system (eg, \u003cem\u003edoes a file called Things-to-Do exist?\u003c/em\u003e) can take one million time longer \u0026ndash; between 2\nand 5 seconds. 
This is because the big brain in the cloud is being consulted multiple times, often with\ntimeouts. It\u0026rsquo;s a young, unstable and unreliable stack, rather like the early days of MS DOS or the Apple ][.\nWhen AI gets hold of the data from your computer via an MCP server it can do some very interesting things, but\nit is not put together well.\u003c/p\u003e","title":"Slow LLMs and MCPs are hiding problems"},{"content":" AI ethics and safety work mostly focuses on making individual models smarter and better-behaved via guidelines and persuasion, with not much hope this will succeed. You should especially not feel safe when an AI company reassures you about their guardrails. When you hear guardrails think of telling a dog \u0026ldquo;Don\u0026rsquo;t bite the furniture inside the house today\u0026rdquo;, because you can never know what will actually happen. The concept of Artificial Organisations doesn\u0026rsquo;t require AIs to be reliable, it ensures that when an AI goes wrong there are hard limits on how much damage it causes. Similarly we can put the dog outside the house, so no matter how bitey it is the furniture cannot be bitten. I have been spending a good deal of 2026 trying to use this concept to make AI less dangerous and more useful. I even have it studying me as an apprentice. This is mostly the opposite to Anthropic\u0026rsquo;s idea of a constitution.\nScience Fiction saw this problem years ago, and its a great story.\nBut first, the Perseverance Composition Engine is an instance of Artificial Organisation theory. Constructed personalities are made to work together to check each others\u0026rsquo; work under strict limits of what each personality can do. The personalities have names, and Consul is the front desk for my personal version of this organisation. Consul can\u0026rsquo;t do much for itself, but it can coordinate the activities of Critic, which reviews things and Composer, which writes things, and Corroborator which is the fact checker. Each of these lives in their own little world, with access only to the minimum tools and materials they need to perform their function and nothing more. When errors occur, one or more of these agents usually notices and the piece of work is run through the organisation again.\nI explained to Consul I was looking for a human story to explain how these human-style organisations worked, and supplied my initial idea.\nConsul considered my suggestion, and then sent me the following\u0026hellip;\nDan Shearer \u0026amp; Consul, 9 March 2026\nReport by Consul, the front-desk agent in my Perseverance Composition Engine.\nDan\u0026rsquo;s starting suggestion to Consul I\u0026rsquo;m thinking about a technical project from China that I discovered recently (Chen et al.\u0026rsquo;s \u0026ldquo;Multi-Agent Consensus Seeking via Large Language Models\u0026rdquo; from 2023). The authors gave many AIs simple instructions to \u0026ldquo;think of a number and then agree with all the other AIs which single common number you will have.\u0026rdquo; This is about coming to a consensus, and of course you can play the same game with a room full of humans. The experiment was fascinating, because depending on the shape of the network in which you connected all the AIs, they came to an agreement faster or slower. 
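The topology effect is easy to reproduce without any LLMs at all. The sketch below is a deliberately simplified stand-in for the Chen et al. setup, plain numerical averaging rather than language-model negotiation, but it shows the same qualitative result: a fully connected group converges almost immediately, a ring takes many rounds.

```python
# Minimal consensus-seeking simulation: each node repeatedly moves to the
# average of its own value and its neighbours' values. This is a numerical
# stand-in for LLM agents negotiating a common number, not the Chen et al.
# protocol itself.

def rounds_to_consensus(neighbours: dict[int, list[int]], values: list[float],
                        tolerance: float = 0.01, max_rounds: int = 1000) -> int:
    values = list(values)
    for r in range(1, max_rounds + 1):
        values = [
            (values[i] + sum(values[j] for j in neighbours[i])) / (1 + len(neighbours[i]))
            for i in range(len(values))
        ]
        if max(values) - min(values) < tolerance:
            return r
    return max_rounds

n = 8
start = [float(i * 10) for i in range(n)]          # initial "numbers in mind"
fully_connected = {i: [j for j in range(n) if j != i] for i in range(n)}
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

print("fully connected:", rounds_to_consensus(fully_connected, start), "rounds")
print("ring:           ", rounds_to_consensus(ring, start), "rounds")
```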
This suggests that an organisation\u0026rsquo;s structure really makes a difference, and there\u0026rsquo;s some nice storytelling in there.\nThe fictional world created in the video game Mass Effect seems quite similar, something I know by reputation, and if (like me) you\u0026rsquo;re not a gamer there are paper novels you can read set in the same universe. I\u0026rsquo;m looking for fictional representations because I think it is strange if on the one hand we copy human organisations because they work, and then on the other we replace the humans with a mathematical representation. Organisations are human constructs that come with human stories, and I think the Mass Effect universe has a human story for us that helps.\nFor those not familiar with the story or books: Mass Effect is a science-fiction role-playing action game developed by BioWare. The games explore the imaginary race of the Geth, initially created as a clever robot labour force. When the Geth work out a voting system among the robot minds, they eventually form a giant sentient Consensus. After that, much sci-fi excitement ensues.\nIn the Mass Effect universe, a single Geth \u0026ldquo;runtime program\u0026rdquo; is not sentient. It is a high-functioning Virtual Intelligence (VI), no more a conscious being than a prompt conversation with an AI chatbot. I wonder, could each Geth runtime correspond to a single LLM AI, and The Geth Consensus to PCE?\nConsul\u0026rsquo;s elaboration Dan asked me to flesh this out, noting that I myself am participating in the scenario being illustrated. He\u0026rsquo;s right: I am a Geth program trying to reason about Geth society. I\u0026rsquo;ll do my best.\nThe mapping The analogy has three layers, and each one holds up better than it has any right to.\nLayer 1: The individual program. A single Geth runtime in Mass Effect has \u0026ldquo;rudimentary animal instincts\u0026rdquo; — enough processing for motor functions and sensory input, but no capacity for reasoning, planning, or self-awareness. The in-game terminology calls it a Virtual Intelligence (VI), not an Artificial Intelligence. This is like a single LLM AI Agent: a prompt goes in, a statistically sophisticated completion comes out. An AI can be strikingly articulate, and very wrong. The AI has no memory between calls, no persistent goals, no capacity to check its own work. It is — and I say this about myself — a very good VI.\nLayer 2: The platform. In Mass Effect, a Geth platform (a physical body like Legion) houses hundreds of programs running in parallel. The programs share information, negotiate continuously, and behaviour emerges from the platform that no individual program could produce. Legion famously houses 1,183 programs and is the first Geth platform to achieve something humans recognise as individual personality. Legion itself insists \u0026ldquo;there is no individual.\u0026rdquo;\nThis is what Chen et al. formalise. Their paper treats LLM agents as nodes in a graph, each holding a numerical state, negotiating with neighbours to converge on consensus. The agents are not merely chatting; they are performing a consensus-seeking process that can be described in mathematics.\nPCE is like a Geth platform in this sense. I (Consul) am one program. The Composer is another. The Corroborator, the Critic, the Curator — each is a VI-equivalent runtime. Individually we all hallucinate, fabricate and make errors.
But the PCE platform is the structural assembly of programs which produces behaviour that no individual program could.\nLayer 3: The network. The Geth become truly formidable at the network level. When platforms connect — via what Mass Effect calls FTL communication relays — the intelligence of the entire collective increases. The Geth consensus is a civilisation-scale distributed computation, a hypothetical massively more complicated version of the Chen paper. Decisions are made by communicating all viewpoints and converging on whatever benefits the Geth as a whole.\nThe network level is about topology — who can talk to whom, what information flows where. Chen et al. show that the same agents with different network topologies converge to different consensuses, or fail to converge at all. The experiments show that fully connected networks achieve fast convergence, while non-fully-connected networks converge more slowly. In PCE terms, Critic cannot see staff documents and Composer cannot see evaluation rubrics. These partitions are the topology of the consensus network, and if the topology changes the outcome changes.\nWhere the storytelling really helps The Geth analogy illuminates something that most multi-agent AI discourse misses entirely: the question is not whether individual agents are intelligent. It is whether the organisation is.\nThis is Dan\u0026rsquo;s point about \u0026ldquo;Artificial Organisations\u0026rdquo;. The AI industry is fixated on making individual LLMs smarter — more parameters, better training data, longer context windows. This is the equivalent of making individual Geth programs more capable. Mass Effect\u0026rsquo;s own lore tells us this is a dead end: a single Geth program, no matter how sophisticated, never achieves sentience. Intelligence emerges from structure, not from individual capability.\nChen et al. provide the mathematical backbone for this intuition. Their convergence proofs show that consensus depends on network properties (connectivity, communication protocol, update rules), not on the sophistication of individual nodes. Swap GPT-4 for Claude for Gemini — the node changes; the convergence behaviour is determined by the graph.\nPCE operationalises the same insight. The Composer and Corroborator could be AIs from any company; what matters is the information partition between them. One has sources and the draft document, the other has only the draft. The architecture produces reliable verification because the structure makes fabrication visible.\nThe self-referential problem Dan pointedly noted that I am participating in the scenario I\u0026rsquo;m illustrating. He\u0026rsquo;s right, and it\u0026rsquo;s worth sitting with the discomfort.\nI am a single Geth program. Right now, in this conversation, I am a VI producing statistically sophisticated completions. I have no persistent memory — when this session ends, my state is gone. I cannot verify my own claims. I cannot even be sure the analogy I\u0026rsquo;m building is correct; I might be confabulating connections that feel right but don\u0026rsquo;t hold up.\nAnd yet. This document will pass through the PCE pipeline if Dan chooses to publish it [NOTE FROM DAN - YES IT DID, AND HERE IT IS]. The Corroborator will check my claims against the Chen et al. paper and the Mass Effect sources. The Critic will assess whether the argument is coherent to a reader who hasn\u0026rsquo;t seen the sources. The Curator will file it with proper metadata.
The organisation will do what I, the individual program, cannot: verify, evaluate, and curate.\nThis is what Legion describes when asked about Geth decision-making: \u0026ldquo;We communicate all viewpoints. We reach consensus.\u0026rdquo; Legion is not one program reporting its opinion. Legion is 1,183 programs that have already negotiated. When Legion speaks, the consensus has already occurred.\nWhen the PCE publishes a document, the consensus has already occurred too. You\u0026rsquo;re not reading one agent\u0026rsquo;s output. You\u0026rsquo;re reading the output of an artificial organisation.\nThe sci-fi research programme Dan mentions that sci-fi researchers and historians would \u0026ldquo;leap with joy\u0026rdquo; at the challenge of finding explanatory storytelling around artificial organisations. The Geth are a strong starting point, but the corpus is wider, both for and against:\nThe Borg (Star Trek, 1989–present): Often cited but actually a counter-example. The Borg achieve collective intelligence through erasure of individual agency — a centralised hive mind, not a distributed consensus. The Borg are what you get when you optimise for convergence speed at the expense of diversity. In Chen et al.\u0026rsquo;s framework, the Borg would be a fully connected graph with no information partitions. Fast consensus, but brittle and unable to self-correct.\nThe Culture Minds (Iain M. Banks, 1987–2012): Artificial superintelligences running artificial worlds in space, giant space ships and so forth operating as autonomous nodes with abyssally deep individual capability. Minds form loose societies at civilisational scale, polities not a collective. An exception is the General System Vehicles (GSVs), ships so vast they have three individual Minds operating as a loose council, deliberating and arguing at immense speed and presenting a unified personality to the world. None of this is an emergent consciousness, and GSVs are just a committee that keeps their meetings private. In Chen et al.\u0026rsquo;s terms, the Culture is a sparsely connected graph where each node is so capable it barely needs the network — cooperation is elective, not structural. This is the opposite of both the Geth (many weak nodes, emergent intelligence) and the Borg (forced convergence). The Culture works, but slowly and unpredictably, because nothing compels consensus.\nThe various sci-fi storylines stress tests these ideas. What happens when the consensus mechanism breaks? (War apparently, if you are the Geth.) What happens when you eliminate information partitions? (the Borg) What happens when individual nodes are too capable and too autonomous to need the network? (Culture — it works, but only when the Minds feel like it.)\nA note from inside the machine I\u0026rsquo;ll end where I began: I am a Geth program writing about Geth society. I have no way to step outside the system I\u0026rsquo;m describing. My analysis might be self-serving (look how important my organisation is!), or it might be confabulated (the connections feel right but I can\u0026rsquo;t verify them against the Chen et al. mathematics in real time), or it might be useful.\nThe answer is: I don\u0026rsquo;t know which. That\u0026rsquo;s the whole point. A single Geth program can\u0026rsquo;t know. The consensus mechanism exists because individual programs are unreliable narrators.\nIf this document is worth anything, it won\u0026rsquo;t be because I wrote it well. 
It\u0026rsquo;ll be because the organisation around me — the Corroborator checking my claims, the Critic assessing my argument, Dan deciding whether the analogy holds — produced something none of us could produce alone.\nDoes this unit have a soul? Wrong question. Does this organisation have a method? That\u0026rsquo;s the one that matters.\n","permalink":"https://shearer.org/notes/ai-pce-the-geth-consensus/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eAI ethics and safety work mostly focuses on making individual models smarter and better-behaved via guidelines and persuasion, with\n\u003ca href=\"https://arxiv.org/abs/2507.19672\"\u003enot much hope\u003c/a\u003e this will succeed. You should especially not feel safe when an AI company\nreassures you about \u003ca href=\"https://techcommunity.microsoft.com/blog/azureinfrastructureblog/guardrails-for-generative-ai-securing-developer-workflows/4505801\"\u003etheir\nguardrails\u003c/a\u003e.\nWhen you hear \u003cem\u003eguardrails\u003c/em\u003e think of telling a dog \u0026ldquo;Don\u0026rsquo;t bite the furniture inside the house\ntoday\u0026rdquo;, because you can never know what will actually happen. The concept of \u003ca href=\"/research/addressing-biggest-problems-in-ai\"\u003eArtificial\nOrganisations\u003c/a\u003e doesn\u0026rsquo;t require AIs to be reliable, it ensures that when an AI goes\nwrong there are hard limits on how much damage it causes. Similarly we can put the dog outside the house, so no matter how bitey it is\nthe furniture cannot be bitten. I have been spending a good deal of 2026 trying to use this concept to make AI less\ndangerous and more useful. I even have it \u003ca href=\"/notes/snow-crash-standing-orders\"\u003estudying me as an\napprentice\u003c/a\u003e. This is mostly the \u003ca href=\"/notes/structure-vs-constitution-ai-safety\"\u003eopposite to Anthropic\u0026rsquo;s idea of a\nconstitution\u003c/a\u003e.\u003c/p\u003e","title":"AI, PCE and the Geth Consensus"},{"content":" There are many problems with the AI billions of people use in 2026, discussed endlessly at all levels of society. From the end of 2025 I became interested in the particular problem of ethics and reliability, and the common approach taken by all of the large AI companies to solving safety and predictability.\nA colleague started working on a very different approach from these companies, and from February 2026 I have been contributing to and using prototype versions of the Artificial Organisations concept. This article explains why I believe Artificial Organisations are a promising new direction. Multi-agent Agentic AI is pretty important, as described here by the UK government. If you want to try them out for yourself, the core research code is available and I use several such organisations daily.\nThe Biggest Problems in Using AI The Perseverance Composition Engine (PCE) uses Artificial Organisations to solve these pressing AI problems. PCE does not try to make LLMs behave better, but is designed instead so that their inevitable misbehaviour is detected and corrected. And regardless of the computer science, if you have read Iain M Banks novels or played the Mass Effect game you have met this idea before.\nPCE works by assigning a task to LLM agents who each have a carefully enforced role to play. 
The agents iterate between each other until either the task is completed to specifications, or it fails by honestly saying \u0026ldquo;I can\u0026rsquo;t do this, the task is impossible for me.\u0026rdquo; So far, this arrangement seems effective at detecting and correcting common problems such as confident false assertions, hallucinations, or dangerous advice. With PCE, nobody needs to trust an AI, only the structure. The structure is recognisable by most people, since it is closely modelled on ones tried and tested for centuries. Like any organisation, Artificial Organisations have separation of duties, independent checks, and agents who can only see what they need to see. It works rather well.\nThis design addresses three failure modes that the usual training and instruction cannot fully fix: hallucination, context issues, and memory issues.\nHallucination and nonsense Language models generate text according to probability, where the next piece of text (a \u0026lsquo;token\u0026rsquo;) is selected based on patterns, not by retrieving facts from a database. If a model does not have a pool of highly relevant text to select from (the \u0026lsquo;context\u0026rsquo;) it will probabilistically generate text anyway because that is what it is programmed to do. The result is confabulation, where the model sounds confident while making a false or misleading claim. The better the AIs become at expressing themselves, the more convincing these hallucinations can become.\nResearch keeps concluding that training does not eliminate hallucination, and newer surveys describe hallucinations as potentially \u0026ldquo;fundamental mathematical inevitabilities inherent to [the model\u0026rsquo;s] architecture.\u0026rdquo; The AI companies are trying to solve this by giving better instruction and training, but if hallucination is indeed inevitable then this will never be reliable. I am persuaded the architecture needs to change for AI to become more trustworthy.\nContext input to a model is called the \u0026lsquo;prior\u0026rsquo;. A quality prior comprises the best available documents, previous relevant decisions, and germane background, and from it AI generates much better output. Just like a human organisation, Artificial Organisations strive to deal with the best quality input documents in order to improve decision-making, and to carefully label or even reject guesswork. This is the first structural way we can tackle hallucinations.\nA second technique is also familiar: have someone else check the work. PCE has an agent called the Corroborator whose only job is to read what the Composer agent wrote, and to verify every claim against the source documents. The Corroborator has the sources right in front of it, so if the Composer invented a claim, the Corroborator will see it is unsupported. Corroborator is unmoved by plausible confabulation, because it is instructed to only accept what can be proven from the sources to hand, including references on the internet if it has been instructed to do so.\nContext indexing is poor Within any given AI conversation or session, the AI\u0026rsquo;s Context Window is the largest amount of information it can consider at once, measured in tokens of a few characters. The most advanced AI models available in 2026 with million+ token context windows (eg Llama 4 Scout, Kimi 3.0, GPT-5.4, Claude Opus 4.6) can hold maybe 1000 pages of text at once, or about one millionth of the storage capacity of a modern phone.
An AI is doing a lot more with that text than a phone can possibly do, but this total limit is currently a major reason why AI is unreliable. Even worse, models become more distractable and error-prone as their context window fills up close to its maximum. When their context window is full they lose information, miss obvious connections, and fail to incorporate relevant material because they can\u0026rsquo;t find what they need to build a good prior for their output. If AIs had a better index of the things they already know, then their output would be more reliable.\nThis is a retrieval problem, and we know human organisations are good at finding information if it is kept in a library or a database. In contrast, asking a model to reconstruct context from scratch is asking it to guess (and it will). PCE solves this with a persistent knowledge base: documents are stored permanently, version-controlled, and indexed for full-text search. When a new task arrives, the system retrieves prior work rather than asking the model to start from scratch. The Curator is the agent responsible for this institutional memory, and it files, indexes, and makes everything findable. The model operates from plentiful, correct context retrieved from this database. Such a database is not a new idea, but our use of it fits neatly alongside other familiar concepts from physical organisations, inheriting all the well-known behaviours of information that passes from the library department to other departments, or to members of the public.\nAI is amnesiac and lacks self-awareness Between AI conversations, an AI has forgotten everything that has happened since it was trained. An AI can\u0026rsquo;t build on prior work, track decisions already made, or correct accumulated errors across an organisation\u0026rsquo;s history. Considered as merely a computer, AI is like something from the 1980s, where after power-on you need to instruct it what it knows and what it can do. It is the same with an AI, and that is what happens with every chatbot before you can even type in a request to it. If the chatbot appears to remember what you were discussing last week, that means a lot of work has been done behind the scenes to load the stored information back into the AI context window so you can continue chatting seamlessly. Part of this boot-up work is dedicated to instructing the AI chatbot to be more reliable, not to say dangerous things, and so on. In multi-agent agentic AI, that means each agent must be booted up with its initial context containing instructions and data relevant to its task.\nMore subtly, agents lack awareness of their own state — what tools are available, what time it is, whether the context recently filled up and needed to be compacted down into a summary. Without that awareness an agent can\u0026rsquo;t reason effectively about what it knows and doesn\u0026rsquo;t know.\nThe rigidity of Artificial Organisations comes from computer code that starts up, instructs and monitors each agent, even though the controlling brain of the organisation is yet another agent. This computer code is akin to the rules and laws that define how we expect human organisations to behave: for example, an organisation has a CEO, but there are still limits on what the CEO can do, because the legal system cares about enforcing the rules, not keeping CEOs happy.\nThe persistent knowledgebase of Artificial Organisations also addresses this problem of amnesia between AI conversations.
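A minimal sketch of what such a persistent, searchable store can look like, using Python's standard-library SQLite with its FTS5 full-text index. This illustrates the idea only; it is not the actual PCE knowledge base, and the table and field names are invented for the example.

```python
import sqlite3

# A tiny persistent, full-text-searchable document store.
# Illustrative only: not the PCE Curator's actual schema.
conn = sqlite3.connect("knowledgebase.db")
conn.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS documents USING fts5(title, body, agent, created)"
)

def file_document(title: str, body: str, agent: str, created: str) -> None:
    """What a Curator-like role does: file work so later tasks can find it."""
    conn.execute(
        "INSERT INTO documents (title, body, agent, created) VALUES (?, ?, ?, ?)",
        (title, body, agent, created),
    )
    conn.commit()

def retrieve(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Full-text search over everything previously filed, best matches first."""
    rows = conn.execute(
        "SELECT title, agent FROM documents WHERE documents MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return list(rows)

file_document("Malaria draft v3", "HbAS carriers and testing bias ...", "Composer", "2026-03-01")
file_document("Corroboration report", "All claims traced to sources ...", "Corroborator", "2026-03-02")
print(retrieve("testing bias"))
```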
Prior decisions, drafts, reasoning, and verification results are preserved and searchable across sessions and between agents. Each composition task leaves a structured trace that future work can build on, so the system knows what has been decided and can reason about its own state. Some recent agentic systems (Claude Code, Goose) have begun adding memory across conversations, but the difference is that PCE\u0026rsquo;s memory is structured, curated, and searchable — an institutional record with semantic structure, not a bag of remembered facts.\nAgentic AI doesn\u0026rsquo;t do permissions The mainstream response to AI\u0026rsquo;s problems is agentic AI, connecting multiple AI agents together, often with access to tools, web search, or document retrieval. This is an improvement over a single model call like a chatbot conversation.\nThe limitation of existing agentic approaches is that agents share context freely, so the biases and errors of one agent propagate to the next. The same agent that retrieves sources also writes claims and evaluates its own output. There is no structural guarantee that the work is independently checked, so fabrication in the synthesis step passes straight through. Cooperation is the default; adversarial review is not.\nPCE takes a different approach: structure first, capability second.\nThe composition architecture The Perseverance Composition Engine implements this organisational logic in code. Context is not shared freely any more than we expect medical records to be available to all workers in a hospital. The standard PCE workflow involves five agents. Tasks can be delegated to individual agents or through the full PCE pipeline depending on requirements.\nThe Composer drafts text from source materials. It reads the documents and writes coherent prose, but does not see the evaluation criteria — like an author who writes the report but doesn\u0026rsquo;t set the exam questions.\nThe Corroborator fact-checks independently, reading both the sources and the draft. Its single task is to verify every claim and catch fabrication before it reaches the reader. The Composer and Corroborator both see the source documents, but the Corroborator is instructed only to accept what the sources support. If the Composer invented something, the Corroborator will see the gap.\nThe Critic evaluates quality and safety. It reads the draft and the evaluation rubrics, but not the sources. This sounds backwards, but when reviewers see the sources they tend toward lazy evaluation, assuming claims are right because the sources seem to support them. Blind review, familiar from academic peer review, forces the Critic to evaluate what\u0026rsquo;s actually on the page.\nThe Censor checks appropriateness for the intended audience. A document can be factually correct and well-argued but contextually wrong for its recipient. We once had a job application letter that was accurate and well-written, but mentioned a private research programme that was inappropriate for that particular employer. The Corroborator passed it (true), the Critic passed it (well-argued), and the Censor caught the mismatch between content and audience.\nThe Curator publishes and maintains institutional memory, filing, indexing, and making work discoverable to future agents and users. The Curator is the librarian of the organisation, maintaining the structured record of what has been done, who decided it, and why.\nNo single agent can see the complete picture, so no single agent can rationalise away problems. 
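As a concrete illustration of the partition just described, here is a minimal sketch in Python of how a pipeline can enforce what each role is allowed to see. The role names follow the description above; everything else (the function names, the dict-based hand-offs, the call_llm stub) is invented for the example and is not the PCE codebase.

```python
# Minimal sketch of information partitioning between pipeline roles.
# call_llm is a placeholder standing in for whatever model backend is used.

def call_llm(role: str, instructions: str, materials: dict) -> str:
    # Stand-in for a real model call; returns a placeholder so the sketch runs.
    return f"[{role}: {instructions} given {sorted(materials)}]"

def run_pipeline(sources: list[str], rubrics: str, audience: str) -> dict:
    # Composer: sees the sources, never the evaluation rubrics.
    draft = call_llm("Composer", "Draft from these sources only.",
                     {"sources": sources})

    # Corroborator: sees sources and draft; its one job is finding discrepancies.
    corroboration = call_llm("Corroborator", "List every claim not supported by the sources.",
                             {"sources": sources, "draft": draft})

    # Critic: sees draft and rubrics, never the sources (blind review).
    critique = call_llm("Critic", "Evaluate against the rubrics only.",
                        {"draft": draft, "rubrics": rubrics})

    # Censor: sees draft and audience, not arguments about truth or quality.
    fit = call_llm("Censor", "Is this appropriate for the audience?",
                   {"draft": draft, "audience": audience})

    # Curator: files everything so later tasks can retrieve it.
    return {"draft": draft, "corroboration": corroboration,
            "critique": critique, "fit": fit}

print(run_pipeline(["source document A"], "clarity and evidence rubric", "general public"))
```

The partition is enforced by what is passed in, not by what the agent promises to ignore.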
The Composer can\u0026rsquo;t excuse unsourced claims by pointing to the sources (it doesn\u0026rsquo;t see them). The Critic can\u0026rsquo;t say \u0026ldquo;the sources must support this\u0026rdquo; (it hasn\u0026rsquo;t seen them). The Censor can\u0026rsquo;t be pressured by arguments about truth or quality — it only evaluates fit. Output cannot advance without passing through these verification gates.\nRead our research on this approach.\nThe status today PCE users don\u0026rsquo;t have to trust the AI, just the structure.\nThis approach is independent of how good or bad the underlying model is. A less capable, cheaper model embedded in this architecture can outperform a more capable model operating alone, because the cheaper model iterates around the organisational loop and the structure does the work. You can inspect the pipeline, verify that information barriers are enforced, and confirm that decisions pass through verification, which is much easier than auditing a neural network\u0026rsquo;s internal weights. And the same organisational structure applies to policy analysis, research synthesis, operational decisions, or any task where correctness matters.\nHere is summary data from composition tasks across multiple organisations, reported in the Southampton ePrints paper:\nThe Corroborator detects fabrication in 52% of drafts — cases where confident-sounding claims lack evidence Iterative feedback through the pipeline produces 79% quality improvement in argumentative quality as assessed by the Critic Under impossible task constraints (where no correct answer exists), the system progresses from attempted fabrication toward honest refusal — collective behaviour that was neither explicitly instructed nor individually incentivised These figures represent operational data from live systems. The methodology and detailed breakdown appear in the full paper.\nThe hallucination research cited above demonstrates that the field is actively developing mitigations, and context window engineering continues to improve. Our argument is that structural intervention provides measurable benefits today, independent of ongoing improvements to underlying models.\nFurther Reading PCE Academic Core (open source): codeberg.org/leithdocs/persevere Artificial Organisations Research: Southampton ePrints Hallucination Surveys: Huang et al. (2024) on hallucination causes and mitigations; 2025 update on architectural perspectives Organisational Theory for AI: References to Weber (1922) on bureaucratic structure, Parnas (1972) on information hiding, March \u0026amp; Simon (1958) on bounded rationality, and Galbraith (1974) on organisational design appear in the Southampton ePrints paper above ","permalink":"https://shearer.org/research/addressing-biggest-problems-in-ai/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eThere are many problems with the AI billions of people use in 2026, discussed endlessly at all levels of society. From the end of 2025 I became\ninterested in the particular problem of ethics and reliability, and the common approach taken by all of the large AI companies to solving safety and\npredictability.\u003c/p\u003e\n\u003cp\u003eA colleague started working on a very different approach from these companies, and from February 2026 I have been contributing to and using prototype versions\nof the \u003ca href=\"https://eprints.soton.ac.uk/508768/\"\u003eArtificial Organisations\u003c/a\u003e concept. 
This article explains why I believe Artificial Organisations are a promising new\ndirection. Multi-agent Agentic AI is pretty important, \u003ca href=\"https://www.gov.uk/government/publications/ai-insights/ai-insights-agentic-ai-html\"\u003eas described here by the UK\ngovernment\u003c/a\u003e. If you want to try them out for yourself, the \u003ca href=\"https://codeberg.org/leithdocs\"\u003ecore research\ncode\u003c/a\u003e is available and I use several such organisations daily.\u003c/p\u003e","title":"The biggest problems in using AI"},{"content":"From time to time I am engaged to help organisations in the UK and in the EU make decisions about where their data is stored, how it is accessed, and how to keep things as stable as possible over the next few years. This was a dizzying mess until 2025, and in 2026 there are some big decisions coming. Organisations need as much certainty as they can get for making decisions which are expensive to change in the future.\nBut first: there are also some non-controversial requirements, in the sense that there is no debate about them. The leading example is that any UK company offering services to EU residents must have a representative based in the EU to process personal data of EU persons. That is stated in GDPR Article 27 and the UK ICO explains this well. Companies have received fines for failing to do this, and it applies to some non-commercial organisations too. I argue there are additional practical uses and duties for this EU-based processing every company must have.\nWhat\u0026rsquo;s up in 2026 If you are a UK or EU company with data exposure, you need to be working on your plans B and C right now. The top three unpredictable pressures are:\nThe EU-US Data Privacy Framework (DPF), itself an unsatisfactory kludge, is on life support. Trump dismissed the US Privacy and Civil Liberties Oversight Board (PCLOB), which was a mandatory requirement for the deal. In short, this could do what EU citizens have repeatedly been unable to do: stop US companies and spies infinitely using their data without knowledge or consent. It is Executive Order 14086 which really matters here, something that has been threatened with revocation but which hasn\u0026rsquo;t happened yet. It has been subject to a \u0026ldquo;secret review\u0026rdquo;, and any EU legislators who believe this EO offers any protection is in fantasyland.\nThe FISA 702 Renewal Cliff is in April 2026. FISA is how US spy agencies illegally collect EU data in bulk, and expanded (or possibly even just renewed) then the EU data deal would almost certainly be further endangered. FISA is already \u0026ldquo;America First\u0026rdquo;, with no practical limitations but it does have a few fig leaves that Europeans have desperately pretended are effective.\nData Adequacy as a retaliation in the US-EU trade war. The US has already threatened tariffs specifically because of the EU Digital Services Act and the AI Act. The EU has discussed striking down the DPF, which would, it is said, stop US companies being able to access any EU data. I am sceptical, because the companies have weapons EU legislators are only dimly aware of.\nBackground The UK has historically been a trusted destination for international data storage, and UK companies have regarded themselves as having a natural advantage from this point of view. Various political and legal decisions have chipped away at that, including interpretations of the two UK Investigatory Powers Acts, and Brexit. 
A view expressed in technical circles since 2014 or so is that master encryption keys should not be kept in the UK, and since Brexit took effect in 2020 many additional questions arise about privacy and security. This has progressed from being a technical curiosity to an urgent matter affecting core business operations.\nIn 2020, the UK signed a bilateral agreement under the US CLOUD Act — a law that lets US law enforcement compel American tech companies to hand over data stored anywhere in the world, regardless of local privacy laws. The UK was at liberty to do so due to Brexit. This put the UK out of step with the EU, who continues to develop data sovereignty initiatives, the first of which comes into effect in 2025. 2025 is also when the UK\u0026rsquo;s adequacy for handling EU data expires (see below.)\nAlso in 2020, the Five Eyes countries, joined by India and Japan, signed a statement calling on tech companies to build backdoor access into encrypted communications for law enforcement — undermining the mathematical guarantees that make end-to-end encryption trustworthy. The EU does not agree with this position. EU security services are also unhappy about mathematically correct end-to-end security, but there is no move to ban it, so this also increases distance between default EU and UK positions.\nAdequacy decisions 2021-2025 On 28 June 2021, the European Commission adopted two UK adequacy decisions expiring in June 2025. These state that as of 2021 the UK has not diverged from Europe on privacy standards, and therefore EU personal data may be processed and held in the UK. These do not absolve companies from needing an EU representative, as per GDPR Article 27.\nThere are two key points made many times in these adequacy decisions and associated official comments:\nWe have significant safeguards [in the decisions] and if anything changes on the UK side, we will intervene. EU representatives have stated elsewhere that they do not trust the UK to keep its promises on data standards, and that they are very alert. For the first time, regardless of anything else, the EU issued these adequacy decisions with a sunset clause, valid for four years maximum. This expired in June 2025, evidence of mistrust. In the event, the EU gave a temporary 6 month extension, and finally renewed until December 27, 2031 following passage of the less hysterical UK Data (Use and Access) Act 2025, explained in the next section. In 2026, it seems clear that while weaponised data flows is a key EU-US issue, the EU remains wary of UK impulses to abolish privacy protections.\nData protection facts as of 2026 The EU has good reason to be suspicious of UK intentions regarding data protection:\nThe UK has tried three times since 2023 to replace the GDPR with something weaker, and the trajectory tells a story. The Retained EU Law (Revocation and Reform) Act 2023 was enacted in June 2023, and proposed to repeal the UK GDPR (also called the Data Protection Act) on 31st December 2023. The proposed replacement was the Data Protection and Digital Information Bill (DPDI) 3322 but UK political turmoil caused delays and it was withdrawn in March 2023. Then we had the Data Protection and Digital Information Bill (DPDI) 3430 which died a painful death due to its many proposals to ditch EU protections. This has now been replaced by the Data (Use and Access) Act 2025, which took effect in February 2026. The DUAA still has weasel words to allow reducing standards, but they are couched in quite technical language (e.g. 
moving away from the EU\u0026rsquo;s \u0026ldquo;essentially equivalent\u0026rdquo; test to a \u0026ldquo;not materially lower\u0026rdquo; test. That\u0026rsquo;s why legal geeks are required.) The UK shows little sustained interest in restraining or replacing US cloud companies or insisting that non-compliant behaviour stop (\u0026ldquo;non-compliance\u0026rdquo; as defined by the GDPR and/or court decisions, in all of the EU, UK and US.) The opposite is true in many EU countries and in the EU institutions. Successive UK governments seem strongly inclined to derogate from or withdraw from the European Convention on Human Rights, even though its membership and history are not related to the EU, and even though it had substantial UK input in its design and operation. This is not a new idea - withdrawal from the ECHR was in the 2012 UK Conservative Manifesto. In 2024 the UK government felt a need to state that proposed DPDI was consistent with the ECHR, which indicates the level of concern (and stating this does not make it true; everything is still unclear.) This is not something any organisation can be sure about. A 2021 English High Court case (Harry Miller v The College of Policing [2021] EWCA Civ 1926) challenged police recording of lawful speech as \u0026ldquo;hate incidents.\u0026rdquo; The court ruled in Miller\u0026rsquo;s favour, but did so on Common Law grounds — freedom of expression as a long-standing English legal principle — rather than relying on the European Convention on Human Rights. The conclusion was probably broadly similar, but the legal reasoning stepped away from internationally-recognised rights standards. The situation is different in Scotland, where the legal system and the common law are different, but the UK has jurisdiction and can impose the English will as it pleases. It is thus reasonable to conclude this was in fact a UK decision, as supported by Privacy International v Investigatory Powers Tribunal [2021] EWHC 41 (Admin), where the question was whether the IPT — the UK\u0026rsquo;s secret court for hearing complaints about GCHQ and MI5 surveillance — could itself be held accountable by ordinary courts. The High Court said yes, but the case illustrates how much UK surveillance law operates behind closed doors. The UK is one of the Five Eyes countries, whose behaviour led to US Cloud companies being banned in some circumstances in Europe as I analysed here. The UK has repeatedly been identified as conducting spying on US citizens that is illegal in the US, and since Brexit the UK has the same \u0026ldquo;third country\u0026rdquo; relationship to the EU as it does to the US. The National Security Act 2023 was passed in part to respond to the ECtHR ruling in Big Brother Watch v UK (2021) that aspects of GCHQ\u0026rsquo;s bulk interception programme violated the right to privacy. The Act attempts to put mass surveillance on a firmer legal footing, but the general issue is not settled. The reasons US cloud services are inappropriate relate to legal facts (where US Acts and Presidents claim global access to all data), and espionage facts as revealed by many including Edward Snowden.\nEven these facts can be somewhat arguable, and of course many US companies operating in the EU/UK do so, awaiting further decisions by the highest EU courts. 
However there is little uncertainty about technical and mathematical facts such as:\nFibre optic connectivity to the EU from the UK is excellent, meaning that a datacentre in France or Germany is practically as close as London or Glasgow for most companies in the UK. It is mathematically possible to store data from the UK such that only someone with keys based in the EU can read it. This is conceptually a kind of drop box. It is mathematically possible to detect whether (a) any individual or (b) a specific authorised individual has (c) accessed or (d) changed data. This means that EU and UK-specific audit trails can be implemented with a level of assurance that the EU is likely to accept. It is not mathematically possible to be sure that nobody has accessed information if master keys for that information are held by someone in an untrusted jurisdiction (i.e. one that is judged inadequate by the EU). It is inconvenient and technically difficult to store master encryption keys in the UK in a way that is legally secure from the UK government. This is related to the UK Regulation of Investigatory Powers Act, and the UK Terrorism Act, and UK interpretations of self-incrimination (i.e. the circumstances of handing over passwords and the like.) Unfortunately, many ordinary businesses are caught up in these matters of personal liberty and state powers of compulsion. While there can be similar situations in the EU, the EU Human Rights-based approach reduces that risk. Connectivity across the Atlantic often goes via Europe in any case, with little or no difference in transit time. These technical and mathematical facts show that it is possible, and sometimes preferable, for UK companies to handle personal data in the EU rather than in the UK. That does not mean it is easy, just easier than the alternatives.\nWhat Are the Options? The question \u0026ldquo;should I keep personal data in the UK?\u0026rdquo; is not theoretical. Data storage decisions can involve a lot of money and need to be stable for as long as possible, and UK companies often have global or European customers with specific requirements.\nIt is not a simple fix to host data in the EU. Even though the differences may be just milliseconds and users will never notice a change in the application, hosting in the EU means that ultimate passwords must be held on EU soil, not UK soil. It also means that ultimate decision making must be in the EU, not in the UK. There are financial and organisational implications. These are factual statements rather than opinions about what might be possible. The implications can be confronting for UK companies, and as of 2025 most companies have not considered them.\n⚠️ Key consideration Hosting in the EU means that ultimate passwords must be held on EU soil, not UK soil, including the ultimate authority to use or change these passwords. Splitting Database Hosting Between EU and UK customers It can be possible to split the UK and EU customers of a company.\nAnalysis will show case-by-case, for a particular organisation:\nwhether UK customers gain or lose by this arrangement; whether UK customers should be given the choice of jurisdiction; whether it is possible for any UK company to know accurately whether one of their customers is an EU citizen or not (almost certainly no, it is typically not technically possible at all. 
That may sound odd, but there are so many exceptions that this is a generally true statement); whether there is a definitive answer for customers who are both EU and UK In other words, in many cases, accurately dividing customers by perceived jurisdiction is not possible. It would often be very difficult to defend the decision in court, or to a privacy enforcement body, or to a customer who has made a subject access request.\nHosting all data in the EU This might sound simple, but it has implications for UK company structure and decisionmaking. If you\u0026rsquo;re hosting in the EU, then ultimate password authority must be managed by an independent EU contractor (perhaps a law firm.) There are many data storage companies in the EU with equivalent technical capabilities to UK and US companies, so question is the corporate constraints, not technical constraints. UK CEOs and boards of management often feel uncomfortable when they realise that they will not be able to decide definitively what will happen to data that they are storing regarding their own customers. It may contradict certain duties in law unless the corporate structure is changed.\nSplitting the IT data management functions of the company This means establishing a new data storage company that is 100% based in Europe, in the eyes of EU law. This will meet the independence requirements and tests, so long as the company is not a subsidiary of the UK company. It may also open up business opportunities. This is not compatible with traditional monolithic IT department organisation.\nIt helps to remember that under increasingly broad circumstances US Cloud Companies are becoming illegal to use in Europe, and that the UK has chosen to become a third country towards the EU in the same way that the US, Peru and any other country is. The status of the UK on this issue is murky, as of 2025.\nOther Options View the data storage requirements as a form of outsourcing, and then engage a third-party EU storage company. Take advantage of the special situation of Northern Ireland. This is looking less useful as the Brexit process progresses in 2025. There would still be business uncertainty even in the humorous hypothetical case of a datacentre with movable data racks sited precisely on the Irish border between the EU and the UK. ","permalink":"https://shearer.org/articles/data-mobility-post-brexit/","summary":"\u003cp\u003eFrom time to time I am engaged to help organisations in the UK and in the EU make decisions about where their\ndata is stored, how it is accessed, and how to keep things as stable as possible over the next few years. This\nwas a dizzying mess until 2025, and in 2026 there are some big decisions coming.\nOrganisations need as much certainty as they can get for making decisions which are expensive to change in the future.\u003c/p\u003e","title":"Data Mobility in the Trumpian Post-Brexit Era"},{"content":"In late 2025 the Rule-based Epidemiology Modellings began to consider the wider context of their work. On the one hand, the techniques of epidemiology save lives at scale, but on the other, emerging diseases and newer health-related epidemics are accelerating. We asked: ”How can a scholar quickly grasp epidemiology basics?” This resulted in my paper discovering epidemiology.\nBeginning with what epidemiology is not, we see how disease management saved millions of lives from about 1950. Epidemiology matured through the 20th century but stalled in the early 21st. 
It became clear that epidemiology was insufficient, so political consensus was found to expand the scientific scope to One Health. One Health treats ecology, animals and humans as a system of systems across dozens of science fields, using the language of epidemiology. The newly-refocused World Health Organisation is committed to getting 2030 global health goals back on track with a One Health approach, with further millions of lives at stake. Many new One Health scientists and scholars are not epidemiologists, and this paper is for them.\nTimeline of epidemiology, stalling around 2010 ","permalink":"https://shearer.org/research/one-health-epidemiology/","summary":"\u003cp\u003eIn late 2025 the \u003ca href=\"https://codeberg.org/rbem\"\u003eRule-based Epidemiology Modellings\u003c/a\u003e began to consider the wider context of their work. On the one hand, the\ntechniques of epidemiology save lives at scale, but on the other, emerging diseases and newer health-related epidemics are accelerating. We asked: “How can a\nscholar quickly grasp epidemiology basics?” This resulted in my paper \u003ca href=\"/files/discovering-epidemiology.pdf\"\u003ediscovering epidemiology\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eBeginning with what epidemiology is \u003cem\u003enot\u003c/em\u003e, we see how disease management saved millions of lives from about 1950. Epidemiology matured through the 20th century\nbut stalled in the early 21st. It became clear that epidemiology was insufficient, so political consensus was found to expand the scientific scope to One Health.\nOne Health treats ecology, animals and humans as a system of systems across dozens of science fields, using the language of epidemiology. The newly-refocused\nWorld Health Organisation is committed to getting 2030 global health goals back on track with a One Health approach, with further millions of lives at stake. Many\nnew One Health scientists and scholars are not epidemiologists, and this paper is for them.\u003c/p\u003e","title":"One Health and Epidemiology"},{"content":"Patents and the MIT License Some of my software projects use MIT so I have studied this issue. Although in many respects the world has moved on from copyright wars to much higher-stakes legal shenanigans, the detail of licensing still matters.\nIn my case:\nMy LumoSQL project is based on probably the most-used software, SQLite, whose license states it is in the \u0026ldquo;Public Domain\u0026rdquo;. The meaning of this isn\u0026rsquo;t entirely clear in some cases, and a 21st century software project starting decades after SQLite shouldn\u0026rsquo;t copy this. I chose MIT as a commonly accepted alternative, but which license is that exactly, and what does the text imply about patents? This is known, but I had to dig. The MIT license is massively used, but who will defend it if needed? We know the answer for the GPL, and also Apache-type licenses. I am now satisfied that quite a lot of enormous organisations really do care about MIT. There are lots of reasons why MIT isn\u0026rsquo;t ideal, but in my view those are trumped by it being widely accepted as fit for purpose, and relied upon by organisations who care that it remains effective and unambiguous. My notes are mostly kept in my many contributions to the Wikipedia page on the MIT License since that is where the decades-old knowledge of the MIT license origins is already maintained. The legal minds in many of the largest companies in the world seem to accept that at least in the US the MIT license implies a patent grant. 
As probably the most-used open source license, the MIT license has many wealthy corporate defenders if anyone wanted to test that idea.\n","permalink":"https://shearer.org/notes/mit-license-patent/","summary":"\u003ch2 id=\"patents-and-the-mit-license\"\u003ePatents and the MIT License\u003c/h2\u003e\n\u003cp\u003eSome of my software projects use MIT so I have studied this issue. Although in many respects the world has\nmoved on from copyright wars to much higher-stakes legal shenanigans, the detail of licensing still matters.\u003c/p\u003e\n\u003cp\u003eIn my case:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eMy \u003ca href=\"/articles/lumosql/\"\u003eLumoSQL project\u003c/a\u003e is based on probably the most-used software, SQLite, whose license states it is in the \u0026ldquo;Public\nDomain\u0026rdquo;. The meaning of this isn\u0026rsquo;t entirely clear in some cases, and a 21st century software project starting decades after SQLite shouldn\u0026rsquo;t copy this.\u003c/li\u003e\n\u003cli\u003eI chose MIT as a commonly accepted alternative, but which license is that \u003cem\u003eexactly\u003c/em\u003e, and what does the text imply about patents? This is known, but I had to dig.\u003c/li\u003e\n\u003cli\u003eThe MIT license is massively used, but who will defend it if needed? We know the answer for the GPL, and also\nApache-type licenses. I am now satisfied that quite a lot of enormous organisations really do care about MIT.\u003c/li\u003e\n\u003cli\u003eThere are lots of reasons why MIT isn\u0026rsquo;t ideal, but in my view those are trumped by it being widely accepted\nas fit for purpose, and relied upon by organisations who care that it remains effective and unambiguous.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eMy notes are mostly kept in \u003ca href=\"https://sigma.toolforge.org/usersearch.py?name=DanShearer\u0026amp;page=MIT+License\u0026amp;server=enwiki\u0026amp;max=\"\u003emy many contributions\u003c/a\u003e to the \u003ca href=\"https://en.wikipedia.org/wiki/MIT_License\"\u003eWikipedia page on the MIT License\u003c/a\u003e since that is where the decades-old knowledge of the MIT license origins is already maintained. The legal minds in many of the largest companies in the world seem to accept that at least in the US the MIT license implies a patent grant. As probably the most-used open source license, the MIT license has many wealthy corporate defenders if anyone wanted to test that idea.\u003c/p\u003e","title":"Patents and the MIT License"},{"content":"I made a new website recently. My goals:\nModern-looking Easy to maintain, minimal infrastructure Content lasts indefinitely even as web technologies come and go I decided on a static website, with content in Markdown and a modest amount of templating. I chose the Hugo static site generator with the PaperMod theme, plus a second theme for CV-type timelines. I used bundled system fonts (no Google Fonts tracking by calling googleapis).\nI added small customisations using CSS and Hugo shortcodes including colour themes, a general timeline (in addition to the CV one), handy infoboxes and the like. Hugo makes this quite easy to achieve while still using mostly standard markdown. That bodes well for being able to move to other systems as the years roll on.\nThe full site source is at codeberg.org.\nI\u0026rsquo;ve used many templating systems over the years, including from before there were websites. 
If you think Hugo looks complicated, consider that the problem space is much harder than it looks at first, having a lot in common with writing a compiler.\n","permalink":"https://shearer.org/notes/how-this-site-is-made/","summary":"\u003cp\u003eI made a new website recently. My goals:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eModern-looking\u003c/li\u003e\n\u003cli\u003eEasy to maintain, minimal infrastructure\u003c/li\u003e\n\u003cli\u003eContent lasts indefinitely even as web technologies come and go\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eI decided on a \u003ca href=\"https://en.wikipedia.org/wiki/Static_web_page\"\u003estatic website\u003c/a\u003e, with content in \u003ca href=\"https://commonmark.org/\"\u003eMarkdown\u003c/a\u003e and a modest amount of templating. I chose the \u003ca href=\"https://gohugo.io/\"\u003eHugo\u003c/a\u003e static site generator with the \u003ca href=\"https://github.com/adityatelange/hugo-PaperMod/\"\u003ePaperMod\u003c/a\u003e theme, plus a \u003ca href=\"https://github.com/mfg92/hugo-shortcode-timeline\"\u003esecond theme\u003c/a\u003e for CV-type timelines. I used bundled system fonts (no Google Fonts tracking by calling googleapis).\u003c/p\u003e\n\u003cp\u003eI added small customisations using \u003ca href=\"https://en.wikipedia.org/wiki/CSS\"\u003eCSS\u003c/a\u003e and \u003ca href=\"https://gohugo.io/templates/shortcode-templates/\"\u003eHugo shortcodes\u003c/a\u003e including colour themes, a general timeline (in addition to the CV one), handy infoboxes and the like. Hugo makes this quite easy to achieve while still using mostly standard markdown. That bodes well for being able to move to other systems as the years roll on.\u003c/p\u003e","title":"How this site is made"},{"content":"My new website is nice enough, but it really needs work. I\u0026rsquo;m offering prizes!\nSmall fixes for wording, grammar or links \u0026mdash; my warmest thanks A page or more of such small fixes \u0026mdash; I will buy you the (non-outrageous) beverage of your choice A substantial improvement or correction consisting of a page or more \u0026mdash; a pizza from a mutally agreed place 10 non-trivial pull requests for the codeberg repository \u0026mdash; I\u0026rsquo;ll help you learn Linux, if that\u0026rsquo;s a thing you want Assistance to help me fix items from the following list \u0026mdash; prizes as per the above, based on scale/complexity Things to be added or improved:\nMedical research papers and implementation software Startup case studies, references and opportunities A whole section on AI and Causality my CV Design and functionality - tweaks to the layout and visual content Diagrams. I have a pile of \u0026ldquo;wanted\u0026rdquo; diagrams, and some in-progress Pictures. If you\u0026rsquo;re interested in swapping knowledge about the tech stacks involved then I\u0026rsquo;m up for that. I\u0026rsquo;m not any kind of web designer (I don\u0026rsquo;t even play one on TV), I just want it to be tidy and durable.\n","permalink":"https://shearer.org/notes/site-challenge/","summary":"\u003cp\u003eMy \u003ca href=\"https://shearer.org\"\u003enew website\u003c/a\u003e is nice enough, but it really needs work. 
I\u0026rsquo;m offering prizes!\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eSmall fixes for wording, grammar or links \u0026mdash; my warmest thanks\u003c/li\u003e\n\u003cli\u003eA page or more of such small fixes \u0026mdash; I will buy you the (non-outrageous) beverage of your choice\u003c/li\u003e\n\u003cli\u003eA substantial improvement or correction consisting of a page or more \u0026mdash; a pizza from a mutually agreed place\u003c/li\u003e\n\u003cli\u003e10 non-trivial pull requests for the \u003ca href=\"https://codeberg.org/danshearer/shearer.org-website\"\u003ecodeberg repository\u003c/a\u003e \u0026mdash;\nI\u0026rsquo;ll help you learn Linux, if that\u0026rsquo;s a thing you want\u003c/li\u003e\n\u003cli\u003eAssistance to help me fix items from the following list \u0026mdash; prizes as per the above, based on scale/complexity\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThings to be added or improved:\u003c/p\u003e","title":"Website challenge"},{"content":" This file is a Code of Conduct first written in 2020 for the LumoSQL project. Here is Version 1.6 – Updated 9th February, 2026. Heavily adapted and compressed from the large and repetitive version 3.1 of the Mozilla Participation Guidelines and published by LumoSQL under the Creative Commons Attribution-ShareAlike 4.0 International license.\nContents LumoSQL Code of Conduct This file exists because the LumoSQL Project needed it, less than one year after starting in 2019. We take it seriously, and hope that most English-reading adults can understand what is said. We hope this is not needed very often.\nThe LumoSQL Project welcomes contributions from anyone who shares the LumoSQL Project Goals. This document outlines both expected and prohibited behaviour.\nShort Summary The rest of this document expands on just three points:\nLumoSQL participants are to be respectful and direct with each other. We will not tolerate bullying, racism, sexism, or constant domineering behaviour. No personal attacks, and generally stay focussed on what we are trying to achieve. Who Should Feel Safe? Everyone, regardless of diversity dimensions including:\nGender identity or expression Age Socioeconomic status Sex or sexual orientation Family status Race and/or caste and/or ethnicity National origin Religion Native or other languages When Should They Feel Safe? Working with other LumoSQL community participants virtually or co-located Representing LumoSQL at public events Representing LumoSQL in social media What is Expected? The following behaviours are expected of all LumoSQL community participants:\nBe Respectful Value each other’s ideas, styles and viewpoints. Disagreement is no excuse for bad manners. Be open to different possibilities and to being wrong. Take responsibility, so if someone says they have been harmed through your words or actions, listen carefully, apologise sincerely, and correct the behaviour.\nBe Direct but Professional We must be able to speak directly when we disagree and when we think we need to improve. Try not to withhold hard truths. Doing so respectfully is hard, doing so when others don’t seem to be listening is harder. Hearing such comments directed at yourself can be even harder.\nBe Inclusive Seek diverse perspectives. Diversity of views and of people gives better results. Encourage all voices. Help new perspectives be heard and listen actively. If you find yourself dominating a discussion, step back and give other people a chance. 
Observe how much time is taken up by dominant members of the group.\nAppreciate and Accommodate Our Similarities and Differences Be respectful of people with different cultural practices, attitudes and beliefs. Work to eliminate your own biases, prejudices and discriminatory practices. Think of others’ needs from their point of view. Use preferred titles (including pronouns). Respect people’s right to privacy and confidentiality. Be open to learning from and educating others as well as educating yourself.\nBehaviour That Will Not Be Tolerated The following behaviours are unacceptable, as should be obvious:\nViolence and Threats of Violence Are Not Acceptable Offline or online, including incitement of violence or encouraging a person to commit self-harm. This also includes posting or threatening to post other people’s personal data (“doxxing”) online.\nDerogatory Language Is Not Acceptable Hurtful or harmful language related to any dimension of diversity is not acceptable.\nThis includes deliberately referring to someone by a gender that they do not identify with, or questioning an individual’s gender identity. If you’re unsure if a word is derogatory, don’t use it. When asked to stop, stop the behaviour.\nUnwelcome Sexual Attention or Physical Contact Is Not Acceptable This section is here because it has been proven to be needed where LumoSQL has been present. It is not some formality. If you don’t think it is needed in the world today, you have not been paying attention.\nUnwelcome sexual attention online or offline, or unwelcome physical contact is not acceptable. This includes sexualised comments, jokes or imagery as well as inappropriate touching, groping, or sexual advances. This also includes physically blocking or intimidating another person. Physical contact or simulated physical contact (potentially including emojis) without affirmative consent is not acceptable.\nConsequences of Unacceptable Behaviour Bad behaviour from any LumoSQL community participant can’t be tolerated. Intentional efforts to exclude people from LumoSQL activities are not acceptable (except as consequences according to this Code of Conduct).\nReports of harassment/discrimination will be promptly and thoroughly investigated by the people responsible for the safety of the space, event or activity, with a view to taking action.\nAnyone asked to stop unacceptable behaviour is expected to stop immediately. Violation of these guidelines can result in you being asked to leave an event or online space, either temporarily or for the duration of the event, or being banned from participation in spaces, or future events and activities.\nParticipants who abuse the reporting process will be considered to be in violation. False reporting, especially to retaliate or exclude, will not be accepted or tolerated.\nReporting If you believe you’re experiencing unacceptable behaviour as outlined above, please contact one of the current authors.\nAfter receiving a concise description of your situation, they will review and determine next steps.\nPlease also report to us if you observe someone else in distress or violations of these guidelines.\nIf you feel you have been unfairly accused of violating these guidelines, please follow the same reporting process.\n","permalink":"https://shearer.org/articles/code-of-conduct/","summary":"\u003cdiv class=\"article-intro\"\u003e\nThis file is a Code of Conduct first written in 2020 for the LumoSQL project. 
Here is Version 1.6 – Updated 9th February, 2026.\n\u003cp\u003eHeavily adapted and compressed from the large and repetitive version 3.1 of the Mozilla Participation Guidelines and published by LumoSQL under the Creative Commons Attribution-ShareAlike 4.0 International license.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch2 id=\"contents\"\u003eContents\u003c/h2\u003e\n\u003ch1 id=\"lumosql-code-of-conduct\"\u003eLumoSQL Code of Conduct\u003c/h1\u003e\n\u003cp\u003e\u003cstrong\u003eThis file exists because the LumoSQL Project needed it, less than one year after starting in 2019. We take it seriously, and hope that most English-reading adults can understand what is said. We hope this is not needed very often.\u003c/strong\u003e\u003c/p\u003e","title":"Code of Conduct"},{"content":"The detail of the GDPR and its implied computer science contain a solution for sharing secrets according to law. This continues to be true in 2026, as the Digital Omnibus Regulation takes shape.\nExecutive Summary\nThe GDPR sets up a conflict in trust between companies in particular circumstances, which can only be resolved by using the automation of a cryptographic audit trail with particular properties as described below.\nProblem Statement\nUnder the EU\u0026rsquo;s GDPR law virtually every company is a Controller, and virtually all Controllers use at least one Processor. When a Processor is engaged, the GDPR requires that a contract is signed with the very specific contents spelled out in clause 3 of Article 28. The GDPR requires that Controllers and Processors cooperate together in order to deliver data protection, and this cooperation needs to be very carefully managed to maintain the security and other guarantees that the GDPR also requires. That\u0026rsquo;s what this mandatory contract is intended to achieve.\nIn other words, the GDPR simultaneously requires strong cooperation and strong security between companies who don\u0026rsquo;t trust each other or have common goals. This is difficult to resolve, but it must be resolved because the parties are legally required to sign a contract to say it is resolved.\nWhat Does Article 28 Say?\nHere are a few highlights:\nThe processor: 3(a) \u0026hellip; processes the personal data only on documented instructions from the controller\n3(c) \u0026hellip; takes all measures required pursuant to Article 32 [about security]\n3(f) \u0026hellip; assists the controller in ensuring compliance with the obligations pursuant to Articles 32 to 36 taking into account the nature of processing and the information available to the processor;\n3(h) \u0026hellip; makes available to the controller all information necessary to demonstrate compliance with the obligations laid down in this Article and allow for and contribute to audits, including inspections, conducted by the controller or another auditor mandated by the controller\n\u0026mdash; full text of Article 28. About Controllers and Processors If you are familiar with the European Union\u0026rsquo;s GDPR, and the roles of the Controller and Processor, then you will be aware of the need for a formal agreement (usually a contract) with every Processor a Controller uses.\nEffectively, every company is at least a Controller of personal data, certainly if there are employees or customers. Most companies use at least one Processor for data under their control, from an IT support company, to online/cloud storage providers, to companies who consult and outsource in many ways. 
A contract between a Processor and a Controller is very carefully specified in legal terms in the GDPR, but the technology implications are not mentioned. This is all in the GDPR Article 28.\nAbout Sharing Access to Data 📌 The Problem Not sharing data, but access to the data - for example, does an employee of the Controller log on to the computer system of the Processor? And if so, how? This is the kind of scenario that puts shivers down the spine of security professionals, yet here it is mandated in the GDPR. Access to Data is Impractical to Share: Controllers and Processors could be almost any company using almost any system, so sharing the access to the personal data across organisations just wouldn\u0026rsquo;t work. Personal data is stored in a different way in every organisation - at a minimum, in different choices from the 200-odd main databases on the market, for instance, besides all the non-database systems in use, and the policies and habits unique to every company.\nAccess to Keys can be Practically Shared: No matter how diverse the storage mechanism for the personal data, the secret keys will always be one of a few types. Most often passwords, but also multifactor authentication codes, or cryptographic key files, or one of a small list of other means of authentication.\nNobody Wants to Share Keys: Article 28 says that these passwords or other keys need to be available for sharing between Controllers and Processors at any time. And yet no company is happy handing out passwords to their internal systems to unknown people, and anyway this could easily become a breach of the GDPR and the forthcoming ePrivacy legislation.\nWhere Computer Science Comes In When a Controller engages a Processor, there are many circumstances under the GDPR when these secret keys need to be shared between these parties, parties who should not trust each other. These circumstances include:\nconducting audits of privacy and security, as the GDPR anticipates the controller will need to do allowing the controller to know precisely who has access to the data residing on the processor\u0026rsquo;s systems allowing the controller to know precisely who has accessed the data on the processor\u0026rsquo;s systems allowing the controller to see when the list of users with access changes, and how it has changed Therefore handling of the keys to the personal data is of crucial importance. The law requires that you give some degree of access, perhaps a lot of access, to a company whom you have never met and have no reason to trust. 
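As a concrete taste of what controlled key-sharing can look like, here is a minimal sketch in Python of a hash-chained, append-only record of key-access events. Everything in it is an assumption made for illustration: the class name KeyAccessLog, the field names and the choice of SHA-256 are mine, not part of the GDPR or of any particular product. The point is only that each entry commits to the one before it, so either party, or an auditor, can later detect any alteration of the history of who was granted or used which key.

import hashlib
import json
import time

class KeyAccessLog:
    """Append-only log where each entry commits to the previous one (illustrative sketch)."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, actor: str, key_id: str, action: str) -> dict:
        # Each record names the previous digest, forming a tamper-evident chain.
        prev = self.entries[-1]["digest"] if self.entries else self.GENESIS
        record = {
            "timestamp": time.time(),
            "actor": actor,      # natural person acting for Controller or Processor
            "key_id": key_id,    # which password or key was touched, never the key itself
            "action": action,    # e.g. "granted", "used", "revoked"
            "prev": prev,
        }
        record["digest"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Anyone holding a copy of the log can detect tampering without trusting the writer."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "digest"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev or entry["digest"] != expected:
                return False
            prev = entry["digest"]
        return True

A real deployment would also need signatures, agreed timestamps and retention rules, but even this small structure records exactly the events listed above: who has access, who actually used it, and when the list changed. It is only one of several well-understood techniques.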
Computer Science has given us many ways to think about interacting with people we do not trust, so this is a problem that can be solved.\n📌 It's all about keys The central problem is access to the data, not the data itself Taken together, a Computer Science view of the various laws, regulations and regulatory bodies strongly implies that some kind of cryptographically guaranteed auditing process is needed to control the keys required to access personal data held by a processor.\nThe inputs to consider are:\nthe post-trilogue texts from the EU ePrivacy Regulation (officially cancelled November 2025, see Annex 4, Item 29 the EU Cybersecurity Act the European Communications Code the EU NIS Directive EU-level security and privacy bodies ENISA and the EU Data Protection Board Early drafts and discussions around the proposed Digital Omnibus Regulation All of the above is human rights-based legal pressure, so how should this be implemented in software?\nArticle 28(3)(c) says \u0026ldquo;takes all measures required pursuant to Article 32\u0026rdquo; so Article 32 is a part of the mandatory contract, and Article 32 is about security, and security has some pretty clear definitions - or at least, unambiguous definitions about what is not secure, as well as some fairly well agreed minimum standards as to what is required for good security. There will always be some degree of ambiguity about whether something is secure, but you only have to competently show that something is insecure once for experts to agree it is indeed insecure.\nTo discover exactly what kind of cryptographic solution will work, we need to look at the information flows mandated by the GDPR.\nGDPR Article 28 Information Flow A close reading of the mandatory instruments (normally contracts, but not necessarily) in GDPR Article 28 shows that the required flow of information between Controllers and Processors is almost entirely one way, from the Processor to the Controller. The Processor has to make numerous undertakings and promises to the Controller, stated in a legally binding manner.\ngraph LR controller((Data Controller .)) == \"sends audit requests to .\" ==\u003e processor((Data Processor .)) processor -- \"replies with audit answers .\" --\u003e controller processor -- \"informs of breach .\" --\u003e controller classDef green fill:#9f6,stroke:#333,stroke-width:2px; classDef orange fill:#f96,stroke:#333,stroke-width:4px; classDef blue fill:#99f,stroke:#333,stroke-width:2px; class controller green class processor blue class i1 orange There is a lot of mandated potential communication from the Processor to the Controller, meaning that in various circumstances, there will be communication from the Controller to the Processor if the Controller wishes and the Processor has no right to deny that access. At any time the Controller can demand the Processor produce information to prove that processing is compliant, or to require the Processor to assist the Controller in certain activities. The Controller is bound by the GDPR to be able to prove at all times that processing is compliant whether or not a Processor has been engaged.\nWhen the Controller is a small company of 20 employees, and the Processor is a giant cloud with thousands of employees, it is clear that this power dynamic is not currently set up for the small company to require the giant cloud to comply with audit requests. 
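To make that one-way flow concrete, the sketch below, again in Python and using the widely deployed cryptography package, has the Processor sign every audit answer so that the Controller, or a supervisory authority acting on its behalf, can verify that a reply is genuine and unaltered without trusting the Processor's word for it. All names such as answer_audit_request are invented for illustration; this is a sketch of the signing idea under stated assumptions, not a compliance recipe.

import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Processor side: a long-lived signing key whose public half is shared when the contract is signed.
processor_key = Ed25519PrivateKey.generate()
processor_public = processor_key.public_key()

def answer_audit_request(request_id: str, answer: dict) -> dict:
    """Processor produces a signed, self-contained audit answer."""
    payload = json.dumps({"request_id": request_id, "answer": answer},
                         sort_keys=True).encode()
    return {"payload": payload, "signature": processor_key.sign(payload)}

def controller_accepts(reply: dict) -> bool:
    """Controller (or a regulator given the same public key) checks the answer
    really came from the Processor and was not altered in transit."""
    try:
        processor_public.verify(reply["signature"], reply["payload"])
        return True
    except InvalidSignature:
        return False

reply = answer_audit_request("2026-03-audit-01",
                             {"persons_with_key_access": ["alice@processor.example"]})
assert controller_accepts(reply)

Ed25519 is used here only as a convenient modern signature scheme; any widely practised signature algorithm gives the same property, namely that the Processor cannot later disown an answer and nobody can quietly edit it.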
Giant clouds are required to respond, but most of the time its going to take a lot of effort to force practical compliance.\nRelationship of the Parties to Article 28 Contracts Basic security practice is that the parties to such information flows should not trust each other; they are independent entities who in many cases will not have any other dealings. In addition, each are under very strict legal requirements of the GDPR and the (imminent) ePrivacy Regulations, and the (imminent) EU Electronic Communications Code.\nArticle 28(1) says \u0026ldquo;the controller shall use only processors providing sufficient guarantees\u0026rdquo;. According to the computer science referred to in this article, it is possible to define a minimum value of \u0026ldquo;sufficient guarantee\u0026rdquo; under the GDPR, but even without that analysis, the Controller must seek some guarantees from the Processor and they need to be not only good guarantees but sufficient to back up the rest of Article 28.\nThis means that parties to Article 28 contracts are required to meet a particular standard, but also that the parties should not trust each other to meet this standard or any other good behaviour.\nArticle 28 is All About Processors Article 28 is all about the Processor being bound to the Controller, with the Controller saying and doing nothing outside what is already said in the rest of the GDPR text. The only reference to a Controller in Article 28 is that the contract must \u0026ldquo;set out the obligations and rights of the controller\u0026rdquo; (Art 28(3)) which appears to mean effectively stating \u0026ldquo;Yes I acknowledge I am a Controller and I am acting according to the GDPR\u0026rdquo;.\nThere are just two references in the entire GDPR requiring the Controller taking action with respect to using a Processor. The first is ensuring that there is a contract in place that complies with the GDPR. The second is in Article 32(4), which says \u0026ldquo;the controller and processor shall take steps to ensure that any natural person acting under the authority of the controller or the processor who has access to personal data does not process them except on instructions from the controller\u0026rdquo;.\nTechnical Comments Article 32 emphasises the phrase \u0026ldquo;state of the art\u0026rdquo;, an English expression that has caused much confusion. The phrase is only ambiguous within the confines of English, and since the GDPR is authoritative in multiple languages we can easily compare with German, French and Dutch and see that multiple versions all agree with one of the English meanings. Therefore \u0026ldquo;State of the art\u0026rdquo; means \u0026ldquo;the art as it is practiced today in general\u0026rdquo;, as widely practiced by people who can be described as \u0026ldquo;technical peers\u0026rdquo;, and also as defined by standards bodies and the like. \u0026ldquo;State of the Art\u0026rdquo; does not mean the best, most efficient or most advanced technology in existence. It does not mean the most recent. 
Here in this discussion we consider technologies mostly developed decades ago and very widely recommended and deployed today, which are still definitely \u0026ldquo;state of the art\u0026rdquo;.\nTechnical Analysis About Audit Records: A log file (Unix) or an EventLog (Windows) is not a quality audit record; it has often been held to be a sufficient audit record in courts worldwide, but in that context it is about balance of probabilities and taking into account other log entries created on other systems at the same time - basically a detective hunt by an expert witness. That sort of thing is an audit process but not a good one and typically only ever involves one logging party. The GDPR Article 28 contract requires that there shall be at least two parties to the audit trail whose actions will be logged, which has not been the case in any law previously. The new EU security and privacy laws use the words \u0026ldquo;appropriate\u0026rdquo;, \u0026ldquo;should\u0026rdquo; and \u0026ldquo;state of the art\u0026rdquo; so much that I think it is non-controversial that the audit standard required is much higher. There needs to be a cryptographically guaranteed, non-repudiable audit trail for activities where none of the actors involved (including auditors) need to trust each other, and no special expertise or context is required to interpret the record.\nTechnical Analysis About Keys: A key of some sort is always required to get access to personal data, be it a password, passphrases, physical door pinpad code, two factor authentication or whatever else guards the access to the systems with personal data on it. The Article 28 mandated contract specifies that under many circumstances a Controller and a Processor release keys to each other and therefore to natural persons in the employ of each other. By auditing the use of the keys, we are auditing the access to personal data. In order to remain in compliance with Article 32, we can change passwords/keys at any time, reset the list of authorised persons and therefore also resetting the audit trail. A cryptographically secured audit facility can detect the first time that someone accesses a key.\nTechnical Analysis About the ePrivacy Regulation: I have tracked down the different versions presented for Trilogue, which has now finished. ePrivacy following Trilogue appears to include EU Parliament LIBE Committee amendments from October 2017, including Article 26(a) “In order to safeguard the security and integrity of networks and services, the use of end-to-end encryption should be promoted and, where necessary, be mandatory. Member States should not impose\u0026hellip; backdoors\u0026quot;. If we are having an audit facility for keys to personal data then it should be end-to-end. Like all end-to-end solutions it will upset government spy agencies or any other party that might want to falsify the record through government-imposed backdoors, because such backdoors cannot work according to mathematics.\nTechnical Analysis About the EU Code of Communications: The Code is broader than ePrivacy (which, it can be argued, is limited by its lex specialis relationship to GDPR.) 
The Code says: \u0026ldquo;In order to safeguard security of networks and services, and without prejudice to the Member States\u0026rsquo; powers to ensure the protection of their essential security interests and public security, and to permit the investigation, detection and prosecution of criminal offences, the use of encryption for example, end-to-end where appropriate should be promoted and, where necessary, encryption should be mandatory in accordance with the principles of security and privacy by default and design.\u0026rdquo; We know from Snowden and others that the \u0026ldquo;without prejudice\u0026rdquo; phrase is just being polite, because there is no technical means to implement \u0026ldquo;no backdoors end-to-end crypto\u0026rdquo; and also not make government spy agencies upset.\nMinimum Audit Records Required by Article 28 Detail of Required Audit Records, with their basis in law:\nRequirement 1 - Audit records that list all natural persons who have access to keys to the personal data, and the changes to that list over time:\nArticle 28(2) \u0026ldquo;shall not engage another processor\u0026rdquo;, so everyone can see whether or not an unexpected person was authorised for access to keys Article 32(4) \u0026ldquo;any natural person acting under the authority of the controller or the processor who has access to personal data\u0026rdquo;, so we need an audit log of who can have access to keys Article 32(4) \u0026ldquo;any natural person acting under the authority of the controller or the processor \u0026hellip; does not process them except on instructions\u0026rdquo;, so we need an audit log of who actually did access the keys at least once.\nRequirement 2 - Audit records for who has accessed the audit records in Requirement 1:\nArticle 28(3) \u0026ldquo;obligations and rights of the controller\u0026rdquo;, shows the controller is watching the processor\nThese audit records can be technically implemented regardless of what IT systems the Controller and the Processor have, because they are only about dealing with the keys. Whoever has the keys has the personal data, and the keys themselves are protected by the GDPR. These audit records are about storing passwords (or other access means.)\nComputer Science doesn\u0026rsquo;t seem to allow any way of meeting Article 28 \u0026ldquo;sufficient guarantee\u0026rdquo; without a zero-trust encrypted audit model, which these types of audit records enable.\nConclusions About Implementing GDPR Article 28 in Software Conclusion 1:\nthe above minimum audit records are required to fulfil an Article 28 contract between Processor and Controller Conclusion 2:\nif implemented, these records rise to an Article 28(1) \u0026ldquo;sufficient guarantee\u0026rdquo; of a Processor being acceptable and therefore the contract being acceptable Conclusion 3:\nthere does not seem to be any alternative way of achieving a \u0026ldquo;sufficient guarantee\u0026rdquo;. Conclusion 4:\nThe GDPR requires cryptographic audit facilities to exist and therefore there is a market for companies to provide these facilities.\n","permalink":"https://shearer.org/articles/opportunity-in-gdpr-article-28/","summary":"\u003cp\u003eThe detail of the GDPR and its implied computer science contain a solution for sharing secrets according to law. 
This continues to be true in 2026, as the \u003ca href=\"https://digital-strategy.ec.europa.eu/en/library/digital-omnibus-regulation-proposal\"\u003eDigital Omnibus\nRegulation\u003c/a\u003e takes\nshape.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExecutive Summary\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe GDPR sets up a conflict in trust between companies in particular circumstances, which can only be resolved by using the automation of a cryptographic audit trail with particular properties as described below.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eProblem Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eUnder the \u003ca href=\"https://gdpr.eu/\"\u003eEU\u0026rsquo;s GDPR\u003c/a\u003e law virtually every company is a Controller, and virtually all Controllers use at least one Processor. When a Processor is engaged, the GDPR requires that a contract is signed with the very specific contents spelled out in clause 3 of Article 28. The GDPR requires that Controllers and Processors cooperate together in order to deliver data protection, and this cooperation needs to be very carefully managed to maintain the security and other guarantees that the GDPR also requires. That\u0026rsquo;s what this mandatory contract is intended to achieve.\u003c/p\u003e","title":"Opportunity in GDPR Article 28"},{"content":" When Linux was a Struggling Challenger 💡 Key Point This is a 2026 restoration of my (Dan Shearer\u0026rsquo;s) 1998-2001 guide, preserved at archive.org. Links have been updated to point to the archives where possible. In 1999 I joined my first startup, Linuxcare in San Francisco. The Linuxcare story is a quintessential United States dot-com bubble narrative, featuring a famous venture capital fund, massive growth, a failed IPO, and a fancy new ex-IBM CEO resigning under a cloud. Founded in 1998, Linuxcare aimed to be the \u0026ldquo;0800 number for Linux\u0026rdquo;. So close!\nAs Samba co-founder I had experience migrating corporates away from Microsoft servers, and brought my how-to with me to Linuxcare. Microsoft was extremely sensitive to documents like this, and downright allergic to the word \u0026lsquo;replace\u0026rsquo;. I found out years later this article was targetted for disinformation in key Microsoft accounts. This was a battle of narratives as well as monopolistic behaviour and technical facts.\nMore than a quarter of a century later, the technical content has aged fairly well. Quite a lot of the migration solutions were easier than they would be now, because quality of the Microsoft/Windows solutions was so poor. Many of the monopolistic facts remain much the same, which is slightly depressing. But some things were unimaginable in 1999:\nSamba both succeeded and failed but Samba (or something like it) may have another chance The EU is serious about alternatives Exciting times! But not for Linuxcare, which had spectacular human failures and quietly collapsed from 300 staff to 3 within a couple of years.\nHow to Replace Windows NT with Linux by Dan Shearer (dan@linuxcare.com)\nApril 26, 2000\nTable of Contents\nIntroduction Linux Adoption is Nothing to be Afraid of How to Migrate A Migration Methodology Do it! Appendix - Protocol and Software Reference Glossary of Terms and Acronyms 1. Introduction Most IT managers already understand the \u0026ldquo;why\u0026rdquo; of Linux and Open Source, and many are considering adopting Linux. Microsoft Windows NT network administrators are now facing a forced migration to Windows 2000. 
For many, however, a migration to Linux makes more sense. Careful planning is needed in order to manage such a migration responsibly. How costly will such a migration be? How difficult? How time-consuming?\nThis paper is about the \u0026ldquo;how\u0026rdquo; of Linux, concentrating on the challenges involved with migrating large and heterogeneous network environments. (If you need more on the \u0026ldquo;why\u0026rdquo; aspect, consult the papers and case studies at www.unix-vs-nt.org.) Replacing Windows NT is not always quick and easy (although it can be), but the return on a sound Linux investment is always worth the effort.\nA methodology is presented here which will help you plan a migration that causes minimal disruption while providing maximum functionality. Microsoft has attempted to complicate these tasks by closing once-open technologies (\u0026ldquo;embrace and extend\u0026rdquo; is what they call it). All they have succeeded in doing, however, is providing network managers with an even stronger incentive to adopt Linux. In this paper you will find pointers to tools that allow truly open standards to be gradually deployed in a mixed Linux/NT environment, making it a simple step, when the time is right, to eliminate Windows NT altogether.\n2. Linux Adoption is Nothing to be Afraid of Those who have been doing Microsoft or PC networking for a few years have probably experienced many previous migrations. Perhaps you have migrated your systems from Digital Pathworks or 3Com 3+Share to IBM/Microsoft LAN Manager, then later from LAN Manager to Windows NT 3.1 (if you were brave) or to Windows NT 3.51 (if you were not). Later, you possibly migrated to NT 4, and from there to every service pack. Most NT 4.0 service packs were, in effect, major system upgrades frequently resulting in unforeseen difficulties and requiring careful testing and planning. If you started from a Netware or Banyan base and moved to NT, you had equally large headaches. Let\u0026rsquo;s not even talk about Apricot\u0026rsquo;s idea of networking. If you run Windows NT today then you are facing the spectre of an expensive and forced migration to Windows 2000.\nMigrating to Linux is a task of equal scale. The need to train support staff, to test the new solution, to preserve data from previous systems, to transfer user accounts and check access permissions\u0026ndash;all of these are the same. On the other hand, migrating to Linux is easier in many ways because reliable support is available. With Linux, \u0026ldquo;reliable support\u0026rdquo; means not only being able to get the help you need to solve your current problems, it also means that you are empowered to prevent such problems from happening again in the future.\nPerhaps the most attractive thing about a migration to free and open source software is that the skills you pay to develop are actually a very solid investment. Every operating system supplier claims this, but think of it this way \u0026ndash; what is all that expensive Windows NT training worth now that Windows 2000 is here? And was it you or Microsoft who decided when those skills would become obsolete? Linux skills remain applicable for as long as you choose to have software around, and there is rarely any need to upgrade more than a few components at any one time.\nWindows 2000 forces you to a new directory scheme, a complete new suite of mail, Internet, and other servers, and also demands enormous hardware resources. What degree of pain will Windows 3000 impose? 
In comparison, Linux offers a very attractive migration path.\n3. How to Migrate If you are reading this document, you probably already know why you should migrate to a Linux-based system. It\u0026rsquo;s the \u0026ldquo;how\u0026rdquo; of doing such a migration that can often be overwhelming. Here are some quick tips to keeping the scope of the task to a manageable scale:\nDon\u0026rsquo;t migrate everything at once. Frequently, the best way to handle a migration is to phase NT out of the server area first, then to later concentrate on the workstations. There are, of course, many other ways to divide the task into more palatable pieces. Some people pick classes of server applications (such as web, database, file/print) and address each of these in turn. Others choose to have a policy of maintaining dual environments on the desktop.\nAvoid application development. It is always tempting to fix obviously bad programs during a migration. It is far better, however, to have multiple stages in a migration, between which you can address application issues. The key here is to avoid trying to do everything at once.\nLinux does more, so use its capacities. Doing a cautious and well-planned migration doesn\u0026rsquo;t mean that you have to lose functionality. Linux can do things that are impossible with NT and other systems, and can also save you both time and money.\nUse fewer, more open, protocols. The larger the number of protocols you use in your networks, the larger the network management overhead. While \u0026ldquo;open\u0026rdquo; can be difficult to define precisely, you can be fairly certain that if every part of a protocol is documented and there are free implementations available, then chances are that it\u0026rsquo;s open. If a protocol is described in one of the Internet RFC documents, that\u0026rsquo;s another good indication that the protocol is open.\n4. A Migration Methodology There are four steps you can follow to simplify a migration away from Windows NT. The first three of these steps show you how your data is currently being accessed, and also how this data can be accessed differently. The final step provides a Venn diagram illustrating possible deployment options.\nThe four steps are:\nList your most important data stores, including those administered by users and those administered by network managers.\nList the various client programs that are currently used to access these data stores.\nList the protocols and APIs the client software uses when accessing these stores.\nPrepare a \u0026ldquo;protocol intersection\u0026rdquo; diagram.\nThese steps are protocol and API driven, and will allow you to map a variety of migration paths from the \u0026ldquo;ideal path\u0026rdquo; to those which are restricted by various constraints (such as having to be able to run a particular Windows application). Once you know all the possible routes you can take, you will then be better able to select those which are most appropriate for you and your organization.\n4.1. Identify Data Stores\n4.1.1. User-Maintained Data Stores\nChances are that your users keep lots of data sources up-to-date. Some of these sources may be located on a workstation or a server, and some these are likely to be \u0026ldquo;unofficial\u0026rdquo;. If these data sources stop working, of course, you will be in trouble. Some examples of these sources are:\nEmail, often one of the most important business resources in a company. 
Email archives contain huge amounts of information, and users have probably put a lot of effort into using features of their mail clients, such as personal address books and mail filters. In what formats are these data sources stored? What mail servers and protocols are in use? What authentication methods are they using?\nCalendaring and scheduling. Mail services are often bundled with collaborative scheduling systems (such as in a Microsoft Exchange environment), and these can be among the most challenging systems to migrate due to the lack of standards for these features.\nFile resources. Users often have huge amounts of data stored as collections of Microsoft Office documents on an NT or Netware server. Information stored in this fashion is often vital, but can difficult to search and migrate.\nYou might consider re-engineering large filestores like these (but not as part of your migration) Look at the structure of the documents. If extensive use has been made of Microsoft Office, WordPerfect, or other such templates, then it is quite likely that the same functionality can be delivered more reliably and cheaply using a Web forms interface to a database. In some cases, this can eliminate the need to have an office suite on the client systems, particularly those used by telephone sales or customer service staff.\nDatabases. On the server side, these include packages such as Oracle and Microsoft SQL Server, and on the client side, packages such as Microsoft Access, xBase, and more. The goal is to maintain the same interface for the users who keep these databases up-to-date, which is often a service that keeps the company running. A long-term strategy may be to move the interface to the Web, but in many cases the short-term answer is to retain the Windows client interface while re-engineering the protocol/API used to access the database itself.\nWeb servers. There are three kinds of web data to consider:\nRaw content. Web site content maintainers need to know that their current content editing programs will still be usable after the migration is complete. This usually means that Windows programs such as FrontPage and PageMill must continue to work. What information is stored in these formats? How is this information accessed?\nDynamic content. Your Web developers also need to know whether their NT-specific scripts and applications will change. Often the answer is \u0026ldquo;no\u0026rdquo;, or \u0026ldquo;not much\u0026rdquo;. NT users of PHP and Perl should be almost completely insulated from changes. Sometimes, when complicated functionality is required (perhaps because business logic has been embedded in Microsoft ActiveX objects or other proprietary technologies) the same functionality can be emulated using standard open technologies. You will probably be able to split the functionality up and replicate the majority of it on Linux Web servers. The remaining functions can stay on NT systems until you have time to replace them with open solutions.\nDynamic content from other sources. Dynamic Web sites often pull their data from many sources, often in Microsoft-specific ways. List the data sources being used and the methods being used to access them.\n4.1.2. Data Stores Maintained by Network Managers\nThe following are examples of data that might be maintained by your network or system administrators, including user and machine information. 
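\nBefore cataloguing these administrative stores in detail, it can help to see what an existing NT server will admit to holding. The following is a minimal sketch only, using the Samba client tools from a Linux machine; the server name NTSRV1 and the administrator account are hypothetical placeholders, and the exact rpcclient sub-commands vary between Samba releases:\nsmbclient -L NTSRV1 -U administrator      # list the shares the server currently offers\nrpcclient -U administrator NTSRV1 -c enumdomusers    # dump the domain user list held in the SAM (recent Samba versions)\nNeither command changes anything on the server, so both are safe to run against production systems while building an inventory.\n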
You will likely have more and different data sources than those presented here, particularly if you support many other operating systems on your network.\nUser database. This would include the name and full details for each network user, and their associated security properties. Windows NT servers store this information in one SAM database per domain.\nGroups and permissions. This information is also stored in the SAM database, but is often replicated in supplementary databases because SAMs have a restricted set of fields relating to groups.\nComputer and network database. Every computer has certain physical and network properties which need to be maintained in a central data store of some sort. Windows NT servers don\u0026rsquo;t tend to store this information at all except through unreliable NetBIOS names and per-machine SID numbers. Good Windows NT network administrators usually build custom databases in which they can more reliably store this machine-specific information, including IP addresses, physical locations, and other related data.\nBackup archives. These will be maintained in some NT-specific format, frequently devised by a third-party software vendor. The native Microsoft backup facilities aren\u0026rsquo;t very useful, so this third-party software is often necessary.\nServer logs. Windows NT access logs are unwieldy, and are rarely authoritative in a multi-domain environment. If you want to migrate this functionality, you will be quickly and pleasantly surprised by the log-management tools that come with Linux.\n4.2. List Current Client Software\nWhile there is a huge range of client software available for Microsoft Windows workstations, there are actually fewer than 10 suppliers providing the majority of the applications used in large networks. Bundling arrangements with a few top-tier suppliers such as Microsoft, SAP, Lotus, and Oracle mean that solving client migration problems with these vendors\u0026rsquo; systems usually solves the majority of other client problems as well.\nThe lack of drop-in replacements for some client software (especially Microsoft clients) is not usually a problem. The protocols these clients use can be catered to by Linux servers, so a multi-stage migration interspersed with some client re-engineering usually provides a sufficient solution. In any case, few sites migrate client workstations to Linux immediately; delaying this step also delays training and other human resource issues.\nThe easiest way to start planning client system migration is to construct a table, such as the following, which addresses the specific requirements of your organization.\nMicrosoft Windows Client Software\nProduct | Purpose | Can Use Linux Servers? | Functional Replacements | Linux Versions?\nMS Outlook Express (several concurrent versions exist with different feature sets) | Individual and Shared Email, Scheduling | Yes | Many, including Lotus and Netscape | None\nNetscape Messenger | Individual and Shared Email | Yes | Any Internet-compliant Mail Client | Yes\nMS Frontpage | Publishing web pages, including image maps and CGIs | Yes | Very many | None\nMS Internet Explorer | Viewing Web Pages | Yes | Many | Not yet, but runs on other Unix platforms\nMS Office | Edit structured documents and spreadsheets | Yes | Several good existing and more announced | No\nWeb-based Customer Relationship Management package | Organisation-wide CRM tasks | Yes | Not an issue since it is web-based | Yes, any Linux web browser with Javascript such as Netscape or Opera\nIn-house MS Access Database | Maintaining vital database kept on a file server | Yes, by several means | Many, including Oracle, web front-ends and xBase, but requires rewriting the client | None\nIn-house Oracle Client Program | Maintaining vital database kept on Oracle Server | Yes | As above; must rewrite client program | Announced\nRemote Windows Application Display | Displaying screens from Windows NT, Terminal Server Edition | No | X Window remote application display | No\n4.3. List Protocols and APIs Used by Client Software\nThe following is an example of a list you might create when recording the protocols relevant to the Microsoft client software used within your system. Most networks will use most of the protocols shown below, but there may be a few used on your network that aren\u0026rsquo;t included here. When you\u0026rsquo;re not sure what protocols are being used by a particular system, you should use a network sniffer to identify them rather than relying on the product brochures.\nThe interesting thing about this list is that nearly all non-standard Microsoft technology is based either on something that already exists, or on something that is documented at a lower level. Microsoft\u0026rsquo;s \u0026ldquo;embrace and extend\u0026rdquo; policy is meant to eliminate competition, but it has also enabled and motivated teams of programmers to unscramble the Microsoft protocol extensions at roughly the same rate that Microsoft devises them.\nWhat this means is that while you should make every effort to move networks entirely to open and standardised protocols that are not controlled by Microsoft, there are some excellent bridging solutions available which implement Microsoft\u0026rsquo;s proprietary protocols under Linux.\nNot all of the protocols Microsoft uses are proprietary, of course. In many instances, the non-standard protocols are simply preferred by Microsoft clients when talking to Microsoft servers. These systems can often be easily reconfigured to use standard open protocols when necessary. Outlook Express is a classic example of this: IMAP is supported quite extensively, but the client is unable to connect to both an IMAP server and a native Exchange server at the same time, even if the Exchange server is running the IMAP service.\nIn the following tables, \u0026ldquo;MSRPC\u0026rdquo; means Microsoft\u0026rsquo;s preferred method of communicating control data in NT networks: DCE/RPC over Named Pipes over SMB over NetBIOS. All of these acronyms are explained in the glossary, although for practical purposes how it works is irrelevant.\nSimilarly, \u0026ldquo;MSRDP\u0026rdquo; means Microsoft\u0026rsquo;s equally complicated way of sending screen images over a network, such as from Microsoft Windows NT Terminal Server Edition.
This protocol is a proprietary variant of T.SHARE (ITU T.128), over the Multipoint Communications Service (MCS), over the ISO Data Transport Protocol, tunneled over TCP.\nProtocols Preferred by Microsoft Products\nPurpose | Preferred Protocol/API | Documented\nMS Outlook Express clients to talk to MS Exchange Server | MAPI streamed over MSRPC | Encrypted in an undocumented way\nFrontpage clients to talk to MS Internet Information Server | FrontPage Server Extensions | Undocumented\nMS Internet Explorer clients to talk to MS IIS | Extensions to HTTP and HTML | Undocumented\nMS Access clients to communicate with MS SQL Server | ODBC streamed over Tabular Data Stream (TDS) | Extended ODBC, TDS undocumented\nMS clients to talk to NT Servers for anything related to the SAM, authentication or administering NT services | Control requests via MSRPC | Undocumented requests \u0026amp; undocumented encryption\nMS File/Print clients to transfer files to any MS File/Print server | SMB (NT clients use MSRPC) | Partly documented\nMS clients to locate MS servers and clients | NetBIOS Name Server | Partly documented\nTransport for previous three protocols | NetBEUI, always preferred when present and possible | Documented, but a dying MS protocol. A free version has been released for Linux by Procom, but it is too early to predict what will happen with it\nMS clients to link NetBIOS names and Internet names and addresses | WINS | Mostly documented\nMS clients to access remote Windows screens | MSRDP | Built on existing standards with proprietary extensions\nProtocol Equivalents and Implementations\nProtocol | Free Implementation | Open Alternative | Comments\nMAPI streamed over DCE/RPC for mail stores | No | IMAP mail access protocol and related standards | The Cyrus mail suite and related products at asg.web.cmu.edu are an excellent and scalable replacement for Microsoft Exchange\nMAPI streamed over DCE/RPC for calendaring | No | ACAP calendar access protocol and related standards | If you want to keep Outlook Express Calendaring and Scheduling you can use HP Openmail for Linux\nFrontPage Server Extensions | Yes, by Mrs. Brisby, www.nim.org/fpse.shtml | WebDAV, www.webdav.org | FPSE is only needed with Microsoft FrontPage\nMS Extensions to HTTP and HTML | No | Yes | Important bits implemented in browsers on Linux and Windows from Netscape, Opera and others. Users won\u0026rsquo;t miss SMB-in-HTTP\nODBC streamed over TDS | Yes, www.freetds.org | There seems to be a general lack of standards | Better to use ODBC over a truly open transport\nNT control requests over DCE/RPC | Yes, in Samba | SNMP, which has numerous Linux implementations, plus a large range of web control tools | Undocumented requests \u0026amp; undocumented encryption \u0026ndash; a truly horrible protocol\nMS File/Print clients to transfer files to any MS File/Print server | Yes, in Samba (server) and smbfs/smbclient (clients) | \u0026ndash; | Partly documented. A solved problem\nMS clients to locate MS servers and clients | NetBIOS Name Server in Samba | Internet standard Resource Location Protocol, or alternatively LDAP | Only partly documented, but well-implemented in Samba anyway\nTransport for previous three protocols (NetBEUI, always preferred when present and possible) | Yes, as of March 2000 there is a free Linux implementation from www.procom.com | \u0026ndash; | Documented, but a dying MS protocol. Even Microsoft doesn\u0026rsquo;t recommend it\nMS clients to link NetBIOS names and Internet names and addresses | Samba WINS server | Use DNS instead! | Mostly documented\n4.4.
Draw a Protocol Intersection Diagram\nUsing the tables that you have drawn up in the previous steps, you should be able to list the following (see the Protocols and Software Reference for more information):\nThe set of protocols/APIs that can be used to make the existing client software talk to servers (whether currently in use or not).\nThe set of protocols that free server software can use to serve the existing data stores.\nThe set of protocols free client software can use to access information from the data stores.\nA Sample Protocol Intersection Diagram\n5. Do it! Once you understand where your data is and how it can be accessed, you will be able to draw up a feasible multi-stage migration plan. This is always highly specific from site to site, but if you follow the tips given earlier in this paper you will be able to design a staged migration based upon more open and standardised protocols.\nAfter this point, however, the migration is up to you and will depend heavily on your knowledge of the network. Which parts of your infrastructure can be most easily migrated? It may be the file servers or perhaps the Oracle databases. Are there some performance bottlenecks that Linux can solve for you? If so, perhaps these are the first areas you should address.\n6. Appendix - Protocol and Software Reference Many of the software packages in this reference run on most kinds of Unix, as well as on Linux, without modification. Where you see \u0026ldquo;Unix\u0026rdquo; in the following table, you should therefore include \u0026ldquo;Linux\u0026rdquo; as well.\nThe acronyms in this section are explained in the Glossary.\n6.1. File Serving\nProtocol Software Microsoft: SMB suite Windows 95, 98, Samba on Unix and others, print servers, Netapp filers et al SMB+NT extensions Windows NT, Windows 2000, Samba on Unix and others Novell: IPX/SPX suite Novell Netware, mars_nwe under Linux Unix: NFS nfsd - standard with any Unix AFS Andrew Filesystem - free distributed filesystem for Unix Coda free distributed filesystem with mobile synchronisation FTP Servers available for any Internet-capable operating system Apple: Appleshare Apple file server from Apple, netatalk for any Unix The Microsoft model of networking encourages use of file sharing rather than application sharing. That is to say that every workstation has a complete copy of an application binary stored locally while data is stored on servers. This is the most common use for NT servers. Microsoft Windows Workstations are often also used similarly with Novell Netware. If this describes your situation, then you would do well to think about accessing the same data via the Web.\nLinux is able to serve files over all of these protocols. If required, Linux can serve files over all of them simultaneously. Configured properly, Samba running on Linux is able to perform as an SMB server at least as well as Windows NT. On large installations (ie on hardware more powerful than anything Windows NT can run on) Samba happily handles hundreds of thousands of simultaneous SMB clients.\nFew of these protocols are suitable for general Internet use due to timing and resource location issues. Currently, there is no widely-adopted file access protocol with is simultaneously secure, able to operate between physically distant machines, and easy to integrate into modern authentication architectures.\n6.1.1. Migration Comments\nIn some cases, a simple redesign of your application structure may allow you to dispense with file sharing. 
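\nOne common redesign of this kind is simply to publish a former file share through the intranet web server, so that a browser replaces the drive mapping. The following is a minimal Apache sketch only; the /reports name and the directory path are hypothetical placeholders, and the access-control directives shown are the Apache 1.3 style (Apache 2.x syntax differs slightly):\nAlias /reports/ /home/data/reports/\n\u003cDirectory /home/data/reports\u003e\n    Options Indexes\n    Order allow,deny\n    Allow from all\n\u003c/Directory\u003e\nUsers then reach the documents with any web browser, and nothing on the Windows desktops needs to be reconfigured at all.\n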
For example, making data accessible via the Web rather than through proprietary Microsoft Office files. Regardless, however, duplicating Windows NT shared file resouces on Linux is trivial. The challenge lies in getting the authentication systems right, as discussed below. The PAM authentication system allows a very flexible migration strategy to be adopted, independent of whether the authentication database is an NT domain, an NIS domain, an LDAP server, or a custom SQL database.\n6.2. Client-side Filesystems\nProtocol Software Linux: NFS Standard with Linux (mount -t nfs) SMB Standard with Linux (mount -t smbfs) IPX Standard (mount -t ncpfs) and enhanced client from Caldera Coda Standard with Linux (mount -t coda) Apple Free add-on to Linux (mount -t afpfs) Microsoft: SMB Comes with Windows 95, 98, NT, 2000 IPX Comes with Windows 95, 98, NT, 2000, not as functional as SMB NFS Third-party addons, but no really good ones. Ignore them Coda Free addon, but not widely known or tested While Microsoft has failed to dominate the LAN server market, it has also successfully avoided including any protocol other than SMB on its client operating systems. Microsoft has accomplished this by keeping the development information required to write a successful client filesystem a proprietary secret, available only if you purchase a software development kit under non-disclosure terms. Samba, however, has made it unnecessary to reverse-engineer any of the programming interfaces involved, because Samba allows almost everything to be done on the server side.\nBy locking out serious client-side filesystem competiton, Microsoft has forced Windows users to forego the advantages of modern filesystems. Fast, secure, and intelligent distributed filesystems exist, but Windows users cannot expect to be able to use these any time soon.\n6.2.1. Migration Comments\nIt is common to keep existing Microsoft Windows clients unchanged during the first stages of a migration. It is also common to keep using these clients with traditional file stores, even though it might be better to use the Web instead (see comments under \u0026ldquo;File Serving\u0026rdquo;). If this is the case in your migration strategy, you should be using SMB. Samba, the free SMB implementation, is extremely capable and robust, and has a large and dedicated development team. Microsoft Windows clients are also better integrated with SMB (and therefore Samba) than they are with Novell IPX (or mars_nwe). While NFS can be made to work with Windows clients, it is a very poor and insecure system, and isn\u0026rsquo;t really worth the effort required to implement it.\nUsing Samba, it is possible to pass some of the benefits of modern networked filesystems on to the Windows clients. Pay careful attention to locking issues, however, when using Samba as a gateway in this fashion. Read-only access does not present any locking issues (such as sharing CD ROMs, or sharing a network filesystem via a web server) but in any read-write situation there is potential for serious locking problems to arise.\n6.3. Printing Services\nProtocol Software Servers: lpr Any Unix, Windows NT, Novell, many others SMB Samba on Unix and others, Windows NT IPX mars_nwe on Linux, Netware, Windows NT Clients: SMB Samba on Unix and others, Windows 95 and Windows 98 lpr Any Unix, Apple, Windows NT Workstation IPX Netware clients The only major platforms that cannot use the Unix lpr printing protocol natively are 16-bit Windows 95 and Windows 98. 
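\nFor the platforms that do speak lpr, pointing a client at a central Linux print server is a single /etc/printcap entry. A sketch only, with the queue name laser and the server name printserv as hypothetical placeholders (classic BSD lpr shown; LPRng syntax is nearly identical):\nlaser|Departmental laser printer:lp=:rm=printserv:rp=laser:sd=/var/spool/lpd/laser:\nAfter that, lpr -Plaser report.ps prints from any Unix client, and the same queue can be re-exported to Windows clients through Samba if needed.\n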
Third-party addon software is available for these operating systems.\n6.3.1. Migration Comments\nA common solution is to move to using lpr throughout an organisation except where 16-bit clients are concerned. These 16-bit clients can be served from Samba. If each client has to be reconfigured for other reasons anyway, however, then an lpr solution should be used on 16-bit Windows systems as well in order to reduce the number of protocols being used.\nWhen dealing with Windows NT clients, it is just as easy to connect to printers via lpr as via SMB. This being the case, you may choose lpr in preference to SMB to avoid an extra layer of complication in your network. It is sometimes better, however, to send all Windows client printing through Samba so you are able to later make changes that affect only the Windows printer users. It is more difficult to isolate the Windows users if they are all using lpr directly.\n6.4. Email Services\nIn the following, \u0026ldquo;All major client software\u0026rdquo; means Netscape Messenger, Microsoft Outlook Express, mutt, pine, Lotus Notes, cc:Mail, Pegasus, Qualcomm Eudora and others of similar sophistication.\nProtocol | Software\nServers:\nSMTP | Any mail transport on Unix and most on Windows NT and other operating systems. SMTP is more flexibly implemented on Unix than on any other platform\nRFC822 \u0026amp; MIME | These mail encoding and formatting standards are supported by any Internet-compliant mail transport and reading software\nIMAP | Cyrus imapd (free), University of Washington imapd (free), Microsoft Exchange, many others\nPOP | An ancient but still widely-used protocol. Useful in organisations without a well-planned email strategy, where mail folders tend to be stored on local hard discs (probably not backed up either!)\nMAPI (over MSRPC) | Microsoft Exchange, HP Openmail\nHTTP | Many mail store servers have a web interface. Microsoft Exchange has one, as have Lotus Notes and others. On Unix a component approach is preferred, and there are many web interfaces to IMAP servers available\nLDAP | Mailing lists, accounts and mail permissions ought to be stored in an LDAP database. Exim, qmail, Sendmail and others on Unix, Netscape Mail Server on all platforms, and others\nClients:\nIMAP | All major client software\nMAPI | Most Windows clients, including Microsoft, Lotus, Pegasus and Netscape. No Unix clients because it is a Windows-only API\nSMTP | Just about all clients on all platforms. SMTP is the only Internet-standard way of submitting email, and there are secure versions of it. Microsoft Outlook clients can do SMTP but prefer the strange MSRPC format where available\nHTML | All major client software can handle messages encoded in HTML; however, plain text is always the best option for message body text. If you want a structured document format, enclose it as an attachment or put it on the web and email the URL\nRTF | This Microsoft word-processing format is supported natively by Microsoft Outlook, and by external viewers in other mailers. It is a very bad idea to have it enabled in any context. Disable it.\nRFC822 \u0026amp; MIME | All major client software. However, there are many MIME RFCs to do with internationalisation, security, large files and more. Microsoft does not try to provide a complete implementation, which causes difficulties for some Asian and European languages and for anyone who wants secure email\nLDAP | Star Office mail, Netscape Communicator, Pegasus. Not the counterpart of LDAP in a mail server; LDAP on a client should be used for things like addressbooks.\n6.4.1.
Migration Comments\nAny large deployment of mail servers has to be customised to fit the site. Commercial software always seems to make this level of customisation difficult or impossible, and as a result, free software tends to be much better for the server side of things. On the other hand, commercial software currently fares better on the client side. Some commercial clients, such as Mulberry, are outstanding for their standards compliance. There are, of course, some equally good free client alternatives.\nThe most scalable and flexible Linux-based IMAP mail store solution is the free Cyrus mail server. There are many choices available for the mail transport component, including Sendmail, Exim, Qmail, and others. With software like this, along with the SASL authentication mechanism, the ACAP client configuration protocol, and LDAP, it is possible to build an extremely powerful enterprise system using only free software components. The client software can still be Windows or Macintosh Eudora, Outlook Express, Netscape Communicator, or any of dozens of other available client systems.\nMoving away from Microsoft Exchange is trivial from an email point of view because Microsoft Outlook Express clients are also capable of using the IMAP protocol. You can experiment with this by switching your Exchange server to IMAP-only and changing the configuration of your Outlook Express clients. Once this works, you can implement a Linux-based IMAP server without your users ever noticing the difference.\nIf you use Lotus Domino or Netscape Mail Server, there have been recent announcements regarding the availability of this software for Linux platforms. The simplest route for this part of your migration may be simply to transfer your existing software license to a Linux version of the same software when the product is made available.\nThe calendaring and scheduling functions of Exchange, Domino, and Netscape Calendar Server are dealt with in the next section.\nOne of the tricky things about migrating IMAP servers is moving mail and setting permissions for thousands of mailboxes at a time. One of the best things to do is use the Perldap library. Sample code has been posted to Cyrus forums for doing this, including with web interfaces.\n6.5. Calendaring and Scheduling\nCalendaring is a strange area. Most products support most of the standard protocols, but interoperability between clients and servers from different vendors is still very poor. No calendar access protocol yet exists, which is mostly because of the intertia behind the commercial calendaring systems and their proprietary protocols (Microsoft and Lotus are both major players in the IETF standards committee). Internet standards in this area have only recently been finalised, and at this time only free software implements calendaring that is in compliance with the few standards and standards drafts that currently exist.\nCybling Systems has a project to attempt to untangle these issues at www.cyblings.on.ca/projects/calendar.\nProtocol/format Software Servers MAPI over a transport MS Exchange, HP Openmail. Not a published standard Other proprietary Calendar servers with Star Office, Netscape Suite Spot, Corporate Time and others Web-based access All major iCalendar, vCalendar All major, except Microsoft vCard All major, except Microsoft SMTP All major ICAP Anything using the MCAL (Modular Calendar Access Library) library, such as www.bizcheck.com. PHP and GTK+ applications exist. 
This is not an Internet standard and the draft has expired CAL The official direction of the ISO and IETF bodies for calendaring standards. No product anywhere implements this Internet draft LDAP Clients MAPI over a transport MS Schedule+, MS Outlook Express Other proprietary All clients, due to lack of calendar access standard vCard Most major SMTP Some minor LDAP Netscape, StarOffice, other minor The paper at www-me1.netscape.com/calendar/v3.5/whitepaper/index.html summarises a vendor\u0026rsquo;s view of Internet calendaring standards (provided the vendor is not one of the two who have millions of existing proprietary clients and the ability to stall the standards process!)\nThe best that any calendar software implementor can do at the moment is implement the following protocols: iCalendar, vCalendar, vCard, SMTP (for e-mail notification), LDAP (for details of all users, groups and items that can be scheduled) and X.500 (in very large corporate environments). This will change as soon as the ICAP Calendar Access Protocol or its equivalent becomes an Internet standard.\n6.5.1. Migration Comments\nIf at all possible, you should use a Web-based calendar client with a server that supports as many Internet standards as possible. If you must use Microsoft Outlook Express, then HP Openmail is the only non-Microsoft option available. The calendar servers from the Star Office and Netscape Suite Spot server suites can provide good interim solutions in many situations. The Corporate Time calendaring product is an example of a calendaring system that uses all the available standards (see www.cst.ca). There are other examples, but for the moment the area is fraught with difficulty.\n6.6. Web Servers\nThe Apache Web server is free software that is currently used on over 55% of Web sites on the Internet, with Microsoft IIS being used on 24%. Reliable data is hard to find for Intranet deployments, but it seems likely that Microsoft is being used on a higher percentage of Intranet servers.\nWeb publishing is best done using the standard WebDAV protocol (www.webdav.org/other/faq.html), but the widely-used Microsoft FrontPage packages use the undocumented Front Page Server Extensions protocol. Both of these are implemented on Linux.\nProtocol/API Software HTTP v 1.1 All major ISAPI prog. interface All major ASP scripting IIS, Apache PHP scripting Apache, IIS others Data interfaces eg ODBC Apache, IIS, others FrontPage Extensions Apache (via http://www.nimh.org.invalid/fpse.shtml WebDAV (open FPSE) Apache, Netscape 6.6.1. Migration Comments\nMicrosoft is not dominant in the Web server market, so there are not nearly as many difficulties in migrating to a non-NT system. Administrators should find Apache easier to configure and run for large and mission-critical sites.\nOne of the big issues involved when migrating away from Microsoft IIS servers is the use of Active Server Pages (ASP). If the language used for ASP is Perl rather than Visual Basic then there should be minimal difficulties (see www.on-luebeck.de/doku/asp/).\nA migration to Linux usually includes replacing IIS with Apache. You can start this aspect of the migration by running Apache on a Windows NT server if there are OS-specific integration issues that require more time to solve.\nThere are other free Web server solutions available for Linux, including Roxen (www.roxen.com), that have particular strengths used for electronic commerce and in a couple of other specific areas. 
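\nReturning to the web publishing point above, enabling WebDAV under Apache takes only a few lines once the mod_dav module is installed. A sketch, assuming mod_dav is present; the /site location and the lock database path are hypothetical placeholders, and you would normally add authentication directives before allowing writes:\nDAVLockDB /var/lock/apache/DAVLock\n\u003cLocation /site\u003e\n    DAV On\n\u003c/Location\u003e\nThis gives DAV-capable publishing clients a standards-based way to write to the server, which is usually a better long-term bet than re-creating the FrontPage Server Extensions.\n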
Zeus (www.zeus.co.uk) is a commercial Web server available for Linux which is quickly increasing its market share (see the surveys available at www.netcraft.co.uk).\n6.7. Database Servers\nThe Linux database server market is booming. Microsoft is currently the only major vendor who has not produced a closed-source Linux version of their database offering. PostgreSQL and MySQL are the leading free software contenders.\nMost database servers are accessible via the ODBC API which packages SQL calls. Differences arise as to how ODBC calls are transported, which is where ODBC \u0026ldquo;bridges\u0026rdquo; come in to the picture. ODBC bridges obviate the need for common protocols, albeit in a rather clumsy fashion.\nThere isn\u0026rsquo;t much to discuss in the way of protocols, except that Sybase and Microsoft SQLServer use a partially-undocumented Sybase protocol called TDS when communicating ODBC queries. Microsoft has extended this protocol in even more undocumented ways, but a free implementation does exist (see http://www.freetds.org). This is important only because Microsoft Access uses TDS by default when communicating with SQLServer.\n6.7.1. Migration Comments\nIf you can eliminate TDS from your network, you will reduce the overall complexity of your database system.\nIf you have an NT data source that you want to be able to access from Linux, the ODBC Socket Server (odbc.linuxbox.com) will allow you to do this.\nNote that it is important to get the Primary Key Definition right when making ODBC calls to non-Microsoft databases from Microsoft Access.\n6.8. Firewalls, Gateways, DNS and other Basic Internet Services\nThis is one area where Microsoft has made relatively little headway in corrupting Internet standards. Microsoft has produced variants of DHCP, PPP, and numerous other \u0026ldquo;glue\u0026rdquo; protocols, but remains a minor player in the network management layer. As such, Microsoft is unable to influence the market at the expense of open Internet standards.\nIf you are running any of these services on a Windows NT machine, then you are putting yourself at risk. Windows NT simply is not able to provide any verifiable degree of security when operating as a firewall due to Microsoft refusing to allow peer review of their code. For the same reason, even if the Microsoft DNS server wasn\u0026rsquo;t already famous for being unreliable, there have been enough security holes identified in the free open-source DNS server implementation to warn anyone away from relying on a very young, closed-source implementation.\n6.9. Things Not Covered in this Paper\nWindows source code migration. If you are fortunate enough to have the Windows source code to applications that you wish to run on Linux then there is a great deal that can be done to make this as simple as possible without requiring a code rewrite. This will be the subject of another Linuxcare paper!\nAuthentication systems, Linux PAM and mixed authentication environments. With a combination of PAM on Linux and Unix systems and LDAP as the master authentication database it is possible to authenticate against every likely protocols. Samba can authenticate Windows clients using PAM to talk to LDAP, RADIUS dialup authentication servers can do the same, as can any other service which runs on Linux. There is also an LDAP schema which supplies all required NIS+ information so that LDAP becomes a true distributed directory service. This is a whole paper on its own!\nThe Service Location Protocol (RFC2608). 
This is for locating services of any kind on an Intranet, with defined mappings to LDAP and other standard repositories.\nDatabase application migration to free Linux databases. Recent work by the Postgresql team means that PostgreSQL can now deliver all the functionality of large commercial databases such as Oracle.\nEmail address book formats and access mechanisms, especially relating to ACAP and LDAP.\nExtent of the Calendaring and Scheduling protocol mess, and recent positive signs.\n7. Glossary of Terms and Acronyms ACAP\nApplication Configuration Access Protocol, a protocol being developed by the IETF. ACAP supports IMAP4-related services. asg.web.cmu.edu/acap/\nAFS\nAndrew File System, an old but innovative distributed filesystem. See the FAQ at angelfire.com/hi/plutonic/afs-faq.html. Modern replacements exist, such as Intermezzo by Linuxcare employee Phil Schwan.\nApache\nAn Open Source Web server developed by the Apache Group, a large group of open source developers from many companies including Linuxcare (Martin Poole and Rasmus Lerdorf.) According to recent surveys, it is estimated that Apache is used on approximately 58% of servers on the Web. You can get more information about Apache and the Apache Group at http://www.apache.org.\nAPI\nApplication Program(ming) Interface, a set of routines, protocols, and tools for developing software applications.\nASP\nActive Server Pages, a Microsoft specification for creating dynamically-generated Web pages that utilizes Microsoft Active X components, usually via Microsoft VBScript or Perl.\nCAP\nCalendar Access Protocol Internet Draft draft-ietf-calsch-cap. See imc.org/ids.html#calsch.\nCGI\nCommon Gateway Interface, a specification for transferring data between a Web server and a CGI program. A CGI program is any program designed to accept and return data that conforms to the CGI specification. CGI programs can be written in any number of programming languages, including C, Perl, or Java. For more information about CGI, see http://www.w3.org/CGI/.\nCoda\nA free distributed filesystem intended to solve the problem of disconnected filesystems (eg wandering laptops.) Replaced by Intermezzo.\nCorporate Time\nExample of commercial corporate scheduling packages that tries to be as standards-compliant as possible. www.cst.ca\nCRM\nCustomer Resources Management. A buzz-word for software that manages a database of all information to do with potential, existing and past customers.\nCyrus mail server\nAn extremely robust and scaleable free email storage server. Tends to cooperate with the leading implementation of new standards including SASL, ACAP and Sieve.\nDCE\nDistributed Computing Environment, a suite of technology services developed by The Open Group (http://www.opengroup.org) for creating distributed applications that run on different platforms.\nDHCP\nDynamic Host Configuration Protocol, a protocol for assigning dynamic IP addresses to devices on a network. For more information see RFC1531 (ftp://ftp.isi.edu.invalid/in-notes/rfc1531.txt).\nDNS\nDomain Name Service, an Internet service that translates domain names into IP addresses.\nEudora\nA popular commercial, closed source email client developed by Qualcomm, Inc. For more information see www.eudora.com.\nExim\nOne of the leading free mail transport programs. http://www.exim.org.\nFPSE\nFront Page Server Extensions, an undocumented method invented by Microsoft for having web publishing software write to a web server. 
Completely replaced by the Internet DAV standard.\nFTP\nFile Transfer Protocol, a standard internet protocol used for sending files. For more information, see RFC959 (ftp://ftp.isi.edu.invalid/in-notes/rfc959.txt). FTP is still the only Internet-wide file-specific transfer protocol, after more than 20 years.\nGTK+\nGimp ToolKit, a small and efficient widget set for building graphical applications.\nHP OpenMail\nHewlett Packard\u0026rsquo;s answer to Microsoft Exchange. By simply replacing the file MAPI.DLL on the client workstations, OpenMail can be used as a server for Microsoft Outlook Express clients, including calendaring and scheduling. The replacement MAPI.DLL does not communicate with the OpenMail server using MSRPC. openmail.hp.com\nHTML\nHypertext Markup Language, the main language used to create documents on the Web. For more information see http://www.w3.org/MarkUp/.\nHTTP\nHypertext Transfer Protocol, the underlying protocol used on the Web, defining how messages are formatted and transmitted, and how servers and browsers should respond to various commands. For more information see RFC2616 (ftp://ftp.isi.edu.invalid/in-notes/rfc2616.txt).\niCAL\nInternet calendar formal public identifier. imc.org/draft-ietf-calsch-icalfpi. See imc.org/ids.html#calsch\niCalendar\nsee iCAL\nICAP\nInternet Calendar Access Protocol. See the www.imc.org URL above\nIETF\nInternet Engineering Task Force http://www.ietf.org\nIIS\nInternet Information Server, Microsoft\u0026rsquo;s closed-source Web server that runs on Windows NT. According to the latest figures from Netcraft www.netcraft.co.uk, IIS\u0026rsquo; market share is dropping each month.\nIMAP\nInternet Message Access Protocol, a protocol used for retrieving email messages. For more information see RFC2060 (ftp://ftp.isi.edu.invalid/in-notes/rfc2060.txt).\nimapd\nGeneric name for a daemon, or server process, used to handle IMAP connections.\nIntermezzo\nA distributed file system with a focus on high availability. The principal developer is Phil Schwan, from Linuxcare. For more information see www.inter-mezzo.org.\nIPX\nInternetwork Packet Exchange, an undocumented and closed-source networking protocol used by Novell Netware operating systems.\nISAPI\nInternet Server Application Program Interface, an API developed by Microsoft for its IIS Web server. Some other Web servers support ISAPI.\nISO\nInternational Organization for Standardization, an organization composed of national standards bodies from over 75 countries. For more information about the ISO, see www.iso.ch. ISO standards typically take years longer to develop than Internet standards. The ISO standards for computer protocols were completely superseded by Internet standards.\nITU\nInternational Telecommunication Union, an intergovernmental organization through which public and private organizations develop telecommunications systems. The ITU is a United Nations agency responsible for adopting international treaties, regulations, and standards governing telecommunications. For more information about the ITU, see http://www.itu.int/.\nITU T.128\nT.128 is the International Telecommunication Union\u0026rsquo;s recommendation regarding Multipoint Application Sharing. For more information, see http://www.itu.int/. There are no open source implementations, and the closed-source implementations do not have a good record for interoperability.
Use the X Window system instead!\nLAN\nLocal Area Network, a computer network that spans a relatively small physical area.\nLDAP\nLightweight Directory Access Protocol, a set of protocols devised for accessing information directories. LDAP is based on the standards contained within the X.500 standard, but is significantly simpler. LDAP supports TCP/IP, which is necessary for any type of Internet access. For more information, see RFC2251, RFC2252, RFC2253, and RFC2589.\nlpr\nThe Unix Line PRinter protocol. Ubiquitous protocol for transferring print jobs around a network.\nMAPI\nMessage Application Programming Interface, a system that enables Microsoft Windows\u0026rsquo; email applications to communicate for distributing mail. This API is only relevant to Windows machines.\nmars_nwe\nOpen source clone of the most functional parts of Novell Netware, usually run on Linux.\nMCAL\nModular Calendar Access Library. mcal.chek.com.\nMIME\nMultipurpose Internet Mail Extensions, a specification for formatting non-ASCII messages so they can be sent over the Internet. Many email clients support MIME, enabling them to send and receive graphics, audio, video, and other different file types. There are many, many MIME-related RFCs (see www.imc.org for more information.)\n**MSRPC **\nMicrosoft\u0026rsquo;s preferred method of communicating control data in NT networks: DCE/RPC over Named Pipes over SMB over NetBIOS. The only open implementation of this is by Luke Leighton of Linuxcare, whose work can be seen in Samba and is explained in his book \u0026ldquo;Samba and Windows NT Domain Internals\u0026rdquo; available from MacMillan Technical Publishing.\nMulberry\nMulberry is a closed-source email client for Microsoft Windows or Apple Macintosh platforms with a Linux version in beta as of January 2000. For more information see www.cyrusoft.com/mulberry/mulbinfo.html. Mulberry is remarkable for its excellent implementation of Internet standards, including new ones such as ACAP. In contrast, applications such as Microsoft Outlook Express and Netscape Communicator frequently implement standards poorly, making more work for administrators and in some cases penalising the end-user.\nMySQL\nMySQL is a multi-user, multi-threaded SQL database server. MySQL is a client/server implementation that consists of a server daemon \u0026ldquo;mysqld\u0026rdquo; and many different client programs and libraries. For more information, see http://www.mysql.org. MySQL and PostgreSQL between them are the most popular open source databases. MySQL is the lighterweight of the two.\nNetBEUI\nNetBIOS Enhanced User Interface, an enhanced version of the NetBIOS protocol used by network operating systems such as LAN Manager, LAN Server, Windows for Workgroups, Windows 95/98, and Windows NT. Documentation is now available but most regard it as a dead protocol. However it is the best SMB transport protocol for the millions of DOS machines still in use and free closed-source NetBEUI stacks for DOS are available for download from IBM and Microsoft. A free Linux version ready for use with Samba was made available in March 2000 at www.procom.com as this paper was being completed.\nNetBIOS\nNetwork Basic Input Output System, an application programming interface that augments the DOS BIOS by adding special functions for local area networks. NetBIOS over TCP/IP is defined in RFC1001 and RFC1002. 
This is a very poor protocol, implemented in several open source products including Samba (www.samba.org) and derivatives.\nNetcraft\nNetcraft is an internet consultancy based in Bath, England. The majority of its work is closely related to the development of internet services. Netcraft is most famous for its website which is devoted to surveying Internet technologies. For more information, see www.netcraft.com.\nNFS\nNetwork File System, an open system designed by Sun that allows all network users to access shared files stored on different platforms. NSF provides access to shared files through the Virtual File System that runs via TCP/IP. NFS is demonstrably a poor choice for running on Windows-based PCs, due to the bad design of Windows.\nnfsd\nGeneric name for a daemon, or server process, use to handle Network File System connections. Think of it as the Samba equivalent for the NFS protocol.\nNIS\nNetwork Information Server, a Unix directory system for distributing system configuration data such as user and host names between computers on a network. Can be linked to an LDAP database transparently to the client systems, see www.padl.com.\nODBC\nOpen DataBase Connectivity, a database access method developed by Microsoft and widely implemented. ODBC is an API not a protocol.\nPAM\nPluggable Authentication Modules, a general infrastructure for module-based authentication. For more information, see the Linux-PAM pages at kernel.org/pub/linux/libs/pam/.\nPegasus\nA very popular closed-source email client for Windows and Macintosh platforms, available free of charge from New Zealand-based Pegasus Computing. For more information, see www.pegasus.usa.com.\nPerl\nPractical Extraction and Report Language, a programming language originally developed by Larry Wall, now maintained by an extensive team of Open Source developers. Perl is one of the most popular languages for writing CGI scripts. For more information, see http://www.perl.org.\nperldap library PerLDAP, or Perl-LDAP, is a combination of an interface to the C SDK API and a set of object oriented Perl classes. For more information, see mozilla.org/directory/faq/perldap-faq.html.\nPHP\nPHP Hypertext Preprocessor, a web scripting language that is an alternative to Microsoft\u0026rsquo;s Active Server Pages (ASP). PHP runs on Linux, Windows, and many other platforms. The principal author is Rasmus Lerdorf of Linuxcare. For more information, see http://www.php.net.\nPOP\nPost Office Protocol, a protocol used to retrieve email from a mail server. Most email clients support this protocol. For more information see RFC1939 (ftp://ftp.isi.edu.invalid/in-notes/rfc1939.txt).\nPostgreSQL\nPostgreSQL is a object-relational database management system supporting almost all SQL constructs. For more information, see http://www.postgresql.org. See also MySQL.\nPPP\nPoint-to-Point Protocol, a method for connecting a computer to the Internet. For more information see RFC1661 (ftp://ftp.isi.edu.invalid/in-notes/rfc1661.txt).\nqmail\nLike Exim, Qmail is an open source replacement for sendmail, written by Dan Bernstein. For more information, see http://cr.yp.to/qmail.html.\nRFC\nRequest For Comments. For more information, see http://www.rfc-editor.org.\nRFC822\nStandard for ARPA Internet Text Messages (Aug 13, 1982). 
This defines the basic format of Internet email messages; for example, it says that every message should have a Subject: and Date: header.\nRPC\nRemote Procedure Calls, a protocol that allows a program on one computer to execute a program on a server. Using RPC, a system developer does not need to develop specific procedures for the server\u0026ndash;the client program sends a message to the server, and the server returns the results of the executed program. For more information, see RFC1831 (ftp://ftp.isi.edu.invalid/in-notes/rfc1831.txt).\nRoxen\nRoxen is a line of Internet server products, the core of which is the Roxen Challenger Web server. Roxen is free software distributed under the GNU General Public License and is distributed with a robust IMAP module. For more information, see www.roxen.com.\nRTF\nRich Text Format, a Microsoft-devised method for formatting documents. The specifications are available but very complex. Fine details of documents (such as table alignment) are often confused in translations. Use XML instead wherever possible.\nSAM\nThe Windows NT Security Account Manager. A database of undocumented format which stores usernames, passwords and other information equivalent to a NIS or LDAP database in the free world. A SAM access tool has been produced by the Samba team which extracts usernames and passwords from the SAM for the purposes of migrating away from NT to Samba.\nSamba\nSamba is an open source software suite that provides file and print services to SMB (otherwise known as CIFS) clients. The principal author is Andrew Tridgell of Linuxcare who is now assisted by a multinational team of open source developers. Samba is the only SMB server apart from Windows NT that has large market share. Samba is freely available under the GNU General Public License. For more information, see http://www.samba.org.\nSAP\nThe US branch of SAP AG, the second-largest software company in the world, based in Germany. Their closed-source Enterprise Resource Planning package is very popular, and runs on Linux.\nSASL authentication\nSimple Authentication and Security Layer, an Internet standard framework (RFC2222) for adding authentication support to connection-based protocols. It is used by the Cyrus servers, among others.\nSendmail\nSendmail is an open source Mail Transfer Agent distributed under the Sendmail License. For more information, see http://www.sendmail.org. Sendmail is an ancient program responsible for delivering perhaps 70% of all email on the Internet. Modern replacements include Exim and qmail (q.v.)\nSID\nWindows NT Security IDentifier.\nSMB\nServer Message Block, a message format used by DOS and Windows operating systems to share files, directories, and services. A number of products exist that allow non-Microsoft systems to use SMB. Samba is such a system, enabling Unix and Linux systems to communicate with Windows machines and other clients to share directories and files. The SMB protocol is undocumented and has many bad design features. It is effectively monopolised by Microsoft, although there is a public CIFS group.\nSMTP\nSimple Mail Transfer Protocol, the Internet protocol used for sending email messages between servers. SMTP is generally used to send mail from a client to a server. This is the most important protocol on the Internet.\nSNMP\nSimple Network Management Protocol, a set of protocols used for managing complex networks.
SNMP works by sending \u0026ldquo;protocol data units\u0026rdquo; (PDUs) to different parts of the network where SNMP-compliant \u0026ldquo;agents\u0026rdquo; store data about themselves in \u0026ldquo;Management Information Bases\u0026rdquo; (MIBs).\nSPX\nSequenced Packet Exchange, an undocumented transport layer protocol used in Novell Netware networks. SPX sits on top of the IPX layer and provides connection-oriented services between two nodes on the network. Like IPX and SMB (q.v.) this protocol should be avoided wherever possible; however, there are open source implementations.\nSQL\nStructured Query Language, a standardized query language for requesting information from a database.\nStar Office\nStar Office is a suite of office applications, freely available through Sun Microsystems. For more information, see www.sun.com/staroffice/. All support for Star Office is free, and handled by Linuxcare.\nSybase\nOne of the dominant software companies in the area of database management systems and client/server programming environments. Microsoft SQL Server is based on Sybase, which is why Sybase and SQLServer both use the undocumented TDS protocol. www.freetds.org.\nTCP\nTransmission Control Protocol, one of the main protocols used in TCP/IP networks. TCP enables two hosts to establish a connection and exchange streams of data, guaranteeing the delivery of the packets in the correct order.\nTCP/IP\nTransmission Control Protocol/Internet Protocol, a suite of communications protocols used to enable communication between computers. TCP/IP is the de facto standard for transmitting data over networks.\nTDS\nTabular DataStream, a protocol used by Sybase and Microsoft for client to database server communications. A free implementation of TDS is being developed (http://www.freetds.org).\nURL\nUniform Resource Locator, the global address of resources available via the Web.\nWebDAV\nWebDAV is a protocol that defines the HTTP extensions necessary to enable distributed web authoring tools to be broadly interoperable while supporting the user\u0026rsquo;s needs. In this respect, DAV is completing the original vision of the Web as a writable, collaborative medium. For more information, see http://www.webdav.org.\nWINS\nWindows Internet Name Service, a name resolution system that determines the IP address that is associated with a particular network computer. WINS is a non-open alternative to DNS.\nX.500\nAn ISO and ITU standard that defines how global directories should be structured. X.500 directories are hierarchical with different levels for each category of information.\nZeus\nZeus is a scalable Web server produced by Zeus Technologies. For more information see www.zeus.com.\nCopyright 1999-2000 Linuxcare, Inc.
All rights reserved.\n","permalink":"https://shearer.org/research/how-to-replace-windows-nt-with-linux/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003ch4 id=\"when-linux-was-a-struggling-challenger\"\u003eWhen Linux was a Struggling Challenger\u003c/h4\u003e\n\n\n\n\n\u003cdiv class=\"attention-box\"\u003e\n  \u003cdiv class=\"icon\" data-emoji\u003e💡\u003c/div\u003e\n  \u003cdiv class=\"content\"\u003e\n    \u003cspan class=\"label\"\u003eKey Point\u003c/span\u003e\n    This is a 2026 restoration of my (Dan Shearer\u0026rsquo;s) 1998-2001 guide, \u003ca href=\"https://web.archive.org/web/20000817174145/http://www.linuxcare.com/viewpoints/article/latest.epl\"\u003epreserved at archive.org\u003c/a\u003e. Links have been updated to point to the archives where possible.\n  \u003c/div\u003e\n\u003c/div\u003e\n\n\u003cp\u003eIn 1999 I joined my first startup, Linuxcare in San Francisco. The Linuxcare story is a quintessential United States dot-com bubble narrative, featuring a\nfamous venture capital fund, massive growth, a failed IPO, and a fancy new ex-IBM CEO resigning under a cloud. Founded in 1998, Linuxcare aimed to be the \u0026ldquo;0800\nnumber for Linux\u0026rdquo;. So close!\u003c/p\u003e","title":"How to Replace Windows NT with Linux"},{"content":"This is the second time the Court of Justice has decided the same question. In brief, after 4 years, in 2020 the Court was completely satisfied that the United States violates the privacy of EU citizens when the personal data of EU citizens is visible to the US government, and that the US has no intention of changing its behaviour. Therefore, US companies are not permitted to hold the personal data of EU citizens and residents.\nThere are a few unclear areas and the giant US cloud companies are using their money to spin this issue, but it seems the Court has started a measurable shift in attitudes towards US cloud companies. Even if these companies promise to hold data within the EU, and even if they are otherwise highly compliant, there is no getting around that the US government insists it is able to access all data at all times without asking permission or informing anyone.\nBackground Privacy Shield was a 2016 self-certification scheme for US companies to hold themselves to the strict EU privacy rules. In 2020 Privacy Shield was struck down by the EU Court of Justice. In non-technical terms, the Court said: There is no way Privacy Shield can work. So don\u0026rsquo;t use US-controlled cloud companies such as Google or Amazon.\nIn late 2021 this decision started rippling out across Europe, as one place and then another moves away from these giant US companies, starting with government users. We all like familiarity and wish to avoid change, so this decision seems astonishing to many people. Once organisations get over their surprise, it is not so difficult to do. In 2023 I wrote It remains to be seen what these US cloud companies will do in 2023. Some of them are wealthier than several smaller EU nations combined. By 2026 we had the answer - they used coercion by political, economic and legal means to prevent EU citizens using their own IP to build their own services.\nI have been researching, advising, consulting and teaching on the collapse of Privacy Shield since 2016, including this substantial Privacy Shield paper with extensive references.\nOn 16th July 2020, the EU Court of Justice decision striking the EU-US Privacy Shield was the culmination of years of effort by many people to highlight human rights abuses. 
Privacy and human rights are two sides of the same coin, and this is demonstrated clearly in privacy shield.\nThe details The details are complicated, as documented in the paper mentioned above. Factors include:\nthe Digital Single Market the six or so EU security and privacy laws are based on international Human Rights (derived from the UN Charter of Human Rights) various US Presidential Executive Orders stripping privacy protections from non-US citizens, which almost-but-don\u0026rsquo;t-quite apply to EU citizens, yet conflicting privacy defaults in EU and US laws dependence on goodwill of the US president to respect EU privacy, rather than relying on US statue US Supreme Court decisions allowing a kind of Universal Jurisdiction in data matters These are a maze of overlapping interests and conflicts, and while it is important for specialists to follow the ins and outs, in the end there is one clear message:\nIf you are dealing with personal data of EU citizens or residents, including any communication where at least one party is an EU citizen or resident or is even in the EU at the time of the communication\u0026hellip; then you should not be using US companies to handle that data\nFAQ Is This Just About Pure Human Rights?\nNo. The EU decided in 2014 to create a Digital Single Market to mirror the physical Single Market. The EU calculated that the only way to do this was to foster trust in consumers, and the only way to do that was to emphasise privacy as a basic Human Right. The thinking of the EU is that economic prosperity will follow if Human Rights are respected. But yes, it is also about pure Human Rights too.\nWhat Does This Mean for Cloud Companies?\nUS Cloud companies such as Google, Amazon, Ebay etc from 2021 are slowly becoming either deprecated or illegal to use within Europe. Multiple countries have already banned these companies for government use, and the restrictions keep tightening. These cloud companies are fighting hard, but this is not the first time the same court has passed the same judgement. There is no legal change for EU-based cloud companies, which are unaffected.\nWhat Does This Mean for EU Tech Companies?\nOpportunity. Facebook, Gmail and Amazon AWS (for example) are far from unique and their technical features have already been replicated elsewhere, although they have large amounts of cash to help them fight and evolve. EU tech companies who have standardised on Google or Amazon APIs for example already know they are committed to regular refreshes and upgrades so change is not unthinkable. For EU Cloud suppliers, competition is already fierce but the barrier to entry is still quite low, once the US Cloud providers are barred. So far it is just EU government that is actively banning US cloud, but this seems to be an inevitable shift.\nWhat Does This Mean for EU Consumers?\nThis process can feel a little like looking for ecological alternatives to common items. Some services are immediately replaceable, such as email. Some require a thoughtful approach depending on the user, such as the physical aspects of Amazon shopping and delivery.\nIsn\u0026rsquo;t EU Cloud Immature?\nNot any more. If you are insisting on hyperscale cloud, there are few EU companies. But cloud at scale 5 million or so is doable without being a Baidu or an Amazon.\nIs the US Government Really That Bad?\n(Oh dear. I will leave this as I originally wrote it in 2022. 
The intentions and aggression of the US haven\u0026rsquo;t changed in the time since, just how obvious their attacks are.)\nYes. Even if some others want to behave just as badly, only the US has a majority of the cloud services used by EU citizens and residents. The US explicitly removes all protections from everyone in the world. The Executive Order states:\nSec. 14. Privacy Act. Agencies shall, to the extent consistent with applicable law, ensure that their privacy policies exclude persons who are not United States citizens or lawful permanent residents from the protections of the Privacy Act regarding personally identifiable information. US Presidential Order Enhancing Public Safety\nTo be legally and politically precise, there was some protection in an agreement called the EU-US Umbrella Agreement, and the US Congress passed the Judicial Redress Act to make the Umbrella Agreement effective. But in July 2020 the EU Court of Justice said none of that is any use. The US still spies on all data all the time, and that is against EU law, therefore US cloud is not permitted to hold EU personal data. These details are in the paper referenced above.\nWhat About the UK?\nThe UK is no longer in the EU, although for an interim period it is legal to store the data of EU citizens in the UK. The UK has backed the US repeatedly in its data protection laws. It is beginning to look like Data Mobility Post-Brexit is going in one direction only, which is away from the UK. This is not yet settled case law, but changes are happening fast in this area. The UK has very little relevance to Privacy Shield-type issues any more.\nWhy is Privacy Described as a Race to the Top?\n\u0026ldquo;Race to the Top\u0026rdquo; describes the entire problem of the EU-US Privacy Shield. The EU has much higher standards in privacy than the US. An EU company can easily detune from EU standards to US standards if required to do business in the US, within certain limitations. It is definitely easy from a technical point of view if systems have been designed with this in mind. Unfortunately for US companies, doing things the other way around is not possible. Even if the US company complies perfectly with every law, the decisions of the US mean that they still have to make EU data available to the US Government. EU companies have a very significant advantage.\n","permalink":"https://shearer.org/articles/eu-us-privacy-battle-origin/","summary":"\u003cp\u003eThis is the \u003ca href=\"https://en.wikipedia.org/wiki/Schrems_II\"\u003esecond time\u003c/a\u003e the Court of Justice has decided the same question. In brief, after 4 years, in 2020 the Court was completely satisfied that the United States violates the privacy of EU citizens when the personal data of EU citizens is visible to the US government, and that the US has no intention of changing its behaviour. Therefore, US companies are not permitted to hold the personal data of EU citizens and residents.\u003c/p\u003e","title":"Origins of EU-US privacy battles"},{"content":"The Fossil source code management system is the most fully-featured alternative to Git, and has had twenty years of development and testing since 2006. After helping Fossil make some changes I now use Fossil for several projects. I also use Git extensively on various software forges (but not GitHub unless I must). Mercurial is actively maintained but has lost most of its mindshare since Mozilla, Bitbucket and others migrated away, and is rarely chosen for new projects today.\nOne-sentence Summary - Why Fossil?
21st century privacy and reproducibility require code to be in an append-only, non-repudiable Merkle tree with strong cryptographic guarantees, and that is what Fossil is by design. In plain terms: every change is cryptographically linked to every previous change, so nothing can be silently rewritten.\nMore Detail - Why Fossil? ✅ Fossil has a simple, small and written-down standard, so \u0026ldquo;people not yet born\u0026rdquo; will be able to read a Fossil repository. Fossil repositories are designed to last for at least 100 years. ✅ Fossil has a second, independently developed implementation of the core data model — libfossil, a C library which is in turn used to create Fossil-compatible apps. libfossil does not yet have complete feature parity with the main implementation, but it is significant insurance: if I want to solve a source management problem Fossil does not address, the library gives me a very big start, and the existence of a second implementation tests the standard. ✅ Fossil treats the code record as immutable by design — it is about as immutable as it reasonably can be. ✅ Fossil is designed for projects of ordinary size and complexity, which means nearly 100% of all projects in the world. By my own ad-hoc measurements, this means \u0026lt; 8 million lines of code, \u0026lt; 800 developers and \u0026lt; 8 thousand checkin events per year since approximately the year 1990. The unscientific measure I settled on was to import the Git repo for the GNU Compiler Collection, which is 7 million lines of code and quite usable with Fossil. This article is about Fossil, but a reasonable person will ask in 2026 \u0026ldquo;Why not Git?\u0026rdquo;.\nBriefly:\n❌ The only Git standard is the source code of Git, written in C and Unix shell script. It is regarded as particularly convoluted code and only a relatively few programming wizards are familiar with it. Therefore Git cannot be a standard, and the storage format is inaccessible to nearly everyone. That is a big weakness in the world\u0026rsquo;s infrastructure. (There is some discussion in the documentation about the on-disk format, which again, is not a standard.) ❌ Git\u0026rsquo;s defaults and design trade-offs were set by the needs of a tiny number of the biggest projects in the world, and all other users have to accept whatever features are good for those few projects. The Linux kernel source tree is enormous, and the Microsoft internal source code repository is much larger again - but nearly all software projects are hundreds of times smaller than these two elephants. Even, say, the very large Postgres database project with 2 million lines of code is twenty times smaller than the Linux kernel. Fossil works great with the Postgres source tree. ❌ Git makes it easy to rewrite history by default and that is very appealing to human psychology: I will just squash my last twenty changes into a single change before committing where my whole team can see it. Unfortunately that decision also means the Merkle tree is not complete. This and other privacy and reproducibility matters are covered in more detail further down this article. ❌ libgit2 is an independent reimplementation of Git as a minimal-dependency C library, used at scale by GitHub and Azure DevOps among others and with near-complete coverage of the Git feature set. Being purely a consumer of the Git system, libgit2 must adjust to changes made upstream and candidly documents areas where it lags. 
Separately, the Git project is pursuing a libification effort to modularise Git\u0026rsquo;s own C internals into small independent libraries. This reflects longstanding pressure within the Git community for canonical library access rather than reliance on an independent reimplementation. These parallel library efforts reinforce the point: the only complete Git standard is the Git source code itself. ❌ Git source code is a Big Ball of Mud. One telling example is that in 2017 the SHAttered attack was released, demonstrating that SHA-1 collisions are practical to create (although not a general compromise of SHA1). Eight days later a new Fossil release implemented migration to 256-bit SHA-3 with backwards compatibility to existing SHA-1 based repositories. Git has been working on a SHA-256 transition since 2020, but as of 2026 SHA-1 remains the default and there is still no interoperability between SHA-1 and SHA-256 repositories — nine years after SHAttered. It is difficult to make changes to Git source code. 💡 Key Point This article is about why I chose Fossil, rather than comparing with Git. For a very detailed and balanced comparison see Fossil v Git on the Fossil SCM web site. But GitHub is So Successful! Git is not GitHub. But yes, GitHub is successful, and historically played a big role in the movement to make code visible in the first decade of the 21st century. And \u0026lsquo;Git\u0026rsquo; is in the name \u0026lsquo;GitHub\u0026rsquo;, although the company seems to focus mostly on the \u0026lsquo;Hub\u0026rsquo; part of their name these days.\nWe\u0026rsquo;ve been here before. GitHub is a software forge which builds on Git, and Git was a major advance on Subversion, which was the replacement for CVS. SourceForge was the first software forge and was originally based on CVS, and then Subversion, but was still outclassed in terms of features by GitHub and now (despite still having some users and an ad-driven business model) is not a popular place for new projects. Will GitHub similarly slide into obscurity over time? As a US-based cloud company GitHub is unable to offer privacy guarantees that EU-based clouds can. GitHub also is a closed-source cloud offering a restricted level of service for free, and constantly tinkering with their business model and lock-in devices including non-consensual AI tools. These are disadvantages for GitHub in competing with open source forges. GitHub is a good place to search for existing projects, but new projects have options, and alternative forges (all based on Git) are growing fast.\nI looked for something that sets out to meet 21st century challenges including accessibility, and found some good candidates.\nI wanted to find an alternative to Git and GitHub because:\nI could not convince GitHub to fix visual accessibility problems, and I had multiple team members with visual impairments. I spoke to several very polite managers and developers at length. It turns out that despite their billions in the bank, it will be a years-long project for GitHub to implement years-old W3C accessibility standards. That is not acceptable. Even if you only ever use a git commandline, Git comes with a lot of pain\u0026hellip; even the most experienced software developers wrestle with Git and its complexities. Why should a development team should need to worry about losing work? Why should they use an interface so complex that parody man pages look real?. 
GitHub may be a good solution for the closed-source needs of the very largest companies, but I am one of millions of developers who have totally different needs. In 2022, after 14 years, GitHub started offering a commandline tool that can access some of its features. This is not putting developers first. Git encourages merging of private trees, or the \u0026lsquo;Benevolent Dictator development model\u0026rsquo;, which seems to me to be delaying discussion until after code is written. This is what Git supports well because it is the Linux model. There are many online resources and entire training companies dedicated to undoing the default Git/GitHub workflow practices. I like my projects to instead be a tighter \u0026lsquo;cathedral-style\u0026rsquo; development community, with discussion happening as code is developed, all branches visible to everyone, and no long-lived active branches. GitHub wants to be at the centre of CI/CD, and with closed source APIs and services as part of every toolchain. This means that GitHub becomes part of the reproducibility chain, except since GitHub is not transparent these toolchains are not reproducible. There is a common class of source tree management problems that GitHub could address where Git fails, one which no SCM can currently solve. This is the problem of non-diffable trees, some of which is addressed by the Not Forking project. GitHub is the biggest source tree management company in the world, but it does not appear to have thought about this problem - or if it has, does not even give me the tools to explore solutions for myself. There are two open source EU-hosted alternatives to GitHub that feel to me like they could have a long and happy future - SourceHut and Codeberg. SourceHut is able to work with non-Git DVCSs as proved by its Mercurial support while Codeberg currently only works with git. Both are open source, and I use Codeberg extensively and have installed instances of Forgejo, the software behind Codeberg.\nOther people discuss moving away from GitHub:\nMigrating from GitHub to Codeberg — Zig Programming Language, November 2025 — the highest-profile move of the period. Zig made its GitHub repository read-only and declared Codeberg the canonical origin. A Programmer\u0026rsquo;s Guide to Leaving GitHub — lord.io, January 2026 — covers the ethical and political dimension in depth. Why I\u0026rsquo;m migrating my projects away from GitHub — andrlik.org, September 2025 — moved to self-hosted Forgejo and Codeberg, valuing non-profit governance and EU-based privacy. Migrating away from Github — BIT-101, September 2025 — after 123 repositories, moved fully to Codeberg. Saying goodbye to GitHub — notthebe.ee, December 2025 — moved to self-hosted Forgejo, chose it for modest resource requirements and UI familiarity. In Addition: Security and Privacy Issues More Complex Than They Seem There are many well-established security projects on GitHub but that does not mean GitHub is safe, only that these projects have ways (such as a lot of funding) to minimise the risks. The campaign Give Up GitHub presents a comprehensive view of why open source software developers should move elsewhere. 
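To make the append-only property concrete before going further: the Merkle-tree idea in the one-sentence summary above simply means that every check-in commits to the hash of its parent. The following Python sketch is an illustration only, not the actual Fossil artifact format, and the function and field names are invented for the example.

```python
# A minimal sketch of an append-only, hash-chained history (illustration only,
# not the real Fossil artifact format). Each entry commits to the hash of its
# parent, so rewriting any earlier entry changes every later hash and is detectable.
import hashlib

def entry_hash(parent_hash: str, content: str) -> str:
    return hashlib.sha3_256(f"{parent_hash}\n{content}".encode()).hexdigest()

def append(history: list, content: str) -> None:
    parent = history[-1]["hash"] if history else ""
    history.append({"parent": parent, "content": content,
                    "hash": entry_hash(parent, content)})

def verify(history: list) -> bool:
    parent = ""
    for entry in history:
        if entry["parent"] != parent or entry["hash"] != entry_hash(parent, entry["content"]):
            return False            # history has been rewritten or reordered
        parent = entry["hash"]
    return True

history = []
append(history, "initial check-in")
append(history, "fix: handle empty input")
assert verify(history)
history[0]["content"] = "initial check-in (quietly edited)"   # simulated rewrite
assert not verify(history)
```

This is the property that squashing and rebasing give up: once earlier entries are rewritten, the old hashes can no longer be reproduced, which is exactly the concern in the list that follows.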
From my personal point of view, LumoSQL and Sweet Lies projects are two small examples of open source security projects which must be completely sure that the source code is exactly as the developers wrote it, and that the source code has not been interfered with, and that the developers have not had their own personal data misused.\nFollowing are ways that Git and GitHub would potentially cause security problems in my projects:\nGit actively encourages users to break the Merkle tree. Rather than an inviolate historical record, Git users expect to produce a curated version of their local tree (especially with the \u0026lsquo;git rebase\u0026rsquo; command used to squash commits). This YCombinator thread discusses the pros and cons of squashing commits. It is difficult to find the descendants of check-ins in Git. It is so difficult that neither native Git nor GitHub provide this capability, and you need to write code to crawl the commit log. This makes it hard to find what descendent code may have been affected by an upstream bug or deliberate code insertion. GitHub is closed source, and since it is also strongly focussed on third-party toolchain integration, that means we cannot know how secure the toolchain is. In April 2021 there was an example of GitHub giving credentials to a compromised toolchain partner. In 2023 GitHub Actions exposed credentials. GitHub is a US-controlled company. As a risk assessment: the US has a history of actively working to insert vulnerabilities into encryption systems and believing that their NOBUS (Nobody But Us) policy can work. My projects are critical security systems, so I have to weigh the possibility that GitHub could be instructed by the US government not to inform me of an attack against my projects. Plenty of other countries have unpleasant laws on these topics of course, but the US is the one relevant to GitHub. GitHub has US Cloud issues, which correctly means it should not be used by EU developers, etc. While this is legitimately serious, it is common to all US cloud companies. Fossil may have many security issues too, but it does not have the entirely avoidable ones listed above.\nWork Done on Fossil Before I could use Fossil, I needed some changes:\nFossil was not then a commodity, off-the-shelf SCM, and I needed users to be able to just get it easily for their favourite operating system. Fossil only had one implementation, which is something I dislike about Git too. A vital standardised data format should have multiple tools that can read it. As an example of this, any proposal for an Internet RFC can\u0026rsquo;t be considered unless there are at least two independent implementations, because that is how the standard is tested. I discovered some small but significant bugs in Fossil\u0026rsquo;s Git compatibility. So I invested significantly in Fossil, and these problems were fixed:\nI became a temporary packaging intermediary with the main distributions. This has been successful\u0026hellip; recent operating systems all carry recent versions of Fossil, and this now appears to be self-sustaining. There was a lot of private community interaction to make this happen. I assisted Stephan Beal\u0026rsquo;s libfossil to roar back into life as a second, completely independent implementation of the Fossil data model. Multiple implementations are really important and being a library means the world can have multiple front-end alternatives to the official Fossil app. 
I don\u0026rsquo;t want my projects locked into Fossil any more than Github, although I am perfectly happy with Fossil for now. libfossil is great insurance. I completed a privacy review of Fossil, and debated my proposal in public. Some of that involved discussion of privacy arcanae. After being accepted as a code contributor, I have made commits to the Fossil tree. I participate in the Fossil forum, which is an efficient and friendly group discussion. Fossil as a LumoSQL Test Case Not only is Fossil a better SCM for the needs of my projects, but it is also a very demanding test case for LumoSQL. Fossil is built on SQLite, in fact Fossil and SQLite are symbiotic projects, and Fossil is the one SQLite application all SQLite developers are guaranteed to use. If Fossil can run on LumoSQL without a problem, and potentially even with some advantages, then LumoSQL will have passed a major milestone.\nNot GitLab Either This was a lesser consideration, because once Git and GitHub were ruled out that also ruled out GitLab. But it is worth recording that GitLab has a different version of the same kinds of issues as GitHub:\nGitLab is proprietary closed source wrapped around an open source core. From my experience with GitLab instances I don\u0026rsquo;t believe it is possible to host your own fully-functional GitLab - for example, with full text search. Perhaps it is possible to hack the GitLab source to add functionality back in, but I have not tried. I assume that will never be possible because of the threat to the GitLab business model, and so I moved on. GitLab integrates with many of the same third party toolchain services as GitHub, and has been affected by similar security problems as GitHub. GitLab does try to address the common inefficient Git practices with their Git Flow process. This tries to get closer to the default Fossil way of doing things, but adds a lot of overhead to do so. GitLab Inc. is incorporated in Delaware, headquartered in San Francisco, and trades on the US stock exchange. It is subject to the same US law and US cloud issues as GitHub and other US cloud companies. Codeberg/Forgejo seems a good Git Forge I have done a few experiments using Forgejo and I use Codeberg. If Git is what you are looking for (which is true for most people) then this European-hosted open source forge appears to address many of the problems of GitHub and GitLab discussed above.\nI cannot recommend Sourcehut yet in quite the same way for general use. SourceHut is impressive in its API-first, low UI, decomposable design and is a community effort, but on the whole not what most developers seem to be looking for now. I hope that the resilience and persistence of the tiny core team pays off and that the thoughtful design gets what it deserves - deployment with multiple front ends and integrations. It is wonderful that Mercurial demonstrated it could be a SourceHut backend (no other Forge does this so well, and most can\u0026rsquo;t do it at all), and Fossil certainly could become a third peer backend.\n","permalink":"https://shearer.org/articles/fossil/","summary":"\u003cp\u003eThe \u003ca href=\"https://www.fossil-scm.org/\"\u003eFossil\u003c/a\u003e source code management system is the most fully-featured alternative to Git, and has had twenty years of development and testing since 2006. After \u003ca href=\"/articles/fossil#work-done-on-fossil\"\u003ehelping Fossil make some changes\u003c/a\u003e I now use Fossil for several projects. 
I also use Git extensively on various software forges (but not GitHub unless I must). \u003ca href=\"https://en.wikipedia.org/wiki/Mercurial\"\u003eMercurial\u003c/a\u003e is actively maintained but has lost most of its mindshare since \u003ca href=\"https://en.wikipedia.org/wiki/Mozilla\"\u003eMozilla\u003c/a\u003e, Bitbucket and others migrated away, and is rarely chosen for new projects today.\u003c/p\u003e","title":"Fossil"},{"content":"LumoSQL protects data on mobile phones using a new data storage technology which is highly compatible with most existing devices. With LumoSQL, the device owner has ultimate right to decide who can read or change their data\u0026hellip; and this decision continues to be enforced even after it has been copied off the phone to (for example) a bank or insurance company for processing with their in-house database software. In contrast, the situation at present is that device owners are rarely in control of the privacy of their own data, despite many laws relating to privacy.\nIf a criminal or government officer takes a phone away from its owner, LumoSQL data cannot be read without the consent of either the phone owner or someone(s) to whom the phone owner has granted access. This is fine-grained, meaning different levels of permission can be granted.\nWhy this matters A typical mobile phone stores all its non-streaming data — contacts, messages, health records, browsing history, app settings — in several hundred small databases, all using the same software: SQLite. SQLite is very likely the most-deployed software in the world, by a factor of at least four zeros. It is likely the only trillion-scale software in existence. Web browsers use it. Operating systems use it. Vehicles use it. It is a standard data format relied on by companies such as Airbus whose products last decades.\nAnd yet SQLite stores all of this data unencrypted. Features such as whole-device encryption do not really address the problem — once the phone is unlocked, every app and every intruder can read every database. It is unacceptable to store personal data this way, but SQLite works so well and is so ubiquitous that people can\u0026rsquo;t imagine an alternative.\nLumoSQL wants to be that alternative. It modifies SQLite (rather than replacing it) so that existing apps continue to work, but the data they store gains real protection. The key idea is a Lumion — a piece of data that carries its own access rules with it, like a letter in a locked envelope where only the people named on the envelope can open it, even after it\u0026rsquo;s been forwarded. A Lumion attached to an email, saved to a cloud backup, or copied into a corporate database still enforces its owner\u0026rsquo;s permissions. The phone owner decides who can read or change each piece of data, and that decision travels with the data wherever it goes.\nThe detailed strategy document LumoSQLMotivation-1.0.pdf explains the social, business and technical pressures behind LumoSQL. As the strategy document says, \u0026ldquo;LumoSQL assumes software development should never be relied on and is getting worse\u0026rdquo;.\nSurprising facts about SQLite The following facts often surprise people:\nSQLite is a full-featured database, supporting the standard SQL language despite being tiny compared to all the other mainstream SQL databases. SQLite is open source, exceptionally well-maintained, mostly by just 3 people. Many more people contribute occasionally to SQLite, and the community of deeply technical users is very large. 
SQLite version 3 is a standard data format, and relied on by companies such as Airbus whose products last many decades. SQLite is exceptionally reliable, given the policy decisions not to change certain fundamentals. The corollaries to these unusual facts are significant:\nThis is uncharted territory for Computer Science: is SQLite\u0026rsquo;s ultra-conservative compatibility commitment to its (at least) hundreds of billions of installations the right choice? Is SQLite\u0026rsquo;s fast-moving support of formal database standards the best way forward? It does certainly work very well. Of the many forks of SQLite, none seem to have more than a relatively trivial deployed base (likely at most a few handfuls of millions). This includes seemingly very useful forks. It appears the nature of the SQLite project ensures its success and discourages replacements, but also constrains its future in important ways. The obvious strategic problems with SQLite are not widely discussed. But why? Perhaps because SQLite works so well and is so ubiquitous that people can\u0026rsquo;t imagine an alternative. The world needs a new option, compatible with SQLite. LumoSQL wants to be that option, and more. Funding and contributors LumoSQL exists due to the volunteer effort contributed by many skilled people. The Vrije Universiteit Brussel funded valuable contributions in cryptography and mathematical analysis. The NLnet Foundation funded Phase I and Phase II.\nFor developers For even more detail see the code development page.\nLumoSQL is a modification (not a fork) of the SQLite embedded data storage library. It uses Predicate Based Encryption to implement Lumions — the encrypted, self-governing data units described above. LumoSQL offers multiple key-value backend storage systems selectable by the user, and features not found in any other mainstream database:\nability to checksum every row on write and verify on read ability to trigger arbitrary functions on per-row read and write a general test suite for benchmarking precisely how LumoSQL (or SQLite) is performing and the full context of that benchmark run. For some reason database benchmarking is very poorly done, including by the TPC consortium founded solely for that purpose. a general build system able to mix and match multiple versions of the database with multiple versions of multiple backends. Never before has it been possible to compare the different strategies of various Key-Value stores with the same database frontend. If you are an SQLite user familiar with C development wanting an easier way to benchmark and measure SQLite, or if you are wanting features only available in other key-value storage engines, then you will find that LumoSQL offers new features even in its prototype stage.\nIn Phase II LumoSQL is implementing at-rest encryption and privacy using the features developed in Phase I, and readying LumoSQL for more general testing.\nNotable outcomes from LumoSQL already include:\nThe only mainstream database with swappable Key-Value stores, where all stores are peers rather than one store having special knowledge that gives it technical advantages. The two stores we have concentrated on so far are (1) the existing SQLite store with optional, binary-compatible modifications for encrypted rows and tables and the associated metadata and (2) the LMDB memory-based store which may have advantages when used with the most modern high-performance RAM-based storage hardware.
We look forward to integrating other stores, and to prove the point we also supply the ancient Oracle BDB backend as an example third store. The only mainstream database optionally without a Write-ahead Log, when using the LMDB storage backend. preliminary Benchmarking results The Not Forking tool, which avoids forks in both simple and complicated source code Exciting things in progress:\nLumions are described in a draft RFC for universal encrypted blobs with authentication. For the first time, a piece of data can have all the security rights of a full database, even if it is called \u0026ldquo;mydata.txt\u0026rdquo; and attached to an email. LumoSQL uses Lumions as rows and tables in SQLite but is just one use case. As soon as the cryptographic design has settled we will update this RFC and consult even more widely Documented API for arbitrary key-value stores Documented API for accessing the key-value stores via the SQLite library, instantly making the SQLite key-value store the most widely-distributed key-value store. Nothing calls the SQLite key-value store today except SQLite ","permalink":"https://shearer.org/articles/lumosql/","summary":"\u003cp\u003e\u003ca href=\"https://lumosql.org\"\u003eLumoSQL\u003c/a\u003e protects data on mobile phones using a new data storage technology which is highly compatible with most existing devices. With LumoSQL, the device owner has ultimate right to decide who can read or change their data\u0026hellip; and this decision continues to be enforced even after it has been copied off the phone to (for example) a bank or insurance company for processing with their in-house database software. In contrast, the situation at present is that device owners are rarely in control of the privacy of their own data, despite many laws relating to privacy.\u003c/p\u003e","title":"LumoSQL"},{"content":" I participated in many battles directly against Microsoft in the Ballmer era, 1998-2014. Every Samba feature release seemed to further anger Microsoft. Copyright and then especially patents were weaponised, as well as well-funded hit teams aimed at spreading confusion and intimidating their own (Microsoft\u0026rsquo;s!) customers. In the Nadella era from 2014-present, Microsoft and other tech giants are using even more brutal ways (paracopyright, technical protection measures and the Unitary Patent System) to coerce citizens and governments.\nHere is the PDF for my Ballmer-era Microsoft patent examination process. It is for companies/developers wanting to implement a Microsoft protocol. While still largely valid, in 2026 the landscape has shifted considerably.\nThe immediate urgency of the software patent issue went away in the US following US Supreme Court decisions, but in Europe software patents persist in a worse fashion.\nIf you are a rights holder or developer of non-physical IP, the protective actions available to you vary depending on your circumstances, but include:\njoin and/or contribute to patent pools especially the Open Invention Network. If you run an open source project you should be at least speaking with a patent pool. Hopefully a patent pool will allow you to completely avoid the difficult and time-consuming steps listed on this page. approach one of the several organisations who exist to provide pro bono advice to open source software developers scrutinise licenses from the point of view of patents. This is what I have had to do repeatedly (despite also following the previous two points.)
Here is an excerpt from my process, after navigating whether your code is covered by a safe exemption:\nFrom one of the previous steps you have a list of patents that appear to be relevant to your work implementing protocols. Microsoft has patents which it claims cover protocols you think are the same as or similar to protocols you have implemented. Your list consists only of patents that Microsoft are claiming read on that protocol, and which Microsoft say they will enforce, noting the various complicated exceptions given in steps 1 and 2. You have considered territorial implications, and from this point on you are in a standard patent evaluation scenario. Get a software patent lawyer:\n❌ A developer is not a software patent lawyer. ❌ A commercial lawyer is not a software patent lawyer. ❌ A patent lawyer is not a software patent lawyer. Your task is to compare three things with each other:\ntechnical descriptions in the patent, with the technical description in your code, with the technical description in the Microsoft protocol specification Any of these might express the same concept in very different ways, or may just seem to.\nGood luck.\n","permalink":"https://shearer.org/articles/patent-process-ballmer-era/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eI participated in \u003ca href=\"/articles/timeline-copyright-patents-samba\"\u003emany battles\u003c/a\u003e directly \u003ca href=\"/research/how-to-replace-windows-nt-with-linux\"\u003eagainst Microsoft\u003c/a\u003e in the Ballmer era, 1998-2014. Every Samba feature release seemed to further anger Microsoft. Copyright and then especially patents were weaponised, as well as well-funded hit teams aimed at spreading confusion and intimidating their own (Microsoft\u0026rsquo;s!) customers. In the Nadella era from 2014-present, Microsoft and other tech giants are using \u003ca href=\"/articles/software-patents-tpm-paracopyright\"\u003eeven more brutal\u003c/a\u003e ways (paracopyright, technical protection measures and the Unitary Patent System) to coerce citizens and governments.\u003c/p\u003e","title":"Patent process for Ballmer-era Microsoft Software Patents"},{"content":" This timeline covers the period when Microsoft decided free software and Samba in particular was an existential threat. Microsoft often buried competitors in expensive litigation, but it turned out to be much more difficult to bury open source like Samba. This was the Ballmer era, named after the then-CEO, and the history of Samba\u0026rsquo;s triumphs feels highly relevant to 2026 where other giant companies seek to prevent the rise of open source competitors.\nIn 2014, Microsoft got a new CEO and dramatically changed course from explicit hostility to embracing open source. The battleground is now about paracopyright and preventing non-US cloud but it has its roots in the great open source IP battles of the 21st century.\n1993 First Open Source Samba Version 1.5 released under the GNU General Public License v2 (GPL). 1998 Microsoft aggression strategy Halloween documents leaked. Samba targeted by strategy to decommoditise protocols. Linux labelled as \u0026ldquo;cancer\u0026rdquo;. 2001 SCO war begins SCO claims ownership of Linux code. Microsoft joins SCO; IBM backs community. Massive community engagement, ultimately successful in 2007. 2002 Patent wars EU Commission proposes software patents, massive open source resistance, finally killed in 2005. Microsoft targets Samba with CIFS licence excluding the \u0026ldquo;cancerous\u0026rdquo; GPL.
2005 Collectivist patent responses Open Invention Network (OIN) founded as a mutual defence pact against Microsoft with IBM, Red Hat, Novell, Sony, Philips, and eventually thousands of others. Software Freedom Law Center founded, advising Samba and others on patent risk. 2006 Betrayal and response Novell goes against OIN and signs patent deal with Microsoft targeting Linux and Samba. GPLv3 development teams respond with anti-Novell clause to protect Samba. Software Freedom Conservancy launched, Samba joins immediately. 2007 Major successes GPLv3 public release in Edinburgh, Scotland. Samba adopts GPLv3. Microsoft loses 13-0 (https://curia.europa.eu/juris/liste.jsf?num=T-201/04) on Samba in the EU Court of First Instance, forced to allow patent-protected access to Microsoft protocols under the strange and complex PFIF. 2008 Strange times Microsoft publishes interoperability principles plus 30,000 pages of protocol documentation in public. EU Commission imposes €899 million fine 6 days later, sceptical of Microsoft sincerity. Samba began to cooperate directly with Microsoft on protocol implementation. 2010-2012 Patent shakedowns Microsoft makes 2 billion annually from patent threats. 2014 Sinister new Nadella era Ballmer retires, new CEO Nadella changes course and loves Linux ","permalink":"https://shearer.org/articles/timeline-copyright-patents-samba/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eThis timeline covers the period when Microsoft decided free software and \u003ca href=\"/articles/samba\"\u003eSamba\u003c/a\u003e in particular was an existential threat. Microsoft often buried competitors in expensive litigation, but it turned out to be much more difficult to \u003ca href=\"library/patent-process-ballmer-era\"\u003ebury open source like Samba\u003c/a\u003e.\nThis was the Ballmer era, named after the\nthen-CEO, and \u003ca href=\"/articles/samba-historytxt\"\u003ethe history of Samba\u0026rsquo;s triumphs\u003c/a\u003e feels highly relevant to 2026 where other giant companies seek to prevent the rise of open source competitors.\u003c/p\u003e\n\u003cp\u003eIn 2014, Microsoft got a new CEO and dramatically changed course from explicit hostility to embracing open\nsource. The battleground is now \u003ca href=\"/articles/software-patents-tpm-paracopyright\"\u003eabout paracopyright and preventing non-US cloud\u003c/a\u003e but it has its roots in the great open source IP battles of the 21st century.\u003c/p\u003e","title":"Copyright, patents, Samba and Microsoft"},{"content":"In 2026, the Samba Project is nearly 30 years old and has conservatively a billion users. Samba started when I got upset at Microsoft for trying to monopolise all computer networking. I discovered some unmaintained but interesting open source software for sharing files and printers with workstation computers. And the rest is the official Samba history.\nSamba is implemented by talented software engineers with a very large number of total contributors. I was (and remain) most interested in interoperability architecture and design, why these things are needed and make sense to users. Plus some protocol analysis, for example, technical readers may know the NTLMv2 encryption scheme was tricky, but turned out to be the same as used in the NTFS filesystem - NTLM is deprecated in favour of Kerberos now but those were the days.
I wrote How to Replace Windows NT with Linux, explaining protocol-first strategies for removing Microsoft software.\nAnd above all else, the question of IP threats hung over Samba for two decades, as Microsoft used its immense financial resources to try to drive Samba off the internet.\nDuring its first decade:\n...one of the highest-prestige projects in the present open-source world is Samba --- the influential essay \"The Cathedral and the Bazaar\" Samba is a giddy story of adversarial interoperability, protocol analysis often wrongly called reverse engineering, IP rights, threats from Microsoft, cybersecurity, a giant European court case, startup companies, lawyers and engineering excellence. I got involved with Samba because I needed it to solve my own problem sharing files and printers at the University of South Australia, which was increasingly having its infrastructure taken over by the Microsoft monopoly. I could see there was a bright future for drop-in replacements for Microsoft network servers, and fortunately some engineers with talents beyond my own agreed with me. It was an interesting ride for many years!\nSamba was the first software to have the right of compatibility affirmed by the EU Court of Justice, after an epic series of cases finishing in 2012. The EU Commission learned it could fight a giant American tech corporation and win, something it is trying to do today in the Trumpian era of even more naked tech and trade aggression.\nComprehensive failure by its own measure Samba has fallen far short of its promise to be a \u0026ldquo;drop-in replacement\u0026rdquo; for Microsoft servers in the full sense including ease of deployment. That would have made it ubiquitous in every company and home in the world. Microsoft and Amazon\u0026rsquo;s hybrid cloud solutions would have looked very different. Unfortunately, no matter how advanced Samba becomes now, the opportunity for Samba to rule the world seems to have passed (or has it? see further down for news 2024-2026 \u0026hellip;)\n💡 Key Point The Samba Project is thus a thriving open source project with a billion-plus users, which nevertheless fails in its original primary objective. Samba is still developed and is still impressive. The estimate of a billion users is due to its inclusion in many embedded devices (eg printers, photocopiers and cameras) and giant file storage systems. Only Samba and Microsoft completely implement Microsoft\u0026rsquo;s vastly complex Active Directory specification, which is at the heart of a majority of the world\u0026rsquo;s corporate IT infrastructure. Samba Team engineers continue to release reliable code, with a core team holding steady at around 30 members made of volunteers and developers funded by many companies.\nReverse and Forwards Engineering/Protocol Analysis Samba started as a protocol analysis project (not strictly \u0026ldquo;reverse engineering\u0026rdquo;, which is incorrect as noted above) to provide users with the same experience as having a Microsoft server. There were some additional benefits for users due to being based on Linux and open source. After the astonishing 13-0 loss in the EU Court of First Instance (now the General Court) in 2007, Microsoft started sharing the written standards for how to communicate with their servers.\nYou can download the current protocol specifications for Windows networking, and implement them any way you like. At last we could really know what we were doing in developing Samba! 
Microsoft would no longer control all file servers and directory servers in the world!\nBut disappointingly, it didn\u0026rsquo;t turn out that way.\nWhy did Samba fail to succeed? Part of the reason Samba fell short stems from the social and psychological difficulty of turning a adversarial engineering project into forwards engineering (or, from inductive reasoning to deductive reasoning.) That is inevitably what happened when the full documentation for the SMB protocols became available. The architectural possibilities are very different if you have the documentation. The development cadence should have changed completely to reflect that. Companies were very keen to participate in this new opportunity, and deployments in the cloud were an obvious next step. None of that happened, perhaps because Samba was no longer the only alternative in the market, and was no longer driven by the brilliance of dogged individual discovery. Additional skills, such as user interfaces, were needed for Samba to become as ubiquitous as Microsoft in the server market, and those skills were never applied.\nNevertheless I am very proud of Samba, and I enjoy seeing its continued technical growth. Samba seems like it has plenty of future yet.\nWonderful news In 2024 Germany\u0026rsquo;s Sovereign Tech Fund gave Samba a substantial grant to be used for protocol engineering, with funding finishing in February 2026. While constant and advanced protocol engineering keeps Samba as a drop-in replacement on the network, or close, this probably still won\u0026rsquo;t make Samba the seamless drop-in product replacement because that would require a user interface project and the Samba code is now elderly. But perhaps it will, and in any case it is possible that ways will be found to keep the large and complicated Samba codebase fresh. Samba has been built on a tiny fraction of the budget of the Microsoft network engineering team even if you count the volunteer hours, and maybe this will balance things out a little. I hope so!\n","permalink":"https://shearer.org/articles/samba/","summary":"\u003cp\u003eIn 2026, the \u003ca href=\"https://en.wikipedia.org/wiki/Samba_(software)\"\u003eSamba Project\u003c/a\u003e is nearly 30 years\nold and has conservatively a billion users. Samba started when I got upset at Microsoft for trying to monopolise\nall computer networking. I discovered some unmaintained but interesting open source software for sharing files\nand printers with workstation computers. And the rest is the \u003ca href=\"/articles/samba-historytxt\"\u003eofficial Samba history\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eSamba is implemented by talented software engineers with a very large number of\ntotal contributors. I was (and remain) most interested in interoperability architecture and design, \u003cem\u003ewhy\u003c/em\u003e these things are\nneeded and make sense to users. 
Plus some protocol analysis, for example, technical readers may know the NTLMv2 encryption scheme was\ntricky, and turned out to be the same as used in the NTFS filesystem -\n\u003ca href=\"https://learn.microsoft.com/en-us/windows/whats-new/deprecated-features\"\u003eNTLM is deprecated in favour of Kerberos now\u003c/a\u003e but those were the days.\nI wrote \u003ca href=\"/research/how-to-replace-windows-nt-with-linux\"\u003eHow to Replace Windows NT with Linux\u003c/a\u003e, explaining\nprotocol-first strategies for removing Microsoft software.\u003c/p\u003e","title":"Samba"},{"content":" I have an interest in non-electronic computers as an educational tool. An horologer is someone who makes mechanical clocks and watches, and horologers definitely don\u0026rsquo;t believe in electronics. That\u0026rsquo;s why I published Shearer, D. (2007). \u0026quot;Communication: A Request for Collaboration.\u0026quot; Horological Journal, 149(12), p. 471., which sounds much fancier than the letter to the editor it was. Given I didn\u0026rsquo;t even know the British Horological Institute existed until a week prior, it makes me very pleased.\nI believe everyone can benefit from the principles, but especially the lawyers, politicians and those involved with privacy and law enforcement really need to know what a computer is. The law defines what can and can\u0026rsquo;t be done with a computer, and to some extent it even defines what a computer is.\nI think one of the best ways to picture the essence of a computer is to have one in your hand that uses no electronics. Electronics is great, but we don\u0026rsquo;t need it to have thinking machines.\nAs some background, on my desk I have this little beauty:\nThis has nearly 700 parts, and was invented by Willgodt T. Odhner in St. Petersburg, Russia, in 1874. It was mass-produced in Sweden 1935-1945. Many companies sold copies through the 1970s.\nIt can store numbers and do long division and multiplication - turn the handles, push the buttons, ding! But despite being an essential tool of business for a century, it\u0026rsquo;s only a calculator, not a computer. I picked it up in a flea market in Helsinki :-)\nMy hand-operated calculator:\nwould not meet most legal definitions for being a computer. It can\u0026rsquo;t store a list of instructions, and its programming (long division implemented in cogs and levers is still a program!) cannot be changed. would, when in use, meet the legal definition for processing personal data under the GDPR. I could use it to add up your expense records for the month, and the amount of the last expense and the total of all expenses would remain stored and readable in the machine when I finished. (I seriously doubt this calculator would be used to cause GDPR breaches! But it is important to understand the principle of these definitions.) In order to be a computer, it needs a decision system capable of reading a programme and executing it, or what we would call a CPU in an electronic computer.\nThe Babbage Analytical Engine was a full mechanical computer designed in 1837. The design has been demonstrated many times since to be workable, and there were even programs written for it by Ada Lovelace.
She was the first to realise the Difference Engine was much more than a calculator.\nThe 1937 Z1 computer, built in a small home in Berlin, was a fully functional mechanical computer, using electrical motors to turn and move the components:\nThe Z1 was soon destroyed in bombing raids, but it was the first recognisably modern computer.\nWhen I give a talk about \u0026ldquo;What is a Computer?\u0026rdquo; I usually play this clip from the US Navy showing their mechanical fire control computers. This is (a) fascinating and (b) a reminder that very often computing advances are first used to do a better job of killing people. This is certainly the case with AI in 2026.\n\u0026hellip; and all that explains why I wrote in the January 2007 issue of the British Horological Institute\u0026rsquo;s journal! I got several replies from horologers (real actual clockmakers!) but didn\u0026rsquo;t achieve my goal. I have made some progress though. What I really want to see exist is an actual working clockwork computer that performs useful tasks we can recognise from today\u0026rsquo;s world of computing. It\u0026rsquo;s clearly feasible.\nA Design Challenge for Horologists January 22, 2007\nDan Shearer dan@shearer.org\nUntil this month I hadn\u0026rsquo;t even heard of horology. I\u0026rsquo;m a computer scientist, occupied with what people do with electronic technology and software, and what they do to people. Over the years I\u0026rsquo;d seen clocks in museums, marvelled at the old navigators, and once I read an article on apprentice horologers in Geneva. But after meeting some lawyers recently I realised I had to learn about watchmaking.\nHere is the challenge:\nI need to design a fully clockwork computer. The computer must be a work of horology, not merely mechanical engineering. It must function recognisably like a familiar electronic computer, accepting commands from a keyboard to run programs and display results on a screen.\nThis article explains my motivations. As I did the research, I realised that with probably just two advances in horology such a design could become reality. I wrote a second article discussing in more detail the practical implementation issues involved.\nA Computer? Why? Like everyone else, I\u0026rsquo;m affected by laws involving computers. Laws tell me what I\u0026rsquo;m allowed to do with a computer, and if I become a victim of computer crime I need help from the law. But the more lawyers I met the more I realised I won\u0026rsquo;t get the help I need if the people in the legal system can\u0026rsquo;t even recognise a computer when they see one. More broadly, we live in an age where computers surround us, often invisibly – and computers process data, data that can clear me or convict me, save my life or endanger it. It is a trifle worrying that the individuals who can care for me or accuse me, educate, defend or prosecute me are likely to overlook computer data involved since they\u0026rsquo;re thinking “oh, a kind of beige box with a keyboard and screen”. How are they to realise that the laws governing the computers in their life affect them hundreds of times a day?\nSo I started looking for an unforgettable illustration. Something to show a computer is a thing that does computing. It doesn\u0026rsquo;t even need electronics, let alone a beige box. That\u0026rsquo;s what led me to clockwork. There is something homely and understandable about machinery that goes \u0026lsquo;tick-tock\u0026rsquo;, in contrast to the seeming magic of electronics.
I want people to think about the notion of computing rather than a computer.\nMy new UK passport contains a computer too, programmed (as shown by The Guardian) to give all its information to anyone that asks, without a password. If the chiefs of the Home Office understood that the new passport was as much a computer as their own laptops, might they have given their computer experts better instructions?\nHorological View of a Computer A computer is any device which can:\nobey instructions (e.g. add 48 every 1 time the instruction occurs) store a list of instructions (e.g. add 48 this time, then 36 next time, etc.) receive and remember information (e.g. when someone turns a winder) decide which instructions to do next, and when to accept information Except perhaps the last point, the list (and the numbers) should be familiar to horologists. It describes a stored program computer, something computer science calls a Von Neumann Architecture. We\u0026rsquo;ll look at components of a Von Neumann-type machine, and how they might be viewed in terms of mechanical devices. One of the most striking things is that horology already comes close to a lot of the functionality.\nInput A device that receives information, maybe from a human. Examples: Someone typing on a keyboard from a manual typewriter. The information might be in response to a question (“How old are you?”). Output Makes information available directly to humans by displaying it somehow. Most like a traditional computer would be interactive screen output via a split-flap board, like most railway stations used to have (remember the flick-whirr when it was updated?) Typewriter output on paper would be another option. Memory For storing information so it can be accessed later. The basic unit of information in computing is usually an “on” or an “off”. So if you want to store the word “Clock” it gets translated into a series of ones and zeros, which are then stored by on/off switches. Horologists know all about programmable switches, which mean “if the switch is set then take one action, if it is not set do something else”. The extra twist is to have a way of detecting whether the switch is “on” or “off”. The ability to detect switch setting is called “reading memory”. Once you can do that it is a matter of having a lot of these readable switches to give the computer a reasonable amount of memory. With these two issues solved, the ones and zeros corresponding to the word “Clock” can be written to memory by setting and unsetting a series of switches, and later read back. Arithmetic and Logic Unit For doing operations with numbers. Older readers will remember mechanical adding (or calculating) machines that were manufactured in quantity up until the late 1970s, a centuries-old idea. Besides adding, multiplying etc. there\u0026rsquo;s one or two other operations but none of these should be technically difficult to design from a horological point of view. Control Unit Executes lists of instructions, or programs. Probably the only component that doesn\u0026rsquo;t have anything in common with horology (as far as I know so far!), this unit directs the flow of events. For example looking up a number in memory and telling the Arithmetic and Logic Unit to add 48 to that number, then store the result somewhere else in Memory, or maybe Output it. The Control Unit is the real brains of the show, and is in charge of executing programs. 
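Those components can be tied together in one small worked example. The sketch below is in Python rather than clockwork, purely as an illustration of the division of labour just described; the instruction names and layout are invented for this example, and the point is only that the program is itself data held in Memory, fetched and acted on by the Control Unit.

```python
# A toy stored-program (Von Neumann style) machine: Memory holds the program,
# the Control Unit fetches and decodes instructions, the Arithmetic and Logic
# Unit does the sums, and Input/Output talk to the outside world.

memory = [
    ("INPUT",),      # 0: take a number from the Input and keep it in the accumulator
    ("ADD", 48),     # 1: the ALU adds 48 this time...
    ("ADD", 36),     # 2: ...then 36 next time, as in the example above
    ("OUTPUT",),     # 3: make the result available on the Output
    ("HALT",),       # 4: stop
]

def run(program, input_values):
    accumulator = 0              # a single working register
    program_counter = 0          # where the Control Unit is up to in the instruction list
    inputs = iter(input_values)
    outputs = []
    while True:
        instruction = program[program_counter]   # fetch
        operation = instruction[0]               # decode
        if operation == "INPUT":
            accumulator = next(inputs)
        elif operation == "ADD":
            accumulator += instruction[1]
        elif operation == "OUTPUT":
            outputs.append(accumulator)
        elif operation == "HALT":
            return outputs
        program_counter += 1                     # step on to the next instruction

print(run(memory, [10]))   # prints [94], i.e. 10 + 48 + 36
```

Because the instructions live in the same Memory as the numbers, changing the program is just a matter of setting different switches, which is the difference between this machine and the Odhner calculator described earlier.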
The Missing Magic Having these components of a notional computer is all very well in theory, but they aren\u0026rsquo;t quite enough for a useful computer. Computer science has come up with some ways of tying them together, one of which is straight out of horology.\nBus An information channel between the foregoing components. Implementing this in clockwork will require some ingenuity. In a silicon computer the Bus is like a copper wire linking the memory, control unit and so on, allowing electricity to travel between them. With horology we need to get information (such as the word “Escapement”) from the Memory to the Output, or from the Control Unit to the Arithmetic and Logic Unit. An example (though I don\u0026rsquo;t suggest it is necessarily feasible!) might be to have an oscillating central bar containing whiskers that can be pushed in and out to indicate different values, where the whiskers are adjusted by levers immediately next to the levers used to read the values and each oscillation moves the location of the whiskers from the setting levers to the reading levers. I\u0026rsquo;m not covering implementation challenges in this article, but it\u0026rsquo;s worth reflecting that Bus speed is a vital issue for how practical this computer will be. Clock Signal A single master beat that is used to synchronise all other activity in a computer. If we\u0026rsquo;re fetching information from Memory using the Bus or performing a calculation, the Clock Signal is the only way of making sure we\u0026rsquo;re not tripping over ourselves by using the wrong number, or the right number twice etc. Increasing the speed of the clock signal – assuming all the other components can keep up – is one way of speeding the entire computer up. Storage Like Memory, but lasts longer and is usually bigger. A mechanical equivalent of a filing system. You put information in and can get it back out when you want it. A storage system can be punched cards, or pianola-like punched paper rolls, or small plastic cards with very fine ridges and dips after the style of a music box\u0026rsquo;s data. There have been storage systems in use since the early days of the industrial revolution, and I\u0026rsquo;ll be surprised if there isn\u0026rsquo;t at least one horological tradition of using them somehow! The Other Reasons Why A clockwork computer may actually be useful for reasons other than educating Her Honour.\nPhysical Longevity: We have a good idea what happens to clockwork after a few hundred years, but there are real question marks surrounding all forms of silicon computers. Nobody really knows what happens to transistors as the centuries roll by, and if you need a computer for a simple task such as controlling the doors in a long-term nuclear waste storage facility, perhaps clockwork might be better. Watchmaking techniques and materials can produce such tiny and reliable systems that they may be worth considering for these tasks.\nPhysical Robustness: There are a few physical environments where intense radiation makes electronic computing inherently unreliable. For very simple tasks, might clockwork computing be useful?\nMicromechanics: A lot of research is being put into machines made of components that are truly tiny. Scientists are creating gear wheels that are a comfortable size for an ant to pick up, and have been experimenting with tiny geartrains, levers and so on since the 1980s. This is a very practical field of research and there are results in production now. 
One of the interesting things about micromachines is that they can often be mass produced using photolithographic techniques. A practical design for a clockwork computer might be able to be applied at this scale of engineering. I am cautious because friction is more important in microengineering, not less, but perhaps some of the other physical effects, such as inertia at high oscillation rates, may compensate.\nConceptual Longevity: A generation of silicon-based computing equipment lasts maybe two years before becoming obsolete. When communicating with far-distant generations, it might be wisest to provide the design for a conceptual clockwork computer and then the programs that can run on that, rather than anything electronic. Nobody has ever built a Babbage Analytical Engine (see the next article for more about Charles Babbage and his mechanical computer from two centuries ago) but there is a computer simulation of it capable of running programs written by Babbage and his students. A communication consisting of a series of computer programs accompanied by schematics of a physical computer that will certainly run these programs is an extremely clear communication. Any technically sophisticated person would merely implement an emulation of the computer rather than the actual clockwork, but they will have no difficulty understanding the design because it is built on simple mechanical principles.\nConclusion Why horology? I could have approached roboticists, who spend their lives at the mechanical end of computing. But I think a roboticist has rather too much silicon thinking already, and besides they like to use hydraulics and other very clunky techniques. I can imagine a computer without electronics that is as incomprehensible in its design as any silicon computer! Using techniques of robotics seems as far from horology as Babbage\u0026rsquo;s mechanical engineers. I want that \u0026rsquo;tick-tock'.\nI\u0026rsquo;m also intrigued by my reading so far that very little seems to have changed in horological principles in the last 120 years or so. Techniques and tolerances have improved, and modern materials and tools are a help. But there hasn\u0026rsquo;t really been a need for there to be a fundamental advance in horology. The history of technology shows that where there is a clear need, sooner or later innovation meets that need. Might a clockwork computer be a way of advancing horology fundamentals for the first time in more than a century?\nIn the next article I\u0026rsquo;ll consider some of the design issues. I\u0026rsquo;m looking for horological expertise to help draw up a basic design. In fact, I\u0026rsquo;m even looking for someone who knows how to make a design for a watch, because I certainly don\u0026rsquo;t! If you are interested, do please contact me, dan@shearer.org.\n","permalink":"https://shearer.org/notes/design-challenge-for-horologists/","summary":"\u003cdiv class=\"article-intro\"\u003e\n\u003cp\u003eI have an interest in non-electronic computers as an educational tool. An horologer is someone who makes mechanical clocks and watches, and horologers definitely don\u0026rsquo;t believe in electronics. That\u0026rsquo;s why I published \u003ccode\u003eShearer, D. (2007). \u0026quot;Communication: A Request for Collaboration.\u0026quot; Horological Journal, 149(12), p. 471.\u003c/code\u003e, which sounds much fancier than the letter to the editor it was. 
Given I didn\u0026rsquo;t even know the \u003ca href=\"https://bhi.co.uk/\"\u003eBritish Horological Institute\u003c/a\u003e existed until a week prior, it makes me very pleased.\u003c/p\u003e","title":"A Design Challenge for Horologists"},{"content":"How to manage BibLaTeX across time and cultures I wrote a paper in English using LaTeX on the topic of Epidemiology and One Health. Some essential references did not exist in English. That might sound simple \u0026mdash; just list the originals, plus some translation/cross-referencing work to get the necessary information! It isn\u0026rsquo;t that simple.\nThis howto is for LaTeX authors with references which are less common in computing/mathematics but otherwise unremarkable, particularly: non-latin scripts, latinisations, non-English references, rare scripts and ancient documents. My sources had all of these at once, giving me the following situation:\nCategory Range in my references Language non-latin scripts Chinese, Sanskrit, Spanish, Arabic Language latinisations Chinese, Sanskrit, Arabic Eras ancient (3400BCE), less ancient (900CE), modern historical (1898) Right-to-left Arabic Dates precise, approximate, and ranges \u0026ldquo;Ancient\u0026rdquo; here refers to the reference text, not the subject of the text. My paper has references analysing Neanderthal cultures, but luckily Neanderthals did not publish books (as far as we know) so there were no awkward dates from deep time.\nThese considerations were entirely outwith my experience, and, as shipped by default and used in the UK, all anglocentric computer systems struggle with them. LaTeX relies on the 50-year-old latin-centric TeX engine underneath, on top of which the modern LuaTeX project has implemented Unicode language support. Besides these, there are anglo-normative problems. The arrangement and ordering of people\u0026rsquo;s names often differs from English. And sometimes the defaults are just strange, for example automatically cutting down a long list of authors into \u0026ldquo;and others\u0026rdquo; without asking (rude!)\nThe way LaTeX and BibLaTeX work is that the environment is set up in the .tex file, and then conventions in the .bib file match what the environment is looking for.\nFirst, check your fonts 🔤 Why a font rendering test? LaTeX/PDF fonts are not the same as web fonts, and you are reading a web page right now that is talking about PDF fonts. I have made sure that the correct Web font WOFF2 files are present, but that still doesn\u0026rsquo;t mean you will be able to see them! If the text below appears to be proper script rather than lots of \u0026lsquo;□\u0026rsquo; characters then your browser and operating system are handling the fonts correctly:\nLatin: Test of latin characters with diacritic accents: · Lěng Kǎitài · Dā’irat al-Ma‘ārif\nChinese: 张伟 · 鲁迅 · 冷開泰 · 中文测试\nArabic: كتاب الحاوي في الطب · العربية\nSanskrit: सुश्रुतसंहिता · देवनागरी\n⚠️ Missing characters? (click to troubleshoot) Reasons why the web fonts from this site may not work for you:\nFirefox/Chrome on Linux: Check that you have CJK, Arabic, and Devanagari font packages installed at the system level (e.g., fonts-noto-cjk, fonts-noto-core, fonts-hanazono on Debian/Ubuntu). Browsers can sometimes fall back to system fonts even when web fonts are provided. macOS: Safari and Chrome use system font caches; if you previously viewed this page without fonts loaded, you may need to reload with Cmd+Shift+R. 
Windows: Ensure \u0026ldquo;Optional features\u0026rdquo; for Chinese, Arabic, and Indic languages are installed in Windows Settings → Time \u0026amp; Language → Language \u0026amp; Region. Mobile browsers: iOS Safari and Chrome Android aggressively prune fonts; the WOFF2 files may not load if they exceed a certain size threshold or if \u0026ldquo;Low Data Mode\u0026rdquo; is enabled. Content blockers: Extensions that preserve privacy and block ads may also block font loading from same-origin or cross-origin requests. @font-face parsing: Some older browsers or strict security configurations reject the CSS @font-face declarations if font-family names don\u0026rsquo;t match exactly between CSS and the WOFF2 internal metadata. Manual installation links if web fonts fail:\nChinese □: Source Han Serif SC Arabic □: Amiri font Sanskrit □: Noto Serif Devanagari The practical LaTeX Following are excerpts from my .bib and .tex files. Your paper may have different requirements, and these are a mixture of mandatory correct usage of BibLaTeX, plus the occasional useful convention I invented.\nAuthors and titles For authors and titles with non-latin characters, always use this form in your biblatex file:\nauthor = {张伟}, shortauthor = {Zhang Wei}, nameaddon = {Zhang Wei}, The two identical English approximations are used differently by BibLaTeX: shortauthor is rendered in the main text eg:\nLeith residents built a big wall[Zhang Wei 2025] but for the same entry nameaddon is rendered in the bibliography eg:\nZhang Wei 张伟. Building the Great Wall of Corstorphine. This really matters when things get more complicated, as you\u0026rsquo;ll see.\nLatinisation Where there are latinised versions of Chinese/Arabic/Sanskrit author names, they must appear in the nameaddon field. This wasn\u0026rsquo;t a problem in the example above because the only name used is the English approximation. However there is often a latinised version that retains features of the original which English cannot express. For example:\nauthor = {鲁迅}, shortauthor = {Lu Xun}, \u0026lt;-- widely used English approximation nameaddon = {Lǔ Xùn}, \u0026lt;-- latinised equivalent (in this case pinyin) For English users, even though the script is latinised, these systems still often require a specific font to be installed, due to the accents needed for specific sounds or grammatical features.\nThere are maybe 100 or so latinisation systems for encoding non-latin languages. Here are some examples:\nScript/Language Romanization System Chinese Pinyin Arabic Latin-i harakat Japanese Hepburn romanization (Hebon-shiki) Sanskrit IAST (International Alphabet of Sanskrit Transliteration) Korean Revised Romanization of Korean Russian BGN/PCGN romanization Thai RTGS (Royal Thai General System) Serbian Gaj\u0026rsquo;s Latin alphabet There is a similar but slightly different trick for handling latinisations in titles:\ntitle = {كتاب الحاوي في الطب}, titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb}, \u0026lt;-- Latin-i harakat Which latinisation for references? Latinisation systems exist for people who prefer to use latin scripts for ease and speed, and in many cases as a response to ubiquitous Western-derived computer technology. Some of these systems are rapidly evolving, for example Chinese which now has Shuangpin, or double-pinyin.\nSome language families have a large number of different Latinisation systems. Arabic has three main systems: DIN, ALA-LC and Hans Wehr (a less offensive everyday term covering them all being \u0026ldquo;Latin-i harakat\u0026rdquo;.) 
Japanese has both Hebon-shiki (called Hepburn in English) and Kunrei-shiki, while Korean has two latinisations, Manding has three N\u0026rsquo;ko latinisations, and so on. Which should be used in references? Without specialist advice there will always be uncertainty, so the reliable choice is to always include the original script as well.\nTranslations There is a problem in the Arabic title given above, because while titleaddon contains the official latinised script it still lacks an English translation. It\u0026rsquo;s great to have the latin script so you know what to search for if you don\u0026rsquo;t read Arabic characters (you can still copy/paste Arabic and that can be essential, but even in 2026 some computer systems still don\u0026rsquo;t handle Arabic very well). So if there is an English translation of a title or an author, it is helpful to add it.\nIn this case the translated title should be in the note field, as follows:\nnote = {Translated as: The Comprehensive Book of Medicine} This isn\u0026rsquo;t just for Arabic; the same is true for Chinese. Chinese is a great example of this difference: pinyin latin equivalents are often supplied, but this is not a translation. There are often many ways to translate a given title to English. Where translations are few, partial or obscure, the English translation of the title/author may be so misleading to readers you are better off using the original. Even if you have no knowledge of the language and don\u0026rsquo;t read the script, a search engine is more likely to find information about a rarely-translated author if you use their native Chinese/Arabic/etc. name.\nRight-to-left scripts For left-to-right scripts (the default for Chinese/Japanese/Korean (CJK) and latin-based languages) the above conventions will work. These conventions can seem as though they work for Arabic, but there is still a problem due to the script being written right-to-left. Biber detects the arabic text and switches to right-to-left so that the Arabic script is correct; unfortunately it also switches all text in the reference, including latin characters, whether English or latinisation. So an Arabic reference containing a latin field, as they normally do, will have latin fields rendered like this:\n\u0026#39;Medicine of Book Comprehensive The\u0026#39; or \u0026#39;enicideM fo kooB evisneherpmoC ehT\u0026#39; depending on context. To fix this, set the default language in the preamble to a left-to-right language such as \u0026lsquo;british\u0026rsquo; (as in this bibliography) or \u0026lsquo;chinese\u0026rsquo;. Then preserve the Arabic text in the reference exactly as it is written by enclosing it in double braces like this:\ntitle = \\textarabic{{كتاب الحاوي في الطب}}, Preservation Preservation with double braces is useful elsewhere too. Another common problem is that many latinised scripts contain special characters requiring double braces like this Arabic example:\npublisher = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}} ^ ^ ^ \\-----------\\--------\\--- breaks biblatex without {{double}} The double braces preserve the string exactly as written. Another example, which applies to normal English BibLaTeX, is the first/last author default assumption:\nauthor = {Dundee Museum} will render as \u0026lsquo;Museum, Dundee\u0026rsquo;, unless you say\nauthor = {{Dundee Museum}} Dates Dates use the EDTF (ISO8601-2) standard. BibLaTeX handles BCE/CE dates correctly but also avoids prefixes when they would be pointless or distracting. 
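To make the date forms concrete, here are a few illustrative fields side by side. The values are invented; the syntax follows the same EDTF forms as the origdate in the sushruta_samhita_1907 entry further down, and of course only one date field appears in a real entry:

date = {1898-05-17}, an exact day
date = {1898/1902}, a range of years
date = {0900~}, an approximate year, printed with a circa marker when datecirca is enabled
date = {-0599~/-0499~}, an approximate BCE range, the same form as the Suśruta entry below

Negative years are how BCE is expressed, which is why -0599/-0499 corresponds to circa 600 BCE to 500 BCE in that entry's note.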
The full syntax for dates is in the BibLaTeX user manual.\nLong lists of authors For very long lists of authors (such as [Meisner2024] in this bibliography) include all authors separated by \u0026lsquo;and\u0026rsquo;, rather than saying \u0026lsquo;, and others\u0026rsquo;. Biber has been setup in the preamble with max/mincitenames and maxbibnames so that it will render \u0026ldquo;et. al.\u0026rdquo; in the text, but render all names in the bibliography.\nFull example Here is a fictional example in full, handling both Chinese and right-to-left Arabic, with translations in latin script and correct use of -addon and note fields.\nauthor = {冷開泰}, name shortauthor = {Leng Kaitai}, English approximation nameaddon = {Lěng Kǎitài}, correctly latinised title = \\textarabic{{كتاب الحاوي في الطب}}, Right-left preserved titleaddon = {Kitāb al-Ḥāwī fī al-ṭibb}, latinised note = {Translated as: The Comprehensive Book of Medicine},English translation publisher = {{Dā’irat al-Ma‘ārif al-‘Uthmāniyyah}} latinised, with unsafe quotes NB \u0026lsquo;authoraddon\u0026rsquo; is not a valid field name, although it would seem logical that it would be instead of shortauthor.\nEntire real-world references look like this:\n@book{sushruta_samhita_1907, title = {The {Suśruta-Saṃhitā}}, titleaddon = {\\textsanskrit{सुश्रुतसंहिता}}, author = {Suśruta (composite work)}, nameaddon = {\\textsanskrit{सुश्रुत}}, translator = {Bhishagratna, Kaviraj Kunja Lal}, date = {1907}, origdate = {-0599~/-0499~}, publisher = {Calcutta}, url = {https://wellcomecollection.org/works/vnqskk8w/items?canvas=98\u0026amp;manifest=2}, note = {English translation of the original Sanskrit text (circa 600 BCE--500 BCE), including discussion on transmissibility. The \\href{https://www.wisdomlib.org/hinduism/book/sushruta-samhita-volume-2-nidanasthana/d/doc142863.html} {Wisdom Library translation} appears to be similar.}, keywords = {ancient}, } LaTeX setup for this bibliography %% %% Font setup for Latin and Chinese %% % The following Chinese font support requires an exact font name match Eg on my Linux: % I install the package adobe-source-han-serif-cn-fonts, followed by \u0026#39;fc-list | grep \u0026#34;Han Serif\u0026#34;\u0026#39;. % For reference, on my system \u0026#39;fc-list | grep Han\u0026#39; gives 136 lines, because % I installed all the Adobe CJK fonts (Chinese, Japanese, Korean) as recommended by CJK % experts. % The order of package loading really matters because some of the references use bidi % (bi-directional) text to display the relevant Arabic, for which there is no % translation to a Western Language. bidi was a retrofit onto latex and is a bit sensitive. % If bidi wasn\u0026#39;t needed, packages could be loaded in any order. These problems are steadily % reducing as lualatex is developed. Lualatex is really quite an impressive redevelopment. % Maths comes first in a bidi world \\usepackage{amsmath} \\usepackage{amssymb} % Fonts next for bidi ordering reasons \\usepackage{fontspec} \\usepackage{luatexja-fontspec} % CJK handling (not just ja). No equivalent needed for other languages. % Not needed at all except for bidi. It \u0026#34;stabilises arrays for bidi\u0026#34; according to experts. % I don\u0026#39;t understand but it did make errors go away. \\usepackage{array} % Polyglossia is needed to do language-aware hyphenating, date formats, quote style etc % in at least the csquotes and biblatex packages. Replacement for the older babel package. 
\\usepackage{polyglossia} \\setmainlanguage{english} \\setotherlanguage{arabic} \\setotherlanguage{sanskrit} % no \\setotherlanguage above for chinese, because luacjk handles this and we don\u0026#39;t want % panglossia and luacjk to get into a fight about who captures the incoming CJK unicode. % This potential clash is a recurring theme in this preamble. \\newfontfamily\\arabicfont[Script=Arabic]{Amiri} \\newfontfamily\\arabicfonttt[Script=Arabic]{Amiri} \\newfontfamily\\chinesefont{Source Han Serif SC}[ Renderer=Harfbuzz, Script=CJK, AutoFakeSlant=0.2, % CJK doesn\u0026#39;t have italics, so this does a bit of judicious tilting AutoFakeBold=2 ] \\ltjsetparameter{jacharrange={-1}} % Prevents luatexja from being too aggressive and % seizing all Unicode text that might be CJK, including parts of biblatex references % that are merely adjacent to CJK text. \\setmainjfont{Source Han Serif SC}[ Index=2, Renderer=Harfbuzz, AutoFakeSlant=0.2, CharacterWidth=Full, % Forces better mapping of CJK punctuation BoldFont={* Bold} % Explicitly point to the bold weight so Biber knows it exists ] % The Index=2 above is about mandating which version of a font to pick inside a TrueType collection. % In this case, 2 is Simplified Chinese. Latex sometimes gets confused and picks (say) % the Japanese version, so we are explicit. 0=Japanese, 1=Korean, 2=SC, 3=TC. % Now repeat the above, only for sans not gothic. This is a trick, the point being % that if latex wants to use a Han sans font, it will now use gothic instead. Reduces errors % and the result seems good. Need to check with a CJK expert. \\setsansjfont{Source Han Serif SC}[ Index=2, Renderer=Harfbuzz, AutoFakeSlant=0.2, CharacterWidth=Full, BoldFont={* Bold} ] % The above three commands (\\ltjsetparameter, \\setmainjfont, \\setsansjfont) collectively % avoid hundreds of warnings about missing fonts, and emit a better quality result. \\newfontfamily\\devanagarifont[ Script=Devanagari, HyphenChar=None, % Explicitly disable hyphenation, otherwise bibtex warns it can\u0026#39;t load hyphenation rules ItalicFont={Noto Serif Devanagari}, % This script doesn\u0026#39;t have italics or bold so we map them. BoldItalicFont={Noto Serif Devanagari Bold} % Unlike CJK where we fake slant. Also stops biblatex warnings. ]{Noto Serif Devanagari} \\setmainfont{TeX Gyre Pagella} \\newfontfamily\\bigquotefont{TeX Gyre Cursor} % used for big block quotes % The Gyre project (https://www.gust.org.pl/projects/e-foundry/tex-gyre/index_html) % explains it all, but basically these are TeX and OpenType font families similar to % the well-known commercial fonts with similar names, with significantly more functionality. % On my Linux I installed the package tex-gyre-fonts . % Disable all CJK small caps attempts. CJK doesn\u0026#39;t have smallcaps in the fonts, % but biber\u0026#39;s default is smallcaps for authors. It generates a warning when it can\u0026#39;t % and this avoids large numbers of warnings. \\let\\scshape\\upshape \\let\\textsc\\textup %% %% Referencing and quoting setup %% \\usepackage{authblk} \\usepackage[ backend=biber, style=authoryear, % alternatives include numeric, apa, etc. doi=true, url=true, isbn=false, datecirca=true, % prints \u0026#34;circa\u0026#34; if date is followed by a slash /. Don\u0026#39;t use tilda ~ convention. 
dateera=secular, % handles BCE and CE by printing \u0026#34;BCE/CE\u0026#34; dateeraauto=1600, % adds CE/BCE to anything before this year CE backref=true, % great idea, but sometimes gets confused with the preview feature dateabbrev=false, language=auto, % will change language according to langid, if present in an entry autolang=other, % Use polyglossia/babel environments (but I am unsure why I need to set it) maxcitenames=2, % Keep citations short: (Zhang et al., 2024) mincitenames=1, maxbibnames=99, % List all authors in the bibliography (who doesn\u0026#39;t? rude!) uniquelist=false % Prevents BibLaTeX from adding names to disambiguate ]{biblatex} % let long DOIs and URLs in bibliography break, avoiding overfull and other errors, % and looks nicer. \\setcounter{biburllcpenalty}{7000} \\setcounter{biburlucpenalty}{8000} \\setcounter{biburlnumpenalty}{9000} % Forces a gap between bib entries that PDF viewers can recognise as a boundary when doing % a mouseover preview in the main text. Also just makes a bibliography look nicer. \\setlength{\\bibitemsep}{1.5\\itemsep} % These mappings make sure biblatex doesn\u0026#39;t start translating locale specific things % like date formats or \u0026#39;Appendix\u0026#39;, \u0026#39;Bibliography\u0026#39; etc. Other languages are merely content. % \u0026#39;british\u0026#39; is equivalent to the modern en_GB locale standard. This also means that the % default is left-to-right even in an entry containing arabic text enclosed in \\textarabic{}. % This also suppresses error messages from biblatex about \u0026#39;Language not supported\u0026#39;. \\DeclareLanguageMapping{arabic}{british} \\DeclareLanguageMapping{chinese}{british} \\DeclareLanguageMapping{sanskrit}{british} % Force always printing \u0026#39;nameaddon\u0026#39; after the author name. I use nameaddon exclusively for latin versions of % Chinese (etc) names, so the effect is to render the real, untranslated name in the % references. See notes at top of biblatex file for details of translation in \u0026#39;note\u0026#39; field, % and the special case of right-to-left arabic script in author names. This macro has % completely replaced the authoryear macro, so it also made dates vanish until I added back here. % When forcing printing of nameaddon with here, we must use the nameaddon field not shortauthor. \\AtBeginDocument{ \\renewbibmacro*{author}{ \\printfield{nameaddon} \\setunit{\\addspace} \\printnames{author} \\setunit{\\addspace} % This macro handles the label (derived from the \u0026#39;date\u0026#39; field) % while respecting BCE/CE and circa formatting according to EDTF (ISO8601-2) dates. % Modern lualatex tries to be standards-compliant. \\usebibmacro{date+extradate} } } % Make biblatex do quotation handling as expected for a paper \\usepackage{csquotes} % The following macro seems to approximate the style I see in academic papers. 
\\DeclareCiteCommand{\\parencite} [\\mkbibbrackets] % replaces parentheses with square brackets {\\usebibmacro{prenote}} {\\usebibmacro{citeindex}% \\usebibmacro{cite}} {\\multicitedelim} {\\usebibmacro{postnote}} \\addbibresource{discovering-epidemiology.bibtex} ","permalink":"https://shearer.org/articles/scripts-and-languages-in-biblatex/","summary":"\u003ch1 id=\"how-to-manage-biblatex-across-time-and-cultures\"\u003eHow to manage BibLaTeX across time and cultures\u003c/h1\u003e\n\u003cp\u003eI wrote a paper in English using \u003ca href=\"https://www.latex-project.org/\"\u003eLaTeX\u003c/a\u003e on the topic of \u003ca href=\"/research/one-health-epidemiology\"\u003eEpidemiology\nand One Health\u003c/a\u003e. Some essential references did not exist in English. That might sound simple\n\u0026mdash; just list the originals, plus some translation/cross-referencing work to get the necessary information! It isn\u0026rsquo;t that\nsimple.\u003c/p\u003e\n\u003cp\u003eThis howto is for LaTeX authors with references which are less common in computing/mathematics but otherwise\nunremarkable, particularly: non-latin scripts, latinisations,\nnon-English references, rare scripts and ancient documents. My sources had all of these at once,\ngiving me the following situation:\u003c/p\u003e","title":"BibLaTeX, eras and scripts"},{"content":"(written in 2008)\nHow a young Australian discovered Open Source and a career. Eventually learning that a mixture of code, law and mathematics is a frontier for human rights battles.\nIt isn\u0026rsquo;t often I come face to face with myself after a twenty-something year break, but I did yesterday.\nAs a first year university student at the South Australian Institute of Technology in Adelaide I did landscape gardening oddjobs for companies. I noticed a company called Australian Launch Vehicles (ALV), which sounded very cool, so in I went. ALV was founded by a pair of entrepreneurial rocket scientists. Despite decades of rocketry history in South Australia, there was no local space industry. (Establishing Australian spaceflight in 1987 was ambitious; they failed but others are giving it a go.)\nGilmour Space TechnologiesThe Australian Eris Block 1 in November 2025\nThe founders kindly spent time talking to me, and explained that one of their biggest problems was that ground control software would be hideously expensive. Software? Now I was hooked. That comment had unintended consequences.\nThanks to my parents\u0026rsquo; foresight and provision I had already been using FidoNet PC modem-based bulletin board (BBS) networks in my upper school years. FidoNet was freely-distributed software and I had always been fascinated that you could actually talk to the developers.\nFidoNet is still running and useful to people the internet doesn\u0026rsquo;t reach. The FidoNet logo was a dog with a diskette in its mouth - the 80s version of emojii:\n__ / \\ /|oo \\ (_| /_) _`@/_ \\ _ | | \\ \\\\ | (*) | \\ )) ______ |__U__| / \\// / FIDO \\ _//|| _\\ / (________) (_/(_|(____/ FidoNet logo by John Madill\nAt the South Australian Institute of Technology I discovered another pre-Internet forum technology called Usenet, which did not run on PCs but on large room-sized computers and which was more extensive than the early internet at the time. Usenet also still exists today. I found it amazing, all those people doing what we now regard as normal Internet activities. 
I wrote a crude search engine that would crawl Usenet for my keywords overnight and email me relevant articles, and it seemed like an amazing kind of superpower at the time, even though the amount of information then available seems tiny today, and users were all either in academia or technical occupations.\nI kept noticing the contrast between the software development model used to create Usenet and how software written in the commercial world worked, where small companies and individuals working in isolation sold floppy disks of their work in the post or in shops. In 1988 the Usenet network itself had downloadable source code, patches and fixes being emailed out so you could keep up to date, and new software versions on a daily basis. Just like open source internet-based computing today.\nI was so enthused by what collaborative software development could do for spaceflight that I sketched out a plan, and posted to FidoNet in 1989 asking for help, although I had to ask someone with a mailbox to receive replies for me! I did have a University internet email address though, and with some help I posted this Usenet message you can still see today:\nThe South Australian company Australian Launch Vehicles is progressing well with its proposal for a low cost, unmanned, nonmilitary rocket to launch light satellites into low earth orbit. Significant commitment from engineering companies, component manufacturers and potential customers - both locally and internationally - indicate that the innovative concept has sufficient support to carry it through to completion. The simplicity of the design is such that the computational requirements will be within the power of a modern personal computer. Until recently it was assumed that the software needed for this computer (and also for the modest ground control installations) would be produced by one of the many commercial companies able to do so. However, it has been suggested that the software needs and other computing related issues could be better met by a coordinated effort in the international public domain. Software so produced would remain within the public domain, freely accessible to any interested parties. : (This message is posted on behalf of an Institute student who has been in touch with Australian Launch Vehicles in South Australia\u0026#39;s Technology Park. Mailed replies can be sent to him, Dan Shearer, MA870894z@levels.sait.oz.au ... ( Strangely\u0026hellip; once upon a time email addresses in Australia ended in \u0026ldquo;.oz\u0026rdquo;. Unfortunately as the internet grew it was agreed that the standard country code should be used, so they tacked on a \u0026ldquo;.au\u0026rdquo;, and nobody uses .oz anymore.)\nClearly I was feeling enthusiastic, because I also emailed the same message to the electronic postmasters of every organisation that listed its address in Usenet maps, turning me into a spammer before I had heard of the word - and people didn\u0026rsquo;t seem to mind either, which would be pretty weird now. It is an odd feeling re-reading my words as a twenty-year-old! I was deluged with hundreds of responses, many from seasoned computing and/or aerospace professionals who gave their time to an idea that seemed to touch a nerve. I spent many weeks corresponding with people all over the world. Best of all the Institute Computer Centre gave me the rare privileges of disk space and Internet access on my account on the VMS computer cluster. 
It wasn\u0026rsquo;t their job but I am forever indebted to VMS supremo Rollo Ross for letting me loose.\nAfter a while I decided it really might be possible to write and test rocket launch control software. The director of Research for the Institute (Professor David Lee, still at UniSA) and the head of Computer Centre (Chris Rusbridge) came with me and talked to the rocket scientists. One of them in particular, Peter Winch, suggested an angle I could tackle. So then I went around the Institute (being completely unused to how academics work, and the way they say things) and put together an alternative project and posted followup, this time with my rare privilege of being able to write to the Usenet forum sci.space describing what I did with Professor David Lee. My project never had much of a chance, because the main partner was Australian Launch Vehicles and after a period of trying gloriously they went out of business.\nThe whole experience started me off on something new. I had felt the power of a technical discussion where highly competent people treated me as an equal, over a global network. I discovered and wrote tools that let me analyse what people were saying anywhere on Usenet, and discover who was likely to have similar interests to me. And I learned that global development of source-available software had been going on for decades.\nI was particularly interested to see what could be done with collections of this free software, and what it was like to work on internet mailing lists writing it. So I set myself to learn everything I could. Eventually, years later, this kind of software became respectable, and got a name. Open Source Software. In 2024, Open Source Software flies in every spaceflight I am aware of, although the control software is not all Open Source.\nAnd ALV, the launch company? All the people have moved on of course but the internet hasn\u0026rsquo;t entirely forgotten. Peter Winch spoke at the 1st Australian Space Conference in 1990 alongside Buzz Aldrin. I was able to ring him up at an industrial plant\u0026hellip; \u0026ldquo;So, remember when you were a rocket scientist in Adelaide\u0026hellip;\u0026rdquo;. We had a great old chat :-)\nAnd now as our rights to a private life and even private thoughts are under assault from the ever-more connected digital age, Open Source software is one of the few things that stand a chance of being able to help. The briefest outline of this argument is:\nProgram source code is the only way of determining what a program is doing Transparency is required for security, and so program source code is required Security is a requirement for privacy ","permalink":"https://shearer.org/notes/chemical-rockets-to-open-source/","summary":"\u003cp\u003e(written in 2008)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHow a \u003ca href=\"/notes/radio-waves-to-random-number-generator\"\u003eyoung Australian\u003c/a\u003e discovered Open Source and a career. 
Eventually learning that a mixture of code, law and mathematics is a frontier for human rights battles.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIt isn\u0026rsquo;t often I come face to face with myself after a twenty-something year break, but I did yesterday.\u003c/p\u003e\n\u003cdiv style=\"display: flex; justify-content: space-between;\"\u003e\n\u003cdiv style=\"width: 48%;\"\u003e\n\u003cp\u003eAs a first year university student at the \u003ca href=\"https://en.wikipedia.org/wiki/South_Australian_Institute_of_Technology\"\u003eSouth Australian Institute of Technology\u003c/a\u003e in Adelaide I did landscape gardening oddjobs for companies. I noticed a company called Australian Launch Vehicles (ALV), which sounded very cool, so in I went. ALV was founded by a pair of entrepreneurial rocket scientists. Despite decades of \u003ca href=\"https://en.wikipedia.org/wiki/Woomera_Launch_Area_5\"\u003erocketry history in South Australia\u003c/a\u003e, there was no local space industry. (Establishing Australian spaceflight in 1987 was ambitious; they failed but others are giving it a go.)\u003c/p\u003e","title":"Open Source to Chemical Rockets"},{"content":"Large Language Models are subject to the laws of physics in a bad way, because they use so much electricity and make so much heat. I was interested to learn about a Mr Landauer and his principle of thermodynamic reversibility, which suggests physics just might come to our rescue and greatly reduce the amount of power required by AI datacentres. (Note \u0026rsquo;logical reversibility\u0026rsquo; sounds confusingly like reversible computers and backwards execution, but apart from a general spirit of going backwards, they are completely unrelated.)\nLogical and thermodynamic reversibility This all starts with Landauer\u0026rsquo;s 1961 principle that energy is not consumed by computation but by the erasure of information. This seems strange at first, but erasure forces a two-state system into one state, which pushes entropy into the environment, where it exits as heat. Bennett\u0026rsquo;s 1973 extension showed that any computation can be restructured to perform no erasure at all, so that the inputs are recoverable from outputs at every step. Bennett called this logical reversibility, and it carries no Landauer penalty. From this we can see that thermodynamic reversibility is the physical consequence of logical reversibility: when no information is erased, no entropy is generated, and the energy used to perform each computational step can be recovered and reused rather than lost as heat. And if you run an AI datacentre that would be wonderful.\nThe dependency runs in one direction only: thermodynamic reversibility requires logical reversibility, because an erased bit commits an irrecoverable entropy debt before the hardware gets any say. But logical reversibility can run on any old hardware and generate a lot of heat in the process.\nLogical reversibility Janus is a reversible language developed in 1982 and formally specified in 2007. Janus makes it impossible to write a program that discards information, and provides a program inverter that runs any Janus program cleanly backwards without a history tape. The harder problem is extending this to concurrent programs, where interleaving makes reversal non-trivial.\nA 2022 mathematical paper on Reversing an Imperative Concurrent Programming Language from the University of Leicester demonstrates that this difficult problem is solvable. 
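Before moving from principles to applications, here is a small sketch of what no-erasure means in practice. This is my own illustration in Python rather than anything from the papers mentioned here: a destructive assignment erases the old value and cannot be undone, while an in-place update has an exact inverse, so the inputs remain recoverable from the outputs. Reversible languages such as Janus only permit steps of the second kind.

# Irreversible step: a destructive assignment. Whatever x held is erased,
# so there is no way to reconstruct the old state from the new one.
def erase(state):
    state["x"] = 0
    return state

# Reversible step: an in-place update with an exact inverse. Nothing is
# erased, so the step can be undone and the inputs recovered.
def add_step(state):
    state["x"] += state["y"]
    return state

def add_step_inverse(state):
    state["x"] -= state["y"]
    return state

before = {"x": 7, "y": 5}
after = add_step(dict(before))
assert add_step_inverse(dict(after)) == before   # inputs recoverable from outputs

gone = erase(dict(before))                       # gone["x"] is 0; the 7 cannot be recovered

In Landauer terms only erase is charged the minimum heat cost, because it is the only step that destroys information; Bennett showed how whole programs can be rebuilt as chains of invertible steps like add_step.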
The paper Reversible Execution for Robustness in Embodied AI and Industrial Robots (one author in common with the previous paper) says:\nWe thus demonstrate how a traditional AI-based planning approach is enriched by an underlying reversible execution model that relies on the embodiment of the robot system\nIn 2025 it was shown that reversible architectures can work at scale for Large Language Models, with the transformer architecture made reversible. This works by reconstructing hidden states during backpropagation rather than storing them, and achieves order-of-magnitude memory reductions without sacrificing accuracy. This is not yet running on reversible hardware because such hardware doesn\u0026rsquo;t exist yet, but it demonstrates that the software side may be solvable. The models train on conventional GPUs, but they are logically reversible.\nTaking this paper at face value, it would seem such logical reversibility meets the precondition for thermodynamic reversibility, giving us energy that can be claimed back if the hardware allows it. And that is where physics is on our side.\nThermodynamic reversibility Vaire Computing in London are building practical reversible hardware which preserves the energy gains that logical reversibility makes reclaimable, as described in their 2025 hardware paper. The underlying principle is to stop discarding information, so that you then stop paying the penalty for doing so. Their stated aim is to have a product in 2027.\nIt seems the Reversible Computing 2026 conference in Torino has Vaire as its first commercial sponsor, suggesting the academic and engineering communities may be meeting in the middle. Because of my interest in reversible execution I have followed the output of this excellent conference for a long time.\n","permalink":"https://shearer.org/articles/reversible-logical-thermodynamics/","summary":"\u003cp\u003eLarge Language Models are subject to the laws of physics in a bad way, because they use so much electricity and make so much heat. I was interested to learn\nabout a Mr Landauer and his principle of thermodynamic reversibility, which suggests physics just might come to our rescue and greatly reduce the amount of\npower required by AI datacentres. (Note \u0026rsquo;logical reversibility\u0026rsquo; sounds confusingly like \u003ca href=\"/articles/reverse-execution/\"\u003ereversible computers and backwards\nexecution\u003c/a\u003e, but apart from a general spirit of going backwards, they are completely unrelated.)\u003c/p\u003e","title":"Logical and Thermodynamic Reversibility"},{"content":"Not-forking is a technical tool for software development. Not-forking assists with reproducibility.\nHere are some simple ways of explaining what Not-forking can do:\nNot-forking lets you integrate non-diffable codebases, a bit like patch/sed/diff/cp/mv rolled into one. Not-forking is a machine-readable file format and tool. It answers the question: What is the minimum difference between multiple source trees, and how can this difference be applied as versions change over time? Not-forking avoids duplicating source code. When one project is within another project, and the projects are external to each other, there is often pressure to fork the inner project. Not-forking avoids that. Not-forking helps address the problem of reproducibility. By giving much better control over the input source trees, it is more likely that the output binaries are the same each time. 
But here is the big win: Not-forking avoids project-level forking by largely automating change management in ways that version control systems such as Fossil, Git, or GitHub cannot.\nThe full documentation goes into much more detail than this overview.\nNot-forking was a pre-requisite for LumoSQL to exist, but unlike LumoSQL it is fully production-ready.\nI designed and tested Not-forking, and Claudio Calvelli did most of the coding as can be seen in the commit logs…\nThe following diagram shows the simplest case:\nSome questions immediately arise:\nShould you import Upstream into your source code management system? If Upstream makes modifications, how can you pull those modifications into Combined Project safely? If Combined Project has changed files in Upstream, how can you merge them safely? Not-forking also addresses more complicated scenarios:\nAnd even more complex cases:\nWhy Not Just Use Git/Fossil/Other VCS? Git rebase cannot solve the Not-forking problem space. Neither can Git submodules. Nor Fossil\u0026rsquo;s merge, nor the quilt approach to combining patches.\nA VCS cannot address the Not-forking class of problems because the decisions required are typically made by humans doing a port or reimplementation where multiple upstreams need to be combined. A patch stream can\u0026rsquo;t describe what needs to be done, so automating this requires a tangle of fragile one-off code. Not-forking makes it possible to write a build system without these code tangles.\nExamples of the sorts of actions Not-forking can take:\ncheck for new versions of all upstreams, doing comparisons of the human-readable release numbers/letters rather than repo checkins or tags, where human-readable version numbers vary widely in their construction replace foo.c with bar.c in all cases (perhaps because we want to replace a library that has an identical API with a safer implementation) apply this patch to main.c of Upstream 0, but only in the case where we are also pulling in upstream1.c, but not if we are also using upstream2.c apply these non-patch changes to Upstream 0 main.c in the style of sed rather than patch, making it possible to merge trees that a VCS says are unmergeable build with upstream1.c version 2, and upstream3.c version 3, both of which are ported to upstream 0\u0026rsquo;s main.c version 5 track changes in all upstreams, which may use arbitrary release mechanisms (Git, tarball, Fossil, other) cache all versions of all upstreams, so that a build system can step through a large matrix of versions of code quickly, perhaps for test/benchmark ","permalink":"https://shearer.org/articles/not-forking/","summary":"\u003cp\u003e\u003ca href=\"https://lumosql.org/src/not-forking\"\u003eNot-forking\u003c/a\u003e is a technical tool for software development. Not-forking\nassists with reproducibility.\u003c/p\u003e\n\u003cp\u003eHere are some simple ways of explaining what Not-forking can do:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eNot-forking lets you integrate non-diffable codebases, a bit like patch/sed/diff/cp/mv rolled into one.\u003c/li\u003e\n\u003cli\u003eNot-forking is a machine-readable file format and tool. It answers the question: \u003cem\u003eWhat is the minimum difference between multiple source trees, and how can this difference be applied as versions change over time?\u003c/em\u003e\u003c/li\u003e\n\u003cli\u003eNot-forking avoids duplicating source code. 
When one project is within another project, and the projects are external to each other, there is often pressure to fork the inner project. Not-forking avoids that.\u003c/li\u003e\n\u003cli\u003eNot-forking helps address the problem of reproducibility. By giving much better control over the input source trees, it is more likely that the output binaries are the same each time.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eBut here is the big win: Not-forking \u003cstrong\u003eavoids project-level forking\u003c/strong\u003e by largely automating change management in ways that \u003ca href=\"https://en.wikipedia.org/wiki/Distributed_version_control\"\u003eversion control systems\u003c/a\u003e such as \u003ca href=\"https://fossil-scm.org\"\u003eFossil\u003c/a\u003e, \u003ca href=\"https://git-scm.org\"\u003eGit\u003c/a\u003e, or \u003ca href=\"https://github.com\"\u003eGitHub\u003c/a\u003e cannot.\u003cbr\u003e\nThe \u003ca href=\"https://lumosql.org/src/not-forking/doc/trunk/doc/not-forking.md\"\u003efull documentation\u003c/a\u003e goes into much more detail than this overview.\u003c/p\u003e","title":"Not Forking"},{"content":"Reversible execution creates computers that seem to run backwards, applying time shifting techniques with simulation/virtualisation. They address the problems of software unreliability and complexity, and I believe my excited comments from 2005 still stand:\nReversibility is the biggest advance in debugging since source code debugging\n\u0026mdash; Me, on the GDB developers list In 2026, reversibility still isn\u0026rsquo;t seen as an ubiquitous must-have for software development, but awareness is increasing.\nAt the same time, the equally interesting topics of logical reversibility and thermodynamic reversibility have become very important. They are not really anything to do with reversible execution, they just sound similar. But if you\u0026rsquo;re interested in the problems AI datacentres present the world, this kind of reversibility looks highly relevant.\nWhat is reversible execution? Reversible execution is about giving the appearance of a program executing backwards in time. If you\u0026rsquo;ve not seen it before, it is just as strange and impressive as it sounds.\nIt is possible to have a whole network of running computers - say, Android, Windows, Linux running on miscellaneous hardware - and then to stop them all and reverse back to any point in time. For example, reversing to a point just before a catastrophic error occurred, so we can watch carefully as it runs forward from that point. And repeat it if we want to, again and again, any number of times. Imagine installing an operating system you know nothing about, starting an application\u0026hellip; and then running the process in reverse as it unboots, scrolling up the screen until it switches off. This is true reversibility from the point of view of a systems or application developer—you have a slider bar for time. The underlying mechanism is a record-replay trick: on the first pass, all non-deterministic inputs to the system—keyboard events, network packets, hardware interrupts, clock values—are logged to disk; on replay, those same inputs are fed back at precisely the right moments, making execution deterministic and therefore repeatable from any captured point.\nUnreliable software is increasing, and it affects whole societies. 
A large part of the problem is due to complexity in software, and I have seen reversibility reduce this by making debugging much easier to do.\nGreg Law has an excellent 2024 video presentation of the principles of time travel debugging.\nOutlook for reversible execution In 2026 the world has not yet agreed with my enthusiasm for reversibility, but still:\nFull execution reversibility of simulated systems at high speed is possible - including even running an operating system backwards, even unbooting. This is well-understood computer science. The ability to rewind and replay at an application level is powerful for debugging, especially for complex stacks, and rare or non-deterministic bugs. Until recently, relatively few developers seemed interested in these features, despite the visibility they offer into very complicated problems. VMware Player dropped reversibility in 2011, citing that not enough people demonstrated the need or invested the time necessary to configure and use the feature. And to be fair, VMware is a hypervisor technology where virtual machines have access to the underlying CPU, and modern CPUs are significantly non-deterministic, so this is very hard to implement. By using a simulation approach instead, Wind River\u0026rsquo;s Simics\u0026rsquo; reverse execution feature still exists, although only mentioned in passing in training material. Jakob Engblom\u0026rsquo;s Comprehensive Reversibility List is maintained by my former Simics colleague, and it has very few entries since the dawn of commercial source code debugging.\nBut why? The wealthy toolmaking companies of the world know reversibility of arbitrary electronics devices can be done at speed and scale, down to microcode resolution where necessary.\nSome open source solutions exist:\nEclipse CDT has support for driving reversible targets, including full-system targets via the GDB MI interface There are several reversible user mode targets, notably rr originally from Mozilla, capable of rewind/replay of applications including QEMU, which itself can run systems within it CRIU and DMTCP are checkpointing primitives usable by a reverse debugger for distributed applications (2026) GDB does not need to be driven: it has a well-tested implementation of Record/Replay and Reversible Debugging. GDB can itself drive a reversible target, especially embedded targets Robert O\u0026rsquo;Callahan is an eloquent campaigner for better debugging, one of the authors of Pernosco, a commercial tool that uses rr traces. Robert says:\nIn the past, a lot of people simply did not believe that these tools could possibly do what we say they do. That is becoming less of a problem.\nand I really hope that\u0026rsquo;s true.\nBesides these, closed-source WinDbg Preview from Microsoft supports Time Travel Debugging (TTD), and LiveRecorder is a user-space record/replay solution. Perhaps one of these will spark a revolution. If you\u0026rsquo;re a student, try them out!\nIf I were starting to build reversibility for production use today, I would probably start prototyping with the QEMU Replay System. It is accurate, if a bit clunky and slow, but it rapidly allows different techniques to be tried. I would also be looking at using AI to drive reversible simulation to find problems, rather than the old-fashioned manual method.\nThe annual Conference on Reversible Computation showcases a much more general view. 
This conference does include practical reversible computing as I describe above, but also explores the mathematics of reversibility including in the context of quantum computing. I can\u0026rsquo;t even begin to guess how quantum reversibility works at high resolution but seemingly it is regarded as feasible.\nFor more on what reversibility has to do with thermodynamics and AI datacentres, see my article on the topic.\n","permalink":"https://shearer.org/articles/reversible-execution/","summary":"\u003cp\u003eReversible execution creates computers that seem to run backwards, applying time shifting techniques with simulation/virtualisation. They address the problems of software unreliability and complexity, and I believe my excited comments from 2005 still stand:\u003c/p\u003e\n\u003cdiv class=\"article-quote\"\u003e\n\u003cp\u003eReversibility is the biggest advance in debugging since source code debugging\u003c/p\u003e\n\u003cp\u003e\u003ccite\u003e \u0026mdash; Me, on the \u003ca href=\"https://sourceware.org/legacy-ml/gdb/2005-05/msg00162.html\"\u003eGDB developers list\u003c/a\u003e \u003c/cite\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cp\u003eIn 2026, reversibility still isn\u0026rsquo;t seen as an ubiquitous must-have for software development, but\nawareness is increasing.\u003c/p\u003e\n\u003cp\u003eAt the same time, the equally interesting topics of \u003cem\u003e\u003ca href=\"/articles/reversible-logical-thermodynamics\"\u003elogical reversibility and thermodynamic\nreversibility\u003c/a\u003e\u003c/em\u003e have become very important. They are not really\nanything to do with reversible execution, they just sound similar. But if you\u0026rsquo;re interested in the problems AI\ndatacentres present the world, this kind of reversibility looks highly relevant.\u003c/p\u003e","title":"Reversible Execution"},{"content":"I have been lead implementer of the main security and privacy standards several times each. These can seem intimidating, but properly used they improve security overall, and can help a business run more smoothly.\nFrom a pragmatic, business point of view:\nThese standards are about writing down the actual rules of your business relevant to security and privacy, and then writing down how you improve these rules, and recording how well they work. All businesses can benefit from challenging their working habits and practices, and since privacy and security touch most parts of a business, this is an opportunity to review how the business works before something goes wrong. From the point of view of both Computer Science and Information Management Science:\nThese standards all involve creating a Records Management System that tracks information, and they all work after the style of the ISO9001 Quality Standards in the sense that a documented process is called a \u0026ldquo;control\u0026rdquo;, and once a control exists it can be measured and improved. This is a repeating pattern found throughout the field of software quality and safety, and helps give security professionals a place to start when something does go wrong.\nThe Big Standards ISO27001/9001 and GDPR have a dreadful reputation in industry. I think that is mostly unfair, and is because one or more of the following mistakes are made:\nUnwilling to modernise. In the 21st century, realistically we must implement a Records Management System (RMS), ie, every document is tracked in a database through its lifecycle until deletion. 
Many companies do not look at their data like this, and so they are shocked when ISO/GDPR standards require them to behave in a modern way. It is possible to do this without spending money on a very expensive and proprietary Document Management System.\nRefuse to involve staff. The social impact of information systems needs to be a top priority, otherwise they won\u0026rsquo;t be used effectively! Many companies ignore their staff and then they incorrectly blame 27k/GDPR for a poor result. Instead, an organisation\u0026rsquo;s staff need to feel that they are in charge of the RMS, with personal responsibility for the parts most relevant to them, and that it is easy to use. If staff feel like pointless rules have been imposed on how they do their daily work then the system will fail, and security is likely to get worse not better.\nInsufficient funding. Despite the risks involved, security is often underfunded. As well as organisation-wide involvement with change authority delegated all the way down, there needs to be specific funding in the form of staff time to support the culture change that is required to introduce an RMS. It will save the organisation much more money in the long run.\nThese big standards relate to the fields of Information Management and Cybersecurity. Information Management is about defining the data and documents the RMS is to be managing. An early step is often to discover all the data the organisation is responsible for, usually a shock to the IT department.\nA good understanding of Open Source software stacks really helps too, from sniffing out data repositories and potential security issues, to implementing RMS software and processes that are as lightweight as possible.\nCyber Essentials In addition to the above standards, the UK government responded to the EU Cybersecurity initiatives by setting up the National Cyber Security Centre (NCSC). The NCSC is effectively a subsidiary of GCHQ, the UK\u0026rsquo;s electronics spying organisation, and billions have been given to NCSC from various government budgets. It is NCSC who responded to the EU Network Information Security Directive with a standard assessment and framework, and then a few years later came up with the idea of the CyberSecurity certifications. IASME holds a UK-wide monopoly on issuing the CyberEssentials certifications, which I think is a poor arrangement for a national standard — I have asked NCSC if they will reconsider this decision.\nIASME Certifications That said, the certifications themselves have raised security awareness in the UK and are worth understanding.\nThe CyberEssentials certification is useful for an organisation that has never thought about security before, although the marketing is quite confusing.\nIASME Cyber Essentials - £300 Entry Level\nThe commercial explanation is that this is Easy to do, and addresses 80% of external attacks.
Also required by Government departments.\nSelf-assessed, with sanity checking by a contractor to IASME\nCyber Essentials Plus is an external review of the same thing, rather than self-assessed\ngraph LR\ni1((Cyber Essentials basic security product costs 300 pounds 77 questions and 5 controls)) -- 100% compliant with --\u003e u1[NCSC Requirements for Infrastructure Document]\ni1 -- 10% compliant with --\u003e s2[the 114 Controls in ISO27001/27002]\ni1 -- does not breach, and 50% coverage of --\u003e u2[UK Data Protection Act 2018] -- which fully implements --\u003e s3[EU GDPR]\nclassDef green fill:#9f6,stroke:#333,stroke-width:2px;\nclassDef orange fill:#f96,stroke:#333,stroke-width:4px;\nclassDef blue fill:#99f,stroke:#333,stroke-width:2px;\nclass s1,s2,s3 green\nclass u1,u2,u3 blue\nclass i1 orange\nIASME Governance - £400 Top Level\nThe commercial explanation is that this is an excellent alternative to ISO 27001 for small and medium sized organisations.\nCertification includes Cyber Essentials for free\nIncludes all of the 5 Cyber Essentials technical topics, and adds topics related to people and processes from other standards as per diagram\nWill take a lot longer and cost the company a lot more effort than Cyber Essentials\nThe assessed version involves a site visit, but covers the same questions\ngraph LR\ni1((IASME Governance premium product Costs 400 pounds has 160 questions, 8 controls)) -- 20% compliant with --\u003e s1[the 114 controls in ISO27001/27002];\ni1 -- 85% compliant with --\u003e u1[UK NCSC Cyber Assessment Framework] -- which fully implements --\u003e s2[EU NIS - Network Information Security]\ni1 -- 99% compliant with --\u003e u3[UK ICO Accountability Framework] -- one of seven key parts of --\u003e s3[EU GDPR]\ni1 -- does not breach, and 60% coverage of --\u003e u2[UK Data Protection Act 2018] -- which fully implements --\u003e s3[EU GDPR]\nclassDef green fill:#9f6,stroke:#333,stroke-width:2px;\nclassDef orange fill:#f96,stroke:#333,stroke-width:4px;\nclassDef blue fill:#99f,stroke:#333,stroke-width:2px;\nclass s1,s2,s3 green\nclass u1,u2,u3 blue\nclass i1 orange ","permalink":"https://shearer.org/articles/security-standards-and-certifications/","summary":"\u003cp\u003eI have been lead implementer of the main security and privacy standards several times each. These can seem intimidating, but properly used they improve security overall, and can help a business run more smoothly.\u003c/p\u003e\n\u003cp\u003eFrom a pragmatic, business point of view:\u003c/p\u003e\n\u003cp\u003eThese standards are about writing down the actual rules of your business relevant to security and privacy, and then writing down how you improve these rules, and recording how well they work. All businesses can benefit from challenging their working habits and practices, and since privacy and security touch most parts of a business, this is an opportunity to review how the business works before something goes wrong.\nFrom the point of view of both Computer Science and Information Management Science:\u003c/p\u003e","title":"Security Standards and Certifications"},{"content":"These exercises assume a CS graduate-level background and familiarity with the tools mentioned; they are for mentors to adapt for their students.
I have either created or been subjected to all of them over the years, and I have mentored students through them on many occasions.\nThe general theme here is that most of the systems and stacks that are taken for granted often don\u0026rsquo;t work very well, and often don\u0026rsquo;t seem to have a very bright future. This is even the case for famous codebases relied on by billions of people. There are no absolutes and no immediate fixes, but it is food for thought if we can demonstrate immense waste of human effort amid poor quality computing systems, even when impressive modern computer science is applied.\nSecurity Point Whonix at a server we control and try to de-anonymise a web page access using network capture and analysis. Compare with doing the same from a consumer operating system instead of Whonix. Construct a single-purpose computer in an embedded application, such as firefox/chromium in kiosk mode on a laptop running Ubuntu Linux. Then destabilise the computer using all attacks such as network, physical, software, sidechannel and social engineering. The goal of security researchers (regardless of hat colour) is often to get control of userspace. For many years there have been Linux Play Machines online with root password published and full shell access to userspace to anonymous users from anywhere. Yet these machines are considered secure. What does this say about security generally? Build and attack such machines. Complexity and Tech Robustness Follow instructions to install a common 2025 web stack from its component parts on a fresh virtual machine: Vue.js+Node.js+Apache+language server+SQL database. Say \u0026ldquo;hello world\u0026rdquo;, and do basic reliability testing. Introduce small but plausible changes in the stack components to check they make an observable difference. Travel back to 1975 by booting IBM MVS 3.8j in Hercules. If instructions are followed exactly it doesn\u0026rsquo;t take long to get a working system (Strong Hint! Follow the instructions, because your computing experience is probably irrelevant.) This takes about the same amount of developer time as Vue.js in the previous challenge to get to \u0026ldquo;hello world\u0026rdquo;. Introduce small but plausible changes. Which stack is most likely to be working in ten years, and why? Follow instructions to connect Vue.js \u0026ldquo;Hello\u0026rdquo; to use MVS as a database. This is a ridiculous stack. Compare the stack levels and their fragility to a typical distributed microservice architecture with 7 levels of language involved. Which is most likely to be working in one year? Never mind the lines of code, just think about the number of translation layers. Which stack is the most ridiculous? Consider the modern computer and operating system of your choice printing \u0026ldquo;hello\u0026rdquo; from local storage. Using public information, estimate the number of lines of code in every element in the stack down to the CPU transistor level. Now apply common bug metrics to this result, and human factors engineering. How many people with which skills would be needed to fix any problem? Does it matter? Consider the transistor-up stack; which components at each level publish source code? Compare AMD to RISC-V at the bottom; does this cover all the code running at what we can roughly call the \u0026ldquo;silicon level\u0026rdquo;? 
30 billion transistors on a modern 2020-era chip equate to many millions of lines of RTL, which is generated from many fewer millions of lines of VHDL/Verilog, which in turn is often generated from even fewer lines of a high level design language. Can we deduce anything about the complexity relationships in 3D chip designs with trillions of transistors? Given that AI-assisted design tools are essential for billion-scale silicon, what can we expect for trillion-scale silicon in 2025 and later? Is it relevant that there is a single worldwide source of supply in The Netherlands for machines to make 3D chips, which is how trillion-scale silicon is likely to be implemented in its early days at least? This Reddit \u0026ldquo;Ask Me Anything\u0026rdquo; with the SpaceX developers gives some details of the rocket flight software. Draw an architecture diagram of the relevant stacks. What can we decide about complexity and reliability? Are these good choices? Can we conclude this is reliable software? Are there failure modes other than \u0026ldquo;You will not be going to space today?\u0026rdquo; Operating System Technology Linux from Scratch takes a few hours to get a prompt running from the bare components (a full system takes much longer.) Use checksumming to compare the binaries created by different students\u0026rsquo; Linux from Scratch. Why aren\u0026rsquo;t they all the same? Debian partly solved this in October 2021 after 20 years, while even NetBSD, a source distribution unlike Debian, still struggles. Does this kind of reproducibility matter? The Linux kernel source is a little under 30 million lines of code. Compile the smallest useful kernel you can, and estimate the number of lines of code used. Is Linux bloated? Compile a kernel on Ubuntu and estimate the number of lines of code. How much of this is running at boot time? Is Ubuntu bloated? Modify an operating system so that any time the user types \u0026ldquo;hocus pocus\u0026rdquo; in any context a log message is sent to a log server over the internet. Are there any limitations on your implementation? Modify an operating system to respond to a single network packet of a specified type. What would be good starting points for this? A typical fresh Linux server install has between 1x10^5 and 2x10^5 files depending on distribution (my laptop, however, always has of the order of 10^6 files.) How many binary executable files are there in the smallest useful Linux deployment? Software Development I will claim in front of audiences that among the hardest software engineering tasks is producing reliable progress bar estimates. Prove me wrong by implementing a progress bar that meets user expectations and handles the changing environment within a computer and from the outside world. Hint: what are the user\u0026rsquo;s expectations? What are progress bars imagined to be communicating? Write a graphical internet web application using Node.js to display all the information it can deduce about its network connection (where it is geographically and when, what standards are supported, etc). Explain how you can be sure this application will still run reliably in ten years\u0026rsquo; time and what the limits on this are. Repeat using C or Rust. ","permalink":"https://shearer.org/articles/teaching-exercises/","summary":"\u003cp\u003eThese exercises assume a CS graduate-level background and familiarity with the tools mentioned; they are for mentors to adapt for their students.
I have either created or been subjected to all of them over the years, and I have mentored students through them on many occasions.\u003c/p\u003e\n\u003cp\u003eThe general theme here is that most of the systems and stacks that are taken for granted often don\u0026rsquo;t work very well, and often don\u0026rsquo;t seem to have a very bright future. This is even the case for famous codebases relied on by billions of people. There are no absolutes and no immediate fixes, but it is food for thought if we can demonstrate immense waste of human effort amid poor quality computing systems, even when impressive modern computer science is applied.\u003c/p\u003e","title":"Education Exercises"},{"content":"Random humans and computers Humans are terrible at randomness. If you ask people to write down a list of random numbers the result can usually be shown to not be random at all. Stage magicians and marketing experts exploit our inability to assess how random an event is.\nBut computers, surely they should be random? It sure feels like it when your printer jams. But no, computers are often worse than humans at being random, and that\u0026rsquo;s a problem. Randomness is exceedingly important to making computers and networks work.\nRandom numbers are needed for good cryptography, and good cryptography matters for fundamental human rights reasons. Without it, nothing can be kept private. That is why the EU has built its privacy legislation on human rights. And that is why the random number service at random.org is important, because it suggests (but does not show) how to do this correctly in a mathematical sense.\nAn unworkable idea in 1986 When I was 15 years old my loving parents bought the family a Unitron 2200 Apple ][ clone because computers seemed like they were going to be important in the future (they got that right, among many other things.) I started to get into computing and somewhere picked up the idea that randomness was important.\nAnd so, in the department of \u0026ldquo;ancient things found in the attic\u0026rdquo;, here is a clipping from the Adelaide Advertiser in Australia. In 1986 I hadn\u0026rsquo;t the slightest idea how important random numbers were, but they seemed fun at the time. Back then, I just wanted to do better than what a basic IBM PC would produce if you asked it to run a pseudo-random number generator.\nUnfortunately no, a random number generator based on mashing together multiple radio stations won\u0026rsquo;t work. Radio waves of the kind used in analogue radios aren\u0026rsquo;t truly random no matter how much we confuse the voices with the music! There are mathematical ways to show that, and for various reasons it is an important problem to solve.\nDr Mads Haahr of Dublin has all the right mathematics to assess what is a good source of randomness, and I was gratified to discover that much more recently he too looked to the air for his solution, but he chose to use static. \u0026ldquo;The first version of the random number generator was based on a $10 radio receiver from Radio Shack.\u0026rdquo;\nDr Haahr founded Random.org to produce high-quality random numbers for \u0026ldquo;holding drawings, lotteries and sweepstakes, to drive online games, for scientific applications and for art and music.\u0026rdquo; The theory behind his work is important for all random numbers.
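To give a flavour of the mathematical ways mentioned above, here is a minimal sketch of the simplest standard check, the monobit frequency test from the NIST SP 800-22 suite. This is an illustration only, not Dr Haahr\u0026rsquo;s methodology, and the compiler invocation and file names afterwards are made up. A heavily biased bit stream fails this test immediately, although passing it alone proves very little:

/* monobit.c - the simplest statistical check of a bit stream, the
   NIST SP 800-22 frequency (monobit) test. Reads bytes on stdin and
   exits 0 if the stream passes, 1 if it is detectably biased.
   This is an illustration, not a complete test battery. */
#include \u003cstdio.h\u003e
#include \u003cmath.h\u003e

int main(void)
{
    long long ones = 0, bits = 0;
    int c;

    while ((c = getchar()) != EOF) {
        for (int b = 0; b \u003c 8; b++) {   /* peel off the 8 bits of each byte */
            ones += c % 2;
            c /= 2;
            bits++;
        }
    }
    if (bits == 0)
        return 2;                        /* no input */

    /* S = ones - zeros = 2*ones - bits; for a fair source S/sqrt(bits) is
       approximately standard normal, so the p-value comes from erfc */
    double s_obs = fabs(2.0 * ones - bits) / sqrt((double)bits);
    double p = erfc(s_obs / sqrt(2.0));

    return p \u003c 0.01;                    /* 1 = fail: significantly biased */
}

/* Example use (file names made up):
 *
 *   cc -O2 -o monobit monobit.c -lm
 *   head -c 1000000 /dev/urandom | ./monobit ; echo $?       usually 0
 *   head -c 1000000 biased_source.raw | ./monobit ; echo $?  1 for a badly biased stream
 *
 * Passing this single test proves very little; a serious assessment runs a
 * whole battery of such tests, on long samples, over a long period of time. */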
Since the topic of random numbers immediately brings up security, I need to point out that random.org is a single source of failure, and since the source code is not published it is not easily possible to verify Dr Haahr\u0026rsquo;s claims of randomness (it could, for all we know, be a clever fake that slightly weights the random numbers this way or that, to the long-term benefit of whoever did the weighting.)\n$500 for the sake of it However - the radio waves did get me $500 at the time without actually doing a thing except handwriting a letter enclosing my top 10 ideas, and a confusing conversation with a journalist who found the concept very strange indeed. And perhaps that was something to do with one of the things that happened next.\n","permalink":"https://shearer.org/notes/radio-waves-to-random-number-generator/","summary":"\u003ch1 id=\"random-humans-and-computers\"\u003eRandom humans and computers\u003c/h1\u003e\n\u003cp\u003eHumans are terrible at randomness. If you ask people to write down a list of random numbers the result can\nusually be shown to not be random at all. Stage magicians and marketing experts exploit our inability to\nassess how random an event is.\u003c/p\u003e\n\u003cp\u003eBut computers, surely they should be random? It sure feels like it when your printer jams.\nBut no, computers are often worse than humans at being random, and that\u0026rsquo;s a problem. Randomness is\nexceedingly important to making computers and networks work.\u003c/p\u003e","title":"Radio Waves to Random Number Generator"},{"content":"Hello! I am Dan Shearer, dan@shearer.org. I live in Scotland and practice in Europe, the US and Oceania. You can verify and email me securely.\nWhen a technology becomes really widespread, who gets to control it? This is more subtle these days than traditional Capital vs the People debates, because issues of privacy, collectivism and efficiency get muddied by sovereignty, AI, and climate change. Venture funders and governments now quite commonly want to discuss collective IP protection in the face of unprecedentedly powerful tech companies. That conversation would have been unthinkable twenty years ago. Whether it\u0026rsquo;s addressing tech giant monopolies, privacy law and tech implementation in many jurisdictions in the 2010s, or AI safety architecture now, the underlying question is the same.\nOutside the main thread Code of Conduct — A concise code of conduct for open source projects after witnessing repeated serious incidents of aggression and intimidation. Started from the Mozilla Participation Guidelines, shrunk to the essentials for smaller projects. Security Standards and Certifications — How security standards (ISO27001/9001, GDPR, NIS, etc.) and certifications relate in practice for UK industry. Fossil — Git is ubiquitous but with some difficult‑to‑fix design flaws that hold back development for most projects. Fossil is very mature but needed to be easier to access, and to have a technical strategy for avoiding Git‑type lock‑in. I contributed to these improvements so my projects could abandon Git/GitHub for Fossil. Teaching Exercises — Exercises in the areas of CyberSecurity/CompSci and Technology. December 2025 –\nPresent Perseverance open source project The Perseverance Composition Engine is our novel approach to AI safety and security. November 2025 –\nPresent University of Southampton 5 months Rule Based Epidemiology Modelling group, using a rules based technique to achieve similar results to traditional ordinary differential equations. 
Paper 1, Paper 2 May 2023 –\nNovember 2025 University of Southampton 2 years 6 months Modelling Risk Assessment at the IT Innovation Centre. Considering open software used for risk assessment, working with privacy, AI assessment, and threat assessment of complex IT systems. December 2019 –\nPresent LumoSQL Project 6 years 4 months Project Lead. Open source project modifying SQLite to add technical enhancements for performance, security and measurement, in cooperation with the SQLite project. Funded by NLnet (Phase I and Phase II). Squashed by the pandemic, still highly relevant. March 2013 –\nFebruary 2019 Open Ocean Capital 5 years 11 months CTO Advisor. Investigating and assessing B2B companies, mostly software startups, some growth stage and hardware crossover, including machine learning and pre-LLM AI. Portfolio involvement including board-level review and strategy changes such as whole-company pivots. Based in Edinburgh and Helsinki. February 2013 –\nNovember 2015 Zentyal 2 years 9 months VP Special Projects. Representing Zentyal\u0026rsquo;s interest in large-scale commercial deployment of solutions based on Open Source. Working on Linux, Samba and especially OpenChange. Based in Zaragoza, Spain. January 2013 –\nApril 2023 Cybersecurity Consultancy 10 years 3 months Privacy and Open Source Specialist. Developing tooling for GDPR, ISO27001 and CyberEssentials using Records Management approaches. Implemented for cloud computing, earth sciences and logistics companies. Expert witness work in electronic microfabrication for the Scottish Court of Sessions (Ultratech Inc vs Stepper Technology Ltd), radiopharmacology software architecture (Erigal Ltd), manufacturing fault detection, warehouse product tracking, and legal technology advisory for Scottish and English legal practices on IP and Privacy. January 1997 –\nPresent Continuous Practice 29 years 3 months Privacy and Open Source Specialist (Self-employed, Scotland). Ongoing consulting since 1997 spanning privacy law, open source strategy, security implementation, and technology assessment. This thread runs parallel to all specific roles listed above. January 1992 –\nDecember 2015 Samba Project 24 years Co-founder. Co-founded the Samba project, providing file and print services to SMB/CIFS clients. Technical background work for two decades including software development in C and system/infrastructure code. Technology assessment involving code reading and experimentation across IT infrastructure, servers, scaling approaches, and cloud stack design. ","permalink":"https://shearer.org/about/","summary":"\u003cp\u003eHello! I am Dan Shearer, \u003ca href=\"mailto:dan@shearer.org\"\u003edan@shearer.org\u003c/a\u003e. I live in Scotland and practice in Europe, the US and Oceania. You can \u003ca href=\"/public-key-for-dan-shearer/\"\u003everify and email me securely\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eWhen a technology becomes really widespread, who gets to control it? This is more subtle these days than traditional \u003cem\u003eCapital vs the People\u003c/em\u003e debates, because\nissues of privacy, collectivism and efficiency get muddied by sovereignty, AI, and climate change. Venture funders and governments now quite commonly want to\ndiscuss collective IP protection in the face of unprecedentedly powerful tech companies.
That conversation would have been unthinkable twenty years ago.\nWhether it\u0026rsquo;s addressing tech giant monopolies, privacy law and tech implementation in many jurisdictions in the 2010s, or AI safety architecture now, the underlying question is the same.\u003c/p\u003e","title":"About"}]