Java Is More Like English Than You Think

At a recent in-house event celebrating the launch of Java 17, I was asked to give a talk on the future of the language. I don’t have a hotline to Oracle, so I took the opportunity to dust off my linguistics degree and point out that the nature and comparatively glacial evolution of human languages are not unlike what you can witness in computer languages, only compressed into decades. This is the first of two posts based on that talk.

Image by Shvets Productions licensed from Pexels.com

In many ways, Java, Perl, Haskell, and Lisp are not at all like human languages. Lines of code only resemble English sentences at a very superficial level. Java is one of many specifications to make a digital machine do your bidding, expressed using a small set of English keywords, special characters, and structural rules to make it easy on the brain of the speaker, i.e., the programmer. Regardless of complexity, the behavior of a program follows unambiguously from its content. Sure, you can bang your head against the wall wondering why it does what it does, but given enough eyeballs, even the worst spaghetti code turns out to be deterministic.

Not so with human language. Conveying statements of fact (the train to Amsterdam is five minutes late) is just one, boring use case. We don’t just talk to inform, but to bond, impress, or persuade. Our words don’t even have to make logical sense. Twisting the meaning of words is the whole point of humor – and much of politics too. Most of us choose never to program, but everyone acquires basic fluency in their native language before primary school age. We evolved to speak; the capability is hardwired in dedicated areas of our brains and human languages reflect the complexity and illogical, messy nature of systems that emerged through evolution.

A Small Toolkit To Build Anything

And yet there is one big commonality: both human and computer languages must be Turing complete. A limited set of tools (words and syntactic rules) can express an infinite range of meanings. If a novel concept needs a word, the language community will borrow or invent one, usually by combining existing words. The language must therefore be sufficiently rich to express any meaning but cannot be successful if it is too hard to learn. Any useful tool must always be manageable by the average human brain.

The International Phonetic Alphabet provides a symbol for any sound that is used in the world’s languages. You’d be surprised how versatile your tongue, lips, throat, and vocal cords can be. No language uses all of these sounds; each one compromises on what it packs in its toolkit. You don’t need every tool that’s on display in the hardware store to do the job. A classic example: Mandarin Chinese is a tone language. Variations in the way your voice goes up and down (pitch contour) determine the meaning of a word.

妈 mā — mother
麻 má — hemp
马 mǎ — horse
骂 mà — to scold

Source: fluentu.com

The four tonal varieties of ma are as different to native speakers of Chinese as bye, bough, bay, and boo are to English ears. Mandarin effectively gives you four distinct vowels for the price of one. If it’s so powerful, why didn’t every language adopt the feature? Well, Mandarin is restrictive in other areas. Syllables can only start with a single consonant and must end either in a vowel or an n or ng sound. A word like scripts, with an impressive six consonants, cannot occur and is predictably hard to master for Chinese speakers – much as Europeans struggle to get the tones right.

Coming from Java, learning to do without mutable state and side effects in a pure functional language felt like learning Chinese again. Any radically different way of doing things has this effect, and such approaches often don’t combine well. Scala tries to marry object orientation and functional programming. As much as I like Scala, feature richness is not all that counts. Popularity requires a gentler learning curve.
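To make that shift in habits concrete, here is a minimal sketch in plain Java rather than in a pure functional language. It contrasts summing a list with a mutated accumulator against a recursive definition in which nothing is ever reassigned. The class and method names are made up for illustration, and the recursive version is meant to show the mindset, not to recommend it as idiomatic Java.

```java
import java.util.List;

class PureStyle {

    // Imperative habit: a local accumulator that is mutated inside a loop.
    static int sumWithMutation(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }

    // Functional habit: no variable is reassigned; the result is defined
    // recursively as "first element plus the sum of the rest".
    static int sumWithoutMutation(List<Integer> numbers) {
        if (numbers.isEmpty()) {
            return 0;
        }
        return numbers.get(0)
            + sumWithoutMutation(numbers.subList(1, numbers.size()));
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4);
        System.out.println(sumWithMutation(numbers));
        System.out.println(sumWithoutMutation(numbers));
    }
}
```

Both methods return the same value; what changes is the way you have to think about getting there.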

No Revolution, but Evolution

The language spoken in Britain during the early Middle Ages is called Anglo-Saxon, aka Old English (roughly 450-1100 AD). Take the opening verses of the epic poem Beowulf. Any resemblance to Middle-earth is no coincidence; Tolkien taught it at Oxford. It hardly looks or sounds like English.

Hwæt! We Gardena in geardagum,
þeod-cyninga þrym gefrunon,
hu ða æþelingas ellen fremedon.

French-speaking invaders brought about a massive overhaul in vocabulary, spelling, and grammar. By the time of Geoffrey Chaucer and his Canterbury Tales (1400), you can get by without constant recourse to the footnotes.

Whan that Aprill, with his shoures soote,
The droghte of March hath perced to the roote,
And bathed every veyne in swich licour,
Of which vertu engendred is the flour;

And note how our present-day spelling has stayed remarkably stable since Shakespeare’s A Midsummer Night’s Dream (1595).

 Masters, you ought to consider with yourselves. To bring in – God shield us – a lion among ladies is a most dreadful thing.

Words like thought and laugh still reflect how they were pronounced a thousand years ago (that is, with the throaty sound heard in Scottish loch). English spelling is conservative, like Java’s syntax, which is why we can still read Shakespeare much as it was written 400 years ago.

Java’s 25+ year history looks turbulent compared to the timeframe of English language evolution. We’ve had many major syntactic improvements (generics, the enhanced for loop) and even a major paradigm shakeup in the move towards lambdas and streams. Immutability is likely to be another, with the introduction of records. This is how we used to do sums before Java 1.5, when collections were untyped and every element had to be cast.

public double cumulativeSalaries(List employees) {
    double accum = 0;
    // Raw List: without generics, every element comes back as Object
    for (int i = 0; i < employees.size(); i++) {
        Object item = employees.get(i);
        if (item instanceof Employee) {
            Employee e = (Employee) item;
            // Only active employees contribute to the total
            accum += e.isActive() ? e.getSalary() : 0;
        }
    }
    return accum;
}

Lambdas and streams were much more than syntactic sugar. They introduced a radically different way of doing things. You can’t have too many paradigm shifts like that, or it won’t be the same language anymore.

public double cumulativeSalaries(List<Employee> employees){
  return employees.stream()
    .filter(Employee::isActive)
    .mapToDouble(Employee::getSalary)
    .sum();
}
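The Employee type is never shown in this post. As a rough sketch of where records and immutability fit in, assume it were declared as a record (Java 16 and later) with hypothetical name, salary, and active components. The stream pipeline barely changes; only the accessor names differ, because records generate salary() and active() rather than getSalary() and isActive().

```java
import java.util.List;

// Hypothetical Employee type; the component names are assumptions.
// A record's components are final, and accessors are generated for them.
record Employee(String name, double salary, boolean active) {}

class SalaryRecords {

    // Same pipeline as the stream version, using the record's accessors.
    static double cumulativeSalaries(List<Employee> employees) {
        return employees.stream()
            .filter(Employee::active)
            .mapToDouble(Employee::salary)
            .sum();
    }

    public static void main(String[] args) {
        List<Employee> staff = List.of(
            new Employee("Ada", 5000.0, true),
            new Employee("Bob", 4000.0, false),
            new Employee("Cy", 3000.0, true));

        System.out.println(cumulativeSalaries(staff));

        // There are no setters: a "change" means building a new value.
        Employee bob = staff.get(1);
        Employee rehired = new Employee(bob.name(), bob.salary(), true);
        System.out.println(bob.active() + " -> " + rehired.active());
    }
}
```

The original bob is untouched by the "rehire"; that is the kind of habit change immutability asks of a Java programmer.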

Evolutionary Democracy and Benevolent Dictatorship

Human language is in the public domain. No country or speech community holds the copyright to English. Change is gradual and unregulated. Governments that control thought through language reform are the stuff of dystopian novels. Computer language evolution can be democratic, but it must be centrally regulated to a large extent. And the bigger the user community, the more it shuns breaking changes. The greater the canon of texts or source code, the less attractive it becomes to introduce changes that render your old code useless or your books unreadable. Introduce too many radical changes, and you can no longer call it the same language. This happened to Perl 6, which was renamed Raku in 2019, nearly twenty years after it was first announced.
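Java itself offers a small illustration of this reluctance to break old code. When local variable type inference arrived in Java 10, var was added as a reserved type name rather than as a keyword, so existing code that used var as an ordinary identifier kept compiling. The sketch below shows both uses side by side; the class and method names are made up for illustration.

```java
class VarCompat {

    // Older style: "var" used as an ordinary variable name.
    // This was legal before Java 10 and still compiles today.
    static int legacyIdentifier() {
        int var = 41;
        return var + 1;
    }

    // Java 10+ style: "var" asks the compiler to infer the type (int here).
    static int inferredType() {
        var answer = 42;
        return answer;
    }

    public static void main(String[] args) {
        System.out.println(legacyIdentifier());
        System.out.println(inferredType());
    }
}
```

Had var been made a full keyword, the first method would have stopped compiling, which is exactly the kind of change a large user community resists.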

Here are the two takeaways from this exercise in comparison: Firstly, languages are fully expressive (Turing complete) and they need not share the same grammatical toolset to be so. Secondly, their evolution must be gradual and not introduce too many breaking changes, or the result will no longer be the same language. In the second part of this post, we’ll look at some possible consequences for the future of the language.