Tuesday, October 7, 2025

Linguistics ToE, one

 

 

One,

What linguistics is.

I have discussed these issues in two Facebook groups (Linguistics & Historical linguistics and Etymology). I will simply use some of my posts there to discuss the above issues here.

See my post at https://www.facebook.com/groups/generallinguistics/permalink/10157742816449346/

 

Someone said: {Linguistics has four levels: Phonology, Morphology, Syntax & Semantics referred to as formal linguistics. The issue of linguistics having three folds is contestable and arguable.}

 

He is more or less right in terms of human natural languages but wrong in terms of linguistics as a whole.

 

Someone also said: {only angel’s language is perfect}.

This is wrong.

 

For these two comments, I decided to write a very brief discussion here about {what linguistics (language) is}.

While most of the members of this forum are human language linguists, I will discuss this linguistics issue in its rightful scope (much bigger than the human languages). You (the readers) need not get into it too deeply. But a superficial understanding of the SCOPE of linguistics is necessary even for discussing human languages.

 

For a system T, it is a language if it can describe a system U (universe).

In general, U is not T. However, the case where U is T still meets the above definition. Yet, this self-mapping will not be discussed here.

 

With the above definition, the FIRST question will be {what is the smallest T?}

Example: T has only one token, such as {1}. U has three members: {apple, orange, egg}.

Can T describe U? The answer is Yes.

For apple = 1

Orange = 11

Egg = 111

So, the system T (with only one token) can be a language for U (with three members).
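A minimal sketch of the one-token example above, in Python. The function names are illustrative and not from the original post; the only assumption is that the k-th member of U is written as k copies of the single token.

```python
def build_unary_language(universe, token="1"):
    """Assign each member of `universe` a distinct string made only of `token`."""
    # The k-th member (counting from 1) is written as k copies of the token.
    return {member: token * (index + 1) for index, member in enumerate(universe)}


def decode_unary(code, universe, token="1"):
    """Recover the member of `universe` described by a unary string."""
    if not code or set(code) != {token}:
        raise ValueError("not a valid unary string")
    return universe[len(code) - 1]


U = ["apple", "orange", "egg"]
T = build_unary_language(U)
print(T)                       # {'apple': '1', 'orange': '11', 'egg': '111'}
print(decode_unary("111", U))  # egg
```

Because every member gets a different length, the description is unambiguous for any finite U.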

 

The next question is {what is the biggest U?}

How about U = the entire natural universe?

However, we do not truly know what the {entire natural universe} is and thus are unable to deal with it analytically.

 

Fortunately, we can describe some known universes.

U1 = computable universe; everything (members) in U1 is computable.

U2 = U1 (computable) + un-computable universe; some members in U2 are not reachable by any computing algorithm.

U3 = U2 + countable infinite universe.

U4 = U3 + uncountable infinite universe.

 

Then, the third question will be {what kind of language system is needed for those universes?}

Can the above T {1, with only one token} be the language of U1?

The answer is NO.

Yet, there is a proven math theorem that a two-token system can be the language for U1. That is, T2 = {two tokens, such as (0, 1), (yin, yang), (man, woman), etc.}. As it is a proven theorem, I will not provide any further explanation here. But most high school students today know that only two codes are needed for all computable universes.
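As a small illustration of the two-token idea (not a proof of the theorem), the Python sketch below writes any finite text with only two tokens and recovers it. The function names and the use of UTF-8 bytes are assumptions made for this example.

```python
def to_two_tokens(text, tokens=("0", "1")):
    """Write `text` as a sequence that uses only the two given tokens."""
    zero, one = tokens
    # Each UTF-8 byte becomes eight binary digits.
    bits = "".join(format(byte, "08b") for byte in text.encode("utf-8"))
    return [one if bit == "1" else zero for bit in bits]


def from_two_tokens(sequence, tokens=("0", "1")):
    """Recover the original text from a two-token sequence."""
    zero, one = tokens
    bits = "".join("1" if token == one else "0" for token in sequence)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


encoded = to_two_tokens("apple", tokens=("yin", "yang"))
print(encoded[:8])  # the first eight tokens describe the letter 'a'
print(from_two_tokens(encoded, tokens=("yin", "yang")))  # apple
```

The choice of tokens does not matter; (0, 1) and (yin, yang) carry exactly the same information.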

 

Then, can the language T2 describe the U2 (including the un-computable)?

Anyone who can read the definition knows the answer right away. It is a big NO.

Then, what kind of language system is needed for U2, U3, and U4?

The answers are:

For U3, T3 must have 4-codes.

For U4, T4 must have 7-codes.

 

Again, you (the readers) need not get into the above too deep, just understand that the above issues are parts of linguistics.

 

In fact, for the human languages, we can arbitrarily use more codes, such as the 26 letters of the alphabet or 220 Chinese root words.

 

With the above, we now have the 4th question: {is the U4 the biggest U (universe)?}

And can T4 (the language of U4) describe a U bigger than U4?

The most common answer, thus far, is NEGATIVE.

 

In Christian theology, God is totally incomprehensible (thus only faith can reach God); that is, God is beyond the U4 and T4 (the largest human language).

In Zen Buddhism, the highest wisdom (the Nirvana) is beyond the description of human language (T4) and can be reached only via kōan.

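In math, there are Gödel’s incompleteness theorems, saying that any consistent formal system rich enough to express arithmetic contains true statements that it cannot prove; that is, there is always a math statement outside the reach of any such system.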

 

The three examples above show that there is something unreachable by the largest REAL language system. That is, we can now define {what is the ‘ideal language’?}.

 

{Ideal language is a language which can describe ‘that thing’ which is beyond the U4.}

 

With a clear definition, we now can address the issue of ‘ideal language (IL)’.

Is IL an ontological reality? If it is, how can we show (prove) it?

 

A linguist who studies human natural language only need not get into the depth of the above issues. But the above issues nonetheless are the foundations of ALL (any) linguistics.

 

The key points of my book {Linguistics Manifesto} discuss the above issues. If you are interested in some detailed arguments, it is available at many Ivy League university libraries (such as Harvard, Columbia, Cornell, etc.; see https://search.worldcat.org/title/688487196 ).

 

The conclusion is that the HUMAN natural language is bigger than the entire math universe and is able to describe ‘that something’ of Zen Nirvana or of the Christian God.

 

That is, we can now not only describe the ontological issue of ‘ideal language’ but also discuss the ideal language in terms of human natural language.

 

In my previous post, I have defined ‘language’.

A system L is a language for U (an arbitrary universe) if L describes U.

That is, linguistics is a study about L and U (not just L), especially about U, as L is only a reflection of U.

 

Thus far, we know, at least, three U.

U (C) = U (computable), infinitely large in size

U (NC) = U (C) + non-computable

U (In) = U (NC) + infinities

At this point, we (humanity) are 100% confident that there is an L (In) for U (In), and thus I will not address this L (In).

 

However, there are some claims for some U which are larger than U (In), such as:

U (Ch) = U (Christian) = U (In) + G (God); There is no way of any kind that we can squeeze the something (God) into U (In)

U (z) = U (Buddhism Zen) = U (In) + N (Nirvana)

U (pa) = U (paradox) = U (math, logical and analytical) + P (paradoxes); no way to eliminate the paradoxes in any kind of math universe.

 

Gödel’s theorems guarantee that there is no L (math) for U (pa). Others also claim that there is no L of any kind for U (Ch) and/or U (z). I will call these universes U (we) = U (weird).

 

The above is the current paradigm.

Then, I did two things in my previous post.

One, I defined ‘ideal language’. If a system L can describe U (weird), then L is an ideal language.

Two, I claimed that ‘human natural language’ can describe U (weird).

 

There is, of course, no argument about the definition. But there are many problems with the Claim.

The first big, big problem is {what the heck is a human natural language?}

Are human natural languages essentially equal? If not, then which human natural language can be used as evidence for the claim?

 

So, for this big claim, the key, key issue is {what the heck is a human natural language?} This is a huge, huge issue, and I will discuss it later.

 

Let’s assume that we do know what the heck a human natural language is; then, how can we prove it can be a language of U (weird)? The proof is very, very complicated. But I should, at least, show the strategy here. There are two steps.

 

Step one: to prove that U (Ch), U (z), and U (pa) are isomorphic, that is, exactly identical in SIZE or scope (capacity). Then, if we can prove that one L (human) encompasses one of the U (weird), it will encompass all of them.

Step two: to show that that L (human) does encompass one U (weird). In my work, I used U (paradox) as the U (weird).

 

But first things first: {what the heck is a human natural language?}; its body (structure), its soul (meta-base), and its dress.

 

Two,

My work is about what 'language' is and what linguistics is.

That is, my points are:

One, what is the scope of languages?

The computational language (all computer languages) can only encompass the computable universe (a very small part of the real universe). All computational languages can be defined with a set of axioms and rules. When someone gives me a set of requirements, I can design a computer language (such as Basic or C++) in 10 hours, although it might take years to refine it.

On the other hand, the human natural language (HNL) has the largest scope, which can encompass any universe (including the Christian God, Zen Nirvana, or anything else).

 

Two, what is the base for all languages?

I have shown that MLT (Martian Language Thesis) ensures that all languages share the identical meta-language, and this gives rise to three points.

     First, all HNLs have the same scope (capacity).

     Second, the translation among all HNLs is ensured.

     Third, the existence of a universal language is ensured in principle.

 

Three, the basis (reason) for the diversity of languages.

What is the principle to allow all HNLs to choose their own way of syntax-ing (Phonology, Morphology, and Pragmatics)?

I have shown the SWP (Spider Web Principle).

Then, SWP gives rise to a language spectrum (from type 0 to type 1), see chapter 24. Some attributes can be clearly defined for these types, such as the issues of {Predicative, Inflection, Redundancy, Non-Communicative, Exception, etc.}.

With a spectrum, the HNLs are defined by two extremes: the type 0 becomes a Conceptual language, type 1 the perceptual language.

With a spectrum, some evolution rules (laws) can be developed (discovered), such as {the Operator of pidginning (moving away from the original language) and the Operator of creoling (moving toward the original language)}.

 

All the above issues are definitely Human Natural Language issues.

 

Thus far, I have only discussed the scope of languages. The bigger issue is the scope of linguistics. What can it encompass?

I have shown a "Large Complex System Principle" (LCSP) in my book {Linguistics Manifesto} -- there is a set principle that governs all large complex systems regardless of whatever those systems are, a number set, a physics set, a life set, or a vocabulary set.

 

     Corollary of LCSP (CLCSP) -- the laws or principles of a "large complex system x" will have their correspondent laws and principles in a "large complex system y."

 

In the HEP (High Energy Physics) community, TOE (Theory of Everything) means to unify gravity with the other three fundamental forces (electromagnetic force, strong force, and weak force). On the contrary, the CLCSP insists that nature's TOE encompasses EVERYTHING {physics, mathematics, life science, social science (economy and politics), and linguistics}.

 

That is, linguistic laws and principles can and must govern all other disciplines (physics, math, life science, etc.). This is discussed in detail in the book {Nature’s Manifesto, US copyright # TXu 2-078-176}, which is held by many Ivy League university libraries.

 

The PDF version of this book {Nature’s Manifesto, 6th edition} is available at https://tienzengong.files.wordpress.com/2020/04/6th-natures-manifesto.pdf

That is, this book {Linguistics ToE} is a sister book of {Nature’s Manifesto}.

 

Three, 

The base of PreBabel

What human natural language (HNL) can I use to prove that HNL is an ideal language?

Do you (the readers) know?

I don’t. I have not the slightest idea of where and how to start addressing this issue.

 

Thus, my only choice is to use the Martian language, that is, the Martian Language Thesis.

{The Martian Language Thesis (MLT) -- Any human language can always establish communication with the Martian or Martian-like languages.}

 

The MLT shows that all languages have the same meta-language.

 

What is the meta-language then?

Meta-language consists of four parts:

     One: the universal laws (physics, math, etc.) continent: all universal events are described by the universal laws.

     Two: the universal consciousness (meaning) continent: the human consciousness views the universal laws in an identical way, getting the identical MEANING for all universal laws.

     Three: there is a Grand Canyon between these two continents.

     Four: Human natural languages are different symbol systems for connecting these two universal CONTINENTs.

 

For example, I meet a beautiful Martian lady and want to offer her some gifts.

I first give her an apple and say apple. She happily accepts it and says Yaya.

I then give her an orange and say orange. She calls it Kaka.

Soon, a translation table is built, and we can communicate ever after.
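The gift exchange can be sketched as a tiny translation table in Python. Only the two word pairs come from the story above; the function names and the marker for unknown words are illustrative assumptions.

```python
def build_translation_table(exchanges):
    """Build a word-to-word table from (my_word, her_word) pairs."""
    table = {}
    for my_word, her_word in exchanges:
        table[my_word] = her_word
    return table


def translate(words, table):
    """Translate word by word; unknown words are flagged, not guessed."""
    return [table.get(word, "<" + word + "?>") for word in words]


# Each shared object (apple, orange) yields one entry in the table.
exchanges = [("apple", "Yaya"), ("orange", "Kaka")]
human_to_martian = build_translation_table(exchanges)
martian_to_human = {hers: mine for mine, hers in human_to_martian.items()}

print(translate(["apple", "orange"], human_to_martian))  # ['Yaya', 'Kaka']
print(translate(["Kaka", "egg"], martian_to_human))      # ['orange', '<egg?>']
```

The table grows one entry at a time, and each entry is anchored to a shared object rather than to either language.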

 

Now, I can define what human natural language (HNL) is.

HNL is a system based on a universal meta-language to express or to describe some world events.

Then, there are immediately three consequences.

     One, all HNLs must be equal in capacity.

     Two, the translation among all HNLs is guaranteed.

     Three, a universal language is possible in principle.

 

With the Martian Language Thesis (MLT), human natural languages obviously have two levels.

      The base:

    1) syntaxes to describe the universal laws (physics, math, etc.) and world events continent,

    2) semantics to interpret (infer) those syntaxes.

     The dress: the choices of symbols or tokens for those syntaxes (verbal or lexical); having both is not a necessary condition (one of them is enough). This leads to Phonology and Morphology. The different choices will result in different pragmatics. So, the teaching that pragmatics is a subset of semantics is wrong in principle.

 

The above shows that there is no FREEDOM of choosing the base, that is, all HNLs are equal in capacity.

However, there is infinite freedom of choice for dress. Then, the different dresses will have different efficiencies (in addition to the capacity). That is, we can define a ‘perfect efficient HNL’, {THE perfect language}.

 

There are thousands of living human natural languages today, and each one of them has different phonology, morphology, and pragmatics. To understand their differences is very important. Yet, my concern here is only about the reason why they can be so different. It is caused by (based on) the Spider Web Principle.

 

{The "Spider Web Principle (SWP)" -- The whereabouts to build a spider web is completely arbitrary (total freedom or total symmetry). However, as soon as the first spider thread is cast, that total symmetry is broken, total freedom no more.}

 

The first thread determines its whereabouts (America, Europe, Asia, etc.). The second thread defines its center. The third thread confines its scope.

 

Thus, as soon as the first morpheme or the first grammar rule of a language is cast, it enters into a Gödel system; consistency becomes the norm, and total freedom is no more. That is, every language has its own internal framework regardless of the fact that universal grammar is about total freedom. Thus, universal grammar has two spheres.

     1. Universal level -- total freedom. Every language can choose its grammar arbitrarily with total freedom.

     2. Language x level -- as soon as a selection is made, it becomes a "contract" (among its speaking community) with its own internal framework.

 

"Spider Web Principle (SWP)" is the first principle of linguistics.

The Martian Language Thesis (MLT) is the second principle for linguistics. It encompasses the following attributes.

     1. Permanent confinement -- no language (Martian or otherwise) can escape from it.

     2. Infinite flexibility -- it can encompass any kind of language structure.

     3. Total freedom -- no limitation is set for languages.

 

So, the MLT guarantees that all HNLs (human natural languages) have the same capacity, while the SWP guarantees that all HNLs have total freedom in choosing their own way of syntax-ing (the dress of HNL: phonology, morphology, and/or pragmatics).

 

How big is this freedom? It is infinite, such as from 1 to ∞ (infinity). Yet, in mathematics, the interval [1, ∞) can be mapped one-to-one into [0, 1] (for example, x → 1/x sends [1, ∞) onto (0, 1]). Thus, the entire scope of the infinite can be expressed with (or confined in) [0, 1]; that is, the dress of all HNLs can be expressed as a spectrum within [0, 1] (see chapter 24).
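One standard way to see how an infinite range can be confined in [0, 1] is the reciprocal map, which sends every x ≥ 1 to 1/x in (0, 1] and can be undone exactly. The Python sketch below uses that map as an assumption; the post itself does not say which mapping is meant.

```python
def to_unit_interval(x):
    """Map a value in [1, infinity) to a value in (0, 1] via x -> 1/x."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return 1.0 / x


def from_unit_interval(y):
    """Invert the map: a value in (0, 1] goes back to [1, infinity)."""
    if not 0 < y <= 1:
        raise ValueError("y must be in (0, 1]")
    return 1.0 / y


for x in (1, 2, 4, 1000):
    y = to_unit_interval(x)
    print(x, "->", y, "->", from_unit_interval(y))
```

Larger values of x land closer to 0, so the whole unbounded range fits inside the unit interval without any two values colliding.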

 

In my book {Linguistics Manifesto}, I defined three types of HNL (human natural language).

     One, type 0: there are many attributes for each ‘0’. Here, I will simplify it as {non-inflection = 0},

     Two, type 1: {inflected = 1}

     Three, between [0, 1].

 

In that book, I also show that there is an efficiency issue among the different types of HNL although their capacities are all equal. I, thus, defined “Perfect Language”.

 

Perfect language has three attributes:

     One, with only a finite number of tokens (roots or alphabets), it can construct unlimited words (vocabulary).

     Two, the sound (pronunciation) of each word can be read out from its face.

     Three, the meaning of each word can be read out from its face.

 

Thus far, I have defined ‘ideal language’ via the scope of a language. Now, I have defined ‘THE perfect language’ via efficiency.

 

 

Someone said: {(your work) …loaded with a mathematical approach which has no linguistics value in natural languages, such as ‘of what value is this in natural languages’.}

 

Four,

Finding the PreBabel

Last but not least, is there a universal (human) language?

If yes, then how can we get it?

After we get it, how can we prove that it is universal?

This will be the issue that I want to discuss.

 

{Go to, let us go down, and there confound their language, that they may not understand one another’s speech. So, the LORD scattered them abroad from thence, upon the face of all the earth: and they left off to build the City. Therefore, is the name of it called Babel, because the LORD did there confound the language of all the earth: and from thence did the LORD scatter them abroad upon the face of all the earth. (Genesis, chapter 11: 7 to 9)}

 

This Bible story shows that the diversity of the human language was caused by God’s action, but it does not mention how the PreBabel (universal) language arose.

 

Yet, I have shown that the MLT (Martian Language Thesis) is the base for all HNLs (human natural languages). That is, a universal language (PreBabel) is possible in principle.

 

Furthermore, the SWP (Spider Web Principle) guarantees that God’s action to scatter them all abroad is not a fiction, as it can be done in reality.

 

Now, my objective is to construct a universal language. My first step is to make all HNLs mutually translatable; that is, I need to make translation tables for ALL of them.

 

If the task is only about three languages, I will need three translation tables, such as {A, B, C ==> AB, AC, BC}. If the task is about 5 languages, I need to make 10 tables {A, B, C, D, E ==> AB, AC, AD, AE, BC, BD, BE, CD, CE, DE}. In fact, the number of translation tables for an n-language task will be:

Y (number of translation tables) = n (n - 1) / 2

If n = 3, Y = 3

n = 5, Y = 10

n = 1000, Y = 499,500

Today, there are over 7,000 living languages. That is, Y is about 24.5 million. That will be a very big job.
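The counts above can be checked with a short Python sketch. It lists the pairwise tables for small cases and applies the formula Y = n (n - 1) / 2 for large ones; the function names are illustrative.

```python
from itertools import combinations


def pairwise_tables(languages):
    """List every unordered pair of languages; each pair needs one table."""
    return ["".join(pair) for pair in combinations(languages, 2)]


def table_count(n):
    """Number of pairwise translation tables for n languages."""
    return n * (n - 1) // 2


print(pairwise_tables("ABC"))         # ['AB', 'AC', 'BC']
print(len(pairwise_tables("ABCDE")))  # 10
print(table_count(1000))              # 499500
print(table_count(7000))              # 24496500
```

The exact figure for 7,000 languages is 24,496,500, which rounds to the 24.5 million quoted above.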

 

Fortunately, there is a shortcut: we choose one language as the master (the center) and make translation tables only from this center. Then, for 7,000 languages, we need only 6,999 translation tables, as the center language needs no translation for itself.

 

That is, the translation between any two languages (E and D) can be done in two steps.

     First, translate E to C (the center master)

     Second, translate C to D.

 

This shortcut reduces my task about 3,500 times (from about 24.5 million tables to 6,999).
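The two-step route through a center language can be sketched in Python as the composition of two tables. The words "e_apple", "c_apple", and "d_apple" are hypothetical placeholders for the same concept in languages E, C, and D; the function names are also illustrative.

```python
def translate_via_center(word, source_to_center, center_to_target):
    """Translate in two steps: source language -> center -> target language."""
    center_word = source_to_center[word]  # step one: E to C
    return center_to_target[center_word]  # step two: C to D


def hub_table_count(n):
    """Tables needed when every language is linked only to the center."""
    return n - 1


def pairwise_table_count(n):
    """Tables needed when every pair of languages is linked directly."""
    return n * (n - 1) // 2


# Hypothetical one-entry tables, for illustration only.
e_to_c = {"e_apple": "c_apple"}
c_to_d = {"c_apple": "d_apple"}

print(translate_via_center("e_apple", e_to_c, c_to_d))  # d_apple
print(hub_table_count(7000))       # 6999
print(pairwise_table_count(7000))  # 24496500
```

The price of the shortcut is that every translation passes through the center, so the center language must be able to carry every meaning without loss.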

Then, which language should be chosen as the center master? In principle, any language will be fine. But if we want to reduce our task even further, more criteria are needed.

 

In 1997, I published a law: {If we can encode ONE human natural language with a closed set of root words, then any ARBITRARY vocabulary type language will be organized into a logically linked linear chain too.}

If we can use that {closed root set} to construct a virtue language as the center master, my task will be further reduced about 100-fold.

 

But the catch was that I did not have a {code set} at that time and did not know which language would be the best candidate if I could find a {code set}. I simply had no idea how to construct such a code set. Even if I did construct a code set, there would be a mammoth job to verify it.

 

Twenty years later, I did find that {code set}. With that code set, we can construct a VIRTUE language as the center for our translation task. Yet, this virtue language is, in fact, a universal (PreBabel) language.

 

All my above discussions are theories. Without finding or constructing a REAL language that meets all the above descriptions, all the above will simply be nonsense.

 

As always, a theory is a guiding light for its description. In this case, the ‘closed encoding set (CES)’ is that guiding light. Then, how can we find such a CES?

 

The way is to analyze what consequences a CES will produce. If a language is based on a CES, then the meaning of every word in its vocabulary can and should be read out from its face. And this becomes the sole searching criterion.

 

Now, the entire PreBabel (universal language) program becomes clear.

 

     One, criterion: if we can find a CES, then we can encode, at least, one HNL (human natural language).

     Two, consequence 1: if we can encode one HNL, we can encode ALL HNLs, and this is based on the MLT (Martian Language Thesis).

     Three, consequence 2: when a CES can encode all HNLs, then we can construct a virtue language (VL, the Mother Proper) with it too, see chapter 27. And this VL is, in fact, a universal language.

     Four, the verification on CES is guaranteed as the vocabulary of any HNL is finite and thus can be checked 100% in addition to theoretical proof.
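The exhaustive check in point four can be sketched in Python: given a finite vocabulary and a candidate closed set of roots, test every word and report the ones that cannot be built from the roots. This is only a toy model. It assumes that a word is "encoded" when it splits into a sequence of roots, and the English roots and words are hypothetical stand-ins, not the actual CES.

```python
def decompose(word, roots):
    """Return one split of `word` into roots from the closed set, or None."""
    # best[i] holds a split of word[:i] into roots, or None if there is none.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in roots:
                best[end] = best[start] + [piece]
                break
    return best[len(word)]


def verify_ces(vocabulary, roots):
    """Check every word; return the words the root set cannot encode."""
    return [word for word in vocabulary if decompose(word, roots) is None]


# Hypothetical toy data, for illustration only.
roots = {"sun", "moon", "light", "flower"}
vocabulary = ["sunlight", "moonlight", "sunflower", "moonstone"]

print(decompose("sunlight", roots))   # ['sun', 'light']
print(verify_ces(vocabulary, roots))  # ['moonstone']
```

Since the vocabulary is finite, the check always terminates, and an empty result means the candidate set encodes every word under this toy notion of encoding.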

 

With the four points above, the issue becomes a Yes or No question; no arguments of any kind can be made.

If we can show that one CES can encode ONE (any one) HNL, the answer is Yes.

If we cannot find such a CES, then the PreBabel is No, regardless of what God did say, and all my saying above is simply nonsense.

 

Fortunately, the news is good. I did find one CES, and showing it is the key objective of this book.

 

For this CES, I had some discussion at ‘Historical Linguistics and Etymology’ (on Facebook), see https://www.facebook.com/groups/historicallinguisticsandetymology/permalink/2477904812498560/

 

Many members of this forum hold this view: {Every language is "ideal" for the environment in which it developed, just as living organisms are ideally adapted to their environments.}


Five,

Arguments and pieces of evidence for the concept of a universal language.

  1. Definition and Scope of Language: The article defines a language as a system that can describe a universe (U). It explores different universes (U1, U2, U3, U4) and the types of language systems (T) needed to describe them. For example, a two-token system can describe a computable universe (U1), while more complex universes require more complex language systems.
  2. Martian Language Thesis (MLT): The article posits that any human language can always establish communication with Martian or Martian-like languages, which implies that all languages share the same meta-language.
  3. Spider Web Principle (SWP): The article explains the diversity of languages with the SWP: the first choice of a morpheme or grammar rule is completely arbitrary, but once it is made, it becomes a "contract" within the speaking community.
  4. Human Natural Language (HNL) Capacity: The article argues that all human natural languages have the same capacity due to the MLT, ensuring translation among all HNLs and the possibility of a universal language in principle.
  5. Efficiency and Perfect Language: The article defines a perfect language as one that builds unlimited words from a finite number of tokens and in which the pronunciation and meaning of each word can be read from its face. It suggests that the Chinese written system, when understood through Chinese Etymology {see the book (PreBabel, ISBN 9786204986821, US copyright © TX 8-925-723)}, meets these criteria and is a perfect language.
  6. Translation and Universal Language: The article proposes to construct a universal language by creating translation tables for all human natural languages, using a central master language to reduce the number of required translation tables.
  7. Closed Encoding Set (CES): The article introduces the concept of a CES. If a CES can encode one human natural language, it can encode all HNLs, leading to the construction of a universal language.

These arguments and pieces of evidence collectively support the reality of a universal language, grounded in theoretical principles and practical steps for implementation.

  

Key differences from traditional linguistics theories:

1)      Scope of Linguistics: Traditional linguistics often focuses on human natural languages, encompassing phonology, morphology, syntax, and semantics. This chapter expands the scope of linguistics to include the study of any system (T) that can describe a universe (U), not just human languages.

2)      Definition of Language: Traditional linguistics defines language as a system of communication used by humans. This chapter defines language more broadly as any system (T) that can describe a universe (U), where U is generally different from T.

3)      Ideal and Perfect Languages: Traditional linguistics does not typically address the concepts of ideal or perfect languages. This chapter introduces the idea of an ideal language, which can describe realities beyond the largest known universe (U4), and a perfect language, which has three attributes: a finite number of tokens, pronunciation readable from the word's face, and meaning readable from the word's face.

4)      Martian Language Thesis (MLT): This chapter introduces the Martian Language Thesis, which posits that any human language can establish communication with Martian or Martian-like languages, implying all human languages share a universal meta-language. This is not a concept found in traditional linguistics.

5)      Spider Web Principle (SWP): This principle explains the diversity of languages as a consequence of initial arbitrary structural choices, which become fixed "contracts" within language communities. Traditional linguistics does not typically use this principle to explain language diversity.

6)      Universal Meta-Language: This chapter posits that all human natural languages share a universal meta-language, which consists of universal laws, universal consciousness, and human natural languages as symbol systems bridging these two. Traditional linguistics does not typically address the concept of a universal meta-language.

7)      Large Complex System Principle (LCSP): This principle suggests that laws governing large complex systems in one domain correspond to those in others, implying that linguistic principles apply universally, including in physics, mathematics, life sciences, and social sciences. Traditional linguistics does not typically extend its principles to other disciplines.

       These differences highlight the broader and more integrative approach taken in this chapter compared to traditional linguistics theories.

 

Linguistics ToE is available at { https://tienzengong.wordpress.com/wp-content/uploads/2025/09/2ndlinguistics-toe.pdf }

 
