Language-Oriented Data
Truenumbers is the first to represent data as facts written in a structured natural language. It unifies the way users, developers and every part of a system talk about information, independent of storage implementation. Data you need to use, create or analyze every day is at your fingertips - not behind a wall of enterprise applications. Truenumbers scales from desktop to enterprise and is a better system of record than anything you have down in I.T. today.
Natural language offers several significant advantages over traditional schema-based data representations, addressing limitations such as data silos, lack of descriptive meaning, and the challenges of integration and interoperability.
Limitations of Traditional Schema-Based Data Systems
• Lack of Descriptive Value and Meaning: Relational tables, objects, or XML structures are primarily for computation and have no inherent descriptive value. An XML element or table column doesn't describe its contents; meaning is weakly implied by labels. Queries often rely on human understanding of labels like "Aircraft" or "FerryRng" to make sense, rather than the schema itself providing explicit meaning.
• Information Silos and Lack of Liquidity: Different schemas and formats create silos, limiting agility, flexibility, and interoperability. Integrating and federating data across multiple schemas is an ongoing, often unsuccessful task that struggles to keep up with system development and changing information needs. This "silo effect" destroys the inherent information liquidity found in natural language.
• Difficulty in Capturing and Expressing Rich Information: Schematic data is improved for computation but often diminished from the perspective of rich information capture. More precise data models, like ontologies, are difficult to build and maintain and move away from the expressivity and nuance of natural language. Description is often treated as separate metadata, rather than being part of the data itself.
• Accessibility and Shared Understanding Challenges: Information moved from human-readable handbooks to "arcane databases" with the advent of computers, making it less accessible to users. Labeling columns with abbreviated or technical terms (e.g., "FerRngNm" instead of "ferry range") can obscure information from systems and users.
How Natural Language Overcomes These Limitations (Truenumbers Approach)
• Universal and Descriptive Data Representation: People communicate about any information using natural language without needing a predefined schema or model. Phrase-oriented data, used by Truenumbers, indexes data values by descriptive natural language phrases, making it universal, human, and machine-readable. Truenumbers conceptualizes data as a linguistic artifact, aligning with how humans think, speak, and reason.
• Enhanced Data Liquidity and Intelligibility: Truenumbers leverages the inherent liquidity and intelligibility of natural language as a basis for a new data strategy. By using natural language phrases, data becomes more accessible and understandable to both humans and machines, reducing the "silo effect" caused by disparate schemas.
• Meaning as the Core Content: Meaning in human terms requires narrative. Truenumbers makes description the main content of data, much like a reference book, rather than separating it as metadata. Each Truenumber is an atomic sentence with a subject phrase, a property phrase, and its value, making it a linguistic unit of meaning. This approach makes the data itself the metadata.
• Common Language and User Custody of Information: Natural language allows communities of interest, like warfighters and system architects, to share a common understanding and terminology. Truenumbers aims to return custody of information to its users, allowing them to function as peers speaking the same data language.
• Seamless Integration and Interoperability: Natural language inherently avoids the silo problems of computational data. Truenumbers, as a universal data language, integrates legacy and new platforms and can become a unified data platform. Its consistent structure means applications become interoperable, reducing the amount of knowledge embedded in code.
• Support for Both Computation and Narrative: While natural language is traditionally intractable for computation, Truenumbers bridges this gap. It allows computation and rich human description simultaneously. Truenumbers is designed to be written, managed, and searched like narrative text, while also supporting computation effectively.
• Improved AI and Analytics: By making data language-based and semantically rich, Truenumbers provides a foundation for advanced analytics, AI, and Machine Learning. It serves as an ideal input for Large Language Models (LLMs), preserving context, intent, and structure without needing complex schema mappings.
• Simplified Structure and Expressivity: Truenumbers employs a structured subset of natural language, using simple sentence forms such as "<subject> has <property> = <value>" or "<property> of <subject> is <value>". Noun phrases can include adjectives and possessives (indicated by "of") and are represented as phrase-paths (e.g., earth/moon, diameter:equatorial), which are ideal for indexing and search.
• Other Benefits: Truenumbers also offers benefits like data portability, real-time distributed system capabilities, thin clients, and enhanced compliance and security due to its immutable and self-describing nature.
Truenumbers have these characteristics:
• Truenumbers can be about any domain, as the list above shows. Expressed in a structured yet readable language, they are ideal for search, indexing and analytics.
• Information is a collection of portable values, each with its own description.
• These facts, called truenumbers, are represented in a structured natural language for machines and people.
• Relationships among facts are expressed by their use of common terms and phrases, just as in common language.
• Physical quantities and units of measure are built in to represent real-world data.
• Truenumbers can be adopted one use case and one data source at a time, without disruption.
Quick look: a truenumber in Excel
A truenumber is described in a simple language that's natural and easy to learn. We'll define that language in more detail later, but here's an example of making a truenumber manually using the TrueOffice Excel add-in. We simply write the following into a spreadsheet cell:
[estimated construction cost of the new data center = 35 USD millions].
This is sent to the cloud where it's compiled, stored, and returned to your spreadsheet cell as a truenumber. There, TrueOffice grabs the descriptive DNA inside and uses it to generate a cell comment automatically.
The truenumber looks like the number "35" to Excel, so the spreadsheet will work normally for any Excel users with or without Truenumbers. Copy it to Word or email and its DNA goes with it where TrueOffice can use it to generate a sentence or a footnote.
Having self-contained data like truenumbers opens up many new possibilities. Each truenumber has a subject, author and creation date to help you organize and search them, so you could find all truenumbers that have "new data center" as their subject, or search for "costs" or "construction costs". Convert from USD to any other currency because our number knows it's in millions of dollars. Truenumbers can also be tagged, which is the way to organize truenumbers on the fly, and create business processes without programming that provide better governance than your best enterprise software.
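To make the idea of searchable, self-contained data concrete, here is a minimal Python sketch. It is not the Truenumbers API: the `Truenumber` record, its field names, the `find_by_subject` helper, and the sample authors, dates and tags are all hypothetical, chosen only to mirror the subject, author, creation date and tags mentioned above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Truenumber:
    # Hypothetical record layout; the field names are illustrative only.
    subject: str
    prop: str
    value: float
    unit: str
    author: str
    created: date
    tags: frozenset = field(default_factory=frozenset)


def find_by_subject(numbers, phrase):
    """Return the records whose subject contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [n for n in numbers if needle in n.subject.lower()]


# Made-up sample data for illustration.
numbers = [
    Truenumber("new data center", "estimated construction cost", 35, "USD millions",
               author="alice", created=date(2024, 1, 15), tags=frozenset({"budget"})),
    Truenumber("old warehouse", "floor area", 12000, "ft^2",
               author="bob", created=date(2024, 2, 1)),
]

hits = find_by_subject(numbers, "data center")
```

Because each record carries its own description, the search needs no schema lookup: the subject phrase is part of the data.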
Truenumber language
Computer data is just bits. To make sense and be useful, there must be natural language descriptions of it, like labels on classes and program variables, concept names in ontologies and prompts in user interfaces. All these are just comments, or "descriptive names" for human developers or users, and ignored by system code. Truenumbers elevates the descriptive connection between data and domain to be the core of its data representation for the system as well as for people. Let's see how this works. Here's another example of a truenumber sentence:
“the antenna of the Chrysler building has nominal height = 71 feet”
This looks like human natural language, but Truenumbers is restricted to statements of a very specific form, that give a value for a property of something. One of these allowed sentence forms is <subject> has <property> = <value>. In the sentence above, the subject is "antenna of Chrysler building" and the property is "nominal height".
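The sentence form above is regular enough to split mechanically. The following Python sketch shows one way to do it with a regular expression. The pattern and the `parse_sentence` name are assumptions made for illustration, not the Truenumbers compiler, and only the single form `<subject> has <property> = <value>` is handled.

```python
import re

# Assumed pattern for the form "<subject> has <property> = <value>".
SENTENCE = re.compile(
    r"^\s*(?P<subject>.+?)\s+has\s+(?P<property>.+?)\s*=\s*(?P<value>.+?)\s*$"
)


def parse_sentence(text):
    """Split a sentence into its subject, property and value parts."""
    match = SENTENCE.match(text)
    if match is None:
        raise ValueError(f"not of the form '<subject> has <property> = <value>': {text!r}")
    return match.groupdict()


parsed = parse_sentence(
    "the antenna of the Chrysler building has nominal height = 71 feet"
)
# parsed["subject"]  -> "the antenna of the Chrysler building"
# parsed["property"] -> "nominal height"
# parsed["value"]    -> "71 feet"
```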
Subject and property get encoded as special strings called Structured Resource Descriptors (SRD). An SRD is a path-like sequence of words separated by colon ( : ) and forward-slash ( / ) operators. The colon operator encodes an adjective-noun relationship so building:Chrysler would be the SRD for the phrase “Chrysler building”.
The slash operator acts like “of” in English, meaning belonging to, or part of. So the phrase “antenna of Chrysler building” has the SRD building:Chrysler/antenna and is the subject of the sentence. The property measured by a truenumber is encoded as an SRD too. It’s OK to use SRDs in sentences instead of the equivalent phrases, so this example could also be written:
"building:Chrysler/antenna has height:nominal = 71 ft"
Emergent domain knowledge
Searching for properties of 7049 aluminum bar stock using a path in a subject tree
SRD subjects, tags and properties shape the knowledge contained in truenumbers. Given a bag of truenumbers, the SRDs tell us what subjects are being talked about, what sort of properties are of interest, and so forth. SRDs compose naturally to form trees, useful tools for managing and visualizing the vocabulary of a domain. As we gather more facts about buildings, we might find that the building tree has hundreds of branches, one for each building. Yet, an SRD is computationally very light-weight, being only a string, so vocabularies can be complex and large. You can choose to lock down your vocabularies up front, let them grow organically or anything in-between.
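As a rough illustration of how phrases become SRDs and how SRDs compose into a tree, here is a small Python sketch. It is not the Truenumbers implementation. The function names are hypothetical, dropping articles and the ordering of multiple adjectives are assumptions, and treating both ":" and "/" as steps in one tree path is a simplification.

```python
import re

ARTICLES = {"a", "an", "the"}


def noun_phrase_to_srd(phrase):
    """Encode an adjective-noun phrase, e.g. 'Chrysler building' -> 'building:Chrysler'."""
    words = [w for w in phrase.split() if w.lower() not in ARTICLES]
    if not words:
        raise ValueError("empty noun phrase")
    head, modifiers = words[-1], words[:-1]
    # Assumption: with several modifiers, the one nearest the noun comes first.
    return ":".join([head] + modifiers[::-1])


def phrase_to_srd(phrase):
    """Encode 'X of Y' as 'Y/X', applying the adjective rule to each part."""
    parts = re.split(r"\s+of\s+", phrase.strip())
    # "X of Y" means X belongs to Y, so Y comes first in the path.
    return "/".join(noun_phrase_to_srd(p) for p in reversed(parts))


def build_tree(srds):
    """Fold SRD strings into a nested dict, one level per term."""
    tree = {}
    for srd in srds:
        node = tree
        for term in re.split(r"[:/]", srd):
            node = node.setdefault(term, {})
    return tree


subjects = [phrase_to_srd(p) for p in (
    "the antenna of the Chrysler building",
    "lobby of the Chrysler building",
    "lobby of the Woolworth building",
)]
tree = build_tree(subjects)
# tree["building"] has one branch per building: "Chrysler" and "Woolworth".
```

Since an SRD is only a string, the tree can be rebuilt from the data at any time rather than maintained as a separate schema.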
Truenumbers are statements of measurement
In a database, a person's height would be a floating-point number, and the data model would imply in some way that the number was a height, and in what units, for example through a column name like HEIGHT_INCHES. Truenumbers, instead, have units of measure and tolerances built into all numbers, and the property measured is part of the data as an SRD. This lets data benefit from the fact that physical quantities are a kind of "standard" that real-world data naturally adheres to, giving data from differing domains a baseline for comparisons and cross-domain analytics.
This picture shows an error being reported when trying to create a truenumber where the units don't match the property. In this case, "ft" is known to be a unit of length, not area. It would have to be "ft^2" or "in^2" or "acres", etc. to create the number.
Which unit of area you choose doesn't matter, because the Truenumbers internal math compares all values in standard SI units no matter how they are expressed.
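The unit check and the SI comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Truenumbers math engine: the `UNITS` table, the dimension names and the `to_si` function are hypothetical, and only a handful of length and area units are included.

```python
import math

# Hypothetical unit table: unit -> (dimension, factor to the SI base unit).
UNITS = {
    "m": ("length", 1.0),
    "ft": ("length", 0.3048),
    "in": ("length", 0.0254),
    "m^2": ("area", 1.0),
    "ft^2": ("area", 0.3048 ** 2),
    "in^2": ("area", 0.0254 ** 2),
    "acres": ("area", 4046.8564224),
}


def to_si(value, unit, expected_dimension):
    """Convert to SI, rejecting units whose dimension does not fit the property."""
    if unit not in UNITS:
        raise ValueError(f"unknown unit {unit!r}")
    dimension, factor = UNITS[unit]
    if dimension != expected_dimension:
        raise ValueError(f"{unit!r} is a unit of {dimension}, not {expected_dimension}")
    return value * factor


# One acre and 43,560 square feet describe the same area once both are in m^2.
same_area = math.isclose(to_si(1, "acres", "area"), to_si(43560, "ft^2", "area"))
```

Calling `to_si(71, "ft", "area")` raises a `ValueError`, which mirrors the mismatch described above: "ft" is a unit of length, not area.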
The truenumber internal math engine is both unit- and tolerance-aware, and can be used to define one truenumber as an expression combining other truenumbers. The example at left shows a truenumber for the area of an angled metal bar defined in terms of the bar dimensions. Tolerance on the bar length is reflected in the uncertainty on the resulting area.
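How a tolerance on one dimension carries through to a derived area can be shown with a simplified Python sketch. The source does not say how Truenumbers propagates tolerances, so this example assumes first-order propagation for a product. The `Measured` class and the bar dimensions are hypothetical, the bar is treated as a plain rectangle, and units are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    value: float
    tol: float = 0.0  # symmetric tolerance, in the same unit as value

    def __mul__(self, other):
        # First-order propagation for a product: d(ab) ~ |a|*db + |b|*da.
        return Measured(
            self.value * other.value,
            abs(self.value) * other.tol + abs(other.value) * self.tol,
        )


# Made-up dimensions: length 2.0 with tolerance 0.01, width 0.05 taken as exact.
length = Measured(2.0, 0.01)
width = Measured(0.05)
area = length * width
# area.value is about 0.1 and area.tol is about 0.0005.
```

The tolerance on the length is the only source of uncertainty here, and it shows up scaled by the width in the resulting area.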
Data Management as Content Management
Computers can do meaningful computation but also help manage human content without regard to what it means (like a word processor or document manager does). Truenumbers is the first to represent data as human content, acting as a librarian or card catalog. The more we say, the more we know. Philosopher Ludwig Wittgenstein realized the consequences of this distinction around 1950. In that sense, he is the father of Truenumbers.
“Creating meaningful statements is not a matter of mapping the logical form of the world.
It is a matter of using conventionally defined terms.” - Philosophical Investigations, Ludwig Wittgenstein (1953)