Abstract

This article presents Etherius, a document-grounded AI research agent designed to support scientific exploration, retrieval, and interpretation within the framework of the BSM-SG theory (Basic Structures of Matter – Supergravitation). Etherius was developed as a specialized knowledge engine rather than a generic conversational assistant. Its purpose is to work directly with a large curated corpus of BSM-SG materials, extract relevant passages, synthesize source-based answers, assist with formula-oriented reasoning, and provide transparent access to the underlying documents.

Unlike general-purpose AI systems that rely primarily on broad pretraining and may produce ungrounded summaries, Etherius is built around a retrieval-augmented architecture that prioritizes local scientific sources. The system currently operates on a corpus of approximately 3,549 indexed documents/pages, including PDF and DOCX materials, and processes them into approximately 11,903 retrieval chunks for semantic and keyword-based access. This makes it possible to ask targeted theoretical questions and receive answers grounded in the actual BSM-SG corpus.

Etherius introduces several practical advantages for researchers working on alternative or emerging physical frameworks. It enables structured access to a large and otherwise difficult-to-navigate body of literature, improves the speed of conceptual lookup, supports formula-oriented querying through a dedicated Physics Mode, and links answers back to source pages inside an integrated PDF workspace. In this way, it serves not merely as a chatbot, but as an early-stage AI research workstation for BSM-SG knowledge exploration.


1. Introduction

As scientific theories grow in scope, depth, and documentary volume, a major barrier to progress is no longer only discovery but also the navigation of accumulated knowledge. This is especially true for specialized frameworks that do not fit neatly into mainstream academic databases, standardized taxonomies, or widely supported digital tooling.

The BSM-SG theory represents such a case. It proposes a highly distinctive conceptual framework concerning the structure of matter, supergravitation, vacuum dynamics, lattice-like organization, and deeper physical relations that aim to connect microphysical and cosmological domains. Over time, this framework has generated a significant body of texts, books, notes, revisions, derived interpretations, translations, and supporting materials. For any researcher, collaborator, or reviewer, one of the immediate challenges is simple but fundamental:

How can one efficiently search, compare, interpret, and verify the content of such a corpus?

Etherius was created as an answer to this problem.

Rather than treating AI as a generic language interface, the Etherius project treats AI as a scientific access layer over a specialized and growing body of theory. Its objective is to make BSM-SG documents searchable, interpretable, and operationally useful through a research-oriented AI agent that remains closely tied to the source material.

The project is therefore not just about building a chatbot. It is about establishing a BSM-SG knowledge engine capable of supporting:

  • document-grounded answers,
  • scientific concept retrieval,
  • source-aware synthesis,
  • formula-oriented explanation,
  • PDF-based evidence inspection,
  • and future expansion into comparative reasoning and research drafting.

2. Why a Dedicated BSM-SG AI Agent Is Needed

Most conventional AI assistants suffer from a core limitation in specialized scientific contexts: they are often strong at fluent language generation but weaker at document fidelity. When asked about highly specialized frameworks, they may produce responses that sound plausible while blending together:

  • partial prior knowledge,
  • generalized textbook patterns,
  • assumptions from mainstream paradigms,
  • and only weakly relevant fragments of the intended theory.

For a framework like BSM-SG, this is not acceptable. Theoretical precision matters. Terminology matters. Internal consistency matters. Source pages matter.

A dedicated BSM-SG AI agent is needed because the theory has several characteristics that make general AI insufficient:

2.1 Specialized terminology

BSM-SG introduces concepts, structures, and relationships that require careful handling. These concepts cannot be reliably reconstructed from mainstream physical language alone.

2.2 Large internal corpus

The body of work is already substantial and continues to grow. Human memory alone is not enough to efficiently traverse thousands of pages of interrelated content.

2.3 Need for source-grounded reasoning

Researchers need answers that can be checked against documents, not merely paraphrased in general terms.

2.4 Formula and law-oriented interpretation

Many questions concern laws, constants, dependencies, and physical relations. This requires an agent capable not only of retrieval, but of structured formula presentation.

2.5 Cross-document synthesis

The theory is distributed across multiple documents, revisions, and formats. Useful answers often require combining several retrieved passages rather than quoting one page in isolation.

Etherius addresses these requirements directly.


3. What Etherius Is

Etherius is a retrieval-augmented AI research assistant tailored specifically for the BSM-SG corpus.

Its current architecture combines:

  • local document ingestion,
  • semantic search through vector embeddings,
  • BM25 keyword retrieval,
  • hybrid source ranking,
  • strict source-grounded answer mode,
  • physics-oriented output formatting,
  • and an integrated PDF workspace for viewing source pages inside the application.

In practical terms, Etherius allows a researcher to ask questions such as:

  • What do the local documents say about the structure of matter in BSM-SG?
  • How is the supergravitational law described?
  • Which passages discuss the vacuum as physical space?
  • What is stated about the relation between force and distance?
  • Which source pages support the description of a specific constant or concept?

The system then retrieves the most relevant passages from the indexed corpus and generates an answer grounded in those passages.


4. Current Corpus Scale and Processing Capacity

One of Etherius’s major strengths is that it is not operating on a trivial sample set, but on a substantial body of real source material.

At the current stage of development, the system has been demonstrated on a corpus of approximately:

  • 3,549 indexed documents/pages
  • 11,903 semantic retrieval chunks

The corpus includes multiple document types, such as:

  • PDF documents
  • DOCX documents
  • plain text and markdown materials when needed

This matters for two reasons.

First, it means Etherius is not limited to one document or one small booklet. It already functions over a multi-thousand-page knowledge environment.

Second, chunk-based indexing allows the system to access not only file-level information, but localized concept-bearing passages within files. This is essential for scientific retrieval, because relevant evidence usually lives in specific segments, not in entire documents as a whole.
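
The chunking step can be illustrated with a minimal sketch. The source does not describe how Etherius actually splits text, so the window size, the overlap, and the names `Chunk` and `chunk_page` are assumptions made for illustration. The figures quoted above work out to roughly 3.4 chunks per indexed item (11,903 / 3,549), which is consistent with splitting each page into a few passages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # source file name (hypothetical field)
    page: int    # 1-based page number within the source
    index: int   # position of the chunk within the page
    text: str


def chunk_page(doc_id: str, page: int, text: str,
               max_words: int = 120, overlap: int = 20) -> list[Chunk]:
    """Split one page of text into overlapping word windows."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for i, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(doc_id, page, i, " ".join(window)))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the page
    return chunks
```

Keeping the page number on every chunk is what later allows an answer to point back to a specific source page rather than to a whole file.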


5. Core Technical Principles

5.1 Retrieval-Augmented Generation

Etherius is built around a retrieval-augmented workflow. Instead of asking the language model to answer from memory alone, the system first searches the local BSM-SG corpus and retrieves relevant passages.

This reduces the risk of generic or ungrounded responses, since the model answers from retrieved BSM-SG passages rather than from pretraining alone, and makes the assistant better suited to research use.

5.2 Hybrid Retrieval

The system uses a hybrid retrieval model, combining:

  • vector search for semantic similarity
  • BM25 keyword retrieval for exact or near-exact term matching

This dual strategy is important because scientific queries often require both:

  • conceptual matching,
  • and precise lexical matching.

For example, formula-related queries may use exact symbolic or technical phrasing, while conceptual queries may require semantic approximation.
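
One common way to merge a vector ranking and a BM25 ranking is reciprocal rank fusion (RRF). The source says only that Etherius uses "hybrid source ranking", so RRF, the constant `k = 60`, and the chunk identifiers below are assumptions chosen for illustration, not a description of the actual implementation.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each list contributes 1 / (k + rank) for every id it contains, so a
    chunk ranked well by both retrievers rises above a chunk ranked well
    by only one of them.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results for one query from the two retrievers.
vector_hits = ["c2", "c1", "c3"]  # semantic similarity order
bm25_hits = ["c1", "c4", "c2"]    # keyword match order
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF works on ranks rather than raw scores, it does not require the cosine similarities and BM25 scores to be on comparable scales.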

5.3 Strict Mode

Etherius supports a Strict Mode, in which answers are constrained to the retrieved local context. This mode is especially important when the user wants high-confidence, corpus-grounded responses rather than speculative generalization.
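
A strict mode of this kind is often implemented at the prompt level. The sketch below is an assumption about how such a constraint could look: the instruction wording, the fallback message, and the function names are hypothetical, and `generate` stands in for whatever language model call the application uses.

```python
from typing import Callable

NO_CONTEXT_REPLY = "The local documents do not contain enough information to answer."


def build_strict_prompt(question: str, passages: list[tuple[str, int, str]]) -> str:
    """Build a prompt that restricts the model to the numbered passages.

    Each passage is a (source, page, text) tuple.
    """
    lines = [
        "Answer using ONLY the numbered passages below.",
        "Cite passages as [n]. If they are insufficient, say so.",
        "",
    ]
    for n, (source, page, text) in enumerate(passages, start=1):
        lines.append(f"[{n}] {source}, p. {page}: {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def strict_answer(question: str, passages: list[tuple[str, int, str]],
                  generate: Callable[[str], str]) -> str:
    """Refuse without calling the model when nothing was retrieved."""
    if not passages:
        return NO_CONTEXT_REPLY
    return generate(build_strict_prompt(question, passages))
```

Short-circuiting on an empty retrieval result keeps the model from falling back on general knowledge when the corpus has nothing to say.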

5.4 Physics Mode

Etherius also includes a dedicated Physics Mode, designed for formula-oriented and law-oriented questions. In this mode, the system is encouraged to structure answers using:

  • a short scientific answer,
  • a formula in LaTeX,
  • explanation of symbols,
  • and a source-grounded interpretation.

This makes the assistant more appropriate for theoretical physics workflows than a standard prose-only chatbot.
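
The four-part structure listed above could be represented as a small data type. This is a sketch only: the class name, field names, and rendering layout are assumptions. The example formula is Newton's law of gravitation, used purely as a generic placeholder; it is not taken from the BSM-SG corpus, and the file name in `sources` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicsAnswer:
    summary: str             # short scientific answer
    formula_latex: str       # formula as a LaTeX string
    symbols: dict[str, str]  # symbol -> meaning
    interpretation: str      # source-grounded interpretation
    sources: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the answer as plain text with a display-math formula."""
        parts = [self.summary, "", f"$$ {self.formula_latex} $$", "", "Symbols:"]
        parts += [f"  {sym}: {meaning}" for sym, meaning in self.symbols.items()]
        parts += ["", self.interpretation]
        if self.sources:
            parts += ["", "Sources: " + "; ".join(self.sources)]
        return "\n".join(parts)


# Placeholder content only; not a BSM-SG result.
example = PhysicsAnswer(
    summary="The force falls off with the square of the distance.",
    formula_latex=r"F = G \frac{m_1 m_2}{r^2}",
    symbols={"F": "force", "G": "gravitational constant",
             "m_1, m_2": "masses", "r": "distance"},
    interpretation="Interpretation drawn from the retrieved passages goes here.",
    sources=["vol1.pdf, p. 12"],
)
print(example.render())
```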

5.5 PDF Workspace

A particularly valuable feature is the integrated PDF Workspace, which allows the user to open source documents and navigate to relevant pages inside the application. This transforms Etherius from a simple answer generator into a source inspection environment.
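
Many PDF viewers, including those built into common browsers, honor a `#page=N` fragment on a PDF link. Whether the Etherius workspace uses this mechanism is not stated in the source, so the sketch below is an assumption about how a page-level citation link could be built; the function name and the file path are hypothetical.

```python
from urllib.parse import quote


def pdf_page_link(pdf_path: str, page: int) -> str:
    """Return a link that opens a PDF at the given 1-based page."""
    if page < 1:
        raise ValueError("page numbers are 1-based")
    # quote() percent-encodes spaces and other unsafe characters
    # while leaving "/" path separators intact.
    return f"{quote(pdf_path)}#page={page}"


link = pdf_page_link("docs/BSM SG vol1.pdf", 12)
```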


6. Why Etherius Is Valuable for BSM-SG Research

The value of Etherius is not merely that it answers questions. Its real strength lies in how it changes the workflow of theory exploration.

6.1 It compresses access time to knowledge

Searching thousands of pages manually is slow and cognitively expensive. Etherius reduces this burden by surfacing the most relevant passages within seconds.

6.2 It supports document-grounded interpretation

Rather than separating interpretation from documentation, Etherius ties them together. This is essential for any theory that requires careful reading and internal consistency.

6.3 It improves theoretical continuity

As a body of work grows over time, it becomes increasingly difficult to maintain continuity across books, revisions, and discussions. Etherius acts as a continuity layer across the corpus.

6.4 It makes collaboration easier

A collaborator, reviewer, or professor does not need to know the entire archive from memory. They can ask focused questions and immediately see the retrieved sources.

6.5 It supports scientific transparency

Because the system shows the passages and pages behind its answers, it allows users to inspect whether the answer is justified. This is a major advantage over opaque AI output.

6.6 It is extensible

Etherius is not a closed endpoint. It is a platform that can be expanded into:

  • comparative theory analysis,
  • structured definition extraction,
  • formula indexing,
  • ontology building,
  • research drafting,
  • and eventually more advanced agentic scientific reasoning.

7. Example of a Research Workflow

A typical use of Etherius may look like this:

  1. A researcher asks a question about a BSM-SG concept, law, or constant.
  2. The system retrieves the most relevant passages from the indexed corpus.
  3. Etherius synthesizes an answer from these passages.
  4. The user reviews the source excerpts.
  5. The user opens the cited PDF pages in the integrated viewer.
  6. The answer is evaluated or refined based on the original material.

This creates a loop of:

question → retrieval → synthesis → verification

That loop is one of the most important foundations of credible AI-assisted scientific work.
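
The loop can be sketched end to end with stand-in components. Everything here is illustrative: the word-overlap retriever is a toy replacement for the hybrid search described earlier, `synthesize` stands in for the language model call, and all names are hypothetical. The point is that the answer object carries its supporting passages, so the verification step always has something to inspect.

```python
import re
from typing import Callable, NamedTuple


class Passage(NamedTuple):
    source: str
    page: int
    text: str


class Answer(NamedTuple):
    text: str
    citations: list[Passage]  # passages the user can open and verify


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, corpus: list[Passage], top_k: int = 3) -> list[Passage]:
    """Toy retriever: rank passages by the number of words shared with the question."""
    q = _words(question)
    scored = [(len(q & _words(p.text)), p) for p in corpus]
    hits = [(s, p) for s, p in scored if s > 0]
    hits.sort(key=lambda sp: -sp[0])  # stable sort: ties keep corpus order
    return [p for _, p in hits[:top_k]]


def answer_question(question: str, corpus: list[Passage],
                    synthesize: Callable[[str, list[Passage]], str]) -> Answer:
    """question -> retrieval -> synthesis, returning citations for verification."""
    passages = retrieve(question, corpus)
    if not passages:
        return Answer("No supporting passage found.", [])
    return Answer(synthesize(question, passages), passages)
```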


8. Distinguishing Etherius from a Generic Chatbot

It is important to emphasize that Etherius is not simply a themed chatbot with a scientific name. It is better understood as an early-stage research operating layer over a specialized scientific archive.

A generic chatbot typically:

  • generates fluent responses,
  • uses broad prior knowledge,
  • and may not expose evidence.

Etherius, by contrast, is designed to:

  • search a local theory corpus,
  • retrieve the most relevant source passages,
  • synthesize source-backed answers,
  • present formulas more appropriately,
  • and allow direct inspection of the underlying documents.

This difference is crucial.

In specialized scientific work, traceability is often more important than eloquence. Etherius is valuable precisely because it attempts to preserve that traceability.


9. Advantages for Professors, Researchers, and Collaborators

For a scientific collaborator such as Professor Stoyan Sargoychev, Etherius offers several immediate advantages.

9.1 Faster access to relevant passages

Instead of remembering where a concept was discussed across multiple books or revisions, one can query the system directly.

9.2 Better review of theoretical consistency

By surfacing multiple related passages, Etherius helps reveal how a concept is treated across different documents or versions.

9.3 Easier validation of interpretations

A user can quickly compare the generated answer with the actual source pages.

9.4 More efficient discussion

During theory meetings or collaborative reviews, Etherius can function as a live document assistant.

9.5 A basis for future publication support

As the system matures, it can evolve from retrieval assistance into support for:

  • structured summaries,
  • terminology harmonization,
  • citation clustering,
  • and draft generation.

10. Current Limitations

It is equally important to state that Etherius is still an evolving system.

Its present limitations include:

10.1 Formula extraction is still partial

Although Physics Mode improves formula handling, formulas embedded in PDFs are not always captured perfectly during ingestion.

10.2 Concept chunking is still general-purpose

Current chunking is text-based and retrieval-oriented. Future versions should become more concept-aware and section-aware.

10.3 Retrieval can still miss ideal passages

Even with hybrid retrieval, some questions may return relevant but not optimal chunks, especially when terminology varies.

10.4 Scientific reasoning remains source-bound

Etherius is strongest as a retrieval and interpretation tool, not as an autonomous scientific theorist.

These limitations are normal for an early-stage research AI platform and point directly toward the next development steps.


11. Future Directions

The present version of Etherius already demonstrates a meaningful foundation, but its long-term potential is much larger.

Future upgrades may include:

  • formula-aware ingestion
  • section-aware and concept-aware chunking
  • metadata-rich retrieval
  • document family/version grouping
  • symbol dictionary extraction
  • cross-document comparison mode
  • research paper drafting assistance
  • semantic maps of BSM-SG concepts
  • ontology construction
  • experimental protocol support
  • theory-to-hardware design assistance

In other words, Etherius can evolve from a retrieval assistant into a broader BSM-SG scientific intelligence platform.


12. Conclusion

Etherius represents a significant step toward a new type of scientific AI tooling: one that is not merely conversational, but document-grounded, corpus-specific, and research-oriented.

By operating over a large BSM-SG corpus of approximately 3,549 indexed documents/pages and approximately 11,903 retrieval chunks, Etherius demonstrates that a specialized AI agent can make a complex theoretical body of work more searchable, more navigable, and more usable for real scientific collaboration.

Its major value lies in five areas:

  • access to a large and growing corpus,
  • grounded retrieval and synthesis,
  • source transparency,
  • formula-oriented reasoning support,
  • and integrated document inspection.

For BSM-SG research, Etherius is not just a convenience tool. It is the beginning of an AI-assisted scientific environment specifically aligned with the internal logic, terminology, and documentary needs of the theory.

As the corpus grows and the system becomes more concept-aware and formula-aware, Etherius has the potential to become a foundational instrument for the preservation, exploration, and further development of BSM-SG knowledge.