A Developmental Branching Model of Computation and Learning
Preface: From Fractal Intuition to Structural Adaptation
This work explores a computational system in which learning is not restricted to parameter optimization within a fixed architecture, but is instead defined as the joint evolution of parameters and topology under a shared evaluation signal.
The initial intuition behind this system is that adaptive intelligence does not require a fixed computational graph. Instead, useful computation can emerge from a structure that is itself subject to mutation, selection, and incremental refinement.
Rather than assuming a predefined neural architecture, the system begins with a minimal root node and evolves both its structure and parameters through stochastic perturbation and performance-based selection.
The central claim of this model is not biological fidelity, but computational sufficiency: that structured adaptation alone, without gradients or backpropagation, can produce nontrivial multi-task behavior within a single evolving system.
1. A System Defined by Structural Evolution
Traditional machine learning systems define a fixed computation graph prior to training. Learning is performed by adjusting parameters within this static structure.
In contrast, the system presented here treats structure itself as a mutable object. Each node contains both parameters and the capacity to generate additional computational structure.
The system begins with a single root node and evolves through repeated mutation and selection.
A node is defined as follows:
struct Node
{
double bias;
double wA;
double wB;
double wTask;
std::vector<std::unique_ptr<Node>> children;
};
Each node acts as a minimal computational unit capable of producing an output and spawning further computational units. Over time, this results in a recursively expanding tree whose topology is not predefined but discovered through search.
2. Computation as Recursive Selection
Computation in this system is defined as a recursive traversal over a branching structure in which child nodes compete for selection based on alignment with a locally induced signal.
double Forward(double a, double b, int task) const
{
double x = bias + wA * a + wB * b + wTask * task;
if (children.empty())
{
return x;
}
double bestScore = std::numeric_limits<double>::max();
double bestY = 0.0;
for (int i = 0; i < (int)children.size(); i++)
{
double y = children[i]->Forward(a, b, task);
double score = std::abs(y - x);
if (score < bestScore)
{
bestScore = score;
// Cache the winning output so the best child's subtree is not
// evaluated a second time on return.
bestY = y;
}
}
return bestY;
}
Each node computes a local representation \( x \) and recursively evaluates its children. The child whose output lies closest to this local value wins the competition, and its output becomes the node's result.
This defines computation not as a single forward pass through a fixed graph, but as a recursive selection process over competing subcomputations.
The resulting system can be interpreted as a structured search process embedded within inference itself.
3. Learning as Structural and Parametric Search
Learning is implemented as a stochastic hill-climbing process over both parameters and structure. At each iteration, the system mutates the current model, evaluates performance, and either accepts or rejects the modification.
void Train(const vector<Sample>& data, double targetErr, int maxIter)
{
double bestErr = Evaluate(data);
for (int i = 0; i < maxIter; i++)
{
Edit e;
e.before = root.DeepCopy();
Mutate(root);
double err = Evaluate(data);
if (err < bestErr)
{
bestErr = err;
}
else
{
root = e.before.DeepCopy();
}
if (bestErr < targetErr)
{
break;
}
}
}
This process does not rely on gradients or backpropagation. Instead, it directly evaluates full candidate structures under the objective function.
Both structural changes (adding or removing nodes) and parametric updates occur within the same optimization loop.
The system therefore performs joint search over:
- continuous parameter space
- discrete structural space
4. Mutation Operators and Structural Dynamics
The system evolves through three mutation mechanisms applied uniformly over the tree:
- local parameter perturbation
- addition of new child nodes
- removal of existing child nodes
void Mutate(Node& n)
{
Node* x = RandomNode(n);
int type = RandInt(0, 2);
if (type == 0)
{
x->bias += Rand(-0.5, 0.5);
x->wA += Rand(-0.5, 0.5);
x->wB += Rand(-0.5, 0.5);
x->wTask += Rand(-0.5, 0.5);
}
else if (type == 1)
{
auto child = std::make_unique<Node>();
child->bias = Rand(-1, 1);
child->wA = Rand(-1, 1);
child->wB = Rand(-1, 1);
child->wTask = Rand(-1, 1);
x->children.push_back(std::move(child));
}
else
{
if (!x->children.empty())
{
x->children.pop_back();
}
}
}
Through repeated application of these mutations, the system explores both functional variation and structural reconfiguration. Selection pressure is applied globally through the objective function.
5. Evaluation and Selection Dynamics
The system is trained using greedy acceptance-based optimization. Each mutation is evaluated against a dataset, and retained only if it improves performance.
double Evaluate(const vector<Sample>& data)
{
double err = 0.0;
for (const auto& s : data)
{
double p = root.Forward(s.a, s.b, s.task);
err += std::abs(p - s.target);
}
return err / data.size();
}
This defines a global objective function over the entire structure. Importantly, no gradient information is used; only full forward evaluation determines acceptance.
The resulting optimization process is a form of stochastic hill climbing over a hybrid discrete-continuous space.
6. Multi-Task Conditioning Within a Shared Structure
Task differentiation is achieved through a minimal conditioning mechanism:
double x = bias + wA * a + wB * b + wTask * task;
Rather than constructing separate networks for each task, the system embeds task identity into the same computation graph.
This forces all tasks to share structural resources, encouraging reuse and emergent specialization within different branches of the tree.
Importantly, task separation is not explicitly enforced through modular architecture; it emerges from shared optimization pressure across tasks.
7. Structural Memory and Persistent Computation
In this system, memory is not explicitly represented. Instead, it is encoded implicitly in the persistence of structural configurations that consistently reduce error.
int Count() const
{
int n = 1;
for (const auto& c : children)
{
n += c->Count();
}
return n;
}
Substructures that improve performance tend to persist under repeated mutation-selection cycles, while ineffective structures are removed.
This results in a form of structural memory, where useful computational motifs are retained in topology rather than stored in an external memory mechanism.
8. Distinction from Fixed-Architecture Learning Systems
This system differs from conventional neural networks in three primary ways:
1. Evolving topology
The computation graph is not fixed. It is constructed incrementally through mutation and selection.
2. Non-gradient optimization
Learning is performed via stochastic evaluation of full candidate structures rather than gradient descent over parameters.
3. Recursive selection-based inference
Inference is defined as recursive selection among competing subcomputations rather than uniform layerwise transformation.
9. A Unified View of Computation and Learning
The system can be interpreted as a structured hypothesis space embedded within a single evolving tree.
Each node represents a local hypothesis, each branch represents a refinement or alternative computation, and mutation generates new hypotheses for evaluation.
Learning corresponds to selecting structures that minimize global error under repeated evaluation.
This collapses the distinction between architecture and optimization into a single process: structural adaptation under performance pressure.
10. Conclusion
This model frames learning as joint optimization over functions and structures. Rather than treating architecture as fixed and only optimizing parameters, both are treated as mutable components of a unified system.
Within this framework:
- parameters define local transformations
- nodes define reusable computational units
- branches define competing subcomputations
- structure encodes persistent computational bias
- learning is stochastic search over structure and parameters
- computation is recursive selection among candidate evaluations
The resulting system demonstrates that non-gradient, structure-evolving computation can support nontrivial multi-task behavior within a single adaptive tree.
Appendix A: Full Source Code
#include <iostream>
#include <vector>
#include <memory>
#include <random>
#include <cmath>
#include <limits>
#include <iomanip>
using std::vector;
static std::mt19937 rng((std::random_device())());
double Rand(double a, double b)
{
std::uniform_real_distribution<double> d(a, b);
return d(rng);
}
int RandInt(int a, int b)
{
std::uniform_int_distribution<int> d(a, b);
return d(rng);
}
struct Sample
{
double a;
double b;
int task;
double target;
};
struct Node
{
double bias = 0.0;
double wA = 0.0;
double wB = 0.0;
double wTask = 0.0;
std::vector<std::unique_ptr<Node>> children;
double Forward(double a, double b, int task) const
{
double x = bias + wA * a + wB * b + wTask * task;
if (children.empty())
{
return x;
}
double bestScore = std::numeric_limits<double>::max();
double bestY = 0.0;
for (int i = 0; i < (int)children.size(); i++)
{
double y = children[i]->Forward(a, b, task);
double score = std::abs(y - x);
if (score < bestScore)
{
bestScore = score;
// Cache the winning output so the best child's subtree is not
// evaluated a second time on return.
bestY = y;
}
}
return bestY;
}
Node DeepCopy() const
{
Node copy;
copy.bias = bias;
copy.wA = wA;
copy.wB = wB;
copy.wTask = wTask;
copy.children.clear();
for (const auto& c : children)
{
copy.children.push_back(
std::make_unique<Node>(c->DeepCopy())
);
}
return copy;
}
int Count() const
{
int n = 1;
for (const auto& c : children)
{
n += c->Count();
}
return n;
}
void Print(int depth = 0) const
{
for (int i = 0; i < depth; i++) std::cout << " ";
std::cout
<< "Node bias=" << bias
<< " wA=" << wA
<< " wB=" << wB
<< " wTask=" << wTask
<< " children=" << children.size()
<< "\n";
for (const auto& c : children)
{
c->Print(depth + 1);
}
}
};
struct Edit
{
Node before;
};
struct Network
{
Node root;
double Evaluate(const vector<Sample>& data)
{
double err = 0.0;
for (const auto& s : data)
{
double p = root.Forward(s.a, s.b, s.task);
err += std::abs(p - s.target);
}
return err / data.size();
}
Node* GetNode(Node& n, int& index)
{
if (index == 0) return &n;
index--;
for (auto& c : n.children)
{
Node* r = GetNode(*c, index);
if (r) return r;
}
return nullptr;
}
Node* RandomNode(Node& n)
{
int idx = RandInt(0, n.Count() - 1);
return GetNode(n, idx);
}
void Mutate(Node& n)
{
Node* x = RandomNode(n);
int type = RandInt(0, 2);
if (type == 0)
{
x->bias += Rand(-0.5, 0.5);
x->wA += Rand(-0.5, 0.5);
x->wB += Rand(-0.5, 0.5);
x->wTask += Rand(-0.5, 0.5);
}
else if (type == 1)
{
auto child = std::make_unique<Node>();
child->bias = Rand(-1, 1);
child->wA = Rand(-1, 1);
child->wB = Rand(-1, 1);
child->wTask = Rand(-1, 1);
x->children.push_back(std::move(child));
}
else
{
if (!x->children.empty())
{
x->children.pop_back();
}
}
}
void Train(const vector<Sample>& data, double targetErr, int maxIter)
{
double bestErr = Evaluate(data);
std::cout << "Initial error: " << bestErr << "\n";
for (int i = 0; i < maxIter; i++)
{
Edit e;
e.before = root.DeepCopy();
Mutate(root);
double err = Evaluate(data);
if (err < bestErr)
{
bestErr = err;
}
else
{
root = e.before.DeepCopy();
}
if (i % 500 == 0)
{
std::cout
<< "Iter " << i
<< " Err=" << bestErr
<< " Nodes=" << root.Count()
<< "\n";
}
if (bestErr < targetErr)
{
std::cout << "Target reached\n";
break;
}
}
}
void Test(const vector<Sample>& data)
{
std::cout << "\n=== TEST ===\n";
for (const auto& s : data)
{
double p = root.Forward(s.a, s.b, s.task);
std::cout
<< "Input=(" << s.a << "," << s.b << ") Task=" << s.task
<< " Output=" << p
<< " expected=" << s.target
<< "\n";
}
}
void Print()
{
std::cout << "\n=== STRUCTURE ===\n";
root.Print();
}
};
int main()
{
vector<Sample> train =
{
{1,2,0,3},
{2,3,0,5},
{3,4,0,7},
{4,5,0,9},
{5,6,0,11},
{6,7,0,13},
{20,40,0,60},
{100,54,0,154},
{72,17,0,89},
{547,3,0,550},
{1,1,0,2},
{0,1,0,1},
{0,142,0,142},
{0,0,1,0},
{0,1,1,1},
{1,0,1,1},
{1,1,1,0}
};
vector<Sample> test =
{
{10,2,0,12},
{12,100,0,112},
{30,40,0,70},
{0,0,1,0},
{0,1,1,1},
{1,0,1,1},
{1,1,1,0}
};
Network net;
net.root.bias = Rand(-1, 1);
net.root.wA = Rand(-1, 1);
net.root.wB = Rand(-1, 1);
net.root.wTask = Rand(-1, 1);
net.Train(train, 0.01, 100000);
net.Print();
net.Test(test);
std::cin.get();
return 0;
}
