A Developmental Branching Model of Computation and Learning
Preface: From Fractal Intuition to Structural Adaptation
This work explores a computational system in which learning is not restricted to parameter optimization within a fixed architecture, but is instead defined as the joint evolution of parameters and topology under a shared evaluation signal.
The initial intuition behind this system is that adaptive intelligence does not require a fixed computational graph. Instead, useful computation can emerge from a structure that is itself subject to mutation, selection, and incremental refinement.
Rather than assuming a predefined neural architecture, the system begins with a minimal root node and evolves both its structure and parameters through stochastic perturbation and performance-based selection.
The central claim of this model is not biological fidelity, but computational sufficiency: that structured adaptation alone, without gradients or backpropagation, can produce nontrivial multi-task behavior within a single evolving system.
1. A System Defined by Structural Evolution
Traditional machine learning systems define a fixed computation graph prior to training. Learning is performed by adjusting parameters within this static structure.
In contrast, the system presented here treats structure itself as a mutable object. Each node contains both parameters and the capacity to generate additional computational structure.
The system begins with a single root node and evolves through repeated mutation and selection.
A node is defined as follows:
struct Node
{
double bias;
double wA;
double wB;
double wTask;
std::vector<std::unique_ptr<Node>> children;
};
Each node acts as a minimal computational unit capable of producing an output and spawning further computational units. Over time, this results in a recursively expanding tree whose topology is not predefined but discovered through search.
2. Computation as Recursive Selection
Computation in this system is defined as a recursive traversal over a branching structure in which child nodes compete for selection based on alignment with a locally induced signal.
double Forward(double a, double b, int task) const
{
double x = bias + wA * a + wB * b + wTask * task;
if (children.empty())
{
return x;
}
double bestScore = std::numeric_limits<double>::max();
double bestY = 0.0;
for (int i = 0; i < (int)children.size(); i++)
{
double y = children[i]->Forward(a, b, task);
double score = std::abs(y - x);
if (score < bestScore)
{
bestScore = score;
// Cache the winning output so the best child's subtree is not
// evaluated a second time on return.
bestY = y;
}
}
return bestY;
}
Each node computes a local representation \( x \) and recursively evaluates its children. The child whose output lies closest to this local value wins the competition, and its output becomes the node's result.
This defines computation not as a single forward pass through a fixed graph, but as a recursive selection process over competing subcomputations.
The resulting system can be interpreted as a structured search process embedded within inference itself.
3. Learning as Structural and Parametric Search
Learning is implemented as a stochastic hill-climbing process over both parameters and structure. At each iteration, the system mutates the current model, evaluates performance, and either accepts or rejects the modification.
void Train(const vector<Sample>& data, double targetErr, int maxIter)
{
double bestErr = Evaluate(data);
for (int i = 0; i < maxIter; i++)
{
Edit e;
e.before = root.DeepCopy();
Mutate(root);
double err = Evaluate(data);
if (err < bestErr)
{
bestErr = err;
}
else
{
root = e.before.DeepCopy();
}
if (bestErr < targetErr)
{
break;
}
}
}
This process does not rely on gradients or backpropagation. Instead, it directly evaluates full candidate structures under the objective function.
Both structural changes (adding or removing nodes) and parametric updates occur within the same optimization loop.
The system therefore performs joint search over:
- continuous parameter space
- discrete structural space
4. Mutation Operators and Structural Dynamics
The system evolves through three mutation mechanisms applied uniformly over the tree:
- local parameter perturbation
- addition of new child nodes
- removal of existing child nodes
void Mutate(Node& n)
{
Node* x = RandomNode(n);
int type = RandInt(0, 2);
if (type == 0)
{
x->bias += Rand(-0.5, 0.5);
x->wA += Rand(-0.5, 0.5);
x->wB += Rand(-0.5, 0.5);
x->wTask += Rand(-0.5, 0.5);
}
else if (type == 1)
{
auto child = std::make_unique<Node>();
child->bias = Rand(-1, 1);
child->wA = Rand(-1, 1);
child->wB = Rand(-1, 1);
child->wTask = Rand(-1, 1);
x->children.push_back(std::move(child));
}
else
{
if (!x->children.empty())
{
x->children.pop_back();
}
}
}
Through repeated application of these mutations, the system explores both functional variation and structural reconfiguration. Selection pressure is applied globally through the objective function.
5. Evaluation and Selection Dynamics
The system is trained using greedy acceptance-based optimization. Each mutation is evaluated against a dataset, and retained only if it improves performance.
double Evaluate(const vector<Sample>& data)
{
double err = 0.0;
for (const auto& s : data)
{
double p = root.Forward(s.a, s.b, s.task);
err += std::abs(p - s.target);
}
return err / data.size();
}
This defines a global objective function over the entire structure. Importantly, no gradient information is used; only full forward evaluation determines acceptance.
The resulting optimization process is a form of stochastic hill climbing over a hybrid discrete-continuous space.
6. Multi-Task Conditioning Within a Shared Structure
Task differentiation is achieved through a minimal conditioning mechanism:
double x = bias + wA * a + wB * b + wTask * task;
Rather than constructing separate networks for each task, the system embeds task identity into the same computation graph.
This forces all tasks to share structural resources, encouraging reuse and emergent specialization within different branches of the tree.
Importantly, task separation is not explicitly enforced through modular architecture; it emerges from shared optimization pressure across tasks.
7. Structural Memory and Persistent Computation
In this system, memory is not explicitly represented. Instead, it is encoded implicitly in the persistence of structural configurations that consistently reduce error.
int Count() const
{
int n = 1;
for (const auto& c : children)
{
n += c->Count();
}
return n;
}
Substructures that improve performance tend to persist under repeated mutation-selection cycles, while ineffective structures are removed.
This results in a form of structural memory, where useful computational motifs are retained in topology rather than stored in an external memory mechanism.
8. Distinction from Fixed-Architecture Learning Systems
This system differs from conventional neural networks in three primary ways:
1. Evolving topology
The computation graph is not fixed. It is constructed incrementally through mutation and selection.
2. Non-gradient optimization
Learning is performed via stochastic evaluation of full candidate structures rather than gradient descent over parameters.
3. Recursive selection-based inference
Inference is defined as recursive selection among competing subcomputations rather than uniform layerwise transformation.
9. A Unified View of Computation and Learning
The system can be interpreted as a structured hypothesis space embedded within a single evolving tree.
Each node represents a local hypothesis, each branch represents a refinement or alternative computation, and mutation generates new hypotheses for evaluation.
Learning corresponds to selecting structures that minimize global error under repeated evaluation.
This collapses the distinction between architecture and optimization into a single process: structural adaptation under performance pressure.
10. Conclusion
This model frames learning as joint optimization over functions and structures. Rather than treating architecture as fixed and only optimizing parameters, both are treated as mutable components of a unified system.
Within this framework:
- parameters define local transformations
- nodes define reusable computational units
- branches define competing subcomputations
- structure encodes persistent computational bias
- learning is stochastic search over structure and parameters
- computation is recursive selection among candidate evaluations
The resulting system demonstrates that non-gradient, structure-evolving computation can support nontrivial multi-task behavior within a single adaptive tree.
Appendix A: Full Source Code
#include <iostream>
#include <vector>
#include <memory>
#include <random>
#include <cmath>
#include <limits>
#include <iomanip>
using std::vector;
static std::mt19937 rng((std::random_device())());
double Rand(double a, double b)
{
std::uniform_real_distribution<double> d(a, b);
return d(rng);
}
int RandInt(int a, int b)
{
std::uniform_int_distribution<int> d(a, b);
return d(rng);
}
struct Sample
{
double a;
double b;
int task;
double target;
};
struct Node
{
double bias = 0.0;
double wA = 0.0;
double wB = 0.0;
double wTask = 0.0;
std::vector<std::unique_ptr<Node>> children;
double Forward(double a, double b, int task) const
{
double x = bias + wA * a + wB * b + wTask * task;
if (children.empty())
{
return x;
}
double bestScore = std::numeric_limits<double>::max();
double bestY = 0.0;
for (int i = 0; i < (int)children.size(); i++)
{
double y = children[i]->Forward(a, b, task);
double score = std::abs(y - x);
if (score < bestScore)
{
bestScore = score;
// Cache the winning output so the best child's subtree is not
// evaluated a second time on return.
bestY = y;
}
}
return bestY;
}
Node DeepCopy() const
{
Node copy;
copy.bias = bias;
copy.wA = wA;
copy.wB = wB;
copy.wTask = wTask;
copy.children.clear();
for (const auto& c : children)
{
copy.children.push_back(
std::make_unique<Node>(c->DeepCopy())
);
}
return copy;
}
int Count() const
{
int n = 1;
for (const auto& c : children)
{
n += c->Count();
}
return n;
}
void Print(int depth = 0) const
{
for (int i = 0; i < depth; i++) std::cout << " ";
std::cout
<< "Node bias=" << bias
<< " wA=" << wA
<< " wB=" << wB
<< " wTask=" << wTask
<< " children=" << children.size()
<< "\n";
for (const auto& c : children)
{
c->Print(depth + 1);
}
}
};
struct Edit
{
Node before;
};
struct Network
{
Node root;
double Evaluate(const vector<Sample>& data)
{
double err = 0.0;
for (const auto& s : data)
{
double p = root.Forward(s.a, s.b, s.task);
err += std::abs(p - s.target);
}
return err / data.size();
}
Node* GetNode(Node& n, int& index)
{
if (index == 0) return &n;
index--;
for (auto& c : n.children)
{
Node* r = GetNode(*c, index);
if (r) return r;
}
return nullptr;
}
Node* RandomNode(Node& n)
{
int idx = RandInt(0, n.Count() - 1);
return GetNode(n, idx);
}
void Mutate(Node& n)
{
Node* x = RandomNode(n);
int type = RandInt(0, 2);
if (type == 0)
{
x->bias += Rand(-0.5, 0.5);
x->wA += Rand(-0.5, 0.5);
x->wB += Rand(-0.5, 0.5);
x->wTask += Rand(-0.5, 0.5);
}
else if (type == 1)
{
auto child = std::make_unique<Node>();
child->bias = Rand(-1, 1);
child->wA = Rand(-1, 1);
child->wB = Rand(-1, 1);
child->wTask = Rand(-1, 1);
x->children.push_back(std::move(child));
}
else
{
if (!x->children.empty())
{
x->children.pop_back();
}
}
}
void Train(const vector<Sample>& data, double targetErr, int maxIter)
{
double bestErr = Evaluate(data);
std::cout << "Initial error: " << bestErr << "\n";
for (int i = 0; i < maxIter; i++)
{
Edit e;
e.before = root.DeepCopy();
Mutate(root);
double err = Evaluate(data);
if (err < bestErr)
{
bestErr = err;
}
else
{
root = e.before.DeepCopy();
}
if (i % 500 == 0)
{
std::cout
<< "Iter " << i
<< " Err=" << bestErr
<< " Nodes=" << root.Count()
<< "\n";
}
if (bestErr < targetErr)
{
std::cout << "Target reached\n";
break;
}
}
}
void Test(const vector<Sample>& data)
{
std::cout << "\n=== TEST ===\n";
for (const auto& s : data)
{
double p = root.Forward(s.a, s.b, s.task);
std::cout
<< "Input=(" << s.a << "," << s.b << ") Task=" << s.task
<< " Output=" << p
<< " expected=" << s.target
<< "\n";
}
}
void Print()
{
std::cout << "\n=== STRUCTURE ===\n";
root.Print();
}
};
int main()
{
vector<Sample> train =
{
{1,2,0,3},
{2,3,0,5},
{3,4,0,7},
{4,5,0,9},
{5,6,0,11},
{6,7,0,13},
{20,40,0,60},
{100,54,0,154},
{72,17,0,89},
{547,3,0,550},
{1,1,0,2},
{0,1,0,1},
{0,142,0,142},
{0,0,1,0},
{0,1,1,1},
{1,0,1,1},
{1,1,1,0}
};
vector<Sample> test =
{
{10,2,0,12},
{12,100,0,112},
{30,40,0,70},
{0,0,1,0},
{0,1,1,1},
{1,0,1,1},
{1,1,1,0}
};
Network net;
net.root.bias = Rand(-1, 1);
net.root.wA = Rand(-1, 1);
net.root.wB = Rand(-1, 1);
net.root.wTask = Rand(-1, 1);
net.Train(train, 0.01, 100000);
net.Print();
net.Test(test);
std::cin.get();
return 0;
}
