May 28, 2026 | by orientco


The pikestead trading system runs on a neural network that departs from standard feed-forward designs. Its foundation combines convolutional neural networks (CNNs) with gated recurrent units (GRUs). This hybrid structure processes raw market data through three initial convolutional layers. The layers apply filters of varying sizes (2×2, 3×3, and 5×5) to capture short-term price patterns and volatility clusters simultaneously. The output feeds into two GRU layers with 128 and 64 hidden units, which retain sequential dependencies across time windows of 50 to 200 ticks. Compared with a typical LSTM network, the GRU design reduces computational overhead by 30% while maintaining comparable memory retention for trend reversals.
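One rough way to see where a GRU saves work over an LSTM is to count trainable weights. The sketch below uses the textbook formulation (three gate blocks for a GRU, four for an LSTM, one bias vector per block); some frameworks add a second bias vector, so their counts differ slightly. The input width of 64 is a hypothetical value, since the article does not give the size of the convolutional output. Parameter count is only a proxy for runtime cost, and under these assumptions it comes out at 25% fewer weights, so the 30% figure above should be read as the article's own measurement rather than something this arithmetic reproduces.

```python
def recurrent_params(input_dim: int, hidden: int, gate_blocks: int) -> int:
    """Weights in one recurrent layer: each gate block has an input matrix,
    a recurrent matrix, and a single bias vector (textbook formulation)."""
    return gate_blocks * (hidden * input_dim + hidden * hidden + hidden)


def gru_params(input_dim: int, hidden: int) -> int:
    # Update gate, reset gate, candidate state.
    return recurrent_params(input_dim, hidden, 3)


def lstm_params(input_dim: int, hidden: int) -> int:
    # Input gate, forget gate, output gate, candidate cell state.
    return recurrent_params(input_dim, hidden, 4)


INPUT_DIM = 64  # hypothetical: width of the features leaving the conv stack

# (input width, hidden units) for the two recurrent layers named in the text.
layers = [(INPUT_DIM, 128), (128, 64)]

gru_total = sum(gru_params(d, h) for d, h in layers)
lstm_total = sum(lstm_params(d, h) for d, h in layers)
saving = 1 - gru_total / lstm_total

print(f"GRU weights:  {gru_total}")
print(f"LSTM weights: {lstm_total}")
print(f"Reduction:    {saving:.0%}")
```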
The network initializes weights using He uniform scaling, chosen for its compatibility with ReLU activation functions applied after each convolutional block. Feature maps are normalized via batch normalization layers placed between convolutions and pooling operations. Max-pooling with stride 2 downsamples the data, reducing dimensionality by half at each stage. This setup extracts 24 distinct features from raw price, volume, and order book imbalance data. A dropout rate of 0.35 after each GRU layer prevents overfitting during training on historical tick data spanning 18 months across forex and crypto pairs.
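To make the initialization and downsampling steps concrete, here is a minimal sketch in plain Python. He uniform draws each weight from a uniform range of plus or minus sqrt(6 / fan_in). The 3×3 filter over 6 input channels, the 8 output filters, the pooling window of 2, and the assumption that convolutions preserve sequence length are all illustrative choices, not details confirmed by the article.

```python
import math
import random
from typing import List


def he_uniform(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Draw a fan_in x fan_out weight matrix from U(-limit, limit),
    where limit = sqrt(6 / fan_in) (He uniform scaling)."""
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def pooled_length(length: int, pool: int = 2, stride: int = 2) -> int:
    """Output length of max-pooling without padding."""
    return (length - pool) // stride + 1


# Hypothetical conv filter: 3x3 kernel over 6 input channels, 8 output filters.
fan_in = 3 * 3 * 6
limit = math.sqrt(6.0 / fan_in)
weights = he_uniform(fan_in, 8, random.Random(0))

# Sequence length after each of three pooling stages, starting from 50 steps
# and assuming the convolutions themselves keep the length unchanged.
lengths = [50]
for _ in range(3):
    lengths.append(pooled_length(lengths[-1]))

print(f"He uniform limit: {limit:.4f}")
print(f"Lengths per stage: {lengths}")
```

With an odd length such as 25, pooling with stride 2 drops the last step, so "half at each stage" is approximate rather than exact.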
After feature extraction, the system deploys a multi-head attention mechanism with 8 heads. This component assigns weights to temporal segments where volatility spikes or liquidity gaps occur. The attention layer outputs a context vector of length 256, which is then passed through three dense layers with decreasing neuron counts: 128, 64, and 2. The final layer uses a softmax activation to produce two probabilities, one for a long position and one for a short position. A custom loss function combines binary cross-entropy with a penalty term for excessive drawdown, calculated as 0.15 times the maximum observed loss during training episodes. This forces the network to prioritize risk-adjusted returns over raw accuracy.
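The loss described above can be sketched as binary cross-entropy plus a weighted drawdown term. This is an illustration under stated assumptions, not the system's actual code: the function names are hypothetical, `episode_losses` is assumed to hold loss magnitudes as positive numbers, and the first softmax output is assumed to be the "long" probability.

```python
import math
from typing import List, Sequence

DRAWDOWN_WEIGHT = 0.15  # penalty coefficient quoted in the text


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(p_long: float, is_long: int, eps: float = 1e-12) -> float:
    """Cross-entropy for one decision; is_long is 1 for long, 0 for short."""
    p = min(max(p_long, eps), 1.0 - eps)  # keep log() away from 0
    return -(is_long * math.log(p) + (1 - is_long) * math.log(1.0 - p))


def risk_adjusted_loss(logit_pairs: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       episode_losses: Sequence[float]) -> float:
    """Mean cross-entropy plus 0.15 times the largest observed loss.
    Assumption: episode_losses are positive magnitudes."""
    bce = sum(binary_cross_entropy(softmax(pair)[0], label)
              for pair, label in zip(logit_pairs, labels)) / len(labels)
    penalty = DRAWDOWN_WEIGHT * max(episode_losses, default=0.0)
    return bce + penalty


# One undecided prediction (equal logits), true label long, worst loss 0.2.
example = risk_adjusted_loss([[0.0, 0.0]], [1], [0.2])
print(f"{example:.4f}")
```

In this form the penalty is a constant with respect to the network's outputs for a given batch, so a trainable version would need the drawdown term to depend on the predictions; the article does not say how that is done.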
The network incorporates a feedback loop that adjusts learning rates dynamically based on recent performance. Every 500 trading iterations, the system evaluates the Sharpe ratio of its last 50 decisions. If the ratio drops below 0.8, the learning rate increases by 0.001 to escape local minima. Conversely, ratios above 2.0 trigger a 0.0005 decrease to stabilize weights. The entire model retrains weekly on a rolling window of the latest 30 days of data, discarding older samples to adapt to regime changes. This mechanism prevents the network from overfitting to stale patterns while maintaining coherence in its internal representations.
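The adjustment rule reads naturally as a small function. The sketch below assumes a plain, non-annualized Sharpe ratio (mean return divided by sample standard deviation) over the last 50 decisions, and adds a hypothetical floor `min_lr` so repeated decreases cannot push the rate to zero or below; neither detail is specified in the article.

```python
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float]) -> float:
    """Mean return over sample standard deviation (not annualized).
    Returns 0.0 when the ratio is undefined."""
    if len(returns) < 2:
        return 0.0
    spread = statistics.stdev(returns)
    if spread == 0:
        return 0.0
    return statistics.mean(returns) / spread


def adjust_learning_rate(lr: float, sharpe: float, min_lr: float = 1e-6) -> float:
    """Apply the thresholds from the text: raise the rate by 0.001 when the
    Sharpe ratio is below 0.8, lower it by 0.0005 when above 2.0.
    min_lr is an assumed safety floor, not part of the original description."""
    if sharpe < 0.8:
        return lr + 0.001
    if sharpe > 2.0:
        return max(lr - 0.0005, min_lr)
    return lr


def on_iteration(step: int, lr: float, decision_returns: Sequence[float]) -> float:
    """Re-evaluate every 500 iterations using the last 50 decisions."""
    if step % 500 != 0:
        return lr
    return adjust_learning_rate(lr, sharpe_ratio(decision_returns[-50:]))
```

For example, `on_iteration(500, 0.001, returns)` leaves the rate at 0.001 whenever the recent Sharpe ratio sits between 0.8 and 2.0.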
Input data undergoes a three-stage normalization before reaching the network. First, raw prices are converted to log returns to stabilize variance. Second, volume figures are scaled using a Z-score transformation with a 100-period rolling mean and standard deviation. Third, order book imbalance, calculated as the ratio of bid to ask depth, is clipped to a range of -1 to 1. These normalized vectors are then stacked into a 3D tensor of shape (batch_size, 50, 6), where 50 represents the lookback window and 6 corresponds to the input channels: open, high, low, close, volume, and imbalance. The dataset is split with 80% for training and 20% for validation, using stratified sampling based on volatility quintiles to ensure balanced representation across market conditions.
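The three normalization stages can be sketched in a few lines of standard-library Python. A few choices here are assumptions rather than details from the article: the rolling window includes the current observation, the population standard deviation is used, a flat window maps to 0.0, and the imbalance value is taken as given and simply clipped.

```python
import math
import statistics
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """Stage 1: log of each price relative to the previous one."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rolling_zscore(values: Sequence[float], window: int = 100) -> List[float]:
    """Stage 2: z-score of each value against the trailing window that ends
    at (and includes) that value. Output starts once a full window exists."""
    scores = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        mean = statistics.fmean(chunk)
        spread = statistics.pstdev(chunk)
        scores.append(0.0 if spread == 0 else (values[i] - mean) / spread)
    return scores


def clip(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Stage 3: bound the imbalance signal to [low, high]."""
    return max(low, min(high, value))


prices = [100.0, 110.0, 99.0]
volumes = [1.0, 2.0, 3.0]

print(log_returns(prices))
print(rolling_zscore(volumes, window=3))
print([clip(x) for x in (1.7, -3.0, 0.25)])
```

Because log returns drop the first observation and the rolling z-score drops the first 99, the series need to be aligned on their common tail before being stacked into the (batch_size, 50, 6) tensor.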
It uses a hybrid CNN-GRU structure with multi-head attention, not simple moving averages or linear regression. This captures non-linear dependencies and temporal patterns more effectively.
Dropout layers at 0.35 rate, batch normalization, and weekly retraining on rolling windows prevent overfitting. The custom loss function also penalizes excessive drawdown.
It processes open, high, low, close prices, volume, and order book imbalance. Data is normalized via log returns, Z-score scaling, and clipping before feeding into the network.
Yes. Adaptive learning rate adjustments based on Sharpe ratio and weekly retraining on recent data allow the model to shift its weight distribution as volatility regimes change.
How many layers are in the decision engine?
The decision engine has a multi-head attention layer with 8 heads, followed by three dense layers with 128, 64, and 2 neurons, ending with a softmax output.
Marcus T.
I’ve tested several algorithmic systems, but the neural layout here is different. The attention mechanism catches reversals I missed with other bots. Profitable over three months.
Elena V.
The GRU layers handle crypto volatility well. I saw fewer false signals compared to LSTM-based systems. The adaptive learning rate adjustment really helps during low liquidity.
Raj P.
I was skeptical about the hybrid CNN-GRU approach, but the feature extraction from order book imbalance is solid. It reduced my drawdown by 22% in backtests.