
PyTorch Fundamentals: Building and Training Models

The most popular deep learning framework — hands-on introduction

Eager Execution and Research Velocity

PyTorch defaults to eager execution: tensors flow through Python operations and you can print, slice, and debug like ordinary data. That matters in finance, where half the job is reshaping messy panels of features and verifying that no look-ahead sneaks into the tensor you feed the model.

Contrast that with the old “build a static graph, then run a session” workflow. Researchers still export models for deployment, but day-to-day work benefits from immediacy: change a layer, rerun a cell, inspect activations.

Operations link into a graph on the forward pass; autograd unwinds it on backward.
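To make the eager workflow concrete, here is a minimal sketch. The tensor names and sizes are illustrative assumptions, not taken from any real dataset. It shows that intermediate results are ordinary values you can inspect immediately, and that the forward pass has already recorded the node autograd will walk on backward.

```python
import torch

torch.manual_seed(0)

# Hypothetical panel: 5 dates x 3 features (sizes are illustrative).
features = torch.randn(5, 3)
weights = torch.randn(3, 1, requires_grad=True)

# Each line runs immediately, so intermediates are ordinary data.
scores = features @ weights
print(scores.shape)      # inspect right away
print(scores[:2])        # slice like any array

# The forward pass also recorded how `scores` was produced.
print(scores.grad_fn)    # backward node created by the matmul

loss = scores.pow(2).mean()
loss.backward()          # autograd walks the recorded graph
print(weights.grad.shape)
```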

Tensors, Devices, and Autograd

A tensor is a multidimensional array; requires_grad=True marks leaves you want derivatives for. Autograd records operations to build a backward graph so loss.backward() populates .grad on parameters.

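import torch

# Leaf tensors created with requires_grad=True have gradients tracked.
x = torch.randn(32, 10, requires_grad=True)
w = torch.randn(10, 1, requires_grad=True)
y = (x @ w).mean()     # scalar output, so backward() needs no arguments
y.backward()           # fills x.grad and w.grad
print(w.grad.shape)    # torch.Size([10, 1])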

Place tensors on cuda or mps when available; keep device placement consistent so you do not accidentally move small tensors back and forth each step.
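One way to keep placement consistent is to choose the device once and create everything on it. The sketch below assumes a recent PyTorch build; `pick_device` is a hypothetical helper name, not a PyTorch API.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then CPU (hypothetical helper)."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent on older builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Create parameters directly on the device; move data there once per batch.
w = torch.randn(10, 1, device=device, requires_grad=True)
x = torch.randn(32, 10).to(device)

y = (x @ w).mean()
y.backward()
print(device, w.grad.device)
```

Mixing devices in a single operation raises a runtime error, so a single `device` variable threaded through the script tends to surface mistakes early.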

nn.Module as Composition

Subclass nn.Module, define layers in __init__, and connect them in forward. Registration ensures parameters appear in model.parameters() for optimizers.

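import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, n):
        super().__init__()
        # Modules assigned as attributes are registered automatically.
        self.net = nn.Sequential(
            nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x):
        # x: (batch, n) -> output: (batch, 1)
        return self.net(x)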
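To see what registration buys you, the following sketch repeats the MLP so it runs on its own, lists the registered parameters, and hands them to an optimizer. The input width of 10 and the learning rate are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x):
        return self.net(x)


model = MLP(10)

# Every weight and bias of the registered layers shows up here.
for name, p in model.named_parameters():
    print(name, tuple(p.shape))

# 10*64 + 64 (first layer) + 64*1 + 1 (second layer) = 769
n_params = sum(p.numel() for p in model.parameters())
print(n_params)

# The optimizer receives exactly those registered parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```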

For sequence models, nn.LSTM and nn.TransformerEncoder layers are common; for tabular alpha, MLPs and gradient boosting still compete — pick by validation discipline, not by trend.
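As a point of comparison with the MLP, a sequence model built on nn.LSTM might look like the sketch below. `LSTMRegressor`, the hidden size, and the input shapes are illustrative assumptions; reading the last time step is one common choice, not the only one.

```python
import torch
import torch.nn as nn


class LSTMRegressor(nn.Module):
    """Hypothetical model: one LSTM layer plus a linear head."""

    def __init__(self, n_features, hidden=32):
        super().__init__()
        # batch_first=True means inputs are (batch, seq_len, n_features).
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])   # predict from the last time step


x = torch.randn(8, 20, 5)              # 8 windows, 20 steps, 5 features
pred = LSTMRegressor(5)(x)
print(pred.shape)
```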

The Four-Line Loop (Plus Hygiene)

Training is repetitive on purpose: forward, loss, zero grad, backward, step. The extras matter: model.train() and model.eval() switch dropout and batch norm between training and inference behavior, and torch.no_grad() wraps validation so autograd records no graph, which saves memory.

The same loop repeats for every batch until the validation curve stops improving.
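The loop and its hygiene steps can be sketched as follows. The data is synthetic, and the model, batch size, learning rate, and epoch count are arbitrary assumptions chosen only to make the example run quickly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (illustrative only).
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
X_train, y_train = X[:192], y[:192]
X_val, y_val = X[192:], y[192:]

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 1)
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

val_losses = []
for epoch in range(20):
    model.train()                          # dropout active
    for i in range(0, len(X_train), 32):
        xb, yb = X_train[i:i + 32], y_train[i:i + 32]
        pred = model(xb)                   # forward
        loss = loss_fn(pred, yb)           # loss
        optimizer.zero_grad()              # clear stale gradients
        loss.backward()                    # backward
        optimizer.step()                   # parameter update

    model.eval()                           # dropout disabled
    with torch.no_grad():                  # no graph is recorded
        val_losses.append(loss_fn(model(X_val), y_val).item())

print(val_losses[0], val_losses[-1])
```

In practice you would stop, or restore the best checkpoint, once `val_losses` stops improving rather than run a fixed number of epochs.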

Datasets, DataLoaders, and Leakage

Subclass Dataset to materialize features and labels; wrap with DataLoader for batching. For chronological data, split train and validation by time first, keep shuffle=False on validation, and often on training too if you slice contiguous windows. A random split across overlapping windows lets rows from the validation period leak into training.
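A minimal sketch of a windowed dataset with a chronological split follows. `WindowDataset`, the lookback of 10, the 80/20 split, and the gap between the two sets are all illustrative assumptions; the data is random.

```python
import torch
from torch.utils.data import Dataset, DataLoader, Subset


class WindowDataset(Dataset):
    """Hypothetical dataset: `lookback` rows of features -> next row's target."""

    def __init__(self, features, targets, lookback):
        self.features = features
        self.targets = targets
        self.lookback = lookback

    def __len__(self):
        return len(self.features) - self.lookback

    def __getitem__(self, i):
        window = self.features[i:i + self.lookback]  # rows i .. i+lookback-1
        label = self.targets[i + self.lookback]      # strictly after the window
        return window, label


T, F, lookback = 200, 4, 10
features, targets = torch.randn(T, F), torch.randn(T)
ds = WindowDataset(features, targets, lookback)

# Chronological split; skip `lookback` samples so no validation window
# shares rows with a training window or its label.
split = int(0.8 * len(ds))
train_ds = Subset(ds, range(0, split))
val_ds = Subset(ds, range(split + lookback, len(ds)))

train_loader = DataLoader(train_ds, batch_size=32, shuffle=False)
val_loader = DataLoader(val_ds, batch_size=32, shuffle=False)

xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```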

If you train on tick data from GaiaEx or similar feeds, align bars to a single clock, handle missing prints explicitly, and version your feature code alongside the model checkpoint.
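One possible way to version feature code alongside a checkpoint is to store an identifier for the feature definition in the same file as the weights. In this sketch `feature_spec` and its keys are hypothetical; a git commit hash of the feature code would serve the same purpose. An in-memory buffer stands in for a file path.

```python
import hashlib
import io
import json

import torch
import torch.nn as nn

# Hypothetical feature definition (keys and values are illustrative).
feature_spec = {"bar_seconds": 60, "lookback": 10, "missing": "flag_and_ffill"}
spec_bytes = json.dumps(feature_spec, sort_keys=True).encode()
feature_version = hashlib.sha256(spec_bytes).hexdigest()[:12]

model = nn.Linear(4, 1)
checkpoint = {
    "model_state": model.state_dict(),
    "feature_spec": feature_spec,
    "feature_version": feature_version,
}

buffer = io.BytesIO()          # stands in for a checkpoint file
torch.save(checkpoint, buffer)
buffer.seek(0)
loaded = torch.load(buffer)

# Refuse to serve if the features no longer match the trained model.
if loaded["feature_version"] != feature_version:
    raise RuntimeError("feature code changed since this checkpoint was saved")

restored = nn.Linear(4, 1)
restored.load_state_dict(loaded["model_state"])
print(loaded["feature_version"])
```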

TorchScript and Serving Boundaries

For low-latency inference, teams often trace or script the model with torch.jit.trace or torch.jit.script, then run it in a C++ runtime or a sidecar service. Pin the library versions you used to export, and keep the serving runtime compatible with them.
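The export step might look like the sketch below, which uses tracing on a small feed-forward model. The architecture and shapes are illustrative, and an in-memory buffer stands in for the file a serving process would load. Tracing records only the operations executed for the example input, so models with data-dependent control flow generally need scripting instead.

```python
import io

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
model.eval()                               # export in inference mode

example = torch.randn(1, 10)               # example input used for tracing
traced = torch.jit.trace(model, example)

buffer = io.BytesIO()                      # stands in for a .pt file
torch.jit.save(traced, buffer)
buffer.seek(0)
served = torch.jit.load(buffer)            # what the serving side would do

# Check the exported model against the original on a different batch size.
x = torch.randn(4, 10)
with torch.no_grad():
    same = torch.allclose(model(x), served(x))

# Record the version used for export so the serving side can pin it.
print(torch.__version__, same)
```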

PyTorch Lightning and similar frameworks reduce boilerplate for multi-GPU training and logging; they do not replace careful feature design. The edge in trading usually comes from data hygiene and regime awareness, not from a slightly fancier optimizer.
