Mitigating Feature Overfitting in Barlow Twins via Mixed-Sample Regularization for Stable Long-Horizon Representation Learning

Authors

  • Arwa Saad, Nahda University
    Competing Interests

    The author declares no competing interests.

  • Prasun Chakrabarti, ITM University
    Competing Interests

    The author declares no competing interests.

  • Mona Ali Abdelrahman, American University in the Emirates
    Competing Interests

    The author declares no competing interests.

  • Vinayakumar Ravi, Prince Mohammad bin Fahd University
    Competing Interests

    The author declares no competing interests.

DOI:

https://doi.org/10.66279/7scztv14

Keywords:

Self-supervised learning, Barlow Twins, Redundancy Reduction, Mixed-Sample Regularization, Feature Overfitting

Abstract

In self-supervised learning, feature overfitting during extended training remains a significant problem, especially in redundancy-reduction frameworks such as Barlow Twins. Barlow Twins performs well initially, but after prolonged training (e.g., beyond 600 epochs) its representation quality deteriorates, primarily because of limited data diversity and overfitting to feature correlations. To overcome this limitation, an improved Mixed Barlow Twins framework is presented that incorporates mixed-sample regularization via linear interpolation in the input space. By enforcing consistency between mixed inputs and their corresponding embeddings, the method encourages simpler, less redundant feature correlation matrices and mitigates redundancy-induced overfitting. Extensive experiments on CIFAR-10 with a ResNet-50 backbone over 1000 training epochs demonstrate stable optimization without performance degradation. In long-horizon scenarios, the proposed approach outperforms the standard Barlow Twins baseline, achieving a k-NN classification accuracy of 92.1%. The technique also remains computationally efficient, requiring only 7.2 GB of GPU memory and about 15 hours of training time. These findings indicate that mixed-sample regularization is a simple yet effective method for improving representation robustness and training stability in self-supervised learning.
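
The mixed-sample idea described in the abstract can be made concrete. Below is a minimal PyTorch sketch of how a mixup-style consistency term could be combined with the standard Barlow Twins redundancy-reduction loss (Zbontar et al., 2021); the Beta(α, α) mixing coefficient, the `gamma` weight, and the MSE form of the consistency term are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def barlow_twins_loss(z1, z2, lambd=5e-3):
    """Standard Barlow Twins loss: push the cross-correlation matrix
    of the two embedding batches toward the identity."""
    n, _ = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)   # normalize each feature dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                            # D x D cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # redundancy term
    return on_diag + lambd * off_diag

def mixed_bt_step(model, x1, x2, alpha=1.0, gamma=1.0):
    """One training step with a hypothetical mixed-sample consistency term.
    x1, x2 are two augmented views of the same image batch."""
    lam = Beta(alpha, alpha).sample().item()       # mixing coefficient
    idx = torch.randperm(x1.size(0))               # pair each image with a random partner
    x_mix = lam * x1 + (1 - lam) * x2[idx]         # linear interpolation in input space

    z1, z2, z_mix = model(x1), model(x2), model(x_mix)

    # Assumed consistency term: the embedding of the mixed input should match
    # the same linear interpolation of the unmixed embeddings.
    target = (lam * z1 + (1 - lam) * z2[idx]).detach()
    consistency = F.mse_loss(z_mix, target)

    return barlow_twins_loss(z1, z2) + gamma * consistency
```

Because the extra forward pass on the mixed batch reuses the same backbone, the added memory and runtime cost stays small, which is consistent with the efficiency figures reported in the abstract.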

Author Biographies

  • Prasun Chakrabarti, ITM University

    ITM SLS Baroda University, 391510, Vadodara, India

  • Mona Ali Abdelrahman, American University in the Emirates

    Department Chair, Mass Communications College, American University in the Emirates

  • Vinayakumar Ravi, Prince Mohammad bin Fahd University

    Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia

Published

25-04-2026

Data Availability Statement

The CIFAR-10 dataset used in this study is publicly available. 

How to Cite

Mitigating Feature Overfitting in Barlow Twins via Mixed-Sample Regularization for Stable Long-Horizon Representation Learning. (2026). Computational Discovery and Intelligent Systems (CDIS), 3(1), 71-90. https://doi.org/10.66279/7scztv14
