Researchers Introduce Spanish Cybersecurity Language Model VectraYX-Nano
Researchers have unveiled VectraYX-Nano, a 41.95M-parameter Spanish language model designed for cybersecurity applications in Latin America. The model, described in a preprint posted to arXiv, combines curriculum learning with native tool invocation via the Model Context Protocol (MCP) to improve its performance on regional cybersecurity tasks.
VectraYX-Nano was trained on a custom 170M-token corpus called VectraYX-Sec-ES, built with a pipeline of eight virtual machines at a cost of approximately US$25. The dataset combines conversational Spanish sources such as OpenSubtitles-ES with cybersecurity-focused content from the National Vulnerability Database (NVD) and Wikipedia-ES. The model’s Latin American focus addresses a gap in cybersecurity tools tailored to regional linguistic and technical needs.
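To make the idea of curriculum learning over a mixed corpus concrete, the following is a minimal sketch, not the authors' method. It assumes a schedule that starts with mostly conversational text and shifts linearly toward security text. The article does not describe the actual curriculum, so the source labels, the 0.9 and 0.3 mixing shares, and the function names are all hypothetical.

```python
import random


def curriculum_weights(step, total_steps, start=0.9, end=0.3):
    """Return sampling weights for the two source groups at a training step.

    The conversational share moves linearly from `start` to `end`.
    Both values are illustrative assumptions, not figures from the study.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    conversational = start + (end - start) * frac
    return {"conversational": conversational, "security": 1.0 - conversational}


def sample_source(step, total_steps, rng):
    """Pick which source group the next training document is drawn from."""
    weights = curriculum_weights(step, total_steps)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    for step in (0, 500, 1000):
        print(step, curriculum_weights(step, 1000), sample_source(step, 1000, rng))
```

A linear schedule is only one option. Staged curricula that switch sources at fixed steps are equally plausible under the same description.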
Key innovations include a decoder-only architecture optimized for Spanish and native tool invocation, which lets the model interact directly with cybersecurity systems. According to the study, the model has limited direct relevance to U.S. markets, but it could inform tooling for Spanish-speaking cybersecurity professionals in the United States.
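For readers unfamiliar with MCP, tool invocation is carried over JSON-RPC 2.0, and a tool is called with the `tools/call` method, passing the tool's name and arguments. The sketch below builds and validates such a request. The tool name `lookup_cve` and its `cve_id` argument are hypothetical, and the article does not describe how VectraYX-Nano itself emits or parses these messages.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def build_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def parse_tool_call(text):
    """Return (tool_name, arguments) if `text` is a well-formed tool call, else None."""
    try:
        message = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(message, dict):
        return None
    if message.get("jsonrpc") != "2.0" or message.get("method") != "tools/call":
        return None
    params = message.get("params")
    if not isinstance(params, dict) or not isinstance(params.get("name"), str):
        return None
    return params["name"], params.get("arguments", {})


if __name__ == "__main__":
    # `lookup_cve` is a hypothetical tool used only for illustration.
    request = build_tool_call("lookup_cve", {"cve_id": "CVE-2021-44228"})
    print(json.dumps(request, indent=2))
    print(parse_tool_call(json.dumps(request)))
```

Validating the message before dispatch matters for a small model, since malformed output can then be rejected instead of being forwarded to a security tool.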
The study emphasizes cost-effective model development, reporting competitive performance from modest computational resources relative to much larger general-purpose models.