Music generation has become a platform for creative expression, fostering artistic innovation, personalized experiences, and cultural integration, with implications for education and the development of the creative industries. Generating music that resonates emotionally, however, remains a challenge. To address this, we introduce the Sequence-to-Music Transformer, a new framework for music generation. The framework employs a simple encoder-decoder Transformer that models music by converting its fundamental notes into a sequence of discrete tokens, which the model learns to generate token by token. The encoder extracts melodic features of the music, and the decoder uses these features to generate the music sequence. Generation proceeds auto-regressively: each token is predicted conditioned on the previously generated tokens. The melodic features are injected into the decoder through cross-attention layers, and generation terminates when the special “end” token is produced. Experimental results show that the framework achieves state-of-the-art performance on a wide range of datasets.
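To make the architecture concrete, the following is a minimal sketch of an encoder-decoder Transformer with cross-attention conditioning and auto-regressive decoding that stops at an "end" token, written in PyTorch. All names, dimensions, and the special-token layout (PAD/BOS/EOS ids) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the described setup (assumptions: token ids, sizes, greedy decoding).
import torch
import torch.nn as nn

PAD_ID, BOS_ID, EOS_ID = 0, 1, 2   # assumed special tokens; EOS plays the role of "end"
VOCAB_SIZE = 512                   # assumed size of the discrete note-token vocabulary


class Seq2MusicTransformer(nn.Module):
    def __init__(self, vocab_size=VOCAB_SIZE, d_model=256, nhead=8,
                 num_encoder_layers=4, num_decoder_layers=4, max_len=2048):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions (assumption)
        # The encoder extracts melodic features; the decoder attends to them via
        # cross-attention, which nn.Transformer provides through its "memory" path.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def embed(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        return self.token_emb(ids) + self.pos_emb(pos)[None, :, :]

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target position only sees previously generated tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(tgt_ids.device)
        hidden = self.transformer(
            self.embed(src_ids), self.embed(tgt_ids), tgt_mask=tgt_mask)
        return self.lm_head(hidden)  # (batch, tgt_len, vocab)

    @torch.no_grad()
    def generate(self, src_ids, max_steps=256):
        # Auto-regressive decoding: feed each predicted token back in and stop at EOS.
        out = torch.full((src_ids.size(0), 1), BOS_ID,
                         device=src_ids.device, dtype=torch.long)
        for _ in range(max_steps):
            next_token = self.forward(src_ids, out)[:, -1].argmax(-1, keepdim=True)
            out = torch.cat([out, next_token], dim=1)
            if (next_token == EOS_ID).all():
                break
        return out
```

Here `src_ids` stands in for whatever tokenized melodic conditioning the encoder receives; greedy argmax decoding is used only to keep the sketch short, and a sampling strategy could be substituted without changing the overall structure.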