Монгол-Aнгли орчуулга хийдэг нейрон сүлжээнд суурилсан машины орчуулгын гурван шатлалт загвар боловсруулах

Triple Model for Mongolian-English Translation Based on Neural Machine Translation

Authors

  • Bat-Erdene B., ХИС, МХМС, Department of Computer Science

DOI:

https://doi.org/10.22353/mjflc.v26i565.1812

Keywords:

Mongolian-English translation, statistical machine translation, neural machine translation, staged model, sentence scope

Abstract

Neural machine translation (NMT) is now widely used. It has the advantage of translating terminology and, to a certain extent, data it was not trained on, but it often distorts sentence structure. This study addresses several issues: controlling NMT output, translating unrecognized data with high probability, producing correct sentence structure, recognizing where sentences begin and end, and establishing an independent, domestically developed machine translator. We improved the neural network model by adapting it to handle unknown words through subword units and by defining sentence boundaries and scope. The design builds on the usual PMT and SMT templates, which compare words in a system that takes word and sentence structure into account. However, the model we developed is based on the latest NMT architecture, which can capture more complex relationships. In this sense, the work can be seen as an attempt to combine statistical machine translation with neural machine translation. We developed and tested in practice a step-by-step approach for integrating complex deep neural network models, which use longer contexts, into a system that considers only short contexts at the level of word and sentence structure.
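As an illustration of how unknown words can be handled through subword units, the sketch below shows byte-pair encoding (BPE), a widely used subword segmentation technique. The abstract does not say which segmentation method the paper uses, so this is an assumption chosen for illustration and not the author's implementation. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word counts are all hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dictionary."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word, seen or unseen, into subword units by replaying the learned merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy frequencies, invented for this example only.
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    rules = learn_bpe(counts, 10)
    # "lowest" never appears in the toy data, yet it still decomposes into known units.
    print(segment("lowest", rules))
```

Because an unseen word always decomposes into symbols the model has already seen (in the worst case, single characters), a translation model trained on such units does not need a separate unknown-word token.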

Published

2023-02-09

How to Cite

B., B.-E. (2023). Монгол-Aнгли орчуулга хийдэг нейрон сүлжээнд суурилсан машины орчуулгын гурван шатлалт загвар боловсруулах: Triple Model for Mongolian-English Translation Based on Neural Machine Translation. Mongolian Journal of Foreign Languages and Cultures, 26(565), 74–83. https://doi.org/10.22353/mjflc.v26i565.1812