
Extracting Formulaic Sequences Containing Useful Expressions for Language Learning from Closed Caption TV Corpus
PROCEEDING
Hajime Mochizuki, Kohji Shibano, Tokyo University of Foreign Studies, Japan
E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, Washington, DC, United States. Publisher: Association for the Advancement of Computing in Education (AACE), San Diego, CA
Abstract
This paper describes formulaic sequences (FSs) extracted from a closed caption TV (CCTV) corpus for developing learning materials for a language e-learning system. In second language education and applied linguistics, it is widely accepted that using FSs appropriately in particular situations and for particular functions contributes to learners’ language comprehension, production, and fluency. In our research, we aim to use FSs as learning materials in a language e-learning system. To extract the FSs, we counted sequences of n words (n-grams) in the corpus, where n ranges from one to nine. We obtained a total of 3,544,847,579 n-grams from the CCTV corpus of over 655 million words. After a sorting and merging process, we acquired 33,173,413 significant n-grams as FS candidates. We describe these FSs in detail and investigate whether they are useful as language learning materials.
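The n-gram counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `count_ngrams` and `frequent_ngrams`, the pre-tokenized toy captions, and the `min_count` frequency threshold are all assumptions made for illustration, and the abstract does not specify how "significant" n-grams were selected in the sorting and merging process.

```python
from collections import Counter
from typing import Iterable, List, Tuple

NGram = Tuple[str, ...]


def count_ngrams(sentences: Iterable[List[str]], max_n: int = 9) -> Counter:
    """Count every word sequence of length 1..max_n within each sentence.

    `sentences` is assumed to be already tokenized into words; the
    tokenizer itself is outside the scope of this sketch.
    """
    counts: Counter = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            # Slide a window of width n over the sentence.
            for start in range(len(tokens) - n + 1):
                counts[tuple(tokens[start:start + n])] += 1
    return counts


def frequent_ngrams(counts: Counter, min_count: int = 2) -> List[Tuple[NGram, int]]:
    """Keep n-grams seen at least `min_count` times, most frequent first.

    A plain frequency threshold is a hypothetical stand-in for the
    paper's selection of significant n-grams.
    """
    kept = [(gram, c) for gram, c in counts.items() if c >= min_count]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy caption data (hypothetical, not taken from the CCTV corpus).
captions = [
    ["thank", "you", "very", "much"],
    ["thank", "you", "so", "much"],
]
ngram_counts = count_ngrams(captions)
candidates = frequent_ngrams(ngram_counts, min_count=2)
```

On a corpus of hundreds of millions of words, an in-memory counter like this would not scale; the sorting and merging mentioned in the abstract suggests disk-based processing, whose details are not given here.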
Citation
Mochizuki, H. & Shibano, K. (2016). Extracting Formulaic Sequences Containing Useful Expressions for Language Learning from Closed Caption TV Corpus. In Proceedings of E-Learn: World Conference on E-Learning (pp. 29-37). Washington, DC, United States: Association for the Advancement of Computing in Education (AACE). Retrieved June 28, 2022 from https://www.learntechlib.org/primary/p/173916/.
© 2016 Association for the Advancement of Computing in Education (AACE)