Dependency structure analysis, key sentence extraction, and discourse structure analysis all require text to be divided into sentence units. However, sentence segmentation is not straightforward for transcriptions of spontaneous speech. We present a method of sentence boundary detection for the Corpus of Spontaneous Japanese that uses dependency structure, and we also investigate machine learning techniques. Experimental results show that these methods improve the accuracy of sentence segmentation, and that the improved sentence boundary detection in turn improves the accuracy of dependency structure analysis.