Countless websites let university students buy ready-made term papers. Some even lure elementary school students by doing their homework for them. That flood makes it virtually impossible to ferret out plagiarism. But recent research by a Korean academic shows that writing has its own “fingerprint” that can tell teachers who the writer is through a statistical analysis of the frequency of words used.
◆ ‘Primary Colors’
In February 1996, a novel titled “Primary Colors” was published by an anonymous author in the United States. The protagonist, a governor in the southern U.S. who has sex with a librarian during the campaign, closely resembled Bill Clinton, and the novel was full of information that only a close acquaintance of Clinton could have.
The first suspect was Joe Klein, then a columnist for Newsweek. He initially denied writing the novel but later admitted it. Prof. Donald Foster, a forensic linguist and text analyst for the FBI, showed through statistical analysis of the sentences that the novel was written in a style similar to that of Klein’s columns.
Dr. Han Na-rae of the Institute of Korean Culture at Korea University was the first to apply Foster’s method to Korean and demonstrate its effectiveness in revealing an author’s identity. Han analyzed a total of 160 columns by Chosun Ilbo columnists Kim Dae-joong, Ryu Geun-il, Yang Sang-hoon and Kim Chang-kyoon, 40 from each, and announced that she had a 93.7 percent success rate. Her study will be presented at the 20th Annual Conference on Human and Cognitive Language Technology at Seoul National University on Saturday.
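The general idea of attributing a text by word frequency can be illustrated with a small Python sketch. This is not Foster’s or Han’s actual procedure, which the article does not describe in detail. It simply assumes a hand-picked list of common words (`VOCAB`), builds a relative-frequency profile for each known author, and picks the author whose profile has the highest cosine similarity to the unknown text. All names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary of common function words used as style markers.
VOCAB = ["the", "of", "and", "but", "which", "that", "however"]


def profile(text, vocab=VOCAB):
    """Return the relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in vocab]


def cosine(a, b):
    """Cosine similarity between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attribute(unknown, known_texts, vocab=VOCAB):
    """Return the author whose known writing is most similar to the unknown text."""
    target = profile(unknown, vocab)
    return max(
        known_texts,
        key=lambda author: cosine(target, profile(known_texts[author], vocab)),
    )


# Made-up writing samples for two imaginary authors.
samples = {
    "author_a": "The point of the matter is that the facts and the record speak, and that is that.",
    "author_b": "However, which of these claims holds? But which one fails, however hard we look?",
}

guess = attribute("But which answer is right, however we count?", samples)
print(guess)
```

With samples this short the result only demonstrates the mechanics. A realistic study would need many texts per author, as in the 40 columns per columnist mentioned above, and a much larger set of features.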
◆ English vs. Korean
Statistical text analysis to identify authors has a long history in the Anglophone world, but the success rate there is 89 percent, lower than Han’s. “It is still difficult to come to a definite conclusion, but the result suggests that Korean reflects a more personal style than English,” said Han.
There is also a technical factor that gives Korean an edge. While the word is the basic unit in the analysis of English, the morpheme is the crucial unit in Korean textual analysis because nouns and postpositions are written together in Korean. In an Internet search for the keyword “our country,” the search engine has to look for various forms such as “our country is” and “in our country.” This is why natural language analysis of Korean was developed early on.
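The point about nouns and postpositions can be sketched in Python as well. The example below is a deliberately naive illustration, not a real morphological analyzer: it assumes a short, hypothetical list of postpositions (`PARTICLES`) and strips the longest matching one from the end of each space-separated token, so that “우리나라는” and “우리나라에서” are both counted under “우리나라” (“our country”). Real analyzers rely on dictionaries and context, because plain suffix stripping will wrongly split words that merely end in the same syllable as a postposition.

```python
from collections import Counter

# Hypothetical, incomplete list of Korean postpositions, longest first so that
# "에서" is tried before "에".
PARTICLES = sorted(
    ["에서", "에게", "으로", "은", "는", "이", "가", "을", "를", "에", "의", "도", "로"],
    key=len,
    reverse=True,
)


def split_particle(token):
    """Split a token into (stem, particle); particle is "" if none matches."""
    for particle in PARTICLES:
        # Require a non-empty stem so a bare particle is not reduced to nothing.
        if token.endswith(particle) and len(token) > len(particle):
            return token[: -len(particle)], particle
    return token, ""


def stem_counts(text):
    """Count stems after removing a trailing postposition from each token."""
    return Counter(split_particle(token)[0] for token in text.split())


counts = stem_counts("우리나라는 넓다 우리나라에서 산다 우리나라")
print(counts["우리나라"])
```

Counting by stem rather than by surface form is what lets a frequency analysis, or a search engine, treat all inflected variants of a noun as the same item.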
Han pledged to expand the database by analyzing more texts. She added that the technique can be used at a practical level to detect plagiarism in students’ work.
(englishnews@chosun.com)