
Thursday, 04 March 2010

Electrolarynx, Esophageal, and Normal Speech Classification Using Gradient Descent, Gradient Descent with Momentum and Learning Rate, and the Levenberg-Marquardt Algorithm

ABSTRACT

At RSCM hospital, malignant cancer of the larynx ranks third, after diseases of the ear and nose. RSCM sees an average of 25 laryngeal cancer patients per year. In the United States, more than 8,900 people are diagnosed with laryngeal cancer every year. The exact cause of laryngeal cancer is still unknown, but several factors are closely associated with laryngeal malignancy: cigarettes, alcohol, and radioactive rays.

An ostomy is a type of surgery that creates an opening (stoma) in a particular part of the body. Laryngectomy is one example: it is an operation performed on patients whose cancer of the larynx (throat) has reached an advanced stage. After this operation, patients can no longer breathe through the nose; they breathe through a stoma (a hole in the patient's neck).
The human voice is produced by the combined action of the lungs, the throat valve (epiglottis) with the vocal cords, and articulation shaped by the oral cavity (mouth) and the nasal cavity (nose) [3]. Removing the larynx therefore removes the voice, so after surgery the patient can no longer speak as before.

Several methods have been developed to help laryngectomees speak again, for example:
 Esophageal speech,
 Tracheoesophageal speech,
 Electrolarynx speech.

Esophageal speech is a way of speaking that uses the throat, at about the level of the original vocal cords, as the sound source. The vibration comes from swallowed air, before it enters the stomach [1]. Practicing esophageal speech involves four steps: blowing, winding, forming a syllable, and speaking [2].

Tracheoesophageal speech relies on a device implanted between the esophagus and the throat. With this method the esophagus is the voice source [4]. When the laryngectomee speaks, the stoma is closed, so the air is directed into the esophagus through the implanted vocal cord replacement. This method produces a satisfactory sound, but it carries a high risk of infection.

Another device that helps laryngectomees speak is the electrolarynx. It is placed on the lower chin and makes the neck vibrate to produce a sound. The sound produced by an electrolarynx is monotone, with no intonation at all, so it sounds robotic and unattractive.

Meanwhile, research on speech recognition and its applications is advancing rapidly, and many applications have been introduced. Examples include dialing a phone by voice (e.g. "call home"), entering simple data into a database by voice, and giving simple commands to a particular machine [5]. This technology is expected to be usable with electrolarynx and esophageal speech as well.

This paper presents a system for identifying each type of speech. The system has two main parts: feature extraction, which extracts the characteristics of the human voice, and pattern recognition, which recognizes the sound patterns. Feature extraction is done with Linear Predictive Coding (LPC), while pattern recognition is done with an artificial neural network (ANN).
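
To make the feature extraction step concrete, here is a minimal sketch of LPC on a single speech frame. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to compute LPC coefficients; the abstract does not say which variant, frame length, or prediction order the paper uses, so the function names and the order chosen below are illustrative only.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for one frame (unnormalised autocorrelation)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion (assumed variant, not taken from the paper).

    Returns (coeffs, err): coeffs[k-1] is the weight a_k in the prediction
    x[n] ~ sum_k a_k * x[n - k], and err is the remaining prediction energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or perfectly predicted frame
            break
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


if __name__ == "__main__":
    # Hypothetical test frame: a decaying exponential, which a first-order
    # predictor with weight close to 0.9 describes almost exactly.
    frame = [0.9 ** n for n in range(200)]
    coeffs, err = lpc(frame, order=2)
    print(coeffs, err)
```

In a classifier like the one described here, the coefficient vector of each frame (or some summary of it over an utterance) would be the input pattern for the ANN; how the paper assembles that input is not stated in the abstract.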

Three training methods are used for the ANN: gradient descent, gradient descent with momentum and learning rate, and Levenberg-Marquardt (LM). The three methods are compared to find out which one trains fastest and which one has the highest validity.
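
The difference between the three update rules can be sketched on a toy problem. The example below fits a straight line rather than a neural network, uses made-up data, and picks the learning rate, momentum, and damping values arbitrarily; it also uses a fixed learning rate, which may differ from the "momentum and learning rate" variant in the paper. It only illustrates how each rule updates the parameters.

```python
# Toy data on the line y = 2x + 1 (hypothetical, not from the paper).
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.0, 5.0, 7.0, 9.0]


def residuals(w, b):
    return [w * x + b - y for x, y in zip(XS, YS)]


def loss(w, b):
    return 0.5 * sum(e * e for e in residuals(w, b))


def gradient(w, b):
    """Gradient of the loss, which equals J^T e for this model."""
    e = residuals(w, b)
    return sum(ei * x for ei, x in zip(e, XS)), sum(e)


def train_gd(lr=0.02, momentum=0.0, tol=1e-10, max_iter=10000):
    """Gradient descent; momentum=0.0 gives the plain variant."""
    w = b = 0.0
    vw = vb = 0.0
    for it in range(1, max_iter + 1):
        gw, gb = gradient(w, b)
        vw = momentum * vw - lr * gw   # velocity keeps part of the last step
        vb = momentum * vb - lr * gb
        w, b = w + vw, b + vb
        if loss(w, b) < tol:
            return w, b, it
    return w, b, max_iter


def train_lm(mu=1e-2, tol=1e-10, max_iter=100):
    """Levenberg-Marquardt: solve (J^T J + mu*I) delta = J^T e each step."""
    w = b = 0.0
    for it in range(1, max_iter + 1):
        gw, gb = gradient(w, b)
        # Rows of J are [x, 1], so J^T J is a 2x2 matrix; add mu on the diagonal.
        a11 = sum(x * x for x in XS) + mu
        a12 = sum(XS)
        a22 = len(XS) + mu
        det = a11 * a22 - a12 * a12
        dw = (a22 * gw - a12 * gb) / det
        db = (a11 * gb - a12 * gw) / det
        if loss(w - dw, b - db) < loss(w, b):
            w, b, mu = w - dw, b - db, mu / 10.0   # accept step, reduce damping
        else:
            mu *= 10.0                             # reject step, increase damping
        if loss(w, b) < tol:
            return w, b, it
    return w, b, max_iter


if __name__ == "__main__":
    print("gradient descent:", train_gd())
    print("with momentum:   ", train_gd(momentum=0.9))
    print("LM:              ", train_lm())
```

On this toy fit the LM variant needs far fewer iterations than either gradient method because it uses curvature information from J^T J. That matches the direction of the speed ranking reported in this abstract, but iteration counts on a linear fit say nothing about classification validity.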

The test results show that the LM training method is the fastest, with validity reaching 88.2%. On an Intel Atom N270 processor at 1.60 GHz, its learning process takes 0.54 seconds. The gradient descent training method is the slowest (660.64 seconds) but has a higher validity, reaching 100%.

For the full paper, please contact the author: fatchul@uny.ac.id

(Published at the ICGC International Conference 2010, Yogyakarta, 2-3 March 2010)
