dc.contributor.author | Hossain, Arafat Bin | |
dc.description.abstract | Large BERT models cannot be deployed on devices with low computing power and storage capacity. Knowledge Distillation addresses this problem by distilling knowledge into a smaller BERT model while retaining much of the teacher's accuracy in the student. A teacher that is expert at predicting a given class should be preferred by the student over the other teachers for that class; we used the teachers' domain expertise in this way to train the student. We calculated per-class accuracy for the student and each teacher and recorded the difference between the student and the teacher for all k classes. From these k differences, we computed the median to quantify the student's overall deviation from the teacher across all k classes. The student trained with our approach eventually outperformed all of its teachers on the MIND dataset, where it was 1.3% more accurate than its teacher BERT-base-uncased and 2.6% more accurate than its teacher RoBERTa in predicting the k classes. | en_US
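Illustration of the deviation metric described in the abstract: the sketch below computes per-class accuracy for a student and a teacher, takes the class-wise differences, and reports their median. This is a minimal sketch, not the thesis code; the names per_class_accuracy and median_deviation are illustrative assumptions.

    # Minimal sketch (assumed names, not the thesis code) of the metric in the abstract:
    # median of the k per-class accuracy differences between student and teacher.
    from statistics import median
    from collections import defaultdict

    def per_class_accuracy(true_labels, predicted_labels, k):
        """Accuracy computed separately for each of the k classes."""
        correct = defaultdict(int)
        total = defaultdict(int)
        for y_true, y_pred in zip(true_labels, predicted_labels):
            total[y_true] += 1
            if y_true == y_pred:
                correct[y_true] += 1
        return [correct[c] / total[c] if total[c] else 0.0 for c in range(k)]

    def median_deviation(student_preds, teacher_preds, true_labels, k):
        """Median of the k per-class accuracy differences (student minus teacher)."""
        student_acc = per_class_accuracy(true_labels, student_preds, k)
        teacher_acc = per_class_accuracy(true_labels, teacher_preds, k)
        return median(s - t for s, t in zip(student_acc, teacher_acc))

    # Toy usage with k = 3 classes.
    if __name__ == "__main__":
        y_true  = [0, 0, 1, 1, 2, 2]
        student = [0, 0, 1, 2, 2, 2]
        teacher = [0, 1, 1, 1, 2, 0]
        print(median_deviation(student, teacher, y_true, k=3))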
dc.publisher | North Dakota State University | en_US |
dc.rights | NDSU policy 190.6.2 | en_US |
dc.title | Multi-Teacher Knowledge Distillation Using Teacher's Domain Expertise | en_US |
dc.type | Thesis | en_US |
dc.date.accessioned | 2023-12-18T16:22:48Z | |
dc.date.available | 2023-12-18T16:22:48Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | https://hdl.handle.net/10365/33328 | |
dc.subject | BERT | en_US |
dc.subject | Encoder | en_US |
dc.subject | Knowledge Distillation | en_US |
dc.subject | Multi Teacher Knowledge Distillation | en_US |
dc.subject | Multi-class Text Classification | en_US |
dc.rights.uri | https://www.ndsu.edu/fileadmin/policy/190.pdf | en_US |
ndsu.degree | Master of Science (MS) | en_US |
ndsu.college | Engineering | en_US |
ndsu.department | Computer Science | en_US |
ndsu.program | Computer Science | en_US |
ndsu.advisor | Malik, Muhammad Zubair | |