Show simple item record

dc.contributor.author	Hossain, Arafat Bin
dc.description.abstract	Large BERT models cannot be deployed on devices with low computing power and storage capacity. Knowledge distillation addresses this problem by distilling knowledge into a smaller student BERT model while retaining much of the teacher's accuracy. A teacher that is expert in predicting a particular class should be preferred by the student over the other teachers for that class; we used the teachers' domain expertise in this way to train the student. We calculated per-class accuracy for the student and the teacher and recorded the difference between the student and the teacher for all k classes. From these k differences, we took the median to quantify the student's overall deviation from the teacher across all k classes. The student trained with our approach eventually outperformed all of its teachers on the MIND dataset, where it was 1.3% more accurate than its teacher BERT-base-uncased and 2.6% more accurate than its teacher RoBERTa in predicting the k classes.	en_US
dc.publisher	North Dakota State University	en_US
dc.rights	NDSU policy 190.6.2	en_US
dc.title	Multi-Teacher Knowledge Distillation Using Teacher's Domain Expertise	en_US
dc.type	Thesis	en_US
dc.date.accessioned	2023-12-18T16:22:48Z
dc.date.available	2023-12-18T16:22:48Z
dc.date.issued	2022
dc.identifier.uri	https://hdl.handle.net/10365/33328
dc.subject	BERT	en_US
dc.subject	Encoder	en_US
dc.subject	Multi Teacher Knowledge Distillation	en_US
dc.subject	Knowledge Distillation	en_US
dc.subject	Multi-class Text Classification	en_US
dc.rights.uri	https://www.ndsu.edu/fileadmin/policy/190.pdf	en_US
ndsu.degree	Master of Science (MS)	en_US
ndsu.college	Engineering	en_US
ndsu.department	Computer Science	en_US
ndsu.program	Computer Science	en_US
ndsu.advisor	Malik, Muhammad Zubair
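
The deviation metric described in the abstract can be made concrete with a short sketch. The code below is not taken from the thesis; the function names, the toy labels, and the use of NumPy are assumptions made for illustration. It computes per-class accuracy for a student and one teacher and reports the median of the k per-class differences, i.e. the student's overall deviation from that teacher.

import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    # Accuracy computed separately over the examples of each true class.
    accs = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            accs[c] = (y_pred[mask] == c).mean()
    return accs

def median_deviation(y_true, student_pred, teacher_pred, num_classes):
    # Median over the k classes of (student accuracy - teacher accuracy),
    # the overall-deviation summary described in the abstract.
    diff = (per_class_accuracy(y_true, student_pred, num_classes)
            - per_class_accuracy(y_true, teacher_pred, num_classes))
    return float(np.nanmedian(diff))

# Toy example with k = 3 classes (labels invented for illustration only).
y_true  = np.array([0, 0, 1, 1, 2, 2])
student = np.array([0, 0, 1, 2, 2, 2])
teacher = np.array([0, 1, 1, 1, 2, 0])
print(median_deviation(y_true, student, teacher, num_classes=3))

With these toy labels the student's per-class accuracies are (1.0, 0.5, 1.0), the teacher's are (0.5, 1.0, 0.5), and the median of the differences is 0.5; a positive value means the student is, for the typical class, more accurate than that teacher.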

