Signature Extraction from E-Mails
Abstract
Detecting user identity information in email is an actively explored topic in data mining. One approach is to extract signatures from the bodies of emails. The names found there are usually suitable for representing the sender’s or recipient’s identity. To overcome the limitations of existing approaches, we propose a novel approach that extracts the signatures of the email sender and recipient from salutation and signature blocks in email bodies. After locating and extracting the signature blocks from email bodies, we use named entity recognition (NER) tools to identify the names in the salutation and signature lines; these names can be directly associated with the corresponding email addresses in the email headers and body. For these tasks, we use Naive Bayes, maximum entropy, and Support Vector Machine (SVM) algorithms. Results on a subset of the Enron corpus indicate that the approaches presented in this paper can extract signatures from email bodies.
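The pipeline summarized above (locate the salutation and signature lines, pull out the names, and tie them to the header addresses) can be illustrated with a minimal sketch. It is not the method evaluated in this paper: fixed regular expressions stand in for the trained classifiers and the NER tools, and the patterns, the function name extract_names, and the sample message with its example.com addresses are all hypothetical.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical hand-written patterns; they stand in for the trained
# classifiers and NER tools and only cover very simple English emails.
SALUTATION_RE = re.compile(
    r"^(?i:hi|hello|dear|hey)\s+([A-Z][a-z]+(?: [A-Z][a-z]+)?)\s*[,:]?\s*$"
)
CLOSING_RE = re.compile(
    r"^(?:thanks|thank you|best regards|regards|best|sincerely|cheers)\s*[,.!]?\s*$",
    re.IGNORECASE,
)
NAME_RE = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)?$")


def extract_names(raw_email):
    """Return a dict mapping header addresses to names found in the body."""
    msg = message_from_string(raw_email)
    sender = parseaddr(msg.get("From", ""))[1]
    recipient = parseaddr(msg.get("To", ""))[1]

    # Assumes a plain-text, non-multipart message.
    lines = [ln.strip() for ln in msg.get_payload().splitlines()]
    lines = [ln for ln in lines if ln]

    names = {}

    # Salutation: the first non-empty body line names the recipient.
    if lines and recipient:
        match = SALUTATION_RE.match(lines[0])
        if match:
            names[recipient] = match.group(1)

    # Signature: a name-like line right after the last closing phrase
    # names the sender.
    for i in range(len(lines) - 1, -1, -1):
        if CLOSING_RE.match(lines[i]):
            if sender and i + 1 < len(lines) and NAME_RE.match(lines[i + 1]):
                names[sender] = lines[i + 1]
            break

    return names


# Hypothetical sample message (not taken from the Enron corpus).
SAMPLE = "\n".join([
    "From: john.doe@example.com",
    "To: mary.smith@example.com",
    "Subject: Meeting",
    "",
    "Hi Mary,",
    "",
    "Can we move the meeting to Friday?",
    "",
    "Thanks,",
    "John Doe",
])

if __name__ == "__main__":
    print(extract_names(SAMPLE))
```

In this sketch, the salutation name is attached to the address in the To header and the signature name to the address in the From header. A learned line classifier would replace the fixed patterns where greetings and closings vary more widely.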