MIME to HTML in Perl如果能把MIME转换成HTML格式,再配合EXIM之类的邮件路由,我们将赋予邮件更多功能,例如邮件发贴,分析邮件内容并归档等。exim方面 的设置很简单,只需要配一个transport策略即可,关键是MIME到HTML的转换,以下使用perl对MIME格式进行转换。 先感性认识一下MIME格式,一个简单的MIME格式如下所示: From xxx@mail.com Tue Jan 04 14:52:47 2011 Received: from [10.10.10.10] (helo=host1.com) by host1.com with esmtp (Exim 4.43) id 1Pa0lC-0005Pq-Fp for yyy@mail.com; Tue, 04 Jan 2011 14:52:46 +0800 Received: from host1.com ([10.10.10.10]) by host1.com ([10.10.10.10]) with mapi; Wed, 5 Jan 2011 14:52:36 +0800 From: xxx@mail.com To: "YYY@mail.com" Date: Wed, 5 Jan 2011 14:52:33 +0800 Subject: =?gb2312?B?W8n6svq5ytXPXVvW0LzkvP7X6V3W0M7EuL28/s7KzOI=?= Thread-Topic: =?gb2312?B?W8n6svq5ytXPXVvW0LzkvP7X6V3W0M7EuL28/s7KzOI=?= Thread-Index: AQHLrKUiSx7I5ZuKXEKkIBjubIjU3Q== Message-ID: <248935DBD2C33243B5D32551AC2AB08F01BCE402@CNSZ021464.mail.com> Accept-Language: zh-CN, en-US Content-Language: zh-CN X-MS-Has-Attach: yes X-MS-TNEF-Correlator: acceptlanguage: zh-CN, en-US Content-Type: multipart/mixed; boundary="_004_248935DBD2C33243B5D32551AC2AB08F01BCE402CNSZ021464mailc_" MIME-Version: 1.0 --_004_248935DBD2C33243B5D32551AC2AB08F01BCE402CNSZ021464mailc_ Content-Type: multipart/alternative; boundary="_000_248935DBD2C33243B5D32551AC2AB08F01BCE402CNSZ021464mailc_" --_000_248935DBD2C33243B5D32551AC2AB08F01BCE402CNSZ021464mailc_ Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: base64 0ru2/sj9y8TO5cH5xt+wy77Fyq4NCsquvsWwy8bfwfnO5cvEyP22/tK7DQo= --_000_248935DBD2C33243B5D32551AC2AB08F01BCE402CNSZ021464mailc_ Content-Type: text/html; charset="gb2312" Content-Transfer-Encoding: quoted-printable <!-- P { MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px } --> <div><span style="color: #3d0000; font-size: small;">=D2=BB= =B6=FE=C8=FD=CB=C4=CE=E5=C1=F9=C6=DF=B0=CB=BE=C5=CA=AE</span></div> <div><span style="font-size: small;">=CA=AE=BE=C5=B0=CB=C6=DF= =C1=F9=CE=E5=CB=C4=C8=FD=B6=FE=D2=BB</span></div> --_000_248935DBD2C33243B5D32551AC2AB08F01BCE402CNSZ021464mailc_--
下面我们来看看怎样转换MIME格式为HTML,这次用到的第三方包是:Email::MIME,大家去cpan下吧。 第一步需要取的邮件内容,当然是MIME格式的。 my $parsed = Email::MIME->new($content); my $header = $parsed->header_obj;#get the header my @parts = $parsed->parts;#取得MIME part,可以理解成MIME 这里的part是可以嵌套 取得MIME header后,我们就可以分离出主题,收件人发件人等信息了。 my $para_title = $header->header_raw("Subject"); #主题 my $para_fromuser = $header->header_raw("From");#发件人 my $touser = $header->header_raw("To"); #收件人可能会有多个。 下面是分析邮件主体,需要注意的是,邮件有可能包含附件,我们需要先把附件提取出来,然后生成文件。如果附件是图片的话,还需要修改邮件内容引用图 片的地方,改成:<img src="">这种形式。因此我的思路是创建一个map对象,以content-id为key,以最后生成的文件名为value值,把文件信息都存 进去,在提取邮件正文时再进行替换。 提取文件代码: sub SplitPart4File(){ my (@subpart) = @_; for my $part (@subpart){ if($part->subparts){ SplitPart4File($part->subparts); next; } my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time); my $fn = $part -> filename; my $fh; if($fn){ #判断主体内容是否为文件,是就提取 my $sub_header = $part -> header_obj; $file_name = Util->decode_gb2312($sub_header -> header_raw("Content-Description"));#得到文件名 my $file_size = Util->getValueByKey($part -> header("Content-Disposition"), "size");#得到文件大小 my $content_id = $part -> header("Content-ID");#得到content-id $content_id =~ s/^<([^>]*?)>$/$1/img;#需要把<>都去掉 my $file_ext = substr($file_name, rindex($file_name, '.')); my $time = Time::HiRes::Value->now(); my $new_file_name = $year.$mon.$mday.$hour.$min.$sec.$time.$file_ext;#用时间重新命名 if(! -e $ATTACH_PATH . (1900 + $year)){ mkdir($ATTACH_PATH . (1900 + $year)); } $ATTACH_PATH .= (1900 + $year) . '/'; if(! -e $ATTACH_PATH . ($mon + 1)){ mkdir($ATTACH_PATH . ($mon + 1)); } $ATTACH_PATH .= ($mon + 1) . '/'; if(! -e $ATTACH_PATH . $mday){ mkdir($ATTACH_PATH . $mday); } $ATTACH_PATH .= $mday . '/'; my $file_path = $ATTACH_PATH . $new_file_name; $fh = IO::File -> new('>' . $file_path); my $decoded = $part->body; $fh->print($decoded); push(@file_list, $content_id); $file_map{$content_id} = $new_file_name; $fh -> close(); }else{ next; } } } 以下就是提取正文,并替换文件链接: sub SplitPart4Content(){ my (@subpart) = @_; my $content; my $file_name; my $fh; my $content_type; my @content_type_list; my $encoding; my $charset; my $decoded; my $content_type; for my $part (@subpart){ if($part->subparts){ SplitPart4Content($part->subparts); next; } my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time); $file_name = $part -> filename; $content_type = $part->content_type; @content_type_list = split(';', $content_type); $content_type = $content_type_list[0]; $encoding = $part->header("Content-Transfer-Encoding"); $charset = $part->header("Content-Type"); if($charset =~ /.*charset\=\"(.+)\"/i){ $charset = $1; }else{ $charset = ''; } if($file_name){ #判断主体内容是否是文件,是文件就跳过 next; }else{ $decoded = $part->body; $content = Util->convert2gb2312($decoded); $content_type = $part->content_type; $content =~ s/src\=\"cid\:(.*?)\"/src\=\"$FILE_URL$file_map{$1}\"/img; #替换图片连接格式 } } } 由于MIME和Html类似,内容是可以嵌套的,因此需要递归提取。 调用一下以上function: SplitPart4File(@parts); SplitPart4Content(@parts); 最后公布一下Unit.pm里的代码: #!/usr/bin/perl package Util; use strict; use MIME::Base64; use Text::Iconv; sub convert2gb2312(){ my ($self, $word) = @_; my $word_tmp; my $converter2gb2312 = Text::Iconv->new( "utf-8", "gb2312" ); $word_tmp = $converter2gb2312->convert($word_tmp); if( $word_tmp ne ''){ $word = $word_tmp ; } return $word; } sub decode_gb2312(){ my ($self, $word) = @_; my $word_tmp; if( $word =~ /\=\?([^\s\(\)\<\>\@\,\:\;\"\/\?\.\=]+)\?([bBQq])\?([^\s\?]+)\?\=/ ){ if( $2 eq "q" || $2 eq "Q" ){ $word_tmp = decode_qp( $3 ); }elsif( $2 eq "b" || $2 eq "B" ){ $word_tmp = decode_base64( $3 ); } if( !($1 =~ /^gb2312$/i) ){ my $converter2gb2312 = Text::Iconv->new( $1, "gb2312" ); $word_tmp = $converter2gb2312->convert($word_tmp); } if( $word_tmp ne ''){ $word = $word_tmp ; } } return $word; } sub getValueByKey(){ my ($self, $str, $key) = @_; if($str =~ /$key\=(.*?)\;/img){ my $match = $1; $match =~ s/\"//img; return $match; }else{ return 0; } } sub trim(){ my($self, $str) = @_; $str =~ s/^\s+|\s+$//g; return $str; } 1; 以上并非完整代码,只是原型一个思路,大家可以在此基础上不断丰富功能。 |
|