
Markdown+Pandoc:A light

 Tehero 2015-06-14
Note: English translation added to this post, 2013.4.21, Beijing.
Another weekend. When I got up this morning it was still drizzling outside, so I stood on the balcony for a while and watched the rain. There is a huge park in front of the apartment, and the early birds were already busy. I even saw a small rainbow-coloured parrot in the gum tree by the window; it was really beautiful.

After lunch I took the city rail to my office and, with a cup of coffee at hand, set out to solve a problem that had been bothering me for several days.

I have always written my thesis and papers in LaTeX. It is powerful and convenient for scientific writing: its support for mathematical formulas is close to perfect, and it integrates nicely with the Vim editor. There is just one problem. Whenever I finish a draft and want to send it to my supervisor or colleagues for collaborative writing, the conversion from LaTeX to PDF and then to Word becomes tedious. When I copy and paste text from the PDF into Word, all the formatting is lost and the formulas turn into a mess. The manual clean-up afterwards seems to cancel out the advantages of writing in LaTeX.

I do not plan to persuade my supervisor and colleagues to switch from Word to LaTeX, and I know it would be hard. Frankly, the benefits of LaTeX only become clear after using it for a while (and not a short while). But scientific writing is increasingly collaborative, and unless everyone around me uses LaTeX, every format conversion wastes a lot of time and easily introduces errors.

So I wanted to find an alternative to writing in pure LaTeX, without abandoning LaTeX altogether. Scientific papers inevitably contain many mathematical symbols and equations, and many journals provide LaTeX templates. If the alternative required me to throw LaTeX away completely, I would hesitate.

Fortunately I found Markdown or, more precisely, the combination of Markdown and Pandoc.

What is Markdown?
-------------------------
In one sentence: Markdown is a lightweight markup language that lets you write in plain text and format the document with minimal, intuitive markup (such as # or _, plus $ for math when used with Pandoc).

Why write in plain text?
-------------------------------------
My philosophy is that any text-based creative work should put content first (calligraphy excepted). This includes writing novels, papers, and code. During the creative process, especially in its early stages, formatting is superfluous. Think of a novelist writing with pen and paper: there is no highlighting or bold on paper. Letting the content speak is what matters.

What is Pandoc?
----------------------
Markdown was originally designed for conversion from plain text to HTML. Later, people wanted more than HTML pages, and Pandoc addresses that need. With Pandoc, the original Markdown text can be converted to many other formats, such as Word (.docx), OpenOffice (.odt), or TeX (.tex).

What are the advantages of Markdown+Pandoc?
--------------------------------------------------------------
1. Lightweight, simple, and easy to pick up. Learning LaTeX already cost me a lot of time, and I do not want to learn another complicated language just for writing. Markdown fits the bill.
2. It must convert cleanly to Word. Word is still the mainstream editor for most people around me, and sharing documents with my colleagues is one of my priorities. Pandoc handles this.
3. It must convert to TeX. This is also essential for me, because Markdown's support for math and tables is still somewhat weak. Pandoc can convert Markdown to a TeX file, which I find very attractive.

How to configure Markdown+Pandoc?
----------------------------------------
Almost no configuration is needed (compared with LaTeX). Just download and install Pandoc; you can be up and running in five minutes.
For an introduction to Markdown, see [Wikipedia: Markdown](http://zh./wiki/Markdown).
For an introduction to Pandoc, see the [Pandoc User's Guide](http://www./article/746).

My current writing process
--------------------------
1. Write the original Markdown text in Vim.
2. When I need to share it, convert the Markdown to .docx with Pandoc.
3. When I need to submit to a scientific journal, convert the Markdown to TeX with Pandoc, paste it into the LaTeX template provided by the journal, and typeset it to PDF.

This way I avoid moving to yet another complicated solution while still being able to share documents with my colleagues. Most importantly, I have not given up LaTeX: I keep the flexibility to switch to LaTeX later.
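
Steps 2 and 3 of the workflow above could be scripted roughly as follows. This is only a minimal sketch, not the exact commands used for this post: the file name `paper.md` and the helper names `pandoc_command` and `convert` are made up for illustration, and it assumes `pandoc` is installed and on the PATH. It relies on Pandoc inferring the input and output formats from the file extensions.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: str, target_ext: str) -> list[str]:
    """Build the pandoc call that converts `source` to a sibling file.

    Hypothetical helper: the output file keeps the source name and
    only swaps the extension (e.g. paper.md -> paper.docx).
    """
    output = Path(source).with_suffix(target_ext)
    # pandoc guesses the formats from the .md / .docx / .tex extensions
    return ["pandoc", source, "-o", str(output)]


def convert(source: str, target_ext: str) -> Path:
    """Run pandoc and return the path of the generated file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    cmd = pandoc_command(source, target_ext)
    subprocess.run(cmd, check=True)  # raises if pandoc reports an error
    return Path(cmd[-1])
```

Calling `convert("paper.md", ".docx")` would produce the Word file for sharing, and `convert("paper.md", ".tex")` the TeX file. Without the `-s` option Pandoc writes the TeX output as a fragment without a preamble, which suits pasting into a journal template.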

Problems?
------------------------------
Of course, this solution is far from perfect. So far the main problems are:
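1. Math. Pandoc can render LaTeX math into good-looking HTML formulas, but so far it does not seem to render them well in .docx documents. This may be because I do not have Office installed and use Apple's Pages instead, which perhaps does not support OMML (Office Math Markup Language). Well, it looks like yet another editor-specific format. So what to do about math? For now I still write formulas directly in Markdown as TeX math. After conversion to DOCX, the parts between \begin{equation} and \end{equation} are not output, so I use LaTeXiT (a small tool that turns LaTeX formulas into PDFs or images; copy and paste is all it takes) to insert them into the DOCX document. I am not a mathematician or physicist and my papers do not contain many formulas, so this is not a big problem.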

2. Bibliography. With Markdown+Pandoc I could of course use Pandoc's native citation format (like [@author:year]). The problem is that I need to convert to TeX later, and it takes extra effort to turn the Pandoc citation format into the LaTeX one (\cite{author:year}). My workaround is to insert the cite key from my reference manager, Papers 2 (a Mac app that also has a Windows edition), directly into the text. For a given reference the Papers 2 cite key looks like {author:year}. After I generate the DOCX, {author:year} is still there, and I only need to let Papers 2 format the document, which is very simple. After conversion to TeX the key is also preserved, but it becomes \{author:year\}. The only differences from the LaTeX form are the missing "cite" after the first backslash and the extra "\" before the closing brace, so a Vim search and replace does the job. First replace "\{" with "\cite{" by typing ":%s/\\{/\\cite{/g", then replace "\}" with "}" by typing ":%s/\\}/}/g". After these two steps, \{author:year\} has become \cite{author:year}.
 
Done! Perfect!
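
The two Vim substitutions above can also be done in a single pass outside the editor. The following is a small sketch under stated assumptions: the function name `to_latex_cite` is made up, and the set of characters allowed inside a cite key (letters, digits, underscore, colon, dot, hyphen) is my guess rather than anything Papers 2 guarantees. Unlike the global Vim replacements, it only touches escaped braces that wrap something that looks like a cite key.

```python
import re

# Matches an escaped-brace key such as \{author:year\}.
# The allowed key characters are an assumption, not a Papers 2 rule.
ESCAPED_KEY = re.compile(r"\\\{([\w:.\-]+)\\\}")


def to_latex_cite(tex: str) -> str:
    """Rewrite every \\{key\\} in `tex` as \\cite{key}."""
    # A function replacement avoids backslash-escape surprises
    # in the replacement string.
    return ESCAPED_KEY.sub(lambda m: "\\cite{" + m.group(1) + "}", tex)
```

Applied to the text of the converted .tex file, this leaves other escaped braces, such as a set written as \{1, 2\}, untouched.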

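Now I can comfortably write papers with the Markdown+Pandoc combination; neither math formulas nor references are a problem. I keep the lightness of Markdown, can convert seamlessly to other document formats, and, most importantly, it works very well alongside LaTeX.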

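The next step is to figure out how to tempt the people around me (my supervisor and colleagues included) to give up Word and start writing papers in Markdown. The bigger dream is to write in plain text and collaborate through GitHub with version control: commit, push, pull... you know the rest.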
