My original motivation for pursuing graduate school was that I wanted to change my career dramatically, and to spend time in the fields of technology and design that interest me. Four years ago, when I was admitted, I expected to earn a degree and some new skills so that I could find a good job after graduation. But after continual learning, discovering, trying, and failing, I was lucky enough to find myself on the path toward discovering and pursuing my intrinsic interest, which brings me more enjoyment and satisfaction than extrinsic motivations such as a degree or a job. For me, that's the serendipity of being in graduate school.
SIAT is a unique place in that it is an interdisciplinary and research-oriented program. Although the disciplines in SIAT range from the humanities, arts, and design to computer science, engineering, psychology, neuroscience, healthcare, etc., there is a common theme across them, and that theme is the human. We are engaged in understanding human beings and their existence with respect to the arts, technology, and society. Technology, design, and art are ways, or probes, to understand human beings.
Creating new algorithms is hard, but the people problem is even harder. Unlike traditional technology-oriented programs that focus on the algorithm problem and reduce its complexity by ignoring the people problem, in SIAT we face the two hard problems simultaneously, which makes the research even more challenging. To tackle the complex problem of people + technology, doing interdisciplinary research is not a goal in itself, but a means to that goal. In SIAT, disciplinary boundaries and barriers are blurry, and we are used to being surrounded by researchers with multiple backgrounds. However, that comes with costs. It means one needs to spend more time on different subjects, and to become familiar with different research communities and methods. It is a very time-consuming process. And one has to bear the risk of not being recognized by either community, or of the research merely skimming the surface of some disciplines. But the pursuit is rewarding in itself, because the research problem is no longer fragmented through a single disciplinary lens. We see the bigger picture, and need not bear the alienation and inauthenticity that come from ignoring the human variable in the problem.
The top two benefits I have gained from graduate school are:
The former is the generator of my self-esteem, vitality, happiness, and well-being, while the latter is a thinking strategy I can transfer to other problems.
To get the most out of graduate school and make it a push for your career, ask yourself these questions when deciding whether graduate school is right for you:
Do I feel autonomy when making these choices, free from control or heavy influence by external motives (others' expectations, a degree, money, a good job)?
Can I feel relatedness in the research lab? Is it a supportive environment for my pursuit of my interests, for the enhancement of my skills, and for my freedom in making research decisions?
[Note: this relates to the three psychological needs in self-determination theory: competence, autonomy, and relatedness.]
Soon after I entered SIAT, I was excited and embraced the new environment. For a while, I was quite busy fulfilling short-term goals, such as course assignments, conference deadlines, project presentations, scholarship applications, etc. But gradually, I realized I had lost my original interest by letting extrinsic goals (such as publications) replace my ultimate goal. So when it was time to decide on my thesis research, I didn't know what I was going to do. Then my supervisor, Professor Diane Gromala, encouraged me to keep a diary of the things I am interested in and to try to distill their common attributes. I did so, and although the process was slow and difficult (it took me almost a year to reach my thesis topic), through this introspective process I got to approximate my intrinsic interest. I can keep my own pace and goals, and thus feel less anxious about the inevitable competition and comparison in grad school.
Before joining SIAT, I received my Doctor of Medicine (MD) in Neurology and worked in hospitals as well as pharmaceutical and technology companies. After experiencing these different jobs, I discovered that being a researcher developing AI and HCI (human-computer interaction) technologies for healthcare is the best route for me. Conducting research is a creative activity. Publications are to a researcher what artworks are to an artist, albums to a musician, buildings to an architect, and novels to a novelist. The inspiration and enlightenment I get from reading a great publication is no less than what I get from reading a fabulous novel or listening to a remarkable piece of music.
Doing research in the interdisciplinary field of AI, HCI, and medicine also fits my interests well. I enjoy the problem-solving process, a common attribute of research, medicine, and programming. I love the human focus in medicine and HCI. For example, my PhD thesis deals with how doctors use AI, and how to design AI for doctor-AI collaboration. I am also very interested in the human mind; that's why I chose Neurology as my specialty in medical training in the first place. And AI is another approach to it, by reverse-engineering the neural network. My thesis engages with understanding the decision-making process of both natural intelligence (the doctors) and artificial intelligence, and how both AI and doctors can learn better by explaining.
Ryan, R. M. & Deci, E. L. Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. Am. Psychol. 55, 68–78 (2000).
It is necessary to make clear that the aim of cross-validation (CV) is not to obtain one or more trained models for inference, but to estimate an unbiased generalization performance. This may be quite confusing at first, since the outcome of the common train/validate/test split approach is a trained model with tuned hyperparameters (on the train/validate sets), plus an estimate of generalization performance (on the trainval/test sets).
Why bother with a validation set? Why not just use the test set for both tasks, hyperparameter tuning (model selection) and estimation, at once? The problem is that if we use the test set multiple times across different trained models during our selection of the optimal model, the test set "leaks" information and is no longer pristine. When we later apply the model to real-world data, the model will probably have a larger error than on the test set. In other words, when we use the test set for both model selection and estimation, we tend to overfit the test data, and the estimate carries an optimistic bias.
(A side note: if we map the test set above to the test set of many benchmark datasets, we will find that the machine learning community is actually overfitting the benchmarks. Since we use results on a benchmark's test set to select the best model, we are again mixing model selection and performance estimation by performing both tasks on the same benchmark test set.)
Doing one round of CV to evaluate the performance of different models, and selecting the best model based on the CV results, is similar to the above case of using the test set for both model selection and estimation. Thus, when we want to perform both model selection and generalization-error estimation, we have to separate the two tasks by using a separate held-out set for each. That's why we have both validation and test sets, and the CV analogue is called nested (or two-round) cross-validation.
The nested CV has an inner loop CV nested in an outer CV. The inner loop is responsible for model selection/hyperparameter tuning (similar to validation set), while the outer loop is for error estimation (test set).
The algorithm is as follows (adapted from Hastie et al. [1] and this post):
The nested cross-validation algorithm:

1. Divide the dataset into \(K\) cross-validation folds at random.
2. For each fold \(k = 1, 2, ..., K\) (outer loop, for evaluation of the model with the selected hyperparameter):
    - 2.1 Let `test` be fold \(k\).
    - 2.2 Let `trainval` be all the data except those in fold \(k\).
    - 2.3 Randomly split `trainval` into \(L\) folds.
    - 2.4 For each fold \(l = 1, 2, ..., L\) (inner loop, for hyperparameter tuning):
        - 2.4.1 Let `val` be fold \(l\).
        - 2.4.2 Let `train` be all the data except those in `test` or `val`.
        - 2.4.3 Train with each hyperparameter setting on `train`, evaluate it on `val`, and keep track of the performance metrics.
    - 2.5 For each hyperparameter setting, calculate the average metric score over the \(L\) folds, and choose the best hyperparameter setting.
    - 2.6 Train a model with the best hyperparameter on `trainval`. Evaluate its performance on `test` and save the score for fold \(k\).
3. Calculate the mean score over all \(K\) folds, and report it as the generalization error.
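The steps above can be sketched in a few dozen lines of NumPy. This is a toy sketch, not code from the references: it assumes a closed-form ridge regression as the "model", with the regularization strength `alpha` as its only hyperparameter, and mean squared error as the metric.

```python
import numpy as np

def kfold(indices, n_folds, rng):
    """Shuffle the indices and split them into n_folds roughly equal folds."""
    return np.array_split(rng.permutation(indices), n_folds)

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: solve (X^T X + alpha I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def nested_cv(X, y, alphas, K=5, L=3, seed=0):
    rng = np.random.default_rng(seed)
    outer_folds = kfold(np.arange(len(y)), K, rng)            # step 1
    outer_scores = []
    for k in range(K):                                        # step 2: outer loop
        test = outer_folds[k]                                 # 2.1
        trainval = np.concatenate(
            [f for i, f in enumerate(outer_folds) if i != k]) # 2.2
        inner_folds = kfold(trainval, L, rng)                 # 2.3
        mean_val_err = []
        for alpha in alphas:
            errs = []
            for l in range(L):                                # 2.4: inner loop
                val = inner_folds[l]                          # 2.4.1
                train = np.concatenate(
                    [f for i, f in enumerate(inner_folds) if i != l])  # 2.4.2
                w = fit_ridge(X[train], y[train], alpha)      # 2.4.3
                errs.append(mse(w, X[val], y[val]))
            mean_val_err.append(np.mean(errs))                # 2.5
        best_alpha = alphas[int(np.argmin(mean_val_err))]
        w = fit_ridge(X[trainval], y[trainval], best_alpha)   # 2.6
        outer_scores.append(mse(w, X[test], y[test]))
    return float(np.mean(outer_scores))                       # step 3
```

Note that each outer fold runs its own inner hyperparameter search; the outer `test` fold is only ever touched once, in step 2.6.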
As for implementation, the scikit-learn documentation points out that the inner loop can use scikit-learn's `GridSearchCV` to perform a grid search over hyperparameters evaluated on the inner-loop `val` set, while the outer loop can call `cross_val_score` to obtain the generalization error.
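Following that pointer, here is a minimal sketch with scikit-learn (assuming it is installed; the `SVC` classifier, the toy dataset, and the parameter grid are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)  # hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)  # error estimation

# GridSearchCV plays the inner loop: for each outer training split, it
# re-runs the grid search and refits on the whole trainval portion.
clf = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)

# cross_val_score plays the outer loop: each outer test fold is used
# exactly once, to score the refit model.
nested_scores = cross_val_score(clf, X, y, cv=outer_cv)
print(nested_scores.mean())
```

The key point is that the estimator passed to `cross_val_score` is the whole grid-search object, so hyperparameter selection is repeated inside every outer training split.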
Can I apply the best hyperparameter selected in the first iteration of the outer loop to build the models for the remaining \(K-1\) outer loops? That is, can I skip the hyperparameter search in the next \((K-1) \times L \times M\) fits (where \(M\) is the number of hyperparameter combinations, if using grid search)?
I think the answer is no. The reason is that, in that case, the `test` sets in the following loops are no longer "untouched" by the hyperparameter-selection process. For example, in outer loop #\(2\), the `test` set used for evaluating model performance was already used in outer loop #\(1\) for selecting the hyperparameter, so some data are used both for hyperparameter tuning and for performance evaluation. This causes overfitting.
What if the \(K\) outer loops select distinct hyperparameters? How can I use nested CV to build the best model?
As I stated at the beginning, CV is not a method for obtaining one or more trained models for inference, but only a tool for estimating an unbiased generalization performance. CV generates a model in each outer loop, but we can hardly estimate the performance of each individual model, since the test set in each outer loop is small. However, if the model is stable (it does not change much when the training data are perturbed), the hyperparameters found in each outer loop may be identical (using grid search) or similar to each other (using random search). A more in-depth explanation can be found here.
That's all I would like to share about nested CV. This post reflects my current understanding of cross-validation. Please correct me if you identify any problems. Thanks!
[1] T. Hastie, J. Friedman, and R. Tibshirani, “Model Assessment and Selection,” in The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, J. Friedman, and R. Tibshirani, Eds. New York, NY: Springer New York, 2001, pp. 193–224.
[2] G. C. Cawley and N. L. C. Talbot, “On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation,” Journal of Machine Learning Research, vol. 11, no. Jul, pp. 2079–2107, 2010.
"Convolution" is one of the most mysterious words for a novice deep learner. The first time I opened the Wikipedia page on convolution and tried to make sense of it, I just got dizzy and lost. After a long time mingling with CNNs and a bit with signal processing, I finally figured it out a little better. In my current understanding, convolution is nothing more than a weighted sum that slides over the data.
And here is the whole story:
To see what a weighted sum is, let's begin with the following expression, which may remind you of high-school algebra:

\[y = w_1 \times x_1 + w_2 \times x_2\]

Pretty easy, right? We give \(x_1\) and \(x_2\) different weights, \(w_1\) and \(w_2\), because in our mind we value \(x_1\) and \(x_2\) differently. For example, if you want to calculate the final course score from the midterm and final exams, the weights reflect how important you think each exam is.
Weights express, numerically, how important we think each variable is. Here, by injecting our thoughts, we bring a new perspective into the original flat world (where the weights are \(w_1 = 1\), \(w_2 = 1\)).
And sum is just an operation that condenses our deliberate thoughts on each variable into one final value (like the final course score).
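As a tiny worked example (the exam scores and weights here are made up for illustration):

```python
midterm, final_exam = 80.0, 90.0   # two exam scores
w_mid, w_final = 0.4, 0.6          # how much we value each exam

# the weighted sum condenses both scores into one course score
course_score = w_mid * midterm + w_final * final_exam
print(course_score)  # 86.0
```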
If we have more than two \(x\)s, say \(n\) of them, the above formula can be written as:

\[y = w_1 \times x_1 + w_2 \times x_2 + w_3 \times x_3 + ... + w_n \times x_n\]

Or, in a more condensed version:

\[y = \sum^n_{i=1} w_i \times x_i\]

This works perfectly well when the number of \(x_i\) is small. But what if the number of \(x_i\) becomes large, say \(1,000,000\), or even infinite? How do we assign each \(x_i\) a weight wisely?
Here is the trick:
Instead of using a single number \(y\) to describe the \(x\) sequence, we now use a group of numbers \(\vec{y}\). That's because a single number compresses the information in \(x\) too much to reflect its character well.
Since we are now dealing with a whole bunch of \(x\)s and \(y\)s, we will use \(\vec{x}\) and \(\vec{y}\) instead. Don't be intimidated by the fancy new notation. These are called "vectors", which are merely sequences of numbers grouped together.
To generate a sequence of numbers \(\vec{y}\) out of \(\vec{x}\), there are several different approaches. Let's explore them one by one.
The easiest approach is to assign a single weight \(w\) to all \(x_i\). It is equivalent to \(\vec{y} = w \times \vec{x}\), where \(w\) is a single number.
We will evaluate this approach from the following two aspects:
| | Same weight for all \(x_i\) |
| --- | --- |
| \(w\) is simple and compact | Yes |
| \(w\) adds information on how we value different values in \(\vec{x}\) | No |
\(w\) is quite simple: it is just one number. However, it adds little information from "our perspective". It only scales the original \(\vec{x}\), and cannot differentiate the fine details inside \(\vec{x}\).
Here is an example:
import numpy as np
import matplotlib.pyplot as plt
x = np.sin(np.linspace(-2* np.pi, 2*np.pi, 200))
plt.plot(x)
plt.ylim(-2.5, 2.5)
plt.xlabel("The sequence of x")
plt.ylabel("The value of x")
plt.show()
We draw \(\vec{x}\) as a sequence of numbers. Now if we multiply each \(x_i\) with a weight, say \(2\), then the output \(\vec{y}\) will look like this:
plt.plot(x)
#plt.plot(2*np.ones(x.shape))
plt.plot(2 * x)
plt.plot(np.convolve(x, 2))
plt.ylim(-2.5, 2.5)
plt.legend(['x', 'w * x', 'convolve with 2'], bbox_to_anchor=(1.05, 1), loc=2)
plt.xlabel("Sequence")
plt.ylabel("Value")
plt.show()
As you can see, the green line \(\vec{y}\) has exactly the same “pattern”, i.e.: it has peaks and valleys in the same position as in \(\vec{x}\). It stretched \(\vec{x}\) but that’s it.
You may notice that I actually drew 3 lines. The graph only shows two because the lines for "multiply by 2" and "convolve with 2" overlap. I'll come back to this later, but notice that after this long read we have begun to touch convolution a little bit!
Another extreme is to assign each \(x_i\) a distinct weight. Every resulting point \(y_i\) is the product of \(w_i\) and \(x_i\):

\[y_i = w_i \times x_i\]

Now \(\vec{w}\) becomes a sequence of numbers with the same length as \(\vec{x}\). The weight \(\vec{w}\) reflects our thoughts on each \(x_i\) in much finer detail. This is similar to what we did in the first example of calculating the final course score, but here we do not apply the summation.
To take another example, here we assign the weights as a sequence of numbers on a straight line, shown as the orange line in the picture. When we multiply each \(w_i\) with \(x_i\), we get the resulting \(\vec{y}\) as the green line. Even with a very simple form of \(\vec{w}\), the resulting \(\vec{y}\) does a very good job of incorporating the information from both \(\vec{w}\) and \(\vec{x}\).
x = np.sin(np.linspace(0, 20*np.pi, 400))
w = np.arange(400)/400
plt.plot(x)
plt.plot(w)
plt.plot(np.multiply(x, w))
plt.legend(['x', 'w', 'y'], bbox_to_anchor=(1.05, 1), loc=2)
plt.xlabel("Sequence")
plt.ylabel("Value")
plt.show()
This approach has real-world applications, such as amplitude modulation. But for problems such as identifying peaks and valleys, it becomes impractical, since we would need to deliberately design the whole weight sequence over the entire domain of \(\vec{x}\). It is also redundant and costly to express and store such a \(\vec{w}\).
| | Same weight for all \(x_i\) | Distinct weight for each \(x_i\) |
| --- | --- | --- |
| \(w\) is simple and compact | Yes | No |
| \(w\) adds information on how we value different values in \(\vec{x}\) | No | Yes |
An improvement to the above approach is to express \(\vec{w}\) in a repetitive manner, i.e., to repeat a short sequence \(\vec{w}\) over \(\vec{x}\).
In this way, \(\vec{w}\) is expressed by a short sequence of numbers, while still carrying our thoughts over the raw data \(\vec{x}\).
| | Same weight for all \(x_i\) | Distinct weight for each \(x_i\) | Repetitive weight |
| --- | --- | --- | --- |
| \(w\) is simple and compact | Yes | No | Yes |
| \(w\) adds information on how we value different values in \(\vec{x}\) | No | Yes | Yes |
n = 50
x = np.sin(np.linspace(0, 20*np.pi, 400))
w = np.tile(np.concatenate((np.arange(n), np.flip(np.arange(n), axis = 0)), axis = 0), int(200/n))/n/2
plt.plot(x)
plt.plot(w)
plt.plot(np.multiply(x, w))
plt.legend(['x', 'w', 'y'], bbox_to_anchor=(1.05, 1), loc=2)
plt.xlabel("Sequence")
plt.ylabel("Value")
plt.show()
In this example, we repeat \(\vec{w}\) four times. Notice that \(\vec{y}\) (in green) has the largest peaks where the peaks of \(\vec{w}\) overlap with those of \(\vec{x}\).
However, in the above example, I deliberately set the length of \(\vec{w}\) to match the shape of \(\vec{x}\). Thus the interesting pattern of \(\vec{w}\) can only fall on specific segments of \(\vec{x}\). Real-world data are far more complex than a sinusoid. How can we make \(\vec{w}\) cover the whole sequence of \(\vec{x}\) without missing any potentially interesting combination of \(w \times x\)?
The problem with the previous approach is that its step size is too large: it equals the length of \(\vec{w}\). We can solve this by shortening the step size. Let's push the step size to the other extreme, say \(1\), given that \(x\) is discrete.
Concretely, we iteratively use each position in \(\vec{w}\) as the start point, periodically repeat \(\vec{w}\) to match the length of \(\vec{x}\), and then multiply the two sequences element-wise.
Wait a minute: since we repeat this process many times, we will get many \(\vec{y}\)s. How do we summarize the multiplied information?
So far we have only discussed different scenarios for the weight; we haven't talked much about the sum. Now it's time to let the sum shine!
I thought about adding all the resulting \(\vec{y}\)s element-wise, but then I realized the result equals adding all the elements of \(\vec{w}\) together and multiplying that single number with \(\vec{x}\). That reduces to the case in Section 2.1, and destroys the information we extracted by shifting the start point of \(\vec{w}\).
The real trick of convolution is:
we sum over the sequence of \(\vec{w}\), right after multiplying \(\vec{w}\) with the corresponding \(x\) segment.
The resulting output from the sum will be a single number \(y_n\):
\[y_n = \sum_{i=0}^{|w|-1} w_i \times x_{n + i}\]

where \(n\) is the start position of \(\vec{w}\) on \(\vec{x}\).
After calculating all the \(y_n\) along the sequence of \(\vec{x}\), we have completed the convolution operation:

\[y(n) = (x * w)(n) = \sum_{i=0}^{|w|-1} w_i \times x_{n + i}\]

where the \(*\) symbol denotes the convolution operation, and \(y\) is written as a function of the start position \(n\).
Remember, in the beginning I mentioned that sum is an operation that condenses our deliberate thoughts on each variable into one final value. Instead of summing over the whole sequence as above, the convolution sums only over the span covered by the weight. In this way, the sum operation does not lose much information, and it concisely represents the combined information from \(\vec{w}\) and the local \(x\) patch. Moreover, since the weighted sum is the dot product of \(\vec{w}\) and the local \(x\) patch, it is a similarity measure of the two: a larger weighted sum indicates a detected pattern on \(\vec{x}\) corresponding to the pattern of \(\vec{w}\). Thus, we can design weights with the patterns we want, and use them to detect whether similar patterns exist in the target data \(\vec{x}\).
The formula is similar to the one for discrete convolution on Wikipedia, except that Wikipedia's formula uses \(x_{n - i}\) instead of \(x_{n + i}\). The minus sign used to confuse me a lot, until I found out that when we talk about convolution in CNNs, we actually mean cross-correlation, which uses \(x_{n + i}\). The convolution in mathematics uses \(x_{n - i}\), where the weight is flipped. The difference only matters when writing proofs; as far as CNN implementations are concerned, people just use \(x_{n + i}\) and call it convolution. (Ref: Deep Learning book, p. 324)
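The formula translates directly into NumPy. This is a sketch; `np.correlate` in "valid" mode computes the same weighted sums, and the "edge detector" weight below is just an arbitrary example:

```python
import numpy as np

def cross_correlate_1d(x, w):
    """y[n] = sum_i w[i] * x[n + i]: slide w along x, one weighted sum per start position."""
    n_out = len(x) - len(w) + 1
    return np.array([np.dot(w, x[n:n + len(w)]) for n in range(n_out)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])            # a simple "edge detector" weight

print(cross_correlate_1d(x, w))           # [-2. -2. -2.]
print(np.correlate(x, w, mode="valid"))   # same result
```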
That’s almost the whole story of convolution.
But what does this have to do with images, you may ask. The story above shows how to calculate convolution on 1-dimensional data. The convolution in a CNN for images is calculated in exactly the same way; we just extend the data and the weights to 2 dimensions.

\[Y(m, n) = (X * W)(m, n) = \sum_{i} \sum_{j} W_{i , j} \times X_{m + i, n + j}\]

That's it!
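The 2-D version is the same sliding weighted sum, only over both spatial axes. A sketch (not an optimized implementation; the 2×2 averaging weight is an arbitrary example):

```python
import numpy as np

def cross_correlate_2d(X, W):
    """Y[m, n] = sum over i, j of W[i, j] * X[m + i, n + j]."""
    h, w = W.shape
    H, W_len = X.shape
    Y = np.zeros((H - h + 1, W_len - w + 1))
    for m in range(Y.shape[0]):
        for n in range(Y.shape[1]):
            Y[m, n] = np.sum(W * X[m:m + h, n:n + w])  # weighted sum of one patch
    return Y

X = np.arange(16.0).reshape(4, 4)
W = np.ones((2, 2)) / 4.0      # a 2x2 averaging weight
print(cross_correlate_2d(X, W))
```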
Convolution is really a smart, simple, and powerful operation. It uses a small set of weights, which is easy to express. I like to think of the weight as a searchlight that focuses on one small patch of \(X\) at a time. While the "light" shines on one image patch, it synthesizes the information in the image data with our values at different spatial locations, by using the weighted-sum operation. The convolved output is a group of such weighted sums aligned with the shape of the input data \(X\). This is my intuition for the strength of convolution. Further advantages of convolution include sparse interactions, parameter sharing, and equivariant representations, which are described in much more detail in Ch 9.2 of the Deep Learning book.
My next notebook will introduce the implementation and a variety of convolutions in CNN. See you later!
This article reflects my current understanding of convolution in CNNs. Please let me know if you identify any errors or have any questions: @weina_jin, or by creating an issue. Thanks for reading 😎
Waste disposal has always plagued the Titanians. Seen from space, Titan is half orange and half white, like two semicircular blocks fitted together. The white half is where the garbage is piled. The waste accumulated over millions of years is not only completely non-degradable, but is also gradually eating away at the Titanians' living space.
What makes up this white matter is mainly the Titanians' excrement. As the overwhelmingly dominant species on the planet, they have stayed at the top of all other organic and inorganic life for millions of Titan years, which has much to do with their unique digestive system: the high-pressure environment of their digestive tract extracts the maximum energy from food, but the dense, hard matter it excretes cannot be broken down or reused by any other organism in the planet's ecosystem.
So dumping it off-planet seemed to be the only way out. The neighboring planets had already been dyed white by Titanian excrement, and they had no choice but to push deeper into space.
Just as the captain was weighing his flight performance metrics against the balance of his monthly household bills, a message arrived from the chief search specialist of the space waste-disposal station.
"Captain, we've finally found a planet, right at the edge of the range our fuel can cover."
"Excellent! Set the coordinates for that planet at once!" The captain's first thought was that he would no longer have to cancel the cable TV service for the second half of the year because of his docked pay.
"But..."
"But what?"
"This planet... seems to show signs of lower life forms... Dumping there would violate the Universal Waste Disposal Code."
"Damn it all! Life, in a desolate place like that!"
"Indeed. Wait... we seem to be detecting large amounts of another gaseous element on this planet, one that undergoes a combustion reaction with our excrement. In other words, it can decompose our excrement completely!"
"Well, there you go. Stop fussing about this code or that standard; who's going to enforce it in a godforsaken place like that? Dump the load and let's go home!"
"Aye, Captain, we'll get right on it..."
That day, above a planet on the desolate edge of the Milky Way, a rare fireball meteor appeared, brighter than the stars even in daytime. In the crater where it fell lay an object the size of a small mountain, glittering in the reflected light of a satellite (moonlight) and lighting up half the sky. It is said that on this planet, also known as Earth, most of the lower life forms called humans witnessed the spectacle; and since the residue of the fireball, called diamond in the local Earth language, is of considerable value among these lower life forms, several robberies and assaults broke out near the impact site, causing no small commotion.
I'm a gardener. After gardening for almost 30 years, I feel my life grow shorter each time the flowers blossom.
It is as if the plants were your timer, counting down your life season by season.
But meanwhile, they also add to the width of my experience, which consists of, but is not limited to, the length of my life.
Why? Because I have seen and cultivated so many plants. Plants have their characters, just like people. Some don't like too much sunshine; some hunger for water and nutrients. Their variety makes my experience prosperous.
And planting will seldom disappoint you. You sow, you harvest; it is just that simple cause and effect. The plants won't betray you if you treat them well.
I'm a bachelor and have no children. Now I'm old and will soon be buried in the earth, like all the seeds I have sown. Some are already dead, but some will live longer than me. My last will is for someone to sow my favourite seeds above my grave. When people pass by, they will say, "What a gorgeous life!"
The development of medicine is indeed one of the showcase achievements of modern civilization. Looking back, humanity has enjoyed the "miracle" of modern medicine for barely a century. Before that, Europeans regarded bloodletting as a cure-all, while the Chinese took mercury as an elixir of immortality. Such treatment seemed to rely mostly on luck: if the physicians of the day did not make the disease worse, a bad outcome could only be blamed on the patient's fate, while a cure was regarded as magic. In a society where science and reason had not yet taken hold, there was no clear boundary between medicine and magic. This is hardly surprising; the origin stories of the sciences are mostly alike: astronomy was born of astrology, chemistry of alchemy, and medicine, in its infancy, was indistinguishable from witchcraft.
Fortunately, that benighted period of history is long past. Our generation, more than the thousands that came before, can feel the achievements of the scientific enlightenment and the progress of medicine, and we all benefit from them to some degree. The progress of modern medicine over the past century lies not only in the continual discovery of new technologies and new drugs, but also in the rejection of irrationality in favor of letting the data speak: whether a drug or therapy actually works is tested against the "gold standard" of the randomized controlled trial, and layers of regulation before and after a drug reaches the market safeguard its safe use.
Yet today, medicine is all the more easily mistaken for magic by some, and this "magic" seems more potent than ever. Publicity campaigns touting claims such as "conquering such-and-such disease", "a cure with one dose", or "one injection, a permanent cure with no relapse" keep reinforcing, in some people, the illusion that medicine is magic. The psychiatrist Thomas Szasz described this phenomenon exquisitely in his book The Second Sin: "Formerly, when religion was strong and science weak, men mistook magic for medicine; now, when science is strong and religion weak, men mistake medicine for magic." The scientism that believes medical progress can eliminate all disease has now replaced religion as a new faith.
Those who recover under attentive treatment feel most keenly the great power of medical progress; but only those long tormented by illness best understand medicine's limits and helplessness. Our understanding of most diseases has improved greatly, yet the diseases that can truly be "cured" can be counted on one's fingers. For most diseases, medicine's diagnostic capability runs far ahead of its therapeutic capability; that is, in many cases a doctor can only offer the patient a definite diagnosis, while the treatment outcome remains unsatisfactory. Even in infectious disease, the field of medicine's most glorious triumphs, smallpox was eradicated only for AIDS to follow on its heels, and antibiotics subdued ordinary bacteria only to breed all manner of multidrug-resistant superbugs, so anti-infective therapy remains a thorny problem for every doctor facing elderly and critically ill patients. Even though medical care has raised average life expectancy, the disease-free span of life is no different from before, which means the added years are years lived with disease; with an aging population and the burden of chronic illness, many people are left in a state of being "not cured, but not dying either".
But must medicine really draw a clear line between itself and the "magic" that once nurtured it? If we strip away the superstition shrouding magic, we can see how magic, through ritual, commanded the patient's spiritual universe and created an atmosphere of psychological support, which is precisely the humanistic care advocated in today's medical practice. After all, however rational and precise medicine becomes, what it studies is not the emotionless physical world but living, ever-changing human beings, a variable that can never be eliminated from clinical trials. Humanistic care in medicine is a bridge between medicine's scientific character and magic. A doctor's comfort and a nurse's attentive care are uncertain factors that cannot be quantified and standardized in science and research, but they are indeed a kind of miraculous "magic", not to be overlooked, in a patient's recovery.
If a person from a future civilization traveled back to the present, he might be astonished at the crudeness of today's surgery, and unable to understand why we still cannot defeat mere cancer. He might show us the medicine of his own era, fully developed through science and reason; to our eyes it might look like the miracles described in the Bible: instantly effective, and comforting to the heart.
Published in Oriental Morning Post: Body Weekly (《东方早报 身体周刊》).

It was not until I became a doctor that I truly grasped the depth of that remark. The hospital really is a concentrated, intensified version of the whole range of human life; after you have spent long enough in one, the everyday joys and sorrows of ordinary life pale beside the arrivals of parting and death.
One morning, I was on my routine rounds when the son of Old Wang in bed 35 came to visit. Every time he came to the hospital, it was either to pay the bill, once the unpaid balance had reached the point where no more medication could be dispensed and it could no longer be put off, or to argue with the doctors over whether some out-of-pocket drug was really necessary. Today he was once again decked out in a suit and tie. He swept past his father's bed at lightning speed, a token gesture to show he had seen the old man, then devoted his main time and energy to talking with the attending doctor, the theme being, as always, a recitation of examples of his filial devotion to his father; every doctor on the ward had heard these words so many times their ears had grown calluses. Then he began to go through the itemized hospital bill entry by entry, precise to the second decimal place. I passed by them and shook my head helplessly.
Of the patients under my care, Little Liu in bed 28 was the sickest. On every round I would spend quite a long time asking about and observing the changes in his condition. He was only 25. Three months earlier, a routine checkup had found his platelet and white-cell counts low, and a bone-marrow aspiration revealed acute myeloid leukemia. Recently his blood-cell counts had dropped frighteningly low, and he had needed transfusion after transfusion over the past several days; a lung infection kept him in high fever that did not respond even to the full arsenal of antibacterial and antifungal drugs, so the bone-marrow transplant we had hoped to carry out as early as possible had to be postponed. Little Liu had a girlfriend who had quit her job to look after him, staying by his side almost every day, managing his diet and taking on much of the nursing work. Because his platelets were low, he could not brush his teeth for fear of bleeding gums and could only clean his mouth with cotton balls; his girlfriend, quick to learn, soon picked up the procedure from the nurses, so that when the nurses were too busy she could give him a few extra rounds of oral care, which also lowered the risk of infection. Beside his laminar-flow bed hung little decorations she had made by hand. Earlier, when his condition was stable, you could often see the two of them strolling side by side in the hospital's small garden at sunset, a scene so tender it made you wonder why illness had to fall on them of all people.
Later, Little Liu's condition stabilized, and the money for the bone-marrow transplant was in place. In the transplant unit, they could only talk to each other through a small pane of glass. Just when we all felt things were moving in the direction we hoped, Little Liu suddenly developed an unrelenting high fever, and this time even the strongest anti-infective drugs could not pull him back. Three days later, Little Liu died of an overwhelming infection.
The last time I saw his girlfriend was in that same garden. "The last time he and I walked here was April 15th, six in the evening. If only time could have frozen at that moment and never moved forward..." My heart clenched; in the face of death, I felt once again how powerless I was.
In the months after Little Liu died, there were nights on call when, woken from sleep, I would walk down the dim corridor, breathing that particular scent of morning mist that only night-shift doctors know, passing one ward after another, half-dreaming of the human dramas that had played out there by day and would play out again. Among those stories, Little Liu and his girlfriend often came to mind: the image of that couple walking slowly, hand in hand, through life has been fixed deep in my memory, together with that remark from pathology class many years ago.
Published in Oriental Morning Post: Body Weekly (《东方早报 身体周刊》).

In such surroundings, it is something of a small miracle that the café has survived for five years. The café, which shares its name with Borges, is not large, so its day-to-day business is handled by the owner and a single employee. Because I went there often, I soon became friends with the owner. He is around forty; he used to be a lawyer, but, weary of dealing with ugly officials, he quit and came to this marginal corner to open an eccentric café: the coffee is mediocre, with only the same few items year round, and the sign is so small as to be nearly invisible, so a first-time visitor can hardly find the place. Unsurprisingly it has never been popular; a whole day might not bring two or three customers, and the cold business kept the café in the red for four or five years, scraping by on subsidies from his family, only recently showing a little improvement. What drew me there was the distinctive atmosphere created by the interior design, the owner's book collection, and the music, unlike those cafés that rely on surface gestures and reek of kitsch in every corner. So only a handful of kindred spirits like me would visit again and again.
It was in this café that I met Old Zhou. I don't know why the owner introduced us. That afternoon I had gotten stuck on a small problem in my experiment and had gone to the café to unwind. As soon as the owner saw me, he called me over to sit down. "Xiao Zheng, you've come at just the right time; let me introduce you two!" He pointed to a bearded man beside him, with whom he had been chatting. "Zhou Kun, an old friend of mine, just arrived from Taiwan." "Zheng Yang, a lecturer in biology at Peking University, a regular here." And so I came to know Old Zhou.
Old Zhou was born and raised in Hualien. He studied architecture in college, but in the year he graduated, while his classmates poured into construction firms, real estate, government offices, and finance, he became a poet. At the class reunion ten years later, the others all had cars and apartments and were living respectable lives, while he was as poor as the day he graduated. After a few rounds of drinks, the former class monitor threw an arm around Old Zhou's shoulder and asked, bewildered, "Old Zhou, I really can't figure you out. You were the most talented man in our class. After all these years, with your brains, finding something proper to do would be no trouble at all. We're all in our thirties now, past the age of dreaming; it's time to think about settling down and building a career. How can you still be dreaming at your age?!" "And how did you answer?" I asked Old Zhou. He took a puff on his pipe, the smoke curling slowly, and said with a smile, "Don't rush; let me finish the story." Another ten years passed, and the class gathered again. Many now had grander titles on their business cards, bigger apartments, and children in better schools, while the Old Zhou they saw was as financially strapped as ten years before. Old Zhou had no titles or labels to present; he only said that over the years he had kept making things that interested him, and that though he had no fame, he had his own bitterness and joy in it. Afterward, one classmate told Old Zhou privately that it was from that reunion on that they realized that, compared with him, their own lives were already over. What would another ten years bring? Just another bigger apartment, a better car, and, if you wanted, a prettier wife; but what essential difference would that make from now?
Over all these years, Old Zhou has written poems, painted, met many people, traveled many places, and published a few small-print-run collections of poetry. He has never held a steady job, and in his late fifties he remains alone. Old Zhou has many hobbies, one of which is chatting with interesting people. "And right now, I am enjoying that hobby," Old Zhou said to me with a smile, in his Taiwanese-accented Mandarin.
But I considered myself a dull man, with an unremarkable life not worth writing about. After graduating from this university, I was swept by the tide of going abroad to the other side of the ocean; after finishing my PhD, like most returnees, I secured a stable teaching post at the university. On that course I was like a helmsman gripping the wheel, carefully avoiding any accident or misstep, afraid of being swept by some unaccountable current into unknown waters. Following the rules like this, I am now in my mid-thirties, married with a child, holding what looks like a respectable, stable job. But inside the vast academic machine I am just a high-grade screw: my research is neither here nor there, my lectures neither hot nor cold; I am no young expert or academic star, no favorite of the leadership, the kind of marginal figure whose disappearance for a month or so no one would notice. Whenever I see those young faces on campus, I feel ever more distinctly that certain something inside me shrinking. Only in this café does that something find temporary shelter.
After that, my visits to the café grew more frequent; often a pot of tea and a pipe accompanied our talks deep into the night. I had never met anyone who had reached such depths of life, who possessed such rich experience; in him I saw a world I had never seen in my life before, as if I had never really lived at all. The deeper my contact with Old Zhou, the more I felt that something inside me stirring. On those nights, the Borges café in the bamboo grove extended its dangerous temptation to me.
But I have never had the courage to break free and to seek. Could I, the person I am now, still do it?
What an ineffable thing it is
It enslaves loneliness, eroding the blankness before sunrise
It reconstructs memory, entangling the sorrow after experience
It is a spiral staircase looping round and round, fastening the souls of past and future onto the present
It holds riddle upon riddle, yet you need not rush to solve them one by one