Neopythonic

Friday, December 11, 2009

While-you-type Searching

Here's an idea that is just begging to be implemented as a Firefox extension.

You know how there's a while-you-type spell checker that's always on when you are editing text in a multi-line text box? There should be a feature that takes the last few words you're typing (or the entire current paragraph, or whatever works best), does a Google search, and presents snippets for the top few results in an unobtrusive pop-up window.
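As a rough illustration of the first step (turning what you are typing into a query), here is a minimal sketch. The function names, the six-word default, and the use of a plain Google search URL are all assumptions made for the example; a real extension would run inside the browser and would need to throttle requests and present the results.

```python
from urllib.parse import urlencode


def query_from_text(text, max_words=6):
    """Return the last `max_words` words of the paragraph being typed.

    Hypothetical helper: "the current paragraph" is taken to be whatever
    follows the last blank line in the text box.
    """
    paragraph = text.rstrip().split("\n\n")[-1]
    words = paragraph.split()
    return " ".join(words[-max_words:])


def search_url(text, max_words=6):
    """Build a search URL for the tail of the text (assumed URL format)."""
    query = query_from_text(text, max_words)
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    draft = "Hello support team.\n\nHow do I reset my router password"
    print(query_from_text(draft, max_words=4))
    print(search_url(draft, max_words=4))
```

Whether the last few words or the whole paragraph makes the better query is exactly the kind of thing that would have to be tuned by experiment.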

Sure, maybe you're thinking "It looks like you're writing a letter. Do you want me to write it for you?" (the Microsoft paperclip). But using web search instead of a fixed set of patterns could actually make this useful. Imagine the number of messages to customer support forums that will never have to be sent because this feature pops up the answer the user was looking for. And so on.

You might also think, "this already exists, it's called Google auto-suggest." But I specifically want it to work when I'm not (yet) actively searching, but just writing. (If it already existed, it might have stopped me from writing this blog post. :-) Twitter might also become a different place if users realized how many others have already entered the same item.

Of course there's a little privacy issue. But still, if this existed, I'd opt in! (In fact, I did half a dozen searches while I was typing this. How much easier it would be if I didn't have to select text, switch to a different tab, paste, and hit enter, losing my writing context each time.)

Thursday, November 5, 2009

Python in the Scientific World

Yesterday I attended a biweekly meeting of an informal UC Berkeley group devoted to Python in science (Py4Science), organized by Fernando Perez. The format (in honor of my visit) was a series of 4-minute lightning talks about various projects using Python in the scientific world (at Berkeley and elsewhere) followed by an hourlong Q&A session. This meant I didn't have to do a presentation and still got to interact with the audience for an hour -- my ideal format.

I was blown away by the wide variety of Python use for scientific work. It looks like Python (with extensions like numpy) is becoming a standard tool for many sciences that need to process large amounts of data, from neuroimaging to astronomy.

Here is a list of the topics presented (though not in the order presented). All of these describe Python software; I've added names and affiliations insofar as I managed to get them. (Thanks to Jarrod Millman for providing me with a complete list.) Most projects are easily found by Googling for them, so I have not included hyperlinks except in some cases where the slides emphasized them. (See also the blog comments.)

Fernando gave an overview of the core Python software used throughout scientific computing: NumPy, Matplotlib, IPython (by Fernando), Mayavi, Sympy (about which more later), Cython, and lots more.

On behalf of Andrew Straw (Caltech), Fernando showed a video of an experimental setup where a firefly is tracked in real time by 8 cameras spewing 100 images per second, using Python software.

Nitime, a time-series analysis tool for neuroimaging, by Ariel Rokem (UCB).

A comparative genomics tool by Brent Pedersen of the Freeling Lab / Plant Biology (UCB).

Copperhead: Data-Parallel Python, by Bryan Catanzaro (working with Armando Fox) and others.

Nipype: Neuroimaging analysis pipeline and interfaces in Python, by Chris Burns (http://nipy.sourceforge.net/nipype/).

SymPy -- a library for symbolic mathematics in Pure Python, by Ondrej Certik (runs on Google App Engine: http://live.sympy.org).

Enthought Python Distribution -- a Python distro with scientific batteries included (some proprietary, many open source), supporting Windows, Mac, Linux and Solaris. (Travis Oliphant and Eric Jones, of Enthought.)

PySKI, by Erin Carson (working with Armando Fox) and others -- a tool for auto-tuning computational kernels on sparse matrices.

Rapid classification of astronomical time-series data, by Josh Bloom, UCB Astronomy Dept. One of the many tools using Python is GroupThink, which lets random people on the web help classify galaxies (more fun than watching porn :-).

The Hubble Space Telescope team in Baltimore has used Python for 10 years. They showed a tool for removing noise generated by cosmic rays from photos of galaxies. The future James Webb Space Telescope will also be using Python. (Perry Greenfield and Michael Droettboom, of STScI.)

A $1B commitment by the Indian government to improve education in India includes a project by Prabhu Ramachandran of the Department of Aerospace Engineering at IIT Bombay for Python in Science and Engineering Education in India (see http://fossee.in/).

Wim Lavrijsen (LBL) presented work on Python usage in High Energy Physics.

William Stein (University of Washington) presented SAGE, a viable free open source alternative to Magma, Maple, Mathematica and Matlab.

All in all, the impression I got was of an incredible wealth of software, written and maintained by dedicated volunteers all over the scientific community.

During the Q&A session, we touched upon the usual topics, like the Python 3 transition, the GIL (there was considerable interest in Antoine Pitrou's newgil work, which unfortunately I could not summarize adequately because I haven't studied it enough yet), Unladen Swallow, and the situation with distutils, setuptools and the future 'distribute' package (for which I unfortunately had to defer to the distutils-sig).

The folks maintaining NumPy have thought about Python 3 a lot, but haven't started planning the work. Like many other projects faced with the Python 3 porting task, they don't have enough people who actually know the code base well enough to embark upon such a project. They do have a plan for arriving at PEP 3118 compliance within the next 6 months.

Since NumPy is at the root of the dependency graph for many of the software packages presented here, getting NumPy ported to Python 3 is pretty important. We briefly discussed a possible way to obtain NumPy support for Python 3 sooner and with less effort: a smaller "core" of NumPy could be ported first, which would give the NumPy maintainers a manageable task, and selecting that smaller "core" would give them the opportunity for a clean-up at the same time. (I presume this would mostly be a selection of subpackages to be ported, not an API-by-API cleanup; the latter would be a bad thing to do simultaneously with a big port.)

After the meeting, Fernando showed me a little about how NumPy is maintained. They have elaborate docstrings that are marked up with a (very light) variant of Sphinx, and they let the user community edit the docstrings through a structured wiki-like setup. Such changes are then presented to the developers for review, and can be incorporated into the code base with minimal effort.

An important aspect of this approach is that the users who edit the docstrings are often scientists who understand the computation being carried out in its scientific context, and who share their knowledge about the code and its background and limitations with other scientists who might be using the same code. This process, together with the facilities in IPython for quickly calling up the docstring for any object, really improves the value of the docstrings for the community. Maybe we could use something like this for the Python standard library; it might be a way that would allow non-programmers to help contribute to the Python project (one of the ideas also mentioned in the diversity discussions).
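To give a flavor of what such a docstring looks like and how it is called up, here is a minimal sketch. The function `moving_average` is a made-up example, not NumPy code; the section headings follow the style NumPy uses for its docstrings. In IPython, typing `moving_average?` shows the same text that `inspect.getdoc` returns here.

```python
import inspect


def moving_average(values, window):
    """Return the simple moving average of a sequence.

    Parameters
    ----------
    values : sequence of float
        Input samples, in order.
    window : int
        Number of consecutive samples to average.

    Returns
    -------
    list of float
        One average per full window, so the result has
        ``len(values) - window + 1`` entries.

    Notes
    -----
    Edge samples are not padded; this is where a scientist might
    document the limitations of the method for other users.
    """
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Plain-Python equivalent of IPython's `moving_average?`
    print(inspect.getdoc(moving_average))
```

The point of the structured sections is that a domain expert can improve the Notes without touching the code underneath.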

Thursday, September 17, 2009

Lovely Python!

I just heard from Bill Xu in China. His book "Lovely Python", an introduction to Python in Chinese, was just published and shot to the top-5 of china-pub.com's bestseller list (at some point it even was #2). I can't read Chinese, but I am very glad that there's a book on Python available for Chinese readers, and that's why I wrote a brief foreword for the book as well. (Also because I am one of the mentors of Zeuux.org.) Links:

This is the Lovely Python page on the book store:

http://www.china-pub.com/195771

This is the bestseller list:

http://www.china-pub.com/rank/?type=59&act=day&v=7

This is the cover (with front and back) of Lovely Python:

http://www.zeuux.org/pub/lovely-python-cover.jpg

This is my preface:

http://www.zeuux.org/science/zeuux-lovely-python-preface-by-guido.html (English)

http://www.zeuux.org/science/zeuux-lovely-python-preface-by-guido.cn.html (Chinese translation by Bill Xu)

Wednesday, July 22, 2009

Scientists Discover That Hidden Persuaders Are Real

In yesterday's post I mentioned reading George Lakoff's book, The Political Mind. While I agree with the politics of the book in almost every instance, I was still disappointed. For one thing, the book "compresses well." (IOW it contains a lot of repetition. A Lot.) It also felt a bit like a classic bait-and-switch: the back flap touts "the science behind how our brains understand politics" but the contents are 90% political rhetoric, and I'm still in doubt about the science.

The author's premise is attractive enough as far as it goes: our brains don't make perfectly rational decisions, but are influenced by "framing"; and the Republicans have used this to their advantage while the Democrats with their belief in "pure reason" have not properly defended themselves by accepting the conservative framing (for example: "tax relief").

Well, there may be some recent scientific research that confirms that most people are not so good at rational decision making, but honestly, I thought that the importance of framing has been well known for a long time to all politicians -- and advertisers as well. As for the recent scientific proof for this commonly-known fact, Jonah Lehrer's book "How We Decide" contains at least as much about the research, and the non-scientific parts of his book are better written and, I expect, more future-proof.

I'm also skeptical of the importance of Lakoff's discovery that frames are represented physically in the brain. That's about as insightful as saying that this blog entry exists physically in Google's computers (as magnetic fluctuations on a hard drive). Has he never heard of abstractions? He seems to argue that all of philosophy needs to be thrown away because it ignores this fact. I will gladly accept that we cannot treat the brain as a perfect mathematical machine, and using the embodiment of the mind will probably eventually help us understand consciousness (more likely than abstract reasoning like Douglas Hofstadter's approach, no matter how much I enjoy his puzzles and paradoxes).

But the important message to me is still about how the brain's software works. It's useful to know that frames are reinforced by trauma and repetition, and that it requires a lot of repetition of counteracting frames to override them once they're there. And yes, that the Bush government used this to its advantage is a great example. But I wanted to know more about the science, and less about the politics.

Lakoff's other point is that human beings are born to have empathy with each other. But he doesn't mention much of the science behind this. That's because in the end he is a linguist, and linguists spend most of their time studying (and arguing about) human language, which was the result of a long evolutionary path and cannot necessarily explain it. And his oft-repeated use of the words America and American in connection to empathy is surely his own little joke, where he's trying to make the reader believe that American values are nurturing values by applying his own theory: say it over and over and the frame will be hard-wired (whether that's literally or figuratively :-) in the reader's mind. As a non-US-citizen I wished the emphasis was on human values, not American values.

Tuesday, July 21, 2009

Progressive vs. Conservative

[Warning: loose thoughts ahead!]

Microsoft's Erik Meijer gave a talk at Google yesterday, and afterwards I had lunch with him. One of his remarks was (I paraphrase) that Microsoft users want to be told what to do, while the Java community is more vocal or argumentative. (He didn't discuss the Python community but in my experience it falls in the latter category.)

Now, while lying sick in bed with a hacking cough, I am reading George Lakoff's "The Political Mind". This book tries to model the distinction between conservative and progressive politics on the differences between two different ideal family models: the strict father (from which most conservative moral virtues flow according to Lakoff), and the nurturing family, from which the progressive moral virtues derive.

The parallel with Microsoft users vs. Java users seems to be all too obvious: Microsoft as the strict father: If you are loyal you will be rewarded, but if you stray you will be punished; whereas in the Java (or Python) community benefits and moral goodness flow from helping each other (which includes sharing open source software, and, apparently, bikeshedding :-).

What about other companies and communities? I can't help thinking of Oracle as the ultimate strict-father company, which makes me worry about the Sun takeover. Are Linus Torvalds and Richard Stallman strict fathers?

Friday, June 26, 2009

IronPython in Action and the Decline of Windows

While CPython still rules on python-dev, especially with the excitement around Py3k, Python's alternative implementations are growing up: PyPy is now capable of running Django, Jython just released version 2.5, and IronPython has been releasing significant milestones like clockwork. I get a lot of satisfaction out of such milestones: they help establish Python as a language you can't ignore, no matter which platform you are using.

Seeing a book like IronPython in Action, by Michael Foord and Christian Muirhead, is another milestone for IronPython. This is a solid work in every aspect, and something nobody using IronPython on .NET should be without. The book is chock full of useful information, presented along with a series of running examples, and covers almost every aspect of IronPython use imaginable.

After reading the table of contents and the introduction by IronPython's creator Jim Hugunin, I couldn't help myself and skipped straight to appendix A, "A whirlwind tour of C#." This is a useful thing to have around for readers like myself who haven't really kept track of things in the .NET world. Maybe I'll comment more on C# another time. For now, let me just say that it seems a decent enough system programming language. The more relevant thing about C# is that you can't avoid learning it if you are developing on .NET, even when using IronPython. There just are too many issues where IronPython has to work around a limitation of C#. This happens often indirectly where a particular API was designed purely with C# in mind. And then there's the issue that Microsoft's API documentation focuses on C#. (And VB.NET, I suppose, which after seeing some samples of in this book I have less desire to know than ever.)

There are some introductory chapters -- some fluff about .NET and the CLR, an introduction to Python, and an introduction to working with .NET objects from IronPython. The Python introduction has a slight emphasis on differences between IronPython and CPython, though there aren't enough to fill a chapter. This is a good thing! The chapter does a pretty good job of teaching Python, assuming you already know programming. In general, the book is aimed solidly at professional software developers: unless you are paid to do it, why would anyone want to get intimate with Windows?

Yes, Windows programming is what this book is really about. I'm sure that doing Windows programming using IronPython is a much better proposition than Windows programming using C++; but it's still Windows programming. Fortunately the authors maintain a slightly ironic attitude about Windows. I can't help admiring their persistency in getting to the bottom of the many mysteries presented by Windows (and in some cases by IronPython's wrappers).

Many, many years ago -- so long ago that I can't even recall when -- I did some Windows programming myself, using Mark Hammond's Win32 extensions for CPython. That package maps the Win32 API pretty directly to Python. It lets you work with Windows in much the same way as you can in IronPython -- the main difference is that IronPython lives in the more modern .NET world, while Win32 is showing its age.

But is life with IronPython all that much better than in CPython+Win32? It still looks incredibly tedious to create the simplest of UIs. Each button in the UI has to be tediously positioned and configured (width, height, padding, font size, etc.). The book maintains a running example throughout many of the chapters, and one of the earlier versions (with many features not yet developed) clicks in at a "mere" 258 lines. Fortunately, the source code for all examples can be downloaded from the book's website. While the downloaded zip file is a whopping 33 megabytes, there's actually only half a megabyte of source code in it (and much of it multiple versions of the same running example) -- the majority of the download is not source code but DLLs that are probably included by the authors because Microsoft scatters them around half a dozen or more different support websites. (Plus, there seem to be multiple copies of IronPython.dll and a few other DLLs included.)

This then, was the big eye-opener for me: that despite all the hype, Windows UI programming is as tedious today as it was in 1995. Sure, the new UI looks a lot better. But that's mostly glitz: 3D effects, color gradients, video, and so on. Sure, it's all object-oriented now. But it hasn't really gotten any less complex to create the simplest of simple UIs. And that's a shame. When is Microsoft going to learn the real lesson about simplicity of HTML? Instead, Microsoft is doing the same thing to HTML that it does to anything it touches: adding cruft to the point where the basic functionality is buried so deeply that most people can't even find it. You can't really blame the average Windows developer for focusing on eye candy instead of usability -- it's all mashed together in the APIs, and by the time you've got something that works at all, you're too exhausted to look at it from your users' perspective.

It's no wonder that users are switching to the web as the platform for everything that used to live on the desktop -- with all its flaws (which I will discuss another time), web development still feels like a breeze compared to Windows development. And that means less time to release, and hence more frequent releases, which in turn means more opportunities for developers to learn what their users actually do. Which as a user I really appreciate.

Tuesday, June 16, 2009

Highs and Lows of IEEE Computer Magazine

I still read a few print publications, including IEEE Computer. Today's issue contained a high and a low:

Today's high point was a detailed history of the Conficker worm. Since we're a Macintosh family, and Google typically has its security stuff in order, I was barely aware of it. The sophistication of the worm's creators is almost admirable. (They probably use Python too. :-) An interesting table in the article included information about which countries contribute the most to the worm's population. China, Brazil and Russia top the list. You could have all sorts of theories on why this would be; personally I'm assuming it's a combination of sheer number of computers plus widespread use of bootlegged copies of Windows.

The low point was an article on "Software Engineering Ethics." Why a low point? Look at this table and think of how many bits of information it contains:

Using postphenomenology for software engineering ethics
Actions	Desirable	Undesirable
Amplify experiences that are	+	-
Reduce experiences that are	-	+
Invite actions that are	+	-
Inhibit actions that are	-	+

Ironically, this pointless table contains a redundant column, while the table I mentioned above was missing a column that would have been useful -- how many PCs are installed in each country. Oh well.
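For the curious, the redundancy is easy to check mechanically. The sketch below transcribes the table into a dictionary (the variable names are mine) and confirms that the second column is just the first one flipped, which leaves at most one binary choice per row.

```python
# Each row of the table: (Desirable, Undesirable)
table = {
    "Amplify experiences that are": ("+", "-"),
    "Reduce experiences that are": ("-", "+"),
    "Invite actions that are": ("+", "-"),
    "Inhibit actions that are": ("-", "+"),
}

flip = {"+": "-", "-": "+"}

# The "Undesirable" column is redundant if it is always the flipped
# "Desirable" column.
second_column_redundant = all(
    flip[desirable] == undesirable
    for desirable, undesirable in table.values()
)

# With one column dropped, each row carries at most one bit.
max_bits = len(table)

if __name__ == "__main__":
    print("second column redundant:", second_column_redundant)
    print("upper bound on information (bits):", max_bits)
```

That is an upper bound; since the rows come in obvious opposite pairs, the real content is arguably smaller still.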

PS: Googling for "postphenomenology" gives this as the title of the first hit: "If phenomenology is an albatross, is postphenomenology possible?" The web knows best.