Saturday, January 17, 2009

Declarative vs imperative: Math, Algoritms, the universe and everything

 The question of imperative versus pure declarative coding has brought to my mind some may be off-topic speculations.  (so please don´t read it if you have no time to waste):  I´m interested in the misterious relation between mathematics, algoritms and reality (see this for example).  Functional programming is declarative, you can model the entire world functionally with no concern for the order of calculations. The world is mathematical. The laws of physics have no concern for sequencing.

But CPUs and communications are basically sequential.  Life is sequential, and programs run along the time coordinate. Whenever you have to run a program,  you or your compiler must sequence it.  The sequentiation  must be done by you or your compiler or both. The functional declarative code can be sequenced on-the-run by the compiler in the CPU by the runtime. but IO is different. 

You have to create, explicitly or implicitly the sequence of IO actions because the external events in the external world  are not controlled by you or the compiler. So you, the programmer are the responsible of sequencing effects in coordination with the external world. so every language must give you ways to express  sequencing of actions. that is why, to interact with the external world you must think in terms of algorithms, that is , imperatively, no matter if you make the imperative-sequence  (relatively) explicit with monads or if you make it trough pairs (state, value) or unsafePerformIO or whatever. You have to think imperatively either way, because yo HAVE TO construct a sequence. I think that the elegant option is to recognize this different algorithmic nature of IO by using the IO monad.

In other terms, the appearance of input-output in a problem means that you modelize just a part of the whole system. the interaction with the part of the system that the program do not control appears as input output. if the program includes the model of the environment that give the input output (including perhaps a model of yourself), then, all the code may be side effects free and unobservable. Thus, input output is a measure of the lack of  a priori information.  Because this information is given and demanded at precise points in the time dimension with a quantitative (real time) or ordered (sequential) measure, then these impure considerations must be taken into account in the program. However, the above is nonsensical, because if you know everithing a priory, then you don´t have a problem, so you don´t need a program. Because problem solving is to cope with unknow data that appears AFTER the program has been created, in oder to produce output at the corrrect time, then the program must have timing on it. has an algoritmic nature, is not pure mathemathical. This applies indeed to all living beings, that respond to the environment, and this creates the notion of time.

Concerning monadic code with no side effects, In mathematical terms,  sequenced (monadic) code are mathematically different from declarative code: A  function describes what in mathematics is called a  "manifold" with a number of dimensions equal to the number of parameters.  In the other side, a sequence describe a particular  trayectory in a manifold, a ordered set of points in a wider manyfold surface. For this reason the latter must be described algorithmically. The former can be said that include all possible trajectories, and can be expressed declaratively. The latter is a sequence  . You can use the former to construct the later, but you must   express the sequence because you are defining the concrete trajectory in the general manifold that solve your concrete problem, not other infinite sets of related problems. This essentially applies also to IO.

 Well this does not imply that you must use monads for it. For example, a way to express a sequence is a list where each element is a function of the previous.  The complier is forced to sequence it in the way you want, but this happens also with monad evaluation.  

This can be exemplified with the laws of Newton: they are declarative. Like any phisical formula. has no concern for sequencing. But when you need to simulate the behaviour of a ballistic weapon, you must use a sequence of instructions( that include the newton laws). (well, in this case the trajectory is continuous integrable and can be expressed in a single function. In this case, the manifold includes a unique trajectory, but  this is not the case in ordinary discrete problems,) . So any language need declarative as well as imperative elements to program mathematical models as well as algorithms.

Sunday, January 11, 2009

About programming and design

As a physicist, I think that programming, like any design in general, is all about making as little use of brain resources as possible at the time of solving problems and to transmit the solution to others. This is the reason why it is pervasive in all kinds of engineering the concepts of modularity, isolation, clean interfacing (for which referential transparency is part of) etc. To decompose a problem in parts each one with simple solutions and with as little uncontrolled interaction with other parts is a rule of good design simply because we can not think in nothing but around seven concepts at the same time (By the way, an evolutionary psychologist would say that the beatifulness of simplicity is related with this release of brain resources). As a matter of fact, these rules of "good" design do not apply to the design of living beings simply because the process of Natural Selection has not these limitations. that is indeed the reason because natural designs are so difficult for reverse engineering. 
Because such kid of human brain limitations and our lack of knowledge, design has a lot of trial and error. The danger of this is to get lost in the explosion of unfruitful alternatives due to low level issues outside of our high level problem, because the limitation of the tools we are using for it. In this sense, things like the strong type inference is superb for cutting the explosion of erroneous paths that the process of software development can generate. If designing solutions is sculpting order from chaos and against chaos, intelligent tools are the thing needed to keep us concentrated in fruitful courses of action. A physicist would say that the meaning of the engineering activity is to lower the entropic balance of the future by progressively lowering the number of possible states until the only ones permitted correspond with the desired outcomes, called "solutions" and a few bugs, of course.
For me, syntactic sugar is one more of this features that make haskell so great. Once we discover that a solution general enough has a correspondence with something already know, such are the relation of monads with imperative languages, then, why not make this similarity explicit ,with the do notation, in order to communicate it better with other people making use of this common knowledge?. 
I have to say also that, without Haskell, I never dream to have the confidence to play simultaneously with concurrence, transactions, internet communications, parsing and, at the time, keeping the code clean enough to understand it after a month of inactivity. This is for me the big picture that matters for real programming.

Friday, January 09, 2009

A new version of TCache (0.5.5)

has been uploaded to hackage. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/TCache The main addition of this versiĆ³n is the capablity to safely handle transact, and serialize to permanent storage many datatypes simultaneously in the same piece of code and incrementally. Just register each new datatype (with registerType :: ). So it is not necessary to glue all types in advance in a single algebraic datatype. I suppose taht "enhanced composablility" applies to this feature. In this release: Added a Data.TCache.Dynamic. (SEE dynamicsample.hs) - Can handle, transact, and serialize to disk many datatypes simultaneously and incrementally - Dynamic uses the same interface than TCache and add *DResource(s) calls for handling many datatypes simultaneously - Safe dynamic data handling trough a lighter, indexable and serializable version of Data.Dynamic - Added KEY object for retrieving any object of any type. Data.Tcache is a transactional cache with configurable persistence. It tries to simulate Hibernate for Java or Rails for Ruby. The main difference is that transactions are done in memory trough STM. There are transactional cache implementations for some J2EE servers like JBOSS. TCache uses STM. It can atomically apply a function to a list of cached objects. The resulting objects go back to the cache (withResources). It also can retrieve these objects (getResources). Persistence can be syncronous (syncCache) or asyncronous, wtih configurable time between cache writes and configurable cache clearance strategy. the size of the cache can be configured too . All of this can be done trough clearSyncCacheProc. Even the TVar variables can be accessed directly (getTVar) to acceess all the semantic of atomic blocks while maintaining the persistence of the TVar updates. Persistence can be defined for each object: Each object must have a defined key, a default filename path (if applicable). Persistence is pre-defined in files, but the readResource writeResource and delResource methods can be redefined to persist in databases or whatever. There are Samples in the package that explain the main features.