Tuesday, December 08, 2009

The Transient monad

Concerning Stable Names http://www.haskell.org/ghc/docs/6.10.4/html/libraries/base/System-Mem-StableName.html
I did not test fully my proposal, and I´m thinking aloud, Just to inspire others and fish some ideas: look at the type:
makeStableName :: a -> IO (StableName a) The IO in makeStableName suggest more side effects than the call really do. But still it isn't pure. For calls such are makeStableName that gives a different result the FIRST time they are called but return the same result every time in the same session, I suggest to use a Transient monad: makeStableName :: a -> Transient (StableName a) The key here is to maintain the programmer aware that it is not pure, but there are no IO and that the results have no meaning from session to session. Instance Monad Transient where Transient x ↠ f = f x return x = Transient x We can Transient`ize IO calls by means of an implicit memoization: liftT:: IO a -> Transient a liftT= liftT2=.... liftT3=.... Memorization then is embedded in the monad transformation.
This may be more elegant than IO plus unsafePerformIO and is more informative for the programmer. Instead of unsafePerformIO, we can use: unsafePurifyTransient :: Transient a -> a unsafePurifyTransient (Transient x) = x for the inherently transient calls
The transition form IO to pure can be done in two steps trough Transient: A safer version of unsafePerformIO using implicit memoization could be:
unsafePerformIOT :: IO a -> a unsafePerformIOT = unsafePurifyTransient . liftT unsafePerformIOT guarantee that it returns the same value in the same session
Twan van Laarhoven objected that makeStableName return different values before and after evaluation of an expression. True, makeStableName does not return the same value ever. This is the second time that I forget that. Still I think that Transient or something similar is worth the pain for making a less unsafe transition from IO to pure trough memoization, That was my goal.
For this reason, I want something more narrow than IO to make the programmer aware of the higuer level of safety, rather than something more general.
Stable names are good for memoization however, since memoization of IO procedures can be implemente by hashing their stable names. I think that it is possible to force the evaluation of an expression, then it is possible to have a Transient version of makeStableName.
Transient means that something has been cached and fixed. This can be generalized also for temporary valid computations:
data Time= Session | Time Integer
data Transient a= Trans (Time, a)
instance Monad Transient where
Trans( t,x) ↠ f =
let Trans (t2, y) = f x
in Trans ((min1 t t2) , y)
return x = Trans (Session, x)
min1 :: Time → Time → Time
min1 Session x= x
min1 x Session= x
min1 (Time x) (Time y)=Time $ min x y
This monad will calculate the time that the result remain valid given the time validity of each computation involved.
Well, more fine calculation is needed since the computation takes time too. Perhaps the bind operation can abort the compuitation and send an error when the time is zero. this can be catched to restart the computation again or whatever else.

Saturday, November 21, 2009

Workflow: an Haskell package for transparent support of interruptible computations

A month ago i uploaded the package Workflow to hackage, the repository of Haskell libraries. I added an hopefully self suficient documentations and examples.
Here follows a few remarks, a simple example and a more sophsticated example:

The main features are:

  • Transparent state logging trough a monad transformer: step :: m a -> Workflow m a.
  • Resume the computation state after an accidental o planned program shutdown (restartWorkflows).
  • Event handling (waithFor, waitForData).
  • Monitoring of workflows with state change display and other auxiliary features.
  • Communications with other processes including other workflows trough persistent data objects, inspecttion of intermediate workflow results , Queues so that no data is lost due to shutdowns

Here is a complete example:

This is a counter that shows a sequence of numbers, one a second:

module Main where 
import Control.Concurrent(threadDelay) 
import System.IO (hFlush,stdout)  
count n= putStr (show n ++  ) >> hFlush stdout >> threadDelay 1000000 >> count (n+1) 
main= count 0

This is the same program, with the added feature of remembering the last count after interrupted:

module Main where 
import Control.Workflow 
import Control.Concurrent(threadDelay) 
import System.IO (hFlush,stdout)  
mcount n= step $  putStr (show n ++  ) >> hFlush stdout >> threadDelay 1000000 >> mcount (n+1)  
main= do    
   registerType :: IO ()    
   registerType :: IO Int    
   let start= 0 :: Int    
   startWF  count  start   [(count, mcount)] :: IO ()

This is the execution log:

Worflow-0.5.5demos>runghc sequence.hs 
0 1 2 3 4 5 6 7 sequence.hs: win32ConsoleHandler sequence.hs: sequence.hs: interrupted 
Worflow-0.5.5demos>runghc sequence.hs 
7 8 9 10 11 ....
Here is the complete documentation iof the package
And here is a more sophisticated example, at pastebin.org. The code is also documented.

Tuesday, September 29, 2009

The future of Haskell

Most successful languages spread because they are part of a platform which solves an IT problem. C was part of Unix, both brougth CPU independence when this was necessary. Java is part of the Java platform, that brougth OS independence and interoperability at the right time. Download-execution on the client was also a reason for the initial success of Java in the Internet era. Javascript is part of the web browser. The .NET languages are part of NET. Rubi and Pyton came with libraries targeted to Rapid development of Internet applications.

What is the vehicle that haskell can use to enter the mainstream?. I think that the mere interest of the ideas in the language is not enough. Many people will play with Haskell in the spare time, and many of them will be permitted to develop some non critical applications at work. But that is all. Java was not designed for the Internet but it was re-targeted to it because some needed features where already implemented in Java. Maybe something like that will happen to Haskell.

I think that all the current niches are filled, but new niches are coming. specially with higher level programming that is made on top of current software infrastructure such are BPM, workflows, more flexible scientific applicatins, creation of models in business intelligence, as part of ERPs,.Data mining too. And higuer levels of netwrok communications( for example, Google Wave robots) etc.

About the last point, sometimes a basically identical infrastructure is re-engineered to a higher level, and a new language takes over. For example, the architecture of many Internet applications in the 80s was client-server based, where C, C++ was the king. This was substituted by the web architecture with Java because Java was involved in the gradual change by filling the holes of the new architecture. It could be that in a few years, instead of Web sites people could develop interoperable gadgets for aggregators such are netvibes or IGoogle or, even more radical, robots and gadgets in google Wave. Anyway, for sure, people will think and develop at a higher level.

Financial applications are an example of higher level programming where tasks usually performed by humans are now automatized and there is no or few traditions about that. The need to think at a higher level without being worried by side effects and other details are specially needed in such kind of areas. That's where haskell could have its own niche.


Saturday, January 17, 2009

Declarative vs imperative: Math, Algoritms, the universe and everything

 The question of imperative versus pure declarative coding has brought to my mind some may be off-topic speculations.  (so please don´t read it if you have no time to waste):  I´m interested in the misterious relation between mathematics, algoritms and reality (see this for example).  Functional programming is declarative, you can model the entire world functionally with no concern for the order of calculations. The world is mathematical. The laws of physics have no concern for sequencing.

But CPUs and communications are basically sequential.  Life is sequential, and programs run along the time coordinate. Whenever you have to run a program,  you or your compiler must sequence it.  The sequentiation  must be done by you or your compiler or both. The functional declarative code can be sequenced on-the-run by the compiler in the CPU by the runtime. but IO is different. 

You have to create, explicitly or implicitly the sequence of IO actions because the external events in the external world  are not controlled by you or the compiler. So you, the programmer are the responsible of sequencing effects in coordination with the external world. so every language must give you ways to express  sequencing of actions. that is why, to interact with the external world you must think in terms of algorithms, that is , imperatively, no matter if you make the imperative-sequence  (relatively) explicit with monads or if you make it trough pairs (state, value) or unsafePerformIO or whatever. You have to think imperatively either way, because yo HAVE TO construct a sequence. I think that the elegant option is to recognize this different algorithmic nature of IO by using the IO monad.

In other terms, the appearance of input-output in a problem means that you modelize just a part of the whole system. the interaction with the part of the system that the program do not control appears as input output. if the program includes the model of the environment that give the input output (including perhaps a model of yourself), then, all the code may be side effects free and unobservable. Thus, input output is a measure of the lack of  a priori information.  Because this information is given and demanded at precise points in the time dimension with a quantitative (real time) or ordered (sequential) measure, then these impure considerations must be taken into account in the program. However, the above is nonsensical, because if you know everithing a priory, then you don´t have a problem, so you don´t need a program. Because problem solving is to cope with unknow data that appears AFTER the program has been created, in oder to produce output at the corrrect time, then the program must have timing on it. has an algoritmic nature, is not pure mathemathical. This applies indeed to all living beings, that respond to the environment, and this creates the notion of time.

Concerning monadic code with no side effects, In mathematical terms,  sequenced (monadic) code are mathematically different from declarative code: A  function describes what in mathematics is called a  "manifold" with a number of dimensions equal to the number of parameters.  In the other side, a sequence describe a particular  trayectory in a manifold, a ordered set of points in a wider manyfold surface. For this reason the latter must be described algorithmically. The former can be said that include all possible trajectories, and can be expressed declaratively. The latter is a sequence  . You can use the former to construct the later, but you must   express the sequence because you are defining the concrete trajectory in the general manifold that solve your concrete problem, not other infinite sets of related problems. This essentially applies also to IO.

 Well this does not imply that you must use monads for it. For example, a way to express a sequence is a list where each element is a function of the previous.  The complier is forced to sequence it in the way you want, but this happens also with monad evaluation.  

This can be exemplified with the laws of Newton: they are declarative. Like any phisical formula. has no concern for sequencing. But when you need to simulate the behaviour of a ballistic weapon, you must use a sequence of instructions( that include the newton laws). (well, in this case the trajectory is continuous integrable and can be expressed in a single function. In this case, the manifold includes a unique trajectory, but  this is not the case in ordinary discrete problems,) . So any language need declarative as well as imperative elements to program mathematical models as well as algorithms.

Sunday, January 11, 2009

About programming and design

As a physicist, I think that programming, like any design in general, is all about making as little use of brain resources as possible at the time of solving problems and to transmit the solution to others. This is the reason why it is pervasive in all kinds of engineering the concepts of modularity, isolation, clean interfacing (for which referential transparency is part of) etc. To decompose a problem in parts each one with simple solutions and with as little uncontrolled interaction with other parts is a rule of good design simply because we can not think in nothing but around seven concepts at the same time (By the way, an evolutionary psychologist would say that the beatifulness of simplicity is related with this release of brain resources). As a matter of fact, these rules of "good" design do not apply to the design of living beings simply because the process of Natural Selection has not these limitations. that is indeed the reason because natural designs are so difficult for reverse engineering. 
Because such kid of human brain limitations and our lack of knowledge, design has a lot of trial and error. The danger of this is to get lost in the explosion of unfruitful alternatives due to low level issues outside of our high level problem, because the limitation of the tools we are using for it. In this sense, things like the strong type inference is superb for cutting the explosion of erroneous paths that the process of software development can generate. If designing solutions is sculpting order from chaos and against chaos, intelligent tools are the thing needed to keep us concentrated in fruitful courses of action. A physicist would say that the meaning of the engineering activity is to lower the entropic balance of the future by progressively lowering the number of possible states until the only ones permitted correspond with the desired outcomes, called "solutions" and a few bugs, of course.
For me, syntactic sugar is one more of this features that make haskell so great. Once we discover that a solution general enough has a correspondence with something already know, such are the relation of monads with imperative languages, then, why not make this similarity explicit ,with the do notation, in order to communicate it better with other people making use of this common knowledge?. 
I have to say also that, without Haskell, I never dream to have the confidence to play simultaneously with concurrence, transactions, internet communications, parsing and, at the time, keeping the code clean enough to understand it after a month of inactivity. This is for me the big picture that matters for real programming.

Friday, January 09, 2009

A new version of TCache (0.5.5)

has been uploaded to hackage. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/TCache The main addition of this versión is the capablity to safely handle transact, and serialize to permanent storage many datatypes simultaneously in the same piece of code and incrementally. Just register each new datatype (with registerType :: ). So it is not necessary to glue all types in advance in a single algebraic datatype. I suppose taht "enhanced composablility" applies to this feature. In this release: Added a Data.TCache.Dynamic. (SEE dynamicsample.hs) - Can handle, transact, and serialize to disk many datatypes simultaneously and incrementally - Dynamic uses the same interface than TCache and add *DResource(s) calls for handling many datatypes simultaneously - Safe dynamic data handling trough a lighter, indexable and serializable version of Data.Dynamic - Added KEY object for retrieving any object of any type. Data.Tcache is a transactional cache with configurable persistence. It tries to simulate Hibernate for Java or Rails for Ruby. The main difference is that transactions are done in memory trough STM. There are transactional cache implementations for some J2EE servers like JBOSS. TCache uses STM. It can atomically apply a function to a list of cached objects. The resulting objects go back to the cache (withResources). It also can retrieve these objects (getResources). Persistence can be syncronous (syncCache) or asyncronous, wtih configurable time between cache writes and configurable cache clearance strategy. the size of the cache can be configured too . All of this can be done trough clearSyncCacheProc. Even the TVar variables can be accessed directly (getTVar) to acceess all the semantic of atomic blocks while maintaining the persistence of the TVar updates. Persistence can be defined for each object: Each object must have a defined key, a default filename path (if applicable). Persistence is pre-defined in files, but the readResource writeResource and delResource methods can be redefined to persist in databases or whatever. There are Samples in the package that explain the main features.