On Code etc.


The Haskell/Snap ecosystem is as productive as (or more productive than) Ruby/Rails.

This may be controversial, and all of the usual disclaimers apply - this is based on my own experience using both languages/frameworks to do real work on real projects. Your mileage may vary. Because this is something that has the potential to spiral into vague comparisons, I am going to try to compare points directly, based on things that I’ve experienced. I am not going to say “I like Haskell better” or anything like that, because the point of this is not so much to convince people of the merits of the languages involved as to point out that I’ve found them to be equally productive (or that Snap feels more so). For Haskell programmers, especially those who usually work as Rails developers, this could be a nudge to try out the web tools that are available to you.

As a note - some of this could also apply to other Haskell web frameworks (in particular, most of this pertains to Happstack, and some pertains to Yesod), but since Snap is what I use, I want to keep it based on my own personal experience.

1. The number one productivity improvement is a smart, strong type system. This is less of an issue for small projects, but as soon as you have at least a few thousand lines of code, adding new features or refactoring inevitably involves changes to multiple parts of the codebase. Having a compiler that tells you every place you need to change is an amazing productivity booster. This can be approximated to some extent with good test coverage, but that is really a different beast - tests often need to be changed as well, and if you aren’t very careful it is easy to change them in ways that no longer catch new bugs. Additionally, it is hard (or very tedious, if you do it wrong) to achieve high enough coverage to actually catch all of the bugs introduced in refactoring. Compare that to a compiler, which is completely automated and always aware of all of your code and the ways it interacts (at least to the extent that you actually use the type system - but if you are a good Haskell programmer, you will).
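A toy illustration of that point (hypothetical types, not from a real project): with GHC's incomplete-pattern check promoted to an error, adding a constructor to a sum type makes every function that pattern-matches on it fail to compile until the new case is handled, so the compiler produces the to-do list for the refactoring.

```haskell
{-# OPTIONS_GHC -Werror=incomplete-patterns #-}
-- Hypothetical domain type for how an order is paid.
data Payment
  = Card String   -- last four digits of the card
  | Invoice Int   -- days until the invoice is due
  deriving (Show, Eq)

-- Every function that inspects a Payment must handle every constructor.
-- If a new constructor (say, `Voucher String`) is added to Payment,
-- GHC rejects this module until both functions below handle it.
describe :: Payment -> String
describe (Card digits) = "card ending in " ++ digits
describe (Invoice days) = "invoice due in " ++ show days ++ " days"

isImmediate :: Payment -> Bool
isImmediate p = case p of
  Card _ -> True
  Invoice _ -> False
```

The same holds for record fields and function arguments: change a type, and every use site that no longer fits is reported at compile time rather than discovered by a test (or a user).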

This alone wouldn’t be enough to suggest using Haskell/Snap over Ruby/Rails, as a type system isn’t worth much without supporting libraries, but as I switch between the ecosystems, this is the place where I notice the most drastic improvements in productivity, so I put it first.

2. Form libraries. There are many different libraries for dealing with forms in Rails, in addition to the built-in one. The general idea is that you define validations on your models, and then use the DSL from a form library to define forms, which run those validations. In Haskell, the best form library (in my opinion) is Digestive-Functors (thanks Jasper!), and the productivity difference is staggering in more complex use cases. In the sort of vanilla examples that Rails shows off, the validation system works quite well, and dynamic introspection allows you to write really short forms. This begins to break down when you have forms that don’t correspond in a simple way to models. I have forms that are sometimes a mix of two models, or forms that are a partial view into a data structure, or any number of other variations.

With Digestive-Functors, I can define the forms that I need and reuse components between multiple forms (forms are composable), and the validations are on the form, not on the underlying model. Database-level data integrity checks are obviously useful, but I find that having them be the main / only way of doing validations is really limiting, because sometimes there are special cases where you want the validation done one way and other times another.

More generally, the business logic of a specific form may have requirements that do not always have to hold for the datastore, and thus should not reside in the integrity checks. Having written a lot of forms (who hasn’t?), I find that getting the first form out is much faster with Rails, but inevitably, when I need to change something, it starts to become difficult fast. Every time, I picture an exponential curve - sure, it starts out really small, but it gets really big really fast! It isn’t that I run into things that are impossible with Rails, but they end up being more difficult and more error-prone, and generally reduce my productivity. With Digestive-Functors, I spend a little more time building the forms in the beginning, but I’ve never had requirements for a form that weren’t easily implemented (almost without thinking).
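To make the composability point concrete, here is a toy, self-contained sketch of the idea. It is deliberately not Digestive-Functors' actual API, and every name in it is hypothetical: a form is a function from submitted fields to either a list of errors or a value, forms combine applicatively (collecting all errors), and validations are attached to form fields rather than to models.

```haskell
-- A toy form: read named fields from the submitted input and validate
-- them, collecting every error instead of stopping at the first one.
newtype Form a = Form { runForm :: [(String, String)] -> Either [String] a }

instance Functor Form where
  fmap f (Form g) = Form (fmap f . g)

instance Applicative Form where
  pure x = Form (const (Right x))
  Form ff <*> Form fx = Form $ \input ->
    case (ff input, fx input) of
      (Right f, Right x) -> Right (f x)
      (Left e1, Left e2) -> Left (e1 ++ e2)
      (Left e1, _) -> Left e1
      (_, Left e2) -> Left e2

-- One text field with a validation that belongs to the form, not the model.
field :: String -> (String -> Either String a) -> Form a
field name validate = Form $ \input ->
  case lookup name input of
    Nothing -> Left [name ++ ": missing"]
    Just raw -> either (\msg -> Left [name ++ ": " ++ msg]) Right (validate raw)

nonEmpty :: String -> Either String String
nonEmpty "" = Left "must not be empty"
nonEmpty s = Right s

positive :: String -> Either String Int
positive s = case reads s of
  [(n, "")] | n > 0 -> Right n
  _ -> Left "must be a positive number"

-- Two hypothetical models, each with a reusable sub-form...
newtype User = User { userName :: String } deriving (Show, Eq)
newtype Order = Order { quantity :: Int } deriving (Show, Eq)

userForm :: Form User
userForm = User <$> field "name" nonEmpty

orderForm :: Form Order
orderForm = Order <$> field "quantity" positive

-- ...combined into a single form that spans both models.
signupAndOrder :: Form (User, Order)
signupAndOrder = (,) <$> userForm <*> orderForm
```

Because `userForm` and `orderForm` are ordinary values, a form over part of a model, or a different validation for the same field, is just another small definition.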

3. Routing is the next big one. This may be more a matter of opinion than the previous ones, but I have always thought that great care should go into designing the URL structure of a site. In this sense, I disagree with the idea of universally using REST - I think it is very useful when writing APIs, but when designing applications for people, I believe the URLs should be meaningful to people, not to machines. Usually, right after modeling the data of an application, I make a site map - a high-level view of what the site should look like. With Rails, I then spend time thinking about how to adapt what I want to the REST paradigm, and usually end up with an incomplete or counterintuitive representation of it.

More broadly, I think the idea of hierarchical routing is brilliant - the idea that you match routes piece by piece. This allows you to easily abstract out work that should be done for many different related requests. In Rails, this is approximated by :before_filters (i.e., in a controller for a specific model, you might fetch the item from the id for many different handlers), but it is a poor substitute. For example, I often have an “/admin” hierarchy, and to restrict it, all I have to do is have one place (the adminRouter or something) that does the required work to ensure only administrators can access it; it can also fetch any data that is needed, and then hand control back to the route parsing. Or if I want to do Rails-style pre-fetching, I design the routes as “/item/id/action” and have a handler that matches “/item/id”, fetches the item, and then matches against the various actions. Nested pieces of data are just as easy. I could have “/item/id/something/add”, which adds a new “something” to the item with id “id”. This would all be in the same hierarchy, so the code to fetch the item would still exist only once.
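A minimal model of that “/item/id/action” hierarchy, in plain Haskell. This is a toy, not Snap's API (Snap's real combinators such as dir and pass run in the Snap monad), and `Route`, `capture`, `end`, and the item store are hypothetical names: a route consumes path segments and either passes (Nothing) or produces a response, and the item is looked up once for every action beneath it.

```haskell
-- A toy router: consume path segments, then pass (Nothing) or respond.
newtype Route a = Route { runRoute :: [String] -> Maybe a }

-- Decline to handle the request, letting the next route try.
pass :: Route a
pass = Route (const Nothing)

-- Try the first route; if it passes, try the second on the same path.
orElse :: Route a -> Route a -> Route a
orElse (Route f) (Route g) = Route $ \path -> maybe (g path) Just (f path)

-- Match a fixed leading segment, then continue with the rest of the path.
dir :: String -> Route a -> Route a
dir seg (Route k) = Route $ \path -> case path of
  (s : rest) | s == seg -> k rest
  _ -> Nothing

-- Capture one segment and hand it to the next stage.
capture :: (String -> Route a) -> Route a
capture k = Route $ \path -> case path of
  (s : rest) -> runRoute (k s) rest
  [] -> Nothing

-- Succeed only when the whole path has been consumed.
end :: a -> Route a
end x = Route $ \path -> if null path then Just x else Nothing

-- Hypothetical data store: item id -> item name.
items :: [(String, String)]
items = [("1", "lamp"), ("2", "desk")]

-- Everything under /item/<id> shares one lookup; each action below
-- receives the fetched item instead of fetching it again.
itemRoutes :: Route String
itemRoutes = dir "item" $ capture $ \itemId ->
  case lookup itemId items of
    Nothing -> pass
    Just item ->
      dir "show" (end ("showing " ++ item))
        `orElse` dir "edit" (end ("editing " ++ item))
        `orElse` dir "something" (dir "add" (end ("adding a something to " ++ item)))
```

An “/admin” hierarchy has the same shape: one `dir "admin"` wrapped around a check that passes for anyone who is not an administrator.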

Not only is this very natural to program, it keeps the flow easy to follow when you look back at it, and it allows backtracking in a great way: if, in a handler, you reach something that indicates the request cannot be matched (say the path was “/item/id” but the id did not correspond to an actual item), you can simply “pass”, and the route parser continues looking for something that will handle the request. If it finds nothing, it returns a 404.

An example of how you could exploit this in a really clean way: if you are building a wiki-like site, you first have a route that matches “/page/name” and looks up the page with name “name”. If it doesn’t find it, it passes, and the next handler can be the “new page” handler, which prompts the user to create the page. As with everything else, I’m not saying this cannot be done with Rails, simply that it is much more natural and easy to understand with Snap (and Happstack, where this routing system originated, at least in the Haskell world).
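The wiki example boils down to handlers tried in order, where “not found” means “pass”. A minimal sketch in plain Haskell, assuming a handler can be modelled as a function from path segments to `Maybe` a response (Snap's real handlers run in the Snap monad and call pass; `showPage`, `newPage`, and the page store are hypothetical):

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

type Handler = [String] -> Maybe String

-- Hypothetical page store: page name -> contents.
pages :: [(String, String)]
pages = [("home", "Welcome!")]

-- "/page/<name>": show the page if it exists; otherwise pass (Nothing).
showPage :: Handler
showPage ["page", name] = ("Page: " ++) <$> lookup name pages
showPage _ = Nothing

-- Only reached when showPage passed: offer to create the page.
newPage :: Handler
newPage ["page", name] = Just ("No page called " ++ name ++ " yet. Create it?")
newPage _ = Nothing

-- Try each handler in order; if every one passes, respond with a 404.
serve :: [Handler] -> [String] -> String
serve handlers path =
  fromMaybe "404 Not Found" (foldr (\h acc -> h path <|> acc) Nothing handlers)

site :: [String] -> String
site = serve [showPage, newPage]
```

Note that `showPage` never needs to know that a “new page” handler exists; the ordering in `site` is the only place the two are connected.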

4. Quality of external libraries. Point 2 was a special case of this, since forms come up so much, but I think the general quality of libraries in Haskell is superb. One example I came up against was parsing some semi-free-form CSV data into dates and times. Haskell has the very mature parsing library Parsec (which has been ported to many languages, including Ruby), and it makes writing parsers really easy. I ported an ad-hoc parser to it, and found that not only was I able to write the code in a fraction of the time, but the result was a lot more robust and easier to understand.
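Parsec is not part of base, so as a stand-in with the same combinator style, here is a sketch using base's `Text.ParserCombinators.ReadP`. The accepted formats (separators `-`, `/`, or `.`, and an optional `HH:MM` time after a space or `T`) are assumptions for illustration, not the CSV format described above. In Parsec the same parser would be built from similar pieces (`many1 digit`, `oneOf "-/."`, `option`, `eof`).

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical representation of a parsed date and time.
data Stamp = Stamp { year, month, day, hour, minute :: Int }
  deriving (Show, Eq)

digits :: ReadP Int
digits = read <$> munch1 isDigit

-- Any of the separators people tend to type between date parts.
sep :: ReadP Char
sep = satisfy (`elem` "-/.")

date :: ReadP (Int, Int, Int)
date = do
  y <- digits
  _ <- sep
  m <- digits
  _ <- sep
  d <- digits
  return (y, m, d)

-- Optional time after a space or 'T'; defaults to midnight.
time :: ReadP (Int, Int)
time = option (0, 0) $ do
  _ <- satisfy (`elem` " T")
  skipSpaces
  h <- digits
  _ <- char ':'
  mi <- digits
  return (h, mi)

stamp :: ReadP Stamp
stamp = do
  skipSpaces
  (y, m, d) <- date
  (h, mi) <- time
  skipSpaces
  eof
  return (Stamp y m d h mi)

-- Keep only parses that consumed the whole input.
parseStamp :: String -> Maybe Stamp
parseStamp s = case [v | (v, "") <- readP_to_S stamp s] of
  (v : _) -> Just v
  [] -> Nothing
```

Range checks (month 1 to 12 and so on) are left out; they would be a separate validation step on the parsed `Stamp`.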

For testing algorithmic code, the QuickCheck library is pretty amazing - you tell it how to construct domain data and which invariants should hold over function applications, and it fuzz-tests them with random/pathological data. The first time you write some of these tests (and catch bugs!) you will wonder why you haven’t been testing like that all along! I don’t really want to go into it here, but the other point is that many of these libraries are very, very fast - over the last couple of years there has been a massive push for high-performance libraries, with a lot of success. The webservers behind the Haskell web frameworks regularly trounce most other webservers, and there are very high-performance JSON, text processing, and parsing libraries (attoparsec is a very fast, Parsec-like parsing library).

5. Templating. Here I want to directly compare the experience of using Heist (a templating system made by the Snap team) with Erb/Haml (I mostly use the latter, but for some things, like JavaScript, I have to use the former). The first big difference is the distinction between layouts, templates, and partials in Rails. I never really understood why this distinction existed when I first used it, and compared to Heist (which has no such distinction - any template can be applied to another to get layout-like functionality, and any template can be included within another to get partial-like functionality), it feels very limited.

The other major difference is that the two Ruby templating languages allow dynamic elements by embedding raw Ruby code, whereas Heist allows dynamic content by letting you define new XML tags (called splices) that you can then use in templates. I have found this to be an extremely powerful idea: it lets you do all the regular stuff (insert values, iterate over lists of values and spit out HTML), but it also lets you build custom vocabularies of elements designed to work with JavaScript (for example, I built an asynchronous framework on top of this, with a “<form-async>” tag and “<div-async>” elements that were replaced asynchronously by the responses from the form posts).
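To make the splice idea concrete, here is a toy model in plain Haskell. It is deliberately not Heist's API (`Node`, `Splice`, `expand`, `render`, and the tag names are hypothetical): a template is a tree of nodes, and tag names are bound to functions that receive the tag's attributes and children and return replacement nodes. For simplicity the children are expanded before the splice runs; real Heist splices instead receive the original node (via getParamNode) and decide for themselves when to process the children.

```haskell
-- Toy template nodes.
data Node = Element String [(String, String)] [Node] | Text String
  deriving (Show, Eq)

-- A splice gets the tag's attributes and (already expanded) children.
type Splice = [(String, String)] -> [Node] -> [Node]

-- Replace bound tags with their splice's output; copy everything else.
expand :: [(String, Splice)] -> [Node] -> [Node]
expand splices = concatMap go
  where
    go (Element tag attrs kids) =
      let kids' = expand splices kids
      in case lookup tag splices of
           Just splice -> splice attrs kids'
           Nothing -> [Element tag attrs kids']
    go t = [t]

-- Render to HTML-like text (no escaping; toy only).
render :: [Node] -> String
render = concatMap r
  where
    r (Text s) = s
    r (Element tag attrs kids) =
      "<" ++ tag ++ concatMap (\(k, v) -> " " ++ k ++ "=\"" ++ v ++ "\"") attrs
        ++ ">" ++ render kids ++ "</" ++ tag ++ ">"

-- A small custom vocabulary for the current request.
userSplices :: Bool -> String -> [(String, Splice)]
userSplices isAdmin name =
  [ ("isAdmin", \_ kids -> if isAdmin then kids else [])
  , ("currentUser", \_ _ -> [Text name])
  ]
```

Templates only ever mention `<isAdmin>` and `<currentUser/>`; what those tags mean lives entirely in Haskell.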

Heist also adapts well to (trusted) user-generated input - I’ve used it in multiple CMS systems so that, for example, all links to external sites open in new tabs/windows (by overriding the “<a>” tag and adding the appropriate “target”), or to give users certain dynamic features for their pages. Compared to this, Haml always seems hopelessly tied up with Ruby spaghetti code - not that it always is (you can always be careful), but the split with Heist feels both like a cleaner separation AND more powerful, which is not something you get often, and I think that is a sign that the metaphor Heist created (which is based on a couple of really simple primitives) is really something special.

6. This is sort of an extension of the first point, and I’m putting it towards the end because it is the most subjective part of this already quite subjective comparison: I think that web applications built with Haskell/Snap are much easier to edit / add to than corresponding applications in Ruby/Rails. One of the biggest reasons is that there is much more boilerplate and code spread in Ruby - some of it is auto-generated, some is written by hand, but code ends up scattered around. It is pretty easy to add new code, but when you want to edit / refactor existing code, it gets hard to figure out where everything is. Conventions (which you learn) help to a degree, but there is simply less code in Snap, and usually everything pertaining to a specific function is in one place. This has a lot to do with the functional paradigm - there is no hidden state, so generally all the transformations that occur are very transparent, whereas with Rails, code from the ApplicationController, various filters, the model, etc. may all come into play. There is no obvious “starting point” if you want to see how a request travels through a Rails application (candidates include the routes file, the controllers, etc.), whereas in a Snap application, the code that starts the web server is in one of the files you write! You can trace exactly what it is doing from there!

In addition, there is very little “convention” with Snap. It enforces nothing, which has the consequence (in addition to allowing you to make a mess!) that the whole application conforms to exactly how you think it should be organized. I’ve found that this actually makes it much easier to add new things or modify existing functionality (fix bugs!), because the entire structure of the application, from how requests are routed to how responses are generated, is based on code I wrote. This means that making a change anywhere in this process is usually very easy - in some ways it feels like the difference between changing an application you wrote from scratch and changing one you picked up from someone else. There is also a potential downside to this: the first couple of applications I built had drastically different organizational systems.

(Side note for anyone reading this who is curious: I’ve converged on the following method. All types for the application live in a Types module or hierarchy; all code that pertains to the datastore lives in a State hierarchy (or a single module, in a small application); code for splices lives in a Splices hierarchy; forms live in a Forms hierarchy; and the web handlers live in a Handlers hierarchy. I also usually have a Utils module that collects various things used in all sorts of different places. Everything depends on Types and Utils. Splices, Forms, and State are all independent of one another, and Handlers depends on everything. And then of course there is an Application module and Main, following the code generated by Snap.)

This is also a major way in which Snap differs from some other Haskell web frameworks: it feels more like a library with which to build a web application than a true framework. In my experience this is actually a really powerful thing, and it makes the whole process a lot more enjoyable, because I never feel like I’m trying to conform to how someone else thinks I should organize things.

7. I’m bundling performance, security, etc. all together. Rails is a very stable framework, so lots of work has gone into these areas. But I think the recent vulnerabilities exposed on a lot of major sites (like GitHub), caused by the common paradigm of mass assignment, point out the negative side. Snap is much newer, but as far as I can tell it was built with security in mind from the beginning, and most libraries that I have used also discuss the security issues relevant to them - the entire development community seems a lot more aware of / concerned with it.

I think part of this probably has to do with the host languages - Ruby is a very dynamic language with a history of experimentation (so generally, flexibility is preferred over correctness), whereas Haskell is a language where static guarantees are highly valued, and security is usually lumped in with correctness. For performance, there is no question that Haskell will win hands down in any performance comparison (including multithreading). Granted, a lot of web code is disk/database bound, so this isn’t a huge deal, but it is nice to know that you aren’t needlessly wasting cycles (and can afford to run on smaller servers).

8. Now, as a counterpoint, I want to articulate what Rails really has over Snap. Number one, and this is huge, is the size of the community. A massive number of developers know how to use Rails (how many are good at it is another question), which also means that if you are trying to do something, a prebuilt solution is much more likely to exist. It also means that it will be easier to hire people to work on it, and easier to sell it as a platform to clients/bosses.

The Haskell community is surprisingly productive given its size (and some of the tools it has produced are amazing - examples mentioned in this comparison are Parsec, QuickCheck, Digestive-Functors, etc.), but in some sense it will always be at a disadvantage here. If you are doing any sort of common task with Rails, there will probably be a Gem that does it. The unfortunate part is that sometimes the Gem will be unmaintained, partially broken, or incompatible, since quality varies widely. This is a place where a lot of subjectivity comes in - I have found that most of what I need exists in the Haskell ecosystem, and if something doesn’t, it isn’t hard to write a library, but this could be a big dealbreaker for some people.

Cheers, and happy web programming.


Using Heist (from the Snap Framework) for CMS functionality.

This is a short post, and it may be totally obvious to people who are familiar with the Heist templating system, but it wasn’t to me at first, so here it is.

As a brief introduction: Heist is a templating system written by the people behind Snap, a Haskell web framework. The basic idea is to write XML, most of which is regular HTML, but some tags are special - they are bound to (essentially) arbitrary Haskell code that can operate on their children. There are a couple more features, but this is the one that really sets it apart: it essentially allows you to extend HTML with whatever domain logic you want. For example, if you have an administrator role, you can create a tag <isAdmin> and wrap it around anything you only want displayed to an administrator. You can also create tags that insert text, like <currentDate/>, and ones that do arbitrarily complex transformations on their children.

But what I want to write about in this post is a simple application that has turned out to be really useful: using Heist to process the text for a content management system. Note that I have only used this when all the people writing content are trusted. It would probably be possible to preprocess all the text and only let a whitelist of tags (and a whitelist of attributes) through, but that is not what I’ve done, so the examples I give may not work for that case (if you do it wrong, you could end up with code injection, DoS attacks, etc.). Consider yourself warned.

The basic idea is simple: Heist can run just as well on the representation of XML that the XmlHtml library uses (a list of nodes) as on template files, so we can run our content through Heist with a specific set of splices (Heist’s name for these special tags), which define a vocabulary of dynamic things that can be done in any content page. Here’s an example:

Just today, I had a simple request: all links in the content of the site should open in new tabs if they are external (not within the site). I’m using Markdown as the first pass, which doesn’t have any way to do this (at least in the implementations I know of), so the two options are: make authors write HTML links that include target="_blank" (which seems unnecessary, and is potentially troublesome for non-technical users), or add the target attribute some other way. I thought of JavaScript first, but I prefer sites to work without it, so I started thinking about how to do it server-side. With Heist, the solution was actually obvious: override the “a” tag, have it inspect the href, and if the link is external, add the target attribute. To be friendly, we also leave the target alone if it already exists, so that writing target="" explicitly opts a link out.

The code (not the prettiest, but written in just a couple of minutes) is the following:

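{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Text.XmlHtml as X
import Data.Maybe (fromMaybe, maybeToList)
-- getParamNode and stopRecursion come from Heist; which module exports
-- them depends on the Heist version.

-- Splice bound to the "a" tag: mark external links with target="_blank".
extLinks = do
  n <- getParamNode
  (href, targets, attrs) <- getHrefTargetAttrs
  -- The output is itself an <a> tag, so stop Heist re-running this splice on it.
  stopRecursion
  let external = any (`T.isPrefixOf` href) ["http://", "https://"]
      n' = case targets of
             (_:_) -> n  -- the author explicitly specified a target; leave it alone
             [] | external  -> n { X.elementAttrs = ("href", href) : ("target", "_blank") : attrs }
                | otherwise -> n
  return [n']

-- Split the current node's attributes into href, any target, and the rest.
getHrefTargetAttrs = do
  n <- getParamNode
  return $ case n of
    X.Element _ as _ ->
      let href    = fromMaybe "" (lookup "href" as)
          targets = maybeToList (lookup "target" as)
          rest    = filter ((`notElem` ["href", "target"]) . fst) as
      in (href, targets, rest)
    _ -> ("", [], [])  -- not an element (text/comment): nothing to inspect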

Another example that I ran into: again, to make the HTML/Markdown that users input easier to edit, all images are referenced relative to the name of the page. This means that in the source of a page you only have to write “<img src='something.png'/>” instead of some sort of absolute path. This works fine when the page is displayed at its proper path, but when summaries are displayed elsewhere, the relative path breaks. So again, an “img” splice that simply adds the absolute prefix (which can be determined from the page it is in) solves the problem.
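The rewriting itself is just a function on the tag's attributes, which a Heist “img” splice could apply to the node it gets from getParamNode (as extLinks does for links). A sketch, assuming the page's own URL path is known (for example `/pages/garden`, a hypothetical layout) and that anything starting with `/`, a scheme, or `data:` is already absolute:

```haskell
import Data.List (dropWhileEnd, isPrefixOf)

-- Prefix a relative image src with the path of the page it belongs to.
-- The helper name and the "what counts as absolute" rules are assumptions.
absolutizeSrc :: String -> [(String, String)] -> [(String, String)]
absolutizeSrc pagePath = map fix
  where
    base = dropWhileEnd (== '/') pagePath
    fix ("src", v) | isRelative v = ("src", base ++ "/" ++ v)
    fix attr = attr
    isRelative v = not (any (`isPrefixOf` v) ["/", "http://", "https://", "data:"])
```

Keeping the rule in a plain function means it can be tested without running any templates at all.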

I have also found uses where I wasn’t overriding existing tags, but adding new ones. One example was letting editors include a list of recent pages by an author. So I made a new tag, “authorRecentPages”, that takes the author’s id as an attribute; the body of the tag is HTML that shows how to format each individual page.
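The shape of such a tag is “repeat my body once per item, filling in per-item tags”. A toy sketch in plain Haskell with hypothetical types (in real Heist code this is usually done with helpers such as mapSplices and runChildrenWith); it assumes the page list is already sorted newest first and arbitrarily shows at most five pages:

```haskell
-- Toy template nodes (not Heist's or xmlhtml's types).
data Node = Element String [(String, String)] [Node] | Text String
  deriving (Show, Eq)

-- Hypothetical page record; the list passed in is assumed newest first.
data Page = Page { author :: String, title :: String }

-- Fill one copy of the tag's body for a page: <pageTitle/> becomes its title.
fill :: Page -> Node -> [Node]
fill p (Element "pageTitle" _ _) = [Text (title p)]
fill p (Element tag attrs kids) = [Element tag attrs (concatMap (fill p) kids)]
fill _ t = [t]

-- <authorRecentPages author="..."> body </authorRecentPages>:
-- repeat the body for each of the author's (up to 5) most recent pages.
authorRecentPages :: [Page] -> [(String, String)] -> [Node] -> [Node]
authorRecentPages pages attrs body =
  case lookup "author" attrs of
    Nothing -> []  -- no author given: render nothing
    Just who ->
      let recent = take 5 (filter ((== who) . author) pages)
      in concatMap (\p -> concatMap (fill p) body) recent
```

The editor controls the markup entirely (a list, a table, a sentence), while the query logic stays in Haskell.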

What I’ve realized is this is a really powerful idea - it allows me to write arbitrary Haskell code and package it up so that anyone who can deal with a little bit of html can use that code on the pages they are working on. There are certainly other ways to do this, but usually it takes quite a bit of work - by deciding that every post is going to have one pass through with Heist, adding new vocabulary is as simple as writing a few lines of code and adding a name for the tag.


Getting started with Snap-Auth

This is a short guide to getting started with the user authentication that comes with the Snap Framework, otherwise known as the auth snaplet. Once the documentation matures (right now there doesn’t seem to be any - just the code), this will probably become irrelevant, but until then, it should help an aspiring Snap developer get started quickly.

If you haven’t already, at least check out some of the basic Snap documentation, for example, the Quickstart.

There are actually two snaplets you need - sessions and auth. The first provides support for storing data related to sessions, and the second does the actual authentication. Both are designed to support multiple backends, and each currently ships with one.

To add these two snaplets, we first add their states to the main state type for the application, which in the generated project (created by snap init) is in Application.hs and looks something like this:

data App = App
    { _heist :: Snaplet (Heist App) }

(possibly with more fields, or without Heist, etc.). We want to add the states for sessions and auth, so our new data type will look like this:

data App = App
    { _heist :: Snaplet (Heist App)
    , _sess :: Snaplet SessionManager
    , _auth :: Snaplet (AuthManager App)
    }

And be sure to add the following imports to the file:

import Snap.Snaplet.Session
import Snap.Snaplet.Auth

Now we need to set up the initializers. In the generated project, this is done in Site.hs. At the bottom of the file is a function called “app”, which starts out:

app :: SnapletInit App App
app = makeSnaplet "app" "Some name" Nothing $ do
   ...

(Or something similar.) If you are using Heist, it probably has a line like this:

h <- nestSnaplet "heist" heist $ heistInit "templates"

To add the initializers for sessions and auth, we need to first import the proper backends:

import Snap.Snaplet.Session.Backends.CookieSession
import Snap.Snaplet.Auth
import Snap.Snaplet.Auth.Backends.JsonFile

Now we add an initializer for each one:

s <- nestSnaplet "sess" sess $ initCookieSessionManager "site_key.txt" "sess" (Just 3600) -- 1 hour login timeout
a <- nestSnaplet "auth" auth $ initJsonFileAuthManager defAuthSettings sess "users.json"

And on the last line, use these to construct the App value:

return $ App h s a

Now you should be able to use any of the functions exported by the Auth snaplet in your handlers. Have fun!

Filed under haskell programming snap


iOS is anti-UNIX and anti-programmer.

When I was first learning about UNIX, and learning to use Linux, the most immediately powerful tool I found was the shell’s pipe operator, ‘|’. Using the command line (because at that point Linux GUIs were not so well developed, and the few distros that tried to allow strictly graphical operation usually failed miserably) was at times difficult and at times rewarding, but it was the pipe that opened up a whole world for me.

I remember looking through an online student directory in high school that had names, email addresses, etc. For student government elections it had become popular (if incredibly time-consuming) to copy and paste the hundreds of email addresses and send a message to every student. For me, with my newfound skills, it amounted to something like:

cat directory.txt | grep @ | awk '{print $3}' | perl -pe 's/\n/,/'

It seemed like magic at the time, and in some ways it still does. What the shell (and UNIX in general) offered was composability - simple (but powerful) tools, and a standard way of linking them together: text streams. Combining them offered immeasurable power, much more than any single tool. The mathematics of combinations guarantees this: every new tool can be chained with every existing one, so the number of possible pipelines grows much faster than the number of tools.
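The same idea survives in languages built around function composition. As a sketch, here are the grep/awk/join steps from that one-liner as one composed Haskell function (`emailsCsv` is a hypothetical name; like the shell version, it assumes the address is the third whitespace-separated field of each line containing '@'):

```haskell
import Data.List (intercalate)

-- grep @  |  awk '{print $3}'  |  join with commas, as one composed function.
emailsCsv :: String -> String
emailsCsv = intercalate "," . concatMap thirdField . filter ('@' `elem`) . lines
  where
    -- Unlike awk, lines with fewer than three fields are skipped, and
    -- unlike the perl step, no trailing comma is produced.
    thirdField l = take 1 (drop 2 (words l))
```

Wrapped as `main = interact emailsCsv`, it becomes one more program that can sit in a pipe.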

The more I use graphical interfaces (or anything that does not operate on text streams - command-line curses programs included), the more I am struck by how profound the loss of composability is: each program has to try to implement all the standard things (searching, sorting, transforming) that you might want to do with its information, and that repetition breeds inconsistencies and, usually, a plain lack of power. The better ones share common libraries and gain common functionality, but this only amounts to their lowest common denominator - two separate programs cannot (easily) expose their higher-level functionality to each other (at least not in compiled languages) the way command-line stream processing programs can.

What I realized the other day is that iOS is the extreme example of that lack of flexibility, taken almost to the point of caricature - the only interaction possible is through single applications that, for the most part, have no connection to other applications. People rejoiced when copy and paste was added, but that celebration hides a sad loss of the true power that computers have. Files - the only real way composability is achieved in GUI systems (i.e., do one thing, save the file, open it with another program, etc.) - have essentially been eliminated, so applications must do everything a user might want to do with whatever data they have or will get from the user.

I’d noticed before how frustrating iOS was for me to use, but I wasn’t sure exactly why until recently, when I realized that it had effectively taken away the one thing that is so fundamental about computers, and why I am a programmer - the ability to compose. Every day I live and breathe abstraction, building things out of different levels of it, and the idea of not being able to combine various parts to make new things is so antithetical to that type of thinking that I almost can’t imagine iOS was created by programmers. I remember looking at the technical specifications of the most recent iPhone and thinking - that is a full computer, and it’s small enough to fit in a pocket; that is a profound change in the way the world works. But it’s not a computer, it’s just a glorified Palm Pilot with a few bells and whistles.

Filed under programming iOS rants


Baby steps with Mercury - doing file I/O.

The language that I’ve been learning recently is a pure (i.e., side-effect free) logic/functional language named Mercury. There is a wonderful tutorial (PDF) available, which explains the basics, but beyond that, the primary documentation is the language reference (which is well written, but fairly dense) and Mercury’s standard library reference (which is autogenerated and includes types and source comments, nothing else).

Doing I/O in a pure language is a bit of a conundrum. Haskell solved this by forcing all I/O into a special monad that keeps track of sequencing (and conceptually threads a mythical state of the world that changes each time it does something, so as not to violate referential transparency). Mercury has a simpler (though equivalent) approach: every predicate that does I/O must take a world state and give back a new world state. Old world states cannot be reused (Mercury’s mode system keeps track of that), so the state of the world is threaded through the program by hand. A simple example would be:

main(IO_0,IO_final) :- io.write_string("Hello World!",IO_0,IO_1), 
                       io.nl(IO_1,IO_final).

Here the first predicate consumes the IO_0 state and produces IO_1 (while printing “Hello World!”), and the second consumes IO_1 and produces IO_final (while printing a newline character).

Of course, manually threading those could become pretty tedious, so they have a shorthand, where the same code above could be written as:

main(!IO) :- io.write_string("Hello World!",!IO), 
             io.nl(!IO).

This is just syntactic sugar, and it works with any pair of parameters that are threaded in the same way (naming it IO for the I/O state is just convention). It definitely makes dealing with I/O more pleasant.

The task I set myself was to figure out how to read in a file. This is not covered in the tutorial, and I thought it would be a simple matter of looking through the library reference for the io module. One of the first predicates looks promising:

:- pred io.read_file(io.maybe_partial_res(list(char))::out, 
                     io::di, 
                     io::uo) is det.

But on second thought, something seems to be missing. The second and third parameters are the world states (the type is io; the mode di stands for destructive input, meaning the variable cannot be used again, and uo stands for unique output, meaning no other reference to that value exists), and the first one is going to be the contents of the file itself. But where is the file name?

The comment provides the necessary pointer:

% Reads all the characters from the current input stream until
% eof or error.

Hmm. So all of these predicates operate on whatever the current input stream is. How do we set that? io.set_input_stream looks pretty good:

% io.set_input_stream(NewStream, OldStream, !IO):
% Changes the current input stream to the stream specified.
% Returns the previous stream.
%
:- pred io.set_input_stream(io.input_stream::in, 
                            io.input_stream::out,
                            io::di, io::uo) is det.

But even better is io.see, which tries to open a file and, if successful, sets it as the current input stream (the alternative is to use io.open_input and then io.set_input_stream):

% io.see(File, Result, !IO).
% Attempts to open a file for input, and if successful,
% sets the current input stream to the newly opened stream.
% Result is either 'ok' or 'error(ErrorCode)'.
%
:- pred io.see(string::in, io.res::out, io::di, io::uo) is det.

With that in mind, let’s go ahead and implement a predicate to read files (the kind I was expecting to find in the standard library, and which I put into a module of similar utilities I’ve started, named prelude in tribute to Haskell):

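% Assumes the enclosing module imports io and maybe.
:- pred prelude.read_file(string::in,
                          maybe(string)::out,
                          io::di, io::uo) is det.
prelude.read_file(Path, Contents, !IO) :-
  % Open Path and make it the current input stream.
  io.see(Path, Result, !IO),
  (
    Result = ok,
    % Read the whole stream, then close it (io.seen reverts to stdin).
    io.read_file_as_string(File, !IO),
    io.seen(!IO),
    (
      File = ok(String),
      Contents = yes(String)
    ;
      % A failed or partial read is treated as no result.
      File = error(_, _),
      Contents = no
    )
  ;
    % The file could not be opened.
    Result = error(_),
    Contents = no
  ).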

To walk through what this code is doing: the declaration says that this is a predicate that does I/O (that’s what the last two arguments are for), that it takes in a string (the path) and gives back a maybe(string), and that the whole thing is deterministic (i.e., it always succeeds exactly once, which is accomplished by wrapping failure into the return type: either yes(Value) or no).

The first line tries to open the file at the path and make it the current input stream. I then pattern match on the result: if it failed, Contents (the return value) is simply bound to no. Otherwise, we read the contents out of the file, then close the file and reset the input stream to the default one (that is what io.seen does). A failed read is handled the same way (well, not really handled, at least not well): Contents becomes no. If the read succeeds, Contents is bound to yes applied to the contents of the file.

What is interesting about this code is that while it is written in the form of logical statements, it feels very much like the way one does I/O in Haskell. Probably a bit of that is my own bias (as a Haskell programmer, I am likely to write everything like Haskell code, just as my Python code always ends up with lambdas and maps in it). It is also probably a consequence of the fact that doing I/O in a statically typed, pure language is always going to look pretty similar: lots of dealing with error conditions, and not much else!
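For comparison, a rough Haskell counterpart of prelude.read_file might look like the sketch below. It is not from any library, and readFileMaybe and readStrict are hypothetical names. It assumes that any IOException (missing file, permission problem, read error) should simply become Nothing, mirroring the Mercury version. The contents are forced inside try so that errors from lazy reading are caught as well.

```haskell
import Control.Exception (IOException, evaluate, try)

-- Hypothetical helper mirroring the Mercury prelude.read_file:
-- any I/O error while opening or reading the file becomes Nothing.
readFileMaybe :: FilePath -> IO (Maybe String)
readFileMaybe path = do
  result <- try (readStrict path) :: IO (Either IOException String)
  return (either (const Nothing) Just result)

-- readFile is lazy, so force the whole string while still inside 'try';
-- otherwise a read error could surface later, outside the handler.
readStrict :: FilePath -> IO String
readStrict path = do
  contents <- readFile path
  _ <- evaluate (length contents)
  return contents
```

As in the Mercury version, the caller learns only that the read failed, not why.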

Anyhow, this was just a tiny bit of code, but it is a predicate that is immediately useful, especially when trying to use Mercury for random scripting tasks (what I often do with new languages, regardless of their reputed ability for scripting).

Filed under mercury haskell programming Prolog


A functional/logic programming tidbit in Mercury

I just started learning a functional/logic language called Mercury, which (at least on first impression) feels like a mix between Prolog and Haskell. It has the features that make it a viable Prolog, but it also adds static typing (with full type inference) and purity (all side effects are handled by passing around the state of the world). I had recently become interested in learning Prolog but had no desire to give up static typing or purity, so Mercury seemed like a neat thing to learn.

While it is not very well known, the language has been around for over 15 years, and has a high quality self-hosting compiler.

Getting to play around with logic/declarative programming is interesting (and indeed the main reason I’m learning it). What seems even more interesting about Mercury is how it incorporates types into logic programming (which, unless I’m mistaken, is a new thing). As a tiny code example:

:- pred head(list(T), T). 
:- mode head(in,    out) is semidet. 
:- mode head(in(non_empty_list), out) is det.
head(Xs, X) :- Xs = [X | _].

The first line says that this is a predicate (logic statement) with two arguments: the first is a list of some type T (it is polymorphic), and the second is an item of type T.

The fourth line should be familiar to a Prolog programmer. Briefly, the right side says that Xs is X cons’d onto an unnamed tail (the rest of the list). head can be seen as defining a relationship between Xs and X: Xs is a list whose first element is X.

In regular Prolog, only the fourth line would be necessary, and that definition allows some interesting generalization. For example, head([1,2,3],Y) will bind Y to 1, while head([1,2,3],1) will simply succeed. If head(X,Y) were used together with a set of other statements, they would only yield a result if X (wherever it was bound, or unified, to a value) had Y as its first element, whatever Y was.

On top of static types, Mercury adds what it calls modes to predicates. These specify whether a certain argument (that’s probably not the right word!) is going to be given, or whether it is going to be figured out by the predicate. It also has determinism declarations, which say how many solutions a predicate can have. There are several options, but two are relevant to this example:

- det means fully deterministic: for every input there is exactly one output.
- semidet means that for some inputs there is one output and for others there is none (i.e., the unification fails).

These allow the compiler to do really interesting things. For example, it can tell you that you are not covering all of the possible cases when you declare something as det, whereas the same code declared as semidet would not cause any errors.
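For a Haskell programmer, these two determinism categories map roughly onto two kinds of function. A semidet predicate is like a function returning Maybe, and a det predicate is like a total function. Below is a small sketch of head in both styles. safeHead and neHead are illustrative names, while NonEmpty and :| come from Data.List.NonEmpty in base. This is only an analogy: Haskell expresses the difference with types, whereas Mercury uses modes and determinism on a single predicate.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Analog of the semidet mode: an empty list is a possible, expected failure.
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x : _) = Just x

-- Analog of the det mode: the input type rules out the empty list,
-- so there is always exactly one answer.
neHead :: NonEmpty a -> a
neHead (x :| _) = x
```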

What is fascinating about this predicate head is that it has two modes defined, one being the obvious head that we know from Haskell etc:

:- mode head(in,    out) is semidet. 

Which states that the first argument is the input (the list) and the second is the output (the element), and it is semidet because for an empty list it will fail. The next is more interesting:

:- mode head(in(non_empty_list), out) is det.

This says that for an input that is a non_empty_list (defined in the standard library, and included below), the second argument is the output, and the predicate is det, i.e., fully deterministic. What this basically means is that failure is incorporated into the type system, because something that is semidet can fail, but something that is det cannot (neat!). Now, the standard modes are defined (something like):

:- mode in == (ground >> ground).
:- mode out == (free    >> ground).

ground describes a value that is fully bound, and >> shows the instantiation state before and after the unification (the analog of function application). So an argument of mode in is bound both before and after, whereas an argument of mode out is not bound before (that’s what free means) and is bound afterwards. That’s pretty straightforward.

What gets more interesting is something like non_empty_list, declared below with inst, which stands for instantiation state, i.e., one of the various states that a variable can be in (ground and free being the most obvious ones):

:- inst non_empty_list == bound([ground | ground]).

What this means is that a non_empty_list is one that consists of a ground element cons’d onto a ground tail. (I don’t know the syntax well enough to explain exactly what bound means in this context, but it seems to say that the value is bound to that particular shape, a cons cell, rather than to an empty list.) This should allow you to write programs that operate on things like non-empty lists, and have the compiler check that you never use an empty list where you shouldn’t. Pretty cool!

Obviously you can write data types in Haskell that also do not allow an empty list, like:

data NonEmptyList a = NonEmptyList a [a]

You could also write functions to convert between them and normal lists. Still, the fact that it is so easy to build this kind of checking on top of existing types in Mercury is really fascinating.
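The conversion functions might look something like the sketch below. It uses the NonEmptyList type from above, and toNonEmptyList, fromNonEmptyList and nelHead are hypothetical names. Note that the check moves to the boundary: toNonEmptyList returns a Maybe, and everything downstream can rely on the list being non-empty. Recent versions of base also ship an equivalent type in Data.List.NonEmpty.

```haskell
-- The non-empty list type from above: a first element plus a (possibly empty) tail.
data NonEmptyList a = NonEmptyList a [a]
  deriving (Show, Eq)

-- Converting from an ordinary list can fail, so it returns Maybe.
toNonEmptyList :: [a] -> Maybe (NonEmptyList a)
toNonEmptyList [] = Nothing
toNonEmptyList (x : xs) = Just (NonEmptyList x xs)

-- Converting back always succeeds.
fromNonEmptyList :: NonEmptyList a -> [a]
fromNonEmptyList (NonEmptyList x xs) = x : xs

-- head is total on this type, like the det mode of the Mercury head.
nelHead :: NonEmptyList a -> a
nelHead (NonEmptyList x _) = x
```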

This is (obviously) just scratching the surface of Mercury (and the reason all of this stuff actually works is probably more due to the theoretical underpinnings of logic programming than anything else), but seeing the definition of head gave me enough of an ‘aha!’ moment that it seemed worth sharing.

If any of this piqued your interest, all of it comes out of the (wonderful) tutorial provided at the Mercury Project Documentation page. If there are any inaccuracies (which there probably are!) send them to daniel@dbpatterson.com.

Filed under mercury programming haskell prolog


data Maybe — harmful?

Here’s a question: is overemphasis of the Maybe type actually harmful, making it easier for Haskell newcomers to write unreliable code?

In some recent Haskell projects, I relied heavily on the Maybe type. It is simple to understand. Maybe is often the first Monad people learn and one of the first places where people start exploring Haskell’s power (realizing you can use do notation with it was a pretty cool moment for me). It is often a major focal point of Haskell tutorials. So it’s not surprising that I’ve used it a lot (and I bet many people have).

Now, I definitely think Maybe is sometimes useful, and here’s a good example: looking for an item in a list. It is either there (Just value) or not there (Nothing). What is important about this example is that both are normal, expected results. But what about the case where you are looking for a value in a list and it should definitely be there? Say you put it in the list, serialized it to disk, read it back in, and are now inspecting the list. In that case the two possibilities are not equally likely, and passing back a Nothing value might be hiding some underlying problem.

What I noticed about my code is that I had started using the Maybe monad for failure conditions, or in cases where I really only expected the value to be a Just. It was so easy to use Nothing that I ended up writing code that type checked (and compiled, and ran) but provided virtually no information about what errors were occurring, or where. Part of this ease is the way you can use Maybe as a Monad: comp1 >>= comp2 >>= comp3 is so simple and clean. It hides the fact that comp1 can legitimately return either a value or not, while comp2 and comp3 should only fail to return a value if something is wrong. If you end up with Nothing at the end of this, you really have no idea what actually went wrong, if anything.
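To make this concrete, here is a small hypothetical example. The lookupAge name and the association-list "database" are made up for illustration. The lookup may legitimately find nothing, but parsing and range checking should only fail on bad data. All three failures collapse into the same Nothing:

```haskell
import Text.Read (readMaybe)

-- Hypothetical: find a user's stored age in an association list and parse it.
-- Three different things can go wrong, and all of them produce Nothing.
lookupAge :: String -> [(String, String)] -> Maybe Int
lookupAge name db = lookup name db >>= readMaybe >>= checkRange
  where
    -- Reject ages that can only come from corrupted data.
    checkRange n
      | n >= 0 && n < 150 = Just n
      | otherwise = Nothing
```

Here lookupAge "bob" [("alice", "30")] and lookupAge "alice" [("alice", "thirty")] both return Nothing. The caller cannot tell a missing user from corrupted data.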

Code written this way is difficult to debug once you find a bug, and good at hiding bugs in the first place. We don’t know whether the result is Nothing because the item in the database (or wherever) didn’t exist, because it was formatted incorrectly, or because something else happened that shouldn’t have.

What I realized, which is probably obvious to any experienced Haskell programmer, is that Maybe should never be used to signal that an error has occurred. There are (at least) two ways of properly handling errors:

- The Either type, which is like Maybe if Nothing carried a value with it (so you have either Left error-value or Right success-value).
- If it is indeed an error that means things are really messed up (and the program should not keep going), error. This function causes a runtime error to be raised, which can be caught; if it is not caught, the program exits.
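For the second case, a lookup whose failure would mean an invariant has been broken, a sketch might look like the following. lookupOrFail is a hypothetical helper, not a library function. The point is that the error message names the missing key, instead of a silent Nothing travelling further through the program:

```haskell
-- Hypothetical helper for lookups that must succeed (e.g. a key we just
-- wrote to disk and read back). A missing key is a bug, so fail loudly.
lookupOrFail :: (Eq k, Show k) => k -> [(k, v)] -> v
lookupOrFail key table =
  case lookup key table of
    Just value -> value
    Nothing -> error ("lookupOrFail: key not found: " ++ show key)
```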

Especially in web programming (where everything I’ve done recently is), calls to error can (and at least with Snap, do) cause the request to terminate and a 500 to be sent to the client. For an error that cannot be recovered from, that is probably what you want! In most other cases, Either is probably a practical solution, as it allows you to fail in the same way as Nothing, but also say where it happened, and maybe some other details. And it can be used in the monadic style if you import Control.Monad.Error. (In current GHC versions, Either’s Monad instance is available from the Prelude, and Control.Monad.Error has been deprecated in favour of Control.Monad.Except.)
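Here is a sketch of a lookup, parse and validate chain written with Either instead of Maybe. The names AgeError and lookupAgeE are hypothetical, and an association list stands in for the database. It relies only on the Monad instance for Either that current base provides, so no Control.Monad.Error import is needed. Each failure now says what went wrong:

```haskell
import Text.Read (readMaybe)

-- Each way the computation can fail gets its own constructor.
data AgeError
  = NoSuchUser String    -- a normal, expected outcome
  | Unparseable String   -- stored data is corrupted
  | OutOfRange Int       -- stored data is implausible
  deriving (Show, Eq)

-- Look up a user's stored age and parse it, reporting which step failed.
lookupAgeE :: String -> [(String, String)] -> Either AgeError Int
lookupAgeE name db = do
  raw <- maybe (Left (NoSuchUser name)) Right (lookup name db)
  n <- maybe (Left (Unparseable raw)) Right (readMaybe raw)
  if n >= 0 && n < 150
    then Right n
    else Left (OutOfRange n)
```

The do block reads almost exactly like the Maybe version. The difference is that the caller can now tell a missing user apart from bad stored data.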

So my conclusion from all of this is to use Maybe only when a value can truly be there or not be there. I won’t use it when the value should be there and its absence is an error. I’ll also be careful with library functions that return Maybe values if, in my case, they should fail to return a value only in exceptional cases. I’d be curious to know what more experienced Haskell programmers think about Maybe, and whether they’ve come up with different solutions to the problems I’ve run into.

Filed under haskell


Haskell web development environment (on a mac)

I’ve been doing a bit of development with the Snap Framework, a web framework in Haskell, and I’m doing this on a Mac development host. Since I am deploying to linux (more specifically, Debian), and prefer to have a similar environment on my development machine as on the production server that it will eventually live on, this meant that just installing the Haskell Platform and other libraries on my laptop wasn’t going to be good enough.

The solution I ended up with is based on Virtualbox: each project has its own virtual machine, with the same operating system it will be deployed to (Debian, in this case) and all the same libraries. This is also helpful in that it keeps everything separate, so the libraries of one application won’t conflict with any other (I use this process for non-Haskell web development too). The virtual machine is set up as a server, so it is only accessed remotely or via the command line (and the latter only to set up SSH in the first place).

Then, each virtual machine mounts a folder from the host, which contains the source code for the application. With Virtualbox, this is done with “Shared Folders”, but I’m sure any other virtualization solution would provide something similar. Then once the application is running, set up port forwarding so that the host machine (my development laptop) can access the application running inside the virtual machine. And access it via SSH too, of course.

I’d been doing this for a little while, and keeping an ssh session with gnu screen open to the virtual machine in order to rebuild the application and restart it. But, this was sort of a pain, and I wanted to make it even more transparent that I was using a virtual machine at all (so needing to manually SSH in when I start the machine, and switching to that session to rebuild and relaunch wasn’t ideal).

So the next iteration had two parts. First, use God (the process manager) to keep the development web app running. This also mirrors what I have running in production, so this is a good thing! Second, write a script that takes care of building and restarting the app, so that it can easily be triggered remotely. The script is very simple:

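#!/bin/bash
# Put GHC and the cabal-installed tools on the PATH.
PATH=/home/dbp/ghc-7.0.4/bin:/home/dbp/.cabal/bin:$PATH
# Stop if the app directory is missing.
cd /path/to/app || exit 1
# Rebuild, and only restart under God if the build succeeded.
# (When run over ssh without a terminal, sudo must not prompt for a password.)
cabal-dev install && sudo god restart app-name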

Now this can be triggered remotely with:

ssh -p PORTNUM localhost /path/to/script

Where PORTNUM is the port you have forwarded from the virtual machine for SSH (ie, what on the host corresponds to 22 on the virtual machine).

Now, to round out my development environment (at least for now), I am using Sublime Text 2 (just started, so this is all up in the air), and have set up a custom build system that looks like this:

{
"cmd": ["ssh", "-p", "2222", "localhost", "/path/to/build-script"]
}

What this means is that I can edit my application (all the files are on the host, just shared with the virtual machine, so there is nothing involved in doing that), and then with a shortcut trigger the build. This will compile the application and relaunch it, with the output of the compilation in a little panel so I can see what is going on.

The last parts are version control, testing, and deploying. I use darcs, and QuickCheck according to the guide here: How_to_write_a_Haskell_program#Running_the_test_suite_from_darcs. This lets you run the tests as a precommit hook (which means the record will fail if the tests do not pass), like:

darcs record --test

I am constantly pushing patches to a repository at Darcsden, a hosting site similar to GitHub but for darcs. It serves as a level of redundancy (and, if working with other developers, acts as a common repo we can all push to and pull from). There is also a repository on the production server, which can be pushed to as well.

This repository has a Heroku-inspired (though primitive) post-apply hook to build and deploy the application. This is a part I’d like to improve in the future. Building binaries locally and just pushing them is one improvement; another is keeping the old binaries around to make it easy to switch back to a previous version. But for now, this is what the _darcs/prefs/defaults file in the production darcs repo looks like:

apply posthook echo "Rebuilding and deploying...\n" && cabal install && echo "Copying resources...\n" && sudo cp -R resources /path/to/production/root && echo "Restarting App-Name.\n" && sudo /var/lib/gems/1.8/bin/god restart app-name
apply run-posthook

This means that after recording changes locally, to deploy, I run:

darcs push servername:/path/to/production/repo

and select which patches I want to push. It will then build and deploy the application.

Aside from making the deployment more robust, I just need to integrate version control and deployment into my editor to make this process completely streamlined (i.e., being able to do everything from the editor, and just switch between it and a browser to test). Since Sublime Text has a console pane, that should be pretty easy.


Was running into this problem, fixed by disabling that setting. Thanks!

leff:

Sorry peeps, still no joy on the “too many redirects” issue. I’ve swapped themes in and out, eliminating my recent theme work as the source. I’m fairly sure it’s a server config issue. Been trying to contact with tumblr support, but only getting unrelated canned responses back so far.
update: there’s a setting: ‘Use descriptive URLs’ under the Advanced tab on your Customize screen. Turning this off seems to have fixed the problem. Thanks to Renee at tumblr support for the help.


How to organize Ocsigen projects to compile to a native code binary (and why this is not good).

disclaimer: This whole post is based on the fact that I was not able to get a certain thing working. Part of the reason to write this is a challenge to someone else to figure out how to do it, and document it. There is extremely little information out there. With that said, I tried pretty hard, and came to the conclusion that it was not possible. If you can figure out how, I will retract my claim that Ocsigen native code is not a viable option for web programming.

edit

This only pertains to native code built with static linking. I was able to get Ocsigen to link native code dynamically when it was the only library, but I could not get this working with some external libraries, which only worked with static linking. If dynamic linking had worked, all the acrobatics described in this post would be irrelevant. It didn’t for me, and this post reflects that experience, but consider this an enormous caveat. It is terrible that I did not mention this originally: I stopped using Ocsigen months ago because of this and other reasons, wanted to finally get around to posting this, and forgot to mention that critical detail.

edit 2

This was written very negatively. I didn’t intend it to be. When I first started writing this post, it was meant as a guide for someone else who wanted to use statically linked native code with Ocsigen. It took a bit of work to figure out how to structure the code (and a couple of false starts), so I wanted to write this down so that it could benefit others. However, partly due to the code organization necessary (and for other reasons), I stopped using Ocsigen for anything beyond small projects (let’s say, anything above 800 lines of application code). I think because of that (and due to rewriting some applications that had reached that limit), I ended up writing this much more negatively than I originally intended. I meant to save that story, i.e., why I stopped using Ocsigen, for another post, but some of it leaked in here unintentionally.

why do I want / need this?

Well, I personally think it is silly to use a language that has a very fast native code compiler and not take advantage of that. But the question is a valid one, and my conclusion is that indeed, if you are going to use Ocsigen for any even medium-sized project, you probably should not use native code.

what I looked at for comparison

The best references were the applications by mfp, particularly ocsiblog. It is a small blog application and also has some even smaller test applications in the repository. It was from these that I got the first native application running.

what it meant I had to change about my app

So the most important thing to keep in mind is that you cannot register any service until runtime. Additionally, you cannot register any service twice. These two things are somewhat of a death knell for native code Ocsigen projects (at least as far as I could tell). The only workaround I could find, which is how mfp’s example project works, is to wrap the entire application in a single functor that takes a dummy argument. This covers at least all of the application that involves services; non-web libraries can be separate. Then at runtime, you apply the functor exactly once.

Now, the reason this is catastrophic is that all your web code has to be in a single file. As far as I could tell, splitting it up any other way means evaluating the service registrations more than once. The only exception is when parts of your application never interact (i.e., never link to one another and never post a form to one another). Possibly someone else could make this work, but the various things I tried did not. Take this as an open challenge. What would need to happen is that one module orchestrates loading each part exactly once and passes them among one another.

interesting echoes of haskell purity

I separated the code that used services (or, more generally, code that used the server params argument, which current Ocsigen passes around as Lwt thread data) from the code that is “pure” (i.e., doesn’t touch Ocsigen). The “pure” code could then be factored out into separate files, while the service code was all lumped into one module. This was interesting, and reminded me of the isolation that Haskell’s IO monad enforces. However, the lack of flexibility in dealing with the “IO” code was pretty limiting.

conclusion

Until this is better supported, and someone figures out a way to do it easily and without drastic code reorganization, Ocsigen should be thought of as a bytecode-only option. There is enough documentation to make it seem like native code is an option if only you bother to set it up. I think that is extremely misleading, and it would be worthwhile for the Ocsigen website to make this clear.

Filed under ocsigen ocaml web programming