Alcides Fonseca

40.197958, -8.408312

Posts tagged as Programming

How scientists learn computing and use LLMs to program

“scientists often use code generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries.” Again, they are busy professionals who are trying to get their job done, not trying to learn a programming language.

How scientists learn computing and use LLMs to program: Computing education for scientists and for democracy

Very interesting read, especially since we teach programming to non-CS students, which is fundamentally different. Scientists are often multilingual (Python, R, bash) and use LLMs to get the job done. Their goal is not to write maintainable large software, but rather scripts that achieve a goal.

Now I wonder how confident they are that their programs do what they are supposed to do. In my own research, I’ve found invisible bugs (in bash, when setting parameters, usually in parts of the code that are not algorithmic) that produce the wrong result. How many of the results in published articles are wrong because of these bugs?

We might need to improve the quality of code that is written by non-computer scientists.

The world is a hexagonal boardgame

Hexagons are quite popular in boardgames, given the high number of neighbours, which increases the number of possible actions by players.

Only triangles, squares, and hexagons can tile a plane without gaps, and of those three shapes hexagons offer the best ratio of perimeter to area.

Simon Willison

Which is why the hexagon is the shape used in Uber’s H3 geographical indexing mechanism, which can be visualized at https://wolf-h3-viewer.glitch.me/
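Both claims about hexagons can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, the perimeter formula is the standard one for regular polygons, and the neighbour offsets assume the common "axial" coordinate convention for hex grids (nothing here comes from H3 itself).

```python
import math

def perimeter_for_unit_area(sides: int) -> float:
    """Perimeter of a regular polygon with the given number of sides and area 1."""
    # For a regular n-gon: area = P**2 / (4 * n * tan(pi / n)),
    # so with area = 1 the perimeter is P = 2 * sqrt(n * tan(pi / n)).
    return 2 * math.sqrt(sides * math.tan(math.pi / sides))

# Offsets of the six neighbours of a hexagonal cell in axial (q, r) coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbours(q: int, r: int) -> list[tuple[int, int]]:
    """All cells sharing an edge with the hexagonal cell (q, r)."""
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]

for name, sides in [("triangle", 3), ("square", 4), ("hexagon", 6)]:
    print(f"{name}: {perimeter_for_unit_area(sides):.3f}")
print(len(hex_neighbours(0, 0)), "edge-sharing neighbours")
```

For the same area, the hexagon needs the least perimeter of the three tiling shapes, and each cell shares an edge with six neighbours, compared with four for a square.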

Programming for non-CS is different

[…] the top learning objective was for their students to understand that websites can be built from databases.

I’m pretty sure that the most popular programming language (in terms of number of people using it) on most campuses is R. All of Statistics is taught in R.

End-user programmers most often use systems where they do not write loops. Instead, they use vector-based operations — doing something to a whole dataset at once. […] Yet, we teach FOR and WHILE loops in every CS1, and rarely (ever?) teach vector operations first.

CS doesn’t have a monopoly on computing education: Programming is for everyone by Mark Guzdial

The main takeaway is that you do not teach Programming 101 to students outside Software Engineering/Computer Science the same way you teach it to students in those programs. The learning outcomes are different, and so should the content be.

Funny how Functional Programming (via vectorized operations) is suggested to come before imperative constructs like for or while. This even aligns with the GPU-powered parallelism that is needed when processing large datasets.
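To make the contrast concrete, here is a small sketch in plain Python. The data and the to_fahrenheit helper are invented for illustration. Real vectorized code in R or NumPy would apply the arithmetic to the whole array directly, so map is only a stand-in for the idea of describing the transformation once.

```python
temperatures_c = [12.5, 17.0, 21.3, 8.9]

# Imperative version: the programmer manages the iteration and the accumulator.
fahrenheit_loop = []
for c in temperatures_c:
    fahrenheit_loop.append(c * 9 / 5 + 32)

# Whole-collection version: state the transformation once, apply it to everything.
def to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

fahrenheit_mapped = list(map(to_fahrenheit, temperatures_c))

print(fahrenheit_loop == fahrenheit_mapped)
```

The second form says nothing about the order in which elements are processed, which is what makes it easy to run in parallel.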

Food for thought.

A review of "We are destroying software"

Salvatore Sanfilippo (@antirez):

We are destroying software by no longer taking complexity into account when adding features or optimizing some dimension.

Agree.

We are destroying software with complex build systems.

Disagree: they are no longer build systems. They also take care of deployment, notarization, linting, vcs, etc.

We are destroying software with an absurd chain of dependencies, making everything bloated and fragile.

Mostly disagree. Leftpad is a good example of this taken to the extreme. But 90% of the cases are worthwhile. Fixing bugs in one dependency fixes many downstream projects. However, product maintenance is often ignored in industry, and this is the real issue.

We are destroying software telling new programmers: “Don’t reinvent the wheel!”. But, reinventing the wheel is how you learn how things work, and is the first step to make new, different wheels.

Mostly disagree. Reinventing the wheel is a good academic exercise, but not in a product or service. Do it on your own time or in school.

We are destroying software by no longer caring about backward APIs compatibility.

Agree. We need to care more about the longevity of software and hardware. How long should a car last? Or a phone? I still use a very old iPod, but can’t use my more recent BlackBerry.

We are destroying software pushing for rewrites of things that work.

Mostly disagree. I think that in most cases we lack rewrites of things that do not work. The opposite is much less common.

We are destroying software by jumping on every new language, paradigm, and framework.

I agree, but only for startups/SV. It’s a common practice for CoolCompanies™ to write their software using a newish framework to hire people who are interested in learning (often better engineers). But that only occurs in a minority of the companies producing software.

We are destroying software by always underestimating how hard it is to work with existing complex libraries VS creating our stuff.

Mostly disagree. It’s easier to underestimate building things from scratch.

We are destroying software by always thinking that the de-facto standard for XYZ is better than what we can do, tailored specifically for our use case.

Disagree. We want open and modular software. I hate that the Gmail app is way better than Mail.app. Or that WhatsApp does not talk to Telegram or Signal. I hate silos like Instagram that are not good internet citizens because they lack open APIs and standards. Yes, standards are slow, but the end result is better for society.

We are destroying software claiming that code comments are useless.

Mostly disagree. We are destroying software by not writing the right comments. Most comments are written by people who write poor code and the wrong comments.

We are destroying software mistaking it for a purely engineering discipline.

I don’t even understand this point. Writing software products and services is engineering: it has millions of tradeoffs.

We are destroying software by making systems that no longer scale down: simple things should be simple to accomplish, in any system.

Disagree. We are destroying software by not spending the resources to build good software. It’s not about scaling.

We are destroying software trying to produce code as fast as possible, not as well designed as possible.

Mostly agree. Again, it’s about economics. Software is built with the constraints provided. Time and team quality are part of those constraints. Maybe we need better leadership and better economics.

We are destroying software, and what will be left will no longer give us the joy of hacking.

Disagree. Enterprise software does not provide joy, and that software is most of what exists. There is still good indie or opensource software that provides that joy; it is simply found in fewer and fewer products/services.

For most software to be hackable, it would have to incorporate many of @antirez’s complaints. I understand it’s a manifesto, but it lacks consistency and focus.

Score: 5/14

The Aging Programmer

Kate Gregory explains the challenges of growing old. I recommend this talk even more to younger folks who pull all nighters, drink red bull (or worse), and don’t care about their posture.

I’ve been highly concerned with my health as someone whose job is spent mostly in front of a screen. I take care of my posture, but even with standing desks, exercise and an external monitor, I’ve been having backaches. In my 20s, I had issues with my wrists (I’ve since adopted vertical mice and trackballs). Now I’m experiencing eyesight degradation. Even with this, I’ve learned a few new things:

  • Having more muscles leads to a better immune system, and more independence when you’re older.
  • After your 50s, driving at night is a problem due to slow adaptation to high-contrast scenarios. One problematic example is the bright screen cars come equipped with. Maybe we need to invest in analog cars for the elderly (not a joke, I also want one of those).

Choose your editor font, tournament style

Coding Font is a web-based tournament game that allows you to select your favorite programming font.

Mine is Inconsolata, which I have used for years in macOS’s Terminal.app, but I’m not sure I ever used it in my editor. My VSCode is set to use “Menlo, Monaco, ‘Courier New’, monospace”.

Wanted: an elegant solution for breadth-first iteration of a tree structure.

While working on the enumerative feature of GeneticEngine, I wanted to recursively explore all the instances of a grammar, ideally using a cache.

My first solution ended up being DFS, as I used Python generators to yield all possible options on sum types and to recursively iterate through all arguments in product types.

I’ve written this proof of concept to implement breadth-first iteration in a generic tree structure that yields the right order. However, I find the extra argument a bit ugly, and I would like a more elegant solution. If you happen to know it, I’m all ears!
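For reference, the textbook way to get breadth-first order out of a generator is an explicit queue instead of recursion. The sketch below is not the proof of concept mentioned above: the Tree class and the breadth_first function are hypothetical names, and it assumes a finite tree whose children are already materialised, which may not hold when enumerating a grammar lazily.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Tree:
    value: str
    children: list["Tree"] = field(default_factory=list)

def breadth_first(root: Tree) -> Iterator[Tree]:
    """Yield nodes level by level, left to right, without an extra depth argument."""
    queue = deque([root])
    while queue:
        node = queue.popleft()       # oldest pending node first: FIFO gives BFS order
        yield node
        queue.extend(node.children)  # children wait behind everything already queued

tree = Tree("a", [Tree("b", [Tree("d")]), Tree("c")])
print([node.value for node in breadth_first(tree)])
```

The trade-off is that the queue holds a whole level in memory at once, whereas the recursive DFS generators only keep the current path.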

Hidden Bug: Python class as definition

While preparing geneticengine to participate in the SRBench 2024 run, I was getting None out of a constructor:


def PredictorWrapper(BaseEstimator):
    def __init__(self, ind: tuple[str, str]):
        self.ind = ind

    def predict(self, X):
        _, data = self.prepare_inputs(X)
        return forward_dataset(self.ind[0], data)

    def to_sympy(self):
        return self.ind[1]

def mk_estimator(x):
    print(f"x={x}")
    p = PredictorWrapper(x)
    print(f"p={p}")
    return p

Outputting:

x=('np.log(np.power(dataset[:,1], 10.0))', 'log((x1 ** 10.0))')
p=None


Have you found the bug? It took me probably around 1 hour, mainly because I trusted myself too much (and there were many other things going on in the code). If you still haven’t found the bug, check the first 3 characters of the code snippet. A function with only other functions inside returns None.
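The effect is easy to reproduce in isolation. In this minimal sketch, Base, BrokenWrapper and FixedWrapper are stand-in names rather than the geneticengine code: with def, the "base class" is just a parameter name, the nested functions are defined and thrown away, and the call falls off the end of the body.

```python
class Base:
    pass

def BrokenWrapper(Base):  # "def" where "class" was intended
    # These are local functions that are never called or returned.
    def __init__(self, ind):
        self.ind = ind

    def to_sympy(self):
        return self.ind[1]

class FixedWrapper(Base):
    def __init__(self, ind):
        self.ind = ind

    def to_sympy(self):
        return self.ind[1]

print(BrokenWrapper(("a", "b")))            # the tuple is bound to the parameter "Base"
print(FixedWrapper(("a", "b")).to_sympy())
```

Python raises no error here because passing one positional argument to a one-parameter function is perfectly valid.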

False and True Positive Testing in Differential Testing

Alive2 is a translation validation tool: given two versions of a function in LLVM IR–usually these correspond to some code before and after an optimization has been performed on it–Alive2 tries to either prove that the optimization was correct, or prove that it was incorrect. Alive2 is used in practice by compiler engineers: more than 600 LLVM issues link to our online Alive2 instance.

John Regehr & Vsevolod Livinskii

Really interesting read on how Alive2 is used alongside the Minotaur superoptimizer and llvm-mca.

European Union will stop funding Open-Source projects in Horizon Program

The European Union must keep funding free software

Since 2020, Next Generation Internet (NGI) programmes, part of European Commission’s Horizon programme, fund free software in Europe using a cascade funding mechanism (see for example NLnet’s calls). This year, according to the Horizon Europe working draft detailing funding programmes for 2025, we notice that Next Generation Internet is not mentioned any more as part of Cluster 4.

NGI programmes have shown their strength and importance to support the European software infrastructure, as a generic funding instrument to fund digital commons and ensure their long-term sustainability. We find this transformation incomprehensible, moreover when NGI has proven efficient and economical to support free software as a whole, from the smallest to the most established initiatives. This ecosystem diversity backs the strength of European technological innovation, and maintaining the NGI initiative to provide structural support to software projects at the heart of worldwide innovation is key to enforce the sovereignty of a European infrastructure.
Contrary to common perception, technical innovations often originate from European rather than North American programming communities, and are mostly initiated by small-scaled organizations.

Previous Cluster 4 allocated 27 million euros to:

“Human centric Internet aligned with values and principles commonly shared in Europe”; “A flourishing internet, based on common building blocks created within NGI, that enables better control of our digital life”; “A structured eco-system of talented contributors driving the creation of new internet commons and the evolution of existing internet commons”.

In the name of these challenges, more than 500 projects received NGI funding in the first 5 years, backed by 18 organisations managing these European funding consortia.

NGI contributes to a vast ecosystem, as most of its budget is allocated to fund third parties by the means of open calls, to structure commons that cover the whole Internet scope – from hardware to application, operating systems, digital identities or data traffic supervision. This third-party funding is not renewed in the current program, leaving many projects short on resources for research and innovation in Europe.

Moreover, NGI allows exchanges and collaborations across all the Euro zone countries as well as “widening countries”¹, currently both a success and an ongoing progress, like the Erasmus programme before us. NGI also contributes to opening and supporting longer relationships than strict project funding does. It encourages implementing projects funded as pilots, backing collaboration, identification and reuse of common elements across projects, interoperability in identification systems and beyond, and setting up development models that mix diverse scales and types of European funding schemes.

While the USA, China or Russia deploy huge public and private resources to develop software and infrastructure that massively capture private consumer data, the EU can’t afford this renunciation.
Free and open source software, as supported by NGI since 2020, is by design the opposite of potential vectors for foreign interference. It lets us keep our data local and favors a community-wide economy and know-how, while allowing an international collaboration.
This is all the more essential in the current geopolitical context: the challenge of technological sovereignty is central, and free software allows to address it while acting for peace and sovereignty in the digital world as a whole.

— OW2

The Register has more information on this issue.

Scope of Generics in Python

Thanks to Continuous Integration, I have found a typing problem in our genetic engine program synthesis framework. It boiled down to me not defining a scope for a type variable.

I started with some code that looked like the following:

with the following error:

main.py:18: error: Argument 1 has incompatible type "P@consume"; expected "P@__init__"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

You can load this example on the MyPy playground if you want to play around with it.

In this case, MyPy is inferring the type of data as dict[str, Callable[[P@__init__], bool]], where the key detail is the __init__ part of the type variable, which ends up being different from the use of P inside the consume function. This behavior is because type variables are, by default, bound to the function/method, and not the class. The first step is to actually introduce the explicit annotation for data with the dict[str, Callable[[P], bool]] type, inside Subclass. Now we get a different error:

main.py:17: error: Dict entry 0 has incompatible type "str": "Callable[[P@__init__], bool]"; expected "str": "Callable[[P], bool]" [dict-item]

Now the P type variable in the field annotation is different than the ones inside the method. To actually bind the type variable to the whole class, we need to extend Generic[P]:

Now, we have no typing errors, and we do not even need the explicit type declaration for data.
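As an illustration only, a class-scoped version consistent with the description above could look like the following. This is a hypothetical reconstruction, not the original snippet: only P, Subclass, data, __init__ and consume are names taken from the text and the error messages, while the check parameter and the dictionary key are invented.

```python
from typing import Callable, Generic, TypeVar

P = TypeVar("P")

class Subclass(Generic[P]):
    # Extending Generic[P] scopes P to the whole class, so the P seen by
    # __init__ and the P seen by consume are the same type variable.
    def __init__(self, check: Callable[[P], bool]) -> None:
        self.data = {"check": check}  # no explicit annotation needed

    def consume(self, value: P) -> bool:
        return self.data["check"](value)

is_positive = Subclass[int](lambda n: n > 0)
print(is_positive.consume(3))
```

Without the Generic[P] base, each method would get its own independent P, which is what the P@__init__ and P@consume labels in the error messages point to.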

Most of this issue was due to me not clearly understanding the default behaviors of type variables [1]. Luckily, if you are able to only support Python 3.12 and upwards, you can use the new, saner syntax. And maybe someday I’ll finish the draft post where I explain why Python’s approach to typing is the best (for prototyping type systems and meta-programming techniques, like we do in GeneticEngine) and the worst (for real-world use).

1 Who the hell creates a type variable through the definition of a variable??

Every time you use -f, a kitten dies

I’ve only been using git for a little more than two years now, but having used it daily for every project (even those on subversion servers, via git-svn), I’ve learnt a few tricks and developed my own workflow.

During this semester, I have been working on a 13-person project and we are using git (and github) to manage the code. This means a large code-base with two different teams working on different parts of the software that depend on each other. And I’m the lucky poor bastard who has to keep up to date with the whole system and perform the merges of feature and bug branches.

Working in such an environment makes weird stuff happen to the repository, and when you get to merge a branch, you discover that everything is now broken and some stuff has disappeared. Here are some things to avoid, learned from this and many other projects:

  • Developer A commits some stuff. He then pushes to master.
  • Developer B (almost at the same time) commits and pushes to master.
  • Developer A finds out he forgot to include one file, and amends his commit to add it. He then pushes with -f (because an amended commit requires it) and B’s changes are lost forever (not quite, but B may delete that code once pushed).

Another interesting story is about a feature X that was accepted to be merged into master, but it was based on a really old version and half of the code had been totally refactored since. Smart as I was, I decided to do a rebase instead of a regular merge, to resolve conflicts commit by commit. It turned out I needed to undo the rebase and turn it back into a branch without my conflict resolutions.

As a rule of thumb, avoid using -f at all costs because, as easy and attractive as it might seem, in the end it might corrupt your repository. Also, merges are a nice way of keeping your history clean and of avoiding the loss of individual changes.
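The three-step scenario with developers A and B can be replayed with a toy model. This is deliberately not git: Remote is a made-up class that treats a branch as a plain list of commit ids and only imitates the rule that a normal push must be a fast-forward.

```python
class Remote:
    """Toy stand-in for a shared branch: an ordered list of commit ids."""

    def __init__(self):
        self.history: list[str] = []

    def push(self, local: list[str], force: bool = False) -> bool:
        # A normal push is accepted only when it is a fast-forward, i.e. the
        # local history starts with everything the remote already has.
        fast_forward = local[:len(self.history)] == self.history
        if fast_forward or force:
            self.history = list(local)
            return True
        return False

remote = Remote()
remote.push(["A1"])                    # developer A pushes
remote.push(["A1", "B1"])              # developer B pushes on top
amended = ["A1'"]                      # A amends the old commit locally
rejected = not remote.push(amended)    # the plain push is refused
remote.push(amended, force=True)       # -f replaces the history: B1 is gone
print(rejected, remote.history)
```

The refusal in the plain push is the safety net that -f switches off. In real git, `git push --force-with-lease` is a gentler option that refuses to overwrite the remote branch if it has moved since you last fetched it.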

Writing a compiler using Python, Lex, Yacc and LLVM

I found a good post on how to build your own toy compiler using Flex, Bison and LLVM. I saw one disadvantage right in the beginning: you had to use C++. If I were just prototyping a compiler, I wouldn’t use C++ but rather a dynamic language. And last semester for the Compilers course that’s what I did.

Students were assigned to build a Pascal compiler (actually a subset, but not that small) and the tools suggested were Lex, Yacc (using the C language) and compiling the code into C. I took a different approach and decided to do the project in Python (I actually tried ruby first, but the ruby-lex and ruby-yacc projects didn’t pass my basic tests).

I wrote the language grammar using PLY (the lex and yacc DSLs for Python) and it was pretty simple. As for the AST generation, I had only a class Node that accepted a type and a list of arguments, while my colleagues using C had to make 1001 structs, one for each kind of node. Not that it was impossible using C, but dynamic languages make the code simpler and clearer.
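A single generic node class of the kind described might look like this. It is a sketch under assumptions, not the project’s actual code: the field names, the render helper and the node type strings are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Node:
    type: str                                      # e.g. "assign", "binop", "integer"
    args: list[Any] = field(default_factory=list)  # child nodes or raw values

def render(node: Any, depth: int = 0) -> str:
    """Pretty-print a tree of Nodes, one entry per line, indented by depth."""
    indent = "  " * depth
    if not isinstance(node, Node):
        return f"{indent}{node!r}"  # leaf value such as an identifier or a literal
    lines = [f"{indent}{node.type}"]
    lines.extend(render(arg, depth + 1) for arg in node.args)
    return "\n".join(lines)

# AST for the Pascal statement: x := 1 + 2
assignment = Node("assign", [
    Node("identifier", ["x"]),
    Node("binop", ["+", Node("integer", [1]), Node("integer", [2])]),
])
print(render(assignment))
```

Every kind of node is told apart by its type string, so later passes dispatch on node.type instead of on a separate struct per construct.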

For the code generation, I decided to go with LLVM. It is a very promising project. Just take a look at Google’s unladen-swallow or macruby; even parrot is planning on using LLVM for their JIT.

For writing the code in Python, I had to use llvm-py, which I must say is in its early stages and lacks documentation. That was my major problem using it. I had only three resources: the official guide, a presentation in Japanese with some source code, and the actual source of the project (in C and C++).

Since every time I got an error in the llvm code generation it crashed the program, I had to dig into the source code of the project and find that error message and reverse engineer what was wrong with my code (usually I was giving values or pointers instead of references and vice-versa). So if you are doing something more complex, you actually need some C++ reading skills.

The project however worked, and I’m making it available so anyone may use the code as an example until better resources are published.

The Github Momentum

History

A long time ago there was this website called SourceForge that hosted the majority of opensource projects. It would offer a unix shell account, hosting, CVS (and later on, SVN) repos and CDN-powered downloads. Today a lot of Unix and Windows utilities live there.

Google got big and in 2006 they launched their own OpenSource Hosting with an SVN repository, wiki, issue tracking and downloads. Plain and simple, à la Google.

Then a couple of ruby hackers started a side-project called Github that offered repository hosting for projects that used the Git Version Control System. But it wasn’t a regular hosting like SourceForge, GoogleProjects, or even BitBucket or Launchpad [1], it uses the Web2.0 success model:

Simple to use

If you have used Github, you have seen that the web interface is really simple. Basecamp-like simple. The only thing limiting this is git itself, which is not as straightforward to use as SVN or even Mercurial. But they even did some tutorials and provide some help about git itself, which works pretty well and is making some great opensource projects migrate to their service.

Social network

This is a small difference from the regular services. On Github you can follow [2] developers, or simply some project. It has an activity stream (think facebook) where you can be up to date with commits, forks and pushes related to the projects you care about.

Freemium

It is free for opensource projects, but they run a business. If you want your company to use their features for your projects, you can buy one of their plans. I find them a bit expensive, especially the lower ones for small teams, but it’s not by chance that they won the Best Bootstrapped Startup Crunchie.

Github Rocks!

I love the decentralization of git and now more than ever I love the offline commits. So much so that I migrated everything I had in SVN, and I am hosting everything as git repositories on my external hard-drive, my VPS and, for the important ones, Github.

I have tried to use simple git repositories on my VPS and even using Redmine to browse them, but the experience sucks compared to Github, where you can see the various branches, commits and even get some stats.

There are some nifty features like being able to host your webpage as a github repo, or the per-project wiki that’s very useful for storing the documentation of your opensource project. There is a small difference from Google’s project wiki: it isn’t available in the repository, which I find weird for these guys that even have snippets in repositories. You can also edit a file and commit right there in the browser, which I sometimes use for quick fixes to my website. But my favorite feature is the commit comments, which Gaspar uses for code reviewing.

What I miss the most is an issue tracker. Google has this, and while Github doesn’t include one, it allows you to integrate with 3rd party services like lighthouseapp. But there is always hope.

The Catch

As I said before, I love the fact that Github works with the opensource community. They even blog about cool projects they host. There is a general concern about a commercial company hosting most of the opensource projects around (be it Google, Github, any of them). I agree that it would be safer to have non-profit entities, like the FSF and a non-freetard one, provide this service. However, I find the advantages of having an innovative company working on this service enough to accept the risk of it hosting most of the opensource projects in the future.

1 The latter two are a step ahead of the former and closer to Github.

2 Or stalk…

Thoughts on PDC

Some may accuse me of being a Microsoft guy but, having used a Mac for some time now, I can’t really say that about myself. Nevertheless, I keep an eye on Microsoft conferences (and I even got to attend one or two) because really cool stuff comes from them. I’m not kidding about this. Let’s see PDC 2008:

Windows 7

I’ve been following the Engineering Windows 7 blog, so I was pretty up to date with this stuff, but seeing real screenshots was pretty impressive. I have mixed feelings about the taskbar redesign. While I really liked the old one, I understand that this way it’s more usable at smaller resolutions (say notebooks or even mobile phones, think Shift or Advantage). But on bigger displays, which are cheaper and cheaper each day, the old style was pretty cool.

The Vista style of the windows was predictable, but I really hate it. I do! I hope they get a real theming engine, and do not make us use some third party software to make them more macish.

One cool surprise was to see that they fixed the horrible wifi icon in the traybar. Linux and Mac did it right years ago, and in Windows up to Vista and even in Windows Mobile it’s a pain to connect to networks.

About the multi-touch? Well, they had it all along with Surface (and Surface SDK), so no big surprise. We’ll see MS release the iTablet before Apple does.

The Cloud Stuff

a.k.a. Windows Azure

Well, startups are going the Cloud way. Amazon Web Services and Google App Engine are just a first step. Microsoft wants Enterprise customers to join this trend, and be able to have their business in the cloud. I don’t know if this is going to be as much of a success as they think. a) Really small businesses don’t want their data in the cloud. They want it on their small server in their intranet. b) Large companies that need a cloud server can probably afford to have their own infrastructure and not rely on Microsoft. Maybe I’m mistaken, but we’ll see.

James Governor has written a really interesting post on this matter and even mentions OpenID in Azure Services.

More Cloud Stuff

a.k.a. Live Mesh

Live Mesh is the Mobile Me for the rest of us. It syncs files P2P or through the cloud and, for those like me with several computers, it rocks.

Since the Mac and Windows Mobile clients came out, I guess I’ll have to give it a try some day.

Dale Lane writes about the transition from USB syncing to Cloud syncing. It’s true Google doesn’t provide an offline sync out of the box in Android, but I like to have the oldschool method available when needed.

Yet More Cloud Stuff

a.k.a. Live Services

Angus got extra points for the shirt and for spreading the social word among the enterprise developers there.

It’s true that Microsoft has a different view from Google and Yahoo, which are embracing the OpenID+OAuth way, but this might change in the future. You can already see some little steps being made.

Dynamic Languages

Oddly, the first dynamic language I noticed in PDC was C#. Really! C# is now lightyears away from Java, and is evolving continuously. Version 4 brings a lot of new features and one of them is the ability to integrate dynamic languages directly in C# using the dynamic type. I believe C# is becoming more of a glue language (LINQ, Dynamic Languages, F#) that allows programmers to switch smoothly to other languages.

As usual, I loved John Lam’s talk on IronRuby, which, besides the usual C#, Silverlight and Testing/Mocking stuff, demoed a Visual Studio plugin in Ruby and Web Services using Sinatra. You should really take a look at it.

Oslo Modeling tools

DSLs are becoming popular in several kinds of business software, and they are something Microsoft started looking at a while ago. While I’d say IronRuby was the way to go (see RSpec examples), they took it further and made their own toolkit, Oslo, to develop models both visually and textually. The language they created to achieve that purpose is called M, and right now it is supported through the IntelliPad editor.

In fact this editor was what got me interested in this area, since its codename was Emacs.NET, and since I’m on a quest for the perfect editor I wanted to take a look. Well, right now it supports the M language, but you can extend it using IronPython (http://www.masteringbiztalk.com/blogs/jon/PermaLink,guid,92ec6f1f-45e5-4b7d-b675-548be5131a07.aspx). I’ll wait to see the first plugins to support different languages in IntelliPad.

In the meantime, take a look at the different Oslo sessions at PDC.

Mono

Yeah, Mono gets to be one of the main points of this post, as it should also be very important to Microsoft. The work Miguel and the team are doing gives much more value to .NET and Microsoft than any other technology they presented, in my opinion. Since the Mac and Linux worlds are increasing their share, it’s important to let developers target those platforms too. And they’re doing interesting new stuff too, like the C# compiler service, the C# interpreter and even running .NET apps on the iPhone!

So take a look at his talk, one of the best in the whole PDC.

Of course this wasn’t everything PDC was about, just the stuff that I really care about. And I really liked some of this stuff!

Microsoft starting to really embrace OpenSource

After some steps to embrace the OpenSource model, especially thanks to the IronRuby and IronPython projects, the day has come.

Microsoft is shipping OpenSource tools as part of one of their products: jQuery will be part of ASP.NET MVC and Visual Studio, with Intellisense support!

This is great news not because of jQuery itself (nevertheless, my congratulations to John Resig’s team), but because Microsoft is selling a product together with OpenSource code. The IronPython and IronRuby teams have fought hard for this. For instance, IronPython is OpenSource, but cannot accept contributions from the community (in source code, that is; bug reports are welcome). And until today, I thought they were taking the same approach with the JS toolkit for ASP.NET.

There’s this project, Gimme, an “ECMAScript (or Javascript if you prefer) library designed to make working with ‘everyone’s favorite scripting language’ fun again!” It is OpenSource, but since it was made by Microsoft:

Due to some licensing restrictions, code contributions from the community will not be accepted, however the Gimme source code is completely free and open to all who wish to view it and learn from it.

I’m glad Scottgu decided not to go with Gimme but with jQuery (nothing against Gimme, but the community around jQuery is so much wider). This is the real step that tells us that Microsoft is really changing!

Now I can touch ASP.NET again

So after my first real project in ASP.NET 2.0, I never touched ASP.NET again. It’s simply ugly. And coding for the web in a language like C# or Java is really a PITA. I just want my logic explained, and that’s one of the reasons for Ruby on Rails’ success.

But today Microsoft has made a small step that may make me experiment some stuff in their web technology again:

This afternoon we released a refresh of our DLR/IronPython support for ASP.NET, now called “ASP.NET Dynamic Language Support”, on our CodePlex site.

This means I will be able to do MVC web applications in Python (or Ruby). This is their response to the RoR success. Of course I like Django the most and I may even use it on the MS stack. This is because the Microsoft teams for IronRuby and IronPython are working to get Rails and Django working on their platforms, which is a really cool thing coming from the company that we all know well.

Offline Commits in SVN

James Bennett wrote an interesting article called Let’s talk about DVCS

I agree 100% with him. People are all migrating to git and other DVCSs, but that approach is not the best for everyone. Myself, I like to have a central repository for all my team projects. I want users to commit to this repository, and not to have other ones. (I use SVN repos as a backup, and I trust myself more than I trust any of the people I work with for this kind of task).

But with SVN you can’t have offline commits, and those may come in handy. Yes, that’s true. I’d love the SVN guys to add what Bennett called waypoints, a way of doing intermediate commits in your local copy.

Please, get this thing in 1.6, and the GUI tools, please!