Alcides Fonseca

40.197958, -8.408312

Posts tagged as Programming

The world is a hexagonal boardgame

Hexagons are quite popular in boardgames, given the high number of neighbours, which increases the number of possible actions by players.

Only triangles, squares, and hexagons can tile a plane without gaps, and of those three shapes hexagons offer the best ratio of perimeter to area.

Simon Willison

This is why the hexagon is the shape used in Uber’s H3 geographical indexing system, which can be visualized at https://wolf-h3-viewer.glitch.me/
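The perimeter-to-area claim in the quote can be checked with a little arithmetic. The sketch below assumes that "best ratio" means the smallest perimeter needed to enclose the same area, and compares the three regular shapes that tile the plane; the function name is made up for this illustration.

```python
import math

def perimeter_for_unit_area(n: int) -> float:
    """Perimeter of a regular n-sided polygon whose area is exactly 1."""
    # A regular n-gon with side s has area n * s**2 / (4 * tan(pi / n)).
    # Setting the area to 1 and solving for the perimeter n * s gives:
    return 2 * math.sqrt(n * math.tan(math.pi / n))

shapes = [("triangle", 3), ("square", 4), ("hexagon", 6)]
perimeters = {name: perimeter_for_unit_area(n) for name, n in shapes}

for name, p in perimeters.items():
    print(f"{name}: {p:.3f}")
```

For the same area, the hexagon needs the shortest boundary of the three, which is one way to read the quote.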

Programming for non-CS is different

[…] the top learning objective was for their students to understand that websites can be built from databases.

I’m pretty sure that the most popular programming language (in terms of number of people using it) on most campuses is R. All of Statistics is taught in R.

End-user programmers most often use systems where they do not write loops. Instead, they use vector-based operations — doing something to a whole dataset at once. […] Yet, we teach FOR and WHILE loops in every CS1, and rarely (ever?) teach vector operations first.

CS doesn’t have a monopoly on computing education: Programming is for everyone by Mark Guzdial

The main takeaway is that you do not teach Programming 101 to students outside Software Engineering/Computer Science the same way you teach it to those inside. The learning outcomes are different, and so should the content be.

Funny how Functional Programming (via vectorized operations) is suggested to come before imperative constructs like for or while. This even aligns with the GPU-powered parallelism that is needed when processing large datasets.
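To make the contrast concrete, here is a minimal sketch of the same computation written both ways. It uses only the Python standard library, so map stands in for a true vector operation; in R or NumPy the second version would be a single arithmetic expression on the whole vector. The data and names are invented for the example.

```python
temperatures_c = [12.5, 18.0, 21.3, 9.8]

# Imperative style: an explicit loop that mutates an accumulator.
fahrenheit_loop = []
for c in temperatures_c:
    fahrenheit_loop.append(c * 9 / 5 + 32)

# Whole-dataset style: describe the transformation once,
# then apply it to every element without writing the loop.
def to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

fahrenheit_mapped = list(map(to_fahrenheit, temperatures_c))

print(fahrenheit_loop == fahrenheit_mapped)
```

The second form has no loop variable and no mutable state to explain, and each element can be processed independently, which is what makes it a good fit for parallel hardware.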

Food for thought.

A review of "We are destroying software"

Salvatore Sanfilippo (@antirez):

We are destroying software by no longer taking complexity into account when adding features or optimizing some dimension.

Agree.

We are destroying software with complex build systems.

Disagree: they are no longer just build systems. They also take care of deployment, notarization, linting, VCS, etc.

We are destroying software with an absurd chain of dependencies, making everything bloated and fragile.

Mostly disagree. Leftpad is a good example of this taken to the extreme. But 90% of the cases are worthwhile. Fixing bugs in one dependency fixes many downstream projects. However, product maintenance is often ignored in industry, and this is the real issue.

We are destroying software telling new programmers: “Don’t reinvent the wheel!”. But, reinventing the wheel is how you learn how things work, and is the first step to make new, different wheels.

Mostly disagree. Reinventing the wheel is a good academic exercise, but not in a product or service. Do it on your own time or in school.

We are destroying software by no longer caring about backward APIs compatibility.

Agree. We need to care more about the longevity of software and hardware. How long should a car last? Or a phone? I still use a very old iPod, but can’t use my more-recent blackberry.

We are destroying software pushing for rewrites of things that work.

Mostly disagree. I think that, in most cases, we lack rewrites of things that do not work. The opposite is much less common.

We are destroying software by jumping on every new language, paradigm, and framework.

I agree, but only for startups/SV. It’s a common practice for CoolCompanies™ to write their software using a newish framework to hire people who are interested in learning (often better engineers). But that only occurs in a minority of the companies producing software.

We are destroying software by always underestimating how hard it is to work with existing complex libraries VS creating our stuff.

Mostly disagree. It’s easier to underestimate building things from scratch.

We are destroying software by always thinking that the de-facto standard for XYZ is better than what we can do, tailored specifically for our use case.

Disagree. We want open and modular software. I hate that the Gmail app is way better than Mail.app. Or that WhatsApp does not talk to Telegram or Signal. I hate silos like Instagram that are not good internet citizens because they lack open APIs and standards. Yes, standards are slow, but the end result is better for society.

We are destroying software claiming that code comments are useless.

Mostly disagree. We are destroying software by not writing the right comments. Most comments are written by people who write poor code and the wrong comments.

We are destroying software mistaking it for a purely engineering discipline.

I don’t even understand this point. Writing software products and services is engineering: it has millions of tradeoffs.

We are destroying software by making systems that no longer scale down: simple things should be simple to accomplish, in any system.

Disagree. We are destroying software by not spending the resources to build good software. It’s not about scaling.

We are destroying software trying to produce code as fast as possible, not as well designed as possible.

Mostly agree. Again, it’s about economics. Software is built with the constraints provided. Time and team quality are part of those constraints. Maybe we need better leadership and better economics.

We are destroying software, and what will be left will no longer give us the joy of hacking.

Disagree. Enterprise software does not provide joy, and that software is most of what exists. There is still good indie or open-source software that provides that joy; it’s simply in fewer and fewer products/services.

For most software to be hackable, it would have to incorporate many of @antirez’s complaints. I understand it’s a manifesto, but it lacks consistency and focus.

Score: 5/14

The Aging Programmer

Kate Gregory explains the challenges of growing old. I recommend this talk even more to younger folks who pull all-nighters, drink Red Bull (or worse), and don’t care about their posture.

I’ve been highly concerned with my health as someone whose job is spent mostly in front of a screen. I take care of my posture, but even with standing desks, exercise and an external monitor, I’ve been having backaches. In my 20s, I had issues with my wrists (I’ve since adopted vertical mice and trackballs). Now I’m experiencing eyesight degradation. Even with this, I’ve learned a few new things:

  • Having more muscles leads to a better immune system, and more independence when you’re older.
  • After your 50s, driving at night is a problem due to slow adaptation to high-contrast scenarios. One problematic example is the bright screen cars come equipped with. Maybe we need to invest in analog cars for the elderly (not a joke, I also want one of those).

Choose your editor font, tournament style

Coding Font is a web-based tournament game that allows you to select your favorite programming font.

Mine is Inconsolata, which I have used for years in macOS’s Terminal.app, but I’m not sure I ever used it in my editor. My VSCode is set to use “Menlo, Monaco, ‘Courier New’, monospace”.

Wanted: an elegant solution for breadth-first iteration of a tree structure.

While working on the enumerative feature of GeneticEngine, I wanted to recursively explore all the instances of a grammar, ideally using cache.

My first solution ended up being DFS, as I used Python generators to yield all possible options on sum types and to recursively iterate through all arguments in product types.

I’ve written this proof of concept to implement breadth-first iteration in a generic tree structure that yields the right order. However, I find the extra argument a bit ugly, and I would like a more elegant solution. If you happen to know it, I’m all ears!
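For reference, this is the textbook queue-based way to get breadth-first order out of a generator. It is not the proof of concept mentioned above, and it is iterative rather than recursive, so it may not fit a grammar enumerator where children are themselves produced lazily. The Tree class is a hypothetical stand-in for a generic tree.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Tree:
    value: str
    children: list["Tree"] = field(default_factory=list)

def breadth_first(root: Tree) -> Iterator[str]:
    """Yield values level by level, using a FIFO queue instead of recursion."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node.value
        # Children are visited only after everything already in the queue.
        queue.extend(node.children)

tree = Tree("a", [Tree("b", [Tree("d"), Tree("e")]), Tree("c", [Tree("f")])])
order = list(breadth_first(tree))
print(order)
```

The queue replaces the extra depth argument, at the cost of giving up the recursive structure that made the depth-first version so short.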

Hidden Bug: Python class as definition

While preparing geneticengine to participate in the SRBench 2024 run, I was getting None out of a constructor:


def PredictorWrapper(BaseEstimator):
    def __init__(self, ind: tuple[str, str]):
        self.ind = ind

    def predict(self, X):
        _, data = self.prepare_inputs(X)
        return forward_dataset(self.ind[0], data)

    def to_sympy(self):
        return self.ind[1]

def mk_estimator(x):
    print(f"x={x}")
    p = PredictorWrapper(x)
    print(f"p={p}")
    return p

Outputting:

x=('np.log(np.power(dataset[:,1], 10.0))', 'log((x1 ** 10.0))')
p=None


Have you found the bug? It took me probably around 1 hour, mainly because I trusted myself too much (and there were many other things going on in the code). If you still haven’t found the bug, check the first 3 characters of the code snippet. A function with only other functions inside returns None.
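The behaviour is easy to reproduce in isolation. In this stripped-down sketch, Base, Wrapper and FixedWrapper are hypothetical names, not the ones from geneticengine.

```python
class Base:
    pass

# Bug: `def` instead of `class`. Here `Base` is just a parameter name,
# and the body only defines a local function, so the call returns None.
def Wrapper(Base):
    def __init__(self, ind):
        self.ind = ind

# Fix: with `class`, the same body defines a real constructor.
class FixedWrapper(Base):
    def __init__(self, ind):
        self.ind = ind

broken = Wrapper(("a", "b"))
fixed = FixedWrapper(("a", "b"))
print(broken, fixed.ind)
```

Python raises no error because calling a function with one positional argument is perfectly valid, which is what makes the typo so easy to miss.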

False and True Positive Testing in Differential Testing

Alive2 is a translation validation tool: given two versions of a function in LLVM IR–usually these correspond to some code before and after an optimization has been performed on it–Alive2 tries to either prove that the optimization was correct, or prove that it was incorrect. Alive2 is used in practice by compiler engineers: more than 600 LLVM issues link to our online Alive2 instance.

John Regehr & Vsevolod Livinskii

Really interesting read on how Alive2 is used alongside the Minotaur superoptimizer and llvm-mca.

European Union will stop funding Open-Source projects in Horizon Program

The European Union must keep funding free software

Since 2020, Next Generation Internet (NGI) programmes, part of European Commission’s Horizon programme, fund free software in Europe using a cascade funding mechanism (see for example NLnet’s calls). This year, according to the Horizon Europe working draft detailing funding programmes for 2025, we notice that Next Generation Internet is not mentioned any more as part of Cluster 4.

NGI programmes have shown their strength and importance to support the European software infrastructure, as a generic funding instrument to fund digital commons and ensure their long-term sustainability. We find this transformation incomprehensible, moreover when NGI has proven efficient and economical to support free software as a whole, from the smallest to the most established initiatives. This ecosystem diversity backs the strength of European technological innovation, and maintaining the NGI initiative to provide structural support to software projects at the heart of worldwide innovation is key to enforce the sovereignty of a European infrastructure.
Contrary to common perception, technical innovations often originate from European rather than North American programming communities, and are mostly initiated by small-scaled organizations.

Previous Cluster 4 allocated 27 millions euros to:

“Human centric Internet aligned with values and principles commonly shared in Europe” ; “A flourishing internet, based on common building blocks created within NGI, that enables better control of our digital life” ; “A structured eco-system of talented contributors driving the creation of new internet commons and the evolution of existing internet commons” .

In the name of these challenges, more than 500 projects received NGI funding in the first 5 years, backed by 18 organisations managing these European funding consortia.

NGI contributes to a vast ecosystem, as most of its budget is allocated to fund third parties by the means of open calls, to structure commons that cover the whole Internet scope – from hardware to application, operating systems, digital identities or data traffic supervision. This third-party funding is not renewed in the current program, leaving many projects short on resources for research and innovation in Europe.

Moreover, NGI allows exchanges and collaborations across all the Euro zone countries as well as “widening countries“¹, currently both a success and an ongoing progress, likewise the Erasmus programme before us. NGI also contributes to opening and supporting longer relationships than strict project funding does. It encourages to implement projects funded as pilots, backing collaboration, identification and reuse of common elements across projects, interoperability in identification systems and beyond, and setting up development models that mix diverse scales and types of European funding schemes.

While the USA, China or Russia deploy huge public and private resources to develop software and infrastructure that massively capture private consumer data, the EU can’t afford this renunciation.
Free and open source software, as supported by NGI since 2020, is by design the opposite of potential vectors for foreign interference. It lets us keep our data local and favors a community-wide economy and know-how, while allowing an international collaboration.
This is all the more essential in the current geopolitical context: the challenge of technological sovereignty is central, and free software allows to address it while acting for peace and sovereignty in the digital world as a whole.

— OW2

The Register has more information on this issue.

Scope of Generics in Python

Thanks to Continuous Integration, I have found a typing problem in our genetic engine program synthesis framework. It boiled down to me not defining a scope for a type variable.

I started with some code that looked like the following:

with the following error:

main.py:18: error: Argument 1 has incompatible type "P@consume"; expected "P@__init__"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

You can load this example on the MyPy playground if you want to play around with it.

In this case, MyPy infers the type of data as dict[str, Callable[[P@__init__], bool]], where the @__init__ suffix marks the type variable as scoped to __init__, which makes it different from the P used inside the consume function. This happens because type variables are, by default, bound to the function/method and not to the class. The first step is to introduce an explicit annotation for data, with the type dict[str, Callable[[P], bool]], inside Subclass. Now we get a different error:

main.py:17: error: Dict entry 0 has incompatible type "str": "Callable[[P@__init__], bool]"; expected "str": "Callable[[P], bool]" [dict-item]

Now the P type variable in the field annotation is different from the one inside the method. To actually bind the type variable to the whole class, we need to extend Generic[P]:

Now, we have no typing errors, and we do not even need the explicit type declaration for data.
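As a rough illustration of the class-scoped version, here is a minimal sketch. It is not the original snippet: the class name Checker and the example checks are hypothetical, and only data, consume and the Generic[P] base come from the description above.

```python
from typing import Callable, Generic, TypeVar

P = TypeVar("P")

class Checker(Generic[P]):
    """Extending Generic[P] makes P mean the same type in every method."""

    def __init__(self, data: dict[str, Callable[[P], bool]]) -> None:
        self.data = data

    def consume(self, value: P) -> dict[str, bool]:
        # `value` has the same P that the callables in `data` expect.
        return {name: check(value) for name, check in self.data.items()}

checker: Checker[int] = Checker({
    "positive": lambda n: n > 0,
    "even": lambda n: n % 2 == 0,
})
results = checker.consume(4)
print(results)
```

Without the Generic[P] base, each method would get its own independent P, which is the situation the two errors above describe.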

Most of this issue was due to me not clearly understanding the default behaviors of type variables¹. Luckily, if you only need to support Python 3.12 and upwards, you can use the new, saner syntax. And maybe someday I’ll finish the draft post where I explain why Python’s approach to typing is the best (for prototyping type systems and meta-programming techniques, like we do in GeneticEngine) and the worst (for real-world use).

¹ Who the hell creates a type variable through the definition of a variable??

Writing a compiler using Python, Lex, Yacc and LLVM

I found a good post on how to build your own toy compiler using Flex, Bison and LLVM. I saw one disadvantage right in the beginning: you had to use C++. If I were just prototyping a compiler, I wouldn’t use C++ but rather a dynamic language. And last semester for the Compilers course that’s what I did.

Students were assigned to build a Pascal compiler (actually a subset, but not that small) and the tools suggested were Lex, Yacc (using the C language) and compiling the code into C. I took a different approach and decided to do the project in Python (I actually tried ruby first, but the ruby-lex and ruby-yacc projects didn’t pass my basic tests).

I wrote the language grammar using PLY (the lex and yacc DSLs for Python) and it was pretty simple. As for the AST generation, I had only a class Node that accepted a type and a list of arguments, while my colleagues using C had to make 1001 structs, one for each kind of node. Not that it was impossible using C, but dynamic languages make the code simpler and clearer.
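The single-class AST idea can be sketched in a few lines. This is not the code from the project: the node type names and the count_nodes helper are made up for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One class for every kind of AST node: a type tag plus its arguments."""
    type: str
    args: list = field(default_factory=list)

def count_nodes(node) -> int:
    """Walk the tree generically, without knowing any specific node kind."""
    if not isinstance(node, Node):
        return 0  # plain leaves such as identifiers, operators or literals
    return 1 + sum(count_nodes(arg) for arg in node.args)

# Hypothetical AST for the Pascal statement `x := 1 + 2`
ast = Node("assign", [
    "x",
    Node("binop", ["+", Node("integer", [1]), Node("integer", [2])]),
])
print(count_nodes(ast))
```

Because every node has the same shape, tree walkers like count_nodes need no per-node-kind cases; the trade-off is that nothing checks whether a given type tag received the right arguments.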

For the code generation, I decided to go with LLVM. It is a very promising project. Just take a look at Google’s Unladen Swallow or MacRuby; even Parrot is planning on using LLVM for their JIT.

To write the code in Python, I had to use llvm-py, which I must say is in its early stages and lacks documentation. That was my major problem using it. I had only three resources: the official guide, a presentation in Japanese with some source code, and the actual source of the project (in C and C++).

Since every time I got an error in the llvm code generation it crashed the program, I had to dig into the source code of the project and find that error message and reverse engineer what was wrong with my code (usually I was giving values or pointers instead of references and vice-versa). So if you are doing something more complex, you actually need some C++ reading skills.

The project however worked, and I’m making it available so anyone may use the code as an example until better resources are published.