Thursday 8 September 2022

Oh dear,

...I have trodden in monsieur's bucket!

It's that Python thing again: for the first time I'm having trouble with imports, paths and subdirs. It's way too complicated for my Little Brains. It seems super simple (or even absent) in C# (I don't know about Java), and it's fairly simple in C++, as long as I keep track of changes and update the Makefile, after failing to make (pun!) CMake work under vsc. But C++ is for my patient part. Python should be for fast and easy coding, filtering data, generating code for real languages and so on...

So, I needed a slave that would read some 400+ rows out of an xlsx file (or from td tags of some webpage), reformat some of the data, and save it as another xlsx (some columns skipped, some reshuffled) or produce the 400+ rows' data in the form of constructor calls for some (un)clever C# class. Some knucklehead had them hardcoded, and the code is used not too frequently: to initialize some table when a database is created. All this stuff has to be cleverly sorted by names: by the substring that follows some specific substring, including numbering in words (yes, for "first", "second" and "third" you're lucky), and splitting the whole name depending on some other substring that's there or not. (Of course, all the knuckleheads could come up with was to sort the data alphabetically by names, which doesn't help at all and frustrates me and the users of the program I'm working on; but after all, this is what we all like about knuckleheads: a cheap possibility to compare our intellects to theirs. And the intellect in my case, which some call "skull", is tiny anyway...)

Overall it sounds simple, if not for the amount of work it has to do on string variables; I wouldn't even dare approach this kind of problem in C++, perhaps in C#/Java it would be easy.

So I wrote the stuff in py**on, fighting with exotic stuff like pandas, iteration over dictionaries' keys (yeah, for key in dictionary: How counter-intuitive is that?) and suffering with C# code generator's issues with tabs or spaces, and finally got it to work. Hell, I even wrote that custom "comparer" part for sorting.
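For the record, the custom "comparer" part can be sketched roughly like this. It is only an illustration, not my actual script: the marker "Branch ", the table of ordinal words and the sample names are all made up, and the real rules are messier.

```python
from functools import cmp_to_key

# Hypothetical marker and ordinal words; the real ones are not shown in this post.
MARKER = "Branch "
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}


def sort_key_parts(name):
    """Return (rank, tail) for the text that follows MARKER in name."""
    _, found, tail = name.partition(MARKER)
    if not found:
        # No marker at all: place the name after every numbered entry
        return (len(ORDINALS) + 1, name.lower())
    tail = tail.strip()
    first_word = tail.split(" ", 1)[0].lower() if tail else ""
    # Unknown first words also land after the known ordinals
    return (ORDINALS.get(first_word, len(ORDINALS) + 1), tail.lower())


def compare_names(a, b):
    """Classic comparer: negative, zero or positive, as in a C# IComparer."""
    ka, kb = sort_key_parts(a), sort_key_parts(b)
    return (ka > kb) - (ka < kb)


# Made-up sample names
names = [
    "Office Branch Third Street",
    "Office Branch First Avenue",
    "Head Office",
    "Depot Branch Second Road",
]
sorted_names = sorted(names, key=cmp_to_key(compare_names))
print(sorted_names)
```

Since sort_key_parts already returns a tuple, passing it directly as key= would do the same job; cmp_to_key is only there because a comparer is the shape I know from C#.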

And after trying to insert the new data into the database (by importing it from the xlsx file) I realized that our phone number column's max length was too small (like 15 characters; it's them knuckleheads again!) to accept three phone/fax numbers or more. So I entered some ALTER TABLE into the database's updating procedure and only then noticed that phone numbers in the (overall messy) input xlsx file (or html, if the slave is taking them from the net) are formatted and separated in different ways: spaces, minus signs, commas, semicolons. Well, I had no choice but to write yet more silly scripts that would correct that mess and return a phone number (or a comma-space separated bunch of numbers), nicely formatted etc.
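A minimal sketch of what such a clean-up script could look like. It rests on a loud assumption: commas and semicolons separate one number from the next, while spaces and minus signs only group digits inside a single number. The function name normalize_phones and the sample input are invented for illustration, and the real file is messier than that.

```python
import re


def normalize_phones(raw):
    """Return the numbers found in raw as one comma-space separated string."""
    numbers = []
    # Assumption: only commas and semicolons separate one number from the next
    for chunk in re.split(r"[,;]", raw):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Keep a leading plus sign, drop everything else that is not a digit
        prefix = "+" if chunk.startswith("+") else ""
        digits = re.sub(r"\D", "", chunk)
        if digits:
            numbers.append(prefix + digits)
    return ", ".join(numbers)


# Made-up sample input
print(normalize_phones("22 123-45-67; +48 601 234 567,, 12-345 67 89"))
```

Regrouping the digits into something pleasant to read is left out here, since that depends on conventions this sketch knows nothing about.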

This part of my code looks very nice, but it doesn't work for some reason, cuts off digits etc. (No, it's not directly related to substrings or strides or indexing errors, at least not obviously—it's been a long time since parting with Fortran... which I miss sometimes.) This proved immediately that I needed to write tests and do some debugging instead of just silly looking at the 400+ lines of output and wondering what the heck is going wrong. After a rather innocent change—moving my test scripts to some test/ subdirectory, so that they don't mess with the main.py piece that does all the work, as well as with input and output files—the problem started to reveal itself. It keeps haunting me to this day—I'm just stuck. And I can't figure out why the editor's underlying compiler (python3 probably, being run by pylance or so) doesn't show me the damned import error, nor can I understand the black magic behind python's imports. It just worked for me for over a year, with simple "project's" directory tree and I took it for granted that the incantation 

from path.filename import somefunction

is somehow universal, perhaps by unconsciously mapping it to #includes. And hoping, perhaps, that there is some magic stuff that knows the "project's" root directory and looks for imported modules by directories and file names. This is how it looks, after all, and so many people say so much about python being simple, intuitive etc. So, most likely my view was very naive, but it worked for me. Until like a week ago, when it all broke.
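One thing that can be checked without reading any scriptures: when a file is run as a script, the first entry of sys.path is that script's own directory, not the "project's" root; when it is run with python -m from the root, the first entry is the current directory. The sketch below builds a throwaway tree and runs the same test file both ways. All the names (src, tests, splitcustom, test_split) are hypothetical, and I can't be sure this is exactly what bites my own tree.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo():
    """Run one test file as a script and as a module; return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # "tests" rather than "test": the standard library has its own "test" package
        for package in ("src", "tests"):
            (root / package).mkdir()
            (root / package / "__init__.py").write_text("")
        (root / "src" / "splitcustom.py").write_text(
            "def run(s):\n    return s.split()\n"
        )
        (root / "tests" / "test_split.py").write_text(
            "from src.splitcustom import run\nprint(run('a b'))\n"
        )
        # As a script: sys.path[0] is root/tests, so "src" cannot be found
        as_script = subprocess.run(
            [sys.executable, str(Path("tests") / "test_split.py")],
            cwd=root, capture_output=True, text=True,
        )
        # As a module from the root: sys.path[0] is the root, so "src" is found
        as_module = subprocess.run(
            [sys.executable, "-m", "tests.test_split"],
            cwd=root, capture_output=True, text=True,
        )
        return as_script, as_module


if __name__ == "__main__":
    script_result, module_result = run_demo()
    print("as script, return code:", script_result.returncode)
    print("as module, stdout:", module_result.stdout.strip())
```

As far as I understand it, the editor's checker does not run anything at all: it resolves imports statically against the workspace root, which would explain why it stays quiet while the interpreter complains.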

Ok, some posts regarding the issue on SO are hilarious (see e.g.

https://stackoverflow.com/questions/14132789/relative-imports-for-the-billionth-time

), but I still can't follow the explanations; as is typical of the python community, there are zillions of solutions to zillions of problems, each clunkier and uglier than the other. What a fucking mess...

So, instead of the ol' RTFM approach (the amount of entitocracy is way too big for my RAM and my willingness to learn all this rubbish; survey based on 10% of prejudice and 90% of painful experience...), I thought of a somewhat easier way out. I'm a physicist, I love simplicity and can't deny myself the pleasure of making things simple. Here we go:

Why not just...

1. collect all the paths to python files, 

2. cat all the stuff together into one file, labeling local variables, functions and classes accordingly (by paths perhaps, seems insane, but it's easy and unique), so as to prevent name clashes etc. and 

3. run it as we wish, look for errors, bugs etc.?

And never ever do any editing in the big file, only in the original ones where the bug appears. If you get an exception or error in the function src-StringFunctions-SplitCustom-pyRun(s), perhaps there is a problem in ./src/StringFunctions/SplitCustom.py, where the Run(s) function is defined. You don't even have to label or group code sections with comments or #regions. Just recreate the final script on every launch or test.
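A rough sketch of step 1 and of the labeling half of step 2, nothing more: it walks a directory, parses every .py file and builds a table from path-based labels back to the file and the original name. The helper names mangle and collect_definitions are invented here, and the labels use underscores instead of hyphens, because a hyphen is not valid inside a Python identifier. The actual renaming and concatenation are not attempted.

```python
import ast
from pathlib import Path


def mangle(relative_path, name):
    """Build a label such as src_StringFunctions_SplitCustom_py_Run."""
    # Underscores, not hyphens: a hyphen cannot appear in a Python identifier
    prefix = "_".join(relative_path.parts).replace(".", "_")
    return f"{prefix}_{name}"


def collect_definitions(root):
    """Map each label to (file, original name) for top-level functions and classes."""
    root = Path(root)
    table = {}
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        relative = path.relative_to(root)
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                table[mangle(relative, node.name)] = (relative.as_posix(), node.name)
    return table


if __name__ == "__main__":
    # List every label found under the current directory
    for label, (source, original) in collect_definitions(".").items():
        print(f"{label}  <-  {source}: {original}")
```

With such a table, an error reported for a label in the big file points straight back to the original file that needs editing.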

Make, grep, cut, cat, sed and >> can make that happen... Yes, the shell script would look very bad, and I'm not fluent enough in this business to start writing it immediately. On the other hand, it would be the guiltiest of pleasures to write such a converter in python and call it... I have to think of something. There are many reptile-related names around. Or just unsuffocate or constrictthembloodyimportz. Or doublespeak, since in Polish saying that someone has a "double tongue" accuses them of being a liar. This seems much more sane than learning from those smartass scriptures of the insane. I tried looking for it in the python documentation pdf (beautifully formatted in (La)TeX, which is a shame), with no effect. And why am I getting the error only when actually running these tests (both from the console and from the IDE, which is no surprise since they both run the same interpreter, python3.x in my case), while the pylance thing doesn't show it? It's also a mystery. It does show other import errors though, ones that I can more or less understand and eliminate; it even does some "refactoring" on file rename. And it works, though it sometimes adds some silly lines like

from module import function as function

which looks very redundant.

I admit my level of understanding is very shallow here. I thought I had some ultra-basic idea about the inner workings of an IDE with "intellisense"-like features—it runs a compiler in the background, detects errors and sends to the gooey those wavy lines, colors and error messages to show up. Perhaps I was very wrong, at least in this case. Or is it some bug inside vsc and its python plugins?

Python's import policy and rules resemble loosely some of my early efforts at coding. I started at the age of 35—I hope it sounds like some serious offence. It's like adding a lot of (more or less random and ad hoc) improvements (quote signs needed...) that sometimes help where the supposed fix happens, but make other pieces stop working properly or introduce some nasty special cases. And very frequently prevent the whole thing from being simple—a feature I have almost a fetish for, since I've not been educated to deal with complex stuff. Although 3 + 4i doesn't sound at all unfamiliar... (Dad joke.) I call it sentiment coding, when I'm so in love (almost) with my own (almost) work, which I tend to value by the effort and time put and pain that I went through to create (or copy-paste-make work) it, that I can't just abandon it and do a major rewrite. Though, deep inside, I know it would be the only reasonable thing to do in many cases. Still too many, still too often.

I practice this approach nowadays sometimes, esp. with some gui-related stuff which I hate to work on—the reason being .NET + WPF on M$Windoze is my main job, and every so often the "code behind" part grows too big and complicated, very fast. So the job of, say, designing a window, is typically done in one push, with the increasing feeling of losing control of what happens. So the next day I'm back, asking myself who wrote this shit and—surprise—being completely lost. And I'm certainly too stupid for MVVM in WPF—my knowledge and way of using C# is too simplistic for that, it's like C++ with some improvements to make life easier. Reflection and basic events is as far as I can go, knowing a bit how events work and how I could design something similar on my own. But reflection? Well, I could think of generating some secret stuff based on the source code, but I just don't know how the real thing works, most likely it's different. And MVVM is just witchcraft to me, almost like QFT. (String theory seems harmless—just pure religion and cult, luckily producing a lot of outcome that math people then can think of and do some clean-up and get new ideas from it.)

Strange association—python and string theory. Perhaps due to the strong PR behind. 

"Never have so many understood so little about so much". I recall running across this quote in the era of Win98 passing to history, and my relationship with Windows' creators came to a happy end. And I was almost sure—a year later, starting as an "advanced user" of the "Linux operating system", the phrase had referred to the stupidity of lots of users, typically from a non-Unix world. Nowadays I'm sure it's also about us, un(der)qualified developers. But at the same time I recall that some book on the C language (Kernighan & Ritchie if I remember correctly, or was it B. Stroustrup's C++?) had a quote about killing the creators of the language. May the reptile have you suffocated soon!

PS Having said that (hardly anyone listens, tldr etc.) only now I realized that the quote at the top plays very nicely with the name of the language! I didn't mean that at the beginning—just wanting to point to something shocking/gross. Python to me is a lot like that bucket that Mr Creosote just puked into*. On the other hand—if I get something to run and do the work, I have to admit, it's great... But the imports suck! And py.fanboyism?... Speak to us, master, speak to us!, I was blind and now I can see!...

(You're a fool, bosun, and your jokes are foolish...)

added a week later: I was having troubles with some pdf table which is 99% malformed, perhaps made from a spreadsheet with linebreaks in a lot of cells. After obtaining an awful csv output from tabula, I started reading about camelot. And:

Interestingly, the language in which this library is written (Python) was named after Monty Python.

I have no further questions. You had me fooled. Thank you.
