pkkm 2 hours ago

Very happy to see that these issues are getting attention now. I think that the Python language being so centered on one implementation is a long-term threat to its success. Web servers, command-line programs, and embedded devices have different requirements (high post-warmup throughput, fast startup, low memory usage), so they aren't necessarily best served by the same implementation. If this project succeeds in replacing Python's C API with something that doesn't expose implementation details, such as whether the implementation uses reference counting, that could make it easier both to maintain alternative implementations, and to experiment with new techniques in CPython.
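
To make the reference-counting point concrete, here is a sketch of the handle model, using HPy's documented handle functions (HPyLong_FromLong, HPy_Dup, HPy_Close); treat it as an illustration, not code lifted from the project:

  #include <hpy.h>

  /* The classic C API bakes Py_INCREF/Py_DECREF into every extension.
     HPy hands out opaque handles instead: duplicating one yields a new
     handle, and each handle is closed exactly once, so the runtime
     underneath is free to use a moving GC (PyPy, GraalVM) or reference
     counts (CPython). */
  static HPy make_fortytwo(HPyContext *ctx)
  {
      HPy h = HPyLong_FromLong(ctx, 42);
      HPy dup = HPy_Dup(ctx, h);  /* a second handle, not "incref" */
      HPy_Close(ctx, h);          /* h is now invalid; dup still works */
      return dup;                 /* the caller owns dup and must close it */
  }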

koe123 9 hours ago

Is my understanding correct that this would provide version-agnostic Python bindings? Currently, I build my bindings separately for each Python version (e.g. building and linking against Python 3.7, 3.8, etc.). While automated, it still makes CI/CD take quite a long time.

  • filmor 5 hours ago

    As others have said, this has been supported since the limited/stable APIs were introduced. What this adds is a way of implementing a Python extension that can be loaded unmodified into different Python implementations (CPython, PyPy and GraalVM), not just compiled for each of them, which by itself would already be an improvement.

  • kzrdude 9 hours ago

    CPython also has a limited stable ABI, and cp3X-abi3 wheels are compatible across multiple versions of Python.

    https://docs.python.org/3/c-api/stable.html
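
    Opting in is mostly a one-line change in the extension source (a minimal sketch; the hex value pins the oldest CPython you want to support, here 3.7):

  /* Build against the stable ABI: with Py_LIMITED_API defined, Python.h
     exposes only the limited API, and the resulting extension binary
     loads unchanged on CPython 3.7 and every later 3.x release. */
  #define Py_LIMITED_API 0x03070000
  #include <Python.h>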

    • mardifoufs 3 hours ago

      But it is very limited. Understandably so, as they don't want to ossify the internal APIs, but it is still so limited that, as far as I know, you can't actually build anything using just that API.

  • masklinn 8 hours ago

    You can already build a single wheel as long as you only target CPython, if your needs fit within the limited/stable ABI (abi3).

    While PyPy and Graal have API support, they don't have ABI/abi3 support, so extensions for them still have to be built separately (and per version, I think).

  • aragilar 9 hours ago

    I believe so, but it would presumably depend on what features you use.

  • gjvc 9 hours ago

    > While automated, it still makes CI/CD take quite a long time

    See about using ccache -- https://ccache.dev/

    • IshKebab 8 hours ago

      I wouldn't recommend ccache (or sccache) in CI unless you really need it. They are not 100% reliable, and any time you save from caching will be more than lost debugging the weird failures you get when they go wrong.

      • gjvc 7 hours ago

        please provide evidence for this assertion.

        • imtringued 5 hours ago

          You can't cache based on the file contents alone. You also need to key the cache on all the OS/compiler queries, variables, and settings that the preprocessor depends on, since the header files might expand to completely different content depending on which #ifdef branch gets taken.
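
          A hypothetical header makes the point: the bytes on disk never change, yet what the compiler sees depends on predefined macros and -D flags that live entirely outside the file:

  /* config.h: identical on disk for every build, but it expands to
     different code depending on the target OS and on flags such as
     -DUSE_FAST_CHECKSUM that appear in no source file at all. */
  #ifdef _WIN32
  typedef __int64 file_offset_t;
  #else
  typedef long file_offset_t;
  #endif

  #ifdef USE_FAST_CHECKSUM
  #define CHECKSUM(p, n) fast_checksum((p), (n))
  #else
  #define CHECKSUM(p, n) portable_checksum((p), (n))
  #endif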

          • mananaysiempre 5 hours ago

            And that’s not impossible, just tedious. One tricky (and often unimportant) part is negative dependencies—when the build depends on the fact that a header or library cannot be found in a particular directory on a search path (which happens all the time, if you think about it). As far as I know, no compilers will cooperate with you on this, so build systems that try to get this right have to trace the compiler’s system calls to be sure (Tup does something like this) or completely control and hash absolutely everything that the compiler could possibly see (Nix and IIUC Bazel).

            • zorgmonkey 2 hours ago

              In C++ the __has_include preprocessor expression has been standardized since C++17; I'm not certain whether C has standardized it yet, though.

              • mananaysiempre an hour ago

                It’s not about that; that’s not relevant to ccache at all. (And yes, C23 does have __has_include, though not a lot of compilers support C23 yet.) It’s about having potentially conflicting headers in the source file’s directory, in your -I directories, and in your /usr/include directories.

                Suppose a previous compile correctly resolved <libfoo.h> to /usr/include/libfoo.h, and that file remains unchanged, but since that time you’ve installed a private build of libfoo such that a new compile would instead resolve that to ~/.local/include/libfoo.h. What you want is to record not just that your compile opened /usr/include/libfoo.h (“positive dependencies” you get with -MD et al.), but that it tried $GITHOME/include/libfoo.h, ~/.local/include/libfoo.h, etc. before that and failed (“negative dependencies”), so that if any of those appear later you can force a recompile.
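
                A sketch of that lookup, with hypothetical paths (the probe order follows the usual rule that -I directories are searched before the system ones):

  /* cc -I$GITHOME/include -I$HOME/.local/include -c app.c
     For the include below, the compiler probes, in order:
       $GITHOME/include/libfoo.h      -- miss  (negative dependency)
       $HOME/.local/include/libfoo.h  -- miss  (negative dependency)
       /usr/include/libfoo.h          -- hit   (positive dependency)
     -MD records only the hit in app.d; nothing records the misses, so
     installing ~/.local/include/libfoo.h later would not, by itself,
     invalidate a purely content-based cache. */
  #include <libfoo.h>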

          • amelius 4 hours ago

            Maybe run every build version in its own container?

          • gjvc 3 hours ago

            please read the documentation before dispensing uninformed advice like this -- it works using the output of the preprocessor and, optionally, file paths

        • IshKebab 7 hours ago

          Why are you so skeptical? Think about how it works and then you'll understand that cache invalidation bugs are completely inevitable. Hell, cache invalidation is notoriously difficult to get right even when you aren't building it on top of a complex tool that was never designed for aggressive caching.

          Just search the bugs for "hash":

          https://github.com/ccache/ccache/issues?q=is%3Aissue+hash+is...

Stem0037 7 hours ago

It would be interesting to see benchmarks comparing HPy extensions to equivalent Cython/pybind11 implementations in terms of performance and development time.

actinium226 7 hours ago

I'm a little unclear as to how this fits in with libraries like pybind11 or nanobind. It seems like those libraries would need to be rewritten (or new libraries with the same goals created) in order to use this in the same way?

rich_sasha 10 hours ago

Looks very cool.

How many new extensions are written in C these days? I was under the impression it's mostly things like Boost.Python, pybind or PyO3.

  • masklinn 8 hours ago

    PyO3 is a set of bindings to the C API, so if you're using PyO3 you're still using the C API even if you're not actually writing C.

    • rich_sasha 7 hours ago

      Yeah, sure, I mean: how many people write C to create an end-user Python module? There's stuff that genuinely wraps C libraries or predates higher-level language wrappers, like numpy or matplotlib, but how many new modules are actually themselves written in C?

      • masklinn an hour ago

        The point is that that’s not relevant; the issue is the API/ABI of the modules, their requirements, and their limitations, not the language in which the modules are written.

  • aragilar 9 hours ago

    There's also Cython.

    I would also guess that HPy would replace the `Python.h` includes that pybind11 et al. use to bind to CPython, so existing extensions should be easier to port?

  • physicsguy 7 hours ago

    Quite a lot, for things like simulation code.

    Less so for general programming.

  • trkannr 6 hours ago

    A lot. You don't have to write in C, just use the C-API functions. pybind etc. introduce a whole new set of problems, with new version issues and decreased debuggability.

ashvardanian an hour ago

Hey!

First of all, cool to see some activity on this front!

I’ve written a fair share of pure CPython bindings and regularly post about implementing them with minimal overhead (<https://ashvardanian.com/posts/discount-on-keyword-arguments...>) and would love to share a few recommendations, questions, and concerns :)

Just a suggestion to help you grow—I'd restructure the landing page (<https://hpyproject.org/>) and the README of the repo (<https://github.com/hpyproject/hpy>). Both could benefit from some examples to clarify the "Nicer API" bullet point; maybe these could be taken from the API documentation page (<https://docs.hpyproject.org/en/latest/api.html>). The landing page could also be more convincing with some supporting stats in favor of PyPy, GraalPython, and other Python runtimes: a reader like me might not be sure whether they have enough usage and are stable enough.

Avoiding singletons and having encapsulated context objects like `HPyContext` is definitely a great thing to have, especially in the multi-threaded Python future or in complex environments with multiple sub-interpreters. But this doesn't really solve the problem if, under the hood, the `HPyContext` still redirects to CPython's singleton.

I've also looked at the linked benchmarks (<https://pypy.org/posts/2019/12/hpy-kick-off-sprint-report-18...>). They date from 2019, five years ago, and already mention CPython's `METH_FASTCALL` fast calling convention, but it seems they don't compare against it. In any case, parsing arguments from one "ll" string specifier is hardly a detailed benchmark if the underlying magic isn't explained. I occasionally do one-off benchmarks as well, but it's better to describe the principle—why the thing is supposed to be faster. For example, if you're concerned about performance, you'd just parse the arguments directly from the tuple without string formatters—like this:

  <https://github.com/ashvardanian/SimSIMD/blob/80cc4bcaddbdee9a0c0e991e13376c234aff3b3f/python/lib.c#L929-L1066>
It’s more error-prone, but it would be cool to see if a high-level solution can achieve under a 10% latency penalty.
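
For readers who haven't seen the pattern, here is a minimal sketch of that approach (my illustration, not the SimSIMD code): with METH_FASTCALL the interpreter hands the C function a plain array of arguments plus a count, so no "ll" format string is involved:

  #include <Python.h>

  /* METH_FASTCALL passes arguments as a C array plus a count; we validate
     and convert them directly instead of calling PyArg_ParseTuple("ll"). */
  static PyObject *
  add_longs(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
  {
      if (nargs != 2) {
          PyErr_SetString(PyExc_TypeError, "add_longs() expects 2 arguments");
          return NULL;
      }
      long a = PyLong_AsLong(args[0]);
      if (a == -1 && PyErr_Occurred()) return NULL;
      long b = PyLong_AsLong(args[1]);
      if (b == -1 && PyErr_Occurred()) return NULL;
      return PyLong_FromLong(a + b);
  }

  static PyMethodDef methods[] = {
      {"add_longs", (PyCFunction)(void (*)(void))add_longs, METH_FASTCALL, NULL},
      {NULL, NULL, 0, NULL},
  };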

Hope this is useful :)

normanthreep 7 hours ago

Tangentially related question: is there something as simple as LuaJIT's FFI for Python? As in: give it a C header, load the shared library, and it simply makes structs usable and functions callable.

  • pkkm 2 hours ago

    cffi is closest to what you described.

  • nly 6 hours ago

    cppyy does this for C++

  • lukego 7 hours ago

    Yeah, cffi.

gghoop 6 hours ago

I'm interested in calling Go from Python; gopy generates Python bindings to cgo. Maybe HPy<->cgo would have less overhead.

  • masklinn an hour ago

    Use IPC. Go wilfully set itself apart from, and against, the C ABI; it’s generally not worth fighting that.

  • crabbone 5 hours ago

    It's a no-go at this point if you want this on MS Windows. CGo on MS Windows uses MinGW, while CPython uses MSVC, and it's very hard to make the two toolchains work together due to issues like name mangling.

    That is, you can do this for the Python from MSYS2, for example, but not for the one your users will likely have.

murkt 10 hours ago

Imagine how different the Python ecosystem could be if this had been done 20 years ago.

  • lifthrasiir 9 hours ago

    Unless it was done at the very beginning, I doubt it would even have been possible, because the current C API is a remnant of that very first public version.

  • foolfoolz 9 hours ago

    Python has one of the most fractured development ecosystems of any moderately used language. I'm pretty convinced Python is a language that attracts poor development practices and magnifies them due to its flexibility. The people who love it don't understand that the extreme flexibility makes it fragile at scale, and they are willing to put up with its annoyances in an almost Stockholm-syndrome way.

    • Quothling 8 hours ago

      I think any programming language with a lot of popularity attracts poor development practices, simply because a lot of programmers don't actually know the underlying processes of what they build. The flip side is that freedom and flexibility also give you a lot of control. Yes, it's very easy to write bad Python; in fact, that's probably one of Python's weaknesses, as you point out. If you're going to iterate over a bunch of elements, you probably expect your language's standard libraries to do it efficiently, and Python doesn't necessarily do that. What you gain from this flexibility (and, arguably, sometimes poor design) is that it's also possible to write really good Python and tailor it exactly to your needs. I think Python actually scales rather well. Django is a good example: it's a massive workhorse for a lot of the web (Instagram still uses its own version of it, as one example). It does so somewhat anonymously, similar to how PHP and Ruby operate outside the hype circle, but it does it.

      One of the advantages Python has, even when it's bad, is that it's often "good enough". 95% of the software that gets written is never really going to need to be extremely efficient. I would argue that in 2024, Go is actually the perfect combination of the good parts of Python and C. But those things aren't necessarily easy to get into if you're not familiar with memory management, (maybe) strict typing, explicit error handling, and the differences between an interpreted and a compiled language.

      Anyway, I don't think Python is any more annoying than any other language. The freedom it gives you needs to be reined in, and if you don't do that, you'll end up with a mess. A mess which is probably perfectly fine.

    • est 7 hours ago

      > most fractured development ecosystems of any moderately used language

      Can you elaborate? What's done wrong with Python and right with other "moderately used languages"?

      For a start, C/C++ doesn't even have an official ecosystem. Java and Golang look better only because their "ecosystems" don't always include native extensions like JNI or cgo. Once you add those, the complexity is no better than Python's.

      • rwmj 7 hours ago

        Python .pth files are horrific. Here's an actual .pth file I was dealing with the other day (from Google Cloud Storage) which completely prevents you from overriding the module using PYTHONPATH:

          import sys, types, os;has_mfs = sys.version_info > (3, 5);p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('google',));importlib = has_mfs and __import__('importlib.util');has_mfs and __import__('importlib.machinery');m = has_mfs and sys.modules.setdefault('google', importlib.util.module_from_spec(importlib.machinery.PathFinder.find_spec('google', [os.path.dirname(p)])));m = m or sys.modules.setdefault('google', types.ModuleType('google'));mp = (m or []) and m.__dict__.setdefault('__path__',[]);(p not in mp) and mp.append(p)
        • est 2 hours ago

          I agree those particular .pth files were horrific.

          But Python packages made by Google were notoriously bad. Their awfulness dates back to the GAE days.

        • talideon 6 hours ago

          If .pth files are the worst thing you can find to complain about, Python's doing pretty well. That horrific .pth file is better laid at the feet of its creators than at the mechanism itself.

          • rwmj 5 hours ago

            The fact they considered allowing executable code in path lookups shows a certain attitude.

            • oefrha 4 hours ago

              It shows that the language is highly dynamic and you can patch anything? The .pth mechanism allows the party controlling the Python installation (site) to run some init code before any user code, basically an rc mechanism. Nothing more, nothing radical. Maybe you’re unhappy with the dynamism, in which case your complaint is misplaced.

              • rwmj 28 minutes ago

                In this case it prevents someone from using PYTHONPATH to alter or override the order in which modules are loaded. Hard to justify that.

      • crabbone 5 hours ago

        You have the Anaconda packaging world vs PyPI. You have pyproject.toml for project management, which is not supported by Anaconda or the flagship documentation generation tool, Sphinx. You have half a dozen package installers, none of which work to the full extent / all have different problems. You have plenty of ways to install Python, all of which suck. You have plenty of ways to do some common tasks, such as GUI, web, and automation, and all of them suck in different ways, without a hint of a unifying link. Similarly, you have an allegedly common relational database interface, but the most commonly used SQL bindings don't use it. And the list goes on.

        • est 3 hours ago

          > You have the Anaconda packaging world vs PyPI

          As I said, it's only because .so extensions were hard. If every package were pure Python, I would simply copy-paste them into my source tree's `lib` path.

          Don't laugh at me: this is called "vendoring" or "static linking" in other languages, and the "requests" library famously included a vendored copy of urllib3 for quite a while.

        • Demiurge 2 hours ago

          > You have the Anaconda packaging world vs PyPI

          There is no fracture or "versus" here. You can pip install on top of Anaconda. Anaconda provides a more stringent solver and the OS-level packages that some pip-level modules often depend on; it just solves the integration problem. I use both, including requirements.txt in my Anaconda env.yml, all the time.

          > You have pyproject.toml for project management, which is not supported by Anaconda or the flagship documentation generation tool, Sphinx.

          Again, Anaconda is not a "standard" Python thing; it is a replacement for building OS-level packages, such as GDAL, and Python modules are just a subset of what it packages. Anaconda does not need to support standard Python tooling, because those Python tools exist outside of Anaconda.

          To simplify: for every Anaconda package you can likely find an equivalent on PyPI, but not every PyPI package can be found in conda. Anaconda is not a competitor to PyPI, and it does not need to replicate every PyPI feature.

          > You have plenty of ways to install Python, all of which suck.

          What does this actually mean? You can install Python with all the major OS installation methods, and absolutely none of them suck, any more than installing anything else on that OS does. The standard ways are the python.org installer, apt-get install, and brew install. Yes, you have additional options such as conda distros, but what exactly sucks about them? Nothing.

          > You have plenty of ways to do some common tasks, such as GUI, web, and automation, and all of them suck in different ways, without a hint of a unifying link.

          I think I'm starting to get it: everything sucks if you've been around long enough. Django is by far the most prevalent web framework. wxWidgets is standard, and there are bindings for most GUI toolkits. There are many toolkits; is it Python's fault that they were all invented by different organizations? Is it an interpreted language's responsibility to provide a cross-platform GUI toolkit for you?

          > Similarly, you have an allegedly common relational database interface, but the most commonly used SQL bindings don't use it.

          What are you even talking about? Who in the world cares about this? People use database-specific libraries, in every single language, because every database has its own set of features.

          > And the list goes on.

          Your list reeks of someone flinging critiques without even knowing what they’re talking about—just a lot of hot air fueled by emotional baggage, likely from some long-dead language you once cherished before it was mercifully abandoned.

    • Const-me 8 hours ago

      > a language that attracts poor development practices

      I agree, but note there’s another way to frame it: “Python can be used by people who aren’t professional software developers”.

    • _fizz_buzz_ 8 hours ago

      It’s also fractured because it has such a massive user base that uses it for very different applications with very different priorities.

    • miohtama 8 hours ago

      C/C++ is more fractured.

      While Python is fractured, its problems are nowhere near those of the C ecosystems.

      • rbanffy 5 hours ago

        As anyone who has tried to build multi-platform software in C or C++ can readily tell you.

        It's almost a relief that AIX, Solaris, and HP-UX are either very niche or going the way of the dodo.

    • poincaredisk 6 hours ago

      > The people who love it don't understand that the extreme flexibility makes it fragile at scale, and they are willing to put up with its annoyances in an almost Stockholm-syndrome way

      The people who love it understand that its extreme flexibility makes it applicable everywhere, while academic purity mostly doesn't work in the real world. They also prioritize getting things done over petty squabbling, but they know how to leverage the available tooling where reliability is crucial.

      (See, I can generalize too)

    • bvrmn 9 hours ago

      The reason is popularity, not a technical one. With popularity, it's inevitable that different parties take an interest in improving different parts of the ecosystem.

    • redman25 4 hours ago

      Python with types enforced by CI isn’t too bad. Or did you have something else in mind?

    • analog31 3 hours ago

      Would some other language have become just as fragmented if it had gained the same level of popularity across such a broad range of user interests?

    • jaimebuelta 6 hours ago

      There are only two kinds of languages: the ones people complain about and the ones nobody uses.

    • WhereIsTheTruth 6 hours ago

      It's not 'fractured', it's just fragmented, and that's not necessarily a bad thing: it gives plenty of room for R&D and experimentation.

      If something doesn't end up working well, you pivot.

  • amelius 4 hours ago

    It would have taken time to do this, and consequently Python would have missed the race; some other language would now be #1.

    • pkkm 2 hours ago

      > Python would have missed the race

      Why do you think that? There's no need for a Python 2->3 like transition here, it could have been done while supporting the old C API for a while.

    • murkt 4 hours ago

      Python missed the race pretty heavily with the 2to3 transition and still came out on top.

      • amelius 4 hours ago

        Survivorship bias. With version 2 they were already at the top.

trkannr 6 hours ago

After cpyext and cffi, this is the third attempt, largely driven by PyPy people, to get a C-API that people want to use.

If they succeed and keep the CPython "leaders" who ruined the development experience and social structure of CPython out of PyPy, PyPy might get interesting. If they don't keep them out, those "leaders" will merrily sink yet another project.

  • filmor 6 hours ago

    cffi replaces ctypes, which is a completely different thing, and cpyext is a reimplementation of the Python C-API, not an attempt at improving the API.

    HPy on CPython uses the existing C-API under the hood, so there is zero need to keep anyone out...

    • kagerl 5 hours ago

      cffi is used to wrap C libraries; only a masochist would use ctypes to wrap a whole library. While both are technically FFIs, it doesn't make sense to compare them. Conceptually, cffi was written to replace the C-API for C modules.

xiaodai 6 hours ago

Is this thing “official”?