feat: expanded on sections

This commit is contained in:
D. Moonfire 2022-05-15 01:19:58 -05:00
parent 40049d1f36
commit 3d934c68fb
5 changed files with 108 additions and 12 deletions


@ -1,9 +1,10 @@
{
"knownWords": [
"BitWarden",
"Moonfire",
"NonCommercial-ShareAlike",
"derecho",
"md"
]
}
"knownWords": [
"BitWarden",
"Moonfire",
"NonCommercial-ShareAlike",
"derecho",
"md",
"unreviewed"
]
}

20
scripts/dry.md Normal file

@ -0,0 +1,20 @@
---
title: Don't Repeat Yourself
---
According to Larry Wall, the creator of Perl, one of the three virtues of a good programmer is laziness. By this he means they look for ways of getting the desired effect with the least amount of code. Isolating code into a library also means that individual components can be tested for correctness and any package that depends on it can trust that it "just works" as intended.
That is why we use libraries. Even something as simple as "left pad" can be implemented in many different ways when the standard library doesn't provide it. Elegant implementations still add cognitive load when reading the code:
```C#
// Standard library: left-pads "0" with spaces to a width of five.
var leftPad1 = "0".PadLeft(5);
// Hand-rolled: prepend spaces, then trim back down to the last five characters.
var leftPad2 = (new string(' ', 5) + "0").Substring(1);
```
The first example above is more precise: it says what it does, and no more effort needs to be spent on it because padding isn't the point of the code. The second, while it also implements a left pad, takes a bit longer to understand and is harder to use correctly.
The same thing happens with using libraries to figure out ANSI codes, handle screen sizing in a console, implement SSL, or host an SSH server. We use libraries to isolate distinct portions of code, making our own code more effective by focusing only on what is important.
In my case, the bulk of my development work is putting different concepts (libraries) together into something I think is interesting and useful. A "new" pattern is hard and requires thought; composition is much easier and is where the ideas really shine. I also think most other coders are the same way: we put together blocks.
The drawback of this is that we build up a tree of dependencies. In some languages, it can get very deep, where a "simple" library ends up composing hundreds of other dependencies---directly or indirectly. When a [language doesn't provide good support](./counts/), the number of indirect dependencies increases exponentially. When we have different implementations of the same thing, like logging or REST calls, we also increase our dependencies.
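To make that tree concrete, here is a minimal sketch that counts direct versus transitive dependencies; the dependency map and package names are made up purely for illustration.

```C#
using System;
using System.Collections.Generic;

// Hypothetical dependency map: each package lists only its direct dependencies.
var directDeps = new Dictionary<string, string[]>
{
    ["my-app"] = new[] { "http-client", "logging" },
    ["http-client"] = new[] { "url-parser", "tls", "logging" },
    ["logging"] = new[] { "ansi-colors" },
    ["url-parser"] = new[] { "left-pad" },
    ["tls"] = Array.Empty<string>(),
    ["ansi-colors"] = Array.Empty<string>(),
    ["left-pad"] = Array.Empty<string>(),
};

// Walk the graph to collect every dependency, direct or indirect.
var seen = new HashSet<string>();
var stack = new Stack<string>(directDeps["my-app"]);

while (stack.Count > 0)
{
    var package = stack.Pop();
    if (!seen.Add(package)) continue;

    foreach (var dependency in directDeps[package])
        stack.Push(dependency);
}

// Prints: my-app: 2 direct dependencies, 6 total
Console.WriteLine($"my-app: 2 direct dependencies, {seen.Count} total");
```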

49
src/counts.md Normal file

@ -0,0 +1,49 @@
---
title: Package Counts
---
If you look at just about any package feed site, usually one of the first things listed is the number of packages. You can see it on [NuGet](https://nuget.org/) and [Crates.io](https://crates.io/). Some years ago, [NPM](https://npmjs.org/) used to show it as well, but it has since been taken off.
The number of packages is a selling point for developers: a large number is meant to indicate the vitality of an ecosystem or the general excitement around the language.
The problem is those counts are an example of [Goodhart's Law](https://en.wikipedia.org/wiki/Goodhart%27s_law):
> When a measure becomes a target, it ceases to be a good measure.
With self-serve feeds, the number of packages is unbounded and functions less as an indication of a thriving ecosystem and more as a simple function of time. There are a number of reasons for this, most of them predictable.
What the package counts try to sell is the number of "high quality" packages, ones that provide additional functionality or extend services. That is more difficult to quantify, simply because "quality" is subjective.
# Don't Repeat Yourself (DRY)
We have a term for avoiding writing the same code over and over again, [Don't Repeat Yourself](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself), which is why we end up creating packages when the language fails to provide the functionality.
Almost every language comes with a "base library" of features, either shipped alongside the language or baked directly into it. For languages that provide relatively few functions, developers end up creating packages to fill in the gaps.
Probably the most famous would be `left-pad`, which just added space or zero padding to a string in JavaScript. The key part is that JavaScript didn't, until recently, provide a way of doing a left pad easily, so it was up to developers to create their own. This meant there were implementations that used loops with buffers sitting next to implementations that concatenated strings and took substrings.
# Not Invented Here (NIH)
There are two aspects of Not Invented Here. The first is developers who want a favored tool or library, but in a new language. This gives us things like `log4net`, `log4perl`, and `log4r`. They have their place, but as the derivative library evolves with its language, it deviates from the source material. Knowing `log4j` doesn't mean you know all the details of `log4net`.
Some languages try to consolidate that by providing an "official" method of common functionality. Good examples are [Microsoft.Extensions.Logging](https://www.nuget.org/packages/Microsoft.Extensions.Logging) and Rust's [log](https://docs.rs/log/0.4.6/log/) crate (`log` isn't official, but it is close enough that it is effectively such).
From my experience, providing those official interfaces near the beginning has a significant impact on reducing the number of packages. Rust still has a number of logging libraries, but almost all of them funnel through the `log` crate abstraction.
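To illustrate how that funneling works, here is a minimal C# sketch of a library class written against the shared abstraction from Microsoft.Extensions.Logging; the class name and log message are made up for this example.

```C#
using Microsoft.Extensions.Logging;

// A library type that depends only on the ILogger<T> abstraction,
// never on a concrete logging implementation.
public class PackageScanner
{
    private readonly ILogger<PackageScanner> _logger;

    public PackageScanner(ILogger<PackageScanner> logger) => _logger = logger;

    public void Scan(string packageName)
    {
        // The host application decides where this goes: console, file,
        // Serilog, NLog, or anything else that plugs into the abstraction.
        _logger.LogInformation("Scanning {Package}", packageName);
    }
}
```

The host wires up a concrete provider once, and every library that follows this pattern shares it---the same role the `log` crate plays for Rust.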
# Base or Standard Libraries
There are some arguments that languages should provide more as part of their base or standard library. Delphi, the .NET Framework, and Java have rather extensive base class libraries (BCLs), which significantly reduce the number of packages.
However, there is also a drawback to this approach. A BCL is a foundation: functionality that cannot disappear on a whim, or even over a ten-year period, for a mature language. It creates a resistance to change and increases the maintenance burden for the language itself.
A good example is `java.awt` (the Abstract Window Toolkit). Almost everything has used Swing since Java 1.2, but AWT remains in the language as a legacy library. It has to be maintained, much like the .NET Framework's various `System.*` namespaces have to be maintained.
Nearer to my heart is WebForms in the .NET Framework. I support a WebForms project, and it is intimately tied into the BCL and the language. So when WebForms didn't move to .NET Core, I was left with a product that is nearly impossible to keep evolving.
I don't think we have a good word for these libraries, but extension libraries (ECL) work well when they are implemented by the core language team yet are not integrated into the language itself and have a well-defined life cycle, even if that life cycle is "currently recommended with no end-of-life in sight."
# Reinventing the Wheel (RTW)
In my early (okay, still current) development career, I have suffered from a need to reinvent the wheel. I wrote at least three command-line parsing libraries that worked the way "I wanted" or had the features I wanted. It took a conscious effort to settle on an existing one, even if it failed in some manner. That is why I used `CommandLineParser` in C# for so many years and then eventually gravitated to `System.CommandLine` (despite both of them still being fluid).
As developers, we create endless copies of our own versions. I made `MfGames.Templates` to take ideas from WebForms and JSP pages and build a string templating system that let me write C# code. Now we have Razor pages that do the same thing. But I still took the months to make it because it was a puzzle, it was fun, and I liked making it. If NuGet had been around at the time, I no doubt would have pushed it up in hopes someone would use it.


@ -1,7 +1,7 @@
---
title: Untrusted Projects
#date: 2022-04-01
#version: 0.0.1
version: 0.0.1
categories:
- Development
tags:
@ -9,12 +9,13 @@ tags:
- Typescript
- Rust
- Semantic Releases
- Packaging
summary: >
One person's idea of how to handle malicious or unreviewed packages across most languages.
---
The open-source ecosystem is huge, with thousands upon thousands of developers creating billions of projects across multiple languages. Most of the time, these packages are pushed up to centralized sites for discovery and download. Because of that scope, there are almost no reviews of the individual packages, nor is there a decentralized way of identifying the malicious implementations out there.
The open-source ecosystem is huge, with thousands upon thousands of developers creating billions of projects across multiple languages. Most of the time, these packages are pushed up to centralized sites for discovery and download with no human oversight.
This is the crux of the problem. As an ecosystem acquires [more packages](./counts.md), there is always a risk of [a malicious developer](https://psychopathyis.org/stats/) creating a package to benefit them in some manner. It might be stealing information, protesting [current events](https://www.theregister.com/2022/03/18/protestware_javascript_node_ipc/), [making money](https://securityintelligence.com/news/popular-javascript-library-for-node-js-infected-with-malware-to-empty-bitcoin-wallets/), or simply destroying things. But those individual packages are difficult to detect, even more so when other developers are mandated with keeping packages up to date or the package itself is nested as a dependency of another one.
This is the crux of the problem. As an ecosystem acquires [more packages](./counts.md) managed by [self-serve](./self-serve/) systems, there is always a risk of [a malicious developer](https://psychopathyis.org/stats/) creating a package to benefit them in some manner. It might be stealing information, protesting [current events](https://www.theregister.com/2022/03/18/protestware_javascript_node_ipc/), [making money](https://securityintelligence.com/news/popular-javascript-library-for-node-js-infected-with-malware-to-empty-bitcoin-wallets/), or simply destroying things. But those individual packages are difficult to detect, even more so when other developers are mandated with keeping packages up to date or the package itself is nested as a dependency of another one.
What this [plot](/garden/) does is list one possible approach to handling this problem, along with some suggestions and next steps, because complaining about a system without proposing one isn't very productive. Naturally, this is an attempt to create a [standard](https://www.explainxkcd.com/wiki/index.php/927:_Standards), but one that I think needs to be done sooner or later.
What this [plot](//d.moonfire.us/garden/) does is list one possible approach to handling this problem, along with some suggestions and next steps, because complaining about a system without proposing one isn't very productive. Naturally, if this is productive, then it would be an attempt to create a [standard](https://www.explainxkcd.com/wiki/index.php/927:_Standards), but one that I think needs to be done sooner or later, by one method or another.

25
src/self-serve.md Normal file

@ -0,0 +1,25 @@
---
title: Self-Serve
---
Outside of a single developer or team, [DRY](./dry/) means that there needs to be a mechanism for discovering and using others' work so we avoid repeating it ourselves. That is the basis of package ecosystems.
The ecosystem has three components:
- Discovery: How to find that someone has created something.
- Details: Information about the package, such as dependencies.
- Download: How to find and download the package.
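As a rough sketch of how those three roles could be separated in code (the interface and type names here are hypothetical, not any real feed's API):

```C#
using System.Collections.Generic;
using System.IO;

// Hypothetical interface splitting the three roles of a package ecosystem.
public interface IPackageFeed
{
    // Discovery: find out that someone has created something.
    IEnumerable<string> Search(string query);

    // Details: information about a package, such as its dependencies.
    PackageInfo GetInfo(string packageId, string version);

    // Download: fetch the actual package contents.
    Stream Download(string packageId, string version);
}

public record PackageInfo(string Id, string Version, string[] Dependencies);
```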
The easiest way to implement these is to create a single service that does all three, showcasing the packaging system, which naturally becomes the go-to place to get packages. Almost always, that also means most ecosystems consolidate on (or never move beyond) a single source that provides everything.
That is our npmjs.org, nuget.org, and crates.io.
Since these are the showcases for the package system, they start with the need to bootstrap themselves and to reduce the effort of producing packages for a new ecosystem. This means we get a self-serve system where any developer can upload a package for others to discover and use.
With continual development, [packages are built on packages](./counts/) that are uploaded by hundreds of different developers. Trying to review or address every single one would be overwhelming for individuals to perform on their own, and there is little profit in it for most companies. So these packages get uploaded with little limitation and are made available as soon as they arrive.
That is where our malicious packages come in. It doesn't take much to upload one malicious package and have it dropped into place. For a mature package, that one package may be the foundation for countless other projects that touch every part of the globe.
That is how `node-ipc` and `left-pad` caused so much damage in so little time.
The retrospectives after those events led to changes that slowed the influx of new packages or added the need to scan them. That's how we got Dependabot scanning and why NuGet requires an SSL certificate to upload packages (and why I don't upload my C# packages there; I have been unable to afford getting one).