Get started by editing app/page.tsx
Following this video "Install Git, Clone a project (Windows)" https://www.youtube.com/watch?v=Jge57UHvYto&list=WL&index=3 for Git installation on Windows 10 worked for me.
Back to .NET Basics: How to properly use HttpClient
https://www.tpeczek.com/2019/10/alternative-approach-to-httpclient-in.html
Alternative approach to HttpClient in AZURE Functions 2.0 revisited – move to dependency injection
Hi there 👋, I'm Tomasz Pęczek
Software Developer & Architect • Blogger • Speaker • OSS Contributor • Microsoft MVP
Asp.Net Core Authorization Made Easy
https://github.com/danpdc/aspNetCoreBeginners
Likes: 254
This second lesson of the “ASP.Net Core for beginners” course is almost entirely hands-on. We will start implementing our API by first setting up our project structure according to a layered architecture. We will implement our first endpoints and then talk about dependency injection.
Create class library for DAL, Domain, Services, project for WebAPI.
https://www.codeproject.com/Articles/1162957/Dependency-Injection-to-The-Core
We can go further and build a composition root where we can do this kind of type initializations. The composition root is a simple class which will get called when the application is first initialized. In that class, we can resolve the specific types for our abstractions.
As you can see here, we have configured our HttpRequest in such a way that if someone requests the TodoController, we will instantiate a new instance of TodoRepository and pass it to the controller's constructor. Likewise, we can change it to TodoCSVRepository or TodoInMemoryRepository whenever we feel like it. Now we have a single place to do all our dirty work of type initialization. In Web API projects we have to register this composition root in the Global.asax.cs file like this,
public class CompositionRoot : IHttpControllerActivator
{
    public IHttpController Create(
        HttpRequestMessage request,
        HttpControllerDescriptor controllerDescriptor,
        Type controllerType)
    {
        if (controllerType == typeof(TodoController))
            return new TodoController(
                new TodoRepository());

        return null;
    }
}
One thing to remember here is that a composition root implementation varies from framework to framework. Here we are working with Web API, so this won't necessarily work for MVC or WPF applications. You have to find a way to implement those, but don't worry, because the internet has a lot of code snippets that can ease the task of creating a composition root for a specific framework. Tell you what, I learned how to make a composition root like this for Web API projects after reading a blog from Mark Seemann. Here is the link for it,
http://blog.ploeh.dk/2012/09/28/DependencyInjectionandLifetimeManagementwithASP.NETWebAPI/
That’s fine since we only have to resolve tiny dependencies in this small project of ours. What if we have a large project where hundreds of dependencies are scattered around? In that case, a hand-written composition root won’t be a good idea. That is why, at the enterprise level, we use a well-known IoC (Inversion of Control) container to make our job easy. IoC containers can resolve dependencies recursively and they are also pretty easy to configure. They allow us to work with the dependency injection lifecycle easily. There are many IoC containers available and most of them do the same things somewhat differently. Let’s use one of them in our current project. Let’s pick Autofac, which has great documentation online. Here is the link where you can learn all about Autofac integration with Web API projects,
http://autofac.readthedocs.io/en/latest/integration/webapi.html
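As a sketch of what that integration looks like (using the documented types from the Autofac.WebApi2 package; ITodoRepository is an assumed abstraction, since the article's controller takes TodoRepository directly):

```csharp
using System.Reflection;
using System.Web.Http;
using Autofac;
using Autofac.Integration.WebApi;

var builder = new ContainerBuilder();

// Register all Web API controllers found in this assembly.
builder.RegisterApiControllers( Assembly.GetExecutingAssembly() );

// Register the dependency; swap in TodoCSVRepository or TodoInMemoryRepository here.
builder.RegisterType<TodoRepository>().As<ITodoRepository>();

var container = builder.Build();

// Replace the default resolver so Web API asks Autofac for controller dependencies.
GlobalConfiguration.Configuration.DependencyResolver =
    new AutofacWebApiDependencyResolver( container );
```

Unlike the hand-written composition root, this resolves the whole dependency graph recursively, so nested dependencies don't need to be wired up by hand.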
https://stackoverflow.com/users/2281790/harald-coppoolse
About
Taught by Professor Dijkstra, so I'm conditioned like a Pavlov dog to make software how it "ought" to be done, not to go for fast hacks, which sometimes is a drawback, except if you're working on projects with a lot of people that need to keep working for years and years, continuously changing them.
- "Finally a language that is well designed".
- Hurray! Never have to use a macro again
https://github.com/MikaelGRA/InfluxDB.Client
About
InfluxDB Client for .NET. InfluxDB is a popular time-series database used in many .NET applications.
InfluxDB Client for .NET
This library makes it easy to be a client for InfluxDB on .NET! The basic idea behind the library is that it should be able to turn queries directly into objects of your own classes, much like micro-ORMs such as Dapper.
The goal is that we want to be able to support LINQ syntax in the future.
Installation
Install it through nuget with the following command.
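The command itself isn't included in this excerpt. Assuming the package id on NuGet is Vibrant.InfluxDB.Client (worth verifying against the repository's README), it would be:

```
> Install-Package Vibrant.InfluxDB.Client
```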
Reading/Writing
The library exposes all HTTP operations on InfluxDB (1.0+) and can be used for reading/writing data to/from it in two primary ways:
Using your Own POCO classes
Start by defining a class that represents a row in InfluxDB that you want to store.
public class ComputerInfo
{
   [InfluxTimestamp]
   public DateTime Timestamp { get; set; }

   [InfluxTag( "host" )]
   public string Host { get; set; }

   [InfluxTag( "region" )]
   public string Region { get; set; }

   [InfluxField( "cpu" )]
   public double CPU { get; set; }

   [InfluxField( "ram" )]
   public long RAM { get; set; }
}
On your POCO class you must specify the following:
- 1. One property of type DateTime, DateTime?, DateTimeOffset or DateTimeOffset? with the [InfluxTimestamp] attribute, to be used as the timestamp in InfluxDB.
- 2. Zero or more properties of type string, long, ulong, int, uint, short, ushort, byte, sbyte, double, float, bool, DateTime, DateTimeOffset, decimal or a user-defined enum (nullables too) with the [InfluxTag] attribute that InfluxDB will use as indexed tags. Note that all tags in InfluxDB are still stored as strings; the library simply makes the conversion to the specified type automatically.
- 3. One or more properties of the same types as above with the [InfluxField] attribute that InfluxDB will use as fields.
Once you've defined your class, you're ready to use the InfluxClient, which is the main entry point to the API:
Here's how to write to the database:
private ComputerInfo[] CreateTypedRowsStartingAt( DateTime start, int rows )
{
   var rng = new Random();
   var regions = new[] { "west-eu", "north-eu", "west-us", "east-us", "asia" };
   var hosts = new[] { "some-host", "some-other-host" };

   var timestamp = start;
   var infos = new ComputerInfo[ rows ];
   for ( int i = 0 ; i < rows ; i++ )
   {
      long ram = rng.Next( int.MaxValue );
      double cpu = rng.NextDouble();
      string region = regions[ rng.Next( regions.Length ) ];
      string host = hosts[ rng.Next( hosts.Length ) ];

      var info = new ComputerInfo { Timestamp = timestamp, CPU = cpu, RAM = ram, Host = host, Region = region };
      infos[ i ] = info;

      timestamp = timestamp.AddSeconds( 1 );
   }

   return infos;
}

public async Task Should_Write_Typed_Rows_To_Database()
{
   var client = new InfluxClient( new Uri( "http://localhost:8086" ) );
   var infos = CreateTypedRowsStartingAt( new DateTime( 2010, 1, 1, 1, 1, 1, DateTimeKind.Utc ), 500 );
   await client.WriteAsync( "mydb", "myMeasurementName", infos );
}
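Reading is symmetrical; here is a sketch based on the query API shown in the project's README (method and property names are worth verifying against the library version you install):

```csharp
public async Task Should_Query_Typed_Rows_From_Database()
{
   var client = new InfluxClient( new Uri( "http://localhost:8086" ) );

   // A result set contains one Result per query; each Result contains
   // one Series per GROUP BY grouping (or a single Series without one).
   var resultSet = await client.ReadAsync<ComputerInfo>( "mydb", "SELECT * FROM myMeasurementName" );

   var result = resultSet.Results[ 0 ];
   var series = result.Series[ 0 ];

   // series.Rows is the list of ComputerInfo objects materialized from the query.
   foreach ( var row in series.Rows )
   {
      Console.WriteLine( $"{row.Timestamp}: CPU={row.CPU}, RAM={row.RAM}" );
   }
}
```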
Author of Blog: Mikael Guldborg Rask Andersen
Stars: 103
Pull data from multiple tables in one SQL query using LINQ and Entity Framework(Core)
Answered by: https://stackoverflow.com/users/2281790/harald-coppoolse
I wanted to grab the 10 latest transactions and 10 latest customers in one LINQ query
It is a bit unclear what you want. I doubt that you want one sequence with a mix of Customers and Transactions. I guess that you want the 10 newest Customers, each with their last 10 Transactions?
I wonder why you would deviate from the entity framework code first conventions. If your class Customer represents a row in your database, then surely it doesn't have a HashSet<Transaction>?
A one-to-many of a Customer with his Transactions should be modeled as follows:
class Customer
{
    public int Id {get; set;}
    ... // other properties

    // every Customer has zero or more Transactions (one-to-many)
    public virtual ICollection<Transaction> Transactions {get; set;}
}

class Transaction
{
    public int Id {get; set;}
    ... // other properties

    // every Transaction belongs to exactly one Customer, using foreign key
    public int CustomerId {get; set;}
    public virtual Customer Customer {get; set;}
}

public class MyDbContext : DbContext
{
    public DbSet<Customer> Customers {get; set;}
    public DbSet<Transaction> Transactions {get; set;}
}
This is all that entity framework needs to know to detect the tables you want to create, to detect your one-to-many relationship, and to detect the primary keys and foreign keys. Only if you want different names for tables or columns will you need attributes and/or fluent API. The major differences between my classes and yours are that the one-to-many relation is represented by virtual properties, and that the HashSet is an ICollection. After all, your Transactions table is a collection of rows, not a HashSet.
In entity framework the columns of your tables are represented by non-virtual properties; the virtual properties represent the relations between the tables (one-to-many, many-to-many, ...)
Quite a lot of people tend to (group-)join tables, when they are using entity framework. However, life is much easier if you use the virtual properties
Back to your question
I want (some properties of) the 10 newest Customers, each with (several properties of) their 10 latest Transactions
var query = dbContext.Customers                              // from the collection of Customers
    .OrderByDescending(customer => customer.Created)         // order it by descending creation date
    .Select(customer => new                                  // from every Customer select the
    {                                                        // following properties
        // select only the properties you actually plan to use
        Id = customer.Id,
        Created = customer.Created,
        Name = customer.Name,
        ...

        LatestTransactions = customer.Transactions           // order the customer's collection
            .OrderByDescending(transaction => transaction.Created) // of Transactions, newest first
            .Select(transaction => new                       // and select the properties
            {
                // again: select only the properties you plan to use
                Id = transaction.Id,
                Created = transaction.Created,
                ...

                // not needed: you know it equals the Customer's Id
                // CustomerId = transaction.CustomerId,
            })
            .Take(10)                                        // take only the first 10 Transactions
            .ToList(),
    })
    .Take(10);                                               // take only the first 10 Customers
Entity framework knows the one-to-many relationship and recognizes that a group-join is needed for this.
One of the slower parts of your query is the transfer of the selected data from the DBMS to your local process. Hence it is wise to limit the selected data to the data you actually plan to use. If Customer with Id 4 has 1000 Transactions, it would be a waste to transfer the foreign key for every Transaction, because you know it has value 4.
https://www.entityframeworktutorial.net/code-first/code-first-conventions.aspx
Extra:These EF 6.x Code-First conventions are defined in the System.Data.Entity.ModelConfiguration.Conventions namespace.
The following table lists the default code first conventions:

| Default Convention For | Description |
|---|---|
| Schema | EF creates all the DB objects in the dbo schema. |
| Table Name | By default, EF will create a DB table with the entity class name suffixed by 's', e.g. the Student domain class (entity) would map to the Students table. |
| Primary Key Name | By default, EF will look for a property named Id or <Entity Class Name> + "Id" (case insensitive) to use as the primary key. |
| Foreign Key Property Name | By default, EF will look for a foreign key property with the same name as the principal entity's primary key name. If the foreign key property does not exist, then EF will create an FK column in the DB table named <Dependent Navigation Property Name> + "_" + <Principal Entity Primary Key Property Name>, e.g. EF will create a Grade_GradeId foreign key column in the Students table if the Student entity does not contain a foreign key property for Grade. |
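To illustrate the foreign key convention, here is a minimal sketch of the tutorial's Student/Grade example (the class shapes are mine):

```csharp
public class Grade
{
    // Primary key by convention: <class name> + "Id"
    public int GradeId { get; set; }
    public string GradeName { get; set; }
}

public class Student
{
    public int StudentId { get; set; }
    public string StudentName { get; set; }

    // Navigation property with no matching foreign key property on Student:
    // EF creates a "Grade_GradeId" FK column in the Students table.
    public Grade Grade { get; set; }
}
```

Adding a `public int GradeId { get; set; }` property to Student would instead make EF use that as the FK column.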
Extra_End
https://github.com/dotnet/EntityFramework.Docs
Stars: 1.5k
title: One-to-many relationships - EF Core
description: How to configure one-to-many relationships between entity types when using Entity Framework Core
author: ajcvickers
ms.date: 03/30/2023
uid: core/modeling/relationships/one-to-many
Extra_End
https://codeblog.jonskeet.uk/2019/03/27/storing-utc-is-not-a-silver-bullet/ 27/03/2019
Extra:STORING UTC IS NOT A SILVER BULLET
Note: this is a pretty long post. If you are not interested in the details, the conclusion at the bottom is intended to be read in a standalone fashion. There is also a related blog post by Lau Taarnskov https://www.creativedeletion.com/2015/03/19/persisting_future_datetimes.html - if you find this one difficult to read for whatever reason, maybe give that a try.
When I read Stack Overflow questions involving time zones, there's almost always someone giving the advice to only ever store UTC. Convert to UTC as soon as you can, and convert back to a target time zone as late as you can, for display purposes, and you'll never have a time zone issue again, they say.
This blog post is intended to provide a counterpoint to that advice. I'm certainly not saying storing UTC is always the wrong thing to do, but it's not always the right thing to do either. Note on simplifications: this blog post does not go into supporting non-Gregorian calendar systems, or leap seconds. Hopefully developers writing applications which need to support either of those are already aware of their requirements.
Background: EU time zone rule changes
The timing of this blog post is due to recent European Parliament proceedings that look like they will probably end the clocks changing twice a year into “summer time” or “winter time” within EU member states. The precise details are yet to be finalized and are unimportant to the bigger point, but for the purpose of this blog post I'll assume that each member state has to decide whether they will “spring forward” one last time on March 28th 2021, then staying in permanent “summer time”, or “fall back” one last time on October 31st 2021, then staying in permanent “winter time”. So from November 1st 2021 onwards, the UTC offset of each country will be fixed – but there may be countries which currently always have the same offset as each other, and will have different offsets from some point in 2021. (For example, France could use winter time and Germany could use summer time.)
The larger point is that time zone rules change, and that applications should expect that they will change. This isn't a corner case, it's the normal way things work. There are usually multiple sets of rule changes (as released by IANA) each year. At least in the European changes, we're likely to have a long notice period. That often isn't the case – sometimes we don't find out about rule changes until a few days before they happen.
Extra_End
wesdoyle followed Shay Rojansky (roji) 04/01/2024. Principal software engineer working on .NET data access and perf, member of the Entity Framework team at Microsoft. Lead dev of Npgsql, the PostgreSQL provider.
Extra:When “UTC everywhere” isn't enough
I've been dealing a lot with timestamps, timezones and database recently - especially on PostgreSQL (see this blog post), but also in general. Recently, on the Entity Framework Core community standup, we also hosted Jon Skeet and chatted about NodaTime, timestamps, time zones, UTC and how they all relate to databases - I highly recommend watching that!
Now, a lot has been said about “UTC everywhere”; according to this pattern, all date/time representations in your system should always be in UTC, and if you get a local timestamp externally (e.g. from a user), you convert it to UTC as early as possible. The idea is to quickly clear away all the icky timezone-related problems, and to have a UTC-only nirvana from that point on. While this works well for many cases - e.g. when you just want to record when something happened in the global timeline - it is not a silver bullet, and you should think carefully about it. Jon Skeet already explained this better than I could, so go read his blog post on this. As a very short tl;dr, time zone conversion rules may change after the moment you perform the conversion, so the user-provided local timestamp (and time zone) may start converting to a different UTC timestamp at some point! As a result, for events which take place on a specific time in a specific time zone, it's better to store the local timestamp and the time zone (not offset!).
So let's continue Jon's blog post, and see how to actually perform that on two real databases - PostgreSQL and SQL Server. Following Jon's preferred option, we want to store the following in the database:
- 1. The user-provided local timestamp.
- 2. The user-provided time zone ID. This is not an offset, but rather a daylight savings-aware time zone, represented as a string.
- 3. A UTC timestamp that's computed (or generated) from the above two values. This can be used to order the rows by their occurrence on the global timeline, and can even be indexed.
In Jon's NodaTime library, the ZonedDateTime type precisely represents the first two values above. Unfortunately, databases typically don't have such a type; SQL Server does have datetimeoffset, but an offset is not a time zone (it isn't daylight savings-aware). So we must use separate columns to represent the data above.
PostgreSQL
PostgreSQL conveniently has a type called timestamp without time zone for local timestamps in an unspecified time zone, and a badly-named type called timestamp with time zone, for UTC timestamps (no time zone is actually persisted); those are perfect for our two timestamps. We also want the UTC timestamp to be generated from the two other values, so we’ll set up a PostgreSQL generated column (called computed column by EF Core) to do that. Here's the minimal EF Core model and context, using the NodaTime plugin:
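The model and context code isn't included in this excerpt; a minimal sketch along the lines described (the entity and property names are mine, and the computed-column SQL is an assumption to illustrate the idea) might look like:

```csharp
using Microsoft.EntityFrameworkCore;
using NodaTime;

public class Meeting
{
    public int Id { get; set; }

    // 1. The user-provided local timestamp (PostgreSQL: timestamp without time zone).
    public LocalDateTime LocalStart { get; set; }

    // 2. The user-provided IANA time zone ID, e.g. "Europe/Berlin".
    public string TimeZoneId { get; set; }

    // 3. UTC timestamp generated by the database from the two values above
    //    (PostgreSQL: timestamp with time zone); orderable and indexable.
    public Instant UtcStart { get; private set; }
}

public class MeetingContext : DbContext
{
    public DbSet<Meeting> Meetings { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseNpgsql("Host=localhost;Database=meetings", o => o.UseNodaTime());

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Stored generated column: AT TIME ZONE interprets the local timestamp
        // in the given zone and yields a UTC timestamptz.
        modelBuilder.Entity<Meeting>()
            .Property(m => m.UtcStart)
            .HasComputedColumnSql(@"""LocalStart"" AT TIME ZONE ""TimeZoneId""", stored: true);
    }
}
```

Because the column is computed inside the database, a future update of the time zone rules only requires recomputing the column, not rewriting application data.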
Extra_End
Upgrade .NET version of XXX Server Components
Extra:To support TLS 1.2 capabilities, the recommendation is to upgrade the app to .NET Framework 4.7 or a later version.
We only upgrade the XXX Server components to .NET 4.7.2; the XXX Agent stays on .NET 4.5.2.
Technical Description
XXX Management Service, XXX Management Tool and the Configuration Loader Component will be retargeted from .NET 4.5.2 to .NET 4.7.2
All associated query/third-party open-source packages will be updated to the right version and target framework.
For .NET Framework 3.5 - 4.5.2 and not using WCF
We recommend you upgrade your app to .NET Framework 4.7 or a later version. If you cannot upgrade, take the following steps:
- 1. Set the values of the SchUseStrongCrypto and SystemDefaultTlsVersions registry entries to 1. See Configuring security via the Windows Registry. .NET Framework 3.5 supports the SchUseStrongCrypto flag only when an explicit TLS value is passed.
- 2. If you're running on .NET Framework 3.5, you need to install a hot patch so that TLS 1.2 can be specified by your program:
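For reference, the registry entries from step 1 can be applied with a .reg file like the following (hive paths follow Microsoft's TLS best-practices document; for .NET Framework 3.5 the corresponding key is under v2.0.50727 instead):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319]
"SchUseStrongCrypto"=dword:00000001
"SystemDefaultTlsVersions"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319]
"SchUseStrongCrypto"=dword:00000001
"SystemDefaultTlsVersions"=dword:00000001
```

The Wow6432Node hive covers 32-bit processes on a 64-bit OS; a reboot (or at least an app restart) is needed for the settings to take effect.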
Extra_End
Enabling TLS 1.2 on your .NET application by Dominic Burford 12/12/2019
I recently came across an issue with several of our ASP.NET WebAPI services which were consuming a third-party set of APIs. These third-party APIs were configured to disable any requests from clients that were using TLS 1.0/1.1. Unfortunately, this included our own APIs. All requests to the third-party API were returning empty responses. After some discussion with one of the developers of the third-party APIs, he suggested the issue may be related to TLS 1.2 not being supported as he had seen the issue before.
Extra:Claps: 25
The Transport Layer Security (TLS) protocol is an industry standard designed to help protect the privacy of information communicated over the Internet. TLS 1.2 is a standard that provides security improvements over previous versions. TLS 1.2 will eventually be replaced by the newest released standard TLS 1.3 which is faster and has improved security.
- Transport Layer Security (TLS) best practices with the .NET Framework | Microsoft Docs
I was able to run the third-party APIs from our local test environment, but not when I ran them from our staging / production environments which were hosted on Azure. I had to make several changes, including code changes to the ASP.NET WebAPI services and changes to our Azure hosting environments.
As many current servers are moving towards TLS 1.2/1.3 and removing support for TLS 1.0 /1.1, connectivity issues between newer servers and older (legacy) .NET applications are becoming more common. Installing a newer version of the .NET Framework onto your development environment is not the answer. The solution is down to the version of the .NET Framework used for compiling your project. This is what actually matters when it comes to selecting the supported TLS version during the TLS handshake.
In this article I will describe the changes I have made to our Azure hosting (where our ASP.NET WebAPIs are hosted) and the code changes which enabled TLS 1.2 support.
Upgrading our Azure hosting to support TLS 1.2
More accurately the changes I have made to our Azure hosting have removed support for earlier versions of TLS i.e. TLS 1.0/1.1. Although this change was not strictly necessary to fix the problem I was experiencing, it was appropriate in terms of tightening up the security of our ASP.NET WebAPIs and to ensure that our own APIs can only be accessed by clients that support TLS 1.2. This is quite simply achieved by opening the Azure portal and navigating to the App Service hosting. From there the TLS/SSL Settings blade can be selected.
I have set this to TLS 1.2 for both our staging and production environments. This sets the minimum TLS version. Therefore our hosting environments will no longer accept requests from earlier versions of TLS.
Code changes to support TLS 1.2
Depending on what version of .NET Framework your project uses will dictate the possible solutions available to you. If your project compiles against .NET Framework >= 4.7 then you are already good to go. Applications developed in .NET Framework 4.7 or greater automatically default to whatever the operating system they run on considers safe (which currently is TLS 1.2 and will later include TLS 1.3).
If your application has been developed in a version of the .NET Framework prior to 4.7 then you have two options.
- Recompile your application using .NET Framework 4.7 or greater.
- If recompiling your application is not something you can do, then you can update your .config file by adding the following.
<configuration>
<runtime>
<AppContextSwitchOverrides value="Switch.System.Net.DontEnableSystemDefaultTlsVersions=false"/>
</runtime>
</configuration>
Also make sure you have the following set in your .config file.
<system.web>
<compilation targetFramework="x.y.z" />
<httpRuntime targetFramework="x.y.z" /> <!-- this is the important one! -->
</system.web>
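If neither recompiling nor config changes are possible, the protocol can also be opted into in code at start-up. This is the commonly cited fallback for pre-4.7 frameworks (the config-based switches above are generally preferred because they track OS defaults):

```csharp
using System.Net;

// Somewhere early in start-up, e.g. Application_Start in Global.asax:
// opt in to TLS 1.2 without clearing any protocols already enabled.
ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
```

Note the `|=` rather than `=`: assigning outright would silently disable every other protocol the OS had enabled.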
Extra_End
Setting up ASP.NET Core dev certs for both WSL and Windows 04/05/2021
Extra:For those of you who haven't read the old post, here is some background information. If you ever want to do your ASP.NET Core development using both WSL (using the Remote - WSL extension) and Windows, you will soon realize that there are some inherent issues with the local development certs… Mainly that ASP.NET Core sets up one development certificate in Windows, and one in Linux. And neither environment trusts the other. Not to mention that Linux doesn't even trust its own cert, making server to server communication hard in Linux.
Unfortunately, the tools provided by dotnet don't quite seem to do the trick when trying to get mutual trust to work. I'm not sure why, but at least on my machine, any cert generated by dotnet has problems when it comes to being trusted in Linux. So because of this, there are a few hoops we need to jump through to get this to work…
Note: My guess is that Linux requires a CA to issue the cert to be able to trust it. However, the cert generated by .NET is not a properly issued cert with a CA as this has some inherent dangers… Dangers I will ignore on my development box, and try to mitigate by keeping my cert VERY safe
In the previous post, there were quite a few steps involved in getting it to work. However, it can be simplified a bit at least…and made to work…
Dev certs in Linux
When you install the .NET SDK, an ASP.NET developer certificate is generated and configured for use by ASP.NET. However, it doesn't seem like that cert is being properly trusted by Linux, causing server to server communication to fail. Because of this, we need to generate our own self-signed cert. Luckily, this isn't too hard with the help of Google…
The first step is to create an OpenSSL configuration file that looks like this
[req]
prompt = no
default_bits = 2048
distinguished_name = subject
req_extensions = req_ext
x509_extensions = x509_ext
[subject]
commonName = localhost
[req_ext]
basicConstraints = critical, CA:true
subjectAltName = @alt_names
[x509_ext]
basicConstraints = critical, CA:true
keyUsage = critical, keyCertSign, cRLSign, digitalSignature,keyEncipherment
extendedKeyUsage = critical, serverAuth
subjectAltName = critical, @alt_names
1.3.6.1.4.1.311.84.1.1 = ASN1:UTF8String:ASP.NET Core HTTPS development certificate
[alt_names]
DNS.1 = localhost
IP.1 = 127.0.0.1
Note: This config creates a certificate that is both a CA and an SSL cert. The reason for this is that Linux needs a CA cert to be able to trust it.
Once we have our OpenSSL configuration, we can go ahead and generate our certificate by running
> openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout localhost.key \
-out localhost.crt \
-config localhost.conf
This generates a key pair based on the configuration file we just created.
Just for the sake of it, we can verify that it currently isn't trusted, by running
> openssl verify localhost.crt
CN = localhost
error 18 at 0 depth lookup: self signed certificate
error localhost.crt: verification failed
To trust the cert, we need to copy it to /usr/local/share/ca-certificates, and add it to our trusted CAs. Like this
> sudo cp localhost.crt /usr/local/share/ca-certificates
> sudo update-ca-certificates
After that, we should be able to verify that the cert is trusted by running openssl verify again
> openssl verify localhost.crt
localhost.crt: OK
The last step is to make sure that ASP.NET Core uses this certificate when SSL is turned on. This is easily done by running the dotnet dev-certs command. However, this command expects a PKCS12 certificate. So before we can do that, we need to convert our key pair into a PKCS12 cert using OpenSSL
openssl pkcs12 -export -out localhost.pfx -inkey localhost.key -in localhost.crt
Once we have our PFX file, we can finally tell ASP.NET to use it by running
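The command itself is elided in this excerpt; based on the options dotnet dev-certs https provides in .NET 5+ SDKs (--clean and --import), it would look something like this, with path and password as placeholders:

```
> dotnet dev-certs https --clean --import localhost.pfx --password <your-password>
```

--clean removes the existing auto-generated dev cert first, so ASP.NET Core picks up the imported one.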
Remember: Make sure it is a secure password! If anyone should get their hands on this cert, it could be used to launch a malicious attack against your machine. So keeping it safe is VERY important! And also make sure that the key pair is stored in a safe place.
Ok, that's it from the Linux side…
Dev certs in Windows
When you install the .NET Core SDK on Windows, it creates a development HTTPS certificate for you automatically just like it does in Linux. All you have to do is run dotnet dev-certs https --trust to trust it and you are good to go! However, in this case we want to replace that cert with the cert we just created…
The first step in doing this, is to get hold of the PFX file we just generated. On my machine, I copy it to a user specific folder by running the following command in WSL
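The copy command isn't included in this excerpt; a hedged example (the Windows-side destination is a placeholder for wherever you keep your certs):

```
> cp localhost.pfx /mnt/c/Users/<your-user>/certs/
```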
Author of Blog: Chris Klug
If you have any comments or questions, feel free to reach out at @ZeroKoll!
Extra_End
Authors: Libo Chen, Yihang Xia, Zhenbang Ma, Ruijie Zhao, Yanhao Wang, Yue Liu, Wenqi Sun, Zhi Xue
Citations: 6
Extra_End
wilsonmar (Wilson Mar) (github.com) followed @spotakash 24/10/2021
Azure Cross Tenant Access (Authentication, Authorization, Private Communication).
Extra:There are two independent Azure tenants, across which we shall try to build authentication and authorization. Subsequently, we shall have secure private connectivity between both tenants so that communication does not traverse the internet and remains private.
- 1. Source Tenant: Tenant which is the Central Identity Store. This identity store (Azure AD) will create and store Service Principals. In the Source Tenant, we are processing data residing in the Destination Tenant.
- 2. Destination Tenant: Tenant where data is coming in and residing. Data from this tenant cannot move out; it is the data store.
- Requirements:
- 1. Systems, running in Source Tenant, should be able to reach Destination Tenant.
- 2. While doing so, proper secured authentication and authorization should be performed.
- 3. For authentication in Destination Tenant, no guest account access should be used.
- 4. Requests should not traverse the Internet and traffic should remain totally private.
- 5. Source and Destination tenant can not have any sort of Virtual Network Peering or Mesh Private Connectivity between them.
- 6. All requests at Source Tenant and Destination on Identity, Connectivity and Data Layer should be logged in loganalytics for compliance purpose.
- 7. No Private IP hardcoding should be used in any system. Proper FQDNs based DNS resolution happen while accessing/processing data.
- Solution:
- 1. Use Multi-tenant Azure AD Service Principal (Enterprise Application) to authenticate across Source and Destination Tenants.
- 2. Use Cross Tenant Private Endpoint to access resources over Private Network (Microsoft Backbone Network), without having any Virtual Network Peering or Mesh Private Connectivity
- 3. Centralized Azure Private DNS Zone for cross-tenant DNS resolution to the Private Endpoint
- Pre-Requirements:
- 1. Source and Destination Tenant Administrative Rights to create Application under Application Registration
- 2. Have a virtual network with subnet (ideally) to be used for Private Endpoint, in Source Tenant.
- 3. Source and Destination Tenant Administrative Rights to create the cross tenant Private Endpoint Request (at Source Tenant) and Approval (at Destination Tenant)
- 4. Sufficient IAM Role to assign IAM to Service Principal (created above) on Azure resource example: Storage/DB/Redis/AKS etc. (at Destination Tenant).
- 4a. At least the Azure Resource Manager Reader role
- 4b. For Storage Access (example), data access role, such as Storage Blob Data Contributor
- 4c. Define your IAM strategy accordingly; treat both Control Plane and Data Plane permissions (there is a good Azure document to refer to).
- 5. If using an existing Centralized Azure Private DNS Zone (in our case we are), at least the Private DNS Zone Contributor role to allow you to create a DNS record for the existing Private Endpoint
Technical Steps
- 1. Have Service Principal in Source Tenant Identity Store (AAD)
az ad sp create-for-rbac -n "cross-tenant-app"
- 2. Retrieving and verifying details
- 2.1. Note down Application ID and Secret generated in Source Tenant
- 2.2. Verify the Application ID is listed in Application Registration (as Application (client) ID) and 'Enterprise Application' (as Application ID).
- 3. Go to Application Registration/Authentication and enable, for the Application ID, 'Accounts in any organizational directory (Any Azure AD directory - Multitenant)'.
- 4. By doing these steps, a multi-tenant Service Principal has been created in Source Tenant.
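Step 3 can alternatively be done from the CLI; with recent Azure CLI versions the sign-in audience can be set like this (the app id is a placeholder, and the flag is worth verifying against your CLI version):

```
az ad app update --id <application-id> --sign-in-audience AzureADMultipleOrgs
```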
Stars: 10
Extra_End
Sidecar Proxy Pattern - The Basis Of Service Mesh by Ivan Velichko (@iximiuz) 07/08/2021
Extra:Heart: 347
Author of Blog: Ivan Velichko
Learning Containers, Kubernetes, and Backend Development
Want to learn Docker or Kubernetes? Struggle to understand what this Cloud Native buzz is about? On a mission to master Server-Side Craft? Then you're at the right place!
- ✔ In-depth technical materials.
- ✔ Focus on true fundamentals.
- ✔ Clear and visual explanations.
- ✔ Interactive playgrounds.
As someone going through a similar learning journey, I keep my articles experiential and put a great deal of effort into explanatory drawings and interactive playgrounds.
Most popular posts
- ⭐ How Kubernetes Reinvented Virtual Machines
- ⭐ Learning Containers From The Bottom Up
- ⭐ Containers vs. Pods - Taking a Deeper Look
- ⭐ Container Networking Is Simple!
- ⭐ Computer Networking Basics For Developers
- ⭐ A Visual Guide to SSH Tunnels
- ⭐ From Docker Container to Bootable Linux Disk Image
- ⭐ The Need For Slimmer Containers
- ⭐ DevOps, SRE, and Platform Engineering
Extra_End
Writing .NET Application Services for Kubernetes By Mike Hadlow24/06/2022
Extra:In a traditional .NET distributed application, application services (not to be confused with the Kubernetes 'service' object) would either be written as IIS hosted web applications or Windows Services. When building .NET (micro)services to be deployed in a Kubernetes cluster pretty much every facet of the service needs to be reconsidered, not only the hosting environment but the way configuration is accessed, how logging and monitoring work, and the options for state management and memory considerations. In this post I'll bring my experience of migrating .NET applications to Kubernetes to enumerate some of the main ways that you'll need to change the way you write them.
First some caveats. I don't intend this post to be an introduction to Kubernetes, or a justification of why you should use Kubernetes. Nor is it a tutorial on how to write distributed applications or .NET application services. Its intended audience is my past self about a year and a half ago. I would have very much appreciated a short guide on the changes I would have to make to redesign my .NET application services to take full advantage of Kubernetes.
Application Service Design
Some general application design guidelines:
- 1. Build stateless horizontally scalable services. See the 12 factor apps guidelines.
- 2. Use Linux containers. .NET is now cross platform and runs well on Linux. Avoid the bloat and inevitable friction of Windows Containers.
- 3. Consider the container immutable. Do not change the local file system. If you need a file system, use a volume mount.
- 4. One container per pod. Although the sidecar pattern is a popular one, it's perfectly reasonable to have a complete distributed application without a single sidecar in sight. Like all popular patterns, only use it if you have a real need.
- 5. Every application is a console application. Processes are managed by Kubernetes. HTTP services should be standalone console based web apps using the Kestrel webserver.
One of the main advantages you'll find writing application services for Kubernetes is that the platform now provides many things that you would previously have had to include in your application. As I'll describe below, things such as configuration, logging, metrics, and security all become simpler to implement.
Building your container images
Kubernetes is primarily a container orchestration framework. Your applications/services need to be built and deployed as (usually Docker) containers. Microsoft has published a very good guide to building and running containerized .NET applications, .NET Microservices: Architecture for Containerized .NET Applications, which I recommend reading. Although it doesn't cover Kubernetes, the advice on creating container images and microservice architecture is very good.
Although it's possible to compile your application in a traditional build server and then create the runtime container image from the compiled binaries, it's easier to combine the build and runtime in a single multi-stage Dockerfile; that way you control the environment for both build and deployment. Here is a very simple example:
FROM mcr.microsoft.com/dotnet/aspnet:6.0-bullseye-slim AS runtime
WORKDIR /app
EXPOSE 80
FROM mcr.microsoft.com/dotnet/sdk:6.0-bullseye-slim AS sdk
ARG build_number=1.0.0
WORKDIR /app
COPY src/ .
# restore nuget packages
RUN dotnet restore
# build
RUN dotnet build --no-restore "-p:Version=${build_number}"
# test
RUN dotnet test --no-build GreetingService.Tests/GreetingService.Tests.csproj
# publish
RUN dotnet publish --no-build -o output
# create runtime image
FROM runtime AS final
WORKDIR /app
COPY --from=sdk /app/output/ ./
ENTRYPOINT ["./GreetingsService"]
As you can see there are multiple FROM clauses. Each one starts a new build stage, and only the final stage ends up in the published image, so the SDK, sources, and intermediate build artifacts are discarded and the final image stays small.
Note, it's quite common to see just the .csproj file copied first for the restore phase, then the rest of the src contents copied for the build. This gives you smaller, more efficient layer caching. Although if you are using an ephemeral build server such as GitHub Actions, there's probably little to be gained. Personally I like to keep things simple.
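A sketch of that csproj-first pattern (the project path GreetingService/GreetingService.csproj is assumed for illustration, not taken from the original post):

```dockerfile
FROM mcr.microsoft.com/dotnet/sdk:6.0-bullseye-slim AS sdk
WORKDIR /app
# Copy only the project file first: the restore layer is then cached
# until the package references actually change.
COPY src/GreetingService/GreetingService.csproj GreetingService/
RUN dotnet restore GreetingService/GreetingService.csproj
# Copy the remaining sources; source edits no longer invalidate the restore layer.
COPY src/ .
RUN dotnet build --no-restore
```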
Build the image with docker build:
> docker build -t greetings-service:0.0.1 --build-arg build_number=1.0.0 .
Once your container images are built, you should publish them to your internal image repository. GitHub provides a container registry as does Azure and all other cloud providers.
Deploying to Kubernetes
In the simplest case for an aspnet service you can deploy your application/service to Kubernetes by simply running a deployment specifying your image and the number of replicas you want:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: greetings-service-deployment
  labels:
    app: greetings-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: greetings-service
  template:
    metadata:
      labels:
        app: greetings-service
    spec:
      containers:
      - name: greetings-service
        image: greetings-service:0.0.1
You will also need a Kubernetes “service” to direct traffic to your pods, something like:
apiVersion: v1
kind: Service
metadata:
  name: greetings-service-service
spec:
  selector:
    app: greetings-service
  ports:
  - name: greetings-service-service-port
    protocol: TCP
    port: 3456
    targetPort: 5432
    nodePort: 30001
  type: LoadBalancer
Use kubectl apply to run the deployment:
> kubectl apply -f greetings-service-deployment.yaml
Consider using Helm to template your deployments for any but the very simplest cases. I've also had Kustomize recommended to me, which looks a little simpler than Helm, but I've not used it myself.
Build and Deployment
Your build and deploy pipeline will look something like this:
- 1. docker build . Build your service container image.
- 2. docker push ... Push your image to your image repository.
- 3. kubectl apply ... Execute the YAML file defining your deployment against your Kubernetes cluster.
If you are using GitHub for source control, GitHub Actions are a good choice for running these steps on a successful commit or pull request merge.
Author of Blog: Mike Hadlow
If you have any comments or questions, feel free to reach out at @mikehadlow
Software guy, blogger, author of EasyNetQ, AsmSpy and Guitar Dashboard. Technology, science, history and music geek. Lover of 70's prog rock, Tintin and Lego.
Extra_End
Coda Hale codahale (github.com) followed @samuel-lucas6
Guidance on implementing cryptography as a developer by Samuel Lucas12/01/2023
Extra:Kryptor: A simple, modern, and secure encryption and signing tool.
Geralt: A modern cryptographic library for .NET based on libsodium and inspired by Monocypher.
Cahir: A deterministic password manager.
Milva: A simple, cross-platform command line tool for hashing files and text.
Vanity: A simple WireGuard vanity public key generator.
The AEGIS Family of Authenticated Encryption Algorithms: The AEGIS-128L, AEGIS-256, AEGIS-128X, and AEGIS-256X Internet-Draft.
Balloon Hashing: An Internet-Draft for the Balloon password hashing function.
Encrypt-then-MAC for Committing AEAD (cAEAD): An Internet-Draft and committing ChaCha20-BLAKE2b AEAD implementation.
Cryptography Guidelines: Guidance on implementing cryptography as a developer.
This document outlines recommendations for cryptographic algorithm choices and parameters as well as important implementation details based on what I have learnt from reading about the subject and the consensus I have observed online. Note that some knowledge of cryptography is required to understand the terminology used in these guidelines.
My goal with these guidelines is to provide a resource that I wish I had access to when I first started writing programs related to cryptography. If this information helps prevent even just one vulnerability, then I consider it time well spent.
Note
This document is slowly being rewritten and split into individual pages. Please view the sections folder for the latest information.
Acknowledgements
These guidelines were inspired by this Cryptographic Best Practices gist, Latacora's Cryptographic Right Answers, and Crypto Gotchas, which is licensed under the Creative Commons Attribution 4.0 International License. The difference is that I mention newer algorithms and have tried to justify my algorithm recommendations whilst also offering important notes about using them correctly.
Disclaimer
I am a psychology undergraduate with an interest in applied cryptography, not an experienced cryptographer. I primarily have experience with the libsodium library since that's what I've used for my projects, but I've also reported some security vulnerabilities related to cryptography.
Most experienced cryptographers don't have the time to write things like this, and the following information is freely available online or in books, so whilst more experience would be beneficial, I'm trying my best to provide accurate information that can be fact checked. If I've made a mistake, please contact me to get it fixed.
Note that the rankings are based on my opinion, algorithm availability in cryptographic libraries, and which algorithms are typically used in modern protocols, such as TLS 1.3, Noise Protocol Framework, WireGuard, and so on. Such protocols and recommended practices make for the best guidelines because they have been approved by experienced professionals.
General Guidelines
Research, research, research: you often don't need to know how cryptographic algorithms work under the hood to implement them correctly, just like how you don't need to know how a car works to drive. However, you need to know enough about what you're trying to do, which requires looking up relevant information online or in books, reading the documentation for the cryptographic library you're using, reading RFC standards, reading helpful blog posts, and reading guidelines like this one. Furthermore, reading books about the subject in general will be beneficial, again like how knowing about cars can help if you break down. For a list of great resources, check out my How to Learn About Cryptography blog post.
Check and check again: it's your responsibility to get things right the first time around to the best of your ability rather than relying on peer review. Therefore, I strongly recommend always reading over security sensitive code at least twice and testing it to ensure that it's operating as expected (e.g. checking the value of variables line by line using a debugger, using test vectors, etc).
Peer review is great but often doesn't happen: unless your project is popular, you have a bug bounty program with cash rewards, or what you're developing is for an organisation, very few people, perhaps none, will look through the code to find and report vulnerabilities. Similarly, receiving funding for a code audit will probably be impossible.
Please don't create your own custom cryptographic algorithms (e.g. a custom cipher or hash function): this is like flying a Boeing 747 without a pilot license but worse because even experienced cryptographers design insecure algorithms, which is why cryptographic algorithms are thoroughly analysed by a large number of cryptanalysts, usually as part of a competition. By contrast, you rarely see experienced airline pilots crashing planes. The only exception to this rule is implementing something like Encrypt-then-MAC with secure, existing cryptographic algorithms when you know what you're doing.
Please avoid coding existing cryptographic algorithms yourself (e.g. coding AES yourself): cryptographic libraries provide access to these algorithms for you to prevent people from making mistakes that cause vulnerabilities and to offer good performance. Whilst a select few algorithms are relatively simple to implement, like HKDF, many aren't and require a great deal of experience to implement correctly. Lastly, another reason to avoid doing this is that it's not much fun since academic papers and reference implementations can be very difficult to understand.
Cryptographic Libraries
Use (In Order)
Libsodium: a modern, extremely fast, easy-to-use, well documented, and audited library that covers all common use cases, except for implementing TLS. However, it's much bigger than Monocypher, meaning it's harder to audit and not suitable for constrained environments, and requires the Visual C++ Redistributable to work on Windows.
Monocypher: another modern, easy-to-use, well documented, and audited library, but it's about half the speed of libsodium on desktops/servers, has no misuse resistant functions (e.g. like libsodium's secretstream() and secretbox()), only supports Argon2i for password hashing, allowing for insecure parameters (please see the Password Hashing/Password-Based Key Derivation Notes section), and offers no memory locking, random number generation, or convenience functions (e.g. Base64/hex encoding, padding, etc). However, it's compatible with libsodium whilst being much smaller, portable, and fast for constrained environments (e.g. microcontrollers).
Tink: a misuse resistant library that prevents common pitfalls, like nonce reuse. However, it doesn't support hashing or password hashing, it's not available in as many programming languages as libsodium and Monocypher, the documentation is a bit harder to navigate, and it provides access to some algorithms that you shouldn't use.
LibHydrogen: a lightweight, easy-to-use, hard-to-misuse, and well documented library suitable for constrained environments. The downsides are that it's not compatible with libsodium whilst also running slower than Monocypher. However, it has some advantages over Monocypher, like support for random number generation, even on Arduino boards, and easy access to key exchange patterns, among other things.
Avoid (In Order)
A random library (e.g. with 0 stars) on GitHub: assuming it's not been written by an experienced professional and it's not a libsodium or Monocypher binding to another programming language, you should generally stay away from less popular, unaudited libraries. They are much more likely to suffer from vulnerabilities and be significantly slower than the more popular, audited libraries. Also, note that even experienced professionals make mistakes.
OpenSSL: very difficult to use, let alone use correctly, offers access to algorithms and functions that you shouldn't use, the documentation is a mess, and lots of vulnerabilities have been found over the years. These issues have led to OpenSSL forks and new, non-forked libraries that aim to be better alternatives if you need to implement TLS.
The library available in your programming language: most languages provide access to old algorithms (e.g. MD5 and SHA1) that shouldn't be used anymore instead of newer ones (e.g. BLAKE2, BLAKE3, and SHA3), which can lead to poor algorithm choices. Furthermore, the APIs are typically easy to misuse, the documentation may fail to mention important security related information, and the implementations will be slower than libsodium. However, certain languages, such as Go and Zig, have impressive modern cryptography support.
Other popular libraries I haven't mentioned (e.g. BouncyCastle, CryptoJS, etc): these again often provide or rely on dated algorithms and typically have bad documentation. For instance, CryptoJS uses an insecure KDF called EVP_BytesToKey() in OpenSSL when you pass a string password to AES.encrypt(), and BouncyCastle has no C# documentation. However, this recommendation is too broad really since there are some libraries that I haven't mentioned that are worth using, like PASETO. Therefore, as a rule of thumb, if it doesn't include several of the algorithms I recommend in this document, then it's probably bad. Just do your research and assess the quality of the documentation. There's no excuse for poor documentation.
NaCl: an unmaintained, less modern, and more confusing version of libsodium and Monocypher. For example, crypto_sign() for digital signatures has been experimental for several years. It also doesn't have password hashing support and is difficult to install/package.
TweetNaCl: unmaintained, slower than Monocypher, doesn't offer access to newer algorithms, doesn't have password hashing, and does not zero out buffers.
Notes
If the library you're currently using/planning to use doesn't support several of the algorithms I'm recommending, then it's time to upgrade and take advantage of the improved security and performance benefits available to you if you switch.
Please read the documentation: don't immediately jump into coding something because that's how mistakes are made. Good libraries have high quality documentation that will explain potential security pitfalls and how to avoid them.
Some libraries release unauthenticated plaintext when using AEADs: for example, OpenSSL and BouncyCastle apparently do. Firstly, don't use these libraries for this reason and the reasons I've already listed. Secondly, never do anything with unauthenticated plaintext; ignore it to be safe.
Older doesn't mean better: you can argue that older algorithms are more battle tested and therefore proven to be a safe choice, but the reality is that most modern algorithms, like ChaCha20, BLAKE2, and Argon2, have been properly analysed at this point and shown to offer security and performance benefits over their older counterparts. Therefore, it doesn't make sense to stick to this overly cautious mindset of avoiding newer algorithms, except for algorithms that are still candidates in a competition (e.g. new post-quantum algorithms), which do need further analysis to be considered safe.
You should prioritise speed: this can make a noticeable difference for the user. For example, a C# Argon2 library is going to be significantly slower than Argon2 in libsodium, meaning unnecessary and unwanted extra delay during key derivation. Libsodium is the go-to for speed on desktops/servers, and Monocypher is the go-to for constrained environments (e.g. microcontrollers).
Stars: 395
Extra_End
Reference from this Article:
Cryptography Guidelines: Guidance on implementing cryptography as a developer.
These guidelines were inspired by this Cryptographic Best Practices gist, Latacora's Cryptographic Right Answers, and Crypto Gotchas, which is licensed under the Creative Commons Attribution 4.0 International License. The difference is that I mention newer algorithms and have tried to justify my algorithm recommendations whilst also offering important notes about using them correctly.
Stars: 685
Author of Blog: Aaron Toponce
Cryptographic Best Practices
Extra_End
Connor Leech (connor11528) followed James Brady (goodgravy) 18/03/2021
Extra:About
Redirect ActiveRecord (Rails) reads to replica databases while ensuring all writes go to the primary database.
This is a slight modification of Rocket Job's original library, simply renaming it from active_record_slave to active_record_replica.
In order to more clearly distinguish the library from active_record_slave, we also incremented the major version – it is, however, functionally equivalent.
active_record_replica redirects all database reads to replica instances while ensuring that all writes go to the primary database. active_record_replica ensures that any reads that are performed within a database transaction are by default directed to the primary database to ensure data consistency.
Production Ready. Actively used in large production environments.
- 1.1 Redirecting reads to a single replica database.
- 1.2 Works with any database driver that works with ActiveRecord.
- 2. Supports all Rails 3, 4, or 5 read APIs.
- 2.1. Including dynamic finders, AREL, and ActiveRecord::Base.select.
- 2.2. NOTE: In Rails 3 and 4, QueryCache is only enabled for BaseConnection by default. In Rails 5, it's enabled for all connections. (PR)
- 3. Transaction aware
- 3.1. Detects when a query is inside of a transaction and sends those reads to the primary by default.
- 3.2. Can be configured to send reads in a transaction to replica databases.
- 4. Lightweight footprint.
- 5. No overhead whatsoever when a replica is not configured.
- 6. Negligible overhead when redirecting reads to the replica.
- 7. Connection Pools to both databases are retained and maintained independently by ActiveRecord.
- 8. The primary and replica databases do not have to be of the same type.
- 8.1. For example Oracle could be the primary with MySQL as the replica database.
- 9. Debug logs include a prefix of Replica: to indicate which SQL statements are going to the replica database.
# Read from the replica database
r = Role.where(name: 'manager').first
r.description = 'Manager'
# Save changes back to the primary database
r.save!
Example showing how reads within a transaction go to the primary
Role.transaction do
  r = Role.where(name: 'manager').first
  r.description = 'Manager'
  r.save!
end
Stars: 160
Extra_End
RSA is deceptively simple (and fun)15/01/2024
While reading Real-World Cryptography, I came across the "million message attack". This is an attack that Daniel Bleichenbacher demonstrated in 1998, which effectively broke RSA with a particular encoding function called PKCS #1. It was only mentioned briefly, so I dug in and decided to try to understand the attack, eventually to implement it.
Most crypto libraries do not ship with a vulnerable implementation of this, for good reason. It's been broken! And implementing the full attack against a real implementation would also mean using a realistic key size.
Instead, I decided to implement RSA myself so that I could implement a weak encoding scheme so I could implement the Bleichenbacher attack! So far, I have an implementation of RSA and of PKCS (the vulnerable one). The basics of RSA took an hour to implement, then what felt like days to debug. And now it (seemingly) works! The attack will follow soon, with any luck.
What's RSA, anyway?
RSA is a public-key cryptosystem, in contrast to symmetric key cryptosystems. With symmetric keys, the sender and the recipient both share a key and use the same key to encrypt and decrypt the message. In contrast, public-key cryptosystems have a key pair, a public and a private key. The public key can be used to encrypt messages and the private key to decrypt them1.
One of the drawbacks of a symmetric key system is that you have to share the key. This means you have to use a different secure channel to transmit the key, and then both parties need to be really careful to keep it a secret. This isn't manageable for a system with a lot of participants, like the internet!
But symmetric key encryption is often very fast, and we have some of the operations for it even baked into hardware. It would be nice to use it where we can for that efficiency.
In contrast, with public-key cryptography, you can freely share the public key, and anyone can then use that to encrypt a message to you. This means you do not need a separate secure channel to share the key! (Although this ignores the whole problem of validating that the key comes from the right person, so you're not having your connection spoofed by an interloper.) And this is great! This is what RSA gives us, but the computations for RSA are slow and the messages you can send are also small.
In practice, RSA was used (regrettably, sometimes still is) to establish a secure connection and perform a key exchange, and then the keys you exchange let you use symmetric key encryption. You probably shouldn't use RSA. Modern alternatives exist that are better, like Curve25519 and other forms of elliptic-curve cryptography.
But for worse, we run into RSA, and it's also a fun historical artifact! It's worth understanding, and hey, implementing it is just plain fun.
The basics of RSA
RSA is a nicely elegant cryptosystem. Its security is based on the difficulty of factoring the product of large prime numbers, and in its purest form it has no known breaks2. However, as mentioned above, depending on how data is encoded, particular uses of it can be broken.
The basic operations of it are straightforward to express. There are three components:
- 1. Generating keys
- 2. Encrypting and decrypting!
- 3. Encoding messages
We'll go through each of those, starting with generating keys.
Generating your keys
First of all, what even is a key? We know that it's used to encrypt or decrypt a message, but what is inside it?
For RSA, a key comprises two numbers. One of these is called the exponent and one is the modulus. A key could be (exp=3, mod=3233), for example. It's really just this pair of numbers3.
The reason the pieces of it are called the exponent and modulus is because of how we use them! RSA relies on modular arithmetic (like clock math, if you're not familiar). These are the exponents and modulus for the encryption or decryption operations, which we'll see later.
To generate a key, you follow a short procedure.
- 1. First, pick two prime numbers which we'll call p and q. Then we compute n = p * q.
- 2. Compute a number t = lcm(p-1, q-1). This is the totient, and we use this as our modulus for generating the keys but then never again.
- 3. Pick the public exponent, which we'll call e. The requirement is that it shares no factors with t and is greater than 2. One simple way is to start with 3, but go up through the primes until you find one coprime with t. Choosing 65537 is also quite common, since it's small enough to be efficient for encryption but large enough to avoid some particular attacks.
- 4. Calculate the private exponent, which we'll call d. We compute this as d = e^-1 mod t, or the inverse of e in our modulus.
Now you have d and e, the private and public exponents, and you have n, the modulus. Bundle those up into two tuples and you have your keys!
Let's work an example quickly to see how it ends up. For our primes, we can choose p = 17 and q = 29. So then n = 493.
Now we find t = lcm(17 - 1, 29 - 1) = lcm(16, 28) = 112. We'll choose e = 3, which works since 2 < 3 and gcd(3, 112) = 1, so we know they share no factors. Now we compute4 d = e^-1 mod 112 = 3^-1 mod 112 = 75. And then we have our keys!
Our public key is (exp=3, mod=493), and our private key is (exp=75, mod=493). We'll use these again in our examples on encrypting and decrypting!
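The worked example above can be checked with a short Python sketch (a toy, using the same tiny primes; real RSA needs large random primes and a proper encoding scheme):

```python
from math import gcd, lcm

def generate_keys(p, q, e=3):
    n = p * q                # the modulus, shared by both keys
    t = lcm(p - 1, q - 1)    # the totient; used only during key generation
    assert gcd(e, t) == 1, "e must share no factors with t"
    d = pow(e, -1, t)        # private exponent: the inverse of e mod t
    return (e, n), (d, n)    # (public key, private key)

public, private = generate_keys(17, 29)
print(public, private)  # (3, 493) (75, 493)

# Textbook RSA on a bare number: c = m^e mod n, then m = c^d mod n.
m = 42
c = pow(m, *public)
assert pow(c, *private) == m
```

This leans on Python 3.9+ conveniences: math.lcm, and three-argument pow with a negative exponent for the modular inverse, so no hand-rolled extended Euclid is needed.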
Extra:Extra_End
A reverse proxy as a configurable package on top of Kestrel
Sign me up! Maybe I live under a rock, but I feel like this deserves a ton more attention.
About
A toolkit for developing high-performance HTTP reverse proxy applications.
Extra:Stars: 7.6k
One popular Author, Milan Jovanović @mjovanovictech, Practical .NET and Software Architecture Tips | Microsoft MVP
Link to the article:
https://www.milanjovanovic.tech/blog/implementing-an-api-gateway-for-microservices-with-yarp
Extra_End
Welcome to Lil'Log GitHub Link
Hi, this is Lilian. I'm documenting my learning notes in this blog. Besides, I'm leading a team working on practical AI safety and alignment at OpenAI. Based on the number of grammar mistakes in my posts, you can tell how much ChatGPT is involved 😉.
Author of Blog: Lilian
Followers on GitHub: 7k
Other Blog Posts by the Author
Extra:4. Some Math behind Neural Tangent Kernel 8/09/2022
Neural networks are well known to be over-parameterized and can often easily fit data with near-zero training loss with decent generalization performance on test dataset. Although all these parameters are initialized at random, the optimization process can consistently lead to similarly good outcomes. And this is true even when the number of model parameters exceeds the number of training data points.
3. Learning Word Embedding15/10/2017
2. Predict Stock Prices Using RNN: Part 2 22/07/2017
Link to GitHub: lilianweng/stock-rnn
Stars: 1.7k
Predict stock market prices using RNN
1. Predict Stock Prices Using RNN: Part 1 8/07/2017
This is a tutorial for how to build a recurrent neural network using Tensorflow to predict stock market prices. The full working code is available in github.com/lilianweng/stock-rnn. If you don't know what a recurrent neural network or an LSTM cell is, feel free to check my previous post.
One thing I would like to emphasize is that because my motivation for writing this post is more on demonstrating how to build and train an RNN model in Tensorflow and less on solving the stock prediction problem, I didn't try hard on improving the prediction outcomes. You are more than welcome to take my code as a reference point and add more stock prediction related ideas to improve it. Enjoy!
Predict stock market prices using RNN
One thing I would like to emphasize is that because my motivation is more on demonstrating how to build and train an RNN model in Tensorflow and less on solving the stock prediction problem, I didn't try too hard on improving the prediction outcomes. You are more than welcome to take this repo as a reference point and add more stock prediction related ideas to improve it. Enjoy.
Check my blog post "Predict Stock Prices Using RNN": Part 1 and Part 2 for the tutorial associated.
- 1. Make sure tensorflow has been installed.
- 2. First download the full S&P 500 data from Yahoo! Finance ^GSPC (click the "Historical Data" tab and select the max time period), and save the .csv file to data/SP500.csv.
- 3. Run python data_fetcher.py to download the prices of individual stocks in S&P 500, each saved to data/{stock_abbreviation}.csv. (NOTE: the Google Finance API returns the prices for 4000 days maximum. If you are curious about the data in even earlier times, try modifying the data_fetcher.py code to send multiple queries for one stock. Here is the data archive (stock-data-lilianweng.tar.gz) of stock prices I crawled up to Jul, 2017. Please untar this file to replace the "data" folder in the repo for test runs.)
- 4. Run python main.py --help to check the available command line args.
- 5. Run python main.py to train the model.
For example,
- Train a model only on SP500.csv; no embedding
python main.py --stock_symbol=SP500 --train --input_size=1 --lstm_size=128 --max_epoch=50
Extra_End
Analyzing Data 180,000x Faster with Rust20/10/2023
This note documents one of my recent adventures in performance optimization with Rust. By following along, hopefully you'll learn something about how to write fast Rust.
Here's the context: imagine you have data from an online exam where a set of users answered a set of questions. The raw data looks like this:
[
  {
    "user": "5ea2c2e3-4dc8-4a5a-93ec-18d3d9197374",
    "question": "7d42b17d-77ff-4e0a-9a4d-354ddd7bbc57",
    "score": 1
  },
  {
    "user": "b7746016-fdbf-4f8a-9f84-05fde7b9c07a",
    "question": "7d42b17d-77ff-4e0a-9a4d-354ddd7bbc57",
    "score": 0
  },
  /* ... more data ... */
]
Pseudocode:
func k_corrset($data, $k):
    $all_qs = all questions in $data
    for all $k-sized subsets $qs within $all_qs:
        $us = all users that answered every question in $qs
        $qs_totals = the total score on $qs of each user in $us
        $grand_totals = the grand score on $all_qs of each user in $us
        $r = correlation($qs_totals, $grand_totals)
    return $qs with maximum $r
Python Baseline:
from itertools import combinations
import pandas as pd
from pandas import IndexSlice as islice
def k_corrset(data, K):
all_qs = data.question.unique()
q_to_score = data.set_index(['question', 'user'])
all_grand_totals = data.groupby('user').score.sum().rename('grand_total')
corrs = []
for qs in combinations(all_qs, K):
qs_data = q_to_score.loc[islice[qs,:],:].swaplevel()
answered_all = qs_data.groupby(level=[0]).size() == K
answered_all = answered_all[answered_all].index
qs_totals = qs_data.loc[islice[answered_all,:]] \
.groupby(level=[0]).sum().rename(columns={'score': 'qs'})
r = qs_totals.join(all_grand_totals).corr().qs.grand_total
corrs.append({'qs': qs, 'r': r})
corrs = pd.DataFrame(corrs)
return corrs.sort_values('r', ascending=False).iloc[0].qs
data = pd.read_json('scores.json')
print(k_corrset(data, K=5))
Note that each user only answered a subset of all possible questions, and all scores are either 0 or 1.
Here's the problem: given a size k, which set of k questions has the highest correlation with overall performance? We'll call this the k-CorrSet problem. A simple brute-force algorithm for solving the k-CorrSet problem is the pseudocode shown above.
We are going to implement several variations on this algorithm to see how fast we can make it.
Can you explain this in depth?
The algorithm iterates through all possible k-sized subsets of questions, calculates the correlation between the scores on that subset and the grand scores on all questions, and returns the subset with the maximum correlation.
ChatGPT:
Certainly! Let's break down the key steps of the algorithm that iterates through all possible k-sized subsets of questions, calculates the correlation, and returns the subset with the maximum correlation.
Here's a more detailed explanation:
1. Generate all k-sized subsets of questions:
The algorithm starts by obtaining all possible questions from the given dataset. Then, it iterates through all k-sized subsets of these questions. For example, if k is 3, it would consider all possible combinations of 3 questions out of the total set of questions.
2. Identify users who answered every question in the subset:
For each k-sized subset of questions, the algorithm identifies all users who answered every question in that subset. This is important because the correlation calculation requires data from users who answered all questions in the subset.
3. Calculate total scores on the subset for each user:
Once the algorithm has the subset of questions and the corresponding users, it calculates the total score on that subset for each user. It sums up the scores of each user for the questions in the subset.
4. Calculate the grand total scores on all questions for each user:
Simultaneously, the algorithm calculates the grand total scores on all questions for each user. This involves summing up the scores of each user for all questions, not just the subset.
5. Calculate the correlation between subset scores and grand total scores:
With the total scores on the subset and the grand total scores for each user, the algorithm calculates the correlation between these two sets of scores. The correlation coefficient is a measure of the linear relationship between two sets of data. It ranges from -1 to 1, where -1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 indicates no correlation.
The correlation calculation could use statistical methods such as Pearson correlation coefficient, which measures the linear correlation between two variables. The formula for Pearson correlation is:
r = ∑(xᵢ − x̄)(yᵢ − ȳ) / √( ∑(xᵢ − x̄)² ∑(yᵢ − ȳ)² )
Where:
xᵢ and yᵢ are the individual scores of the users on the subset and grand total, respectively.
x̄ and ȳ are the means of the scores on the subset and grand total, respectively.
6. Return the subset with the maximum correlation:
After calculating the correlation for each k-sized subset, the algorithm identifies the subset with the maximum correlation coefficient and returns it as the result. This subset is considered to have the highest correlation with overall performance.
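The Pearson formula above can be sketched in a few lines of plain Python (a minimal illustration of the textbook formula, not the article's code; the name pearson_r is my own):

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation: covariance of x and y divided by the
    # product of their standard deviations (up to a common factor of n).
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    var_y = sum((y - y_mean) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Perfectly linearly related scores give r = 1.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```

In the k-CorrSet setting, xs would be the users' subset totals and ys their grand totals.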
Note:
The described algorithm is a brute-force approach, meaning it considers all possible subsets and calculates correlations for each. While it provides a straightforward solution to the k-CorrSet problem, it might be computationally expensive for large datasets. The author's exploration in Rust likely involves optimizing this algorithm or exploring alternative approaches to improve its efficiency.
Extra:Stars: 42
Link to the article:
Analyzing Data 180,000x Faster with Rust
Extra_End
osanseviero followed Christopher Fleetwood10/07/2021
I am a Machine Learning Engineer at HuggingFace in London, UK.
An exploration of fMRI timeseries similarity metrics
In order to perform classification on a functional brain scan, it first undergoes many preprocessing steps. One of these steps is the transformation from the timeseries output of the fMRI scan: transforming an m×n matrix (where m is the number of timepoints recorded and n is the number of brain regions used) into an n×n matrix of similarity values (called a connectome). This similarity value is a measure of the neural synchronization between the 2 regions.
So how do we quantify the similarity of 2 different timeseries? This blog post will explore the common ways of quantifying time series similarity in a neuroscientific setting. Before we get into the actual methods used to calculate time series similarity, we need to cover the cornerstone of almost all of the methods we are about to explore - covariance.
Covariance
Covariance is simply a measure of how two random variables change together. Below is the formula for calculating covariance:
Σ = E[(X − E[X])(X − E[X])ᵀ]
In the case of fMRI, we have a multivariate random variable, allowing us to use Maximum Likelihood Estimation to estimate the covariance matrix. Below is a toy example of our estimated covariance matrix.
1. Covariance is bounded between −∞ and ∞, making it less suitable for downstream classifiers.
2. Covariance coefficients are not standardized and cannot be used to quantify the strength of the relationship.
3. When the number of features is large relative to the number of observations, the sample/empirical covariance matrix has excessive estimation error.
4. Symmetric positive semi-definite (SPD) matrices do not naturally form a Euclidean space (this will be important later).
Many of the following approaches will aim to address some or all of the above drawbacks with empirical covariance.
To address the excessive estimation error, it is common to perform a transformation of the covariance coefficients known as "shrinkage". In their seminal paper, "Honey, I Shrunk the Sample Covariance Matrix" [1], Ledoit & Wolf proposed using shrinkage to regularize the sample covariance matrix. "Shrinkage", as the name implies, pulls the most extreme covariance coefficients towards more central values. This not only resolves our excessive estimation error, it can also make the matrix easily invertible by encouraging numerical stability. (The interested reader should consult the scikit-learn docs [2].)
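As a rough numpy sketch of what shrinkage does (my own toy illustration, not Ledoit & Wolf's estimator: their paper estimates the shrinkage intensity from the data, whereas alpha here is hand-picked):

```python
import numpy as np

def shrink_covariance(S, alpha):
    # Pull the sample covariance S towards a scaled identity target:
    # extreme coefficients move towards a central value, which also
    # tends to make the result better conditioned (hence invertible).
    p = S.shape[0]
    mu = np.trace(S) / p          # average variance
    target = mu * np.eye(p)
    return (1 - alpha) * S + alpha * target

# A nearly singular sample covariance becomes much better conditioned.
S = np.array([[1.0, 0.99], [0.99, 1.0]])
S_shrunk = shrink_covariance(S, alpha=0.2)
```

The scikit-learn `LedoitWolf` estimator automates the choice of alpha; this sketch only shows the mechanics.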
Now that we have a well conditioned covariance matrix, we can attempt to address some of the other identified drawbacks.
Canonical Approaches
Pearson's correlation coefficient (Pearson's R), or simply correlation, is the most commonly used method to quantify similarity between 2 fMRI timeseries. Correlation is a linear metric computed from the covariance of the 2 timeseries. Below is the mathematical formula to compute correlation for a pair of random variables:
Correlation is widely used in neuroscience as it has a long statistical history and is bound between -1 and 1. However, correlation does have some disadvantages. The figure below should demonstrate one clearly:
Due to correlation's linear nature, the same timeseries being slightly out of phase causes a huge decrease in the correlation value. Additionally, correlation provides no distinction between whether 2 regions are directly connected or indirectly connected via another region. To account for this, we can use partial correlation!
Partial correlation is a variant of PCC that attempts to distinguish between direct and indirect connections. This is done by computing correlation between regions after regressing out all other timeseries. Partial correlation is computed from the inverse of the covariance matrix (this is where the shrinkage comes in handy), also known as the precision matrix. Below is the mathematical formula for computing partial correlation for a pair of random variables:
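The precision-matrix relationship can be sketched in numpy (my own illustration of the standard formula pcorr[i,j] = −P[i,j] / √(P[i,i]·P[j,j]), not the author's code):

```python
import numpy as np

def partial_correlation(cov):
    # Invert the (well-conditioned, e.g. shrunk) covariance to get the
    # precision matrix P, then normalize its off-diagonal entries.
    P = np.linalg.inv(cov)
    d = np.sqrt(np.diag(P))
    pcorr = -P / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)   # diagonal is 1 by convention
    return pcorr

cov = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.3],
                [0.3, 0.3, 1.0]])
pc = partial_correlation(cov)
```

For just two variables there is nothing to regress out, so the partial correlation reduces to the ordinary correlation.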
Extra:Author of Blog:Chris
Other Blog Posts by the Author
1. Layer Normalization as fast as possible. Details: One-pass algorithm, two-pass algorithm, Welford's algorithm, and implementing Welford's algorithm11/12/2023
2. Running LLMs with Browser in Rust26/04/2023
3. Rust based ML model analyzer18/10/2022
4. An exploration of fMRI timeseries similarity metrics10/07/2021
Extra_End
Hi! My name is Zachary Mueller, and I'm a Machine Learning Software Engineer at 🤗. I majored in Software Design and Development and I have minors in both Computer Science and Environmental Science.
I have a heavy passion for Deep Learning and open source libraries. As a result, below you will find some notable articles I've written, a few courses I've made, some of the software libraries I've written, interesting projects, and the open source libraries I have tried to contribute to the most.
Author of Blog:Zach Mueller
Followers on GitHub: 1.2k
Other Blog Posts by the Author
Extra:- M: Linear Algebra
- T: NLP, Matrix Calculus
- Th: Foundations, Matrix Calculus, Linear Algebra
- F: Foundations, NLP
- Sa: NLP, Linear Algebra, Matrix Calc, Foundations
- The @ symbol is used when doing matrix multiplication, where we multiply each row by the column, and then sum them together.
- 76.5% of people will be asymptomatic
- 15.25% of people will be symptomatic
- 6.45% of people will have AIDS
- 1.8% of people will die as a result of their illnesses
1. Summer Smackdown - Week 17/07/2019
These posts will most likely wind up being a bit of an odd bunch in terms of formatting until I figure out a style I like and enjoy, as the goal is to merge all four of the lessons into one big post.
Given I wanted to update all of these blogs on Sundays, I decided I would include the first ‘day’ of work as well.
Overall how the schedule I plan to follow looks is as such:
As I started this goal on a Saturday, this week there will not be much in my recap, but I’ll be as inclusive as I can into the small lessons I learned.
Computational Linear Algebra
We start off learning about the Markov chain, a way to describe a sequence of events where the probability of each event depends on the state of the previous event; in other words, the next event is determined by the previous one. The course utilizes the NumPy library to perform the matrix multiplications that solve the various problems. My notes go through and further explain what each answer means in context.
For example, problem 1 is about using a stochastic matrix, which is a square probability matrix, to predict the distribution of health-related outcomes in the following year.
We start off knowing that the current year had 85% asymptomatic, 10% symptomatic, 5% AIDS, and 0% death. Next, we were given the following probability table:
Now that we’re here, we use matrix multiplication to get our answer:
import numpy as np
i = np.array([[.85,.1,.05,0]])
mat = np.array([[.9,.07,.02,.01],
[0,.93,.05,.02],
[0,0,.85,.15],
[0,0,0,1]])
res = mat.T @ i.T
One thing Jeremy points out is another way to write the above:
(i @ mat).T
which saves us a few seconds of code, and looks cleaner.
The answer winds up being:
array([[0.765 ],
[0.1525],
[0.0645],
[0.018 ]])
However, what does the answer mean? Well, it means that within the next year:
We’ve started using some matrix multiplication to get solutions, but can we get a bit more advanced with it?
Take problem 2:
Given the above table, figure out which store is best for which individual. This is a straight matrix-by-matrix multiplication problem where 'dem' represents a matrix of the demand per individual, and 'p' the prices for each item in two particular shops.
dem = np.array([[6, 5, 3, 1],
[3,6,2,2],
[3,4,3,1]])
p = np.array([[1.5, 1],
[2., 2.5],
[5., 4.5],
[16., 17.]])
We yet again solve this by doing dem@p
, which gives us a table that looks like the following:
array([[50. , 49. ],
[58.5, 61. ],
[43.5, 43.5]])
The above table is now described as having the rows be an individual, and the columns being a particular store with the content as the price they would pay for the items they need. We can see that for Person 1 shop 2 would be the best, for Person 2 shop 1 would be the best, and for Person 3 they could go to either one.
Then Rachel goes further to describe images a little bit and convolutions, which I was already familiar with from the Practical Deep Learning for Coders course, however this Medium article she mentions I found especially helpful: CNNs from Different Viewpoints
What this helped show for me was how matrix multiplication is actually applied within these Neural Networks we are generating through the Fast.AI library, especially the following image:
Here we have a 2x2 matrix (filter) being applied on a single-channel image (3x3), to get our four results: P,W,R,S. I enjoy this view of how our layers are working as I can see each product mapped with corresponding coordinates, versus a Neural Network viewpoint:
Where alpha, beta, gamma, etc are the connections or lines from each node to result.
This is as far as I got yesterday, so next week lesson 1 should be fully completed.
Matrix Calculus
One thing Jeremy suggests we do during the Foundations course is turn paper into code, so I wanted to apply that to this course, despite it being pure-math heavy. The goal of doing this was just to learn how to translate various scary-looking math into code more easily, as I had no prior experience doing so.
This week I went over the Introduction and Review sections of the paper, as I last took AP Calculus senior year of high school… It’s been a few years.
So! The introduction segment. Any activation of a single unit inside a neural network is done using the "dot product of an edge weight vector, w, with an input vector x, plus a scalar bias b." Okay. That was a lot thrown out at me. Let's make that a bit easier. The above can also be written as y = mx + b, a basic linear function where m and x are both matrices. The better way to write that would be like so:
Where n
and i
are how many layers or activation units we have. This could then also be written as z = w * x + b
where z is the 'affine function' (a linear function); the unit's activation is produced by clipping negative values of z to zero.
Another way to visualize a neuron is like so:
Now, when we are training our models, all we are doing is choosing a w and b so we can get our desired output for all of our inputs. We can choose and navigate our best options by using a loss function to grade the final activations against the target for all of our inputs. To minimize the loss, a variation of gradient descent is used where we take the partial derivative (gradient) of an activation with respect to w and b.
In layman's terms? Gradually tweaking w and b in order to make some loss function as close to zero as we can.
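The single-unit computation just described can be written in a couple of lines of numpy (a toy sketch with made-up numbers; neuron is my own name for it):

```python
import numpy as np

def neuron(w, x, b):
    # affine function z = w . x + b, then clip negatives to zero (ReLU)
    z = w @ x + b
    return max(0.0, z)

w = np.array([0.5, -0.25, 0.1])   # edge weight vector
x = np.array([1.0, 2.0, 3.0])     # input vector
b = 0.2                           # scalar bias
print(neuron(w, x, b))  # 0.5*1 - 0.25*2 + 0.1*3 + 0.2 ≈ 0.5
```

Training then amounts to nudging w and b until outputs like this match the targets.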
The next example shown in the paper takes a function we're familiar with, Mean Squared Error, and shows us its derivative (gradient):
At first glance that looks absolutely disgustingly terrifying. But let’s try to break it down into code instead and see if we can try to understand it better.
So first, the original, where N is the number of inputs:
def loss(N):
y = 0
for x in range(N):
y += (targ(x) - activ(x)) ** 2
return y/N
Okay, doesn’t look too bad now. For all inputs, we take the square of our target minus our activation (or our answer). Let’s look at that derivitive now. I made two functions, actf and grad as we have that interior summation.
def grad(N, w, b):
y = 0
for x in range(N):
# each term is the derivative of (targ - actf)^2: -2 * (targ - actf)
y += -2 * (targ(x) - actf(x, w, b))
return y/N
def actf(x, w, b):
# ReLU of the dot product of the weights with the input, plus the bias
y = 0
for i in range(len(w)):
y += w[i] * x[i]
return max(0, y + b)
That looks a bit better; we can see that w and x are both going to be matrices, weight and input respectively, and b is our bias.
Alright, not as scary anymore. The last bit I did was a review of the scalar derivative rules, attempting to recreate them in code. For this I found the sympy library a huge help, as we can visualize functions and their derivatives.
For example, say I have the equation
We can write this in code as y = 3*x**2. Well, if we want the derivative, all we have to do is first declare 'x' as a 'Symbol', then use the .diff function to get the derivative! The result will give us 6*x, which is what we were expecting.
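The steps just described, in runnable form (assuming sympy is installed):

```python
from sympy import Symbol

x = Symbol('x')     # declare x as a symbolic variable
y = 3 * x ** 2
dy = y.diff(x)      # take the derivative with respect to x
print(dy)           # 6*x
```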
Extra_End
Hey @karpathy I tinkered with minbpe until I got the tokenization process to run 10x faster, still in python. Safely merging multiple token pairs simultaneously was the most interesting find. Here's the notebook: https://t.co/MkTecNoWNP
— Alexander Morgan (@lexandermorgan) April 12, 2024
MinBPE Speedups
Updates
This notebook started with a 5X speed improvement, then 10X, and now is at 25X+. If you're returning to this notebook after some time, be sure to read through to the end.
Optimization Exploration of Andrej Karpathy's Tokenization Tutorial
One way of actively engaging with new concepts presented in code is to try to optimize that code. Andrej's video tutorial on tokenization was focused on code clarity since his objective was to demonstrate the ideas and processes of tokenization. If you haven't watched that video and worked through the notebook, definitely do that first. This notebook shares the best speed-ups I found and the things I learned about tokenization along the way.
Algorithmic Complexity of Tokenization
Every time we add a new token to our vocabulary, we make a slightly edited copy of the whole list of tokens. Very roughly speaking the complexity of tokenization is equal to:
Tokens×Vocabulary
If we want to get picky, each time we go through the process it slightly lowers the length of the list of tokens. And the "vocabulary" here is just the additions to the vocabulary beyond the 256 initial tokens. But GPT-4 used about 100,000 tokens, so almost all of them were additions. Tokens×Vocabulary is a lot of work to do, so let's get to work speeding it up.
Ground Rules
All the timings referenced were measured in a very simple manner using timeit in Python 3.10.11 on an M1 MacBook Air. In each case I leave Andrej's methods the same except I remove any print statements. Alternatives to his methods always have the same name, but with 2 or 3 added on.
The cost/benefits of these tweaks will depend on your number of tokens and vocabulary size, and I only tested them on the two texts Andrej provided in his notebook. The shorter of these is provided below so you can test out these functions (the longer one causes pagination issues in Colab so I'm not including it). I'll always refer to speed improvements as a percentage reduction in runtime: going from 10 seconds to 8 seconds is a 20% reduction in runtime. In all cases, the end result is the same.
text = "Unicode! 🅤🅝🅘🅒🅞🅓🅔‽ 🇺🇳🇮🇨🇴🇩🇪! 😄 The very name strikes fear and awe into
the hearts of programmers worldwide. We all know we ought to “support Unicode” in our software
(whatever that means—like using wchar_t for all the strings, right?). But Unicode can be abstruse,
and diving into the thousand-page Unicode Standard plus its dozens of supplementary annexes, reports,
and notes can be more than a little intimidating. I don’t blame programmers for still finding
the whole thing mysterious, even 30 years after Unicode’s inception."
tokens = text.encode("utf-8") # raw bytes
tokens = [*map(int, tokens)] # convert to a list of integers in range 0..255 for convenience
Optimize get_stats() and merge():
get_stats() vs collections.Counter vs collections.defaultdict
The get_stats function turns a list of tokens into a dictionary mapping consecutive pairs of tokens to the number of times that consecutive pair appears in the whole list. Since the "counter" dictionary refers to itself when incrementing its values in the for loop, get_stats can't be written as a dictionary comprehension. get_stats is a very clear "show your work" version of what Python's collections.Counter does. You can also imagine doing this using a defaultdict, giving us three potential implementations:
from collections import Counter, defaultdict
def get_stats(ids):
counts = {}
for pair in zip(ids, ids[1:]): # Pythonic way to iterate consecutive elements
counts[pair] = counts.get(pair, 0) + 1
return counts
def get_stats2(ids): # using collections.Counter
return Counter(zip(ids, ids[1:]))
def get_stats3(ids): # using collections.defaultdict
counts = defaultdict(int)
for pair in zip(ids, ids[1:]):
counts[pair] += 1 # the defaultdict makes it safe to reference a missing key, saving us the .get calls
return counts
Comparing the runtimes of the three implementations above, collections.Counter (get_stats2) wins with a ~50% reduction in runtime compared to get_stats when testing on Andrej's longer sample text. Not bad! It depends on the length and distribution of the token list, but it seems reliably faster on non-trivial token list sizes. The accepted answer to this Stack Overflow question explains the cost/benefit of collections.Counter pretty well. The defaultdict implementation in get_stats3 is actually slightly slower on Andrej's short demo text, but provides a ~25% reduction in runtime on the longer text.
merge() vs List Comprehension
The merge function takes a list of tokens, a target pair of tokens, and a new token to replace the target pair with. It returns a new list of tokens which is a slightly modified copy of the original. Keeping the relative simplicity of the function in mind will help to optimize it. Here is the original implementation:
def merge(ids, pair, idx):
# in the list of ints (ids), replace all consecutive occurrences of pair with the new token idx
newids = []
i = 0
while i < len(ids):
# if we are not at the very last position AND the pair matches, replace it
if i < len(ids) - 1 and ids[i] == pair[0] and ids[i+1] == pair[1]:
newids.append(idx)
i += 2
else:
newids.append(ids[i])
i += 1
return newids
Since we're just copying a list plus a tiny amount of logic, the structure of the loop is an uncommonly large proportion of the runtime. The loop condition while i < len(ids) will run once for every token in the ids list. We always need to do the comparison, but we don't need all those calls to Python's len function. So if we move that out of the loop we only have one call to len instead of one per token. Calling that variable stop, the first few lines look like this:
newids = []
i = 0
stop = len(ids)
while i < stop:
Then the first thing we see in the loop is another call to len, this time with - 1 tagged on, and another comparison of that length to i. We can omit all that if we make this the condition at the top of the while loop. Adapting our stop variable we now have:
newids = []
i = 0
stop = len(ids) - 1
while i < stop:
if ids[i] == pair[0] and ids[i+1] == pair[1]:
That eliminates a ton of work but it does fail to consider the last token in the ids list. We can't just add the last one by default, because the last two tokens might have gotten merged. But if the last pair resulted in a merge, then i should go from being 1 less than stop to 1 more than stop since we iterate by 2 in that case. So we can add the last token if i == stop once we're outside of the loop. It might seem a little awkward to have to handle the last token as a special case, but the way you enter and exit a loop often requires special attention, so this is pretty normal. The payoff is great as merge2 brings a ~50% reduction in runtime compared to merge. Here is the full function:
def merge2(ids, pair, idx):
newids = []
i = 0
stop = len(ids) - 1
while i < stop:
# if the pair matches, replace it
if ids[i] == pair[0] and ids[i+1] == pair[1]:
newids.append(idx)
i += 2
else:
newids.append(ids[i])
i += 1
if i == stop: # if the last pair was not replaced, append the last token
newids.append(ids[-1])
return newids
What's even left in the loop? We just have the logic (which we absolutely need), some .append calls, and some iteration. If we could somehow put this in a list comprehension, we could avoid all the .append calls and allocate the memory needed for the list more efficiently. Since we iterate by different amounts based on our condition, this comprehension is hard to write. But it's possible thanks to the walrus operator and assignment expressions which were added in python 3.8. To make this work, I save the result of the last condition outside of the list comprehension to the skip variable. skip is always False unless the condition was met. If skip is True, it means the previous pair was replaced by the new token (idx), so we just skip this iteration in the loop, but we have to sneak in another walrus operator so that skip will be False again at the next iteration in the loop. This is probably not the kind of thing the walrus operator was intended for, but merge3 provides a ~65% reduction in runtime.
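Here is one way that walrus-based comprehension can look. This is my reconstruction from the description above, so the details may differ from the notebook's actual merge3:

```python
def merge3(ids, pair, idx):
    # skip is True right after a pair was replaced, so the next
    # iteration of the comprehension drops the pair's second element.
    skip = False
    stop = len(ids) - 1
    return [
        # walrus: record whether a pair starts here, and emit idx if so
        idx
        if (skip := (i < stop and ids[i] == pair[0] and ids[i + 1] == pair[1]))
        else ids[i]
        for i in range(len(ids))
        # filter: when skip is True, reset it to False and drop this element
        if (not skip) or (skip := False)
    ]

print(merge3([1, 2, 3, 1, 2], (1, 2), 9))  # [9, 3, 9]
```

The filter clause runs before the output expression on each iteration, which is what lets the second walrus reset skip before the next pair check.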
Author of Blog:Alexander Morgan
@karpathy, I tinkered with minbpe until I got the tokenization process to run 10x faster, still in python.
Extra_End
Public (sub-)domains04/11/2022
Extra:The tremendous influx of traffic to Mastodon got me thinking that it might finally be time to set up my own instance, and how-to posts from Jacob and Simon have only increased that interest. But as a little branding exercise, and especially if I want to offer accounts to a few close friends, surely I could do something a little more fun than just my first and last name.
Many Mastodon instances are on subdomains, and since the early days weirder new-style TLDs have been de rigueur. (The flagship has always been at a .social
!) So I set out to find three-word phrases where the third word is a 4+-letter top-level domain, using as my first source text Moby Dick.
The results were great! The script I wrote output all possible options, which I then spot-checked to see which were available, but I’ve since updated the script to do a quick whois
check to see if the domain is already registered. (whois
support is a little spotty for some of the weirder domains, so many are inconclusive, but I was surprised at some of the good ones available.) As of right now, here are some possible instances available for registration:
- certain.fragmentary.parts
- famous.whaling.house
- moreover.unhesitatingly.expert
- however.temporary.fail
- almost.microscopic.network
- should.nominally.live
- another.whaling.voyage
- surprising.terrible.events
People responded with some cool possible instance names from The Great Gatsby, Frankenstein, White Noise, the King James Bible and more. Really fun.
Normally I would wonder to myself if this kind of thought experiment is cool but this time I feel like I’ve got external validation in the form of the reaction to this thread on Mastodon, which has also been great. Somebody even bought the saddest.city domain on the strength of the strangest.saddest.city find.
The little Python script that finds these uses NLTK to tokenize big text files first into sentences and then, within sentences, into words. Then it checks to see if there are three long-ish words in a row where the third one is on a list of TLDs. Since posting that script on Mastodon yesterday, I have updated it with the built-in whois
check as well.
import string
import sys
import requests
import whois
from nltk import tokenize
BOOKFILE = sys.argv[1]
OUTPUTFILE = BOOKFILE + '.possible-domains.txt'
tlds = []
known_unavailable = ['smile', 'windows','active','amazon','apple','audible',
'bank','baseball','basketball','boots','case','drive',
'fast','fire','fly','museum','origins','post','prime',
'silk','weather']
r = requests.get("https://data.iana.org/TLD/tlds-alpha-by-domain.txt")
for d in r.text.splitlines():
if d.startswith("#") or d.startswith('XN--'):
continue
d = d.lower()
if d not in known_unavailable:
tlds.append(d)
with open(BOOKFILE, 'r') as f:
md = ' '.join([l.strip() for l in f.readlines()])
md_sents = tokenize.sent_tokenize(md)
possible_domains = {}
for s in md_sents:
wl = tokenize.word_tokenize(s)
wl = [w.lower() for w in wl]
wl = [''.join([c for c in w if c in string.ascii_lowercase]) for w in wl]
wl = [w for w in wl if w]
for i, w in enumerate(wl):
if (i > 1 and w in tlds and len(w) > 3
and len(wl[i-1]) > 5 and len(wl[i-2]) > 5):
full_domain = '.'.join([wl[i-2], wl[i-1], w])
try:
d = whois.query(full_domain.split('.',1)[1])
possible_domains[full_domain] = 'reg' if d else 'unreg'
except (whois.exceptions.UnknownTld,
whois.exceptions.FailedParsingWhoisOutput):
possible_domains[full_domain] = 'unknown'
emoji_prefix = {'reg':'❌', 'unreg':'✔️', 'unknown':'❔'}
with open(OUTPUTFILE, 'w') as f:
for d in possible_domains:
f.write(f'{emoji_prefix[possible_domains[d]]} {d}\n')
thisisparker commented Nov 30, 2022
This script was expanded into a better and more robust package, now available on GitHub.
Extra_End
Soham Chowdhury followed Paul Khuong
Finally! Napa-FFT3 is ready for users22/02/2012
Extra:Napa-FFT3 is in the latest Quicklisp release. Unlike previous attempts that were really proofs of concept, this one feels solid enough for actual use.
This third version is extremely different from the first two: rather than trying to compute in-order FFTs without blowing caches, it generates code for bit-reversed FFTs. The idea came from Brad Lucier, who sent me a couple emails and showed how nicely his FFT scaled (it's used in gambit's bignum code). Bit-reversed FFTs don't have to go through any special contortion to enjoy nice access patterns: everything is naturally sequential. The downside is that the output is in the wrong order (in bit-reversed order). However, it might still be an overall win over directly computing DFTs in order: we only need to execute one bit-reversal pass, and we can also provide FFT routines that work directly on bit-reversed inputs.
My hope when I started writing Napa-FFT3 was that I could get away with a single generator that'd work well at all sizes, and that bit-reversing would either not be too much of an issue, or usually not needed (e.g., for people who only want to perform convolutions or filtering).
Overview of the code
The forward and inverse transform generators are pretty simple implementations of the split-radix FFT.
Generator for “flat” base cases output code for a specialised compiler geared toward large basic blocks. The specialised compiler takes potentially very long traces of simple operations on array elements, and performs two optimisations: array elements are cached in variables (registers), and variables are explicitly spilled back into arrays, following Belady's algorithm. That allows us to easily exploit the register file, without taking its size directly into account in the domain-specific generators, and even when we have to cope with a relatively naïve machine code generator like SBCL's.
Larger input sizes instead use a generator that outputs almost-normal recursive code; theret's one routine for each input size, which helps move as much address computation as possible to compile-time.
Even with code to handle scaling and convolution/filtering, I feel that the generators are easily understood and modified. They currently only support in-order input for the forward transform, and in-order output for the inverse, but the generators are simple enough that adding code for all four combinations (in-order input or output, forward or inverse transform) would be reasonable! I believe thatt's a win.
Better: it seems my hope that we can execute bit reverses quickly was more than justified. It'm not quite sure how to describe it, but the code is based on recursing on the indices from the middle bits toward the least and most significant bits. The result is that the theret's exactly one swap at each leaf of the recursion, and that, when cache associativity is high enough (as is the case for the x86 chips I use), all the cache misses are mandatory. Better, the recursiveness ensures that the access patterns are also TLB optimal, when the TLB associativity is high enough (or infinite, as for my x86oids).
There's one issue with that recursive scheme: it's really heavy in integer arithmetic to compute indices. Again, I generate large basic blocks to work around that issue. The last couple levels (three, by default) of the recursion are unrolled and compiled into a long sequence of swaps. The rest of the recursion is executed by looping over a vector of indices that were computed at compile time.
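For reference, this is the plain bit-reversal permutation itself, as a Python sketch (my code, not the generator's cache-aware middle-bits recursion): element i of the input lands at the position obtained by reversing the bits of i.

```python
def bit_reverse_permute(xs):
    """Return xs reordered so that element i moves to position bit-reverse(i).

    len(xs) must be a power of two.  This is the naive O(n log n) version;
    Napa-FFT3 instead recurses from the middle bits outward to keep the
    memory accesses cache- and TLB-friendly.
    """
    n = len(xs)
    bits = n.bit_length() - 1
    out = [None] * n
    for i in range(n):
        r = int(format(i, f'0{bits}b')[::-1], 2) if bits else 0
        out[r] = xs[i]
    return out

# Matches the bin order of the bit-reversed FFT example below.
print(bit_reverse_permute(list(range(8))))  # -> [0, 4, 2, 6, 1, 5, 3, 7]
```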
Correctness
I have a hard time convincing myself that code generators are correct, especially without a nice static type system. Instead, I heavily tested the final generated code. I'm using Common Lisp, so array accesses were all checked automatically, which was very useful early in the development process. Once I was convinced that all accesses were correct, I turned bounds and type checking off. The first test file implements a set of randomised tests proposed by Funda Ergün. That was enough for me to assume that the FFTs themselves were correct. I then turned to a second set of tests to try and catch issues in the rest of the code that builds on straight FFTs.
The process did catch a couple bugs, and makes me feel confident enough to let other people use Napa-FFT3 in their programs.
Performance
Napa-FFT and Napa-FFT2 managed to come reasonably close to FFTW's performance. When I started working on Napa-FFT3, I hoped that it could come as close, with much less complexity. In fact, it performs even better than expected: Napa-FFT3 is faster than Napa-FFT(2) at nearly all sizes, and outperforms FFTW's default planner for out-of-cache transforms (even with the bit-reversal pass).
Napa-FFT3: Overview
Napa-FFT3 is a complete rewrite of Napa-FFT (version 2 is an aborted experiment). The goal is still the same: to provide, via a mixture of cache-friendly algorithms and code generation, FFT routines in Common Lisp that offer performance comparable to the state of the art. In that regard, it is a success: depending on how it's used, Napa-FFT3 is, at most, around three times as slow as FFTW on small or medium inputs, and can be faster than FFTW for large inputs. The complete picture is more complicated than this; see the Performance section for details.
The goal of Napa-FFT3 isn't only to provide Discrete Fourier Transform (DFT) routines, but also (rather) to provide building blocks to express common operations that involve DFTs: filtering, convolutions, etc. This is what enables Napa-FFT to achieve such high performance without optimizing at the assembly level. The Easy Interface section should suffice for most developers; the Low-level Interface is described in another section, and may be of interest to some.
Napa-FFT3 also expressly supports FFTs on real data and inverse FFTs back to real data. The Real Interface section describes the facility, and is used in conjunction with the Easy Interface.
Finally, see the Installation section for installation instructions, and the Implementation section for all the gory details.
Note that Napa-FFT3 currently only supports power-of-two-sized inputs; even if it gains code for arbitrary sizes, powers of two will most likely remain much more efficient, both in terms of runtime and space usage.
To recapitulate:
- 1. Installation: installation instructions;
- 2. Easy Interface: convenience functions;
- 3. Real Interface: convenience functions for real-only input or output;
- 4. Examples: more examples;
Installation
Napa-FFT3 is a regular ASDF system defined in napa-fft3.asd. If Quicklisp is installed, it suffices to copy the Napa-FFT3 directory under ~/quicklisp/local-projects.
Once registered with ASDF, Napa-FFT3 can be loaded by executing (asdf:oos 'asdf:load-op "napa-fft3"), or, with Quicklisp, (ql:quickload "napa-fft3").
Easy Interface
FFT
Syntax: fft vec &key dst size in-order scale window => vector.
Arguments and Values:
- 1. vec: sequence of samples.
- 2. dst: nil (default) or a simple vector of complex samples (destructively reused).
- 3. size: size of the transform to perform (must be a power of two). (length vec) if nil (default).
- 4. in-order: whether the result should be in-order (default, t) or bit-reversed (nil).
- 5. scale: how the result should be scaled: not at all (default, nil), by 1/sqrt(size) (:sqrt or sqrt), or by 1/n (t, or :inv).
- 6. vector: a simple array of complex doubles. dst if not nil, otherwise a newly-allocated array.
FFT computes the DFT of the first size values in vec.
First, vec is converted to a simple array of complex samples if necessary. The result is stored in dst, or a fresh array of complex doubles. dst may be the same object as vec for an in-place transform.
If window is non-nil, each value in vec is multiplied by the corresponding value in window during the transform; similarly, the values are scaled according to the value of scale.
If in-order is true, the result is then converted to be in order, which can take more than half as much time as the FFT itself.
Examples:
CL-USER> (napa-fft:fft '(0 1 2 3 4 5 6 7))
#(#C(28.0d0 0.0d0) #C(-4.0d0 9.65685424949238d0) #C(-4.0d0 4.0d0)
#C(-4.0d0 1.6568542494923806d0) #C(-4.0d0 0.0d0)
#C(-4.0d0 -1.6568542494923806d0) #C(-4.0d0 -4.0d0)
#C(-4.0d0 -9.65685424949238d0))
;; the same, but bit reversed
CL-USER> (napa-fft:fft '(0 1 2 3 4 5 6 7) :in-order nil)
#(#C(28.0d0 0.0d0) #C(-4.0d0 0.0d0) #C(-4.0d0 4.0d0) #C(-4.0d0 -4.0d0)
#C(-4.0d0 9.65685424949238d0) #C(-4.0d0 -1.6568542494923806d0)
#C(-4.0d0 1.6568542494923806d0) #C(-4.0d0 -9.65685424949238d0))
;; :scale nil is the default
CL-USER> (napa-fft:fft '(0 1 2 3) :scale nil)
#(#C(6.0d0 0.0d0) #C(-2.0d0 2.0d0) #C(-2.0d0 0.0d0) #C(-2.0d0 -2.0d0))
;; the same, but scaled by 1/4
CL-USER> (napa-fft:fft '(0 1 2 3) :scale t)
#(#C(1.5d0 0.0d0) #C(-0.5d0 0.5d0) #C(-0.5d0 0.0d0) #C(-0.5d0 -0.5d0))
;; again, scaled by 1/sqrt(4) = 1/2
CL-USER> (napa-fft:fft '(0 1 2 3 5 6 7 8) :size 4 :scale :sqrt)
#(#C(3.0d0 0.0d0) #C(-1.0d0 1.0d0) #C(-1.0d0 0.0d0) #C(-1.0d0 -1.0d0))
Author of Blog: PAUL KHUONG: SOME LISP
Extra_End
Reconstructing a cosine similarity (12/03/2023)
Let's say we have vectors A1..An. We know their dot products to u and v, but not u and v themselves. Can we estimate u.v (i.e. their cosine similarity)?
This could be handy in vector similarity systems. With just a few strategically selected reference points A1..An we might get an accurate view of u.v without storing the full vector. Or at least that's the hope!
We first asked this question with just one reference point. Then we upgraded to two. Next we reach out with our feelings to get to all An reference points.
Recap - using two reference points
With just one reference point, we can estimate u.v with just u.A1 * v.A1. Easy enough.
To add a second, A2, we figure out how much of u.A2*v.A2 to include in u.v. We see how much of A2 is already included in the original reference point A1. The leftover parts, we add to u.v.
Applying the trig knowledge you told your teacher you'd never need, you know that 'leftover' here is really 'sin'. So we get:
u.v = u.A1*v.A1 + sin(θ1)*u.A2*v.A2
Or put another way: if A1 and A2 are parallel, there's no point in looking at the dot product u.A2 - it's just u.A1 again. But as A2 rotates away, we add more and more of u.A2 in.
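As a sanity check, the two-reference-point estimate can be computed numerically. This is a Python sketch of the formula above; all names are mine, the vectors are random unit vectors, and the formula is the post's heuristic, so the estimate won't match u.v exactly.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    n = math.sqrt(dot(u, u))
    return [x / n for x in u]

random.seed(0)
dim = 16
u = unit([random.gauss(0, 1) for _ in range(dim)])
v = unit([random.gauss(0, 1) for _ in range(dim)])
A1 = unit([random.gauss(0, 1) for _ in range(dim)])
A2 = unit([random.gauss(0, 1) for _ in range(dim)])

# Angle between the two reference points; sin(theta) weighs how much
# "new" information A2 contributes beyond A1.
theta = math.acos(max(-1.0, min(1.0, dot(A1, A2))))

# u.v ~= u.A1*v.A1 + sin(theta)*u.A2*v.A2
estimate = dot(u, A1) * dot(v, A1) + math.sin(theta) * dot(u, A2) * dot(v, A2)
print(estimate, dot(u, v))  # heuristic estimate vs. the true cosine similarity
```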
Up to 3 (or n) reference points
Going up to 3 (or n) dimensions, we need a similar procedure. However we need the leftovers of A3 outside the plane created by A1 and A2. That little slice of heaven shown below:
The mathy way of defining the “plane” where A1 and A2 lie (really any A1...An-1) is called a span.
The problem becomes: how do we find an angle, θ, between An - not on the span - and its closest point on the span A1...An-1?
If we find θ we can add another + sin(θ)*u.An*v.An to the dot product and improve our estimate.
The 'closest point' of An on the span is known as a vector's orthogonal projection, shown below:
If we could find the projection, a little trig can get us θ - the angle between An and its projection - and thus sin(θ) - and voilà, leftovers 🍰!
How do we find the projection? Luckily there's an existing algorithm. However, it requires an important input: the vectors describing the A1..An-1 “slice of heaven” span must be made orthogonal to each other.
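The post doesn't name the algorithm here, but classical Gram-Schmidt is one standard way to orthogonalize a set of vectors and then project onto their span. A minimal Python sketch (function names are mine):

```python
def gram_schmidt(vectors):
    """Orthonormalize a list of vectors via classical Gram-Schmidt."""
    basis = []
    for v in vectors:
        # Subtract the components of v that lie along the basis so far.
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        n = sum(x * x for x in w) ** 0.5
        if n > 1e-12:          # skip vectors already inside the span
            basis.append([x / n for x in w])
    return basis

def project_onto_span(v, basis):
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        c = sum(x * y for x, y in zip(v, b))
        out = [o + c * y for o, y in zip(out, b)]
    return out

# The span of these two vectors is the xy-plane, so projecting
# [1, 2, 3] onto it drops the z component.
basis = gram_schmidt([[1, 0, 0], [1, 1, 0]])
print(project_onto_span([1, 2, 3], basis))  # -> [1.0, 2.0, 0.0]
```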
Extra_End
DSPLib - FFT / DFT Fourier Transform Library for .NET 4 by Steve Hageman (11/06/2023)
DSPLib is a complete DSP library that is an end-to-end solution for performing FFTs with .NET 4
In this post, you will find a practical, organized and complete .NET 4+ Open Source library of DSP oriented routines released under the very non-restrictive MIT License.
Examples
Enough talk about the general aspects of DSPLib. The examples below will show how easy it is to apply in practice.
Example 1
void example1()
{
    // Generate a test signal:
    //   1 Vrms at 20,000 Hz
    //   Sampling Rate = 100,000 Hz
    //   DFT Length is 1000 Points
    double amplitude = 1.0;
    double frequency = 20000;
    UInt32 length = 1000;
    double samplingRate = 100000;
    double[] inputSignal = DSP.Generate.ToneSampling(amplitude, frequency, samplingRate, length);

    // Instantiate a new DFT
    DFT dft = new DFT();

    // Initialize the DFT
    // You only need to do this once, or whenever you change any of the DFT parameters.
    dft.Initialize(length);

    // Call the DFT and get the scaled spectrum back
    Complex[] cSpectrum = dft.Execute(inputSignal);

    // Convert the complex spectrum to magnitude
    double[] lmSpectrum = DSP.ConvertComplex.ToMagnitude(cSpectrum);

    // Note: At this point, lmSpectrum is a 501 point array that
    // contains a properly scaled spectrum from 0 - 50,000 Hz (1/2 the sampling frequency)

    // For plotting on an XY scatter plot, generate the X axis frequency span
    double[] freqSpan = dft.FrequencySpan(samplingRate);

    // At this point an XY scatter plot can be generated from:
    //   X axis => freqSpan
    //   Y axis => lmSpectrum

    // In this example, the maximum value of 1 Vrms is located at bin 200 (20,000 Hz)
}
Extra: Introduction
There is a real need for a ready-to-use Fourier Transform library that users can take right out of the box to perform Fast Fourier Transforms (FFT) or Discrete Fourier Transforms (DFT) and get a classical spectrum-versus-frequency plot.
The vast majority of the code that you will find in commercial packages, open-source libraries, textbooks and on the Web is simply unsuited for this task and takes hours of further tweaking to get a classic and properly scaled spectrum plot.
The library presented here is a practical, organized and complete .NET 4+ Open Source library of DSP oriented routines released under the very non-restrictive MIT License.
What DSPLib Does
DSPLib has several main parts, but its basic goal is to allow a real Fourier Transform to be performed on a time series input array, resulting in a usable classic spectrum output without any further tweaking required by the user.
Fourier Transforms (FT) come in two basic types. The most general form can produce a spectrum output from any length of input data; this type of transform is called the Discrete Fourier Transform, or DFT. The code is simple and brute force.
- The pros are: The input data can be any length.
- The cons are: Since it is a general method, it is computationally intensive and large input data sets can take a very long time to calculate.
A more specific type of FT is called the Fast Fourier Transform or FFT.
- The pros are: It is much, much faster to compute than the DFT.
- The cons are: The input data length is constrained to powers of two. The code is more complex to understand.
As an example: an 8192 point FFT takes less than 1 millisecond on my i7 computer. A DFT of the same length takes 360 milliseconds. Hence you can see why the FFT is much more popular than the brute force DFT in real time applications.
DSPLib implements both kinds of Fourier Transform.
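To make the DFT/FFT trade-off concrete, here is a minimal pure-Python sketch of both transforms (illustrative only, not DSPLib's C# implementation): the brute-force DFT accepts any input length but does O(N^2) work, while the radix-2 FFT requires a power-of-two length and does O(N log N) work.

```python
import cmath

def dft(x):
    """Brute-force O(N^2) DFT: works for any input length."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Radix-2 Cooley-Tukey FFT: O(N log N); length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + twiddle[k] for k in range(n // 2)] + \
           [even[k] - twiddle[k] for k in range(n // 2)]

x = [complex(i) for i in range(8)]
# Both transforms agree to floating-point precision.
print(max(abs(a - b) for a, b in zip(dft(x), fft(x))) < 1e-9)  # -> True
```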
All FTs can take a real or complex number input and naturally produce a complex number result. Nearly all FT libraries produced to date have implemented their own complex number type, and this naturally leads to incompatibilities between libraries. .NET 4.0 finally includes (in the System.Numerics namespace) a Complex number structure and many math methods for operating on complex numbers. DSPLib incorporates the .NET 4 Complex number type.
To have real speed improvements and to automatically scale the speed on processors with multiple cores and/or threads, DSPLib also implements the Task Parallel Extensions built into .NET 4. This leads to a real improvement in execution time that automatically scales with the number of processor cores / threads available. For instance, on a 2 core / 4 thread i7 processor, using the Task Parallel extensions decreases the execution time of a 10,000 point DFT by more than 3X.
Smart caching of Sine and Cosine multiplication terms on smaller DFTs also increases performance by around 3X.
Both of these easy to implement features increase the raw DFT Speed by around 9 times even on a low end i7 processor.
Real and Imaginary Spectrum Parts
All FT implementations naturally produce an output data array that is the same length as the input array. The output however consists not only of complex numbers, but Real and Imaginary parts of the Spectrum itself – sometimes called negative frequencies depending on how one wants to think about it. The Imaginary part of the spectrum is just the mirror image of the Real part and does not contain any additional information about the spectrum.
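This mirror symmetry is easy to verify numerically. The sketch below is mine, not DSPLib code: it computes a brute-force DFT of a real signal and checks that bin N-k is the complex conjugate of bin k, so only the first N/2 + 1 bins carry independent information.

```python
import cmath

def dft(x):
    """Brute-force DFT, enough to demonstrate conjugate symmetry."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

spectrum = dft([0.0, 1.0, 0.5, -0.25, 0.0, 0.3, -1.0, 0.7])
n = len(spectrum)
# For real input, bin n-k mirrors bin k as its complex conjugate.
mirrored = all(abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9
               for k in range(1, n // 2))
print(mirrored)  # -> True
```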
Zero Padding and Scaling
Zero padding is a very useful trick that is used with FFTs and DFTs. One use is to round the length of input data up to the next power of two so that a faster FFT can be used for the transform. For instance, if you had 1000 points of input data, you can zero pad an additional 24 points onto the data so that a 1024 point FFT can be performed instead of a slower 1000 point DFT (nearly a 9X speed improvement on my computer).
Another use for zero padding is to make the display look more like an expected spectrum display. Zero padding broadens any peak(s) by interpolating points on the display and this makes a plot look better on a display screen or a plot. Zero padding also interpolates the space between calculated points reducing amplitude error when signals are not directly at the bin centers of the FT.
Finally, zero padding allows a very fine look at the sidelobes and sideband suppression added by the window.
Zero padding has an influence on the resulting spectrum output amplitudes even when the signals are at the bin centers. The FT routines presented here take an initialization parameter that includes the desired zero padding number so that the proper re-scaling factors can automatically be taken into account. Since the FT routines know about the desired zero padding for scaling reasons, they add the requested amount of zeros to the input data so the user does not have to.
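A sketch of the padding step, in Python rather than C#; the amplitude-correction factor shown (padded length / original length) is my assumption about the kind of rescaling the library automates, not DSPLib's actual code.

```python
def zero_pad_to_power_of_two(signal):
    """Pad a signal with zeros up to the next power-of-two length.

    Returns the padded signal plus an amplitude-correction factor
    (padded length / original length).  Assumption: under a 2/N-style
    spectrum normalization, multiplying bin magnitudes by this factor
    restores the amplitudes the unpadded transform would have reported.
    """
    n = len(signal)
    padded_len = 1 << (n - 1).bit_length()  # next power of two >= n
    correction = padded_len / n
    return signal + [0.0] * (padded_len - n), correction

# 1000 samples round up to a 1024-point transform.
padded, correction = zero_pad_to_power_of_two([0.0] * 1000)
print(len(padded), round(correction, 3))  # -> 1024 1.024
```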
Reference [2] has more information on Zero Padding and even more uses for it.
About
Steve Hageman has been a confirmed 'Analog-Crazy' since about the fifth grade. He has had the pleasure of designing op-amps, switched-mode power supplies, gigahertz-sampling oscilloscopes, Lock In Amplifiers, Radio Receivers, RF Circuits to 50 GHz and test equipment for digital wireless products. Steve knows that all designs can't be done with Rs, Ls, and Cs, so he dabbles with programming PC's and embedded systems just enough to get the job done (like for this project).
Rating: 4.93/5 (69 votes)
Extra_End