Design Doc: AWS Messaging Framework #42

Open
ashovlin opened this issue Oct 5, 2022 · 37 comments
Labels
feature-request New feature or enhancement. May require GitHub community feedback. needs-discussion

Comments

@ashovlin
Member

ashovlin commented Oct 5, 2022

The AWS .NET team is exploring creating an AWS native framework that simplifies development of .NET message processing applications using AWS services.

The design doc can be viewed and commented on in PR #41 (rendered view).

The purpose of the framework would be to reduce the amount of boilerplate code developers need to write. The primary responsibilities of the proposed framework are:

  • Handling the message routing - In a publisher, the framework will handle routing the messages to the correct queue/topic/event bus. In a consumer process, it will route the particular message type to the appropriate business logic.
  • Handling the overall message lifecycle - The framework will handle serializing/deserializing the message to .NET objects, keeping track of the message visibility while it is being processed, and deleting the message when completed.

Here is an example showing a sample publisher and handler for a hypothetical OrderInfo message.

Sample publisher:

[ApiController]
[Route("[controller]")]
public class OrderController : ControllerBase
{
    // See later in the design for how this was configured and mapped to the queue
    private readonly IMessagePublisher _publisher;

    public OrderController(IMessagePublisher publisher)
    {
        _publisher = publisher;
    }

    [HttpPost]
    public async Task Post([FromBody] OrderInfo orderInfo)
    {
        // Add internal metadata to the OrderInfo object 
        // we received, or any other business logic
        orderInfo.OrderTime = DateTime.UtcNow;
        orderInfo.OrderStatus = OrderStatus.Received;

        // The updated OrderInfo object will also be serialized as the SQS message
        await _publisher.PublishAsync(orderInfo);
    }
}

Sample handler:

// See later in the design for how this was configured and mapped to the queue
public class OrderInfoHandler : IMessageHandler<OrderInfo>
{
    public async Task<MessageStatus> HandleAsync(MessageEnvelope<OrderInfo> message, CancellationToken cancellationToken = default(CancellationToken))
    {
        // Here we're reading from the message within the metadata envelope
        var productId = message.Message.ProductId;
        
        // Here we can do our business logic based on what is in the message
        await UpdateInventory(productId);
        await PrintShippingLabel(productId, message.Message.CustomerId);

        // Indicate that OrderInfo has been processed successfully
        return MessageStatus.Success;
    }
}

On either this issue or on specific section(s) of the design in PR #41, please comment with:

  • What are your thoughts and feedback on the proposed framework design?
  • What are your thoughts on the initial MVP scope?
  • Which AWS messaging services do you use?
    • Amazon SQS
    • Amazon SNS
    • Amazon EventBridge
    • Amazon Kinesis
    • Amazon MQ
    • Other (specify which ones)

ashovlin added the feature-request and needs-discussion labels Oct 5, 2022
ashovlin pinned this issue Oct 5, 2022
@Kralizek

Kralizek commented Oct 5, 2022

Would it be possible to add a way to read/write message headers so that we can keep the messages relevant to the business logic and use the headers for anything that is not business logic?

@ashovlin
Member Author

ashovlin commented Oct 5, 2022

Would it be possible to add a way to read/write message headers so that we can keep the messages relevant to the business logic and use the headers for anything that is not business logic?

@Kralizek - I'm curious, do you have an example of a piece of data that you'd want to append to the headers but not in the actual message?

But I do think it would be possible to append either arbitrary keys or perhaps a dedicated Metadata or Headers section in the message envelope that would be available to publishers/subscribers outside of the .NET type that is serialized as the message.

@Kralizek

Kralizek commented Oct 6, 2022

I'm curious, do you have an example of a piece of data that you'd want to append to the headers but not in the actual message?

Things like: correlation, session and/or transaction ID, authentication tokens, trace headers
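
A minimal sketch of how such headers could sit next to the payload, assuming a hypothetical `Envelope<T>` type with a `Headers` dictionary and a `Follow` helper. None of these names come from the design doc; the point is only that correlation or trace values travel outside the serialized business object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical envelope: the business payload stays in Message,
// while cross-cutting data travels in Headers.
public sealed record Envelope<T>(T Message)
{
    public string Id { get; init; } = Guid.NewGuid().ToString();

    // Case-insensitive keys, so "Correlation-Id" and "correlation-id" match.
    public Dictionary<string, string> Headers { get; init; } =
        new(StringComparer.OrdinalIgnoreCase);

    // Copies the headers of an incoming envelope onto a new outgoing one,
    // so a message published while handling it stays in the same flow.
    public Envelope<TNext> Follow<TNext>(TNext next)
    {
        var child = new Envelope<TNext>(next);
        foreach (var (key, value) in Headers)
        {
            child.Headers[key] = value;
        }
        return child;
    }
}

// Illustrative message types only.
public sealed record OrderInfo(string ProductId, string CustomerId);
public sealed record ShipmentRequested(string ProductId);
```

A handler would read `envelope.Headers["correlation-id"]` for logging and call `envelope.Follow(...)` before publishing a follow-up message, without either value appearing on `OrderInfo` itself.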

@commonsensesoftware

Is there some reason not to use Dapr? Its entire purpose is to provide common messaging infrastructure in a cloud vendor-neutral manner. AWS can already be leveraged for the cloud services backing it. Improving that story might be better than starting anew. Furthermore, there are Dapr SDKs for .NET, Python, Java, JavaScript, Go, and PHP.

@normj
Member

normj commented Oct 11, 2022

@commonsensesoftware Dapr is a good choice for users aiming to have a vendor-neutral solution. It does come with the extra complexity of running a sidecar container, and since it is vendor-neutral it ignores some of the vendor-specific features like FIFO queues for SQS and message attributes.

What we want to create is a library that is more lightweight and can be easily used in any compute environment, whether that is virtual machines, containers or Lambda, by just including a NuGet package. It won't be the right pick for users wanting a vendor-neutral solution, but things like Dapr and MassTransit are great for those requirements. From our research, a large percentage of users are working with SQS and SNS without those vendor-neutral abstractions and end up creating their own lightweight abstraction. We want to help remove that undifferentiated work we see being done.

@jeffhollan

jeffhollan commented Oct 11, 2022

I agree Dapr has some additional overhead (the sidecar) and is even language-agnostic in how it abstracts these. I'm all for this, and would be interested to see whether, after "proving" it with AWS, it could be proposed as a general messaging abstraction for any cloud messaging service (Google PubSub, Azure Service Bus, etc.) in .NET, as I think this pattern could be useful across cloud services. But I don't want the team to have to get all clouds to align before doing anything, so I think starting with this, with an eye towards a possible contribution / proof point upstream, would be cool.

@adamhathcock

I use MassTransit with SQS/SNS. There's a lot of configuration and code around it. Having a similar but more AWS native experience would be nice.

I don't use a lot of the more robust features of MassTransit (Sagas, Outbox, etc) so I could switch if there's a good foundation for SQS/SNS. Also, Kinesis support would be great.

Haven't looked into EventBridge but I probably should if there's a framework around it.

@jonnermut

jonnermut commented Oct 11, 2022

I think this looks like a good initiative that would make SQS/SNS for messaging more approachable.

We have built something similar for our specific messaging architecture:

  • A message pump to pull from SQS (which are subscribed to SNS topics)
  • Routing to spawn a Hangfire task based on the type of the event. By sending all events to Hangfire we get buffering, concurrency control, visibility/observability, retries etc. We did need to impose a throttle on the message pump to make sure we didn't overwhelm Hangfire, though, as Hangfire doesn't scale infinitely when you're using Redis in-memory storage.
  • Serialisation/Deserialisation using the https://cloudevents.io standard envelope

We might have used MassTransit, but it's too opinionated, and it imposes its own envelope format, making it only compatible with other MassTransit services. We have a very heterogeneous environment with all sorts of non-.NET producers and consumers, which is why we chose CloudEvents as a standards-based envelope.

I notice in the PR that you are making up a new message envelope format. Please consider CloudEvents, or making the envelope format and/or deserialisation pluggable somehow.
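
For readers unfamiliar with the format, here is a rough sketch of a CloudEvents 1.0 envelope in its JSON representation, using only `System.Text.Json`. The attribute names `specversion`, `id`, `source` and `type` are the required context attributes from the CloudEvents specification; the `CloudEvent<T>` and `CloudEventJson` types are hypothetical names for illustration and are not the envelope proposed in the PR.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type mirroring the CloudEvents 1.0 JSON attributes.
public sealed class CloudEvent<T>
{
    [JsonPropertyName("specversion")] public string SpecVersion { get; init; } = "1.0";
    [JsonPropertyName("id")] public string Id { get; init; } = Guid.NewGuid().ToString();
    [JsonPropertyName("source")] public string Source { get; init; } = "";
    [JsonPropertyName("type")] public string Type { get; init; } = "";
    [JsonPropertyName("time")] public DateTimeOffset Time { get; init; } = DateTimeOffset.UtcNow;
    [JsonPropertyName("datacontenttype")] public string DataContentType { get; init; } = "application/json";
    [JsonPropertyName("data")] public T Data { get; init; } = default!;
}

public static class CloudEventJson
{
    // Wraps a payload in the envelope and serializes the whole event.
    public static string Wrap<T>(T data, string source, string type) =>
        JsonSerializer.Serialize(new CloudEvent<T> { Source = source, Type = type, Data = data });

    // Parses an event and exposes the typed payload through Data.
    public static CloudEvent<T> Unwrap<T>(string json) =>
        JsonSerializer.Deserialize<CloudEvent<T>>(json)
        ?? throw new JsonException("Empty CloudEvent payload.");
}

// Illustrative payload type only.
public sealed record OrderInfo(string ProductId, string CustomerId);
```

Because the envelope is plain JSON with well-known attribute names, a non-.NET consumer can read the same message without knowing anything about the publishing library.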

Similarly, please consider making the routing logic as pluggable/overridable/flexible as possible, for instance to support our use case of processing messages by Hangfire. Maybe that's actually just a generic IMessageHandler that creates the Hangfire task.

Observability (in the monitoring/logging/tracing sense) is really important!

We need to be able to set message properties when publishing to SNS in order to be able to filter messages at the subscription level.

One last thing to consider for the design - most real services will be running multiple servers/pods. Which is probably fine if all instances are sitting there polling SQS and each one grabs a message from the queue, processes it, and deletes it from the queue. But it is not fine if you expect every server to get every message. That's not relevant to our use case, where we pump messages into Hangfire tasks which are then handled by N Hangfire servers, but it's something important to cover off in the design.

@SamuelCox

Would like to echo the request for CloudEvents

@normj
Member

normj commented Oct 11, 2022

@SamuelCox What are you using CloudWatch Events for that EventBridge doesn't do for you? EventBridge is meant to be a superset of CloudWatch Events.

@SamuelCox

SamuelCox commented Oct 11, 2022

@SamuelCox What are you using CloudWatch Events for that EventBridge doesn't do for you? EventBridge is meant to be a superset of CloudWatch Events.

I should have been clearer. I was asking for built-in support for the CloudEvents standard, https://cloudevents.io/, which has nothing to do with CloudWatch Events.

@bjorg

bjorg commented Oct 11, 2022

I just want to urge caution on AWS Kinesis. Unlike the other services, Kinesis semantics are quite different. It's a stream, which means no subsequent records can be read until the current one is successfully processed (just like in a file stream). The other message services are not as order-sensitive. When all goes well, they may look the same, but when things go wrong, they act very differently.

@bjorg

bjorg commented Oct 11, 2022

If SNS is used to deliver to SQS, would this abstraction hide that as well? In short, would it unbox the SQS wrapper and also the SNS wrapper before deserializing?

I assume that batched SQS messages would allow for partial failures, correct? So, if 9 out of 10 batch SQS messages were successfully processed, only the failed one would become visible again (instead of all 10).
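
The partial-failure behaviour asked about here can be sketched as follows. This is an illustration under assumptions, not the framework's implementation: `QueueMessage` and `BatchProcessor` are hypothetical names, and the caller is assumed to delete every message whose ID is not in the returned list.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a received queue message.
public sealed record QueueMessage(string MessageId, string Body);

public static class BatchProcessor
{
    // Runs the handler on every message in the batch and returns the IDs
    // of the ones that failed. Only those would be left to become visible
    // again; the rest can be deleted from the queue.
    public static async Task<IReadOnlyList<string>> ProcessAsync(
        IEnumerable<QueueMessage> batch,
        Func<QueueMessage, Task> handler)
    {
        var failed = new List<string>();
        foreach (var message in batch)
        {
            try
            {
                await handler(message);
            }
            catch (Exception)
            {
                // One bad message does not fail the whole batch.
                failed.Add(message.MessageId);
            }
        }
        return failed;
    }
}
```

With a batch of 10 where one handler call throws, the returned list holds a single ID, which matches the "9 out of 10" scenario above. An ordered (FIFO) queue would likely need to stop at the first failure instead of continuing.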

@iancooper

iancooper commented Oct 11, 2022

So, as the owner of Brighter: https://github.com/BrighterCommand/Brighter which also operates in this space a few thoughts.

  • Welcome to the party!!
  • We already have Brighter, MassTransit, NServiceBus, Rebus, JustSaying etc. in this space; within .NET it is a crowded field. You will gain a lot by being in the AWS SDK when looking at traction, but those projects have acquired a rich set of features over time, such as Outbox support. On top of that there are sidecar approaches like Dapr, so you will need to think about how you compete.
  • I suspect any "special sauce" would be a lower level of abstraction than we have in Brighter, where we abstract the transports to a more 'generic' view of a queue or stream. The more AWS-specific features you have, the better you can offset the features you will be missing compared with other projects.
  • So whilst I get the pitch is around "just write a handler", that is true for all the projects above, e.g. Brighter has:
[RequestLogging(0, HandlerTiming.Before)]
[UsePolicyAsync(step:1, policy: Policies.Retry.EXPONENTIAL_RETRYPOLICYASYNC)]
public override async Task<AddPerson> HandleAsync(AddPerson addPerson, CancellationToken cancellationToken = default(CancellationToken))
{
	await _uow.Database.InsertAsync<Person>(new Person(addPerson.Name));
	
	return await base.HandleAsync(addPerson, cancellationToken);
}

which is very similar to your pitch, so it's not really a USP for you.

  • The danger I think lies in abstraction over the wide variety of AWS services. If you choose to make access to them all the same, you will probably end up in an "abstract transport" model alongside the rest of us, which would tend to weaken your special sauce argument and open you up to the problem of chasing 'feature parity' with the incumbents. Especially when the incumbents are likely to support non-native AWS transports and outboxes.
  • I suspect some of this lies less in how you handle SQS, where the model is only so different, than in handling things like Kinesis, where you might be able to expose a stream model in a way that exposes more of the differences between them.

A project like Brighter is only going to go so deep on AWS integration; the real benefit to the .NET ecosystem would be if you went deeper than we could.

@lee-11

lee-11 commented Oct 12, 2022

I really hope this isn't another "make the easy stuff easier" endeavor. There are difficult bits that others have mentioned. Implementing messaging systems that "fail well" rather than "fail poorly" is the real challenge.

@normj
Member

normj commented Oct 13, 2022

@bjorg

I just want to urge caution on AWS Kinesis. Unlike the other services, Kinesis semantics are quite different. It's a stream, which means no subsequent records can be read until the current one is successfully processed (just like in a file stream). The other message services are not as order-sensitive. When all goes well, they may look the same, but when things go wrong, they act very differently.

I'm also skeptical that Kinesis fits well in this library, with the same concerns you have about it being a stream, but I'm not ready to close the door completely.

If SNS is used to deliver to SQS, would this abstraction hide that as well? In short, would it unbox the SQS wrapper and also the SNS wrapper before deserializing?

I assume that batched SQS messages would allow for partial failures, correct? So, if 9 out of 10 batch SQS messages were successfully processed, only the failed one would become visible again (instead of all 10).

Yes, the library would take care of unwrapping the SNS envelope. I also want to make sure the serialization/deserialization is extensible, so an advanced user could register their own serialization implementation in the DI container for the library to use. And yes on handling partial failures.
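
As an illustration of what unwrapping the SNS envelope involves: when SNS delivers to SQS without raw message delivery, the SQS body is a JSON document with fields such as `Type`, `TopicArn` and `Message`, and the original payload is the string in `Message`. The sketch below uses only `System.Text.Json`; the `SnsEnvelope` class name is hypothetical and this is not the library's actual code.

```csharp
using System.Text.Json;

public static class SnsEnvelope
{
    // Returns the inner payload if the SQS body looks like an SNS
    // notification; otherwise returns the body unchanged.
    public static string Unwrap(string sqsBody)
    {
        try
        {
            using var doc = JsonDocument.Parse(sqsBody);
            var root = doc.RootElement;
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("Type", out var type)
                && type.ValueKind == JsonValueKind.String
                && type.GetString() == "Notification"
                && root.TryGetProperty("TopicArn", out _)
                && root.TryGetProperty("Message", out var message)
                && message.ValueKind == JsonValueKind.String)
            {
                return message.GetString()!;
            }
        }
        catch (JsonException)
        {
            // Not JSON at all: treat it as a raw message body.
        }
        return sqsBody;
    }
}
```

The result of `Unwrap` is what a serializer (built-in or user-registered) would then turn into the .NET message type.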

@normj
Member

normj commented Oct 13, 2022

@iancooper I really don't want to compete with all of the great projects out there like Brighter. If anything, we should be helping those projects where they need AWS help fitting AWS services into those abstractions.

But as you say all those libraries are trying to treat all the message brokers the same. And that is fine for many users. In our conversations with .NET developers using AWS a large percentage of them don't need/want the generic abstraction because they are all in on AWS and want an easier way to use AWS services but not remove any of the capabilities of the service. That is where I see this library fitting in and that is what we are seeing developers implement themselves over and over again.

The handler code snippets were an easy, quick view of the experience, but like you said, I'm sure that is common across all of these libraries. I think what I should do is take another pass through the design doc to emphasize how you can still get access to the advanced AWS features, like handling message group IDs and deduplication IDs for FIFO, and how to pass additional SNS or SQS attributes.

@normj
Member

normj commented Oct 13, 2022

@lee-11

I really hope this isn't another "make the easy stuff easier" endeavor. There are difficult bits that others have mentioned. Implementing messaging systems that "fail well" rather than "fail poorly" is the real challenge.

I totally agree that focusing on failing well and fault tolerance is the critical part needed for this library and arguably the biggest amount of work to get right. That is where we could use as much feedback as possible on what people want to happen when things don't go as expected.

@normj
Member

normj commented Oct 13, 2022

@SamuelCox That is great feedback about looking into how CloudEvents fits into this library. Thanks!

@iancooper

@iancooper I really don't want to compete with all of the great projects out there like Brighter. If anything, we should be helping those projects where they need AWS help fitting AWS services into those abstractions.

I know, and a rich ecosystem of choices is good.

The handler code snippets were an easy, quick view of the experience, but like you said, I'm sure that is common across all of these libraries. I think what I should do is take another pass through the design doc to emphasize how you can still get access to the advanced AWS features, like handling message group IDs and deduplication IDs for FIFO, and how to pass additional SNS or SQS attributes.

Yeah, that is the sweet spot IMO

@dev-ayaz

Would it be possible to add a way to read/write message headers so that we can keep the messages relevant to the business logic and use the headers for anything that is not business logic?

You can use message attributes for this purpose

@birojnayak

We are doing something similar for web services which are still using SOAP (HTML Design and open source PR). This would enable any queue transport (SQS, Amazon MQ, RabbitMQ, MSMQ etc.) with any cloud provider; basically a more generic transport layer (architecture PR for reference). So I thought I'd share a few challenges we are solving, which may be useful for what is being built here:

  1. How to maintain ordering of messages, so that while the service is executing its logic, another message is not picked from the queue and processed if the queue is FIFO (considering all API/logic execution is async). How should we ensure this from both the producer side and the consumer side?
  2. How to provide extensibility points back to developers rather than taking the decision ourselves, so that they can opt for their own way of serving the notification (SNS, Lambda, CW, EventBridge etc.). Give them full message visibility and the status of business logic execution so that they can decide whether to put the message into a DL queue or not.
  3. How to provide extensions so that developers can supply their own encoding and decoding mechanism for what goes inside the queue, and our logic can honor that. What if they want to send a message in chunks?
  4. There could be one consumer with multiple producers. What can we do to keep the context of each producer and make it visible to the consumer for full isolation, and how do we handle the security context in those cases?

@iancooper

1. How to maintain ordering of messages, so that while the service is executing its logic, another message is not picked from the queue and processed if the queue is FIFO (considering all API/logic execution is async). How should we ensure this from both the producer side and the consumer side?

Brighter uses a single-threaded message pump (scale via competing consumers and more pumps), and does not recommend an async pipeline when order is important. Not de-ordering is a challenge, though, if you want to use ideas like an outbox. It can also be a challenge for consumers that simply offload a message to a thread from the thread pool (don't do that; it won't scale unless you can apply backpressure).
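
To make the single-threaded pump and backpressure ideas concrete, here is a small sketch built on `System.Threading.Channels`. It is not Brighter's implementation: `MessagePump<T>` is a hypothetical class, and the bounded channel stands in for whatever buffer sits between the receiver and the handler.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class MessagePump<T>
{
    private readonly Channel<T> _buffer;
    private readonly Func<T, Task> _handler;

    public MessagePump(int capacity, Func<T, Task> handler)
    {
        // A bounded channel makes the producer wait when the buffer is
        // full, which is the backpressure mentioned above.
        _buffer = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            FullMode = BoundedChannelFullMode.Wait
        });
        _handler = handler;
    }

    // Completes only once there is room in the buffer.
    public ValueTask EnqueueAsync(T message, CancellationToken ct = default) =>
        _buffer.Writer.WriteAsync(message, ct);

    // Signals that no more messages will be written.
    public void Complete() => _buffer.Writer.Complete();

    // One consumer loop, one message at a time: the next message is not
    // started until the previous handler call has finished, so order holds.
    public async Task RunAsync(CancellationToken ct = default)
    {
        await foreach (var message in _buffer.Reader.ReadAllAsync(ct))
        {
            await _handler(message);
        }
    }
}
```

Scaling out then means running more pumps (competing consumers) rather than handing each message to the thread pool.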

2. How to provide extensibility points back to developers rather than taking the decision ourselves, so that they can opt for their own way of serving the notification (SNS, Lambda, CW, EventBridge etc.). Give them full message visibility and the status of business logic execution so that they can decide whether to put the message into a DL queue or not.

We tend to offer specific exceptions that you can throw. Some folks don't like using exceptions for that, which would mean you need to use return code values.
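
A sketch of the exception-based style, under stated assumptions: the exception types and the `Outcome` enum below are hypothetical and are not Brighter's or the proposed framework's API. The resolver turns whatever the handler threw into a decision about the message, which is the same decision a return-code design would carry in its return value.

```csharp
using System;
using System.Threading.Tasks;

public enum Outcome { Delete, Retry, DeadLetter }

// Hypothetical exception types a handler can throw to signal intent.
public sealed class RetryLaterException : Exception { }
public sealed class RejectMessageException : Exception { }

public static class OutcomeResolver
{
    public static async Task<Outcome> RunAsync(Func<Task> handler)
    {
        try
        {
            await handler();
            return Outcome.Delete;      // success: remove from the queue
        }
        catch (RetryLaterException)
        {
            return Outcome.Retry;       // leave it to become visible again
        }
        catch (RejectMessageException)
        {
            return Outcome.DeadLetter;  // retrying will not help
        }
        catch (Exception)
        {
            return Outcome.Retry;       // unknown errors default to a retry
        }
    }
}
```

Defaulting unknown exceptions to a retry is a design choice made for this sketch; a real framework would probably make that policy configurable.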

3. How to provide extensions so that developers can supply their own encoding and decoding mechanism for what goes inside the queue, and our logic can honor that. What if they want to send a message in chunks?

Brighter (and I think JustSaying) offers you the ability to register a 'mapper' that maps between the wire message body and your internal types. Brighter treats that as a byte array and is agnostic to you wanting to use a given encoding (protobuf, Avro etc.). Within the AWS space, though, you might be more able to assume most folks will use JSON, as you are layering over an HTTP API in most contexts.
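
A minimal sketch of the mapper idea, assuming a hypothetical `IMessageMapper<T>` interface (this is not Brighter's or JustSaying's actual interface). The wire side is just bytes, so a JSON mapper is one implementation among several a user could register.

```csharp
using System;
using System.Text.Json;

// Hypothetical extension point: converts between the wire body (bytes)
// and an application type, so the encoding is the caller's choice.
public interface IMessageMapper<T> where T : class
{
    byte[] ToWire(T message);
    T FromWire(ReadOnlySpan<byte> body);
}

// Default JSON implementation; a protobuf or Avro mapper would implement
// the same interface with a different encoding.
public sealed class JsonMessageMapper<T> : IMessageMapper<T> where T : class
{
    public byte[] ToWire(T message) => JsonSerializer.SerializeToUtf8Bytes(message);

    public T FromWire(ReadOnlySpan<byte> body) =>
        JsonSerializer.Deserialize<T>(body)
        ?? throw new JsonException("Body deserialized to null.");
}

// Illustrative message type only.
public sealed record OrderInfo(string ProductId, string CustomerId);
```

The framework would only ever call `ToWire` before sending and `FromWire` after receiving, which keeps the choice of encoding out of the routing and lifecycle code.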

4. There could be one consumer with multiple producers. What can we do to keep the context of each producer and make it visible to the consumer for full isolation, and how do we handle the security context in those cases?

Normally you would need to use the headers for this I suspect, though I don't know if I understand exactly what you are asking.

@alexeyzimarev

MassTransit already supports RMQ (Amazon MQ) and SQS. Why not support an open-source project with nearly 38 million downloads on NuGet, and build additional transports (or Riders) for it instead of building something completely new?

@noahtrilling

noahtrilling commented Nov 15, 2022

I would also like to encourage this team to consider making contributions to MassTransit. It already has wide adoption in the .NET space. Our organization is running MassTransit over RabbitMQ currently. In our organization, I know we are much more likely to adopt a new transport on our existing messaging framework than we are to adopt a new messaging framework and be forced into a new transport. Adding MassTransit support for EventBridge would make me much more likely to encourage its adoption in my organization. Knowing that the AWS .NET developers are actively engaged in contributing and maintaining SNS and SQS features in MassTransit would make me significantly more likely to adopt those services as well.

The design document mentions some 'technical constraints' you'd prefer to avoid in using a third party library. Could you enumerate those? MassTransit's architecture has proven extremely flexible to the addition of new transports and Riders and I'm confident the MassTransit community would happily welcome your contributions.

I believe that many other organizations will feel as I do, that if AWS contributes to vendor-neutral frameworks, I'll be much more likely to adopt AWS transports, because AWS will have some skin in the game. If not, I'll stick with both vendor-neutral transports AND a vendor-neutral framework to avoid lock-in. I believe AWS and the .NET community will see substantially more return on much more limited investment by encouraging contribution to existing OSS messaging frameworks, particularly MassTransit.

@danielmarbach

For full transparency I want to mention I work for Particular Software the makers of NServiceBus. That being out of the way I want to share my personal opinion here that comes based on my history of contributing to several "SDKs" and abstractions including NServiceBus and MassTransit.

I believe the AWS team could make a much bigger impact with their limited resources by addressing two things in the current SDK:

  • Make the SDK more throughput- and performance-friendly by adopting more modern HTTP pipelines, similar to what the Azure SDK revival has done with Azure.Core
  • Provide small "higher level" components that abstract some of the underlying complexity of receiving messages, similar to the ServiceBusProcessor and ServiceBusSessionProcessor primitives.

For example, the Azure SDKs for Event Hubs and Service Bus provide "lower level" primitives that require you to manually manage messages/events, settlement, lock renewal and more. But if you don't want to worry about these things, you can opt in to higher-level primitives that, for example, allow setting the concurrency, auto-lock renewal and more, and you simply get called by the SDK when a message is available. In this mode, you don't have to worry about proper concurrency, cancellation token handling, behind-the-scenes renewal threads etc. All that is neatly packaged into a slightly higher-level abstraction.

Having something like that available helps developers out there to get started quicker with the services they want to use without interfering too much with already available offerings. In fact, based on my own experience, I can say having those primitives available in the Azure SDK makes things like receiving messages, batching and more implementations for the "abstraction" NServiceBus so much easier.
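
To illustrate the kind of "higher level" primitive described above, here is a rough sketch of a processor that owns the receive loop and caps concurrency. It is not the Azure `ServiceBusProcessor` and not an AWS SDK type: `QueueProcessor<T>` and its delegate-based constructor are hypothetical, and the `receive` delegate is assumed to long-poll so the loop does not spin.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class QueueProcessor<T>
{
    private readonly Func<CancellationToken, Task<IReadOnlyList<T>>> _receive;
    private readonly Func<T, CancellationToken, Task> _handler;
    private readonly SemaphoreSlim _slots;

    public QueueProcessor(
        Func<CancellationToken, Task<IReadOnlyList<T>>> receive,
        Func<T, CancellationToken, Task> handler,
        int maxConcurrentCalls)
    {
        _receive = receive;
        _handler = handler;
        _slots = new SemaphoreSlim(maxConcurrentCalls, maxConcurrentCalls);
    }

    // Receives batches until cancelled, then waits for in-flight handlers.
    public async Task RunAsync(CancellationToken ct)
    {
        var inFlight = new List<Task>();
        try
        {
            while (!ct.IsCancellationRequested)
            {
                var batch = await _receive(ct);
                foreach (var message in batch)
                {
                    // Blocks the loop once maxConcurrentCalls are running.
                    await _slots.WaitAsync(ct);
                    inFlight.Add(HandleOneAsync(message, ct));
                }
                inFlight.RemoveAll(task => task.IsCompleted);
            }
        }
        catch (OperationCanceledException)
        {
            // Shutdown requested: stop receiving, drain what is running.
        }
        await Task.WhenAll(inFlight);
    }

    private async Task HandleOneAsync(T message, CancellationToken ct)
    {
        try
        {
            await _handler(message, ct);
        }
        catch (Exception)
        {
            // A real processor would report this through an error callback.
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

The user supplies only a handler and a concurrency limit; the loop, throttling and shutdown handling live in one reusable place, which is the appeal of such a primitive for both application code and higher-level frameworks.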

@mgmccarthy

Would it be possible to add a way to read/write message headers so that we can keep the messages relevant to the business logic and use the headers for anything that is not business logic?

NServiceBus (a messaging framework for .NET) has the concept of headers (https://docs.particular.net/nservicebus/messaging/headers). Super useful for handling things like infra, cross-cutting concerns, correlation IDs, etc.

Having a construct wrapped over the top of MessageAttributes would be very useful in an AWS messaging framework.

@embano1
Member

embano1 commented Dec 6, 2022

NServiceBus (a messaging framework for .NET) has the concept of headers

Since CloudEvents was also mentioned in this thread (events as a sub-class of messages), just making a plug here for its transport bindings which make heavy use of protocol headers (e.g. HTTP, Kafka, RabbitMQ, etc.) and content-type hints to project metadata and keep business logic (domain objects/payload) free from these, i.e. HTTP body is payload only.

@ashovlin
Member Author

ashovlin commented Dec 9, 2022

Hello all, thank you for the feedback so far! We've just pushed a revision to the design document with:

  • Expanded the community library section at the top to clarify the niche we see this framework filling.
  • Multiple commenters suggested CloudEvents, which admittedly wasn't on our radar initially. The JSON Event Format for CloudEvents is very similar to the envelope schema we had originally proposed, so we're now exploring using the CloudEvents format.

@nazarenom

Hello, for one of our future projects, we're looking into SQS/SNS for backend interactions. It would be handy to deal with a higher-level framework, like the one proposed in this issue, rather than with the SDK's API.

Is there any news you could share or any ETA?

@normj
Member

normj commented Feb 13, 2023

@nazarenom Thanks for your interest. We don't have an ETA right now, but please let us know if you have any requests/requirements that are not covered in the design doc. We are just starting the initial code construction, so it's a great time to make sure we have the groundwork for future features.

@nazarenom

Thanks for the reply, @normj. We looked at the design documentation, which looks promising. For the vast majority of the things we plan to do, it's what we need.
Do you have any plans or ideas about supporting retry capabilities or outbox-like features similar to what the mentioned community frameworks offer?

Thanks again. Regards.

@mgmccarthy

Would really love to see a Saga implementation as part of this effort. NServiceBus, MassTransit, etc. all have very robust Saga implementations tied directly into the framework. While I'm aware of AWS Step Functions, they lie outside the bus architecture that is being proposed, and I'm not wild about either the visual designer or the JSON that represents the Step Function workflow. Other .NET-based service buses allow you to specify a Saga using code.

@jeroenbai

Always good to have options and to make AWS services easier to use.

The way I currently use SQS: an AWS Lambda function with API Gateway to receive webhooks (just a simple Node.js Lambda function), which then uses the webhook call headers/payload to publish messages to SQS using AWS.SQS / sendMessage. Then, a .NET / C# application reads messages off the queue and cleans up using AmazonSQSClient / ReceiveMessageRequest / DeleteMessageBatchRequest.

@normj
Member

normj commented Apr 19, 2023

We have made our repository where we are doing our development public. The work is still very much in progress.

https://github.com/awslabs/aws-dotnet-messaging

@Kralizek

Would it be possible to have an overview of the upcoming features? Maybe using GitHub Projects?

@normj
Member

normj commented Apr 21, 2023

@Kralizek We kind of have a crude version in the README file here: https://github.com/awslabs/aws-dotnet-messaging#project-status

We don't use GitHub projects because it doesn't work well with our sprint planning that covers many public and internal projects. We have tried it a few times and for all of the things our team owns it ends up getting out of date and forgotten.

For features you want us to think about adding I recommend opening a GitHub issue.
