because the wholesale destruction and minimization of knowledge, education, and information to appease (often arbitrary) intellectual-property protectionism is sad, regardless of who perpetrates it.
A non-Google example: What.cd was a site centered on music piracy, but that (potentially illegal) market spawned a huge number of original labels and recordings that still exist in the legal sphere today.
No one would defend What.cd's legal right to continue operating; it was obviously illegal. But the unique, novel, and creative works that came out of that illegal enterprise would be sad to destroy.
Swinging back to the Google example: YouTube systematically destroys creations that it feels (often wrongly) infringe upon IP. Often they don't actually infringe at all; Google routinely makes wrong decisions, erring on the side of its legal team.
This destruction of creative work is sad; in my opinion it's sadder than the unpermitted use of work.
Of course, Google as a corporation should act that way, but it's sad in certain human respects.
It's not just Google as a corporation; it's Google as a legal entity.
Run a site in your own individual name, with no corporate entity and no profit motive, offering to host people's videos for free, and I guarantee that within 24 hours you'll be dealing with everything from pedophilia to copyright violations and the like. And if you don't clear them out, you're the one held responsible.
Google is acting the way society has decided it should act, through the laws society voted for. Could Google act in another, more expensive way to save a bit more of the content that gets caught by mistake? Definitely, but why would it, as a company, when the law says any mistake or delay is its fault?
Source: like many people, I once made a free image hosting thingy. It was overrun by pedos within a week, to my absolute horror and shock. Copyright infringement is obviously not the same at all, BUT the way the law acts toward the host is not that different: "ensure there is none and be proactive in cleaning, or else...".
Your free image hosting thingy is an example of a low barrier to entry, both in cost and in anonymity. If you had made the cost trivial but traceable, I wonder what the outcome would have been. I wonder if a site like lobste.rs but for video would work better: a graph of who is posting what, and a graph of how they got onto the site in the first place.
If you vouch for someone who turns out to be dodgy, you are now also seen as a little dodgier than you were before. That doesn't necessarily mean you lose your account because you happened to vouch for someone, but it might mean your vouching counts for less in the future.
They aren't destroying anything; they're just not allowing the material on their site. Are you saying that anyone who creates a video hosting site must allow ANY content on it? I don't see any practical basis for that contention.
These are examples of YouTube following copyright laws imperfectly, which is basically guaranteed to happen on a regular basis at their scale. Definitely not what I would consider YouTube redefining copyright.
If channel A uploads a video copied from channel B, then makes a copyright claim against channel B, how does an automated system determine which owns the rights? Certainly it would seem in most cases that we should presume channel B has the copyright, since they uploaded first. But there is a very economically important class of videos where infringers will tend to be the first to upload (movies, TV shows, etc.). I don't really see how an automated system solves this problem without making any mistakes. Especially because the law (DMCA) puts the onus on the service provider to take down or face liability.
What impact does this really have, though? Are they making better VP9 tools available to other people? Browsers already have highly tuned playback engines, and YouTube actively combats efforts to make downloaders or other things which use their videos. Is there a path I'm missing where this has much of an impact on the rest of the Internet?
VP9 is meant to be a parallel to h264, and AV1 to h265?
VP9 running on custom circuits being equivalent speed to h264 running on custom circuits seems like a win for VP9? Since VP9 isn't royalty encumbered the way h264 is, that could well be a win for the rest of us too.
> Since VP9 isn't royalty encumbered the way h264 is, that could well be a win for the rest of us too.
I can only repeat myself: "Google creates dedicated custom proprietary processors which can process VP9 at roughly the same speed as a 20-year-old codec". How is this a win for anyone but Google (who is already eyeing to replace VP9 with AV1)?
"The rest of us" are very unlikely to run Google's custom chips. "The rest of us" are much more likely to run this in software, for which, to quote the comment I was originally replying to, "without dedicated processors, VP9's encoding is roughly 4.5x as slow as H.264".
Note: I'm not questioning the codec itself. I'm questioning the reasoning declaring this "a big win for the open format(s)".
You won't get adoption until the word gets around that Big Company X is using Format Y, and they supply content prominently in Format Y. That's when Chinese SoC manufacturers start taking things seriously, add hardware decode blocks to their designs, and adoption just spirals out from there.
> Google probably only provides stats about growth (like "500 hours of video are uploaded to YouTube every minute") because the total number of videos is so large, it's an unknowable amount.
I suppose you could sample random YouTube URLs to find out how many of them link to public videos. Given the total number of possible URLs, that would give you an idea of what percentage have been used, and therefore how many videos YouTube has in total. It wouldn't tell you how many private videos or Google Drive / Photos videos exist, of course.
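For what it's worth, a sketch of what that estimator could look like, assuming the 11-character, 62-symbol ID format discussed further down; is_public_video() is a hypothetical stub, since a real probe would have to fetch each URL and check the response:

    import random
    import string

    ALPHABET = string.ascii_letters + string.digits  # 62-char assumption
    ID_SPACE = 62 ** 11                              # ~5.2e19 possible IDs

    def is_public_video(video_id: str) -> bool:
        # Stub: really you'd fetch https://www.youtube.com/watch?v=<video_id>
        # and check whether it resolves to a public video.
        return False

    def estimate_total_videos(k: int) -> float:
        # Probe k random IDs; the hit rate, scaled up to the whole ID space,
        # estimates the total number of public videos.
        hits = sum(is_public_video("".join(random.choices(ALPHABET, k=11)))
                   for _ in range(k))
        return hits / k * ID_SPACE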
Youtube isn't the only platform where Google does video transcoding. I don't know them all, but here are a few other places where video plays a part:
Meet - I'm guessing participants on different devices (desktop, Android, iOS), and with different bandwidth, get different video feed quality. Though, given the real-time nature of this, it may not work as well? Meet does have a live stream feature for when your meeting is over 250 people, which gives you a YouTube-like player, so that is likely transcoded.
Duo - more video chat.
Photos - when you watch a video stored at Google Photos (or share it with someone), it will likely be transcoded.
Video ads. I'd guess these are all pre-processed for every platform type for efficient delivery. While these are mainly on YouTube, they show up on other platforms as well.
Nest cameras. This is a 24/7 stream of data to the cloud, and some people pay to have X days of video saved.
But knowing the exact number can indeed be hard; it would take stopping all uploading and deletion activity. They may well have counters for uploads and deletions on every node that handles them, but the notion of "the same instant" is tricky in distributed systems, so the exact number remains elusive.
I think the Chandy-Lamport snapshot algorithm tries to do something like this for distributed systems (in their model it captures some consistent snapshot, without letting you specify the "time"); not sure if it's actually useful IRL though.
I read "an unknowable amount" as "a meaninglessly large number to our monkey brains".
It's like knowing the distance to the Sun is 93 million miles. The difficulty there isn't that measuring the distance from the Sun to the Earth exactly is hard, although it is, or that the distance is constantly changing, although it is, or that the question is ill-defined, because the Earth is an object 8000 miles across and the Sun is 100 times bigger, and what points are you measuring between?
The distance is "unknowable" because while we know what "93 million miles" means, it's much harder to say we know what it "means". Even when we try to rephrase it to smaller numbers like "it's the distance you could walk in 90 human lifetimes" is still hard to really feel beyond "it's really really far."
Likewise, does it matter if YouTube has 100, 1000, or 10,000 millennia of video content? Does that number have any real meaning beyond back-of-the-envelope calculations of how much storage they're using? Or is "500 hours per minute" the most comprehensible number they can give?
Clicked into a couple of random videos; it looks like all of their video IDs are 11 characters, alphanumeric in both cases. So 26+26+10 = 62 choices for each char, and 62^11 = 5.2e+19 = 52 quintillion unique IDs (52 million trillions).
So, yeah, sampling would be a mostly futile effort, since you're looking to estimate something down at 8 to 10 decimal digits of precision. Though it's technically still possible, since you'd expect about 1 in every 50 million to 5 billion IDs to work (assuming somewhere between a trillion and 10 billion videos, respectively).
My statistics knowledge is rusty, but I guess if you sampled, say, 50 billion URLs you could actually make a very coarse estimate with a reasonable confidence level. That's a lot, but (ignoring rate limits) well within the range of usual web-scale stuff.
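Spelling that out under the thread's assumptions (62-character alphabet, 11-character IDs, somewhere between 1e9 and 1e12 videos):

    id_space = 62 ** 11                  # ~5.2e19 possible IDs
    for videos in (1e9, 1e10, 1e12):
        hit_rate = videos / id_space     # P(random ID is a real video)
        print(f"{videos:.0e} videos -> one hit per {1 / hit_rate:.1e} tries")
    # 50e9 samples would yield hit_rate * 50e9 hits: ~960 hits if there are
    # 1e12 videos, but only ~1 hit if there are 1e9 -- hence "very coarse".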
Of course. They are using some modulo arithmetic:
1. Start from the rightmost digit (i.e. the check digit)
2. Multiply every second digit by 2 (i.e. the digits at even positions, counting from the right)
3. If the result in step 2 is more than one digit, add its digits together (e.g. 12: 1+2 = 3)
4. Add the resulting digits to the digits at the odd positions
5. The number is valid if the total is divisible by 10
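Those steps are the Luhn algorithm; a minimal implementation, for illustration:

    def luhn_valid(number: str) -> bool:
        total = 0
        # Walk the digits from the rightmost (the check digit) leftwards.
        for i, ch in enumerate(reversed(number)):
            d = int(ch)
            if i % 2 == 1:     # every second digit, counting from the right
                d *= 2
                if d > 9:      # two-digit result: add its digits (12 -> 1+2 = 3)
                    d -= 9
            total += d
        return total % 10 == 0

    print(luhn_valid("79927398713"))  # True -- a classic Luhn test number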
> Given everyone and their grandma is pushing 128-bit UUID for distributed entity PK, it's interesting to see YouTube keep it short and sweet.
The trade-off you make when using short IDs is that you can't generate them at random. With 128-bit IDs you can't realistically have collisions, but with 64-bit ones, because of the birthday paradox, as soon as you have more than ~2^32 elements you're quite likely to have collisions.
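A quick check of that claim with the standard birthday-bound approximation, P(collision) ≈ 1 - e^(-n²/2N) for n random draws from N possibilities:

    import math

    def p_collision(n: int, bits: int) -> float:
        # Probability of at least one collision among n random `bits`-bit IDs.
        return 1 - math.exp(-n * n / (2 * 2 ** bits))

    print(p_collision(2 ** 32, 64))   # ~0.39: collisions likely at ~4e9 IDs
    print(p_collision(2 ** 32, 128))  # ~0.0: no realistic risk at 128 bits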
Youtube video IDs used to be just the base64 of a 3DES-encrypted MySQL primary key, a sequential 64-bit int, so collisions were of zero concern there. By the birthday-paradox argument it's about as good as a 128-bit UUID generated without a centralized component like a database's row counter, where you do have to care about collisions.
However, theft of the encryption key is a concern, since you can't rotate it and it just sat there in the code. Nowadays they do something a bit smarter, to ensure ex-employees can't enumerate all unlisted videos.
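A minimal sketch of that legacy scheme as described, using pycryptodome for the 3DES part; the key is a throwaway demo value and the function names are made up, but the shape is right: one 64-bit block in, one out, which base64url-encodes to exactly 11 characters.

    import base64
    import struct
    from Crypto.Cipher import DES3  # pip install pycryptodome

    KEY = DES3.adjust_key_parity(b"0123456789abcdef01234567")  # demo key only

    def video_id(row_id: int) -> str:
        # Sequential 64-bit primary key -> random-looking 11-char ID.
        block = DES3.new(KEY, DES3.MODE_ECB).encrypt(struct.pack(">Q", row_id))
        return base64.urlsafe_b64encode(block).rstrip(b"=").decode()

    def row_id(vid: str) -> int:
        # Anyone holding the key can invert it -- hence the enumeration worry.
        block = base64.urlsafe_b64decode(vid + "=")
        return struct.unpack(">Q", DES3.new(KEY, DES3.MODE_ECB).decrypt(block))[0]

    print(video_id(1), video_id(2))  # two distinct 11-char IDs, by construction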
With random database keys, I would think they can just be generated at random by any frontend server running anywhere. Ultimately, a request to insert that key comes to the database, which is the centralized gatekeeper in this design and can accept or reject it. And with replication, sharding, and caching, even SQL databases scale extremely well; just avoid expensive operations like joins.
The reason why we want ids to be purely random is so we don't have to do the work of coordinating distributed id generation. But if you don't mind coordinating, then none of this matters.
Surely if it was a great chore for YouTube to have random-looking int64 ids, they would switch to int128. But they haven't.
I'm a big fan of the "works 99.99999999% of the time" mentality, but if anything happens to your PRNGs, you risk countless collisions slipping past you in production before you realize what happened. It's good to design your identity system in a way that would catch that, regardless of how unlikely it seems in the abstract.
The concept of hierarchical IDs is undervalued. You can have one machine hand out "namespaces" to others, and they can then generate IDs locally and check for collisions locally in a very basic way.
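A toy version of the idea (all names hypothetical): a central allocator hands each node a prefix once, after which the node mints IDs locally with no further coordination.

    import itertools

    class Node:
        def __init__(self, prefix: str):
            self.prefix = prefix              # handed out once, centrally
            self.counter = itertools.count()  # purely local afterwards

        def new_id(self) -> str:
            return f"{self.prefix}-{next(self.counter)}"  # unique by construction

    a, b = Node("n01"), Node("n02")
    print(a.new_id(), a.new_id(), b.new_id())  # n01-0 n01-1 n02-0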
> but if anything happens to your PRNGs, you risk countless collisions slipping past you in production before you realize what happened.
UUID generation basically has to use a CSPRNG to avoid collisions (or at least a very large-state insecure PRNG).
Because of the low volume, simply using /dev/urandom on each node makes the most sense. If /dev/urandom is broken, so is your TLS stack and a host of other security-critical things; at that point, worrying about video ID collisions seems silly.
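That approach is a few lines per node - e.g., mirroring the 11-character format discussed above:

    import base64
    import os

    def random_id() -> str:
        # os.urandom reads the kernel CSPRNG (/dev/urandom on Linux).
        return base64.urlsafe_b64encode(os.urandom(8)).rstrip(b"=").decode()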
Thanks for doing the maths - it does seem the sampling method would not be feasible. Taking the statistic of "500 hours uploaded per minute" and assuming an average video length of 10 minutes, we get about 1.5bn videos uploaded to YouTube every year, or 15bn every 10 years. So it seems likely that YouTube has far fewer than 1tn videos in total.
If there are N possible IDs and M videos on YouTube, then P(ID used) = M/N if the ID is drawn from a uniform distribution, and P(at least one of K sampled IDs is used) = 1 - (1 - M/N)^K (sampling with replacement).
If M ≈ 1e9 and N ≈ 1e18, and you sample K = 1000 URLs, the probability of hitting a used ID is about 1e-6, i.e. one in a million.
IDs are 64-bit integers. The number of tries before an event with probability P occurs follows a geometric distribution, so if V is the number of valid IDs (ones that have a video), the expected number of tries is 2^64 ÷ V. Assuming 1 megatry per second, since we can parallelize, we would find the first video in about 20 seconds on average, with a generous estimate of V = 10^12 (a trillion videos).
To have a sample of ~100 videos, it’d take about half an hour.
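Spelling that arithmetic out (same assumptions: a 2^64 ID space, V valid videos, a million tries per second):

    ID_SPACE = 2 ** 64
    TRIES_PER_SECOND = 1e6
    for V in (1e12, 1e11, 1e10):
        expected_tries = ID_SPACE / V  # mean of the geometric distribution
        print(f"V = {V:.0e}: ~{expected_tries / TRIES_PER_SECOND:.0f} s per video")
    # V = 1e12 -> ~18 s per hit, so ~100 hits takes roughly half an hour.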
Formal math & probability is something I'm trained in. That being said, intuitively this sounds off to me...
The YouTube ID pool is closer to 7.4e19, not 1.8e19 (64 possible characters per position, including - and _, so 64^11). I'm not a math expert and my probability is quite weak, but if you assume generously that 1 trillion IDs have been taken, the fraction of IDs in use is 1e12/7.4e19 - about 0.00000135% of available IDs.
Getting a single video at 1 megatry per second would then take something more like 74 seconds, not 20 seconds.
I've worked on an IPTV broadcasting system, and this isn't as obvious as you'd think.
The big issue here is quality - most hardware encoders hugely lag behind software encoders. By quality I mean visual quality per unit of bitrate. A quality software encoder like x264 will save you a massive amount of money in bandwidth costs, because you can go significantly lower in bitrate than you can with a hardware encoding block.
At the time, our comparisons showed that you could get away with as little as 1.2 Mbit/s for a 720p stream, where with an enterprise HW encoder you'd have to spend about 2-3 Mbit/s for the same picture quality.
That's one consideration. The other is density - at the time, most hardware encoders could do up to about 4 streams per 1U rack unit, and those units cost about half the price of a fully loaded 24+ core server. Even GPUs like NVIDIA's could at the time do at most 2 encoding sessions with any kind of performance. On CPU, we could encode 720p on about 2 Xeon cores, which meant a fully loaded box with 36+ cores could easily do 15-20 sessions of SD and HD, and we could scale the load as necessary.
And the last consideration was price - all HW encoders were significantly more expensive than buying large-core-count rack-mount servers. Funnily enough, many of those "HW specialised encoding" boxes were running x86 cores internally too, so they weren't even more power efficient.
So in the end the calculation was simple - software encoding saved a ton of money on bandwidth, it allowed a better quality product because we could deliver high quality video to people with poor internet connectivity, it made procuring hardware simple, and it made the solution more scalable, all at the cost of some power consumption. Easy trade. Of course, the calculation is a bit different with modern formats like VP9 and H.265/HEVC - their software encoders are still very CPU intensive, so it might make sense to buy cards these days.
Of course, we weren't Google and couldn't design and manufacture our own hardware. But looking at the list of codecs YouTube uses, there's one more consideration: flexibility. HW encoding blocks are usually very limited in what they can do - most will do H.264, some will stretch to H.265 and maaaaaybe VP9. CPUs will encode into everything. And when a new format is needed, you just deploy new software, not a whole new chip.
My work is in IP cameras so I'm aware of these tradeoffs.
I guess what I didn't expect is that Google could design its own encoder IP that beats the current offerings by a big factor at general video coding. I had guessed that Google built an ASIC around customised IP from some other vendor.
Very interesting description. Are you familiar at all with the details of FPGAs for these very same tasks, especially the EV family of Xilinx Zynq Ultrascale+ MPSoC? They include hardened video codec units, but I don't know how they compare quality/performance-wise. Thanks!
I'm afraid I don't have any experience with those devices. Most HW encoders however struggle with one thing - the fact that encoding is very costly when it comes to memory bandwidth.
The most important performance/quality-related process in encoding is motion search: the encoder takes each block (piece) of the previous frame and scans the current frame to see whether it still exists and where it moved. The larger the area the codec scans, the more likely it is to find where that piece of the image moved to. That lets it write just a motion vector instead of actually encoding image data.
This process is hugely memory-bandwidth intensive, and most HW encoders severely limit the area each thread can access to keep memory bandwidth costs down and performance up. This is also a fundamental limitation for CUDA/GPGPU encoders, where you likewise face a huge performance loss if each thread touches too much memory.
Most "realtime" encoders severely limit the macroblock scan area because of how expensive it is - which also makes them significantly less efficient. I don't see FPGAs really solving this issue - I'd bet more on Intel/NVIDIA encoding blocks paired with copious amounts of onboard memory. I hear NVIDIA's Ampere encoding blocks are good (although they can only handle a few streams).
NVIDIA only relatively recently implemented B-frames in NVENC, and if I'm not mistaken, AMD still doesn't have this capability. I'm not deeply versed in this space, but if memory bandwidth is such a huge bottleneck, how does the CPU do it so efficiently by comparison? Surely the GPU wins in this area? Or is it just designed that way so that consumer cards can offer realtime speeds? I'm not sure why this couldn't be configurable in some way.
That is interesting context for this quote from the article:
> "each encoder core can encode 2160p in realtime, up to 60 FPS (frames per second) using three reference frames."
Apparently reference frames are the frames that a codec scans for similarity in the next frame to be encoded. If it really is that expensive to reference a single frame then it puts into perspective how effective this VPU hardware must be to be able to do 3 reference frames of 4K at 60 fps.
> I always thought of reference frames as like the sampling rate, so in that sense, is it how few reference frames can it get away with, without being noticeable?
Actually not quite - "reference frames" means how far back (or forward!) the encoded frame can reference other frames. In plain words, "max reference frames 3" means that frame 5 in a stream can say "here goes block 3 of frame 2" but isn't allowed to say "here goes block 3 of frame 1" because that's out of range.
This has obvious consequences for decoders: they need enough memory to keep that many decoded, uncompressed frames around on the chance that a future frame references them. It also has consequences for encoders: while they don't have to reference frames far back, it increases efficiency if they can reuse the same stored block of image data across as many frames as possible. That in turn means they need to scan more frames for each processed input frame, to find as much reusable data as possible.
You can easily get away with one reference frame (MPEG-2 has this limit, for example), but then you encode the same data multiple times, lowering overall efficiency and leaving less space to store detail.
> Would that also depend on the content?
It does depend on the content - in my testing it works best for animated content, because the visuals stay static for a long time, so referencing data from half a second ago makes a lot of sense. It doesn't add much for content with lots of scene cuts and action, like a combat scene in a Michael Bay movie.
I am by no means an expert, and this is by no means indicative of a video compression FPGA, but I've been looking at GZIP and PNG accelerators, and it seems that while they deliver incredible speed, they do so at the worst compression ratio that still fits the spec - equivalent to GZIP level 2, or maybe to a "superfast" or "ultrafast" video preset. (These are lossless algorithms, granted.) Still, it may not make sense to use them if your application is bandwidth sensitive: if 4K Netflix doubled its bitrate by switching to FPGA solutions, that would probably be too high a cost even for a 20x speedup - at least until high-quality internet connections become more universal.
At least in Google's case, YouTube videos are usually transcoded in idle datacenters (for example, locations where the locals are sleeping), which means the cost of CPU is much lower than a naive estimate. These new accelerators can only be used for transcoding video; the rest of the time they will sit idle (or you keep them loaded and the regular servers sit idle instead). This means the economics aren't necessarily an obvious win.
Of course if you do enough transcoding that you are buying servers for the job then these start to save money. So I guess someone finally decided that the R&D would likely pay off due to the current combination of cyclical traffic, adjustable load and the cost savings of the accelerator.
The complement of "building its own video-transcoding chips" isn't just software encoding, though. Google/YouTube could already have been doing hardware encoding with generic GPUs or other existing hardware.
Intel has had PCIe cards targeted at this market, reusing their own HW encoder, e.g. the VCA2 could do up to 14 real-time 4K transcodes at under 240W, and the upcoming Xe cards would support VP9 encode. (XG310 is similar albeit more targeted at cloud gaming servers)
The specialized hardware in GPUs is targeted at encoding content on the fly. You could use it to encode a video for later playback, but it has drawbacks for size and quality: H.264 only, limited keyframe control, static frame allocations, no multipass encoding, etc. This is why video production software that supports GPU encoding usually marks this option as "create a preview, fast!" - it's fast, but that's it. If you want a good quality/size ratio you would use something like VP9, and because of the codec's internals and the missing specialized hardware, that is currently very slow. Add multipass encoding, something like 4K at 60 frames, and adaptive bitrates, and suddenly encoding one second takes over two minutes... the result is the need for specialized hardware.
They had to get special permission from the US government to export TPUs for use in their datacenters abroad. The TPUs fell under ITAR regulations (like many machine-learning chips). The US government granted permission but imposed restrictions like "they must always be supervised by an American citizen", which I imagine leads to some very well-paid foreign security guard positions for someone with the correct passport...
Read all that on some random government document portal, but can't seem to find it now...
I think they do more general-purpose things like downsampling, copyright detection, etc., which don't have globally available custom ASICs. And I think GPUs don't do encoding/decoding on the general-purpose cores themselves; they have separate ASIC blocks built in which handle the standardised encodings.
In my past experience working with FPGA designers, I was always told that any C-to-H(ardware) tooling was quicker to develop with, but had significant performance implications for the resulting design: it would consume many more gates and run significantly slower. But if you have a huge project to undertake and your video codec is only likely to be useful for a few years, you need an improvement (any improvement!) as quickly as possible, so the tradeoff was likely worth it for Google.
Or possibly the C-to-H tooling has gotten significantly better recently? Anyone aware of what the state of the art is now with this to shed some light on it?
It has not, and the type of design they show in the paper has a lot of room to improve (FIFOs everywhere, inefficient blocks, etc). However, video transcoding is suited to that approach since the operations you do are so wide that you can't avoid a speedup compared to software.
Imagine someone using a 10-year-old computer to upload a 1-hour video: they would need to transcode not only to multiple resolutions but also to multiple codecs. That would not be practical from a business/client standpoint. They want their client (the uploader) to spend as little time as possible and get their videos up as quickly as possible.
That being said, it would be great to be able to say "hey Google, I'll do the conversions for you!" - but then they would have to trust that the bitrate isn't too high, isn't going to crash their servers, etc. etc. etc.
You can't trust user input; you'd have to spend quite a bit of energy just checking whether it's good. You also want to transcode to multiple resolutions, and that would end up being quite slow if done in JS.
"Must" is a strong word. In theory, browsers and other clients treat all video streams as untrusted, and it is safe to watch an arbitrary video. In practice, complex formats like video are a huge attack surface.
So yes, for bigger names like Google this is an unacceptable risk. They will generally avoid serving any user-generated complex format - video, images, audio - to users directly. Everything is transcoded, to reduce the likelihood that an exploit gets through.
YouTube also needs to re-encode occasionally (new codecs/settings/platforms), it would be easy to abuse by sending too-high-bitrate or otherwise wrong content, and a lot of end-user devices simply aren't powerful enough to complete the task in a reasonable amount of time.
Because of the massive bandwidth and data requirements. Assuming my source content is 20 Mbit/s and 30 minutes long, that's about 4.5 GB of data.
Given your average DSL uplink of 5 Mbit/s, that's 2 hours of uploading for the master version... and if I had to upload a dozen smaller versions myself, that could easily add five times the data and upload time.
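The arithmetic, spelled out:

    bitrate_mbit_s = 20                               # source bitrate
    duration_s = 30 * 60                              # 30 minutes
    size_gb = bitrate_mbit_s * duration_s / 8 / 1000  # Mbit -> MB -> GB
    print(size_gb)                                    # 4.5 GB for the master

    uplink_mbit_s = 5
    print(size_gb * 8 * 1000 / uplink_mbit_s / 3600)  # ~2 hours to upload it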
Because society incentivizes companies to babysit users rather than cut off those who are unable to learn simple technical skills, like optimally encoding a video that respects a maximum bitrate requirement.
> simple technical skills, like optimally encoding a video that respects a maximum bitrate requirement
This is in no way a "simple skill": maximum video bitrate is only one of a number of factors in encoding video. For streaming to end users there are questions of codecs, codec profiles, entropy coding options, GOP sizes, frame rates, and frame sizes. The same applies to audio, replacing frame rates and sizes with sample rate and number of channels.
Streaming to ten random devices will require different combinations of any or all of those settings. There's no one single optimum setting. YouTube encodes dozens of combinations of audio and video streams from a single source file.
What about cutting off those who condescend to others without recognizing the limits of their own understanding?
I’m not an expert in this but I know that “optimally encoding a video” is an actual job. That’s because there’s no global definition of optimal (it varies depending on the source material and target devices, not to mention the costs of your compute, bandwidth, and time); you’re doing it multiple times using different codecs, resolutions, bandwidth targets, etc.; and those change regularly so you need to periodically reprocess without asking people to come back years later to upload the iPhone 13 optimized version.
This brings us to a second important concept: YouTube is a business which pays for bandwidth. Their definition of optimal is not the same as yours (every pixel of my masterpiece must be shown exactly as I see it!) and they have a keen interest in managing that over time even if you don’t care very much because an old video isn’t bringing you much (or any) revenue. They have the resources to heavily optimize that process but very few of their content creators do.
How long before the ads are realtime encoded into the video streams such that even youtube-dl can't bypass them without a premium login?
I've been surprised this wasn't already the case, but I assumed it was just an encoding-overhead issue versus serving pre-encoded video for both the content and the ads, with necessarily well-defined stream boundaries separating them.
Right - using DAI means you don't have to actually touch the original video (good!), but it doesn't stop a smart enough client (youtube-dl) from pattern matching and ignoring those segments when stitching the final video together.
I am not, however, suggesting that encoding ads into the final stream is appropriate or scalable!
The client doesn't even have to know that there is an ad playing, if they really want to thwart ad blockers. As for pattern matching the actual video stream: ad blockers could do that today and just seek forwards, but none do yet.
The ads are not part of the encoded video AFAICT; they are probably served as a separate stream which the client requests alongside the regular video stream. This means that videos and ads can be cached using traditional techniques.
I'm kinda surprised Google doesn't do this... They would need to keep track of user seeks and such, but it still seems doable. One simple model is for the server to know when ad breaks should happen, and prevent any further downloading for the duration of the ad.
Sure, it would break people who want to watch at 2x realtime, but they seem small-fry compared to those with adblockers.
The issue there is scale. MPEG-DASH/HLS let the edge servers for video be simple: they don't need to do much more than serve up bytes via HTTP. This ends up being better for clients too, especially mobile clients, since they can choose streams based on local conditions the server couldn't know about, like downgrading from LTE to UMTS.
There's no shared wall clock between the server and client with HTTP-based streaming. There's also no guarantee the client's stream will play continuously or even hit the same edge server for two individual segments. That's state an edge server needs to maintain and even share between nodes. It would be different for every client and every stream served from that node.
For streaming you actually want the client to have a buffer past the play head. If the client can buffer the whole stream it makes sense to let them in many cases. The client buffers the whole stream and then leaves your infrastructure alone even if they skip around or pause the content for a long time. The only limits that really make sense are individual connection bandwidth limits and overall connection limits.
The whole point of HTTP-based streaming is to minimize the amount of work required on the server and push more capability to the client; it's meant to allow servers to be dumb and stateless. The more state you add, even if it's negligible per client, the more it adds up in aggregate. If such a system meant edge servers could handle 1% less traffic, server costs increase by 1%. Unless the ad impressions skipped by youtube-dl users come anywhere close to 1% of ad revenue, it's pointless for Google to bother.
As long as it's possible to skip baked-in ads (SponsorBlock), the whack-a-mole will continue no matter what. Even if it means you can't watch videos in realtime but have to wait for them to fully download to rip the ads out, someone will figure it out.
At that point everyone starts talking about it, and I gotta imagine a bunch of new people become adblocking users.
SponsorBlock only works because the sponsored segments are at the same location for every viewer. If Youtube spliced in their own ads, they could easily do it at variable intervals, preventing any crowdsourced database of ad segment timestamps. To be honest, nothing really stops Youtube from just turning on Widevine encryption for all videos (not just purchased/rented TV & movies) besides breaking compatibility with old devices. Sure, Widevine can be circumvented, but most of the best working cracks are not public.
Things like youtube run on super-thin margins. Bandwidth and storage costs are massive, compute costs quite big, and ad revenue really quite low.
A competitor would need either a different model to keep costs low (limiting video length/quality, the Vimeo model of making creators pay, or the Netflix-like model of a very limited library), or very deep pockets to run at a loss until they reach YouTube scale.
I'm still mystified how TikTok apparently manages to turn a profit. I have a feeling they are using the "deep pockets" approach, although the short video format might also bring in more ad revenue per hour of video stored/transcoded/served.
To be honest, I suspect it isn't actually a differentiator. It's good for Google that they can produce this chip and trim their hardware costs by some percentage, but it's not going to give them a competitive advantage in the market for video sharing. Especially in a business like YouTube, with its network effects, getting the audience is the difficult bit; the technical solutions are interesting, but you're not going to beat Google by having 5% cheaper encoding costs.
What is Floatplane? Never heard of it. Seemingly a YouTube competitor by a somewhat popular YouTuber; the app on Android has "10k+" installs. Isn't it _way_ too early to say whether it would be a money-losing business?
Floatplane is a video service built by the people behind the popular Youtube channel LinusTechTips. It is not a direct competitor to Youtube though. The platform makes it easier to let paying fans get videos earlier but it is not meant to build an audience.
Perhaps. But the big issues for YouTube right now aren't efficiency per se, but copyright, monetization, AI tagging, and social clout. If a YouTube competitor can get the content creators and offer them viewers, competition could perhaps work. This fight probably won't be fought at the margins of hardware optimization.
Wow, a 33x throughput improvement for VP9 at the same hardware cost. That seems excessive, but their benchmark uses ffmpeg. Is ffmpeg known to have the theoretically highest-throughput, state-of-the-art VP9 encoder? And is there any way of knowing whether their hardware IP block is structured equivalently to the ffmpeg software algorithm? I know that custom hardware will always beat general hardware, but 33x is a very large improvement. Contemporary core counts coupled with very wide SIMD make CPUs functionally similar to ASICs/FPGAs in many cases.
BTW: multiple CPU cores are not parallel in the sense that FPGAs or ASICs (or even GPUs) are.
Multiple cores work like multiple machines, whereas truly parallel units work in lockstep at lower clock speeds (energy consumption grows roughly quadratically with clock speed). They can share everything and carry only the electronics that actually do the job.
Well, transistors are cheap, and synchronization is not a bottleneck for embarrassingly parallel video encoding jobs like these. Contemporary CPUs already downclock when they can, to save power and limit heat.
>> Contemporary core counts coupled with very wide SIMD make CPUs functionally similar to ASICs/FPGAs in many cases.
> I don't think so. For things that have a way to be solved in parallel, you can get at least a 100x advantage easily.
That's kind of my point. CPUs are incredibly parallel now in their interface. Let's say you have 32 cores and use 256-bit SIMD for 4 64-bit ops: that would give you a ~128x improvement compared to doing all those ops serially. It's just a matter of writing your program to exploit the available parallelism.
There's also implicit ILP going on as well, but I think explicitly using SIMD usually keeps the execution ports filled.
Those thousands of cores are all much simpler, do not have SIMD of their own, and carry a huge penalty for branching. There are problems for which GPUs and CPUs are roughly equally well suited; GPUs have their cons.
I wonder how YouTube's power consumption for transcoding the most useless / harmful videos compares to Bitcoin's power consumption. Maybe every video should be included in the calculation, since Bitcoin also has its positive aspects.
I've never heard about how much power YouTube's transcoding is consuming, but transcoding has always been one of those very CPU-intensive tasks (hence it was one of the first tasks to be moved over to the GPU).
It's not just cost reduction or efficiency: the faster encodes you get from dedicated hardware mean they can potentially reduce the delay between a video being uploaded and it being available to the public. (Right now, even if you don't spend time waiting in the processing queue, it takes a while for your videos to get encoded.)
You can handle larger volumes of incoming video by spinning up more encoder machines, but the only solution for lowering latency is faster encodes, and with the way the CPU and GPU markets are these days a dedicated encoder chip is probably your best bet.
I'm wondering if this is related to the recent Roku argument; perhaps YouTube is trying to force Roku to incorporate a hardware decoding chip (maybe with an increased cost) in future products as a condition to stay on the platform.
I don't think YouTube cares whether you use hardware or software decoding. I also don't think they care if you use their hardware decoder or someone else's. The issue with Roku is that they don't want to include any extra hardware to support VP9, and they use such cheap/low-spec hardware that they can't reliably decode it in software.
For VP9/x264, almost certainly not. If you jump on a newly-uploaded video, you'll see that higher resolution comes later. It's common to see 720p nearly immediately, then 1080p, then 4K.
For pre-x264, they probably could, but between the relatively small sizes required for the low resolution those codecs would be supporting, and the cost difference between compute and storage, I'd bet everything is encoded beforehand.