NVFP4: What it is and how it outperforms FP8 and BF16 in AI

Last updated: October 8, 2025
Author: Isaka
  • NVFP4 combines E2M1 with two-level scaling (an FP8 scale per microblock plus an FP32 scale per tensor) to cut quantization error by a reported 88%.
  • On Blackwell, FP4 reaches up to 20 PFLOPS per GPU and about 3x better performance than FP8 in real-world scenarios, with minimal accuracy loss.
  • Memory drops by up to 8x, energy per token falls by up to 50x, and inference costs fall by roughly 90%.
  • The ecosystem already supports FP4 (TensorRT, vLLM, HF), alongside infrastructure advances such as NVLink 5, liquid cooling, and 120 kW racks.

The NVFP4 format and AI precision

The conversation around precision formats in AI has accelerated with the arrival of NVFP4, and for good reason: cutting bits without losing quality radically changes the economics of inference. In this guide, you will learn what NVFP4 is, how it differs from FP8 and BF16, and why companies large (and not so large) are already adopting it, from data centers to desktop PCs.

Beyond the marketing hype, there is solid data: energy per token cut by a factor of 50, record-breaking tokens-per-second throughput, and memory reduced to a fraction without wrecking accuracy. Even so, it is worth separating headlines from practical reality, because the impact depends on the hardware, the numerical scaling, and how each model is calibrated and deployed.

What is NVFP4 and how does it improve on FP8 and BF16?

NVFP4 is NVIDIA's proposal for ultra-low precision designed for AI inference. It represents numbers in E2M1 (1 sign bit, 2 exponent bits, and 1 mantissa bit) and adds a key ingredient: two-level scaling that sharply reduces quantization error compared with simpler schemes.
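To make the E2M1 layout concrete, the sketch below decodes all 16 bit patterns of a 4-bit value. It assumes an exponent bias of 1 and subnormal handling as in the OCP Microscaling FP4 (E2M1) definition, which this article does not spell out; `decode_e2m1` is a hypothetical helper name used only for illustration.

```python
def decode_e2m1(code):
    """Decode a 4-bit E2M1 pattern (0..15) into a Python float.

    Assumed layout: bit 3 = sign, bits 2..1 = exponent (bias 1), bit 0 = mantissa.
    """
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: only 0 and 0.5 are representable.
        magnitude = mantissa * 0.5
    else:
        # Normal range: implicit leading 1, one fractional mantissa bit.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


# The eight non-negative magnitudes, in code order.
positive_values = [decode_e2m1(code) for code in range(8)]
print(positive_values)
```

With only eight magnitudes and a largest value of 6, the format cannot cover a realistic tensor range by itself, which is why the scale factors carry so much of the work.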

This two-level scheme combines an FP8 E4M3 scale factor applied to microblocks of 16 values with a global per-tensor scale in FP32. Thanks to this combination, an error up to 88% lower is reported compared with more basic power-of-two scaling schemes such as MXFP4, which improves numerical robustness with very few bits.
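The two-level idea can be sketched in a few lines of plain Python. This is a simplified illustration, not NVIDIA's implementation: the per-block scale is kept as an ordinary float instead of being rounded to E4M3, the per-tensor scale is chosen so that the largest block scale lands at 448 (the E4M3 maximum), rounding ties go to the smaller magnitude, and all function names are hypothetical.

```python
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
BLOCK = 16         # microblock size described in the text


def nearest_e2m1(x):
    """Round x to the closest E2M1 value (ties go to the smaller magnitude)."""
    magnitude = min(E2M1_VALUES, key=lambda v: abs(v - abs(x)))
    return -magnitude if x < 0 else magnitude


def quantize_two_level(values):
    """Return (tensor_scale, [(block_scale, codes), ...]) for a flat list."""
    amax = max((abs(v) for v in values), default=0.0)
    # Level 2: one scale per tensor (FP32 in the real format).
    tensor_scale = amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0
    blocks = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        # Level 1: one scale per 16-value block (FP8 E4M3 in the real format).
        block_scale = (block_amax / E2M1_MAX) / tensor_scale if block_amax > 0 else 1.0
        step = block_scale * tensor_scale
        blocks.append((block_scale, [nearest_e2m1(v / step) for v in block]))
    return tensor_scale, blocks


def dequantize_two_level(tensor_scale, blocks):
    """Rebuild approximate values from codes and both scale levels."""
    out = []
    for block_scale, codes in blocks:
        out.extend(code * block_scale * tensor_scale for code in codes)
    return out


values = [0.1 * i for i in range(-16, 16)]   # two blocks of 16 values
tensor_scale, blocks = quantize_two_level(values)
restored = dequantize_two_level(tensor_scale, blocks)
print(max(abs(a - b) for a, b in zip(values, restored)))
```

Because each block gets its own scale, a block with small values is not forced onto the coarse grid dictated by the largest value in the tensor.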

By contrast, FP8 (E4M3 or E5M2) already lowers costs compared with FP16/BF16, but NVFP4 goes further in reducing memory and energy. BF16 keeps the same dynamic range as FP32 with fewer mantissa bits, which makes it well suited to training and to settings where gradient stability is key, but for large-scale inference, well-calibrated 4-bit makes the difference.

The practical result: on well-tuned workloads, NVFP4 keeps accuracy very close to higher-precision formats while delivering a notable jump in speed and efficiency. Everything depends on calibration, scaling, and native hardware support.

Blackwell Architecture: The Muscle Behind NVFP4

Blackwell's arrival was the catalyst for NVFP4's takeoff. The B200 GPU packs 208 billion transistors into a dual-die design, joined by a 10 TB/s NV-HBI interface that is transparent to software, so the two dies behave as a single GPU.

The fifth-generation Tensor Cores natively support NVFP4 with hardware-accelerated scaling, reaching up to 20 PetaFLOPS in FP4. The architecture also places tensor memory close to the compute units (TMEM), limiting the energy cost of data movement and raising sustained performance.

On the consumer side, the GeForce RTX 50 series inherits FP4 capability with AI performance of up to 4,000 TOPS and accelerates image generation (e.g., FLUX) by up to 3.9x compared with FP8 in certain scenarios, showing that 4-bit inference is not just a data center affair.

At the high end, Blackwell Ultra (B300/GB300) raises the bar with 288 GB of HBM3E and 1.5x more performance than the B200, with NVL72 configurations reaching about 1.1 exaFLOPS per system in dense FP4. This lays the groundwork for serving models with hundreds of billions of parameters on fewer machines.

Metrics: More tokens, fewer watts, and memory under control

Production and benchmark data paint a consistent picture. On DeepSeek-R1 671B, moving to FP4 triples B200 throughput compared with FP8 on H200, with DGX B200 systems exceeding 30,000 tokens/s. Accuracy barely suffers: MMLU drops from 90.8% to 90.7% when going from FP8 to FP4.

In memory, the numbers are just as striking. An LLM such as Llama 3.1 405B is cited as dropping from 140 GB in FP32 to 17.5 GB in FP4, an 8x reduction that lets large models run on fewer GPUs. In image generation, a FLUX configuration can go from 51.4 GB in FP16 to 9.9 GB in FP4 with minimal visual degradation, fitting within modest VRAM.
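The 8x figure follows directly from the bit widths (32 / 4), while absolute sizes depend on the parameter count. The sketch below computes weights-only footprints and also accounts for the one FP8 scale shared by each 16-value microblock. The 35-billion parameter count is an illustrative assumption, chosen only because it reproduces the 140 GB and 17.5 GB figures; it ignores activations, KV cache, and the per-tensor scale, and the function names are hypothetical.

```python
def weight_memory_gb(num_params, bits_per_value):
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_value / 8 / 1e9


def nvfp4_bits_per_value(block_size=16, scale_bits=8):
    """4 data bits plus one FP8 scale amortized over each microblock."""
    return 4 + scale_bits / block_size


params = 35e9  # illustrative assumption, not a figure from the article

fp32_gb = weight_memory_gb(params, 32)
fp4_gb = weight_memory_gb(params, 4)
nvfp4_gb = weight_memory_gb(params, nvfp4_bits_per_value())

print(f"FP32: {fp32_gb:.1f} GB")
print(f"FP4 (data bits only): {fp4_gb:.1f} GB, {fp32_gb / fp4_gb:.1f}x smaller")
print(f"NVFP4 incl. block scales: {nvfp4_gb:.2f} GB, {fp32_gb / nvfp4_gb:.2f}x smaller")
```

Counting the block scales, the effective width is 4.5 bits per value, so the real reduction against FP32 is slightly below 8x.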

MLPerf v5.0 backs the trend: median throughput on Llama 2 70B doubled compared with the previous year, and the best results improved by 3.3x. On energy, the figure per token falls from 10 J on H100 to 0.4 J on B200 and 0.2 J on B300, i.e. up to 50x greater efficiency. Translated into business terms, a drop of roughly 90% in inference costs is expected over 2024-2025.
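The energy figures can be turned into efficiency ratios and tokens per kilowatt-hour with simple arithmetic. The sketch below uses only the per-token joule values quoted above; the conversion 1 kWh = 3.6 MJ is standard, and the helper names are hypothetical.

```python
JOULES_PER_KWH = 3.6e6

# Energy per token in joules, as quoted in the text.
energy_per_token_j = {"H100": 10.0, "B200": 0.4, "B300": 0.2}


def efficiency_gain(baseline_j, new_j):
    """How many times less energy per token the new platform uses."""
    return baseline_j / new_j


def tokens_per_kwh(joules_per_token):
    """Tokens generated per kilowatt-hour at a given energy per token."""
    return JOULES_PER_KWH / joules_per_token


for gpu, joules in energy_per_token_j.items():
    gain = efficiency_gain(energy_per_token_j["H100"], joules)
    print(f"{gpu}: {gain:.0f}x vs H100, {tokens_per_kwh(joules):,.0f} tokens/kWh")
```

The 50x headline corresponds to the H100-to-B300 step; the H100-to-B200 step works out to 25x.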

On the end-user side, image and text models running in NVFP4 deliver more tokens per dollar, with reported improvements of up to 40% over alternatives, which pairs well with the smaller memory footprint and easier deployment of large models.

Adoption: clouds, enterprises, and real-world cases

Cloud providers are leading FP4 adoption. Lambda Labs offers HGX B200 clusters with FP4 as 1-Click deployments, and CoreWeave records 800 tokens/s on Llama 3.1 405B with GB200 GPUs. It is not all NVIDIA: Meta, OpenAI, and Microsoft use AMD Instinct MI300X for inference, and the MI350 will arrive with native FP4 support.

In banking, JPMorgan is exploring FP4 for risk and other analytics; in healthcare, a 30% speedup with 50% less memory has been observed; and in manufacturing, real-time decisions are enabled on resource-constrained devices, opening doors where there previously was no room.

Software is keeping pace. TensorRT Model Optimizer provides complete FP4 quantization pipelines; frameworks such as vLLM include early NVFP4 support; and Hugging Face hosts pre-quantized FP4 checkpoints (DeepSeek-R1, Llama 3.1, FLUX) to speed up production deployments.

For teams without much compute, there are non-QAT approaches such as SVDQuant with accuracy close to quantization-aware training; when maximum accuracy is required, QAT in FP4 matches or improves on BF16 in families such as Nemotron 4, provided the process is well configured.

Infrastructure: power, cooling, and the new rules of the data center

Ultra-low precision calls for a data center redesign. The GB200 NVL72 system draws 120 kW per rack across 72 GPUs, beyond the capacity of most existing data centers. Even so, an NVL72 replaces nine HGX H100 systems and needs 83% less power for equivalent compute.

With a TDP of ~1,000 W per GPU, direct-to-chip liquid cooling is not optional. Cold plates on all hot components allow operation with coolant at 45 ºC and cooling towers, avoiding costly chillers. Solutions such as Supermicro DLC-2 reach up to 96 B200s per rack and up to 250 kW of cooling capacity.

On the base software side, you need updated CUDA drivers, TensorRT-LLM with FP4 support, and specialized quantization tools. Post-training quantization with Model Optimizer speeds up production deployment, while quantization-aware training maximizes quality retention.

Looking to the medium term, data centers prepared for 50-120 kW racks will multiply, with next-generation cooling and power management solutions. Software maturity will keep improving, with seamless integration and automated quantization pipelines.

Networking and scalability: NVLink 5, switches, and photonics

The interconnect fabric is the other half of performance. Fifth-generation NVLink doubles bandwidth and lets you join up to 576 GPUs. Each link provides ~50 GB/s per direction; with 18 links per GPU, aggregate bandwidth reaches ~1.8 TB/s, more than 14x PCIe Gen5.
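The aggregate NVLink figure is the per-link rate multiplied by the link count and by two directions. The sketch below reproduces that arithmetic; the PCIe Gen5 x16 rate of about 64 GB/s per direction is an assumption not stated in this article, and the function name is hypothetical.

```python
def aggregate_bandwidth_gbs(links, gbs_per_direction, directions=2):
    """Total bidirectional bandwidth in GB/s across all links."""
    return links * gbs_per_direction * directions


# NVLink 5 per GPU: 18 links at ~50 GB/s per direction (figures from the text).
nvlink5_gbs = aggregate_bandwidth_gbs(links=18, gbs_per_direction=50)

# Assumption: one PCIe Gen5 x16 slot at ~64 GB/s per direction.
pcie_gen5_x16_gbs = aggregate_bandwidth_gbs(links=1, gbs_per_direction=64)

print(f"NVLink 5: {nvlink5_gbs / 1000:.1f} TB/s per GPU")
print(f"PCIe Gen5 x16: {pcie_gen5_x16_gbs} GB/s")
print(f"Ratio: {nvlink5_gbs / pcie_gen5_x16_gbs:.1f}x")
```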

The NVIDIA NVLink Switch delivers up to 130 TB/s per NVL72 domain, which is essential for model parallelism at scale. In addition, SHARP protocol support for in-network reductions, including at FP8 precision, accelerates key collective operations.

NVIDIA is also pushing networking forward with Quantum-X800 InfiniBand and Spectrum-X800 Ethernet, with switch families ranging from 128 to 512 ports at 800G, higher-density 200G options, and integrated liquid cooling to sustain performance.

With NVIDIA Photonics, optical engines co-packaged with the switch ASIC replace traditional pluggable transceivers, promising up to 3.5x better efficiency, 10x greater resilience, and 1.3x faster deployment, paving the way for massively interconnected optical data centers.

Software and platform ecosystem: Dynamo, AI-Q, Mission Control, NIM, and OVX

To get the most out of Blackwell, NVIDIA has introduced several key pieces. Dynamo is an open-source inference platform designed to scale a single query across GPUs over NVLink, with gains of up to 30x on demanding reasoning workloads such as DeepSeek R1 and a doubling of throughput on Hopper without changing hardware.

AI-Q (together with AgentIQ) proposes an open multi-agent framework that connects enterprise data, external tools, and other agents, helping composite systems reason over text, images, and video, with integrations for frameworks such as CrewAI, LangGraph, and Azure AI Agent Service.

On the operations side, Mission Control automates end-to-end orchestration of AI data centers, with seamless switching between training and inference, 5x higher utilization, and 10x faster job recovery. In addition, Base Command Manager is now available free for up to eight accelerators per system.

The NVIDIA NIM suite adds enterprise-ready generative AI microservices. OVX systems, for their part, target generative AI and intensive graphics, backed by a storage validation program with DDN, Dell PowerScale, NetApp, Pure Storage, and WEKA to ensure performance and scalability in production.

Professional products: RTX Pro Blackwell, DGX Station, and DGX Spark

The new RTX Pro Blackwell family refreshes the workstation line with up to 96 GB of memory on the Pro 6000 and up to 4,000 AI TOPS, 4th Gen RT Cores, and 5th Gen Tensor Cores with FP4. The Server Edition adds vGPU and MIG to partition the GPU into multiple isolated instances.

In real-world cases, reported gains include 5x in ray tracing versus the RTX A6000 (Foster + Partners), up to 2x in medical reconstruction (GE HealthCare), notable improvements in VR (Rivian), and 3x throughput with LLMs (SoftServe). Pixar notes that 3.3% of production video now fits within the 70 GB of a single GPU.

The DGX Station is refreshed with the GB300 Grace Blackwell Ultra, 784 GB of unified memory, and up to 20 PFLOPS of FP4 AI, plus 800 Gbps networking via ConnectX-8. For developers and students, DGX Spark with the GB10 chip and 128 GB of unified memory delivers ~1,000 TOPS of AI and a ConnectX-7 SmartNIC, lowering the cost of entry into the ecosystem.

Exascale in a rack and custom superpods

The DGX GB200 NVL72 system more than doubles the GPU count from 32 to 72 and raises memory from ~19.5 TB to ~30 TB. In compute, the jump is striking: from 127 PF to 1.4 EF in FP4 (~11x), and from 127 PF to 720 PF in FP8 (~5.6x), all in a fully liquid-cooled chassis.

One level up, the DGX SuperPOD with 8 GB200 NVL72 systems totals 11.5 exaFLOPS of FP4, with 36 GB200 Superchips per system and improvements of up to 30x over H100 in large-scale LLM inference. It is designed as an "AI factory" aimed at models on the order of a trillion parameters.

At the heart of Grace-Blackwell, the GB200 links two B200s with a shared Grace CPU over C2C, and scales to 576 GPUs at 1.8 TB/s using NVLink 5, building highly coherent domains suited to the most demanding AI workloads.
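The rack-level and SuperPOD figures in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; the variable and function names are hypothetical.

```python
GPUS_PER_NVL72 = 72
SUPERCHIPS_PER_NVL72 = 36
SYSTEMS_PER_SUPERPOD = 8


def speedup(new_pflops, old_pflops):
    """Ratio between two peak-throughput figures given in PFLOPS."""
    return new_pflops / old_pflops


fp4_gain = speedup(1400.0, 127.0)   # 1.4 EF FP4 vs 127 PF
fp8_gain = speedup(720.0, 127.0)    # 720 PF FP8 vs 127 PF

gpus_per_superchip = GPUS_PER_NVL72 // SUPERCHIPS_PER_NVL72
superpod_gpus = SYSTEMS_PER_SUPERPOD * GPUS_PER_NVL72
superpod_superchips = SYSTEMS_PER_SUPERPOD * SUPERCHIPS_PER_NVL72
superpod_fp4_ef = SYSTEMS_PER_SUPERPOD * 1.4

print(f"FP4 gain: {fp4_gain:.1f}x, FP8 gain: {fp8_gain:.1f}x")
print(f"GPUs per Superchip: {gpus_per_superchip}")
print(f"SuperPOD: {superpod_gpus} GPUs, {superpod_superchips} Superchips")
print(f"SuperPOD FP4 from the rounded per-rack figure: {superpod_fp4_ef:.1f} EF")
```

Eight racks of 72 GPUs give the 576-GPU NVLink domain mentioned above. Multiplying the rounded 1.4 EF per rack by eight gives 11.2 EF, so the quoted 11.5 EF implies a per-rack figure closer to 1.44 EF.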

Modern Quantization: Preserving Intelligence in 4 Bits

FP4's success comes from combining hardware and software. NVIDIA's two-level scaling adapts to the distribution of tensor values, and the Transformer Engine analyzes more than 1,000 operations to optimize scales, allowing models such as DeepSeek-R1 to reach 98.1% accuracy in FP4 and, in some tests, exceed the FP8 baseline.

In post-training, SmoothQuant and AWQ have made it possible to fit models the size of Falcon 180B on a single GPU. When you need to preserve maximum performance, QAT simulates FP4 during fine-tuning, which helps the weight distributions adapt. Families such as Nemotron 4 show lossless FP4 with QAT, on par with BF16 or better.
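The core idea behind SmoothQuant can be sketched without any framework: per-channel factors move activation outliers into the weights, so both become easier to quantize while the layer output stays the same. The sketch below uses the smoothing formula from the SmoothQuant method, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), on toy data. The values, the single-sample "calibration", and the helper names are illustrative assumptions, and no actual quantization is performed here.

```python
def smooth_scales(act_absmax, weight_absmax, alpha=0.5):
    """Per-input-channel smoothing factors: act^alpha / weight^(1 - alpha)."""
    return [(a ** alpha) / (w ** (1.0 - alpha))
            for a, w in zip(act_absmax, weight_absmax)]


def matvec(x, weights):
    """y[j] = sum_i x[i] * weights[i][j] for an (in x out) weight matrix."""
    return [sum(x[i] * weights[i][j] for i in range(len(x)))
            for j in range(len(weights[0]))]


# Toy layer: 3 input channels, 2 outputs. Channel 1 carries an activation outlier.
x = [0.5, 40.0, -1.0]
W = [[0.2, -0.1],
     [0.01, 0.02],
     [0.3, 0.4]]

act_absmax = [abs(v) for v in x]                       # normally from calibration data
weight_absmax = [max(abs(w) for w in row) for row in W]
s = smooth_scales(act_absmax, weight_absmax)

x_smooth = [v / sj for v, sj in zip(x, s)]             # activations divided by s
W_smooth = [[w * sj for w in row] for row, sj in zip(W, s)]  # weights multiplied by s

print(max(abs(v) for v in x), max(abs(v) for v in x_smooth))
print(matvec(x, W), matvec(x_smooth, W_smooth))
```

After smoothing, the largest activation magnitude falls from 40 to below 1 while the product is unchanged up to floating-point rounding, which is what makes the subsequent low-bit quantization of activations less lossy.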

In complex scenarios, outlier handling avoids activation collapse, and mixed-precision strategies raise the bit width in critical operations. The result: FP4 is viable in dense architectures and also in Mixture of Experts, with accuracy suitable for production.

Roadmap and availability

Looking ahead, the Vera Rubin generation targets 50 PFLOPS of dense FP4 per GPU, with ConnectX-9, NVLink 6, and HBM4 memory (+1.6x bandwidth). The CPU-GPU link will also rise to ~1.8 TB/s, and Rubin Ultra will raise the bar again to 100 PFLOPS of FP4 and 1 TB of HBM4e.

On the AMD side, the CDNA 4 architecture equips its Matrix Cores with FP4 and FP6 support, doubling performance over the previous generation and adding sparsity for further acceleration, which is especially attractive for Mixture of Experts models.

The immediate limit is not technology but hardware supply: most 2025 B200/B300 production is committed to hyperscalers. Even so, the impact on cost per token and energy efficiency is driving a genuine democratization, bringing frontier capabilities to smaller organizations thanks to the jump in memory and compute per watt.
