
Feature-rich and fast SCSI target with CTL and ZFS

Formal Metadata

Title: Feature-rich and fast SCSI target with CTL and ZFS
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract: Three years ago FreeBSD got a new subsystem called CTL (CAM Target Layer), providing SCSI target device emulation at the kernel level. It brought FibreChannel target support in FreeBSD to a significantly new level, and it was later integrated with the new iSCSI stack. This talk describes CTL's internal organization, the improvements made during the last year, results, and perspectives. It includes an overview of modern SCSI extensions, known as VMware VAAI and Microsoft ODX, and their CTL implementation.
Transcript: English (auto-generated)
I'm developing a storage product known as FreeNAS and TrueNAS, and today I will be talking about my work during the last couple of years: the fancy new features of CTL which were implemented, how CTL can now interoperate well with other systems, especially initiators, what it can provide to them, and how it can improve the whole environment. So let's go. At first, several words about CTL.
For those who don't know, CTL can be decoded as CAM Target Layer. Its only relation to CAM is in a couple of frontends, where it uses the CAM subsystem of FreeBSD, the Common Access Method, which is an abstraction for working with SCSI devices. Here you may see those two frontends, CAM target and CAM SIM. One allows talking to some Fibre Channel cards and some other target-mode cards, and CAM SIM allows talking to the system itself. But I started from the side. Still, the CTL core is a set of code which allows emulating a SCSI device.
It emulates it quite well, with all the fancy details of SCSI, with reservations, with many, many features. On the right side, we can see a bunch of backends where it actually stores the data. CTL can store data in different block devices represented by GEOM. It can store it in ZFS zvols, which is the most advanced configuration in this scheme. And it can also store data in plain files, in UFS, in ZFS, or on any other file system, but again, more optimization was done for ZFS. And lately, thanks to Edward Napierala, we have another frontend, for iSCSI; it's represented here. So CTL can work not only as a Fibre Channel target, as it did before, but now as an iSCSI target, and it could potentially be extended to other kinds of protocols. For example, I'm dreaming about a SAS target, which should not be difficult: all we actually require for this is target-mode driver support for, for example, LSI SAS HBAs. After that, we could do it. But that's a small general overview; let's get to what actually happened recently.
My topic is called fast and functional target, and from some point of view, functional means fast. That's the first part of my talk, which I am going to start with. First, for a target to be fast, it should be convenient for the initiator to do things, and the target should explain to the initiator how to do things efficiently and fast. If we look at some classic SCSI block device, all it provides is a block size, a number of blocks, and commands to read and write them. That worked for, who knows, many years, 30 years, maybe more. And below is what the operating system typically sees about a disk: a block size of 512 bytes and a bunch of those blocks, quite simple. But several years ago we were hit by such functionality as Advanced Format, when we got disks whose internal structure is not represented by those numbers.
So we got physical sectors. They are 4K in the case of most modern disks, but there were some other cases, like 2K and others. And in fact, that physical block structure maps very closely to ZFS functionality. A SCSI target on top of ZFS has its own blocks of, for example, 8K, which is the default for a zvol, or even 128K if we do it on top of a file on a default ZFS dataset, or 32K if we're doing a SCSI target on top of UFS with the default block size. But in all these cases, especially in ZFS, the access characteristics are very close to those of hard disks. Same as for hard disks, ZFS is unable to read less than one block, because it's unable to verify the checksum otherwise; it has to read the whole physical block and then take part of it. For writing, it's even closer: ZFS cannot write less than one block, so, same as a hard disk, it must read the whole block, modify some part, and write it back.
To avoid those read-modify-write cycles, which are very expensive, we must inform our initiator about our physical block size. And that's the first thing that was added to CTL; as you may see here, it's available to initiators. So the initiator may now see that the target has an eight-kilobyte physical block and that it should align all accesses, align partitions, and so on, to reach full performance. This is an example of a FreeBSD initiator, and the FreeBSD default installer should respect this data to align partitions and some other things.
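As a rough illustration of what the initiator sees, here is a minimal sketch in C (illustrative, not actual CTL or FreeBSD code) of decoding the physical block size from the SBC READ CAPACITY(16) parameter data, where it is reported as a power-of-two multiple of the logical block size:

    #include <stdint.h>

    /* First 16 bytes of READ CAPACITY(16) parameter data, per SBC-3. */
    struct read_cap16_data {
        uint8_t max_lba[8];   /* last logical block address, big-endian */
        uint8_t block_len[4]; /* logical block length in bytes, big-endian */
        uint8_t prot;         /* protection fields, unused here */
        uint8_t exponents;    /* bits 3:0: log2(logical blocks per physical block) */
        uint8_t lalba[2];     /* bit 15: LBPME (thin provisioning supported) */
    };

    static uint32_t be32(const uint8_t *p)
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | p[3];
    }

    /* E.g. 512-byte logical blocks with exponent 4 give 8K physical blocks. */
    static uint32_t physical_block_size(const struct read_cap16_data *d)
    {
        return be32(d->block_len) << (d->exponents & 0x0f);
    }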
For example, if you create ZFS on top of these disks, ZFS will respect this block size and try to increase ashift up to eight kilobytes; that's actually the maximum size acceptable for ZFS. If you try to go higher, ZFS will just drop to its default, because it can't go that high. So it depends on the specific initiator's capabilities, but it's always better to report to the initiator what we actually can do, to make it efficient. Now we have some other disks with other geometry specifics, especially shingled writes and so on, and, who knows, maybe at some point some of them will appear general enough to be reported.
The next thing we've been seeing on the market recently, together with the appearance of virtualization, is thin provisioning. It's normal for virtual machines to over-provision CPU and over-provision RAM, but it's also usual to over-provision storage space. So when storage is not needed, it should be returned, and it can be reused for other purposes. When we are doing block storage, SCSI storage, on top of ZFS, the benefit is clearly obvious, because ZFS wants to have more free space available on the pool to reduce free-space fragmentation and, later, data fragmentation. So it's always better to return space back to the pool, even if it's not immediately reused, for the better health of the pool. The same applies to another side of resource provisioning, which we can see in solid-state drives: they want to know which blocks are not used, to be able to recycle them in a more even pattern and reduce the wear of each separate block. And CTL got such support.
So CTL can report that it provides a thin-provisioned disk. It supports a couple of VPD pages, the Logical Block Provisioning page and the Block Limits page, and it can report how big the unmap block size is. For ZFS it matches the general physical block size, but technically it could be different; there are some SCSI targets where it is. VMware calls this primitive VAAI Thin Provisioning reporting. That's the first thing CTL got supported, and it's quite straightforward. There are a bunch of other primitives I'll describe later, but this one is first.
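For illustration, a hedged sketch of the key fields of that Logical Block Provisioning VPD page (page code 0xB2); the offsets follow SBC-3, and this is simplified decoding, not CTL source:

    #include <stdint.h>

    /* Logical Block Provisioning VPD page (0xB2), per SBC-3 (abridged). */
    struct lbp_vpd {
        uint8_t device;        /* peripheral qualifier and device type */
        uint8_t page_code;     /* 0xB2 */
        uint8_t length[2];     /* page length, big-endian */
        uint8_t threshold_exp; /* thresholds are counted in 2^n logical blocks */
        uint8_t flags;         /* bit 7: LBPU, UNMAP supported;
                                * bit 6: LBPWS, WRITE SAME(16) with UNMAP */
        uint8_t prov_type;     /* bits 2:0: 2 means thin provisioned */
        uint8_t reserved;
    };

    /* Space can be reclaimed if either unmap flavor is supported. */
    static int supports_unmap(const struct lbp_vpd *p)
    {
        return p->page_code == 0xB2 && (p->flags & 0xC0) != 0;
    }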
The next important thing is to tell the initiator about the critical situation when we run out of space, which is quite possible if we are doing storage over-provisioning. With proper reporting, it's possible to make VMware detect this failure, actually freeze the virtual machine, and ask what to do. So it's not just a storage error which crashes everything; such a message appears and asks what to do: retry, after the administrator frees some space, or stop the virtual machine, and so on. VMware calls this feature VAAI Thin Provisioning Stun. In fact, it's just reporting the proper code, but it's critical for proper integration, and this thing is also supported by CTL. And once we are doing thin provisioning, it's important to be able to actually free space on the storage which was previously used. CTL supports two flavors of this functionality: one is through the WRITE SAME command with the UNMAP flag, and another is through the UNMAP command, which is the closest alternative to the ATA TRIM command. VMware calls it VAAI Unmap, and here you may see statistics of those commands in esxtop. So VMware can actually use it, the Windows initiator can use it, and other thin-provisioning-aware initiators can use it. So it's a question of better integration.
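As a sketch of what such a request carries (simplified per SBC-3, not CTL code): the UNMAP command, opcode 0x42, takes a parameter list of LBA/length descriptors, while WRITE SAME(16), opcode 0x93, expresses the same intent with an UNMAP bit in the CDB. Building a single-range UNMAP list might look like this:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* One UNMAP block descriptor: an LBA range to deallocate. */
    struct unmap_desc {
        uint8_t lba[8];     /* starting LBA, big-endian */
        uint8_t blocks[4];  /* number of logical blocks, big-endian */
        uint8_t reserved[4];
    };

    /* Build an 8-byte header plus one 16-byte descriptor. */
    static size_t build_unmap_list(uint8_t *buf, uint64_t lba, uint32_t nblocks)
    {
        struct unmap_desc *d = (struct unmap_desc *)(buf + 8);
        int i;

        memset(buf, 0, 8 + sizeof(*d));
        buf[1] = 6 + sizeof(*d);        /* UNMAP data length */
        buf[3] = sizeof(*d);            /* block descriptor data length */
        for (i = 0; i < 8; i++)
            d->lba[i] = lba >> (8 * (7 - i));
        for (i = 0; i < 4; i++)
            d->blocks[i] = nblocks >> (8 * (3 - i));
        return 8 + sizeof(*d);
    }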
After we unmap some blocks, it's possible to actually ask CTL whether given blocks are mapped or not, and how many blocks around them are in the same state. Here you may see an example of how the Windows defrag utility uses it: it can detect that our disk is thin-provisioned, and it goes through all disks, checks that all unused blocks are unmapped, and reports that we have 100% space efficiency, so none of the unused blocks are counted by the storage as used. And if it finds such blocks, it calls unmap and frees them. Usually this happens at runtime, but if for some reason it didn't happen at runtime, it can be fixed later. The initiator can also get statistics about storage space usage. Here we may see the Logical Block Provisioning log page reported by the storage, which shows how many LBAs we have available on our backend pool and how many LBAs are actually used now. So far I don't know of initiators who would actively use this data, but at least it's possible to get it: with sg_logs, a standard tool available on Linux, it's possible to get these statistics.
But what is actually actively used is that it's possible to set thresholds on those values and make the target automatically notify the initiator when the thresholds are reached. So when the system is getting close to overflow, we know it in advance, before the system crashes and everything gets very bad. Here you may see how VMware reports these kinds of errors. You may set a threshold like: notify me when free space drops below 20%. And after that, every five minutes, CTL will bug all initiators: hey, space is running out, running out, blah, blah, blah. VMware calls this feature Thin Provisioning space threshold warning. That's obviously important functionality, and it's also supported.
The next part of the functionality which is important for efficient integration with the initiator is I/O offload. Some operations done by the storage do not necessarily require active involvement of the initiator: the initiator can say, do something for me, please, and go off, and the storage will do it. There is some functionality for this which has existed in SCSI specifications for ages. For example, the VERIFY command, defined since very old SCSI specifications, and CTL now also supports it. You may see here in the screenshot how the Windows check disk tool checks a physical disk for errors; it does it using the VERIFY SCSI command. It's not so critical for regular local SCSI drives, because usually the disk speed is much lower than the interface speed, and you won't gain much using VERIFY compared to just a regular read. But when we are talking about iSCSI or Fibre Channel storage, it may happen that the storage is actually faster than the connection to the host. And here you may see that VERIFY does 400 megabytes per second with only 500 kilobytes per second of network traffic. We're saving network traffic, and we're increasing bandwidth: in case this is a one-gigabit link, we're doing five times faster than the link could. And so we are offloading the initiator, completely freeing it from pointless work.
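To make the saving concrete: the initiator sends only a ten-byte CDB, the target reads and checks the data locally, and just a status comes back. A minimal sketch of a VERIFY(10) CDB (opcode 0x2F; with BYTCHK=0 there is no data phase at all):

    #include <stdint.h>
    #include <string.h>

    /* Check readability of nblocks starting at lba; no payload crosses the wire. */
    static void build_verify10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
    {
        memset(cdb, 0, 10);
        cdb[0] = 0x2F;          /* VERIFY(10); byte 1 keeps BYTCHK=0 */
        cdb[2] = lba >> 24;     /* logical block address, big-endian */
        cdb[3] = lba >> 16;
        cdb[4] = lba >> 8;
        cdb[5] = lba;
        cdb[7] = nblocks >> 8;  /* verification length, big-endian */
        cdb[8] = nblocks;
    }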
A similar offload can be done for the write operation with WRITE SAME. It's quite useful in the case of virtual machines: when you are creating your virtual machine and you want to allocate some space for it, you want to prearrange it, you want to make sure there is no data leakage, and so on. And it's possible to just tell the storage: erase, please, a hundred gigabytes or whatever. That's how it looks in VMware vSphere. You may see here that 40 gigabytes of storage were erased in 10 seconds; that's four gigabytes per second, and the network traffic is just insignificant in comparison. Yes? For the verify command, is that just reading the data from disk and making sure that it's still there, or is it actually doing some Windows-check-disk-specific something? No, VERIFY just makes sure the data is readable. It goes through the data and checks that it reads. Actually, CTL working on top of ZFS does a read and hopes that ZFS will do all its checksums and so on. So it's not an actual scrub, as it could be; it just reads the data. It reads the data, but it doesn't check the checksums or whatever CRCs the drive does automatically, and then just throws away the data. And one more primitive I've skipped: it's called COMPARE AND WRITE. It's a command VMware calls Atomic Test and Set. It's the basic atomic operation primitive used in most operating systems, the same as compare-and-swap on x86. It allows avoiding SCSI reservations for clustered file system access. So you can tell it: take this data, compare it to the data at offset x, and if they match, replace it with data y. It's much faster than doing it through reservations, because that requires retries and so on. And VMware actually actively uses this feature.
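Conceptually it is a compare-and-swap on a disk block, executed atomically by the target. A sketch of the semantics (illustrative only; the real command carries the compare and write buffers in a single data-out phase):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Target-side COMPARE AND WRITE semantics: the check and the update are
     * atomic with respect to all other commands touching the same LBA range. */
    static bool compare_and_write(uint8_t *block, size_t len,
                                  const uint8_t *expected, const uint8_t *newdata)
    {
        bool ok;

        /* ...lock the LBA range... */
        ok = memcmp(block, expected, len) == 0;
        if (ok)
            memcpy(block, newdata, len);    /* match: commit the new data */
        /* ...unlock; on a miscompare the target returns CHECK CONDITION
         * with MISCOMPARE sense, and the initiator retries. */
        return ok;
    }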
Windows and other clustered file system initiators use it actively too. But a much more advanced offload can be reached using the next primitive. It has also existed in SCSI for ages, but it was either not much implemented or not much used, because it requires nontrivial integration with file systems. It again started being used together with virtualization, with VMware, which can move large amounts of data within a storage or between storages. This functionality is called XCOPY, or EXTENDED COPY. It appeared in SCSI SPC-3 quite a few years ago and is represented by three main commands. RECEIVE COPY OPERATING PARAMETERS allows getting the actual device capabilities: whether it supports XCOPY and with what parameters. The XCOPY command actually copies the data. And RECEIVE COPY STATUS allows getting the status of a background copy operation.
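In a simplified view (the real SPC parameter list is built from target descriptors and segment descriptors; this struct only illustrates what one block-to-block copy segment conveys, it is not the wire format):

    #include <stdint.h>

    /* Roughly the information one EXTENDED COPY segment carries. */
    struct xcopy_segment {
        uint16_t src_target;  /* index of a target descriptor: where to read */
        uint16_t dst_target;  /* index of a target descriptor: where to write */
        uint64_t src_lba;     /* starting LBA on the source */
        uint64_t dst_lba;     /* starting LBA on the destination */
        uint32_t nblocks;     /* how much to move, entirely inside the target */
    };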
How does it actually look? Our initiator can talk to one device and ask it: please copy data from offset x, maybe on some other disk y, to offset z on disk t, and that may be many megabytes, gigabytes, whatever, and go; notify me on completion. And it actually works. It's used for disk cloning, VM cloning, and VM migration in VMware vSphere. Here you may see a 40-gigabyte virtual machine moved from one disk to another: 40 gigabytes in 30 seconds, 1.3 gigabytes per second. And the network traffic is also not significant. So we are offloading initiators, we are offloading the network, and we are getting much faster operation completion. In fact, what CTL does now at this point is a full copy: it reads data from the source and writes it to the destination. It would be very interesting to do some more fancy things with ZFS, something like manual dedup, making the new data just reference the old data, so it could be much faster in that case. But it's not very trivial, and at this point it's just a wish.
Microsoft, on their side, decided this functionality was overcomplicated, so they decided to reinvent the wheel and actually made it even more complicated. They made their own specification, called XCOPY Lite, for SPC-4 XCOPY. It introduces several new commands, and the idea is to split the read and write phases of the XCOPY operation. So you may say: now, please create a token of this data. A token is 512 bytes of some cryptographically strong hashes and some data which represents all that data, maybe just in the space of the disk, or maybe even in time, so it can be considered a snapshot of the disk. And then, when the snapshot is created, you may say: okay, and now, that other storage in another room, please write to disk X the data from that snapshot, from that token. And that copy may happen in the background. CTL also supports this functionality, though it's limited to copy operations within one storage host; it doesn't support copying between hosts.
And here you may see how Windows 2012 uses this functionality: a 1.47-gigabyte-per-second copy with a clearly insignificant amount of network traffic. So the whole copy operation goes on inside the storage. At this moment CTL doesn't actually create snapshots; it supports only the simplest form of tokens, not a point-in-time reference, but that could also be implemented later, when we find at least some initiator needing this functionality, because Windows at this point doesn't require it. Windows can use it for copying files within a disk and between iSCSI disks of the same target. It's used by Microsoft virtualization software for moving and cloning virtual machines, and CTL can efficiently work in that environment. So, to summarize, CTL now supports these three groups of primitives: VMware VAAI Block, VMware VAAI Thin Provisioning, and Microsoft Offloaded Data Transfers. But actually we support more. Here we see the list of commands supported by CTL now, and in red the ones which were added during the last couple of years. So effectively we doubled the set of supported commands, and now it's not easy to find something in the specifications
which CTL would not support. But even with all this offload, there are still situations where the storage actually has to do something, has to read and write something, and that hopefully should be done fast enough. So CTL also got a bunch of optimizations to improve its performance. It got multiple worker threads instead of a single one. It got fine-grained locking: per-LUN and per-queue locks instead of one big lock. It got a number of other optimizations. For example, for the iSCSI and Fibre Channel protocols it can coalesce command completion and data transfer completion operations, which allows reducing the number of interrupts, or calls to the hardware: for a Fibre Channel write operation you can go from three interrupts down to two. On the first interrupt you're getting the request and sending the command; the second interrupt is just the completion, and you are done. Instead of three, that's like a 30% benefit in performance. It was switched to use UMA zones instead of its own allocator, which gives a significant benefit, because UMA zones are SMP-aware and scale to large, many-core systems. There were many other optimizations, for performance, for memory use, and everything.
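For reference, a minimal sketch of the FreeBSD UMA allocator API mentioned here (illustrative usage, not CTL's actual zone setup); UMA keeps per-CPU caches of fixed-size items, which is what makes it scale on many-core systems:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    static uma_zone_t io_zone;

    /* Create a zone of fixed-size I/O descriptors once, at load time. */
    static void
    io_zone_init(size_t io_size)
    {
        io_zone = uma_zcreate("example_io", io_size,
            NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
    }

    /* Allocations and frees on the hot path hit a per-CPU bucket. */
    static void *
    io_alloc(void)
    {
        return (uma_zalloc(io_zone, M_WAITOK | M_ZERO));
    }

    static void
    io_free(void *io)
    {
        uma_zfree(io_zone, io);
    }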
And here I have some benchmarks to see what CTL can do now. The first test is for iSCSI. I've measured peak IOPS and throughput for different sizes of I/O operations. I had one target machine with a total of 60 gigabits of network connectivity and three initiators with the same total throughput. I had a bunch of SSDs backing this storage, and each initiator accessed all of the LUNs in a multi-threaded linear read pattern. And these are the numbers I've got: CTL can completely saturate 60 gigabits of traffic, and it can do 1.2 million reads per second. For CTL it's quite symmetric, reads or writes, so I haven't explicitly tested writes there, because that depends on the performance of the backend storage; ZFS is quite good there too, but maybe not as good as on reads. In this situation the network cards used all possible offload: these were Chelsio cards with full TCP offload.
So those cards handled all the TCP acknowledgements, receive window, everything. But if we take less expensive cards without TCP offload and just drop to TCP segmentation offload and large receive offload, you may see that we're still doing quite well, but about 20% slower on smaller commands. That's because the system has to handle the mentioned TCP ACKs, retransmissions, and so on in software, and that increases CPU load. But it's still quite good. If we take a simpler setup without jumbo frames, you may see that performance doesn't drop much, only slightly, because of slightly worse wire usage due to the additional headers. But performance is about the same, thanks to the TSO and LRO offload of the cards, so from a load perspective the operating system doesn't see much difference between the previous two cases, because in either case packets are going in 64K blocks at a time. I've tested the opposite case too: if we have a card without any offload at all, but it supports jumbo frames, we still can do quite well, reaching almost 60 gigabits of traffic with a million IOPS. And if we take cards with no acceleration at all, which probably don't exist at such speeds, we still can do 30 gigabits of traffic with a lot of IOPS, so we are quite good. Cards of that kind are usually one gigabit or less, a hundred megabits, so we are probably 10 times faster than would otherwise be possible. So here is the total summary of all cases: we are quite fast on both IOPS and throughput.
You may note that there is still some gap in between, on middle-size requests, because an ideal graph would intersect almost at the top. We are slightly lower, so there is some space for future improvement, and I hope to improve it with later work, or maybe we'll soon get some hardware offload from vendors which will close this window. We'll see. Another set of tests I've made is for Fibre Channel.
It was the same hardware, the same systems, but I used dual-port eight-gigabit QLogic Fibre Channel cards. Same test, same environment, just one replaced with the other. After all the optimizations to the ISP driver and to CTL, I was able to reach 160,000 IOPS through two Fibre Channel ports. I believe it's some kind of limitation of the Fibre Channel cards, because the system is not loaded and I don't see an obvious bottleneck, while QLogic declares 200,000 IOPS, which is the peak for each port separately. Without having the hardware specification, the hardware datasheets, it's not obvious how we can reach those numbers. I would obviously be happy to read the specification, to know how to use the multi-queue supported by the hardware and maybe some other techniques to improve performance, but 160,000 IOPS is still quite good. I haven't got the full 16 gigabits, but that's probably just a difference in measurement, because Fibre Channel uses 8b/10b encoding on the physical layer. So on the throughput side, it should be good enough.
As I've told, there are still a bunch of areas where we could improve things. It would be good to do some fancy things with XCOPY to actually avoid copying, which should be possible with ZFS. It would be cool to support XCOPY between hosts, but that requires some user-level integration to handle connections between hosts and to do discovery of other hosts; that's a bigger project, but it could give benefits for virtual environments and so on. It would be good to recreate the high-availability clustering which actually existed in CTL before it was open-sourced; we still have some orphaned code paths in CTL for this, but it was never recreated in the open-source world. It would be good to improve ZFS prefetch, because it doesn't always operate perfectly in the case of block storage, especially if you are doing multi-threaded I/O or simultaneously executing several commands: it sometimes goes nuts and just isn't prefetching anything, or is prefetching things you don't want. But some workarounds were made in CTL to handle those things as well as possible. And Fibre Channel in FreeBSD is not full-featured compared to OpenSolaris, for example; there are still things that could be done, that should be done, and I hope to do some work in this area too. So that's all I wanted to tell.
If anybody has questions, I'd be happy to answer them, or comments. Yeah, please. I haven't tested Chelsio with iSCSI offload, only with TCP offload. At the point when I was doing the testing, I had no access to iSCSI offload code yet. I don't know whether it was officially published; it's somewhere in development, and I hope we'll see it soon. Yeah, but TCP offload does quite a lot of things. Yes, that's definitely the area I'm going to test, but I haven't done it yet. So we'll see.
Questions? Ah, yeah, please.
Yes, please. This work is really cool; is this all in FreeBSD HEAD? It's even in 10-STABLE now. Would it be possible to do ZFS-scrub-like I/Os that read all copies of the data? Yeah. Also, that backend now works on top of the VFS layer, so we are limited to the set of operations which we can do through that layer. Actually, there are several backends on top of which CTL can work: device, zvol, and file, and they have slightly different capabilities. The zvol backend is now the most functional one, where we can do all the functionality we support. For the file backend, for example, we can't at this moment do unmap, because there is no respective call at the VFS layer to do unmap on FreeBSD. Probably only a tiny piece is missing; we just need to grab somebody who knows all this environment and implement just a couple of functions, because ZFS can do it, CTL wants to do it, and we just need some call to do it.
No, definitely, ZFS would also benefit from that, because now it effectively does a read: it allocates some temporary buffer, then does a memory copy from the ARC into this buffer, and then discards it. So yeah, again, it's a question of API: we have BIO_READ, BIO_WRITE, BIO_DELETE, but we don't have BIO_VERIFY or BIO_whatever.
So in the case of a device, in the case of a GEOM device, we support unmap, but we may not support the other things, such as space thresholds and so on, because there is no such thing at the GEOM layer. Or we can't do some cache control things, because, again, there are no such control bits in GEOM: for example, the SCSI force unit access bit, which is supported by CTL on top of file and zvol, can't be implemented on top of GEOM now. So all these backends are slightly different, and in all cases we are as close to full functionality as possible in each specific case. But CTL tries to mask functionality that is not supported, so the initiator should not be confused in either case: it detects what the backend can do and hides what's unsupported. Any other questions? Thank you for attending. Thank you.