Samba4 Status – Andrew Tridgell

Wed Jun 2 19:36:00 EST 2004

File system capabilities that aren't possible with Samba3: hover the mouse over a file like test.txt in Windows and it shows information about the file. There's more information, such as comment, subject, title and author, rather than just the type and size (which is all Samba3 would give you). This is on a Windows machine, with the files hosted on an ext3-based Samba4 server. Where did the metadata come from? A concept called streams: in the NT world a file can have more than one data stream (the default one is :$DATA). Files become two-dimensional: a name plus a set of streams. Virus scanners make use of these data streams too. Right-click the file, choose Properties, and that's where the metadata shows up.
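To make the naming concrete: on NTFS a named stream is addressed as `file:stream:$DATA`, and the unnamed default stream is `file::$DATA` (or just `file`). Windows keeps the summary properties shown on the Properties page in streams with names like `\x05SummaryInformation` (the leading byte is 0x05). The sketch below splits such a name into its parts; `parse_stream_name` and `StreamName` are hypothetical helpers, not Samba code, and only `$DATA` streams are handled.

```python
from typing import NamedTuple


class StreamName(NamedTuple):
    base: str    # the ordinary file name, e.g. "test.txt"
    stream: str  # "" means the default (unnamed) data stream
    stype: str   # stream type; only "$DATA" is handled in this sketch


def parse_stream_name(path: str) -> StreamName:
    """Split "file:stream:$DATA" into its parts.

    Assumes a share-relative path (no "C:" drive prefix), which is the form
    a CIFS client sends to the server.
    """
    base, sep, rest = path.partition(":")
    if not sep:
        # No colon: plain file name, i.e. the default data stream.
        return StreamName(base, "", "$DATA")
    stream, sep, stype = rest.partition(":")
    if not sep:
        stype = "$DATA"  # "file:stream" is shorthand for "file:stream:$DATA"
    if stype.upper() != "$DATA":
        raise ValueError(f"unsupported stream type: {stype!r}")
    return StreamName(base, stream, "$DATA")
```

For example, `parse_stream_name("test.txt:\x05SummaryInformation:$DATA")` yields base `test.txt` and stream `\x05SummaryInformation`, while `test.txt::$DATA` maps to the default stream.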

When you play a video, separate streams of data are useful: one for low-res, one for DTS 5.1, one for left/right audio and so on. Players might not do this yet, but this is definitely the future for playback.

Microsoft is dropping support for FAT. All supported versions of Windows will be using NTFS; once applications can assume that, they will rely on some of its file capabilities (more than we can do in the Unix world). Application developers will make use of such features, and they are going to be difficult to support on a Linux/Samba server.

Samba4 will have a new NT VFS layer with multiple backends, so it can expose all the gory details of what NTFS is capable of. Summary information is kept as binary data in streams, so for a given file “test.txt”, there will be 2 other “hidden” files.

On Linux systems that support extended attributes, the streams will be stored in extended attributes. However, an extended attribute has a 64KB limit, unlike Windows where you can have gigabytes of data in your streams. XFS and ext3 are also slow when doing synchronous operations with extended attributes. With XFS, this is getting fixed with 512-byte inodes (? Mike from SGI spoke up...)
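A minimal sketch of the xattr approach, assuming one extended attribute per named stream under a hypothetical `user.DosStream.` prefix. A dict stands in for the filesystem so it runs anywhere; on a real Linux box the reads and writes would go through `os.setxattr`, `os.getxattr` and `os.listxattr`. The size check is where the 64KB ceiling bites: Linux caps a single xattr value at 65536 bytes, and individual filesystems may allow less.

```python
XATTR_SIZE_MAX = 65536              # Linux per-value limit: the 64KB ceiling
STREAM_PREFIX = "user.DosStream."   # hypothetical naming convention


class XattrStreamStore:
    """Named streams kept as xattrs; a dict fakes the filesystem's xattrs."""

    def __init__(self) -> None:
        # path -> {xattr name -> value}
        self._xattrs: dict[str, dict[str, bytes]] = {}

    def write_stream(self, path: str, stream: str, data: bytes) -> None:
        if len(data) > XATTR_SIZE_MAX:
            # NTFS would accept gigabytes here; a single xattr cannot.
            raise OSError(f"stream {stream!r} is {len(data)} bytes, "
                          f"over the {XATTR_SIZE_MAX}-byte xattr limit")
        # Real code: os.setxattr(path, STREAM_PREFIX + stream, data)
        self._xattrs.setdefault(path, {})[STREAM_PREFIX + stream] = data

    def read_stream(self, path: str, stream: str) -> bytes:
        # Real code: os.getxattr(path, STREAM_PREFIX + stream)
        return self._xattrs[path][STREAM_PREFIX + stream]

    def list_streams(self, path: str) -> list[str]:
        # Real code: os.listxattr(path), filtered by prefix
        names = self._xattrs.get(path, {})
        return sorted(n[len(STREAM_PREFIX):] for n in names
                      if n.startswith(STREAM_PREFIX))
```

Small streams such as the summary information fit comfortably; the `OSError` path is exactly the "gigabytes of stream data" case that this backend cannot represent.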

Luckily, most current users of streams only store small amounts of data, so 64KB is enough, but this might not be the case in the future. DRM signatures will probably end up in streams; not only will Linux users not be able to copy files, they won't even be able to store them! Then we'll need some packaging tool, which is even more painful!

Longhorn & WinFS: “open by content”. Today File -> Open is open by filename, with the classic tree view showing file name and file type. A new option, open by content, lets you type filenames or bits of them and then opens up all files that match, across all filesystems, including network drives. Plus the index gets updated live. This will be instant, like a Google search. BeOS might have done something like this, though perhaps only with periodic updates; it kept the index in a separate file, with no file content capability. Open by content is something we will have to support.

Searching is already in the MS CIFS specs at the moment, but it's up to the client to open each candidate file and scan it. The format is proprietary, and there's also a plugin system (so search functions for a .doc will work). Under Windows “search options”, enable the “indexing service” and this is partly available today (the “containing text” field allows this, like a grep -rl, except it's indexed!). It's off by default because it's slow, but with WinFS it's going to be updated live, and this will be mighty useful. It can be integrated with Internet search, since it has network support as well, and this could beat Google!
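The difference between scanning and indexing is easy to see in code. This is a toy inverted index, an illustration of the general technique only and not how WinFS or the Indexing Service work internally: each word maps to the set of files containing it, `update` re-indexes a single file when it changes (the "live" part), and `search` answers from the index without opening any candidate file. `ContentIndex`, splitting on `\w+`, and case-insensitive matching are all assumptions.

```python
import re
from collections import defaultdict

_WORD = re.compile(r"\w+")


class ContentIndex:
    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)  # word -> files
        self._words: dict[str, set[str]] = {}                   # file -> words

    def update(self, path: str, text: str) -> None:
        """Live update: re-index one file whenever it changes."""
        self.remove(path)
        words = {w.lower() for w in _WORD.findall(text)}
        self._words[path] = words
        for w in words:
            self._postings[w].add(path)

    def remove(self, path: str) -> None:
        """Drop a file's old words from the index."""
        for w in self._words.pop(path, set()):
            self._postings[w].discard(path)

    def search(self, query: str) -> set[str]:
        """Files containing every word of the query; no file is reopened."""
        terms = [w.lower() for w in _WORD.findall(query)]
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for t in terms[1:]:
            result &= self._postings.get(t, set())
        return result
```

A `grep -rl status` has to read every file on every query; here the cost is paid once per file change, which is why live updating matters.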

User expectations of a desktop might change now, and so will interaction. How are we, as a Linux community, going to fix this? Don't navigate through trees, it's a mess; “most recently used” might work, but it's not as good as the above idea; spatial!

This is not a case of using a database as a filesystem. That would mean, for instance, storing each word in an individual column, and that won't work! SQL will not solve this problem.

Notification systems in userspace daemons, instead of a new filesystem? Or a new filesystem, as Reiser has talked about implementing.

“To me it's just all data, I don't use Windows”.

Major Features

protocol completeness – CIFS is a massive protocol: there are 15 ways to change a password with the protocol (15 different types of crypto), and zero methods of changing the password remotely on a Linux box! So which is better, 0 or 15? ;)
implementing them all is important, because you don't know which one is being used; Samba3 does about 4, but it gets away with that because its AD implementation isn't as complete.
Samba3 was implemented on a debug basis: if someone found a problem, it got fixed. Nowadays Samba is a commodity, so Samba4 is going to implement it all, to be proactive.
there is now a protocol scanner, so when a new service pack comes out you can scan it and look for new protocol features to implement. Reverse engineering? Tsk tsk, this could raise DMCA/FTA issues.
But Samba should be OK, due to various historical reasons. Or maybe move to New Zealand, the “last bastion of freedom” :P
extreme testability
The test suites go very far, e.g. 'gentest'
Testing between two servers is possible: regression testing; portability (32-bit vs. 64-bit, etc.); different versions of Windows (and you get interesting results!); NTFS behaves differently from FAT
nbench is for speed testing/benchmarking, but that's really a black art; gentest is more for testing the protocol thoroughly.
non-POSIX backends
“no longer tied to POSIX; its no longer tied to Windows either”
fully asynchronous internals
flexible process models
auto-generated RPC infrastructure
IDL files are compiled automatically into C code by an IDL compiler called pidl
auto-generation of live testing suites!
Over 100k lines of Samba3 code has been replaced with less than 10k in Samba4!
flexible database architecture
LDB – a midpoint between LDAP and TDB (the trivial database, a key-value store that is the core database in Samba 3 and 2.2)
LDAP-like API, with very fast indexing, and LDAP search expressions will be supported!
ldb is used for all persistent databases, while tdb is for temporary databases.
smb.conf is going to disappear; its contents are going to sit in ldb
LDB is very, very important to Samba4. It is the core. Samba deals directly with the LDB database, nice and fast, with no protocol restrictions; then OpenLDAP acts as an LDAP front-end to expose it, Heimdal acts as a Kerberos front-end to expose it to the network, and you get Active Directory.
No need to go offline to reindex. The ldbsearch/ldbedit/* tools can even back up databases, restore them, and so on.
Developers can get their hands dirty now. Documenters, test suite folk, coders, etc. should come in and use it. It could be an /etc/passwd replacement, for instance; GNOME might find it useful as a gconf replacement?
Raw client library
async interfaces, oplock support, no 'smarts', so it sends exactly what it's asked to. The new interface takes a lot more code to use now.
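To illustrate "an LDAP-like API on top of TDB, with fast indexing", here is a sketch in which a Python dict plays the part of the TDB key-value store. Every name in it (`TinyLdb`, the `DN=` and `@INDEX:` key layouts) is made up for illustration and is not the ldb API. Only equality filters and `(&...)` conjunctions of them are supported, and matching is case-insensitive for simplicity. Indexed attributes are answered with a single key lookup; anything else falls back to a full scan.

```python
import re
from typing import Any, Optional

# One "(attr=value)" term; the attribute may not contain ( ) = & | !
_TERM = re.compile(r"\(([^()=&|!]+)=([^()]*)\)")


class TinyLdb:
    def __init__(self, indexed: set[str]) -> None:
        self.kv: dict[str, Any] = {}  # stands in for the TDB file
        self.indexed = {a.lower() for a in indexed}

    @staticmethod
    def _idx_key(attr: str, value: str) -> str:
        return f"@INDEX:{attr.lower()}:{value.lower()}"

    def add(self, dn: str, attrs: dict[str, list[str]]) -> None:
        """Store a new record and update the indexes (no modify/delete here)."""
        self.kv["DN=" + dn] = attrs
        for attr, values in attrs.items():
            if attr.lower() in self.indexed:
                for v in values:
                    self.kv.setdefault(self._idx_key(attr, v), set()).add(dn)

    @staticmethod
    def _parse(expr: str) -> list[tuple[str, str]]:
        body = expr[2:-1] if expr.startswith("(&") and expr.endswith(")") else expr
        terms = _TERM.findall(body)
        # Reject anything not fully understood, e.g. (|...) or (!...).
        if not terms or "".join(f"({a}={v})" for a, v in terms) != body:
            raise ValueError(f"unsupported filter: {expr!r}")
        return terms

    def _matches(self, dn: str, attr: str, value: str) -> bool:
        for a, vals in self.kv["DN=" + dn].items():
            if a.lower() == attr.lower() and value.lower() in (v.lower() for v in vals):
                return True
        return False

    def search(self, expr: str) -> list[str]:
        """Return the sorted DNs matching an equality or (&...) filter."""
        result: Optional[set[str]] = None
        for attr, value in self._parse(expr):
            if attr.lower() in self.indexed:
                # Indexed attribute: a single key lookup, no scan.
                dns = set(self.kv.get(self._idx_key(attr, value), set()))
            else:
                # Unindexed attribute: fall back to scanning every record.
                dns = {k[3:] for k in self.kv
                       if k.startswith("DN=") and self._matches(k[3:], attr, value)}
            result = dns if result is None else result & dns
        return sorted(result or set())
```

Because the index entries live in the same key-value store as the records, a backup of the store is also a backup of the indexes; whether a real reindex is needed is a separate question this sketch does not address.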

Currently, the thinking is that no patents are being infringed. They (Microsoft) are lying to vendors (with embedded Windows sales, for instance), but at the moment Samba doesn't violate any patents. Tridge spends lots of time talking to lawyers as well... Indexed search is just an extension of grep -irl, so there's prior art anyway :) As for encryption, it's protecting your password, and you own your password; it's not breaking Microsoft copyrighted material the way DeCSS would be. So Samba is safe, again...

Currently, Windows XP can connect to a Samba4 Active Directory domain controller, but now it needs to log in... 600ffff (we do the 701ffff) – encryption stuff. So here's a cryptographic challenge: session keys are 128-bit now rather than 64-bit. It's an MD5 of an MD5, and then an HMAC-MD5, and that's your session key! The crypto credentials (“sesskey”) still need fixing. This has to be worked out, otherwise Samba4 can't be an Active Directory Domain Controller.
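The "MD5, then HMAC-MD5" derivation can be written out in a few lines. This sketch assumes the key in question is the NETLOGON "strong key" session key as Microsoft later documented it in [MS-NRPC]; the talk itself only describes it loosely. The steps are: MD5 over four zero bytes plus the 8-byte client and server challenges, then HMAC-MD5 of that digest keyed with the 16-byte NT hash of the shared machine-account password. `netlogon_strong_session_key` is a hypothetical name, and the NT hash is passed in precomputed because MD4 is often missing from modern `hashlib` builds.

```python
import hashlib
import hmac


def netlogon_strong_session_key(nt_hash: bytes, client_challenge: bytes,
                                server_challenge: bytes) -> bytes:
    """Derive a 128-bit session key (assumed [MS-NRPC] strong-key scheme)."""
    if len(nt_hash) != 16 or len(client_challenge) != 8 or len(server_challenge) != 8:
        raise ValueError("expected a 16-byte NT hash and two 8-byte challenges")
    # Step 1: MD5 over 4 zero bytes followed by both challenges.
    digest = hashlib.md5(b"\x00" * 4 + client_challenge + server_challenge).digest()
    # Step 2: HMAC-MD5 of that digest, keyed with the NT hash.
    return hmac.new(nt_hash, digest, hashlib.md5).digest()
```

Both sides know the password hash and both challenges, so client and server arrive at the same 16-byte key without it ever crossing the wire.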

Process models are new in Samba4: you can write your own process model. There is a process model using pthreads, but don't use it; fork is better, because threads are slower than processes.
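A sketch of what "pluggable process models" means: the server core hands each new connection to whichever model was selected, and the model decides where the work runs. The class names and the `MODELS` registry are hypothetical; only a single-process and a fork-per-connection variant are shown (no threaded one, per the advice above), and `os.fork` makes it POSIX-only.

```python
import os
from typing import Callable, Protocol

# A handler serves one connection and returns a small integer status.
Handler = Callable[[int], int]


class ProcessModel(Protocol):
    def accept_connection(self, conn_id: int, handler: Handler) -> int: ...


class SingleProcessModel:
    """Run every connection inline in the server process (easy to debug)."""

    def accept_connection(self, conn_id: int, handler: Handler) -> int:
        return handler(conn_id)


class ForkProcessModel:
    """Fork one child per connection; a crashing handler only kills its child."""

    def accept_connection(self, conn_id: int, handler: Handler) -> int:
        pid = os.fork()
        if pid == 0:
            # Child: serve the connection, then exit without ever
            # returning into the parent's code path.
            status = 1  # reported if the handler raises
            try:
                status = handler(conn_id)
            finally:
                os._exit(status & 0xFF)
        # Parent: wait for the child. Simplification: a real server
        # would not block here.
        _, wait_status = os.waitpid(pid, 0)
        return os.waitstatus_to_exitcode(wait_status)


# Hypothetical registry: the process model is chosen by name at startup.
MODELS: dict[str, Callable[[], ProcessModel]] = {
    "single": SingleProcessModel,
    "standard": ForkProcessModel,
}
```

A real server would keep accepting connections and reap children asynchronously (for example from a SIGCHLD handler); the point here is only that the rest of the server never needs to know which model is in use.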

NFS v4 is much closer to CIFS, just with different wrappers. At some stage, CIFS might be what we use for Unix-to-Unix sharing. It's a long way before NFS v4 becomes a serious filesystem, because it lacks things you expect of a modern filesystem (metadata handling, for instance).