Form-based File Upload in HTML HTML におけるフォームを基にしたファイルのアップロード
This memo defines an Experimental Protocol for the Internet community. This memo does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited.
Currently, HTML forms allow the producer of the form to request information from the user reading the form. These forms have proven useful in a wide variety of applications in which input from the user is necessary. However, this capability is limited because HTML forms don't provide a way to ask the user to submit files of data. Service providers who need to get files from the user have had to implement custom user applications. (Examples of these custom browsers have appeared on the www-talk mailing list.) Since file-upload is a feature that will benefit many applications, this proposes an extension to HTML to allow information providers to express file upload requests uniformly, and a MIME compatible representation for file upload responses. This also includes a description of a backward compatibility strategy that allows new servers to interact with the current HTML user agents.
The proposal is independent of which version of HTML it becomes a part.
The current HTML specification defines eight possible values for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.
In addition, it defines the default ENCTYPE attribute of the FORM element using the POST METHOD to have the default value "application/x-www-form-urlencoded".
This proposal makes two changes to HTML:
- 1) Add a FILE option for the TYPE attribute of INPUT.
- 2) Allow an ACCEPT attribute for INPUT tag, which is a list of media types or type patterns allowed for the input.
この提案は、 HTML に2つの変更を加えます。
In addition, it defines a new MIME media type, multipart/form-data, and specifies the behavior of HTML user agents when interpreting a form with ENCTYPE="multipart/form-data" and/or <INPUT type="file"> tags.
加えて、この提案は multipart/form-data
という新しい MIME 媒体型を定義し、
ENCTYPE="multipart/form-data"
及び/又は
<INPUT type="file">
タグのあるフォームを解釈する時の HTML
利用者エージェントの動作を規定します。
These changes might be considered independently, but are all necessary for reasonable file upload.
The author of an HTML form who wants to request one or more files from a user would write (for example):
<FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST> File to process: <INPUT NAME="userfile1" TYPE="file"> <INPUT TYPE="submit" VALUE="Send File"> </FORM>
The change to the HTML DTD is to add one item to the entity "InputType". In addition, it is proposed that the INPUT tag have an ACCEPT attribute, which is a list of comma-separated media types.
... (other elements) ... <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX | RADIO | SUBMIT | RESET | IMAGE | HIDDEN | FILE )"> <!ELEMENT INPUT - 0 EMPTY> <!ATTLIST INPUT TYPE %InputType TEXT NAME CDATA #IMPLIED -- required for all but submit and reset VALUE CDATA #IMPLIED SRC %URI #IMPLIED -- for image inputs -- CHECKED (CHECKED) #IMPLIED SIZE CDATA #IMPLIED --like NUMBERS, but delimited with comma, not space MAXLENGTH NUMBER #IMPLIED ALIGN (top|middle|bottom) #IMPLIED ACCEPT CDATA #IMPLIED --list of content types > ... (other elements) ...
While user agents that interpret HTML have wide leeway to choose the most appropriate mechanism for their context, this section suggests how one class of user agent, WWW browsers, might implement file upload.
When a INPUT tag of type FILE is encountered, the browser might show a display of (previously selected) file names, and a "Browse" button or selection method. Selecting the "Browse" button would cause the browser to enter into a file selection mode appropriate for the platform. Window-based browsers might pop up a file selection window, for example. In such a file selection dialog, the user would have the option of replacing a current selection, adding a new file selection, etc. Browser implementors might choose let the list of file names be manually edited.
If an ACCEPT attribute is present, the browser might constrain the file patterns prompted for to match those with the corresponding appropriate file extensions for the platform.
When the user completes the form, and selects the SUBMIT element, the browser should send the form data and the content of the selected files. The encoding type application/x-www-form-urlencoded is inefficient for sending large quantities of binary data or text containing non-ASCII characters. Thus, a new media type, multipart/form-data, is proposed as a way of efficiently sending the values associated with a filled-out form from client to server.
利用者がフォームを埋めて、 SUBMIT
要素を選択したとき、ブラウザはフォーム・データと選択したファイルの内容を送信するべきです。
符号化型 application/x-www-form-urlencoded
は大量のバイナリ・データや非 ASCII 文字を含んだ文を送信するのには不十分です。
ですから、クライアントからサーバーに記入したフォームに関連付けられた値を送る十分な方法として新しい媒体型 multipart/form-data
を提案します。
The definition of multipart/form-data is included in section 7. A boundary is selected that does not occur in any of the data. (This selection is sometimes done probabilisticly.) Each field of the form is sent, in the order in which it occurs in the form, as a part of the multipart stream. Each part identifies the INPUT name within the original HTML form. Each part should be labelled with an appropriate content-type if the media type is known (e.g., inferred from the file extension or operating system typing information) or as application/octet-stream.
multipart/form-data
の定義は7章に含めています。
境界 (boundary) はデータのどこにも現れないものから選びます。
(この選択は時たま確率論的なものとなります。)
フォームの各欄は、フォーム中に出現した順序で、
複数部分流の一つの部分として送信します。
各欄は元の HTML フォーム中の INPUT
名で識別します。各部分は、媒体型がわかっているとき
(例えばファイルの拡張子やオペレーティング・システムの型情報から推測できるとき)
には、適当な内容型で、そうでないときには application/octet-stream
で札付けするべきです。
If multiple files are selected, they should be transferred together using the multipart/mixed format.
複数のファイルが選択されたときには、各ファイルは一緒に
multipart/mixed
書式を使って転送するべきです。
While the HTTP protocol can transport arbitrary BINARY data, the default for mail transport (e.g., if the ACTION is a "mailto:" URL) is the 7BIT encoding. The value supplied for a part may need to be encoded and the "content-transfer-encoding" header supplied if the value does not conform to the default encoding. [See section 5 of RFC 1521 for more details.]
HTTP プロトコルは任意の BINARY
データを転送できますが、
メイル転送 (例えば、 ACTION
が
mailto:
URL のとき) の既定値は 7BIT
符号化です。
部分に供給される値は、その値が既定符号化に適合しなければ、
符号化して content-transfer-encoding
頭をつける必要があるかもしれません。
(詳細については RFC 1521 の5章を参照。)
The original local file name may be supplied as well, either as a 'filename' parameter either of the 'content-disposition: form-data' header or in the case of multiple files in a 'content-disposition: file' header of the subpart. The client application should make best effort to supply the file name; if the file name of the client's operating system is not in US-ASCII, the file name might be approximated or encoded using the method of RFC 1522. This is a convenience for those cases where, for example, the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description.
元の局所ファイル名は、 content-disposition: form-data
頭の filename
引数として、又は複数ファイル群の場合には、
その小部分の content-disposition: file
頭において、同様に供給してもかまいません。
クライアント応用はファイル名を供給する最善の努力をするべきです。
送信者のオペレーティング・システムのファイル名が US-ASCII
でないなら、ファイル名は近似するか、 RFC1522
の方法を使って符号化しても構いません。
フォームで供給するファイル群が相互の参照を含んでいる場合、
例えば TeX ファイルとその .sty 追加スタイル記述のような場合には便利です。
On the server end, the ACTION might point to a HTTP URL that implements the forms action via CGI. In such a case, the CGI program would note that the content-type is multipart/form-data, parse the various fields (checking for validity, writing the file data to local files for subsequent processing, etc.).
サーバー側では、 ACTION
はフォーム動作を CGI
を介して実装している HTTP URL を指しているかもしれません。
そのような場合には、 CGI プログラムは content-type
が multipart/form-data
であることに気づき、
種々の欄を解析する (妥当性を確認し、ファイル・データを以後の処理のために局所ファイルに書き込むとか。)
ことでしょう。
The VALUE attribute might be used with <INPUT TYPE=file> tags for a default file name. This use is probably platform dependent. It might be useful, however, in sequences of more than one transaction, e.g., to avoid having the user prompted for the same file name over and over again.
The SIZE attribute might be specified using SIZE=width,height, where width is some default for file name width, while height is the expected size showing the list of selected files. For example, this would be useful for forms designers who expect to get several files and who would like to show a multiline file input field in the browser (with a "browse" button beside it, hopefully). It would be useful to show a one line text field when no height is specified (when the forms designer expects one file, only) and to show a multiline text area with scrollbars when the height is greater than 1 (when the forms designer expects multiple files).
While not necessary for successful adoption of an enhancement to the current WWW form mechanism, it is useful to also plan for a migration strategy: users with older browsers can still participate in file upload dialogs, using a helper application. Most current web browers, when given <INPUT TYPE=FILE>, will treat it as <INPUT TYPE=TEXT> and give the user a text box. The user can type in a file name into this text box. In addition, current browsers seem to ignore the ENCTYPE parameter in the <FORM> element, and always transmit the data as application/x-www-form-urlencoded.
Thus, the server CGI might be written in a way that would note that the form data returned had content-type application/x-www-form-urlencoded instead of multipart/form-data, and know that the user was using a browser that didn't implement file upload.
In this case, rather than replying with a "text/html" response, the CGI on the server could instead send back a data stream that a helper application might process instead; this would be a data stream of type "application/x-please-send-files", which contains:
この場合、 text/html
応用を返信するよりは、
サーバーの CGI は補助応用が代わりに処理するかもしれないデータ流を代わりに送ることができます。
これは型 application/x-please-send-files
のデータ流とすると、次のものを含むでしょう。
- * The (fully qualified) URL to which the actual form data should be posted (terminated with CRLF)
- * The list of field names that were supposed to be file contents (space separated, terminated with CRLF)
- * The entire original application/x-www-form-urlencoded form data as originally sent from client to server.
application/x-www-form-urlencoded
フォーム・データ全体。In this case, the browser needs to be configured to process application/x-please-send-files to launch a helper application.
The helper would read the form data, note which fields contained 'local file names' that needed to be replaced with their data content, might itself prompt the user for changing or adding to the list of files available, and then repackage the data & file contents in multipart/form-data for retransmission back to the server.
The helper would generate the kind of data that a 'new' browser should actually have sent in the first place, with the intention that the URL to which it is sent corresponds to the original ACTION URL. The point of this is that the server can use the *same* CGI to implement the mechanism for dealing with both old and new browsers.
The helper need not display the form data, but *should* ensure that the user actually be prompted about the suitability of sending the files requested (this is to avoid a security problem with malicious servers that ask for files that weren't actually promised by the user.) It would be useful if the status of the transfer of the files involved could be displayed.
This scheme doesn't address the possible compression of files. After some consideration, it seemed that the optimization issues of file compression were too complex to try to automatically have browsers decide that files should be compressed. Many link-layer transport mechanisms (e.g., high-speed modems) perform data compression over the link, and optimizing for compression at this layer might not be appropriate. It might be possible for browsers to optionally produce a content-transfer-encoding of x-compress for file data, and for servers to decompress the data before processing, if desired; this was left out of the proposal, however.
Similarly, the proposal does not contain a mechanism for encryption of the data; this should be handled by whatever other mechanisms are in place for secure transmission of data, whether via secure HTTP or mail.
In some situations, it might be advisable to have the server validate various elements of the form data (user name, account, etc.) before actually preparing to receive the data. However, after some consideration, it seemed best to require that servers that wish to do this should implement this as a series of forms, where some of the data elements that were previously validated might be sent back to the client as 'hidden' fields, or by arranging the form so that the elements that need validation occur first. This puts the onus of maintaining the state of a transaction only on those servers that wish to build a complex application, while allowing those cases that have simple input needs to be built simply.
The HTTP protocol may require a content-length for the overall transmission. Even if it were not to do so, HTTP clients are encouraged to supply content-length for overall file input so that a busy server could detect if the proposed file data is too large to be processed reasonably and just return an error code and close the connection without waiting to process all of the incoming data. Some current implementations of CGI require a content-length in all POST transactions.
If the INPUT tag includes the attribute MAXLENGTH, the user agent should consider its value to represent the maximum Content-Length (in bytes) which the server will accept for transferred files. In this way, servers can hint to the client how much space they have available for a file upload, before that upload takes place. It is important to note, however, that this is only a hint, and the actual requirements of the server may change between form creation and file submission.
In any case, a HTTP server may abort a file upload in the middle of the transaction if the file being received is too large.
Various people have suggested using new mime top-level type "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of "packet" to express indeterminate-length binary data, rather than relying on the multipart-style boundaries. While we are not opposed to doing so, this would require additional design and standardization work to get acceptance of "aggregate". On the other hand, the 'multipart' mechanisms are well established, simple to implement on both the sending client and receiving server, and as efficient as other methods of dealing with multiple combinations of binary data.
色々な人が、新しい mime 最上位型 aggregate
(修正)
を使って例えば aggregate/mixed
としたり、
不定長バイナリ・データの表現に複数部分様式の境界に依存するのではなく
packet
内容転送符号化を使ったりすることを提案しています。
そうすることに反対はしませんが、 aggregate
の追加の設計と承認のための標準化作業が必要になります。他方、 multipart
機構はよく確立しており、送信するクライアント及び受信するサーバーの両方を実装するのが簡単であり、
バイナリ・データの複数の組合せを処理する他の方式と同じくらい有効です。
Various people have wondered about the advisability of overloading 'INPUT' for this function, rather than merely providing a different type of FORM element. Among other considerations, the migration strategy which is allowed when using <INPUT> is important. In addition, the <INPUT> field *is* already overloaded to contain most kinds of data input; rather than creating multiple kinds of <INPUT> tags, it seems most reasonable to enhance <INPUT>. The 'type' of INPUT is not the content-type of what is returned, but rather the 'widget-type'; i.e., it identifies the interaction style with the user. The description here is carefully written to allow <INPUT TYPE=FILE> to work for text browsers or audio-markup.
Many input fields in HTML are to be typed in. There has been some ambiguity as to how form data should be transmitted back to servers. Making the content-type of <INPUT> fields be text/plain clearly disambiguates that the client should properly encode the data before sending it back to the server with CRLFs.
HTML の多くの入力欄は型付けされます。
どうフォーム・データをサーバーに送り返すべきかには幾らかの曖昧性があります。
<INPUT>
欄の内容型を text/plain
にすることによって、サーバーにこれを送る返す前に CRLF
で適当に符号化するべきであることを明確に曖昧でなくできます。
Independent of this proposal, it would be very useful for HTML interpreting user agents to allow a ACTION in a form to be a "mailto:" URL. This seems like a good idea, with or without this proposal. Similarly, the ACTION for a HTML form which is received via mail should probably default to the "reply-to:" of the message. These two proposals would allow HTML forms to be served via HTTP servers but sent back via mail, or, alternatively, allow HTML forms to be sent by mail, filled out by HTML-aware mail recipients, and the results mailed back.
この提案とは独立に、 HTML 解釈利用者エージェントがフォームの
ACTION
が mailto:
URL になることを許すととても便利になるでしょう。
これはこの提案とあわせてもあわせなくても良い考えに見えます。
同様に、メイルを介して受信した HTML フォームの ACTION
はおそらくメッセージの reply-to:
を既定値とするべきです。これらの2つの提案によって HTML フォームを HTTP
で供給してメイルで送り返したり、あるいは HTML フォームをメイルで送って
HTML が分かるメイル受信者が記入してメイルで送り返したりできるようになります。
In some scenarios, the user operating the client software might want to specify a URL for remote data rather than a local file. In this case, is there a way to allow the browser to send to the client a pointer to the external data rather than the entire contents? This capability could be implemented, for example, by having the client send to the server data of type "message/external-body" with "access-type" set to, say, "uri", and the URL of the remote data in the body of the message.
場合によっては、フォーム・ソフトウェアを操作する利用者が手元のファイルではなく遠隔データの
URL を指定したいと思うかもしれません。この場合、
ブラウザがクライアントに内容全体ではなく外部データの指示子を送ることができる方法があるでしょうか。
この能力は、例えばクライアントがサーバーに、
access-type
が、そう、 uri
に設定されて、メッセージの本体に遠隔データの URL
が入った型 message/external-body
のデータを送ることで実装できます。
If a form contains <INPUT TYPE=file> elements but does not contain an ENCTYPE in the enclosing <FORM>, the behavior is not specified. It is probably inappropriate to attempt to URN-encode large quantities of data to servers that don't expect it.
フォームが <INPUT TYPE=file>
要素を含んでいるものの囲んでいる
<FORM>
に ENCTYPE
が含まれていない時の動作は規定しません。
大量のデータを URN 符号化してそれを期待していないサーバーに送ろうとするのはおそらく不適切です。
As with all MIME transmissions, CRLF is used as the separator for lines in a POST of the data in multipart/form-data.
全ての MIME 転送同様、 multipart/form-data
のデータの
POST
の各行は CRLF を分離子として使います。
Note that mime headers are generally required to consist only of 7-bit data in the US-ASCII character set. Hence field names should be encoded according to the prescriptions of RFC 1522 if they contain characters outside of that set. In HTML 2.0, the default character set is ISO-8859-1, but non-ASCII characters in field names should be encoded.
Mime 頭は通常 US-ASCII 文字集合の7ビット・データだけで構成される必要があることに注意してください。 従って欄名がその集合の外の文字を含んでいるなら RFC 1522 の記述に従って符号化するべきです。 HTML2.0 では既定の文字集合は ISO-8859-1 ですが、欄名中の非 ASCII 文字は符号化するべきです。
Suppose the server supplies the following HTML:
<FORM ACTION="http://server.dom/cgi/handle" ENCTYPE="multipart/form-data" METHOD=POST> What is your name? <INPUT TYPE=TEXT NAME=submitter> What files are you sending? <INPUT TYPE=FILE NAME=pics> </FORM>
and the user types "Joe Blow" in the name field, and selects a text file "file1.txt" for the answer to 'What files are you sending?'
The client might send back the following data:
Content-type: multipart/form-data, boundary=AaB03x --AaB03x content-disposition: form-data; name="field1" Joe Blow --AaB03x content-disposition: form-data; name="pics"; filename="file1.txt" Content-Type: text/plain ... contents of file1.txt ... --AaB03x--
If the user also indicated an image file "file2.gif" for the answer to 'What files are you sending?', the client might
client might (>>6)send back the following data:Content-type: multipart/form-data, boundary=AaB03x --AaB03x content-disposition: form-data; name="field1" Joe Blow --AaB03x content-disposition: form-data; name="pics" Content-type: multipart/mixed, boundary=BbC04y --BbC04y Content-disposition: attachment; filename="file1.txt" Content-Type: text/plain ... contents of file1.txt ... --BbC04y Content-disposition: attachment; filename="file2.gif" Content-type: image/gif Content-Transfer-Encoding: binary ...contents of file2.gif... --BbC04y-- --AaB03x--
The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in RFC 1521. It is intended for use in returning the data that comes about from filling out a form. In a form (in HTML, although other applications may also use forms), there are a series of fields to be supplied by the user who fills out the form. Each field has a name. Within a given form, the names are unique.
媒体型 multipart/form-data
は、 RFC 1521
で概説されている全複数部分 (multipart) MIME データ流の規則に従います。
(HTML の)
フォームには
(他の応用もフォームを使うかもしれませんが)、
フォームに記入した利用者によって供給された欄 (field)
の系列があります。各欄には名前があります。
あるフォームの中では、名前は唯一無二です。
multipart/form-data contains a series of parts. Each part is expected to contain a content-disposition header where the value is "form-data" and a name attribute specifies the field name within the form, e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is the field name corresponding to that field. Field names originally in non-ASCII character sets may be encoded using the method outlined in RFC 1522.
multipart/form-data
は部分 (part) の系列を含みます。
各部分は配置型が form-data
である
content-disposition
(内容配置) 頭 (header)
を含むことを想定します。例えば
content-disposition: form-data; name="xxxxx"
で、 xxxxx はその欄に対応する欄の名前です。
元々非 ASCII 文字集合で書かれていた欄名は、 name
引数の値の中では RFC 1522 で説明されている方式を使って符号化しても構いません。
As with all multipart MIME types, each part has an optional Content-Type which defaults to text/plain. If the contents of a file are returned via filling out a form, then the file input is identified as application/octet-stream or the appropriate media type, if known. If multiple files are to be returned as the result of a single form entry, they can be returned as multipart/mixed embedded within the multipart/form-data.
全ての複数部分 MIME 型について、各部分は任意選択の
Content-Type
があって、その既定値は
text/plain
になっています。
ファイルの内容がフォームの記入によって返される時には、
application/octet-stream
又は適切な媒体型が分かっていればそれで識別します。単一のフォーム項目の結果として複数のファイル群を返す時には、
multipart/form-data
中に埋め込まれた
multipart/mixed
として表現するべきです。
Each part may be encoded and the "content-transfer-encoding" header supplied if the value of that part does not conform to the default encoding.
各部分の値が既定の符号化に適合しないなら、
その部分は符号化して content-transfer-encoding
頭をつけても構いません。
File inputs may also identify the file name. The file name may be described using the 'filename' parameter of the "content-disposition" header. This is not required, but is strongly recommended in any case where the original filename is known. This is useful or necessary in many applications.
ファイル入力はファイル名をも識別するかもしれません。
ファイル名は content-disposition
頭の
filename
引数を使って記述します。
これは必須ではありませんが、元のファイル名を知っている場合には強く推奨します。
これは多くの応用で有用或いは必要です。
It is important that a user agent not send any file that the user has not explicitly asked to be sent. Thus, HTML interpreting agents are expected to confirm any default file names that might be suggested with <INPUT TYPE=file VALUE="yyyy">. Never have any hidden fields be able to specify any file.
This proposal does not contain a mechanism for encryption of the data; this should be handled by whatever other mechanisms are in place for secure transmission of data, whether via secure HTTP, or by security provided by MOSS (described in RFC 1848).
Once the file is uploaded, it is up to the receiver to process and store the file appropriately.
The suggested implementation gives the client a lot of flexibility in the number and types of files it can send to the server, it gives the server control of the decision to accept the files, and it gives servers a chance to interact with browsers which do not support INPUT TYPE "file".
The change to the HTML DTD is very simple, but very powerful. It enables a much greater variety of services to be implemented via the World-Wide Web than is currently possible due to the lack of a file submission facility. This would be an extremely valuable addition to the capabilities of the World-Wide Web.
Larry Masinter Xerox Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, CA 94304
Phone: (415) 812-4365 Fax: (415) 812-4333 EMail: masinter@parc.xerox.com
Ernesto Nebel XSoft, Xerox Corporation 10875 Rancho Bernardo Road, Suite 200 San Diego, CA 92127-2116
Phone: (619) 676-7817 Fax: (619) 676-7865 EMail: nebel@xsoft.sd.xerox.com
- Media Type name
- multipart
- Media subtype name
- form-data
- Required parameters
- none
- Optional parameters
- none
- Encoding considerations
- No additional considerations other than as for other multipart types.
- Published specification
- RFC 1867
The multipart/form-data type introduces no new security considerations beyond what might occur with any of the enclosed parts.
[RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One:
Mechanisms for Specifying and Describing the Format of Internet Message Bodies. N. Borenstein & N. Freed. September 1993.
訳注 : RFC 1521 は廃止されました。新版は RFC2045, RFC2046, RFC2048, RFC2049 です。
[RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two:
Message Header Extensions for Non-ASCII Text. K. Moore. September 1993.
訳注 : RFC 1522 は廃止されました。新版は RFC2047 です。
[RFC 1806] Communicating Presentation Information in Internet
Messages: The Content-Disposition Header. R. Troost & S. Dorner, June 1995.
訳注 : RFC 1806 は RFC2183 により廃止されました。 RFC 2183 は更に RFC2231 により更新されています。
, boundary
は誤りで、 ; boundary
が正解です。[8]
RFC 1867 - Form-based File Upload in HTML (2007-01-14 06:54:18 +09:00
版) <http://tools.ietf.org/html/rfc1867>